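arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and related fields.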

2402.14035 2026-06-19 cs.LG cs.AI Version update

Wisdom of Committee: Diverse Distillation from Large Foundation Models and Domain Experts

Zichang Liu, Qingyun Liu, Yuening Li, Liang Liu, Anshumali Shrivastava, Shuchao Bi, Lichan Hong, Ed H. Chi, Zhe Zhao

Affiliations: Rice University; Google DeepMind; Google Inc; University of California, Davis

AI summary: Distilling foundation models into compact domain models is hard because of large gaps in capacity, architecture, and modality. The paper proposes the DiverseDistill framework, which uses a learnable question-answer mechanism and aligns heterogeneous teacher outputs, recovering 73-114% of the performance gap on recommendation and vision tasks.

Comments Accepted at the 1st Workshop on Resource-Efficient Learning and Knowledge Discovery (RelKD), KDD 2026

Journal ref Proceedings of the RelKD Workshop at KDD 2026

Abstract

Knowledge distillation from foundation models to compact domain models is challenging due to substantial gaps in capacity, architecture, and modality. For example, in our experiments, distilling from a 76M-parameter language model to a 2M-parameter recommender closes less than 40% of the performance gap between the undistilled student and the teacher. We show that introducing domain-specific experts -- which share the student's architectural characteristics -- alongside the foundation model as a diverse teacher committee significantly improves transfer. However, standard multi-teacher methods fail to exploit this diversity: naively combining heterogeneous teachers can degrade performance below single-teacher distillation. To address this, we propose DiverseDistill, an interactive distillation framework that employs a learnable Question-Answer mechanism to generate teacher-conditioned queries and align heterogeneous teacher outputs into the student's representation space. Unlike methods requiring gradient-based co-optimization or architectural modification of teachers, DiverseDistill operates with frozen teachers using only forward-pass inference through their intermediate layers: no parameter updates, no co-training, and no architectural surgery. A dynamic teacher importance mechanism further reduces training cost by filtering low-relevance teachers per sample (e.g., ~30% fewer forward passes with no quality loss for recommendation tasks), while the entire Distillation Module is discarded after training, adding zero inference overhead. Evaluations on recommendation (38x compression) and vision (3.6x compression) tasks demonstrate that DiverseDistill recovers 73-114% of the teacher-student performance gap, consistently outperforming all single- and multi-teacher baselines.
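The "share of the teacher-student gap recovered" quoted above can be read as a simple ratio. The sketch below assumes the common definition (distilled - undistilled) / (teacher - undistilled) on a higher-is-better metric; the function name and the scores are hypothetical and not taken from the paper.

```python
def gap_recovered(student: float, distilled: float, teacher: float) -> float:
    """Fraction of the teacher-student gap closed by distillation.

    Assumes a higher-is-better metric and teacher != student.
    """
    gap = teacher - student
    if gap == 0:
        raise ValueError("teacher and undistilled student score the same")
    return (distilled - student) / gap


# Hypothetical scores, for illustration only.
print(gap_recovered(student=0.70, distilled=0.78, teacher=0.80))  # partial recovery
print(gap_recovered(student=0.70, distilled=0.81, teacher=0.80))  # above 1.0
```

Under this definition, a value above 100% (such as the 114% upper end reported) means the distilled student scores higher than the teacher.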

2501.18322 2026-06-19 cs.LG math.AP Version update

A Unified Perspective on the Dynamics of Deep Transformers

Valérie Castin, Pierre Ablin, José Antonio Carrillo, Gabriel Peyré

Affiliations: CNRS and Ecole Normale Supérieure PSL; Apple; Mathematical Institute, University of Oxford

AI summary: Introduces the Transformer PDE as the mean-field limit of iterated attention layers, proves its well-posedness, and analyzes anisotropy evolution and clustering for Gaussian initial data.

Abstract

Transformers, which are state-of-the-art in most machine learning tasks, represent the data as sequences of vectors called tokens. This representation is then exploited by the attention function, which learns dependencies between tokens and is key to the success of Transformers. However, the iterative application of attention across layers induces complex dynamics that remain to be fully understood. To analyze these dynamics, we identify each input sequence with a probability measure and model its evolution as a Vlasov equation called Transformer PDE, whose velocity field is non-linear in the probability measure. Our first set of contributions focuses on compactly supported initial data. We show the Transformer PDE is well-posed and is the mean-field limit of an interacting particle system, thus generalizing and extending previous analysis to several variants of self-attention: multi-head attention, L2 attention, Sinkhorn attention, Sigmoid attention, and masked attention--leveraging a conditional Wasserstein framework. In a second set of contributions, we are the first to study non-compactly supported initial conditions, by focusing on Gaussian initial data. Again for different types of attention, we show that the Transformer PDE preserves the space of Gaussian measures, which allows us to analyze the Gaussian case theoretically and numerically to identify typical behaviors. This Gaussian analysis captures the evolution of data anisotropy through a deep Transformer. In particular, we highlight a clustering phenomenon that parallels previous results in the non-normalized discrete case.
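The interacting particle system behind this mean-field picture can be sketched for plain softmax self-attention. The code below is a toy illustration, assuming identity query, key, and value matrices, an explicit Euler step per layer, and hypothetical names (`attention_velocity`, `euler_step`); it is not the paper's code.

```python
import math


def attention_velocity(tokens):
    """Velocity of each token: a softmax-weighted average of all tokens.

    Assumes Q = K = V = identity, so the weight of token j for token i
    is proportional to exp(<x_i, x_j>).
    """
    velocities = []
    for xi in tokens:
        scores = [sum(a * b for a, b in zip(xi, xj)) for xj in tokens]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        velocities.append(
            [sum(w * xj[d] for w, xj in zip(weights, tokens)) / total
             for d in range(len(xi))]
        )
    return velocities


def euler_step(tokens, dt=0.1):
    """One explicit Euler step of dx_i/dt = velocity_i (one 'layer')."""
    vel = attention_velocity(tokens)
    return [[x + dt * v for x, v in zip(xi, vi)] for xi, vi in zip(tokens, vel)]


tokens = [[1.0, 0.0], [0.9, 0.2], [-1.0, 0.1]]
for _ in range(5):
    tokens = euler_step(tokens)
print(tokens)
```

As the number of tokens grows, the empirical measure of such a system is the object whose evolution the abstract models with the Transformer PDE.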

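2511.04514 2026-06-19 cs.LG Version update

Linear Mode Connectivity under Data Shifts for Deep Ensembles of Image Classifiers

C. Hepburn, T. Zielke, A. P. Raulf

Affiliations: Institute for AI Safety & Security

AI summary: Experimentally studies the conditions for linear mode connectivity (LMC) under data shifts, finds that small learning rates and large batch sizes mitigate their impact, and highlights the trade-off LMC makes between training efficiency and ensemble diversity.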

Comments 17 pages, 22 figures

Abstract

The phenomenon of linear mode connectivity (LMC) links several aspects of deep learning, including training stability under noisy stochastic gradients, the smoothness and generalization of local minima (basins), the similarity and functional diversity of sampled models, and architectural effects on data processing. In this work, we experimentally study LMC under data shifts and identify conditions that mitigate their impact. We interpret data shifts as an additional source of stochastic gradient noise, which can be reduced through small learning rates and large batch sizes. These parameters influence whether models converge to the same local minimum or to regions of the loss landscape with varying smoothness and generalization. Although models sampled via LMC tend to make similar errors more frequently than those converging to different basins, the benefit of LMC lies in balancing training efficiency against the gains achieved from larger, more diverse ensembles. Code and supplementary materials are available at https://github.com/DLR-KI/LMC. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
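Linear mode connectivity is usually checked by evaluating the loss along the straight line between two trained weight vectors. The toy sketch below assumes a one-dimensional double-well loss and a hypothetical `loss_barrier` helper; it illustrates the definition only, not the authors' experiments.

```python
def toy_loss(w: float) -> float:
    """Double-well loss with minima (basins) at w = -1 and w = +1."""
    return (w * w - 1.0) ** 2


def loss_barrier(w_a: float, w_b: float, loss=toy_loss, steps: int = 100) -> float:
    """Largest excess of the loss on the segment [w_a, w_b] over the
    linear interpolation of the two endpoint losses."""
    worst = 0.0
    for i in range(steps + 1):
        alpha = i / steps
        w = (1 - alpha) * w_a + alpha * w_b
        baseline = (1 - alpha) * loss(w_a) + alpha * loss(w_b)
        worst = max(worst, loss(w) - baseline)
    return worst


print(loss_barrier(0.9, 1.1))    # both points in the same basin: no barrier
print(loss_barrier(-1.0, 1.0))   # different basins: a barrier around w = 0
```

Two models are called linearly mode connected when this barrier is (close to) zero, which is the situation the abstract associates with converging to the same local minimum.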

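2602.09689 2026-06-19 cs.LG Version update

Model soups need only one ingredient

Alireza Abdollahpoorrostam, Nikolaos Dimitriadis, Adam Hazimeh, Pascal Frossard

Affiliations: EPFL; EPFL LTS4

AI summary: Proposes MonoSoup, which applies SVD to the per-layer updates of a single checkpoint and reweights the components automatically using entropy-based effective rank, achieving a strong in-distribution/out-of-distribution balance without multiple checkpoints.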

Abstract

Fine-tuning large pre-trained models on a target distribution often improves in-distribution (ID) accuracy, but at the cost of out-of-distribution (OOD) robustness as representations specialize to the fine-tuning data. Weight-space ensembling methods, such as Model Soups, mitigate this effect by averaging multiple checkpoints, but they are computationally prohibitive, requiring the training and storage of dozens of fine-tuned models. In this paper, we introduce MonoSoup, a simple, data-free, hyperparameter-free, post-hoc method that achieves a strong ID-OOD balance using only a single checkpoint. Our method applies Singular Value Decomposition (SVD) to each layer's update and decomposes it into high-energy directions that capture task-specific adaptation and low-energy directions that introduce noise but may still encode residual signals useful for robustness. MonoSoup then uses entropy-based effective rank to automatically re-weigh these components with layer-wise coefficients that account for the spectral and geometric structure of the model. Experiments on CLIP models fine-tuned on ImageNet and evaluated under natural distribution shifts, as well as on Qwen language models tested on mathematical reasoning and multiple-choice benchmarks, show that this plug-and-play approach is a practical and effective alternative to multi-checkpoint methods, retaining much of their benefits without their computational overhead.
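The entropy-based effective rank mentioned above has a compact definition: the exponential of the Shannon entropy of the normalized singular values. The sketch below assumes that standard definition and takes the singular values as a plain list; how MonoSoup turns this quantity into layer-wise coefficients is not specified in the abstract and is not shown here.

```python
import math


def effective_rank(singular_values):
    """exp(entropy) of the singular-value distribution p_i = s_i / sum(s)."""
    positive = [s for s in singular_values if s > 0]
    total = sum(positive)
    if total == 0:
        raise ValueError("need at least one positive singular value")
    entropy = -sum((s / total) * math.log(s / total) for s in positive)
    return math.exp(entropy)


print(effective_rank([1.0, 1.0, 1.0, 1.0]))    # flat spectrum: close to 4
print(effective_rank([10.0, 0.1, 0.1, 0.1]))   # one dominant direction: close to 1
```

A layer update whose energy sits in a few directions therefore gets a small effective rank, while a diffuse update gets a large one.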

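2604.15838 2026-06-19 cs.LG Version update

Reversible Residual Normalization Alleviates Spatio-Temporal Distribution Shift

Zhaobo Hu, Vincent Gauthier, Mehdi Naima

Affiliations: CNRS -- LIP6 Sorbonne Université

AI summary: Targets spatio-temporal distribution shift with a Reversible Residual Normalization framework that applies spatially aware invertible transformations to handle shift along both spatial and temporal dimensions, combining graph convolutions with spectrally constrained graph neural networks for adaptive normalization.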

Abstract

Distribution shift severely degrades the performance of deep forecasting models. While this issue is well-studied for individual time series, it remains a significant challenge in the spatio-temporal domain. Effective solutions like instance normalization and its variants can mitigate temporal shifts by standardizing statistics. However, distribution shift on a graph is far more complex, involving not only the drift of individual node series but also heterogeneity across the spatial network where different nodes exhibit distinct statistical properties. To tackle this problem, we propose Reversible Residual Normalization (RRN), a novel framework that performs spatially-aware invertible transformations to address distribution shift in both spatial and temporal dimensions. Our approach integrates graph convolutional operations within invertible residual blocks, enabling adaptive normalization that respects the underlying graph structure while maintaining reversibility. By combining Center Normalization with spectral-constrained graph neural networks, our method captures and normalizes complex Spatio-Temporal relationships in a data-driven manner. The bidirectional nature of our framework allows models to learn in a normalized latent space and recover original distributional properties through inverse transformation, offering a robust and model-agnostic solution for forecasting on dynamic spatio-temporal systems.
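The normalize-then-invert pattern that instance normalization uses against temporal shift can be sketched per node series. This is a minimal illustration of that baseline idea only, with hypothetical function names; it does not include the graph convolutions or invertible residual blocks that RRN adds.

```python
import math


def normalize(series, eps=1e-8):
    """Standardize one node's window; return the stats needed to invert."""
    mean = sum(series) / len(series)
    var = sum((x - mean) ** 2 for x in series) / len(series)
    std = math.sqrt(var + eps)
    return [(x - mean) / std for x in series], (mean, std)


def denormalize(values, stats):
    """Inverse transform: map outputs back to the original scale."""
    mean, std = stats
    return [v * std + mean for v in values]


# Two nodes with very different statistics (spatial heterogeneity).
windows = {
    "node_a": [10.0, 12.0, 11.0, 13.0],
    "node_b": [1000.0, 1400.0, 1200.0, 1600.0],
}
for name, window in windows.items():
    z, stats = normalize(window)
    # A forecaster would operate on z here; this sketch simply inverts it.
    restored = denormalize(z, stats)
    print(name, [round(v, 6) for v in restored])
```

Per-node statistics handle each series in isolation, which is the limitation the abstract points to when it argues that shift on a graph also involves heterogeneity across nodes.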

2605.09609 2026-06-19 cs.LG math.AG Version update

Minimal Filling Architectures of Polynomial Neural Networks: Counterexamples, Frontier Search, and Defects

Kevin Dao, Jose Israel Rodriguez

Affiliations: Department of Mathematics, University of Wisconsin-Madison, Wisconsin, USA

AI summary: Uses frontier search and symbolic computation to verify counterexamples to the minimal unimodality conjecture for polynomial neural networks, and shows that some sub-architectures have large defects, in contrast to the small defects observed previously.

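2605.30456 2026-06-19 cs.LG math.OC Version update

DisjunctiveNet: Neural Symbolic Learning via Differentiable Convexified Optimization Layers

Shraman Pal, Can Li

Affiliations: Davidson School of Chemical Engineering, Purdue University, West Lafayette, USA

AI summary: For settings where data are sparse but domain knowledge is rich, proposes the DisjunctiveNet framework, which embeds disjunctive constraints into neural networks through differentiable convexified optimization layers, achieving hard constraint satisfaction together with strong predictive performance.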

Comments ICML 2026

Abstract

Many learning tasks in science and engineering are characterized by sparse datasets, which limits the effectiveness of purely data-driven approaches. At the same time, these problems are often accompanied by rich domain knowledge derived from physical laws, operational requirements, and expert heuristics. Such knowledge is frequently expressed as rules involving logical propositions and linear inequalities. Existing neuro-symbolic methods typically enforce these rules approximately through soft penalties, assume input-independent rules when designing specialized architectures, or rely on non-differentiable post-processing at inference time to achieve hard constraint satisfaction. While recent advances in differentiable optimization layers enable end-to-end feasibility enforcement within neural networks, extending these approaches to logical or mixed-integer rules remains challenging due to inherent nonconvexity. In this work, we propose a unified end-to-end framework for enforcing hard, input-dependent mixed integer linear constraints within neural networks. Our approach represents rules as disjunctive constraints and applies hierarchical convex relaxations to obtain convex hull formulations. These relaxations yield tractable linear constraints that can be embedded as differentiable optimization layers while enabling exact rule satisfaction. We demonstrate the effectiveness of the proposed framework on real-world datasets, achieving perfect rule satisfaction and strong predictive performance.
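As a reminder of what a convex hull formulation of a disjunction looks like, the block below gives the textbook hull reformulation for a disjunction of bounded polyhedra. The symbols $A_k$, $b_k$, $x^k$, and $\lambda_k$ are notation chosen here for illustration, and the paper's hierarchical relaxation may differ in detail.

```latex
\[
  \bigvee_{k=1}^{K} \bigl[\, A_k x \le b_k \,\bigr]
  \quad\longrightarrow\quad
  \begin{aligned}
    x &= \sum_{k=1}^{K} x^k, \\
    A_k x^k &\le b_k \lambda_k, \qquad k = 1,\dots,K, \\
    \sum_{k=1}^{K} \lambda_k &= 1, \qquad \lambda_k \ge 0 .
  \end{aligned}
\]
```

When each polyhedron is nonempty and bounded, projecting this system onto $x$ gives the convex hull of their union, and every constraint is linear in $(x, x^k, \lambda)$, which is what makes it usable inside a convex optimization layer.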

2606.16575 2026-06-19 cs.LG math-ph math.MP Version update

RepNN: Tackling spectral bias in deep neural networks via parameter reparameterization

Yong Wang, Tao Zhou, Xuhui Meng

Affiliations: Institute of Interdisciplinary Research for Mathematics and Applied Science, School of Mathematics and Statistics, Huazhong University of Science and Technology; Institute of Computational Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences

AI summary: Addresses spectral bias in deep neural networks when capturing oscillatory and multiscale behavior. The proposed RepNN model reparameterizes the weights and biases of the first hidden layer to control the initial slope scale and the distribution of partition points, enables adaptive frequency scaling, and improves accuracy in function approximation, PDE solving, and operator learning.

Abstract

Deep neural networks (DNNs) have achieved remarkable success in scientific computing, yet they often suffer from spectral bias in capturing oscillatory and multiscale behaviors. In this study, we investigate this limitation by examining the failure of shallow ReLU neural networks in fitting high-frequency functions. This observation identifies two important factors in resolving rapid oscillations: the initial slope scale and the distribution of partition points induced by the networks. Motivated by this analysis, we propose RepNN, a reparameterized neural network model with activation ReLU or tanh designed for high-frequency and multiscale problems. The key idea is to reparameterize the weights and biases in the first hidden layer, which enables effective control of the initial slope scale and provides an appropriate distribution of the initial partition points. Furthermore, treating the reparameterized weights and biases as trainable parameters allows the DNN to achieve adaptive frequency scaling during training. In addition, we derive quantitative estimates for the output and slope magnitudes of the reparameterized DNN to guide the initialization of the proposed method. Numerical experiments, including multiscale one- and four-dimensional function approximations, forward and inverse PDE problems in combination with physics-informed neural networks (PINNs), and operator learning for an earthquake problem using real data, demonstrate that RepNN improves the predicted accuracy of vanilla DNNs in capturing highly oscillatory features with slightly additional computational cost. These results indicate that RepNN provides an effective and flexible approach for overcoming spectral bias and applying DNNs to multiscale problems.
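The two quantities the abstract singles out can be read directly off a shallow ReLU network: each hidden unit relu(w*x + b) has slope magnitude |w| and a kink (partition point) at x = -b/w. The sketch below uses a hypothetical reparameterization by a scale `scale` and centers `c_i` (w_i = scale, b_i = -scale * c_i) purely to illustrate that idea; the actual RepNN parameterization is not given in the abstract.

```python
def first_layer_from(scale, centers):
    """Hypothetical reparameterization: w_i = scale, b_i = -scale * c_i."""
    weights = [scale for _ in centers]
    biases = [-scale * c for c in centers]
    return weights, biases


def partition_points(weights, biases):
    """Kink location of each unit relu(w * x + b), i.e. x = -b / w."""
    return [-b / w for w, b in zip(weights, biases)]


centers = [i / 8 for i in range(9)]  # evenly spread over [0, 1]
for scale in (1.0, 50.0):
    w, b = first_layer_from(scale, centers)
    # The kinks stay at the chosen centers while the slope scale changes.
    print(scale, max(abs(x) for x in w), partition_points(w, b)[:3])
```

In this toy form the slope scale and the kink locations are separate knobs, which is one way to picture controlling both at initialization.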

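2606.17832 2026-06-19 cs.LG Version update

From Drift to Coherence: Stabilizing Beliefs in LLMs

SongEun Kim, Seungyoo Lee, Edwin Fong, Hyungi Lee, Juho Lee

Affiliations: Department of Statistics, Seoul National University; Korea Advanced Institute of Science & Technology; Department of AI, Kookmin University; University of Hong Kong

AI summary: Studies belief drift in LLMs on multiple-choice question answering. Introduces prompted predictive resampling (PPR), finds that the belief process self-stabilizes and converges, and proposes a seed-answer prompting strategy and a self-consistency loss to accelerate stabilization and improve predictive coherence.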

Abstract

Large language models (LLMs) are often hypothesized to perform implicit Bayesian inference, yet a key coherence condition, the martingale property of predictive beliefs, has been shown to fail in controlled synthetic in-context learning settings. We revisit this question in a more typical usage regime: generic multiple-choice question answering. Exploiting the discrete answer space, we compute exact predictive distributions and study belief dynamics induced by autoregressive answer resampling. We introduce prompted predictive resampling (PPR), where an LLM generates a sequence of answers to the same question. Empirically, PPR reveals early-stage belief drift, indicating martingale violations. However, after sufficient resampling steps, the belief process self-stabilizes and converges to a coherent predictive distribution. Based on this observation, we further propose (i) a seed-answer prompting strategy to accelerate stabilization, and (ii) a self-consistency loss that amortizes early-stage drift into the model via fine-tuning. Experiments on multiple-choice QA benchmarks show that our methods substantially reduce belief drift and improve predictive coherence without sacrificing accuracy.
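The martingale property says that the expected next-step predictive belief equals the current one. The sketch below checks this exactly for a Beta-Bernoulli model, a standard example of coherent Bayesian updating chosen here for illustration; it is not the paper's PPR procedure, and the function names are hypothetical.

```python
from fractions import Fraction


def predictive(a: int, b: int, ones: int, n: int) -> Fraction:
    """P(next answer = 1) under a Beta(a, b) prior after `ones` ones in n draws."""
    return Fraction(a + ones, a + b + n)


def expected_next_predictive(a: int, b: int, ones: int, n: int) -> Fraction:
    """Average the next-step belief over the model's own next sample."""
    p = predictive(a, b, ones, n)
    return (p * predictive(a, b, ones + 1, n + 1)
            + (1 - p) * predictive(a, b, ones, n + 1))


current = predictive(2, 3, ones=4, n=7)
print(current, expected_next_predictive(2, 3, ones=4, n=7))  # the two values coincide
```

Belief drift, in the abstract's sense, corresponds to this equality failing when the model resamples its own answers.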

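2606.17886 2026-06-19 cs.LG Version update

Monotonic Kolmogorov-Arnold Networks: A Theoretical and Empirical Study of Monotonicity as an Inductive Bias

Mikhail Krasnov, Blaž Bertalanič, Carolina Fortuna

Affiliations: Jozef Stefan Institute

AI summary: Proposes MKAN, which achieves hard monotonicity through exponential reparameterization of B-spline coefficients, positive edge weights, and a monotone base activation. The paper proves that any $C^K$ feature extractor inducing a ball-shaped semantic-neighborhood partition admits a monotone realization with a bounded encoder size, and experiments show that MKAN is competitive with state-of-the-art monotone networks while keeping KAN's per-edge functional transparency.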

Abstract

Monotonicity has been a long-running architectural inductive bias for neural networks, motivated by tabular, scientific, and economic settings where outputs are known to respond monotonically to certain inputs. Existing approaches are MLP- or flow-based and lack per-edge functional transparency; the only Kolmogorov-Arnold Network (KAN) variant with monotonicity, MonoKAN, enforces the constraint only on a restricted parameter subset and requires a projection-style training procedure. We close this gap with MKAN, a KAN with hard monotonicity guaranteed for all parameter values via exponential reparameterization of B-spline coefficients, positive edge weights, and a monotone base activation. Training reduces to standard unconstrained gradient descent. Our headline theoretical contribution is a representation-cost theorem: any $C^K$ ($K > 0$) feature extractor inducing a ball-shaped semantic-neighborhood partition admits a monotone realization of the equivalent neighborhood structure at $N' = N^* + k \le 2N^*$, where $k$ is the number of non-monotone coordinates of the original. The bound is architecture-agnostic and gives a principled sizing rule for monotone encoders. Empirically, MKAN is competitive with state-of-the-art monotone NNs on the SMM/ICML-2024 benchmark while being the only method that combines hard unconstrained monotonicity with KAN's per-edge functional transparency; the $2N^*$ prediction is validated in a self-supervised feature-size sweep on four real datasets, and on a controlled monotone-generative dataset MKAN recovers ground-truth factors with substantially higher Spearman alignment than KAN, MLP, and linear baselines.
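One way to see why an exponential reparameterization can give monotonicity for every parameter value: if each coefficient increment is exp(theta_i) > 0, the coefficients are strictly increasing, and a spline with increasing coefficients is increasing. The sketch below assumes this cumulative-sum form and uses degree-1 B-splines (piecewise-linear interpolation) on a uniform grid for brevity; MKAN's exact construction may differ.

```python
import math


def monotone_coefficients(c0, thetas):
    """c_0 is free; c_i = c_{i-1} + exp(theta_i), so increments are always > 0."""
    coeffs = [c0]
    for theta in thetas:
        coeffs.append(coeffs[-1] + math.exp(theta))
    return coeffs


def linear_spline(x, coeffs):
    """Degree-1 B-spline on a uniform grid over [0, 1] (needs >= 2 coefficients)."""
    x = min(max(x, 0.0), 1.0)
    pos = x * (len(coeffs) - 1)
    i = min(int(pos), len(coeffs) - 2)
    t = pos - i
    return (1 - t) * coeffs[i] + t * coeffs[i + 1]


coeffs = monotone_coefficients(-1.0, [0.3, -2.0, 1.5, -0.7])  # unconstrained thetas
values = [linear_spline(k / 20, coeffs) for k in range(21)]
print(all(lo < hi for lo, hi in zip(values, values[1:])))
```

Because the thetas are unconstrained real numbers, plain gradient descent can update them without any projection step, which matches the abstract's remark that training reduces to standard unconstrained gradient descent.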

2601.22300 2026-06-19 physics.optics cond-mat.dis-nn cs.ET cs.LG Version update

Toward all-optical unsupervised Hebbian learning in deep photonic neuromorphic networks

Xi Li, Disha Biswas, Peng Zhou, Wesley H. Brigner, Anna Capuano, Joseph S. Friedman, Qing Gu

Affiliations: Department of Electrical and Computer Engineering, North Carolina State University; Department of Electrical and Computer Engineering, The University of Texas at Dallas; Department of Materials Science and Engineering, North Carolina State University; Department of Physics, North Carolina State University

AI summary: Proposes a deep photonic neuromorphic network architecture based on phase-change material synapses and local optical feedback for online, unsupervised Hebbian learning, and experimentally demonstrates adaptive synaptic evolution and optical inference.

Comments 16 pages, 4 figures

Abstract

We propose a deep photonic neuromorphic network (PNN) architecture based on phase-change material (PCM) synapses and local optical feedback for online, unsupervised Hebbian learning. The proposed architecture combines optical vector-matrix multiplication, non-volatile PCM synaptic weighting, and local coincidence-driven synaptic adaptation within a multilayer photonic crossbar framework compatible with photonic integrated circuits. Unlike conventional PNNs that rely on externally computed gradients, repeated optical-electrical-optical conversions, or global backpropagation, the proposed framework employs local Hebbian learning governed directly by correlated pre- and post-synaptic optical activity. To investigate the feasibility of the proposed learning mechanism, we implemented the PNN design using fiber-optic components, programmable variable optical attenuators, and real-time software control that incorporates PCM thermal dynamics. Supervised and unsupervised learning behaviors were experimentally evaluated under both offline and online learning conditions using representative image-recognition tasks. The experimental results demonstrate adaptive synaptic evolution, successful optical inference, and autonomous pattern encoding through local Hebbian learning under realistic fiber-optic hardware conditions. These results establish a pathway toward future integrated photonic neuromorphic systems capable of scalable and energy-efficient online Hebbian learning.
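A local Hebbian rule of the kind described needs only the activity on either side of a synapse. The sketch below is a generic software illustration, assuming a simple product rule with decay and weights clipped to [0, 1] as a stand-in for a bounded optical transmission; the learning rate, decay, and names are hypothetical, and no PCM physics is modeled.

```python
def hebbian_update(weights, pre, post, lr=0.1, decay=0.01):
    """Update w[j][i] from presynaptic input i and postsynaptic output j only."""
    updated = []
    for w_row, y in zip(weights, post):
        row = []
        for w, x in zip(w_row, pre):
            w_new = w + lr * x * y - decay * w   # coincidence term minus passive decay
            row.append(min(1.0, max(0.0, w_new)))  # keep the weight in [0, 1]
        updated.append(row)
    return updated


def forward(weights, pre):
    """Vector-matrix product: one output per row of weights."""
    return [sum(w * x for w, x in zip(w_row, pre)) for w_row in weights]


weights = [[0.2, 0.2, 0.2]]
pattern = [1.0, 1.0, 0.0]
for _ in range(20):
    weights = hebbian_update(weights, pattern, forward(weights, pattern))
print(weights)
```

No gradient or global error signal appears anywhere in the update, which is the contrast the abstract draws with backpropagation-based photonic networks.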

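2606.11673 2026-06-19 quant-ph cs.LG Version update

Higher-Order Token Interactions via Quantum Attention

Jian Xu, Chao Li, Delu Zeng, John Paisley, Qibin Zhao

Affiliations: RIKEN iTHEMS; RIKEN AIP; South China University of Technology; Columbia University

AI summary: Proposes Quantum Higher-Order Attention (QHA), which synthesizes token interactions of arbitrary order in a shallow circuit via data re-uploading and a non-Clifford entangler. The paper proves an expressivity separation from classical self-attention along with a trainability guarantee, and QHA efficiently detects high-order interactions in genetic epistasis, learning parity with noise, and graph triangle detection.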

Abstract

Standard dot-product self-attention computes, in a single layer, only pairwise (order-2) interactions between tokens; representing a generic order-$k$ interaction is known to require either super-quadratic resources in one layer or composition across depth. We introduce Quantum Higher-Order Attention (QHA), a shallow, hardware-realizable quantum attention head that, via data re-uploading and an all-to-all non-Clifford entangler, synthesizes order-$k$ token interactions inside the circuit and exposes them through a local single-qubit read-out. We prove (i) an expressivity separation: any single standard self-attention layer with embedding dimension $m$, $H$ heads and $p$-bit precision satisfying $mHp=o(N/\log\log N)$ cannot represent the order-$k$ correlation family that one QHA head represents with circuit depth $O(\log k)$ ($O(k)$ two-qubit gates); and (ii) a trainability guarantee for its local-design instantiation: with a local read-out and $O(\log n)$ depth the gradient variance is $\Omega(1/\mathrm{poly}(n))$ (no barren plateau), which we confirm empirically -- while being explicit that the more expressive all-to-all instantiation we benchmark is trained empirically and shows exponentially decaying gradients. Empirically, at a $6.5\times$ smaller parameter budget, QHA generalizes hidden-subset parity of every order $k\le6$ from disjoint inputs, whereas the larger classical attention head collapses past order 2; consistent with theory, the size of the advantage tracks the target's Fourier degree -- largest for parity and shrinking when low-order structure is present. As an application, QHA serves as a compact high-order interaction detector across three domains -- genetic epistasis, learning-parity-with-noise, and graph triangle detection -- reaching the noise ceiling at the smallest parameter budget where field-standard linear methods fail.
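Hidden-subset parity is a natural test of order-k interactions because, over uniform ±1 inputs, it is uncorrelated with every product of fewer than k variables. The brute-force check below illustrates that fact for a small, hypothetical subset; it says nothing about the quantum circuit itself.

```python
from itertools import combinations, product


def monomial(x, subset):
    """Product of the +/-1 coordinates of x indexed by subset."""
    value = 1
    for i in subset:
        value *= x[i]
    return value


def correlation(n, subset_a, subset_b):
    """Average of monomial_a * monomial_b over all 2**n inputs in {-1, +1}^n."""
    total = sum(
        monomial(x, subset_a) * monomial(x, subset_b)
        for x in product((-1, 1), repeat=n)
    )
    return total / 2 ** n


n, hidden = 5, (0, 2, 3)  # order-3 parity on a hypothetical hidden subset
lower_order = [s for r in range(len(hidden)) for s in combinations(range(n), r)]
print(max(abs(correlation(n, hidden, s)) for s in lower_order))  # no lower-order signal
print(correlation(n, hidden, hidden))                            # full correlation with itself
```

A learner that can only combine pairwise statistics in one step therefore sees no signal from a parity of order three or more, which is consistent with parity being the case where the abstract reports the largest advantage.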

2606.18611 2026-06-19 cs.SD cs.AI cs.LG stat.ML Version update

QC-GAN: A Parameter-Efficient Quaternion Conformer GAN for High-Fidelity Speech Enhancement

Shogo Yamauchi, Hideaki Tamori, Makoto Sakai, Yosuke Yamano, Tohru Nitta

Affiliations: The Asahi Shimbun Company; Tokyo Woman's Christian University

AI summary: Proposes the parameter-efficient QC-GAN, which combines a quaternion Conformer generator with MetricGAN training and reduces the parameter count by sharing weights through the Hamilton product. On VoiceBank+DEMAND it reaches PESQ 3.48 with 0.89M parameters, matching models twice its size.

Comments 10 pages, 6 figures and 5 tables. Accepted at Interspeech 2026

2406.07775 2026-06-19 cs.LG Version update

Self-attention-based non-linear basis transformations for compact latent space modelling of dynamic optical fibre transmission matrices

Yijie Zheng, Robert J. Kilpatrick, David B. Phillips, George S. D. Gordon

Affiliations: Optics and Photonics research group, University of Nottingham, UK; University of Exeter, UK; State Key Laboratory of Extreme Photonics and Instrumentation, College of Optical Science and Engineering, International Research Center for Advanced Photonics, Zhejiang University, Hangzhou, China; Research Center for Humanoid Sensing, Zhejiang Lab, Hangzhou, China

AI summary: Proposes using self-attention layers to dynamically transform the coordinate representations of fibre matrices into a compact basis that admits low-dimensional representations, and validates basis sparsity (participation ratio 0.01-0.11) and low reconstruction error (<10%) on several datasets.

Abstract

Multimode optical fibres are hair-thin strands of glass that efficiently transport light. They promise next-generation medical endoscopes that provide unprecedented sub-cellular image resolution deep inside the body. However, confining light to such fibres means that images are inherently scrambled in transit. Conventionally, this scrambling has been compensated by pre-calibrating how a specific fibre scrambles light and solving a stationary linear matrix equation that represents a physical model of the fibre. However, as the technology develops towards real-world deployment, the unscrambling process must account for dynamic changes in the matrix representing the fibre's effect on light, due to factors such as movement and temperature shifts, and non-linearities resulting from the inaccessibility of the fibre tip when inside the body. Such complex, dynamic and nonlinear behaviour is well-suited to approximation by neural networks, but most leading image reconstruction networks rely on convolutional layers, which assume strong correlations between adjacent pixels, a strong inductive bias that is inappropriate for fibre matrices which may be expressed in a range of arbitrary coordinate representations with long-range correlations. We introduce a new concept that uses self-attention layers to dynamically transform the coordinate representations of varying fibre matrices to a basis that admits compact, low-dimensional representations suitable for further processing. We demonstrate the effectiveness of this approach on diverse fibre matrix datasets. We show our models significantly improve the sparsity of fibre bases in their transformed bases with a participation ratio, p, as a measure of sparsity, of between 0.01 and 0.11. Further, we show that these transformed representations admit reconstruction of the original matrices with < 10% reconstruction error, demonstrating the invertibility.
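A participation ratio measures what fraction of the entries of a vector or matrix carry its energy. The sketch below assumes one common normalized definition, $(\sum_i |a_i|^2)^2 / (N \sum_i |a_i|^4)$, which ranges from 1/N (a single nonzero entry) to 1 (uniform magnitudes); the paper's exact definition of p may differ.

```python
def participation_ratio(entries):
    """(sum |a|^2)^2 / (N * sum |a|^4); small values mean a sparse representation."""
    power = [abs(a) ** 2 for a in entries]
    denom = len(power) * sum(p * p for p in power)
    if denom == 0:
        raise ValueError("entries must not all be zero")
    return sum(power) ** 2 / denom


dense = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j] * 25   # 100 entries of equal magnitude
sparse = [3 + 4j] + [0j] * 99                     # one nonzero entry out of 100
print(participation_ratio(dense), participation_ratio(sparse))
```

Under this definition the reported range of 0.01 to 0.11 would mean that only a small fraction of the entries in the transformed basis hold most of the energy.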

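2502.03227 2026-06-19 cs.LG cs.CV Version update

Adversarial Dependence Minimization

Pierre-François De Plaen, Tinne Tuytelaars, Marc Proesmans, Luc Van Gool

Affiliations: CVL, ETH Zürich, Switzerland; INSAIT, Sofia University, Bulgaria

AI summary: Proposes the ADM algorithm, which minimizes statistical dependence between feature dimensions through an adversarial game, proves that the global optimum corresponds to mutual independence, and applies it to nonlinear decorrelation, improved generalization in image classification, and preventing dimensional collapse in self-supervised learning.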

2509.15927 2026-06-19 cs.LG cs.AI 版本更新

Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search

增强生成式自动出价：结合离线奖励评估与策略搜索

Zhiyu Mou, Yiqin Lv, Miao Xu, Qi Wang, Yixiu Mao, Jinghao Chen, Qichen Ye, Chao Li, Rongquan Bai, Chuan Yu, Jian Xu, Bo Zheng

发表机构 * Taobao & Tmall Group of Alibaba（阿里巴巴淘宝与天猫集团）； Department of Automation, Tsinghua University（清华大学自动化系）

AI总结针对现有生成式自动出价方法无法超越静态数据集进行探索的性能瓶颈，提出AIGB-Pearl方法，通过轨迹评估器和KL-Lipschitz约束的分数最大化方案实现安全高效探索，在模拟和真实广告系统中取得最优性能。

详情

AI中文摘要

自动出价是广告主提升广告效果的关键工具。最近进展表明，AI生成式出价（AIGB）从离线数据中学习条件生成规划器，相比典型的基于离线强化学习（RL）的自动出价方法取得了更优性能。然而，现有AIGB方法仍面临性能瓶颈，因其固有能力无法在静态数据集之外进行带反馈的探索。为解决此问题，我们提出AIGB-Pearl（Planning with EvaluAtor via RL），一种融合生成式规划与策略优化的新方法。AIGB-Pearl的核心在于构建轨迹评估器以评估生成分数的质量，并设计一个理论上可靠的KL-Lipschitz约束分数最大化方案，确保在离线数据集之外进行安全高效的探索。进一步开发了结合同步耦合技术的实用算法，以保证所提方案所需的模型正则性。在模拟和真实广告系统上的大量实验证明了我们方法的最优性能。

英文摘要

Auto-bidding is a critical tool for advertisers to improve advertising performance. Recent progress has demonstrated that AI-Generated Bidding (AIGB), which learns a conditional generative planner from offline data, achieves superior performance compared to typical offline reinforcement learning (RL)-based auto-bidding methods. However, existing AIGB methods still face a performance bottleneck due to their inherent inability to explore beyond the static dataset with feedback. To address this, we propose AIGB-Pearl (Planning with EvaluAtor via RL), a novel method that integrates generative planning and policy optimization. The core of AIGB-Pearl lies in constructing a trajectory evaluator to assess the quality of generated scores and designing a provably sound KL-Lipschitz-constrained score-maximization scheme to ensure safe and efficient exploration beyond the offline dataset. A practical algorithm that incorporates the synchronous coupling technique is further developed to ensure the model regularity required by the proposed scheme. Extensive experiments on both simulated and real-world advertising systems demonstrate the state-of-the-art performance of our approach.

2510.19893 2026-06-19 cs.LG 版本更新

EQPO: Equitable Group Relative Policy Optimization for Clinical Reasoning

EQPO: 面向临床推理的公平群体相对策略优化

Shiqi Dai, Wei Dai, Jiaee Cheong, Paul Pu Liang

发表机构 * MIT（麻省理工学院）； Harvard University（哈佛大学）

AI总结提出EQPO分层强化学习方法，通过自适应重加权样本促进异质临床人群的均衡学习，在7个诊断基准上降低F1标准差43.9%，缩小预测公平差距27.2%。

Comments Accepted as Oral on NeurIPS 2025 GenAI4Health Workshop

详情

AI中文摘要

医疗AI系统展示了令人印象深刻的诊断性能，但它们在不同人口统计群体之间通常表现出不均匀的准确性，使代表性不足的人群处于不利地位。尽管多模态推理基础模型推动了临床诊断的发展，基于强化学习的后训练倾向于吸收并放大多数主导训练语料中存在的偏见。我们提出公平群体相对策略优化（EQPO），一种分层强化学习方法，通过根据子群表示、任务难度和数据来源自适应地重新加权样本，鼓励跨异质临床人群的平衡学习。由于人口统计注释在真实临床数据中经常缺失，EQPO还在不可用时应用无监督聚类来恢复潜在子群。在覆盖5种模态（X射线、CT、皮肤镜、乳腺X线摄影、超声）的7个诊断基准上，EQPO在QoQ-Med3-8B上相比原始GRPO将F1标准差降低43.9%，最大跨群体F1差距降低42.7%，并在MedGemma-4B上将预测公平差距缩小27.2%（相比有偏减轻的RL基线），同时即使没有任何人口统计标签也将F1提高12.5%。检查训练轨迹显示，EQPO在优化过程中稳步提高公平性，而基线方法的公平性随训练进行而下降，并且发现的隐式群体保持稳定并与掩蔽的人口统计属性对齐。我们进一步发布了EquiMedGemma-4B和EquiQoQ-Med3-8B，这两种具有公平意识的临床VLLM在显著缩小人口统计差距的同时达到了最先进的准确性。

英文摘要

Medical AI systems demonstrated impressive diagnostic performance, yet they routinely show uneven accuracy across demographic groups, disadvantaging underrepresented populations. Although multimodal reasoning foundation models have pushed clinical diagnosis forward, reinforcement learning-based post-training tends to absorb and magnify the biases present in majority-dominated training corpora. We propose Equitable Group Relative Policy Optimization (EQPO), a hierarchical reinforcement learning method that encourages balanced learning across heterogeneous clinical populations by adaptively reweighting samples according to subgroup representation, task difficulty, and data source. As demographic annotations are frequently missing in real-world clinical data, EQPO additionally applies unsupervised clustering to recover latent subpopulations when they are unavailable. On 7 diagnostic benchmarks covering 5 modalities (X-ray, CT, dermoscopy, mammography, ultrasound), EQPO reduces F1 standard deviation by 43.9% and the maximum cross-group F1 gap by 42.7% on QoQ-Med3-8B over vanilla GRPO, and narrows predictive parity gaps by 27.2% on MedGemma-4B over bias-mitigated RL baselines while raising F1 by 12.5% even without any demographic labels. Examining the training trajectory shows that EQPO steadily improves fairness over the course of optimization, in contrast to baseline methods whose fairness degrades as training proceeds, and the discovered implicit groups remain stable and align with masked demographic attributes. We further release EquiMedGemma-4B and EquiQoQ-Med3-8B, equitability-aware clinical VLLMs that attain state-of-the-art accuracy with markedly smaller demographic gaps.
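
The two disparity measures quoted above (F1 standard deviation and maximum cross-group F1 gap) can be illustrated with a small sketch. It assumes binary labels, per-group F1, and the population standard deviation; the abstract does not state which variant the authors use, and all names and data here are hypothetical:

```python
from statistics import pstdev

def f1_score(y_true, y_pred):
    """Binary F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def group_f1_disparity(y_true, y_pred, groups):
    """Return per-group F1, its standard deviation, and the max-min gap."""
    per_group = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        per_group[g] = f1_score([y_true[i] for i in idx],
                                [y_pred[i] for i in idx])
    scores = list(per_group.values())
    return per_group, pstdev(scores), max(scores) - min(scores)

# Toy data: group A is classified perfectly, group B is not.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]
groups = ["A"] * 4 + ["B"] * 4
per_group, f1_std, f1_gap = group_f1_disparity(y_true, y_pred, groups)
```

A method that narrows demographic gaps should shrink both `f1_std` and `f1_gap` without lowering the per-group scores.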

2510.21978 2026-06-19 cs.LG cs.AI 版本更新

Beyond Reasoning Gains: Mitigating General-Capability Forgetting in Large Reasoning Models

超越推理增益：缓解大型推理模型中的通用能力遗忘

Hoang Phan, Xianjun Yang, Yuanshun Yao, Jingyu Zhang, Shengjie Bi, Xiaocheng Tang, Madian Khabsa, Lijuan Liu, Deren Lei

发表机构 * Meta Superintelligence Labs（Meta超智能实验室）； New York University（纽约大学）； Johns Hopkins University（约翰霍普金斯大学）

AI总结针对强化学习训练导致推理模型遗忘基础能力的问题，提出RECAP重放策略，通过动态目标重加权在线调整训练重点，在保持通用能力的同时提升推理性能。

详情

AI中文摘要

基于可验证奖励的强化学习（RLVR）在数学和多模态推理方面取得了显著进展，并已成为当代语言和视觉-语言模型的标准后训练范式。然而，RLVR方法引入了能力退化的重大风险，即模型在长时间训练后，若未采用正则化策略，会遗忘基础技能。我们通过实验证实了这一担忧，观察到开源推理模型在感知和忠实性等核心能力上出现性能下降。虽然施加KL散度等正则化项有助于防止偏离基础模型，但这些项是在当前任务上计算的，因此不能保证保留更广泛的知识。同时，跨异构领域的经验回放使得决定每个目标应获得多少训练权重变得困难。为解决这一问题，我们提出RECAP——一种具有动态目标重加权的重放策略，用于通用知识保留。我们的重加权机制利用短期收敛和不稳定信号在线自适应，将后训练焦点从饱和目标转移到表现不佳或不稳定的目标。我们的方法是端到端的，可直接应用于现有RLVR流程，无需训练额外模型或进行繁重调优。在Qwen2.5-VL-3B和Qwen2.5-VL-7B上的广泛实验证明了我们方法的有效性，该方法不仅保留了通用能力，还通过实现任务内奖励的更灵活权衡提升了推理性能。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) has delivered impressive gains in mathematical and multimodal reasoning and has become a standard post-training paradigm for contemporary language and vision-language models. However, the RLVR recipe introduces a significant risk of capability regression, in which models forget foundational skills after prolonged training without employing regularization strategies. We empirically confirm this concern, observing that open-source reasoning models suffer performance degradation on core capabilities such as perception and faithfulness. While imposing regularization terms like KL divergence can help prevent deviation from the base model, these terms are computed on the current task and therefore do not guarantee preservation of broader knowledge. Meanwhile, commonly used experience replay across heterogeneous domains makes it nontrivial to decide how much training emphasis each objective should receive. To address this, we propose RECAP, a replay strategy with dynamic objective reweighting for general knowledge preservation. Our reweighting mechanism adapts online using short-horizon signals of convergence and instability, shifting the post-training focus away from saturated objectives and toward underperforming or volatile ones. Our method is end-to-end and readily applicable to existing RLVR pipelines without training additional models or heavy tuning. Extensive experiments on benchmarks using Qwen2.5-VL-3B and Qwen2.5-VL-7B demonstrate the effectiveness of our method, which not only preserves general capabilities but also improves reasoning by enabling more flexible trade-offs among in-task rewards.

2601.22970 2026-06-19 cs.LG cs.AI 版本更新

Stabilizing the Q-Gradient Field for Policy Smoothness in Actor-Critic Methods

稳定Q-梯度场以实现Actor-Critic方法中的策略平滑性

Jeong Woon Lee, Kyoleen Kwak, Daeho Kim, Hyoseok Hwang

发表机构 * College of Software, Kyung Hee University（韩国庆熙大学软件学院）

AI总结针对连续动作空间中actor-critic方法策略振荡问题，提出基于评论家微分几何的PAVE框架，通过稳定Q-梯度场实现策略平滑，无需修改actor。

详情

AI中文摘要

通过连续actor-critic方法学习的策略通常表现出不稳定的高频振荡，使其不适合物理部署。当前方法试图通过直接正则化策略输出来强制平滑性。我们认为这种方法治标不治本。在这项工作中，我们从理论上建立了策略非平滑性根本上由评论家的微分几何决定。通过对actor-critic目标应用隐式微分，我们证明了最优策略的敏感性受限于Q函数的混合偏导数（噪声敏感性）与其动作空间曲率（信号区分度）之比。为了实证验证这一理论见解，我们引入了PAVE（策略感知值场均衡），一种以评论家为中心的正则化框架，将评论家视为标量场并稳定其诱导的动作梯度场。PAVE通过最小化Q-梯度波动同时保持局部曲率来修正学习信号。实验结果表明，PAVE在不修改actor的情况下，实现了与策略侧平滑正则化方法相当的平滑性，同时保持了有竞争力的任务性能。

英文摘要

Policies learned via continuous actor-critic methods often exhibit erratic, high-frequency oscillations, making them unsuitable for physical deployment. Current approaches attempt to enforce smoothness by directly regularizing the policy's output. We argue that this approach treats the symptom rather than the cause. In this work, we theoretically establish that policy non-smoothness is fundamentally governed by the differential geometry of the critic. By applying implicit differentiation to the actor-critic objective, we prove that the sensitivity of the optimal policy is bounded by the ratio of the Q-function's mixed-partial derivative (noise sensitivity) to its action-space curvature (signal distinctness). To empirically validate this theoretical insight, we introduce PAVE (Policy-Aware Value-field Equalization), a critic-centric regularization framework that treats the critic as a scalar field and stabilizes its induced action-gradient field. PAVE rectifies the learning signal by minimizing the Q-gradient volatility while preserving local curvature. Experimental results demonstrate that PAVE achieves smoothness comparable to policy-side smoothness regularization methods, while maintaining competitive task performance, without modifying the actor.
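
The ratio in the sensitivity bound can be motivated by a standard implicit-function argument. The following is a sketch under assumptions not stated in the abstract (a deterministic policy $\pi^*$ that satisfies the first-order optimality condition of a twice-differentiable $Q$, with an invertible action Hessian); it is not the paper's exact theorem:

```latex
% First-order optimality of the greedy action:
\nabla_a Q(s, a)\big|_{a = \pi^*(s)} = 0 .
% Differentiating this identity with respect to s (chain rule):
\nabla^2_{as} Q\big(s, \pi^*(s)\big)
  + \nabla^2_{aa} Q\big(s, \pi^*(s)\big)\,
    \frac{\partial \pi^*}{\partial s}(s) = 0 ,
% so the policy Jacobian is
\frac{\partial \pi^*}{\partial s}(s)
  = -\big[\nabla^2_{aa} Q\big]^{-1} \nabla^2_{as} Q ,
% and its operator norm is bounded by the ratio
\left\| \frac{\partial \pi^*}{\partial s}(s) \right\|
  \le \frac{\big\|\nabla^2_{as} Q\big\|}
           {\sigma_{\min}\!\big(\nabla^2_{aa} Q\big)} .
```

Here $\nabla^2_{as} Q$ denotes the matrix of mixed partial derivatives $\partial^2 Q / \partial a_i \partial s_j$ and $\sigma_{\min}$ the smallest singular value of the action Hessian. A large mixed-partial term or a flat action-space curvature both permit a rapidly varying policy, which is consistent with the abstract's reading of the numerator as noise sensitivity and the denominator as signal distinctness.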

2602.04037 2026-06-19 cs.LG cs.RO 版本更新

DADP: Domain Adaptive Diffusion Policy

DADP: 领域自适应扩散策略

Pengcheng Wang, Qinghang Liu, Haotian Lin, Yiheng Li, Guojian Zhan, Masayoshi Tomizuka, Yixiao Wang

发表机构 * University of California, Berkeley, California, USA（加州大学伯克利分校）； Peking University, Beijing, China（北京大学）； Tsinghua University, Beijing, China（清华大学）

AI总结提出DADP，通过无监督解耦和领域感知扩散注入，实现跨动态环境的鲁棒零样本适应，在运动与操控任务上超越先前方法。

详情

AI中文摘要

学习能够泛化到未见过的转移动态的领域自适应策略，仍然是基于学习的控制中的一个基本挑战。通过领域表示学习来捕获领域特定信息，从而实现领域感知决策，已经取得了实质性进展。我们分析了通过动态预测学习领域表示的过程，发现选择与当前步骤相邻的上下文会导致学习到的表示将静态领域信息与变化的动态属性纠缠在一起。这种混合可能会混淆条件策略，从而限制零样本适应。为了应对这一挑战，我们提出了DADP（领域自适应扩散策略），通过无监督解耦和领域感知扩散注入实现鲁棒适应。首先，我们引入了滞后上下文动态预测，这是一种将未来状态估计条件化在历史偏移上下文上的策略；通过增加这个时间间隔，我们通过过滤掉瞬态属性来无监督地解耦静态领域表示。其次，我们通过偏置先验分布和重新制定扩散目标，将学习到的领域表示直接集成到生成过程中。在涉及运动和操控的具有挑战性的基准测试上的大量实验表明，DADP相对于先前方法具有优越的性能和泛化能力。更多可视化结果见：https://outsider86.github.io/DomainAdaptiveDiffusionPolicy/

英文摘要

Learning domain-adaptive policies that can generalize to unseen transition dynamics remains a fundamental challenge in learning-based control. Substantial progress has been made through domain representation learning to capture domain-specific information, thus enabling domain-aware decision making. We analyze the process of learning domain representations through dynamical prediction and find that selecting contexts adjacent to the current step causes the learned representations to entangle static domain information with varying dynamical properties. Such a mixture can confuse the conditioned policy, thereby constraining zero-shot adaptation. To tackle the challenge, we propose DADP (Domain Adaptive Diffusion Policy), which achieves robust adaptation through unsupervised disentanglement and domain-aware diffusion injection. First, we introduce Lagged Context Dynamical Prediction, a strategy that conditions future state estimation on a historical offset context; by increasing this temporal gap, we disentangle static domain representations in an unsupervised manner by filtering out transient properties. Second, we integrate the learned domain representations directly into the generative process by biasing the prior distribution and reformulating the diffusion target. Extensive experiments on challenging benchmarks across locomotion and manipulation demonstrate the superior performance and generalizability of DADP over prior methods. More visualization results are available at https://outsider86.github.io/DomainAdaptiveDiffusionPolicy/.

2602.17315 2026-06-19 cs.LG cs.AI 版本更新

Flickering Multi-Armed Bandits

闪烁多臂老虎机

Sourav Chakraborty, Amit Kiran Rege, Claire Monteleoni, Lijun Chen

发表机构 * University of Colorado Boulder（科罗拉多大学博尔德分校）； INRIA Paris（巴黎国家信息与自动化研究所）

AI总结提出闪烁多臂老虎机模型，通过随机图约束动作可用性，设计两阶段懒惰随机游走算法实现次线性遗憾界，并证明信息论下界的最优性。

强化学习基础模型本应已经存在

Abdelrahman Zighem, Jill-Jênn Vie

发表机构 * École normale supérieure de Paris, PSL University, Paris, France（巴黎高等师范学院，PSL大学，法国巴黎）； Soda team, Inria Saclay, Palaiseau, France（Soda团队，法国国家信息与自动化研究所萨克雷中心，法国帕莱索）

AI总结提出通过合成MDP构建强化学习基础模型，利用固定大小的充分统计量使注意力架构适用，在线和离线实验均优于传统算法。

详情

AI中文摘要

语言和视觉的基础模型由互联网规模的数据驱动，而结构化领域（表格预测、时间序列预测、图学习、强化学习）则不然。替代方案是合成数据，它将负担从收集转移到先验设计。这种先验已经存在于许多结构化任务中：TabPFN及其后续工作通过一个在合成贝叶斯先验上预训练的Transformer解决表格分类问题。我们提出两点。首先，强化学习是明显的空白：采样一个合成MDP与采样一个合成表格数据集一样可行，然而没有上下文强化学习工作将先验设计作为主要目标。其次，MDP允许一个固定大小的充分统计量，独立于观察到的回合且形状为表格形式，这使得它们直接适用于用于表格基础模型的基于注意力的架构，只需将策略头替换监督目标。这些共同定义了强化学习基础模型的议程。作为概念验证，我们完全在合成MDP上训练一个模型，并表明，无需任务特定的调优，它就能在上下文中解决留出的表格基准，包括在线和离线：在线时，使用比UCB-VI和表格Q-learning少得多的回合；离线时，与VI-LCB竞争。

英文摘要

Foundation models for language and vision are powered by internet-scale data, while structured domains such as tabular prediction are powered by synthetic data. This substitute shifts the challenge from collection to prior design. Such priors already exist for many structured tasks: TabPFN and its successors solve tabular classification with a transformer pretrained on a synthetic Bayesian prior. We make two points. First, reinforcement learning is the conspicuous gap: sampling a synthetic MDP is as feasible as sampling a synthetic tabular dataset, yet no in-context RL work treats prior design as a primary objective. Second, MDPs admit a fixed-size sufficient statistic, independent of the episodes observed and tabular in shape, which makes them directly amenable to the attention-based architectures used for tabular foundation models, with a policy head replacing the supervised target. Together these define the agenda for an RL foundation model. As a proof of concept, we train a Graph Attention Network entirely on synthetic MDPs and show that, with no task-specific tuning, it solves held-out tabular benchmarks in context, both online and offline: online, in far fewer episodes than UCB-VI and tabular Q-learning, and offline, competitively with VI-LCB.
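
To make the "fixed-size sufficient statistic" concrete, here is a minimal sketch. It assumes the statistic consists of transition counts and reward sums per (state, action) pair; the abstract does not specify the exact statistic, and reward sums are sufficient only under particular reward models. The class name and table layout are hypothetical:

```python
import numpy as np

class TabularMDPStatistic:
    """Transition counts and reward sums for a tabular MDP.

    The storage size depends only on the number of states and actions,
    not on how many episodes have been observed.
    """

    def __init__(self, n_states, n_actions):
        self.counts = np.zeros((n_states, n_actions, n_states), dtype=np.int64)
        self.reward_sums = np.zeros((n_states, n_actions))

    def update(self, s, a, r, s_next):
        self.counts[s, a, s_next] += 1
        self.reward_sums[s, a] += r

    def as_table(self):
        """One row per (state, action): visits, mean reward, next-state frequencies."""
        visits = self.counts.sum(axis=2)
        safe = np.maximum(visits, 1)            # avoid division by zero for unvisited pairs
        mean_reward = self.reward_sums / safe
        transition = self.counts / safe[..., None]
        rows = np.concatenate(
            [visits[..., None], mean_reward[..., None], transition], axis=2
        )
        return rows.reshape(-1, rows.shape[-1])

stat = TabularMDPStatistic(n_states=3, n_actions=2)
for s, a, r, s_next in [(0, 1, 1.0, 2), (0, 1, 0.0, 2), (2, 0, 0.5, 1)]:
    stat.update(s, a, r, s_next)
table = stat.as_table()   # shape (n_states * n_actions, n_states + 2)
```

Each row of `table` could serve as one token for an attention-based model, which is one plausible reading of "tabular in shape".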

2505.18201 2026-06-19 cs.RO cs.LG 版本更新

Reinforcement Twinning for Hybrid Control of Flapping-Wing Drones

强化孪生用于扑翼无人机的混合控制

Romain Poletti, Lorenzo Schena, Lilla Koloszar, Joris Degroote, Miguel Alfonso Mendez

发表机构 * Environmental and Applied Fluid Dynamics, von Karman Institute for Fluid Dynamics（冯·卡门流体动力学研究所环境与应用流体动力学部）； Department of Mechanical Engineering, Vrije Universiteit Brussel（荷语布鲁塞尔自由大学机械工程系）； Department of Electromechanical, Systems and Metal Engineering, Ghent University（根特大学机电、系统与金属工程系）； Aero-Thermo-Mechanics Laboratory, École Polytechnique de Bruxelles, Université Libre de Bruxelles（法语布鲁塞尔自由大学布鲁塞尔理工学院航空-热-力学实验室）； Experimental Aerodynamics and Propulsion Lab, Universidad Carlos III de Madrid（马德里卡洛斯三世大学实验空气动力学与推进实验室）

AI总结提出一种混合无模型/基于模型的扑翼无人机控制方法，通过强化孪生算法结合强化学习与自适应数字孪生，利用迁移学习和策略裁判提升样本效率与控制鲁棒性。

详情

AI中文摘要

控制扑翼无人机需要能够处理来自不完整、有噪声传感器数据的时变、非线性、欠驱动动力学的控制器。人工智能的最新进展，特别是强化学习，通过从环境交互中进行数据驱动的策略优化，为解决此类复杂控制问题开辟了新视角。然而，纯数据驱动方法样本效率低，需要大量甚至不安全的探索，尤其是在缺乏引导物理模型的情况下。这激发了混合人工智能-物理框架。本文提出了一种使用强化孪生算法的混合无模型/基于模型的飞行控制方法。基于模型的组件使用伴随公式和从实时轨迹中连续识别的自适应数字孪生；无模型组件使用强化学习。两个智能体通过迁移学习、模仿学习以及真实环境与数字孪生之间的共享经验来共享知识，并由一个策略裁判协调，该裁判根据数字孪生性能和真实到虚拟一致性比率选择哪个智能体在现实中行动。该框架针对扑翼无人机的纵向控制进行了评估，该无人机被建模为由准稳态气动力驱动的非线性时变系统。混合策略在三种自适应模型初始化下进行了测试：（1）从现有数据进行离线识别，（2）随机初始化并进行完全在线识别，以及（3）使用有偏参数进行离线预训练，然后进行在线自适应。在所有情况下，混合框架在性能、鲁棒性和样本效率方面均优于纯无模型和纯基于模型的方法。

英文摘要

Controlling flapping-wing drones requires controllers that handle time-varying, nonlinear, underactuated dynamics from incomplete, noisy sensor data. Recent advances in artificial intelligence (AI), particularly reinforcement learning (RL), have opened new perspectives for addressing such complex control problems through data-driven policy optimization from interaction with the environment. Yet purely data-driven methods are sample-inefficient, demanding extensive, sometimes unsafe exploration, especially without guiding physical models. This motivates hybrid AI-physics frameworks. This article proposes a hybrid model-free/model-based flight-control approach using the reinforcement twinning algorithm. The model-based (MB) component uses an adjoint formulation and an adaptive digital twin continuously identified from live trajectories; the model-free (MF) component uses RL. The two agents share knowledge via transfer learning, imitation learning, and shared experience between the real environment and the digital twin, coordinated by a policy referee that selects which agent acts in reality based on digital-twin performance and a real-to-virtual consistency ratio. The framework is evaluated for the longitudinal control of a flapping-wing drone, modelled as a nonlinear time-varying system driven by quasi-steady aerodynamic forces. The hybrid strategy is tested under three adaptive-model initializations: (1) offline identification from existing data, (2) random initialization with fully online identification, and (3) offline pre-training with biased parameters followed by online adaptation. In all cases, the hybrid framework improves performance, robustness, and sample efficiency over purely model-free and purely model-based approaches.

2507.19712 2026-06-19 cs.DC cs.AI cs.GT cs.LG cs.NI 版本更新

Oranits: Mission Assignment and Task Offloading in Open RAN-based ITS using Metaheuristic and Deep Reinforcement Learning

Oranits: 基于Open RAN的智能交通系统中的任务分配与卸载——元启发式与深度强化学习方法

Ngoc Hung Nguyen, Nguyen Van Thieu, Quang-Trung Luu, Anh Tuan Nguyen, Senura Wanasekara, Nguyen Cong Luong, Fatemeh Kavehmadavani, Van-Dinh Nguyen

发表机构 * Department of Smart City, Hanyang University（汉阳大学智能城市系）

AI总结提出Oranits系统模型，通过元启发式算法CGG-ARO和深度强化学习框架MA-DDQN优化车辆协作中的任务依赖与卸载成本，二者分别将总体收益提升约7.7%和12.5%。

Comments 16 pages, 13 figures

Journal ref IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2026

详情

AI中文摘要

本文研究了基于开放无线接入网（Open RAN）的智能交通系统（ITS）中的任务分配与卸载问题，其中自动驾驶车辆利用移动边缘计算进行高效处理。现有研究常忽视任务之间的复杂依赖关系以及将任务卸载到边缘服务器的成本，导致决策次优。为弥补这一不足，我们引入了Oranits，一种新颖的系统模型，明确考虑了任务依赖性和卸载成本，同时通过车辆协作优化性能。为此，我们提出了一种双重优化方法。首先，我们开发了一种基于元启发式的进化计算算法，即混沌高斯全局ARO（CGG-ARO），作为单时隙优化的基线。其次，我们设计了一种增强的基于奖励的深度强化学习（DRL）框架，称为多智能体双深度Q网络（MA-DDQN），该框架集成了多智能体协调和多动作选择机制，显著减少了任务分配时间并提高了对基线方法的适应性。大量仿真表明，CGG-ARO将完成任务数量和总体收益分别提高了约7.1%和7.7%。同时，MA-DDQN在任务完成率和总体收益方面分别实现了11.0%和12.5%的更大提升。这些结果凸显了Oranits在动态ITS环境中实现更快、更自适应、更高效任务处理的有效性。

英文摘要

In this paper, we explore mission assignment and task offloading in an Open Radio Access Network (Open RAN)-based intelligent transportation system (ITS), where autonomous vehicles leverage mobile edge computing for efficient processing. Existing studies often overlook the intricate interdependencies between missions and the costs associated with offloading tasks to edge servers, leading to suboptimal decision-making. To bridge this gap, we introduce Oranits, a novel system model that explicitly accounts for mission dependencies and offloading costs while optimizing performance through vehicle cooperation. To achieve this, we propose a twofold optimization approach. First, we develop a metaheuristic-based evolutionary computing algorithm, namely the Chaotic Gaussian-based Global ARO (CGG-ARO), serving as a baseline for one-slot optimization. Second, we design an enhanced reward-based deep reinforcement learning (DRL) framework, referred to as the Multi-agent Double Deep Q-Network (MA-DDQN), that integrates both multi-agent coordination and multi-action selection mechanisms, significantly reducing mission assignment time and improving adaptability over baseline methods. Extensive simulations reveal that CGG-ARO improves the number of completed missions and overall benefit by approximately 7.1% and 7.7%, respectively. Meanwhile, MA-DDQN achieves even greater improvements of 11.0% in terms of mission completions and 12.5% in terms of the overall benefit. These results highlight the effectiveness of Oranits in enabling faster, more adaptive, and more efficient task processing in dynamic ITS environments.

2605.00457 2026-06-19 cs.NI cs.LG cs.SY eess.SY 版本更新

Utility-Aware DRL-Based TXOP Adaptation for NR-U and Wi-Fi Coexistence Networks

面向NR-U与Wi-Fi共存网络的效用感知DRL的TXOP自适应

Po-Heng Chou, Yi-Fang Yu, Shou-Yu Chen, Chiapin Wang

发表机构 * Research Center for Information Technology Innovation (CITI), Academia Sinica (AS)（中央研究院资讯科技创新研究中心）； Department of Electrical Engineering, National Taiwan Normal University (NTNU)（国立台湾师范大学电机工程学系）

AI总结针对NR-U与Wi-Fi在非授权频谱共存中的频谱利用不平衡问题，提出一种基于策略驱动的深度强化学习框架，通过奖励设计实现公平性、吞吐量和效用的灵活权衡控制。

Comments 15 pages, 13 figures, 2 tables, submitted to IEEE Open Journal of the Communications Society

详情

AI中文摘要

NR-U与Wi-Fi在非授权频谱中的共存引入了一个具有挑战性的共存管理问题，其中异构信道接入机制导致频谱利用的显著不平衡和Wi-Fi性能下降。为了解决这一挑战，我们提出了一种基于策略驱动的深度强化学习（DRL）框架，用于自适应传输机会（TXOP）控制，其中共存过程被建模为马尔可夫决策过程（MDP），深度Q网络（DQN）通过在线交互学习控制策略。一个关键贡献是通过奖励设计引入策略层，从而实现对公平性、吞吐量和效用之间共存权衡的显式控制。开发了三种策略，即绝对公平、适度公平和基于效用的公平，以实现不同的工作点。仿真结果表明，所提出的框架在严格公平控制下实现了高于0.9的Jain公平指数。与绝对公平相比，适度公平将总吞吐量提高了68.22%，而基于效用的策略进一步将效用提高了177.6%。这些结果表明，策略驱动控制为管理异构共存网络中的权衡提供了一种灵活有效的解决方案。

英文摘要

The coexistence of NR-U and Wi-Fi in the unlicensed spectrum introduces a challenging resource management problem, where heterogeneous channel access mechanisms can lead to unbalanced spectrum utilization and severe Wi-Fi performance degradation. To address this issue, this paper proposes a utility-aware deep reinforcement learning (DRL) framework for adaptive transmission opportunity (TXOP) control in NR-U/Wi-Fi coexistence networks. The coexistence process is formulated as a Markov decision process (MDP), in which the NR-U TXOP duration is treated as a controllable variable for regulating post-access channel occupancy. A deep Q-network (DQN) is then employed to learn adaptive TXOP control policies through online interaction with the coexistence environment. A key feature of the proposed framework is the integration of a configurable reward and criterion design, which enables explicit control of the fairness-efficiency-utility tradeoff. Three operating policies are developed, namely absolute fairness, moderate fairness, and utility-oriented moderate fairness, to characterize different coexistence operating points. Simulation results show that the proposed framework achieves a Jain fairness index above 0.9 under strict fairness control. Compared with the absolute fairness policy, the moderate fairness policy improves aggregate throughput by 68.22%, while the utility-oriented policy achieves a 177.6% improvement under the adopted utility evaluation metric. These results demonstrate that the proposed utility-aware DRL framework provides an effective and flexible solution for adaptive TXOP control and tradeoff management in heterogeneous unlicensed coexistence networks.
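
The Jain fairness index quoted above has a standard closed form, J(x) = (Σx)² / (n·Σx²). A small sketch follows; the throughput values are made up for illustration and are not results from the paper:

```python
def jain_fairness_index(throughputs):
    """Jain's index: (sum x)^2 / (n * sum x^2).

    Equals 1.0 when all users get the same throughput and 1/n when a
    single user gets everything.
    """
    values = [float(x) for x in throughputs]
    if not values or any(v < 0 for v in values):
        raise ValueError("throughputs must be a non-empty list of non-negative numbers")
    sum_sq = sum(v * v for v in values)
    if sum_sq == 0:
        raise ValueError("index is undefined when every throughput is zero")
    return sum(values) ** 2 / (len(values) * sum_sq)

# Hypothetical NR-U / Wi-Fi throughput splits in Mbit/s.
balanced = jain_fairness_index([50, 50])
skewed = jain_fairness_index([80, 20])
starved = jain_fairness_index([1, 0, 0, 0])
```

With two coexisting systems, an index above 0.9 corresponds to a fairly even split; an 80/20 split already drops to roughly 0.74.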

2605.22748 2026-06-19 cs.RO cs.AI cs.LG cs.MA 版本更新

Superhuman Safe and Agile Racing through Multi-Agent Reinforcement Learning

通过多智能体强化学习实现超人类安全且敏捷的赛车

Ismail Geles, Leonard Bauersfeld, Markus Wulfmeier, Davide Scaramuzza

发表机构 * Robotics and Perception Group, University of Zurich（苏黎世大学机器人与感知组）； Google DeepMind（谷歌DeepMind）； Nomagic

AI总结本文提出通过多智能体强化学习在高速四旋翼赛车中实现安全且敏捷的性能，展示了多智能体交互对真实世界交互安全性的关键作用，同时在高速赛车中超越人类飞行员并减少碰撞率。

Comments 12 pages (+4 supplementary). Website: https://rpg.ifi.uzh.ch/marl

详情

AI中文摘要

自主系统在孤立或模拟环境中已实现超人类性能，但在共享、动态的真实世界空间中仍显得脆弱。这种失败源于物理应用中占主导地位的单智能体范式：其他参与者被忽略或被视为环境噪声，从而阻碍了有效协调。本文证明，多智能体强化学习为真实世界交互提供了必不可少的安全支架。我们以高速四旋翼竞速作为高风险测试平台，训练智能体在参赛者数量可变的情况下应对复杂的空气动力学相互作用和策略性机动。通过基于联赛的自我对弈，智能体演化出复杂的预判行为，包括主动避碰、超车以及处理包括气动下洗在内的多智能体物理交互。我们的智能体在速度超过22米/秒的多人竞速中胜过冠军级人类飞行员，同时与最先进的单智能体基线相比碰撞率降低了50%。关键的是，使用多样化的人工智能体进行训练，能够零样本泛化到更安全的人机交互。这些结果表明，实现稳健的机器人共存的路径不在于孤立的安全约束，而在于多智能体交互的严格要求。多媒体材料见：https://rpg.ifi.uzh.ch/marl

英文摘要

Autonomous systems have achieved superhuman performance in isolation or simulation, yet they remain brittle in shared, dynamic real-world spaces. This failure stems from the dominant single-agent paradigm for physical applications, where other actors are ignored or treated as environmental noise, preventing effective coordination. Here we show that multi-agent reinforcement learning provides the essential safety scaffolding required for real-world interaction. Using high-speed quadrotor racing as a high-stakes testbed, we train agents to navigate complex aerodynamic interactions and strategic maneuvering with a variable number of racers. Through league-based self-play, agents evolve sophisticated anticipatory behaviors, including proactive collision avoidance, overtaking, and handling multi-agent physical interactions, including aerodynamic downwash. Our agents outperform a champion-level human pilot in multi-player races at speeds exceeding 22 m/s, while simultaneously reducing collision rates by 50 % compared to state-of-the-art single-agent baselines. Crucially, training with diverse artificial agents enables zero-shot generalization to safer human interaction. These results suggest that the path to robust robotic co-existence lies not in isolated safety constraints, but in the rigorous demands of multi-agent interaction. Multimedia materials are available at: https://rpg.ifi.uzh.ch/marl

2507.05169 2026-06-19 cs.LG cs.AI cs.CL cs.CV cs.RO 版本更新

Critique of World Model

世界模型批判：一种用于世界建模的生成式潜在预测架构

Eric Xing, Mingkai Deng, Jinyu Hou

AI总结本文从心理学“假设性思维”出发，提出世界模型的核心目标是模拟真实世界的所有可行动可能性，并设计了一种基于状态化、分层、多级、混合连续/离散表示的生成式潜在预测（GLP）架构。

详情

AI中文摘要

AI中文摘要

训练数据的选择如何影响AI模型？这个广泛的问题对于可解释性、隐私和基础科学至关重要。其技术核心是数据删除问题：在合理的预计算量之后，快速预测如果从学习算法中排除给定训练数据子集，模型在给定情况下的行为。我们提出了一种数据删除方案，能够在深度学习设置中以趋于零的误差$\varepsilon$和失败概率$\delta$预测模型输出。我们的预计算和预测算法分别仅比常规训练和推理慢$\tilde{O}(\log(1/\delta)/\varepsilon^2)$因子。存储需求为$\tilde{O}(\log(1/\delta)/\varepsilon^2)$个模型。我们的证明基于一个称为稳定性的假设。与先前工作所做的假设相比，稳定性似乎与学习强大AI模型完全兼容。为支持这一点，我们展示了稳定性在microgpt的最小实验集中得到满足。我们的代码可在https://github.com/SamSpo1/microgpt-sketch获取。在技术层面，我们的工作基于一种新方法，通过计算随机复方向上的高阶导数对算术电路进行局部草图化（sketching）。前向模式自动微分允许廉价计算这些导数。

英文摘要

How does the choice of training data influence an AI model? This broad question is of central importance to interpretability, privacy, and basic science. At its technical core is the data deletion problem: after a reasonable amount of precomputation, quickly predict how the model would behave in a given situation if a given subset of training data had been excluded from the learning algorithm. We present a data deletion scheme capable of predicting model outputs with vanishing error $\varepsilon$ and failure probability $\delta$ in the deep learning setting. Our precomputation and prediction algorithms are only $\tilde{O}(\log(1/\delta)/\varepsilon^2)$ factors slower than regular training and inference, respectively. The storage requirements are those of $\tilde{O}(\log(1/\delta)/\varepsilon^2)$ models. Our proof is based on an assumption that we call stability. In contrast to the assumptions made by prior work, stability appears to be fully compatible with learning powerful AI models. In support of this, we show that stability is satisfied in a minimal set of experiments with microgpt. Our code is available at https://github.com/SamSpo1/microgpt-sketch. At a technical level, our work is based on a new method for locally sketching an arithmetic circuit by computing higher-order derivatives in random complex directions. Forward-mode automatic differentiation allows cheap computation of these derivatives.

2606.04307 2026-06-19 cs.LG stat.CO stat.ME 版本更新

Folded Transport MCMC: Eliminating Label Switching by Sampling on a Fundamental Domain

折叠传输MCMC：通过在基本域上采样消除标签切换

Jun Hu

发表机构 * Wuhan University of Technology（武汉理工大学）

AI总结针对对称贝叶斯模型中的冗余多峰性导致MCMC收敛诊断退化的问题，提出Folded Transport MCMC方法，通过在对称群的基本域上构建独立采样器直接对商后验进行推断，并利用LCNF振荡认证框架在商度量下提供可证明的认证下界。

Comments 50 pages (including supplementary material), 5 figures, 6 tables. Submitted to Journal of Computational and Graphical Statistics

详情

AI中文摘要

具有有限对称性的贝叶斯模型——如可交换分量的混合模型、具有紧密间隔模态的结构识别——定义的后验在标签置换群下不变，产生冗余的多峰性，从而降低MCMC收敛诊断的质量。我们引入折叠传输MCMC（FolT-MCMC），该方法通过在对称群的基本域上构建独立采样器，直接对商后验进行推断。商提议分布通过对群轨道上学习的归一化流进行对称化得到。我们证明了基于LCNF振荡的认证框架可以迁移到商度量，并具有稳定子修正的球质量界和改进的覆盖半径，并且当未折叠流表现出跨模态提议缺陷时，分位数核心认证下界会得到改善。在高斯混合（d=2-20）、标签切换目标（最多24个等价模态）以及标准贝叶斯三分量混合后验上，分位数核心认证改进比从2倍到145倍不等，且折叠认证经验上几乎与维度无关。在台风山竹期间超高层建筑的真实加速度计数据上，FolT-MCMC产生了非平凡的分位数核心认证，而未折叠认证是平凡的。

英文摘要

In Bayesian mixture models and other exchangeable-component models, the posterior is invariant under permutation of component labels, creating m! equivalent modes; this is the label-switching problem. Standard MCMC methods either mix poorly across these modes or rely on post-hoc relabelling that cannot guarantee the sampler has converged. We propose Folded Transport MCMC (FolT-MCMC), which eliminates label switching before sampling by restricting the Markov chain to a fundamental domain, that is, a sorted or reflected subspace containing exactly one representative from each symmetric mode. The proposal is a learned normalising flow whose density is symmetrised over the group orbits, ensuring correct targeting on the reduced space. We show that this construction preserves a computable convergence diagnostic based on the oscillation of the log-density ratio, and that the diagnostic becomes sharper on the fundamental domain whenever the original-space flow under-covers one or more symmetric modes. Experiments on Gaussian mixtures (d=2-20), label-switching targets (up to 24 equivalent modes), a standard Bayesian three-component mixture posterior, and real accelerometer data from a supertall building show improvement ratios of 2x to 145x, with the folded diagnostic stable across dimensions while the unfolded diagnostic collapses.
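
The idea of a "sorted" fundamental domain can be shown with a toy folding map. This sketch only illustrates the symmetry reduction for a one-dimensional mixture with distinct component means; it is an assumption-laden illustration and does not implement the paper's flow-based sampler or its diagnostic:

```python
from itertools import permutations

import numpy as np

def fold_to_fundamental_domain(weights, means):
    """Map mixture parameters to the representative with increasing means.

    Assumes scalar, pairwise distinct means, so each orbit of the
    permutation group has exactly one sorted representative.
    """
    weights = np.asarray(weights)
    means = np.asarray(means)
    order = np.argsort(means, kind="stable")
    return weights[order], means[order]

weights = np.array([0.2, 0.5, 0.3])
means = np.array([1.5, -2.0, 0.7])

# All 3! = 6 relabellings of the same mixture fold to a single point.
images = set()
for perm in permutations(range(3)):
    idx = list(perm)
    w, m = fold_to_fundamental_domain(weights[idx], means[idx])
    images.add((tuple(w), tuple(m)))
```

Because every relabelling lands on the same representative, a chain restricted to the sorted region sees one mode where the unrestricted posterior has m! copies.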

具有多于 $\sqrt{n}$ 个社区的随机块模型的相变

Alexandra Carpentier, Christophe Giraud, Nicolas Verzelen

发表机构 * Institut für Mathematik – Universität Potsdam, Potsdam, Germany（波茨坦大学数学研究所，德国波茨坦）； Laboratoire de Mathématiques d’Orsay, Université Paris-Saclay, CNRS, France（奥赛数学实验室，巴黎-萨克雷大学，法国国家科学研究中心 CNRS）； INRAE, Institut Agro, MISTEA, Univ. Montpellier, France（法国国家农业食品与环境研究院 INRAE，蒙彼利埃大学，法国）

AI总结本文证明在随机块模型中，当社区数 $K\geq \sqrt{n}$ 时，低度多项式在 Chin 等人提出的阈值以下无法恢复社区，而通过计数特定子图可在多项式时间内实现恢复，支持了新相变阈值的猜想。

详情

AI中文摘要

高维经验风险最小化中高斯普适性破坏的表征

Chiheb Yaakoubi, Cosme Louart, Malik Tiomoko, Zhenyu Liao

发表机构 * School of Data Science, The Chinese University of Hong Kong, Shenzhen, China； Huawei Noah's Ark Lab, Huawei Technologies, Paris, France； School of Electronic Information and Communications, Huazhong University of Science & Technology, China

AI总结通过将凸高斯极小极大定理推广到非高斯数据，刻画了高维经验风险最小化估计量的渐近分布，揭示了高斯普适性的适用范围与局限。

Comments 28 pages, 5 figures, 1 table

Journal ref ICML 2026

详情

AI中文摘要

我们研究了一般非高斯数据设计下的高维凸经验风险最小化（ERM）。通过启发式地将凸高斯极小极大定理（CGMT）扩展到非高斯设置，我们推导出关键统计量的渐近极小极大表征，从而能够近似ERM估计量 $\hat{\theta}$ 的均值 $\mu_{\hat{\theta}}$ 和协方差 $C_{\hat{\theta}}$。具体地，在数据矩阵的集中假设以及损失和正则化子的标准正则性条件下，我们证明：对于独立于训练数据的测试协变量 $x$，投影 $\hat{\theta}^\top x$ 近似遵循 $\mu_{\hat{\theta}}^\top x$ 的一般非高斯分布与一个独立中心高斯变量（方差为 $\mathrm{tr}(C_{\hat{\theta}} \mathbb{E}[xx^\top])$）的卷积。这一结果阐明了ERM高斯普适性的范围和局限。此外，我们证明任何 $\mathcal{C}^2$ 正则化子渐近等价于一个由其零点的Hessian矩阵和 $\mu_{\hat{\theta}}$ 处的梯度唯一确定的二次型。我们提供了跨不同损失和模型的数值模拟，以验证我们的理论预测和定性见解。

英文摘要

We study high-dimensional convex empirical risk minimization (ERM) under general non-Gaussian data designs. By heuristically extending the Convex Gaussian Min-Max Theorem (CGMT) to non-Gaussian settings, we derive an asymptotic min-max characterization of key statistics, enabling approximation of the mean $\mu_{\hat{\theta}}$ and covariance $C_{\hat{\theta}}$ of the ERM estimator $\hat{\theta}$. Specifically, under a concentration assumption on the data matrix and standard regularity conditions on the loss and regularizer, we show that for a test covariate $x$ independent of the training data, the projection $\hat{\theta}^\top x$ approximately follows the convolution of the generally non-Gaussian distribution of $\mu_{\hat{\theta}}^\top x$ with an independent centered Gaussian variable of variance $\mathrm{tr}(C_{\hat{\theta}} \mathbb{E}[xx^\top])$. This result clarifies the scope and limits of Gaussian universality for ERMs. Additionally, we prove that any $\mathcal{C}^2$ regularizer is asymptotically equivalent to a quadratic form determined solely by its Hessian at zero and gradient at $\mu_{\hat{\theta}}$. Numerical simulations across diverse losses and models are provided to validate our theoretical predictions and qualitative insights.

2606.18679 2026-06-19 cs.DS cs.GT cs.LG math.OC 版本更新

Fair Online Resource Allocation

公平在线资源分配

Christopher En, Yuri Faenza, Andrea Lodi, Gonzalo Muñoz

发表机构 * Columbia University, IEOR Department（哥伦比亚大学工业工程与运筹学系）； Cornell Tech（康奈尔科技学院）； Universidad de Chile（智利大学）

AI总结研究在线资源分配中的公平性问题，提出基于对偶镜像下降的算法，在批次内强制执行公平约束，实现亚线性遗憾，并通过难民数据验证了福利与公平的权衡。

Comments 30 pages, 4 figures. To appear in the proceedings of EC 2026

AI中文摘要

我们研究公平在线资源分配问题，其动机源于难民安置和航班调度等应用，其中代理顺序到达并必须分配到容量有限的设施。我们引入一个模型，在资源约束和Lipschitz公平性要求下最大化整体福利，该要求确保同一批次中到达的相似代理获得相似的预期结果。我们首先分析离线问题，证明最优公平分配的价值至少是最优不公平分配的$\Omega(1/\gamma)$倍，其中$\gamma$是公平系数，从而界定了公平的代价。对于在线设置，我们提出一种基于对偶镜像下降的算法，该算法在估计最优对偶变量的同时，在批次内强制执行公平约束。我们证明该算法相对于最优离线流体基准实现了亚线性遗憾。最后，我们使用难民经济项目的真实数据验证了理论结果，展示了算法的性能，并考察了福利最大化与公平执行之间的权衡。

英文摘要

We study the problem of fair online resource allocation, motivated by applications such as refugee resettlement and airline scheduling, where agents arrive sequentially and must be assigned to facilities with limited capacities. We introduce a model that maximizes the overall welfare subject to resource constraints and a Lipschitz fairness requirement, which ensures that similar agents arriving in the same batch receive similar expected outcomes. We first analyze the offline problem, proving that the value of the optimal fair allocation is at least an $\Omega(1/\gamma)$ fraction of the optimal unfair allocation, where $\gamma$ is the fairness coefficient, thereby bounding the price of fairness. For the online setting, we propose an algorithm based on dual mirror descent that enforces fairness constraints within batches while estimating optimal dual variables. We prove that this algorithm achieves sublinear regret relative to the optimal offline fluid benchmark. Finally, we validate our theoretical results using real-world data from the Refugee Economies Programme, demonstrating the algorithm's performance and examining the trade-offs between welfare maximization and fairness enforcement.
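
A minimal sketch of the dual-price idea behind this kind of online allocation, assuming unit demand per agent and a Euclidean mirror map (so the mirror-descent step reduces to projected subgradient descent). The function name, the step size, and the value lists are hypothetical, and the batch-level Lipschitz fairness constraint described in the abstract is deliberately left out, so this is not the paper's algorithm.

```python
def dual_descent_allocate(values, capacities, step=0.1):
    """Assign sequentially arriving agents to capacity-limited facilities.

    values[t][j] is the (hypothetical) welfare of placing agent t at facility j.
    Each agent consumes one unit of capacity at the chosen facility.
    """
    horizon = len(values)
    m = len(capacities)
    rho = [c / horizon for c in capacities]  # average capacity available per arrival
    lam = [0.0] * m                          # dual prices, one per facility
    remaining = list(capacities)
    assignment = []
    for v in values:
        # Pick the facility with the best price-adjusted value; None means unassigned.
        best, best_score = None, 0.0
        for j in range(m):
            score = v[j] - lam[j]
            if remaining[j] > 0 and score > best_score:
                best, best_score = j, score
        assignment.append(best)
        if best is not None:
            remaining[best] -= 1
        # Projected subgradient step: prices rise where usage exceeds the budget rate.
        for j in range(m):
            used = 1.0 if j == best else 0.0
            lam[j] = max(0.0, lam[j] - step * (rho[j] - used))
    return assignment, lam
```

The prices act as a running estimate of how scarce each facility is: a facility that is chosen more often than its budget rate becomes more expensive for later arrivals.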

2510.01565 2026-06-19 cs.LG cs.DC 版本更新

TetriServe: Efficiently Serving Mixed DiT Workloads

TetriServe: 高效服务混合DiT工作负载

Runyu Lu, Shiqi He, Wenxuan Tan, Shenggui Li, Ruofan Wu, Jeff J. Ma, Ang Chen, Mosharaf Chowdhury

发表机构 * University of Michigan（密歇根大学）； University of Wisconsin-Madison（威斯康星大学麦迪逊分校）； Nanyang Technological University（南洋理工大学）

AI总结针对混合分辨率与截止时间的异构DiT工作负载，提出基于步骤级序列并行的TetriServe系统，通过轮次调度与自适应并行度，在不降低图像质量的前提下将SLO达成率最高提升32%。

AI中文摘要

扩散Transformer（DiT）模型通过迭代去噪步骤生成高质量图像，但由于其高计算成本（尤其在大分辨率下），在严格服务级别目标（SLO）下服务这些模型具有挑战性。现有服务系统使用固定程度的序列并行，这对于具有混合分辨率和截止时间的异构工作负载效率低下，导致GPU利用率低和SLO达成率低。在本文中，我们提出步骤级序列并行，根据请求的截止时间动态调整单个请求的并行度。我们提出了TetriServe，一个实现此策略的DiT服务系统，用于高效图像生成。具体来说，TetriServe引入了一种新颖的基于轮次的调度机制，通过（1）将时间离散化为固定轮次以使截止时间感知调度可处理，（2）在步骤级别自适应并行度并最小化GPU小时消耗，以及（3）联合打包请求以最小化延迟完成，从而提高SLO达成率。对最先进的DiT模型进行的广泛评估表明，与现有解决方案相比，TetriServe在不降低图像质量的情况下实现了高达32%的SLO达成率提升。

英文摘要

Diffusion Transformer (DiT) models excel at generating high-quality images through iterative denoising steps, but serving them under strict Service Level Objectives (SLOs) is challenging due to their high computational cost, particularly at larger resolutions. Existing serving systems use fixed-degree sequence parallelism, which is inefficient for heterogeneous workloads with mixed resolutions and deadlines, leading to poor GPU utilization and low SLO attainment. In this paper, we propose step-level sequence parallelism to dynamically adjust the degree of parallelism of individual requests according to their deadlines. We present TetriServe, a DiT serving system that implements this strategy for highly efficient image generation. Specifically, TetriServe introduces a novel round-based scheduling mechanism that improves SLO attainment by (1) discretizing time into fixed rounds to make deadline-aware scheduling tractable, (2) adapting parallelism at the step level and minimizing GPU hour consumption, and (3) jointly packing requests to minimize late completions. Extensive evaluation on state-of-the-art DiT models shows that TetriServe achieves up to 32% higher SLO attainment compared to existing solutions without degrading image quality.
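
The trade-off behind deadline-aware parallelism can be illustrated with a small per-request sketch: among the parallelism degrees that still meet the deadline, choose the one that consumes the fewest GPU-seconds. The latency profile and the function name are hypothetical, and the round-based scheduling and joint request packing described in the abstract are not modeled here.

```python
def pick_degree(step_latency, steps_left, time_left):
    """Choose a sequence-parallel degree for the remaining denoising steps.

    step_latency maps a degree (number of GPUs) to an assumed per-step latency
    in seconds. Returns the feasible degree with the lowest GPU-seconds cost.
    """
    feasible = [
        (degree * latency * steps_left, degree)  # (GPU-seconds, degree)
        for degree, latency in step_latency.items()
        if latency * steps_left <= time_left
    ]
    if not feasible:
        # No degree meets the deadline: fall back to the lowest-latency option.
        return min(step_latency, key=step_latency.get)
    return min(feasible)[1]
```

With an assumed profile such as `{1: 1.0, 2: 0.6, 4: 0.4}`, a loose deadline favors a single GPU because parallel scaling is sublinear, while a tight deadline forces a higher degree at a higher GPU-time cost.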

2510.18784 2026-06-19 cs.LG 版本更新

CAGE: Curvature-Aware Gradient Estimation For Accurate Quantization-Aware Training

CAGE: 曲率感知梯度估计用于精确的量化感知训练

Soroush Tabesh, Mher Safaryan, Andrei Panferov, Alexandra Volkova, Dan Alistarh

发表机构 * Anonymous Authors（匿名作者）

AI总结提出CAGE方法，通过曲率感知校正项改进直通估计器，平衡损失最小化与量化约束，在平滑非凸设置下提供收敛保证，显著提升低比特量化感知训练的精度。

Comments Accepted at MLSys 2026 (Oral). To appear in Proceedings of Machine Learning and Systems 8

Journal ref Proceedings of Machine Learning and Systems 8 (MLSys 2026)

AI中文摘要

尽管在低比特量化感知训练（QAT）方面已有大量工作，但这些技术与原生训练之间仍存在精度差距。为解决这一问题，我们引入了CAGE（曲率感知梯度估计），一种新的QAT方法，它用曲率感知校正项增强直通估计器（STE）梯度，旨在抵消量化引起的损失增加。CAGE源自QAT的多目标视角，平衡损失最小化与量化约束，产生一个依赖于局部曲率信息的原理性校正项。在理论方面，我们引入了量化优化的帕累托最优解概念，并证明CAGE在平滑非凸设置下具有强收敛保证。在实现方面，我们的方法是优化器无关的，但我们提供了一个利用Adam统计信息的高效实现。在相似计算成本下，CAGE在精度上显著优于先前最先进的方法：对于QAT微调，它将压缩精度损失相对于先前最佳方法减半；而对于Llama模型的QAT预训练，其在3比特权重和激活（W3A3）下的精度与先前最佳方法在4比特（W4A4）下达到的精度相当。官方实现可在以下链接找到：https://github.com/IST-DASLab/CAGE。

英文摘要

Despite significant work on low-bit quantization-aware training (QAT), there is still an accuracy gap between such techniques and native training. To address this, we introduce CAGE (Curvature-Aware Gradient Estimation), a new QAT method that augments the straight-through estimator (STE) gradient with a curvature-aware correction designed to counteract the loss increase induced by quantization. CAGE is derived from a multi-objective view of QAT that balances loss minimization with the quantization constraints, yielding a principled correction term that depends on local curvature information. On the theoretical side, we introduce the notion of Pareto-optimal solutions for quantized optimization, and establish that CAGE yields strong convergence guarantees in the smooth non-convex setting. In terms of implementation, our approach is optimizer-agnostic, but we provide a highly efficient implementation that leverages Adam statistics. CAGE significantly improves upon the prior state-of-the-art methods in terms of accuracy, for similar computational cost: for QAT fine-tuning, it halves the compression accuracy loss relative to the prior best method, while for QAT pre-training of Llama models, its accuracy for 3-bit weights-and-activations (W3A3) matches the accuracy achieved at 4 bits (W4A4) with the prior best method. The official implementation can be found at https://github.com/IST-DASLab/CAGE.

2602.04396 2026-06-19 cs.LG cs.AI 版本更新

LoRDO: Distributed Low-Rank Optimization with Infrequent Communication

LoRDO: 分布式低秩优化与低频通信

Andrej Jovanović, Alex Iacob, Mher Safaryan, Ionut-Vlad Modoranu, Lorenzo Sani, William F. Shen, Xinchi Qiu, Dan Alistarh, Nicholas D. Lane

发表机构 * University of Cambridge（剑桥大学）； Institute of Science and Technology Austria（奥地利科学与技术研究院）； Lancaster University（兰卡斯特大学）； Flower Labs（Flower实验室）

AI总结提出LoRDO框架，统一低秩优化与低频同步，通过全秩准双曲更新恢复子空间探索，在125M-720M模型规模下实现与低秩DDP近似的性能，通信量减少约10倍。

Comments Accepted at ICML 2026

AI中文摘要

通过$\texttt{DDP}$进行基础模型的分布式训练受限于互连带宽。虽然低频通信策略减少了同步频率，但优化器状态的内存和通信需求仍然构成瓶颈。低秩优化器可以缓解这些限制；然而，在局部更新机制下，工作节点无法访问计算低秩投影所需的全批次梯度，这降低了性能。我们提出$\texttt{LoRDO}$，一个统一低秩优化与低频同步的原则性框架。我们首先证明，虽然基于伪梯度的全局投影在理论上更优，但它们将优化轨迹永久限制在低秩子空间中。为了恢复子空间探索，我们引入了一个全秩准双曲更新。$\texttt{LoRDO}$在125M-720M模型规模的语言建模和下游任务中实现了与低秩$\texttt{DDP}$近乎相同的性能，同时将通信量减少了约10倍。最后，我们表明在具有小秩/小批次大小的极低内存设置中，$\texttt{LoRDO}$的性能提升更为显著。

英文摘要

Distributed training of foundation models via $\texttt{DDP}$ is limited by interconnect bandwidth. While infrequent communication strategies reduce synchronization frequency, they remain bottlenecked by the memory and communication requirements of optimizer states. Low-rank optimizers can alleviate these constraints; however, in the local-update regime, workers lack access to the full-batch gradients required to compute low-rank projections, which degrades performance. We propose $\texttt{LoRDO}$, a principled framework unifying low-rank optimization with infrequent synchronization. We first demonstrate that, while global projections based on pseudo-gradients are theoretically superior, they permanently restrict the optimization trajectory to a low-rank subspace. To restore subspace exploration, we introduce a full-rank quasi-hyperbolic update. $\texttt{LoRDO}$ achieves near-parity with low-rank $\texttt{DDP}$ in language modeling and downstream tasks at model scales of $125$M--$720$M, while reducing communication by $\approx 10 \times$. Finally, we show that $\texttt{LoRDO}$ improves performance even more in very low-memory settings with small rank/batch size.

2602.22495 2026-06-19 cs.LG cs.AI 版本更新

UltraEP：在机架级节点上以近最优负载均衡释放MoE训练与推理

Xinming Wei, Chao Jin, Tuo Dai, Yinmin Zhong, Shan Yu, Chengxu Yang, Bingyang Wu, Zili Zhang, Jing Mai, Qianchao Zhu, Zhouyang Li, Yuliang Liu, Guojie Luo

AI总结提出UltraEP，首个基于精确负载的实时均衡器，通过协同设计规划求解与专家复制通信，在机架级节点上实现MoE训练和推理的微批次与逐层重均衡，达到强制均衡理想吞吐量的94.3%。

AI中文摘要

大规模专家并行（EP）正成为训练和服务前沿MoE模型的关键，但它也加剧了设备级专家负载不均衡，导致计算掉队者、令牌全对全瓶颈和激活内存峰值。现有的均衡器基于历史负载定期重新分配专家，这对于具有非平稳负载模式的生产部署变得不可靠。我们提出UltraEP，首个用于大规模EP MoE训练和在机架级节点（RSN）上服务预填充的精确负载实时均衡器。基于RSN扩展的纵向扩展连接性，UltraEP在关键路径上对每个微批次和层进行重均衡，这需要规划求解和专家复制通信的非平凡协同设计，以最小化暴露的开销。为此，UltraEP通过高效的配额驱动规划对门控后负载做出即时响应，并利用RSN原生的持久tile流和基于中继的扇出缓解来执行由此产生的不规则专家状态传输。在训练和预填充中，平均涵盖106B到671B参数的MoE模型，UltraEP实现了强制均衡理想吞吐量的94.3%，相比无均衡提升了1.49倍，同时将最终跨秩不均衡从1.30-4.01降低到1.01-1.04。此外，我们在2560个GPU的生产MoE训练中验证了UltraEP的可扩展性和鲁棒性。

英文摘要

Large-scale expert parallelism (EP) is becoming pivotal for training and serving frontier MoE models, but it also amplifies device-level expert load imbalance into compute stragglers, token all-to-all bottlenecks, and activation-memory spikes. Existing balancers redistribute experts periodically based on historical load, which becomes unreliable for production deployments with non-stationary load patterns. We present UltraEP, the first exact-load, real-time balancer for large-EP MoE training and serving prefill on rack-scale nodes (RSNs). Leveraging the extended scale-up connectivity among dozens of GPUs within RSNs, UltraEP rebalances every microbatch and layer on critical paths, which requires nontrivial co-design of plan solving and expert replication communication to minimize exposed overhead. To this end, UltraEP eagerly reacts to post-gating load with an efficient quota-driven planner, and executes the resulting irregular expert-state transfers with RSN-native persistent tile streaming and relay-based fan-out mitigation. We evaluate UltraEP in a multi-RSN deployment of up to 256 GPUs, using cutting-edge MoE models from 106B to 671B parameters. Averaged across training and serving, UltraEP achieves 94.3% of the force-balanced ideal throughput, delivering 1.49$\times$ improvement over no-balancing, while reducing the final inter-rank imbalance from 1.30$-$4.01 to 1.01$-$1.04.

2604.06464 2026-06-19 cs.LG physics.app-ph stat.ML 版本更新

Weighted Bayesian Conformal Prediction

加权贝叶斯共形预测

Xiayin Lou, Peng Luo

发表机构 * Technical University of Munich（慕尼黑工业大学）； Massachusetts Institute of Technology（麻省理工学院）

AI总结提出加权贝叶斯共形预测（WBCP），通过加权Dirichlet先验推广贝叶斯共形预测到重要性加权设置，理论证明有效样本量决定后验方差，并提供更丰富的条件覆盖不确定性。

AI中文摘要

共形预测提供具有有限样本覆盖保证的分布自由预测区间，Snell & Griffiths 最近的工作将其重新解释为贝叶斯求积（BQ-CP），通过阈值上的 Dirichlet 后验产生强大的数据条件保证。然而，BQ-CP 根本上要求 i.i.d. 假设。同时，加权共形预测通过重要性权重处理分布偏移，但仍然是频率学派方法，仅产生点估计阈值。我们提出加权贝叶斯共形预测（WBCP），它将 BQ-CP 推广到任意重要性加权设置，用加权 Dirichlet $\mathrm{Dir}(n_{\mathrm{eff}} \cdot \tilde{w}_1, \ldots, n_{\mathrm{eff}} \cdot \tilde{w}_n)$ 替换均匀 Dirichlet $\mathrm{Dir}(1,\ldots,1)$，其中 $n_{\mathrm{eff}}$ 是 Kish 有效样本量。我们证明了四个理论结果：(1) $n_{\mathrm{eff}}$ 是匹配频率学派和贝叶斯方差的唯一集中参数；(2) 后验标准差以 $O(1/\sqrt{n_{\mathrm{eff}}})$ 衰减；(3) BQ-CP 的随机占优保证扩展到每个权重轮廓的数据条件保证；(4) HPD 阈值在条件覆盖上提供 $O(1/\sqrt{n_{\mathrm{eff}}})$ 的改进。我们将 WBCP 实例化为地理贝叶斯共形预测，其中基于核的空间权重产生每个位置的后验，并具有可解释的诊断。在合成和真实空间数据集上的实验表明，WBCP 在保持覆盖保证的同时提供了更丰富的不确定性信息。

英文摘要

Conformal prediction provides distribution-free prediction intervals with finite-sample coverage guarantees, and recent work by Snell & Griffiths reframes it as Bayesian Quadrature (BQ-CP), yielding powerful data-conditional guarantees via Dirichlet posteriors over thresholds. However, BQ-CP fundamentally requires the i.i.d. assumption. Meanwhile, weighted conformal prediction handles distribution shift via importance weights but remains frequentist, producing only point-estimate thresholds. We propose Weighted Bayesian Conformal Prediction (WBCP), which generalizes BQ-CP to arbitrary importance-weighted settings by replacing the uniform Dirichlet $\mathrm{Dir}(1,\ldots,1)$ with a weighted Dirichlet $\mathrm{Dir}(n_{\mathrm{eff}} \cdot \tilde{w}_1, \ldots, n_{\mathrm{eff}} \cdot \tilde{w}_n)$, where $n_{\mathrm{eff}}$ is Kish's effective sample size. We prove four theoretical results: (1) $n_{\mathrm{eff}}$ is the unique concentration parameter matching frequentist and Bayesian variances; (2) posterior standard deviation decays as $O(1/\sqrt{n_{\mathrm{eff}}})$; (3) BQ-CP's stochastic dominance guarantee extends to per-weight-profile data-conditional guarantees; (4) the HPD threshold provides $O(1/\sqrt{n_{\mathrm{eff}}})$ improvement in conditional coverage. We instantiate WBCP for spatial prediction as Geographical BQ-CP, where kernel-based spatial weights yield per-location posteriors with interpretable diagnostics. Experiments on synthetic and real-world spatial datasets demonstrate that WBCP maintains coverage guarantees while providing substantially richer uncertainty information.
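
A small sketch of the weighted Dirichlet parameters described above, assuming the standard definition of Kish's effective sample size, (sum of weights)^2 / (sum of squared weights), and normalized weights that sum to one. The function names are hypothetical, and the conformal thresholding step itself is not reproduced.

```python
import random

def kish_neff(weights):
    """Kish's effective sample size: (sum w)^2 / (sum w^2)."""
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    return s1 * s1 / s2

def weighted_dirichlet_params(weights):
    """Concentration parameters n_eff * w_i / sum(w) for importance weights w."""
    total = sum(weights)
    neff = kish_neff(weights)
    return [neff * w / total for w in weights]

def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]
```

With equal weights, the effective sample size equals the number of points and every concentration parameter equals 1, which recovers the uniform Dirichlet case mentioned in the abstract.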

2605.30089 2026-06-19 cs.LG 版本更新

Distributionally Robust Set Representation Learning Under Inference-Time Element Corruption

推理时元素损坏下的分布鲁棒集合表示学习

Yankai Chen, Hanrong Zhang, Bowei He, Philip S. Yu, Xue Liu

发表机构 * McGill University（麦吉尔大学）； University of Illinois Chicago（伊利诺伊大学芝加哥分校）

AI总结针对推理时元素损坏问题，提出SW-DRSO分布鲁棒优化框架，通过重心对抗近似最坏情况损失，在四个任务上验证了鲁棒性和性能。

Comments Accepted by ICML'26

2606.08892 2026-06-19 cs.LG 版本更新

一个探针无法捕捉所有：迈向有针对性的欺骗检测

Vikram Natarajan, Devina Jain, Shivam Arora, Satvik Golechha, Joseph Bloom

发表机构 * LASR Labs（LASR实验室）； UK AI Security Institute（英国人工智能安全研究所）

AI总结针对线性探针在欺骗检测中的异质性，提出根据具体欺骗类型匹配探针可显著提升性能（AUC提升0.108），建议组织定义威胁模型并部署相应探针。

AI中文摘要

线性探针是一种有前景的监测AI系统欺骗行为的方法。先前工作表明，在对比指令对和简单数据集上训练的线性分类器可以达到良好性能。然而，这些探针即使在简单场景中也表现出显著失败，包括虚假相关性和对非欺骗响应的误报。在本文中，我们证明欺骗检测本质上是异质的：虽然单个通用探针实现了适度的改进（+0.032 AUC），但事后最优分析显示，当探针与特定欺骗类型匹配时，潜力显著更高（+0.108 AUC），并且合成验证实验表明，当欺骗类型事先已知时，这一上限是先验可实现的。我们的发现表明，指令对捕捉的是欺骗意图而非内容特定模式，这解释了为什么提示选择主导探针性能（占70.6%的方差）。鉴于这种异质性，我们得出结论，组织应定义其特定威胁模型并部署适当匹配的探针，而不是寻求通用的欺骗检测器。

英文摘要

Linear probes are a promising approach for monitoring AI systems for deceptive behaviour. Previous work has shown that a linear classifier trained on a contrastive instruction pair and a simple dataset can achieve good performance. However, these probes exhibit notable failures even in straightforward scenarios, including spurious correlations and false positives on non-deceptive responses. In this paper, we demonstrate that deception detection is inherently heterogeneous: while a single universal probe achieves modest improvements (+0.032 AUC), post-hoc oracle analysis reveals substantially higher potential (+0.108 AUC) when probes are matched to specific deception types, and synthetic validation experiments suggest this ceiling is achievable a priori when the deception type is known in advance. Our findings reveal that instruction pairs capture deceptive intent rather than content-specific patterns, explaining why prompt choice dominates probe performance (70.6% of variance). Given this heterogeneity, we conclude that organizations should define their specific threat models and deploy appropriately matched probes rather than seeking a universal deception detector.

2603.19423 2026-06-19 cs.CR cs.AI cs.LG 版本更新

The Autonomy Tax: Defense Training Breaks LLM Agents

自主性税：防御训练破坏LLM智能体

Shawn Li, Yue Zhao

发表机构 * University of Southern California（南加州大学）

AI总结揭示防御训练在提升LLM智能体安全性时，系统性地破坏其工具执行能力，导致任务失败率飙升，且无法有效防御复杂攻击。

AI中文摘要

大型语言模型（LLM）智能体日益依赖外部工具（文件操作、API调用、数据库事务）来自主完成复杂的多步骤任务。实践者部署经过防御训练的模型，以防止通过恶意观察或检索内容操纵智能体行为的提示注入攻击。我们揭示了一个基本的能力-对齐悖论：旨在提高安全性的防御训练系统性地破坏了智能体的能力，同时未能阻止复杂的攻击。在97个智能体任务和1000个对抗性提示上，将防御模型与未防御基线进行比较，我们发现了多步骤智能体特有的三种系统性偏差。智能体无能偏差表现为立即的工具执行崩溃，模型在观察到任何外部内容之前就在良性任务上拒绝或生成无效操作。级联放大偏差导致早期失败通过重试循环传播，使防御模型在99%的任务中超时，而基线仅为13%。触发偏差导致矛盾的安全退化，防御模型的表现比未防御基线更差，而直接攻击以高概率绕过防御。根本原因分析表明，这些偏差源于捷径学习：模型过度拟合表面攻击模式而非语义威胁理解，这由防御效果在不同攻击类别上的极端方差所证明。我们的发现表明，当前的防御范式优化了单轮拒绝基准，同时使多步骤智能体从根本上不可靠，因此需要新的方法在对抗条件下保持工具执行能力。

英文摘要

Large language model (LLM) agents increasingly rely on external tools (file operations, API calls, database transactions) to autonomously complete complex multi-step tasks. Practitioners deploy defense-trained models to protect against prompt injection attacks that manipulate agent behavior through malicious observations or retrieved content. We reveal a fundamental capability-alignment paradox: defense training designed to improve safety systematically destroys agent competence while failing to prevent sophisticated attacks. Evaluating defended models against undefended baselines across 97 agent tasks and 1,000 adversarial prompts, we uncover three systematic biases unique to multi-step agents. Agent incompetence bias manifests as immediate tool execution breakdown, with models refusing or generating invalid actions on benign tasks before observing any external content. Cascade amplification bias causes early failures to propagate through retry loops, pushing defended models to timeout on 99% of tasks compared to 13% for baselines. Trigger bias leads to paradoxical security degradation where defended models perform worse than undefended baselines while straightforward attacks bypass defenses at high rates. Root cause analysis reveals these biases stem from shortcut learning: models overfit to surface attack patterns rather than semantic threat understanding, evidenced by extreme variance in defense effectiveness across attack categories. Our findings demonstrate that current defense paradigms optimize for single-turn refusal benchmarks while rendering multi-step agents fundamentally unreliable, necessitating new approaches that preserve tool execution competence under adversarial conditions.

2606.07822 2026-06-19 cs.CL cs.AI cs.LG 版本更新

The ACUTE Protocol: Operationalizing Language Model Activations for Better Calibration, Utility, and Trust

ACUTE协议：操作语言模型激活以实现更好的校准、效用和信任

Nishant Subramani, Palash Goyal, Yiwen Song, Mani Malek, Yuan Xue, Tomas Pfister, Hamid Palangi

发表机构 * Carnegie Mellon University（卡内基梅隆大学）； Google（谷歌）； Scale AI

AI总结提出ACUTE协议，通过操作语言模型激活来估计置信度，平衡校准与信息性，在多项选择问答、工具调用和科学文档摘要等任务上优于强基线，提升校准、效用和可信度。

Comments ICML 2026

AI中文摘要

随着语言模型的改进并越来越多地部署以解决各种任务，可信度变得至关重要。校准是信任的良好代理：良好校准的置信度估计有助于在信任特定模型输出时告知风险与回报的权衡。不幸的是，即使模型改进，它们仍然校准不良，往往偏向过度自信。此外，校准可能被操纵：总是预测基率的策略是完美校准的，但完全没有信息性。为了解决这个问题，我们开发了一个新指标，即通过预言机重新归一化的期望效用（EURO），它平衡了校准和信息性。我们还提出了一种通用的基于激活的置信度、效用和信任估计协议（ACUTE），以适当裁决不确定性。ACUTE协议为4个模型家族的6个模型上的3个任务（包括多项选择问答、工具调用和科学文档摘要）提供了灵活、样本高效和计算高效的置信度估计器。ACUTE在EURO上优于强基线，同时保持较低的校准误差。综合来看，我们的工作表明，为LLM配备ACUTE协议可以在多种设置中提高校准、效用和可信度。

英文摘要

As language models improve and become increasingly deployed to solve a variety of tasks, trustworthiness becomes essential. Calibration is a good proxy for trust: well-calibrated confidence estimates help inform the risk versus reward tradeoff when trusting a specific model output. Unfortunately, even as models improve, they remain poorly calibrated, often biasing towards overconfidence. Additionally, calibration can be gamed: a policy that always predicts the base rate is perfectly calibrated, but completely uninformative. To resolve this, we develop a new metric, expected utility renormalized by the oracle (EURO), that balances calibration and informativeness. We also propose a general-purpose activation-based confidence, utility, and trust estimation protocol (ACUTE) to appropriately adjudicate uncertainty. The ACUTE protocol provides flexible, sample-efficient, and compute-efficient confidence estimators for 3 tasks including multiple choice question answering, tool-calling, and scientific document summarization across 6 models from 4 model families. ACUTE outperforms strong baselines on EURO, while maintaining low calibration error. Taken together, our work shows that equipping LLMs with the ACUTE protocol can improve calibration, utility, and trustworthiness in numerous settings.
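
The point that calibration can be gamed is easy to check numerically. The sketch below uses the standard binned expected calibration error (ECE), not the paper's EURO metric, whose definition is not given in the abstract; the function name and bin count are choices made here for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted average of |mean confidence - accuracy| per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((conf, hit))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if members:
            avg_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(h for _, h in members) / len(members)
            ece += len(members) / n * abs(avg_conf - accuracy)
    return ece
```

For outcomes `[1, 1, 1, 0]`, always reporting the base rate 0.75 gives an ECE of zero even though the confidence never distinguishes right answers from wrong ones, while always reporting 1.0 gives an ECE of 0.25. A metric that also rewards informativeness is needed to separate the base-rate policy from a genuinely useful estimator.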

2507.22524 2026-06-19 cs.LG 版本更新

HGCN(O): A Self-Tuning GCN HyperModel Toolkit for Outcome Prediction in Event-Sequence Data

HGCN(O)：一种用于事件序列数据结果预测的自调优GCN超模型工具包

Fang Wang, Paolo Ceravolo, Ernesto Damiani

发表机构 * College of Computing and Mathematical Sciences, Khalifa University（哈立发大学计算与数学科学学院）； Department of Computer Science, University of Milan（米兰大学计算机科学系）

AI总结提出HGCN(O)工具包，集成四种GCN架构和多种图表示，通过自调优优化预测准确性和稳定性，在平衡和不平衡数据集上表现优异，优于传统方法。

Comments 38 pages, 2 figures

2510.16311 2026-06-19 cs.LG 版本更新

支持边界经验混合的持续学习

Chih-Fan Hsu, Ming-Ching Chang, Wei-Chao Chen

发表机构 * National Taiwan University（国立台湾大学）

AI总结提出经验混合框架，通过差分隐私启发的噪声生成支持边界数据，联合训练样本和边界数据以正则化决策边界，在多个数据集上提升持续学习准确率。

AI中文摘要

持续学习旨在减轻模型在顺序任务训练时的灾难性遗忘。常见方法经验回放存储过去的样本，但仅稀疏地近似数据分布，导致决策边界脆弱且过于简化。我们通过引入支持边界数据来解决这一限制，该数据通过向潜在特征注入受差分隐私启发的噪声，生成边界邻近表示，隐式正则化决策边界。基于此，我们提出经验混合框架，通过双模型聚合策略联合训练样本和支持边界数据。经验混合有两个组成部分：(1) 潜在空间噪声注入以生成支持边界数据，(2) 联合利用样本和支持边界数据的端到端训练。与标准经验回放不同，支持边界数据丰富了决策边界附近的特征空间，从而实现更稳定和鲁棒的持续学习。在CIFAR-10、CIFAR-100、Tiny ImageNet和ImageNet1K上的大量实验分别展示了10%、6%、14%和2%的持续准确率提升。

英文摘要

Continual learning (CL) seeks to mitigate catastrophic forgetting when models are trained with sequential tasks. A common approach, experience replay (ER), stores past exemplars but only sparsely approximates the data distribution, yielding fragile and oversimplified decision boundaries. We address this limitation by introducing Support Boundary Data (SBD), generated by injecting differential-privacy-inspired noise into latent features to create boundary-adjacent representations that implicitly regularize decision boundaries. Building on this idea, we propose Experience Blending (EB), a framework that jointly trains on exemplars and SBD through a dual-model aggregation strategy. EB has two components: (1) latent-space noise injection to generate support boundary data, and (2) end-to-end training that jointly leverages exemplars and SBD. Unlike standard experience replay, SBD enriches the feature space near decision boundaries, leading to more stable and robust continual learning. Extensive experiments on CIFAR-10, CIFAR-100, Tiny ImageNet, and ImageNet1K demonstrate consistent accuracy improvements of 10%, 6%, 14%, and 2%, respectively.

2104.08928 2026-06-19 stat.ML cs.CL cs.LG 版本更新

Group-Sparse Matrix Factorization for Transfer Learning of Word Embeddings

面向词嵌入迁移学习的组稀疏矩阵分解

Kan Xu, Xuanyi Zhao, Hamsa Bastani, Osbert Bastani

发表机构 * W. P. Carey School of Business, Arizona State University（亚利桑那州立大学凯瑞商学院）； University of Pennsylvania（宾夕法尼亚大学）； Wharton School, University of Pennsylvania（宾夕法尼亚大学沃顿商学院）

AI总结提出一种基于组稀疏惩罚的两阶段估计器，通过结合大规模语料和少量领域数据高效迁移学习领域特定的词嵌入，并证明了其泛化误差界和非凸目标函数的局部最优与全局最优统计等价。

AI中文摘要

非结构化文本为许多领域的决策者提供了丰富的数据源，从零售中的产品评论到医疗保健中的护理记录。为了利用这些信息，单词通常通过无监督学习算法（如矩阵分解）转化为词嵌入——编码单词之间语义关系的向量。然而，从训练数据有限的新领域学习词嵌入可能具有挑战性，因为在新领域中含义/用法可能不同，例如，单词“positive”通常具有积极情感，但在医疗记录中通常具有消极情感，因为它可能意味着患者检测出疾病阳性。在实践中，我们预计只有少数领域特定的单词可能具有新含义。我们提出了一种直观的两阶段估计器，通过组稀疏惩罚利用这种结构，通过结合大规模文本语料库（如维基百科）和有限的领域特定文本数据，高效地迁移学习领域特定的词嵌入。我们限定了迁移学习估计器的泛化误差，证明当只有少量嵌入在领域间改变时，它可以用显著更少的领域特定数据实现高精度。此外，我们证明了在标准正则化条件下，由非凸目标函数识别的所有局部最小值与全局最小值在统计上不可区分，这意味着我们的估计器可以高效计算。我们的结果首次给出了组稀疏矩阵分解的界限，这可能具有独立意义。我们通过与自然语言处理中最先进的微调启发式方法进行实证比较来评估我们的方法。

英文摘要

Unstructured text provides decision-makers with a rich data source in many domains, ranging from product reviews in retail to nursing notes in healthcare. To leverage this information, words are typically translated into word embeddings -- vectors that encode the semantic relationships between words -- through unsupervised learning algorithms such as matrix factorization. However, learning word embeddings from new domains with limited training data can be challenging, because the meaning/usage may be different in the new domain, e.g., the word ``positive'' typically has positive sentiment, but often has negative sentiment in medical notes since it may imply that a patient tested positive for a disease. In practice, we expect that only a small number of domain-specific words may have new meanings. We propose an intuitive two-stage estimator that exploits this structure via a group-sparse penalty to efficiently transfer learn domain-specific word embeddings by combining large-scale text corpora (such as Wikipedia) with limited domain-specific text data. We bound the generalization error of our transfer learning estimator, proving that it can achieve high accuracy with substantially less domain-specific data when only a small number of embeddings are altered between domains. Furthermore, we prove that all local minima identified by our nonconvex objective function are statistically indistinguishable from the global minimum under standard regularization conditions, implying that our estimator can be computed efficiently. Our results provide the first bounds on group-sparse matrix factorization, which may be of independent interest. We empirically evaluate our approach compared to state-of-the-art fine-tuning heuristics from natural language processing.
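
To make the role of a group-sparse penalty concrete, the sketch below applies the standard proximal operator of a row-wise L2 (group lasso) penalty to a matrix whose rows are per-word differences between domain-specific and pretrained embeddings. This is a generic building block under that assumed setup, not the paper's two-stage estimator; the function name and the threshold value are hypothetical.

```python
import math

def group_soft_threshold(rows, lam):
    """Shrink each row toward zero by lam in L2 norm; rows with norm <= lam become zero."""
    result = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        # Rows with small norm are zeroed entirely, which is what makes the penalty group-sparse.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        result.append([scale * x for x in row])
    return result
```

Words whose embeddings barely move between domains are snapped back to their pretrained vectors (a zero difference row), while the few words with a genuinely new meaning keep a shrunken but nonzero shift.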

2601.02322 2026-06-19 stat.ME cs.LG 版本更新

Environment-Adaptive Covariate Selection: Learning When to Use Spurious Correlations for Out-of-Distribution Prediction

环境自适应协变量选择：学习何时利用虚假相关进行分布外预测

Shuozhi Zuo, Yixin Wang

发表机构 * Department of Statistics, University of Michigan, Ann Arbor（密歇根大学安娜堡分校统计系）

AI总结针对分布外预测中协变量选择问题，提出环境自适应算法，根据环境特征动态选择协变量集，在模拟和实际数据中优于静态方法。

AI中文摘要

一种常见的分布外预测方法将模型限制为因果或不变协变量，以避免可能随环境变化的虚假关联。尽管具有理论吸引力，但当仅观察到结果的部分因果父节点时，该策略可能不如经验风险最小化。在这种情况下，非因果协变量可以作为未观察到的因果父节点的代理，当代理关系稳定时改善预测，但当变化破坏这种关系时则有害。因此，最优协变量集可能取决于所遇到的具体变化。由于不同的变化会在未标记的协变量分布中留下特征，我们提出了一种环境自适应协变量选择算法，该算法将环境级摘要映射到特定于环境的协变量集。这些摘要可以是手工制作的，也可以从多环境数据中学习，并且先验因果知识可以作为约束条件纳入。在模拟和应用数据集中，所提出的方法在各种变化下优于静态因果、不变和其他非自适应规则。

英文摘要

A common approach to out-of-distribution prediction restricts models to causal or invariant covariates to avoid spurious associations that may change across environments. Despite its theoretical appeal, this strategy can underperform empirical risk minimization when only a subset of the causal parents of the outcome is observed. In such settings, non-causal covariates can serve as proxies for unobserved causal parents and improve prediction when the proxy relationship is stable, but they can hurt when shifts disrupt that relationship. Thus, the optimal covariate set can depend on the specific shift encountered. Because different shifts leave signatures in the unlabeled covariate distribution, we propose an environment-adaptive covariate selection algorithm that maps environment-level summaries to environment-specific covariate sets. These summaries may be hand-crafted or learned from multi-environment data, and prior causal knowledge can be incorporated as constraints. Across simulations and applied datasets, the proposed method improves over static causal, invariant, and other non-adaptive rules under diverse shifts.

2505.16319 2026-06-19 cs.LG 版本更新

FreshRetailNet-50K: A Stockout-Annotated Censored Demand Dataset for Latent Demand Recovery and Forecasting in Fresh Retail

FreshRetailNet-50K：面向生鲜零售中潜在需求恢复与预测的缺货标注删失需求数据集

Yangyang Wang, Jiawei Gu, Li Long, Xin Li, Li Shen, Zhouyu Fu, Xiangjun Zhou, Xu Jiang

发表机构 * Fresh Retail, Inc.（新鲜零售公司）

AI总结针对生鲜零售中缺货导致的销售数据删失问题，提出首个大规模基准数据集FreshRetailNet-50K，包含50,000条高时间分辨率小时级销售序列及缺货标注，并展示了两阶段需求建模方法，将预测准确率提升2.73%，需求低估偏差从7.37%降至近零。

AI中文摘要

准确的需求估计对于零售业务指导易腐产品的库存和定价策略至关重要。然而，它面临缺货期间删失销售数据的根本挑战，其中未观察到的需求会造成系统性政策偏差。现有数据集缺乏解决这种删失效应所需的时间分辨率和标注。为填补这一空白，我们提出了FreshRetailNet-50K，这是首个用于删失需求估计的大规模基准。它包含来自18个主要城市898家商店的50,000条商店-产品时间序列的详细小时级销售数据，涵盖863个易腐SKU，并精心标注了缺货事件。该数据集独有的小时级库存状态记录，结合丰富的上下文协变量（包括促销折扣、降水和时间特征），使得超越现有解决方案的创新研究成为可能。我们展示了一个两阶段需求建模的用例：首先，利用精确的小时级标注重建缺货期间的潜在需求；然后，利用恢复的需求在第二阶段训练鲁棒的需求预测模型。实验结果表明，该方法将预测准确率提高了2.73%，同时将系统性需求低估从7.37%降至接近零偏差。凭借前所未有的时间粒度和全面的真实世界信息，FreshRetailNet-50K在需求插补、易腐库存优化和因果零售分析方面开辟了新的研究方向。该数据集独特的标注质量和规模解决了零售AI中长期存在的局限性，提供了即时解决方案和未来方法论创新的平台。数据（https://huggingface.co/datasets/Dingdong-Inc/FreshRetailNet-50K）和代码（https://github.com/Dingdong-Inc/frn-50k-baseline）已公开。

英文摘要

Accurate demand estimation is critical for the retail business in guiding the inventory and pricing policies of perishable products. However, it faces fundamental challenges from censored sales data during stockouts, where unobserved demand creates systemic policy biases. Existing datasets lack the temporal resolution and annotations needed to address this censoring effect. To fill this gap, we present FreshRetailNet-50K, the first large-scale benchmark for censored demand estimation. It comprises 50,000 store-product time series of detailed hourly sales data from 898 stores in 18 major cities, encompassing 863 perishable SKUs meticulously annotated for stockout events. The hourly stock status records unique to this dataset, combined with rich contextual covariates, including promotional discounts, precipitation, and temporal features, enable innovative research beyond existing solutions. We demonstrate one such use case of two-stage demand modeling: first, we reconstruct the latent demand during stockouts using precise hourly annotations. We then leverage the recovered demand to train robust demand forecasting models in the second stage. Experimental results show that this approach achieves a 2.73% improvement in prediction accuracy while reducing the systematic demand underestimation from 7.37% to near-zero bias. With unprecedented temporal granularity and comprehensive real-world information, FreshRetailNet-50K opens new research directions in demand imputation, perishable inventory optimization, and causal retail analytics. The unique annotation quality and scale of the dataset address long-standing limitations in retail AI, providing immediate solutions and a platform for future methodological innovation. The data (https://huggingface.co/datasets/Dingdong-Inc/FreshRetailNet-50K) and code (https://github.com/Dingdong-Inc/frn-50k-baseline) are openly released.

2507.15584 2026-06-19 cs.LG 版本更新

We Need to Rethink Benchmarking in Anomaly Detection

我们需要重新思考异常检测中的基准测试

Philipp Röchner, Simon Klüttermann, Kevin Kammler, Franz Rothlauf, Emmanuel Müller, Daniel Schlör

发表机构 * University of Mainz（美因茨大学）； TU Dortmund（多特蒙德工业大学）； University of Würzburg（维尔茨堡大学）

AI总结本文指出当前异常检测基准测试导致进展停滞，提出基于场景分类的评估框架以改进算法选择和性能评估。

AI中文摘要

尽管不断有新的异常检测算法提出且基准测试工作广泛，但进展似乎停滞不前，既有基线与新算法之间仅存在微小的性能差异。在这篇立场论文中，我们认为这种停滞源于我们评估异常检测算法的方式存在局限性。在当前的基准测试中，一个仅检查单个特征极端值的平凡算法与最先进的深度学习方法竞争激烈，尽管它在简单案例（如正常点环内的异常）上失败。此外，现有基准测试未能充分反映异常检测应用的多样性，使得从业者难以可靠地为其应用选择算法。因此，我们需要重新思考异常检测中的基准测试。我们认为，异常检测应通过使用场景来研究，这些场景将共享相关特征的应用分组，并通过通用分类法定义。场景内的基准测试能够实现预处理、度量和模型选择的场景特定选择，明确哪些进展在相似应用间迁移，并为从业者在其特定上下文中提供可靠指导。

英文摘要

Despite the continuous proposal of new anomaly detection algorithms and extensive benchmarking efforts, progress seems to stagnate, with only minor performance differences between established baselines and new algorithms. In this position paper, we argue that this stagnation is due to limitations in how we evaluate anomaly detection algorithms. In current benchmarks, a trivial algorithm that only checks for extreme values in individual features performs competitively with state-of-the-art deep learning methods, despite failing on simple cases such as anomalies within an annulus of normal points. Moreover, existing benchmarks do not adequately reflect the diversity of anomaly detection applications, making it difficult for practitioners to reliably select algorithms for their applications. Consequently, we need to rethink benchmarking in anomaly detection. In our opinion, anomaly detection should be studied using scenarios that group applications sharing relevant characteristics, defined through a common taxonomy. Benchmarking within scenarios enables scenario-specific choices for preprocessing, metrics, and model selection, clarifying which advances transfer across similar applications and providing practitioners with reliable guidance for their specific contexts.
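
The annulus failure case mentioned above can be reproduced in a few lines. The detector below is one plausible form of a per-feature extreme-value check (the largest absolute z-score across features); the exact trivial algorithm used in the paper is not specified in the abstract, and the data here are synthetic.

```python
import math

def extreme_value_scores(points):
    """Score each point by its largest per-feature absolute z-score."""
    n = len(points)
    dims = len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((p[d] - means[d]) ** 2 for p in points) / n)
        for d in range(dims)
    ]
    return [
        max(abs(p[d] - means[d]) / stds[d] for d in range(dims))
        for p in points
    ]

# Normal points lie on a ring of radius 1; the anomaly sits at the center.
ring = [
    (math.cos(2 * math.pi * k / 40), math.sin(2 * math.pi * k / 40))
    for k in range(40)
]
data = ring + [(0.0, 0.0)]
scores = extreme_value_scores(data)
```

The central point receives the lowest score of all, because each of its coordinates sits exactly at the feature mean. A detector that looks at features one at a time cannot see that the point is far from every normal point in the joint space.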

2510.06048 2026-06-19 cs.LG 版本更新

脑MRI的量子潜GAN增强的受控基准测试

Syed Mujtaba Haider, Silvia Figini

发表机构 * Department of Mathematics（数学系）； Department of Political and Social Sciences（政治与社会科学系）

AI总结通过受控基准测试，比较量子与经典生成器在脑MRI数据增强中的性能，发现两者均未显著优于仅用真实数据训练，且量子生成器无额外优势。

AI中文摘要

医学图像分类常受限于有限的标注数据，因此生成式增强被提出；最近，量子生成模型被用于此目的，并经常报告准确率提升。然而，这些声称通常基于单次训练运行，未匹配量子与经典生成器的参数预算，也未表征任何收益出现的数据范围。我们提出了一个受控基准测试，隔离量子生成器对脑MRI增强的贡献。图像被编码到KL正则化的潜在空间中，在该空间中，使用变分量子生成器或参数数量几乎相同的经典生成器（1648 vs. 1632）训练带有梯度惩罚的条件Wasserstein GAN。合成样本被解码并用于增强预训练分类器，覆盖从5%到100%的标注数据比例，通过八个随机种子进行配对显著性检验（多重比较校正）以及集内多样性和潜在分布分析。在所有比例下，没有增强变体显著优于仅用真实数据训练，且量子与经典生成器在统计上无法区分。任何低数据优势表现为正则化而非忠实的数据扩展：合成样本分布外移，并且在数据稀缺时严重模式崩溃，而量子生成器并不比经典生成器更多样化。我们发布该协议作为医学成像中量子生成增强严格评估的测试平台。

英文摘要

Medical image classification is often constrained by limited labeled data, motivating generative augmentation; recently, quantum generative models have been proposed for this purpose, frequently reporting accuracy gains. However, such claims are typically based on single training runs, do not match the parameter budgets of the quantum and classical generators, and do not characterize the data regime in which any benefit appears. We present a controlled benchmark that isolates the contribution of a quantum generator to brain-MRI augmentation. Images are encoded into a KL-regularized latent space in which a conditional Wasserstein GAN with gradient penalty is trained using either a variational quantum generator or a classical generator of near-identical parameter count (1648 vs. 1632). Synthetic samples are decoded and used to augment a pretrained classifier across labeled data fractions from 5% to 100%, evaluated over eight random seeds with paired significance testing (with multiple-comparison correction) and with intraset diversity and latent-distribution analyses. Across all fractions, no augmentation variant significantly outperforms real-data-only training, and the quantum and classical generators are statistically indistinguishable. Any low-data benefit behaves as regularization rather than faithful data expansion: synthetic samples are off distribution and severely mode collapsed precisely where data is scarce, and the quantum generator is no more diverse than its classical counterpart. We release the protocol as a testbed for rigorous evaluation of quantum generative augmentation in medical imaging.
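The abstract mentions a multiple-comparison correction without naming it. As an illustration only, the sketch below uses the Holm-Bonferroni step-down procedure, which is one common choice and is an assumption here; the p-values are made up for the example.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm-Bonferroni step-down procedure.

    Returns a list of booleans, True where the hypothesis is rejected
    at family-wise error rate alpha.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank) is compared with alpha / (m - k).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


# Hypothetical p-values, e.g. one paired test per labeled-data fraction.
example_p = [0.01, 0.04, 0.03, 0.005]
print(holm_reject(example_p))
```

With more comparisons the early thresholds get stricter, which is why uncorrected single-run gains can disappear under this kind of protocol.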

2508.05762 2026-06-19 cond-mat.mtrl-sci cs.LG 版本更新

Evaluating Universal Machine Learning Force Fields Against Experimental Measurements

对照实验测量评估通用机器学习力场

Sajid Mannan, Vaibhav Bihani, Carmelo Gonzales, Kin Long Kelvin Lee, Nitya Nand Gosvami, Sayan Ranu, Santiago Miret, N M Anoop Krishnan

发表机构 * Department of Civil Engineering, Indian Institute of Technology Delhi（印度理工学院德里土木工程系）； Yardi School of Artificial Intelligence, Indian Institute of Technology Delhi（印度理工学院德里人工智能学院）； Intel Labs, California, USA（美国加州英特尔实验室）； Department of Materials Science and Engineering, Indian Institute of Technology Delhi（印度理工学院德里材料科学与工程系）； Department of Computer Science and Engineering, Indian Institute of Technology Delhi（印度理工学院德里计算机科学与工程系）

AI总结提出UniFFBench框架和MinX数据集，系统评估六种通用机器学习力场，发现模型在计算基准上表现优异但在实验复杂性下存在显著“现实差距”，密度预测误差高于实际应用阈值。

AI中文摘要

通用机器学习力场（UMLFFs）有望通过实现跨元素周期表的快速原子模拟来革新材料科学。然而，它们的评估一直局限于可能无法反映实际性能的计算基准。我们引入了UniFFBench，一个全面的评估框架，包含MinX数据集——一个涵盖85种元素、极端热力学条件（0–5000 K, 0–1000 GPa）和结构复杂性（包括部分占据和无序）的1500多种矿物系统的多样化集合。这种多样性，结合用于验证的实验参考值，使得能够评估UMLFF在化学空间和条件上的泛化能力，这些条件远超典型的训练场景。我们对六种最先进的UMLFF的系统评估揭示了一个显著的“现实差距”：在计算基准上表现令人印象深刻的模型在面对实验复杂性时常常失败。即使是最好的模型也表现出高于实际应用所需阈值的密度预测误差。我们观察到模拟稳定性和力学性能准确性之间的脱节，预测误差与训练数据表示相关，而非建模方法。

英文摘要

Universal machine learning force fields (UMLFFs) promise to revolutionize materials science by enabling rapid atomistic simulations across the periodic table. However, their evaluation has been limited to computational benchmarks that may not reflect real-world performance. We introduce UniFFBench, a comprehensive evaluation framework featuring the MinX dataset, a diverse collection of 1,500+ mineral systems spanning 85 elements, extreme thermodynamic conditions (0–5000 K, 0–1000 GPa), and structural complexity, including partial occupancy and disorder. This diversity, combined with experimental reference values for validation, enables assessment of UMLFF generalization across chemical space and conditions substantially beyond typical training scenarios. Our systematic evaluation of six state-of-the-art UMLFFs reveals a substantial "reality gap": models achieving impressive performance on computational benchmarks often fail when confronted with experimental complexity. Even the best-performing models exhibit higher density prediction error than the threshold required for practical applications. We observe disconnects between simulation stability and mechanical property accuracy, with prediction errors correlating with training data representation rather than the modeling method.

2510.08807 2026-06-19 cs.RO cs.LG 版本更新

Humanoid Everyday: A Comprehensive Robotic Dataset for Open-World Humanoid Manipulation

Humanoid Everyday：面向开放世界人形机器人操作的综合机器人数据集

Zhenyu Zhao, Hongyi Jing, Xiawei Liu, Jiageng Mao, Abha Jha, Hanwen Yang, Rong Xue, Sergey Zakharov, Vitor Guizilini, Yue Wang

发表机构 * University of Southern California（南加州大学）； Toyota Research Institute（丰田研究院）

AI总结提出Humanoid Everyday数据集，包含10.3k轨迹、260个任务的多模态数据，用于人形机器人灵巧操作、人机交互和移动操作研究，并配套云评估平台。

AI中文摘要

从运动到灵巧操作，人形机器人在展示复杂的全身能力方面取得了显著进展。然而，当前大多数机器人学习数据集和基准主要关注固定机器人臂，少数现有人形数据集要么局限于固定环境，要么任务多样性有限，通常缺乏人机交互和下肢运动。此外，缺乏用于在人形数据上对基于学习的策略进行基准测试的标准化评估平台。在这项工作中，我们提出了Humanoid Everyday，一个大规模且多样化的人形操作数据集，其特点是涉及灵巧物体操作、人机交互、运动集成动作等广泛的任务多样性。利用高效的人工监督遥操作流水线，Humanoid Everyday聚合了高质量的多模态感官数据，包括RGB、深度、LiDAR和触觉输入，以及自然语言注释，包含10.3k条轨迹和超过300万帧数据，涵盖7个大类共260个任务。此外，我们对数据集上的代表性策略学习方法进行了分析，提供了它们在不同任务类别中的优势和局限性的见解。为了标准化评估，我们引入了一个基于云的评估平台，允许研究人员在我们的受控环境中无缝部署他们的策略并接收性能反馈。通过发布Humanoid Everyday以及我们的策略学习分析和标准化的基于云的评估平台，我们旨在推进通用人形操作的研究，并为现实世界中更有能力和具身化的机器人代理奠定基础。我们的数据集、数据收集代码和云评估网站在我们的项目网站上公开发布。

英文摘要

From locomotion to dextrous manipulation, humanoid robots have made remarkable strides in demonstrating complex full-body capabilities. However, the majority of current robot learning datasets and benchmarks mainly focus on stationary robot arms, and the few existing humanoid datasets are either confined to fixed environments or limited in task diversity, often lacking human-humanoid interaction and lower-body locomotion. Moreover, there are few standardized evaluation platforms for benchmarking learning-based policies on humanoid data. In this work, we present Humanoid Everyday, a large-scale and diverse humanoid manipulation dataset characterized by extensive task variety involving dextrous object manipulation, human-humanoid interaction, locomotion-integrated actions, and more. Leveraging a highly efficient human-supervised teleoperation pipeline, Humanoid Everyday aggregates high-quality multimodal sensory data, including RGB, depth, LiDAR, and tactile inputs, together with natural language annotations, comprising 10.3k trajectories and over 3 million frames of data spanning 260 tasks in 7 broad categories. In addition, we conduct an analysis of representative policy learning methods on our dataset, providing insights into their strengths and limitations across different task categories. For standardized evaluation, we introduce a cloud-based evaluation platform that allows researchers to seamlessly deploy their policies in our controlled setting and receive performance feedback. By releasing Humanoid Everyday along with our policy learning analysis and a standardized cloud-based evaluation platform, we intend to advance research in general-purpose humanoid manipulation and lay the groundwork for more capable and embodied robotic agents in real-world scenarios. Our dataset, data collection code, and cloud evaluation website are made publicly available on our project website.

2603.28387 2026-06-19 cs.AI cs.LG 版本更新

The Scaffold Effect: How Prompt Framing Drives Apparent Multimodal Gains in Clinical VLM Evaluation

脚手架效应：提示框架如何驱动临床VLM评估中的表面多模态增益

Doan Nam Long Vu, Simone Balloccu

发表机构 * Technical University of Darmstadt（达姆施塔特技术大学）

AI总结研究发现，在临床VLM评估中，提示中提及MRI可用性即可解释70-80%的性能提升，与图像数据是否存在无关，这种“脚手架效应”揭示了表面评估无法反映真实多模态推理能力。

AI中文摘要

可信的临床AI要求性能提升反映真实的证据整合而非表面伪影。我们在两个临床神经影像队列FOR2107（情感障碍）和OASIS-3（认知衰退）上评估了12个开放权重视觉语言模型（VLM）的二分类性能。两个数据集都包含结构MRI数据，但这些数据不携带可靠的个体级诊断信号。在这些条件下，较小的VLM在引入神经影像上下文后F1分数提升高达58%，蒸馏模型变得与规模大一个数量级的模型相当。对比置信度分析显示，仅仅在任务提示中提及MRI可用性就解释了70-80%的转变，与影像数据是否存在无关，这是模态坍塌的一个领域特定实例，我们称之为“脚手架效应”。专家评估揭示了在所有条件下捏造基于神经影像的正当理由，而偏好对齐虽然消除了引用MRI的行为，却使两种条件都退化为随机基线。我们的发现表明，表面评估不足以作为多模态推理的指标，这对VLM在临床环境中的部署有直接影响。

英文摘要

Trustworthy clinical AI requires that performance gains reflect genuine evidence integration rather than surface-level artifacts. We evaluate 12 open-weight vision-language models (VLMs) on binary classification across two clinical neuroimaging cohorts, FOR2107 (affective disorders) and OASIS-3 (cognitive decline). Both datasets come with structural MRI data that carries no reliable individual-level diagnostic signal. Under these conditions, smaller VLMs exhibit gains of up to 58% F1 upon introduction of neuroimaging context, with distilled models becoming competitive with counterparts an order of magnitude larger. A contrastive confidence analysis reveals that merely mentioning MRI availability in the task prompt accounts for 70-80% of this shift, independent of whether imaging data is present, a domain-specific instance of modality collapse we term the scaffold effect. Expert evaluation reveals fabrication of neuroimaging-grounded justifications across all conditions, and preference alignment, while eliminating MRI-referencing behavior, collapses both conditions toward random baseline. Our findings demonstrate that surface evaluations are inadequate indicators of multimodal reasoning, with direct implications for the deployment of VLMs in clinical settings.

2604.13240 2026-06-19 cs.CV cs.LG 版本更新

A High-Resolution Landscape Dataset for Concept-Based XAI With Application to Species Distribution Models

基于概念的可解释AI的高分辨率景观数据集及其在物种分布模型中的应用

Augustin de la Brosse, Damien Garreau, Thomas Houet, Thomas Corpetti

发表机构 * Université Rennes 2, CNRS, Nantes Université, Univ Brest, LETG, UMR 6554（雷恩第二大学、法国国家科学研究中心、南特大学、布雷斯特大学、LETG、UMR 6554）； LTSER Zone Atelier Armorique（Armorique 领域实验室区）； University of Würzburg, Center for Artificial Intelligence and Data Science（维尔茨堡大学人工智能与数据科学中心）

AI总结提出首个基于概念的可解释AI方法用于物种分布模型，利用高分辨率多光谱和LiDAR无人机影像构建景观概念数据集，通过Robust TCAV量化景观概念对模型预测的影响，案例研究验证了方法的有效性。

AI中文摘要

绘制物种空间分布对于保护政策和入侵物种管理至关重要。物种分布模型（SDMs）是完成此任务的主要工具，具有两个目的：实现稳健的预测性能，同时提供关于分布驱动因素的生态见解。然而，深度学习SDMs日益增长的复杂性使得提取这些见解更具挑战性。为了调和这些目标，我们提出了首个基于概念的可解释AI（XAI）在SDMs中的实现。我们利用Robust TCAV（基于概念激活向量的测试）方法量化景观概念对模型预测的影响。为此，我们提供了一个新的开放获取的景观概念数据集，该数据集源自高分辨率多光谱和LiDAR无人机影像。它包括跨越15个不同景观概念的653个斑块和1,450个随机参考斑块，旨在适用于广泛的物种。我们通过两类水生昆虫（襀翅目和毛翅目）的案例研究，使用两个卷积神经网络和一个视觉Transformer来展示这种方法。结果表明，基于概念的XAI有助于根据专家知识验证SDMs，同时发现能够产生新生态假说的新颖关联。Robust TCAV还提供了景观层面的信息，对政策制定和土地管理有用。代码和数据集公开可用。

英文摘要

Mapping the spatial distribution of species is essential for conservation policy and invasive species management. Species distribution models (SDMs) are the primary tools for this task, serving two purposes: achieving robust predictive performance while providing ecological insights into the driving factors of distribution. However, the increasing complexity of deep learning SDMs has made extracting these insights more challenging. To reconcile these objectives, we propose the first implementation of concept-based Explainable AI (XAI) for SDMs. We leverage the Robust TCAV (Testing with Concept Activation Vectors) methodology to quantify the influence of landscape concepts on model predictions. To enable this, we provide a new open-access landscape concept dataset derived from high-resolution multispectral and LiDAR drone imagery. It includes 653 patches across 15 distinct landscape concepts and 1,450 random reference patches, designed to suit a wide range of species. We demonstrate this approach through a case study of two aquatic insects, Plecoptera and Trichoptera, using two Convolutional Neural Networks and one Vision Transformer. Results show that concept-based XAI helps validate SDMs against expert knowledge while uncovering novel associations that generate new ecological hypotheses. Robust TCAV also provides landscape-level information, useful for policy-making and land management. Code and datasets are publicly available.

2605.20448 2026-06-19 cs.CV cs.LG 版本更新

逐点是否无意义？基于图神经网络的降水临近预报的多模态消融研究

Ophélia Miralles, Máté Mile, Christoffer Artturi, Thomas Nipen, Ivar Seierstad

发表机构 * Norwegian Meteorological Institute（挪威气象研究所）

AI总结本研究通过多模态图神经网络系统，消融分析雷达、数值预报、地面观测、卫星数据及训练损失对降水临近预报的影响，发现各模态分别改善不同方面，点观测虽提升局部但需结合损失函数和不确定性表示才能优化雷达场。

AI中文摘要

稀疏点观测在降水临近预报中日益可用，但尚不清楚它们能在多大程度上改善密集雷达场预报。我们通过北欧雷达区域的多模态图神经网络临近预报系统部分回答了这个问题。该模型预测未来两小时内每五分钟的降雨率，并采用雷达历史、MEPS数值天气预报、Netatmo地面观测、MSG卫星通道、随机噪声和基于CRPS的集合损失的不同组合进行训练。本研究设计为对操作相关信源和训练目标的消融。我们比较了仅雷达、NWP信息、站点信息、卫星信息、噪声增强和基于CRPS的配置，使用雷达网格、站点位置、降雨起始的互补诊断，以及oracle、位移和幅度评分。结果表明，每个信源改善了预报问题的不同方面。MEPS稳定了仅雷达外推，Netatmo观测改善了局部站点和起始诊断，卫星预测因子减少了某些站点级偏差，但在确定性使用时可能过早激活降雨。基于CRPS的配置提供了最一致的雷达网格增益，而卫星与CRPS的组合设置给出了最佳的整体oracle/DAS评分。这些结果不支持点观测对临近预报无用的结论，但表明局部观测技能和空间相干雷达场技能是不同的目标。实际意义是，稀疏观测可以提供有用的局部约束，但它们对雷达类场的益处取决于训练损失、不确定性表示以及观测支持在模型中的编码方式。

英文摘要

Sparse point observations are increasingly available for precipitation nowcasting, but it is unclear how much they improve dense radar-field forecasts. We partially address this question with a multimodal graph neural network nowcasting system over the Nordic radar domain. The model predicts rain rate every five minutes up to two hours ahead and is trained with different combinations of radar history, MEPS numerical weather prediction, Netatmo surface observations, MSG satellite channels, stochastic noise, and CRPS-based ensemble losses. The study is designed as an ablation of operationally relevant information sources and training objectives. We compare radar-only, NWP-informed, station-informed, satellite-informed, noise-augmented, and CRPS-based configurations using complementary diagnostics on the radar grid, at station locations, for rain onset, and through oracle, displacement, and amplitude scores. The results show that each source improves a different part of the forecast problem. MEPS stabilises radar-only extrapolation, Netatmo observations improve local station and onset diagnostics, and satellite predictors reduce some station-level biases but may activate rain too early when used deterministically. CRPS-based configurations provide the most consistent radar-grid gains, while the combined satellite and CRPS setup gives the best overall oracle/DAS score. These results do not support the conclusion that point observations are uninformative for nowcasting, but they show that local observational skill and spatially coherent radar-field skill are distinct targets. The practical implication is that sparse observations can provide useful local constraints, but their benefit for radar-like fields depends on the training loss, uncertainty representation, and how observation support is encoded in the model.
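The CRPS-based ensemble losses mentioned above score a whole ensemble against one observation. The sketch below shows the standard sample-based CRPS estimator for a scalar quantity such as rain rate at one grid cell; the paper's exact loss (for example, any fair-CRPS variant or spatial weighting) is not stated in the abstract, so this form and the example numbers are assumptions.

```python
def ensemble_crps(members, observation):
    """Sample-based CRPS: E|X - y| - 0.5 * E|X - X'|.

    members     -- list of ensemble forecasts for one scalar target
    observation -- the verifying value
    Lower is better; for a single member it reduces to the absolute error.
    """
    m = len(members)
    accuracy = sum(abs(x - observation) for x in members) / m
    spread = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return accuracy - spread


# Hypothetical rain rates (mm/h) from a 4-member ensemble and one observation.
forecast = [0.0, 0.5, 1.0, 2.5]
observed = 1.0
print(ensemble_crps(forecast, observed))
```

The second term rewards ensemble spread, so an ensemble that brackets the observation scores better than a single deterministic forecast that misses it by the same average distance.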

2606.19245 2026-06-19 cs.AI cs.LG 版本更新

TxBench-PP: Analyzing AI Agent Performance on Small-Molecule Preclinical Pharmacology

TxBench-PP：分析AI代理在小分子临床前药理学中的表现

Hannah Le, Ramesh Ramasamy, Alex Urrutia, Mahsa Yazdani, Tim Proctor, Kenny Workman

发表机构 * LatchBio

AI总结提出TxBench-PP基准，用于评估AI代理从真实实验数据中恢复临床前药理学结论的能力，测试显示最强配置Claude Opus 4.8 / Pi仅通过59.3%的端点尝试。

AI中文摘要

人工智能（AI）代理有望通过压缩解释和决策循环来加速药物发现，但实际部署需要基于现实程序决策的可信评估。我们引入了TherapeuticsBench临床前药理学（TxBench-PP），这是一个针对小分子临床前药理学的可验证基准，也是更广泛的TherapeuticsBench在药物发现阶段和治疗模式中的首个聚焦切片。TxBench-PP测试代理是否能够从真实实验数据中恢复准确的结论，而非从文献中记忆的事实。该基准包含100个评估，按程序阶段、实验类型和任务结构索引，涵盖作用机制（MoA）和药效学（PD）推理、化合物-靶点结合、因果靶点验证、可开发性与安全性以及转化疗效。代理接收现实的工作流程快照，在编码环境中检查文件，并返回确定性评分的结构化答案。在16个模型-工具配置（包括11个模型和4,800条轨迹）中，没有系统能够可靠地恢复临床前药理学决策。最强配置Claude Opus 4.8 / Pi通过了59.3%的端点尝试（178/300；95% CI, 51.1-67.6），其次是GPT-5.5 / Pi，为55.3%（166/300；47.0-63.6）。

英文摘要

Artificial intelligence (AI) agents promise to accelerate drug discovery by compressing interpretation and decision-making loops, but practical deployment requires trusted evaluation on realistic program decisions. We introduce TherapeuticsBench Preclinical Pharmacology (TxBench-PP), a verifiable benchmark for small-molecule preclinical pharmacology and the first focused slice of a broader TherapeuticsBench effort across drug-discovery stages and therapeutic modalities. TxBench-PP tests whether agents can recover accurate conclusions from real-world assay data rather than memorized facts from literature. The benchmark contains 100 evaluations indexed by program stage, assay type, and task structure, spanning mechanism-of-action (MoA) and pharmacodynamic (PD) reasoning, compound-target engagement, causal target validation, developability and safety, and translational efficacy. Agents receive realistic workflow snapshots, inspect files in a coding environment, and return structured answers graded deterministically. Across 16 model-harness configurations, comprising 11 models and 4,800 trajectories, no system reliably recovered preclinical pharmacology decisions. The strongest configuration, Claude Opus 4.8 / Pi, passed 59.3% of endpoint attempts (178/300; 95% CI, 51.1-67.6), followed by GPT-5.5 / Pi at 55.3% (166/300; 47.0-63.6).

2509.24725 2026-06-19 cs.LG cs.AI 版本更新

Q-Net: Queue Length Estimation via Kalman-based Neural Networks

Q-Net：基于卡尔曼神经网络的队列长度估计

Ting Gao, Elvin Isufi, Winnie Daamen, Erik-Sander Smits, Serge Hoogendoorn

发表机构 * University of Amsterdam（阿姆斯特丹大学）； Delft University of Technology（代尔夫特理工大学）

AI总结本文提出Q-Net框架，通过结合卡尔曼滤波与神经网络，解决信号交叉口队列长度估计中的数据融合问题，提升空间转移性和实时性，实现无需昂贵传感设备的准确队列估计。

Journal ref Transportation Research Part C: Emerging Technologies, Volume 190, September 2026, Article 105809

DOI: 10.1016/j.trc.2026.105809

AI中文摘要

估计信号交叉口的队列长度一直是交通管理中的长期挑战。尽管存在两类隐私保护的数据源，车流的部分可观测性仍使这一任务变得复杂：(i) 靠近停止线的环形检测器提供的车辆计数汇总数据，以及 (ii) 提供路段平均速度测量的汇总浮动车数据 (aFCD)。然而，如何将这些具有不同空间和时间分辨率的数据源整合用于队列长度估计仍不明确。为此，本文提出Q-Net：一种基于状态空间形式的队列估计框架。该设计解决了队列建模中的关键挑战，如交通守恒假设被违反的情况。Q-Net遵循卡尔曼预测-更新结构，并在状态演化和测量模型中保持物理可解释性。Q-Net使用AI增强的卡尔曼滤波器从数据中学习时变的增益动态。该框架支持实时实现，并通过将aFCD测量分组为固定大小的局部组来提高空间转移性，使可学习参数的数量与路段长度无关。在荷兰鹿特丹城市主干道上的评估显示，Q-Net优于基线方法，能够准确追踪队列的形成和消散，并缓解aFCD引起的延迟。通过结合数据效率、可解释性、实时适用性和空间转移性，Q-Net在无需昂贵的传感基础设施（如摄像头或雷达）的情况下实现了准确的队列长度估计。

英文摘要

Estimating queue lengths at signalized intersections is a long-standing challenge in traffic management. Partial observability of vehicle flows complicates this task despite the availability of two privacy-preserving data sources: (i) aggregated vehicle counts from loop detectors near stop lines, and (ii) aggregated floating car data (aFCD) that provide segment-wise average speed measurements. However, how to integrate these sources with differing spatial and temporal resolutions for queue length estimation is rather unclear. Addressing this question, we present Q-Net: a queue estimation framework built upon a state-space formulation. This design addresses key challenges in queue modeling, such as violations of traffic conservation assumptions. Q-Net follows the Kalman predict-update structure and maintains physical interpretability in both the state evolution and measurement models. Q-Net uses an AI-augmented Kalman filter to learn time-varying gain dynamics from data. The framework supports real-time implementation and improves spatial transferability by grouping aFCD measurements into fixed-size local groups, making the number of learnable parameters independent of section length. Evaluations on urban main roads in Rotterdam, the Netherlands, show that Q-Net outperforms baseline methods, tracks queue formation and dissipation accurately, and mitigates aFCD-induced delays. By combining data efficiency, interpretability, real-time applicability, and spatial transferability, Q-Net makes accurate queue length estimation possible without costly sensing infrastructure like cameras or radar.
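The Kalman predict-update structure that Q-Net follows can be sketched with a textbook scalar filter. This is not Q-Net itself: here the gain is computed analytically from assumed noise variances, whereas the abstract says Q-Net learns time-varying gain dynamics from data. The state meaning, the input term, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    x: float  # state estimate (e.g. queue length in vehicles, hypothetical)
    p: float  # variance of the state estimate
    q: float  # process noise variance (assumed known)
    r: float  # measurement noise variance (assumed known)

    def predict(self, u=0.0):
        """State evolution: previous state plus a known net inflow u."""
        self.x = self.x + u
        self.p = self.p + self.q

    def update(self, z):
        """Correct the prediction with a measurement z; returns the gain."""
        gain = self.p / (self.p + self.r)
        self.x = self.x + gain * (z - self.x)
        self.p = (1.0 - gain) * self.p
        return gain


kf = ScalarKalman(x=0.0, p=1.0, q=0.1, r=1.0)
# Hypothetical (net inflow, noisy queue measurement) pairs per time step.
for inflow, measurement in [(2.0, 2.4), (1.0, 2.7), (-1.0, 2.2)]:
    kf.predict(u=inflow)
    gain = kf.update(measurement)
    print(round(kf.x, 3), round(gain, 3))
```

Replacing the analytic `gain` with a learned, input-dependent value is the kind of modification the abstract describes, while the predict and update steps keep their physical interpretation.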

2412.18980 2026-06-19 cs.LG 版本更新

Evaluating deep learning models for fault diagnosis of a rotating machinery with epistemic and aleatoric uncertainty

在认知不确定性和偶然不确定性下评估用于旋转机械故障诊断的深度学习模型

Reza Jalayer, Masoud Jalayer, Andrea Mor, Carlotta Orsenigo, Carlo Vercellis

发表机构 * Faculty of Engineering and Natural Sciences（工程与自然科学学院）； Department of Information and Communications Engineering（信息与通信工程系）； Department of Management, Economics and Industrial Engineering（管理、经济与工业工程系）

AI总结本文首次全面比较了不确定性感知深度学习架构在旋转机械故障诊断中的表现，发现深度集成模型在检测未知故障和噪声数据方面优于其他方法。

AI中文摘要

不确定性感知深度学习模型最近在故障诊断中受到关注，作为一种在来自未见故障（认知不确定性）或噪声存在（偶然不确定性）的分布外数据出现时促进可靠故障检测的方法。在本文中，我们首次对旋转机械故障诊断中最先进的不确定性感知深度学习架构进行了全面比较研究，其中研究了受认知不确定性影响的不同场景和不同类型的偶然不确定性。所选架构包括通过dropout采样、贝叶斯神经网络和深度集成。此外，为了区分不同场景中的分布内和分布外数据，我们交替应用了两个不确定性阈值，其中一个是在本文中引入的。我们的实证结果为必须部署实际不确定性感知故障诊断系统的从业者和研究人员提供了指导。特别是，它们揭示了在存在认知不确定性的情况下，所有深度学习模型都能够有效地检测到平均而言所有场景中相当一部分分布外数据。然而，深度集成模型显示出优越的性能，与用于区分的阈值无关。在存在偶然不确定性的情况下，噪声水平起着重要作用。具体来说，低噪声水平阻碍了模型有效检测分布外数据的能力。即使在这种情况下，深度集成模型也表现出较温和的性能下降，主导其他模型。这些成就，加上它们更短的推理时间，使得深度集成架构成为首选。

英文摘要

Uncertainty-aware deep learning (DL) models have recently gained attention in fault diagnosis as a way to promote the reliable detection of faults when out-of-distribution (OOD) data arise from unseen faults (epistemic uncertainty) or the presence of noise (aleatoric uncertainty). In this paper, we present the first comprehensive comparative study of state-of-the-art uncertainty-aware DL architectures for fault diagnosis in rotating machinery, where different scenarios affected by epistemic uncertainty and different types of aleatoric uncertainty are investigated. The selected architectures include sampling by dropout, Bayesian neural networks, and deep ensembles. Moreover, to distinguish between in-distribution and OOD data in the different scenarios, two uncertainty thresholds, one of which is introduced in this paper, are alternatively applied. Our empirical findings offer guidance to practitioners and researchers who have to deploy real-world uncertainty-aware fault diagnosis systems. In particular, they reveal that, in the presence of epistemic uncertainty, all DL models are capable of effectively detecting, on average, a substantial portion of OOD data across all the scenarios. However, deep ensemble models show superior performance, independently of the uncertainty threshold used for discrimination. In the presence of aleatoric uncertainty, the noise level plays an important role. Specifically, low noise levels hinder the models' ability to effectively detect OOD data. Even in this case, however, deep ensemble models exhibit a milder degradation in performance, dominating the others. These achievements, combined with their shorter inference time, make deep ensemble architectures the preferred choice.
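A deep ensemble flags OOD inputs by measuring how uncertain the averaged prediction is and comparing that with a threshold. The sketch below uses predictive entropy and a fixed cut-off; both the uncertainty measure and the threshold value are assumptions for illustration, since the abstract does not specify the two thresholds the paper applies.

```python
import math


def mean_probabilities(member_probs):
    """Average the class-probability vectors of the ensemble members."""
    n = len(member_probs)
    return [sum(col) / n for col in zip(*member_probs)]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def is_out_of_distribution(member_probs, threshold):
    """Flag a sample when the entropy of the averaged prediction is high."""
    return entropy(mean_probabilities(member_probs)) > threshold


# Three hypothetical ensemble members, three fault classes.
known_fault = [[0.98, 0.01, 0.01]] * 3            # members agree
unseen_fault = [[0.90, 0.05, 0.05],
                [0.05, 0.90, 0.05],
                [0.05, 0.05, 0.90]]                # members disagree

THRESHOLD = 0.5  # illustrative value, would be tuned on validation data
print(is_out_of_distribution(known_fault, THRESHOLD))
print(is_out_of_distribution(unseen_fault, THRESHOLD))
```

Each member is confident on the unseen fault, but they disagree, so the averaged prediction is close to uniform and its entropy approaches log(3).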

2502.06866 2026-06-19 cs.LG cs.AI econ.EM stat.AP stat.ML 版本更新

Global Ease of Living Index: a machine learning framework for longitudinal analysis of major economies

全球生活便利指数：面向主要经济体纵向分析的机器学习框架

Arun Kumar Selvaraj, Tanay Panat, Rohitash Chandra

发表机构 * Transitional Artificial Intelligence Research Group, School of Mathematics and Statistics（过渡人工智能研究组，数学与统计学学院）； Centre for Artificial Intelligence and Innovation（人工智能与创新中心）； Pingla Institute（Pingla研究所）

AI总结提出全球生活便利指数，结合社会经济和基础设施因素，利用机器学习处理缺失数据，并通过主成分分析和因子分析降维，为政策制定者提供改善生活质量的可操作工具。

AI中文摘要

全球经济、地缘政治条件以及COVID-19疫情等破坏性事件对生活成本和生活质量产生了巨大影响。理解主要经济体中生活成本和生活质量的长期影响至关重要。一个透明且全面的生活指数必须包含生活条件的多个维度。在本研究中，我们提出了一种通过全球生活便利指数量化生活质量的方法，该指数将各种社会经济和基础设施因素整合为一个单一综合得分。我们的指数利用定义生活水平的经济指标，这有助于针对特定领域进行干预改进。我们提出了一个机器学习框架来处理特定国家某些经济指标的数据缺失问题。然后，我们整理并更新数据，并使用降维方法（主成分分析和因子分析）创建自1970年以来主要经济体的生活便利指数。我们的工作通过为政策制定者提供识别需要改进领域（如医疗系统、就业机会和公共安全）的实用工具，显著丰富了相关文献。我们的方法使用开放数据和代码，易于复现并适用于各种情境，为生活质量评估的持续研究和政策制定提供了透明度和可访问性。

英文摘要

The drastic changes in the global economy, geopolitical conditions, and disruptions such as the COVID-19 pandemic have impacted the cost of living and quality of life. It is essential to comprehend the long-term implications of the cost of living and quality of life in major economies. A transparent and comprehensive living index must include multiple dimensions of living conditions. In this study, we present an approach to quantifying the quality of life through the Global Ease of Living Index that combines various socio-economic and infrastructural factors into a single composite score. Our index utilises economic indicators that define living standards, which could help in targeted interventions to improve specific areas. We present a machine learning framework to address missing data for certain economic indicators in specific countries. We then curate and update the data and use a dimensionality reduction approach (Principal Component Analysis and Factor Analysis) to create the Ease of Living Index for major economies since 1970. Our work significantly adds to the literature by offering a practical tool for policymakers to identify areas needing improvement, such as healthcare systems, employment opportunities, and public safety. Our approach with open data and code can be easily reproduced and applied to various contexts, providing transparency and accessibility for ongoing research and policy development in quality-of-life assessment.
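One way to turn several indicators into a single composite score with PCA is to standardise each indicator and project onto the first principal component. The sketch below does this in plain Python with power iteration. It is not the authors' pipeline (which also uses factor analysis and a missing-data model); the indicator values are made up, and it assumes every indicator is oriented so that higher is better.

```python
import math
import statistics


def standardise(rows):
    """Z-score each indicator (column) across countries (rows)."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def correlation_matrix(z_rows):
    """Correlation matrix of already standardised data."""
    n, d = len(z_rows), len(z_rows[0])
    return [[sum(r[i] * r[j] for r in z_rows) / n for j in range(d)]
            for i in range(d)]


def first_component(matrix, iterations=200):
    """Dominant eigenvector by power iteration, oriented to sum positive."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if sum(v) < 0:
        v = [-x for x in v]
    return v


def composite_index(rows):
    """Score of each row on the first principal component."""
    z = standardise(rows)
    loadings = first_component(correlation_matrix(z))
    return [sum(a * b for a, b in zip(row, loadings)) for row in z]


# Hypothetical indicators per country: (income, health, infrastructure).
indicators = [(1.0, 2.0, 1.5), (2.0, 4.0, 2.5), (3.0, 6.0, 3.0), (4.0, 8.0, 5.0)]
scores = composite_index(indicators)
print([round(s, 3) for s in scores])
```

Because the indicators here rise together, the first component has positive loadings on all of them and the scores preserve the country ordering.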

2604.06265 2026-06-19 cs.LG cond-mat.stat-mech quant-ph 版本更新

SMT-AD: a scalable quantum-inspired anomaly detection approach

SMT-AD：一种可扩展的量子启发式异常检测方法

Apimuk Sornsaeng, Si Min Chan, Wenxuan Zhang, Swee Liang Wong, Joshua Lim, Jonathan Pan, Dario Poletti

发表机构 * Science, Mathematics and Technology Cluster, Singapore University of Technology and Design（新加坡科技设计大学科学、数学与技术集群）； Centre for Quantum Technologies, National University of Singapore（新加坡国立大学量子技术中心）； Artificial Intelligence and Data Analytics Strategic Technology Centre, ST Engineering（ST工程人工智能与数据分析战略技术中心）； Engineering Product Development Pillar, Singapore University of Technology and Design（新加坡科技设计大学工程产品开发支柱）

AI总结提出基于多分辨率张量叠加的量子启发式异常检测方法SMT-AD，通过傅里叶辅助特征嵌入和矩阵乘积算子实现线性可扩展，在标准数据集上取得竞争性能。

Comments 12 pages, 5 figures

AI中文摘要

量子启发的张量网络算法已被证明是机器学习任务（包括异常检测）中有效且高效的模型。在此，我们提出一种高度可并行化的量子启发式方法，称为SMT-AD（Superposition of Multiresolution Tensors for Anomaly Detection）。它基于键维数为1的矩阵乘积算子的叠加，通过傅里叶辅助特征嵌入对输入数据进行变换，其中可学习参数的数量随特征大小、嵌入分辨率和矩阵乘积算子结构中附加组件的数量线性增长。我们展示了在标准数据集（包括信用卡交易）上成功的异常检测，并发现即使采用最小配置，它也能与已建立的异常检测基线相媲美。此外，它提供了一种直接的方法来减少模型权重，甚至通过突出最相关的输入特征来提高性能。

英文摘要

Quantum-inspired tensor network algorithms have been shown to be effective and efficient models for machine learning tasks, including anomaly detection. Here, we propose a highly parallelizable quantum-inspired approach, which we call SMT-AD, from Superposition of Multiresolution Tensors for Anomaly Detection. It is based upon the superposition of bond-dimension-1 matrix product operators to transform the input data with Fourier-assisted feature embedding, where the number of learnable parameters grows linearly with feature size, embedding resolutions, and the number of additional components in the matrix product operator structure. We demonstrate successful anomaly detection when applied to standard datasets, including credit card transactions, and find that, even with minimal configurations, it achieves competitive performance against established anomaly detection baselines. Furthermore, it provides a straightforward way to reduce the weight of the model and even improve the performance by highlighting the most relevant input features.

2503.04507 2026-06-19 q-bio.QM cs.CG cs.LG 版本更新

The Morse Transform for Discrete Shape Analysis

离散形状分析的Morse变换

Alexander M. Tanaka, Aras T. Asaad, Richard Cooper, Vidit Nanda

AI总结提出一种基于定向分段线性Morse理论的拓扑变换，通过记录多个高度函数下的临界点来量化嵌入对象的几何形状，生成的特征向量在配体虚拟筛选中取得最优平均AUROC。

Comments 37 pages, 3 main figures, 2 main tables, 12 appendix figures and 4 appendix tables

AI中文摘要

物体的几何形状在调节其与物理世界的相互作用中起着至关重要的作用。然而，为了统计推断或分类任务的目的，用数值描述几何信息仍然困难。在这里，我们引入了一种新的拓扑变换，它利用定向分段线性Morse理论，通过编录多个高度函数下的临界点来量化嵌入对象的几何形状。该Morse变换的输出记录了表征底层形状的临界点的高度和局部拓扑类型（峰、谷或鞍点），保留了比欧拉特征变换更精细的信息，同时自然优先考虑形状的最外层区域。关键的是，该输出可以进一步压缩为丰富而紧凑的特征向量。我们将Morse特征向量作为配体虚拟筛选（LBVS）的描述符进行基准测试，这本质上依赖于分子的形状。在常见的梯度提升树分类流程下，与其他拓扑变换描述符和标准基于形状的LBVS描述符相比，Morse描述符实现了最高的平均AUROC。

英文摘要

The geometry of an object plays a vital role in modulating its interactions with the physical world. It nevertheless remains difficult to describe geometric information numerically for the purposes of statistical inference or classification tasks. Here, we introduce a new topological transform which leverages directional piecewise-linear Morse theory to quantify the geometry of an embedded object by cataloguing critical points across multiple height-functions. The output of this Morse transform records both the heights and the local topological type (peak, trough or saddle) of the critical points that characterise the underlying shape, retaining finer information than the Euler characteristic transform whilst naturally prioritising a shape's outermost regions. Crucially, this output can be further compressed into a rich but compact feature vector. We benchmark the Morse feature vector as a descriptor for ligand-based virtual screening (LBVS), which intrinsically depends on the shape of molecules. Under a common gradient-boosted tree classification pipeline, Morse descriptors achieve the highest mean AUROC when compared to other topological transform descriptors and to standard shape-based LBVS descriptors.

2605.15231 2026-06-19 cs.LG cs.CV 版本更新

Mask-Morph Graph U-Net: A Generalisable Mesh-Based Surrogate for Crashworthiness Field Prediction under Large Geometric Variation

Mask-Morph Graph U-Net：面向大几何变化下耐撞性场预测的可泛化网格替代模型

Haoran Li, Tobias Lehrer, Yingxue Zhao, Haosu Zhou, Philipp Stocker, Tobias Pfaff, Marcus Wagner, Nan Li

发表机构 * Dyson School of Design Engineering, Imperial College London（伦敦帝国理工学院戴森设计工程学院）； TUM School of Engineering and Design, Technical University of Munich（慕尼黑工业大学工程与设计学院）； Faculty of Mechanical Engineering, OTH Regensburg（雷根斯堡东巴伐利亚应用技术大学机械工程学院）； NVIDIA（NVIDIA公司）

AI总结本文提出Mask-Morph Graph U-Net，通过特征对齐的重心参数化和节点掩码预训练，提升网格模拟的通用性和数据效率，适用于耐撞性设计探索。

Comments 48 pages, 15 figures, journal paper under review

AI中文摘要

非线性有限元碰撞模拟准确但计算成本高，限制了其在迭代设计优化中的应用。基于图神经网络（GNN）的机器学习替代模型提供了更快的替代方案。消息传递GNN广泛用于网格模拟，其共享的节点和边更新函数在不同图结构间具有较好的通用性。相比之下，非共享的边特定聚合层能更准确地捕捉非线性关系，但通常需要固定的图连接性，限制了通用性。本文提出Mask-Morph Graph U-Net（MMGUNet），一种用于解决分层图U-Net架构局限性的实用方法，该架构使用边特定的下采样和上采样层。边特定层要求粗图连接性固定。为了在保留此连接性的同时提高空间对应性，所提出的方法先通过特征对齐的重心参数化将粗化图层次变形到每个输入网格，然后构建跨图边。它进一步在监督预训练中应用节点掩码，随后进行参数高效的微调，其中参数量大的边特定层被冻结。所提出的方法在分布内、分布外和跨组件迁移设置中使用平均欧氏距离和最大侵入量百分比误差进行评估。结果表明，粗图变形相对于固定粗图基线提高了测试准确性，而掩码监督预训练减少了训练-测试差异并提高了迁移期间的数据效率。所提出的模型还比外部基线取得了更低的预测误差。这些结果展示了一条通往可重用、数据高效的网格替代模型的实用路径，可用于耐撞性设计探索。

英文摘要

Nonlinear finite element crash simulations are accurate but computationally expensive, limiting their use in iterative design optimisation. Machine-learning surrogate models based on graph neural networks (GNNs) offer a faster alternative. Message-passing GNNs are widely used for mesh simulation, and their shared node and edge update functions are relatively generalisable across varying graph structures. By contrast, non-shareable edge-specific aggregation layers can capture nonlinear relationships more accurately but usually require fixed graph connectivity, which limits generalisability. This paper presents Mask-Morph Graph U-Net (MMGUNet), a practical approach to addressing the limitation of hierarchical Graph U-Net architectures that use edge-specific downsampling and upsampling layers. Fixed coarse graph connectivity is required for edge-specific layers. To retain this while improving spatial correspondence, the proposed method morphs the coarsened graph hierarchy to each input mesh using feature-aligned barycentric parameterisation before constructing cross-graph edges. It further applies node masking during supervised pretraining, followed by parameter-efficient fine-tuning in which high-parameter edge-specific layers are frozen. The proposed approach is evaluated in in-distribution, out-of-distribution, and cross-component transfer settings using mean Euclidean distance and maximum intrusion percentage error. Results show that coarse-graph morphing improves test accuracy relative to a fixed-coarse-graph baseline, while masked supervised pretraining reduces the train-test discrepancy and improves data efficiency during transfer. The proposed model also achieves lower prediction error compared with external baselines. These results demonstrate a practical route toward reusable, data-efficient mesh-based surrogate modelling for crashworthiness design exploration.

2606.12500 2026-06-19 cs.LG cs.AI version update

Improving Crash Frequency Prediction from Simulated Traffic Conflicts Using Machine Learning Based Microsimulation

基于机器学习的微观仿真从模拟交通冲突改进碰撞频率预测

Xian Liu, Carlo G. Prato, Gustav Markkula

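AI summary: This paper replaces conventional rule-based behaviour models with machine learning (ML) behaviour models in traffic microsimulation and applies Extreme Value Theory to the simulated conflicts to predict crash frequency. Tests at five signalised intersections in Leeds, UK, show that the ML model improves prediction accuracy without location-specific calibration.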

AI中文摘要

交通微观仿真结合替代安全措施越来越多地被用作历史碰撞数据的主动替代方案，用于预测当前或计划道路基础设施设计的碰撞频率。然而，现有的基于微观仿真的安全研究采用了简化的基于规则的行为模型，这些模型能较好地再现交通流，但往往无法生成真实的冲突动态，限制了碰撞预测的准确性。机器学习（ML）行为模型的最新进展提供了一个有希望的机会，通过直接从大规模轨迹数据集中学习人类驾驶行为，可能提高微观仿真的真实性和碰撞频率预测。为了研究这种可能性，我们对英国利兹的五个真实信号交叉口进行了交通微观仿真，使用了标准的基于规则模型和最先进的ML模型。使用二维碰撞时间指标分析模拟车辆轨迹以识别模拟冲突，然后使用极端值理论建模以预测碰撞频率。结果表明，ML模型的冲突产生的碰撞预测与实际碰撞数据一致，而基于规则的模型由于缺乏对特定模拟交叉口的模型校准，无法产生有意义的预测。直接使用ML生成的模拟碰撞来预测实际碰撞频率也产生了较差的结果，这表明尽管当前的ML模型可以真实地再现冲突，但尚不能生成真实的碰撞。总体而言，研究结果表明，基于ML的行为模型在无需特定地点模型校准的情况下，有望从模拟冲突中改进碰撞预测，并为基于ML的交通微观仿真指明了明确的未来方向。

English abstract

Traffic microsimulation combined with surrogate safety measures has increasingly been used as a proactive alternative to historical crash data for predicting crash frequency for current or planned road infrastructure designs. However, existing microsimulation-based safety studies have adopted simplified rule-based behaviour models, which reproduce traffic flow reasonably well but often fail to generate realistic conflict dynamics, limiting crash prediction accuracy. Recent advances in machine learning (ML)-based behaviour models offer a promising opportunity to potentially improve microsimulation realism and crash frequency predictions by learning human driving behaviour directly from large-scale trajectory datasets. To investigate this possibility, traffic microsimulation was conducted for five real-world signalised intersections in Leeds, UK, using both a standard rule-based model and a state-of-the-art ML model. Simulated vehicle trajectories were analysed using a two-dimensional Time-to-Collision metric to identify simulated conflicts, which were then modelled using Extreme Value Theory to predict crash frequency. Results show that conflicts from the ML model yielded crash predictions in line with the real-world crash data, whereas the rule-based model did not permit meaningful predictions, presumably due to a lack of model calibration to the specific simulated intersections. Directly using ML-generated simulated crashes to predict real-world crash frequency also yielded poor results, suggesting that while current ML models can realistically reproduce conflicts, they are not yet able to generate realistic crashes. Overall, the findings demonstrate that ML-based behaviour models are promising for improving crash prediction from simulated conflicts, without a need for location-specific model calibration, and suggest clear future directions for ML-based traffic microsimulation.
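
To make the Time-to-Collision step more concrete, here is a minimal sketch of a two-dimensional TTC for two road users moving at constant velocity. It approximates each vehicle as a disc, which is an assumption for illustration only; the abstract does not specify the exact 2D TTC formulation, and the function name `ttc_2d` is hypothetical.

```python
import math


def ttc_2d(p1, v1, p2, v2, radius_sum):
    """Time until two constant-velocity discs first touch, or math.inf if they never do."""
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]      # relative position
    vx, vy = v2[0] - v1[0], v2[1] - v1[1]      # relative velocity
    c = rx * rx + ry * ry - radius_sum ** 2
    if c <= 0:
        return 0.0                              # discs already overlap
    a = vx * vx + vy * vy
    b = 2 * (rx * vx + ry * vy)
    if a == 0 or b >= 0:
        return math.inf                         # not closing on each other
    disc = b * b - 4 * a * c
    if disc < 0:
        return math.inf                         # paths pass without contact
    return (-b - math.sqrt(disc)) / (2 * a)     # earliest contact time
```

In a pipeline like the one described, the minimum TTC per encounter would be compared against a conflict threshold, and the extreme values would then be passed to an Extreme Value Theory model; both of those steps are outside this sketch.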

2606.18933 2026-06-19 cs.LG cs.IR stat.ME version update

Zero-Shot Active Feature Acquisition via LLM-Elicitation

基于LLM启发式的零样本主动特征获取

Binyamin Perets, Natalie Mendelson, Shiran Vainberg, Yehuda Chowers, Shai Shen-Orr, Shie Mannor

Affiliations: Faculty of EE, Technion; Faculty of Medicine, Technion; CytoReason; NVIDIA

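AI summary: Proposes a zero-shot active feature acquisition framework that elicits the sufficient statistics of a Markov random field from an LLM, addressing the shortage of labeled data; it outperforms existing methods on a cohort of IBD patients.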

AI中文摘要

主动特征获取（AFA）顺序选择要观察的特征以达成分类或排序决策。其主要局限性在于依赖大量标注数据来拟合指导获取的概率模型。大型语言模型（LLM）提供无监督的领域知识，但作为序列规划者表现不佳。要求其同时知晓和决策会混淆最好分开的能力。这里，我们通过严格的启发式方法开发了一个零样本AFA框架：仅要求LLM返回其可被信任返回的内容，即马尔可夫随机场（MRF）的充分统计量——一元偏差和成对协变。我们将该框架应用于两个场景：二分类和top-$k$识别。实践中，LLM可靠地仅返回判别性统计量，即区分类别而非孤立每个类别的统计量，这阻碍了经典AFA。我们应用最大熵闭包来解决这种规范模糊性。我们在炎症性肠病（IBD）患者队列上进行评估，这是一个活跃的临床环境，其中诊断模糊性和患者异质性阻碍了稳定的治疗策略。我们的框架在真实标签和其自身提取的信念上均优于LLM。在最关键的地方，即最困难的患者上，我们的top-$k$获取策略显著优于所有现有方法。

English abstract

Active feature acquisition (AFA) sequentially selects which features to observe to reach a classification or ranking decision. Its central limitation is reliance on large amounts of labeled data to fit the probabilistic models that guide acquisition. Large language models (LLMs) supply unsupervised domain knowledge, but are poor sequential planners. Asking one to both know and decide conflates capabilities best kept separate. Here, we develop a framework for zero-shot AFA through disciplined elicitation: asking the LLM only for what it can be trusted to return, the unary deviations and pairwise co-variations that are the sufficient statistics of a Markov random field (MRF). We apply our framework to two settings: binary classification and top-$k$ identification. In practice, the LLM reliably returns only discriminative statistics, what distinguishes the classes rather than each class in isolation, which precludes classical AFA. We apply a maximum-entropy closure that resolves this gauge ambiguity. We evaluate on a cohort of Inflammatory Bowel Disease (IBD) patients, an active clinical setting where diagnostic ambiguity and patient heterogeneity obstruct stable treatment strategies. Our framework outperforms the LLM both on real labels and on its own extracted beliefs. Where it matters most, on the hardest patients, our top-$k$ acquisition policy markedly outperforms all existing methods.

2503.17386 2026-06-19 eess.SY cs.LG cs.SY version update

A graph neural network surrogate model for mesh-based crashworthiness prediction of vehicle panel components

基于图神经网络的网格级车辆面板部件耐撞性预测代理模型

Haoran Li, Yingxue Zhao, Haosu Zhou, Tobias Pfaff, Nan Li

Affiliations: Dyson School of Design Engineering, Imperial College London; NVIDIA

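AI summary: Proposes the Recurrent Graph U-Net (ReGUNet) surrogate model, which represents finite element meshes as graphs and combines a hierarchical architecture with recurrence to predict the dynamic deformation and crashworthiness indicators of vehicle panel components such as B-pillars efficiently and accurately.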

Comments Accepted manuscript version. Final published version available in Results in Engineering via DOI: 10.1016/j.rineng.2026.110925

Journal ref Results in Engineering 30 (2026) 110925

DOI: 10.1016/j.rineng.2026.110925

AI中文摘要

耐撞性是安全关键车辆面板部件（如B柱）设计中的关键性能指标。有限元（FE）模拟广泛用于评估碰撞响应，但对于大规模非线性碰撞场景，特别是当集成到迭代设计和优化过程中时，计算成本仍然很高。尽管基于机器学习的代理模型已被开发用于快速耐撞性分析，但它们在对复杂三维部件的详细表示方面存在局限性。图神经网络（GNN）已成为处理复杂结构数据的有前景的解决方案。然而，现有的GNN模型通常缺乏足够的精度和计算效率以满足工业需求。本文提出了递归图U-Net（ReGUNet），一种用于车辆面板部件耐撞性分析的基于图的代理模型。通过将有限元网格表示为图形式，该模型自然地适应复杂的非规则结构几何。其层次架构提高了计算效率和精度，而递归的引入增强了多时间步长上时间预测的稳定性。使用不同几何形状的热冲压钢B柱的侧面碰撞案例研究来生成训练数据集。训练后的模型在预测未见过的部件设计的动态变形行为和耐撞性指标方面表现出高精度。与基线方法相比，ReGUNet在平均变形预测误差上实现了超过52%的降低，同时计算效率显著提高。ReGUNet提供了快速可靠的耐撞性评估，从而加速了车辆面板部件的设计周期。

English abstract

Crashworthiness is a key performance measure in the design of safety-critical vehicle panel components such as B-pillars. Finite element (FE) simulations are widely used to evaluate crash responses but remain computationally expensive for large-scale, nonlinear impact scenarios, particularly when integrated into iterative design and optimisation processes. Although machine learning-based surrogate models have been developed for rapid crashworthiness analysis, they exhibit limitations in the detailed representation of complex three-dimensional components. Graph Neural Networks (GNNs) have emerged as a promising solution for processing data with complex structures. However, existing GNN models often lack sufficient accuracy and computational efficiency to meet industrial demands. This paper proposes Recurrent Graph U-Net (ReGUNet), a graph-based surrogate model for crashworthiness analysis of vehicle panel components. By representing FE meshes in graph form, the model naturally accommodates complex irregular structural geometries. Its hierarchical architecture improves computational efficiency and accuracy, while the introduction of recurrence enhances the stability of temporal predictions over multiple time steps. A side-impact case study of hot-stamped steel B-pillars with varying geometries is used to generate the training dataset. The trained model demonstrates high accuracy in predicting the dynamic deformation behaviour and crashworthiness indicators of previously unseen component designs. ReGUNet achieves over a 52% reduction in the average deformation prediction error relative to baseline methods, together with markedly improved computational efficiency. ReGUNet provides rapid and reliable crashworthiness assessments, which in turn accelerates the design cycle of vehicle panel components.

2505.18726 2026-06-19 cs.SD cs.LG eess.AS version update

Bioacoustic Geolocation: Species Sounds as Geographic Signals

生物声学地理定位：物种声音作为地理信号

Mustafa Chasmai, Wuao Liu, Subhransu Maji, Grant Van Horn

Affiliations: University of Massachusetts, Amherst

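AI summary: This paper studies global-scale geolocation from sound alone, exploiting the cues that species' geographic ranges leave in bioacoustic signals; it proposes a geolocation method that combines species range prediction with retrieval and demonstrates the potential of multimodal fusion.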

Comments Accepted to ICML 26

AI中文摘要

我们能否仅通过听到的声音确定某人的地理位置？声学信号是否足以定位到国家、州甚至城市？在这项工作中，我们应对全球尺度音频地理定位的挑战，特别关注野生动物和自然声音。我们假设生物声学信号包含信息丰富的地理定位线索，因为物种具有明确的地理分布范围。为了验证这一假设，我们对图像地理定位和声景映射方法进行基准测试，设计预言机和以物种为中心的基线，并提出一种结合物种范围预测与基于检索的地理定位的混合方法。我们进一步探究地理定位是否随着物种多样性记录和跨邻近样本的时空聚合而改善。最后，我们将研究扩展到多模态地理定位，通过结合音频和视觉内容的电影案例研究。我们的结果突出了将生物声学信号纳入地理空间任务的潜力，为物种识别和音频地理定位的未来工作提供了动力。

English abstract

Can we determine someone's geographic location solely from the sounds they hear? Are acoustic signals enough to localize within a country, state, or even city? In this work, we tackle the challenge of global-scale audio geolocation, with a particular focus on wildlife and natural sounds. We posit that bioacoustic signals contain informative geolocation cues because of well-defined geographic ranges of species. To test this hypothesis, we benchmark image geolocation and soundscape mapping methods, design oracles and species-centric baselines, and propose a hybrid approach that combines species range prediction with retrieval-based geolocation. We further ask whether geolocation improves with species-diverse recordings and spatiotemporal aggregation across neighboring samples. Finally, we extend our study to multimodal geolocation with case studies from movies that combine both audio and visual content. Our results highlight the potential of incorporating bioacoustic signals into geospatial tasks, motivating future work on species recognition and audio geolocation.

2507.19653 2026-06-19 cs.NI cs.AI cs.LG version update

On the Limitations of Ray-Tracing for Learning-Based RF Tasks in Urban Environments

关于射线追踪在城市环境中基于学习的射频任务局限性的研究

Armen Manukyan, Hrant Khachatrian, Edvard Ghukasyan, Theofanis P. Raptis

Affiliations: Yerevan State University, Yerevan, Armenia; YerevaNN, Yerevan, Armenia; Institute of Informatics and Telematics, National Research Council, Pisa, Italy

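AI summary: Evaluates the Sionna ray-tracing simulator against measurements from central Rome and finds that antenna location and orientation strongly affect fidelity while solver hyperparameters have little effect; after optimization, correlation improves by 5% to 130% and localization error falls by one-third, but residual urban noise remains a challenge.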

Comments This work was supported by funding under the bilateral agreement between CNR (Italy) and HESC MESCS RA (Armenia) as part of the DeepRF project for the 2025-2026 biennium, and by the HESC MESCS RA grant No. 22rl-052 (DISTAL)

Journal ref 2026 IEEE Wireless Communications and Networking Conference (WCNC)

DOI: 10.1109/WCNC65185.2026.11555460

AI中文摘要

我们研究了Sionna v1.0.2射线追踪在罗马市中心户外蜂窝链路中的真实感。我们使用了包含1,664个用户设备（UE）和六个名义基站（BS）站点的真实测量数据集。利用这些固定位置，我们系统地改变了主要仿真参数，包括路径深度、漫反射/镜面反射/折射标志、载波频率，以及天线的属性如高度、辐射方向和方向图。通过测量功率与仿真功率之间的Spearman相关性，以及基于RSSI指纹的k近邻定位算法，对每个基站的仿真保真度进行评分。在所有实验中，求解器超参数对所选指标的影响微不足道。相反，天线位置和方向被证明是决定性的。通过简单的贪婪优化，我们将不同基站的Spearman相关性提高了5%到130%，而仅使用仿真数据作为参考点的kNN定位误差在真实世界样本上减少了三分之一，但仍比纯真实数据的误差高一倍。因此，精确的几何形状和可信的天线模型是必要但不充分的；忠实地捕捉残余的城市噪声仍然是实现可迁移、高保真户外射频仿真的一个开放挑战。

English abstract

We study the realism of Sionna v1.0.2 ray-tracing for outdoor cellular links in central Rome. We use a real measurement set of 1,664 user equipments (UEs) and six nominal base-station (BS) sites. Using these fixed positions, we systematically vary the main simulation parameters, including path depth, diffuse/specular/refraction flags, and carrier frequency, as well as antenna properties such as altitude, radiation pattern, and orientation. Simulator fidelity is scored for each base station via the Spearman correlation between measured and simulated powers, and by a k-nearest-neighbor (kNN) localization algorithm using RSSI-based fingerprints. Across all experiments, solver hyper-parameters have an immaterial effect on the chosen metrics. By contrast, antenna locations and orientations prove decisive. With simple greedy optimization, we improve the Spearman correlation by 5% to 130% for various base stations, and the kNN-based localization error using only simulated data as reference points decreases by one-third on real-world samples, although it remains twice as high as the error with purely real data. Precise geometry and credible antenna models are therefore necessary but not sufficient; faithfully capturing the residual urban noise remains an open challenge for transferable, high-fidelity outdoor RF simulation.
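
The fidelity score based on Spearman correlation can be sketched in a few lines. The version below ranks both series (averaging ranks over ties) and then takes the Pearson correlation of the ranks. The function names and the sample power values are hypothetical, and this is a generic textbook implementation rather than the authors' evaluation code.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least two points")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined rank correlation")
    return cov / (sx * sy)


# Hypothetical received powers in dBm at four UE positions for one base station.
measured = [-70.0, -80.0, -90.0, -100.0]
simulated = [-60.0, -75.0, -95.0, -120.0]
rho = spearman(measured, simulated)
```

A rank-based score suits this comparison because it rewards the simulator for ordering locations correctly by received power even when the absolute levels are offset.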

2510.00831 2026-06-19 cs.AI cs.LG eess.SP version update

Controlled Comparison of Machine Learning Models for Fault Classification and Localization in Power System Protection

电力系统保护中故障分类与定位的机器学习模型受控比较

Julian Oelhaf, Georg Kordowich, Changhun Kim, Paula Andrea Pérez-Toro, Christian Bergler, Andreas Maier, Johann Jäger, Siming Bayer

Affiliations: Department of Electrical Engineering, Media and Computer Science, Ostbayerische Technische Hochschule Amberg-Weiden

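AI summary: Compares machine learning models for fault classification and localization on a common electromagnetic transient dataset with 10-50 ms decision windows; classification reaches F1 > 0.98 at 10 ms, and localization error stabilizes at about 10% of line length.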

Comments Accepted at IEEE PES Innovative Smart Grid Technologies Europe 2026 (ISGT Europe 2026). Pre-camera-ready author version; final proceedings version may differ

AI中文摘要

现代电力系统因逆变器基和分布式能源的集成而日益复杂，挑战了传统保护方案的可靠性，并推动了机器学习在保护任务中的应用。然而，由于不同研究中的数据集、传感假设和决策时域各异，已发表的结果往往难以比较。本文在相同的传感、时序和验证条件下，基于公共电磁暂态数据集，使用10-50ms的决策窗口以反映保护相关时间尺度，对故障分类（FC）和故障定位（FL）的机器学习模型进行了受控比较。对于FC，性能最佳的非线性模型在10ms时F1分数已超过0.98，而低容量模型在较短时域下性能下降，但随窗口延长而改善，表明相关故障类型信息在最早暂态中已存在。对于FL，顶级模型在所有评估时域下达到约10%归一化线路长度的稳定定位误差，而较弱模型形成明显分离的第二性能层级。线路解析分析显示，定位精度随电网段变化，表明存在拓扑依赖的难度而非仅时间上下文不足。这些发现为比较两个信息需求根本不同的保护任务中的机器学习模型提供了受控参考。

English abstract

The increasing complexity of modern power systems, driven by the integration of inverter-based and distributed energy resources, challenges the reliability of conventional protection schemes and motivates the use of machine learning for protection tasks. However, published results are often difficult to compare because datasets, sensing assumptions, and decision horizons vary across studies. This paper presents a controlled comparison of machine learning models for fault classification (FC) and fault localization (FL) under identical sensing, timing, and validation conditions on a common electromagnetic transient dataset, using decision windows of 10-50 ms to reflect protection-relevant time scales. For FC, the best-performing nonlinear models achieve F1 scores above 0.98 already at 10 ms, while lower-capacity models degrade at shorter horizons but improve with longer windows, indicating that relevant fault-type information is already present in the earliest transient. For FL, the top-performing models reach a stable localization error of about 10 % of normalized line length across all evaluated horizons, while weaker models form a clearly separated second performance tier. Line-resolved analysis shows that localization accuracy varies across grid segments, indicating topology-dependent difficulty rather than insufficient temporal context alone. These findings provide a controlled reference for comparing machine learning models across two protection tasks with fundamentally different information requirements.

2511.22486 2026-06-19 physics.plasm-ph cs.LG version update

The Machine Learning Approach to Moment Closure Relations for Plasma: A Review

等离子体矩闭包关系的机器学习方法：综述

Samuel Burles, Enrico Camporeale

Affiliations: School of Physical and Chemical Sciences, Queen Mary University of London; Space Weather TREC, University of Colorado

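AI summary: Reviews machine learning approaches for developing improved closure models in plasma fluid models, covering neural-network surrogates and equation-discovery methods, and discusses the challenges of offline testing and online simulation as well as future directions.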

Comments 58 pages, 6 figures

AI中文摘要

大规模等离子体全局模拟的需求是空间和实验室等离子体物理学中持续存在的挑战。任何基于流体模型的模拟都固有地需要高阶等离子体矩的闭包关系。本综述汇编并分析了近期涌现的机器学习方法，这些方法旨在开发改进的等离子体闭包模型，能够在等离子体流体模型中捕捉动力学现象。我们调查了两类方法：神经网络代理（从多层感知器到傅里叶神经算子，后者最近在流体求解器内在线复现了线性和非线性朗道阻尼）和方程发现方法（如稀疏回归）；并根据这些研究是离线对照参考数据测试还是在线在时间演化求解器内测试进行组织。我们概述了与机器学习闭包相关的挑战，包括非对角压力张量精度、超出训练分布的泛化能力以及稳定集成到大尺度模拟中，并指出了未来研究可能解决这些问题的方向。

English abstract

The requirement for large-scale global simulations of plasma is an ongoing challenge in both space and laboratory plasma physics. Any simulation based on a fluid model inherently requires a closure relation for the high order plasma moments. This review compiles and analyses the recent surge of machine learning approaches developing improved plasma closure models capable of capturing kinetic phenomena within plasma fluid models. We survey two methodological families: neural-network surrogates (from multilayer perceptrons to Fourier neural operators, the latter recently reproducing both linear and non-linear Landau damping online within a fluid solver) and equation-discovery methods such as sparse regression; and organise the studies by whether they are tested offline against reference data or online within a time-evolving solver. We outline the challenges associated with machine-learning closures, including off-diagonal pressure-tensor accuracy, generalisation beyond the training distribution, and stable integration into large-scale simulations, and the directions future research might take to address them.

2601.00014 2026-06-19 eess.SP cs.AI cs.LG version update

Modeling Day-Long ECG Signals to Predict Heart Failure Risk with Explainable AI

建模全天心电图信号以可解释人工智能预测心力衰竭风险

Eran Zvuloni, Ronit Almog, Michael Glikson, Shany Brimer Biton, Ilan Green, Izhar Laufer, Offer Amir, Joachim A. Behar

Affiliations: Leumit Health Services

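AI summary: Proposes DeepHHF, a deep learning model that uses 24-hour single-lead ECG data to predict heart failure risk within five years; it achieves an AUC of 0.80, outperforming short segments and a clinical score, and explainability analysis shows that the model focuses on arrhythmias and heart abnormalities.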

AI中文摘要

心力衰竭（HF）影响11.8%的65岁及以上成年人，降低生活质量和寿命。预防HF可降低发病率和死亡率。我们假设将人工智能（AI）应用于24小时单导联心电图（ECG）数据可预测五年内HF风险。为此，使用了Technion-Leumit Holter ECG（TLHE）数据集，包括20年间收集的47,729名患者的69,663条记录。我们的深度学习模型DeepHHF在24小时ECG记录上训练，实现了0.80的受试者工作特征曲线下面积，优于使用30秒片段和临床评分的模型。DeepHHF识别的高风险个体住院或死亡事件概率翻倍。可解释性分析显示DeepHHF关注心律失常和心脏异常。本研究强调了深度学习建模24小时连续ECG数据的可行性，捕捉了对可靠风险预测至关重要的阵发性事件。应用于单导联Holter ECG的人工智能无创、廉价且广泛可及，使其成为HF风险预测的有前景工具。

English abstract

Heart failure (HF) affects 11.8% of adults aged 65 and older, reducing quality of life and longevity. Preventing HF can reduce morbidity and mortality. We hypothesized that artificial intelligence (AI) applied to 24-hour single-lead electrocardiogram (ECG) data could predict the risk of HF within five years. To test this, we used the Technion-Leumit Holter ECG (TLHE) dataset, which includes 69,663 recordings from 47,729 patients collected over 20 years. Our deep learning model, DeepHHF, trained on 24-hour ECG recordings, achieved an area under the receiver operating characteristic curve of 0.80, outperforming a model using 30-second segments and a clinical score. High-risk individuals identified by DeepHHF had twice the chance of hospitalization or death. Explainability analysis showed that DeepHHF focused on arrhythmias and heart abnormalities. This study highlights the feasibility of using deep learning to model 24-hour continuous ECG data, capturing paroxysmal events essential for reliable risk prediction. Artificial intelligence applied to single-lead Holter ECG is non-invasive, inexpensive, and widely accessible, making it a promising tool for HF risk prediction.
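
For readers less familiar with the headline metric, the area under the ROC curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. The sketch below computes it directly from that definition. The function name and the toy labels and scores are hypothetical and unrelated to the TLHE data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = developed HF within the horizon, 0 = did not.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of samples, which is fine for a worked example; a rank-based formula would be the usual choice at the scale of tens of thousands of recordings.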

2601.03040 2026-06-19 cs.RO cs.AI cs.LG version update

PiDR: Physics-Informed Inertial Dead Reckoning for Autonomous Platforms

PiDR：面向自主平台的物理信息惯性航位推算

Arup Kumar Sahoo, Itzik Klein

Affiliations: Autonomous Navigation and Sensor Fusion Lab (ANSFL); Hatter Department of Marine Technologies; Charney School of Marine Sciences; University of Haifa

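AI summary: Proposes the PiDR framework, which incorporates inertial navigation principles into network training as a physics-informed residual to reduce trajectory drift in pure inertial navigation; positioning accuracy improves by more than 29% on mobile-robot and autonomous-underwater-vehicle datasets.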

Comments 11 pages and 7 figures

AI中文摘要

完全自主的一个基本要求是在缺乏外部数据（如GNSS信号或视觉信息）的情况下维持精确导航的能力。在这些具有挑战性的环境中，平台必须完全依赖惯性传感器，导致纯惯性导航。然而，在现实场景中，惯性传感器的固有噪声和其他误差项会导致导航解随时间漂移。尽管传统的深度学习模型已成为惯性导航的一种可能方法，但它们本质上是黑箱的。此外，它们在有限的监督传感器数据下难以有效学习，并且常常无法保持物理原理。为了解决这些局限性，我们提出了PiDR，一种用于纯惯性导航情况下自主平台的物理信息惯性航位推算框架。PiDR通过物理信息残差组件将惯性导航原理明确地整合到网络训练过程中，从而提供了透明性。即使在有限或稀疏监督下，PiDR在减轻轨迹突然偏差方面也起着关键作用。我们在移动机器人和自主水下航行器收集的真实世界数据集上评估了PiDR。在两个数据集中，我们获得了超过29%的定位改进，证明了PiDR在不同环境和动力学下运行的不同平台上的泛化能力。因此，PiDR提供了一种鲁棒、轻量级且有效的架构，可以部署在资源受限的平台上，在不利场景中实现实时纯惯性导航。

English abstract

A fundamental requirement for full autonomy is the ability to sustain accurate navigation in the absence of external data, such as GNSS signals or visual information. In these challenging environments, the platform must rely exclusively on inertial sensors, leading to pure inertial navigation. However, the inherent noise and other error terms of the inertial sensors in such real-world scenarios cause the navigation solution to drift over time. Although conventional deep-learning models have emerged as a possible approach to inertial navigation, they are inherently black-box in nature. Furthermore, they struggle to learn effectively with limited supervised sensor data and often fail to preserve physical principles. To address these limitations, we propose PiDR, a physics-informed inertial dead-reckoning framework for autonomous platforms in situations of pure inertial navigation. PiDR offers transparency by explicitly integrating inertial navigation principles into the network training process through a physics-informed residual component. PiDR plays a crucial role in mitigating abrupt trajectory deviations even under limited or sparse supervision. We evaluated PiDR on real-world datasets collected by a mobile robot and an autonomous underwater vehicle. We obtained more than 29% positioning improvement on both datasets, demonstrating the ability of PiDR to generalize across different platforms operating in various environments and dynamics. Thus, PiDR offers a robust, lightweight, yet effective architecture that can be deployed on resource-constrained platforms, enabling real-time pure inertial navigation in adverse scenarios.
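
The two ideas in this abstract, drift under pure inertial integration and a physics-based residual, can be illustrated with a one-dimensional toy. The sketch below is an assumption-laden simplification: it uses constant-acceleration kinematics in 1D, and the function names are hypothetical. The abstract does not give PiDR's actual residual, so this is not the authors' loss.

```python
def dead_reckon(accels, dt, p0=0.0, v0=0.0):
    """Integrate 1D accelerations into positions and velocities (pure dead reckoning)."""
    p, v = p0, v0
    positions, velocities = [p], [v]
    for a in accels:
        p += v * dt + 0.5 * a * dt * dt
        v += a * dt
        positions.append(p)
        velocities.append(v)
    return positions, velocities


def kinematic_residual(p_pred, v_pred, accels, dt):
    """Mean squared mismatch between predicted states and the kinematic equations."""
    total = 0.0
    for k, a in enumerate(accels):
        rp = p_pred[k + 1] - (p_pred[k] + v_pred[k] * dt + 0.5 * a * dt * dt)
        rv = v_pred[k + 1] - (v_pred[k] + a * dt)
        total += rp * rp + rv * rv
    return total / len(accels)


# A stationary platform with a 0.1 m/s^2 accelerometer bias, sampled at 10 Hz for 10 s.
biased_positions, _ = dead_reckon([0.1] * 100, 0.1)
```

In this toy, the small bias alone moves the position estimate by about 5 m in 10 s, which is the kind of drift the abstract refers to. A residual such as `kinematic_residual` could be added to a supervised loss so that network outputs are penalised when they violate the motion equations, even where ground-truth positions are sparse.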

2602.00510 2026-06-19 cs.AI cs.LG cs.SE version update

A deep learning framework for jointly solving transient Fokker-Planck equations with arbitrary parameters and initial distributions

Xiaolong Wang, Jing Feng, Qi Liu, Chengli Tan, Yuanyuan Liu, Yong Xu

Affiliations: School of Mathematics and Statistics, Shaanxi Normal University; School of Mathematics and Statistics, Northwestern Polytechnical University; MOE Key Laboratory for Complexity Science in Aerospace, Northwestern Polytechnical University; School of Science, Xi’an University of Posts and Telecommunications; Department of Systems and Control Engineering, Institute of Science Tokyo

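AI summary: Proposes a deep-learning-based pseudo-analytical probability solution (PAPS) that, with a single training run, solves the transient FPE for arbitrary multi-modal initial distributions, system parameters, and time points, and runs four orders of magnitude faster than GPU-accelerated Monte Carlo simulation.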

AI中文摘要

高效求解Fokker-Planck方程(FPE)是分析复杂参数化随机系统的核心。然而，当前数值方法缺乏跨不同条件的并行计算能力，严重限制了全面的参数探索和瞬态分析。本文引入一种基于深度学习的伪解析概率解(PAPS)，通过单次训练过程，同时求解任意多模态初始分布、系统参数和时间点的瞬态FPE解。核心思想是通过高斯混合分布(GMD)统一初始、瞬态和稳态分布，并开发一个约束保持自编码器，将受约束的GMD参数双射映射到无约束的低维潜在表示。在该表示空间中，可以建模跨不同初始条件和系统参数的全局瞬态动力学。在典型系统上的大量实验表明，所提出的PAPS在保持高精度的同时，推理速度比GPU加速的蒙特卡洛模拟快四个数量级。这种效率提升使得以前难以实现的实时参数扫描和随机分岔的系统研究成为可能。通过将表示学习与物理信息瞬态动力学解耦，我们的工作为多维参数化随机系统的概率建模建立了一个可扩展的范式。

English abstract

Efficiently solving the Fokker-Planck equation (FPE) is central to analyzing complex parameterized stochastic systems. However, current numerical methods lack parallel computation capabilities across varying conditions, severely limiting comprehensive parameter exploration and transient analysis. This paper introduces a deep learning-based pseudo-analytical probability solution (PAPS) that, via a single training process, simultaneously resolves transient FPE solutions for arbitrary multi-modal initial distributions, system parameters, and time points. The core idea is to unify initial, transient, and stationary distributions via Gaussian mixture distributions (GMDs) and develop a constraint-preserving autoencoder that bijectively maps constrained GMD parameters to unconstrained, low-dimensional latent representations. In this representation space, the panoramic transient dynamics across varying initial conditions and system parameters can be modeled by a single evolution network. Extensive experiments on paradigmatic systems demonstrate that the proposed PAPS maintains high accuracy while achieving inference speeds four orders of magnitude faster than GPU-accelerated Monte Carlo simulations. This efficiency leap enables previously intractable real-time parameter sweeps and systematic investigations of stochastic bifurcations. By decoupling representation learning from physics-informed transient dynamics, our work establishes a scalable paradigm for probabilistic modeling of multi-dimensional, parameterized stochastic systems.
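
The notion of mapping constrained Gaussian-mixture parameters bijectively to unconstrained values can be illustrated with closed-form transforms. The paper learns this mapping with a constraint-preserving autoencoder; the sketch below swaps in hand-written transforms (an additive log-ratio for the mixture weights and a logarithm for the standard deviations) purely to show what "constraint-preserving" means. All function names are hypothetical.

```python
import math


def to_unconstrained(weights, sigmas):
    """Map simplex weights (K values) and positive std devs to unconstrained reals."""
    ref = weights[-1]
    z_w = [math.log(w / ref) for w in weights[:-1]]   # K-1 free log-ratios
    z_s = [math.log(s) for s in sigmas]
    return z_w, z_s


def to_constrained(z_w, z_s):
    """Inverse map: any real inputs yield positive weights summing to 1 and positive sigmas."""
    exps = [math.exp(z) for z in z_w] + [1.0]
    total = sum(exps)
    weights = [e / total for e in exps]
    sigmas = [math.exp(z) for z in z_s]
    return weights, sigmas
```

Component means are already unconstrained and need no transform. Working in the unconstrained space lets a downstream model output any real vector while the decoded mixture remains a valid probability distribution.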

2606.10686 2026-06-19 physics.comp-ph astro-ph.IM cs.LG version update

An adaptive framework for the axisymmetric pulsar magnetosphere using physics-informed Kolmogorov-Arnold networks

基于物理信息Kolmogorov-Arnold网络的轴对称脉冲星磁层自适应框架

Spyros Rigas, Ioannis Contopoulos, Georgios Alexandridis, Antonios Nathanail

Affiliations: Department of Digital Industry Technologies, School of Science, National and Kapodistrian University of Athens; Research Center for Astronomy and Applied Mathematics, Academy of Athens

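AI summary: Proposes an adaptive framework based on Kolmogorov-Arnold networks, with an automated training pipeline and a physics-based convergence criterion; it brings the mean squared error of the PDE residuals down to O(1e-6) in double precision, converges in under 20 minutes in single precision, and reliably resolves stellar radii reduced by up to 80%.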

Comments 25 pages, 10 figures

AI中文摘要

脉冲星磁层直到最近才通过物理信息神经网络（PINNs）进行研究，采用区域分解方法并将分离线和赤道电流片视为无限薄的间断。然而，这一基线方法需要大量手动超参数调整，最终精度有限且需要数小时训练。我们通过引入基于Kolmogorov-Arnold网络的领域特定神经架构、自动化自适应训练流程以及基于物理的收敛准则来改进这一框架，消除了手动校准的需求。所提出的方法提供了自洽的轴对称磁层解，在双精度下PDE残差的均方误差达到O(1e-6)量级——比基线方法提高了两个数量级——同时在单精度下在20分钟内实现收敛。重要的是，该方法可靠地解析了相比基线缩小高达80%的恒星半径，克服了同样挑战传统求解器的严重空间尺度差异。此外，通过改变开放至无穷远的磁通量，我们提供了将其与赤道T点位置关联的方程的修正。完整框架已作为开源库PulsarX发布。

English abstract

The pulsar magnetosphere has only recently been addressed using Physics-Informed Neural Networks (PINNs), by deploying a domain-decomposition approach and treating the separatrix and equatorial current sheet as infinitesimally thin discontinuities. However, this baseline requires extensive manual hyperparameter tuning, achieves limited final accuracy and demands several hours of training. We refine this framework by introducing domain-specific neural architectures based on Kolmogorov-Arnold networks, an automated adaptive training pipeline and a physics-based convergence criterion that eliminate the need for manual calibration. The proposed methodology delivers self-consistent axisymmetric magnetosphere solutions with mean squared errors of the PDE residuals at O(1e-6) in double precision - an improvement of two orders of magnitude over the baseline - while achieving convergence in under 20 minutes in single precision. Importantly, the method reliably resolves stellar radii reduced by up to 80% compared to the baseline, overcoming the severe spatial scale disparities that also challenge traditional solvers. Furthermore, by varying the flux that opens to infinity, we provide a correction to the equation that connects it to the equatorial T-point's position. The complete framework is released as the open-source library PulsarX.

2606.14776 2026-06-19 cs.RO cs.LG version update

Deep Learning-Based Lunar Crater Terrain Relative Navigation

基于深度学习的月球陨石坑地形相对导航

Batu Candan, Simone Servadio

Affiliations: NASA; University of Texas at Austin

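AI summary: Proposes a terrain relative navigation algorithm that combines a deep-learning crater detector with an Extended Kalman Filter and reduces navigation error to a few hundred meters even when the initial position is off by up to 5 km.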

AI中文摘要

准确的位置估计对于未来使用自主飞行器实现月球着陆至关重要，尤其是在地形特征稀疏的危险环境中。本文提出了一种地形相对导航（TRN）算法，该算法结合了我们专门为NASA陨石坑检测挑战问题设计的深度学习陨石坑检测器和扩展卡尔曼滤波（EKF）。我们的检测器分析从轨道获取的单目图像中的陨石坑特征，并通过匈牙利分配方法及基于共识的离群点去除方法，识别它们与全球数据库中陨石坑的匹配。然后，估计的测量值用于优化EKF，其中航天器在月心月固（LCLF）参考系中的姿态估计，结合高度辅助信息，约束径向漂移。仿真结果表明，即使航天器偏离实际位置达5公里，TRN也能从这种情况中恢复，将导航误差降低到几百米。需要注意的是，为了保持陨石坑特征的对应关系，必须将图像分辨率和场景中的尺度与检测器训练集分布相匹配。

English abstract

Accurate position estimation is crucial for the successful implementation of future lunar landings using autonomous vehicles, especially in dangerous environments with sparse terrain features. In this paper, we propose a terrain relative navigation (TRN) algorithm combining our deep-learning crater detector, which was designed specifically for the NASA Crater Detection Challenge problem, and an Extended Kalman Filter (EKF). Our detector analyzes crater features from monocular images acquired from orbit, and their matches with craters from a global database are identified via a Hungarian assignment approach followed by a consensus-based outlier removal method. The estimated measurements are then used to refine an EKF, where spacecraft pose estimation in the Lunar-Centered Lunar-Fixed (LCLF) frame of reference, augmented with altitude aiding information, constrains radial drift. The simulation results indicate that even if the spacecraft is up to 5 km off its actual location, TRN can recover, reducing the navigation error to a few hundred meters. It should be noted that, in order to maintain crater feature correspondences, it is important to match the image resolution and the scales within the scene to the detector training set distribution.
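
The matching step, a one-to-one assignment of detected craters to catalog craters followed by outlier rejection, can be sketched as follows. Two substitutions are made for brevity and should be read as assumptions: an exhaustive search stands in for the Hungarian algorithm (it returns the same minimum-cost assignment but only scales to a handful of craters), and a plain distance gate stands in for the consensus-based outlier removal. The function name and coordinates are hypothetical.

```python
import math
from itertools import permutations


def match_craters(detected, catalog, gate):
    """Minimum-total-distance one-to-one matching, then drop pairs farther apart than gate."""
    if len(detected) > len(catalog):
        raise ValueError("need at least as many catalog craters as detections")
    best_cost, best = math.inf, ()
    for perm in permutations(range(len(catalog)), len(detected)):
        cost = sum(math.dist(d, catalog[j]) for d, j in zip(detected, perm))
        if cost < best_cost:
            best_cost, best = cost, perm
    return [(i, j) for i, j in enumerate(best)
            if math.dist(detected[i], catalog[j]) <= gate]


# Hypothetical crater centres in a local map frame (arbitrary units).
detections = [(0.1, 0.0), (5.2, 5.0), (40.0, 40.0)]
catalog_craters = [(5.0, 5.0), (0.0, 0.0), (9.0, 9.0), (20.0, 20.0)]
pairs = match_craters(detections, catalog_craters, gate=1.0)
```

Here the third detection has no plausible catalog counterpart, so the gate discards its forced match and only the two consistent pairs would be passed on as measurements to the filter.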

2606.19149 2026-06-19 cs.CR cs.LG version update

OpenAnt: LLM-Powered Vulnerability Discovery Through Code Decomposition, Adversarial Verification, and Dynamic Testing

OpenAnt：通过代码分解、对抗性验证和动态测试实现LLM驱动的漏洞发现

Nahum Korda, Gadi Evron

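AI summary: Presents OpenAnt, a system that combines static analysis with LLM reasoning in a three-stage pipeline of code decomposition, adversarial verification, and dynamic testing, discovering previously unknown vulnerabilities while reducing false positives.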

AI中文摘要

在大型代码库中自动发现漏洞仍然具有挑战性：传统静态分析误报率高，而模糊测试等动态方法需要大量基础设施且通常针对狭窄的漏洞类别。大型语言模型（LLM）的最新进展使得对程序行为进行语义推理成为可能，但将LLM应用于仓库级安全分析会引入上下文管理、成本和验证方面的挑战。我们提出了OpenAnt，一个开源漏洞发现系统，它在多阶段流水线中集成了静态程序分析与基于LLM的推理。OpenAnt引入了三种关键技术。首先，代码库被分解为自包含的分析单元，并通过从外部入口点的可达性进行过滤，将分析面减少高达97%，同时保留与攻击相关的代码。其次，候选漏洞通过受限攻击者模拟进行对抗性验证，其中模型在现实攻击者能力下评估可利用性。第三，通过动态验证确认发现结果，其中自动生成利用环境，在沙箱容器中执行，并在使用后丢弃。在包括OpenSSL、WordPress和Flowise在内的广泛使用的开源项目上的评估表明，这种架构可以识别先前未知的漏洞，同时保持可管理的分析成本并大幅减少误报。我们的结果表明，结合语义推理与利用验证的闭环漏洞发现流水线，为可扩展的自动化安全分析提供了一条实用路径。OpenAnt已在Apache 2.0许可下开源，网址为https://github.com/knostic/OpenAnt。

English abstract

Automated vulnerability discovery in large codebases remains challenging: traditional static analysis produces high false-positive rates, while dynamic approaches such as fuzzing require substantial infrastructure and often target narrow classes of bugs. Recent advances in large language models (LLMs) enable semantic reasoning about program behavior, but applying LLMs to repository-scale security analysis introduces challenges related to context management, cost, and verification. We present OpenAnt, an open-source vulnerability discovery system that integrates static program analysis with LLM-based reasoning in a multi-stage pipeline. OpenAnt introduces three key techniques. First, codebases are decomposed into self-contained analysis units filtered by reachability from external entry points, reducing the analysis surface by up to 97% while preserving attack-relevant code. Second, candidate vulnerabilities undergo adversarial verification through constrained attacker simulation, where the model evaluates exploitability under realistic attacker capabilities. Third, findings are validated through dynamic verification, in which exploit environments are generated automatically, executed in sandboxed containers, and discarded after use. Evaluation on widely used open-source projects including OpenSSL, WordPress, and Flowise shows that this architecture can identify previously unknown vulnerabilities while maintaining manageable analysis cost and substantially reducing false positives. Our results suggest that closed-loop vulnerability discovery pipelines, combining semantic reasoning with exploit validation, provide a practical path toward scalable automated security analysis. OpenAnt is released as open source under the Apache 2.0 license at https://github.com/knostic/OpenAnt.
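
The first technique, keeping only analysis units reachable from external entry points, amounts to a graph traversal over a call graph. The sketch below shows that idea with a breadth-first search. The call graph, the unit names, and the function `reachable_units` are all hypothetical; OpenAnt's actual decomposition and filtering logic is not described at this level in the abstract.

```python
from collections import deque


def reachable_units(call_graph, entry_points):
    """Return every unit reachable from the entry points by following call edges."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        unit = queue.popleft()
        for callee in call_graph.get(unit, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


# Hypothetical call graph: caller -> list of callees.
toy_graph = {
    "http_handler": ["parse_request", "check_auth"],
    "parse_request": ["decode_body"],
    "decode_body": ["parse_request"],      # a cycle, handled by the seen set
    "check_auth": [],
    "debug_cli": ["dump_state"],           # never called from an external entry point
    "dump_state": [],
}
attack_surface = reachable_units(toy_graph, ["http_handler"])
```

Units outside the returned set, such as `debug_cli` and `dump_state` in this toy graph, would be skipped by later and more expensive stages, which is how a reachability filter can shrink the analysis surface.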

2606.19186 2026-06-19 cs.RO cs.LG version update

Learning to Annotate Delayed and False AEB Events: A Practical System for Extreme Class Imbalance and Asymmetric Label Noise

学习标注延迟和误报AEB事件：针对极端类别不平衡和非对称标签噪声的实用系统

Mengxiang Hao, Xin Jiang, Xinghao Huang, Wenliang Su, Zhiteng Wang, Junjie Rao, Xiaotian Yang, Wei Liao, Chengyu Han, Gen Liang, Yulun Song, Zhitao Xu, Xianpeng Lang

Affiliations: Li Auto

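AI summary: Proposes the first automated AEB annotation framework, which uses targeted data augmentation and noise suppression to address extreme class imbalance and asymmetric label noise, improving recall of delayed/false triggers by 80% and reducing manual workload by 50%.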

Comments 8 pages, 5 figures, accepted by IEEE International Conference on Robotics and Automation (ICRA)

Journal ref 2026 IEEE International Conference on Robotics and Automation (ICRA)

Abstract

Autonomous Emergency Braking (AEB) optimization relies on accurately annotated real-world trigger events, particularly rare but critical delayed and false AEB triggers that expose system deficiencies. However, these minority samples comprise less than 5% of thousands of daily triggers, making manual annotation prohibitively expensive at scale. We present the first automated AEB annotation framework to address this problem. During development, we identified two fundamental challenges that severely impair delayed/false trigger annotation accuracy: (1) extreme class imbalance, where delayed/false triggers are overwhelmed by true triggers; (2) asymmetric label noise, where mislabeled majority samples (true triggers) suppress the learning of minority samples (delayed/false triggers). To overcome these challenges, we propose two key innovations: (1) specific data augmentation that synthesizes realistic samples by manipulating focal target attributes, transplanting ego-vehicle dynamics, and masking non-focal agents; (2) noise suppression using stable hardness estimation and a probe-guided adaptive threshold to clean mislabeled true trigger samples. Crucially, we deploy our model as a practical annotation system with a full-stack architecture, efficiently identifying critical delayed/false triggers from thousands of daily AEB events. Production results demonstrate an 80% improvement in recall of delayed/false triggers and a 50% reduction in manual workload. Beyond immediate gains, the system enables continuous self-improvement through accumulated high-quality annotations, establishing a necessary data foundation for on-vehicle AEB system optimization.

2505.22829 2026-06-19 cs.LG cs.AI updated version

Bridging Distribution Shift and AI Safety: Conceptual and Methodological Synergies

Chenruo Liu, Kenan Tang, Yao Qin, Qi Lei

Affiliations: Center for Data Science, New York University; Computer Science Department, University of California, Santa Barbara; Department of Electrical & Computer Engineering, University of California, Santa Barbara; Courant Institute for Mathematical Sciences & Center for Data Science, New York University

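AI summary: By analyzing the conceptual and methodological synergies between distribution shift and AI safety, this paper establishes two types of connections between specific shift types and fine-grained safety issues, promoting deeper integration of research in the two fields.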

Comments 35 pages

2406.02421 2026-06-19 cs.DM cs.LG cs.SC updated version

Representing Piecewise-Linear Functions by Functions with Minimal Arity

Christoph Koutschan, Anton Ponomarchuk, Josef Schicho

Affiliations: Johann Radon Institute for Computational and Applied Mathematics; Research Institute for Symbolic Computation; Johannes Kepler University

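AI summary: This paper studies the minimal number of arguments needed to represent a continuous piecewise-linear function as a linear combination of max functions, and establishes a direct link between the decomposition of space induced by the function and the number of arguments required.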

2509.03122 2026-06-19 cs.CL cs.AI cs.LG updated version

From Construction to Injection: Edit-Based Fingerprints for Large Language Models

Yue Li, Xin Yi, Dongsheng Shi, Yongyi Cui, Gerard de Melo, Linlin Wang

Affiliations: East China Normal University; Hasso Plattner Institute/University of Potsdam

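AI summary: Proposes an end-to-end injected fingerprinting framework that uses code-mixing fingerprints and multi-candidate editing to address the imperceptibility and robustness challenges of fingerprints in black-box deployment.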

Comments preprint

Abstract

Reliable model fingerprints are essential for protecting large language models (LLMs) against unauthorized redistribution and commercial misuse. In black-box deployment, verification is hindered by defensive filtering of suspected fingerprint queries, as well as by downstream model modifications that may weaken embedded ownership evidence. These risks require fingerprints to be robust in both construction and injection. For construction, prior paradigms face an imperceptibility trade-off: natural-language fingerprints may be accidentally activated, whereas garbled fingerprints are statistically exposed and easier to filter. For injection, existing methods struggle to preserve persistent trigger-target behaviors under model modification. We propose an end-to-end injected fingerprinting framework to address these challenges. Code-mixing Fingerprints (CF) use lowest-perplexity code-mixing under a high-complexity constraint to mitigate this two-sided imperceptibility trade-off. Multi-Candidate Editing (MCEdit) constructs structurally redundant, margin-separated trigger-target mappings to enable graceful degradation under model modification. Extensive evaluations on imperceptibility, detectability, and harmlessness demonstrate robust ownership verification with negligible impact on utility.

2605.20531 2026-06-19 cs.LO cs.LG updated version

Pseudo-Formalization for Automatic Proof Verification

Slim Barkallah, Luke Bailey, Kaiyue Wen, Mohammed Abouzaid, Tengyu Ma

Affiliations: GitHub

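AI summary: This paper proposes a proof format called Pseudo-Formalization that retains the flexibility of natural language while preserving the modularity and precision of formal proofs. A Block Verification algorithm efficiently verifies natural-language proofs and outperforms existing baselines on error-finding precision and recall.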

Comments 31 pages, code available at https://github.com/Slim205/pseudo-formalization

Abstract

Reliable verification of proofs remains a bottleneck for training and evaluating AI systems on hard mathematical reasoning. Fully formal proofs, in languages like Lean, are easy to verify because they are unambiguous and modular. Most proofs, particularly those written by AI systems, have neither property, and translating them into formal languages remains challenging in many frontier math settings. We propose Pseudo-Formalization (PF), a proof format that captures the modularity and precision of formal proofs while retaining the flexibility of natural language. A Pseudo-Formal proof is decomposed into self-contained modules, each stating its premises, conclusion, and proof in natural language. To verify the correctness of a regular natural language proof, an LLM translates it to Pseudo-Formal and then verifies each module independently, an algorithm we call Block Verification (BV). We evaluate PF+BV on two benchmarks spanning olympiad and research-level mathematics, where it Pareto-dominates LLM-as-judge baselines on error-finding precision and recall. To support future work, we release our research-level proof verification benchmark ArxivMathGradingBench.
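The module structure and per-module checking described in the abstract can be sketched roughly as follows. This is an illustrative assumption, not the authors' code: the `Module` dataclass, `block_verify`, and especially `toy_checker` (a trivial stand-in for an LLM judge) are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class Module:
    """One self-contained block: premises, a conclusion, and a proof in prose."""
    premises: List[str]
    conclusion: str
    proof: str

def block_verify(modules: List[Module],
                 check_module: Callable[[Module], bool]) -> List[int]:
    """Check each module on its own and return the indices judged invalid."""
    return [i for i, m in enumerate(modules) if not check_module(m)]

def toy_checker(module: Module) -> bool:
    # Hypothetical stand-in for an LLM judge: accept a module only if its
    # proof is non-empty and explicitly mentions every stated premise.
    return bool(module.proof) and all(p in module.proof for p in module.premises)

modules = [
    Module(["n is even"], "n^2 is even",
           "Since n is even, n = 2k, so n^2 = 4k^2 is even."),
    Module(["n^2 is even"], "n^2 + 1 is odd", ""),  # missing proof
]

failed = block_verify(modules, toy_checker)
print("modules flagged:", failed)
```

Because each module carries its own premises, a flagged index localizes the suspected error to one block instead of the whole proof, which is the property the abstract attributes to modular formats.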

2407.11933 2026-06-19 cs.LG updated version

Fairness-Aware Multi-Group Target Detection in Online Discussion

Soumyajit Gupta, Maria De-Arteaga, Matthew Lease

Affiliations: Dept. of Computer Science, The University of Texas at Austin; Department of Data, Analytics, Technology, and Artificial Intelligence, ESADE; The Information School, The University of Texas at Austin

Journal ref 2026 ACM Conference on Fairness, Accountability, and Transparency (FAccT)

2602.05416 2026-06-19 cs.CE cs.AI cs.LG physics.ao-ph physics.flu-dyn updated version

Reduced-Order Surrogates for Forced Flexible Mesh Coastal-Ocean Models

Freja Høgholm Petersen, Jesper Sandvig Mariegaard, Rocco Palmitessa, Allan P. Engsig-Karup

Affiliations: DTU

Comments Submitted for peer-review in a journal. v2: revised version submitted to journal after minor revisions

2601.12433 2026-06-19 eess.SP cs.LG updated version

Temporal Data and Short-Time Averages Improve Multiphase Mass Flow Metering

Amanda Nyholm, Yessica Arellano, Jinyu Liu, Damian Krakowiak, Pierluigi Salvo Rossi

Affiliations: Dept. Electronic Systems, Norwegian University of Science and Technology; Dept. Gas Technology, SINTEF Energy Research; Dept. Research and Development, KROHNE Ltd.

Comments 9 pages, 6 figures

Journal ref IEEE Sensors Journal, vol. 26, no. 11, pp. 17252-17261, 1 June 2026

2506.23396 2026-06-19 stat.ML cs.LG updated version

AICO: Feature Significance Tests for Supervised Learning

Kay Giesecke, Enguerrand Horel, Chartsiri Jirachotkulthorn

Affiliations: Stanford University, Department of Management Science and Engineering and Institute for Computational and Mathematical Engineering; Upstart, Inc.; Stanford University, Institute for Computational and Mathematical Engineering

2412.20298 2026-06-19 cs.LG cs.CY stat.ML updated version

An Experimental Study on Fairness-aware Machine Learning for Credit Scoring Problems

Huyen Giang Thi Thu, Thang Viet Doan, Ha-Bang Ban, Tai Le Quy

Affiliations: Banking Academy of Vietnam; Vietnam Academy of Science and Technology; Hanoi University of Science and Technology; University of Koblenz

Comments The manuscript is submitted to Springer Nature's journal

2602.14239 2026-06-19 cs.SI cs.AI cs.LG updated version

A Hybrid TGN-SEAL Model for Dynamic Graph Link Prediction

Nafiseh Sadat Sajadi, Behnam Bahrak, Mahdi Jafari Siavoshani

Affiliations: Department of Computer Engineering, Sharif University of Technology; Tehran Institute for Advanced Studies, Khatam University

Journal ref EPJ Data Science (2026)

2510.05013 2026-06-19 stat.ML cs.LG updated version

Curiosity-Driven Development of Action and Language in Robots Through Self-Exploration

Theodore Jerome Tinker, Kenji Doya, Jun Tani

Affiliations: Okinawa Institute of Science and Technology

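AI summary: Through curiosity-driven robot self-exploration, with active inference implemented via Q-learning, this study reveals compositional generalization, faster learning, rote pairing that precedes composition, and a U-shaped developmental pattern caused by exception handling, offering an explanation for efficient human language acquisition.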

Comments 27 pages, 22 pages of supplementary material

Abstract

Infants acquire language with generalization from minimal experience, whereas large language models require billions of training tokens. What underlies efficient development in humans? We investigated this problem through experiments wherein robotic agents learn to perform actions associated with imperative sentences (e.g., push red cube) via curiosity-driven self-exploration. Our approach amortizes active inference using Q-learning, enabling intrinsically motivated developmental learning. The simulations reveal key findings corresponding to observations in developmental psychology. i) Generalization improves drastically as the scale of compositional elements increases. ii) Curiosity-driven exploration enables faster learning. iii) Rote pairing of sentences and actions precedes compositional generalization. iv) Exception-handling induces U-shaped developmental performance, a pattern like representational redescription in child language learning. These results suggest that curiosity-driven active inference accounts for how intrinsically motivated sensorimotor-linguistic learning supports scalable compositional generalization and exception handling in humans and artificial agents.

1. Deep Learning Architectures and Training Methods (13 papers)

Wisdom of Committee: Diverse Distillation from Large Foundation Models and Domain Experts

A Unified Perspective on the Dynamics of Deep Transformers

Linear Mode Connectivity under Data Shifts for Deep Ensembles of Image Classifiers

Model soups need only one ingredient

Reversible Residual Normalization Alleviates Spatio-Temporal Distribution Shift

Minimal Filling Architectures of Polynomial Neural Networks: Counterexamples, Frontier Search, and Defects

DisjunctiveNet: Neural Symbolic Learning via Differentiable Convexified Optimization Layers

RepNN: Tackling spectral bias in deep neural networks via parameter reparameterization

From Drift to Coherence: Stabilizing Beliefs in LLMs

Monotonic Kolmogorov-Arnold Networks: A Theoretical and Empirical Study of Monotonicity as an Inductive Bias

Toward all-optical unsupervised Hebbian learning in deep photonic neuromorphic networks

Higher-Order Token Interactions via Quantum Attention

QC-GAN: A Parameter-Efficient Quaternion Conformer GAN for High-Fidelity Speech Enhancement

2. Representation Learning, Self-Supervised and Contrastive Learning (2 papers)

Self-attention-based non-linear basis transformations for compact latent space modelling of dynamic optical fibre transmission matrices

Adversarial Dependence Minimization

3. Reinforcement Learning and Sequential Decision-Making (14 papers)

Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search

EQPO: Equitable Group Relative Policy Optimization for Clinical Reasoning

Beyond Reasoning Gains: Mitigating General-Capability Forgetting in Large Reasoning Models

Stabilizing the Q-Gradient Field for Policy Smoothness in Actor-Critic Methods

DADP: Domain Adaptive Diffusion Policy

Flickering Multi-Armed Bandits

Approximate Next Policy Sampling: Replacing Conservative Target Policy Updates in Deep RL

Large Language Models Hack Rewards, and Society

StarOR: Synergizing Tree Search and Test-Time Reinforcement Learning for Optimization Modeling

Reinforcement Learning Foundation Models Should Already Be A Thing

Reinforcement Twinning for Hybrid Control of Flapping-Wing Drones

Oranits: Mission Assignment and Task Offloading in Open RAN-based ITS using Metaheuristic and Deep Reinforcement Learning

Utility-Aware DRL-Based TXOP Adaptation for NR-U and Wi-Fi Coexistence Networks

Superhuman Safe and Agile Racing through Multi-Agent Reinforcement Learning

4. Generative Models and Probabilistic Modeling (5 papers)

Critique of World Model

Prior-Informed Flow Matching for Graph Reconstruction

Flow Matching for Efficient and Scalable Data Assimilation

Meta Flow Maps enable scalable reward alignment

Learning to Emulate Chaos: Adversarial Optimal Transport Regularization

5. Optimization, Generalization, and Theoretical Analysis (12 papers)

The Hidden Cost of Approximation in Online Mirror Descent

How to sketch a learning algorithm

Folded Transport MCMC: Eliminating Label Switching by Sampling on a Fundamental Domain

Indexed Bellman Information Complexity

SILAGE: Memory-Efficient, Full-Gradient-Free Nonconvex Optimization for Nested Finite Sums

Benign overfitting beyond prediction: The ordinary least squares interpolator

Phase Transition for Stochastic Block Model with more than $\sqrt{n}$ Communities

Improved Stochastic Optimization of LogSumExp

Alternating Direction Method of Multipliers for Nonlinear Matrix Decompositions

Stabilizing Bandits using Regularization: Precise Regret and A Quantitative Central Limit Theorem

Characterization of Gaussian Universality Breakdown in High-Dimensional Empirical Risk Minimization

Fair Online Resource Allocation

6. Efficient Learning, Compression, and Deployment (7 papers)

TetriServe: Efficiently Serving Mixed DiT Workloads

CAGE: Curvature-Aware Gradient Estimation For Accurate Quantization-Aware Training

LoRDO: Distributed Low-Rank Optimization with Infrequent Communication

Reinforcement-aware Knowledge Distillation for LLM Reasoning

A Survey of On-Policy Distillation for Large Language Models

Light Interaction: Training-Free Inference Acceleration for Interactive Video World Models

UltraEP: Unleash MoE Training and Inference on Rack-Scale Nodes with Near-Optimal Load Balancing

7. Robustness, Uncertainty, and Trustworthy Learning (8 papers)

Weighted Bayesian Conformal Prediction

Distributionally Robust Set Representation Learning Under Inference-Time Element Corruption

Diffuse AI Control on Fuzzy Tasks

Multimodal Evaluator Preference Collapse: Cross-Modal Contagion in Self-Evolving Agents

Influence-Guided Concolic Testing of Transformer Robustness

One Probe Won't Catch Them All: Towards Targeted Deception Detection

The Autonomy Tax: Defense Training Breaks LLM Agents

The ACUTE Protocol: Operationalizing Language Model Activations for Better Calibration, Utility, and Trust

8. Graph Learning and Structured Data (4 papers)

HGCN(O): A Self-Tuning GCN HyperModel Toolkit for Outcome Prediction in Event-Sequence Data

Toward General Digraph Contrastive Learning: A Dual Spatial Perspective

Capturing Intransitive Dominance in Tennis Forecasting: A Graph Neural Network Approach

KG-SoftMAP: Soft Knowledge-Graph Priors for Bayesian Network Structure Learning from Sparse Discrete Data

9. Transfer, Meta-Learning, and Continual Learning (3 papers)

Continual Learning with Support Boundary Experience Blending

Group-Sparse Matrix Factorization for Transfer Learning of Word Embeddings

Environment-Adaptive Covariate Selection: Learning When to Use Spurious Correlations for Out-of-Distribution Prediction

10. Datasets, Benchmarks, and Evaluation (15 papers)

FreshRetailNet-50K: A Stockout-Annotated Censored Demand Dataset for Latent Demand Recovery and Forecasting in Fresh Retail

We Need to Rethink Benchmarking in Anomaly Detection