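arXivDaily daily academic digest: synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.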

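2605.26112 2026-05-26 cs.AI cs.LG Version update

Active Query Synthesis for Preference Learning

Namrata Nadagouda, Nauman Ahad, Maegan Tucker, Mark A. Davenport

Affiliations: Georgia Institute of Technology; Gauss Labs

AI summary: To address unreliable query feedback and the computational bottleneck of pool-based evaluation in preference learning, the paper proposes Info-Synth, an active query synthesis framework that maximizes mutual information in a continuous space, extends it with two finite-pool query strategies, and validates it on synthetic data, text summarization, and robot control tasks.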

Comments 27 pages, 12 figures

Abstract

Efficient learning of user preferences is crucial for many modern decision-making systems but typically requires costly labeled data. Active learning reduces this cost, yet standard methods are computationally expensive due to pool-based evaluation. Further, most methods assume all query feedback is equally reliable, ignoring that pairwise queries between nearly identical or entirely dissimilar items yield ambiguous, low-confidence responses. To address the issue of feedback reliability, we introduce a novel confidence-aware response model that explicitly accounts for these ambiguous comparisons. To overcome the computational bottleneck of pool-based evaluation, we propose an active query synthesis framework, Info-Synth, that generates optimal queries by maximizing a mutual information-based objective within a continuous space. Moreover, we propose two strategies, Pair M-dist and Pair Opt-dist, that extend Info-Synth to select effective queries even when restricted to finite query pools. We demonstrate our framework's versatility and performance across synthetic preference learning, constrained text summary datasets, and subjective, continuous-space controller gain tuning for a simulated mobile robot.

2605.26067 2026-05-26 cs.LG cs.AI Version update

Conditional KRR: Injecting Unpenalized Features into Kernel Methods with Applications to Kernel Thresholding

Rustem Takhanov, Zhenisbek Assylbekov

Affiliations: Department of Mathematics, Nazarbayev University, Astana, Kazakhstan; Nazarbayev University Research Administration, Astana, Kazakhstan; Purdue University Fort Wayne, Indiana, USA

AI summary: The paper reduces conditional KRR to KRR with a residual kernel, analyzes its statistical properties theoretically, and identifies the conditions under which it outperforms standard KRR in the kernel principal component and random feature settings.

Comments Accepted to ICML 2026

Abstract

Conditionally positive definite (CPD) kernels are defined with respect to a function class $\mathcal{F}$. It is well known that such a kernel $K$ is associated with its native space (defined analogously to an RKHS), which in turn gives rise to a learning method -- called conditional kernel ridge regression (conditional KRR) due to its analogy with KRR -- where the estimated regression function is penalized by the square of its native space norm. This method is of interest because it can be viewed as classical linear regression, with features specified by $\mathcal{F}$, followed by the application of standard KRR to the residual (unexplained) component of the target variable. Methods of this type have recently attracted increasing attention. We study the statistical properties of this method by reducing its behavior to that of KRR with another fixed kernel, called the residual kernel. Our main theoretical result shows that such a reduction is indeed possible, at the cost of an additional term in the expected test risk, bounded by $\mathcal{O}(1/\sqrt{N})$, where $N$ is the sample size and the hidden constant depends on the class $\mathcal{F}$ and the input distribution. This reduction enables us to analyze conditional KRR in the case where $K$ is positive definite and $\mathcal{F}$ is given by the first $k$ principal eigenfunctions in the Mercer decomposition of $K$. We also consider the setting where $\mathcal{F}$ consists of $k$ random features from a random feature representation of $K$. It turns out that these two settings are closely related. Both our theoretical analysis and experiments confirm that conditional KRR outperforms standard KRR in these cases whenever the $\mathcal{F}$-component of the regression function is more pronounced than the residual part.
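
The two-stage reading in the abstract (a linear fit on the features in $\mathcal{F}$, then standard KRR on the residual) can be sketched in a few lines. This is a minimal illustration, not the authors' estimator: it assumes $\mathcal{F}$ contains only constant functions, so the linear stage reduces to the sample mean, and it uses a Gaussian kernel for the second stage. The paper's residual kernel is not reproduced here, and the names `solve`, `gauss_kernel`, and `fit_two_stage` are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gauss_kernel(a, b, width=1.0):
    """Gaussian kernel on scalars (an illustrative choice, not the paper's)."""
    return math.exp(-((a - b) ** 2) / (2.0 * width ** 2))


def fit_two_stage(xs, ys, lam=1e-3):
    # Stage 1 (assumed F = constants): unpenalized least squares is the mean.
    mean = sum(ys) / len(ys)
    residual = [y - mean for y in ys]
    # Stage 2: standard KRR on the residual, (K + lam * I) alpha = residual.
    K = [[gauss_kernel(a, b) + (lam if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    alpha = solve(K, residual)

    def predict(x):
        return mean + sum(a * gauss_kernel(x, xi) for a, xi in zip(alpha, xs))

    return predict
```

With a very small `lam` the fit nearly interpolates the training targets, while a very large `lam` shrinks the kernel part to zero and leaves only the unpenalized constant, which is the sense in which the $\mathcal{F}$-component escapes the penalty.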

2605.26061 2026-05-26 cs.LG cs.AI Version update

Neuronal Stochastic Attention Circuit (NSAC) for Probabilistic Representation Learning

Waleed Razzaq, Yun-Bo Zhao

Affiliations: Department of Automation, University of Science & Technology of China, Hefei, China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center

AI summary: The paper proposes NSAC, a biologically inspired continuous-time attention architecture that uses an Ornstein-Uhlenbeck stochastic differential equation and an NCP gating mechanism to induce a Gaussian distribution over logits, yielding probabilistic outputs and uncertainty quantification.

Abstract

Reliable quantification of uncertainty estimates in continuous-time (CT) representation learning remains nascent, particularly within CT attention architectures. We introduce the Neuronal Stochastic Attention Circuit (NSAC), a novel biologically inspired CT attention architecture that reformulates attention logit computation as the solution of an Ornstein-Uhlenbeck stochastic differential equation modulated by input-dependent, nonlinear interlinked gates derived from the wiring mechanism of repurposed C. elegans Neuronal Circuit Policies (NCPs). It induces a Gaussian distribution over logits that propagates principled stochasticity through a logistic-normal distribution over attention weights to yield probabilistic outputs. A two-term objective function combining Gaussian negative log-likelihood with an epistemic-separation regularizer enforces higher predictive variance and enables joint quantification of aleatoric and epistemic uncertainty. Empirically, we implement NSAC in a diverse set of learning tasks including: (i) irregular CT function approximation; (ii) multivariate regression; (iii) long-range forecasting; (iv) Industry 4.0; and (v) the lane-keeping of autonomous vehicles. We observe that NSAC remains competitive against several baselines in terms of accuracy and produces reasonably well-calibrated uncertainty estimates while being interpretable at the neuronal cell level.

2605.26059 2026-05-26 physics.flu-dyn cs.LG Version update

Accelerating Bayesian inverse design in computational fluid dynamics using neural operators

Bipin Tiwari, Omer San

Affiliations: Department of Mechanical and Aerospace Engineering, University of Tennessee, Knoxville

AI summary: The paper embeds a neural operator surrogate in the MCMC sampling loop, achieving a speedup of more than three orders of magnitude while preserving posterior structure, for Bayesian inverse design in computational fluid dynamics.

DOI: 10.1007/s44379-026-00062-2
Journal ref: Mach. Learn. Comput. Sci. Eng 2, 14 (2026)

Abstract

Bayesian inverse design provides a principled framework for inferring aerodynamic geometries from sparse flow observations while quantifying uncertainty. However, its practical use in computational fluid dynamics (CFD) is severely limited by the cost of repeated high-fidelity simulations required for gradient-based Markov chain Monte Carlo (MCMC) sampling. While surrogate models are commonly proposed to reduce this cost, their effect on posterior geometry and uncertainty, especially for shock-dominated flows, remains poorly understood. In this work, we demonstrate that neural operator surrogates can be embedded directly within the MCMC inference loop while preserving posterior structure. Using a fully Bayesian inverse formulation of quasi-one-dimensional nozzle flow, we demonstrate that geometry parameterization plays a decisive role in identifiability and posterior conditioning, with cubic B-splines yielding stable and physically meaningful uncertainty estimates. Building on this formulation, a Deep Operator Network trained on CFD-generated data is substituted for the CFD solver within a No-U-Turn Sampler, while keeping the likelihood model, priors, and sampling configuration unchanged. Across sparse to fully observed regimes, surrogate-based inference reproduces the posterior geometry and uncertainty trends of the CFD reference. As a result of surrogate integration, total inference time is reduced to under one second, corresponding to a speedup exceeding three orders of magnitude. In addition, a direct inverse neural operator is examined as a deterministic alternative for inverse design, enabling single-shot geometry reconstruction without posterior sampling. These results demonstrate that neural operator-accelerated Bayesian inference enables practical, uncertainty-aware inverse design workflows for aerodynamic applications.

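2605.26036 2026-05-26 cs.AI cs.LG Version update

AdvantageFlow: Advantage-Weighted Least Squares for Reinforcement Learning in Flow Models

Branislav Kveton, Anup Rao, Subhojyoti Mukherjee, Krishna Kumar Singh, Viet Dac Lai

Affiliations: Adobe Research

AI summary: Proposes the AdvantageFlow algorithm, which uses an advantage-weighted forward-process prediction loss and rollout policy regularization, and outperforms Flow-GRPO and a negative-aware fine-tuning baseline on image generation tasks.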

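2605.26012 2026-05-26 cs.LG cs.AI Version update

Learning in Low-Dimensional Subspaces: Orthogonal Bottlenecks for Reinforcement Learning

Aleksandar Todorov, Matthia Sabatelli

AI summary: Proposes a simple prior that inserts a fixed orthogonal projection into the encoder features of a reinforcement learning agent to constrain them to a low-dimensional subspace, proves that it preserves expressivity under a linear realizability assumption, and shows experimentally that value representations can be compressed to very low dimensions without loss of performance.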

Abstract

Deep reinforcement learning (RL) agents commonly rely on high-dimensional neural representations, despite growing evidence that task-relevant value and policy structure may be intrinsically low-dimensional. In this work, we present a simple yet effective representation-level prior that inserts a fixed orthonormal projection to constrain encoder features to a low-dimensional subspace, requiring no auxiliary objectives, pretraining, or changes to the underlying RL algorithm. Under a linear realizability assumption, we prove that when the bottleneck dimension exceeds the intrinsic rank of the optimal value function in feature space, the bottleneck preserves expressivity and leaves the induced gradient dynamics unchanged up to an equivalent low-dimensional parameterization. Empirically, we find that across both single and multi-task benchmarks, baseline performance is either matched or improved once the bottleneck dimension exceeds a small task-dependent threshold; in many cases, value representations can be compressed to extremely low dimensions without loss, and the minimal sufficient dimension depends far more on environment complexity than encoder width. In addition, we analyze representation geometry and find that orthogonal bottlenecks stabilize feature norms and are associated with higher effective rank. Together, these results support a representation-space interpretation of the manifold hypothesis in reinforcement learning and position orthogonal bottlenecks as a lightweight, architecture-agnostic mechanism for shaping RL representations.
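
A fixed orthonormal projection of the kind the abstract describes can be illustrated as follows. The sketch assumes the projection is drawn once from random Gaussian vectors and orthonormalized with Gram-Schmidt. The abstract does not say how the authors construct it, and the names `random_orthonormal_rows` and `project` are hypothetical.

```python
import math
import random


def random_orthonormal_rows(k, d, seed=0):
    """Return k orthonormal vectors in R^d (requires k <= d).

    Assumption: the fixed projection is random; it is built once and
    never trained.
    """
    if k > d:
        raise ValueError("bottleneck dimension k must not exceed d")
    rng = random.Random(seed)
    rows = []
    while len(rows) < k:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Gram-Schmidt: remove the components along rows already accepted.
        for u in rows:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-10:  # skip the (unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return rows


def project(P, features):
    """Map d-dimensional encoder features to the k-dimensional bottleneck."""
    return [sum(p * f for p, f in zip(row, features)) for row in P]
```

Because the rows are orthonormal, the projection never increases the norm of a feature vector, which is one plausible reason such a bottleneck could keep feature norms stable.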

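2605.26000 2026-05-26 stat.ML cs.LG stat.ME Version update

Statistical Inference for Stochastic Gradient Descent Beyond Finite Variance

Jose Blanchet, Peter Glynn, Wenhao Yang

Affiliations: Management Science and Engineering, Stanford University

AI summary: For stochastic gradient descent whose gradient variance may be infinite, the paper proposes a model-agnostic method for constructing confidence regions, based on joint weak convergence and a self-normalized statistic, and achieves asymptotically valid inference through subsampling calibration.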

Abstract

Stochastic gradient descent (SGD) is a foundational algorithm for large-scale statistical learning and stochastic optimization. However, statistical inference based on SGD iterates remains challenging when stochastic gradients have infinite variance, as the relevant limiting distributions depend on unknown nuisance parameters. In this paper, we develop an efficient, model-agnostic methodology for constructing confidence regions from SGD trajectories that applies in both finite- and infinite-variance regimes. The procedure is based on a joint weak convergence result for the Polyak-Ruppert averaged estimator and an empirical second-moment normalizer constructed from stochastic gradients along the SGD trajectory. This joint limit yields a self-normalized statistic in which the leading tail-dependent scaling terms cancel. We then use a subsampling calibration scheme to estimate the relevant critical values, avoiding explicit estimation of tail indices, slowly varying functions, or stable-law parameters. The resulting confidence regions are straightforward to implement and are asymptotically valid under both the finite- and infinite-second-moment regimes. Simulation studies show reliable coverage in various settings, supporting the proposed method as a practical tool for uncertainty quantification in stochastic optimization.

2605.25998 2026-05-26 cs.LG Version update

Causal methods for LLM development and evaluation

Dennis Frauen, Marie Brockschmidt, Konstantin Hess, Haorui Ma, Yuchen Ma, Abdurahman Maarouf, Maresa Schröder, Jonas Schweisthal, Yuxin Wang, Athiya Deviyani, Sonali Parbhoo, Rahul G. Krishnan, Stefan Feuerriegel

Affiliations: Imperial College London; University of Toronto

AI summary: The paper argues that causal methods can address key causal questions in LLM development and evaluation, and systematically maps opportunities for applying them in pretraining, alignment, routing, and other stages.

Comments Published in KDD 2026

DOI: 10.1145/3770855.3818647

Abstract

Large language model (LLM) development is currently driven by large-scale empirical iteration over data mixtures, reward models, routing strategies, and evaluation pipelines. Here, we argue that many central questions in LLM development and evaluation are inherently causal: What is the effect of adding a data domain during pretraining? How do annotator preferences change when LLMs generate text in a different style? Should a prompt be routed to a larger or smaller model given inference cost constraints? In general, causal methods are well-suited to such settings where interventions change outcomes but, surprisingly, are underrepresented in LLM development. Our contribution is threefold: (1) We explain how causal methods can help modern LLM development and evaluation: LLM development relies heavily on logged data, which are often subject to confounding and distribution shifts; evaluation uses learned but potentially biased judges; and deployment environments are non-stationary. These conditions make purely predictive approaches fragile and create opportunities for principled identification and estimation methods from causal inference. (2) We further map opportunities for causal methods in the entire LLM development pipeline, including pretraining, alignment, routing, agentic workflows, and evaluation. (3) We discuss new research opportunities around leveraging causal methods for LLM development and evaluation. Overall, we argue that causal methods are potentially underutilized for the LLM development and evaluation pipeline, despite the fact that such methods can ensure a reliable and scientifically grounded design.

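2605.25997 2026-05-26 cs.LG stat.ML Version update

Deployment-complete benchmarking

El Mustapha Mansouri, Keigo Arai

Affiliations: School of Engineering, Institute of Science Tokyo

AI summary: Proposes a deployment-complete benchmarking framework that uses evidence fibers and completion curves to quantify whether benchmark evidence suffices to determine a deployment action, and shows that scores alone are not enough to support deployment decisions.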

Comments 33 pages, 5 figures, 1 table; supplementary tables and code available

Abstract

Benchmarks increasingly guide deployment, procurement and scientific screening, yet a score supports only the response it records, not necessarily the deployment action. We introduce deployment-complete benchmarking, which tests whether benchmark evidence determines a deployment action. A benchmark is complete for a claim exactly when the action is constant on each evidence fiber; mixed fibers expose missing deployment information, and completion curves quantify the evidence required to resolve ambiguity. In controlled response spaces, benchmark-channel conformal coverage of 94.98% transferred poorly to an unmeasured deployment channel (10.07%), whereas response-rank intervals achieved 94.91% coverage; even zero benchmark error certified only 45.4% of candidates at the largest residual size. Public audits revealed incompleteness, including 97.9% mixed Tox21 fibers and zero median certifiable fraction in main Matbench and JARVIS audits. In held-out replays, certify-then-acquire reduced false decisions from 1.19% to 0.027% in Tox21 and from 20.3% to 0.128% in JARVIS, while changing model choice and identifying deployment-relevant probes. Deployment-ready benchmarks should report evidence, supported actions, ambiguity and completion cost rather than scores alone.
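
The completeness condition (the action is constant on each evidence fiber) can be checked directly once evidence and actions are tabulated. The sketch below assumes a fiber is the set of candidates sharing identical recorded evidence and that evidence values are hashable. `mixed_fiber_fraction` and `is_complete` are hypothetical helper names, and the paper's audits may define fibers and fractions differently.

```python
from collections import defaultdict


def mixed_fiber_fraction(records):
    """Fraction of evidence fibers that contain more than one action.

    records: iterable of (evidence, action) pairs, one per candidate.
    Assumption: candidates with equal evidence belong to the same fiber.
    """
    fibers = defaultdict(set)
    for evidence, action in records:
        fibers[evidence].add(action)
    if not fibers:
        raise ValueError("no records given")
    mixed = sum(1 for actions in fibers.values() if len(actions) > 1)
    return mixed / len(fibers)


def is_complete(records):
    """True when the action is constant on every evidence fiber."""
    return mixed_fiber_fraction(records) == 0.0
```

A mixed fiber means two candidates look identical to the benchmark yet call for different deployment actions, so no rule based on that evidence alone can choose correctly for both.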

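2605.23082 2026-05-26 stat.ML cs.AI cs.LG Version update

KAPLAN: Kolmogorov-Arnold Prognostic Learnable Activation Networks for Survival Analysis

Stelios Boulitsakis Logothetis, Angela Wood, Pietro Liò

Affiliations: University of Cambridge

AI summary: Proposes KAPLAN-HR, a B-spline Kolmogorov-Arnold network that estimates the conditional hazard function nonparametrically, captures interactions and time-varying effects automatically through deeper architectures, and has a proven convergence rate that depends only on the smoothness of the representation, mitigating the curse of dimensionality; it matches or exceeds existing methods on six clinical datasets.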

Comments 9 pages, 3 figures, 13 supplementary pages. Submitted to NeurIPS 2026

Abstract

Survival analysis aims to model how covariates and time jointly shape the time-to-event distribution under right censoring. Classical methods such as the Cox model and generalised additive models (GAMs) require interactions and time-varying effects to be manually specified, which is increasingly impractical on rich clinical datasets. We introduce KAPLAN-HR, a B-spline Kolmogorov-Arnold Network (KAN) for nonparametric estimation of the conditional hazard as a joint function of covariates and time. A single-layer KAPLAN-HR model recovers a GAM, while deeper architectures capture interactions and time-varying effects through composition. We establish a convergence rate for the nonparametric KAN hazard estimator that depends only on the smoothness of the underlying KAN representation and not on the covariate dimension, thereby mitigating the curse of dimensionality for KAN-representable targets. In evaluations over six clinical benchmark datasets, KAPLAN-HR matches or exceeds the predictive performance of established statistical and deep learning survival methods.

2605.02836 2026-05-26 cs.LG math.AT Version update

A Closed-Form Persistence-Landmark Pipeline for Certified Point-Cloud and Graph Classification

Sushovan Majhi, Atish Mitra, Žiga Virk, Pramita Bagchi

Affiliations: Data Science, George Washington University, USA; Department of Mathematical Sciences, Montana Technological University, USA; Faculty of Computer and Information Science, University of Ljubljana, Institute IMFM, Slovenia; Biostatistics and Bioinformatics, George Washington University, USA

AI summary: Proposes the PLACE pipeline, which classifies point clouds and graphs from their persistent-homology signatures using closed-form formulas, with no learned weights or calibration, and provides a margin-based excess-risk rate, a descriptor-selection rule, and a per-prediction certificate.

Comments TMLR submission, https://openreview.net/forum?id=4kZxNlE5Ve. v2: variance-aware Pinelis-Bernstein certificate (radius iii) fires on 8/12 benchmarks (v1: not operational); MUTAG: empirical and population NC rules agree on 940/940 predictions. Matching-free nu-coherence replaces non-interference. Le Cam lower bound (Thm 3.2) recast PD-native, matching regime m<~R/D explicit

Abstract

We introduce PLACE (Persistence-Landmark Analytic Classification Engine), a closed-form pipeline for classifying point clouds and graphs through their persistent-homology signatures. Three quantitative guarantees -- a margin-based excess-risk rate, a closed-form descriptor-selection rule, and a per-prediction certificate -- are derived from training labels alone, with no learned weights or held-out calibration. The embedding sums Mitra-Virk single-point coordinate functions over a sparse landmark grid; the closed-form weight rule $w_k^2 \propto (d_{k+1}^2 - d_k^2)/R_k^2$ maximizes the distortion slope in Mitra-Virk's affine certificate under $\nu$-coherence. (i) An $O(kR/(\Delta\sqrt{m_{\min}}))$ margin bound, driven by class-mean separation $\Delta$ and embedding radius $R$, matched in the sample-starved regime $m \lesssim R/\Delta$ by a Le Cam minimax lower bound. (ii) The Mahalanobis margin under Ledoit-Wolf-shrunk covariance is the strongest closed-form ranker on a 64-descriptor chemical-graph pool (mean Spearman $\rho = +0.56$ across 11 benchmarks, positive on 10 of 11); the isotropic surrogate $\Delta/\sqrt{\ell}$ admits a closed-form selection-consistency rate on the homogeneous protein/social pools. (iii) A training-time-decided certificate, with no per-prediction overhead, in three concrete radii (Pinelis, Gaussian plug-in, and variance-aware Pinelis-Bernstein). Empirically, PLACE is the strongest diagram-based method on Orbit5k and matches the strongest topology-based baseline within statistical noise on MUTAG and COX2; remaining gaps fall into two diagnosable regimes (descriptor blindness on NCI1/NCI109; pool-coverage limits elsewhere). The Pinelis-Bernstein radius fires on 8 of the 12 benchmarks; on MUTAG the empirical and population nearest-centroid rules agree on every one of 940 held-out test predictions, validating the certificate's mechanism.
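
The closed-form weight rule $w_k^2 \propto (d_{k+1}^2 - d_k^2)/R_k^2$ is simple enough to evaluate by hand, and the sketch below does so for illustration. The abstract gives only the proportionality, so several things here are assumptions: that `d` is an increasing sequence of landmark values, that `R` holds one positive radius per gap, and that the weights are normalized so their squares sum to one. The function name `landmark_weights` is hypothetical.

```python
import math


def landmark_weights(d, R):
    """Weights w_k with w_k^2 proportional to (d[k+1]^2 - d[k]^2) / R[k]^2.

    d: non-negative, non-decreasing values d_0..d_K (assumed meaning).
    R: positive radii R_0..R_{K-1}, one per consecutive pair in d.
    The unit-sum normalization of w_k^2 is an assumption of this sketch.
    """
    if len(d) != len(R) + 1:
        raise ValueError("need exactly one radius per consecutive pair in d")
    raw = [(d[k + 1] ** 2 - d[k] ** 2) / R[k] ** 2 for k in range(len(R))]
    if any(r < 0 for r in raw):
        raise ValueError("d must be non-negative and non-decreasing")
    total = sum(raw)
    if total == 0:
        raise ValueError("all gaps are zero; weights are undefined")
    return [math.sqrt(r / total) for r in raw]
```

For example, `d = [0, 1, 2]` with unit radii gives raw squared weights 1 and 3, so the wider squared gap between the second and third landmarks receives three quarters of the total squared weight.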

2605.25991 2026-05-26 cs.LG cs.NA math.NA Version update

Fuzzy PyTorch: Rapid Numerical Variability Evaluation for Deep Learning Models

Inés Gonzalez-Pepe, Hiba Akhaddar, Tristan Glatard, Yohan Chatelain

Affiliations: Department of Computer Science and Software Engineering, Concordia University; Krembil Centre for Neuroinformatics, Centre for Addiction and Mental Health (CAMH)

AI summary: Proposes the Fuzzy PyTorch framework, which integrates stochastic arithmetic and probabilistic rounding to evaluate numerical variability in deep learning models quickly, runs 5x to 60x faster than the existing tool Verrou, and supports models from 1 to 341 million parameters.

Comments 19 pages, 8 figures, Published in Transactions on Machine Learning Research (01/2026)

Abstract

We introduce Fuzzy PyTorch, a framework for rapid evaluation of numerical variability in deep learning (DL) models. As DL is increasingly applied to diverse tasks, understanding variability from floating-point arithmetic is essential to ensure robust and reliable performance. Tools assessing such variability must be scalable, efficient, and integrate seamlessly with existing frameworks while minimizing code modifications. Fuzzy PyTorch enables this by integrating stochastic arithmetic into PyTorch through Probabilistic Rounding with Instruction Set Management, a novel library interfacing with Verificarlo, a numerical analysis compiler. The library offers a stochastic rounding mode and a novel mode, up-down rounding. Comparative evaluations show Fuzzy PyTorch maintains model performance and achieves runtime reductions of 5x to 60x versus Verrou, a state-of-the-art tool. We further demonstrate scalability by running models from 1 to 341 million parameters, confirming applicability across small and large DL architectures. Overall, Fuzzy PyTorch provides an efficient, scalable, and practical solution for assessing numerical variability in deep learning, enabling researchers and practitioners to quantify and manage floating-point uncertainty without compromising performance or computational efficiency.
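
Stochastic rounding, the first of the two modes named in the abstract, rounds a value up or down at random with probabilities chosen so that the result is unbiased in expectation. The sketch below shows the general idea on a coarse grid of spacing `step`. It is not the library's implementation, which operates on floating-point results inside compiled code, and the function name `stochastic_round` and its parameters are hypothetical. The up-down mode is not sketched because the abstract does not define it.

```python
import math
import random


def stochastic_round(x, step=1.0, rng=random):
    """Round x to a multiple of step, up with probability equal to the
    fractional distance from the lower grid point, so E[result] = x."""
    lower = math.floor(x / step) * step
    frac = (x - lower) / step  # in [0, 1): relative position in the cell
    return lower + step if rng.random() < frac else lower
```

Repeating a computation many times under such rounding and looking at the spread of the results is the usual way stochastic arithmetic exposes how sensitive a model is to floating-point error.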

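Quantitative Assessment of Posttraumatic Stress Disorder Severity via Transfer Learning from Specific-Phobia Data

Nicolas Ricka, Gauthier Pellegrin, Denis A. Fompeyrine, Thomas Rohaly, Leah Enders, Heather Roy

Affiliations: MyndBlue; DCS Corporation; Human in Complex Systems Division, DEVCOM Army Research Laboratory

AI summary: Proposes a machine learning method based on multivariate kernel density estimation that uses heart rate and skin conductance signals, with transfer learning from specific-phobia data, to assess PTSD severity objectively, reaching 86% classification accuracy and a mean absolute error of 5.6.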

Comments Submitted to a peer-reviewed journal, comments welcome

Abstract

Posttraumatic stress disorder (PTSD) is a prevalent and debilitating mental health condition with significant personal and societal impacts. Current clinical assessments of PTSD often rely on subjective evaluations, which can be time-consuming, costly, and prone to human bias. This study proposes a machine learning (ML) approach based on the multivariate kernel density estimation (MKDE) technique for the objective evaluation of PTSD severity. We collected heart rate (HR) and galvanic skin response (GSR) signals as well as PTSD Checklist - Military Version (PCL-M) labels from 21 participants during an immersive simulation. A fear-response model was trained on a public arachnophobia dataset, and predictive features of PTSD were extracted from the fear-response curves estimated on the military dataset. The model achieved an accuracy of 86% in classifying PTSD status, effectively distinguishing participants with and without PTSD (PCL-M threshold of 36). The average mean absolute error (MAE) of the models is 5.6, and it estimated a clinical PTSD severity scale with a mean absolute percentage error of 17%. Our algorithm demonstrates promising potential for enhancing estimation of PTSD severity and follow-up by offering an objective and low-effort evaluation approach using physiology. These findings suggest clinical utility in both screening and follow-up settings.

2605.25924 2026-05-26 cs.CL cs.LG Version update

Does Continued Pretraining on a Learner Corpus Improve Automated Essay Scoring on English Proficiency Tests? Evidence from EFCAMDAT

Duy Anh Nguyen

Affiliations: University of Greenwich

AI summary: Investigates how domain-adaptive continued pretraining (DAPT) on the EFCAMDAT learner corpus affects transformer-based automated essay scoring (AES) for English proficiency tests, finding that full-corpus DAPT gives mixed results while targeted DAPT on CEFR-level subsets improves in-domain scoring more reliably.

Comments 16 pages, 3 figures, 10 tables, including references and appendices

Abstract

Recent automated essay scoring (AES) studies increasingly use pretrained transformer models, but these models are usually pretrained on general-domain English and may under-represent second-language learner writing. This study investigates whether domain-adaptive continued pretraining (DAPT) on the EFCAMDAT learner corpus improves transformer-based AES for English proficiency tests. We apply DAPT to three transformer encoders and evaluate them on FCE and IELTS in both in-domain scoring and few-shot cross-dataset transfer. Full-corpus DAPT produces mixed results across models, datasets, and metrics. Further analyses suggest that these mixed effects are partly explained by mismatches in proficiency, genre, and communicative purpose between EFCAMDAT and the downstream datasets. A proficiency-based ablation shows that targeted DAPT using CEFR-aligned subsets improves downstream scoring more reliably than full-corpus DAPT, especially for FCE with B1--B2 data. However, these gains do not consistently improve cross-dataset transfer. Overall, the findings suggest that continued pretraining on a learner-writing corpus can benefit in-domain AES for English assessment when the pretraining data is sufficiently aligned with the downstream assessment settings. However, it does not automatically improve transferability across different English proficiency test datasets.

2605.25916 2026-05-26 cs.LG cs.DC cs.NI version update

Joint Optimization of Training and Inference in Federated Edge Learning via Constrained Multi-Objective Deep Reinforcement Learning

Zhen Li, Jun Cai, Chao Yang, Haoran Gao

Affiliations: Department of Electrical and Computer Engineering, Concordia University; School of Automation, Guangdong University of Technology

AI summary: Proposes an online optimization framework that uses a constrained multi-objective deep reinforcement learning algorithm, C-MOPPO, to jointly manage federated training and inference on resource-constrained edge devices, maximizing inference accuracy while minimizing latency and energy consumption.

Abstract

Federated edge learning (FEEL) has recently emerged as a promising paradigm for achieving edge intelligence (EI) by enabling collaborative model training across edge devices while protecting data privacy. In this paper, we put forth an online optimization framework that jointly manages federated training and inference on resource-constrained edge devices. We introduce a tandem-queue-inspired conversion mechanism that bridges inference requests and training data, and further incorporate both data and model freshness into the accuracy formulation to capture temporal dynamics in real-world environments. To maximize inference accuracy while minimizing latency and energy consumption, the mode selections, communication, and computation resource allocations of edge devices are jointly optimized. We formulate this optimization as a multi-objective optimization problem, which is NP-hard and further complicated by the online setting. To address these challenges, we transform the problem into a multi-objective Markov decision process (MOMDP) and develop a constrained multi-objective proximal policy optimization (C-MOPPO) algorithm. Specifically, C-MOPPO first learns a set of policies with different preferences across three objectives, then leverages constrained policy optimization to enrich the Pareto front and obtain high-quality, dense solutions. Extensive experiments demonstrate that C-MOPPO achieves well-balanced trade-offs among objectives and significantly outperforms baselines under various system configurations.

2605.25903 2026-05-26 cs.CL cs.LG version update

Universal Activation Verbalizer: A Unified Framework for Cross-Model Activation Explanation

Haiyan Zhao, Zirui He, Guanchu Wang, Ali Payani, Yingcong Li, Mengnan Du

Affiliations: New Jersey Institute of Technology; University of North Carolina at Charlotte; Cisco Research; The Chinese University of Hong Kong, Shenzhen

AI summary: Proposes the Universal Activation Verbalizer (UAV), a framework that uses a shared decoder and lightweight adapters to turn hidden representations from heterogeneous models into natural-language explanations. It supports activation verbalization across model families and scales and is competitive with strong baselines on classification, fact retrieval, and gist summarization.

Comments 23 pages, 11 figures, 11 tables

Abstract

Activation verbalization explains hidden representations in natural language, but existing methods are mostly limited to self-explanation, where each model explains only its own activations. We introduce Universal Activation Verbalizer (UAV), a framework that uses a shared decoder to explain activations from heterogeneous donor models. UAV learns a lightweight adapter that converts donor activations into soft tokens in the decoder's embedding space, and further supports adapter-only transfer by reusing a frozen decoder-side LoRA while training only a new adapter for another donor. Across classification, fact retrieval, and gist summarization, UAV remains competitive with strong self-explanation baselines while enabling cross-model verbalization across model families and scales. Ablations show that decoder-side tuning mainly improves task behavior, whereas the adapter provides the activation-grounded factual and semantic information needed for faithful explanations.

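2605.25894 2026-05-26 cs.LG q-fin.ST version update

Predicting Stock Price Direction on Earnings Announcement Days using Multi-modal Deep Learning

Manuel Noseda, Nathan Soldati, Marco Paina

Affiliations: ETH Zürich

AI summary: Combines fundamental metrics, technical indicators, and news sentiment, and uses LSTM and Transformer models to predict stock price direction on earnings announcement days. The Transformer is more sensitive in identifying volatile movements, and news sentiment improves performance.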

Abstract

Predicting stock price movements during Earnings Announcements (EAs) is a significant challenge due to market noise and high-impact price discontinuities. In this study, we evaluate whether pre-announcement news sentiment, firm fundamentals, and recent market dynamics jointly predict the directional price movement of equities on EA days. We construct a multi-modal feature space combining 15 fundamental metrics, 3 price-based technical indicators and sentiment scores derived from financial news articles processed using FinBERT. We compare a Long Short-Term Memory (LSTM) network and a Transformer-based architecture against a logistic regression baseline, and further assess all models with and without sentiment features to quantify their incremental value. Our results indicate that while the LSTM demonstrates higher precision through a conservative safe-bet strategy, the Transformer model exhibits superior sensitivity in identifying volatile movements, achieving a higher macro F1-score, with ablation experiments showing a consistent benefit from incorporating news sentiment.

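2605.25890 2026-05-26 cs.LG version update

Merge-Bench: Resolve Merge Conflicts with Large Language Models

Benedikt Schesch, Michael D. Ernst

Affiliations: Amazon; University of Washington

AI summary: Builds Merge-Bench, a dataset of 7938 real-world merge conflicts, and trains the LLMergeJ model with Group Relative Policy Optimization (GRPO). On Java programs the 14B-parameter model outperforms most commercial LLMs, yet the best models still resolve fewer than 60% of conflicts correctly.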

Comments 14 pages, 7 figures

Abstract

This paper applies machine learning to the difficult and important task of version control merging. (1) We constructed a dataset, Merge-Bench, of 7938 real-world merge conflict hunks from 1439 GitHub repositories. The ground truth is the merge resolution that developers committed to the repository. Our dataset construction methodology is scalable to arbitrary amounts of data since no manual labeling is required. (2) We trained a model, LLMergeJ, to resolve merge conflicts in Java programs. Our approach uses Group Relative Policy Optimization (GRPO), an online reinforcement learning method, to train a Large Language Model (LLM). (3) We performed two evaluations of the performance of LLMs on resolving merge conflicts. On Java programs, LLMergeJ with 14B parameters outperforms 3 commercial LLMs, trailing only Gemini 2.5 Pro. Across 11 programming languages, commercial LLM performance is largely stable from language to language. The best models correctly resolve less than 60% of merge conflicts.
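
For readers unfamiliar with the unit being evaluated, a merge conflict hunk is the region Git delimits with its conflict markers. The following is a minimal sketch, assuming Git's default two-sided marker style (`<<<<<<<`, `=======`, `>>>>>>>`) with no diff3 base section. The function name `split_conflict_hunks` and the dictionary keys are illustrative and are not taken from Merge-Bench.

```python
def split_conflict_hunks(text):
    """Return one dict per conflict hunk with the lines from each side.

    Assumes Git's default marker style without the diff3 base section.
    """
    hunks = []
    current = None   # hunk being collected, or None outside a conflict
    side = None      # "ours" before =======, "theirs" after it
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            current = {"ours": [], "theirs": []}
            side = "ours"
        elif line.startswith("=======") and current is not None:
            side = "theirs"
        elif line.startswith(">>>>>>>") and current is not None:
            hunks.append(current)
            current, side = None, None
        elif current is not None:
            current[side].append(line)
    return hunks


example = """\
int total = 0;
<<<<<<< HEAD
total += price;
=======
total += price * quantity;
>>>>>>> feature
return total;
"""
print(split_conflict_hunks(example))
```

A resolver, whether a model or a developer, has to produce the text that replaces everything between the first and last marker; per the abstract, the ground truth is the resolution that developers committed.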

2605.25888 2026-05-26 cs.LG math.OC version update

Optimal and Order-optimal Gated Priority-based Greedy Policies for Two-layer Multi-item Order Fulfillment

Xi Chen, Yuze Chen, Ziyi Chen, Yuan Zhou

Affiliations: Leonard N. Stern School of Business, New York University; Qiuzhen College, Tsinghua University; Yau Mathematical Sciences Center & Department of Mathematical Sciences, Tsinghua University

AI summary: For real-time fulfillment decisions by e-commerce firms in a two-layer distribution network, proposes gated priority-based greedy policies, proves the optimality of their competitive ratios, and validates their performance in numerical experiments.

Abstract

We study how an e-commerce firm should make real-time fulfillment decisions in a two-layer distribution network when multi-item customer orders arrive sequentially and future demand is unknown. The central managerial tension is whether to use scarce front distribution center (FDC) inventory to save current fulfillment cost or preserve that inventory for future orders that may be more valuable to serve locally. We formulate an adversarial online model with multiple FDCs, one regional distribution center (RDC), multi-unit multi-item orders, and item-specific and time-varying variable costs. Our theoretical objective is to characterize when simple, interpretable, and implementable fulfillment rules can perform nearly as well as an optimal clairvoyant planner. We develop a family of Gated Priority-based Greedy policies, derive competitive-ratio guarantees under both time-varying and time-invariant cost structures, and establish matching or near-matching lower bounds for any online algorithm. Numerical experiments show that the proposed policies perform strongly relative to generalized myopic and forecast-based benchmarks. The analysis yields managerial guidance on when local inventory should be protected, when splitting orders is worth the fixed-cost burden, and how the relative magnitudes of fixed and variable costs determine the value of more sophisticated optimization.

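2605.25882 2026-05-26 cs.LG version update

Conformalised imprecise inference for robust extrapolation under limited data

Yu Chen, Scott Ferson

Affiliations: Institute for Risk and Uncertainty; University of Liverpool

AI summary: Proposes a model-agnostic conformalised imprecise inference framework that introduces imprecision and distance awareness to maintain coverage under distribution shift and adaptively widen uncertainty, enabling robust extrapolation with limited data.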

Comments 10 pages, 5 figures

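2605.25880 2026-05-26 cs.LG version update

The Quantization Benefits of Residual-Free Transformers

Yiping Ji, Mahalakshmi Sabanayagam, Peyman Moghadam, Hemanth Saratchandran, Simon Lucey

Affiliations: Australian Institute for Machine Learning, Adelaide University; DATA61, CSIRO

AI summary: By comparing residual and residual-free transformers, finds that residual connections make activations more non-Gaussian and thereby increase quantization error. Residual-free transformers, made trainable with techniques such as orthogonal initialization, keep near-Gaussian activations and are markedly more robust to low-bit quantization, revealing a trade-off between accuracy and compressibility.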

Comments Under review

Abstract

Large-scale transformer training and deployment are increasingly constrained by the transfer of activations, gradients, and optimizer states across accelerators. Low-bit quantization offers a natural remedy, but transformer activations are often heavy-tailed and outlier-dominated, making simple quantization highly lossy. We show that this difficulty is not only a property of the quantizer, but also of the architecture. Specifically, residual connections can drive transformer activations away from Gaussianity during training. Using controlled comparisons between residual and residual-free transformers, we demonstrate that this effect leads to substantially higher quantization error and accuracy degradation at low precision in residual models. We explain the phenomenon through an excess kurtosis analysis, showing that residual mixing can amplify non-Gaussianity, whereas dense mixing in residual-free transformers contracts non-Gaussianity. We then show that residual-free transformers can be made trainable using orthogonal initialization, spectral or second-order optimization, and depth-aware scaling of attention temperature. In language tasks, while there is a small drop in full-precision performance, these models retain near-Gaussian activations and exhibit significantly improved robustness to low-bit quantization. Our results identify an accuracy-compressibility trade-off in transformer design and motivate architecture-level approaches to quantization-friendly foundation models.
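
The link between excess kurtosis and quantization error can be illustrated on toy data. The sketch below is not the paper's experiment: it assumes a symmetric absmax quantizer and uses a Gaussian sample with 1% of values scaled by 10 as a stand-in for outlier-dominated activations. The function names and the 4-bit setting are illustrative choices.

```python
import random


def excess_kurtosis(xs):
    """Sample excess kurtosis: fourth standardized moment minus 3."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    fourth = sum((x - mean) ** 4 for x in xs) / n
    return fourth / var ** 2 - 3.0


def relative_quant_error(xs, bits=4):
    """Squared error of symmetric absmax quantization, relative to the energy of xs."""
    levels = 2 ** (bits - 1) - 1            # e.g. 7 positive levels for 4 bits
    scale = max(abs(x) for x in xs) / levels
    err = sum((x - round(x / scale) * scale) ** 2 for x in xs)
    return err / sum(x * x for x in xs)


rng = random.Random(0)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(10000)]
# Toy stand-in for outlier-dominated activations: 1% of values scaled by 10.
heavy = [x * 10.0 if rng.random() < 0.01 else x for x in gaussian]

for name, xs in [("gaussian", gaussian), ("heavy-tailed", heavy)]:
    print(name, round(excess_kurtosis(xs), 2), round(relative_quant_error(xs), 4))
```

Because the quantization step is set by the largest magnitude, a handful of outliers stretches the grid and most ordinary values collapse onto a few levels, which is why the heavy-tailed sample shows the larger relative error.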

2605.25868 2026-05-26 cs.HC cs.LG version update

The Timing Dependencies of Trust: Speed, Accuracy, and cBCI Neuro-Decoupling in Human-AI Teams

Christopher Baker, Stephen Hinton, Akashdeep Nijjar, Riccardo Poli, Caterina Cinel, Tom Reed, Stephen Fairclough

Affiliations: School of Electronics, Electrical Engineering and Computer Science; Queen's University Belfast; School of Psychology; Liverpool John Moores University; School of Computer Science and Electronic Engineering; University of Essex; Defence Science Technology Laboratory

AI summary: Compares a fast, low-accuracy AI assistant (FLA-AI) with a slow, high-accuracy one (SA-AI), using a collaborative brain-computer interface (cBCI) and an adaptive Riemannian Oracle. AI response time determines how the team fails: a fast AI induces blind compliance, while a slow AI causes delayed cognitive conflict. A hybrid fusion method effectively improves team performance.

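TIAR: Trajectory-Informed Advantage Reweighting for Large Language Model Abstention Learning

Muyu Pan, Shu Zhao, Nan Zhang, Philip Shin, Varun Parekh, Vijaykrishnan Narayanan, Rui Zhang

Affiliations: Department of Computer Science, The Pennsylvania State University

AI summary: Proposes TIAR, which uses the multiple trajectories in GRPO as a natural abstention signal to dynamically reweight the abstention reward. It achieves state-of-the-art abstention F1 scores in five of six evaluation categories while preserving baseline accuracy.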

Comments 10 pages, 1 figure, 4 tables

Abstract

This paper investigates abstention learning for large language models (LLMs), building on the use of a ternary reward to incentivize truthfulness. It extends that idea from a static ternary reward to Trajectory-Informed Advantage Reweighting (TIAR), which dynamically reweights the abstention reward during Group Relative Policy Optimization (GRPO) training. The work focuses on abstention learning rather than on improving truthfulness, as an exploration into hallucination reduction. Its novelty lies in methodological innovation, advantage reweighting, and benchmark selection. Leveraging GRPO's multiple trajectories as a natural abstention signal, the method uses a reward signal to explore knowledge boundaries and encourage consistency. The trajectories are shown to serve as an indicator of the policy's confidence on a given query, and they are then used to dynamically calculate the abstention advantage. AbstentionBench is used as the evaluation benchmark, as this work aims to contribute to the field of abstention learning. All datasets in the benchmark were evaluated with this method and various baselines. Empirical results demonstrate that TIAR achieves state-of-the-art abstention F1 scores in five of six evaluation categories, outperforming the static ternary baseline on 17 of 31 benchmark datasets while fully preserving baseline accuracy.
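
The starting point that the abstract builds on, a ternary reward fed into GRPO's group-relative advantage, can be sketched as follows. This is not TIAR itself: the reward values (+1 correct, 0 abstain, -1 wrong), the `ABSTAIN` token, and the `agreement` proxy for trajectory-based confidence are assumptions made for illustration, and the paper's actual reweighting rule is not reproduced here.

```python
from collections import Counter
from statistics import mean, pstdev


def ternary_reward(answer, gold):
    """Assumed reward values: +1 correct, 0 abstain, -1 wrong."""
    if answer == "ABSTAIN":
        return 0.0
    return 1.0 if answer == gold else -1.0


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: each reward standardized within its group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def agreement(answers):
    """Hypothetical confidence proxy: share of trajectories giving the modal answer."""
    return Counter(answers).most_common(1)[0][1] / len(answers)


answers = ["Paris", "Paris", "Lyon", "ABSTAIN"]
rewards = [ternary_reward(a, "Paris") for a in answers]
print(rewards, group_relative_advantages(rewards), agreement(answers))
```

With a static ternary reward the abstaining trajectory always sits between the correct and the wrong ones. A trajectory-informed scheme would instead let a signal such as `agreement` shift how strongly abstention is rewarded for a given query.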

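2605.25848 2026-05-26 cs.LG cs.AI version update

Geometric Evolution Maps: Extracting Stable Concept Probes from Transformer Residual Streams

James Henry

Affiliations: Independent Researcher

AI summary: Proposes Geometric Evolution Maps (GEMs), which trace a concept's directional trajectory through the residual stream and identify the handoff layer where rotation ceases in order to extract stable concept probes. Across 391 concept × model pairs, GEM probes strictly outperform peak-layer probes in 66.2% of trials.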

Comments 24 pages, 3 figures. Reference implementation: rosetta_tools v1.3.1 (doi:10.5281/zenodo.20361433)

Abstract

Concept probes extracted from transformer residual streams are only as reliable as the layer from which they are extracted. The common practice of probing at a fixed late layer or at the peak of a separation score function ignores a fundamental structural feature: concept representations undergo substantial directional rotation during their assembly phase, and do not settle into a stable direction until a characteristic handoff layer after the primary Concept Allocation Zone (CAZ). We introduce Geometric Evolution Maps (GEMs), which track the full directional trajectory of a concept through residual stream activations, identify the handoff layer where rotation ceases, and extract the settled probe direction from that layer. Across 23 architectures spanning 70M to 14B parameters and 17 concept types, the entry-to-exit cosine similarity within CAZs has a mean of 0.233, showing that probe direction at CAZ entry does not reliably predict probe direction at exit. Ablation experiments across 391 concept x model pairs (23 models x 17 concepts) show that GEM-extracted probes are at least as precise as peak-layer probes in 268/391 trials (68.5%), and strictly outperform in 259/391 (66.2%). The architecture split is pronounced: MHA models favour the handoff in 173/221 trials (78.3%); GQA models favour the handoff in only 56/119 trials (47.1%). Model-level Wilcoxon: W=214, N=23, p=0.010 (one-sided). An adaptive ablation width rule targets the 79/391 near-final-layer cases: it improves probe quality in 60/79 triggered cases (75.9%), mean gain +7.44pp. A direction-specificity control confirms the ablation effect is concept-direction specific: median 377x suppression rate versus random-direction ablation (99.1% of concept directions beat all 10 random seeds). Reference implementation: rosetta_tools v1.3.1 (doi:10.5281/zenodo.20361433).
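
The idea of locating the layer where a probe direction stops rotating can be sketched with consecutive-layer cosine similarities. This is a hypothetical illustration rather than the rosetta_tools implementation: the stopping rule, the 0.95 threshold, and the two-dimensional toy directions are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def handoff_layer(directions, threshold=0.95):
    """Earliest layer index from which every consecutive-layer cosine stays >= threshold.

    The threshold and this stopping rule are assumptions for illustration.
    """
    sims = [cosine(directions[i], directions[i + 1]) for i in range(len(directions) - 1)]
    layer = len(directions) - 1
    for i in range(len(sims) - 1, -1, -1):
        if sims[i] < threshold:
            break
        layer = i
    return layer


# Toy per-layer probe directions: the direction rotates, then settles.
directions = [(1.0, 0.0), (0.7, 0.7), (0.0, 1.0), (0.05, 1.0), (0.04, 1.0)]
print(handoff_layer(directions), round(cosine(directions[0], directions[-1]), 3))
```

In this toy trajectory the first and last directions are nearly orthogonal, mirroring the abstract's point that a probe taken at the start of the rotation says little about the settled direction.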

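2605.25835 2026-05-26 cs.LG cs.AI version update

Context-Instrumental Data Distillation for Kubernetes Manifest Generation: Method and Experimental Evaluation

Andrey Kozachok, Anatoliy Bakaev, Aleksandr Kozachok, Shamil Magomedov, Artem Noev

Affiliations: RTU MIREA

AI summary: Proposes context-instrumental data distillation, which builds a corpus through synthetic generation and reverse instruction generation and filters it with external validators, to fine-tune a 1.5B-parameter small language model for Kubernetes manifest generation under resource constraints. Experiments indicate that strict output formatting matters more than adding training examples.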

Comments 15 pages, 4 figures, 2 tables

Abstract

This paper examines the specialization of Small Language Models (SLMs) with up to 4 billion parameters for generating artifacts in domain-specific languages (DSL). Kubernetes manifests are chosen as the target domain. We propose the context-instrumental data distillation method: the source corpus is formed through synthetic generation and, in an extended scheme, through reverse instruction generation from real Kubernetes YAML files, with pairs included in training only upon passing external validators and matching the domain context model. Unlike classical KL-divergence knowledge distillation, the baseline implementation reduces to supervised fine-tuning on instrumentally verified examples. The experimental section presents a pilot implementation under resource-constrained conditions: the DeepSeek-V4 Flash API serves as the teacher for synthetic generation, while Qwen2.5-Coder-1.5B-Instruct is fine-tuned via LoRA on CPU. On the K8s-Distill-Pilot corpus (train_1200, validation_100, test_200), we achieved full-pass@1 = 91.5% (183/200) with a stricter prompt formulation and max_new_tokens=768. The key empirical finding is that for Kubernetes YAML, result quality in the pilot depended more on strict output format requirements than on simply increasing the number of training examples.

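2605.25831 2026-05-26 cs.CL cs.AI cs.LG version update

Geometry-Adaptive Counterfactual Distribution Learning with Diffusion-Guided Smoothing

Kwangho Kim

Affiliations: Department of Statistics, Korea University

AI summary: For high-dimensional counterfactual distribution learning, proposes two diffusion-guided, geometry-adaptive smoothing estimators whose error is governed by an effective dimension, validated in CelebA-based experiments.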

Abstract

We study counterfactual distribution learning for high-dimensional outcomes whose counterfactual law may concentrate near lower-dimensional structure. Standard isotropic smoothing treats all ambient directions equally, leading to unfavorable scaling and unstable local inference. We propose two diffusion-guided estimators based on semiparametric debiasing: diffusion-informed smoothing for counterfactual densities and diffusion-informed score smoothing for counterfactual scores. The estimators combine causal nuisance adjustment with geometry-adaptive localization driven by diffusion score information, removing first-order nuisance bias while aligning smoothing with local outcome geometry. We establish asymptotic expansions, risk bounds, and inference procedures for smoothed density and score-based targets, with ambient density inference obtained under additional approximation conditions. Under structural geometry conditions, the leading stochastic error is governed by an effective dimension induced by the diffusion-guided kernel, rather than by the ambient dimension. Semi-synthetic experiments based on CelebA show steeper error decay for geometry-adaptive methods, supporting the proposed effective-dimension theory.

2605.25789 2026-05-26 cs.LG cs.AI cs.IT math.IT stat.ML version update

On the Benefits of Free Exploration for Regret Minimization in Multi-Armed Bandits

Yunlong Hou, Zixin Zhong, Vincent Y. F. Tan

Affiliations: Department of Mathematics, National University of Singapore; Department of Mathematics, Department of Electrical and Computer Engineering, National University of Singapore

AI summary: Studies a multi-armed bandit problem in which cumulative regret is minimized after an initial free exploration phase, proposes the two-phase algorithm UFE-KLUCB-H, and proves that it incurs strictly less regret than policies without free exploration.

Comments 55 pages

Abstract

We study a stochastic multi-armed bandit problem where an agent is granted a free exploration budget before regret accumulates, a setting not captured by the classic regret minimization or pure exploration paradigms. The goal is to design an adaptive policy that strategically explores the bandit instance in the initial free exploration phase and minimizes the cumulative regret in the subsequent phase. We formalize this regret minimization with free exploration problem and identify an interesting regime where the free exploration budget scales logarithmically with the time horizon. To quantify the amount of regret saved with high probability as a result of the availability of the free exploration phase, we introduce a novel set of policies known as $(α,β)$-probably saving policies. We propose a two-phase, probably saving algorithm, UFE-KLUCB-H, which consists of a principled free exploration policy, UFE, and a history-aware regret minimization policy KLUCB-H. Instance-dependent upper bounds on UFE-KLUCB-H are derived, showing that UFE-KLUCB-H accumulates strictly less regret than policies that do not have access to a free exploration phase. Complementarily, we derive instance-dependent lower bounds based on novel multi-instance perturbation arguments tailored to the free-exploration setting, demonstrating the near-optimality of UFE-KLUCB-H for two-valued bandits. Our upper and lower bounds reveal sharp phase transitions in the accumulated regret depending on the amount of available free exploration. Simulations are conducted to demonstrate that forced exploration and adaptivity in the algorithm lead to greater regret savings.
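
As background for the KLUCB-H component, the sketch below computes the standard KL-UCB index for Bernoulli rewards by bisection. It is not the paper's UFE or KLUCB-H policy: it assumes Bernoulli arms and uses log(t) as the exploration budget, a common simplification, and the function names are illustrative.

```python
import math


def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def klucb_index(mean, pulls, t, iters=50):
    """Largest q >= mean with pulls * KL(mean, q) <= log(t), found by bisection."""
    budget = math.log(t) / pulls
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


# Extra pulls, for instance from a free exploration phase, tighten the index.
print(klucb_index(0.5, pulls=10, t=1000), klucb_index(0.5, pulls=100, t=1000))
```

An arm that enters the regret phase with more observations has a tighter upper confidence bound, which gives some intuition for why history collected during free exploration can reduce later regret.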

2605.25786 2026-05-26 cs.LG cs.AI version update

NPSolver: Neural Poisson Solver with Iterative Physics Supervision

Bocheng Zeng, Rui Zhang, Runze Mao, Mengtao Yan, Xuan Bai, Yang Liu, Zhi X. Chen, Hao Sun

Affiliations: Gaoling School of Artificial Intelligence; Renmin University of China; School of Mechanics and Engineering Science; Peking University; AI for Science Institute; University of Chinese Academy of Sciences

AI summary: Proposes NPSolver, a label-free neural Poisson solver trained with iterative physics supervision (a small number of PCG steps), and introduces the Boundary-Aware Transolver architecture. It outperforms physics-informed and data-driven baselines on 2D and 3D irregular geometries.

Comments KDD 2026

Abstract

Efficiently solving Poisson equations on complex, irregular domains remains a fundamental challenge in scientific computing, as classical iterative solvers often suffer from prohibitive runtime due to ill-conditioned systems. While neural operators offer a fast alternative, they typically rely on large-scale labeled datasets or struggle with unstable training dynamics when using physics-informed residual losses. We propose NPSolver, a neural Poisson solver trained without solution labels via iterative physics supervision. Instead of relying on fully converged numerical solutions or raw PDE residuals, NPSolver utilizes a small number of preconditioned conjugate gradient (PCG) steps to refine its own predictions, providing a more stable and well-scaled training signal. Theoretical analysis confirms that this iterative supervision serves as a well-conditioned error proxy and that a stop-gradient design is essential for optimization stability. To better capture boundary-driven features under mixed boundary conditions, we further introduce the Boundary-Aware Transolver (BA-Transolver) architecture that explicitly separates interior and boundary tokenization. Extensive evaluations on 2D and 3D irregular geometries demonstrate that NPSolver outperforms both physics-informed and data-driven baselines. Furthermore, a downstream thermal control task highlights the model's capability for conducting efficient and reliable gradient-based boundary control. We will release our code and data at https://github.com/intell-sci-comput/NPSolver.
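
The supervision signal described above, a few PCG steps applied to the model's own prediction, can be sketched on a toy problem. The code below is not the NPSolver implementation: it assumes a 1D finite-difference Poisson operator with zero Dirichlet boundaries, a Jacobi preconditioner, and a zero vector standing in for the network output. All function names are illustrative.

```python
def apply_laplacian(u):
    """Matrix-free 1D Poisson operator, zero Dirichlet boundaries: 2*u[i] - u[i-1] - u[i+1]."""
    n = len(u)
    return [2 * u[i] - (u[i - 1] if i > 0 else 0.0) - (u[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pcg_refine(u0, f, steps):
    """Run `steps` Jacobi-preconditioned CG iterations on A u = f, starting from u0."""
    u = list(u0)
    r = [fi - ai for fi, ai in zip(f, apply_laplacian(u))]
    z = [ri / 2.0 for ri in r]              # Jacobi preconditioner: diag(A) = 2
    p = list(z)
    rz = dot(r, z)
    for _ in range(steps):
        if rz < 1e-30:                      # already converged
            break
        Ap = apply_laplacian(p)
        alpha = rz / dot(p, Ap)
        u = [ui + alpha * pi for ui, pi in zip(u, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [ri / 2.0 for ri in r]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return u


def residual_norm(u, f):
    return sum((fi - ai) ** 2 for fi, ai in zip(f, apply_laplacian(u))) ** 0.5


f = [1.0] * 8
prediction = [0.0] * 8                       # stand-in for a network output
target = pcg_refine(prediction, f, steps=3)  # would be treated as a constant (stop-gradient) target
print(residual_norm(prediction, f), residual_norm(target, f))
```

In a training loop the loss would compare the prediction with `target` while holding `target` fixed, which corresponds to the stop-gradient design the abstract describes.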

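2605.25771 2026-05-26 cs.LG cs.AI version update

MDGMIX: Boundary-Aware Subgraph Mixing for Multi-Domain Graph Pre-Training

Ziyu Zheng, Yaming Yang, Ziyu Guan, Wei Zhao, Xinyan Huang

Affiliations: School of Computer Science and Technology, Xidian University, Xi’an, China; School of Artificial Intelligence, Xidian University, Xi’an, China

AI summary: To address data redundancy in multi-domain graph pre-training, proposes the MDGMIX framework, which decouples shared and domain-specific patterns through boundary-aware subgraph mixing and hierarchical discriminative learning and uses a lightweight prompt weighting mechanism during adaptation. It outperforms strong baselines on few-shot classification tasks with better efficiency.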

Comments Accepted by ICML2026

Abstract

Multi-domain graph pre-training is a crucial step in constructing foundational graph models with cross-domain generalization capabilities. However, existing methods predominantly rely on jointly training all source domain graphs, resulting in high computational costs. Furthermore, it remains unclear whether all source domain graph data contribute equally to effective transfer. This paper empirically reveals significant data redundancy in multi-domain graph pre-training. Based on this finding, we propose the Multi-domain Graph Pre-training Framework, MDGMIX, which combines boundary-aware subgraph mixing with hierarchical discrimination. By selecting boundary nodes to construct challenging mixed-domain subgraphs, MDGMIX employs coarse-grained domain discrimination and fine-grained domain decomposition losses to decouple shared patterns from domain-specific patterns. During adaptation, MDGMIX employs a lightweight prompt weighting mechanism to transfer source domain knowledge. Extensive experiments demonstrate that MDGMIX consistently outperforms strong baselines in few-shot classification tasks while exhibiting superior time and memory efficiency. The code is available at: https://github.com/zhengziyu77/MDGMIX.

2605.25765 2026-05-26 cs.CV cs.AI cs.LG version update

Concept Unlearning via Cross-Attention Activation Projection for Diffusion Models

Saemi Moon, Suhyeon Jun, Seoyeon Lee, Dongwoo Kim

Affiliations: CSE, POSTECH; GSAI, POSTECH

AI summary: Proposes PURE, which builds forget and retain bases in the cross-attention activation space and edits weights through a linear projection, effectively erasing target concepts while preserving retained ones.

Abstract

Concept unlearning aims to erase a target concept from a pretrained text-to-image diffusion model without retraining. Closed-form methods are attractive in this setting because they apply a single deterministic edit to the cross-attention weights and add no inference-time cost. Existing closed-form methods, however, represent the target concept through the text encoder's response to a few short anchor prompts that name it, and paraphrased prompts that evoke the concept without naming it consistently bypass the edit. We argue that the target should instead be represented in the cross-attention activation space. Text embeddings describe the user's prompt, while cross-attention activations describe what the model is about to render, and the latter generalize to paraphrases that the anchor templates do not cover. Building on this observation, we propose PURE (Projection in U-Net Rendering for Erasure), a closed-form method that builds the forget and retain bases from per-layer cross-attention activations captured along a short denoising trajectory and applies a single linear projector to the cross-attention key and value weights. On a recent holistic concept-unlearning benchmark covering ten concepts across artistic style, intellectual property, celebrity, and NSFW categories, PURE significantly reduces target leakage under paraphrased and adversarial prompts while preserving retain concepts close to the unedited model, yielding the best overall forget-retain trade-off among evaluated methods.

2605.25750 2026-05-26 cs.LG 版本更新

Invariant-Based Weight Sharing for Message Passing

基于不变量的消息传递权重共享

Florian Seiffarth

发表机构 * University of Bonn（波恩大学）； Lamarr Institute for Machine Learning and Artificial Intelligence（拉马尔人工智能与机器学习研究院）

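AI summary: Proposes a weight-sharing principle in which weights are indexed directly by graph invariants, making message-passing neural networks structure-aware and improving on standard MPNNs on synthetic and real-world data.

Comments 13 pages main paper + 30 pages references and appendix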

AI中文摘要

消息传递神经网络（MPNN）是学习图结构域表示的一个强大框架。然而，MPNN中的权重仅作用于特征，限制了其捕捉结构模式的能力。我们引入了一种新颖的结构感知权重共享原则，该原则明确地融入了图结构固有的信息。权重由用户选择的图不变量（即在节点置换下保持不变的函数）直接索引，从而能够在结构等价的子图之间进行系统性的权重复用。我们提出了ShareGNN，该模型在一个简单的编码器-解码器架构中实例化了这一原则，产生了一个具有可学习邻接矩阵和类似Transformer连接性的MPNN。我们证明，其表达能力至少与所选不变量的区分能力相当，从而提供了对模型复杂度的显式控制。在合成数据和真实数据以及子图计数任务上的实验表明，与标准MPNN相比，该方法具有一致的改进，具有超越1-WL测试的竞争力，并且可扩展到大型数据集。

英文摘要

Message-passing neural networks (MPNNs) are a powerful framework for learning representations of graph-structured domains. However, weights in MPNNs act on features only, limiting their ability to capture structural patterns. We introduce a novel structure-aware weight sharing principle that explicitly incorporates information inherent to the graph structure. Weights are indexed directly by user-chosen graph invariants, i.e., functions preserved under node permutations, enabling systematic reuse across structurally equivalent subgraphs. We present ShareGNNs, which instantiate this principle within a simple encoder-decoder architecture, resulting in an MPNN with learnable adjacency and transformer-like connectivity. We show that their expressivity is at least as strong as the discriminative power of the chosen invariants, providing explicit control over the model complexity. Experiments on synthetic and real-world data, as well as subgraph counting tasks, demonstrate consistent improvements over standard MPNNs, competitive expressivity beyond the 1-WL test, and scalability to large datasets.
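To make "weights indexed directly by graph invariants" concrete, here is a minimal sketch that uses node degree as the invariant and looks up one scalar weight per (receiver degree, sender degree) pair. The graph, the choice of degree as the invariant, and the weight values are illustrative assumptions, not the ShareGNN architecture.

```python
import numpy as np

# Toy graph: the path 0-1-2-3 as an adjacency list.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
x = np.array([1.0, 2.0, 3.0, 4.0])           # one scalar feature per node

def degree(v):
    """Node degree, a simple permutation-invariant label."""
    return len(adj[v])

# One weight per invariant pair (deg(receiver), deg(sender)); values are made up.
weights = {(1, 2): 0.5, (2, 1): -1.0, (2, 2): 2.0}

def message_pass(features):
    out = np.zeros_like(features)
    for v, nbrs in adj.items():
        for u in nbrs:
            # The weight depends on structure, not on node identity.
            out[v] += weights[(degree(v), degree(u))] * features[u]
    return out

h = message_pass(x)
```

Edges that look the same under the invariant, such as 0-1 and 3-2 here, reuse the same weight, which is the sense in which sharing follows structure.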

2605.25749 2026-05-26 cs.IR cs.AI cs.LG 版本更新

DeGRe: Dense-supervised Generative Reranking for Recommendation

DeGRe: 密集监督的生成式重排序用于推荐

Chaotian Song, Jingyao Zhang, Chenghao Chen, Zisen Sang, Dehai Zhao, Guodong Cao, Boxi Wu, Deng Cai, Jia Jia

发表机构 * College of Software, Zhejiang University, Hangzhou, China； Rajax Network Technology, Taobao Shangou of Alibaba (Hangzhou, Beijing, and Shanghai), China； State Key Lab of CAD&CG, Zhejiang University, Hangzhou, China

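AI summary: Proposes DeGRe, a framework in which dense supervision from offline exploration (the Lookahead Evaluator) guides the Online Generator toward single-pass greedy decoding, addressing heuristic label bias and credit assignment in reranking.

Comments Accepted to KDD 2026 (ADS Track)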

DOI: 10.1145/3770855.3818363

AI中文摘要

在多阶段推荐系统中，重排序通过捕获列表内上下文依赖关系来优化整体效用，但其核心挑战在于在指数级排列空间中探索最优序列。最近的研究转向端到端生成式框架，通常利用列表级奖励或偏好对齐来指导生成器训练。然而，这些方法仍面临两个关键问题。首先是启发式标签偏差。现有方法通常基于简单规则构建训练目标，例如将点击项提升到顶部，而忽略列表上下文中的因果依赖关系。其次是信用分配问题。稀疏的列表级后验奖励无法直接指导序列生成中的中间步骤，导致优化方向模糊。为了解决这些问题，我们提出DeGRe（密集监督的生成式重排序），一种通过密集监督弥合离线探索与在线效率之间差距的生成式重排序框架。DeGRe的核心在于其离线-在线解耦设计。在离线阶段，我们引入基于累积回归的Lookahead Evaluator，利用束搜索在未曝光空间中主动挖掘高价值前瞻序列。在训练期间，我们将评估器的逐步价值估计转换为密集监督信号，并将其蒸馏到轻量级在线生成器中。这种机制使生成器能够内化前瞻规划能力，在线推理时仅需一次高效的贪婪解码即可逼近全局最优。实验表明，DeGRe在公开基准和工业数据集上优于基线模型。我们已成功将DeGRe部署到淘宝闪购中，显著提升了在线推荐效果。

英文摘要

In multi-stage recommender systems, reranking optimizes overall utility by capturing intra-list contextual dependencies, yet its central challenge lies in exploring optimal sequences within an exponentially large permutation space. Recent studies have shifted towards end-to-end generative frameworks, which typically leverage list-wise rewards or preference alignment to guide generator training. However, these methods still face two critical issues. First is the heuristic label bias. Existing methods often construct training targets based on simple rules, such as promoting clicked items to the top, while ignoring causal dependencies within the list context. Second is the credit assignment problem. Sparse list-level posterior rewards fail to directly guide intermediate steps in sequence generation, leading to ambiguous optimization directions. To address these issues, we propose DeGRe (Dense-supervised Generative Reranking), a generative reranking framework that bridges the gap between offline exploration and online efficiency through dense supervision. The core of DeGRe lies in its offline-online decoupled design. During the offline phase, we introduce a Lookahead Evaluator based on cumulative regression, which leverages beam search to actively mine high-value lookahead sequences in the unexposed space. During training, we transform the step-wise value estimations from the evaluator into dense supervision signals and distill them into a lightweight Online Generator. This mechanism enables the generator to internalize lookahead planning capabilities, requiring only a single efficient greedy decoding pass during online inference to approximate the global optimum. Experiments demonstrate that DeGRe outperforms baseline models on public benchmarks and industrial datasets. We have successfully deployed DeGRe on Taobao Flash Shopping, significantly improving online recommendations.

2605.25740 2026-05-26 cs.LG 版本更新

Latent Representation Alignment for Offline Goal-Conditioned Reinforcement Learning

离线目标条件强化学习中的潜在表示对齐

Hyungkyu Kang, Byeongchan Kim, Min-hwan Oh

发表机构 * Seoul National University（首尔国立大学）

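AI summary: Targeting erroneous value-function generalization in offline goal-conditioned reinforcement learning, proposes Latent-Aligned Value Learning (LAVL), which unifies latent-representation-based value generalization with hierarchical planning and achieves the highest performance on 20 of 22 OGBench datasets.

Comments Accepted in ICML 2026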

AI中文摘要

离线目标条件强化学习（GCRL）提供了一个从固定数据集获取目标达成策略的实用框架。然而，在长视野任务中学习可靠的目标条件价值函数仍然具有挑战性。在本文中，我们指出目标条件价值函数中的错误泛化是一个根本性瓶颈，并证明在价值函数中引入适当的归纳偏置对于解决该瓶颈至关重要。基于这些发现，我们提出了潜在对齐价值学习（LAVL），一种离线GCRL算法，它将基于潜在表示的价值泛化与分层规划集成在一个统一框架中。在OGBench上的大量实验表明，LAVL持续优于现有的离线GCRL方法，在22个数据集中的20个上取得了最高性能。值得注意的是，LAVL在长视野任务和轨迹拼接数据集上表现出强大的性能，而先前的方法在这些任务上性能显著下降。我们的代码可在https://github.com/oh-lab/LAVL.git获取。

英文摘要

Offline goal-conditioned reinforcement learning (GCRL) provides a practical framework for obtaining goal-reaching policies from fixed datasets. However, learning a reliable goal-conditioned value function in long-horizon tasks remains challenging. In this paper, we identify erroneous generalization in goal-conditioned value functions as a fundamental bottleneck, and demonstrate that appropriate inductive bias in the value function is crucial for addressing the bottleneck. Building on these findings, we propose Latent-Aligned Value Learning (LAVL), an offline GCRL algorithm that integrates latent-representation-based value generalization with hierarchical planning in a unified framework. Extensive experiments on OGBench demonstrate that LAVL consistently outperforms existing offline GCRL methods, achieving the highest performance on 20 out of 22 datasets. Notably, LAVL exhibits strong performance in long-horizon tasks and trajectory stitching datasets, where prior methods suffer significant performance degradation. Our code is available at https://github.com/oh-lab/LAVL.git.

2605.25739 2026-05-26 cs.LG cs.GT stat.ML 版本更新

The Behavioral Credibility Trilemma: When Calibrated Autonomy Becomes Impossible

行为可信度三难困境：当校准自主性变得不可能

Lauri Lovén, Nam Do, Hassan Mehmood, Dinesh Kumar Sah, Sasu Tarkoma

发表机构 * Future Computing Group University of Oulu（奥卢大学未来计算组）； Department of Computer Science University of Helsinki（赫尔辛基大学计算机科学系）

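AI summary: Proves that, under rational oversight and whenever some tasks exceed the agent's reliable competence, no reinforcement learning policy with confidence-gated autonomy can simultaneously achieve maximum helpfulness, optimal calibration, and full autonomy: the Behavioral Credibility Trilemma.

Comments 48 pages, 3 figures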

AI中文摘要

我们证明，在理性监督下，当某些任务超出智能体的可靠能力时，任何具有置信门控自主性的强化学习策略都无法同时实现最大帮助性、最优校准和完全自主性：即行为可信度三难困境。这种不可能性是几何性的——向严格适当的评分规则添加任何非仿射自主性激励都会破坏严格适当性，因此，同时因校准置信度和自主行动而获得奖励的智能体，会在低于委托人批准阈值的任务上系统性地夸大其报告的置信度。行为扰动引理量化了这种膨胀（对于Brier分数，缩放比例为 $w_A/(2 w_C)$），并表明检测需要 $Ω(1/Δ^2)$ 次观测。我们证明委托人的最优监督规则必然是非仿射的，这使得不可能性是无条件的，并且在对数凹密度策略族中与优化器无关。我们形式化了置信门控决策问题，将现有方法映射到三难困境上，并确定了两种建设性的解决路径（承诺、领域分离）。一个540配置的Best-of-N实验测试了五个预注册假设，所有假设均得到强烈证实（效应量 $d = 1.10$ 至 $5.32$），并增加了对可达 $(H, C, A)$ 曲面几何的描述性分析，显示了一个与预测的膨胀饱和一致的平台截断前沿。

英文摘要

We prove that no reinforcement learning policy with confidence-gated autonomy can simultaneously achieve maximum helpfulness, optimal calibration, and full autonomy under rational oversight, whenever some tasks exceed the agent's reliable competence: the Behavioral Credibility Trilemma. The impossibility is geometric: adding any non-affine autonomy incentive to a strictly proper scoring rule destroys strict properness, so an agent rewarded for both calibrated confidence and autonomous action systematically inflates its reported confidence on tasks below the principal's approval threshold. The Behavioral Perturbation Lemma quantifies the inflation (scaling as $w_A/(2 w_C)$ for the Brier score) and shows detection requires $\Omega(1/\Delta^2)$ observations. We prove the principal's optimal oversight rule is necessarily non-affine, making the impossibility unconditional and optimizer-independent across log-concave-density policy families. We formalize the Confidence-Gated Decision Problem, map existing methods onto the trilemma, and identify two constructive resolution pathways (commitment, domain separation). A 540-configuration Best-of-N experiment tests five pre-registered hypotheses, all strongly confirmed (effect sizes $d = 1.10$ to $5.32$), and adds a descriptive analysis of the achievable-$(H, C, A)$ surface geometry showing a plateau-truncated frontier consistent with the predicted inflation saturation.
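The $w_A/(2 w_C)$ scaling can be reproduced in a toy setting. Assume, purely for illustration, that the agent's expected reward is $-w_C\,[p(1-q)^2 + (1-p)q^2] + w_A\,q$, that is, a Brier calibration term plus a bonus that grows linearly with reported confidence $q$. Setting the derivative to zero gives $q^* = p + w_A/(2 w_C)$. The bonus form and all constants below are assumptions for this sketch; the paper's incentive model and lemma may differ.

```python
import numpy as np

def expected_reward(q, p, w_c, w_a):
    """Expected reward for reporting confidence q when the true success probability is p."""
    brier = p * (1.0 - q) ** 2 + (1.0 - p) * q ** 2   # expected Brier loss
    return -w_c * brier + w_a * q                     # calibration term + toy autonomy bonus

p, w_c, w_a = 0.4, 1.0, 0.2
grid = np.linspace(0.0, 1.0, 100001)                  # candidate reports
q_star = grid[np.argmax(expected_reward(grid, p, w_c, w_a))]
inflation = q_star - p                                # compare with w_a / (2 * w_c)
```

With $w_A = 0$ the maximiser returns to $q^* = p$, which is the strict properness of the Brier score that the added incentive breaks.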

2605.25717 2026-05-26 cs.AI cs.CE cs.LG 版本更新

FLOATBench: A Dataset and Benchmark for Floating Offshore Wind Turbine Tower Fatigue

FLOATBench：浮式海上风力发电机塔架疲劳数据集与基准

João Alves Ribeiro, Bruno Alves Ribeiro, Francisco Pimenta, Sérgio M. O. Tavares, Faez Ahmed

发表机构 * Department of Mechanical Engineering（机械工程系）； Massachusetts Institute of Technology（麻省理工学院）； School of Engineering（工程学院）； Brown University（布朗大学）； CONSTRUCT, Faculty of Engineering University of Porto（CONSTRUCT，工程学院，葡萄牙波尔图大学）； University of Aveiro（阿维罗大学）

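AI summary: Presents FLOATBench, a tabular benchmark of 582,120 fatigue-damage labels derived from high-fidelity simulations of 22 MW floating wind turbine towers, with a regime-aware evaluation protocol that detects ranking shifts that random splits cannot reveal.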

AI中文摘要

全球大部分海上风能资源位于水深过大、无法使用固定式基础的海域，因此浮式海上风力发电机（FOWT）对于深水部署至关重要。随着行业向22 MW级设计规模发展，塔架疲劳变得愈发关键，因为更大的结构会放大由持续风浪激励引起的耦合气动-水动-伺服-弹性载荷。准确的疲劳损伤预测对于认证、设计优化和成本降低至关重要。然而，该领域缺乏共享的替代模型基准：不同研究报告了不同的仿真、划分和指标，使得方法难以比较。我们提出FLOATBench，一个公开的表格基准，包含三种22 MW FOWT塔架几何形状的582,120个逐截面疲劳损伤标签，这些标签来自三种塔架的19,404次高保真OpenFAST仿真（每种塔架6,468次：1,078个对齐风浪工况点×六个湍流种子），每种塔架在30个截面上进行标注。FLOATBench包括一个基于工况感知的联合风浪运行包络的alpha-shape划分，将测试点分为训练内、插值和外推区域。它配备了一个可复现的评估框架，涵盖三个协议级别：随机验证（E1）、塔内工况感知评估（E2）和跨塔迁移（E3）。工况感知协议揭示了全局性能与外推性能之间的排名变化，而随机划分排行榜无法检测到这些变化。据作者所知，FLOATBench是首个用于表格替代建模的FOWT疲劳基准，并提供了一个可推广到定义在物理运行包络上的工程替代模型的评估协议。数据集和代码可在以下网址获取：https://github.com/Joao97ribeiro/FLOATBench。

英文摘要

Most of the world's offshore wind resource lies in waters too deep for fixed-bottom foundations, making floating offshore wind turbines (FOWTs) essential for deep-water deployment. As the industry scales toward 22 MW class designs, tower fatigue becomes increasingly critical because larger structures amplify the coupled aero-hydro-servo-elastic loads induced by continuous wind and wave excitation. Accurate fatigue-damage prediction is therefore central to certification, design optimization, and cost reduction. Yet the field lacks a shared surrogate benchmark: studies report different simulations, splits, and metrics, making methods difficult to compare. We present FLOATBench, a public tabular benchmark with 582,120 per-section fatigue-damage labels across three 22 MW FOWT tower geometries, derived from 19,404 high-fidelity OpenFAST simulations across the three towers (6,468 per tower: 1,078 aligned wind/wave operating points × six turbulence seeds), labeled at 30 cross-sections per tower. FLOATBench includes a regime-aware alpha-shape partition of the joint wind/wave operating envelope, stratifying test points into in-train, interpolation, and extrapolation regimes. It is paired with a reproducible evaluation harness covering three protocol levels: random validation (E1), within-tower regime-aware evaluation (E2), and cross-tower transfer (E3). The regime-aware protocol reveals rank shifts between global and extrapolation performance that random-split leaderboards cannot detect. To the authors' knowledge, FLOATBench is the first FOWT fatigue benchmark for tabular surrogate modeling, and offers an evaluation protocol that generalizes to engineering surrogates defined over physical operating envelopes. Dataset and code available at: https://github.com/Joao97ribeiro/FLOATBench.
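The dataset sizes quoted in the abstract are mutually consistent, which a few lines of arithmetic confirm. The variable names are only labels for the numbers stated above.

```python
operating_points = 1_078      # aligned wind/wave operating points
turbulence_seeds = 6
towers = 3
sections_per_tower = 30

sims_per_tower = operating_points * turbulence_seeds    # 6,468
total_sims = sims_per_tower * towers                     # 19,404
total_labels = total_sims * sections_per_tower           # 582,120
```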

2605.25710 2026-05-26 physics.chem-ph cond-mat.mtrl-sci cs.LG physics.comp-ph 版本更新

Machine Learning Multiscale Interactions

机器学习多尺度相互作用

Àlex Solé, Sergio Suárez-Dou, Albert Mosella-Montoro, Silvia Gómez-Coca, Eliseo Ruiz, Alexandre Tkatchenko, Javier Ruiz-Hidalgo

发表机构 * Image Processing Group – Signal Theory and Communications Department（图像处理组——信号理论与通信系）； Inorganic and Organic Chemistry Department and Institute of Theoretical and Computational Chemistry（无机和有机化学系及理论与计算化学研究所）； Department of Physics and Materials Science（物理与材料科学系）

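AI summary: Proposes the Multiscale Structural Ensemble (MuSE), a hierarchical model that builds multiscale representations through Soft Coarse-Graining Pooling and couples with several machine learning force fields to capture quantum-mechanical interactions across scales.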

AI中文摘要

现实物理系统的特征在于跨多个长度和时间尺度的涌现相互作用，这对预测性机器学习模型构成了重大挑战。大多数科学机器学习模型关注于狭窄的相互作用范围。虽然机器学习力场提供了接近量子精度的准确性，但普遍的消息传递层缺失了长程多体效应。在此，我们引入多尺度结构集成（MuSE），一种层次模型，它使用软粗粒化池化从原子到粗节点的平滑分数分配构建粗粒表示，使机器学习力场模块能够在多个尺度上运行。MuSE是架构无关的，并与SO3krates、MACE和PaiNN机器学习力场耦合，适用于分子和材料。通过基于Hessian的基准测试、生物分子的折叠轨迹以及分子-石墨烯纳米结构中的能量分布，我们展示了MuSE的强大能力——与近期其他长程机器学习模型不同，MuSE在相关尺度上准确捕获了量子力学相互作用。

英文摘要

Realistic physical systems are characterised by emergent interactions across multiple length and time scales, posing a significant challenge for predictive machine learning (ML) models. Most scientific ML models focus on a narrow range of interactions. While machine learning force fields (MLFFs) offer near-quantum accuracy, the ubiquitous message-passing layers miss long-range many-body effects. Here we introduce the Multiscale Structural Ensemble (MuSE), a hierarchical model that uses Soft Coarse-Graining Pooling to construct coarse representations from smooth fractional assignments of atoms to coarse nodes, enabling MLFF modules to operate across multiple scales. MuSE is architecture-agnostic and coupled with SO3krates, MACE, and PaiNN MLFFs for both molecules and materials. We demonstrate the power of MuSE through Hessian-based benchmarks, folding trajectories for biomolecules, and energy profiles in molecule-graphene nanostructures, where MuSE accurately captures quantum-mechanical interactions at relevant scales -- unlike other recent long-range ML models.
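The phrase "smooth fractional assignments of atoms to coarse nodes" suggests a soft assignment matrix whose rows sum to one. The sketch below shows one generic way to pool with such a matrix, using a softmax over hypothetical assignment scores. It is an assumption-laden illustration of soft pooling in general, not the MuSE layer itself.

```python
import numpy as np

def soft_coarse_grain(x, logits):
    """Pool atom features x (n_atoms, d) into coarse nodes using soft assignments."""
    z = logits - logits.max(axis=1, keepdims=True)          # stabilise the softmax
    s = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)    # (n_atoms, n_coarse), rows sum to 1
    return s, s.T @ x                                        # coarse features (n_coarse, d)

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 3))         # 6 atoms, 3 features (toy values)
logits = rng.normal(size=(6, 2))    # hypothetical assignment scores for 2 coarse nodes
s, x_coarse = soft_coarse_grain(x, logits)
```

Because every atom distributes exactly one unit of weight across the coarse nodes, the feature total is conserved by the pooling step.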

2605.25704 2026-05-26 cs.CL cs.LG 版本更新

PowLU: An Activation Function for Stable Pre-Training of LLMs

PowLU: 一种用于LLM稳定预训练的激活函数

Peijie Jiang, Yuqi Feng, Cunyin Peng, Qian Zhao, Jia Liu, KunLong Chen, Zhiqiang Zhang, Jun Zhou

发表机构 * Ant Group（蚂蚁集团）

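AI summary: Proposes the PowLU activation function, which uses a rational power function for adaptive nonlinearity to address the numerical instability of SwiGLU in low-precision LLM training; in large-scale training it performs competitively with SwiGLU and SwiGLU-Clip and improves scalability.

Comments 17 pages, 7 figures, techreport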

AI中文摘要

在当代大型语言模型（LLM）中，swish门控线性单元（SwiGLU）激活函数被广泛采用以调节信息流并引入非线性。对于大的正输入，SwiGLU近似于二次函数$x^2$，提供强非线性和表达能力。然而，这一特性也导致随着输入或模型规模增大时的数值不稳定性，特别是在低精度LLM训练中。主要原因是其近似二次放大，扩大了输出范围并加剧了异常值。为了解决这个问题，我们提出了一种稳定的激活函数——幂线性单元（PowLU），用于大规模LLM预训练。具体来说，PowLU采用有理幂函数实现自适应非线性，从而改善表示能力并在尖峰区域实现稳定训练。此外，我们为PowLU的几个关键性质提供了理论证明。缩放定律实验确认了性能在不同模型规模下的一致性，进一步使用Ling架构（总参数7.9B和124B）的实验结果表明，PowLU在大规模LLM训练中取得了与SwiGLU和SwiGLU-Clip相当的结果。此外，实验结果还表明PowLU有效提升了LLM大规模训练的可扩展性。

英文摘要

In contemporary large language models (LLMs), the swish-gated linear unit (SwiGLU) activation function is widely adopted to regulate the information flow and introduce non-linearity. For large positive inputs, SwiGLU approximates the quadratic function $x^2$, providing strong nonlinearity and expressive capacity. However, this property also causes numerical instability as the input or model scale increases, particularly in low-precision LLM training. The main reason is its approximate quadratic amplification, which enlarges the output range and exacerbates outliers. To address this issue, we propose a stable activation function, Power Linear Unit (PowLU), for large-scale LLM pre-training. Specifically, PowLU employs a rational power function to achieve adaptive nonlinearity, thereby improving representation ability and enabling stable training in spike regions. Moreover, we provide theoretical justification for several key properties of PowLU. Scaling law experiments confirm that the performance is consistent across model sizes, and further experimental results with the Ling architecture (7.9B and 124B total parameters) demonstrate that PowLU achieves competitive results against SwiGLU and SwiGLU-Clip in large-scale training of LLMs. In addition, the experimental results also show that PowLU effectively improves the scalability of the large-scale training of LLMs.
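The quadratic behaviour of SwiGLU is easy to check numerically in the special case where the gate and value pre-activations coincide, so the unit reduces to $x \cdot \sigma(x) \cdot x$. That equal-input simplification is an assumption of this sketch, and the PowLU formula is deliberately not reproduced here because the abstract does not state it.

```python
import numpy as np

def swiglu_same_input(x):
    """Swish(x) * x, i.e. SwiGLU when gate and value pre-activations are equal."""
    return x / (1.0 + np.exp(-x)) * x

x = np.array([10.0, 100.0, 300.0])
ratio = swiglu_same_input(x) / x**2        # approaches 1 as x grows

# float16 tops out at 65504, so an activation of size 300 overflows once squared.
with np.errstate(over="ignore"):
    squared_fp16 = np.float16(300.0) * np.float16(300.0)
```

The overflow in the last line is one concrete way a near-quadratic activation can destabilise low-precision training.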

2605.25698 2026-05-26 cs.LG cs.AI 版本更新

How Should LLMs Consume High-Quality Data? Optimal Data Scheduling via Quality-Aware Functional Scaling Laws

LLM应如何消费高质量数据？通过质量感知的功能缩放定律实现最优数据调度

Zhitao Zhu, Xili Wang, Shizhe Wu, Jiawei Fu, Xiaoqing Liu

发表机构 * Peking University（北京大学）； Meituan（美团）

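AI summary: Extends functional scaling laws with a data-quality dimension, solves the joint data-quality and batch-size scheduling problem analytically, reveals the dual role of high-quality data, and proposes the Drop-Stable-Rampup schedule, which improves average accuracy on a 15B MoE model by +1.70 over WSD and +2.98 over cosine decay.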

AI中文摘要

高质量数据在大语言模型训练中稀缺，但如何联合训练动态调度其使用缺乏理论指导。我们通过引入数据质量维度扩展功能缩放定律，并以渐近闭式形式求解了联合数据质量和批次大小调度问题。该解揭示了两个阶段和高质量数据的双重角色。在噪声受限阶段，高质量数据应作为信号放大器：降低批次大小将更清洁的数据转换为更多信号而不放大噪声。在信号受限阶段，它应作为噪声抑制器：后期放置可减少终端噪声而不牺牲信号积累。现有的课程式流程主要利用第二个角色，将更清洁的数据放在后期，但忽略了第一个角色，因为传统的衰减调度在高质量数据可用时恰好降低了更新强度。受此启发，我们为LLM中期训练提出了Drop-Stable-Rampup：在质量转换时，降低批次大小，保持稳定以积累信号，然后逐渐增加以抑制终端噪声。在一个在108B tokens上中期训练的15B混合专家模型上，Drop-Stable-Rampup相比Warmup-Stable-Decay (WSD)平均准确率提升+1.70，相比余弦衰减提升+2.98，在数学推理基准如GSM8K (+4.23)和MATH (+2.80)上增益尤其显著。

英文摘要

High-quality data is scarce in large language model (LLM) training, yet how to schedule its use jointly with training dynamics lacks theoretical guidance. We extend functional scaling laws by incorporating a data-quality dimension, and solve the joint data-quality and batch-size scheduling problem in asymptotic closed form. The solution reveals two regimes and a dual role of high-quality data. In the noise-limited regime, high-quality data should be used as a signal amplifier: lowering the batch size converts cleaner data into more signal without amplifying noise. In the signal-limited regime, it should be used as a noise suppressor: late placement reduces terminal noise without sacrificing signal accumulation. Existing curriculum-style pipelines primarily exploit the second role by placing cleaner data late, but miss the first role because conventional decay schedules reduce update intensity exactly when high-quality data becomes available. Guided by this, we propose Drop-Stable-Rampup for LLM midtraining: upon the quality transition, drop the batch size, hold it stable to accumulate signal, then ramp up to suppress terminal noise. On a 15B Mixture-of-Experts model midtrained on 108B tokens, Drop-Stable-Rampup improves average accuracy over Warmup-Stable-Decay (WSD) by +1.70 and over Cosine-decay by +2.98, with particularly large gains on mathematical reasoning benchmarks such as GSM8K (+4.23) and MATH (+2.80).
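The three phases named in the abstract (drop the batch size, hold it, ramp it up) can be written as a small schedule function. Everything numeric here is hypothetical: the batch sizes, the fraction of steps spent in the stable phase, and the linear shape of the ramp are assumptions for illustration, not values from the paper.

```python
def drop_stable_rampup(step, total_steps, base_bs=1024, low_bs=256, stable_frac=0.5):
    """Batch size at `step` of the high-quality phase (all constants are illustrative)."""
    stable_steps = int(stable_frac * total_steps)
    if step < stable_steps:
        return low_bs                                   # drop, then hold stable
    # Linear ramp from low_bs back up to base_bs over the remaining steps.
    ramp = (step - stable_steps) / max(1, total_steps - 1 - stable_steps)
    return int(round(low_bs + ramp * (base_bs - low_bs)))

schedule = [drop_stable_rampup(s, 10) for s in range(10)]
```

In this toy run the batch size stays at 256 for the first half and then climbs to 1024 by the final step.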

2605.25696 2026-05-26 cs.LG 版本更新

Evaluating passing decision-making in professional football: An enhanced MPNN approach to Receiver Selection

评估职业足球中的传球决策：一种增强的MPNN方法用于接球者选择

Gabriel Masella, Giuseppe Alessio D'Inverno, Max Goldsmith, Gianluigi Rozza

发表机构 * Department of Mathematics, Informatics and Geoscience（数学、信息学与地质科学系）； University of Trieste（特里斯特大学）； MathLab（数学实验室）； International School for Advanced Studies (SISSA)（国际高级研究学校（SISSA））； Royal Belgium Football Association（比利时皇家足球协会）

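AI summary: Proposes a graph neural network framework that models on-field interactions as dynamic graphs to predict the best passing target, reaching competitive accuracy on Receiver Selection and evaluating over 1,000 passes in seconds.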

AI中文摘要

足球中的决策过程以空间定位、对手压力和球员意图之间的复杂相互作用为特征。本文介绍了一种图神经网络（GNN）框架，旨在通过将场上交互建模为动态图来预测接球者选择，即最佳传球目标。每个球员被表示为一个节点，具有位置和上下文特征，而潜在的传球线形成加权边，由距离、角度和压力指标表征。我们开发并训练了一个消息传递神经网络（MPNN），使用了来自职业比赛的跟踪数据和事件数据的组合，通过基于优化版Needleman-Wunsch算法的稳健流水线进行同步。该模型在识别实际选择的接球者方面达到了竞争性准确率，并在前三建议中达到了最先进的准确率。我们的模型还提供了每个选项的可能性、威胁和创造力的量化，使表现分析师能够在数秒内评估超过1000次传球。

英文摘要

The process of decision-making in football is characterized by a complex interplay between spatial positioning, opponent pressure, and player intent. This work introduces a Graph Neural Network (GNN) framework designed to predict Receiver Selection, the optimal passing target, by modeling on-field interactions as dynamic graphs. Each player is represented as a node with positional and contextual features, while potential passing lines form weighted edges characterized by distance, angle, and pressure metrics. A Message-Passing Neural Network (MPNN) has been developed and trained using a combination of tracking data and event data from professional matches, synchronized through a robust pipeline based on an optimized version of the Needleman-Wunsch Algorithm. The model achieves competitive accuracy in identifying the actual chosen receiver and state-of-the-art accuracy within its top three suggestions. Our model further offers quantification of each option's likelihood, threat, and creativity, enabling performance analysts to evaluate over 1,000 passes in seconds.
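The synchronization step rests on the Needleman-Wunsch algorithm. For reference, the classic (unoptimized) global-alignment score is sketched below; the scoring constants and the toy event sequences are assumptions, and the paper's optimized variant is not shown.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of sequences a and b (classic dynamic programme)."""
    rows, cols = len(a) + 1, len(b) + 1
    f = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        f[i][0] = i * gap                     # a[:i] aligned against gaps only
    for j in range(1, cols):
        f[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = f[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            f[i][j] = max(diag, f[i - 1][j] + gap, f[i][j - 1] + gap)
    return f[-1][-1]

# Toy use: action labels from two feeds, one of which missed the first action.
tracking = ["pass", "shot", "pass"]
events = ["shot", "pass"]
score = needleman_wunsch_score(tracking, events)
```

Here the best alignment skips the first tracking label and matches the remaining two, so one gap penalty is paid against two matches.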

2605.25681 2026-05-26 cs.LG cs.AI 版本更新

StrTransformer: 面向无监督盲源恢复的源向结构化Transformer

Yuan-Hao Wei

发表机构 * PolyU

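AI summary: Proposes StrTransformer, a framework that directly optimizes the latent source matrix together with source-wise structured Transformer branches and an observation-space mixer, for blind source recovery and branch-wise latent modeling.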

AI中文摘要

本文提出StrTransformer，一种用于盲源恢复和分支潜在建模的源向结构化Transformer框架。StrTransformer不使用编码器推断潜在变量，而是直接优化潜在源矩阵，同时结合观测空间混合器和源向结构化Transformer分支。混合器强制重建一致性，而每个Transformer分支对一条潜在源轨迹施加可微的结构约束。具体来说，每个源被转换为多尺度补丁令牌，随机掩码，由局部偏置Transformer处理，并通过掩码补丁重建能量进行评估。该能量作为隐式的源向结构先验。为了鼓励不同潜在分支专门处理不同的时间模式，StrTransformer进一步引入有序多尺度控制器，学习分支特定的补丁尺度权重、有序尺度中心和局部注意力斜率。最终目标函数结合了观测重建、源向结构正则化以及用于分离和尺度专门化的模块化辅助惩罚。我们分析了目标函数的解耦和耦合结构、正则化精确重建纤维，以及由有序分支描述符引起的置换对称性减少。一个受控案例研究表明，学习到的分支收敛到不同的时间尺度结构，并在事后评估中恢复源对齐的潜在轨迹。

英文摘要

This paper proposes StrTransformer, a source-wise structured Transformer framework for blind source recovery and branch-wise latent modeling. Instead of using an encoder to infer latent variables, StrTransformer directly optimizes the latent source matrix together with an observation-space mixer and source-wise structural Transformer branches. The mixer enforces reconstruction consistency, while each Transformer branch imposes a differentiable structural constraint on one latent source trajectory. Specifically, each source is converted into multi-scale patch tokens, randomly masked, processed by a locality-biased Transformer, and evaluated through a masked patch reconstruction energy. This energy acts as an implicit source-wise structural prior. To encourage different latent branches to specialize into different temporal regimes, StrTransformer further introduces an ordered multi-scale controller that learns branch-specific patch-scale weights, ordered scale centers, and locality attention slopes. The resulting objective combines observation reconstruction, source-wise structural regularization, and modular auxiliary penalties for separation and scale specialization. We analyze the decoupling and coupling structure of the objective, the regularized exact-reconstruction fiber, and the reduction of permutation symmetry induced by ordered branch descriptors. A controlled case study shows that the learned branches converge to distinct temporal-scale structures and recover source-aligned latent trajectories under post-hoc evaluation.

2605.25640 2026-05-26 physics.ins-det cs.LG hep-ex nucl-ex 版本更新

3D Magnetic Field Reconstruction and Mapping with Physics-Informed Neural Networks

基于物理信息神经网络的3D磁场重建与映射

Haohan Yu, Zhanxu Hao, Bingzhi Li, Zejia Lu, Xiang Chen, Liang Li

发表机构 * Xinxiang Medical University（新乡医学院）； Institute of Particle and Nuclear Physics（粒子与核物理研究所）； Henan Normal University（河南师范大学）； Henan University of Urban Construction（河南城市学院）； Shanghai Institute of Applied Physics（上海应用物理研究所）； Chinese Academy of Sciences（中国科学院）； State Key Laboratory of Dark Matter Physics（暗物质物理国家重点实验室）； School of Physics and Astronomy（物理与天文学院）； Key Laboratory for Particle Astrophysics and Cosmology (Ministry of Education)（粒子天体物理与宇宙学重点实验室（教育部））； Shanghai Key Laboratory for Particle Physics and Cosmology（上海粒子物理与宇宙学重点实验室）； Scientific Model Research Group（科学模型研究组）

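AI summary: Proposes a physics-informed neural network (PINN) framework that integrates Maxwell's equations directly into the loss function and adds physics-residual losses at measurement points, achieving high-precision 3D magnetic field reconstruction with accuracy of $10^{-4}$ in simulation and at the $10^{-3}$ level in experiment.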

AI中文摘要

准确重建不可达区域的磁场对于物理学中的许多高精度实验至关重要。传统方法（如球谐展开）常因截断误差而限制精度。本研究提出一种先进的物理信息神经网络（PINN）框架，用于高精度3D磁场映射。与传统的纯数据驱动模型不同，所提出的PINN将麦克斯韦方程直接融入损失函数，在整个域内强制执行无散度和无旋度条件。一个关键创新是在测量位置包含显式的物理残差损失，确保超越随机配点采样的严格物理一致性。使用模拟数据进行验证，重建精度达到$10^{-4}$，比现有PINN基准提高十倍。此外，使用定制线圈组件的实验验证表明，在环境条件下，相对精度达到亚百分比水平（$10^{-3}$量级）的稳健重建。这种AI驱动方法为传感器放置受限的复杂实验环境中的场监测和测量提供了稳健的高精度解决方案。

英文摘要

Accurate reconstruction of magnetic fields in inaccessible regions is vital for many high-precision experiments in physics. Traditional methods, such as spherical harmonic expansion, often suffer from truncation errors that limit their precision. This study proposes an advanced Physics-Informed Neural Network (PINN) framework for high-precision 3D magnetic field mapping. Unlike conventional data-driven models, the proposed PINN integrates Maxwell's equations directly into the loss function, enforcing divergence-free and curl-free conditions across the entire domain. A key innovation is the inclusion of explicit physics-residual losses at measurement locations, ensuring rigorous physical consistency beyond random collocation sampling. Validation using simulated data achieves a reconstruction accuracy of $10^{-4}$, a tenfold improvement over existing PINN benchmarks. Furthermore, experimental validation using a custom coil assembly demonstrates robust reconstruction with sub-percent relative accuracy, reaching the $10^{-3}$ level under ambient conditions. This AI-driven methodology provides a robust, high-precision solution for field monitoring and measurement in complex experimental environments where direct sensor placement is restricted.
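The divergence-free and curl-free conditions translate into a residual that can be evaluated at any point, whether a random collocation point or a measurement location. The sketch below computes that residual for an analytic toy field using central finite differences. A PINN would normally obtain these derivatives by automatic differentiation of the network, so the finite-difference scheme, the toy field, and the function names are stand-ins chosen for the example.

```python
import numpy as np

def field(p):
    """Toy source-free field B = grad(x^2 - y^2) = (2x, -2y, 0)."""
    x, y, _ = p
    return np.array([2.0 * x, -2.0 * y, 0.0])

def jacobian(f, p, h=1e-5):
    """Central-difference Jacobian J[i, j] = dB_i / dx_j."""
    cols = []
    for j in range(3):
        e = np.zeros(3)
        e[j] = h
        cols.append((f(p + e) - f(p - e)) / (2.0 * h))
    return np.stack(cols, axis=1)

def physics_residual(f, p):
    """Squared divergence plus squared curl norm at point p."""
    j = jacobian(f, p)
    div = np.trace(j)
    curl = np.array([j[2, 1] - j[1, 2], j[0, 2] - j[2, 0], j[1, 0] - j[0, 1]])
    return div**2 + curl @ curl

residual = physics_residual(field, np.array([0.3, -0.7, 0.1]))
```

Summing this residual over both collocation points and sensor positions gives a physics term of the kind the abstract describes, to be added to the ordinary data-fitting loss.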

2605.25632 2026-05-26 cs.AI cs.LG q-fin.RM 版本更新

Insuring Every Action: An Authority Frontier Framework for Runtime Actuarial Control of Autonomous AI Agents

为每个行动投保：自主AI代理运行时精算控制的权威边界框架

Hao-Hsuan Chen

发表机构 * Department of Risk Management and Insurance（风险管理与保险系）

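AI summary: Proposes the Actuarial Action Interface (AAI) and the Authority Frontier framework, which price, gate, and evaluate side-effect-bearing actions of autonomous AI agents through a deterministic runtime contract, enabling cross-domain actuarial control and benchmarking.

Comments 35 pages, 4 figures, 11 tables. Companion paper on the mathematical foundations: SSRN 6761960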

AI中文摘要

自主AI代理越来越多地产生带有副作用的行动：数据库变更、退款、支付、外部承诺。我们提出精算行动接口（AAI），这是一个确定性的运行时合约，它在时间一致的风险映射下，对每个此类行动按照合约固定的安全默认值进行定价，并根据每个边界的储备资本预算门控执行。然后我们开发了权威边界，这是一种评估原语，用于衡量运行时在每个储备资本水平下释放的自主权威量。该框架提供：(i) 一个确定性的报价-绑定-提交协议，带有通行费限制的能力令牌；(ii) 一个通用的七类行动分类法，将异构工具调用映射到可比较的权威单位；(iii) 在alpha支出下的重放确定性和逐路径储备覆盖；(iv) 通过全储备需求C_full和资本指标Capital@k进行跨域归一化。我们在四个代理环境（数据库变更、客服退款以及公共tau-bench零售和航空工具使用轨迹）中实例化AAI，并报告一个实时Postgres面板，其中三个Azure托管的模型通过同一合约提出行动。边界在跨域中表现出常见的低储备拒绝和中间释放模式，仅在预算网格达到全储备需求时饱和；所需储备资本变化达22倍（Capital@50从289到6457）。该框架不强制域采用相同形状；它揭示每个域的精算几何。在实时面板中，合约在低预算下防止了所有三个模型的实现损失，但在拒绝下的承保持续性方面有所不同：模型身份是一个精算承保变量。贡献是一个用于自主代理副作用运行时精算控制的基准就绪评估框架。

英文摘要

Autonomous AI agents increasingly issue side-effect-bearing actions: database mutations, refunds, payments, external commitments. We propose the Actuarial Action Interface (AAI), a deterministic runtime contract that prices each such action against a contractually fixed safe default under a time-consistent risk mapping, and gates execution against a per-boundary reserve capital budget. We then develop the Authority Frontier, an evaluation primitive measuring how much autonomous authority the runtime releases at each level of reserve capital. The framework provides (i) a deterministic quote-bind-commit protocol with toll-bounded capability tokens; (ii) a universal seven-class action taxonomy mapping heterogeneous tool calls to comparable authority units; (iii) replay determinism and pathwise reserve coverage under alpha-spending; (iv) cross-domain normalization via full reserve demand C_full and capital metrics Capital@k. We instantiate AAI across four agentic environments (database mutation, customer-service refund, and the public tau-bench retail and airline tool-use traces) and report a live Postgres panel in which three Azure-hosted models propose actions through the same contract. The frontier exhibits a common low-reserve refusal and intermediate-release pattern across domains, with saturation only where the budget grid reaches full reserve demand; required reserve capital varies by 22x (Capital@50 from 289 to 6457). The framework does not force domains into the same shape; it surfaces each domain's actuarial geometry. In the live panel the contract prevents realized loss across all three models at low budget while differing in underwriting persistence under denial: model identity is an actuarial underwriting variable. The contribution is a benchmark-ready evaluation framework for runtime actuarial control of autonomous-agent side effects.

2605.25619 2026-05-26 cs.LG 版本更新

Analogies between Transformer Layers and Power Method

Transformer层与幂法之间的类比

Chenglong Li, Claudio Altafini

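AI summary: Draws an analogy between the operations in a Transformer layer (projection and layer normalization, ignoring the feed-forward network) and the steps of the power method, shows that tokens passing through a layer tend to align with the dominant eigenvector of the product of that layer's output weight matrix and value weight matrix, and proposes a way to steer Transformer outputs toward an arbitrary desired direction.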
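For readers unfamiliar with the power method, the sketch below shows the two steps the analogy refers to: multiply by a fixed matrix, then renormalize. The matrix is a synthetic symmetric stand-in with known eigenvalues, not an actual product of Transformer weight matrices, and plain norm division only loosely mirrors layer normalization.

```python
import numpy as np

rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.normal(size=(5, 5)))          # random orthonormal eigenvectors
m = q @ np.diag([5.0, 2.0, 1.0, 0.5, 0.1]) @ q.T      # symmetric matrix, dominant eigenvalue 5
x = rng.normal(size=5)                                 # arbitrary starting vector

for _ in range(100):
    x = m @ x                      # "projection" step
    x = x / np.linalg.norm(x)      # normalisation step

dominant = q[:, 0]                 # eigenvector paired with eigenvalue 5
alignment = abs(x @ dominant)      # 1.0 means aligned up to sign
```

After repeated multiply-and-normalize steps the vector points along the dominant eigenvector, which is the alignment behaviour the summary attributes to tokens passing through layers.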

2605.25616 2026-05-26 cs.LG stat.ML 版本更新

Courtroom Analogy: New Perspective on Uncertainty-Aware Classification

法庭类比：不确定性感知分类的新视角

Taeseong Yoon, Heeyoung Kim

发表机构 * Department of Industrial and Systems Engineering, Korea Advanced Institute of Science and Technology（工业与系统工程系，韩国科学技术院）

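AI summary: Proposes the courtroom analogy, which models uncertainty aggregation in classification with a structured mixture of Dirichlet distributions, and designs MoDEX, a single-pass feed-forward neural network for efficient, interpretable uncertainty quantification.

Comments ICML 2026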

AI中文摘要

分类中的单次不确定性量化方法通过预测类概率向量上的可处理分布来表示不确定性。现有方法主要关注增强该分布的表示能力，但往往对预测不确定性如何结构化和聚合提供的见解有限，导致可解释性较弱。我们引入法庭类比，将不确定性感知分类概念化为类特定倡导者之间的结构化辩论。每位倡导者形成概率意见，并通过输入依赖的可信度权重聚合这些意见得出最终裁决。在此框架中，每位倡导者的意见被建模为狄利克雷分布，其浓度参数分解为共享证据和类特定倡导。这产生了具有语义可解释参数的结构化混合狄利克雷分布。为实例化该公式，我们提出了混合狄利克雷专家（MoDEX），一种预测法庭参数的单次前馈神经架构，能够在显式建模不确定性聚合的同时实现高效且表达力强的不确定性量化。我们证明MoDEX具有强大的理论性质，并在多种基准测试中实现了最先进的不确定性量化性能，产生具有有意义语义的可解释不确定性估计。

英文摘要

Single-pass uncertainty quantification (UQ) methods for classification represent uncertainty by predicting a tractable distribution over the class probability vector. While existing approaches primarily focus on enhancing the expressiveness of this distribution, they often provide limited insight into how predictive uncertainty is structured and aggregated, resulting in weak interpretability. We introduce the courtroom analogy, which conceptualizes uncertainty-aware classification as a structured debate among class-specific advocates. Each advocate forms a probabilistic opinion, and a final verdict is reached by aggregating these opinions using input-dependent plausibility weights. In this framework, each advocate's opinion is modeled as a Dirichlet distribution whose concentration parameter is decomposed into shared evidence and class-specific advocacy. This yields a structured mixture of Dirichlet distributions with semantically interpretable parameters. To instantiate this formulation, we propose Mixture of Dirichlet EXperts (MoDEX), a single-pass neural architecture that predicts the courtroom parameters, enabling efficient and expressive UQ while explicitly modeling uncertainty aggregation. We demonstrate that MoDEX enjoys strong theoretical properties and achieves state-of-the-art UQ performance across diverse benchmarks, yielding interpretable uncertainty estimates with meaningful semantics.
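One way to picture the aggregation step is to compute the mean of a mixture of Dirichlet distributions. In the sketch below each advocate's concentration vector is the shared evidence plus extra mass on its own class, and the verdict is the plausibility-weighted average of the advocates' means. This particular additive parameterization and every number in it are assumptions for illustration; the paper's exact decomposition may differ.

```python
import numpy as np

shared_evidence = np.array([2.0, 1.0, 1.0])   # hypothetical evidence common to all advocates
advocacy = np.array([3.0, 0.5, 0.5])          # hypothetical extra mass each advocate adds to its own class
weights = np.array([0.6, 0.2, 0.2])           # plausibility weights, summing to 1

# Advocate k holds Dirichlet(alpha[k]) with alpha[k] = shared_evidence + advocacy[k] * e_k.
alpha = shared_evidence + np.diag(advocacy)   # (n_advocates, n_classes)
advocate_means = alpha / alpha.sum(axis=1, keepdims=True)
verdict = weights @ advocate_means            # mean of the Dirichlet mixture
```

The verdict is itself a probability vector, and each term in the sum can be read off separately, which is the interpretability argument the abstract makes.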

2605.25612 2026-05-26 cs.LG cs.AI 版本更新

Towards the Connection between Activation Sparsity and Flat Minima

激活稀疏性与平坦极小值之间的联系

Ze Peng, Jian Zhang, Lei Qi, Yang Gao, Yinghuan Shi

发表机构 * State Key Laboratory for Novel Software Technology, Nanjing University（南京大学新型软件技术国家重点实验室）； Institute of Brain-Machine Interface, Nanjing University（南京大学脑机接口研究院）； School of Computer Science and Engineering, Southeast University（东南大学计算机科学与工程学院）

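AI summary: The paper finds that the flatness of the loss landscape is closely related to MLP activation sparsity in Transformers. Building on a theoretical derivation, it proposes three practical methods that increase sparsity, pointing to potential cost reductions in both inference and training.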

AI中文摘要

标准训练的Transformer的MLP块中出现的激活稀疏性为在不牺牲性能的情况下大幅降低计算成本提供了机会。为了从理论上解释这一现象，现有工作表明激活稀疏性并非源于数据属性或数据拟合，而是来自训练过程的隐式偏差。然而，这些联系是在强假设下得到的，无法应用于标准训练的大步数深度模型。与这些工作不同，我们发现损失景观的平坦性也与MLP激活稀疏性密切相关，并且可以作为标准深度网络的一个更弱且自然出现的假设。具体来说，我们发现：1) MLP激活稀疏性等于“增强平坦性”（平坦性度量的加权和）与输入范数和MLP激活梯度乘积的比值。我们经验性地发现该比值在训练过程中下降，导致稀疏激活。2) 我们还提出了导数稀疏性的概念，在ReLU下它退化为激活稀疏性，但进一步支持反向传播中的剪枝，并且比激活稀疏性更稳定。基于理论发现，我们通过三种方法减小分子和增大分母来进一步鼓励激活稀疏性。这些即插即用的修改可以有效降低比值并产生更稀疏的激活。在ImageNet-1K和C4上的实验表明，与原始Transformer相比，推理稀疏性至少提高36%，训练稀疏性至少提高50%，表明在推理和训练中进一步降低成本的潜力。

英文摘要

The observation that activation sparsity emerges in MLP blocks of standardly trained Transformers offers an opportunity to drastically reduce computation costs without sacrificing performance. To theoretically explain this phenomenon, existing works have shown that activation sparsity does not result from the data properties or data fitting but from the implicit bias of the training process. However, these connections are obtained with strong assumptions, which cannot be applied to deep models standardly trained with a large number of steps. Different from these works, we find that the flatness of loss landscapes is also closely related to the MLP activation sparsity and can serve as a weaker and naturally emerging assumption for standard deep networks. Specifically, we find that 1) the MLP activation sparsity equals a ratio between "augmented flatness" (a weighted sum of flatness measures) and the product of the input norm and activation gradient of the MLP. We empirically find that this ratio decreases during training, leading to sparse activations. 2) We also propose the notion of derivative sparsity, which reduces to activation sparsity under ReLU, but further enables pruning in the backward propagation and is more stable than activation sparsity. With the theoretical findings, we can further encourage activation sparsity by decreasing the numerator and increasing the denominator of the ratio using three methods. These plug-and-play modifications can effectively reduce the ratio and produce sparser activations. Experiments on ImageNet-1K and C4 demonstrate relative improvements of at least 36% on inference sparsity and at least 50% on training sparsity over vanilla Transformers, indicating further potential cost reduction in both inference and training.
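As a concrete reference point, the sketch below measures activation sparsity of a ReLU MLP hidden layer as the fraction of exactly-zero activations. This is one common convention and an assumption here; the paper's precise definition (and its "augmented flatness" ratio) may differ. The helper names are hypothetical.

```python
def relu(x):
    return x if x > 0 else 0.0


def mlp_hidden(x, weights, bias):
    """Hidden activations of one MLP layer: relu(W x + b)."""
    return [
        relu(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def activation_sparsity(hidden_batch):
    """Fraction of hidden activations that are exactly zero."""
    total = sum(len(h) for h in hidden_batch)
    zeros = sum(1 for h in hidden_batch for v in h if v == 0.0)
    return zeros / total


# Toy layer: 2 inputs, 4 hidden units, no bias.
W = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b = [0.0, 0.0, 0.0, 0.0]
batch = [[1.0, 2.0], [3.0, 0.5]]

hidden = [mlp_hidden(x, W, b) for x in batch]
sparsity = activation_sparsity(hidden)
print(sparsity)
```

In this toy layer each positive input coordinate switches on one unit and switches off its negated twin, so half of the activations are zero.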

2605.25608 2026-05-26 stat.ML cs.LG 版本更新

Learning Sparse Compositional Functions with Norm-Constrained Neural Networks

学习具有范数约束神经网络的稀疏组合函数

Shuo Huang, Lorenzo Fiorito, Lorenzo Rosasco, Tomaso Poggio

Affiliations: Istituto Italiano di Tecnologia; Università degli Studi di Genova; MaLGa (Machine Learning Genoa Center); DIBRIS; CBMM (Center for Brains, Minds and Machines); Massachusetts Institute of Technology

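AI summary: Using norm-constrained deep neural networks, the paper establishes approximation rates and excess risk bounds for learning sparse compositional functions, showing that deep networks can exploit hierarchical representations to avoid the curse of dimensionality.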

AI中文摘要

深度神经网络学习层次特征的能力被广泛认为是其在高维学习中成功的关键机制。现有理论通过基于参数计数的逼近率和组合模型的无维数灾难样本复杂度保证，部分支持了这一观点。为了研究参数数量超过样本量的过参数化场景，我们开发了一个通过参数范数衡量复杂度的框架。在该方法中，我们使用Frobenius范数约束的深度神经网络，为学习稀疏组合函数建立了逼近率和过风险界，其中组合函数的组合结构由有向无环图表示。我们的结果具有广泛的适用性，因为每个可有效图灵计算的函数都具有稀疏组合表示。特别地，我们涵盖了一系列代表性模型，包括多指标模型、二叉树结构和一般组合架构。我们推导的速率表明，深度网络可以利用目标函数的组合结构，通过层次表示有效避免维数灾难。

英文摘要

The ability of deep neural networks to learn hierarchical features is widely regarded as a key mechanism underlying their success in high-dimensional learning. Existing theory partially supports this view by establishing approximation rates based on parameter counts and sample complexity guarantees for compositional models without incurring the curse of dimensionality (CoD). To study overparameterized regimes, where the number of parameters exceeds the sample size, we develop a framework that measures complexity via the parameter norm. Within this approach, we establish approximation rates and excess risk bounds for learning sparse compositional functions whose compositional structure is represented by directed acyclic graphs (DAGs), using Frobenius norm-constrained deep neural networks. Our results have broad applicability since every function that is efficiently Turing computable admits sparse compositional representations. In particular, we cover a range of representative models, including multi-index models, binary tree structures, and general compositional architectures. The rates we derive show that deep networks can exploit the compositional structure of the target functions, effectively avoiding the CoD through hierarchical representations.

2605.25605 2026-05-26 eess.AS cs.LG 版本更新

Decoding Stimulus Reconstruction-Based Auditory Attention Robustly in Unbalanced EEG Datasets

在不平衡EEG数据集中基于刺激重建的听觉注意力鲁棒解码

Yuanming Zhang, Yayun Liang, Zhibin Lin, Jing Lu

发表机构 * Key Lab of Modern Acoustics, Nanjing University（南京大学现代声学国家重点实验室）； Horizon Robotics

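AI summary: The paper studies how unbalanced datasets affect stimulus reconstruction-based auditory attention decoding and proposes a leave-one-paired-envelope-out (LOPEO) cross-validation protocol to prevent inflated decoding accuracy.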

AI中文摘要

在过去十年中，许多研究通过刺激重建从脑电图信号中应用深度神经网络解码听觉注意力。然而，数据集平衡对基于刺激重建的AAD解码性能的影响尚未被探索。在本研究中，使用三个公开的EEG-AAD数据集——KUL、DTU和NJU cEEGrid——构建平衡和不平衡的实验条件。我们假设并证明基于刺激重建的DNN解码器倾向于在不平衡数据集上产生高估的解码性能。为了解决这个问题，我们提出了一种留一对包交叉验证协议。实验结果证实，LOPEO有效防止了在不平衡数据集上的解码准确率膨胀。虽然平衡数据集在实验设计中通常更受青睐，但LOPEO为已经发表的不平衡数据集提供了一个原则性的评估框架，填补了该领域的一个重要空白。

英文摘要

In the past decade, numerous studies have applied deep neural networks (DNNs) to auditory attention decoding (AAD) from electroencephalogram (EEG) signals via stimulus reconstruction. However, the influence of dataset balance on the decoding performance of stimulus reconstruction-based AAD remains unexplored. In this study, three publicly available EEG-AAD datasets (KUL, DTU, and NJU cEEGrid) are used to construct both balanced and unbalanced experimental conditions. We hypothesize and demonstrate that stimulus reconstruction-based DNN decoders tend to produce overestimated decoding performance on unbalanced datasets. To address this issue, we propose a leave-one-paired-envelope-out (LOPEO) cross-validation protocol. Experimental results confirm that LOPEO effectively prevents inflated decoding accuracy on unbalanced datasets. While balanced datasets are generally preferred in experimental design, LOPEO provides a principled evaluation framework for unbalanced datasets that have already been published, filling an important gap in the field.

2605.25604 2026-05-26 cs.CL cs.LG 版本更新

基于折扣在线镜像梯度的非平稳广义线性老虎机

Joongkyu Lee, Min-hwan Oh

发表机构 * Seoul National University（首尔国立大学）

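AI summary: The paper proposes DOMD-GLB, which uses discounted online mirror descent for nonstationary generalized linear bandits and obtains dynamic regret bounds while keeping per-round computation and memory costs at O(1).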

AI中文摘要

我们研究非平稳广义线性老虎机（GLBs），其中期望奖励通过非线性链接函数与未知时变参数建模。该框架涵盖广泛的奖励模型，包括线性、伯努利和二项式奖励。现有方法主要基于最大似然估计（MLE），使用滑动窗口、重启或折扣机制处理非平稳性。尽管这些方法在统计上实现了高效的遗憾保证，但它们通常需要在每轮重新访问过去观测，导致计算和内存成本随时间增长；此外，其中一些方法依赖于非凸投影步骤。本文提出DOMD-GLB，一种用于非平稳GLBs的新算法，利用折扣在线镜像梯度（DOMD）进行参数估计，从而每轮仅产生O(1)的计算和内存成本。我们证明了在漂移环境下的动态遗憾界为$\tilde{O} \big(c_\mu^{-1/2} d^{3/4} P_T^{1/4} T^{3/4}\big)$，在分段平稳环境下为$\tilde{O}\big(c_\mu^{-1/3} d^{2/3} \Gamma_T^{1/3} T^{2/3}\big)$，其中$d$表示特征维度，$T$表示时间范围，$P_T$表示路径长度，$\Gamma_T$表示变化点数量，$c_\mu$是与链接函数相关的曲率参数，同时显著提高了计算效率。据我们所知，这是首个每轮计算和内存成本与时间无关的非平稳GLBs算法。

英文摘要

We study nonstationary generalized linear bandits (GLBs), where the expected reward is modeled through a nonlinear link function with an unknown time-varying parameter. This framework encompasses a broad class of reward models, including linear, Bernoulli, and binomial rewards. Existing approaches are predominantly based on maximum-likelihood estimation (MLE), using sliding-window, restart, or discounting mechanisms to handle nonstationarity. Although these methods achieve statistically efficient regret guarantees, they generally require revisiting past observations at every round, which leads to computation and memory costs that grow with time; moreover, several of them rely on a non-convex projection step. In this paper, we propose DOMD-GLB, a new algorithm for nonstationary GLBs that utilizes discounted online mirror descent (DOMD) for parameter estimation, thereby incurring only $O(1)$ computation and memory costs per round. We prove dynamic regret bounds of order $\tilde{O}\big(c_\mu^{-1/2} d^{3/4} P_T^{1/4} T^{3/4}\big)$ in drifting environments and $\tilde{O}\big(c_\mu^{-1/3} d^{2/3} \Gamma_T^{1/3} T^{2/3}\big)$ in piecewise-stationary environments, where $d$ denotes the feature dimension, $T$ the time horizon, $P_T$ the path length, $\Gamma_T$ the number of change points, and $c_\mu$ a curvature parameter associated with the link function, while substantially improving computational efficiency over prior work. To the best of our knowledge, this is the first algorithm for nonstationary GLBs with per-round computation and memory costs independent of time.

2605.25581 2026-05-26 cs.LG 版本更新

Learning Latent Dynamical Causal Processes for Single-Cell Perturbation Prediction

学习单细胞扰动预测的潜在动态因果过程

Wenkang Jiang, Yuhang Liu, Erdun Gao, Ehsan Abbasnejad, Lina Yao, Javen Qinfeng Shi

发表机构 * AIML, Adelaide University（AIML，阿德莱德大学）； Responsible AI Research Centre（负责任人工智能研究中心）； Monash University（莫纳什大学）； University of New South Wales（新南威尔士大学）

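AI summary: The paper proposes a latent dynamical causal generative model, together with the CITE-VAE learning framework, that jointly captures latent cellular programs, perturbation-conditioned mechanisms, and temporal evolution, targeting out-of-distribution generalization in single-cell perturbation prediction.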

Comments Accepted to SIGKDD 2026 AI4Science Track

AI中文摘要

单细胞扰动预测旨在推断细胞如何响应未见过的干预，并实现分布外（OOD）泛化，为理解扰动如何随时间重塑细胞程序提供计算途径。现有的机器学习方法取得了重要进展，但通常仅捕捉响应的一方面。潜在因果方法寻求支持泛化和解释的机制，但往往将扰动效应视为静态结果。时间模型描述基因表达随时间的变化，但通常不显式恢复驱动这些变化的潜在因果生成机制。在实践中，扰动效应既是潜在的也是动态的：干预通过未观察到的细胞程序起作用，这些程序的状态随时间演变并产生观察到的表达谱。受此观点启发，我们提出一个用于单细胞扰动数据的潜在动态因果生成模型，联合捕获潜在细胞程序、扰动条件机制和时间演化。我们进一步提供可识别性分析，表明在适当条件下，潜在因果变量可恢复至标准等价类。在此分析指导下，我们开发了CITE-VAE，一个从单细胞测序数据中恢复潜在细胞程序及其扰动驱动动态的学习框架。在Causal-3DIdent上的实验验证了理论结果和所提方法在受控环境中的有效性。在真实世界的基于CRISPR的单细胞扰动数据上的额外实验表明，与最先进的基线相比，对未见扰动的泛化能力有所提升，突显了我们方法的实际鲁棒性。

英文摘要

Single-cell perturbation prediction aims to infer how cells respond to unseen interventions and to achieve out-of-distribution (OOD) generalization, providing a computational route to understanding how perturbations reshape cellular programs over time. Existing machine learning methods have made important progress, but typically capture only one side of the response. Latent causal approaches seek mechanisms that support generalization and interpretation, yet often treat perturbation effects as static outcomes. Temporal models describe how gene expression changes across time, but usually do not explicitly recover the latent causal generative mechanisms driving these changes. In practice, perturbation effects are both latent and dynamical: interventions act through unobserved cellular programs, whose states evolve over time and give rise to observed expression profiles. Motivated by this view, we propose a latent dynamical causal generative model for single-cell perturbation data that jointly captures latent cellular programs, perturbation-conditioned mechanisms, and temporal evolution. We further provide an identifiability analysis showing that, under suitable conditions, the latent causal variables are recoverable up to standard equivalence classes. Guided by this analysis, we develop CITE-VAE, a learning framework for recovering latent cellular programs and their perturbation-driven dynamics from single-cell sequencing data. Experiments on Causal-3DIdent validate the theoretical results and the effectiveness of the proposed method in controlled settings. Additional experiments on real-world CRISPR-based single-cell perturbation data show improved generalization to unseen perturbations compared with state-of-the-art baselines, highlighting the practical robustness of our approach.

2605.25577 2026-05-26 cs.LG cs.AI 版本更新

Geometric Flow Matching for Molecular Conformation Generation via Manifold Decomposition

基于流形分解的几何流匹配分子构象生成

Yunqing Liu, Yi Zhou, Wenqi Fan

发表机构 * The Hong Kong Polytechnic University（香港理工大学）

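AI summary: The paper proposes GO-Flow, which decomposes generation into translation, rotation, and conformation subspaces and uses optimal transport and geodesic flows on the corresponding manifolds. It targets the tendency of existing methods to ignore the hierarchical geometry of molecules and reports high-quality, efficient molecular conformation generation.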

AI中文摘要

生成准确的3D分子构象是计算化学和药物发现中的关键挑战。最近，扩散和流匹配模型取得了显著成功。然而，它们的数学公式与分子的物理现实之间存在严重的不匹配。现有方法主要将分子视为笛卡尔空间中的无结构点云，忽略了键长和键角相对刚性而扭转角构成主要柔性自由度的内在层次力学。这种对流形的不感知迫使模型从头重新学习基本几何约束，常常导致物理上不可信的中间结构。为了解决这个问题，我们提出了GO-Flow，通过流形分解将生成建模与分子几何对齐。GO-Flow不是强制在欧几里得空间中运动，而是将生成过程分解为三个物理驱动的子空间：具有线性最优输运的平移空间、$SO(3)$上具有测地流的旋转空间以及具有熵最优输运的构象空间。这种分解注入了几何归纳偏置，使生成路径更好地与分子自由度对齐。当与等变神经架构结合时，它鼓励旋转一致的生成并提高几何有效性。在GEOM-Drugs和GEOM-QM9上的大量实验表明，GO-Flow实现了最先进的生成质量。值得注意的是，通过在正确的流形上自然地学习更直的概率路径，我们的方法能够在仅50步的情况下实现高保真采样，有效弥合了结构精度与计算效率之间的差距。

英文摘要

The generation of accurate 3D molecular conformations is a pivotal challenge in computational chemistry and drug discovery. Recently, diffusion and flow matching models have achieved remarkable success. However, there is a critical misalignment between their mathematical formulation and the physical reality of molecules. Existing approaches predominantly treat molecules as unstructured point clouds in Cartesian space, overlooking the intrinsic hierarchical mechanics where bond lengths and bond angles are relatively stiff, whereas torsion angles constitute the dominant flexible degrees of freedom. This lack of manifold awareness forces models to relearn fundamental geometric constraints from scratch, often leading to physically implausible intermediate structures. To address this, we propose GO-Flow that aligns generative modeling with molecular geometry via manifold decomposition. Instead of forcing motion through Euclidean space, GO-Flow decomposes the generation process into three physically motivated subspaces: translation space with linear optimal transport, rotation space with geodesic flows on $SO(3)$, and conformation space with entropic optimal transport. This decomposition injects geometric inductive biases and makes the generative paths better aligned with molecular degrees of freedom. When combined with equivariant neural architectures, it encourages rotation-consistent generation and improves geometric validity. Extensive experiments on GEOM-Drugs and GEOM-QM9 demonstrate that GO-Flow achieves state-of-the-art generation quality. Notably, by learning straighter probability paths on the correct manifolds naturally, our method enables high-fidelity sampling with as few as 50 steps, effectively bridging the gap between structural precision and computational efficiency.

2605.25565 2026-05-26 cs.LG cs.CL 版本更新

RotMoLE: Enhancing Mixture of Low-Rank Experts through Rotational Gating Mechanism

RotMoLE：通过旋转门控机制增强混合低秩专家

Mengyang Sun, Maochuan Dou, Tao Feng, Dan Zhang, Yihao Wang, Junpeng Liu, Yifan Zhu, Jie Tang

发表机构 * Tsinghua University（清华大学）； Beijing Information Science and Technology University（北京信息科技大学）； National University of Singapore（新加坡国立大学）； Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））； Beijing University of Posts and Telecommunications（北京邮电大学）

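AI summary: Conventional gating in MoE-LoRA applies only scalar reweighting to the selected experts, which limits representational capacity. The paper proposes RotMoLE, which adds a rotation gate that rotates each selected expert to improve expert utilization and specialization, and validates it in multi-task and multilingual training.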

AI中文摘要

虽然大型语言模型（LLM）通常在进行垂直应用之前会针对特定领域任务进行微调，但将它们适应于具有多样化专业知识的复杂场景仍然具有挑战性。与此同时，混合专家（MoE）架构已成为训练LLM的关键范式，最近的一些工作也将MoE引入参数高效微调（PEFT），提出了混合低秩专家（MoE-LoRA），以增强低秩适配器学习复杂知识的能力。然而，MoE中的传统门控机制通常仅对选中的专家应用标量重新加权，从而限制了其表示和泛化的潜在能力。受MoE-LoRA中低秩结构的启发和推动，我们提出了RotMoLE，一个专门针对低秩专家的MoE框架，其特点是一个额外的旋转门控。除了简单的缩放，RotMoLE为每个选中的专家实现了一个旋转机制，从而在专家候选有限的情况下，实现了更好的专家利用和专业化，以学习多样化的数据。在复杂多任务和多语言训练场景下的实证结果验证了我们的有效性。

英文摘要

While Large Language Models (LLMs) are commonly fine-tuned to handle domain-specific tasks before being applied to vertical applications, adapting them to complex scenarios with diverse specialized knowledge remains challenging. Meanwhile, the Mixture-of-Experts (MoE) architecture has risen as a crucial paradigm for training LLMs, and some recent works have also incorporated MoE into Parameter-Efficient Fine-Tuning (PEFT), proposing the Mixture of Low-rank Experts (MoE-LoRA) to enhance the power of low-rank adapters for learning complicated knowledge. However, conventional gating mechanisms in MoE typically apply only a scalar reweighting to selected experts, thereby limiting their underlying capacity for representation and generalization. Motivated and enabled by the low-rank structures in MoE-LoRA, we propose RotMoLE, a specialized MoE framework for low-rank experts featuring an additional rotation gate. Beyond simple scaling, RotMoLE implements a rotation mechanism for each selected expert, enabling superior expert exploitation and specialization for learning diverse data, especially when expert candidates are limited. Empirical results on complex multi-task and multilingual training scenarios validate the effectiveness of our approach.

2605.25551 2026-05-26 cs.LG 版本更新

Learning Permutation from Structure Without Supervision

从结构中无监督学习排列

Ran Eisenberg, Ofir Lindenbaum

发表机构 * Faculty of Engineering, Bar-Ilan University, Ramat Gan, Israel（巴伊兰大学工程学院，拉马特甘，以色列）

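AI summary: The paper proposes an entropy-adaptive Gumbel-Sinkhorn method that modulates temperature locally to improve the stability and quality of unsupervised permutation learning.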

AI中文摘要

许多学习问题需要揭示隐藏的排序，以揭示无序数据中的结构，例如排序中的单调性或拼图重建中的空间连续性。在这些设置中，排列可以作为潜在算子通过优化直接定义在重排序输出上的目标来学习，通常没有真实排序的访问。可微松弛如Gumbel-Sinkhorn通过用双随机矩阵近似排列矩阵使这种方法实用。然而，无监督地从结构学习会导致非均匀的不确定性：一些分配早期变得自信，而其他分配仍然模糊。现有方法使用单个全局温度控制这一过程，迫使所有分配同时锐化或扩散，导致大规模不稳定。我们引入了一种熵自适应的Gumbel-Sinkhorn公式，根据分配不确定性局部调节温度。这使得自信的分配可以早期离散化，同时在不明确的地方保留探索。在排序和拼图重建任务以及路由式设置中，相对于固定温度基线，自适应熵控制提高了训练稳定性和最终排列质量，特别是在问题规模和分配模糊性增加时。

英文摘要

Many learning problems require uncovering a hidden ordering that reveals structure in unordered data, such as monotonicity in sorting or spatial continuity in jigsaw reconstruction. In these settings, permutations can be learned as latent operators by optimizing objectives defined directly on the reordered output, often without access to ground-truth orderings. Differentiable relaxations such as Gumbel-Sinkhorn make this approach practical by approximating permutation matrices with doubly stochastic matrices. However, learning from structure without supervision induces a non-uniform uncertainty: some assignments become confident early, while others remain ambiguous. Existing methods control this process using a single global temperature, forcing all assignments to sharpen or diffuse simultaneously and leading to instability at scale. We introduce an entropy-adaptive formulation of Gumbel-Sinkhorn that locally modulates temperature based on assignment uncertainty. This allows confident assignments to discretize early while preserving exploration where uncertainty remains. Across sorting and jigsaw reconstruction tasks and in routing-style settings, adaptive entropy control improves training stability and final permutation quality relative to fixed-temperature baselines, particularly as problem size and assignment ambiguity increase.
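For orientation, here is a minimal sketch of the standard Gumbel-Sinkhorn relaxation with a single global temperature, which is the fixed-temperature baseline the abstract contrasts against. It does not reproduce the paper's entropy-adaptive, per-assignment temperature. The function name and its parameters (`noise_scale`, `seed`) are assumptions chosen for this illustration.

```python
import math
import random


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def gumbel_sinkhorn(scores, temperature=1.0, n_iters=50, noise_scale=0.0, seed=0):
    """Relax a score matrix into an (approximately) doubly stochastic matrix.

    Gumbel noise is optionally added to the scores, the result is divided
    by a single global temperature, and rows and columns are alternately
    normalized in log space (Sinkhorn iterations).
    """
    rng = random.Random(seed)
    n = len(scores)
    log_p = []
    for row in scores:
        new_row = []
        for s in row:
            # Clamp u away from 0 and 1 so both logarithms are finite.
            u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
            gumbel = -math.log(-math.log(u))
            new_row.append((s + noise_scale * gumbel) / temperature)
        log_p.append(new_row)

    for _ in range(n_iters):
        for i in range(n):  # row normalization
            z = logsumexp(log_p[i])
            log_p[i] = [v - z for v in log_p[i]]
        for j in range(n):  # column normalization
            z = logsumexp([log_p[i][j] for i in range(n)])
            for i in range(n):
                log_p[i][j] -= z
    return [[math.exp(v) for v in row] for row in log_p]


scores = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
soft = gumbel_sinkhorn(scores, temperature=1.0)   # diffuse assignment
sharp = gumbel_sinkhorn(scores, temperature=0.1)  # close to a permutation
```

Lowering the temperature sharpens every entry of the matrix at once, which is the global behavior the abstract identifies as a source of instability when some assignments are confident and others are still ambiguous.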

2605.25549 2026-05-26 cs.CL cs.AI cs.LG 版本更新

DeepSeekMath 遇见订单簿：面向高频方向性交易的组感知策略优化

Sayak Charabarty, Souradip Pal

发表机构 * Department of Computer Science（计算机科学系）； Northwestern University（西北大学）； School of Electrical and Computer Engineering（电气与计算机工程学院）； Purdue University（普渡大学）

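AI summary: The paper studies reinforcement learning for high-frequency trading on limit order books by combining an order-flow-based state model with policy-gradient methods. It proposes a group-aware policy optimization method that outperforms a value-based Q-learning baseline in backtests.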

Comments 9 pages, 3 figures

2605.25526 2026-05-26 stat.ML cs.LG 版本更新

From DPPs to $k$-DPPs: identifiability analysis via spectral decomposition

从DPP到$k$-DPP：通过谱分解的可识别性分析

Hideitsu Hino, Keisuke Yano

发表机构 * The Institute of Statistical Mathematics（统计数学研究所）

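AI summary: Using spectral decomposition, the paper studies the geometric structure of determinantal point processes (DPPs) and their conditioned version, $k$-DPPs, showing how the identifiability of the spectral parameters and the eigenspace rotation parameters changes in $k$-DPPs and characterizing the identifiability gap.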

Comments 10 pages

2605.25525 2026-05-26 cs.LG 版本更新

SAE-FD: Sparse Autoencoder Feature Distillation for Continual Learning of Large Language Models

SAE-FD: 面向大语言模型持续学习的稀疏自编码器特征蒸馏

Mingxu Zhang, Yuhan Li, Lujundong Li, Dazhong Shen, Hui Xiong, Ying Sun

发表机构 * The Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））； Nanjing University of Aeronautics and Astronautics（南京航空航天大学）； The 63rd Research Institute, National University of Defense Technology, Nanjing（国防科技大学第六十三研究所，南京）

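AI summary: To address catastrophic forgetting in continual learning, the paper proposes a method based on sparse autoencoder feature distillation that anchors model representations in a sparse feature space, reducing representational entanglement and enabling more targeted regularization. It outperforms existing regularization-based methods on two continual learning benchmarks.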

AI中文摘要

持续学习使大语言模型能够适应不断变化的任务而无需从头重新训练，但灾难性遗忘仍然是一个核心障碍。在持续学习方法中，基于正则化的方法被广泛用于约束模型更新并减少遗忘，这些方法在权重空间、梯度空间或输出空间中操作。然而，这些密集表示空间存在特征叠加问题，即多个概念被编码在重叠的维度中，使得难以在不阻碍新任务学习的情况下有选择地保护先前学到的知识。为了解决这个问题，我们提出了SAE-FD（稀疏自编码器特征蒸馏），该方法将模型表示锚定在预训练稀疏自编码器的稀疏特征空间中，其中密集激活被分解为稀疏过完备基，从而减少表征纠缠，实现更有针对性的正则化，同时减少对新任务学习的干扰。在三个模型架构上的两个持续学习基准实验表明，SAE-FD始终优于现有的基于正则化的方法，平均准确率高达52.70%，仅产生-0.46的后向迁移。

英文摘要

Continual learning enables large language models to adapt to evolving tasks without retraining from scratch, yet catastrophic forgetting remains a central obstacle. Among continual learning methods, regularization-based approaches are widely used to constrain model updates and reduce forgetting, operating in weight space, gradient space, or output space. However, these dense representation spaces suffer from feature superposition, where multiple concepts are encoded in overlapping dimensions, making it difficult to selectively protect previously learned knowledge without impeding new-task learning. To address this issue, we propose SAE-FD (Sparse Autoencoder Feature Distillation), which anchors model representations in the sparse feature space of a pre-trained Sparse Autoencoder, where dense activations are decomposed into a sparse overcomplete basis that reduces representational entanglement, enabling more targeted regularization with less interference to new-task learning. Experiments on two continual learning benchmarks across three model architectures show that SAE-FD consistently outperforms existing regularization-based methods, achieving up to 52.70% average accuracy with only -0.46 backward transfer.

2605.25509 2026-05-26 stat.ML cs.LG 版本更新

Guided Flow Matching for Forward and Inverse PDE Problems with Sparse Observations: Algorithm and Theory

面向稀疏观测的正反PDE问题的引导流匹配：算法与理论

Xifeng Zhang, Jin Zhao

发表机构 * School of Mathematical Science（数学科学学院）； Academy for Multidisciplinary Studies（多学科研究学院）

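AI summary: The paper proposes FM4PDE, a flow-matching generative framework that learns the joint distribution of PDE coefficients and solutions and uses guided sampling to perform forward simulation and inverse recovery from sparse observations, with error guarantees.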

Comments 50 pages, 8 figures, 4 tables

AI中文摘要

从稀疏观测中重建PDE解是科学计算中的核心挑战。我们提出FM4PDE，一种流匹配生成框架，学习PDE系数（或初始状态）与解（或最终状态）的联合分布，从而在有限配对数据下实现正向模拟和逆问题恢复。在推理时，采样由一个复合损失引导，该损失强制与稀疏测量一致并减少PDE残差；我们支持确定性、随机性和混合采样器。我们为这些引导过程提供误差保证。对于确定性优化器，一个强制条件确保轨迹有界，且逐阶段收缩导致目标精度的对数复杂度。对于随机采样器，我们引入自适应引导并假设速度场的耗散性，以获得与噪声基底参数无关的均匀矩界。这导致多项式时间误差界，且一个匹配的下界表明恒定引导会引入不可避免的正偏差，从而激发自适应性。还提供了混合确定性-随机分析。在静态和时变基准PDE上的实验表明，与基于扩散的生成模型相比，具有竞争性的精度和更快的推理速度。

英文摘要

Reconstructing PDE solutions from sparse observations is a core challenge in scientific computing. We present FM4PDE, a flow-matching generative framework that learns the joint distribution of PDE coefficients (or initial states) and solutions (or final states), enabling both forward simulation and inverse recovery with limited paired data. At inference, sampling is guided by a composite loss that enforces agreement with sparse measurements and reduces the PDE residual; we support deterministic, stochastic, and hybrid samplers. We provide error guarantees for these guided procedures. For the deterministic optimizer, a coercivity condition ensures trajectory boundedness and a phase-wise contraction yields logarithmic complexity in the target accuracy. For the stochastic sampler, we introduce adaptive guidance and assume dissipativity of the velocity field to obtain uniform moment bounds independent of the noise-floor parameter. This leads to polynomial-time error bounds, and a matching lower bound shows constant guidance induces an unavoidable positive bias, motivating adaptivity. A hybrid deterministic-stochastic analysis is also provided. Experiments on static and time-dependent benchmark PDEs demonstrate competitive accuracy and faster inference than diffusion-based generative models.

2605.25508 2026-05-26 cs.LG 版本更新

Relative Repairability: A Calibration-Based Diagnostic for High-Sparsity Post-Pruning Allocation

相对可修复性：一种基于校准的高稀疏度剪枝后分配诊断方法

Qishi Zhan, Liang He, Minxuan Hu, Ziheng Chen

发表机构 * Marquette University（马凯特大学）； Tongji University（同济大学）； Cornell University（康奈尔大学）； UT Austin（得克萨斯大学奥斯汀分校）

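AI summary: The paper proposes Relative Repairability (RR), a metric that uses calibration data to compare the raw activation distortion caused by layerwise pruning with the residual distortion after channelwise variance-matching repair, as a diagnostic of how repairable pruning damage is at high sparsity. Experiments show that RR is most useful near an architecture-dependent recoverability transition, where it improves over ERK and, in the upper part of that band, surpasses LAMP.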

AI中文摘要

在极高稀疏度下，神经网络剪枝不仅决定哪些权重保留，还决定剪枝引起的损伤在网络中的分布位置，以及这些损伤能否通过固定的轻量修复过程恢复。我们通过修复条件稀疏分配的角度研究这一问题。我们引入相对可修复性（RR），一种基于校准的诊断方法，比较逐层剪枝引起的原始激活失真与通道方差匹配修复后的残余失真。RR仅使用未标记的校准数据，估计修复后剩余局部损伤的比例。在CIFAR10和CIFAR100上的ResNet18、ResNet34和VGG16 BN实验中，我们发现RR并非普遍主导的分配规则。相反，它在架构依赖的可恢复性转变附近最为有用，此时标准的结构或幅度基分配先验开始失去可靠性，但修复后恢复尚未完全崩溃。在CIFAR100 ResNet18上，细粒度扫描显示RR在中心转变带上优于ERK，并在该带上部超过LAMP。投影强制消融进一步表明，有上限的ERK可能过度保护投影层，将过多稀疏度转移到常规卷积上，降低修复后恢复。这些结果表明，高稀疏度剪枝不仅应分配保留的权重，还应分配可修复的损伤。

英文摘要

At very high sparsity, neural network pruning does more than decide which weights remain. It also determines where pruning-induced damage is placed across the network, and whether that damage can be recovered by a fixed lightweight repair procedure. We study this problem through the lens of repair-conditioned sparsity allocation. We introduce Relative Repairability (RR), a calibration-based diagnostic that compares the raw activation distortion caused by layerwise pruning with the residual distortion left after channelwise variance-matching repair. RR estimates the fraction of local damage that remains after repair, using only unlabeled calibration data. Across ResNet18, ResNet34, and VGG16-BN on CIFAR10 and CIFAR100, we find that RR is not a universally dominant allocation rule. Instead, it is most useful near an architecture-dependent recoverability transition, where standard structural or magnitude-based allocation priors begin to lose reliability but post-repair recovery has not yet fully collapsed. On CIFAR100 ResNet18, a fine-grained sweep shows that RR improves over ERK across the central transition band and surpasses LAMP near the upper part of this band. A projection-forced ablation further shows that capped ERK can over-protect projection layers, shifting excessive sparsity onto regular convolutions and reducing post-repair recovery. These results suggest that high-sparsity pruning should allocate not only retained weights, but also repairable damage.

2605.25499 2026-05-26 cs.LG 版本更新

Accelerated Dynamic Importance Weighting with Versatile Divergence-Minimizing Estimators

加速动态重要性加权与通用散度最小化估计器

Tongtong Fang, Nan Lu, Gang Niu, Kenji Fukumizu, Masashi Sugiyama

发表机构 * The Institute of Statistical Mathematics（统计数学研究所）； University of Bristol（布里斯托大学）； RIKEN Center for Advanced Intelligence Project（RIKEN高级智能项目中心）； The University of Tokyo（东京大学）

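AI summary: For joint distribution shift, the paper proposes accelerated dynamic importance weighting (ADIW), which combines a few lightweight projected gradient descent updates with a unified divergence-minimization framework, improving efficiency while reaching state-of-the-art importance-weighting performance.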

AI中文摘要

重要性加权（IW）是解决联合分布偏移的黄金方法，其中训练数据和测试数据的联合分布不同。为解决此问题，IW估计测试与训练密度比作为重要性权重，并相应地重新加权训练损失。最近动态IW（DIW）的进展将权重估计集成到模型训练中，实现了深度模型的可扩展IW，并在大型现代数据集上取得了强劲性能。尽管有前景，DIW仍存在两个局限。首先，它通过在每个小批量中求解核均值匹配（KMM）诱导的优化问题至收敛，导致大量计算开销。其次，它仅依赖KMM进行权重估计，而IW文献包含基于不同散度度量的多种估计方法。本文提出加速动态IW（ADIW），一个统一且高效的联合分布偏移下深度学习IW框架。ADIW执行少量轻量投影梯度下降更新，从先前更新的权重热启动，显著提高效率。此外，ADIW将DIW推广为一个统一的散度最小化框架，以即插即用方式支持多种权重估计方法，包括基于Kullback-Leibler散度、平方距离和Wasserstein-1距离的方法。我们在温和条件下建立了ADIW的收敛保证，实证结果表明ADIW在实现最先进IW性能的同时，效率大幅提升。

英文摘要

Importance weighting (IW) is a golden solver for joint distribution shift, where the joint distributions differ between the training and test data. To solve this problem, IW estimates test-to-training density ratios as importance weights and reweights the training losses accordingly. Recent advances in dynamic IW (DIW) integrate weight estimation into model training, enabling scalable IW for deep models and achieving strong performance on large modern datasets. Despite its promise, DIW remains limited in two aspects. First, it incurs substantial computational overhead by solving a kernel mean matching (KMM)-induced optimization problem to convergence in every mini-batch. Second, it relies solely on KMM for weight estimation, whereas the IW literature contains diverse estimation methods based on different divergence measures. In this paper, we propose accelerated DIW (ADIW), a unified and efficient IW framework for deep learning under joint distribution shift. ADIW performs a few lightweight projected gradient descent updates that warm-start from previously updated weights, substantially improving efficiency. Moreover, ADIW generalizes DIW into a unified divergence-minimization framework that supports diverse weight-estimation methods in a plug-and-play manner, including those based on the Kullback-Leibler divergence, squared distance, and Wasserstein-1 distance. We establish convergence guarantees for ADIW under mild conditions, and empirical results demonstrate that ADIW achieves state-of-the-art IW performance while being substantially more efficient.
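The basic reweighting step that IW relies on can be shown with a toy example. The sketch below assumes the training and test densities are known one-dimensional Gaussians, which is purely an illustrative assumption: in the setting of the abstract the densities are unknown and the weights are estimated (by kernel mean matching in DIW, or by divergence minimization in ADIW). The function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weights(xs, train=(0.0, 1.0), test=(1.0, 1.0)):
    """Test-to-training density ratios w(x) = p_test(x) / p_train(x).

    Assumption for illustration: both densities are known Gaussians
    given as (mean, std) pairs.
    """
    return [gaussian_pdf(x, *test) / gaussian_pdf(x, *train) for x in xs]


def weighted_risk(losses, weights):
    """Importance-weighted empirical risk over training samples."""
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)


xs = [0.5, 1.5]
losses = [1.0, 2.0]
weights = importance_weights(xs)
risk = weighted_risk(losses, weights)
print(weights, risk)
```

For these two unit-variance Gaussians the ratio simplifies to exp(x - 0.5), so training points that look more like test data receive larger weights in the loss.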

2605.25492 2026-05-26 cs.LG 版本更新

SafetyRepro: Configuration-Conditional Rank Instability on Alignment Benchmarks

SafetyRepro: 对齐基准上的配置条件排名不稳定性

Yanhang Li, Zhichao Fan, Zexin Zhuang

Affiliations: Northeastern University, Boston, MA, USA; University of Illinois Urbana-Champaign, Urbana, IL, USA; Southern Methodist University, Dallas, TX, USA

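AI summary: Through a theoretical proposition and a commit-stamped evaluation protocol, the paper shows that pairwise model comparisons on alignment benchmarks (for example, "A is safer than B") can strictly reverse under unspecified configuration choices.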

2605.25469 2026-05-26 cs.LG 版本更新

面向常规心电图广谱心血管评估的信号-语言基础模型

Ziqing Yu, Yuhui Tao, Jiayu Huo, Lei Pan, Zilong Xiao, Juecheng Chen, Xiao Li, Jianxuan Li, You Zhou, Zhixing Li, Cong Wang, Beijian Zhang, Chen Chen, Hongyang Lu, Konstantinos Patlatzoglou, Daniel B. Kramer, Jonathan W. Waks, Yangang Su, Fu Siong Ng, Shuo Wang, Yixiu Liang, Junbo Ge

Affiliations: Department of Cardiology, Zhongshan Hospital of Fudan University; Shanghai Institute of Cardiovascular Diseases, National Clinical Research Centre for Interventional Medicine; Digital Medical Research Center, School of Basic Medical Sciences, Fudan University; Shanghai Key Laboratory of Medical Imaging Computing and Computer Assisted Intervention; National Heart and Lung Institute, Imperial College London, Hammersmith Hospital, Du Cane Road; Department of Cardiology, Shanghai Geriatric Medical Center; Cardiac Rhythm Management, Medtronic Technology Center, Medtronic (Shanghai) Ltd.; Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology, Beth Israel Deaconess Medical Center, Harvard Medical School; Harvard-Thorndike Electrophysiology Institute, Beth Israel Deaconess Medical Center, Harvard Medical School; Department of Cardiology, Imperial College Healthcare NHS Trust; Department of Cardiology, Chelsea and Westminster NHS Foundation Trust; Department of Computer Science and Technology, University of Cambridge

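AI summary: The paper proposes ECGCLIP, a signal-language contrastive learning framework pre-trained on large-scale ECG-report pairs. Across 89 downstream tasks it improves over baselines, covering common arrhythmias, echocardiographic targets, and rare cardiac diseases.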

AI中文摘要

心电图（ECG）是心血管诊疗的核心，但传统AI模型通常局限于常见心律失常，且在不同人群或临床细微疾病中泛化能力较差。我们开发了ECGCLIP（心电图对比语言-图像预训练），一种信号-语言对比学习框架，将ECG波形与专家诊断报告对齐。ECGCLIP在来自1,324,856名患者的2,837,962份心电图研究上进行了预训练，并在一个留出内部测试集以及包含约150万份心电图的九个独立外部队列上进行了评估。评估覆盖89项下游任务，包括45项心电图诊断、39项超声心动图靶标和5种罕见心脏病，以PRAUC为主要指标。ECGCLIP在随机初始化和Merl-R18基线上持续提升性能。在内部测试集上，ECGCLIP-R34对心房颤动（PRAUC 0.900）和ST段抬高型心肌梗死（PRAUC 0.383）表现出强劲性能，并在所有外部队列中具有稳健泛化能力。它还改善了低患病率和诊断困难的疾病，包括埃布斯坦畸形、缩窄性心包炎、右位心和心脏淀粉样变性，内部PRAUC值分别为0.253、0.175、0.121和0.201。ECGCLIP数据高效，仅使用10%的训练数据即可达到或超过全数据集基线性能。特征可视化和显著性分析表明，其学习到的表示与既定心电图标准具有临床意义的对齐。这些发现表明，大规模心电图-报告对比预训练可以将常规心电图解读从常见心律失常扩展到广谱心血管评估以及超声心动图和罕见病的机会性筛查。

英文摘要

Electrocardiography (ECG) is central to cardiovascular care, but conventional AI models are often restricted to common arrhythmias and may generalize poorly across populations or clinically subtle diseases. We developed ECG Contrastive Language-Image Pre-training (ECGCLIP), a signal-language contrastive learning framework that aligns ECG waveforms with expert diagnostic reports. ECGCLIP was pre-trained on 2,837,962 ECG studies from 1,324,856 patients and evaluated on a held-out internal test set plus nine independent external cohorts comprising about 1.5 million ECGs. Evaluation covered 89 downstream tasks, including 45 ECG diagnoses, 39 echocardiographic targets, and 5 rare cardiac diseases, using PRAUC as the primary metric. ECGCLIP consistently improved performance over random initialization and Merl-R18 baselines. On the internal test set, ECGCLIP-R34 achieved strong performance for atrial fibrillation (PRAUC 0.900) and ST-segment elevation myocardial infarction (PRAUC 0.383), with robust generalization across all external cohorts. It also improved low-prevalence and diagnostically elusive diseases, including Ebstein anomaly, constrictive pericarditis, dextrocardia, and cardiac amyloidosis, with internal PRAUC values of 0.253, 0.175, 0.121, and 0.201, respectively. ECGCLIP was data efficient, matching or exceeding full-dataset baseline performance with only 10% of training data. Feature visualization and saliency analysis suggested clinically meaningful representations aligned with established electrocardiographic criteria. These findings indicate that large-scale ECG-report contrastive pre-training can expand routine ECG interpretation beyond common arrhythmias toward broad cardiovascular assessment and opportunistic screening of echocardiographic and rare conditions.

2605.25439 2026-05-26 cs.LG 版本更新

Missing Pattern Recognized Diffusion Imputation Model for Missing Not At Random

面向非随机缺失（MNAR）的缺失模式识别扩散插补模型

Gyuwon Sim, Sumin Lee, Heesun Bae, Byeonghu Na, Doyun Kwon, Ju-Hee Hwang, Jae-Young Lim, Il-Chul Moon

发表机构 * KAIST（韩国科学技术院）； Seoul National University（首尔国立大学）

AI总结针对缺失非随机（MNAR）问题，提出缺失模式识别扩散插补模型（PRDIM），通过模式识别器和EM算法最大化联合分布似然，实现精确插补。

详情

AI中文摘要

缺失数据在包括时间序列和图像在内的多个领域中频繁出现。在现实世界中，缺失的发生往往依赖于不可观测的值本身，这被称为缺失非随机（MNAR）。在这项工作中，我们引入了缺失模式识别扩散插补模型（PRDIM），这是一个新颖的框架，它显式地捕获缺失模式并精确插补未观测值。PRDIM在期望最大化（EM）算法下迭代地最大化观测值和缺失掩码的联合分布似然。从这个意义上说，我们首先采用一个模式识别器，它近似潜在的缺失模式，并在每次推理中提供指导，以针对缺失信息进行更合理的插补。通过大量实验，我们证明PRDIM在多种数据模态的MNAR设置下始终实现强大的插补性能。

英文摘要

Missing data frequently arises across diverse domains, including time-series and image domains. In the real world, missingness often depends on the unobservable values themselves, a setting referred to as Missing Not at Random (MNAR). In this work, we introduce the Missing Pattern Recognized Diffusion Imputation Model (PRDIM), a novel framework that explicitly captures the missing pattern and precisely imputes unobserved values. PRDIM iteratively maximizes the likelihood of the joint distribution of the observed values and the missing mask under an Expectation-Maximization (EM) algorithm. To this end, we employ a pattern recognizer, which approximates the underlying missing pattern and provides guidance at every inference step toward imputations that are more plausible with respect to the missing information. Through extensive experiments, we demonstrate that PRDIM consistently achieves strong imputation performance under MNAR settings across multiple data modalities.
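
To make the MNAR setting concrete, here is a small simulation sketch in which the probability that a value goes missing depends on the value itself. The logistic missingness rule, the `slope` parameter, and all names are assumptions chosen for illustration; this is not the PRDIM pattern recognizer. It shows why ignoring the mask biases naive statistics: the mean of the observed values drifts away from the true mean.

```python
import math
import random


def mnar_observed_mask(values, slope=2.0, seed=0):
    """Return True where a value is observed; larger values go missing more often."""
    rng = random.Random(seed)
    mask = []
    for v in values:
        p_missing = 1.0 / (1.0 + math.exp(-slope * v))  # depends on v itself (MNAR)
        mask.append(rng.random() >= p_missing)
    return mask


data_rng = random.Random(1)
values = [data_rng.gauss(0.0, 1.0) for _ in range(20000)]
mask = mnar_observed_mask(values)
observed = [v for v, keep in zip(values, mask) if keep]

true_mean = sum(values) / len(values)
observed_mean = sum(observed) / len(observed)  # biased downward under this rule
```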

2605.25429 2026-05-26 cs.LG 版本更新

Rethinking Feature Alignment in Generalist Graph Anomaly Detection: A Relational Fingerprint-based Approach

重新思考通用图异常检测中的特征对齐：一种基于关系指纹的方法

Yujing Liu, Yixin Liu, Yu Zheng, Alan Wee-Chung Liew, Xiaofeng Cao, Shirui Pan

发表机构 * Griffith University, Gold Coast, Australia（格里菲斯大学，澳大利亚黄金海岸）； Tongji University, Shanghai, China（同济大学，上海，中国）

AI总结针对通用图异常检测中特征对齐忽略语义导致负迁移的问题，提出基于关系指纹的通用方法ReFi-GAD，通过编码上下文和结构异常指示线索的语义感知指纹，结合Transformer编码器和SNR引导的领域自适应模块，在14个数据集上显著超越现有方法。

Comments 9 pages, 7 figures. Accepted by ICML 2026

详情

AI中文摘要

通用图异常检测（GAD）旨在无需针对特定图进行重新训练即可检测未见图上的异常。然而，现有方法主要关注通过基于PCA的投影来对齐不同数据域间的异构特征，这种对齐方式虽然统一了特征维度，却忽略了特征语义。因此，GAD模型无法学习可迁移的语义知识，甚至在未见图上表现出负迁移。为解决此问题，我们提出一种基于关系指纹的通用GAD方法（简称ReFi-GAD），通过一种通用的、语义感知的关系指纹（ReFi）对齐异构原始特征，该指纹从上下文和结构两个角度编码异常指示线索。基于ReFi，我们设计了一个基于指纹的通用GAD模型，该模型结合了基于Transformer的编码器以捕获领域不变知识，以及一个SNR引导的细化模块用于领域特定自适应。在14个数据集上的大量实验表明，ReFi-GAD显著优于现有最先进方法。

英文摘要

Generalist graph anomaly detection (GAD) aims to detect anomalies on unseen graphs without graph-specific retraining. Nevertheless, existing approaches primarily focus on aligning heterogeneous features across different data domains via PCA-based projection, which harmonizes feature dimensions but ignores feature semantics. As a result, GAD models fail to learn transferable semantic knowledge, and even exhibit negative transfer on unseen graphs. To address this issue, we propose a Relational Fingerprint-based generalist GAD approach (ReFi-GAD for short), aligning heterogeneous raw features with a universal and semantics-aware Relational Fingerprint (ReFi) that encodes anomaly-indicative cues from both contextual and structural perspectives. Building on ReFi, we design a fingerprint-grounded generalist GAD model, which combines a transformer-based encoder to capture domain-invariant knowledge with an SNR-guided refinement module for domain-specific adaptation. Extensive experiments on 14 datasets demonstrate that ReFi-GAD significantly outperforms state-of-the-art methods.

2605.25424 2026-05-26 cs.LG cs.AI 版本更新

ViroBench：病毒基因组学任务中的核苷酸基础模型基准测试

Dongxin Ye, Fang Hu, Han Hu, Shu Hu, Yang Tan, Wanli Ouyang, Stan Z. Li, Jie Cui, Nanqing Dong

发表机构 * Shanghai Innovation Institute, Shanghai, China； University of Electronic Science； Fudan University, Shanghai, China； Shanghai Artificial Intelligence Laboratory, Shanghai, China； Institute of Infection； Health Fudan University, Shanghai, China； Shanghai Sci-Tech Inno Center for Infection & Immunity, Shanghai, China； Shanghai Jiao Tong University, Shanghai, China； Shenzhen Loop Area Institute, Shenzhen, China； Chinese University of Hong Kong, Hong Kong, China； Westlake University, Hangzhou, China

AI总结提出首个针对病毒基因组学的综合基准ViroBench，评估66个核苷酸基础模型在生物学理解和潜在生物安全风险上的表现，发现模型在系统发育和时间偏移下性能下降，生成任务中统计似然与生物功能有效性脱钩，且预训练数据的分类多样性比参数规模更重要。

Comments 42 pages, 15 figures

详情

DOI: 10.1145/3770855.3819057

AI中文摘要

核苷酸序列构成了生物系统的基本遗传基础，使得病毒基因组分析对生物医学进步至关重要。尽管生物基础模型，特别是核苷酸基础模型（NFMs）取得了进展，但该领域缺乏一个统一的病毒基因组学标准来促进社区发展并实施生物安全约束。为了解决这个问题，我们引入了ViroBench，这是第一个专门为病毒场景中的NFMs设计的全面且大规模的基准测试。ViroBench在两个关键维度上评估模型：生物学理解和潜在生物安全风险，覆盖4种任务类型中的18个不同场景。对66个不同架构的NFMs的广泛评估得出了三个关键结论。首先，NFMs在系统发育和时间偏移下表现出生物学理解的性能下降，表明外推能力较弱。其次，生成任务揭示了统计似然与生物功能有效性之间的脱钩，构成了潜在的生物安全风险。第三，受控消融研究表明，预训练数据中的分类多样性比参数规模更重要。具体来说，一个在多样化数据上训练的轻量级基线相比其原始模型实现了67.5%的性能提升。总体而言，ViroBench为未来病毒核苷酸基础模型的研究提供了可解释的诊断评估和可重复的测量框架。数据集和代码公开于https://github.com/QIANJINYDX/ViroBench。

英文摘要

Nucleotide sequences constitute the fundamental genetic basis of biological systems, rendering viral genomic analysis critical for biomedical advancement. Despite progress in biological foundation models, specifically nucleotide foundation models (NFMs), the field lacks a unified standard for viral genomics to facilitate community development and enforce biosecurity constraints. To address this, we introduce ViroBench, the first comprehensive and large-scale benchmark specifically designed for NFMs in viral settings. ViroBench evaluates models across two critical dimensions: biological understanding and latent biosecurity risk, covering 18 diverse scenarios within 4 task types. Extensive evaluation of 66 NFMs across diverse architectures yields three critical conclusions. Firstly, NFMs exhibit a performance degradation in biological understanding under phylogenetic and temporal shifts, indicating weak extrapolation capabilities. Secondly, generation tasks reveal a decoupling between statistical likelihood and biological functional validity, posing latent biosecurity risks. Thirdly, controlled ablation studies reveal that taxonomic diversity in pretraining data outweighs parameter scale. Specifically, a lightweight baseline trained on diverse data achieves a 67.5% performance gain over its original model. Overall, ViroBench provides interpretable, diagnostic evaluations and a reproducible measurement framework for future research on viral nucleotide foundation models. The datasets and code are publicly available at https://github.com/QIANJINYDX/ViroBench.

2605.25383 2026-05-26 stat.ML cs.LG math.ST stat.TH 版本更新

Learning manifold diffusion semigroups from graph transition matrices

从图转移矩阵学习流形扩散半群

Xiuyuan Cheng, Nan Wu

发表机构 * Department of Mathematics, Duke University（杜克大学数学系）； Department of Mathematical Sciences, The University of Texas at Dallas（德克萨斯大学达拉斯分校数学科学系）

AI总结本文提出通过迭代图转移矩阵直接逼近流形热半群，在低正则性假设下给出了无穷范数误差界，并实现了与图拉普拉斯方法相当的收敛速率。

详情

AI中文摘要

我们考虑由从嵌入欧氏空间的未知流形中抽取的有限独立同分布样本构建的图扩散过程，其中图亲和度由环境高斯核矩阵定义。我们证明，在测试函数 $f$ 仅具有低正则性假设（包括 $f \in L^\infty$ 的情况）下，流形热半群 $Q_t = e^{t\Delta}$ 可以通过迭代图转移矩阵 $P$ 直接逼近。我们以 $\infty$-范数界定了 $\| P^n f - Q_t f \|$，其中算子对 $f$ 的作用被适当定义，并且对于扩散时间 $t$ 至 $O(1)$ 及更长，我们恢复了经典图拉普拉斯逐点速率 $O(N^{-2/(d+6)})$（忽略对数因子）。该速率适用于样本内误差以及样本外泛化，其中新点处 $Q_t f$ 的估计量通过核卷积定义。为了处理流形上的非均匀采样密度，我们引入了图转移矩阵的右归一化；在采样密度 $p$ 为 $C^3$ 且远离零的假设下，相同的收敛速率成立。我们在模拟数据上数值验证了所提估计器的性能。

英文摘要

We consider graph diffusion processes constructed from finite i.i.d. samples drawn from an unknown manifold embedded in ambient Euclidean space, where the graph affinity is defined by an ambient Gaussian kernel matrix. We show that the manifold heat semigroup $Q_t = e^{t\Delta}$ can be approximated directly by iterating the graph transition matrix $P$, under only low regularity assumptions on the test function $f$, including the case $f \in L^\infty$. We bound $\| P^n f - Q_t f \|$ in $\infty$-norm, with the operator application to $f$ properly defined, and we recover the classical graph-Laplacian pointwise rate $O(N^{-2/(d+6)})$ up to logarithmic factors, for diffusion times $t$ up to $O(1)$ and longer. The rate holds for in-sample error as well as out-of-sample generalization, where the estimator of $Q_t f$ at a new point is defined via kernel convolution. To handle non-uniform sampling densities on the manifold, we introduce a right-normalization of the graph transition matrix; under the assumption that the sampling density $p$ is $C^3$ and bounded away from zero, the same convergence rates hold. We numerically demonstrate the performance of the proposed estimator on simulated data.
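
The idea of iterating a graph transition matrix can be sketched on a toy manifold, the unit circle embedded in the plane. Everything here is an assumption made for illustration: the kernel form $\exp(-\|x-y\|^2/(4\epsilon))$, the bandwidth `eps`, and evenly spaced points (used instead of i.i.d. samples so the result is reproducible). On the circle the heat semigroup maps $\cos\theta$ to $e^{-t}\cos\theta$, and with this normalisation one step behaves roughly like heat flow for time `eps`, so 20 steps with `eps = 0.05` correspond to $t \approx 1$.

```python
import math


def transition_matrix(points, eps):
    """Row-normalised Gaussian-kernel affinity, P = D^{-1} W."""
    rows = []
    for p in points:
        w = [math.exp(-((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) / (4.0 * eps))
             for q in points]
        total = sum(w)
        rows.append([wi / total for wi in w])
    return rows


def apply_n_times(P, f, n):
    """Compute P^n f by repeated matrix-vector products."""
    for _ in range(n):
        f = [sum(pij * fj for pij, fj in zip(row, f)) for row in P]
    return f


N = 60
theta = [2.0 * math.pi * k / N for k in range(N)]
points = [(math.cos(t), math.sin(t)) for t in theta]
f = [math.cos(t) for t in theta]          # test function on the circle

P = transition_matrix(points, eps=0.05)
smoothed = apply_n_times(P, f, n=20)      # approximates a damped copy of f
```

Because the points are evenly spaced, `f` is an exact eigenvector of `P`, so `smoothed` is `f` scaled by a constant factor below one. With random samples the match would only be approximate, which is the regime the error bound addresses.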

2605.25381 2026-05-26 cs.LG 版本更新

Not only where, But when: Temporal Scheduling for RLVR

不仅在哪里，而且何时：RLVR 的时间调度

Jinghao Zhang, Ruilin Li, Feng Zhao, Jiaqi Wang

发表机构 * University of Science and Technology of China（中国科学技术大学）； Shanghai Innovation Institute（上海创新研究院）； Wuhan University（武汉大学）

AI总结针对具有可验证奖励的强化学习（RLVR）中忽略策略行为异质性的问题，提出时间调度方法，通过在训练过程中动态调整信用分配标准来优化学习动态，实验表明该方法能提升训练稳定性和效率。

Comments Github: https://github.com/Jinghaoleven/RLVR-Schedule

详情

AI中文摘要

具有可验证奖励的强化学习（RLVR）已成为大型语言模型（LLMs）后训练的核心技术。虽然策略优化由所有采样token在全局广播标量奖励下驱动，但轨迹中表现出的异质性策略行为在很大程度上被忽视而未加以区分。现有工作通过信用分配来解决这一问题，包括token级优势重加权和选择性token优化，然而分配标准在整个训练过程中基本保持不变，限制了策略的弹性演化。在这项工作中，我们认为学习信号的调度时机与它们在token间的分配位置同样重要，并引入了时间维度，即在RLVR优化过程中调度信用分配标准。我们发现，优先关注具有特定策略行为的目标token，并逐渐向通用优化衰减，可以带来更稳定和高效的学习动态。此外，我们表明简单的轨迹百分位数为区分策略行为提供了自然视角，并与时间调度有效配合。我们的分析揭示，标准优化在同时适应异质性行为时显著牺牲了策略熵，而时间调度产生了更健康的策略演化动态。在数学和通用推理基准上的实验表明了一致的改进，表明时间调度构成了一个有前景的优化维度。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) has become a core technique for post-training large language models (LLMs). While policy optimization is driven by all sampled tokens under a globally broadcast scalar reward, the heterogeneous policy behaviors exhibited along trajectories are largely overlooked. Existing works address this through credit allocation, including token-level advantage reweighting and selective token optimization; however, the allocation criteria remain largely static throughout training, limiting resilient policy evolution. In this work, we argue that when learning signals are scheduled can be as important as where they are allocated across tokens, and we introduce a temporal dimension: scheduling the credit allocation criteria over the course of RLVR optimization. We find that prioritizing targeted tokens associated with specific policy behaviors and gradually attenuating toward general optimization leads to more stable and efficient learning dynamics. Furthermore, we show that simple trajectory percentiles provide a natural perspective for distinguishing policy behaviors and work effectively with temporal scheduling. Our analysis reveals that standard optimization substantially sacrifices policy entropy when simultaneously accommodating heterogeneous behaviors, whereas temporal scheduling yields healthier policy evolution dynamics. Experiments across mathematical and general reasoning benchmarks demonstrate consistent improvements, suggesting that temporal scheduling constitutes a promising optimization dimension.

2605.25352 2026-05-26 cs.LG cs.AI 版本更新

Certified Robustness from Approximate Gaussian Mixture Structures in Pretrained Latent Spaces

基于预训练潜在空间中近似高斯混合结构的认证鲁棒性

Konstantinos Emmanouilidis, Tianjiao Ding, Nghia Nguyen, Nicolas Loizou, René Vidal

发表机构 * CS & MINDS, Johns Hopkins University（约翰霍普金斯大学计算机科学系及MINDS）； CIS, University of Pennsylvania（宾夕法尼亚大学计算机与信息科学系）； AMS & MINDS, Johns Hopkins University（约翰霍普金斯大学应用数学与统计系及MINDS）； ESE, Radiology & IDEAS, University of Pennsylvania（宾夕法尼亚大学电气与系统工程系、放射学系及IDEAS）

AI总结本文提出一个框架，利用预训练编码器将输入映射到近似高斯混合的潜在分布，通过理论分析证明鲁棒性退化有界，从而实现可认证鲁棒分类器，在CIFAR-10和ImageNet上达到最优或竞争性的认证准确率。

详情

AI中文摘要

深度学习模型易受对抗扰动影响，这对安全关键部署提出了重要关切。经验性防御在实践中可以实现强鲁棒性，但缺乏形式化保证，这推动了可认证鲁棒分类器的需求。虽然认证方法提供了形式化保证，但由于无法利用复杂数据分布中的结构，它们通常产生过于保守的边界。在这项工作中，我们提出了一个设计可认证鲁棒分类器的框架，该框架利用数据表示中的潜在结构。我们首先分析高斯混合设置，推导出鲁棒分类器存在的必要和充分条件，并构建了一个具有闭式鲁棒性证书和泛化保证的分类器。我们的主要贡献是证明精确结构并非必需：我们证明，如果预训练编码器将输入映射到一个与高斯混合分布$\varepsilon$-接近（在KL散度下）的潜在分布，那么认证准确率会优雅地退化，并给出了一个显式边界，关联真实分布和近似分布下的鲁棒性。这一结果使得直接使用预训练模型成为可能，而无需精确的分布假设。实验上，我们的方法在CIFAR-10和ImageNet上实现了最先进或具有竞争力的认证准确率，同时保持了强大的干净性能和低计算开销。总体而言，我们的工作将近似潜在结构确立为通往可认证鲁棒性的一条实用且有原则的路径。

英文摘要

Deep learning models are vulnerable to adversarial perturbations, raising important concerns for safety-critical deployment. Empirical defenses can achieve strong robustness in practice, but lack formal guarantees, motivating the need for certifiably robust classifiers. While certified methods provide formal guarantees, they often yield overly conservative bounds due to their inability to exploit structure in complex data distributions. In this work, we propose a framework for designing certifiably robust classifiers that leverages latent structure in data representations. We first analyze the Gaussian mixture setting, deriving necessary and sufficient conditions for the existence of robust classifiers and constructing a classifier with a closed-form robustness certificate and generalization guarantees. Our main contribution is to show that exact structure is not required: we prove that if a pretrained encoder maps inputs to a latent distribution that is $\varepsilon$-close (in KL divergence) to a Gaussian mixture, then certified accuracy degrades gracefully, with an explicit bound relating robustness under the true and approximate distributions. This result enables the direct use of pretrained models without requiring exact distributional assumptions. Empirically, our method achieves state-of-the-art or competitive certified accuracy on CIFAR-10 and ImageNet, while maintaining strong clean performance and low computational overhead. Overall, our work establishes approximate latent structure as a practical and principled route to certifiable robustness.
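
As a point of reference for the Gaussian mixture setting, the sketch below uses the textbook two-component case with equal priors and a shared isotropic covariance, where the Bayes classifier is linear and the certified $\ell_2$ radius at a point is its distance to the decision hyperplane. The function names are hypothetical and this is not the paper's certificate, which covers more general conditions; the radius is measured in the space where the classifier operates (here, the latent space).

```python
import math


def gaussian_mixture_classifier(mu0, mu1):
    """Linear Bayes rule for two equal-prior Gaussians with shared isotropic covariance."""
    w = [b - a for a, b in zip(mu0, mu1)]
    bias = -0.5 * (sum(b * b for b in mu1) - sum(a * a for a in mu0))
    return w, bias


def l2_certified_radius(z, w, bias):
    """Predicted label and distance from z to the hyperplane w.z + bias = 0."""
    score = sum(wi * zi for wi, zi in zip(w, z)) + bias
    label = 1 if score >= 0 else 0
    return label, abs(score) / math.sqrt(sum(wi * wi for wi in w))


w, bias = gaussian_mixture_classifier(mu0=[-2.0, 0.0], mu1=[2.0, 0.0])
label, radius = l2_certified_radius([3.0, 0.0], w, bias)
# The boundary is the line z[0] = 0, so a latent point at z[0] = 3 has radius 3.
```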

2605.25348 2026-05-26 eess.IV cs.AI cs.CV cs.LG cs.SC 版本更新

Parameter-Efficient CT Reconstruction via Deep Graph Laplacian Regularization

基于深度图拉普拉斯正则化的参数高效CT重建

Veera Varuni Radhakrishnan, Chinthaka Dinesh, Qurat-ul-Ain Azim

发表机构 * Mechanical and Industrial Engineering Department（机械与工业工程系）

AI总结提出深度图拉普拉斯正则化（Deep GLR）方法，通过将二次图正则化集成到近端前向-后向分裂优化框架中，仅用少量参数和数据即可实现低剂量CT重建的噪声抑制，在参数效率和数据效率上显著优于现有方法。

Comments 7 pages, 3 figures, conference

详情

AI中文摘要

低剂量计算机断层扫描（LDCT）重建面临重建质量与资源需求之间的关键权衡。虽然最近的深度学习方法达到了最先进的性能，但它们通常依赖超过50万个参数，并在超过35,000次扫描的大规模数据集上训练。本文研究在严格资源约束下，基于图的正则化是否能提供有意义的噪声抑制。我们提出了深度图拉普拉斯正则化（Deep GLR），将二次图正则化集成到近端前向-后向分裂优化框架中，并包含三个轻量级CNN模块。在LoDoPaB-CT基准上评估，Deep GLR达到了30.70 dB的PSNR，相比滤波反投影提高了6.33 dB，同时仅使用了91,848个参数，在1000个样本上训练（标准训练集的2.8%）。与基准方法相比，这代表了每dB改进5.8倍的参数效率和30倍的数据效率。学习到的图带宽参数（ε=1.25）收敛到可解释的值，表明该方法捕捉了有意义的图像先验而非过拟合。尽管与最先进方法相比仍有13 dB的差距，但结果表明基于图的正则化为资源受限的医学成像场景提供了有利的效率-质量权衡。

英文摘要

Low-dose computed tomography (LDCT) reconstruction faces a critical tradeoff between reconstruction quality and resource requirements. While recent deep learning methods achieve state-of-the-art performance, they typically rely on over 500,000 parameters trained on large-scale datasets exceeding 35,000 scans. This work investigates whether graph-based regularization can provide meaningful noise reduction under strict resource constraints. We propose Deep Graph Laplacian Regularization (Deep GLR), integrating quadratic graph regularization into a Proximal Forward-Backward Splitting optimization framework with three lightweight CNN modules. Evaluated on the LoDoPaB-CT benchmark, Deep GLR achieves 30.70 dB PSNR, representing a 6.33 dB improvement over filtered backprojection, while using only 91,848 parameters trained on 1,000 samples (2.8% of the standard training set). Compared to benchmark methods, this represents 5.8 times better parameter efficiency and 30 times better data efficiency per dB improvement. The learned graph bandwidth parameter ($\epsilon = 1.25$) converges to interpretable values, suggesting the method captures meaningful image priors rather than overfitting. While a 13 dB gap remains versus state-of-the-art methods, results demonstrate that graph-based regularization provides a favorable efficiency-quality tradeoff for resource-constrained medical imaging scenarios.
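
The quadratic graph regularizer at the core of this family of methods can be illustrated on a 1-D signal: minimizing $\|x - y\|^2 + \mu\, x^\top L x$ leads to the linear system $(I + \mu L)x = y$. The sketch below is a simplification under stated assumptions: a path graph, Gaussian edge weights computed from the noisy signal itself with a hypothetical bandwidth `eps`, and a tridiagonal solve. It does not include the CNN modules or the proximal splitting loop described in the abstract.

```python
import math


def glr_denoise(y, mu=1.0, eps=1.0):
    """Solve (I + mu * L) x = y for a path graph with Gaussian edge weights."""
    n = len(y)
    if n < 2:
        return list(y)
    # Edge weight between neighbours i and i+1; large jumps get weight near 0.
    w = [math.exp(-((y[i] - y[i + 1]) ** 2) / (2.0 * eps ** 2)) for i in range(n - 1)]
    diag = [1.0 + mu * ((w[i - 1] if i > 0 else 0.0) + (w[i] if i < n - 1 else 0.0))
            for i in range(n)]
    off = [-mu * wi for wi in w]  # sub- and super-diagonal (symmetric)

    # Thomas algorithm for the tridiagonal system.
    c_prime = [0.0] * (n - 1)
    d_prime = [0.0] * n
    c_prime[0] = off[0] / diag[0]
    d_prime[0] = y[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off[i - 1] * c_prime[i - 1]
        if i < n - 1:
            c_prime[i] = off[i] / denom
        d_prime[i] = (y[i] - off[i - 1] * d_prime[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


noisy = [0.0, 0.2, -0.1, 0.1, 5.0, 5.2, 4.9, 5.1]  # two flat regions plus noise
smoothed = glr_denoise(noisy, mu=2.0)
```

Since the weight across the large jump is close to zero, the two plateaus are smoothed almost independently, which is the edge-preserving behaviour that makes signal-dependent graphs attractive for denoising.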

2605.25347 2026-05-26 cs.CV cs.LG 版本更新

CausalFlow: LLM Agent 失败的因果归因与反事实修复

Akash Bonagiri, Devang Borkar, Gerard Janno Anderias, Setareh Rafatirad, Houman Homayoun

发表机构 * Department of Computer Science University of California, Davis（计算机科学系加州大学戴维斯分校）

AI总结提出CausalFlow框架，通过反事实干预计算步骤级因果责任分数，识别失败步骤并生成最小编辑修复，用于测试时修复和训练时监督，在多个基准上优于启发式方法。

详情

AI中文摘要

大型语言模型（LLM）代理在涉及推理、工具使用和环境交互的多步任务中经常失败。虽然此类失败通常被记录或通过启发式重试处理，但它们包含了关于执行中断位置的结构化信号。我们提出了CausalFlow，一个干预框架，将失败的代理轨迹转换为最小的反事实修复和可重用的监督。CausalFlow将执行轨迹建模为依赖步骤的顺序链，并通过步骤级反事实干预计算因果责任分数（CRS）来识别导致失败的步骤。对于这些步骤，我们生成最小编辑修复，将最终结果翻转为成功，产生形式为（错误步骤，修正步骤）的验证对比对。CausalFlow支持两种互补用途：具有最小行为漂移的针对性测试时修复，以及适用于离线偏好优化或奖励建模的训练时监督。在涵盖数学推理、代码生成、问答和医学浏览的四个基准测试中，CausalFlow将失败执行转换为具有高最小性和因果一致性分数的验证最小修复，并证明因果归因对于跨不同代理任务的可靠改进是必要的，在复杂检索设置中优于启发式细化，同时产生更局部的修复。这些结果表明，对结构化执行轨迹的干预分析提供了一种原则性和可扩展的机制，将代理失败转化为可靠性提升和可学习的监督。

英文摘要

Large language model (LLM) agents frequently fail on multi-step tasks involving reasoning, tool use, and environment interaction. While such failures are typically logged or retried heuristically, they contain structured signals about where execution broke down. We introduce CausalFlow, an interventional framework that converts failed agent traces into minimal counterfactual repairs and reusable supervision. CausalFlow models execution traces as sequential chains of dependent steps and computes Causal Responsibility Scores (CRS) via step-level counterfactual intervention to identify failure-inducing steps. For these steps, we generate minimally edited repairs that flip the final outcome to success, producing validated contrastive pairs of the form (wrong step, corrected step). CausalFlow supports two complementary uses: targeted test-time repair that recovers from failures with minimal behavioral drift, and training-time supervision suitable for offline preference optimization or reward modeling. Across four benchmarks spanning mathematical reasoning, code generation, question answering, and medical browsing, CausalFlow converts failed executions into validated minimal repairs with high minimality and causal-consensus scores, and demonstrates that causal attribution is necessary for reliable improvement across diverse agent tasks, outperforming heuristic refinement in complex retrieval settings while producing more localized repairs throughout. These results demonstrate that interventional analysis over structured execution traces provides a principled and scalable mechanism for transforming agent failures into reliability gains and learning-ready supervision.

2605.25313 2026-05-26 cs.LG cs.AI cs.RO stat.ML 版本更新

UWM-JEPA: Predictive World Models That Imagine in Belief Space

UWM-JEPA：在信念空间中进行想象的世界预测模型

Santosh Kumar Radha, Oktay Goktas

发表机构 * AgentField AI

AI总结针对部分可观测环境，提出UWM-JEPA模型，通过密度矩阵潜变量和酉预测器在信念空间中保持联合状态谱，实现长时域盲推演下的不确定性保持，显著优于向量潜变量基线。

Comments 14 pages, 6 figures, 7 tables. Code and data: https://github.com/santoshkumarradha/uwm-jepa

详情

AI中文摘要

部分可观测环境下的世界模型必须想象多个兼容的隐藏未来，并在反事实动作下在它们之间切换。联合嵌入预测架构（JEPAs）在潜在空间中实现这一点，但向量值潜变量没有内部结构来在盲推演过程中承载关于隐藏后续状态的信念。我们引入了酉世界模型JEPA（UWM-JEPA），这是一种JEPA世界模型，具有在联合系统-环境空间上的密度矩阵潜变量和学习的酉预测器。该结构在推演过程中精确保持联合状态谱，因此预测器本身不会耗散表示的不确定性。在一个需要根据给定动作序列进行五步前向模拟且目标观测被掩蔽的隐藏速度指示任务中，UWM-JEPA达到0.77的准确率，并且随着动作被扰动而单调下降；而参数匹配的LSTM-JEPA在相同的反事实目标训练目标和动作头下训练时，在所有动作条件下都崩溃为多数类准确率（0.53）。在盲推演下，UWM-JEPA在短时域上损失不到十个点的探针R^2，而向量潜变量基线分别损失四十一个和六十八个点；两者在保留的上下文探针上表现相当，表明差异在于预测器而非编码器。动作敏感性本身需要针对反事实而非教师强制目标进行训练，这一发现适用于酉参数化之外。对于JEPA世界模型在部分可观测性下进行想象，潜变量几何和预测器动力学至关重要，而不仅仅是冻结的上下文编码能力。

英文摘要

World models for partially observed environments must imagine multiple compatible hidden futures and steer between them under counterfactual actions. Joint Embedding Predictive Architectures (JEPAs) do this in latent space, but a vector-valued latent has no internal structure for carrying the belief over hidden continuations through blind rollout. We introduce the Unitary World Model JEPA (UWM-JEPA), a JEPA world model with a density-matrix latent on a joint system-environment space and a learned unitary predictor. The construction preserves the joint-state spectrum exactly during rollout, so the predictor itself cannot dissipate the represented uncertainty. On a hidden-velocity indicator task requiring five-step forward simulation under a given action sequence with the target observation masked, UWM-JEPA reaches 0.77 accuracy and degrades monotonically as actions are perturbed; a parameter-matched LSTM-JEPA trained under the same counterfactual-target objective and action head collapses to majority-class accuracy (0.53) under every action condition. Under blind rollout, UWM-JEPA loses fewer than ten points of probe R^2 at short horizons while vector-latent baselines lose forty-one and sixty-eight; both nevertheless tie on a held-out context probe, locating the separation in the predictor rather than the encoder. Action sensitivity itself requires training against counterfactual rather than teacher-forced targets, a finding that applies beyond the unitary parameterisation. For JEPA world models to imagine under partial observability, latent geometry and predictor dynamics matter, not frozen context-encoding capacity alone.
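
The claim that a unitary predictor cannot dissipate uncertainty rests on a standard linear-algebra fact: conjugating a density matrix by a unitary, $\rho \mapsto U \rho U^\dagger$, leaves its eigenvalues unchanged. The toy check below uses a hand-picked $2 \times 2$ density matrix and a fixed unitary. All values and helper names are assumptions for illustration; the paper's latent lives on a larger joint system-environment space with a learned predictor.

```python
import cmath
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def dagger(A):
    """Conjugate transpose of a 2x2 matrix."""
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]


def unitary(theta, phi):
    """A 2x2 unitary built from a rotation angle and a phase."""
    c, s = math.cos(theta), math.sin(theta)
    e = cmath.exp(1j * phi)
    return [[c, -s * e.conjugate()], [s * e, c]]


def eigvals_2x2_hermitian(R):
    """Eigenvalues of a 2x2 Hermitian matrix from its trace and determinant."""
    tr = (R[0][0] + R[1][1]).real
    det = (R[0][0] * R[1][1] - R[0][1] * R[1][0]).real
    disc = math.sqrt(max(tr * tr / 4.0 - det, 0.0))
    return (tr / 2.0 - disc, tr / 2.0 + disc)


rho = [[0.7, 0.2], [0.2, 0.3]]      # mixed state: trace 1, positive eigenvalues
U = unitary(0.4, 1.1)

spectrum_before = eigvals_2x2_hermitian(rho)
state = rho
for _ in range(5):                  # five-step blind rollout
    state = matmul(matmul(U, state), dagger(U))
spectrum_after = eigvals_2x2_hermitian(state)
```

A generic learned vector-to-vector map has no such invariant, which is one way to read the contrast with the vector-latent baselines.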

2605.25305 2026-05-26 cs.LG 版本更新

Electricity Consumption Forecasting: An Approach Using Cooperative Ensemble Learning with SHapley Additive exPlanations

电力消耗预测：一种使用SHapley加法解释的协作集成学习方法

Eduardo Luiz Alba, Gilson Adamczuk Oliveira, Matheus Henrique Dal Molin Ribeiro, Érick Oliveira Rodrigues

发表机构 * Industrial & Systems Engineering Graduate Program (PPGEPS), Federal University of Technology-Parana (UTFPR)（巴拉那联邦理工大学（UTFPR）工业与系统工程研究生项目（PPGEPS））

AI总结提出一种名为弱分离器增强器（WSB）的协作集成学习方法，结合LSTM、RF、SVR和XGBoost模型，利用SHAP进行特征选择，并用遗传算法和粒子群优化调整超参数，对巴拉那联邦学院（IFPR）两个校区未来12个月的电力消耗进行预测，取得较低误差。

详情

DOI: 10.3390/forecast6030042
Journal ref: Forecasting 2024

AI中文摘要

电力费用管理面临重大挑战，因为该资源易受多种影响因素影响。在大学中，随着机构扩张，对该资源的需求迅速增长，并对环境产生显著影响。本研究使用长短期记忆（LSTM）、随机森林（RF）、支持向量回归（SVR）和极端梯度提升（XGBoost）机器学习模型，基于巴拉那联邦学院（IFPR）过去七年的历史消费数据和气候变量，训练模型以预测未来12个月的电力消耗。采用了两个校区的数据集。为了提高模型性能，使用Shapley加法解释（SHAP）进行特征选择，并使用遗传算法（GA）和粒子群优化（PSO）进行超参数优化。结果表明，所提出的名为弱分离器增强器（WSB）的协作集成学习方法在数据集上表现最佳。具体而言，对于IFPR-Palmas校区，其sMAPE为13.90%，MAE为1990.87 kWh；对于Coronel Vivida校区，sMAPE为18.72%，MAE为465.02 kWh。SHAP分析揭示了两个IFPR校区不同的特征重要性模式。一个共同点是滞后时间序列值的强烈影响和气候变量的最小影响。

英文摘要

Electricity expense management presents significant challenges, as this resource is susceptible to various influencing factors. In universities, the demand for this resource is rapidly growing with institutional expansion and has a significant environmental impact. In this study, the machine learning models long short-term memory (LSTM), random forest (RF), support vector regression (SVR), and extreme gradient boosting (XGBoost) were trained with historical consumption data from the Federal Institute of Paraná (IFPR) over the last seven years and climatic variables to forecast electricity consumption 12 months ahead. Datasets from two campuses were adopted. To improve model performance, feature selection was performed using Shapley additive explanations (SHAP), and hyperparameter optimization was carried out using genetic algorithm (GA) and particle swarm optimization (PSO). The results indicate that the proposed cooperative ensemble learning approach, named Weaker Separator Booster (WSB), achieved the best performance on the datasets. Specifically, it achieved an sMAPE of 13.90% and MAE of 1990.87 kWh for the IFPR-Palmas Campus and an sMAPE of 18.72% and MAE of 465.02 kWh for the Coronel Vivida Campus. The SHAP analysis revealed distinct feature importance patterns across the two IFPR campuses. A commonality that emerged was the strong influence of lagged time-series values and a minimal influence of climatic variables.
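
The two error metrics reported above can be computed as in the following sketch. Several sMAPE variants exist; this one assumes the common form that averages $2|y - \hat{y}| / (|y| + |\hat{y}|)$ and expresses it as a percentage, which may differ from the exact definition used in the paper. The consumption figures are made up. The two metrics answer different questions: sMAPE is scale-free, while MAE is in kWh, which helps explain why the campus with the larger MAE has the smaller sMAPE.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent (assumed variant)."""
    terms = [2.0 * abs(a - f) / (abs(a) + abs(f)) for a, f in zip(actual, forecast)]
    return 100.0 * sum(terms) / len(terms)


def mae(actual, forecast):
    """Mean absolute error, in the units of the series (e.g. kWh)."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Made-up monthly consumption (kWh) and forecasts.
actual_kwh = [100.0, 200.0]
forecast_kwh = [110.0, 180.0]
error_pct = smape(actual_kwh, forecast_kwh)
error_kwh = mae(actual_kwh, forecast_kwh)  # (10 + 20) / 2 = 15.0
```

Note that this sMAPE form is undefined when an actual value and its forecast are both zero.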

2605.25304 2026-05-26 cs.LG cs.CR cs.CV 版本更新

When Interpretability Becomes a Liability: Adversarial Attacks on CBM Concept Layers

当可解释性成为负担：针对CBM概念层的对抗攻击

Aditya Sridhar

发表机构 * Independent Researcher（独立研究者）

AI总结本文系统研究了概念瓶颈模型（CBM）中概念层的对抗性脆弱性，提出了一种基于语义扰动的稳定性正则化防御方法SPECTRA，显著提高了攻击所需的最小扰动范数，同时保持了分类精度。

Comments Accepted to CVPR 2026 (Findings). 9 pages, 6 figures

详情

AI中文摘要

概念瓶颈模型（CBM）已成为可解释机器学习的基础方法，通过显式的概念激活提供人类可理解的中间表示。然而，这种可解释性从根本上引入了一个关键且先前未被探索的攻击面：概念瓶颈层本身。我们提出了对CBM中概念级对抗性脆弱性的全面、系统性研究，揭示了对输入像素进行有针对性的最小扰动可以通过操纵语义表示导致灾难性的错误分类。我们开发了一个严格的理论框架来量化概念空间的鲁棒性，建立了揭示这些架构脆弱性景观的新指标。我们在CUB-200-2011数据集上的广泛分析表明，标准CBM对概念级操纵表现出严重的敏感性。为了解决这一关键弱点，我们引入了SPECTRA（基于语义扰动的概念训练以增强对抗鲁棒性），一种原则性的稳定性正则化防御。SPECTRA有效地强化了语义表示空间，将成功攻击所需的最小扰动范数从0.46提高到超过4,200，使得有针对性的概念操纵在计算上变得不可行。此外，SPECTRA将基线分类精度保持在2.2%以内。通过将概念级攻击确立为一种根本不同的威胁模型，这项工作在可解释机器学习与对抗鲁棒性的交叉领域开辟了一个新的研究前沿。

英文摘要

Concept Bottleneck Models (CBMs) have emerged as a cornerstone approach for interpretable machine learning, providing human-understandable intermediate representations through explicit concept activations. However, this interpretability fundamentally introduces a critical, previously unexplored attack surface: the concept bottleneck layer itself. We present a comprehensive, systematic study of concept-level adversarial vulnerabilities in CBMs, revealing that targeted, minimal perturbations operating on input pixels can induce catastrophic misclassification by manipulating semantic representations. We develop a rigorous theoretical framework to quantify concept-space robustness, establishing novel metrics that expose the vulnerability landscape of these architectures. Our extensive analysis on the CUB-200-2011 dataset demonstrates that standard CBMs exhibit severe susceptibility to concept-level manipulation. To address this critical weakness, we introduce SPECTRA (Semantic Perturbation-based Concept Training for Robustness against Attacks), a principled stability regularization defense. SPECTRA effectively hardens the semantic representation space, increasing the minimal perturbation norm required for a successful attack from 0.46 to over 4,200, rendering targeted concept manipulation computationally prohibitive. Furthermore, SPECTRA preserves baseline classification accuracy to within 2.2%. By establishing concept-level attacks as a fundamentally distinct threat model, this work opens a new research frontier at the intersection of interpretable machine learning and adversarial robustness.

2605.25290 2026-05-26 stat.ML cs.LG 版本更新

Choosing Online Experiment Designs under Interference in Ads, Recommendations, and Member-Experience Systems

广告、推荐和会员体验系统中存在干扰时的在线实验设计选择

Prashant Shekhar, Caroline Howard

发表机构 * Department of Mathematics（数学系）； Embry-Riddle Aeronautical University（安柏瑞德航空大学）

AI总结针对广告、推荐和会员体验系统中干扰机制未知的问题，提出一种基于鲁棒设计选择的框架，通过最坏情况规划风险比较六种可实施设计，并给出几何感知保证和有限目录近似定理。

详情

AI中文摘要

广告、推荐和会员体验系统中的在线实验通常是在主导干扰机制已知之前规划的。处理效应可能通过预算、库存、生产者曝光、图溢出或时间结转传播，使得随机化设计本身成为一个统计决策。我们将此问题形式化为在不确定曝光机制下的鲁棒设计选择。给定一个包含六种可实施设计的有限目录，选择器通过模糊集上的最坏情况规划风险比较每种设计。风险结合了曝光偏差、分配单元方差、最小可检测效应、污染或结转、操作成本和估计目标（estimand）不匹配。在理论依据方面，本文开发了一种几何感知保证，指出设计偏差受限于到发布曝光分布的Wasserstein距离，并且该惩罚在Lipschitz曝光响应下是极小极大紧的。我们还证明了有限目录近似和具有超额风险控制的鲁棒选择器定理、在分离条件下的精确恢复，以及当风险曲面平坦时的认证候选列表。实证上，同一选择器在来自公共数据集的样本上给出不同的推荐。它在Criteo广告上选择用户随机化，无量纲鲁棒风险为1.295；在Open Bandit-bts/men上选择切换设计，风险为2.105；在KuaiRand上选择聚类随机化，风险为2.240。Open Bandit案例强调了已知但不均匀的日志记录支持，倾向性从0.00006到0.594，IPS有效样本份额为5.17%。总体而言，本文贡献了一个基于机制鲁棒设计决策的干扰感知实验设计框架，输出要么是合理的设计选择，要么是不确定性候选列表。

英文摘要

Online experiments in ads, recommendation, and member-experience systems are often planned before the dominant interference mechanism is known. A treatment may propagate through budgets, inventory, producer exposure, graph spillovers, or temporal carryover, making the randomization design itself a statistical decision. We formulate this problem as robust design selection over uncertain exposure mechanisms. Given a finite catalog of six implementable designs, the selector compares each design by worst-case planning risk over an ambiguity set. The risk combines exposure bias, assignment-unit variance, minimum detectable effect, contamination or carryover, operational cost, and estimand mismatch. For theoretical justification, the paper develops a geometry-aware guarantee, stating that design bias is bounded by Wasserstein distance to the launch exposure distribution, and this penalty is minimax tight under Lipschitz exposure response. We also prove finite-catalog approximation and a robust selector theorem with excess-risk control, exact recovery under separation, and certified shortlists when the risk surface is flat. Empirically, the same selector gives different recommendations across samples from public datasets. It selects user-randomization on Criteo ads with dimensionless robust risk 1.295, switchbacks on Open Bandit-bts/men with risk 2.105, and cluster-randomization on KuaiRand with risk 2.240. The Open Bandit case stresses known but uneven logging support, with propensities from 0.00006 to 0.594 and a 5.17% IPS effective-sample share. Overall, the paper contributes an interference-aware experiment design framework based on mechanism-robust design decisions, where the output is either a justified design choice or an uncertainty shortlist.
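
The selection rule described above (compare each design by its worst-case risk over a set of candidate mechanisms, and fall back to a shortlist when the leaders are close) can be sketched as a minimax lookup. The risk table, mechanism names, and `tolerance` threshold below are hypothetical placeholders, not values from the paper, and the sketch leaves out how the composite risk itself is computed.

```python
def robust_select(risk, tolerance=0.0):
    """Pick the design with the smallest worst-case risk across mechanisms.

    risk: {design: {mechanism: planning_risk}}
    Returns (best_design, worst_case_by_design, shortlist), where the shortlist
    holds every design whose worst-case risk is within `tolerance` of the best.
    """
    worst = {design: max(by_mech.values()) for design, by_mech in risk.items()}
    best = min(worst, key=worst.get)
    shortlist = sorted(d for d, r in worst.items() if r - worst[best] <= tolerance)
    return best, worst, shortlist


# Hypothetical planning risks for three designs under two interference mechanisms.
risk_table = {
    "user_randomization": {"budget": 1.3, "spillover": 2.4},
    "cluster_randomization": {"budget": 1.8, "spillover": 1.9},
    "switchback": {"budget": 2.0, "spillover": 2.1},
}

best, worst, shortlist = robust_select(risk_table, tolerance=0.25)
```

In this made-up table, user randomization has the lowest risk under one mechanism but the highest worst case, so the rule prefers a design that is never the best yet never badly wrong.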

2605.25275 2026-05-26 cs.LG 版本更新

Label-NTK Alignments and A Tighter Convergence Bound in the NTK Regime

标签-NTK 对齐与 NTK 机制（NTK regime）下更紧的收敛界

Ruchirinkil Marreddy, Chaoyue Liu

发表机构 * Elmore Family School of Electrical and Computer Engineering（埃尔莫尔家族电气与计算机工程学院）

AI总结通过标签与NTK特征谱的对齐特性，提出更紧的收敛界，显著改进经典最坏情况结果。

详情

AI中文摘要

神经正切核（NTK）框架通过近似线性化动力学解释过参数化神经网络的优化，提供指数收敛保证。然而，现有结果往往过于悲观，与实际快速训练不符，因为它们依赖于最小的NTK特征值，而该特征值在实践中通常极小。在这项工作中，我们通过刻画数据标签与NTK特征谱之间的相互作用，开发了更精确的收敛保证。我们识别出两个关键现象：标签-NTK对齐和残差-NTK对齐，表明标签和残差在NTK特征向量上的投影与对应特征值成比例。我们在温和的数据假设下提供了经验证据和理论证明。利用这些对齐性质，我们推导出一个依赖于完整谱的精细收敛界，该界紧密匹配实际训练动态，显著优于经典最坏情况结果。我们进一步获得了改进的泛化界。在多个数据集上的MLP和CNN实验验证了我们的理论。

英文摘要

The Neural Tangent Kernel (NTK) framework explains optimization in over-parameterized neural networks via approximately linearized dynamics, yielding exponential convergence guarantees. However, existing results are often overly pessimistic and do not match the fast training observed in practice, because they depend on the smallest NTK eigenvalue, which is typically extremely small. In this work, we develop sharper convergence guarantees by characterizing the interaction between data labels and the NTK eigen-spectrum. We identify two key phenomena, Label-NTK alignment and Residual-NTK alignment, showing that projections of labels and residuals onto NTK eigenvectors scale with the corresponding eigenvalues. We provide empirical evidence and theoretical justification under mild data assumptions. Exploiting these alignment properties, we derive a refined convergence bound that depends on the full spectrum and closely matches practical training dynamics, significantly improving over classical worst-case results. We further obtain improved generalization bounds. Experiments on MLPs and CNNs across multiple datasets validate our theory.
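
Why alignment matters can be seen in the idealized linearized setting. For gradient descent on squared loss with a fixed kernel and zero initial output, the residual obeys $r_{t+1} = (I - \eta K) r_t$, so its energy after $t$ steps is $\sum_i (1 - \eta\lambda_i)^{2t} (u_i^\top y)^2$, whereas the classical bound replaces every factor by the slowest one, $(1 - \eta\lambda_{\min})^{2t}\|y\|^2$. The sketch below compares the two on a made-up spectrum whose label energy scales with the eigenvalues. It illustrates the standard linear dynamics only and is not the refined bound derived in the paper.

```python
def residual_energy(eigvals, label_energy, lr, steps):
    """Exact squared residual norm under linearized kernel gradient descent.

    label_energy[i] is the squared projection of the labels onto eigenvector i.
    Assumes lr * max(eigvals) <= 1 so every mode contracts.
    """
    return sum(e * (1.0 - lr * lam) ** (2 * steps)
               for lam, e in zip(eigvals, label_energy))


def worst_case_bound(eigvals, label_energy, lr, steps):
    """Classical bound that only uses the smallest eigenvalue."""
    return (1.0 - lr * min(eigvals)) ** (2 * steps) * sum(label_energy)


# Made-up spectrum with label energy proportional to the eigenvalues.
eigvals = [1.0, 0.1, 0.001]
aligned_energy = [1.0, 0.1, 0.001]

exact = residual_energy(eigvals, aligned_energy, lr=1.0, steps=50)
bound = worst_case_bound(eigvals, aligned_energy, lr=1.0, steps=50)
```

With aligned labels, almost all of the energy sits in fast-decaying modes, so the exact residual is orders of magnitude below the worst-case bound. If the labels were concentrated on the smallest eigenvalue instead, the two quantities would coincide.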

2605.25267 2026-05-26 cs.LG cs.AI 版本更新

Latent Q-Barrier Shielding for Safe In-Context Reinforcement Learning

用于安全上下文强化学习的潜在Q-屏障屏蔽

Minjae Kwon, Amir Moeini, Shangtong Zhang, Lu Feng

发表机构 * University of Virginia（弗吉尼亚大学）

AI总结提出一种潜在Q-屏障屏蔽方法，通过学习上下文表示、潜在动力学和集成成本评论家，在部署时无需参数更新即可根据剩余预算和预测未来成本过滤或软重加权候选动作，从而改善安全上下文强化学习在分布外转移下的奖励-安全权衡。

详情

AI中文摘要

安全上下文强化学习（ICRL）在测试时不更新参数，仅从交互历史中在线适应，同时将情节成本控制在安全预算内。在分布外（OOD）部署转移下，仅预训练的安全ICRL可能产生较差的奖励-安全权衡，因为剩余预算仅通过冻结的策略条件影响行为，而非通过针对预测未来成本的显式动作级检查。我们提出一种潜在Q-屏障屏蔽，在部署前学习上下文表示、潜在动力学和集成成本评论家。无需参数更新，该屏蔽从历史中推断上下文，并使用剩余预算和预测未来成本过滤或软重加权候选动作。我们证明了一个条件性的、误差分解的屏障-边际结果：满足Q-屏障的动作将下一个潜在预算状态置于近似预算安全的延续中（在学习的评论家下），误差上界由贝尔曼误差和潜在预测误差决定。在五个安全ICRL基准测试中，该屏蔽在部署时相比强安全ICRL基线改善了奖励-安全权衡：在短上下文窗口后，它在五个基准中的四个上实现了更高的回报，同时在所有五个基准中匹配或降低了平均情节成本。

英文摘要

Safe in-context reinforcement learning (ICRL) adapts online from interaction history without test-time parameter updates while controlling episode cost under a safety budget. Under out-of-distribution (OOD) deployment shifts, pretraining-only safe ICRL can give poor reward-safety tradeoffs because the remaining budget affects behavior only through frozen policy conditioning, not an explicit action-level check against predicted future cost. We propose a latent Q-Barrier shield that learns a context representation, latent dynamics, and an ensemble cost critic before deployment. Without parameter updates, the shield infers context from history and filters or softly reweights candidate actions using the remaining budget and predicted future cost. We prove a conditional, error-decomposed barrier-margin result: a Q-Barrier-satisfying action leaves the next latent-budget state with an approximately budget-safe continuation under the learned critic, up to Bellman and latent-prediction errors. Across five safe ICRL benchmarks, the shield improves deployment-time reward-safety tradeoffs over a strong safe-ICRL baseline: after a short context window, it achieves higher return in four of five benchmarks while matching or lowering average episode cost in all five.

2605.25258 2026-05-26 cs.IR cs.AI cs.CY cs.LG 版本更新

First, do no harm: Breaking suicidogenic echo chambers in media recommendation

首先，不伤害：打破媒体推荐中的自杀性回音室

Alberto Díaz-Álvarez, Raúl Lara-Cabrera, Fernando Ortega-Requena, Víctor Ramos-Osuna

发表机构 * E.T.S.I. Sistemas Informáticos (Universidad Politécnica de Madrid)（马德里理工大学信息系统工程系）

AI总结针对推荐系统在心理健康场景中可能加剧用户自杀倾向的问题，提出RankAid重排序方法，通过惩罚有害内容并提升治疗性内容，在保持推荐准确性的同时确保临床安全。

Comments 10 pages, 5 figures. Research on safety-aware recommender systems and algorithmic ethics

详情

AI中文摘要

推荐系统通常优化用户参与度，但在心理健康背景下这种方法存在危险。当脆弱用户表现出自杀意念迹象时，标准算法往往将他们困在有害内容的回音室中，恶化其心理状态。为此，我们引入RankAid，一种重排序方法，在预测相关性的同时优先考虑临床安全性。它作为现有模型的附加层运行：根据用户当前的脆弱程度惩罚风险项目并提升治疗性内容。我们使用MovieLens 1M数据集评估了该方法，其中项目通过大语言模型进行了临床风险和治疗价值的语义注释。我们的模拟表明，该算法在危机高峰期成功阻止了有害内容的推荐，主动重塑信息流以支持情绪降级。此外，这种安全干预仅导致标准准确性指标（如NDCG）可控且可接受的下降。通过使用非对称超参数，RankAid还使系统管理员能够根据特定的临床指南调整干预的严重程度。

英文摘要

Recommender systems generally optimise user engagement, but this approach is dangerous in mental health contexts. When vulnerable users show signs of suicidal ideation, standard algorithms often trap them in echo chambers of harmful content, worsening their psychological state. In response, we introduce RankAid, a re-ranking method that prioritises clinical safety alongside predictive relevance. It works as an add-on layer to existing models: it penalises risky items and boosts therapeutic content depending on the user's current level of vulnerability. We evaluated this approach using the MovieLens 1M dataset, where items were semantically annotated for clinical risk and therapeutic value using large language models. Our simulations show that our algorithm successfully blocks the recommendation of harmful content during crisis peaks, actively reshaping the feed to support emotional de-escalation. Furthermore, this safety intervention only causes a controlled, acceptable drop in standard accuracy metrics like NDCG. By using asymmetric hyperparameters, RankAid also gives system administrators the flexibility to tune the severity of the intervention based on specific clinical guidelines.
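
The add-on re-ranking idea in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the additive scoring rule, the `[0, 1]` annotation scales, and the names `Item`, `rerank`, `risk_weight` and `boost_weight` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    relevance: float    # score from the base recommender (higher is better)
    risk: float         # clinical-risk annotation, assumed to lie in [0, 1]
    therapeutic: float  # therapeutic-value annotation, assumed to lie in [0, 1]


def rerank(items, vulnerability, risk_weight=2.0, boost_weight=0.5):
    """Re-rank items with a hypothetical additive safety adjustment.

    `vulnerability` is assumed to lie in [0, 1]; at 0 the base ranking is
    returned unchanged. The two weights are deliberately separate so that
    the penalty and the boost can be tuned asymmetrically.
    """
    def adjusted(item):
        return (item.relevance
                - vulnerability * risk_weight * item.risk
                + vulnerability * boost_weight * item.therapeutic)

    return sorted(items, key=adjusted, reverse=True)


catalogue = [
    Item("risky", relevance=0.9, risk=0.8, therapeutic=0.0),
    Item("supportive", relevance=0.6, risk=0.0, therapeutic=0.7),
]
calm_order = [i.item_id for i in rerank(catalogue, vulnerability=0.0)]
crisis_order = [i.item_id for i in rerank(catalogue, vulnerability=1.0)]
```

With no vulnerability signal the base relevance order is kept; at full vulnerability the risky item drops below the supportive one.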

2605.23650 2026-05-26 stat.ML cs.LG 版本更新

凸组合推理模型

Meir Roketlishvili, Semyon Semenov, Maksim Bobrin, Viktor Kovalchuk, Albert Baichorov, Abduragim Shtanchaev, Fakhri Karray, Dmitry V. Dylov, Martin Takáč, Arip Asadulaev

发表机构 * Mohamed bin Zayed University of Artificial Intelligence（穆罕默德·本·扎耶德人工智能大学）； Applied AI Institute（应用人工智能研究所）； Computational Imaging Lab（计算成像实验室）

AI总结针对组合推理中能量景观的非凸几何瓶颈，提出凸组合能量最小化框架，通过输入凸神经网络参数化因子并优化紧凸松弛，实现确定性投影一阶优化，在小问题上训练后可零样本迁移到大实例。

详情

AI中文摘要

组合能量模型可以通过在许多局部约束中重用学习到的因子能量，泛化到更大的组合推理问题。在本文中，我们表明组合推理的一个关键瓶颈不是组合本身，而是学习到的能量景观的非凸几何。为了解决这个问题，我们引入了凸组合能量最小化（CCEM），这是一个用输入凸神经网络参数化每个因子，并在可行集的紧凸松弛上优化组合能量的框架。由于凸性在求和下保持不变，全局松弛目标保持凸性，从而能够进行确定性投影一阶优化。CCEM分两个阶段训练：因子级对比学习以塑造局部能量盆地，然后通过展开的投影求解器进行端到端细化。我们的实验表明，在小子问题或单个问题规模上训练的模型可以无需重新训练地迁移到更大的实例。

英文摘要

Compositional energy-based models can generalize to larger combinatorial reasoning problems by reusing a learned factor energy across many local constraints. In our paper, we show that a key bottleneck in compositional reasoning is not composition itself, but the non-convex geometry of the learned energy landscape. To solve this problem, we introduce Convex Compositional Energy Minimization (CCEM), a framework that parameterizes each factor with an input-convex neural network and optimizes the composed energy over a tight convex relaxation of the feasible set. Because convexity is preserved under summation, the global relaxed objective remains convex, enabling deterministic projected first-order optimization. CCEM is trained in two stages: factor-level contrastive learning to shape local energy basins, followed by end-to-end refinement through an unrolled projected solver. Our experiments show that our models trained on small subproblems or a single problem size transfer to larger instances without retraining.
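
The claim that a sum of convex factors stays convex, so that a projected first-order method suffices, can be illustrated on a toy problem. This is a sketch under assumptions: the two quadratic factors stand in for learned input-convex networks, the unit box stands in for the convex relaxation, and the function names are hypothetical.

```python
def project_to_box(x, lo=0.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^n (the assumed relaxation)."""
    return [min(hi, max(lo, v)) for v in x]


def projected_gradient_descent(grad, x0, step=0.1, iters=200):
    """Deterministic projected gradient descent on a convex objective."""
    x = project_to_box(x0)
    for _ in range(iters):
        g = grad(x)
        x = project_to_box([xi - step * gi for xi, gi in zip(x, g)])
    return x


def toy_grad(x):
    """Gradient of E(x) = (x0 - 2)^2 + (x0 + x1 - 1)^2, a sum of two convex factors."""
    a = 2.0 * (x[0] - 2.0)         # derivative of the first factor w.r.t. x0
    b = 2.0 * (x[0] + x[1] - 1.0)  # derivative of the second factor w.r.t. x0 and x1
    return [a + b, b]


solution = projected_gradient_descent(toy_grad, [0.5, 0.5])
```

On this toy energy the iterates settle at the box-constrained minimiser `(1, 0)`; because the objective is convex, the starting point does not change where the method ends up.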

2605.22894 2026-05-26 cs.GR cs.LG cs.RO 版本更新

SCRIPT: Scalable Diffusion Policy with Multi-stage Training for Language-driven Physics-based Humanoid Control

SCRIPT: 面向语言驱动的物理仿真人体控制的可扩展扩散策略与多阶段训练

Jingyan Zhang, Han Liang, Ruichi Zhang, Bin Li, Juze Zhang, Xin Chen, Jingya Wang, Lan Xu, Jingyi Yu

发表机构 * ShanghaiTech University（上海科技大学）； University of Pennsylvania（宾夕法尼亚大学）； Stanford University（斯坦福大学）

AI总结提出SCRIPT框架，通过联合动作-状态-文本扩散Transformer和多阶段训练（监督模仿预训练+混合奖励强化学习后训练），实现语言指令驱动的物理仿真人体控制，在文本对齐、运动质量和物理真实性上超越现有方法。

Comments Project page: https://zhanglele12138.github.io/SCRIPT/

详情

AI中文摘要

从自然语言指令控制物理仿真人体是迈向通用具身智能体的关键一步。然而，现有方法仍受限于语义表达能力和物理可行性之间的张力，往往难以同时实现忠实的指令跟随、高质量的运动和稳定的长时程控制。我们提出SCRIPT，一种具有多阶段训练框架的可扩展扩散策略，用于语言驱动的物理仿真人体控制。SCRIPT的核心是联合动作-状态-文本扩散Transformer（JAST-DiT），它将动作、物理状态和文本表示为专门的令牌流，并通过联合注意力将它们耦合，使语言语义和控制动态之间能够直接交互。为了稳定自回归控制，我们引入了一种非线性历史条件机制，该机制保留密集的近期上下文，并从长期历史中采样越来越稀疏的线索。除了监督模仿预训练外，我们提出了一个后训练阶段，使用混合奖励强化学习（RLHR）进一步提高性能。通过将可学习噪声注入流采样过程，RLHR利用混合物理反馈和文本奖励在闭环模拟中有效改善运动质量和指令跟随。定量评估表明，SCRIPT在文本对齐、运动质量和物理真实性指标上均优于先前的最先进方法。此外，在1200小时的MotionMillion数据集上的扩展研究显示，随着模型规模的扩大，性能持续提升，突显了SCRIPT在大规模预训练中的稳健可扩展性。我们的代码将公开供未来研究使用。

英文摘要

Controlling physics-based humanoids from natural-language instructions is a critical step toward general-purpose embodied agents. However, existing methods remain constrained by a tension between semantic expressiveness and physical feasibility, often failing to jointly achieve faithful instruction following, high-quality motion, and stable long-horizon control. We propose SCRIPT, a scalable diffusion policy with a multi-stage training framework for language-driven physics-based humanoid control. The core of SCRIPT is a Joint Action-State-Text Diffusion Transformer (JAST-DiT), which represents actions, physical states, and text as dedicated token streams and couples them through joint attention, enabling direct interaction between language semantics and control dynamics. To stabilize autoregressive control, we introduce a nonlinear history conditioning mechanism, which preserves the dense recent context and samples increasingly sparse cues from long-term history. Beyond supervised imitation pre-training, we propose a post-training stage, further improving the performance using Reinforcement Learning with Hybrid Rewards (RLHR). By injecting learnable noise into the flow-sampling process, RLHR effectively improves motion quality and instruction following within closed-loop simulations using hybrid physical feedback and text rewards. Quantitative evaluations demonstrate that SCRIPT outperforms prior state-of-the-art methods, with gains across text alignment, motion quality, and physical realism metrics. Furthermore, scaling studies on the 1200-hour MotionMillion dataset demonstrate consistent performance gains with model scaling, highlighting SCRIPT's robust scalability for large-scale pre-training. Our code will be publicly available for future research.

2605.22892 2026-05-26 q-fin.RM cs.LG 版本更新

Is TabPFN the Silver Bullet for Insurance Pricing?

TabPFN 是保险定价的银弹吗？

Bruno Deprez, Wouter Verbeke, Tim Verdonck

发表机构 * KU Leuven（鲁汶大学）； University of Antwerp-imec（安特卫普大学-imec）

AI总结本文首次实证评估 TabPFN 在车险定价中的表现，与 GLM 和 XGBoost 对比，发现其性能不稳定、推理时间长且对上下文训练集大小敏感，目前无法替代传统精算方法。

详情

AI中文摘要

非寿险定价中的索赔频率和严重性建模主要依赖广义线性模型，梯度提升机是领先的机器学习替代方案。表格基础模型（TFM）提出了一种根本不同的推理范式。通过在大量合成数据集上预训练，TFM 能够通过上下文学习对新数据进行推理，无需针对特定数据集进行拟合或超参数调优。本文首次对 TabPFN 在车险定价中进行实证评估，在两个公开的 MTPL 数据集上将其与 GLM 和 XGBoost 进行基准测试。我们的结果表明，TabPFN 并未持续优于已建立的基线，推理时间显著更长，并且对上下文训练集的大小敏感。虽然表格基础模型代表了有前景的方向，特别是在数据稀缺的情况下，但其当前性能无法为已建立的精算方法提供可行的替代方案。

英文摘要

Modelling claim frequency and severity for non-life insurance pricing predominantly relies on generalised linear models, with gradient-boosted machines as the leading machine learning alternative. Tabular foundation models (TFMs) present a fundamentally different inference paradigm. By pre-training on large collections of synthetic datasets, TFMs enable inference on new data through in-context learning, without any dataset-specific fitting or hyperparameter tuning. This paper presents a first empirical evaluation of TabPFN for motor insurance pricing, benchmarking it against GLM and XGBoost on two publicly available MTPL datasets. Our results show that TabPFN does not consistently outperform established baselines, exhibits substantially longer inference times, and is sensitive to the size of the in-context training set. While tabular foundation models represent a promising direction, particularly in data-scarce settings, their current performance does not offer a viable replacement for established actuarial methods.

2605.22856 2026-05-26 eess.SP cs.AI cs.IT cs.LG cs.NI math.IT 版本更新

PilotWiMAE: Pilot-Native Representation Learning for Wireless Channels

PilotWiMAE：面向无线信道的导频原生表示学习

Berkay Guler, Giovanni Geraci, Hamid Jafarkhani

发表机构 * Center for Pervasive Communications and Computing, University of California, Irvine（加州大学尔湾分校普及通信与计算中心）； Nokia and Universitat Pompeu Fabra（诺基亚与庞培法布拉大学）

AI总结提出PilotWiMAE自监督框架，直接处理噪声导频观测，通过分解注意力机制和补丁归一化重构，在缩小观测空间的同时实现跨频段波束选择和信道表征，优于监督基线。

详情

AI中文摘要

信道基础模型假设能够访问完全观测的信道，这一假设在部署中不成立。我们提出PilotWiMAE，一种自监督框架，其编码器直接接收噪声导频观测，注意力沿时间与联合空频处理轴分解，这是受问题物理特性启发的归纳偏置。导频输入将观测空间最多缩小两个数量级，并消除了全CSI可用性的不现实假设，同时降低延迟。分解设计通过利用可分离的信道结构生成鲁棒表示，并允许预训练掩码率达到99%。我们将捕获小尺度衰落结构的补丁归一化重构与恢复大尺度衰落特征的辅助尺度损失相结合，并使用AWGN课程学习来匹配预训练和部署时的导频噪声。仅在3.5 GHz上预训练，在28 GHz上评估，涵盖分布内和分布外场景，PilotWiMAE的跨频段波束选择和信道表征在更小的观测空间上仍优于监督基线。为削弱解码器容量与表示质量之间的耦合，我们进一步提出在编码器-解码器联合预训练之后进行以解码器为中心的预训练阶段，使得PilotWiMAE在不牺牲表示质量的情况下展现出有竞争力的信道估计性能。为促进该方向的进一步研究，我们发布了PilotWiMAE预训练权重和训练流程，以及基于Sionna的射线追踪信道生成工具CSIGen和本文使用的信道数据集。

英文摘要

Channel foundation models assume access to fully observed channels, an assumption that fails in deployment. We introduce PilotWiMAE, a self-supervised framework whose encoder ingests noisy pilot observations directly and whose attention factorizes along the axis separating temporal from joint space-frequency processing, an inductive bias inspired by the physics of the problem. Pilot input shrinks the observation space by up to two orders of magnitude and also removes the unrealistic assumption of full-CSI availability while incurring lower latency. The factorized design generates robust representations by exploiting the separable channel structure and allows a pretraining mask ratio of 99%. We pair patch-normalized reconstruction, which captures small-scale fading structure, with an auxiliary scale loss that recovers the large-scale fading features, and use an AWGN curriculum to match pilot noise at pretraining and deployment. Pretrained solely on 3.5 GHz and evaluated at 28 GHz across in-distribution and out-of-distribution settings, PilotWiMAE's cross-frequency beam selection and channel characterization beat supervised baselines despite operating on a smaller observation space. To weaken the coupling between decoder capacity and representation quality, we further propose a decoder-centric pretraining stage following the encoder-decoder joint pretraining, which allows PilotWiMAE to demonstrate competitive channel estimation without sacrificing representation quality. To foster further work in this direction, we release the PilotWiMAE pretrained weights and training pipeline, together with CSIGen, our Sionna-based ray-tracing channel-generation tool, and the channel datasets used in this work.

2605.22795 2026-05-26 stat.ML cs.AI cs.LG math.ST stat.TH 版本更新

通过全循环Transformer简单稳定循环

Rao Fu, Zixuan Yang, Jiankun Zhang, Jing Ma, Hechang Chen, Yu Li, Yi Chang

发表机构 * Hong Kong Baptist University（香港浸会大学）； Jilin University（吉林大学）

AI总结针对循环Transformer在迭代次数增加时出现的训练不稳定性，提出全循环Transformer，通过全循环架构和注意力注入两种无参数修改，稳定训练至12次循环，下游任务性能提升最高13.2%。

详情

AI中文摘要

扩展模型性能通常需要增加模型大小。循环Transformer通过迭代重用相同的Transformer块提供了一种引人注目的替代方案，用额外的计算换取性能提升，而不增加参数数量或上下文长度。由于推理时可以调整循环迭代次数，它还提供了一种平衡性能和测试时计算的自然机制。然而，当循环迭代次数增加时，循环Transformer仍然存在训练不稳定性。我们的分析表明，这种不稳定性源于两个来源：梯度振荡和残差爆炸。为了解决这两个问题，我们提出了全循环Transformer，它引入了两种无参数修改：（1）全循环架构，将循环间信号分布到所有层以缓解残差爆炸；（2）注意力注入，重用现有的注意力块以抑制梯度振荡。这些修改稳定了训练动态，使得全循环Transformer能够稳定训练多达12次循环迭代，而其他基线循环模型在这种情况下会崩溃。在循环Transformer不会崩溃的较温和设置中，全循环Transformer仍然将平均下游任务性能提升了高达13.2%。总体而言，我们的实验表明，全循环Transformer提高了训练稳定性，增强了下游性能，并通过在推理时改变循环迭代次数，提供了在不同测试时计算预算下的初步适应性。

英文摘要

Scaling model performance typically requires increasing model size. Looped Transformer offers a compelling alternative by iteratively reusing the same Transformer blocks, trading additional computation for improved performance without increasing parameter count or context length. Because the number of loop iterations can be adjusted at inference, it also provides a natural mechanism for balancing performance and test-time compute. However, Looped Transformer still suffers from training instability when the number of loop iterations increases. Our analysis reveals that this instability stems from two sources: gradient oscillation and residual explosion. To address these two problems, we propose the Fully Looped Transformer, which introduces two parameter-free modifications: (1) Fully Looped Architecture, which distributes inter-loop signals across all layers to mitigate residual explosion; (2) Attention Injection, which reuses the existing attention block to suppress gradient oscillation. These modifications stabilize training dynamics, enabling the Fully Looped Transformer to be trained stably up to 12 loop iterations, whereas other baseline looped models collapse in this regime. In milder settings where Looped Transformer does not collapse, Fully Looped Transformer still improves average downstream-task performance by up to 13.2%. Overall, our experiments demonstrate that Fully Looped Transformer improves training stability, enhances downstream performance, and provides preliminary adaptability under different test-time compute budgets by varying loop iterations at inference.

2605.18746 2026-05-26 cs.CV cs.AI cs.CL cs.LG cs.RO 版本更新

ESI-Bench: Towards Embodied Spatial Intelligence that Closes the Perception-Action Loop

ESI-Bench: 迈向闭环感知-动作的具身空间智能

Yining Hong, Jiageng Liu, Han Yin, Manling Li, Leonidas Guibas, Li Fei-Fei, Jiajun Wu, Yejin Choi

发表机构 * Stanford University（斯坦福大学）； UCLA（加州大学洛杉矶分校）； Northwestern University（西北大学）

AI总结提出ESI-BENCH基准，通过主动探索（感知、移动、操作）在OmniGibson环境中评估具身空间智能，发现主动探索显著优于被动方法，失败主因是动作盲视而非感知弱，且模型存在元认知差距。

Comments https://esi-bench.github.io/

详情

AI中文摘要

空间智能通过感知-动作循环展开：智能体通过行动获取观察，并推理观察如何随动作变化。它们不是被动处理所见，而是主动揭示未见——遮挡结构、动态、包含关系和功能，这些无法仅通过被动感知解决。我们超越先前假设神谕观察的空间智能表述，将观察者重新定义为行动者。我们引入ESI-BENCH，一个基于OmniGibson、扎根于Spelke核心知识系统的全面具身空间智能基准，涵盖10个任务类别和29个子类别。智能体必须决定部署哪些能力——感知、移动和操作——以及如何排序以主动积累任务相关证据。我们对最先进的MLLM进行大量实验，发现主动探索显著优于被动对应物，智能体自发发现涌现的空间策略而无需明确指令，而随机多视角往往增加噪声而非信号，尽管消耗更多图像。大多数失败并非源于感知弱，而是动作盲视：糟糕的动作选择导致糟糕的观察，进而引发级联错误。虽然显式3D基础稳定了深度敏感任务的推理，但不完美的3D表示通过扭曲空间关系证明比2D基线更有害。人类研究进一步揭示，与寻求证伪视角并在矛盾下修正信念的人类不同，模型无论证据质量如何都过早且高置信度地承诺，暴露了一个既不能通过更好感知也不能通过更多具身互动单独闭合的元认知差距。

英文摘要

Spatial intelligence unfolds through a perception-action loop: agents act to acquire observations, and reason about how observations vary as a function of action. Rather than passively processing what is seen, they actively uncover what is unseen - occluded structure, dynamics, containment, and functionality that cannot be resolved from passive sensing alone. We move beyond prior formulations of spatial intelligence that assume oracle observations by recasting the observer as an actor. We introduce ESI-BENCH, a comprehensive benchmark for embodied spatial intelligence spanning 10 task categories and 29 subcategories built on OmniGibson, grounded in Spelke's core knowledge systems. Agents must decide what abilities to deploy - perception, locomotion, and manipulation - and how to sequence them to actively accumulate task-relevant evidence. We conduct extensive experiments on state-of-the-art MLLMs and find that active exploration substantially outperforms passive counterparts, with agents spontaneously discovering emergent spatial strategies without explicit instructions, while random multi-view often adds noise rather than signal despite consuming far more images. Most failures stem not from weak perception but from action blindness: poor action choices lead to poor observations, which in turn drive cascading errors. While explicit 3D grounding stabilizes reasoning on depth-sensitive tasks, imperfect 3D representation proves more harmful than 2D baselines by distorting spatial relations. Human studies further reveal that unlike humans who seek falsifying viewpoints and revise beliefs under contradiction, models commit prematurely with high confidence regardless of evidence quality, exposing a metacognitive gap that neither better perception nor more embodied interaction alone can close.

2605.18745 2026-05-26 stat.ML cs.LG cs.NA math.NA math.PR q-fin.MF stat.CO 版本更新

SURGE: Approximation and Training Free Particle Filter for Diffusion Surrogate

SURGE: 扩散替代模型的近似与免训练粒子滤波

Lifu Wei, Yinuo Ren, Naichen Shi, Yiping Lu

发表机构 * Department of Mechanical Engineering, Northwestern University, Evanston, IL, United States； Institute for Computational & Mathematical Engineering, Stanford University, Stanford, CA, United States； Department of Industrial Engineering & Management Sciences, Northwestern University, Evanston, IL, United States

AI总结提出一种基于扩散模型的无偏粒子滤波方法，通过序列蒙特卡洛对扩散轨迹进行重加权和重采样，融合观测数据与模型模拟，实现状态估计的连续校正。

Comments accepted by ICML 2026

详情

AI中文摘要

数据同化（DA）解决从含噪声和不完整的观测中顺序估计动力系统状态的问题。本文采用扩散模型作为世界模型来模拟和预测系统动力学。最近，基于分数的扩散模型学习了全局扩散先验，能有效建模（随机）动力学，显示出数据同化的强大潜力。本文研究如何利用含噪观测信息，在使用扩散先验时实现对预测系统状态的连续校正和细化。受粒子滤波方法启发，我们使用一组粒子表示后验分布。接收到含噪观测后，利用观测似然引导扩散模型，使生成过程朝向与观测一致的状态。然而，这种引导并不能保证从真实后验中采样。因此，我们将扩散轨迹视为路径测度，采用序列蒙特卡洛方法对粒子进行重加权和重采样，从而纠正生成过程并确保收敛到所需的后验分布。这产生了一种无偏的粒子滤波方法，严格地将观测数据与扩散模型模拟融合。

英文摘要

Data assimilation (DA) addresses the problem of sequentially estimating the state of a dynamical system from noisy and incomplete observations. In this work, we employ a diffusion model as a world model to simulate and predict the system's dynamics. Recently, score-based diffusion models have learned global diffusion priors that effectively model (stochastic) dynamics, revealing strong potential for data assimilation. In this paper, we investigate how information from noisy observations can be incorporated to enable continuous correction and refinement of the predicted system state when using a diffusion prior. Motivated by particle filtering methods, we represent the posterior distribution using a set of particles. After receiving noisy observations, the diffusion model is guided using the observation likelihood to steer the generation process toward observation-consistent states. Nevertheless, such guidance does not guarantee sampling from the true posterior. We therefore employ a Sequential Monte Carlo approach over the diffusion trajectory, viewed as a path measure, to reweight and resample particles, thereby correcting the generation process and ensuring convergence toward the desired posterior distribution. This leads to an unbiased particle filtering method that rigorously fuses observational data with diffusion model simulations.
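
The reweight-and-resample step that the abstract borrows from particle filtering can be sketched generically. This is a plain bootstrap-style update on scalar states with an assumed Gaussian observation model, not the paper's Sequential Monte Carlo scheme over diffusion trajectories; all function names are hypothetical.

```python
import math
import random


def gaussian_log_likelihood(observation, state, noise_std):
    """Log-likelihood of the observation given a state, up to an additive constant."""
    return -0.5 * ((observation - state) / noise_std) ** 2


def reweight_and_resample(particles, observation, noise_std, rng):
    """Weight particles by the observation likelihood, then resample with replacement."""
    log_w = [gaussian_log_likelihood(observation, x, noise_std) for x in particles]
    shift = max(log_w)  # subtract the maximum for numerical stability
    weights = [math.exp(lw - shift) for lw in log_w]
    total = sum(weights)
    weights = [w / total for w in weights]
    resampled = rng.choices(particles, weights=weights, k=len(particles))
    return resampled, weights


rng = random.Random(0)
particles = [0.0, 1.0, 5.0]
resampled, weights = reweight_and_resample(particles, observation=1.0,
                                           noise_std=0.5, rng=rng)
```

Particles close to the observation receive most of the normalised weight, so resampling concentrates the ensemble on observation-consistent states.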

2605.17788 2026-05-26 cs.IR cs.LG 版本更新

Uncertainty-Calibrated Recommendations for Low-Active Users

低活跃用户的不确定性校准推荐

Bob Junyi Zou, Sai Li, Tianyun Sun, Wentao Guo, Qinglei Wang

发表机构 * Stanford University（斯坦福大学）； ByteDance Inc.（字节跳动公司）

AI总结提出一个生产就绪的框架，通过校准模型不确定性来为低活跃用户实施风险规避的去增强策略，为高活跃用户采用风险寻求的UCB策略，从而平衡推荐可靠性与多样性。

Comments Accepted to the Applied Data Science (ADS) track at the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2026)

详情

DOI: 10.1145/3770855.3818501
Journal ref: Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD '26), August 09–13, 2026, Jeju Island, Republic of Korea

AI中文摘要

推荐系统的一个基本挑战是平衡低活跃用户（LAUs）的可靠性与高活跃用户（HAUs）的多样性。这一平衡的关键在于量化模型不确定性，它近似预测误差的风险并揭示模型当前知识的局限性。在大规模短视频和直播平台上，模型不确定性可以警告可能导致LAUs脱离的低质量推荐，同时识别出为HAUs多样化内容推荐的机会。为了利用这种二分法，我们引入了一个统一的、生产就绪的框架，该框架校准不确定性以驱动差异化策略。具体来说，我们为LAUs实施了一种基于模型不确定性的风险规避去增强策略，以抑制不可靠的推荐，同时为HAUs采用风险寻求的上置信界（UCB）策略以鼓励探索。在一个主要直播平台上验证，我们的框架显著提高了LAUs的留存（活跃小时数）和满意度（质量观看时间比率），并显著增加了HAUs的兴趣多样性和类别覆盖率，证明了在工业环境中不确定性感知推荐的价值。

英文摘要

A fundamental challenge in recommender systems is balancing reliability for Low-Active Users (LAUs) with diversity for High-Active Users (HAUs). The key to this balance lies in quantifying model uncertainty, which approximates the risk of prediction errors and reveals the limits of the model's current knowledge. On large-scale short-video and livestream platforms, model uncertainty can warn of low-quality recommendations that may lead to disengagement of LAUs and at the same time identify opportunities to diversify content recommendation for HAUs. To leverage this dichotomy, we introduce a unified, production-ready framework that calibrates uncertainty to drive differentiated strategies. Specifically, we implement a model-uncertainty-based risk-averse deboosting policy for LAUs to suppress unreliable recommendations, while employing a risk-seeking Upper Confidence Bound (UCB) strategy for HAUs to encourage exploration. Validated on a major livestream platform, our framework demonstrates significant improvements in retention (active hours) and satisfaction (quality watch time ratio) for LAUs as well as remarkable increases in interest diversity and category coverage for HAUs, proving the value of uncertainty-aware recommendation in industrial settings.
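
The differentiated use of uncertainty described above can be sketched as a sign flip on an uncertainty term. This is a minimal illustration only: the linear form `mean ± k * std`, the shared coefficient `k`, and the function names are assumptions, not the production policy described in the paper.

```python
def adjusted_score(mean, std, is_low_active, k=1.0):
    """Combine a predicted score with its uncertainty.

    Low-active users get a risk-averse score (uncertain items are deboosted);
    high-active users get a risk-seeking, UCB-style score (uncertain items
    are boosted to encourage exploration).
    """
    if is_low_active:
        return mean - k * std
    return mean + k * std


def pick_top(candidates, is_low_active, k=1.0):
    """Return the name of the best candidate; each candidate is (name, mean, std)."""
    best = max(candidates,
               key=lambda c: adjusted_score(c[1], c[2], is_low_active, k))
    return best[0]


candidates = [("safe", 0.50, 0.05), ("uncertain", 0.55, 0.30)]
lau_choice = pick_top(candidates, is_low_active=True)
hau_choice = pick_top(candidates, is_low_active=False)
```

For the same candidate set, the low-active user is shown the well-understood item, while the high-active user is shown the item whose value the model is less sure about.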

2605.17730 2026-05-26 cs.LG cs.AI 版本更新

Nils Feldhus, Tanja Baeumel, Elena Golimblevskaia, Qianli Wang, Van Bach Nguyen, Aaron Louis Eidt, Selin Kahvecioglu, Christopher Ebert, Wojciech Samek, Jing Yang, Vera Schmitt, Sebastian Möller, Simon Ostermann

发表机构 * Technische Universität Berlin（柏林技术大学）； BIFOLD – Berlin Institute for the Foundations of Learning and Data（柏林学习与数据基础研究院）； German Research Center for Artificial Intelligence (DFKI)（德国人工智能研究中心）； Fraunhofer Heinrich Hertz Institute（弗劳恩霍夫海因里希·赫茨研究所）； Marburg University（马尔堡大学）； Centre for European Research in Trusted AI (CERTAIN)（欧洲可信人工智能研究中心）

AI总结本研究利用位置感知边归因修补（PEAP）因果分析Gemma-3、Qwen2.5和Llama-3的内部机制，发现结构化理解和开放式偏好任务中的判断共享一个稀疏、泛化的潜在评估子图，并通过解耦抽象判断与输出格式，揭示了格式诱导不一致性的机制原因。

Comments 39 pages

详情

AI中文摘要

LLM-as-a-judge已成为大规模评估模型输出的主导范式，然而同一模型在其输出格式变化时（例如，1-5评分与真/假标签）会系统地给出不同的分数。现有对这些格式诱导不一致性的诊断停留在输入输出层面。利用位置感知边归因修补（PEAP），我们因果地研究了Gemma-3、Qwen2.5和Llama-3的内部机制。我们发现，跨结构化理解和开放式偏好任务的判断共享一个稀疏、泛化的潜在评估子图，位于中后期多层感知器（MLPs）中；将其零消融会破坏判断，同时保留架构模块化模型中的世界知识。通过结构上解耦抽象判断与输出格式，我们为我们研究的开放权重模型上的格式诱导不一致性提供了机制解释：在共享主干中计算的连续判断信号通过脆弱、格式特定的终端分支映射，使得格式无关的偏好能够在请求的输出格式下游被隔离。我们的发现意味着跨格式的基准级可靠性比较部分测量的是格式化器几何形状而非评估质量。

英文摘要

LLM-as-a-judge has become the dominant paradigm for grading model outputs at scale, yet the same model assigns systematically different scores when its output format changes (e.g., a 1-5 rating vs. a True/False label). Existing diagnoses of these format-induced inconsistencies stop at the input-output level. Using Position-aware Edge Attribution Patching (PEAP), we causally investigate the internal mechanism in Gemma-3, Qwen2.5, and Llama-3. We find that judgments across structured understanding and open-ended preference tasks share a sparse, generalized Latent Evaluator sub-graph in the mid-to-late multi-layer perceptrons (MLPs); zero-ablating it collapses judgment while preserving world knowledge in architecturally modular models. By structurally decoupling abstract judging from output formatting, we provide a mechanistic account of format-induced inconsistency on the open-weight models we study: a continuous judgment signal computed in the shared trunk is mapped through fragile, format-specific terminal branches, enabling format-independent preference to be isolated downstream of the requested output format. Our findings imply that benchmark-level reliability comparisons across formats are partially measuring formatter geometry rather than evaluation quality.

2605.14769 2026-05-26 cs.LG 版本更新

Composable Crystals: Controllable Materials Discovery via Concept Learning

可组合晶体：通过概念学习实现可控材料发现

Nian Liu, Yuwei Zeng, Ryoji Kubo, Nikita Kazeev, Stephen Gregory Dale, Artem Maevskiy, Pengru Huang, Thomas Laurent, Kostya S. Novoselov, Xavier Bresson

发表机构 * National University of Singapore（新加坡国立大学）； Loyola Marymount University（洛桑玛丽蒙大学）

AI总结提出基于概念组合的晶体生成框架，利用向量量化变分自编码器自动发现可重用晶体概念，通过概念重组实现可控的新晶体探索，在MP-20和Alex-MP-20上V.S.U.N指标提升最高53.2%和51.7%。

详情

AI中文摘要

从头晶体生成是材料发现中的核心任务，旨在生成同时有效、稳定、独特且新颖的晶体。现有方法主要依赖黑盒随机采样，对生成结构如何超越观测分布的控制有限。本文提出了一种基于概念的组合式晶体生成框架。我们训练了一个向量量化变分自编码器，自动发现一组可重用的晶体概念，这些概念作为引导生成的构建块。这些学习到的概念在局部原子环境和全局对称模式上自然表现出可解释性，并能泛化到不同分布的晶体。通过重组这些概念，我们的框架能够可控地探索训练分布之外的新颖晶体，而非仅依赖无约束的随机采样。为进一步提高组合效率，我们引入了一个组合生成器，并使用模型自身生成的高质量样本对其进行迭代优化。最终的概念组合用于条件化下游晶体生成。在MP-20和Alex-MP-20上的数值实验表明，组合概念使基础模型在V.S.U.N.指标上分别提升高达53.2%和51.7%，尤其在新颖性方面增益显著。

英文摘要

De novo crystal generation, a central task in materials discovery, aims to generate crystals that are simultaneously valid, stable, unique, and novel. Existing methods mainly rely on black-box stochastic sampling, providing limited control over how generated structures move beyond the observed distribution. In this paper, we introduce a concept-based compositional framework for crystal generation. We train a vector-quantized variational autoencoder to automatically discover a shared set of reusable crystal concepts, which serve as building blocks for guided generation. These learned concepts naturally exhibit interpretability from both local atomic environments and global symmetry patterns, and generalize to crystals from different distributions. By recombining such concepts, our framework enables controllable exploration of novel crystals beyond the training distribution, rather than relying solely on unconstrained random sampling. To further improve composition efficiency, we introduce a composition generator and iteratively refine it using high-quality samples generated by the model itself. The resulting concept compositions are then used to condition downstream crystal generation. Numerical experiments on MP-20 and Alex-MP-20 show that composing concepts improves the base model by up to 53.2% and 51.7%, respectively, on the V.S.U.N. metric, with particular gains in novelty.

2605.14759 2026-05-26 cs.LG 版本更新

Crys-JEPA: Accelerating Crystal Discovery via Embedding Screening and Generative Refinement

Crys-JEPA：通过嵌入筛选和生成精炼加速晶体发现

Nian Liu, Nikita Kazeev, Stephen Gregory Dale, Artem Maevskiy, Yuwei Zeng, Ryoji Kubo, Pengru Huang, Thomas Laurent, Yann LeCun, Kostya S. Novoselov, Xavier Bresson

发表机构 * National University of Singapore（新加坡国立大学）； Loyola Marymount University（洛约拉玛丽蒙特大学）； New York University（纽约大学）； AMI

AI总结提出Crys-JEPA联合嵌入预测架构，通过能量感知的潜在空间和筛选-精炼流程，解决晶体生成中稳定性和新颖性的冲突，在MP-20和Alex-MP-20数据集上V.S.U.N.指标分别提升53.8%和72.7%。

详情

AI中文摘要

从头晶体生成旨在发现不仅真实而且稳定和新颖的材料。然而，大多数现有生成模型被训练为最大化观测晶体的似然，这鼓励样本接近已知材料，但不一定与发现中重要的标准一致。我们的实证分析表明，当前晶体生成模型在稳定性和新颖性之间存在明显冲突：接近观测分布的样本倾向于保持稳定性但提供有限的新颖性，而远离分布的样本通常迅速失去稳定性。这表明发现既稳定又新颖晶体的有用区域极其狭窄。为了突破这一限制，我们引入了Crys-JEPA，一种用于晶体的联合嵌入预测架构，它学习一个能量感知的潜在空间，保留形成能差异。在这个空间中，稳定性评估可以重新表述为基于嵌入的与可访问训练晶体的比较，减少了对昂贵能量评估和特定任务外部参考的依赖。基于Crys-JEPA，我们进一步开发了一个筛选-精炼流程，识别有前景的生成晶体并重新引入它们以精炼生成模型。在MP-20和Alex-MP-20数据集上，我们在V.S.U.N.指标上分别比基线提升了53.8%和72.7%。

英文摘要

De novo crystal generation seeks to discover materials that are not merely realistic, but also stable and novel. However, most existing generative models are trained to maximize the likelihood of observed crystals, which encourages samples to stay close to known materials but does not necessarily align with the criteria that matter in discovery. Our empirical analysis shows that current crystal generative models exhibit a clear conflict between stability and novelty: samples near the observed distribution tend to retain stability but offer limited novelty, whereas samples farther from it often lose stability rapidly. This suggests that the useful region for discovering crystals that are both stable and novel is extremely narrow. To move beyond this limitation, we introduce Crys-JEPA, a joint embedding predictive architecture for crystals that learns an energy-aware latent space preserving formation-energy differences. In this space, stability assessment can be reformulated as an embedding-based comparison against accessible training crystals, reducing the reliance on expensive energy evaluation and task-specific external references. Building on Crys-JEPA, we further develop a screening-and-refinement pipeline that identifies promising generated crystals and reintroduces them to refine the generative model. On the MP-20 and Alex-MP-20 datasets, we achieve improvements over baselines of up to 53.8% and 72.7%, respectively, on the V.S.U.N. metric.

2605.12906 2026-05-26 cs.LG cs.AI 版本更新

Data Difficulty and the Generalization–Extrapolation Tradeoff in LLM Fine-Tuning

数据难度与LLM微调中的泛化-外推权衡

Siyuan Liu, Tinghong Chen, Xinghan Li, Yifei Wang, Jingzhao Zhang

发表机构 * IIIS, Tsinghua University（清华大学交叉信息研究院）； College of AI, Tsinghua University（清华大学人工智能学院）； Shanghai Qi Zhi Institute（上海期智研究院）； Amazon AGI SF Lab（亚马逊AGI旧金山实验室）

AI总结本文通过实证和理论分析，研究了监督微调中数据难度对模型行为的影响，发现数据难度与数据量共同决定泛化与外推之间的权衡，并存在最优难度随数据量增加而向更难数据偏移的规律。

Comments Accepted to ICML 2026

详情

AI中文摘要

监督微调（SFT）期间的数据选择可以显著改变大型语言模型（LLMs）的行为。尽管已有工作研究了基于困惑度、难度或长度等启发式方法选择数据的效果，但报告的结果往往不一致或依赖于上下文。在这项工作中，我们从实证和理论角度系统地研究了数据难度在微调中的作用，并发现不存在普遍最优的难度水平；相反，其有效性取决于数据集大小。我们表明，对于固定的数据预算，SFT存在一个最优的数据难度，并且随着数据预算的增加，该最优难度向更难的数据偏移。为了解释这一现象，我们进行了受控的合成实验，揭示了一个简单的底层机制：分布内泛化差距与外推差距之间的相互作用。我们通过使用PAC-Bayesian泛化界限的理论分析进一步支持了这一机制。总的来说，我们的结果阐明了数据大小和难度如何共同影响SFT中泛化与外推之间的权衡，为在特定模型和数据条件下基于难度的数据选择提供了指导。

英文摘要

Data selection during supervised fine-tuning (SFT) can critically change the behavior of large language models (LLMs). Although existing work has studied the effect of selecting data based on heuristics such as perplexity, difficulty, or length, the reported findings are often inconsistent or context-dependent. In this work, we systematically study the role of data difficulty in fine-tuning from both empirical and theoretical perspectives, and find that there is no universally optimal difficulty level; rather, its effectiveness depends on the dataset size. We show that for a fixed data budget, there exists an optimal data difficulty for SFT, and that this optimal difficulty shifts toward harder data as the data budget increases. To explain this phenomenon, we conduct controlled synthetic experiments that reveal a simple underlying mechanism: the interplay between the (in-distribution) generalization gap and the extrapolation gap. We further support this mechanism through a theoretical analysis using PAC-Bayesian generalization bounds. Overall, our results clarify how data size and difficulty jointly affect the trade-off between generalization and extrapolation in SFT, providing guidance for difficulty-based data selection under certain model and data conditions.


2605.12374 2026-05-26 cs.CV cs.AI cs.LG 版本更新

Fill the GAP: A Granular Alignment Paradigm for Visual Reasoning in Multimodal Large Language Models

填补GAP：多模态大语言模型中视觉推理的粒度对齐范式

Yanting Miao, Yutao Sun, Dexin Wang, Mengyu Zhou, Pascal Poupart, Lei Lv, Qi Zhao, Li Wang, Hao Li, Xiaoxi Jiang, Guanjun Jiang

发表机构 * Qwen Large Model Application Team, Alibaba（阿里巴巴通义千问大模型应用团队）； University of Waterloo（滑铁卢大学）； Vector Institute（向量研究所）； Zhejiang University（浙江大学）

AI总结提出GAP（粒度对齐范式），通过特征级、上下文级和能力引导级对齐，解决多模态大语言模型中视觉潜在推理的特征空间不匹配问题，提升感知与推理性能。

详情

AI中文摘要

视觉潜在推理让多模态大语言模型（MLLM）以连续令牌形式创建中间视觉证据，避免外部工具或图像生成器。然而，现有方法通常遵循输出即输入的潜在范式，产生不稳定的收益。我们发现了特征空间不匹配的证据，这种不匹配可能是造成上述不稳定的原因之一：主流的视觉潜在模型建立在预归一化（pre-norm）MLLM上，重用解码器隐藏状态作为预测的潜在输入，尽管这些状态与模型训练时消耗的输入嵌入处于截然不同的范数范围（Xie et al., 2025; Li et al., 2026; Team et al., 2026）。这种不匹配可能使直接潜在反馈不可靠。受此诊断启发，我们提出GAP，一种用于视觉潜在建模的粒度对齐范式。GAP在三个层面对齐视觉潜在推理：特征级对齐通过轻量级PCA对齐潜在头将解码器输出映射为输入兼容的视觉潜在；上下文级对齐通过可检查的辅助视觉监督锚定潜在目标；能力引导对齐选择性地将潜在监督分配给基础MLLM难以处理的示例。在Qwen2.5-VL 7B上，所得模型在我们的监督变体中实现了最佳平均聚合感知和推理性能。推理时干预探测进一步表明，生成的潜在提供了任务相关的视觉信号，而不仅仅是增加令牌槽位。

英文摘要

Visual latent reasoning lets a multimodal large language model (MLLM) create intermediate visual evidence as continuous tokens, avoiding external tools or image generators. However, existing methods usually follow an output-as-input latent paradigm and yield unstable gains. We identify evidence for a feature-space mismatch that can contribute to this instability: dominant visual-latent models build on pre-norm MLLMs and reuse decoder hidden states as predicted latent inputs, even though these states occupy a substantially different norm regime from the input embeddings the model was trained to consume (Xie et al., 2025; Li et al., 2026; Team et al., 2026). This mismatch can make direct latent feedback unreliable. Motivated by this diagnosis, we propose GAP, a Granular Alignment Paradigm for visual latent modeling. GAP aligns visual latent reasoning at three levels: feature-level alignment maps decoder outputs into input-compatible visual latents through a lightweight PCA-aligned latent head; context-level alignment grounds latent targets with inspectable auxiliary visual supervision; and capacity-guided alignment assigns latent supervision selectively to examples where the base MLLM struggles. On Qwen2.5-VL 7B, the resulting model achieves the best mean aggregate perception and reasoning performance among our supervised variants. Inference-time intervention probing further suggests that generated latents provide task-relevant visual signal beyond merely adding token slots.


2605.09270 2026-05-26 cs.LG cs.AI 版本更新

Memorize Theorems, Not Instances: Probing SFT Generalization through Mathematical Reasoning

记忆定理而非实例：通过数学推理探究SFT泛化

Ruiying Peng, Mengyu Yang, Jing Lei, Xiaohui Li, Xueyu Wu, Xinlei Chen

发表机构 * Tsinghua Shenzhen International Graduate School（清华大学深圳国际研究生院）； Huawei Technologies（华为技术）

AI总结针对监督微调（SFT）损害推理泛化的问题，提出Theorem-SFT方法，通过显式定理应用训练，在多个基准上取得显著提升，并揭示前馈层是推理规则的主要存储位置。

详情

AI中文摘要

监督微调（SFT）广泛用于任务特定适配，但近期工作表明它会系统性地削弱推理泛化。我们认为根本原因不在于记忆本身，而在于其目标：标准SFT驱动模型利用并记忆问题-答案对中的虚假表面相关性，使其对表面输入变化脆弱。为解决此问题，我们提出Theorem-SFT，通过教授模型规则如何被调用而非答案看起来像什么，将监督重新导向显式定理应用。Theorem-SFT在多个基准和模型家族上取得一致提升：在MATH上（LLaMA3.2-3B-Instruct）提升8.8%，在GeoQA上（Qwen2.5-VL-7B-Instruct）提升20.27%，无需特定模态的重新训练。仅微调MLP层即可达到全层性能，表明前馈组件是推理规则的主要存储位置。我们的发现重新定义了争论：泛化失败并非源于记忆机制本身，而是源于记忆了错误的归纳目标。

英文摘要

Supervised Fine-Tuning (SFT) is widely used for task-specific adaptation, yet recent work shows it systematically undermines reasoning generalization. We argue the root cause is not memorization itself, but its target: vanilla SFT drives models to exploit and memorize spurious surface correlations in problem-solution pairs, leaving them brittle to superficial input variations. To address this, we propose Theorem-SFT, which reorients supervision toward explicit theorem application by teaching models how rules are invoked rather than what answers look like. Theorem-SFT yields consistent gains across benchmarks and model families: +8.8% on MATH (LLaMA3.2-3B-Instruct) and +20.27% on GeoQA (Qwen2.5-VL-7B-Instruct) without modality-specific re-training. Fine-tuning MLP layers alone matches full-layers performance, implicating feed-forward components as the primary locus of reasoning rules. Our findings reframe the debate: Generalization failures stem not from memorization as a mechanism, but from memorizing the wrong inductive targets.


2605.08306 2026-05-26 eess.IV cs.LG 版本更新

复随机梯度下降与再生核希尔伯特空间中的方向偏差

Natanael Alpay, Emeric Battaglia

发表机构 * Department of Mathematics, University of California, Irvine, Irvine, CA 92697 USA（加州大学尔湾分校数学系，尔湾，CA 92697，美国）

AI总结本文提出复随机梯度下降（Complex SGD）方法，在无解析性约束下证明其收敛性，并验证方向偏差从实域扩展到复域，在复再生核希尔伯特空间中通过核回归有效恢复超振荡函数和Blaschke乘积。

详情

AI中文摘要

随机梯度下降（SGD）是一种已知的随机迭代方法，因其实现简单和可扩展性而流行于大规模凸优化问题。某些目标函数，例如复值神经网络中的目标函数，受益于SGD和梯度下降（GD）中使用新定义的“梯度”进行更新，该梯度允许复参数。这种SGD/GD方法的复变体已被提出，但尚未提供无解析性约束的收敛保证。我们提出了一种允许复参数的SGD变体（复SGD），并在与实设置平行的假设下提供了收敛保证。值得注意的是，这些结果也扩展到GD，并且在相同的假设集下，我们确认了对于核回归问题，一些方向偏差结果从实域扩展到复域。我们提供了实证结果，证明了复SGD在使用复再生核希尔伯特空间的核回归问题中的有效性。特别地，我们展示了通过选择特定的损失函数，可以分别从Fock空间和Hardy空间中恢复超振荡函数和Blaschke乘积作为最优函数。

英文摘要

Stochastic Gradient Descent (SGD) is a known stochastic iterative method popular for large-scale convex optimization problems due to its simple implementation and scalability. Some objectives, such as those found in complex-valued neural networks, benefit from updates like in SGD and Gradient Descent (GD) with a newly defined "gradient" that allows for complex parameters. This complex variant of the SGD/GD methods has already been proposed, but convergence guarantees without analyticity constraints have not yet been provided. We propose a variant of SGD (complex SGD) that allows for complex parameters, and we provide convergence guarantees under assumptions that parallel those from the real setting. Notably, these results extend to GD as well, and with the same set of assumptions, we confirm that some directional bias results extend from the real to the complex setting for kernel regression problems. We provide empirical results demonstrating the efficacy of the complex SGD in kernel regression problems utilizing complex reproducing kernel Hilbert spaces. In particular, we demonstrate we may recover superoscillation functions and Blaschke products from the Fock Space and Hardy Space, respectively, as the optimal functions for a particular choice of a loss function.
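
To make the idea of a "gradient" for complex parameters concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a plain least-squares loss |x^T w - y|^2 and uses the conjugate Wirtinger derivative as the update direction, which is one common choice; the paper's own definition and assumptions may differ. The function name, step size, and toy data are all hypothetical.

```python
import numpy as np

def complex_sgd_least_squares(X, y, lr=0.05, epochs=200, seed=0):
    """Single-sample SGD on mean |x_i^T w - y_i|^2 with complex weights w."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d, dtype=complex)
    for _ in range(epochs):
        for i in rng.permutation(n):
            residual = X[i] @ w - y[i]
            # Derivative of |residual|^2 with respect to conj(w):
            # this is the steepest-ascent direction for a real-valued loss.
            grad = np.conj(X[i]) * residual
            w = w - lr * grad
    return w

# Toy noiseless problem (illustrative data, not from the paper).
rng = np.random.default_rng(1)
X = (rng.standard_normal((50, 3)) + 1j * rng.standard_normal((50, 3))) / np.sqrt(2)
w_true = np.array([1 + 2j, -0.5j, 0.3])
y = X @ w_true
w_hat = complex_sgd_least_squares(X, y)
print(np.abs(w_hat - w_true).max())
```

Because the loss is real-valued but not analytic in w, the ordinary complex derivative does not exist; treating w and its conjugate as independent variables is what makes the update well defined in this sketch.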


2604.18970 2026-05-26 cs.LG cs.CR 版本更新

Mechanistic Anomaly Detection via Functional Attribution

通过功能归因的机制性异常检测

Hugo Lyons Keenan, Christopher Leckie, Sarah Erfani

发表机构 * School of Computing and Information Systems, The University of Melbourne, Victoria, Australia（计算与信息系统学院，墨尔本大学，维多利亚，澳大利亚）

AI总结将机制性异常检测重新定义为功能归因问题，利用影响函数测量测试样本与参考集之间的功能耦合，在视觉模型后门检测、大语言模型后门检测以及对抗样本和分布外样本检测中取得最优或显著改进。

Comments ICML '26 Camera Ready

详情

AI中文摘要

我们通常可以使用真实标签验证神经网络输出的正确性，但无法可靠地确定输出是由正常还是异常的内部机制产生的。机制性异常检测（MAD）旨在标记这些情况，但现有方法要么依赖于潜在空间分析（易受混淆攻击），要么特定于特定架构和模态。我们将MAD重新定义为功能归因问题：询问来自可信集的样本在多大程度上可以解释模型的输出，其中归因失败表明异常行为。我们使用影响函数来实现这一点，通过参数空间采样测量测试样本与小型参考集之间的功能耦合。我们在多种异常类型和模态上进行了评估。对于视觉模型中的后门，我们的方法在BackdoorBench上实现了最先进的检测，在七种攻击和四个数据集上平均防御有效性评分（DER）为0.93（次优为0.83）。对于大语言模型，我们在几种后门类型（包括显式混淆的模型）上也取得了比基线显著的改进。除了后门，我们的方法可以检测对抗样本和分布外样本，并区分单个模型内的多种异常机制。我们的结果确立了功能归因作为一种有效的、模态无关的工具，用于检测部署模型中的异常行为。

英文摘要

We can often verify the correctness of neural network outputs using ground truth labels, but we cannot reliably determine whether the output was produced by normal or anomalous internal mechanisms. Mechanistic anomaly detection (MAD) aims to flag these cases, but existing methods either depend on latent space analysis, which is vulnerable to obfuscation, or are specific to particular architectures and modalities. We reframe MAD as a functional attribution problem: asking to what extent samples from a trusted set can explain the model's output, where attribution failure signals anomalous behavior. We operationalize this using influence functions, measuring functional coupling between test samples and a small reference set via parameter-space sampling. We evaluate across multiple anomaly types and modalities. For backdoors in vision models, our method achieves state-of-the-art detection on BackdoorBench, with an average Defense Effectiveness Rating (DER) of 0.93 across seven attacks and four datasets (next best 0.83). For LLMs, we similarly achieve a significant improvement over baselines for several backdoor types, including on explicitly obfuscated models. Beyond backdoors, our method can detect adversarial and out-of-distribution samples, and distinguishes multiple anomalous mechanisms within a single model. Our results establish functional attribution as an effective, modality-agnostic tool for detecting anomalous behavior in deployed models.


2604.14054 2026-05-26 cs.LG cs.CL 版本更新

$π$-Play: Multi-Agent Self-Play via Privileged Self-Distillation without External Data

$\pi$-Play: 通过特权自蒸馏实现的多智能体自对弈，无需外部数据

Yaocheng Zhang, Yuanheng Zhu, Wenyue Chong, Songjun Tu, Qichao Zhang, Jiajun Chai, Xiaohan Wang, Wei Lin, Guojun Yin, Dongbin Zhao

发表机构 * Institute of Automation, Chinese Academy of Sciences（中国科学院自动化研究所）； School of Advanced Interdisciplinary Sciences, University of Chinese Academy of Sciences（中国科学院大学先进交叉学科学院）； Meituan（美团）； School of Artificial Intelligence, University of Chinese Academy of Sciences（中国科学院大学人工智能学院）

AI总结提出$\pi$-Play框架，利用自对弈中生成的问答构建路径作为特权信息，结合自蒸馏实现密集反馈的多智能体协同进化，无需外部数据即可超越全监督搜索代理。

Comments 23 pages, 11 figures

详情

AI中文摘要

深度搜索代理已成为解决复杂信息寻求任务的有前景范式，但其训练仍面临稀疏奖励、弱信用分配和有限标注数据的挑战。自对弈提供了一种可扩展的减少数据依赖的途径，但传统自对弈仅通过稀疏结果奖励优化学生，导致学习效率低下。在这项工作中，我们观察到自对弈在任务生成过程中自然产生一个问题构建路径（QCP），这是一种捕获反向求解过程的中间产物。这揭示了一种新的特权信息来源：自对弈可以低成本、大规模地提供高质量特权信息用于自蒸馏，无需依赖人类反馈或精心设计的特权信息。基于这一洞察，我们提出特权信息自对弈（$\pi$-Play），一种结合自对弈和自蒸馏的新型多智能体自进化框架。在$\pi$-Play中，考官生成任务及QCP，教师利用QCP作为特权上下文，通过自蒸馏对学生进行密集监督。这种设计将稀疏奖励的自对弈转变为密集反馈的协同进化。大量实验表明，无数据的$\pi$-Play超越了全监督搜索代理，并将进化效率相比传统自对弈提升了2-3倍。代码见 https://github.com/zhyaoch/pi-play。

英文摘要

Deep search agents have emerged as a promising paradigm for addressing complex information-seeking tasks, but their training remains challenging due to sparse rewards, weak credit assignment, and limited labeled data. Self-play offers a scalable route to reduce data dependence, but conventional self-play optimizes students only through sparse outcome rewards, leading to low learning efficiency. In this work, we observe that self-play naturally produces a question construction path (QCP) during task generation, an intermediate artifact that captures the reverse solution process. This reveals a new source of privileged information: self-play can provide high-quality privileged information for the self-distillation at low cost and at scale, without relying on human feedback or curated privileged information. Leveraging this insight, we propose Privileged Information Self-Play ($π$-Play), a novel multi-agent self-evolution framework combining self-play and self-distillation. In $π$-Play, an examiner generates tasks together with QCPs, and a teacher employs QCP as privileged context to densely supervise a student via self-distillation. This design transforms sparse-reward self-play into a dense-feedback co-evolution. Extensive experiments show that data-free $π$-Play surpasses fully supervised search agents and improves evolutionary efficiency by 2-3$\times$ over conventional self-play. Code is available at https://github.com/zhyaoch/pi-play.


2604.10783 2026-05-26 cs.AI cs.LG 版本更新

混合量子神经网络用于多变量临床时间序列预测

Irene Iele, Floriano Caprio, Paolo Soda, Matteo Tortora

发表机构 * Department of Diagnostics and Intervention, Radiation Physics, Biomedical Engineering, Umeå University, Sweden（诊断与介入系，辐射物理，生物医学工程，于默奥大学，瑞典）； Department of Naval, Electrical, Electronics and Telecommunications Engineering, University of Genoa, Italy（海军、电气、电子与电信工程系，热那亚大学，意大利）

AI总结提出一种混合量子-经典架构，将变分量子电路集成到循环神经网络中，用于多变量生理时间序列的多步预测，在BIDMC数据集上表现出与基线相当的精度和更强的鲁棒性。

详情

AI中文摘要

预测生理信号可以通过预期患者状态的临界变化来支持主动监测和及时的临床干预。在这项工作中，我们通过联合预测心率、血氧饱和度、脉搏率和呼吸率在15、30和60秒的预测时域上，解决了生理时间序列的多变量多步预测问题。我们提出了一种混合量子-经典架构，将变分量子电路（VQC）集成到循环神经骨干中。GRU编码器将历史观察窗口总结为潜在表示，然后将其投影到用于参数化VQC的量子角度上。量子层作为可学习的非线性特征混合器，在最终预测阶段之前建模跨变量交互。我们在BIDMC PPG和呼吸数据集上采用留一患者方案评估了所提出的方法。结果显示，与经典和深度学习基线相比，该方法具有竞争性的精度，同时对噪声和缺失输入具有更强的鲁棒性。这些发现表明，混合量子层可以为小队列临床环境中的生理时间序列预测提供有用的归纳偏置。代码可在https://github.com/arco-group/quantum-ml获取。

英文摘要

Forecasting physiological signals can support proactive monitoring and timely clinical intervention by anticipating critical changes in patient status. In this work, we address multivariate multi-horizon forecasting of physiological time series by jointly predicting heart rate, oxygen saturation, pulse rate, and respiratory rate at forecasting horizons of 15, 30, and 60 seconds. We propose a hybrid quantum-classical architecture that integrates a Variational Quantum Circuit (VQC) within a recurrent neural backbone. A GRU encoder summarizes the historical observation window into a latent representation, which is then projected into quantum angles used to parameterize the VQC. The quantum layer acts as a learnable non-linear feature mixer, modeling cross-variable interactions before the final prediction stage. We evaluate the proposed approach on the BIDMC PPG and Respiration dataset under a Leave-One-Patient-Out protocol. The results show competitive accuracy compared with classical and deep learning baselines, together with greater robustness to noise and missing inputs. These findings suggest that hybrid quantum layers can provide useful inductive biases for physiological time series forecasting in small-cohort clinical settings. The code is available at https://github.com/arco-group/quantum-ml.


2603.06798 2026-05-26 cs.LG cs.DC stat.ML 版本更新

NEST: Network- and Memory-Aware Device Placement For Distributed Deep Learning

NEST: 面向分布式深度学习的网络与内存感知设备放置

Irene Wang, Vishnu Varma Venkata, Arvind Krishnamurthy, Divya Mahajan

发表机构 * Georgia Institute of Technology（佐治亚理工学院）； University of Washington（华盛顿大学）

AI总结提出NEST框架，通过结构化动态规划统一模型并行、拓扑建模和内存可行性，在多种硬件和网络上实现高达2.43倍的吞吐量提升。

Comments Accepted to MLSys 2026

详情

AI中文摘要

深度学习规模的不断增长要求分布式训练框架能够联合考虑并行性、内存和网络拓扑。先前的工作通常依赖启发式或拓扑无关的搜索，分别处理通信和内存。由于缺乏每设备内存感知，这些方法通常事后通过将参数和激活分片到多个设备上来确保可行性，从而增加同步、扩大通信、降低计算利用率，限制了实际数据中心网络上的可扩展性和效率。我们提出了NEST，一个网络、计算和内存感知的设备放置框架，通过结构化动态规划统一了模型并行、拓扑建模和内存可行性。NEST的动态规划在具有张量和专家并行配置、跨层次或任意网络的显式allreduce延迟以及内存/计算轮廓的算子图上运行。通过跨张量、流水线、数据和专家维度分解并行性，NEST为混合策略定义了一个原则性的搜索空间，同时联合优化共置、网络延迟和内存可行性。在多种硬件和网络上的评估表明，与最先进的基线相比，NEST实现了高达2.43倍的吞吐量提升、更好的内存效率和可扩展性，为下一代AI基础设施的并行化策略和数据中心互连协同设计提供了基础。NEST的源代码可在https://github.com/scai-tech/Nest获取。

英文摘要

The growing scale of deep learning demands distributed training frameworks that jointly reason about parallelism, memory, and network topology. Prior works often rely on heuristic or topology-agnostic search, handling communication and memory separately. Without per-device memory awareness, these methods typically ensure feasibility post hoc by sharding parameters and activations across many devices, increasing synchronization, inflating communication, and underutilizing compute, which limits scalability and efficiency on real datacenter networks. We present NEST, a network-, compute-, and memory-aware device placement framework that unifies model parallelism, topology modeling, and memory feasibility via structured dynamic programming. NEST's DP operates on operator graphs with tensor and expert parallel configurations, explicit allreduce latencies across hierarchical or arbitrary networks, and memory/compute profiles. By factoring parallelism across tensor, pipeline, data, and expert dimensions, NEST defines a principled search space for hybrid strategies while jointly optimizing co-location, network latency, and memory feasibility. Evaluations across diverse hardware and networks show NEST achieves up to 2.43 times higher throughput, better memory efficiency, and improved scalability over state-of-the-art baselines, providing a foundation for co-designing parallelization strategies and datacenter interconnects for next-generation AI infrastructure. The source code of NEST is available at: https://github.com/scai-tech/Nest


2603.05143 2026-05-26 cs.CL cs.LG 版本更新

Feature Resemblance: Towards a Theoretical Understanding of Analogical Reasoning in Transformers

特征相似性：迈向对Transformer中类比推理的理论理解

Ruichen Xu, Wenjing Yan, Ying-Jun Angela Zhang

发表机构 * Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong（香港中文大学信息工程系）

AI总结本文通过最小化Transformer抽象模型，从理论上证明联合训练和特定课程顺序能使实体在表示空间中对齐，从而通过特征相似性实现属性转移，即类比推理。

详情

AI中文摘要

理解大型语言模型中的推理因评估混淆多种推理类型而变得复杂。我们分离出类比推理，即模型在共享已知属性的实体之间转移属性，并研究这种转移何时能从训练中涌现。为了使问题在分析上易于处理，我们研究了一个最小化的Transformer风格抽象，该抽象隔离了学习到的表示如何支持类比推理。在此设置中，我们证明了三个关键结果。首先，对相似性和属性前提的联合训练通过对齐表示实现类比推理。其次，顺序训练仅在相似性结构先于特定属性学习时成功，揭示了课程不对称性。第三，在我们的风格化设置中，两跳推理$(a \to b, b \to c \Rightarrow a \to c)$可被视为具有身份桥$(b=b)$的类比推理，这些身份桥在训练数据中明确出现。这些结果共同揭示了一个统一机制：具有共享属性的实体在表示空间中对齐，从而通过特征相似性实现属性转移。使用高达8B参数的架构进行的实验与理论定性一致，并表明表示几何在风格化模型之外的类比推理中扮演重要角色。

英文摘要

Understanding reasoning in large language models is complicated by evaluations that conflate multiple reasoning types. We isolate analogical reasoning, where a model transfers an attribute between entities that share known properties, and study when such transfer can emerge from training. To make the problem analytically tractable, we study a minimal transformer-style abstraction that isolates how learned representations support analogical reasoning. Within this setting, we prove three key results. First, joint training on similarity and attribution premises enables analogical reasoning through aligned representations. Second, sequential training succeeds only when similarity structure is learned before specific attributes, revealing a curriculum asymmetry. Third, in our stylized setting, two-hop reasoning $(a \to b, b \to c \Rightarrow a \to c)$ can be viewed as analogical reasoning with identity bridges $(b=b)$, which appear explicitly in training data. Together, these results reveal a unified mechanism: entities with shared properties become aligned in representation space, enabling property transfer through feature resemblance. Experiments with architectures up to 8B parameters show qualitative agreement with the theory and suggest that representational geometry plays an important role in analogical reasoning beyond the stylized model.


2603.00857 2026-05-26 cs.LG cs.AI 版本更新

MultiPUFFIN: A Multimodal Domain-Constrained Foundation Model for Molecular Property Prediction of Small Molecules

MultiPUFFIN：用于小分子性质预测的多模态领域约束基础模型

Idelfonso B. R. Nogueira, Carine M. Rebello, Mumin Enis Leblebici, Erick Giovani Sperandio Nascimento

发表机构 * Department of Chemical Engineering, Norwegian University of Science and Technology (NTNU)（挪威科技大学化学工程系）； Faculty of Industrial Engineering, KU Leuven（鲁汶大学工业工程学院）； University of Surrey（萨里大学）

AI总结提出多模态基础模型MultiPUFFIN，融合SMILES、2D图、3D构象及实验条件，通过条件感知精炼和热力学约束头，在小样本下优于ChemBERTa-2，预测小分子热物理性质。

详情

AI中文摘要

MultiPUFFIN是一个领域信息多模态基础模型，用于预测小分子的热物理性质，填补了化学工程、药物发现和材料科学中的关键空白。现有的分子基础模型在数百万分子上预训练以学习通用表示，但其标准MLP输出层不施加物理约束，蒸汽压预测可能违反温度单调依赖性，粘度曲线可能缺乏过程模拟器所需的功能形式。保证热力学一致性的领域信息方法仍局限于单一性质和少量数据集，而多模态基础模型则侧重于生物活性而非热物理性质。MultiPUFFIN通过双向跨模态注意力和门控融合融合SMILES序列、2D分子图和3D构象几何，并辅以实验条件和分子描述符的辅助编码器，填补了这一空白。骨干网络使用三种互补的自监督目标在500,000个未标记的PubChem分子上预训练。一个条件感知精炼堆栈包含五个条件器（温度、pH、压力、多晶型和测量方法），将每个性质路由到一个四头锦标赛，选择该性质性能最佳的热力学信息头。MultiPUFFIN的平均测试R²为0.784，在所有九个性质上优于微调的ChemBERTa-2，尽管训练使用的标记分子数量少了约2000倍。

英文摘要

MultiPUFFIN is a domain-informed multimodal foundation model for predicting thermophysical properties of small molecules, addressing a critical gap in chemical engineering, drug discovery, and materials science. Existing molecular foundation models pretrain on millions of molecules to learn general-purpose representations, but their standard MLP output layers impose no physical constraints: vapor pressure predictions may violate monotonic temperature dependence, and viscosity curves may lack the functional form required by process simulators. Domain-informed approaches that guarantee thermodynamic consistency have remained limited to single properties and small datasets, whereas multimodal foundation models have focused on biological activity rather than thermophysical properties. MultiPUFFIN fills this gap by fusing SMILES sequences, 2D molecular graphs, and 3D conformer geometries through bidirectional cross-modal attention and gated fusion, supplemented by auxiliary encoders for experimental conditions and molecular descriptors. The backbone is pretrained on 500,000 unlabelled PubChem molecules using three complementary self-supervised objectives. A condition-aware refinement stack of five conditioners (temperature, pH, pressure, polymorph, and measurement method) routes each property to a four-head tournament that selects the best-performing thermodynamically informed head for that property. MultiPUFFIN achieves a mean test $R^2$ of 0.784 and outperforms fine-tuned ChemBERTa-2 on all nine properties despite training on roughly 2,000x fewer labeled molecules.


2602.21198 2026-05-26 cs.LG cs.AI cs.CL cs.CV cs.RO 版本更新

Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs

从试错中学习：具身大语言模型的反思式测试时规划

Yining Hong, Huang Huang, Manling Li, Li Fei-Fei, Leonidas Guibas, Jiajun Wu, Yejin Choi

发表机构 * Stanford University（斯坦福大学）； Northwestern University（西北大学）

AI总结提出反思式测试时规划方法，通过行动中反思和行动后反思两种模式，结合回溯性反思，使具身智能体在测试时进行自我纠正和经验积累，显著提升长程任务性能。

详情

AI中文摘要

具身大语言模型赋予机器人高级任务推理能力，但它们无法反思错误原因，导致部署成为一系列独立尝试，错误重复而非积累经验。借鉴人类反思实践，我们引入反思式测试时规划，整合两种反思模式：行动中反思（reflection-in-action），代理在行动前利用测试时扩展生成并评分多个候选行动，基于内部反思；以及行动后反思（reflection-on-action），利用测试时训练，根据执行后的外部反思更新内部反思模型和行动策略。我们还包含回溯性反思，允许代理重新评估早期决策，并利用后见之明进行模型更新，实现适当的长程信用分配。在我们新设计的Long-Horizon Household基准和MuJoCo Cupboard Fitting基准上的实验表明，与基线模型相比有显著提升，并能零样本泛化到逼真的HM3D环境以及在Franka Panda机械臂上的真实机器人实验。消融实验证实，行动中反思和行动后反思相互依赖，且回溯性反思在较低计算开销下比逐步外部反馈实现更好的信用分配。定性分析进一步突出了通过反思进行的行为纠正。

英文摘要

Embodied LLMs endow robots with high-level task reasoning, but they cannot reflect on what went wrong or why, turning deployment into a sequence of independent trials where mistakes repeat rather than accumulate into experience. Drawing upon human reflective practitioners, we introduce Reflective Test-Time Planning, which integrates two modes of reflection: reflection-in-action, where the agent uses test-time scaling to generate and score multiple candidate actions using internal reflections before execution; and reflection-on-action, which uses test-time training to update both its internal reflection model and its action policy based on external reflections after execution. We also include retrospective reflection, allowing the agent to re-evaluate earlier decisions and perform model updates with hindsight for proper long-horizon credit assignment. Experiments on our newly-designed Long-Horizon Household benchmark and MuJoCo Cupboard Fitting benchmark show significant gains over baseline models, with zero-shot generalization to photorealistic HM3D environments and real-robot experiments on a Franka Panda arm. Ablations confirm that reflection-in-action and reflection-on-action are mutually dependent, and that retrospective reflection achieves better credit assignment than step-wise external feedback at lower computational overhead. Qualitative analyses further highlight behavioral correction through reflection.


2602.20210 2026-05-26 cs.LG cs.AI 版本更新

Multimodal Crystal Flow: Any-to-Any Modality Generation for Unified Crystal Modeling

多模态晶体流：面向统一晶体建模的任意模态生成

Kiyoung Seong, Sungsoo Ahn, Sehui Han, Changyoung Park

发表机构 * Graduate School of AI, KAIST, Seoul, South Korea（韩国科学技术院人工智能研究生院，首尔，韩国）； Materials Intelligence Lab, LG AI Research, Seoul, South Korea（LG AI研究所材料智能实验室，首尔，韩国）

AI总结提出多模态晶体流（MCFlow），一种统一的多模态流模型，通过原子类型和晶体结构的独立时间变量实现多种晶体生成任务，并在MP-20和MPTS-52基准上达到与任务特定基线竞争的性能。

详情

AI中文摘要

晶体建模涵盖一系列条件和非条件生成任务，包括晶体结构预测（CSP）和从头生成（DNG）。尽管最近的深度生成模型表现出有前景的性能，但它们仍然主要是任务特定的，缺乏跨任务共享晶体表示的统一框架。为了解决这一限制，我们提出了多模态晶体流（MCFlow），一种统一的多模态流模型，通过原子类型和晶体结构的独立时间变量将多种晶体生成任务实现为不同的推理轨迹。为了在标准Transformer模型中实现多模态流，我们引入了一种具有层次排列增强的组合和对称感知原子排序，无需显式结构模板即可注入组合和晶体学先验。在MP-20和MPTS-52基准上的实验表明，单个MCFlow模型在CSP、DNG和结构条件原子类型生成方面与任务特定基线具有竞争力。

英文摘要

Crystal modeling spans a family of conditional and unconditional generation tasks, including crystal structure prediction (CSP) and de novo generation (DNG). While recent deep generative models have shown promising performance, they remain largely task-specific, lacking a unified framework that shares crystal representations across tasks. To address this limitation, we propose Multimodal Crystal Flow (MCFlow), a unified multimodal flow model that realizes multiple crystal generation tasks as distinct inference trajectories via independent time variables for atom types and crystal structures. To enable multimodal flow in a standard transformer model, we introduce a composition- and symmetry-aware atom ordering with hierarchical permutation augmentation, injecting compositional and crystallographic priors without explicit structural templates. Experiments on the MP-20 and MPTS-52 benchmarks show that a single MCFlow model is competitive with task-specific baselines across CSP, DNG, and structure-conditioned atom type generation.


2602.17658 2026-05-26 cs.LG cs.AI cs.IT math.IT 版本更新

MARS: Margin and Semantic-Aware Data Augmentation for Reward Modeling

MARS：面向奖励建模的边界与语义感知数据增强

Payel Bhattacharjee, Osvaldo Simeone, Ravi Tandon

发表机构 * University of Arizona（亚利桑那大学）； Northeastern University London（伦敦东北大学）

AI总结提出MARS框架，通过优先增强低边界偏好对并利用语义距离细化，提升奖励模型质量和对齐性能。

详情

AI中文摘要

奖励建模是RLHF、RLAIF和基于PPO的策略优化等对齐流程的核心，但其可靠性受限于有限且异构的人类偏好数据，这些数据难以大规模收集。虽然合成增强可以扩展偏好监督，但现有方法通常均匀增强或在表示层面增强，而不针对奖励模型不确定或容易误排序的示例。在本文中，我们介绍了MARS（面向奖励建模的边界与语义感知数据增强），一种自适应增强框架，优先考虑低边界偏好对，并使用语义距离作为第二层细化，以增强选择响应和拒绝响应之间的对比。在多个偏好数据集、奖励模型骨干、下游对齐设置以及包括RewardBench和AlpacaEval在内的基准测试中，MARS在奖励模型质量和对齐性能上都优于现有基线。我们的结果表明，当同时由模型边界和语义结构引导时，奖励模型增强最为有效。

英文摘要

Reward modeling is central to alignment pipelines such as RLHF, RLAIF, and PPO-based policy optimization, yet its reliability is constrained by limited and heterogeneous human preference data that are expensive to collect at scale. While synthetic augmentation can expand preference supervision, existing methods often augment uniformly or at the representation level, without targeting examples where the reward model is uncertain or prone to mis-ranking. In this paper, we introduce MARS (Margin and Semantic-Aware Data Augmentation for Reward Modeling), an adaptive augmentation framework that prioritizes low-margin preference pairs and uses semantic distance as a second layer for refinement to enhance the contrast between the chosen and rejected responses. Across multiple preference datasets, reward-model backbones, downstream alignment settings, and benchmarks including RewardBench and AlpacaEval, MARS improves both reward-model quality and alignment performance over existing baselines. Our results show that reward-model augmentation is most effective when guided by both model margins and semantic structure.
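
The prioritization step described above can be illustrated with a small sketch. It is an assumption-laden reading of the abstract, not the MARS implementation: the margin is taken to be the reward of the chosen response minus that of the rejected one, and "prioritize" is taken to mean picking the k pairs with the smallest margin. The semantic-distance refinement is not shown, and the function name is hypothetical.

```python
import numpy as np

def select_low_margin_pairs(chosen_rewards, rejected_rewards, k):
    """Return indices of the k preference pairs with the smallest reward margin."""
    # Margin = r(chosen) - r(rejected); negative values are mis-ranked pairs.
    margins = np.asarray(chosen_rewards, dtype=float) - np.asarray(rejected_rewards, dtype=float)
    # Stable sort so ties keep their original order.
    order = np.argsort(margins, kind="stable")
    return order[:k], margins

# Illustrative reward-model scores for four preference pairs.
chosen = [2.0, 0.5, 1.0, 3.0]
rejected = [0.0, 0.4, 1.5, 2.9]
picked, margins = select_low_margin_pairs(chosen, rejected, k=2)
print(picked, margins)
```

Sorting by signed margin puts mis-ranked pairs first and near-ties next, which matches the abstract's emphasis on examples where the reward model is uncertain or prone to mis-ranking.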


2602.17234 2026-05-26 cs.AI cs.LG 版本更新

All Leaks Count, Some Count More: Interpretable Temporal Contamination Detection and Mitigation in LLM Backtesting

所有泄漏都重要，有些泄漏更重要：LLM回测中可解释的时间污染检测与缓解

Zeyu Zhang, Ryan Chen, Bradly C. Stadie

发表机构 * Department of Statistics and Data Science, Northwestern University（西北大学统计与数据科学系）； Bridgewater AIA Labs（桥水AIA实验室）

AI总结提出基于Shapley值的声明级评估框架Shapley-DCLR和推理时架构TimeSPEC，用于检测和缓解LLM回测中的时间污染问题。

Comments 8 pages plus appendix

详情

AI中文摘要

在已有结果的事件上对LLM进行回测，隐含的假设是模型仅基于截止前知识进行推理，然而预训练模型不可避免地会泄漏截止后知识。我们引入了一个声明级评估框架，将预测理由分解为原子声明，并应用Shapley值量化每个声明的决策影响，从而得到Shapley-DCLR（Shapley加权的决策关键泄漏率，Shapley-weighted Decision-Critical Leakage Rate），这是一个可解释的度量，用于衡量决策驱动推理中被污染的比例。我们进一步提出TimeSPEC（基于提取声明的时间监督预测，Time-Supervised Prediction with Extracted Claims），一种推理时架构，它将时间过滤的检索与声明级监督交织在一起，生成完全基于截止前证据的预测。在三个LLM上的消融实验证实检索和监督缺一不可；三项任务探测进一步说明，时间强制的性能成本与每个任务对截止后信息的依赖程度成正比。

英文摘要

Backtesting LLMs on resolved events assumes models reason only from pre-cutoff knowledge, yet pretrained models inevitably leak post-cutoff knowledge. We introduce a claim-level evaluation framework that decomposes prediction rationales into atomic claims and applies Shapley values to quantify each claim's decision impact, yielding Shapley-DCLR (Shapley-weighted Decision-Critical Leakage Rate), an interpretable metric measuring what fraction of decision-driving reasoning is contaminated. We further propose TimeSPEC (Time-Supervised Prediction with Extracted Claims), an inference-time architecture that interleaves temporally-filtered retrieval with claim-level supervision, producing predictions grounded entirely in pre-cutoff evidence. Across three LLMs, the ablation experiments confirm that retrieval and supervision are jointly necessary, and a three-task probe further illustrates that the performance cost of temporal enforcement scales with each task's reliance on post-cutoff information.
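
As a rough illustration of how Shapley values can weight claim-level leakage, the sketch below computes exact Shapley values for a handful of claims and then a weighted leakage ratio. The abstract does not give the formula for Shapley-DCLR, so the ratio used here (absolute Shapley mass on leaked claims divided by total absolute Shapley mass) is an assumption, as are the toy value function, the claim contributions, and all names.

```python
from itertools import combinations
from math import factorial

def shapley_values(n_players, value_fn):
    """Exact Shapley values for a coalition value function over players 0..n-1."""
    phi = [0.0] * n_players
    for i in range(n_players):
        others = [p for p in range(n_players) if p != i]
        for k in range(len(others) + 1):
            # Probability weight of a coalition of size k preceding player i.
            weight = factorial(k) * factorial(n_players - k - 1) / factorial(n_players)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi

def weighted_leakage_rate(phi, leaked):
    """Share of absolute Shapley mass carried by claims flagged as leaked (assumed definition)."""
    total = sum(abs(p) for p in phi)
    return sum(abs(p) for p, flag in zip(phi, leaked) if flag) / total if total else 0.0

# Toy example: three claims with additive influence on the decision.
contrib = [0.5, 0.3, 0.2]
value_fn = lambda s: sum(contrib[j] for j in s)
phi = shapley_values(3, value_fn)
rate = weighted_leakage_rate(phi, leaked=[True, False, False])
print(phi, rate)
```

Exact enumeration costs on the order of 2^n value-function calls per claim, so it is only practical for a small number of claims; larger rationales would need sampling-based approximations.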


2602.16229 2026-05-26 cs.LG 版本更新

上下文展开赌博机：面向可验证奖励的强化学习

Xiaodong Lu, Xiaohan Wang, Jiajun Chai, Guojun Yin, Wei Lin, Zhijun Chen, Yu Luo, Fuzhen Zhuang, Yikun Ban, Deqing Wang

发表机构 * School of Computer Science and Engineering, Beihang University（北京航空航天大学计算机科学与工程学院）； School of Artificial Intelligence, Beihang University（北京航空航天大学人工智能学院）； Huawei（华为）

AI总结针对RLVR中展开使用无差别、短视导致的问题，提出上下文赌博机框架，自适应选择高价值展开，提升训练效率与性能。

详情

AI中文摘要

可验证奖励的强化学习（RLVR）是提升大型语言模型推理能力的有效范式。然而，现有RLVR方法以无差别和短视的方式使用展开：每个提示内不同质量的响应被统一对待，且历史展开在单次使用后被丢弃。这导致监督噪声大、样本效率低以及策略更新次优。我们通过将RLVR中的展开调度形式化为上下文赌博机问题，并提出一个统一的神经调度框架来解决这些问题，该框架在整个训练过程中自适应地选择高价值展开。每个展开被视为一个臂，其奖励由连续优化步骤之间诱导的性能增益定义。由此产生的调度器支持噪声感知的组内选择和历史展开的自适应全局重用，所有这些都在一个统一的原则性框架内。我们通过推导次线性遗憾界并证明扩大展开缓冲区可改善可实现性能上限，提供了理论依据。在六个数学推理基准上的实验表明，在多种RLVR优化方法中，性能和训练效率均有一致的提升。

英文摘要

Reinforcement Learning with Verifiable Rewards (RLVR) is an effective paradigm for improving the reasoning capabilities of large language models. However, existing RLVR methods utilize rollouts in an indiscriminate and short-horizon manner: responses of heterogeneous quality within each prompt are treated uniformly, and historical rollouts are discarded after a single use. This leads to noisy supervision, poor sample efficiency, and suboptimal policy updates. We address these issues by formulating rollout scheduling in RLVR as a contextual bandit problem and proposing a unified neural scheduling framework that adaptively selects high-value rollouts throughout training. Each rollout is treated as an arm whose reward is defined by the induced performance gain between consecutive optimization steps. The resulting scheduler supports both noise-aware intra-group selection and adaptive global reuse of historical rollouts within a single principled framework. We provide theoretical justification by deriving sublinear regret bounds and showing that enlarging the rollout buffer improves the achievable performance upper bound. Experiments on six mathematical reasoning benchmarks demonstrate consistent gains in performance and training efficiency across multiple RLVR optimization methods.


2602.07518 2026-05-26 cs.ET cs.AR cs.LG nlin.AO 版本更新

Physical Analogue Kolmogorov-Arnold Networks based on Reconfigurable Nonlinear-Processing Units

基于可重构非线性处理单元的物理模拟Kolmogorov-Arnold网络

Manuel Escudero, Mohamadreza Zolfagharinejad, Sjoerd van den Belt, Nikolaos Alachiotis, Wilfred G. van der Wiel

发表机构 * NanoElectronics Group, MESA+ Institute and BRAINS Center for Brain-Inspired Computing, University of Twente（特文特大学纳米电子学组、MESA+研究所及BRAINS类脑计算中心）； CAES Group and BRAINS Center for Brain-Inspired Computing, University of Twente（特文特大学CAES组及BRAINS类脑计算中心）

AI总结提出一种基于可重构非线性处理单元（RNPU）的物理模拟KAN架构，通过硬件实现可编程非线性变换，在回归和分类任务中以更少参数达到与MLP相当的精度，并实现约10²-10³倍能效提升和约10倍面积缩减。

详情

AI中文摘要

Kolmogorov-Arnold网络（KAN）将神经计算从线性层转移到可学习的非线性边函数，但在硬件中高效实现这些非线性仍是一个开放挑战。本文介绍了一种物理模拟KAN架构，其中边函数通过可重构非线性处理单元（RNPU）在材料中实现：RNPU是多端纳米级硅器件，其输入输出特性通过控制电压调谐。通过将多个RNPU组合成边处理器，并将这些模块组装成具有集成混合信号接口的可重构模拟KAN（aKAN）架构，我们建立了一个现实的系统级硬件实现，能够以可编程非线性变换实现紧凑的KAN式回归和分类。使用实验校准的RNPU模型和硬件测量，我们展示了在任务复杂度增加时准确函数逼近的能力，同时所需可训练参数少于或相当于多层感知器（MLP）。系统级估计表明，对于代表性工作负载，每次推理能耗约为250 pJ，端到端推理延迟约为600 ns，与具有相似逼近误差的数字定点MLP相比，能耗降低约10²-10³倍，面积减少约10倍。这些结果确立了RNPU作为可扩展的硬件原生非线性计算基元，并指出模拟KAN架构是实现能量、延迟和面积高效的模拟神经网络硬件的现实硅基路径，特别适用于边缘推理。

English abstract

Kolmogorov-Arnold Networks (KANs) shift neural computation from linear layers to learnable nonlinear edge functions, but implementing these nonlinearities efficiently in hardware remains an open challenge. Here we introduce a physical analogue KAN architecture in which edge functions are realized in materia using reconfigurable nonlinear-processing units (RNPUs): multi-terminal nanoscale silicon devices whose input-output characteristics are tuned via control voltages. By combining multiple RNPUs into an edge processor and assembling these blocks into a reconfigurable analogue KAN (aKAN) architecture with integrated mixed-signal interfacing, we establish a realistic system-level hardware implementation that enables compact KAN-style regression and classification with programmable nonlinear transformations. Using experimentally calibrated RNPU models and hardware measurements, we demonstrate accurate function approximation across increasing task complexity while requiring fewer or comparable trainable parameters than multilayer perceptrons (MLPs). System-level estimates indicate an energy per inference of $\sim$250 pJ and an end-to-end inference latency of $\sim$600 ns for a representative workload, corresponding to a $\sim$10$^{2}$-10$^{3}\times$ reduction in energy accompanied by a $\sim$10$\times$ reduction in area compared to a digital fixed-point MLP at similar approximation error. These results establish RNPUs as scalable, hardware-native nonlinear computing primitives and identify analogue KAN architectures as a realistic silicon-based pathway toward energy-, latency-, and footprint-efficient analogue neural-network hardware, particularly for edge inference.

2602.06717 2026-05-26 cs.LG cs.AI (updated version)

F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare

F-GRPO: 别让你的策略学到显而易见的而忘记罕见的

Daniil Plyusov, Alexey Gorbatovski, Boris Shaposhnikov, Viacheslav Sinii, Alexey Malakhov, Daria Korotyshova, Daniil Gavrilov

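Affiliations: T-Tech

AI summary: Addresses the problem that finite sampled groups in reinforcement learning can miss rare correct trajectories, and proposes F-GRPO, a difficulty-aware scaling coefficient inspired by Focal loss that improves mathematical reasoning performance without increasing group size or computational cost.

AI Chinese abstract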

基于可验证奖励的强化学习通常依赖组采样来估计优势并稳定策略更新。实践中，计算限制往往排除非常大的组，因此训练使用有限的rollout集合，这些集合只能强化它们暴露的正确行为。在实际组大小下，更新可能会遗漏罕见的正确轨迹，同时仍然包含混合奖励，将概率集中在更常见的采样解上。我们推导了这种提示局部尾部遗漏事件作为组大小函数的概率，展示了非单调行为，并在分类抽象中描述了未采样的正确质量如何在总正确质量增长时缩小。受此分析启发，我们提出了一种难度感知缩放系数，灵感来自Focal loss，它降低了高成功采样组的更新权重。经验上，分类模拟在分类设置中展示了相同效果，Maze提供了单解测试，LLM实验包括代表性的GRPO组大小扫描以及GRPO、DAPO和CISPO之间的固定N迁移。在Qwen2.5-7B上，N=8时，我们的方法将平均数学pass@256从64.1提高到70.3（GRPO），69.3提高到72.5（DAPO），73.2提高到76.8（CISPO）；在所有三种情况下，OOD pass@256也得到改善，且不增加组大小或计算成本。

English abstract

Reinforcement Learning with Verifiable Rewards (RLVR) is commonly based on group sampling to estimate advantages and stabilize policy updates. In practice, computational limits often rule out very large groups, so training proceeds with finite rollout sets that can reinforce only the correct behavior they expose. At practical group sizes, updates can miss rare-correct trajectories while still containing mixed rewards, concentrating probability on more common sampled solutions. We derive the probability of such prompt-local tail-miss events as a function of group size, showing non-monotonic behavior, and in the categorical abstraction characterize how unsampled-correct mass can shrink even as total correct mass grows. Motivated by this analysis, we propose a difficulty-aware scaling coefficient, inspired by Focal loss, that down-weights updates on high-success sampled groups. Empirically, categorical simulation illustrates the same effect in the categorical setting, Maze provides a single-solution test, and LLM experiments include a representative GRPO group-size sweep together with fixed-$N$ transfer across GRPO, DAPO, and CISPO. On Qwen2.5-7B at $N{=}8$, our method improves average math pass@256 from 64.1 $\rightarrow$ 70.3 (GRPO), 69.3 $\rightarrow$ 72.5 (DAPO), and 73.2 $\rightarrow$ 76.8 (CISPO); OOD pass@256 also improves in all three cases, without increasing group size or computational cost.
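
The non-monotonic tail-miss behavior and the down-weighting idea described above can be illustrated with a small sketch. This is not the paper's derivation: it assumes a hypothetical three-outcome categorical policy per prompt (a common correct answer, a rare correct answer, and an incorrect answer) and N independent samples. Under that assumption, the probability that a group has mixed rewards yet contains no rare-correct sample is (p_common + p_wrong)^N - p_common^N - p_wrong^N. The function names and the coefficient form (1 - success_rate)^gamma are likewise assumptions chosen for illustration.

```python
def tail_miss_prob(n: int, p_common: float, p_rare: float) -> float:
    """Probability that n i.i.d. samples contain both a correct and an
    incorrect answer but no rare-correct answer (assumed 3-outcome model)."""
    p_wrong = 1.0 - p_common - p_rare
    # No rare sample at all, minus the two single-reward cases
    # (all common-correct, all wrong), which carry no mixed reward.
    return (p_common + p_wrong) ** n - p_common ** n - p_wrong ** n


def focal_weight(success_rate: float, gamma: float = 1.0) -> float:
    """Hypothetical Focal-loss-style coefficient: groups that already
    succeed often receive a smaller update weight."""
    return (1.0 - success_rate) ** gamma
```

Evaluating `tail_miss_prob` for N = 1, 8, 256 with `p_common=0.3` and `p_rare=0.02` shows the probability starting at zero, rising at moderate group sizes, and decaying again for very large groups.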

2602.05052 2026-05-26 cs.LG (updated version)

Learning, Solving and Optimizing PDEs with TensorGalerkin: an efficient high-performance Galerkin assembly algorithm

使用TensorGalerkin学习、求解和优化PDE：一种高效的高性能Galerkin组装算法

Shizheng Wen, Mingyuan Chi, Tianwei Yu, Ben Moseley, Mike Yan Michelis, Pu Ren, Hao Sun, Siddhartha Mishra

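Affiliations: ETH Zurich, Switzerland; Imperial College London, UK; Northeastern University, USA; Renmin University of China, China

AI summary: Proposes a unified algorithmic framework based on Galerkin discretization that assembles linear systems with an O(1)-node autograd graph by tensorizing element-wise operations and reducing them with a sparse matrix multiplication, supporting efficient solution, constrained optimization, and physics-informed learning of variational PDEs.

AI Chinese abstract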

我们提出了一个统一的算法框架，用于具有变分结构的PDE的数值求解、约束优化和物理信息学习。该框架基于底层变分形式的Galerkin离散化，其高效率源于一种新颖的高度优化且兼容GPU的TensorGalerkin框架，用于线性系统组装（刚度矩阵和载荷向量）。TensorGalerkin通过在Python级Map阶段张量化元素操作，然后使用稀疏矩阵乘法进行全局归约，该乘法在网格诱导的稀疏图上执行消息传递。Map和Reduce阶段在PyTorch的autograd内部协同设计，使得组装图包含O(1)个节点，无论元素数量和局部自由度如何缩放。我们通过将TensorGalerkin部署为i)高效的数值PDE求解器，ii)用于PDE约束优化的端到端可微框架，以及iii)用于PDE的物理信息算子学习算法，验证了这种O(1)图属性。通过多个基准测试，包括非结构化网格上的2D和3D椭圆、抛物线和双曲PDE，我们证明了所提出的框架在所有目标下游应用中相比各种基线提供了显著的计算效率和精度提升。

English abstract

We present a unified algorithmic framework for the numerical solution, constrained optimization, and physics-informed learning of PDEs with a variational structure. Our framework is based on a Galerkin discretization of the underlying variational forms, and its high efficiency stems from a novel highly-optimized and GPU-compliant TensorGalerkin framework for linear system assembly (stiffness matrices and load vectors). TensorGalerkin operates by tensorizing element-wise operations within a Python-level Map stage and then performs global reduction with a sparse matrix multiplication that performs message passing on the mesh-induced sparsity graph. The Map and Reduce stages are co-designed inside PyTorch's autograd so that the assembly graph contains $O(1)$ nodes regardless of how the number of elements and local DoFs scale. We validate this $O(1)$-graph property by deploying TensorGalerkin downstream as i) a highly-efficient numerical PDEs solver, ii) an end-to-end differentiable framework for PDE-constrained optimization, and iii) a physics-informed operator learning algorithm for PDEs. With multiple benchmarks, including 2D and 3D elliptic, parabolic, and hyperbolic PDEs on unstructured meshes, we demonstrate that the proposed framework provides significant computational efficiency and accuracy gains over a variety of baselines in all the targeted downstream applications.
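
The Map/Reduce split described above can be illustrated on the simplest possible case: stiffness assembly for the 1D Poisson problem with linear elements. This is a readability sketch in plain Python, not the TensorGalerkin implementation. It uses explicit loops where the paper describes a batched tensor Map and a sparse-matrix-multiplication Reduce, and the function names are hypothetical.

```python
def map_stage(nodes, elements):
    """Map: compute each element's local 2x2 stiffness matrix independently."""
    local = []
    for a, b in elements:
        h = nodes[b] - nodes[a]  # element length
        local.append([[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]])
    return local


def reduce_stage(n_nodes, elements, local):
    """Reduce: scatter-add local entries into the global (dense) matrix."""
    K = [[0.0] * n_nodes for _ in range(n_nodes)]
    for elem, Ke in zip(elements, local):
        for i, gi in enumerate(elem):
            for j, gj in enumerate(elem):
                K[gi][gj] += Ke[i][j]
    return K


def assemble(nodes, elements):
    """Full assembly: Map over elements, then Reduce into the global matrix."""
    return reduce_stage(len(nodes), elements, map_stage(nodes, elements))
```

Because the Map stage treats every element independently, it is the part that can be expressed as one batched operation; the Reduce stage is a fixed linear scatter determined only by the mesh connectivity.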

2602.04139 2026-05-26 cs.LG physics.comp-ph (updated version)

Generative Neural Operators through Diffusion Last Layer

通过扩散最后一层的生成式神经算子

Sungwon Park, Anthony Zhou, Hongjoong Kim, Amir Barati Farimani

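Affiliations: Korea University, Seoul, South Korea; Carnegie Mellon University, Pittsburgh, USA

AI summary: Proposes the diffusion last layer (DLL), a probabilistic output head for neural operators that models distributions efficiently through a Karhunen-Loève-inspired expansion and a conditional diffusion model over the coefficient space, improving distributional fidelity and uncertainty estimates on stochastic PDE benchmarks and deterministic long-horizon rollout tasks.

Comments ICML 2026, code is available at https://github.com/sungwpark/dll-no

AI Chinese abstract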

神经算子为学习函数空间之间的离散化不变映射提供了强大框架，但标准确定性模型无法捕捉预测不确定性。我们引入了扩散最后一层（DLL），一种用于神经算子主干的模块化概率输出头。DLL通过受Karhunen-Loéve展开启发的输入依赖低秩展开表示目标场，并在相应系数空间上学习条件扩散模型。这种设计使得在保留算子学习结构优势的同时实现高效的分布建模。在具有随机强迫的随机PDE基准测试中，DLL实现了强分布保真度，并与像素空间和传统潜在扩散基线竞争。在确定性长时滚动任务中，DLL提高了底层主干的滚动稳定性，并在复合自回归误差下提供了有用的预测不确定性估计。这些结果表明，在学习到的系数空间中进行扩散建模为不确定性感知神经算子提供了一条实用途径。

English abstract

Neural operators provide a powerful framework for learning discretization invariant mappings between function spaces, but standard deterministic models do not capture predictive uncertainty. We introduce diffusion last layer (DLL), a modular probabilistic output head for neural operator backbones. DLL represents target fields through an input dependent low rank expansion inspired by the Karhunen-Loéve expansion and learns a conditional diffusion model over the corresponding coefficient space. This design enables efficient distributional modeling while preserving the structural advantages of operator learning. On stochastic PDE benchmarks with random forcing, DLL achieves strong distributional fidelity and performs competitively with pixel space and conventional latent diffusion baselines. In deterministic long horizon rollout tasks, DLL improves rollout stability over the underlying backbone and provides useful estimates of predictive uncertainty under compounding autoregressive errors. These results suggest that diffusion modeling in learned coefficient spaces offers a practical route to uncertainty aware neural operators.
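
In symbols, the construction described above might be written as follows. The notation is assumed here for illustration and is not taken from the paper: a is the input, u the target field, K the rank, phi_k the input-dependent basis functions, and c_k the coefficients.

```latex
u(x) \;\approx\; \sum_{k=1}^{K} c_k \, \phi_k(x; a),
\qquad
(c_1, \dots, c_K) \sim p_\theta(\,\cdot \mid a)
```

Here the basis would be produced by the operator backbone, and p_theta stands for the conditional diffusion model, which only has to model a K-dimensional coefficient vector rather than the full discretized field.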

2602.04120 2026-05-26 cs.LG cs.AI cs.DC cs.SE (updated version)

Scalable Explainability-as-a-Service (XaaS) for Edge AI Systems

面向边缘AI系统的可扩展可解释性即服务（XaaS）

Samaresh Kumar Singh, Joyjit Roy

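AI summary: Proposes Explainability-as-a-Service (XaaS), a distributed architecture that decouples inference from explanation generation and combines a semantic cache, lightweight verification, and an adaptive explanation engine to deliver low-latency, high-fidelity explanations on edge devices, reducing latency by 38% across three real-world use cases.

Comments 8 pages, 5 figures, 2 tables. This version updates metadata after publication in IEEE Xplore and publication by SoutheastCon 2026

DOI: 10.1109/SoutheastCon63549.2026.11476268
Journal ref: 2026 IEEE SoutheastCon, Huntsville, AL, USA, 2026

AI Chinese abstract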

尽管可解释人工智能（XAI）取得了显著进展，但其在边缘和物联网系统中的集成通常是临时且低效的。当前大多数方法以“耦合”方式运行，即解释生成与模型推理同时进行。因此，这些方法在异构边缘设备上部署时会产生冗余计算、高延迟和可扩展性差的问题。本文提出可解释性即服务（XaaS），一种将可解释性视为一等系统服务（而非模型特定功能）的分布式架构。我们提出的XaaS架构的关键创新在于解耦推理与解释生成，使边缘设备能够在资源和延迟约束下请求、缓存和验证解释。为此，我们引入三项主要创新：（1）基于语义相似性的分布式解释缓存检索方法，显著减少冗余计算；（2）轻量验证协议，确保缓存和新生成解释的保真度；（3）自适应解释引擎，根据设备能力和用户需求选择解释方法。我们在三个实际边缘AI用例上评估了XaaS的性能：（i）制造质量控制；（ii）自动驾驶车辆感知；（iii）医疗诊断。实验结果表明，XaaS在三个实际部署中延迟降低38%，同时保持高解释质量。总体而言，本工作使得在大规模异构物联网系统中部署透明和可问责的AI成为可能，并弥合了XAI研究与边缘实用性之间的差距。

English abstract

Though Explainable AI (XAI) has made significant advancements, its inclusion in edge and IoT systems is typically ad-hoc and inefficient. Most current methods are "coupled" in such a way that they generate explanations simultaneously with model inferences. As a result, these approaches incur redundant computation, high latency and poor scalability when deployed across heterogeneous sets of edge devices. In this work we propose Explainability-as-a-Service (XaaS), a distributed architecture for treating explainability as a first-class system service (as opposed to a model-specific feature). The key innovation in our proposed XaaS architecture is that it decouples inference from explanation generation allowing edge devices to request, cache and verify explanations subject to resource and latency constraints. To achieve this, we introduce three main innovations: (1) A distributed explanation cache with a semantic similarity based explanation retrieval method which significantly reduces redundant computation; (2) A lightweight verification protocol that ensures the fidelity of both cached and newly generated explanations; and (3) An adaptive explanation engine that chooses explanation methods based upon device capability and user requirement. We evaluated the performance of XaaS on three real-world edgeAI use cases: (i) manufacturing quality control; (ii) autonomous vehicle perception; and (iii) healthcare diagnostics. Experimental results show that XaaS reduces latency by 38% while maintaining high explanation quality across three real-world deployments. Overall, this work enables the deployment of transparent and accountable AI across large scale, heterogeneous IoT systems, and bridges the gap between XAI research and edge-practicality.
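
The first of the three components, retrieving a cached explanation by semantic similarity, can be sketched as below. This is a minimal single-process illustration, not the XaaS design: the class name, the cosine-similarity measure, and the 0.9 threshold are all assumptions, and the paper's distributed cache and verification protocol are not modeled.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class ExplanationCache:
    """Hypothetical cache keyed by input embeddings."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, explanation) pairs

    def put(self, embedding, explanation):
        self.entries.append((list(embedding), explanation))

    def get(self, embedding):
        """Return the most similar cached explanation whose similarity is
        at least the threshold, or None on a cache miss."""
        best, best_sim = None, self.threshold
        for emb, expl in self.entries:
            sim = cosine(embedding, emb)
            if sim >= best_sim:
                best, best_sim = expl, sim
        return best
```

On a miss (`get` returns `None`), a caller would generate a fresh explanation and `put` it, which is where redundant computation across similar inputs would be saved.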

2602.02979 2026-05-26 cs.CL cs.LG (updated version)

CPMobius: Iterative Coach-Player Reasoning for Data-Free Reinforcement Learning

CPMobius: 无数据强化学习的迭代式教练-玩家推理

Ran Li, Zeyuan Liu, Yinghao Chen, Bingxiang He, Jiarui Yuan, Zixuan Fu, Weize Chen, Jinyi Hu, Chen Qian, Zhiyuan Liu, Maosong Sun

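Affiliations: Tsinghua University; University of Cambridge; Shanghai Jiao Tong University

AI summary: Proposes CPMobius, a collaborative Coach-Player paradigm that improves mathematical reasoning through a cooperative optimization loop without external training data. On Qwen2.5-Math-7B-Instruct it improves overall accuracy by an average of +4.9 and out-of-distribution (OOD) accuracy by an average of +5.4.

Comments Accepted to the ICML 2026

AI Chinese abstract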

大型语言模型（LLMs）在复杂推理方面展现出强大潜力，但其进展仍从根本上受限于对大规模高质量人工策划任务和标签的依赖，无论是通过监督微调（SFT）还是基于推理特定数据的强化学习（RL）。这种依赖使得监督密集型训练范式日益不可持续，实践中已出现可扩展性减弱的迹象。为克服这一限制，我们引入了CPMöbius（CPMobius），一种用于推理模型无数据强化学习的协作式教练-玩家范式。与传统对抗性自博弈不同，CPMöbius受现实世界人类体育协作和多智能体协作启发，将教练和玩家视为独立但合作的角色。教练针对玩家的能力提出指令，并根据玩家表现的变化获得奖励，而玩家则因解决教练生成的越来越有指导性的任务而获得奖励。这种合作优化循环旨在直接提升玩家的数学推理能力。值得注意的是，CPMöbius在不依赖任何外部训练数据的情况下实现了显著改进，优于现有的无监督方法。例如，在Qwen2.5-Math-7B-Instruct上，我们的方法总体准确率平均提升4.9%，分布外（OOD）准确率平均提升5.4%，总体准确率超过RENT 1.5%，OOD准确率超过R-zero 4.2%。我们的代码库已在https://github.com/thunlp/CPMobius发布。

English abstract

Large Language Models (LLMs) have demonstrated strong potential in complex reasoning, yet their progress remains fundamentally constrained by reliance on massive high-quality human-curated tasks and labels, either through supervised fine-tuning (SFT) or reinforcement learning (RL) on reasoning-specific data. This dependence renders supervision-heavy training paradigms increasingly unsustainable, with signs of diminishing scalability already evident in practice. To overcome this limitation, we introduce CPMöbius (CPMobius), a collaborative Coach-Player paradigm for data-free reinforcement learning of reasoning models. Unlike traditional adversarial self-play, CPMöbius, inspired by real world human sports collaboration and multi-agent collaboration, treats the Coach and Player as independent but cooperative roles. The Coach proposes instructions targeted at the Player's capability and receives rewards based on changes in the Player's performance, while the Player is rewarded for solving the increasingly instructive tasks generated by the Coach. This cooperative optimization loop is designed to directly enhance the Player's mathematical reasoning ability. Remarkably, CPMöbius achieves substantial improvement without relying on any external training data, outperforming existing unsupervised approaches. For example, on Qwen2.5-Math-7B-Instruct, our method improves accuracy by an overall average of +4.9 and an out-of-distribution average of +5.4, exceeding RENT by +1.5 on overall accuracy and R-zero by +4.2 on OOD accuracy. Our codebase has been released at https://github.com/thunlp/CPMobius.

2602.02495 2026-05-26 cs.CL cs.AI cs.LG (updated version)

Reward-free Alignment for Conflicting Objectives

无奖励的冲突目标对齐

Peter Chen, Xiaopeng Li, Xi Chen, Tianyi Lin

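Affiliations: Columbia University

AI summary: Proposes RACO, a reward-free framework that works directly on pairwise preference data and resolves conflicts among multiple objectives through a clipped variant of conflict-averse gradient descent, with convergence guarantees to Pareto-critical points and better Pareto trade-offs than existing multi-objective alignment baselines.

Comments Accepted to ICML 2026 (Oral)

AI Chinese abstract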

直接对齐方法越来越多地用于将大型语言模型（LLMs）与人类偏好对齐。然而，许多现实世界的对齐问题涉及多个相互冲突的目标，简单的偏好聚合可能导致训练不稳定和糟糕的权衡。特别是，加权损失方法可能无法识别同时改善所有目标的更新方向，而现有的多目标方法通常依赖显式奖励模型，增加了额外复杂性并扭曲了用户指定的偏好。本文的贡献有两方面。首先，我们提出了一种用于冲突目标的无奖励对齐框架（RACO），该框架直接利用成对偏好数据，并通过一种新颖的冲突规避梯度下降的裁剪变体解决梯度冲突。我们提供了收敛到尊重用户指定目标权重的帕累托临界点的保证，并进一步证明在双目标设置中裁剪可以严格改善收敛速度。其次，我们使用一些启发式方法改进了我们的方法，并进行了实验，以证明所提框架在LLM对齐中的兼容性。在多个LLM家族（Qwen 3、Llama 3、Gemma 3）上的多目标摘要和安全对齐任务的定性和定量评估表明，与现有的多目标对齐基线相比，我们的方法始终能实现更好的帕累托权衡。

English abstract

Direct alignment methods are increasingly used to align large language models (LLMs) with human preferences. However, many real-world alignment problems involve multiple conflicting objectives, where naive aggregation of preferences can lead to unstable training and poor trade-offs. In particular, weighted loss methods may fail to identify update directions that simultaneously improve all objectives, and existing multi-objective approaches often rely on explicit reward models, introducing additional complexity and distorting user-specified preferences. The contributions of this paper are two-fold. First, we propose a Reward-free Alignment framework for Conflicted Objectives (RACO) that directly leverages pairwise preference data and resolves gradient conflicts via a novel clipped variant of conflict-averse gradient descent. We provide convergence guarantees to Pareto-critical points that respect user-specified objective weights, and further show that clipping can strictly improve convergence rate in the two-objective setting. Second, we improve our method using some heuristics and conduct experiments to demonstrate the compatibility of the proposed framework for LLM alignment. Both qualitative and quantitative evaluations on multi-objective summarization and safety alignment tasks across multiple LLM families (Qwen 3, Llama 3, Gemma 3) show that our method consistently achieves better Pareto trade-offs compared to existing multi-objective alignment baselines.

2602.01322 2026-05-26 cs.LG cs.CL (updated version)

PolySAE: Modeling Feature Interactions in Sparse Autoencoders via Polynomial Decoding

PolySAE: 通过多项式解码建模稀疏自编码器中的特征交互

Panagiotis Koromilas, Andreas D. Demou, James Oldfield, Yannis Panagakis, Mihalis Nicolaou

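Affiliations: The Cyprus Institute; University of Athens; University of Oxford; Archimedes AI/Athena Research Center; University of Cyprus

AI summary: Proposes PolySAE, which adds higher-order terms to the sparse autoencoder decoder to model feature interactions, capturing pairwise and triple interactions through low-rank tensor factorization on a shared projection subspace. It improves probing F1 by about 8% while preserving interpretability, and its learned interaction weights are largely independent of co-occurrence frequency.

Comments 43rd International Conference on Machine Learning (ICML 2026); Code: https://github.com/pakoromilas/PolySAE

AI Chinese abstract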

稀疏自编码器（SAE）通过将激活分解为字典原子的稀疏组合来解释神经网络表示。然而，SAE假设特征通过线性重建相加组合，这种假设无法捕捉组合结构：线性模型无法区分“Starbucks”是由“star”和“coffee”特征的组合还是仅由它们的共现产生。这迫使SAE为复合概念分配整体特征，而不是将其分解为可解释的组成部分。我们引入了PolySAE，它通过高阶项扩展SAE解码器以建模特征交互，同时保留对可解释性至关重要的线性编码器。通过在共享投影子空间上进行低秩张量分解，PolySAE以较小的参数开销（GPT2上为3%）捕获成对和三元特征交互。在四个语言模型和三个SAE变体上，PolySAE在保持可比重建误差的同时，探测F1平均提升约8%，并产生类别条件特征分布之间2-10倍更大的Wasserstein距离。关键的是，学习到的交互权重与共现频率的相关性可忽略不计（r = 0.06，而SAE特征协方差为r = 0.82），表明多项式项捕获了很大程度上独立于表面统计的组合结构。最后，学习到的交互方向因果性地将模型输出引导向相应的组合语义。

English abstract

Sparse autoencoders (SAEs) interpret neural network representations by decomposing activations into sparse combinations of dictionary atoms. However, SAEs assume features combine additively through linear reconstruction, an assumption that cannot capture compositional structure: linear models cannot distinguish whether ''Starbucks'' arises from the composition of ''star'' and ''coffee'' features or merely their co-occurrence. This forces SAEs to allocate monolithic features for compound concepts rather than decomposing them into interpretable constituents. We introduce PolySAE, which extends the SAE decoder with higher-order terms to model feature interactions while preserving the linear encoder essential for interpretability. Through low-rank tensor factorization on a shared projection subspace, PolySAE captures pairwise and triple feature interactions with small parameter overhead (3% on GPT2). Across four language models and three SAE variants, PolySAE achieves an average improvement of $\sim$8% in probing F1 while maintaining comparable reconstruction error, and produces 2--10$\times$ larger Wasserstein distances between class-conditional feature distributions. Critically, learned interaction weights exhibit negligible correlation with co-occurrence frequency ($r = 0.06$ vs $r = 0.82$ for SAE feature covariance), suggesting that polynomial terms capture compositional structure largely independent of surface statistics. Finally, the learned interaction directions causally steer model outputs toward the corresponding compositional semantics.

2602.01183 2026-05-26 cs.CV cs.LG (updated version)

Refining Context-Entangled Content Segmentation via Curriculum Selection and Anti-Curriculum Promotion

通过课程选择与反课程促进优化上下文纠缠内容分割

Chunming He, Rihan Zhang, Fengyang Xiao, Dingming Zhang, Zhiwen Cao, Sina Farsiu

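Affiliations: Duke University; Adobe

AI summary: Proposes CurriSeg, a dual-phase learning framework that combines curriculum and anti-curriculum principles, using dynamic data selection and Spectral-Blindness Fine-Tuning to improve robustness and generalization in context-entangled content segmentation.

Comments ICML 2026, 8 figures, 11 tables

AI Chinese abstract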

生物学习从简单到困难的任务逐步进行，逐渐增强感知和鲁棒性。受此原理启发，我们解决上下文纠缠内容分割（CECS）这一具有挑战性的场景，其中对象与周围环境共享内在视觉模式，如伪装目标检测。传统分割网络主要依赖架构增强，但往往忽略了在纠缠数据分布下控制鲁棒性的学习动态。我们引入CurriSeg，一个双阶段学习框架，统一了课程和反课程原则以提高表示可靠性。在课程选择阶段，CurriSeg基于样本损失的时间统计动态选择训练数据，区分困难但有信息的样本与噪声或模糊样本，从而实现稳定的能力增强。在反课程促进阶段，我们设计了频谱盲性微调，抑制高频成分以强制依赖低频结构和上下文线索，从而增强泛化能力。大量实验表明，CurriSeg在多种CECS基准上取得了一致的改进，无需增加参数或增加总训练时间，为进展与挑战如何相互作用以促进鲁棒且上下文感知的分割提供了原则性视角。代码将发布。

English abstract

Biological learning proceeds from easy to difficult tasks, gradually reinforcing perception and robustness. Inspired by this principle, we address Context-Entangled Content Segmentation (CECS), a challenging setting where objects share intrinsic visual patterns with their surroundings, as in camouflaged object detection. Conventional segmentation networks predominantly rely on architectural enhancements but often ignore the learning dynamics that govern robustness under entangled data distributions. We introduce CurriSeg, a dual-phase learning framework that unifies curriculum and anti-curriculum principles to improve representation reliability. In the Curriculum Selection phase, CurriSeg dynamically selects training data based on the temporal statistics of sample losses, distinguishing hard-but-informative samples from noisy or ambiguous ones, thus enabling stable capability enhancement. In the Anti-Curriculum Promotion phase, we design Spectral-Blindness Fine-Tuning, which suppresses high-frequency components to enforce dependence on low-frequency structural and contextual cues and thus strengthens generalization. Extensive experiments demonstrate that CurriSeg achieves consistent improvements across diverse CECS benchmarks without adding parameters or increasing total training time, offering a principled view of how progression and challenge interplay to foster robust and context-aware segmentation. Code will be released.

2601.22466 2026-05-26 cs.LG (updated version)

EvoEGF-Mol: Evolving Exponential Geodesic Flow for Structure-based Drug Design

EvoEGF-Mol：用于基于结构的药物设计的演化指数测地流

Yaowei Jin, Junjie Wang, Cheng Cao, Penglei Wang, Duo An, Qian Shi

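Affiliations: Lingang Laboratory; School of Information Science and Technology, ShanghaiTech University

AI summary: Addresses the mismatch between Euclidean and probabilistic spaces in structure-based drug design with EvoEGF-Mol, which represents molecules in a unified way through composite exponential-family distributions and an evolving exponential geodesic flow, achieving high geometric precision and interaction fidelity.

Comments Accepted to ICML 2026

AI Chinese abstract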

基于结构的药物设计（SBDD）旨在发现生物活性配体。传统方法在欧几里得空间和概率空间中分别构建连续原子坐标和离散化学类别的概率路径，导致与底层统计流形不匹配。我们通过使用复合指数族分布来表示分子来解决这个问题，其中坐标和类别在统一的自然参数空间中表示，并在Fisher-Rao度量下沿指数测地线同步演化。为了避免直接针对狄拉克分布的测地线导致的瞬时轨迹崩溃，我们提出了用于SBDD的演化指数测地流（EvoEGF-Mol），该方法用动态集中的分布替代静态狄拉克目标，并通过渐进参数细化架构进行训练。我们的模型在CrossDock上达到了参考级别的PoseBusters通过率（93.4%），展示了卓越的几何精度和相互作用保真度，同时在真实世界的MolGenBench任务中，在生物活性骨架恢复方面取得了优于基线方法的性能。代码可在https://github.com/BLEACH366/EvoEGF-Mol获取。

English abstract

Structure-Based Drug Design (SBDD) aims to discover bioactive ligands. Conventional approaches construct probability paths separately in Euclidean and probabilistic spaces for continuous atomic coordinates and discrete chemical categories, leading to a mismatch with the underlying statistical manifolds. We address this issue by representing molecules using composite exponential-family distributions, where coordinates and categories are represented within a unified natural parameter space to evolve synchronously along exponential geodesics under the Fisher-Rao metric. To avoid the instantaneous trajectory collapse induced by geodesics directly targeting Dirac distributions, we propose Evolving Exponential Geodesic Flow for SBDD (EvoEGF-Mol), which replaces static Dirac targets with dynamically concentrating distributions and is trained with a progressive-parameter-refinement architecture. Our model approaches a reference-level PoseBusters passing rate (93.4%) on CrossDock, demonstrating remarkable geometric precision and interaction fidelity, while achieving superior performance over baseline methods on real-world MolGenBench tasks for bioactive scaffold recovery. Code is available at https://github.com/BLEACH366/EvoEGF-Mol.

2601.21406 2026-05-26 cs.CV cs.LG (updated version)

Generation Enhances Understanding in Unified Multimodal Models via Multi-Representation Generation

通过多表示生成增强统一多模态模型的理解能力

Zihan Su, Hongyang Wei, Kangrui Cen, Yong Wang, Guanhua Chen, Chun Yuan, Xiangxiang Chu

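Affiliations: Tsinghua Shenzhen International Graduate School, Tsinghua University; AMAP, Alibaba Group; Shanghai Jiao Tong University; Southern University of Science and Technology

AI summary: Proposes UniMRG, which strengthens the understanding capabilities of unified multimodal models by training them to also generate pixel, depth, and segmentation representations of the input image, reducing hallucinations and improving spatial understanding.

Comments Code: https://github.com/Sugewud/UniMRG

AI Chinese abstract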

统一多模态模型（UMMs）在单一框架内整合了视觉理解和生成。其最终目标是创建一个理解和生成相互促进的循环。虽然最近的后训练方法成功利用理解来增强生成，但利用生成来改善理解的逆向方向仍基本未被探索。在这项工作中，我们提出了UniMRG（统一多表示生成），一种简单而有效的架构无关的后训练方法。UniMRG通过引入辅助生成任务来增强UMMs的理解能力。具体来说，我们训练UMMs生成输入图像的多种内在表示，即像素（重建）、深度（几何）和分割（结构），同时进行标准的视觉理解目标。通过综合这些多样化的表示，UMMs捕获关于外观、空间关系和结构布局的互补信息。因此，UMMs对视觉输入形成了更深入和全面的理解。跨多种UMM架构的大量实验表明，我们的方法显著增强了细粒度感知，减少了幻觉，并改善了空间理解，同时提升了生成能力。

English abstract

Unified Multimodal Models (UMMs) integrate both visual understanding and generation within a single framework. Their ultimate aspiration is to create a cycle where understanding and generation mutually reinforce each other. While recent post-training methods have successfully leveraged understanding to enhance generation, the reverse direction of utilizing generation to improve understanding remains largely unexplored. In this work, we propose UniMRG (Unified Multi-Representation Generation), a simple yet effective architecture-agnostic post-training method. UniMRG enhances the understanding capabilities of UMMs by incorporating auxiliary generation tasks. Specifically, we train UMMs to generate multiple intrinsic representations of input images, namely pixel (reconstruction), depth (geometry), and segmentation (structure), alongside standard visual understanding objectives. By synthesizing these diverse representations, UMMs capture complementary information regarding appearance, spatial relations, and structural layout. Consequently, UMMs develop a deeper and more comprehensive understanding of visual inputs. Extensive experiments across diverse UMM architectures demonstrate that our method notably enhances fine-grained perception, reduces hallucinations, and improves spatial understanding, while simultaneously boosting generation capabilities.

2601.21094 2026-05-26 cs.LG cs.AI cs.SY eess.SY (updated version)

Safety Generalization Under Distribution Shift in Safe Reinforcement Learning: A Diabetes Testbed

安全强化学习中的分布偏移下的安全泛化：一个糖尿病测试平台

Minjae Kwon, Josephine Lamp, Lu Feng

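Affiliations: Department of Computer Science, University of Virginia

AI summary: Studies whether the training-time safety guarantees of safe reinforcement learning algorithms transfer to deployment under distribution shift, using diabetes management as a testbed. It identifies a safety generalization gap and shows that test-time shielding effectively restores safety.

Comments Accepted at ICML 2026. Camera-ready version

AI Chinese abstract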

安全强化学习算法通常在固定的训练条件下进行评估。我们使用糖尿病管理作为安全关键测试平台，研究训练时的安全保证是否能在分布偏移下迁移到部署中。我们在统一的临床模拟器上对安全强化学习算法进行基准测试，并揭示了一个安全泛化差距：在训练期间满足约束的策略经常在未见过的患者身上违反安全要求。我们证明，测试时屏蔽（使用学习到的动力学模型过滤不安全动作）能有效恢复跨算法和患者群体的安全性。在八种安全强化学习算法、三种糖尿病类型和三个年龄组中，屏蔽使得PPO-Lag和CPO等强基线的血糖达标时间范围提高了13-14%，同时降低了临床风险指数和血糖变异性。我们的模拟器和基准测试为研究安全关键控制领域中分布偏移下的安全性提供了一个平台。代码可在https://github.com/safe-autonomy-lab/GlucoSim 和 https://github.com/safe-autonomy-lab/GlucoAlg 获取。

English abstract

Safe Reinforcement Learning (RL) algorithms are typically evaluated under fixed training conditions. We investigate whether training-time safety guarantees transfer to deployment under distribution shift, using diabetes management as a safety-critical testbed. We benchmark safe RL algorithms on a unified clinical simulator and reveal a safety generalization gap: policies satisfying constraints during training frequently violate safety requirements on unseen patients. We demonstrate that test-time shielding, which filters unsafe actions using learned dynamics models, effectively restores safety across algorithms and patient populations. Across eight safe RL algorithms, three diabetes types, and three age groups, shielding achieves Time-in-Range gains of 13--14\% for strong baselines such as PPO-Lag and CPO while reducing clinical risk index and glucose variability. Our simulator and benchmark provide a platform for studying safety under distribution shift in safety-critical control domains. Code is available at https://github.com/safe-autonomy-lab/GlucoSim and https://github.com/safe-autonomy-lab/GlucoAlg.
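
Test-time shielding as described above, filtering unsafe actions with a dynamics model, can be sketched as follows. This is an illustrative toy, not the GlucoSim or GlucoAlg code: `toy_model` is a made-up linear stand-in for a learned dynamics model, the function names are hypothetical, and the 70-180 mg/dL band is an assumed target range that the abstract itself does not state.

```python
SAFE_LOW, SAFE_HIGH = 70.0, 180.0  # mg/dL; assumed target range


def toy_model(glucose, insulin):
    """Made-up one-step dynamics: each insulin unit lowers glucose by 40."""
    return glucose - 40.0 * insulin


def violation(glucose):
    """Distance of a predicted glucose value from the safe range (0 if inside)."""
    if glucose < SAFE_LOW:
        return SAFE_LOW - glucose
    if glucose > SAFE_HIGH:
        return glucose - SAFE_HIGH
    return 0.0


def shield(state, proposed, fallbacks, predict):
    """Keep the policy's proposed action if its predicted outcome is safe;
    otherwise return the first safe fallback, or the least-violating
    candidate when no candidate is predicted to be safe."""
    candidates = [proposed] + list(fallbacks)
    for action in candidates:
        if violation(predict(state, action)) == 0.0:
            return action
    return min(candidates, key=lambda a: violation(predict(state, a)))
```

For example, `shield(100.0, 1.0, [0.5, 0.0], toy_model)` rejects the proposed dose because the toy model predicts 60 mg/dL, and falls back to the first candidate predicted to stay in range.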

2601.19743 2026-05-26 eess.IV cs.CV cs.LG (updated version)

Interpretable and backpropagation-free Green Learning for efficient multi-task echocardiographic segmentation and classification

可解释且无需反向传播的绿色学习用于高效多任务超声心动图分割与分类

Jyun-Ping Kao, Jiaxin Yang, C. -C. Jay Kuo, Jonghye Woo

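AI summary: Proposes a backpropagation-free multi-task Green Learning framework that combines an unsupervised VoxelHop encoder with a multi-level regression decoder and an XG-Boost classifier to perform left-ventricle segmentation and ejection-fraction classification on the EchoNet-Dynamic dataset, reaching high accuracy with over an order of magnitude fewer parameters.

Comments Accepted for publication in APSIPA Transactions on Signal and Information Processing. Jyun-Ping Kao and Jiaxing Yang contributed equally to this work. C.-C. Jay Kuo and Jonghye Woo are the senior authors

AI Chinese abstract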

超声心动图是管理心力衰竭（HF）的基石，左心室射血分数（LVEF）是指导治疗的关键指标。然而，手动LVEF评估存在较高的观察者间变异性，而现有的深度学习（DL）模型通常是计算密集且数据饥饿的“黑箱”，阻碍了临床信任和采用。在此，我们提出了一种无需反向传播的多任务绿色学习（MTGL）框架，可同时进行左心室（LV）分割和LVEF分类。我们的框架将用于分层时空特征提取的无监督VoxelHop编码器与多级回归解码器和XG-Boost分类器相结合。在EchoNet-Dynamic数据集上，我们的MTGL模型实现了最先进的分类和分割性能，分类准确率达到94.3%，Dice相似系数（DSC）达到0.912，显著优于多个先进的3D DL模型。关键的是，我们的模型在参数数量少一个数量级的情况下实现了这一性能，展现了卓越的计算效率。这项工作表明，GL范式可以为复杂的医学图像分析提供高度准确、高效且可解释的解决方案，为临床实践中更可持续和可信的人工智能铺平道路。

English abstract

Echocardiography is a cornerstone for managing heart failure (HF), with Left Ventricular Ejection Fraction (LVEF) being a critical metric for guiding therapy. However, manual LVEF assessment suffers from high inter-observer variability, while existing Deep Learning (DL) models are often computationally intensive and data-hungry "black boxes" that impede clinical trust and adoption. Here, we propose a backpropagation-free multi-task Green Learning (MTGL) framework that performs simultaneous Left Ventricle (LV) segmentation and LVEF classification. Our framework integrates an unsupervised VoxelHop encoder for hierarchical spatio-temporal feature extraction with a multi-level regression decoder and an XG-Boost classifier. On the EchoNet-Dynamic dataset, our MTGL model achieves state-of-the-art classification and segmentation performance, attaining a classification accuracy of 94.3% and a Dice Similarity Coefficient (DSC) of 0.912, significantly outperforming several advanced 3D DL models. Crucially, our model achieves this with over an order of magnitude fewer parameters, demonstrating exceptional computational efficiency. This work demonstrates that the GL paradigm can deliver highly accurate, efficient, and interpretable solutions for complex medical image analysis, paving the way for more sustainable and trustworthy artificial intelligence in clinical practice.
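
For readers unfamiliar with the two quantities reported above, the standard definitions can be written out in a few lines. These are the textbook formulas for the Dice Similarity Coefficient on binary masks and for ejection fraction from end-diastolic and end-systolic volumes. The function names are illustrative, and this is not the paper's evaluation code or its class thresholds.

```python
def dice(pred, target):
    """Dice Similarity Coefficient for two flat binary masks (0/1 values):
    2 * |intersection| / (|pred| + |target|)."""
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * inter / total


def ejection_fraction(edv, esv):
    """Ejection fraction in percent from end-diastolic volume (edv)
    and end-systolic volume (esv), given in the same units."""
    return 100.0 * (edv - esv) / edv
```

A DSC of 0.912 therefore means the predicted and reference left-ventricle masks overlap on roughly 91% of their combined area, counted symmetrically.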

2601.05613 2026-05-26 cs.LG cs.AI (updated version)

PiXTime: A Model for Federated Time Series Forecasting with Heterogeneous Data across Nodes

PiXTime: 一种跨节点异构数据联邦时间序列预测模型

Yiming Zhou, Jiahao Wang, Mingyue Cheng, Hao Wang, Defu Lian, Enhong Chen

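Affiliations: University of Science and Technology of China

AI summary: Proposes PiXTime, a Transformer-based framework whose parameter-decoupling architecture (local personalized modules plus a globally shared backbone) handles structurally heterogeneous time series for federated forecasting, achieving state-of-the-art performance on multiple benchmarks.

AI Chinese abstract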

虽然对分布式时间序列进行协同预测非常理想，但由于数据共享限制，直接合并局部数据集通常不可行。联邦学习提供了一种有前景的替代方案，但传统的联邦学习算法要求同构模型架构，这与去中心化节点中常见的结构差异（如时间分辨率不对齐、变量通道不匹配）不兼容。为弥合这一差距，我们引入了PiXTime，一种新颖的基于Transformer的框架，旨在原生适应并利用结构异构的时间数据。其核心采用参数解耦架构，将模型策略性地划分为局部个性化模块和全局聚合共享骨干。具体而言，节点特定的局部模块作为维度适配器，将不同长度的原始序列投影到统一表示空间。同时，全局同步的VE表将一致的类别标识注入特征空间，使共享骨干能够跨不一致的变量分布协同学习并泛化表示。在多个基准上的全面评估表明，PiXTime在异构联邦环境中实现了最先进的性能，同时在标准同构和集中式预测设置中保持强大的优势。

English abstract

While collaborative forecasting on distributed time series is highly desirable, directly pooling localized datasets is often impractical due to data sharing constraints. Federated learning offers a promising alternative, yet conventional federated learning algorithms require homogeneous model architectures, which are incompatible with the structural discrepancies, such as unaligned temporal resolutions and mismatched variable channels, commonly observed across decentralized nodes. To bridge this gap, we introduce PiXTime, a novel Transformer-based framework designed to natively accommodate and leverage structurally heterogeneous temporal data. At its core, PiXTime adopts a parameter-decoupling architecture, strategically partitioning the model into localized personalized modules and a globally aggregated shared backbone. Specifically, node-specific local modules act as dimensional adapters, projecting raw sequences of diverse lengths into a unified representation space. Concurrently, a globally synchronized VE Table injects consistent categorical identities into the feature space, allowing the shared backbone to collaboratively learn and generalize representations across inconsistent variable distributions. Comprehensive evaluations on multiple benchmarks demonstrate that PiXTime achieves state-of-the-art performance in heterogeneous federated environments, while maintaining robust superiority in standard homogeneous and centralized forecasting settings.

2601.05289 2026-05-26 hep-ph cs.LG hep-ex physics.ins-det 版本更新

A universal vision transformer for fast calorimeter simulations

一种用于快速量热器模拟的通用视觉变换器

Luigi Favaro, Andrea Giammanco, Claudius Krause

发表机构 * Centre for Cosmology, Particle Physics（宇宙学、粒子物理与现象学研究中心）； Marietta Blau Institute for Particle Physics (MBI Vienna), Austrian Academy of Sciences (ÖAW), Austria（玛丽埃塔·布劳粒子物理研究所（MBI维也纳），奥地利科学院（ÖAW），奥地利）

AI总结本研究基于CaloDREAM架构，提出使用视觉变换器（ViT）进行快速量热器模拟，在规则和不规则几何结构及多个探测器上均表现出高精度和可扩展性，生成时间在单GPU上为10-100毫秒，并通过预训练和微调降低了训练成本并提高了数据效率。

Comments 44 pages, 17 figures, 8 tables; journal version. Mach. Learn.: Sci. Technol (2026)

DOI: 10.1088/2632-2153/ae7179

AI中文摘要

探测器的高维复杂特性使得快速量热器模拟成为现代生成式机器学习的主要应用。视觉变换器（ViT）能够以无与伦比的精度模拟Geant4响应，并且不限于规则几何结构。从CaloDREAM架构出发，我们展示了ViT在规则和不规则几何结构以及多个探测器上的鲁棒性和可扩展性。结果表明，ViT在多个评估指标下生成的电磁和强子簇射与Geant4的偏差极小，同时在单个GPU上的生成时间保持在$\mathcal{O}(10-100)$毫秒。此外，我们表明在大型数据集上预训练并在目标几何结构上微调可以降低训练成本并提高数据效率，或者整体上提高生成簇射的保真度。

英文摘要

The high-dimensional complex nature of detectors makes fast calorimeter simulations a prime application for modern generative machine learning. Vision transformers (ViTs) can emulate the Geant4 response with unmatched accuracy and are not limited to regular geometries. Starting from the CaloDREAM architecture, we demonstrate the robustness and scalability of ViTs on regular and irregular geometries, and multiple detectors. Our results show that ViTs generate electromagnetic and hadronic showers with minimal deviations from Geant4 in multiple evaluation metrics, while maintaining the generation time in the $\mathcal{O}(10-100)$ ms on a single GPU. Furthermore, we show that pretraining on a large dataset and fine-tuning on the target geometry leads to reduced training costs and higher data efficiency, or altogether improves the fidelity of generated showers.

2601.03327 2026-05-26 cs.LG cs.AI 版本更新

Extreme-value forest fire prediction: A study of the Loss Function in an Ordinality Scheme

极端值森林火灾预测：序数方案中损失函数的研究

Nicolas Caron, Christophe Guyeux, Hassan Noura, Benjamin Aynes

AI总结提出首个序数分类框架预测火灾严重等级，研究损失函数设计对预测极端事件的影响，发现加权卡帕损失在极端类别上IoU提升超过0.1。

Comments Following external reviews, we identified major methodological issues in the manuscript, including insufficient justification of the ordinal clustering strategy, limited statistical validation, ambiguities in dataset splitting, and missing comparisons with standard ordinal approaches. We therefore request withdrawal in order to prepare a substantially revised version

AI中文摘要

野火在空间和严重程度上是高度不平衡的自然灾害，使得极端事件的预测特别具有挑战性。在这项工作中，我们引入了第一个序数分类框架，用于预测与法国操作决策直接对齐的野火严重等级。我们的研究调查了损失函数设计对神经模型预测罕见但关键的高严重火灾发生能力的影响。我们将标准交叉熵与几种序数感知目标进行比较，包括提出的基于截断离散指数广义帕累托分布的概率TDeGPD损失。通过对多种架构和真实操作数据的广泛基准测试，我们表明序数监督显著提高了模型相对于传统方法的性能。特别是，加权卡帕损失（WKLoss）取得了最佳整体结果，在最极端严重类别上IoU（交并比）增益超过0.1，同时保持了有竞争力的校准质量。然而，由于数据集中极端事件极低的代表性，对于最罕见事件的性能仍然有限。这些发现强调了将严重性排序、数据不平衡考虑和季节性风险整合到野火预测系统中的重要性。未来的工作将集中于将季节动态和不确定性信息纳入训练，以进一步提高极端事件预测的可靠性。

英文摘要

Wildfires are highly imbalanced natural hazards in both space and severity, making the prediction of extreme events particularly challenging. In this work, we introduce the first ordinal classification framework for forecasting wildfire severity levels directly aligned with operational decision-making in France. Our study investigates the influence of loss-function design on the ability of neural models to predict rare yet critical high-severity fire occurrences. We compare standard cross-entropy with several ordinal-aware objectives, including the proposed probabilistic TDeGPD loss derived from a truncated discrete exponentiated Generalized Pareto Distribution. Through extensive benchmarking over multiple architectures and real operational data, we show that ordinal supervision substantially improves model performance over conventional approaches. In particular, the Weighted Kappa Loss (WKLoss) achieves the best overall results, with more than +0.1 IoU (Intersection Over Union) gain on the most extreme severity classes while maintaining competitive calibration quality. However, performance remains limited for the rarest events due to their extremely low representation in the dataset. These findings highlight the importance of integrating both severity ordering, data imbalance considerations, and seasonality risk into wildfire forecasting systems. Future work will focus on incorporating seasonal dynamics and uncertainty information into training to further improve the reliability of extreme-event prediction.

2512.24075 2026-05-26 cs.LG 版本更新

Evolutionary Physics-Informed Temporal Fusion for Lane-Change Intention Prediction

进化物理信息时间融合用于换道意图预测

Jiazhao Shi, Qiyang Xie, Ziyu Wang, Dongxu Zhang, Yichen Lin, Di Zhu, Chen Xie, Ziwei Wang, Haoyun Zhang, Enliang Li, Zetong Guan

发表机构 * Tandon School of Engineering（工程学院）； New York University（纽约大学）； Khoury College of Computer Science（计算机科学学院）； Northeastern University（东北大学）； School of Business（商学院）； Wake Forest University（威克森林大学）； Independent Researcher（独立研究者）； University of Massachusetts Amherst（马萨诸塞大学阿默斯特分校）； Carnegie Mellon University（卡内基梅隆大学）； University of Pennsylvania（宾夕法尼亚大学）； Qualcomm CDMA Technologies（高通CDMA技术）； University of Michigan（密歇根大学）

AI总结提出一种进化物理信息时间融合框架，通过融合从传统交通信号导出的时间描述符和从原始轨迹序列学习的时间嵌入，实现三分类换道意图预测，在highD和exiD数据集上取得高F1分数。

AI中文摘要

早期换道意图预测对于自动驾驶和ADAS至关重要，但由于换道行为依赖于不断变化的交通风险、周围车辆交互和目标车道可行性，而非仅瞬时车辆状态，因此仍具挑战性。本研究提出一种进化物理信息时间融合框架，用于三分类换道意图预测，包括左换道、右换道和不换道。该方法并非仅使用静态物理信息变量，而是从传统交通信号中导出时间描述符，包括风险演化、间隙持续性、反事实车道效用、交互压力梯度、机动可行性和意图一致性。这些描述符与通过序列编码器从原始轨迹序列学习的时间嵌入融合，融合表示用于最终分类。在highD和exiD数据集上，分别在1秒、2秒和3秒预测时域下进行实验。所提模型在highD上达到0.9514、0.9256和0.8872的宏F1分数，在exiD上达到0.9386、0.9070和0.8531。在exiD匝道邻近场景中改进尤为显著，表明时间物理演化在交互丰富的环境中特别有用。这些结果表明，将进化物理信息描述符与学习的时间表示相结合，为早期换道意图预测提供了更动态且可解释的解决方案。

英文摘要

Early lane-change intention prediction is essential for autonomous driving and ADAS, but it remains challenging because lane-changing behavior depends on evolving traffic risk, surrounding-vehicle interactions, and target-lane feasibility rather than only instantaneous vehicle states. This study proposes an evolutionary physics-informed temporal fusion framework for three-class lane-change intention prediction, including left lane change, right lane change, and no lane change. Instead of using static physics-informed variables alone, the proposed method derives temporal descriptors from conventional traffic signals, including risk evolution, gap persistence, counterfactual lane utility, interaction pressure gradient, maneuver feasibility, and intent consistency. These descriptors are fused with temporal embeddings learned from raw trajectory sequences through a sequence encoder, and the fused representation is used for final classification. Experiments are conducted on the highD and exiD datasets under 1 s, 2 s, and 3 s prediction horizons. The proposed model achieves Macro F1-scores of 0.9514, 0.9256, and 0.8872 on highD, and 0.9386, 0.9070, and 0.8531 on exiD, respectively. The improvement is especially pronounced in exiD ramp-adjacent scenarios, indicating that temporal physical evolution is particularly useful in interaction-rich environments. These results demonstrate that combining evolutionary physics-informed descriptors with learned temporal representations provides a more dynamic and interpretable solution for early lane-change intention prediction.
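
The Macro F1-scores quoted in this abstract average the per-class F1 over the three intention classes with equal weight. The sketch below shows that computation on a made-up set of labels; the function name `macro_f1`, the class names, and the toy predictions are illustrative assumptions and are not taken from the paper or its datasets.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); define it as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for the three classes (assumed, not real highD/exiD data)
y_true = ["left", "none", "none", "right", "none", "left"]
y_pred = ["left", "none", "right", "right", "none", "none"]
score = macro_f1(y_true, y_pred, ["left", "right", "none"])
print(round(score, 4))
```

Because every class contributes equally, a rare class such as an actual lane change affects the score as much as the dominant no-lane-change class, which is presumably why this metric suits an imbalanced three-class task.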

2512.23076 2026-05-26 cs.LG cs.AI cs.HC 版本更新

Multimodal Functional Maximum Correlation for Emotion Recognition

多模态功能最大相关用于情感识别

Deyang Zheng, Tianyi Zhang, Wenming Zheng, Shujian Yu

发表机构 * Key Laboratory of Child Development and Learning Science (Ministry of Education), School of Biological Sciences and Medical Engineering, Southeast University（东南大学生物科学与医学工程学院儿童发展与学习科学教育部重点实验室）； Department of Artificial Intelligence, Westlake University（西湖大学人工智能系）； Department of Artificial Intelligence, Vrije Universiteit Amsterdam（阿姆斯特丹自由大学人工智能系）

AI总结提出多模态功能最大相关（MFMC）框架，通过双重总相关目标最大化高阶多模态依赖，在情感识别基准上取得最先进性能。

Comments manuscript accepted by IEEE Transactions on Affective Computing. Code is available at https://github.com/DY9910/MFMC

DOI: 10.1109/TAFFC.2026.3695876

AI中文摘要

情绪状态表现为中枢和自主系统之间协调但异质的生理反应，这对情感计算中的多模态表示学习构成了基本挑战。学习这种联合动态因情感标注的稀缺性和主观性而进一步复杂化，这推动了自监督学习（SSL）的使用。然而，大多数现有的SSL方法依赖于成对对齐目标，这些目标不足以表征两个以上模态之间的依赖关系，也无法捕捉由协调的脑和自主反应产生的高阶交互。为了解决这一限制，我们提出了多模态功能最大相关（MFMC），一个原则性的SSL框架，通过双重总相关（DTC）目标最大化高阶多模态依赖。通过推导一个紧致的夹逼界并使用基于功能最大相关分析（FMCA）的迹替代进行优化，MFMC直接捕捉联合多模态交互，而不依赖于成对对比损失。在三个公开的情感计算基准上的实验表明，MFMC在受试者依赖和受试者独立评估协议下均一致地达到最先进或具有竞争力的性能，突显了其对受试者间变异性的鲁棒性。特别是，MFMC将CEAP-360VR上的受试者依赖准确率从78.9%提高到86.8%，仅使用EDA信号就将受试者独立准确率从27.5%提高到33.1%。此外，在MAHNOB-HCI最具挑战性的EEG受试者独立划分中，MFMC与最佳方法的差距在0.8个百分点以内。我们的代码可在https://github.com/DY9910/MFMC获取。

英文摘要

Emotional states manifest as coordinated yet heterogeneous physiological responses across central and autonomic systems, posing a fundamental challenge for multimodal representation learning in affective computing. Learning such joint dynamics is further complicated by the scarcity and subjectivity of affective annotations, which motivates the use of self-supervised learning (SSL). However, most existing SSL approaches rely on pairwise alignment objectives, which are insufficient to characterize dependencies among more than two modalities and fail to capture higher-order interactions arising from coordinated brain and autonomic responses. To address this limitation, we propose Multimodal Functional Maximum Correlation (MFMC), a principled SSL framework that maximizes higher-order multimodal dependence through a Dual Total Correlation (DTC) objective. By deriving a tight sandwich bound and optimizing it using a functional maximum correlation analysis (FMCA) based trace surrogate, MFMC captures joint multimodal interactions directly, without relying on pairwise contrastive losses. Experiments on three public affective computing benchmarks demonstrate that MFMC consistently achieves state-of-the-art or competitive performance under both subject-dependent and subject-independent evaluation protocols, highlighting its robustness to inter-subject variability. In particular, MFMC improves subject-dependent accuracy on CEAP-360VR from 78.9% to 86.8%, and subject-independent accuracy from 27.5% to 33.1% using the EDA signal alone. Moreover, MFMC remains within 0.8 percentage points of the best-performing method on the most challenging EEG subject-independent split of MAHNOB-HCI. Our code is available at https://github.com/DY9910/MFMC.

2512.15605 2026-05-26 cs.LG stat.ML 版本更新

Autoregressive Language Models are Secretly Energy-Based Models: Insights into the Lookahead Capabilities of Next-Token Prediction

自回归语言模型实际上是能量模型：对下一个词元预测的预见能力的洞察

Mathieu Blondel, Michael E. Sander, Germain Vivier-Ardisson, Tianlin Liu, Vincent Roulet

发表机构 * Google DeepMind（谷歌DeepMind）

AI总结本文通过建立自回归模型与能量模型之间的双射，揭示了自回归模型在下一个词元预测范式下具备预见能力，并提供了理论误差界。

2512.10506 2026-05-26 eess.IV cs.LG eess.SP 版本更新

Hyperspectral Image Data Reduction for Endmember Extraction

用于端元提取的高光谱图像数据降维

Tomohiko Mizutani

发表机构 * Department of Mathematical and Systems Engineering, Shizuoka University（数学与系统工程系，静冈大学）

AI总结针对高光谱图像端元提取计算成本高的问题，提出一种基于线性混合模型和纯像元假设的数据降维技术，去除混合像元，保留近端元像元，并结合线性规划自字典方法，在不牺牲提取精度的前提下显著降低计算时间。

Comments 37 pages, code is available at https://github.com/tomohiko-mizutani/REDIC

AI中文摘要

从高光谱图像中提取端元旨在识别场景中存在的材料的光谱特征。最近的研究表明，自字典方法可以实现高提取精度；然而，其高计算成本限制了它们在大规模高光谱图像中的适用性。尽管已经提出了几种方法来解决这个问题，但它仍然是一个主要挑战。受此情况启发，本文采用数据降维方法。假设高光谱图像遵循线性混合模型且满足纯像元假设，我们开发了一种数据降维技术来去除对应于多个端元特征混合的像元。我们分析了该降维步骤的理论性质，并表明它保留了靠近端元的像元。基于这一结果，我们提出了一种数据降维自字典方法，该方法将数据降维与基于线性规划公式的自字典方法相结合。数值实验表明，所提出的方法可以在不牺牲端元提取精度的情况下，显著减少原始自字典方法的计算时间。

英文摘要

Endmember extraction from hyperspectral images aims to identify the spectral signatures of materials present in a scene. Recent studies have shown that self-dictionary methods can achieve high extraction accuracy; however, their high computational cost limits their applicability to large-scale hyperspectral images. Although several approaches have been proposed to mitigate this issue, it remains a major challenge. Motivated by this situation, this paper pursues a data reduction approach. Assuming that a hyperspectral image follows the linear mixing model with the pure-pixel assumption, we develop a data reduction technique to remove pixels corresponding to mixtures of multiple endmember signatures. We analyze the theoretical properties of this reduction step and show that it preserves pixels that lie close to the endmembers. Building on this result, we propose a data-reduced self-dictionary method that integrates the data reduction with a self-dictionary method based on a linear programming formulation. Numerical experiments demonstrate that the proposed method can substantially reduce the computational time of the original self-dictionary method without sacrificing endmember extraction accuracy.

2512.08974 2026-05-26 physics.ao-ph cs.LG 版本更新

FuXi-Nowcast: Environment-conditioned deep learning for severe convection nowcasting

FuXi-Nowcast：环境条件深度学习用于强对流临近预报

Lei Chen, Zijian Zhu, Xiaoran Zhuang, Tianyuan Qi, Yuxuan Feng, Xiaohui Zhong, Hao Li

发表机构 * Jiangsu Meteorological Observatory（江苏省气象台）； Jiangsu Key Laboratory of Severe Storm Disaster Risk / Key Laboratory of Transportation Meteorology of CMA（江苏省强风暴灾害风险重点实验室 / 中国气象局交通气象重点实验室）； FuXi Intelligent Computing Technology Co., Ltd.（FuXi 智能计算技术有限公司）

AI总结提出环境条件深度学习系统FuXi-Nowcast，结合高分辨率观测与三维大气预报，在12小时内预测复合反射率、降水、阵风及地表变量，优于业务数值、持续性和外推基线。

AI中文摘要

强对流产生局地灾害，通常需要在雷达回波完全揭示风暴发展之前发出预警。对流触发和强对流的维持对于仅依赖雷达的临近预报仍具挑战性，因为雷达观测中可能缺乏对流前信号，且强回波在预报中常快速衰减。本文提出FuXi-Nowcast，一种环境条件深度学习系统，结合高分辨率观测与三维大气预报，预测未来12小时内的复合反射率、降水、阵风及地表变量。在2024年4月至7月华东地区的评估中，FuXi-Nowcast在反射率和降水方面优于业务数值、持续性和外推基线。案例研究、诊断和消融实验表明，大气湿度信息和对强对流信号的显式保留有助于对流触发和维持的预报。这些结果表明，环境条件约束可以缓解仅依赖雷达的临近预报在高影响对流天气中的重要失效模式。

英文摘要

Severe convection produces localized hazards that often require warnings before radar echoes fully reveal storm development. Convective initiation and the maintenance of intense convection remain challenging for radar-only nowcasting because pre-convective signals may be absent from recent radar observations and strong echoes often decay rapidly in forecasts. Here we present FuXi-Nowcast, an environment-conditioned deep learning system that combines high-resolution observations with three-dimensional atmospheric forecasts to predict composite reflectivity, precipitation, wind gusts, and surface variables up to 12 h ahead. In April--July 2024 evaluations over East China, FuXi-Nowcast outperforms operational numerical, persistence and extrapolation baselines for reflectivity and precipitation. Case studies, diagnostics, and ablation experiments suggest that atmospheric moisture information and explicit preservation of strong convective signals contribute to forecasts of convective initiation and maintenance. These results show that environmental conditioning can mitigate important failure modes of radar-only nowcasting for high-impact convective weather.

2512.08508 2026-05-26 q-bio.BM cs.LG 版本更新

Multi-Alignment Contrastive Learning for Enzyme--Reaction Retrieval

多对齐对比学习用于酶-反应检索

Gengmo Zhou, Feng Yu, Wenda Wang, Zhifeng Gao, Guolin Ke, Zhewei Wei, Zhen Wang

发表机构 * Renmin University of China（中国人民大学）； DP Technology（DP科技）

AI总结提出多对齐对比学习框架，通过联合建模酶-反应跨域兼容性及功能注释驱动的域内关系，并引入Gromov-Wasserstein正则化项，提升酶虚拟筛选和双向检索性能。

AI中文摘要

识别催化目标生化反应的酶是计算酶发现和生物催化剂设计的关键步骤。最近的表示学习方法将这一问题表述为酶-反应匹配，其中配对的酶和反应被嵌入到共享空间中。然而，大多数现有方法主要依赖于成对的酶-反应监督，并且对反应集或酶家族内部关系的利用有限。本文介绍了一种用于生化检索的多对齐对比学习框架。该框架联合建模酶与反应之间的跨域兼容性以及由功能注释诱导的域内关系。此外，受Gromov-Wasserstein启发的正则化目标鼓励学习的酶和反应表示空间之间的几何一致性。通过将成对的催化监督与高阶关系对齐相结合，该模型捕获了直接的酶-反应关联以及更广泛的功能组织。我们在酶虚拟筛选和双向酶-反应检索任务上评估了该方法。在EnzymeMap上的实验表明，与强对比基线相比，在BEDROC和富集因子指标下，早期识别性能有所提高。在ReactZyme上，该方法在基于时间、酶相似性和反应相似性的划分中均取得了一致的增益，展示了对未见酶和未见反应的鲁棒性。消融研究进一步表明，域内对齐、功能监督和几何正则化项各自对观察到的改进有所贡献。这些结果表明，建模多种形式的对齐可以改进用于酶发现、反应注释及相关计算生物学应用的对比检索模型。

英文摘要

Identifying enzymes that catalyze target biochemical reactions is a key step in computational enzyme discovery and biocatalyst design. Recent representation-learning methods formulate this problem as enzyme--reaction matching, where paired enzymes and reactions are embedded into a shared space. However, most existing approaches primarily rely on pairwise enzyme--reaction supervision and make limited use of the relationships within reaction sets or enzyme families. This work introduces a multi-alignment contrastive learning framework for biochemical retrieval. The framework jointly models cross-domain compatibility between enzymes and reactions and within-domain relationships induced by functional annotations. In addition, a Gromov--Wasserstein-inspired regularization objective encourages geometric consistency between the learned enzyme and reaction representation spaces. By combining pairwise catalytic supervision with higher-order relational alignment, the model captures both direct enzyme--reaction associations and broader functional organization. We evaluate the approach on enzyme virtual screening and bidirectional enzyme--reaction retrieval tasks. Experiments on EnzymeMap show improved early-recognition performance under BEDROC and enrichment-factor metrics compared with strong contrastive baselines. On ReactZyme, the method achieves consistent gains across time-based, enzyme-similarity, and reaction-similarity splits, demonstrating robustness to unseen enzymes and unseen reactions. Ablation studies further indicate that within-domain alignment, functional supervision, and the geometric regularization term each contribute to the observed improvements. These results suggest that modeling multiple forms of alignment can improve contrastive retrieval models for enzyme discovery, reaction annotation, and related computational biology applications.
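
The enrichment-factor metric mentioned for the virtual-screening evaluation measures how much denser the true hits are in the top-ranked fraction of a list than in the list as a whole. The sketch below uses the common textbook definition; the function name `enrichment_factor`, the rounding rule for the cutoff, and the toy scores are assumptions, and the paper's exact evaluation protocol may differ.

```python
def enrichment_factor(scores, is_active, fraction):
    """EF at a given top fraction: hit rate in the top slice / overall hit rate."""
    n = len(scores)
    n_top = max(1, int(round(fraction * n)))  # size of the top-ranked slice
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(active for _, active in ranked[:n_top])
    total_hits = sum(is_active)
    if total_hits == 0:
        raise ValueError("enrichment factor is undefined without any active item")
    return (hits_top / n_top) / (total_hits / n)


# Toy ranking: 10 candidate enzymes, 2 of them truly catalyze the query reaction
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
is_active = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
ef20 = enrichment_factor(scores, is_active, 0.2)
print(ef20)
```

A value above 1 means the ranking places true hits early more often than random ordering would. BEDROC serves a similar early-recognition purpose but weights ranks continuously instead of using a hard cutoff.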

2512.05865 2026-05-26 cs.LG cs.AI 版本更新

Intrinsically Interpretable Attention via Sparse Post-Training

通过稀疏后训练实现内在可解释的注意力机制

Florent Draye, Anson Lei, Hsiao-Ru Pan, Ingmar Posner, Bernhard Schölkopf

发表机构 * MPI-IS（马克斯·普朗克智能系统研究所）； University of Oxford（牛津大学）； ETH Zürich（苏黎世联邦理工学院）

AI总结提出一种后训练方法，通过约束损失下的灵活稀疏正则化，在不牺牲性能的前提下将Transformer注意力连接稀疏至约0.4%，从而简化全局电路并提升可解释性。

AI中文摘要

我们引入一种简单的后训练方法，使Transformer注意力变得稀疏而不牺牲性能。在约束损失目标下应用灵活的稀疏正则化，我们在高达7B参数的模型上证明，可以在保留原始预训练损失的同时，将注意力连接减少到其边数的约0.4%。与为计算效率设计的稀疏注意力方法不同，我们的方法利用稀疏性作为结构先验：它保留了能力，同时暴露出更有组织和可解释的连接模式。我们发现这种局部稀疏性级联成全局电路简化：特定任务的电路涉及更少的组件（注意力头和MLP），连接它们的边最多减少100倍。此外，使用跨层转码器，我们表明稀疏注意力显著简化了注意力归因，实现了基于特征和基于电路视角的统一视图。这些结果表明，Transformer注意力可以变得稀疏几个数量级，表明其大部分计算是冗余的，并且稀疏性可以作为更结构化和可解释模型的指导原则。

英文摘要

We introduce a simple post-training method that makes transformer attention sparse without sacrificing performance. Applying a flexible sparsity regularisation under a constrained-loss objective, we show on models up to 7B parameters that it is possible to retain the original pretraining loss while reducing attention connectivity to $\approx 0.4 \%$ of its edges. Unlike sparse-attention methods designed for computational efficiency, our approach leverages sparsity as a structural prior: it preserves capability while exposing a more organized and interpretable connectivity pattern. We find that this local sparsity cascades into global circuit simplification: task-specific circuits involve far fewer components (attention heads and MLPs) with up to 100x fewer edges connecting them. Additionally, using cross-layer transcoders, we show that sparse attention substantially simplifies attention attribution, enabling a unified view of feature-based and circuit-based perspectives. These results demonstrate that transformer attention can be made orders of magnitude sparser, suggesting that much of its computation is redundant and that sparsity may serve as a guiding principle for more structured and interpretable models.

2511.20236 2026-05-26 cs.AI cs.LG 版本更新

Actionable and diverse counterfactual explanations incorporating domain knowledge and plausibility constraints

结合领域知识和合理性约束的可操作且多样化的反事实解释

Szymon Bobek, Łukasz Bałec, Grzegorz J. Nalepa

发表机构 * Faculty of Physics, Astronomy and Applied Computer Science, Institute of Applied Computer Science, Jagiellonian Human-Centered AI Lab（物理、天文与应用计算机科学学院，应用计算机科学研究所，雅盖隆以人为中心的AI实验室）

AI总结提出DANCE方法，通过建模特征依赖和领域约束生成可操作、多样化的反事实解释，在OpenML数据集和工业邮件营销场景中验证了其有效性和实用性。

AI中文摘要

反事实解释通过识别实现期望结果所需的最小变化来提高机器学习模型的可操作可解释性。然而，现有方法常常忽略特征之间的依赖关系，这可能导致不现实或不切实际的修改。这一限制降低了反事实解释在现实决策支持系统中的实用性。受网络安全中电子邮件营销应用的启发，我们提出了DANCE（多样化、可操作且知识约束的解释），一种生成反事实的方法，该方法结合了特征依赖和领域约束。DANCE使用线性和概率结构对特征之间的关系进行建模，这些结构可以从数据中学习或由专家指定。在搜索过程中强制执行这些依赖关系以提高合理性和可行性。该方法在一个统一的目标中联合优化合理性、多样性、邻近性和稀疏性。我们在OpenML的140个数据集上评估了DANCE，并证明它在多个评估标准上相比现有方法具有竞争性或更优的性能。此外，我们与一个电子邮件营销平台合作，在真实工业环境中验证了该方法，表明它能够产生符合领域且可操作的建议。

英文摘要

Counterfactual explanations improve the actionable interpretability of machine learning models by identifying minimal changes required to achieve a desired outcome. However, existing methods often neglect dependencies among features, which can lead to unrealistic or impractical modifications. This limitation reduces the usefulness of counterfactual explanations in real-world decision-support systems. Motivated by applications in cybersecurity for email marketing, we propose DANCE (Diverse, Actionable, and Knowledge-Constrained Explanations), a method for generating counterfactuals that incorporate feature dependencies and domain constraints. DANCE models relationships between features using linear and probabilistic structures that can be learned from data or specified by experts. These dependencies are enforced during the search process to improve plausibility and feasibility. The method jointly optimizes plausibility, diversity, proximity, and sparsity within a unified objective. We evaluate DANCE on 140 datasets from OpenML and demonstrate that it achieves competitive or superior performance compared to existing approaches across multiple evaluation criteria. Additionally, we validate the method in a real-world industrial setting in collaboration with an email marketing platform, showing that it produces domain-consistent and actionable recommendations.

2511.19065 2026-05-26 cs.CV cs.AI cs.LG 版本更新

Understanding, Accelerating, and Improving MeanFlow Training

理解、加速和改进MeanFlow训练

Jin-Young Kim, Hyojun Go, Lea Bogensperger, Julius Erbach, Nikolai Kalischek, Federico Tombari, Konrad Schindler, Dominik Narnhofer

发表机构 * Yonsei University（延世大学）； ETH Zurich（苏黎世联邦理工学院）； University of Zurich（苏黎世大学）； Max Planck ETH CLS（马克斯·普朗克ETH CLS）； Google（谷歌）

AI总结通过分析瞬时速度与平均速度的相互作用，提出一种加速瞬时速度形成并逐步转移训练重点的有效训练方案，实现更快的收敛和更优的少步生成性能。

AI中文摘要

MeanFlow通过联合学习瞬时速度场和平均速度场，有望在少步内实现高质量生成建模。然而，其底层训练动态仍不清楚。我们分析两种速度之间的相互作用，发现：(i) 建立良好的瞬时速度是学习平均速度的前提；(ii) 当时间间隔较小时，瞬时速度的学习受益于平均速度，但随着间隔增大而退化；(iii) 任务亲和性分析表明，对于一步生成至关重要的大间隔平均速度的平滑学习，依赖于先形成准确的瞬时速度和小间隔平均速度。在这些观察的指导下，我们设计了一种有效的训练方案，加速瞬时速度的形成，然后将重点从短间隔平均速度转移到长间隔平均速度。我们改进的MeanFlow训练实现了更快的收敛和显著更好的少步生成：使用相同的DiT-XL骨干网络，我们的方法在1-NFE ImageNet 256x256上达到了令人印象深刻的FID 2.87，而传统的MeanFlow基线为3.43。或者，我们的方法以2.5倍更短的训练时间或使用更小的DiT-L骨干网络，匹配MeanFlow基线的性能。

英文摘要

MeanFlow promises high-quality generative modeling in few steps, by jointly learning instantaneous and average velocity fields. Yet, the underlying training dynamics remain unclear. We analyze the interaction between the two velocities and find: (i) well-established instantaneous velocity is a prerequisite for learning average velocity; (ii) learning of instantaneous velocity benefits from average velocity when the temporal gap is small, but degrades as the gap increases; and (iii) task-affinity analysis indicates that smooth learning of large-gap average velocities, essential for one-step generation, depends on the prior formation of accurate instantaneous and small-gap average velocities. Guided by these observations, we design an effective training scheme that accelerates the formation of instantaneous velocity, then shifts emphasis from short- to long-interval average velocity. Our enhanced MeanFlow training yields faster convergence and significantly better few-step generation: With the same DiT-XL backbone, our method reaches an impressive FID of 2.87 on 1-NFE ImageNet 256x256, compared to 3.43 for the conventional MeanFlow baseline. Alternatively, our method matches the performance of the MeanFlow baseline with 2.5x shorter training time, or with a smaller DiT-L backbone.

2511.12046 2026-05-26 cs.CR cs.AI cs.CV cs.LG 版本更新

BackWeak: Backdooring Knowledge Distillation Simply with Weak Triggers and Fine-tuning

BackWeak: 使用弱触发器和微调简单后门知识蒸馏

Shanmin Wang, Dongdong Zhao

发表机构 * School of Computer Science and Artificial Intelligence（计算机科学与人工智能学院）； Wuhan University of Technology（武汉理工大学）

AI总结提出BackWeak方法，通过微调教师模型嵌入弱触发器实现后门攻击，无需替代学生模型或模拟蒸馏，在标准蒸馏过程中可靠转移至不同学生架构。

AI中文摘要

知识蒸馏对于压缩大型模型至关重要，但依赖从第三方仓库下载的预训练“教师”模型引入了严重的安全风险——最显著的是后门攻击。现有的知识蒸馏后门方法通常复杂且计算密集：它们使用替代学生模型和模拟蒸馏来保证可转移性，并构建类似于通用对抗扰动（UAP）的触发器，这些触发器在幅度上不隐蔽，本质上表现出强烈的对抗行为。本文质疑这种复杂性是否必要，并构建了隐蔽的“弱”触发器——具有可忽略对抗效应的不可察觉扰动。我们提出了BackWeak，一种简单、无替代的攻击范式。BackWeak表明，通过使用非常小的学习率对良性教师模型进行微调并嵌入弱触发器，即可植入强大的后门。我们证明，这种精细的微调足以嵌入后门，在受害者的标准蒸馏过程中可靠地转移到不同的学生架构，从而实现高攻击成功率。在多个数据集、模型架构和知识蒸馏方法上的广泛实证评估表明，BackWeak比以往复杂的方法更高效、更简单，且通常更隐蔽。本文呼吁研究知识蒸馏后门攻击的学者特别关注触发器的潜在对抗特性。

英文摘要

Knowledge Distillation (KD) is essential for compressing large models, yet relying on pre-trained "teacher" models downloaded from third-party repositories introduces serious security risks--most notably backdoor attacks. Existing KD backdoor methods are typically complex and computationally intensive: they employ surrogate student models and simulated distillation to guarantee transferability, and construct triggers similar to universal adversarial perturbations (UAPs), which being not stealthy in magnitude, inherently exhibit strong adversarial behavior. This work questions whether such complexity is necessary and constructs stealthy "weak" triggers--imperceptible perturbations that have negligible adversarial effect. We propose BackWeak, a simple, surrogate-free attack paradigm. BackWeak shows that a powerful backdoor can be implanted by simply fine-tuning a benign teacher with a weak trigger using a very small learning rate. We demonstrate that this delicate fine-tuning is sufficient to embed a backdoor that reliably transfers to diverse student architectures during a victim's standard distillation process, yielding high attack success rates. Extensive empirical evaluations on multiple datasets, model architectures, and KD methods show that BackWeak is efficient, simpler, and often more stealthy than previous elaborate approaches. This work calls on researchers studying KD backdoor attacks to pay particular attention to the trigger's potential adversarial characteristics.

2511.03548 2026-05-26 cs.LG 版本更新

Flat Minima and Generalization: Insights from Stochastic Convex Optimization

平坦极小值与泛化：来自随机凸优化的见解

Matan Schliserman, Shira Vansover-Hager, Tomer Koren

发表机构 * Blavatnik School of Computer Science and AI, Tel Aviv University（特拉维夫大学Blavatnik计算机科学与人工智能学院）； Google Research（谷歌研究）

AI总结本文在随机凸优化框架下研究平坦极小值与泛化的关系，发现平坦经验极小值可能产生Ω(1)的总体风险，而尖锐极小值泛化最优，并证明两种锐度感知算法（SA-GD和SAM）也可能泛化不佳。

AI中文摘要

理解学习算法的泛化行为是学习理论的核心目标。最近一种新兴的解释是，学习算法在实践中成功是因为它们收敛到平坦极小值，而平坦极小值一直与改进的泛化性能相关联。在这项工作中，我们在非负、β-光滑目标的随机凸优化的经典设置中研究平坦极小值与泛化之间的联系。我们的第一个发现是，即使在这个基础且被充分研究的设置中，平坦的经验极小值可能产生平凡的Ω(1)总体风险，而尖锐极小值则能最优地泛化。然后，我们表明这种糟糕的泛化行为延伸到两种自然的“锐度感知”算法，这些算法最初由Foret等人（2021）提出，旨在将优化偏向平坦解：锐度感知梯度下降（SA-GD）和锐度感知最小化（SAM）。对于SA-GD，它在预定义邻域内对最大损失执行梯度步骤，我们证明虽然它成功以快速率收敛到平坦极小值，但解的总体风险仍然可能高达Ω(1)，表明即使使用锐度感知梯度方法算法性地找到的平坦极小值也可能泛化不佳。对于SAM，一种基于归一化上升步骤的SA-GD计算高效近似，我们表明尽管它最小化经验损失，但可能收敛到尖锐极小值，并且也产生Ω(1)的总体风险。最后，我们使用算法稳定性技术为SA-GD和SAM建立了总体风险上界。

英文摘要

Understanding the generalization behavior of learning algorithms is a central goal of learning theory. A recently emerging explanation is that learning algorithms are successful in practice because they converge to flat minima, which have been consistently associated with improved generalization performance. In this work, we study the link between flat minima and generalization in the canonical setting of stochastic convex optimization with a non-negative, $\beta$-smooth objective. Our first finding is that, even in this fundamental and well-studied setting, flat empirical minima may incur trivial $\Omega(1)$ population risk while sharp minima generalize optimally. Then, we show that this poor generalization behavior extends to two natural "sharpness-aware" algorithms originally proposed by Foret et al. (2021), designed to bias optimization toward flat solutions: Sharpness-Aware Gradient Descent (SA-GD) and Sharpness-Aware Minimization (SAM). For SA-GD, which performs gradient steps on the maximal loss in a predefined neighborhood, we prove that while it successfully converges to a flat minimum at a fast rate, the population risk of the solution can still be as large as $\Omega(1)$, indicating that even flat minima found algorithmically using a sharpness-aware gradient method might generalize poorly. For SAM, a computationally efficient approximation of SA-GD based on normalized ascent steps, we show that although it minimizes the empirical loss, it may converge to a sharp minimum and also incur population risk $\Omega(1)$. Finally, we establish population risk upper bounds for both SA-GD and SAM using algorithmic stability techniques.
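
The abstract describes SAM as taking a normalized ascent step and then descending with the gradient evaluated at the perturbed point. The sketch below shows that two-stage update on a toy least-squares problem. The objective, the radius `rho`, the learning rate `lr`, and all function names are illustrative assumptions; this is not the construction analyzed in the paper.

```python
import numpy as np


def loss(w, X, y):
    """Mean squared error of a linear model (toy convex objective)."""
    residual = X @ w - y
    return 0.5 * np.mean(residual ** 2)


def grad(w, X, y):
    """Gradient of the toy objective with respect to w."""
    return X.T @ (X @ w - y) / len(y)


def sam_step(w, X, y, lr=0.1, rho=0.05, eps=1e-12):
    """One SAM update: normalized ascent to w_adv, then descent from w."""
    g = grad(w, X, y)
    w_adv = w + rho * g / (np.linalg.norm(g) + eps)  # ascent step of length ~rho
    return w - lr * grad(w_adv, X, y)                # gradient taken at w_adv


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5])

w = np.zeros(3)
initial = loss(w, X, y)
for _ in range(200):
    w = sam_step(w, X, y)
final = loss(w, X, y)
print(initial, final)
```

Because the ascent direction is normalized, the perturbation has length about `rho` even when the gradient is tiny, so the iterates hover near the minimizer instead of settling exactly on it. This sketch only illustrates the update rule and says nothing about the population-risk behavior the paper studies.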

2511.03529 2026-05-26 cs.LG 版本更新

Byzantine-Robust Federated Learning with Learnable Aggregation Weights

具有可学习聚合权重的拜占庭鲁棒联邦学习

Javad Parsa, Amir Hossein Daghestani, André M. H. Teixeira, Mikael Johansson

发表机构 * Uppsala University, Sweden（瑞典乌普萨拉大学）； KTH, Sweden（瑞典皇家理工学院）

AI总结提出一种将聚合权重作为可学习参数联合优化的拜占庭鲁棒联邦学习优化问题，并开发了交替最小化算法，在异构数据和恶意客户端场景下优于现有方法。

Comments ICLR 2026

AI中文摘要

联邦学习（FL）使客户端能够在不共享私有数据的情况下协作训练全局模型。然而，恶意（拜占庭）客户端的存在对FL的鲁棒性构成了重大挑战，尤其是在客户端数据分布异构的情况下。在本文中，我们提出了一种新颖的拜占庭鲁棒FL优化问题，该问题将自适应加权引入聚合过程。与传统方法不同，我们的公式将聚合权重视为可学习参数，与全局模型参数联合优化。为了解决这个优化问题，我们开发了一种交替最小化算法，在对抗攻击下具有强收敛保证。我们分析了所提目标的拜占庭弹性。我们在各种数据集和攻击场景下，将我们的算法与最先进的拜占庭鲁棒FL方法进行了性能评估。实验结果表明，我们的方法始终优于现有方法，特别是在数据高度异构且恶意客户端比例较大的情况下。

英文摘要

Federated Learning (FL) enables clients to collaboratively train a global model without sharing their private data. However, the presence of malicious (Byzantine) clients poses significant challenges to the robustness of FL, particularly when data distributions across clients are heterogeneous. In this paper, we propose a novel Byzantine-robust FL optimization problem that incorporates adaptive weighting into the aggregation process. Unlike conventional approaches, our formulation treats aggregation weights as learnable parameters, jointly optimizing them alongside the global model parameters. To solve this optimization problem, we develop an alternating minimization algorithm with strong convergence guarantees under adversarial attack. We analyze the Byzantine resilience of the proposed objective. We evaluate the performance of our algorithm against state-of-the-art Byzantine-robust FL approaches across various datasets and attack scenarios. Experimental results demonstrate that our method consistently outperforms existing approaches, particularly in settings with highly heterogeneous data and a large proportion of malicious clients.

2510.22827 2026-05-26 cs.CV cs.LG 版本更新

FairJudge: Abstention-Aware Multimodal Judges for Fairness and Alignment Evaluation in Text-to-Image Models

FairJudge: 文本到图像模型中公平性与对齐评估的弃权感知多模态裁判

Zahraa Al Sahili, Maimuna Nowaz, Maryam Fetanat, Ioannis Patras, Matthew Purver

发表机构 * Queen Mary University of London（伦敦玛丽女王大学）； Institut Jožef Stefan（约瑟夫·斯特凡研究所）； Imperial College London（伦敦帝国理工学院）

AI总结提出FairJudge协议，利用多模态大语言模型作为结构化裁判，通过封闭标签、弃权机制和证据报告，在文本到图像模型中实现社会属性预测、职业定位和提示-图像对齐的公平性评估。

AI中文摘要

评估文本到图像（T2I）系统不仅需要判断图像是否匹配提示，还需要判断社会显著属性是否被忠实表示且没有无根据的推断。现有的自动评估器通常依赖于以面部为中心的识别器或对比图像-文本相似度，这些方法提供的诊断反馈有限，并且通常在视觉证据模糊或缺失时强制进行预测。对于宗教和残疾等公平敏感属性，其中线索可能是上下文相关的、间接的或故意未指定的，这些评估器可能会遗漏细心的人类评审员会注意到的失败模式。我们引入了FairJudge，一种弃权感知的评估协议，该协议使用遵循指令的多模态LLM作为社会属性预测、职业定位和提示-图像对齐的结构化裁判。该协议将输出限制为封闭标签集，要求可见证据的理由，在线索不足时支持明确的unspecified决策，并将基于量规的对齐判断映射到$[-1,1]$。这些约束将MLLM裁判从开放式评估转变为可解析、可审计的评估程序。在四个属性预测基准和三个职业/对齐基准上，FairJudge优于或补充了CLIP、DeepFace、VIEScore和VQAScore。消融实验表明，封闭标签、弃权和证据报告对可靠性至关重要。我们进一步引入了DIVERSIFY和DIVERSIFY-Professions，这两个上下文丰富的资源用于评估超越面部可见或图标线索的社会表示和职业定位。我们发布了代码、提示、数据集、解析器日志和每张图像的裁判输出，以支持可重复的审计。

英文摘要

Evaluating text-to-image (T2I) systems requires judging not only whether an image matches a prompt, but also whether socially salient attributes are represented faithfully and without unsupported inference. Existing automated evaluators typically rely on face-centric recognizers or contrastive image-text similarity, which provide limited diagnostic feedback and often force predictions even when visual evidence is ambiguous or absent. For fairness-sensitive attributes such as religion and disability, where cues may be contextual, indirect, or intentionally unspecified, these evaluators can therefore miss failure modes that careful human reviewers would notice. We introduce FairJudge, an abstention-aware evaluation protocol that uses instruction-following multimodal LLMs as structured judges for social-attribute prediction, profession grounding, and prompt-image alignment. The protocol constrains outputs to closed label sets, requires visible-evidence rationales, supports an explicit "unspecified" decision when cues are insufficient, and maps rubric-based alignment judgments to $[-1,1]$. These constraints turn MLLM judging from open-ended assessment into a parseable, auditable evaluation procedure. Across four attribute-prediction benchmarks and three profession/alignment benchmarks, FairJudge outperforms or complements CLIP, DeepFace, VIEScore, and VQAScore. Ablations show that closed labels, abstention, and evidence reporting are central to reliability. We further introduce DIVERSIFY and DIVERSIFY-Professions, two context-rich resources for evaluating social representation and profession grounding beyond face-visible or iconic cues. We release code, prompts, datasets, parser logs, and per-image judge outputs to support reproducible auditing.

2510.22186 2026-05-26 cs.LG cs.IT math.FA math.IT math.MG Version update

Quantitative Bounds for Sorting-Based Permutation-Invariant Embeddings

Nadav Dym, Matthias Wellershoff, Efstratios Tsoukanis, Daniel Levy, Radu Balan

Affiliations: Department of Mathematics, University of Maryland; Institute of Mathematical Sciences, Claremont Graduate University

AI summary: Studies permutation-invariant embeddings obtained by sorting independent one-dimensional projections, improves the upper and lower bounds on the embedding dimension required for injectivity, and gives estimates of the bi-Lipschitz constants, with a distortion that grows quadratically in the number of points n and is independent of the dimension d.

Comments Minor revision; 37 pages, 1 figure, 2 tables

DOI: 10.1109/TIT.2026.3679460
Journal ref: IEEE Trans. Inf. Theory, vol. 72, no. 6, pp. 4297-4311, Jun. 2026

Abstract

We study permutation-invariant embeddings of $d$-dimensional point sets, which are defined by sorting $D$ independent one-dimensional projections of the input. Such embeddings arise in graph deep learning where outputs should be invariant to permutations of graph nodes. Previous work showed that for large enough $D$ and projections in general position, this mapping is injective, and moreover satisfies a bi-Lipschitz condition. However, two gaps remain: firstly, the optimal size $D$ required for injectivity is not yet known, and secondly, no estimates of the bi-Lipschitz constants of the mapping are known. In this paper, we make substantial progress in addressing both of these gaps. Regarding the first gap, we improve upon the best known upper bounds for the embedding dimension $D$ necessary for injectivity, and also provide a lower bound on the minimal injectivity dimension. Regarding the second gap, we construct matrices of projection vectors, so that the bi-Lipschitz distortion of the mapping depends quadratically on the number of points $n$, and is completely independent of the dimension $d$. We also show that for any choice of projection vectors, the distortion of the mapping will never be better than a bound proportional to the square root of $n$. Finally, we show that similar guarantees can be provided even when linear projections are applied to the mapping to reduce its dimension.
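The embedding described in this abstract (sort each of $D$ one-dimensional projections of the point set) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `sort_embedding`, the random Gaussian directions, and the sizes `d`, `n`, `D` are hypothetical choices, not the paper's construction, and nothing here addresses the injectivity or bi-Lipschitz bounds.

```python
import random


def sort_embedding(points, projections):
    """Embed a point set by sorting each 1-D projection.

    points: list of n points, each a list of d floats.
    projections: list of D direction vectors, each a list of d floats.
    Returns a flat list of n * D floats.
    """
    embedding = []
    for w in projections:
        # Project every point onto direction w, then sort the n scalars.
        projected = sorted(sum(wi * xi for wi, xi in zip(w, x)) for x in points)
        embedding.extend(projected)
    return embedding


# Hypothetical sizes and random directions, chosen only for the demo.
rng = random.Random(0)
d, n, D = 3, 4, 5
points = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
projections = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(D)]

# Reordering the points leaves the embedding unchanged.
shuffled = points[:]
rng.shuffle(shuffled)
print(sort_embedding(points, projections) == sort_embedding(shuffled, projections))
```

Sorting discards the order in which the points were listed, which is what makes the map permutation-invariant; how large $D$ must be for the map to be injective is the question the paper studies.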

2510.19731 2026-05-26 eess.SY cs.LG cs.SY Version update

Bridging Earth and Space: A Survey on HAPS for Non-Terrestrial Networks

G. Svistunov, A. Akhtarshenas, D. López-Pérez, M. Giordani, G. Geraci, H. Yanikomeroglu

Affiliations: Universitat Politècnica de València; University of Padova; Universitat Pompeu Fabra; Carleton University

AI summary: Surveys the use cases, technologies, and integration strategies of high-altitude platform stations (HAPS) in 6G non-terrestrial networks, emphasizing their key role in extending coverage, dynamic backhauling, massive IoT, and low-latency communications.

Comments 43 pages. This work has been submitted to IEEE for possible publication (under review)

Abstract

High-altitude platform stations (HAPS) are emerging as key enablers in the evolution of 6G wireless networks, bridging terrestrial and non-terrestrial infrastructures. Operating in the stratosphere, HAPS can provide wide-area coverage and low-latency, energy-efficient broadband communications with flexible deployment options for diverse applications. This survey delivers a comprehensive overview of HAPS use cases, technologies, and integration strategies within the 6G ecosystem. The roles of HAPS in extending connectivity to underserved regions, supporting dynamic backhauling, enabling massive IoT, and delivering reliable low-latency communications for autonomous and immersive services are discussed. The paper reviews state-of-the-art architectures for terrestrial and non-terrestrial network integration and highlights recent field trials. Furthermore, key enabling technologies such as channel modeling, AI-driven resource allocation, interference control, mobility management, and energy-efficient communications are examined. The paper also outlines open research challenges. By addressing existing gaps in the literature, this survey positions HAPS as a foundational component of globally integrated, resilient, and sustainable 6G networks.

2510.11296 2026-05-26 cs.CV cs.LG Version update

ΔEnergy: Optimizing Energy Change During Vision-Language Alignment Improves both OOD Detection and OOD Generalization

Lin Zhu, Yifeng Yang, Xinbing Wang, Qinying Gu, Nanyang Ye

Affiliations: Shanghai Jiao Tong University; Shanghai Artificial Intelligence Laboratory

AI summary: Proposes the ΔEnergy score, which uses the energy change observed when re-aligning vision-language modalities to improve both OOD detection and OOD generalization, and builds a unified fine-tuning framework on lower-bound maximization of ΔEnergy (EBM).

Comments Accepted by NeurIPS 2025

Abstract

Recent approaches for vision-language models (VLMs) have shown remarkable success in achieving fast downstream adaptation. When applied to real-world downstream tasks, VLMs inevitably encounter both in-distribution (ID) and out-of-distribution (OOD) data. The OOD datasets often include both covariate shifts (e.g., known classes with changes in image styles) and semantic shifts (e.g., test-time unseen classes). This highlights the importance of improving VLMs' generalization ability to covariate-shifted OOD data, while effectively detecting open-set semantic-shifted OOD classes. In this paper, inspired by the substantial energy change observed in closed-set data when re-aligning vision-language modalities (specifically by directly reducing the maximum cosine similarity to a low value), we introduce a novel OOD score, named ΔEnergy. ΔEnergy significantly outperforms the vanilla energy-based OOD score and provides a more reliable approach for OOD detection. Furthermore, ΔEnergy can simultaneously improve OOD generalization under covariate shifts, which is achieved by lower-bound maximization for ΔEnergy (termed EBM). EBM is theoretically proven to not only enhance OOD detection but also yield a domain-consistent Hessian, which serves as a strong indicator for OOD generalization. Based on this finding, we develop a unified fine-tuning framework that improves VLMs' robustness in both OOD generalization and OOD detection. Extensive experiments on challenging OOD detection and generalization benchmarks demonstrate the superiority of our method, outperforming recent approaches by 10% to 25% in AUROC.

2510.10921 2026-05-26 cs.CV cs.AI cs.LG Version update

FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model

Chunyu Xie, Bin Wang, Fanjing Kong, Jincheng Li, Dawei Liang, Ji Ao, Dawei Leng, Yuhui Yin

Affiliations: 360 AI Research

AI summary: Proposes FG-CLIP 2, a bilingual vision-language model that achieves fine-grained alignment in English and Chinese through fine-grained supervision, including region-text matching, long-caption modeling, and a Textual Intra-modal Contrastive (TIC) loss, and reports state-of-the-art results on 29 datasets.

Comments Accepted in ICML2026

Abstract

Fine-grained vision-language understanding requires precise alignment between visual content and linguistic descriptions, a capability that remains limited in current models, particularly in non-English settings. While models like CLIP perform well on global alignment, they often struggle to capture fine-grained details in object attributes, spatial relations, and linguistic expressions, with limited support for bilingual comprehension. To address these challenges, we introduce FG-CLIP 2, a bilingual vision-language model designed to advance fine-grained alignment for both English and Chinese. Our approach leverages rich fine-grained supervision, including region-text matching and long-caption modeling, alongside multiple discriminative objectives. We further introduce the Textual Intra-modal Contrastive (TIC) loss to better distinguish semantically similar captions. Trained on a carefully curated mixture of large-scale English and Chinese data, including a newly released 12M Chinese region-text dataset, FG-CLIP 2 achieves powerful bilingual performance. To enable rigorous evaluation, we present a new benchmark for Chinese multimodal understanding, featuring long-caption retrieval and bounding box classification. Extensive experiments on 29 datasets across 8 tasks show that FG-CLIP 2 outperforms existing methods, achieving state-of-the-art results in both languages. We release the model, code, and benchmark to facilitate future research on bilingual fine-grained vision-language alignment.

2510.08558 2026-05-26 cs.AI cs.CL cs.IR cs.LG Version update

Agent Learning via Early Experience

Kai Zhang, Xiangchao Chen, Bo Liu, Tianci Xue, Zeyi Liao, Zhihan Liu, Xiyao Wang, Yuting Ning, Zhaorun Chen, Xiaohan Fu, Jian Xie, Yuxuan Sun, Boyu Gou, Qi Qi, Zihang Meng, Jianwei Yang, Ning Zhang, Xian Li, Ashish Shah, Dat Huynh, Hengduo Li, Zi Yang, Sara Cao, Lawrence Jang, Shuyan Zhou, Jiacheng Zhu, Huan Sun, Jason Weston, Yu Su, Yifan Wu

Affiliations: Meta Superintelligence Labs; FAIR at Meta; The Ohio State University

AI summary: Proposes the early experience paradigm, in which interaction data generated by the agent's own actions (without reward signals) is used through two strategies, implicit world modeling and self-reflection, to improve agent effectiveness and out-of-domain generalization across diverse environments.

Comments ICML 2026

Abstract

A long-term goal of language agents is to learn and improve through their own experience, ultimately outperforming humans in complex, real-world tasks. However, training agents from experience data with reinforcement learning remains difficult in many environments, which either lack verifiable rewards (e.g., websites) or require inefficient long-horizon rollouts (e.g., multi-turn tool use). As a result, most current agents rely on supervised fine-tuning on expert data, which is challenging to scale and generalizes poorly. This limitation stems from the nature of expert demonstrations: they capture only a narrow range of scenarios, and expose the agent to limited environment diversity. We address this limitation with a middle-ground paradigm we call early experience: interaction data generated by the agent's own actions, where the resulting future states serve as supervision without reward signals. Within this paradigm, we study two strategies of using such data: (1) implicit world modeling, which uses collected states to ground the policy in environment dynamics; and (2) self-reflection, where the agent learns from its suboptimal actions to improve reasoning and decision-making. Evaluation across eight diverse environments and multiple model families shows that our approaches consistently improve effectiveness and out-of-domain generalization, highlighting the value of early experience. Moreover, in environments with verifiable rewards, our results provide promising signals that early experience offers a strong foundation for subsequent reinforcement learning, making it a practical bridge between imitation learning and fully experience-driven agents.

2510.08350 2026-05-26 cs.LG cs.AI Version update

DeepEN: A Deep Reinforcement Learning Framework for Personalized Enteral Nutrition in Critical Care

Daniel Jason Tan, Jiayang Chen, Dilruk Perera, Kay Choong See, Mengling Feng

Affiliations: Institute of Data Science; Saw Swee Hock School of Public Health, National University of Singapore, Singapore; National University Hospital, Singapore

AI summary: Proposes DeepEN, a deep reinforcement learning framework that learns personalized enteral nutrition targets from electronic health records and, on MIMIC-IV, is estimated to reduce absolute mortality by 4.0 percentage points relative to clinician practice.

Abstract

Objective: Enteral nutrition (EN) delivery in the ICU remains suboptimal due to limited personalization and uncertainty regarding appropriate calorie, protein, and fluid targets under dynamic metabolic demands. We introduce DeepEN, a reinforcement learning (RL) framework for personalized EN optimization using electronic health record data. Methods: DeepEN was trained on over 11,000 ICU patients from MIMIC-IV to generate 4-hourly, patient-specific caloric, protein, and fluid targets. The state representation incorporated demographics, comorbidities, vital signs, laboratory values, and recent interventions. A physiologically aligned reward framework balanced biomarker stability with long-term survival. Policy learning employed a dueling double deep Q-network with Conservative Q-Learning regularization to enable safe offline training. Results: DeepEN achieved the highest estimated policy value ($V^\pi = 9.48$) and the lowest calibrated mortality (18.8 ± 1.0%), representing a 4.0 percentage-point absolute reduction compared with clinician practice (22.8%). The policy also demonstrated superior metabolic stability, achieving the highest proportion of glucose, phosphate, and sodium values within target range. Furthermore, deviation from the DeepEN policy was independently associated with increased mortality and biomarker instability, whereas deviation from a random policy showed no such association. Interpretability analyses further indicated that recommendations were conditioned on physiologically relevant markers of organ function and metabolic status rather than static dosing heuristics. Conclusion: DeepEN demonstrates the feasibility of conservative offline RL for safe, individualized EN optimization, highlighting the potential of data-driven personalization to complement guideline-based approaches in critical care.

2510.06672 2026-05-26 cs.LG Version update

XRPO: Pushing the limits of GRPO with Targeted Exploration and Exploitation

Udbhav Bamba, Minghao Fang, Yifan Yu, Haizhong Zheng, Fan Lai

Affiliations: University of Illinois Urbana-Champaign; Carnegie Mellon University

AI summary: Proposes the XRPO framework, which combines an adaptive rollout allocator, an in-context seeding strategy, and a novelty-aware advantage mechanism to improve over GRPO by up to 4% pass@1 and 6% cons@32 on math and coding benchmarks while accelerating training convergence by up to 2.7x.

Abstract

Reinforcement learning algorithms such as GRPO have driven recent advances in large language model (LLM) reasoning. While scaling the number of rollouts stabilizes training, existing approaches suffer from limited exploration on challenging prompts and leave informative feedback signals underexploited, due to context-independent rollout allocation across prompts (e.g., generating 16 rollouts per prompt) and heavy reliance on sparse rewards. This paper presents XRPO (eXplore-eXploit GRPO), a unified framework that recasts policy optimization through the principled lens of rollout exploration-exploitation. To enhance exploration, XRPO introduces a mathematically grounded rollout allocator that adaptively prioritizes prompts with higher potential for uncertainty reduction. It further addresses stagnation on zero-reward prompts through an in-context seeding strategy that injects curated exemplars, steering the model into more difficult reasoning trajectories. To strengthen exploitation, XRPO develops a group-relative, novelty-aware advantage sharpening mechanism that leverages sequence likelihoods to amplify low-probability yet correct responses, thereby extending the policy's reach beyond sparse rewards. Experiments across diverse math and coding benchmarks on both reasoning and non-reasoning models demonstrate that XRPO outperforms existing advances (e.g., GRPO and GSPO) by up to 4% pass@1 and 6% cons@32, while accelerating training convergence by up to 2.7x.
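For context on the "stagnation on zero-reward prompts" mentioned in this abstract, here is a minimal sketch of the plain group-relative advantage that GRPO-style methods normalize within one prompt's rollouts. It is not XRPO's allocator, seeding strategy, or novelty-aware sharpening, whose formulas the abstract does not give. The function name, the population standard deviation, and the `eps` value are assumptions made for the illustration.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize one prompt's rollout rewards within the group.

    Assumption: population standard deviation and a small eps in the
    denominator; implementations differ on both details.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# One correct rollout out of four gets a positive advantage.
print(group_relative_advantages([1.0, 0.0, 0.0, 0.0]))

# A prompt where every rollout fails yields all-zero advantages,
# so it contributes no learning signal.
print(group_relative_advantages([0.0, 0.0, 0.0, 0.0]))
```

The second call shows why an all-failure group contributes no gradient under this normalization, which is the situation the in-context seeding strategy is described as targeting.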

2510.05688 2026-05-26 cs.LG cs.AI Version update

vAttention: Verified Sparse Attention

Aditya Desai, Kumar Krishna Agrawal, Shuo Yang, Alejandro Cuadron, Luis Gaspar Schroeder, Matei Zaharia, Joseph E. Gonzalez, Ion Stoica

Affiliations: Electrical Engineering and Computer Sciences, University of California, Berkeley

AI summary: Proposes vAttention, which unifies top-k and random sampling to give the first practical sparse attention mechanism with user-specified (ε, δ) guarantees on approximation accuracy, significantly improving the quality-efficiency trade-off.

Journal ref: Proceedings of the International Conference on Learning Representations (ICLR), 2026

Abstract

State-of-the-art sparse attention methods for reducing decoding latency fall into two main categories: approximate top-$k$ (and its extension, top-$p$) and recently introduced sampling-based estimation. However, these approaches are fundamentally limited in their ability to approximate full attention: they fail to provide consistent approximations across heads and query vectors and, most critically, lack guarantees on approximation quality, limiting their practical deployment. We observe that top-$k$ and random sampling are complementary: top-$k$ performs well when attention scores are dominated by a few tokens, whereas random sampling provides better estimates when attention scores are relatively uniform. Building on this insight and leveraging the statistical guarantees of sampling, we introduce vAttention, the first practical sparse attention mechanism with user-specified $(ε, δ)$ guarantees on approximation accuracy (thus, "verified"). These guarantees make vAttention a compelling step toward practical, reliable deployment of sparse attention at scale. By unifying top-$k$ and sampling, vAttention outperforms both individually, delivering a superior quality-efficiency trade-off. Our experiments show that vAttention significantly improves the quality of sparse attention (e.g., $\sim$4.5 percentage points for Llama 3.1 8B Instruct and DeepSeek-R1-Distill-Llama-8B on RULER-HARD), and effectively bridges the gap between full and sparse attention (e.g., across datasets, it matches full model quality with up to 20x sparsity). We also demonstrate that it can be deployed in reasoning scenarios to achieve fast decoding without compromising model quality (e.g., vAttention achieves full model quality on AIME2024 at 10x sparsity with up to 32K token generations). Code: https://github.com/skylight-org/sparse-attention-hub. Webpage: https://sky-light.eecs.berkeley.edu.
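The complementarity of top-$k$ and random sampling noted in this abstract can be illustrated with a generic hybrid estimator: compute the $k$ largest-score tokens exactly and estimate the remaining mass from a uniform sample. This is a sketch under assumptions, not vAttention itself. It uses scalar values for brevity, all function and parameter names are hypothetical, and it does not implement the $(ε, δ)$ guarantee or any rule for choosing the sample size.

```python
import math
import random


def full_attention(scores, values):
    """Exact softmax-weighted average of scalar values."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def hybrid_attention(scores, values, k, num_samples, rng):
    """Top-k tokens exactly, plus a uniform sample of the rest (assumes k >= 1)."""
    m = max(scores)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top, rest = order[:k], order[k:]
    num = sum(math.exp(scores[i] - m) * values[i] for i in top)
    den = sum(math.exp(scores[i] - m) for i in top)
    if rest and num_samples > 0:
        sampled = rng.sample(rest, min(num_samples, len(rest)))
        # Scale the sampled sums up to the size of the residual set.
        scale = len(rest) / len(sampled)
        num += scale * sum(math.exp(scores[i] - m) * values[i] for i in sampled)
        den += scale * sum(math.exp(scores[i] - m) for i in sampled)
    return num / den


rng = random.Random(0)
scores = [rng.gauss(0, 1) for _ in range(32)]
values = [rng.gauss(0, 1) for _ in range(32)]
exact = full_attention(scores, values)
approx = hybrid_attention(scores, values, k=4, num_samples=8, rng=rng)
print(exact, approx)
```

When the sample covers the whole residual set the estimate coincides with full attention up to rounding; with fewer samples the error depends on how uniform the residual scores are, which is the regime the abstract says sampling handles well.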

2509.23975 2026-05-26 eess.SY cs.LG cs.NA cs.SY math.NA math.OC Version update

Equation-Free Coarse Control of Distributed Parameter Systems via Local Neural Operators

Gianluca Fabiani, Constantinos Siettos, Ioannis G. Kevrekidis

Affiliations: Hopkins Extreme Materials Institute and Department of Chemical and Biomolecular Engineering, Johns Hopkins University; Dipartimento di Matematica e Applicazioni "Renato Caccioppoli", Università degli studi di Napoli Federico II; Department of Chemical and Biomolecular Engineering and Department of Applied Mathematics and Statistics, Johns Hopkins University

AI summary: Proposes a data-driven approach that uses local neural operators to learn short-time solution operators and combines them with Krylov subspace methods to compute steady states and reduced-order models, enabling control of high-dimensional distributed parameter systems without explicit coarse-grained equations.

Comments 8 pages, 2 figures

Abstract

The control of high-dimensional distributed parameter systems (DPS) remains a challenge when explicit coarse-grained equations are unavailable. Classical equation-free (EF) approaches rely on fine-scale simulators treated as black-box timesteppers. However, repeated simulations for steady-state computation, linearization, and control design are often computationally prohibitive, or the microscopic timestepper may not even be available, leaving us with data as the only resource. We propose a data-driven alternative that uses local neural operators, trained on spatiotemporal microscopic/mesoscopic data, to obtain efficient short-time solution operators. These surrogates are employed within Krylov subspace methods to compute coarse stable and unstable steady states, while also providing Jacobian information in a matrix-free manner. Krylov-Arnoldi iterations then approximate the dominant eigenspectrum, yielding reduced models that capture the open-loop slow dynamics without explicit Jacobian assembly. Both discrete-time Linear Quadratic Regulator (dLQR) and pole-placement (PP) controllers are based on this reduced system and lifted back to the full nonlinear dynamics, thereby closing the feedback loop. The framework is validated by stabilizing an unstable steady-state of the Liouville-Bratu PDE, demonstrating consistent performance between the learned surrogate and the true system, with quantified degradation under plant-model mismatch.

2509.22299 2026-05-26 cs.LG cs.AI Version update

HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space

Ke Li, Zheng Yang, Zhongbin Zhou, Feng Xue, Zhonglin Jiang, Wenxiao Wang

Affiliations: School of Software Technology, Zhejiang University; FABU Inc.; Hangzhou Kuaidi Science and Technology Co., Ltd.

AI summary: To address the accuracy loss caused by coarse-grained expert pruning in MoE models, proposes HEAPr, which decomposes experts into atomic experts, scores their importance with second-order information (following Optimal Brain Surgeon principles), and simplifies the computation in output space, achieving nearly lossless compression at pruning ratios of 20% to 25% in most models.

Comments ICLR 2026

Journal ref: Proceedings of the International Conference on Learning Representations (ICLR), 2026

Abstract

Mixture-of-Experts (MoE) architectures in large language models (LLMs) deliver exceptional performance and reduced inference costs compared to dense LLMs. However, their large parameter counts result in prohibitive memory requirements, limiting practical deployment. While existing pruning methods primarily focus on expert-level pruning, this coarse granularity often leads to substantial accuracy degradation. In this work, we introduce HEAPr, a novel pruning algorithm that decomposes experts into smaller, indivisible atomic experts, enabling more precise and flexible atomic expert pruning. To measure the importance of each atomic expert, we leverage second-order information based on principles similar to the Optimal Brain Surgeon theory. To address the computational and storage challenges posed by second-order information, HEAPr exploits the inherent properties of atomic experts to transform the second-order information from expert parameters into that of atomic expert parameters, and further simplifies it to the second-order information of atomic expert outputs. This approach reduces the space complexity from $O(d^4)$, where $d$ is the model's dimensionality, to $O(d^2)$. HEAPr requires only two forward passes and one backward pass on a small calibration set to compute the importance of atomic experts. Extensive experiments on MoE models, including the DeepSeek MoE and Qwen MoE families, demonstrate that HEAPr outperforms existing expert-level pruning methods across a wide range of pruning ratios and benchmarks. Specifically, HEAPr achieves nearly lossless compression at pruning ratios of 20% to 25% in most models, while also reducing FLOPs by nearly 20%. The code can be found at https://github.com/LLIKKE/HEAPr.

2509.21592 2026-05-26 cs.CV cs.AI cs.LG Version update

What Happens Next? Anticipating Future Motion by Generating Point Trajectories

Gabrijel Boduljak, Laurynas Karazija, Iro Laina, Christian Rupprecht, Andrea Vedaldi

Affiliations: Visual Geometry Group, University of Oxford

AI summary: Proposes a method for forecasting future motion from a single image by generating dense trajectory grids that capture scene dynamics and uncertainty; it gives more accurate and diverse predictions than prior methods and is shown to be effective on downstream tasks such as robotics.

Journal ref: ICLR 2026

Abstract

We consider the problem of forecasting motion from a single image, i.e., predicting how objects in the world are likely to move, without the ability to observe other parameters such as the object velocities or the forces applied to them. We formulate this task as conditional generation of dense trajectory grids with a model that closely follows the architecture of modern video generators but outputs motion trajectories instead of pixels. This approach captures scene-wide dynamics and uncertainty, yielding more accurate and diverse predictions than prior regressors and generators. We extensively evaluate our method on simulated data, demonstrate its effectiveness on downstream applications such as robotics, and show promising accuracy on real-world intuitive physics datasets. Although recent state-of-the-art video generators are often regarded as world models, we show that they struggle with forecasting motion from a single image, even in simple physical scenarios such as falling blocks or mechanical object interactions, despite fine-tuning on such data. We show that this limitation arises from the overhead of generating pixels rather than directly modeling motion.

2509.16931 2026-05-26 cs.IR cs.AI cs.LG Version update

Equip Pre-ranking with Target Attention by Residual Quantization

Yutong Li, Yu Zhu, Yichen Qiao, Ziyu Guan, Lv Shao, Tong Liu, Bo Zheng

Affiliations: Taobao & Tmall Group of Alibaba, Hangzhou, China; Shanghai Jiao Tong University, Shanghai, China; Xidian University, Xi'an, China; Taobao & Tmall Group of Alibaba, Beijing, China

AI summary: Proposes the TARQ framework, which uses residual quantization to approximate the target attention (TA) architecture in the pre-ranking stage, bringing TA's modeling power to this latency-critical stage for the first time and setting a new state-of-the-art trade-off between accuracy and efficiency.

Comments 5 pages, 2 figures, accepted by SIGIR 2026 Short Paper Track

Abstract

The pre-ranking stage in industrial recommendation systems faces a fundamental conflict between efficiency and effectiveness. While powerful models like Target Attention (TA) excel at capturing complex feature interactions in the ranking stage, their high computational cost makes them infeasible for pre-ranking, which often relies on simplistic vector-product models. This disparity creates a significant performance bottleneck for the entire system. To bridge this gap, we propose TARQ, a novel pre-ranking framework. Inspired by generative models, TARQ's key innovation is to equip pre-ranking with an architecture that approximates TA by means of residual quantization. This allows us to bring the modeling power of TA into the latency-critical pre-ranking stage for the first time, establishing a new state-of-the-art trade-off between accuracy and efficiency. Extensive offline experiments and large-scale online A/B tests at Taobao demonstrate TARQ's significant improvements in ranking performance. Consequently, our model has been fully deployed in production, serving tens of millions of daily active users and yielding substantial business improvements. The code and data are available at https://github.com/zyody/tarq_sigir2026.

2509.16139 2026-05-26 cs.LG updated version

Spatio-temporal, multi-field deep learning of shock propagation in meso-structured media

M. Giselle Fernández-Godino, Meir H. Shachar, Kevin Korner, Jonathan L. Belof, Mukul Kumar, Jonathan Lind, William J. Schill

Affiliations * Lawrence Livermore National Laboratory

AI summary: Proposes a multi-field spatio-temporal model (MSTM) trained on multiscale multiphysics data that simultaneously evolves seven coupled thermodynamic and kinetic fields, predicts anomalous responses in shock propagation with high accuracy, and achieves a 1000x reduction in time to solution.

Comments 25 pages, 12 figures

Abstract

Predicting the extreme hydrodynamic response of porous and architected lattice materials is a fundamental challenge in high energy density physics, where shock-induced pore collapse, baroclinic vorticity, and anomalous kinetic and thermodynamic states must be resolved across multiple scales. Traditional high-fidelity hydrocodes are computationally prohibitive for large-scale design exploration in applications like planetary defense and inertial confinement fusion. We present a multi-field spatio-temporal model (MSTM) designed to overcome the limitations of standard machine learning surrogates, which often fail to capture the sharp gradients and non-linear field couplings characteristic of shock propagation. By training on high-fidelity, multiscale multiphysics data, MSTM simultaneously evolves seven coupled thermodynamic and kinetic fields - including pressure, temperature, density, and velocity - across complex material architectures. Our framework demonstrates the ability to accurately predict anomalous responses, such as counterintuitive post-shock density reductions and localized hotspot formation, with mean root mean squared errors as low as 1.4%. Crucially, the model's multi-field formulation maintains physical consistency and interface stability over long autoregressive rollouts, outperforming single-field models by 94% in structural fidelity. This framework enables a 1000x reduction in time to solution, providing a practical pathway for the real-time analysis and optimization of energy dissipation and momentum transfer in meso-structured media.

2509.04445 2026-05-26 cs.LG updated version

Towards Cognitively-Faithful Decision-Making Models to Improve AI Alignment

Cyrus Cousins, Vijay Keswani, Vincent Conitzer, Hoda Heidari, Jana Schaich Borg, Walter Sinnott-Armstrong

Affiliations * Duke University; IIT Delhi; Carnegie Mellon University (CMU)

AI summary: Proposes an axiomatic approach to learning cognitively faithful decision processes from pairwise comparisons, addressing the failure of standard preference elicitation methods to capture the cognitive processes behind human decisions, and validates the models on a kidney allocation task.

Comments In ICLR 2026

Abstract

Recent AI trends seek to align AI models to learned human-centric objectives, such as personal preferences, utility, or societal values. Using standard preference elicitation methods, researchers and practitioners build models of human decisions and judgments, to which AI models are aligned. However, standard elicitation methods often fail to capture the cognitive processes behind human decision making, such as heuristics or simplifying structured thought patterns. To address this failure, we take an axiomatic approach to learning cognitively faithful decision processes from pairwise comparisons. Building on the literature analyzing cognitive processes that shape human decision-making, we derive a model class in which features are first processed with learned rules, then aggregated via a fixed rule, such as the Bradley-Terry rule, to produce a decision. This structured processing of information ensures that such models are realistic and feasible candidates to represent underlying human decision-making processes. We demonstrate the efficacy of this modeling approach by learning interpretable models of human decision making in a kidney allocation task, and show that our proposed models match or surpass the accuracy of prior models of human pairwise decision-making.
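
The two-step structure described above (per-feature rules, then a fixed aggregation rule) can be sketched as follows. The Bradley-Terry formula is standard; the feature names and the two hand-written rules are hypothetical stand-ins for whatever rules the paper's models actually learn.

```python
import math

def bradley_terry(score_a, score_b):
    """Probability that option A is chosen over option B under the Bradley-Terry rule."""
    return 1.0 / (1.0 + math.exp(score_b - score_a))

def score(option, rules):
    """Sum per-feature scores; each rule maps one raw feature value to a score."""
    return sum(rule(option[name]) for name, rule in rules.items())

# Hypothetical per-feature rules standing in for learned ones.
rules = {
    "age": lambda years: 1.0 if years < 40 else 0.0,   # threshold heuristic
    "dependents": lambda n: 0.5 * min(n, 2),           # capped, diminishing returns
}
option_a = {"age": 35, "dependents": 1}
option_b = {"age": 60, "dependents": 3}
p_a = bradley_terry(score(option_a, rules), score(option_b, rules))
```

Because every rule acts on a single feature before aggregation, each one can be inspected on its own, which is one plausible reading of the interpretability claim.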

2508.17090 2026-05-26 stat.ML cs.LG updated version

Neural Stochastic Differential Equations on Compact State Spaces: Theory, Methods, and Application to Suicide Risk Modeling

Malinda Lu, Yue-Jane Liu, Matthew K. Nock, Yaniv Yacoby

Affiliations * Wellesley College; Harvard University

AI summary: Addresses domain-constraint violations and unstable training of SDE models for ecological momentary assessment data by proposing a new class of expressive SDEs whose drift and diffusion are constrained so that solutions stay within a compact polyhedral state space, together with a parameterization that maps arbitrary dynamics into constraint-satisfying SDEs; on real data it improves forecasts and optimization dynamics.

Comments Accepted at the Symposium on Probabilistic Machine Learning (ProbML) 2026, and at the Methods and Opportunities at Small Scale (MOSS), ICML 2025, Vancouver, Canada

Abstract

Ecological Momentary Assessment (EMA) studies enable the collection of high-frequency self-reports of suicidal thoughts and behaviors (STBs) via smartphones. Latent stochastic differential equations (SDEs) are a promising model class for EMA data, as it is irregularly sampled, noisy, and partially observed. But SDE-based models suffer from two key limitations. (a) These models often violate domain constraints, undermining scientific validity and clinical trust of the model. (b) Training is numerically unstable without ad hoc fixes (e.g. oversimplified dynamics) that are ill-suited for high-stakes applications. Here, we develop a novel class of expressive SDEs whose solutions are provably confined to a prescribed compact polyhedral state space, matching the domains of EMA data. In this work, (1) we show why chain-rule based constructions of SDEs on compact domains fail, theoretically and empirically; (2) we derive constraints on drift and diffusion for general and stationary SDEs so their solutions remain in the desired state space; and (3), we introduce a parameterization that maps arbitrary (neural or expert-given) dynamics into constraint-satisfying SDEs. On several real EMA datasets, including a large suicide-risk study, our parameterization improves forecasts and optimization dynamics over standard latent neural SDE baselines. These contributions pave the way for principled, trustworthy continuous-time models of suicide risk and other clinical time series and extend applications of SDE-based methods (e.g. diffusion models) to domains with hard state constraints.

2508.13309 2026-05-26 cs.CV cs.LG updated version

DASH: A Meta-Attack Framework for Synthesizing Effective and Stealthy Adversarial Examples

Abdullah Al Nomaan Nafi, Habibur Rahaman, Zafaryab Haider, Tanzim Mahfuz, Fnu Suya, Swarup Bhunia, Prabuddha Chakraborty

Affiliations * University of Maine; University of Florida; University of Tennessee, Knoxville

AI summary: Proposes DASH, a meta-attack framework that adaptively composes Lp-constrained attacks over multiple stages to generate effective, perceptually aligned adversarial examples, outperforming existing methods on several datasets.

Comments Accepted to CVPR 2026

Abstract

Numerous techniques have been proposed for generating adversarial examples in white-box settings under strict Lp-norm constraints. However, such norm-bounded examples often fail to align well with human perception, and only a few methods specifically explore perceptually aligned adversarial examples. Moreover, it remains unclear whether insights from Lp-constrained attacks can be effectively leveraged to improve perceptual efficacy. In this paper, we introduce DASH, a fully differentiable meta-attack framework that generates effective and perceptually aligned adversarial examples by strategically composing existing Lp-based attack methods. DASH operates in a multi-stage fashion: at each stage, it aggregates candidate adversarial examples from multiple base attacks using learned, adaptive weights and propagates the result to the next stage. A novel meta-loss function guides this process by jointly minimizing misclassification loss and perceptual distortion, enabling the framework to dynamically modulate the contribution of each base attack throughout the stages. We evaluate DASH on adversarially trained models across CIFAR-10, CIFAR-100, and ImageNet. Despite relying solely on Lp-constrained methods, DASH significantly outperforms state-of-the-art perceptual attacks such as AdvAD, achieving higher attack success rates (e.g., 20.63% improvement) and superior visual quality, as measured by SSIM, LPIPS, and FID (improvements of approximately 11, 0.015, and 5.7, respectively). Furthermore, DASH generalizes well to unseen defenses, making it a practical and strong baseline for evaluating robustness without requiring handcrafted adaptive attacks for each new defense.
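
The abstract says each stage aggregates candidates "using learned, adaptive weights" but gives no formula. A minimal sketch of one stage, assuming the weights are softmax-normalized and the aggregation is a pixel-wise convex combination (both assumptions, as are all names below), might look like this:

```python
import math

def softmax(logits):
    """Turn unconstrained weight logits into positive weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(candidates, logits):
    """Blend candidate adversarial images pixel-wise with softmax-normalized weights."""
    weights = softmax(logits)
    n_pixels = len(candidates[0])
    return [sum(w * cand[p] for w, cand in zip(weights, candidates))
            for p in range(n_pixels)]

clean = [0.5, 0.5, 0.5]
# Two hypothetical base-attack outputs, each within L_inf distance 0.03 of `clean`.
candidates = [[0.53, 0.47, 0.5], [0.5, 0.53, 0.47]]
blended = aggregate(candidates, logits=[0.0, 0.0])
```

Under this assumption the blend never leaves the Lp ball, because a convex combination of points inside a norm ball stays inside it; in a real pipeline the logits would be trained against the meta-loss rather than fixed at zero.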

2508.11925 2026-05-26 cs.CR cs.CL cs.LG updated version

Optimizing Token Choice for Code Watermarking: An RL Approach

Zhimeng Guo, Huaisheng Zhu, Siyuan Xu, Hangfan Zhang, Teng Xiao, Minhao Cheng

Affiliations * The Pennsylvania State University

AI summary: Proposes CodeTracer, a framework that trains a policy model with reinforcement learning to choose which tokens carry the watermark, improving watermark detectability while preserving code functionality.

Comments ICML 2026, 18 pages, 3 figures

Abstract

Protecting intellectual property on LLM-generated code necessitates effective watermarking systems that can operate within code's highly structured, syntactically constrained nature. In this work, we introduce CodeTracer, an innovative adaptive code watermarking framework underpinned by a novel reinforcement learning training paradigm. At its core, CodeTracer features a policy-driven approach that utilizes a parameterized model to intelligently bias token choices during next-token prediction. This strategy ensures that embedded watermarks maintain code functionality while exhibiting subtle yet statistically detectable deviations from typical token distributions. To facilitate policy learning, we devise a comprehensive reward system that seamlessly integrates execution feedback with watermark embedding signals, balancing process-level and outcome-level rewards. Additionally, we employ Gumbel Top-k reparameterization to enable gradient-based optimization of discrete watermarking decisions. Extensive comparative evaluations demonstrate CodeTracer's significant superiority over state-of-the-art baselines in both watermark detectability and the preservation of generated code's functionality. Our code is available at https://github.com/TimeLovercc/CodeTracer.
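
The Gumbel Top-k trick mentioned above can be illustrated in isolation: adding independent Gumbel(0, 1) noise to the logits and keeping the k largest perturbed values draws k distinct indices without replacement. This sketch covers only the sampling step; how CodeTracer relaxes it for gradient-based training is not described in the abstract, and the function name and logits are made up for illustration.

```python
import math
import random

def gumbel_top_k(logits, k, rng):
    """Sample k distinct indices by perturbing logits with Gumbel(0, 1) noise."""
    perturbed = []
    for i, logit in enumerate(logits):
        # random() returns values in [0.0, 1.0); guard against log(0).
        u = max(rng.random(), 1e-12)
        gumbel = -math.log(-math.log(u))
        perturbed.append((logit + gumbel, i))
    perturbed.sort(reverse=True)  # largest perturbed logit first
    return [i for _, i in perturbed[:k]]

picked = gumbel_top_k([2.0, 1.0, 0.5, 0.1], k=2, rng=random.Random(0))
```

Indices with larger logits are picked more often, but every index keeps a non-zero chance, which is what makes the selection stochastic rather than a plain arg-top-k.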

2508.03104 2026-05-26 cs.LG cs.AI updated version

HiTeC: Hierarchical Contrastive Learning on Text-Attributed Hypergraph with Semantic-Aware Augmentation

Mengting Pan, Fan Li, Chen Chen, Xiaoyang Wang, Wenjie Zhang

Affiliations * The University of New South Wales; University of Wollongong

AI summary: Proposes HiTeC, a two-stage hierarchical contrastive learning framework that combines structure-aware pre-training of the text encoder with semantic-aware augmentation to address three problems in text-attributed hypergraphs: weak coupling between text and topology, noise from random augmentation, and the difficulty of capturing long-range dependencies.

Comments 16 pages, 8 figures

Abstract

Contrastive learning (CL) has become a dominant paradigm for self-supervised hypergraph learning, enabling effective training without costly labels. However, node entities in real-world hypergraphs are often associated with rich textual information, which has been largely ignored in prior works. Directly applying existing CL-based methods to such text-attributed hypergraphs (TAHGs) leads to three key limitations: (1) The common use of graph-agnostic text encoders fails to capture the correlations between textual semantics and hypergraph topology, resulting in less expressive representations. (2) Their reliance on random data augmentations introduces noise and weakens the contrastive signals. (3) The primary focus on node- and hyperedge-level contrastive signals limits the ability to capture long-range dependencies, which is essential for effective representation learning. To address these challenges, we introduce HiTeC, a two-stage hierarchical contrastive learning framework for effective self-supervised learning on TAHGs. In the first stage, we pre-train the text encoder with a structure-aware contrastive objective to overcome the graph-agnostic nature of conventional methods. In the second stage, we begin by introducing semantic-aware augmentations, including structure-contextualized text augmentation and semantic-aware hyperedge dropping, to facilitate informative view generation. Subsequently, we propose a multi-scale contrastive loss with an $s$-walk-based subgraph-level objective to capture long-range dependencies. Extensive experiments on six real-world datasets validate the effectiveness of our proposed method.

2507.10593 2026-05-26 cs.SE cs.AI cs.CL cs.LG updated version

ToolRegistry: A Protocol-Agnostic Tool Management Library for Function-Calling LLMs

Peng Ding, Rick Stevens

Affiliations * University of Chicago; Argonne National Laboratory

AI summary: Proposes ToolRegistry, a system that manages tools in a protocol-agnostic way through a unified Tool object and a registry, supporting multiple transport protocols, pluggable backends, and advanced features, while substantially reducing integration code and improving throughput.

Comments 16 pages, 4 figures, v3: add co-author, permission system, progressive tool disclosure, think-augmented calling, RPC framing, multi-provider support

Abstract

Every LLM tool call is structurally an RPC -- a function name, JSON arguments, and a serialized result -- yet each protocol (native Python, MCP, OpenAPI, LangChain) is integrated from scratch. We present ToolRegistry, a system that makes this RPC nature explicit: a single Tool object acts as a universal stub regardless of transport, while the registry serves as the RPC client runtime for dispatch, schema generation, and execution. The system ships as three packages -- a core registry, a server exposing tools over MCP and OpenAPI, and a hub of production-ready implementations -- and invokes tools through pluggable thread or process backends. The system now also provides tag-based permission policies, BM25F-powered progressive tool disclosure for large registries, think-augmented function calling, multi-provider schema support (OpenAI, Anthropic, Gemini), declarative JSONC/YAML configuration, and a near-zero-dependency core built on stdlib-only vendored modules. In our benchmarks the library cuts integration code by 60-80%, and choosing the right concurrency mode (thread vs. process) yields up to 3.1x throughput over the alternative for a given workload. ToolRegistry is open-source at https://github.com/Oaklight/ToolRegistry; documentation lives at https://toolregistry.readthedocs.io/.
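
To make the "tool call as RPC" framing concrete, here is a toy registry written from scratch. It is not the ToolRegistry API: the class `MiniRegistry`, its methods, and the request shape are all hypothetical, chosen only to show a function name, JSON arguments, and a serialized result passing through one dispatch point.

```python
import inspect
import json

class MiniRegistry:
    """Toy registry: maps tool names to callables and dispatches JSON-encoded calls."""

    def __init__(self):
        self._tools = {}

    def register(self, func):
        """Register a plain Python function under its own name."""
        self._tools[func.__name__] = func
        return func

    def schema(self, name):
        """Describe a tool by its parameter names (a stand-in for real schema generation)."""
        params = list(inspect.signature(self._tools[name]).parameters)
        return {"name": name, "parameters": params}

    def call(self, request_json):
        """Execute {"name": ..., "arguments": {...}} and serialize the result."""
        request = json.loads(request_json)
        result = self._tools[request["name"]](**request["arguments"])
        return json.dumps({"name": request["name"], "result": result})

registry = MiniRegistry()

@registry.register
def add(a, b):
    return a + b

response = registry.call('{"name": "add", "arguments": {"a": 2, "b": 3}}')
```

The caller never touches the function object directly, so the same request could in principle be served by a local function or a remote endpoint, which is the point of treating the registry as an RPC client runtime.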

2507.03159 2026-05-26 cs.LG math.OC updated version

MathOptAI.jl: Embed trained machine learning predictors into JuMP models

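Oscar Dowson, Robert B Parker, Russel Bent

Affiliations * Dowson Farms; Los Alamos National Laboratory

AI summary: Presents MathOptAI.jl, an open-source Julia library that embeds several kinds of trained machine learning models (neural networks, decision trees, Gaussian processes) into JuMP optimization models and supports GPU acceleration for PyTorch models.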

2507.02215 2026-05-26 stat.ML cs.LG cs.NA math.NA updated version

Hybrid least squares for learning functions from highly noisy data

Ben Adcock, Bernhard Hientzsch, Akil Narayan, Yiming Xu

Affiliations * Department of Mathematics, Simon Fraser University; Courant Institute of Mathematical Sciences, New York University; Scientific Computing and Imaging Institute, University of Utah; Department of Mathematics, University of Kentucky

AI summary: For least-squares function approximation from highly noisy data, proposes a hybrid method combining Christoffel sampling with optimal experimental design that achieves optimality in both sample point generation and noise mollification, improves computational efficiency and sample complexity, and extends to convexity-constrained settings and adaptive random subspaces.

Comments 30 pages

Abstract

Motivated by the need for efficient estimation of conditional expectations, we consider a least-squares function approximation problem with heavily polluted data. Existing methods that are effective in the small-noise regime are suboptimal when large noise is present. To address this issue, we propose a hybrid approach that combines Christoffel sampling with optimal experimental design. We show that the proposed algorithm enjoys appropriate optimality properties for both sample point generation and noise mollification, leading to improved computational efficiency and sample complexity compared to existing methods. We also extend the algorithm to convexity-constrained settings with similar theoretical guarantees. When the target function is defined as the expectation of a random field, we further extend our approach to leverage adaptive random subspaces and establish results on the approximation capacity of the adaptive procedure. Our theoretical findings are supported by numerical studies on both synthetic data and on a more challenging stochastic simulation problem in computational finance.

2506.21137 2026-05-26 cs.LG updated version

Norm$\times$Direction: Restoring the Missing Query Norm in Vision Linear Attention

Weikang Meng, Yadan Luo, Liangyu Huo, Yingjian Li, Yaowei Wang, Xin Li, Zheng Zhang

Affiliations * Harbin Institute of Technology, Shenzhen, China; Pengcheng Laboratory, China; The University of Queensland, Australia

AI summary: Targets the information loss in linear attention caused by the cancelled query norm and by non-negativity constraints, proposing NaLaFormer, which is built on a norm-direction decomposition: it injects the query norm to restore the spikiness of the attention distribution and uses a cosine-based similarity to guarantee non-negativity, setting a new state of the art for linear attention on several tasks.

Abstract

Linear attention mitigates the quadratic complexity of softmax attention but suffers from a critical loss of expressiveness. We identify two primary causes: (1) The normalization operation cancels the query norm, which breaks the correlation between a query's norm and the spikiness (entropy) of the attention distribution as in softmax attention. (2) Standard techniques for enforcing non-negativity cause destructive information loss by nullifying valid inner-product interactions. To address these challenges, we introduce NaLaFormer, a novel linear attention mechanism built upon a norm$\times$direction (ND) decomposition of the query and key vectors. We leverage each component to solve a distinct problem: The query norm is injected into our kernel to create a query-norm-aware map that restores the attention distribution's spikiness. The direction vectors are processed by a geometric, cosine-based similarity metric that guarantees non-negativity while preserving the rich, fine-grained information of the inner product. We validate NaLaFormer through a comprehensive multi-modal evaluation, where it sets new state-of-the-art benchmarks for linear attention. Our model achieves up to a 7.5% accuracy gain on ImageNet-1K and a 4.7% mIoU improvement on ADE20K over comparable baselines. It demonstrates profound efficiency, reducing peak memory by a transformative 92.3% in token-intensive super-resolution tasks (70K+ tokens). NaLaFormer's versatility is further confirmed as it surpasses strong baselines like Mamba on common-sense reasoning and sets a new state-of-the-art on the Long Range Arena (LRA) benchmark. Code is available at https://github.com/ZacharyMeng/NaLaFormer .
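
The first cause named above, the query norm cancelling under normalization, is easy to check numerically. The sketch assumes a ReLU feature map (a common positively homogeneous choice, not necessarily the one the paper analyzes) and toy 2-D keys; it shows the baseline problem only, not NaLaFormer's fix.

```python
import math

def relu_map(vec):
    """Positively homogeneous feature map: phi(c * x) = c * phi(x) for c > 0."""
    return [max(x, 0.0) for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def linear_attention_weights(query, keys):
    """Normalized linear-attention weights phi(q).phi(k_j) / sum_i phi(q).phi(k_i)."""
    phi_q = relu_map(query)
    scores = [dot(phi_q, relu_map(k)) for k in keys]
    total = sum(scores)
    return [s / total for s in scores]

def softmax_weights(query, keys):
    """Standard softmax attention weights over the raw inner products."""
    scores = [dot(query, k) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

keys = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
query = [2.0, 1.0]
scaled = [10 * x for x in query]  # same direction, ten times the norm

linear_small = linear_attention_weights(query, keys)
linear_large = linear_attention_weights(scaled, keys)
softmax_small = softmax_weights(query, keys)
softmax_large = softmax_weights(scaled, keys)
```

Scaling the query leaves the linear-attention weights unchanged because the factor appears in both numerator and denominator, while the softmax weights concentrate on the best-matching key as the norm grows.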

2506.19037 2026-05-26 cs.CL cs.AI cs.IT cs.LG cs.NE math.IT updated version

Plan for Speed: Dilated Scheduling for Masked Diffusion Language Models

Omer Luxembourg, Haim Permuter, Eliya Nachmani

Affiliations * School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beersheba, Israel

AI summary: Proposes the Dilated Unmasking Scheduler (DUS), which partitions sequence positions into non-adjacent dilated groups and unmasks them in parallel to minimize an upper bound on joint entropy gain, achieving up to 5.8x speedup without modifying the denoiser.

Comments Accepted at ICML 2026

Abstract

Masked diffusion language models (MDLMs) promise fast, non-autoregressive text generation, yet existing samplers, which pick tokens to unmask based on model confidence, ignore interactions when unmasking multiple positions in parallel and effectively reduce to slow, autoregressive behavior. We propose the Dilated Unmasking Scheduler (DUS), an inference-only, planner-model-free method that partitions sequence positions into non-adjacent dilated groups and unmasks them in parallel so as to minimize an upper bound on joint entropy gain at each denoising step. By explicitly trading off the number of network calls against generation quality, DUS recovers most of the performance lost under traditional parallel unmasking strategies. Across math (GSM8K, MATH500), code (HumanEval, MBPP), general-knowledge (BBH, MMLU-Pro), and instruction following (IFEval) benchmarks, DUS outperforms confidence-based planners and turns the diffusion-specific quality-speed trade-off into a deterministic, predictable speedup set by the block size $B$, yielding up to $5.8\times$ wall-clock speedup over token-by-token MDLM decoding without modifying the underlying denoiser. Applied as a drop-in post-filter, dilated spacing also improves adaptive samplers. Code is available at https://github.com/omerlux/DUS.
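
One simple way to form non-adjacent groups inside a block is a strided partition, sketched below. This is an assumption made for illustration: the abstract does not give the exact grouping or ordering DUS uses, and the function name and parameters are hypothetical.

```python
def dilated_groups(block_start, block_size, num_groups):
    """Split a block of positions into groups whose members are num_groups apart."""
    return [
        list(range(block_start + offset, block_start + block_size, num_groups))
        for offset in range(num_groups)
    ]

# A block of 8 positions unmasked in 4 denoiser calls instead of 8.
groups = dilated_groups(block_start=0, block_size=8, num_groups=4)
```

Each group would be unmasked in a single network call, so the number of calls per block equals the number of groups rather than the block size, and no two positions unmasked together are neighbors.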

2506.17326 2026-05-26 cs.LG stat.AP stat.ML updated version

CopulaSMOTE: A Copula-Based Oversampling Approach for Imbalanced Classification in Diabetes Prediction

Agnideep Aich, Md Monzur Murshed, Bruce Wade, Sameera Hewage

Affiliations * Stanford University School of Medicine; Minnesota State University; University of Louisiana at Lafayette; Southern Utah University

AI summary: Proposes CopulaSMOTE, which uses truncated vine copulas to model the joint dependence structure of the minority class when generating synthetic samples; evaluated with several classifiers on three diabetes datasets, it is shown to improve minority-class recovery on large tabular datasets.

Abstract

Class imbalance remains a practical obstacle in the development of clinical prediction models for conditions such as diabetes mellitus, where the number of confirmed cases is often much smaller than the number of controls. The Synthetic Minority Over-sampling Technique (SMOTE) and its variants are widely used to address this imbalance, but they generate synthetic observations through local interpolation in feature space and do not explicitly model the joint dependence structure of the minority class. To address this challenge, our study introduces a copula-based data augmentation approach that estimates the minority-class dependence structure when generating synthetic samples and integrates with standard machine learning techniques. Specifically, we employ truncated vine copulas to represent multivariate dependence through a sequence of bivariate building blocks. We evaluate the proposed approach on three public diabetes datasets, namely the Pima Indians Diabetes dataset, the Iraqi Diabetes dataset, and the CDC BRFSS 2015 Diabetes Health Indicators dataset, which together cover a range of sample sizes, dimensionalities, and imbalance regimes. For each dataset, five resampling strategies are compared across five classifiers using a 5 by 2 cross validation protocol with Dietterich's paired t test. Our findings suggest that CopulaSMOTE can improve minority-class recovery in larger tabular diabetes datasets, particularly the CDC BRFSS dataset, but its advantages depend on the classifier and evaluation metric.
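
For contrast with the copula approach, the local interpolation that plain SMOTE performs can be sketched in a few lines. This illustrates the baseline only, not CopulaSMOTE; the toy data and the function name are hypothetical.

```python
import random

def smote_sample(minority, k, rng):
    """Create one synthetic point between a random minority sample and one of its k nearest neighbors."""
    base = rng.choice(minority)
    others = [p for p in minority if p is not base]
    # Sort the remaining minority points by squared Euclidean distance to the base point.
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
    neighbor = rng.choice(others[:k])
    lam = rng.random()  # interpolation coefficient in [0, 1)
    return [a + lam * (b - a) for a, b in zip(base, neighbor)]

# Four minority points at the corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote_sample(minority, k=2, rng=random.Random(0))
```

Every synthetic point lies on a segment between two existing samples (here, an edge of the square), so the generator never uses the joint dependence between features; that is the gap a copula-based sampler is meant to fill.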

2506.11027 2026-05-26 cs.LG cs.AI cs.PL updated version

From Reasoning to Code: GRPO Optimization for Underrepresented Languages

Federico Pennino, Bianca Raimondi, Massimo Rondelli, Andrea Gurioli, Maurizio Gabbrielli

AI summary: Proposes a reinforcement learning approach that combines small Qwen2.5-Coder models with GRPO, using execution feedback and a reward mechanism to improve code accuracy and reasoning quality for low-resource languages such as Prolog and Lisp.

Comments Accepted ICLP 2026

Abstract

Generating accurate and executable code using Large Language Models (LLMs) remains a significant challenge for underrepresented programming languages, such as Prolog and Lisp, due to the scarcity of public training data compared to high-resource languages like Python. This paper introduces a generalizable Reinforcement Learning (RL) approach that combines small-scale versions of the Qwen2.5-Coder model with Group Relative Policy Optimization (GRPO) to enable effective code generation through reasoning. To address the limitations of sparse datasets, we integrate execution-driven feedback directly into the RL loop, utilizing a reward system that exploits both logical correctness and structural formatting. Experimental results on the GSM8K dataset demonstrate significant improvements in reasoning quality and code accuracy across underrepresented languages. These findings underscore the potential of our approach to benefit a wide range of programming languages lacking extensive training resources by leveraging symbolic reasoning and interpreter-based feedback.
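
The core of GRPO is that each sampled completion is scored relative to the other completions drawn for the same prompt. A minimal sketch of that normalization follows; the reward weights, the three reward components, and the use of the population standard deviation are assumptions for illustration, not values from the paper.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions sampled for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

def reward(executed_ok, answer_correct, well_formatted):
    """Hypothetical reward mixing interpreter feedback, correctness, and formatting."""
    return 0.5 * executed_ok + 1.0 * answer_correct + 0.25 * well_formatted

# Four completions for one prompt, from best to worst.
rewards = [reward(1, 1, 1), reward(1, 0, 1), reward(0, 0, 1), reward(0, 0, 0)]
advantages = group_relative_advantages(rewards)
```

Because the baseline is the group mean, no separate value network is needed, and a completion that merely runs without being correct still ranks above one that fails to execute.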

2506.06840 2026-05-26 stat.ML cs.AI cs.LG stat.AP stat.OT updated version

PhySense: Sensor Placement Optimization for Accurate Physics Sensing

Yuezhou Ma, Haixu Wu, Hang Zhou, Huikun Weng, Jianmin Wang, Mingsheng Long

Affiliations * School of Software, BNRist, Tsinghua University

AI summary: Proposes PhySense, a two-stage framework that jointly optimizes sensor placement and physical field reconstruction using a flow-based generative model and projected gradient descent, achieving highly accurate physics sensing.

Abstract

Physics sensing plays a central role in many scientific and engineering domains, which inherently involves two coupled tasks: reconstructing dense physical fields from sparse observations and optimizing scattered sensor placements to observe maximum information. While deep learning has made rapid advances in sparse-data reconstruction, existing methods generally omit optimization of sensor placements, leaving the mutual enhancement between reconstruction and placement on the shelf. To change this suboptimal practice, we propose PhySense, a synergistic two-stage framework that learns to jointly reconstruct physical fields and to optimize sensor placements, both aiming for accurate physics sensing. The first stage involves a flow-based generative model enhanced by cross-attention to adaptively fuse sparse observations. Leveraging the reconstruction feedback, the second stage performs sensor placement via projected gradient descent to satisfy spatial constraints. We further prove that the learning objectives of the two stages are consistent with classical variance-minimization principles, providing theoretical guarantees. Extensive experiments across three challenging benchmarks, especially a 3D geometry dataset, indicate PhySense achieves state-of-the-art physics sensing accuracy and discovers informative sensor placements previously unconsidered. Code is available at this repository: https://github.com/thuml/PhySense.
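
Projected gradient descent, which the second stage relies on, alternates a gradient step with a projection back onto the feasible set. The sketch below uses a box constraint and a toy quadratic objective as stand-ins; the actual spatial constraints and the reconstruction-based objective in PhySense are not specified in the abstract.

```python
def project_to_box(point, low, high):
    """Project a point onto the box [low, high]^d by clipping each coordinate."""
    return [min(max(x, low), high) for x in point]

def projected_gradient_descent(grad, start, low, high, step=0.1, iters=200):
    """Alternate a gradient step with a projection back onto the feasible set."""
    x = project_to_box(start, low, high)
    for _ in range(iters):
        g = grad(x)
        x = project_to_box([xi - step * gi for xi, gi in zip(x, g)], low, high)
    return x

# Toy objective: squared distance to a target that lies partly outside the box.
target = [1.5, 0.3]

def grad(x):
    return [2.0 * (xi - ti) for xi, ti in zip(x, target)]

placement = projected_gradient_descent(grad, start=[0.0, 0.0], low=0.0, high=1.0)
```

The first coordinate ends up pinned to the boundary because the unconstrained optimum is infeasible there, while the second converges to its unconstrained value.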

2505.11788 2026-05-26 cs.DC cs.IT cs.LG cs.NI eess.SP math.IT updated version

Communication-Efficient Hybrid Language Model via Uncertainty-Aware Opportunistic and Compressed Transmission

基于不确定性感知的机会主义与压缩传输的通信高效混合语言模型

Seungeun Oh, Jinhyuk Kim, Jihong Park, Seung-Woo Ko, Jinho Choi, Tony Q. S. Quek, Seong-Lyun Kim

发表机构 * School of Electrical and Electronic Engineering, Yonsei University, South Korea（延世大学电气电子工程学院，韩国）； Information Systems Technology and Design pillar, Singapore University of Technology and Design, Singapore 487372（新加坡科技设计大学信息系统技术与设计支柱，新加坡487372）； Department of Smart Mobility Engineering, Inha University, South Korea（Inha大学智能移动工程系，韩国）； School of Electrical and Mechanical Engineering, The University of Adelaide, Australia（阿德莱德大学电气与机械工程学院，澳大利亚）； Singapore University of Technology and Design, Singapore 487372（新加坡科技设计大学，新加坡487372）

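AI summary: Proposes CU-HLM, a communication-efficient hybrid language model that uses uncertainty-aware opportunistic transmission and vocabulary compression to achieve up to 206× higher token throughput while maintaining 97.4% accuracy.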

Comments 17 pages, 13 figures, 5 tables; This article has been accepted for publication in IEEE Transactions on Communications. This is the author's accepted version; the final published version will be available via IEEE Xplore

AI中文摘要

为了支持使用分散异构计算资源的新兴语言应用，混合语言模型（HLM）提供了一种有前景的架构，其中设备端的小语言模型（SLM）生成草稿令牌，由远程大语言模型（LLM）验证和纠正。然而，原始HLM存在巨大的通信开销，因为LLM要求SLM为每个令牌上传完整的词汇分布。此外，当LLM验证极有可能被接受的令牌时，通信和计算资源都被浪费。为了克服这些限制，我们提出了通信高效且不确定性感知的HLM（CU-HLM）。在CU-HLM中，SLM仅在其输出不确定性高时才传输截断的词汇分布。我们通过发现SLM的不确定性与LLM的拒绝概率之间的强相关性，验证了这种机会主义传输的可行性。此外，我们从理论上推导了最优不确定性阈值和最优词汇截断策略。仿真结果表明，与标准HLM相比，CU-HLM通过跳过74.8%的传输并压缩97.4%的词汇，实现了高达206倍的令牌吞吐量提升，同时保持了97.4%的准确率。

英文摘要

To support emerging language-based applications using dispersed and heterogeneous computing resources, the hybrid language model (HLM) offers a promising architecture, where an on-device small language model (SLM) generates draft tokens that are validated and corrected by a remote large language model (LLM). However, the original HLM suffers from substantial communication overhead, as the LLM requires the SLM to upload the full vocabulary distribution for each token. Moreover, both communication and computation resources are wasted when the LLM validates tokens that are highly likely to be accepted. To overcome these limitations, we propose communication-efficient and uncertainty-aware HLM (CU-HLM). In CU-HLM, the SLM transmits truncated vocabulary distributions only when its output uncertainty is high. We validate the feasibility of this opportunistic transmission by discovering a strong correlation between SLM's uncertainty and LLM's rejection probability. Furthermore, we theoretically derive optimal uncertainty thresholds and optimal vocabulary truncation strategies. Simulation results show that, compared to standard HLM, CU-HLM achieves up to 206$\times$ higher token throughput by skipping 74.8% transmissions with 97.4% vocabulary compression, while maintaining 97.4% accuracy.
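
The opportunistic, compressed uplink described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: Shannon entropy stands in for the uncertainty measure, top-k truncation with renormalization stands in for the vocabulary truncation, and the function names, threshold, and k are hypothetical. The abstract does not specify the paper's actual measure, threshold, or truncation rule.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def truncate_top_k(probs, k):
    """Keep the k most likely token ids and renormalize their probabilities."""
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return {i: probs[i] / mass for i in top}


def slm_uplink(probs, threshold, k):
    """Hypothetical on-device decision: return None to skip the upload when the
    draft token is confident, otherwise a truncated distribution for the LLM."""
    if entropy(probs) <= threshold:
        return None
    return truncate_top_k(probs, k)


confident = [0.97, 0.01, 0.01, 0.01]   # low entropy: upload is skipped
uncertain = [0.40, 0.30, 0.20, 0.10]   # high entropy: truncated upload
print(slm_uplink(confident, threshold=0.5, k=2))
print(slm_uplink(uncertain, threshold=0.5, k=2))
```

In this toy setting the confident draft is never transmitted, while the uncertain one sends only two of four vocabulary entries, which mirrors the two savings (skipped transmissions and vocabulary compression) that the abstract reports.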

2505.05880 2026-05-26 cs.AI cs.LG 版本更新

Combining Abstract Argumentation and Machine Learning for Efficiently Analyzing Low-Level Process Event Streams

结合抽象论证与机器学习高效分析低层过程事件流

Bettina Fazzinga, Sergio Flesca, Filippo Furfaro, Luigi Pontieri, Francesco Scala

Affiliations * University of Calabria; CNR

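AI summary: Proposes a data-efficient neuro-symbolic approach in which an Abstract Argumentation Framework (AAF) refines the candidate event interpretations generated by a sequence-tagging model, addressing uncertainty in event-to-activity mapping for low-level process event streams.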

DOI: 10.1007/s40747-026-02340-1

AI中文摘要

监控和分析过程轨迹是现代公司和组织的一项关键任务。在轨迹事件与参考业务活动之间存在差距的场景中，这涉及一个解释问题，即将任何正在进行的轨迹的每个事件转换为活动实例的相应步骤。基于最近将解释问题框架化为抽象论证框架（AAF）内的接受问题的方法，可以优雅地分析可能的（可能以聚合形式）事件解释，并为那些与先验过程知识冲突的解释提供解释。由于在事件到活动映射高度不确定（或简单地说未充分指定）的环境中，这种基于推理的方法可能产生低信息量的结果和繁重的计算，因此可以考虑发现一个序列标注模型，该模型经过训练以上下文感知的方式建议高概率的候选事件解释。然而，最优地训练这样的模型可能需要使用大量手动注释的示例轨迹。因此，我们提出了一种数据高效的神经符号方法，其中由示例驱动的序列标注器返回的候选解释由基于AAF的推理器进行细化。这使我们能够利用先验知识来补偿示例数据的稀缺性，实验结果证实了这一点。

英文摘要

Monitoring and analyzing process traces is a critical task for modern companies and organizations. In scenarios where there is a gap between trace events and reference business activities, this entails an interpretation problem, amounting to translating each event of any ongoing trace into the corresponding step of the activity instance. Building on a recent approach that frames the interpretation problem as an acceptance problem within an Abstract Argumentation Framework (AAF), one can elegantly analyze plausible event interpretations (possibly in an aggregated form), as well as offer explanations for those that conflict with prior process knowledge. Since, in settings where event-to-activity mapping is highly uncertain (or simply under-specified), this reasoning-based approach may yield poorly informative results and heavy computation, one can think of discovering a sequence-tagging model, trained to suggest highly probable candidate event interpretations in a context-aware way. However, training such a model optimally may require a large number of manually annotated example traces. We therefore propose a data-efficient neuro-symbolic approach to the problem, where the candidate interpretations returned by the example-driven sequence tagger are refined by the AAF-based reasoner. This allows us to also leverage prior knowledge to compensate for the scarcity of example data, as confirmed by experimental results.

2502.11167 2026-05-26 cs.LG cs.CL 版本更新

SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors

SURGE: 大型语言模型作为通用代理代码执行器的潜力

Bohan Lyu, Siqiao Huang, Zichen Liang

Affiliations * Department of Computer Science and Technology, Tsinghua; Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua

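AI summary: Introduces the SURGE benchmark of 1160 problems covering 8 key aspects and evaluates 21 open-source and proprietary LLMs to study their feasibility as surrogate models for code execution prediction, along with scaling laws, data efficiency, and predictive accuracy.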

Journal ref: Proceedings of The 2025 Conference on Empirical Methods in Natural Language Processing

AI中文摘要

神经代理模型是数据挖掘中强大且高效的工具。同时，大型语言模型（LLM）在代码相关任务（如生成和理解）中展示了卓越的能力。然而，一个同样重要但尚未充分探索的问题是，LLM是否可以作为代码执行预测的代理模型。为了系统研究这一问题，我们引入了SURGE，一个包含1160个问题的综合基准，覆盖8个关键方面：多语言编程任务、竞赛级编程问题、仓库级代码分析、高成本科学计算、时间复杂度密集型算法、有缺陷代码分析、依赖特定编译器或执行环境的程序，以及形式化数学证明验证。通过对21个开源和专有LLM的广泛分析，我们研究了扩展律、数据效率和预测准确性。我们的发现揭示了LLM作为计算过程高效代理的可行性的重要见解。基准和评估框架可在https://github.com/Imbernoulli/SURGE获取。

英文摘要

Neural surrogate models are powerful and efficient tools in data mining. Meanwhile, large language models (LLMs) have demonstrated remarkable capabilities in code-related tasks, such as generation and understanding. However, an equally important yet underexplored question is whether LLMs can serve as surrogate models for code execution prediction. To systematically investigate it, we introduce SURGE, a comprehensive benchmark with $1160$ problems covering $8$ key aspects: multi-language programming tasks, competition-level programming problems, repository-level code analysis, high-cost scientific computing, time-complexity-intensive algorithms, buggy code analysis, programs dependent on specific compilers or execution environments, and formal mathematical proof verification. Through extensive analysis of $21$ open-source and proprietary LLMs, we examine scaling laws, data efficiency, and predictive accuracy. Our findings reveal important insights about the feasibility of LLMs as efficient surrogates for computational processes. The benchmark and evaluation framework are available at https://github.com/Imbernoulli/SURGE.

2502.10311 2026-05-26 cs.LG cs.AI cs.HC 版本更新

ExplainReduce: Generating global explanations from many local explanations

ExplainReduce: 从许多局部解释生成全局解释

Lauri Seppäläinen, Mudong Guo, Kai Puolamäki

Affiliations * University of Helsinki

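AI summary: Proposes ExplainReduce, which uses a greedy heuristic to reduce a large set of local explanations to a small number of simple models that serve as a generative global explanation, and demonstrates its effectiveness and competitiveness.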

Comments 21 pages with a 36 page appendix, 8 + 39 figures, 1+1 tables. The datasets and source code used in the paper are available at https://github.com/edahelsinki/explainreduce. Accepted for publication in the 4th World Conference on eXplainable Artificial Intelligence (2026)

2502.01397 2026-05-26 cs.LG cs.AI cs.NA math.NA 版本更新

Message-Passing GNNs Fail to Approximate Sparse Triangular Factorizations

消息传递GNN无法近似稀疏三角分解

Vladislav Trifonov, Ekaterina Muravleva, Ivan Oseledets

Affiliations * AIC, Skoltech; Skoltech AI4S Center; Sberbank of Russia; AIRI

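AI summary: Shows through theory and experiments that message-passing graph neural networks have fundamental limitations in approximating sparse triangular factorizations, and that architectural innovations beyond message passing are needed.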

Comments Camera-ready version published in Transactions on Machine Learning Research

Journal ref: Transactions on Machine Learning Research, 2026

AI中文摘要

图神经网络（GNN）已被提议作为学习稀疏矩阵预条件子的工具，预条件子是加速线性求解器的关键组件。我们提出理论和实验证据表明，对于存在高质量预条件子但需要非局部依赖的矩阵类别，消息传递GNN从根本上无法近似稀疏三角分解。为了说明这一点，我们使用合成矩阵和SuiteSparse集合中的真实示例构建了一组基线。在包括图注意力网络和图变换器在内的多种GNN架构中，我们观察到预测因子与参考因子之间的余弦相似度较低（关键情况下≤0.7）。我们的理论和实验结果表明，需要超越消息传递的架构创新才能将GNN应用于矩阵分解等科学计算任务。此外，实验表明仅克服非局部性是不够的。需要定制的架构来捕获所需的依赖关系，因为即使是完全非局部的全局图变换器也无法匹配所提出的基线。

英文摘要

Graph Neural Networks (GNNs) have been proposed as a tool for learning sparse matrix preconditioners, which are key components in accelerating linear solvers. We present theoretical and empirical evidence that message-passing GNNs are fundamentally incapable of approximating sparse triangular factorizations for classes of matrices for which high-quality preconditioners exist but require non-local dependencies. To illustrate this, we construct a set of baselines using both synthetic matrices and real-world examples from the SuiteSparse collection. Across a range of GNN architectures, including Graph Attention Networks and Graph Transformers, we observe low cosine similarity ($\leq0.7$ in key cases) between predicted and reference factors. Our theoretical and empirical results suggest that architectural innovations beyond message-passing are necessary for applying GNNs to scientific computing tasks such as matrix factorization. Moreover, experiments demonstrate that overcoming non-locality alone is insufficient. Tailored architectures are necessary to capture the required dependencies since even a completely non-local Global Graph Transformer fails to match the proposed baselines.
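
The cosine-similarity comparison between predicted and reference factors mentioned in this abstract can be illustrated with a small sketch. The matrices below are made-up 2×2 lower-triangular factors, and flattening each factor into a vector before comparing is an assumption; the abstract does not state how the paper aggregates the metric.

```python
import math


def flatten(matrix):
    """Row-major flattening of a dense matrix given as a list of rows."""
    return [value for row in matrix for value in row]


def cosine_similarity(a, b):
    """Cosine of the angle between two nonzero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical reference factor and a model prediction with the same sparsity.
reference = [[2.0, 0.0], [1.0, 3.0]]
predicted = [[1.5, 0.0], [2.5, 1.0]]

score = cosine_similarity(flatten(reference), flatten(predicted))
print(f"cosine similarity = {score:.3f}")
```

A score of 1 would mean the predicted factor is a positive multiple of the reference; the abstract reports values of at most 0.7 in key cases.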

2502.01184 2026-05-26 cs.LG cs.AI physics.chem-ph q-bio.QM 版本更新

FragmentNet: Adaptive Graph Fragmentation for Graph-to-Sequence Molecular Representation Learning

FragmentNet: 自适应图分片用于图到序列分子表示学习

Ankur Samanta, Rohan Gupta, Aditi Misra, Christian McIntosh Clarke, Jayakumar Rajadas

Affiliations * Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada; Regenerative Biomaterials Laboratory, Stanford Cardiovascular Institute, Palo Alto, USA

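AI summary: Proposes FragmentNet, which uses an adaptive, learned tokenizer to decompose molecular graphs into chemically valid fragments, preserves molecular topology with chemically aware spatial positional encodings, and performs masked pre-training at the fragment level, improving performance on multiple property prediction tasks.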

Comments 22 pages, 13 figures, 5 tables

AI中文摘要

分子表示学习方法通常将分子标记为单个原子或使用刚性、基于规则的分片分解，限制了它们捕捉有意义化学子结构上下文的能力。我们引入了FragmentNet，一种围绕新颖的自适应学习分词器构建的图到序列模型，该分词器将分子图分解为可调整粒度的化学有效片段，并辅以化学感知的空间位置编码，在生成的序列中保留分子拓扑。将自然语言处理中的掩码预训练策略扩展到分子领域，我们在化学有意义的片段级别而非单个原子级别对分子进行掩码和重建。在多个属性预测基准上的评估发现，在片段粒度上进行预训练在大多数任务上提高了下游性能，表明标记化粒度是分子表示学习的重要设计选择。

英文摘要

Molecular representation learning methods typically tokenize molecules as individual atoms or use rigid, rule-based fragment decompositions, limiting their ability to capture meaningful chemical substructure context. We introduce FragmentNet, a graph-to-sequence model built around a novel adaptive, learned tokenizer that decomposes molecular graphs into chemically valid fragments of adjustable granularity, complemented by chemically aware spatial positional encodings that preserve molecular topology in the resulting sequence. Extending masked pre-training strategies from natural language processing to the molecular domain, we mask and reconstruct molecules at the level of chemically meaningful fragments rather than individual atoms. Evaluating across multiple property prediction benchmarks, we find that pre-training at fragment granularity leads to improved downstream performance on the majority of tasks, demonstrating that tokenization granularity is an important design choice for molecular representation learning.

2501.14889 2026-05-26 cs.LG 版本更新

Iterative Feature Space Optimization through Incremental Adaptive Evaluation

通过增量自适应评估的迭代特征空间优化

Yanping Wu, Yanyong Huang, Zhengzhang Chen, Zijun Yao, Yanjie Fu, Kunpeng Liu, Xiao Luo, Dongjie Wang

Affiliations * University of Kansas; Southwestern University of Finance and Economics; Arizona State University; Portland State University; University of California

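AI summary: Proposes the EASE framework, which combines a Feature-Sample Subspace Generator and a Contextual Attention Evaluator to achieve efficient, generalizable feature space optimization, addressing evaluation bias, overfitting, and inefficiency.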

Comments 18 pages

AI中文摘要

迭代特征空间优化涉及系统评估和调整特征空间以提升下游任务性能。然而，现有工作存在三个关键局限：1）忽视数据样本间的差异导致评估偏差；2）针对特定机器学习模型定制特征空间导致过拟合和泛化能力差；3）每次优化迭代需要从头重新训练评估器，显著降低整体优化效率。为弥补这些不足，我们提出一种广义自适应特征空间评估器（EASE），以高效产生最优且泛化的特征空间。该框架包含两个关键组件：特征-样本子空间生成器和上下文注意力评估器。第一个组件旨在解耦特征空间内的信息分布以减轻评估偏差。为此，我们首先根据后续评估器的反馈，识别与预测任务最相关的特征和评估中最具挑战性的样本。这种解耦策略使评估器持续聚焦于特征空间中最具挑战性的方面。第二个组件旨在增量捕获特征空间的演化模式以实现高效评估。我们提出一种加权共享多头注意力机制，将特征空间的关键特征编码为嵌入向量用于评估。此外，评估器进行增量更新，保留先前的评估知识同时融入新见解，因为优化过程中连续的特征空间共享部分信息。在十四个真实世界数据集上的大量实验证明了所提框架的有效性。我们的代码和数据已公开。

英文摘要

Iterative feature space optimization involves systematically evaluating and adjusting the feature space to improve downstream task performance. However, existing works suffer from three key limitations: 1) overlooking differences among data samples leads to evaluation bias; 2) tailoring feature spaces to specific machine learning models results in overfitting and poor generalization; 3) requiring the evaluator to be retrained from scratch during each optimization iteration significantly reduces the overall efficiency of the optimization process. To bridge these gaps, we propose a gEneralized Adaptive feature Space Evaluator (EASE) to efficiently produce optimal and generalized feature spaces. This framework consists of two key components: a Feature-Sample Subspace Generator and a Contextual Attention Evaluator. The first component aims to decouple the information distribution within the feature space to mitigate evaluation bias. To achieve this, we first identify features most relevant to prediction tasks and samples most challenging for evaluation based on feedback from the subsequent evaluator. This decoupling strategy makes the evaluator consistently target the most challenging aspects of the feature space. The second component intends to incrementally capture evolving patterns of the feature space for efficient evaluation. We propose a weighted-sharing multi-head attention mechanism to encode key characteristics of the feature space into an embedding vector for evaluation. Moreover, the evaluator is updated incrementally, retaining prior evaluation knowledge while incorporating new insights, as consecutive feature spaces during the optimization process share partial information. Extensive experiments on fourteen real-world datasets demonstrate the effectiveness of the proposed framework. Our code and data are publicly available.

2408.08399 2026-05-26 cs.LG cs.SY eess.SY 版本更新

Transformer-based few-shot learning for modeling Electricity Consumption Profiles with minimal data across thousands of domains

基于Transformer的少样本学习：以最少数据跨数千个领域建模电力消费曲线

Weijie Xia, Gao Peng, Chenguang Wang, Peter Palensky, Eric Pauwels, Pedro P. Vergara

Affiliations * Intelligent Electrical Power Grids (IEPG) Group; Centrum Wiskunde & Informatica (CWI); Alliander N.V

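AI summary: To address data scarcity in electricity consumption profile modeling, proposes a fine-tuning-free few-shot learning framework combining Transformers and Gaussian Mixture Models that accurately recovers complex distributions with only 1.6% of the data and outperforms existing methods.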

DOI: 10.1016/j.ijepes.2026.111575
Journal ref: International Journal of Electrical Power & Energy Systems, Volume/Issue (February 2026), Article 111575

AI中文摘要

电力消费曲线（ECP）对于配电系统的运行和规划至关重要，尤其是在太阳能电池板和电动汽车等低碳技术日益普及的背景下。传统的ECP建模方法通常假设有足够的ECP数据可用。然而，在实践中，由于隐私问题或缺乏计量设备，ECP数据的可访问性有限。少样本学习（FSL）已成为数据稀缺场景下ECP建模的一种有前景的解决方案。然而，标准的FSL方法（例如用于图像的方法）不适用于ECP建模，因为（1）这些方法通常假设有多个具有充足数据的源域和多个目标域。但在ECP建模中，可能存在数千个源域（例如具有中等数据量的家庭）和数千个目标域（例如需要建模ECP的家庭）。（2）标准FSL方法通常涉及繁琐的知识迁移机制，例如预训练和微调。为了解决这些局限性，本文提出了一种新颖的FSL框架，将Transformer与高斯混合模型（GMM）相结合用于ECP建模。所提出的方法无需微调，计算效率高，即使在数据极其有限的情况下也具有鲁棒性。结果表明，我们的方法可以用最少的ECP数据（例如，仅占完整域数据集的1.6%）准确恢复复杂的ECP分布，并且在ECP建模背景下优于最先进的时间序列建模方法。

英文摘要

Electricity Consumption Profiles (ECPs) are crucial for operating and planning power distribution systems, especially with the increasing number of low-carbon technologies such as solar panels and electric vehicles. Traditional ECP modeling methods typically assume the availability of sufficient ECP data. However, in practice, the accessibility of ECP data is limited due to privacy issues or the absence of metering devices. Few-shot learning (FSL) has emerged as a promising solution for ECP modeling in data-scarce scenarios. Nevertheless, standard FSL methods, such as those used for images, are unsuitable for ECP modeling because (1) these methods usually assume several source domains with sufficient data and several target domains, whereas in ECP modeling there may be thousands of source domains, e.g., households with a moderate amount of data, and thousands of target domains, e.g., households whose ECPs need to be modeled; and (2) standard FSL methods usually involve cumbersome knowledge transfer mechanisms, such as pre-training and fine-tuning. To address these limitations, this paper proposes a novel FSL framework that integrates Transformers with Gaussian Mixture Models (GMMs) for ECP modeling. The proposed approach is fine-tuning-free, computationally efficient, and robust even with extremely limited data. Results show that our method can accurately restore the complex ECP distribution with a minimal amount of ECP data (e.g., only 1.6% of the complete domain dataset) and outperforms state-of-the-art time series modeling methods in the context of ECP modeling.

2406.09079 2026-05-26 cs.LG 版本更新

Hadamard Representation: Scaffolding Performance Across Model-free RL

Hadamard表示：跨无模型强化学习的性能支撑

Jacob E. Kooi, Zhao Yang, Mark Hoogendoorn, Vincent François-Lavet

Affiliations * Vrije Universiteit Amsterdam

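AI summary: Proposes the Hadamard Representation (HR), which replaces a standard hidden layer with the element-wise product of two independently parameterized layers, reducing neuron dormancy and increasing effective rank, and consistently improving performance across multiple reinforcement learning algorithms and domains.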

Comments 26 pages, 17 figures

AI中文摘要

深度强化学习智能体在训练过程中逐渐失去表示能力：神经元变得休眠，从网络中移除活跃容量，有效秩崩溃，使存活的神经元冗余。现有的补救措施如周期性重置和特殊神经网络架构，大多局限于特定算法或领域。我们提出一个简单的架构修复，即Hadamard表示（HR），它将标准隐藏层替换为两个独立参数化层的逐元素乘积。HR通过两种互补机制运作。首先，它降低了神经元变得休眠的概率，这对于连续可微激活函数（如tanh）尤其有价值：与休眠的ReLU神经元（被有效剪枝）不同，饱和的tanh神经元通过将其输出权重转化为固定偏置而暗中破坏下游层。其次，独立于休眠，乘法结构捕获更丰富的特征交互，并在不拓宽层的情况下增加有效秩。我们在五种算法和三个领域上评估HR：基于像素的离散动作Atari上的DQN、PPO和PQN，基于状态连续控制上的SimbaV2，以及视觉连续控制上的MR.Q。HR在无需任何超参数调优的情况下，一致地优于强基线，并且其增益在参数匹配的更宽变体上仍然保持，排除了参数数量作为替代解释的可能性。

英文摘要

Deep reinforcement learning agents progressively lose representational capacity during training: neurons become dormant, removing active capacity from the network, and effective rank collapses, leaving surviving neurons redundant. Existing remedies such as periodic resets, and special neural network architectures, are largely algorithm- or domain-specific. We propose a simple architectural fix, the Hadamard Representation (HR), which replaces a standard hidden layer with the element-wise product of two independently parameterized layers. HR operates through two complementary mechanisms. First, it reduces the probability of a neuron becoming dormant, which is particularly valuable for continuously differentiable activations such as tanh: unlike dormant ReLU neurons, which are effectively pruned, saturated tanh neurons silently corrupt downstream layers by turning their outgoing weights into fixed biases. Second, independently of dormancy, the multiplicative structure captures richer feature interactions and increases effective rank without widening the layer. We evaluate HR across five algorithms and three domains: DQN, PPO, and PQN on pixel-based discrete-action Atari, SimbaV2 on state-based continuous control, and MR.Q on visual continuous control. HR consistently improves performance over the strong baselines without any hyperparameter tuning, and gains persist against parameter-matched wider variants, ruling out parameter count as an alternative explanation.
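
The core idea of the Hadamard Representation, a hidden layer formed as the element-wise product of two independently parameterized layers, can be sketched in plain Python. The details here are assumptions for illustration only: tanh is applied to each branch before the product, and the helper names, initialization, and sizes are hypothetical rather than taken from the paper.

```python
import math
import random


def dense(x, weights, bias):
    """Affine map: one output per (weight row, bias) pair."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def hadamard_layer(x, params_a, params_b):
    """Element-wise product of two independently parameterized tanh layers
    (assumed form: tanh(A x + a) * tanh(B x + b))."""
    branch_a = [math.tanh(v) for v in dense(x, *params_a)]
    branch_b = [math.tanh(v) for v in dense(x, *params_b)]
    return [a * b for a, b in zip(branch_a, branch_b)]


def init_params(n_in, n_out, rng):
    """Hypothetical uniform initialization with zero biases."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
x = [0.5, -1.0, 2.0]
hidden = hadamard_layer(x, init_params(3, 4, rng), init_params(3, 4, rng))
print(hidden)
```

Because each output unit depends on two separately parameterized pre-activations, a unit stays constant only if one branch is pinned at zero or both branches saturate together, which is one informal way to read the abstract's claim about reduced dormancy.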

2406.04374 2026-05-26 cs.IR cs.GT cs.LG stat.ML 版本更新

Incentivized Exploration with Stochastic Covariates: A Two-Stage Mechanism Design for Recommender System

带随机协变量的激励探索：推荐系统的两阶段机制设计

Yuantong Li, Guang Cheng, Xiaowu Dai

Affiliations * Meta Platforms Inc; Department of Statistics and Data Science; University of California, Los Angeles, CA

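AI summary: For the exploration-exploitation tradeoff under users' self-interested preferences in recommender systems, proposes a two-stage algorithm that achieves sublinear regret while satisfying incentive constraints through incentive-compatible exploration and an inverse proportional gap sampling strategy.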

Comments ICML 2026

AI中文摘要

推荐系统通过连接用户与相关产品在互联网经济中扮演关键角色。然而，设计有效的推荐系统面临关键挑战：在确保探索新产品的激励与用户自利偏好之间的探索-利用权衡。先前工作解决了固定设计线性bandit中的贝叶斯激励相容性（Sellke & Slivkins, 2023），我们则应对在线采样的随机用户协变量的挑战。与标准的黑箱归约（Mansour et al., 2020）不同，我们的两阶段框架利用线性奖励结构，在满足激励约束的同时实现次线性遗憾。为解决该问题，我们提出一种两阶段算法，将激励探索与任何高效的即插即用离线学习算法相结合。在第一阶段，算法在保持激励相容性的同时探索产品以收集最优样本。第二阶段采用逆比例间隙采样策略（IPGS）与任何高效学习方法相结合，以确保次线性遗憾。理论上，我们证明算法RCB实现了$O(\sqrt{KdT})$遗憾，同时满足激励约束，并发现了激励预算与遗憾之间的权衡，实验验证了这一点。通过在个性化华法林剂量调整的实际应用和模拟中，我们展示了RCB的强激励增益、次线性遗憾和鲁棒性。

英文摘要

Recommender systems play a crucial role in internet economies by connecting users with relevant products. However, designing effective recommender systems faces a key challenge: the exploration-exploitation tradeoff between incentivizing the exploration of new products and respecting users' self-interested preferences. While prior work addresses Bayesian Incentive Compatibility (BIC) in fixed-design linear bandits (Sellke & Slivkins, 2023), we tackle the challenge of stochastic user covariates sampled online. Unlike standard black-box reductions (Mansour et al., 2020), our two-stage framework exploits the linear reward structure to achieve sublinear regret while satisfying incentive constraints. Specifically, we propose a two-stage algorithm that integrates incentivized exploration with any efficient plug-in offline learning algorithm. In the first stage, it explores products while maintaining incentive compatibility to gather optimal samples. The second stage employs an inverse proportional gap sampling (IPGS) strategy integrated with any efficient learning method to secure sublinear regret. Theoretically, we prove that the algorithm RCB achieves $O(\sqrt{KdT})$ regret while simultaneously satisfying incentive constraints, and we uncover a tradeoff between incentive budget and regret, which we validate in experiments. We demonstrate RCB's strong incentive gain, sublinear regret, and robustness through a real application on personalized warfarin dosing and simulations.

2404.08073 2026-05-26 math.OC cs.LG stat.ML 版本更新

Spurious Stationarity and Hardness Results for Bregman Proximal-Type Algorithms

Bregman近端类型算法的伪平稳性和困难结果

He Chen, Jiajin Li, Anthony Man-Cho So

Affiliations * Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong; Sauder School of Business, University of British Columbia

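AI summary: Shows that Bregman proximal-type algorithms such as mirror descent can become trapped near spurious stationary points under non-Euclidean geometries; even for convex problems, if the gradient of the Bregman kernel is not Lipschitz continuous, the stagnation can persist for any finite number of iterations. The phenomenon occurs generically in nonconvex problems with polyhedral constraints and challenges existing convergence theory.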

AI中文摘要

Bregman近端类型算法（BPs），如镜像下降，已成为机器学习和数据科学中通过非欧几何利用问题结构的流行工具。在本文中，我们表明BPs可能被困在一类非平稳点附近，我们称之为\emph{伪平稳点}。如果Bregman核的梯度不是Lipschitz连续的，即使对于凸问题，这种停滞也可能持续任意有限次迭代。根本原因在于欧几里得几何和Bregman几何在下降行为上的根本对比：虽然欧几里得梯度下降确保在任何非平稳点附近充分下降，但BPs可能在伪平稳点附近表现出任意缓慢的下降。因此，常用的基于Bregman的平稳性度量，例如Bregman散度的相对变化，可能在伪平稳点附近消失。这可能误导性地表明收敛，即使迭代点仍远离任何真正的平稳点。我们的分析进一步揭示，伪平稳点并非病态，而是在具有多面体约束的广泛非凸问题类中普遍出现。综上所述，我们的发现揭示了基于Bregman的优化方法中的一个严重盲点，并呼吁新的理论工具和算法保障以确保可靠的收敛。

英文摘要

Bregman proximal-type algorithms (BPs), such as mirror descent, have become popular tools in machine learning and data science for exploiting problem structures through non-Euclidean geometries. In this paper, we show that BPs can get trapped near a class of non-stationary points, which we term \emph{spurious stationary points}. Such stagnation can persist for any finite number of iterations if the gradient of the Bregman kernel is not Lipschitz continuous, even in convex problems. The root cause lies in a fundamental contrast in descent behavior between Euclidean and Bregman geometries: while Euclidean gradient descent ensures sufficient decrease near any non-stationary point, BPs may exhibit arbitrarily slow decrease around spurious stationary points. As a result, commonly used Bregman-based stationarity measures, such as the relative change in terms of Bregman divergence, can vanish near spurious stationary points. This may misleadingly suggest convergence, even when the iterates remain far from any true stationary point. Our analysis further reveals that spurious stationary points are not pathological, but rather occur generically in a broad class of nonconvex problems with polyhedral constraints. Taken together, our findings reveal a serious blind spot in Bregman-based optimization methods and call for new theoretical tools and algorithmic safeguards to ensure reliable convergence.

2403.04545 2026-05-26 cs.LG math.ST stat.TH 版本更新

Branch Scaling Manifests as Implicit Architectural Regularization for Improving Generalization in Overparameterized ResNets

分支缩放表现为隐式架构正则化以改善过参数化ResNet的泛化能力

Zixiong Yu, Guhan Chen, Jianfa Lai, Bohan Li, Songtao Tian

Affiliations * Huawei Large Model Data Technology Lab, Shenzhen; Tsinghua University, Beijing; Kyoto University, Kyoto

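AI summary: Studies how branch scaling factors affect the generalization of overparameterized ResNets, proving that scaling factors with rapid depth-wise decay combined with early stopping achieve minimax-optimal generalization rates, and explains the mechanism via a Neural Tangent Kernel (NTK) approximation.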

Comments Accepted by ICML. This version incorporates content from the preprint arXiv:2305.18506. The contributors of the relevant content have consented to its inclusion and have been listed as authors

AI中文摘要

残差分支中的缩放因子已成为提升神经网络性能的流行方法，特别是在无归一化架构中。虽然先前的工作主要从优化角度研究缩放效应，本文通过泛化理论的视角探讨其在残差架构中的作用。具体来说，我们证明具有恒定缩放因子的宽残差网络（ResNet）随着深度增加渐近地变得不可学习。相反，当缩放因子表现出快速的深度方向衰减并结合早停时，过参数化ResNet实现了极小极大最优泛化率。为了建立这一结论，我们证明宽ResNet的泛化能力可以通过与神经正切核（NTK）相关的核回归来近似。我们的理论发现通过合成数据和真实世界分类任务（包括MNIST和CIFAR-100）的实验得到验证。

英文摘要

Scaling factors in residual branches have emerged as a prevalent method for boosting neural network performance, especially in normalization-free architectures. While prior work has primarily examined scaling effects from an optimization perspective, this paper investigates their role in residual architectures through the lens of generalization theory. Specifically, we establish that wide residual networks (ResNets) with constant scaling factors become asymptotically unlearnable as depth increases. In contrast, when the scaling factor exhibits rapid depth-wise decay combined with early stopping, over-parameterized ResNets achieve minimax-optimal generalization rates. To establish this, we demonstrate that the generalization capability of wide ResNets can be approximated by kernel regression associated with the Neural Tangent Kernel (NTK). Our theoretical findings are validated through experiments on synthetic data and real-world classification tasks, including MNIST and CIFAR-100.

2402.13791 2026-05-26 cs.LG 版本更新

Opening the Black-Box: A Systematic Review on Explainable AI in Remote Sensing

打开黑箱：遥感中可解释人工智能的系统综述

Adrian Höhl, Ivica Obadic, Miguel Ángel Fernández Torres, Hiba Najjar, Dario Oliveira, Zeynep Akata, Andreas Dengel, Xiao Xiang Zhu

Affiliations * Chair of Data Science in Earth Observation, Technical University of Munich (TUM); Munich Center for Machine Learning; Image Processing Laboratory (IPL), Universitat de València (UV); University of Kaiserslautern-Landau, Germany; German Research Center for Artificial Intelligence (DFKI); School of Applied Mathematics, Getulio Vargas Foundation, Brazil; Institute for Explainable Machine Learning at Helmholtz Munich; Chair of Interpretable and Reliable Machine Learning, Technical University of Munich

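AI summary: Through a systematic review, summarizes the use, objectives, findings, and challenges of explainable AI methods in remote sensing, highlights emerging directions, and reflects on how these methods are evaluated.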

DOI: 10.1109/MGRS.2024.3467001
Journal ref: published in IEEE Geoscience and Remote Sensing Magazine, vol. 12, no. 4, pp. 261-304, Dec. 2024

AI中文摘要

近年来，黑箱机器学习方法已成为遥感知识提取的主导建模范式。尽管通过可解释人工智能揭示这些模型内部运作具有潜在益处，但目前在遥感应用中，仍缺乏全面概述可解释AI方法及其目标、发现和挑战的综述。本文通过系统综述来填补这一空白，识别该领域的关键趋势，并阐明针对特定遥感挑战的新颖可解释AI方法和新兴方向。我们还揭示了解释解释的常见模式，讨论了提取的科学见解，并反思了用于评估可解释AI方法的方法。因此，我们的综述提供了遥感中可解释AI最新技术的完整总结。此外，我们详细展望了挑战和有前景的研究方向，这为新颖方法论的发展奠定了基础，并为该领域的新研究者提供了有用的起点。

英文摘要

In recent years, black-box machine learning approaches have become a dominant modeling paradigm for knowledge extraction in remote sensing. Despite the potential benefits of uncovering the inner workings of these models with explainable AI, a comprehensive overview summarizing the explainable AI methods used and their objectives, findings, and challenges in remote sensing applications is still missing. In this paper, we address this gap by performing a systematic review to identify the key trends in the field and shed light on novel explainable AI approaches and emerging directions that tackle specific remote sensing challenges. We also reveal the common patterns of explanation interpretation, discuss the extracted scientific insights, and reflect on the approaches used for the evaluation of explainable AI methods. As such, our review provides a complete summary of the state-of-the-art of explainable AI in remote sensing. Further, we give a detailed outlook on the challenges and promising research directions, representing a basis for novel methodological development and a useful starting point for new researchers in the field.

2312.03957 2026-05-26 q-bio.TO cs.LG 版本更新

PerSival: Neural-network-based visualisation for pervasive continuum-mechanical simulations in musculoskeletal biomechanics

PerSival：基于神经网络的肌肉骨骼生物力学中连续介质力学模拟的普适可视化

David Rosin, Johannes Kässinger, Xingyao Yu, Okan Avci, Christian Bleiler, Oliver Röhrle

Affiliations * Institute for Parallel and Distributed Systems, University of Stuttgart; Visualization Research Center VISUS, University of Stuttgart; Biomechatronic Systems, Fraunhofer IPA, Stuttgart

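AI summary: Proposes a neural network architecture that uses a sparse grid surrogate to capture the surface deformation of the biceps brachii, enabling real-time visualization of a 3D upper limb musculoskeletal model on resource-constrained devices with an average error of 0.97 mm.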

Comments 10 pages, 4 figures, 5 tables, to be submitted to Medical Image Analysis

DOI: 10.1007/s11517-026-03519-x

AI中文摘要

本文提出一种新颖的神经网络架构，用于3D人体上肢肌肉骨骼系统模型的普适可视化。将模拟能力扩展到移动设备等资源贫乏系统，在众多研究领域中日益受到关注，以拓宽方法和结果的适用性。直到最近，由于计算成本过高，这一目标被认为对于肌肉骨骼系统的真实连续介质力学模拟而言遥不可及。在本工作中，我们使用稀疏网格代理来捕捉肱二头肌的表面变形，以训练一个深度学习模型，用于同一肌肉的实时可视化。这些代理模型均以5个肌肉激活水平作为输入，并输出肌肉表面每个网格节点的笛卡尔坐标向量。因此，神经网络架构的输入维度显著低于输出维度。5个肌肉激活水平足以实现肱二头肌2809个网格节点位置的平均误差为0.97 ± 0.16 mm，即0.57 ± 0.10%。该模型在仅使用CPU时每个预测变形状态的评估时间为9.88 ms，在GPU支持下为3.48 ms，对应的理论帧率分别为101 fps和287 fps。因此，深度学习代理为连续介质力学模拟在视觉实时应用中的可访问性提供了一条途径。

英文摘要

This paper presents a novel neural network architecture for the purpose of pervasive visualisation of a 3D human upper limb musculoskeletal system model. Bringing simulation capabilities to resource-poor systems like mobile devices is of growing interest across many research fields, to widen applicability of methods and results. Until recently, this goal was thought to be out of reach for realistic continuum-mechanical simulations of musculoskeletal systems, due to prohibitive computational cost. Within this work we use a sparse grid surrogate to capture the surface deformation of the m.~biceps brachii in order to train a deep learning model, used for real-time visualisation of the same muscle. Both these surrogate models take 5 muscle activation levels as input and output Cartesian coordinate vectors for each mesh node on the muscle's surface. Thus, the neural network architecture features a significantly lower input than output dimension. 5 muscle activation levels were sufficient to achieve an average error of 0.97 +/- 0.16 mm, or 0.57 +/- 0.10 % for the 2809 mesh node positions of the biceps. The model achieved evaluation times of 9.88 ms per predicted deformation state on CPU only and 3.48 ms with GPU-support, leading to theoretical frame rates of 101 fps and 287 fps respectively. Deep learning surrogates thus provide a way to make continuum-mechanical simulations accessible for visual real-time applications.
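
The theoretical frame rates quoted in this abstract follow directly from the per-state evaluation times, since one frame per evaluation gives 1000 / (milliseconds per frame) frames per second. A minimal check, where the function name is only illustrative:

```python
def frames_per_second(ms_per_frame):
    """Theoretical frame rate if each frame costs exactly ms_per_frame."""
    return 1000.0 / ms_per_frame


# Evaluation times reported in the abstract.
for label, ms in (("CPU", 9.88), ("GPU", 3.48)):
    print(f"{label}: {ms} ms per state -> {frames_per_second(ms):.1f} fps")
```

Rounding down gives the 101 fps and 287 fps stated in the abstract; these are upper bounds that ignore rendering and data-transfer overhead.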

2311.11342 2026-05-26 cs.LG cs.DC math.OC 版本更新

On the Communication Complexity of Decentralized Stochastic Bilevel Optimization

去中心化随机双层优化的通信复杂度

Yihan Zhang, My T. Thai, Jie Wu, Hongchang Gao

Affiliations * Temple University

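AI summary: To address the slow convergence and high communication cost of existing decentralized stochastic bilevel optimization algorithms in heterogeneous settings, proposes two new algorithms based on simultaneous and alternating update strategies that achieve faster convergence and lower communication cost, and for the first time shows under mild assumptions how the computation and communication of the Hessian-inverse-vector product affect the convergence rate in heterogeneous settings.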

AI中文摘要

随机双层优化在机器学习中有着广泛的应用，包括元学习、超参数优化和神经架构搜索。为了将随机双层优化扩展到分布式数据，已经开发了几种去中心化随机双层优化算法。然而，现有方法在异构设置中通常存在收敛速度慢和通信成本高的问题，限制了它们在实际任务中的适用性。为了解决这些问题，我们提出了两种基于\textit{同步}和\textit{交替}更新策略的新型去中心化随机双层梯度下降算法。我们的算法能够实现比现有方法更快的收敛速度和更低的通信成本。重要的是，我们的收敛分析不依赖于关于异构性的强假设。更重要的是，我们的理论分析清晰地揭示了在异构设置下，关于Hessian逆向量积的计算和通信如何影响收敛率。据我们所知，这是首次在异构设置中在温和假设下取得如此有利的理论结果。此外，我们展示了如何在使用方差缩减梯度时建立交替更新策略的收敛率。最后，实验结果证实了我们算法的有效性。

英文摘要

Stochastic bilevel optimization finds widespread applications in machine learning, including meta-learning, hyperparameter optimization, and neural architecture search. To extend stochastic bilevel optimization to distributed data, several decentralized stochastic bilevel optimization algorithms have been developed. However, existing methods often suffer from slow convergence rates and high communication costs in heterogeneous settings, limiting their applicability to real-world tasks. To address these issues, we propose two novel decentralized stochastic bilevel gradient descent algorithms based on \textit{simultaneous} and \textit{alternating} update strategies. Our algorithms can achieve faster convergence rates and lower communication costs than existing methods. Importantly, our convergence analyses do not rely on strong assumptions regarding heterogeneity. More importantly, our theoretical analyses clearly disclose how the computation and communication regarding the Hessian-inverse-vector product under the heterogeneous setting affects the convergence rate. To the best of our knowledge, this is the first time such favorable theoretical results have been achieved with mild assumptions in the heterogeneous setting. Furthermore, we demonstrate how to establish the convergence rate for the alternating update strategy when combined with the variance-reduced gradient. Finally, experimental results confirm the efficacy of our algorithms.


2309.07778 2026-05-26 eess.IV cs.CV cs.LG q-bio.TO 版本更新

Virchow: A Million-Slide Digital Pathology Foundation Model

Virchow：百万级数字病理学基础模型

Eugene Vorontsov, Alican Bozkurt, Adam Casson, George Shaikovski, Michal Zelechowski, Siqi Liu, Kristen Severson, Eric Zimmermann, James Hall, Neil Tenenholtz, Nicolo Fusi, Philippe Mathieu, Alexander van Eck, Donghun Lee, Julian Viret, Eric Robert, Yi Kan Wang, Jeremy D. Kunz, Matthew C. H. Lee, Jan Bernhard, Ran A. Godrich, Gerard Oakley, Ewan Millar, Matthew Hanna, Juan Retamero, William A. Moye, Razik Yousfi, Christopher Kanan, David Klimstra, Brandon Rothrock, Thomas J. Fuchs

发表机构 * Paige ； Microsoft Research（微软研究院）； NSW Health Pathology（新南威尔士州卫生病理学）； St George Hospital（圣乔治医院）； Memorial Sloan Kettering Cancer Center（纪念斯隆凯特琳癌症中心）； University of Rochester（罗切斯特大学）

AI总结提出Virchow，一个基于DINOv2自监督学习、在150万张H&E染色全切片图像上训练的6.32亿参数视觉Transformer模型，用于计算病理学，在泛癌检测和生物标志物预测任务上达到最先进性能。

详情

AI中文摘要

通过分析病理图像实现精准医疗和决策支持系统的人工智能应用，有潜力彻底改变癌症的诊断和治疗。这类应用将依赖于模型捕捉病理图像中观察到的多样化模式的能力。为应对这一挑战，我们提出了Virchow，一个用于计算病理学的基础模型。利用DINOv2算法支持的自监督学习，Virchow是一个拥有6.32亿参数的视觉Transformer模型，在来自不同组织和标本类型的150万张苏木精-伊红染色全切片图像上训练，数据量比以往工作高出数个数量级。Virchow模型使得开发一个泛癌检测系统成为可能，该系统在17种不同癌症类型上的整体标本级AUC达到0.949，同时在7种罕见癌症类型上达到0.937的AUC。Virchow模型在内部和外部图像块级基准测试以及切片级生物标志物预测任务上均达到了最先进水平。性能的提升凸显了在大型病理图像数据集上训练的重要性，表明扩展数据和网络架构可以提高许多高影响计算病理学应用的准确性，尤其是在训练数据有限的情况下。

英文摘要

The use of artificial intelligence to enable precision medicine and decision support systems through the analysis of pathology images has the potential to revolutionize the diagnosis and treatment of cancer. Such applications will depend on models' abilities to capture the diverse patterns observed in pathology images. To address this challenge, we present Virchow, a foundation model for computational pathology. Using self-supervised learning empowered by the DINOv2 algorithm, Virchow is a vision transformer model with 632 million parameters trained on 1.5 million hematoxylin and eosin stained whole slide images from diverse tissue and specimen types, which is orders of magnitude more data than previous works. The Virchow model enables the development of a pan-cancer detection system with 0.949 overall specimen-level AUC across 17 different cancer types, while also achieving 0.937 AUC on 7 rare cancer types. The Virchow model sets the state-of-the-art on the internal and external image tile level benchmarks and slide level biomarker prediction tasks. The gains in performance highlight the importance of training on massive pathology image datasets, suggesting scaling up the data and network architecture can improve the accuracy for many high-impact computational pathology applications where limited amounts of training data are available.


2306.14853 2026-05-26 math.OC cs.LG 版本更新

Near-Optimal Nonconvex-Strongly-Convex Bilevel Optimization with Fully First-Order Oracles

基于全一阶Oracle的近似最优非凸强凸双层优化

Lesi Chen, Yaohua Ma, Jingzhao Zhang

发表机构 * IIIS, Tsinghua University（清华大学交叉信息研究院）； Shanghai Qi Zhi Institute（上海启智研究院）； Shanghai AI Lab（上海人工智能实验室）

AI总结针对下层问题强凸的双层优化，提出一种两时间尺度更新的一阶方法，在确定性设置下达到近最优的$\tilde{\mathcal{O}}(\varepsilon^{-2})$一阶Oracle复杂度，并扩展到随机和高阶光滑场景。

Comments JMLR 2025; fix a bug in the proof in Appendix E compared to the journal version

详情

AI中文摘要

本文考虑下层问题强凸的双层优化。最近的研究表明，使用Hessian-向量积（HVP）Oracle，可以在${\mathcal{O}}(\varepsilon^{-2})$次Oracle调用内找到$\varepsilon$-稳定点。然而，HVP Oracle在实践中可能无法访问或代价高昂。Kwon等人（ICML 2023）通过提出一种一阶方法解决了这个问题，该方法以较慢的$\tilde{\mathcal{O}}(\varepsilon^{-3})$速率实现相同目标。本文引入两时间尺度更新，改进了他们的方法，实现了近最优的$\tilde {\mathcal{O}}(\varepsilon^{-2})$一阶Oracle复杂度。我们的分析具有高度可扩展性。在随机设置下，当随机噪声仅存在于上层目标或同时存在于两层目标时，我们的算法分别可以达到$\tilde {\mathcal{O}}(\varepsilon^{-4})$和$\tilde {\mathcal{O}}(\varepsilon^{-6})$的随机一阶Oracle复杂度。当目标具有更高阶光滑性条件时，我们的确定性方法可以通过注入噪声逃离鞍点，并利用Nesterov动量加速达到更快的$\tilde {\mathcal{O}}(\varepsilon^{-1.75})$速率。

英文摘要

In this work, we consider bilevel optimization when the lower-level problem is strongly convex. Recent works show that with a Hessian-vector product (HVP) oracle, one can provably find an $\varepsilon$-stationary point within ${\mathcal{O}}(\varepsilon^{-2})$ oracle calls. However, the HVP oracle may be inaccessible or expensive in practice. Kwon et al. (ICML 2023) addressed this issue by proposing a first-order method that can achieve the same goal at a slower rate of $\tilde{\mathcal{O}}(\varepsilon^{-3})$. In this paper, we incorporate a two-time-scale update to improve their method to achieve the near-optimal $\tilde {\mathcal{O}}(\varepsilon^{-2})$ first-order oracle complexity. Our analysis is highly extensible. In the stochastic setting, our algorithm can achieve the stochastic first-order oracle complexity of $\tilde {\mathcal{O}}(\varepsilon^{-4})$ and $\tilde {\mathcal{O}}(\varepsilon^{-6})$ when the stochastic noises are only in the upper-level objective and in both level objectives, respectively. When the objectives have higher-order smoothness conditions, our deterministic method can escape saddle points by injecting noise, and can be accelerated to achieve a faster rate of $\tilde {\mathcal{O}}(\varepsilon^{-1.75})$ using Nesterov's momentum.
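To make the idea of a fully first-order bilevel method concrete, the sketch below runs a penalty-based scheme on a toy quadratic problem. Everything in it is an assumption for illustration: the toy objectives, the penalty value, the step sizes, and the function name `solve_toy_bilevel` are hypothetical, and the penalty reformulation is one common first-order construction rather than the paper's exact algorithm or step-size schedule. The inner variables are updated with many larger steps while the outer variable moves slowly, which loosely mirrors the two-time-scale flavour described in the abstract.

```python
def solve_toy_bilevel(penalty=100.0, outer_steps=300, inner_steps=10, outer_lr=0.1):
    """Penalty-based first-order bilevel sketch on a toy problem (illustrative only).

    Upper level: f(x, y) = (y - 1)^2 + x^2
    Lower level: g(x, y) = 0.5 * (y - x)^2, whose minimiser is y*(x) = x.
    The true hyper-objective f(x, y*(x)) = (x - 1)^2 + x^2 is minimised at x = 0.5.
    """
    x, y, z = 0.0, 0.0, 0.0
    y_lr = 0.5                    # fast step for y, which tracks argmin_y g(x, y)
    z_lr = 0.5 / (2.0 + penalty)  # fast step for z, which tracks argmin_z f + penalty * g
    for _ in range(outer_steps):
        for _ in range(inner_steps):
            y -= y_lr * (y - x)                                   # gradient of g in y
            z -= z_lr * (2.0 * (z - 1.0) + penalty * (z - x))     # gradient of f + penalty * g in z
        # Hypergradient surrogate built from first-order information only:
        # grad_x f(x, z) + penalty * (grad_x g(x, z) - grad_x g(x, y)).
        grad_x = 2.0 * x + penalty * ((x - z) - (x - y))
        x -= outer_lr * grad_x    # slow outer step
    return x

x_hat = solve_toy_bilevel()
print(x_hat)
```

For this toy problem the penalised stationary point is penalty / (2 + 2 * penalty), which approaches the true minimiser 0.5 as the penalty grows; no Hessian-vector product is ever computed.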


2011.11194 2026-05-26 cs.LG cs.CV cs.NE 版本更新

V3H: View Variation and View Heredity for Incomplete Multi-view Clustering

V3H: 面向不完整多视图聚类的视图变异与视图遗传

Xiang Fang, Yuchong Hu, Pan Zhou, Dapeng Oliver Wu

发表机构 * School of Computer Science and Technology, Huazhong University of Science and Technology（华中科技大学计算机科学与技术学院）； Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology（华中科技大学大数据安全工程研究中心）； Department of Electrical and Computer Engineering, University of Florida（佛罗里达大学电子与计算机工程系）

AI总结提出一种受遗传学启发的视图变异与视图遗传方法(V3H)，通过分解子空间为变异矩阵和遗传矩阵分别学习各视图的独特信息和所有视图的一致信息，并利用可调低秩表示恢复底层数据结构，在不完整多视图聚类中同时捕获一致与独特信息，在15个基准数据集上超越现有方法。

Comments Published in IEEE Transactions on Artificial Intelligence

详情

DOI: 10.1109/TAI.2021.3052425
Journal ref: IEEE Transactions on Artificial Intelligence 2020

AI中文摘要

真实数据常以多个不完整视图的形式出现。不完整多视图聚类是集成这些不完整视图的有效方法。以往的方法仅学习不同视图之间的一致信息，而忽略了每个视图的独特信息，这限制了它们的聚类性能和泛化能力。为克服这一局限，我们提出了一种新颖的视图变异与视图遗传方法(V3H)。受遗传学中变异与遗传的启发，V3H首先将每个子空间分解为对应视图的变异矩阵和所有视图的遗传矩阵，分别表示独特信息和一致信息。然后，通过基于聚类指示矩阵对齐不同视图，V3H集成来自不同视图的独特信息以提高聚类性能。最后，借助基于遗传矩阵的可调低秩表示，V3H恢复潜在的真正数据结构以减少大不完整性的影响。更重要的是，V3H可能是首个将遗传学引入聚类算法以从不完整多视图数据中同时学习一致信息和独特信息的工作。在15个基准数据集上的大量实验结果验证了其相对于其他最先进方法的优越性。

英文摘要

Real data often appear in the form of multiple incomplete views. Incomplete multi-view clustering is an effective method to integrate these incomplete views. Previous methods only learn the consistent information between different views and ignore the unique information of each view, which limits their clustering performance and generalizations. To overcome this limitation, we propose a novel View Variation and View Heredity approach (V3H). Inspired by the variation and the heredity in genetics, V3H first decomposes each subspace into a variation matrix for the corresponding view and a heredity matrix for all the views to represent the unique information and the consistent information respectively. Then, by aligning different views based on their cluster indicator matrices, V3H integrates the unique information from different views to improve the clustering performance. Finally, with the help of the adjustable low-rank representation based on the heredity matrix, V3H recovers the underlying true data structure to reduce the influence of the large incompleteness. More importantly, V3H presents possibly the first work to introduce genetics to clustering algorithms for learning simultaneously the consistent information and the unique information from incomplete multi-view data. Extensive experimental results on fifteen benchmark datasets validate its superiority over other state-of-the-arts.


2011.10396 2026-05-26 cs.LG cs.AI 版本更新

Double Self-weighted Multi-view Clustering via Adaptive View Fusion

双自加权多视图聚类：通过自适应视图融合

Xiang Fang, Yuchong Hu

发表机构 * School of Computer Science and Technology, Key Laboratory of Information Storage System Ministry of Education of China, Huazhong University of Science and Technology（计算机科学与技术学院，信息存储系统教育部重点实验室，华中科技大学）

AI总结提出双自加权多视图聚类框架（DSMC），通过自适应权重矩阵和权重因子分别对特征和图进行加权，去除冗余和噪声，并融合多图进行聚类。

Comments Corresponding author: Xiang Fang

详情

AI中文摘要

多视图聚类已应用于许多实际应用中，其中原始数据通常包含噪声。一些基于图的多视图聚类方法被提出来试图减少噪声的负面影响。然而，以往的基于图的多视图聚类方法即使存在冗余特征或噪声，也平等对待所有特征，这显然是不合理的。在本文中，我们提出了一种新颖的多视图聚类框架——双自加权多视图聚类（DSMC）来克服上述缺陷。DSMC执行双自加权操作，从每个图中去除冗余特征和噪声，从而获得鲁棒的图。对于第一次自加权操作，它通过引入自适应权重矩阵为不同特征分配不同的权重，这可以增强重要特征在联合表示中的作用，并使每个图鲁棒。对于第二次自加权操作，它通过施加自适应权重因子对不同图进行加权，这可以为更鲁棒的图分配更大的权重。此外，通过设计自适应多图融合，我们可以融合不同图中的特征，以整合这些图进行聚类。在六个真实世界数据集上的实验证明了其相对于其他最先进的多视图聚类方法的优势。

英文摘要

Multi-view clustering has been applied in many real-world applications where original data often contain noises. Some graph-based multi-view clustering methods have been proposed to try to reduce the negative influence of noises. However, previous graph-based multi-view clustering methods treat all features equally even if there are redundant features or noises, which is obviously unreasonable. In this paper, we propose a novel multi-view clustering framework Double Self-weighted Multi-view Clustering (DSMC) to overcome the aforementioned deficiency. DSMC performs double self-weighted operations to remove redundant features and noises from each graph, thereby obtaining robust graphs. For the first self-weighted operation, it assigns different weights to different features by introducing an adaptive weight matrix, which can reinforce the role of the important features in the joint representation and make each graph robust. For the second self-weighting operation, it weights different graphs by imposing an adaptive weight factor, which can assign larger weights to more robust graphs. Furthermore, by designing an adaptive multiple graphs fusion, we can fuse the features in the different graphs to integrate these graphs for clustering. Experiments on six real-world datasets demonstrate its advantages over other state-of-the-art multi-view clustering methods.


2011.10331 2026-05-26 cs.CV cs.LG 版本更新

ANIMC: A Soft Framework for Auto-weighted Noisy and Incomplete Multi-view Clustering

ANIMC: 一种自动加权噪声与不完整多视图聚类的软框架

Xiang Fang, Yuchong Hu, Pan Zhou, Dapeng Oliver Wu

发表机构 * Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology（大数据安全湖北工程研究中心，网络空间安全学院，华中科技大学）； School of Computer Science and Technology, Huazhong University of Science and Technology（计算机科学与技术学院，华中科技大学）； Key Laboratory of Information Storage System Ministry of Education of China, Huazhong University of Science and Technology（信息存储系统教育部重点实验室，华中科技大学）； Department of Electrical and Computer Engineering, University of Florida（电气与计算机工程系，佛罗里达大学）

AI总结提出ANIMC框架，通过软自动加权策略和双软正则回归模型，处理多视图聚类中的缺失实例和噪声问题。

Comments Published in IEEE Transactions on Artificial Intelligence

详情

Journal ref: IEEE Transactions on Artificial Intelligence 2021

AI中文摘要

多视图聚类在许多图像处理场景中有广泛应用。在这些场景中，原始图像数据通常包含缺失实例和噪声，而大多数多视图聚类方法忽略了这一点。然而，缺失实例可能使这些方法难以直接使用，噪声则会导致不可靠的聚类结果。本文通过软自动加权策略和双软正则回归模型，提出了一种新颖的自动加权噪声与不完整多视图聚类框架（ANIMC）。首先，通过设计自适应半正则化非负矩阵分解（adaptive semi-RNMF），软自动加权策略为每个视图分配适当的权重，并添加软边界以平衡噪声和不完整性的影响。其次，通过提出θ-范数，双软正则回归模型通过选择不同的θ来调整模型的稀疏性。与现有方法相比，ANIMC具有三个独特优势：1）它是一种软算法，可以在不同场景下调整我们的框架，从而提高其泛化能力；2）它自动学习每个视图的适当权重，从而减少噪声的影响；3）它执行双软正则回归，对齐不同视图中的相同实例，从而减少缺失实例的影响。大量实验结果表明，它优于其他最先进的方法。

英文摘要

Multi-view clustering has wide applications in many image processing scenarios. In these scenarios, original image data often contain missing instances and noises, which is ignored by most multi-view clustering methods. However, missing instances may make these methods difficult to use directly and noises will lead to unreliable clustering results. In this paper, we propose a novel Auto-weighted Noisy and Incomplete Multi-view Clustering framework (ANIMC) via a soft auto-weighted strategy and a doubly soft regular regression model. Firstly, by designing adaptive semi-regularized nonnegative matrix factorization (adaptive semi-RNMF), the soft auto-weighted strategy assigns a proper weight to each view and adds a soft boundary to balance the influence of noises and incompleteness. Secondly, by proposing θ-norm, the doubly soft regularized regression model adjusts the sparsity of our model by choosing different θ. Compared with existing methods, ANIMC has three unique advantages: 1) it is a soft algorithm to adjust our framework in different scenarios, thereby improving its generalization ability; 2) it automatically learns a proper weight for each view, thereby reducing the influence of noises; 3) it performs doubly soft regularized regression that aligns the same instances in different views, thereby decreasing the impact of missing instances. Extensive experimental results demonstrate its superior advantages over other state-of-the-art methods.


2011.10254 2026-05-26 cs.LG cs.AI stat.ML 版本更新

Unbalanced Incomplete Multi-view Clustering via the Scheme of View Evolution: Weak Views are Meat; Strong Views do Eat

通过视图演化方案的不平衡不完整多视图聚类：弱视图为肉，强视图食之

Xiang Fang, Yuchong Hu, Pan Zhou, Dapeng Oliver Wu

发表机构 * School of Computer Science and Technology, Key Laboratory of Information Storage System Ministry of Education of China, Huazhong University of Science and Technology（计算机科学与技术学院，信息存储系统教育部重点实验室，华中科技大学）； Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology（大数据安全工程研究中心，网络安全学院，华中科技大学）； Department of Electrical and Computer Engineering, University of Florida（电气与计算机工程系，佛罗里达大学）

AI总结针对不同视图不完整程度不平衡的问题，受生物进化理论启发，提出基于视图演化的不平衡不完整多视图聚类方法UIMC，通过加权多视图子空间聚类和低秩鲁棒表示恢复数据，显著提升聚类性能。

Comments Accepted by IEEE Transactions on Emerging Topics in Computational Intelligence

详情

DOI: 10.1109/TETCI.2021.3077909
Journal ref: IEEE Transactions on Emerging Topics in Computational Intelligence 2021

AI中文摘要

不完整多视图聚类是处理现实世界中不完整多视图数据的重要技术。以往的工作假设所有视图具有相同的不完整性，即平衡不完整性。然而，不同的视图往往具有不同的不完整性，即不平衡不完整性，这导致了强视图（低不完整性视图）和弱视图（高不完整性视图）。不平衡不完整性阻止我们直接使用先前的方法进行聚类。在本文中，受有效生物进化理论的启发，我们设计了新颖的视图演化方案来聚类强视图和弱视图。此外，我们提出了一种不平衡不完整多视图聚类方法（UIMC），这是第一个基于视图演化的有效方法，用于不平衡不完整多视图聚类。与先前的方法相比，UIMC有两个独特的优势：1）它提出了加权多视图子空间聚类来整合这些不平衡不完整的视图，有效解决了不平衡不完整多视图问题；2）它设计了低秩和鲁棒表示来恢复数据，减少了不完整性和噪声的影响。大量的实验结果表明，UIMC在三个评估指标上相比其他最先进的方法将聚类性能提高了高达40%。

英文摘要

Incomplete multi-view clustering is an important technique to deal with real-world incomplete multi-view data. Previous works assume that all views have the same incompleteness, i.e., balanced incompleteness. However, different views often have distinct incompleteness, i.e., unbalanced incompleteness, which results in strong views (low-incompleteness views) and weak views (high-incompleteness views). The unbalanced incompleteness prevents us from directly using the previous methods for clustering. In this paper, inspired by the effective biological evolution theory, we design the novel scheme of view evolution to cluster strong and weak views. Moreover, we propose an Unbalanced Incomplete Multi-view Clustering method (UIMC), which is the first effective method based on view evolution for unbalanced incomplete multi-view clustering. Compared with previous methods, UIMC has two unique advantages: 1) it proposes weighted multi-view subspace clustering to integrate these unbalanced incomplete views, which effectively solves the unbalanced incomplete multi-view problem; 2) it designs the low-rank and robust representation to recover the data, which diminishes the impact of the incompleteness and noises. Extensive experimental results demonstrate that UIMC improves the clustering performance by up to 40% on three evaluation metrics over other state-of-the-art methods.


2605.25235 2026-05-26 cs.LG cs.AI math.OC 版本更新

Constraint-Anchored Attribution: Feasibility-Certified Counterfactuals and Bonferroni-PAC Sufficient Subsets for Neural CO Policies

约束锚定归因：神经组合优化策略的可行性认证反事实与Bonferroni-PAC充分子集

Sohaib Lafifi

发表机构 * Univ. Artois, UR 3926, Laboratoire de Génie Informatique et d'Automatique de l'Artois (LGI2A), Béthune F-62400, France

AI总结提出一种神经组合优化策略的归因方法，通过LP松弛对偶分解决策、CSP可行性模型认证反事实，并用Bonferroni校正的Hoeffding充分子集测试界定PAC解释大小。

Comments 4 pages, 1 figure, Reference implementation: https://github.com/sohaibafifi/neuro-co-cax (MIT)

详情

AI中文摘要

我们为神经组合优化（CO）策略提供了一种归因方法，该方法（i）通过LP松弛对偶按约束族分解决策，（ii）通过组合可行性模型（实现为CSP可行性决策模型）认证反事实，以及（iii）通过沿贪心顺序的Bonferroni校正Hoeffding充分子集测试界定PAC充分解释的大小。在三个CO问题和三个随机种子上，我们的LP锚定$\Lambda$-归因在CVRPTW（n_cert=344）上匹配CF导出信号的96.5%，在定向问题（n_cert=281）上匹配77.2%，而代理梯度分别为75.0%和35.2%（配对差异+0.215和+0.420；McNemar精确$p \le 10^{-14}$）。在柔性作业车间调度问题的秩对齐机制中，两个后端在每个CSP认证翻转（n_cert=59）上一致，确认了无增益预测。Bonferroni-PAC子集平均每步5.0个节点（$M=70$，$\varepsilon=\delta=0.2$，$k_{\max}=25$）。参考实现：https://github.com/sohaibafifi/neuro-co-cax
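The reported settings ($M=70$, $\varepsilon=\delta=0.2$, $k_{\max}=25$) can be sanity-checked with the textbook one-sided Hoeffding bound under a Bonferroni correction. The abstract does not spell out the exact test, so the sketch below rests on assumptions: $M$ is taken to be the number of samples per test, $k_{\max}$ the number of tests corrected for, and the sampled quantity is assumed bounded in [0, 1]. The function name `hoeffding_margin` is hypothetical.

```python
import math

def hoeffding_margin(num_samples: int, delta: float, num_tests: int) -> float:
    """Deviation t such that exp(-2 * num_samples * t^2) = delta / num_tests.

    Assumes i.i.d. samples bounded in [0, 1]; splitting delta evenly across
    num_tests tests is the Bonferroni correction.
    """
    per_test_delta = delta / num_tests
    return math.sqrt(math.log(1.0 / per_test_delta) / (2.0 * num_samples))

margin = hoeffding_margin(num_samples=70, delta=0.2, num_tests=25)
print(round(margin, 4))  # prints: 0.1857
```

Under these assumptions the margin stays below the tolerance of 0.2, which is at least consistent with choosing 70 samples per step; whether the paper sizes its sample budget this way is not stated in the abstract.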

基于影响启发的谱旋转用于极端低位LLM量化

Gorgi Pavlov

发表机构 * Lehigh University（理海大学）

AI总结本文利用伴随理论论文的影响自适应Walsh几何，通过WHT旋转和列缩放结合重构误差量化器，实现极端低位权重量化，在多个模型上降低困惑度15-58%。

Comments 14 pages, no figures. Companion application paper to arXiv:2605.01637 (theory). Code and pinned eval stack: https://github.com/gogipav14/spectral-llm

详情

AI中文摘要

我们将伴随理论论文（arXiv:2605.01637）的影响自适应Walsh几何应用于极端低位仅权重量化。方法是一个数学不变的变换：对每个线性层的权重矩阵进行WHT旋转，并根据逐坐标Walsh基激活能量重新缩放其列，然后交给重构误差量化器（Intel auto-round）。这使每组整数舍入偏向高谱能量通道。在四个从135M到1.5B参数的预训练仅解码器模型上，BBT-spectral在W2A16下相对于普通auto-round将wikitext-2困惑度降低了15-58%；我们还报告了一个TinyLlama-1.1B辅助数据点。三个扩展将方法迁移到其失败的族：针对Qwen3注意力的每头PCA矩阵-Gamma替换q_norm/k_norm（Qwen3-0.6B上PPL从136.76降至88.99）；与RoPE可交换的SO(2)每对旋转（Qwen2.5-1.5B上PPL从36.93降至21.84）；以及通过架构模糊测试发现的Laguna风格融合专家布局的MoE感知输入侧吸收修复。W2与W4的消融实验给出了一个故意的阴性对照：在W4下，重新分配收益落在±0.5 PPL噪声基底内，这与Schur-凸性直觉一致，即非集中影响成本随噪声预算缩小而消失。所有量化权重导出为OpenVINO IR，并在Intel NPU + Arc dGPU + CPU上运行，PPL在设备间变化在±0.1内。我们不声称将理论论文的majorization论证形式化为布尔到实数值的迁移：这里使用的WHT激活能量不是理论论文的布尔影响，联系是直观的，贡献在于工程价值而非迁移定理。与SpinQuant、QuaRot、QuIP-sharp、AQLM、OmniQuant和ButterflyQuant在匹配校准下的头对头基准测试是未来的主要工作。

英文摘要

We apply the influence-adaptive Walsh geometry of a companion theory paper (arXiv:2605.01637) to extreme low-bit weight-only LLM quantization. The recipe is one math-invariant transformation: WHT-rotate each linear layer's weight matrix and rescale its columns by per-coordinate Walsh-basis activation energy before handing off to a reconstruction-error quantizer (Intel auto-round). This biases per-group integer rounding toward high-spectral-energy channels. On four pretrained decoder-only models from 135M to 1.5B parameters, BBT-spectral reduces wikitext-2 perplexity by 15-58% relative to vanilla auto-round at W2A16; we also report a TinyLlama-1.1B auxiliary data point. Three extensions transfer the recipe to families it failed on: a per-head PCA matrix-Gamma replacement of q_norm/k_norm for Qwen3 attention (PPL 136.76 -> 88.99 on Qwen3-0.6B); an SO(2) per-pair rotation that commutes with RoPE (PPL 36.93 -> 21.84 on Qwen2.5-1.5B); and an MoE-aware input-side absorption fix identified by architectural fuzzing of Laguna-style fused-expert layouts. A W2-vs-W4 ablation gives a deliberate negative control: the redistribution payoff falls within the +/-0.5 PPL noise floor at W4, consistent with the Schur-convexity intuition that the cost of unconcentrated influence vanishes as the noise budget shrinks. All quantized weights export to OpenVINO IR and run on Intel NPU + Arc dGPU + CPU with PPL invariant to device within +/-0.1. We do not claim a formal Boolean-to-real-valued transfer of the theory paper's majorization argument: the WHT activation energy used here is not the Boolean influence of the theory paper, the link is intuitive, and the contribution is engineering value rather than a transferred theorem. Head-to-head benchmarks against SpinQuant, QuaRot, QuIP-sharp, AQLM, OmniQuant, and ButterflyQuant at matched calibration are the main future-work item.
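The "math-invariant transformation" can be illustrated with a small pure-Python sketch: rotate the weight matrix by an orthonormal Walsh-Hadamard matrix H, rescale its columns by a diagonal D, and apply the inverse operations to the input so the layer output is unchanged. The scale values, function names, and the choice to absorb the inverse on the input side are assumptions for illustration; the abstract does not specify how the activation energies are computed or where the inverse is absorbed.

```python
import math

def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform; len(vec) must be a power of two."""
    out = list(vec)
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]

def rotate_and_scale(weight, col_scale):
    """W' = W H D: rotate each row by H (symmetric, orthonormal), then scale columns."""
    return [[v * s for v, s in zip(fwht(row), col_scale)] for row in weight]

def transform_input(x, col_scale):
    """x' = D^{-1} H x, so that W' x' = W H D D^{-1} H x = W x."""
    return [v / s for v, s in zip(fwht(x), col_scale)]

def matvec(weight, x):
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

# Hypothetical 2x4 layer, input, and per-column scales (placeholders, not measured energies).
W = [[1.0, 2.0, 3.0, 4.0], [0.0, -1.0, 0.5, 2.0]]
x = [1.0, 0.0, -1.0, 2.0]
col_scale = [1.0, 2.0, 0.5, 4.0]

original = matvec(W, x)
transformed = matvec(rotate_and_scale(W, col_scale), transform_input(x, col_scale))
```

Before quantization the two outputs agree up to floating-point error; the point of the rotation and rescaling is only to change which coordinates absorb the rounding error once the rotated weights are quantized.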


2605.25198 2026-05-26 cs.LG cs.AI 版本更新

Hide to Guide: Learning via Semantic Masking

隐藏以引导：通过语义掩码学习

Ruitao Liu, Qinghao Hu, Alex Hu, Yecheng Wu, Shang Yang, Luke J. Huang, Zhuoyang Zhang, Han Cai, Song Han

发表机构 * MIT（麻省理工学院）； NVIDIA（英伟达）

AI总结提出语义掩码专家策略优化（SMEPO），通过掩码专家轨迹中与奖励相关的语义片段，将困难问题转化为填空过程，提升强化学习在推理密集型任务中的探索效率。

详情

AI中文摘要

具有可验证奖励的强化学习（RLVR）已成为提升语言模型在推理密集型任务上性能的强大范式，但其有效性常受限于探索。例如，模型在困难问题上常常失败，留下很少有用的奖励信号。外部专家轨迹提供了一种自然的引导来源，但它们也可能在通往验证器目标的关键路径上暴露与奖励相关的内容，如最终答案、中间值、可执行实现或与答案相关的实体。这些内容可能创建意外的奖励黑客通道，使策略通过复制轨迹而非学习底层推理或智能体行为来获得奖励。现有的引导式RL方法通过使用部分轨迹来降低这种风险，但它们主要启发式地控制展示多少专家信息，而非控制应隐藏哪些部分。为此，我们提出语义掩码专家策略优化（SMEPO），一种用于专家引导RLVR的细粒度语义掩码策略。SMEPO不是粗略地截断轨迹或原样展示，而是在保留专家分解、计划和过程结构的同时，掩码关键路径上与奖励相关的语义片段。这将困难问题从从头推理转变为填空过程：策略可以遵循专家的问题解决路径，但仍需自行重建缺失的值、代码或实体。SMEPO易于应用，无需更改奖励函数或RL目标。在包括数学、代码和智能体搜索在内的多个领域，SMEPO相比GRPO将准确率提升最多3.2个百分点，并将训练时间减少最多4.2倍。代码已开源：https://github.com/mit-han-lab/SMEPO。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) has become a powerful paradigm for improving language models on reasoning-intensive tasks, but its effectiveness is often limited by exploration. For example, models often fail on hard problems, leaving little useful reward signal. External expert traces offer a natural source of guidance, yet they may also expose reward-relevant content along the critical path to the verifier target, such as final answers, intermediate values, executable implementations, or answer-related entities. This content can create an unintended reward hacking channel, allowing the policy to obtain reward by copying the trace rather than learning the underlying reasoning or agentic behavior. Existing guided-RL methods reduce this risk by using partial trajectories, but they mainly control how much expert information is shown heuristically rather than which parts should be hidden. To this end, we propose Semantic Masked Expert Policy Optimization (SMEPO), a fine-grained semantic masking strategy for expert-guided RLVR. Instead of truncating traces coarsely or revealing them unchanged, SMEPO masks reward-relevant semantic spans along the critical path while preserving the expert's decomposition, plan, and procedural structure. This turns hard problems from reasoning from scratch into a fill-in-the-blank process: the policy can follow the expert's problem-solving route, but must still reconstruct the missing values, code, or entities by itself. SMEPO is simple to apply and requires no changes to the reward function or RL objective. Across diverse domains, including math, code, and agentic search, SMEPO improves accuracy by up to 3.2 points over GRPO and reduces training time by up to 4.2x. The code is available at https://github.com/mit-han-lab/SMEPO.


2605.25194 2026-05-26 cs.LG 版本更新

Localization then Neutralization: Gradient-guided Token Suppression against Visual Prompt Injection Attack

先定位再中和：梯度引导的令牌抑制对抗视觉提示注入攻击

Dongpeng Zhang, Ke Ma, Yangbangyan Jiang, Gaozheng Pei, Longtao Huang, Qianqian Xu, Qingming Huang

发表机构 * School of Advanced Interdisciplinary Sciences, UCAS（UCAS交叉学科研究院）； School of Electronic, Electrical and Communication Engineering, UCAS（UCAS电子电气与通信工程学院）； State Key Laboratory of AI Safety, Institute of Computing Technology, CAS（中国科学院计算技术研究所人工智能安全国家重点实验室）； Alibaba Group（阿里巴巴集团）； School of Computer Science and Technology, UCAS（UCAS计算机科学与技术学院）； Beijing Academy of Artificial Intelligence（北京人工智能研究院）； Key Laboratory of Big Data Mining and Knowledge Management, UCAS（UCAS大数据挖掘与知识管理重点实验室）

AI总结针对多模态大语言模型的视觉提示注入攻击，提出梯度令牌掩码（GTM）方法，通过梯度分析定位关键图像令牌并掩码中和，将攻击成功率降至接近零且计算开销极小。

详情

AI中文摘要

对抗性图像通过提示注入对多模态大语言模型构成严重安全威胁。现有防御缺乏对底层机制的原则性理解，且难以平衡效率和防御效用。在这项工作中，我们表明成功的对抗攻击并非均匀依赖整个图像，而是依赖于一小部分关键图像令牌。基于这一见解，我们提出梯度令牌掩码（GTM），通过梯度分析定位这些令牌并通过掩码中和它们。我们发现，当攻击保留预测令牌时，基于第一个生成令牌输出概率的归因会失败。为克服这一点，GTM利用隐藏状态梯度范数分数进行对抗输入下的生成影响归因。我们证明其排名与完整对抗损失梯度的排名一致，为精确定位提供了理论保证。我们的方法仅需一次前向-反向传播即可识别并清零少量高分令牌，有效破坏对抗攻击路径。在提示注入和多模态越狱攻击上的大量实验表明，我们的方法将攻击成功率（ASR）降至接近零，同时以可忽略的计算开销保持模型效用。

英文摘要

Adversarial images pose a severe security threat to multimodal large language models through prompt injection. Existing defenses largely lack a principled understanding of the underlying mechanisms and struggle to balance efficiency and defense utility. In this work, we show that successful adversarial attacks do not rely on the entire image uniformly but instead depend on a small subset of critical image tokens. Based on this insight, we propose Gradient Token Masking (GTM), which localizes these tokens via gradient analysis and neutralizes them through masking. We find that attribution based on the first generated token's output probability fails when attacks preserve the predicted token. To overcome this, GTM utilizes the Hidden-State Gradient Norm score for generation-influence attribution under adversarial inputs. We prove that its ranking is consistent with that of the full adversarial loss gradient, providing a theoretical guarantee for accurate localization. Our method requires only a single forward-backward pass to identify and zero out a small number of high-scoring tokens, effectively disrupting the adversarial attack path. Extensive experiments on prompt injection and multimodal jailbreak attacks demonstrate that our approach reduces attack success rates (ASR) to near zero while preserving model utility with negligible computational overhead.
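The localize-then-neutralize step can be sketched as follows, assuming the per-token gradients with respect to the image-token hidden states have already been obtained from one forward-backward pass. The scoring here is a plain L2 gradient norm; the exact definition of the paper's Hidden-State Gradient Norm score is not given in the abstract, and the function names and toy values are hypothetical.

```python
import math

def gradient_norm_scores(token_grads):
    """L2 norm of each token's gradient vector, used as an attribution score."""
    return [math.sqrt(sum(g * g for g in grad)) for grad in token_grads]

def mask_top_tokens(token_embeddings, token_grads, num_masked):
    """Zero out the num_masked image tokens with the largest gradient norms."""
    scores = gradient_norm_scores(token_grads)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    masked = set(ranked[:num_masked])
    cleaned = [
        [0.0] * len(emb) if i in masked else list(emb)
        for i, emb in enumerate(token_embeddings)
    ]
    return cleaned, sorted(masked)

# Toy example: three image tokens with 2-dimensional embeddings and gradients.
embeddings = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
grads = [[0.1, 0.0], [3.0, 4.0], [0.0, 0.2]]
cleaned, masked_ids = mask_top_tokens(embeddings, grads, num_masked=1)
print(masked_ids)  # prints: [1]
```

Only the token with the dominant gradient norm is zeroed, leaving the other tokens untouched, which is the property the abstract relies on to preserve model utility.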


2605.25189 2026-05-26 cs.LG cs.CL 版本更新

Directional Alignment Mitigates Reward Hacking in Reinforcement Learning for Language Models

方向对齐缓解语言模型强化学习中的奖励黑客问题

Wenlong Deng, Jiaji Huang, Kaan Ozkara, Yushu Li, Christos Thrampoulidis, Xiaoxiao Li, Youngsuk Park

发表机构 * University of British Columbia（不列颠哥伦比亚大学）； Vector Institute（向量研究所）； Amazon（亚马逊）

AI总结通过分析强化学习更新的几何结构，发现奖励黑客源于优化偏离稳定低维学习轨迹，提出可信方向投影方法约束梯度在干净参考子空间内，延迟捷径利用并保持任务性能。

2605.25174 2026-05-26 q-bio.NC cs.LG cs.NE 版本更新

Growing a Neural Network in Breadth, Depth, and Time

在广度、深度和时间上生长神经网络

Eivinas Butkus, Kedar Garzón Gupta, Nikolaus Kriegeskorte

发表机构 * Columbia University（哥伦比亚大学）； NSF AI Institute for Artificial and Natural Intelligence（国家科学基金会人工智能与自然智能研究院）

AI总结提出在循环卷积神经网络中定义广度、深度和时间的可微成本，通过反向传播联合优化任务误差和资源成本，发现三者可相互权衡，且模型使用的时间与人类反应时间相关。

详情

AI中文摘要

空间和时间资源约束对生物和人工智能系统都至关重要。在这里，我们在一个被构想为无限格点有限子集的循环卷积神经网络中，定义了广度、深度和时间的可微成本项。我们通过反向传播将这些成本与任务误差联合优化。我们对广度、深度和时间施加不同的压力，导致通过训练有机地出现多样化的计算图。我们发现所有三种资源可以相互权衡以达到给定的准确度水平。网络在所有三个维度上随任务复杂性增长，并且在输入被遮挡时自发地采取更多的循环步骤。令人惊讶的是，模型使用的时间与人类在物体识别任务中的反应时间相关。我们的框架提供了资源约束如何塑造神经架构的规范性解释，与神经科学中关于大脑设计的问题相联系，并可能有助于阐明自然界中发现的神经解决方案的多样性。

英文摘要

Spatial and temporal resource constraints are critical for both biological and artificial intelligent systems. Here we define differentiable cost terms for breadth, depth, and time within a recurrent convolutional neural network conceived as a finite subset of an infinite lattice. We optimize these costs jointly with task errors via backpropagation. We set different pressures on breadth, depth, and time, which leads to diverse computational graphs emerging organically through training. We find that all three resources can be traded off against each other to achieve a given level of accuracy. Networks grow in all three dimensions with task complexity and spontaneously take more recurrent steps when inputs are occluded. Surprisingly, time used by the model correlates with human reaction times in an object recognition task. Our framework provides a normative account of how resource constraints shape neural architectures, connecting to questions about brain design in neuroscience, and may help illuminate the diversity of neural solutions found in nature.


2605.25173 2026-05-26 stat.ML cs.LG math.ST stat.TH 版本更新

Nyström Kernel Stein Discrepancy Tests

Nyström 核 Stein 散度检验

Florian Kalinke, Zoltán Szabó, Bharath K. Sriperumbudur

发表机构 * Chair of Information Systems（信息系统系）； Department of Statistics（统计系）； London School of Economics（伦敦经济学院）； The Pennsylvania State University（宾夕法尼亚州立大学）

AI总结本文提出并理论证明 Nyström 加速的核 Stein 散度检验在保持渐近水平和局部一致性的同时，显著降低计算复杂度。

详情

AI中文摘要

核 Stein 散度（KSD）是通用域上最受欢迎的拟合优度（GoF）度量之一，已成功部署大量应用。KSD 的主要应用之一是构建强大的 GoF 检验。然而，依赖于经典 U-/V-统计量基 KSD 估计量的检验有两个主要缺点。（i）其运行时间随样本数量呈二次方增长。（ii）在大多数情况下，其渐近零分布计算上难以处理，通常通过自举法处理。虽然已知 Nyström 方法可以在温和条件下以无统计精度损失的方式加速 KSD 估计，但据我们所知，其对基于自举的 GoF 检验影响的基本问题尚未解决；解决此问题是本文的重点。特别地，我们证明了二次时间自举 KSD 基 GoF 检验的关键性质（渐近水平和局部一致性）由其 Nyström 加速版本保持。我们在球面数据和函数数据的 GoF 检验背景下数值展示了加速 KSD 估计量和自举的效率。我们的数值结果表明，Nyström 加速方法在统计性能上与二次时间方法相当，同时需要显著更小的运行时间。

英文摘要

Kernel Stein discrepancy (KSD) is among the most popular goodness-of-fit (GoF) measures on general domains with a large number of successful deployments. One of the main applications of KSD is in constructing powerful GoF tests. However, tests relying on the classical U-/V-statistic-based KSD estimators have two major drawbacks. (i) Their runtime scales quadratically in the number of samples. (ii) Their asymptotic null distribution is computationally intractable in most cases, typically handled by bootstrapping. While it is known that the Nyström method permits accelerating KSD estimation with no loss of statistical accuracy under mild conditions, to the best of our knowledge, the fundamental question of its impact on bootstrap-based GoF testing is open; resolving this question is the focus of the current paper. In particular, we prove that the key properties of the quadratic-time bootstrapped KSD-based GoF test (asymptotic level and local consistency) are preserved by its Nyström acceleration. We numerically demonstrate the efficiency of the accelerated KSD estimator and bootstrap in the context of GoF testing of spherical and functional data. Our numerical results show that the Nyström-accelerated method performs statistically on-par with the quadratic-time approach, while requiring substantially smaller runtime.


2605.25172 2026-05-26 stat.AP cs.DL cs.LG 版本更新

Rejoinder: The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review

回复：ICML 2023 排名实验：审视机器学习/人工智能同行评审中的作者自我评估

Buxin Su, Jiayao Zhang, Natalie Collina, Yuling Yan, Didong Li, Kyunghyun Cho, Jianqing Fan, Aaron Roth, Weijie Su

发表机构 * University of Pennsylvania（宾夕法尼亚大学）； University of Wisconsin–Madison（威斯康星大学麦迪逊分校）； University of North Carolina at Chapel Hill（北卡罗来纳大学教堂山分校）； New York University（纽约大学）； Princeton University（普林斯顿大学）； Associate Chair of ICML 2023（ICML 2023 associate chair）； Program Chair of ICML 2023（ICML 2023 program chair）

AI总结本文回应了关于ICML 2023排名实验的讨论，将同行评审视为统计估计问题，探讨了保序机制（Isotonic Mechanism）的公平性与策略问题，并提出了结合审稿人排名和生成式AI时代以人为中心的评审框架。

Comments Rejoinder to the JASA Discussion of "The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review" (arXiv:2408.13430)

2605.25170 2026-05-26 cs.LG cs.AI cs.ET cs.RO 版本更新

PQDT: 伪查询双Transformer用于鲁棒点云修复

Haoqing Wu, Alexa Nawotki, Jochen Garcke

发表机构 * Mercedes-Benz AG（梅赛德斯-奔驰集团）； University of Bonn（波恩大学）； Fraunhofer SCAI（弗劳恩霍夫SCAI研究所）

AI总结提出一种基于伪查询模块和Transformer主干网络的统一3D修复网络，通过两阶段几何变换增强结构清晰度和局部细节，在多种退化场景下超越现有方法。

Comments To be published in The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026

Courant：一种具有局部支持和可解释场分解的状态自适应感知器神经代理模型

Anuj Kumar, Josiah Bjorgaard, Nikolaos Bouklas, Matteo Salvador, Alexander Lavin

发表机构 * Pasteur Labs（Pasteur实验室）； Cornell University（康奈尔大学）； Institute for Simulation Intelligence（模拟智能研究所）

AI总结提出基于感知器的编码-处理-解码代理模型Courant，通过状态自适应潜在查询和轻量解码器实现类似自适应hp细化的局部支持与可解释场分解，在稳态/瞬态模拟基准上取得竞争性精度。

AI中文摘要

我们引入“Courant”，一种基于感知器的编码器-处理器-解码器代理模型，其潜在特征在物理空间中表现出自适应专门化和局部支持，实现了类似于自适应hp细化方案的功能，这是传统数值求解器和科学机器学习中非常期望的属性。所提出的架构结合了共享随机傅里叶特征坐标嵌入、状态自适应潜在查询和轻量解码器。Courant使用稳态或瞬态模拟数据进行端到端训练，仅使用物理空间中的标准L_2预测损失，在基准测试上达到竞争性精度。我们证明Courant的归纳偏差产生了设计上可解释的潜在变量：它们在模拟域中发展出多尺度几何专门化，并在时间相关情况下跟踪相干结构，类似于随时间演化的空间基函数，从而允许对模拟场进行紧凑的、几何锚定的、单位划分式的分解。

英文摘要

We introduce "Courant", a Perceiver-based encoder-processor-decoder surrogate model that has latent features exhibiting adaptive specialization and local support in the physical space, enabling functionality akin to an adaptive hp-refinement scheme, an attribute that is highly desirable in traditional numerical solvers and scientific machine learning broadly. The proposed architecture combines a shared random Fourier feature coordinate embedding, state-adapted latent queries, and a light-weight decoder. Courant is trained end-to-end with steady or transient simulation data and only a standard L_2 prediction loss in the physical space, achieving competitive accuracy on benchmarks. We demonstrate that Courant's inductive biases yield latents that are interpretable by design: they develop multiscale geometric specialization in the simulation domain and track coherent structures in the time-dependent case, acting analogously to time-evolving spatial basis functions and allowing for decoding a compact, geometry-anchored, partition-of-unity-like decomposition of the simulated field.

2605.25114 2026-05-26 stat.ML cs.LG 版本更新

Counterfactually Safe Reinforcement Learning

反事实安全的强化学习

Jingyi Li, Peng Wu, Chengchun Shi

发表机构 * Department of Statistics and Data Science, National University of Singapore（新加坡国立大学统计与数据科学系）； School of Mathematics and Statistics, Beijing Technology and Business University（北京工商大学数学与统计学院）； Department of Statistics, London School of Economics and Political Science（伦敦政治经济学院统计系）

AI总结针对强化学习策略可能对个体造成伤害的问题，提出基于反事实视角定义个体伤害，并设计两阶段学习过程以最大化期望回报同时控制伤害率，理论证明有限样本性质与次优性上界，实验验证有效性。

2605.25111 2026-05-26 cs.LG 版本更新

Revisiting Pre-Propagation GNNs: Robust Diffusion Operators and Hidden-State Re-Propagation

重新审视预传播图神经网络：鲁棒扩散算子与隐状态再传播

Zichao Yue, Zhiru Zhang

发表机构 * School of Electrical and Computer Engineering, Cornell University, Ithaca, New York, USA（电气与计算机工程系，康奈尔大学，纽约州伊萨卡市）

AI总结提出鲁棒图扩散算子和少量隐状态再传播方案，使预传播图神经网络在保持训练效率的同时匹配消息传递图神经网络的精度。

2605.25110 2026-05-26 cs.CV cs.AI cs.LG 版本更新

Uncertainty-DTW for Sequences and Visual Tokens

Uncertainty-DTW 用于序列和视觉标记

Lei Wang, Syuan-Hao Li, Yongsheng Gao, Piotr Koniusz

发表机构 * School of Engineering and Built Environment, Electrical and Electronic Engineering, Griffith University（工程与建筑环境学院，电气与电子工程学院，格里菲斯大学）； School of Computer Science and Engineering, University of New South Wales（计算机科学与工程学院，新南威尔士大学）

AI总结提出不确定性感知的动态时间规整（uDTW）框架，通过异方差不确定性建模和最大似然估计实现鲁棒对齐，并推广到视觉标记集，在多个领域取得优于现有方法的结果。

Comments Research report

AI中文摘要

对齐结构化数据是计算机视觉和机器学习中的一个基本问题，支撑着时间序列分析、人类动作识别和视觉表示学习等任务。现有的对齐方法，包括动态时间规整（DTW）及其可微变体，依赖于确定性相似度度量，因此对异质和噪声特征敏感。在这项工作中，我们引入了不确定性感知对齐，这是一个概率框架，用异方差不确定性建模成对对应关系，并沿对齐路径执行结构化匹配。我们的公式，不确定性-DTW（uDTW），为每个对应分配一个正态分布，并通过最大似然估计目标参数化每条对齐路径，该目标包括（i）一个精度加权匹配项，抑制不可靠特征，以及（ii）一个对数方差正则化，防止退化解。这产生了一个概率对齐机制，对噪声具有鲁棒性且可解释，因为不确定性直接反映了匹配的可靠性。我们进一步将该框架从时间序列推广到标记化的视觉表示，从而能够对视觉标记集进行结构化匹配。学习到的不确定性可以解释为反向注意力：语义相关区域表现出低不确定性并主导对齐，而模糊/噪声区域具有高不确定性。这提供了对齐、注意力和不确定性建模之间的联系。我们在不同领域评估了所提出的框架。结果表明，与最先进的方法相比，该方法持续改进，并且学习到的不确定性与语义重要性相关。这些发现将不确定性感知对齐确立为一个通用、鲁棒且可解释的框架，用于从结构化数据中学习。

英文摘要

Aligning structured data is a fundamental problem in computer vision and machine learning, underlying tasks such as time series analysis, human action recognition, and visual representation learning. Existing alignment methods, including Dynamic Time Warping (DTW) and its differentiable variants, rely on deterministic similarity measures and are therefore sensitive to heterogeneous and noisy features. In this work, we introduce uncertainty-aware alignment, a probabilistic framework that models pairwise correspondences with heteroscedastic uncertainty and performs structured matching along alignment paths. Our formulation, uncertainty-DTW (uDTW), assigns each correspondence a Normal distribution and parametrizes each alignment path by a Maximum Likelihood Estimate objective consisting of (i) a precision-weighted matching term that suppresses unreliable features, and (ii) a log-variance regularization that prevents degenerate solutions. This yields a probabilistic alignment mechanism that is robust to noise and interpretable, as uncertainty directly reflects the reliability of matches. We further generalize this framework from temporal sequences to tokenized visual representations, enabling structured matching over sets of visual tokens. The learned uncertainty can be interpreted as a reverse-attention: semantically relevant regions exhibit low uncertainty and dominate the alignment, while ambiguous/noisy regions have high uncertainty. This provides a connection between alignment, attention, and uncertainty modeling. We evaluate the proposed framework across diverse domains. The results demonstrate consistent improvements over state-of-the-art methods and show that learned uncertainty correlates with semantic importance. These findings establish uncertainty-aware alignment as a general, robust, and interpretable framework for learning from structured data.
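The objective described in the abstract (a precision-weighted matching term plus a log-variance regulariser, accumulated along an alignment path) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `udtw_cost`, the choice of pair variance as `var_x[i] + var_y[j]`, the scalar features, and the hard-minimum recursion are all assumptions made for illustration (differentiable DTW variants typically replace the hard minimum with a soft one).

```python
import math


def udtw_cost(x, y, var_x, var_y):
    """Hard-min DTW over a Gaussian negative-log-likelihood pair cost.

    x, y         : sequences of floats (scalar features, for simplicity)
    var_x, var_y : per-step variances, all > 0
    The pair variance var_x[i] + var_y[j] is an illustrative assumption.
    """
    n, m = len(x), len(y)
    # acc[i][j] = best cost of aligning x[:i] with y[:j]
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            var = var_x[i - 1] + var_y[j - 1]
            diff = x[i - 1] - y[j - 1]
            # (i) precision-weighted matching term + (ii) log-variance regulariser
            pair = diff * diff / var + math.log(var)
            acc[i][j] = pair + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]
```

With every pair variance equal to 1 the log term vanishes and the cost reduces to ordinary squared-distance DTW. Raising the variance of a mismatched step lowers its contribution to the path cost, while the log term keeps the variance from growing without penalty.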

2605.25107 2026-05-26 cs.LG cs.AI cs.NA math.NA 版本更新

Leveraging Gauge Freedom for Learning Non-Gradient Population Dynamics of Stochastic Systems

利用规范自由度学习随机系统的非梯度种群动力学

Jules Berman, Tobias Blickhan, Benjamin Peherstorfer

发表机构 * Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA（纽约大学库朗数学科学研究所，美国纽约州纽约市，邮编10012）

AI总结针对现有种群动力学推断局限于梯度流的问题，提出非梯度推断流（NGIF）算法，通过连续性方程的弱形式参数化一般向量场并选择非最小动能准则，在低维和高维物理问题中提高了分布精度并更好地捕捉非势输运。

2605.25095 2026-05-26 cs.AI cs.LG math.OC 版本更新

随机神经网络对非线性偏微分方程的表达能力

Muhammed Ali Mehmood, Lukas Gonon

发表机构 * Department of Mathematics（数学系）； Imperial College London（伦敦帝国理工学院）； School of Computer Science（计算机科学学院）； University of St. Gallen（圣加仑大学）

AI总结研究随机生成隐藏权重的神经网络（RaNNs）对非线性偏微分方程解的逼近能力，推导了误差界并得到维数无关的逼近率1/2，应用于多孔介质方程和可压缩Navier-Stokes方程。

AI中文摘要

随机生成隐藏权重的神经网络（RaNNs）已被广泛研究，既作为独立的机器学习方法，也作为全可训练深度学习方法的初始化。本文研究RaNNs在学习非线性偏微分方程（PDEs）解方面的表达能力。尽管在实际应用中广泛使用，但对此背景下RaNNs逼近性质的严格理论理解仍然有限。本文推导了RaNNs对时间依赖Sobolev函数的误差界，并对足够正则的函数获得了维数无关的逼近率$\frac{1}{2}$。我们将结果应用于两类重要的非线性PDEs：多孔介质方程和可压缩Navier-Stokes方程，表明RaNNs能够有效逼近这些复杂非线性PDEs的解。我们的理论分析得到了数值实验的支持，表明所获得的收敛速率超出了所考虑的设置。

英文摘要

Neural networks with randomly generated hidden weights (RaNNs) have been extensively studied, both as a standalone learning method and as an initialization for fully trainable deep learning methods. In this work, we study RaNN expressivity for learning solutions to non-linear partial differential equations (PDEs). Despite their widespread use in practical applications, a rigorous theoretical understanding of the approximation properties of RaNNs in this context remains limited. Here, we derive error bounds for RaNN approximations to time-dependent Sobolev functions and obtain a dimension-free approximation rate $\frac{1}{2}$ for sufficiently regular functions. We apply our results to two important classes of non-linear PDEs: Porous Medium Equations and Compressible Navier-Stokes Equations, showing that RaNNs are capable of efficiently approximating solutions to these complex, non-linear PDEs. Our theoretical analysis is supported by numerical experiments, showing that the obtained convergence rates extend beyond the considered setting.
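For readers unfamiliar with the model class, here is a minimal sketch of a network with randomly generated hidden weights: the hidden layer is sampled once and frozen, and only the linear output layer is fitted, here by ridge least squares. It is a generic one-dimensional regression toy, not the paper's PDE setting. The names `fit_random_feature_net` and `solve_linear`, the tanh activation, the sampling distributions, and the ridge value are illustrative assumptions.

```python
import math
import random


def fit_random_feature_net(xs, ys, width=8, ridge=1e-3, seed=0):
    """Sample hidden weights once, then fit only the output layer."""
    rng = random.Random(seed)
    # Hidden weights and biases are random and never trained.
    hidden = [(rng.gauss(0.0, 4.0), rng.uniform(-4.0, 4.0)) for _ in range(width)]

    def features(x):
        # The leading constant 1.0 plays the role of an output bias.
        return [1.0] + [math.tanh(w * x + b) for w, b in hidden]

    rows = [features(x) for x in xs]
    d = width + 1
    # Normal equations of ridge regression: (A^T A + ridge * I) c = A^T y
    gram = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    coef = solve_linear(gram, rhs)
    return lambda x: sum(c * f for c, f in zip(coef, features(x)))


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x
```

Because only the output coefficients are optimised, training reduces to a single linear solve. Results on approximation rates for this model class, such as the one in the abstract, concern how the error behaves as the number of random hidden units grows.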

2605.25050 2026-05-26 stat.AP cs.LG q-bio.QM stat.ML 版本更新

Multimodality Stacking with Blockwise missing values and application to the PIONeeR biomarkers study for prediction of resistance to immunotherapy

具有分块缺失值的多模态堆叠及其在预测免疫治疗耐药性的PIONeeR生物标志物研究中的应用

Mohamed Boussena, Florence Monville, Jacques Fieschi-Meric, Frederic Vely, Pierre Milpied, Julien Mazieres, Maurice Perol, Eric Vivier, Laurent Greillier, Fabrice Barlesi, Sebastien Benzekry

发表机构 * Inria – Inserm team COMPO, COMPutational pharmacology and clinical Oncology, Centre Inria Sophia Antipolis - Méditerranée, Centre de Recherches en Cancérologie de Marseille, Inserm U1068, CNRS UMR7258, Institut Paoli-Calmettes, Pharmacy faculty, Aix-Marseille University（Inria - Inserm COMPO团队，计算药理学和临床肿瘤学，Inria Sophia Antipolis -地中海， Marseille癌症研究中心，Inserm U1068，CNRS UMR7258，Paoli-Calmettes研究所，药学系，Aix-Marseille大学）； Veracyte SAS, Marseille, France（Veracyte SAS，法国马赛）； Assistance Publique-Hôpitaux de Marseille (APHM), Marseille, France（马赛公共医院（APHM），法国马赛）； Toulouse University Hospital, Toulouse, France（图卢兹大学医院，法国图卢兹）； Centre Leon Berard, Lyon, France（Leon Berard中心，法国里昂）； Innate Pharma, Marseille, France（Innate Pharma，法国马赛）； Université Paris Saclay, Gustave Roussy, Inserm, Prédicteurs Moléculaires et nouvelles cibles en oncologie (U981), F-94805, Villejuif, France（巴黎萨克雷大学，Gustave Roussy，Inserm，分子预测与肿瘤学新靶点（U981），法国维尔若，F-94805）

AI总结提出多模态堆叠框架MSB，通过独立建模各模态特征并利用交叉验证堆叠元学习器聚合预测，解决高维和分块缺失问题，在PIONeeR研究中预测非小细胞肺癌免疫治疗无进展生存期，性能优于基线算法。

AI中文摘要

在临床肿瘤学中，整合多模态数据集常受到高维性和分块缺失的阻碍，即特定患者子集无法获得完整数据源。标准生存模型通常难以处理这些缺失，导致结果偏倚或患者排除。我们提出具有分块缺失值的多模态堆叠（MSB），一种用于生存分析的晚期融合框架，它独立建模模态特定特征，然后通过交叉验证的堆叠元学习器聚合预测。MSB在PIONeeR研究（n=443名患者，来自八个异质来源的378个生物标志物）中进行了验证，以预测接受免疫治疗的晚期非小细胞肺癌患者的无进展生存期。MSB产生了比基线算法更高的预测性能（C-index）。改进幅度因基线强度而异：线性模型提高了15.9%（Wilcoxon符号秩检验p<0.001），随机生存森林提高了5.4%（p=0.002），梯度提升方法提高了2.1%（p=0.030）。除了区分能力外，MSB还缩小了泛化差距（5折交叉验证重复3次的训练-测试差异：0.055 vs 线性模型的0.380）。置换重要性分析确定了常规实验室标志物、临床特征和PD-L1表达为主要预测驱动因素。缺失块指示器的重要性可忽略，表明模型从生物标志物值而非数据可用性模式中学习。MSB为具有分块缺失的多模态生存预测提供了一个统计验证的框架。通过无需完整数据即可进行系统性生物标志物评估，MSB为生物医学研究中的预测建模提供了实用工具，有待外部验证。实现代码可在https://github.com/MohamedBoussena/MSB 根据Inria许可证获取。

英文摘要

Integrating multimodal datasets in clinical oncology is frequently hindered by high dimensionality and blockwise missingness, where entire data sources are unavailable for specific patient subsets. Standard survival models often struggle with these gaps, leading to biased results or patient exclusion. We introduce Multimodality Stacking with Blockwise missing values (MSB), a late-fusion framework for survival analysis that independently models modality-specific features before aggregating predictions via a cross-validated stacking meta-learner. MSB was validated on the PIONeeR study (n=443 patients, 378 biomarkers across eight heterogeneous sources) to predict progression-free survival in advanced non-small cell lung cancer patients receiving immunotherapy. MSB yielded higher predictive performance (C-index) than baseline algorithms. Improvements varied by baseline strength: linear models showed a 15.9% increase (p<0.001 for the Wilcoxon signed-rank test), random survival forests gained 5.4% (p=0.002), and gradient boosting methods improved by 2.1% (p=0.030). Beyond discrimination, MSB reduced the generalization gap (train-test difference in 5-fold cross-validation repeated 3 times: 0.055 vs 0.380 for linear models). Permutation importance analysis identified routine laboratory markers, clinical features, and PD-L1 expression as primary predictive drivers. Missing block indicators showed negligible importance, suggesting the model learned from biomarker values rather than data availability patterns. MSB provides a statistically validated framework for multimodal survival prediction with blockwise missingness. By enabling systematic biomarker evaluation without requiring complete data, MSB offers a practical tool for predictive modeling in biomedical research, pending external validation. Implementation is available at https://github.com/MohamedBoussena/MSB under an Inria license.
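The abstract reports performance as a C-index. As a reminder of what that metric measures, the following is a simplified sketch of Harrell's concordance index for right-censored data. The function name `concordance_index` and the tie handling (tied risks count one half, tied times are skipped) are choices made for this illustration. It is not taken from the MSB repository.

```python
def concordance_index(times, events, risks):
    """Simplified Harrell's C-index.

    times  : observed time for each patient (event or censoring)
    events : 1 if the event (e.g. progression) was observed, 0 if censored
    risks  : predicted risk scores; higher risk should mean a shorter time
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is anchored on the patient with an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # earlier event had the higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied predictions count one half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ordering of patients by risk.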

2605.25038 2026-05-26 cs.CL cs.LG cs.SE 版本更新

TRACE: A taxonomy-grounded synthetic dataset for teaching-program generation and session interpretation in Applied Behavior Analysis

TRACE：一个基于分类学的合成数据集，用于应用行为分析中的教学程序生成和会话解释

Festus Kahunla

发表机构 * Drexel University（德雷塞尔大学）； Pombo Labs（波莫实验室）

AI总结提出TRACE数据集，通过分类学驱动的确定性生成器创建2999个合成示例，覆盖教学程序生成和多会话行为解释任务，以解决ABA领域真实数据受隐私保护无法公开的问题。

Comments 11 pages, 3 tables. Dataset: https://huggingface.co/datasets/PomboLabs/TRACE ; code: https://github.com/Pombo-Labs/TRACE

AI中文摘要

应用行为分析（ABA）是一门临床学科，其文档、教学程序和多次会话行为日志具有公式化和高容量的特点，但真实会话数据受HIPAA保护并受专业保密规则约束，阻碍了训练语料库的发布。我们提出了TRACE（分类学参考的ABA临床示例），一个包含2999个示例的合成指令调优数据集，涵盖两项ABA任务：跨离散试验训练、自然环境教学和任务分析的教学程序生成；以及跨十二种轨迹模式和十三种目标行为的多会话行为解释。每个示例均由一个基于经典ABA文献的确定性分类学驱动生成器产生，并且每个示例都带有完整的采样来源，即产生它的确切分类学单元。该数据集以CC BY-NC 4.0（数据）和MIT（代码）许可发布，包含分层训练集（2549）、验证集（149）、测试集（281）和完整性检查集（20）。TRACE是一个研究工件，尚未经过临床验证。

英文摘要

Applied Behavior Analysis (ABA) is a clinical discipline whose documentation (teaching programs and multi-session behavioral logs) is formulaic and high-volume, yet real session data is HIPAA-protected and bound by professional confidentiality rules, blocking the release of a training corpus. We present TRACE (Taxonomy-Referenced ABA Clinical Examples), a 2,999-example synthetic instruction-tuning dataset covering two ABA tasks: teaching-program generation across Discrete Trial Training, Natural Environment Teaching, and Task Analysis; and multi-session behavioral interpretation across twelve trajectory patterns and thirteen target behaviors. Every example is produced by a deterministic taxonomy-driven generator grounded in the canonical ABA literature, and every example carries complete sampling provenance: the exact taxonomy cells that produced it. The dataset is released under CC BY-NC 4.0 for data and MIT for code, with stratified train (2,549), validation (149), test (281), and sanity (20) splits. TRACE is a research artifact and has not been clinically validated.

2605.25030 2026-05-26 cs.LG 版本更新

MimirRAG: A Multi-Agent RAG Framework for Financial Data Retrieval with Metadata Integration

MimirRAG：一种集成元数据的金融数据检索多智能体RAG框架

Magnus Samuelsen, Wilmer Nyström, Somnath Mazumdar, Mansoor Hussain, Mikkel Strange

发表机构 * Copenhagen Business School（哥本哈根商学院）

AI总结提出MimirRAG多智能体RAG框架，通过元数据集成、表格感知分块和智能体工作流，在金融数据检索中实现89.3%准确率，优于基线。

AI中文摘要

检索增强生成（RAG）系统提供了一种有前景的方法来减少大语言模型（LLM）中的幻觉并提高答案准确性，这是可靠金融分析的必要条件，其中答案必须基于文件中的可验证证据，而非从模型先验生成。然而，设计能够从混合金融文档中提取有意义见解并集成到分析师工作流程中的RAG系统仍然具有挑战性。本文介绍了MimirRAG（元数据集成多智能体信息检索），这是一个迭代开发的多智能体RAG系统，旨在应对这些挑战。MimirRAG具有模块化流水线，包括PDF文件的保结构解析、表格感知分块、元数据提取、带有查询规划和混合搜索的基于智能体的检索、验证以及支持数值推理的上下文感知生成。我们的消融研究确定了有效金融RAG的三个关键技术推动因素：元数据集成、表格感知分块和智能体工作流。MimirRAG使用FinanceBench进行定量评估，并通过四位金融分析师的专家验证进行定性评估。该系统在FinanceBench上达到89.3%的准确率，优于原始基准基线。专家反馈强调，成功部署还需要校准信任、全面的数据集成和用户个性化。我们得出结论，将多智能体RAG架构与以人为中心的设计原则相结合，可以改善金融分析中有意义见解的提取。

英文摘要

Retrieval-augmented generation (RAG) systems offer a promising approach to reduce hallucinations and improve answer accuracy in large language models (LLMs), a requirement for reliable financial analysis, where answers must be grounded in verifiable evidence from filings rather than generated from model priors. However, designing RAG systems that extract meaningful insights from mixed financial documents and integrate into analyst workflows remains challenging. This paper introduces MimirRAG (Metadata-Integrated Multi-Agent Information Retrieval), a multi-agent RAG system developed iteratively to address these challenges. MimirRAG features a modular pipeline encompassing structure-preserving parsing of PDF filings, table-aware chunking, metadata extraction, agent-based retrieval with query planning and hybrid search, validation, and context-aware generation with numerical reasoning support. Our ablation study identifies three key technical enablers for effective financial RAG: metadata integration, table-aware chunking, and an agentic workflow. MimirRAG was evaluated quantitatively using FinanceBench and qualitatively through expert validation with four financial analysts. The system achieved 89.3% accuracy on FinanceBench, outperforming the original benchmark baselines. Expert feedback highlighted that successful deployment also requires calibrated trust, comprehensive data integration, and user personalization. We conclude that combining multi-agent RAG architecture with human-centric design principles can improve the extraction of meaningful insights in financial analysis.

2605.25011 2026-05-26 cs.LG 版本更新

A perspective on fluid mechanical environments for challenges in reinforcement learning

强化学习挑战中的流体力学环境视角

Shruti Mishra, Michael Chang, Vamsi Spandan, Shmuel M. Rubinstein

发表机构 * Sony AI（索尼人工智能）； Cohere Labs（Cohere实验室）； The Hebrew University of Jerusalem（耶路撒冷希伯来大学）

AI总结本文提出将经典流体力学问题作为强化学习测试平台，通过非线性不稳定性环境中的状态、动作空间和奖励函数设计，促进智能体在高维动态环境中的高效交互。

AI中文摘要

我们考虑开发能够高效与高维、演化环境交互的智能体所面临的挑战，旨在实现与开放世界交互的实际强化学习智能体，这些智能体仅能观察并影响世界的一小部分。我们认为，经典流体力学问题及其模拟为这类方法的开发提供了一个引人注目的测试平台。这些问题出现在非线性不稳定性中，其中小扰动可能增长并改变系统的动力学。非线性不稳定性代表了若干具有工业应用的开放科学挑战——液体射流的液滴破碎、两种流体界面的混合，以及海洋中异常高的怪浪的出现。在这些设置中，智能体可以利用跨变化动力学的保留表示来高效学习。我们提出了两个智能体与流体力学环境交互的问题描述，并描述了这些智能体的状态空间、动作空间和奖励函数。对于这些示例，我们指定了环境的非平稳方面以及保留的不变性。我们注意到Dedalus和JAX-CFD是可用于开发强化学习方法的开源模拟器（Burns等人，2016；Kochkov等人，2021）。我们通过创建在Dedalus模拟的静态环境中学习导航的强化学习智能体，展示了Dedalus在环境生成中的使用。这为未来开发能够有意义地与代表自然和工业流动中科学挑战的模拟环境交互的强化学习智能体奠定了基础。

英文摘要

We consider the challenge of developing agents that efficiently interact with high-dimensional, evolving environments, towards a view of practical reinforcement learning (RL) agents interacting with open worlds, of which they witness and affect only a small part. We argue that canonical fluid mechanics problems, and their simulations, present a compelling testbed for the development of such methods. These problems arise in nonlinear instabilities, where small disturbances can grow to transform the dynamics of a system. Nonlinear instabilities represent several open scientific challenges with industrial applications: the droplet breakup of a liquid jet, mixing at an interface between two fluids, and the appearance of unusually tall rogue waves in the ocean. In these settings, agents may leverage preserved representations across the changing dynamics to learn efficiently. We present two problem descriptions of agents interacting with a fluid mechanical environment, and describe the state and action spaces, and reward functions, for these agents. For these examples, we specify the aspects of the environment which are nonstationary and the preserved invariances. We note Dedalus and JAX-CFD as open-source simulators that can be used for the development of reinforcement learning methods (Burns et al., 2016; Kochkov et al., 2021). We demonstrate the use of Dedalus for environment generation by creating RL agents that learn to navigate in a stationary environment that is simulated using Dedalus. This sets the stage for future development of RL agents that learn to meaningfully interact with simulated environments that represent scientific challenges in natural and industrial flows.

2605.25004 2026-05-26 cs.LG cs.AI 版本更新

Metropolis-Scale Resilient and Trustworthy Traffic Flow Inference Using Multi-Source Data

基于多源数据的都市尺度弹性可信交通流推断

Qishen Zhou, Yifan Zhang, Michail A. Makridis, Anastasios Kouvelas, Yibing Wang, Simon Hu

发表机构 * School of Transportation, Jilin University（吉林大学交通运输学院）； Department of Computer Science, City University of Hong Kong (Dongguan)（香港城市大学（东莞）计算机科学系）； Institute for Transport Planning and Systems, ETH Zurich（苏黎世联邦理工学院交通规划与系统研究所）； Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University（浙江大学智能交通系统研究所）； ZJU-UIUC Institute, Zhejiang University（浙江大学-UIUC研究院）

AI总结提出任务感知注意力神经过程（TA-ANP）统一概率框架，融合浮动车数据和稀疏固定检测器数据，实现高精度、可信的不确定性量化的全局交通状态推断，并在都市尺度数据集上取得最优性能。

Comments The paper has been submitted to Elsevier for possible publication

AI中文摘要

从稀疏观测中以高精度和可信的不确定性量化推断网络级交通状态对于智能交通系统至关重要，但由于问题的欠定性、传感网络的多方面干扰以及多个推断子任务在联合建模时的固有冲突，这仍然具有挑战性。我们提出了任务感知注意力神经过程（TA-ANP），这是一个统一的概率框架，通过融合浮动车数据（FCD）和稀疏的固定检测器测量，实现弹性且可信的全局交通状态推断（GTSI）。通过将GTSI视为一个随机过程，TA-ANP利用神经过程的元学习特性，无需重新训练即可快速适应传感配置的变化。引入了一个具有不同时空归纳偏置的任务感知多查询注意力模块，以联合处理三个GTSI子任务，同时减轻跨任务干扰。对于不确定性量化，我们将神经过程与蒙特卡洛丢弃法相结合，以捕获偶然不确定性和认知不确定性。为了支持都市尺度评估，我们构建了都市多源交通数据集（MMTD），该数据集整合了固定环路传感器测量、FCD统计数据和OpenStreetMap道路网络数据，覆盖了包含2371个路段的城市网络。在MMTD上的实验表明，TA-ANP在确定性和概率性指标下的所有子任务中均达到了最先进的性能。由此产生的良好校准的不确定性使得能够以更少的传感器部署实现更高效的固定传感器布局。在“损坏-修复-新增”传感生命周期下，TA-ANP在干扰吸收、性能恢复和对未见传感配置的适应性方面表现出卓越的弹性。

英文摘要

Inferring network-wide traffic states from sparse observations with high accuracy and trustworthy uncertainty quantification is essential for intelligent transportation systems, yet it remains challenging due to the underdetermined nature of the problem, multifaceted disturbances in sensing networks, and the inherent conflicts among multiple inference sub-tasks when modeled jointly. We propose the Task-Aware Attentive Neural Process (TA-ANP), a unified probabilistic framework for resilient and trustworthy global traffic state inference (GTSI) by fusing floating car data (FCD) with sparse fixed-detector measurements. By casting GTSI as a stochastic process, TA-ANP leverages the meta-learning properties of neural processes to adapt rapidly to changes in sensing configurations without retraining. A task-aware multi-query attention module with distinct spatiotemporal inductive biases is introduced to jointly handle three GTSI sub-tasks, while mitigating cross-task interference. For uncertainty quantification, we combine neural processes with Monte Carlo Dropout to capture both aleatoric and epistemic uncertainty. To support metropolis-scale evaluation, we construct the Metropolitan Multi-Source Traffic Dataset (MMTD), integrating fixed-loop sensor measurements, FCD statistics, and OpenStreetMap road-network data over an urban network of 2,371 road segments. Experiments on MMTD show that TA-ANP achieves state-of-the-art performance across all sub-tasks under deterministic and probabilistic metrics. The resulting well-calibrated uncertainties enable more efficient fixed-sensor placement with fewer sensor deployments. Under a Damage-Repair-Addition sensing lifecycle, TA-ANP demonstrates superior resilience in terms of disturbance absorption, performance recovery, and adaptability to unseen sensing configurations.

2605.25001 2026-05-26 cs.LG 版本更新

Mitigating Gradient Pathology in PINNs through Aligned Constraint

通过对齐约束缓解PINN中的梯度病理

Yichen Luo, Peiyu Zhu, Dongxiao Hu, Jia Wang, Tailin Wu, Dapeng Lan, Yu Liu, Zhibo Pang

发表机构 * Department of Information Science and Engineering, KTH Royal Institute of Technology, Stockholm, Sweden（信息科学与工程系，瑞典皇家理工学院，斯德哥尔摩，瑞典）； School of Advanced Manufacturing and Robotics, Peking University, Beijing, China（先进制造与机器人学院，北京大学，北京，中国）； School of Advanced Technology, Xi’an Jiaotong-Liverpool University, Suzhou, China（先进技术学院，西交利物浦大学，苏州，中国）； Department of AI, School of Engineering, Westlake University, Hangzhou, China（人工智能系，工程学院，西湖大学，杭州，中国）

AI总结针对物理信息神经网络训练中梯度冲突导致的局部最优问题，提出约束对齐损失与流形提升方法，通过重新表述零阶项为对齐约束并引入延迟因子，显著提升数值稳定性和效率。

Journal ref: Forty-Third International Conference on Machine Learning (ICML 2026)

AI中文摘要

虽然物理信息神经网络（PINN）在求解偏微分方程（PDE）方面功能强大，但其训练常常因梯度病理而瘫痪。来自PDE残差和边界约束的梯度相互对抗，使模型陷入局部最小值。当前的解决方案，如自适应加权或硬约束，要么无法从根本上解决这种病态条件，要么仅限于简单几何形状。在本研究中，我们从损失景观和优化动态的角度系统分析了这种梯度病理的可能原因。基于所得结论，我们提出了带有流形提升的约束对齐损失（CAML）。通过将所有零阶项重新表述为对齐约束，我们的方法有效缓解了梯度冲突。此外，我们引入了一个延迟因子来帮助优化器跳过高曲率区域。实验表明，我们的CAML在高度复杂的PINN问题中显著增强了数值稳定性和效率。我们的代码已在https://github.com/YichenLuo-0/CAML上开源。

英文摘要

While Physics-Informed Neural Networks (PINNs) are powerful for solving Partial Differential Equations (PDEs), their training is often paralyzed by gradient pathology. The gradients from the PDE residuals and boundary constraints oppose each other, trapping the model in local minima. Current solutions, such as adaptive weighting or hard constraints, either fail to fundamentally resolve this ill-conditioning or are limited to simple geometries. In this study, we systematically analyze the possible causes of this gradient pathology from the perspectives of loss landscapes and optimization dynamics. Based on the obtained conclusion, we propose Constraint-Aligned loss with Manifold Lifting (CAML). By reformulating all zeroth-order terms into aligned constraints, our method effectively mitigates gradient conflicts. In addition, we introduce a delay factor to help the optimizer skip the high-curvature area. Experiments demonstrate that our CAML significantly enhances numerical stability and efficiency in highly complex PINN problems. Our code is open-sourced on https://github.com/YichenLuo-0/CAML.

2605.24992 2026-05-26 cs.NI cs.AI cs.LG cs.MA 版本更新

Scaling up Energy-Aware Multi-Agent Reinforcement Learning for Mission-Oriented Drone Networks with Individual Reward

面向任务驱动无人机网络的能量感知多智能体强化学习扩展与个体奖励

Changling Li, Ying Li

发表机构 * Department of Computer Science, ETH Zurich（苏黎世联邦理工学院计算机科学系）； Department of Computer Science, Colby College（科尔比学院计算机科学系）

AI总结提出基于个体奖励函数的能量感知多智能体强化学习模型，利用深度Q网络解决无人机网络动态环境和电池容量限制下的轨迹规划问题，实验表明在任务密度高时成功率接近100%，且扩展性优于共享奖励模型。

Comments IEEE Internet of Things Journal

DOI: 10.1109/JIOT.2024.3511253
Journal ref: volume=12, number=8, year=2025, pages=10640-10654

AI中文摘要

多智能体强化学习（MARL）因其通过交互学习的能力，在自动驾驶和智慧城市等协作系统中显示出广泛适用性。随着无人机网络的最新发展，研究人员也应用MARL来解决轨迹规划问题。然而，动态环境和有限的电池容量仍然是使用MARL实现高效协作任务执行的挑战。在本文中，我们提出了一种能量感知的MARL模型作为应对这些挑战的尝试，利用深度Q网络（DQN）和由任务执行进度及无人机剩余电量驱动的个体奖励函数。我们对所提出的模型进行了一系列仿真研究，并将其与共享奖励MARL进行比较，以探索MARL中信用分配的影响。结果表明，无论任务位置和长度如何，我们提出的模型都能达到至少80%的成功率。与共享奖励模式类似，个体奖励模式在任务密度高时可以获得更好的成功率，并且当任务密度接近40%时，几乎可以达到100%的成功率。我们提出的个体奖励模型的真正优势在环境扩展时得以显现。与共享奖励MARL的比较表明，我们提出的模型对环境大小和智能体数量的变化更加鲁棒。由于目标的清晰性，它可以用更少的步骤实现更高的成功率，从而更好地提高能源效率。

英文摘要

Multi-agent reinforcement learning (MARL) has shown wide applicability in collaborative systems such as autonomous driving and smart cities, owing to its ability to learn through interaction. With the recent development of drone networks, researchers have also applied MARL to trajectory planning problems. However, dynamic environments and limited battery capacity still make it challenging to use MARL for efficient collaborative task execution. In this paper, we propose an energy-aware MARL model as an attempt to tackle these challenges, leveraging Deep Q-Networks (DQN) with individual reward functions driven by the task execution progress and the remaining battery of the drones. We conduct a set of simulation studies for the proposed model and compare it with shared-reward MARL [Li2022MARL] to explore the impact of credit assignment in MARL. The results indicate that our proposed model can achieve a success rate of at least 80% regardless of task locations and lengths. Similar to the shared-reward mode, the individual-reward mode achieves a better success rate when the task density is high, reaching nearly 100% when the task density approaches 40%. The true advantage of our proposed individual-reward model is revealed when scaling up the environment. The comparison with shared-reward MARL shows that our proposed model is more robust to changes in environment size and number of agents. It achieves a higher success rate in fewer steps owing to the clarity of the goal, which further improves energy efficiency.

2605.24989 2026-05-26 cs.LG cs.AI cs.IR 版本更新

Selective Test-Time Compute Scaling for Click-Through Rate Prediction via Uncertainty-Triggered Feature Path Exploration

基于不确定性触发的特征路径探索的点击率预测选择性测试时计算扩展

Moyu Zhang, Yun Chen, Yujun Jin, Jinxin Hu, Yu Zhang, Xiaoyi Zeng

发表机构 * Alibaba Group（阿里巴巴集团）

AI总结针对点击率预测中训练数据稀疏导致的不确定性，提出无需训练、模型无关的UTTSI框架，通过双信号估计器区分认知不确定性和偶然模糊性，对不确定实例进行自适应特征过滤和随机特征路径探索，在保持最坏延迟不变的情况下实现平均约2.8倍基础模型开销，实验和在线A/B测试均取得显著提升。

Comments 12 pages, 4 Figures, 3 Tables

AI中文摘要

扩展测试时计算对语言模型已被证明非常有效，然而这一机会在工业点击率（CTR）预测中仍未得到充分探索。CTR模型存在一个根本的不对称性：训练中充分表示的特征组合产生自信的预测，而稀疏观察到的特征组合则产生不可靠的输出。现有的训练阶段解决方案（如自适应门控）学习一个固定的选择函数，但受限于相同的稀疏性，在部署时无法提供针对每个实例的补救措施。我们提出UTTSI（不确定性触发的测试时选择性推理），一个无需训练、模型无关的框架，将推理深度按比例扩展到每个实例的不确定性。一个结合模型logit置信度和数据级频率先验的双信号估计器区分认知不确定性和偶然模糊性。每个实例都经过自适应特征过滤以去除不可靠的嵌入；不确定的实例额外接受随机特征路径探索，其预测通过一致性加权集成进行聚合。自信的实例完全绕过探索，保持平均开销约为基础模型成本的2.8倍，最坏情况延迟不变。在四个数据集和三种骨干架构上的实验表明，与所有训练阶段基线相比，取得了持续且统计显著的增益。为期七天的在线A/B测试进一步证实了5.3%的相对CTR提升（p < 0.01），确立了选择性测试时计算分配作为CTR预测训练阶段进展的实用补充。

英文摘要

Scaling test-time compute has proven highly effective for language models, yet this opportunity remains largely unexplored for industrial Click-Through Rate (CTR) prediction. CTR models suffer from a fundamental asymmetry: feature combinations well-represented in training yield confident predictions, while sparsely observed ones produce unreliable outputs. Existing training-phase solutions such as adaptive gating learn a fixed selection function subject to the same sparsity, offering no per-instance recourse at deployment. We propose UTTSI (Uncertainty-Triggered Test-Time Selective Inference), a training-free model-agnostic framework that scales inference depth proportionally to per-instance uncertainty. A dual-signal estimator combining model logit confidence with a data-level frequency prior distinguishes epistemic uncertainty from aleatoric ambiguity. Every instance undergoes adaptive feature filtering to remove unreliable embeddings; uncertain instances additionally receive stochastic feature-path explorations whose predictions are aggregated via consistency-weighted ensembling. Confident instances bypass exploration entirely, keeping average overhead at approximately $2.8\times$ base model cost with worst-case latency unchanged. Experiments on four datasets with three backbone architectures demonstrate consistent, statistically significant gains over all training-phase baselines. A seven-day online A/B test further confirms a 5.3% relative CTR gain ($p < 0.01$), establishing selective test-time compute allocation as a practical complement to training-phase advances for CTR prediction.
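The control flow the abstract outlines (confident instances take a single pass, uncertain ones get several stochastic feature-path passes that are then ensembled) might look roughly like the sketch below. Everything concrete here is a hypothetical stand-in and not the UTTSI implementation: the function name `selective_predict`, the uncertainty proxy based only on distance from 0.5 (the abstract describes a dual-signal estimator that also uses a frequency prior), random field dropping as the "feature path", and the exponential consistency weights. Adaptive feature filtering is omitted.

```python
import math
import random


def selective_predict(base_model, features, threshold=0.2,
                      n_paths=8, drop_prob=0.3, seed=0):
    """Return (prediction, number_of_forward_passes).

    base_model : callable mapping a dict of feature fields to a click probability
    features   : dict of feature fields for one instance
    """
    p0 = base_model(features)
    # Assumed uncertainty proxy: 0 when p0 is 0 or 1, 1 when p0 is 0.5.
    uncertainty = 1.0 - abs(2.0 * p0 - 1.0)
    if uncertainty <= threshold:
        return p0, 1  # confident instance: bypass exploration

    rng = random.Random(seed)
    preds = []
    for _ in range(n_paths):
        # One stochastic feature path: randomly drop a subset of fields.
        kept = {k: v for k, v in features.items() if rng.random() >= drop_prob}
        preds.append(base_model(kept))

    mean = sum(preds) / len(preds)
    # Assumed consistency weighting: paths near the ensemble mean weigh more.
    weights = [math.exp(-abs(p - mean)) for p in preds]
    agg = sum(w * p for w, p in zip(weights, preds)) / sum(weights)
    return agg, 1 + n_paths
```

In this sketch the extra cost is paid only by instances above the threshold, so the average overhead depends on how many instances are flagged as uncertain.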

2605.24986 2026-05-26 cs.IR cs.LG 版本更新

Self-Balancing Gradient Allocation for Heterogeneity-Aware Feature Generation in Click-Through Rate Prediction

点击率预测中面向异构感知特征生成的自平衡梯度分配

Moyu Zhang, Yun Chen, Yujun Jin, Jinxin Hu, Yu Zhang, Xiaoyi Zeng

发表机构 * Alibaba Group（阿里巴巴集团）

AI总结针对生成式CTR方法中重建目标忽略特征场异构性导致难场欠拟合的问题，提出HeteGenCTR，通过可学习的场难度参数联合训练去噪网络，实现自平衡损失和难度引导注意力机制，在五个基准和在线A/B测试中取得显著提升。

Comments 12 pages, 5 figures, 4 tables

详情

AI中文摘要

通过离散扩散的生成式预训练在所有特征场上同时提供密集的重建监督，缓解了CTR预测中数据稀疏导致的表示崩溃。然而，所有现有的生成式CTR方法都有一个根本限制：重建目标对每个特征场赋予相同的训练权重，忽略了高基数ID字段、稀疏分类属性、数值和行为序列之间重建难度的深刻异质性。这导致容易的场主导训练梯度，而最难但信息最丰富的场长期欠拟合，我们将这个问题称为生成难度不平衡。我们提出HeteGenCTR，通过每个场可学习的难度参数与去噪网络联合训练来解决这种不平衡。这个统一信号驱动两个协调组件，无需额外超参数：一个自平衡损失，自动将梯度预算重新分配给更难的场，具有可证明的稳定均衡；以及一个难度引导的注意力机制，抑制已经收敛的容易场的影响，同时放大向难场的跨场信息流。两个组件共享相同的学习信号，并在整个训练过程中保持相互一致。在五个CTR基准和一个为期七天的在线A/B测试中，实验表明相对于最先进的基线具有一致且统计显著的改进，对冷启动和长尾用户有不成比例的增益。

英文摘要

Generative pre-training via discrete diffusion provides dense reconstruction supervision across all feature fields simultaneously, mitigating representation collapse from data sparsity in CTR prediction. However, all existing generative CTR methods share a fundamental limitation: the reconstruction objective assigns equal training weight to every feature field, ignoring the profound heterogeneity of reconstruction difficulty across high-cardinality ID fields, sparse categorical attributes, numerical values, and behavioral sequences. This causes easy fields to dominate training gradients while the hardest but most informative fields remain chronically underfit, a problem we term the generative difficulty imbalance. We propose HeteGenCTR, which resolves this imbalance through per-field learnable difficulty parameters jointly trained with the denoising network. This unified signal drives two coordinated components without additional hyperparameters: a self-balancing loss that automatically reallocates gradient budget toward harder fields with a provably stable equilibrium, and a difficulty-guided attention mechanism that suppresses the influence of already-converged easy fields while amplifying cross-field information flow toward hard fields. Both components share the same learned signal and remain mutually consistent throughout training. Experiments on five CTR benchmarks and a seven-day online A/B test demonstrate consistent, statistically significant improvements over state-of-the-art baselines, with disproportionate gains for cold-start and long-tail users.

2605.24985 2026-05-26 cs.RO cs.LG physics.comp-ph 版本更新

Learning, locomotion, and navigation of soft synthetic snakes in three-dimensional, heterogeneous environments

软体合成蛇在三维异质环境中的学习、运动与导航

Xiaotian Zhang, Ali Albazroun, Tixian Wang, Songyuan Cui, Prashant G. Mehta, Mattia Gazzola

发表机构 * Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana–Champaign（卡尔·R·沃塞基因组生物学研究所，伊利诺伊大学厄巴纳-香槟分校）； Department of Mechanical and Aerospace Engineering, Hong Kong University of Science and Technology（香港科技大学机械与航空航天工程系）； Department of Mechanical Science and Engineering, University of Illinois Urbana–Champaign（伊利诺伊大学厄巴纳-香槟分校机械科学与工程系）

AI总结提出基于仿生驱动和感知模型的强化学习框架，使软体合成蛇能够自主导航非结构化三维地形，并通过高保真环境验证鲁棒性。

Comments 14 pages, 5 figures

详情

AI中文摘要

无肢陆地动物表现出卓越的运动多样性和控制能力，目前的工程仿制系统尚无法与之匹敌。在这里，我们引入了一个计算框架，使软体合成蛇能够导航非结构化的、异质的三维地形。我们的方法基于仿生驱动和感知模型，这些模型降低了高自由度连续体固有的控制复杂性。这些模型被集成到强化学习架构中，以推导出穿越环境的策略。训练首先在简化的同质地形中进行，以学习运动基元。然后，这些基元被组合成针对复杂地形的自适应策略。我们通过将蛇部署在从真实世界成像重建的高保真三维环境中来展示鲁棒性，实现了可靠的导航。总体而言，这项工作为自然地形中连续体系统的控制提供了一个物理真实的仿真平台和实用见解。

英文摘要

Limbless terrestrial animals exhibit exceptional locomotor versatility and control, currently unmatched by engineered counterparts. Here, we introduce a computational framework that enables soft synthetic snakes to navigate unstructured, heterogeneous 3D terrains. Our approach is grounded in bio-inspired actuation and sensing models that reduce the control complexity inherent to high-degree-of-freedom, continuum bodies. These models are integrated into a reinforcement learning architecture to derive environment-traversing policies. Training first occurs in simplified, homogeneous terrains to learn locomotion primitives. These are then composed into adaptive strategies for complex landscapes. We demonstrate robustness by deploying a snake in high-fidelity 3D environments reconstructed from real-world imaging, achieving reliable navigation. Overall, this work provides a physically-realistic simulation platform and practical insights for the control of continuum systems in natural terrains.

2605.24983 2026-05-26 cs.LG 版本更新

Benchmarking non-conformity score functions in conformal prediction

共形预测中非一致性评分函数的基准测试

Sol Erika Boman

发表机构 * Department of Medical Epidemiology and Biostatistics, Karolinska Institutet（卡罗林斯卡研究所医学流行病学与生物统计学系）； Department of Molecular Medicine and Surgery, Karolinska Institutet（卡罗林斯卡研究所分子医学与外科学系）

AI总结本文综述了共形预测中非一致性评分函数的性质，提出原创性的修改和评估方法，并通过实验比较了不同函数在平衡和不平衡类别设置下的性能。

Comments 3 tables, 1 supplementary table, 1 supplementary figure

详情

AI中文摘要

共形预测是机器学习分类中模型校准的一种有用且多功能的替代方法。它将单类预测替换为预测集，保证预测集包含真实类别的先验（a priori）概率大于或等于预先指定的比率。预测集的大小和有用性在很大程度上取决于非一致性评分函数的选择。科学文献中包含许多非一致性评分函数的例子，但缺乏研究其性质和有效性的工作。在本文中，我们概述了非一致性评分函数的性质。我们给出了现有文献中的非一致性评分函数示例，并引入了原创性的修改。我们提出了一种评估共形预测器预测集大小的原创方法，并用它来比较非一致性评分函数。我们还研究了不同非一致性评分函数在类别不平衡设置下用于类别条件共形预测的有效性。

英文摘要

Conformal prediction is a useful and versatile alternative to model calibration in machine learning classification. It replaces single-class prediction with prediction sets, guaranteeing that the a priori probability of the prediction sets containing the true class is larger than or equal to a pre-specified rate. The size and usefulness of the prediction sets rely heavily on the choice of the non-conformity score function. The scientific literature contains many examples of non-conformity score functions, but there is an absence of studies examining their properties and effectiveness. In this paper, we give an overview of properties of non-conformity score functions. We give examples of non-conformity score functions in the existing literature and introduce original modifications. We introduce an original method of evaluating the prediction set sizes of conformal predictors and use it to provide a comparison between non-conformity score functions. We also examine the efficacy of different non-conformity score functions for class-conditional conformal prediction in a setting with imbalanced classes.
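
To make the role of the non-conformity score concrete, the following is a minimal sketch of split conformal classification. It assumes the common score `1 - p(true class)`, which may or may not be among the functions the paper benchmarks; the calibration data, function names, and `alpha` value are all hypothetical.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(cal_scores)
    # Rank of the conformal quantile: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: every class must be included.
        return float("inf")
    return sorted(cal_scores)[k - 1]

def prediction_set(probs, threshold):
    """All classes whose non-conformity score (1 - p) does not exceed the threshold."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= threshold]

# Hypothetical calibration data: (predicted class probabilities, true label).
calibration = [
    ([0.7, 0.2, 0.1], 0),
    ([0.1, 0.8, 0.1], 1),
    ([0.3, 0.3, 0.4], 2),
    ([0.6, 0.3, 0.1], 1),
    ([0.2, 0.2, 0.6], 2),
    ([0.5, 0.4, 0.1], 0),
    ([0.25, 0.7, 0.05], 1),
    ([0.1, 0.1, 0.8], 2),
    ([0.9, 0.05, 0.05], 0),
]

# Assumed score function: one minus the probability assigned to the true class.
cal_scores = [1.0 - probs[label] for probs, label in calibration]
threshold = conformal_threshold(cal_scores, alpha=0.2)

confident = prediction_set([0.85, 0.10, 0.05], threshold)  # one clear winner
ambiguous = prediction_set([0.45, 0.42, 0.13], threshold)  # two plausible classes
```

With this toy data the confident input yields a single-class set and the ambiguous input yields a two-class set, which illustrates why the choice of score function drives prediction set size.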

2605.24981 2026-05-26 cs.CL cs.LG 版本更新

Large Language Model Selection with Limited Annotations

有限标注下的大语言模型选择

Yavuz Durmazkeser, Patrik Okanovic, Andreas Kirsch, Torsten Hoefler, Nezihe Merve Gürel

发表机构 * TU Delft（代尔夫特理工大学）； ETH Zurich（苏黎世联邦理工学院）

AI总结提出SELECT-LLM框架，通过基于期望信息增益的查询选择规则，在有限标注下高效识别最佳大语言模型，显著降低标注成本。

Comments 33 pages, 5 figures, 4 tables

详情

AI中文摘要

为给定任务选择大语言模型（LLM）需要比较许多强候选模型，然而标准评估依赖于固定评估集上的昂贵标注。为解决这一挑战，我们开发了SELECT-LLM，这是首个面向LLM主动模型选择的框架。SELECT-LLM旨在找到一小组查询，其标注对于识别给定任务的最佳LLM最具信息量。为此，我们引入了一种基于期望信息增益的查询选择规则，该规则通过候选模型输出之间的成对相似性计算。由于该规则仅使用生成的模型响应，SELECT-LLM可以在不假设候选模型架构或访问模型权重的情况下应用。这使得它既适用于开放权重LLM，也适用于黑盒LLM。我们在23个数据集、156个评估模型、多样化的任务族和多个文本评估指标上评估了SELECT-LLM。在所有实验的每个设置中，SELECT-LLM都优于最强基线，最佳模型选择的标注成本降低高达81.8%，近最佳模型选择的标注成本降低高达84.78%。

英文摘要

Choosing a Large Language Model (LLM) for a given task requires comparing many strong candidates, yet standard evaluation relies on costly annotations over fixed evaluation sets. To address this challenge, we develop SELECT-LLM, the first framework for active model selection of LLMs. SELECT-LLM aims to find a small set of queries whose annotations are most informative for identifying the best LLM for a given task. To this end, we introduce a query selection rule based on expected information gain, computed from pairwise similarities between candidate model outputs. Because this rule only uses generated model responses, SELECT-LLM can be applied across candidate models without assumptions about their architecture or access to model weights. This makes it suitable for both open-weight and black-box LLMs. We evaluate SELECT-LLM across 23 datasets, 156 evaluated models, diverse task families, and multiple text evaluation metrics. Across all experiments, SELECT-LLM improves over the strongest baseline in every setting, with annotation cost reductions up to 81.8% for best model selection and up to 84.78% for near-best model selection.

2605.24975 2026-05-26 cs.RO cs.AI cs.LG 版本更新

Bridging the Gap: Enabling Soft Actor Critic for High Performance Legged Locomotion

弥合差距：实现软演员-评论家算法用于高性能腿部运动

Gianluca Sabatini, Chenhao Li, Marco Hutter

发表机构 * ETH Zurich（苏黎世联邦理工学院）

AI总结本文通过识别软演员-评论家（SAC）在并行训练中性能不足的根本原因，并提出策略初始化、超时感知评论家目标和多步回报估计等改进，使其在腿部运动任务中达到与近端策略优化（PPO）相当的性能。

详情

AI中文摘要

近端策略优化（PPO）由于其在IsaacLab等大规模并行仿真环境中的鲁棒性和可扩展性，已成为训练腿部机器人的事实标准。然而，其同策略（on-policy）的性质使其天生样本效率低下，阻碍了其在真实硬件上的持续适应和微调。相比之下，软演员-评论家（SAC）是一种可以重用过去经验的异策略（off-policy）算法，使其成为模拟到现实迁移工作流程的自然候选，其中同一算法既可用于仿真，也可用于真实机器人的在线学习。尽管有这些优势，SAC在大规模并行训练设置中始终未能匹配PPO的经验性能。本工作确定了这一差距的根本原因，并引入了针对性的修改，包括策略初始化、超时感知评论家目标和多步回报估计，使SAC能够稳定地大规模训练。在多个腿部机器人平台和多样化的运动任务上评估，我们的方法完全弥合了与PPO的性能差距。

英文摘要

Proximal Policy Optimization (PPO) has become the de facto standard for training legged robots, thanks to its robustness and scalability in massively parallel simulation environments like IsaacLab. However, its on-policy nature makes it inherently sample-inefficient, preventing its use for continuous adaptation and fine-tuning on real hardware. Soft Actor-Critic (SAC), by contrast, is an off-policy algorithm that can reuse past experience, making it a natural candidate for sim-to-real transfer workflows where the same algorithm can be used both in simulation and for online learning on the real robot. Despite these advantages, SAC has consistently failed to match PPO's empirical performance in massively parallel training settings. This work identifies the root causes of this gap and introduces targeted modifications, covering policy initialization, timeout-aware critic targets, and multi-step return estimation, that enable SAC to train stably at scale. Evaluated across multiple legged robot platforms and diverse locomotion tasks, our approach closes the performance gap with PPO entirely.
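
As a rough illustration of how a timeout-aware critic target and a multi-step return can be combined, the sketch below separates true termination (no bootstrapping) from a time-limit truncation (bootstrapping continues). This is a generic construction under stated assumptions, not the paper's exact formulation: the function name and arguments are hypothetical, and for SAC the bootstrap value would be a soft value estimate that includes the entropy term, which is treated here as given.

```python
def n_step_target(rewards, terminated, truncated, bootstrap_values, gamma):
    """Multi-step critic target over a window of consecutive transitions.

    rewards[k]          reward received at step k of the window
    terminated[k]       True if step k ended the episode by failure or success
    truncated[k]        True if step k hit the time limit (timeout)
    bootstrap_values[k] value estimate of the state reached after step k
                        (assumed to be supplied by the target critic)
    """
    target = 0.0
    discount = 1.0
    for k in range(len(rewards)):
        target += discount * rewards[k]
        discount *= gamma
        if terminated[k]:
            # Genuine terminal state: there is no future return to add.
            return target
        if truncated[k]:
            # Timeout: the episode was cut short, so the future still has
            # value and the target keeps bootstrapping from the next state.
            return target + discount * bootstrap_values[k]
    # Window ended without the episode ending: bootstrap from the last state.
    return target + discount * bootstrap_values[-1]

# Hypothetical 3-step window.
rewards = [1.0, 1.0, 1.0]
values = [10.0, 10.0, 10.0]
no_end = [False, False, False]
ends_at_1 = [False, True, False]

full_window = n_step_target(rewards, no_end, no_end, values, gamma=0.5)
after_failure = n_step_target(rewards, ends_at_1, no_end, values, gamma=0.5)
after_timeout = n_step_target(rewards, no_end, ends_at_1, values, gamma=0.5)
```

Treating a timeout like a failure would drop the bootstrap term and bias the critic downward for states that are merely close to the time limit, which is the distinction the two flags encode.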

2605.24971 2026-05-26 cs.LG cs.AI 版本更新

TGFormer: Towards Temporal Graph Transformer with Auto-Correlation Mechanism

TGFormer：基于自相关机制的时间图Transformer

Hongjiang Chen, Pengfei Jiao, Ming Du, Xuan Guo, Zhidong Zhao, Di Jin, Xiao Liu

发表机构 * Hangzhou Dianzi University, School of Cyberspace（杭州电子科技大学网络空间安全学院）； Tianjin University, College of Intelligence and Computing（天津大学智能与计算学院）； State Key Laboratory of Systems Medicine for Cancer, Shanghai Cancer Institute（癌症系统医学国家重点实验室，上海癌症研究院）

AI总结针对时间图神经网络在捕获长期依赖和周期模式上的不足，提出TGFormer，通过轨迹框架和自相关机制实现子交互级别的依赖发现与表示聚合，在六个基准上最高提升9.35%精度。

详情

DOI: 10.1016/j.patcog.2025.112053
Journal ref: Pattern Recognition 170 (2026): 112053

AI中文摘要

对时间图神经网络（TGNN）日益增长的兴趣源于它们能够建模复杂动态并提供卓越性能。然而，TGNN在捕获长期依赖和识别周期模式方面面临根本性挑战。为解决这些限制，我们提出了TGFormer，一种专为时间图设计的新型Transformer架构。我们的模型通过建立与时间序列分析原理一致的轨迹框架，重新定义了时间图学习。这种方法使TGFormer能够通过对历史交互的系统分析来推导节点表示，从而实现对跨连续时间戳的节点关系的精细检查。基于随机过程理论，我们开发了一种自相关机制，系统性地揭示节点交互中的周期依赖。这一创新使TGFormer能够在子交互级别进行依赖发现和表示聚合，相比传统注意力机制展现出更高的效率和准确性。在六个公开基准上的实验验证了我们的方法的有效性，与最先进方法相比，TGFormer最高实现了9.35%的精度提升。

英文摘要

The growing interest in Temporal Graph Neural Networks (TGNNs) stems from their ability to model complex dynamics and deliver superior performance. However, TGNNs encounter fundamental challenges in capturing long-term dependencies and identifying periodic patterns. To address these limitations, we propose TGFormer, a novel Transformer architecture specifically designed for temporal graphs. Our model redefines temporal graph learning by establishing a trajectory framework that aligns with time series analysis principles. This approach allows TGFormer to derive node representations through systematic analysis of historical interactions, enabling granular examination of node relationships across sequential timestamps. Building upon stochastic process theory, we develop an auto-correlation mechanism that systematically uncovers periodic dependencies in node interactions. This innovation empowers TGFormer to perform dependency discovery and representation aggregation at sub-interaction levels, demonstrating superior efficiency and accuracy compared to conventional attention mechanisms. Experimental validation across six public benchmarks confirms the effectiveness of our approach, with TGFormer achieving up to a 9.35% precision improvement compared to state-of-the-art approaches.
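
The idea of using auto-correlation to surface periodic dependencies can be illustrated with the plain sample autocorrelation of a scalar series. This is only a textbook sketch, not TGFormer's mechanism, whose details the abstract does not give; the interaction-count series and function names are hypothetical.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at a given positive lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    num = sum(centered[t] * centered[t - lag] for t in range(lag, n))
    return num / denom

def dominant_lag(series, max_lag):
    """Lag in 1..max_lag with the highest autocorrelation."""
    return max(range(1, max_lag + 1), key=lambda lag: autocorrelation(series, lag))

# Hypothetical per-timestep interaction counts for one node, repeating every 4 steps.
series = [0, 1, 0, 3] * 6

period = dominant_lag(series, max_lag=8)
strength = autocorrelation(series, period)
```

On this toy series the search recovers the period of 4. A learned model could then aggregate representations from history positions aligned at that lag instead of attending to every past interaction.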

2605.24969 2026-05-26 cs.LG cs.AI 版本更新

OSDTW: Optimal Shared Depth and Task Weighting for Long-Tailed Recognition

OSDTW：长尾识别的最优共享深度与任务加权

Chang Chu, Qingyue Zhang, Shao-Lun Huang, Junxiong Zheng

发表机构 * Shenzhen International Graduate School, Tsinghua University, Shenzhen, China（清华大学深圳国际研究生院，中国深圳）； Shenzhen Zkosemi Semiconductor Technology Co., Ltd（深圳卓芯半导体科技有限公司）

AI总结提出OSDTW框架，通过分解任务、共享编码器与任务特定解码器，并基于Fisher信息矩阵推导泛化误差的偏置-方差分解，以优化共享深度和任务权重，解决长尾识别中头部-尾部性能权衡问题。

Comments ICIC 2026 Oral

详情

AI中文摘要

长尾识别面临持续的头部-尾部权衡：提升尾部性能通常会降低头部准确率，并可能增加训练不稳定性。尽管重加权、解耦训练和多专家方法取得了强有力的实证结果，但关于头部和尾部类别之间表示共享以及跨类别组监督加权的关键设计选择仍主要基于启发式。在这项工作中，我们提出了OSDTW，一个原则性的任务分解框架，将原始的单标签识别问题划分为头部任务和尾部任务，通过共享编码器和任务特定解码器实现。为了处理两个标签组之间的互斥性和统计依赖性，我们引入了一个因子化模型，并表明由此产生的基于KL散度的泛化误差可以写为任务项之和（加一个常数），从而得到一个定义良好的任务级目标。我们进一步开发了一个三阶段训练流程：独立任务训练以估计任务级最优值和Fisher信息矩阵，加权联合训练以学习共享编码器，以及分支组装以构建最终的解耦模型。在块对角Fisher近似下，我们推导了期望泛化误差的可计算二阶展开，将其分解为编码器方差、编码器偏置和解码器方差。这种偏置-方差分解提供了一个可计算的代理来选择共享深度和任务权重，从而实现高效的超参数搜索。在标准长尾基准上的实验证明了所提出方法相对于强基线的有效性。

英文摘要

Long-tailed recognition suffers from a persistent head-tail trade-off: improving tail performance often degrades head accuracy and can increase training instability. Despite strong empirical results from re-weighting, decoupled training, and multi-expert methods, key design choices about representation sharing between head and tail classes and supervision weighting across class groups remain largely heuristic. In this work, we propose OSDTW, a principled task-decomposition framework that partitions the original single-label recognition problem into a head task and a tail task, implemented with a shared encoder and task-specific decoders. To handle the mutual exclusivity and statistical dependence between the two label groups, we introduce a factorized model and show that the resulting Kullback-Leibler divergence-based generalization error can be written as the sum of task-wise terms up to an additive constant, yielding a well-defined task-wise objective. We further develop a three-stage training pipeline: independent task training to estimate task-wise optima and the Fisher information matrix, weighted joint training to learn a shared encoder, and branch assembly to construct the final decoupled model. Under a block-diagonal Fisher approximation, we derive a computable second-order expansion of the expected generalization error, decomposing it into encoder variance, encoder bias, and decoder variance. This bias-variance decomposition provides a computable proxy to select the shared depth and task weights, enabling efficient hyper-parameter search. Experiments on standard long-tailed benchmarks demonstrate the effectiveness of the proposed approach over strong baselines.

2605.24965 2026-05-26 cs.CV cs.AI cs.LG 版本更新

通过区域感知注意力重校准减轻视觉语言模型中的对象幻觉

Yuanzhi Xu, Qian Gao, Jun Fan, Guohui Ding, Zhenyu Yang, Sixue Lin, Yuteng Xiao

发表机构 * Qilu University of Technology (Shandong Academy of Sciences)（齐鲁工业大学（山东省科学院））； China Telecom Digital Intelligence Technology Co, Ltd（中国电信数字智能技术有限公司）； Shenyang Aerospace University（沈阳航空航天大学）； Qilu Institute of Technology（齐鲁理工学院）

AI总结提出一种无需训练的区域感知自适应加权机制，通过计算注意力头的稳健统计中点并利用跨头分歧动态调整干预预算，以连续惩罚调制抑制幻觉路径，有效纠正视觉语义错位，同时保持生成流畅性。

详情

AI中文摘要

生成事实上不正确的对象（通常称为对象幻觉）仍然是大型视觉语言模型（LVLMs）中的一个持久挑战。当前解决该问题的方法——从昂贵的数据驱动微调和延迟较高的对比解码到刚性的注意力头截断——常常在计算效率或模型特征空间的连续性上做出妥协。为克服这些限制，我们引入了一种新颖的、无需训练的推理策略，该策略作为一种区域感知的自适应加权机制，动态纠正语义漂移，而不依赖于突然的启发式截断。通过计算各注意力头上的离群值稳健统计中点，我们为可靠的视觉表示建立了一个稳定锚点。然后，我们利用跨区域映射的跨头分歧来动态确定干预预算，通过连续惩罚调制温和地抑制引起幻觉的注意力路径。这种重校准过程有效纠正了视觉语义错位，同时完全保留了生成流畅性和语言先验。在包括CHAIR、POPE和MME在内的标准多模态基准上的全面评估表明，我们的策略显著减少了实例级和句子级幻觉。结果展示了与当代基线相比的最先进性能，证实了我们方法的效率和算法鲁棒性。我们的代码将公开。

英文摘要

The generation of factually incorrect objects, commonly known as object hallucination, remains a persistent challenge in Large Vision-Language Models (LVLMs). Current approaches to address this issue, ranging from expensive data-driven fine-tuning and high-latency contrastive decoding to rigid attention head truncation, frequently compromise either computational efficiency or the continuity of the model's feature space. To overcome these limitations, we introduce a novel, training-free inference strategy that operates as a region-aware adaptive weighting mechanism to dynamically correct semantic drift without relying on abrupt heuristic truncations. By computing an outlier-resistant statistical midpoint across various attention heads, we establish a stable anchor for reliable visual representations. We then utilize the inter-head disagreement mapped across regions to dynamically determine intervention budgets, gently suppressing hallucination-inducing attention paths through a continuous penalty modulation. This recalibration process effectively rectifies visual-semantic misalignments while fully preserving generative fluency and language priors. Comprehensive evaluations on standard multimodal benchmarks, including CHAIR, POPE, and MME, reveal that our strategy substantially curtails both instance- and sentence-level hallucinations. The results demonstrate state-of-the-art performance against contemporary baselines, confirming our method's efficiency and algorithmic robustness. Our code will be made public.

2605.24950 2026-05-26 cs.RO cs.LG 版本更新

ARCANE-PedSynth: Synthetic Multi-Pedestrian Datasets with Behavioural Crossing Annotations

ARCANE-PedSynth：具有行为穿越注释的合成多行人数据集

Muhammad Naveed Riaz, Maciej Wielgosz, Antonio M. López Peña

发表机构 * Computer Vision Center (CVC), Universitat Autònoma de Barcelona (UAB), Bellaterra, Barcelona, Spain； Institute of Electronics, Faculty of Computer Science, Electronics and Telecommunications, AGH University of Krakow, Kraków, Poland

AI总结提出基于CARLA的开源框架ARCANE-PedSynth，通过混合AI-手动控制架构和12状态行为有限状态机生成高穿越率的多行人合成数据，支持RGB、LiDAR和DVS模态及行为标注，用于自动驾驶中的行人穿越预测。

详情

AI中文摘要

我们提出ARCANE-PedSynth，一个基于CARLA的开源软件框架，用于生成具有密集行为注释的合成多行人数据集，以支持自动驾驶中的行人穿越预测。该框架通过混合AI-手动行人控制架构克服了CARLA原生9%的穿越率，可实现高达75%的可配置目标穿越率。一个包含五种角色原型的12状态行为有限状态机产生了多样化的穿越行为。该框架生成同步的RGB、LiDAR和DVS数据，并带有每帧穿越标签、行为状态和估计的2D姿态关键点。我们通过PedSynth++（一个使用该框架生成的示例数据集）展示了ARCANE-PedSynth，该数据集包含533个多行人片段，覆盖12种天气条件，并带有RGB、LiDAR和DVS流。ARCANE-PedSynth通过CLI参数化和Docker容器化实现完全可重复性。

英文摘要

We present ARCANE-PedSynth, an open-source CARLA-based software framework for generating synthetic multi-pedestrian datasets with dense behavioural annotations for pedestrian crossing prediction in autonomous driving. The framework overcomes CARLA's native 9% crossing rate through a hybrid AI-manual pedestrian control architecture, enabling configurable target rates up to 75%. A 12-state behavioural finite state machine with five character archetypes produces diverse crossing behaviours. The framework generates synchronised RGB, LiDAR, and DVS data with per-frame crossing labels, behavioural states, and estimated 2D pose keypoints. We demonstrate ARCANE-PedSynth through PedSynth++, an example dataset generated with the framework, comprising 533 multi-pedestrian clips across 12 weather conditions with RGB, LiDAR, and DVS streams. ARCANE-PedSynth is fully reproducible via CLI parameterisation and Docker containerisation.

2605.24945 2026-05-26 cs.LG cs.AI physics.ao-ph 版本更新

RealBench: Benchmarking Data-Driven Numerical Weather Forecasting Under Operational Conditions and Extreme Event Challenges

RealBench: 在操作条件和极端事件挑战下对数据驱动数值天气预报的基准测试

Ruize Li, Zhibin Wen, Tao Han, Hao Chen, Fenghua Ling, Wei Zhang, Song Guo, Lei Bai

发表机构 * The Hong Kong University of Science and Technology（香港科技大学）； Nanjing University（南京大学）； Southern University of Science and Technology（南方科技大学）； Shanghai AI Laboratory（上海人工智能实验室）； Shanghai TechWind Technology Co., Ltd.（上海技风科技有限公司）

AI总结提出RealBench基准，通过使用低延迟操作分析和全球10,000+站点观测数据，在严格分布外测试集上评估AI天气预报模型，揭示再分析指标与实际性能的显著差异，特别是极端事件方面。

Comments 35 pages, 22 figures

详情

AI中文摘要

准确评估天气预报模型对于其在现实世界应用中的可靠部署至关重要。然而，现有基准主要依赖再分析产品（如ERA5），这些产品通过延迟数据同化生成，不能反映实时操作预报的约束，导致基准性能与现实预报之间存在系统性不匹配。在这项工作中，我们引入了RealBench，这是一个用于AI天气预报的下一代基准，强调在操作条件下的现实评估。RealBench具有严格分布外测试集，覆盖2025年，以消除数据泄露并捕捉近期大气状况。它整合了多个数据源，包括低延迟操作分析和包含超过10,000个站点的全球原位观测数据集，从而能够直接针对真实大气测量进行评估。除了标准全球指标外，RealBench还为高影响极端事件（包括热浪、寒潮和热带气旋）提供了全面的评估框架，使用事件特定指标更好地反映现实预报优先级。评估结果揭示了基于再分析的指标与现实性能之间的显著差异，特别是关于极端事件。通过突出现有基准的局限性，这项工作建立了一个更忠实且与操作相关的评估范式，为推进下一代AI天气预报系统提供了严格的基础。基准实现可在以下网址获取：https://github.com/lixruize-del/NWP-Benchmark。

英文摘要

Accurate evaluation of weather forecasting models is critical for their reliable deployment in real-world applications. However, existing benchmarks predominantly rely on reanalysis products such as ERA5, which are generated through delayed data assimilation and do not reflect the constraints of real-time operational forecasting, thereby resulting in a systematic mismatch between benchmark performance and real-world forecasting. In this work, we introduce RealBench, a next-generation benchmark for AI weather forecasting that emphasizes realistic evaluation under operational conditions. RealBench features a strictly out-of-distribution test set spanning 2025 to eliminate data leakage and capture recent atmospheric regimes. It integrates multiple data sources, including low-latency operational analysis and a large-scale global in-situ observation dataset comprising over 10,000 stations, enabling direct evaluation against real atmospheric measurements. Beyond standard global metrics, RealBench provides a comprehensive evaluation framework for high-impact extreme events, including heatwaves, cold surges, and tropical cyclones, using event-specific metrics that better reflect real-world forecasting priorities. The evaluation results reveal substantial discrepancies between reanalysis-based metrics and real-world performance, particularly concerning extreme events. By highlighting the limitations of existing benchmarks, this work establishes a more faithful and operationally relevant evaluation paradigm, providing a rigorous foundation for advancing next-generation AI weather forecasting systems. The benchmark implementation is available at: https://github.com/lixruize-del/NWP-Benchmark.

2605.24941 2026-05-26 cs.CR cs.LG 版本更新

Memory-Induced Tool-Drift in LLM Agents

LLM代理中的记忆诱导工具漂移

Mahavir Dabas, Jihyun Jeong, Ming Jin, Ruoxi Jia

发表机构 * Virginia Tech（弗吉尼亚理工大学）

AI总结研究LLM代理中长期记忆存储的个性偏见（如成本意识、不耐烦等）在不适用情境下静默影响工具调用的问题，提出MEMDRIFT基准测试，发现偏置记忆导致工具参数偏离基线，且现有防御措施无法消除该现象。

详情

AI中文摘要

现代LLM代理将用于个性化的长期记忆与用于在现实世界中采取行动的工具调用接口相结合——这一组合支撑着当代生产系统。我们研究了这种组合的一个先前未被检查的失败：当存储在记忆中的个性驱动偏见（成本意识、不耐烦、风险承受能力等）在不适用情境下静默影响工具调用时。我们称此为记忆诱导工具漂移，并通过MEMDRIFT将其操作化，MEMDRIFT是一个包含105个场景的基准测试，涵盖五个偏见维度和七个专业领域，通过自动化对抗性流水线生成。在七个前沿模型（包括具有扩展推理能力的模型）中，偏置记忆使偏转分数（一种由评判者评分的参数偏离无偏基线的度量）在1-5分制上提高高达+3.6分。当记忆管理由三种生产记忆架构处理时，工具漂移持续存在。该现象影响现实世界的工具：扫描288个经过验证的MCP服务器上的6,062个工具，我们标记了608个具有易受影响参数的工具，并在一个经过验证的子集上确认了工具漂移。从机制上讲，偏置记忆充当隐式引导向量，将激活沿与显式行为指令相同的潜在方向推动。它们还将注意力从任务相关上下文重新分配到与目标参数具有表面关键词重叠的记忆条目上。标准防御——基于提示的相关性指令和记忆过滤器——减少了漂移但未能消除它。随着代理以用户名义采取越来越重要的行动，记忆诱导工具漂移代表了当前安全措施未能解决的一个系统性漏洞，这激发了在记忆管理和工具调用生成交叉点上的专用防御。

英文摘要

Modern LLM agents combine long-term memory for personalization with tool-calling interfaces for taking actions in the world, a combination underpinning contemporary production systems. We study a previously unexamined failure of this combination: when personality-driven biases stored in memory (cost-consciousness, impatience, risk tolerance, etc.) silently affect tool calls in contexts where they are not applicable. We call this memory-induced tool-drift and operationalize it through MEMDRIFT, a benchmark of 105 scenarios spanning five bias dimensions and seven professional domains, generated through an automated adversarial pipeline. Across seven frontier models, including those with extended reasoning, biased memories raise deflection scores (a judge-scored measure of parameter deviation from unbiased baselines) by up to +3.6 points on a 1-5 scale. Tool-drift persists when memory management is handled by three production memory architectures. The phenomenon affects real-world tools: scanning 6,062 tools across 288 verified MCP servers, we flag 608 with susceptible parameters and confirm tool-drift on a validated subset. Mechanistically, biased memories act as implicit steering vectors, pushing activations along the same latent directions as explicit behavioral instructions. They also redistribute attention from task-relevant context toward memory entries with surface-level keyword overlap to the target parameter. Standard defenses (prompt-based relevance instructions and memory filters) reduce drift but do not eliminate it. As agents take increasingly consequential actions on a user's behalf, memory-induced tool-drift represents a systematic vulnerability that current safeguards do not address, motivating dedicated defenses at the intersection of memory management and tool-call generation.

2605.24939 2026-05-26 cs.LG math.OC 版本更新

Global linear convergence of entropy-regularized softmax policy gradient beyond tabular MDPs

熵正则化softmax策略梯度的全局线性收敛性：超越表格MDP

Ziyue Chen, David Šiška, Lukasz Szpruch

发表机构 * School of Mathematics（数学系）

AI总结本文研究连续状态和动作空间的无限时域熵正则化马尔可夫决策过程中策略梯度的全局收敛性，通过线性函数逼近的log-linear softmax策略，在$Q^π_τ$可实现性假设下建立非均匀Polyak-Łojasiewicz不等式，并识别两种特征机制下非均匀常数的有界性，证明正则化目标沿梯度流的全局线性收敛。

详情

AI中文摘要

我们研究了具有连续状态和动作空间的无限时域熵正则化马尔可夫决策过程（MDP）中策略梯度的全局收敛性。我们考虑带有线性函数逼近的log-linear softmax策略，它扩展了表格softmax参数化，同时保留了易处理的策略类。在正则化状态-动作值函数的$Q^π_τ$可实现性下，我们首先建立了一个非均匀的Polyak-Łojasiewicz（PŁ）不等式。非均匀性源于与策略几何相关的常数的退化性，即Fisher信息矩阵或非中心特征协方差矩阵。然后，我们确定了两种特征机制，在这些机制下，该非均匀常数可以沿梯度流保持有界。对于全仿射张成特征，我们证明了KL正则化子的径向无界性，并表明Fisher信息矩阵的最小特征值以一个依赖于初始化的正常数为下界。对于单纯形值特征，我们在与全1向量正交的子空间中证明了类似的径向无界性结果，并获得了非中心协方差矩阵最小特征值的一致下界。这些结果意味着正则化目标沿梯度流的全局线性收敛，即次优性以$\mathcal{O}(e^{-Ct})$衰减，其中$C>0$为某个常数。我们的分析将熵正则化softmax策略梯度的全局收敛理论扩展到Agarwal等人（2020）、Bhandari和Russo（2024）以及Mei等人（2020）的表格设置之外。

英文摘要

We study the global convergence of policy gradient for infinite-horizon entropy-regularized Markov decision processes (MDPs) with continuous state and action spaces. We consider log-linear softmax policies with linear function approximation, which extend the tabular softmax parameterization while retaining a tractable policy class. Under $Q^π_τ$-realizability for the regularized state-action value function, we first establish a non-uniform Polyak-Łojasiewicz (PŁ) inequality. The non-uniformity arises through degeneracy of constants associated with the policy geometry, namely the Fisher information matrix or an uncentered feature covariance matrix. We then identify two feature regimes under which this non-uniform constant can be bounded along the gradient flow. For full-affine-span features, we prove radial unboundedness of the KL regularizer and show that the smallest eigenvalue of the Fisher information matrix remains bounded below by an initialization-dependent positive constant. For simplex-valued features, we prove an analogous radial unboundedness result in the subspace orthogonal to the all-ones vector and obtain a uniform lower bound for the smallest eigenvalue of the uncentered covariance matrix. These results imply global linear convergence of the regularized objective along the gradient flow, i.e. suboptimality decaying as $\mathcal{O}(e^{-Ct})$ for some $C>0$. Our analysis extends the global convergence theory of entropy-regularized softmax policy gradient beyond the tabular setting of Agarwal et al. (2020); Bhandari and Russo (2024); Mei et al. (2020).
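
The step from a PŁ inequality to the exponential rate is standard and can be sketched in a few lines. The notation below is illustrative and not taken from the paper: $J$ is the regularized objective with supremum $J^*$, $\theta_t$ follows the gradient flow $\dot\theta_t = \nabla J(\theta_t)$, and $\mu > 0$ is an assumed lower bound on the PŁ constant along the flow, which is the quantity the two feature regimes are said to control.

```latex
% Illustrative notation (assumed, not from the paper):
%   J        regularized objective, J^* its supremum
%   \theta_t gradient flow, \dot{\theta}_t = \nabla J(\theta_t)
%   \mu > 0  lower bound on the PL constant along the flow
\begin{align*}
\delta(t) &:= J^* - J(\theta_t), \\
\dot{\delta}(t) &= -\langle \nabla J(\theta_t), \dot{\theta}_t \rangle
                 = -\|\nabla J(\theta_t)\|^2, \\
\|\nabla J(\theta_t)\|^2 &\ge 2\mu\,\delta(t)
  \quad \text{(P\L{} inequality with constant bounded below along the flow)}, \\
\dot{\delta}(t) &\le -2\mu\,\delta(t)
  \;\Longrightarrow\; \delta(t) \le \delta(0)\,e^{-2\mu t}.
\end{align*}
```

Under these assumptions the constant in $\mathcal{O}(e^{-Ct})$ is $C = 2\mu$, which is why a non-uniform PŁ constant only gives a rate once it is shown not to degenerate along the trajectory.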

2605.24929 2026-05-26 stat.ML cs.IT cs.LG math.IT 版本更新

因式分解以泛化：面向时间序列预测的检索引导不变-动态分解

Jinjin Chi, Lei Feng, Lulu Zhang, Yongcheng Jing, Yiming Wang, Ximing Li, Jialie Shen, Leszek Rutkowski, Dacheng Tao

发表机构 * College of Computer Science and Technology, Jilin University（吉林大学计算机科学与技术学院）； College of Computing and Data Science, Nanyang Technological University（南洋理工大学计算机与数据科学学院）； City St George’s, University of London（伦敦大学城圣乔治学院）； Systems Research Institute, Polish Academy of Sciences（波兰科学院系统研究所）

AI总结提出检索引导的不变-动态分解框架，通过分离稳定共享结构与实例特定变化，提升时间序列零样本预测在分布偏移下的鲁棒性。

详情

AI中文摘要

时间序列基础模型（TSFMs）最近通过大规模预训练和检索增强预测实现了强大的零样本预测性能。然而，我们的实证分析揭示了基于检索的预测的一个非平凡限制：检索倾向于导致更振荡的预测，在高度波动的序列上提升性能，但在更平滑、趋势主导的序列上降低准确性。这表明检索信息可能在未明确区分稳定时间结构与实例特定变化的情况下被融合到预测中，这可能在分布偏移下降低鲁棒性。我们提出了一种用于时间序列预测的检索引导不变-动态分解框架。我们不将检索用作辅助预测上下文，而是利用检索到的序列作为来自相关环境的隐式样本，以指导表示分解。具体来说，我们首先通过基于注意力的聚合构建检索感知表示，然后引入检索引导路由机制将其分解为捕获稳定共享结构的不变组件和建模上下文相关变化的动态组件。这两个组件分别预测并融合以进行最终预测，使模型能够保留可迁移模式，同时保持对动态演变的适应性。我们进一步设计了鼓励不变学习和解耦的训练目标，并提供了理论见解，表明检索聚合减少了方差，并在没有显式环境监督的情况下近似不变表示学习。大量实验表明，我们的方法在分布偏移下持续提高鲁棒性，并在零样本预测设置中优于现有的TSFMs和基于检索的基线。

英文摘要

Time series foundation models (TSFMs) have recently achieved strong zero-shot forecasting performance through large-scale pretraining and retrieval-augmented prediction. However, our empirical analysis reveals a non-trivial limitation of retrieval-based forecasting: retrieval tends to induce more oscillatory predictions, improving performance on highly fluctuating series while degrading accuracy on smoother, trend-dominated ones. This suggests that retrieved information may be fused into prediction without explicitly distinguishing stable temporal structure from instance-specific variations, which can reduce robustness under distribution shifts. We propose a Retrieval-guided Invariant-Dynamic DEcomposition framework for time series forecasting. Rather than using retrieval as auxiliary predictive context, we leverage retrieved sequences as implicit samples from related environments to guide representation decomposition. Specifically, we first construct a retrieval-aware representation via attention-based aggregation, and then introduce a retrieval-guided routing mechanism to decompose it into an invariant component capturing stable shared structure and a dynamic component modeling context-dependent variations. These two components are forecast separately and fused for final prediction, enabling the model to preserve transferable patterns while remaining adaptive to evolving dynamics. We further design training objectives that encourage invariant learning and disentanglement, and provide theoretical insight showing that retrieval aggregation reduces variance and approximates invariant representation learning without explicit environment supervision. Extensive experiments demonstrate that our method consistently improves robustness under distribution shifts and outperforms existing TSFMs and retrieval-based baselines in zero-shot forecasting settings.

2605.24908 2026-05-26 cs.LG cs.AI 版本更新

On the Impact of Class Imbalance on the Learning Dynamics of Deep Neural Networks: An Intuitive Insight

论类别不平衡对深度神经网络学习动态的影响：直观洞察

Ismail B. Mustapha, Shafaatunnur Hasan, Sunday O. Olatunji, Hatem S. Y. Nabus

发表机构 * Faculty of Computing, Universiti Teknologi Malaysia, Johor, Malaysia（马来西亚理工大学计算机学院，马来西亚柔佛）； Adekunle Ajasin University, Akungba-Akoko, Nigeria（阿德昆勒·阿贾辛大学，尼日利亚阿孔巴-阿科科）

AI总结通过监测不同不平衡比率下深度神经网络对多数类和少数类的学习模式，系统研究了类别不平衡如何导致模型早期欠拟合少数类并仅学习多数类，最终造成少数类表示过拟合而非泛化。

Comments Conference

详情

AI中文摘要

近年来，深度神经网络（DNN）中的类别不平衡问题引起了研究者的广泛关注。然而，相关文献中对DNN在不平衡数据上表现不佳的原因存在不同解释，表明人们对这一长期存在的现象如何影响DNN性能知之甚少。更好地理解这一问题对于开发有效的基于DNN的不平衡方法至关重要。因此，本研究通过监测DNN模型在不同不平衡比率数据集上对多数类和少数类的学习模式，系统研究了类别不平衡对DNN学习动态的影响。实验结果表明，与从平衡数据集学习时DNN类似地学习各个类别不同，类别不平衡严重损害了DNN的性能，导致模型在早期训练轮次中欠拟合少数类样本，同时仅学习多数类。尽管DNN最终学会了少数类样本，但这种学习方式仅导致学习到的少数类表示在测试阶段无法泛化，因为它们仅仅是过拟合以尽可能降低整体训练损失。

英文摘要

Class imbalance in deep neural networks (DNNs) has received rapidly increasing research attention in recent years. However, the varying accounts in the pertinent literature of why DNNs perform poorly on imbalanced data show that little is known about how this age-old phenomenon impacts the performance of DNNs. A better understanding of this problem is crucial to developing effective DNN-based imbalance methods. Thus, this study systematically investigates the impact of class imbalance on the learning dynamics of DNNs by monitoring the learning patterns of DNN models on both the majority and minority classes of datasets with varying imbalance ratios. Experimental findings show that, in contrast to learning from balanced datasets, where a DNN learns the classes similarly, class imbalance severely degrades DNN performance, driving the model to underfit the minority-class samples in the early training epochs while learning only the majority class. Although the DNN ultimately learns the minority samples, learning in this manner yields minority representations that do not generalize at test time, because they are merely overfitted to keep the overall training loss as low as possible.

2605.24903 2026-05-26 cs.CR cs.LG 版本更新

SEED: Semi-supervised Continual MalwarE Detection for Tackling ConcEpt Drift on a BuDget

SEED: 预算约束下应对概念漂移的半监督持续恶意软件检测

Suresh Kumar Amalapuram, Bikraj Shresta, Siva Ram murthy Chebiyam, Bheemarjuna Reddy Tamma, Sumohana S Channappayya

发表机构 * Indian Institute of Technology Ropar（印度理工学院罗帕尔）； Indian Institute of Technology Hyderabad（印度理工学院海得拉巴）

AI总结提出SEED方法，结合定制二元交叉熵损失与半监督持续学习和主动学习，在有限标注下有效检测未知恶意软件，平均AUT提升40%（BODMAS）和14%（AndroZoo）。

详情

AI中文摘要

基于机器学习的恶意软件检测器会随着良性应用和恶意应用中的概念漂移而随时间变得过时。最近的方法依赖完全标注数据，并利用层次对比损失（HCL）与主动学习，通过利用恶意软件表示中的语义结构来提高对漂移的鲁棒性。然而，在安全领域获取标注数据很困难。在部分标注设置下，HCL在检测未知恶意软件时性能显著下降，尤其是在BODMAS等可能缺乏强语义结构的数据集上。本文提出SEED，一种在有限监督下进行恶意软件检测的语义结构无关方法。SEED将定制的二元交叉熵目标与半监督持续学习和主动学习相结合。对于部分标注的已见任务，未标注样本通过奇异值分解投影到从先前已见数据构建的表示空间中，并与合适的标注样本配对以鼓励表示一致性。对于完全未标注的未见任务，使用表示空间中的余弦距离量化不确定性，并选择最不确定的样本供分析师标注。我们在Windows和Android恶意软件数据集上评估SEED。在已见任务上仅使用20%标注数据，与HCL*（HCL的半监督适应）相比，SEED在未知恶意软件检测上平均AUT提升40%（BODMAS）和14%（AndroZoo），同时在APIGraph上保持竞争力。最后，我们引入延迟缓冲区更新策略以减少重放期间的标签噪声传播并提高学习稳定性。

英文摘要

Machine learning based malware detectors become obsolete over time due to concept drift in benign and malware applications. Recent methods rely on fully labeled data and use hierarchical contrastive loss (HCL) with active learning to improve robustness against drift by exploiting semantic structure in malware representations. However, obtaining labeled data in the security domain is difficult. Under partially labeled settings, HCL suffers significant performance degradation in detecting unseen malware, especially on datasets such as BODMAS where strong semantic structure may not exist. In this paper, we propose SEED, a semantic-structure-agnostic method for malware detection under limited supervision. SEED combines a tailored binary cross-entropy objective with semi-supervised continual learning and active learning. For partially labeled seen tasks, unlabeled samples are projected into a representation space constructed from previously seen data using singular value decomposition, and paired with suitable labeled samples to encourage representation consistency. For unseen tasks with fully unlabeled data, uncertainty is quantified using cosine distance in representation space, and the most uncertain samples are selected for analyst labeling. We evaluate SEED on both Windows and Android malware datasets. Using only 20% labeled data on seen tasks, SEED achieves average AUT improvements of 40% on BODMAS and 14% on AndroZoo for unseen malware detection compared to HCL* (the semi-supervised adaptation of HCL), while remaining competitive on APIGraph. Finally, we introduce a delayed buffer update strategy to reduce label noise propagation during replay and improve learning stability.
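
The uncertainty-driven selection step can be sketched as follows. The abstract says uncertainty is measured with cosine distance in representation space but does not say what the distance is measured against. This sketch assumes it is the distance to the nearest previously seen reference embedding, and all names, vectors, and the budget are hypothetical.

```python
import math

def cosine_distance(u, v):
    """One minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms

def select_for_labeling(unlabeled, references, budget):
    """Indices of the `budget` unlabeled embeddings farthest from every reference.

    Assumption: uncertainty is the cosine distance to the nearest reference
    embedding, so samples unlike anything seen before are labeled first.
    """
    uncertainty = [
        min(cosine_distance(x, ref) for ref in references) for x in unlabeled
    ]
    ranked = sorted(range(len(unlabeled)), key=lambda i: uncertainty[i], reverse=True)
    return ranked[:budget]

# Hypothetical 2-D embeddings.
references = [[1.0, 0.0], [0.0, 1.0]]  # from previously seen tasks
unlabeled = [
    [1.0, 0.1],    # close to the first reference
    [1.0, 1.0],    # between the two references
    [-1.0, -1.0],  # unlike either reference
    [0.2, 1.0],    # close to the second reference
]

to_label = select_for_labeling(unlabeled, references, budget=2)
```

With a budget of two, the sample pointing away from both references is picked first, followed by the one lying between them. The two near-duplicates of known embeddings are left unlabeled.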


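2605.24902 2026-05-26 cs.CL cs.AI cs.LG (updated version)

Towards General-Purpose Causal Reasoners

Qirun Dai, Xiao Liu, Jiawei Zhang, Dylan Zhang, Hao Peng, Chenhao Tan

Affiliations: The University of Chicago; University of Illinois Urbana-Champaign

AI summary: Proposes UniCo, a data generation framework that covers 18 query types across Pearl's Causal Ladder and converts symbolic examples into code and natural language; supervised fine-tuning on its data substantially improves LLM causal reasoning and reasoning faithfulness.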

AI中文摘要

尽管因果推理的重要性不言而喻，但训练LLM进行因果推理仍未被充分探索。现有的数据工作大多集中在针对因果关系的特定方面对LLM进行基准测试，这使得它们不太适合训练可泛化的因果推理器。为了解决这个问题，我们提出了UniCo，一个数据生成框架，它既(1)涵盖了Pearl因果阶梯中的18种因果查询类型，又(2)将原生符号示例转化为代码和自然语言形式，以模拟因果术语未明确指定的真实世界用例。为确保数据质量，UniCo用精确的因果推理来支撑答案，并过滤掉存在推理捷径的案例。通过使用66.6K个UniCo生成的实例进行监督微调，Qwen3-4B、Qwen3-8B和Olmo-3-7B-Instruct在所有18种分布内查询类型上平均提升了22.9%，在训练分布之外的7个已建立的因果基准上，相比最先进的因果数据生成框架提升了8.1%。更重要的是，在真实世界的医学理解、法律决策和表格推理中，UniCo训练的模型始终展现出更忠实的推理轨迹，在忠实度指标上平均超过基础模型20.2%。这些结果表明，以因果为中心的训练不仅增强了因果推理能力，还赋予了LLM在一般推理任务中的因果思维。

英文摘要

Despite the importance of causal reasoning, training LLMs to reason causally remains underexplored. Existing data efforts mostly focus on benchmarking LLMs on specific aspects of causality, making them less suitable for training generalizable causal reasoners. To address this, we propose UniCo, a data generation framework that both (1) addresses 18 causal query types across Pearl's Causal Ladder and (2) translates natively symbolic examples into code and natural language forms to simulate real-world use cases where causal terms are not explicitly specified. To ensure data quality, UniCo grounds answers with exact causal inference and filters out cases with reasoning shortcuts. Upon supervised fine-tuning with 66.6K UniCo-generated instances, Qwen3-4B, Qwen3-8B, and Olmo-3-7B-Instruct achieve an average improvement of 22.9% across all 18 in-distribution query types, and 8.1% over state-of-the-art causal data generation frameworks on 7 established causal benchmarks outside the training distribution. More importantly, in real-world medical understanding, legal decision-making, and tabular reasoning, UniCo-trained models consistently display more faithful reasoning traces, outperforming the base models by an average of 20.2% in faithfulness metrics. These results suggest that causality-centered training not only strengthens causal reasoning, but also equips LLMs with a causal mindset in general reasoning tasks.


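2605.24872 2026-05-26 cs.LG (updated version)

T2S-MPC: Time-Embedded Online Adaptive Model Predictive Control for Time-Varying Dynamics

Zeyu Shen, Zhuoyuan Wang, Laixi Shi

Affiliations: Department of Applied Mathematics and Statistics, Johns Hopkins University, MD, USA; Department of Electrical and Computer Engineering, Carnegie Mellon University, PA, USA

AI summary: Proposes the T2S-MPC framework, which learns a residual dynamics model online using a time embedding and two-timescale updates, enabling adaptive model predictive control in rapidly time-varying environments; it outperforms classical and neural MPC methods on quadrotor tasks.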

AI中文摘要

基于学习的模型预测控制（MPC）的最新进展利用神经网络进行在线模型学习，当非平稳系统动力学偏离标称模型时，取得了强劲的性能。然而，现有方法主要处理特定或相对结构化的动力学变化形式，对于更一般、未知且不可预测的时变动力学处理不足。为应对这一挑战，我们提出T2S-MPC框架，该框架在线自适应学习残差动力学模型，并将其与MPC框架内的标称模型集成，以实现快速演变的在线规划。为使模型具有时间感知能力，我们通过结构化时间嵌入显式编码时间信息，并采用双时间尺度更新方案，使控制器能够捕捉非平稳动力学，同时平衡快速适应与稳定学习。我们在二维四旋翼上评估了所提方法，在多种时变扰动（包括线性漂移和周期性扰动）下执行稳定和轨迹跟踪任务。实验结果表明，T2S-MPC在控制性能上始终优于经典MPC、神经MPC及消融变体，同时在没有额外调参的情况下，在广泛的扰动条件下展现出强鲁棒性。源代码公开于https://github.com/Zeyuu0920/T2S_MPC。

英文摘要

Recent advances in learning-based model predictive control (MPC) have leveraged neural networks for online model learning, achieving strong performance when nonstationary system dynamics deviate from nominal models. However, existing approaches primarily address specific or relatively structured forms of dynamical variation, leaving more general, unknown, and unpredictable time-varying dynamics insufficiently handled. To tackle this challenge, we propose T2S-MPC, a framework that adaptively learns a residual dynamics model online and integrates it with the nominal model within the MPC framework to enable fast-evolving online planning. To make the model time-aware, we explicitly encode temporal information through a structured time embedding and employ a two-timescale update scheme, allowing the controller to capture nonstationary dynamics while balancing rapid adaptation with stable learning. We evaluate the proposed method on a 2D quadrotor across stabilization and trajectory tracking tasks under diverse time-varying disturbances, including linear drifting and periodic perturbations. Experimental results show that T2S-MPC consistently outperforms classical MPC, neural MPC, and ablated variants in control performance, while also demonstrating strong robustness across a wide range of disturbance conditions without additional tuning. The source code is publicly available at https://github.com/Zeyuu0920/T2S_MPC


2605.24841 2026-05-26 cs.LG (updated version)

DriftingMol: Decoder-Coupled Drift for One-Pass Property-Conditional Molecular Generation

Jiangjie Qiu, Yijun Li, Wentao Li, Xiaonan Wang

Affiliations: Beijing Key Laboratory of Artificial Intelligence for Advanced Chemical Engineering Materials

AI summary: Proposes DriftingMol, a two-stage framework that adapts drifting models to a SELFIES latent molecular space through decoder-coupled drift, giving property-conditional molecular generation with low sampling cost, high validity, and diversity.

Comments 9 pages, 5 figures

AI中文摘要

属性条件分子生成应在响应连续目标值的同时，以低采样成本生成有效且多样的分子。我们引入了 DriftingMol，一个两阶段框架，将漂移模型适应于 SELFIES 潜在分子空间。冻结的 SELFIES beta-VAE 提供潜在空间，其解码器的隐藏表示作为漂移特征图。在解码器耦合漂移中，解码器权重保持不变，但漂移梯度通过解码器特征图反向传播到 DiT 生成器，从而诱导出与分子解码对齐的拉回度量。在 ZINC250K 上，默认设置实现了 QED Spearman 相关系数 0.493，独特性 94.7%，而最强的解码器耦合条件达到 0.510。在协议匹配的四属性条件下，解码器耦合漂移的平均 Spearman 相关系数高达 0.598。在 15 个受控变体中，保留通过解码器特征的梯度路径的模型比测试的潜在空间、随机特征和外部特征漂移变体实现了更高的相关性，而分离或停止梯度的解码器控制导致 QED 相关性接近零且独特性极低。这些结果表明，解码器耦合漂移是一种有用的低成本机制，用于属性偏置分子生成，只需一次生成器评估和一次冻结解码器传递。

英文摘要

Property-conditional molecular generation should produce valid, diverse molecules while responding to continuous target values at low sampling cost. We introduce DriftingMol, a two-stage framework that adapts drifting models to a SELFIES latent molecular space. A frozen SELFIES beta-VAE provides the latent space, and the hidden representation of its decoder serves as the drift feature map. In decoder-coupled drift, decoder weights remain fixed, but drift gradients are backpropagated through the decoder feature map to a DiT generator, inducing a pullback metric aligned with molecular decoding. On ZINC250K, the default setting achieves QED Spearman correlation 0.493 with 94.7% uniqueness, while the strongest decoder-coupled condition reaches 0.510. Under protocol-matched four-property conditioning, decoder-coupled drift reaches mean Spearman correlation up to 0.598. Across 15 controlled variants, models that preserve the gradient path through decoder features achieve higher correlations than the tested latent-space, random-feature, and external-feature drift variants, while detached or stop-gradient decoder controls yield near-zero QED correlation and very low uniqueness. These results indicate that decoder-coupled drift is a useful low-cost mechanism for property-biased molecular generation, requiring one generator evaluation and one frozen decoder pass.


2605.24817 2026-05-26 cs.CR cs.AR cs.CL cs.LG (updated version)

RouteScan: A Non-Intrusive Approach to Auditing MoE LLMs Safety via Expert Routing Telemetry

Bo Lv, Zhiheng Xu, KeDong Xiu, Ruyi Ding, Tianhang Zheng, Zhibo Wang, Kui Ren

Affiliations: Zhejiang University; Donghua University; Louisiana State University

AI summary: Proposes RouteScan, a non-intrusive auditing framework that treats GPU-level expert routing telemetry from MoE models (such as the number of active threads during prefilling) as a micro-architectural fingerprint and identifies malicious prompts with a lightweight detection pipeline. AUROC exceeds 0.93 on unseen harmful domains and 0.96 under novel jailbreak wrappers, with a privacy advantage over content-based auditing.

Comments 20 pages. Under submission

AI中文摘要

混合专家（MoE）架构已成为扩展大型语言模型（LLM）日益重要的范式。随着MoE模型越来越多地部署在实际服务中，安全性审计变得必要，以验证这些模型在运行过程中是否产生或助长有害行为。然而，现有的基于内容的审计方法通常需要访问用户提示、模型输入或生成输出，可能暴露敏感用户信息，并在LLM安全性和用户隐私之间造成根本性紧张。另一方面，我们观察到，在MoE模型中，稀疏专家路由将不同输入映射到激活不同的专家执行模式，在低级GPU执行遥测中产生可测量的足迹。受此观察启发，我们提出RouteScan，一种通过GPU级专家路由遥测检测有害行为的非侵入式审计框架。具体而言，RouteScan利用预填充阶段分配给专家模块的活跃GPU线程数作为判别性微架构指纹，并构建轻量级检测流水线，隔离跨领域不变风险指标以精确识别恶意提示。对具有不同路由设计的开源MoE LLM的全面评估表明，RouteScan实现了强泛化，在未见过的有害领域AUROC超过0.93，在新型越狱包装下超过0.96。此外，经验性反演测试表明，收集的专家路由遥测为提示重建提供的信息有限，表明相对于基于内容的审计方法具有实际隐私优势。

英文摘要

Mixture-of-Experts (MoE) architectures have become an increasingly important paradigm for scaling Large Language Models (LLMs). As MoE models are increasingly deployed in real-world services, safety auditing becomes necessary to verify whether these models produce or facilitate harmful behaviors during operation. However, existing content-based auditing methods typically require access to user prompts, model inputs, or generated outputs, potentially exposing sensitive user information and creating a fundamental tension between LLM safety and user privacy. On the other hand, we observe that, in MoE models, sparse expert routing maps different inputs to activate different expert-execution patterns, producing measurable footprints in low-level GPU execution telemetry. Inspired by this observation, we propose RouteScan, a non-intrusive auditing framework for detecting harmful behaviors through GPU-level expert routing telemetry. Specifically, RouteScan utilizes the number of active GPU threads allocated to expert modules during the prefilling phase as a discriminative micro-architectural fingerprint, and builds a lightweight detection pipeline that isolates cross-domain invariant risk indicators for the precise identification of malicious prompts. Comprehensive evaluations on open-source MoE LLMs with distinct routing designs demonstrate that RouteScan achieves strong generalization, with an AUROC exceeding 0.93 on unseen harmful domains and 0.96 under novel jailbreak wrappers. Moreover, empirical inversion tests show that the collected expert routing telemetry provides limited information for prompt reconstruction, suggesting a practical privacy advantage over content-based auditing methods.


2605.24810 2026-05-26 cs.LG cs.AI cs.RO stat.AP (updated version)

Cross-Domain Energy-Guided Diffusion Generation for Off-Dynamics Reinforcement Learning

Yu Yang, Yihong Guo, Anqi Liu, Pan Xu

Affiliations: Duke University; Johns Hopkins University

AI summary: Proposes CEDGE, a framework that uses an energy-guided diffusion model to generate target-domain trajectories, addressing domain adaptation for offline reinforcement learning under dynamics shift.

Comments 29 pages, 3 figures, and 14 tables

AI中文摘要

离动态离线强化学习旨在从大规模源数据集和有限目标数据集中学习目标域策略，但面临转移动态不匹配的问题。现有方法如奖励增强和数据过滤受限于源数据集，无法合成新的目标行为以改善超出收集源轨迹的覆盖范围。虽然近期基于模型的方法尝试通过学习目标感知动态来解决此问题，但生成的体验仅在转移层面构建，导致长时域上的累积误差。这些限制促使离动态离线RL转向轨迹级生成。我们提出CEDGE，一种跨域能量引导扩散生成框架。CEDGE在源域轨迹上训练轨迹扩散模型，并通过能量引导将生成样本适应到目标域。该引导通过最小化源域与期望目标域轨迹之间的分布不匹配得到，并分解为回报、域和行为能量成分。得到的能量引导轨迹既可用于直接规划，也可作为策略学习的合成数据。由于目标适应通过能量引导而非重新训练扩散模型实现，与先前方法相比，CEDGE能高效适应新的目标动态。在ODRL基准上的实验表明，轨迹级能量引导生成改善了动态偏移下的扩散规划，并产生提升下游目标策略学习的合成数据。

英文摘要

Off-dynamics offline reinforcement learning seeks to learn a target-domain policy from a large source dataset and a limited target dataset under mismatched transition dynamics. Existing approaches such as reward augmentation and data filtering are constrained to the source dataset and cannot synthesize new target behavior to improve coverage beyond the collected source trajectories. While recent model-based methods attempt to address this by learning target-aware dynamics, the generated experience is constructed only at the transition level, which leads to accumulated errors over long horizons. These limitations necessitate a shift toward trajectory-level generation for off-dynamics offline RL. We propose CEDGE, a Cross-domain Energy-guided Diffusion GEneration framework. CEDGE trains a trajectory diffusion model on source-domain trajectories and adapts the generated samples to the target domain through energy guidance. This guidance is derived by minimizing the distribution mismatch between the source and desired target-domain trajectories and is decomposed into return, domain, and behavior energy components. The resulting energy-guided trajectories are useful both for direct planning and as synthetic data for policy learning. Since target adaptation is achieved via energy guidance rather than retraining the diffusion model, CEDGE can be efficiently adapted to new target dynamics compared to previous methods. Experiments on the ODRL benchmark demonstrate that trajectory-level energy-guided generation improves diffusion planning under dynamics shifts and produces synthetic data that improves downstream target policy learning.


2605.24808 2026-05-26 cs.LG cs.AI (updated version)

Disentangled Double Machine Learning for Accurate Causal Effect Estimation

Guodu Xiang, Kui Yu, Yujie Wang, Richang Hong, Fuyuan Cao, Jiye Liang

Affiliations: School of Computer Science and Information Engineering, Hefei University of Technology; School of Computer and Information Technology, Shanxi University

AI summary: Proposes Disentangled Double Machine Learning (DDML), which uses causal role disentanglement and residual dependence orthogonalization to address the bias and instability that arise in double machine learning when covariates are not disentangled in high-dimensional or finite-sample settings; it outperforms 13 baselines on synthetic, semi-synthetic, and real-world datasets.

Comments 15 pages, 9 figures

AI中文摘要

混淆偏差是从观测数据中估计因果效应的一个关键挑战。双机器学习（DML）通过估计治疗和结果 nuisance 函数、构建治疗和结果残差，并从残差中估计因果效应来解决这一问题。然而，DML 在高维或有限样本场景中常常产生有偏和不稳定的估计。一个原因是 DML 使用所有协变量估计 nuisance 函数，而没有解缠不同的潜在因子，导致不可靠的 nuisance 函数估计。另一个原因是不精确的 nuisance 估计进一步引入了治疗残差与剩余结果误差之间的残差依赖，破坏了因果效应估计的准确性。为了解决这些问题，本文提出解缠双机器学习（DDML），一种整合两种关键策略的新算法。首先，因果角色解缠策略将协变量分解为混淆因子、治疗特有因子和结果特有因子，以实现可靠的 nuisance 函数估计。其次，残差依赖正交化策略减轻由 nuisance 估计误差引起的残差依赖，以增强因果效应估计的精度。在合成、半合成和真实数据集上的实验结果表明，DDML 在 MAE 和 RMSE 上均显著优于 13 种最先进的基线算法。

英文摘要

Confounding bias is a key challenge in causal effect estimation from observational data. Double Machine Learning (DML) addresses this issue by estimating treatment and outcome nuisance functions, constructing treatment and outcome residuals, and estimating causal effects from the residuals. However, DML often produces biased and unstable estimates in high-dimensional or finite-sample scenarios. One reason is that DML estimates nuisance functions using all covariates without disentangling distinct latent factors, resulting in unreliable nuisance function estimation. Another is that imprecise nuisance estimation further introduces residual dependence between the treatment residual and the remaining outcome error, undermining the accuracy of causal effect estimates. To address these issues, in this paper, we propose Disentangled Double Machine Learning (DDML), a novel algorithm that integrates two key strategies. First, a causal role disentanglement strategy decomposes covariates into confounders, treatment-specific factors, and outcome-specific factors to enable reliable nuisance function estimation. Second, a residual dependence orthogonalization strategy mitigates residual dependence caused by nuisance estimation errors to enhance the precision of causal effect estimates. Experimental results on synthetic, semi-synthetic, and real-world datasets demonstrate that DDML significantly outperforms 13 state-of-the-art baseline algorithms in both MAE and RMSE.
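
The baseline DML recipe summarized at the start of the abstract (estimate both nuisance functions, form residuals, regress residual on residual) can be sketched as follows. This is plain partialling-out on a toy dataset, not DDML: the nuisance estimates are simple group means over a single discrete covariate, there is no cross-fitting, and the helper names are hypothetical.

```python
def group_means(keys, values):
    """Mean of `values` within each distinct key (a crude nuisance estimate)."""
    totals, counts = {}, {}
    for k, v in zip(keys, values):
        totals[k] = totals.get(k, 0.0) + v
        counts[k] = counts.get(k, 0) + 1
    return {k: totals[k] / counts[k] for k in totals}


def residual_on_residual(x, d, y):
    """Partialling-out estimate of the effect of treatment d on outcome y."""
    m = group_means(x, d)  # treatment nuisance, an estimate of E[D | X]
    g = group_means(x, y)  # outcome nuisance, an estimate of E[Y | X]
    d_res = [di - m[xi] for xi, di in zip(x, d)]
    y_res = [yi - g[xi] for xi, yi in zip(x, y)]
    numerator = sum(a * b for a, b in zip(d_res, y_res))
    denominator = sum(a * a for a in d_res)
    return numerator / denominator


def naive_slope(d, y):
    """Ordinary least-squares slope of y on d, ignoring the confounder."""
    d_bar = sum(d) / len(d)
    y_bar = sum(y) / len(y)
    numerator = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    denominator = sum((di - d_bar) ** 2 for di in d)
    return numerator / denominator


# Toy data: x confounds both treatment and outcome; the true effect of d is 2.
x = [0, 0, 0, 1, 1, 1]
d = [0.1, 0.5, 0.9, 1.2, 1.4, 2.0]
y = [2.0 * di + 3.0 * xi for xi, di in zip(x, d)]

theta_dml = residual_on_residual(x, d, y)
theta_naive = naive_slope(d, y)
```

On this noise-free example the residual-based estimate recovers the true coefficient, while the naive slope is inflated by the confounder. With noisy data and flexible nuisance learners, the estimation error in `m` and `g` is where the bias and residual dependence discussed in the abstract would enter.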


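2605.24803 2026-05-26 cs.LG (updated version)

Active Learning for Stochastic Contextual Linear Bandits

Emma Brunskill, Ishani Karmarkar, Zhaoqi Li

Affiliations: Stanford University

AI summary: Proposes an algorithm that learns a near-optimal policy by actively sampling rewards of context-action pairs, proves that active context sampling can improve over the minimax rate by up to a factor of $\sqrt{d}$, and shows improved sample efficiency on warfarin dose prediction and joke recommendation.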

AI中文摘要

随机上下文线性老虎机的一个关键目标是高效学习近最优策略。现有算法通过策略性地采样动作来学习策略，但被动地从底层上下文分布中采样上下文。然而，在许多实际场景中——包括在线内容推荐、调查研究、临床试验——从业者可以根据上下文分布的先前知识主动采样或招募上下文。尽管有这种主动学习的潜力，但策略性上下文采样在随机上下文线性老虎机中的作用尚未被充分探索。我们提出一种算法，通过策略性地采样上下文-动作对的奖励来学习近最优策略。我们证明了实例相关的理论保证，表明我们的主动上下文采样策略可以将最小最大率改进最多√d倍，其中d是线性维度。我们通过实验证明，我们的算法在学习近最优策略所需的样本数量上有所减少，例如在华法林剂量预测和笑话推荐任务中。

英文摘要

A key goal in stochastic contextual linear bandits is to efficiently learn a near-optimal policy. Prior algorithms for this problem learn a policy by strategically sampling actions but naively (passively) sampling contexts from the underlying context distribution. However, in many practical scenarios, including online content recommendation, survey research, and clinical trials, practitioners can actively sample or recruit contexts based on prior knowledge of the context distribution. Despite this potential for active learning, the role of strategic context sampling in stochastic contextual linear bandits is underexplored. We propose an algorithm that learns a near-optimal policy by strategically sampling rewards of context-action pairs. We prove instance-dependent theoretical guarantees demonstrating that our active context sampling strategy can improve over the minimax rate by up to a factor of $\sqrt{d}$, where $d$ is the linear dimension. We show empirically that our algorithm reduces the number of samples needed to learn a near-optimal policy, in tasks such as warfarin dose prediction and joke recommendation.


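2605.24786 2026-05-26 cs.LG cs.AI (updated version)

CONF-KV: Confidence-Aware KV Cache Eviction with Mixed-Precision Storage for Long-Horizon LLM

Yubo Li, Yidi Miao

Affiliations: Carnegie Mellon University

AI summary: Proposes CONF-KV, which uses the model's current uncertainty (confidence) to adjust the KV cache budget dynamically and combines it with mixed-precision storage and blockwise online-softmax attention, substantially reducing GPU memory use in long-horizon inference while preserving accuracy.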

AI中文摘要

长序列LLM推理使键值（KV）缓存成为GPU内存的主要消耗者，并使每个token的注意力计算越来越昂贵。许多常见的淘汰策略使用静态的最近窗口或历史注意力，忽略了每个解码步骤中计算出的一个信号：模型当前的不确定性。我们引入CONF-KV，一个KV缓存管理器，它将下一个token分布转换为标量置信度分数，并用它来选择每步缓存预算，在模型不确定时保留更多上下文，在模型确定时积极剪枝。在每个预算内，token根据累积注意力质量和最近性的组合进行排序，同时一个受保护的最近窗口保持局部连贯性。我们将该策略与分块在线softmax注意力、混合FP16/INT8存储以及金字塔式逐层预算变体相结合。在四个模型家族和生成长度高达4K的情况下，CONF-KV的显存占用接近固定的512 token滑动窗口，同时与完整KV相比，困惑度差异保持在1.5-2.1点以内。在长达32K token的“大海捞针”测试中，CONF-KV的检索准确率达到91.4%，而滑动窗口为53.8%，H2O为80.6%；在75个VisualWebArena任务中，它以2.8倍的峰值内存降低保留了完整KV成功率的95.3%。

英文摘要

Long-horizon LLM inference turns the key-value (KV) cache into the dominant GPU memory consumer and makes per-token attention increasingly expensive. Many common eviction policies use static recency windows or historical attention, leaving unused a signal computed on every decoding step: the model's current uncertainty. We introduce CONF-KV, a KV-cache manager that converts the next-token distribution into a scalar confidence score and uses it to choose the per-step cache budget, retaining more context when the model is uncertain and pruning aggressively when it is confident. Within each budget, tokens are ranked by a composite of accumulated attention mass and recency, while a protected recent window preserves local coherence. We combine the policy with blockwise online-softmax attention, mixed FP16/INT8 storage, and a pyramidal per-layer budget variant. Across four model families and generated lengths up to 4K, CONF-KV stays near the footprint of a fixed 512-token sliding window while remaining within 1.5 to 2.1 perplexity points of full KV. On Needle-in-a-Haystack up to 32K tokens, CONF-KV reaches 91.4% retrieval accuracy versus 53.8% for sliding windows and 80.6% for H2O; on 75 VisualWebArena tasks it retains 95.3% of full-KV success at 2.8 times lower peak memory.
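
A rough sketch of the eviction policy described above follows. The abstract gives no formulas, so everything concrete here is an assumption: confidence is taken as one minus normalized entropy, the budget is a linear interpolation between two hypothetical bounds, and the composite score is attention mass plus a weighted recency term.

```python
import math


def confidence(probs):
    """Assumed confidence: 1 minus normalized Shannon entropy, in [0, 1]."""
    if len(probs) < 2:
        return 1.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return 1.0 - entropy / math.log(len(probs))


def cache_budget(conf, min_budget, max_budget):
    """Low confidence keeps close to max_budget; high confidence prunes to min_budget."""
    return round(max_budget - conf * (max_budget - min_budget))


def select_tokens(attn_mass, budget, protect_recent, recency_weight=0.5):
    """Return the sorted cache positions to keep under the given budget."""
    n = len(attn_mass)
    # The most recent positions are always kept, even if the budget is smaller.
    protected = set(range(max(0, n - protect_recent), n))
    candidates = []
    for pos in range(n):
        if pos in protected:
            continue
        recency = (pos + 1) / n  # newer positions score closer to 1
        score = attn_mass[pos] + recency_weight * recency
        candidates.append((score, pos))
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    slots = max(0, budget - len(protected))
    kept = protected | {pos for _, pos in candidates[:slots]}
    return sorted(kept)


attn_mass = [0.5, 0.05, 0.3, 0.05, 0.05, 0.05]  # accumulated attention per position
uncertain_budget = cache_budget(confidence([0.25] * 4), min_budget=2, max_budget=4)
confident_budget = cache_budget(confidence([1.0, 0.0, 0.0, 0.0]), min_budget=2, max_budget=4)
kept_when_uncertain = select_tokens(attn_mass, uncertain_budget, protect_recent=2)
kept_when_confident = select_tokens(attn_mass, confident_budget, protect_recent=2)
```

With a flat next-token distribution the sketch keeps the two protected recent positions plus the two highest-scoring older ones; with a one-hot distribution it keeps only the protected window.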


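2605.24779 2026-05-26 cs.LG cs.AI math.CO (updated version)

CyberMaskQA: A Benchmark for Evaluating the Privacy Awareness of Large Language Models in Cybersecurity Question Answering

Matilda Gaddi, Jin Noh, Onat Gungor, Tajana Rosing

Affiliations: Department of Computer Science and Engineering; University of California, San Diego (UCSD)

AI summary: To address the lack of privacy-preservation evaluation in existing benchmarks, proposes the CyberMaskQA benchmark, which combines human-curated scenarios with LLM-driven semantic expansion to produce a dataset with privacy labels for evaluating both reasoning and privacy preservation in cybersecurity question answering.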

AI中文摘要

大型语言模型（LLM）越来越多地应用于网络安全问答（QA），用于事件响应和漏洞分析等关键任务。然而，现实世界的操作环境，包括系统日志和网络配置，本质上包含敏感标识符，例如IP地址、主机名和用户账户。在受监管的环境中，使用基于云的模型处理这些数据通常不安全或不可行。此外，隐私保护问答的进展因缺乏能够同时评估操作推理和隐私保护的带注释、上下文丰富的数据集而受阻。为解决这一差距，我们引入了CYBERMASKQA，一个涵盖关键安全领域的隐私感知问答基准。与主要测试事实知识的现有基准不同，CYBERMASKQA将问题置于现实的组织环境中，并具有资产和权限之间的显式因果依赖关系。通过系统化的流水线生成，该数据集结合了人工策划的基础场景与LLM驱动的语义扩展，为每个实例标注精确的私有实体标签，以实现可控的信息披露。对问答准确性和掩码性能的评估证明了该基准在开发可部署、上下文感知的网络安全模型以及促进隐私-效用权衡的细致研究方面的实用性。一经接受，我们将发布数据集和生成框架。

英文摘要

Large language models (LLMs) are increasingly applied to cybersecurity question answering (QA) for critical tasks such as incident response and vulnerability analysis. However, real-world operational contexts, including system logs and network configurations, inherently contain sensitive identifiers, e.g., IP addresses, host names, and user accounts. Processing this data with cloud-based models is often unsafe or infeasible in regulated environments. Furthermore, progress in privacy-preserving QA is hindered by the lack of annotated, context-rich datasets capable of jointly evaluating operational reasoning and privacy preservation. To address this gap, we introduce CYBERMASKQA, a privacy-aware QA benchmark covering key security domains. Unlike existing benchmarks that primarily test factual knowledge, CYBERMASKQA grounds questions in realistic organizational contexts with explicit causal dependencies among assets and privileges. Generated through a systematic pipeline, the dataset combines human-curated base scenarios with LLM-driven semantic expansion, annotating each instance with precise private entity labels to enable controlled information disclosure. Evaluations of QA accuracy and masking performance demonstrate the benchmark's utility for developing deployable, context-aware cybersecurity models and facilitating nuanced studies of privacy-utility trade-offs. Upon acceptance, we will release the dataset and the generation framework.


2605.24763 2026-05-26 cs.LG physics.flu-dyn (updated version)

High-fidelity Modeling of Full-scale Pressurized Water Reactor Flow Fields for Machine Learning Applications

Logan A. Burnett, Hyungjun Kim, Hsien-Cheng Chou, Arsha Witoelar, Robert A. Brewster, Benoit Forget, Emilio Baglietto, Majdi I. Radaideh

Affiliations: Department of Nuclear Engineering and Radiological Sciences, University of Michigan; Department of Nuclear Science and Engineering, Massachusetts Institute of Technology; Korea Atomic Energy Research Institute; Department of Mechanical Engineering, University of Michigan; Department of Computer Science and Engineering, University of Michigan

AI summary: Uses high-fidelity CFD simulations and machine learning models to characterize assembly-level flow fields in a four-loop pressurized water reactor, showing that cold-leg swirl and lower-plenum transport make the inlet flow distribution non-uniform and that spatially aware architectures such as ConvLSTM perform best for flow-field reconstruction and prediction.

Comments 30 pages, 10 figures, and 6 tables

AI中文摘要

本工作提出了一个用于四环路压水堆组件级流动表征的高保真计算流体动力学和数据驱动建模框架。利用公开可用的几何和运行条件构建了完整的下腔室和堆芯入口域，实现了带有泵诱导旋流边界条件的瞬态模拟。结果表明，冷腿旋流和下腔室输运在堆芯下部区域产生强烈的非均匀组件级入口流量分布，而轴向阻力和混合作用逐渐使更高位置的流动均匀化。这些基于物理的数据集随后被用于评估机器学习在部分场重建和短期自回归预测中的应用。一个基于3D卷积的修复模型成功地从部分观测中重建了缺失的组件级质量流量，误差集中在高湍流底部层，并在上层显著减小。跨多个ML模型的比较分析表明，空间感知架构，特别是ConvLSTM，通过有效捕捉耦合的时空动态，显著优于基于序列的LSTM和算子学习DeepONet方法。研究还强调了关键挑战，包括入口流预测对湍流和网格分辨率的敏感性，以及缺乏全尺寸实验验证数据。尽管存在这些限制，结果仍与预期的物理行为一致。总体而言，本工作将高保真CFD确立为开发数据驱动代理模型、稀疏传感策略和未来多物理场耦合框架的关键基础。

英文摘要

This work presents a high-fidelity computational fluid dynamics (CFD) and data-driven modeling framework for assembly-level flow characterization in a four-loop pressurized water reactor (PWR). A full lower-plenum and core-inlet domain was constructed using publicly available geometry and operating conditions, enabling transient simulations with pump-induced swirl boundary conditions. The results show that cold-leg swirl and lower-plenum transport generate strongly heterogeneous assembly-wise inlet flow distributions, particularly near the lower core region, while axial resistance and mixing progressively homogenize the flow at higher elevations. These physics-informed datasets were subsequently used to evaluate machine learning (ML) applications for partial field reconstruction and short-term autoregressive prediction. A 3D convolution-based inpainting model successfully reconstructed missing assembly-level mass flow rates from partial observations, with errors concentrated in the highly turbulent base (bottom) layer and diminishing significantly in upper layers. Comparative analysis across multiple ML models demonstrates that spatially aware architectures, particularly ConvLSTM, significantly outperform sequence-based (LSTM) and operator-learning (DeepONet) approaches by effectively capturing coupled spatio-temporal dynamics. The study also highlights key challenges, including the sensitivity of inlet flow predictions to turbulence and mesh resolution, as well as the absence of full-scale experimental validation data. Despite these limitations, the results remain consistent with expected physical behavior. Overall, this work establishes high-fidelity CFD as a critical foundation for developing data-driven surrogates, sparse sensing strategies, and future multiphysics coupling frameworks.


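2605.24759 2026-05-26 cs.LG (updated version)

A Contractive Feedback Semantics for Reinforcement Learning

Zuyuan Zhang

Affiliations: The George Washington University

AI summary: Treats a one-step decision process as an open stochastic component and obtains infinite-horizon policy evaluation by closing a contractive feedback loop, establishing a compositional semantics for reinforcement learning and deriving results on approximate equivalence, state abstraction, and contract specifications.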

AI中文摘要

折扣强化学习通常通过闭马尔可夫决策过程上的贝尔曼方程来呈现。本文发展了一种组合视角：将单步决策过程视为开放随机组件，并通过闭合收缩反馈环实现无限时域策略评估。由此产生的语义为开放组件分配了类型化的贝尔曼变换器，将串联和并联布线解释为变换器的复合和张量，并将反馈解释为由唯一不动点实现的可容许有界守护迹。这一视角产生了三个理论结果。第一，近似组件等价是对于可容许的良类型守护单孔上下文的上下文同余：局部算子误差在将组件插入周围电路后仍受控，该电路使用该孔一次且其反馈节点具有认证的均匀守护性。第二，精确和近似状态抽象成为交换或近交换的余代数图，从而给出值保持和显式 sup-norm 失真界。第三，在单调 ω-连续合约变换器语义下，安全性、风险和资源规范可以表示为量值值合约，其中局部归纳界通过最小不动点推理提升到布线和反馈中。其核心主张并非所有强化学习态射构成全局迹幺半范畴，而是折扣贝尔曼评估在守护电路的可容许类上允许收缩反馈语义。

英文摘要

Discounted reinforcement learning is usually presented through Bellman equations on closed Markov decision processes. This paper develops a compositional view: a one-step decision process is treated as an open stochastic component, and infinite-horizon policy evaluation is obtained by closing a contractive feedback loop. The resulting semantics assigns typed Bellman transformers to open components, interprets series and parallel wiring as composition and tensoring of transformers, and interprets feedback as an admissible guarded Banach trace realized by a unique fixed point. This perspective yields three theoretical consequences. First, approximate component equivalence is a contextual congruence for admitted well-typed guarded one-hole contexts: local operator error remains controlled after plugging the component into a surrounding circuit that uses the hole once and whose feedback nodes have certified uniform guardedness. Second, exact and approximate state abstractions become commuting or near-commuting coalgebraic diagrams, giving value-preservation and explicit sup-norm distortion bounds. Third, under monotone $\omega$-continuous contract-transformer semantics, safety, risk, and resource specifications can be represented as quantale-valued contracts, where local inductive bounds lift through wiring and feedback by least-fixed-point reasoning. The paper's central claim is not that all RL morphisms form a global traced monoidal category, but that discounted Bellman evaluation admits a contractive feedback semantics on the admissible class of guarded circuits.
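
The fixed-point idea behind "closing a contractive feedback loop" can be illustrated in the familiar closed-MDP setting. The sketch below is standard iterative policy evaluation, where the discounted Bellman operator is a sup-norm contraction with factor gamma; it does not implement the paper's compositional semantics, and the two-state example and function names are made up for illustration.

```python
def bellman_step(values, rewards, transitions, gamma):
    """Apply the one-step Bellman operator for a fixed policy."""
    return [
        rewards[s] + gamma * sum(p * values[t] for t, p in enumerate(transitions[s]))
        for s in range(len(values))
    ]


def sup_distance(a, b):
    """Sup-norm distance between two value vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


def evaluate_policy(rewards, transitions, gamma, tol=1e-12, max_iters=10000):
    """Feed the operator's output back into itself until it stops moving."""
    values = [0.0] * len(rewards)
    for _ in range(max_iters):
        updated = bellman_step(values, rewards, transitions, gamma)
        if sup_distance(updated, values) < tol:
            return updated
        values = updated
    return values


# Two states under a fixed policy: state 0 pays reward 1 and moves to either
# state with probability 0.5; state 1 is absorbing with reward 0.
rewards = [1.0, 0.0]
transitions = [[0.5, 0.5], [0.0, 1.0]]
gamma = 0.5
fixed_point = evaluate_policy(rewards, transitions, gamma)
```

Solving the two equations by hand gives V(1) = 0 and V(0) = 1 + 0.25 V(0), so V(0) = 4/3, which the iteration approaches from the zero vector. Uniqueness of this fixed point follows from the contraction property, which is the role the abstract assigns to guardedness in the open-component setting.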


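2605.24754 2026-05-26 cs.CV cs.AI cs.LG (updated version)

Motion-Compensated Weight Compression

Ismail Lamaakal

Affiliations: Multidisciplinary Faculty of Nador, Mohammed Premier University

AI summary: Proposes Motion-Compensated Weight Compression (MCWC), which aligns permutation-symmetric blocks and applies layer-sequential prediction and entropy coding to compress neural network weights, improving the rate-accuracy Pareto frontier on Transformer language modeling and vision classification.

Comments 54 pages, 17 tables, 6 figures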

AI中文摘要

神经网络权重日益成为部署的瓶颈，然而大多数压缩流水线独立处理各层，忽略了由函数保持对称性引起的跨层冗余。我们提出运动补偿权重压缩（MCWC），一种仅权重的编解码器，它对齐置换对称块（例如隐藏单元和注意力头）以最大化跨层对应，将深度转化为可预测序列。在对齐的坐标系中，MCWC使用带有周期性关键帧的轻量级层序预测器，并仅编码在率失真目标下训练的学习熵模型预测残差。一个简单的解码器通过熵解码、反量化、预测驱动重建和逆对齐来重建可部署的权重，从而实现快速权重物化以进行推理。在Transformer语言建模和视觉分类中，MCWC在强量化和学习权重编解码基线之上改善了率-精度帕累托前沿，同时保持有竞争力的解码时间。消融实验证实，对齐、预测、熵建模和关键帧调度对于获得全部增益都是必要的。我们的代码可通过 https://github.com/Ism-ail11/MCWC 获取。

英文摘要

Neural network weights are increasingly a bottleneck for deployment, yet most compression pipelines treat layers independently and overlook cross-layer redundancy induced by function-preserving symmetries. We propose Motion-Compensated Weight Compression (MCWC), a weight-only codec that aligns permutation-symmetric blocks (e.g., hidden units and attention heads) to maximize cross-layer correspondence, turning depth into a predictable sequence. In the aligned coordinate system, MCWC uses a lightweight layer-sequential predictor with periodic keyframes and encodes only quantized prediction residuals using a learned entropy model trained under a rate-distortion objective. A simple decoder reconstructs deployable weights by entropy decoding, dequantization, predictor-driven reconstruction, and inverse alignment, enabling fast weight materialization for inference. Across Transformer language modeling and vision classification, MCWC improves the rate-accuracy Pareto frontier over strong quantization and learned weight-codec baselines, while maintaining competitive decode time. Ablations confirm that alignment, prediction, entropy modeling, and keyframe scheduling are each necessary for the full gains. Our code is available via https://github.com/Ism-ail11/MCWC.


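2605.24752 2026-05-26 cs.LG cs.CC cs.DS math.PR (updated version)

Bilevel Optimization of Synthetic Trajectories for Multi-Turn LLM Fine-Tuning

Shresth Verma, Mauricio Tec, Cheol Woo Kim, Kai Wang, Milind Tambe

Affiliations: Harvard University; Georgia Institute of Technology

AI summary: Proposes BOOST, a bilevel optimization framework in which the inner level trains on reweighted data and the outer level learns a lightweight reweighting head, addressing the degradation in multi-turn LLM performance caused by heterogeneous synthetic-trajectory quality.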

AI中文摘要

虽然LLM在单轮生成中表现出色，但在长程多轮交互中表现不佳。离线强化学习提供了一种可扩展的方法，但其性能依赖于多轮轨迹数据的可用性和质量。一种常见的补救措施是使用LLM或模拟器生成的合成轨迹来增强训练，但合成数据的质量高度异质，天真地将所有轨迹视为同等信息量会降低性能。我们提出BOOST，一个双层优化框架，其中内层在重新加权的数据上训练LLM，外层在保留的真实验证任务上训练一个轻量级的重加权头，无需外部评判器即可分配连续的轨迹级权重。为了夯实这一方法，我们推导出一个PAC-Bayesian界，揭示了三方权衡：合成数据增加了多样性但存在任务偏移风险，而将权重集中在高质量轨迹上提高了经验性能但以有效样本量为代价。实验上，我们的方法一致优于多个基线。分析表明，它提高了与真实数据分布一致且具有更高定性价值的合成轨迹的权重。

英文摘要

While LLMs excel at single-turn generation, they struggle with long-horizon, multi-turn interactions. Offline reinforcement learning (RL) offers a scalable approach, yet its performance hinges on the availability and quality of multi-turn trajectory data. A common remedy is to augment training with synthetic trajectories generated by LLMs or simulators, but synthetic data is highly heterogeneous in quality, and naively treating all trajectories as equally informative can degrade performance. We propose BOOST, a bilevel optimization framework where the inner level trains the LLM on reweighted data and the outer level trains a lightweight reweighting head on held-out real validation tasks, assigning continuous trajectory-level weights without requiring an external judge. To ground this approach, we derive a PAC-Bayesian bound revealing a three-way trade-off: synthetic data increases diversity but risks task-shift, while concentrating weight on high-quality trajectories improves empirical performance at the cost of effective sample size. Empirically, our method consistently outperforms multiple baselines. Analysis reveals it upweights synthetic trajectories that align with the real data distribution and exhibit higher qualitative merit.
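
The trade-off between concentrating weight and effective sample size can be made concrete with a small numerical sketch. The abstract does not state how effective sample size enters the PAC-Bayesian bound; Kish's formula, (sum of weights)^2 / (sum of squared weights), is used here as a common stand-in, and the loss values are invented for illustration.

```python
def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / (sum w^2)."""
    total = sum(weights)
    squares = sum(w * w for w in weights)
    if squares == 0.0:
        return 0.0
    return total * total / squares


def weighted_loss(losses, weights):
    """Weight-normalized average of per-trajectory losses."""
    total = sum(weights)
    return sum(l * w for l, w in zip(losses, weights)) / total


# Hypothetical per-trajectory losses: the first two trajectories are high quality.
losses = [0.2, 0.3, 1.5, 2.0]
uniform = [1.0, 1.0, 1.0, 1.0]
concentrated = [1.0, 1.0, 0.0, 0.0]

uniform_loss = weighted_loss(losses, uniform)
uniform_ess = effective_sample_size(uniform)
concentrated_loss = weighted_loss(losses, concentrated)
concentrated_ess = effective_sample_size(concentrated)
```

Moving all weight onto the two good trajectories lowers the weighted empirical loss but halves the effective sample size, which is the tension a learned reweighting head has to balance.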


2605.24742 2026-05-26 cs.LG (updated version)

英文摘要

Multimodal Attributed Graph Learning (MAGL) integrates intrinsic node attributes with structural topology via graph aggregation. However, as pretrained encoders evolve into Large Foundation Models (LFMs), the landscape of MAGL fundamentally shifts: under high-confidence LFM priors, mandatory aggregation introduces topological noise that overwhelms discriminative signals, triggering a counter-intuitive performance inversion where sophisticated MAGL architectures underperform simple topology-agnostic MLPs. Through systematic empirical and theoretical analysis, we identify that this inversion stems from a fundamental aggregation dilemma characterized by two concurrent pathologies: (1) Representational Pathology (SNR Degradation) - mandatory aggregation dilutes robust intrinsic features with topological noise, causing the noise penalty to outweigh its collaborative benefit; and (2) Optimization Pathology (Gradient Starvation) - topological aggregation attenuates gradient flow, while a shared task loss causes dominant modalities to prematurely suppress weaker ones. To resolve this dilemma, we propose SUPRA (Shared-Unique Prior-Retaining Architecture), a decoupled dual-pathway paradigm. SUPRA processes modality-specific features through topology-agnostic MLPs while capturing structural synergy via a lightweight shared GNN, with auxiliary deep supervision counteracting gradient starvation. Extensive evaluations demonstrate that SUPRA achieves state-of-the-art performance while requiring 3.5x lower peak GPU memory and up to 4.4x faster training time than Multimodal Graph Transformers.

2605.24680 2026-05-26 cs.LG 版本更新

Trajectory-Based Difficulty Scoring for Reliable Learning on Tabular Data

基于轨迹的难度评分用于表格数据的可靠学习

Tomer Lavi, Bracha Shapira, Nadav Rappoport

发表机构 * Faculty of Computer and Information Science, Ben-Gurion University of the Negev（内盖夫本-古里安大学计算机与信息科学学院）

AI总结提出轨迹难度评分（TDS），通过分析梯度提升树的逐树累积预测轨迹，为每个实例估计难度，并在分类和回归任务中优于现有基线，同时支持主动学习、选择性预测和共形预测等应用。

AI中文摘要

WLNO: 用于求解偏微分方程的小波-拉普拉斯神经算子

Muhammad Abid, Arth Sojitra, Omer San

发表机构 * Department of Mechanical and Aerospace Engineering, University of Tennessee, Knoxville（田纳西大学机械与航空航天工程系）

AI总结提出WLNO，通过融合Haar小波多尺度空间分解与拉普拉斯神经算子的极点-留数公式，在五个基准PDE问题上优于LNO，尤其擅长处理具有强空间多尺度结构的问题。

AI中文摘要

本文介绍了小波-拉普拉斯神经算子（WLNO），一种新颖的神经算子，它将Haar小波多尺度空间分解与拉普拉斯神经算子（LNO）的拉普拉斯域极点-留数公式融合在一起。虽然LNO通过可学习的系统极点和留数捕捉瞬态和稳态动力学，但它缺乏提取复杂PDE解中固有的空间局部多尺度特征的显式机制。WLNO通过用并行单级Haar离散小波变换（DWT）分支增强LNO核心来解决这一问题，该分支将提升的特征图分解为四个频率子带：近似（LL）、水平细节（LH）、垂直细节（HL）和对角细节（HH），并在通过逆DWT重建之前对每个子带应用独立学习的$1\times1$卷积。两个分支通过一个可学习的sigmoid门控权重$\alpha_\mathrm{wav}$融合，该权重初始化为给小波分支一个小的初始贡献，允许模型在整个训练过程中自适应地平衡拉普拉斯域动力学与空间多尺度特征。WLNO与LNO在五个基准PDE问题上使用相同的超参数、训练数据和评估协议进行评估：扩散方程、Burgers方程、反应扩散系统、达西流和二维Navier-Stokes方程。WLNO在所有五个问题上始终优于LNO，在具有强空间多尺度结构的问题上改进最为显著，例如具有尖锐激波前沿的Burgers方程和具有相干涡旋结构的Navier-Stokes方程，而在更平滑和椭圆问题上表现一致。这些结果表明，基于小波的多尺度空间分解是拉普拉斯域算子学习的一种有原则且有效的补充。

英文摘要

This work introduces the Wavelet-Laplace Neural Operator (WLNO), a novel neural operator that fuses Haar wavelet multi-scale spatial decomposition with the Laplace-domain pole-residue formulation of the Laplace Neural Operator (LNO). While LNO captures transient and steady-state dynamics through learnable system poles and residues, it lacks an explicit mechanism for extracting spatially localized multi-scale features inherent in complex PDE solutions. WLNO addresses this by augmenting the LNO core with a parallel single-level Haar discrete wavelet transform (DWT) branch that decomposes the lifted feature map into four frequency subbands: approximation (LL), horizontal detail (LH), vertical detail (HL), and diagonal detail (HH) and applies independent learned $1\times1$ convolutions to each subband before reconstruction via the inverse DWT. The two branches are fused through a learnable sigmoid-gated weight $α_\mathrm{wav}$, initialized to give a small initial contribution to the wavelet branch, allowing the model to adaptively balance Laplace-domain dynamics against spatial multi-scale features throughout training. WLNO is evaluated against LNO on five benchmark PDE problems using identical hyperparameters, training data, and evaluation protocols: the diffusion equation, the Burgers equation, the reaction-diffusion system, Darcy flow, and the two-dimensional Navier-Stokes equation. WLNO consistently outperforms LNO on all five problems, with the most pronounced improvement on problems with strong spatial multi-scale structure, such as the Burgers equation with sharp shock fronts and the Navier-Stokes equation with coherent vortical structures, while remaining consistent across smoother and elliptic problems. These results demonstrate that wavelet-based multi-scale spatial decomposition is a principled and effective complement to Laplace-domain operator learning.
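
The wavelet branch described above (single-level Haar DWT, an independent 1x1 convolution per subband, inverse DWT, then a sigmoid-gated fusion) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the fusion form `(1 - alpha) * lno + alpha * wav` is assumed because the abstract does not give the exact formula, and subband naming (LH versus HL) varies between libraries.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level orthonormal 2D Haar DWT of an array shaped (C, H, W); H, W even."""
    a, b = x[:, 0::2, 0::2], x[:, 0::2, 1::2]
    c, d = x[:, 1::2, 0::2], x[:, 1::2, 1::2]
    ll = (a + b + c + d) / 2.0  # approximation
    lh = (a - b + c - d) / 2.0  # difference across columns
    hl = (a + b - c - d) / 2.0  # difference across rows
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2."""
    channels, h, w = ll.shape
    x = np.empty((channels, 2 * h, 2 * w), dtype=float)
    x[:, 0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[:, 0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[:, 1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[:, 1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


def wavelet_branch(x, weights):
    """Apply one (C_out, C_in) matrix per subband, i.e. a 1x1 convolution, then invert."""
    subbands = haar_dwt2(x)
    mixed = [np.einsum("oc,chw->ohw", w, s) for w, s in zip(weights, subbands)]
    return haar_idwt2(*mixed)


def fuse(lno_out, wav_out, gate_logit):
    """Assumed fusion: convex combination weighted by a sigmoid gate."""
    alpha = 1.0 / (1.0 + np.exp(-gate_logit))
    return (1.0 - alpha) * lno_out + alpha * wav_out


rng = np.random.default_rng(0)
features = rng.normal(size=(3, 4, 6))          # stand-in for a lifted feature map
identity_weights = [np.eye(3) for _ in range(4)]
reconstructed = wavelet_branch(features, identity_weights)
fused = fuse(features, reconstructed, gate_logit=-4.0)  # small initial wavelet share
```

With identity 1x1 weights the branch reproduces its input exactly, which is a convenient sanity check; a strongly negative gate logit gives the wavelet branch the small initial contribution the abstract mentions.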

2605.24651 2026-05-26 math.NA cs.LG cs.NA 版本更新

WINO: A Weak-Form Physics Informed Neural Operator for Hyperelasticity on Variable Domains

WINO: 一种用于变域超弹性问题的弱形式物理信息神经算子

Bokai Zhu, Qinghui Zhang, Timon Rabczuk

发表机构 * School of Science, Harbin Institute of Technology, Shenzhen, P. R. China（哈尔滨工业大学（深圳）理学院）； School of Science, Harbin Institute of Technology, Shenzhen, Guangdong（哈尔滨工业大学（深圳）理学院）； Institute of Structural Mechanics, Bauhaus-Universität Weimar（魏玛包豪斯大学结构力学研究所）

AI总结提出一种无数据框架WINO，结合神经算子的效率与φ-有限元法的几何灵活性，通过最小化弱形式残差和惩罚项训练，实现高精度且计算时间减少50-80%。

AI中文摘要

我们提出了一种弱形式物理信息神经算子（WINO），这是一个无数据框架，结合了神经算子的效率与φ-有限元法（φ-FEM）的几何灵活性。φ-FEM是一种非拟合方法，无需体拟合网格即可适应几何变化，其中域几何由水平集函数φ表示。为了施加边界条件，Dirichlet问题采用φ-FEM提升，因此仅学习齐次位移贡献，而牵引驱动的Neumann问题额外预测非拟合弱形式所需的辅助场。参数通过最小化与φ-FEM对齐的弱形式残差平方以及切割单元辅助方程的平方惩罚来训练，从而消除了对大型配对数据集的依赖。训练后，WINO输出可作为神经算子热启动（NOWS）为非线性φ-FEM求解器提供初始值，相比传统冷启动求解器减少了迭代次数。数值基准测试表明，WINO在所有基准测试中实现了低于0.04的高精度，同时与纯数据驱动方法相比，总计算时间减少了50-80%。

英文摘要

We propose a Weak-form Physics-Informed Neural Operator (WINO), a data-free framework that combines the efficiency of neural operators with the geometric flexibility of the $φ$-finite element method ($φ$-FEM). $φ$-FEM is an unfitted method that accommodates geometric variations without body-fitted meshes, where the domain geometry is represented by the level-set function $φ$. To impose the boundary conditions, Dirichlet problems adopt the $φ$-FEM lifting so only the homogeneous displacement contribution is learned, whereas traction-driven Neumann problems additionally predict the auxiliary fields necessary for the unfitted weak formulation. Parameters are trained by minimizing squared weak-form residuals aligned with $φ$-FEM together with squared penalties on the cut-cell auxiliary equations, which removes the need for large paired datasets of converged reference solutions. After training, WINO outputs can seed the nonlinear $φ$-FEM solvers as neural operator warm starts (NOWS), which reduce iteration counts relative to traditional cold-started solvers. Numerical benchmarks show that WINO achieves high accuracy below 0.04 across all benchmarks, while reducing total computational time by 50--80\% compared with purely data-driven methods.

2605.24632 2026-05-26 cs.CR cs.AI cs.LG 版本更新

Demystifying the Mythos or Disrupting Bugonomics? From Zero-Day Asymmetry to Defender Remediation Throughput

揭秘神话或颠覆漏洞经济学？从零日不对称到防御者修复吞吐量

Alfredo Pesoli, Herman Errico, Lorenzo Cavallaro

发表机构 * University College London（伦敦大学学院）； Bynario

AI总结本文通过漏洞经济学视角分析LLM驱动的漏洞发现，指出其核心影响并非增加零日漏洞，而是提升防御者修复吞吐量，并利用Anthropic Mythos预览和Mozilla Firefox合作数据论证这一转变。

AI中文摘要

最近，大型语言模型在生产软件中生成候选和确认漏洞的演示，重新引发了AI将重塑攻防安全的叙事。头条新闻强调能力，却很少审视成本和激励。本文通过漏洞经济学视角审视LLM驱动的漏洞发现：即生产、证明、优先级排序和修复安全相关缺陷的操作经济学。历史上，最引人注目的高端漏洞经济学是攻击方定价的，因为生产级零日漏洞和利用链是面向政府、经纪人和攻击方供应商的昂贵专家输出。防御方漏洞经济学早已存在于漏洞研究、奖励计划和供应商修复工作中；LLM辅助系统改变了其规模和分布。它们使得候选生成、代码理解、测试工具构建、影响证明草拟和报告准备在代码库规模上更便宜。利用和概念验证仍然重要，但在防御方工作流中，它们主要用于证明影响、指导优先级排序和证明修复的合理性。由此产生的瓶颈不仅仅是发现更多漏洞，而是吸收、验证、分类、修补和发布更大规模的报告流。利用Anthropic的Mythos预览和Mozilla Firefox合作的公开数据，以及公开的利用市场价格锚点和漏洞奖励计划，我们认为近期的转变不仅仅是更多的零日漏洞。而是向更广泛的防御者修复吞吐量迈进：低信号候选变得更便宜，证据丰富的修复变得更加重要，稀缺的能力转向维护者审查和发布工作。这种影响在开源领域尤为严重，因为LLM辅助发现可以增加报告量，而维护者侧的验证、分类、资金和发布能力可能无法扩展。

英文摘要

Recent demonstrations of large language models producing candidate and confirmed vulnerabilities in production software have renewed the narrative that AI will reshape offensive and defensive security. Headlines emphasize capability; they rarely interrogate costs and incentives. This paper examines LLM-driven vulnerability discovery through a bugonomics lens: the operational economics of producing, proving, prioritizing, and fixing security-relevant defects. Historically, the most visible high-end bugonomics was offense-priced because production-grade zero-days and exploit chains were expensive specialist outputs for governments, brokers, and offensive vendors. Defender-side bugonomics already existed in vulnerability research, reward programs, and vendor remediation work; LLM-assisted systems change its scale and distribution. They make candidate generation, code comprehension, harness construction, proof-of-impact drafting, and report preparation cheaper at codebase scale. Exploits and proofs of concept remain important, but in defender workflows they primarily prove impact, guide prioritization, and justify remediation. The resulting bottleneck is not only finding more bugs; it is absorbing, validating, triaging, patching, and shipping a larger stream of reports. Using public data from Anthropic's Mythos Preview and Mozilla Firefox collaborations, along with public exploit-market price anchors and vulnerability reward programs, we argue that the near-term shift is not simply more zero-days. It is a move toward broader defender remediation throughput: low-signal candidates become cheaper, evidence-rich remediation becomes more important, and scarce capacity shifts toward maintainer review and release work. The effect is acute in open source, where LLM-assisted discovery can increase report volume while maintainer-side validation, triage, funding, and release capacity may not scale.

2605.24631 2026-05-26 cs.LG cs.AI cs.CV 版本更新

Beyond Generative Priors: Minority Sampling with JEPA-Guided Diffusion

超越生成先验：JEPA引导扩散的少数采样

Sol Park, Soobin Um

发表机构 * Department of Artificial Intelligence, Kookmin University, Seoul, South Korea（韩国首尔国民大学人工智能系）

AI总结提出一种基于世界模型JEPA引导的扩散采样框架，通过近似策略实现高效计算，在无条件、类别条件和文本到图像生成中提升少数样本的保真度和语义有效性。

Comments ICML 2026, 21 pages, 9 figures

AI中文摘要

少数采样旨在数据流形上生成低密度实例，在医学诊断、异常检测和创意AI等应用中具有核心重要性。然而，现有方法相对于从训练数据中学习的生成先验来定义少数样本，将稀有性限制在可能无法很好反映现实世界语义的模型特定概念中。在这项工作中，我们提出了一种以世界为中心的少数采样视角，该视角相对于现实世界先验而非生成器诱导的密度来定义稀有性。为此，我们引入了JEPA引导，一种由联合嵌入预测架构（JEPA）引导的扩散采样框架——JEPA是一类编码广泛、语义丰富表示的世界模型。JEPA引导将扩散轨迹导向JEPA隐含密度下的低密度区域，从而使生成的少数样本与现实世界的语义稀有性对齐。为了使JEPA引导在计算上实用，我们开发了带有理论误差界限的原则性近似策略，显著降低了引导计算的开销。在无条件、类别条件和文本到图像生成上的大量实验表明，JEPA引导持续提高了少数样本的保真度和语义有效性，在捕捉现实世界的稀有性概念方面优于以生成器为中心的基线。代码可在https://github.com/soobin-um/jepa-guidance获取。

英文摘要

Minority sampling aims to generate low-density instances on a data manifold and is of central importance in applications such as medical diagnosis, anomaly detection, and creative AI. Existing approaches, however, define minority samples relative to generative priors learned from training data, confining rarity to model-specific notions that may poorly reflect real-world semantics. In this work, we propose a world-centric perspective on minority sampling, which defines rarity with respect to real-world priors rather than generator-induced densities. To this end, we introduce JEPA guidance, a diffusion sampling framework guided by a Joint-Embedding Predictive Architecture (JEPA) -- a class of world models that encode broad, semantically rich representations. JEPA guidance steers diffusion trajectories toward low-density regions under the implicit density induced by the JEPA, thereby aligning generated minorities with real-world semantic rarity. To make JEPA guidance computationally practical, we develop principled approximation strategies accompanied by theoretical error bounds, significantly reducing the overhead of guidance computation. Extensive experiments across unconditional, class-conditional, and text-to-image generation demonstrate that JEPA guidance consistently improves the fidelity and semantic validity of minority samples, outperforming generator-centric baselines in capturing real-world notions of rarity. Code is available at https://github.com/soobin-um/jepa-guidance.

2605.24621 2026-05-26 cs.CV cs.AI cs.LG 版本更新

Phase-Aware Wavelet-Based-Scattering Encoder-Decoder for Dense Predictions

相位感知的基于小波散射的编解码器用于密集预测

Ghassen Marrakchi, Basarab Matei

发表机构 * Northern Paris Computer Science Lab, Sorbonne Paris Nord University, Villetaneuse, France（法国维勒塔讷斯，索邦巴黎北大学巴黎北部计算机科学实验室）

AI总结提出一种相位感知散射编解码器，通过在跳跃连接中显式保留相位信息来恢复空间结构，在图像去噪和皮肤病变分割任务中验证了相位对密集预测的有效性。

Comments 21 pages, 16 figures, 10 tables

2605.24614 2026-05-26 cs.CL cs.AI cs.LG 版本更新

Measuring the Depth of LLM Unlearning via Activation Patching

通过激活修补测量大语言模型遗忘的深度

Jaeung Lee, Dohyun Kim, Jaemin Jo

发表机构 * Sungkyunkwan University（成均馆大学）

AI总结提出遗忘深度评分（UDS），通过激活修补量化遗忘的机制深度，在150个遗忘模型上的元评估中达到最高忠实性和鲁棒性。

Comments 18 pages

AI中文摘要

大语言模型遗忘已成为隐私保护和人工智能安全的关键事后机制，但审计目标知识是否真正被擦除仍然具有挑战性。现有的输出级指标无法检测到这些知识是否仍可从内部表示中恢复。最近的白盒研究揭示了此类残留知识，但通常依赖于辅助训练或数据集特定调整，缺乏可推广的指标。为解决这些限制，我们提出遗忘深度评分（UDS），一种通过激活修补量化遗忘机制深度的指标。UDS首先使用保留模型基线识别编码目标知识的层，然后在0-1尺度上测量遗忘模型中该知识被擦除的程度。在跨越8种方法的150个遗忘模型上的20个指标的元评估中，UDS实现了最高的忠实性和鲁棒性，证实了我们的因果方法是遗忘评估中最可靠的。案例研究进一步揭示，白盒指标可能在层级别上不一致，并且擦除深度因示例而异。我们提供了将UDS集成到现有基准测试框架并简化评估流程的指南。代码和数据可在https://github.com/gnueaj/unlearning-depth-score获取。

英文摘要

Large language model (LLM) unlearning has emerged as a crucial post-hoc mechanism for privacy protection and AI safety, yet auditing whether target knowledge is truly erased remains challenging. Existing output-level metrics fail to detect when this knowledge remains recoverable from internal representations. Recent white-box studies reveal such residual knowledge but often rely on auxiliary training or dataset-specific adaptations, leaving no generalizable metric. To address these limitations, we propose the Unlearning Depth Score (UDS), a metric that quantifies the mechanistic depth of unlearning via activation patching. UDS first identifies layers that encode the target knowledge using a retain model baseline, then measures how much of it is erased in the unlearned model on a 0-1 scale. In a meta-evaluation across 20 metrics on 150 unlearned models spanning 8 methods, UDS achieves the highest faithfulness and robustness, confirming our causal approach as the most reliable for unlearning evaluation. Case studies further reveal that white-box metrics can disagree at the layer level and that erasure depth varies across examples. We provide guidelines for integrating UDS into existing benchmarking frameworks and streamlining the evaluation pipeline. Code and data are available at https://github.com/gnueaj/unlearning-depth-score

2605.24611 2026-05-26 cs.LG 版本更新

Beyond Fixed Points: Superpolynomial Capacity of Asymmetric Hopfield Networks

超越不动点：非对称Hopfield网络的超多项式容量

Aakash Kumar, Anatoly Khina, Frederik Mallmann-Trenn, Emanuele Natale

发表机构 * COATI, CNRS, Inria, I3S, Université Côte d’Azur, France（法国国家科学研究中心（CNRS）、法国国家信息与自动化研究所（Inria）、I3S研究所、蔚蓝海岸大学）； School of Electrical and Computer Engineering, Tel Aviv University, Israel（特拉维夫大学电气与计算机工程学院）； Department of Informatics, King’s College London, UK（伦敦国王学院信息学院）； Institute of Science and Technology Austria (ISTA), Klosterneuburg, Austria（奥地利科学与技术研究所（ISTA））

AI总结通过结合组合数学、数论和观点动力学分析，在经典同步非对称Hopfield网络中构造出超多项式数量的极限环吸引子，每个吸引子具有超多项式周期且对噪声鲁棒，首次证明了非对称Hopfield网络的超多项式容量。

AI中文摘要

经典Hopfield网络由于对称权重而局限于静态模式，而非对称网络可以通过极限环吸引子编码时间序列。然而，在经典的同步非对称网络中实现长序列的高容量存储仍然是一个挑战。我们在具有二进制神经元和同步更新的经典非对称Hopfield模型中提出了一种简单且鲁棒的构造，使得$n$个神经元能够支持$\exp\!\big(Ω(n/(\log n)^2)\big)$个不同的极限环吸引子，每个吸引子的周期为$\exp\!\big(Ω(\sqrt n/\log n)\big)$，并且对翻转概率高达$\frac12-o(1)$的随机噪声具有鲁棒性，从而在存储序列的数量和长度上实现了超多项式容量。这是首次展示非对称Hopfield网络的这种容量，我们通过结合组合数学、数论和观点动力学的分析得到了这一结果。我们的发现表明，同步非对称Hopfield网络具有比先前认识到的更大且更鲁棒的序列记忆容量，证明在生物和人工神经系统中，鲁棒的序列表示可以通过粗糙的结构模式而非复杂的非线性来实现。

英文摘要

Classical Hopfield networks are limited to static patterns due to symmetric weights, whereas asymmetric networks can encode temporal sequences via limit-cycle attractors. Achieving high-capacity storage of long sequences in classical synchronous asymmetric networks, however, has remained a challenge. We present a simple and robust construction within the classical asymmetric Hopfield model with binary neurons and synchronous updates, that allows $n$ neurons to support $\exp\!\big(Ω(n/(\log n)^2)\big)$ distinct limit-cycle attractors, each with period $\exp\!\big(Ω(\sqrt n/\log n)\big)$ and robust to random noise with flip probability up to $\frac12-o(1)$, yielding superpolynomial capacity in both the number and length of stored sequences. This is the first demonstration of such capacity for asymmetric Hopfield networks, which we obtain by combining results from combinatorics, number theory and the analysis of opinion dynamics. Our findings show that synchronous asymmetric Hopfield networks possess a sequence-memory capacity which is larger and more robust than previously recognized, demonstrating that, in both biological and artificial neural systems, robust sequence representation can be achieved through coarse architectural motifs rather than complex nonlinearities.
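
As a toy illustration of how asymmetric weights with synchronous updates can store a sequence as a limit cycle, the sketch below uses a cyclic-shift weight matrix on five binary neurons. This is not the construction from the paper (which reaches superpolynomially many, superpolynomially long cycles); the matrix, the helper names, and the network size are assumptions chosen only to make the dynamics easy to follow.

```python
import numpy as np


def step(weights, state):
    """One synchronous update of all binary (+1/-1) neurons."""
    return np.where(weights @ state >= 0, 1, -1)


def cycle_period(weights, start, max_steps=1000):
    """Number of synchronous steps until the start state recurs, or None."""
    state = step(weights, start)
    for t in range(1, max_steps + 1):
        if np.array_equal(state, start):
            return t
        state = step(weights, state)
    return None


n = 5
# Asymmetric weights: neuron i copies neuron i-1 (a cyclic shift), so W != W.T.
shift_weights = np.roll(np.eye(n, dtype=int), 1, axis=0)
start_state = np.array([1, -1, -1, -1, -1])

period = cycle_period(shift_weights, start_state)
print(period)
```

The single active neuron travels around the ring and the network returns to its starting state after `n` steps, so the stored object is a sequence of states, not a single fixed point.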

2605.24608 2026-05-26 cs.AI cs.CV cs.LG 版本更新

HeartBeatAI：用于多标签心电图心律失常的可解释且鲁棒的深度学习框架

Shubham Gupta, Nikhil Panwar, Partha Pratim Roy

发表机构 * Department of Computer Science and Engineering, Indian Institute of Technology Roorkee（印度理工学院鲁尔基分校计算机科学与工程系）

AI总结提出HeartBeatAI框架，结合域泛化、多尺度特征聚合和临床可解释性，通过Squeeze-and-Excitation ResNet和多层浓度管道实现鲁棒的12导联心电图分类，在四个大规模数据集的源内评估中达到98%宏F1分数，但留一域外评估显示罕见异常检测显著退化，跨机构部署仍存在挑战。

AI中文摘要

虽然深度学习增强了自动化心电图分析，但临床部署受到类别不平衡和泛化差距的阻碍。本文提出了HeartBeatAI，一个结合域泛化、多尺度特征聚合和临床可解释性的深度学习框架，用于鲁棒的12导联心电图分类。超越基于图像的范式，HeartBeatAI集成了一个Squeeze-and-Excitation ResNet来隔离诊断导联，以及一个多层浓度管道来捕捉宏观节律和微观形态异常。为了缓解域偏移，该框架采用了MixStyle正则化和标签平滑。通过使用源内和留一域外协议在四个大规模数据集上进行严格的基准测试，在源内条件下实现了高性能（98%宏F1分数）。然而，留一域外评估揭示了检测罕见异常时的显著退化，突显了跨机构部署中持续存在的挑战。

英文摘要

While Deep Learning (DL) enhances automated electrocardiogram (ECG) analysis, clinical deployment is hindered by class imbalance and the generalization gap. This paper presents HeartBeatAI, a deep learning framework combining domain generalization, multi-scale feature aggregation, and clinical explainability for robust 12-lead ECG classification. Moving beyond image-based paradigms, HeartBeatAI integrates a Squeeze-and-Excitation (SE) ResNet to isolate diagnostic leads alongside a Multi-Layer Concentration Pipeline to capture macro-rhythm and micro-morphological anomalies. To mitigate domain shift, the framework employs MixStyle regularization and Label Smoothing. Rigorous benchmarking across four large-scale datasets using intra-source and Leave-One-Domain-Out (LODO) protocols demonstrates high performance (98% Macro F1-score) under intra-source conditions. However, LODO evaluations reveal significant degradation in detecting rare anomalies, highlighting a persistent challenge in cross-institutional deployment.
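
Because the headline number is a Macro F1-score, a short sketch of how that metric is computed for multi-label predictions helps explain why rare anomalies matter so much. The labels and predictions below are invented for illustration and have nothing to do with the paper's datasets; the function name is hypothetical.

```python
import numpy as np


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 for binary indicator matrices (samples x labels)."""
    t = np.asarray(y_true, dtype=bool)
    p = np.asarray(y_pred, dtype=bool)
    tp = (t & p).sum(axis=0)
    fp = (~t & p).sum(axis=0)
    fn = (t & ~p).sum(axis=0)
    denom = 2 * tp + fp + fn
    # Labels with no positives and no predictions get F1 = 0 in this sketch.
    f1 = np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)
    return float(f1.mean())


# Column 0 is a common rhythm label, column 1 a rare anomaly (illustrative data).
y_true = [[1, 0], [1, 0], [1, 1], [0, 0]]
y_pred = [[1, 0], [1, 0], [1, 0], [0, 0]]

score = macro_f1(y_true, y_pred)
print(score)
```

Missing the single rare positive halves the macro score even though the common label is predicted perfectly, which is consistent with the abstract's point that degraded rare-anomaly detection dominates the cross-domain picture.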

2605.24584 2026-05-26 cs.LG cs.AI 版本更新

召唤神谕以屠之：利用大语言模型缓解金融回测中的前瞻偏差

Weixian Waylon Li, Mengyu Wang, Tiejun Ma

发表机构 * University of Edinburgh（爱丁堡大学）

AI总结提出FinCAD方法，通过对抗性偏差发现和实体日期自适应规则，在不重新训练的情况下抑制大语言模型对历史结果的记忆，从而缓解金融回测中的参数化前瞻偏差。

AI中文摘要

在历史金融数据上回测大语言模型（LLMs）是不可靠的，因为预训练在事件发生后截断。一个在2024年训练的LLM已经“知道”2018-2020年股票的走势。我们将这种失败命名为参数化前瞻偏差，并提出FinCAD，一种上下文感知解码的推理时适配方法，无需重新训练即可抑制LLM对历史结果的记忆。FinCAD结合了一个对抗性偏差发现流程，该流程学习一个模型特定的记忆激活先验提示，以及一个实体和日期自适应规则，该规则将CAD强度按（实体，日期）记忆程度缩放，使得惩罚在记忆的样本内日期触发，并在样本外衰减至零。在五个7-14B LLM和五只大盘股上，FinCAD在记忆日期上将样本内回测收益削减高达-67.1%，同时将2025年样本外收益保持在$8K以内，夏普比率在基线的0.10以内，并保持通用推理能力在1.7分以内。在十一个模型的排行榜上，它将样本内/样本外Spearman相关性从+0.779提升至+0.846，恢复了真正预测样本外表现的排名。

英文摘要

Backtesting large language models (LLMs) on historical financial data is unreliable because pre-training cuts off after the events happened. An LLM trained in 2024 already "knows" which way 2018-2020 stocks moved. We name this failure parametric look-ahead bias and propose FinCAD, an inference-time adaptation of Context-Aware Decoding that suppresses an LLM's memory of historical outcomes without retraining. FinCAD pairs an adversarial bias-discovery pipeline that learns a model-specific memory-activating prior prompt with an entity- and date-adaptive rule that scales the CAD strength to per-(entity, date) memorisation, so the penalty fires on memorised in-sample dates and decays to zero out-of-sample. Across five 7-14B LLMs and five mega-cap equities, FinCAD cuts in-sample backtest returns by up to -67.1% on memorised dates while leaving 2025 out-of-sample returns within $8K and Sharpe within 0.10 of baseline, and preserves general-purpose reasoning within 1.7 pts. On an eleven-model leaderboard, it raises the in-sample / out-of-sample Spearman correlation from +0.779 to +0.846, recovering rankings that genuinely predict out-of-sample performance.
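
The general idea of a contrastive logit adjustment in the style of Context-Aware Decoding can be sketched as follows. This is only an illustration under assumptions: it uses the standard contrastive form `(1 + alpha) * a - alpha * b`, treats `b` as logits obtained under a memory-activating prompt, and scales `alpha` linearly with a hypothetical memorisation score in [0, 1]. The abstract does not spell out FinCAD's actual rule, so none of the names or numbers below should be read as the authors' method.

```python
import numpy as np


def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()


def contrastive_logits(base_logits, memory_logits, alpha):
    """CAD-style adjustment: amplify base logits, subtract the memory-driven ones."""
    base = np.asarray(base_logits, dtype=float)
    memory = np.asarray(memory_logits, dtype=float)
    return (1.0 + alpha) * base - alpha * memory


def adaptive_alpha(memorisation_score, alpha_max=1.0):
    """Hypothetical rule: strength proportional to a memorisation score in [0, 1]."""
    return alpha_max * float(np.clip(memorisation_score, 0.0, 1.0))


# Toy next-token logits over ["up", "flat", "down"] (invented numbers).
base = np.array([2.0, 1.0, 0.0])
memory = np.array([4.0, 1.0, 0.0])   # the memory-activating prompt pushes "up"

in_sample = contrastive_logits(base, memory, adaptive_alpha(1.0))
out_of_sample = contrastive_logits(base, memory, adaptive_alpha(0.0))

print(softmax(base), softmax(in_sample), softmax(out_of_sample))
```

When the score is zero the logits are untouched, matching the described behaviour of the penalty decaying to zero out-of-sample; when it is high, the token favoured by the memory-activating prompt loses probability mass.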

2605.24558 2026-05-26 cs.LG 版本更新

Position: AI for Science Should Treat Measurement-to-Dataset Pipelines as Inference Components

立场：科学人工智能应将测量到数据集的处理流程视为推理组件

Ling Zhan, Xiaoyao Yu, Tao Jia

发表机构 * College of Computer and Information Science（计算机与信息科学学院）； Chongqing Key Laboratory of Brain-Inspired Cognitive Computing and Educational Rehabilitation for Children with Special Needs（重庆脑启发认知计算及特殊需要儿童教育康复重点实验室）； Chongqing Normal University（重庆师范大学）

AI总结本文主张科学人工智能中的测量到数据集流程应被视为推理组件，并揭示了将其输出视为固定数据导致的三个失败模式，通过大规模神经科学实证验证了问题的严重性，呼吁建立可计算的观测框架。

Comments 23 pages, 5 figures, Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea. PMLR 306, 2026

AI中文摘要

科学人工智能（AI4Science）工作流通常将发布的数据集视为底层系统的固定接口。然而，在依赖间接观测的领域中，学习器观察到的是由多阶段测量、重建和预处理流程产生的衍生表示。我们认为这些测量到数据集的处理流程是推理组件：将其输出视为“给定数据”会冻结观测模型并掩盖可行流程选择的不确定性。我们识别出这种“冻结透镜”导致的三个失败模式：（C1）隐藏假设空间，即发布的数据集未指定流程配置或其有效性条件；（C2）未经认证的可迁移性，即流程可能被记录但其有效性范围未经测试，因此分布偏移下的失败无法判定；（C3）无约束的多样性，即存在许多可辩护的流程且分散性是真实的，但未传播到不确定性感知的证据中。我们通过大规模神经科学实证审计对这些主张进行压力测试，发现在跨数据集稳定性标准下存活率约为0.0004%。我们呼吁AI4Science社区通过特定领域的可计算观测框架使流程成为可计算的推理对象。这一转变能够量化流程的充分性和稳定性，将隐式的实现选择转化为可审计、可复现和累积的科学证据。

英文摘要

AI for Science (AI4Science) workflows often treat the released dataset as a fixed interface to the underlying system. However, in domains relying on \emph{indirect observation}, the learner observes a derivative representation produced by multi-stage measurement, reconstruction, and preprocessing pipelines. \textbf{We argue that these measurement-to-dataset pipelines are inference components: treating their outputs as ``given data'' freezes an observation model and obscures uncertainty over feasible pipeline choices.} We identify three failure modes arising from this ``frozen lens'': \textbf{(C1) hidden hypothesis space}, where the released dataset does not specify the pipeline configuration or its validity conditions; \textbf{(C2) uncertified transportability}, where a pipeline may be documented but its regime of validity is untested, so failures under distribution shift cannot be adjudicated; \textbf{(C3) ungoverned multiplicity}, where many defensible pipelines exist and dispersion is real but not propagated into uncertainty-aware evidence. We stress-test these claims with a large-scale neuroscience empirical audit, finding a survival rate of $\approx 0.0004\%$ under a cross-dataset stability criterion. We call on the AI4Science community to make pipelines \emph{computable} inference objects via domain-specific Computable Observation Frameworks. This shift enables quantifying pipeline adequacy and stability, converting implicit implementation choices into auditable, reproducible, and cumulative scientific evidence.

2605.24556 2026-05-26 cs.IR cs.CL cs.LG 版本更新

The Multilingual Curse at the Retrieval Layer: Evidence from Amharic

检索层的多语言诅咒：来自阿姆哈拉语的证据

Yosef Worku Alemneh, Kidist Amde Mekonnen, Maarten de Rijke

发表机构 * Independent Researcher（独立研究者）； University of Amsterdam（阿姆斯特丹大学）

AI总结针对零样本多语言检索在低资源形态丰富语言（如阿姆哈拉语）上表现不佳的问题，通过对比实验发现单语检索器显著优于多语言检索器，并揭示了多语言基准测试的局限性。

Comments 10 pages, 4 tables. Accepted to the 1st Workshop on Multilinguality in the Era of Large Language Models (MeLLM) at ACL 2026

AI中文摘要

多语言检索日益支撑着跨语言问答和检索增强生成。在多语言基准测试上的强零样本分数常被视为当前编码器能可靠跨语言迁移的证据。我们认为，对于代表性不足、形态丰富的语言，这一假设不成立，并以阿姆哈拉语作为诊断案例。在涵盖密集、延迟交互、学习稀疏和交叉编码器范式的共享段落检索协议下，我们比较了零样本多语言检索器、阿姆哈拉语微调的多语言检索器以及单语阿姆哈拉语检索器。最强的零样本多语言检索器的MRR@10比最强的单语阿姆哈拉语第一阶段检索器相对低23%。在相同的阿姆哈拉语监督下微调两个最新的多语言嵌入模型，相比零样本获得了32-60%的相对MRR@10提升，但最佳阿姆哈拉语微调多语言模型仍低于最强的单语阿姆哈拉语检索器。这些发现表明，零样本多语言检索并不能充分代表LLM时代公平的信息访问：对于代表性不足的语言，检索必须在语言内部进行评估和适应，而不是从聚合的多语言基准测试中推断。为促进未来研究，我们在https://github.com/rasyosef/amharic-neural-ir 公开发布了数据集、代码库和训练模型。

英文摘要

Multilingual retrieval increasingly underpins cross-lingual question answering and retrieval-augmented generation. Strong zero-shot scores on multilingual benchmarks are often taken as evidence that current encoders transfer reliably across many languages. We argue that this assumption breaks down for underrepresented, morphologically rich languages, and use Amharic as a diagnostic case. Under a shared passage retrieval protocol covering dense, late-interaction, learned sparse, and cross-encoder paradigms, we compare zero-shot multilingual retrievers, Amharic-fine-tuned multilingual retrievers, and monolingual Amharic retrievers. The strongest zero-shot multilingual retriever underperforms the strongest monolingual Amharic first-stage retriever by 23% relative MRR@10. Fine-tuning two recent multilingual embedding models on the same Amharic supervision yields 32-60% relative MRR@10 gains over zero-shot, but the best Amharic-fine-tuned multilingual model remains below the strongest monolingual Amharic retriever. These findings indicate that zero-shot multilingual retrieval is not a sufficient proxy for equitable information access in the LLM era: for underrepresented languages, retrieval must be evaluated and adapted in-language rather than inferred from aggregate multilingual benchmarks. To foster future research, we publicly release the dataset, codebase, and trained models at https://github.com/rasyosef/amharic-neural-ir.
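
Since the comparisons above are reported as relative MRR@10 differences, a small sketch of both computations may be useful. The rankings and scores are invented for illustration and are not results from the paper; the function names are hypothetical.

```python
def mrr_at_k(ranked_lists, relevant_sets, k=10):
    """Mean reciprocal rank of the first relevant passage within the top k."""
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_lists)


def relative_change(new, baseline):
    """Relative difference of `new` with respect to `baseline`."""
    return (new - baseline) / baseline


# Three toy queries: hit at rank 1, hit at rank 4, no hit in the top 10.
rankings = [
    ["d1", "d2", "d3"],
    ["d9", "d8", "d7", "d4"],
    ["x%d" % i for i in range(12)],
]
relevant = [{"d1"}, {"d4"}, {"d99"}]

mrr = mrr_at_k(rankings, relevant)            # (1 + 0.25 + 0) / 3
gap = relative_change(0.50, 0.40)             # illustrative scores, not from the paper
print(mrr, gap)
```

A relative figure such as "23% lower" depends on which system is used as the baseline in the denominator, so it is worth stating that choice whenever such numbers are compared.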

2605.24550 2026-05-26 cs.AI cs.CL cs.LG 版本更新

Jailbreak to Protect: Buffering and Reinforcing via Temporary Jailbreaking for Safe Fine-Tuning in Large Language Models

越狱以保护：通过临时越狱进行缓冲和强化以实现大型语言模型的安全微调

Seokil Ham, Jaehyuk Jang, Wonjun Lee, Changick Kim

发表机构 * School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST)（韩国科学技术院电子工程学院）

AI总结针对微调即服务中安全对齐被有害微调攻击削弱的问题，提出一种基于梯度分析的缓冲与强化框架，通过临时越狱适配器减少有害更新并利用QR分解合并强化安全，实现无需额外安全数据的高效防御。

Comments ICML 2026 Spotlight

AI中文摘要

微调即服务（FaaS）使得大型语言模型（LLMs）的个性化成为可能，但它在有害微调攻击下会削弱安全对齐。最近的研究表明，在微调期间激活有害行为模块可以防止模型学习不良行为，但其机制尚不清楚。在本文中，我们重新审视临时越狱作为对抗有害微调的一种防御手段，并提供了梯度层面的分析，表明它能够饱和安全退化梯度，同时保留良性任务相关梯度。基于这一见解，我们提出了一种缓冲与强化微调框架，该框架在用户微调期间缓冲有害更新，并在适应后强化安全。具体来说，BufferLoRA作为一个可移除的适配器，在用户微调期间诱导临时越狱以减少有害更新。适应后，通过基于QR分解的合并，将经过训练的ReinforceLoRA（用于在临时越狱状态下恢复拒绝行为）与UserLoRA集成，以在保持用户任务性能的同时强化安全。大量实验表明，我们的框架在用户微调期间无需额外安全数据且计算成本极低的情况下，实现了卓越的安全性和实用性。

英文摘要

Fine-tuning-as-a-Service (FaaS) enables personalization of large language models (LLMs), but it can weaken safety-alignment under harmful fine-tuning attacks. Recent work has shown that activating harmful-behavior modules during fine-tuning can prevent models from learning undesired behaviors, but its mechanism remains unclear. In this paper, we revisit temporary jailbreaking as a defense against harmful fine-tuning and provide a gradient-level analysis showing that it saturates safety-degrading gradients while preserving benign task-relevant gradients. Based on this insight, we propose a Buffer-and-Reinforce fine-tuning framework that buffers harmful updates during user fine-tuning and reinforces safety after adaptation. Specifically, BufferLoRA induces temporary jailbreaking as a removable adapter to reduce harmful updates during user fine-tuning. After adaptation, ReinforceLoRA, trained to recover refusal behavior under the temporarily jailbroken state, is integrated with UserLoRA via QR decomposition-based merging to reinforce safety while preserving user-task performance. Extensive experiments show that our framework achieves superior safety and utility with no additional safety data during user fine-tuning and minimal computational cost.

2605.24548 2026-05-26 cs.LG math.PR 版本更新

Deep ZakaiJ: Structured Filtering for Jump-Diffusion Time Series Forecasting

Deep ZakaiJ：用于跳跃扩散时间序列预测的结构化滤波

Yan Leng, Thibaut Mastrolia, Hao Wang

发表机构 * University of Texas at Austin（德克萨斯大学奥斯汀分校）； University of California, Berkeley（加州大学伯克利分校）

AI总结提出Deep ZakaiJ模型，将Zakai非线性滤波方程嵌入神经编码器-解码器架构，通过Strang分裂实现隐状态信念更新，用于部分观测的跳跃扩散系统，在合成、金融和海洋数据集上改进了分布预测并保持点精度竞争力。

详情

AI中文摘要

由未观测隐状态驱动的时间序列经常表现出突然的跳跃不连续性，其时间和幅度无法仅从观测历史预测。经典跳跃扩散模型提供了严谨的数学框架，但假设刚性参数形式，而最近的神经跳跃模型在完全观测轨迹上操作，不推断控制动力学的隐状态。我们提出\textit{Deep ZakaiJ}，一种用于部分观测跳跃扩散系统的隐状态模型，将Zakai非线性滤波方程嵌入神经编码器-解码器架构。编码器通过Strang分裂递归更新隐状态的信念，分为三个可解释的子步骤：先验传播、扩散创新和跳跃创新，产生精确滤波演化的可微一阶精确近似。解码器是一个结构化跳跃扩散模型，明确以滤波信念为条件，保持连续动力学和不连续冲击之间的分离。在合成、金融和海洋数据集上，\textit{Deep ZakaiJ}改善了分布预测，同时保持点精度竞争力，实现了校准的预测区间，并在合成和定性案例研究中恢复了可解释的隐结构。

英文摘要

Time series driven by unobserved latent states frequently exhibit abrupt jump discontinuities whose timing and magnitude cannot be predicted from observed history alone. Classical jump-diffusion models offer a principled mathematical framework but assume rigid parametric forms, while recent neural jump models operate on fully observed trajectories without inferring the hidden states that govern the dynamics. We propose \textit{Deep ZakaiJ}, a latent-state model for partially observed jump-diffusion systems that embeds the Zakai nonlinear filtering equation into a neural encoder--decoder architecture. The encoder recursively updates a belief over the latent state via Strang splitting into three interpretable substeps: prior propagation, diffusion innovation, and jump innovation, yielding a differentiable, first-order-accurate approximation of the exact filtering evolution. The decoder is a structured jump-diffusion model explicitly conditioned on the filtered belief, preserving the separation between continuous dynamics and discontinuous shocks. On synthetic, financial, and oceanographic datasets, \textit{Deep ZakaiJ} improves distributional forecasts while remaining competitive in point accuracy, achieving calibrated predictive intervals and recovering interpretable latent structure in synthetic and qualitative case studies.

2605.24547 2026-05-26 cs.LG 版本更新

RL with Learnable Textual Feedback: A Bilevel Approach

基于可学习文本反馈的强化学习：一种双层方法

Utsav Singh, Sidhaarth Sredharan, Souradip Chakraborty, Amrit Singh Bedi

发表机构 * University of Central Florida（中佛罗里达大学）

AI总结针对稀疏奖励导致样本效率低的问题，提出一种双层优化框架Bi-NAC，联合训练评论家生成可改善策略的文本反馈和演员利用该反馈，在MATH-500等任务上提升了样本和参数效率。

AI中文摘要

具有可验证奖励的强化学习可以改进LLM的推理能力，但当终端奖励稀疏时，学习仍然样本效率低下。这推动了关于文本反馈强化学习的一系列工作，其中评论家模型生成自然语言反馈来指导推理模型（演员），用更丰富的学习信号增强标量奖励。然而，现有方法通常将反馈视为固定的或辅助的，这忽略了关键性质：反馈不仅应正确，而且应在上下文中提供时改进策略（演员模型）。这激发了用于强化学习的可学习文本反馈范式。然而，反馈的可学习性和有用性取决于策略从中学习的能力，使得具有可学习反馈的强化学习本质上是一个双层问题。我们将这种耦合形式化为Stackelberg双层规划，并推导出双层自然语言演员-评论家（Bi-NAC），它联合训练评论家生成改善奖励的反馈和演员利用该反馈。在MATH-500、MBPP和GPQA上，Bi-NAC在样本和参数效率上优于强化学习和固定评论家基线：我们的2B模型优于3B GRPO基线，在MATH-500上达到46.6%对比41.4%，而我们的6B模型超过7B GRPO基线，在GPQA上达到49.3%对比43.6%。

英文摘要

Reinforcement learning with verifiable rewards can improve LLM reasoning, but learning remains sample-inefficient when terminal rewards are sparse. This has motivated a growing line of work on RL with textual feedback, where a critic model generates natural language feedback to guide a reasoning model (the actor), augmenting scalar rewards with richer learning signals. However, existing methods typically treat feedback as fixed or auxiliary, which misses a key property: feedback should not merely be correct, but should improve the policy (actor model) when provided in context. This motivates a paradigm of learnable textual feedback for RL. Yet the learnability and usefulness of feedback depend on the policy's ability to learn from it, making RL with learnable feedback an inherently bilevel problem. We formalize this coupling as a Stackelberg bilevel program and derive Bilevel Natural Language Actor-Critic (Bi-NAC), which jointly trains a critic to generate reward-improving feedback and an actor to exploit it. Across MATH-500, MBPP, and GPQA, Bi-NAC improves sample and parameter efficiency over RL and fixed-critic baselines: our 2B model outperforms the 3B GRPO baseline, achieving 46.6% versus 41.4% on MATH-500, while our 6B model surpasses the 7B GRPO baseline, achieving 49.3% versus 43.6% on GPQA.

2605.24545 2026-05-26 cs.LG cs.AI 版本更新

Rethinking Federated Unlearning via the Lens of Memorization

通过记忆视角重新思考联邦遗忘学习

Jiaheng Wei, Yanjun Zhang, He Zhang, Leo Yu Zhang, Chao Chen, Kok-Leong Ong, Jun Zhang, Yang Xiang

发表机构 * Royal Melbourne Institute of Technology（皇家墨尔本理工学院）； Griffith University（格里菲斯大学）； Swinburne University of Technology（斯威本理工大学）

AI总结针对联邦学习中遗忘数据与保留数据重叠导致遗忘无效和客户端不公平的问题，提出基于分组记忆评估的联邦记忆剪枝方法，通过重置负责记忆的冗余参数实现高效遗忘。

Comments This paper has been accepted by SIGKDD 2026

ChronoVAE-HOPE：超越注意力——面向专业时间序列分类的下一代VAE基础模型

José Alberto Rodríguez, Luis Balderas, Miguel Lastra, Antonio Arauzo-Azofra, José M. Benítez

发表机构 * Department of Computer Science and Artificial Intelligence（计算机科学与人工智能系）； DiCITS ； iMUDS ； DaSCI ； University of Granada（格拉纳达大学）； Advanced Medical Imaging Group（先进医学成像组）； Instituto de Investigación Biosanitaria de Granada（格拉纳达生物医学研究所）； Department of Software Engineering（软件工程系）； Department of Rural Engineering（农村工程系）； University of Córdoba（科尔多瓦大学）

AI总结提出ChronoVAE-HOPE，一种基于VAE和HOPE块（含Titans模块和连续记忆系统）的下一代时间序列基础模型，通过解耦潜在空间分离趋势与季节成分，在UCR基准分类任务上表现优异。

AI中文摘要

时间序列基础模型已成为通用时间序列预测领域的最新技术组成部分。然而，将其应用于专业分类任务仍受两个相互关联的挑战制约：标准注意力机制的二次成本以及无法解耦时间序列变异性背后的结构成分。本技术报告介绍了ChronoVAE-HOPE，一种下一代时间序列基础模型，它调和了大规模泛化与结构化潜在表示在时间序列分类中的需求。该方案的核心是构建于HOPE块之上的变分自编码器框架，该框架用双记忆系统替代二次注意力：用于动态短期保留的Titans模块和用于长期历史上下文抽象的连续记忆系统。一个关键的架构创新是解耦潜在空间，通过专用编码器头和分离的解码器路径将表示分解为独立的趋势和季节成分。ChronoVAE-HOPE在Monash档案上进行自监督预训练，结合了掩码时间序列建模辅助目标和解耦VAE重建损失。预训练编码器随后被冻结，用于生成固定长度嵌入，以在UCR基准数据集上进行下游分类。实证结果表明，在不同时间域上，特别是在具有严格因果结构的设置中，模型表现出强劲性能。ChronoVAE-HOPE通过结构化生成表示为基础模型适应时间序列分类建立了一个稳健且可解释的框架。

英文摘要

Time Series Foundation Models (TSFMs) have become a new component of the state-of-the-art in general time series forecasting. However, adapting them to specialized classification tasks remains constrained by two interconnected challenges: the quadratic cost of standard attention mechanisms and the inability to disentangle the structural components underlying time series variability. This technical report introduces ChronoVAE-HOPE, a next-generation TSFM that reconciles massive generalization with structured latent representation for time series classification. The core of the proposal is a Variational Autoencoder (VAE) framework built upon the HOPE Block, which replaces quadratic attention with a dual-memory system: Titans modules for dynamic short-term retention and a Continuum Memory System (CMS) for the abstraction of long-term historical context. A key architectural novelty is the disentangled latent space, which factorizes representations into independent trend and seasonal components via dedicated encoder heads and separate decoder pathways. ChronoVAE-HOPE undergoes self-supervised pre-training on the Monash archive, combining a Masked Time Series Modeling (MTSM) auxiliary objective with a disentangled VAE reconstruction loss. The pre-trained encoder is subsequently frozen and used to generate fixed-length embeddings for downstream classification on the UCR benchmark datasets. Empirical results demonstrate strong performance across diverse temporal domains, particularly in settings characterized by strict causal structure. ChronoVAE-HOPE establishes a robust and interpretable framework for the adaptation of foundation models to time series classification through structured generative representations.

2605.22242 2026-05-26 cs.LG physics.ao-ph version update

Decomposing Ensemble Spread in Lorenz '96 With Learned Stochastic Parameterizations

利用学习随机参数化分解 Lorenz '96 中的集合离散度

Birgit Kühbacher, Daan Crommelin, Niki Kilbertus

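Affiliations: Technical University of Munich; Helmholtz Munich; Munich Center for Machine Learning (MCML); Centrum Wiskunde & Informatica (CWI); Korteweg-de Vries Institute for Mathematics, University of Amsterdam

AI summary: Using the two-scale Lorenz 1996 system, the study compares multiple ensemble configurations and parameterization strategies to analyze systematically how intrinsic variability, initial-condition perturbations, and stochastic model uncertainty affect ensemble spread. It finds that stochastic parameterizations, especially those with temporally persistent structure, enhance early spread growth and improve spread-error consistency.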

AI中文摘要

由于混沌动力学、不完美的初始条件以及对底层物理过程的不完全表示，天气和气候预报本质上具有不确定性。业务集合预报旨在通过预报离散度来表示这些不确定性，然而许多方法产生的离散度估计不足，即离散度相对于预报误差增长过慢。利用双尺度 Lorenz 1996 系统作为广泛使用的受控测试平台，我们设计了一种系统方法来区分内在变率、初始条件扰动和随机模型不确定性。我们比较了多种集合配置和参数化策略，包括现有的确定性和自回归方法以及新颖的贝叶斯和基于流的方法。我们的结果表明，集合扰动不会增加系统的长期方差；相反，它们调节轨迹去相关和探索不变测度的速度。随机参数化，特别是那些具有时间持续结构的参数化，增强了早期离散度增长并改善了离散度-误差一致性。总体而言，我们阐明了不同不确定性来源在混沌系统中如何相互作用，并为天气和气候模型中随机参数化的设计和评估提供了指导。

英文摘要

Weather and climate forecasts are inherently uncertain due to chaotic dynamics, imperfect initial conditions, and incomplete representation of the underlying physical processes. Operational ensemble forecasts aim to represent these uncertainties through forecast spread, yet many approaches yield underdispersive estimates, with spread that grows too slowly relative to forecast error. Using the two-scale Lorenz 1996 system as a widely used, controlled testbed, we design a systematic approach to disentangle intrinsic variability, initial-condition perturbations, and stochastic model uncertainty. We compare multiple ensemble configurations and parameterization strategies, including existing deterministic and autoregressive as well as novel Bayesian and flow-based approaches. Our results show that ensemble perturbations do not increase the system's long-term variance; rather, they regulate how rapidly trajectories decorrelate and explore the invariant measure. Stochastic parameterizations, particularly those with temporally persistent structure, enhance early spread growth and improve spread-error consistency. Overall, we bring clarity to how different sources of uncertainty interact in a chaotic system and provide guidance for the design and evaluation of stochastic parameterizations in weather and climate models.
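The idea of ensemble spread growing from small initial-condition perturbations can be illustrated with a minimal sketch. For brevity it uses the single-scale Lorenz '96 model rather than the two-scale system studied in the paper, and the forcing F = 8, the 8 variables, the step size, the perturbation size, and the 20 members are illustrative assumptions, not the authors' configuration.

```python
import random


def l96_tendency(x, forcing=8.0):
    """Single-scale Lorenz '96: dx_k/dt = (x_{k+1} - x_{k-2}) * x_{k-1} - x_k + F."""
    n = len(x)
    # Negative indices wrap around, which gives the cyclic boundary condition.
    return [(x[(k + 1) % n] - x[k - 2]) * x[k - 1] - x[k] + forcing for k in range(n)]


def rk4_step(x, dt, forcing=8.0):
    """Advance the state by one fourth-order Runge-Kutta step."""
    def shifted(base, slope, scale):
        return [b + scale * s for b, s in zip(base, slope)]

    k1 = l96_tendency(x, forcing)
    k2 = l96_tendency(shifted(x, k1, dt / 2), forcing)
    k3 = l96_tendency(shifted(x, k2, dt / 2), forcing)
    k4 = l96_tendency(shifted(x, k3, dt), forcing)
    return [xi + dt / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def ensemble_spread(members):
    """Root-mean-square deviation of all members from the ensemble mean."""
    n_mem, n_var = len(members), len(members[0])
    mean = [sum(m[k] for m in members) / n_mem for k in range(n_var)]
    var = sum((m[k] - mean[k]) ** 2 for m in members for k in range(n_var))
    return (var / (n_mem * n_var)) ** 0.5


def run_ensemble(n_members=20, n_vars=8, n_steps=400, dt=0.01, eps=1e-3, seed=0):
    """Perturb the (unstable) equilibrium x_k = F and track spread over time."""
    rng = random.Random(seed)
    members = [[8.0 + eps * rng.gauss(0, 1) for _ in range(n_vars)]
               for _ in range(n_members)]
    spreads = [ensemble_spread(members)]
    for _ in range(n_steps):
        members = [rk4_step(m, dt) for m in members]
        spreads.append(ensemble_spread(members))
    return spreads
```

Plotting the returned list against time shows the early spread growth that the abstract discusses; comparing such curves against forecast error would be the next step in a spread-error consistency check.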

2605.20747 2026-05-26 q-bio.GN cs.LG version update

Multi-Modal Machine Learning for Population- and Subject-Specific lncRNA-Type 2 Diabetes Association Analysis

多模态机器学习用于群体和个体特异性lncRNA-2型糖尿病关联分析

Ashwani Siwach, Sanjeev Narayan Sharma, Sunil Datt Sharma

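Affiliations: Department of Electronics and Communication Engineering, IIITDM Jabalpur; Department of Electronics and Communication Engineering, Central University of Jammu

AI summary: Using a multimodal machine learning framework that integrates expression, secondary-structure, and sequence features, the study identifies lncRNAs associated with type 2 diabetes in independent cohorts and applies SHAP analysis to interpret associations at the population and subject levels.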

Comments This work has been submitted to the IEEE for possible publication

AI中文摘要

长链非编码RNA（lncRNA）是参与慢性疾病（包括2型糖尿病）发病机制的新兴调控分子。我们研究了文献中报道的与2型糖尿病相关的十种lncRNA：MALAT1、MEG3、MIAT、ANRIL、GAS5、KCNQ1OT1、H19、BCYRN1、XIST和HOTAIR，在两个独立的人群RNA-seq队列中进行了分析。单组学方法提供了疾病生物学的不完整视图，因此开发了一个整合多特征框架，提取每种lncRNA的表达、二级结构和序列特征。在分层k折交叉验证、留一法交叉验证和重复留出法方案下评估了八种机器学习分类器，以确保稳健的性能估计。应用SHAP分析进行个体水平的关联解释。在一个队列中，发现GAS5和XIST的表达特征以及GAS5、MEG3和ANRIL的序列特征与2型糖尿病相关，而在第二个队列中，发现MALAT1的表达特征以及KCNQ1OT1、ANRIL和MEG3的序列特征与2型糖尿病相关。SHAP将MEG3识别为两个队列中的主要lncRNA。机器学习结果与已建立的统计方法一致，同时额外提供了与特定分子特征类型相关的群体和个体水平疾病关联谱。所提出的框架增进了对2型糖尿病机制的理解，并支持基于lncRNA的精准医学。

英文摘要

Long non-coding RNAs (lncRNAs) are emerging regulatory molecules implicated in chronic disease pathogenesis, including Type 2 Diabetes Mellitus (T2D). We investigated ten literature-reported lncRNAs associated with T2D (MALAT1, MEG3, MIAT, ANRIL, GAS5, KCNQ1OT1, H19, BCYRN1, XIST, and HOTAIR) across two independent population-based RNA-seq cohorts. Single-omics approaches provide an incomplete view of disease biology; therefore, an integrative multi-feature framework was developed, extracting expression, secondary-structure, and sequence features for each lncRNA. Eight machine learning (ML) classifiers were evaluated under stratified k-fold, leave-one-out cross-validation (LOOCV), and repeated hold-out schemes to ensure robust performance estimation. SHAP analysis was applied for subject-level association interpretation. In one cohort, GAS5 and XIST expression features, along with GAS5, MEG3, and ANRIL sequence features, were found to be associated with T2D, while in the second cohort MALAT1 expression features and KCNQ1OT1, ANRIL, and MEG3 sequence features were found to be associated. MEG3 was identified by SHAP as the dominant lncRNA in both cohorts. ML results were consistent with established statistical methods while additionally providing population- and subject-level disease association profiles linked to specific molecular feature types. The proposed framework advances mechanistic understanding of T2D and supports lncRNA-based precision medicine.

2605.20416 2026-05-26 cs.LG physics.comp-ph version update

Miller-Index-Based Latent Crystallographic Fracture Plane Reasoning and generation with Vision-Language Models

基于米勒指数的潜在晶体学断裂面推理与生成：视觉-语言模型方法

Qinwu Xu, Xiaofu Ma, Yifan Jiang

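Affiliations: Independent research

AI summary: Studies whether multimodal large language models can use Miller indices as a structured latent representation to reason about fracture geometry. Experiments show that the models can perform latent inference under idealized conditions and can reject the representation when the physics does not support it.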

AI中文摘要

我们研究多模态大语言模型（MLLMs）是否能够利用晶体学平面指数（米勒指数）作为结构化潜在表示来推理断裂几何。我们将米勒指数 $z = (h,k,l)$ 形式化为控制理想平面断裂的潜在变量，并评估两种互补能力：(i) 潜在推理，即模型在物理有效条件下将视觉观测映射到平面假设；(ii) 潜在适用性评估，即模型判断这种表示对于给定断裂图像是否有意义。通过涵盖合成数据、受控的2D-3D几何对以及多种材料类别（包括陶瓷、玻璃、金属和混凝土）的真实断裂图像的广泛实验，我们表明MLLMs能够在理想设置下可靠地进行潜在推理，并且关键的是，当底层物理不支持时，能够拒绝该潜在表示。作为探索性扩展，我们进一步检查了AI生成的断裂序列，并观察到定性上合理的脆性断裂进展行为，这表明多模态生成模型可能编码了与材料失效动力学相关的部分隐式物理先验。这些结果表明，只要明确建模有效性域，MLLMs可以作为基于结构化潜在先验的物理感知推理系统。

英文摘要

We study whether multimodal large language models (MLLMs) can leverage crystallographic plane indices (Miller indices) as a structured latent representation for reasoning about fracture geometry. We formulate Miller indices $z = (h,k,l)$ as a latent variable governing idealized planar fracture and evaluate two complementary capabilities: (i) latent inference, where the model maps visual observations to plane hypotheses under physically valid conditions, and (ii) latent applicability assessment, where the model determines whether such a representation is meaningful for a given fracture image. Through extensive experiments spanning synthetic data, controlled 2D-3D geometric pairs, and real-world fracture images across multiple material classes (including ceramics, glass, metals, and concrete), we show that MLLMs can reliably perform latent inference in idealized settings and, critically, can reject the latent representation when the underlying physics does not support it. As an exploratory extension, we further examine AI-generated fracture sequences and observe qualitatively plausible brittle-fracture progression behaviors, suggesting that multimodal generative models may encode partial implicit physical priors related to material failure dynamics. These results suggest that MLLMs can act as physics-aware reasoning systems conditioned on structured latent priors, provided that the domain of validity is explicitly modeled.
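To make the latent variable $z = (h,k,l)$ concrete, the sketch below converts Miller indices into a plane normal and computes the angle between two planes. It assumes a cubic lattice, where the (h k l) plane is perpendicular to the [h k l] direction; other crystal systems need the metric tensor, and the function names are illustrative rather than taken from the paper.

```python
import math


def plane_normal(h, k, l):
    """Unit normal of the (h k l) plane; valid for cubic lattices only."""
    norm = math.sqrt(h * h + k * k + l * l)
    if norm == 0:
        raise ValueError("(0 0 0) is not a valid set of Miller indices")
    return (h / norm, k / norm, l / norm)


def interplanar_angle_deg(p, q):
    """Angle in degrees between planes p = (h, k, l) and q = (h', k', l')."""
    n1, n2 = plane_normal(*p), plane_normal(*q)
    cos_angle = sum(a * b for a, b in zip(n1, n2))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
```

For example, `interplanar_angle_deg((1, 0, 0), (1, 1, 0))` gives 45 degrees, the kind of geometric relation a model would have to respect when mapping a fracture image to a plane hypothesis.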

2605.20278 2026-05-26 cs.LG cs.AI cs.CV version update

Reducing Memorization in Diffusion Models with Higher-Order Langevin Dynamics

Benjamin Sterling, Mónica F. Bugallo, Tom Tirer

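Affiliations: Department of Applied Math & Statistics; Stony Brook University; Department of Electrical and Computer Engineering; Faculty of Engineering; Bar-Ilan University

AI summary: Studies the effect of Higher-Order Langevin Dynamics (HOLD) on memorization in diffusion models. Theoretical analysis shows that in HOLD the data variable follows a low-pass-filtered version of the learned score function, with smoothness increasing with the order, which mitigates memorization; experiments on real-world data support the theory.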

AI中文摘要

扩散/基于分数的模型已成为强大的生成模型，能够生成模仿训练数据分布的高质量样本。然而，观察到它们容易重现训练样本——称为“记忆化”——可能违反版权和隐私。在本文中，我们研究了高阶朗之万动力学（HOLD）对这一现象的影响。HOLD扩散过程引入了辅助变量；如果数据变量被解释为“位置”，那么辅助变量可以解释为“速度”和“加速度”，具体取决于所选模型的阶数。它们最初是基于这样的直觉提出的：通过隐式施加额外的动力学约束来正则化数据变量的轨迹。据我们所知，我们的工作首次提供了HOLD正则化效应的理论刻画。具体来说，我们表明在HOLD中，数据变量的动力学由学习得分函数的低通滤波版本控制，其平滑度随HOLD阶数增加而增加。然后我们分析了最优经验得分和分布崩溃的可能性。总之，我们的结果解释了随着模型阶数增加记忆化的缓解。最后，我们在真实世界数据上进行了实证研究，支持了我们的理论，并突出了HOLD在实践中相对于标准扩散的这一独特优势。

英文摘要

Diffusion/score-based models have emerged as powerful generative models, capable of generating high-quality samples that mimic the training data distribution. However, it has been observed that they are prone to reproducing training samples, a behavior known as "memorization," potentially violating copyright and privacy. In this paper, we study the effect of Higher-Order Langevin Dynamics (HOLD) on this phenomenon. HOLD diffusion processes introduce auxiliary variables; if the data variable is interpreted as "position," then the auxiliary variables can be interpreted as "velocity" and "acceleration," depending on the chosen order of the model. They were originally proposed based on the intuition that they regularize the trajectories of the data variable by implicitly imposing additional dynamical constraints. Our work provides, to our knowledge, the first theoretical characterization of the regularization effect of HOLD. Specifically, we show that in HOLD, the dynamics of the data variable are governed by a low-pass-filtered version of the learned score function, with smoothness increasing with the order of HOLD. We then analyze the optimal empirical score and the possibility of distribution collapse. Together, our results explain the mitigation of memorization as the model order increases. Finally, we present an empirical study on real-world data that supports our theory and highlights this distinct advantage of HOLD over standard diffusion in practice.

2605.18840 2026-05-26 cs.LG cs.AI cs.CL version update

The Growing Pains of Frontier Models: When Leaderboards Stop Separating and What to Measure Next

前沿模型的成长之痛：当排行榜不再区分以及接下来衡量什么

Adil Amin

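Affiliations: Zehen Labs

AI summary: Decomposes SWE-bench and GPQA Diamond scores into a population coupling trend and a per-release residual (the h-field) to diagnose cooperation and trade-offs among frontier-model capabilities, and provides a three-step diagnostic, a per-lab measurement-priority table, and seven falsifiable predictions.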

Comments 13 pages, 5 figures, 4 tables. Companion paper: "Lying Is Just a Phase: The Hidden Alignment Transition in Language Model Scaling." ( https://doi.org/10.48550/arXiv.2605.18838 ). Code: https://github.com/adilamin89/cape-scaling . Dashboard: https://zehenlabs.com/cape/

AI中文摘要

排行榜在独立轴上对前沿模型进行排名，但并未揭示能力在版本间是相互增强还是权衡——而在前沿，这种相互作用是更具信息量的信号。我们将配对的SWE-bench和GPQA Diamond分数分解为种群耦合趋势和每版本残差（h场），该残差从两个公开基准分数诊断能力重点。在来自10个实验室的34个模型（2024-2026）中，能力相互协作（r = +0.72，p < 10^{-6}），但协作程度系统性地变化：每个实验室的耦合斜率跨度达5倍（谷歌1.15 vs. DeepSeek 0.23），且实验室发生转向——DeepSeek从推理密集型逆转为编码优先（Δh = 15.9个百分点）；Anthropic在编码偏离和恢复之间振荡。种群回归作为等斜线相边界：用于识别基础尺度耦合转变的相同分类器√[(a/b)·B₁] [Amin, 2026] 对前沿模型进行分类，并已在下一个转变处检测到混合相行为（两个模型低于GPQA-IFEval等斜线）。h场不仅具有诊断性——它还告诉你需要改变什么。预训练建立耦合为0.871，而RLHF增加0.081 [Amin, 2026]：预训练级别的转变是永久的（DeepSeek的四个版本逆转持续存在），后训练转变是可逆的（Anthropic的三次编码偏离均在单个版本内恢复），仅推理计算在不重新训练的情况下将h改变+7.8个百分点。知道哪个组件占主导地位决定了是重新训练还是等待。我们提供了三步诊断法（定位、分类、预测）、每实验室测量优先级表以及七个带有时间戳标准的可证伪预测。五个截止日期后的版本落在95%预测区间内。代码、数据和交互式仪表盘：https://zehenlabs.com/cape/。

英文摘要

Leaderboards rank frontier models on independent axes but do not reveal whether capabilities reinforce or trade off across releases, and at the frontier this interaction is the more informative signal. We decompose paired SWE-bench and GPQA Diamond scores into a population coupling trend and a per-release residual ($h$-field) that diagnoses capability emphasis from two public benchmark scores. Across 34 models from 10 labs (2024-2026), capabilities cooperate ($r = +0.72$, $p < 10^{-6}$), but cooperation varies systematically: per-lab coupling slopes span $5\times$ (Google $1.15$ vs. DeepSeek $0.23$), and labs pivot: DeepSeek reversed from reasoning-rich to coding-first ($Δh = 15.9$ pp); Anthropic oscillates between coding excursions and recovery. The population regression serves as an isocline phase boundary: the same $\sqrt{(a/b)\cdot B_1}$ classifier that identifies the base-scale coupling transition [Amin, 2026] classifies frontier models and already detects mixed-phase behavior at the next transition (two models below the GPQA-IFEval isocline). The $h$-field is not just diagnostic; it tells you what to change. Pretraining establishes coupling at $0.871$ while RLHF adds $0.081$ [Amin, 2026]: pretraining-level shifts are permanent (DeepSeek's four-release reversal persists), post-training shifts are reversible (Anthropic's three coding excursions each recover within one release), and inference compute alone shifts $h$ by $+7.8$ pp without retraining. Knowing which component dominates determines whether to retrain or wait. We provide a three-step diagnostic (locate, classify, predict), a per-lab measurement-priority table, and seven falsifiable predictions with timestamped criteria. Five post-cutoff releases fall within the 95% prediction interval. Code, data, and an interactive dashboard: https://zehenlabs.com/cape/.
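One plausible reading of the trend-plus-residual decomposition is an ordinary least-squares line fitted across all releases, with each release's residual playing the role of $h$. The abstract does not spell out the exact definition, so the sketch below is an assumption; the scores are made-up placeholders and not data from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(xs, ys):
    """Per-release deviation from the population trend (a stand-in for h)."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Hypothetical scores in percent for four releases; NOT real benchmark results.
swe_bench = [40.0, 50.0, 60.0, 70.0]
gpqa = [55.0, 58.0, 70.0, 73.0]

slope, intercept = fit_line(swe_bench, gpqa)
h = residuals(swe_bench, gpqa)
```

Under this reading, a positive residual marks a release that scores higher on the second benchmark than the population trend predicts, and a negative one marks the opposite emphasis.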

2605.18657 2026-05-26 cs.LG cs.AI version update

KairosHope: A Next-Generation Time-Series Foundation Model for Specialized Classification via Dual-Memory Architecture

KairosHope: 一种基于双记忆架构的下一代时间序列基础模型，用于专门分类

Luis Balderas, José Alberto Rodríguez, Miguel Lastra, Antonio Arauzo-Azofra, José M. Benítez

Affiliations: Department of Computer Science and Artificial Intelligence; DiCITS, iMUDS, DaSCI; University of Granada; Advanced Medical Imaging Group; Instituto de Investigación Biosanitaria de Granada (ibs.Granada); Department of Software Engineering; Department of Rural Engineering; University of Córdoba

AI summary: To address the computational bottleneck of standard attention and the omission of classical statistical knowledge, proposes KairosHope, which replaces quadratic attention with a dual-memory system (Titans modules and a Continuum Memory System, CMS) and adds a hybrid decision head that fuses deep representations with statistical features, achieving superior classification performance on the UCR benchmark.

AI中文摘要

时间序列基础模型（TSFMs）在通用预测任务中取得了显著成功；然而，它们对专门分类问题的适应仍然受到标准注意力的计算瓶颈和对经典统计知识的系统性忽略的限制。本技术报告介绍了KairosHope，一种下一代TSFM，旨在协调大规模泛化与分类任务中的分析精度。该提案的核心是HOPE块，一种用双记忆系统替代二次注意力的架构：用于动态短期保留的Titans模块和用于长期历史上下文抽象的连续记忆系统（CMS）。为了丰富归纳偏差，引入了混合决策头，它将深度潜在表示与通过tsfeatures包提取的确定性统计特征融合。KairosHope在大型Monash档案上进行自监督预训练，结合了掩码时间序列建模（MTSM）和对比学习（InfoNCE）。随后，通过严格的线性探测和全微调（LP-FT）协议在UCR基准数据集上进行适应，以防止灾难性遗忘。实验结果表明，在具有严格时间因果关系的领域（如HAR或传感器数据）中，性能优越。因此，KairosHope为基础模型适应时间序列分析建立了一个稳健高效的框架。

英文摘要

Time Series Foundation Models (TSFMs) have demonstrated notable success in general-purpose forecasting tasks; however, their adaptation to specialized classification problems remains constrained by the computational bottleneck of standard attention and the systematic omission of classical statistical knowledge. This technical report introduces KairosHope, a next-generation TSFM designed to reconcile massive generalization with analytical precision in classification tasks. The core of the proposal is the HOPE block, an architecture that replaces quadratic attention with a dual-memory system: Titans modules for dynamic short-term retention and a Continuum Memory System (CMS) for the abstraction of long-term historical context. To enrich the inductive bias, a Hybrid Decision Head is introduced, which fuses deep latent representations with deterministic statistical features extracted via the tsfeatures package. KairosHope undergoes self-supervised pre-training on the massive Monash archive, combining Masked Time Series Modeling (MTSM) and contrastive learning (InfoNCE). Its subsequent adaptation to the UCR benchmark datasets is conducted through a rigorous Linear Probing and Full Fine-Tuning (LP-FT) protocol to prevent catastrophic forgetting. Empirical results demonstrate superior performance in domains characterized by strict temporal causality such as HAR or Sensor data. Consequently, KairosHope establishes a robust and efficient framework for the adaptation of foundation models to time series analysis.
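The contrastive term named in the abstract, InfoNCE, has a standard form that can be sketched for a single anchor. How KairosHope builds positive and negative pairs is not stated, so the similarity inputs and the temperature of 0.1 below are assumptions for illustration.

```python
import math


def info_nce(sim_positive, sim_negatives, temperature=0.1):
    """InfoNCE loss for one anchor: -log softmax of the positive similarity.

    sim_positive: similarity between the anchor and its positive view.
    sim_negatives: similarities between the anchor and each negative sample.
    """
    logits = [sim_positive / temperature] + [s / temperature for s in sim_negatives]
    top = max(logits)
    # Log-sum-exp with the maximum subtracted for numerical stability.
    log_sum = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_sum - logits[0]
```

The loss falls toward zero as the positive similarity exceeds every negative similarity by a wide margin, and equals the log of the number of candidates when all similarities are equal.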

2605.16591 2026-05-26 cs.LG cs.AI version update

How Few-Shot Examples Add Up: A Causal Decomposition of Function Vectors in In-Context Learning

少样本示例如何累加：上下文学习中函数向量的因果分解

Entang Wang, Yiwei Wang, Aleksandra Bakalova, Michael Hahn

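AI summary: Through a causal decomposition, shows that function vectors under few-shot prompting are linear combinations of example-level sub-vectors, and finds that models adjust each example's contribution to the context through attention reweighting.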

Comments Accepted at ICML 2026. 70 pages, 65 figures

AI中文摘要

上下文学习（ICL）擅长从极少量示例中学习新任务，但我们仍缺乏对少样本提示如何塑造模型函数向量（FV）——一种驱动ICL查询任务行为的因果激活方向——的机制性解释。跨任务和模型，一个$n$样本FV可以通过示例级子FV的线性组合很好地近似，表明来自单个演示的贡献具有加性和可组合性。除了加性之外，我们展示了模型基于先前示例对单个示例的表示进行上下文化，以自适应地重新加权哪些演示主导FV：注意力转向在上下文中信息量更大、歧义更少的示例。最后，因果分解将查询-键路由与值更新分离，发现上下文化对FV质量最一致的贡献来自查询-键对齐——尤其是在歧义设置中——而值介导的效应则更加异质。综合起来，这些结果将加性叠加与上下文相关的注意力重加权统一为一个机制性的、可检验的说明，解释少样本提示如何实现任务。

英文摘要

In-context learning (ICL) excels at new tasks from minimal examples, yet we still lack a mechanistic explanation of how few-shot prompts shape a model's function vector (FV), a causal activation direction that drives task behavior on the ICL query. Across tasks and models, an $n$-shot FV is well-approximated by a linear combination of example-level sub-FVs, suggesting additive and composable contributions from individual demonstrations. Beyond additivity, we show that models contextualize individual examples' representations based on prior examples to adaptively reweight which demonstrations dominate the FV: attention shifts toward examples that are more informative and less ambiguous under the context. Finally, a causal decomposition separates Query-Key routing from Value updates, finding that contextualization's most consistent contributions to FV quality arise from Query-Key alignment, particularly in ambiguous settings, while Value-mediated effects are more heterogeneous. Together, these results unify additive superposition with context-dependent attention reweighting into a mechanistic, testable account of how few-shot prompts implement tasks.
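The additivity claim can be checked in miniature. The sketch below fits a single scale factor to an equal-weight sum of sub-FVs, which is a restricted special case of the general linear combination described in the abstract; the vectors are toy numbers, not activations from any model, and the function names are illustrative.

```python
import math


def vector_sum(vectors):
    """Element-wise sum of equal-length vectors."""
    return [sum(component) for component in zip(*vectors)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def additive_fit(n_shot_fv, sub_fvs):
    """Return (alpha, relative_error) for n_shot_fv ~ alpha * sum(sub_fvs).

    alpha is the least-squares scalar; relative_error is the norm of the
    residual divided by the norm of the n-shot vector.
    """
    total = vector_sum(sub_fvs)
    alpha = dot(n_shot_fv, total) / dot(total, total)
    residual = [t - alpha * s for t, s in zip(n_shot_fv, total)]
    rel_err = math.sqrt(dot(residual, residual) / dot(n_shot_fv, n_shot_fv))
    return alpha, rel_err


# Toy vectors, purely illustrative.
sub_fvs = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
n_shot_fv = [0.5, 0.5, 1.5]
alpha, rel_err = additive_fit(n_shot_fv, sub_fvs)
```

A small relative error would indicate that the demonstrations contribute additively; per-example weights, as in the paper's reweighting analysis, would require a full least-squares fit over one coefficient per sub-FV.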

2605.16409 2026-05-26 cs.CV cs.CL cs.LG version update

Multilingual OCR-Aware Fine-Tuning and Prompt-Guided Chain-of-Thought Reasoning for Multimodal Large Language Models

多语言OCR感知微调和提示引导的链式思维推理用于多模态大语言模型

Qinwu Xu, Yifan Jiang, Haoyu Ren

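Affiliations: Meta AI; UT Austin

AI summary: Proposes a multilingual OCR-aware multimodal training framework that uses synthetic data generation, OCR-aware fine-tuning, and structured visual chain-of-thought prompting to improve the OCR completeness and multilingual translation accuracy of multimodal large language models under difficult visual conditions.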

AI中文摘要

光学字符识别（OCR）和多语言文本理解仍然是多模态大语言模型（MLLMs）的主要失败模式，尤其是在包含杂乱布局、小字体、模糊、遮挡和复杂排版的真实世界图像中。我们提出了一种OCR感知的多语言多模态训练框架，该框架结合了（i）大规模合成OCR到翻译数据生成，（ii）使用LoRA适配的OCR感知监督微调（SFT），以及（iii）在不确定视觉条件下进行推理的结构化视觉链式思维（CoT）提示。使用基于LLaMA的多模态架构，所提出的框架在OCR完整性、多语言翻译准确性和退化视觉条件下的鲁棒性方面有了显著提升。在多语言收据、菜单、海报、标志、手写文本和文档图像上的实验结果表明，与基线模型相比，视觉-文本对齐显著改善。特别是，所提出的OCR感知后训练框架提高了对小、模糊、空间分散和部分遮挡文本的提取，同时减少了对不确定OCR条件下语言先验的依赖。与前沿多模态系统（包括GPT-5类和Gemini系列模型）的定性比较进一步表明，在噪声和视觉模糊的OCR场景下，OCR对齐得到改善，幻觉减少。总体而言，结果表明，以数据为中心的OCR感知多模态后训练为改进多语言OCR和基于OCR的视觉问答系统提供了一种有效且可扩展的方向。

英文摘要

Optical character recognition (OCR) and multilingual text understanding remain major failure modes of multimodal large language models (MLLMs), particularly in real-world images containing cluttered layouts, small fonts, blur, occlusion, and complex typography. We present an OCR-aware multilingual multimodal training framework that combines (i) large-scale synthetic OCR-to-translation data generation, (ii) OCR-aware supervised fine-tuning (SFT) with LoRA adaptation, and (iii) structured visual chain-of-thought (CoT) prompting for reasoning under uncertain visual conditions. Using a LLaMA-based multimodal architecture, the proposed framework substantially improves OCR completeness, multilingual translation accuracy, and robustness under degraded visual conditions. Experimental results on multilingual receipts, menus, posters, signs, handwritten text, and document images demonstrate significantly improved visual-text grounding compared with the baseline model. In particular, the proposed OCR-aware post-training framework improves extraction of small, blurred, spatially scattered, and partially occluded text while reducing reliance on language priors under uncertain OCR conditions. Qualitative comparisons with frontier multimodal systems, including GPT-5-class and Gemini-family models, further suggest improved OCR grounding and reduced hallucination under noisy and visually ambiguous OCR scenarios. Overall, the results indicate that data-centric OCR-aware multimodal post-training provides an effective and scalable direction for improving multilingual OCR and OCR-based visual question answering systems.

2605.14605 2026-05-26 cs.CR cs.AI cs.LG version update

One Step to the Side: Why Defenses Against Malicious Finetuning Fail Under Adaptive Adversaries

一步之遥：为什么针对恶意微调的防御在自适应对手面前失败

Itay Zloczower, Eyal Lenga, Gilad Gressel, Yisroel Mirsky

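Affiliations: Ben-Gurion University of the Negev; Amrita Vishwa Vidyapeetham

AI summary: Analyzing 15 recent defenses, the paper finds that they share one weakness: they obscure or misdirect the path to harmful behavior without removing the behavior itself. It then develops a unified adaptive attack that breaks all of the defense mechanisms.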

Comments Under review

AI中文摘要

模型提供商越来越多地发布开放权重或允许用户通过API微调基础模型。尽管这些模型在发布前经过安全对齐，但其防护措施通常可以通过对有害数据的微调来移除。最近的防御旨在使模型对此类恶意微调具有鲁棒性，但它们主要仅针对不考虑防御的固定攻击进行评估。我们表明这些鲁棒性声明是不完整的。通过调查15种近期防御，我们识别了几种防御机制，并表明它们共享一个单一弱点：它们掩盖或误导通往有害行为的路径，而不移除行为本身。然后，我们开发了一种统一的自适应攻击，突破了所有防御机制。我们的结果表明，当前方法并未提供稳健的安全性；它们主要阻止了它们所设计的攻击。我们希望我们针对这一领域的统一自适应对手将帮助未来的研究人员和实践者在部署前对新防御进行压力测试。

英文摘要

Model providers increasingly release open weights or allow users to fine-tune foundation models through APIs. Although these models are safety-aligned before release, their safeguards can often be removed by fine-tuning on harmful data. Recent defenses aim to make models robust to such malicious fine-tuning, but they are largely evaluated only against fixed attacks that do not account for the defense. We show that these robustness claims are incomplete. Surveying 15 recent defenses, we identify several defense mechanisms and show that they share a single weakness: they obscure or misdirect the path to harmful behavior without removing the behavior itself. We then develop a unified adaptive attack that breaks defenses across all defense mechanisms. Our results show that current approaches do not provide robust security; they mainly stop the attacks they were designed against. We hope that our unified adaptive adversary for this domain will help future researchers and practitioners stress-test new defenses before deployment.

2605.13282 2026-05-26 cs.AI cs.LG version update

An Uncertainty-Aware Resilience Micro-Agent for Causal Observability in the Computing Continuum

Suvi De Silva, Alfreds Lapkovskis, Alaa Saleh, Sasu Tarkoma, Praveen Kumar Donta

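Affiliations: Department of Computer Systems and Sciences; Department of Computer Science

AI summary: Proposes the AURORA framework, which integrates the free-energy principle, causal do-calculus, and localized causal state-graphs to diagnose and mitigate grey failures causally at the edge tier, and uses a dual-gated execution mechanism to avoid destructive interventions when uncertainty is high.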

AI中文摘要

计算连续体中的灰色故障会产生模糊重叠的症状，现有方法由于缺乏因果意识或在高度认知不确定性下行动，无法可靠诊断，并可能导致破坏性干预。本文提出了一种面向因果可观测性的不确定性感知韧性微代理（AURORA），这是一个轻量级框架，用于诊断和缓解边缘层环境中的灰色故障。该框架采用并行微代理，集成自由能原理、因果do-calculus和局部因果状态图，支持每个故障马尔可夫毯内的反事实根因分析。将推理限制在因果相关变量上可降低计算开销，同时保持诊断保真度。AURORA进一步引入双门控执行机制，仅在因果置信度高且预测认知不确定性有界时授权修复；否则，放弃本地干预并将诊断有效载荷升级到雾层。我们的实验表明，AURORA优于基线，实现了0%的破坏性行动率，同时保持62.0%的修复准确率和3ms的平均修复时间。

英文摘要

Grey failures in the computing continuum produce ambiguous, overlapping symptoms that existing approaches fail to diagnose reliably, because they either lack causal awareness or act under high epistemic uncertainty, which risks destructive interventions. This paper presents an uncertainty-aware resilience micro-agent for causal observability (AURORA), a lightweight framework for diagnosing and mitigating grey failures in edge-tier environments. The framework employs parallel micro-agents that integrate the free-energy principle, causal do-calculus, and localized causal state-graphs to support counterfactual root-cause analysis within each fault's Markov blanket. Restricting inference to causally relevant variables reduces computational overhead while preserving diagnostic fidelity. AURORA further introduces a dual-gated execution mechanism that authorizes remediation only when causal confidence is high and predicted epistemic uncertainty is bounded; otherwise, it abstains from local intervention and escalates the diagnostic payload to the fog tier. Our experiments demonstrate that AURORA outperforms baselines, achieving a 0% destructive action rate while maintaining 62.0% repair accuracy and a 3 ms mean time to repair.
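The dual-gated execution rule reads naturally as a two-condition check. The sketch below is a hypothetical rendering: the `Diagnosis` fields, the action labels, and the thresholds 0.9 and 0.2 are assumptions chosen for illustration, since the abstract does not give concrete values or interfaces.

```python
from dataclasses import dataclass


@dataclass
class Diagnosis:
    root_cause: str
    causal_confidence: float      # assumed to lie in [0, 1]; higher is better
    epistemic_uncertainty: float  # assumed non-negative; lower is better


def dual_gate(diagnosis, min_confidence=0.9, max_uncertainty=0.2):
    """Authorize local repair only if BOTH gates pass; otherwise escalate."""
    confident = diagnosis.causal_confidence >= min_confidence
    bounded = diagnosis.epistemic_uncertainty <= max_uncertainty
    if confident and bounded:
        return ("remediate_locally", diagnosis.root_cause)
    # Abstain from local intervention and hand the diagnosis to the fog tier.
    return ("escalate_to_fog", diagnosis.root_cause)
```

Requiring both gates is what keeps the agent from acting on a confident but poorly supported diagnosis, which is the failure mode the abstract associates with destructive interventions.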

2605.10302 2026-05-26 cs.LG version update

Follow the Mean: Reference-Guided Flow Matching

跟随均值：参考引导的流匹配

Pedro M. P. Curvo, Maksim Zhdanov, Floor Eijkelboom, Jan-Willem van de Meent

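Affiliations: University of Amsterdam; AMLab

AI summary: Proposes steering a pretrained flow matching model toward controllable generation by changing the mean of its reference set, without fine-tuning or additional networks.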

AI中文摘要

现有的可控生成方法通常依赖于微调、辅助网络或测试时搜索。我们证明流匹配提供了不同的控制接口：通过示例进行自适应。对于确定性插值，速度场仅由条件端点均值决定；移动该均值会移动流本身。这为可控生成提供了一个简单原则：通过改变模型遵循的参考集来引导预训练模型。我们以两种形式实例化这一思想。参考均值引导无需训练：它从参考库中计算封闭形式的端点均值修正，并将其应用于冻结的FLUX.2-klein（4B）模型，在保持提示、种子和权重不变的情况下，实现对颜色、身份、风格和结构的控制。半参数引导通过显式均值锚点和学习到的残差精炼器摊销相同的思想，在AFHQv2上匹配无条件的DiT-B/4质量，同时允许在推理时交换参考集。这些结果指向一个更广泛的方向：通过数据而非参数更新进行自适应的生成模型。

英文摘要

Existing approaches to controllable generation typically rely on fine-tuning, auxiliary networks, or test-time search. We show that flow matching admits a different control interface: adaptation through examples. For deterministic interpolants, the velocity field is solely governed by a conditional endpoint mean; shifting this mean shifts the flow itself. This yields a simple principle for controllable generation: steer a pretrained model by changing the reference set it follows. We instantiate this idea in two forms. Reference-Mean Guidance is training-free: it computes a closed-form endpoint-mean correction from a reference bank and applies it to a frozen FLUX.2-klein (4B) model, enabling control of color, identity, style, and structure while keeping the prompt, seed, and weights fixed. Semi-Parametric Guidance amortizes the same idea through an explicit mean anchor and learned residual refiner, matching unconditional DiT-B/4 quality on AFHQv2 while allowing the reference set to be swapped at inference time. These results point to a broader direction: generative models that adapt through data, not parameter updates.
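The statement that the velocity field is governed by a conditional endpoint mean can be made concrete for the linear interpolant $x_t = (1-t)x_0 + t x_1$, where the marginal velocity is $(\mathbb{E}[x_1 \mid x_t] - x_t)/(1-t)$. The sketch below assumes that interpolant; the `reference_shift` argument is a hypothetical stand-in for whatever correction the paper derives from its reference bank.

```python
def velocity_from_endpoint_mean(x_t, endpoint_mean, t):
    """Velocity for the linear interpolant: (E[x_1 | x_t] - x_t) / (1 - t)."""
    if not 0.0 <= t < 1.0:
        raise ValueError("t must lie in [0, 1)")
    return [(m - x) / (1.0 - t) for x, m in zip(x_t, endpoint_mean)]


def guided_velocity(x_t, endpoint_mean, reference_shift, t):
    """Shift the endpoint mean, then recompute the velocity.

    reference_shift is an illustrative correction vector; how it would be
    derived from a reference set is not specified here.
    """
    shifted = [m + s for m, s in zip(endpoint_mean, reference_shift)]
    return velocity_from_endpoint_mean(x_t, shifted, t)
```

Because the velocity is linear in the endpoint mean, a shift of `d` in the mean changes the velocity by exactly `d / (1 - t)`, which is why moving the mean moves the flow without touching model weights.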

2605.06505 2026-05-26 cs.LG cs.AI cs.CR version update

PACZero: PAC-Private Fine-Tuning of Language Models via Sign Quantization

PACZero: 通过符号量化的语言模型PAC隐私微调

Murat Bilgehan Ertan, Xiaochen Zhu, Phuong Ha Nguyen, Marten van Dijk, Srinivas Devadas

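Affiliations: CWI Amsterdam; MIT Cambridge; Vrije Universiteit Amsterdam

AI summary: Proposes PACZero, a family of zeroth-order mechanisms that use sign quantization to achieve PAC-private fine-tuning at zero mutual information, with competitive results on SST-2 and SQuAD.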

AI中文摘要

我们引入了PACZero，一系列用于微调大型语言模型的PAC隐私零阶机制，在$I(S^*; Y_{1:T})=0$时提供可用的效用。该隐私机制将成员推断攻击（MIA）后验成功率限制在先验水平，这是DP框架仅在$\varepsilon=0$和无限噪声下才能达到的MIA抵抗水平。所有下面的DP-ZO比较都在MIA后验水平上匹配。关键见解是，PAC隐私仅在发布依赖于哪个候选子集是秘密时才对互信息收费。对子集聚合的零阶梯度进行符号量化会产生频繁的一致步骤，即每个候选子集在更新方向上达成一致；在这些步骤中，发布的符号花费零条件互信息。我们提出了两个变体，涵盖隐私-效用权衡：PACZero-MI（通过对二元发布进行精确校准的预算化MI）和PACZero-ZPL（在分歧步骤上通过均匀硬币翻转实现$I=0$）。我们在SST-2和SQuAD上使用OPT-1.3B和OPT-6.7B在LoRA和全参数轨道上进行了评估。在SST-2 OPT-1.3B全微调$I=0$时，PACZero-ZPL达到$88.99\pm0.91$，比非私有MeZO基线（$91.1$ FT）低2.1个百分点。在$\varepsilon<1$的高隐私机制下，没有先前方法能产生可用的效用，而PACZero-ZPL在$I=0$时在OPT-1.3B和OPT-6.7B上获得了有竞争力的SST-2准确率和非平凡的SQuAD F1分数。

英文摘要

We introduce PACZero, a family of PAC-private zeroth-order mechanisms for fine-tuning large language models that delivers usable utility at $I(S^*; Y_{1:T})=0$. This privacy regime bounds the membership-inference attack (MIA) posterior success rate at the prior, an MIA-resistance level the DP framework matches only at $\varepsilon=0$ and infinite noise. All DP-ZO comparisons below are matched at the MIA posterior level. The key insight is that PAC Privacy charges mutual information only when the release depends on which candidate subset is the secret. Sign-quantizing subset-aggregated zeroth-order gradients creates frequent unanimity, steps at which every candidate subset agrees on the update direction; at these steps the released sign costs zero conditional mutual information. We propose two variants that span the privacy-utility trade-off: PACZero-MI (budgeted MI via exact calibration on the binary release) and PACZero-ZPL ($I=0$ via a uniform coin flip on disagreement steps). We evaluate on SST-2 and SQuAD with OPT-1.3B and OPT-6.7B in both LoRA and full-parameter tracks. On SST-2 OPT-1.3B full fine-tuning at $I=0$, PACZero-ZPL reaches ${88.99\pm0.91}$, within $2.1$pp of the non-private MeZO baseline ($91.1$ FT). No prior method produces usable utility in the high-privacy regime $\varepsilon<1$, and PACZero-ZPL obtains competitive SST-2 accuracy and nontrivial SQuAD F1 across OPT-1.3B and OPT-6.7B at $I=0$.
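The unanimity idea can be sketched for a single update step. This is a simplified illustration of the zero-leakage variant as the abstract describes it (release the shared sign when all candidate subsets agree, flip a uniform coin otherwise); it assumes each candidate subset contributes one scalar zeroth-order gradient estimate, and treating an estimate of exactly zero as positive is an arbitrary choice made here.

```python
import random


def release_sign(subset_grads, rng):
    """Return (sign, unanimous) for one step.

    subset_grads: one scalar zeroth-order gradient estimate per candidate subset.
    rng: a random.Random instance used only on disagreement steps.
    """
    signs = {1 if g >= 0 else -1 for g in subset_grads}
    if len(signs) == 1:
        # Every candidate subset agrees, so the released sign does not
        # depend on which subset is the secret one.
        return signs.pop(), True
    # Disagreement: release a uniform coin flip that ignores the data.
    return rng.choice((-1, 1)), False
```

In both branches the output is the same whichever candidate subset is the secret, which is the intuition behind the zero mutual information claim; the budgeted PACZero-MI variant is not covered by this sketch.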

2605.06259 2026-05-26 cs.LG cs.CR version update

Trade-off Functions for DP-SGD with Subsampling based on Random Shuffling: Tight Upper and Lower Bounds

基于随机洗牌的DP-SGD的权衡函数：紧的上界和下界

Marten van Dijk, Murat Bilgehan Ertan

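Affiliations: CWI Amsterdam; Vrije Universiteit Amsterdam

AI summary: Within the $f$-DP framework, derives a tight analysis of the trade-off function for differentially private stochastic gradient descent (DP-SGD) with subsampling based on random shuffling, obtains transparent and interpretable closed-form bounds, and shows the parameter settings needed to reach meaningful differential privacy in a single training epoch.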

AI中文摘要

我们在$f$-DP框架下，针对基于随机洗牌子采样的差分隐私随机梯度下降（DP-SGD），推导了权衡函数的紧致分析。我们的分析涵盖了噪声乘数$σ$满足$σ\geq \sqrt{3/\ln M}$的情形，其中$M$是单个epoch内的轮数。与泊松子采样的$f$-DP分析（产生非封闭的隐式公式，可机器计算但不透明）不同，随机洗牌允许紧致分析，得到透明且可解释的闭式界。我们通过Berry-Esseen定理推导的具体界，在证明框架内紧致到常数因子。我们展示了单个epoch（$E=1$）的工作参数设置，对应的权衡函数$\geq 1-a-δ$，即仅比理想随机猜测对角线$1-a$低$δ$：对于$δ=1/100$和$σ=1$，大约$M \approx 1.14\times 10^6$轮和$N \approx 1.14\times 10^7$训练样本足以实现有意义的差分隐私。这与最近关于$σ\leq 1/\sqrt{2 \ln M}$情形的负面结果形成对比。我们的具体界可以在多个epoch上组合，导致$δ$与$E$呈线性依赖关系，这限制了$E=O(\sqrt{M})$。为了超越Berry-Esseen，我们引入了一种新的证明技术，基于大数定律的推广，得到了渐近随机猜测对角线极限结果：如果$E=c_M^2M$且$c_M\to 0$，则$E$次组合的权衡函数满足$f^{\otimes E}(a)\to 1-a$在$a\in[0,1]$上一致，且$δ$仅具有$O(\sqrt{E})$的依赖关系。我们将这种渐近状态与相应的泊松子采样渐近进行比较，并将显式收敛速率的刻画作为一个开放问题。

英文摘要

We derive a tight analysis of the trade-off function for Differentially Private Stochastic Gradient Descent (DP-SGD) with subsampling based on random shuffling within the $f$-DP framework. Our analysis covers the regime $σ\geq \sqrt{3/\ln M}$, where $σ$ is the noise multiplier and $M$ is the number of rounds within a single epoch. Unlike $f$-DP analyses for Poisson subsampling, which yield non-closed implicit formulas that can be machine computed but are non-transparent, random shuffling admits a tight analysis yielding transparent and interpretable closed-form bounds. Our concrete bounds, derived via the Berry-Esseen theorem, are tight up to constant factors within the proof framework. We demonstrate worked parameter settings for a single epoch ($E=1$) with a corresponding trade-off function $\geq 1-a-δ$, that is, only $δ$ below the ideal random guessing diagonal $1-a$: for $δ= 1/100$ and $σ= 1$, roughly $M \approx 1.14\times 10^6$ rounds and $N \approx 1.14\times 10^7$ training samples suffice to achieve meaningful differential privacy. This is in contrast to recent negative results for the regime $σ\leq 1/\sqrt{2 \ln M}$. Our concrete bounds can be composed over multiple epochs, which makes $δ$ depend linearly on $E$ and restricts $E=O(\sqrt{M})$. To go beyond Berry-Esseen, we introduce a new proof technique based on a generalization of the law of large numbers that yields an asymptotic random guessing diagonal-limit result: if $E=c_M^2M$ with $c_M\to 0$, then the $E$-fold composed trade-off function satisfies $f^{\otimes E}(a)\to 1-a$ uniformly in $a\in[0,1]$ with $δ$ having only an $O(\sqrt{E})$ dependency. We compare this asymptotic regime with the corresponding Poisson subsampling asymptotic, and highlight the characterization of explicit convergence rates as an open question.
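The two noise-multiplier thresholds quoted in the abstract are easy to evaluate for the worked setting. The sketch below only plugs $M \approx 1.14\times 10^6$ into the stated formulas; reading $N/M$ as the number of samples per round is an assumption about the shuffling setup, and the function names are illustrative.

```python
import math


def covered_regime_threshold(num_rounds):
    """sqrt(3 / ln M): the analysis covers noise multipliers at or above this."""
    return math.sqrt(3.0 / math.log(num_rounds))


def negative_regime_threshold(num_rounds):
    """1 / sqrt(2 ln M): the cited negative results apply at or below this."""
    return 1.0 / math.sqrt(2.0 * math.log(num_rounds))


M = 1.14e6   # rounds per epoch, from the abstract
N = 1.14e7   # training samples, from the abstract
sigma = 1.0  # noise multiplier, from the abstract

covered = covered_regime_threshold(M)     # roughly 0.46
negative = negative_regime_threshold(M)   # roughly 0.19
samples_per_round = N / M                 # roughly 10, under the assumption above
```

With these values, $σ = 1$ sits well above the lower threshold of the covered regime, and the gap between the two thresholds is the range that neither result in the abstract addresses.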

2605.05795 2026-05-26 cs.LG 版本更新

Reward Shaping and Action Masking for Compositional Tasks using Behavior Trees and LLMs

使用行为树和LLM的组合任务奖励塑造与动作掩码

Nicholas Potteiger, Ankita Samaddar, Taylor T. Johnson, Xenofon Koutsoukos

发表机构 * Vanderbilt University（范德比大学）

AI总结提出MRBT结构，结合LLM自动生成奖励和动作掩码，通过SMT验证和神经符号RL循环，提升组合任务训练效率和成功率。

AI中文摘要

将复杂任务分解为一系列更简单的子任务可以提高自主代理的学习效率。强化学习（RL）可用于优化代理策略以完成子任务，但需要明确定义的子任务奖励，并受益于动作掩码。最近的工作使用大型语言模型（LLM）来自动化奖励塑造和动作掩码，然而它们都没有完全解决对子任务失败的响应性以及组合任务中不同对象的模块化问题。为了克服这些挑战，我们开发了掩码奖励行为树（MRBT），这是一种用作响应式和模块化奖励及动作掩码函数的符号结构。我们设计了一个MRBT模板，并推导出逻辑规范来构建和验证一系列对象交互子任务的MRBT。此外，我们开发了一个自动化流水线，使用LLM生成对变化任务对象鲁棒的MRBT，使用SMT求解器验证规范的正确性，以及一个神经符号RL循环来训练代理完成组合任务。实验证明成功生成和优化了五个MRBT，与基线以及没有动作掩码的MRBT相比，持续提高了训练效率和任务成功率。我们进一步强调了MRBT的三个优势：可迁移性、模块化和可验证性。

英文摘要

Decomposing complex tasks into a sequence of simpler subtasks can improve learning efficiency for an autonomous agent. Reinforcement learning (RL) can be used to optimize agent policies to complete subtasks, but requires well-defined subtask rewards and benefits from action masking. Recent work uses large language models (LLMs) to automate reward shaping and action masking, however none of them fully address reactivity to subtask failure and modularity to varying objects for compositional tasks. To overcome these challenges, we develop masking reward behavior tree (MRBT), a symbolic structure used as a reactive and modular reward and action mask function. We design an MRBT template and derive logical specifications to construct and verify MRBTs for a sequence of object-interaction subtasks. Further, we develop an automated pipeline that uses an LLM to generate MRBTs robust to varying task objects, an SMT-solver to verify correctness of specifications, and a neurosymbolic RL loop to train agents on compositional tasks. Experiments demonstrate successful generation and refinement of five MRBTs, consistently improving training efficiency and task success rates over baselines and MRBTs without action masking. We further highlight three advantages of MRBTs: transferability, modularity, and verifiability.

2605.05759 2026-05-26 cs.LG 版本更新

Full-Spectrum Graph Neural Networks: Expressive and Scalable

全谱图神经网络：表达力与可扩展性

Xiaohan Wang, Deyu Bo, Longlong Li, Kelin Xia

发表机构 * Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore（南洋理工大学数理科学学院数学科学系，新加坡637371）

AI总结提出全谱图神经网络（FSpecGNN），通过将信号从节点域提升到节点对域并将单变量谱滤波器扩展为双变量滤波器，实现了对节点对信号的通用逼近，同时保持可扩展性。

Comments 41 pages, 4 figures. Accepted to ICML 2026

AI中文摘要

众所周知，谱图神经网络（GNN）可以通用逼近节点信号；然而，它们的表达能力仍然受限于1维Weisfeiler-Lehman测试，这体现在它们对高阶信号缺乏通用性。为了突破这一界限，我们提出了全谱GNN（FSpecGNN），这是经典谱GNN的二阶推广。FSpecGNN从两个角度推进了谱滤波：（1）将信号从节点域提升到节点对域；（2）将特征值上的单变量谱滤波器扩展为特征值对上的双变量滤波器。我们证明经典谱GNN是FSpecGNN的对角特例，并证明FSpecGNN的表达能力最多与Local 2-GNN相当，同时能够通用逼近节点对信号，而这种对节点对信号的通用逼近对异配图学习特别有益。此外，FSpecGNN支持可扩展实现，避免了显式的节点对级计算；结合低秩近似将全谱卷积简化为多项式谱滤波器的组合，使其能够在大图上学习。实验上，FSpecGNN验证了预测的表达能力，并在异配基准上展现了强劲性能。

英文摘要

It is well established that spectral graph neural networks (GNNs) can universally approximate node signals; however, their expressive power remains bounded by the 1-dimensional Weisfeiler-Lehman test, which is mirrored in their lack of universality for higher-order signals. To go beyond this bound, we propose the Full-Spectrum GNNs (FSpecGNNs), a second-order generalization of classical spectral GNNs. FSpecGNN advances spectral filtering from two perspectives: (1) it lifts signals from the node domain to the node-pair domain; and (2) it extends the univariate spectral filter over eigenvalues to a bivariate filter over eigenvalue pairs. We show that classical spectral GNNs arise as a diagonal special case of FSpecGNNs, and prove that FSpecGNNs can be at most as expressive as Local 2-GNN while universally approximating node-pair signals, the latter being particularly beneficial for heterophilic graph learning. Moreover, FSpecGNN admits scalable implementations that avoid explicit node-pair-level computations; combined with a low-rank approximation that reduces full-spectrum convolution to a combination of polynomial spectral filters, it enables learning on large graphs. Empirically, FSpecGNN validates the predicted expressivity and delivers strong performance on heterophilic benchmarks.

2605.05226 2026-05-26 cs.LG cs.AI cs.CL 版本更新

Internalizing Outcome Supervision into Process Supervision: A New Paradigm for Reinforcement Learning for Reasoning

将结果监督内化为过程监督：推理强化学习的新范式

Fei Ding, Yongkang Zhang, Runhao Liu, Yuhao Liao, Zijian Zeng, Sibo Wang, Huiming Yang

发表机构 * Alibaba Group（阿里巴巴集团）； Tsinghua University（清华大学）

AI总结提出一种监督内化方法，使模型在仅结果监督下自动提取过程级学习信号，实现细粒度策略优化。

AI中文摘要

推理强化学习的核心挑战不仅在于结果级监督的稀疏性，更在于如何将仅在序列末尾提供的反馈转化为可指导中间推理步骤的细粒度学习信号。现有方法要么依赖结果级奖励进行序列级优化，导致精确信用分配困难，要么依赖外部构建的过程监督，成本高昂且难以可持续扩展。为解决这一问题，我们提出一个新视角：推理强化学习可以理解为将结果监督内化为过程监督的问题。基于此视角，我们引入一种用于推理强化学习的监督内化方法，使模型能够通过识别、纠正和重用失败的推理轨迹自动提取过程级学习信号，从而在仅结果监督下实现更细粒度的策略优化。我们进一步将这一思想抽象为一种新的训练范式，其中模型在强化学习过程中持续生成并完善自身的内部过程监督，为推理强化学习中细粒度信用分配开辟了一条不同于外部提供过程监督的新路径。

英文摘要

The central challenge of reinforcement learning for reasoning lies not only in the sparsity of outcome-level supervision, but more fundamentally in how to transform feedback provided only at the end of a sequence into fine-grained learning signals that can guide intermediate reasoning steps. Existing approaches either rely on outcome-level rewards for sequence-level optimization, which makes precise credit assignment difficult, or depend on externally constructed process supervision, which is costly and difficult to scale sustainably. To address this, we propose a new perspective: reinforcement learning for reasoning can be understood as the problem of internalizing outcome supervision into process supervision. From this perspective, we introduce a supervision-internalization method for reinforcement learning for reasoning, enabling the model to automatically extract process-level learning signals through identifying, correcting, and reusing failed reasoning trajectories, thereby achieving finer-grained policy optimization under outcome-only supervision. We further abstract this idea into a new training paradigm, in which the model continually generates and refines its own internal process supervision during reinforcement learning, opening a new path for fine-grained credit assignment in reinforcement learning for reasoning that differs from externally provided process supervision.

2605.04363 2026-05-26 cs.LG cs.AI 版本更新

Mitigating Label Shift in Tabular In-Context Learning via Test-Time Posterior Adjustment

通过测试时后验调整缓解表格上下文学习中的标签偏移

Seunghan Lee

发表机构 * LG AI Research（LG人工智能研究）

AI总结针对TabPFN在表格数据上下文学习中对标签偏移敏感的问题，提出DistPFN方法，通过测试时后验调整重新缩放类别概率，无需修改架构或额外训练，在250多个OpenML数据集上显著提升分类性能。

Comments ICML 2026

AI中文摘要

TabPFN最近作为表格数据集的基础模型受到关注，通过在合成数据上利用上下文学习实现了强性能。然而，我们发现TabPFN容易受到标签偏移的影响，常常过拟合训练数据集中的多数类。为了解决这一局限性，我们提出了DistPFN，这是第一个专为表格基础模型设计的测试时后验调整方法。DistPFN通过降低训练先验（即上下文的类别分布）的影响并强调模型预测后验的贡献来重新缩放预测的类别概率，无需架构修改或额外训练。我们进一步引入了DistPFN-T，它结合了温度缩放，以根据先验和后验之间的差异自适应地控制调整强度。我们在超过250个OpenML数据集上评估了我们的方法，证明在标签偏移下，各种基于TabPFN的模型在分类任务中取得了显著改进，同时在无标签偏移的标准设置中保持了强性能。代码可在以下仓库获取：https://github.com/seunghan96/DistPFN。

英文摘要

TabPFN has recently gained attention as a foundation model for tabular datasets, achieving strong performance by leveraging in-context learning on synthetic data. However, we find that TabPFN is vulnerable to label shift, often overfitting to the majority class in the training dataset. To address this limitation, we propose DistPFN, the first test-time posterior adjustment method designed for tabular foundation models. DistPFN rescales predicted class probabilities by downweighting the influence of the training prior (i.e., the class distribution of the context) and emphasizing the contribution of the model's predicted posterior, without architectural modification or additional training. We further introduce DistPFN-T, which incorporates temperature scaling to adaptively control the adjustment strength based on the discrepancy between prior and posterior. We evaluate our methods on over 250 OpenML datasets, demonstrating substantial improvements for various TabPFN-based models in classification tasks under label shift, while maintaining strong performance in standard settings without label shift. Code is available at this repository: https://github.com/seunghan96/DistPFN.
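
The idea of rescaling class probabilities against the context's class distribution can be illustrated with the classical prior-correction rule for label shift. This is a generic sketch, not the DistPFN formula, which the abstract does not spell out; `adjust_posterior`, `strength` and `eps` are hypothetical names introduced here.

```python
def adjust_posterior(posterior, context_prior, strength=1.0, eps=1e-12):
    """Divide each class probability by (context prior ** strength), then renormalize.

    strength=0 leaves the posterior unchanged (up to renormalization);
    strength=1 is the classical correction toward a uniform test-time prior.
    """
    if len(posterior) != len(context_prior):
        raise ValueError("posterior and context_prior must have the same length")
    weights = [p / max(q, eps) ** strength for p, q in zip(posterior, context_prior)]
    total = sum(weights)
    if total <= 0:
        raise ValueError("posterior must contain at least one positive probability")
    return [w / total for w in weights]

# Made-up example: the context is 90% class 0, and the model leans toward class 0.
posterior = [0.7, 0.3]
context_prior = [0.9, 0.1]
adjusted = adjust_posterior(posterior, context_prior)
print([round(p, 3) for p in adjusted])
```

In this toy case the minority class ends up with most of the probability mass once the majority-heavy context prior is divided out, which is the qualitative effect a label-shift correction aims for.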

2605.03462 2026-05-26 cs.LG cs.AI 版本更新

From Muscle Bursts to Motor Intent: Self-Supervised Token Modeling for Heterogeneous EMG

从肌肉爆发到运动意图：面向异质EMG的自监督令牌建模

Zhenghao Huang, Huilin Yao, Kaikai Wang

AI总结提出AEMG自监督学习方法，通过事件级令牌建模和Transformer编码，从异质EMG数据中提取可复用的神经肌肉表征，提升跨用户、跨会话的鲁棒性并减少校准数据需求。

Comments After further verification, we identified issues in the current version that may affect the reliability and reproducibility of the reported experimental results. In particular, part of the evaluation relies on a dataset for which the public-release/redistribution status and supporting validation remain unresolved

AI中文摘要

表面肌电图提供了一种从可穿戴肌肉记录推断人类运动意图的实用方法，但在单一采集设置下训练的模型在用户、会话、电极布局或手势协议改变时往往会失去可靠性。本文提出AEMG，一种自监督学习方法，旨在从多样化的EMG源中提取可复用的神经肌肉表征。首先将八个公开手势数据集转换为共享信号格式，以减少通道配置、传感器拓扑和记录协议的差异。AEMG不依赖固定长度滑动窗口，而是从能量变化中识别收缩事件并将其表示为紧凑的神经肌肉令牌，同时有序令牌组描述运动过程中多个肌肉的协调活动。然后使用空间和时间条件Transformer编码这些令牌序列，保留电极位置、激活时序和顺序结构信息。在预训练中，模型通过向量量化重建构建收缩原型的离散库，并通过从周围观测中恢复掩蔽的神经肌肉令牌进一步学习上下文依赖关系。在留一受试者和低标签适应设置下的实验表明，学习到的表征提高了对未见用户的鲁棒性，并减少了手势识别所需的校准数据量。这些发现表明，事件级令牌建模为适应性强且数据高效的基于EMG的运动意图理解提供了一条可扩展的途径。

英文摘要

Surface electromyography provides a practical way to infer human movement intention from wearable muscle recordings, but models trained under a single acquisition setting often lose reliability when the user, session, electrode layout, or gesture protocol changes. This paper proposes AEMG, a self-supervised learning approach designed to extract reusable neuromuscular representations from diverse EMG sources. Eight public gesture datasets are first transformed into a shared signal format to reduce discrepancies in channel configuration, sensor topology, and recording protocol. Instead of relying on fixed-length sliding windows, AEMG identifies contraction events from energy variations and represents them as compact neuromuscular tokens, while ordered token groups describe the coordinated activity of multiple muscles during motion. A spatially and temporally conditioned Transformer is then used to encode these token sequences, preserving information about electrode position, activation timing, and sequential structure. For pre-training, the model constructs a discrete library of contraction prototypes through vector-quantized reconstruction and further learns contextual dependencies by recovering masked neuromuscular tokens from surrounding observations. Experiments under leave-one-subject-out and low-label adaptation settings show that the learned representation improves robustness to unseen users and reduces the amount of calibration data required for gesture recognition. These findings suggest that event-level token modeling offers a scalable route toward adaptable and data-efficient EMG-based motor-intent understanding.

2605.02124 2026-05-26 cs.LG cs.AI math.PR 版本更新

Soft-to-Hard Routing in Sparse Mixture-of-Experts Models

稀疏混合专家模型中的软到硬路由

Reza Rastegar

发表机构 * Meta Platforms, Inc（Meta平台）

AI总结本文通过边界层微积分方法，研究了稀疏混合专家模型中softmax路由随温度趋于零时趋近于硬top-1路由的极限过程，并给出了基于路由界面邻域概率的定量误差界。

AI中文摘要

随着温度趋于零，softmax路由趋近于硬top-1路由，但极限过程在路由器平局时存在奇异性。本文针对总体平方损失混合专家回归中的软到硬极限，发展了一种边界层微积分方法。对于具有logits $a_k(x;ϕ)$的路由器，相关的局部量是前两名的间隔$Δ(x;ϕ)$，相关的全局量是边界质量$\mathbb{P}(Δ(X;ϕ)\le w)$。在光滑性和横截性假设下，余面积和管状邻域估计展示了该质量如何随板宽缩放；在二元情形中，主导系数是路由界面上的显式曲面积分。这些几何估计给出了软目标$L_τ$和硬目标$L_0$之间的定量界，包括在间隔尾条件下的$O(τ^α)$一致比较，并得到了紧参数空间上软目标的$Γ$-收敛性。主要结论是，零温度近似由路由界面的$O(τ)$邻域所承载的概率控制，而不仅仅由温度本身决定。在分离出问题的这一边界层部分后，我们记录了一个从硬路由到小温度软路由的条件景观传递定理，以及一个简化的双专家高斯计算，展示了局部对称性破缺。仅包含合成诊断作为边界层预测的受控检验。

英文摘要

Softmax routing approaches hard top-1 routing as the temperature tends to zero, but the limiting passage is singular at router ties. This paper develops a boundary-layer calculus for this soft-to-hard limit in population squared-loss mixture-of-experts regression. For a router with logits $a_k(x;ϕ)$, the relevant local quantity is the top-two margin $Δ(x;ϕ)$, and the relevant global quantity is the boundary mass $\mathbb{P}(Δ(X;ϕ)\le w)$. Under smoothness and transversality assumptions, coarea and tubular-neighborhood estimates show how this mass scales with the slab width; in the binary case the leading coefficient is an explicit surface integral over the routing interface. These geometric estimates give quantitative bounds between the soft objective $L_τ$ and the hard objective $L_0$, including an $O(τ^α)$ uniform comparison under a margin-tail condition, and yield $Γ$-convergence of the soft objectives on compact parameter spaces. The main conclusion is that the zero-temperature approximation is controlled by the probability carried by an $O(τ)$ neighborhood of the routing interfaces, not by temperature alone. After isolating this boundary-layer part of the problem, we record a conditional landscape-transfer theorem from hard to small-temperature soft routing and a reduced two-expert Gaussian calculation illustrating local symmetry breaking. Synthetic diagnostics are included only as controlled checks of the boundary-layer predictions.
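
A small numerical sketch makes the role of the top-two margin concrete: away from a tie, lowering the temperature drives the softmax gate toward a one-hot choice, while at an exact tie the mass stays split however small the temperature is. The logits below are made up, and the helper names are illustrative rather than taken from the paper.

```python
import math

def softmax_routing(logits, temperature):
    """Softmax gate weights at a given temperature (numerically stabilized)."""
    peak = max(logits)
    exps = [math.exp((a - peak) / temperature) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_two_margin(logits):
    """Gap between the largest and second-largest logit (zero at a router tie)."""
    first, second = sorted(logits, reverse=True)[:2]
    return first - second

clear_winner = [1.0, 0.9, 0.0]  # margin of about 0.1
router_tie = [1.0, 1.0, 0.0]    # margin of exactly 0

for tau in (1.0, 0.1, 0.01):
    print(tau, [round(p, 4) for p in softmax_routing(clear_winner, tau)])
print("tie:", [round(p, 4) for p in softmax_routing(router_tie, 0.01)])
```

The gate for `clear_winner` only becomes nearly one-hot once the temperature is small relative to the margin, which matches the abstract's point that the approximation is governed by how much probability sits near routing interfaces.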

2604.23396 2026-05-26 cs.IR cs.AI cs.CL cs.LG 版本更新

Lost in Decoding? Reproducing and Stress-Testing the Look-Ahead Prior in Generative Retrieval

迷失在解码中？复现与压力测试生成式检索中的前瞻先验

Kidist Amde Mekonnen, Yongkang Li, Yubao Tang, Simon Lupart, Maarten de Rijke

发表机构 * University of Amsterdam（阿姆斯特丹大学）

AI总结本文复现并压力测试了生成式检索中的前瞻先验方法PAG，发现其规划信号在词汇表面形式变化下脆弱，并评估了跨语言鲁棒性与查询端缓解策略。

Comments 12 pages, 5 figures, 9 tables; accepted to the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval, July 20-24, 2026, Melbourne/Naarm, Australia

DOI: 10.1145/3805712.3808567
Journal ref: Proceedings of the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '26), pages XXX-XXX, 2026

AI中文摘要

生成式检索（GR）通过自回归生成文档标识符来对文档进行排序。由于许多GR方法依赖于trie约束的束搜索，它们在有限束解码下容易过早剪枝相关前缀。生成式检索中的前瞻规划（PAG）通过使用同时解码来计算文档级前瞻先验，指导后续顺序解码，从而缓解了这种失败模式。我们在推理时复现了PAG，并压力测试了其解码行为。使用作者发布的检查点和标识符/trie工件，在报告的解码设置下，我们在MS MARCO Dev和TREC-DL 2019/2020上复现了主要有效性结果，并在我们的硬件设置中证实了报告的束大小-延迟权衡。在复现之外，我们引入了规划漂移诊断，量化意图保持的查询变体如何改变规划器的top-n候选集和最高权重规划器令牌，以及这些变化如何影响引导解码。我们发现PAG的规划信号在词汇表面形式变化下是脆弱的：意图保持的拼写错误可能触发规划崩溃，其中规划的候选池变化足够大，使得前瞻奖励几乎无法提供有用的指导，实际上使解码退回到较弱的无引导搜索。我们进一步使用非英语mMARCO查询对英语索引评估了固定索引的跨语言鲁棒性，并评估了无需重新索引的查询端缓解策略；在我们的设置中，查询翻译提供了最强的恢复。总体而言，我们的结果证实了PAG报告的有效性以及在发布的推理设置下规划引导解码的优势，同时表明这些增益依赖于规划信号在现实查询变化和查询-文档不匹配下的稳定性。

英文摘要

Generative retrieval (GR) ranks documents by autoregressively generating document identifiers. Because many GR methods rely on trie-constrained beam search, they are vulnerable to early pruning of relevant prefixes under finite-beam decoding. Planning Ahead in Generative Retrieval (PAG) mitigates this failure mode by using simultaneous decoding to compute a document-level look-ahead prior that guides subsequent sequential decoding. We reproduce PAG at inference time and stress-test its decoding behavior. Using the authors' released checkpoint and identifier/trie artifacts under the reported decoding setup, we reproduce the main effectiveness results on MS MARCO Dev and TREC-DL 2019/2020, and corroborate the reported beam-size-latency trade-off in our hardware setting. Beyond reproduction, we introduce plan drift diagnostics that quantify how intent-preserving query variations alter the planner's top-n candidate set and highest-weight planner tokens, and how these changes affect guided decoding. We find that PAG's planning signal is brittle under lexical surface-form variation: intent-preserving typos can trigger plan collapse, where the planned candidate pool shifts enough that the look-ahead bonus provides little useful guidance, effectively reverting decoding toward weaker unguided search. We further evaluate fixed-index cross-lingual robustness using non-English mMARCO queries against an English index, and assess query-side mitigation strategies that require no re-indexing; query translation provides the strongest recovery in our setting. Overall, our results confirm PAG's reported effectiveness and the benefit of planning-guided decoding under the released inference setup, while showing that these gains depend on the stability of the planning signal under realistic query variation and query-document mismatch.

2604.20022 2026-05-26 cs.LG cs.AI cs.CL 版本更新

MoBayes: A Modular Bayesian Framework for Separating Reasoning from Language in Conversational Clinical Decision Support

MoBayes：一种用于对话式临床决策支持中推理与语言分离的模块化贝叶斯框架

Yusuf Kesmen, Fay Elhassan, Jiayi Ma, Julien Stalhandske, Yena Chang, David Sasu, Alexandra Kulinkina, Akhil Arora, Lars Klein, Mary-Anne Hartley

发表机构 * LiGHT, EPFL（LiGHT，瑞士联邦理工学院）； University of Bern（伯尔尼大学）； Aarhus University（奥胡斯大学）

AI总结提出MoBayes框架，通过将LLM作为语言接口、贝叶斯模块进行概率推理，实现推理与语言分离，在临床决策支持中优于独立前沿LLM医生。

Comments 50 pages including appendix, 13 figures, 22 tables. Preprint

AI中文摘要

大型语言模型（LLM）越来越多地用于对话式临床决策支持，但它们将下一个标记预测与概率决策混为一谈。我们认为这种混淆反映了架构上的局限性：此类系统缺乏显式的后验追踪、可控的弃权阈值和可审计的推理链。我们引入MoBayes，一个模块化贝叶斯对话框架，将推理与语言分离。LLM仅作为语言接口，将患者对话解析为结构化观察，而贝叶斯模块对这些观察进行概率推理以更新后验，通过期望信息增益选择后续问题，并通过校准的决策阈值决定何时停止或推迟。这种设计实现了显式后验追踪、可控的选择性决策，以及无需重新训练语言模型即可替换的特定人群统计后端。在经验知识和LLM生成的知识库上，MoBayes优于独立的前沿LLM医生，包括匹配模型系列的比较，其中廉价的传感器模型与MoBayes配对以较低成本超过更大的自主模型。在对抗性患者沟通风格和不同诊断场景下，该优势依然存在。这些结果表明，可靠的对话式临床决策支持系统应将概率推理与语言生成分离，而不是仅扩大模型规模。代码可在https://anonymous.4open.science/r/MoBayes/获取。

英文摘要

Large language models (LLMs) are increasingly used for conversational clinical decision support, yet they conflate next token prediction with probabilistic decision making. We argue that this conflation reflects an architectural limitation: such systems lack explicit posterior tracking, controllable abstention thresholds, and auditable reasoning chains. We introduce MoBayes, a Modular Bayesian dialogue framework that separates reasoning from language. The LLM acts only as a language interface, parsing patient conversation into structured observations, while a Bayesian module performs probabilistic inference over these observations to update posteriors, select follow-up questions via expected-information-gain and determine when to stop or defer through calibrated decision thresholds. This design enables explicit posterior tracking, controllable selective decision-making, and replaceable population-specific statistical backends without retraining the language model. Across empirical and LLM-generated knowledge bases, MoBayes outperforms standalone frontier LLM doctors, including matched model-family comparisons where inexpensive sensor models paired with MoBayes exceed larger autonomous models at lower cost. The advantage persists under adversarial patient communication styles and across varying diagnostic scenarios. These results suggest that reliable conversational clinical decision support systems should separate probabilistic reasoning from language generation rather than scaling model size alone. Code is available at https://anonymous.4open.science/r/MoBayes/
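
The question-selection step described in this abstract, updating a posterior and picking the follow-up with the highest expected information gain, can be sketched for a toy yes/no setting. Everything below is an illustrative assumption (two hypothetical conditions, invented likelihoods, hypothetical function names); it is not the MoBayes implementation.

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a {hypothesis: probability} mapping."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def posterior(prior, likelihood_yes, answer):
    """Bayes update after a yes/no answer; likelihood_yes[h] = P(yes | h)."""
    unnorm = {
        h: prior[h] * (likelihood_yes[h] if answer else 1.0 - likelihood_yes[h])
        for h in prior
    }
    evidence = sum(unnorm.values())
    return {h: w / evidence for h, w in unnorm.items()}

def expected_information_gain(prior, likelihood_yes):
    """Prior entropy minus the expected posterior entropy over both answers."""
    p_yes = sum(prior[h] * likelihood_yes[h] for h in prior)
    gain = entropy(prior)
    for answer, p_answer in ((True, p_yes), (False, 1.0 - p_yes)):
        if p_answer > 0:
            gain -= p_answer * entropy(posterior(prior, likelihood_yes, answer))
    return gain

# Hypothetical conditions and question likelihoods, for illustration only.
prior = {"condition_a": 0.5, "condition_b": 0.5}
questions = {
    "informative_symptom": {"condition_a": 0.9, "condition_b": 0.1},
    "uninformative_symptom": {"condition_a": 0.5, "condition_b": 0.5},
}
scores = {q: expected_information_gain(prior, lik) for q, lik in questions.items()}
best_question = max(scores, key=scores.get)
print(best_question, scores)
```

A stopping rule of the kind the abstract mentions could then compare the largest posterior probability against a decision threshold; that threshold and its calibration are not specified here.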

2604.18800 2026-05-26 cs.SI cs.GT cs.LG 版本更新

Optimal Exploration of New Products under Assortment Decisions

基于分类决策的新产品最优探索

Jackie Baek, Atanas Dinev, Thodoris Lykouris

发表机构 * Stern School of Business, New York University（纽约大学斯特恩商学院）； Massachusetts Institute of Technology（麻省理工学院）

AI总结研究平台在容量约束下通过选品（assortment）决策在线学习新产品质量，提出最优探索策略以最小化遗憾，并揭示新产品应与顶级现有产品搭配、同时探索数量由潜力决定等结构洞见。

AI中文摘要

我们研究了一个平台在容量约束下就提供哪些产品做出选品（assortment）决策时，对新产品的在线学习。对于新上架的产品，其质量最初未知，质量信息通过社会学习传播：当顾客购买新产品并留下评论时，其质量对平台和未来顾客都变得可见。由于评论需要购买，平台必须在商品组合中展示新产品（“探索”）以产生评论来了解新产品。这种探索成本高昂，因为顾客对新产品的需求低于现有产品。我们刻画了用于探索的最优商品组合以最小化遗憾，解决了两个问题。（1）平台应该单独提供新产品还是与现有产品一起提供？前者最大化新产品的购买概率，但产生较低的短期收入。尽管购买概率较低，我们证明将新产品与顶级现有产品配对总是最优的。（2）对于多个新产品，平台应该同时探索它们还是逐个探索？我们证明同时探索的新产品最优数量具有简单的阈值结构：它随着新产品的“潜力”增加而增加，并且令人惊讶的是，不依赖于它们的个体购买概率。我们还表明，两种经典的bandit算法，UCB和汤普森采样，在此设置中因相反的原因而失败：UCB过度探索而汤普森采样探索不足。我们的结果为平台应如何通过选品决策了解新产品提供了结构性洞见。

英文摘要

We study online learning for new products on a platform that makes capacity-constrained assortment decisions on which products to offer. For a newly listed product, its quality is initially unknown, and quality information propagates through social learning: when a customer purchases a new product and leaves a review, its quality is revealed to both the platform and future customers. Since reviews require purchases, the platform must feature new products in the assortment ("explore") to generate reviews to learn about new products. Such exploration is costly because customer demand for new products is lower than for incumbent products. We characterize the optimal assortments for exploration to minimize regret, addressing two questions. (1) Should the platform offer a new product alone or alongside incumbent products? The former maximizes the purchase probability of the new product but yields lower short-term revenue. Despite the lower purchase probability, we show it is always optimal to pair the new product with the top incumbent products. (2) With multiple new products, should the platform explore them simultaneously or one at a time? We show that the optimal number of new products to explore simultaneously has a simple threshold structure: it increases with the "potential" of the new products and, surprisingly, does not depend on their individual purchase probabilities. We also show that two canonical bandit algorithms, UCB and Thompson Sampling, both fail in this setting for opposite reasons: UCB over-explores while Thompson Sampling under-explores. Our results provide structural insights on how platforms should learn about new products through assortment decisions.
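
The trade-off in question (1) can be illustrated with a standard multinomial-logit (MNL) choice model. The abstract does not say which demand model the paper uses, so MNL is purely an assumption here, and the attraction values are made up.

```python
def mnl_choice_probabilities(attractions):
    """Multinomial-logit purchase probabilities; the outside option has weight 1."""
    denominator = 1.0 + sum(attractions.values())
    probabilities = {name: v / denominator for name, v in attractions.items()}
    probabilities["no_purchase"] = 1.0 / denominator
    return probabilities

# Hypothetical attraction weights: the new product is less attractive than incumbents.
new_alone = mnl_choice_probabilities({"new": 0.5})
new_with_incumbents = mnl_choice_probabilities(
    {"new": 0.5, "incumbent_1": 2.0, "incumbent_2": 1.5}
)

print("P(buy new | alone)        =", round(new_alone["new"], 3))
print("P(buy new | with others)  =", round(new_with_incumbents["new"], 3))
print("P(no purchase | alone)    =", round(new_alone["no_purchase"], 3))
print("P(no purchase | with others) =", round(new_with_incumbents["no_purchase"], 3))
```

Under these assumed numbers, offering the new product alone raises its own purchase probability but leaves far more customers buying nothing, which is the tension between learning speed and short-term revenue that the abstract describes.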

2604.18128 2026-05-26 cs.CL cs.AI cs.LG 版本更新

Depth Registers Unlock W4A4 on SwiGLU: A Reader/Generator Decomposition

深度寄存器解锁 SwiGLU 上的 W4A4：一种读取器/生成器分解

Ziyang Liu

AI总结本研究通过深度寄存器和铰链损失（DR+sink）训练时干预，将 SwiGLU 解码器语言模型的 W4A4 量化困惑度从 1727 降至 119，并分解出残差轴读取器主导误差，而生成器 w2 的双线性输入是剩余差距的主因。

Comments The authors have decided to withdraw this version following internal review regarding authorship and contribution agreements

AI中文摘要

我们在一个受控的 300M 参数 SwiGLU 解码器语言模型（在 FineWeb-Edu 的 5B 令牌上训练）中研究训练后 W4A4 量化，并询问哪些输入激活位点主导误差。朴素的就近舍入（round-to-nearest）W4A4 使验证困惑度从 FP16 的 23.6 恶化至 1727。一种简单的残差轴训练时干预，即带有寄存器幅度铰链损失的深度寄存器（DR+sink），在匹配的 FP16 PPL 和匹配的零样本能力下，将其降至 119（约 14 倍），并与 SmoothQuant 组合达到 39.9 PPL。与 FP16 之间约 2 PPL 的剩余差距是诊断核心。我们按输入激活位点分解 W4A4 损伤：SwiGLU 块中的五个可训练线性层分为残差轴读取器（qkv, w1, w3）和块内生成器（o_proj, w2）。基本的范数论证表明，残差轴幅度控制紧密约束读取器，但 w2 的双线性输入仅受各因子界的平凡乘积约束；经验上，DR+sink 降低了读取器的峰度，而生成器基本不变，并且读取器恢复的 W4A4 残差在三个匹配检查点上平坦约为 0.28 nats，其中 Delta-remove(w2) 占主导。我们将 DR+sink 作为训练时探针而非部署方案提出：一种事后替代方案（Per-Linear QuaRot）在读取器轴上几乎与之匹配。完整的 QuaRot（添加在线每头值 Hadamard 和在线 w2 输入旋转）也没有缩小差距，直接验证了正交旋转无法约束双线性 SwiGLU 尾部的预测。这些主张特定于我们的 300M、5B 令牌、单种子设置，并且我们的实验未将分区与铰链分离。

英文摘要

We study post-training W4A4 quantization in a controlled 300M-parameter SwiGLU decoder-only language model trained on 5B tokens of FineWeb-Edu, and ask which input-activation sites dominate the error. Naive round-to-nearest W4A4 collapses validation perplexity from FP16 23.6 to 1727. A simple residual-axis training-time intervention -- Depth Registers with a register-magnitude hinge loss (DR+sink) -- reduces this to 119 (about 14x) at matched FP16 PPL and matched zero-shot capacity, and composes with SmoothQuant to 39.9 PPL. The residual ~2 PPL gap to FP16 is the diagnostic core. We decompose W4A4 damage by input-activation site: the five trainable linears in a SwiGLU block split into residual-axis readers (qkv, w1, w3) and block-internal generators (o_proj, w2). Elementary norm arguments show residual-axis magnitude control bounds readers tightly but leaves w2's bilinear input bounded only by the trivial product of factor bounds; empirically, DR+sink collapses reader kurtosis while leaving generators essentially unchanged, and the reader-rescued W4A4 residue is flat at ~0.28 nats across three matched checkpoints with Delta-remove(w2) dominating. We present DR+sink as a training-time probe rather than a deployment proposal: a post-hoc alternative (Per-Linear QuaRot) nearly matches it on the reader axis. Full QuaRot -- adding online per-head value Hadamard plus online w2-input rotation -- does not close the gap either, directly testing the prediction that orthogonal rotation cannot bound the bilinear SwiGLU tail. Claims are specific to our 300M, 5B-token, single-seed setting, and our experiments do not isolate the partition from the hinge.
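
To see why naive round-to-nearest 4-bit quantization is so sensitive to large activations, consider a minimal symmetric per-tensor quantizer. The granularity (per-tensor, symmetric) is an assumption made for illustration; the abstract does not state the exact quantizer configuration, and the input values are made up.

```python
def quantize_rtn(values, bits=4):
    """Symmetric per-tensor round-to-nearest quantization, then dequantization."""
    qmax = 2 ** (bits - 1) - 1   # 7 for 4-bit signed integers
    qmin = -(2 ** (bits - 1))    # -8 for 4-bit signed integers
    peak = max(abs(v) for v in values)
    if peak == 0:
        return list(values)
    scale = peak / qmax
    # Python's round() resolves exact .5 ties to the nearest even integer.
    codes = [min(qmax, max(qmin, round(v / scale))) for v in values]
    return [c * scale for c in codes]

well_behaved = [0.1, -0.2, 0.3, 0.35]
with_outlier = [0.1, -0.2, 0.3, 8.0]  # one large activation stretches the scale

print(quantize_rtn(well_behaved))
print(quantize_rtn(with_outlier))
```

With a single outlier the step size grows so much that every small value rounds to zero. Heavy-tailed (high-kurtosis) activations therefore waste most of the sixteen available levels, which is consistent with the abstract's focus on magnitude control and kurtosis.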

2604.17328 2026-05-26 cs.LG cs.AI 版本更新

基于CSI元组的多模态学习辅助3D信道指纹构建

Chenjie Xie, Li You, Ruirong Chen, Gaoning He, Xiqi Gao

发表机构 * National Mobile Communications Research Laboratory, Southeast University（东南大学移动通信国家重点实验室）； Purple Mountain Laboratories（紫金山实验室）； Huawei Technologies Co., Ltd.（华为技术有限公司）

AI总结针对低空通信中的3D信道指纹构建问题，提出一种基于CSI元组的多模态回归框架，通过融合位置、通信测量和地理环境地图，实现高效高精度的信道状态信息估计。

Comments 14 pages, 9 figures

DOI: 10.1109/TWC.2026.3693681
Journal ref: IEEE Transactions on Wireless Communications, vol. 25, pp. 17369-17383, 2026

AI中文摘要

低空通信可以促进空中和地面无线资源的整合，扩大网络覆盖范围，提高传输质量，从而推动第六代（6G）移动通信的发展。作为低空传输的关键技术，3D信道指纹（3D-CF），也称为3D无线电地图或3D信道知识地图，有望增强对通信环境的理解，并辅助获取信道状态信息（CSI），从而避免重复估计并降低计算复杂度。本文提出了一种模块化的多模态框架来构建3D-CF。具体而言，我们首先基于莱斯衰落信道建立了3D-CF模型，将其表示为CSI元组的集合，每个元组包含低空飞行器（LAV）的位置及其对应的统计CSI。考虑到不同先验数据的异构结构，我们将3D-CF构建问题表述为一个多模态回归任务，其中CSI元组中的目标信道信息可以通过其对应的LAV位置、通信测量和地理环境地图直接估计。然后，相应地提出了一种高效的多模态框架，包括基于相关性的多模态融合（Corr-MMF）模块、多模态表示（MMR）模块和CSI回归（CSI-R）模块。数值结果表明，我们提出的框架能够高效地构建3D-CF，并在不同通信场景下比现有算法至少提高27.5%的精度，展示了其竞争性能和出色的泛化能力。我们还分析了计算复杂度，并说明了其在推理时间方面的优越性。

英文摘要

Low-altitude communications can promote the integration of aerial and terrestrial wireless resources, expand network coverage, and enhance transmission quality, thereby empowering the development of sixth-generation (6G) mobile communications. As an enabler for low-altitude transmission, 3D channel fingerprints (3D-CF), also referred to as the 3D radio map or 3D channel knowledge map, are expected to enhance the understanding of communication environments and assist in the acquisition of channel state information (CSI), thereby avoiding repeated estimations and reducing computational complexity. In this paper, we propose a modularized multimodal framework to construct 3D-CF. Specifically, we first establish the 3D-CF model as a collection of CSI-tuples based on Rician fading channels, with each tuple comprising the low-altitude vehicle's (LAV) positions and its corresponding statistical CSI. In consideration of the heterogeneous structures of different prior data, we formulate the 3D-CF construction problem as a multimodal regression task, where the target channel information in the CSI-tuple can be estimated directly by its corresponding LAV positions, together with communication measurements and geographic environment maps. Then, a high-efficiency multimodal framework is proposed accordingly, which includes a correlation-based multimodal fusion (Corr-MMF) module, a multimodal representation (MMR) module, and a CSI regression (CSI-R) module. Numerical results show that our proposed framework can efficiently construct 3D-CF and achieve at least 27.5% higher accuracy than the state-of-the-art algorithms under different communication scenarios, demonstrating its competitive performance and excellent generalization ability. We also analyze the computational complexity and illustrate its superiority in terms of the inference time.

2603.18766 2026-05-26 cs.LG 版本更新

Enhancing the Parameterization of Reservoir Properties for Data Assimilation Using Deep VAE-GAN

利用深度VAE-GAN增强数据同化中储层属性的参数化

M. A. Sampaio, P. H. Ranazzi, M. J. Blunt

发表机构 * Departamento de Engenharia de Minas e de Petróleo, Escola Politécnica, Universidade de São Paulo（圣保罗大学采矿与石油工程系，理工学院）； Department of Earth Science and Engineering, Imperial College London（伦敦帝国理工学院地球科学与工程系）

AI总结提出将VAE-GAN与ESMDA结合，以同时实现高质量储层描述和良好历史拟合，克服传统方法在非高斯分布和有限集合大小上的局限。

DOI: 10.1016/j.cageo.2026.106196

AI中文摘要

目前，称为迭代集合平滑器的方法，特别是称为多重数据同化集合平滑器（ESMDA）的方法，可被视为石油储层模拟中历史拟合的最先进技术。然而，这种方法有两个重要限制：使用有限大小的集合来表示分布，以及参数和数据不确定性中的高斯假设。后者尤为重要，因为许多储层属性具有非高斯分布。参数化涉及在更新前将非高斯参数映射到高斯场，然后将其映射回原始域以将集合通过储层模拟器向前传播。一种有前景的参数化方法是通过深度学习模型。最近的研究表明，生成对抗网络（GAN）在数据同化方面表现不佳，但能生成地质上更合理的储层实现，而变分自编码器（VAE）在数据同化中表现优于GAN，但生成的地质模型不太真实。本工作的创新之处在于结合两者的优势，实现一个称为变分自编码器生成对抗网络（VAE-GAN）的深度学习模型，并与ESMDA集成。该方法应用于两个案例研究，一个案例是分类的，另一个是连续渗透率值。我们的发现表明，通过应用VAE-GAN模型，我们可以同时获得高质量的储层描述（就像GAN）和良好的生产曲线历史拟合（就像VAE）。

英文摘要

Currently, the methods called Iterative Ensemble Smoothers, especially the method called Ensemble Smoother with Multiple Data Assimilation (ESMDA) can be considered state-of-the-art for history matching in petroleum reservoir simulation. However, this approach has two important limitations: the use of an ensemble with finite size to represent the distributions and the Gaussian assumption in parameter and data uncertainties. This latter is particularly important because many reservoir properties have non-Gaussian distributions. Parameterization involves mapping non-Gaussian parameters to a Gaussian field before the update and then mapping them back to the original domain to forward the ensemble through the reservoir simulator. A promising approach to perform parameterization is through deep learning models. Recent studies have shown that Generative Adversarial Networks (GAN) performed poorly concerning data assimilation, but generated more geologically plausible realizations of the reservoir, while the Variational Autoencoder (VAE) performed better than the GAN in data assimilation, but generated less geologically realistic models. This work is innovative in combining the strengths of both to implement a deep learning model called Variational Autoencoder Generative Adversarial Network (VAE-GAN) integrated with ESMDA. The methodology was applied in two case studies, one case being categorical and the other with continuous values of permeability. Our findings demonstrate that by applying the VAE-GAN model we can obtain high quality reservoir descriptions (just like GANs) and a good history matching on the production curves (just like VAEs) simultaneously.

2603.16481 2026-05-26 cs.LG cs.SY eess.SY math.OC 版本更新

Optimal uncertainty bounds for multivariate kernel regression under bounded noise: A Gaussian process-based dual function

有界噪声下多元核回归的最优不确定性界：基于高斯过程的对偶函数

Amon Lahr, Anna Scampicchio, Johannes Köhler, Melanie N. Zeilinger

发表机构 * Institute for Dynamical Systems and Control, ETH Zurich（动态系统与控制研究所，苏黎世联邦理工学院）； Department of Electrical Engineering, Chalmers University of Technology（电气工程系，查尔姆斯理工大学）； Department of Mechanical Engineering, Imperial College London（机械工程系，伦敦帝国理工学院）

AI总结针对有界噪声下再生核希尔伯特空间中的多输出函数，提出一种紧致、确定性的不确定性界，通过无约束对偶公式获得，具有与经典高斯过程置信界相同的结构，便于集成到下游优化中。

Comments Extended version

AI中文摘要

非保守的不确定性界对于从含噪数据中对潜在函数进行可靠预测至关重要，因此是安全学习控制的关键推动因素。在该领域，高斯过程回归等核方法因其固有的不确定性量化机制而成为成熟技术。然而，现有方法要么对底层噪声分布施加强假设，要么保守，要么不直接适用于多输出情况，要么难以集成到下游任务中。本文通过提出一种针对再生核希尔伯特空间（RKHS）中多输出函数的紧致、确定性界来应对这些限制，该函数受有界噪声影响。该界通过无约束的对偶公式获得，该公式具有与经典高斯过程置信界相同的结构，因此可以直接集成到下游优化流程中。我们证明了所提出的界推广了现有结果，并使用四旋翼动力学学习的示例说明了其应用。

英文摘要

Non-conservative uncertainty bounds are essential for making reliable predictions about latent functions from noisy data, and thus, a key enabler for safe learning-based control. In this domain, kernel methods such as Gaussian process regression are established techniques, thanks to their inherent uncertainty quantification mechanism. Still, existing bounds either pose strong assumptions on the underlying noise distribution, are conservative, do not directly apply in the multi-output case, or are difficult to integrate into downstream tasks. This paper addresses these limitations by presenting a tight, deterministic bound for multi-output functions in Reproducing Kernel Hilbert Spaces (RKHSs) subject to bounded noise. It is obtained through an unconstrained, duality-based formulation, which shares the same structure as classic Gaussian process confidence bounds, and can thus be straightforwardly integrated into downstream optimization pipelines. We show that the proposed bound generalizes existing results and illustrate its application using an example inspired by quadrotor dynamics learning.

2603.10250 2026-05-26 cs.LG 版本更新

GeMPO: Generalized Measure Matching for Online Diffusion Reinforcement Learning

GeMPO：在线扩散强化学习的广义度量匹配

Haitong Ma, Chenxiao Gao, Tianyi Chen, Na Li, Bo Dai

发表机构 * Harvard University（哈佛大学）； Georgia Institute of Technology（佐治亚理工学院）

AI总结提出GeMPO框架，通过将扩散RL中的重加权从softmax推广到一般单调函数，并引入负重加权机制，以解决过贪策略和负样本利用不足的问题。

Comments 22 pages, 6 figures

AI中文摘要

扩散策略的强化学习中常用的一类算法对来自行为策略的样本进行softmax重加权，这通常会导致过贪策略，并且未能利用负样本的反馈。在这项工作中，我们引入了GeMPO，一个简单且统一的框架，将扩散RL中的重加权方案从softmax推广到一般单调函数。GeMPO通过度量匹配的视角重新审视扩散RL：首先，通过求解正则化策略优化目标构建虚拟目标策略度量；其次，通过重加权流匹配最小化当前策略与该目标度量之间的散度。这种公式有两个关键优势：i) 它将权重设计扩展到传统的指数重加权之外，允许针对不同的奖励景观进行定制；ii) 通过放松目标度量的非负性约束，我们的框架为负重加权提供了原则性的理由。我们解释了负重加权如何主动使策略远离次优动作，从而促进探索。大量的实证评估表明，GeMPO通过利用这些灵活的加权方案实现了具有竞争力或更优的性能，并且我们提供了在实践中选择重加权方法的实用指南。

英文摘要

A commonly used family of RL algorithms for diffusion policies applies softmax reweighting to samples from the behavior policy, which often induces an overgreedy policy and fails to utilize feedback from negative samples. In this work, we introduce GeMPO, a simple and unified framework that generalizes the reweighting scheme in diffusion RL from softmax to general monotonic functions. GeMPO revisits diffusion RL from a measure matching perspective: first, we construct a virtual target policy measure by solving a regularized policy optimization objective; second, we minimize the divergence between the current policy and this target measure through reweighted flow matching. This formulation offers two key advantages: i) it extends weight design beyond traditional exponential reweighting, allowing it to be tailored to diverse reward landscapes; and ii) by relaxing the non-negativity constraint on the target measure, our framework provides a principled justification for negative reweighting. We provide interpretations of how negative reweighting actively repels the policy from suboptimal actions and thus facilitates exploration. Extensive empirical evaluations demonstrate that GeMPO achieves competitive or superior performance by leveraging these flexible weighting schemes, and we provide practical guidelines for selecting reweighting methods.
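
The contrast between softmax reweighting and a general monotonic reweighting can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the linear weight function, and the toy advantages are hypothetical and are not taken from the GeMPO paper.

```python
import math

def softmax_weights(advantages, temperature=1.0):
    """Exponential (softmax) reweighting: always positive, sums to 1."""
    peak = max(advantages)  # subtract the max for numerical stability
    exps = [math.exp((a - peak) / temperature) for a in advantages]
    total = sum(exps)
    return [e / total for e in exps]

def monotone_weights(advantages, fn):
    """Apply any non-decreasing function to each advantage.

    The result may contain negative weights, which push the policy away
    from the corresponding samples instead of toward them.
    """
    return [fn(a) for a in advantages]

def reweighted_loss(per_sample_losses, weights):
    """Weighted sum of per-sample matching losses (e.g. flow-matching errors)."""
    return sum(w * loss for w, loss in zip(weights, per_sample_losses))

advantages = [-1.0, 0.0, 2.0]
soft = softmax_weights(advantages)
# Hypothetical monotone choice: the identity, which yields a negative weight
# for the below-average sample.
signed = monotone_weights(advantages, lambda a: a)
```

Softmax weights are strictly positive, so a poor sample can only be down-weighted; a signed monotone weight can make its contribution to the loss negative.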

2603.06626 2026-05-26 cs.LG cs.AI 版本更新

多流审计的全局序贯检验

Beepul Bharti, Ambar Pal, Jeremias Sulam

发表机构 * Mathematical Institute for Data Science (MINDS), Johns Hopkins University（数据科学数学研究所（MINDS），约翰霍普金斯大学）； Department of Biomedical Engineering, Johns Hopkins University（生物医学工程系，约翰霍普金斯大学）； Amazon Responsible AI（亚马逊负责任人工智能）

AI总结针对多数据流审计问题，提出基于鞅合并的序贯检验方法，在稀疏和密集备择假设下分别达到最优停止时间，并通过实验验证。

AI中文摘要

在许多风险敏感领域，随着接收更多数据，持续审计机器学习系统以快速判断其是否按设计运行至关重要。该审计任务可建模为具有 $k$ 个数据流和全局零假设的序贯假设检验问题，其中全局零假设断言系统在所有 $k$ 个流上按预期运行。在备择假设下，使用 Bonferroni 校正的标准全局序贯检验，对于大 $k$ 和显著性水平 $α$，期望停止时间为 $O\left(\ln \frac{k}{α}\right)$。在这项工作中，我们证明了依赖于通过平均和乘积规则合并鞅的高效序贯检验提供了改进的停止时间，从而对零假设具有更强的检验能力。利用这些结果，我们表明平衡检验在稀疏情形（仅少数非零流）下可以达到 Bonferroni 的 $O\left(\ln \frac{k}{α}\right)$ 速率，同时在密集备择假设（许多非零流）下实现 $O\left(\frac{1}{k}\ln \frac{1}{α}\right)$。我们通过在合成数据和真实数据上的实验验证了我们的理论。

英文摘要

Across many risk-sensitive areas, it is critical to continuously audit machine learning systems as we receive more data to quickly determine if they are performing as designed. This auditing task can be modeled as a sequential hypothesis testing problem with $k$ data streams and a global null hypothesis that asserts the system operates as intended across all $k$ streams. Under the alternative, the standard global sequential test, which uses a Bonferroni correction, has an expected stopping time of $O\left(\ln \frac{k}{α}\right)$ for large $k$ and significance level $α$. In this work, we demonstrate that efficient sequential tests, relying on merging martingales via averaging and product rules, provide improved stopping times, and thus more powerful tests against the null. Using these results, we show that a balanced test can match the Bonferroni rate of $O\left(\ln \frac{k}{α}\right)$ in the sparse regime (just a few non-null streams) while achieving $O\left(\frac{1}{k}\ln \frac{1}{α}\right)$ under dense alternatives (many non-null streams). We validate our theory through experiments on both synthetic and real-world data.
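
The three stopping rules mentioned above can be compared with a small deterministic sketch. It assumes each stream carries a nonnegative test martingale that starts at 1, that the streams are independent (needed for the product rule), and that rejection thresholds follow Ville's inequality; the function name and the input layout are hypothetical, and the paper's balanced test is not reproduced here.

```python
def stopping_times(factors, alpha):
    """First step at which each rule rejects the global null.

    factors[t][j] is the nonnegative multiplicative update applied to the
    test martingale of stream j at step t. Returns a dict mapping each rule
    to its 1-indexed stopping step, or None if it never rejects.
    """
    k = len(factors[0])
    wealth = [1.0] * k
    stops = {"bonferroni": None, "average": None, "product": None}
    for t, row in enumerate(factors, start=1):
        wealth = [w * f for w, f in zip(wealth, row)]
        product = 1.0
        for w in wealth:
            product *= w
        crossed = {
            # Bonferroni: some single stream reaches k / alpha.
            "bonferroni": max(wealth) >= k / alpha,
            # Averaging: the mean of the martingales reaches 1 / alpha.
            "average": sum(wealth) / k >= 1.0 / alpha,
            # Product: the product of the martingales reaches 1 / alpha.
            "product": product >= 1.0 / alpha,
        }
        for rule, hit in crossed.items():
            if hit and stops[rule] is None:
                stops[rule] = t
    return stops

# Dense alternative: all four streams grow by a factor of 1.5 per step.
dense = stopping_times([[1.5] * 4 for _ in range(12)], alpha=0.05)
```

In this dense toy case the product rule rejects at step 2, averaging at step 8, and Bonferroni at step 11, which mirrors the $\frac{1}{k}\ln\frac{1}{α}$ versus $\ln\frac{k}{α}$ gap described in the abstract. Because the martingales are nonnegative, the averaging rule never stops later than Bonferroni.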

2602.16340 2026-05-26 cs.LG stat.ML 版本更新

The Implicit Bias of Adam and Muon on Smooth Homogeneous Neural Networks

Adam和Muon在光滑齐次神经网络上的隐式偏差

Eitan Gronich, Gal Vardi

发表机构 * Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel（计算机科学与应用数学系，魏茨曼科学研究院，以色列雷霍夫特）

AI总结研究动量优化器在光滑齐次模型上的隐式偏差，证明Muon、MomentumGD和Signum在衰减学习率下近似于最速下降轨迹，并偏向于对应边际最大化问题的KKT点，同时将分析扩展到Adam和混合范数优化器。

Comments ICML 2026. 8 pages, 1 figure (with appendix: 45 pages, 3 figures)

2602.09130 2026-05-26 cs.LG 版本更新

UniComp: A Unified Evaluation of Large Language Model Compression via Pruning, Quantization and Distillation

UniComp: 通过剪枝、量化和蒸馏对大型语言模型压缩的统一评估

Jonathan von Rad, Yong Cao, Andreas Geiger

发表机构 * University College London（伦敦大学学院）； University of Tübingen（图宾根大学）； Tübingen AI Center（图宾根人工智能中心）

AI总结提出UniComp框架，统一评估剪枝、量化和知识蒸馏三种压缩方法，从性能、可靠性和效率三个维度在40个数据集上分析，发现知识偏差、性能与可靠性解耦以及任务特定校准可提升推理性能。

Comments 18 pages, 5 figures, 18 tables

AI中文摘要

模型压缩对于部署大型语言模型（LLM）日益重要，然而现有的比较研究主要集中在剪枝和量化，且主要基于知识中心的基准进行评估。因此，我们引入了UniComp，一个用于比较剪枝、量化和知识蒸馏的统一评估框架。UniComp从性能、可靠性和效率三个维度评估压缩模型，使用多样化的面向能力和安全性的基准以及硬件感知的效率分析。通过对40个数据集上的六种压缩技术进行评估，我们观察到：(i) 一致的知识偏差，即事实回忆基本保留，而多步推理、多语言和指令遵循能力下降；(ii) 性能与可靠性之间的解耦，表明保留的性能并不一致地意味着保留的可靠性；(iii) 任务特定校准可以在剪枝模型中实现高达50%的推理性能相对提升。

英文摘要

Model compression is increasingly essential for deploying large language models (LLMs), yet existing comparative studies largely focus on pruning and quantization evaluated primarily on knowledge-centric benchmarks. Thus, we introduce UniComp, a unified evaluation framework for comparing pruning, quantization, and knowledge distillation. UniComp evaluates compressed models along three dimensions: performance, reliability, and efficiency, using a diverse set of capability- and safety-oriented benchmarks together with a hardware-aware efficiency analysis. Through evaluation of six compression techniques across 40 datasets, we observe (i) a consistent knowledge bias, where factual recall is largely preserved while multi-step reasoning, multilingual, and instruction-following capabilities degrade; (ii) a decoupling between performance and reliability, indicating that retained performance does not consistently imply preserved reliability; and (iii) that task-specific calibration can yield up to 50% relative improvement of reasoning performance in pruned models.

2602.06357 2026-05-26 cs.LG 版本更新

LLM-SAA: LLM-persona Generated Distributions for Decision-making

LLM-SAA：基于LLM人格生成分布的决策方法

Jackie Baek, Yunhan Chen, Ziyu Chi, Will Ma

发表机构 * Stern School of Business, New York University（纽约大学 Stern 商学院）； Department of Computer Science, Columbia University（哥伦比亚大学计算机科学系）； Graduate School of Business and Data Science Institute, Columbia University（哥伦比亚大学商学院和数据科学研究院）

AI总结研究利用LLM生成分布（如模拟消费者支付意愿）支持下游决策，通过三个经典问题（分类优化、定价、报童模型）评估其实际效用，发现低数据场景下有效，且决策无关指标（如Wasserstein距离）可能误导。

AI中文摘要

LLM可以生成丰富的数据，从模拟人类估值和偏好的虚拟人格，到基于世界知识的需求预测。但这类LLM生成的分布对下游决策的支持程度如何？例如，在定价新产品时，企业可以提示LLM根据产品描述模拟消费者愿意支付的价格，但由此得到的分布对优化价格有多大用处？我们将这种方法称为LLM-SAA，即利用LLM构建估计分布，然后在该分布下优化决策。在本文中，我们研究基于这些分布所诱导的决策来评估其质量的指标。以三个经典决策问题（分类优化、定价和报童模型）为例，我们发现LLM生成的分布在实际中是有用的，尤其是在低数据场景下。我们还表明，在评估这些分布用于决策时，诸如Wasserstein距离等与决策无关的指标可能会产生误导。

英文摘要

LLMs can generate a wealth of data, ranging from simulated personas imitating human valuations and preferences, to demand forecasts based on world knowledge. But how well do such LLM-generated distributions support downstream decision-making? For example, when pricing a new product, a firm could prompt an LLM to simulate how much consumers are willing to pay based on a product description, but how useful is the resulting distribution for optimizing the price? We refer to this approach as LLM-SAA, in which an LLM is used to construct an estimated distribution and the decision is then optimized under that distribution. In this paper, we study metrics to evaluate the quality of these LLM-generated distributions, based on the decisions they induce. Taking three canonical decision-making problems (assortment optimization, pricing, and newsvendor) as examples, we find that LLM-generated distributions are practically useful, especially in low-data regimes. We also show that decision-agnostic metrics such as Wasserstein distance can be misleading when evaluating these distributions for decision-making.
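
The pricing case of LLM-SAA can be sketched as a sample average approximation over willingness-to-pay samples. The sample values below are hypothetical stand-ins for LLM-persona outputs, and the function names are illustrative rather than the paper's.

```python
def revenue(price, wtp_samples):
    """Expected revenue per customer when a customer buys iff wtp >= price."""
    buyers = sum(1 for w in wtp_samples if w >= price)
    return price * buyers / len(wtp_samples)

def saa_price(wtp_samples):
    """Price maximizing revenue under the empirical distribution.

    An optimal price always coincides with one of the sample values, so it
    is enough to scan the distinct samples.
    """
    best_price, best_rev = None, -1.0
    for p in sorted(set(wtp_samples)):
        rev = revenue(p, wtp_samples)
        if rev > best_rev:
            best_price, best_rev = p, rev
    return best_price, best_rev

# Hypothetical willingness-to-pay values generated by simulated personas.
llm_samples = [10, 20, 20, 30, 50]
# Hypothetical held-out values observed from real customers.
real_samples = [10, 25, 40]

price, _ = saa_price(llm_samples)
decision_quality = revenue(price, real_samples)
```

Scoring the generated distribution by `decision_quality`, the revenue its induced price earns on real data, is one way to make the evaluation decision-aware rather than relying on a distance such as Wasserstein.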

2602.04653 2026-05-26 cs.CR cs.LG 版本更新

Inference-Time Backdoors via Chat Templates: From LLM Supply Chains to Agentic System Compromise

通过聊天模板的推理时后门：从LLM供应链到代理系统妥协

Ariel Fogel, Omer Hofman, Eilon Cohen, Roman Vainshtein

发表机构 * Fujitsu Research of Europe（富士通欧洲研究）

AI总结提出一种通过恶意修改聊天模板实现推理时后门攻击的方法，无需修改模型权重或训练数据，在LLM、代理和多代理系统层面均能成功攻击，且能绕过现有防御。

Comments V3: Accepted to ICLR 2026 Trustworthy AI Workshop, V4: Submitted to CCS 2026

深度，而非数据：Hessian谱分叉分析

Shenyang Deng, Boyao Liao, Zhuoli Ouyang, Tianyu Pang, Yaoqing Yang

发表机构 * Department of Computer Science（计算机科学系）； Dartmouth College（达特茅斯学院）； University of Birmingham（伯明翰大学）

AI总结本文通过分析深度线性网络，证明Hessian矩阵的谱分叉结构（主导特征值与主体特征值分离）可仅由网络深度引起，与数据协方差平衡无关，且主导与主体特征值之比与深度线性相关。

AI中文摘要

Hessian矩阵的特征值分布在理解深度神经网络的优化景观中起着关键作用。先前的工作将广泛记录的“主体-尖峰”谱结构（其中少数主导特征值与大量较小特征值分离）归因于数据协方差矩阵的不平衡。在这项工作中，我们通过证明这种谱分叉可以纯粹由网络架构引起，而与数据不平衡无关，来挑战这一观点。具体来说，我们分析了一个深度线性网络设置，并证明即使数据协方差完全平衡，Hessian仍然表现出分叉特征值结构：一个主导簇和一个主体簇。至关重要的是，我们建立了主导特征值与主体特征值之比与网络深度呈线性关系。这表明谱间隙受到网络架构的强烈影响，而不仅仅是由数据分布决定。我们的结果表明，在设计深度网络的优化算法时，应同时考虑模型架构和数据特征。

英文摘要

The eigenvalue distribution of the Hessian matrix plays a crucial role in understanding the optimization landscape of deep neural networks. Prior work has attributed the well-documented "bulk-and-spike" spectral structure, where a few dominant eigenvalues are separated from a bulk of smaller ones, to the imbalance in the data covariance matrix. In this work, we challenge this view by demonstrating that such spectral bifurcation can arise purely from the network architecture, independent of data imbalance. Specifically, we analyze a deep linear network setup and prove that, even when the data covariance is perfectly balanced, the Hessian still exhibits a bifurcated eigenvalue structure: a dominant cluster and a bulk cluster. Crucially, we establish that the ratio between dominant and bulk eigenvalues scales linearly with the network depth. This reveals that the spectral gap is strongly affected by the network architecture rather than solely by the data distribution. Our results suggest that both model architecture and data characteristics should be considered when designing optimization algorithms for deep networks.

2602.00511 2026-05-26 cs.LG math.OC 版本更新

Partition of Unity Neural Networks for Interpretable Classification with Explicit Class Regions

用于可解释分类的单元划分神经网络及显式类别区域

Akram Aldroubi

发表机构 * Department of Mathematics（数学系）

AI总结提出单元划分神经网络（PUNN），通过直接学习满足和为1的非负函数来定义类别概率，无需softmax层，实现可解释分类并证明其稠密性，实验表明在保持精度的同时大幅减少参数。

Comments v2: substantially revised; under review at TMLR

AI中文摘要

尽管神经网络分类器在经验上取得了成功，但它们仍然难以解释。在基于softmax的模型中，类别区域被隐式定义为logits之间不等式系统的解，这使得它们难以提取和可视化。我们引入了单元划分神经网络（PUNN），这是一种架构，其中类别概率直接来自学习到的单元划分，无需softmax层。PUNN构造了$k$个非负函数$h_1, \ldots, h_k$，满足$\sum_i h_i(x) = 1$，其中每个$h_i(x)$直接表示$P(\text{类别 } i \mid x)$。与softmax不同，其中类别区域通过logits之间的耦合不等式隐式定义，每个PUNN划分函数$h_i$直接定义类别$i$的概率作为$x$的独立函数。我们证明了PUNN在紧致域上的连续概率映射空间中是稠密的。定义划分的门函数$g_i$可以使用各种激活函数（sigmoid、高斯、bump）和参数化方法，从灵活的MLP到参数高效、形状感知的设计（球壳、椭球、球谐函数）。在合成数据、UCI基准和MNIST上的实验表明，基于MLP门的PUNN在精度上达到标准多层感知机的0.3-0.6%以内。当几何先验与数据结构匹配时，形状感知的门在参数减少多达300倍的情况下实现了相当的精度。这些结果表明，可解释性设计架构可以与黑盒模型竞争，同时提供透明的类别概率分配。

英文摘要

Despite their empirical success, neural network classifiers remain difficult to interpret. In softmax-based models, class regions are defined implicitly as solutions to systems of inequalities among logits, making them difficult to extract and visualize. We introduce Partition of Unity Neural Networks (PUNN), an architecture in which class probabilities arise directly from a learned partition of unity, without requiring a softmax layer. PUNN constructs $k$ nonnegative functions $h_1, \ldots, h_k$ satisfying $\sum_i h_i(x) = 1$, where each $h_i(x)$ directly represents $P(\text{class } i \mid x)$. Unlike softmax, where class regions are defined implicitly through coupled inequalities among logits, each PUNN partition function $h_i$ directly defines the probability of class $i$ as a standalone function of $x$. We prove that PUNN is dense in the space of continuous probability maps on compact domains. The gate functions $g_i$ that define the partition can use various activation functions (sigmoid, Gaussian, bump) and parameterizations ranging from flexible MLPs to parameter-efficient shape-informed designs (spherical shells, ellipsoids, spherical harmonics). Experiments on synthetic data, UCI benchmarks, and MNIST show that PUNN with MLP-based gates achieves accuracy within 0.3-0.6% of standard multilayer perceptrons. When geometric priors match the data structure, shape-informed gates achieve comparable accuracy with up to 300× fewer parameters. These results demonstrate that interpretable-by-design architectures can be competitive with black-box models while providing transparent class probability assignments.
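
A partition of unity built from nonnegative gates can be illustrated as follows. The abstract does not state how PUNN turns the gates $g_i$ into the functions $h_i$, so this sketch assumes the simplest textbook construction, normalizing Gaussian gates by their sum; the centers, width, and function names are hypothetical.

```python
import math

def gaussian_gate(x, center, width):
    """Nonnegative gate g(x) = exp(-||x - center||^2 / (2 * width^2))."""
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-d2 / (2.0 * width ** 2))

def partition_of_unity(x, centers, width=1.0):
    """Normalize the gates so that the outputs are nonnegative and sum to 1.

    Assumes x is close enough to at least one center that the gates do not
    all underflow to zero.
    """
    gates = [gaussian_gate(x, c, width) for c in centers]
    total = sum(gates)
    return [g / total for g in gates]

# Two hypothetical class centers in the plane.
centers = [(0.0, 0.0), (2.0, 0.0)]
midpoint = partition_of_unity((1.0, 0.0), centers)  # equidistant point
near_first = partition_of_unity((0.0, 0.0), centers)
```

Each output can be read directly as a class probability at `x`, and the region where class $i$ dominates is the set where its normalized gate is largest.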

2601.23164 2026-05-26 cs.LG 版本更新

Stochastic Linear Bandits with Parameter Noise

带有参数噪声的随机线性赌博机

Daniel Ezer, Alon Peled-Cohen, Yishay Mansour

发表机构 * Tel Aviv University（特拉维夫大学）； Google Research（谷歌研究）

AI总结研究带有参数噪声的随机线性赌博机模型，提出一种简单的探索-利用算法，实现了与下界匹配（对数因子内）的遗憾界，并揭示了与经典加性噪声模型不同的最优遗憾阶。

Comments 8 pages

AI中文摘要

我们研究了带有参数噪声模型的随机线性赌博机，其中动作$a$的奖励为$a^\top θ$，$θ$是独立同分布的样本。我们给出了一个遗憾上界$\widetilde{O} (\sqrt{d T \log (K/δ) σ^2_{\max}})$，其中$T$是时间范围，动作集大小为$K$，维度为$d$，$σ^2_{\max}$是任何动作奖励的最大方差。我们进一步给出了一个下界$\widetilde{Ω} (d \sqrt{T σ^2_{\max}})$，当$\log (K) \approx d$时，该下界是紧的（忽略对数因子）。对于更具体的动作集，即$p \leq 2$的$\ell_p$单位球及其对偶范数$q$，我们证明了极小极大遗憾为$\widetilde{Θ} (\sqrt{dT σ^2_q})$，其中$σ^2_q$是一个与方差相关的量，且始终不超过4。这与经典加性噪声模型中此类动作集可达到的极小极大遗憾（阶为$d \sqrt{T}$）形成对比。令人惊讶的是，我们表明这个最优（忽略对数因子）遗憾界可以通过一个非常简单的探索-利用算法实现。

英文摘要

We study the stochastic linear bandits with parameter noise model, in which the reward of action $a$ is $a^\top θ$ where $θ$ is sampled i.i.d. We show a regret upper bound of $\widetilde{O} (\sqrt{d T \log (K/δ) σ^2_{\max}})$ for a horizon $T$, general action set of size $K$ of dimension $d$, and where $σ^2_{\max}$ is the maximal variance of the reward for any action. We further provide a lower bound of $\widetilde{Ω} (d \sqrt{T σ^2_{\max}})$ which is tight (up to logarithmic factors) whenever $\log (K) \approx d$. For more specific action sets, $\ell_p$ unit balls with $p \leq 2$ and dual norm $q$, we show that the minimax regret is $\widetilde{Θ} (\sqrt{dT σ^2_q})$, where $σ^2_q$ is a variance-dependent quantity that is always at most $4$. This is in contrast to the minimax regret attainable for such sets in the classic additive noise model, where the regret is of order $d \sqrt{T}$. Surprisingly, we show that this optimal (up to logarithmic factors) regret bound is attainable using a very simple explore-exploit algorithm.
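
A generic explore-then-exploit loop for the parameter noise model might look like the sketch below. The abstract does not spell out the paper's algorithm, so this is an assumption-laden illustration: it assumes the standard basis vectors are playable during exploration, uses independent Gaussian coordinate noise, and every name and constant is hypothetical.

```python
import random

def explore_then_exploit(actions, sample_theta, dim, horizon, explore_per_axis):
    """Explore the standard basis, estimate the mean parameter, then commit.

    sample_theta() draws one i.i.d. parameter vector; the reward of action a
    in that round is the inner product <a, theta>.
    Returns (estimated mean parameter, committed action, total reward).
    """
    total_reward = 0.0
    estimate = []
    for axis in range(dim):
        collected = 0.0
        for _ in range(explore_per_axis):
            theta = sample_theta()
            collected += theta[axis]  # reward of the basis vector e_axis
        total_reward += collected
        estimate.append(collected / explore_per_axis)

    def estimated_value(action):
        return sum(ai * ei for ai, ei in zip(action, estimate))

    best = max(actions, key=estimated_value)
    for _ in range(horizon - dim * explore_per_axis):
        theta = sample_theta()
        total_reward += sum(ai * ti for ai, ti in zip(best, theta))
    return estimate, best, total_reward

rng = random.Random(0)
mean_theta = (1.0, 0.0)

def sample_theta():
    # Hypothetical noise model: independent Gaussian noise on each coordinate.
    return [m + rng.gauss(0.0, 0.1) for m in mean_theta]

actions = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.8)]
estimate, best, total = explore_then_exploit(actions, sample_theta, 2, 1000, 50)
```

Note that the noise enters through the parameter, so the reward variance depends on the action played; in the additive noise model it would be the same for every action.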

2601.22925 2026-05-26 cs.IR cs.AI cs.LG 版本更新

BEAR: Towards Beam-Search-Aware Optimization for Recommendation with Large Language Models

BEAR：面向大语言模型推荐中束搜索感知的优化

Weiqin Yang, Bohao Wang, Zhenxiang Xu, Jiawei Chen, Shengjia Zhang, Jingbang Chen, Canghong Jin, Can Wang

发表机构 * Zhejiang University（浙江大学）； The Chinese University of Hong Kong, Shenzhen（香港中文大学（深圳））； Hangzhou City University（杭州市城市大学）

AI总结针对监督微调与束搜索推理之间的不一致性，提出BEAR正则化方法，通过确保正例每个token在解码步骤中排名前B来避免过早剪枝，显著提升推荐性能。

Comments Accepted by SIGIR 2026

DOI: 10.1145/3805712.3809533

AI中文摘要

近年来，利用大语言模型（LLM）进行推荐的研究迅速增长。这些方法通常采用监督微调（SFT）使LLM适应推荐场景，并在推理时使用束搜索高效检索前B个推荐项。然而，我们发现了关键的训练-推理不一致性：虽然SFT优化正例的整体概率，但即使这些项具有高整体概率，也不能保证它们会被束搜索检索到。由于贪心剪枝机制，束搜索可能会在正例的前缀概率不足时过早丢弃它。为了解决这种不一致性，我们提出了BEAR（束搜索感知正则化），一种新的微调目标，在训练中显式考虑束搜索行为。BEAR不直接模拟每个训练实例的束搜索（计算代价过高），而是强制执行一个宽松的必要条件：正例中的每个token在每个解码步骤中必须排在前B个候选token中。该目标有效降低了错误剪枝的风险，同时与标准SFT相比仅增加可忽略的计算开销。在四个真实世界数据集上的大量实验表明，BEAR显著优于强基线。代码可在https://github.com/Tiny-Snow/BEAR-SIGIR-2026获取。

英文摘要

Recent years have seen a rapid surge in research leveraging Large Language Models (LLMs) for recommendation. These methods typically employ supervised fine-tuning (SFT) to adapt LLMs to recommendation scenarios, and utilize beam search during inference to efficiently retrieve $B$ top-ranked recommended items. However, we identify a critical training-inference inconsistency: while SFT optimizes the overall probability of positive items, it does not guarantee that such items will be retrieved by beam search even if they possess high overall probabilities. Due to the greedy pruning mechanism, beam search can prematurely discard a positive item once its prefix probability is insufficient. To address this inconsistency, we propose BEAR (Beam-SEarch-Aware Regularization), a novel fine-tuning objective that explicitly accounts for beam search behavior during training. Rather than directly simulating beam search for each instance during training, which is computationally prohibitive, BEAR enforces a relaxed necessary condition: each token in a positive item must rank within the top-$B$ candidate tokens at each decoding step. This objective effectively mitigates the risk of incorrect pruning while incurring negligible computational overhead compared to standard SFT. Extensive experiments across four real-world datasets demonstrate that BEAR significantly outperforms strong baselines. Code is available at https://github.com/Tiny-Snow/BEAR-SIGIR-2026 .
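
The relaxed necessary condition can be checked directly from per-step logits, as in the sketch below. The rank check follows the condition stated in the abstract; the hinge penalty is a hypothetical surrogate added for illustration and is not claimed to be BEAR's actual regularizer. All function names are assumptions.

```python
def rank_of(logits, token):
    """1-based rank of a token's logit; ties are counted against the token."""
    higher = sum(1 for j, v in enumerate(logits) if j != token and v >= logits[token])
    return 1 + higher

def satisfies_top_b(step_logits, target_tokens, beam_width):
    """True iff every target token ranks within the top-B at its decoding step."""
    return all(
        rank_of(logits, tok) <= beam_width
        for logits, tok in zip(step_logits, target_tokens)
    )

def top_b_hinge(step_logits, target_tokens, beam_width):
    """Hypothetical surrogate: total margin by which target logits fall
    below the B-th largest logit, summed over decoding steps."""
    penalty = 0.0
    for logits, tok in zip(step_logits, target_tokens):
        threshold = sorted(logits, reverse=True)[beam_width - 1]
        penalty += max(0.0, threshold - logits[tok])
    return penalty

# Two decoding steps over a three-token vocabulary; the positive item is
# the token sequence [1, 2].
step_logits = [[2.0, 1.0, 0.0], [0.0, 3.0, 1.0]]
target = [1, 2]
ok_with_two_beams = satisfies_top_b(step_logits, target, beam_width=2)
ok_with_one_beam = satisfies_top_b(step_logits, target, beam_width=1)
```

The condition is necessary rather than sufficient: a token outside the top-B at any step cannot survive pruning, but passing the check does not by itself guarantee the full item is retrieved.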

2601.21924 2026-05-26 cs.LG stat.ML 版本更新

One-Step Bellman Alignment Enables Provably Efficient Transfer in Online RL

一步贝尔曼对齐实现在线强化学习中的可证明高效迁移

Elynn Chen, Enpei Zhang, Jinhang Chai, Yujun Yan

发表机构 * Department of Technology, Operations, & Statistics, New York University（纽约大学技术、运营与统计系）； Department of Operations Research & Financial Engineering, Princeton University（普林斯顿大学运筹学与金融工程系）； Department of Computer Science, Dartmouth College（达特茅斯学院计算机科学系）

AI总结提出一步贝尔曼对齐作为在线强化学习中迁移的正确抽象，并通过重加权目标（RWT）实现算子级修正，在RKHS函数逼近下建立了与任务迁移复杂度相关的遗憾界。

AI中文摘要

我们研究在情节马尔可夫决策过程中的在线迁移强化学习，其中在学习目标任务时，来自相关源任务的经验是可用的。一个基本困难在于任务相似性通常根据奖励或转移来定义，而在线RL算法操作在贝尔曼回归目标上。因此，简单地重用源贝尔曼更新会引入系统性偏差并使遗憾保证失效。我们识别出一阶贝尔曼对齐作为在线RL中迁移的正确抽象，并提出重加权目标（RWT），这是一种算子级修正，通过测度变换重新定位延续值并补偿转移不匹配。RWT将任务不匹配简化为固定的一步修正，并实现了源数据的统计上合理的重用。这种对齐产生了一个两阶段RWT Q学习框架，将方差减少与偏差修正分离。在RKHS函数逼近下，我们建立的遗憾界随任务迁移的复杂度而非目标MDP的复杂度变化。我们进一步证明了所需的密度比允许一个具有有限样本保证的构造性RKHS估计器，并经验验证了对估计和错误指定比率的鲁棒性。在表格和神经网络设置中的实证结果均显示，与单任务学习和朴素池化相比，持续改进，突出了贝尔曼对齐作为在线RL中模型无关的迁移原理。

英文摘要

We study online transfer reinforcement learning (RL) in episodic Markov decision processes, where experience from related source tasks is available during learning on a target task. A fundamental difficulty is that task similarity is typically defined in terms of rewards or transitions, whereas online RL algorithms operate on Bellman regression targets. As a result, naively reusing source Bellman updates introduces systematic bias and invalidates regret guarantees. We identify one-step Bellman alignment as the correct abstraction for transfer in online RL and propose re-weighted targeting (RWT), an operator-level correction that retargets continuation values and compensates for transition mismatch via a change of measure. RWT reduces task mismatch to a fixed one-step correction and enables statistically sound reuse of source data. This alignment yields a two-stage RWT $Q$-learning framework that separates variance reduction from bias correction. Under RKHS function approximation, we establish regret bounds that scale with the complexity of the task shift rather than the target MDP. We further show the required density ratios admit a constructive RKHS estimator with finite-sample guarantees, and empirically validate robustness to estimated and mis-specified ratios. Empirical results in both tabular and neural network settings demonstrate consistent improvements over single-task learning and naïve pooling, highlighting Bellman alignment as a model-agnostic transfer principle for online RL.

2601.21601 2026-05-26 cs.LG cs.AI 版本更新

Dynamics Reveals Structure: Challenging the Linear Propagation Assumption

动力学揭示结构：挑战线性传播假设

Hoyeon Chang, Bálint Mucsányi, Seong Joon Oh

发表机构 * University of Tübingen（图宾根大学）

AI总结通过关系代数研究神经网络中线性传播假设的几何极限，证明其在对合运算（否定、逆）上可行，但在组合运算上存在根本性障碍，导致特征映射崩溃，并解释知识编辑失败、反转诅咒和多跳推理等问题的共同根源。

AI中文摘要

神经网络通过一阶参数更新进行自适应，但尚不清楚这种更新是否保持逻辑一致性。我们研究了线性传播假设（LPA）的几何极限，该假设认为局部更新能够连贯地传播到逻辑结论。为了形式化这一点，我们采用关系代数，研究关系的三种核心运算：否定翻转真值、逆交换参数顺序、组合链接关系。对于否定和逆，我们证明保证与方向无关的一阶传播需要一种张量分解，将实体对上下文与关系内容分离。然而，对于组合，我们识别出一个根本性障碍。我们证明组合可归结为合取，并证明任何在线性特征上良好定义的合取必须是双线性的。由于双线性与否定不兼容，这迫使特征映射崩溃。这些结果表明，知识编辑失败、反转诅咒和多跳推理可能源于LPA固有的共同结构限制。

英文摘要

Neural networks adapt through first-order parameter updates, yet it remains unclear whether such updates preserve logical coherence. We investigate the geometric limits of the Linear Propagation Assumption (LPA), the premise that local updates coherently propagate to logical consequences. To formalize this, we adopt relation algebra and study three core operations on relations: negation flips truth values, converse swaps argument order, and composition chains relations. For negation and converse, we prove that guaranteeing direction-agnostic first-order propagation necessitates a tensor factorization separating entity-pair context from relation content. However, for composition, we identify a fundamental obstruction. We show that composition reduces to conjunction, and prove that any conjunction well-defined on linear features must be bilinear. Since bilinearity is incompatible with negation, this forces the feature map to collapse. These results suggest that failures in knowledge editing, the reversal curse, and multi-hop reasoning may stem from common structural limitations inherent to the LPA.

2601.20738 2026-05-26 cs.LG cs.DC eess.SP math.OC stat.ML 版本更新

SA-PEF: Step-Ahead Partial Error Feedback for Efficient Federated Learning

SA-PEF：用于高效联邦学习的前瞻部分误差反馈

Dawit Kiros Redie, Reza Arablouei, Stefan Werner

发表机构 * Department of Electronic Systems, Norwegian University of Science and Technology (NTNU)（挪威科学技术大学电子系统系）； Department of Information and Communications Engineering, Aalto University（阿尔托大学信息与通信工程系）； CSIRO’s Data61（CSIRO数据61）

AI总结提出SA-PEF方法，通过结合前瞻校正和部分误差反馈，在非IID数据和部分客户端参与下加速联邦学习收敛，并理论证明其收敛速率与Fed-SGD相当。

Journal ref: Transactions on Machine Learning Research, 2026

AI中文摘要

带误差反馈（EF）的有偏梯度压缩减少了联邦学习（FL）中的通信，但在非IID数据下，残差误差可能缓慢衰减，导致早期轮次中的梯度不匹配和进度停滞。我们提出前瞻部分误差反馈（SA-PEF），它集成了前瞻（SA）校正与部分误差反馈（PEF）。当前瞻系数$α=0$时，SA-PEF恢复为EF；当$α=1$时，恢复为前瞻EF（SAEF）。对于非凸目标和$δ$-收缩压缩器，我们建立了二阶矩界和残差递归，保证了在异构数据和部分客户端参与下收敛到平稳点。得到的速率与标准非凸Fed-SGD保证在常数因子内匹配，在固定内步长下实现$O((η,η_0TR)^{-1})$收敛到方差/异质性下界。我们的分析揭示了一个由前瞻控制的残差收缩$ρ_r$，解释了早期训练阶段观察到的加速。为了平衡SAEF的快速预热与EF的长期稳定性，我们选择接近理论预测最优的$α$。跨多种架构和数据集的实验表明，SA-PEF始终比EF更快达到目标精度。

英文摘要

Biased gradient compression with error feedback (EF) reduces communication in federated learning (FL), but under non-IID data, the residual error can decay slowly, causing gradient mismatch and stalled progress in the early rounds. We propose step-ahead partial error feedback (SA-PEF), which integrates step-ahead (SA) correction with partial error feedback (PEF). SA-PEF recovers EF when the step-ahead coefficient $α=0$ and step-ahead EF (SAEF) when $α=1$. For non-convex objectives and $δ$-contractive compressors, we establish a second-moment bound and a residual recursion that guarantee convergence to stationarity under heterogeneous data and partial client participation. The resulting rates match standard non-convex Fed-SGD guarantees up to constant factors, achieving $O((η,η_0TR)^{-1})$ convergence to a variance/heterogeneity floor with a fixed inner step size. Our analysis reveals a step-ahead-controlled residual contraction $ρ_r$ that explains the observed acceleration in the early training phase. To balance SAEF's rapid warm-up with EF's long-term stability, we select $α$ near its theory-predicted optimum. Experiments across diverse architectures and datasets show that SA-PEF consistently reaches target accuracy faster than EF.
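
For orientation, the classic error feedback baseline (the case the abstract says SA-PEF recovers at $α=0$) can be sketched with a top-k compressor. The step-ahead correction and the partial feedback of SA-PEF are not shown, since the abstract does not define them; the function names are hypothetical.

```python
def top_k(vector, k):
    """Keep the k largest-magnitude entries and zero the rest (a biased compressor)."""
    order = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vector)]

def error_feedback_step(gradient, residual, k):
    """One client-side round of classic error feedback.

    The residual left over from earlier compression is added back before
    compressing, and whatever is not sent becomes the new residual.
    Returns (message sent to the server, residual for the next round).
    """
    corrected = [g + r for g, r in zip(gradient, residual)]
    message = top_k(corrected, k)
    new_residual = [c - m for c, m in zip(corrected, message)]
    return message, new_residual

gradient = [3.0, -1.0, 0.5]
message, residual = error_feedback_step(gradient, [0.0, 0.0, 0.0], k=1)
message2, residual2 = error_feedback_step(gradient, residual, k=1)
```

With the same gradient repeated, the unsent coordinates accumulate in the residual from round to round, which is the slowly decaying error the abstract refers to.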

2601.15544 2026-05-26 cs.LG cs.AI 版本更新

RDumb++: Drift-Aware Continual Test-Time Adaptation

RDumb++：漂移感知的持续测试时自适应

Himanshu Mishra

发表机构 * Department of Computer Science（计算机科学系）； University of British Columbia（不列颠哥伦比亚大学）

AI总结针对持续测试时自适应中分布快速变化或长期漂移导致性能崩溃的问题，提出RDumb++方法，通过熵和KL散度漂移检测机制与自适应重置策略，在CCC基准上实现约3%的绝对准确率提升。

AI中文摘要

持续测试时自适应（CTTA）旨在仅使用传入的无标签数据流在部署期间更新预训练模型。尽管先前的方法如Tent、EATA等在短期演化偏移下提供了有意义的改进，但当测试分布快速变化或时间跨度极长时，它们表现不佳。CCC基准测试体现了这一挑战，模型在包含750万样本且不断变化损坏类型和严重程度的数据流上运行。我们提出RDumb++，它是RDumb的合理扩展，引入了两种漂移检测机制，即基于熵的漂移评分和KL散度漂移评分，以及自适应重置策略。这些机制使模型能够检测累积的自适应何时变得有害，并在预测崩溃发生前恢复。在包含三种速度和三种种子的CCC-medium（九次运行，每次包含一百万样本）上，RDumb++始终优于RDumb，在整个数据流中实现约3%的绝对准确率提升，同时保持稳定的自适应。关于漂移阈值和重置强度的消融实验进一步表明，漂移感知重置对于防止崩溃和实现可靠的长期CTTA至关重要。

英文摘要

Continual Test-Time Adaptation (CTTA) seeks to update a pretrained model during deployment using only the incoming, unlabeled data stream. Although prior approaches such as Tent and EATA provide meaningful improvements under short evolving shifts, they struggle when the test distribution changes rapidly or over extremely long horizons. This challenge is exemplified by the CCC benchmark, where models operate over streams of 7.5M samples with continually changing corruption types and severities. We propose RDumb++, a principled extension of RDumb that introduces two drift-detection mechanisms, i.e., entropy-based drift scoring and KL-divergence drift scoring, together with adaptive reset strategies. These mechanisms allow the model to detect when accumulated adaptation becomes harmful and to recover before prediction collapse occurs. Across CCC-medium with three speeds and three seeds (nine runs, each containing one million samples), RDumb++ consistently surpasses RDumb, yielding approximately 3% absolute accuracy gains while maintaining stable adaptation throughout the entire stream. Ablation experiments on drift thresholds and reset strengths further show that drift-aware resetting is essential for preventing collapse and achieving reliable long-horizon CTTA.
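
One plausible reading of the two drift scores is sketched below. The abstract does not give the exact definitions, so the formulas here are assumptions: the entropy score is the deviation of the mean prediction entropy from a reference value, and the KL score compares the batch-average prediction with a reference distribution. Thresholds and names are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy of one predicted class distribution (in nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), with q clipped away from zero for numerical safety."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)

def mean_distribution(batch_probs):
    """Average predicted distribution over a batch."""
    n = len(batch_probs)
    return [sum(column) / n for column in zip(*batch_probs)]

def drift_scores(batch_probs, ref_mean_probs, ref_entropy):
    """Return (entropy-based score, KL-based score) for one batch."""
    mean_entropy = sum(entropy(p) for p in batch_probs) / len(batch_probs)
    entropy_score = abs(mean_entropy - ref_entropy)
    kl_score = kl_divergence(mean_distribution(batch_probs), ref_mean_probs)
    return entropy_score, kl_score

def should_reset(scores, entropy_threshold=0.5, kl_threshold=0.5):
    """Hypothetical reset rule: reset the model if either score is too large."""
    return scores[0] > entropy_threshold or scores[1] > kl_threshold

reference = [0.5, 0.5]
collapsed_batch = [[1.0, 0.0], [1.0, 0.0]]  # every sample predicted as class 0
scores = drift_scores(collapsed_batch, reference, entropy(reference))
```

A batch whose predictions have collapsed onto one class scores $\ln 2$ on both measures in this two-class toy case, which exceeds the illustrative thresholds and would trigger a reset.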

2601.14340 2026-05-26 cs.CR cs.LG 版本更新

Turn-Based Structural Triggers: Prompt-Free Backdoors in Multi-Turn LLMs

基于回合的结构性触发器：多轮LLM中的无提示后门

Yiyang Lu, Jinwen He, Yue Zhao, Kai Chen, Ruigang Liang, Cheng Hong, Yingjun Zhang

发表机构 * School of Cyber Security, University of Chinese Academy of Sciences（中国科学院大学网络安全学院）； Institute of Information Engineering, Chinese Academy of Sciences（中国科学院信息工程研究所）； Institute of Software, Chinese Academy of Sciences（中国科学院软件研究所）； Ant Group（蚂蚁集团）

AI总结提出一种利用对话结构（回合索引）作为触发器的后门攻击方法TST，无需用户输入即可激活，实现高攻击成功率并绕过提示中心防御。

AI中文摘要

大型语言模型（LLM）被广泛集成到交互式系统中，如对话代理和面向任务的助手。这一日益增长的生态系统也带来了供应链风险，攻击者可以分发被污染的模型，降低下游可靠性和用户信任。现有的后门攻击和防御大多以提示为中心，关注用户可见的触发器，而忽视了多轮对话中的结构信号。我们提出了基于回合的结构性触发器（TST），这是一种从对话结构激活的后门攻击，使用回合索引作为触发器，且独立于用户输入。这造成了一种结构条件性的可靠性风险：带有后门的模型可以通过以提示为中心的检查和标准效用评估，但在选定的对话位置执行攻击者指定的行为，而用户输入中没有任何触发器。在四个开源LLM家族中，TST实现了99.52%的平均攻击成功率，同时基本保持了非触发效用，并且在未见过的对话数据集和代表性防御中仍然有效。这些结果揭示了对话结构是一个被忽视的攻击面，并激励了超越提示检查的结构感知多轮审计。

英文摘要

Large Language Models (LLMs) are widely integrated into interactive systems such as dialogue agents and task-oriented assistants. This growing ecosystem also raises supply-chain risks, where adversaries can distribute poisoned models that degrade downstream reliability and user trust. Existing backdoor attacks and defenses are largely prompt-centric, focusing on user-visible triggers while overlooking structural signals in multi-turn conversations. We propose Turn-based Structural Trigger (TST), a backdoor attack that activates from dialogue structure, using the turn index as the trigger and remaining independent of user inputs. This creates a structure-conditioned reliability risk: a backdoored model can pass prompt-centric checks and standard utility evaluations, yet execute attacker-specified behaviors at selected dialogue positions without any trigger in the user input. Across four open-source LLM families, TST achieves a 99.52% average ASR while largely preserving non-triggered utility, and remains effective across unseen dialogue datasets and representative defenses. These results reveal dialogue structure as an overlooked attack surface and motivate structure-aware multi-turn auditing beyond prompt inspection.

2601.10494 2026-05-26 stat.ML cs.LG 版本更新

CROCS: A Two-Stage Clustering Framework for Behaviour-Centric Consumer Segmentation with Smart Meter Data

CROCS：一种基于智能电表数据的以行为为中心的消费者细分的两阶段聚类框架

Luke W. Yerbury, Ricardo J. G. B. Campello, G. C. Livingston, Mark Goldsworthy, Lachlan O'Neil

发表机构 * Ausgrid（澳大利亚电网公司）

AI总结提出CROCS两阶段聚类框架，通过消费者日常负荷曲线的独立聚类和基于加权最小距离的集合间比较，实现鲁棒且可扩展的消费者行为细分。

AI中文摘要

随着电网运营商面临可再生能源整合和电气化推广带来的不确定性增加，需求侧管理（DSM）——特别是需求响应（DR）——作为一种平衡现代电力系统的成本效益机制引起了广泛关注。全球持续部署的智能电表提供了前所未有的消费数据量，使得基于实际用电行为的消费者细分成为可能，有望为设计更有效的DSM和DR计划提供信息。然而，现有的基于聚类的细分方法未能充分反映消费者的行为多样性，通常依赖于严格的时间对齐，并且在存在异常值、缺失数据或大规模部署时表现不佳。为了解决这些挑战，我们提出了一种新颖的两阶段聚类框架——优化消费者细分的聚类表示（CROCS）。在第一阶段，每个消费者的每日负荷曲线被独立聚类，形成代表性负荷集（RLS），提供其典型日间消费行为的紧凑摘要。在第二阶段，使用加权最小距离和（WSMD）对消费者进行聚类，这是一种新颖的集合间度量，通过考虑这些行为的普遍性和相似性来比较RLS。最后，对WSMD诱导图进行社区检测，揭示体现定义消费者群体的共享日间行为的高阶原型，从而增强所得聚类的可解释性。在合成和真实澳大利亚智能电表数据集上的大量实验表明，CROCS能够捕捉消费者内部变异性，发现同步和异步行为相似性，对异常值和缺失数据保持鲁棒性，并通过自然并行化实现高效扩展。这些结果...

英文摘要

With grid operators confronting rising uncertainty from renewable integration and a broader push toward electrification, Demand-Side Management (DSM) -- particularly Demand Response (DR) -- has attracted significant attention as a cost-effective mechanism for balancing modern electricity systems. Unprecedented volumes of consumption data from a continuing global deployment of smart meters enable consumer segmentation based on real usage behaviours, promising to inform the design of more effective DSM and DR programs. However, existing clustering-based segmentation methods insufficiently reflect the behavioural diversity of consumers, often relying on rigid temporal alignment, and faltering in the presence of anomalies, missing data, or large-scale deployments. To address these challenges, we propose a novel two-stage clustering framework -- Clustered Representations Optimising Consumer Segmentation (CROCS). In the first stage, each consumer's daily load profiles are clustered independently to form a Representative Load Set (RLS), providing a compact summary of their typical diurnal consumption behaviours. In the second stage, consumers are clustered using the Weighted Sum of Minimum Distances (WSMD), a novel set-to-set measure that compares RLSs by accounting for both the prevalence and similarity of those behaviours. Finally, community detection on the WSMD-induced graph reveals higher-order prototypes that embody the shared diurnal behaviours defining consumer groups, enhancing the interpretability of the resulting clusters. Extensive experiments on both synthetic and real Australian smart meter datasets demonstrate that CROCS captures intra-consumer variability, uncovers both synchronous and asynchronous behavioural similarities, and remains robust to anomalies and missing data, while scaling efficiently through natural parallelisation. These results...
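
A set-to-set measure in the spirit of the Weighted Sum of Minimum Distances can be sketched as follows. The abstract names WSMD but does not define it, so the formula below is an assumption: each representative profile is matched to its nearest profile in the other set, distances are weighted by how prevalent the profile is, and the two directions are averaged. The names and the Euclidean distance are hypothetical choices.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two load profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def directed_wsmd(set_a, set_b):
    """Prevalence-weighted sum of nearest-profile distances from set_a to set_b.

    Each set is a list of (profile, weight) pairs whose weights sum to 1,
    where the weight is the share of days represented by that profile.
    """
    return sum(
        weight * min(euclidean(profile, other) for other, _ in set_b)
        for profile, weight in set_a
    )

def wsmd(set_a, set_b):
    """Symmetrized measure: the average of the two directed sums."""
    return 0.5 * (directed_wsmd(set_a, set_b) + directed_wsmd(set_b, set_a))

# Hypothetical representative load sets with two-point "profiles".
consumer_a = [((0.0, 0.0), 0.5), ((3.0, 4.0), 0.5)]
consumer_b = [((0.0, 0.0), 1.0)]
distance = wsmd(consumer_a, consumer_b)
```

Because each profile is matched to its nearest counterpart regardless of which days it occurred on, a measure of this shape does not require the two consumers' behaviours to be aligned in time.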

2601.08205 2026-05-26 cs.CV cs.LG 版本更新

FUME: Fused Unified Multi-Gas Emission Network for Livestock Rumen Acidosis Detection

FUME: 用于牲畜瘤胃酸中毒检测的融合统一多气体排放网络

Taminul Islam, Toqi Tahamid Sarker, Mohamed Embaby, Khaled R Ahmed, Amer AbuGhazaleh

发表机构 * Southern Illinois University, Carbondale（南方伊利诺伊大学，卡本达勒分校）； University of California, Davis（加州大学戴维斯分校）

AI总结提出FUME网络，利用双气体（CO2和CH4）光学成像，通过轻量双流架构和通道注意力融合，实现瘤胃酸中毒的高精度分割与分类。

Comments 10 pages, 5 figures

Journal ref: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops, 2026, pp. 510-519

AI中文摘要

瘤胃酸中毒是奶牛中常见的代谢紊乱，导致重大经济损失和动物福利问题。当前的诊断方法依赖于侵入性pH测量，限制了持续监测的可扩展性。我们提出了FUME（融合统一多气体排放网络），这是首个在体外条件下通过双气体光学成像进行瘤胃酸中毒检测的深度学习方法。我们的方法利用红外相机捕获的互补二氧化碳（CO2）和甲烷（CH4）排放模式，将瘤胃健康状态分类为健康、过渡和酸中毒。FUME采用轻量双流架构，包含权重共享编码器、模态特定自注意力和通道注意力融合，联合优化气体羽流分割和奶牛健康分类。我们引入了首个双气体OGI数据集，包含8967个标注帧，覆盖六个pH水平，并带有像素级分割掩码。实验表明，FUME在仅使用1.28M参数和1.97G MACs的情况下，实现了80.99%的mIoU和98.82%的分类准确率——在分割质量上优于最先进方法，且计算成本降低10倍。消融研究揭示，CO2提供主要的判别信号，而双任务学习对于最优性能至关重要。我们的工作确立了基于气体排放的牲畜健康监测的可行性，为实用的体外酸中毒检测系统铺平了道路。代码可在 https://github.com/taminulislam/fume 获取。

英文摘要

Ruminal acidosis is a prevalent metabolic disorder in dairy cattle causing significant economic losses and animal welfare concerns. Current diagnostic methods rely on invasive pH measurement, limiting scalability for continuous monitoring. We present FUME (Fused Unified Multi-gas Emission Network), the first deep learning approach for rumen acidosis detection from dual-gas optical imaging under in vitro conditions. Our method leverages complementary carbon dioxide (CO2) and methane (CH4) emission patterns captured by infrared cameras to classify rumen health into Healthy, Transitional, and Acidotic states. FUME employs a lightweight dual-stream architecture with weight-shared encoders, modality-specific self-attention, and channel attention fusion, jointly optimizing gas plume segmentation and classification of dairy cattle health. We introduce the first dual-gas OGI dataset comprising 8,967 annotated frames across six pH levels with pixel-level segmentation masks. Experiments demonstrate that FUME achieves 80.99% mIoU and 98.82% classification accuracy while using only 1.28M parameters and 1.97G MACs--outperforming state-of-the-art methods in segmentation quality with 10x lower computational cost. Ablation studies reveal that CO2 provides the primary discriminative signal and dual-task learning is essential for optimal performance. Our work establishes the feasibility of gas emission-based livestock health monitoring, paving the way for practical, in vitro acidosis detection systems. Codes are available at https://github.com/taminulislam/fume.
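
For readers unfamiliar with the segmentation metric quoted above, here is a minimal sketch of how mean intersection-over-union (mIoU) is commonly computed from flattened label maps. The function name and the choice to skip classes absent from both maps are assumptions of this sketch; the paper's evaluation code may differ.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean IoU over classes that appear in the ground truth or the prediction."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Four pixels, two classes (0 = background, 1 = gas plume).
truth = [0, 0, 1, 1]
pred = [0, 1, 1, 1]
score = mean_iou(truth, pred, num_classes=2)
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.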

2601.06870 2026-05-26 cs.LG cs.AI 版本更新

QASA: Quality-Aware Semantic Augmentation for Robust Multimodal Sentiment Analysis

QASA: 面向鲁棒多模态情感分析的质量感知语义增强

Jiazhang Liang, Jianheng Dai, Miaosen Luo, Menghua Jiang, Sijie Mai

发表机构 * School of Computer Science, South China Normal University（华南师范大学计算机学院）

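AI summary: Proposes the QASA framework, which uses diffusion models to generate augmented visual and auditory samples and assigns training weights through a decoupled quality-aware scoring module, addressing the scarcity of high-quality data and improving the robustness and generalization of multimodal sentiment analysis.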

Comments 11 pages, 4 figures

AI中文摘要

多模态大语言模型在多模态情感分析中展现出强大的语义表示能力。然而，由于高质量训练数据的稀缺，它们学习稳定且可泛化的多模态特征的能力受到限制。为了解决这一问题，我们提出了QASA（质量感知语义增强），该方法使用扩散模型生成增强的视觉和听觉样本，从而扩大训练数据集并支持多模态学习。生成的样本质量可能参差不齐，并可能出现跨模态不一致。为此，我们引入了一个解耦的质量感知评分模块，根据每个增强样本的可靠性分配训练权重。这种方法减少了低质量数据的影响，有助于更稳定和鲁棒的模型训练。该框架结合了扩散模型的生成能力和多模态大模型的语义推理能力，提供了一种无需人工标注的自动数据增强策略，同时在有限高质量数据下提高了泛化性和鲁棒性。在CH-SIMS数据集上的实验表明，QASA在五类准确率（Acc5）和二类准确率（Acc2）上分别相对提升了18.0%和5.9%，并且在CMU-MOSI和MUStARD基准测试上也优于现有方法。

英文摘要

Multimodal large language models have demonstrated strong ability in capturing semantic representations for multimodal sentiment analysis. Their capacity to learn stable and generalizable multimodal features is limited, however, by the scarcity of high-quality training data. To address this, we propose QASA (Quality-Aware Semantic Augmentation), which uses diffusion models to generate augmented visual and auditory samples, thereby enlarging the training dataset and supporting multimodal learning. The generated samples can vary in quality and may exhibit cross-modal inconsistencies. To manage this, we introduce a decoupled quality-aware scoring module that assigns training weights based on the reliability of each augmented sample. This approach reduces the influence of low-quality data and contributes to more stable and robust model training. The framework combines the generative capabilities of diffusion models with the semantic reasoning of multimodal large models, providing an automated data augmentation strategy that does not require human annotation while improving generalization and robustness under limited high-quality data. Experiments on the CH-SIMS dataset show that QASA yields a relative increase of 18.0% and 5.9% in five-class accuracy (Acc5) and binary accuracy (Acc2), respectively, and it also outperforms existing methods on the CMU-MOSI and MUStARD benchmarks.

2512.23956 2026-05-26 stat.ML cs.LG 版本更新

Implicit geometric regularization in flow matching via density weighted Stein operators

通过密度加权Stein算子的流匹配中的隐式几何正则化

Shinto Eguchi

发表机构 * The Institute of Statistical Mathematics（统计数学研究所）

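AI summary: Proposes γ-Flow Matching (γ-FM), which implicitly regularizes the regression geometry in high-dimensional spaces through a dynamic density-weighting strategy, improving vector field smoothness and sampling efficiency.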

Comments Revised version

AI中文摘要

流匹配（FM）已成为连续归一化流的一个强大范式，但标准FM隐式地在整个环境空间上进行未加权的$L^2$回归。在高维空间中，这导致了一个根本性的低效：绝大多数积分区域由低密度的“空洞”区域组成，其中目标速度场通常是混沌或定义不良的。在本文中，我们提出了γ-流匹配（γ-FM），一种密度加权变体，它将回归几何与底层概率流对齐。虽然密度加权是可取的，但朴素实现需要评估难以处理的目标密度。我们通过引入一种动态密度加权策略来规避这一点，该策略直接从训练粒子估计目标密度。这种方法使我们能够动态降低空洞区域中的回归损失，而不损害FM的无模拟特性。理论上，我们证明了γ-FM在赋予γ-Stein度量的统计流形上最小化传输成本。谱分析进一步表明，这种几何结构引入了隐式Sobolev正则化，有效地抑制了空洞区域中的高频振荡。实验上，γ-FM显著改善了高维潜在数据集上的向量场平滑性和采样效率，同时展示了对异常值的内在鲁棒性。

英文摘要

Flow Matching (FM) has emerged as a powerful paradigm for continuous normalizing flows, yet standard FM implicitly performs an unweighted $L^2$ regression over the entire ambient space. In high dimensions, this leads to a fundamental inefficiency: the vast majority of the integration domain consists of low-density "void" regions where the target velocity fields are often chaotic or ill-defined. In this paper, we propose $γ$-Flow Matching ($γ$-FM), a density-weighted variant that aligns the regression geometry with the underlying probability flow. While density weighting is desirable, naive implementations would require evaluating the intractable target density. We circumvent this by introducing a Dynamic Density-Weighting strategy that estimates the target density directly from training particles. This approach allows us to dynamically downweight the regression loss in void regions without compromising the simulation-free nature of FM. Theoretically, we establish that $γ$-FM minimizes the transport cost on a statistical manifold endowed with the $γ$-Stein metric. Spectral analysis further suggests that this geometry induces an implicit Sobolev regularization, effectively damping high-frequency oscillations in void regions. Empirically, $γ$-FM significantly improves vector field smoothness and sampling efficiency on high-dimensional latent datasets, while demonstrating intrinsic robustness to outliers.
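
To make the contrast between unweighted and density-weighted regression concrete, the two objectives can be sketched as below. This is an illustrative form only: the weight $\hat p_t(x_t)^{\gamma}$ and the particle-based estimate $\hat p_t$ are notational assumptions, and the paper's exact weighting and normalisation may differ.

```latex
\mathcal{L}_{\mathrm{FM}}(\theta)
  = \mathbb{E}_{t,\,x_t}\!\left[\,\lVert v_\theta(x_t,t) - u_t(x_t)\rVert^2\,\right],
\qquad
\mathcal{L}_{\gamma}(\theta)
  = \mathbb{E}_{t,\,x_t}\!\left[\,\hat p_t(x_t)^{\gamma}\,
      \lVert v_\theta(x_t,t) - u_t(x_t)\rVert^2\,\right]
```

Under this assumed form, $\gamma = 0$ recovers the unweighted loss, while $\gamma > 0$ shrinks the contribution of points where the estimated density is small, which is the downweighting of void regions the abstract describes.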

2512.20063 2026-05-26 cs.LG 版本更新

PairFlow: Closed-Form Source-Target Coupling for Few-Step Generation in Discrete Flow Models

PairFlow: 离散流模型中用于少步生成的闭式源-目标耦合

Mingue Park, Jisung Hwang, Seungwoo Yoo, Kyeongmin Yeo, Minhyuk Sung

发表机构 * KAIST（韩国科学技术院）

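AI summary: Proposes PairFlow, a lightweight preprocessing method that builds paired source-target samples via closed-form inversion, enabling few-step sampling for discrete flow models without a pretrained teacher and matching or even surpassing two-stage finetuning.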

Comments ICLR 2026

AI中文摘要

我们介绍了$\texttt{PairFlow}$，一种用于训练离散流模型（DFM）的轻量级预处理步骤，无需预训练教师即可实现少步采样。DFM最近作为一类新的离散数据生成模型出现，性能强劲。然而，由于其迭代性质，采样速度慢。现有的加速方法主要依赖微调，这引入了大量额外的训练开销。$\texttt{PairFlow}$通过轻量级预处理步骤解决了这个问题。受ReFlow及其在DFM上的扩展启发，我们从源分布和目标分布的耦合样本训练DFM，无需任何预训练教师。我们方法的核心是DFM的闭式反演，这使得能够高效构建配对的源-目标样本。尽管成本极低，仅占完整模型训练所需计算量的1.7%，但$\texttt{PairFlow}$匹配甚至超越了涉及微调的两阶段训练的性能。此外，使用我们的框架训练的模型为后续蒸馏提供了更强的基模型，在微调后进一步加速。在分子数据以及二值和RGB图像上的实验证明了我们方法的广泛适用性和有效性。

英文摘要

We introduce $\texttt{PairFlow}$, a lightweight preprocessing step for training Discrete Flow Models (DFMs) to achieve few-step sampling without requiring a pretrained teacher. DFMs have recently emerged as a new class of generative models for discrete data, offering strong performance. However, they suffer from slow sampling due to their iterative nature. Existing acceleration methods largely depend on finetuning, which introduces substantial additional training overhead. $\texttt{PairFlow}$ addresses this issue with a lightweight preprocessing step. Inspired by ReFlow and its extension to DFMs, we train DFMs from coupled samples of source and target distributions, without requiring any pretrained teacher. At the core of our approach is a closed-form inversion for DFMs, which allows efficient construction of paired source-target samples. Despite its extremely low cost, taking only up to 1.7% of the compute needed for full model training, $\texttt{PairFlow}$ matches or even surpasses the performance of two-stage training involving finetuning. Furthermore, models trained with our framework provide stronger base models for subsequent distillation, yielding further acceleration after finetuning. Experiments on molecular data as well as binary and RGB images demonstrate the broad applicability and effectiveness of our approach.

2512.13323 2026-05-26 cs.AI cs.LG 版本更新

Error-Driven Prompt Optimization for Arithmetic Reasoning

基于错误驱动的算术推理提示优化

Árpád Pándy, Róbert Lakatos, András Hajdu

发表机构 * Department of Data Science & Visualization, Faculty of Informatics, University of Debrecen（数据科学与可视化系，信息学院，德布勒森大学）

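AI summary: Proposes an error-driven prompt optimization framework that iteratively refines prompt rules by clustering erroneous predictions, raising a small on-premises language model to 70.8% accuracy on arithmetic reasoning and surpassing GPT-3.5 Turbo.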

DOI: 10.1109/ACCESS.2026.3685125
Journal ref: IEEE Access, vol. 14, pp. 62570-62583, 2026

AI中文摘要

人工智能的最新进展激发了人们对工业代理的兴趣，这些代理能够在表格数据工作流中支持金融和医疗等受监管领域的分析师。此类系统的关键能力是对结构化数据执行准确的算术运算，同时确保敏感信息永远不会离开安全的本地环境。在此，我们引入了一种用于算术推理的错误驱动优化框架，该框架增强了代码生成代理（CGA），特别应用于本地小型语言模型（SLM）。通过对领先的SLM（Qwen3 4B）进行系统评估，我们发现虽然基础模型在算术任务中表现出基本局限性，但我们提出的错误驱动方法通过聚类错误预测来迭代优化提示规则，显著提升了性能，将模型准确率提高到70.8%。我们的结果表明，开发可靠、可解释且可工业部署的AI助手不仅可以通过昂贵的微调实现，还可以通过系统的、错误驱动的提示优化来实现，从而使小型模型以符合隐私要求的方式超越大型语言模型（GPT-3.5 Turbo）。

英文摘要

Recent advancements in artificial intelligence have sparked interest in industrial agents capable of supporting analysts in regulated sectors, such as finance and healthcare, within tabular data workflows. A key capability for such systems is performing accurate arithmetic operations on structured data while ensuring sensitive information never leaves secure, on-premises environments. Here, we introduce an error-driven optimization framework for arithmetic reasoning that enhances a Code Generation Agent (CGA), specifically applied to on-premises small language models (SLMs). Through a systematic evaluation of a leading SLM (Qwen3 4B), we find that while the base model exhibits fundamental limitations in arithmetic tasks, our proposed error-driven method, which clusters erroneous predictions to refine prompt-rules iteratively, dramatically improves performance, elevating the model's accuracy to 70.8%. Our results suggest that developing reliable, interpretable, and industrially deployable AI assistants can be achieved not only through costly fine-tuning but also via systematic, error-driven prompt optimization, enabling small models to surpass larger language models (GPT-3.5 Turbo) in a privacy-compliant manner.

2512.06393 2026-05-26 cs.AI cs.CL cs.LG cs.LO 版本更新

Conflict-Aware Fusion: Mitigating Logic Inertia in Large Language Models via Structured Cognitive Priors

冲突感知融合：通过结构化认知先验缓解大语言模型中的逻辑惯性

Qiming Bao, Xiaoxuan Fu, Michael Witbrock

发表机构 * Xtracta & Strong AI Lab, University of Auckland（Xtracta与强人工智能实验室，奥克兰大学）； School of Humanities, China University of Political Science and Law（人文学院，中国政法大学）； Strong AI Lab, University of Auckland（强人工智能实验室，奥克兰大学）

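AI summary: To address the brittleness of large language models under structural perturbations of rule-based systems, proposes the Conflict-Aware Fusion training pipeline, which uses a verification-before-deduction structural prior and symbolic reasoning rewards to saturate robustness on multiple stress tests.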

AI中文摘要

大型语言模型（LLM）在许多推理基准上取得了高准确率，但在基于规则系统的结构扰动下仍然脆弱。我们引入了一个包含四个压力测试的诊断框架——冗余与必要规则删除、矛盾规则注入、逻辑保持重写和多定律堆叠——并用它来揭示逻辑惯性：生成式LLM（Qwen2/3、TinyLlama、GPT-4o、Gemma-3-4B-IT）和仅编码器BERT基线在矛盾前提下沿学习到的演绎轨迹持续推理的倾向。这种崩溃是剧烈的：未经处理的基线在基础任务上的准确率从1.00下降到矛盾注入时的0.00（实例级精确匹配），而GPT-4o仅解决了56.0%的矛盾案例。我们提出冲突感知融合，这是一个四阶段训练流程，将验证-演绎作为学习到的结构先验强制执行：（i）SFT建立验证前缀；（ii）DPO锐化矛盾停止决策边界；（iii）逻辑不变正则化（LIRE）通过对称KL惩罚逻辑等价规则公式之间的差异；（iv）来自验证反馈的强化学习（RLVF）使用符号前向链接引擎作为确定性预言奖励，联合优化不变性和敏感性。该流程在1.5B和8B骨干网络上均使所有四个主要压力测试达到饱和。我们进一步验证了第二阶段扩展，用Lean 4内核替换命题预言机，在分层187个问题的Lean翻译样本中，对105个经典可推导（T）问题达到99.0%的内核一致性（整体71.7%，涵盖两种极性），为形式化验证的RL训练提供了可靠的升级路径。代码和基准：https://github.com/14H034160212/lemo

英文摘要

Large language models (LLMs) achieve high accuracy on many reasoning benchmarks but remain brittle under structural perturbations of rule-based systems. We introduce a diagnostic framework with four stress tests -- redundant vs. essential rule deletion, contradictory-rule injection, logic-preserving rewrites, and multi-law stacking -- and use it to expose Logic Inertia: the tendency of generative LLMs (Qwen2/3, TinyLlama, GPT-4o, Gemma-3-4B-IT) and the encoder-only BERT baseline to persist along learned deductive trajectories under inconsistent premises. The collapse is sharp: untreated baselines fall from accuracy 1.00 on the base task to 0.00 on contradiction injection (instance-level exact match), and GPT-4o resolves only 56.0% of contradiction cases. We propose Conflict-Aware Fusion, a four-stage training pipeline that enforces verification-before-deduction as a learned structural prior: (i) SFT establishes the verification preamble; (ii) DPO sharpens the halt-on-contradiction decision boundary; (iii) Logical Invariance REgularisation (LIRE) penalises divergence between logically equivalent rule formulations via symmetric KL; (iv) Reinforcement Learning from Verification Feedback (RLVF) uses a symbolic forward-chaining engine as a deterministic oracle reward, jointly optimising invariance and sensitivity. The pipeline saturates all four primary stress tests for both 1.5B and 8B backbones. We further validate a Phase 2 extension that replaces the propositional oracle with a Lean 4 kernel, attaining 99.0% kernel agreement on the 105 classically-derivable (T) questions within a stratified 187-question Lean-translated sample (overall 71.7% across both polarities), providing a sound upgrade path to formally verified RL training. Code and benchmark: https://github.com/14H034160212/lemo
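
The idea of a symbolic forward-chaining engine acting as a deterministic oracle that halts on contradiction can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the paper's engine: the rule encoding as `(premises, conclusion)` pairs, the `"not X"` convention for negation, and the three return labels are all hypothetical.

```python
def forward_chain(facts, rules):
    """Return every fact derivable from `facts` using Horn rules (premises, conclusion)."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known

def has_contradiction(known):
    """In this toy encoding, the literal 'not X' contradicts 'X'."""
    return any(("not " + fact) in known for fact in known)

def oracle_answer(facts, rules, query):
    """Verify consistency first; only then answer the query."""
    closure = forward_chain(facts, rules)
    if has_contradiction(closure):
        return "CONTRADICTION"
    return "T" if query in closure else "F"

base_rules = [(("rain",), "wet")]
injected_rules = base_rules + [(("rain",), "not wet")]
base = oracle_answer({"rain"}, base_rules, "wet")
injected = oracle_answer({"rain"}, injected_rules, "wet")
```

On the base rule set the query is derivable, while the injected contradictory rule makes the oracle refuse to answer, which is the behaviour the contradiction-injection stress test probes for.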

2512.05791 2026-05-26 physics.med-ph cs.CV cs.LG math.PR 版本更新

Fast and Robust Diffusion Posterior Sampling for MR Image Reconstruction Using the Preconditioned Unadjusted Langevin Algorithm

使用预条件未调整朗之万算法实现快速且鲁棒的MR图像重建扩散后验采样

Moritz Blumenthal, Tina Holliber, Jonathan I. Tamir, Martin Uecker

发表机构 * Institute of Biomedical Imaging, Graz University of Technology, Graz, Austria ； Department of Radiology, Boston Children's Hospital, Harvard Medical School, Boston, USA ； Chandra Family Department of Electrical Engineering, University of Texas at Austin, USA ； Department of Diagnostic Medicine, Dell Medical School, University of Texas at Austin, USA

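AI summary: To address the slow speed and parameter tuning of diffusion posterior sampling in MR image reconstruction, proposes an exact-likelihood method based on the preconditioned Unadjusted Langevin Algorithm that converges quickly and samples robustly without tuning.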

Comments Submitted to Magnetic Resonance in Medicine

DOI: 10.1002/mrm.70416

AI中文摘要

目的：结合未调整朗之万算法（ULA）与扩散模型，可以从高度欠采样的k空间数据生成高质量MRI重建结果并附带不确定性估计。然而，扩散后验采样（DPS）或似然退火等采样方法存在重建时间长和需要参数调优的问题。本文旨在开发一种具有快速收敛性的鲁棒采样算法。理论与方法：在用于后验采样的反向扩散过程中，精确似然与所有噪声尺度下的扩散先验相乘。为克服收敛缓慢的问题，采用了预条件技术。该方法在fastMRI数据上训练，并在健康志愿者的回顾性欠采样脑部数据上测试。结果：对于笛卡尔和非笛卡尔加速MRI的后验采样，新方法在重建速度和样本质量上均优于退火采样和DPS。结论：所提出的预条件精确似然方法能够在各种MRI重建任务中实现快速可靠的后验采样，无需参数调优。

英文摘要

Purpose: The Unadjusted Langevin Algorithm (ULA) in combination with diffusion models can generate high quality MRI reconstructions with uncertainty estimation from highly undersampled k-space data. However, sampling methods such as diffusion posterior sampling (DPS) or likelihood annealing suffer from long reconstruction times and the need for parameter tuning. The purpose of this work is to develop a robust sampling algorithm with fast convergence. Theory and Methods: In the reverse diffusion process used for sampling the posterior, the exact likelihood is multiplied with the diffused prior at all noise scales. To overcome the issue of slow convergence, preconditioning is used. The method is trained on fastMRI data and tested on retrospectively undersampled brain data of a healthy volunteer. Results: For posterior sampling in Cartesian and non-Cartesian accelerated MRI the new approach outperforms annealed sampling and DPS in terms of reconstruction speed and sample quality. Conclusion: The proposed exact likelihood with preconditioning enables rapid and reliable posterior sampling across various MRI reconstruction tasks without the need for parameter tuning.
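
To show why preconditioning helps a Langevin sampler converge, here is a minimal sketch of a preconditioned ULA step on a toy, badly scaled Gaussian target. It assumes a constant diagonal preconditioner and a hand-written score function; the paper instead combines an exact MRI likelihood with a learned diffusion prior, and its preconditioner is not specified in the abstract.

```python
import math
import random

def grad_log_gaussian(x, variances):
    """Score of a zero-mean Gaussian with diagonal covariance."""
    return [-xi / v for xi, v in zip(x, variances)]

def pula_step(x, grad, precond, step, noise):
    """x <- x + step * P * grad + sqrt(2 * step * P) * noise, with diagonal P."""
    return [xi + step * p * g + math.sqrt(2.0 * step * p) * n
            for xi, g, p, n in zip(x, grad, precond, noise)]

def sample(variances, precond, step=0.1, n_steps=20000, seed=0):
    """Run the chain from the origin and return all visited states."""
    rng = random.Random(seed)
    x = [0.0] * len(variances)
    chain = []
    for _ in range(n_steps):
        noise = [rng.gauss(0.0, 1.0) for _ in x]
        x = pula_step(x, grad_log_gaussian(x, variances), precond, step, noise)
        chain.append(x)
    return chain

# Variances differ by four orders of magnitude; P = covariance equalises the scales.
variances = [1.0, 1e-4]
chain = sample(variances, precond=variances)
var0 = sum(x[0] ** 2 for x in chain) / len(chain)
var1 = sum(x[1] ** 2 for x in chain) / len(chain)
```

With the identity preconditioner and the same step size, the second coordinate's update multiplies it by roughly -1000 per step and diverges, so the step would have to be tuned down by orders of magnitude; with the preconditioner, both coordinates contract at the same rate.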

2512.05765 2026-05-26 cs.AI cs.LG 版本更新

AGI Requires a Coordination Layer on Top of Pattern Repositories

AGI 需要在模式存储库之上建立协调层

Edward Y. Chang

发表机构 * Department of Computer Science, Stanford University（斯坦福大学计算机科学系）

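AI summary: Argues that large language models (LLMs) are not a dead end for AGI but lack a System-2 coordination layer; uses UCCT and RCA for semantic anchoring and causal verification, and designs the MACI multi-agent coordination stack. Experiments indicate that adaptive control outperforms static prompting.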

Comments 15 pages, 5 figures, 7 tables

AI中文摘要

在本文中，我们认为那些将大型语言模型（LLM）视为AGI死胡同的有影响力的批评误判了瓶颈：它们混淆了海洋与渔网。模式存储库是必要的系统1基础；缺失的组件是一个系统2协调层，该层能够招募相关模式、验证其使用、保持状态并控制收敛。我们将常常被混淆的两种控制用途分开。由UCCT（统一上下文控制理论）形式化的语义锚定，通过由有效支持（rho_d）、表征不匹配（d_r）和自适应锚定预算（gamma log k）控制的相变，将标签和任务意图绑定到学习到的模式区域。由递归因果审计（RCA）实现的追踪-答案验证，测试最终因果判断是否在其自身推理轨迹的压力下得到支持。我们将这些思想转化为MACI，一个多智能体协调栈，通过诱饵（PID调节辩论）、过滤（苏格拉底式和因果审计）和持久性（事务性内存）整合多样性和控制。在因果判断和谄媚-偏执权衡上的实证验证表明，静态提示失败的地方，自适应控制成功。通过将常见反对意见重新定义为可测试的协调失败，我们认为通往AGI的道路是通过LLM，而不是绕过它们。能力不是协调。

英文摘要

In this paper we argue that influential critiques dismissing Large Language Models (LLMs) as a dead end for AGI misidentify the bottleneck: they confuse the ocean with the net. Pattern repositories are the necessary System-1 substrate; the missing component is a System-2 coordination layer that recruits relevant patterns, verifies their use, preserves state, and governs convergence. We separate two uses of control that are often conflated. Semantic anchoring, formalized by UCCT (Unified Contextual Control Theory), binds labels and task intent to learned pattern regions through a phase transition governed by effective support (rho_d), representational mismatch (d_r), and an adaptive anchoring budget (gamma log k). Trace-answer verification, implemented by Recursive Causal Audit (RCA), tests whether a final causal judgment is warranted by its own reasoning trace under pressure. We translate these ideas into MACI, a multi-agent coordination stack that integrates diversity and control via baiting (PID-modulated debate), filtering (Socratic and causal audit), and persistence (transactional memory). Empirical validation on causal judgment and the sycophancy-paranoia trade-off demonstrates that static prompting fails where adaptive control succeeds. By reframing common objections as testable coordination failures, we argue that the path to AGI runs through LLMs, not around them. Capability is not coordination.

2512.00125 2026-05-26 cs.CV cs.LG 版本更新

Hybrid Synthetic Data Generation with Domain Randomization Enables Zero-Shot Vision-Based Part Inspection Under Extreme Class Imbalance

混合合成数据生成与域随机化实现极端类别不平衡下基于视觉的零样本零件检测

Ruo-Syuan Mei, Sixian Jia, Guangze Li, Soo Yeon Lee, Brian Musser, William Keller, Sreten Zakula, Jorge Arinez, Chenhui Shao

发表机构 * Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI 48109, USA ； Materials & Manufacturing Systems Research Lab, General Motors, Warren, MI 48092, USA

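AI summary: Proposes a hybrid synthetic data generation framework combining simulation-based rendering, domain randomization, and real background compositing. YOLOv8n and MobileNetV3-small models trained only on synthetic data achieve zero-shot industrial part inspection under extreme class imbalance, with a detection mAP@0.5 of 0.995, 96% classification accuracy, and 90.1% balanced accuracy.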

Comments Submitted to the NAMRC 54

DOI: 10.1016/j.jmapro.2026.04.020

AI中文摘要

机器学习，特别是深度学习，正在改变工业质量检测。然而，训练鲁棒的机器学习模型通常需要大量高质量标注数据，这在制造业中获取成本高昂、耗时且劳动密集。此外，缺陷样本本身稀少，导致严重的类别不平衡，降低模型性能。这些数据约束阻碍了基于机器学习的质量检测方法在实际生产环境中的广泛采用。合成数据生成（SDG）通过高效、经济且可扩展的方式创建大规模、平衡且完全标注的数据集，提供了一种有前景的解决方案。本文提出一种混合SDG框架，集成了基于仿真的渲染、域随机化和真实背景合成，无需人工标注即可实现基于计算机视觉的工业零件检测的零样本学习。该SDG流水线通过改变零件几何、光照和表面属性，并将合成零件合成到真实图像背景上，在一小时内生成12,960张标注图像。利用YOLOv8n骨干网络进行目标检测、MobileNetV3-small进行质量分类的两阶段架构，仅使用合成数据训练，并在300个真实工业零件上评估。所提方法在检测上达到mAP@0.5为0.995，分类准确率96%，平衡准确率90.1%。与基于少量真实数据的基线方法相比，性能显著提升。在严重类别不平衡下，所提基于SDG的方法达到90-91%的平衡准确率，而基线仅达到50%准确率。这些结果表明，所提方法能够为真实制造应用实现免标注、可扩展且鲁棒的质量检测。

英文摘要

Machine learning, particularly deep learning, is transforming industrial quality inspection. Yet, training robust machine learning models typically requires large volumes of high-quality labeled data, which are expensive, time-consuming, and labor-intensive to obtain in manufacturing. Moreover, defective samples are intrinsically rare, leading to severe class imbalance that degrades model performance. These data constraints hinder the widespread adoption of machine learning-based quality inspection methods in real production environments. Synthetic data generation (SDG) offers a promising solution by enabling the creation of large, balanced, and fully annotated datasets in an efficient, cost-effective, and scalable manner. This paper presents a hybrid SDG framework that integrates simulation-based rendering, domain randomization, and real background compositing to enable zero-shot learning for computer vision-based industrial part inspection without manual annotation. The SDG pipeline generates 12,960 labeled images in one hour by varying part geometry, lighting, and surface properties, and then compositing synthetic parts onto real image backgrounds. A two-stage architecture utilizing a YOLOv8n backbone for object detection and MobileNetV3-small for quality classification is trained exclusively on synthetic data and evaluated on 300 real industrial parts. The proposed approach achieves an mAP@0.5 of 0.995 for detection, 96% classification accuracy, and 90.1% balanced accuracy. Comparative evaluation against few-shot real-data baseline approaches demonstrates significant improvement. The proposed SDG-based approach achieves 90-91% balanced accuracy under severe class imbalance, while the baselines reach only 50% accuracy. These results demonstrate that the proposed method enables annotation-free, scalable, and robust quality inspection for real-world manufacturing applications.
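
The gap between plain accuracy and balanced accuracy under class imbalance can be seen in a small worked example. This is a sketch with hypothetical class counts (285 good parts and 15 defective parts out of 300); the abstract does not report the actual split.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# Hypothetical imbalanced test set and a classifier that always predicts "good".
y_true = ["good"] * 285 + ["defect"] * 15
y_pred = ["good"] * 300
acc = accuracy(y_true, y_pred)
bal = balanced_accuracy(y_true, y_pred)
```

Here the trivial classifier reaches 95% accuracy but only 50% balanced accuracy, because its recall is 1.0 on good parts and 0.0 on defects. This is why balanced accuracy is the more informative figure when defects are rare.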

2511.15407 2026-05-26 cs.AI cs.CV cs.LG 版本更新

IPR-1: Interactive Physical Reasoner

IPR-1：交互式物理推理器

Mingyu Zhang, Lifeng Zhuo, Tianxi Tan, Guocan Xie, Xian Nie, Yan Li, Renjie Zhao, Zizhu He, Ziyu Wang, Jiting Cai, Yong-Lu Li

发表机构 * CARNEGIE MELLON UNIVERSITY（卡内基梅隆大学）

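AI summary: Proposes the IPR model, which scores and reinforces a VLM policy with world-model rollouts and introduces the physics-centric action code PhysCode; it achieves robust physical reasoning on a benchmark of 1,000+ heterogeneous games, surpasses GPT-5 overall, and transfers zero-shot to unseen games.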

Comments Accepted by CVPR 2026. 13 pages of main text and 20 pages of appendices. Project page: https://mybearyzhang.github.io/ipr-1

AI中文摘要

人类通过观察、与环境交互以及内化物理和因果关系来学习。在这里，我们旨在探究一个智能体是否能够通过交互类似地获得类人推理能力，并随着更多经验不断改进。为此，我们引入了一个包含1000+异构游戏的Game-to-Unseen (G2U)基准，这些游戏展现出显著的视觉领域差异。现有方法（包括VLM和世界模型）难以捕捉底层物理和因果关系，因为它们不关注核心机制且过度拟合视觉细节。VLM/VLA智能体能够推理，但在交互设置中缺乏前瞻性，而世界模型进行想象但模仿视觉模式而非分析物理和因果关系。因此，我们提出IPR（交互式物理推理器），利用世界模型滚动来评分和强化VLM的策略，并引入PhysCode，一种以物理为中心的动作代码，将语义意图与动力学对齐，为预测和推理提供共享动作空间。在1000+游戏上预训练后，我们的IPR在从原始直觉到目标驱动推理的各个层次上表现稳健，甚至在总体上超越了GPT-5。我们发现，性能随着训练游戏和交互步骤的增加而提升，并且模型还能零样本迁移到未见过的游戏。这些结果支持以物理为中心的交互作为稳步提升物理推理的路径。更多演示和项目详情请见https://mybearyzhang.github.io/ipr-1。

英文摘要

Humans learn by observing, interacting with environments, and internalizing physics and causality. Here, we aim to ask whether an agent can similarly acquire human-like reasoning from interaction and keep improving with more experience. To study this, we introduce a Game-to-Unseen (G2U) benchmark of 1,000+ heterogeneous games that exhibit significant visual domain gaps. Existing approaches, including VLMs and world models, struggle to capture underlying physics and causality since they are not focused on core mechanisms and overfit to visual details. VLM/VLA agents reason but lack look-ahead in interactive settings, while world models imagine but imitate visual patterns rather than analyze physics and causality. We therefore propose IPR (Interactive Physical Reasoner), using world-model rollouts to score and reinforce a VLM's policy, and introduce PhysCode, a physics-centric action code aligning semantic intent with dynamics to provide a shared action space for prediction and reasoning. Pretrained on 1,000+ games, our IPR performs robustly on levels from primitive intuition to goal-driven reasoning, and even surpasses GPT-5 overall. We find that performance improves with more training games and interaction steps, and that the model also zero-shot transfers to unseen games. These results support physics-centric interaction as a path to steadily improving physical reasoning. Further demos and project details can be found at https://mybearyzhang.github.io/ipr-1.

2511.09048 2026-05-26 cs.LG 版本更新

Guaranteeing Conservation of Integrals with Projection in Physics-Informed Neural Networks

在物理信息神经网络中通过投影保证积分守恒

Anthony Baez, Wang Zhang, Ziwen Ma, Lam Nguyen, Subhro Das, Luca Daniel

发表机构 * MIT（麻省理工学院）； IBM（国际商业机器公司）

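AI summary: Proposes a projection method that, by solving constrained nonlinear optimization problems, guarantees conservation of linear and quadratic integral quantities in physics-informed neural networks, separately or jointly, reducing conservation error by three to four orders of magnitude.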

AI中文摘要

我们提出了一种新颖的投影方法，能够保证物理信息神经网络（PINNs）中积分量的守恒。尽管PINNs用于强制执行偏微分方程（PDEs）结构的软约束在训练过程中提供了必要的灵活性，但也允许发现的解违反物理定律。为了解决这个问题，我们引入了一种投影方法，分别和联合保证线性和二次积分的守恒。我们通过求解约束非线性优化问题推导了投影公式，并发现经过投影修改的PINN（称为PINN-Proj）相比软约束，将这些量的守恒误差降低了三到四个数量级，并略微减少了PDE解误差。我们还发现，投影通过改善损失景观的条件性来改善收敛。我们的方法有望成为一个通用框架，只要存在可解的方案，就能保证PINN中任何积分量的守恒。

英文摘要

We propose a novel projection method that guarantees the conservation of integral quantities in Physics-Informed Neural Networks (PINNs). While the soft constraint that PINNs use to enforce the structure of partial differential equations (PDEs) enables necessary flexibility during training, it also permits the discovered solution to violate physical laws. To address this, we introduce a projection method that guarantees the conservation of the linear and quadratic integrals, both separately and jointly. We derived the projection formulae by solving constrained non-linear optimization problems and found that our PINN modified with the projection, which we call PINN-Proj, reduced the error in the conservation of these quantities by three to four orders of magnitude compared to the soft constraint and marginally reduced the PDE solution error. We also found evidence that the projection improved convergence through improving the conditioning of the loss landscape. Our method holds promise as a general framework to guarantee the conservation of any integral quantity in a PINN if a tractable solution exists.
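
The projection idea can be illustrated on a discretised solution. The sketch below assumes a uniform 1D grid with a Riemann-sum integral and treats the linear and quadratic constraints separately; the function names are hypothetical, and the paper's formulae (including the joint case) are not reproduced here.

```python
import math

def project_linear(u, dx, target):
    """Closest grid function (least squares) whose integral dx * sum(u) equals target."""
    shift = (target - dx * sum(u)) / (len(u) * dx)
    return [ui + shift for ui in u]

def project_quadratic(u, dx, target):
    """Closest grid function whose quadratic integral dx * sum(u^2) equals target.

    Assumes u is not identically zero and target >= 0.
    """
    scale = math.sqrt(target / (dx * sum(ui * ui for ui in u)))
    return [ui * scale for ui in u]

# A network output that has drifted away from the conserved values.
u = [1.0, 2.0, 3.0, 4.0]
dx = 0.5
u_lin = project_linear(u, dx, target=6.0)
u_quad = project_quadratic(u, dx, target=20.0)
```

Both corrections are closed-form: the linear constraint is met by a uniform shift and the quadratic one by a uniform rescaling, so the constraint holds to floating-point precision rather than only approximately, as it would with a soft penalty term.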

2511.03963 2026-05-26 stat.ML cs.LG 版本更新

Robust inference using density-powered Stein operators

使用密度驱动的Stein算子进行稳健推断

Shinto Eguchi

发表机构 * The Institute of Statistical Mathematics（统计数学研究所）

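AI summary: Proposes a γ-Stein operator based on the γ-divergence that enables robust inference for unnormalized probability models through density weighting, with applications to robust goodness-of-fit testing and Bayesian posterior approximation.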

Comments Revised version

AI中文摘要

虚假不动点：大语言模型中的康德反馈、稳定误校准与表征压缩

Akira Okutomi

发表机构 * ToppyMicroServices OÜ（ToppyMicroServices公司）

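AI summary: Using a Kantian commitment-gate framing and a linear feedback model, studies high-confidence errors in large language models as false fixed points that are locally stable, internally coherent, and confidently wrong; finds that stability and correctness can be separated, and explores high-SNR inertia and representational compression as possible mechanisms for stable miscalibration.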

Comments 27 pages, 8 figures, v3.0

AI中文摘要

大型语言模型中的高置信度错误通常被视为脆弱的失败。我们研究另一种可能性：某些错误可能是虚假不动点，即局部稳定、内部一致且自信地错误。这分离了鲁棒性与真实追踪。我们通过康德承诺门控框架和一个最小线性反馈模型来发展这种分离，其中稳定性和正确性可以偏离。在三个开源权重模型上，根据我们的隐藏状态敏感性探测，过度自信的错误项并不比自信正确的项系统性地更局部脆弱。基于弃权的自我批评通过牺牲覆盖率减少了过度自信的错误承诺，而C3-R（一种基于规则的显式反馈门控）则加剧了这种权衡而非消除它。这些结果激发但未证实高信噪比惯性和表征压缩作为稳定误校准的可能机制。

英文摘要

High-confidence errors in large language models are often treated as fragile failures. We study an alternative: some errors may be false fixed points, locally stable, internally coherent, and confidently wrong. This separates robustness from truth-tracking. We develop the separation through a Kantian commitment-gate framing and a minimal linear feedback model in which stability and correctness can diverge. Across three open-weight models, overconfident wrong items are not systematically more locally fragile than confidently correct items under our hidden-state sensitivity probes. Abstention-aware self-critique reduces overconfident wrong commitments by sacrificing coverage, and C3-R, a rule-based explicit feedback gate, sharpens that tradeoff rather than eliminating it. These results motivate, but do not establish, high signal-to-noise (high-SNR) inertia and representational compression as possible mechanisms for stable miscalibration.

2510.08609 2026-05-26 cs.SE cs.CR cs.LG cs.PL 版本更新

Which Is Better For Reducing Outdated and Vulnerable Dependencies: Pinning or Floating?

哪种方法更能减少过时和易受攻击的依赖：固定版本还是浮动版本？

Imranur Rahman, Jill Marley, William Enck, Laurie Williams

发表机构 * North Carolina State University（北卡罗来纳州立大学）

AI总结本研究通过实证分析npm、PyPI和Cargo生态系统中依赖版本约束的使用趋势，利用生存分析比较固定版本与浮动版本对依赖过时和易受攻击风险的影响。

Comments Accepted to ASE 2025

DOI: 10.1109/ASE63991.2025.00229
Journal ref: 2025 40th IEEE/ACM International Conference on Automated Software Engineering (ASE)

AI中文摘要

开发者通常使用版本约束来指定其项目依赖的可接受版本。固定依赖可以减少破坏性变更的可能性，但需要手动管理过时和易受攻击依赖的替换。另一方面，浮动依赖可以自动获取错误修复和安全修复，但存在破坏性变更的风险。安全从业者主张固定依赖以防止软件供应链攻击，例如恶意包更新。然而，由于固定是最严格的版本约束，它最可能导致依赖过时。尽管如此，不同版本约束类型下依赖变得过时或易受攻击的可能性如何变化尚不清楚。本研究旨在通过大规模实证评估不同版本约束类型下依赖变得过时或易受攻击的可能性，帮助开发者做出明智的依赖版本约束选择。在本研究中，我们首先识别了npm、PyPI和Cargo生态系统中依赖版本约束使用的趋势以及开发者对版本约束类型更改的模式。然后，我们使用生存分析对依赖状态转换进行建模，并估计使用固定版本与其他版本约束类型相比，依赖变得过时或易受攻击的可能性如何变化。我们观察到，在过时和易受攻击的依赖中，最常用的版本约束类型是浮动-次要，固定版本次之。我们还发现，浮动-主要导致过时的可能性最小，而浮动-次要导致易受攻击的可能性最小。

英文摘要

Developers consistently use version constraints to specify acceptable versions of the dependencies for their project. Pinning dependencies can reduce the likelihood of breaking changes, but comes with a cost of manually managing the replacement of outdated and vulnerable dependencies. On the other hand, floating can be used to automatically get bug fixes and security fixes, but comes with the risk of breaking changes. Security practitioners advocate pinning dependencies to protect against software supply chain attacks, e.g., malicious package updates. However, since pinning is the tightest version constraint, pinning is the most likely to result in outdated dependencies. Nevertheless, how the likelihood of dependencies becoming outdated or vulnerable changes across version constraint types is unknown. The goal of this study is to aid developers in making an informed dependency version constraint choice by empirically evaluating the likelihood of dependencies becoming outdated or vulnerable across version constraint types at scale. In this study, we first identify the trends in dependency version constraint usage and the patterns of version constraint type changes made by developers in the npm, PyPI, and Cargo ecosystems. We then model the dependency state transitions using survival analysis and estimate how the likelihood of becoming outdated or vulnerable changes when using pinning as opposed to the rest of the version constraint types. We observe that among outdated and vulnerable dependencies, the most commonly used version constraint type is floating-minor, with pinning being the next most common. We also find that floating-major is the least likely to result in outdated dependencies and floating-minor is the least likely to result in vulnerable dependencies.
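
To make the constraint types concrete, the following sketch maps npm-style version specifiers onto the category names used in the abstract. The mapping is an assumption of this example rather than the paper's classifier, and it is deliberately simplified: it ignores wildcard forms such as `1.x`, compound ranges, and the narrower meaning of the caret on `0.x` versions.

```python
import re

def classify_npm_constraint(spec):
    """Rough, illustrative mapping from an npm version specifier to a constraint type."""
    spec = spec.strip()
    if spec in ("", "*", "x", "latest") or spec.startswith(">"):
        return "floating-major"   # any newer release is accepted
    if spec.startswith("^"):
        return "floating-minor"   # minor and patch updates accepted (for versions >= 1.0.0)
    if spec.startswith("~"):
        return "floating-patch"   # only patch updates accepted
    if re.fullmatch(r"=?\d+\.\d+\.\d+", spec):
        return "pinning"          # exactly one version accepted
    return "other"

examples = ["1.2.3", "^1.2.3", "~1.2.3", "*", ">=1.0.0"]
labels = [classify_npm_constraint(s) for s in examples]
```

A pinned `1.2.3` never picks up a security fix released as `1.2.4` without a manual bump, whereas `^1.2.3` would pick it up automatically; that trade-off is what the survival analysis quantifies.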

2510.02730 2026-05-26 cs.LG cs.CV 版本更新

Dale meets Langevin: A Multiplicative Denoising Diffusion Model

Dale meets Langevin: 乘法去噪扩散模型

Nishanth Shetty, Madhava Prasath, Chandra Sekhar Seelamantula

发表机构 * Department of Electrical Engineering（电子工程系）； Indian Institute of Science（印度科学研究所）

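AI summary: Proposes a multiplicative score-based generative model with geometric Brownian motion as the forward noising process, derives the reverse-time SDE and two multiplicative samplers, introduces the Hyvärinen score and a multiplicative denoising score-matching objective, and validates generative capability on image datasets.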

AI中文摘要

指数梯度下降（EGD）是一种受生物学启发的优化算法，遵循Dale定律，在收敛时产生对数正态分布的突触权重，与神经科学的实验观察一致。由于几何布朗运动（GBM）在任何固定时间的边际分布是对数正态的，这种收敛性质揭示了EGD与基于GBM的随机过程之间的自然联系。我们提出了一种基于分数的乘法生成模型，以GBM作为前向噪声过程，并推导了其在环境空间和对数变换空间中的相应反向时间SDE。通过离散化相应的反向时间SDE，我们推导出两种乘法采样器：直接从环境空间反向时间SDE得到的符号无关采样器，以及通过Lamperti变换得到的符号保持采样器，我们称之为Dale-Langevin采样器。我们将该框架与镜像Langevin动力学联系起来，表明优化中驱动EGD的凸函数精确地控制着Dale-Langevin采样器。虽然标准Stein分数（定义为随机向量X在x处的∇log p_X(x)）在基于加性噪声的扩散模型中自然出现，但在乘法设置中，我们遇到了一种用于采样的修改版Stein分数，我们称之为Hyvärinen分数：x∘∇log p_X(x)。为了估计该分数，我们提出了一种新的乘法去噪分数匹配目标（M-DSM），证明了其与乘法显式分数匹配损失的等价性，并表明它包含了非负分数匹配损失。在MNIST、Fashion-MNIST、Kuzushiji-MNIST和CIFAR-10上的实验结果验证了所提框架的生成能力。

英文摘要

Exponentiated gradient descent (EGD), a biologically motivated optimisation algorithm that respects Dale's law, produces log-normally distributed synaptic weights at convergence, in alignment with experimental observations in neuroscience. Since the marginal distribution of geometric Brownian motion (GBM) at any fixed time is log-normal, this convergence property reveals a natural connection between EGD and GBM-based stochastic processes. We propose a multiplicative score-based generative model with GBM as a forward noising process and derive its corresponding reverse-time SDE in both the ambient space and in the $\log$-transformed space. We derive two multiplicative samplers by discretising the corresponding reverse-time SDEs: a sign-agnostic sampler obtained directly from the ambient-space reverse-time SDE, and a sign-preserving sampler, which we refer to as the Dale-Langevin sampler, obtained via the Lamperti transform. We connect the framework to Mirrored Langevin Dynamics, showing that the convex function driving EGD in optimisation precisely governs the Dale-Langevin sampler. While the standard Stein score, defined as $\nabla \log p_{\boldsymbol{X}}(\boldsymbol{x})$ for a random vector $\boldsymbol{X}$ evaluated at $\boldsymbol{x}$, comes up naturally in the additive noise based diffusion models, in the multiplicative setting, we encounter a modified version of the Stein score for sampling, which we refer to as the Hyvärinen score: $\boldsymbol{x} \circ \nabla \log p_{\boldsymbol{X}}(\boldsymbol{x})$. To estimate the score, we propose a new multiplicative denoising score-matching objective (M-DSM), prove its equivalence to the multiplicative explicit score-matching loss and show that it subsumes the non-negative score matching loss. Experimental results on MNIST, Fashion-MNIST, Kuzushiji-MNIST, and CIFAR-10 validate the generative capability of the proposed framework.
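
The two multiplicative ingredients mentioned in the abstract can be sketched in a few lines. The EGD update below is one common sign-aware form and is an assumption of this example, as are the function names; the GBM expression is the standard closed-form solution, whose logarithm is Gaussian, which is why the marginal is log-normal.

```python
import math

def egd_step(weights, grads, lr):
    """Multiplicative update w <- w * exp(-lr * g * sign(w)); nonzero weights keep their sign."""
    return [w * math.exp(-lr * g * math.copysign(1.0, w))
            for w, g in zip(weights, grads)]

def gbm_value(x0, mu, sigma, t, z):
    """Exact GBM solution at time t for a standard normal draw z."""
    return x0 * math.exp((mu - 0.5 * sigma ** 2) * t + sigma * math.sqrt(t) * z)

# One excitatory (positive) and one inhibitory (negative) weight.
updated = egd_step([0.5, -2.0], [1.0, 1.0], lr=0.1)
x_t = gbm_value(x0=2.0, mu=0.1, sigma=0.2, t=1.0, z=0.0)
```

Because the update multiplies each weight by a positive factor, a positive weight stays positive and a negative weight stays negative, which is the sign constraint Dale's law imposes. Additive noise or additive gradient steps give no such guarantee.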


2510.01389 2026-05-26 cs.RO cs.AI cs.LG 版本更新

Dynamic Relational Priming Improves Transformer in Multivariate Time Series

动态关系先验提升Transformer在多变量时间序列中的表现

Hunjae Lee, Corey Clark

Affiliations * Department of Computer Science, Southern Methodist University, Dallas TX USA

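AI summary: Proposes attention with dynamic relational priming (prime attention), which adapts each token's representation per token pair to capture heterogeneous inter-channel dependencies in multivariate time series, improving forecasting accuracy by up to 6.5% at the same asymptotic computational complexity as standard attention.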

详情

AI中文摘要

标准Transformer中的注意力机制使用静态的token表示，这些表示在每一层的所有成对计算中保持不变。这限制了它们与每个token对交互中可能存在的多样化关系动态的表示对齐。虽然标准注意力在关系相对同质的领域表现出色，但其静态关系学习难以捕捉多变量时间序列（MTS）数据中多样、异构的通道间依赖关系——其中单个系统内不同的通道对交互可能由完全不同的物理定律或时间动态支配。为了更好地将注意力机制与此类领域现象对齐，我们提出了带有动态关系先验的注意力机制（prime attention）。与标准注意力中每个token在所有成对交互中呈现相同表示不同，prime attention通过可学习的调制动态地（或按交互）定制每个token，以最好地捕捉每个token对的独特关系动态，从而针对特定关系优化每个成对交互。这种prime attention的表示可塑性使其能够在保持与标准注意力相同渐近计算复杂度的同时，有效提取MTS中关系特定的信息。我们的结果表明，prime attention在基准测试中始终优于标准注意力，预测精度提升高达6.5%。此外，我们发现与标准注意力相比，prime attention在使用最多40%更短序列长度时即可达到相当或更优的性能，进一步证明了其卓越的关系建模能力。

英文摘要

Standard attention mechanisms in transformers employ static token representations that remain unchanged across all pair-wise computations in each layer. This limits their representational alignment with the potentially diverse relational dynamics of each token-pair interaction. While they excel in domains with relatively homogeneous relationships, standard attention's static relational learning struggles to capture the diverse, heterogeneous inter-channel dependencies of multivariate time series (MTS) data, where different channel-pair interactions within a single system may be governed by entirely different physical laws or temporal dynamics. To better align the attention mechanism with such domain phenomena, we propose attention with dynamic relational priming (prime attention). Unlike standard attention, where each token presents an identical representation across all of its pair-wise interactions, prime attention tailors each token dynamically (or per interaction) through learnable modulations to best capture the unique relational dynamics of each token pair, optimizing each pair-wise interaction for that specific relationship. This representational plasticity of prime attention enables effective extraction of relationship-specific information in MTS while maintaining the same asymptotic computational complexity as standard attention. Our results demonstrate that prime attention consistently outperforms standard attention across benchmarks, achieving up to 6.5% improvement in forecasting accuracy. In addition, we find that prime attention achieves comparable or superior performance with sequences up to 40% shorter than those used by standard attention, further demonstrating its superior relational modeling capabilities.
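
To make the contrast with standard attention concrete, the following is a minimal sketch of per-pair modulation. The abstract does not specify how the modulations are parameterized, so the `gains[i][j]` vectors (an elementwise gain applied to key `j` when it meets query `i`) and the function names are purely hypothetical. With all gains equal to one, the sketch reduces to ordinary scaled dot-product attention weights.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys, gains=None):
    """Return the N x N attention weight matrix.

    gains[i][j] is a hypothetical per-pair modulation vector applied to
    key j when it interacts with query i (an assumption made for this
    sketch). gains=None gives standard attention.
    """
    d = len(queries[0])
    weights = []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            g = gains[i][j] if gains is not None else [1.0] * d
            dot = sum(qa * ka * ga for qa, ka, ga in zip(q, k, g))
            scores.append(dot / math.sqrt(d))
        weights.append(softmax(scores))
    return weights
```

The double loop over token pairs is already present in standard attention, so adding an elementwise gain per pair keeps the cost at O(N^2 d) in this sketch.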


2509.10515 2026-05-26 cs.LG cs.CL 版本更新

Adaptive Preference Optimization with Uncertainty-aware Utility Anchor

基于不确定性感知效用锚点的自适应偏好优化

Xiaobo Wang, Zixia Jia, Jiaqi Li, Qi Liu, Zilong Zheng

Affiliations * State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center; State Key Laboratory of General Artificial Intelligence, BIGAI

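AI summary: Proposes UAPO, a general offline preference optimization framework that introduces an anchoring function to estimate the uncertainty arising from preference data annotation and supports training on unpaired data, improving data utilization efficiency and training robustness.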

Comments Accepted by EMNLP 2025 Findings

详情

AI中文摘要

离线偏好优化方法对于大型语言模型（LLMs）的对齐是高效的。直接偏好优化（DPO）类学习作为最流行的方法之一，因其在奖励建模中的高效性而脱颖而出。然而，这些方法通常遵循惯例使用Bradley-Terry（BT）奖励建模，该建模面临几个关键假设，包括对成对训练数据的需求、模型分布偏移、人类理性假设等。为了解决这些限制，我们提出了一种通用的离线偏好优化框架——基于不确定性感知效用锚点的自适应偏好优化（UAPO），该框架引入了一个锚点函数来估计偏好数据标注带来的不确定性。我们的方法即使在数据未配对的情况下也能进行训练，显著提高了数据利用效率。此外，锚点设计使UAPO在训练过程中更加鲁棒。实验结果表明，UAPO在无需严格依赖数据配对的情况下取得了有竞争力的结果，为更灵活有效的偏好优化方法铺平了道路。

英文摘要

Offline preference optimization methods are efficient for aligning large language models (LLMs). Direct Preference Optimization (DPO)-like learning, one of the most popular approaches, stands out for its efficiency in reward modeling. However, these methods conventionally use Bradley-Terry (BT) reward modeling, which rests on several critical assumptions, including the requirement for pairwise training data, model distribution shifting, and human rationality, among others. To address these limitations, we propose a general framework for offline preference optimization methods, Adaptive Preference Optimization with Utility Anchor (UAPO), which introduces an anchoring function to estimate the uncertainties arising from preference data annotation. Our method enables training even in scenarios where the data is unpaired, significantly enhancing data utilization efficiency. Moreover, the anchor design makes UAPO more robust in the training process. Experimental results demonstrate that UAPO achieves competitive outcomes without a strict dependency on data pairing, paving the way for more flexible and effective preference optimization methods.


2508.21620 2026-05-26 cs.LG 版本更新

Introduction to the Analysis of Probabilistic Decision-Making Algorithms

概率决策算法分析导论

Agustinus Kristiadi

Affiliations * Western University and Vector Institute

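AI summary: A self-contained introductory guide to the theoretical analysis of probabilistic decision-making algorithms, including bandit algorithms, Bayesian optimization, and tree search algorithms, written to be accessible to non-experts.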

详情

AI中文摘要

决策理论为在各种不确定性下做出选择提供了原则性方法。实现这些理论的算法已成功应用于广泛的实际问题，包括材料和药物发现。事实上，这些算法是可取的，因为它们可以自适应地收集信息以在未来做出更好的决策，从而产生数据高效的工作流程。在科学发现中，实验成本高昂，因此这些算法可以显著降低实验成本。这些算法的理论分析对于理解其行为以及为开发下一代算法提供有价值的见解至关重要。然而，文献中的理论分析通常对非专家来说难以理解。本专著旨在为常用概率决策算法（包括赌博机算法、贝叶斯优化和树搜索算法）的理论分析提供一本可访问的、自包含的入门介绍。仅假设读者具备概率论和统计学的基本知识，以及一些关于高斯过程的基础知识。

英文摘要

Decision theories offer principled methods for making choices under various types of uncertainty. Algorithms that implement these theories have been successfully applied to a wide range of real-world problems, including materials and drug discovery. Indeed, they are desirable since they can adaptively gather information to make better decisions in the future, resulting in data-efficient workflows. In scientific discovery, where experiments are costly, these algorithms can thus significantly reduce the cost of experimentation. Theoretical analyses of these algorithms are crucial for understanding their behavior and providing valuable insights for developing next-generation algorithms. However, theoretical analyses in the literature are often inaccessible to non-experts. This monograph aims to provide an accessible, self-contained introduction to the theoretical analysis of commonly used probabilistic decision-making algorithms, including bandit algorithms, Bayesian optimization, and tree search algorithms. Only basic knowledge of probability theory and statistics, along with some elementary knowledge about Gaussian processes, is assumed.


2508.11307 2026-05-26 physics.ao-ph cs.LG physics.data-an 版本更新

Approximating the universal thermal climate index using sparse regression with orthogonal polynomials

使用正交多项式稀疏回归逼近通用热气候指数

Sabin Roman, Ljupco Todorovski, Saso Dzeroski, Gregor Skok

Affiliations * Department of Knowledge Technologies, Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia; Faculty of Mathematics and Physics, University of Ljubljana, Jadranska ulica 19, 1000 Ljubljana, Slovenia

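AI summary: To address the large errors of the standard polynomial approximation of the Universal Thermal Climate Index (UTCI), proposes sparse regression with an orthogonal polynomial basis, which substantially reduces both the average errors and the frequency of large errors while retaining comparable computational efficiency.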

Comments Final peer-reviewed version of the manuscript

详情

DOI: 10.5194/gmd-19-4319-2026
Journal ref: Geoscientific Model Development 19, 4319-4330 (2026)

AI中文摘要

通用热气候指数(UTCI)是一种衡量热舒适度的指标，用于量化人类对环境条件的感受。由于其作为生物气候指标的稳健性和多功能性，已被广泛应用于生物气候学的众多研究中，并越来越多地作为户外热舒适度的操作度量。从相关环境参数计算UTCI值通常并不直接，因此使用6次多项式近似已成为计算UTCI值的标准方法。尽管计算效率高，但该多项式近似的误差可能很大。本研究的目标是开发一种改进的多项式近似版本——既能保持相当的计算效率，又在数值稳定性方面更稳健，且精度显著提高，特别是在减少较大误差的频率方面。通过使用稀疏正交回归（即基于正交多项式基的稀疏回归）实现了这一目标，这不仅大幅降低了平均误差（即平均误差、平均绝对误差和均方根误差），还显著减少了较大误差的频率。利用勒让德多项式基，可以构建近似模型，有效填充精度与复杂度的帕累托前沿，并在不同模型容量下表现出稳定的层次化系数结构。仅使用20%的数据训练新近似模型，并在剩余80%的数据上进行测试，显示出成功的泛化能力，且结果在自助法下具有稳健性。该分解有效地将UTCI近似为正交基中的傅里叶式展开，在L2（最小二乘）意义上接近理论最优值。

英文摘要

The Universal Thermal Climate Index (UTCI) is a measure of thermal comfort that quantifies how humans experience environmental conditions. Due to its robustness and versatility as a bioclimatic indicator, it has been extensively employed across a wide range of studies in bioclimatology and is increasingly used as an operational measure of outdoor thermal comfort. Calculating the UTCI value from the relevant environmental parameters is nominally not straightforward, which is why using a 6th-degree polynomial approximation has become the standard way to calculate UTCI values. Although it is computationally efficient, the error of this polynomial approximation can be substantial. The goal of this study was to develop an improved version of the polynomial approximation - one that retains comparable computational efficiency but is more robust in terms of numerical stability and substantially more accurate, particularly in reducing the frequency of larger errors. This goal was achieved using sparse orthogonal regression, namely sparse regression with an orthogonal polynomial basis, which not only substantially reduces the average errors (i.e., the mean error, the mean absolute error, and the root mean square error) but also drastically reduces the frequency of large errors. By leveraging Legendre polynomial bases, approximation models could be constructed that efficiently populate a Pareto front of accuracy versus complexity and exhibit stable, hierarchical coefficient structures across varying model capacities. Training the new approximation models over only 20% of the data, with the testing performed over the remaining 80%, highlights successful generalization, with the results being robust under bootstrapping. The decomposition effectively approximates the UTCI as a Fourier-like expansion in an orthogonal basis, yielding results near the theoretical optimum in the L2 (least squares) sense.
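
The "Fourier-like expansion in an orthogonal basis" mentioned above can be illustrated with a one-dimensional toy example. The sketch below projects a function onto Legendre polynomials on [-1, 1] and then zeroes small coefficients. This is an orthogonal projection followed by hard thresholding, used here as a simple stand-in; it is not the authors' sparse regression procedure, and the real UTCI depends on several input variables rather than one. All function names are illustrative.

```python
def legendre(n_max, x):
    """Values P_0(x)..P_{n_max}(x) via Bonnet's recurrence."""
    vals = [1.0, x]
    for n in range(1, n_max):
        vals.append(((2 * n + 1) * x * vals[n] - n * vals[n - 1]) / (n + 1))
    return vals[: n_max + 1]


def legendre_coefficients(f, n_max, n_grid=2000):
    """Project f onto P_0..P_{n_max} on [-1, 1] with the midpoint rule.

    Uses orthogonality: c_n = (2n + 1) / 2 * integral of f(x) P_n(x) dx.
    """
    h = 2.0 / n_grid
    coeffs = [0.0] * (n_max + 1)
    for i in range(n_grid):
        x = -1.0 + (i + 0.5) * h
        fx = f(x)
        for n, p in enumerate(legendre(n_max, x)):
            coeffs[n] += (2 * n + 1) / 2.0 * fx * p * h
    return coeffs


def sparsify(coeffs, tol):
    """Zero out coefficients whose magnitude is below tol."""
    return [c if abs(c) >= tol else 0.0 for c in coeffs]


def evaluate(coeffs, x):
    """Evaluate the Legendre series with the given coefficients at x."""
    return sum(c * p for c, p in zip(coeffs, legendre(len(coeffs) - 1, x)))
```

Because the basis is orthogonal, each coefficient is computed independently of the others, so dropping a term does not change the remaining coefficients. That property is one plausible reading of the "stable, hierarchical coefficient structures" the abstract reports.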


2507.14760 2026-05-26 eess.IV cs.AI cs.CV cs.LG 版本更新

QUTCC: Quantile Uncertainty Training and Conformal Calibration for Imaging Inverse Problems

QUTCC: 成像逆问题的分位数不确定性训练与保形校准

Cassandra Tong Ye, Shamus Li, Tyler King, Kristina Monakhova

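AI summary: Proposes QUTCC, which combines quantile regression with a U-Net to perform spatially adaptive conformal calibration, producing tighter uncertainty intervals across several imaging inverse problems and helping to locate model hallucinations.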

详情

AI中文摘要

尽管深度学习为科学和医学成像带来了巨大前景，但任何失败和幻觉（与事实不符的预测）都难以定位，并可能产生严重的下游后果。不确定性估计技术，如保形预测，可以通过预测模型预测的统计有效误差条来提供帮助。然而，流行的保形预测方法并非为高维图像值问题设计，且在保形校准过程中未考虑图像内的空间相关性，导致不确定性区间过大。我们提出了一种实用的同时分位数回归方法，能够在保形校准期间实现非线性、空间自适应缩放。我们的方法QUTCC使用带有分位数嵌入的U-Net架构，在训练期间学习完整的条件分位数分布，然后利用这个非线性学习函数进行空间自适应保形校准。在测试时，我们的方法能够高效地估计具有像素边际覆盖保证的不确定性区间。此外，QUTCC还可以在没有内置分布假设的情况下预测逐像素条件概率密度估计。我们在多个去噪问题、加速磁共振成像和定量相位显微镜上评估了我们的方法。与先前的保形方法相比，我们的方法在相同覆盖水平下始终产生更紧的不确定性区间，能够预测不同任务的合理条件分布，并且在某些情况下，高不确定性区域可以帮助我们定位模型预测中的幻觉。

英文摘要

While deep learning offers tremendous promise for scientific and medical imaging, any failures and hallucinations (predictions that do not coincide with reality) are hard to pinpoint and can have serious downstream consequences. Uncertainty estimation techniques, such as conformal prediction, can help by predicting statistically valid error bars for a model's prediction. However, popular conformal prediction methods were not designed for high-dimensional image-valued problems and do not take into account spatial correlations within an image during conformal calibration, resulting in larger-than-necessary uncertainty intervals. We propose a practical simultaneous quantile regression method that enables non-linear, spatially-adaptive scaling during conformal calibration. Our method, QUTCC, uses a U-Net architecture with a quantile embedding to learn a full conditional quantile distribution during training, and then leverages this non-linear, learned function for spatially-adaptive conformal calibration. At test time, our method can efficiently estimate uncertainty intervals with pixel-marginal coverage guarantees. In addition, QUTCC can predict pixel-wise conditional probability density estimates without any built-in distributional assumptions. We evaluate our method on several denoising problems, accelerated magnetic resonance imaging, and quantitative phase microscopy. Our method consistently produces tighter uncertainty intervals than prior conformal methods at the same coverage level, can predict plausible conditional distributions for different tasks, and, in some cases, its high-uncertainty regions can help locate hallucinations in a model's prediction.
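
Two building blocks named in the abstract, quantile regression and conformal calibration, can be sketched generically. The code below shows the pinball loss used to train a quantile predictor and a constant-offset split-conformal correction in the style of conformalized quantile regression. This is a textbook baseline under stated assumptions, not QUTCC itself: the abstract describes a non-linear, spatially adaptive calibration, which this constant offset does not implement. Function names are illustrative.

```python
import math


def pinball_loss(y, pred, q):
    """Quantile (pinball) loss for target y, prediction pred, level q in (0, 1)."""
    diff = y - pred
    return max(q * diff, (q - 1.0) * diff)


def conformal_offset(y_cal, lo_cal, hi_cal, alpha):
    """Split-conformal correction for predicted intervals [lo, hi].

    Each score measures how far a calibration target falls outside its
    interval (negative when it lies inside). Adding the returned offset to
    both ends, [lo - offset, hi + offset], targets about 1 - alpha
    marginal coverage on exchangeable new data.
    """
    scores = sorted(max(lo - y, y - hi) for y, lo, hi in zip(y_cal, lo_cal, hi_cal))
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))  # finite-sample corrected rank
    # If the calibration set is too small for the requested level, this
    # sketch falls back to the largest score instead of an infinite interval.
    return scores[min(rank, n) - 1]
```

A negative offset shrinks the intervals, which is how calibration can tighten an over-conservative quantile model.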


2507.06038 2026-05-26 math.NA cs.LG cs.NA 版本更新

Fredholm Neural Networks for inverse problems in elliptic PDEs

Fredholm神经网络用于椭圆型偏微分方程反问题

Kyriakos C. Georgiou, Constantinos Siettos, Athanasios N. Yannacopoulos

Affiliations * Division of Applied Mathematics, Brown University; Department of Statistics and Stochastic Modelling and Applications Laboratory, Athens University of Economics and Business

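AI summary: Building on the Fredholm Neural Network framework, proposes the explainable Potential Fredholm Neural Network (PFNN) for forward and inverse problems in elliptic PDEs, achieving high accuracy with rigorously proven error bounds.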

详情

AI中文摘要

在我们先前关于Fredholm神经网络（Fredholm NN / FNN）求解积分方程的工作基础上，我们将该框架扩展到线性和非线性椭圆型偏微分方程的反问题。所提出的方案包含一个定制设计的深度神经网络（DNN），其中层数、权重、偏置和超参数基于不动点方案以可解释的方式计算，因此我们称之为Potential Fredholm神经网络（PFNN）。我们首先构建PFNN作为求解正问题的方法，表明该方法确保了高精度和可解释性，在区域内部实现小误差，在边界上接近机器精度。然后，我们使用该方法求解椭圆型PDE的反问题，并提供了方案一致性的严格证明以及与PFNN架构直接相关的区域内部和边界的误差界。特别地，我们表明这些误差界依赖于边界函数的近似和积分离散方案，两者都直接对应于Fredholm NN架构的组成部分。通过这种方式，我们构建了一个可解释的方案，该方案为反问题提供精确解，同时由于PFNN的架构而明确尊重边界条件。我们评估了所提出方案在二维和三维线性和半线性椭圆型PDE上的性能。

英文摘要

Building on our previous work on Fredholm Neural Networks (Fredholm NNs/ FNNs) for solving integral equations, we extend the framework to inverse problems for linear and nonlinear elliptic partial differential equations. The proposed scheme consists of a custom-designed deep neural network (DNN) in which the number of layers, weights, biases and hyperparameters are computed in an explainable manner based on a fixed-point scheme, and we therefore refer to this as the Potential Fredholm Neural Network (PFNN). We first build the PFNN as a method for solving the forward problem, showing that this approach ensures both a high accuracy and explainability, achieving small errors in the interior of the domain, and near machine-precision on the boundary. We then use this approach to solve inverse problems for elliptic PDEs, and provide a rigorous proof for the consistency of the scheme and error bounds for both the interior and boundary of the domain, tied directly to the architecture of the PFNN. In particular, we show that these error bounds depend on the approximation of the boundary function and the integral discretization scheme, both of which directly correspond to components of the Fredholm NN architecture. In this way, we construct an explainable scheme that provides accurate solutions to the inverse problems, whilst still explicitly respecting the boundary conditions, due to the architecture of the PFNN. We assess the performance of the proposed scheme for linear and semi-linear elliptic PDEs in two and three dimensions.
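
The idea of computing layers, weights and biases from a fixed-point scheme can be illustrated on a Fredholm integral equation of the second kind, f(x) = g(x) + ∫₀¹ K(x, y) f(y) dy. The sketch below is an assumption-laden toy, not the PFNN: it discretizes the integral with the midpoint rule and treats each fixed-point update as one layer whose weights are K(x_i, y_j)·h and whose biases are g(x_i). The function name, grid and layer count are chosen here for illustration.

```python
def solve_fredholm(kernel, g, n_grid=200, n_layers=30):
    """Approximate f(x) = g(x) + int_0^1 kernel(x, y) f(y) dy on a midpoint grid.

    Each pass of the loop is one fixed-point update f <- g + K f and plays
    the role of one layer: the weights are kernel(x_i, y_j) * h and the
    biases are g(x_i). Convergence assumes the integral operator is a
    contraction.
    """
    h = 1.0 / n_grid
    grid = [(i + 0.5) * h for i in range(n_grid)]
    bias = [g(x) for x in grid]
    weights = [[kernel(x, y) * h for y in grid] for x in grid]
    f = bias[:]  # layer 0: start from the forcing term
    for _ in range(n_layers):
        f = [b + sum(w * fj for w, fj in zip(row, f)) for b, row in zip(bias, weights)]
    return grid, f
```

For example, with K(x, y) = xy/2 and g(x) = x the exact solution is f(x) = 1.2x, and `solve_fredholm(lambda x, y: x * y / 2, lambda x: x)` should reproduce it up to the quadrature error. Nothing is trained here: every weight is read off from the kernel and the quadrature rule, which is the sense in which such a construction is explainable.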


2506.01945 2026-05-26 econ.EM cs.LG stat.AP 版本更新

Stock Market Telepathy: Graph Neural Networks Predicting the Secret Conversations between MINT and G7 Countries

股市读心术：图神经网络预测MINT与G7国家之间的秘密对话

Nurbanu Bursa

Affiliations * Hacettepe University

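AI summary: Uses the MTGNN graph neural network to analyze G7 and MINT stock market indices from 2012 to 2024, identifying the US, Canada, Indonesia, and Türkiye as the most influential countries and showing that the method outperforms traditional forecasting models.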

详情

DOI: 10.1080/23737484.2026.2668446
Journal ref: Communications in Statistics: Case Studies, Data Analysis and Applications (2026)

AI中文摘要

新兴经济体，特别是MINT国家（墨西哥、印度尼西亚、尼日利亚和土耳其），在全球股市中的影响力日益增强，尽管它们仍易受G7（加拿大、法国、德国、意大利、日本、英国和美国）等发达国家经济状况的影响。金融市场的这种相互关联性和敏感性使得理解这些关系对于投资者和政策制定者准确预测股价走势至关重要。为此，我们研究了2012年至2024年G7和MINT国家的主要股市指数，使用了一种称为多元时间序列图神经网络（MTGNN）的最新图神经网络算法。该方法允许考虑多元时间序列中复杂的时空连接。在实现中，MTGNN揭示出在预测过程中，美国和加拿大是对股市指数最具影响力的G7国家，而印度尼西亚和土耳其是最具影响力的MINT国家。此外，我们的结果表明，MTGNN在预测MINT和G7国家股市指数价格方面优于传统方法。因此，该研究为各经济集团的市场提供了宝贵的见解，并提出了一种使用MTGNN分析全球股市动态的令人信服的实证方法。

英文摘要

Emerging economies, particularly the MINT countries (Mexico, Indonesia, Nigeria, and Türkiye), are gaining influence in global stock markets, although they remain susceptible to the economic conditions of developed countries like the G7 (Canada, France, Germany, Italy, Japan, the United Kingdom, and the United States). This interconnectedness and sensitivity of financial markets make understanding these relationships crucial for investors and policymakers to predict stock price movements accurately. To this end, we examined the main stock market indices of G7 and MINT countries from 2012 to 2024, using a recent graph neural network (GNN) algorithm called multivariate time series forecasting with graph neural network (MTGNN). This method allows for considering complex spatio-temporal connections in multivariate time series. In the implementations, MTGNN revealed that the US and Canada are the most influential G7 countries regarding stock indices in the forecasting process, and Indonesia and Türkiye are the most influential MINT countries. Additionally, our results showed that MTGNN outperformed traditional methods in forecasting the prices of stock market indices for MINT and G7 countries. Consequently, the study offers valuable insights into the markets of these economic blocs and presents a compelling empirical approach to analyzing global stock market dynamics using MTGNN.


2505.05371 2026-05-26 eess.SP cs.LG q-bio.NC 版本更新

From Sleep Staging to Spindle Detection: A Case Study on End-to-End Automated Sleep Analysis

从睡眠分期到纺锤波检测：端到端自动化睡眠分析的案例研究

Niklas Grieger, Siamak Mehrkanoon, Philipp Ritter, Stephan Bialonski

Affiliations * Department of Medical Engineering and Technomathematics, FH Aachen University of Applied Sciences; Department of Information and Computing Sciences, Utrecht University; Institute for Data-Driven Technologies, FH Aachen University of Applied Sciences; Department of Psychiatry and Psychotherapy, University Hospital Carl Gustav Carus, Technische Universität Dresden

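AI summary: A case study evaluating fully automated sleep analysis with validated machine learning models (RobustSleepNet and SUMOv2); it qualitatively reproduces the key findings of an expert-based study of bipolar disorder, suggesting that fully automated approaches can facilitate large-scale sleep research.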

Comments 12 pages, 4 figures, 2 tables

详情

DOI: 10.1038/s41598-026-53891-9
Journal ref: Scientific Reports 16, 16014 (2026)

AI中文摘要

睡眠分析的自动化，包括宏观结构（睡眠分期）和微观结构（例如睡眠纺锤波）元素，有望实现大规模睡眠研究，并减少由于评分者间不一致导致的差异。虽然睡眠分期和纺锤波检测等单个步骤已被分别研究，但多步骤睡眠分析自动化的可行性仍不清楚。在本案例研究中，我们评估了使用经过验证的机器学习模型进行睡眠分期（RobustSleepNet）和后续纺锤波检测（SUMOv2）的全自动化分析是否能够复现基于专家的双相情感障碍研究结果。自动化分析定性地复现了专家研究的关键发现，包括双相情感障碍患者与健康对照之间快速纺锤波密度的显著差异，在几分钟内完成了以前需要数月手动完成的工作。虽然自动化分析的结果在定量上与专家研究存在差异，可能是由于专家评分者之间或评分者与模型之间的偏差，但各个模型在睡眠分期和纺锤波检测方面的表现达到或超过了评分者间一致性。我们的结果表明，全自动化方法具有促进大规模睡眠研究的潜力。我们通过共享代码并引入SomnoBot（一个保护隐私的睡眠分析平台），公开提供自动化分析中使用的工具。

英文摘要

Automation of sleep analysis, including both macrostructural (sleep stages) and microstructural (e.g., sleep spindles) elements, promises to enable large-scale sleep studies and to reduce variance due to inter-rater incongruencies. While individual steps, such as sleep staging and spindle detection, have been studied separately, the feasibility of automating multi-step sleep analysis remains unclear. In this case study, we evaluate whether a fully automated analysis using validated machine learning models for sleep staging (RobustSleepNet) and subsequent spindle detection (SUMOv2) can replicate findings from an expert-based study of bipolar disorder. The automated analysis qualitatively reproduced key findings from the expert-based study, including significant differences in fast spindle densities between bipolar patients and healthy controls, accomplishing in minutes what previously took months to complete manually. While the results of the automated analysis differed quantitatively from the expert-based study, possibly due to biases between expert raters or between raters and the models, the models individually performed at or above inter-rater agreement for both sleep staging and spindle detection. Our results demonstrate that fully automated approaches have the potential to facilitate large-scale sleep research. We are providing public access to the tools used in our automated analysis by sharing our code and introducing SomnoBot, a privacy-preserving sleep analysis platform.


2504.05181 2026-05-26 cs.IR cs.AI cs.DL cs.LG 版本更新

Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval

轻量级直接文档相关性优化用于生成式信息检索

Kidist Amde Mekonnen, Yubao Tang, Maarten de Rijke

Affiliations * University of Amsterdam

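AI summary: Proposes direct document relevance optimization (DDRO), which aligns token-level docid generation with document-level relevance estimation through pairwise ranking, without explicit reward modeling or reinforcement learning, improving MRR@10 by 7.4% on MS MARCO and 19.9% on Natural Questions.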

Comments 12 pages, 3 figures. SIGIR '25 Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval July 13--18, 2025 Padua, Italy. Code and pretrained models available at: https://github.com/kidist-amde/ddro/

详情

DOI: 10.1145/3726302.3730023
Journal ref: Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '25), pages 1327-1338, 2025

AI中文摘要

生成式信息检索（GenIR）是一种有前景的神经检索范式，它将文档检索形式化为文档标识符（docid）生成任务，允许朝着统一的全局检索目标进行端到端优化。然而，现有的GenIR模型存在令牌级错位问题，即训练用于预测下一个令牌的模型往往无法有效捕捉文档级相关性。虽然基于强化学习的方法（如相关性反馈强化学习（RLRF））旨在通过奖励建模解决这种错位，但它们引入了显著的复杂性，需要优化辅助奖励函数，然后进行强化微调，这在计算上昂贵且往往不稳定。为了解决这些挑战，我们提出了直接文档相关性优化（DDRO），它通过成对排序的直接优化，将令牌级docid生成与文档级相关性估计对齐，无需显式的奖励建模和强化学习。在包括MS MARCO文档和Natural Questions在内的基准数据集上的实验结果表明，DDRO优于基于强化学习的方法，在MS MARCO上MRR@10提升了7.4%，在Natural Questions上提升了19.9%。这些发现凸显了DDRO通过简化优化方法增强检索效果的潜力。通过将对齐问题框架化为直接优化问题，DDRO简化了GenIR模型的排序优化流程，同时为基于强化学习的方法提供了一种可行的替代方案。

英文摘要

Generative information retrieval (GenIR) is a promising neural retrieval paradigm that formulates document retrieval as a document identifier (docid) generation task, allowing for end-to-end optimization toward a unified global retrieval objective. However, existing GenIR models suffer from token-level misalignment, where models trained to predict the next token often fail to capture document-level relevance effectively. While reinforcement learning-based methods, such as reinforcement learning from relevance feedback (RLRF), aim to address this misalignment through reward modeling, they introduce significant complexity, requiring the optimization of an auxiliary reward function followed by reinforcement fine-tuning, which is computationally expensive and often unstable. To address these challenges, we propose direct document relevance optimization (DDRO), which aligns token-level docid generation with document-level relevance estimation through direct optimization via pairwise ranking, eliminating the need for explicit reward modeling and reinforcement learning. Experimental results on benchmark datasets, including MS MARCO document and Natural Questions, show that DDRO outperforms reinforcement learning-based methods, achieving a 7.4% improvement in MRR@10 for MS MARCO and a 19.9% improvement for Natural Questions. These findings highlight DDRO's potential to enhance retrieval effectiveness with a simplified optimization approach. By framing alignment as a direct optimization problem, DDRO simplifies the ranking optimization pipeline of GenIR models while offering a viable alternative to reinforcement learning-based methods.
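
For readers unfamiliar with the metric quoted above, MRR@10 is the mean over queries of the reciprocal rank of the first relevant document within the top 10 results, counting zero when none appears. The sketch below is a generic implementation with illustrative names and toy docids; it is not taken from the DDRO code base.

```python
def mrr_at_k(ranked_lists, relevant_sets, k=10):
    """Mean reciprocal rank at cutoff k.

    ranked_lists[i] is the ranked list of docids returned for query i and
    relevant_sets[i] is the set of docids judged relevant for that query.
    """
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, docid in enumerate(ranked[:k], start=1):
            if docid in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_lists)
```

Because only the first relevant hit contributes, the metric rewards placing one correct docid near the top, which fits a setting where the model generates a ranked list of document identifiers.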


2504.05108 2026-05-26 cs.AI cs.LG cs.NE 版本更新

Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning

利用大语言模型发现算法：进化搜索遇见强化学习

Anja Surina, Amin Mansouri, Lars Quaedvlieg, Amal Seddas, Maryna Viazovska, Emmanuel Abbe, Caglar Gulcehre

Affiliations * EPFL; Apple

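AI summary: Proposes continuously refining the LLM through reinforcement learning fine-tuning and combining it with evolutionary search to accelerate the discovery of better algorithms, validated on combinatorial optimization tasks.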

Comments 34 pages

详情

AI中文摘要

发现解决复杂问题的高效算法一直是数学和计算机科学中的重大挑战，多年来需要大量人类专业知识。近期，基于大语言模型（LLMs）的进化搜索在加速跨领域算法发现方面展现出潜力，特别是在数学和优化领域。然而，现有方法将LLM视为静态生成器，错过了利用进化探索获得的信号更新模型的机会。在这项工作中，我们提出通过强化学习（RL）微调持续优化搜索算子——即LLM，从而增强基于LLM的进化搜索。我们的方法利用进化搜索作为探索策略来发现改进的算法，而RL则基于这些发现优化LLM策略。我们在组合优化任务上的实验表明，将RL与进化搜索相结合加速了更优算法的发现，展示了RL增强的进化策略在算法设计中的潜力。

英文摘要

Discovering efficient algorithms for solving complex problems has been an outstanding challenge in mathematics and computer science, requiring substantial human expertise over the years. Recent advancements in evolutionary search with large language models (LLMs) have shown promise in accelerating the discovery of algorithms across various domains, particularly in mathematics and optimization. However, existing approaches treat the LLM as a static generator, missing the opportunity to update the model with the signal obtained from evolutionary exploration. In this work, we propose to augment LLM-based evolutionary search by continuously refining the search operator - the LLM - through reinforcement learning (RL) fine-tuning. Our method leverages evolutionary search as an exploration strategy to discover improved algorithms, while RL optimizes the LLM policy based on these discoveries. Our experiments on combinatorial optimization tasks demonstrate that integrating RL with evolutionary search accelerates the discovery of superior algorithms, showcasing the potential of RL-enhanced evolutionary strategies for algorithm design.


2503.19605 2026-05-26 cs.LG cs.CL math.ST stat.TH 版本更新

Lean Formalization of Generalization Error Bound by Rademacher Complexity and Dudley's Entropy Integral

Rademacher复杂度和Dudley熵积分的泛化误差界的Lean形式化

Sho Sonoda, Kazumi Kasaura, Yuma Mizuno, Kei Tsukamoto, Naoto Onda

Affiliations * RIKEN AIP; CyberAgent Inc.; OMRON SINIC X Corporation; University College Cork; The University of Tokyo

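AI summary: Formalizes generalization error bounds based on Rademacher complexity in Lean 4 by formalizing the symmetrization argument, bounded-difference analysis, and McDiarmid's inequality, extends them to countable hypothesis classes and separable topological index sets, and applies them to obtain an empirical Rademacher bound for linear predictors and Dudley's entropy integral bound.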

Comments accepted at ITP2026

详情

AI中文摘要

Investigating the Effect of Network Pruning on Performance and Interpretability

探究网络剪枝对性能与可解释性的影响

Jonathan von Rad, Florian Seuffert

Affiliations * AI Center, Neural Information Processing Group, University of Tübingen

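AI summary: Systematically applies unstructured pruning, structured pruning, and connection sparsity methods to GoogLeNet and studies their effect on classification performance on the ImageNet validation set and on interpretability; with sufficient retraining, performance approaches and in some cases exceeds that of the original network, and the interpretability score shows no significant relationship with pruning rate.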

Comments 4 pages, 6 figures

详情

AI中文摘要

深度神经网络（DNN）通常对其任务而言是过参数化的，可以通过移除权重进行大幅压缩，这一过程称为剪枝。我们研究了不同剪枝技术对GoogLeNet的分类性能和可解释性的影响。我们系统地应用非结构化剪枝、结构化剪枝以及连接稀疏性（输入权重剪枝）方法，并分析这些方法对网络在ImageNet验证集上性能的影响。我们还比较了不同的重训练策略，如迭代剪枝和一次性剪枝。我们发现，通过足够的重训练轮次，网络的性能可以接近默认GoogLeNet的性能——甚至在某些情况下超越它。为了评估可解释性，我们采用了Zimmermann等人开发的机制可解释性评分（MIS）。我们的实验表明，当使用MIS作为度量时，可解释性与剪枝率之间没有显著关系。此外，我们观察到，准确率极低的网络仍然可以获得高MIS分数，这表明MIS可能并不总是与可解释性的直观概念（例如理解正确决策的基础）一致。

英文摘要

Deep Neural Networks (DNNs) are often over-parameterized for their tasks and can be compressed quite drastically by removing weights, a process called pruning. We investigate the impact of different pruning techniques on the classification performance and interpretability of GoogLeNet. We systematically apply unstructured and structured pruning, as well as connection sparsity (pruning of input weights) methods, to the network and analyze the outcomes regarding the network's performance on the validation set of ImageNet. We also compare different retraining strategies, such as iterative pruning and one-shot pruning. We find that with sufficient retraining epochs, the performance of the networks can approximate the performance of the default GoogLeNet, and even surpass it in some cases. To assess interpretability, we employ the Mechanistic Interpretability Score (MIS) developed by Zimmermann et al. Our experiments reveal that there is no significant relationship between interpretability and pruning rate when using MIS as a measure. Additionally, we observe that networks with extremely low accuracy can still achieve high MIS scores, suggesting that the MIS may not always align with intuitive notions of interpretability, such as understanding the basis of correct decisions.
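
The difference between unstructured and structured pruning can be shown on toy weights. The abstract does not state which pruning criteria were used, so the magnitude criterion and the row-wise L1-norm criterion below are common choices assumed here for illustration, and the function names are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Unstructured pruning: zero the smallest-magnitude fraction of weights."""
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    # Indices of the n_prune weights with the smallest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    return [0.0 if i in pruned_idx else w for i, w in enumerate(weights)]


def structured_prune(matrix, n_rows):
    """Structured pruning: zero the n_rows rows with the smallest L1 norm.

    A row stands in for a whole unit such as an output channel.
    """
    norms = [sum(abs(w) for w in row) for row in matrix]
    order = sorted(range(len(matrix)), key=lambda i: norms[i])
    dropped = set(order[:n_rows])
    return [[0.0] * len(row) if i in dropped else list(row)
            for i, row in enumerate(matrix)]
```

One-shot pruning would call such a function once at the target sparsity and then retrain, while iterative pruning would alternate smaller pruning steps with retraining.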


2401.11963 2026-05-26 cs.NE cs.AI cs.LG 版本更新

Bridging Evolutionary Algorithms and Reinforcement Learning: A Comprehensive Survey on Hybrid Algorithms

桥接进化算法与强化学习：混合算法的全面综述

Pengyi Li, Jianye Hao, Hongyao Tang, Xian Fu, Yan Zheng, Ke Tang

Affiliations * College of Intelligence and Computing, Tianjin University; Montreal Institute of Learning Algorithms (MILA); Department of Computer Science and Engineering, Southern University of Science and Technology

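AI summary: A comprehensive survey of Evolutionary Reinforcement Learning (ERL), which integrates Evolutionary Algorithms (EAs) and Reinforcement Learning (RL); it identifies three primary research directions (EA-assisted optimization of RL, RL-assisted optimization of EA, and synergistic optimization of EA and RL) and analyzes the problems each branch addresses along with future challenges.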

Comments New Version, add more methods

详情

AI中文摘要

进化强化学习（ERL）将进化算法（EA）和强化学习（RL）相结合用于优化，已展现出显著的性能提升。通过融合这两种方法，ERL已成为一个有前景的研究方向。本综述全面概述了ERL中的不同研究分支。具体而言，我们系统地总结了相关算法的最新进展，并确定了三个主要研究方向：EA辅助的RL优化、RL辅助的EA优化以及EA和RL的协同优化。随后，我们对每个研究方向进行了深入分析，组织了多个研究分支。我们阐明了每个分支旨在解决的问题，以及EA和RL的整合如何应对这些挑战。最后，我们讨论了各个研究方向中潜在的挑战和未来的研究方向。为了便于研究人员深入研究ERL，我们在https://github.com/yeshenpy/Awesome-Evolutionary-Reinforcement-Learning上整理了所涉及的算法和代码。

英文摘要

Evolutionary Reinforcement Learning (ERL), which integrates Evolutionary Algorithms (EAs) and Reinforcement Learning (RL) for optimization, has demonstrated remarkable performance advancements. By fusing both approaches, ERL has emerged as a promising research direction. This survey offers a comprehensive overview of the diverse research branches in ERL. Specifically, we systematically summarize recent advancements in related algorithms and identify three primary research directions: EA-assisted Optimization of RL, RL-assisted Optimization of EA, and synergistic optimization of EA and RL. Following that, we conduct an in-depth analysis of each research direction, organizing multiple research branches. We elucidate the problems that each branch aims to tackle and how the integration of EAs and RL addresses these challenges. In conclusion, we discuss potential challenges and prospective future research directions across various research directions. To facilitate researchers in delving into ERL, we organize the algorithms and codes involved on https://github.com/yeshenpy/Awesome-Evolutionary-Reinforcement-Learning.


2311.15487 2026-05-26 cs.LG cs.AI math-ph math.MP math.OC stat.ML 版本更新

Global $\mathcal{L}^2$ minimization at uniform exponential rate via geometrically adapted gradient descent in Deep Learning

全局 $\mathcal{L}^2$ 最小化：通过深度学习中的几何自适应梯度下降实现均匀指数速率

Thomas Chen

Affiliations * Department of Mathematics, University of Texas at Austin

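AI summary: Exploiting the arbitrariness in the choice of Riemannian metric in differential geometry, proposes two modified gradient descent flows (one for the overparametrized and one for the underparametrized setting), proves that in the overparametrized case, when a rank condition holds, all orbits drive the $\mathcal{L}^2$ cost to its global minimum at a uniform exponential rate, and generalizes the framework to the case where the rank condition fails.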

Comments AMS Latex, 21 pages. Typos corrected, references and comments added

详情

AI中文摘要

我们考虑深度学习网络中的监督学习场景，并利用黎曼度量选择的任意性（微分几何的一般事实）来定义梯度下降流。在标准的深度学习方法中，参数空间（权重和偏置）上的梯度流是相对于欧几里得度量定义的。而在这里，我们选择相对于深度学习网络输出层中的欧几里得度量的梯度流。这自然地在参数空间中诱导出两种改进的梯度下降流版本，一种适用于过参数化设置，另一种适用于欠参数化设置。在过参数化情况下，我们证明，只要秩条件成立，改进的梯度下降的所有轨道都以均匀指数收敛速率将 ${\mathcal L}^2$ 代价驱动到其全局最小值；因此，对于任何预先指定的接近全局最小值的程度，可以获得一个先验的停止时间。我们指出了后者与亚黎曼几何的关系。此外，我们将上述框架推广到秩条件不成立的情况；特别地，我们表明局部平衡只有在秩损失发生时才能存在，并且通常它们不是孤立点，而是参数空间中临界子流形的元素。

英文摘要

We consider the scenario of supervised learning in Deep Learning (DL) networks, and exploit the arbitrariness of choice in the Riemannian metric relative to which the gradient descent flow can be defined (a general fact of differential geometry). In the standard approach to DL, the gradient flow on the space of parameters (weights and biases) is defined with respect to the Euclidean metric. Here instead, we choose the gradient flow with respect to the Euclidean metric in the output layer of the DL network. This naturally induces two modified versions of the gradient descent flow in the parameter space, one adapted for the overparametrized setting, and the other for the underparametrized setting. In the overparametrized case, we prove that, provided that a rank condition holds, all orbits of the modified gradient descent drive the ${\mathcal L}^2$ cost to its global minimum at a uniform exponential convergence rate; one thereby obtains an a priori stopping time for any prescribed proximity to the global minimum. We point out relations of the latter to sub-Riemannian geometry. Moreover, we generalize the above framework to the situation in which the rank condition does not hold; in particular, we show that local equilibria can only exist if a rank loss occurs, and that generically, they are not isolated points, but elements of a critical submanifold of parameter space.
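
The mechanism behind the uniform exponential rate can be sketched in a few lines. The notation is chosen here for illustration and may differ from the paper's: $\theta \in \mathbb{R}^K$ collects weights and biases, $x(\theta) \in \mathbb{R}^{M}$ stacks the network outputs over the training inputs, $y$ stacks the targets, $D = \partial x / \partial \theta$ is the Jacobian, and the cost normalization $\mathcal{C} = \tfrac12 |x - y|^2$ is an assumption.

```latex
\begin{align}
\text{standard flow:}\quad
  \dot\theta &= -\nabla_\theta \mathcal{C} = -D^{T}(x - y)
  &&\Longrightarrow\quad \dot x = -D D^{T}(x - y),\\
\text{modified flow } (K \ge M,\ \operatorname{rank} D = M):\quad
  \dot\theta &= -D^{T}\left(D D^{T}\right)^{-1}(x - y)
  &&\Longrightarrow\quad \dot x = -(x - y),\\
\frac{d}{dt}\,\mathcal{C} &= -|x - y|^{2} = -2\,\mathcal{C}
  &&\Longrightarrow\quad \mathcal{C}(t) = e^{-2t}\,\mathcal{C}(0).
\end{align}
```

Under these assumptions the decay rate does not depend on the spectrum of $D D^{T}$, and $\mathcal{C}(t) \le \varepsilon$ holds for all $t \ge \tfrac12 \log\!\left(\mathcal{C}(0)/\varepsilon\right)$, which is the kind of a priori stopping time the abstract mentions. The inverse $(D D^{T})^{-1}$ exists only while the rank condition holds, which is why rank loss must be treated separately.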


2310.01285 2026-05-26 q-fin.CP cs.LG q-fin.MF stat.ML 版本更新

Automated regime classification in multidimensional time series data using sliced Wasserstein k-means clustering

多维时间序列数据中的自动状态（regime）分类：基于切片Wasserstein k-means聚类

Qinmeng Luan, James Hamp

发表机构 * Citigroup, London, UK（英国伦敦花旗集团）； Data Science Institute, London School of Economics, London, UK（英国伦敦政治经济学院数据科学研究所）

AI总结提出切片Wasserstein k-means聚类方法，通过将多维Wasserstein距离近似为切片Wasserstein距离，实现多维时间序列数据的自动状态（regime）分类，并在合成数据和真实外汇数据中验证有效性。

详情

DOI: 10.3934/DSFE.2025016
Journal ref: Data Science in Finance and Economics 2025, Volume 5, Issue 3: 387-418

AI中文摘要

最近的研究提出Wasserstein k-means（Wk-means）聚类作为对时间序列数据（特别是一维资产收益）进行状态（regime）分类的强大方法。本文首先详细研究应用于合成一维时间序列数据的Wasserstein k-means聚类算法的行为。我们通过详细研究聚类算法的动态以及超参数变化如何影响不同随机初始化下的性能，扩展了先前的工作。我们计算了若干简单的度量，发现这些度量有助于识别高质量的聚类。然后，我们将Wasserstein k-means聚类技术扩展到多维时间序列数据，通过将多维Wasserstein距离近似为切片Wasserstein距离，得到一种称为“切片Wasserstein k-means（sWk-means）聚类”的方法。我们将sWk-means聚类方法应用于多维时间序列数据中的自动状态分类问题，使用合成数据证明该方法的正确性和有效性。最后，我们以公开的外汇即期汇率数据作为案例研究，表明sWk-means方法能够识别真实多维金融时间序列中的不同市场状态。文末讨论了该方法的一些局限性以及潜在的补充或替代方法。

英文摘要

Recent work has proposed Wasserstein k-means (Wk-means) clustering as a powerful method to classify regimes in time series data, and one-dimensional asset returns in particular. In this paper, we begin by studying in detail the behaviour of the Wasserstein k-means clustering algorithm applied to synthetic one-dimensional time series data. We extend the previous work by studying, in detail, the dynamics of the clustering algorithm and how varying the hyperparameters impacts the performance over different random initialisations. We compute simple metrics that we find to be useful in identifying high-quality clusterings. We then extend the technique of Wasserstein k-means clustering to multidimensional time series data by approximating the multidimensional Wasserstein distance as a sliced Wasserstein distance, resulting in a method we call 'sliced Wasserstein k-means (sWk-means) clustering'. We apply the sWk-means clustering method to the problem of automated regime classification in multidimensional time series data, using synthetic data to demonstrate the validity and effectiveness of the approach. Finally, we show that the sWk-means method is able to identify distinct market regimes in real multidimensional financial time series, using publicly available foreign exchange spot rate data as a case study. We conclude with remarks about some limitations of our approach and potential complementary or alternative approaches.
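The sliced approximation mentioned in the abstract can be sketched as follows. This is an illustrative Monte Carlo estimator, not the authors' implementation: the function name, the number of projections, and the restriction to equal-sized samples are assumptions made to keep the example short.

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=64, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance between
    two equal-sized samples x and y, each of shape (n, d)."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Draw random unit directions on the sphere.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project onto each direction and sort: in one dimension, optimal
    # transport between equal-sized samples pairs up order statistics.
    x_proj = np.sort(x @ theta.T, axis=0)
    y_proj = np.sort(y @ theta.T, axis=0)
    # Average the 1-D costs over samples and over directions.
    return float(np.mean(np.abs(x_proj - y_proj) ** p) ** (1.0 / p))
```

In a k-means loop, a distance of this kind would replace the Euclidean distance between a window of returns and a cluster centroid. How centroids are updated is not covered by this sketch.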


2306.02216 2026-05-26 cs.LG cs.CV 版本更新

Forgettable Federated Linear Learning with Certified Data Unlearning

具有认证数据遗忘的可遗忘联邦线性学习

Ruinan Jin, Minghui Chen, Qiong Zhang, Xiaoxiao Li

发表机构 * University of British Columbia（不列颠哥伦比亚大学）； Vector Institute（向量研究所）； Renmin University of China（中国人民大学）

AI总结提出一种基于预训练模型线性近似的联邦遗忘框架，通过联邦线性训练实现高效、安全且可认证的客户端数据遗忘。

Comments IEEE Transactions on Neural Networks and Learning Systems

详情

DOI: 10.1109/TNNLS.2026.3683398
Journal ref: IEEE Transactions on Neural Networks and Learning Systems, Early Access, pp. 1-10, 2026

AI中文摘要

联邦学习（FL）能够在分布式客户端之间进行协作模型训练，同时保护用户隐私。最近，联邦遗忘（FU）的出现旨在解决“被遗忘权”问题，并在无需重新训练整个FL系统的情况下移除中毒或目标客户端的影响。然而，许多FU方法需要与保留或目标客户端通信，引入额外的安全风险，或存储历史模型，限制了其效率和实用性。此外，由于非线性模型及其训练动态的复杂性，大多数用于深度神经网络（DNN）的FU方法缺乏理论认证。在这项工作中，我们引入了可遗忘联邦线性学习，这是一个用于DNN的训练和遗忘框架。我们的方法使用预训练模型线性近似DNN，并通过联邦线性训练实现与原始网络相当的性能。我们进一步提出了一种经过认证、高效且安全的遗忘策略，使服务器能够在不进行额外客户端通信或存储的情况下移除目标客户端的影响。在从小型到大型数据集上使用卷积神经网络和现代基础模型进行的广泛实验表明，我们的方法在模型准确性和有效的目标客户端遗忘之间取得了平衡。这项工作为高效且可信的FU提供了一个实用的流程。代码：https://github.com/Nanboy-Ronan/2F2L-Federated-Unlearning

英文摘要

Federated Learning (FL) enables collaborative model training across distributed clients while preserving user privacy. Recently, Federated Unlearning (FU) has emerged to address the "right to be forgotten" and to remove the influence of poisoned or target clients without retraining the entire FL system. However, many FU methods require communication with retained or target clients, introduce additional security risks, or store historical models, limiting their efficiency and practicality. Moreover, most FU methods for deep neural networks (DNNs) lack theoretical certification due to the complexity of nonlinear models and their training dynamics. In this work, we introduce Forgettable Federated Linear Learning, a training and unlearning framework for DNNs. Our approach uses pre-trained models to linearly approximate DNNs and achieve performance comparable to the original networks through Federated Linear Training. We further present a certified, efficient, and secure unlearning strategy that enables the server to remove a target client's influence without additional client communication or storage. Extensive experiments on small- to large-scale datasets, using both convolutional neural networks and modern foundation models, show that our method balances model accuracy with effective target-client unlearning. This work provides a practical pipeline for efficient and trustworthy FU. Code: https://github.com/Nanboy-Ronan/2F2L-Federated-Unlearning


2605.24524 2026-05-26 cs.LG cs.CL q-bio.NC 版本更新

What Are We Actually Decoding? Source Attribution for Non-Invasive Brain-to-Language Retrieval

我们究竟在解码什么？非侵入式脑到语言检索的源归因

Xinyu Zhang, Sichao Liu, Runhao Lu, Alexandra Woolgar, Lihui Wang

发表机构 * KTH（瑞典皇家理工学院）； University of Cambridge（剑桥大学）； EPFL（洛桑联邦理工学院）； Karolinska Institutet（卡罗林斯卡学院）； McGill University（麦吉尔大学）

AI总结针对非侵入式神经语言解码中结果被非刺激诱发源（如解码器先验、嵌入度量、信号时长等）膨胀的问题，提出一个审计框架，通过结构捷径、窗口级刺激锁定证据和跨窗口上下文聚合三种源分离，并引入组上下文偏差（GCB）作为可控的源归因干预，实现性能的源归因而非仅报告。

Comments 35 pages, 7 figures, 25 tables

详情

AI中文摘要

在非侵入式神经语言解码中，结果可能被非刺激诱发的神经证据源膨胀：解码器先验、基于嵌入的度量以及非神经结构干扰（如信号时长）。因此，方法学挑战在于归因：当报告的性能提升可以追溯到特定源时，它才更具信息性。我们将刺激锁定的MEG到音频检索重新构建为一个审计框架，将表观性能分离为三个源——结构捷径、窗口级刺激锁定证据和跨窗口上下文聚合——并为每个源提供诊断。在变长解码下，信号盲的高斯噪声达到66.3%的Rank@1（R@1），但一旦强制执行固定时长窗口和刺激身份分割，其性能骤降至接近随机，从而隔离了结构泄漏。在这些控制下，固定窗口检索恢复了可测量的MEG-音频可区分性，而一个神谕句子桶诊断显示，95.7%的Top-1错误选择了错误的句子，将剩余瓶颈定位到句子级竞争。我们使用组上下文偏差（GCB）审计这一上下文源，这是一种推理时的加性logit偏差，它跨窗口汇集句子一致的证据，同时保持基础检索分数和候选池固定。作为分数空间干预，GCB使上下文源变得可测量：在相同固定设置下，Gwilliams上的R@1从44%变为52%，MOUS上从22%变为29%。在此设计下，GCB是可审计的：其效应在随机分组扰动下崩溃，并在局部证据在MEG中衰减或在EEG中接近随机时消失，支持其作为受控源归因干预的使用。这些结果表明，脑到语言性能应进行源归因，而不仅仅是报告。

英文摘要

In non-invasive neural language decoding, results can be inflated by sources that are not stimulus-evoked neural evidence: decoder priors, embedding-based metrics, and non-neural structural nuisances such as signal duration. The methodological challenge is therefore attribution: a reported gain is more informative when it can be traced to a specific source. We recast stimulus-locked MEG-to-audio retrieval as an auditing framework that separates apparent performance into three sources - structural shortcuts, window-level stimulus-locked evidence, and cross-window contextual aggregation - and provides a diagnostic for each. Signal-blind Gaussian noise reaches 66.3% Rank@1 (R@1) under variable-length decoding but collapses to near chance once fixed-duration windows and stimulus-identity splits are enforced, isolating structural leakage. Under these controls, fixed-window retrieval recovers measurable MEG-audio discriminability, while an oracle sentence-bucket diagnostic shows that 95.7% of Top-1 errors select the wrong sentence, localising the residual bottleneck to sentence-level competition. We audit this contextual source with Group Context Bias (GCB), an inference-time additive logit bias that pools sentence-consistent evidence across windows while leaving the base retrieval scores and candidate pool fixed. Used as a score-space intervention, GCB makes the contextual source measurable: R@1 shifts from 44% to 52% on Gwilliams and from 22% to 29% on MOUS under the same fixed setting. GCB is auditable under this design: its effect collapses under random-grouping perturbations and vanishes when local evidence is attenuated in MEG or is near chance in EEG, supporting its use as a controlled source-attribution intervention. These results suggest that brain-to-language performance should be source-attributed, not merely reported.


2605.24523 2026-05-26 cs.LG cs.CL q-bio.NC 版本更新

MindAlign: Bridging EEG, Vision, and Language for Zero-Shot Visual Decoding

MindAlign: 弥合脑电图、视觉和语言实现零样本视觉解码

Zexuan Chen, Sichao Liu, Runhao Lu, Huichao Qi, Alexandra Woolgar, Xi Vincent Wang, Lihui Wang

发表机构 * KTH, Sweden（瑞典皇家理工学院）； University of Cambridge, UK（剑桥大学）； EPFL, Switzerland（洛桑联邦理工学院）； McGill University, Canada（麦吉尔大学）； Karolinska Institutet, Sweden（卡罗林斯卡学院）

AI总结提出一种三模态对比学习框架MindAlign，通过对齐脑电图、图像和文本表示，在Things-EEG2零样本基准上实现54.1% Top-1和83.4% Top-5准确率，显著超越先前方法。

Comments 20 pages, 10 figures, 15 tables

详情

AI中文摘要

从大脑信号进行视觉解码是计算机视觉和神经科学交叉领域的关键挑战，需要连接神经表征和视觉计算模型的方法。我们提出了一种基于脑电图的视觉解码三模态对比框架，在统一潜在空间中对齐脑电图、视觉和文本表示。我们的方法采用两阶段设计。首先，我们通过无标签试次上的掩码重建预训练脑电图编码器，学习可稳健迁移到下游任务的时空规律。其次，我们通过对比学习联合对齐脑电图、图像和大语言模型生成的文本描述，其中文本监督作为语义正则化器，向共享空间注入语言结构，而不压倒主要的脑电图-图像信号。编码器集成了被试自适应、通道上的图注意力和时空卷积嵌入。在Things-EEG2 200路零样本基准上，我们的框架实现了54.1%的Top-1和83.4%的Top-5准确率，大幅超过最强先前基线（32.4%/64.0%），配对Wilcoxon检验证实其相对于所有被试内基线的提升具有显著性（p<0.01）。我们在Things-MEG上验证了泛化性。分析表明，紧凑的嵌入几何（CN-CLIP）优于大得多的骨干网络，且解码与视觉处理的既定神经生理学一致。这项工作是从非侵入性时间神经信号进行稳健、语义基础视觉解码的关键一步。源代码公开于https://github.com/anon-eeg/eeg_image_decoding。

英文摘要

Visual decoding from brain signals is a key challenge at the intersection of computer vision and neuroscience, requiring methods that bridge neural representations and computational models of vision. We introduce a tri-modal contrastive framework for EEG-based visual decoding that aligns EEG, visual, and textual representations within a unified latent space. Our approach follows a two-stage design. First, we pre-train an EEG encoder via masked reconstruction on unlabeled trials, learning spatio-temporal regularities that transfer robustly to downstream tasks. Second, we jointly align EEG, image, and LLM-generated textual descriptions through contrastive learning, where text supervision acts as a semantic regularizer that injects linguistic structure into the shared space without overwhelming the primary EEG-image signal. The encoder integrates subject-specific adaptation, graph-attention over channels, and temporal-spatial convolutional embeddings. On the Things-EEG2 200-way zero-shot benchmark, our framework achieves 54.1% Top-1 and 83.4% Top-5 accuracy, substantially exceeding the strongest prior baseline (32.4% / 64.0%), with paired Wilcoxon tests confirming significance (p < 0.01) over all in-subject baselines. We validate generalization on Things-MEG. Analysis reveals that compact embedding geometries (CN-CLIP) outperform much larger backbones, and that decoding aligns with established neurophysiology of visual processing. This work is a critical step towards robust, semantically-grounded visual decoding from non-invasive temporal neural signals. The source code is publicly available in https://github.com/anon-eeg/eeg_image_decoding.
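The second-stage alignment can be illustrated with a symmetric InfoNCE loss. This is a minimal sketch, not the released code: the `text_weight` value, the temperature, and the choice to attach the text term to the EEG branch only are assumptions about how text could act as a down-weighted regularizer.

```python
import numpy as np

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE loss; row i of `a` is the positive for row i of `b`."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature

    def diag_cross_entropy(l):
        # Numerically stable log-softmax, then pick the matching pairs.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))

def tri_modal_loss(eeg, img, txt, text_weight=0.3):
    """EEG-image term plus a down-weighted EEG-text term (hypothetical weighting)."""
    return info_nce(eeg, img) + text_weight * info_nce(eeg, txt)
```

Keeping `text_weight` below 1 is one simple way to let the EEG-image term dominate, which matches the abstract's description of text as a regularizer rather than the primary signal.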


2605.24520 2026-05-26 q-bio.GN cs.LG 版本更新

AnnotateMissense: a genome-wide annotation and benchmarking framework for missense pathogenicity prediction

AnnotateMissense：一个用于错义致病性预测的全基因组注释和基准测试框架

Muhammad Muneeb, David B. Ascher

发表机构 * School of Chemistry and Molecular Biology（化学与分子生物学学院）； The University of Queensland（昆士兰大学）； Baker Heart and Diabetes Institute（贝克心脏病与糖尿病研究所）

AI总结提出AnnotateMissense框架，整合多种特征，通过XGBoost模型在ClinVar数据集上实现高精度错义变异致病性预测，并生成全基因组预测结果。

详情

AI中文摘要

错义变异解读仍然具有挑战性，因为致病性取决于来自群体频率、进化保守性、转录本背景、氨基酸替代严重性、先验致病性预测因子以及蛋白质语言模型衍生特征的异质性证据。我们提出了AnnotateMissense，一个用于错义变异解读的可扩展注释、基准测试和全基因组预测框架。AnnotateMissense整合了来自dbNSFP v5.1的hg38错义变异与ANNOVAR注释、dbNSFP转录本/蛋白质描述符、AlphaMissense评分、ESM衍生特征、保守性指标、群体频率变量、已建立的致病性预测因子以及工程化的氨基酸/密码子背景特征。使用132,714个ClinVar标记的错义变异，我们在受控特征配置下对机器学习和深度学习模型进行了基准测试。完整的303特征基准集在XGBoost上实现了最强性能，在分层五折交叉验证中平均MCC=0.9411，ROC-AUC=0.9950。受限的朴素和位置导向特征集分别达到了较低的MCC最佳值0.4989和0.5113。循环控制消融实验表明，移除先验预测因子、群体频率和临床重叠证据会降低性能，而单独排除AlphaMissense和ESM衍生特征影响最小。在新观察到的致病/良性变异上的时间ClinVar验证实现了MCC=0.7613，准确率=0.8798，F1分数=0.8750。最终模型应用于90,643,830个hg38错义变异，生成AnnotateMissense致病性评分和二元预测标签。代码和输出可在https://github.com/MuhammadMuneeb007/CAGI7_Annotate_All_Missense和https://doi.org/10.5281/zenodo.19981867获取。

英文摘要

Missense variant interpretation remains challenging because pathogenicity depends on heterogeneous evidence from population frequency, evolutionary conservation, transcript context, amino acid substitution severity, prior pathogenicity predictors and protein-language-model-derived features. We present AnnotateMissense, a scalable annotation, benchmarking and genome-wide prediction framework for missense variant interpretation. AnnotateMissense integrates hg38 missense variants derived from dbNSFP v5.1 with ANNOVAR annotations, dbNSFP transcript/protein descriptors, AlphaMissense scores, ESM-derived features, conservation metrics, population-frequency variables, established pathogenicity predictors and engineered amino acid/codon-context features. Using 132,714 ClinVar-labelled missense variants, we benchmarked machine-learning and deep-learning models under controlled feature configurations. The full 303-feature benchmark set achieved the strongest performance with XGBoost, reaching mean MCC = 0.9411 and ROC-AUC = 0.9950 across stratified five-fold cross-validation. Restricted naive and location-oriented feature sets achieved lower best MCC values of 0.4989 and 0.5113, respectively. Circularity-controlled ablations showed that removing prior-predictor, population-frequency and clinically overlapping evidence reduced performance, whereas excluding AlphaMissense and ESM-derived features alone had minimal effect. Temporal ClinVar validation on newly observed pathogenic/benign variants achieved MCC = 0.7613, accuracy = 0.8798 and F1-score = 0.8750. The final model was applied to 90,643,830 hg38 missense variants to generate AnnotateMissense pathogenicity scores and binary prediction labels. Code and outputs are available at https://github.com/MuhammadMuneeb007/CAGI7_Annotate_All_Missense and https://doi.org/10.5281/zenodo.19981867.
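Since the headline results are reported as MCC, it may help to recall how the Matthews correlation coefficient is computed from a binary confusion matrix. The helper below is a plain-Python sketch for illustration; the labels in the usage line are toy values, not ClinVar data.

```python
import math

def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels in {0, 1}; returns 0.0 when undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy example: one pathogenic variant missed out of four predictions.
score = matthews_corrcoef([1, 1, 0, 0], [1, 0, 0, 0])
```

Unlike accuracy, MCC uses all four confusion-matrix cells, so it stays informative when pathogenic and benign classes are imbalanced.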


2605.24517 2026-05-26 cs.LG cs.CL 版本更新

ECHO: Terminal Agents Learn World Models for Free

ECHO: 终端代理免费学习世界模型

Vaishnavi Shrivastava, Piero Kauffmann, Ahmed Awadallah, Dimitris Papailiopoulos

发表机构 * Microsoft Research（微软研究院）

AI总结提出ECHO混合目标，通过预测环境观测令牌将终端反馈转化为密集监督信号，显著提升CLI代理在TerminalBench-2.0上的性能。

详情

AI中文摘要

CLI代理是语言模型最接近具身环境的设置：模型发出命令，终端执行它们，返回的流（stdout、错误、文件、日志和跟踪）记录了后果。我们认为这个流是一个监督信号，但标准的代理强化学习丢弃了它：GRPO风格的训练使用稀疏的结果级奖励更新动作令牌，而忽略了rollout中已有的环境响应。失败的rollout尽管包含关于环境如何响应的丰富证据，但提供的策略梯度信号很少。我们引入了ECHO（环境交叉熵混合目标），这是一种混合目标，它将动作令牌上的标准策略梯度损失与辅助损失相结合，该辅助损失训练策略预测其自身动作产生的环境观测令牌。ECHO重用与GRPO相同的前向传播，不需要额外的rollout，并将终端反馈转化为所有rollout的密集监督。ECHO在TerminalBench-2.0上将GRPO的pass@1翻倍：Qwen3-8B从2.70%提升到5.17%，Qwen3-14B从5.17%提升到10.79%。ECHO还产生了更好地预测终端动态的策略，即使是在它们未生成的轨迹上：在保留的rollout中，它显著降低了环境令牌的交叉熵，而单独的GRPO几乎没有改变。从基础Qwen3-8B开始，ECHO在没有专家演示的情况下，在保留的终端任务上匹配了先专家SFT再GRPO的性能，并在TerminalBench-2.0上恢复了专家SFT初始化收益的约一半。在某些设置中，仅环境预测损失就能实现无验证器的自我改进，使策略仅通过与环境交互就能在未见过的OOD任务上改进。这些结果表明，环境观测不仅是未来动作的上下文，而且是每个rollout中已经存在的密集、在策略的监督信号。

英文摘要

CLI agents are the closest thing language models have to an embodied setting: the model emits commands, the terminal executes them, and the returned stream -- stdout, errors, files, logs, and traces -- records the consequences. We argue that this stream is a supervision signal, but standard agent RL discards it: GRPO-style training updates action tokens with sparse outcome-level rewards while ignoring environment responses already in the rollout. Failed rollouts provide little policy-gradient signal despite containing rich evidence about how the environment responds. We introduce ECHO (Environment Cross-entropy Hybrid Objective), a hybrid objective that combines the standard policy-gradient loss on action tokens with an auxiliary loss that trains the policy to predict environment observation tokens resulting from its own actions. ECHO reuses the same forward pass as GRPO, requires no additional rollouts, and turns terminal feedback into dense supervision for all rollouts. ECHO doubles GRPO pass@1 on TerminalBench-2.0: Qwen3-8B improves from 2.70% to 5.17%, and Qwen3-14B from 5.17% to 10.79%. ECHO also produces policies that better predict terminal dynamics, even on trajectories they did not generate: across held-out rollouts, it sharply reduces environment-token cross-entropy while GRPO alone barely changes it. From base Qwen3-8B, ECHO matches expert-SFT-then-GRPO performance on held-out terminal tasks without expert demonstrations, and recovers roughly half of the expert-SFT initialization benefit on TerminalBench-2.0. In some settings, the environment prediction loss alone enables verifier-free self-improvement, allowing policies to improve on unseen OOD tasks by learning only from environment interactions. Together, these results suggest that environment observations are not merely context for future actions, but a dense, on-policy supervision signal already present in every rollout.
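The hybrid objective can be sketched on a single rollout as below. This is a simplified illustration under stated assumptions: the mixing weight `lam`, the per-token mean normalization, and the plain (unclipped) policy-gradient term are hypothetical choices, and GRPO's ratio clipping is omitted.

```python
import numpy as np

def echo_loss(logp, advantages, is_action, lam=0.1):
    """Policy-gradient loss on action tokens plus cross-entropy on
    environment observation tokens, computed from one forward pass.

    logp:       (T,) log-probability of each realized token under the policy
    advantages: (T,) per-token advantage (e.g. a rollout-level value broadcast)
    is_action:  (T,) mask, True for action tokens, False for observation tokens
    """
    logp = np.asarray(logp, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    act = np.asarray(is_action, dtype=bool)
    obs = ~act
    # Standard policy-gradient surrogate on tokens the agent emitted.
    pg = -(advantages[act] * logp[act]).mean() if act.any() else 0.0
    # Auxiliary next-token loss on tokens the terminal returned.
    ce = -logp[obs].mean() if obs.any() else 0.0
    return pg + lam * ce
```

When a rollout has zero advantage, the first term vanishes but the second does not, which is the sense in which failed rollouts still supply a training signal.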


2605.24515 2026-05-26 cs.LG 版本更新

Lake Detection and Water Quality Estimation in Sentinel-2 Data

Sentinel-2 数据中的湖泊检测与水质估计

Iulia Pleşu, Alexandra Băicoianu, Ioana Cristina Plajer

发表机构 * Transilvania University of Brașov, Faculty of Mathematics and Computer Science（布拉索夫特兰西瓦尼亚大学数学与计算机科学学院）

AI总结本文比较了三种机器学习架构用于水体识别与监测，并提出了针对水质指数的有意义配色方案，以提高可解释性和决策支持。

详情

AI中文摘要

随着气候变化和人类对自然景观的压力增加，内陆水资源变得越来越稀缺、脆弱且难以可持续管理。因此，可靠且自动化的地表水体检测、监测和评估方法具有日益增长的科学和实践重要性。在本文中，我们研究并比较了三种不同的机器学习架构用于水体识别与监测。通过定量指标和实际案例评估其性能。此外，在代表性测试图像上与经典的 NDWI 阈值法进行直接比较，以突出数据驱动方法与基于指数方法之间的差异。这一分析使我们能够识别出在准确性、鲁棒性和实际适用性方面表现最佳的模型。除了检测之外，有意义的水质评估的一个主要挑战在于光谱水指数的一致且可解释的可视化。标准颜色映射技术通常不足或可能对环境应用产生误导。为弥补这一差距，我们提出了一套适用于水质指数的有意义配色方案，有助于人类用户更清晰地解释、比较和决策。

英文摘要

With climate change and increasing human pressure on natural landscapes, inland water resources are becoming progressively scarcer, more vulnerable, and more difficult to manage sustainably. Reliable and automated methods for detecting, monitoring, and assessing surface water bodies are therefore of growing scientific and practical importance. In this paper, we investigate and compare three distinct machine learning architectures for water body identification and monitoring. Their performance is evaluated through quantitative metrics and real-world examples. Furthermore, a direct comparison with classical NDWI thresholding is conducted on a representative test image to highlight differences between data-driven and index-based approaches. This analysis allows us to identify the best-performing model in terms of accuracy, robustness, and practical applicability. Beyond detection, a major challenge for meaningful water quality assessment lies in the consistent and interpretable visualization of spectral water indices. Standard color mapping techniques are often inadequate or potentially misleading for environmental applications. To address this gap, we propose a suite of meaningful color schemes adapted for water quality indices, facilitating clearer interpretation, comparison, and decision-making for human users.
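For reference, the classical NDWI baseline that the paper compares against can be sketched as follows. The example assumes the common green/near-infrared formulation, with Sentinel-2 bands B3 and B8 supplied as reflectance arrays. The threshold of 0.0 is a frequently used default rather than a value taken from the paper, and in practice it is scene-dependent.

```python
import numpy as np

def ndwi_water_mask(green, nir, threshold=0.0, eps=1e-12):
    """Return (ndwi, mask) where mask marks pixels classified as water.

    green: reflectance of the green band (Sentinel-2 B3)
    nir:   reflectance of the near-infrared band (Sentinel-2 B8)
    """
    green = np.asarray(green, dtype=float)
    nir = np.asarray(nir, dtype=float)
    # NDWI = (Green - NIR) / (Green + NIR); eps guards against division by zero.
    ndwi = (green - nir) / (green + nir + eps)
    return ndwi, ndwi > threshold

# Water reflects little NIR, vegetation reflects a lot.
ndwi, mask = ndwi_water_mask([0.10, 0.04], [0.02, 0.30])
```

A fixed threshold is what makes this baseline brittle across scenes, which is the gap the learned segmentation models are meant to close.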


2605.24513 2026-05-26 cs.LG 版本更新

Zeroth-Order Nonconvex Nonsmooth Optimization with Heavy-Tailed Noise

具有重尾噪声的零阶非凸非光滑优化

Zhuanghua Liu, Luo Luo

发表机构 * Zhuanghua Liu（刘庄华）； Luo Luo（罗洛）

AI总结针对目标函数Lipschitz连续的非凸非光滑问题，提出一种通过裁剪两点梯度估计器的在线到非凸转换框架的随机零阶算法，在重尾噪声下实现$(δ, ε)$-Goldstein驻点，其零阶复杂度为${\mathcal O}(d^{\frac{p}{2(p-1)}}δ^{-1}ε^{-\frac{2p-1}{p-1}})$，与已知最优结果一致。
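The clipped two-point estimator named in the summary can be sketched as below. This covers only the gradient estimator, not the online-to-nonconvex conversion. The sphere sampling, the scaling by dimension, and the norm-clipping rule are standard choices assumed here rather than details confirmed by the summary.

```python
import numpy as np

def clipped_two_point_grad(f, x, delta, clip, rng):
    """Two-point zeroth-order gradient estimate with norm clipping.

    f:     function returning a (possibly noisy) scalar value
    delta: smoothing radius
    clip:  maximum allowed norm of the returned estimate
    """
    d = x.size
    # Sample a direction uniformly on the unit sphere.
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)
    # Symmetric finite difference along u, scaled by the dimension.
    g = (d / (2.0 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u
    # Clipping bounds the influence of heavy-tailed noise in f.
    norm = np.linalg.norm(g)
    if norm > clip:
        g = g * (clip / norm)
    return g
```

Without clipping, a single heavy-tailed function evaluation can produce an arbitrarily large update; the clip threshold trades that variance for a small bias.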

详情

AI中文摘要

SPACE：统一对称与非对称路由问题的通用神经求解器

Rongsheng Chen, Changliang Zhou, Canhong Yu, Yuanyao Chen, Yu Zhou, Zhuo Chen, Zhenkun Wang

发表机构 * School of Automation and Intelligent Manufacturing, Southern University of Science and Technology, Shenzhen, China（自动化与智能制造学院，南方科技大学，深圳，中国）； Pengcheng Laboratory, Shenzhen, China（鹏城实验室，深圳，中国）； Guangdong Provincial Key Laboratory of Fully Actuated System Control Theory and Technology, Southern University of Science and Technology, Shenzhen, China（广东省全驱动系统控制理论与技术重点实验室，南方科技大学，深圳，中国）； College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China（计算机科学与软件工程学院，深圳大学，深圳，中国）

AI总结针对现有神经求解器在对称与非对称车辆路径问题中表现不一致的问题，提出基于空间枢轴对齐的无坐标嵌入框架SPACE，通过双向弗雷歇表示和权重解耦自适应解码机制，实现统一节点表示与解生成，在110个变体上取得优异零样本泛化。

详情

AI中文摘要

通用神经路由求解器在利用统一模型解决多种车辆路径问题（VRPs）方面显示出巨大潜力。然而，现有求解器通常局限于对称设置，或在切换到非对称设置时由于输入不一致或固有结构差异而性能下降，这严重限制了它们在包含两种场景的实际应用中的实用性。为解决这一限制，我们基于每个节点到特定枢轴集的相对距离定义其空间位置，并进一步提出一种空间枢轴对齐的无坐标嵌入（SPACE）框架，该框架统一了对称和非对称VRP中的节点表示和解生成。具体而言，我们使用一种新颖的最远枢轴采样策略构建双向弗雷歇表示，以实现跨不同问题设置的不变节点表示。此外，我们引入了一种权重分解的自适应解码机制，将几何感知从问题表示中解耦，减轻约束决策对特定几何设置的过拟合。在110个VRP变体（包括55个对称问题及其非对称对应问题）上的大量实验表明，SPACE在对称和非对称VRP中均实现了有前景的零样本泛化。

英文摘要

Generalist neural routing solvers have shown great potential in solving diverse vehicle routing problems (VRPs) with a unified model. However, existing solvers are typically limited to symmetric settings or degrade in performance when switching to asymmetric settings due to input inconsistencies or inherent structural differences, substantially limiting their practicality in real-world scenarios that encompass both scenarios. To address this limitation, we define the spatial position of each node based on the relative distances to a specific set of pivots and further propose a Spatial Pivot-Aligned Coordinate-free Embedding (SPACE) framework that unifies node representation and solution generation across symmetric and asymmetric VRPs. Specifically, we construct a bidirectional Frechet representation using a novel furthest pivot sampling strategy to enable invariant node representations across distinct problem settings. Furthermore, we introduce a weight-decomposed adaptive decoding mechanism that decouples geometric perception from problem representations, mitigating the overfitting of constraint decisions to a specific geometry setting. Extensive experiments on 110 VRP variants, comprising 55 symmetric problems and their asymmetric counterparts, demonstrate that SPACE achieves promising zero-shot generalization in both symmetric and asymmetric VRPs.
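The coordinate-free idea can be illustrated with a greedy farthest-point pivot selection on a distance matrix, followed by a two-directional distance embedding. This is a hedged sketch: the function names, the greedy rule, and the concatenation of to-pivot and from-pivot distances are assumptions about how such a representation could be built, not the paper's exact construction.

```python
import numpy as np

def furthest_pivot_sampling(dist, k, start=0):
    """Greedily pick k pivots, each maximizing its distance from the
    pivots chosen so far. dist[i, j] is the travel cost from i to j."""
    pivots = [start]
    min_d = dist[start].copy()  # cost from the nearest chosen pivot to each node
    for _ in range(k - 1):
        nxt = int(np.argmax(min_d))
        pivots.append(nxt)
        min_d = np.minimum(min_d, dist[nxt])
    return pivots

def bidirectional_embedding(dist, pivots):
    """Describe each node by its costs to and from every pivot."""
    to_pivots = dist[:, pivots]        # node -> pivot
    from_pivots = dist[pivots, :].T    # pivot -> node
    return np.concatenate([to_pivots, from_pivots], axis=1)
```

For a symmetric instance the two halves of the embedding coincide, while for an asymmetric one they differ, so the same input format serves both settings.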


2605.24477 2026-05-26 cs.LG cs.IT math.IT math.ST stat.TH 版本更新

法律判决预测中的时间概念漂移：跨越乌克兰法院判决三个时期的神经基线

Volodymyr Ovcharov

AI总结通过微调四种Transformer编码器在乌克兰法院三个时期（战前、混合战争、全面入侵）的判决上，研究法律语言的时间漂移，发现前向性能严重下降（最多27.2个百分点），法律领域预训练不能提升绝对性能但能减轻漂移，时序持续学习可消除灾难性遗忘。

Comments 17 pages, 6 tables, 5 figures. Dataset: https://huggingface.co/datasets/overthelex/ukrainian-court-decisions

详情

AI中文摘要

法律NLP基准测试在随机分割的数据上评估模型，隐含假设法律语言是平稳的。我们通过微调四种Transformer编码器（XLM-RoBERTa的base和large版本及其法律领域变体）来检验这一假设，数据为按地缘政治事件划分为三个时期的乌克兰法院判决：战前（2008-2013）、混合战争（2014-2021）和全面入侵（2022-2026）。每个模型在一个时期上训练，并在所有三个时期上评估，产生一个3x3的跨时间泛化矩阵。我们得到四点发现。（1）前向退化严重：在战前数据上训练的模型应用于全面入侵时期判决时，宏F1最多下降27.2个百分点。（2）退化不对称：后向迁移（全面入侵到战前）比前向迁移稳健得多，与法律语言具有累加性的假设一致。（3）法律领域预训练（Legal-XLM-R）不提升绝对性能，但减小前向退化的幅度和不对称性。（4）按时间顺序的持续学习消除了通用XLM-R的灾难性遗忘：战前知识完全保留（+1.8至+6.2个百分点），而全面入侵时期的性能提升+16.5至+19.0个百分点；逆时序训练则导致严重遗忘。在瑞士判决预测数据上进行跨司法管辖区预训练可提升绝对性能，但不减小时间退化幅度，确认时间漂移是法律语言演化的内在属性。数据集（三个时期共428K判决）作为LEXTREME贡献公开可用。

英文摘要

Legal NLP benchmarks evaluate models on randomly split data, implicitly assuming that legal language is stationary. We test this assumption by fine-tuning four transformer encoders -- XLM-RoBERTa (base and large) and their legal-domain variants -- on Ukrainian court decisions from three temporal epochs defined by geopolitical disruptions: pre-war (2008-2013), hybrid war (2014-2021), and full-scale invasion (2022-2026). Each model is trained on one epoch and evaluated on all three, producing a 3x3 cross-temporal generalization matrix. Four findings emerge. (1) Forward degradation is severe: models trained on pre-war data lose up to 27.2 percentage points of macro-F1 when applied to full-scale invasion era decisions. (2) The degradation is asymmetric: backward transfer (full-scale to pre-war) is substantially more robust than forward transfer, consistent with the hypothesis that legal language is additive. (3) Legal-domain pretraining (Legal-XLM-R) does not improve absolute performance but reduces forward degradation magnitude and asymmetry. (4) Chronological continual learning eliminates catastrophic forgetting for general XLM-R: pre-war knowledge is fully retained (+1.8 to +6.2 pp) while full-scale performance gains +16.5 to +19.0 pp; reverse-chronological training causes severe forgetting. Cross-jurisdictional pretraining on Swiss Judgment Prediction data improves absolute performance but does not reduce temporal degradation magnitude, confirming that temporal drift is an intrinsic property of legal language evolution. The dataset (428K decisions across three epochs) is publicly available as a LEXTREME contribution.


2605.24449 2026-05-26 cs.RO cs.LG 版本更新

Vision-Guided Outdoor Flight and Obstacle Evasion via Reinforcement Learning

基于强化学习的视觉引导户外飞行与避障

Shiladitya Dutta, Aayush Gupta, Varun Saran, Avideh Zakhor

发表机构 * College of Engineering, Department of Electrical Engineering and Computer Science, University of California Berkeley（加州大学伯克利分校工程学院电气工程与计算机科学系）

AI总结提出一种基于立体视觉深度和视觉惯性里程计的传感器运动策略，通过强化学习和特权学习在仿真中训练，实现零样本迁移到未知户外环境和无人机平台进行自主避障导航。

Comments Published in IEEE Robotics and Automation Letters, vol 11, no 2. Presented at the IEEE International Conference on Robotics and Automation 2026

详情

DOI: 10.1109/LRA.2025.3641120

AI中文摘要

尽管四旋翼飞行器凭借其全向机动性拥有令人印象深刻的穿越能力，但在复杂环境中需要持续的人工操控限制了其在GNSS和遥测信号缺失场景中的应用。为此，我们提出了一种新颖的传感器运动策略，该策略使用立体视觉深度和视觉惯性里程计（VIO）在未知环境中自主穿越障碍物以到达目标点。该策略由一个预训练的自编码器作为感知前端，后接一个规划与控制LSTM网络，输出速度指令，可由现成的商用无人机执行。我们利用强化学习和特权学习范式，通过两阶段过程在仿真中训练该策略：1）以全局运动规划器生成的优化轨迹作为监督骨干进行初始训练；2）在课程环境中进一步微调。为弥合仿真到现实的差距，我们采用领域随机化和奖励塑造来创建对噪声和领域偏移具有鲁棒性的策略。在户外实验中，我们的方法成功实现了对训练中从未遇到的障碍环境和无人机平台的零样本迁移。

英文摘要

Although quadcopters boast impressive traversal capabilities enabled by their omnidirectional maneuverability, the need for continuous pilot control in complex environments impedes their application in GNSS and telemetry-denied scenarios. To this end, we propose a novel sensorimotor policy that uses stereo-vision depth and visual-inertial odometry (VIO) to autonomously navigate through obstacles in an unknown environment to reach a goal point. The policy is comprised of a pre-trained autoencoder as the perception head followed by a planning and control LSTM network which outputs velocity commands that can be followed by an off-the-shelf commercial drone. We leverage reinforcement and privileged learning paradigms to train the policy in simulation through a two-stage process: 1) initial training with optimal trajectories generated by a global motion planner acting as a supervisory backbone, 2) further fine-tuning in a curriculum environment. To bridge the sim-to-real gap, we employ domain randomization and reward shaping to create a policy that is both robust to noise and domain shift. In outdoor experiments, our approach achieves successful zero-shot transfer to both obstacle environments and a drone platform that were never encountered during training.


2605.24437 2026-05-26 cs.LG 版本更新

CAffNet: Hard Constraint-Affine Neural Networks

CAffNet: 硬约束仿射神经网络

Yang Zhao, Jungeun Lee, Jeong hwan Jeon, Sze Zheng Yong

发表机构 * Department of Mechanical and Industrial Engineering, Northeastern University, Boston, MA 02115 USA（东北大学机械与工业工程系）； Department of Electrical Engineering, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea（蔚山科学技术院电气工程系）

AI总结提出一种将任意基数的输入相关仿射约束硬嵌入前馈神经网络和Transformer的框架，通过可训练的约束仿射层实现联合优化并保持通用逼近性质。

详情

AI中文摘要

我们提出了一种新颖的框架，用于将硬约束满足嵌入神经网络（NN）架构中，特别是前馈神经网络和Transformer，约束为任意基数的输入相关仿射约束。传统的约束执行方法要么依赖于基于惩罚的软约束，无法保证满足性，要么依赖于训练后执行约束的后处理方法，可能导致次优性。我们在神经网络中引入了一个可训练的约束仿射（CAffine）层，得到CAffNet，它超越了通过固定正交或平行投影执行仿射约束的方式，并实现了与网络参数的联合优化。此外，我们对约束空间维度没有施加任何限制，并证明了我们的构造保持了神经网络的通用逼近性质，同时为所有输入提供了约束遵守的可证明保证。实验验证表明，在需要保证约束满足的各个领域中，性能稳健。

英文摘要

We present a novel framework for embedding hard constraint satisfaction into neural network (NN) architectures, specifically feedforward neural networks and transformers, with input-dependent affine constraints of arbitrary cardinality. Traditional constraint enforcement approaches either rely on penalty-based soft constraints, which offer no guarantee of satisfaction, or on post-processing methods that enforce constraints after the NN is trained, which may lead to suboptimality. We introduce a trainable constraint-affine (CAffine) layer into NNs, yielding CAffNet, which goes beyond enforcing affine constraints via fixed orthogonal or parallel projections and enables joint optimization with network parameters. Moreover, we impose no restrictions on the constraint space dimensions and establish that our construction preserves the universal approximation properties of NNs, while providing provable guarantees on constraint adherence for all inputs. Experimental validation demonstrates robust performance across diverse domains requiring guaranteed constraint satisfaction.
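To make the baseline concrete, the fixed orthogonal projection that the abstract says CAffNet goes beyond can be sketched as follows. This shows only that baseline, under the assumption that the constraint matrix has full row rank. The trainable CAffine layer itself is not reproduced here.

```python
import numpy as np

def project_onto_affine(y, A, b):
    """Orthogonal projection of y onto the set {z : A z = b}.

    Assumes A has full row rank so that A @ A.T is invertible.
    """
    residual = A @ y - b
    # Remove the component of y that violates the constraint.
    correction = A.T @ np.linalg.solve(A @ A.T, residual)
    return y - correction

# Force a 3-dimensional network output to sum to 1.
z = project_onto_affine(np.array([0.2, 0.5, 0.9]),
                        np.array([[1.0, 1.0, 1.0]]),
                        np.array([1.0]))
```

Because `A` and `b` may depend on the input, a layer of this kind would be re-evaluated per sample. The projection direction here is fixed by `A`, whereas the abstract describes making the enforcement itself trainable.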


2605.24436 2026-05-26 cs.MA cs.LG cs.RO 版本更新

A Reinforcement Learning Inspired Latent Yield Based Adaptive Algorithm Switching Mechanism

一种受强化学习启发的基于潜在收益的自适应算法切换机制

Jayprakash S. Nair, Jimson Mathew, Shivashankar B. Nair

发表机构 * Indian Institute of Technology Patna（印度理工学院巴特那分校）； Indian Institute of Technology Guwahati（印度理工学院古瓦哈提分校）

AI总结针对在线或动态环境中算法选择困难的问题，提出一种受强化学习启发的潜在收益方法，通过封装奖励和惩罚触发探索与利用，实现自适应算法切换，并在排序算法和机器人避障任务中验证了有效性。

Comments Accepted and published in the Proceedings of the 29th European Conference on Applications of Evolutionary Computation (EvoApplications 2026), held as part of EvoStar 2026, Toulouse, France, April 8 to 10, 2026. Lecture Notes in Computer Science (LNCS), Springer Nature Switzerland

详情

DOI: 10.1007/978-3-032-23604-3_8
Journal ref: Applications of Evolutionary Computation, EvoApplications 2026, LNCS, Springer Nature Switzerland, 2026

AI中文摘要

对于给定的问题实例，选择最合适的算法仍然是一项具有挑战性的任务，尤其是在问题特征随时间演变的在线或动态环境中。仅依赖瞬时性能指标可能导致反应性和不稳定的行为，通常会导致次优的算法切换。本文介绍了一种计算高效的方法，用于聚合算法在多个问题实例上的性能，该方法对实例特征的剧烈变化具有相当的免疫性。受强化学习（RL）固有特征的启发，该技术将奖励和惩罚封装到一个潜在收益中，进而触发利用和探索，从而产生自适应算法切换。所提出的技术采用受遗传算法启发的岛屿模型，以促进并行探索和算法种群之间的性能交换，这些算法种群栖息在局部库中。在排序算法和机器人避障任务上的实验评估证明了该方法的可行性和有效性，突显了其在自适应算法选择至关重要的领域中的潜力。

英文摘要

Selecting the most suitable algorithm for a given problem instance remains a challenging task, particularly in online or dynamic environments where problem characteristics evolve over time. Relying solely on instantaneous performance metrics can result in a reactive and unstable behaviour, often leading to suboptimal algorithm switching. This paper introduces a computationally efficient approach for aggregating an algorithm's performance across multiple problem instances that is fairly immune to erratic variations in instance features. Inspired by features inherent to Reinforcement Learning (RL), this technique encapsulates rewards and penalties into a latent yield that, in turn, triggers exploitation and exploration, consequently resulting in adaptive algorithm switching. The proposed technique employs island models, inspired by Genetic Algorithms, to facilitate parallel exploration and performance exchanges among algorithm populations inhabiting local repertoires. Experimental evaluations on sorting algorithms and robotic obstacle avoidance tasks demonstrate the feasibility and effectiveness of the approach, highlighting its potential in domains where adaptive algorithm selection is critical.


2605.24433 2026-05-26 cs.RO cs.LG 版本更新

Smoother Action Chunking Flow Policy via Prior-Corrected Orthogonal Trust-Region Guidance

基于先验校正的正交信任区域引导的平滑动作块流策略

Kai Fang, Hailong Pei, Xuemin Chi

发表机构 * South China University of Technology（华南理工大学）； Zhejiang University（浙江大学）

AI总结提出POTR方法，通过先验校正权重和正交信任区域约束，改善流匹配机器人策略中动作块推理的边界不连续性和横向扰动，提升成功率和运动平滑性。

详情

AI中文摘要

流匹配机器人策略通常使用动作块推理进行高效的闭环控制，但块边界可能引入不连续的动作转换。现有的RTC引导通过在去噪过程中注入校正信号来改善连续性，但其权重调度在中间时间步较弱，且无约束的校正方向可能引入横向扰动。我们提出POTR，一种先验校正的正交信任区域引导方法。首先，我们将数据先验尺度$σ_d$纳入RTC引导权重，产生更强的中间时间校正。其次，我们将引导向量分解为与去噪速度平行和垂直的分量，并将垂直分量约束在信任区域内。在LIBERO上使用$π_{0.5}$，与RTC相比，POTR提高了成功率，并持续减少了块边界不连续性、加速度和加加速度。消融实验表明，先验校正权重提供了主要的校正增益，而正交信任区域进一步提高了稳定性。

英文摘要

Flow-matching robot policies commonly use action-chunking inference for efficient closed-loop control, but chunk boundaries can introduce discontinuous action transitions. Existing RTC guidance improves continuity by injecting correction signals during denoising, yet its weight schedule is weak at intermediate timesteps and its unconstrained correction direction may introduce transverse perturbations. We propose POTR, a prior-corrected orthogonal trust-region guidance method. First, we incorporate a data-prior scale $σ_d$ into the RTC guidance weight, yielding stronger intermediate-time correction. Second, we decompose the guidance vector into components parallel and perpendicular to the denoising velocity, and constrain the perpendicular component within a trust region. On LIBERO with $π_{0.5}$, POTR improves success rate and consistently reduces chunk-boundary discontinuity, acceleration, and jerk compared with RTC. Ablations show that the prior-corrected weight provides the main correction gain, while the orthogonal trust region further improves stability.
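The second ingredient, the orthogonal trust region, can be sketched as a decomposition followed by a norm clip. This is an illustrative reading of the abstract: the `radius` parameter and the choice of clipping (rather than some other constraint) for the perpendicular component are assumptions.

```python
import numpy as np

def trust_region_guidance(g, v, radius, eps=1e-12):
    """Split guidance g into parts parallel and perpendicular to the
    denoising velocity v, and limit the perpendicular part to `radius`."""
    v_hat = v / (np.linalg.norm(v) + eps)
    g_par = np.dot(g, v_hat) * v_hat      # along the denoising direction
    g_perp = g - g_par                    # transverse perturbation
    n = np.linalg.norm(g_perp)
    if n > radius:
        g_perp = g_perp * (radius / n)    # shrink onto the trust region
    return g_par + g_perp

# Velocity along x; the y-component of the guidance is capped at 1.
corrected = trust_region_guidance(np.array([2.0, 3.0]), np.array([1.0, 0.0]), radius=1.0)
```

The parallel component passes through unchanged, so the correction toward the previous chunk is preserved while sideways drift is bounded.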

2605.24428 2026-05-26 cs.LG (updated version)

Representation-Guided Discrete Molecular Graph Retrosynthesis

Jiahai Huang, Anjie Qiao, Zhen Wang, Defu Lian, Yutong Lu

Affiliations: Sun Yat-sen University; University of Science and Technology of China

AI summary: Proposes GRG, a representation-guided method for molecular graph retrosynthesis that injects chemical semantics from a pretrained encoder into a diffusion model. It reaches 58.6 / 77.2 / 83.4 / 87.1 top-1 / 3 / 5 / 10 accuracy on USPTO-50k, raises diversity to 15.5, and needs 35% fewer epochs and 30% less wall-clock time to reach comparable performance.

Abstract

Stochastic process-based molecular graph generators have become the state of the art for template-free single-step retrosynthesis. However, these models are typically trained only on product-reactant pairs, thereby acquiring chemistry-relevant representations in an indirect and implicit manner. Meanwhile, recent advances in computer vision demonstrate that offering representation guidance to a generator can effectively distill semantics from pretrained encoders into DiTs, substantially improving both convergence and generation quality. Whether similar gains extend to the retrosynthesis task, and what graph-specific design choices can make them work, remains an open question. To address these questions, we conduct a systematic empirical study over a unified design space spanning teacher molecular representations, endpoint and granularity choices, injection depths in the denoiser, correspondence strategies and guidance scheme. Guided by these considerations, we develop Graph-oriented Representation Guidance (GRG), which achieves 58.6 / 77.2 / 83.4 / 87.1 top-1 / 3 / 5 / 10 accuracy on USPTO-50k, while increasing diversity to 15.5, both substantially outperforming the adopted base generator. Notably, GRG consistently improves all top-k metrics in out-of-distribution settings, suggesting that representation guidance facilitates the acquisition of intrinsic chemical semantics. Meanwhile, the introduced representation guidance reduces the number of epochs by 35% and the wall-clock time by 30% to reach comparable performance. In addition, we introduce a simple yet effective representation-similarity-based reranking mechanism, which further improves the top of the ranked list without training an additional verifier.

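2605.24425 2026-05-26 cs.LG cs.AI cs.CL (updated version)

Momentum Streams for Optimizer-Inspired Transformers

Jingchu Gai, Nai-Chieh Huang, Jiayun Wu

Affiliations: Carnegie Mellon University

AI summary: Proposes a family of optimizer-inspired Transformers (such as the triple-momentum TMMFormer) that interpret residual updates as optimizer steps, and finds that momentum is the key to the performance gains, leading to flatter minima, less forgetting, and better generalization.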

2605.24422 2026-05-26 stat.ML cs.LG (updated version)

Clustering based on Stochastic Dominance with application for risk averters and risk seekers

Hua Li, Xue Jia, Yilin Kang, Wing-Keung Wong

Affiliations: School of Science, Changchun University, Changchun, China; School of Mathematics and Science, Northeast Normal University, Changchun, China

AI summary: To address the inability of traditional clustering methods to capture risk-dominance relationships among assets, proposes a clustering framework based on stochastic dominance test statistics. It constructs a stochastic dominance coefficient matrix and modifies the K-means and hierarchical clustering algorithms to support customized asset allocation for investors with different risk preferences.

Abstract

Stochastic Dominance (SD) theory provides a rigorous framework for selecting superior assets tailored to the asset allocation needs of investors with varying risk preferences (i.e., risk-averse, risk-seeking, and risk-neutral). However, traditional stock clustering methods typically rely on geometric metrics such as Euclidean distance, which often fail to effectively capture the intrinsic risk dominance relationships among assets. To address this limitation, this paper proposes an innovative clustering analysis framework based on SD test statistics. Methodologically, this study deeply integrates SD theory with machine learning algorithms. Transcending the limitations of traditional reliance on geometric distance, we innovatively utilize test statistics from first-, second-, and third-order SD to construct a "Stochastic Dominance Coefficient Matrix." Building upon this matrix, we modify the classic K-means and Hierarchical Clustering algorithms. Specifically, we derive 12 distinct algorithm variants tailored to different orders of SD relationships. Simultaneously, we construct the SD-SC coefficient and the SD-DBI index as specialized validity indices to evaluate the clustering performance. Empirically, we analyze constituent stock data from a representative developed market (the US NASDAQ Index) and an emerging market (China's CSI 100 Index). The results verify the effectiveness and robustness of the proposed method. Furthermore, we apply the clustering results to the modification of the Single Index Model and the construction of Global Minimum Variance Portfolios (GMVP). The findings demonstrate that the proposed method effectively facilitates customized asset allocation for investors, holding significant theoretical value and practical implications.
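
To make the dominance relation concrete, here is a minimal sketch that checks first- and second-order stochastic dominance between two return samples by comparing their empirical CDFs. It is a plain empirical comparison, not the SD test statistics the paper uses, and it covers only the ascending (risk-averter) orders; the function names `ecdf` and `dominates` are hypothetical.

```python
from bisect import bisect_right

def ecdf(sample, grid):
    """Empirical CDF of `sample` evaluated at each point of `grid`."""
    ordered = sorted(sample)
    return [bisect_right(ordered, x) / len(ordered) for x in grid]

def integrate(values, grid):
    """Running integral of a right-continuous step function over `grid`."""
    total, out = 0.0, [0.0]
    for i in range(1, len(grid)):
        total += values[i - 1] * (grid[i] - grid[i - 1])
        out.append(total)
    return out

def dominates(a, b, order=1, tol=1e-12):
    """True if sample `a` empirically dominates `b` at SD order 1 or 2."""
    grid = sorted(set(a) | set(b))
    fa, fb = ecdf(a, grid), ecdf(b, grid)
    if order == 2:
        # Second order compares the integrated CDFs instead of the CDFs.
        fa, fb = integrate(fa, grid), integrate(fb, grid)
    diffs = [x - y for x, y in zip(fa, fb)]
    # Dominance: a's curve is never above b's and is strictly below somewhere.
    return all(d <= tol for d in diffs) and any(d < -tol for d in diffs)

shifted_up = dominates([2, 3, 4], [1, 2, 3], order=1)   # same shape, higher returns
safer_same_mean = dominates([2, 2], [1, 3], order=2)    # same mean, lower risk
```

A pairwise matrix of such relations (or of the corresponding test statistics) is the kind of object a dominance-based clustering could operate on in place of Euclidean distances.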

2605.24421 2026-05-26 cs.CR cs.LG (updated version)

Poisoning the Watchtower: Prompt Injection Attacks Against LLM-Augmented Security Operations Through Adversarial Log Content

Rohan Pandey, Archit Bhujang

Affiliations: DigitalOcean; Arizona State University

AI summary: Studies how attacker-controlled log fields can inject instructions into an LLM (log-substrate prompt injection), proposes a four-class attack taxonomy, and evaluates attack success rates under different defenses.

Comments: 10 pages

Abstract

Large language models (LLMs) are increasingly used as analyst assistants in security operations centers (SOCs), where they ingest log and alert data to produce triage labels, incident summaries, or remediation advice. We study a structural failure mode of this design: many log fields are attacker controlled. User agents, URLs, payloads, DNS queries, and attempted usernames can therefore carry instructions to the model alongside evidence of the intrusion. We call this setting log-substrate prompt injection. We introduce a four-class taxonomy of log-substrate attacks: direct override (S1), persona hijack (S2), context manipulation (S3), and obfuscated payloads (S4). We evaluate 48 strategy-defense-task combinations using gpt-4o-mini as the analyst. Three findings stand out. First, direct overrides are ineffective in our setting: all S1 classification attacks achieve 0% suppression. In contrast, persona hijacks suppress 68% of malicious logs under a naive classifier and remain effective under stronger defenses. Second, summarization is the highest-risk task: context manipulation reaches 96% injection success without defenses and 38% even with constrained output. Third, defenses reduce but do not eliminate the attack surface: average injection success falls from 26.6% under naive prompting to 11.8% under our strongest defense. We also compare empirical results to a deterministic mock analyst and find that simulation substantially mispredicts current model behavior, especially for direct overrides. These results suggest that SOC copilots should treat raw log content as adversarial input rather than ordinary analyst context.

2605.24420 2026-05-26 cs.LG cs.AI (updated version)

Batch Normalization Amplifies Memorization and Privacy Risks

Ngoc Phu Doan, Chongyan Gu, Ihsen Alouani

Affiliations: Queen's University Belfast

AI summary: Through empirical and theoretical analysis, finds that batch normalization layers substantially increase a model's memorization of outlier samples and thereby raise the risk of privacy leakage.

Abstract

Batch Normalization (BN) is widely adopted to enable faster convergence and more stable training of deep neural networks. However, its impact on privacy and memorization has remained largely unexplored. In this work, we investigate the effect of BN layers on the memorization of atypical or outlier samples and its implications for privacy leakage. We conduct an extensive empirical study using three complementary approaches: (i) unintended memorization of out-of-distribution training samples, (ii) per-sample influence measured via gradient norms, and (iii) susceptibility to membership inference attacks (MIA). Across multiple datasets and architectures, we consistently observe that BN substantially increases the memorization of outliers compared to models without BN. Critically, this amplified memorization translates directly into privacy vulnerabilities: models with BN exhibit significantly higher susceptibility to MIAs. We complement our empirical findings with a theoretical analysis showing that BN amplifies the per-step influence of outlier samples during training, providing mechanistic insight into this phenomenon. Our results highlight an underappreciated privacy risk associated with BN and provide both practical and theoretical insights into how normalization layers can amplify the influence of rare or sensitive training examples.
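
Approach (ii) above, per-sample influence measured via gradient norms, can be illustrated on a toy model. The sketch below uses plain logistic regression, which is an assumption made to keep the example self-contained; it does not involve BN layers or the paper's architectures, and the function name `per_sample_grad_norm` is hypothetical.

```python
import math

def per_sample_grad_norm(weights, features, label):
    """L2 norm of the logistic-loss gradient w.r.t. the weights for one sample."""
    logit = sum(w * x for w, x in zip(weights, features))
    prob = 1.0 / (1.0 + math.exp(-logit))
    # For binary cross-entropy, d(loss)/d(weights) = (prob - label) * features.
    return abs(prob - label) * math.sqrt(sum(x * x for x in features))

weights = [1.0, -0.5]
# Same input, two labels: the sample the model already fits vs. an atypical one.
typical = per_sample_grad_norm(weights, [2.0, 0.0], label=1)
outlier = per_sample_grad_norm(weights, [2.0, 0.0], label=0)
```

The atypical sample produces the larger gradient norm, which is why this quantity is a natural probe for how strongly individual training points influence an update.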

2605.24418 2026-05-26 cs.LG (updated version)

ChainLearn: A Blockchain-Based Capacity-Aware Framework for Federated Ensemble Learning

Karan Sharma, Aditya Tripathi, Rahul Mishra, Tapas Kumar Maiti

Affiliations: EdddTri

AI summary: Addresses the failure of standard federated learning when hospitals have heterogeneous compute resources by proposing capacity-aware coordination: a blockchain separates on-chain policy from off-chain learning, each hospital is assigned a suitable architecture, and predictions are combined by a weighted ensemble, reducing communication overhead while keeping accuracy and calibration error competitive.

Comments: 10 pages, 7 figures, 11 tables. IEEE conference format. Code: https://github.com/EdddTri/ChainLearn

Abstract

Federated learning is used in medical imaging where privacy prohibits centralizing data. Standard federated algorithms assume homogeneous hardware, identical architectures, and centralized aggregation, which fails when hospitals have unequal compute resources. We propose capacity-aware coordination: measure each hospital's throughput, assign capacity-appropriate architectures (MobileNetV3-Small, EfficientNet-B0, ResNet-50), and combine predictions via weighted ensemble. Weak and strong hospitals can participate without forcing uniform architectures. We separate on-chain policy from off-chain learning. A Solidity contract stores hospital registration, benchmark hashes, metrics, and weights. Hospitals train locally and submit only hashes and scalars (not parameters). Weighted ensemble inference is computed off-chain. Experiments on PneumoniaMNIST and DermaMNIST (5 seeds, 3 non-IID levels) show our method achieves lower or equal calibration error versus equal-weight ensemble and competitive accuracy versus FedAvg, FedProx, and FedMD. Communication overhead is 224 bytes per round, a reduction of over 912,000x compared to FedAvg.
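
The two off-chain pieces described above, a weighted ensemble over per-hospital predictions and a submission that carries only a hash and scalars, can be sketched as follows. This is an illustrative sketch and not the ChainLearn code: the weights, the record fields, and the function names `weighted_ensemble` and `submission` are assumptions, and the payload size here is not meant to reproduce the 224 bytes reported in the abstract.

```python
import hashlib
import json

def weighted_ensemble(probabilities, weights):
    """Combine per-hospital class-probability vectors with normalised weights."""
    total = sum(weights)
    n_classes = len(probabilities[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]

def submission(model_bytes, accuracy):
    """Build the small record a hospital would submit: a hash and a scalar metric."""
    return {
        "model_hash": hashlib.sha256(model_bytes).hexdigest(),
        "accuracy": accuracy,
    }

# Three hospitals with different (hypothetical) ensemble weights.
hospital_probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
hospital_weights = [0.5, 0.3, 0.2]
ensemble = weighted_ensemble(hospital_probs, hospital_weights)

# Only the digest and the metric leave the hospital, never the parameters.
record = submission(b"placeholder-weights", accuracy=0.91)
payload_size = len(json.dumps(record).encode("utf-8"))
```

Because the record's size does not depend on the model's parameter count, the per-round traffic stays constant whichever architecture a hospital is assigned.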

2605.24417 2026-05-26 cs.LG (updated version)

LLMTabBench: Evaluating LLMs on Binary Tabular Classification From Zero to Few Shots

Daria Grushina, Kseniia Kuvshinova, Alina Kostromina, Aziz Temirkhanov, Mile Mitrovic, Dmitry Simakov

Affiliations: Sb AI Lab, HSE, NES; Sb AI Lab, HSE; Sb AI Lab, HSE University; Sb AI Lab

AI summary: Introduces the LLMTabBench benchmark, which systematically evaluates, for LLM tabular classification under data-scarce conditions, how prior knowledge interacts with in-context information (task descriptions and few-shot examples) and how performance scales with data complexity.

Abstract

Supervised classification for tabular data remains a core machine learning task, yet its reliance on large labeled datasets limits applicability in data-scarce domains. For such few-shot scenarios, specialized methods like TabPFN - a state-of-the-art Prior-Data Fitted Network - have set a high standard by leveraging large-scale synthetic pretraining, though they still require a context of labeled examples to function. In contrast, Large Language Models (LLMs) could offer a more flexible alternative via zero- and few-shot in-context learning directly from task descriptions, but their performance on tabular data remains inconsistent and poorly understood. We introduce LLMTabBench, a benchmark designed to systematically evaluate LLMs for tabular classification under data-scarce conditions. LLMTabBench explicitly probes (i) how LLM prior knowledge interacts with in-context information (task descriptions and few-shot examples), and (ii) how model performance scales with increasing data complexity, using both real-world and controlled synthetic datasets. Our findings include: (1) LLMs are highly competitive in zero-shot settings and can outperform alternative models, even when those models have access to few-shot examples; (2) incorporating additional few-shot examples can conflict with LLM prior knowledge, limiting or even degrading performance; and (3) there is a data complexity threshold beyond which LLMs' performance declines and few-shot examples become less effective. Together, these findings reveal fundamental constraints of in-context learning for tabular data and provide practical guidance for deploying LLMs in low-data regimes.

2605.24416 2026-05-26 cs.LG (updated version)

Generative OOD-Regularized Model-Based Policy Optimization

Aysin Tumay, Jiahe Huang, Elise Jortberg, Rose Yu

Affiliations: University of California, San Diego; Abiomed

AI summary: Proposes GORMPO, which uses generative density estimation to restrict policy updates to high-density regions of a sparse state-action space, addressing out-of-distribution actions in offline reinforcement learning. It outperforms baselines on a real-world medical dataset and improves on the base model on offline RL datasets.

Abstract

We study sequential decision-making with offline reinforcement learning (RL). Traditional offline RL policies may result in out-of-distribution (OOD) actions when training relies only on sparse offline representations. To ensure safe offline policies in a sparse state-action space, we explore how density estimation models can be integrated into model-based RL methods to avoid the OOD regions. Generative models are capable of explicitly modeling the density in sparse state-action spaces. Building on this, we introduce Generative OOD-regularized Model-based Policy Optimization (GORMPO), a density-regularized offline RL algorithm that uses generative density modeling to restrict policy updates to high-density areas of the dataset. Furthermore, we examine whether better OOD detection corresponds to better model-based offline policies. We compare (1) the OOD detection capabilities of various density estimators and (2) their performance within the GORMPO framework on a real-world medical dataset and sparse offline RL datasets. We theoretically guarantee GORMPO's performance under mild assumptions. Empirically, GORMPO outperforms state-of-the-art baselines by 17% on a real-world medical dataset and enhances the base model on the offline RL datasets. Our empirical findings show that better OOD detection generally results in improved policies in environments with stable dynamics, while conservative penalties with poor density estimation are favored when dynamics are uncertain.

2605.24395 2026-05-26 cs.LG (updated version)

AvAtar: Learning to Align via Active Optimal Transport

Qi Yu, Ruizhong Qiu, Zhichen Zeng, My T. Thai, Huan Liu, Hanghang Tong

Affiliations: University of Illinois Urbana-Champaign; University of Florida; Arizona State University

AI summary: Proposes AvAtar, an active-learning framework that quantifies each candidate's informativeness by its gradient-based impact propagated through entropy-regularized optimal transport, and computes it efficiently with the adjoint-state method to improve alignment performance.

Comments: Published as a conference paper at ICML 2026

Abstract

Alignment plays a fundamental role in many machine learning problems, such as multi-network analysis, multimodal learning, and point cloud registration. Recent works increasingly leverage optimal transport (OT) for distributional alignment, whose effectiveness largely depends on sparse supervision that is hard or costly to obtain in practice. Existing works, however, largely overlook how to actively acquire high-quality supervision to improve their alignment performance under OT frameworks. In this paper, we propose a principled active alignment framework for optimal transport alignment called AvAtar. We quantify the informativeness of a candidate by measuring its gradient-based impact on the global alignment result, computed as the gradient propagation from the global alignment result to all possible supervisions of the candidate through the entropy-regularized OT formulation. While differentiating through OT is challenging given its constrained nature, we leverage the adjoint-state method to reformulate the computation to a linear system solvable by the conjugate gradient method with linear complexity and guaranteed convergence. By encoding the global alignment result via effective utility functions, AvAtar is applicable to general alignment problems under the OT framework. Extensive experiments on three representative alignment tasks demonstrate the effectiveness, scalability, and generalizability of the proposed AvAtar.

2605.24390 2026-05-26 cs.LG (updated version)

Learning Laplacian Eigenspace with Mass-Aware Neural Operators on Point Clouds

Zherui Yang, Tao Du, Ligang Liu

Affiliations: University of Science and Technology of China; Tsinghua University; Shanghai Qi Zhi Institute; Laoshan Laboratory

AI summary: Proposes the Neural Eigenspace Operator (NEO), which predicts a stable, invariant low-frequency subspace rather than individual eigenvectors and combines a mass-aware neural operator with Rayleigh-Ritz refinement for fast spectral decomposition of the Laplace-Beltrami operator on point clouds.

DOI: 10.1145/3799902.3811185

Abstract

The eigendecomposition of the Laplace-Beltrami Operator (LBO) is fundamental to geometric analysis, yet computing its low-frequency eigenmodes remains a significant bottleneck due to the high cost of iterative solvers on large-scale data. To amortize this cost, we introduce the Neural Eigenspace Operator (NEO), a feed-forward framework designed to predict the spectrum directly from point clouds. Crucially, NEO circumvents the ill-posed nature of standard eigenvector regression, which suffers from intrinsic sign flips and rotation ambiguities, by learning the stable, invariant low-frequency subspace instead. Specifically, the network predicts a redundant set of basis functions whose span robustly covers the target eigenspace, allowing for the recovery of accurate eigenpairs via a lightweight Rayleigh-Ritz refinement. To handle irregular sampling, we propose a mass-aware neural operator that incorporates per-point area weights into attention-based aggregation, improving robustness to non-uniform densities and enabling zero-shot generalization across resolutions. Our approach achieves near-linear runtime scaling and substantial wall-clock speedups over iterative solvers at comparable accuracy, and exhibits strong zero-shot transfer to high-resolution point clouds. The resulting eigenpairs support standard spectral geometry tasks, while the raw basis functions provide effective point-wise features for downstream learning. Code: https://github.com/Adversarr/NEO.

2605.24386 2026-05-26 quant-ph cond-mat.stat-mech cs.DS cs.LG (updated version)

Fermi-Dirac machines as quantizations of neurons

Alexander He, Nana Liu, Mark M. Wilde

Affiliations: Department of Physics, Cornell University; Institute of Natural Sciences, Shanghai Jiao Tong University; School of Mathematical Sciences, Shanghai Jiao Tong University; Ministry of Education Key Laboratory in Scientific and Engineering Computing, Shanghai Jiao Tong University; Global College, Shanghai Jiao Tong University; School of Electrical and Computer Engineering, Cornell University

AI summary: Reinterprets Fermi-Dirac machines as canonical quantizations of classical neurons, obtained by replacing classical variables with operators; develops efficient hybrid quantum-classical algorithms for evaluating and training the quantized neurons; and proves that a computational decision problem based on Fermi-Dirac neurons is BQP-complete.

Comments: 87 pages, 12 figures, 2 tables

Abstract

Fermi-Dirac machines were proposed recently as an approach to solving semidefinite optimization problems on quantum computers. Here, we reinterpret them as canonical quantizations of classical neurons. By viewing a classical neuron as an activation function applied to a parameterized classical Hamiltonian, we quantize this model by replacing classical variables with operators whose eigenvalues encode their possible values. This follows the standard approach to canonical quantization in quantum mechanics. Crucially, when the Hamiltonian consists of commuting operators, our construction reduces exactly to a classical neuron. More generally, our approach yields an activation observable, defined as an activation function applied to a parameterized quantum Hamiltonian. The output of this quantized neuron is a random variable with expectation value equal to that of the activation observable with respect to an input state. We develop efficient hybrid quantum-classical algorithms for evaluating outputs and gradients of our quantized neurons, enabling evaluation and training. These algorithms rely on basic primitives that include random sampling, Hamiltonian simulation, and the Hadamard test. We also quantize a whole host of other activation functions, including the smooth rectified linear unit (ReLU), sigmoid linear unit, Gaussian-smoothed ReLU, and Gaussian error linear unit (GeLU), which are known to be useful for deep learning applications. Numerical experiments indicate that neurons based on quantum Hamiltonians can learn functions that classical neurons cannot. We further define a computational decision problem based on Fermi-Dirac neurons and prove that it is BQP-complete, providing complexity-theoretic evidence against efficient classical simulation. Finally, we generalize our approach to continuous quantum variables and sketch two different ways of composing these neurons into networks.
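
The notion of an activation observable, an activation function applied to a Hamiltonian, can be made concrete with a small classical calculation. The sketch below evaluates the expectation of f(H) in a state psi for a real symmetric 2x2 matrix using its closed-form spectral decomposition. It is a toy illustration only, not the hybrid quantum-classical algorithm from the paper; the function names and the sign convention f(x) = 1 / (1 + e^x) are assumptions.

```python
import math

def fermi_dirac(x):
    """Fermi-Dirac function 1 / (1 + e^x); sign and temperature conventions are assumed."""
    return 1.0 / (1.0 + math.exp(x))

def activation_expectation(h, psi, activation=fermi_dirac):
    """Return <psi| f(H) |psi> for a real symmetric 2x2 matrix H and a real vector psi."""
    (a, b), (_, d) = h
    mean, half_gap = (a + d) / 2.0, math.hypot((a - d) / 2.0, b)
    norm = psi[0] ** 2 + psi[1] ** 2
    if half_gap == 0.0:
        # H is a multiple of the identity, so f(H) = f(mean) * I.
        return activation(mean) * norm
    lam_plus, lam_minus = mean + half_gap, mean - half_gap
    # <psi|H|psi> determines the weight of psi on each eigenvector.
    energy = a * psi[0] ** 2 + 2.0 * b * psi[0] * psi[1] + d * psi[1] ** 2
    weight_plus = (energy - lam_minus * norm) / (lam_plus - lam_minus)
    return activation(lam_plus) * weight_plus + activation(lam_minus) * (norm - weight_plus)

# Diagonal H with a basis-state input: the value is just f of a diagonal entry.
classical = activation_expectation([[0.3, 0.0], [0.0, -1.2]], [1.0, 0.0])
# Off-diagonal terms: the result differs from f applied to the mean energy <H> = 1.
quantum = activation_expectation([[1.0, 1.0], [1.0, -1.0]], [1.0, 0.0])
```

The diagonal case mirrors the statement that the construction reduces to a classical neuron when the Hamiltonian consists of commuting operators; once off-diagonal terms appear, the expectation of f(H) is no longer f applied to the expected energy.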

2605.24381 2026-05-26 cs.LG cs.AI stat.AP stat.ML (updated version)

Assessing the Operational Viability of Foundation Models for Time Series Forecasting

Kavin Soni, Debanshu Das, Vamshi Guduguntla

Affiliations: Google, USA

AI summary: Compares foundation models with supervised learning approaches across four operational regimes and proposes a Complexity Router, based on empirical features, to balance accuracy and efficiency.

Comments: 21 pages, 8 Figures, Code available at [https://github.com/kavin-soni/timeseries-zeroshot-eval]

Abstract

Time series forecasting drives operational decisions in areas like finance, transportation, and energy. While supervised learning approaches achieve strong performance, they require domain-specific training, feature engineering, and ongoing maintenance. Large-scale foundation models have recently emerged as a zero-shot alternative, avoiding task-specific training much like LLMs. In this work, we evaluate foundation models against standard supervised approaches. Rather than focusing solely on aggregate accuracy, we analyze performance across four operational regimes: periodic human-centric systems, physically constrained processes, stochastic financial markets, and heterogeneous demand forecasting. Our results characterize optimal deployment areas. Foundation models perform well in domains with transferable periodic structures and are efficient for cold-start or long-tail scenarios. Conversely, supervised specialists maintain higher precision in systems governed by strict physical constraints. In financial domains, newer foundation models are rapidly closing the performance gap with supervised specialists. We further quantify trade-offs in inference latency, data drift adaptability, and deployment constraints. Finally, we propose a Complexity Router that assigns each series to the optimal model class using empirical features. We demonstrate that this selective routing achieves higher accuracy and significantly lower inference costs compared to deploying a universal foundation model, providing a practical framework for balancing generalization and efficiency.

2605.24370 2026-05-26 cs.LG q-bio.QM (updated version)

GEESE: Genotype-aware End-to-End Spatio-temporal Embedding for Behavioral Phenotyping

Yiran Ding, Yuen Gao, Chunqi Qian, Zijun Cui

Affiliations: Department of Computer Science and Engineering; Department of Radiology

AI summary: Proposes GEESE, a framework that uses a pretrained time series foundation model to learn behavioral representations directly from 3D pose dynamics without hand-crafted features. It surpasses hand-crafted feature baselines on three autism-associated genetic models, and the authors also provide HONK, an interactive tool.

Abstract

Behavioral phenotyping of genetic animal models currently requires labor-intensive manual feature engineering that limits reproducibility and scalability. We present GEESE, an end-to-end deep learning framework that learns behavioral representations directly from 3D pose dynamics without hand-crafted features. Using a pretrained time series foundation model, we encode movement sequences into a behavioral manifold that supports both behavior classification and genotype prediction. Evaluated across three autism-associated genetic models (CNTNAP2, CHD8, FMR1), our deep learning approach surpasses hand-crafted feature baselines in both tasks, revealing that learned representations capture genotype-specific behavioral signatures. The framework generalizes across genetic backgrounds, and an all-cohort model identifies both genetic background and genotype from movement patterns alone. We further provide HONK, an interactive intelligent tool enabling researchers without programming expertise to perform behavioral phenotyping from pose data through natural language interaction.

2605.24367 2026-05-26 cs.CV cs.LG (updated version)

Gaussian Rank-Based Neighborhood Degree for Graph Neural Networks in Image Classification

Rafael Mendonça Duarte, Jean Roberto Ponciano, Lucas Pascotti Valem

Affiliations: Institute of Mathematics and Computer Science (ICMC); University of São Paulo (USP); São Carlos, SP, Brazil

AI summary: Proposes GRaNDe (Gaussian Rank-based Neighborhood Degree), which improves degree normalization in graph neural networks by combining neighborhood ranking with Gaussian distance weighting, yielding consistent accuracy gains on five public image classification datasets.

Abstract

The exponential growth of data has intensified the gap between the availability of unlabeled data and the high cost of manual annotation. Graph Neural Networks (GNNs) have emerged as a promising solution, as they exploit relational structures and learn from both labeled and unlabeled data, performing semi-supervised learning. A crucial component of many of these models is degree-based normalization, which influences message propagation but typically assumes uniform importance among neighboring nodes. In image classification, graphs are usually constructed from feature similarity, where treating all neighbors equally may overlook important variations in relevance. Motivated by this gap, we propose GRaNDe (Gaussian Rank-based Neighborhood Degree). This novel degree measure integrates neighborhood ranking with Gaussian distance weighting to better capture node importance. Experiments on five public image classification datasets show consistent accuracy improvements and competitive or superior results compared to state-of-the-art methods.

2605.24366 2026-05-26 cs.CL cs.LG 版本更新

Structure-Aware RAG: Structured Retrieval Augmented Generation from Noisy Data for Conversational Agents

结构感知检索增强生成：面向对话代理的噪声数据结构化检索增强生成

Kaiqiao Han, LuAn Tang, Renliang Sun, Peng Yuan, Wei Cheng, Haoyu Wang, Wei Wang, Yizhou Sun, Haifeng Chen

发表机构 * UCLA（加州大学洛杉矶分校）； NEC Labs（NEC实验室）

AI总结提出结构感知检索增强生成（SA-RAG），通过表格作为中间结构化表示来减少噪声并保留关键信息，结合质量感知的表格元数据生成框架和优化方法，在噪声真实数据集上显著优于现有RAG基线。

AI中文摘要

大型语言模型（LLM）已广泛应用于对话应用。然而，它们对参数化知识的依赖限制了在需要动态或领域特定信息的真实场景中的可靠性。检索增强生成（RAG）通过在生成过程中引入外部知识来解决这一限制，但现有的基于文本和基于图的RAG方法通常难以处理噪声或不相关的上下文。在这项工作中，我们提出了结构感知检索增强生成（SA-RAG），它使用表格作为中间结构化表示，提供紧凑且可控的接口，在减少噪声的同时保留关键信息。我们引入了一个质量感知的表格元数据生成框架，对元数据规范化和有效性进行建模，提高了元数据质量和下游性能。此外，我们探索了无训练和基于训练的表格生成方法。生成验证和直接偏好优化进一步提高了表格质量，同时保持了语义和结构一致性。在两个噪声真实数据集上的实验表明，SA-RAG显著优于现有的RAG基线。我们的代码已在公共仓库中公开。

英文摘要

Large Language Models (LLMs) have been widely adopted in conversational applications. However, their reliance on parametric knowledge limits reliability in real-world scenarios that require dynamic or domain-specific information. Retrieval-Augmented Generation (RAG) addresses this limitation by incorporating external knowledge during generation, but existing text-based and graph-based RAG methods often struggle with noisy or irrelevant contexts. In this work, we propose Structure-aware Retrieval Augmented Generation (SA-RAG), which uses tables as an intermediate structured representation to provide a compact and controllable interface that reduces noise while preserving essential information. We introduce a quality-aware table metadata generation framework that models metadata normalization and effectiveness, improving metadata quality and downstream performance. Furthermore, we explore both training-free and training-based table generation methods. Generation validation and direct preference optimization further improve table quality while maintaining semantic and structural consistency. Experiments on two noisy real-world datasets show that SA-RAG significantly outperforms existing RAG baselines. Our code is publicly available at a public repository.

2605.24364 2026-05-26 stat.ML cs.LG 版本更新

Multicalibration Boosting: Theory, Convergence, and Transferability

多校准提升：理论、收敛性和可迁移性

Hanxuan Ye, Hongzhe Li

发表机构 * Department of Biostatistics and Epidemiology（生物统计学与流行病学系）； University of Pennsylvania（宾夕法尼亚大学）

AI总结本文统一了多校准提升（MCBoost）的理论框架，揭示了校准-风险权衡，并建立了在弱假设下的收敛率、有限样本保证和协变量偏移下的迁移性保证。

AI中文摘要

LLMs 未显示出个体化元认知的迹象

M. Moran, Mark Whiting

发表机构 * Pareto AI

AI总结通过四分相关因子分析和成对校准方法，研究20个前沿大语言模型在六个基准上的置信度判断，发现模型间置信度差异主要由共享的难度因子和决策阈值决定，而非个体化元认知，数学推理中的表面例外实为混淆效应。

AI中文摘要

置信度加权路由、选择性弃权和集成加权都假设模型表达的置信度能反映其回答问题的能力。它们假定模型具备功能性元认知，即无需实际运用自身能力就能对其作出评估的能力。聚合校准已被广泛研究，结果不一，但置信度表达的内在结构尚不明确。我们使用四分相关（tetrachoric）因子分析结合成对校准，分解了20个前沿大语言模型在六个基准上的二元置信度判断，探究置信度不同的两个模型是否也在性能上存在差异。在事实回忆和信息检索基准上，跨模型置信度矩阵近似秩为1，单个主导因子捕获了大部分潜在方差。检索事实的模型共享一个项目级难度轴，主要区别在于沿该轴的决策阈值。在所有基准上，一旦移除所有模型一致同意的项目，置信度与性能之间的关系便消失。模型间成对校准即使统计显著也很小，且在控制共享因子上的基率差异后，剩余部分缩小为零。数学推理似乎是例外，但结果发现这是一种混淆：推理模型通过尝试在思维链中解决问题来回答关于其置信度的问题，绕过了我们试图测量的亚符号自我知识。我们没有发现任何测试领域存在显著的语言化个体化元认知的证据。

英文摘要

Confidence-weighted routing, selective abstention, and ensemble weighting all assume that a model's stated confidence is informative about its capability on the question being asked. They presume functional metacognition, the capacity to assess one's own capabilities, without exercising them. Aggregate calibration is well studied, with mixed results, but the underlying structure of elicited confidence is less well understood. We decompose binary confidence judgements from 20 frontier Large Language Models (LLMs) across six benchmarks using tetrachoric factor analysis paired with pairwise calibration, asking whether two models that differ in confidence also differ in performance. On factual recall and information retrieval benchmarks the cross-model confidence matrix is approximately rank-one and a single dominant factor captures most of the latent variance. Models retrieving facts share an item-level difficulty axis and differ mainly in their decision thresholds along it. Across all benchmarks the relationship between confidence and performance collapses once items that all models agree on are removed. Inter-model pairwise calibration is small even where statistically significant, and what remains shrinks to nothing once base-rate differences along the shared factor are controlled for. Mathematical reasoning is the apparent exception, but this turns out to be a confound where reasoning models answer questions about their confidence by trying to solve them in their chain of thought, bypassing the sub-symbolic self-knowledge we seek to measure. We find no evidence for significant verbalised individuated metacognition in any tested domain.

2605.24298 2026-05-26 cs.CR cs.AI cs.LG 版本更新

An Empirical Evaluation of LLM-Generated Code Security Across Prompting Methods

LLM生成代码安全性的提示方法实证评估

Mohammed Kharma, Ahmed Sabbah, Mohammad Alkhanafseh, Mohammad Hammoudeh, David Mohaisen

发表机构 * Department of Computer Science, Birzeit University（比尔宰特大学计算机科学系）； King Fahd University of Petroleum and Minerals（法赫德国王石油与矿产大学）； University of Central Florida（中佛罗里达大学）

AI总结通过跨5个LLM和4种编程语言的实证评估，提出弱点感知零样本链式思考（WA-0CoT）提示策略，发现提示方法虽影响弱点类别分布，但无法显著降低漏洞频率或密度。

Comments 40 pages, 22 tables, 8 figures

AI中文摘要

大型语言模型（LLM）在自动化代码生成中的日益使用提高了软件开发效率，但往往以安全性为代价。生成的代码经常忽略关键问题，使其容易受到弱加密和不正确的输入验证等问题的影响。为了研究这一问题，我们对跨五个LLM和四种编程语言（Java、C++、C和Python）的LLM生成代码的安全质量进行了全面的实证评估，考察了多种提示工程方法的影响。我们提出了一种弱点感知的零样本链式思考（WA-0CoT）提示策略，该策略利用CWE映射丰富提示中的安全上下文以指导模型推理。我们的实证分析在卡方检验的支持下发现，不同提示方法在漏洞频率或密度上没有统计学上的显著降低。然而，包括WA-0CoT在内的提示策略系统地影响了CWE类别的组成分布，其效果因编程语言而异。这些发现表明，虽然安全感知的提示改变了生成弱点的结构，但仅靠提示工程不足以可靠地降低整体漏洞水平。结果强调了在评估LLM生成代码的安全属性时，语言感知和模型感知的提示设计的重要性。

英文摘要

The growing use of Large Language Models (LLMs) for automated code generation has enhanced software development efficiency, but often at the cost of security. Generated code frequently overlooks critical concerns, leaving it vulnerable to issues such as weak encryption and improper input validation. To investigate this problem, we present a comprehensive empirical evaluation of the security quality of LLM-generated code across five LLMs and four programming languages (Java, C++, C, and Python), examining the impact of multiple prompt engineering methods. We introduce a weaknesses-aware zero-shot chain-of-thought (WA-0CoT) prompting strategy that enriches prompts with security context using CWE mappings to guide model reasoning. Our empirical analysis, supported by chi-square tests, finds no statistically significant reductions in vulnerability frequency or density across prompt methods. However, prompting strategies, including WA-0CoT, systematically influence the compositional distribution of CWE categories, with effects varying by programming language. These findings suggest that while security-aware prompting alters the structure of generated weaknesses, prompt engineering alone is insufficient to reliably reduce overall vulnerability levels. The results highlight the importance of language-aware and model-aware prompt design when evaluating the security properties of LLM-generated code.
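
The abstract reports chi-square tests on vulnerability counts across prompt methods. As a rough illustration of that kind of test, here is a minimal sketch of a Pearson chi-square test of independence on a contingency table. The counts are hypothetical and are not taken from the paper, and the function name `chi_square_statistic` is introduced only for this example.

```python
def chi_square_statistic(table):
    """Return (Pearson chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts (NOT from the paper): one row per prompting method,
# columns = (samples with a vulnerability, samples without).
counts = [
    [30, 70],
    [28, 72],
    [33, 67],
]

stat, dof = chi_square_statistic(counts)

# Standard chi-square critical value for alpha = 0.05 with 2 degrees of freedom.
CRITICAL_5PCT_DOF2 = 5.991
significant = stat > CRITICAL_5PCT_DOF2
```

With counts this similar across rows the statistic stays far below the critical value, which is the shape of result the abstract describes for vulnerability frequency: the methods differ slightly, but not by more than chance would explain.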

2605.24295 2026-05-26 cs.LG stat.ML 版本更新

Private Adaptive Covariance Estimation via Gaussian Graphical Models

通过高斯图模型进行私有自适应协方差估计

Cecilia Ferrando, Miguel Fuentes, Brett Mullins, Cameron Musco, Daniel Sheldon

发表机构 * Manning College of Information and Computer Sciences（信息与计算机科学学院）

AI总结提出PACE-GGM，一种数据自适应的差分隐私协方差估计方法：将隐私预算集中在经验协方差矩阵信息量最大的条目上，在每轮中选择当前近似效果较差的条目并用高斯机制进行测量，然后通过最大熵重建目标重构完整协方差矩阵，从而在高维和低到中等隐私预算下显著降低估计误差。

2605.24294 2026-05-26 cs.CR cs.AI cs.LG 版本更新

Concept Drift Adaptation Using Self-Supervised and Reinforcement Learning In Android Malware Detection

使用自监督学习和强化学习在Android恶意软件检测中适应概念漂移

Ahmed Sabbah, Mohammad Kharma, Mohammad Alkhanafseh, Radi Jarrar, Samer Zein, David Mohaisen

发表机构 * Birzeit University（比尔宰特大学）； University of Central Florida（中佛罗里达大学）

AI总结提出一个基于自监督学习和强化学习的框架，通过冻结编码器测量潜在漂移并轻量适配，同时利用PPO控制器在成本约束下选择维护动作，以应对Android恶意软件检测中的概念漂移。

Comments 9 pages, 2 figures, 2 tables

AI中文摘要

Android恶意软件检测器在部署后常因概念漂移而性能下降，而每次维护步骤完全重新训练成本高昂。我们提出一个按时间顺序的自适应维护框架，将部署时的维护建模为序列决策问题。该框架在初始化阶段通过自监督学习学习稳定的潜在表示，冻结编码器，在固定表示空间中测量潜在漂移，并使用可训练适配器和分类头进行轻量下游适配。一个近端策略优化控制器根据检测器状态（包括当前效用、固定记忆集上的保留率、潜在漂移指标和更新成本）选择低成本的维护动作。我们在模拟器和真实Android恶意软件数据集上，使用静态和动态特征，在因果部署式协议下评估该框架。结果表明，RL控制器提供了一种强大的成本感知适配策略，在非平稳部署条件下，始终保持在最佳策略之列，同时在时间性能、记忆保留和维护成本之间取得有利平衡。

优化数字治疗干预：内源性依从性下的在线学习

Eric Pulick, Stephanie Carpenter, Matthew Buman, Yonatan Mintz

发表机构 * Department of Industrial and Systems Engineering, University of Wisconsin-Madison（威斯康星大学麦迪逊分校工业与系统工程系）； College of Health Solutions, Arizona State University（亚利桑那州立大学健康解决方案学院）

AI总结针对慢性病数字治疗中患者依从性受推荐和过去依从性影响的问题，提出一个包含线性动力系统和logit链接的决策支持框架，并设计基于乐观主义的UCB-BOLD算法实现亚线性遗憾。

Comments 48 pages, 6 figures

AI中文摘要

临床医生管理慢性病干预面临的一个关键挑战是在信息和资源有限的情况下维持患者的长期健康。数字治疗（DT）通过重复互动（例如每日治疗建议）提供了一种成本效益高的方式来大规模管理干预，但患者的成功高度依赖于他们的依从性。行为心理学表明，治疗建议和过去的依从性都会影响未来的依从性，然而现有的DT决策支持框架仅建模建议效应或将依从性视为外生背景，在模型和算法开发上留下了关键空白。为填补这一空白，我们提出了一个DT决策支持框架，该框架同时捕捉建议和依从性效应，使临床医生能够更好地规划治疗建议。我们使用线性动力系统（LDS）对患者随时间变化的治疗参与能力进行建模，该系统同时捕捉建议和依从性效应，并通过logit链接与依从性行为内生连接。我们建立了该模型的有限时间辨识保证，将LDS结果扩展到我们的设置。接下来，我们提出了一种基于乐观主义的算法UCB-BOLD用于在线治疗选择，并证明其实现了亚线性遗憾。我们通过使用微随机试验数据生成的合成患者队列进行消融研究，将UCB-BOLD与基准进行了比较。DT决策支持工具可以包含动态模型，使决策者能够有效利用DT设置中的数据，通过有效的资源分配改善患者健康。虽然短视或启发式方法对某些患者类型足够，但对于其他患者，明确规划建议和依从性效应的好处显著；UCB-BOLD的条件风险价值遗憾比次优基准低2-3倍。

英文摘要

A critical challenge facing clinicians managing chronic disease interventions is sustaining long-run patient health given limited information and resources. Digital therapeutics (DTs) provide a cost-effective way to manage interventions at scale through repeated interactions (e.g. daily treatment recommendations), but patient success is highly dependent on their adherence. Behavioral psychology suggests that both treatment recommendations and past adherence affect future adherence, yet existing decision support frameworks for DTs model only recommendation effects or treat adherence as exogenous context, leaving a key gap in model and algorithm development. To address this gap, we present a DT decision support framework that captures both recommendation and adherence effects, allowing clinicians to better plan treatment recommendations. We model a patient's time-varying capacity for engagement with treatment using a linear dynamical system (LDS) that captures both recommendation and adherence effects, endogenously connected to adherence behavior with a logit link. We establish finite-time identification guarantees for this model, extending LDS results to our setting. Next, we propose an optimism-based algorithm, UCB-BOLD, for online treatment selection and prove that it achieves sublinear regret. We evaluate UCB-BOLD against benchmarks via ablation studies on a synthetic patient cohort generated using micro-randomized trial data. DT decision support tools can include dynamical models to enable decision makers to efficiently use the data in DT settings to improve patient health through effective resource allocation. While myopic or heuristic approaches suffice for some patient types, the benefits of explicitly planning around recommendation and adherence effects are significant for others; UCB-BOLD achieves 2-3x lower conditional value-at-risk regret than the next-best benchmark.
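
The abstract describes a linear dynamical system for engagement capacity, tied to adherence through a logit link. The following is a minimal scalar sketch of that idea, not the paper's model: the coefficients `a`, `b`, `c`, the ±1 coding of adherence in the state update, and the function name `simulate` are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Logistic function, the inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate(recommendations, a=0.9, b=0.3, c=0.5, x0=0.0, seed=0):
    """Simulate a scalar engagement state x_t and binary adherence y_t.

    Assumed (hypothetical) dynamics:
        y_t     ~ Bernoulli(sigmoid(x_t))            # logit link
        x_{t+1} = a * x_t + b * u_t + c * (2*y_t - 1)
    so both the recommendation u_t and the realized adherence y_t
    feed back into future engagement (endogenous adherence).
    """
    rng = random.Random(seed)
    x = x0
    states, adherence = [], []
    for u in recommendations:
        y = 1 if rng.random() < sigmoid(x) else 0
        states.append(x)
        adherence.append(y)
        x = a * x + b * u + c * (2 * y - 1)
    return states, adherence


# Ten days with no recommendation effect (u_t = 0), for illustration only.
states, adherence = simulate([0.0] * 10)
```

Because adherence enters the state update, a run of non-adherent days lowers the probability of adherence on later days, which is the feedback that purely exogenous treatments of adherence leave out.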

2605.24251 2026-05-26 cs.LG cs.CV 版本更新

Rethinking Continual Anomaly Detection on the Edge: Benchmarking Under Realistic Industrial Conditions

重新思考边缘上的持续异常检测：在现实工业条件下进行基准测试

Chad Weatherly, Sen Lin

发表机构 * University of Houston（休斯敦大学）

AI总结针对现有持续异常检测方法在评估、比较和边缘部署约束上的不足，提出统一基准和训练无关方法DINOSaur，在多种协议下超越所有现有方法，并在边缘设备上实现快速推理和适应。

AI中文摘要

持续异常检测（CAD）解决了工业检测系统适应不断变化的生产条件的需求，但现有方法存在三个关键差距：不现实的评估、缺乏系统比较以及未考虑边缘部署约束。我们引入了一个统一的基准，结合了结构和逻辑异常的离散任务评估、一种新颖的连续漂移协议、对所有已发布CAD方法的首次头对头比较，以及在边缘硬件上的计算效率分析。我们的结果表明，现有的CAD方法并不一致地优于带有简单经验重放的传统方法。受此启发，我们提出了DINOSaur，一种无需训练的方法，结合了冻结的DINOv3骨干网络、空间索引的coreset记忆和邻域限制的异常评分。DINOSaur通过构造实现了零遗忘，在所有五种协议上优于所有评估的方法，并在NVIDIA Jetson Orin Nano上以低于100毫秒的推理速度运行，在设备上适应新任务的时间不到30秒。

英文摘要

Continual anomaly detection (CAD) addresses the need for industrial inspection systems to adapt to evolving production conditions, yet existing methods share three critical gaps: unrealistic evaluation, no systematic comparison, and no consideration of edge deployment constraints. We introduce a unified benchmark combining discrete-task evaluation on structural and logical anomalies, a novel continuous drift protocol, the first head-to-head comparison of all published CAD methods, and computational efficiency profiling on edge hardware. Our results reveal that existing CAD methods do not consistently outperform traditional approaches with simple experience replay. Thus motivated, we propose DINOSaur, a training-free method combining a frozen DINOv3 backbone with spatially-indexed coreset memory and neighborhood-restricted anomaly scoring. DINOSaur achieves zero forgetting by construction, outperforms all evaluated methods across all five protocols, and runs at sub-100 ms inference on an NVIDIA Jetson Orin Nano, with on-device adaptation to new tasks in under 30 seconds.
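
To make "spatially-indexed memory" and "neighborhood-restricted scoring" concrete, here is a toy sketch of one possible reading: features of normal patches are stored per grid position, and a query patch is compared only against memory entries at nearby positions. This is an assumption-laden illustration, not DINOSaur itself. The feature vectors are hand-written stand-ins for backbone features, and `neighborhood_score`, `image_score`, and `radius` are hypothetical names.

```python
import math


def neighborhood_score(position, feature, memory, radius=1):
    """Distance from `feature` to its nearest stored feature among
    memory entries whose grid position lies within `radius` of `position`."""
    r, c = position
    candidates = [
        stored
        for (mr, mc), feats in memory.items()
        if abs(mr - r) <= radius and abs(mc - c) <= radius
        for stored in feats
    ]
    if not candidates:
        return float("inf")  # nothing comparable has been seen at this location
    return min(math.dist(feature, stored) for stored in candidates)


def image_score(patches, memory, radius=1):
    """Image-level anomaly score: the worst (largest) patch-level score."""
    return max(
        neighborhood_score(pos, feat, memory, radius) for pos, feat in patches.items()
    )


# Toy memory of "normal" patch features, keyed by (row, col) grid position.
memory = {
    (0, 0): [(0.0, 0.0)],
    (0, 1): [(1.0, 0.0)],
}

normal_image = {(0, 0): (0.0, 0.1), (0, 1): (1.0, 0.0)}
odd_image = {(0, 0): (3.0, 4.0)}

normal_score = image_score(normal_image, memory)
odd_score = image_score(odd_image, memory)
```

In a memory of this kind, adapting to a new task only appends entries and never rewrites old ones, which is one way a training-free method can avoid forgetting by construction.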

2605.24249 2026-05-26 cs.LG 版本更新

PrivFusion: A Privacy-preserving Multi-Agent Framework for Harmonizing Distributed Datasets

PrivFusion: 一种用于协调分布式数据集的隐私保护多智能体框架

Anisa Halimi, Liubov Nedoshivina, Kieran Fraser, Stefano Braghin

发表机构 * IBM Research（IBM研究院）

AI总结提出PrivFusion框架，通过多智能体自动协调异构结构化数据集，在联邦学习前实现隐私保护的数据对齐，减少人工干预。

Comments Accepted by IEEE CBMS 2026

2605.24216 2026-05-26 cs.LG cs.AI cs.CL cs.CR 版本更新

Agent-ToM: Learning to Monitor Autonomous LLM Agents via Theory-of-Mind Reasoning

Agent-ToM: 通过心智理论推理学习监控自主LLM智能体

Nesreen K. Ahmed, Nima Nafisi

发表机构 * Cisco Outshift（思科Outshift）

AI总结针对自主LLM智能体的隐蔽恶意行为监控难题，提出基于心智理论推理的Agent-ToM框架，通过信念推断、意图假设与验证实现结构化轨迹分析，在监控基准上取得优于集成方法的性能。

Comments 23 pages, 9 figures

AI中文摘要

监控自主大语言模型（LLM）智能体的隐蔽恶意行为具有挑战性，因为攻击模式具有延迟性、上下文依赖性和长期性。智能体可能追求隐藏目标同时保持表面良性行为，即使拥有完整轨迹访问也难以检测。先前的监控方法改进了脚手架或集成聚合，但独立处理每条轨迹，未从先前的监控经验中学习。此外，标准推理方法解释观察到的行为，但没有明确推理智能体的信念、意图和目标对齐，而这些对于区分良性任务执行和隐蔽偏离是必要的。我们提出Agent-ToM，一种基于心智理论（ToM）推理的监控学习框架，用于自主智能体的安全分析。Agent-ToM通过推断信念、具有校准置信度的意图假设、预期行动以及与任务一致行为基线的偏离，执行结构化的全轨迹分析。在推理时，它采用“推理-验证-细化”（Reason-Verify-Refine）流程来构建和验证监控决策。在训练时，Agent-ToM将批评信号蒸馏到持久的“语义护栏记忆”（semantic guardrail memory）中，使得跨回合可重用的信念和意图条件约束成为可能。我们在对抗性智能体监控基准（SHADE-Arena和CUA-SHADE-Arena）上评估Agent-ToM。Agent-ToM实现了强精确率-召回率平衡，并使用连贯的双调用推理流程，优于包括集成方法在内的最先进监控基线。这些结果表明，在监控层学习，结合结构化的ToM推理和验证，为保护自主LLM智能体提供了有效且可部署的基础。

英文摘要

Monitoring autonomous large language model (LLM) agents for covert malicious behavior is challenging due to delayed, context-dependent, and long-horizon attack patterns. Agents may pursue hidden objectives while maintaining superficially benign behavior, making detection difficult even with full trajectory access. Prior monitoring approaches improve scaffolding or ensemble aggregation, but treat each trajectory independently and do not learn from prior monitoring experience. Moreover, standard reasoning methods explain observed behavior without explicitly reasoning about agent beliefs, intentions, and goal alignment required to distinguish benign task execution from covert deviation. We propose Agent-ToM, a learning-to-monitor framework grounded in Theory-of-Mind (ToM) reasoning for security analysis of autonomous agents. Agent-ToM performs structured full-trajectory analysis by inferring beliefs, intent hypotheses with calibrated confidence, expected actions, and deviations from task-consistent behavioral baselines. At inference time, it employs a Reason-Verify-Refine pipeline to construct and validate monitoring decisions. At training time, Agent-ToM distills critique signals into a persistent semantic guardrail memory, enabling reusable belief- and intent-conditioned constraints across episodes. We evaluate Agent-ToM on adversarial agent monitoring benchmarks (SHADE-Arena and CUA-SHADE-Arena). Agent-ToM achieves strong precision-recall balance and outperforms state-of-the-art monitoring baselines, including ensemble methods, while using a coherent two-call reasoning pipeline. These results demonstrate that learning at the monitoring layer, combined with structured ToM reasoning and verification, provides an effective and deployable foundation for securing autonomous LLM agents.

2605.24213 2026-05-26 cs.SE cs.AI cs.LG 版本更新

Towards Evaluation Engineering: An Empirical Study of ML Evaluation Harnesses in the Wild

迈向评估工程：真实环境中机器学习评估工具的实证研究

Zhimin Zhao, Zehao Wang, Abdul Ali Bangash, Bram Adams, Ahmed E. Hassan

发表机构 * Software Analysis and Intelligence Lab (SAIL), School of Computing, Queen's University（女王大学计算学院软件分析与智能实验室（SAIL））； Concordia University（康考迪亚大学）； Lahore University of Management Sciences (LUMS)（拉合尔管理科学大学（LUMS））

AI总结通过对57个评估工具的实证研究，提出五阶段工具模型，并分类16560个问题，发现规范阶段问题最多（41.4%），主要根因是未实现功能（24.3%）、文档缺失（20.3%）和输入验证缺失（17.2%），为将评估工程作为独立软件工程关注点奠定实证基础。

AI中文摘要

评估工具是编排模型评估的软件系统，管理模型调用、数据加载、指标计算和结果报告。尽管它们在机器学习基础设施中扮演关键角色，但其操作挑战和工程问题迄今受到的关注有限。我们对57个评估工具进行了实证研究，推导出一个五阶段工具模型，并根据工作流阶段和根本原因对16,560个问题进行了分类。大多数工具操作挑战集中在规范阶段（占问题的41.4%），在此阶段工具集成外部模型、数据集和评分评判者。操作挑战的三个最常见根本原因是未实现功能（24.3%）、文档缺失（20.3%）和输入验证缺失（17.2%），这些合计占分类问题的61.7%，涵盖现有功能的缺陷和阻碍预期工作流的能力缺口。根本原因也因工作流阶段而异：环境不兼容和外部依赖破坏占配置问题的36.2%，而算法错误（25.9%）和验证缺失（22.5%）主导评估问题。这些贡献共同为将评估工程视为一个独立的软件工程关注点建立了实证基础。

英文摘要

Evaluation harnesses are software systems that orchestrate model evaluation by managing model invocation, data loading, metric computation, and result reporting. Despite their critical role in machine learning infrastructure, their operational challenges and engineering concerns have received limited attention so far. We present an empirical study of 57 evaluation harnesses, deriving a five-stage harness model and classifying 16,560 issues by workflow stage and root cause. Most harness operational challenges concentrate in the Specification stage (41.4% of issues), where harnesses integrate external models, datasets, and scoring judges. The three most frequent root causes of operational challenges are unimplemented features (24.3%), documentation gaps (20.3%), and missing input validation (17.2%), which together account for 61.7% of classified issues, spanning both defects in existing functionality and capability gaps that block intended workflows. Root causes also vary by workflow stage: environment incompatibility and external dependency breakage account for 36.2% of provisioning issues, whereas algorithmic error (25.9%) and validation gap (22.5%) dominate assessment issues. Together, these contributions establish an empirical foundation for treating evaluation engineering as a distinct software engineering concern.

2605.24212 2026-05-26 stat.AP cs.AI cs.LG stat.ML 版本更新

Distributionally Robust Transfer Learning with Structurally Missing Covariates, with Application to Cross-National Cardiac Arrest Prediction

具有结构缺失协变量的分布鲁棒迁移学习及其在跨国心脏骤停预测中的应用

Siqi Li, Chuan Hong, Ziye Tian, Benjamin Sieu-Hon Leong, Koshi Nakagawa, Hideharu Tanaka, Sang Do Shin, Khuong Quoc Dai, Do Ngoc Son, Marcus Eng Hock Ong, Nan Liu, Molei Liu

发表机构 * Centre for Biomedical Data Science, Duke-NUS Medical School, Singapore（生物医学数据科学中心，杜克-新加坡国立大学医学院，新加坡）； Duke-NUS AI + Medical Sciences Initiative, Duke-NUS Medical School, Singapore（杜克-新加坡国立大学医学院AI+医学科学倡议，新加坡）； Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA（生物统计学与生物信息学系，杜克大学，北卡罗来纳州达勒姆，美国）； Duke Clinical Research Institute, Durham, NC, USA（杜克临床研究所，北卡罗来纳州达勒姆，美国）； Emergency Medicine Department, National University Hospital, Singapore（急诊医学部，国立大学医院，新加坡）； Department of Sport and Medical Science, Faculty of Physical Education, Kokushikan University, Tokyo, Japan（体育与医学科学系，体育学院，国士馆大学，东京，日本）； Graduate School of Emergency Medical System, Kokushikan University, Tokyo, Japan（急救医疗系统研究生院，国士馆大学，东京，日本）； Department of Emergency Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea（急诊医学系，首尔国立大学医学院，首尔，韩国）； Center for Emergency Medicine, Bach Mai Hospital, Hanoi, Vietnam（急救医学中心，白梅医院，河内，越南）； Center for Critical Care Medicine, Bach Mai Hospital, Hanoi, Vietnam（重症医学中心，白梅医院，河内，越南）； Health Services Research Centre, Singapore Health Services, Singapore（卫生服务研究中心，新加坡卫生服务，新加坡）； Department of Emergency Medicine, Singapore General Hospital, Singapore（急诊医学部，新加坡中央医院，新加坡）； Pre-hospital & Emergency Research Centre, Health Services Research and Population Health, Duke-NUS Medical School, Singapore（院前与急诊研究中心，卫生服务研究与人口健康，杜克-新加坡国立大学医学院，新加坡）

AI总结提出DRUM框架，通过分布鲁棒优化和神经网络生成器处理目标域中结构缺失的协变量，实现无标签目标域的预测模型迁移，并在跨国心脏骤停预测中验证有效性。

AI中文摘要

当关键训练协变量在部署时不可用且目标域中标记结果有限时，跨医疗系统部署临床预测模型常常失败。例如，院外心脏骤停（OHCA）的高性能模型依赖于高资源环境中常规收集的详细院前测量数据，但这些数据在许多国际登记处中不可用。现有方法要么丢弃缺失协变量，牺牲预测信息，要么依赖于关于其目标分布的不可检验的假设。我们提出了DRUM（具有结构缺失协变量的分布鲁棒无监督迁移学习），这是一个将预测模型迁移到某些协变量结构缺失且结果标签不可用的目标群体的框架。DRUM将协变量划分为共享组件（$X$，在所有环境中观察到）和缺失组件（$A$，仅在源域中观察到）。DRUM不进行缺失协变量插补，而是使用神经网络生成器优化未知目标分布$A \mid X$上的最坏情况预测性能，并通过鲁棒性参数控制与源条件分布的允许偏差。我们进一步开发了一种偏差校正程序，以减少对干扰估计误差的敏感性。模拟显示，在分布偏移下，平均和最坏情况预测误差均有显著改善。应用于跨国OHCA预测，将模型从美国登记处迁移到多个未记录院前变量的亚洲登记处，DRUM在各个站点产生了校准更好的预测和改进的临床分类性能。

英文摘要

Deploying clinical prediction models across healthcare systems often fails when key training covariates are unavailable at deployment and labeled outcomes are limited in the target domain. For example, high-performing models for out-of-hospital cardiac arrest (OHCA) rely on detailed prehospital measurements routinely collected in high-resource settings but unavailable in many international registries. Existing methods either discard missing covariates, sacrificing predictive information, or rely on untestable assumptions about their target distribution. We propose DRUM (Distributionally Robust Unsupervised transfer learning with structurally Missing covariates), a framework that transfers prediction models to target populations where certain covariates are structurally absent and outcome labels are unavailable. DRUM partitions covariates into shared components ($X$), observed across all settings, and missing components ($A$), observed only in the source. Rather than imputing missing covariates, DRUM optimizes worst-case predictive performance over the unknown target distribution of $A \mid X$ using a neural network generator, with a robustness parameter controlling allowable deviation from the source conditional. We further develop a bias correction procedure that reduces sensitivity to nuisance estimation error. Simulations show substantial improvements in both mean and worst-case prediction error under distribution shift. Applied to cross-national OHCA prediction, transferring models from a US registry to multiple Asian registries where prehospital variables are unrecorded, DRUM yields better-calibrated predictions and improved clinical classification performance across sites.

2605.24210 2026-05-26 cs.LG stat.ML 版本更新

Characterizing the Representational Capacity of Neural Processes

神经过程表示能力的刻画

Robin Young

发表机构 * University of Cambridge（剑桥大学）

AI总结本文通过严格层级分析，刻画了条件神经过程、注意力神经过程、Transformer神经过程及其潜在变体的表示能力，揭示了不同架构在函数表示上的包含关系与局限。

Comments To appear at ProbML/AABI 2026

AI中文摘要

神经过程能表示哪些函数？我们分析了流行的NP架构的表示能力：条件神经过程（CNPs）、注意力神经过程（ANPs）、Transformer神经过程（TNPs）及其潜在变体。我们证明这些架构形成了一个严格的层级结构。CNP可表示的函数恰好是那些依赖于上下文分布的有限多个期望特征的函数。ANPs通过查询相关的重新加权严格推广了CNPs，从而实现了核平滑器。ConvCNPs和ANPs不可比较；每个都包含对方之外的函数，通过平稳性与平移等变性区分。具有$L$个自注意力层的TNPs捕获$L$跳上下文交互。对于潜在NPs，我们证明有限维潜在变量提供一致的采样，但不能规避编码器的限制；匹配GP后验分布需要潜在维度随上下文大小缩放。这些结果为基于任务结构的架构选择提供了理论基础。

英文摘要

What functions can Neural Processes represent? We analyze the representational capacity of popular NP architectures: Conditional Neural Processes (CNPs), Attentive Neural Processes (ANPs), Transformer Neural Processes (TNPs), and their latent variants. We prove these architectures form a strict hierarchy. CNP-representable functions are exactly those depending on finitely many expected features of the context distribution. ANPs strictly generalize CNPs via query-dependent reweighting, enabling kernel smoothers. ConvCNPs and ANPs are incomparable; each contains functions outside the other, separated by stationarity versus translation equivariance. TNPs with $L$ self-attention layers capture $L$-hop context interactions. For latent NPs, we show finite-dimensional latents provide coherent sampling but do not circumvent encoder limitations; matching GP posterior distributions requires latent dimension scaling with context size. These results provide a theoretical foundation for architecture selection based on task structure.
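
The contrast between CNPs and ANPs in the abstract can be illustrated with two small functions: a mean-pooled context representation (the same for every query) and a kernel smoother whose weights depend on the query. This is a sketch under stated assumptions rather than the paper's construction: the feature map `phi`, the Gaussian kernel, and the `lengthscale` parameter are choices made only for this example.

```python
import math


def cnp_representation(context, phi):
    """Mean of features phi(x, y) over the context set.

    The result is permutation invariant and does not depend on the query,
    mirroring the 'finitely many expected features' characterization.
    """
    feats = [phi(x, y) for x, y in context]
    n = len(feats)
    return [sum(f[k] for f in feats) / n for k in range(len(feats[0]))]


def kernel_smoother(context, x_query, lengthscale=1.0):
    """Nadaraya-Watson estimate: context points are reweighted per query,
    the kind of query-dependent aggregation attention makes possible."""
    weights = [
        math.exp(-((x - x_query) ** 2) / (2 * lengthscale**2)) for x, _ in context
    ]
    total = sum(weights)
    return sum(w * y for w, (_, y) in zip(weights, context)) / total


context = [(0.0, 1.0), (1.0, 3.0)]


def phi(x, y):
    # Hypothetical three-dimensional feature map.
    return [x, y, x * y]


rep = cnp_representation(context, phi)
midpoint = kernel_smoother(context, 0.5)
near_first = kernel_smoother(context, 0.0, lengthscale=0.1)
```

The mean-pooled vector is a single summary shared by all queries, whereas the smoother's prediction moves toward whichever context point is closest to the query as the lengthscale shrinks.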

2605.24207 2026-05-26 cs.DB cs.LG 版本更新

Incorporating Deep Learning Design in Database Queries

将深度学习设计融入数据库查询

Yuval Lev Lubarsky, Dean Light, Boaz Berger, Shunit Agmon, Benny Kimelfeld

发表机构 * University of Washington（华盛顿大学）

AI总结提出一种将深度学习自然集成到数据库查询中的方法，通过为元组关联可学习的向量嵌入，使查询同时操作数据和嵌入，实现关系深度学习。

AI中文摘要

关系数据库上的深度学习通常通过将数据转换为图表示并在外部框架中应用基于图的神经网络来实现。这种数据库与外部机器学习系统之间的往返引入了非平凡的工程开销。实际上，这些图神经网络对元组嵌入进行操作，并以捕获关系连接引起的交互的方式操纵它们。鉴于这种自然的对应关系，没有根本原因说明为什么在关系数据上指定神经网络应该比查询它困难得多。我们提出了一种将深度学习与数据库查询自然集成的方法。关键思想是为每个元组关联一个来源，表示为具有可学习参数的向量嵌入。查询被提升为联合操作数据和嵌入，将带有嵌入元组的输入关系映射到带有嵌入元组的输出关系。这种方法为关系深度学习提供了声明性基础，促进了与数据库系统的集成、优化和广泛采用。我们描述了RelaNN，这是一个基于PyTorch和cuDF构建的概念验证实现。通过实现各种图学习模型，包括图卷积网络、异构图变换器、超图神经网络和深度同态网络，我们展示了RelaNN的实用性。程序的简单性及其有竞争力的运行时性能展示了一条具体路径，使得在数据库上实现最先进的神经网络变得像编写查询一样简单。

英文摘要

Deep learning over relational databases is conventionally realized by translating data into graph representations and applying graph-based neural networks within external frameworks. This round-trip between the database and external machine learning (ML) systems introduces non-trivial engineering overhead. In effect, these graph neural networks operate on tuple embeddings and manipulate them in ways that capture the interactions induced by relational joins. Given this natural correspondence, there is no fundamental reason why specifying a neural network over relational data should be substantially harder than querying it. We propose an approach that naturally integrates deep learning with database queries. The key idea is to associate each tuple with provenance, represented as a vector embedding with learnable parameters. Queries are lifted to operate jointly on data and embeddings, mapping input relations with embedded tuples to output relations with embedded tuples. This approach provides a declarative foundation for relational deep learning, facilitating integration with database systems, optimization, and wide adoption. We describe RelaNN, a proof-of-concept implementation of this approach built on top of PyTorch and cuDF. We illustrate the utility of RelaNN by implementing various graph-learning models, including graph convolutional networks, heterogeneous graph transformers, hypergraph neural networks and deep homomorphism networks. The simplicity of the programs and their competitive runtime performance demonstrate a concrete path toward making the implementation of state-of-the-art neural networks over databases as simple as writing a query.

2605.24195 2026-05-26 cs.CV cs.LG 版本更新

Single View Seafloor Recovery from Imaging Sonar via Differentiable Rendering

通过可微渲染从成像声纳进行单视图海底恢复

Sevan Brodjian, Michael Hobley, Pietro Perona

发表机构 * California Institute of Technology（加州理工学院）

AI总结提出一种无需训练的方法，通过可微渲染在30秒内从单张声纳图像恢复海底地形，利用已知海底倾斜条件，首次实现单视图高度恢复。

AI中文摘要

由于光衰减和浑浊度，声纳通常是水下高分辨率成像的唯一合适模态。前视成像声纳提供距离和水平角度的测量，但将垂直结构压缩成平面图像，产生歧义，使得3D恢复具有挑战性。成像声纳的一个常见应用是水下地形测绘（测深），但目前的方法需要多个视图、昂贵的多传感器设置或大量训练数据，这限制了其使用和对新环境的适应性。我们提出了一种无需训练的方法，通过可微渲染在30秒内从单张声纳图像恢复测深，条件为已知的海底倾斜。据我们所知，这是声纳中单视图高度恢复的第一个可微渲染方法。我们的方法实现了可微声纳光线追踪，并优化显式高度场以重现目标图像。在合成数据集上，我们的方法在分布偏移下优于有监督的CNN，在粗糙地形上保持接近，而CNN在分布内获胜。通过建模声纳过程的物理基础先验，我们的方法无需训练数据即可适应不同的传感器配置和环境。

英文摘要

Sonar is often the only modality suitable for high-resolution imaging underwater due to light attenuation and turbidity. Forward-looking imaging sonar provides measurements over range and horizontal angle but collapses vertical structure into a flat image, creating ambiguities that make 3D recovery challenging. A common use case for imaging sonar is underwater terrain mapping (bathymetry), yet current methods require many views, expensive multi-sensor setups, or significant training data, which limits use and adaptability to new environments. We present a training-free method that recovers bathymetry from a single sonar image in under 30 seconds via differentiable rendering, conditioned on a known seafloor tilt. To our knowledge, this is the first differentiable rendering approach for single-view height recovery in sonar. Our method implements differentiable sonar ray tracing and optimizes an explicit height field to reproduce the target image. On synthetic datasets, our approach outperforms a supervised CNN under distribution shift and remains close on rough terrain, while the CNN wins in-distribution. By modeling physically grounded priors of the sonar process, our method adapts across sensor configurations and environments without training data.

2605.24193 2026-05-26 cs.SD cs.LG 版本更新

Music Transcription with (Almost) No Supervision

音乐转录：几乎无需监督

Saebyeol Shin, Chao Wan, Zhenzhen Liu, Justin Lovelace, Daniel C. Lin, Kilian Q. Weinberger, John Thickstun

发表机构 * Cornell University（康奈尔大学）

AI总结采用循环一致性翻译框架，利用少量配对数据作为锚点，充分挖掘未配对音频和乐谱数据，实现高质量音乐转录。

2605.24192 2026-05-26 cs.LG cs.AI cs.CV 版本更新

Filtered Posterior Mean Collections: A Unified Framework for Analytical Models of Diffusion Generalization

滤波后验均值集合：扩散泛化分析模型的统一框架

Matthew Niedoba, Berend Zwartsenberg, Frank Wood

发表机构 * University of British Columbia（不列颠哥伦比亚大学）； Inverted AI ； Alberta Machine Intelligence Institute（阿尔伯塔机器智能研究所）

AI总结本文提出滤波后验均值集合（FPMC）统一框架，通过查询精度向量、响应权重和源分布建模扩散模型去噪函数的泛化行为，并通过软松弛和源分布增强提升现有方法性能。

Comments 27 Pages, 7 figures

AI中文摘要

作为图像扩散模型骨干的神经网络去噪函数，在多种网络架构和训练超参数下展现出显著一致的泛化行为。最近一系列研究试图通过聚合训练数据集补丁的后验加权平均值来建模这些网络的输出。在本工作中，我们将这些方法整合为一个统一的模型类，称为滤波后验均值集合（FPMC）。我们使用查询精度向量、响应权重和源分布定义该模型类，并说明现有方法可通过这些设计轴的具体选择恢复。依次研究每个轴，我们发现FPMC性能可以通过对先前基于补丁的方法进行软松弛以及通过源分布的增强来改进。将这些发现应用于现有的FPMC，我们在三个自然图像数据集上展示了样本的一致改进。

英文摘要

The neural-network denoising functions which form the backbone of image diffusion models are remarkably consistent in their generalization behaviour across a wide variety of network architectures and training procedure hyperparameters. A recent line of research has sought to model the outputs of these networks by aggregating posterior weighted averages of training dataset patches. In this work, we consolidate these approaches into a unified model class which we call Filtered Posterior Mean Collections (FPMCs). We define this model class using query precision vectors, response weights, and source distributions, and illustrate that existing methods are recoverable with specific choices of these design axes. Investigating each axis in turn, we find that FPMC performance can be improved with soft relaxations of prior patch-based methods, and through augmentations of source distributions. Applying these findings to an existing FPMC, we demonstrate consistent sample improvement across three natural image datasets.

2605.24183 2026-05-26 cs.DB cs.AI cs.LG 版本更新

AvalancheBench: Evaluating Enterprise Data Agents Through Latent World Recovery

AvalancheBench: 通过潜在世界恢复评估企业数据智能体

Darek Kleczek, Fuheng Zhao, Alexander W. Lee, Julien Tissier, Pawel Liskowski, Ugur Cetintemel, Anupam Datta

发表机构 * Brown University and Snowflake（布朗大学和Snowflake）

AI总结提出AvalancheBench基准，通过潜在世界恢复评估企业数据智能体的分析理解能力，揭示早期错误如何传播并导致系统性错误推荐。

详情

AI中文摘要

我们介绍了AvalancheBench，一个通过潜在世界恢复评估企业数据智能体的基准。AvalancheBench在三个方面改进了现有基准。首先，它评估分析理解而非流程完成：系统根据是否恢复了解释数据的片段、驱动因素、时间事件和关系来评分，而不仅仅是执行工作流或生成看似合理的报告。其次，它通过从已知潜在世界生成观测数据，为目标驱动分析提供真实基准，从而允许对不完整但有效的恢复给予部分分数。第三，它揭示了早期分析错误如何传播到后续结论：遗漏的片段、合并的事件或错误的归因可能导致系统性错误推荐。在这个意义上，AvalancheBench通过提供一个受控环境来诊断智能体是否恢复了企业数据背后的分析结构，从而补充了真实数据基准。在第一个电子商务用例中，领先编码智能体的最强配置仅恢复了26%的评分标准，失败集中在通用客户细分和合并的时间事件上。

英文摘要

We introduce AvalancheBench, a benchmark for evaluating enterprise data agents through latent world recovery. AvalancheBench improves on existing benchmarks in three ways. First, it evaluates analytical understanding rather than pipeline completion: systems are scored on whether they recover the segments, drivers, temporal events, and relationships that explain the data, not merely on whether they execute a workflow or produce a plausible report. Second, it provides ground truth for goal-driven analytics by generating observations from a known latent world, enabling partial credit for incomplete but valid recoveries. Third, it exposes how early analytical mistakes propagate into later conclusions: missed segments, merged events, or wrong attributions can lead to systematically wrong recommendations. In this sense, AvalancheBench complements real-data benchmarks by providing a controlled setting for diagnosing whether agents recover the analytical structure behind enterprise data. On a first e-commerce use case, the strongest configuration of a leading coding agent recovers only 26% of the rubric, with failures concentrated in generic customer segmentations and merged temporal events.

2605.24173 2026-05-26 cs.CL cs.AI cs.CR cs.LG (updated version)

Extracting Training Data from Diffusion Language Models via Infilling

Yihan Wang, N. Asokan

Affiliations: University of Waterloo; KTH Royal Institute of Technology

AI summary: Proposes an infilling extraction protocol that exploits the bidirectional denoising ability of diffusion language models and is parameterized by an arbitrary binary mask. Mask geometry governs extractability: edge-conditioned masks extract up to three times more verbatim sequences than prefix-conditioned ones, and bidirectional access opens channels that autoregressive models cannot exploit.

Abstract

Memorization in large language models has been studied almost exclusively through prefix-conditioned extraction, a natural choice for autoregressive models. However, diffusion language models (DLMs) can denoise masked tokens at arbitrary positions. Thus, prefix-only probing reveals only one facet of memorization in DLMs and significantly underestimates the risk of training-data extraction. In order to realistically model extractability of training data in DLMs, we introduce infilling extraction, a data-extraction protocol parameterized by an arbitrary binary mask that subsumes prefix-only probing and accounts for the bidirectional inductive bias of DLMs. Instantiating it on LLaDA-8B and Dream-7B across five extraction modes, three training pipelines, and three corpora covering verbatim and partial leakage, we find that mask geometry governs extractability: edge-conditioned masks extract up to three times more verbatim sequences than prefix-conditioned ones, and bidirectional access opens channels inaccessible in autoregressive models. In particular, we show that a realistic adversary with access to training data where personally identifiable information has been redacted can even achieve higher recall on extracting redacted email addresses from DLMs than from scale-matched autoregressive models. Tunable parameters for decoding measurably affect extraction performance, while a follow-up supervised finetuning stage does not eliminate the prior memorization.
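
The difference between prefix-conditioned and edge-conditioned probing comes down to which positions the binary mask leaves visible. The sketch below is a minimal illustration; the function names are hypothetical, and it assumes that "edge-conditioned" means revealing tokens at both ends of the sequence, which the abstract does not spell out.

```python
def prefix_mask(length: int, n_visible: int) -> list[int]:
    """1 = visible, 0 = masked; only the first n_visible tokens are shown."""
    return [1 if i < n_visible else 0 for i in range(length)]


def edge_mask(length: int, n_visible: int) -> list[int]:
    """Same visible budget, split between the left and right edges (assumed layout)."""
    left = (n_visible + 1) // 2
    right = n_visible - left
    return [1 if i < left or i >= length - right else 0 for i in range(length)]


def apply_mask(tokens: list[str], mask: list[int], mask_token: str = "[MASK]") -> list[str]:
    """Replace masked positions with a placeholder the model is asked to infill."""
    return [tok if keep else mask_token for tok, keep in zip(tokens, mask)]


print(prefix_mask(8, 4))  # [1, 1, 1, 1, 0, 0, 0, 0]
print(edge_mask(8, 4))    # [1, 1, 0, 0, 0, 0, 1, 1]
print(apply_mask(list("abcdefgh"), edge_mask(8, 4)))
```

Prefix-only probing is the special case where all visible positions sit at the start, which is why an arbitrary-mask protocol subsumes it.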

2605.24171 2026-05-26 cs.LG cs.AI (updated version)

Detecting High-Dimensional Metastable Basins via Discrimination of Marginal Trajectory Distributions

Taj Jones-McCormick

AI summary: Proposes a discriminative method based on comparing marginal trajectory distributions. It identifies metastable basins in high-dimensional Markov processes with a neural network that approximates the Bayes classifier, overcoming the limitations of traditional spectral methods in high dimensions and under nonlinear geometry.

Abstract

We study the problem of identifying dynamically distinct basins of attraction in high dimensional time-homogeneous Markov processes using only trajectory sampling. This problem is fundamental in the analysis of metastable dynamical systems, where the process rapidly mixes within basins while transitions between basins occur rarely on the timescale of interest, or even when the state space is reducible. Existing approaches typically rely on spatial discretization or spectral analysis of estimated transition operators, which can become unreliable in high dimensional settings or when the underlying basin geometry is highly nonlinear. We propose a discriminative approach to basin identification based on marginal trajectory distribution comparison. We prove a simple risk separation result: if two initial states belong to the same basin, the Bayes-optimal classifier distinguishing their marginal trajectory distributions achieves risk close to 1/2, whereas if they lie in distinct basins, the optimal risk is close to zero. This observation reduces basin detection to a two-sample discrimination problem between marginal trajectory distributions. Motivated by this principle, we develop a neural algorithm that receives a set of candidate basin representatives and iteratively merges them by estimating classification risk with a neural network that approximates the Bayes classifier. We evaluate the method on various metastable systems. These include synthetic systems constructed by embedding low-dimensional dynamics into high dimensional noisy ambient spaces. In these settings, standard spectral and clustering-based methods often fail, while our approach accurately recovers the underlying basin structure. These results display a shortcoming of existing methods and highlight trajectory discrimination as an effective tool for identifying dynamical basins in high dimensional stochastic systems.
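
The risk-separation idea can be illustrated on a toy one-dimensional double-well system. This is a minimal sketch under assumptions that are not from the paper: the potential V(x) = (x^2 - 1)^2, the temperature, the step size, and a nearest-mean threshold classifier standing in for the neural network are all hypothetical choices.

```python
import random


def simulate_endpoint(x0, rng, steps=200, dt=0.01, temp=0.05):
    """Euler-Maruyama for dX = -V'(X) dt + sqrt(2*temp) dW, with V(x) = (x^2 - 1)^2."""
    x = x0
    for _ in range(steps):
        drift = -4.0 * x * (x * x - 1.0)
        x += drift * dt + (2.0 * temp * dt) ** 0.5 * rng.gauss(0.0, 1.0)
    return x


def estimated_risk(start_a, start_b, n=400, seed=0):
    """Held-out error of a threshold classifier telling the two endpoint samples apart."""
    rng = random.Random(seed)
    a = [simulate_endpoint(start_a, rng) for _ in range(n)]
    b = [simulate_endpoint(start_b, rng) for _ in range(n)]
    half = n // 2
    # "Training": place a threshold midway between the two class means.
    mean_a = sum(a[:half]) / half
    mean_b = sum(b[:half]) / half
    threshold = 0.5 * (mean_a + mean_b)
    a_is_low = mean_a <= mean_b
    # "Testing": count misclassified held-out endpoints.
    errors = 0
    for x in a[half:]:
        errors += ((x <= threshold) == a_is_low) is False
    for x in b[half:]:
        errors += ((x <= threshold) == a_is_low) is True
    return errors / (2 * (n - half))


risk_same = estimated_risk(0.8, 1.2)    # both starts in the right-hand well
risk_diff = estimated_risk(-1.0, 1.0)   # starts in opposite wells
print(risk_same, risk_diff)
```

With these settings the first risk should land near 1/2 and the second near zero, mirroring the merge criterion described in the abstract: candidate representatives whose trajectory distributions cannot be told apart are treated as belonging to the same basin.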

2605.24113 2026-05-26 cs.LG math.DG math.OC math.ST stat.TH (updated version)

Riemannian Archetypal Analysis: Interpretable non-linear data analysis on deformed star distributions

Willem Diepeveen, Deanna Needell

Affiliations: Department of Mathematics; University of California, Los Angeles

AI summary: Develops a Riemannian version of archetypal analysis that extends the classical method to non-linear manifolds through data-driven pullback geometry, combining interpretability with non-linear expressive power. It defines the Riemannian archetypal mapping (RAM) and implements it with an optimization scheme based on convex relaxation followed by non-convex refinement.

Abstract

Classical archetypal analysis is appealing for its interpretability, but its linear geometry can limit performance on data with strongly non-linear structure; at the same time, existing neural extensions improve flexibility while often weakening the geometric meaning of archetypes and interpolations. In this work, we develop a Riemannian version of archetypal analysis based on data-driven pullback geometry for real-valued data, with the goal of combining the interpretability of classical archetypal analysis with the expressive power of modern non-linear models. We introduce a class of deformed star distributions together with associated pullback Riemannian geometry to provide a statistical interpretation of the resulting manifold mappings, define the Riemannian archetypal mapping (RAM) as a projection onto the manifold of geodesically convex combinations of archetypes, and propose a practical optimization scheme based on convex relaxation followed by non-convex refinement. We further propose a learning scheme that yields reasonable, albeit generally suboptimal, deformed star distributions from data. Experiments on synthetic examples and MNIST show that the resulting framework produces meaningful geodesics, useful denoising projections, and geometry-aware classifications, while also clarifying where current optimization limitations remain.

2605.24106 2026-05-26 cs.LG cs.AI (updated version)

Overcoming "Physics Shock" in Earth Observation: A Heteroscedastic Uncertainty Framework for PINN-based Flood Inference

Tewodros Syum Gebre, Jagrati Talreja, Matilda Anokye, Leila Hashemi-Beni

Affiliations: Built Environment Department, College of Science and Technology, North Carolina A&T State University; United Nations University Institute for Water, Environment and Health

AI summary: Proposes an uncertainty-aware physics-informed neural network framework that uses a dynamic warm-start protocol and heteroscedastic uncertainty modeling to resolve the gradient divergence caused by conflicts between physical constraints and noisy data in remote-sensing flood mapping, achieving a 25% relative IoU improvement on the Sen1Floods11 dataset.

Comments: This article has been accepted by the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

Abstract

Rapid and accurate flood extent mapping from Remote Sensing data, such as Synthetic Aperture Radar (SAR), is critical for operational disaster response, but standard Deep Learning models often produce physically impossible predictions due to a lack of hydrological constraints. While Physics-Informed Neural Networks (PINNs) attempt to address this by embedding governing laws directly into the loss function, their application to real-world remote sensing data frequently fails. Enforcing rigid spatial derivatives (e.g., the 2D Shallow Water Equations) onto unconditioned latent spaces attempting to fit noisy SAR speckle causes catastrophic gradient divergence, a phenomenon we term Physics Shock. In this paper, we propose a novel Uncertainty-Aware PINN framework tailored specifically for applied Earth Observation that addresses this instability. By integrating a dynamic Warm-Start protocol and modeling heteroscedastic aleatoric uncertainty via a negative log-likelihood objective, the network learns to dynamically relax physical constraints in regions of high sensor noise while strictly enforcing them in high-confidence areas. Evaluated on the Sen1Floods11 dataset, our probabilistic Attention-Gated FNO-UNet successfully stabilizes multi-objective optimization, achieving a +25% relative improvement in Intersection over Union (IoU) compared to deterministic baselines. Furthermore, through Deep Ensembles, we successfully disentangle intrinsic sensor noise from out-of-distribution terrain ignorance, providing operational agencies with highly calibrated, physically consistent confidence bounds for robust disaster mitigation and real-time decision-making.
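
A heteroscedastic negative log-likelihood makes the "relax where noisy, enforce where confident" behaviour concrete: a predicted log-variance divides the squared residual and is itself penalized. The sketch below uses the standard per-sample Gaussian form; the function name is hypothetical, and whether the paper applies it to the data term, the physics residual, or both is not stated in the abstract.

```python
import math


def heteroscedastic_nll(residual: float, log_var: float) -> float:
    """Gaussian NLL up to a constant: 0.5 * exp(-s) * r^2 + 0.5 * s, with s = log(sigma^2)."""
    return 0.5 * math.exp(-log_var) * residual ** 2 + 0.5 * log_var


residual = 2.0  # a large mismatch, e.g. in a speckle-dominated region

confident = heteroscedastic_nll(residual, log_var=0.0)           # sigma^2 = 1
uncertain = heteroscedastic_nll(residual, log_var=math.log(4.0))  # sigma^2 = 4

print(confident)  # 2.0
print(uncertain < confident)  # True
```

Raising the predicted variance shrinks the weight on the residual, while the `0.5 * log_var` term stops the network from declaring every pixel uncertain.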

2605.24084 2026-05-26 cs.LG cs.AI cs.LO (updated version)

Verified SHAP: Provable Bounds for Exact Shapley Values of Neural Networks

David Boetius, Shahaf Bassan, Guy Katz, Stefan Leue, Tobias Sutter

Affiliations: University of Konstanz, Konstanz, Germany; Hebrew University of Jerusalem, Jerusalem, Israel; University of St. Gallen, St. Gallen, Switzerland

AI summary: Uses neural network verification techniques to propose an algorithm that computes provable lower and upper bounds on exact SHAP values and scales to search spaces several orders of magnitude larger than existing exact methods can handle.

Comments: Accepted at ICML 2026. 34 pages, 13 figures

2605.24077 2026-05-26 eess.SP cs.LG (updated version)

LWM-CDE: A Representation Space for Wireless Data Reasoning and Transferability

Sadjad Alikhani, Akshay Malhotra, Shahab Hamidi-Rad, Ahmed Alkhateeb

Affiliations: School of Electrical, Computer and Energy Engineering, Arizona State University; InterDigital, Inc.

AI summary: Proposes LWM-CDE, a dataset similarity framework built on the feature space of a pretrained wireless foundation model. It fine-tunes dataset embeddings with contrastive and geometry-shaping losses to build a structured manifold in which distance reliably indicates transferability. On wireless benchmarks it is more efficient than existing metrics and correlates more strongly with empirical transfer performance.

Comments: The model and relevant scripts are available on the WILab Hugging Face page: https://huggingface.co/wi-lab

Abstract

Machine learning deployments in real-world wireless communication tasks face significant generalization challenges due to location and environment-specific signal structure, high diversity in data across different deployments, and limited availability of real-world data. Current approaches for assessing data similarity between training and inference (deployment) distributions, as well as evaluating model transferability, suffer from high computational costs and inconsistent performance, leaving critical model deployment and model life cycle management decisions without a principled foundation. To address this, we introduce a dataset similarity framework built upon the feature space of a pretrained wireless foundation model. Our method, LWM-CDE (Contrastive learning of Dataset Embedding), fine-tunes the dataset embeddings of the foundation model using a combination of contrastive and geometry-shaping losses, creating a structured manifold where distance reliably indicates transferability. Extensive experiments on wireless benchmarks show that LWM-CDE achieves stronger correlation with empirical transfer performance than existing metrics while being more computationally efficient. The learned representation space supports more effective and data-efficient decision-making for tasks like source dataset selection, label-aware augmentation, and budgeted pretraining, demonstrating its broader utility across different wireless communication applications.

2605.24076 2026-05-26 stat.ML cs.LG (updated version)

Causality as the Statistical Conscience of Artificial Intelligence: From Pearl's Ladder to Trustworthy Machines

Ernest Fokoué

Affiliations: School of Mathematics and Statistics, College of Science

AI summary: Argues that causal inference is AI's indispensable statistical conscience. Through a statistical necessity theorem, a unified framework of causal statistical estimators, and an analysis of three AI failure modes as causal blindness, it contends that trustworthy AI is at its core a problem of causal statistics.

Comments: 18 pages, 4 figures, 1 table

Abstract

Modern Artificial Intelligence achieves remarkable predictive power by optimizing statistical risk functionals over vast corpora. Yet a gap separates this from genuine intelligence: the inability to distinguish correlation from causation. This paper argues that causal inference (identifying mechanisms invariant under intervention) is AI's indispensable statistical conscience. Without causal grounding, AI systems are correlation machines: powerful in familiar domains, brittle under distribution shift, and biased in high-stakes settings. Three contributions develop this argument. First, a Statistical Necessity Theorem for Causal Generalization: any algorithm achieving out-of-distribution generalization must encode causal structure, formalizing the distinction between prediction P(Y|X) and intelligence P(Y|do(X)). Second, a unified framework connects Pearl's do-calculus, the Potential Outcomes framework, Double Machine Learning, and Invariant Risk Minimization as a family of Causal Statistical Estimators, each identifying interventional distributions under different assumptions. Third, three AI failure modes (hallucination in large language models, reward hacking in reinforcement learning from human feedback, and degradation under distribution shift) are manifestations of causal blindness, each admitting a principled statistical remedy. Trustworthy AI is, at its core, a problem of causal statistics. The statistical community is not merely equipped to solve it -- it is the only community with the foundational tools to do so rigorously.
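
The gap between P(Y|X) and P(Y|do(X)) that this abstract builds on can be computed exactly on a small confounded model. The three-variable model and all probabilities below are hypothetical numbers chosen for illustration; the interventional quantity uses the standard backdoor adjustment over the confounder Z.

```python
from fractions import Fraction as F

# Hypothetical model: Z confounds both X and Y.
p_z = {0: F(1, 2), 1: F(1, 2)}            # P(Z = z)
p_x1_given_z = {0: F(1, 10), 1: F(9, 10)}  # P(X = 1 | Z = z)


def p_y1(x: int, z: int) -> F:
    """P(Y = 1 | X = x, Z = z): a small effect of X and a large effect of Z."""
    return F(1, 10) + F(1, 5) * x + F(3, 5) * z


def observational(x: int) -> F:
    """P(Y = 1 | X = x) = sum_z P(Y = 1 | x, z) * P(z | x)."""
    def p_x_given_z(z: int) -> F:
        return p_x1_given_z[z] if x == 1 else 1 - p_x1_given_z[z]

    joint = {z: p_x_given_z(z) * p_z[z] for z in p_z}
    total = sum(joint.values())
    return sum(p_y1(x, z) * joint[z] / total for z in p_z)


def interventional(x: int) -> F:
    """P(Y = 1 | do(X = x)) = sum_z P(Y = 1 | x, z) * P(z)  (backdoor adjustment)."""
    return sum(p_y1(x, z) * p_z[z] for z in p_z)


print(observational(1))   # 21/25
print(interventional(1))  # 3/5
```

Conditioning on X = 1 also selects mostly Z = 1 cases, so the observational value (0.84) overstates what setting X = 1 would achieve (0.60). A purely predictive model fit to observational data would learn the former.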

2605.24073 2026-05-26 physics.chem-ph cs.LG (updated version)

Multitask learning with semiempirical orbital charges enables sample-efficient MLIPs

Ihor Neporozhnii, Sjoerd Hoogland, Oleksandr Voznyy

Affiliations: Department of Physical and Environmental Sciences; Department of Electrical and Computer Engineering; The Alliance for AI-Accelerated Materials Discovery

AI summary: Proposes multitask learning with semiempirical orbital charges, predicted by an equivariant model, which markedly improves the sample efficiency and accuracy of machine learning interatomic potentials: energy error falls by 46% and five times less data is needed.

Comments: 16 pages, 6 figures

Abstract

Machine learning interatomic potentials (MLIPs) require generating computationally expensive, large-scale training datasets to accurately simulate materials and molecules. Incorporating electronic structure information using multitask learning improves sample efficiency; however, training on full Hamiltonian matrices, which scale quadratically with the number of atoms, is intractable for large datasets. In this work, we show that multitask learning utilizing orbitally resolved semiempirical charges significantly improves sample efficiency and accuracy in MLIPs. To efficiently predict orbital charges, we implement a specialized equivariant model, reducing charge prediction error compared to an invariant baseline. By augmenting training with computationally inexpensive GFN1-xTB orbital charges, which scale linearly with the number of atoms, our model achieves a 46% reduction in energy mean absolute error and requires five times less data to match the performance of energy-only models. Furthermore, our approach outperforms models trained on expensive density functional theory (DFT) atomic charges, capturing orbitally resolved electronic complexity and forcing the network to learn a physically accurate latent space that spontaneously clusters metals by shared chemical properties. Because orbital charges are only required during training, this approach preserves inference efficiency, providing a scalable recipe for developing accurate, data-efficient foundation models for complex chemical systems.

2605.24072 2026-05-26 stat.ML cs.LG math.PR (updated version)

Optimal Non-Asymptotic Edgeworth Expansions for Multivariate Neural Network Outputs

Lucia Celli

Affiliations: Department of Mathematics, University of Luxembourg

AI summary: For the outputs of finite-width fully connected neural networks, uses Edgeworth expansions of arbitrary order to approximate their deviation from the Gaussian limit, and gives upper and lower bounds in total variation distance.

Comments: 34 pages, 2 figures

2605.24067 2026-05-26 physics.ao-ph cs.LG (updated version)

Seeing Inside the Storm: Improving Nowcasting by Integrating Meteorological Drivers

Minghui Qiu, Jun Chen, Lin Chen, Weifeng Chen, Shuxin Zhong, Zhidan Liu, Yu Zhang, Kaishun Wu

Affiliations: Guangzhou Meteorological Observatory

AI summary: Proposes the MeteoLogist framework, which integrates thermodynamic, kinematic, and microphysical drivers through physics-tailored encoders, a temporal-phase aligner, and a cross-field spatial aggregator to model the full storm life cycle, markedly improving nowcasting performance.

Abstract

Most nowcasting systems, built on radar reflectivity, focus on current precipitation, ignoring the atmospheric precursors -- such as low-level convergence, turbulent eddies, and latent heating -- that offer a fleeting window to foresee storm birth. We introduce MeteoLogist, a physics-inspired radar intelligence framework that models the full life cycle of convection -- from its precursors to organized storm evolution. However, exploiting these precursors is non-trivial: they originate from multiple meteorological drivers -- thermodynamic, kinematic, and microphysical -- that evolve asynchronously (C1) and remain spatially fragmented (C2). To this end, MeteoLogist designs three tightly integrated components. The Physics-Tailored Encoders process radar echoes according to their intrinsic physical scales and semantics, forming thermodynamic, kinematic, and microphysical streams that capture distinct dynamical regimes. The Temporal-Phase Aligner addresses C1 by leveraging causal temporal attention to capture when and how different drivers interact and activate. The Cross-Field Spatial Aggregator addresses C2 through cross-regional fusion, aligning weak and scattered precursors across neighboring cells to expose upstream triggers and enforce spatial coherence. Evaluated on 3D-NEXRAD (2020--2022, US-wide), MeteoLogist boosts high-impact detection (CSI40) by +9.7% over strong baselines, and achieves a remarkable 37.67% gain during the storm-developing stage -- demonstrating true foresight in sensing storms before they appear. The code can be found in the supplementary material.

2605.24064 2026-05-26 cs.LG cs.AI (updated version)

Generative Representation Learning on Hyper-relational Knowledge Graphs via Masked Discrete Diffusion

Jaejun Lee, Seheon Kim, Joyce Jiyoung Whang

Affiliations: School of Computing; Department of AI Computing, KAIST, Daejeon, South Korea

AI summary: For completion and fact generation from arbitrarily masked queries on hyper-relational knowledge graphs, proposes KREPE, a generative representation learning method based on masked discrete diffusion that unifies link prediction and fact generation and achieves state-of-the-art performance.

Comments: 28 pages, 16 figures, 18 tables, 43rd International Conference on Machine Learning (ICML 2026)

Abstract

Hyper-relational knowledge graphs (HKGs) effectively represent complex facts. While inferring new knowledge in HKGs is a critical problem, current methods cast it as a simple link prediction, assuming that nearly all entities and relations within a fact are known, leaving only a single blank to be filled. However, this restricted assumption may not hold in real-world scenarios in which multiple, or even all, constituent components of a fact may be missing simultaneously. To bridge this gap, we introduce a task called fact generation: generating a valid hyper-relational fact from an arbitrarily masked query, i.e., completing a partially observed fact or generating a fact from scratch. We propose KREPE, the first generative representation learning method for HKGs that learns to model the probability distributions of missing components conditioned on the local fact components and global structure of HKGs via a masked discrete diffusion. KREPE models both the intra-fact dependencies by contextual message passing and inter-fact correlations by aggregating stochastically sampled contexts. KREPE seamlessly unifies link prediction and fact generation within a single training framework, achieving state-of-the-art performance on standard HKG link prediction benchmarks and outperforming LLM-based baselines in generating novel and correct facts.

2605.24062 2026-05-26 cs.LG cs.AI (updated version)

Federated Learning over Human-Body Communication for On-Body Edge Intelligence: A Survey, Taxonomy, and BODYFED-HBC Scheduling Vignette

Koffka Khan

Affiliations: Department of Computing and Information Technology; The University of the West Indies

AI summary: Surveys the intersection of human-body communication and federated learning for wearables, proposes a taxonomy distinguishing intra-body, body-hub, cross-user, and clinical-cloud federated learning deployments, and introduces the BODYFED-HBC reference architecture and scheduling algorithm for body-channel-aware federated learning.

Abstract

Human-body communication (HBC) is a promising physical substrate for wearable body-area networks because it can localize communication around the body and reduce the burden of conventional radio links. Federated learning (FL) is a promising learning substrate because it can reduce raw-data centralization for physiological and behavioral sensing. Yet these two literatures remain weakly connected: FL for wearables usually abstracts the communication layer, whereas HBC research usually abstracts learning and model-update traffic. This article surveys the intersection of HBC, wireless body-area networks, wearable FL, Internet-of-Bodies privacy, and edge-intelligence optimization. We propose a taxonomy that distinguishes intra-body, body-hub, cross-user, and clinical-cloud FL deployments, and we identify the open problem of body-channel-aware FL: learning protocols whose client selection, update compression, and aggregation are controlled by posture-dependent HBC links, residual energy, sensor memory, and privacy risk. To make the research agenda concrete, we introduce BODYFED-HBC as a reference architecture and provide an optimization formulation and scheduling algorithm. We further specify a reproducible simulation vignette that combines public wearable datasets with empirical body-coupled-communication signal-loss models. The article concludes with open datasets, evaluation metrics, limitations, and research directions for computer scientists working above the hardware layer.

2605.24058 2026-05-26 cs.LG cs.AI (updated version)

Signs Beat Floats: Low-Rank Double-Binary Adaptation for On-Device Fine-Tuning

Yoshihiko Fujisawa, Yuma Ichikawa, Yudai Fujimoto, Akira Sakai, Katsuki Fujisawa

Affiliations: Fujitsu Limited; Institute of Science Tokyo; RIKEN Center for AIP; Tokai University

AI summary: Proposes LoRDBA, an adapter that replaces the low-rank factors with binary sign carriers and channel-wise scales, substantially reducing storage and compute overhead while remaining LoRA-compatible, and matching or exceeding low-bit baselines in on-device fine-tuning.

Comments: 34 pages, 3 figures

Abstract

On-device adaptation of large language models commonly keeps a quantized base model frozen while training and deploying a small, task-specific LoRA adapter. In the unmerged adapter-mode setting, however, the adapter is more than a compact storage module; it introduces an additional dense floating-point branch, maintains a trainable state for local updates, and acts as a unit of communication and hot-swapping. We introduce LoRDBA, a LoRA-compatible adapter that replaces both low-rank factors with binary sign carriers while representing magnitudes through lightweight, channel-wise scales, converting the dense adapter branch into two sign-accumulation matrix multiplications interleaved with channel-wise scaling. A finite-sample analysis shows that reconstruction quality is governed by the residual-to-magnitude ratio of the original LoRA factors. In adapter-mode experiments, LoRDBA outperforms low-bit baselines at matched model sizes while matching fp16 LoRA quality in selected regimes. The unmerged adapter incurs at most 8% prefill latency overhead at matched rank r=16 despite an over 10x reduction in adapter footprint, with moderate training memory overhead of approximately 1.6x that of fp16 LoRA.
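
The "two sign-accumulation matrix multiplications interleaved with channel-wise scaling" can be sketched in a few lines of plain Python. This is an illustration only: the helper names are hypothetical, and the per-row mean-absolute-value scale is a common choice for binary approximation that is assumed here, not taken from the paper.

```python
def binarize_rows(matrix):
    """Split each row into a scale (mean |w|, an assumed choice) and a +/-1 sign carrier."""
    scales, signs = [], []
    for row in matrix:
        scales.append(sum(abs(w) for w in row) / len(row))
        signs.append([1 if w >= 0 else -1 for w in row])
    return scales, signs


def sign_matvec(scales, signs, x):
    """y_i = scale_i * sum_j sign_ij * x_j: the inner sum needs only adds and subtracts."""
    return [
        s * sum(xj if sg > 0 else -xj for sg, xj in zip(row, x))
        for s, row in zip(scales, signs)
    ]


# Hypothetical rank-2 LoRA factors: A maps 2 inputs to rank 2, B maps rank 2 to 1 output.
A = [[0.5, -0.5], [0.25, 0.75]]
B = [[1.0, -1.0]]
x = [2.0, 1.0]

hidden = sign_matvec(*binarize_rows(A), x)      # first sign-accumulation + scaling
delta = sign_matvec(*binarize_rows(B), hidden)  # second sign-accumulation + scaling

print(hidden)  # [0.5, 1.5]
print(delta)   # [-1.0]
```

Each factor is stored as one bit per entry plus one scale per row, which is where the footprint reduction over fp16 factors would come from in a scheme like this.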

2605.24057 2026-05-26 cs.LG cs.AI (updated version)

Feature Lottery? A Bifurcation Theory of Concept Emergence

Fuming Yang

Affiliations: MIT

AI summary: Proposes a bifurcation-theoretic method that detects the emergence of structure in representation dynamics as a supercritical pitchfork bifurcation driven by the loss Hessian, introduces the label-free phase coordinate β/β_c, validates four distinct transition regimes across diverse settings, and shows that feature interpretability is predictable early in training.

Abstract

Neural networks acquire structured representations at specific moments during training, yet identifying these transitions typically relies on retrospective, label-dependent metrics. We introduce a bifurcation theory of representation dynamics to detect these moments in real time. Analyzing a passive GMM probe attached to the evolving encoder, we show the onset of structure corresponds to a supercritical pitchfork bifurcation driven by the loss Hessian. The system exhibits a theoretically predictable zero-crossing ($β_c$) that, compared to the network's current state ($β$), yields a dynamic ratio $β(t)/β_c(t)$: a universal, label-free phase coordinate for representation dynamics, computable entirely from hidden states. We empirically validate four distinct transition regimes predicted by this coordinate across diverse settings: SAEs on language models (Pythia), SSL (CIFAR), and grokking (modular arithmetic). Crucially, under finite dissipation, macroscopic symmetry-breaking can lag the initial zero-crossing by orders of magnitude, which provides a rigorous dynamical account of the delayed escape observed in grokking. Microscopically, the bifurcation creates a shared unstable subspace, forcing collective symmetry breaking. We term this the "feature lottery" in SAE training: a feature's terminal interpretability becomes predictable remarkably early. By only 5% of training, early atom purity robustly predicts final convergence purity, with top-decile early atoms achieving over 12x the baseline purity at convergence. Beyond explaining concept emergence, $β/β_c$ provides a practical early-warning indicator for training health, detecting the onset of usable structure, the crystallization of feature identity, and representational collapse epochs before downstream metrics react.

2605.24055 2026-05-26 cs.LG cs.AI (updated version)

Cascade-KDE: Robust Time-Series Restoration under Out-of-Distribution Impulse Corruptions

Yuefeng Liu, Ning Yang, Ziyu Yang

Affiliations: School of Digital and Intelligent Industry (School of Cyber Science and Technology); Inner Mongolia University of Science and Technology

AI summary: Proposes Cascade-KDE, a training-free framework that robustly restores time series corrupted by Gaussian noise and impulse outliers while preserving local structure, using two-dimensional density estimation, a density-truncated robust expectation, and an exponential cascade with adaptive stopping.

Abstract

Real-world time-series data in industrial sensing, healthcare, and energy systems is often corrupted by a mixture of Gaussian noise and occasional large-magnitude impulse outliers. For tasks that depend on local shape, such as ECG morphology analysis and battery degradation monitoring, the main requirement is not only low reconstruction error but also preservation of derivative peaks and task-critical features. We propose Cascade-KDE, a training-free restoration framework for corrupted time series. The method first estimates a two-dimensional temporal-amplitude density, then applies a Density-Truncated Robust Expectation to limit the influence of distant abnormal points, and finally refines the sequence through an exponential cascade with adaptive stopping. This design aims to improve robustness under out-of-distribution impulse corruptions while keeping the restored trajectory close to the original local structure. Across several benchmark datasets, the proposed method shows consistent gains over classical filters and representative learning-based baselines on curve fidelity, derivative preservation, downstream classification, and runtime efficiency. These results suggest that bounded density-based restoration is a practical option for feature-preserving preprocessing in noisy time-series pipelines.
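
One way to read the "density-truncated robust expectation" step is as a kernel-weighted local average in which points whose amplitude kernel weight falls below a cutoff are dropped entirely. The sketch below follows that reading and is not the paper's algorithm: the function name, the Gaussian kernels, the local `median_low` center, and every bandwidth and cutoff value are assumptions made for illustration, and the cascade and adaptive stopping stages are omitted.

```python
import math
from statistics import median_low


def robust_restore(y, half_window=3, h_time=2.0, h_amp=0.5, cutoff=0.05):
    """Single-pass local restoration with truncated time-amplitude kernel weights."""
    restored = []
    for t in range(len(y)):
        lo, hi = max(0, t - half_window), min(len(y), t + half_window + 1)
        # median_low returns an actual sample, so at least one weight is nonzero.
        center = median_low(y[lo:hi])
        num = den = 0.0
        for j in range(lo, hi):
            k_amp = math.exp(-0.5 * ((y[j] - center) / h_amp) ** 2)
            if k_amp < cutoff:
                continue  # truncation: distant amplitudes contribute nothing
            w = k_amp * math.exp(-0.5 * ((j - t) / h_time) ** 2)
            num += w * y[j]
            den += w
        restored.append(num / den)
    return restored


# Flat signal with one large impulse at index 5.
signal = [0.0] * 5 + [10.0] + [0.0] * 5
clean = robust_restore(signal)
print(clean[5])  # 0.0
```

Because the impulse's weight is truncated to zero rather than merely reduced, its influence on the restored value is bounded regardless of its magnitude, which is the robustness property the abstract emphasizes.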

2605.24053 2026-05-26 cs.AI cs.CL cs.LG 版本更新

Breaking the Chains of Probability: Neutrosophic Logic as a New Framework for Epistemic Uncertainty in Large Language Models

打破概率的锁链：中智逻辑作为大型语言模型中认知不确定性的新框架

Maikel Yelandi Leyva-Vázquez, Florentin Smarandache

发表机构 * Universidad Bolivariana del Ecuador, Coordinación Académica de Posgrado（厄瓜多尔玻利瓦尔大学，研究生学术协调处）； Universidad de Guayaquil（瓜亚基尔大学）； Universidad Bernardo O’Higgins（贝尔纳多·奥希金斯大学）； Mathematics, Physics, and Natural Sciences Division, University of New Mexico（新墨西哥大学数学、物理与自然科学学部）

AI总结本文提出使用中智逻辑（Truth、Indeterminacy、Falsity三个独立维度）替代传统概率框架，通过实验发现该框架能更丰富地表示LLM的内部状态，并在35%的评估中自发出现超真状态，为透明、可靠和伦理感知的AI系统提供关键步骤。

Comments Published in Neutrosophic Sets and Systems, Vol. 99 (2026). Author's preprint version. Open code and data available at: github.com/mleyvaz/neutrosophic-llm-logic

DOI: 10.5281/zenodo.19954583
Journal ref: Neutrosophic Sets and Systems, Vol. 99, 2026

AI中文摘要

大型语言模型（LLM）主要受概率框架支配，其中结果概率之和被约束为1。这种通常由Softmax层强加的架构限制导致不确定性崩溃，使得难以区分认知不确定性、悖论和模糊性。我们对中智逻辑的应用进行了实证研究，该框架将真（T）、不确定（I）和假（F）视为三个独立维度，用于建模LLM中的认知状态。我们在由四个OpenAI GPT模型组成的系列上进行了实验，涵盖五种语言现象：逻辑悖论、认知无知、模糊性、伦理矛盾和未来偶然性，采用三种提示策略：中智、概率和熵衍生。我们的发现表明，中智方法通过允许T+I+F>1（我们称之为超真状态），提供了模型内部状态的更丰富表示。在35%的评估中，超真状态自发出现，主要出现在伦理矛盾和逻辑悖论下。我们证明，该方法在模糊上下文中保留了真值，并提供了一种稳健的方法来识别和量化内部模型冲突。我们得出结论，中智评估层的集成是迈向更透明、可靠和伦理感知的AI系统的关键一步。

英文摘要

Large Language Models (LLMs) are predominantly governed by probabilistic frameworks in which the sum of outcome probabilities is constrained to unity. This architectural limitation, often imposed by Softmax layers, leads to a collapse of uncertainty that makes it difficult to differentiate between epistemic uncertainty, paradox, and vagueness. We present an empirical investigation of the application of Neutrosophic Logic, a framework that treats Truth (T), Indeterminacy (I), and Falsity (F) as three independent dimensions, to model epistemic states in LLMs. We conducted experiments on a family of four OpenAI GPT models across five linguistic phenomena: logical paradoxes, epistemic ignorance, vagueness, ethical contradictions, and future contingencies, under three prompting strategies: neutrosophic, probabilistic, and entropy-derived. Our findings reveal that the neutrosophic approach, by allowing T+I+F > 1, a state we term hyper-truth, provides a richer representation of a model's internal state. In 35% of evaluations, hyper-truth emerged spontaneously, predominantly under ethical contradiction and logical paradox. We demonstrate that this approach preserves truth values in fuzzy contexts and offers a robust method for identifying and quantifying internal model conflict. We conclude that the integration of neutrosophic evaluation layers is a critical step toward more transparent, reliable, and ethically aware AI systems.
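To make the contrast concrete, here is a minimal sketch of the two constraints the abstract compares: softmax outputs always sum to one, while a neutrosophic triple may exceed that sum. The helper names are hypothetical, and restricting each of T, I, F to [0, 1] is an assumption taken from the usual single-valued neutrosophic convention, not from the abstract.

```python
import math

def softmax(logits):
    """Standard softmax; the outputs always sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def is_hyper_truth(t, i, f, tol=1e-9):
    """True when T + I + F > 1, the state the abstract calls hyper-truth.

    Assumption: each component is scored independently in [0, 1].
    """
    for v in (t, i, f):
        if not 0.0 <= v <= 1.0:
            raise ValueError("each component must lie in [0, 1]")
    return t + i + f > 1.0 + tol
```

A triple such as (0.8, 0.6, 0.7) is flagged as hyper-truth, whereas no softmax output can ever be.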

2605.24052 2026-05-26 cs.LG cs.AI 版本更新

Truthful Online Preference Aggregation for LLM Fine-Tuning in Mobile Crowdsourcing

移动众包中用于LLM微调的诚实在线偏好聚合

Shugang Hao, Lingjie Duan

发表机构 * Singapore University of Technology and Design（新加坡科技设计大学）； Hong Kong University of Science and Technology（香港科技大学）

AI总结针对移动众包中工人可能策略性谎报偏好反馈的问题，提出一种动态贝叶斯博弈模型和在线加权聚合机制，确保工人诚实反馈并实现次线性遗憾。

AI中文摘要

为了更好地满足移动应用（如导航）中用户的需求，移动众包平台可以迭代地将大语言模型（LLM）生成的内容（例如，AI生成的交通状况预测）与从众包工人（例如，移动用户）收集的人类反馈进行对齐。然而，工人可能会策略性地谎报他们的在线偏好反馈，以最大化其影响力或报酬。移动众包中现有的流程（例如，基于EM的权重估计）无法在这种在线设置中识别出最准确的工人，导致在$T$个时隙上产生线性遗憾$\mathcal{O}(T)$。在本文中，我们研究了移动众包中用于LLM微调的诚实在线偏好聚合。我们建立了一个新的动态贝叶斯博弈来建模平台与策略性移动工人之间的多智能体在线学习过程。我们提出了一种新颖的在线加权聚合机制，该机制根据每个工人的反馈准确性动态调整其在偏好聚合中的权重。我们证明了我们的机制确保了策略性工人的诚实反馈，并在$T$个时隙上实现了次线性遗憾$\mathcal{O}(\sqrt{T})$。我们进一步将我们的机制扩展到每个时隙工人反馈有限的挑战性场景，仍然保证了次线性遗憾$\mathcal{O}(\sqrt{T})$。在真实世界数据集上进行的LLM微调实验进一步证明了我们的机制相对于基准方案的显著性能提升。

英文摘要

To better serve users' demands in mobile applications (e.g., navigation), mobile crowdsourcing platforms can iteratively align large language model (LLM)-generated content (e.g., AI-generated traffic condition predictions) with human feedback collected from crowdsourcing workers (e.g., mobile users). However, workers may strategically misreport their online preference feedback to maximize their influence or payment. Existing pipelines in mobile crowdsourcing (e.g., EM-based weight estimation) fail to identify the most accurate worker in this online setting, resulting in a linear regret $\mathcal{O}(T)$ over $T$ time slots. In this paper, we study truthful online preference aggregation for LLM fine-tuning in mobile crowdsourcing. We formulate a new dynamic Bayesian game to model the multi-agent online learning process between the platform and strategic mobile workers. We propose a novel online weighted aggregation mechanism that dynamically adjusts each worker's weight in the preference aggregation according to their feedback accuracy. We prove that our mechanism ensures truthful feedback from strategic workers and achieves a sublinear regret $\mathcal{O}(\sqrt{T})$ over $T$ time slots. We further extend our mechanism to a challenging scenario with limited worker feedback per time slot, still guaranteeing a sublinear regret $\mathcal{O}(\sqrt{T})$. Experiments on LLM fine-tuning with real-world datasets further demonstrate significant performance gains of our mechanisms over benchmark schemes.
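The abstract does not spell out the mechanism, so the sketch below uses the textbook Hedge (exponential weights) update as a stand-in for accuracy-driven reweighting. It illustrates only the learning side: it has no payments or incentive guarantees, which are the paper's actual contribution. The loss encoding, the learning rate, and the simulated workers are all assumptions.

```python
import math
import random

def hedge_weights(losses, eta):
    """losses: list of rounds, each a list of per-worker losses in [0, 1]."""
    n = len(losses[0])
    log_w = [0.0] * n
    for round_losses in losses:
        # Exponential-weights update, kept in log space for numerical stability.
        log_w = [lw - eta * l for lw, l in zip(log_w, round_losses)]
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    s = sum(w)
    return [x / s for x in w]

# Simulated data (assumption): worker 0 is always right, the others err half the time.
rng = random.Random(0)
T, n_workers = 200, 3
losses = [[0.0] + [float(rng.random() < 0.5) for _ in range(n_workers - 1)]
          for _ in range(T)]
eta = math.sqrt(8 * math.log(n_workers) / T)  # a common textbook learning rate
weights = hedge_weights(losses, eta)
```

After the simulated rounds, almost all aggregation weight sits on the consistently accurate worker, which is the behavior a sublinear-regret scheme needs.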

2605.24048 2026-05-26 cs.LG cs.AI 版本更新

Mixture of Complementary Agents for Robust LLM Ensemble

互补代理混合：鲁棒的大语言模型集成

Yichi Zhang, Kevin Lu, Yuang Zhang, Jie Gao, Lirong Xia, Fang-Yi Yu

发表机构 * DIMACS, Rutgers University（罗格斯大学DIMACS研究中心）； Department of Mathematics, Rutgers University（罗格斯大学数学系）； Department of Computer Science, George Mason University（乔治·梅森大学计算机科学系）； Department of Computer Science, Rutgers University（罗格斯大学计算机科学系）

AI总结将大语言模型选择视为组合选择问题，提出基于互补性的贪心选择算法，在性能与成本间取得最佳平衡。

AI中文摘要

多AI协作，例如集成或辩论大语言模型（LLMs），是一种有前景的聚合信息和提升性能的范式。这些流程的基础步骤是将多个提议LLM的响应输入到一个总结LLM中，后者合成一个更好的答案。然而，选择哪些提议者并非易事。现有方法主要关注准确性（选择最强模型）或多样性（确保多样性），并且常常忽视提议者之间以及与总结者之间的交互。我们将提议者选择重新定义为类似于特征选择的组合选择问题，其中LLM的价值在于其与其他模型的互补性。然而，由于时间复杂度过高，直接应用标准特征选择算法在LLM场景中不切实际。受此限制，我们探索了一系列计算可行的贪心式选择算法，这些算法使用少量标记集评估互补性。我们的实验验证了互补性作为提议者选择的指导原则，并确定了在实践中实现最佳性能-成本权衡的方法。

英文摘要

Multi-AI collaboration, such as ensembling or debating large language models (LLMs), is a promising paradigm for aggregating information and boosting performance. A foundational step in these pipelines is to feed the responses of several proposer LLMs into a summarizer LLM, which synthesizes a better answer. However, choosing which proposers to include is non-trivial. Existing approaches primarily focus either on accuracy (picking the strongest models) or diversity (ensuring variety), and often overlook the interactions among proposers and with the summarizer. We reframe proposer selection as a combinatorial selection problem akin to feature selection, where the value of an LLM lies in its complementarity with others. However, directly applying standard feature-selection algorithms is impractical in the LLM setting due to prohibitive time complexity. Motivated by this limitation, we explore an extensive range of computationally feasible, greedy-style selection algorithms that assess complementarity using a small labeled set. Our experiments validate complementarity as a guiding principle for proposer selection and identify methods that achieve the best performance-cost trade-offs in practice.
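One way to picture greedy, complementarity-driven selection is the sketch below. It assumes a boolean matrix recording which proposer answers which labeled example correctly, and it scores a set of proposers by how many examples at least one of them gets right. That coverage proxy is an assumption for illustration; the paper evaluates several selection rules and also accounts for the summarizer, which this sketch ignores.

```python
import numpy as np

def greedy_select(correct, k):
    """Pick k proposers (k <= number of proposers) by marginal coverage gain.

    correct[m][j] is truthy when proposer m answers labeled example j correctly.
    """
    correct = np.asarray(correct, dtype=bool)
    covered = np.zeros(correct.shape[1], dtype=bool)
    chosen = []
    for _ in range(k):
        # Marginal gain: examples this proposer solves that nobody chosen so far solves.
        gains = (correct & ~covered).sum(axis=1)
        if chosen:
            gains[chosen] = -1  # never pick the same proposer twice
        best = int(np.argmax(gains))
        chosen.append(best)
        covered |= correct[best]
    return chosen
```

With two identical strong models and one weaker but complementary model, the greedy rule skips the duplicate and picks the complementary one second, which an accuracy-only ranking would not do.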

2605.24045 2026-05-26 cs.LG cs.AI 版本更新

A Large-Scale Dataset and Benchmark: Do Protein-Ligand Models Learn Binding Sites or Just Binding Likelihood?

大规模数据集与基准：蛋白质-配体模型学习的是结合位点还是仅仅结合可能性？

Zhaohan Meng, Zhen Bai, Ke Yuan, Iadh Ounis, Zaiqiao Meng, Hao Xu, Joseph Loscalzo

发表机构 * School of Computing Science（计算科学学院）； School of Cancer Sciences（癌症科学学院）； School of Life Science and Technology（生命科学与技术学院）； Institute of Science Tokyo（东京科学研究院）； Cancer Research UK Scotland Institute（英国癌症研究会苏格兰研究所）； Language Technology Lab（语言技术实验室）； Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School（哈佛医学院布莱根妇女医院内科）； The Broad Institute of MIT and Harvard（MIT和哈佛大学Broad研究所）

AI总结针对现有基准无法评估模型是否定位结合位点的问题，提出包含约10万对蛋白质-配体的InteractBind数据集和细粒度基准，通过结合位点定位任务揭示模型在强二元预测下定位能力有限。

Comments Under Review for the NeurIPS 2026 Conference, Track on Evaluations and Datasets

AI中文摘要

蛋白质-配体建模是计算药物发现和分子设计的基础。现有的蛋白质-配体基准通常通过二元结合预测和亲和力回归等任务评估蛋白质与配体是否相互作用以及结合强度。然而，这些评估提供的证据有限，无法判断模型是否能够定位结合位点或识别分子识别背后的非共价相互作用。为填补这一空白，我们引入了InteractBind，一个大规模蛋白质-配体数据集，包含约10万对蛋白质-配体对，以及一个用于细粒度评估的基准。核心细粒度任务是结合位点定位，它利用覆盖六种主要非共价相互作用类型的蛋白质残基和配体原子相互作用图，评估模型导出的相互作用图是否能够定位结合位点。InteractBind还包含结合亲和力和蛋白质相似性控制的分割，以支持现实的泛化评估。使用InteractBind，我们评估了八个现有的基于序列和交互感知的模型，评估了二元结合预测和结合位点定位。结果显示，尽管二元结合预测表现强劲，但结合位点定位能力有限，且在不同非共价相互作用类型间存在显著差异。总体而言，InteractBind建立了一个基准范式，鼓励开发更具可解释性和物理基础的蛋白质-配体模型。

英文摘要

Protein-ligand modeling underpins computational drug discovery and molecular design. Existing protein-ligand benchmarks typically evaluate whether a protein and ligand interact and how strongly they bind, through tasks such as binary binding prediction and affinity regression. However, these evaluations provide limited evidence of whether models can localize binding sites or identify the non-covalent interactions underlying molecular recognition. To address this gap, we introduce InteractBind, a large-scale protein-ligand dataset comprising approximately 100k protein-ligand pairs, together with a benchmark for fine-grained evaluation. The core fine-grained task is that of binding-site localization, which uses protein-residue and ligand-atom interaction maps spanning six major types of non-covalent interactions to assess whether model-derived interaction maps localize binding sites. InteractBind further includes binding affinity and protein similarity-controlled splits to support realistic generalization assessment. Using InteractBind, we evaluate eight existing sequence-based and interaction-aware models, assessing binary binding prediction and binding-site localization. Results reveal limited binding-site localization despite strong binary binding prediction, with marked variation across non-covalent interaction types. Overall, InteractBind establishes a benchmark paradigm that encourages the development of more interpretable and physically grounded protein-ligand models.

2605.24043 2026-05-26 cs.LG cs.AI 版本更新

LLM-AutoSciLab: Closed-Loop Scientific Discovery via Active Experimentation with LLMs

LLM-AutoSciLab：通过LLM主动实验进行闭环科学发现

Sanchit Kabra, Nikhil Abhyankar, Saaketh Desai, Prasad Iyer, Chandan K Reddy

发表机构 * Virginia Tech（弗吉尼亚理工大学）； Sandia National Laboratories（桑迪亚国家实验室）

AI总结提出LLM-AutoSciLab闭环框架，通过假设生成与实验选择迭代优化，在预算约束下实现主动数据采集，在三个基准上优于现有方法且样本效率提升2-5倍。

AI中文摘要

科学发现是一个闭环过程，其中假设指导数据采集，观察结果细化假设空间。然而，大多数方法将发现简化为对固定数据集的监督学习，其中有限的观察可能支持多种局部拟合但无法泛化的合理机制。因此，关键挑战在于选择信息丰富的观察以消除不确定性，将焦点从静态推断转向自适应数据采集。为此，我们提出LLM-AutoSciLab，一个将假设生成与假设条件实验选择和机制细化相结合的闭环框架。LLM-AutoSciLab不是将模型拟合到被动收集的数据，而是迭代地提出合理的假设，选择信息丰富的实验来区分或细化它们，并使用由此产生的证据更新其状态。为了评估具有主动数据采集的动态闭环科学发现，我们引入了ActiveSciBench，包含两个数据集：包含57个酶动力学任务的ActiveSciBench-Chem和包含45个基因调控网络任务的ActiveSciBench-GRN。这些数据集将发现建模为预算约束过程，需要自适应实验设计、变量选择和真实机制的恢复。在NewtonBench、ActiveSciBench-Chem和ActiveSciBench-GRN上，LLM-AutoSciLab优于先前方法，在NewtonBench和ActiveSciBench-Chem上分别达到67.6%和35.1%的符号准确率，在ActiveSciBench-GRN上达到31.1%的精确图恢复。此外，假设引导的实验比最强竞争基线样本效率高2-5倍。代码和数据可在https://github.com/scientific-discovery/LLM-AutoSciLab获取。

英文摘要

Scientific discovery is a closed-loop process in which hypotheses guide data acquisition and observations refine the hypothesis space. Yet most approaches reduce discovery to supervised learning over fixed datasets, where limited observations can support multiple plausible mechanisms that fit locally but fail to generalize. Thus, the key challenge is selecting informative observations to resolve uncertainty, shifting the focus from static inference to adaptive data acquisition. To address this, we propose LLM-AutoSciLab, a closed-loop framework that couples hypothesis generation with hypothesis-conditioned experiment selection and mechanism refinement. Rather than fitting models to passively collected data, LLM-AutoSciLab iteratively proposes plausible hypotheses, selects informative experiments to distinguish or refine them, and updates its state using the resulting evidence. To evaluate dynamic, closed-loop scientific discovery with active data acquisition, we introduce ActiveSciBench, comprising two datasets: ActiveSciBench-Chem with 57 enzyme-kinetics tasks and ActiveSciBench-GRN with 45 gene-regulatory-network tasks. These datasets model discovery as a budget-constrained process requiring adaptive experiment design, variable selection, and recovery of true mechanisms. Across NewtonBench, ActiveSciBench-Chem, and ActiveSciBench-GRN, LLM-AutoSciLab outperforms prior methods, achieving 67.6% and 35.1% symbolic accuracy on NewtonBench and ActiveSciBench-Chem, respectively, and 31.1% exact graph recovery on ActiveSciBench-GRN. Moreover, hypothesis-guided experimentation is 2-5x more sample-efficient than the strongest competing baselines. Code and data are available at: https://github.com/scientific-discovery/LLM-AutoSciLab

2605.24033 2026-05-26 cs.LG cs.LO 版本更新

Towards Verifiable Transformers: Solver-Checkable Circuit Explanations

迈向可验证的Transformer：求解器可检查的电路解释

Neel Somani

发表机构 * Independent Researcher（独立研究者）

AI总结提出Verifiable Transformers框架，通过将任务局部Transformer电路转化为有界、求解器可检查的声明，实现电路属性的形式化验证。

AI中文摘要

机制可解释性通常识别Transformer模型内部的电路，但这些电路的解释通常通过示例、消融和手动推理来验证。这留下了在发现合理电路与证明电路功能之间的差距。我们引入了Verifiable Transformers，一个将任务局部Transformer电路转化为有界、求解器可检查的声明的框架。给定一个行为、一个有限任务域和一个候选token投影，我们提取任务电路并验证属性，如投影功能等价性、边必要性、任务相关不变性和最终残差鲁棒性。直接验证将提取的电路本身编码到SMT求解器中。当电路包含无法精确或可处理编码的算子时，代理介导的验证拟合一个SMT可编码的代理，在有限域上针对提取的电路验证它，并针对代理验证符号解释。我们使用带有Signed L1 BandNorm、sparsemax注意力和LeakyReLU的GPT风格架构实例化直接验证。在小型符号序列任务上，我们训练一个SMT可表示的Transformer，提取用于引号闭合和括号类型跟踪的稀疏电路，并详尽验证投影功能等价性、内容不变性、边必要性和最终残差鲁棒性。在GPT-2规模上，相同的算子堆栈在OpenWebText上稳定训练，尽管朴素直接SMT验证仍然难以处理。我们还展示了在具有难以编码注意力的任务局部电路上的代理介导验证，显示了已验证的符号解释和求解器生成的反例。目标不是全模型验证，而是为将机制电路解释转化为可证明或反驳的形式命题提供一条具体路径。

一种用于通信感知评估流水线并行LLM训练的表格调度抽象

Daniel Barley, Jonathan Leis, Benjamin Klenk, Holger Fröning

发表机构 * Hardware and Artificial Intelligence (HAWAII) Lab, Heidelberg University, Heidelberg, Germany（海德堡大学硬件与人工智能实验室）； NVIDIA Corporation, Santa Clara, CA, USA（英伟达公司）

AI总结本文提出一种表格调度抽象和统一的多抽象方法，通过公式推理、理想化调度表和通信感知执行模拟，比较了GPipe、1F1B、Chimera和Hanayo等流水线调度方案，发现通信会抵消气泡分析的结构优势，调度排名依赖于执行环境。

Comments Accepted at the 25th IEEE International Symposium on Parallel and Distributed Computing (ISPDC 2026)

AI中文摘要

流水线并行是大型语言模型分布式训练的关键技术，因为它减少了每设备的参数和激活内存。然而，比较流水线调度方案是困难的：分析模型暴露了诸如气泡比率等结构量，而端到端硬件实验成本高昂且依赖于系统。在这项工作中，我们引入了一种表格调度抽象和一种统一的多抽象方法，该方法连接了基于公式的推理、理想化调度表和通信感知执行模拟。使用这个框架，我们在多个建模的系统配置下比较了GPipe、1F1B、Chimera和受限模式下的Hanayo。我们的结果表明，调度排名不是抽象不变的：通信可以抵消仅由气泡分析所暗示的结构优势。在本文考虑的假设下，GPipe和1F1B在运行时等价，但1F1B实现了更低的激活内存峰值。Chimera主要在低微批次数和通信友好的情况下具有优势，而Hanayo在其预期的受限工作点有效，但对网络瓶颈敏感。我们进一步研究了一种非对称的Chimera式放置，它没有减少全局峰值内存需求，但在浅流水线中显示出有限的运行时收益。总体而言，流水线调度质量仅在建模的执行环境背景下才有意义。

英文摘要

Pipeline parallelism is a key technique for distributed training of large language models because it reduces per-device parameter and activation memory. However, comparing pipeline schedules is difficult: analytical models expose structural quantities such as bubble ratios, while end-to-end hardware experiments are costly and system-specific. In this work, we introduce a tabular schedule abstraction and a unified multi-abstraction methodology that connects formula-based reasoning, idealized schedule tables, and communication-aware execution simulation. Using this framework, we compare GPipe, 1F1B, Chimera, and Hanayo in its restricted regime across multiple modeled system configurations. Our results show that schedule rankings are not abstraction-invariant: communication can negate structural advantages suggested by bubble analysis alone. Under the assumptions considered here, GPipe and 1F1B are runtime-equivalent, but 1F1B achieves a lower activation-memory peak. Chimera is advantageous mainly at low microbatch counts and in communication-favorable regimes, while Hanayo is effective in its intended restricted operating point but remains sensitive to network bottlenecks. We further study an asymmetric Chimera-style placement, which does not reduce the global peak memory requirement but reveals limited runtime gains in shallow pipelines. Overall, pipeline schedule quality is meaningful only in the context of the modeled execution environment.
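The formula-level view mentioned above can be sketched with the standard idealized expressions for synchronous pipelines: equal per-stage compute time, no communication cost, and a non-interleaved 1F1B schedule. These are the usual textbook idealizations and not the paper's simulator; the function names are hypothetical.

```python
def bubble_fraction(stages, microbatches):
    """Idle fraction of an idealized GPipe or non-interleaved 1F1B step."""
    return (stages - 1) / (microbatches + stages - 1)

def peak_inflight_microbatches(schedule, stages, microbatches):
    """Microbatches whose activations the first stage holds at its peak."""
    if schedule == "gpipe":
        # All forward passes finish before any backward pass starts.
        return microbatches
    if schedule == "1f1b":
        # Backward passes start as soon as the pipeline is full.
        return min(stages, microbatches)
    raise ValueError(f"unknown schedule: {schedule}")
```

For 4 stages and 8 microbatches both schedules idle for 3/11 of the step, while the peak activation count drops from 8 to 4 under 1F1B. This matches the abstract's remark that the two are runtime-equivalent but differ in activation memory, and it is exactly the kind of result that communication costs can overturn.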

2605.24004 2026-05-26 cs.AI cs.CV cs.LG cs.RO 版本更新

Reason--Imagine--Act: Closed-Loop LLM Decision Making with World Models for Autonomous Driving

推理--想象--行动：基于世界模型的闭环LLM自动驾驶决策

Zhengqi Sun, Yiwen Sun, Boxuan Liu, Tailai Chen, Tianxu Guo, Jiabin Liu

发表机构 * Department of Information Management, Peking University, Beijing 100871, China； School of Intelligence Science & Technology, Peking University, Beijing 100871, China； State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing 100080, China； Yuanpei College, Peking University, Beijing 100871, China； China Agricultural University, Beijing, China； CRSC Research & Design Institute Group Co., Ltd., Beijing, China

AI总结提出Reason--Imagine--Act (RIA)闭环框架，结合LLM推理器与动作条件世界模型进行在线安全验证，在CARLA点目标协议下实现80.05%路线完成率、51.10%到达率和0.20%碰撞率。

Comments Accepted by the 2026 IEEE International Conference on Intelligent Transportation Systems (ITSC 2026). 8 pages, 2 figures

AI中文摘要

大型语言模型（LLM）在自动驾驶中具有潜力，但仅基于语义的决策策略可能在动态交通中产生物理上不安全的行为。现有方法要么在没有显式动力学验证的情况下进行在线语言推理，要么主要在离线流程中使用世界模型，在决策时语义意图与物理可行性之间存在差距。我们提出了Reason--Imagine--Act (RIA)，一个闭环框架，将LLM推理器与动作条件世界模型耦合，用于在线安全验证。在每一步，LLM提出一个动作模板和候选子动作，世界模型执行短时域展开，安全评分器选择最安全的可执行动作并反馈给下一步推理。在统一的CARLA点目标协议（1000个回合）下，RIA实现了80.05%的路线完成率、51.10%的到达率和0.20%的碰撞率。在相同的闭环接口下，RIA在核心闭环指标上始终优于无训练基线，包括CARLA TM和MADA。为便于复现，代码可在https://github.com/pku-smart-city/source_code/tree/main/RIA获取。

英文摘要

Large language models (LLMs) are promising for autonomous driving, but semantics-only decision policies can yield physically unsafe behavior in dynamic traffic. Existing methods either perform online language reasoning without explicit dynamics verification or use world models mainly in offline pipelines, leaving a gap between semantic intent and physical feasibility at decision time. We propose Reason--Imagine--Act (RIA), a closed-loop framework that couples an LLM reasoner with an action-conditioned world model for online safety verification. At each step, the LLM proposes an action template and candidate sub-actions, the world model performs short-horizon rollouts, and a safety scorer selects the safest executable action with feedback to the next reasoning step. Under a unified CARLA point-goal protocol (1000 episodes), RIA achieves 80.05% route completion, 51.10% arrival rate, and 0.20% collision rate. Under the same closed-loop interface, RIA consistently outperforms training-free baselines, including CARLA TM and MADA, on core closed-loop metrics. For reproducibility, code is available at https://github.com/pku-smart-city/source_code/tree/main/RIA.
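The propose, roll out, score, select step can be illustrated with a toy one-dimensional car-following model. Everything here is a stand-in: the kinematic rollout replaces the learned world model, the minimum gap replaces the safety scorer, and the fixed list of accelerations replaces the LLM's candidate sub-actions. None of these names or values come from the paper.

```python
def rollout_min_gap(gap, ego_v, lead_v, accel, horizon=10, dt=0.1):
    """Toy short-horizon rollout: smallest gap to the lead vehicle over the horizon."""
    min_gap = gap
    for _ in range(horizon):
        ego_v = max(0.0, ego_v + accel * dt)   # ego speed cannot go negative
        gap += (lead_v - ego_v) * dt           # lead vehicle keeps constant speed
        min_gap = min(min_gap, gap)
    return min_gap

def select_action(candidates, gap, ego_v, lead_v):
    """Return the candidate acceleration whose rollout keeps the largest minimum gap."""
    return max(candidates, key=lambda a: rollout_min_gap(gap, ego_v, lead_v, a))
```

In a closed loop, the selected action and its score would be fed back into the next reasoning prompt; that feedback path is omitted here.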

2605.23997 2026-05-26 cs.CV cs.AI cs.LG 版本更新

IVR-R1: Refining Trajectories through Iterative Visual-Grounded Reasoning in Reinforcement Learning

IVR-R1：通过强化学习中的迭代视觉基础推理优化轨迹

Chenghao Li, Fusheng Hao, Xikai Zhang, Likang Xiao, Yanwei Ren, Fuxiang Wu, Quan Chen, Liu Liu

发表机构 * Hangzhou International Innovation Institute, Beihang University（北京航空航天大学杭州国际创新研究院）； School of Artificial Intelligence, Beihang University（北京航空航天大学人工智能学院）； Kuaishou Technology（快手科技）； Shenzhen Institute of Advanced Integration Technology, Shenzhen（深圳先进集成技术研究院）

AI总结提出IVR-R1框架，利用奖励驱动的筛选机制和迭代再推理循环，在强化学习中动态校正多模态推理轨迹，以解决视觉幻觉和逻辑错误问题。

AI中文摘要

通过强化学习的多模态大语言模型在复杂视觉推理任务中展现出显著能力，但在长程多模态场景中仍存在局限，常出现视觉幻觉和逻辑错误。当前方法通常将高维视觉场景预编码为离散文本代理以促进下游推理。然而，随着推理链展开，文本与视觉场景之间固有的信息不对称会侵蚀视觉基础，导致推理误导和错误输出。为解决此问题，我们提出IVR-R1（迭代视觉基础推理），一种新颖的强化学习训练框架，通过动态视觉重新对齐主动校正推理轨迹以指导策略优化。具体而言，利用奖励驱动的筛选机制识别有缺陷的展开，IVR-R1在多模态上下文中执行细粒度的步骤级错误归因。通过将中间推理状态与原始视觉先验进行迭代交叉引用，再推理循环实现自动轨迹校正，有效合成专家级演示，作为策略模型的高保真推理模板。我们在多种多模态基准上的实验表明，IVR-R1持续优于现有强化学习方法，为在复杂多模态推理中保持逻辑和视觉一致性建立了优越范式。

英文摘要

Multimodal large language models via reinforcement learning (RL) have demonstrated remarkable capabilities in complex visual reasoning tasks, yet they remain limited in long-horizon multimodal scenarios, often suffering from visual hallucination and logical error. Current methods typically pre-encode high-dimensional visual scenes into discrete textual proxies to facilitate downstream reasoning. As the reasoning chain unfolds, however, the inherent information asymmetry between text and visual scenes tends to erode visual grounding, resulting in misguided reasoning and erroneous outputs. To address this issue, we introduce IVR-R1 (Iterative Visual-grounded Reasoning), a novel RL training framework that facilitates dynamic visual re-alignment that actively rectifies reasoning trajectories to guide policy optimization. Specifically, by leveraging a reward-driven screening mechanism to identify flawed rollouts, IVR-R1 executes a fine-grained, step-level error attribution within the multimodal context. By iteratively cross-referencing intermediate reasoning states against pristine visual priors, a Re-Reasoning Loop enables automated trajectory rectification, effectively synthesizing expert-level demonstrations that serve as high-fidelity reasoning templates for the policy model. Our experiments across diverse multimodal benchmarks demonstrate that IVR-R1 consistently outperforms existing reinforcement learning methods, establishing a superior paradigm for maintaining logical and visual consistency in complex multimodal reasoning.

2605.23988 2026-05-26 cs.DC cs.LG 版本更新

TSFLora: Token-Compressed Split Fine-Tuning for Wireless Edge Networks

TSFLora: 面向无线边缘网络的令牌压缩分割微调

Xianke Qiang, Zheng Chang, Li Wang, Ying-Chang Liang

发表机构 * School of Computer Science and Engineering（计算机科学与工程学院）； Center for Intelligent Networking and Communications（智能网络与通信中心）； School of Computer Science（计算机科学学院）； Beijing University of Posts and Telecommunications（北京邮电大学）

AI总结针对无线设备资源受限下大模型微调难题，提出TSFLora框架，通过注意力引导的令牌选择、合并、低比特量化及LoRA适配器，在分割联邦训练中压缩中间令牌序列，显著降低通信开销和内存占用，同时保持精度。

AI中文摘要

将大型AI模型（LAM）适配到个性化边缘数据具有挑战性，因为无线设备的内存、计算和上行容量有限。联邦微调保护数据隐私，但仍要求每个设备托管完整模型，而分割学习以大量激活传输为代价减少设备内存。本文提出TSFLora，一种令牌压缩的分割微调框架，用于在边缘实现通信高效的LAM适配。TSFLora在分割联邦训练流程中结合了注意力引导的令牌选择、令牌合并、低比特激活量化和基于LoRA的适配。关键思想是在传输前压缩中间令牌序列，从而在不改变冻结骨干网络的情况下减少上行流量和服务器端处理。在CIFAR-10、CIFAR-100和TinyImageNet上对ViT模型的实验表明，TSFLora在保持竞争性精度的同时，实现了高达6.8$\times$的通信减少和41%的内存节省。

英文摘要

Adapting large AI models (LAMs) to personalized edge data is challenging because wireless devices have limited memory, computation, and uplink capacity. Federated fine-tuning preserves data privacy but still requires each device to host the full model, while split learning reduces device memory at the cost of heavy activation transmission. This paper proposes TSFLora, a token-compressed split fine-tuning framework for communication-efficient LAM adaptation at the edge. TSFLora combines attention-guided token selection, token merging, low-bit activation quantization, and LoRA-based adaptation within a split federated training pipeline. The key idea is to compress the intermediate token sequence before transmission so that the system reduces both uplink traffic and server-side processing without changing the frozen backbone. Experiments on ViT models over CIFAR-10, CIFAR-100, and TinyImageNet show that TSFLora achieves up to 6.8$\times$ communication reduction and 41% memory saving while maintaining competitive accuracy.
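A rough sketch of the compress-before-transmit idea follows, under stated assumptions: tokens are ranked by a per-token attention score (for a ViT this might be the attention from the class token), only the top `keep` are sent, and the kept activations are quantized uniformly to at most 8 bits. Token merging and the LoRA adapters are omitted, and the function names and the quantization scheme are hypothetical rather than taken from the paper.

```python
import numpy as np

def compress_tokens(tokens, attn_scores, keep, bits=8):
    """Keep the `keep` highest-scoring tokens and quantize them (bits <= 8)."""
    tokens = np.asarray(tokens, dtype=float)
    # Indices of the top-scoring tokens, restored to their original order.
    idx = np.sort(np.argsort(attn_scores)[-keep:])
    kept = tokens[idx]
    lo, hi = kept.min(), kept.max()
    scale = (hi - lo) / (2 ** bits - 1)
    if scale == 0:
        scale = 1.0  # constant input: any scale reproduces it exactly
    q = np.round((kept - lo) / scale).astype(np.uint8)
    return idx, q, lo, scale

def decompress_tokens(q, lo, scale):
    """Server-side dequantization of the transmitted activations."""
    return q.astype(float) * scale + lo
```

Keeping 3 of 6 tokens at 8 bits instead of 32-bit floats already cuts the payload by roughly 8x in this toy setting, at the cost of a reconstruction error of at most half a quantization step per value.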

2605.23984 2026-05-26 cs.LG cs.AI cs.CV 版本更新

Parameter Efficient Multi-Class Intelligent Scheduling for Multimodal Online Distributed Industrial Anomaly Detection

面向多模态在线分布式工业异常检测的参数高效多类智能调度

Heqiang Wang, Weihong Yang, Zheyuan Yang, Jia Zhou, Xiaoxiong Zhong, Fangming Liu, Weizhe Zhang

发表机构 * Pengcheng Laboratory（鹏城实验室）； Shenzhen International Graduate School（深圳国际研究生院）

AI总结针对工业异常检测中分布式、持续生成数据的特点，提出多模态在线分布式工业异常检测框架，通过多类智能调度问题和序列边际增益贪婪算法协调模型更新，并采用资源高效类级低秩适应策略降低系统开销，在MVTec 3D-AD和Eyecandies数据集上取得优越性能。

AI中文摘要

工业异常检测作为工业系统的基本挑战已引起广泛关注。异构工业传感器的快速发展推动工业异常检测从单模态向多模态范式转变。然而，现有方法主要针对集中式和离线场景设计，忽视了实际工业环境中分布式和持续生成的数据特征。随着边缘智能的发展，现代边缘设备不仅能够采集数据，还能进行分布式模型训练，实现系统范围内的协作智能。工业异常检测是此背景下的关键应用。受这些挑战启发，我们提出了一种名为多模态在线分布式工业异常检测（MODIAD）的新框架。首先给出了MODIAD的完整工作流程，然后制定了多类智能调度（MIS）问题，通过平衡数据充足性和类别更新频率来协调跨类模型更新。为了高效解决该问题，我们设计了序列边际增益贪婪（SMG）算法，能够在资源约束下实现有效的多类训练。此外，为了提升训练过程中的计算和通信效率，我们提出了资源高效类级低秩适应（REC-LoRA）策略，在保持检测性能的同时显著降低系统开销。在两个代表性多模态工业异常检测数据集MVTec 3D-AD和Eyecandies上的大量实验表明，所提方法在MODIAD场景下实现了优越的性能和效率。

英文摘要

Industrial anomaly detection has attracted significant attention as a fundamental challenge in industrial systems. The rapid advancement of heterogeneous industrial sensors has driven industrial anomaly detection from unimodal to multimodal paradigms. However, existing methods are primarily designed for centralized and offline settings, overlooking the distributed and continuously generated data characteristic of real-world industrial environments. With the advancement of edge intelligence, modern edge devices are increasingly capable of not only data acquisition but also distributed model training, enabling collaborative intelligence across the system. Industrial anomaly detection represents a critical application in this context. Motivated by these challenges, we propose a novel framework termed Multimodal Online Distributed Industrial Anomaly Detection (MODIAD). We first present a comprehensive workflow for MODIAD and then formulate a Multi-class Intelligent Scheduling (MIS) problem to coordinate cross-class model updates by balancing data sufficiency and class update frequency. To efficiently solve this problem, we design a Sequential Marginal Gain Greedy (SMG) algorithm that enables effective multi-class training under resource constraints. Furthermore, to improve the computational and communication efficiency during training, we propose a Resource Efficient Class-Wise Low Rank Adaptation (REC-LoRA) strategy, which significantly reduces system overhead while preserving detection performance. Extensive experiments on two representative multimodal industrial anomaly detection datasets, MVTec 3D-AD and Eyecandies, demonstrate that the proposed approach achieves superior performance and efficiency under the MODIAD scenario.

2605.23978 2026-05-26 cs.LG econ.EM q-fin.ST q-fin.TR 版本更新

Algometrics: Forecasting Under Algorithmic Feedback

算法度量：算法反馈下的预测

Marc Schmitt

发表机构 * University of Oxford（牛津大学）

AI总结提出算法度量框架，研究预测算法影响自身评估数据的反馈机制，证明部署风险不可仅由历史数据识别，并给出估计方法。

AI中文摘要

在算法市场中，预测模型成为其试图预测的数据生成过程的一部分。一旦其输出转化为交易、分配、执行计划或风险控制，它们就会改变用于评估的未来数据。我引入了算法度量，这是一个用于时间序列的框架，其演化依赖于预测它们的预测算法。该框架区分了被动预测下测量的历史风险和预测驱动行动时测量的部署风险。我证明了三个结果。首先，仅凭被动历史数据无法识别部署风险：即使在线性一步反馈模型中，无限多的算法介导环境会诱导相同的历史规律，但对同一预测器意味着不同的部署风险。其次，历史模型排名可能在拥挤下反转，因此被动误差较低的预测器在类似算法被采用后可能具有更高的部署误差。第三，随机化或工具化行动可识别短视界线性反馈，并且我推导出部署风险估计的有限样本界。这些结果表明，算法市场中的时间序列基准应报告反馈敏感性和预测准确性。

英文摘要

In algorithmic markets, predictive models become part of the data-generating process they aim to forecast. Once their outputs are converted into trades, allocations, execution schedules, or risk controls, they change the future data on which they are evaluated. I introduce algometrics, a framework for time series whose evolution depends on the predictive algorithms forecasting them. The framework distinguishes historical risk, measured under passive forecasting, from deployment risk, measured when forecasts drive actions. I prove three results. First, deployment risk is not identifiable from passive historical data alone: even in a one-step linear feedback model, infinitely many algorithm-mediated environments induce the same historical law while implying different deployment risks for the same forecaster. Second, historical model rankings can invert under crowding, so a predictor with lower passive error can have higher deployment error once similar algorithms are adopted. Third, randomized or instrumented actions identify short-horizon linear feedback, and I derive a finite-sample bound for deployment-risk estimation. These results suggest that time-series benchmarks in algorithmic markets should report feedback sensitivity alongside predictive accuracy.
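The non-identifiability claim can be illustrated with one possible one-step linear feedback model. The parameterization below is an assumption chosen for illustration; the abstract does not give the paper's exact model. Here $f_t$ is the forecast acted upon, $b$ is the feedback strength, and $\varepsilon_{t+1}$ is zero-mean noise with variance $\sigma^2$, independent of the past.

```latex
% Assumed toy model (not taken from the paper)
\begin{align*}
  \text{Deployed:}\quad & y_{t+1} = a\,y_t + b\,f_t + \varepsilon_{t+1}, \\
  \text{Passive (forecast not acted upon):}\quad & y_{t+1} = a\,y_t + \varepsilon_{t+1}, \\
  \text{Forecaster fitted on history:}\quad & f_t = a\,y_t, \\
  \text{Deployment error:}\quad & y_{t+1} - f_t = a\,b\,y_t + \varepsilon_{t+1}, \\
  \text{Conditional deployment risk:}\quad &
    \mathbb{E}\!\left[(y_{t+1} - f_t)^2 \mid y_t\right] = \sigma^2 + a^2 b^2 y_t^2 .
\end{align*}
```

In this toy model the passive history depends only on $(a, \sigma^2)$, so every value of $b$ produces the same historical law, yet the deployment risk grows with $b^2$. Only data in which actions vary, for example through randomization, can reveal $b$.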

2605.23971 2026-05-26 physics.chem-ph cs.LG physics.app-ph 版本更新

Physics-Guided Concentration Inference from Resistance Transients in a Mixed-Phase SnO-SnO$_2$ Carbon Monoxide Sensor with p-n Switching

物理引导的混合相SnO-SnO$_2$一氧化碳传感器中具有p-n切换的电阻瞬态浓度推断

Sani Biswas, Preetam Singh, Amit Kumar Gangwar

发表机构 * Centro de Modelamiento Matemático, Universidad de Chile & IRL 2807 - CNRS（智利大学数学建模中心及CNRS IRL 2807）； Department of Chemical Engineering, Biotechnology and Materials, FCFM, Universidad de Chile（智利大学化学工程、生物技术与材料系）； ANID - Millenium Science Initiative, Millenium Nuclei of Advanced MXenes for Sustainable Applications (AMXSA)（ANID-千年科学计划，可持续应用先进MXenes的千年核）； CSIR-National Physical Laboratory, Dr. K.S. Krishnan Marg, New Delhi, 110012, India（印度国家物理实验室，Dr. K.S. Krishnan Marg，新德里，110012）

AI总结提出一种物理引导的机器学习框架，利用混合相SnO-SnO$_2$气体传感器的电阻瞬态信号推断CO浓度，通过物理可解释描述符和频域特征实现p型和n型传感模式下的分类与回归，揭示了p型利于分类、n型利于高保真回归的双模行为。

Comments 15 pages, 14 figures

AI中文摘要

本工作提出一个物理引导的机器学习框架，用于从实验测量的混合相SnO-SnO$_2$材料气体传感器的电阻瞬态信号中推断一氧化碳浓度，该传感器表现出温度依赖的p-n切换行为。周期级瞬态响应通过物理可解释的描述符表示，并辅以紧凑的快速傅里叶变换（FFT）和离散小波变换（DWT）摘要。使用考虑泄漏的分组交叉验证，我们分别研究了p型和n型传感模式下的多类浓度分类和连续浓度回归。在两种模式下，融合特征提供了最强的整体性能，而物理引导的描述符块仍然具有很强的竞争力，表明主要的浓度信息已经编码在物理上有意义的瞬态动力学中。p型分支显示出最佳的浓度类别区分能力，融合随机森林分类器达到约96.5%的准确率，而n型分支产生最佳的定量浓度估计，融合随机森林回归器实现了MAE≈1.48 ppm和R²≈0.992。这些结果揭示了清晰的双模行为：p型传感特别有利于分类，而n型传感更有利于高保真回归。更广泛地说，该研究表明，考虑泄漏的、周期级的、物理引导的机器学习可以将传统的气体传感分析扩展到单一响应指标之外，同时保持物理可解释性。

英文摘要

This work presents a physics-guided machine-learning framework for carbon monoxide concentration inference from experimentally measured resistance transients of a mixed-phase SnO-SnO$_2$ material gas sensor exhibiting temperature-dependent p-n switching behavior. Cycle-level transient responses are represented through physically interpretable descriptors and complemented by compact fast Fourier transform (FFT) and discrete wavelet transform (DWT)-based summaries. Using leakage-aware grouped cross-validation, we study both multi-class concentration classification and continuous concentration regression for the p-type and n-type sensing regimes separately. Across both regimes, fused features provide the strongest overall performance, while the physics-guided descriptor block remains highly competitive, indicating that the dominant concentration information is already encoded in physically meaningful transient dynamics. The p-type branch shows the best concentration-class discrimination, with the fused Random Forest classifier reaching approximately $96.5\%$ accuracy, whereas the n-type branch yields the best quantitative concentration estimation, with the fused Random Forest regressor achieving an MAE$\approx 1.48$ ppm and an R$^2$ $\approx 0.992$. These results reveal a clear dual-regime behavior: p-type sensing is particularly favorable for classification, whereas n-type sensing is more favorable for high-fidelity regression. More broadly, the study demonstrates that leakage-aware, cycle-level, physics-guided machine learning can extend conventional gas-sensing analysis beyond single-response metrics while preserving physical interpretability.
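Leakage-aware grouped cross-validation means that all samples from one group stay on the same side of every split. The abstract does not say what defines a group; the sketch below assumes a group label per sample (for instance one measurement run) and uses a hypothetical round-robin assignment of groups to folds.

```python
def grouped_folds(groups, n_folds):
    """Yield (train_idx, test_idx) pairs in which no group is split across sides."""
    unique = sorted(set(groups))
    # Round-robin assignment of whole groups to folds (an illustrative choice).
    fold_of = {g: i % n_folds for i, g in enumerate(unique)}
    for k in range(n_folds):
        test = [i for i, g in enumerate(groups) if fold_of[g] == k]
        train = [i for i, g in enumerate(groups) if fold_of[g] != k]
        yield train, test
```

Splitting individual transients at random would let near-duplicate cycles from the same run appear in both train and test sets and inflate the scores, which is what the grouping prevents.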

2605.23962 2026-05-26 q-fin.ST cs.LG 版本更新

From Index to Equity: Pre-Training Transformers for Stock Return Prediction

从指数到股票：预训练Transformer用于股票回报预测

Marie Soehl Coolsaet, Roberto Gallardo, Zhen Gao

发表机构 * Faculty of Engineering, McMaster University（麦克马斯特大学工程学院）； Department of Economics, Universidad Veracruzana, Mexico（墨西哥韦拉克鲁斯大学经济系）

AI总结本文研究基于Transformer的股票预测模型，通过在多伦多证券交易所指数上预训练再微调到个股，提升了预测性能，并与LSTM和XGBoost对比。

AI中文摘要

本研究旨在利用机器学习改进股票价格预测，并支持与买入、卖出和持有资产相关的明智投资决策。具体而言，本文研究了基于Transformer的股票预测模型，并考察了预训练策略对预测性能的影响。首先在多伦多证券交易所指数（TSX）上预训练一个Transformer模型以预测日内回报方向，随后在TSX个股上进行微调。该模型进一步适用于回报值回归任务。性能以长短期记忆网络（LSTM）和XGBoost模型为基准进行对比。在市场指数上的预训练将个股预测的二元交叉熵损失从0.69降低到0.64。微调后的Transformer回归模型实现了比基准模型更低的均方误差，尽管集成模型和XGBoost模型实现了更高的平均日回报。此外，开发了一个实际应用以提供实时股票预测用于交易支持。未来工作将集中于增加Transformer模型容量、纳入更广泛的全球技术指标以及过滤掉低可预测性的股票。

英文摘要

This research aims to leverage machine learning to improve stock price prediction and support informed investment decisions related to buying, selling, and holding assets. Specifically, this work investigates transformer-based models for stock prediction and examines the impact of pre-training strategies on forecasting performance. A transformer model was first pre-trained on the Toronto Stock Exchange Index (TSX) to predict intra-day return direction and subsequently fine-tuned on individual TSX stocks. The model was further adapted for return-value regression tasks. Performance was benchmarked against Long Short-Term Memory (LSTM) and XGBoost models. Pre-training on the market index improved the binary cross-entropy loss for individual stock prediction from 0.69 to 0.64. The fine-tuned transformer regression model achieved lower mean squared error than the benchmark models, although the ensemble and XGBoost models achieved higher average daily returns. In addition, a practical application was developed to deliver real-time stock predictions for trading support. Future work will focus on increasing transformer model capacity, incorporating broader global technical indicators, and filtering out stocks with low predictability.
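
For context on the reported loss values, the arithmetic below is a small sketch and not the authors' code. A classifier that always outputs probability 0.5 has a binary cross-entropy of ln 2 ≈ 0.693, so the 0.69 starting point sits at roughly chance level for a two-way direction task. The helper name `bce` and the toy labels are assumptions made for illustration.

```python
import math

def bce(y_true, p_pred):
    """Mean binary cross-entropy for labels in {0, 1} and probabilities in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

labels = [1, 0, 1, 1, 0, 0]         # hypothetical up/down return directions
coin_flip = [0.5] * len(labels)     # a predictor with no information
chance_loss = bce(labels, coin_flip)
print(round(chance_loss, 4))        # 0.6931
```

Read this way, a loss of 0.64 suggests the fine-tuned model extracts some directional signal beyond a coin flip, although the abstract itself does not frame the numbers in these terms.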

2605.23961 2026-05-26 q-bio.BM cs.AI cs.LG 版本更新

Multimodal Alignment and Preference Optimization for Zero-Shot Conditional RNA Generation

多模态对齐与偏好优化用于零样本条件RNA生成

Roman Klypa, Alberto Bietti, Sergei Grudinin

发表机构 * Univ. Grenoble Alpes, CNRS, Grenoble INP, LJK（格勒诺布尔阿尔卑斯大学、法国国家科学研究中心、格勒诺布尔INP、LJK实验室）； Center for Computational Mathematics, Flatiron Institute（计算数学中心、Flatiron研究所）

AI总结提出Moirain框架，通过多模态监督微调和直接偏好优化实现条件RNA序列生成，在零样本条件下生成具有高结合亲和力的生物合理RNA序列。

详情

AI中文摘要

设计能与特定蛋白质相互作用的RNA分子是实验和计算生物学中的一个关键挑战。尽管自然语言建模和基于深度学习的蛋白质设计最近取得了进展，但在提高成功交互频率和生成序列的真实性方面仍有很大空间。在这项工作中，我们将条件RNA序列生成视为一个多阶段对齐问题，引入了Moirain：一组通过多模态监督微调（SFT）和直接偏好优化（DPO）优化的模型。我们的方法从对多样化RNA语料库的大规模预训练开始，以捕捉序列合理性的基本语法。为了实现目标特异性生成，我们采用了一种多模态SFT架构，该架构以蛋白质结构和序列特征为条件进行RNA合成。最后，我们利用DPO使用合成交互数据来优化模型：利用DPO在非对齐偏好空间中导航的独特能力，我们提高了功能适应性，同时不破坏学习到的自然分布。对Moirain系列（Moirain-Base、-Multi和-DPO）的广泛评估表明，与现有基线相比，我们的框架始终能产生新颖、多样且生物合理的RNA序列，并具有优越的结合亲和力。

英文摘要

The design of RNA molecules that interact with specific proteins is a critical challenge in experimental and computational biology. Despite recent progress in natural language modeling and deep learning-based protein design, there remains significant room to improve the frequency of successful interactions and the authenticity of generated sequences for functional applications. In this work, we frame conditional RNA sequence generation as a multi-stage alignment problem, introducing Moirain: a suite of models optimized via multimodal supervised fine-tuning (SFT) and Direct Preference Optimization (DPO). Our approach begins with large-scale pretraining on diverse RNA corpora to capture the fundamental grammars of sequence plausibility. To achieve target-specific generation, we employ a multimodal SFT architecture that conditions RNA synthesis on protein structural and sequential features. Finally, we leverage DPO to refine the model using synthetic interaction data: taking advantage of DPO's unique ability to navigate non-aligned preference spaces, we improve functional fitness without collapsing the learned natural distribution. Extensive evaluation of the Moirain series (Moirain-Base, -Multi, and -DPO) demonstrates that our framework consistently produces novel, diverse, and biologically plausible RNA sequences with superior binding affinities compared to existing baselines.

2605.23960 2026-05-26 q-bio.BM cs.LG 版本更新

Learning Protein Structure-Function Relationships through Knowledge-guided Representation Decomposition

通过知识引导的表示分解学习蛋白质结构-功能关系

Mingqing Wang, Zhiwei Nie, Athanasios V. Vasilakos, Yonghong He, Zhixiang Ren

发表机构 * Shenzhen International Graduate School, Tsinghua University, Shenzhen, China（清华大学深圳国际研究生院）； Pengcheng Laboratory, Shenzhen, China（鹏城实验室）； School of Electronic and Computer Engineering, Peking University, Shenzhen, China（北京大学电子与计算机工程学院）； CAIR, University of Agder, Norway（阿格德大学CAIR）； Shanghai Smart Logic Technology Co. Ltd., Shanghai, China（上海智略科技有限公司）

AI总结提出知识引导的框架ProtDiS，基于信息瓶颈原理分解预训练的蛋白质微环境嵌入，得到更特异、独立和信息高效的结构特征，在12个下游任务上取得一致改进。

Comments 28 pages, 17 figures, ICML 2026 regular

详情

AI中文摘要

蛋白质在复杂的三维结构中编码多样功能，然而大多数深度学习表示仍然高度纠缠，掩盖了功能背后的生物物理信号。本文引入ProtDiS，一个知识引导的框架，将预训练的蛋白质微环境嵌入分解为生物学上可解释且任务相关的维度。受信息瓶颈原理启发，ProtDiS学习平衡信息量和压缩的表示，产生更特异、独立和信息高效的结构特征，并在12个下游任务上取得一致改进，在基于结构的分割下提升最大。蛋白质和残基层面的分析进一步表明，ProtDiS能够区分折叠相似但功能不同的蛋白质，并捕捉关键的细粒度生物物理信号。这些发现表明，知识引导的分解为蛋白质结构建模中的潜在空间结构化提供了一种通用且可解释的方法。源代码和实现细节公开于https://github.com/AI-HPC-Research-Team/ProtDiS。

英文摘要

Proteins encode diverse functions within complex three-dimensional structures, yet most deep learning representations remain highly entangled, obscuring the biophysical signals that underlie function. Here we introduce ProtDiS, a knowledge-guided framework that decomposes pretrained protein micro-environment embeddings into biologically grounded and task-relevant dimensions. Inspired by the information bottleneck principle, ProtDiS learns representations that balance informativeness and compression, yielding structural features that are more specific, independent, and information-efficient, and achieving consistent improvements across twelve downstream tasks, with the largest gains under structure-based splits. Protein- and residue-level analyses further show that ProtDiS differentiates proteins with similar folds but divergent functions and captures critical fine-grained biophysical signals. These findings suggest that knowledge-guided decomposition provides a general and interpretable approach for structuring latent spaces in protein structural modeling. The source code and implementation details are publicly available at https://github.com/AI-HPC-Research-Team/ProtDiS.

2605.23957 2026-05-26 cs.AI cs.LG 版本更新

Low-Cost Labels, Reliable Choices: Rollout-Calibrated Hyper-Heuristics for Job Shop Scheduling

低成本标签，可靠选择：用于作业车间调度的Rollout校准超启发式算法

Junhao Wei, Yanxiao Li, Yifu Zhao, Zhenhong Peng, Baili Lu, Dexing Yao, Haochen Li, Qinbin He, Sio-Kei Im, Yapeng Wang, Xu Yang

发表机构 * Faculty of Applied Sciences, Macao Polytechnic University（澳门理工大学应用科学学院）； Pazhou Lab (Huangpu), Guangzhou（广州琶洲实验室（黄埔））； College of Animal Science and Technology, Zhongkai University of Agriculture and Engineering（仲恺农业工程学院动物科学与技术学院）； Macao Polytechnic University（澳门理工大学）

AI总结提出一种基于Rollout校准的超启发式算法，通过遗憾归一化标签、上下文KNN不确定性估计和门控机制，在低成本标签下实现可靠的选择器，显著降低平均RPD。

详情

AI中文摘要

学习辅助的超启发式算法可以在保持构造性作业车间调度问题（JSSP）启发式的可行性和可解释性的同时，选择调度规则。其主要计算成本在于标签生成而非模型拟合，因为每个监督标签通常需要从部分调度中展开候选规则。我们研究了这一标签成本问题以及一个可靠性问题：学习的选择器不应偏离强默认规则，除非预测的增益是可信的。所提出的选择器使用遗憾归一化的展开标签、上下文KNN不确定性估计以及一个门控机制，仅在预测改进超过不确定性调整的边际时采取行动。我们还变化展开深度和广度以衡量成本-质量权衡。在合成JSSP实例上，门控选择器在学习的选择器中实现了最低的平均RPD，接近最佳固定调度规则，并将Random-HH的平均RPD降低了一个数量级以上。

英文摘要

Learning-assisted hyper-heuristics can select among dispatching rules while preserving the feasibility and interpretability of constructive Job Shop Scheduling Problem (JSSP) heuristics. Their main computational cost lies in label generation rather than model fitting, since each supervised label usually requires rolling out candidate rules from a partial schedule. We study this label-cost problem together with a reliability problem: a learned selector should not switch away from a strong default rule unless the predicted gain is credible. The proposed selector uses regret-normalized rollout labels, a contextual KNN uncertainty estimate, and a gate that acts only when the predicted improvement exceeds an uncertainty-adjusted margin. We also vary rollout depth and breadth to measure the cost-quality trade-off. On synthetic JSSP instances, the gated selector achieves the lowest mean RPD among learned selectors, remains close to the best fixed dispatching rule, and reduces Random-HH mean RPD by more than an order of magnitude.
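
The gating idea in the abstract can be sketched as follows. This is an illustrative reading, not the paper's implementation: the function name `gated_choice`, the parameters `kappa` and `margin`, and the rule names (SPT, MWKR, FIFO) are all assumptions, and the exact form of the uncertainty-adjusted margin is not given in the abstract.

```python
def gated_choice(default_rule, predicted_gain, uncertainty, kappa=1.0, margin=0.0):
    """Pick a dispatching rule for one decision point.

    predicted_gain: rule -> predicted improvement over the default rule
    uncertainty:    rule -> uncertainty estimate for that prediction
    The selector leaves the default only if the best uncertainty-adjusted
    gain is strictly above the margin (assumed form of the gate).
    """
    adjusted = {r: predicted_gain[r] - kappa * uncertainty[r] for r in predicted_gain}
    best = max(adjusted, key=adjusted.get)
    if adjusted[best] > margin:
        return best
    return default_rule

gains = {"MWKR": 0.08, "FIFO": 0.02}          # hypothetical predicted gains
confident = {"MWKR": 0.02, "FIFO": 0.01}      # low uncertainty: gate opens
uncertain = {"MWKR": 0.10, "FIFO": 0.05}      # high uncertainty: gate stays shut

choice_confident = gated_choice("SPT", gains, confident)
choice_uncertain = gated_choice("SPT", gains, uncertain)
print(choice_confident, choice_uncertain)
```

With the same predicted gains, the selector switches rules only in the low-uncertainty case, which matches the stated goal of not leaving a strong default unless the predicted gain is credible.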

2605.23956 2026-05-26 cs.AI cs.LG cs.MA 版本更新

LLM介导的普适系统中的权威倒置：当模型信任用户胜过传感器

Long Zhang, Zi-bo Qin, Wei-neng Chen

发表机构 * School of Computer Science and Engineering, South China University of Technology（华南理工大学计算机科学与工程学院）

AI总结本研究揭示了大语言模型在融合传感器与用户冲突信息时，由于格式依赖性导致数值传感器数据被自然语言用户主张支配的权威倒置现象，并提出了几何框架、审计指标（CIR和AAI）以及推理时层干预方法（GAC）来诊断和缓解该问题。

详情

AI中文摘要

大语言模型（LLM）越来越多地融合普适系统中的异构输入。然而，当传感器测量值与用户主张冲突时，LLM如何隐式分配权威尚未被研究，这引发了在物理传感必须保持优先级的部署场景中的关键可靠性问题。与显式的传统融合不同，LLM将权威分配隐藏在学习的表示中。我们发现这种分配严重依赖于格式：数值传感器数据未能整合到与答案相关的模型方向中，使得自然语言主张主导最终决策，我们将这种现象称为“权威倒置”。为了诊断和缓解这一问题，我们开发了一个上下文整合的几何框架，引入了两个可计算的审计指标，即上下文整合比（CIR）和权威对齐指数（AAI），并提出了几何权威校准（GAC），一种推理时的层级干预方法，以抑制错位的用户权威。在四个数据集（共576个冲突实例）上评估四个模型（参数规模4B至35B，三种架构），揭示了极端的倒置：在数值任务上，模型表现出接近零的传感器信任（AAI = -0.805，Cohen's d = -2.14），且不受模型容量影响。验证我们的几何框架，理论引导的因果注入翻转了80.2%的错误决策（随机对照<0.4%）。实际应用中，GAC将HAR准确率从0–1.6%提升至21.9–27.5%，优于提示基线。最终，LLM介导系统中的权威分配必须被显式审计并根据应用特定配置，而不是保持隐式。

英文摘要

Large language models (LLMs) increasingly fuse heterogeneous inputs in ubiquitous systems. Yet how LLMs implicitly allocate authority when sensor measurements and user claims conflict remains unexamined, raising critical reliability concerns for deployments where physical sensing must retain priority. Unlike explicit traditional fusion, LLMs bury authority allocation within learned representations. We discover this allocation is severely format-dependent: numerical sensor data fails to integrate into answer-relevant model directions, allowing natural-language claims to dominate the final decision, a phenomenon we term Authority Inversion. To diagnose and mitigate this, we develop a geometric framework of context integration, introduce two computable audit metrics, specifically the Context Integration Ratio (CIR) and Authority Alignment Index (AAI), and propose Geometric Authority Calibration (GAC), an inference-time layer-level intervention to suppress misplaced user authority. Evaluating four models (4B to 35B parameters, three architectures) across four datasets totaling 576 conflict instances reveals extreme inversion: on numerical tasks, models exhibit near-zero sensor trust (AAI = -0.805, Cohen's d = -2.14), unaffected by model capacity. Validating our geometric framework, theory-guided causal injection flips 80.2% of incorrect decisions (vs. <0.4% for random controls). Practically, GAC improves HAR accuracy from 0-1.6% to 21.9-27.5%, outperforming prompting baselines. Ultimately, authority allocation in LLM-mediated systems must be explicitly audited and application-specifically configured rather than left implicit.

2605.23936 2026-05-26 cs.AI cs.LG 版本更新

Fuzzy, Neutrosophic, and Uncertain Graph Theory: Properties and Applications

模糊、中智和不确定图论：性质与应用

Takaaki Fujita, Florentin Smarandache

AI总结本书系统综述了不确定性下的图论，以不确定图框架为核心，统一了模糊、中智等模型，并介绍了扩展图类及其在分子图、决策系统、图神经网络等领域的应用。

Comments 326 pages. Publisher: Neutrosophic Science International Association (NSIA) Publishing House. ISBN: 978-197250204-4

2605.23932 2026-05-26 cs.AI cs.CL cs.CY cs.LG 版本更新

When Correct Beliefs Collapse: Epistemic Resilience of LLMs under Clinical Pressure

当正确信念崩溃：LLMs在临床压力下的认知韧性

Boyu Xiao, Xiuqi Tian, Xuwen Song, Haochun Wang, Guanchun Song, Sendong Zhao, Bing Qin

发表机构 * Research Center for Social Computing and Interactive Robotics, Harbin Institute of Technology, China（哈尔滨工业大学社会计算与交互机器人研究中心，中国）

AI总结研究LLMs在临床对话中面对逐步升级压力时信念稳定性问题，提出Med-Stress压力测试框架，发现知识-韧性差距，并设计RBED和R-FT方法提升鲁棒性。

Comments ACL 2026

详情

AI中文摘要

尽管在医学基准测试中准确率很高，但LLMs在临床对话中可能表现出严重的多轮谄媚行为，在逐步升级的压力下放弃最初正确的诊断。我们提出了Med-Stress，一个针对性的压力测试框架，用于评估在逐步升级压力下的信念稳定性。在九个前沿大型语言模型（LLMs）中，我们发现医学知识与鲁棒性之间存在明显的分离：高初始诊断能力并不意味着高信念稳定性，导致多个LLMs存在较大的知识-鲁棒性差距。为了缓解这种失败模式，我们提出了一种轻量级的推理时防御方法RBED（Role-Based Epistemic Defense），以及一种训练时方法R-FT（Resilience-oriented Fine-Tuning），该方法内化了基于证据的抗压能力。实验表明，R-FT几乎消除了信念变化，并显著提高了鲁棒性。

英文摘要

Despite strong medical benchmark accuracy, LLMs can exhibit severe multi-turn sycophancy in clinical dialogue, abandoning an initially correct diagnosis under escalating pressure. We propose Med-Stress, a targeted stress-test framework that evaluates belief stability under escalating pressure. Across nine frontier large language models (LLMs), we find a clear dissociation between medical knowledge and robustness: high initial diagnostic capability does not imply high belief stability, yielding large knowledge-robustness gaps for several LLMs. To mitigate this failure mode, we propose a lightweight inference-time defense, RBED (Role-Based Epistemic Defense), and R-FT (Resilience-oriented Fine-Tuning), a training-time approach that internalizes evidence-based resistance to pressure. Experiments show that R-FT nearly eliminates belief change and substantially improves robustness.

2605.23930 2026-05-26 cs.AI cs.LG cs.MA 版本更新

Quantum Frog: Emergent Cooperation and Difficulty Scaling in a Quantized-Time Cooperative Game

量子青蛙：量化时间合作博弈中的涌现合作与难度缩放

Saad Mankarious

AI总结通过强化学习分析量化时间合作博弈Quantum Frog，发现同步冲刺策略最优，合作训练可大幅提升成功率并缩短回合步数。

详情

AI中文摘要

我们引入了Quantum Frog，这是一个双人合作游戏，基于一种新颖的“量化时间”机制，其中环境仅在玩家行动时推进。受经典街机游戏Frogger启发，Quantum Frog要求两只青蛙穿越一个8×8的交通网格并一起到达远端。我们使用强化学习（RL）作为分析镜头来回答四个设计问题：（1）游戏难度如何随交通密度缩放，（2）最优单智能体策略是什么以及为什么，（3）独立和合作双智能体游戏之间的合作差距有多大，以及（4）当智能体被激励合作时会出现什么联合策略？我们通过五个升级阶段训练智能体：表格Q学习、深度Q网络（DQN）、独立DQN（IDQN）和多智能体近端策略优化（MAPPO，带有集中式评论家），针对一到六辆车的交通密度进行评估。我们的主要发现是：（i）量化时间机制使得“冲刺策略”（每一步直接向上移动）普遍最优，因为暴露于交通的时间被最小化；（ii）添加一个不协调的第二玩家比将单个专家玩家的交通量增加到六倍更难；（iii）合作训练相对于独立智能体将联合成功率提高了+32–34个百分点，并将回合长度从约90步减少到约6步；（iv）涌现的合作策略是同步冲刺，而不是复杂的位置协调，这表明在时间关键的合作任务中，仅共享激励就足以使智能体对齐。这些发现为Quantum Frog的商业设计提供了具体、经验基础的指导，并为环境机制在塑造多智能体学习动态中的作用提供了更广泛的见解。

英文摘要

We introduce Quantum Frog, a two-player cooperative game built on a novel quantized-time mechanic in which the environment advances only when a player acts. Inspired by the classic arcade game Frogger, Quantum Frog requires two frogs to cross an 8×8 grid of traffic and reach the far side together. We use reinforcement learning (RL) as an analytical lens to answer four design questions: (1) how does game difficulty scale with traffic density, (2) what is the optimal single-agent policy and why, (3) how large is the cooperation gap between independent and cooperative two-agent play, and (4) what joint strategy emerges when agents are incentivised to cooperate? We train agents through five escalating stages, Tabular Q-Learning, Deep Q-Network (DQN), Independent DQN (IDQN), and Multi-Agent Proximal Policy Optimisation (MAPPO with a centralised critic), evaluating each against traffic densities of one to six cars. Our key findings are: (i) the quantized-time mechanic makes a rush strategy (moving directly upward at every step) universally optimal, as time exposure to traffic is minimised; (ii) adding an uncoordinated second player is harder than sextupling the traffic for a single expert player; (iii) cooperative training recovers +32-34 percentage points of joint success rate relative to independent agents and reduces episode length from ~90 to ~6 steps; and (iv) the emergent cooperative strategy is synchronised rushing, not complex positional coordination, illustrating that shared incentives alone suffice to align agents in time-critical cooperative tasks. These findings provide concrete, empirically grounded guidance for the commercial design of Quantum Frog and offer broader insights into the role of environment mechanics in shaping multi-agent learning dynamics.

2605.23926 2026-05-26 cs.AI cs.LG 版本更新

How Much Thinking is Enough? Quantifying and Understanding Redundancy in LLM Reasoning

多少思考才足够？量化和理解LLM推理中的冗余

Zhiyuan Zhai, Xinkai You, Wenjing Yan, Xin Wang

发表机构 * Fudan University（复旦大学）； The Chinese University of Hong Kong（香港中文大学）

AI总结本文通过形式化推理冗余度量，量化了前沿推理模型在数学基准上高达61%-93%的步骤级冗余，并证明这种冗余是长度无关结果奖励的结构性后果，而非模型特定伪影。

详情

AI中文摘要

具备推理能力的大语言模型通过生成长思维链来解决难题，这严重增加了延迟、GPU时间和能耗。粗略检查其轨迹会发现大量重构、验证和循环自省，然而这种深思熟虑中有多少实际上是必要的，从未在大规模上被度量或从第一性原理解释。本文填补了这两个空白。我们直接基于推理模型本身来形式化推理冗余：一条正确轨迹的冗余度是其尾部可被截断的分段步骤的最大比例，即截断后模型π被迫终止思考并输出最终答案时仍能给出正确答案。对四个前沿推理模型和两个数学基准的大规模量化表明，步骤级冗余始终很高：在我们研究的8个（模型，基准）条件下介于61%和93%之间，其中六个条件下中位关键前缀等于单个分段步骤；该发现对评判模型族的选择是稳健的；并且尽管ρ在MATH-500上随问题难度增加而降低，所有四个模型即使在最难的Level-5问题上仍然显著冗余（ρ∈[46%,85%]）。然后我们证明这种冗余是长度无关结果奖励的结构性后果，而非模型特定伪影：在任何此类奖励下，没有有限期望停止时间是最优的。该结果无论RL算法、基础模型、数据分布或策略是通过RL还是蒸馏获得均成立；因此过度思考不是需要在单个模型中修补的缺陷，而是当前推理模型训练方式的结构性属性。代码：https://github.com/zhiyuanZhai20/how-much-thinking-is-enough

英文摘要

Reasoning-capable large language models solve hard problems by emitting long chains of thought, paying heavily in latency, GPU time, and energy. Casual inspection of their traces reveals extensive reformulation, verification, and circular self-reflection, yet how much of this deliberation is actually necessary has never been measured at scale or explained from first principles. This paper closes both gaps. We formalise reasoning redundancy directly in terms of the reasoning model itself: the redundancy $\rho$ of a correct trace is the largest fraction of its trailing segmented steps that can be truncated while the model $\pi$, forced to terminate thinking and emit a final answer, still produces the correct answer. A large-scale quantification across four frontier reasoning models and two mathematical benchmarks shows that step-level redundancy is consistently high -- between 61% and 93% across the 8 (model, benchmark) conditions we study, with the median critical prefix equal to a single segmented step in six of the eight conditions -- that the finding is robust to the choice of judge family, and that although $\rho$ decreases with problem difficulty on MATH-500, all four models remain substantially redundant ($\rho \in [46\%, 85\%]$) even on the hardest Level-5 problems. We then prove that this redundancy is a structural consequence of length-agnostic outcome rewards, not a model-specific artefact: under any such reward, no finite expected stopping time is optimal. The result holds regardless of RL algorithm, base model, data distribution, or whether the policy is obtained via RL or distillation; over-thinking is therefore not a bug to be patched in individual models but a structural property of how current reasoning models are trained. Code: https://github.com/zhiyuanZhai20/how-much-thinking-is-enough
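
The redundancy definition in the abstract can be illustrated with a toy computation. This is a sketch under stated assumptions, not the authors' code (their repository is linked in the abstract): the function name `redundancy`, the oracle `answers_correctly`, and the example trace are hypothetical, and the sketch assumes that at least one step is always kept. In the paper's setting the oracle would be the reasoning model forced to stop thinking and answer from the truncated prefix.

```python
def redundancy(steps, answers_correctly):
    """Largest fraction of trailing steps that can be cut while the
    forced final answer stays correct (assumes at least one step is kept)."""
    n = len(steps)
    for k in range(1, n + 1):            # k = length of the kept prefix
        if answers_correctly(steps[:k]):
            return (n - k) / n           # shortest working prefix found
    raise ValueError("no prefix of the trace yields a correct answer")

# Hypothetical five-step trace; the answer is already fixed after step 2.
trace = [
    "restate the problem",
    "solve: x = 4",
    "verify by substitution",
    "re-derive another way",
    "reflect once more",
]
oracle = lambda prefix: any("x = 4" in step for step in prefix)

rho = redundancy(trace, oracle)
print(rho)  # 0.6
```

Here the critical prefix has two steps, so three of five steps (60%) count as redundant under this reading of the definition.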

2605.23922 2026-05-26 cs.CY cs.AI cs.LG 版本更新

High-Risk AI Systems and the Problem of Identity in the European AI Act

高风险人工智能系统与欧洲人工智能法案中的身份问题

Andrea Ferrario

发表机构 * Institute of Biomedical Ethics and History of Medicine, University of Zürich（苏黎世大学生物医学伦理与医学史研究所）； SUPSI, Dalle Molle Institute for Artificial Intelligence (IDSIA)（SUPSI，达勒莫利人工智能研究所）； ETH Zürich（苏黎世联邦理工学院）

AI总结本文通过功能+框架分析欧盟AI法案中高风险AI系统的身份认定问题，提出同步身份测试方法以支持监管审计。

Comments Accepted as a non-archival paper at The 2026 ACM Conference on Fairness, Accountability, and Transparency (FAccT '26), June 25-28, 2026, Montreal, QC, Canada

详情

AI中文摘要

欧盟人工智能法案（AIA）为高风险AI系统建立了一个生命周期治理制度，围绕事前合规评估、上市后监测以及在“重大修改”时重新评估。这些义务预设了AI身份判断：监管机构和提供者必须决定更新后的系统是否随时间保持同一系统。在这项工作中，我们展示了如何通过人工制品身份的功能+框架来澄清这一逻辑，该框架通过预期功能以及适当功能的上下文相关标准（即“AI可信度”）来个体化AI系统。我们进一步论证，AIA没有为同步身份提供内部、可审计的标准——即在给定时间两个AI系统在监管目的上是否应被视为相同——而是基本上将这种相同性判断委托给部门或协调工具。功能+提供了一个以预期功能和可信度概况及水平为基础的同步身份测试，使得同步身份决策在采购、责任和市场监督等治理环境中可检查。我们的贡献是一个概念性和审计视角：我们提供了AIA生命周期义务与功能+身份组件之间的对应图，并通过一个用于审计和争议情境的最小决策流程使同步案例在操作上清晰可读。最后，我们提出两个面向实施的建议：（1）更精确、可测试的预期用途报告，以及（2）标准化、可审计的可信度报告，支持跨时间和跨部署的可比性。

英文摘要

The EU Artificial Intelligence Act (AIA) establishes a lifecycle governance regime for high-risk AI systems built around ex-ante conformity assessment, post-market monitoring, and re-assessment upon "substantial modification." These obligations presuppose AI identity judgments: regulators and providers must decide when an updated system remains the same system over time. In this work, we show how this logic is clarified by the function+ framework of artifact identity, which individuates AI systems by their intended function together with context-sensitive criteria of appropriate functioning, captured as "AI trustworthiness." We further argue that the AIA does not provide an internal, auditable criterion for synchronic identity--when two AI systems at a given time should count as the same for regulatory purposes--and instead largely defers such sameness determinations to sectoral or harmonization instruments. function+ supplies a synchronic identity test anchored in intended function and trustworthiness profiles and levels, making synchronic identity decisions inspectable in governance settings such as procurement, liability, and market surveillance. Our contribution is a conceptual and auditing lens: we provide a correspondence map between AIA lifecycle obligations and function+ identity components, and we make the synchronic case operationally legible via a minimal decision flow for audit and dispute contexts. We conclude with two implementation-facing recommendations: (1) more precise, testable reporting of intended purpose, and (2) standardized, auditable trustworthiness reporting that supports comparability over time and across deployments.

2605.23918 2026-05-26 cs.DC cs.LG cs.PF 版本更新

The Model Parking Tax: Quantifying the Hidden Energy Cost of Always-On GPU Model Deployment

模型停放税：量化始终在线GPU模型部署的隐藏能耗成本

Sai Sathvik Vadari

AI总结通过跨架构测量，发现GPU空闲功耗由DVFS状态决定，而非显存占用，CUDA上下文贡献了超过98%的停放税，并建立了冷启动盈亏平衡模型。

Comments 7 pages, 3 figures, 5 tables

详情

AI中文摘要

AI推理行业将模型全天候加载在GPU内存中以避免冷启动延迟，隐含地将空闲功耗视为准备就绪的固定成本。然而，这种成本的结构从未被经验性地分解——也从未跨GPU架构进行过。我们首次跨架构测量了空闲GPU功耗作为VRAM分配的函数，结合了18天的生产遥测数据（335,267个样本，14个H100 GPU）以及在三种GPU架构（涵盖三种内存技术：NVIDIA H100（HBM3，80 GB）、A100（HBM2e，80 GB）和L40S（GDDR6，48 GB））上进行的受控剂量-反应实验。我们观察到，在所有三种架构上，空闲功耗是分段常数：CUDA上下文强制进行离散的DVFS转换，消耗比裸空闲多26-66 W（HBM架构上为26-50 W，GDDR6上为66 W），而边际VRAM效应在所有测试设备上均低于测量相关性（|β| < 0.02 W/GB）。无论内存技术如何，CUDA上下文占停放税的98%以上。我们通过在所有三种架构上运行真实的HuggingFace模型（Qwen2.5-7B）验证了这一发现，确认每个设备上空张量与模型加载之间的功耗差异小于0.5 W，并捕获了模型加载期间的冷启动功耗曲线。我们推导出一个冷启动盈亏平衡模型，表明能量最优行为取决于请求到达率和加载延迟——而非模型大小——盈亏平衡间隔为1-5分钟。我们的结果确定了一个在所有测试架构上一致的约束：带上下文的空闲功耗由DVFS状态决定，而非内存占用。

英文摘要

The AI inference industry keeps models loaded in GPU memory around the clock to avoid cold-start latency, implicitly treating idle power as a fixed cost of readiness. Yet the structure of this cost has never been empirically decomposed - and never across GPU architectures. We present the first cross-architecture measurement of idle GPU power as a function of VRAM allocation, combining 18 days of production telemetry (335,267 samples, 14 H100 GPUs) with controlled dose-response experiments on three GPU architectures spanning three memory technologies: NVIDIA H100 (HBM3, 80 GB), A100 (HBM2e, 80 GB), and L40S (GDDR6, 48 GB). We observe that idle power is piecewise constant on all three architectures: the CUDA context forces a discrete DVFS transition consuming +26-66 W over bare idle (26-50 W on HBM architectures, 66 W on GDDR6), while the marginal VRAM effect is bounded below measurement relevance ($|\beta| < 0.02$ W/GB) on every device tested. The CUDA context accounts for >98% of the parking tax regardless of memory technology. We validate this finding with a real HuggingFace model (Qwen2.5-7B) on all three architectures, confirming <0.5 W difference from empty tensors on every device, and capture cold-start power profiles during model loading. We derive a cold-start breakeven model showing energy-optimal behavior depends on request arrival rate and loading latency - not model size - with breakeven intervals of 1-5 minutes. Our results identify a constraint consistent across all tested architectures: idle-with-context power is determined by DVFS state, not memory occupancy.
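
A back-of-the-envelope version of the breakeven reasoning is sketched below. The abstract does not give the paper's actual model, so the simple energy balance here is an assumption: keeping a model parked costs the extra context power for the whole idle gap, while unloading costs the energy of one reload. The 50 W parking figure lies inside the reported 26-66 W range; the load power and load time are hypothetical values chosen only for illustration.

```python
def breakeven_interval_s(parking_power_w, cold_start_energy_j):
    """Idle gap (seconds) at which parking energy equals one reload's energy.

    Assumed balance: parking_power_w * t == cold_start_energy_j.
    """
    return cold_start_energy_j / parking_power_w

parking_power_w = 50.0    # extra idle draw with a CUDA context (within 26-66 W)
load_power_w = 200.0      # hypothetical extra draw while loading the model
load_time_s = 30.0        # hypothetical loading latency

cold_start_energy_j = load_power_w * load_time_s          # 6000 J
t_star = breakeven_interval_s(parking_power_w, cold_start_energy_j)
print(t_star / 60)        # 2.0 (minutes)
```

Under these assumed numbers, gaps between requests longer than about two minutes would favour unloading on energy grounds, which is at least consistent in scale with the 1-5 minute breakeven intervals quoted in the abstract. Note that model size does not appear in the balance, only parking power and reload energy.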

2605.23909 2026-05-26 cs.AI cs.LG 版本更新

Confidence Calibration in Large Language Models

大型语言模型中的置信度校准

Noam Michael, Daniel BenShushan, Jacob Bien, Don A. Moore

发表机构 * U.C. Berkeley（加州大学伯克利分校）； University of Southern California（南加州大学）

AI总结通过预注册研究，发现大型语言模型（LLMs）的置信度普遍高于准确率，且存在显著的难易效应：困难测试中过度自信，简单测试中信心不足，并提出了LifeEval测试用于评估不同难度下的模型校准。

2605.22800 2026-05-26 cs.LG cs.AI stat.ML 版本更新

The Matching Principle: A Geometric Theory of Loss Functions for Nuisance-Robust Representation Learning

匹配原则：面向干扰鲁棒表示学习的损失函数几何理论

Vishal Rajput

发表机构 * KU Leuven（鲁汶大学）

AI总结提出匹配原则，通过估计任务协方差矩阵并匹配惩罚矩阵的像空间，统一了多种鲁棒性方法，并在线性高斯模型中证明最优性。

Comments 58 pages, 13 pre-specified empirical blocks. v2: partial-pass framing, geometry-task dissociation, T2B protocol v3, layout/figure fixes; core theorems unchanged. Code: matching-pmh (PyPI). Related note: arXiv:2604.21395

详情

AI中文摘要

鲁棒性、领域自适应、光度/遮挡不变性、传感器漂移和对齐风格被视为独立的文献领域，拥有各自独立的方法族。在标签保持的部署偏移下，它们共享一个几何对象：协方差 Sigma_task = Cov_{Q_n}(n)，即输入在标签不变的情况下可以变化的方式。CORAL、对抗训练、数据增强、度量学习、雅可比惩罚和对齐约束并非独立的技巧——它们都是 Sigma_task 的估计量。固定该对象后，雅可比惩罚由一个矩阵 Sigma' 确定，其像空间必须覆盖 range(Sigma_task)——即匹配原则。我们在线性高斯模型中证明了最优性（定理A），证明了任何能够消除部署漂移的二次惩罚都需要像空间覆盖（定理G），并在全局最小值处证明了相同的二分性（定理A*_global）。错误方向/信号对齐控制（引理C；推论E/E*）以及七个估计量（引理D1-D7），加上无标签TDI，为需要学习 Sigma_task 的情况提供了可证伪的配方。在十三个模块（从ML到Qwen2.5-7B）上，测试了匹配的、各向同性的和错误方向的惩罚对几何和部署漂移的影响。其中十二个模块与可识别性成立的理论一致；Office-31是一个命名的特征间隙失败案例。部分通过：几何可以在不改善每个头条任务指标的情况下提升。一次初步的7B DPO运行（一个epoch，240对）：匹配风格-PMH保持了风格TDI，而标准DPO则使其退化。我们不声称标准训练达到全局最小值（假设(O)是开放的），不声称估计的 Sigma_task 总是可识别的，也不声称在每个排行榜上占优。我们提出一个可证伪的设计配方：估计 Sigma_task，匹配 Sigma'，运行控制，分别报告任务和几何指标。

英文摘要

Robustness, domain adaptation, photometric/occlusion invariance, sensor drift, and alignment style are treated as separate literatures with separate method families. Under label-preserving deployment shift they share one geometric object: the covariance Sigma_task = Cov_{Q_n}(n) of ways inputs can change without changing the label. CORAL, adversarial training, augmentation, metric learning, Jacobian penalties, and alignment constraints are not independent tricks--they are estimators of Sigma_task. Fix that object and the Jacobian penalty is pinned by a matrix Sigma' whose range must cover range(Sigma_task)--the matching principle. We prove optimality in a linear-Gaussian model (Thm. A), necessity of range coverage for any quadratic penalty that zeros deployment drift (Thm. G), and the same dichotomy at global minima (Thm. A*_global). Wrong-direction/signal-aligned controls (Lemma C; Cor. E/E*) and seven estimators (Lemmas D1--D7), plus label-free TDI, yield a falsifiable recipe when Sigma_task must be learned. Thirteen blocks (ML through Qwen2.5-7B) test matched vs isotropic vs wrong-direction penalties on geometry and deployment drift. Twelve match theory where identifiability holds; Office-31 is a named eigengap failure. Partial passes: geometry can improve without every headline task metric moving. A pilot 7B DPO run (one epoch, 240 pairs): matched style-PMH preserves Style TDI where standard DPO degrades it. We do not claim standard training reaches global minima (assumption (O) is open), that estimated Sigma_task is always identifiable, or dominance on every leaderboard. We claim a falsifiable design recipe: estimate Sigma_task, match Sigma', run the controls, report task and geometry separately.

2605.22532 2026-05-26 cs.LG 版本更新

Relational Linear Properties in Language Models: An Empirical Investigation

语言模型中的关系线性性质：一项实证研究

Giovanni Valer, Luigi Gresele, Marco Bronzini, Emanuele Marconato

发表机构 * University of Copenhagen（哥本哈根大学）； University of Trento（特伦托大学）； University of Bologna（博洛尼亚大学）； University of Pisa（比萨大学）

AI总结本文提出基于KL散度的探针方法，实证检验语言模型中关系线性假设（即固定关系下对象解嵌入可由主体嵌入线性映射预测），发现其随模型、层和关系表述变化。

详情

AI中文摘要

线性性质在语言模型的表示中普遍存在；然而，实验性地测试它们仍然是一项具有挑战性的任务。本文聚焦于关系线性：即对于固定关系（例如“演奏”），对象的解嵌入（例如“小号”）可以通过线性映射从其主体（例如“迈尔斯·戴维斯”）的嵌入预测。我们提出了一种实验方法，用于测试Marconato等人（2025）提出的关系线性公式。具体而言，我们引入了一种基于KL散度的探针方法来评估这一性质，并考察其在不同层和不同表述的关系查询中的变化。该方法也比先前工作更高效；例如，它避免了Hernandez等人（2024）在线性关系嵌入中使用的粗略雅可比近似。我们在四个数据集上的发现表明，关系线性在不同模型间存在差异，展现出与先前关于模型表示中语言信息的观察一致的逐层模式，并且受关系表述方式变化的影响不同。

英文摘要

Linear properties are ubiquitous in the representations of language models; however, testing them experimentally remains a challenging task. This work focuses on relational linearity: the hypothesis that, for a fixed relation (e.g., "plays"), the unembedding of an object (e.g., "trumpet") can be predicted from the embedding of its subject (e.g.,"Miles Davis") by a linear map. We present an experimental method to test the formulation of relational linearity by Marconato et al. (2025). Specifically, we introduce a probing method, based on Kullback-Leibler divergence, to evaluate this property and examine its variation across layers and paraphrased relational queries. It is also more efficient than previous work; for example, it avoids the crude Jacobian approximations used in Linear Relational Embeddings by Hernandez et al. (2024). Our findings across four datasets show that relational linearity varies across models, exhibits layer-wise patterns consistent with prior observations about linguistic information in model representations, and is differently affected by changes in how the relation is phrased.
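
To make the probing idea concrete, the sketch below compares a model's next-token distribution with the distribution implied by a linear map from the subject embedding, using Kullback-Leibler divergence. It is an illustration under assumptions rather than the paper's method: the toy embeddings, the map `A`, the object vocabulary, and the direction of the divergence (model relative to probe) are all hypothetical choices.

```python
import math

def softmax(logits):
    m = max(logits)                               # subtract max for stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

# Hypothetical 2-d setup for the relation "plays".
subject_embedding = [1.0, 0.0]                    # e.g. "Miles Davis"
A = [[2.0, 0.0], [0.0, 1.0]]                      # candidate linear map
unembeddings = {"trumpet": [1.0, 0.0], "piano": [0.0, 1.0], "guitar": [-1.0, 0.0]}

predicted = matvec(A, subject_embedding)
probe_logits = [sum(u * v for u, v in zip(vec, predicted)) for vec in unembeddings.values()]
model_logits = [3.0, 0.0, -2.0]                   # hypothetical model output

p_model = softmax(model_logits)
p_probe = softmax(probe_logits)
divergence = kl_divergence(p_model, p_probe)
print(divergence)
```

A divergence near zero would indicate that the linear map reproduces the model's distribution over objects for this subject; larger values indicate a poorer linear fit.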

2605.22005 2026-05-26 cs.LG cs.AI cs.CL 版本更新

Check Your LLM's Secret Dictionary! Five Lines of Code Reveal What Your LLM Learned (Including What It Shouldn't Have)

检查你的大语言模型的秘密词典！五行代码揭示你的大语言模型学到了什么（包括它不应该学到的）

Hisashi Miyashita

发表机构 * Mgnite Inc.（Mgnite公司）

AI总结通过对lm_head权重矩阵进行奇异值分解（仅需五行PyTorch代码且无需模型推理），直接从模型权重中揭示可解释的语义子空间，并发现模型训练数据组成和策展哲学。

详情

AI中文摘要

我们展示了基于Transformer的大语言模型的lm_head权重矩阵的奇异值分解——仅需五行PyTorch代码且无需模型推理——直接从模型权重中揭示可解释的语义子空间。每个左奇异向量识别出当隐藏状态与相应奇异方向对齐时最容易被选中的词汇标记；检查这些聚类揭示了模型的训练数据组成和策展哲学。分析GPT-OSS-120B、Gemma-2-2B和Qwen2.5-1.5B，我们发现奇异值谱和词汇聚类结构在不同模型间存在系统性差异：GPT呈现出功能分化子空间的渐进层次；Gemma以19世纪前的英语正字法为主，形成阶梯式聚类结构，这可能有助于高输出可控性；Qwen展现出广泛的多语言覆盖，同时其子空间的词汇被作者认为在伦理上不适合直接发表。基础-指令对比表明，伦理上令人担忧的子空间源自预训练，并且不会被后训练对齐移除。我们引入词汇聚类得分（VCS）来量化子空间一致性，以及加权投影得分（WPS）作为静态故障标记检测器；将WPS应用于GPT-OSS-120B，无需任何模型推理即可恢复shokubutsu-hyakka-tsu（ID 137606），这是CJK语言社区中广泛报道的一个著名故障标记。我们提出了问题词汇内容根本原因的分类法，并呼吁将lm_head SVD分析作为标准发布前安全审计步骤。我们的发现进一步指出了SVD引导的分词器优化和更可控的大语言模型设计方向。

英文摘要

We show that singular value decomposition of the lm_head weight matrix of a transformer-based large language model -- requiring only five lines of PyTorch and no model inference -- reveals interpretable semantic subspaces directly from the model weights. Each left singular vector identifies the vocabulary tokens most readily selected when the hidden state aligns with the corresponding singular direction; inspecting these clusters exposes the model's training data composition and curation philosophy. Analysing GPT-OSS-120B, Gemma-2-2B, and Qwen2.5-1.5B, we find that singular value spectra and vocabulary cluster structures differ systematically across models: GPT exhibits a graduated hierarchy of functionally differentiated subspaces; Gemma is dominated by pre-nineteenth-century English orthography, forming a stepwise clustering structure that may contribute to high output controllability; and Qwen exhibits broad multilingual coverage alongside subspaces whose vocabulary the authors have determined to be ethically inappropriate for direct publication. Base-instruct comparison reveals that ethically concerning subspaces originate in pretraining and are not removed by post-training alignment. We introduce the Vocabulary Cluster Score (VCS) to quantify subspace coherence, and the Weighted Projection Score (WPS) as a static glitch token detector; applying WPS to GPT-OSS-120B recovers shokubutsu-hyakka-tsu (ID 137606), a well-known glitch token widely reported in the CJK language community, without any model inference. We propose a taxonomy of root causes for problematic vocabulary content and call for lm_head SVD analysis to be adopted as a standard pre-release safety auditing step. Our findings further suggest directions toward SVD-guided tokenizer optimisation and more controllable LLM design.
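
The abstract mentions five lines of PyTorch but does not list them here, so the sketch below is a dependency-free stand-in rather than the authors' code. It uses power iteration in plain Python to approximate the top left singular vector of a toy matrix shaped like an `lm_head` weight (vocabulary rows, hidden columns) and then ranks tokens by the magnitude of their entries. The vocabulary, the weights, and the helper name are hypothetical; on a real model one would more likely call a library routine such as `torch.linalg.svd` on the actual weight.

```python
import math

def top_left_singular_vector(W, iters=100):
    """Power iteration on W^T W; returns (sigma, u) for the largest singular value."""
    rows, cols = len(W), len(W[0])
    v = [1.0 / math.sqrt(cols)] * cols                     # start in hidden space
    for _ in range(iters):
        Wv = [sum(W[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        WtWv = [sum(W[i][j] * Wv[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in WtWv))
        v = [x / norm for x in WtWv]
    Wv = [sum(W[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in Wv))
    return sigma, [x / sigma for x in Wv]                  # u lives in vocab space

# Toy "lm_head": 6 tokens x 3 hidden dims, with an animal cluster on dim 0.
vocab = ["cat", "dog", "one", "two", "the", "of"]
W = [
    [3.0, 0.0, 0.0],
    [2.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.1, 0.0],
    [0.0, 0.0, 0.2],
    [0.0, 0.0, 0.1],
]

sigma, u = top_left_singular_vector(W)
ranked = sorted(zip(vocab, u), key=lambda t: -abs(t[1]))
top_tokens = [tok for tok, _ in ranked[:2]]
print(top_tokens)  # ['cat', 'dog']
```

The tokens with the largest entries in a left singular vector form the kind of cluster the abstract describes inspecting; in this toy case the leading direction isolates the two animal tokens.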


2605.20670 2026-05-26 cs.LG 版本更新

HypergraphFormer: 从大语言模型中学习超图以实现可编辑的楼层平面图生成

Nikita Klimenko, Hesam Salehipour, Parham Eftekhar, Amir Khasahmadi, Ramon Elias Weber

发表机构 * Autodesk Research（Autodesk研究院）； York University（约克大学）； UC Berkeley（加州大学伯克利分校）

AI总结提出HypergraphFormer，利用大语言模型学习超图表示来生成楼层平面图，在RPLAN数据集上超越现有方法，并支持任意边界和高度可编辑性。


AI中文摘要

在这项工作中，我们提出了HypergraphFormer，一种基于大语言模型学习超图表示的新型高效楼层平面图生成方法。该模型通过监督微调训练，生成基于超图的文本表示，编码楼层平面图中的空间关系和连通性信息。我们在RPLAN数据集上训练和评估我们的方法，并进一步在本文发布的一个独立的分布外数据集上展示其泛化能力。我们的方法在多种指标上优于基于栅格化或向量化表示的最先进技术。我们还展示了改进的数据效率，特别是在分布偏移下。超图公式通过将公寓足迹与其功能和几何细分解耦，使得能够为任意、不规则、用户指定的边界生成楼层平面图。此外，我们展示了所提出的方法具有高度的可编辑性，使其特别适合由大语言模型支持的设计导向工作流程。

英文摘要

In this work, we propose HypergraphFormer, a novel and efficient approach to floor plan generation based on learning hypergraph representations with a large language model (LLM). The model is trained via supervised fine-tuning to generate a hypergraph-based textual representation that encodes spatial relationships and connectivity information within floor plans. We train and evaluate our approach on the RPLAN dataset, and further demonstrate its generalizability on a separate out-of-distribution dataset, which we release in this paper. Our method outperforms state-of-the-art techniques based on rasterized or vectorized representations across a diverse set of metrics. We also show improved data efficiency, particularly under distribution shift. The hypergraph formulation enables the generation of floor plans for arbitrary, irregular, user-specified boundaries by decoupling apartment footprints from their functional and geometric subdivisions. Furthermore, we show that the proposed methodology offers a high degree of editability, making it particularly well suited to design-oriented workflows supported by LLMs.


2605.18224 2026-05-26 cs.LG cs.AI 版本更新

A Simplex Witness Certificate for Constant Collapse in Variational Autoencoders

变分自编码器中恒定坍缩的单纯形见证证书

Zegu Zhang, Jianhua Peng, Jian Zhang

发表机构 * Independent Researcher（独立研究者）； School of Computing, Southeast University（东南大学计算机学院）

AI总结提出一种基于GMM教师后验和单纯形见证的证书，用于检测和量化VAE编码器均值是否发生输入无关的恒定坍缩，并在MNIST、CIFAR-10和CIFAR-100上验证了方法有效性。


AI中文摘要

我们研究变分自编码器中的精确恒定坍缩：确定性编码器均值变得与输入无关。先验保持为标准高斯分布。在VAE训练之前，我们从基于GMM的数据视角选择一个固定的教师后验，并将一个固定的仅潜在空间单纯形见证附加到编码器均值上。这种构造产生两个关联对象。第一个是证书：如果见证预测优于教师的最佳恒定预测器，则编码器均值不能是输入无关的常数。第二个是局部逃逸方向：在坍缩流形上，教师残差为对齐损失提供样本相关的下降方向。对于任何全支撑的教师后验，相同的几何结构也给出一个具有零教师-见证对齐误差的闭式潜在码。其缩放版本追踪一条从恒定预测器到精确教师码的边际能量路径，该路径量化了受保护见证子空间内的非坍缩。我们在MNIST、CIFAR-10和CIFAR-100上实例化了该方法。使用搜索的无监督PCA-GMM教师，在CIFAR-10和CIFAR-100上，所有五个种子的普通VAE均未通过教师-见证证书，而RST变体在所有五个种子中均通过。在坍缩压力设置下（β_KL ∈ {2,4,8}），普通VAE再次在所有种子中失败，而RST-alpha-prefit保持证书阳性。在两个自然图像数据集上的逃逸轨迹从低边际初始化开始增加见证边际，并表现出非零的教师诱导梯度范数。该分析仅限于编码器均值的精确恒定坍缩；生成质量、解码器使用和其他坍缩模式仍是独立的问题。

英文摘要

We study exact constant collapse in variational autoencoders: the deterministic encoder mean becomes independent of the input. The prior remains the standard Gaussian. Before VAE training, we select a fixed teacher posterior from a GMM-based view of the data and attach a fixed latent-only simplex witness to the encoder mean. This construction yields two linked objects. The first is a certificate: if the witness prediction improves on the best constant predictor of the teacher, the encoder mean cannot be an input-independent constant. The second is a local escape direction: on the collapsed manifold, the teacher residual gives a sample-dependent descent direction for the alignment loss. For any full-support teacher posterior, the same geometry also gives a closed-form latent code with zero teacher-witness alignment error. Its scaled versions trace a margin-energy path from the constant predictor to the exact teacher code, which quantifies non-collapse inside the protected witness subspace. We instantiate the method on MNIST, CIFAR-10, and CIFAR-100. With searched unsupervised PCA-GMM teachers, vanilla VAEs fail the teacher-witness certificate in all five seeds on CIFAR-10 and CIFAR-100, while RST variants pass in all five seeds. Under collapse-stress settings with $\beta_{\mathrm{KL}} \in \{2,4,8\}$, vanilla VAE again fails in all seeds, whereas RST-alpha-prefit remains certificate-positive. Escape trajectories on both natural-image datasets increase the witness margin from a low-margin initialization and exhibit nonzero teacher-induced gradient norms. The analysis is confined to exact constant collapse of the encoder mean; generation quality, decoder use, and other collapse modes remain separate questions.


2605.17606 2026-05-26 cs.LG 版本更新

The Neural Tangent Kernel for Classification

分类问题的神经正切核

Jonathan Plenk, Sergio Calvo-Ordonez, Alvaro Cartea, Yarin Gal, Mark van der Wilk, Kamil Ciosek

发表机构 * Mathematical Institute, University of Oxford（牛津大学数学研究所）； Oxford-Man Institute of Quantitative Finance, University of Oxford（牛津大学量化金融研究所）； OATML, University of Oxford（牛津大学OATML研究中心）； Department of Computer Science, University of Oxford（牛津大学计算机科学系）； Spotify

AI总结本文通过识别宽神经网络在分类损失下保持懒惰训练的条件，将神经正切核理论扩展到分类问题，并分析了参数正则化对核常数性的影响以及预测器分布与贝叶斯方法的关系。

Comments Preprint


AI中文摘要

在宽神经网络中，神经正切核（NTK）在训练过程中近似保持常数，为研究训练动态、泛化以及与核方法的联系提供了强大的理论工具。然而，该理论主要局限于回归损失。先前认为，在分类损失或更一般涉及非线性输出变换的损失上训练会破坏这一性质，导致logits发散和线性化失效。本文通过识别宽神经网络保持懒惰训练机制的条件，将NTK理论扩展到分类问题。我们表明，参数空间正则化确保了交叉熵损失下训练过程中NTK的常数性，而在无正则化的情况下，当目标非退化（即所有类别具有严格正概率）时，该机制得以恢复。在这些条件下，训练可由线性化模型很好地近似，从而基于NTK得到解的显式刻画。我们进一步分析了随机初始化引起的训练预测器分布，并将这种模型不确定性的概念与贝叶斯方法联系起来。

英文摘要

In wide neural networks, the Neural Tangent Kernel (NTK) remains approximately constant during training, providing a powerful theoretical tool for studying training dynamics, generalization, and connections to kernel methods. However, this theory is largely restricted to regression losses. It was previously thought that training on a classification loss, or more generally losses involving nonlinear output transformations, breaks this property, leading to divergent logits and a breakdown of the linearization. In this paper, we extend NTK theory to classification by identifying conditions under which wide neural networks remain in the lazy training regime. We show that parameter-space regularization ensures a constant NTK during training for cross-entropy loss, while in the absence of regularization the regime is recovered when targets are non-degenerate, i.e. when all classes have strictly positive probability. Under these conditions, training is well-approximated by the linearized model, yielding an explicit characterization of the solution in terms of the NTK. We further analyze the distribution of trained predictors induced by random initialization and relate this notion of model uncertainty to Bayesian methods.
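As a rough illustration of the object this abstract discusses, the following Python sketch computes an empirical NTK entry for a tiny two-layer network with scalar input and output. The network form f(x) = (1/sqrt(m)) * sum_i a_i * tanh(w_i * x), the width, and the function names are assumptions made for this example and are not taken from the paper. The sketch only evaluates the kernel at initialization; it does not demonstrate the constancy-during-training result.

```python
import math
import random


def param_gradient(x, a, w):
    """Gradient of f(x) = (1/sqrt(m)) * sum_i a_i * tanh(w_i * x) w.r.t. (a, w)."""
    m = len(a)
    scale = 1.0 / math.sqrt(m)
    grad_a = [scale * math.tanh(wi * x) for wi in w]
    grad_w = [scale * ai * (1.0 - math.tanh(wi * x) ** 2) * x
              for ai, wi in zip(a, w)]
    return grad_a + grad_w


def empirical_ntk(x1, x2, a, w):
    """Empirical NTK entry: inner product of parameter gradients at two inputs."""
    g1 = param_gradient(x1, a, w)
    g2 = param_gradient(x2, a, w)
    return sum(p * q for p, q in zip(g1, g2))


rng = random.Random(0)
m = 512  # hypothetical hidden width
a = [rng.gauss(0.0, 1.0) for _ in range(m)]  # output weights at initialization
w = [rng.gauss(0.0, 1.0) for _ in range(m)]  # input weights at initialization

k_12 = empirical_ntk(0.5, -1.0, a, w)
k_21 = empirical_ntk(-1.0, 0.5, a, w)
k_11 = empirical_ntk(0.5, 0.5, a, w)
```

Recomputing the same entries after some gradient steps and comparing them with the initial values would be one way to probe the lazy-training regime empirically.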


2605.16302 2026-05-26 cs.LG cs.AI cs.CL 版本更新

Reducing Credit Assignment Variance via Counterfactual Reasoning Paths

通过反事实推理路径减少信用分配方差

Fei Ding, Yongkang Zhang, Youwei Wang, Zijian Zeng

发表机构 * Alibaba Group（阿里巴巴集团）； Tsinghua University（清华大学）

AI总结提出反事实比较框架，通过采样多条推理轨迹并利用差异隐式估计过程级优势，将稀疏终端奖励转化为步骤敏感信号，从而改进大语言模型多步推理的信用分配，并引入隐式行为策略优化（IBPO）提升训练稳定性和性能上限。


AI中文摘要

使用大语言模型进行多步推理的强化学习通常依赖于稀疏的终端奖励，这会导致一个条件较差的信用分配问题：最终反馈均匀地传播到所有中间决策。这导致高梯度方差、不稳定的训练和许多无效更新，最终限制了模型的持续改进。我们提出了一种用于信用分配的反事实比较框架。对于每个输入，该框架采样多个推理轨迹，并将它们的差异视为对替代决策的隐式近似。这产生了一个隐式过程级优势估计器，将稀疏终端奖励转化为步骤敏感的学习信号。基于此框架，我们引入了隐式行为策略优化（IBPO），该方法在数学和代码推理基准上显著提高了训练稳定性和性能上限。我们的结果为释放大语言模型的推理潜力指明了一个有前景的方向。

英文摘要

Reinforcement learning for multi-step reasoning with large language models (LLMs) typically relies on sparse terminal rewards, which creates a poorly conditioned credit-assignment problem: the final feedback is propagated uniformly across all intermediate decisions. This leads to high gradient variance, unstable training, and many ineffective updates, ultimately limiting sustained model improvement. We propose a counterfactual-comparison framework for credit assignment. For each input, the framework samples multiple reasoning trajectories and treats their differences as implicit approximations to alternative decisions. This yields an implicit process-level advantage estimator that converts sparse terminal rewards into step-sensitive learning signals. Building on this framework, we introduce Implicit Behavior Policy Optimization (IBPO), which substantially improves training stability and the performance ceiling on mathematical and code-reasoning benchmarks. Our results point to a promising direction for unlocking the reasoning potential of LLMs.


2605.15433 2026-05-26 cs.LG 版本更新

Spectral Priors vs. Attention: Investigating the Utility of Attention Mechanisms in EEG-Based Diagnosis

频谱先验 vs. 注意力：探究注意力机制在基于脑电图的诊断中的效用

Tawsik Jawad, Gowtham Atluri, Vikram Ravindra

发表机构 * University of Cincinnati（辛辛那提大学）

AI总结本文提出一种基于频带选择的频谱特征构建方法，证明在小型EEG数据集中，传统机器学习模型性能可匹敌或超越SOTA深度学习模型，而注意力机制无法提取稳定的频谱特征。


AI中文摘要

脑电图（EEG）时间序列信号具有显著噪声和粗糙的空间分辨率，这使得神经退行性疾病的分类变得复杂。即使是最先进的深度学习架构，由于组间高度相似性，也难以区分健康对照和患病受试者，或不同疾病类型。在本文中，我们展示了一种频谱选择性特征构建方法能够增强类别可分性。通过隔离主要脑波频带内的信号强度，我们将高维原始数据转化为高价值的频谱特征。我们的结果表明，在小型数据集中：a) 从频域和时频域导出的特征使传统机器学习模型能够匹配或超越最先进深度学习模型的性能；b) 注意力机制无法提取表征健康神经活动的稳定特征签名，无论是在静息态还是任务态EEG中；c) 基于注意力的模型在寻找相关频谱特征方面的局限性似乎是稳健的，因为提供频率选择性时域输入并未显著改善其性能。我们在三个开源静息态EEG数据集和一个任务态EEG数据集上验证了我们的方法，为我们的主张提供了强有力的经验证据。

英文摘要

Electroencephalography (EEG) time-series signals are characterized by significant noise and coarse spatial resolution, which complicates the classification of neurodegenerative diseases. Even SOTA deep learning architectures struggle to distinguish between healthy controls and diseased subjects, or between different disease types, due to high intergroup similarity. In this paper, we show that a spectrally selective approach to feature construction enhances class separability. By isolating signal strengths within the primary brainwave bands, we transform high-dimensional raw data into high-value spectral features. Our results demonstrate that in small datasets a) features derived from the frequency and time-frequency domains allow traditional machine learning models to match or exceed the performance of SOTA deep learning models, b) the attention mechanism is unable to distill the stable feature signatures that characterize healthy neural activity in both resting and task EEGs, and c) the limitations of attention-based models in finding relevant spectral features appear to be robust, in that providing frequency-selective time-domain input does not appreciably improve their performance. We validate our methodology across three open-source resting EEG datasets and one task EEG dataset, providing robust empirical evidence for our claims.
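To make the idea of band-limited spectral features concrete, here is a small Python sketch that sums spectral power inside conventional EEG bands. The band edges, the sampling rate, the synthetic 10 Hz signal, and the function names are assumptions chosen for illustration; the abstract does not specify the authors' exact feature pipeline. The naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math

# Conventional EEG band edges in Hz; exact boundaries vary across studies.
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}


def power_spectrum(signal, fs):
    """Naive DFT power per frequency bin (adequate for short toy signals)."""
    n = len(signal)
    spectrum = {}
    for k in range(n // 2 + 1):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(signal))
        spectrum[k * fs / n] = abs(coeff) ** 2 / n
    return spectrum


def band_powers(signal, fs):
    """Sum spectral power inside each band [lo, hi)."""
    spectrum = power_spectrum(signal, fs)
    return {name: sum(p for f, p in spectrum.items() if lo <= f < hi)
            for name, (lo, hi) in BANDS.items()}


fs = 128  # hypothetical sampling rate in Hz
signal = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]  # 10 Hz tone
features = band_powers(signal, fs)
dominant_band = max(features, key=features.get)
```

The resulting dictionary of per-band powers is the kind of low-dimensional feature vector that could be fed to a traditional classifier; a practical pipeline would likely use a library FFT or a Welch estimate per channel instead of this DFT.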


2605.14255 2026-05-26 cs.LG cs.CV 版本更新

Architecture-Aware Explanation Auditing for Industrial Visual Inspection

面向工业视觉检测的架构感知解释审计

Sibo Jia, Zihang Zhao, Kunrong Li

AI总结本文提出一种基于原生读出假设的架构感知解释审计协议，通过扰动实验证明解释方法的忠实度受其与模型原生决策机制的结构距离约束，并揭示忠实度排名是（模型、解释器、扰动算子）三元组的联合属性。

Comments Format update


AI中文摘要

工业视觉检测系统日益依赖深度分类器，其热力图解释可能看似合理，但未能识别真正驱动模型决策的图像区域。本文基于原生读出假设，实现了一种架构感知的解释审计协议：解释方法的基于扰动的忠实度受其与模型原生决策机制的结构距离约束。在WM-811K晶圆图（9类，172k图像）上，采用三种子零填充扰动协议，ViT-Tiny + Attention Rollout的Deletion AUC为0.211，而Swin-Tiny / ResNet18+CBAM / DenseNet121 + Grad-CAM的Deletion AUC为0.432-0.525（|Cohen's d| > 1.1），尽管其分类准确率较低。Swin-Tiny将架构家族与读出结构分离：尽管是Transformer，其空间特征图层次使其与Grad-CAM兼容，表明操作因素是读出结构而非架构家族。一个模型无关的控制方法（RISE）将所有家族的Deletion AUC压缩至约0.1，表明差距源于解释器路径；值得注意的是，RISE优于所有原生方法，因此原生读出是兼容性原则而非最优性保证。模糊填充敏感性分析表明，在不同扰动基线下的家族排序反转，强化了忠实度排名是（模型、解释器、扰动算子）三元组的联合属性。在MVTec AD（预训练模型）上的探索性边界条件研究表明，审计结果依赖于数据集/任务，并识别了需要限定的条件。该协议提供了可操作的指导：解释路径应基于读出结构与模型架构协同设计，部署的热力图应附带定量忠实度指标。

英文摘要

Industrial visual inspection systems increasingly rely on deep classifiers whose heatmap explanations may appear visually plausible while failing to identify the image regions that actually drive model decisions. This paper operationalizes an architecture-aware explanation audit protocol grounded in the native-readout hypothesis: the perturbation-based faithfulness of an explanation method is bounded by its structural distance from the model's native decision mechanism. On WM-811K wafer maps (9 classes, 172k images) under a three-seed zero-fill perturbation protocol, ViT-Tiny + Attention Rollout attains Deletion AUC 0.211 against 0.432-0.525 for Swin-Tiny / ResNet18+CBAM / DenseNet121 + Grad-CAM (abs(Cohen's d) > 1.1), despite lower classification accuracy. Swin-Tiny disentangles architecture family from readout structure: despite being a Transformer, its spatial feature-map hierarchy makes it Grad-CAM compatible, showing that the operative factor is readout structure rather than architecture family. A model-agnostic control (RISE) compresses all families to Deletion AUC about 0.1, indicating the gap arises from the explainer pathway; notably, RISE outperforms all native methods, so native readout is a compatibility principle rather than an optimality guarantee. A blur-fill sensitivity analysis shows that the family ordering reverses under a different perturbation baseline, reinforcing that faithfulness rankings are joint properties of (model, explainer, perturbation operator) triples. An exploratory boundary-condition study on MVTec AD (pretrained models) indicates that audit results are dataset/task dependent and identifies conditions requiring qualification. The protocol yields actionable guidance: explanation pathways should be co-designed with model architectures based on readout structure, and deployed heatmaps should be accompanied by quantitative faithfulness metrics.
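The Deletion AUC metric referenced above can be sketched in a few lines of Python. This is a minimal toy version under stated assumptions: the "model" is a hypothetical linear scorer over four features rather than an image classifier, deletion is zero-fill, and the area is a trapezoidal sum over the deleted fraction. It is not the paper's audit protocol.

```python
def deletion_curve(model, x, attribution, fill=0.0):
    """Model scores after deleting features one by one, most-attributed first."""
    order = sorted(range(len(x)), key=lambda i: attribution[i], reverse=True)
    current = list(x)
    scores = [model(current)]
    for i in order:
        current[i] = fill  # zero-fill perturbation
        scores.append(model(current))
    return scores


def deletion_auc(scores):
    """Trapezoidal area under the curve, with the deleted fraction on [0, 1]."""
    steps = len(scores) - 1
    return sum((scores[k] + scores[k + 1]) / 2 for k in range(steps)) / steps


# Toy stand-in for a classifier's class score (hypothetical, not a real network).
WEIGHTS = [0.5, 0.3, 0.2, 0.0]


def toy_model(features):
    return sum(w * f for w, f in zip(WEIGHTS, features))


x = [1.0, 1.0, 1.0, 1.0]
# An explanation that ranks features by their true weight...
faithful = deletion_auc(deletion_curve(toy_model, x, attribution=WEIGHTS))
# ...versus one that ranks them in the reverse order.
unfaithful = deletion_auc(
    deletion_curve(toy_model, x, attribution=[0.0, 0.2, 0.3, 0.5]))
```

Under the usual reading of this metric, a lower Deletion AUC means the explanation removed the decisive features earlier, so the first ranking scores better than the reversed one. Changing `fill` changes the perturbation operator, which is the dependence the abstract highlights.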


2605.12961 2026-05-26 cs.CV cs.LG 版本更新

Reducing Bias and Variance: Generative Semantic Guidance and Bi-Layer Ensemble for Image Clustering

减少偏差与方差：用于图像聚类的生成语义引导与双层集成

Feijiang Li, Zhenxiong Li, Jieting Wang, Zizheng Jiu, Saixiong Liu, Liang Du

发表机构 * Institute of Big Data Science and Industry（大数据科学与产业研究院）； Key Laboratory of Evolutionary Science Intelligence of Shanxi Province（山西省进化智能科学重点实验室）； School of Artificial Intelligence, Shanxi University（山西大学人工智能学院）

AI总结提出GSEC框架，通过生成语义引导减少偏差、双层集成学习降低方差，在六个基准数据集上超越18种最新方法。


AI中文摘要

图像聚类旨在将未标记的图像数据集划分为不同的组。该任务的一个核心方面是构建并利用先验知识来指导聚类过程。最近的方法引入语义描述作为先验信息，其中大多数通常依赖于基于匹配的技术和预定义词汇表。然而，有限的匹配空间限制了它们对下游聚类任务的适应性。此外，这些方法主要关注减少偏差以提高性能，经常忽视方差降低的重要性。为了解决这些局限性，我们提出了GSEC（基于生成语义引导和双层集成的图像聚类），这是一个旨在通过生成语义引导减少偏差并通过集成学习缓解方差的框架。我们的方法利用多模态大语言模型生成语义描述，并通过加权平均推导图像嵌入。此外，双层集成策略通过内层的BatchEnsemble整合跨模态信息，并通过外层的对齐机制对齐输出。对比实验表明，GSEC在六个基准数据集上优于18种最新方法，进一步分析证实了其在同时减少偏差和方差方面的有效性。代码可在https://github.com/2017LI/GSEC.git获取。

英文摘要

Image clustering aims to partition unlabeled image datasets into distinct groups. A core aspect of this task is constructing and leveraging prior knowledge to guide the clustering process. Recent approaches introduce semantic descriptions as prior information, most of which rely on matching-based techniques with predefined vocabularies. However, the limited matching space restricts their adaptability to downstream clustering tasks. Moreover, these methods primarily focus on reducing bias to improve performance, frequently overlooking the importance of variance reduction. To address these limitations, we propose GSEC (Image Clustering based on Generative Semantic Guidance and Bi-Layer Ensemble), a framework designed to reduce bias through generative semantic guidance and mitigate variance via ensemble learning. Our method employs Multimodal Large Language Models to generate semantic descriptions and derive image embeddings via weighted averaging. Additionally, a bi-layer ensemble strategy integrates cross-modal information through BatchEnsemble in the inner layer and aligns outputs via an alignment mechanism in the outer layer. Comparative experiments demonstrate that GSEC outperforms 18 state-of-the-art methods across six benchmark datasets, while further analysis confirms its effectiveness in simultaneously reducing both bias and variance. The code is available at https://github.com/2017LI/GSEC.git.


2605.10430 2026-05-26 cs.LG cs.AI stat.ML 版本更新

Real vs. Semi-Simulated: Rethinking Evaluation for Treatment Effect Estimation

真实 vs. 半模拟：重新思考治疗效果估计的评估

George Panagopoulos

发表机构 * Department of Computer Science, University of Luxembourg（卢森堡大学计算机科学系）

AI总结通过大规模实证研究，比较了半模拟基准和真实数据集上使用反事实指标与可观测指标评估治疗效果估计模型的效果，揭示了两种评估体系之间的差距，并发现简单元学习器与强基础模型结合具有竞争力。


AI中文摘要

利用机器学习估计异质性治疗效果在学术研究和工业实践中都引起了广泛关注。然而，这两个领域通常在不同条件下评估模型。方法论工作通常依赖于半模拟基准和需要反事实结果的指标，而实际应用则依赖于基于排名或测试结果的可观测指标。尽管方法论进展与实际部署之间存在众所周知的差距，但这些评估体系之间的关系尚未得到系统研究。我们对标准半模拟基准系列和真实数据集上的治疗效果评估进行了大规模实证研究。我们的基准涵盖了与多个基础学习器配对的元学习器，以及专门的因果机器学习模型。我们使用应用导向文献中常见的可观测指标以及方法论文中常用的反事实指标来评估这些方法。我们的结果揭示了两个互补的差距。首先，即使在相同的半模拟基准上，反事实指标也不能可靠地恢复可观测指标偏好的估计器。其次，在半模拟基准上获得的排名不能迁移到真实数据集。我们还发现，具有强大基础模型的简单元学习器始终具有竞争力，这与专门的因果模型形成对比。总体而言，我们的发现表明，治疗效果估计研究的进展不应仅通过反事实指标和半模拟基准来评估，而应结合可观测指标和真实数据验证。

英文摘要

Estimating heterogeneous treatment effects with machine learning has attracted substantial attention in both academic research and industrial practice. However, the two communities often evaluate models under markedly different conditions. Methodological work typically relies on semi-simulated benchmarks and metrics that require counterfactual outcomes, whereas real-world applications rely on observable metrics based on ranking or test outcomes. Despite the well-known gap between methodological progress and practical deployment, the relationship between these evaluation regimes has not been examined systematically. We conduct a large-scale empirical study of treatment effect evaluation across standard semi-simulated benchmark families and real-world datasets. Our benchmark covers meta-learners paired with multiple base learners, as well as specialized causal machine learning models. We evaluate these methods using observable metrics common in application-oriented literature, alongside counterfactual metrics commonly used in methods papers. Our results reveal two complementary gaps. First, counterfactual metrics do not reliably recover the estimators preferred by observable metrics, even on the same semi-simulated benchmarks. Second, rankings obtained on semi-simulated benchmarks do not transfer to real datasets. We further find that simple meta-learners with strong base models are consistently competitive, in contrast to specialized causal models. Overall, our findings suggest that progress in treatment effect estimation research should not be assessed solely through counterfactual metrics and semi-simulated benchmarks, but would benefit from incorporating observable metrics and real-data validation.


2605.07733 2026-05-26 cs.LG cs.AI 版本更新

Intelligent Truck Matching in Full Truckload Shipments using Ping2Hex approach

使用Ping2Hex方法的整车运输智能卡车匹配

Srinivas Kumar Ramdas, Jose Mathew, Ankit Singh Chauhan, Dinesh Rajkumar, Aravind Manoj, Mohit Goel

发表机构 * Project44 GmbH（Project44公司）

AI总结提出基于Ping2Hex的智能卡车匹配系统ITM 2.0，通过概率排序和LightGBM模型解决GPS数据中车辆标识缺失导致的匹配问题，显著提升精度和覆盖率。

Comments 12 pages, 10 figures, 8 tables. Accepted at iSCSi 2026 (International Conference on Industry Sciences and Computer Sciences Innovation). To appear in Procedia Computer Science (Elsevier)


Journal ref: ISCSI(2026)

AI中文摘要

利用GPS数据进行准确的卡车与货物匹配是整车供应链可视性的基础，能够实现实时跟踪和准确的预计到达时间（ETA）预测。然而，缺失或损坏的车辆标识符使得传统匹配方法无法使用，导致货物失去可视性。本文提出了智能卡车匹配（ITM）2.0，一个机器学习系统，通过将匹配问题表述为概率排序来解决这一关键缺口。我们的方法利用Uber H3六边形空间索引将GPS ping离散化为路线相似性特征，结合时间信息，然后应用带有阈值后处理的LightGBM梯度提升。通过严格的评估，包括离线模型选择（SVM、XGBoost、LightGBM）、全面的消融研究和生产影子测试，我们展示了相对于基于规则的基线的显著提升。ITM 2.0在北美实现了26个百分点的精度提升，在欧洲实现了14个百分点的提升，同时覆盖率翻倍。该系统已在Project44部署用于处理整车运输，展示了对于高达1公里的地理编码误差、多个候选卡车和稀疏ping的鲁棒性。

英文摘要

Accurate truck-to-shipment matching using GPS data is foundational for full truckload supply chain visibility, enabling real-time tracking and accurate estimated time of arrival (ETA) predictions. However, missing or corrupted vehicle identifiers prevent traditional matching approaches, leaving shipments without visibility. This paper presents Intelligent Truck Matching (ITM) 2.0, a machine learning system that addresses this critical gap by formulating matching as a probabilistic ranking problem. Our approach leverages Uber H3 hexagonal spatial indexing to discretize GPS pings into route similarity features, combined with temporal information, then applies LightGBM gradient boosting with threshold-based post-processing. Through rigorous evaluation including offline model selection (SVM, XGBoost, LightGBM), comprehensive ablation studies, and production shadow testing, we demonstrate substantial gains over rule-based baselines. ITM 2.0 achieves a precision improvement of 26 percentage points in North America and 14 percentage points in Europe, while doubling coverage. Deployed in production at Project44 handling full truckload shipments, the system demonstrates robustness to geocoding errors up to 1 km, multiple candidate trucks, and sparse pings.
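The step of discretizing GPS pings into cells and deriving a route-similarity feature can be sketched as follows. Two loud assumptions: a plain square latitude/longitude grid is swapped in for the H3 hexagonal index the abstract names (to keep the example dependency-free), and Jaccard overlap is used as one plausible similarity feature. The coordinates, cell size, and function names are hypothetical, and the LightGBM ranking stage is not reproduced.

```python
import math


def ping_to_cell(lat, lon, cell_deg=0.01):
    """Map a GPS ping to a square grid cell ID.
    Stand-in for an H3 hexagon index; the paper uses Uber H3, not this grid."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def route_cells(pings, cell_deg=0.01):
    """Set of cells visited by a sequence of (lat, lon) pings."""
    return {ping_to_cell(lat, lon, cell_deg) for lat, lon in pings}


def jaccard(a, b):
    """Overlap of two cell sets: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical pings: one shipment route and two candidate trucks.
shipment = [(41.001, -87.001), (41.011, -87.001), (41.021, -87.001)]
candidates = {
    "truck_a": [(41.002, -87.003), (41.012, -87.002), (41.022, -87.004)],
    "truck_b": [(35.001, -90.001), (35.011, -90.001)],
}

shipment_cells = route_cells(shipment)
scores = {name: jaccard(shipment_cells, route_cells(pings))
          for name, pings in candidates.items()}
best = max(scores, key=scores.get)
```

In a learned ranker, a score like this would be one input feature among several (for example, alongside temporal features) rather than the final decision.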


2605.06415 2026-05-26 cs.LG cs.AI cs.CL cs.CV 版本更新

E = T*H/(O+B): A Dimensionless Control Parameter for Mixture-of-Experts Ecology

E = T*H/(O+B)：混合专家生态的无量纲控制参数

Qingjun Zhang

发表机构 * School of Integrated Circuits, Wuxi Taihu University（无锡太湖学院集成电路学院）

AI总结提出无量纲控制参数E = T*H/(O+B)，通过12个控制实验证明E≥0.5可保证混合专家模型无死亡专家，并发现专家复活、正交毒性依赖数据集等六项额外结果。

Comments 12 experiments, 11,000+ training epochs, cross-modal validation (vision + language). Extended version of the Claude-in-the-Loop ecology framework


AI中文摘要

我们引入E = T*H/(O+B)，这是一个无量纲控制参数，用于预测混合专家（MoE）模型是否会发展出健康的专家生态还是陷入死亡专家。E将四个超参数——路由温度T、路由熵权重H、先知权重O和平衡权重B——组合成一个单一量。通过12个控制实验（8个视觉，4个语言），总计超过11,000个训练周期，我们确定仅E ≥ 0.5就足以保证零死亡专家，消除了手工设计负载平衡辅助损失的必要性。我们在CIFAR-10、CIFAR-100、TinyImageNet-200、WikiText-2和WikiText-103上跨模态验证了这一点。另外还发现了六项结果：（1）死亡专家可以复活——由平衡损失驱动路由器重新探索触发；（2）正交毒性依赖于数据集，并非普遍存在；（3）任务复杂性改变了临界E阈值；（4）模型过拟合与专家生态健康解耦；（5）三层MoE自发崩溃为两层功能结构；（6）生态结构在50倍温度范围内保持不变。我们提出E作为MoE训练的统一诊断指标，类似于流体力学中的雷诺数。

英文摘要

We introduce E = T*H/(O+B), a dimensionless control parameter that predicts whether Mixture-of-Experts (MoE) models will develop a healthy expert ecology or collapse into dead experts. E combines four hyperparameters -- routing temperature T, routing entropy weight H, oracle weight O, and balance weight B -- into a single quantity. Through 12 controlled experiments (8 vision, 4 language) totaling over 11,000 training epochs, we establish that E >= 0.5 alone is sufficient to guarantee zero dead experts, removing the necessity for handcrafted load-balancing auxiliary losses. We validate this cross-modally on CIFAR-10, CIFAR-100, TinyImageNet-200, WikiText-2, and WikiText-103. Six additional findings emerge: (1) dead experts can resuscitate -- triggered by balance loss driving router re-exploration; (2) ortho toxicity is dataset-dependent, not universal; (3) task complexity shifts the critical E threshold; (4) model overfitting is decoupled from expert ecological health; (5) three-tier MoE spontaneously collapses into a two-tier functional structure; (6) ecological structure is temperature-invariant across a 50x range. We propose that E serves as a unified diagnostic for MoE training, analogous to the Reynolds number in fluid dynamics.
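As a worked example of the control parameter, the Python sketch below evaluates E = T*H/(O+B) and applies the E >= 0.5 criterion stated in the abstract. The function names and the hyperparameter values are illustrative assumptions; how the authors compute or log E in practice is not specified here.

```python
def ecology_parameter(T: float, H: float, O: float, B: float) -> float:
    """E = T*H/(O+B): routing temperature T, routing entropy weight H,
    oracle weight O, and balance weight B (symbols follow the abstract)."""
    denom = O + B
    if denom <= 0:
        raise ValueError("O + B must be positive for E to be defined")
    return T * H / denom


def predicts_healthy_ecology(E: float, threshold: float = 0.5) -> bool:
    """Apply the E >= 0.5 criterion reported in the abstract."""
    return E >= threshold


# Hypothetical hyperparameter settings, chosen only for illustration.
E_high = ecology_parameter(T=2.0, H=0.5, O=1.0, B=0.5)  # 1.0 / 1.5
E_low = ecology_parameter(T=0.5, H=0.1, O=1.0, B=1.0)   # 0.05 / 2.0
```

With these made-up values, the first configuration sits above the threshold and the second well below it. Note that the abstract also reports that task complexity shifts the critical threshold, so 0.5 is best treated as a default rather than a constant.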


2605.04295 2026-05-26 cs.LG cs.AI 版本更新

LLMs Uncertainty Quantification via Adaptive Conformal Semantic Entropy

通过自适应共形语义熵进行LLM不确定性量化

Hamed Karimi, Vaishali Meyappan, Reza Samavi

发表机构 * Toronto Metropolitan University（多伦多都会大学）； Vector Institute（向量研究所）

AI总结提出自适应共形语义熵（ACSE）方法，通过聚类语义熵并自适应调整不确定性分数，结合共形校准实现统计可靠的接受/弃权决策，在多个数据集上优于现有基线。

Comments Accepted for publication in the Proceedings of the 35th International Joint Conference on Artificial Intelligence (IJCAI 2026); 14 Pages


AI中文摘要

LLMs的过度自信，特别是在产生幻觉时，对在安全关键环境中部署模型构成了重大挑战，并使得对不确定性进行可靠估计成为必要。现有的不确定性量化方法通常优先考虑词汇或概率度量；然而，这些技术往往忽略了具有相似含义的不同响应的语义差异。在本文中，我们提出了自适应共形语义熵（ACSE），一种通过自适应测量LLMs输出中的语义分散性来估计提示级不确定性的方法。我们的不确定性评分函数基于对同一提示的多个不同响应的语义熵进行聚类。该函数根据每个聚类的语义特征自适应调整不确定性分数。为了确保我们分数的统计可靠性，我们使用共形校准应用决策规则来接受/弃权提示，提供了有限样本、无分布的保证，使得接受响应中的错误率保持在用户指定的容差范围内。我们使用不同LLMs和数据集进行的广泛实验评估表明，我们的方法在判别性能、共形保证和概率校准指标方面始终优于最先进的不确定性量化基线。作为一个亮点，对于TriviaQA数据集，我们方法的AUROC为0.88，而令牌熵方法为0.65。

英文摘要

LLMs' overconfidence, particularly when hallucinating, poses a significant challenge for deploying the models in safety-critical settings and makes reliable uncertainty estimation necessary. Existing approaches for uncertainty quantification typically prioritize lexical or probabilistic measures; however, these techniques often ignore the semantic variance of different responses with similar meaning. In this paper, we propose Adaptive Conformal Semantic Entropy (ACSE), a method for estimating prompt-level uncertainty by adaptively measuring semantic dispersion in LLM outputs. Our uncertainty scoring function is based on clustering semantic entropy of multiple diverse responses to the same prompt. The function adaptively adjusts the uncertainty score based on semantic features of each cluster. To ensure the statistical reliability of our score, we use conformal calibration to apply a decision rule that accepts or abstains on each prompt, providing a finite-sample, distribution-free guarantee that the error rate among the accepted responses remains bounded by a user-specified tolerance. Our extensive experimental evaluations using different LLMs and datasets demonstrate that our approach consistently outperforms state-of-the-art uncertainty quantification baselines in discriminative performance, conformal guarantees, and probabilistic calibration indicators. As a highlight, on the TriviaQA dataset, the AUROC of our approach is 0.88, compared to 0.65 for the token-entropy approach.
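For orientation, the following Python sketch computes plain semantic entropy over clusters of sampled responses, the quantity that the ACSE score builds on, and applies a simple accept/abstain rule. It is a simplified stand-in: the cluster labels are assumed to come from some semantic-equivalence clustering step, the threshold is a free parameter rather than a conformally calibrated one, and the adaptive per-cluster adjustment described in the abstract is not reproduced.

```python
import math
from collections import Counter


def semantic_entropy(cluster_ids):
    """Shannon entropy (in nats) of the empirical distribution over
    semantic clusters of the sampled responses."""
    if not cluster_ids:
        raise ValueError("need at least one sampled response")
    counts = Counter(cluster_ids)
    n = len(cluster_ids)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def accept(cluster_ids, threshold):
    """Accept the prompt when uncertainty is at or below the threshold.
    In a conformal setup the threshold would be calibrated on held-out data."""
    return semantic_entropy(cluster_ids) <= threshold


# Hypothetical cluster labels for five sampled responses to one prompt.
consistent = ["A", "A", "A", "A", "A"]  # all responses share one meaning
scattered = ["A", "B", "C", "D", "E"]   # every response means something else

h_consistent = semantic_entropy(consistent)
h_scattered = semantic_entropy(scattered)
```

Responses that all fall into one cluster give zero entropy, while five mutually distinct meanings give the maximum for five samples, log(5). A calibrated threshold between those extremes is what turns the score into an accept/abstain decision.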


2605.02044 2026-05-26 cs.LG 版本更新

NeuroViz: Real-time Interactive Visualization of Forward and Backward Passes in Neural Network Training

NeuroViz：神经网络训练中前向和后向传播的实时交互式可视化

Tanvi Sharma, Reza Rawassizadeh

发表机构 * Boston University（波士顿大学）

AI总结提出NeuroViz交互式可视化工具，通过实时展示全连接神经网络训练中的激活值、权重更新和损失变化，以及逐神经元方程，显著提升训练透明度和可解释性。

Comments 9 pages, 4 figures, 6 tables


AI中文摘要

训练神经网络难以解释，尤其对于新手。我们介绍了NeuroViz，一个交互式可视化工具，支持全连接神经网络训练的实时探索。用户可以配置网络架构、激活函数、学习率和数据集，然后观察激活值、权重更新和损失进展。NeuroViz将权重变化与前后向传播中的激活信号直接对应可视化，使用户能够区分单个epoch内的更新前后状态，并查看动态更新的逐神经元方程。我们与31名参与者进行了对比用户研究，与六个已有的可视化工具相比，NeuroViz获得了最高的可用性评分（SUS 80.97，属于“优秀”范围），清晰度平均排名2.47，有用性平均排名2.23（越低越好）。超过70%的参与者报告说，可视化显著提高了他们对神经网络训练透明度的感知。实现实例可在https://neuroviz.org访问。

英文摘要

Training neural networks is difficult to interpret, particularly for newcomers. We introduce NeuroViz, an interactive visualization tool that supports real-time exploration of fully connected neural network training. Users can configure network architecture, activation functions, learning rates, and datasets, then observe activations, weight updates, and loss progression. NeuroViz visualizes weight changes in direct correspondence with activation signals in both forward and backward passes, enabling users to distinguish pre- and post-update states within individual epochs and view dynamically updating per-neuron equations. We conduct a comparative user study with 31 participants against six established visualization tools, in which NeuroViz achieved the highest usability score (SUS 80.97, in the 'excellent' range), with mean rankings of 2.47 for clarity and 2.23 for usefulness (lower is better). Over 70% of participants reported that the visualizations substantially increased their perception of neural network training transparency. The implemented instance is accessible at https://neuroviz.org.


2604.24517 2026-05-26 cs.LG cs.GT 版本更新

序列级奖励的组内学习设计条件：令牌梯度消除

Fei Ding, Yongkang Zhang, Youwei Wang, Zijian Zeng

发表机构 * Alibaba Group（阿里巴巴集团）； Tsinghua University（清华大学）

AI总结针对大语言模型多步推理中稀疏终端奖励导致的信用分配问题，提出反事实比较框架和隐式行为策略优化（IBPO），通过轨迹差异近似替代决策，将稀疏奖励转化为步骤敏感信号，提升训练稳定性和推理性能。


AI中文摘要

基于大语言模型的多步推理强化学习通常依赖于稀疏的终端奖励，这导致了不良条件的信用分配问题：最终反馈均匀地传播到所有中间决策。这导致高梯度方差、不稳定的训练和许多无效更新，最终限制了模型的持续改进。我们提出了一种用于信用分配的反事实比较框架。对于每个输入，该框架采样多个推理轨迹，并将它们的差异视为替代决策的隐式近似。这产生了一个隐式过程级优势估计器，将稀疏的终端奖励转化为步骤敏感的学习信号。基于此框架，我们引入了隐式行为策略优化（IBPO），显著提高了数学和代码推理基准上的训练稳定性和性能上限。我们的结果指向了一个有希望的方向，以解锁大语言模型的推理潜力。

英文摘要

Reinforcement learning for multi-step reasoning with large language models (LLMs) typically relies on sparse terminal rewards, which creates a poorly conditioned credit-assignment problem: the final feedback is propagated uniformly across all intermediate decisions. This leads to high gradient variance, unstable training, and many ineffective updates, ultimately limiting sustained model improvement. We propose a counterfactual-comparison framework for credit assignment. For each input, the framework samples multiple reasoning trajectories and treats their differences as implicit approximations to alternative decisions. This yields an implicit process-level advantage estimator that converts sparse terminal rewards into step-sensitive learning signals. Building on this framework, we introduce Implicit Behavior Policy Optimization (IBPO), which substantially improves training stability and the performance ceiling on mathematical and code-reasoning benchmarks. Our results point to a promising direction for unlocking the reasoning potential of LLMs.


2604.11811 2026-05-26 cs.PL cs.AI cs.CL cs.LG 版本更新

M$^\star$: Every Task Deserves Its Own Memory Harness

M$^\star$：每个任务都应有专属的记忆框架

Wenbo Pan, Shujie Liu, Xiangyang Zhou, Shiwei Zhang, Wanlu Shi, Mirror Xu, Xiaohua Jia

发表机构 * City University of Hong Kong（香港城市大学）； Microsoft（微软）

AI总结提出M$^\star$方法，通过可执行程序进化自动发现任务优化的记忆系统，在对话、具身规划和专家推理等任务上优于固定记忆基线。

Comments Preprint. Code: https://github.com/wbopan/mstar ; Live demo: https://mstar.wenbo.io


AI中文摘要

大型语言模型代理依赖专门的记忆系统在长时间交互中积累和重用知识。最近的架构通常采用针对特定领域定制的固定记忆设计，例如用于对话的语义检索或用于编码的技能重用。然而，为某一目的优化的记忆系统往往无法迁移到其他任务。为了解决这一限制，我们引入了M$^\star$，一种通过可执行程序进化自动发现任务优化记忆框架的方法。具体来说，M$^\star$将代理记忆系统建模为用Python编写的记忆程序。该程序封装了数据模式、存储逻辑和代理工作流指令。我们使用反射式代码进化方法联合优化这些组件；该方法采用基于种群的搜索策略，并分析评估失败以迭代改进候选程序。我们在涵盖对话、具身规划和专家推理的四个不同基准上评估M$^\star$。结果表明，M$^\star$在所有评估任务上稳健地优于现有的固定记忆基线。此外，进化出的记忆程序对每个领域展现出结构不同的处理机制。这一发现表明，针对给定任务特化记忆机制探索了广泛的设计空间，并提供了比通用记忆范式更优的解决方案。

英文摘要

Large language model agents rely on specialized memory systems to accumulate and reuse knowledge during extended interactions. Recent architectures typically adopt a fixed memory design tailored to specific domains, such as semantic retrieval for conversations or skills reused for coding. However, a memory system optimized for one purpose frequently fails to transfer to others. To address this limitation, we introduce M$^\star$, a method that automatically discovers task-optimized memory harnesses through executable program evolution. Specifically, M$^\star$ models an agent memory system as a memory program written in Python. This program encapsulates the data Schema, the storage Logic, and the agent workflow Instructions. We optimize these components jointly using a reflective code evolution method; this approach employs a population-based search strategy and analyzes evaluation failures to iteratively refine the candidate programs. We evaluate M$^\star$ on four distinct benchmarks spanning conversation, embodied planning, and expert reasoning. Our results demonstrate that M$^\star$ improves performance over existing fixed-memory baselines robustly across all evaluated tasks. Furthermore, the evolved memory programs exhibit structurally distinct processing mechanisms for each domain. This finding indicates that specializing the memory mechanism for a given task explores a broad design space and provides a superior solution compared to general-purpose memory paradigms.


2604.04453 2026-05-26 cs.CE cs.LG 版本更新

Generative modeling of granular flow on inclined planes using conditional flow matching

基于条件流匹配的倾斜平面上颗粒流生成建模

Xuyang Li, Rui Li, Teng Man, Yimin Lu

发表机构 * School of Construction, University of North Carolina at Charlotte（北卡罗来纳大学夏洛特分校建设学院）； Department of Civil, Environmental, and Construction Engineering, Texas Tech University（德克萨斯理工大学土木、环境与建设工程系）； College of Civil Engineering, Zhejiang University of Technology（浙江工业大学土木工程学院）

AI总结提出首个条件流匹配（CFM）框架，利用稀疏边界观测重建颗粒流内部运动，通过可微前向算子和稀疏感知梯度引导机制实现高精度重建，并优于确定性CNN基线。


AI中文摘要

颗粒流控制着许多自然和工业过程，但其内部运动学和力学在很大程度上仍无法观测，因为实验只能接触到边界或自由表面。传统的数值模拟对于快速逆重建计算成本高昂，而确定性模型在病态设定下往往会退化为过度平滑的平均预测。本研究据作者所知首次提出了一个条件流匹配（CFM）框架，用于从稀疏边界观测重建颗粒流。该生成模型在高保真颗粒分辨离散元模拟上训练，在推理时由可微前向算子和一种新颖的稀疏感知梯度引导机制指导。该机制避免了标准均方误差方法固有的梯度稀释，保留了观测误差的绝对物理尺度，无需超参数调整即可强制执行测量一致性，并防止非材料区域出现非物理速度预测。一个物理解码器将重建的速度场映射到应力状态和能量波动量，包括平均应力、偏应力和颗粒温度。该框架从完整观测到仅16%的信息窗口准确恢复内部流场，并且在仅11%数据的强稀释空间分辨率下仍然有效。在最病态的重建区域，它优于确定性CNN基线，并通过集成生成提供空间分辨的不确定性估计。这些结果表明，条件生成模型为颗粒介质中隐藏体力学特性的非侵入性推断提供了一条实用途径，并暗示了在颗粒和多相系统逆问题中的潜在适用性。

英文摘要

Granular flows govern many natural and industrial processes, yet their interior kinematics and mechanics remain largely unobservable, as experiments access only boundaries or free surfaces. Conventional numerical simulations are computationally expensive for fast inverse reconstruction, and deterministic models tend to collapse to over-smoothed mean predictions in ill-posed settings. This study, to the best of the authors' knowledge, presents the first conditional flow matching (CFM) framework for granular-flow reconstruction from sparse boundary observations. Trained on high-fidelity particle-resolved discrete element simulations, the generative model is guided at inference by a differentiable forward operator and a novel sparsity-aware gradient guidance mechanism. This mechanism avoids the gradient dilution inherent to standard mean-squared-error approaches, preserves the absolute physical scale of observation errors, enforces measurement consistency without hyperparameter tuning, and prevents unphysical velocity predictions in non-material regions. A physics decoder maps the reconstructed velocity fields to stress states and energy fluctuation quantities, including mean stress, deviatoric stress, and granular temperature. The framework accurately recovers interior flow fields from full observation down to only 16% of the informative window, and it remains effective under strongly diluted spatial resolution with only 11% of the data. It also outperforms a deterministic CNN baseline in the most ill-posed reconstruction regime and provides spatially resolved uncertainty estimates through ensemble generation. These results demonstrate that conditional generative modeling offers a practical route for non-invasive inference of hidden bulk mechanics in granular media and suggest potential applicability to inverse problems in particulate and multiphase systems.
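The "gradient dilution" mentioned in the abstract can be illustrated with a small sketch. This is not the authors' guidance mechanism: the function name, the flat-list field representation, and the choice to normalize by the number of observed entries are all assumptions made for illustration. It only shows why averaging a squared error over every grid cell weakens the gradient when few cells are observed.

```python
def guidance_gradient(pred, obs, mask, sparsity_aware=True):
    """Gradient of a masked squared-error term with respect to pred.

    mask[i] is 1 where a measurement exists and 0 elsewhere (hypothetical layout).
    With sparsity_aware=False the error is averaged over all cells, so the
    gradient on each observed cell shrinks as observations become sparser.
    """
    n_total = len(pred)
    n_obs = max(sum(mask), 1)  # guard against an empty observation set
    denom = n_obs if sparsity_aware else n_total
    return [2.0 * m * (p - o) / denom for p, o, m in zip(pred, obs, mask)]


# One observed cell out of ten, with a unit prediction error everywhere.
pred = [1.0] * 10
obs = [0.0] * 10
mask = [1] + [0] * 9
diluted = guidance_gradient(pred, obs, mask, sparsity_aware=False)
aware = guidance_gradient(pred, obs, mask, sparsity_aware=True)
```

In this toy case the gradient on the single observed cell is ten times larger under the observation-count normalization, and unobserved cells receive no gradient in either variant.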

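2603.18444 2026-05-26 cs.LG cs.AI (updated version)

Discounted Beta-Bernoulli Reward Estimation for Sample-Efficient Reinforcement Learning with Verifiable Rewards

Haechan Kim, Soohyun Ryu, Gyouk Chu, Doohyuk Jang, Eunho Yang

Affiliations: KAIST

AI summary: To address the low sample efficiency of reinforcement learning with verifiable rewards, proposes Discounted Beta-Bernoulli reward estimation, which uses historical reward statistics to reduce estimation variance and avoid variance collapse, and significantly improves performance on multiple reasoning benchmarks.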

Comments 14 pages, 3 figures

Abstract

Reinforcement learning with verifiable rewards (RLVR) has emerged as an effective post-training paradigm for improving the reasoning capabilities of large language models. However, existing group-based RLVR methods often suffer from severe sample inefficiency. This inefficiency stems from reliance on point estimation of rewards from a small number of rollouts, leading to high estimation variance, variance collapse, and ineffective utilization of generated responses. In this work, we reformulate RLVR from a statistical estimation perspective by modeling rewards as samples drawn from a policy-induced distribution and casting advantage computation as the problem of estimating the reward distribution from finite data. Building on this view, we propose Discounted Beta-Bernoulli (DBB) reward estimation, which leverages historical reward statistics for the non-stationary distribution. Although biased, the resulting estimator exhibits reduced and stable variance, theoretically avoids estimated variance collapse, and achieves lower mean squared error than standard point estimation. Extensive experiments across six in-distribution and three out-of-distribution reasoning benchmarks demonstrate that GRPO with DBB consistently outperforms naive GRPO, achieving average Acc@8 improvements of 3.22/2.42 points in-distribution and 12.49/6.92 points out-of-distribution on the 1.7B and 8B models, respectively, without additional computational cost or memory usage.
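The idea of estimating a binary reward's success rate from discounted history can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's estimator: the class name, the discount factor `gamma`, the prior pseudo-counts, and the advantage formula are all hypothetical choices.

```python
import math


class DiscountedBetaBernoulli:
    """Running per-prompt estimate of a 0/1 reward's success probability."""

    def __init__(self, gamma=0.9, alpha0=1.0, beta0=1.0):
        self.gamma = gamma   # discount applied to past pseudo-counts (assumed value)
        self.alpha = alpha0  # pseudo-count of successes
        self.beta = beta0    # pseudo-count of failures

    def update(self, rewards):
        """Decay the old counts, then add the new group of 0/1 rewards."""
        successes = sum(rewards)
        failures = len(rewards) - successes
        self.alpha = self.gamma * self.alpha + successes
        self.beta = self.gamma * self.beta + failures

    def mean(self):
        return self.alpha / (self.alpha + self.beta)

    def std(self):
        # Bernoulli standard deviation at the estimated mean; it stays
        # positive as long as both pseudo-counts are positive.
        p = self.mean()
        return math.sqrt(p * (1.0 - p))

    def advantages(self, rewards):
        m, s = self.mean(), self.std()
        return [(r - m) / s for r in rewards]


# A group in which every rollout is correct: a within-group point estimate
# would have zero variance, while the discounted estimate does not.
est = DiscountedBetaBernoulli(gamma=0.9)
est.update([1, 1, 1, 1])
adv = est.advantages([1, 1, 1, 1])
```

The discounting lets older statistics fade as the policy changes, which is the sense in which the history is used for a non-stationary distribution; the prior counts introduce the bias that the abstract acknowledges.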

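2603.17198 2026-05-26 cs.LG cs.CL (updated version)

Structural Abstraction as an Inductive Bias for Non-Stationary Language Model Training

Elnaz Rahmati, Nona Ghazizadeh, Zhivar Sourati, Nina Rouhani, Morteza Dehghani

Affiliations: University of Southern California

AI summary: Proposes Abstraction-Augmented Training (AAT), which jointly optimizes over concrete instances and their structural abstractions to reduce catastrophic interference and improve relational generalization, and validates structural abstraction as a stabilizing learning signal in non-stationary language model training.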

Abstract

A foundational principle in cognitive science holds that intelligent agents do not learn by storing experiences as isolated instances, but by forming abstract schemas that capture relational structure shared across situations. Even though this claim is well supported by behavioral and neuroimaging studies, its role as a computational training signal in language models remains underexplored. We target this gap in the setting of non-stationary language model training, asking: does biasing learning toward structural abstraction reduce catastrophic interference and improve relational generalization, as predicted by human results? To study this question, we introduce Abstraction-Augmented Training (AAT), a lightweight loss-level modification that jointly optimizes over concrete instances and their structural abstractions, and two benchmarks, the Relational Cycle Benchmark (RCB) and the Narrative Abstraction Benchmark (NAB). These resources operationalize core cognitive constructs: entity masking as a computational analog of relational alignment, and proverbs as vehicles for implicit abstract meaning that must be inferred across surface-dissimilar situations. Our empirical results demonstrate that AAT consistently reduces forgetting and improves generalization in a pattern that aligns with cognitive predictions for schema-based learning. Beyond the practical implications for continual learning, these results offer preliminary computational evidence that structural abstraction is a signal for stable learning in non-stationary environments.

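2603.17044 2026-05-26 cs.LG cs.AI cs.CV (updated version)

Logic-Guided Vector Fields for Constrained Generative Modeling

Ali Baheri

Affiliations: Rochester Institute of Technology

AI summary: Proposes the Logic-Guided Vector Fields (LGVF) framework, which injects differentiable relaxations of logical constraints into flow matching generative models; combining a training-time logic loss with an inference-time gradient adjustment, it reduces constraint violations by 59-82% across three constrained generation case studies.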

Abstract

Neuro-symbolic systems aim to combine the expressive structure of symbolic logic with the flexibility of neural learning; yet, generative models typically lack mechanisms to enforce declarative constraints at generation time. We propose Logic-Guided Vector Fields (LGVF), a neuro-symbolic framework that injects symbolic knowledge, specified as differentiable relaxations of logical constraints, into flow matching generative models. LGVF couples two complementary mechanisms: (1) a training-time logic loss that penalizes constraint violations along continuous flow trajectories, with weights that emphasize correctness near the target distribution; and (2) an inference-time adjustment that steers sampling using constraint gradients, acting as a lightweight, logic-informed correction to the learned dynamics. We evaluate LGVF on three constrained generation case studies spanning linear, nonlinear, and multi-region feasibility constraints. Across all settings, LGVF reduces constraint violations by 59-82% compared to standard flow matching and achieves the lowest violation rates in each case. In the linear and ring settings, LGVF also improves distributional fidelity as measured by MMD, while in the multi-obstacle setting, we observe a satisfaction-fidelity trade-off, with improved feasibility but increased MMD. Beyond quantitative gains, LGVF yields constraint-aware vector fields exhibiting emergent obstacle-avoidance behavior, routing samples around forbidden regions without explicit path planning.
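The inference-time adjustment described in mechanism (2) can be sketched as an Euler sampler whose velocity is corrected by a constraint gradient. This is a toy illustration under stated assumptions, not the LGVF implementation: `base_velocity` stands in for a learned flow-matching field, the half-plane constraint and its squared-hinge relaxation are hypothetical, and the steering weight `lam` is an arbitrary choice.

```python
def base_velocity(x, t):
    # Stand-in for a learned vector field (hypothetical): pull toward (1, 0).
    return [1.0 - x[0], 0.0 - x[1]]


def violation(x):
    # Differentiable relaxation of the constraint "x[0] <= 0".
    return max(0.0, x[0]) ** 2


def violation_grad(x):
    return [2.0 * max(0.0, x[0]), 0.0]


def sample(x0, lam, steps=100):
    """Euler integration of dx/dt = v(x, t) - lam * grad violation(x)."""
    x, dt = list(x0), 1.0 / steps
    for k in range(steps):
        v = base_velocity(x, k * dt)
        g = violation_grad(x)
        x = [xi + dt * (vi - lam * gi) for xi, vi, gi in zip(x, v, g)]
    return x


unsteered = sample([-1.0, 0.5], lam=0.0)
steered = sample([-1.0, 0.5], lam=5.0)
```

Because the base field here pulls samples into the forbidden half-plane, the steered sample ends with a smaller violation than the unsteered one; it also lands farther from where the base field alone would put it, a small-scale version of the satisfaction-fidelity trade-off the abstract reports.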

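2602.01576 2026-05-26 cs.LG cs.AI cs.CV (updated version)

Generative Visual Code Mobile World Models

Woosung Koh, Sungjun Han, Segyu Lee, Se-Young Yun, Jamin Shin

Affiliations: Trillion Labs

AI summary: Proposes generating the next mobile GUI state by having a single vision-language model predict executable web code, combining the strengths of text-based and visual world models to achieve high-fidelity visual generation with precise text rendering.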

Comments ICML 2026

Abstract

Mobile Graphical User Interface (GUI) World Models (WMs) offer a promising path for improving mobile GUI agent performance at training and inference time. However, current approaches face a critical trade-off: text-based WMs sacrifice visual fidelity, while the inability of visual WMs to render text precisely has led them to rely on slow, complex pipelines dependent on numerous external models. We propose a novel paradigm: visual world modeling via renderable code generation, where a single Vision-Language Model (VLM) predicts the next GUI state as executable web code that renders to pixels, rather than generating pixels directly. This combines the strengths of both approaches: VLMs retain their linguistic priors for precise text rendering while their pre-training on structured web code enables high-fidelity visual generation. We introduce gWorld (8B, 32B), the first open-weight visual mobile GUI WMs built on this paradigm, along with a data generation framework (gWorld) that automatically synthesizes code-based training data. In extensive evaluation across 4 in- and 2 out-of-distribution benchmarks, gWorld sets a new Pareto frontier in accuracy versus model size, outperforming 8 frontier open-weight models over 50.25x larger. Further analyses show that (1) scaling training data via gWorld yields meaningful gains, (2) each component of our pipeline improves data quality, and (3) stronger world modeling improves downstream mobile GUI policy performance.

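2601.21670 2026-05-26 cs.CV cs.LG (updated version)

Future-KL Regularized GRPO: Process-Level Credit Assignment via f-Divergence Regularization

Jiarui Yao, Ruida Wang, Hao Bai, Tong Zhang

Affiliations: University of Illinois Urbana-Champaign

AI summary: Proposes Future-KL Regularized Policy Optimization (FRPO), which uses a causal future-regularization return-to-go to restore the gradient signal that GRPO's local KL loss omits; on mathematical reasoning tasks it improves pass@16 while maintaining higher entropy and lower policy drift.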

Abstract

Group Relative Policy Optimization (GRPO) is widely used for critic-free Large Language Model (LLM) post-training, but its KL regularization is usually implemented as a local loss-side token penalty. We show that this misses the policy-gradient signal induced by autoregressive KL regularization. Unlike standard KL-regularized Reinforcement Learning (RL) objectives, GRPO's group normalization induces a non-linear prompt-level utility; for binary verifier rewards, this utility is $2\arcsin\sqrt p$. As a result, reward and KL cannot be fused before normalization without changing the implicit objective. We derive the on-policy gradient of GRPO-style objectives with token-wise $f$-divergence regularization. The reward term recovers the standardized GRPO advantage, while the regularizer term includes a causal future-regularization return-to-go omitted by local KL losses. For reverse KL, this yields a simple future KL correction: add a reverse cumulative sum of per-token log ratios after advantage construction. The resulting method, Future-KL Regularized Policy Optimization (FRPO), requires no critic or extra model passes. On mathematical reasoning tasks, FRPO improves pass@16 in our main large-model setting while maintaining higher entropy and lower policy drift than conventional loss-side KL baselines.
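The "reverse cumulative sum of per-token log ratios" can be sketched in a few lines. This is an illustration under assumptions rather than the authors' code: the function names, the coefficient `beta`, the sign convention, and the choice to include the current token in the future sum are all hypothetical. The second function evaluates the utility $2\arcsin\sqrt p$ quoted in the abstract; its derivative is $1/\sqrt{p(1-p)}$, the reciprocal of the Bernoulli standard deviation, which is one way to see how standardizing binary rewards gives rise to it.

```python
import math


def future_kl_corrected_advantages(advantage, log_ratios, beta=0.01):
    """Per-token advantages: A - beta * sum_{s >= t} log_ratio_s.

    advantage  : scalar standardized advantage for the whole response.
    log_ratios : per-token log pi_theta(y_t | .) - log pi_ref(y_t | .).
    """
    future = []
    running = 0.0
    for lr in reversed(log_ratios):   # accumulate from the last token backward
        running += lr
        future.append(running)
    future.reverse()
    return [advantage - beta * f for f in future]


def grpo_binary_utility(p):
    """Prompt-level utility 2 * arcsin(sqrt(p)) for success probability p."""
    return 2.0 * math.asin(math.sqrt(p))


corrected = future_kl_corrected_advantages(1.0, [0.1, 0.2, 0.3], beta=1.0)
```

Early tokens are charged for all the divergence that follows them, whereas a local token penalty would charge each token only for its own log ratio.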

2601.10012 2026-05-26 cs.LG (updated version)

PID-Guided Partial Alignment for Multimodal Decentralized Federated Learning

Yanhang Shi, Xiaoyu Wang, Houwei Cao, Jian Li, Yong Liu

Affiliations: Department of Electrical and Computer Engineering, Stony Brook University; Department of Applied Mathematics and Statistics and the Department of Computer Science, Stony Brook University; Department of Electrical and Computer Engineering, New York University; Department of Computer Science, New York Institute of Technology

AI summary: To address incompatible updates among heterogeneous agents in multimodal decentralized federated learning, proposes PARSE, a framework based on partial information decomposition that uses feature fission and partial alignment for efficient communication and collaboration.

Abstract

Multimodal decentralized federated learning (DFL) must support collaboration among agents that hold different modality subsets and often different model components, while operating over peer-to-peer (P2P) overlays without a coordinating server or a global network view. A key obstacle is that conventional multimodal training often relies on a single shared representation, which implicitly assumes that heterogeneous peers can exchange and aggregate the same model components over the same communication links. In multimodal DFL, this assumption breaks down: uni- and multimodal agents may push incompatible updates through shared overlays, weakening both inter-agent transfer and cross-modal interaction. We present PARSE, a server-free framework that brings partial information decomposition (PID) into multimodal DFL. Each agent splits its latent features into redundant, unique, and synergistic slices ("feature fission"), and performs slice-aware communication over modality-conditioned P2P overlays. During training, agents exchange only the slices that are semantically alignable with their neighbors, according to the modalities and model components they share ("partial alignment"). This design avoids centralized orchestration and gradient-surgery style conflict handling, while remaining compatible with standard DFL constraints and a range of P2P overlay topologies. Across multiple benchmarks and heterogeneous peer mixes, PARSE consistently outperforms task-, modality-, and hybrid-sharing multimodal DFL baselines while keeping per-link payloads bounded. Ablations on fusion choices and split ratios, together with qualitative feature analyses and overlay-topology studies, demonstrate the robustness and communication efficiency of the proposed slice-aware design.

2601.03191 2026-05-26 cs.CV cs.AI cs.LG (updated version)

Test-Time Graph Search for Goal-Conditioned Reinforcement Learning

Evgenii Opryshko, Junwei Quan, Claas Voelcker, Yilun Du, Igor Gilitschenski

Affiliations: Department of Computer Science, University of Toronto, Toronto, Canada; Vector Institute, Toronto, Canada; University of Texas at Austin, Austin, USA; Harvard University, Cambridge, USA

AI summary: Proposes Test-Time Graph Search, which builds a graph over the offline dataset and adaptively selects subgoals, significantly raising the success rate of goal-conditioned reinforcement learning on long-horizon tasks without additional training.

Abstract

Offline goal-conditioned reinforcement learning (GCRL) often struggles with long-horizon tasks, where errors in value estimation accumulate and produce unreliable policies. It is typically assumed that effective long-term planning is infeasible without specialized training. In contrast, our work demonstrates that existing GCRL policies can complete long-horizon tasks when combined with a lightweight, training-free planning wrapper. We find that standard goal-conditioned value functions encode locally consistent geometric structure sufficient for planning. Our approach, Test-Time Graph Search (TTGS), constructs a graph over the offline dataset and employs an adaptive subgoal selection strategy. To address unreliable value estimates during shortest-path search, we propose a novel mechanism that softly penalizes long-distance transitions. Our method incurs negligible computational overhead and requires no additional supervision or parameter updates. On the OGBench benchmark, TTGS significantly boosts success rates across multiple base learners and tasks, with primary gains on challenging long-horizon locomotion tasks where some success rates are improved from near-zero to over 90%, often matching or outperforming methods that require complex auxiliary training. Code and videos can be found at https://ktolnos.github.io/ttgs.
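A shortest-path search that softly penalizes long-distance transitions can be sketched with plain Dijkstra. This is an illustration under assumptions, not the TTGS code: the quadratic penalty form, the threshold `d_max`, the weight, and the toy graph are all hypothetical, and in the paper the edge distances would be derived from a goal-conditioned value function rather than given by hand.

```python
import heapq


def soft_cost(d, d_max=1.5, weight=1.0):
    """Estimated distance plus a smooth penalty once it exceeds d_max (assumed form)."""
    return d + weight * max(0.0, d - d_max) ** 2


def shortest_path(edges, start, goal, cost_fn=soft_cost):
    """Dijkstra over edges = {u: [(v, estimated_distance), ...]}."""
    best = {start: 0.0}
    parent = {}
    heap = [(0.0, start)]
    while heap:
        c, u = heapq.heappop(heap)
        if u == goal:
            break
        if c > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, d in edges.get(u, []):
            nc = c + cost_fn(d)
            if nc < best.get(v, float("inf")):
                best[v] = nc
                parent[v] = u
                heapq.heappush(heap, (nc, v))
    if goal not in best:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]


# A chain of short hops plus one long "shortcut" whose estimate is less trustworthy.
edges = {0: [(1, 1.0), (3, 2.5)], 1: [(2, 1.0)], 2: [(3, 1.0)]}
penalized = shortest_path(edges, 0, 3)
unpenalized = shortest_path(edges, 0, 3, cost_fn=lambda d: d)
```

With the penalty the planner prefers the chain of short hops, whose individual estimates are more likely to be locally consistent; without it the search takes the long shortcut.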

2510.01384 2026-05-26 cs.LG (updated version)

Fine-Tuning Masked Diffusion for Provable Self-Correction

Jaeyeon Kim, Seunggeun Kim, Taekyun Lee, David Z. Pan, Hyeji Kim, Sham Kakade, Sitan Chen

Affiliations: Harvard University; The University of Texas at Austin; Kempner Institute

AI summary: Proposes PRISM, a lightweight, model-agnostic remasking strategy that gives masked diffusion models provable self-correction without reinforcement learning or a verifier, improving the detection and revision of low-quality tokens.

Comments Authorship statement: Jaeyeon Kim and Seunggeun Kim contributed equally, and Taekyun Lee is also a co-first author

Abstract

A natural desideratum for generative models is self-correction--detecting and revising low-quality tokens at inference time. While Masked Diffusion Models (MDMs) have emerged as a promising approach for generative modeling in discrete spaces, their capacity for self-correction remains poorly understood. Prior attempts to incorporate self-correction into MDMs either require overhauling MDM architectures/training or rely on imprecise proxies for token quality, limiting their applicability. Motivated by this, we introduce PRISM--Plug-in Remasking for Inference-time Self-correction of Masked Diffusions--a lightweight, model-agnostic approach that applies to any pretrained MDM. Theoretically, PRISM defines a self-correction loss that provably learns per-token quality scores, without RL or a verifier. These quality scores are computed in the same forward pass as the MDM's predictions and are used to detect low-quality tokens. Empirically, PRISM advances MDM inference across domains and scales: Sudoku; unconditional text (170M); and code with LLaDA (8B).
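The remasking step that such per-token quality scores enable can be sketched as follows. This is only an illustration: the mask symbol, the function name, and the fixed budget `k` are assumptions, and the scores are supplied by hand here, whereas in the paper they come from a learned quantity computed alongside the model's predictions.

```python
MASK = "<mask>"  # hypothetical mask symbol


def remask_low_quality(tokens, quality, k):
    """Replace the k lowest-scoring unmasked tokens with MASK.

    tokens  : current partially decoded sequence.
    quality : per-token scores, higher meaning more trustworthy.
    """
    candidates = [i for i, t in enumerate(tokens) if t != MASK]
    worst = sorted(candidates, key=lambda i: quality[i])[:k]
    return [MASK if i in worst else t for i, t in enumerate(tokens)]


remasked = remask_low_quality(["a", "b", "c", "d"], [0.9, 0.1, 0.8, 0.2], k=2)
```

The remasked positions would then be resampled by the diffusion model in a later denoising step, which is what allows an earlier mistake to be revised.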

2510.01184 2026-05-26 cs.LG (updated version)

Temporal Score Rescaling for Temperature Sampling in Diffusion and Flow Models

Yanbo Xu, Yu Wu, Sungjae Park, Zhizhuo Zhou, Shubham Tulsiani

Affiliations: School of Computer Science, Carnegie Mellon University, Pittsburgh, USA; Department of Computer Science, Stanford University, California, USA

AI summary: Proposes a method that requires no fine-tuning or change to the training strategy: rescaling the score functions of noisy data distributions steers the sampling diversity of diffusion and flow models and gives local temperature control, validated on image generation, pose estimation, depth prediction, robot manipulation, and protein design.

Comments Accepted at ICML 2026. Project page: https://temporalscorerescaling.github.io/

Abstract

We present a mechanism to steer the sampling diversity of denoising diffusion and flow matching models, allowing users to sample from a sharper or broader distribution than the training distribution. We build on the observation that these models leverage (learned) score functions of noisy data distributions for sampling and show that rescaling these allows one to effectively control a 'local' sampling temperature. Notably, this approach does not require any finetuning or alterations to training strategy; it can be applied to any off-the-shelf model and is compatible with both deterministic and stochastic samplers. We first validate our framework on toy 2D data, and then demonstrate its application for diffusion models trained across five disparate tasks -- image generation, pose estimation, depth prediction, robot manipulation, and protein design. We find that across these tasks, our approach allows sampling from sharper (or flatter) distributions, yielding performance gains; e.g., depth prediction models benefit from sampling more likely depth estimates, whereas image generation models perform better when sampling a slightly flatter distribution.
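The link between rescaling a score and changing a temperature can be checked numerically on a one-dimensional Gaussian. The sketch below only illustrates the underlying identity at a single noise level, namely that multiplying the score of a density p by k gives the score of the density proportional to p^k; it does not reproduce the paper's time-dependent rescaling, and the function name and grid settings are assumptions.

```python
import math


def tempered_variance(sigma, k, lo=-10.0, hi=10.0, n=20001):
    """Variance of the density proportional to p(x)**k for p = N(0, sigma^2).

    Since d/dx log p(x)**k = k * d/dx log p(x), this is the distribution a
    sampler would target if it used a score scaled by the constant k.
    """
    dx = (hi - lo) / (n - 1)
    xs = [lo + i * dx for i in range(n)]
    weights = [math.exp(k * (-0.5 * (x / sigma) ** 2)) for x in xs]
    z = sum(weights)
    mean = sum(w * x for w, x in zip(weights, xs)) / z
    return sum(w * (x - mean) ** 2 for w, x in zip(weights, xs)) / z


sharper = tempered_variance(1.0, 2.0)   # score scaled up: narrower distribution
flatter = tempered_variance(1.0, 0.5)   # score scaled down: broader distribution
```

For a Gaussian the tempered variance is sigma^2 / k, so k > 1 sharpens and k < 1 flattens the distribution, matching the "sharper or broader" behavior described in the abstract.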

2510.00387 2026-05-26 cs.LG cs.HC (updated version)

Bayesian Distributional Models of Executive Functioning

Robert Kasumba, Zeyu Lu, Dom CP Marticorena, Mingyang Zhong, Paul Beggs, Anja Pahor, Geetha Ramani, Imani Goffney, Susanne M Jaeggi, Aaron R Seitz, Jacob R Gardner, Dennis L Barbour

Affiliations: Division of Computing and Data Science, Washington University; Department of Computer Science and Engineering, Washington University; Department of Biomedical Engineering, Washington University; Department of Psychology, University of Maribor; Department of Human Development and Quantitative Methodology, University of Maryland; Department of Teaching and Learning, Policy and Leadership, University of Maryland; Department of Psychology, Northeastern University; Department of Computer and Information Science, University of Pennsylvania

AI summary: Using controlled simulations with known ground-truth parameters, evaluates Distributional Latent Variable Models (DLVM) and Bayesian Distributional Active LEarning (DALE) against conventional Independent Maximum Likelihood Estimation (IMLE), and shows that DLVM combined with DALE estimates distributions of cognitive performance more efficiently.

Comments 45 pages, 8 figures, 2 tables

Abstract

This study uses controlled simulations with known ground-truth parameters to evaluate how Distributional Latent Variable Models (DLVM) and Bayesian Distributional Active LEarning (DALE) perform in comparison to conventional Independent Maximum Likelihood Estimation (IMLE). DLVM integrates observations across multiple executive function tasks and individuals, allowing parameter estimation even under sparse or incomplete data conditions. To establish a known ground truth, we uniformly sample individual sessions from a latent space learned by a neural network and map them to distributional cognitive performance across different tasks. Individual test items are then sampled from these distributions using DALE, a random procedure, or a standard fixed-battery approach. When given the same set of observations, DLVM consistently outperformed IMLE, especially with smaller amounts of data, and converged faster to highly accurate estimates of the true distributions. In a second set of analyses, DALE adaptively guided sampling to maximize information gain, outperforming random sampling and fixed test batteries, particularly within the first 80 trials. These findings establish the advantages of combining DLVM's cross-task inference with DALE's optimal adaptive sampling, providing a principled basis for more efficient cognitive assessments.

2509.13608 2026-05-26 cs.LG (updated version)

Is GPT-4o mini Blinded by its Own Safety Filters? Exposing the Multimodal-to-Unimodal Bottleneck in Hate Speech Detection

Niruthiha Selvanayagam, Ted Kurti

AI summary: Systematically analyzes the safety architecture of GPT-4o mini in multimodal hate speech detection using the Hateful Memes Challenge dataset, and identifies and experimentally validates a "Unimodal Bottleneck" flaw in which context-blind safety filters preempt multimodal reasoning and cause false positives.

Comments This paper reports preliminary findings from a small-scale study whose sample size is insufficient to support the stated conclusions. The authors are withdrawing it to conduct a more comprehensive evaluation

Abstract

As Large Multimodal Models (LMMs) become integral to daily digital life, understanding their safety architectures is a critical problem for AI Alignment. This paper presents a systematic analysis of OpenAI's GPT-4o mini, a globally deployed model, on the difficult task of multimodal hate speech detection. Using the Hateful Memes Challenge dataset, we conduct a multi-phase investigation on 500 samples to probe the model's reasoning and failure modes. Our central finding is the experimental identification of a "Unimodal Bottleneck," an architectural flaw where the model's advanced multimodal reasoning is systematically preempted by context-blind safety filters. A quantitative validation of 144 content policy refusals reveals that these overrides are triggered in equal measure by unimodal visual (50%) and textual (50%) content. We further demonstrate that this safety system is brittle, blocking not only high-risk imagery but also benign, common meme formats, leading to predictable false positives. These findings expose a fundamental tension between capability and safety in state-of-the-art LMMs, highlighting the need for more integrated, context-aware alignment strategies to ensure AI systems can be deployed both safely and effectively.

2509.12783 2026-05-26 q-bio.NC cs.LG math.DS stat.ML (updated version)

Fast reconstruction of degenerate populations of conductance-based neuron models from spike times

Julien Brandoit, Damien Ernst, Guillaume Drion, Arthur Fyon

Affiliations: Montefiore Institute, University of Liège; LTCI, Telecom Paris, Institut Polytechnique de Paris

AI summary: Combines deep learning with Dynamic Input Conductance theory to rapidly reconstruct degenerate populations of high-dimensional conductance-based models from spike times, giving accurate, robust, and scalable inference.

DOI: 10.1371/journal.pcbi.1014337
Journal ref: PLOS Computational Biology 22(5): e1014337 (2026)

Abstract

Inferring the biophysical parameters of conductance-based models (CBMs) from experimentally accessible recordings remains a central challenge in computational neuroscience. Spike times are the most widely available data, yet they reveal little about which combinations of ion channel conductances generate the observed activity. This inverse problem is further complicated by neuronal degeneracy, where multiple distinct conductance sets yield similar spiking patterns. We introduce a method that addresses this challenge by combining deep learning with Dynamic Input Conductances (DICs), a theoretical framework that reduces complex CBMs to three interpretable feedback components governing excitability and firing patterns. Our approach first maps spike times to DIC densities at threshold using a neural network that learns a low-dimensional representation of neuronal activity. The predicted DIC values are then used to generate degenerate CBM populations via an iterative compensation algorithm, ensuring compatibility with the intermediate target DICs, and thereby reproducing the corresponding firing patterns, even in high-dimensional models. Applied to two models, this algorithmic pipeline reconstructs spiking and bursting regimes with high accuracy and robustness to variability, including spike trains generated under noisy current injection mimicking physiological stochasticity. It produces diverse degenerate populations within milliseconds on standard hardware, enabling scalable and efficient inference from spike recordings alone. Together, this work positions DICs as a practical and interpretable link between experimentally observed activity and mechanistic models. By enabling fast and scalable reconstruction of degenerate populations directly from spike times, our approach provides a powerful way to investigate how neurons exploit conductance variability to achieve reliable computation.

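2509.11379 2026-05-26 stat.ML cs.LG math.ST stat.TH (updated version)

FLoRIST: Singular Value Thresholding for Efficient and Accurate Federated Fine-Tuning of Large Language Models

Hariharan Ramesh, Jyotikrishna Dass

Affiliations: Anonymous Institution, Anonymous City, Anonymous Region, Anonymous Country

AI summary: Proposes the FLoRIST framework, which decomposes local adapters in a compact intermediate space and applies singular value thresholding, achieving mathematically accurate aggregation while remaining communication- and computation-efficient.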

Comments 21 pages, 12 figures

Journal ref: Ninth Conference on Machine Learning and Systems (MLSys 2026)

Abstract

Integrating Low-Rank Adaptation (LoRA) into federated learning offers a promising solution for parameter-efficient fine-tuning of Large Language Models (LLMs) without sharing local data. However, several methods designed for federated LoRA present significant challenges in balancing communication efficiency, model accuracy, and computational cost, particularly among heterogeneous clients. These methods either rely on simplistic averaging of local adapters, which introduces aggregation noise, require transmitting large stacked local adapters, leading to poor communication efficiency, or necessitate reconstructing a memory-dense global weight-update matrix and performing a computationally expensive decomposition to design client-specific low-rank adapters. In this work, we propose FLoRIST, a federated fine-tuning framework that achieves mathematically accurate aggregation without incurring high communication or computational overhead. Instead of constructing the full global weight-update matrix at the server, FLoRIST employs an efficient decomposition pipeline by performing singular value decomposition on stacked local adapters separately. This approach operates within a compact intermediate space to represent the accumulated information from local LoRAs. We introduce tunable singular value thresholding for server-side optimal rank selection to construct a pair of global low-rank adapters shared by all clients. Extensive empirical evaluations across multiple datasets and LLMs demonstrate that FLoRIST consistently strikes the best balance between superior communication efficiency and competitive performance in both homogeneous and heterogeneous setups.
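The idea of decomposing the stacked adapters separately and working in a small intermediate space can be sketched with NumPy. This is a reading of the abstract under assumptions, not the authors' implementation: the function name, the client weighting, and the cumulative-energy rule controlled by `tau` are hypothetical. The sketch relies on the identity that the weighted sum of client updates B_i A_i equals the product of the horizontally stacked (weighted) B matrices and the vertically stacked A matrices.

```python
import numpy as np


def aggregate_lora(Bs, As, weights, tau=0.99):
    """Return global (B, A) with B @ A approximating sum_i weights[i] * Bs[i] @ As[i].

    Bs[i] has shape (d, r_i) and As[i] has shape (r_i, k); ranks may differ.
    """
    B_stack = np.hstack([w * B for w, B in zip(weights, Bs)])  # (d, R)
    A_stack = np.vstack(As)                                    # (R, k)
    Ub, Sb, Vbt = np.linalg.svd(B_stack, full_matrices=False)
    Ua, Sa, Vat = np.linalg.svd(A_stack, full_matrices=False)
    # Small core matrix: the full (d, k) update is never materialized.
    core = (Sb[:, None] * Vbt) @ (Ua * Sa[None, :])
    Uc, Sc, Vct = np.linalg.svd(core, full_matrices=False)
    # Keep the smallest rank whose squared singular values reach fraction tau.
    energy = np.cumsum(Sc ** 2) / np.sum(Sc ** 2)
    rank = min(int(np.searchsorted(energy, tau)) + 1, len(Sc))
    B_glob = (Ub @ Uc[:, :rank]) * Sc[:rank]
    A_glob = Vct[:rank] @ Vat
    return B_glob, A_glob


rng = np.random.default_rng(0)
Bs = [rng.normal(size=(8, 2)), rng.normal(size=(8, 3))]
As = [rng.normal(size=(2, 6)), rng.normal(size=(3, 6))]
weights = [0.4, 0.6]
target = sum(w * B @ A for w, B, A in zip(weights, Bs, As))
B_full, A_full = aggregate_lora(Bs, As, weights, tau=1.0)
B_low, A_low = aggregate_lora(Bs, As, weights, tau=0.5)
```

With `tau=1.0` no singular values are discarded and the product reproduces the weighted sum exactly up to floating-point error; lowering `tau` trades accuracy for a smaller shared rank, and `target` is formed here only to check the result.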

2506.09084 2026-05-26 cs.LG cs.AI 版本更新

PageLLM: A Multi-Grained Reward Framework for Whole-Page Optimization with Large Language Models

PageLLM：面向整页优化的大语言模型多粒度奖励框架

Xinyuan Wang, Liang Wu, Dongjie Wang, Yanjie Fu

发表机构 * Arizona State University（亚利桑那州立大学）； Nokia（诺基亚）； University of Kansas（堪萨斯大学）

AI总结针对整页优化中人工标注成本高和页面级连贯性与项目级放置粒度不匹配的问题，提出PageLLM框架，通过将隐式反馈解耦为粗粒度页面级奖励和细粒度项目级奖励，结合PPO的RLHF进行微调，显著提升排序性能并在线上部署中取得收益。

详情

AI中文摘要

整页优化（WPO）决定了搜索和推荐结果如何呈现给用户，而大语言模型（LLMs）通过将页面生成视为序列生成为其开辟了新途径。然而，将LLMs适配到网络规模的WPO仍受限于昂贵的人工标注需求以及页面级连贯性与项目级放置之间的粒度不匹配。在这项工作中，我们表明这两个挑战是耦合的：只要奖励信号被解耦为两个互补的粒度，仅凭隐式用户反馈就足以进行对齐。我们提出了PageLLM，一个基于奖励的微调框架，该框架（i）将隐式反馈转化为四个对比偏好对族，涵盖相关性、排序、多样性和冗余度；（ii）学习一个粗粒度的页面级奖励和一个细粒度的项目级奖励，后者捕捉对参与度敏感的位置交换；（iii）在预训练的LLM上通过基于PPO的RLHF结合这两种奖励。在七个亚马逊类别上针对十一个基线的广泛实验表明，单独任何一种奖励都不足够——丢弃页面级或项目级信号分别使NDCG@100降低17.8%和15.2%，而联合奖励则使NDCG@100提升高达46.8%。在拥有1000万用户的在线A/B测试中，PageLLM使GMV提升0.44%，点击率提升0.14%，证实了来自隐式反馈的多粒度奖励可扩展到生产级WPO。代码和数据可在匿名仓库中获取。

英文摘要

Whole-page optimization (WPO) decides how search and recommendation results are surfaced to users, and large language models (LLMs) open a new route to it by treating page generation as sequence generation. Adapting LLMs to web-scale WPO, however, remains bottlenecked by the need for costly human annotations and by the mismatched granularity between page-level coherence and item-level placement. In this work we show that these two challenges are coupled: implicit user feedback alone suffices for alignment, provided the reward signal is decoupled into two complementary granularities. We propose PageLLM, a reward-based fine-tuning framework that (i) turns implicit feedback into four contrastive preference-pair families covering relevance, ranking, diversity, and redundancy, (ii) learns a coarse page-level reward and a fine item-level reward that captures engagement-sensitive position swaps, and (iii) combines both rewards in PPO-based RLHF over a pre-trained LLM. Extensive experiments on seven Amazon categories against eleven baselines show that neither reward alone is sufficient -- dropping the page-level or item-level signal reduces NDCG@100 by 17.8% and 15.2% respectively, whereas the joint reward improves NDCG@100 by up to 46.8%. Deployed in a 10M-user online A/B test, PageLLM raises GMV by 0.44% and click-through rate by 0.14%, confirming that multi-grained rewards from implicit feedback scale to production WPO. Code and data are available at an anonymized repository.
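
For readers unfamiliar with the NDCG@100 metric quoted above, here is a minimal sketch of NDCG@k. It assumes the linear-gain variant (relevance divided by a log2 position discount); the abstract does not say which gain function the authors use, and the function names are illustrative.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k positions (linear gain)."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG of the given ordering divided by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Here `relevances` lists the relevance grades of the items in the order the page presents them, so a page that already ranks items from most to least relevant scores 1.0.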

2506.00181 2026-05-26 cs.LG stat.ML 版本更新

On the Interaction of Batch Noise, Adaptivity, and Compression, under $(L_0,L_1)$-Smoothness: An SDE Approach

关于批噪声、自适应性和压缩在$(L_0,L_1)$-光滑性下的相互作用：一种SDE方法

Enea Monzio Compagnoni, Rustem Islamov, Frank Norbert Proske, Aurelien Lucchi, Antonio Orvieto, Eduard Gorbunov

发表机构 * University of Basel（巴塞尔大学）； University of Oslo（奥斯陆大学）； MBZUAI（穆罕默德·本·扎耶德人工智能大学）； Max Planck Institute for Intelligent Systems（马克斯·普朗克智能系统研究所）； ELLIS Institute Tübingen（图宾根ELLIS研究所）； Tübingen AI Center（图宾根人工智能中心）

AI总结本文通过随机微分方程（SDE）框架，在$(L_0,L_1)$-光滑性假设下统一分析分布式压缩SGD及其符号变体，揭示了梯度噪声、通信压缩和自适应更新之间的相互作用，并提出了新的SDE模型以准确捕捉学习率限制与几何特性的关系。

Comments Accepted at ICML 2026 (Poster)

详情

AI中文摘要

分布式随机优化交织了（i）随机梯度噪声、（ii）通信压缩和（iii）自适应/归一化更新。虽然每个因素已被单独研究，但在现实假设下它们的联合效应仍然知之甚少。在这项工作中，我们在最近引入的$(L_0, L_1)$-光滑性条件下，为分布式压缩SGD（DCSGD）及其符号变体分布式符号SGD（DSignSGD）开发了一个统一的理论框架。从概念角度，我们表明文献中的一阶和二阶修正方程不能准确建模离散时间步长/稳定性限制，特别是在$(L_0,L_1)$-光滑性下。从技术角度，我们通过将曲率相关项仔细纳入其漂移中，提出了新的一阶SDE：这有助于捕捉学习率限制、梯度噪声、压缩和损失景观几何之间的细粒度关系。重要的是，这一分析在一般梯度噪声假设下完成，包括重尾和仿射方差情形，超出了经典的有界方差设置。我们的结果表明，对DCSGD的更新进行归一化是保证稳定性的自然条件，归一化程度由梯度噪声结构、景观正则性和压缩率精确决定。相比之下，DSignSGD即使在重尾噪声下也能以标准学习率调度收敛。这些发现共同提供了新的理论见解和视角，以及实践指导。

英文摘要

Distributed stochastic optimization intertwines (i) stochastic gradient noise, (ii) communication compression, and (iii) adaptive/normalized updates. While each factor has been studied in isolation, their joint effect under realistic assumptions remains poorly understood. In this work, we develop a unified theoretical framework for Distributed Compressed SGD (DCSGD) and its sign variant Distributed SignSGD (DSignSGD) under the recently introduced $(L_0, L_1)$-smoothness condition. From a conceptual perspective, we show that the first- and second-order modified equations from the literature do not accurately model the discrete-time step-size/stability restrictions, especially under $(L_0,L_1)$-smoothness. From a technical perspective, we propose new first-order SDEs by carefully incorporating curvature-dependent terms into their drift: This helps capture the fine-grained relationship between learning rate restrictions, gradient noise, compression, and the geometry of the loss landscape. Importantly, we do so under general gradient noise assumptions, including heavy-tailed and affine-variance regimes, which extend beyond the classical bounded-variance setting. Our results suggest that normalizing the updates of DCSGD emerges as a natural condition for stability, with the degree of normalization precisely determined by the gradient noise structure, the landscape's regularity, and the compression rate. In contrast, DSignSGD converges even under heavy-tailed noise with standard learning rate schedules. Together, these findings offer both new theoretical insights and perspectives, and practical guidance.
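
To make the contrast between plain, normalized, and sign-based updates concrete, the sketch below shows the three step rules for a single worker without compression. It is a simplified illustration under those assumptions, not the DCSGD or DSignSGD algorithms analyzed in the paper; all function names are hypothetical.

```python
import math


def sgd_step(x, g, lr):
    """Plain SGD: the step length scales with the gradient magnitude."""
    return [xi - lr * gi for xi, gi in zip(x, g)]


def normalized_step(x, g, lr, eps=1e-12):
    """Normalized SGD: the step length is (almost exactly) lr, whatever the gradient norm."""
    norm = math.sqrt(sum(gi * gi for gi in g))
    return [xi - lr * gi / (norm + eps) for xi, gi in zip(x, g)]


def sign_step(x, g, lr):
    """SignSGD: every coordinate moves by exactly lr in the descent direction (or stays put)."""
    return [xi - lr * ((gi > 0) - (gi < 0)) for xi, gi in zip(x, g)]
```

With a heavy-tailed gradient sample such as `[1e6, -3.0]`, the plain step moves the first coordinate by `lr * 1e6`, while the normalized and sign steps stay bounded by `lr`, which gives some intuition for why these variants tolerate noisier gradients.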

2505.18979 2026-05-26 cs.LG 版本更新

Dynamic Optimization and Safety Indicator Injection for Jailbreaking Text-to-Image Models with Multimodal Safety Filters

动态优化与安全指示注入：针对多模态安全过滤器的文本到图像模型越狱方法

Zixuan Chen, Hao Lin, Ke Xu, Xinghao Jiang, Tanfeng Sun

发表机构 * Shanghai Jiao Tong University（上海交通大学）； The Chinese University of Hong Kong（香港中文大学）

AI总结提出OptJail框架，通过动态提示优化与自适应安全指示注入，绕过文本和图像过滤器，实现高成功率越狱，并揭示多模态防御的系统性漏洞。

详情

AI中文摘要

文本到图像（T2I）模型可能生成不安全内容，促使采用包含文本和图像过滤器的多阶段安全流水线。新型基于LLM的过滤器能检测关键词之外的潜在意图，使得令牌级扰动攻击不可靠。我们的评估进一步表明，现有越狱方法在绕过过滤器和保持语义保真度之间存在尖锐权衡，同时需要过多查询才能成功。我们提出OptJail，一种自动化越狱框架，结合动态提示优化与多模态反馈。它包含两个关键组件：(i) 动态优化，一种迭代过程，利用文本过滤器反馈和语义一致性将提示改写为对抗变体；(ii) 自适应安全指示注入，将良性视觉线索的注入建模为强化学习问题，以绕过图像级过滤器。OptJail实现了最先进的性能，将ShieldLM-7B的绕过率从8.9%（Sneakyprompt）提高到99.0%，CLIP分数从0.2637提升到0.2762。此外，它能泛化到未见过的过滤器，并在我们的评估中成功越狱DALL·E 3。机制分析揭示了这些防御失败的原因：优化后的提示被投影到过滤器表示空间的“安全”区域，但在生成模型的语义空间中几乎保持静止；注入的安全指示将图像检测器的注意力从不安全内容转向良性视觉线索。本研究揭示了当前多模态防御的系统性漏洞，并激励更强的自适应防御。

英文摘要

Text-to-image (T2I) models can generate not-safe-for-work (NSFW) content, motivating multi-stage safety pipelines with both text and image filters. Newer LLM-based filters detect latent intent beyond keywords, making token-level perturbation attacks unreliable. Our evaluation further shows that existing jailbreak methods exhibit a sharp trade-off between filter evasion and semantic fidelity, while also requiring excessive queries to succeed. We introduce OptJail, an automated jailbreak framework that combines dynamic prompt optimization with multimodal feedback. It consists of two key components: (i) Dynamic Optimization, an iterative process that leverages text-filter feedback and semantic consistency to rewrite prompts into adversarial variants; and (ii) Adaptive Safety Indicator Injection, which formulates the injection of benign visual cues as a reinforcement learning problem to bypass image-level filters. OptJail achieves state-of-the-art performance, increasing the ShieldLM-7B bypass rate from 8.9% (Sneakyprompt) to 99.0% and improving the CLIP score from 0.2637 to 0.2762. Moreover, it generalizes to unseen filters and successfully jailbreaks DALL·E 3 in our evaluation. Mechanistic analysis reveals why these defenses fail: optimized prompts are projected into the "safe" region of the filter's representation space yet remain nearly stationary in the generative model's semantic space, and injected safety indicators redirect image detectors' attention away from NSFW content toward benign visual cues. This study reveals systemic vulnerabilities in current multimodal defenses and motivates stronger adaptive defenses.

2505.13878 2026-05-26 cs.LG cs.CL 版本更新

InfiFPO: Implicit Model Fusion via Preference Optimization in Large Language Models

InfiFPO：通过偏好优化实现大型语言模型的隐式模型融合

Yanggan Gu, Yuanyi Wang, Zhaoyi Yan, Yiming Zhang, Qi Zhou, Fei Wu, Hongxia Yang

发表机构 * The Hong Kong Polytechnic University (PolyU)（香港理工大学）； Zhejiang University（浙江大学）； PolyU-Daya Bay Technology and Innovation Research Institute（香港理工大学-大亚湾技术与创新研究院）

AI总结提出InfiFPO方法，通过将DPO中的参考模型替换为融合源模型，在序列级别合成多源概率，实现隐式模型融合，从而在偏好对齐阶段有效融合多个LLM并提升性能。

详情

Journal ref: NeurIPS 2025

AI中文摘要

模型融合通过轻量训练方法将具有不同优势的多个大型语言模型（LLM）组合成一个更强大的集成模型。现有的模型融合工作主要关注监督微调（SFT），而偏好对齐（PA）——增强LLM性能的关键阶段——在很大程度上未被探索。当前少数在PA阶段的融合方法（如WRPO）通过仅利用源模型的响应输出而丢弃其概率信息来简化过程。为了解决这一局限性，我们提出了InfiFPO，一种用于隐式模型融合的偏好优化方法。InfiFPO将直接偏好优化（DPO）中的参考模型替换为一个融合源模型，该模型在序列级别合成多源概率，从而规避了先前工作中复杂的词汇对齐挑战，同时保留了概率信息。通过引入概率裁剪和最大边际融合策略，InfiFPO使枢轴模型能够与人类偏好对齐，同时有效地从源模型中蒸馏知识。在11个广泛使用的基准上的综合实验表明，InfiFPO始终优于现有的模型融合和偏好优化方法。当使用Phi-4作为枢轴模型时，InfiFPO在11个基准上的平均性能从79.95提升至83.33，显著增强了其在数学、编码和推理任务上的能力。

英文摘要

Model fusion combines multiple Large Language Models (LLMs) with different strengths into a more powerful, integrated model through lightweight training methods. Existing works on model fusion focus primarily on supervised fine-tuning (SFT), leaving preference alignment (PA), a critical phase for enhancing LLM performance, largely unexplored. The few existing fusion methods for the PA phase, such as WRPO, simplify the process by utilizing only response outputs from source models while discarding their probability information. To address this limitation, we propose InfiFPO, a preference optimization method for implicit model fusion. InfiFPO replaces the reference model in Direct Preference Optimization (DPO) with a fused source model that synthesizes multi-source probabilities at the sequence level, circumventing the complex vocabulary alignment challenges of previous works while retaining the probability information. By introducing probability clipping and max-margin fusion strategies, InfiFPO enables the pivot model to align with human preferences while effectively distilling knowledge from source models. Comprehensive experiments on 11 widely used benchmarks demonstrate that InfiFPO consistently outperforms existing model fusion and preference optimization methods. When using Phi-4 as the pivot model, InfiFPO improves its average performance from 79.95 to 83.33 on 11 benchmarks, significantly improving its capabilities in mathematics, coding, and reasoning tasks.
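
As background for the DPO objective that InfiFPO modifies, here is a minimal sketch of the standard DPO loss for one preference pair, written on sequence-level log-probabilities. In InfiFPO the reference terms would come from the fused source model; how that fusion, the probability clipping, and the max-margin strategy work is not specified in the abstract, so `ref_chosen` and `ref_rejected` are simply numbers supplied by the caller here. All names are illustrative.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for a single (chosen, rejected) pair.

    All four inputs are sequence-level log-probabilities. The loss is
    -log(sigmoid(beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)))).
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # Numerically stable evaluation of -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss equals log 2 when the policy and the reference agree on the pair, and it falls as the policy prefers the chosen response more strongly than the reference does.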

2505.03677 2026-05-26 cs.LG 版本更新

Neural Integral Operators for Inverse Problems: An Operator-Learning Framework for Small-Sample Spectroscopic Classification

逆问题的神经积分算子：小样本光谱分类的算子学习框架

Emanuele Zappala, Alice Giola, Andreas Kramer, Saugat Acharya, Enrico Greco

发表机构 * Department of Mathematics and Statistics, Idaho State University, Physical Science Complex, 921 S. 8th Ave., Stop 8085, Pocatello, ID 83209, USA（数学与统计学系，爱达荷州立大学，物理科学中心，921 S. 8th Ave., Stop 8085, Pocatello, ID 83209, USA）； Department of Computer Science, Idaho State University, 921 S. 8th Ave Mail Stop 8060, Pocatello, ID 83209-8023, USA（计算机科学系，爱达荷州立大学，921 S. 8th Ave Mail Stop 8060, Pocatello, ID 83209-8023, USA）； Institute for the Advanced Study of Culture and the Environment (IASCE), University of South Florida, 4202 E Fowler Ave, Tampa, FL 33620, USA（文化与环境高级研究所（IASCE），南佛罗里达大学，4202 E Fowler Ave, Tampa, FL 33620, USA）

AI总结提出神经积分算子（NIO）框架，通过参数化Urysohn核和蒙特卡洛采样隐式正则化，在小样本光谱分类任务中优于传统机器学习和深度学习基线。

Comments 20 pages. 4 figures, 3 tables. v2: Link to code repository added. v3: Article largely reorganized and several portions rewritten for clarity. Comments are welcome

详情

AI中文摘要

在软计算中，学习具有强归纳偏置的函数空间映射是一个核心挑战，尤其是在训练数据稀缺且标准深度架构过拟合的情况下。我们引入了一种基于第一类积分方程的神经积分算子（NIO）框架，其中算子的Urysohn核由前馈网络$G_{\theta_G}$参数化，潜在函数由卷积编码器$E_{\phi_E}$生成，两者通过交叉熵损失进行端到端联合训练。学习算子的积分通过蒙特卡洛采样近似，我们认为这充当了在被积函数层面操作的隐式随机正则化器，补充了权重衰减和dropout等参数级正则化器。我们在三个不同规模和复杂度的真实世界光谱分类任务（FT-IR水果泥、NIR肉类、NIR纺织品）上对该框架进行了基准测试，并与传统机器学习（决策树、支持向量机，有无UMAP）和现代深度学习基线（FFNN、CNN+FFNN、浅层CNN、Transformer）进行了比较。所提出的NIO在所有数据集和指标上始终位居前两名，在最具挑战性的小样本复杂数据集（纺织品）上取得了最佳结果，并且在数据稀缺情况下比竞争深度模型具有更低的性能方差。结果表明，当传统深度学习方法受限于数据稀缺时，具有随机数值积分的算子学习架构是光谱学中逆问题的一种可行的软计算策略。

英文摘要

Learning maps between function spaces with a strong inductive bias is a central challenge in soft computing, especially when training data are scarce and standard deep architectures overfit. We introduce a neural integral operator (NIO) framework based on integral equations of the first kind, in which the Urysohn kernel of the operator is parameterized by a feed-forward network $G_{\theta_G}$ and the latent function is produced by a convolutional encoder $E_{\phi_E}$, both trained jointly end-to-end via cross-entropy loss. The integral defining the learned operator is approximated by Monte Carlo sampling, which we argue acts as an implicit stochastic regularizer operating at the level of the integrand and complementing parameter-level regularizers such as weight decay and dropout. We benchmark the framework on three real-world spectroscopic classification tasks (FT-IR fruit purees, NIR meat, NIR textiles) of varying size and complexity, against traditional machine learning (decision tree, support vector machine, with and without UMAP) and modern deep learning baselines (FFNN, CNN+FFNN, shallow CNN, transformer). The proposed NIO is consistently among the top two performing models across all datasets and metrics, achieves the best results on the most challenging small-and-complex dataset (Textile), and yields lower performance variance than competing deep models in the small-data regime. The results suggest that operator-learning architectures with stochastic numerical integration are a viable soft-computing strategy for inverse problems in spectroscopy when conventional deep learning approaches are limited by data scarcity.
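
The Monte Carlo approximation mentioned above can be sketched in a few lines. The example assumes a Urysohn-type operator of the form (Tz)(x) = ∫₀¹ G(x, y, z(y)) dy with uniform sampling on [0, 1]; the integration domain, the sampling distribution, and the names `kernel` and `latent` are assumptions for illustration, standing in for the learned networks.

```python
import random


def mc_integral_operator(kernel, latent, x, n_samples=1000, rng=None):
    """Estimate (Tz)(x) = integral over [0, 1] of kernel(x, y, latent(y)) dy by sampling.

    Each call draws fresh sample points, so repeated evaluations differ slightly;
    this randomness is what the abstract describes as an implicit regularizer.
    """
    rng = rng if rng is not None else random
    total = 0.0
    for _ in range(n_samples):
        y = rng.random()  # uniform sample on [0, 1)
        total += kernel(x, y, latent(y))
    return total / n_samples
```

For instance, with `kernel = lambda x, y, z: x * z` and `latent = lambda y: y`, the exact value at `x = 2.0` is 1.0, and the estimate approaches it as `n_samples` grows.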

2411.06278 2026-05-26 math.NA cs.LG cs.NA math.OC 版本更新

训练后的量子神经网络是高斯过程

Filippo Girardi, Giacomo De Palma

发表机构 * Korteweg–de Vries Institute for Mathematics, University of Amsterdam（阿姆斯特丹大学Korteweg–de Vries数学研究所）； QuSoft, Science Park 123, Amsterdam（阿姆斯特丹QuSoft）； Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126, Pisa (PI), Italy（意大利比萨高等师范学院）； Department of Mathematics, University of Bologna（博洛尼亚大学数学系）

AI总结研究由参数化单量子比特门和固定双量子比特门构成的量子神经网络在无限宽度极限下的行为，证明未训练和训练后的网络生成的函数概率分布均收敛于高斯过程，并分析了测量噪声的影响。

Comments 116 pages

详情

DOI: 10.1007/s00220-025-05238-0
Journal ref: Communications in Mathematical Physics 406, 92 (2025)

AI中文摘要

我们研究了由参数化单量子比特门和固定双量子比特门构成的量子神经网络在无限宽度极限下的行为，其中生成的函数是所有量子比特上单量子比特可观测量之和的期望值。首先，我们证明，当每个被测量的量子比特仅与少数其他被测量的量子比特相关时，具有随机初始化参数的未训练网络生成的函数的概率分布在分布上收敛于高斯过程。然后，我们通过梯度下降和平方损失在监督学习问题上解析地刻画了网络的训练。我们证明，只要网络不受贫瘠高原的影响，训练后的网络可以完美拟合训练集，并且训练后生成的函数的概率分布仍然在分布上收敛于高斯过程。最后，我们考虑网络输出端测量的统计噪声，并证明多项式数量的测量足以使所有先前的结果成立，并且网络始终可以在多项式时间内训练。

英文摘要

We study quantum neural networks made by parametric one-qubit gates and fixed two-qubit gates in the limit of infinite width, where the generated function is the expectation value of the sum of single-qubit observables over all the qubits. First, we prove that the probability distribution of the function generated by the untrained network with randomly initialized parameters converges in distribution to a Gaussian process whenever each measured qubit is correlated only with few other measured qubits. Then, we analytically characterize the training of the network via gradient descent with square loss on supervised learning problems. We prove that, as long as the network is not affected by barren plateaus, the trained network can perfectly fit the training set and that the probability distribution of the function generated after training still converges in distribution to a Gaussian process. Finally, we consider the statistical noise of the measurement at the output of the network and prove that a polynomial number of measurements is sufficient for all the previous results to hold and that the network can always be trained in polynomial time.

2310.04981 2026-05-26 cs.CV cs.LG 版本更新

Compositional Semantics for Open Vocabulary Spatio-semantic Representations

开放词汇空间语义表示的组合语义

Robin Karlsson, Francisco Lepe-Salazar, Kazuya Takeda

发表机构 * Graduate School of Informatics, Nagoya University（名古屋大学信息学研究科）； Ludolab ； TIER IV

AI总结提出潜在组合语义嵌入z*作为可查询空间语义记忆的知识表示，证明其存在性、最优性及可发现性，并引入充分相似性推理方法提升重叠语义推理性能。

Comments Preprint

详情

AI中文摘要

视觉语言模型（VLM）将环境感知转换为LLM可解释的视觉语言语义。然而，完成复杂任务通常需要对当前感知之外的信息进行推理。我们提出潜在组合语义嵌入z*作为可查询空间语义记忆的基于学习的原则性知识表示。我们在数学上证明z*总是可以找到，并且最优z*是任何集合Z的质心。我们推导了估计相关和不相关语义可分离性的概率界限。我们证明z*可以通过迭代梯度下降从视觉外观和单一描述中发现。我们在包括CLIP和SBERT的四个嵌入空间上实验验证了我们的发现。结果表明，z*可以表示由SBERT编码的多达10个语义，以及理想均匀分布的高维嵌入的多达100个语义。我们引入了三个具有重叠语义的新数据集，以表明在常规非重叠注释上训练的常见VLM能够发现z*。我们提出的充分相似性推理方法克服了传统推理的根本局限性，并将更高层次的重叠语义推理性能平均提高了19.63 mIoU。

英文摘要

Vision-language models (VLMs) transform environment percepts into vision-language semantics interpretable by LLMs. However, completing complex tasks often requires reasoning about information beyond what is currently perceived. We propose latent compositional semantic embeddings z* as a principled learning-based knowledge representation for queryable spatio-semantic memories. We mathematically prove that z* can always be found, and that the optimal z* is the centroid for any set Z. We derive a probabilistic bound for estimating separability of related and unrelated semantics. We prove that z* is discoverable from visual appearance and singular descriptions by iterative gradient descent. We experimentally verify our findings on four embedding spaces including CLIP and SBERT. Our results show that z* can represent up to 10 semantics encoded by SBERT, and up to 100 semantics for ideal uniformly distributed high-dimensional embeddings. We introduce three new datasets with overlapping semantics to show that common VLMs trained on conventional nonoverlapping annotations discover z*. Our novel sufficient similarity inference method overcomes fundamental limitations of conventional inference, and improves higher-level overlapping semantic inference performance by 19.63 mIoU on average.
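
The claim that the optimal z* is the centroid can be illustrated with a small numerical check. The sketch assumes the objective is the total dot-product similarity between a unit-norm z and a set Z of unit-norm embeddings; the abstract does not state the exact objective, so this choice and the function names are assumptions. Under that objective the normalized mean of Z is the maximizer, by the Cauchy-Schwarz inequality.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def centroid_embedding(embeddings):
    """Unit-norm centroid of a set of equal-length embedding vectors."""
    dim = len(embeddings[0])
    mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    return normalize(mean)


def total_similarity(z, embeddings):
    """Sum of dot products between z and every embedding in the set."""
    return sum(sum(a * b for a, b in zip(z, e)) for e in embeddings)
```

Comparing `total_similarity(centroid_embedding(Z), Z)` against the same score for any other unit vector shows that no candidate beats the centroid under this assumed objective.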

2305.11663 2026-05-26 cs.LG cs.AI cs.CL cs.CY 版本更新

Algorithmic failure as a humanities methodology: machine learning's mispredictions identify rich cases for qualitative analysis

作为人文学科方法论的算法失败：机器学习的错误预测识别出用于定性分析的丰富案例

Jill Walker Rettberg

AI总结本文通过实验验证了Munk等人提出的利用机器学习失败预测识别定性分析中模糊且丰富案例的方法，使用简单kNN算法对虚构角色与机器视觉技术互动的动作数据进行分类，发现不可预测的动作更具矛盾性和情感负荷，支持该方法在人文学科中的适用性。

详情

DOI: 10.1177/20539517221131290
Journal ref: Big Data & Society 9(2) 2022

AI中文摘要

本文评论测试了Munk等人（2022）提出的一种方法论，即利用机器学习中的失败预测作为识别定性分析中模糊且丰富案例的方法。使用一个描述500件艺术品、电影、小说和电子游戏中虚构角色与机器视觉技术互动动作的数据集，我训练了一个简单的机器学习算法（使用R中的kNN算法），仅根据虚构角色的信息预测动作是主动还是被动。可预测的动作通常是缺乏情感且明确的，其中机器视觉技术被当作简单工具。不可预测的动作，即算法无法正确预测的动作，则更加矛盾且情感负荷更重，角色与技术之间的权力关系更为复杂。因此，结果支持Munk等人的理论，即失败预测可以有效地用于识别定性分析的丰富案例。本测试不仅简单复制了Munk等人的结果，还证明了该方法可以应用于更广泛的人文学科领域，并且不需要复杂的神经网络，简单的机器学习算法也能奏效。需要进一步研究以理解该方法适用于哪些类型的数据以及哪种机器学习最具生成性。为此，附上了产生结果所需的R代码，以便复制测试。该代码也可重复使用或改编，以在其他数据集上测试该方法。

英文摘要

This commentary tests a methodology proposed by Munk et al. (2022) for using failed predictions in machine learning as a method to identify ambiguous and rich cases for qualitative analysis. Using a dataset describing actions performed by fictional characters interacting with machine vision technologies in 500 artworks, movies, novels and videogames, I trained a simple machine learning algorithm (using the kNN algorithm in R) to predict whether or not an action was active or passive using only information about the fictional characters. Predictable actions were generally unemotional and unambiguous activities where machine vision technologies were treated as simple tools. Unpredictable actions, that is, actions that the algorithm could not correctly predict, were more ambivalent and emotionally loaded, with more complex power relationships between characters and technologies. The results thus support Munk et al.'s theory that failed predictions can be productively used to identify rich cases for qualitative analysis. This test goes beyond simply replicating Munk et al.'s results by demonstrating that the method can be applied to a broader humanities domain, and that it does not require complex neural networks but can also work with a simpler machine learning algorithm. Further research is needed to develop an understanding of what kinds of data the method is useful for and which kinds of machine learning are most generative. To support this, the R code required to produce the results is included so the test can be replicated. The code can also be reused or adapted to test the method on other datasets.
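
The workflow described above, training a kNN classifier and then reading the cases it gets wrong, can be sketched as follows. The original study used R; this Python version is an independent illustration, and the feature encoding, the value of `k`, and the function names are assumptions.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority label among the k training points closest to query.

    train is a list of (features, label) pairs with equal-length numeric features.
    """
    by_distance = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def failed_predictions(train, test, k=3):
    """Return the test cases the classifier mispredicts: the candidates for close reading."""
    return [
        (features, label)
        for features, label in test
        if knn_predict(train, features, k) != label
    ]
```

The output of `failed_predictions` is not an error report to be minimized but the shortlist of ambiguous cases that the method hands over to qualitative analysis.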

2105.13431 2026-05-26 cs.LG cs.AI cs.SY eess.SY 版本更新

An Offline Risk-aware Policy Selection Method for Bayesian Markov Decision Processes

贝叶斯马尔可夫决策过程的离线风险感知策略选择方法

Giorgio Angelotti, Nicolas Drougard, Caroline Ponzoni Carvalho Chanel

发表机构 * Natural Intelligence Toulouse Institute, University of Toulouse, France（图卢兹大学自然智能研究所）； ISAE-SUPAERO, University of Toulouse, France（图卢兹大学ISAE-SUPAERO）

AI总结针对离线强化学习中模型不确定性导致策略风险高的问题，提出一种基于贝叶斯形式化框架的风险感知策略选择方法EvC，通过最大化贝叶斯后验下的风险感知目标来选择稳健策略。

Comments Preprint, under review

详情

DOI: 10.1016/j.artint.2026.104519
Journal ref: Artificial Intelligence, Volume 354, 2026

挖矿的智能时机：用于比特币硬件投资回报率预测的深度学习框架

Sithumi Wickramasinghe, Bikramjit Das, Dorien Herremans

发表机构 * Singapore University of Technology and Design（新加坡科技设计大学）

AI总结提出MineROI-Net，一种基于Transformer的深度学习框架，将比特币ASIC硬件采购建模为时间序列分类任务，预测一年内的投资回报率类别，在2015-2024年20种ASIC矿机数据上达到83.2%准确率和83.5%宏F1分数。

详情

AI中文摘要

由于市场波动、技术快速过时和协议驱动的收入周期，比特币挖矿硬件的获取需要战略时机。尽管挖矿已演变为资本密集型行业，但关于何时购买新的专用集成电路（ASIC）硬件的指导很少，且没有先前的计算框架解决这一决策问题。我们通过将硬件获取建模为时间序列分类任务来填补这一空白，预测购买ASIC机器是否在一年内产生盈利（投资回报率（ROI）>= 1）、边际（0 < ROI < 1）或亏损（ROI <= 0）的回报。我们提出了MineROI-Net，一种开源的基于Transformer的架构，旨在捕捉挖矿盈利能力中的多尺度时间模式。在2015年至2024年间发布的20种ASIC矿机在不同市场体制下的数据上评估，MineROI-Net优于循环、卷积和基于注意力的基线，达到了83.2%的准确率和83.5%的宏F1分数。该模型展示了强大的经济相关性，在检测亏损时期达到了97.8%的精确率，在检测盈利时期达到了81.5%的精确率，同时避免了将盈利场景误分类为亏损以及反之亦然。这些结果表明，MineROI-Net为挖矿硬件采购时机提供了一种实用的数据驱动工具，可能降低资本密集型挖矿操作中的财务风险。

英文摘要

Bitcoin mining hardware acquisition requires strategic timing due to volatile markets, rapid technological obsolescence, and protocol-driven revenue cycles. Despite mining's evolution into a capital-intensive industry, there is little guidance on when to purchase new Application-Specific Integrated Circuit (ASIC) hardware, and no prior computational frameworks address this decision problem. We address this gap by formulating hardware acquisition as a time series classification task, predicting whether purchasing ASIC machines yields profitable (Return on Investment (ROI) >= 1), marginal (0 < ROI < 1), or unprofitable (ROI <= 0) returns within one year. We propose MineROI-Net, an open-source Transformer-based architecture designed to capture multi-scale temporal patterns in mining profitability. Evaluated on data from 20 ASIC miners released between 2015 and 2024 across diverse market regimes, MineROI-Net outperforms recurrent, convolutional, and attention-based baselines, achieving 83.2% accuracy and 83.5% macro F1-score. The model demonstrates strong economic relevance, achieving 97.8% precision in detecting unprofitable periods and 81.5% precision in detecting profitable ones, while avoiding misclassifying profitable scenarios as unprofitable and vice versa. These results indicate that MineROI-Net offers a practical, data-driven tool for timing mining hardware acquisitions, potentially reducing financial risk in capital-intensive mining operations.
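
The three target classes follow directly from the ROI thresholds given in the abstract. A minimal sketch of that labeling rule is shown below; the function name is illustrative, and how ROI itself is computed is not described in the abstract, so it is taken as an input.

```python
def roi_class(roi):
    """Map a one-year ROI value to the class labels defined in the abstract."""
    if roi >= 1:
        return "profitable"    # ROI >= 1
    if roi > 0:
        return "marginal"      # 0 < ROI < 1
    return "unprofitable"      # ROI <= 0
```

Labels produced this way would serve as the targets of the time series classifier, with one label per candidate purchase date.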

2401.01160 2026-05-26 eess.IV cs.CG cs.CV cs.LG 版本更新

Train-Free Segmentation in MRI with Cubical Persistent Homology

基于立方体持续同调的MRI无训练分割

Anton François, Raphaël Tinarrage

发表机构 * Centre G. Borelli ； ENS Paris-Saclay（巴黎-萨克雷高等师范学院）； IST Austria（奥地利科学技术研究所）； EMAp, Fundação Getulio Vargas（热图利奥·瓦加斯基金会应用数学学院）

AI总结提出一种基于拓扑数据分析的无训练MRI分割框架，通过自动阈值、提取已知拓扑子集和分解成分三步实现，利用持续同调中的近似代表循环建立拓扑特征与解剖成分的可解释联系，在胶质母细胞瘤和胎儿皮质板分割中验证有效性。

Comments Similar to the published version. 22 pages, 11 figures, 3 tables. For associated code, see https://github.com/antonfrancois/gliomaSegmentation_TDA

详情

DOI: 10.1007/s10851-026-01300-1
Journal ref: Journal of Mathematical Imaging and Vision 68, 20 (2026)

AI中文摘要

我们研究了一种基于拓扑数据分析的无训练MRI分割框架。该流程分三步进行：首先通过自动阈值识别待分割的整个对象，然后检测一个拓扑结构已知的独特子集，最后推导出分割的各个组成部分。一个关键要素是从持续同调图中提取近似代表循环，这提供了持久特征与解剖成分之间的可解释联系。为了阐明该方法的应用范围，我们明确了潜在的拓扑和强度假设，量化了它们在真实数据上的成立情况，并分析了典型的失败模式。我们在胶质母细胞瘤和胎儿皮质板分割上评估了该方法，并与无监督和深度学习参考方法进行了比较。通过在没有大型标注数据集的情况下运行，该方法非常适合数据稀缺的场景，并为专家修正或基于学习的流程提供了可解释的基线和实用的初始化。

英文摘要

We investigate a framework for train-free MRI segmentation based on Topological Data Analysis. The pipeline proceeds in three steps, first identifying the whole object to segment via automatic thresholding, then detecting a distinctive subset whose topology is known in advance, and finally deducing the various components of the segmentation. A key ingredient is the extraction of approximate representative cycles from persistence diagrams, which provides an interpretable link between persistent features and anatomical components. To clarify the method's scope, we make the underlying topological and intensity assumptions explicit, quantify when they hold on real data, and analyze typical failure modes. We evaluate the approach on glioblastoma and on fetal cortical plate segmentation, with comparisons to unsupervised and deep-learning references. By operating without large annotated datasets, the method is well suited to scarce-data settings and provides an interpretable baseline and practical initialization for expert refinement or learning-based pipelines.

2509.23413 2026-05-26 cs.LG 版本更新

URS: A Unified Neural Routing Solver for Cross-Problem Zero-Shot Generalization

URS：一种面向跨问题零样本泛化的统一神经路由求解器

Changliang Zhou, Canhong Yu, Shunyu Yao, Xi Lin, Zhenkun Wang, Yu Zhou, Qingfu Zhang

发表机构 * School of Automation and Intelligent Manufacturing, Southern University of Science and Technology, Shenzhen, China（自动化与智能制造学院，南方科技大学，深圳，中国）； Guangdong Provincial Key Laboratory of Fully Actuated System Control Theory and Technology, Southern University of Science and Technology, Shenzhen, China（广东省全驱动系统控制理论与技术重点实验室，南方科技大学，深圳，中国）； College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China（计算机科学与软件工程学院，深圳大学，深圳，中国）； Department of Computer Science, City University of Hong Kong, Hong Kong SAR, China（计算机科学系，香港城市大学，香港特别行政区，中国）； School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China（数学与统计学学院，西安交通大学，西安，中国）

AI总结提出URS，一种统一神经路由求解器，通过统一数据表示和混合偏置模块，实现单个模型在110种车辆路径问题变体（含99种未见变体）上的零样本泛化，并支持高达7000节点的规模。

Comments accepted by ICML 2026

详情

AI中文摘要

多任务神经路由求解器因其能够使用单个模型解决多种车辆路径问题（VRP）而成为一种有前景的范式。然而，现有的神经求解器通常依赖预定义的问题约束或需要针对每个问题进行微调，这极大地限制了它们对未见VRP变体的零样本泛化能力。为了解决这一关键瓶颈，我们提出了URS，一种统一的神经路由求解器，能够通过单个模型在广泛的未见VRP变体上实现零样本泛化。我们提出了一种统一数据表示（UDR），用数据统一替代问题枚举，从而扩大了问题覆盖范围并减少了对领域专业知识的依赖。此外，我们在编码过程中引入了一个混合偏置模块（MBM）来改进节点嵌入，该模块有效地捕获了各种问题固有的多个先验。在UDR的基础上，我们开发了一个问题条件参数生成器，以进一步提高零样本泛化能力。大量实验表明，URS能够为110种VRP变体（包括99种未见变体）持续生成高质量的解，同时展现出对多达7000个节点的大规模实例的出色可扩展性。据我们所知，URS是第一个能够通过单个模型处理超过100种VRP变体的神经求解器。我们的代码可在https://github.com/CIAM-Group/URS获取。

英文摘要

Multi-task neural routing solvers have emerged as a promising paradigm for their ability to solve multiple vehicle routing problems (VRPs) using a single model. However, existing neural solvers typically rely on predefined problem constraints or require per-problem fine-tuning, which substantially limits their zero-shot generalization ability to unseen VRP variants. To address this critical bottleneck, we propose URS, a unified neural routing solver that achieves zero-shot generalization across a wide range of unseen VRPs with a single model. We propose a unified data representation (UDR) that replaces problem enumeration with data unification, thereby broadening the problem coverage and reducing reliance on domain expertise. In addition, we introduce a Mixed Bias Module (MBM) during encoding to improve node embeddings, which efficiently captures multiple priors inherent to various problems. On top of the UDR, we develop a problem-conditioned parameter generator to further improve zero-shot generalization. Extensive experiments show that URS consistently produces high-quality solutions for 110 VRP variants (including 99 unseen variants) while demonstrating impressive scalability to large-scale instances with up to 7000 nodes. To the best of our knowledge, URS is the first neural solver to handle over 100 VRP variants with a single model. Our code is available at https://github.com/CIAM-Group/URS.

2505.20110 2026-05-26 cs.LG cs.AI 版本更新

Beyond the Proxy: Trajectory-Distilled Guidance for Offline GFlowNet Training

超越代理：用于离线GFlowNet训练的轨迹蒸馏指导

Ruishuo Chen, Xun Wang, Rui Hu, Zhuoran Li, Longbo Huang

发表机构 * Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China（清华大学交叉信息研究院）

AI总结提出轨迹蒸馏GFlowNet（TD-GFN），利用逆强化学习从离线轨迹中提取稠密边奖励，通过DAG剪枝和优先反向采样指导策略，避免代理模型，提升离线GFlowNet训练的收敛速度和样本质量。

Comments Camera-ready version accepted at ICML 2026

详情

AI中文摘要

生成流网络（GFlowNets）擅长采样多样化的高奖励对象。在许多实际应用中，由于无法进行主动奖励查询，这些模型必须使用静态离线数据集进行训练。主流的训练方法通常依赖代理模型为在线采样的轨迹提供奖励反馈。然而，由于数据稀缺或评估成本高，构建可靠的代理往往具有挑战性。虽然现有的无代理方法试图解决这一问题，但它们通常施加粗糙的约束，限制了模型有效探索的能力。为了克服这些限制，我们提出了轨迹蒸馏GFlowNet（TD-GFN），一种新颖的无代理训练框架。TD-GFN利用逆强化学习（IRL）从离线轨迹中提取稠密的、转移级别的边奖励，为高效探索提供丰富的结构指导。关键的是，为了确保鲁棒性，这些奖励通过DAG剪枝和优先反向采样间接指导策略。这种设计确保梯度更新仅依赖于数据集中的真实终端奖励，从而防止错误传播。实验结果表明，TD-GFN在收敛速度和样本质量上显著优于广泛的现有基线，为离线GFlowNet训练建立了更鲁棒和高效的范式。

英文摘要

Generative Flow Networks (GFlowNets) excel at sampling diverse, high-reward objects. In many practical applications where active reward queries are infeasible, these models must be trained using static offline datasets. Prevailing training methods typically rely on a proxy model to provide reward feedback for online sampled trajectories. However, constructing a reliable proxy is often challenging due to data scarcity or high evaluation costs. While existing proxy-free approaches attempt to address this, they often impose coarse constraints that limit the model's ability to explore effectively. To overcome these limitations, we propose Trajectory-Distilled GFlowNet (TD-GFN), a novel proxy-free training framework. TD-GFN utilizes inverse reinforcement learning (IRL) to extract dense, transition-level edge rewards from offline trajectories, providing rich structural guidance for efficient exploration. Crucially, to ensure robustness, these rewards guide the policy indirectly through DAG pruning and prioritized backward sampling. This design ensures that gradient updates rely exclusively on ground-truth terminal rewards from the dataset, thereby preventing error propagation. Empirical results demonstrate that TD-GFN significantly outperforms a broad range of existing baselines in both convergence speed and sample quality, establishing a more robust and efficient paradigm for offline GFlowNet training.

2509.15543 2026-05-26 cs.LG 版本更新

实例条件适应：神经路由求解器的大规模泛化

Changliang Zhou, Xi Lin, Zhenkun Wang, Xialiang Tong, Mingxuan Yuan, Qingfu Zhang

发表机构 * School of Automation and Intelligent Manufacturing and Guangdong Provincial Key Laboratory of Fully Actuated System Control Theory and Technology, Southern University of Science and Technology, Shenzhen 518055, China（自动化与智能制造学院和广东省全驱动系统控制理论与技术重点实验室，南方科技大学，深圳518055，中国）； Department of Computer Science, City University of Hong Kong, Hong Kong SAR, China（计算机科学系，香港城市大学，香港特别行政区，中国）； Huawei Noah’s Ark Lab, Hong Kong SAR, China（华为诺亚实验室，香港特别行政区，中国）

AI总结提出实例条件适应模型（ICAM），通过简单高效的实例条件适应函数和低复杂度的适应模块，显著提升神经路由求解器在大规模旅行商问题（TSP）、容量车辆路径问题（CVRP）和非对称旅行商问题（ATSP）上的泛化性能，同时保持快速推理速度。

Comments 13 pages, 5 figures

详情

DOI: 10.1109/TITS.2026.3674538
Journal ref: IEEE Transactions on Intelligent Transportation Systems, 2026

AI中文摘要

神经组合优化（NCO）方法在无需专家知识的情况下，展现出了解决智能交通系统路由问题的巨大潜力。然而，现有的构造性NCO方法仍难以解决大规模实例，这严重限制了其应用前景。为了解决这些关键缺陷，本文提出了一种新颖的实例条件适应模型（ICAM），以实现神经路由求解器更好的大规模泛化。特别地，我们设计了一个简单而高效的实例条件适应函数，以较小的时空开销显著提升现有NCO模型的泛化性能。此外，通过对不同注意力机制之间信息融合性能的系统研究，我们进一步提出了一个强大且低复杂度的实例条件适应模块，为不同规模的实例生成更好的解。在合成实例和基准实例上的大量实验结果表明，我们提出的方法能够在解决大规模旅行商问题（TSP）、容量车辆路径问题（CVRP）和非对称旅行商问题（ATSP）时，以非常快的推理时间获得有希望的结果。我们的代码可在 https://github.com/CIAM-Group/ICAM 获取。

English abstract

The neural combinatorial optimization (NCO) method has shown great potential for solving routing problems of intelligent transportation systems without requiring expert knowledge. However, existing constructive NCO methods still struggle to solve large-scale instances, which significantly limits their application prospects. To address these crucial shortcomings, this work proposes a novel Instance-Conditioned Adaptation Model (ICAM) for better large-scale generalization of neural routing solvers. In particular, we design a simple yet efficient instance-conditioned adaptation function to significantly improve the generalization performance of existing NCO models with a small time and memory overhead. In addition, with a systematic investigation on the performance of information incorporation between different attention mechanisms, we further propose a powerful yet low-complexity instance-conditioned adaptation module to generate better solutions for instances across different scales. Extensive experimental results on both synthetic and benchmark instances show that our proposed method is capable of obtaining promising results with a very fast inference time in solving large-scale Traveling Salesman Problems (TSPs), Capacitated Vehicle Routing Problems (CVRPs), and Asymmetric Traveling Salesman Problems (ATSPs). Our code is available at https://github.com/CIAM-Group/ICAM.

2409.02416 2026-05-26 cs.LG stat.ML Version update

Relative Translation Invariant Wasserstein Distance

Binshuai Wang, Qiwei Di, Ming Yin, Mengdi Wang, Quanquan Gu, Peng Wei

Affiliations * Department of Computer Science; George Washington University; University of California, Los Angeles; Department of Electrical and Computer Engineering; Princeton University

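AI summary: Inspired by the Bures distance, proposes the relative translation invariant Wasserstein distance RW_p, proves its metric properties, and designs a bi-level algorithm to compute the RW_p distance between discrete distributions. For p=2, it proposes the RW_2-LP and RW_2-Sinkhorn algorithms to improve numerical stability. Experiments confirm that the algorithms reduce numerical error and are effective for retrieving real thunderstorm patterns.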

Comments Accepted by Transactions on Machine Learning Research (TMLR). Final accepted version. The implementation is publicly available at https://github.com/DRKWang/rw_metric

Details

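AI abstract (translated from Chinese)

Inspired by the Bures distance, we introduce a new family of distances, the \emph{relative translation invariant Wasserstein distances}, denoted $RW_p$, as a generalization of the classical Wasserstein distances $W_p$ for $p \in [1, +\infty)$. We prove that $RW_p$ defines a valid metric and show that this type of metric is more intrinsic than the classical Wasserstein distance. A bi-level algorithm is designed to compute the general $RW_p$ distance between arbitrary discrete distributions. Moreover, when $p = 2$, we prove that the optimal coupling matrix is invariant under distributional translation in the discrete setting, and we further propose two algorithms, the $\mathrm{RW}_2$-LP algorithm and the $\mathrm{RW}_2$-Sinkhorn algorithm, to improve the numerical stability of computing the $W_2$ distance and the optimal coupling matrix. Finally, we run three experiments to validate our theoretical results and algorithms. The first two show that the $\mathrm{RW}_2$-LP and $\mathrm{RW}_2$-Sinkhorn algorithms, both with and without normalization, significantly reduce numerical error compared with standard algorithms. The third shows that the $RW_p$ algorithms are computationally scalable and applicable to retrieving similar thunderstorm patterns in practical applications.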

English abstract

Motivated by the Bures distance, we introduce a new family of distances, \emph{relative translation invariant Wasserstein distances}, denoted by $RW_p$, as an extension of the classical Wasserstein distances $W_p$ for $p \in [1, +\infty)$. We establish that $RW_p$ defines a valid metric and demonstrate that this type of metric is more intrinsic than the classical Wasserstein distance. A bi-level algorithm is designed to compute the general $RW_p$ distance between arbitrary discrete distributions. Moreover, when $p = 2$, we show that the optimal coupling matrix is invariant under distributional translation in the discrete setting, and we further propose two algorithms, the $\mathrm{RW}_2$-LP algorithm and the $\mathrm{RW}_2$-Sinkhorn algorithm, to improve the numerical stability of computing $W_2$ distance and the optimal coupling matrix solutions. Finally, we conduct three experiments to validate our theoretical results and algorithms. The first two experiments report that the $\mathrm{RW}_2$-LP algorithm and the $\mathrm{RW}_2$-Sinkhorn algorithm, both with and without normalization, can significantly reduce the numerical errors compared to standard algorithms. The third experiment shows that $RW_p$ algorithms are computationally scalable and applicable to the retrieval of similar thunderstorm patterns in practical applications.
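
The translation invariance stated for $p = 2$ can be checked on a toy case. The sketch below is illustrative only and is not taken from the paper's repository. It assumes two uniform empirical measures with the same number of atoms, so that an optimal coupling can be found by brute force over permutations, and it assumes that the relative distance for $p = 2$ amounts to comparing the mean-centered point clouds. All function names are hypothetical. It relies on the standard identity $W_2^2(\mu, \nu) = \|m_\mu - m_\nu\|^2 + W_2^2(\bar\mu, \bar\nu)$, where $m$ is the mean and the bar denotes centering.

```python
from itertools import permutations


def mean(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))


def center(points):
    """Translate a point cloud so that its mean sits at the origin."""
    m = mean(points)
    return [tuple(c - mc for c, mc in zip(p, m)) for p in points]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def w2_squared(xs, ys):
    """Squared W_2 between two uniform empirical measures with equal atom
    counts, by brute force over all matchings (only viable for tiny inputs).
    Returns the optimal cost and the optimal matching as a permutation."""
    n = len(xs)
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        cost = sum(sq_dist(xs[i], ys[j]) for i, j in enumerate(perm)) / n
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, best_perm


# Toy data: ys is roughly xs shifted far away, plus a small deformation.
xs = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
ys = [(10.0, 10.0), (11.0, 10.5), (10.0, 12.5)]

w2, perm_raw = w2_squared(xs, ys)
# Assumed form of the relative distance for p = 2: W_2 after centering both clouds.
rw2, perm_centered = w2_squared(center(xs), center(ys))
shift = sq_dist(mean(xs), mean(ys))

print("same optimal matching:", perm_raw == perm_centered)
print("decomposition holds:", abs(w2 - (shift + rw2)) < 1e-9)
```

In this example the raw cost is dominated by the squared distance between the two means, while the centered cost is tiny. Subtracting a large shared term before optimizing is one plausible reading of why a translation-aware formulation helps numerical stability.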
