2605.21483 2026-05-21 astro-ph.CO cs.LG Updated

Velocityformer: Broken-Symmetry-Matched Equivariant Graph Transformers for Cosmological Velocity Reconstruction

Tilman Tröster, David Mirkovic, Veronika Oehl, Arne Thomsen

Affiliations: ETH Zürich

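AI summary: This work proposes Velocityformer, an equivariant graph transformer architecture that improves the accuracy of cosmological velocity reconstruction by matching the broken symmetry of the observational data; it improves the velocity correlation coefficient r by 35% over the standard linear theory baseline.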

Abstract

Precise measurement of the kinematic Sunyaev-Zel'dovich (kSZ) effect - a probe of the large-scale distribution of baryonic matter, a key observable for cosmological inference - requires accurate reconstruction of galaxy velocities from spectroscopic surveys. The signal-to-noise ratio (SNR) of kSZ measurements scales directly with the correlation coefficient $r$ between reconstructed and true velocities. We introduce Velocityformer, an equivariant graph transformer architecture designed to match the specific symmetry of the observational data. While the underlying physics is equivariant with respect to translations and rotations, observational effects break this symmetry due to the preferred line-of-sight direction. Matching the model's inductive bias to the data's broken symmetry consistently improves performance across all model sizes and training volumes, with Velocityformer improving $r$ by 35% over the standard linear theory baseline and outperforming ML baselines at every data volume. By matching the model's inductive bias to the data and conditioning on the physics-based long-wavelength solution, Velocityformer is highly data-efficient, training to high accuracy on as few as 4 low-fidelity simulations, and generalises zero-shot across input geometry, cosmological parameters, and galaxy sample. On high-fidelity simulated galaxy catalogues, this yields a 30% improvement in $r$ over the physical baseline, directly translating to the same SNR gain on observational data.
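
The figure of merit here is the correlation coefficient r between reconstructed and true velocities. As a minimal sketch, assuming r is the ordinary Pearson correlation (the abstract does not spell out the estimator), it can be computed as follows; the five velocities are made-up illustrative numbers, not data from the paper.

```python
import math

def correlation_coefficient(v_rec, v_true):
    """Pearson correlation r between reconstructed and true velocities."""
    n = len(v_rec)
    if n != len(v_true) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 entries")
    mean_rec = sum(v_rec) / n
    mean_true = sum(v_true) / n
    cov = sum((a - mean_rec) * (b - mean_true) for a, b in zip(v_rec, v_true))
    var_rec = sum((a - mean_rec) ** 2 for a in v_rec)
    var_true = sum((b - mean_true) ** 2 for b in v_true)
    return cov / math.sqrt(var_rec * var_true)

# Hypothetical line-of-sight velocities (km/s) for five galaxies.
v_true = [120.0, -80.0, 45.0, 300.0, -150.0]
v_rec = [100.0, -60.0, 70.0, 250.0, -120.0]
r = correlation_coefficient(v_rec, v_true)
print(f"r = {r:.3f}")
```

Because the kSZ SNR scales linearly with r, a relative gain in r carries over one-to-one: raising r by 35%, for instance from a hypothetical 0.50 to 0.675, raises the SNR by the same 35%.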

2605.21481 2026-05-21 cs.AI cs.CL cs.LG Updated

DelTA: Discriminative Token Credit Assignment for Reinforcement Learning with Verifiable Rewards

Kaiyi Zhang, Wei Wu, Yankai Lin

Affiliations: Gaoling School of Artificial Intelligence, Renmin University of China; Ant International

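AI summary: This paper proposes DelTA, which estimates token coefficients to amplify side-specific token-gradient directions, thereby improving token-probability updates in reinforcement learning with verifiable rewards and raising model performance on mathematical benchmarks.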

Abstract

Reinforcement learning from verifiable rewards (RLVR) has emerged as a central technique for improving the reasoning capabilities of large language models. Despite its effectiveness, how response-level rewards translate into token-level probability changes remains poorly understood. We introduce a discriminator view of RLVR updates, showing that the policy-gradient update direction implicitly acts as a linear discriminator over token-gradient vectors and thereby determines which token probabilities are increased or decreased during learning. Under standard sequence-level RLVR, this discriminator is constructed from positive- and negative-side centroids formed by advantage-weighted averaging of token-gradient vectors. However, such centroid construction can be dominated by shared high-frequency patterns, such as formatting tokens, diluting sparse yet discriminative directions that better distinguish high-reward responses from low-reward ones. To address this limitation, we propose $\textbf{DelTA}$, a discriminative token credit assignment method that estimates token coefficients to amplify side-specific token-gradient directions and downweight shared or weakly discriminative ones. These coefficients reweight a self-normalized RLVR surrogate, making the effective side-wise centroids more contrastive and thereby reshaping the RLVR update direction. On seven mathematical benchmarks, DelTA outperforms the strongest same-scale baselines by 3.26 and 2.62 average points on Qwen3-8B-Base and Qwen3-14B-Base, respectively. Additional results on code generation, a different backbone, and out-of-domain evaluations further demonstrate the generalization ability of DelTA.
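
The discriminator view can be checked numerically: with advantages A_i and per-token coefficients c_i, the update direction sum_i c_i A_i g_i equals a weighted difference of the positive- and negative-side advantage-weighted centroids. The sketch below uses toy 2-D "token-gradient" vectors; the coefficient rule `contrast_coefficients`, which downweights directions shared by all tokens, is a hypothetical stand-in and not DelTA's estimator. It only shows how reweighting can make the side-wise centroids more contrastive.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

def side_centroids(grads, advantages, coeffs):
    """Advantage-weighted centroids of token-gradient vectors on each side.

    The positive side uses weights c_i * A_i (A_i > 0), the negative side
    c_i * |A_i| (A_i < 0). The update sum_i c_i * A_i * g_i then equals
    mass_pos * centroid_pos - mass_neg * centroid_neg.
    """
    dim = len(grads[0])
    sides = {}
    for name, sign in (("pos", 1.0), ("neg", -1.0)):
        items = [(g, c * a * sign) for g, a, c in zip(grads, advantages, coeffs) if a * sign > 0]
        mass = sum(w for _, w in items)
        centroid = [sum(w * g[j] for g, w in items) / mass for j in range(dim)]
        sides[name] = (mass, centroid)
    return sides

def contrast_coefficients(grads, floor=0.1):
    """HYPOTHETICAL rule (not DelTA's estimator): downweight token gradients
    aligned with the mean gradient shared by all tokens, e.g. formatting."""
    dim = len(grads[0])
    shared = [sum(g[j] for g in grads) / len(grads) for j in range(dim)]
    return [max(floor, 1.0 - abs(cosine(g, shared))) for g in grads]

# Toy 2-D token gradients: formatting tokens appear in good and bad responses.
fmt = [1.0, 0.0]
grads = [fmt, fmt, fmt, [0.0, 1.0],    # tokens from a high-reward response
         fmt, fmt, fmt, [0.0, -1.0]]   # tokens from a low-reward response
advantages = [1.0] * 4 + [-1.0] * 4

results = {}
for label, coeffs in (("uniform", [1.0] * len(grads)),
                      ("reweighted", contrast_coefficients(grads))):
    sides = side_centroids(grads, advantages, coeffs)
    results[label] = cosine(sides["pos"][1], sides["neg"][1])
    print(f"{label:>10}: cos(centroid_pos, centroid_neg) = {results[label]:.3f}")
```

With uniform weights the formatting tokens dominate both centroids, so they point in similar directions; after downweighting the shared direction, the centroids separate along the discriminative axis.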

2605.21461 2026-05-21 cs.LG Updated

A Machine Learning Framework for Weighted Least Squares GNSS Positioning based on Activation Functions

Pin-Hsun Lee, Harry Leib

Affiliations: Department of Electrical and Computer Engineering, McGill University

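AI summary: This paper proposes an activation-function-based machine learning framework for weighted least squares GNSS positioning: signal quality indicators serve as training features for ensemble learning algorithms that identify low-quality signals, and an activation function converts the ML-predicted scores into suitable weights to improve positioning accuracy.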

Abstract

Global Navigation Satellite Systems (GNSS) are widely used to provide position, velocity, and timing (PVT) information for various applications, including transportation, location-based communication services, and intelligent agriculture. In urban canyons, high-rise buildings and narrow streets can cause signal obstruction, non-line-of-sight (NLOS) reception, and multipath effects that introduce errors in GNSS pseudorange measurements. Although multi-constellation GNSS effectively increases the number of available satellites, the inclusion of degraded signals can lead to severe positioning errors. This study proposes a machine learning framework for the weighted least squares (WLS) algorithm incorporating activation functions to enhance positioning accuracy. Several signal quality indicators are employed as training features for ensemble learning algorithms, which identify poor-quality signals by assigning quality scores. Activation functions then transform the machine-learning-predicted scores into appropriate weights for WLS positioning. To evaluate the performance of our approach, experiments are conducted using real-world datasets from Hong Kong and Tokyo urban areas. Comparative analysis of activation functions reveals that sigmoid functions consistently yield the greatest improvements across different machine learning algorithms and GNSS constellation configurations. The proposed algorithm demonstrates substantial reductions in positioning errors in both single- and multi-constellation scenarios. Furthermore, our results indicate that the proposed algorithm exhibits strong geographical transferability: it maintains a comparable level of performance when trained on data from other regions with similar levels of urbanization.
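
To make the score-to-weight step concrete, here is a minimal sketch that assumes a scalar unknown (for example a common offset, with each satellite contributing one noisy estimate of it), so the WLS solution reduces to a weighted mean. The sigmoid slope `k`, midpoint `s0`, the measurements, and the quality scores are all illustrative assumptions, not the paper's calibrated values, and the full iterative positioning solver is not reproduced.

```python
import math

def sigmoid_weight(score, k=10.0, s0=0.5):
    """Map an ML quality score in [0, 1] to a WLS weight in (0, 1).
    k (slope) and s0 (midpoint) are hypothetical choices."""
    return 1.0 / (1.0 + math.exp(-k * (score - s0)))

def wls_scalar(measurements, weights):
    """WLS estimate of a scalar x from y_i = x + e_i:
    argmin_x sum_i w_i (y_i - x)^2 = sum_i w_i y_i / sum_i w_i."""
    return sum(w * y for w, y in zip(weights, measurements)) / sum(weights)

# Hypothetical per-satellite estimates (m); the last one is an NLOS outlier.
measurements = [10.2, 9.8, 10.1, 9.9, 35.0]
scores = [0.9, 0.85, 0.95, 0.8, 0.1]  # hypothetical classifier quality scores

weights = [sigmoid_weight(s) for s in scores]
print("unweighted:", round(wls_scalar(measurements, [1.0] * len(measurements)), 2))
print("sigmoid-weighted:", round(wls_scalar(measurements, weights), 2))
```

The sigmoid's saturation is what makes it a natural fit here: high-quality signals get weights close to 1, while a low score pushes the weight toward 0, so a single NLOS outlier barely moves the estimate.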

2605.21458 2026-05-21 cs.AI cs.LG stat.ME Updated

Mind the Sim-to-Real Gap & Think Like a Scientist

Harsh Parikh, Gabriel Levin-Konigsberg, Dominique Perrault-Joncas, Alexander Volfovsky

Affiliations: Amazon SCOT; Yale University; Duke University

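AI summary: This paper studies how to supplement a simulator with real-world experiments to reduce the value gap between simulation and reality, proposes the Fisher-SEP method, and demonstrates its application in two case studies.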

Abstract

Suppose a planner has a pre-trained simulator of a sequential decision problem and the option to run real experiments in the field. The simulator is cheap to query but inherits confounding and drift from its calibration data. Experimentation is unbiased but consumes one real unit per trial. We study when, and how, the planner should supplement the simulator with experiments. We give three results. First, an extended simulation lemma decomposes the simulator's value error into a calibration-deployment shift that randomization can identify and a parametric residual that no further interaction can reduce. Second, the value gap between the simulator-optimal policy and the optimum splits into a local component, on states the deployed policy already visits, and a reachability component, on states it does not. The reachability component stays bounded away from zero at any horizon under purely passive learning. Third, we propose Fisher-SEP, a simulation-aided experimental policy (SEP) that minimizes the posterior predictive variance of a target policy's value, with reward-only and transition-only specializations. Two case studies illustrate the regimes. In a vending-machine supply chain, front-loaded experimentation overtakes posterior updating once the horizon is long enough to amortize the pilot. In an HIV mobile-testing example with a corridor that separates a well-surveilled region from a poorly-surveilled one, only designed exploration reaches the poorly-surveilled region.
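
The third result's objective, choosing experiments that minimise the posterior predictive variance of a target policy's value, can be illustrated with a deliberately simple toy that is not Fisher-SEP. It assumes independent Gaussian beliefs about per-state rewards, a fixed visitation vector d for the target policy, known observation noise, and V = sum_s d(s) r(s), so Var[V] = sum_s d(s)^2 v_s. A greedy designer picks the state whose observation shrinks Var[V] most. A "passive" learner restricted to states the deployed policy visits (states 0 and 1 here, an assumption) can never push Var[V] below d(2)^2 v_2 = 0.25, loosely mirroring the reachability component that stays bounded away from zero.

```python
def posterior_var(prior_var, noise_var):
    """Conjugate Gaussian update: variance after one noisy observation."""
    return 1.0 / (1.0 / prior_var + 1.0 / noise_var)

def value_variance(visitation, variances):
    """Var[V] for V = sum_s d(s) * r(s) with independent Gaussian r(s)."""
    return sum(d * d * v for d, v in zip(visitation, variances))

def greedy_design(visitation, variances, noise_var, budget, allowed):
    """Spend `budget` experiments, each on the allowed state whose observation
    most reduces Var[V]. Returns the chosen states and the final variances."""
    variances = list(variances)
    chosen = []
    for _ in range(budget):
        def reduction(s):
            return visitation[s] ** 2 * (variances[s] - posterior_var(variances[s], noise_var))
        best = max(allowed, key=reduction)
        variances[best] = posterior_var(variances[best], noise_var)
        chosen.append(best)
    return chosen, variances

# Hypothetical target-policy visitation; state 2 is never visited by the deployed policy.
d_target = [0.2, 0.3, 0.5]
prior = [1.0, 1.0, 1.0]

passive_states, passive_vars = greedy_design(d_target, prior, 1.0, budget=4, allowed=[0, 1])
designed_states, designed_vars = greedy_design(d_target, prior, 1.0, budget=4, allowed=[0, 1, 2])
print("passive :", passive_states, round(value_variance(d_target, passive_vars), 4))
print("designed:", designed_states, round(value_variance(d_target, designed_vars), 4))
```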

2605.21455 2026-05-21 cs.LG Updated

Mitigating Label Bias with Interpretable Rubric Embeddings

Calvin Isley, Johann D. Gaebler, Sharad Goel

Affiliations: Harvard Kennedy School; Harvard University

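AI summary: This paper proposes mitigating label bias with interpretable rubric embeddings, and shows theoretically and empirically that, under plausible conditions, the approach reduces label bias and improves measures of cohort quality.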

Abstract

Statistical decision algorithms are increasingly deployed in domains where ground-truth labels are hard to obtain, such as hiring, university admissions, and content moderation. In these settings, models are typically trained on historical human evaluations -- for example, using past hiring decisions as a proxy for true applicant quality. However, if past evaluations unjustly favor certain groups, models trained on these labels may inherit those biases. To address this problem, we propose basing predictions on rubric embeddings, a representation framework that replaces standard black-box embeddings with features derived from expert-defined criteria that align with the underlying construct of interest. By anchoring predictions to semantically meaningful dimensions, this approach guards against biased proxy signals. We provide both theoretical and empirical evidence that rubric embeddings mitigate label bias under plausible conditions. Empirically, we evaluate our method on a novel dataset of applications to a large master's program. We find that models trained on rubric embeddings reduce group disparities while improving measures of cohort quality. Our results suggest that basing predictions on interpretable, domain-grounded representations offers a practical approach to learning in the presence of biased labels.

2605.21451 2026-05-21 cs.LG cond-mat.dis-nn cs.AI cs.NE Updated

Approximation Theory for Neural Networks: Old and New

Soumendu Sundar Mukherjee, Himasish Talukdar

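AI summary: This survey reviews the development of approximation theory for neural networks, including classical density results for single-hidden-layer networks, quantitative error bounds, and depth-width trade-offs, and discusses the theoretical properties of newer architectures such as Kolmogorov-Arnold networks.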

Comments: 31 pages, 4 figures

Abstract

Universal approximation theorems provide a mathematical explanation for the expressive power of neural networks. They assert that, under mild conditions on the activation function, feedforward neural networks are dense in broad function classes, such as continuous functions on compact subsets of $\mathbb{R}^d$, $L^p$ spaces, or Sobolev spaces. Over the past four decades, these qualitative universality results have evolved into a rich quantitative theory addressing approximation rates, parameter efficiency, and the role of architectural features such as depth and width. This survey presents several glimpses into this theory. We review classical density results for single-hidden-layer networks, as well as quantitative bounds that relate approximation error to network size and smoothness assumptions on target functions. Particular emphasis is placed on depth-width trade-offs and on results demonstrating that deeper architectures can achieve superior parameter efficiency for structured function classes. In addition to standard feedforward neural networks, we also review recent developments on Kolmogorov-Arnold Networks (KANs), which offer an alternative architectural paradigm and whose approximation-theoretic properties have begun to attract significant theoretical attention.
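
A classical constructive route to single-hidden-layer universality in one dimension: a ReLU network with one hidden unit per knot reproduces exactly the piecewise-linear interpolant of f on a uniform grid of [0, 1], so for twice-differentiable f the sup-norm error is at most h^2 max|f''| / 8 with h = 1/n, i.e. O(1/n^2) in the width n. This is a textbook sketch, not a construction taken from the survey; the target sin(2*pi*x) and the widths are arbitrary choices.

```python
import math

def relu(z):
    return max(0.0, z)

def build_relu_interpolant(f, n):
    """One-hidden-layer ReLU net g(x) = b + sum_k a_k * relu(x - t_k) that equals
    the piecewise-linear interpolant of f at the knots k/n on [0, 1]."""
    knots = [k / n for k in range(n + 1)]
    values = [f(t) for t in knots]
    slopes = [(values[k + 1] - values[k]) * n for k in range(n)]
    # Output weight of hidden unit k is the change of slope at knot k.
    coeffs = [slopes[0]] + [slopes[k] - slopes[k - 1] for k in range(1, n)]
    bias = values[0]
    thresholds = knots[:n]

    def net(x):
        return bias + sum(a * relu(x - t) for a, t in zip(coeffs, thresholds))

    return net

def sup_error(f, g, samples=2001):
    """Maximum |f - g| over a uniform sample grid on [0, 1]."""
    return max(abs(f(i / (samples - 1)) - g(i / (samples - 1))) for i in range(samples))

def target(x):
    return math.sin(2 * math.pi * x)

errors = {}
for width in (4, 8, 16, 32):
    net = build_relu_interpolant(target, width)
    errors[width] = sup_error(target, net)
    print(f"width {width:2d}: sup error {errors[width]:.4f}")
```

Doubling the width should cut the error by roughly a factor of four, which is the O(1/n^2) rate of linear interpolation for smooth targets.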

2605.21442 2026-05-21 cs.LG cs.AI Updated

torchtune: PyTorch native post-training library

Mark Obozov, Maxime Griot, Joseph Cummings, Evan Smothers, Felipe Mello, Rafi Ayub, Philip John Bontrager, Salman Mohammadi, Ariel Kwiatkowski, Nathan Azrak, Mircea Mironenco

Affiliations: PyTorch; Meta; Stanford; Meta-FAIR

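AI summary: This paper introduces torchtune, a PyTorch-native post-training library designed to streamline the post-training lifecycle of large language models, providing efficient fine-tuning, experimentation, and deployment workflows, and improving performance and flexibility through modularity and extensibility.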

Comments: 14 pages

Abstract

Modern LLMs typically require multistage training pipelines to achieve strong downstream performance, with post-training serving as the main interface for adapting open-weight models. We introduce torchtune, a PyTorch-native library designed to streamline the post-training lifecycle of LLMs, enabling efficient fine-tuning, experimentation, and deployment-oriented workflows. Unlike many existing fine-tuning frameworks, which often optimize for ease of use, specialized recipes, or hardware efficiency at the cost of transparency and extensibility, torchtune emphasizes modularity, hackability, and direct access to the underlying PyTorch components. In this paper, we present the design principles behind torchtune, describe how they are reflected in its model builders, training recipes, and distributed training stack, and evaluate the library across representative post-training settings. We compare against popular fine-tuning frameworks, including Axolotl and Unsloth, and show that torchtune provides strong performance and memory efficiency across many settings while remaining flexible enough for rapid research iteration. These results position torchtune as a practical foundation for reproducible LLM post-training research.

2605.21437 2026-05-21 physics.geo-ph cs.LG stat.ML Updated

Neural Negative Binomial Regression for Weekly Seismicity Forecasting: Per-Cell Dispersion Estimation and Tail Risk Assessment

Alim Igilik

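AI summary: This paper proposes a neural-network-based earthquake forecasting method that estimates a per-cell dispersion parameter and assesses tail risk, relaxing the traditional Poisson assumption and improving the accuracy of extreme-event forecasts.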

Comments: 28 pages, 9 figures. Source code available at https://github.com/Al1mkaYandere/seismic-probabilistic-modeling

Abstract

Standard approaches to forecasting the weekly number of earthquakes on a spatial grid rely on the Poisson distribution with a single global dispersion assumption. We show that this assumption is systematically violated in seismic data from Central Asia (2010-2024), where a likelihood-ratio test with boundary correction strongly rejects the Poisson hypothesis (p < 10^{-179}). The main contribution of this work is the EarthquakeNet architecture, which provides an endogenous per-cell estimate of the overdispersion parameter alpha via a neural network (spatial embeddings + MLP), without explicit spatial covariance specification. In contrast to existing negative binomial regression approaches in seismological forecasting, which typically assume a single global alpha, the proposed per-cell formulation allows the model to identify spatial heterogeneity in seismic clustering and to construct probabilistic risk-aware alerts via quantiles of the predicted distribution. A walk-forward evaluation (2018-2023) over four systems shows an 8.6 percent reduction in mean pinball deviation (MPD) relative to a negative binomial GLM baseline. The strongest improvements are observed in the tail regime (Y >= 5), where the continuous ranked probability score (CRPS) of the proposed model is 12.5 percent lower than that of the baseline, indicating improved calibration in extreme-event forecasting.
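
A common way to model overdispersed counts is the NB2 negative binomial with mean mu and overdispersion alpha, where Var[Y] = mu + alpha * mu^2 and alpha -> 0 recovers the Poisson. The sketch below assumes this parameterisation (the abstract does not state which one EarthquakeNet uses) and evaluates the log-pmf, the tail probability P(Y >= 5) against a Poisson with the same mean, a predictive quantile of the kind used for risk-aware alerts, and the standard pinball (quantile) loss. The values of mu and alpha are hypothetical, not fitted to the paper's catalogue.

```python
import math

def nb2_logpmf(y, mu, alpha):
    """log P(Y = y) for NB2 with mean mu and overdispersion alpha > 0.
    Var[Y] = mu + alpha * mu**2; alpha -> 0 recovers the Poisson."""
    r = 1.0 / alpha
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def poisson_logpmf(y, mu):
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def tail_prob(logpmf, threshold):
    """P(Y >= threshold) = 1 - sum_{y < threshold} P(Y = y)."""
    return 1.0 - sum(math.exp(logpmf(y)) for y in range(threshold))

def nb2_quantile(q, mu, alpha, max_count=10_000):
    """Smallest count y with CDF(y) >= q."""
    cdf = 0.0
    for y in range(max_count + 1):
        cdf += math.exp(nb2_logpmf(y, mu, alpha))
        if cdf >= q:
            return y
    return max_count

def pinball_loss(y, pred, q):
    """Quantile (pinball) loss of a level-q forecast `pred` for outcome y."""
    diff = y - pred
    return max(q * diff, (q - 1.0) * diff)

# Hypothetical forecast for one grid cell and week.
mu, alpha = 1.5, 0.8
print("P(Y >= 5), NB2    :", round(tail_prob(lambda y: nb2_logpmf(y, mu, alpha), 5), 4))
print("P(Y >= 5), Poisson:", round(tail_prob(lambda y: poisson_logpmf(y, mu), 5), 4))
q95 = nb2_quantile(0.95, mu, alpha)
print("NB2 95% quantile:", q95, "| pinball loss if Y = 7:", pinball_loss(7, q95, 0.95))
```

Letting alpha vary per cell, as the paper does with a neural network, changes exactly these tail probabilities and quantiles cell by cell while leaving the mean forecast untouched.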

2605.21435 2026-05-21 cs.LG math.AT math.CT Updated

Gaussian Sheaf Neural Networks

André Ribeiro, Ana Luiza Tenório, Tiago da Silva, Diego Mesquita

Affiliations: Getulio Vargas Foundation; MBZUAI

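AI summary: This paper proposes Gaussian Sheaf Neural Networks (GSNNs), which treat the mean and covariance matrix of Gaussian distributions as node features to address the shortcomings of traditional GNNs on probability-distribution-valued features; it derives a new Laplacian operator and validates the approach experimentally.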

Abstract

Graph Neural Networks (GNNs) have become the de facto standard for learning on relational data. While traditional GNNs' message passing is well suited for vector-valued node features, there are cases in which node features are better represented by probability distributions than real vectors. Concretely, when node features are Gaussians, characterized by a mean and a covariance matrix, naively concatenating their parameters into a single vector and applying standard message passing discards the geometric and algebraic structure that governs means and covariances. We propose Gaussian Sheaf Neural Networks (GSNNs), a principled framework that incorporates these inductive biases into graph-based learning. Building on the theory of cellular sheaves, we derive a new Laplacian operator that generalizes the sheaf Laplacian to this setting and preserves its key properties. We complement our theoretical contributions with experiments on synthetic and real-world data that illustrate the practical relevance of GSNNs.
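
For reference, the standard cellular sheaf Laplacian that GSNNs generalise is built from restriction maps: on an edge e = (u, v) the coboundary is (delta x)_e = F_u x_u - F_v x_v and L = delta^T delta, whose kernel consists of global sections (assignments that agree on every edge). The sketch below assumes 1-dimensional stalks, so each restriction map is a scalar, and uses an arbitrary path graph; it is not the paper's Gaussian Laplacian. With all restriction maps equal to 1 it reduces to the ordinary graph Laplacian.

```python
def sheaf_laplacian(num_nodes, edges):
    """Sheaf Laplacian L = delta^T delta for a graph with 1-D stalks.

    edges: list of (u, v, f_u, f_v), where f_u and f_v are the scalar
    restriction maps from the stalks at u and v into the edge stalk, so
    (delta x)_e = f_u * x[u] - f_v * x[v].
    """
    L = [[0.0] * num_nodes for _ in range(num_nodes)]
    for u, v, fu, fv in edges:
        L[u][u] += fu * fu
        L[v][v] += fv * fv
        L[u][v] -= fu * fv
        L[v][u] -= fu * fv
    return L

def matvec(L, x):
    return [sum(lij * xj for lij, xj in zip(row, x)) for row in L]

def energy(L, x):
    """Dirichlet energy x^T L x = sum_e (f_u x_u - f_v x_v)^2."""
    return sum(xi * yi for xi, yi in zip(x, matvec(L, x)))

def diffuse(L, x, step=0.1):
    """One explicit step of sheaf diffusion: x <- x - step * L x."""
    return [xi - step * yi for xi, yi in zip(x, matvec(L, x))]

# Path graph 0-1-2; edge (0, 1) asks for 2*x0 == x1, edge (1, 2) for x1 == x2.
edges = [(0, 1, 2.0, 1.0), (1, 2, 1.0, 1.0)]
L = sheaf_laplacian(3, edges)

section = [1.0, 2.0, 2.0]  # a global section: consistent on every edge
print("L @ section:", matvec(L, section))

x = [0.3, -1.0, 0.5]
for _ in range(3):
    print("energy:", round(energy(L, x), 4))
    x = diffuse(L, x)
```

Diffusion drives the signal toward a global section, which is the inductive bias sheaf-based message passing exploits; GSNNs replace these scalar stalks with Gaussian parameters.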

2605.21429 2026-05-21 cs.RO cs.LG Updated

roto 2.0: The Robot Tactile Olympiad

Elle Miller, Jayaram Reddy, Ayush Deshmukh, Trevor McInroe, David Abel, Oisin Mac Aodha, Sethu Vijayakumar

Affiliations: University of Edinburgh

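AI summary: This paper presents roto 2.0, a tactile reinforcement learning benchmark that standardises tactile RL across four robot morphologies (16-DOF to 24-DOF) and focuses on end-to-end "blind" manipulation using only proprioception and tactile sensing, without state information or distillation. The blind agents complete 13 Baoding ball rotations in 10 seconds, an order of magnitude faster than the current state of the art. The open-source environments and well-tuned baselines lower the barrier to entry, letting researchers focus on fundamental algorithmic challenges rather than tedious RL tuning.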

Comments: Accepted to 7th ViTac Workshop, ICRA 2026

Abstract

Tactile-based reinforcement learning (RL) is currently hindered by fragmented research and a focus on over-saturated orientation tasks. We introduce v2 of the Robot Tactile Olympiad (roto 2.0), a GPU-parallelised benchmark designed to standardise tactile-based RL across four distinct robotic morphologies (16-DOF to 24-DOF). Unlike prior benchmarks, roto focuses on end-to-end "blind" manipulation, utilising only proprioception and tactile sensing without state information or distillation. We demonstrate a significant performance leap, with our blind agents achieving 13 Baoding ball rotations in 10 seconds, an order of magnitude faster than current state-of-the-art speeds. By open-sourcing our environments and robustly tuned baselines, we reduce the barrier to entry and enable researchers to prioritise fundamental algorithmic challenges over tedious RL tuning. Website: https://elle-miller.github.io/roto/

2605.21428 2026-05-21 cs.LG cs.DS Updated

Polynomial-Time Robust Multiclass Linear Classification under Gaussian Marginals

Ilias Diakonikolas, Giannis Iakovidis, Mingchen Ma

Affiliations: University of Wisconsin-Madison

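AI summary: This paper studies agnostic learning of multiclass linear classifiers under the Gaussian distribution and proposes polynomial-time robust learning algorithms with error guarantees for multiclass classification, particularly when k ≥ 3.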

Abstract

We study the task of agnostic learning of multiclass linear classifiers under the Gaussian distribution. Given labeled examples $(x, y)$ from a distribution over $\mathbb{R}^d \times [k]$, with Gaussian $x$-marginal, the goal is to output a hypothesis whose error is comparable to that of the best $k$-class linear classifier. While the binary case $k=2$ has a well-developed algorithmic theory, much less is known for $k \ge 3$. Even for $k=3$, prior robust algorithms incur exponential dependence on the inverse of the desired accuracy in both complexity and representation size. In this work, we develop new structural results for multiclass linear classifiers and use them to design fully polynomial-time robust learners with dimension-independent error guarantees. Our first result shows that the standard multiclass perceptron algorithm requires super-polynomially many samples and updates, even with clean labels and Gaussian marginals, revealing a basic obstruction absent in the binary case. Our main positive result is a pairwise improper-learning framework which yields an efficient learner with error $\widetilde{O}(k^{3/2}\sqrt{\mathrm{opt}})+\epsilon$ for general $k$. Additionally, we develop a sharper localization-based framework which leads to error $O(\mathrm{opt})+\epsilon$ for $k=3$, and error $\mathrm{poly}(k)\,\mathrm{opt}+\epsilon$ for geometrically regular $k$-class linear classifiers.
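
For orientation, the standard multiclass perceptron, which the paper shows can need super-polynomially many samples and updates, predicts argmax_i <w_i, x> and, on a mistake, adds x to the true class's weights and subtracts it from the predicted class's. A minimal sketch on toy clean-label data with Gaussian features; the ground-truth weights and the margin filter (which guarantees fast convergence on this toy) are illustrative assumptions, so this shows the algorithm itself, not the paper's lower-bound construction.

```python
import random

def scores(weights, x):
    return [sum(wi * xi for wi, xi in zip(w, x)) for w in weights]

def predict(weights, x):
    s = scores(weights, x)
    return max(range(len(weights)), key=s.__getitem__)

def multiclass_perceptron(data, k, dim, max_epochs=2000):
    """Standard multiclass perceptron; returns the weights and the update count."""
    weights = [[0.0] * dim for _ in range(k)]
    updates = 0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            y_hat = predict(weights, x)
            if y_hat != y:
                # Promote the true class, demote the wrongly predicted one.
                for j in range(dim):
                    weights[y][j] += x[j]
                    weights[y_hat][j] -= x[j]
                mistakes += 1
        updates += mistakes
        if mistakes == 0:
            break
    return weights, updates

def make_data(n, w_true, margin, seed=0):
    """Gaussian x labelled by argmax_i <w_true[i], x>; keeps only points whose
    top two scores differ by at least `margin` (an assumption for this toy)."""
    rng = random.Random(seed)
    data = []
    while len(data) < n:
        x = [rng.gauss(0.0, 1.0) for _ in range(len(w_true[0]))]
        top, second = sorted(scores(w_true, x), reverse=True)[:2]
        if top - second >= margin:
            data.append((x, predict(w_true, x)))
    return data

# Hypothetical 3-class ground truth in 2-D: directions 120 degrees apart.
w_true = [[1.0, 0.0], [-0.5, 0.866], [-0.5, -0.866]]
data = make_data(200, w_true, margin=0.3)
weights, updates = multiclass_perceptron(data, k=3, dim=2)
accuracy = sum(predict(weights, x) == y for x, y in data) / len(data)
print(f"updates: {updates}, training accuracy: {accuracy:.3f}")
```

The margin filter is what makes convergence quick here; the paper's negative result concerns the unfiltered Gaussian setting, where such a margin is not available.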

2605.21426 2026-05-21 cs.LG Updated

Adaptive Signal Resuscitation: Channel-wise Post-Pruning Repair for Sparse Vision Networks

Qishi Zhan, Ziheng Chen, Minxuan Hu

Affiliations: Department of Mathematical and Statistical Sciences, Marquette University; The University of Texas at Austin; Cornell Ann S. Bowers College of Computing and Information Science, Cornell University

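AI summary: This paper proposes ASR, a training-free channel-wise repair method that addresses the accuracy drop caused by a granularity mismatch in post-pruning repair at high sparsity; it estimates a variance-matching correction for each output channel, stabilised by a data-driven shrinkage rule, to improve the performance of sparse vision networks.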

Abstract

One-shot magnitude pruning can cause severe accuracy collapse in the high-sparsity regime, even when the pruning mask preserves the largest weights. We argue that this failure reflects a granularity mismatch in post-pruning repair. Under global magnitude pruning, nearly collapsed channels can coexist with channels that retain informative activation variance within the same layer. Existing layer-wise activation repair methods apply a single correction to the whole layer, and can therefore over-amplify damaged channels while trying to restore the layer-level signal. We propose Adaptive Signal Resuscitation (ASR), a training-free channel-wise repair method that matches the granularity of repair to the granularity of damage. ASR estimates a variance-matching correction for each output channel and stabilizes it with a data-driven shrinkage rule, suppressing unreliable corrections for channels with weak post-pruning signal while preserving corrections for healthier channels. Applied before BatchNorm recalibration, ASR requires only forward passes on a small calibration set and no retraining. Across three datasets, four convolutional architectures, and both unstructured and structured sparsity settings, ASR generally improves over layer-wise repair, with the clearest gains in high-sparsity regimes. On ResNet-50 at 90% sparsity, ASR recovers 55.6% top-1 accuracy on CIFAR-10, compared with 41.0% for layer-wise repair and 28.0% for BatchNorm-only recalibration. Ablations show that naive channel-wise variance matching is insufficient, and that shrinkage stabilizes post-pruning repair.
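
A minimal sketch of channel-wise variance matching with shrinkage, assuming calibration activations are available for each output channel before and after pruning. The raw correction for a channel is sigma_dense / sigma_pruned; the shrinkage weight lambda = var_pruned / (var_pruned + tau) pulls the correction toward 1 (no change) when the surviving signal is weak. This rule and the constant `tau` are hypothetical stand-ins rather than ASR's data-driven estimator, and the subsequent BatchNorm recalibration is omitted.

```python
import math
from statistics import pvariance

def channel_scales(dense_acts, pruned_acts, tau=0.05, eps=1e-12):
    """Per-channel variance-matching scales, shrunk toward 1 (no correction).

    dense_acts / pruned_acts: one list of calibration activations per output
    channel. tau is a HYPOTHETICAL shrinkage constant; tau=0 gives naive
    channel-wise variance matching.
    """
    scales = []
    for dense, pruned in zip(dense_acts, pruned_acts):
        var_d, var_p = pvariance(dense), pvariance(pruned)
        raw = math.sqrt(var_d / (var_p + eps))
        lam = var_p / (var_p + tau + eps)  # near 0 for nearly collapsed channels
        scales.append(lam * raw + (1.0 - lam))
    return scales

def layer_scale(dense_acts, pruned_acts, eps=1e-12):
    """A single layer-wise scale matching the average channel variance."""
    var_d = sum(pvariance(c) for c in dense_acts) / len(dense_acts)
    var_p = sum(pvariance(c) for c in pruned_acts) / len(pruned_acts)
    return math.sqrt(var_d / (var_p + eps))

# Toy calibration activations for two output channels of one layer.
dense = [[1.0, -1.0, 2.0, -2.0], [1.0, -1.0, 2.0, -2.0]]
pruned = [[0.8, -0.8, 1.6, -1.6],       # healthy: lost ~36% of its variance
          [0.01, -0.01, 0.02, -0.02]]   # nearly collapsed channel

print("naive channel-wise:", [round(s, 3) for s in channel_scales(dense, pruned, tau=0.0)])
print("shrunk channel-wise:", [round(s, 3) for s in channel_scales(dense, pruned)])
print("layer-wise:", round(layer_scale(dense, pruned), 3))
```

On this toy, naive matching multiplies the collapsed channel by about 100, the single layer-wise scale overshoots the healthy channel, and the shrunk per-channel scales restore the healthy channel while leaving the collapsed one close to unchanged.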

2605.21420 2026-05-21 cs.LG cs.AI q-bio.MN Updated

Memorisation, Convergence and Generalisation in Generative Models

Antoine Maillard, Sebastian Goldt

Affiliations: INRIA Paris & DI ENS, PSL University, Paris, France; International School of Advanced Studies (SISSA), Trieste, Italy

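AI summary: This paper studies the distinction between memorisation, convergence and generalisation in generative models. An analysis of linear generative models shows that they transition from memorisation to generalisation when the number of samples is linear in the input dimension, and reveals that generalisation comprises two distinct objectives: matching the bulk of the data distribution and recovering the principal latent factors of the data.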

Abstract

Generative neural networks learn how to produce highly realistic images from a large but finite number of examples - or do they simply memorise their training set? To settle this question, Kadkhodaie, Guth, Simoncelli and Mallat (ICLR '24) trained diffusion models independently on disjoint subsets of a dataset and showed that they converge to nearly the same density when the number of training images is large enough. This result raises two basic questions: how much data do you need for convergence, and what does convergence capture about learning the data distribution? Here, we address these questions by providing an exact analytical characterisation of the transition from memorisation to generalisation in linear generative models. We find that these models memorise at small load, while convergence emerges continuously when the number of samples is linear in the input dimension. Strikingly, we find that convergence is insensitive to recovery of the principal latent factors of the data, which are recovered in a sharp transition. After extending our approach to data with power-law spectra, we find the same distinction between convergence and latent recovery in our experiments with convolutional denoisers and in the data of Kadkhodaie et al. We thus show that generalisation in generative models decomposes into at least two distinct objectives: matching the bulk of the data distribution and recovering the principal latent factors. These objectives correspond to two different distances between the true and learnt data distributions, and only the first one is captured by convergence.

2605.21395 2026-05-21 cs.AI cs.LG Updated

Towards Resilient and Autonomous Networks: A BlueSky Vision on AI-Native 6G

Liang Wu, Kelly Wan, Mayank Darbari, Liangjie Hong

Affiliations: Nokia

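AI summary: This paper presents a BlueSky vision for AI-native 6G that integrates artificial intelligence natively into 6G, shifting from "Network for AI" to "AI for Network". Through foundation models and collaborative multi-agent systems, it recasts network management as a unified multi-modal, multi-task optimization problem and pushes 6G toward an intelligent, self-sustaining communication infrastructure.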

Comments: Accepted at KDD 2026

Abstract

The proliferation of emerging applications, such as autonomous driving and immersive experiences, demands cellular networks that are not only faster, but fundamentally more resilient and autonomous. This paper presents a BlueSky vision on how Artificial Intelligence will be natively integrated into 6G, shifting the paradigm from "Network for AI" to "AI for Network". We envision that, unlike 5G's reliance on scattered, ad-hoc models each trained for a single task, native AI in the 6G era will be anchored by a foundation model and orchestrated via collaborative multi-agent systems, framing network management as a unified, multi-modal, multi-task optimization problem. Built on this vision, we outline two transformative directions. The first focuses on developing a 6G foundation model as a unified backbone, with task-specific knowledge distilled into compact models suited for diverse edge deployments. The second advances multi-agent systems designed to autonomously diagnose, maintain, and recover networks with minimal human intervention. These directions chart a roadmap for 6G to evolve into an intelligent, self-sustaining communication infrastructure.

2605.21388 2026-05-21 cs.LG cs.AI cs.NA math.NA stat.ML Updated

On the Regularity and Generalization of One-Step Wasserstein-guided Generative Models for PDE-Induced Measures

Likun Lin, Zhongjian Wang, Jack Xin, Zhiwen Zhang

Affiliations: Department of Mathematics, The University of Hong Kong; Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University; Department of Mathematics, University of California at Irvine

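AI summary: This paper studies the regularity and generalization of one-step Wasserstein-guided generative models for PDE-induced probability measures, develops a theoretical framework that establishes the regularity of the transport maps and the generalization properties of the generative models, and supports the theory with experiments.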

Abstract

Despite the remarkable empirical success of generative models, the available theory on their statistical accuracy in scientific computing remains largely pessimistic. This paper develops a theoretical framework for understanding the regularity of transport maps and the generalization properties of one-step Wasserstein-guided generative models for PDE-induced probability measures. We consider normalized target densities associated with linear elliptic and parabolic equations on bounded domains, as well as diffusion and Fokker--Planck equations on the torus. Under standard structural assumptions, we prove that these target measures satisfy doubling conditions. By combining this fact with regularity theory for optimal transport between doubling measures, we show that the optimal transport map from a uniform source measure to the target measure is Hölder continuous. This regularity yields an approximation-theoretic justification for one-step generative models that learn PDE-induced distributions via a single pushforward map. As a representative instance, we study DeepParticle and derive excess-risk bounds characterizing the discrepancy between the learned map and the population-optimal map. We also establish a robustness estimate under target shift and illustrate the theory with experiments which support the derived rates.
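In one dimension, the optimal transport map from the uniform measure on [0, 1] to a target with a positive density is the monotone rearrangement T = F⁻¹ (the target's quantile function), so the single pushforward map of a one-step generator can be written down directly. The sketch below illustrates only this 1D special case, with a hypothetical target density proportional to 1 + x on [0, 1]; it is not DeepParticle, which learns the map from samples in higher dimensions.

```python
import numpy as np

def quantile_map(density, grid):
    """Return T(u) = F^{-1}(u), the 1D optimal transport map from U[0, 1]
    to the normalized density sampled on `grid` (monotone rearrangement)."""
    p = density(grid)
    # Cumulative trapezoid rule, then normalize so that F(grid[-1]) = 1.
    cdf = np.concatenate([[0.0], np.cumsum(0.5 * (p[1:] + p[:-1]) * np.diff(grid))])
    cdf /= cdf[-1]
    # F is strictly increasing when p > 0, so inverting it by interpolation is valid.
    return lambda u: np.interp(u, cdf, grid)

# Hypothetical target: unnormalized density 1 + x on [0, 1].
grid = np.linspace(0.0, 1.0, 2001)
T = quantile_map(lambda x: 1.0 + x, grid)

u = np.random.default_rng(0).uniform(size=100_000)   # samples from the uniform source
x = T(u)                                              # one-step pushforward samples
exact = lambda u: -1.0 + np.sqrt(1.0 + 3.0 * u)       # closed-form F^{-1} for this target
print(np.max(np.abs(T(u) - exact(u))))                # small interpolation error
```

The map is monotone by construction, which is the 1D analogue of the regularity that the abstract establishes for the higher-dimensional optimal transport maps.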

2605.21381 2026-05-21 cs.CV cs.LG (updated version)

Disentangling Generation and Regression in Stochastic Interpolants for Controllable Image Restoration

Yi Liu, Jia Ma, Wengen Li, Jihong Guan, Shuigeng Zhou, Yichao Zhang

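Affiliations: Tongji University; Fudan University

AI summary: This paper proposes DiSI, a framework that disentangles the generation and regression components of the stochastic interpolant process. This enables a continuous, controllable transition from pure regression to fully generative restoration and improves both the efficiency and the accuracy of image restoration.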

Comments: 44 pages, 16 figures, 16 tables

Abstract

Recent advances in Image Restoration (IR) have been largely driven by generative methods such as Diffusion Models and Flow Matching, which excel in synthesizing realistic textures while suffering from slow multi-step inference and compromised pixel fidelity. In contrast, classical regression-based IR methods excel precisely in these aspects, offering single-step efficiency and high pixel-level reconstruction fidelity. To bridge this gap, we propose DiSI, a unified framework that Disentangles the underlying Stochastic Interpolant process into independent generation and regression components. This decoupling endows DiSI with remarkable versatility, enabling a continuous and controllable transition from a pure regression process to a fully generative one. Technically, we instantiate this framework with two specific sampling trajectories, accompanied by a unified sampler for high-quality, few-step inference on arbitrary trajectories. Furthermore, we design a dual-branch U-Net style transformer network in pixel space, using a dedicated branch to enhance conditional guidance while ensuring high throughput. Extensive experiments demonstrate that DiSI efficiently achieves competitive results on various IR tasks, while uniquely offering the inference-time flexibility to control the distortion-perception trade-off within a single model.
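A generic stochastic interpolant connects a degraded image x0 to a clean image x1 by x_t = (1 - t) x0 + t x1 + γ(t) z, with z ~ N(0, I) and γ(0) = γ(1) = 0. The sketch below assumes γ(t) = σ √(t(1 - t)) with a hypothetical noise-scale knob σ. It only conveys the intuition behind a regression-to-generation dial: σ = 0 gives a deterministic path, and σ > 0 injects noise into the intermediate states. It is not DiSI's actual parameterization or sampler.

```python
import numpy as np

def interpolate(x0, x1, t, sigma, rng):
    """Sample x_t on a generic stochastic interpolant from x0 (degraded) to x1 (clean).

    gamma(t) = sigma * sqrt(t * (1 - t)) vanishes at both endpoints, so the path
    starts exactly at x0 and ends exactly at x1; sigma (a hypothetical knob) sets
    how much Gaussian noise the intermediate states carry.
    """
    gamma = sigma * np.sqrt(t * (1.0 - t))
    z = rng.standard_normal(x0.shape)
    return (1.0 - t) * x0 + t * x1 + gamma * z

rng = np.random.default_rng(0)
x0 = np.zeros((8, 8))      # stand-in for a degraded image
x1 = np.ones((8, 8))       # stand-in for the clean target
for sigma in (0.0, 1.0):
    xt = interpolate(x0, x1, 0.5, sigma, rng)
    print(sigma, xt.std())  # 0 for sigma = 0 (deterministic path), > 0 otherwise
```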

2605.21372 2026-05-21 cs.CV cs.AI cs.LG cs.RO (updated version)

Closed Loop Dynamic Driving Data Mixture for Real-Synthetic Co-Training

Hongzhi Ruan, Pei Liu, Weiliang Ma, Zhengning Li, Xueyang Zhang, Jun Ma, Dan Xu, Kun Zhan

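Affiliations: Li Auto; HKUST; HKUST (GZ)

AI summary: This paper proposes a closed-loop dynamic data-mixture method that adjusts the proportions of the training-data mixture through a dynamic optimization process to improve model performance, addressing the problem of optimizing the data mixture under a limited budget.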

Abstract

Data scaling is fundamental to modern deep learning, and grows increasingly critical as autonomous driving shifts to end-to-end learning. Real-world driving data is expensive to annotate and scene-biased, making real-synthetic co-training with near-infinite synthetic data a promising direction. However, naively incorporating all available synthetic data is inefficient and leads to distribution shifts, and optimizing data mixture under practical training budgets remains a critical yet under-explored problem. In this sense, we claim that the mixture of training data requires clear guidance in terms of scene types and quantities. Particularly in this work, we conceptualize the data mixture approximately as a dynamic optimization process that iteratively adjusts the training data mixture to maximize model performance, guided by closed-loop evaluation feedback, and propose AutoScale, a fully automated closed-loop data engine unifying scene representation, data mixture optimization and retrieval, as well as model training and evaluation. Specifically, we propose Graph Regularized AutoEncoder (Graph-RAE) for driving scene representations, introduce Cluster-aware Gradient Ascent (Cluster-GA) for cluster-wise importance estimation and reweighting, and perform cluster-guided vector retrieval to select high-value samples. Experiments on NavSim demonstrate that AutoScale outperforms vanilla co-training and cross-domain baselines, achieving better performance with fewer synthetic samples under constrained budgets.
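Cluster-level reweighting of a data mixture can be sketched as gradient ascent on mixture weights followed by renormalization. The example below is a minimal sketch, not the paper's Cluster-GA. It assumes a hypothetical per-cluster utility signal (for example, the change in a closed-loop score attributed to each scene cluster) and uses an exponentiated-gradient step so that the weights stay on the probability simplex.

```python
import numpy as np

def update_mixture(weights, cluster_gain, lr=0.5):
    """One exponentiated-gradient ascent step on cluster mixture weights.

    weights: current sampling proportions per scene cluster (sum to 1).
    cluster_gain: hypothetical estimate of d(score)/d(weight) for each cluster.
    """
    logits = np.log(weights) + lr * cluster_gain
    new = np.exp(logits - logits.max())   # subtract the max for numerical stability
    return new / new.sum()

def allocate(weights, budget):
    """Turn mixture proportions into integer sample counts under a fixed budget."""
    counts = np.floor(weights * budget).astype(int)
    # Hand the leftover samples to the clusters with the largest remainders.
    leftover = budget - counts.sum()
    order = np.argsort(-(weights * budget - counts))
    counts[order[:leftover]] += 1
    return counts

w = np.full(4, 0.25)                        # start from a uniform mixture over 4 clusters
gain = np.array([0.8, 0.1, -0.2, 0.3])      # hypothetical closed-loop feedback per cluster
for _ in range(3):                          # three closed-loop rounds
    w = update_mixture(w, gain)
print(w, allocate(w, budget=1000))
```

The budgeted allocation step reflects the abstract's framing of mixture optimization under a fixed training budget. How the real per-cluster gains are estimated is the paper's contribution and is not reproduced here.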

2605.21352 2026-05-21 cs.LG cs.CE cs.ET (updated version)

Classification of Single and Mixed Partial Discharges under Switching Voltage Using an AWA-CNN Framework

Md Rafid Kaysar Shagor, Zannatul Ferdousy Mouri, Farhina Haque, Anindya Bijoy Das

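AI summary: This paper proposes a CNN framework built on an Amplitude-Width-Area (AWA) pattern representation for classifying partial discharge (PD) sources under switching-voltage excitation. Visual patterns generated from pulse amplitude, width, and area enable high-accuracy classification of six PD source conditions.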

Abstract

The growing use of fast-switching power electronics has made partial discharge (PD) analysis under switching-voltage excitation increasingly important, yet more challenging than under sinusoidal conditions due to activity concentrated at voltage transitions. This work presents an Amplitude-Width-Area (AWA) pattern representation for source-oriented PD analysis under switching-voltage excitation. In the proposed method, time domain PD pulses are characterized using pulse amplitude, width, and area, and mapped into a visual pattern where amplitude and area define the coordinate axes and width is encoded by color. The generated AWA patterns are used to distinguish six single and mixed PD source conditions: corona, internal, surface, corona+internal, corona+surface, and internal+surface. To evaluate the classification capability of the proposed representation, a Random Forest baseline and two Convolutional Neural Network (CNN) models, InceptionV3 and ResNet-18, are compared. The AWA patterns show distinguishable source-dependent distributions, and CNN-based classification achieves testing accuracy above 96%, compared with 73.33% for Random Forest. The results indicate that AWA patterns provide a visual representation of PD pulses suitable for multi-class PD source classification under switching-voltage excitation.
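The three AWA features can be computed from a sampled pulse roughly as follows. This is a minimal sketch under assumptions that the abstract does not fix. Amplitude is taken as the peak absolute value. Width is an FWHM-style count of samples at or above half the peak. Area is the rectangle-rule integral of the absolute signal. The damped-oscillation pulse is hypothetical.

```python
import numpy as np

def awa_features(pulse, dt):
    """Return (amplitude, width, area) for one time-domain PD pulse.

    pulse: 1D array of sampled values; dt: sample spacing in seconds.
    Width counts every sample at or above half the peak, so for oscillatory
    pulses it sums across lobes (an assumption of this sketch).
    """
    mag = np.abs(pulse)
    amplitude = mag.max()
    width = np.count_nonzero(mag >= 0.5 * amplitude) * dt
    area = mag.sum() * dt
    return amplitude, width, area

# Hypothetical damped-oscillation pulse sampled at 1 GS/s.
dt = 1e-9
t = np.arange(200) * dt
pulse = 5e-3 * np.exp(-t / 20e-9) * np.sin(2 * np.pi * 50e6 * t)
amp, width, area = awa_features(pulse, dt)
# In an AWA plot, (amp, area) would set the point's position and width its color.
print(amp, width, area)
```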

2605.21348 2026-05-21 cs.LG cs.AI cs.NA math.NA physics.comp-ph (updated version)

Data-Efficient Neural Operator Training via Physics-Based Active Learning

Alicja Polanska, Lorenzo Zanisi, Vignesh Gopakumar, Stanislas Pamela

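Affiliations: University College London; Atomic Energy Authority

AI summary: This paper proposes a physics-based active learning method that improves the data efficiency of neural operator training by using PDE residuals to guide data selection. Numerical experiments on the 1D Burgers equation and the 2D compressible Navier-Stokes equations demonstrate the method's advantage in data efficiency.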

Comments: Presented at the ICLR 2026 Workshop on Artificial Intelligence and Partial Differential Equations
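The residual-guided selection step can be sketched for the 1D viscous Burgers equation u_t + u u_x = ν u_xx. This sketch rests on assumptions that the summary does not state. Candidate inputs are ranked by the mean squared finite-difference residual of the current surrogate's predicted solution on a periodic grid, and the top k are sent to the expensive solver. The "predictions" are stand-in arrays, and `nu`, `k`, and the grid sizes are hypothetical.

```python
import numpy as np

def burgers_residual(u, dx, dt, nu):
    """Mean squared residual of u_t + u*u_x - nu*u_xx on a periodic (time, space) grid.

    u: array of shape (n_t, n_x), a predicted space-time solution.
    """
    u_t = (u[2:] - u[:-2]) / (2 * dt)                        # central difference in time
    uc = u[1:-1]                                             # interior time levels
    u_x = (np.roll(uc, -1, axis=1) - np.roll(uc, 1, axis=1)) / (2 * dx)
    u_xx = (np.roll(uc, -1, axis=1) - 2 * uc + np.roll(uc, 1, axis=1)) / dx**2
    r = u_t + uc * u_x - nu * u_xx
    return np.mean(r**2)

def select_for_labeling(predictions, k, dx, dt, nu):
    """Return indices of the k candidates whose predictions violate the PDE most."""
    scores = np.array([burgers_residual(u, dx, dt, nu) for u in predictions])
    return np.argsort(-scores)[:k]

rng = np.random.default_rng(0)
n_t, n_x, dx, dt, nu = 50, 64, 1.0 / 64, 1e-3, 0.01
consistent = [np.full((n_t, n_x), c) for c in (0.5, 1.0)]   # constants solve Burgers exactly
noisy = [0.1 * rng.standard_normal((n_t, n_x)) for _ in range(2)]
picked = select_for_labeling(consistent + noisy, k=2, dx=dx, dt=dt, nu=nu)
print(sorted(picked.tolist()))   # → [2, 3]
```

A residual score needs no ground-truth solution, which is what makes it usable for choosing which simulations to run next.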

Optimized Federated Knowledge Distillation with Distributed Neural Architecture Search

Chaimaa Medjadji, Sylvain Kubler, Yves Le Traon, Guilain Leduc, Sadi Alawadi, Feras M. Awaysheh

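Affiliations: Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg; Blekinge Institute of Technology; ADSLabs, Umea University

AI summary: This paper proposes FedKDNAS, a framework that combines client-side neural architecture selection with server-coordinated knowledge distillation. It addresses data heterogeneity, system heterogeneity, and communication efficiency in federated learning, achieving a better Pareto trade-off between accuracy and efficiency.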

Abstract

Federated Learning (FL) enables collaborative model training without centralizing data. However, real-world deployments must simultaneously address statistical heterogeneity across client data (non-IID), system heterogeneity in device capabilities, and communication efficiency. Existing FL approaches mitigate these challenges through improved aggregation, personalization, or knowledge distillation, but they almost universally assume a fixed client architecture, limiting adaptability to heterogeneous data complexity and hardware constraints. This architectural constraint often leads to suboptimal trade-offs between accuracy and efficiency in real-world FL systems. This work introduces FedKDNAS, a distillation-driven FL framework that combines client-side neural architecture selection with distillation of server-coordinated knowledge. Each client autonomously selects a lightweight model under accuracy-resource constraints. It then trains it locally using a hybrid objective combining supervised learning and knowledge distillation and shares only predictions on a public reference set. The server then aggregates and smooths these predictions, optionally combining them with a teacher model, to produce stable distillation targets for the next round. Extensive evaluation on six datasets against six representative FL baselines (FedAvg, Ditto, FedMD, FedDF, FedDistill, Local-KD) demonstrates that FedKDNAS consistently achieves superior Pareto efficiency, improving accuracy by up to 15% under non-IID conditions, reducing client CPU usage by approximately 28%, and decreasing communication overhead by up to 44 times while maintaining lightweight logit-based communication.
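The server step (aggregate, optionally mix in a teacher, smooth across rounds) and the client's hybrid objective described above can be sketched as follows. This is an illustrative sketch only. The averaging rule, the temperature `T`, the EMA factor `beta`, the teacher weight `lam`, and the loss weight `alpha` are hypothetical choices, not FedKDNAS's published settings.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def server_targets(client_logits, prev_targets=None, teacher_logits=None,
                   T=2.0, beta=0.5, lam=0.3):
    """Aggregate client predictions on the public reference set into soft targets."""
    avg = np.mean([softmax(logits, T) for logits in client_logits], axis=0)
    if teacher_logits is not None:                 # optional teacher mixing
        avg = (1 - lam) * avg + lam * softmax(teacher_logits, T)
    if prev_targets is not None:                   # EMA smoothing across rounds
        avg = beta * prev_targets + (1 - beta) * avg
    return avg

def hybrid_loss(private_logits, private_labels, public_logits, soft_targets,
                T=2.0, alpha=0.5):
    """(1 - alpha) * CE on private labels + alpha * T^2 * KL(targets || student) on public data."""
    p = softmax(private_logits)
    ce = -np.mean(np.log(p[np.arange(len(private_labels)), private_labels] + 1e-12))
    q = softmax(public_logits, T)
    kl = np.mean(np.sum(soft_targets * (np.log(soft_targets + 1e-12) - np.log(q + 1e-12)), axis=1))
    return (1 - alpha) * ce + alpha * T**2 * kl

rng = np.random.default_rng(0)
clients = [rng.standard_normal((5, 3)) for _ in range(4)]   # 4 clients, 5 public samples, 3 classes
targets = server_targets(clients)
targets = server_targets(clients, prev_targets=targets, teacher_logits=rng.standard_normal((5, 3)))
loss = hybrid_loss(rng.standard_normal((8, 3)), rng.integers(0, 3, 8),
                   rng.standard_normal((5, 3)), targets)
print(targets.sum(axis=1), loss)
```

Only the (5, 3) prediction matrices travel between clients and server here, which mirrors the logit-based communication the abstract emphasizes.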

2605.21318 2026-05-21 cs.CL cs.AI cs.LG (updated version)

TextReg: Mitigating Prompt Distributional Overfitting via Regularized Text-Space Optimization

Lucheng Fu, Ye Yu, Yiyang Wang, Yiqiao Jin, Haibo Jin, B. Aditya Prakash, Haohan Wang

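Affiliations: Georgia Institute of Technology; University of Illinois Urbana-Champaign

AI summary: This paper studies prompt distributional overfitting and proposes TextReg, a framework that realizes a soft-penalty objective through regularized textual gradients. It combines Dual-Evidence Gradient Purification, Semantic Edit Regularization, and Regularization-Guided Prompt Update to improve out-of-distribution (OOD) generalization.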

Comments: Code at https://github.com/luchengfu6/TextReg

Abstract

Large language models (LLMs) are highly sensitive to the prompts used to specify task objectives and behavioral constraints. Many recent prompt optimization methods iteratively rewrite prompts using LLM-generated feedback, but the resulting prompts often become longer, accumulate narrow sample-specific rules, and generalize poorly beyond the training distribution. We study this failure mode as prompt distributional overfitting and argue that it reflects a lack of representation control in discrete text-space optimization. We formalize this view through representational inefficiency, a dual-factor measure that decomposes prompt inefficiency into capacity cost and scope narrowness, attributing distributional prompt overfitting to their coupled growth during optimization. We propose TextReg, a regularization framework that realizes a soft-penalty objective through regularized textual gradients, combining Dual-Evidence Gradient Purification, Semantic Edit Regularization, and Regularization-Guided Prompt Update. Across multiple reasoning benchmarks, TextReg substantially improves out-of-distribution (OOD) generalization, with accuracy gains of up to +11.8% over TextGrad and +16.5% over REVOLVE.
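A soft-penalty objective trades task accuracy against a regularization cost instead of imposing a hard constraint. The sketch below shows only that generic idea applied to choosing among candidate prompt rewrites. The two penalty terms are hypothetical stand-ins: word count for capacity cost, and the number of rule lines starting with "If the input" for scope narrowness. They are not TextReg's representational-inefficiency measure, and the accuracies are made-up toy values.

```python
def soft_penalty_score(accuracy, prompt, lam_cap=0.002, lam_scope=0.05):
    """Accuracy minus a soft penalty; higher is better.

    Hypothetical proxies: capacity cost = number of words; scope narrowness =
    number of lines starting with "If the input" (rules written for specific samples).
    """
    capacity_cost = len(prompt.split())
    narrow_rules = sum(line.strip().startswith("If the input") for line in prompt.splitlines())
    return accuracy - lam_cap * capacity_cost - lam_scope * narrow_rules

# Toy candidates: (prompt text, hypothetical training accuracy).
candidates = {
    "general": ("Solve the problem step by step and state the final answer.", 0.71),
    "patched": ("Solve the problem step by step and state the final answer.\n"
                "If the input mentions trains, convert minutes to hours.\n"
                "If the input has three numbers, add the first two.", 0.73),
}
best = max(candidates, key=lambda k: soft_penalty_score(candidates[k][1], candidates[k][0]))
print(best)   # → general
```

The patched prompt wins on raw accuracy, but once the penalty for its added length and sample-specific rules is counted, the general prompt scores higher.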

2605.21317 2026-05-21 cs.LG (updated version)

TimeSRL: Generalizable Time-Series Behavior Modeling via Semantic RL-Tuned LLMs (A Case Study in Mental Health)

Yuang Fan, Lilin Xu, Millie Wu, Jingping Nie, Qingyu Chen, Yuzhe Yang, Zhuo Zhang, Xin Liu, Subigya Nepal, Xiaofan Jiang, Xuhai "Orson" Xu

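Affiliations: Columbia University; University of North Carolina at Chapel Hill; Yale University; University of California, Los Angeles; Google; University of Virginia

AI summary: This paper proposes TimeSRL, a two-stage LLM framework that routes predictions through an explicit semantic bottleneck. Raw signals are first abstracted into high-level natural language, from which behavioral outcomes are then predicted. The method achieves state-of-the-art cross-cohort generalization in mental-health prediction.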

Abstract

Longitudinal passive sensing enables continuous health prediction, yet models often fail under cross-dataset distribution shifts. Traditional ML overfits cohort-specific artifacts, while Large Language Models (LLMs) struggle to reason reliably over long, heterogeneous time-series. We introduce TimeSRL, a two-stage LLM framework that routes predictions through an explicit semantic bottleneck. The model first abstracts raw signals into high-level natural language, then predicts behavioral outcomes from these abstractions alone. This forces the model to reason over semantic concepts that we argue generalize better than raw numbers. We optimize this process end-to-end using Group Relative Policy Optimization (GRPO) with Reinforcement Learning from Verifiable Rewards (RLVR), learning outcome-aligned abstractions without gold intermediate annotations. Instantiated on mental-health prediction, TimeSRL achieves state-of-the-art performance on a benchmark designed to stress-test cross-cohort generalization under a rigorous leave-one-dataset-out (LOSO) protocol, reducing mean absolute error (MAE) over strong non-LLM ML and LLM baselines by 3.1--10.1% and 9.5--44.1% for anxiety, and 3.2--9.6% and 27.4--57.6% for depression (all $p$s<0.05). TimeSRL significantly outperforms prior methods in cross-benchmark transfer across different sensing pipelines, rivaling its own within-domain performance without target-domain fine-tuning. These results demonstrate that semantic abstractions are reusable and point to a new direction for generalizable behavior modeling via RL-tuned LLMs.
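GRPO replaces a learned value baseline with a group-relative one. For a group of completions sampled for the same input, each completion's advantage is its reward standardized by the group's mean and standard deviation. The sketch below computes those advantages with a hypothetical verifiable reward for score prediction (1 if the predicted score is within a tolerance of the ground truth, else 0). The abstract does not describe TimeSRL's actual reward design.

```python
import numpy as np

def verifiable_reward(predicted, target, tol=1.0):
    """Hypothetical binary reward: 1.0 if |predicted - target| <= tol, else 0.0."""
    return float(abs(predicted - target) <= tol)

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within one sampled group.

    If every completion earns the same reward, all advantages are ~0 (no signal).
    """
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Four sampled completions for one hypothetical input whose true score is 7.
preds = [6.5, 9.0, 7.8, 3.0]
rewards = [verifiable_reward(p, 7.0) for p in preds]   # [1.0, 0.0, 1.0, 0.0]
adv = group_relative_advantages(rewards)
print(adv)   # approximately [1, -1, 1, -1]
```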

2605.21292 2026-05-21 stat.ML cs.AI cs.LG math.DS (updated version)

Large-Step Training Dynamics of a Two-Factor Linear Transformer Model

Krishnakumar Balasubramanian

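Affiliations: Department of Statistics, University of California, Davis

AI summary: This paper studies the training dynamics of a two-factor linear transformer model at large learning rates. It shows that large step sizes can change the transformer's training attractor rather than merely accelerate convergence: beyond stability thresholds, training may enter cycles, bounded chaos, or divergence.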

Abstract

Gradient-flow analyses show that simplified linear transformers can learn the in-context linear-regression algorithm, but they do not explain the finite-step behavior of gradient descent at large learning rates. Motivated by empirical work on high-learning-rate transformer instabilities and by the cubic-map phase diagram for quadratic regression, we study an exactly reducible one-prompt linear-transformer training problem. After normalization, the dynamics reduce to a two-factor product map with an effective step-size parameter $μ$. On the balanced slice, this map recovers the known scalar cubic transition from monotone convergence to catapult convergence, periodic and chaotic bounded nonconvergence, and divergence. We then analyze the full two-dimensional system and show that, for $0<μ<2$, it has an explicit invariant Chebyshev ellipse separating forward-invariant regions; this ellipse carries off-balanced chaotic dynamics but is transversely repelling, while balanced scalar attractors can be transversely attracting. These results show that large constant learning rates can change the training attractor of the learned transformer rather than merely accelerating convergence: beyond sharp stability thresholds, finite-step training may settle into cycles, bounded chaos, or divergence instead of a single in-context linear-regression solution. We also discuss the consequences for mini-batch gradient descent based training methods.
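A minimal two-factor toy shows the kind of map involved. Gradient descent with step size η on L(a, b) = (ab - 1)² / 2 updates a ← a - η b(ab - 1) and b ← b - η a(ab - 1). On the balanced slice a = b = x, this collapses to the scalar cubic map x ← x - η x(x² - 1). The toy loss is an assumption chosen for illustration. The paper's normalized map and its parameter μ may differ in scaling, so the step sizes below should not be read as the paper's thresholds.

```python
def gd_two_factor(a, b, eta, steps=2000, blowup=1e6):
    """Run GD on L(a, b) = 0.5 * (a*b - 1)**2; return final (a, b), or None if it blows up."""
    for _ in range(steps):
        r = a * b - 1.0                              # residual of the product
        a, b = a - eta * b * r, b - eta * a * r      # simultaneous update of both factors
        if abs(a) + abs(b) > blowup:
            return None
    return a, b

def classify(eta, x0=0.5):
    out = gd_two_factor(x0, x0, eta)                  # start on the balanced slice a = b
    if out is None:
        return "diverged"
    a, b = out
    return "converged" if abs(a * b - 1.0) < 1e-8 else "bounded, not converged"

for eta in (0.1, 0.6, 1.2, 3.0):
    print(eta, classify(eta))   # eta = 0.1 converges; eta = 3.0 diverges from this start
```

Starting with a = b keeps every iterate balanced, which is why the scalar cubic map describes this slice exactly.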

2605.21288 2026-05-21 cs.LG (updated version)

A Mechanistic Study of Tabular Foundation Models

Marin Biloš, James T. Wilson, Anderson Schneider, Yuriy Nevmyvaka

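Affiliations: Morgan Stanley

AI summary: This paper studies why tabular foundation models with different architectures converge in accuracy on classification and regression tasks. It characterizes their in-context algorithms, the origin of their permutation symmetries, and their robustness to targeted perturbations, and it finds that the previously reported representation collapse is not a practical concern.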

Abstract

Tabular foundation models with different architectures converge in accuracy across a range of classification and regression tasks. This raises questions a leaderboard cannot answer: (i) whether the models execute the same in-context algorithm, (ii) where row, column, and class-permutation invariances originate, and (iii) how robust they are under perturbations engineered against the inferred mechanism. We characterize all three. The model families realize qualitatively distinct similarity-based readouts: from an attention-weighted vote over context labels to a class-conditional mean readout, each confirmed by causal intervention. We find that the representation collapse highlighted in prior work is not a practical concern for them. Each model's permutation invariances trace to specific positional parameters whose removal preserves accuracy and makes approximate invariance exact. Perturbations engineered against each readout reproduce predicted failure modes; hub and rank attacks isolate them from refit baselines. Together these results give a mechanistic account of contemporary tabular foundation models and identify which inductive biases govern both their accuracy and characteristic failures.
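The two readout families named above can be written as tiny in-context predictors. This is a hedged sketch with hypothetical choices: negative squared Euclidean distance as the similarity and a softmax temperature `tau`. It is not the models' actual attention heads. The final loop checks that both readouts are unchanged when the context rows are shuffled, the row-permutation invariance the abstract discusses.

```python
import numpy as np

def attention_vote(X_ctx, y_ctx, x_query, n_classes, tau=1.0):
    """Softmax-weighted vote over context labels (an attention-style readout)."""
    sim = -np.sum((X_ctx - x_query) ** 2, axis=1) / tau
    w = np.exp(sim - sim.max())
    w /= w.sum()
    return np.bincount(y_ctx, weights=w, minlength=n_classes)   # per-class scores

def class_mean_readout(X_ctx, y_ctx, x_query, n_classes):
    """Score each class by similarity between the query and that class's mean context row."""
    means = np.stack([X_ctx[y_ctx == c].mean(axis=0) for c in range(n_classes)])
    return -np.sum((means - x_query) ** 2, axis=1)

rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 0.5, size=(20, 2))
X1 = rng.normal(3.0, 0.5, size=(20, 2))
X_ctx, y_ctx = np.vstack([X0, X1]), np.array([0] * 20 + [1] * 20)
q = np.array([2.8, 3.1])
perm = rng.permutation(len(y_ctx))   # shuffled order of context rows

for readout in (attention_vote, class_mean_readout):
    s = readout(X_ctx, y_ctx, q, 2)
    s_perm = readout(X_ctx[perm], y_ctx[perm], q, 2)
    print(readout.__name__, s.argmax(), np.allclose(s, s_perm))
```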

2605.21266 2026-05-21 cs.LG cs.AI (updated version)

How Much Online RL is Enough? Informative Rollouts for Offline Preference Optimization in RLVR

Richa Verma, Balaraman Ravindran

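Affiliations: TCS Research Department of CSE; IIT Madras; Department of Data Science & AI; Wadhwani School of Data Science & AI

AI summary: This paper proposes G2D, which combines a short GRPO warm-up, construction of a static preference dataset, and offline DPO fine-tuning to outperform GRPO at lower compute cost. It argues that the informativeness of preference data, rather than its quantity, is what matters.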

Abstract

Reinforcement Learning from Verifiable Rewards (RLVR) has emerged as a powerful paradigm for reasoning in language models, with GRPO as its primary example. However, GRPO requires continuous online rollout generation, making it computationally expensive and difficult to scale. While Direct Preference Optimization (DPO) offers a stable and efficient offline alternative, it is typically expected to underperform w.r.t. online RL methods such as GRPO when trained on rollouts from a cold supervised fine-tuned (SFT) policy. We introduce G2D (GRPO to DPO), a three-stage pipeline that performs a short GRPO warm-up, constructs a static preference dataset, and fine-tunes a model offline with DPO. Across a set of values of the number of online steps (K) in GRPO on Qwen2.5-7B and Llama-3.1-8B, we find that offline DPO with moderate warm-up matches or outperforms GRPO at substantially lower compute cost in our setting. On Qwen2.5-7B, G2D at K=150 achieves 62.4% on MATH-500, outperforming GRPO (51.6%) by 10.8 percentage points at ~4x lower compute. On Llama-3.1-8B, G2D at K=500 achieves 49.4%, surpassing GRPO in our experimental setting. We show that performance is not governed by the number of preference pairs, which does not vary much w.r.t. K, but by their informativeness. Moderate warm-up produces rollouts with calibrated uncertainty, yielding stronger contrastive signal, while excessive warm-up leads to overconfident policies and less informative data. Our results recast the offline-online gap in RLVR as primarily a data informativeness problem, and identify short online RL warm-up with appropriate difficulty calibration of the fine-tuning dataset as a compute-efficient alternative to online RL.
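Two pieces of a GRPO-to-DPO pipeline are easy to sketch: turning verifiable-reward rollouts into preference pairs, and the standard DPO loss applied to each pair. The pairing rule (every correct rollout paired with every incorrect one for the same prompt) and the value of `beta` are assumptions of this sketch, not necessarily G2D's construction. Prompts whose rollouts are all correct or all incorrect produce no pairs under this rule.

```python
import math
from itertools import product

def build_pairs(rollouts):
    """rollouts: dict prompt -> list of (completion, reward in {0, 1}).

    Returns (prompt, chosen, rejected) triples pairing each correct completion
    with each incorrect one; prompts with uniform rewards yield no pairs.
    """
    pairs = []
    for prompt, samples in rollouts.items():
        good = [c for c, r in samples if r == 1]
        bad = [c for c, r in samples if r == 0]
        pairs.extend((prompt, g, b) for g, b in product(good, bad))
    return pairs

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss -log(sigmoid(margin)) for one pair, computed stably."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    x = -margin                                   # -log(sigmoid(m)) = softplus(-m)
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

rollouts = {
    "p1": [("A", 1), ("B", 0), ("C", 0)],
    "p2": [("D", 1), ("E", 1)],                   # all correct: no contrastive pair
}
pairs = build_pairs(rollouts)
print(len(pairs))                                 # → 2
print(dpo_loss(-5.0, -5.0, -5.0, -5.0))           # zero margin gives log 2
```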

2605.20706 2026-05-21 cs.DC cs.AI cs.LG (updated version)

Llamas on the Web: Memory-Efficient, Performance-Portable, and Multi-Precision LLM Inference with WebGPU

Reese Levine, Rithik Sharma, Nikhil Jain, Abhijit Ramesh, Zheyuan Chen, Neha Abbas, James Contini, Tyler Sorensen

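Affiliations: Microsoft Research; UC Santa Cruz

AI summary: This paper presents LlamaWeb, a WebGPU-based LLM inference framework that reduces memory overhead through static memory planning and efficient model loading and supports many model weight formats, enabling memory-efficient, performance-portable LLM inference.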

Comments: 19 pages, 11 figures, 5 tables

Abstract

Running language models in the browser presents a unique opportunity to build efficient, private, and portable AI applications, but requires contending with constrained memory availability and heterogeneous hardware targets. To realize this opportunity, we present Llamas on the Web (LlamaWeb), a WebGPU backend for llama.cpp that enables memory-efficient and performance-portable LLM inference across a wide range of model weight formats in the browser. Our design significantly reduces memory overhead through static memory planning and efficient model loading, addresses cross-device variability through a tunable kernel library, and introduces templated GPU kernels that support performant implementations of numerous quantization formats, enabling broad model support and extensibility to new formats. We evaluate LlamaWeb on 16 devices from 8 vendors, collecting data from 10 language models and four model weight formats. We compare LlamaWeb against existing browser-based LLM frameworks and find that LlamaWeb requires 29-33% less memory across several combinations of device, browser, and operating system. We also evaluate LlamaWeb's performance against these frameworks and find that it increases decode throughput by 45-69% across four GPUs from separate vendors. In addition, we compare LlamaWeb's performance against other llama.cpp backends, where it is competitive with and even beats vendor-specific backend performance on some devices.
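Static memory planning can be illustrated with a greedy offset assignment over tensors whose lifetimes (first and last step at which they are used) are known ahead of time: tensors that are never live at the same time may reuse the same bytes. The sketch below is a hypothetical greedy-by-size planner meant only to show the idea. It is not LlamaWeb's planner, and the tensor names and sizes are made up.

```python
def plan_offsets(tensors):
    """Greedy-by-size static memory plan.

    tensors: dict name -> (size_bytes, first_step, last_step), lifetimes inclusive.
    Returns (offsets, arena_size); tensors with overlapping lifetimes never share bytes.
    """
    offsets = {}
    for name in sorted(tensors, key=lambda n: -tensors[n][0]):   # largest tensors first
        size, start, end = tensors[name]
        # Address ranges of already-placed tensors that are live at the same time.
        busy = sorted(
            (offsets[o], offsets[o] + tensors[o][0])
            for o in offsets
            if not (tensors[o][2] < start or end < tensors[o][1])
        )
        offset = 0
        for lo, hi in busy:
            if offset + size <= lo:      # fits in the gap before this busy range
                break
            offset = max(offset, hi)     # otherwise move past it
        offsets[name] = offset
    arena = max(offsets[n] + tensors[n][0] for n in tensors)
    return offsets, arena

# Hypothetical activation buffers of a 3-layer chain: each is read only by the next layer.
tensors = {"a": (100, 0, 1), "b": (100, 1, 2), "c": (100, 2, 3)}
offsets, arena = plan_offsets(tensors)
print(offsets, arena)   # → {'a': 0, 'b': 100, 'c': 0} 200
```

Because every offset is computed once before inference, a single buffer of `arena` bytes can be allocated up front, here 200 bytes instead of the 300 a naive per-tensor allocation would use.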

2605.19269 2026-05-21 cs.LG (updated version)

Tutor-Student Reinforcement Learning: A Dynamic Curriculum for Robust Deepfake Detection

Zhanhe Lei, Zhongyuan Wang, Jikang Cheng, Baojin Huang, Yuhong Yang, Zhen Han, Chao Liang, Dengpan Ye

Affiliations: School of Computer Science, Wuhan University; School of Integrated Circuits, Peking University; School of Information, Huazhong Agricultural University; Cyberspace Institute of Advanced Technology, Guangzhou University

AI summary: This paper proposes a Tutor-Student reinforcement learning framework that dynamically optimizes the training curriculum to improve the robustness and generalization of deepfake detection.

Comments: Accepted to CVPR 2026

Journal ref: The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2026 (CVPR 2026)

Abstract

Standard supervised training for deepfake detection treats all samples with uniform importance, which can be suboptimal for learning robust and generalizable features. In this work, we propose a novel Tutor-Student Reinforcement Learning (TSRL) framework to dynamically optimize the training curriculum. Our method models the training process as a Markov Decision Process where a "Tutor" agent learns to guide a "Student" (the deepfake detector). The Tutor, implemented as a Proximal Policy Optimization (PPO) agent, observes a rich state representation for each training sample, encapsulating not only its visual features but also its historical learning dynamics, such as EMA loss and forgetting counts. Based on this state, the Tutor takes an action by assigning a continuous weight (0-1) to the sample's loss, thereby dynamically re-weighting the training batch. The Tutor is rewarded based on the Student's immediate performance change, specifically rewarding transitions from incorrect to correct predictions. This strategy encourages the Tutor to learn a curriculum that prioritizes high-value samples, such as hard-but-learnable examples, leading to a more efficient and effective training process. We demonstrate that this adaptive curriculum improves the Student's generalization capabilities against unseen manipulation techniques compared to traditional training methods. Code is available at https://github.com/wannac1/TSRL.
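The per-sample bookkeeping that feeds the Tutor, and the reward it receives, can be sketched without the PPO machinery. In this sketch the EMA decay, the reward definition (number of samples whose prediction flips from incorrect to correct between updates), and the normalization of the weighted loss are assumptions consistent with the description above. They are not taken from the released TSRL code.

```python
import numpy as np

class SampleStats:
    """Tracks per-sample learning dynamics used as part of the Tutor's state."""

    def __init__(self, n, decay=0.9):
        self.ema_loss = np.zeros(n)
        self.forget_count = np.zeros(n, dtype=int)
        self.was_correct = np.zeros(n, dtype=bool)   # samples start as "incorrect"
        self.decay = decay

    def update(self, idx, losses, correct):
        """Record a batch's losses and correctness; return the incorrect -> correct count."""
        self.ema_loss[idx] = self.decay * self.ema_loss[idx] + (1 - self.decay) * losses
        prev = self.was_correct[idx]
        self.forget_count[idx] += prev & ~correct        # correct -> incorrect (forgetting)
        gains = int(np.sum(~prev & correct))              # incorrect -> correct (reward)
        self.was_correct[idx] = correct
        return gains

def weighted_batch_loss(losses, weights):
    """Student loss with Tutor-assigned weights clipped to [0, 1], normalized by total weight."""
    w = np.clip(weights, 0.0, 1.0)
    return float(np.sum(w * losses) / max(np.sum(w), 1e-8))

stats = SampleStats(n=4)
idx = np.array([0, 1, 2, 3])
stats.update(idx, np.array([1.0, 0.8, 0.5, 0.2]), np.array([False, True, True, False]))
reward = stats.update(idx, np.array([0.6, 1.2, 0.3, 0.1]), np.array([True, False, True, True]))
print(reward, stats.forget_count)   # → 2 [0 1 0 0]
```

The EMA loss and forgetting counts become part of each sample's state vector, and `reward` is the scalar the Tutor's policy update would consume.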

2603.17784 2026-05-21 cs.CV cs.LG (updated version)

ResNet-50 with Class Reweighting and Anatomy-Guided Temporal Decoding for Gastrointestinal Video Analysis

Romil Imtiaz, Dimitris K. Iakovidis

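Affiliations: Department of Computer Science and Biomedical Informatics, University of Thessaly

AI summary: This paper proposes a multi-label gastrointestinal video analysis pipeline that combines a ResNet-50 frame classifier with anatomy-guided temporal event decoding. Class reweighting and anatomy-guided decoding improve recognition of rare pathology classes, raising temporal mAP on the challenge test set from 0.3801 to 0.4303.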

Comments: ICPR 2026 RARE-VISION Competition
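Class reweighting for a multi-label frame classifier is often done by up-weighting the positive term of each class's binary cross-entropy in inverse proportion to that class's frequency. The sketch below uses that common `pos_weight = negatives / positives` choice, with a cap, as an assumption. The summary does not say which reweighting scheme the authors used.

```python
import numpy as np

def pos_weights(labels, max_weight=100.0):
    """labels: (n_frames, n_classes) binary matrix -> per-class positive weights."""
    pos = labels.sum(axis=0)
    neg = labels.shape[0] - pos
    return np.minimum(neg / np.maximum(pos, 1), max_weight)   # cap for very rare classes

def weighted_bce(logits, labels, pw):
    """Mean multi-label BCE with the positive term of class c scaled by pw[c]."""
    log_p = -np.logaddexp(0.0, -logits)       # log sigmoid(x), numerically stable
    log_not_p = -np.logaddexp(0.0, logits)    # log(1 - sigmoid(x))
    loss = -(pw * labels * log_p + (1 - labels) * log_not_p)
    return float(loss.mean())

# Hypothetical 10-frame annotation: class 0 is common, class 1 is rare.
labels = np.zeros((10, 2))
labels[:5, 0] = 1
labels[0, 1] = 1
pw = pos_weights(labels)
print(pw)   # → [1. 9.]
print(weighted_bce(np.zeros((10, 2)), labels, pw))
```

With these weights, a missed rare-class frame costs nine times as much as a missed common-class frame, which pushes the classifier toward the rare pathologies the summary targets.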

2602.13485 2026-05-21 cs.LG stat.ML (updated version)

Federated Learning of Nonlinear Temporal Dynamics with Graph Attention-based Cross-Client Interpretability

Ayse Tursucular, Ayush Mohanty, Nazal Mohamed, Nagi Gebraeel

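Affiliations: Georgia Institute of Technology

AI summary: This paper proposes a federated learning framework for learning cross-client temporal dependencies in distributed nonlinear systems. Each client maps high-dimensional local observations to low-dimensional latent states with a nonlinear state-space model, and a Graph Attention Network on the server learns a graph-structured neural state-transition model over the communicated latent states. Relating the Jacobian of the learned server-side transition model to the attention coefficients makes the cross-client temporal dependencies interpretable.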

Comments: Manuscript under review

Abstract

Networks of modern industrial systems are increasingly monitored by distributed sensors, where each system comprises multiple subsystems generating high-dimensional time-series data. These subsystems are often interdependent, making it important to understand how temporal patterns at one subsystem relate to others. This is challenging in decentralized settings where raw measurements cannot be shared and client observations are heterogeneous. In practical deployments, each subsystem (client) operates a fixed proprietary model that cannot be modified or retrained, limiting existing approaches. Nonlinear dynamics further make cross-client temporal interdependencies difficult to interpret because they are embedded in nonlinear state transition functions. We present a federated framework for learning temporal interdependencies across clients under these constraints. Each client maps high-dimensional local observations to low-dimensional latent states using a nonlinear state-space model. A central server learns a graph-structured neural state transition model over the communicated latent states using a Graph Attention Network. For interpretability, we relate the Jacobian of the learned server-side transition model to attention coefficients, providing the first interpretable characterization of cross-client temporal interdependencies in decentralized nonlinear systems. We establish theoretical convergence guarantees to a centralized oracle and validate the framework through synthetic experiments demonstrating convergence, interpretability, scalability, and privacy. Additional real-world experiments show performance comparable to decentralized baselines.
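Under a simplifying assumption, the link between the server's Jacobian and its attention coefficients can be checked numerically. In the toy below, the attention matrix is held fixed and the transition is z_i' = tanh(Σ_j α_ij W z_j). In a real Graph Attention Network, α also depends on the states, which adds further Jacobian terms. The cross-client block ∂z_i'/∂z_j then equals diag(1 - z_i'²) α_ij W, so for a fixed target client i its norm is proportional to α_ij. All names and sizes are hypothetical.

```python
import numpy as np

def transition(z, W, alpha):
    """z: (n_clients, d) latent states; alpha: (n, n) row-stochastic attention (held fixed)."""
    return np.tanh(alpha @ z @ W.T)     # row i: tanh(sum_j alpha_ij * W z_j)

def jacobian_block(z, W, alpha, i, j, eps=1e-6):
    """Central finite-difference estimate of d z_i' / d z_j (a d x d block)."""
    d = z.shape[1]
    J = np.zeros((d, d))
    for k in range(d):
        dz = np.zeros_like(z)
        dz[j, k] = eps
        J[:, k] = (transition(z + dz, W, alpha)[i] - transition(z - dz, W, alpha)[i]) / (2 * eps)
    return J

rng = np.random.default_rng(0)
n, d = 4, 3
z = rng.standard_normal((n, d))
W = 0.5 * rng.standard_normal((d, d))
logits = rng.standard_normal((n, n))
alpha = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)   # softmax over rows

i = 0
ratios = [np.linalg.norm(jacobian_block(z, W, alpha, i, j)) / alpha[i, j] for j in range(n)]
print(ratios)   # numerically equal for every source client j
```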

2601.15133 2026-05-21 cs.CV cs.LG (updated version)

Diverse LLMs or Diverse Question Interpretations? That Is the Ensemble Question

Rafael Rosales, Santiago Miret

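Affiliations: Intel Labs

AI summary: This paper compares two diversity approaches for answering binary questions with large language models, model diversity and question interpretation diversity, and finds that question interpretation diversity yields better ensemble accuracy.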

DOI: 10.63317/43t2yvgid7tw
Journal ref: Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026), pages 5116-5128

Abstract

Effectively leveraging diversity has been shown to improve performance for various machine learning models, including large language models (LLMs). However, determining the most effective way of using diversity remains a challenge. In this work, we compare two diversity approaches for answering binary questions using LLMs: model diversity, which relies on multiple models answering the same question, and question interpretation diversity, which relies on using the same model to answer the same question framed in different ways. For both cases, we apply majority voting as the ensemble consensus heuristic to determine the final answer. Our experiments on BoolQ, StrategyQA, and PubMedQA show that question interpretation diversity consistently leads to better ensemble accuracy compared to model diversity. Furthermore, our analysis of GPT and LLaMA shows that model diversity typically produces results between the best and the worst ensemble members without clear improvement.
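Majority voting over the answers from several prompt framings (or several models) takes only a few lines. The sketch below assumes free-form outputs are reduced to "yes"/"no" by their first word, with unparseable answers abstaining. Ties go to the label voted first. Both rules are hypothetical; the abstract does not state how the authors parse answers or break ties.

```python
from collections import Counter

def normalize(answer):
    """Map free-form model output to 'yes' / 'no' by its first word (hypothetical rule)."""
    words = answer.strip().lower().split()
    first = words[0].strip(".,!:;") if words else ""
    if first in {"yes", "true"}:
        return "yes"
    if first in {"no", "false"}:
        return "no"
    return None                       # unparseable answers abstain

def majority_vote(answers):
    votes = [v for v in map(normalize, answers) if v is not None]
    if not votes:
        return None
    counts = Counter(votes)
    top = max(counts.values())
    # Tie-break (assumption): the earliest vote among the tied labels wins.
    return next(v for v in votes if counts[v] == top)

# Same model, same question, three different framings (hypothetical outputs).
print(majority_vote(["Yes.", "No, it is not.", "True"]))   # → yes
```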

2503.00565 2026-05-21 stat.ML cs.LG math.ST stat.ME stat.TH version update

Batched Single-Index Global Multi-Armed Bandits with Covariates

Sakshi Arya, Hyebin Song

Affiliations: Department of Mathematics, Applied Mathematics and Statistics, Case Western Reserve University; Department of Statistics, Pennsylvania State University

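AI summary: This paper proposes a semi-parametric framework for batched bandits with covariates that uses a parameter shared across arms and a single-index regression model to capture relationships between arm rewards. It introduces the BIDS algorithm, derives regret bounds in two settings, and shows that, given an accurate pilot direction, it attains minimax-optimal rates for nonparametric batched bandits with d = 1.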

English abstract

The multi-armed bandits (MAB) framework is a widely used approach for sequential decision-making, where a decision-maker selects an arm in each round with the goal of maximizing long-term rewards. In many practical applications, such as personalized medicine and recommendation systems, contextual information is available at the time of decision-making, rewards from different arms are related rather than independent, and feedback is provided in batches. We propose a novel semi-parametric framework for batched bandits with covariates that incorporates a shared parameter across arms. We leverage the single-index regression (SIR) model to capture relationships between arm rewards while balancing interpretability and flexibility. Our algorithm, Batched single-Index Dynamic binning and Successive arm elimination (BIDS), employs a batched successive arm elimination strategy with a dynamic binning mechanism guided by the single-index direction. We consider two settings: one where a pilot direction is available and another where the direction is estimated from data, deriving theoretical regret bounds for both cases. When a pilot direction is available with sufficient accuracy and the number of arms $K$ is fixed, our approach achieves minimax-optimal rates (with $d = 1$) for nonparametric batched bandits, circumventing the curse of dimensionality. Extensive experiments on simulated and real-world datasets demonstrate the effectiveness of our algorithm compared to the nonparametric batched bandit method introduced by \cite{jiang2025batched}.
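To make the binning idea concrete, the sketch below assigns a context to a bin by its projection onto an index direction `theta` and then applies a successive-elimination rule inside one bin. It is a simplified illustration, not BIDS itself: it assumes rewards in [0, 1], projections scaled into [0, 1], and Hoeffding-style confidence radii; `theta`, `delta`, and the bin count are hypothetical choices.

```python
import math


def single_index_bin(x, theta, n_bins):
    """Bin a covariate vector by its projection x·theta (assumed to lie in [0, 1])."""
    z = sum(a * b for a, b in zip(x, theta))
    return min(n_bins - 1, max(0, int(z * n_bins)))


def surviving_arms(reward_sums, pull_counts, delta=0.05):
    """Keep arms whose upper confidence bound reaches the best lower confidence bound."""
    means = [s / n for s, n in zip(reward_sums, pull_counts)]
    radii = [math.sqrt(math.log(2.0 / delta) / (2.0 * n)) for n in pull_counts]
    best_lower = max(m - r for m, r in zip(means, radii))
    return [k for k, (m, r) in enumerate(zip(means, radii)) if m + r >= best_lower]


# After one batch in a single bin: arm 1 is clearly worse and gets eliminated.
print(surviving_arms([90, 50, 85], [100, 100, 100]))       # [0, 2]
print(single_index_bin([0.5, 0.5], [0.6, 0.8], n_bins=4))  # 2
```

Because bins are defined on the one-dimensional projection rather than on the full covariate space, the number of bins does not grow with the covariate dimension. In a batched setting these statistics would only be refreshed at batch boundaries.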

2501.01793 2026-05-21 cs.LG cs.AI version update

Creating Artificial Students that Never Existed: Leveraging Large Language Models and CTGANs for Synthetic Data Generation

Mohammad Khalil, Sam Urmian, Ronas Shakya, Qinyi Liu

Affiliations: Centre for the Science of Learning & Technology (SLATE); University of Bergen

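AI summary: This paper studies the potential of generative adversarial networks (GANs) and large language models (LLMs) for generating synthetic tabular data, explores whether synthetic data can be used to create artificial students for learning analytics models, and evaluates the performance of different generator models.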

English abstract

In this study, we explore the growing potential of AI and deep learning technologies, particularly Generative Adversarial Networks (GANs) and Large Language Models (LLMs), for generating synthetic tabular data. Access to quality student data is critical for advancing learning analytics, but privacy concerns and stricter data protection regulations worldwide limit its availability and usage. Synthetic data offers a promising alternative. We investigate whether synthetic data can be leveraged to create artificial students for serving learning analytics models. Using the popular GAN model CTGAN and three LLMs (GPT2, DistilGPT2, and DialoGPT), we generate synthetic tabular student data. Our results demonstrate the strong potential of these methods to produce high-quality synthetic datasets that resemble real student data. To validate our findings, we apply a comprehensive set of utility evaluation metrics to assess the statistical and predictive performance of the synthetic data and compare the different generator models used, especially the performance of LLMs. Our study aims to provide the learning analytics community with valuable insights into the use of synthetic data, laying the groundwork for expanding the field's methodological toolbox with new, innovative approaches to learning analytics data generation.

2501.01785 2026-05-21 cs.LG cs.AI cs.CY version update

Can Synthetic Data be Fair and Private? A Comparative Study of Synthetic Data Generation and Fairness Algorithms

Qinyi Liu, Oscar Deho, Sam Urmian, Mohammad Khalil, Srecko Joksimovic, George Siemens

Affiliations: Centre for the Science of Learning & Technology (SLATE), University of Bergen; University of South Australia

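AI summary: This study examines how well synthetic data generators and fairness algorithms balance privacy and fairness. It finds that the DECAF algorithm achieves the best balance between privacy and fairness but has lower predictive accuracy, and that applying pre-processing fairness algorithms to synthetic data further improves fairness.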

English abstract

The increasing use of machine learning in learning analytics (LA) has raised significant concerns around algorithmic fairness and privacy. Synthetic data has emerged as a dual-purpose tool, enhancing privacy and improving fairness in LA models. However, prior research suggests an inverse relationship between fairness and privacy, making it challenging to optimize both. This study investigates which synthetic data generators can best balance privacy and fairness, and whether pre-processing fairness algorithms, typically applied to real datasets, are effective on synthetic data. Our results highlight that the DEbiasing CAusal Fairness (DECAF) algorithm achieves the best balance between privacy and fairness. However, DECAF suffers in utility, as reflected in its predictive accuracy. Notably, we found that applying pre-processing fairness algorithms to synthetic data improves fairness even more than when applied to real data. These findings suggest that combining synthetic data generation with fairness pre-processing offers a promising approach to creating fairer LA models.

2409.08700 2026-05-21 cs.LG version update

Personalized Weight Loss Management through Wearable Devices and Artificial Intelligence

Sergio Romero-Tapiador, Ruben Tolosana, Aythami Morales, Blanca Lacruz-Pleguezuelos, Sofia Bosch Pastor, Laura Judith Marcos-Zambrano, Guadalupe X. Bazán, Gala Freixer, Ruben Vera-Rodriguez, Julian Fierrez, Javier Ortega-Garcia, Isabel Espinosa-Salinas, Enrique Carrillo de Santa Pau

Affiliations: Department of Mathematics, Universidad de Las Palmas de Gran Canaria, 35001, Spain; Cancer Research Program, IMDEA Food Institute; Health, IMDEA Food Institute

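AI summary: This paper uses wearable devices and artificial intelligence to predict weight change in overweight and obese individuals. By analyzing biomarkers, vital signs, and behavioral data from around 100 subjects, it identifies key differences between those who lost weight and those who did not; a gradient boosting classifier reaches an AUC of 84.44%, indicating the potential of integrating multiple data sources in personalized healthcare.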

Comments 25 pages, 6 figures, 7 tables, 1 appendix

DOI: 10.1016/j.compbiomed.2026.111676
Journal ref: Computers in Biology and Medicine, Vol. 173, 111676, 2026

English abstract

Early detection of chronic and Non-Communicable Diseases (NCDs) is crucial for effective treatment during the initial stages. This study explores the application of wearable devices and Artificial Intelligence (AI) in order to predict weight loss changes in overweight and obese individuals. Using wearable data from a 1-month trial involving around 100 subjects from the AI4FoodDB database, including biomarkers, vital signs, and behavioral data, we identify key differences between those achieving weight loss (>= 2% of their initial weight) and those who do not. Feature selection techniques and classification algorithms reveal promising results, with the Gradient Boosting classifier achieving 84.44% Area Under the Curve (AUC). The integration of multiple data sources (e.g., vital signs, physical and sleep activity, etc.) enhances performance, suggesting the potential of wearable devices and AI in personalized healthcare.
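The outcome definition and the reported metric can be written down directly. A minimal sketch, assuming weights in kilograms: a subject is labeled as having lost weight when the relative loss reaches 2% of the initial weight, and AUC is computed as the probability that a randomly chosen positive is scored above a randomly chosen negative (ties count one half). The weights and scores below are made-up numbers, not data from AI4FoodDB.

```python
def lost_weight(initial_kg, final_kg, threshold=0.02):
    """Positive label if the relative weight loss is at least `threshold` (2%)."""
    return (initial_kg - final_kg) / initial_kg >= threshold


def auc(scores, labels):
    """Pairwise (Mann-Whitney) AUC; ties between a positive and a negative count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [lost_weight(100.0, 98.0), lost_weight(80.0, 79.0),
          lost_weight(90.0, 87.0), lost_weight(70.0, 70.5)]
print(labels)                             # [True, False, True, False]
print(auc([0.9, 0.8, 0.3, 0.1], labels))  # 0.75
```

The pairwise form is quadratic in the number of subjects, which is harmless at a cohort size of about 100.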

2605.21260 2026-05-21 cs.LG version update

On the Cost and Benefit of Chain of Thought: A Learning-Theoretic Perspective

Yue Zhang, Zhiyi Dong, Tommaso Cesari, Yongyi Mao

Affiliations: University of Ottawa

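AI summary: This paper studies the cost and benefit of chain of thought (CoT) from a learning-theoretic perspective. Modeling CoT as the interaction between an answer map and a chain rule, it defines the reasoning risk of a hypothesis under this interaction and derives a tight decomposition of that risk, revealing when CoT helps and when it hurts.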

English abstract

We develop a learning-theoretic framework for understanding Chain of Thought (CoT). We model CoT as the interaction between an answer map and a chain rule that generates intermediate questions autoregressively, and define the reasoning risk of a hypothesis under this interaction. Our first result is a tight canonical decomposition of this risk into two terms with opposing roles: an oracle-trajectory risk (OTR), which captures the benefit of CoT and reduces to a target-domain risk in a domain adaptation problem, and a trajectory-mismatch risk (TMR), which captures the cost of CoT through error accumulation along mismatched reasoning trajectories. We then show that this cost is unavoidable without structure: if any one of the loss, the hypothesis answer map, or the chain rule lacks stability, the TMR can be arbitrarily large even when the OTR is zero and the hypothesis is uniformly close to the ground truth. Conversely, under stability, we prove a tight upper bound on the TMR governed by an exact amplification factor that identifies bounded, linear, and exponential error-growth regimes. Together, these results give a precise theory of when CoT helps, when it hurts, and what controls the transition between the two.
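The three error-growth regimes can be seen in a toy recurrence. This is not the paper's bound: it simply assumes that each reasoning step adds a fixed error `eps` and amplifies the error carried over from the previous step by a factor `L`, so that e_T = eps * (1 + L + ... + L^(T-1)). With L < 1 the error stays bounded by eps / (1 - L), with L = 1 it grows linearly, and with L > 1 it grows exponentially.

```python
def accumulated_error(eps, amplification, n_steps):
    """Error after each step of e_t = amplification * e_{t-1} + eps, with e_0 = 0."""
    errors, e = [], 0.0
    for _ in range(n_steps):
        e = amplification * e + eps
        errors.append(e)
    return errors


for L in (0.5, 1.0, 2.0):
    print(L, accumulated_error(1.0, L, 5))
# 0.5 [1.0, 1.5, 1.75, 1.875, 1.9375]   bounded by eps / (1 - L) = 2
# 1.0 [1.0, 2.0, 3.0, 4.0, 5.0]         linear growth
# 2.0 [1.0, 3.0, 7.0, 15.0, 31.0]       exponential growth
```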

2605.21253 2026-05-21 stat.ML cs.LG version update

Theoretical guidelines for annealed Langevin dynamics in compositional simulation-based inference

Camille Touron, Gabriel V. Cardoso, Julyan Arbel, Pedro L. C. Rodrigues

Affiliations: Univ. Grenoble Alpes; Inria; CNRS; Grenoble INP; LJK; Geostatistics team; Centre for geosciences and geoengineering; Mines Paris; PSL University

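AI summary: This paper develops theory for annealed Langevin dynamics with compositional score-based methods in simulation-based inference. By deriving Wasserstein bounds, it provides theoretical guidance for choosing hyperparameters, and in the Gaussian setting it proves that the composite score formulations differ in admissible step size and total number of Langevin steps.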

English abstract

Compositional score-based approaches to simulation-based inference (SBI) approximate the posterior over a shared parameter given $n$ independent observations by aggregating individually learned posterior scores: currently, there are two main propositions of such methods (Geffner et al. (2023), Linhart et al. (2026)). As the resulting composite score does not correspond to the score of any distribution along the forward diffusion path of the true multi-observation posterior, sampling from it via a reverse SDE leads to an irreducible bias. Annealed Langevin dynamics provides a principled alternative: it treats the composite score as the genuine score of a sequence of tractable bridging densities and samples from them in succession. When properly tuned, it could lead to a controllable bias. However, its hyperparameters, namely step sizes, the number of steps per level, and the number of annealing levels, have so far been chosen empirically. We derive Wasserstein bounds for annealed Langevin with approximate scores and translate them into explicit decision rules for these hyperparameters that guarantee a prescribed sampling accuracy, while highlighting different theoretical aspects of each composite score formulation. In the Gaussian setting, we obtain closed-form expressions for all relevant quantities and prove that the bridging densities of Linhart et al. (2026) consistently admit larger step sizes and require fewer total Langevin steps than those of Geffner et al. (2023). Furthermore, we show empirically that the tuning obtained in the Gaussian setting generalizes to more complex problems, thus providing a well-understood and theoretically grounded starting point for practitioners using compositional score-based approaches.
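A toy version of the setting runs in a few lines. The sketch assumes a one-dimensional Gaussian model (prior N(0, 1), observations x_i ~ N(theta, 1)), so each single-observation posterior is N(x_i/2, 1/2). At noise variance s2 it noises the individual posteriors and the prior and combines their scores as sum_i s_i - (n - 1) * s_prior, following the factorization p(theta | x_1..n) ∝ p(theta)^(1-n) * prod_i p(theta | x_i). In this Gaussian case the composite is linear in theta with negative slope, so it is the exact score of a Gaussian bridging density at every level, and it coincides with the true posterior score at s2 = 0. The noise schedule, step size, and steps per level are arbitrary illustrative values, not the decision rules derived in the paper.

```python
import math
import random


def composite_score(theta, xs, s2):
    """sum_i score of noised p(theta|x_i) minus (n - 1) times score of noised prior.

    Toy model (assumption): prior N(0, 1), x_i | theta ~ N(theta, 1), so
    p(theta | x_i) = N(x_i / 2, 1/2). Noising adds variance s2 to each.
    """
    n = len(xs)
    s_post = sum(-(theta - x / 2.0) / (0.5 + s2) for x in xs)
    s_prior = -theta / (1.0 + s2)
    return s_post - (n - 1) * s_prior


def annealed_langevin(xs, noise_levels, steps_per_level, step_size, n_chains, seed=0):
    """Run independent chains through decreasing noise levels with unadjusted Langevin steps."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_chains):
        theta = rng.gauss(0.0, math.sqrt(1.0 + noise_levels[0]))  # start from the noised prior
        for s2 in noise_levels:
            for _ in range(steps_per_level):
                noise = rng.gauss(0.0, 1.0)
                theta += step_size * composite_score(theta, xs, s2) + math.sqrt(2.0 * step_size) * noise
        samples.append(theta)
    return samples


xs = [1.0, 2.0, 0.5]
samples = annealed_langevin(xs, noise_levels=[4.0, 1.0, 0.25, 0.0],
                            steps_per_level=50, step_size=0.05, n_chains=1000)
mean = sum(samples) / len(samples)
# Exact posterior: precision n + 1 = 4, mean sum(xs) / 4 = 0.875.
print(round(mean, 2))
```

Even with the exact score at the last level, the fixed step size leaves a discretization bias in the variance (about 1 / (4 * (1 - 0.05 * 4 / 2)) instead of 0.25 here), which is the kind of error the paper's step-size rules are meant to control.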

2605.21241 2026-05-21 cs.LG version update

Divide and Contrast: Learning Robust Temporal Features without Augmentation

Abdul-Kazeem Shamba, Kerstin Bach, Gavin Taylor

Affiliations: Department of Computer Science, Norwegian University of Science and Technology; Department of Computer Science, United States Naval Academy

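AI summary: This paper proposes Di-COT, a self-supervised framework that contrasts informative substructures within a time window rather than individual timesteps, removing the need for data augmentation and multiple encoder passes. It achieves state-of-the-art performance on six large-scale real-world datasets and the UCR/UEA benchmarks while substantially reducing training time.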

Comments Published in the 43rd International Conference on Machine Learning (ICML 2026)

English abstract

Self-supervised learning for time-series representation aims to reduce reliance on labeled data while maintaining strong downstream performance, yet many existing approaches incur high computational costs or rely on assumptions that do not hold across diverse temporal dynamics. In this work, we introduce Divide and Contrast (Di-COT), an unsupervised framework that avoids data augmentation and multiple encoder passes by contrasting informative substructures within a window rather than individual timesteps. Di-COT stochastically partitions each window into a small number of overlapping sub-blocks per iteration, enabling efficient and meaningful contrast while mitigating false positives during temporal transitions. To further improve scalability, we adopt a contrastive objective whose computation depends on the batch size and the number of sub-blocks, making loss computation independent of sequence length. Extensive experiments on six large-scale real-world datasets, as well as the UCR and UEA benchmarks, demonstrate that Di-COT learns semantically structured and transferable representations, achieving state-of-the-art performance on classification, clustering, $k$NN, and cross-dataset transfer, while substantially reducing training time. The source code is publicly available at https://github.com/sfi-norwai/Di-COT.
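The stochastic partitioning step can be sketched as follows. This is an illustration under our own assumptions (random cut points, a fixed overlap, mean pooling of per-timestep embeddings), not the authors' code; the linked repository has the actual method. What it shows is that the contrastive loss only ever sees one pooled vector per sub-block, so its size depends on the batch size and the number of sub-blocks, not on the window length.

```python
import random


def random_overlapping_subblocks(length, k, overlap, rng):
    """Split [0, length) at k - 1 random cut points and widen each block by `overlap`."""
    cuts = sorted(rng.sample(range(1, length), k - 1))
    bounds = [0] + cuts + [length]
    return [(max(0, bounds[i] - overlap), min(length, bounds[i + 1] + overlap))
            for i in range(k)]


def mean_pool(window, block):
    """Average per-timestep embeddings (lists of floats) over one sub-block."""
    start, end = block
    rows = window[start:end]
    return [sum(col) / len(rows) for col in zip(*rows)]


rng = random.Random(0)
window = [[float(t), float(t % 7)] for t in range(100)]  # 100 timesteps, 2-D embeddings
blocks = random_overlapping_subblocks(len(window), k=4, overlap=5, rng=rng)
pooled = [mean_pool(window, b) for b in blocks]          # k vectors, independent of length
```

Because the partition is redrawn at every iteration, neighboring sub-blocks that straddle a temporal transition change from step to step, which is one plausible reading of how the overlap helps with false positives.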

2605.21240 2026-05-21 cs.LG cs.AI version update

APEX: Autonomous Policy Exploration for Self-Evolving LLM Agents

Yibo Li, Jiashuo Yang, Zhi Zheng, Zhiyuan Hu, Yuan Sui, Shizun Wang, Yufei He, Bryan Hooi

Affiliations: National University of Singapore; Beijing University of Posts and Telecommunications

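AI summary: This paper proposes APEX, an autonomous policy exploration method for self-evolving LLM agents that addresses exploration collapse by building and maintaining an explicit strategy space, and performs strongly on multiple benchmarks.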

English abstract

LLM agents have shown strong performance across a wide range of complex tasks, including interactive environments that require long-horizon decision making. But these agents cannot learn on the fly at test time. Self-evolving agents address this by accumulating memory and reflection across episodes rather than requiring model-weight updates. However, these agents often suffer from exploration collapse: as memory grows, behavior concentrates around familiar high-reward routines, reducing the chance of discovering better alternatives. To address this problem, we propose Autonomous Policy EXploration (APEX), which builds and maintains an explicit strategy space through a strategy map, a directed acyclic graph of milestones with prerequisite dependency edges. In APEX, Fork Discovery expands the map with evidence-grounded unexplored directions, while Policy Selection balances exploration and exploitation during planning. Evaluated on nine Jericho text-adventure games and WebArena, a realistic web interaction benchmark, APEX outperforms all baselines. Extensive ablations validate each component's contribution and show robustness across diverse settings, confirming APEX's effectiveness for sustained exploration in self-evolving agents.
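A strategy map of this kind can be represented as a DAG from each milestone to its prerequisites. The sketch below is hypothetical: the milestone names are invented, "frontier" here means milestones whose prerequisites are all achieved, and the exploration/exploitation balance uses a standard UCB1 score as a stand-in, since the abstract does not specify APEX's Policy Selection rule.

```python
import math


def frontier(prereqs, achieved):
    """Milestones not yet achieved whose prerequisites are all achieved."""
    return sorted(m for m, reqs in prereqs.items()
                  if m not in achieved and all(r in achieved for r in reqs))


def ucb_pick(stats, candidates, c=1.0):
    """Pick a candidate by UCB1; untried candidates (no stats) are tried first."""
    untried = [m for m in candidates if stats.get(m, (0, 0.0))[0] == 0]
    if untried:
        return untried[0]
    total = sum(stats[m][0] for m in candidates)
    return max(candidates,
               key=lambda m: stats[m][1] + c * math.sqrt(math.log(total) / stats[m][0]))


# Hypothetical text-adventure milestones and their prerequisite edges.
prereqs = {"get_lamp": [], "find_key": [], "open_door": ["get_lamp"],
           "treasure": ["open_door", "find_key"]}
options = frontier(prereqs, achieved={"get_lamp"})
print(options)  # ['find_key', 'open_door']
print(ucb_pick({"find_key": (10, 0.2), "open_door": (2, 0.3)}, options))  # open_door
```

The bonus term keeps rarely tried branches competitive even when memory already favors a familiar routine, which is the failure mode the abstract calls exploration collapse.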

2605.21226 2026-05-21 cs.LG cs.AI version update

OCTOPUS: Optimized KV Cache for Transformers via Octahedral Parametrization Under optimal Squared error quantization

Mark Boss, Vikram Voleti, Simon Donné, Shimon Vainer

Affiliations: Stability AI

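AI summary: OCTOPUS compresses the Transformer KV cache, reducing memory bandwidth and footprint, by jointly quantizing rotated coordinate triplets: an octahedral parametrization maps each triplet's direction to a square, and Lloyd-Max quantization yields a non-uniform bit allocation, outperforming existing rotation codecs across data types.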

English abstract

The key-value (KV) cache dominates memory bandwidth and footprint in long-context autoregressive inference. Recent rotation-preconditioned codecs (TurboQuant, PolarQuant) show that a structured random rotation followed by a per-coordinate scalar quantizer matched to an analytically tractable marginal is a near-optimal recipe for KV compression. OCTOPUS advances this paradigm through joint quantization of rotated coordinate triplets. Each triplet's direction is mapped to a square via an octahedral parameterization, and the two resulting coordinates and the triplet norm are Lloyd-Max quantized against implementation-matched marginals. Optimizing the per-triplet squared error gives a strictly non-uniform bit allocation depending only on the total dimensionality of the keys. Sweeps show that the finite-dimensional quality optimum is constant on every real decoder we test. The codec is data-oblivious, online, and deterministic given a seed. Across text, video, and audio, OCTOPUS matches or beats every prior rotation codec at every reported bit width and metric, with a lead that grows as the bit width drops toward extreme compression. Furthermore, a fused Triton implementation reconstructs keys on the fly without materializing the uncompressed key, so the codec adds no decode-time bandwidth or latency over the existing dequantization. Project Page: https://octopus-quant.github.io/
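The two building blocks named here are standard and can be sketched independently. Below, a 3-vector is split into its Euclidean norm and a direction, and the direction is mapped to the square [-1, 1]^2 with the usual octahedral encoding (project onto the L1 unit sphere, then fold the lower half over the diagonals). The scalar quantizer is Lloyd-Max run on samples, i.e. 1-D k-means; OCTOPUS instead fits its quantizers against implementation-matched marginals, and the random rotation and bit allocation are omitted entirely. All function names are ours.

```python
import bisect
import math


def _sign(a):
    return 1.0 if a >= 0.0 else -1.0  # treat 0 as positive so the fold is well defined


def encode_triplet(v):
    """(x, y, z) -> (u, w, r): octahedral coordinates in [-1, 1]^2 plus the norm r."""
    x, y, z = v
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        return 0.0, 0.0, 0.0
    l1 = abs(x) + abs(y) + abs(z)
    x, y, z = x / l1, y / l1, z / l1  # project onto the octahedron |x|+|y|+|z| = 1
    if z < 0.0:                        # fold the lower half onto the square's corners
        x, y = (1.0 - abs(y)) * _sign(x), (1.0 - abs(x)) * _sign(y)
    return x, y, r


def decode_triplet(u, w, r):
    z = 1.0 - abs(u) - abs(w)
    x, y = u, w
    if z < 0.0:                        # undo the fold
        x, y = (1.0 - abs(w)) * _sign(u), (1.0 - abs(u)) * _sign(w)
    n = math.sqrt(x * x + y * y + z * z)
    return r * x / n, r * y / n, r * z / n


def lloyd_max(samples, levels, iters=50):
    """Sample-based Lloyd-Max: alternate midpoint thresholds and cell means."""
    xs = sorted(samples)
    codebook = [xs[int((i + 0.5) * len(xs) / levels)] for i in range(levels)]
    for _ in range(iters):
        thresholds = [(a + b) / 2.0 for a, b in zip(codebook, codebook[1:])]
        cells = [[] for _ in range(levels)]
        for x in xs:
            cells[bisect.bisect_right(thresholds, x)].append(x)
        codebook = [sum(c) / len(c) if c else old for c, old in zip(cells, codebook)]
    return codebook


u, w, r = encode_triplet((-0.5, 0.25, -2.0))
print(decode_triplet(u, w, r))  # approximately (-0.5, 0.25, -2.0)
```

Before quantization the round trip is exact up to floating-point error; the compression error in a codec like this would come only from quantizing `u`, `w`, and `r`.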

2605.21225 2026-05-21 cs.LG cs.AI version update

PREFINE: Preference-Based Implicit Reward and Cost Fine-Tuning for Safety Alignment

Richa Verma, Bavish Kulur, Sanjay Chawla, Balaraman Ravindran

Affiliations: TCS Research, Department of CSE, IIT Madras, India; Department of Computing Science, University of Alberta, Canada; Qatar Computing Research Institute, Hamad Bin Khalifa University, Qatar; Department of Data Science & AI, Wadhwani School of Data Science & AI, IIT Madras, India

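AI summary: This work proposes PREFINE, which fine-tunes pre-trained reinforcement learning policies with preference-based implicit reward and cost signals to achieve safety alignment in continuous control environments, producing low-cost behavior while retaining high reward.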

Comments Accepted at AAMAS 2026 as a full paper

English abstract

We address the problem of making a pre-trained reinforcement learning (RL) policy safety-aware by incorporating cost constraints without retraining it from scratch. While costs could be encoded numerically, we consider the more general setting in which costs are provided as preferences. Given a reward-optimized policy and a small dataset of preferred (low-cost) and dispreferred (high-cost) trajectories, our goal is to fine-tune the policy to generate low-cost behaviors while retaining high rewards. Unlike standard RLHF in language models, where preferences are defined over responses to the same prompt, our setting involves trajectory-level preferences in continuous control environments. We introduce PREFINE: Preference-based Implicit Reward and Cost Fine-Tuning for Safety Alignment, a preference-based fine-tuning method that adapts Direct Preference Optimization (DPO), now widely used for LLM fine-tuning, to the sequential decision-making setting. PREFINE constructs policy-sampled counterfactual trajectories to establish meaningful preference contrasts and jointly optimizes for reward retention and safety alignment. Empirically, PREFINE reduces constraint violations and catastrophic failures by over 60% while maintaining original reward behavior. PREFINE produces policies that achieve low-cost, high-reward performance with significantly improved data and computational efficiency compared to full offline RL or imitation learning, bridging preference alignment and safe policy adaptation in continuous domains.
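For reference, the standard DPO objective on a single preference pair is -log sigmoid(beta * [(log pi(tau_w) - log pi_ref(tau_w)) - (log pi(tau_l) - log pi_ref(tau_l))]). For trajectories, log pi(tau) can be taken as the sum of per-step action log-probabilities: the environment's initial-state and transition terms are identical under the policy and the reference, so they cancel in each log-ratio. The sketch below shows only this generic loss; PREFINE's counterfactual trajectory construction and its joint reward-retention objective are not reproduced, and the log-probabilities and `beta` are assumed values.

```python
import math


def trajectory_logprob(step_logprobs):
    """log pi(tau) up to dynamics terms: sum of log pi(a_t | s_t) over the trajectory."""
    return sum(step_logprobs)


def dpo_loss(pi_w, ref_w, pi_l, ref_l, beta=0.1):
    """-log sigmoid(beta * margin), written as a numerically stable softplus."""
    margin = beta * ((pi_w - ref_w) - (pi_l - ref_l))
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Preferred (low-cost) and dispreferred (high-cost) trajectories, per-step log-probs.
pi_w = trajectory_logprob([-0.2, -0.4, -0.1])
pi_l = trajectory_logprob([-0.9, -1.2, -0.8])
ref_w = trajectory_logprob([-0.5, -0.6, -0.4])
ref_l = trajectory_logprob([-0.6, -0.7, -0.5])
print(dpo_loss(pi_w, ref_w, pi_l, ref_l, beta=1.0))  # below log(2): the policy already favors tau_w
```

When the policy equals the reference, the margin is zero and the loss is exactly log 2, so the loss measures how far fine-tuning has shifted probability toward the preferred trajectory.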

2605.21217 2026-05-21 stat.ML cs.LG version update

Federated LoRA Fine-Tuning for LLMs via Collaborative Alignment

Shuaida He, Liwen Chen, Long Feng

Affiliations: School of Computing & Data Science, The University of Hong Kong

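AI summary: This paper studies parameter-efficient fine-tuning with LoRA in a federated learning setting and proposes CLAIR, a framework that recovers the shared LoRA subspace and detects contaminated clients via a structured low-rank plus block-sparse decomposition, with exact recovery in the noiseless case and stable, consistent recovery of the collaborative set under the stated conditions.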

English abstract

Low-rank adaptation (LoRA) has emerged as a powerful tool for parameter-efficient fine-tuning of large language models (LLMs). This paper studies LoRA under a federated learning setting, enabling collaborative fine-tuning across clients while preserving parameter efficiency. We focus on a highly heterogeneous regime in which clients share only partial structure and a substantial subset may be contaminated. We propose Collaborative Low-rank Alignment and Identifiable Recovery (CLAIR), a contamination-aware framework that relies only on preliminary local estimators. Its formulation applies broadly, from linear regression to neural network and LLM modules, whenever local adaptation can be represented by matrix-valued updates. CLAIR recovers the shared LoRA subspace and detects contaminated clients via a structured low-rank plus block-sparse decomposition. We prove exact recovery of the shared LoRA subspace in the noiseless case, stable recovery under preliminary estimation error, and consistent collaborative-set recovery under mild separation conditions. We further quantify the gain from CLAIR refinement: it reduces off-subspace estimation error through cross-client averaging while preserving client-specific variation within the shared LoRA subspace, and thus improves over local fine-tuning whenever this oracle gain outweighs the costs of subspace estimation and benign-client heterogeneity. Empirically, we demonstrate the benefits of CLAIR by fine-tuning a Transformer architecture on a text-copying task. The results show accurate contamination detection and improved benign-client performance compared with local fine-tuning and non-robust federated averaging.

2605.21213 2026-05-21 quant-ph cs.AI cs.LG math.OC version update

ChunkFT: Chunked Optimization for Memory-Efficient Full Fine-Tuning

Yongkang Liu, Zijing Wang, Mengjie Zhao, Ercong Nie, Mingyang Wang, Qian Li, Feiliang Ren, Shi Feng, Daling Wang, Hinrich Schütze

Affiliations: Northeastern University, China; Shanghai Jiao Tong University, China; CIS, LMU Munich, Germany; MCML, Germany; Shandong University, China

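AI summary: This paper proposes ChunkFT, which reformulates full-parameter fine-tuning around a dynamically activated working set, enabling gradient computation for arbitrary sub-tensors without modifying the network architecture. Theoretical analysis and experiments show that it is effective in memory usage, running time, and optimization quality, and it outperforms existing memory-efficient baselines on downstream tasks.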

English abstract

This work presents ChunkFT, a memory-efficient fine-tuning framework that reformulates full-parameter fine-tuning around a dynamically activated working set. ChunkFT enables gradient computation for arbitrary sub-tensors without modifying the network architecture, providing an algorithmic foundation for optimizing arbitrary sub-networks while avoiding standard dense gradient computation. We provide a theoretical convergence analysis of ChunkFT in the deterministic setting. Empirically, we apply ChunkFT to fine-tune Llama 3-8B and Llama 3-70B using a single RTX 4090-24GB GPU and 2× H800-80GB GPUs, respectively. Full-parameter fine-tuning of a 7B model with a 1K input length requires only 13.72GB of GPU memory. The results demonstrate the effectiveness of ChunkFT in memory usage, running time, and optimization quality. Moreover, downstream evaluations on language understanding, mathematical reasoning, and MT-Bench show that ChunkFT consistently outperforms existing memory-efficient baselines. Notably, ChunkFT achieves performance comparable to, and in some cases exceeding, full-parameter fine-tuning. Our repository is available at https://github.com/misonsky/chunk.
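The idea of restricting each update to an active working set can be illustrated with block-coordinate descent on a toy quadratic. This is a generic sketch, not ChunkFT: chunks are visited cyclically, the chunk size and learning rate are arbitrary, and `grad_fn` is a hypothetical callback that returns gradients only for the active indices. In a real network, that restriction is where the memory saving would come from, since gradients and optimizer state need to exist only for the current chunk.

```python
def chunked_descent(params, grad_fn, chunk_size, lr, n_steps):
    """Update one contiguous chunk of parameters per step (cyclic order)."""
    n = len(params)
    chunks = [range(i, min(i + chunk_size, n)) for i in range(0, n, chunk_size)]
    for step in range(n_steps):
        active = chunks[step % len(chunks)]
        grads = grad_fn(params, active)  # gradient only for the active working set
        for idx, g in zip(active, grads):
            params[idx] -= lr * g
    return params


targets = [float(i) for i in range(12)]


def quadratic_grad(params, active):
    # Gradient of sum_i (p_i - t_i)^2 restricted to the active indices
    return [2.0 * (params[i] - targets[i]) for i in active]


params = chunked_descent([0.0] * 12, quadratic_grad, chunk_size=4, lr=0.25, n_steps=90)
```

With `lr=0.25`, each visit halves the remaining error in the active chunk, so after 30 visits per chunk every parameter is within about 1e-8 of its target.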

2605.21167 2026-05-21 stat.ML cs.LG version update

A Rigorous, Tractable Measure of Model Complexity

Oskar Allerbo, Thomas B. Schön

Affiliations: KTH Royal Institute of Technology; Uppsala University

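AI summary: This paper proposes a rigorous and easily computed measure of model complexity based on the similarity of model gradients across inputs. It applies to both parametric and non-parametric models, generalizes model-specific complexity measures such as polynomial degree and kernel length scale, and sheds new light on double descent in random Fourier features, random forests, neural networks, and gradient boosting.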

English abstract

An accurate assessment of a model's complexity is crucial for topics such as interpretation, generalization, and model selection. However, most existing complexity measures either rely on heuristic assumptions or are computationally prohibitive. In this paper, we present a mathematically rigorous yet easy-to-compute measure of model complexity that is based on the similarities between the model gradients across inputs. It is thus well-defined for any parametric model, but also for kernel-based non-parametric models. We prove that our measure of complexity generalizes model-specific complexity measures such as polynomial degree (for polynomial regression), kernel length scale (for Matérn kernels), number of neighbors (for k-nearest neighbors), number of splits (for decision trees), and number of trees (for random forests). We also use our measure to obtain new insights into the double descent phenomenon for random Fourier features, random forests, neural networks, and gradient boosting.
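The object the measure is built on, the similarity between model gradients at different inputs, is easy to compute. As a hedged illustration only (this is not the authors' complexity measure), the sketch below forms the gradient inner-product (Gram) matrix for polynomial regression, where the gradient with respect to the coefficients at input x is (1, x, ..., x^degree). With more distinct inputs than coefficients, the rank of this matrix equals degree + 1, so the polynomial degree is already encoded in the gradient similarities. The inputs and the rank tolerance are arbitrary choices.

```python
def poly_gradient(x, degree):
    # f(x; theta) = sum_k theta_k * x**k is linear in theta, so df/dtheta_k = x**k
    return [x ** k for k in range(degree + 1)]


def gradient_gram(xs, degree):
    """Matrix of inner products between model gradients at every pair of inputs."""
    grads = [poly_gradient(x, degree) for x in xs]
    return [[sum(a * b for a, b in zip(gi, gj)) for gj in grads] for gi in grads]


def numerical_rank(matrix, tol=1e-9):
    """Rank via Gauss-Jordan elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n_rows, n_cols = len(a), len(a[0])
    rank = 0
    for c in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) <= tol:
            continue
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(n_rows):
            if r != rank:
                f = a[r][c] / a[rank][c]
                a[r] = [v - f * p for v, p in zip(a[r], a[rank])]
        rank += 1
    return rank


xs = [0.0, 0.5, 1.0, 1.5, 2.0]
for degree in range(4):
    print(degree, numerical_rank(gradient_gram(xs, degree)))  # rank = degree + 1
```

A hard rank is brittle for models that are not linear in their parameters; a continuous, well-defined summary of the same gradient similarities is what the paper claims to provide.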

2605.21164 2026-05-21 cs.LG quant-ph version update

Q-SYNTH: Hybrid Quantum-Classical Adversarial Augmentation for Imbalanced Fraud Detection

Adam Innan, Mansour El Alami, Nouhaila Innan, Muhammad Shafique, Mohamed Bennai

Affiliations: Quantum Physics and Spintronics Team, LPMC, Faculty of Sciences Ben M'sick; eBRAIN Lab, Division of Engineering, New York University Abu Dhabi (NYUAD); Center for Quantum and Topological Systems (CQTS), NYUAD Research Institute

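AI summary: This paper proposes Q-SYNTH, a hybrid quantum-classical adversarial framework that generates minority-class samples for imbalanced fraud detection, using a quantum-circuit generator and a classical neural-network discriminator with the aim of improving fraud-class recall and F1 score.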

Comments 13 pages, 6 figures

English abstract

Credit card fraud detection is fundamentally challenged by extreme class imbalance, where fraudulent transactions are rare yet operationally critical. This imbalance often biases supervised learners toward the legitimate class, leading to high overall accuracy but weaker fraud-class recall and F1-score. This paper introduces Q-SYNTH, a hybrid classical-quantum generative adversarial framework in which a parameterized quantum circuit serves as the generator and a classical neural network serves as the discriminator. Q-SYNTH is designed for minority-class fraud synthesis in tabular data and is evaluated along two dimensions: statistical fidelity to real fraud samples and downstream performance for fraud detection. To this end, generated samples are assessed using distributional similarity measures based on Kolmogorov-Smirnov statistics and Wasserstein distances, real-vs-synthetic detectability measured by AUC-ROC, and downstream classification performance across both quantum and classical classifiers. Under the reported protocol, Q-SYNTH reduces marginal distribution mismatch relative to a classical GAN baseline while maintaining competitive downstream fraud-detection performance. Although SMOTE achieves the strongest feature-wise similarity and the classical GAN attains the highest downstream performance in several settings, Q-SYNTH offers a favorable compromise between distributional fidelity and downstream performance, supporting the feasibility of hybrid quantum augmentation for imbalanced fraud detection.
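The two distributional similarity measures mentioned here have simple one-dimensional forms that can be applied feature by feature. A minimal sketch: the two-sample Kolmogorov-Smirnov statistic is the largest gap between the empirical CDFs, and for equal-size samples the 1-Wasserstein distance is the mean absolute difference between sorted values. The sample values are made up, and how the paper aggregates these per-feature numbers is not specified in the abstract.

```python
import bisect


def ks_statistic(a, b):
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over all observed x."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(fa - fb))
    return gap


def wasserstein1_equal_size(a, b):
    """W1 between two empirical distributions with the same number of points."""
    if len(a) != len(b):
        raise ValueError("this shortcut needs equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


real_fraud = [1.0, 2.0, 3.0, 4.0]       # one feature, real minority-class samples
synthetic_fraud = [3.0, 4.0, 5.0, 6.0]  # same feature, generated samples
print(ks_statistic(real_fraud, synthetic_fraud))             # 0.5
print(wasserstein1_equal_size(real_fraud, synthetic_fraud))  # 2.0
```

KS is bounded in [0, 1] and reacts to the largest local mismatch, while W1 is in the feature's own units and grows with how far mass has to move, which is why the two are often reported together.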

2605.21160 2026-05-21 cs.LG Updated

Learning First Integrals via Backward-Generated Data and Guided Reinforcement Learning

Jingfeng Zhong, Zhengxiang Liu, Zhijie Wang, Shuai Li

Affiliations: Shanghai Jiao Tong University

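AI summary: This paper proposes FISolver, an LLM-based solver that tackles data scarcity in first-integral discovery using backward-generated data and guided reinforcement learning, and significantly outperforms other methods on challenging benchmarks.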

Comments 17 pages, 2 figures, 3 tables
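
A first integral is a function H that stays constant along every trajectory of a dynamical system x' = f(x), i.e. the product ∇H · f vanishes everywhere. In the generic sense, "backward" data generation samples the answer first and derives a problem from it. The sketch below shows that idea for planar Hamiltonian systems only. It is an illustrative assumption, not FISolver's actual generator, and the helper names are hypothetical.

```python
import math

def grad(H, x, h=1e-6):
    """Central finite-difference gradient of a scalar function H at point x."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((H(xp) - H(xm)) / (2 * h))
    return g

def vector_field_from_integral(H):
    """Backward step: build x' = f(x) = (dH/dp, -dH/dq), a planar system that conserves H."""
    def f(x):
        dH_dq, dH_dp = grad(H, x)
        return [dH_dp, -dH_dq]
    return f

def lie_derivative(H, f, x):
    """Rate of change dH/dt = grad H . f along trajectories; zero for a first integral."""
    return sum(gi * fi for gi, fi in zip(grad(H, x), f(x)))

# Sample the answer first (a pendulum energy), then derive the system it conserves.
H = lambda x: 0.5 * x[1] ** 2 - math.cos(x[0])
f = vector_field_from_integral(H)
print(lie_derivative(H, f, [0.3, 1.2]))               # 0.0: H is conserved
print(lie_derivative(lambda x: x[0], f, [0.3, 1.2]))  # about 1.2: q alone is not
```

Each generated pair (f, H) then serves as a training example whose target is already known to be correct.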

2605.21157 2026-05-21 cs.CV cs.AI cs.LG cs.RO Updated

Comparative Analysis of Military Detection Using Drone Imagery Across Multiple Visual Spectrums

Sourov Roy Shuvo, Prajwal Panth, Rajesh Chowdhury, Sorup Chakraborty, Sudip Chakrabarty, Prasant Kumar Pattnaik

Affiliations: School of Computer Engineering, KIIT Deemed to be University

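AI summary: This paper studies military object detection in drone imagery under different spectral conditions. It builds four dataset variants (grayscale, thermal, night-vision, and blurred imagery) to evaluate model performance across environments and uses a YOLOv11-small model to improve the performance and reliability of drone operations.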

Comments 6 pages, 7 figures. Accepted at the 16th International Conference on Computing, Communication and Networking Technologies (ICCCNT), July 6-11, 2025, IIT Indore. Proceedings pending publication

Abstract

In modern warfare, drones are becoming an essential part of intelligence gathering and of carrying out precise attacks in many kinds of hostile environments. Their ability to operate in real time in hostile environments from a safe distance makes them invaluable for surveillance and military operations. The KIIT-MiTA dataset consists of drone images of different military scenarios. It provides a foundation for detecting military objects but does not account for the range of real-world conditions. To evaluate how models perform under varying conditions, four dataset variants are therefore created: Gray Scale, Thermal Vision, Night Vision, and Obscura Vision. These simulate real-world conditions such as low visibility, heat-based imagery, and nighttime operation. A YOLOv11-small model is trained and used to detect objects across these settings. This research improves the performance and reliability of drone-based operations by contributing to the development of advanced detection systems for both defensive and offensive missions.
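
The grayscale and blurred ("Obscura") variants can be derived from RGB frames with simple per-pixel operations. The sketch below assumes ITU-R BT.601 luma weights and a box blur, because the abstract does not say which conversions the authors used. It does not cover the thermal and night-vision variants, and the function names are hypothetical.

```python
def to_grayscale(rgb_image):
    """Convert an H x W grid of (R, G, B) tuples to integer luma using BT.601 weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]

def box_blur(gray, k=1):
    """Mean filter over a (2k+1) x (2k+1) window, clipped at the image borders."""
    h, w = len(gray), len(gray[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [gray[y][x]
                      for y in range(max(0, i - k), min(h, i + k + 1))
                      for x in range(max(0, j - k), min(w, j + k + 1))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out

frame = [[(255, 255, 255), (0, 0, 0)],
         [(0, 0, 0), (0, 0, 0)]]
gray = to_grayscale(frame)
print(gray)            # [[255, 0], [0, 0]]
print(box_blur(gray))  # [[63.75, 63.75], [63.75, 63.75]]
```

For real frames, an image library such as Pillow or OpenCV would do the same work far faster; the loops above only make the arithmetic explicit.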

2605.21154 2026-05-21 cs.CL cs.AI cs.LG Updated

Automated ICD Classification of Psychiatric Diagnoses: From Classical NLP to Large Language Models

Fernando Ortega, Raúl Lara-Cabrera, Jorge Dueñas-Lerín, Alejandro de la Torre-Luque, Mercé Salvador Robert, Enrique Baca-García

Affiliations: Department of Sistemas Informáticos, Universidad Politécnica de Madrid, Spain; KNODIS Research Group, Universidad Politécnica de Madrid, Spain; CIBERSAM ISCIII, Spain; Department of Legal Medicine, Psychiatry and Pathology, Complutense University of Madrid, Spain; Hospital Universitario de Móstoles, Universidad Rey Juan Carlos, Spain; Department of Psychiatry, University Hospital Jiménez Díaz Foundation, Madrid, Spain; Department of Psychiatry, University Hospital Rey Juan Carlos, Móstoles, Spain; Department of Psychiatry, General Hospital of Villalba, Madrid, Spain; Department of Psychiatry, University Hospital Infanta Elena, Madrid, Spain; Department of Psychology, Universidad Católica del Maule, Talca, Chile; Department of Psychiatry, Madrid Autonomous University, Madrid, Spain

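AI summary: This study uses NLP and machine learning to map free-text descriptions to the International Classification of Diseases (ICD), automating psychiatric diagnostic analysis. It evaluates text representations ranging from classical frequency-based models to advanced large language models and shows that transformer embeddings better capture implicit semantic cues and nuanced medical terminology.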

Abstract

Mental health has become a global priority, leading to a massive administrative burden in the coding of clinical diagnoses. This study proposes automating psychiatric diagnostic analysis by mapping free-text descriptions to the International Classification of Diseases (ICD) using Natural Language Processing (NLP) and Machine Learning (ML) techniques. Using a specialized dataset of 145,513 Spanish psychiatric descriptions, various text representation paradigms were evaluated, ranging from classical frequency-based models (BoW, TF-IDF) to state-of-the-art Large Language Models (LLMs) such as e5_large, BioLORD, and Llama-3-8B. Results indicate that transformer-based embeddings consistently outperform traditional methods by capturing implicit semantic cues and nuanced medical terminology. The e5_large model, fine-tuned end to end, achieved the highest performance, with a micro-averaged F1 score ($F1_{micro}$) of 0.866. This research demonstrates that adapting LLMs to specific clinical nomenclature is essential for overcoming "long-tail" label distributions and the inherent ambiguity of psychiatric discourse.
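
The reported $F1_{micro}$ pools true positives, false positives, and false negatives over all ICD codes before computing precision and recall, so frequent codes dominate the score. A minimal sketch follows. It assumes each description carries a set of gold codes and a set of predicted codes. The example codes are illustrative, and `micro_f1` is a hypothetical name.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1: pool TP, FP, FN over all samples and codes, then combine."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

gold = [{"F32"}, {"F20"}, {"F41", "F32"}]
pred = [{"F32"}, {"F31"}, {"F41"}]
print(micro_f1(gold, pred))  # 4/7, about 0.571
```

When every description has exactly one gold code and one predicted code, micro-F1 reduces to plain accuracy.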

2605.21147 2026-05-21 cs.LG cs.CL Updated

SMoA: Spectrum Modulation Adapter for Parameter-Efficient Fine-Tuning

Yongkang Liu, Xing Li, Mengjie Zhao, Shanru Zhang, Zijing Wang, Qian Li, Shi Feng, Feiliang Ren, Daling Wang, Hinrich Schütze

Affiliations: Northeastern University, China; Shandong University, China; CIS, LMU Munich, Germany; MCML, Germany

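AI summary: This paper proposes SMoA, an adapter for spectrum-aware updates that improves parameter-efficient fine-tuning by enlarging the accessible family of spectral updates under a smaller parameter budget.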

Abstract

As the number of model parameters increases, parameter-efficient fine-tuning (PEFT) has become the go-to choice for tailoring pre-trained large language models. Low-Rank Adaptation (LoRA) approximates full-parameter fine-tuning with a low-rank update and is widely used to reduce resource requirements. However, lowering the rank limits representational capacity. Theory suggests that LoRA fine-tuning with rank r converges toward the top r singular values of the pre-trained weight matrix. As the rank increases, more principal singular directions are preserved, which generally improves the model's performance, but a larger rank also introduces more trainable parameters and thus higher computational cost. To resolve this dilemma, we propose SMoA, a Spectrum Modulation Adapter that enlarges the accessible family of spectrum-aware updates under a smaller parameter budget. SMoA partitions the layer into multiple aligned spectral blocks and applies one in-block Hadamard-modulated low-rank branch to each diagonal block, yielding broader coverage of pretrained spectral directions. We provide theoretical analysis and empirical results on multiple tasks. In our experiments, SMoA improves average performance over LoRA and competitive LoRA-style baselines in the lower-budget setting considered.
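
The parameter arithmetic behind the rank/budget trade-off can be made concrete. A rank-r LoRA update on a d_out x d_in matrix costs r(d_out + d_in) parameters. By contrast, k independent rank-r updates on k equal diagonal blocks cost the same total but can reach rank k·r. This sketch only illustrates that generic block-diagonal property under the assumption of equally sized blocks. It does not implement SMoA's spectral alignment or Hadamard modulation, and the function names are hypothetical.

```python
def lora_params(d_out, d_in, r):
    """Trainable parameters of a rank-r LoRA update B @ A (B: d_out x r, A: r x d_in)."""
    return r * (d_out + d_in)

def block_lowrank(d_out, d_in, r, k):
    """Parameters and maximum rank of k rank-r updates placed on the diagonal blocks."""
    if d_out % k or d_in % k:
        raise ValueError("this sketch assumes both dimensions are divisible by k")
    bo, bi = d_out // k, d_in // k
    params = k * r * (bo + bi)
    max_rank = k * min(r, bo, bi)  # ranks of diagonal blocks add up
    return params, max_rank

print(lora_params(4096, 4096, 8))       # 65536 parameters, rank at most 8
print(block_lowrank(4096, 4096, 8, 4))  # (65536, 32): same budget, rank up to 32
print(block_lowrank(4096, 4096, 2, 4))  # (16384, 8): rank up to 8 at a quarter of the budget
```

The catch is that a block-diagonal update cannot mix information across blocks. The abstract indicates that SMoA addresses this by aligning the blocks with the pretrained spectrum.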

2605.21127 2026-05-21 cs.LG Updated

Reasoning-Trace Collapse: Evaluating the Loss of Explicit Reasoning During Fine-Tuning

Lukas Twist, Helen Yannakoudakis, Jie M. Zhang

Affiliations: King’s College London

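AI summary: This paper studies the loss of explicit reasoning during fine-tuning. It proposes a structural evaluation framework that separates answer correctness from reasoning-trace validity, and finds that standard supervised fine-tuning rapidly suppresses valid reasoning traces, a failure that answer-only metrics mask.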

Comments 22 pages, 3 tables, 3 figures

AI abstract (translated from Chinese)

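Explicit reasoning models are trained to generate intermediate reasoning traces before the final answer, but downstream fine-tuning is typically performed on plain instruction-response data that contains no such traces. We show that this mismatch causes reasoning-trace collapse: the fine-tuned model still produces plausible final answers but loses the structurally valid reasoning traces that make it a reasoning model. We introduce a structural evaluation framework that separates answer correctness from reasoning-trace validity, measuring valid, empty, missing, and truncated reasoning traces alongside reasoning-based task performance. Using this framework, we study four open reasoning models and find that standard supervised fine-tuning can rapidly suppress valid reasoning traces, while answer-only metrics substantially mask this failure: in several settings, task performance remains high even as the proportion of valid reasoning traces drops sharply. We further show that a simple loss-masking strategy can substantially mitigate the collapse without requiring teacher-generated reasoning traces. These results suggest that evaluations of fine-tuned reasoning models should report structural measures of reasoning reliability, especially when the adaptation data contain no explicit reasoning traces.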
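
Once a trace delimiter is fixed, the four structural categories can be checked mechanically. This sketch assumes `<think>...</think>` delimiters, which several open reasoning models use, and treats any non-empty, closed trace as valid. The paper's validity criteria may be stricter, and the function names are hypothetical.

```python
def classify_trace(output, open_tag="<think>", close_tag="</think>"):
    """Label an output's reasoning trace as 'valid', 'empty', 'missing', or 'truncated'."""
    start = output.find(open_tag)
    if start == -1:
        return "missing"
    end = output.find(close_tag, start + len(open_tag))
    if end == -1:
        return "truncated"  # opened but never closed, e.g. cut off by a length limit
    body = output[start + len(open_tag):end]
    return "valid" if body.strip() else "empty"

def trace_rates(outputs):
    """Fraction of outputs falling into each structural category."""
    labels = [classify_trace(o) for o in outputs]
    return {c: labels.count(c) / len(labels)
            for c in ("valid", "empty", "missing", "truncated")}

outputs = ["<think>2 + 2 = 4</think> 4", "<think>\n</think> 4", "4", "<think>let me see"]
print([classify_trace(o) for o in outputs])  # ['valid', 'empty', 'missing', 'truncated']
```

Answer correctness would be scored separately on the text after the trace. Keeping the two measurements apart is what lets a drop in valid traces show up even when accuracy holds steady.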

HORST: Composing Optimization Geometries for Sparse Transformer Training

Tom Jacobs, Rohan Jain, Rebekka Burkholz

Affiliations: CISPA Helmholtz Center for Information Security

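AI summary: This paper proposes HORST, a modular optimizer that composes optimization geometries and introduces an L1 sparsity bias through a hypergeometric mirror map, promoting sparsity while maintaining training stability.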

Comments 22 pages, 8 figures

2605.21103 2026-05-21 cs.LG Updated

Decoupling Communication and Policy: Robust Multi-Agent Reinforcement Learning under Bandwidth Constraints

Alexi Canesse, Benoît Goupil, Jesse Read, Sonia Vanier

Affiliations: École polytechnique (LIX); CNRS; Institut Polytechnique de Paris; Palaiseau, France

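AI summary: This paper introduces the β metric and the SLIM architecture, which decouples the communication pathway from the policy's latent representation, improving the robustness and performance of multi-agent reinforcement learning under bandwidth constraints.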

Abstract

Communication enables coordination in multi-agent reinforcement learning (MARL), but many real-world applications, e.g., search-and-rescue with drone swarms, operate under severe bandwidth constraints. Many communication architectures still expose a coupled bottleneck in which a shared latent representation is used for both policy execution and inter-agent communication. Consequently, reducing message size directly limits the policy's latent space, often leading to significant performance degradation. We address this with two contributions. First, we introduce $β$, a normalised per-agent bandwidth budget that unifies sparsity, rounds, and message dimension into a single comparable constraint. Second, we provide SLIM, a minimal architecture that decouples the communication pathway from the policy's latent representation, allowing us to isolate the effect of bandwidth from the effect of policy capacity while benefiting from in-step communication. We evaluate our method on several partially-observable MARL benchmarks, where communication is essential. Our approach achieves state-of-the-art performance and exhibits scalability and robustness under limited communication, with only marginal degradation as bandwidth is reduced.
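
The abstract does not give the formula for β. As a purely hypothetical illustration of how sparsity, rounds, and message dimension could be folded into one comparable number, the sketch below counts the expected floats an agent sends per step and divides by a reference size. The names and the normalisation are assumptions. The policy's latent size appears nowhere in the calculation, which is the separation that a decoupled design such as SLIM is meant to secure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CommConfig:
    send_prob: float  # sparsity: chance an agent transmits in a given round
    rounds: int       # communication rounds per environment step
    msg_dim: int      # floats per message

def normalised_budget(cfg, reference_dim):
    """Hypothetical per-agent budget: expected floats sent per step over a reference size."""
    return cfg.send_prob * cfg.rounds * cfg.msg_dim / reference_dim

# Three different protocols that this particular normalisation treats as equally expensive.
configs = [CommConfig(1.0, 1, 16), CommConfig(0.5, 2, 16), CommConfig(0.25, 1, 64)]
print([normalised_budget(c, reference_dim=64) for c in configs])  # [0.25, 0.25, 0.25]
```

A single scalar of this kind allows dense single-round, sparse multi-round, and wide-but-rare protocols to be compared at equal cost.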

2605.21083 2026-05-21 physics.app-ph cs.LG physics.bio-ph physics.med-ph Updated

AIMBio-Mat: An AI-Native FAIR Platform for Closed-Loop Materials Discovery and Biomedical Translation

D. -M. Mei, K. Acharya, C. M. Adhikari, M. Adhikari, S. Aryal, B. V. Benson, K. Bhatta, S. Bhattarai, N. Budhathoki, A. M. Castillo, D. Chakraborty, S. Chhetri, S. Choudhury, T. A. Chowdhury, R. D. Cruz, B. Cui, S. Dhital, K. -M. Dong, R. Gapuz, A. Ghasemi, E. Z. Gnimpieba, B. D. S. Gurung, H. A. Hashim, R. I. Harry, K. -E. Hasin, M. K. Hassanzadeh, M. K. Jha, D. Kim, K. -C. Kong, B. Lama, A. Mahat, N. Maharjan, A. Majeed, J. Mammo, M. M. Masud, K. S. Moore, A. Nawaz, H. Oli, S. A. Panamaldeniya, L. Pandey, R. Pandey, Z. Peng, A. Prem, M. M. Rana, K. Rana Magar, R. Rizk, C. S. Tadi, L. -W. Wang, Y. Yang, G. -L. Yin, C. -X. Yu, D. Zeng, M. Zhou, Q. Zhou

Affiliations: Department of Physics, University of South Dakota; South Dakota School of Mines and Technology; Department of Chemistry, Physics and Materials Science, Fayetteville State University; University of South Dakota; PROMISE Lab, Sanford Research; Department of Mechanical Engineering, University of Mississippi; Department of Physics and Astronomy, University of Kansas; Tiospa Zina Tribal School; Department of Mechanical and Materials Engineering, University of Nebraska–Lincoln

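AI summary: This paper proposes the AIMBio-Mat platform, which integrates materials provenance, biomedical context, knowledge graphs, uncertainty-aware machine learning, and human-in-the-loop active learning to support cross-domain reasoning in materials discovery and biomedical translation, and provides a testable platform blueprint.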

Comments 35 pages, 4 figures, and 12 tables

Abstract

Materials discovery and biomedical translation increasingly require models that can reason across composition, processing, structure, biological response, manufacturability, safety, and governance constraints. Existing materials and biomedical data ecosystems are powerful but remain poorly coupled for AI-guided discovery. Here we present AIMBio, a conceptual framework for an AI-native, FAIR, and governance-aware decision layer that links materials provenance, biomedical context, knowledge graphs, uncertainty-aware machine learning, and human-in-the-loop active learning. The framework formulates biomedical-materials discovery as constrained multi-objective optimization under uncertainty and introduces practical requirements for metadata, model documentation, risk-tiered governance, evaluation metrics, and phased implementation. To make the roadmap testable, we add a minimum viable prototype specification and a worked pilot for AI-guided nanomaterials for drug delivery. AIMBio is positioned as exploratory and preclinical discovery infrastructure, not as clinical decision-support software; any clinical or regulated-device use would require separate validation, change control, and regulatory review. The central contribution is a publishable platform blueprint for converting fragmented materials and biomedical records into auditable, experimentally actionable, and translationally responsible discovery workflows.
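
Without the uncertainty modelling, constrained multi-objective selection amounts to discarding infeasible candidates and keeping those that no feasible candidate dominates. The sketch below assumes maximised objectives and hard constraints. The candidate fields (`loading`, `uptake`, `toxicity`) and their values are hypothetical and do not come from the paper.

```python
def dominates(a, b):
    """True if score vector a is >= b everywhere and > b somewhere (all objectives maximised)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def feasible_pareto_front(candidates, objectives, constraints):
    """Names of feasible candidates that no other feasible candidate dominates."""
    feasible = [c for c in candidates if all(ok(c) for ok in constraints)]
    score = {c["name"]: tuple(f(c) for f in objectives) for c in feasible}
    return [c["name"] for c in feasible
            if not any(dominates(score[o["name"]], score[c["name"]]) for o in feasible)]

candidates = [
    {"name": "A", "loading": 0.8, "uptake": 0.4, "toxicity": 0.1},
    {"name": "B", "loading": 0.6, "uptake": 0.7, "toxicity": 0.2},
    {"name": "C", "loading": 0.5, "uptake": 0.3, "toxicity": 0.1},
    {"name": "D", "loading": 0.9, "uptake": 0.9, "toxicity": 0.5},
]
front = feasible_pareto_front(
    candidates,
    objectives=[lambda c: c["loading"], lambda c: c["uptake"]],
    constraints=[lambda c: c["toxicity"] <= 0.3],
)
print(front)  # ['A', 'B']: D breaks the toxicity limit, C is dominated by A
```

An uncertainty-aware version would replace point scores with, for example, lower confidence bounds from the surrogate model before applying the same filter.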

2605.21081 2026-05-21 cs.SD cs.LG Updated

Musical Attention Transformer: Music Generation Using a Music-Specific Attention Model

Shinnosuke Taksuka, Hideo Mukai

Affiliations: Department of Computer Science, School of Science and Technology, Meiji University

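AI summary: This paper proposes a music-specific attention model that integrates meta-information to improve music generation. Its core idea is to combine musical structure with metadata, and its main contribution is more coherent and varied generated music.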

Comments 32 pages, 13 figures

Abstract

This study aims to enhance the quality of Transformer-based music generation by incorporating meta-information. While Transformer-based approaches are effective at capturing long-term dependencies in musical compositions, the music they generate often suffers from excessive repetition or duplication of notes, leading to unnatural melodies. To address these limitations, we propose Musical Attention, a mechanism that incorporates meta-information such as bar numbers, key, signatures, and tempos into the attention process. Musical Attention explicitly leverages both the structural properties of music and its associated metadata, enabling the Transformer's attention mechanism to operate more effectively and thereby improving the quality of the generated output. In our framework, each musical note is represented by five events (pitch, bar number, onset, duration, and velocity) together with the three metadata elements. The attention mechanism is then modified to reflect the correlations among these eight features, allowing the model to better capture the inherent characteristics of musical composition. Experimental results demonstrate that the model incorporating Musical Attention outperforms prior methods, such as Full Attention and Strided Attention, in musical coherence, variation, and overall quality. Notably, it significantly reduces repetition and improves the model's ability to generate diverse, harmonically consistent melodies. Musical Attention thus represents a meaningful advance in AI-driven music generation, facilitating more natural and expressive compositions.
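
The eight-feature note representation can be written down as a record type. The abstract names the features but not their encodings. The encodings below (MIDI pitch and velocity, tick-based timing, integer codes for key and signature) are therefore assumptions made for illustration, and `NoteToken` is a hypothetical name.

```python
from dataclasses import dataclass, astuple

@dataclass(frozen=True)
class NoteToken:
    # Five per-note events named in the abstract.
    pitch: int       # assumed MIDI pitch number (0-127)
    bar: int         # bar number
    onset: int       # assumed: onset position within the bar, in ticks
    duration: int    # assumed: duration in ticks
    velocity: int    # assumed MIDI velocity (0-127)
    # Three metadata elements; these integer codes are hypothetical.
    key: int         # e.g. tonic pitch class 0-11
    signature: int   # index into an assumed table of signatures
    tempo: int       # beats per minute

def to_feature_rows(notes):
    """One row of eight integers per note: the features the modified attention relates."""
    return [astuple(n) for n in notes]

notes = [NoteToken(60, 1, 0, 480, 90, 0, 0, 120),
         NoteToken(64, 1, 480, 480, 85, 0, 0, 120)]
rows = to_feature_rows(notes)
print(len(rows[0]))  # 8
```

Each row could then be embedded feature by feature, giving the attention layer explicit access to bar, key, and tempo context instead of leaving it implicit in the token sequence.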

2605.21075 2026-05-21 cs.CV cs.LG Updated

SpectralEarth-FM: Bringing Hyperspectral Imagery into Multimodal Earth Observation Pretraining

Nassim Ait Ali Braham, Aaron Banze, Conrad M. Albrecht, Julien Mairal, Jocelyn Chanussot, Xiao Xiang Zhu

Affiliations: Chair of Data Science in Earth Observation; Technical University of Munich; Remote Sensing Technology Institute; German Aerospace Center (DLR); Department of Aerospace Engineering; University of the Bundeswehr Munich; LEAP; Munich Center for Machine Learning (MCML); Univ. Grenoble Alpes; Inria; CNRS; Grenoble INP; LJK

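AI summary: This paper proposes SpectralEarth-FM, a hierarchical transformer for multisensor Earth observation input that jointly processes hyperspectral imagery and lower-channel observations. Pretrained with a JEPA-style objective on the newly built SpectralEarth-MM dataset, it achieves state-of-the-art performance on hyperspectral downstream tasks and standard EO benchmarks.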

Abstract

Earth observation (EO) foundation models (FMs) are increasingly trained on multisensor data, spanning multispectral imagery (MSI), synthetic aperture radar (SAR), and derived geospatial layers, but hyperspectral imagery (HSI) remains underrepresented. Conversely, existing hyperspectral FMs are trained on HSI alone, leaving joint pretraining and fusion of HSI with co-located EO sensors unexplored. We introduce SpectralEarth-FM, a hierarchical transformer for multisensor EO input with heterogeneous spectral dimensionality. The architecture combines spectral tokenization for hyperspectral inputs, sensor-specific encoders, a cross-sensor fusion module, and a shared hierarchical encoder, enabling joint processing of HSI and lower-channel observations. To pretrain SpectralEarth-FM, we curate SpectralEarth-MM, a dataset that co-locates HSI from three spaceborne sensors (EnMAP, EMIT, DESIS) with Sentinel-2, Landsat-8/9 optical imagery, Landsat land surface temperature (LST), and Sentinel-1 SAR, over common geographic footprints. It comprises approximately 2M globally distributed locations, 25M georeferenced patches, and over 40TB of data. Pretraining uses a Joint-Embedding Predictive Architecture (JEPA)-style objective that matches representations between global views and single-sensor local views from the same location. We evaluate SpectralEarth-FM on hyperspectral downstream tasks and standard EO benchmarks following the PANGAEA protocol, achieving state-of-the-art results across both evaluation settings.
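
Spectral tokenization lets sensors with very different band counts feed one encoder: contiguous bands are grouped, and each group becomes a token. This sketch assumes fixed-width groups and summarises each group by its mean, whereas a trained model would apply a learned projection to each group. `spectral_tokens` is a hypothetical helper.

```python
def spectral_tokens(spectrum, group_size):
    """Group contiguous bands of one pixel's spectrum; one token (here, the mean) per group."""
    tokens = []
    for start in range(0, len(spectrum), group_size):
        group = spectrum[start:start + group_size]
        tokens.append(sum(group) / len(group))
    return tokens

print(spectral_tokens([1, 2, 3, 4, 5, 6, 7, 8], 3))  # [2.0, 5.0, 7.5]
print(len(spectral_tokens([0.0] * 200, 16)))          # 13 tokens for a 200-band spectrum
```

Under this scheme a hyperspectral pixel yields many tokens and a low-channel multispectral pixel only a few. A downstream encoder that accepts variable-length token sequences can therefore consume both.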

2605.21070 2026-05-21 cs.LG Updated

Towards Understanding Self-Pretraining for Sequence Classification

Omar Coser, Loredana Zollo, Paolo Soda, Antonio Orvieto

Affiliations: Unit of Artificial Intelligence & Computer Systems, Università Campus Bio-Medico di Roma; Unit of Advanced Robotics and Human-Centered Technologies, Università Campus Bio-Medico di Roma; Department of Diagnostics and Intervention, Radiation Physics, Biomedical Engineering, Umeå University; Max Planck Institute for Intelligent Systems; ELLIS Institute Tübingen; Tübingen AI Center

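AI summary: By replicating and systematically ablating the study of Amos et al., this paper identifies the key factors behind the gains that self-pretraining (SPT) brings to sequence classification. It finds that label supervision is bottlenecked in learning useful query-key attention patterns, and it uses a simplified theoretical framework to argue that SPT improves performance by learning proximity interactions.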

Comments v1: Preliminary, extension of the version accepted at ICML 2025 Workshop MOSS

Abstract

Amos et al. (2024) showed that the accuracy of Transformer models in sequence classification can be significantly improved by first pretraining with a masked token prediction objective without external data or augmentation, a procedure referred to as self-pretraining (SPT). While the primary objective of Amos et al. (2024) was to showcase that Transformers can achieve strong performance on the Long-Range Arena (LRA), their pipeline raises more fundamental questions: How does SPT drive optimization to better solutions? Why can standard supervised training fail in Transformers? To better understand this, we replicate and systematically ablate the findings of Amos et al. (2024). Our ablations suggest that a central bottleneck in the studied settings is not depth or generalization alone, but the ability of label supervision to learn useful query-key Attention patterns from random initialization. With a minimal setup, we identify learning proximity interactions (turning absolute positional encodings into proximity-biased Attention scores) as a key source of the improvements brought by SPT. Finally, in a simplified theoretical setup, we show that label supervision can be locally blind to certain Attention-score directions that are instead detectable through masked reconstruction.
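
Self-pretraining reuses the classification task's own input sequences with a masked-token objective. A minimal sketch of the masking step follows. It assumes a 15% masking rate and plain replacement by a mask id, without BERT-style random or keep-as-is replacements. The function name and defaults are hypothetical.

```python
import random

def mask_tokens(tokens, mask_id, mask_prob=0.15, rng=None):
    """Build (inputs, targets) for masked-token prediction on a task's own sequences.

    Masked positions are replaced by mask_id and keep the original token as target;
    other positions get target None, so the loss covers masked positions only.
    """
    rng = rng if rng is not None else random.Random()
    inputs, targets = [], []
    for t in tokens:
        if rng.random() < mask_prob:
            inputs.append(mask_id)
            targets.append(t)
        else:
            inputs.append(t)
            targets.append(None)
    return inputs, targets

inputs, targets = mask_tokens([5, 6, 7, 8, 9], mask_id=0, mask_prob=0.5, rng=random.Random(0))
print(inputs, targets)
```

After pretraining on such pairs, the same network is fine-tuned on the labels. No external data is involved at any stage.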

2605.21066 2026-05-21 cs.LG Updated

Robust Personalized Recommendation under Hidden Confounding in MNAR

Zongyu Li, Wanting Su, Tianyu Xia

Affiliations: Guangdong University of Technology; Chinese Academy of Sciences; Peking University

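AI summary: This paper proposes a framework that estimates user-item-level sensitivity bounds, relaxing the homogeneity assumption inherent in global sensitivity bounds and thereby enabling more robust and accurate personalized recommendation under hidden confounding.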

Abstract

Recommender systems often rely on observational user-item interaction data, which is prone to selection bias because users interact selectively with items. Inverse propensity weighting and doubly robust estimators effectively mitigate selection bias under observed confounding, but are unreliable in the presence of hidden confounders. Existing approaches that rely on randomized controlled trials (RCTs) or global sensitivity bounds are constrained in practice: RCTs demand costly experimental data, while global sensitivity bounds presume, through sensitivity analysis, a uniformly bounded effect of unmeasured confounders on propensities, thereby neglecting heterogeneity across user-item interactions. To overcome this limitation, we propose the Personalized Unobserved-Confounding-aware Interaction Deconfounder (PUID), a framework that estimates user-item-level sensitivity bounds and thereby substantially relaxes the homogeneity assumption inherent in global sensitivity bounds. To ensure both robustness and predictive accuracy, we further develop an adversarial optimization strategy and propose a benchmark-guided variant (BPUID) that incorporates pre-trained models as stabilizing references. Extensive experiments on three real-world datasets demonstrate that our approach significantly outperforms global methods under hidden confounding, without requiring RCT data.
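
The difference between global and per-pair sensitivity bounds shows up directly in an inverse-propensity estimate. The sketch below assumes the marginal sensitivity model, under which the odds of the true propensity p lie within a factor Γ of the odds of the nominal propensity e. It follows that 1/p lies in [1 + (1/e - 1)/Γ, 1 + Γ(1/e - 1)]. A global method uses one Γ for all pairs, whereas here each (user, item) pair gets its own. This is neither PUID's estimator nor its adversarial training, and the names are hypothetical.

```python
def inverse_weight_bounds(e, gamma):
    """Range of 1/p when the true propensity's odds lie within a factor gamma of e's odds."""
    base = 1.0 / e - 1.0
    return 1.0 + base / gamma, 1.0 + gamma * base

def ips_risk_bounds(observed, errors, propensities, gammas):
    """Lower and upper IPS risk over all (user, item) pairs, each with its own gamma.

    errors must be non-negative (e.g. squared rating errors), so the lower/upper
    risk uses the lower/upper weight of every observed pair.
    """
    lo = hi = 0.0
    for o, err, e, g in zip(observed, errors, propensities, gammas):
        if o:
            w_lo, w_hi = inverse_weight_bounds(e, g)
            lo += err * w_lo
            hi += err * w_hi
    return lo / len(errors), hi / len(errors)

observed = [1, 0, 1]
errors = [0.5, 0.9, 0.2]
propensities = [0.5, 0.2, 0.25]
print(ips_risk_bounds(observed, errors, propensities, [1.0, 1.0, 1.0]))  # no confounding: lo == hi
print(ips_risk_bounds(observed, errors, propensities, [1.0, 2.0, 2.0]))  # only some terms widen
```

With a single global Γ, every pair's interval would widen by the same factor. Per-pair values keep the intervals tight where confounding is believed to be weak.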

2605.21060 2026-05-21 cs.LG cs.AI stat.ML Updated

Divide et Calibra: Multiclass Local Calibration via Vector Quantization

Cesare Barbera, Lorenzo Perini, Giovanni De Toni, Andrea Passerini, Andrea Pugnana

Affiliations: University of Pisa; University of Trento; Meta; Fondazione Bruno Kessler

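AI summary: This paper proposes a compositional approach that uses vector quantization to induce a structured partition of the representation space and an indexed parameterization of Dirichlet concentrations to share parameters across regions. The resulting heterogeneous calibration maps generalize to sparse regions, improving local calibration while preserving global calibration and predictive performance.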

Abstract

Accurate and well-calibrated Machine Learning (ML) models are mandatory in high-stakes settings, yet effective multiclass calibration remains challenging: global approaches assume calibration errors are homogeneous across the latent space, while local methods often rely on latent-space dimensionality reduction, which leads to information loss. To address these issues, we propose a compositional approach to multiclass calibration, where region-specific calibration maps are constructed from shared codeword-dependent factors. We instantiate this idea via Vector Quantization (VQ), which induces a structured partition of the representation space, and an indexed parameterization of Dirichlet concentrations that enables parameter sharing across regions. Our approach learns heterogeneous calibration maps that generalize well even to sparse regions of the latent space. Experiments on benchmark datasets show significant improvements in local calibration while maintaining competitive global calibration and predictive performance.
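
Local calibration can be diagnosed by assigning each sample to its nearest codeword and comparing confidence with accuracy inside each region. The sketch below assumes top-class confidences and a fixed codebook. It only measures local miscalibration and does not reproduce the paper's Dirichlet-based calibration maps. The function names are hypothetical.

```python
def nearest_codeword(z, codebook):
    """Index of the codeword closest to embedding z (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((zi - ci) ** 2 for zi, ci in zip(z, codebook[k])))

def local_calibration_gaps(embeddings, confidences, correct, codebook):
    """Per-region |mean confidence - accuracy|, with regions given by vector quantization."""
    stats = {}
    for z, conf, ok in zip(embeddings, confidences, correct):
        s = stats.setdefault(nearest_codeword(z, codebook), [0.0, 0, 0])
        s[0] += conf      # summed top-class confidence
        s[1] += int(ok)   # number of correct predictions
        s[2] += 1         # number of samples in the region
    return {k: abs(s[0] / s[2] - s[1] / s[2]) for k, s in stats.items()}

codebook = [[0.0, 0.0], [10.0, 10.0]]
embeddings = [[0.0, 1.0], [1.0, 0.0], [9.0, 10.0], [10.0, 9.0]]
confidences = [0.9, 0.9, 0.6, 0.6]
correct = [True, True, False, True]
print(local_calibration_gaps(embeddings, confidences, correct, codebook))
```

In this toy example the pooled confidence (0.75) equals the pooled accuracy (0.75), so a global check finds no error. Each region, however, is off by about 0.1: underconfident in region 0 and overconfident in region 1.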

2605.21059 2026-05-21 cs.CV cs.LG Updated

Multimodal LLMs under Pairwise Modalities

Yan Li, Yunlong Deng, Yuewen Sun, Gongxu Luo, Kun Zhang, Guangyi Chen

Affiliations: Mohamed bin Zayed University of Artificial Intelligence; Carnegie Mellon University

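AI summary: This paper proposes training multimodal large language models from pairwise modalities only. Through a theoretical analysis and a representation learning framework, it achieves cross-modal alignment and recomposition, improving cross-modal performance.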

Abstract

Despite the impressive results achieved by multimodal large language models (MLLMs), their training typically relies on jointly curated multimodal data, requiring substantial human effort to construct multi-way aligned datasets and thereby limiting scalability across domains. In this work, we explore training MLLMs using only multiple paired modalities as a surrogate for the full joint multimodal distribution. Specifically, we first provide a theoretical analysis of the conditions under which the representations are identifiable when only pairwise modalities are observed. Building on this analysis, we propose a representation learning framework for aligning latent representations across modalities using only pairwise data. The framework consists of two stages: latent representation alignment and cross-modal recomposition. In the first stage, we learn a shared latent space across modalities through both self-modal reconstruction and pairwise contrastive learning. We also incorporate an inductive bias into the contrastive learning process through partial alignment and minimal latent specification. In the second stage, we integrate the encoders of newly introduced modalities with the decoders of the pre-trained modalities to facilitate cross-modal transfer and generation. We evaluate our method by adding 3D point cloud and tactile modalities to pre-trained MLLMs using three modality pairs and show that, by learning an aligned latent representation space, our model achieves strong cross-modal performance.
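
The pairwise contrastive term can be illustrated with an InfoNCE-style loss, in which each sample's embedding in one modality must pick out its partner among the other modality's embeddings. This is a one-directional sketch under the assumption that the paper's contrastive loss resembles InfoNCE. The temperature, the names, and the omission of partial alignment and minimal latent specification are all simplifications.

```python
import math

def info_nce(anchors, positives, temperature=0.1):
    """One-directional InfoNCE: anchors[i] should match positives[i] among all positives.

    anchors[i] and positives[i] embed the same sample in two different modalities.
    """
    def normalise(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    a = [normalise(v) for v in anchors]
    p = [normalise(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with the true partner as target
    return total / len(a)

points = [[1.0, 0.0], [0.0, 1.0]]     # hypothetical point-cloud embeddings
touch_ok = [[0.9, 0.1], [0.1, 0.9]]   # tactile embeddings aligned with their partners
touch_bad = [[0.1, 0.9], [0.9, 0.1]]  # tactile embeddings matched to the wrong partners
print(info_nce(points, touch_ok) < info_nce(points, touch_bad))  # True
```

Training with one such term per available modality pair pulls all modalities toward a shared latent space, even though no sample is observed in every modality at once.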

2605.21058 2026-05-21 cs.LG Updated

A Dialogue between Causal and Traditional Representation Learning: Toward Mutual Benefits in a Unified Formulation

Yan Li, Yuewen Sun, Shaoan Xie, Gongxu Luo, Yunlong Deng, Kun Zhang, Guangyi Chen

Affiliations: Mohamed bin Zayed University of Artificial Intelligence; Carnegie Mellon University

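AI summary: This paper brings causal representation learning and traditional representation learning into dialogue, proposing a unified formulation in which a task component and a constraint component let the two fields inform each other. Experiments show that the effectiveness of causal constraints depends on the task with which they are paired.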

Abstract

Causal representation learning (CRL) and traditional representation learning have largely developed along different trajectories. Traditional representation learning has been driven mainly by applications and empirical objectives, whereas CRL has focused more on theoretical questions, particularly identifiability. This difference in emphasis has created a gap between the two fields in terminology, problem formulation, and evaluation, limiting communication and sometimes leading to disconnected or redundant efforts. In this paper, we argue that these two fields should be brought into dialogue rather than treated as separate paradigms. To this end, we introduce a unified formulation in which representation learning is characterized by two components: a task component, which specifies what information the learned representation is required to preserve, and a constraint component, which specifies what structure is imposed on the latent space. Under this formulation, the benefits run in both directions. CRL provides theoretical tools for understanding when structured latent constraints are useful or necessary, while traditional representation learning offers practical insights on task design and objective choice that can improve the development of CRL methods. To illustrate this interaction, we experimentally study how different task components affect the behavior of CRL methods under different structured constraints. Results on CausalVerse show that the effectiveness of causal constraints depends strongly on the tasks with which they are paired.

2605.21055 2026-05-21 cs.NE cs.LG Updated

Genetic Programming with Transformer-Based Mutation for Approximate Circuit Design

Ondrej Galeta, Lukas Sekanina

Affiliations: Brno University of Technology, Faculty of Information Technology

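AI summary: This paper proposes a transformer-based mutation operator that improves the evolutionary design and optimization of approximate arithmetic circuits with genetic programming. A hybrid scheme prevents the approximation process from stagnating, and under several target error constraints the evolved designs outperform highly optimized designs from the EvoApproxLib library.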

Comments To appear at IEEE World Congress on Computational Intelligence, Congress on Evolutionary Computation, Maastricht, NL, 2026

Abstract

A recent trend is to leverage machine learning models to improve the evolutionary design and optimization process. We propose a novel transformer-based mutation operator for Cartesian genetic programming (CGP) for the automated design of approximate arithmetic circuits. We introduce a hybrid scheme for CGP in which the proposed mutation operator is alternated with the standard mutation operator to prevent stagnation of the circuit approximation process. We also develop a new training scheme for the underlying transformer that utilizes training vectors composed of thousands of CGP chromosomes representing various approximate multipliers. For several target error constraints, the approximate multipliers evolved with CGP utilizing the transformer-based mutation achieve better trade-offs than the highly optimized designs available in the state-of-the-art EvoApproxLib library of approximate circuits. Although both training and evolutionary processes are computationally demanding, they appear to be necessary steps for improving existing approximate circuits and producing new, potentially patentable circuit designs.
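
For readers unfamiliar with CGP, the sketch below shows a conventional CGP genome and the standard point mutation that, per the abstract, the transformer-based operator is alternated with. The single-bit gate set, the genome layout, and `hybrid_mutation` with its `learned_mutation` stand-in (where the transformer would go) are hypothetical illustrations, not the authors' implementation.

```python
import random
from typing import Callable, List, Tuple

Node = Tuple[int, int, int]            # (function index, first input, second input)
Genome = Tuple[List[Node], List[int]]  # (internal nodes, output connections)

# Hypothetical single-bit gate set, used only for illustration.
FUNCS: List[Callable[[int, int], int]] = [
    lambda a, b: a & b,  # AND
    lambda a, b: a | b,  # OR
    lambda a, b: a ^ b,  # XOR
    lambda a, b: 1 - a,  # NOT (ignores b)
]

def random_genome(n_in: int, n_nodes: int, n_out: int, rng: random.Random) -> Genome:
    nodes = []
    for i in range(n_nodes):
        limit = n_in + i  # feed-forward: a node reads primary inputs or earlier nodes only
        nodes.append((rng.randrange(len(FUNCS)), rng.randrange(limit), rng.randrange(limit)))
    outputs = [rng.randrange(n_in + n_nodes) for _ in range(n_out)]
    return nodes, outputs

def evaluate(genome: Genome, bits: List[int]) -> List[int]:
    nodes, outputs = genome
    values = list(bits)  # indices 0..n_in-1 hold the primary inputs
    for f, a, b in nodes:
        values.append(FUNCS[f](values[a], values[b]))
    return [values[o] for o in outputs]

def point_mutation(genome: Genome, n_in: int, rng: random.Random) -> Genome:
    """Standard CGP point mutation: resample one gene within its legal range."""
    nodes, outputs = list(genome[0]), list(genome[1])
    gene = rng.randrange(3 * len(nodes) + len(outputs))
    if gene < 3 * len(nodes):
        i, pos = divmod(gene, 3)
        f, a, b = nodes[i]
        if pos == 0:
            f = rng.randrange(len(FUNCS))
        elif pos == 1:
            a = rng.randrange(n_in + i)
        else:
            b = rng.randrange(n_in + i)
        nodes[i] = (f, a, b)
    else:
        outputs[gene - 3 * len(nodes)] = rng.randrange(n_in + len(nodes))
    return nodes, outputs

def hybrid_mutation(genome: Genome, n_in: int, rng: random.Random,
                    learned_mutation: Callable[[Genome], Genome],
                    use_learned: bool) -> Genome:
    """Switch between a learned operator (hypothetical stand-in) and point mutation."""
    return learned_mutation(genome) if use_learned else point_mutation(genome, n_in, rng)
```

In an evolutionary loop, `use_learned` would be toggled by some stagnation criterion; the abstract does not say which criterion the authors use.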

2605.21041 2026-05-21 stat.ML cs.LG stat.ME Version update

Hybrid Machine Learning Model for Forest Height Estimation from TanDEM-X and Landsat Data

Islam Mansour, Ronny Haensch, Irena Hajnsek, Konstantinos Papathanassiou

Affiliations: German Aerospace Center (DLR); Institute of Environmental Engineering, ETH Zürich

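AI summary: This paper proposes a hybrid method combining machine learning with a physical model that uses TanDEM-X interferometric coherence measurements and Landsat optical data to improve forest height estimation. Extending the feature space reduces height/structure and baseline/terrain-slope ambiguities, and experiments show reductions of 13.5% in RMSE and 16.6% in MAE.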

DOI: 10.1109/LGRS.2026.3693644

Abstract

Integrating machine learning (ML) with physical models (PM) has emerged as a promising way of retrieving geophysical parameters from remote sensing data. In this context, an ML model for estimating forest height from TanDEM-X interferometric coherence measurements, which constrains the learning process through a PM, has recently been proposed. While the features used for training and inversion were selected to ensure the physical consistency of the solutions, they could not resolve all height/structure and baseline/terrain-slope ambiguities in the data. To address this, we propose extending the feature space with optical Landsat data, which can provide complementary information on forest type or structure. The extended model is applied to and validated on several TanDEM-X acquisitions over the Lopé National Park site in Gabon and assessed against airborne LiDAR measurements. Results show a 13.5% reduction in RMSE and a 16.6% reduction in MAE compared to the original hybrid model, confirming the added value of multispectral inputs.

2605.20996 2026-05-21 cs.LG math.OC Version update

Beyond the Bellman Recursion: A Pontryagin-Guided Framework for Non-Exponential Discounting

Hojin Ko, Jeonggyu Huh

Affiliation: Department of Mathematics, Sungkyunkwan University, Suwon, Republic of Korea

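AI summary: This paper proposes a Pontryagin-guided direct policy optimization framework (PG-DPO) for problems with non-exponential discounting. Instead of relying on the Bellman recursion, it combines Pontryagin's maximum principle with Monte Carlo rollouts to improve accuracy and stability relative to dynamic programming.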

2605.20989 2026-05-21 cs.LG q-bio.GN Version update

Modeling Temporal scRNA-seq Data with Latent Gaussian Process and Optimal Transport

Mehmet Yigit Balik, Harri Lähdesmäki

Affiliation: Department of Computer Science, Aalto University, Espoo, Finland

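AI summary: This paper proposes a generative framework that models population trends with a latent heteroscedastic Gaussian process and uses optimal transport to align generated and observed population distributions while capturing biological heterogeneity, achieving state-of-the-art performance on complex interpolation and extrapolation benchmarks.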

Abstract

Single-cell RNA sequencing provides insights into gene expression at single-cell resolution, yet inferring temporal processes from these static snapshot measurements remains a fundamental challenge. Current approaches utilizing neural differential equations and flows are prone to overfitting and lack careful consideration of biological variability. In this work, we propose a generative framework that models population trends using a latent heteroscedastic Gaussian process (GP) approximated by Hilbert space methods. To address the absence of genuine cell trajectories, we leverage an optimal transport (OT) objective that aligns generated and observed population distributions. Our method explicitly captures biological heterogeneity by incorporating cell-specific latent time and cell type conditioning to disentangle temporal asynchrony and trajectories toward different cell types. We demonstrate state-of-the-art performance on complex interpolation and extrapolation benchmarks and introduce a novel gradient-based strategy for inferring perturbation trajectories.
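
As a generic illustration of an OT objective that compares two sample sets without any cell-level correspondence, here is a minimal entropic OT (Sinkhorn) sketch on 1-D samples. The squared-distance cost, uniform weights, and the default `eps` are our assumptions; the paper's OT objective and its coupling to the latent GP are not reproduced here.

```python
import math
from typing import List, Tuple

def sinkhorn(x: List[float], y: List[float], eps: float = 0.5,
             n_iter: int = 500) -> Tuple[List[List[float]], float]:
    """Entropic OT between two uniform empirical distributions on the real line.

    Returns the transport plan P and its cost sum_ij P[i][j] * (x_i - y_j)^2.
    """
    n, m = len(x), len(y)
    a, b = [1.0 / n] * n, [1.0 / m] * m                   # uniform marginals
    C = [[(xi - yj) ** 2 for yj in y] for xi in x]         # squared-distance cost
    K = [[math.exp(-c / eps) for c in row] for row in C]  # Gibbs kernel
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns so the plan matches both marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    P = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    cost = sum(P[i][j] * C[i][j] for i in range(n) for j in range(m))
    return P, cost
```

Because only the two marginals are constrained, the plan can be computed from unpaired snapshots, which is the property that makes OT attractive when genuine cell trajectories are unavailable.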

2605.20982 2026-05-21 cs.DC cs.AI cs.LG Version update

Training Distribution Sets the Ceiling for Drug-Blind Cancer Sensitivity Prediction

Taekyung Heo

Affiliation: Taekyung Heo

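AI summary: This paper studies how the training distribution affects drug-blind cancer sensitivity prediction, finds that the conventional metric is biased, and recovers predictive gains through mechanism-stratified training and response matching.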

Abstract

Precision oncology requires predicting which drugs will suppress a specific tumor from its molecular profile, but drug-blind sensitivity prediction has plateaued despite increasingly complex drug representations. Here we show that this stagnation reflects a metric artifact rather than a representational bottleneck. The standard benchmark, global Pearson r, is dominated by between-drug potency differences that a trivial drug-mean predictor captures without any cell-specific learning. Per-drug Pearson r, which isolates within-drug cell ranking, reveals that no drug encoding improves over cell-only features across four independent datasets. A controlled experiment channeling mechanism-of-action identity as either a drug feature or a training-distribution constraint identifies the cause. Supplying MoA as a feature yields negligible benefit, whereas using it to stratify training raises per-drug r substantially for targeted kinase inhibitors, because pan-cancer co-training suppresses pathway-specific sensitivity signals. Mechanism-stratified training and response matching from pilot observations provide two deployable strategies that together recover the principal sources of predictive gain in drug-blind sensitivity prediction.
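
The metric artifact can be reproduced on made-up numbers. The sketch below (function names and toy data are hypothetical) scores a drug-mean predictor that uses no cell information at all: pooled (global) Pearson r comes out near 1 because between-drug potency differences dominate, while per-drug r is undefined because the predictions are constant within each drug.

```python
import math
from statistics import mean
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, str, float]  # (drug, cell line, observed response)

def pearson(x: List[float], y: List[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either side is constant
    return sxy / math.sqrt(sxx * syy)

def global_r(records: List[Record], predict: Callable[[str, str], float]) -> float:
    """Pool every (drug, cell) pair into one correlation."""
    return pearson([r for _, _, r in records], [predict(d, c) for d, c, _ in records])

def per_drug_r(records: List[Record],
               predict: Callable[[str, str], float]) -> Dict[str, float]:
    """Correlate within each drug, i.e. score how well cell lines are ranked."""
    groups: Dict[str, List[Tuple[str, float]]] = {}
    for d, c, r in records:
        groups.setdefault(d, []).append((c, r))
    return {d: pearson([r for _, r in rows], [predict(d, c) for c, _ in rows])
            for d, rows in groups.items()}

# Made-up data: three drugs with very different mean potency and small
# cell-specific deviations around each drug's mean.
potency = {"A": 0.0, "B": 5.0, "C": 10.0}
cell_effect = {"c1": 0.1, "c2": -0.2, "c3": 0.3, "c4": -0.1}
records = [(d, c, potency[d] + e) for d in potency for c, e in cell_effect.items()]
drug_means = {d: mean(r for dd, _, r in records if dd == d) for d in potency}

def drug_mean_predictor(drug: str, cell: str) -> float:
    return drug_means[drug]  # ignores the cell line entirely
```

Here `global_r(records, drug_mean_predictor)` is about 0.999, yet the predictor ranks no cell lines at all, which is the gap per-drug r is designed to expose.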

2605.20883 2026-05-21 cs.LG Version update

Learning fMRI activations dictionaries across individual geometries via optimal transport

Sonia Mazelet, Rémi Flamary, Bertrand Thirion

Affiliations: CMAP, Ecole Polytechnique Palaiseau, France; Mind, Inria-Saclay Palaiseau, France

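AI summary: This paper proposes an optimal-transport-based dictionary learning method for fMRI that handles differences in individual brain geometry through the Fused Gromov-Wasserstein (FGW) distance, uses amortized optimization to reduce computational cost, and learns dictionary atoms that depend on the FGW parameter balancing feature alignment and structural consistency.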

Abstract

Dictionary learning is a powerful tool for creating interpretable representations. When applied to functional magnetic resonance imaging (fMRI) data, the resulting patterns of brain activity can be used for various downstream tasks, such as brain state classification or population-level analysis. However, a major challenge is the variability in brain geometry across individuals. This is usually addressed by projecting each individual brain geometry onto a common template, which removes subject-specific information. In this work, we introduce a novel approach to dictionary learning on fMRI data that explicitly accounts for this variability. We use the optimal transport-based Fused Gromov-Wasserstein (FGW) distance to compare graphs with different geometries and features. To address the challenge of computing multiple FGW distances for large graphs such as those arising from fMRI data, we rely on amortized optimization to learn a neural network that predicts an approximation of the optimal transport plans, which substantially reduces the computational cost. Additionally, we learn dictionary atoms that depend on the FGW trade-off parameter, which controls the balance between feature alignment and structural consistency. Numerical experiments on the HCP dataset demonstrate that the proposed approach captures different levels of geometric variability in the data and provides representations that preserve essential information.

2605.20879 2026-05-21 cs.LG Version update

NeighborDiv: Training-free Zero-shot Generalist Graph Anomaly Detection via Neighbor Diversity

Kaifeng Wei, Teng Liu, Liang Dong, Xiubo Liang, Yuke Li

Affiliations: Netease Yidun AI Lab; School of Software Technology; Zhejiang University

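AI summary: This paper proposes NeighborDiv, a training-free generalist graph anomaly detection method that detects anomalies through neighbor diversity. It overcomes the shortcomings of prior methods in training complexity, data dependency, and cross-domain generalization stability, and experiments show it achieves the best performance under multiple evaluation paradigms.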

Abstract

Graph Anomaly Detection (GAD) is increasingly shifting to Generalist GAD (GGAD) for cross-domain "one-for-all" detection, but existing GGAD methods predominantly rely on the neighbor consistency principle, falling into the Node-to-Neighbor Consistency Paradigm for anomaly quantification. These methods suffer from complex training pipelines, heavy training data dependency, high computational costs, and unstable cross-domain generalization. To address these limitations, we propose NeighborDiv, a training-free generalist graph anomaly detection framework based on neighbor diversity. Departing from the dominant Node-to-Neighbor Consistency Paradigm, we shift the focus to the Neighbor-to-Neighbor Diversity Paradigm, and uncover that the internal structural dispersion of a node's neighbor set is a powerful, independently discriminative anomaly signal. We quantify neighbor diversity via the variance of inter-neighbor feature similarities, which captures how a node organizes its local graph environment, and operates independently of conventional node-to-neighbor consistency frameworks. Extensive experiments under two standard GGAD evaluation paradigms show NeighborDiv achieves state-of-the-art performance, with relative gains of 10.25% in average AUC and 17.78% in average AP over the second-best baseline under Single-Domain Independent Training (SDIT), and 6.89%/9.58% in AUC/AP under Unified Multi-Domain Training (UMDT), respectively. Notably, NeighborDiv yields zero performance volatility across all datasets, eliminating training-set dependency and establishing a lightweight and highly practical GGAD framework.
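
The abstract describes the score as the variance of inter-neighbor feature similarities. A minimal training-free sketch of that idea follows, assuming cosine similarity, population variance, and "higher variance means more anomalous"; these choices and the function names are ours and may differ from the paper.

```python
import math
from itertools import combinations
from statistics import pvariance
from typing import Dict, List

def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def neighbor_diversity(adj: Dict[int, List[int]], feats: Dict[int, List[float]],
                       node: int) -> float:
    """Variance of pairwise feature similarities among the node's neighbors.

    Only the graph and node features are used, so no training is involved.
    Note that the node's own features never enter the score.
    """
    nbrs = adj.get(node, [])
    sims = [cosine(feats[i], feats[j]) for i, j in combinations(nbrs, 2)]
    return pvariance(sims) if sims else 0.0
```

For example, a node whose three neighbors share identical features scores 0.0, while a node with neighbors `[1, 0]`, `[1, 0]`, `[0, 1]` has pairwise similarities 1, 0, 0 and scores 2/9.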

2605.20878 2026-05-21 cs.LG Version update

CIG: Exploration via Conditional Information Gain

Tim Joseph, Marcus Fechner, Philipp Stegmaier, Karam Daaboul, J. Marius Zöllner

Affiliations: FZI Karlsruhe; KIT Karlsruhe

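AI summary: This work proposes a Conditional Information Gain (CIG) reward for exploration in reinforcement learning. A tractable log-determinant objective over an ensemble-disagreement kernel yields causal per-step rewards, enabling effective exploration in high-dimensional state spaces.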

Comments 28 pages, 10 figures, 3 tables

Abstract

Intrinsic rewards for exploration in reinforcement learning condition on different contexts: lifelong rewards score each transition against accumulated experience but ignore within-rollout redundancy; episodic rewards penalize intra-trajectory repetition but discard lifetime progress. Hybrid methods combine both signals through heuristic weights or require Gaussian-process dynamics that do not scale beyond low-dimensional state spaces. Trajectory-level information gain decomposes into per-step terms that condition on the replay buffer and rollout prefix simultaneously, but remains intractable for deep models. We derive the Conditional Information Gain (CIG) reward as a tractable surrogate: a log-determinant objective over an ensemble disagreement kernel whose Cholesky factorization yields causal per-step rewards that retain both conditioning sets while scaling to high-dimensional state spaces. We instantiate CIG in a model-based setting, where rollouts are short and within-rollout corrections remain largely unexplored. Across twelve tasks spanning discrete (MiniGrid) and continuous control (OGBench), in both clean and stochastic-distractor settings, CIG outperforms or matches prior exploration methods while remaining robust to stochastic distractors.
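
A sketch of the mechanism as we read it: a log-determinant over a kernel built from ensemble disagreement, with the Cholesky factor read off row by row. Each diagonal entry `L[t][t]` is the square root of the conditional variance of state `t` given the replay-buffer states and the earlier rollout steps, so `log L[t][t]` is a causal per-step term, and the terms sum to the rollout's gain given the buffer. The scalar-prediction kernel, the `I + K / noise` form, and the buffer-first ordering are our assumptions, not the paper's exact construction.

```python
import math
from typing import List

Matrix = List[List[float]]

def disagreement_kernel(preds: Matrix) -> Matrix:
    """preds[m][i] is ensemble member m's scalar prediction for state i.

    K[i][j] is the covariance, across members, of the predictions for states i and j.
    """
    M, n = len(preds), len(preds[0])
    means = [sum(preds[m][i] for m in range(M)) / M for i in range(n)]
    dev = [[preds[m][i] - means[i] for i in range(n)] for m in range(M)]
    return [[sum(dev[m][i] * dev[m][j] for m in range(M)) / M for j in range(n)]
            for i in range(n)]

def cholesky(A: Matrix) -> Matrix:
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def cig_rewards(preds: Matrix, n_buffer: int, noise: float = 1.0) -> List[float]:
    """Per-step rewards for rollout states; the first n_buffer states are buffer context.

    With A = I + K / noise, 0.5 * log det A is a GP-style information gain, and
    log L[t][t] = 0.5 * log(conditional variance of state t given all earlier states).
    """
    K = disagreement_kernel(preds)
    n = len(K)
    A = [[(1.0 if i == j else 0.0) + K[i][j] / noise for j in range(n)] for i in range(n)]
    L = cholesky(A)
    return [math.log(L[t][t]) for t in range(n_buffer, n)]
```

Conditioning on the buffer is what lets a state that merely repeats buffered experience earn less than a genuinely new one, while conditioning on the prefix penalizes repetition within the rollout.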

2605.20872 2026-05-21 cs.LG cs.AI cs.GR Version update

CAdam: Context-Adaptive Moment Estimation for 3D Gaussian Densification in Generative Distillation

SeungJeh Chung, Geonho Park, Misong Kim, HyeongYeop Kang

Affiliations: IIIXR Lab, Kyung Hee University; IIIXR Lab, Korea University

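AI summary: This paper proposes CAdam, which recasts densification as a statistical signal-verification problem to resolve the densification bottleneck in generative distillation, substantially reducing the number of Gaussians while maintaining visual quality.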

Comments Accepted to SIGGRAPH 2026 Conference Papers. 12 pages, 8 figures

DOI: 10.1145/3799902.3811215

Abstract

Adaptive densification is the engine of 3D Gaussian Splatting (3DGS). However, when transposed to the optimization-based Generative Distillation paradigm, this reconstruction-native mechanism reveals fundamental limitations, resulting in inefficient representations cluttered with redundant primitives. We diagnose this failure as a Densification Dilemma stemming from the stochastic nature of generative guidance: the standard magnitude-based accumulation indiscriminately aggregates transient noise alongside geometric signals, making it difficult to strike a balance between over-densification and under-fitting. To resolve this, we introduce Context-Adaptive Moment Estimation (CAdam), a novel framework that reinterprets densification as a statistically grounded signal verification problem. CAdam leverages the first moment of gradients to exploit the interference principle, where stochastic fluctuations cancel out via destructive interference while consistent geometric drifts accumulate via constructive interference, effectively disentangling the underlying signal from the generative noise floor. This is further augmented by a quantile-based context awareness and an intrinsic Signal-to-Noise Ratio (SNR) gating mechanism, which ensure robust adaptation across optimization stages and enable the soft termination of densification. Extensive experiments across diverse objectives (SDS, ISM, VFDS) and strong generative 3DGS backbones show that CAdam reduces Gaussian count by 85%-97% relative to standard densification while preserving overall comparable perceptual quality. These results highlight signal-aware density control as a practical way to improve memory efficiency in optimization-based generative distillation.
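
The abstract does not give CAdam's update rule; the sketch below only illustrates the interference argument with Adam-style moment estimates on a single scalar gradient stream. Summing magnitudes lets zero-mean noise pile up, whereas the first moment cancels it, and a moment-based SNR separates the two cases. The decay rates, the SNR definition, and the synthetic gradient streams are our assumptions; the quantile-based context awareness is not shown.

```python
import math
import random
from typing import List, Tuple

def gradient_statistics(grads: List[float], beta1: float = 0.99,
                        beta2: float = 0.99) -> Tuple[float, float, float]:
    """Return (mean |g|, |bias-corrected first moment|, signal-to-noise ratio)."""
    m = v = 0.0
    mag = 0.0
    for g in grads:
        mag += abs(g)                        # magnitude accumulation: noise adds up
        m = beta1 * m + (1 - beta1) * g      # first moment: zero-mean noise cancels
        v = beta2 * v + (1 - beta2) * g * g  # second moment
    T = len(grads)
    m_hat = m / (1 - beta1 ** T)             # Adam-style bias correction
    v_hat = v / (1 - beta2 ** T)
    spread = math.sqrt(max(v_hat - m_hat ** 2, 1e-12))
    return mag / T, abs(m_hat), abs(m_hat) / spread

# Synthetic streams: pure transient noise vs. a consistent geometric drift.
rng = random.Random(0)
noise_only = [rng.gauss(0.0, 1.0) for _ in range(1000)]
drift = [0.5 + 0.1 * rng.gauss(0.0, 1.0) for _ in range(1000)]
```

On these streams the magnitude statistic ranks the pure-noise primitive above the drifting one (about 0.8 vs 0.5), which is exactly the over-densification failure mode, while the first moment and SNR rank them the other way round.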

2605.20868 2026-05-21 cs.LG cs.AI cs.SY eess.SY Version update

Runtime-Certified Bounded-Error Quantized Attention

Dean Calver

Affiliation: Independent Researcher

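AI summary: This paper proposes a tiered KV cache architecture that stores INT8 keys and INT4 values in GPU memory while retaining FP16 originals in system RAM, enabling runtime-certified attention. An error decomposition yields per-head, per-step error bounds that drive adaptive precision selection and a multi-stage fallback procedure, guaranteeing recovery of the exact dense attention output when needed.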

Comments 32 pages, 1 figure

Abstract

KV cache quantization reduces the memory cost of long-context LLM inference, but introduces approximation error that is typically validated only empirically. Existing systems rely on average-case robustness, with no mechanism to detect or recover from failures at runtime. We present a tiered KV cache architecture that enables runtime-certified attention: INT8 keys and INT4 values are stored in GPU memory, while FP16 originals are retained in system RAM for deterministic fallback. A two-term error decomposition yields per-head, per-step bounds on (i) attention distribution distortion from key quantization and (ii) value reconstruction error. These bounds are computed online and used to drive adaptive precision selection and a multi-stage fallback ladder, which guarantees recovery to the exact dense attention output when required. Across PG-19, NIAH, and RULER benchmarks on LLaMA 3.1-8B with contexts up to 128K, the system matches dense FP16 KV quality within noise for language modelling and retrieval tasks, while recovering catastrophic failures observed in naive INT8/INT4 baselines. Value-sensitive tasks at short context expose a controlled trade-off between compression and fidelity, which can be eliminated via tighter value tolerances or FP16-value fallback. The certification is local (per-head, per-step) and does not guarantee end-to-end model correctness, but ensures that each attention computation is either bounded relative to an FP16 reference or exactly recovered via fallback. This reframes KV cache quantization as a runtime-verified computation rather than a fixed approximation. The goal is not raw speedups, but enabling safe deployment of aggressive KV compression under strict quality constraints.
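
Only the key-side term is sketched here. Symmetric per-vector INT8 rounding gives a simple worst-case bound on each attention logit, which in turn bounds how far each softmax weight can move; if the bound exceeds a tolerance, the sketch recomputes with full-precision keys. The per-vector scaling, the L1 bound, the single-step fallback, and all function names are our simplifications; the paper's two-term decomposition, INT4 values, and multi-stage fallback ladder are not reproduced.

```python
import math
from typing import List, Tuple

def quantize_int8(x: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-vector INT8 quantization: x ~= q * scale with q in [-127, 127]."""
    scale = max(abs(v) for v in x) / 127 or 1.0  # avoid a zero scale for an all-zero x
    q = [max(-127, min(127, round(v / scale))) for v in x]
    return q, scale

def softmax(z: List[float]) -> List[float]:
    mx = max(z)
    e = [math.exp(v - mx) for v in z]
    s = sum(e)
    return [v / s for v in e]

def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def certified_attention_weights(query: List[float], keys: List[List[float]],
                                tol: float) -> Tuple[List[float], float, bool]:
    """Return (weights, delta, used_fallback).

    Each rounded key element is off by at most scale / 2, so each logit is off by at
    most delta = ||q||_1 * max_scale / 2, and each softmax weight then stays within
    a factor exp(+-2 * delta) of the exact one. If 2 * delta > tol, fall back.
    """
    q_l1 = sum(abs(a) for a in query)
    logits_hat, deltas = [], []
    for k in keys:
        qk, scale = quantize_int8(k)
        logits_hat.append(dot(query, [v * scale for v in qk]))
        deltas.append(q_l1 * scale / 2)
    delta = max(deltas)
    if 2 * delta > tol:
        return softmax([dot(query, k) for k in keys]), delta, True  # exact fallback
    return softmax(logits_hat), delta, False
```

The certificate is local in the same sense as in the abstract: it bounds one attention computation against its full-precision reference and says nothing about downstream layers.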

2605.20866 2026-05-21 cs.LG cs.DC math.OC stat.ML Version update

LOSCAR-SGD: Local SGD with Communication-Computation Overlap and Delay-Corrected Sparse Model Averaging

Yassine Maziane, Ammar Mahran, Artavazd Maranjyan, Peter Richtárik

Affiliation: KAUST (King Abdullah University of Science and Technology)

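AI summary: This paper studies Local SGD in heterogeneous-compute environments that combines communication compression, local training, and communication-computation overlap. The proposed LOSCAR-SGD communicates only a sparse subset of model coordinates and keeps optimizing while communication is in flight, and the paper gives the first theoretical guarantees for this combination.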

Abstract

Communication is a major bottleneck in distributed learning, especially in large-scale settings and in federated learning environments with slow links. Three standard ways to reduce this cost are communication compression, local training, and communication-computation overlap. Methods that combine these ingredients are used in practice and have been found to be effective for large-scale training, but there is little theory for methods that combine all three. We study a heterogeneous-compute setting in which different workers may take different numbers of local steps, and we propose LOSCAR-SGD, a Local SGD method that communicates only a sparse subset of model coordinates and continues optimizing while communication is in flight. A key ingredient is a delay-corrected merge rule that incorporates delayed synchronized information without discarding the progress made during the overlap phase. We give convergence guarantees for smooth non-convex objectives and show how sparsity, overlap, and worker heterogeneity affect the rate. To the best of our knowledge, this is the first theory for this combination of ingredients. Experiments further show that communication-computation overlap reduces training time and that the delay-corrected merge outperforms naive overwriting.

2605.20865 2026-05-21 cs.LG cs.AI Version update

Multi-Step Likelihood-Ratio Correction for Reinforcement Learning with Verifiable Rewards

Deokgyu Yoon, Hyungkyu Kang, Joongkyu Lee, Byeongchan Kim, Gyungin Shin, Sungrae Park, Min-hwan Oh

Affiliations: Seoul National University; Upstage

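AI summary: This paper proposes N-Step Forward-Trace Policy Optimization (NFPO), which introduces an N-step forward trace to improve the PPO surrogate objective, yielding more accurate policy improvement in reinforcement learning with verifiable rewards.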

Abstract

Reinforcement learning with verifiable rewards (RLVR) plays a pivotal role in improving the reasoning ability of large language models. However, widely used PPO surrogate objectives are fundamentally local, as they rely on a local approximation of the exact policy gradient objective. While this approximation improves stability by reducing the variance induced by importance sampling, it also introduces structural bias into the surrogate objective, which must be controlled through trust region mechanisms. In this work, we introduce the $N$-step forward trace, which augments the PPO surrogate objective using the cumulative likelihood ratio of the next $N-1$ tokens. Building on this idea, we propose $N$-Step Forward-Trace Policy Optimization (NFPO), a practical RLVR algorithm that integrates the $N$-step forward trace into the masked policy gradient framework. NFPO provides a continuous bridge between the PPO surrogate objective and the exact policy gradient objective, offering a principled mechanism for controlling the bias-variance trade-off. Our theoretical analysis shows that, with an appropriate choice of $N$, the proposed objective yields a tighter policy-improvement bound than the standard PPO surrogate. Experiments on comprehensive reasoning benchmarks demonstrate that NFPO consistently improves performance, supporting our theoretical findings.
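
A per-token sketch of the forward-trace idea as described in the abstract: `trace[t]` is the product of likelihood ratios of the next `n - 1` tokens. With `n = 1` the trace is 1 and the ordinary clipped PPO surrogate is recovered; with `n` at least the sequence length, `trace[t] * r_t` is the likelihood ratio of the whole suffix starting at `t`. Multiplying the clipped PPO term by the trace is our assumption; the paper's masking, clipping, and gradient-stopping details are not given in the abstract.

```python
import math
from typing import List

def forward_traces(logp_new: List[float], logp_old: List[float], n: int) -> List[float]:
    """trace[t] = prod_{k=t+1}^{t+n-1} pi_new(y_k) / pi_old(y_k), truncated at the end."""
    log_ratio = [a - b for a, b in zip(logp_new, logp_old)]
    T = len(log_ratio)
    return [math.exp(sum(log_ratio[t + 1:min(t + n, T)])) for t in range(T)]

def nfpo_like_objective(logp_new: List[float], logp_old: List[float],
                        advantages: List[float], n: int, clip_eps: float = 0.2) -> float:
    """Per-token clipped PPO term reweighted by the n-step forward trace (assumed form)."""
    traces = forward_traces(logp_new, logp_old, n)
    total = 0.0
    for t, adv in enumerate(advantages):
        r = math.exp(logp_new[t] - logp_old[t])         # current-token ratio
        clipped = min(max(r, 1 - clip_eps), 1 + clip_eps)
        total += traces[t] * min(r * adv, clipped * adv)
    return total / len(advantages)
```

Growing `n` moves the objective from the local PPO approximation toward the exact sequence-level importance weight, trading bias for the variance of longer ratio products.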

2605.20863 2026-05-21 cs.DC cs.LG Version update

PlexRL: Cluster-Level Orchestration of Serviceized LLM Execution for RLVR

Yiqi Zhang, Fangzheng Jiao, Tian Tang, Boyu Tian, Hangyu Wang, Qiaoling Chen, Guoteng Wang, Zhen Jiang, Peng Sun, Ping Zhang, Xiaohe Hu, Ziming Liu, Menghao Zhang, Yanmin Jia, Yang You, Siyuan Feng

Affiliations: National University of Singapore; Beihang University; Shanghai Qiji Zhifeng Co., Ltd.; Nanyang Technological University; Shanghai Innovation Institute

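AI summary: This paper proposes PlexRL, which addresses the inefficiency of RLVR training through cluster-level orchestration of serviceized LLM execution, increasing effective cluster capacity and lowering GPU-hour cost while preserving algorithmic flexibility with minimal per-job overhead.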

Abstract

Reinforcement learning with verifiable rewards (RLVR) has recently unlocked strong reasoning capabilities in large language models (LLMs), triggering rapid exploration of new algorithms and data. However, RLVR training is notoriously inefficient: long-tailed rollouts, tool-induced stalls, and asymmetric resource requirements between rollout and training introduce substantial idle time that cannot be eliminated by job-local optimizations such as synchronous pipelining, asynchronous rollout, or colocated execution. We argue that this inefficiency is structural. While idle gaps are unavoidable within individual RLVR jobs, they are largely anti-correlated across jobs and therefore exploitable at the cluster level. Leveraging this observation, we present PlexRL, a cluster-level runtime for multiplexing unified LLM services across RLVR jobs. By centrally managing model placement, state transitions, and function-level scheduling under strict affinity constraints, PlexRL time-slices LLM execution across jobs to fill otherwise idle periods without expensive model migration. Our implementation and evaluations demonstrate that PlexRL significantly improves effective cluster capacity and reduces user GPU-hour cost by up to 37.58% while preserving algorithmic flexibility and introducing minimal per-job overhead.

2605.20856 2026-05-21 cs.RO cs.AI cs.LG Version update

DISC: Decoupling Instruction from State-Conditioned Control via Policy Generation

Hanxiang Ren, Pei Zhou, Xunzhe Zhou, Yanchao Yang

Affiliations: Zhejiang University; The University of Hong Kong; TranscEngram

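AI summary: DISC decouples instructions from state-conditioned control through policy generation, addressing the observation leakage caused by task-state coupling. It performs strongly on several benchmarks, showing that language-generated policy parameters drive behavior.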

Abstract

Language-conditioned manipulation policies typically process instructions and observations through shared network parameters. This task-state entanglement provides a pathway for observation leakage: networks learn scene-to-action shortcuts that bypass language grounding entirely. DISC eliminates this failure structurally. Rather than conditioning a universal policy on language, DISC uses a hypernetwork to generate the entire parameter set of a task-specific visuomotor policy from the instruction alone. The generated policy never directly accesses language; therefore, its task-awareness must come from the language. Consequently, observation leakage has no pathway to emerge. On the other hand, generating coherent high-dimensional policy weights is itself a challenging problem. We address it with a two-stage hypernetwork whose refinement stage embeds the structure of gradient-based optimization as a feed-forward inductive bias, producing globally consistent parameters without actual gradient computation. Trained entirely from scratch on standard data budgets, DISC outperforms all entangled baselines on LIBERO-90 and Meta-World, with advantages that widen on complex, long-horizon tasks, and surpasses the large-scale pretrained $π_0$ despite using no external pretraining data. On a real-world benchmark where all tasks share identical visual context, DISC substantially outperforms entangled alternatives, directly confirming that language-generated policy parameters, not visual shortcuts, drive behavior. The hypernetwork further learns a semantically structured parameter manifold that enables few-shot adaptation from minimal demonstrations and robust generalization across paraphrased instructions. Our code is available at: https://github.com/ReNginx/DISC.

2605.20839 2026-05-21 cs.CV cs.LG Version update

Activation-Free Backbones for Image Recognition: Polynomial Alternatives within MetaFormer-Style Vision Models

Jeffrey Wang, Jonathan Gregory, Grigorios G. Chrysos

Affiliation: University of Wisconsin–Madison

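AI summary: This paper proposes activation-free polynomial alternatives for image recognition within MetaFormer-style vision models and reports superior performance of the polynomial modules across multiple datasets.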

Comments Accepted to ICML 2026

2605.20834 2026-05-21 cs.AI cs.LG Version update

Conditional Equivalence of DPO and RLHF: Implicit Assumption, Failure Modes, and Provable Alignment

Zhiqin Yang, Yonggang Zhang, Wei Xue, Dong Fang, Bo Han, Yike Guo

Affiliations: The Hong Kong University of Science and Technology; Hong Kong Baptist University

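AI summary: This paper examines the equivalence of DPO and RLHF, showing that it rests on an implicit assumption; when the assumption fails, DPO optimizes relative advantage rather than absolute alignment, leading to pathological convergence. The authors propose CPO, which adds constraints to achieve provable alignment, and give a geometric interpretation that reveals DPO's margin-ranking mechanism.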

Comments 49 pages

Abstract

Direct Preference Optimization (DPO) has emerged as a popular alternative to Reinforcement Learning from Human Feedback (RLHF), offering theoretical equivalence with simpler implementation. We prove this equivalence is conditional rather than universal, depending on an implicit assumption frequently violated in practice: the RLHF-optimal policy must prefer human-preferred responses. When this assumption fails, DPO optimizes relative advantage over the reference policy rather than absolute alignment with human preferences, leading to pathological convergence where policies decrease DPO loss while preferring dispreferred responses. We characterize when this assumption is violated, show the existence of an undesirable solution space, and prove that DPO and RLHF optimize fundamentally different objectives in such cases. To address this, we introduce Constrained Preference Optimization (CPO), augmenting RLHF with constraints for provable alignment. We further provide a geometric interpretation through soft margin ranking, revealing that DPO implements margin ranking with potentially negative targets. Our theoretical analysis establishes when DPO's guarantees hold and provides solutions preserving simplicity with provable alignment. Comprehensive experiments on standard benchmarks demonstrate that CPO achieves state-of-the-art performance. Code is available at: https://github.com/visitworld123/CPO.
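
The "relative advantage" failure can be seen with the standard per-pair DPO loss and made-up probabilities in a two-response toy world. If the reference model strongly favours the dispreferred response, a policy can lower the DPO loss below its value at the reference while still preferring the dispreferred response. This toy is our illustration, not the paper's construction, and CPO is not sketched.

```python
import math

def dpo_loss(logp_w: float, logp_l: float, ref_w: float, ref_l: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair (w = preferred, l = dispreferred)."""
    margin = beta * ((logp_w - ref_w) - (logp_l - ref_l))
    return math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))

# Made-up reference that strongly favours the dispreferred response.
ref_w, ref_l = math.log(0.1), math.log(0.9)
loss_at_ref = dpo_loss(ref_w, ref_l, ref_w, ref_l)  # margin 0, so the loss is log 2

# A policy that still prefers the dispreferred response (0.7 > 0.3) ...
pi_w, pi_l = 0.3, 0.7
loss_new = dpo_loss(math.log(pi_w), math.log(pi_l), ref_w, ref_l)
# ... yet its loss is lower, because only ratios relative to the reference matter.
```

The loss only sees the log-ratio margin against the reference, so absolute preference for the human-preferred response is never enforced.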

2605.20824 2026-05-21 cs.LG Version update

Markovian Circuit Tracing for Transformer State Dynamic

Abdullah X

Affiliation: Project AWARE and Zephara AI

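AI summary: This study proposes Markovian Circuit Tracing (MCT) to assess whether transformer activations contain coarse-grained state-transition structure. Using synthetic hidden Markov model tasks, it shows that residual activations contain partial Bayesian belief information and measures how well state abstractions recover coarse transition signals across different regimes.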

Abstract

Many sequence computations are easier to study as movement through internal states than as isolated local circuits. We introduce Markovian Circuit Tracing (MCT), a diagnostic pipeline for testing whether transformer activations contain coarse state-transition structure. The benchmark uses synthetic Hidden Markov Model (HMM) tasks where latent states, transition matrices, Bayesian belief vectors, Bayes-optimal predictions, and forced-state counterfactual targets are known exactly. Across six HMM families and three seeds per family, tiny causal transformers learn near-Bayes next-token predictors, with mean excess loss over Bayes of 0.0138. Residual activations contain partial Bayesian belief information in this controlled synthetic benchmark. State abstractions extracted from these activations recover coarse transition signal, strongest in persistent and lower-state regimes, and weaker in ambiguous-emission and six-state regimes. The clearest result comes from state forcing. Patching a recovered-state centroid reduces the KL divergence to the exact HMM counterfactual target from 0.1957 in the unpatched model to 0.0532 on average, beating wrong-state, mean-activation, random-activation, and shuffled-label controls. The contribution is a controlled benchmark and evaluation framework for transformer state-dynamics interpretability, with MCT as a simple reference pipeline.
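Because the benchmark's ground truth is a known HMM, the Bayesian belief vector and the Bayes-optimal next-token prediction follow from the forward-filtering recursion. The sketch below assumes a hypothetical two-state, two-token HMM with transition matrix `T` and emission matrix `E`; it illustrates the targets MCT compares against, not the MCT pipeline itself.

```python
import math


def normalize(v):
    total = sum(v)
    return [x / total for x in v]


def _propagate(belief, T):
    """P(s_{t+1}) from P(s_t): multiply the belief row vector by T."""
    n = len(T)
    return [sum(belief[s] * T[s][s2] for s in range(n)) for s2 in range(n)]


def predict_next_token(belief, T, E):
    """Bayes-optimal next-token distribution from a filtering belief.

    belief[s] = P(s_t = s | o_1..o_t); T[s][s2] = P(s2 | s); E[s][o] = P(o | s).
    """
    prior = _propagate(belief, T)
    return [sum(prior[s] * E[s][o] for s in range(len(T))) for o in range(len(E[0]))]


def update_belief(belief, T, E, obs):
    """Forward-filtering step: propagate through T, then condition on obs."""
    prior = _propagate(belief, T)
    return normalize([prior[s] * E[s][obs] for s in range(len(T))])


def kl(p, q):
    """KL(p || q) in nats, e.g. Bayes-optimal vs. model next-token distribution."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


T = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical "persistent" transitions
E = [[0.8, 0.2], [0.3, 0.7]]  # hypothetical emissions
b0 = [0.5, 0.5]
p_next = predict_next_token(b0, T, E)  # approximately [0.575, 0.425]
b1 = update_belief(b0, T, E, obs=0)    # approximately [0.765, 0.235]
```

A model's excess loss over Bayes at a position is the KL divergence between `p_next` and the model's predicted distribution.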

2605.20815 2026-05-21 cs.CL cs.AI cs.IR cs.LG Version update

GraphRAG on Consumer Hardware: Benchmarking Local LLMs for Healthcare EHR Schema Retrieval

Peter Fernandes, Ria Kanjilal

Affiliations: Department of Computer Engineering; California Polytechnic State University

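AI summary: This paper studies GraphRAG for healthcare EHR schema retrieval using local LLMs on consumer hardware, evaluating four models on indexing efficiency, knowledge-graph construction, query latency, answer quality, and hallucination, and finds that model parameter count and retrieval mode significantly affect the results.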

Comments 9 pages, 1 figure, 5 tables

Beyond Numerical Features: CNN-Driven Algorithm Selection via Contour Maps for Continuous Black-Box Optimization

Yiliang Yuan, Xiang Shi, Mustafa Misir

Affiliations: Mohamed bin Zayed University of Artificial Intelligence, Masdar City, United Arab Emirates

Abstract

The present paper introduces a new representation-driven approach to per-instance algorithm selection, applied to black-box optimization, for automatically choosing the most promising solver from a fixed portfolio. Prior work in continuous optimization largely relies on numerical descriptors, including Exploratory Landscape Analysis features and learned embeddings such as Deep-ELA. This work studies a complementary representation: contour-map visualizations of probed landscapes. A CNN regressor takes multiple instance-specific contour views (stacked or encoded per view and aggregated) and predicts per-solver performance, enabling selection by the predicted best value. On the standard BBOB 2009 single-objective protocol, the resulting selectors significantly outperform the single best solver (SBS) and are competitive with feature-based baselines. A subsequent bi-objective evaluation under the DeepELA setting further indicates that the same image-based principle can be competitive when using windowed contour views. Overall, the results suggest that simple vision models can exploit spatial structure in probed landscapes for algorithm selection without handcrafted ELA features.
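Per-instance selection "by the predicted best value" and the single-best-solver (SBS) baseline both reduce to argmins. The sketch below assumes lower scores are better and uses hypothetical prediction and performance matrices; the CNN regressor that would produce `predicted` from contour images is not shown.

```python
def select_per_instance(predicted):
    """Pick, for each instance, the solver with the best (lowest) predicted score.

    predicted[i][s] = predicted performance of solver s on instance i.
    """
    return [min(range(len(row)), key=row.__getitem__) for row in predicted]


def single_best_solver(actual):
    """Solver with the best mean actual performance over all instances (SBS)."""
    n_solvers = len(actual[0])
    means = [sum(row[s] for row in actual) / len(actual) for s in range(n_solvers)]
    return min(range(n_solvers), key=means.__getitem__)


def mean_performance(actual, choices):
    """Average actual performance obtained by running the chosen solver per instance."""
    return sum(actual[i][c] for i, c in enumerate(choices)) / len(actual)


# Hypothetical data: 3 instances x 3 solvers, lower is better.
actual = [[1.0, 3.5, 2.0], [4.0, 1.0, 2.0], [1.5, 2.5, 0.5]]
predicted = [[1.1, 2.9, 2.2], [3.8, 1.2, 2.1], [1.6, 2.0, 0.7]]

choices = select_per_instance(predicted)
sbs = single_best_solver(actual)
selector_score = mean_performance(actual, choices)
sbs_score = mean_performance(actual, [sbs] * len(actual))
```

The selector wins here because no single solver is best everywhere, which is the complementarity a portfolio selector exploits.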

2605.20784 2026-05-21 cs.AI cs.LG Version update

Interaction Locality in Hierarchical Recursive Reasoning

Yosuke Miyanishi, Tetsuro Morimura

Affiliations: CyberAgent Inc.

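AI summary: This paper proposes interaction locality, a framework for measuring whether information flow stays within nearby cells or semantic segments or crosses them. Applied to hierarchical recursive reasoning models such as HRM and TRM, it turns local execution versus global planning into a reproducible measurement framework.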

Abstract

Spatial reasoning requires both location-bound computation and location-invariant structure: agents must make local moves while preserving route, object, or constraint-level plans. We propose interaction locality, a task-geometry-aware framework for measuring whether information flow stays within nearby cells or semantic segments, or crosses them. We instantiate the framework with sparse-autoencoder feature ablations and finite-noise activation patching, with structural Jacobian and attention checks reported in the appendix, and apply it to HRM and TRM, two compact hierarchical and recursive reasoning models, on Maze-Hard, Sudoku Extreme, and ARC-AGI. Across these models, activation patching gives the clearest architectural fingerprint: high-level recurrent states tend to write information within nearby cells or same-segment units, while repeated recursive updates accumulate these local writes into broader solution structure. This pattern holds across maze paths, Sudoku constraints, and ARC-AGI object neighborhoods, with the strongest concentration in TRM. To test whether interaction locality extends beyond toy-yet-challenging grid benchmarks, we also apply it to MTU3D, a large-scale embodied 3D scene-grounding model. In this MTU3D setting, causal spatial locality appears primarily at the transition where visual scene features are handed to the downstream grounding module, rather than uniformly throughout the visual encoder. This contrast suggests that the local-to-global handoff observed in HRM and TRM is tied to explicit recursive reasoning dynamics, while embodied 3D models may concentrate causal spatial structure at module boundaries. Interaction locality turns the intuitive local-execution/global-planning story into a reproducible measurement framework for recursive and embodied spatial reasoning.

2605.20782 2026-05-21 cs.LG Version update

Causal Machine Learning Is Not a Panacea: A Roadmap for Observational Causal Inference in Health

Donna Tjandra, Trenton Chang, Sonali Parbhoo, Rajesh Ranganath, Andre Kurepa Waschka, William Mitchell, Maggie Makar, Shalmali Joshi, Finale Doshi-Velez, Leo Anthony Celi, Jenna Wiens

Affiliations: Division of Computer Science and Engineering, University of Michigan; Department of Electrical and Electronic Engineering, Imperial College London; Courant Institute of Mathematical Sciences, New York University; Center for Data Science, New York University; Department of Mathematics & Statistics, Elon University; Department of Ophthalmology, Cambridge University Hospitals; Department of Biomedical Informatics, Columbia University; School of Engineering and Applied Science, Harvard University; Laboratory for Computational Physiology, Institute for Medical Engineering and Science, Massachusetts Institute of Technology; Department of Medicine, Beth Israel Deaconess Medical Center; Department of Biostatistics, Harvard T.H. Chan School of Public Health

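AI summary: This paper examines the application of causal machine learning to observational data, stresses the importance of verifying validity assumptions and using causal ML appropriately, and proposes a template for strengthening the rigor and interpretability of causal analyses.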

Abstract

Objective: The growing availability of large-scale observational clinical datasets and challenges in conducting randomized controlled trials have spurred enthusiasm in using causal machine learning (ML) for causal inference in observational data. We present a roadmap for applying causal ML to observational data. Materials and methods: We outline the importance of assessing validity assumptions within available data and of applying causal ML responsibly, for both clinical experts who use causal ML and ML practitioners with limited clinical expertise. Observations: Despite advances in causal ML, its limitations remain largely under-appreciated across disciplines. This gap in shared knowledge may impact the validity of findings. Discussion: Causal assumptions must be satisfied and modeling choices justified. Otherwise, these approaches risk producing biased or misleading results, with consequences for clinical research and patient care. Conclusion: Causal ML can be a powerful tool for generating causal hypotheses. We provide a template to strengthen the rigor and interpretability of causal analyses.

2605.20780 2026-05-21 cs.LG cs.CV Version update

Learning to Think in Physics: Breaking Shortcut Learning in Scientific Diffusion via Representation Alignment

Haozhe Jia, Pengyu Yin, Wenshuo Chen, Shaofeng Liang, Lei Wang, Bowen Tian, Xiucheng Wang, Nanqian Jia, Yutao Yue

Affiliations: The Hong Kong University of Science and Technology (Guangzhou); Shandong University; LimX Dynamics Technology Co., Ltd.; Xidian University; Peking University; Institute of Deep Perception Technology, Jiangsu Industrial Technology Research Institute (JITRI); Griffith University

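AI summary: This work proposes REPA-P, a teacher-free framework that aligns intermediate features with physical states using first-principles residuals, addressing shortcut learning in the intermediate representations of physics-informed diffusion models under shifted boundary conditions. Across four PDE tasks it speeds up convergence, reduces physics residuals, and improves out-of-distribution robustness.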

Abstract

Physics-informed diffusion models typically enforce PDE constraints only on final outputs, leaving intermediate representations unconstrained and prone to shortcut learning under shifted boundary conditions. We introduce **REPA-P**, a teacher-free, architecture-agnostic framework that aligns intermediate features with physical states using first-principles residuals. REPA-P attaches lightweight $1{\times}1$ projection heads to selected layers, decodes hidden activations into physical quantities, and applies PDE residual losses during training. These heads are discarded at inference, introducing **zero overhead**. Across four PDE tasks, including Darcy flow, topology optimization, electrostatic potential, and turbulent channel flow, REPA-P accelerates convergence by up to $2{\times}$, reduces physics residuals by up to $66.4\%$, and improves out-of-distribution robustness by up to $49.3\%$, with consistent gains on both U-Net and Diffusion Transformer backbones. Ablations show that supervising a small set of intermediate layers captures most benefits and complements output-level physics losses. Code is available at [https://github.com/Hxxxz0/REPA-P](https://github.com/Hxxxz0/REPA-P).
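REPA-P's central step is to decode hidden activations with a 1x1 head and penalize a PDE residual on the decoded field. The sketch below is a minimal, framework-free illustration, assuming a 2D Poisson equation $\nabla^2 u = f$ as a stand-in PDE (electrostatic potentials obey this form with $f = -\rho/\varepsilon_0$), discretized with the 5-point stencil; the paper's actual heads, PDEs, and loss weights are not given in the abstract.

```python
def project_1x1(features, weights, bias=0.0):
    """A 1x1 projection head: the same linear map over channels at every pixel.

    features[c][i][j] is channel c of a hidden activation map.
    """
    n_rows, n_cols = len(features[0]), len(features[0][0])
    return [
        [sum(w * features[c][i][j] for c, w in enumerate(weights)) + bias for j in range(n_cols)]
        for i in range(n_rows)
    ]


def poisson_residual_loss(u, f, h):
    """Mean squared residual of the 5-point discrete Poisson equation lap(u) = f.

    Evaluated on interior grid points only; h is the grid spacing.
    """
    total, count = 0.0, 0
    for i in range(1, len(u) - 1):
        for j in range(1, len(u[0]) - 1):
            lap = (u[i + 1][j] + u[i - 1][j] + u[i][j + 1] + u[i][j - 1] - 4 * u[i][j]) / h**2
            total += (lap - f[i][j]) ** 2
            count += 1
    return total / count


h, n = 0.1, 8
# Hypothetical hidden map: channel 0 carries x^2 + y^2, channel 1 is a constant.
u_true = [[(i * h) ** 2 + (j * h) ** 2 for j in range(n)] for i in range(n)]
ones = [[1.0] * n for _ in range(n)]
f = [[4.0] * n for _ in range(n)]  # lap(x^2 + y^2) = 4, exactly, even for the stencil

good = project_1x1([u_true, ones], weights=[1.0, 0.0])
bad = project_1x1([u_true, ones], weights=[0.5, 0.0])
loss_good = poisson_residual_loss(good, f, h)  # essentially 0
loss_bad = poisson_residual_loss(bad, f, h)    # about 4: the halved field has lap = 2
```

Because the heads only feed this auxiliary loss, they can be dropped at inference, which is why the abstract reports zero inference overhead.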

2605.20771 2026-05-21 cs.LG Version update

Cumulative Meta-Learning from Active Learning Queries for Robustness to Spurious Correlations

Kin Whye Chew, Jingxian Wang

Affiliations: Department of Computer Science; National University of Singapore

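AI summary: This paper proposes the Cumulative Active Meta-Learning (CAML) framework, which uses samples queried through active learning to meta-learn prior knowledge and thereby improve model robustness to spurious correlations; experiments show significant performance gains on multiple benchmarks.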

Comments Under review. 26 pages, 7 figures

Abstract

Spurious correlations in real-world datasets cause machine learning models to rely on irrelevant patterns, undermining reliability, generalization, and fairness. Active learning offers a promising way to address this failure mode by querying informative samples that distinguish core features from spurious ones. However, standard active-learning methods simply append queried examples to the labeled set, effectively updating only the likelihood term. In deep learning regimes, the influence of these informative samples can be diluted by the larger labeled set and memorized by overparameterized models. We propose Cumulative Active Meta-Learning (CAML), an active-learning framework that uses queried examples to meta-learn the prior, or inductive bias, governing how the model adapts. CAML casts each active-learning round as a meta-learning task: the current labeled set serves as meta-train data for adaptation, while the newly queried batch serves as meta-test data for evaluating generalization. Unlike conventional meta-learning, which treats tasks as independent and identically distributed, CAML exploits the sequential dependence between active-learning rounds by maintaining a cumulative inductive bias that is progressively refined. Theoretically, we show that this cumulative formulation introduces interaction terms that couple earlier meta-learned inductive biases with later query-induced objectives, capturing dependencies absent from standard meta-learning. Empirically, CAML improves minority-group accuracy across spurious-correlation benchmarks and acquisition strategies, with gains of up to 27.8% on Dominoes, 29.9% on Waterbirds, 14.3% on SpuCo, and 24.0% on CivilComments.
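The way CAML casts active-learning rounds as meta-learning tasks can be sketched as a small data transformation. This is an assumption-level illustration of the task construction described in the abstract only; the meta-learner, the cumulative inductive-bias update, and the acquisition strategy are not shown, and the sample IDs are hypothetical.

```python
def build_cumulative_tasks(initial_labeled, query_batches):
    """Turn active-learning rounds into a sequence of meta-learning tasks.

    Round t: meta-train = labeled set before the round, meta-test = the batch
    queried in that round. The batch is then appended, so each task contains the
    previous ones (rounds are sequentially dependent rather than i.i.d.).
    """
    labeled = list(initial_labeled)
    tasks = []
    for batch in query_batches:
        tasks.append({"meta_train": list(labeled), "meta_test": list(batch)})
        labeled.extend(batch)
    return tasks


# Hypothetical sample IDs: two initial labels, then two query rounds.
tasks = build_cumulative_tasks(["a", "b"], [["c"], ["d", "e"]])
for t, task in enumerate(tasks):
    print(t, task["meta_train"], "->", task["meta_test"])
```

Each newly queried batch is first used to evaluate how well the current prior generalizes, and only afterwards joins the adaptation data; that is the overlap between consecutive tasks that a cumulative prior can exploit.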

2605.20767 2026-05-21 cs.CL cs.LG stat.ME Version update

PACD-Net: Hypothesis-Augmented Contrastive Learning for Estimating Glycemic Control from SMBG

Canyu Lei, David Repaske, Jianxin Xie

Affiliations: University of Virginia, School of Data Science, Charlottesville, VA 22903, USA; University of Virginia, Department of Pediatrics, Charlottesville, VA 22903, USA

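AI summary: This study proposes PACD-Net, a self-supervised contrastive learning framework for estimating glycemic-control metrics from sparse, irregularly sampled SMBG data, using pseudo-SMBG samples to guide learning and improve model accuracy and stability.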

Abstract

Effective diabetes management requires continuous monitoring of glycemic levels. Clinically, glycemic control is assessed using metrics such as Time in Range (TIR), Time Below Range (TBR), and Time Above Range (TAR), typically derived from continuous glucose monitoring (CGM). However, many patients rely on self-monitoring of blood glucose (SMBG) due to the high cost and limited accessibility of CGM. Unlike CGM, SMBG provides sparse and irregular measurements, making accurate estimation of these metrics challenging. Conventional supervised learning approaches struggle under such sparsity, leading to poor generalization and unstable performance. To address this, we propose PACD-Net, a self-supervised contrastive knowledge distillation framework for estimating glycemic control from SMBG. Pseudo-SMBG samples with richer temporal coverage are used as teacher signals to guide learning from sparse observations. In addition, multi-view contrastive learning enforces representation consistency across diverse sampling patterns. The model adopts a hybrid Swin Transformer-CNN backbone to capture temporal dependencies in sparse SMBG sequences. Experimental results demonstrate that PACD-Net consistently outperforms existing methods in estimating TAR, TIR, and TBR from real-world SMBG data, achieving improved accuracy as well as enhanced stability and generalization under extremely sparse observation settings. The proposed framework provides a practical tool for clinical SMBG interpretation and offers a generalizable approach for learning from sparse and irregularly sampled sensor data in broader applications.
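The three regression targets are simple functions of a glucose trace. The sketch below assumes the widely used consensus target range of 70 to 180 mg/dL (the abstract does not state PACD-Net's thresholds) and treats every reading as one equally weighted sample. That weighting is reasonable for evenly spaced CGM data but crude for a handful of irregular SMBG readings, which is the gap a learned estimator tries to close.

```python
def glycemic_metrics(readings_mg_dl, low=70.0, high=180.0):
    """Fractions of readings below, within, and above the target range.

    With evenly spaced CGM samples these fractions approximate time-based
    TBR/TIR/TAR; with sparse SMBG readings they are only a rough proxy.
    """
    n = len(readings_mg_dl)
    tbr = sum(1 for g in readings_mg_dl if g < low) / n
    tir = sum(1 for g in readings_mg_dl if low <= g <= high) / n
    tar = sum(1 for g in readings_mg_dl if g > high) / n
    return {"TBR": tbr, "TIR": tir, "TAR": tar}


# Hypothetical readings in mg/dL.
m = glycemic_metrics([60, 100, 150, 200])
print(m)  # {'TBR': 0.25, 'TIR': 0.5, 'TAR': 0.25}
```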

2605.20745 2026-05-21 cs.LG cs.AI cs.CL Version update

AGPO: Adaptive Group Policy Optimization with Dual Statistical Feedback

Miaobo Hu, Shuhao Hu, Bokun Wang, Ruohan Wang, Xin Wang, Xiaobo Guo, Daren Zha, Jun Xiao

Affiliations: Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences

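AI summary: This paper proposes AGPO, a critic-free refinement of GRPO that uses group-level statistics to control update magnitude and exploration. On nine English and Chinese math/STEM benchmarks, Qwen2.5-14B trained with AGPO outperforms PPO/GRPO under the same generated-token budget, reaching 67.3% on GSM8K and 40.5% on MATH.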

Abstract

Reinforcement learning improves LLM reasoning, but PPO/GRPO typically use fixed clipping and decoding temperature, which makes training brittle and tuning-heavy. We propose Adaptive Group Policy Optimization (AGPO), a critic-free refinement of GRPO that uses group-level statistics to control both update magnitude and exploration. AGPO uses a shared probe-derived statistical state to drive two controllers: (i) adaptive clipping, which sets the trust-region size from reward dispersion and skewness, probe vote entropy, policy entropy, and step-wise KL drift; and (ii) bidirectional adaptive temperature sampling, which heats or cools decoding around a base temperature according to centered uncertainty relative to a running baseline. On nine English and Chinese math/STEM benchmarks, Qwen2.5-14B trained with AGPO outperforms PPO/GRPO under the same generated-token budget, reaching 67.3% on GSM8K and 40.5% on MATH. Gains transfer to Llama-3-8B and Gemma-2-9B, and ablations confirm both modules are complementary. Our implementation is publicly available at https://github.com/wandugu/paper_agpo.
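AGPO's controllers sit on top of the usual GRPO machinery: group-standardized advantages and a PPO-style clipped surrogate, with the clip width turned into a per-step quantity. The sketch below shows that machinery. `toy_adaptive_clip` is a hypothetical placeholder that uses reward spread only; the abstract says AGPO also uses skewness, probe vote entropy, policy entropy, and KL drift, and its exact formula is not reproduced here.

```python
import statistics


def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one prompt's group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio, advantage, clip_eps):
    """PPO clipped objective for one sample (to be maximized).

    ratio = pi_new(a|s) / pi_old(a|s); clip_eps sets the trust-region width.
    """
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped * advantage)


def toy_adaptive_clip(rewards, base_eps=0.2, min_eps=0.05, max_eps=0.3):
    """HYPOTHETICAL controller, not AGPO's rule: widen the trust region when the
    group's rewards are more dispersed, narrow it when they are nearly uniform."""
    spread = statistics.pstdev(rewards)
    return min(max(base_eps * (0.5 + spread), min_eps), max_eps)


adv = group_advantages([1.0, 0.0, 0.0, 1.0])   # roughly [1, -1, -1, 1]
s_pos = clipped_surrogate(1.5, 1.0, 0.2)        # capped at 1.2
s_neg = clipped_surrogate(1.5, -1.0, 0.2)       # pessimistic branch: -1.5
eps_flat = toy_adaptive_clip([1.0, 1.0, 1.0, 1.0])
eps_spread = toy_adaptive_clip([0.0, 1.0, 0.0, 1.0])
```

Making `clip_eps` a function of group statistics is the general idea; which statistics to use and how to combine them is where AGPO's contribution lies.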

2605.20721 2026-05-21 cs.LG Version update

Robust Recommendation from Noisy Implicit Feedback: A GMM-Weighted Bayes-label Transition Matrix Framework

Zongyu Li, Xuanyu Liu, Gongce Cao, Shirui Sun, Yaqi Fang, Yongshuai Yu

Affiliations: Guangdong University of Technology; University of Chinese Academy of Sciences; Capital Normal University; Xiamen University; Beijing Jiaotong University

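AI summary: This paper proposes a Robust GMM-weighted Bayes-label Transition Matrix framework (RGBT), which uses a Gaussian mixture model to produce instance-specific reliability scores that systematically calibrate the Bayes-label transition matrix estimate and reduce its bias, achieving consistent estimation and a significant reduction in estimation variance while still using every sample.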

Abstract

Learning from implicit feedback in recommender systems is fundamentally challenged by pervasive label noise. While conventional denoising approaches often discard noisy instances to ensure robustness, this strategy inevitably suffers from low data utilization. Alternative methods that employ a Bayes-label transition matrix (BLTM) can leverage all available data, but their estimates tend to be biased in practical recommendation scenarios. To address these limitations, this paper proposes a Robust GMM-weighted Bayes-label Transition Matrix framework (RGBT). Our solution utilizes a Gaussian Mixture Model (GMM) to derive instance-specific reliability scores, which systematically calibrate the BLTM estimation to mitigate bias. Theoretical analysis confirms that our approach, by leveraging the BLTM framework with GMM calibration, simultaneously ensures full sample utilization, delivers consistent estimation, and critically, achieves a significant reduction in estimation variance. Extensive experiments on multiple real-world and synthetically flipped datasets demonstrate that RGBT not only utilizes noisy samples more effectively than mainstream reliable sample-based denoising methods, but also achieves significantly superior calibration capability of the transition matrix compared to state-of-the-art transition matrix-based denoising approaches.
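The abstract does not give RGBT's exact scoring rule. One common way to turn a GMM into per-instance reliability scores, used here purely as an assumption, is to fit a two-component mixture to per-sample training losses and take each sample's posterior under the low-loss component. The sketch below does that with plain EM in log space; how RGBT feeds these scores into the BLTM estimate is not reproduced.

```python
import math


def _log_normal(x, mu, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def gmm_reliability(losses, iters=100, var_floor=1e-6):
    """Fit a 2-component 1D GMM to per-sample losses with EM and return, for each
    sample, the posterior probability of the low-mean ("clean") component."""
    n = len(losses)
    mean_all = sum(losses) / n
    var_all = max(sum((x - mean_all) ** 2 for x in losses) / n, var_floor)
    mu, var, pi = [min(losses), max(losses)], [var_all, var_all], [0.5, 0.5]
    resp = []
    for _ in range(iters):
        # E-step in log space so well-separated clusters do not underflow.
        resp = []
        for x in losses:
            logs = [math.log(pi[k]) + _log_normal(x, mu[k], var[k]) for k in range(2)]
            top = max(logs)
            w = [math.exp(lg - top) for lg in logs]
            s = sum(w)
            resp.append([wk / s for wk in w])
        # M-step: responsibility-weighted means, variances, and mixing weights.
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            mu[k] = sum(r[k] * x for r, x in zip(resp, losses)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, losses)) / nk, var_floor)
            pi[k] = nk / n
    clean = 0 if mu[0] <= mu[1] else 1
    return [r[clean] for r in resp]


# Hypothetical per-interaction losses: four likely-clean, four likely-noisy.
losses = [0.10, 0.12, 0.09, 0.11, 2.00, 2.10, 1.90, 2.05]
reliability = gmm_reliability(losses)
```

Unlike hard filtering, every sample keeps a weight in [0, 1], which fits the abstract's emphasis on full sample utilization.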

2605.20713 2026-05-21 cs.CV cs.AI cs.LG Version update

SAVER: Selective As-Needed Vision Evidence for Multimodal Information Extraction

Miaobo Hu, Shuhao Hu, Bokun Wang, Rui Chen, Xin Wang, Xiaobo Guo, Daren Zha, Jun Xiao

Affiliations: Institute of Information Engineering, Chinese Academy of Sciences; University of Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences

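AI summary: This work proposes SAVER, a framework that uses selective visual evidence to improve multimodal named entity recognition and relation extraction, reducing computational overhead while improving accuracy.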

Abstract

Multimodal information extraction (IE) in social media is difficult because a post may attach multiple images that are weakly related, redundant, or even misleading with respect to the text. In this setting, always-on multimodal fusion wastes computation and can amplify spurious visual cues. The core challenge is to decide, for each candidate span or marked entity pair, whether vision should be consulted at all and, if so, which small subset of images provides trustworthy evidence. We propose SAVER, a selective vision-as-needed framework for multimodal named entity recognition (MNER) and multimodal relation extraction (MRE). SAVER uses a Conformal Groundability Gate (CGG) to estimate span-level visual groundability in MNER, derive pair-level activation in MRE from the two marked entities, and calibrate the activation threshold on a held-out split via a conformal-style procedure with Clopper-Pearson upper bounds. When activated, a submodular relevance-diversity selector chooses a compact evidence subset across images, which is then aggregated by a Set Transformer. An energy-inspired joint scoring head combines text, optional visual evidence, text-image consistency, and sparse routing for entity typing or relation classification. Experiments show that SAVER consistently improves F1 over strong text-only and always-on multimodal baselines, while reducing the area under the risk-coverage curve (AURC), increasing activation coverage at a fixed risk level, and lowering FLOPs and P90 latency.
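The Clopper-Pearson bound is what turns a calibrated gate into a finite-sample guarantee rather than a point estimate. The sketch below computes the one-sided exact upper bound by bisection on the binomial CDF and pairs it with a hypothetical selection rule: items scoring below a threshold skip vision, and we keep the largest threshold whose bounded error rate among skipped items stays under `alpha`. SAVER's actual risk definition and conformal procedure are not specified in the abstract, so the rule, labels, and numbers are assumptions.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson_upper(k, n, delta=0.05, iters=100):
    """One-sided (1 - delta) Clopper-Pearson upper bound on a binomial rate.

    Solves binom_cdf(k, n, u) = delta by bisection; the CDF decreases in u.
    """
    if k >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > delta:
            lo = mid
        else:
            hi = mid
    return hi


def select_skip_threshold(scores, skip_errors, candidate_taus, alpha, delta=0.05):
    """HYPOTHETICAL calibration: items with score < tau skip vision. Return the
    largest tau whose upper-bounded error rate among skipped items is <= alpha."""
    best = None
    for tau in sorted(candidate_taus):
        skipped = [err for s, err in zip(scores, skip_errors) if s < tau]
        if skipped and clopper_pearson_upper(sum(skipped), len(skipped), delta) <= alpha:
            best = tau
    return best


# Hypothetical calibration split: 40 items; skipping vision hurts only the top 4.
scores = [i / 40 for i in range(40)]
errs = [i >= 36 for i in range(40)]
tau = select_skip_threshold(scores, errs, [0.25, 0.5, 0.75, 1.0], alpha=0.15)
```

With zero observed errors the bound reduces to $1 - \delta^{1/n}$, so small calibration sets alone can fail the test; that is why the smallest threshold (10 skipped items) is rejected here while 0.75 (30 items) passes.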

2605.20696 2026-05-21 cs.LG Version update

Distributed Direct Preference Optimization

Zhanhong Jiang

Affiliations: Translational AI Center, Iowa State University, Ames, USA

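AI summary: This paper studies the convergence and time complexity of direct preference optimization (DPO) in distributed settings, analyzes how fragmentation of preference data affects optimization dynamics in federated and decentralized learning, and proposes robust, scalable implementations with theoretical guarantees.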

Comments 29 pages, 12 figures

Abstract

Preference-based reinforcement learning (RL) is a key paradigm for aligning policies with human judgments, yet its theoretical behavior in distributed settings where preference data are fragmented across heterogeneous users remains poorly understood. Direct Preference Optimization (DPO) avoids explicit reward modeling but lacks convergence guarantees under federated and decentralized training, where communication constraints and non-IID preferences fundamentally alter optimization dynamics. We provide the first convergence and time-complexity analysis of DPO in distributed environments. Modeling personalized offline RL with user-specific preference distributions, we characterize the induced global optimization landscape. For federated DPO, we derive convergence rates that quantify the impact of client drift, communication frequency, and preference heterogeneity; for decentralized DPO, we establish convergence over general communication graphs and show how spectral connectivity governs optimization speed and consensus. Empirically, we corroborate our theoretical insights on standard alignment benchmarks, demonstrating that our proposed methods not only enjoy strong theoretical guarantees but also deliver robust and scalable performance in practice. The code base is available here.
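The statement that spectral connectivity governs consensus speed can be seen in the gossip-averaging step that decentralized methods interleave with local gradient updates. The sketch below is a generic illustration on a hypothetical 4-node ring, not the paper's algorithm: repeated mixing with a doubly stochastic matrix drives every node to the network average, at a rate set by the second-largest eigenvalue magnitude of `W` (1/3 for this ring).

```python
def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring: weight 1/3 to self and to each neighbor."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W


def gossip(values, W, rounds):
    """Repeated neighbor averaging x <- W x, the consensus step of decentralized SGD."""
    x = list(values)
    for _ in range(rounds):
        x = [sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]
    return x


# Hypothetical per-client scalar parameters (e.g. one coordinate of local models).
W = ring_mixing_matrix(4)
x = gossip([1.0, 2.0, 3.0, 4.0], W, rounds=60)
```

For larger rings the second eigenvalue approaches 1, so more mixing rounds are needed per unit of disagreement removed; this is the kind of graph dependence a decentralized convergence rate makes explicit.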

2605.20689 2026-05-21 cs.CL cs.AI cs.IR cs.LG Version update

DIVE: Embedding Compression via Self-Limiting Gradient Updates

Dongfang Zhao

Affiliations: University of Washington Tacoma School of Engineering and Technology

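AI summary: This paper proposes DIVE, which uses a self-limiting triplet loss and a head-wise NT-Xent contrastive loss to counter the overfitting that scarce labeled data causes in embedding compression, improving retrieval performance.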

Abstract

High-dimensional embeddings from large language models impose significant storage and computational costs on vector search systems. Recent embedding compression methods, including Matryoshka-Adaptor (EMNLP 2024), Search-Adaptor (ACL 2024), and SMEC (EMNLP 2025), enable dimensionality reduction through lightweight residual adapters, but their training objectives cause severe overfitting when labeled data is scarce, degrading retrieval performance below the frozen baseline. We propose DIVE (Dimensionality reduction with Implicit View Ensembles), a compression adapter that addresses this failure through two mechanisms. First, a self-limiting hinge-based triplet loss produces zero gradient once a triplet satisfies the margin constraint, bounding the total perturbation applied to the pretrained embedding space. Second, a head-wise NT-Xent contrastive loss treats multiple learned projections of each embedding as implicit views, providing dense self-supervised gradients that compensate for the sparsity of the triplet signal on small datasets. Across six BEIR datasets, DIVE outperforms all three baseline adapters on every dataset and at every evaluated compression ratio, with a 14M-parameter open-source implementation.
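Both DIVE losses are built from standard components. The sketch below shows a hinge triplet loss, whose flat zero region is what the abstract calls self-limiting, and a SimCLR-style NT-Xent loss in which paired rows act as positive views. The margin, temperature, Euclidean distance, and toy vectors are assumptions; how DIVE forms its head-wise projections is not reproduced.

```python
import math


def triplet_hinge(anchor, positive, negative, margin=0.5):
    """max(0, margin + d(a, p) - d(a, n)) with Euclidean distance.

    Once the margin is satisfied the loss is 0 and so is its gradient, so
    satisfied triplets stop perturbing the embedding space.
    """
    return max(0.0, margin + math.dist(anchor, positive) - math.dist(anchor, negative))


def _cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def nt_xent(view1, view2, temperature=0.5):
    """SimCLR-style NT-Xent over 2N vectors; view1[i] and view2[i] are positives."""
    z = list(view1) + list(view2)
    n = len(view1)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)
        denom = sum(math.exp(_cos(z[i], z[k]) / temperature) for k in range(2 * n) if k != i)
        total += -math.log(math.exp(_cos(z[i], z[pos]) / temperature) / denom)
    return total / (2 * n)


t_ok = triplet_hinge([0.0, 0.0], [0.1, 0.0], [3.0, 0.0])        # margin met: 0.0
t_violated = triplet_hinge([0.0, 0.0], [0.1, 0.0], [0.2, 0.0])  # about 0.4
aligned = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The contrastive term keeps producing gradients even when every triplet is satisfied, which is the complementary role the abstract assigns to it on small datasets.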

2605.20687 2026-05-21 eess.IV cs.LG Version update

Motion-Robust Deep Reconstruction for Free-Breathing Cardiac Cine MRI

Mahmut Yurt, Kanghyun Ryu, Zhitao Li, Xucheng Zhu, Xianglun Mao, Martin Janich, Marcus Alley, Kawin Setsompop, John Pauly, Shreyas Vasanawala, Ali Syed

Affiliations: Stanford University; KIST; GE Healthcare

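AI summary: This paper proposes Cine-DL, a framework that combines targeted k-space preprocessing with fast model-based deep reconstruction to address motion artifacts in free-breathing radial acquisitions at high acceleration, improving feasibility for clinical use.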

Abstract

Conventional cardiac cine MRI relies on breath-hold Cartesian acquisitions, which are vulnerable to motion artifacts and can be uncomfortable or infeasible, particularly for pediatric and other noncompliant patients who cannot reliably hold their breath. Free-breathing radial acquisitions can alleviate these limitations, but robust reconstruction at high acceleration remains challenging due to prominent streak artifacts. To address these limitations, we propose Cine-DL, a clinically oriented framework that couples targeted k-space preprocessing with fast, model-based deep reconstruction. In this pipeline, raw free-breathing radial data undergo retrospective cardiac binning and respiratory gating to resolve cardiac phases and discard motion-corrupted spokes. We then introduce Streak Optimized Coil Compression (SOC), which explicitly preserves cardiac signals while suppressing peripheral interference that typically drives the streak artifacts. The resulting 2D+t cine series is reconstructed with an unrolled network that alternates a ResNet proximal operator with physics-based data consistency updates solved via conjugate gradient. We further employ a memory-efficient training strategy that reduces peak memory usage. We evaluate Cine-DL on free-breathing volunteer data against established baselines (k-t SENSE and iGRASP) and demonstrate clinical translation via hospital deployment on newly acquired patient data. Our experiments show that Cine-DL consistently improves quantitative metrics and visual fidelity, supporting a practical route toward routine, time-sensitive clinical adoption of free-breathing cine MRI.
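The data-consistency update in an unrolled network often solves a regularized normal equation with conjugate gradient. The sketch below assumes a real-valued forward operator `A` given as a small dense matrix and the quadratic-penalty form $(A^\top A + \lambda I)x = A^\top y + \lambda z$, where `z` is the output of the learned (ResNet) proximal step. In radial MRI, `A` would be a complex coil-sensitivity plus non-uniform FFT operator with $A^H$ in place of $A^\top$, and Cine-DL's exact formulation is not given in the abstract.

```python
def conjugate_gradient(matvec, b, iters=50, tol=1e-24):
    """Solve M x = b for symmetric positive definite M, given only v -> M v.

    tol is compared against the squared residual norm.
    """
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rs = sum(v * v for v in r)
    for _ in range(iters):
        if rs < tol:
            break
        Mp = matvec(p)
        alpha = rs / sum(pi * mpi for pi, mpi in zip(p, Mp))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * mpi for ri, mpi in zip(r, Mp)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def data_consistency(A, y, z, lam):
    """Solve (A^T A + lam I) x = A^T y + lam z for a dense real matrix A."""
    m, n = len(A), len(A[0])

    def normal_op(v):
        Av = [sum(a * vi for a, vi in zip(row, v)) for row in A]
        return [sum(A[k][j] * Av[k] for k in range(m)) + lam * v[j] for j in range(n)]

    rhs = [sum(A[k][j] * y[k] for k in range(m)) + lam * z[j] for j in range(n)]
    return conjugate_gradient(normal_op, rhs)


# A 2x2 SPD system with known solution x = [1/11, 7/11].
x_cg = conjugate_gradient(lambda v: [4 * v[0] + v[1], v[0] + 3 * v[1]], [1.0, 2.0])
# Identity forward model, zero denoiser output, lam = 1: x = y / 2.
dc = data_consistency([[1.0, 0.0], [0.0, 1.0]], [2.0, 4.0], [0.0, 0.0], lam=1.0)
```

CG only needs matrix-vector products with the normal operator, which is why it pairs well with operators like the NUFFT that are never formed as explicit matrices.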

2605.20681 2026-05-21 stat.ME cs.LG Version update

RoPeSLR: 3D RoPE-Driven Sparse-Low-Rank Attention for Efficient Diffusion Transformers

Yuxi Liu, Zekun Zhang, Yixiang Cai, Renjia Deng, Yutong He, Kun Yuan

Affiliations: Peking University; University of Electronic Science and Technology of China; Beijing University of Posts and Telecommunications

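AI summary: This study proposes RoPeSLR, a 3D-RoPE-based sparse-low-rank attention framework that tackles the high complexity of long-sequence generation in diffusion transformers. By combining a high-frequency semantic spike set with an extreme low-rank background continuum, it achieves sub-quadratic sparsity and sub-linear rank growth and performs well in ultra-long video inference.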

Abstract

Diffusion Transformers (DiTs) have revolutionized high-fidelity video generation, yet their $\mathcal{O}(L^2)$ attention complexity poses a formidable bottleneck for long-sequence synthesis. While recent sparse-linear attention hybrids aim to mitigate this, their performance severely degrades at extreme sparsity due to the "RoPE Dilemma": standard linear attention fails to preserve the orthogonal relative-position structure of 3D Rotary Position Embeddings (RoPE), neutralizing vital distance awareness. To address this, we propose RoPeSLR, a 3D RoPE-guided Sparse-LowRank attention framework. We establish that under empirically validated assumptions, the DiT attention manifold admits a decoupling into a high-frequency semantic spike set (bounded by $\mathcal{O}(L^{3/2})$ sparsity) and an extremely low-rank ($\mathcal{O}(d_h \log L)$) background continuum. Guided by this structural prior, RoPeSLR eschews standard linear attention for a head-wise low-rank parameterization equipped with a learnable 3D Absolute Positional Embedding (PE) injection, seamlessly synthesizing long-range relative distance decay. By guaranteeing sub-quadratic sparsity and sub-linear rank growth, RoPeSLR is exceptionally suited for scaling to ultra-long video inference. Extensive evaluations validate this scalable superiority: at 90% sparsity, RoPeSLR achieves up to $10\times$ fewer FLOPs on Wan2.1-1.3B and delivers a $2.26\times$ end-to-end inference speedup on the ultra-long 100K+ token sequences of HunyuanVideo-13B, all while maintaining near-lossless generation fidelity (less than 1.3% average VBench degradation).
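The "RoPE Dilemma" rests on the fact that rotary embeddings make a query-key logit depend only on the relative position of the two tokens. Below is a minimal one-axis sketch of that property. The paper uses 3D RoPE over time, height and width. This toy uses a single axis, and the frequency base of 10000 is an assumption.

```python
import math
from typing import List, Sequence


def rope(vec: Sequence[float], pos: int, base: float = 10000.0) -> List[float]:
    """Rotate each pair (vec[2j], vec[2j+1]) by angle pos * base**(-2j/d); d must be even."""
    d = len(vec)
    if d % 2:
        raise ValueError("RoPE needs an even dimension")
    out: List[float] = []
    for i in range(0, d, 2):  # i = 2j indexes the start of each rotated pair
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[i], vec[i + 1]
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out


def rope_score(q: Sequence[float], k: Sequence[float], m: int, n: int) -> float:
    """Attention logit between a query at position m and a key at position n."""
    return sum(a * b for a, b in zip(rope(q, m), rope(k, n)))
```

Up to rounding, `rope_score(q, k, 5, 2)` equals `rope_score(q, k, 105, 102)`, because R(mθ)^T R(nθ) = R((n − m)θ) for each rotated pair. The abstract's claim is that standard linear attention does not preserve this relative-position structure.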

2605.20649 2026-05-21 eess.SP cs.AI cs.LG Updated

AMAR: Lightweight Attention-Based Multi-User Activity Recognition from Wi-Fi CSI

Amirhossein Mohammadi, Hina Tabassum

Affiliations: Department of Electrical Engineering and Computer Science

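AI summary: This paper proposes AMAR, a lightweight attention-based multi-user activity recognition framework. It casts activity recognition as a set prediction problem and uses a Transformer architecture within an edge-cloud hybrid design. The result is high-accuracy recognition of concurrent activities in multi-user environments, with substantially lower bandwidth usage and occupancy estimation error.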

Comments 25 pages, 6 figures, 3 tables

Abstract

Wi-Fi-based human activity recognition (HAR) has emerged as a promising approach for contactless sensing, leveraging channel state information (CSI) collected from wireless transceivers. While existing studies have primarily concentrated on single-user scenarios, real-world deployments often involve multi-user settings where concurrent users' movements induce overlapping CSI patterns that challenge conventional classification methods. To address this limitation, this paper introduces an attention-based multi-user activity recognition (AMAR) framework that formulates HAR as a set prediction problem. The transformer-based architecture in AMAR leverages learnable query embeddings acting as specialized activity detectors, enabling the simultaneous identification of multiple activities from composite CSI representations. Moreover, to address deployment constraints, AMAR adopts an edge-cloud split architecture in which lightweight convolutional networks on edge devices perform initial feature extraction, followed by residual vector quantization that achieves substantial bandwidth reduction while preserving activity-discriminative information. The cloud component performs final activity prediction through attention-based set matching, enabling the system to handle varying occupancy levels. Averaged across classroom, meeting-room, and empty-room environments, AMAR nearly doubles the rate of perfectly predicting all concurrent activities compared to the best baseline. Moreover, it achieves an $F_1$-score of 53.4% compared to 45.6% for the best benchmark, and reduces occupancy estimation error by 74%, while substantially reducing bandwidth.
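Residual vector quantization (RVQ), which AMAR uses on its edge-to-cloud link, encodes a feature vector as a short list of codebook indices. Each stage quantizes whatever the previous stages left over. Below is a minimal sketch with tiny, hand-picked codebooks. The codebook values are hypothetical: AMAR's codebooks are learned, and the abstract does not give its feature dimensions or codebook sizes.

```python
from typing import List, Sequence

Vec = List[float]
Codebook = Sequence[Vec]


def nearest(codebook: Codebook, v: Vec) -> int:
    """Index of the codeword with the smallest squared Euclidean distance to v."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - x) ** 2 for c, x in zip(codebook[i], v)))


def rvq_encode(codebooks: Sequence[Codebook], v: Vec) -> List[int]:
    """Greedy stage-by-stage encoding; each stage sees the previous stages' residual."""
    residual = list(v)
    indices: List[int] = []
    for cb in codebooks:
        idx = nearest(cb, residual)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, cb[idx])]  # pass the remainder on
    return indices


def rvq_decode(codebooks: Sequence[Codebook], indices: Sequence[int]) -> Vec:
    """Reconstruction is the sum of the selected codewords from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for cb, idx in zip(codebooks, indices):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out
```

With two 4-entry codebooks, each vector is sent as 2 × log2(4) = 4 bits of indices instead of two 32-bit floats. The unsent final residual is the quantization error.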

2605.20644 2026-05-21 cs.LG cs.AI cs.RO Updated

Time-Dependent PDE-Constrained Optimization via Weak-Form Latent Dynamics

April Tran, Terry Haut, David Bortz, Youngsoo Choi

Affiliations: Department of Applied Mathematics, University of Colorado; Center for Applied Scientific Computing, Lawrence Livermore National Laboratory

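AI summary: This paper proposes a weak-form latent-space reduced-order modeling framework for accelerating gradient-based PDE-constrained optimization. It compresses high-dimensional solution trajectories and identifies parametric latent dynamics with weak-form system identification, enabling efficient optimization in many-query design and control settings.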

Abstract

Optimization problems constrained by high-dimensional, time-dependent partial differential equations require repeated forward and sensitivity solves, making high-fidelity optimization computationally prohibitive in many-query design and control settings. We present a weak-form latent-space reduced-order modeling framework for accelerating gradient-based PDE-constrained optimization. The proposed approach builds on Weak-form Latent Space Dynamics Identification (WLaSDI), which compresses high-dimensional solution trajectories into a low-dimensional latent representation and identifies parametric latent dynamics using weak-form system identification. By avoiding explicit numerical differentiation of training trajectories, the weak formulation improves robustness to noisy data and yields more reliable surrogate dynamics for optimization. We formulate the resulting reduced PDE-constrained optimization problem and derive both direct-sensitivity and adjoint-based gradient expressions for the learned latent dynamics, enabling scalable gradient evaluation with respect to design parameters. The framework is demonstrated on three time-dependent benchmark problems: thermal radiative transfer for optimal hohlraum design, the two-stream instability Vlasov-Poisson system, and the inviscid Burgers equation. Across these examples, WLaSDI produces accurate optimal designs, remains robust under noisy training data, and delivers substantial computational savings, including speedups of up to five orders of magnitude relative to full-order optimization. These results demonstrate that weak-form latent dynamics provide an efficient and noise-robust surrogate foundation for gradient-based optimization of complex time-dependent PDE systems.
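The weak form avoids differentiating the trajectory. For a test function φ that vanishes at both ends of a window, integration by parts turns ∫ φ ż dt into −∫ φ̇ z dt, so only z itself is ever integrated. Below is a minimal sketch that identifies a scalar linear latent model ż = a z. This is a toy stand-in for WLaSDI's parametric latent dynamics. The polynomial bump test functions, the windows, and the trapezoid quadrature are all assumptions.

```python
from typing import List, Sequence, Tuple


def trapz(ys: Sequence[float], ts: Sequence[float]) -> float:
    """Trapezoidal rule on a (possibly non-uniform) time grid."""
    return sum(0.5 * (ys[i] + ys[i + 1]) * (ts[i + 1] - ts[i]) for i in range(len(ts) - 1))


def bump(t: float, t0: float, t1: float) -> float:
    """Test function phi(t) = (t - t0)^2 (t1 - t)^2 on [t0, t1], zero elsewhere."""
    if t < t0 or t > t1:
        return 0.0
    return (t - t0) ** 2 * (t1 - t) ** 2


def bump_dot(t: float, t0: float, t1: float) -> float:
    """Analytic derivative of bump(); the data themselves are never differentiated."""
    if t < t0 or t > t1:
        return 0.0
    return 2 * (t - t0) * (t1 - t) ** 2 - 2 * (t - t0) ** 2 * (t1 - t)


def weak_form_rate(ts: Sequence[float], zs: Sequence[float],
                   windows: Sequence[Tuple[float, float]]) -> float:
    """Least-squares a in dz/dt = a z from -int(phi_k' z) dt = a * int(phi_k z) dt."""
    g: List[float] = []  # int(phi_k z) dt
    b: List[float] = []  # -int(phi_k' z) dt, which equals int(phi_k dz/dt) dt by parts
    for t0, t1 in windows:
        g.append(trapz([bump(t, t0, t1) * z for t, z in zip(ts, zs)], ts))
        b.append(-trapz([bump_dot(t, t0, t1) * z for t, z in zip(ts, zs)], ts))
    return sum(gi * bi for gi, bi in zip(g, b)) / sum(gi * gi for gi in g)
```

Consider z(t) = 2e^(−t/2) sampled every 0.01 on [0, 4], with windows (0, 2), (1, 3) and (2, 4). For this trajectory the estimate lands close to a = −0.5.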

2605.20624 2026-05-21 cs.CV cs.AI cs.LG Updated

Accelerating Video Inverse Problem Solvers with Autoregressive Diffusion Models

Taesung Kwon, Jonghyun Park, Hyungjin Chung, Jong Chul Ye

Affiliations: KAIST; EverEx

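AI summary: This paper proposes the Autoregressive Video Inverse problem Solver (AVIS), which uses autoregressive diffusion models for streaming video restoration. AVIS substantially reduces initial latency and increases throughput while maintaining high restoration quality. The paper also proposes an accelerated variant, AVIS Flash, which reaches higher throughput with a better efficiency-performance trade-off, paving the way toward real-time deployment.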

Comments Project page is available here: https://avis-project.github.io/

Abstract

Diffusion models provide powerful priors for zero-shot video inverse problems, but their real-time deployment is hindered by two inefficiencies: high initial latency caused by holistic video restoration, and low throughput resulting from multiple VAE passes to enforce measurement consistency in pixel space. To overcome these limitations, we propose Autoregressive Video Inverse problem Solver (AVIS). The AVIS framework leverages autoregressive video diffusion models to restore videos in a streaming manner, naturally eliminating latency bottlenecks. Specifically, AVIS initializes reverse diffusion with a measurement-consistent estimate, reducing the required sampling steps. Compared to leading non-autoregressive solvers, AVIS drastically reduces initial latency from 114s to 4s and increases throughput from 0.71 to 1.18 FPS while achieving superior restoration quality. We further introduce a highly accelerated variant, dubbed AVIS Flash, that enforces measurement consistency solely on the first chunk. AVIS Flash substantially boosts throughput to 5.91 FPS on a single RTX 4090 GPU while maintaining competitive performance and achieving a favorable efficiency-performance trade-off, paving the way toward real-time deployment.

2605.20620 2026-05-21 cs.LG cs.DB cs.GT Updated

Dynamic Shapley Computation

Xuan Yang, Hsi-Wen Chen, Ming-Syan Chen, Jian Pei

Affiliations: Duke University; National Taiwan University

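AI summary: This paper proposes D-Shap, a framework that represents Shapley values as a player-by-task matrix so that training-data contribution estimates can be updated efficiently in dynamic environments. It exploits utility locality and coalition locality to enable fast updates and self-valuation.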

Abstract

Shapley-based data valuation provides a principled way to quantify the contribution of training data, but its high computational cost makes it impractical in dynamic settings where tasks and training players evolve. Existing methods treat Shapley computation as a one-shot process and collapse contributions into aggregated scores, preventing reuse and requiring recomputation under any change. We introduce a new perspective that represents Shapley values as a player-by-task matrix and formulates dynamic valuation as a structured matrix maintenance problem. We exploit the fact that each task depends on a small subset of training players and that similar tasks yield similar valuations, leading to utility locality and coalition locality. Based on these insights, we propose D-Shap, a dynamic valuation framework that enables efficient updates by modifying only a small portion of the matrix: new task valuations are inferred via structure-aware interpolation, while updates induced by new players are confined to affected local matrix blocks. To eliminate the need for pre-specified evaluation tasks, we introduce self-valuation, which constructs the initial matrix directly from training data, supported by scalable subset reuse and coverage-aware anchor selection. Experiments across diverse models show that D-Shap performs task updates in milliseconds and reduces the cost of player updates by up to three orders of magnitude, while achieving valuation quality competitive with full recomputation.
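To make the player-by-task matrix concrete: the exact Shapley value of player i for a task averages i's marginal contribution over all coalitions of the other players. Below is a brute-force sketch. Its cost is exponential, so it only suits a handful of players. The two utility functions are hypothetical toy tasks, and D-Shap's interpolation and block-update machinery is not reproduced here.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List, Sequence

Utility = Callable[[FrozenSet[int]], float]


def shapley(players: Sequence[int], u: Utility) -> Dict[int, float]:
    """phi_i = sum over S not containing i of |S|!(n-|S|-1)!/n! * (u(S + {i}) - u(S))."""
    n = len(players)
    values: Dict[int, float] = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):  # coalition size |S| = k
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                total += weight * (u(s | {i}) - u(s))
        values[i] = total
    return values


def shapley_matrix(players: Sequence[int], tasks: Sequence[Utility]) -> List[List[float]]:
    """Rows are players, columns are tasks (one utility function per task)."""
    cols = [shapley(players, u) for u in tasks]
    return [[col[p] for col in cols] for p in players]
```

In this example, an additive task returns each player's own weight. A task that pays 1 only when players 0 and 1 are both present splits that unit between them and gives player 2 nothing. Each column sums to u(all players) − u(∅).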

2605.20619 2026-05-21 cs.LG math.OC stat.ML Updated

ReversedQ: Opportunities for Faster Q-Learning in Episodic Online Reinforcement Learning

Sofia R. Miskala-Dinc, Aviva Prins

Affiliations: University of Maryland

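AI summary: This paper studies the efficiency of model-free Q-learning in episodic finite-horizon Markov decision processes (MDPs). It proposes ReversedQ, which speeds up learning by changing the order, frequency, and initialization of value-function updates. Experiments show that it outperforms existing methods on multiple tasks.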

Comments This paper contains 5 pages and 2 figures. To be presented at the Adaptive and Learning Agents workshop (ALA 2026) at AAMAS 2026
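One way to see why update order matters in episodic Q-learning is to replay an episode's transitions backwards. A terminal reward then reaches the start state within a single episode, instead of moving back one step per episode. Below is a minimal tabular sketch on a hypothetical deterministic chain. It illustrates reverse-order updates in general and is not the ReversedQ algorithm, whose details are not given in this listing. The chain, α = 1 and γ = 0.9 are assumptions.

```python
from typing import List, Tuple

Transition = Tuple[int, int, float, int, bool]  # (state, action, reward, next_state, done)
QTable = List[List[float]]


def q_update(Q: QTable, t: Transition, alpha: float, gamma: float) -> None:
    """Standard one-step Q-learning update; terminal transitions do not bootstrap."""
    s, a, r, s2, done = t
    target = r if done else r + gamma * max(Q[s2])
    Q[s][a] += alpha * (target - Q[s][a])


def learn_episode(Q: QTable, episode: List[Transition], alpha: float = 1.0,
                  gamma: float = 0.9, reverse: bool = False) -> None:
    """Apply updates in visit order, or last-to-first when reverse=True."""
    for t in (reversed(episode) if reverse else episode):
        q_update(Q, t, alpha, gamma)


def chain_episode(n_states: int) -> List[Transition]:
    """Always move right (action 1); entering the last state pays 1 and ends the episode."""
    goal = n_states - 1
    return [(s, 1, 1.0 if s + 1 == goal else 0.0, s + 1, s + 1 == goal)
            for s in range(goal)]
```

After one 5-state episode, forward updates leave Q[0][1] at 0. Reverse updates set it to 0.9³ = 0.729.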

2605.20581 2026-05-21 cs.LG cond-mat.mtrl-sci Updated

TriForces: Augmenting Atomistic GNNs for Transferable Representations

Ali Ramlaoui, Alexandre Duval, Hannah Bull, Victor Schmidt, Hugues Talbot, Fragkiskos D. Malliaros, Joseph Musielewicz

Affiliations: Université Paris-Saclay, CentraleSupélec, Inria, Gif-sur-Yvette, France

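AI summary: TriForces separates compositional and structural information and combines this with self-supervised learning, improving performance on MatBench and QM9 without DFT labels. It also enables efficient retrieval of similar structures on OMat24.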

Comments 28 pages, 11 figures. Accepted at ICML 2026

Group-Aware Matrix Estimation with Latent Subspace Recovery

Hamza Golubovic, Matthew Shen, Genevera I. Allen, Tarek M. Zikry

Affiliations: Department of Statistics, Columbia University; Irving Institute for Cellular Dynamics, Zuckerman Institute, Columbia University; School of Data and Information Sciences, University of North Carolina at Chapel Hill

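AI summary: This paper proposes GAME, a convex estimator for group-specific low-rank matrix estimation on heterogeneous data. Overlapping nuclear-norm penalties let it recover subgroup-specific subspace structure while preserving local latent structure in a shared coordinate system. Experiments on several datasets show that it is competitive with or better than standard low-rank methods, particularly under structured missingness.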

Comments 12 pages, 6 main figures, 1 main algorithm

Abstract

Modern matrix completion problems often involve heterogeneous data whose rows simultaneously belong to many meta-categories, such as demographic and age groups in recommendation systems, or region and recording session labels in neural electrophysiological experiments. Standard low-rank estimators impose a single global latent geometry, which can recover average structure but may smooth away subgroup-specific variation, especially when observations are unevenly distributed across groups. We introduce Group-Aware Matrix Estimation (GAME), a convex estimator for overlapping subgroup-wise low-rank matrix estimation. GAME regularizes category-specific submatrices through overlapping nuclear-norm penalties, allowing related groups to borrow information while preserving local latent structure in a shared coordinate system. We provide finite-sample guarantees for both reconstruction error and subgroup-specific subspace recovery, showing how performance depends on sampling density, subgroup rank, and overlap structure. Experiments on synthetic, recommendation, ecological, and neuroscience datasets show that GAME is most beneficial in structured missingness regimes, where subgroup-aware regularization improves both reconstruction accuracy and latent subspace fidelity. Across these benchmarks, GAME is competitive or best among global low-rank, side-information, and modern imputation baselines, with the largest gains when subgroups exhibit distinct low-rank structure.

2605.20555 2026-05-21 cs.LG cs.AI Updated

Complementing reinforcement learning with SFT through logit averaging in the post training of LLMs

Xingwei Gan, Ying Zhu

Affiliations: UC San Diego

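AI summary: This paper proposes complementing reinforcement learning with SFT through logit averaging in LLM post-training, and integrates the method into Group Relative Policy Optimization (GRPO). Without KL regularization or a critic, the logit-averaging structure couples the trainable policy with a reference policy. This exploits the trainable policy's reasoning ability while retaining the formatting advantages of SFT.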
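The summary does not give the exact coupling, so the following is an assumption. If the two logit vectors are averaged with equal weight before the softmax, the resulting token distribution is proportional to the geometric mean of the trainable and reference distributions. Below is a minimal sketch of that identity. The `weight` parameter is a hypothetical knob, not something taken from the paper.

```python
import math
from typing import List, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    """Numerically stable softmax (shift by the max before exponentiating)."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def averaged_policy(trainable_logits: Sequence[float], reference_logits: Sequence[float],
                    weight: float = 0.5) -> List[float]:
    """Softmax of (1 - weight) * trainable + weight * reference logits (hypothetical weighting)."""
    mixed = [(1 - weight) * a + weight * b
             for a, b in zip(trainable_logits, reference_logits)]
    return softmax(mixed)
```

With weight = 0.5, softmax((a + b)/2) ∝ sqrt(exp(a) · exp(b)), so tokens that either policy finds very unlikely stay unlikely under the combined policy.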

2605.20552 2026-05-21 stat.ML cs.LG Updated

Spectral bandits for smooth graph functions with applications in recommender systems

Tomáš Kocák, Michal Valko, Rémi Munos, Branislav Kveton, Shipra Agrawal

Affiliations: SequeL team, Inria France; Microsoft Research New England; Technicolor Research Center California; Microsoft Research Bangalore, India

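AI summary: This paper studies bandit problems for smooth functions on graphs and proposes a method for efficiently learning user preferences in recommender systems. It defines an effective dimension and gives algorithms whose performance scales linearly with this dimension, achieving low-regret online learning.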

Comments Published at AAAI 2014 - SDMBD

2605.20547 2026-05-21 cs.LG cs.AI stat.ML Updated

Latent Process Generator Matching

Lukas Billera, Hedwig Nora Nordlinder, Ben Murrell

Affiliations: Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet

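AI summary: This paper proposes a latent process generator matching framework that treats the observed generative state as a deterministic image of a tractable Markov process. This extends Generator Matching theory to time-dependent latent conditional processes.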

Comments 18 pages, 1 figure

Abstract

Many recent flow-matching and diffusion-style generative models rely on auxiliary stochastic dynamics during training: a richer process is simulated to define conditional targets, but the auxiliary state is either intractable to sample at generation time or simply not part of the desired output. Existing Generator Matching theory formalises conditioning on static latent random variables, and several recent papers prove special cases of projection results for particular augmented-state constructions. We introduce latent process generator matching, a general framework that treats the observed generative state as a deterministic image $X_t=Φ(Y_t)$ of a tractable Markov process $Y_t$. We show that in this setting one may learn the generator of a stochastic process on the image space which has the same one-time marginal distributions as the projected process. This generalizes and subsumes the discrete latent process results from the literature, and extends Generator Matching from static latent variables to a rich family of time-dependent latent conditional processes.
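A discrete-time, finite-state toy makes the projection result concrete. Let Y_t be a Markov chain with kernel K and let X_t = Φ(Y_t). A chain on the image space whose kernel averages K over the conditional law of Y_t given X_t reproduces every one-time marginal of X_t, even though X_t itself is generally not Markov. This is an assumption-laden discrete analogue: K and Φ are hand-picked, whereas the paper works with generators of continuous-time processes.

```python
from typing import List, Sequence

Dist = List[float]
Kernel = Sequence[Sequence[float]]


def step(p: Sequence[float], K: Kernel) -> Dist:
    """One Markov step: p_{t+1}(j) = sum_i p_t(i) K(i, j)."""
    return [sum(p[i] * K[i][j] for i in range(len(p))) for j in range(len(K[0]))]


def push_forward(p: Sequence[float], phi: Sequence[int], n_image: int) -> Dist:
    """Law of X = phi(Y) when Y ~ p."""
    q = [0.0] * n_image
    for y, prob in enumerate(p):
        q[phi[y]] += prob
    return q


def projected_kernel(p: Sequence[float], K: Kernel, phi: Sequence[int],
                     n_image: int) -> List[Dist]:
    """K~_t(a, b) = sum_{phi(y)=a} P(Y_t=y | X_t=a) * sum_{phi(y')=b} K(y, y')."""
    q = push_forward(p, phi, n_image)
    Kt = [[0.0] * n_image for _ in range(n_image)]
    for y, prob in enumerate(p):
        a = phi[y]
        if q[a] == 0.0:
            continue  # X_t = a has probability zero, so this row never contributes
        for y2, k in enumerate(K[y]):
            Kt[a][phi[y2]] += (prob / q[a]) * k
    return Kt


def marginal_gaps(p0: Sequence[float], K: Kernel, phi: Sequence[int],
                  n_image: int, steps: int) -> List[float]:
    """Run Y with K and an image-space chain with K~_t; report max |law(X_t) - law(phi(Y_t))|."""
    p = list(p0)
    x = push_forward(p0, phi, n_image)  # start the image chain at the projected law
    gaps: List[float] = []
    for _ in range(steps):
        x = step(x, projected_kernel(p, K, phi, n_image))  # kernel built from time-t law
        p = step(p, K)
        gaps.append(max(abs(u - v) for u, v in zip(x, push_forward(p, phi, n_image))))
    return gaps
```

The projected kernel is time-dependent because it depends on the current law of Y_t. That is what lets it mimic the marginals of a non-Markov projection.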

2605.20545 2026-05-21 stat.ML cs.LG Updated

Sample Complexity of Transfer Learning: An Optimal Transport Approach

Haoyang Cao, Xin Guo, Wenpin Tang, Guan Wang

Affiliations: Tsinghua-Berkeley Shenzhen Institute

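AI summary: This paper analyzes the sample efficiency of transfer learning from an optimal transport perspective. It finds that when the data dimension $d$ exceeds 3, the sample complexity of transfer learning is $O(m^{-(α+1)/d})$, compared with $O(m^{-p/d})$ for direct learning. Here $α$ is the smoothness of the data distribution and $p$ is the smoothness of the optimal target model.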

Abstract

Transfer learning is an essential technique for many machine learning/AI models of complex structures such as large language models and generative AI. The essence of transfer learning is to leverage knowledge from resolved source tasks for a new target task, especially when the sample size $m$ of the training data for the latter is low. In this work, we rigorously analyze the potential benefit of transfer learning in terms of sample efficiency. Specifically, taking an optimal transport viewpoint of transfer learning, we find that when the data dimension $d$ is higher than $3$, the sample complexity for transfer learning is $O(m^{-(α+1)/d})$, with $α$ indicating the smoothness of the data distribution, as opposed to the $O(m^{-p/d})$ sample complexity for direct learning with $p$ indicating the smoothness of the optimal target model. Our finding theoretically supports a better sample efficiency for transfer learning, when the target task is optimizing over a family of not-so-smooth models (i.e., highly complex networks with the possible use of non-smooth activation functions). Using image classification as an example, we numerically demonstrate the sample efficiency for transfer learning, that is, in the data hungry regime, the model performance can be significantly improved by transfer learning.
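Plugging numbers into the two rates shows when the transfer bound is the smaller one. Constants hidden in O(·) are dropped, and the values of m, d, α and p below are purely illustrative.

```python
def transfer_rate(m: int, d: int, alpha: float) -> float:
    """m^{-(alpha+1)/d}, the transfer-learning rate with constants dropped (stated for d > 3)."""
    if d <= 3:
        raise ValueError("the stated rate assumes data dimension d > 3")
    return m ** (-(alpha + 1) / d)


def direct_rate(m: int, d: int, p: float) -> float:
    """m^{-p/d}, the direct-learning rate with constants dropped."""
    return m ** (-p / d)


def transfer_bound_is_smaller(alpha: float, p: float) -> bool:
    """For any m > 1 the ratio is m^{-(alpha+1-p)/d}, which is below 1 iff alpha + 1 > p."""
    return alpha + 1 > p
```

Take m = 10,000, d = 8 and α = p = 1. The transfer rate is 10^(−1) = 0.1, while the direct rate is 10^(−0.5) ≈ 0.316. This matches the abstract's point that the advantage appears when the target model class is not very smooth (small p).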

2605.20030 2026-05-21 cs.LG math.OC Updated

Does Volatility Forecasting Lead to Better Portfolios? Empirical Evidence from Graph Neural Networks

Rylan Wade

Affiliations: University of Southern California

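AI summary: This paper asks whether graph neural networks improve realized-volatility forecasts, and whether those forecasts improve portfolio performance. Using weekly realized volatility for 465 S&P 500 stocks over 2015-2025, it compares heterogeneous autoregressive (HAR) and long short-term memory (LSTM) baselines with graph neural network models. The graph models are built on rolling-correlation, sector, and Granger-causality graphs, each with and without macroeconomic regime features. Empirically, three different models achieve the lowest forecast error, the best cross-sectional ranking accuracy, and the highest portfolio Sharpe ratio. Forecast accuracy and ranking quality are related to portfolio performance but are not the same thing. Graph volatility models add value only when the investment rule can exploit the cross-sectional structure they encode.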
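The HAR baseline regresses next-period realized volatility on averages of past volatility over several horizons. Below is a minimal feature-construction sketch for weekly data. The 1-, 4- and 13-week windows are an assumption: the classic HAR model uses daily, weekly and monthly averages of daily data, and this summary does not state the paper's lags.

```python
from typing import List, Sequence, Tuple

Windows = Tuple[int, ...]


def har_features(rv: Sequence[float], windows: Windows = (1, 4, 13)) -> List[List[float]]:
    """Row for period t = [mean of rv over the last w periods ending at t, for each w].
    Rows start at the first t for which the longest window is complete."""
    longest = max(windows)
    return [[sum(rv[t - w + 1 : t + 1]) / w for w in windows]
            for t in range(longest - 1, len(rv))]


def har_dataset(rv: Sequence[float],
                windows: Windows = (1, 4, 13)) -> Tuple[List[List[float]], List[float]]:
    """Pair the features at t with the one-step-ahead target rv[t + 1]."""
    feats = har_features(rv, windows)
    start = max(windows) - 1
    X = feats[:-1]             # drop the last row: its target lies beyond the sample
    y = list(rv[start + 1 :])  # rv[t + 1] for t = start, ..., len(rv) - 2
    return X, y
```

The HAR coefficients are then an ordinary least-squares fit of y on X plus an intercept.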

2605.19138 2026-05-21 cs.RO cs.AI cs.LG Updated

COBALT: Crowdsourcing Robot Learning via Cloud-Based Teleoperation with Smartphones

Ayush Agarwal, Ansh Gandhi, Jeremy A. Collins, Omar Rayyan, Aryan Sarswat, Ranjani Koushik, Masoud Moghani, Ajay Mandlekar, Animesh Garg

Affiliations: Georgia Institute of Technology; University of California, Berkeley; New York University Abu Dhabi (NYUAD); University of Toronto; NVIDIA

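AI summary: This paper presents COBALT, a cloud-based teleoperation platform for collecting high-quality robot learning data at scale from commonly available devices such as smartphones. It aims to make robot learning more efficient in both simulation and the real world.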

Abstract

The scarcity of large-scale, high-quality demonstration data remains a bottleneck in scaling imitation learning for robotic manipulation. We present COBALT, a teleoperation platform designed to democratize robot learning at scale both in simulation and in the real world. By leveraging vectorized environments, our scalable, load-balanced infrastructure supports concurrent teleoperation by multiple users on a single GPU, yielding a significant reduction in teleoperation cost. Operators can connect from nearly anywhere on Earth using commonly available devices, including single or dual smartphones, VR headsets, 3D mice, and keyboards. An in-memory data cache and efficient video streaming keep control and rendering synchronous, sustaining dozens of concurrent users at 20 Hz with sub-100 ms end-to-end latency for up to 8 concurrent users per GPU. We also demonstrate stable operation supporting 256 simulated clients across 8 GPUs, underscoring the system's ability to scale across hardware and within individual servers. We perform a comprehensive user study showing that phone-based teleoperation performs comparably to or better than specialized hardware, enabling faster, more ergonomic data collection. To ensure data quality, COBALT logs a suite of real-time metrics to automatically filter suboptimal demonstrations. We further demonstrate that a structured user training curriculum significantly improves data collection quality. Guided by insights from our user study, we crowdsource the collection of a large-scale, high-quality pilot dataset with 7500+ demonstrations (50+ hours) collected with smartphones across nine countries over five days. We validate the dataset's quality by training state-of-the-art imitation learning algorithms. Please visit https://cobalt-teleop.github.io/ for more details.
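The per-GPU figure suggests a simple admission policy: send each new operator to the least-loaded GPU that still has a free slot. Below is a minimal sketch. The cap of 8 users per GPU comes from the abstract. The class, its method names and the GPU count in the example are hypothetical. COBALT's actual load balancer is not described in enough detail here to reproduce.

```python
from typing import Dict, List, Optional


class LeastLoadedBalancer:
    """Assign each operator to the GPU with the fewest active users, up to a per-GPU cap."""

    def __init__(self, n_gpus: int, capacity: int = 8):
        self.capacity = capacity
        self.loads: List[int] = [0] * n_gpus
        self.assignment: Dict[str, int] = {}

    def connect(self, user: str) -> Optional[int]:
        gpu = min(range(len(self.loads)), key=lambda g: self.loads[g])  # ties -> lowest id
        if self.loads[gpu] >= self.capacity:
            return None  # every GPU is full; the caller should queue or reject the user
        self.loads[gpu] += 1
        self.assignment[user] = gpu
        return gpu

    def disconnect(self, user: str) -> None:
        gpu = self.assignment.pop(user)
        self.loads[gpu] -= 1
```

With two GPUs, the first 16 operators alternate between GPU 0 and GPU 1 and the 17th is refused. A slot freed by a disconnect is reused immediately. The capacity is a parameter: the 256-client, 8-GPU stress test in the abstract corresponds to 32 simulated clients per GPU.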

2605.18860 2026-05-21 cs.LG cs.CV Updated

Spectral structural distortion reveals redundant neurons in neural networks

Yongyu Wang

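AI summary: This paper proposes a criterion for neuron redundancy based on spectral structural distortion. By analyzing relational structure before and after each layer's transformation, it identifies neurons that can be removed while preserving task performance.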

Abstract

Overparameterized neural networks often contain many removable neurons, yet what makes a neuron redundant remains poorly understood. Existing pruning criteria commonly rely on local quantities such as weight magnitude, activation strength, or gradient sensitivity, but these measures provide limited insight into the structural role of a neuron in the transformation performed by a layer. Here we show that neuronal redundancy can be characterized by weak participation in the spectral structural distortion induced by layer-wise representation transformations. For each hidden layer of a trained network, we record pre-activation and post-activation hidden states, model neurons as graph nodes, and construct input-side and output-side graphs that describe neuron-level relational structure before and after the layer transformation. We then define a spectral structural importance score that measures the contribution of each neuron to the dominant graph-spectral distortion between these two relational structures. Low-participation neurons are treated as structurally redundant and removed through an iterative pruning process in which scores are recomputed after each structural change. No parameter updates are performed during intermediate pruning rounds; after the target parameter reduction is reached, a single recovery fine-tuning stage is applied to the compact model. Direct ablation analysis and experiments across conventional neural networks, encoder-only Transformers, and decoder-only language models show that this graph-spectral criterion identifies removable neurons and Transformer units while preserving task performance after compression. These results suggest that neural redundancy is not merely a consequence of small weights or weak activations, but can be understood through weak participation in the spectral distortion of layer-wise relational structure.

2605.18833 2026-05-21 cs.LG cs.AI Updated

Automated Big Data Quality Assessment using Knowledge Graph Embeddings

Hadi Fadlallah, Rima Kilany, Mitri Haber, Ali Jaber

Affiliations: Saint-Joseph University; Lebanese University

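AI summary: This paper proposes an automated big data quality assessment method based on knowledge graph embeddings. It integrates diverse knowledge graph representations and uses contextual information to generate a comprehensive data quality assessment plan tailored to each context.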

Comments 17 pages, 10 figures

DOI: 10.1504/IJDMMM.2025.150987
Journal ref: International Journal of Data Mining, Modelling and Management 17.4 (2025) 383-405

Abstract

Automated data quality assessment is crucial for managing big data, but existing solutions face challenges in achieving accurate context-aware assessment. This paper presents a novel knowledge-based approach to enhance automated data quality assessment. Our approach utilizes knowledge graph embeddings to predict missing edges between the input dataset's context representation and the relevant quality rules and dimensions within a knowledge graph representing contextual data characteristics and the required quality assessment operations. We surpass conventional practices by integrating diverse representations within the knowledge graph, drawing insights from contextual information from a thorough literature investigation. This integration allows us to develop a comprehensive and context-specific data quality assessment plan tailored to each context. Leveraging the knowledge graph improves our understanding of the input dataset's context, overcoming the limitations of traditional methods that rely solely on strict matching and overlook contextual characteristics. By injecting numerical edge attributes, we assign corresponding weights to each predicted quality measurement, providing a comprehensive data quality assessment plan for the input dataset. To evaluate our approach, we leverage AmpliGraph, a framework developed and benchmarked by AccentureLabs. The evaluation involves employing a real-world radiation sensors dataset provided by the Lebanese Atomic Energy Commission (LAEC-CNRS). The results obtained from this evaluation demonstrate the capability of our solution to generate a comprehensive data quality assessment plan for the given input dataset.
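Link prediction with knowledge graph embeddings scores candidate triples (head, relation, tail) and ranks the missing edges by that score. Below is a minimal sketch of one common scoring model, TransE. The 2-D embeddings and the entity and relation names are hypothetical. AmpliGraph provides TransE alongside other embedding models, but the abstract does not say which model the authors used, and this code does not call AmpliGraph.

```python
import math
from typing import Dict, List, Sequence, Tuple

Embedding = Sequence[float]


def transe_score(h: Embedding, r: Embedding, t: Embedding) -> float:
    """TransE plausibility -||h + r - t||_2: higher means the triple is more plausible."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(head: str, relation: str, candidates: Sequence[str],
               ent: Dict[str, Embedding], rel: Dict[str, Embedding]) -> List[Tuple[str, float]]:
    """Score (head, relation, c) for every candidate tail c, best first."""
    scored = [(c, transe_score(ent[head], rel[relation], ent[c])) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In the paper's setting, the ranked tails would be quality rules or dimensions linked to a dataset's context. The weights attached to each predicted measurement come from numerical edge attributes, which this sketch does not model.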

2605.18579 2026-05-21 cs.LG Updated

S2Aligner: Pair-Efficient and Transferable Pre-Training for Sparse Text-Attributed Graphs

Yuhan Wang, Haopeng Zhang, Yibo Ding, Jiaqi Yu, Xinyu Zhao, Yuhang Liu, Ziwei Zhang, Xiao Wang, Ruijie Wang

Affiliations: Beihang University; Tianjin University

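AI summary: This paper proposes S2Aligner, a pair-efficient and transferable pre-training method for sparse text-attributed graphs. It decouples semantic alignment from structural modeling, so that structural signals strengthen alignment without contaminating the shared semantic space, thereby reducing the cross-domain generalization gap.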

Comments 19 pages

Abstract

Pre-training on text-attributed graphs (TAGs) is central to building transferable graph foundation models, where LLM-as-Aligner methods align graph and text representations through the semantic knowledge of large language models. However, these methods usually assume that node texts provide sufficient and reliable supervision, an assumption often violated in real-world sparse TAGs. When textual anchors are missing, noisy, or uneven across domains, graph structures must be aligned with weak semantic evidence, leading to unreliable structure-semantics correspondence and sparsity-induced transfer bias. This paper presents S2Aligner, a sparsity-aware and structure-enhanced LLM-as-Aligner framework for graph-text pre-training on sparse TAGs. The key idea is to decouple semantic alignment from structural modeling, allowing topology-aware signals to enhance alignment without contaminating the shared semantic space. Specifically, S2Aligner decomposes graph-text representations into semantic and structural components, uses structure-oriented reconstruction with consistency control to inject reliable topology cues into text representations, and suppresses inconsistent structural signals under textual sparsity. Moreover, S2Aligner introduces sparsity-aware cross-domain risk balancing, which calibrates domain risks through a global-domain density ratio and downweights unreliable sparse samples via graph reliability estimation. Theoretical analysis shows that this objective reduces cross-domain generalization gaps by controlling domain risk discrepancy. Extensive experiments across diverse graph domains, sparsity levels, and downstream tasks demonstrate that S2Aligner consistently outperforms existing baselines.

2605.17946 2026-05-21 cs.AI cs.CV cs.LG Updated

SVFSearch: A Multimodal Knowledge-Intensive Benchmark for Short-Video Frame Search in the Gaming Vertical Domain

Lingtao Mao, Huangyu Dai, Xinyu Sun, Zihan Liang, Ben Chen, Chenyi Lei, Wenwu Ou

Affiliations: Kuaishou Technology

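AI summary: This paper introduces SVFSearch, the first multimodal knowledge-intensive benchmark for short-video frame search in the Chinese gaming domain. It provides 5,000 four-choice test examples and 4,198 auxiliary training examples, and evaluates approaches ranging from direct question answering to Plan-Act-Replan agents.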

Abstract

Multimodal large language models are increasingly used as agent backbones that understand multimodal inputs, plan retrieval actions, invoke external tools, and reason over retrieved information. Yet existing benchmarks rarely evaluate this ability in short-video applications, where a paused frame is often visually ambiguous and answering requires vertical, long-tail, and fast-evolving domain knowledge. We introduce SVFSearch, the first open benchmark for short-video frame search in the Chinese gaming domain. SVFSearch contains 5,000 four-choice test examples and 4,198 auxiliary training examples, each centered on a paused game scene from a real short-video clip. To support fair and reproducible evaluation, SVFSearch provides a frozen offline retrieval environment with a game-domain text corpus, a topic-linked image gallery, and text, image, and multimodal retrieval interfaces, avoiding reliance on uncontrolled web search APIs. We evaluate representative paradigms ranging from direct QA and RAG workflow to Plan-Act-Replan agents and learned search models. Results reveal a large gap between model-only answering, practical agentic search, and oracle knowledge: the best open-source direct-QA model reaches 66.4%, the best practical agent achieves 79.1%, and oracle knowledge reaches 95.4%. Further analysis exposes bottlenecks in visual grounding, retrieval quality, evidence-grounded reasoning, and tool-use behavior, including over-search, answer-only shortcuts, and retrieval-induced misleading.

2605.17164 2026-05-21 cs.DC cs.AI cs.LG cs.PL Updated

Charon: A Unified and Fine-Grained Simulator for Large-Scale LLM Training and Inference

Mengtian Yang, Zhekun Zhang, Mingheng Wu, Jianwen Yan, Hanshi Sun, Li-wen Chang

Affiliations: University of Texas at Austin

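AI summary: This paper presents the Charon simulator, which uses a unified, modular, and fine-grained approach to accurately predict the performance of large language models. Experiments show high accuracy across different models and configurations, with prediction error below 5.35%, and in a real inference deployment Charon identifies configurations that improve system throughput, demonstrating its practical value.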

Comments Accepted by MLSys 2026

2605.16812 2026-05-21 cs.LG cs.CR Updated

SEED: Targeted Data Selection via Weighted Independent Sets

Yuan Zhang, Lifeng Guo, Junwen Pan, Wenzhao Zheng, Wen Zhou, Kuan Cheng, Kurt Keutzer, Shanghang Zhang

Affiliations: School of Computer Science, Peking University; Beijing University of Posts and Telecommunications; Tianjin University; EECS, UC Berkeley; Chinese Academy of Sciences

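AI summary: This paper proposes SEED, which models data selection as a weighted independent set (WIS) problem on a similarity graph to balance sample quality against diversity, and introduces node value calibration and local scale normalization to make data selection more robust and scalable.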

Comments 20 pages

Abstract

Data selection seeks to identify a compact yet informative subset from large-scale training corpora, balancing sample quality against collection diversity. We formulate this problem as a Weighted Independent Set (WIS) on a similarity graph, where nodes represent data samples weighted by influence, and edges connect semantically redundant pairs. This formulation naturally yields subsets that are simultaneously high-quality and diverse. However, two challenges arise in practice: naive node weights fail to distinguish informative signals from gradient noise, and edge construction under heterogeneous domain distributions produces structurally imbalanced graphs that bias selection toward sparse regions. To address these issues, we introduce two principled refinements from a unified graph perspective: (1) \textit{node value calibration} that restricts influence estimation to the bilateral salient subspace to ground node importance in task-relevant signals rather than surface-level statistics; (2) \textit{local scale normalization} that adapts edge thresholds to local neighborhood density, mitigating graph imbalance induced by cross-domain distribution shifts. Together, these components yield a robust and scalable data selection pipeline dubbed SEED. We further construct \texttt{Honeybee-Remake-SEED-200K}, a compact multimodal dataset curated by SEED. Extensive experiments show that SEED consistently outperforms state-of-the-art methods on instruction tuning, visual instruction tuning, and semantic segmentation across diverse model families.
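
A minimal sketch of the weighted-independent-set idea, for illustration only. It assumes Euclidean distances between embeddings and a heaviest-first greedy approximation (exact maximum-weight independent set is NP-hard), and it borrows the local-scaling rule of self-tuning spectral clustering (sigma_i is the distance to the k-th nearest neighbour) as a stand-in for SEED's local scale normalization. The function names, the constant `c`, and the toy weights are hypothetical, not SEED's implementation.

```python
import math
from typing import Dict, List, Sequence, Set


def euclid(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_scaled_graph(points: List[Sequence[float]], k: int = 2,
                       c: float = 1.5) -> Dict[int, Set[int]]:
    """Redundancy graph whose edge threshold adapts to local density.

    sigma_i is the distance from node i to its k-th nearest neighbour;
    i and j are linked when d(i, j) <= c * sqrt(sigma_i * sigma_j), so dense
    regions get tighter thresholds than sparse ones (hypothetical rule).
    """
    n = len(points)
    d = [[euclid(points[i], points[j]) for j in range(n)] for i in range(n)]
    sigma = [sorted(d[i][j] for j in range(n) if j != i)[k - 1] for i in range(n)]
    graph: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if d[i][j] <= c * math.sqrt(sigma[i] * sigma[j]):
                graph[i].add(j)
                graph[j].add(i)
    return graph


def greedy_wis(weights: List[float], graph: Dict[int, Set[int]],
               budget: int) -> List[int]:
    """Heaviest-first greedy approximation to a weighted independent set."""
    chosen: List[int] = []
    blocked: Set[int] = set()
    for i in sorted(range(len(weights)), key=lambda i: -weights[i]):
        if len(chosen) >= budget:
            break
        if i not in blocked:
            chosen.append(i)
            blocked.update(graph[i])  # neighbours are redundant with i
    return chosen


# Toy data: a tight cluster (0-2) and a loose cluster (3-5).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 0.0), (11.0, 0.0), (10.0, 1.0)]
weights = [0.9, 0.8, 0.7, 0.6, 0.5, 0.95]  # stand-ins for influence scores
graph = local_scaled_graph(points)
print(greedy_wis(weights, graph, budget=3))  # one sample per cluster: [5, 0]
```

With a single global distance threshold small enough to merge only the tight cluster, the loose cluster would stay unconnected and all three of its points would be selected, which is the sparse-region bias the abstract describes; the density-adaptive threshold links both clusters.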

2605.15305 2026-05-21 cs.GR cs.LG Updated

WorldParticle: Unified World Simulation of Lagrangian Particle Dynamics via Transformer

Caoliwen Wang, Minghao Guo, Siyuan Chen, Heng Zhang, Mengdi Wang, Xingyu Ni, Hanson Sun, Kunyi Wang, Zherong Pan, Kui Wu, Lingjie Liu, Yin Yang, Chenfanfu Jiang, Taku Komura, Wojciech Matusik, Peter Yichen Chen

Affiliations: University of British Columbia; MIT CSAIL; Georgia Institute of Technology; Inria; Meta; Independent Researcher; University of Pennsylvania; University of Utah; University of California Los Angeles; University of Hong Kong

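AI summary: This paper proposes a particle simulator built on a Transformer architecture that simulates cloth, elastic solids, Newtonian and non-Newtonian fluids, granular materials, and molecular dynamics in a unified way, achieving efficient simulation and generalization through a prediction-correction design on a shared particle representation.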

Abstract

A unified simulator that can model diverse physical phenomena without solver-specific redesign is a long-standing goal across simulation science. We present a learning-based particle simulator built on a single transformer architecture to model cloth, elastic solids, Newtonian and non-Newtonian fluids, granular materials, and molecular dynamics. Our model follows a prediction-correction design on a shared Lagrangian particle representation. An explicit predictor first advances particles under the known external forces, producing an intermediate state that captures externally driven motion but not inter-particle interactions. A learned corrector then predicts the residual position and velocity updates through three stages: a particle tokenizer that encodes local particle-particle, particle-boundary, and topology-guided interactions; a super-token encoder that hierarchically merges particle tokens into a compact set of super tokens via alternating self-attention and token merging; and a super-token decoder that lifts these super tokens back to particle resolution through cross-attention to predict per-particle position and velocity corrections. Progressive token merging reduces the attention cost at successive encoder layers by halving the token count at each level, and the decoder communicates through the compact super-token set rather than full particle-to-particle attention. Across the six dynamics categories, the same architecture generalizes to unseen materials, boundary configurations, initial conditions, and external forces. We further demonstrate downstream interactive control, inverse design, and learning from real-world manipulation data, reducing the need for per-phenomenon solver engineering.
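
The prediction-correction split and the cost effect of halving tokens can be sketched as follows. This is a hypothetical illustration: the predictor uses semi-implicit Euler (an assumption; the abstract only says "explicit"), `zero_corrector` is a placeholder for the learned tokenizer, encoder, and decoder stack, and the cost model assumes one self-attention layer per merge level.

```python
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]


def predict(x: List[Vec3], v: List[Vec3], f_ext: List[Vec3],
            mass: List[float], dt: float) -> Tuple[List[Vec3], List[Vec3]]:
    """Explicit predictor: advance particles under known external forces only
    (semi-implicit Euler, no inter-particle interactions)."""
    v_star = [tuple(vi[d] + dt * fi[d] / m for d in range(3))
              for vi, fi, m in zip(v, f_ext, mass)]
    x_star = [tuple(xi[d] + dt * vs[d] for d in range(3))
              for xi, vs in zip(x, v_star)]
    return x_star, v_star


def step(x, v, f_ext, mass, dt, corrector: Callable):
    """Prediction-correction step: learned residuals are added to the prediction."""
    x_star, v_star = predict(x, v, f_ext, mass, dt)
    dx, dv = corrector(x_star, v_star)
    x_new = [tuple(a + b for a, b in zip(p, q)) for p, q in zip(x_star, dx)]
    v_new = [tuple(a + b for a, b in zip(p, q)) for p, q in zip(v_star, dv)]
    return x_new, v_new


def zero_corrector(x_star, v_star):
    """Placeholder for the learned corrector: always predicts no correction."""
    zeros = [(0.0, 0.0, 0.0)] * len(x_star)
    return zeros, zeros


def merged_token_counts(n_tokens: int, levels: int) -> List[int]:
    """Token count at each encoder level when every merge halves the count."""
    counts = [n_tokens]
    for _ in range(levels):
        counts.append(counts[-1] // 2)
    return counts


def attention_cost(counts: List[int]) -> int:
    """Pairwise self-attention cost, assuming one layer per level."""
    return sum(n * n for n in counts)


x1, v1 = step([(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(0.0, 0.0, -9.8)],
              [1.0], 0.1, zero_corrector)
counts = merged_token_counts(1024, 3)  # [1024, 512, 256, 128]
print(attention_cost(counts) / (len(counts) * 1024 ** 2))  # 0.33203125
```

Under this cost model, three merges starting from 1024 particles bring the total self-attention cost to about a third of four full-resolution layers; the real savings depend on the actual layer layout.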

2605.15157 2026-05-21 cs.RO cs.LG Updated

Hand-in-the-Loop: Improving VLA Policies for Dexterous Manipulation via Seamless Hand-Arm Intervention

Zhuohang Li, Liqun Huang, Wei Xu, Zhengming Zhu, Nie Lin, Xiao Ma, Xinjun Sheng, Ruoshi Wen

Affiliations: State Key Laboratory of Mechanical System and Vibration, School of Mechanical Engineering, Shanghai Jiao Tong University; Shanghai Key Laboratory of Intelligent Robotics, Meta Robotics Institute, Shanghai Jiao Tong University, Shanghai 200240, China; The University of Tokyo

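AI summary: This paper proposes Hand-in-the-Loop, which seamlessly integrates human intervention with autonomous policy execution to reduce abrupt changes in robot-hand configuration, improving the robustness and efficiency of bimanual dexterous manipulation.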

Abstract

Vision-Language-Action (VLA) models are prone to compounding errors in dexterous manipulation, where high-dimensional action spaces and contact-rich dynamics amplify small policy deviations over long horizons. While Interactive Imitation Learning (IIL) can refine policies through human correction data, applying it to high-degree-of-freedom (DoF) robotic hands remains challenging due to a command mismatch between human teleoperation and policy execution at the intervention moment, which causes abrupt robot-hand configuration changes, or "gesture jumps". We present Hand-in-the-Loop (HandITL), a seamless human-in-the-loop intervention method that blends human corrective intent with autonomous policy execution to avoid gesture jumps during bimanual dexterous manipulation. Compared with taking over control using direct teleoperation, HandITL reduces intervention jitter by 99.8% and preserves robust post-intervention manipulation, reducing grasp failures by 87.5% and mean completion time by 19.1%. We validate HandITL on tasks requiring bimanual coordination, tool use, and fine-grained long-horizon manipulation. When used to collect correction data for policy refinement, HandITL yields policies that outperform those trained with standard teleoperation data by 19% on average across three long-horizon dexterous tasks.
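
To make the "gesture jump" concrete, the sketch below compares an instantaneous takeover with a linear ramp from the policy command to the operator command. The linear ramp, the two-joint commands, and the helper names are hypothetical; the abstract does not specify how HandITL blends human intent with policy execution.

```python
from typing import List, Sequence

Command = List[float]  # e.g. hand joint targets


def blend(policy: Sequence[Command], human: Sequence[Command],
          ramp_steps: int) -> List[Command]:
    """Ramp the operator's share of the command from 0 to 1 over ramp_steps."""
    out: List[Command] = []
    for t, (p, h) in enumerate(zip(policy, human)):
        w = min(1.0, t / ramp_steps) if ramp_steps > 0 else 1.0
        out.append([(1.0 - w) * pi + w * hi for pi, hi in zip(p, h)])
    return out


def max_jump(traj: Sequence[Command]) -> float:
    """Largest per-step change of any joint along a command trajectory."""
    return max(max(abs(a - b) for a, b in zip(c0, c1))
               for c0, c1 in zip(traj, traj[1:]))


policy = [[0.0, 0.0]] * 11  # policy keeps the hand open
human = [[1.0, 0.5]] * 11   # operator commands a grasp
hard = [policy[0]] + human[1:]  # direct takeover at t = 1
smooth = blend(policy, human, ramp_steps=10)
# The hard switch jumps by 1.0 in one step; the ramp keeps each step near 0.1.
print(max_jump(hard), max_jump(smooth))
```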

2605.14364 2026-05-21 cs.LG Updated

MoRe: Modular Representations for Principled Continual Representation Learning on Sequential Data

Jiaqi Sun, Boyang Sun, Rasmy M. H., Xiangchen Song, Kun Zhang

Affiliations: Carnegie Mellon University; Mohamed bin Zayed University of Artificial Intelligence

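AI summary: This paper proposes MoRe, a framework that uses modular representations for principled continual learning on sequential data. Its core contribution is decomposing knowledge into an identifiable hierarchy of modules, enabling module reuse, alignment, and expansion, which improves plasticity and stability while preserving old modules.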

Abstract

Continual learning requires models to adapt to new data while preserving previously acquired knowledge. At its core, this challenge can be viewed as principled one-step adaptation: incorporating new information with minimal interference to existing representations. Most existing approaches address this challenge by modifying model parameters or architectures in a supervised, task-specific manner. However, the underlying issue is representational: tasks require distinct yet structured representations that can be selectively updated without disrupting representations, while structure should reflect intrinsic organization in the data rather than task boundaries. In sequential data, time-delayed dependencies provide a natural signal for uncovering this organization, revealing how fundamental representations give rise to more specific ones. Inspired by the modular organization of the human brain, we propose MoRe, a framework that identifies modularity in the representation itself rather than allocating it at the architectural level. MoRe decomposes knowledge into a hierarchy of fundamental and specific modules with identifiability guarantees, enabling principled module reuse, alignment, and expansion during adaptation while preserving old modules by construction. Experiments on synthetic benchmarks and real-world LLM activations demonstrate interpretable hierarchical structure and improved plasticity-stability trade-offs, suggesting MoRe as a principled foundation for continual adaptation.

2605.13302 2026-05-21 cs.LG cs.SY eess.SY Updated

Safe Bayesian Optimization for Uncertain Correlation Matrices in Linear Models of Co-Regionalization

Jannis Lübsen, Annika Eichler

Affiliations: Institute of Control Systems, Hamburg University of Technology

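AI summary: This paper extends the safety guarantees of multi-task Bayesian optimization from the intrinsic coregionalization model (ICM) to the linear model of coregionalization (LMC), which models inter-task correlations more flexibly by combining multiple latent components. It derives a uniform error bound for vector-valued functions sampled from a Gaussian process with an LMC kernel, and numerical comparisons on a safe multi-task Bayesian optimization benchmark show potential performance advantages of the LMC.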

Comments Accepted at IFAC WC26
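
As background for the ICM-to-LMC extension, the sketch below builds an LMC covariance for two tasks. It assumes scalar inputs, RBF latent kernels, and rank-1 coregionalization matrices B_q = a_q a_q^T; the mixing values are made up, and the paper's safety guarantees and error bounds are not reproduced.

```python
import math
from typing import Sequence


def rbf(x: float, y: float, lengthscale: float) -> float:
    """Squared-exponential latent kernel."""
    return math.exp(-0.5 * ((x - y) / lengthscale) ** 2)


def lmc_cov(x: float, i: int, y: float, j: int,
            mixing: Sequence[Sequence[float]],
            lengthscales: Sequence[float]) -> float:
    """Cov(f_i(x), f_j(y)) = sum_q a_q[i] * a_q[j] * k_q(x, y).

    With a single component (Q = 1) this reduces to the intrinsic
    coregionalization model (ICM): one task matrix times one kernel.
    """
    return sum(a[i] * a[j] * rbf(x, y, ell)
               for a, ell in zip(mixing, lengthscales))


# Two tasks, two latent components (hypothetical values): component 0 is
# smooth and shared, component 1 is rough and only affects task 1.
mixing = [[1.0, 0.8], [0.0, 0.6]]
lengthscales = [2.0, 0.2]
print(lmc_cov(0.0, 0, 0.0, 1, mixing, lengthscales))  # 0.8
```

With one component, every task shares a single kernel; the second, short-length-scale component here acts only on task 1, a structure ICM cannot express.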

2605.12483 2026-05-21 cs.LG cs.AI Updated

Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training

Yuanda Xu, Hejian Sang, Zhengze Zhou, Ran He, Zhipeng Wang, Alborz Geramifard

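AI summary: This paper proposes an empirical sparse-to-dense reward principle for language-model post-training: sparse rewards are used on a teacher model for exploration and discovery, and the resulting behavior is then compressed into the deployment model through dense supervision, outperforming GRPO on math problems.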

Abstract

In settings where labeled verifiable training data is the binding constraint, each checked example should be allocated to the model and reward density where it is most informative. We identify a reward-density principle that governs this allocation: sparse sequence-level reward is most useful on models that can explore and discover better behavior, while dense token-level teacher supervision is better suited for compressing that behavior into a smaller deployment model. The principle yields a simple allocation rule: use scarce labeled data upstream on the strongest available teacher, then transfer the reward-shaped behavior downstream as dense supervision. We evaluate this rule through a four-stage workflow -- teacher RL, forward-KL warmup, on-policy distillation, optional post-bridge student RL -- on verifiable math with Qwen3 and Llama models. At fixed Qwen3-1.7B deployment-student size, an RL-improved 8B teacher distilled through the dense bridge outperforms direct GRPO on the same student ($79.3\%$ vs.\ $75.9\%$ on MATH; $25.2\%$ vs.\ $19.8\%$ on AIME~2024, avg@16), while transfer from the same teacher \emph{before} RL underperforms. A component ablation confirms that each stage is load-bearing: replacing the RL-improved teacher with a raw teacher costs $7.8$ MATH points, removing the forward-KL warmup costs $1.7$, and removing on-policy distillation costs $3.3$. The teacher-quality ordering -- raw-teacher transfer $<$ direct GRPO $<$ RL-teacher transfer -- replicates on Llama-3.1-8B-Instruct with a Llama-3.3-70B-Instruct teacher. The operational lesson is to avoid spending scarce labeled data on the least prepared policy: use sparse reward for teacher-side discovery, dense transfer for student compression, and student-side sparse reward only after the bridge.
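
The difference between the two reward densities can be sketched on a single student-sampled response. This assumes the common on-policy distillation signal, a per-token log-ratio whose expectation under the student is the negative reverse KL to the teacher; the abstract does not give the exact losses of the forward-KL warmup or distillation stages, and the numbers below are illustrative.

```python
from typing import List


def sparse_rewards(n_tokens: int, correct: bool) -> List[float]:
    """Sequence-level verifiable reward: a single scalar on the last token."""
    return [0.0] * (n_tokens - 1) + [1.0 if correct else 0.0]


def dense_rewards(student_logp: List[float], teacher_logp: List[float]) -> List[float]:
    """Per-token log-ratio log p_teacher(y_t) - log p_student(y_t).

    Averaged over student samples, this is the negative reverse
    KL(student || teacher) at each position, so every token gets a signal.
    """
    return [t - s for s, t in zip(student_logp, teacher_logp)]


# Log-probs of one student-sampled 3-token response (illustrative numbers).
student = [-0.5, -1.0, -0.2]
teacher = [-0.4, -2.0, -0.2]
print(sparse_rewards(3, correct=True))  # [0.0, 0.0, 1.0]
print(dense_rewards(student, teacher))  # roughly [0.1, -1.0, 0.0]
```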

2605.12196 2026-05-21 cs.LG Updated

ECTO: Exogenous-Conditioned Temporal Operator for Ultra-Short-Term Wind Power Forecasting

Cao Yuan, Junjun Wang

Affiliations: Wuhan Polytechnic University; Wuhan Public Meteorological Service Center

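AI summary: This paper proposes ECTO, a unified framework whose physically grounded variable selection and exogenous-conditioned regime refinement modules efficiently model the non-stationary, condition-dependent nature of wind generation for ultra-short-term wind power forecasting, achieving the lowest mean squared error on wind farms with different climates, capacities, and exogenous-variable dimensions.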

Comments 42 pages, 10 figures, 9 tables

Abstract

Accurate ultra-short-term wind power forecasting is critical for grid dispatch and reserve management, yet remains challenging due to the non-stationary, condition-dependent nature of wind generation. Meteorological exogenous variables carry substantial predictive information, but the most informative variable combination varies across sites, operating conditions, and prediction horizons. Existing deep learning approaches either treat exogenous inputs as generic auxiliary channels through uniform mixing or soft gating, or rely on fixed preprocessing steps such as PCA, without exploiting the physical structure of meteorological variables. We propose ECTO (Exogenous-Conditioned Temporal Operator), a unified framework that decomposes exogenous variable modeling into two complementary modules. Physically-Grounded Variable Selection (PGVS) performs hierarchical, group-aware sparse selection over exogenous variables using a domain-informed physical prior and sparsemax activations, producing a compact, condition-adaptive exogenous context. Exogenous-Conditioned Regime Refinement (ECRR) routes the forecast through learned regime experts that apply gain--bias calibration and horizon-specific corrections via a mixture-of-experts paradigm. Experiments on three wind farms spanning different climates, capacities (66--200 MW), and exogenous dimensions (11--13 variables) demonstrate that ECTO achieves the lowest MSE across all sites, with relative improvements over the strongest baseline ranging from 2.2% to 5.2%, widening to 8.6% at the longer prediction horizon ($H=32$). Ablation analysis confirms that each exogenous-related component contributes positively (PGVS +1.84%, ECRR +2.86%), and interpretability analysis reveals that PGVS learns physically meaningful, site-specific variable selection patterns, while ECRR converges to well-separated calibration strategies consistent across sites.
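
Sparsemax, the activation named above, is the Euclidean projection onto the probability simplex (Martins and Astudillo, 2016) and, unlike softmax, can assign exactly zero weight. The sketch pairs it with a hypothetical two-level selection (sparsemax over variable groups, then within each group) and an additive prior on the logits; the grouping, the mean-pooled group scores, and the variable names in the comments are assumptions, not PGVS itself.

```python
from typing import Dict, List, Sequence


def sparsemax(z: Sequence[float]) -> List[float]:
    """Euclidean projection of z onto the probability simplex."""
    zs = sorted(z, reverse=True)
    cumsum, k_z, s_k = 0.0, 0, 0.0
    for k, zk in enumerate(zs, start=1):
        cumsum += zk
        if 1.0 + k * zk > cumsum:  # zk is still inside the support
            k_z, s_k = k, cumsum
    tau = (s_k - 1.0) / k_z  # threshold subtracted from every score
    return [max(zi - tau, 0.0) for zi in z]


def group_sparse_select(logits: List[float], prior: List[float],
                        groups: Dict[str, List[int]]) -> List[float]:
    """Hypothetical two-level selection: sparsemax over groups, then within each."""
    scores = [l + p for l, p in zip(logits, prior)]  # prior added to logits
    names = list(groups)
    group_scores = [sum(scores[i] for i in groups[g]) / len(groups[g]) for g in names]
    weights = [0.0] * len(scores)
    for gw, g in zip(sparsemax(group_scores), names):
        inner = sparsemax([scores[i] for i in groups[g]])
        for i, w in zip(groups[g], inner):
            weights[i] = gw * w
    return weights


# Hypothetical variables: 0 wind_speed_10m, 1 wind_speed_100m,
# 2 wind_direction, 3 temperature, 4 pressure.
groups = {"wind_speed": [0, 1], "direction": [2], "thermo": [3, 4]}
logits = [2.0, 1.8, 0.5, -1.0, -1.2]
prior = [0.0] * 5
print(group_sparse_select(logits, prior, groups))  # roughly [0.6, 0.4, 0.0, 0.0, 0.0]
```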

2605.11302 2026-05-21 cs.LG cs.AI cs.CL Updated

A Theory of Time-Sensitive Language Generation: Sparse Hallucination Beats Mode Collapse

Atul Ganju, Travis McVoy, Shaddin Dughmi, Shang-Hua Teng

Affiliations: University of Southern California

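AI summary: This paper studies language generation in the limit under a global preference ordering and introduces a time-sensitive formulation of generation, showing that sparse hallucination overcomes mode collapse and proving that optimal density is achievable under specific conditions.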

Abstract

We study language generation in the limit under a global preference ordering on strings, as introduced by Kleinberg and Wei. As is done in previous work, we aim for breadth, but impose an additional requirement of timeliness: higher-ranked strings should be generated earlier. A string is then only credited if it is generated before a deadline, where its deadline is defined by a function that maps a string's rank in the target language to the time by which it must be produced. This is in keeping with a central consideration in machine learning, where inductive bias favors ``simpler'' or ``more plausible'' outputs, all else being equal. We show that timely generation is impossible in a strong sense for eventually consistent generators -- the protagonists of most prior related work. Under what is perhaps the mildest natural relaxation of consistency, a hallucination rate that vanishes over time, we show that we can circumvent our impossibility result. In particular, we can achieve optimal density with respect to any superlinear deadline function. We also show this is tight by ruling out timely generation with linear deadlines and vanishing hallucination rate.
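
A toy, finite illustration of the deadline-crediting rule: a string of rank r counts only if it is output by time d(r). The ranking, the output schedule, and the hallucinated string `x` are hypothetical, and the prefix fraction is only a finite stand-in for the paper's asymptotic density; it does not reproduce the theorems.

```python
from typing import Callable, Dict, Set


def credited(output_time: Dict[str, int], rank: Dict[str, int],
             deadline: Callable[[int], float]) -> Set[str]:
    """Target-language strings produced no later than their deadline."""
    return {s for s, t in output_time.items()
            if s in rank and t <= deadline(rank[s])}


def prefix_density(hits: Set[str], rank: Dict[str, int], n: int) -> float:
    """Fraction of the n highest-ranked target strings that were credited."""
    return sum(1 for s, r in rank.items() if r <= n and s in hits) / n


rank = {"a": 1, "b": 2, "c": 3, "d": 4}                 # target language, ranked
output_time = {"a": 1, "x": 2, "b": 3, "c": 4, "d": 5}  # "x" is a hallucination
print(prefix_density(credited(output_time, rank, lambda r: r), rank, 4))      # 0.25
print(prefix_density(credited(output_time, rank, lambda r: r * r), rank, 4))  # 1.0
```

One early off-target output pushes every later string past a linear deadline, while a quadratic deadline absorbs the delay, which loosely mirrors the linear versus superlinear contrast in the abstract.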

2605.10830 2026-05-21 cs.CV cs.LG Updated

Predicting 3D structure by latent posterior sampling

Azmi Haider, Dan Rosenbaum

Affiliations: Department of Computer Science; University of Haifa; Department of Computational Science

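AI summary: This paper proposes a probabilistic modeling approach that combines a NeRF representation with diffusion models to accurately predict 3D structure from different types of observations, such as single-view, multi-view, noisy images, sparse pixels, and sparse depth data.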

Abstract

The remarkable achievements of both generative models of 2D images and neural field representations for 3D scenes present a compelling opportunity to integrate the strengths of both approaches. In this work, we propose a methodology that combines a NeRF-based representation of 3D scenes with probabilistic modeling and reasoning using diffusion models. We view 3D reconstruction as a perception problem with inherent uncertainty that can thereby benefit from probabilistic inference methods. The core idea is to represent the 3D scene as a stochastic latent variable for which we can learn a prior and use it to perform posterior inference given a set of observations. We formulate posterior sampling using the score-based inference method of diffusion models in conjunction with a likelihood term computed from a reconstruction model that includes volumetric rendering. We train the model using a two-stage process: first we train the reconstruction model while auto-decoding the latent representations for a dataset of 3D scenes, and then we train the prior over the latents using a diffusion model. By using the model to generate samples from the posterior we demonstrate that various 3D reconstruction tasks can be performed, differing by the type of observation used as inputs. We showcase reconstruction from single-view, multi-view, noisy images, sparse pixels, and sparse depth data. These observations vary in the amount of information they provide for the scene and we show that our method can model the varying levels of inherent uncertainty associated with each task. Our experiments illustrate that this approach yields a comprehensive method capable of accurately predicting 3D structure from diverse types of observations.
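
The key step, combining a learned prior score with a likelihood gradient, can be checked on a one-dimensional toy where the posterior is known in closed form. Here the diffusion prior over NeRF latents is replaced by a standard normal prior, the rendering likelihood by a Gaussian observation y = z + noise, and the annealed diffusion sampler by plain unadjusted Langevin dynamics; all of these are simplifying assumptions.

```python
import random
import statistics
from typing import List


def prior_score(z: float) -> float:
    """Score of the stand-in prior N(0, 1): d/dz log p(z)."""
    return -z


def likelihood_score(z: float, y: float, sigma: float) -> float:
    """Gradient of log N(y; z, sigma^2) with respect to z."""
    return (y - z) / sigma ** 2


def posterior_samples(y: float, sigma: float, n_chains: int = 1000,
                      n_steps: int = 400, eps: float = 0.01,
                      seed: int = 0) -> List[float]:
    """Unadjusted Langevin dynamics driven by prior score + likelihood gradient."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_chains):
        z = rng.gauss(0.0, 1.0)  # initialise from the prior
        for _ in range(n_steps):
            score = prior_score(z) + likelihood_score(z, y, sigma)
            z += eps * score + (2 * eps) ** 0.5 * rng.gauss(0.0, 1.0)
        samples.append(z)
    return samples


samples = posterior_samples(y=2.0, sigma=1.0)
print(statistics.mean(samples), statistics.pvariance(samples))
```

With y = 2 and sigma = 1 the exact posterior is N(1, 0.5), so the sample mean should land near 1 and the variance near 0.5 (slightly above, because of the finite Langevin step size).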

2605.08731 2026-05-21 cs.PF cs.LG Updated

Single-Thread JPEG Decoder Benchmarks Mis-Evaluate ML Data Loaders

Vladimir Iglovikov, Dmitry Kosarevsky

Affiliations: Ternaus; Independent Researcher

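AI summary: By evaluating different Python-accessible JPEG decode paths on five matched 16 vCPU Google Cloud CPUs, this paper finds that single-thread benchmarks do not accurately assess the performance of ML data loaders, and reveals differences between architectures and decoders under multi-worker workloads.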

Comments 10 pages, 4 figures. Code and data: https://github.com/ternaus/imread_benchmark

Abstract

JPEG decode is routine ML infrastructure, but Python decoder choices are often justified by single-process, single-thread microbenchmarks. We audit this evaluation assumption with thirteen Python-accessible JPEG decode paths on five matched 16 vCPU Google Cloud CPUs: Intel Emerald Rapids, AMD Zen 4, AMD Zen 5, ARM Neoverse V2, and ARM Neoverse N1. ImageNet validation is the workload, not a new dataset contribution: each run decodes the full 50,000-image split from memory and reports single-thread throughput for all decoders, PyTorch \texttt{DataLoader} throughput for eligible decoders at worker counts $\{0,2,4,8\}$, and decoder skip behavior. The evaluation protocol changes the supported conclusion. On Neoverse V2, \texttt{imageio} is ninth in single-thread throughput yet lands in the top DataLoader tier with \texttt{torchvision}; on Zen 4, \texttt{torchvision} rises from seventh single-thread to the top measured DataLoader tier; on Neoverse N1, \texttt{imagecodecs} is the single-thread leader but fifth at peak DataLoader throughput. We also find that worker-count conclusions differ between Zen 4 and Zen 5, TensorFlow has a large single-thread ARM penalty, and strict native JPEG decoders/wrappers reject the same rare ImageNet JPEG. For PyTorch DataLoader workloads, \texttt{torchvision} and \texttt{simplejpeg} form the strongest measured zero-skip tier: \texttt{torchvision} has the highest mean normalized throughput, while \texttt{simplejpeg} has the highest minimum. OpenCV remains a robust general-purpose fallback above 90\% of the platform-local winner on every tested CPU. We release raw JSON, generated tables/figures, and an executable local/cloud benchmark framework.
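
The two summary statistics quoted above, mean and minimum throughput normalized to the platform-local winner, can rank decoders differently. The sketch uses made-up throughputs for two hypothetical decoders `A` and `B`; the numbers are not measurements from the paper.

```python
from typing import Dict, Tuple

Table = Dict[str, Dict[str, float]]


def normalize(throughput: Table) -> Table:
    """Divide each decoder's throughput by the fastest decoder on that platform."""
    return {cpu: {dec: v / max(per.values()) for dec, v in per.items()}
            for cpu, per in throughput.items()}


def summarize(norm: Table) -> Dict[str, Tuple[float, float]]:
    """(mean, minimum) normalized throughput of each decoder across platforms."""
    decoders = next(iter(norm.values())).keys()
    return {dec: (sum(p[dec] for p in norm.values()) / len(norm),
                  min(p[dec] for p in norm.values()))
            for dec in decoders}


# Made-up images/s for two hypothetical decoders on three CPUs.
throughput = {
    "cpu1": {"A": 100.0, "B": 85.0},
    "cpu2": {"A": 100.0, "B": 90.0},
    "cpu3": {"A": 80.0, "B": 100.0},
}
stats = summarize(normalize(throughput))
print(max(stats, key=lambda d: stats[d][0]))  # best mean: A
print(max(stats, key=lambda d: stats[d][1]))  # best minimum: B
```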

2605.08123 2026-05-21 cs.LG cs.CL Updated

Block-Wise Differentiable Sinkhorn Attention: Tail-Refinement Gradients with a Gap-Aware Dustbin Bridge

Dylan Forde

Affiliations: Independent Researcher

Abstract

We study long-context balanced entropic optimal transport (OT) attention on TPU hardware through a stopped-base, fixed-depth tail-refinement surrogate. After a stopped $T$-step Sinkhorn solve, we unroll a short refinement tail and differentiate that surrogate exactly. For the reported $R=2$ TPU path, the backward pass contains four staircase plan factors. We prove an exact one-reference-tile schedule: the $R=2$ score cotangent is a single reference plan tile times an explicit modifier field built from vector cotangents and dual differences. This yields block-wise cost $O((T+R)LW)$, $O(Ld)$ input storage, and $O(L)$ additional HBM usage for fixed head dimension $d$ and band width $W$ on the balanced fixed-support path. We also formalize the current \texttt{dustbin\_block} path as the same unit-target surrogate on an augmented support, so the adjoint schedule lifts to the single-active-dustbin path used in our TPU runs; this bridge is algebraic and does not claim a general KL-unbalanced or arbitrary-capacity gap model. We provide a local surrogate-bias bound, an a posteriori bias certificate, and a projective contraction certificate for strictly positive active blocks. On synthetic masked problems, the optimized kernel matches exact autodiff of the same centered surrogate to within $10^{-5}$--$10^{-10}$. On TPU v6e-8, a four-configuration Pfam screen completes end-to-end, and a promoted balanced $R=2$ run sustains roughly $8.5$ examples per second through a three-hour budget, reaching step $1437$. Held-out Pfam test shards improve reconstruction from $5.57$ to $2.05$ and sparse CE from $5.53$ to $5.30$ relative to step $0$, with CE logged diagnostically rather than optimized directly; target-barycenter alignment metrics do not materially improve, and a deterministic diagonal reference remains stronger on those metrics.
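
A forward-only, pure-Python sketch of the stopped-base plus refinement-tail structure. It assumes dense (not banded or blocked) scores treated as negative costs, uniform marginals, and a log-domain Sinkhorn update; the stop-gradient boundary is marked in comments because plain Python has no autodiff, and none of the paper's TPU kernels or adjoint schedules are reproduced.

```python
import math
from typing import List

Matrix = List[List[float]]


def logsumexp(xs: List[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def sinkhorn_iter(s: Matrix, f: List[float], g: List[float], eps: float):
    """One log-domain Sinkhorn update toward uniform marginals 1/n and 1/m."""
    n, m = len(s), len(s[0])
    f = [eps * (-math.log(n) - logsumexp([(s[i][j] + g[j]) / eps for j in range(m)]))
         for i in range(n)]
    g = [eps * (-math.log(m) - logsumexp([(s[i][j] + f[i]) / eps for i in range(n)]))
         for j in range(m)]
    return f, g


def ot_attention(s: Matrix, eps: float, T: int, R: int) -> Matrix:
    """Balanced entropic-OT attention weights: T-step base, then R-step tail."""
    f, g = [0.0] * len(s), [0.0] * len(s[0])
    for _ in range(T):
        f, g = sinkhorn_iter(s, f, g, eps)
    # Stopped base: with autodiff, f and g would be wrapped in a
    # stop-gradient here, so gradients bypass the first T iterations.
    for _ in range(R):
        f, g = sinkhorn_iter(s, f, g, eps)  # differentiated refinement tail
    n = len(s)
    # Plan P_ij = exp((s_ij + f_i + g_j) / eps); n * P makes rows sum to ~1.
    return [[n * math.exp((s[i][j] + f[i] + g[j]) / eps) for j in range(len(g))]
            for i in range(n)]


scores = [[1.0, 0.2, 0.0], [0.1, 0.9, 0.3], [0.0, 0.4, 1.2]]
weights = ot_attention(scores, eps=0.5, T=50, R=2)
print([round(sum(row), 6) for row in weights])  # each row sums to ~1
```

In a JAX version, the base loop's `f` and `g` would pass through `jax.lax.stop_gradient` before the tail, so only the last R updates contribute to the backward pass.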

2605.06139 2026-05-21 cs.LG cs.AI Updated

Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex

Yun Qu, Qi Wang, Yixiu Mao, Heming Zou, Yuhang Jiang, Yingyue Li, Wutong Xu, Lizhou Cai, Weijie Liu, Clive Bai, Kai Yang, Yangkun Chen, Saiyong Yang, Xiangyang Ji

Affiliations: Department of Automation, Tsinghua University; LLM Department, Tencent

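AI summary: This paper proposes Listwise Policy Optimization (LPO), which performs target projection explicitly: it makes the implicit target concrete by restricting the proximal RL objective to the response simplex, then projects the policy via exact divergence minimization. LPO improves training performance across diverse reasoning tasks and LLM backbones while maintaining optimization stability and response diversity.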

Abstract

Reinforcement learning with verifiable rewards (RLVR) has become a standard approach for post-training large language models (LLMs) to incentivize reasoning capacity. Among existing recipes, group-based policy gradient is prevalent, which samples a group of responses per prompt and updates the policy via group-relative advantage signals. This work reveals that these optimization strategies share a common geometric structure: each implicitly defines a target distribution on the response simplex and projects toward it via first-order approximation. Building on this insight, we propose Listwise Policy Optimization (LPO) to explicitly conduct the target-projection, which demystifies the implicit target by restricting the proximal RL objective to the response simplex, and then projects the policy via exact divergence minimization. This framework provides (i) monotonic improvement on the listwise objective with bounded, zero-sum, and self-correcting projection gradients, and (ii) flexibility in divergence selection with distinct structural properties through the decoupled projection step. On diverse reasoning tasks and LLM backbones, LPO consistently improves training performance over typical policy gradient baselines under matched targets, while intrinsically preserving optimization stability and response diversity.
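
The target-then-project view can be sketched for one group of sampled responses. The target below is the standard closed-form maximizer of a KL-regularized (proximal) objective restricted to the group, and KL(q || pi_group) is only one possible projection divergence; both are assumptions about LPO's details, as are the function names.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def group_advantages(rewards: List[float]) -> List[float]:
    """Group-relative advantages: (reward - group mean) / group std."""
    mu = sum(rewards) / len(rewards)
    sd = math.sqrt(sum((r - mu) ** 2 for r in rewards) / len(rewards)) or 1.0
    return [(r - mu) / sd for r in rewards]


def target_on_simplex(logp: List[float], adv: List[float], beta: float) -> List[float]:
    """q proportional to pi_group * exp(A / beta), the maximizer of
    E_q[A] - beta * KL(q || pi_group) over the sampled group."""
    return softmax([lp + a / beta for lp, a in zip(logp, adv)])


def projection_grad(logp: List[float], q: List[float]) -> List[float]:
    """Gradient of KL(q || pi_group) w.r.t. the group logits: pi_group - q."""
    return [p - t for p, t in zip(softmax(logp), q)]


logp = [0.0, 0.0, 0.0, 0.0]                   # policy log-probs of 4 responses
adv = group_advantages([1.0, 0.0, 0.0, 1.0])  # responses 0 and 3 are correct
grad = projection_grad(logp, target_on_simplex(logp, adv, beta=1.0))
print(grad)  # negative for correct responses, positive for incorrect ones
```

The gradient sums to zero across the group, each entry lies in [-1, 1], and it vanishes once the group policy matches the target, which illustrates the bounded, zero-sum, self-correcting behavior the abstract describes.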

2605.05863 2026-05-21 cs.LG cs.AI Updated

SOPE: Stabilizing Off-Policy Evaluation for Online RL with Prior Data

Carlo Romeo, Girolamo Macaluso, Alessandro Sestini, Andrew D. Bagdanov

Affiliations: Media Integration and Communication Center – University of Florence; SEED – Electronic Arts

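AI summary: This paper proposes SOPE, an algorithm that uses an actor-aligned off-policy policy evaluation (OPE) signal as an automatic early-stopping mechanism to dynamically control the length of offline training phases, improving baseline performance on continuous control tasks while reducing compute.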

Abstract

Incorporating prior data into online reinforcement learning accelerates training but typically forces a difficult trade-off between high computational costs and long, multi-stage training pipelines. While fixed-length stabilization phases are significantly more computationally efficient than static update schedules, they require task-dependent manual tuning, risking either the waste of prior knowledge or severe overfitting. To address this, we propose SOPE, an algorithm that uses an actor-aligned Off-Policy Policy Evaluation (OPE) signal as an automated early-stopping mechanism to dynamically control the length of offline training phases. By evaluating the critic on a held-out validation split under the current policy's action distribution, SOPE halts gradient updates exactly when out-of-distribution benefits saturate, eliminating the need for manual schedule tuning. Evaluated on 25 continuous control tasks from the Minari benchmark suite, SOPE improves baseline performance by up to 45.6% while reducing the required TFLOPs by up to 22x, thus balancing the tradeoff between sample and computational efficiency. These findings demonstrate that adaptive, evaluation-driven update schedules are more effective than relying on static, exhaustive update schedules.
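
The automated early-stopping idea can be sketched as a patience rule on a held-out signal. Here `evaluate` stands in for SOPE's actor-aligned OPE estimate on the validation split; the patience and `min_delta` rule, the names, and the scripted scores are hypothetical, since the abstract does not give the exact stopping criterion.

```python
import math
from typing import Callable, Tuple


def train_until_saturation(update: Callable[[], None], evaluate: Callable[[], float],
                           max_updates: int, patience: int,
                           min_delta: float = 0.0) -> Tuple[int, int, float]:
    """Run offline updates until the held-out signal stops improving.

    Returns (updates run, step of the best score, best score).
    """
    best, best_step, stale = -math.inf, 0, 0
    for step in range(1, max_updates + 1):
        update()           # one gradient update on prior data
        score = evaluate()  # held-out signal, higher is better
        if score > best + min_delta:
            best, best_step, stale = score, step, 0
        else:
            stale += 1
            if stale >= patience:
                return step, best_step, best
    return max_updates, best_step, best


scores = iter([0.1, 0.3, 0.5, 0.55, 0.54, 0.55, 0.53, 0.6])
print(train_until_saturation(lambda: None, lambda: next(scores), 100,
                             patience=3, min_delta=0.01))  # (7, 4, 0.55)
```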

2605.04128 2026-05-21 cs.GR cs.AI cs.CL cs.CV cs.LG 版本更新

JoyAI-Image: Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation

JoyAI-Image: 激活统一多模态理解和生成中的空间智能

Lin Song, Wenbo Li, Guoqing Ma, Wei Tang, Bo Wang, Yuan Zhang, Yijun Yang, Yicheng Xiao, Jianhui Liu, Yanbing Zhang, Guohui Zhang, Wenhu Zhang, Hang Xu, Nan Jiang, Xin Han, Haoze Sun, Maoquan Zhang, Haoyang Huang, Nan Duan

发表机构 * Joy Future Academy, JD（joy未来学院，京东）

AI总结本文提出JoyAI-Image，一种统一的多模态基础模型，用于视觉理解、文本到图像生成和指令引导的图像编辑。该模型结合了空间增强的多模态大语言模型（MLLM）和多模态扩散Transformer（MMDiT），通过共享的多模态接口实现感知与生成的交互。作者构建了可扩展的训练配方，结合统一指令微调、长文本渲染监督、空间定位（grounded）数据以及通用和空间编辑信号，使模型具备广泛的多模态能力，同时增强几何感知推理和可控视觉合成。实验表明，JoyAI-Image在理解、生成、长文本渲染和编辑基准上达到最先进或极具竞争力的性能。更重要的是，增强的理解、可控的空间编辑和新视角辅助推理之间的双向循环使模型超越一般视觉能力，向更强的空间智能发展。

Comments Code: https://github.com/jd-opensource/JoyAI-Image

AI中文摘要

我们提出了JoyAI-Image，一种统一的多模态基础模型，用于视觉理解、文本到图像生成和指令引导的图像编辑。JoyAI-Image将空间增强的多模态大语言模型（MLLM）与多模态扩散Transformer（MMDiT）结合，允许感知和生成通过共享的多模态接口进行交互。围绕此架构，我们构建了一个可扩展的训练配方，结合了统一指令微调、长文本渲染监督、空间定位（grounded）数据以及通用和空间编辑信号。该设计使模型具备广泛的多模态能力，同时增强了几何感知推理和可控视觉合成。在理解、生成、长文本渲染和编辑基准上的实验表明，JoyAI-Image实现了最先进或极具竞争力的性能。更重要的是，增强的理解、可控的空间编辑和新视角辅助推理之间的双向循环使模型超越一般视觉能力，向更强的空间智能发展。这些结果表明，统一视觉模型在下游应用如视觉-语言-动作系统和世界模型中具有前景。

英文摘要

We present JoyAI-Image, a unified multimodal foundation model for visual understanding, text-to-image generation, and instruction-guided image editing. JoyAI-Image couples a spatially enhanced Multimodal Large Language Model (MLLM) with a Multimodal Diffusion Transformer (MMDiT), allowing perception and generation to interact through a shared multimodal interface. Around this architecture, we build a scalable training recipe that combines unified instruction tuning, long-text rendering supervision, spatially grounded data, and both general and spatial editing signals. This design gives the model broad multimodal capability while strengthening geometry-aware reasoning and controllable visual synthesis. Experiments across understanding, generation, long-text rendering, and editing benchmarks show that JoyAI-Image achieves state-of-the-art or highly competitive performance. More importantly, the bidirectional loop between enhanced understanding, controllable spatial editing, and novel-view-assisted reasoning enables the model to move beyond general visual competence toward stronger spatial intelligence. These results suggest a promising path for unified visual models in downstream applications such as vision-language-action systems and world models.

2605.03690 2026-05-21 cs.LG cs.AI q-bio.QM 版本更新

Graph Neural Network based Hierarchy-Aware Embeddings of Knowledge Graphs: Applications to Yeast Phenotype Prediction

基于图神经网络的面向层次的知识图谱嵌入：应用于酵母表型预测

Filip Kronström, Alexander H. Gower, Daniel Brunnsåker, Ievgeniia A. Tiukova, Ross D. King

发表机构 * Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg（计算机科学与工程系，查尔姆斯理工大学和哥德堡大学）； Department of Life Sciences, Chalmers University of Technology（生命科学系，查尔姆斯理工大学）； Department of Industrial Biotechnology, KTH Royal Institute of Technology（工业生物技术系，瑞典皇家理工学院）； Department of Chemical Engineering and Biotechnology, University of Cambridge（化学工程与生物技术系，剑桥大学）

AI总结本文提出了一种利用图神经网络和来自底层本体的语义损失来生成层次感知的知识图谱嵌入的方法，用于酵母表型预测，并展示了其在基因敲除效应预测和知识图谱修订评估中的应用。

AI中文摘要

我们提出了一种利用图神经网络和来自底层本体的语义损失来生成层次感知知识图谱嵌入的方法。该方法生成的嵌入能更好地反映领域知识。为了展示其效用，我们预测并解释了酿酒酵母（Saccharomyces cerevisiae）中基因缺失的影响，并在没有预测任务的情况下学习知识图谱的盒嵌入。我们进一步展示了盒嵌入如何作为评估知识图谱修订的基础。我们的酵母知识图谱由社区数据库和本体术语构建而成。低维盒嵌入结合图神经网络被用于预测双基因敲除的细胞生长。在10折交叉验证中，这些预测的平均R²分数为0.360，显著高于基线，表明高层定性知识对实验结果具有预测价值。在模型训练中加入语义损失项，通过使嵌入与本体结构对齐，提高了预测性能（R²=0.377）。这表明本体中的类层次可用于定量预测。我们还在三基因敲除上测试了训练好的模型，表明其能够泛化到训练数据之外。此外，通过识别酵母知识图谱中对细胞生长预测重要的共现关系，我们构建了关于酵母中相互作用性状的假说。一项生物学实验验证了其中一个发现，揭示了肌醇利用与渗透胁迫抗性之间的关联，突显了该模型在指导生物学发现方面的潜力。

英文摘要

We present a method for finding hierarchy-aware embeddings of knowledge graphs (KGs) using graph neural networks (GNNs) enriched with a semantic loss derived from underlying ontologies. This method yields embeddings that better reflect domain knowledge. To demonstrate their utility, we predict and interpret the effects of gene deletions in the yeast Saccharomyces cerevisiae and learn box embeddings for KGs in the absence of a prediction task. We further show how box embeddings can serve as the basis for evaluating KG revisions. Our yeast KG is constructed from community databases and ontology terms. Low-dimensional box embeddings combined with GNNs are used to predict cell growth for double gene knockouts. Over 10-fold cross validation, these predictions have a mean $R^2$ score of 0.360, significantly higher than baseline comparisons, demonstrating that high-level qualitative knowledge is informative about experimental outcomes. Incorporating semantic loss terms in the training of the models improves their predictive performance ($R^2$=0.377) by aligning embeddings with ontology structure. This shows that class hierarchies from ontologies can be exploited for quantitative prediction. We also test the trained models on triple gene knockouts, showing they generalise to data beyond those seen in training. Additionally, by identifying co-occurring relations in the yeast KG important for the cell-growth predictions, we construct hypotheses about interacting traits in yeast. A biological experiment validates one such finding, revealing an association between inositol utilisation and osmotic stress resistance, highlighting the model's potential to guide biological discovery.
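
The following minimal Python sketch uses hard axis-aligned boxes to make the box-embedding idea concrete. A subclass relation is read off as containment, P(parent | child) = vol(child ∩ parent) / vol(child). The Box class, the containment score, and the hinge-style hierarchy_loss are illustrative assumptions. The paper's boxes, GNN, and semantic loss are likely parameterised differently, for example with smoothed volumes that give useful gradients.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Box:
    lo: Tuple[float, ...]  # lower corner
    hi: Tuple[float, ...]  # upper corner

    def volume(self) -> float:
        # Empty or inverted sides contribute zero volume.
        return math.prod(max(high - low, 0.0) for low, high in zip(self.lo, self.hi))

    def intersect(self, other: "Box") -> "Box":
        return Box(
            tuple(max(a, b) for a, b in zip(self.lo, other.lo)),
            tuple(min(a, b) for a, b in zip(self.hi, other.hi)),
        )


def containment(child: Box, parent: Box) -> float:
    """P(parent | child); equals 1.0 when the child box lies inside the parent box."""
    v = child.volume()
    return child.intersect(parent).volume() / v if v > 0 else 0.0


def hierarchy_loss(pairs: Iterable[Tuple[Box, Box]]) -> float:
    """Hypothetical semantic penalty: how far known (child, parent) pairs are from full containment."""
    return sum(1.0 - containment(c, p) for c, p in pairs)


parent = Box((0.0, 0.0), (4.0, 4.0))
child = Box((1.0, 1.0), (2.0, 3.0))       # nested inside parent
partial = Box((3.0, 3.0), (5.0, 5.0))     # overlaps one corner of parent
p_child = containment(child, parent)      # 1.0
p_partial = containment(partial, parent)  # 0.25
```

Ontology edges such as "class A is a subclass of B" translate naturally into containment targets of this kind. This is what makes box embeddings a convenient place to inject hierarchy information.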

2604.23937 2026-05-21 physics.flu-dyn cs.LG 版本更新

Multi-scale Dynamic Wake Modeling and Prediction of Floating Offshore Wind Turbines via Physics-Informed Neural Networks and Fourier Neural Operators

基于物理信息神经网络和傅里叶神经算子的浮式海上风力机多尺度动态尾流建模与预测

Guodan Dong, Jianhua Qin, Chang Xu

发表机构 * College of Renewable Energy, Hohai University, Changzhou, 213200, China（河海大学可再生能源学院，常州，213200，中国）； College of Water Conservancy and Hydropower Engineering（水利水电工程学院）； National Technology Innovation Center for Wind Power, Hohai University, Changzhou, 213200, China（国家风电技术创新中心，河海大学，常州，213200，中国）

AI总结本文利用物理信息神经网络和傅里叶神经算子对浮式海上风力机的多尺度动态尾流进行建模与预测，并基于高保真数据集验证了FNO在效率、长期预测能力和多尺度相干结构保真度方面的优势。

AI中文摘要

多尺度动态尾流建模与预测对于浮式海上风力机（FOWT）的实时控制和优化至关重要。本研究通过两种新的深度学习框架，即物理信息神经网络（PINN）和傅里叶神经算子（FNO），对FOWT在一定斯特劳哈尔数（St）范围内的纵荡与纵摇耦合运动下的尾流进行建模，这类运动可诱发尾流蜿蜒。高保真数据集来源于结合致动线模型的大涡模拟（LES-AL）。结果表明，两种框架都能很好地建模尾流蜿蜒等主导的大尺度动态结构；然而，FNO在效率（计算速度提升8倍，收敛速度提升40倍）、长期预测能力和多尺度相干结构保真度方面显著优于PINN模型。此外，PINN模型预测的尾流存在平滑效应，限制了高频相干结构的分辨率，并低估了尾流中心和半宽处的湍流脉动。频谱分析显示，FNO能解析主要的尾流蜿蜒频率（其中Stp表示由纵荡与纵摇耦合运动诱发的频率）、其对应的高阶谐波（2Stp，3Stp）以及能量级联。相比之下，PINN预测中的能量级联在高频范围（St > 1.0）内衰减得更快。此外，预乘功率谱密度表明，与CFD和FNO相比，PINN所建模的尾流蜿蜒及对应谐波频率所含能量相对较低。这些发现表明，FNO在FOWT尾流的高保真实时建模方面具有广阔前景。

英文摘要

Multi-scale dynamic wake modeling and prediction are essential for the real-time control and optimization of floating offshore wind turbines (FOWTs). In this study, wakes of FOWTs under coupled surge and pitch motions across a range of Strouhal numbers (St), which can induce wake meandering, are modeled via two novel deep-learning frameworks: physics-informed neural networks (PINNs) and Fourier neural operators (FNOs). The high-fidelity dataset is obtained from large-eddy simulations with the actuator line model (LES-AL). The results demonstrate that the dominant large-scale dynamic structures, such as meandering, can be well modeled by both frameworks; however, FNOs exhibit significant advantages over the PINN model in terms of efficiency (8-fold computational speedup and 40-fold faster convergence), long-term predictive capability, and multi-scale coherent structural fidelity. Furthermore, the wakes predicted by the PINN model exhibit a smoothing effect that limits the resolution of high-frequency coherent structures and underestimates turbulent fluctuations in both the wake center and half-width. Spectral analysis reveals that FNOs resolve the primary meandering frequency (where Stp denotes the frequency induced by the coupled surge and pitch motions), its corresponding higher-order harmonics (2Stp, 3Stp), and the energy cascade. In contrast, the energy cascade in the PINN predictions dissipates more rapidly in the high-frequency regime (St > 1.0). Additionally, the pre-multiplied power spectral density indicates that the energy contained in meandering and the corresponding harmonic frequencies modeled by PINNs is relatively low compared to that in CFD and FNOs. These findings suggest that FNOs are promising for the high-fidelity, real-time modeling of FOWT wakes.
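
The core operation of an FNO layer is a learned multiplication applied to a truncated set of Fourier modes. The NumPy sketch below shows it for a single-channel 1D field. It is a simplified illustration, not the study's model. Real FNOs use multiple channels, 2D or 3D transforms, and learned complex weights per mode and channel pair. They usually apply a GELU activation; ReLU is used here only for brevity.

```python
import numpy as np


def spectral_conv_1d(u: np.ndarray, weights: np.ndarray, n_modes: int) -> np.ndarray:
    """Spectral part of a 1D Fourier layer for a single-channel real signal.

    Keeps the lowest n_modes rFFT coefficients, multiplies each by a (learned)
    complex weight, zeroes the rest, and transforms back to physical space.
    """
    u_hat = np.fft.rfft(u)
    out_hat = np.zeros_like(u_hat)                  # complex, length len(u)//2 + 1
    out_hat[:n_modes] = u_hat[:n_modes] * weights   # per-mode multiplier
    return np.fft.irfft(out_hat, n=u.shape[-1])


def fourier_layer(u: np.ndarray, weights: np.ndarray, n_modes: int,
                  w_local: float = 1.0, bias: float = 0.0) -> np.ndarray:
    """v_next = ReLU(K v + W v + b): global spectral path plus pointwise local path."""
    return np.maximum(0.0, spectral_conv_1d(u, weights, n_modes) + w_local * u + bias)


x = np.linspace(0.0, 2 * np.pi, 64, endpoint=False)
u = np.sin(x) + 0.5 * np.sin(20 * x)    # mode 1 plus a high-frequency mode 20
full = spectral_conv_1d(u, np.ones(33, dtype=complex), n_modes=33)  # all 33 rFFT modes kept
low = spectral_conv_1d(u, np.ones(8, dtype=complex), n_modes=8)     # modes >= 8 dropped
v = fourier_layer(u, np.ones(8, dtype=complex), n_modes=8)
```

Mode truncation acts as a built-in low-pass filter. The choice of n_modes therefore controls how much high-frequency content a single layer can represent directly.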

2604.20985 2026-05-21 cs.LG cs.AI cs.CR stat.ML 版本更新

Differentially Private Model Merging

差分隐私模型融合

Qichuan Yin, Manzil Zaheer, Tian Li

发表机构 * The University of Chicago（芝加哥大学）； Google DeepMind（谷歌DeepMind）

AI总结本文提出两种后处理技术，随机选择和线性组合，用于在不额外训练的情况下生成满足任意目标差分隐私要求的最终私有模型，同时分析了这些方法在一般问题和私有均值估计中的隐私-效用权衡。

AI中文摘要

在机器学习中，推理或部署阶段的隐私要求往往因政策、法规或用户偏好的变化而演变。本文的目标是：给定一组在同一数据集上训练、具有不同隐私/效用权衡的现有模型，在无需额外训练的情况下构建多个模型，以满足任意目标差分隐私（DP）要求。我们提出两种后处理技术，即随机选择和线性组合，以生成满足任意目标隐私参数的最终私有模型。我们从Rényi DP和隐私损失分布的角度，对这些方法在一般问题上进行了隐私核算，并在私有均值估计问题上精确刻画了隐私/效用权衡，比较了这两种机制。实验上，我们在多个模型以及合成和真实数据集上展示了方法的有效性，并验证了我们的分析。

英文摘要

In machine learning, privacy requirements at inference or deployment time often evolve due to changing policies, regulations, or user preferences. In this work, we aim to construct a multitude of models to satisfy any target differential privacy (DP) requirement without additional training, given a set of existing models trained on the same dataset with different privacy/utility tradeoffs. We propose two post-processing techniques, namely random selection and linear combination, to generate final private models satisfying any target privacy parameter. We provide privacy accounting of these approaches from the lens of Rényi DP and privacy loss distributions on general problems, as well as on private mean estimation, where we precisely characterize the privacy/utility tradeoffs and compare the two mechanisms. Empirically, we demonstrate the effectiveness of our approaches and validate our analyses on several models and both synthetic and real-world datasets.
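
The following deliberately simple case gives intuition for the linear-combination mechanism in the mean-estimation setting. It is an assumption for illustration, not the paper's accounting, which uses Rényi DP and privacy loss distributions. Suppose two estimates of the same mean, each with L2 sensitivity Δ, carry independent Gaussian noise with standard deviations σ1 ≤ σ2. Their combination λ·x1 + (1 - λ)·x2 is again a Gaussian mechanism with σ_λ² = λ²σ1² + (1 - λ)²σ2². A target noise level between σ1 and σ2 can therefore be reached by solving a quadratic in λ.

Privacy here is expressed as ρ-zCDP with ρ = Δ²/(2σ²). It is converted with the standard bound ε = ρ + 2·sqrt(ρ·ln(1/δ)). The guarantee covers only the released combination, assuming x1 and x2 are not also published.

```python
import math


def combined_sigma(lam: float, sigma1: float, sigma2: float) -> float:
    """Noise std of lam * x1 + (1 - lam) * x2 when x1, x2 carry independent Gaussian noise."""
    return math.sqrt(lam ** 2 * sigma1 ** 2 + (1 - lam) ** 2 * sigma2 ** 2)


def zcdp_rho(sigma: float, sensitivity: float) -> float:
    """A Gaussian mechanism with this L2 sensitivity and noise std satisfies rho-zCDP."""
    return sensitivity ** 2 / (2 * sigma ** 2)


def rho_to_eps(rho: float, delta: float) -> float:
    """Standard conversion: rho-zCDP implies (eps, delta)-DP with this eps."""
    return rho + 2 * math.sqrt(rho * math.log(1 / delta))


def lam_for_target_sigma(target: float, sigma1: float, sigma2: float) -> float:
    """Weight on the less-noisy estimate x1 giving noise std `target` (sigma1 <= target <= sigma2)."""
    if not sigma1 <= target <= sigma2:
        raise ValueError("target must lie between sigma1 and sigma2")
    a = sigma1 ** 2 + sigma2 ** 2
    # Smaller root of a*lam^2 - 2*sigma2^2*lam + (sigma2^2 - target^2) = 0.
    return (sigma2 ** 2 - math.sqrt(a * target ** 2 - sigma1 ** 2 * sigma2 ** 2)) / a


lam = lam_for_target_sigma(2.0, sigma1=1.0, sigma2=4.0)
sigma = combined_sigma(lam, 1.0, 4.0)   # equals the 2.0 target up to rounding
rho = zcdp_rho(sigma, sensitivity=1.0)
eps = rho_to_eps(rho, delta=1e-5)
```

σ_λ is minimised at an interior λ, where it falls below both σ1 and σ2. A combination can therefore be less private than either input, so a chosen weight should always be checked against the intended guarantee.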

2604.11661 2026-05-21 cs.LG cs.AI 版本更新

Towards Autonomous Mechanistic Reasoning in Virtual Cells

向虚拟细胞中的自主机理推理迈进

Yunhui Jang, Lu Zhu, Jake Fawkes, Alisandra Kaye Denton, Dominique Beaini, Emmanuel Noutahi

发表机构 * Korea Advanced Institute of Science and Technology (KAIST)（韩国科学技术院）； Valence Labs（Valence实验室）； University College London（伦敦大学学院）

AI总结本文提出了一种结构化解释形式化方法，用于虚拟细胞中的生物推理，通过机理动作图实现系统验证和反驳，并引入VCR-Agent多智能体框架，结合生物基础知识检索和基于验证器的过滤方法，生成并验证机理推理。

Praxium：基于AI的遥测和依赖分析的云异常诊断

Rohan Kumar, Jason Li, Zongshun Zhang, Syed Mohammad Qasim, Gianluca Stringhini, Ayse K. Coskun

发表机构 * Boston University（波士顿大学）

AI总结本文提出Praxium框架，利用AI技术进行云服务异常检测和根本原因推断，通过遥测数据和依赖分析提高故障诊断效率和准确性。

AI中文摘要

随着现代微服务架构在云应用中的普及，云服务正变得越来越复杂，更容易受到配置错误和软件bug的影响。传统方法依赖专家输入来诊断和修复微服务异常，但在持续集成和持续部署（CI/CD）范式下缺乏可扩展性。微服务发布包含新的软件安装，与应用程序组件有复杂的相互作用。因此，将异常行为归因于任何特定安装或发布变得更加困难，导致可能的解决时间变慢。为了解决当前诊断方法的不足，本文引入Praxium，一个用于异常检测和根本原因推断的框架。Praxium帮助管理员在软件发现工具PraxiPaaS提供的依赖安装信息的背景下评估目标指标性能。Praxium持续监控遥测数据以识别异常，然后通过最近软件安装的因果影响进行根本原因分析，以向站点可靠性工程师（SRE）提供有关观察到的异常的相关信息。在本文中，我们证明Praxium能够有效进行异常检测和根本原因推断，并提供在实际环境中所需的有效异常检测超参数调优分析。在75次总试验中使用四个合成异常，异常检测始终在>0.97宏F1水平上表现良好。此外，我们还显示因果影响分析能够可靠地推断异常的根本原因，即使软件包安装时间间隔越来越短。

英文摘要

As the modern microservice architecture for cloud applications grows in popularity, cloud services are becoming increasingly complex and more vulnerable to misconfiguration and software bugs. Traditional approaches rely on expert input to diagnose and fix microservice anomalies, which lacks scalability in the face of the continuous integration and continuous deployment (CI/CD) paradigm. Microservice rollouts, containing new software installations, have complex interactions with the components of an application. Consequently, this added difficulty in attributing anomalous behavior to any specific installation or rollout results in potentially slower resolution times. To address the gaps in current diagnostic methods, this paper introduces Praxium, a framework for anomaly detection and root cause inference. Praxium aids administrators in evaluating target metric performance in the context of dependency installation information provided by a software discovery tool, PraxiPaaS. Praxium continuously monitors telemetry data to identify anomalies, then conducts root cause analysis via causal impact on recent software installations, in order to provide site reliability engineers (SREs) with relevant information about an observed anomaly. In this paper, we demonstrate that Praxium is capable of effective anomaly detection and root cause inference, and we provide an analysis of effective anomaly detection hyperparameter tuning as needed in a practical setting. Across 75 total trials using four synthetic anomalies, anomaly detection consistently performs at >0.97 macro-F1. In addition, we show that causal impact analysis reliably infers the correct root cause of anomalies, even as package installations occur at increasingly shorter intervals.
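
A toy version of the detect-then-attribute loop can be sketched with the Python standard library. It assumes a single telemetry metric and a dict of install times. Points are flagged with a rolling z-score, and installs are ranked by the standardized mean shift after versus before each install. That ranking is only a crude stand-in for causal impact analysis, which typically fits a counterfactual time-series model. All function names and thresholds here are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple


def detect_anomalies(series: Sequence[float], window: int = 20, z_thresh: float = 4.0) -> List[int]:
    """Indices whose value deviates from the trailing window by more than z_thresh std devs."""
    flagged = []
    for i in range(window, len(series)):
        ref = series[i - window:i]
        mu, sd = mean(ref), stdev(ref)
        if sd > 0 and abs(series[i] - mu) / sd > z_thresh:
            flagged.append(i)
    return flagged


def rank_installs(series: Sequence[float], installs: Dict[str, int],
                  window: int = 20) -> List[Tuple[str, float]]:
    """Rank installs by the standardized mean shift after vs. before each install time."""
    scores = {}
    for name, t in installs.items():
        before, after = series[max(0, t - window):t], series[t:t + window]
        if len(before) < 2 or not after:
            continue  # not enough data around this install
        scale = stdev(before) or 1e-9
        scores[name] = (mean(after) - mean(before)) / scale
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Synthetic latency metric: a level shift starts at t = 60, coinciding with the "pkg-b" install.
latency = [100.0 + i % 2 for i in range(60)] + [130.0 + i % 2 for i in range(40)]
installs = {"pkg-a": 30, "pkg-b": 60}
flags = detect_anomalies(latency)
ranking = rank_installs(latency, installs)
```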

2603.22727 2026-05-21 cs.LG eess.SP 版本更新

Spiking Personalized Federated Learning for Brain-Computer Interface-Enabled Immersive Communication

基于脑机接口的沉浸式通信的脉冲个性化联邦学习

Chen Shang, Dinh Thai Hoang, Diep N. Nguyen, Jiadong Yu

发表机构 * School of Electrical and Data Engineering, University of Technology Sydney（悉尼科技大学电气与数据工程学院）； Thrust of Internet of Things, The Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州）物联网学域）

AI总结本文提出了一种利用脑机接口获取脑信号以推断用户中心状态（如意图和感知相关不适）的沉浸式通信框架，通过个性化联邦学习模型处理脑信号，以适应神经多样性数据并防止敏感脑信号信息泄露，同时通过嵌入脉冲神经网络降低能耗，实验表明在真实脑信号数据集上识别准确率最高且能耗降低6.46倍。

Comments 6 pages, 3 figures

Journal ref: INFOCOM Workshop, 2026

AI中文摘要

本文提出了一种新颖的沉浸式通信框架，利用脑机接口（BCI）获取脑信号以推断用户中心状态（例如意图和感知相关不适），从而在强个体差异下实现更个性化和稳健的沉浸式适应。具体而言，我们开发了一个个性化联邦学习（PFL）模型来分析和处理收集到的脑信号，该模型不仅能够适应神经多样性脑信号数据，还能防止敏感脑信号信息泄露。为了解决持续设备学习和推理在能量受限的沉浸终端（如头戴式显示器）中的能量瓶颈，我们进一步将脉冲神经网络（SNNs）嵌入到PFL中。通过利用稀疏、事件驱动的脉冲计算，SNN启用的PFL在保持竞争性个性化性能的同时，降低了训练和推理的计算和能耗。在真实脑信号数据集上的实验表明，我们的方法在整体识别准确率方面表现最佳，同时与传统人工神经网络基线相比，推理能耗降低了6.46倍。

英文摘要

This work proposes a novel immersive communication framework that leverages brain-computer interface (BCI) to acquire brain signals for inferring user-centric states (e.g., intention and perception-related discomfort), thereby enabling more personalized and robust immersive adaptation under strong individual variability. Specifically, we develop a personalized federated learning (PFL) model to analyze and process the collected brain signals, which not only accommodates neurodiverse brain-signal data but also prevents the leakage of sensitive brain-signal information. To address the energy bottleneck of continual on-device learning and inference on energy-limited immersive terminals (e.g., head-mounted displays), we further embed spiking neural networks (SNNs) into the PFL. By exploiting sparse, event-driven spike computation, the SNN-enabled PFL reduces the computation and energy cost of training and inference while maintaining competitive personalization performance. Experiments on a real brain-signal dataset demonstrate that our method achieves the best overall identification accuracy while reducing inference energy by 6.46$\times$ compared with conventional artificial neural network-based personalized baselines.
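
The energy argument for SNNs rests on sparse, event-driven computation. A minimal leaky integrate-and-fire (LIF) neuron, a common SNN building block, shows why: downstream work happens only at time steps that emit a spike. The paper's exact neuron model is not given here, so tau, v_thresh, and the hard reset below are assumptions.

```python
import math
from typing import List, Sequence


def lif_spikes(inputs: Sequence[float], tau: float = 20.0, v_thresh: float = 1.0,
               v_reset: float = 0.0, dt: float = 1.0) -> List[int]:
    """Simulate one LIF neuron and return its 0/1 spike train."""
    decay = math.exp(-dt / tau)  # membrane leak per time step
    v = v_reset
    spikes = []
    for current in inputs:
        v = decay * v + current  # leak, then integrate the input
        if v >= v_thresh:
            spikes.append(1)
            v = v_reset          # hard reset after firing
        else:
            spikes.append(0)
    return spikes


silent = lif_spikes([0.0] * 20)  # no input: no spikes, hence no synaptic events
driven = lif_spikes([0.3] * 20)  # constant input: fires on every 4th step
spike_rate = sum(driven) / len(driven)
```

In a spiking layer, synaptic operations are typically triggered only by the 1s in such spike trains. This is the usual explanation for the energy savings of spike-based models relative to the dense multiply-accumulates of an ANN.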

2603.22430 2026-05-21 cs.LG 版本更新

Inference Time Policy Optimization for Offline RL with Differentiable World Models

基于可微世界模型的离线强化学习推理时间策略优化

Rohan Deb, Stephen J. Wright, Arindam Banerjee

发表机构 * Siebel School of Computing and Data Science（计算与数据科学学院）； Department of Computer Sciences（计算机科学系）； University of Illinois Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校）； University of Wisconsin Madison（威斯康星大学麦迪逊分校）

AI总结本文提出了一种在推理时间利用可微世界模型优化策略参数的方法，通过端到端的梯度计算提升离线强化学习的性能，同时探讨了推理时间适应的计算开销与收益的权衡。

英文摘要

Offline Reinforcement Learning (RL) learns optimal policies from fixed datasets, training a policy once and deploying it at inference time without further refinement. Inspired by model predictive control (MPC), we introduce an inference time adaptation framework that utilizes a pretrained policy along with a learned world model. While existing world model and diffusion-planning methods use learned dynamics to generate imagined trajectories during training, or to sample candidate plans at inference time, they do not use inference-time information to *optimize* the policy parameters on the fly. In contrast, our design is a Differentiable World Model (DWM) pipeline that enables end-to-end gradient computation through imagined rollouts for inference time policy optimization (ITPO). We evaluate our algorithm on D4RL continuous-control benchmarks (MuJoCo locomotion tasks and AntMaze), and show that exploiting inference-time information to optimize the policy parameters yields consistent gains over strong offline RL baselines. Inference-time adaptation, however, is expensive: rollout generation and backpropagation dominate per-step compute. We study this tradeoff explicitly, showing that a suitable tilted version of a one-step MeanFlow sampler recovers much of the gains at a fraction of the cost.
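
A toy problem illustrates optimizing policy parameters at inference time by backpropagating through imagined rollouts. The sketch assumes a scalar linear "world model" s_{t+1} = a·s_t + b·u_t, a linear policy u_t = k·s_t, and a quadratic cost. Gradients come from a hand-written reverse pass (backpropagation through time). The paper instead uses neural policies, a differentiable neural world model, and automatic differentiation. All names and numbers here are illustrative.

```python
from typing import List, Tuple


def rollout_cost_and_grad(k: float, s0: float, a: float, b: float, c: float,
                          horizon: int) -> Tuple[float, float]:
    """Cost of an imagined rollout under policy u = k*s, and dCost/dk via a reverse pass."""
    g = a + b * k                     # closed-loop factor of the assumed linear model
    states: List[float] = [s0]
    for _ in range(horizon - 1):      # forward pass: imagined states s_0 .. s_{H-1}
        states.append(g * states[-1])
    cost = sum((1.0 + c * k * k) * s * s for s in states)  # s^2 + c*u^2 per step

    grad_k = 0.0
    lam_next = 0.0                    # dCost/ds_{t+1}; zero beyond the horizon
    for s in reversed(states):        # backward pass (backpropagation through time)
        grad_k += 2.0 * c * k * s * s + lam_next * b * s        # direct term + path via s_{t+1}
        lam_next = 2.0 * (1.0 + c * k * k) * s + g * lam_next   # becomes dCost/ds_t
    return cost, grad_k


def optimize_gain(k: float, s0: float, a: float, b: float, c: float, horizon: int,
                  iters: int = 50, lr: float = 1.0) -> Tuple[float, float]:
    """Gradient descent on k with backtracking, run at inference time from the current state s0."""
    cost, grad = rollout_cost_and_grad(k, s0, a, b, c, horizon)
    for _ in range(iters):
        step = lr
        while step > 1e-12:
            k_new = k - step * grad
            new_cost, new_grad = rollout_cost_and_grad(k_new, s0, a, b, c, horizon)
            if new_cost < cost:       # accept the first step that lowers the imagined cost
                k, cost, grad = k_new, new_cost, new_grad
                break
            step *= 0.5
        else:
            break                     # no improving step found: treat as converged
    return k, cost


a, b, c, horizon, s0 = 1.1, 0.5, 0.1, 20, 1.0
k0 = 0.0                              # gain of a hypothetical pretrained policy
cost0, grad0 = rollout_cost_and_grad(k0, s0, a, b, c, horizon)
k_opt, cost_opt = optimize_gain(k0, s0, a, b, c, horizon)
```

Each optimizer iteration requires at least one full imagined rollout plus a reverse pass. This is the per-step compute cost the abstract identifies as the price of inference-time adaptation.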

2603.21033 2026-05-21 cs.CE cs.LG 版本更新

TabPFN Extensions for Interpretable Geotechnical Modelling

TabPFN扩展在可解释岩土建模中的应用

Taiga Saito, Yu Otake, Daijiro Mizutani, Stephen Wu

发表机构 * Department of Civil and Environmental Engineering, Tohoku University（东北大学土木环境工程系）； Research Organization of Information and Systems, The Institute of Statistical Mathematics（信息与系统研究机构统计数理研究所）； Department of Statistical Science, The Graduate University for Advanced Studies（综合研究大学院大学统计科学系）

AI总结本文评估了TabPFN及其扩展库在岩土工程任务中的表现，通过土壤类型分类和力学参数的迭代填补，展示了TabPFN在不确定性量化和可解释性方面的优势。

AI中文摘要

岩土场地表征依赖于稀疏且异质的钻孔数据，其中不确定性量化和可解释性与预测准确性同样重要。我们评估了表格基础模型TabPFN及其tabpfn-extensions库在两个岩土任务中的表现：(1) 作为受控示例，基于N值和剪切波速数据进行土壤类型分类；(2) 在BM/AirportSoilProperties/2/2025中迭代填补五个力学参数（s_u，E_u，σ'_p，C_c，C_v）。在不重新训练的情况下，我们对TabPFN嵌入进行余弦相似度分析，可视化预测分布，并计算SHAP归因。在回归基准上，我们将TabPFN与均值填补、线性回归、随机森林、XGBoost和HBM进行比较；引入了一种在上下文扰动类别间分解预测不确定性的代理分解方法；并将C_c和σ'_p的边缘分布通过一维固结模型进行传播，以获得可靠性指标β和正常使用极限超越概率P_f。嵌入表现出与标签一致的黏土/砂土分组；迭代填补降低了所有五个目标的RMSE，其中TabPFN在四个目标上最低；SHAP归因与Skempton压缩指数相关关系以及先期固结压力与含水量之间的反向依赖关系一致；在代理分解中，后验内成分最大。我们将本文贡献定位为一个可补充数据稀缺岩土工程中既有方法的完整评估流程，而非算法创新。

英文摘要

Geotechnical site characterisation relies on sparse, heterogeneous borehole data, where uncertainty quantification and interpretability matter as much as predictive accuracy. We evaluate TabPFN (Hollmann et al., 2025), a tabular foundation model, and its tabpfn-extensions library on two geotechnical tasks: (1) soil-type classification from N-value and shear-wave velocity data as a controlled illustrative case, and (2) iterative imputation of five mechanical parameters ($s_\mathrm{u}$, $E_\mathrm{u}$, $\sigma'_\mathrm{p}$, $C_\mathrm{c}$, $C_\mathrm{v}$) in BM/AirportSoilProperties/2/2025. Without retraining, we apply cosine-similarity analysis to TabPFN embeddings, visualise predictive distributions, and compute SHAP attributions. On the regression benchmark we compare TabPFN with mean imputation, linear regression, random forests, XGBoost, and HBM; introduce a proxy decomposition of predictive uncertainty across context-perturbation classes; and propagate marginal $C_\mathrm{c}$ and $\sigma'_\mathrm{p}$ distributions through a one-dimensional consolidation model to obtain the reliability index $\beta$ and serviceability exceedance probability $P_\mathrm{f}$. Embeddings exhibit label-consistent Clay/Sand grouping; iterative imputation reduces RMSE for all five targets, with TabPFN lowest on four; SHAP attributions are consistent with the Skempton compression-index correlation and the inverse preconsolidation-pressure-water-content dependence; the within-posterior component is largest in the proxy decomposition. We position the contribution as a worked evaluation workflow that may complement established methods for data-scarce geotechnics, not as algorithmic innovation.
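
The last step, turning marginal parameter distributions into β and P_f, can be sketched with Monte Carlo sampling. Everything in this sketch is an assumption for illustration:

- Lognormal marginals stand in for TabPFN's predictive distributions.
- C_c and σ'_p are sampled independently.
- The settlement model keeps only the virgin-compression term S = C_c·H/(1+e0)·log10(σ'_f/σ'_p), and recompression is ignored.
- The layer thickness, void ratio, final stress, and settlement limit are made-up values.

```python
import random
from math import log, log10
from statistics import NormalDist


def settlement(cc: float, sigma_p: float, sigma_final: float,
               thickness: float = 10.0, e0: float = 1.2) -> float:
    """Primary consolidation settlement (m), virgin-compression term only (simplifying assumption)."""
    if sigma_final <= sigma_p:
        return 0.0  # stays overconsolidated; recompression neglected
    return cc * thickness / (1.0 + e0) * log10(sigma_final / sigma_p)


def reliability(cc_samples, sp_samples, sigma_final: float, s_limit: float):
    """Monte Carlo estimate of P_f = P(S > s_limit) and beta = -Phi^{-1}(P_f)."""
    failures = sum(settlement(cc, sp, sigma_final) > s_limit
                   for cc, sp in zip(cc_samples, sp_samples))
    pf = failures / len(cc_samples)
    if not 0.0 < pf < 1.0:
        raise ValueError("P_f is 0 or 1 at this sample size, so beta is undefined")
    return -NormalDist().inv_cdf(pf), pf


rng = random.Random(0)
n = 20_000
cc = [rng.lognormvariate(log(0.5), 0.3) for _ in range(n)]     # compression index C_c
sp = [rng.lognormvariate(log(150.0), 0.25) for _ in range(n)]  # preconsolidation pressure (kPa)
beta, pf = reliability(cc, sp, sigma_final=250.0, s_limit=0.6)
```

Crude Monte Carlo is adequate here because P_f is not small. Very small exceedance probabilities would call for importance sampling or a first-order reliability method (FORM).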

2603.20420 2026-05-21 q-bio.GN cs.LG q-bio.QM 版本更新

CRANE: Correcting Errors in Raw Nanopore Signals Using Hidden Markov Models

CRANE：利用隐马尔可夫模型纠正原始纳米孔信号中的错误

Simon Ambrozak, Ulysse McConnell, Bhargav Srinivasan, Burak Ozkan, Ernest Zhang, Can Firtina

发表机构 * University of Maryland（马里兰大学）； ETH Zurich（苏黎世联邦理工学院）； Bilkent University（比尔肯特大学）

AI总结本文提出CRANE方法，通过训练和使用隐马尔可夫模型（HMM）来纠正纳米孔信号中的错误，从而提高原始信号分析的准确性，减少分析管道优化的负担，并且不引入显著的计算开销。

AI中文摘要

纳米孔测序能够读取比其他测序方法长得多的核酸分子序列（称为读段），这推动了基因组分析的进步，例如无间隙的人类基因组组装。通过直接分析纳米孔测序从分子产生的原始电信号读段，现有方法无需将其转换为DNA字符（即碱基识别）即可对这些读段进行比对，从而实现快速高效的测序数据分析。然而，原始信号常常因噪声和处理误差而包含错误，这限制了原始信号分析的整体准确性。本文的目标是检测并纠正原始信号中的错误，以提高原始信号分析的准确性。为此，我们提出了CRANE，一种通过训练和利用隐马尔可夫模型（HMM）来准确纠正信号错误的机制。我们在多种数据集上的广泛评估表明，CRANE 1）一致地提高了底层原始信号分析工具的整体准确性，2）减轻了为新型纳米孔技术优化分析流程的负担，3）不会引入显著的计算开销。我们得出结论，CRANE提供了一种在进一步分析之前系统地识别并纠正原始纳米孔信号中错误的有效机制，这可以推动专门为原始纳米孔信号设计的新一类纠错机制的发展。源代码：CRANE可在https://github.com/STORMgroup/CRANE上获取。我们还在GitHub页面上提供了完全复现我们结果的脚本。

英文摘要

Nanopore sequencing can read substantially longer sequences of nucleic acid molecules, called reads, than other sequencing methods, which has led to advances in genomic analysis such as the gapless human genome assembly. By analyzing the raw electrical signal reads that nanopore sequencing generates from molecules, existing works can map these reads without translating them into DNA characters (i.e., basecalling), allowing for quick and efficient analysis of sequencing data. However, raw signals often contain errors due to noise and processing errors, which limits the overall accuracy of raw signal analysis. Our goal in this work is to detect and correct errors in raw signals to improve the accuracy of raw signal analyses. To this end, we propose CRANE, a mechanism that trains and utilizes a Hidden Markov Model (HMM) to accurately correct signal errors. Our extensive evaluation on various datasets shows that CRANE 1) consistently improves the overall accuracy of the underlying raw signal analysis tools, 2) minimizes the burden of optimizing analysis pipelines for newer nanopore technologies, and 3) does not introduce substantial computational overhead. We conclude that CRANE provides an effective mechanism to systematically identify and correct the errors in raw nanopore signals before further analysis, which can enable the development of a new class of error correction mechanisms purely designed for raw nanopore signals. Source Code: CRANE is available at https://github.com/STORMgroup/CRANE. We also provide the scripts to fully reproduce our results on our GitHub page.
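
One generic way an HMM can correct a noisy signal is Viterbi decoding over a small set of expected signal levels. Sticky transitions make an isolated outlier too costly to explain as a real level change. The sketch below assumes known levels, Gaussian noise, and a fixed stay probability, all illustrative. The abstract does not describe CRANE's state space, training procedure, or emission model, which likely differ from this.

```python
import math
from typing import List, Sequence


def viterbi_denoise(signal: Sequence[float], levels: Sequence[float],
                    noise_sd: float = 1.0, stay_prob: float = 0.9) -> List[float]:
    """Replace each sample with the level on the most likely hidden-state path."""
    n = len(levels)
    switch = (1.0 - stay_prob) / (n - 1)
    log_trans = [[math.log(stay_prob if i == j else switch) for j in range(n)] for i in range(n)]

    def log_emit(x: float, mu: float) -> float:
        return -0.5 * ((x - mu) / noise_sd) ** 2  # Gaussian log-likelihood up to a constant

    score = [log_emit(signal[0], mu) for mu in levels]  # uniform start (constant dropped)
    backptrs = []
    for x in signal[1:]:
        prev, score, ptrs = score, [], []
        for j, mu in enumerate(levels):
            best_i = max(range(n), key=lambda i: prev[i] + log_trans[i][j])
            ptrs.append(best_i)
            score.append(prev[best_i] + log_trans[best_i][j] + log_emit(x, mu))
        backptrs.append(ptrs)
    state = max(range(n), key=lambda j: score[j])
    path = [state]
    for ptrs in reversed(backptrs):  # trace the best path backwards
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return [levels[s] for s in path]


raw = [81.0, 79.0, 93.0, 80.0, 82.0, 99.0, 101.0, 100.0, 98.0, 102.0]  # outlier at index 2
corrected = viterbi_denoise(raw, levels=[80.0, 100.0], noise_sd=5.0, stay_prob=0.95)
```

The genuine level change at index 5 is kept. The one-sample excursion at index 2 is absorbed, because switching away and back would cost two unlikely transitions.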

2603.19545 2026-05-21 eess.SY cs.LG cs.SY math.OC 版本更新

Verifiable Error Bounds for Physics-Informed Neural Network Solutions of Lyapunov and Hamilton-Jacobi-Bellman Equations

用于李雅普诺夫和哈密尔顿-雅可比-贝尔曼方程的物理信息神经网络解的可验证误差界

Jun Liu

发表机构 * Department of Applied Mathematics, Faculty of Mathematics, University of Waterloo（应用数学系，数学学院，滑铁卢大学）

AI总结本文研究了如何通过物理信息神经网络求解李雅普诺夫和哈密尔顿-雅可比-贝尔曼方程的可验证误差界，提出了基于这些方程的解的误差界计算方法，并展示了如何通过残差界来估计真实解的相对误差以及近似解的后验估计。

Comments The paper will appear in the IEEE Control Systems Letters

AI中文摘要

非线性系统分析与控制中的许多核心问题可以重新表述为求解偏微分方程（PDE），如李雅普诺夫方程和哈密尔顿-雅可比-贝尔曼方程。物理信息神经网络（PINN）作为一种有前景的无网格方法，已被用于近似这些方程的解，但在大多数现有工作中，并没有严格保证小的PDE残差意味着小的解误差。本文针对李雅普诺夫方程和哈密尔顿-雅可比-贝尔曼方程的近似解建立了可验证的误差界，并特别关注基于PINN的近似。对于李雅普诺夫和哈密尔顿-雅可比-贝尔曼PDE，我们证明可验证的残差界能够给出相对于真实解的相对误差界，以及以近似解表示的可计算后验估计。对于哈密尔顿-雅可比-贝尔曼方程，这还给出了紧致子水平集上最优值函数的认证上界和下界，并量化了由此诱导的反馈策略的最优性差距。我们进一步证明，单侧残差界就已经意味着近似解本身定义了有效的李雅普诺夫函数或控制李雅普诺夫函数。我们通过数值示例展示了这些结果。

英文摘要

Many core problems in nonlinear systems analysis and control can be recast as solving partial differential equations (PDEs) such as Lyapunov and Hamilton-Jacobi-Bellman (HJB) equations. Physics-informed neural networks (PINNs) have emerged as a promising mesh-free approach for approximating their solutions, but in most existing works there is no rigorous guarantee that a small PDE residual implies a small solution error. This paper develops verifiable error bounds for approximate solutions of Lyapunov and HJB equations, with particular emphasis on PINN-based approximations. For both the Lyapunov and HJB PDEs, we show that a verifiable residual bound yields relative error bounds with respect to the true solutions as well as computable a posteriori estimates in terms of the approximate solutions. For the HJB equation, this also yields certified upper and lower bounds on the optimal value function on compact sublevel sets and quantifies the optimality gap of the induced feedback policy. We further show that one-sided residual bounds already imply that the approximation itself defines a valid Lyapunov or control Lyapunov function. We illustrate the results with numerical examples.
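
A standard argument shows how a residual bound turns into a relative error bound for the Lyapunov equation ∇V·f = -q. It is stated here under simplifying assumptions and is not necessarily the paper's exact theorem. The assumptions are:

- ẋ = f(x) has an asymptotically stable equilibrium at 0, and x lies in its region of attraction, with flow φ(t,x).
- q ≥ 0, and the true solution V(x) = ∫₀^∞ q(φ(t,x)) dt is finite.
- The approximation V̂ is continuous with V̂(0) = 0, so that V̂(φ(T,x)) → 0 as T → ∞.
- The residual r = ∇V̂·f + q satisfies |r| ≤ δq with 0 ≤ δ < 1.

```latex
\begin{aligned}
\frac{d}{dt}\hat V(\phi(t,x)) &= \nabla\hat V\cdot f\,\big|_{\phi(t,x)} = -q(\phi(t,x)) + r(\phi(t,x)),
  \qquad |r| \le \delta\, q,\\
\hat V(x) &= \lim_{T\to\infty}\Big[\hat V(\phi(T,x)) + \int_0^T (q - r)(\phi(t,x))\,dt\Big]
  = V(x) - \int_0^\infty r(\phi(t,x))\,dt,\\
\big|\hat V(x) - V(x)\big| &\le \delta \int_0^\infty q(\phi(t,x))\,dt = \delta\, V(x)
  \;\Longrightarrow\; (1-\delta)\,V(x) \le \hat V(x) \le (1+\delta)\,V(x).
\end{aligned}
```

Suppose only the one-sided bound r ≤ δq holds. The first line still gives dV̂/dt ≤ -(1-δ)q ≤ 0, and the second gives V̂(x) ≥ (1-δ)V(x). Together these are the decrease and positivity ingredients of a Lyapunov function when q is positive definite. This is consistent with the abstract's remark that one-sided residual bounds already certify the approximation.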

2603.16513 2026-05-21 cs.LG cs.AI 版本更新

可解释的人工智能：面向Transformer模型的上下文感知分层集成梯度方法

Melkamu Abay Mersha, Jugal Kalita

发表机构 * College of Engineering and Applied Science, University of Colorado Colorado Springs（科罗拉多大学科罗拉多斯普林斯分校工程与应用科学学院）

AI总结本文提出了一种上下文感知分层集成梯度框架（CA-LIG），用于解释Transformer模型的决策过程，通过计算每个Transformer块内的分层集成梯度，并将这些token级属性与类特定的注意力梯度融合，从而生成具有符号和上下文敏感性的属性图，以捕捉支持和反对的证据，并追踪Transformer层中的相关性层次流动。

DOI: 10.1016/j.neucom.2026.133050

AI中文摘要

Transformer模型在多个领域和任务中实现了最先进的性能，然而其深层表示使得预测难以解释。现有的可解释性方法依赖于最终层的属性，只能捕捉局部token级属性或全局注意力模式，缺乏对token间依赖关系和结构组件的上下文感知能力。它们还无法捕捉相关性如何在层之间演变以及结构组件如何影响决策。为了解决这些限制，我们提出了上下文感知分层集成梯度（CA-LIG）框架，一种统一的层次属性框架，该框架在每个Transformer块内计算分层集成梯度，并将这些token级属性与类特定的注意力梯度融合。这种整合产生了带有符号和上下文敏感性的属性图，能够捕捉支持和反对的证据，同时追踪Transformer层中的相关性层次流动。我们评估了CA-LIG框架在多样化的任务、领域和Transformer模型家族中的表现，包括使用BERT进行情感分析和长多类文档分类，使用XLM-R和AfroLM在低资源语言设置中进行仇恨言论检测，以及使用Masked Autoencoder Vision Transformer模型进行图像分类。在所有任务和架构中，CA-LIG提供了更忠实的属性，显示出对上下文依赖的更强敏感性，并产生了更清晰、更语义连贯的可视化结果，优于现有可解释性方法。这些结果表明，CA-LIG提供了更全面、上下文感知和可靠的Transformer决策解释，推动了深度神经网络的实用可解释性和概念理解。

英文摘要

Transformer models achieve state-of-the-art performance across domains and tasks, yet their deeply layered representations make their predictions difficult to interpret. Existing explainability methods rely on final-layer attributions, capture either local token-level attributions or global attention patterns without unification, and lack context-awareness of inter-token dependencies and structural components. They also fail to capture how relevance evolves across layers and how structural components shape decision-making. To address these limitations, we propose the Context-Aware Layer-wise Integrated Gradients (CA-LIG) Framework, a unified hierarchical attribution framework that computes layer-wise Integrated Gradients within each Transformer block and fuses these token-level attributions with class-specific attention gradients. This integration yields signed, context-sensitive attribution maps that capture supportive and opposing evidence while tracing the hierarchical flow of relevance through the Transformer layers. We evaluate the CA-LIG Framework across diverse tasks, domains, and Transformer model families, including sentiment analysis and long and multi-class document classification with BERT, hate speech detection in a low-resource language setting with XLM-R and AfroLM, and image classification with a Masked Autoencoder Vision Transformer model. Across all tasks and architectures, CA-LIG provides more faithful attributions, shows stronger sensitivity to contextual dependencies, and produces clearer, more semantically coherent visualizations than established explainability methods. These results indicate that CA-LIG provides a more comprehensive, context-aware, and reliable explanation of Transformer decision-making, advancing both the practical interpretability and conceptual understanding of deep neural models.
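
CA-LIG builds on Integrated Gradients (IG). As a reference point, the sketch below applies plain IG to a toy differentiable model with an analytic gradient, using a midpoint Riemann sum along the straight path from a baseline. The layer-wise application inside Transformer blocks and the fusion with class-specific attention gradients are not shown. The toy softplus model is an assumption.

```python
import numpy as np


def integrated_gradients(grad_fn, x: np.ndarray, baseline: np.ndarray, steps: int = 200) -> np.ndarray:
    """IG_i = (x_i - b_i) * average of dF/dx_i along the path b + alpha * (x - b)."""
    alphas = (np.arange(steps) + 0.5) / steps           # midpoint rule on [0, 1]
    path = baseline + alphas[:, None] * (x - baseline)   # shape (steps, n_features)
    grads = np.stack([grad_fn(p) for p in path])
    return (x - baseline) * grads.mean(axis=0)


w = np.array([1.0, -2.0, 0.5])  # toy model: F(x) = softplus(w . x)


def model(x: np.ndarray) -> float:
    return float(np.logaddexp(0.0, w @ x))


def model_grad(x: np.ndarray) -> np.ndarray:
    return w / (1.0 + np.exp(-(w @ x)))  # sigmoid(w . x) * w


x = np.array([1.0, 0.5, 2.0])
baseline = np.zeros(3)
attr = integrated_gradients(model_grad, x, baseline)
```

The negative attribution on the second feature is the kind of opposing evidence that signed attribution maps retain. The completeness property, where attributions sum to F(x) - F(baseline), is a quick sanity check for any IG implementation.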

2602.16399 2026-05-21 eess.AS cs.LG cs.SD 版本更新

Multi-Channel Replay Speech Detection using Acoustic Maps

基于声学地图的多通道回放语音检测

Michael Neri, Tuomas Virtanen

发表机构 * Faculty of Information Technology and Communication Sciences（信息技术与通信科学学院）； Tampere University（坦佩雷大学）； Tampere, Finland（芬兰坦佩雷）

AI总结本文提出利用声学地图作为新型空间特征表示方法，用于多通道录音中的回放语音检测，通过轻量级卷积神经网络在ReMASC数据集上实现了竞争性性能，展示了声学地图在不同设备和声学环境下的紧凑且物理可解释的特征空间。

Comments Accepted in EUSIPCO 2026
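
Acoustic maps for multi-channel arrays are commonly built from pairwise generalized cross-correlation with phase transform (GCC-PHAT). These correlations are aggregated over microphone pairs and candidate source directions, for example as a steered response power map. The summary above does not specify how the paper computes its maps. The following NumPy sketch therefore only illustrates the GCC-PHAT building block: estimating the time difference of arrival between two channels.

```python
import numpy as np


def gcc_phat(sig: np.ndarray, ref: np.ndarray, max_shift: int) -> int:
    """Lag (in samples) by which `sig` trails `ref`, from the GCC-PHAT peak."""
    n = len(sig) + len(ref)                          # zero-pad to avoid circular wrap-around
    cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cross /= np.abs(cross) + 1e-12                   # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))  # lags -max_shift .. +max_shift
    return int(np.argmax(np.abs(cc))) - max_shift


rng = np.random.default_rng(0)
ref = rng.standard_normal(1024)                      # channel 1: broadband source
delayed = np.concatenate((np.zeros(5), ref[:-5]))    # channel 2: same source, 5 samples later
lag = gcc_phat(delayed, ref, max_shift=16)
```

In a full acoustic map, the correlation value at the lag implied by each candidate direction would be summed over all microphone pairs. This yields one map value per direction.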

2602.10989 2026-05-21 math.ST cs.IT cs.LG math.IT math.PR stat.ML stat.TH 版本更新

用于过程监控的图自编码器

Xiangrui Zhang

发表机构 * School of Information and Control Engineering, China University of Mining and Technology（信息与控制工程学院，中国矿业大学）

AI总结本文提出了一种因果图时空自编码器（CGSTAE），通过结合基于空间自注意力机制的空间相关图结构学习模块和利用图卷积长短期记忆（GCLSTM）的空间-时间编码器-解码器模块，以提高工业过程监控的可靠性和可解释性。

AI中文摘要

为提高工业过程监控的可靠性和可解释性，本文提出了一种因果图时空自编码器（CGSTAE）。CGSTAE的网络架构结合了两个组件：基于空间自注意力机制（SSAM）的相关图结构学习模块，以及利用图卷积长短期记忆（GCLSTM）的时空编码器-解码器模块。SSAM通过捕捉变量之间的动态关系来学习相关图，同时本文引入了一种新的三步因果图结构学习算法，从这些相关图中推导出因果图。该算法利用因果不变性原理的逆向视角，从变化的相关性中揭示不变的因果图。时空编码器-解码器由GCLSTM单元构建，在序列到序列框架内重建时间序列过程数据。所提出的CGSTAE通过特征空间和残差空间中的两个统计量实现有效的过程监控和故障检测。最后，我们通过田纳西-伊斯曼过程和一个真实的空分过程验证了CGSTAE在过程监控中的有效性。

英文摘要

To improve the reliability and interpretability of industrial process monitoring, this article proposes a Causal Graph Spatial-Temporal Autoencoder (CGSTAE). The network architecture of CGSTAE combines two components: a correlation graph structure learning module based on spatial self-attention mechanism (SSAM) and a spatial-temporal encoder-decoder module utilizing graph convolutional long short-term memory (GCLSTM). The SSAM learns correlation graphs by capturing dynamic relationships between variables, while a novel three-step causal graph structure learning algorithm is introduced to derive a causal graph from these correlation graphs. The algorithm leverages a reverse perspective of causal invariance principle to uncover the invariant causal graph from varying correlations. The spatial-temporal encoder-decoder, built with GCLSTM units, reconstructs time-series process data within a sequence-to-sequence framework. The proposed CGSTAE enables effective process monitoring and fault detection through two statistics in the feature space and residual space. Finally, we validate the effectiveness of CGSTAE in process monitoring through the Tennessee Eastman process and a real-world air separation process.
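
Monitoring with two statistics in the feature space and residual space is commonly done with Hotelling's T² on latent features and the squared prediction error (SPE, or Q) on reconstruction residuals. The NumPy sketch below assumes the latent features and residuals come from an already trained encoder-decoder; synthetic Gaussian stand-ins are used here. It sets empirical 99th-percentile control limits, which may differ from the paper's choice.

```python
import numpy as np


def _t2(latent: np.ndarray, mu: np.ndarray, cov_inv: np.ndarray) -> np.ndarray:
    d = latent - mu
    return np.einsum("ij,jk,ik->i", d, cov_inv, d)  # Hotelling's T^2 per sample


def fit_monitor(train_latent: np.ndarray, train_resid: np.ndarray, alpha: float = 0.99) -> dict:
    """Estimate feature statistics and empirical control limits from fault-free training data."""
    mu = train_latent.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(train_latent, rowvar=False))
    spe = (train_resid ** 2).sum(axis=1)             # squared prediction error per sample
    return {
        "mu": mu,
        "cov_inv": cov_inv,
        "t2_limit": np.quantile(_t2(train_latent, mu, cov_inv), alpha),
        "spe_limit": np.quantile(spe, alpha),
    }


def flag_faults(model: dict, latent: np.ndarray, resid: np.ndarray) -> np.ndarray:
    """True where either statistic exceeds its control limit."""
    t2 = _t2(latent, model["mu"], model["cov_inv"])
    spe = (resid ** 2).sum(axis=1)
    return (t2 > model["t2_limit"]) | (spe > model["spe_limit"])


rng = np.random.default_rng(0)
monitor = fit_monitor(rng.standard_normal((500, 3)), 0.1 * rng.standard_normal((500, 5)))
test_latent = np.zeros((2, 3))
test_resid = np.array([[0.0] * 5, [1.0] * 5])  # second sample: large reconstruction error
flags = flag_faults(monitor, test_latent, test_resid)
```

T² reacts to unusual but well-reconstructed operating points. SPE reacts to samples the model cannot reconstruct. Checking both is what gives the two-statistic scheme its coverage.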

2602.02304 2026-05-21 cs.AI cs.LG 版本更新

Comparing Explanations is Not Enough, Explain the Change: New Standards are Needed to Explain Behavioral Shifts in Large Language Models

比较解释并不足够，解释变化：需要新的标准来解释大型语言模型中的行为转变

Martino Ciaperoni, Marzio Di Vece, Roberto Pellungrini, Luca Pappalardo, Fosca Giannotti, Francesco Giannini

发表机构 * Scuola Normale Superiore（比萨高等师范学校）； ISTI-CNR（意大利国家研究委员会ISTI研究所）； University of Pisa（比萨大学）

AI总结本文提出了一种新的XAI方法，旨在解释大型语言模型在干预后行为转变的原因和机制，以应对现有解释方法无法解释行为转变的问题。

AI中文摘要

大规模基础模型在受到缩放、微调、人类反馈强化学习或上下文学习等干预时会表现出行为转变。当前的可解释性方法在结构上不适用于解释这些转变，因为它们要么像传统可解释AI（XAI）方法那样将模型视为静态对象，要么仅仅比较模型不同检查点的独立解释。因此，这些方法无法解释两个模型实例之间的功能转变，即某种行为在干预后发生了变化。这种差距在欧盟人工智能法案、美国州立法和中国人工智能法规等司法管辖区中带来了重大治理风险，这些法规要求记录重大系统修改的因果链。本立场论文主张，解释大型语言模型的行为转变需要一种有原则的方法，将转变本身作为解释的主要对象：即解释干预如何以及为何将参考模型转变为具有不同行为的更新模型。为了支持这一主张，我们引入了比较XAI（XAI_Δ），这是一种旨在解释行为发生转变的两个模型检查点之间差异的新XAI范式，以及一组规定XAI_Δ解释器和解释必须满足的要求，包括可比性、有效性、可操作性和监控，目标是将模型审计建立在明确、可测量的要求之上。最后，我们通过示例实验提供初步证据，表明实践中需要XAI_Δ，并将结果汇总成一份可直接用于治理和事件记录的转换报告。

English abstract

Large-scale foundation models exhibit \emph{behavioral shifts} when subjected to interventions such as scaling, fine-tuning, reinforcement learning with human feedback, or in-context learning. Current explainability methods are structurally ill-suited to explain these shifts, because they either treat models as static objects, as traditional eXplainable AI (XAI) approaches do, or merely compare independent explanations across different checkpoints of a model. As a result, these approaches fail to explain the functional transition between two model instances in which a certain behavior has shifted following an intervention. This gap creates significant governance risks across jurisdictions including the EU AI Act, US state legislation, and Chinese AI regulations, which require documenting causal chains for substantial system modifications. This position paper argues that explaining behavioral shifts in large language models requires a principled approach that treats the shift itself as the primary object of explanation: namely, one that explains how and why an intervention transforms a reference model into an updated model with different behavior. To support this claim, we introduce \textit{Comparative} XAI (XAI$_Δ$), a novel XAI paradigm aimed at explaining the difference between two model checkpoints where a behavior has shifted, together with a set of desiderata specifying what XAI$_Δ$ explainers and explanations must satisfy, including comparability, validity, actionability, and monitoring, with the goal of grounding model auditing in explicit, measurable requirements. Finally, we provide preliminary evidence suggesting the need for XAI$_Δ$ in practice through illustrative experiments, compiling the resulting findings into a transition report directly usable for governance and incident documentation.

2601.22932 2026-05-21 cs.LG Updated version

DC-LA: Difference-of-Convex Langevin Algorithm

Hoang Phuc Hau Luu, Zhongjian Wang

Affiliations: Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore

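AI summary: This paper studies a sampling problem whose target distribution is π ∝ exp(-f-r). The data fidelity term f is Lipschitz smooth, and the regularizer r = r1 - r2 is a non-smooth difference-of-convex (DC) function. Exploiting the DC structure of r, Moreau envelopes are applied separately to r1 and r2 to smooth r. The concave part of the regularizer is then reassigned to the data fidelity term, and the corresponding proximal Langevin algorithm (termed DC-LA) is studied. Under the assumption that V is distant dissipative, DC-LA is shown to converge to the target distribution π in the q-Wasserstein distance for all q ∈ ℕ*, up to discretization and smoothing errors. The results improve on previous work on non-log-concave sampling.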

English abstract

We study a sampling problem whose target distribution is $π\propto \exp(-f-r)$ where the data fidelity term $f$ is Lipschitz smooth while the regularizer term $r=r_1-r_2$ is a non-smooth difference-of-convex (DC) function, i.e., $r_1,r_2$ are convex. By leveraging the DC structure of $r$, we can smooth out $r$ by applying Moreau envelopes to $r_1$ and $r_2$ separately. In line with DC programming, we then redistribute the concave part of the regularizer to the data fidelity and study its corresponding proximal Langevin algorithm (termed DC-LA). We establish convergence of DC-LA to the target distribution $π$, up to discretization and smoothing errors, in the $q$-Wasserstein distance for all $q \in \mathbb{N}^*$, under the assumption that $V$ is distant dissipative. Our results improve previous work on non-log-concave sampling in terms of a more general framework and assumptions. Numerical experiments show that DC-LA produces accurate distributions in synthetic settings and provides qualitatively reasonable uncertainty quantification in a real-world Computed Tomography application.
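The smoothing step rests on a standard identity. For convex $g$, the Moreau envelope $M_{\lambda} g$ is differentiable with gradient $(x - \mathrm{prox}_{\lambda g}(x))/\lambda$, so both $r_1$ and $r_2$ can be smoothed through their proximal maps. The minimal sketch below makes several assumptions:

- The regularizer is $r(x) = \alpha\lVert x\rVert_1 - \beta\lVert x\rVert_2$, a common DC example that the abstract does not name.
- The data term enters only through a user-supplied gradient.
- The update is a plain unadjusted Langevin step on the smoothed potential $f + M_{\lambda} r_1 - M_{\lambda} r_2$.

The paper's DC-LA is a proximal Langevin scheme and may treat $r_1$ differently. All function names and step sizes here are hypothetical.

```python
import math
import random

def prox_l1(x, t):
    """Soft-thresholding: proximal map of t * ||x||_1."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in x]

def prox_l2(x, t):
    """Block shrinkage: proximal map of t * ||x||_2."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= t:
        return [0.0] * len(x)
    scale = 1.0 - t / norm
    return [scale * v for v in x]

def moreau_grad(x, prox, lam):
    """Gradient of the Moreau envelope M_lam g, computed as (x - prox_{lam g}(x)) / lam."""
    p = prox(x, lam)
    return [(xi - pi) / lam for xi, pi in zip(x, p)]

def dc_langevin(grad_f, x0, alpha, beta, lam=0.1, step=1e-2, n_steps=1000, seed=0):
    """Unadjusted Langevin on f + M_lam(alpha*||.||_1) - M_lam(beta*||.||_2).

    Assumption: this plain gradient step is an illustration, not the exact DC-LA update.
    """
    rng = random.Random(seed)
    x = list(x0)
    samples = []
    for _ in range(n_steps):
        g1 = moreau_grad(x, lambda v, l: prox_l1(v, alpha * l), lam)  # smoothed convex part r1
        g2 = moreau_grad(x, lambda v, l: prox_l2(v, beta * l), lam)   # smoothed r2, subtracted below
        gf = grad_f(x)
        x = [xi - step * (a + b - c) + math.sqrt(2 * step) * rng.gauss(0.0, 1.0)
             for xi, a, b, c in zip(x, gf, g1, g2)]
        samples.append(list(x))
    return samples
```

For example, `dc_langevin(lambda x: list(x), [0.0, 0.0], alpha=1.0, beta=0.5)` samples from a Gaussian data term combined with the smoothed $\ell_1 - \ell_2$ prior.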

2601.22292 2026-05-21 cs.MA cs.LG Updated version

Learning Incentive Structures for Cooperative Resilience in Multi-Agent Systems under Social Dilemmas

Manuela Chacon-Chamorro, Luis Felipe Giraldo, Nicanor Quijano

Affiliations: School of Engineering, Universidad de los Andes

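AI summary: This paper studies how multi-agent reinforcement learning systems can learn incentive structures that promote collective welfare under social dilemmas. It proposes a resilience metric for evaluating and ranking agent trajectories and evaluates system performance in a resource-sharing environment under three incentive structures.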

Comments Supplementary material in https://github.com/mavivi95/supplementary_files/blob/main/Learning_TCSS___Supplementary_File__AN_.pdf Updated version submitted to IEEE Transactions on Computational Social Systems (TCSS). This preprint is under review for possible publication in IEEE

Secure, Verifiable, and Scalable Multi-Client Data Sharing via Consensus-Based Privacy-Preserving Data Distribution

Prajwal Panth, Sahaj Raj Malla

Affiliations: School of Computer Engineering, KIIT University; Department of Mathematics, Kathmandu University

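AI summary: This paper proposes the Consensus-Based Privacy-Preserving Data Distribution (CPPDD) framework, a lightweight protocol for secure multi-client data aggregation that runs autonomously after setup. The framework enforces unanimous-release confidentiality through a dual-layer protection mechanism that combines per-client affine masking with priority-driven sequential consensus locking. Decentralized integrity is achieved through step (sigma_S) and data (sigma_D) checksums, enabling automatic detection of malicious deviation and atomic abort without persistent coordination. The design supports scalar, vector, and matrix payloads with O(N*D) computation and communication complexity, optional edge-server offloading, and collusion resistance under N-1 corruptions. Formal analysis proves correctness, Consensus-Dependent Integrity and Fairness (CDIF) with high-probability abort on deviation, and IND-CPA security assuming a pseudorandom function family. Empirical evaluation on MNIST-derived vectors shows linear scalability up to N = 500 with sub-millisecond per-client computation time. The framework achieves 100% malicious-deviation detection, exact data recovery, and three to four orders of magnitude fewer FLOPs than MPC and HE baselines. CPPDD enables atomic collaboration in secure voting, consortium federated learning, blockchain escrows, and geo-information capacity building. It addresses key gaps in scalability, trust minimization, and verifiable multi-party computation in regulated and resource-constrained environments.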

Comments 25 pages, 6 figures, preprint

English abstract

We propose the Consensus-Based Privacy-Preserving Data Distribution (CPPDD) framework, a lightweight and post-setup autonomous protocol for secure multi-client data aggregation. The framework enforces unanimous-release confidentiality through a dual-layer protection mechanism that combines per-client affine masking with priority-driven sequential consensus locking. Decentralized integrity is verified via step (sigma_S) and data (sigma_D) checksums, facilitating autonomous malicious deviation detection and atomic abort without requiring persistent coordination. The design supports scalar, vector, and matrix payloads with O(N*D) computation and communication complexity, optional edge-server offloading, and resistance to collusion under N-1 corruptions. Formal analysis proves correctness, Consensus-Dependent Integrity and Fairness (CDIF) with overwhelming-probability abort on deviation, and IND-CPA security assuming a pseudorandom function family. Empirical evaluations on MNIST-derived vectors demonstrate linear scalability up to N = 500 with sub-millisecond per-client computation times. The framework achieves 100% malicious deviation detection, exact data recovery, and three-to-four orders of magnitude lower FLOPs compared to MPC and HE baselines. CPPDD enables atomic collaboration in secure voting, consortium federated learning, blockchain escrows, and geo-information capacity building, addressing critical gaps in scalability, trust minimization, and verifiable multi-party computation for regulated and resource-constrained environments.

2512.13788 2026-05-21 cs.LG cs.RO Updated version

Constrained Policy Optimization via Sampling-Based Weight-Space Projection

Shengfan Cao, Francesco Borrelli, Eunhyek Joa

Affiliations: Department of Mechanical Engineering, Seoul National University, Seoul, Korea

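AI summary: This study proposes SCPO, a sampling-based weight-space projection method for optimizing policies without leaving the safe operating region. It enforces safety constraints directly in parameter space, which keeps training safe and feasible throughout and achieves closed-loop stability on constrained control tasks.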

Comments Accepted for publication at IFAC World Congress 2026; fixed minor notation inconsistencies

Jasraj Singh, Shelvia Wongso, Jeremie Houssineau, Badr-Eddine Chérief-Abdellatif

Affiliations: Nanyang Technological University

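AI summary: This paper proposes a variational inference method based on possibility theory. By establishing a maxitive Donsker-Varadhan formulation, it removes the dependence of conventional variational inference on additivity. It also introduces the CBOpt optimizers to improve performance on image classification tasks.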

Comments 37 pages, 3 figures, 13 tables

English abstract

Variational inference (VI) is a cornerstone of modern Bayesian learning, enabling approximate inference in complex models. However, its formulation depends on expectations and divergences defined through high-dimensional integrals, often rendering analytical treatment impossible and necessitating heavy reliance on approximations. Possibility theory, an imprecise probability framework, allows us to directly model epistemic uncertainty instead of relying on a subjective interpretation of probabilities. While this framework provides robustness and interpretability under sparse or imprecise information, adapting VI to the possibilistic setting requires rethinking core concepts such as divergences, which presuppose additivity. In this work, we develop a principled formulation for performing possibilistic VI by establishing a maxitive analogue of the classical Donsker-Varadhan formulation. The resulting framework enables us to derive a learning rule for possibilistic VI with exponential-family candidates and practical update rules for neural-network training, giving rise to a family of optimizers termed CBOpt. Finally, we demonstrate that CBOpt achieves competitive performance on both in-domain and out-of-domain image classification tasks.

2510.14444 2026-05-21 cs.LG cs.AI Updated version

A Free Lunch in LLM Compression: Revisiting Retraining after Pruning

Moritz Wagner, Christophe Roux, Max Zimmer, Sebastian Pokutta

Affiliations: Department for AI in Society, Science, and Technology, Zuse Institute Berlin; Institute of Mathematics, Technische Universität Berlin

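AI summary: This paper studies post-pruning adaptation via local reconstruction. It finds that local reconstruction effectively recovers model performance while reducing data and compute costs. It also shows how the granularity of the reconstructed parameter window affects final quality, challenging the prevailing view that post-pruning adaptation is impractical for LLMs.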

English abstract

Post-training pruning can substantially reduce LLM inference costs, but it often degrades quality unless the remaining weights are adapted. Since global retraining is expensive at LLM scale, recent work has largely focused on increasingly sophisticated pruning criteria that aim to select better sparsity patterns without adaptation. We revisit this trade-off through local reconstruction: after pruning, we adapt one subset of the model parameters at a time on a calibration set, training it to match the corresponding intermediate activations of the dense model. We evaluate local reconstruction across model families and scales, up to 72B parameters, and establish three main findings. First, local reconstruction is an effective adaptation mechanism for LLMs: it matches post-pruning retraining while using over an order of magnitude less data and compute, even when using PEFT techniques. Second, reconstruction exhibits a broad "free-lunch" regime in granularity, i.e., the reconstruction parameter window: as long as the reconstructed region contains at least one nonlinear submodule, final quality is largely insensitive to the window size, allowing granularity to be chosen primarily based on memory constraints. In contrast, reconstructing individual matrices, despite being the natural approach often proposed in the literature, consistently underperforms, as small matrix-level errors accumulate into larger activation drift. Lastly, reconstruction reduces the relative importance of the pruning criterion: performance gaps between sophisticated criteria and simple baselines shrink with model scale, making simple methods competitive again. Overall, our results challenge the prevailing view that post-pruning adaptation is impractical for LLMs.
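The reconstruction objective is easiest to see at the smallest granularity, where it has a closed form. First prune a linear layer's weights. Then refit each output row's surviving weights by least squares so that the layer's outputs on calibration inputs match those of the dense layer. This is the matrix-level granularity that the abstract reports as underperforming, and it appears here only because its objective is simple to write down. The paper instead trains parameter windows, ideally ones containing a nonlinear submodule, to match intermediate activations. The sketch assumes NumPy and unstructured magnitude pruning, and the function names are hypothetical.

```python
import numpy as np

def magnitude_mask(W, sparsity):
    """Boolean keep-mask that zeroes (at least) a `sparsity` fraction of smallest-|w| entries."""
    k = int(round(sparsity * W.size))
    if k == 0:
        return np.ones_like(W, dtype=bool)
    thresh = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    return np.abs(W) > thresh

def reconstruct(W, mask, X):
    """Refit surviving weights so that X @ W_new.T matches the dense output X @ W.T."""
    Y = X @ W.T                      # dense layer outputs on the calibration inputs
    W_new = np.zeros_like(W)
    for i in range(W.shape[0]):      # each output row is an independent least-squares problem
        cols = mask[i]
        if cols.any():
            sol, *_ = np.linalg.lstsq(X[:, cols], Y[:, i], rcond=None)
            W_new[i, cols] = sol
    return W_new
```

The masked dense weights are a feasible point of each least-squares problem, so the refit can only lower the layer-output error on the calibration set.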

2510.06824 2026-05-21 cs.LG Updated version

Efficient numeracy in language models through single-token number embeddings

Linus Kreitner, Paul Hager, Jonathan Mengedoht, Georgios Kaissis, Daniel Rueckert, Martin J. Menten

Affiliations: Chair for AI in Healthcare and Medicine, Technical University of Munich (TUM) and TUM University Hospital, Munich, Germany; Department of Computing, Imperial College London, UK; Munich Center for Machine Learning (MCML), Munich, Germany; Hasso Plattner Institute for Digital Engineering, University of Potsdam, Germany

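AI summary: This paper proposes BitTokens, a method that encodes each number as a single token using its IEEE 754 binary floating-point representation. This lets language models handle numerical computation more efficiently and thereby improves their ability to solve complex problems.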

English abstract

To drive progress in science and engineering, large language models (LLMs) must be able to process large amounts of numerical data and solve long calculations efficiently. This is currently only possible through the use of external tools or extensive reasoning chains, either weakening the numerical representations of LLMs or limiting the length of problems they can solve. We show that frontier LLMs require excessive amounts of reasoning tokens to solve even basic calculations, which is exacerbated by their tokenization strategies that split single numbers into multiple tokens. This motivates the need for efficient and effective single-token number encodings. We introduce a set of desiderata for such encodings and show that existing approaches fail to fulfill them. To address these shortcomings, we propose BitTokens, a novel encoding strategy that represents any number as a single token using its IEEE 754 binary floating-point representation. Through extensive experiments we show that our BitTokens allow even small language models to learn algorithms that solve basic arithmetic operations nearly perfectly. This newly gained efficiency could expand the length and complexity of problems language models can solve.
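The raw material for a single-token number encoding is the number's IEEE 754 bit pattern, which Python's `struct` module exposes directly. A float64 packs into 64 bits: 1 sign bit, 11 exponent bits and 52 mantissa bits. A model could, for instance, pass these bits through a small learned projection to form one token embedding. The sketch only extracts and inverts the bit pattern. The choice of float64 over another precision, the bit order and the helper names are assumptions, not the paper's exact encoding.

```python
import struct

def float_to_bits(x):
    """Return the 64 IEEE 754 double-precision bits of x, most significant bit first."""
    (as_int,) = struct.unpack(">Q", struct.pack(">d", x))
    return [(as_int >> (63 - i)) & 1 for i in range(64)]

def bits_to_float(bits):
    """Inverse of float_to_bits: rebuild the double from its 64 bits."""
    as_int = 0
    for b in bits:
        as_int = (as_int << 1) | b
    (x,) = struct.unpack(">d", struct.pack(">Q", as_int))
    return x

def split_fields(bits):
    """Split the bit list into (sign, exponent, mantissa) fields of 1, 11 and 52 bits."""
    return bits[0], bits[1:12], bits[12:]
```

For 1.0 the sign bit is 0, the exponent field is 01111111111 (bias 1023) and the mantissa is all zeros.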

2510.00171 2026-05-21 quant-ph cs.LG Updated version

Quantum reservoir computing in Jaynes-Cummings models: Nonlinear memory and time-series prediction

Sreetama Das, Gian Luca Giorgi, Roberta Zambrini

Affiliations: Institute for Cross-Disciplinary Physics and Complex Systems (IFISC) UIB-CSIC

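AI summary: This paper studies quantum reservoir computing based on the Jaynes-Cummings model. It examines the model's nonlinear memory and time-series prediction and demonstrates its value for complex dynamical systems.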

Comments 16 pages, 14 figures, published version

DOI: 10.1103/ffd3-ytbt
Journal ref: Phys. Rev. Research 8, 023148 (2026)

English abstract

We investigate quantum reservoir computing (QRC) using a hybrid qubit-boson system described by the Jaynes-Cummings (JC) Hamiltonian and its dispersive limit (DJC). These models provide high-dimensional Hilbert spaces and intrinsic nonlinear dynamics, making them powerful substrates for temporal information processing. We systematically benchmark both reservoirs through linear and nonlinear memory tasks, demonstrating that, unusually, their nonlinear memory capacity exceeds their linear memory capacity. We further test their predictive performance on the Mackey-Glass time series, a widely used benchmark for chaotic dynamics, and show comparable forecasting ability. We also investigate how memory and prediction accuracy vary with reservoir parameters, and show the role of higher-order bosonic observables and time multiplexing in enhancing expressivity, even in minimal spin-boson configurations. Our results establish JC- and DJC-based reservoirs as versatile platforms for time-series processing and as elementary units that overcome the setting of equivalent qubit pairs and offer pathways toward tunable, high-performance quantum machine learning architectures.
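Memory tasks in reservoir computing are usually scored by a memory capacity. For each delay k, it is the squared correlation between a target and the trained readout's prediction, summed over delays. The target is the input delayed by k steps, or a nonlinear function of it for a nonlinear task. The sketch below computes that score from given targets and predictions and omits the reservoir and readout training. Squaring the delayed input as the nonlinear target, and this exact normalization, are assumptions rather than the paper's stated definitions.

```python
def squared_correlation(y, y_hat):
    """Squared Pearson correlation between target y and prediction y_hat (0 if either is constant)."""
    n = len(y)
    my, mh = sum(y) / n, sum(y_hat) / n
    cov = sum((a - my) * (b - mh) for a, b in zip(y, y_hat))
    vy = sum((a - my) ** 2 for a in y)
    vh = sum((b - mh) ** 2 for b in y_hat)
    if vy == 0 or vh == 0:
        return 0.0
    return cov * cov / (vy * vh)

def memory_capacity(targets_by_delay, preds_by_delay):
    """Total capacity: sum over delays of C_k = r^2(target_k, prediction_k)."""
    return sum(squared_correlation(t, p) for t, p in zip(targets_by_delay, preds_by_delay))

def delayed_targets(u, max_delay, nonlinear=False):
    """Targets u[t-k] (linear task) or u[t-k]**2 (a simple nonlinear task) for k = 1..max_delay."""
    out = []
    for k in range(1, max_delay + 1):
        seq = [u[t - k] for t in range(max_delay, len(u))]
        out.append([v * v for v in seq] if nonlinear else seq)
    return out
```

A readout that recalled every delayed input perfectly would reach a capacity equal to the number of delays evaluated.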

2509.26627 2026-05-21 cs.AI cs.LG cs.RO Updated version

TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance

Yuyang Liu, Chuan Wen, Yihang Hu, Dinesh Jayaraman, Yang Gao

Affiliations: Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China; Shanghai Qi Zhi Institute; Shanghai Jiao Tong University; University of Pennsylvania

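AI summary: This paper proposes TimeRewarder, which learns dense rewards from passive videos via frame-wise temporal distance to improve reinforcement learning on sparse-reward tasks. Experiments show substantial gains in success rate and sample efficiency across multiple tasks.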

Comments ICML 2026 spotlight paper

English abstract

Designing dense rewards is crucial for reinforcement learning (RL), yet in robotics it often demands extensive manual effort and lacks scalability. One promising solution is to view task progress as a dense reward signal, as it quantifies the degree to which actions advance the system toward task completion over time. We present TimeRewarder, a simple yet effective reward learning method that derives progress estimation signals from passive videos, including robot demonstrations and human videos, by modeling temporal distances between frame pairs. We then demonstrate how TimeRewarder can supply step-wise proxy rewards to guide reinforcement learning. In our comprehensive experiments on ten challenging Meta-World tasks, we show that TimeRewarder dramatically improves RL for sparse-reward tasks, achieving nearly perfect success in 9/10 tasks with only 200,000 environment interactions per task. This approach outperformed previous methods and even the manually designed environment dense reward on both the final success rate and sample efficiency. Moreover, we show that TimeRewarder pretraining can exploit real-world human videos, highlighting its potential as a scalable approach to rich reward signals from diverse video sources.
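The temporal-distance idea can be sketched in two pieces:

1. Training targets for frame pairs (i, j), taken here as the signed, normalized frame gap (j - i)/(T - 1).
2. A step-wise proxy reward, equal to the predicted progress between consecutive observations.

The normalization, the uniform pair sampling and the function names are assumptions. `predict_distance` is a hypothetical stand-in for the learned model that maps two frames to a distance.

```python
import random

def sample_pair_targets(num_frames, num_pairs, seed=0):
    """Sample frame-index pairs (i, j) with the normalized signed gap as target (num_frames >= 2)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_pairs):
        i = rng.randrange(num_frames)
        j = rng.randrange(num_frames)
        pairs.append((i, j, (j - i) / (num_frames - 1)))  # target lies in [-1, 1]
    return pairs

def dense_rewards(observations, predict_distance):
    """Step-wise proxy reward: predicted progress from o_t to o_{t+1}."""
    return [predict_distance(observations[t], observations[t + 1])
            for t in range(len(observations) - 1)]
```

With an ideal predictor the rewards telescope, so an episode's summed reward equals the total progress from its first frame to its last.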

2509.25606 2026-05-21 cs.LG Updated version

Effective Model Pruning: Measure The Redundancy of Model Components

Yixuan Wang, Dan P. Guralnik, Saiedeh Akbari, Warren E. Dixon

Affiliations: Department of Mechanical and Aerospace Engineering, University of Florida; Department of Mathematics, Ohio University

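AI summary: This paper studies a basic question in model pruning and proposes a pruning method based on the effective sample size. The method analyzes the distribution of importance scores to determine how many components can be discarded, and it is validated on a range of network architectures.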

Comments 18 pages, 4 figures. Accepted at ICML 2026 (Spotlight)

English abstract

This article initiates the study of a basic question about model pruning. Given a vector $s$ of importance scores assigned to model components, how many of the scored components could be discarded without sacrificing performance? We propose Effective Model Pruning (EMP), which derives the desired sparsity directly from the score distribution using the notion of effective sample size from particle filtering, also known as the inverse Simpson index. Rather than prescribe a pruning criterion, EMP supplies a universal adaptive threshold derived from the distribution of the score $s$ over the model components: EMP maps $s$ to a number $N_{eff}=N_{eff}(s)$, called the effective sample size. The $N-N_{eff}$ lowest scoring components are discarded. A tight lower bound on the effective mass $s_{eff}$ (the sum of retained normalized scores) in terms of $N_{eff}$ is derived. This process yields models with a provable upper bound on the loss change relative to the original dense model. Numerical experiments are performed demonstrating this phenomenon across a variety of network architectures including MLPs, CNNs, Transformers, LLMs, and KAN. It is also shown that EMP addresses a rich set of pruning criteria such as weight magnitude, attention score, KAN importance score, and even feature-level signals such as image pixels.
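The effective-sample-size threshold is short enough to write out. With normalized scores $p_i = s_i/\sum_j s_j$, the inverse Simpson index is $N_{eff} = 1/\sum_i p_i^2$, and the $N - N_{eff}$ lowest-scoring components are dropped. The sketch assumes nonnegative scores and rounds $N_{eff}$ up to an integer. The rounding rule and the function name are assumptions, not the paper's specification.

```python
import math

def effective_prune(scores):
    """Return (kept indices, N_eff, effective mass) for nonnegative importance scores."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must be nonnegative with a positive sum")
    p = [s / total for s in scores]                   # normalized scores
    n_eff = 1.0 / sum(q * q for q in p)               # inverse Simpson index
    k = min(len(scores), math.ceil(n_eff - 1e-9))     # round up, guarding against float noise
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = sorted(order[:k])
    s_eff = sum(p[i] for i in kept)                   # retained normalized mass s_eff
    return kept, n_eff, s_eff
```

Uniform scores give $N_{eff} = N$, so nothing is pruned. A single nonzero score gives $N_{eff} = 1$. For the scores [10, 2, 1, 0.5], $N_{eff} \approx 1.73$, which rounds up to keeping the two largest.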

2509.22963 2026-05-21 cs.LG Updated version

Reinforcement Learning with Discrete Diffusion Policies for Combinatorial Action Spaces

Haitong Ma, Ofir Nabati, Aviv Rosenberg, Bo Dai, Oran Lang, Craig Boutilier, Na Li, Shie Mannor, Lior Shani, Guy Tenneholtz

Affiliations: Google Research; Harvard University; Google DeepMind; Nvidia Research

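AI summary: This paper proposes a new framework for training effective discrete diffusion model policies in complex combinatorial action spaces. An efficient online training procedure combined with policy mirror descent yields stable policy improvement and state-of-the-art performance on several challenging combinatorial benchmarks.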

Comments 22 pages, 10 figures. Haitong Ma and Ofir Nabati contributed equally to this paper

English abstract

Reinforcement learning (RL) struggles to scale to large, combinatorial action spaces common in many real-world problems. This paper introduces a novel framework for training discrete diffusion models as highly effective policies in these complex settings. Our key innovation is an efficient online training process that ensures stable and effective policy improvement. By leveraging policy mirror descent (PMD) to define an ideal, regularized target policy distribution, we frame the policy update as a distributional matching problem, training the expressive diffusion model to replicate this stable target. This decoupled approach stabilizes learning and significantly enhances training performance. Our method achieves state-of-the-art results and superior sample efficiency across a diverse set of challenging combinatorial benchmarks, including DNA sequence generation, RL with macro-actions, and multi-agent systems. Experiments demonstrate that our diffusion policies attain superior performance compared to other baselines.
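With a KL regularizer, policy mirror descent has the well-known closed-form target $\pi_{k+1}(a) \propto \pi_k(a)\exp(Q(a)/\eta)$. The sketch below computes that target over a small, explicitly enumerated action set. In the combinatorial settings the abstract addresses, the action set cannot be enumerated, and the paper instead trains the diffusion model to replicate the target distribution. The KL choice, the temperature `eta` and the function name are assumptions for illustration.

```python
import math

def pmd_target(prev_probs, q_values, eta):
    """KL-regularized mirror-descent target: pi_new(a) ∝ pi_old(a) * exp(Q(a) / eta).

    prev_probs must be strictly positive (log is taken); eta > 0 is the step temperature.
    """
    logits = [math.log(p) + q / eta for p, q in zip(prev_probs, q_values)]
    m = max(logits)                                  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    z = sum(weights)
    return [w / z for w in weights]
```

A large `eta` keeps the target close to the previous policy, a small `eta` approaches greedy selection on `Q`, and equal `Q` values leave the policy unchanged.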

2509.13648 2026-05-21 cs.LG cs.IR Updated version

Sequential Data Augmentation for Generative Recommendation

Geon Lee, Bhuvesh Kumar, Clark Mingxuan Ju, Tong Zhao, Kijung Shin, Neil Shah, Liam Collins

Affiliations: Snap Inc.

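AI summary: This paper studies the effect of data augmentation in generative recommendation and proposes GenPAS, a systematic framework that unifies many augmentation strategies through three bias-controlled steps. GenPAS improves model accuracy, data efficiency, and parameter efficiency.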

English abstract

Generative recommendation plays a crucial role in personalized systems, predicting users' future interactions from their historical behavior sequences. A critical yet underexplored factor in training these models is data augmentation, the process of constructing training data from user interaction histories. By shaping the training distribution, data augmentation directly and often substantially affects model generalization and performance. Nevertheless, in much of the existing work, this process is simplified, applied inconsistently, or treated as a minor design choice, without a systematic and principled understanding of its effects. Motivated by our empirical finding that different augmentation strategies can yield large performance disparities, we conduct an in-depth analysis of how they reshape training distributions and influence alignment with future targets and generalization to unseen inputs. To systematize this design space, we propose GenPAS, a generalized and principled framework that models augmentation as a stochastic sampling process over input-target pairs with three bias-controlled steps: sequence sampling, target sampling, and input sampling. This formulation unifies widely used strategies as special cases and enables flexible control of the resulting training distribution. Our extensive experiments on benchmark and industrial datasets demonstrate that GenPAS yields superior accuracy, data efficiency, and parameter efficiency compared to existing strategies, providing practical guidance for principled training data construction in generative recommendation. Our code is available at https://github.com/snap-research/GenPAS.
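A toy sampler can illustrate the three-step view of augmentation:

1. Pick a user sequence.
2. Pick a target position, with probability weighted by a bias exponent.
3. Pick how much of the preceding history to use as input.

The weighting scheme and the parameters `target_bias` and `max_input_len` are hypothetical. They are chosen so that familiar strategies appear as limiting cases: a large `target_bias` concentrates on last-item prediction, while 0 spreads targets uniformly over positions. They are not GenPAS's actual parameterization.

```python
import random

def sample_pair(sequences, target_bias=0.0, max_input_len=None, seq_weights=None, rng=None):
    """Draw one (input, target) pair in three steps; every sequence needs at least 2 items."""
    rng = rng if rng is not None else random.Random()
    # 1) Sequence sampling: uniform unless seq_weights is given.
    seq = rng.choices(sequences, weights=seq_weights, k=1)[0]
    # 2) Target sampling over positions 1..L-1 with weight (t / (L-1)) ** target_bias.
    L = len(seq)
    positions = list(range(1, L))
    weights = [(t / (L - 1)) ** target_bias for t in positions]
    t = rng.choices(positions, weights=weights, k=1)[0]
    # 3) Input sampling: a contiguous suffix of the history before t, of random length.
    hi = t if max_input_len is None else min(t, max_input_len)
    n = rng.randint(1, hi)
    return seq[t - n:t], seq[t]
```

Repeated calls with a shared `random.Random` instance produce an augmented training set whose distribution is controlled by the three knobs.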

2508.16474 2026-05-21 eess.SY cs.LG cs.SY math.OC Updated version

Reinforcement Learning-based Control via Y-wise Affine Neural Networks (YANNs)

Austin Braniff, Yuhe Tian

Affiliations: Department of Chemical and Biomedical Engineering, West Virginia University

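AI summary: This paper proposes a novel reinforcement learning algorithm based on Y-wise Affine Neural Networks (YANNs). It exploits the interpretability of YANNs to reformulate explicit solutions of multi-parametric linear model predictive control. These solutions initialize the RL actor and critic networks with the confidence of linear optimal control, and the algorithm ultimately solves general nonlinear optimization problems.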

DOI: 10.1016/j.compchemeng.2026.109610
Journal ref: Computers & Chemical Engineering, Volume 209, 109610 (2026)

English abstract

This work presents a novel reinforcement learning (RL) algorithm based on Y-wise Affine Neural Networks (YANNs). YANNs provide an interpretable neural network which can exactly represent known piecewise affine functions of arbitrary input and output dimensions defined on any number of polytopic subdomains. One representative application of YANNs is to reformulate explicit solutions of multi-parametric linear model predictive control. Built on this, we propose the use of YANNs to initialize RL actor and critic networks, which enables the resulting YANN-RL control algorithm to start with the confidence of linear optimal control. The YANN-actor is initialized by representing the multi-parametric control solutions obtained via offline computation using an approximated linear system model. The YANN-critic represents the explicit form of the state-action value function for the linear system and the reward function as the objective in an optimal control problem (OCP). Additional network layers are injected to extend YANNs for nonlinear expressions, which can be trained online by directly interacting with the true complex nonlinear system. In this way, both the policy and state-value functions exactly represent a linear OCP initially and are able to eventually learn the solution of a general nonlinear OCP. Continuous policy improvement is also implemented to provide heuristic confidence that the linear OCP solution serves as an effective lower bound to the performance of RL policy. The YANN-RL algorithm is demonstrated on a clipped pendulum and a safety-critical chemical-reactive system. Our results show that YANN-RL significantly outperforms the modern RL algorithm using deep deterministic policy gradient, especially when considering safety constraints.
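A small example shows how a neural network can exactly represent a piecewise affine control law. Consider the scalar saturated feedback $u = \mathrm{clip}(-Kx, -u_{max}, u_{max})$, which has three affine regions. It arises, for instance, as the explicit solution of a one-step-horizon scalar MPC problem with only an input bound. With $v = -Kx$, it equals $-u_{max} + \mathrm{ReLU}(v + u_{max}) - \mathrm{ReLU}(v - u_{max})$, i.e. two ReLU units and a bias. This illustrates exact piecewise-affine representation in general and is not the YANN construction itself. The gain and bound in the example are hypothetical.

```python
def relu(z):
    """Rectified linear unit."""
    return max(z, 0.0)

def saturated_feedback_relu(x, K, u_max):
    """Exact two-ReLU form of u = clip(-K x, -u_max, u_max) for scalar x."""
    v = -K * x                      # unconstrained linear feedback
    return -u_max + relu(v + u_max) - relu(v - u_max)

def saturated_feedback_clip(x, K, u_max):
    """Reference piecewise-affine law written with an explicit clip."""
    return min(max(-K * x, -u_max), u_max)
```

The two forms agree everywhere. Inside $|v| \le u_{max}$ only the first ReLU is active and returns $v$. Outside that interval both ReLUs are active (or both are off), and the output saturates at $\pm u_{max}$.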

2508.16453 2026-05-21 cs.SI cs.CL cs.LG Updated version

Anti-establishment sentiment on TikTok: Implications for understanding influence(rs) and expertise on social media

Tianliang Xu, Ariel Hasell, Sabina Tomkins

Affiliations: GitHub

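AI summary: This paper studies the prevalence of anti-establishment sentiment on TikTok. It uses computational methods to analyze how such sentiment is distributed across content on finance, wellness, and conspiracy theories, and discusses what it implies for user engagement and platform incentives in social media environments.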

Comments 10 pages excluding references; 14 pages in total; 4 figures; Accepted by the AAAI Conference on Web and Social Media (ICWSM-2026)

English abstract

Distrust of public-serving institutions and anti-establishment views are on the rise (especially in the U.S.). As people turn to social media for information, it is imperative to understand whether and how social media environments may be contributing to distrust of institutions. In social media, content creators, influencers, and other opinion leaders often position themselves as having expertise and authority on a range of topics from health to politics, and in many cases devalue and dismiss institutional expertise to build a following and increase their own visibility. However, the extent to which this content appears and whether such content increases engagement is unclear. This study analyzes the prevalence of anti-establishment sentiment (AES) on the social media platform TikTok. Despite its popularity as a source of information, TikTok remains relatively understudied and may provide important insights into how people form attitudes towards institutions. We employ a computational approach to label TikTok posts as containing AES or not across topical domains where content creators tend to frame themselves as experts: finance and wellness. As a comparison, we also consider the topic of conspiracy theories, where AES is expected to be common. We find that AES is most prevalent in conspiracy theory content, and relatively rare in content related to the other two topics. However, we find that engagement patterns with such content vary by area, and that there may be platform incentives for users to post content that expresses anti-establishment sentiment.

2508.11354 2026-05-21 cs.CV cs.AI cs.LG Updated version

FunduSegmenter: Leveraging the RETFound Foundation Model for Joint Optic Disc and Optic Cup Segmentation in Retinal Fundus Images

Zhenyi Zhao, Muthu Rama Krishnan Mookiah, Emanuele Trucco

Affiliations: University of Dundee

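AI summary: This paper proposes FunduSegmenter, a model built on the RETFound foundation model that adds a series of new modules for joint optic disc and optic cup segmentation. Experiments on multiple datasets show that it generally outperforms existing methods.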

DOI: 10.1167/tvst.15.5.14
Journal ref: Trans. Vis. Sci. Tech. 2026;15(5):14

English abstract

Purpose: This study introduces the first adaptation of RETFound for joint optic disc (OD) and optic cup (OC) segmentation. RETFound is a well-known foundation model developed for fundus camera and optical coherence tomography images, which has shown promising performance in disease diagnosis. Methods: We propose FunduSegmenter, a model integrating a series of novel modules with RETFound, including a Pre-adapter, a Decoder, a Post-adapter, skip connections with Convolutional Block Attention Module and a Vision Transformer block adapter. The model is evaluated on a proprietary dataset, GoDARTS, and four public datasets, IDRiD, Drishti-GS, RIM-ONE-r3, and REFUGE, through internal verification, external verification and domain generalization experiments. Results: An average Dice similarity coefficient of 90.51% was achieved in internal verification, which outperformed all baselines, some substantially (nnU-Net: 82.91%; DUNet: 89.17%; TransUNet: 87.91%). In all external verification experiments, the average results were about 3% higher than those of the best baseline, and our model was also competitive in domain generalization. Conclusions: This study explored the potential of the latent general representations learned by RETFound for OD and OC segmentation in fundus camera images. Our FunduSegmenter generally outperformed state-of-the-art baseline methods. The proposed modules are general and can be extended to fine-tuning other foundation models. Translational Relevance: The model shows strong stability and generalization on both in-distribution and out-of-distribution data, providing stable OD and OC segmentation. This is an essential step for many automated tasks, from setting the accurate retinal coordinate to biomarker discovery. The code and trained weights are available at: https://github.com/JusticeZzy/FunduSegmenter.
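
The Dice similarity coefficient reported above is the standard overlap metric 2|A∩B| / (|A| + |B|) between a predicted mask and a ground-truth mask. Below is a minimal sketch for binary masks. It assumes a hypothetical label map (0 = background, 1 = optic disc rim, 2 = optic cup) and a plain average of the OD and OC scores. The paper's exact label encoding and averaging are not stated here.

```python
import numpy as np


def dice(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement (a convention)
    return float(2.0 * np.logical_and(pred, target).sum() / denom)


def mean_od_oc_dice(pred_labels: np.ndarray, true_labels: np.ndarray) -> float:
    """Average of OD and OC Dice under a hypothetical label map:
    0 = background, 1 = optic disc rim, 2 = optic cup (the cup lies inside the disc)."""
    od = dice(pred_labels >= 1, true_labels >= 1)  # disc region = rim + cup
    oc = dice(pred_labels == 2, true_labels == 2)
    return (od + oc) / 2.0


pred = np.array([[2, 1], [0, 0]])
true = np.array([[2, 1], [1, 0]])
print(mean_od_oc_dice(pred, true))  # OD Dice 0.8, OC Dice 1.0, mean 0.9
```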

2508.09001 2026-05-21 cs.CL cs.AI cs.LG (updated version)

Machine-Learning Force Fields with Coupled-Cluster-Level Accuracy for Lattice Dynamics

Sita Schönbauer, Johanna P. Carbone, Fredrik V. Eriksson, Florian Libisch, Andreas Grüneis

Affiliations: Institute of Theoretical Physics, Technical University of Vienna; Faculty of Physics and Center for Computational Materials Science, University of Vienna

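AI summary: This paper studies machine-learning force fields trained on potential energy surfaces computed with approximate density functional theory and at the coupled-cluster level. It computes phonon dispersion relations and vibrational densities of states and compares them with experimental and reference ab initio results, verifying the fields' accuracy and precision for carbon diamond and solid lithium hydride. It also explores delta-learning on the difference between coupled-cluster and density-functional results, as well as a charge-aware machine-learning force field approach.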

Comments 17 pages, 7 figures

2507.06344 2026-05-21 quant-ph cs.CC cs.LG (updated version)

Gradient Scalability and Taylor Surrogation of Quantum Cost Landscapes

Sabri Meyer, Francesco Scala, Francesco Tacchino, Aurelien Lucchi

Affiliations: Department of Mathematics and Computer Science, University of Basel; IBM Quantum, IBM Research Europe - Zurich

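AI summary: This paper studies the relationship between gradient scalability and computational complexity in variational quantum algorithms. It proposes the Taylor surrogate, a classical simulation technique, and introduces the Linear Clifford Encoder to ensure constant-scaling gradients near Clifford circuits. Numerical experiments give preliminary evidence that gradients may decay polynomially rather than exponentially in super-polynomially complex regions.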

Comments 12 pages, 6 figures, 54 pages of supplementary material

English abstract

Variational Quantum Algorithms are promising candidates for near-term quantum computing, yet they face scalability challenges due to barren plateaus, where gradients vanish exponentially relative to system size. Recent conjectures suggest that avoiding these plateaus might inherently lead to classical simulability, thereby limiting the opportunities for quantum advantage. In this work, we advance the theoretical understanding of the relationship between gradient scalability at initialization and the computational complexity of variational quantum algorithms. We first present the Taylor surrogate, a classical simulation technique that matches Pauli path runtime guarantees on near-Clifford regions while offering runtime advantages in specific regimes. Leveraging this surrogate, we prove that beyond previously established classically simulable regions, the computational complexity is at least super-polynomial. Next, we introduce the Linear Clifford Encoder, a classically efficient ansatz modifier that ensures constant-scaling gradients within landscape regions close to Clifford circuits. Finally, numerical experiments on these modified landscapes provide preliminary empirical evidence of a transition zone where constant-scaling gradients may decay polynomially in super-polynomially complex regions rather than exponentially. These findings suggest speculative instances where non-vanishing gradients and super-polynomial complexity could potentially coexist, vindicating the need for future formal proofs.

2506.21039 2026-05-21 cs.LG cs.AI (updated version)

Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning

Jaebak Hwang, Sanghyeon Lee, Jeongmo Kim, Seungyul Han

Affiliation: Graduate School of Artificial Intelligence, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea

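AI summary: This paper proposes the Strict Subgoal Execution (SSE) framework. SSE uses Frontier Experience Replay (FER) to separate unreachable from admissible subgoals, which makes high-level decision making more efficient and enables more reliable planning on long-horizon tasks.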

Comments 10 pages for main, 26 pages for total, Accepted to ICLR 2026

Journal ref: International Conference on Learning Representations (ICLR), 2026

English abstract

Long-horizon goal-conditioned tasks pose fundamental challenges for reinforcement learning (RL), particularly when goals are distant and rewards are sparse. While hierarchical and graph-based methods offer partial solutions, their reliance on conventional hindsight relabeling often fails to correct subgoal infeasibility, leading to inefficient high-level planning. To address this, we propose Strict Subgoal Execution (SSE), a graph-based hierarchical RL framework that integrates Frontier Experience Replay (FER) to separate unreachable from admissible subgoals and streamline high-level decision making. FER delineates the reachability frontier using failure and partial-success transitions, which identifies unreliable subgoals, increases subgoal reliability, and reduces unnecessary high-level decisions. Additionally, SSE employs a decoupled exploration policy to cover underexplored regions of the goal space and a path refinement that adjusts edge costs using observed low-level failures. Experimental results across diverse long-horizon benchmarks show that SSE consistently outperforms existing goal-conditioned and hierarchical RL methods in both efficiency and success rate. Our code is available at https://jaebak1996.github.io/SSE/

2506.17631 2026-05-21 cs.LG cs.AI (updated version)

Time-Prompt: Integrated Heterogeneous Prompts for Unlocking LLMs in Time Series Forecasting

Zesen Wang, Lijuan Lan, Yonggang Li

Affiliation: Central South University, Changsha, China

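AI summary: This paper proposes the Time-Prompt framework for time series forecasting. It builds a unified prompt paradigm, designs a semantic space embedding and cross-modal alignment module, and fine-tunes LLM parameters efficiently. The framework is evaluated on public datasets and on carbon emission datasets.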

Comments Accepted at IJCNN 2026

English abstract

Time series forecasting aims to model temporal dependencies among variables for future state inference, holding significant importance and widespread applications in real-world scenarios. Although deep learning-based methods have achieved remarkable progress, they still exhibit suboptimal performance in long-term forecasting. Recent research demonstrates that large language models (LLMs) achieve promising performance in time series forecasting, but this progress is still met with skepticism about whether LLMs are truly useful for this task. To address this, we propose Time-Prompt, a framework for activating LLMs for time series forecasting. Specifically, we first construct a unified prompt paradigm with learnable soft prompts to guide the LLM's behavior and textualized hard prompts to enhance the time series representations. Second, to enhance the LLM's comprehensive understanding of the forecasting task, we design a semantic space embedding and cross-modal alignment module to achieve fusion of temporal and textual data. Finally, we efficiently fine-tune the LLM's parameters using time series data. Furthermore, we focus on carbon emissions, aiming to provide a modest contribution to global carbon neutrality. Comprehensive evaluations on 6 public datasets and 3 carbon emission datasets demonstrate that Time-Prompt is a powerful framework for time series forecasting.

2505.19075 2026-05-21 cs.AI cs.CL cs.LG (updated version)

Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs

Jaemin Kim, Hangeol Chang, Hyunmin Hwang, Choonghan Kim, Jong Chul Ye

Affiliation: Graduate School of Artificial Intelligence, Korea Advanced Institute of Science and Technology (KAIST)

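AI summary: This paper proposes Universal Reasoner, a composable, plug-and-play reasoning module. It gives specialized reasoning capabilities to frozen large language models that share or align with its token space, and it shows weak-to-strong generalization. Experiments show it outperforms existing fine-tuning methods on mathematical reasoning and machine translation.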

Comments ICML 2026

English abstract

Large Language Models (LLMs) have demonstrated remarkable general capabilities, but enhancing skills such as reasoning often demands substantial computational resources and may compromise generalization. While Parameter-Efficient Fine-Tuning (PEFT) methods offer a more resource-conscious alternative, they typically require retraining for each LLM backbone due to architectural dependencies. To address these challenges, we propose Universal Reasoner (UniR), a modular, composable, and plug-and-play reasoning module that can be used with larger frozen LLMs to provide specialized reasoning capabilities with a shared or aligned token space. Specifically, UniR decomposes the reward into a standalone reasoning module trained in a decoupled manner using verifiable rewards, effectively translating trajectory-level signals into token-level guidance. Once trained, UniR is combined with frozen LLMs at inference time by simply adding its output logits to those of the backbone. This additive structure enables modular composition: multiple UniR modules trained for different tasks can be jointly applied by summing their logits, enabling complex reasoning via composition. Furthermore, UniR demonstrates weak-to-strong generalization, where reasoning modules trained on smaller models effectively guide much larger LLMs in the same model family, and generalize across domains such as in vision language models and medical reasoning. Experiments on mathematical reasoning and machine translation show that UniR surpasses existing fine-tuning methods. Code is open-sourced at https://github.com/hangeol/UniR.
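
The inference-time composition described above adds module logits to the frozen backbone's logits and sums logits across several modules. It can be illustrated with a toy three-token vocabulary. This is a minimal sketch, assuming a shared vocabulary and plain unweighted sums. The function names and numbers are hypothetical, and this is not code from the UniR repository.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last (vocabulary) axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def composed_next_token_probs(backbone_logits, module_logits):
    """Add the logits of any number of reasoning modules to the frozen backbone's
    logits (all over the same vocabulary) and renormalize into probabilities."""
    total = np.asarray(backbone_logits, dtype=float).copy()
    for logits in module_logits:
        total += np.asarray(logits, dtype=float)
    return softmax(total)


# Toy 3-token vocabulary; the numbers are made up for illustration.
backbone = np.array([1.0, 2.0, 0.0])
module_a = np.array([0.0, -3.0, 0.0])  # hypothetical module A
module_b = np.array([2.0, 0.0, 0.0])   # hypothetical module B
probs = composed_next_token_probs(backbone, [module_a, module_b])
print(int(np.argmax(probs)))  # summed logits [3, -1, 0] -> token 0
```

On its own, the backbone would pick token 1. Adding the two module logit vectors shifts the greedy choice to token 0 without touching any backbone weights.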

2505.07054 2026-05-21 eess.SY cs.LG cs.SY math.OC (updated version)

YANNs: Y-wise Affine Neural Networks for Exact and Efficient Representations of Piecewise Linear Functions

Austin Braniff, Yuhe Tian

Affiliation: Department of Chemical and Biomedical Engineering, West Virginia University, United States of America

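AI summary: This paper proposes YANNs, Y-wise affine neural networks that represent piecewise linear functions exactly and efficiently, achieving a functionally equivalent representation without training. Multi-parametric model predictive control serves as the application showcase, demonstrating efficient real-time evaluation while retaining control-theoretic guarantees.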

DOI: 10.1016/j.compchemeng.2026.109589
Journal ref: Computers & Chemical Engineering, Volume 208, 109589 (2026)

English abstract

This work formally introduces Y-wise Affine Neural Networks (YANNs), a fully-explainable network architecture that continuously and efficiently represents piecewise affine functions with polytopic subdomains. Following from the proofs, it is shown that the development of YANNs requires no training to achieve the functionally equivalent representation. YANNs thus maintain all mathematical properties of the original formulations. Multi-parametric model predictive control is utilized as an application showcase of YANNs, which theoretically computes optimal control laws as a piecewise affine function of states, outputs, setpoints, and disturbances. With the exact representation of multi-parametric control laws, YANNs retain essential control-theoretic guarantees such as recursive feasibility and stability. This sets YANNs apart from the existing works which apply neural networks for approximating optimal control laws instead of exactly representing them. By optimizing the inference speed of the networks, YANNs can evaluate substantially faster in real-time compared to traditional piecewise affine function calculations. Numerical case studies are presented to demonstrate the algorithmic scalability with respect to the input/output dimensions and the number of subdomains. YANNs represent a significant advancement in control as the first neural network-based controller that inherently ensures both feasibility and stability. Future applications can leverage them as an efficient and interpretable starting point for data-driven modeling/control.
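
The object YANNs represent is a piecewise affine law over polytopic regions. The "traditional" evaluation they are compared against locates the region containing the input and applies that region's affine law. The sketch below shows this baseline for a toy one-dimensional saturated law, u = clip(-2x, -1, 1). It also includes a ReLU expression with hand-set, untrained weights that reproduces the same function exactly. The ReLU form only illustrates that an exact, training-free representation is possible for this example. It is not the YANN construction from the paper, and the toy law and all names are assumptions.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Region:
    A: np.ndarray  # polytope {x : A @ x <= b}
    b: np.ndarray
    F: np.ndarray  # affine law u = F @ x + g on this region
    g: np.ndarray


def eval_pwa(regions, x, tol=1e-9):
    """Conventional point location: return F x + g for the first region containing x."""
    for r in regions:
        if np.all(r.A @ x <= r.b + tol):
            return r.F @ x + r.g
    raise ValueError("x lies outside every region")


# Saturated linear law u = clip(-2x, -1, 1) split into three polytopic regions.
regions = [
    Region(np.array([[1.0]]), np.array([-0.5]), np.array([[0.0]]), np.array([1.0])),
    Region(np.array([[-1.0], [1.0]]), np.array([0.5, 0.5]), np.array([[-2.0]]), np.array([0.0])),
    Region(np.array([[-1.0]]), np.array([-0.5]), np.array([[0.0]]), np.array([-1.0])),
]


def relu_form(x):
    """Same law with fixed (untrained) ReLU weights:
    clip(y, -1, 1) = y - relu(y - 1) + relu(-y - 1), with y = -2x."""
    y = -2.0 * x
    return y - np.maximum(y - 1.0, 0.0) + np.maximum(-y - 1.0, 0.0)


for x in (-2.0, 0.25, 1.0):
    print(x, eval_pwa(regions, np.array([x]))[0], relu_form(x))
```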

2502.12120 2026-05-21 cs.LG cs.AI cs.CL (updated version)

LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws

Prasanna Mayilvahanan, Thaddäus Wiedemer, Sayak Mallick, Matthias Bethge, Wieland Brendel

Affiliations: Max Planck Institute for Intelligent Systems; ELLIS Institute Tübingen; Tübingen AI Center; University of Tübingen

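AI summary: This study investigates the main factors that shape loss-to-loss scaling laws for LLMs. It finds that pretraining data determines the scaling trend, whereas model size, optimization hyperparameters, tokenizer, and architecture generally have limited impact. Pretraining data should therefore be curated carefully for optimal downstream performance.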

Comments ICML 2025 camera-ready version

English abstract

Scaling laws guide the development of large language models (LLMs) by offering estimates for the optimal balance of model size, tokens, and compute. More recently, loss-to-loss scaling laws that relate losses across pretraining datasets and downstream tasks have emerged as a powerful tool for understanding and improving LLM performance and generalization. In this work, we investigate which factors most strongly influence loss-to-loss scaling. Our experiments reveal that the pretraining data determines the scaling trend. In contrast, model size, optimization hyperparameters, tokenizer and even significant architectural differences, such as between transformer-based models like Llama and state-space models like Mamba, generally have limited impact. Consequently, practitioners should carefully curate suitable pretraining datasets for optimal downstream performance, while architectures and other settings can be freely optimized for training efficiency.

2502.03752 2026-05-21 cs.LG cs.AI (updated version)

Self-Improving Skill Learning for Robust Skill-based Meta-Reinforcement Learning

Sanghyeon Lee, Sangjun Bae, Yisak Park, Seungyul Han

Affiliation: Graduate School of Artificial Intelligence, Ulsan National Institute of Science and Technology (UNIST)

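AI summary: This paper proposes Self-Improving Skill Learning (SISL). SISL performs self-guided skill refinement with decoupled high-level and skill-improvement policies and prioritizes skills via maximum return relabeling. This yields robust and stable adaptation under noisy and suboptimal data, and SISL outperforms other skill-based meta-reinforcement learning methods.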

Comments 10 pages main, 27 pages appendix with reference. Accepted to ICLR 2026

Journal ref: International Conference on Learning Representations (ICLR), 2026

English abstract

Meta-reinforcement learning (Meta-RL) facilitates rapid adaptation to unseen tasks but faces challenges in long-horizon environments. Skill-based approaches tackle this by decomposing state-action sequences into reusable skills and employing hierarchical decision-making. However, these methods are highly susceptible to noisy offline demonstrations, leading to unstable skill learning and degraded performance. To address this, we propose Self-Improving Skill Learning (SISL), which performs self-guided skill refinement using decoupled high-level and skill improvement policies, while applying skill prioritization via maximum return relabeling to focus updates on task-relevant trajectories, resulting in robust and stable adaptation even under noisy and suboptimal data. By mitigating the effect of noise, SISL achieves reliable skill learning and consistently outperforms other skill-based meta-RL methods on diverse long-horizon tasks. Our code is available at https://epsilog.github.io/SISL.

2502.02844 2026-05-21 cs.LG cs.AI cs.CR cs.MA (updated version)

Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning

Sunwoo Lee, Jaebak Hwang, Yonghyeon Jo, Seungyul Han

Affiliation: Graduate School of Artificial Intelligence, UNIST, Ulsan, South Korea

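AI summary: This paper proposes the Wolfpack Adversarial Attack, a framework for coordinated adversarial attacks in multi-agent reinforcement learning. It also introduces the Wolfpack-Adversarial Learning for MARL (WALL) framework, which trains robust MARL policies to defend against this attack.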

Comments 9 pages main, 23 pages appendix with reference. Accepted by ICML 2025

DOI: 10.5555/3780338.3781634
Journal ref: Proceedings of Machine Learning Research (PMLR), ICML 2025

English abstract

Traditional robust methods in multi-agent reinforcement learning (MARL) often struggle against coordinated adversarial attacks in cooperative scenarios. To address this limitation, we propose the Wolfpack Adversarial Attack framework, inspired by wolf hunting strategies, which targets an initial agent and its assisting agents to disrupt cooperation. Additionally, we introduce the Wolfpack-Adversarial Learning for MARL (WALL) framework, which trains robust MARL policies to defend against the proposed Wolfpack attack by fostering systemwide collaboration. Experimental results underscore the devastating impact of the Wolfpack attack and the significant robustness improvements achieved by WALL. Our code is available at https://github.com/sunwoolee0504/WALL.

2502.02834 2026-05-21 cs.LG cs.AI (updated version)

Task-Aware Virtual Training: Enhancing Generalization in Meta-Reinforcement Learning for Out-of-Distribution Tasks

Jeongmo Kim, Yisak Park, Minung Kim, Seungyul Han

Affiliation: Graduate School of Artificial Intelligence, UNIST, Ulsan, South Korea

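AI summary: This paper proposes Task-Aware Virtual Training (TAVT), which uses metric-based representation learning to improve the generalization of meta-reinforcement learning to out-of-distribution tasks. TAVT preserves task characteristics in virtual tasks and uses a state regularization technique to reduce overestimation errors in state-varying environments.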

Comments 9 pages main paper, 20 pages appendices with reference. Accepted to ICML 2025

DOI: 10.5555/3780338.3781544
Journal ref: Proceedings of Machine Learning Research (PMLR), ICML 2025

English abstract

Meta reinforcement learning aims to develop policies that generalize to unseen tasks sampled from a task distribution. While context-based meta-RL methods improve task representation using task latents, they often struggle with out-of-distribution (OOD) tasks. To address this, we propose Task-Aware Virtual Training (TAVT), a novel algorithm that accurately captures task characteristics for both training and OOD scenarios using metric-based representation learning. Our method successfully preserves task characteristics in virtual tasks and employs a state regularization technique to mitigate overestimation errors in state-varying environments. Numerical results demonstrate that TAVT significantly enhances generalization to OOD tasks across various MuJoCo and MetaWorld environments. Our code is available at https://github.com/JM-Kim-94/tavt.git.

2501.02407 2026-05-21 cs.CL cs.CR cs.LG (updated version)

Towards the Anonymization of the Language Modeling

Antoine Boutet, Lucas Magnana, Juliette Sénéchal

Affiliations: INSA Lyon, Inria, CITI, UR3720; Inria, INSA Lyon, CITI, UR3720

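AI summary: This paper proposes a privacy-preserving language modeling approach to the anonymization of language models, with the goal of making them easier to share. It uses a masking language modeling (MLM) methodology and a causal language modeling (CLM) methodology. Evaluated on a medical dataset, the approaches avoid memorizing direct and indirect identifying information and offer a good tradeoff between high privacy and high utility.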

English abstract

Rapid advances in Natural Language Processing (NLP) have revolutionized many fields, including healthcare. However, these advances raise significant privacy concerns, especially when pre-trained models fine-tuned and specialized on sensitive data can memorize and then expose and regurgitate personal information. This paper presents a privacy-preserving language modeling approach to address the problem of language model anonymization, and thus promote the sharing of such models. Specifically, we propose both a Masking Language Modeling (MLM) methodology to specialize a BERT-like language model and a Causal Language Modeling (CLM) methodology to specialize a GPT-like model, in both cases preventing the model from memorizing direct and indirect identifying information present in the training data. We have comprehensively evaluated our approaches using a medical dataset and compared them against different baselines. Our results indicate that by avoiding memorizing both direct and indirect identifiers during model specialization, our masking and causal language modeling schemes offer a good tradeoff for maintaining high privacy while retaining high utility.

2412.14738 2026-05-21 cs.LG (updated version)

Spectrally unstable nodes drive reliability failures in graph learning

Yongyu Wang

Affiliation: MTU

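AI summary: This study shows how spectrally unstable nodes drive reliability failures in graph learning. It proposes a reliability-aware intervention that isolates these nodes, improving robustness under both adversarial and intrinsic noise.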

English abstract

Graph-learning algorithms can fail when graph structure is adversarially perturbed, intrinsically noisy or constructed from imperfect observations. Here we show that some nodes bear much greater responsibility than others for allowing adversarial perturbations and intrinsic noise to harm graph-learning algorithms. Building on graph-spectral distortion analysis, we identify these failure-driving nodes and introduce a reliability-aware intervention that isolates them from the main learning step. The target algorithm is applied to a stable induced subgraph, and predictions for isolated nodes are recovered through topology- or centroid-based propagation. Across graph neural networks under targeted and non-targeted structural attacks, spectral hypergraph clustering and multi-view spectral clustering, this principle improves reliability under both adversarial and intrinsic noise. These results suggest that node-level spectral instability provides a common mechanism for understanding and mitigating reliability failures in graph learning.
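
The isolate-then-recover pipeline described above can be sketched as follows. The per-node instability score is taken as an input, because the graph-spectral distortion measure is not specified in the abstract. The fraction of isolated nodes, the majority-vote form of topology-based propagation, the toy connected-components "target algorithm", and all function names are assumptions for illustration.

```python
import numpy as np


def isolate_and_recover(adj, scores, frac, base_algorithm, max_rounds=10):
    """Isolate the highest-scoring (least stable) nodes, run base_algorithm on the
    induced subgraph of the remaining nodes, then label each isolated node by
    majority vote over its already-labelled neighbours."""
    n = adj.shape[0]
    n_iso = int(np.floor(frac * n))
    isolated = np.argsort(-np.asarray(scores))[:n_iso]  # most unstable first
    stable = np.setdiff1d(np.arange(n), isolated)
    labels = np.full(n, -1, dtype=int)
    labels[stable] = base_algorithm(adj[np.ix_(stable, stable)])
    for _ in range(max_rounds):  # repeat so labels can spread between isolated nodes
        changed = False
        for i in isolated:
            if labels[i] != -1:
                continue
            nbr_labels = labels[np.nonzero(adj[i])[0]]
            known = nbr_labels[nbr_labels != -1]
            if known.size > 0:
                labels[i] = np.bincount(known).argmax()
                changed = True
        if not changed:
            break
    return labels  # isolated nodes with no labelled neighbours stay -1


def connected_components(sub_adj):
    """Toy stand-in for the target algorithm: one label per connected component."""
    m = sub_adj.shape[0]
    comp = np.full(m, -1, dtype=int)
    next_label = 0
    for s in range(m):
        if comp[s] != -1:
            continue
        comp[s] = next_label
        stack = [s]
        while stack:
            u = stack.pop()
            for v in np.nonzero(sub_adj[u])[0]:
                if comp[v] == -1:
                    comp[v] = next_label
                    stack.append(v)
        next_label += 1
    return comp


# Two triangles joined by the bridge edge 2-3; node 2 gets a made-up high score.
adj = np.zeros((6, 6), dtype=int)
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    adj[u, v] = adj[v, u] = 1
scores = np.array([0.1, 0.2, 0.9, 0.1, 0.1, 0.1])
print(isolate_and_recover(adj, scores, 0.2, connected_components))  # [0 0 0 1 1 1]
```

On the full graph, the bridge would merge both triangles into one component. With node 2 isolated, the two groups separate, and node 2 is then assigned to the group that holds most of its neighbours.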

2410.23212 2026-05-21 stat.ML cs.LG math.ST stat.TH (updated version)

Improved convergence rate of kNN graph Laplacians: differentiable self-tuned affinity

Xiuyuan Cheng, Yixuan Tan, Nan Wu

Affiliations: Department of Mathematics, Duke University; Department of Mathematical Sciences, The University of Texas at Dallas

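AI summary: This paper studies the convergence rate of $k$NN graph Laplacians built with a differentiable self-tuned affinity. A refined analysis shows that, under the manifold data setting, the $k$NN graph Laplacian converges to the limiting manifold operator at the rate $O(N^{-2/(d+6)})$, up to a log factor. Numerical experiments validate the theory.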

English abstract

In graph-based data analysis, $k$-nearest neighbor ($k$NN) graphs are widely used due to their adaptivity to local data densities. Allowing weighted edges in the graph, the kernelized graph affinity provides a more general type of $k$NN graph where the $k$NN distance is used to set the kernel bandwidth adaptively. In this work, we consider a general class of $k$NN graph where the graph affinity is $W_{ij} = \epsilon^{-d/2} k_0\left( \frac{\| x_i - x_j \|^2}{\epsilon\, \phi(\hat\rho(x_i), \hat\rho(x_j))^2} \right)$, with $\hat\rho(x)$ being the (rescaled) $k$NN distance at the point $x$, $\phi$ a symmetric bi-variate function, and $k_0$ a non-negative function on $[0,\infty)$. Under the manifold data setting, where $N$ i.i.d. samples $x_i$ are drawn from a density $p$ on a $d$-dimensional unknown manifold embedded in a high dimensional Euclidean space, we prove the operator pointwise convergence of the $k$NN graph Laplacian to the limiting manifold operator (depending on $p$) at the rate of $O(N^{-2/(d+6)})$, up to a log factor, when $k_0$ and $\phi$ have $C^3$ regularity and satisfy other technical conditions. This is obtained when $\epsilon \sim N^{-2/(d+6)}$ and $k \sim N^{6/(d+6)}$, both at the optimal order to balance the theoretical bias and variance errors. Our improved convergence rate is based on a refined analysis of the $k$NN estimator, which can be of independent interest. We validate our theory by numerical experiments on simulated data.
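
Below is a minimal NumPy sketch of the affinity $W_{ij}$ defined above, for points sampled on a circle ($d = 1$) embedded in $\mathbb{R}^2$. It sets $\epsilon$ and $k$ at the orders stated above, with the constants set to 1. The kernel $k_0(r) = e^{-r/4}$, the choice $\phi(u, v) = \sqrt{uv}$, the use of the unrescaled $k$NN distance, and the unnormalized Laplacian $L = D - W$ are illustrative assumptions. They are not claimed to be the paper's choices.

```python
import numpy as np


def self_tuned_affinity(X, k, eps, d):
    """W_ij = eps^(-d/2) k0(|x_i - x_j|^2 / (eps * phi(rho_i, rho_j)^2)).

    Hypothetical choices: rho_i is the plain (unrescaled) distance from x_i to its
    k-th nearest neighbour, phi(u, v) = sqrt(u * v), and k0(r) = exp(-r / 4).
    Assumes no duplicate points (otherwise rho_i can be 0)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # squared distances
    rho = np.sort(np.sqrt(sq), axis=1)[:, k]  # column 0 is the point itself
    phi = np.sqrt(np.outer(rho, rho))  # symmetric in (rho_i, rho_j)
    return eps ** (-d / 2) * np.exp(-(sq / (eps * phi**2)) / 4.0)


def unnormalized_laplacian(W):
    """L = D - W; the scaling constant and normalization of the limiting operator
    are omitted here."""
    return np.diag(W.sum(axis=1)) - W


rng = np.random.default_rng(0)
N, d = 200, 1  # d: intrinsic dimension of the manifold (a circle in R^2 here)
t = rng.uniform(0.0, 2.0 * np.pi, N)
X = np.column_stack([np.cos(t), np.sin(t)])
eps = N ** (-2 / (d + 6))           # eps ~ N^{-2/(d+6)}, constant set to 1
k = int(round(N ** (6 / (d + 6))))  # k ~ N^{6/(d+6)}, constant set to 1
W = self_tuned_affinity(X, k, eps, d)
L = unnormalized_laplacian(W)
print(k, np.allclose(W, W.T), np.allclose(L.sum(axis=1), 0.0))
```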

2409.18272 2026-05-21 cs.LG (updated version)

SLIDE: A machine-learning based method for forced dynamic response estimation of multibody systems

Peter Manzl, Alexander Humer, Qasim Khadim, Johannes Gerstmayr

Affiliations: University of Innsbruck, Austria; Johannes Kepler University Linz, Austria; University of Oulu, Finland

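AI summary: This paper proposes SLIDE, a machine-learning-based method for estimating output sequences of mechanical or multibody systems. The sliding-window, initially-truncated dynamic-response estimator truncates the output window according to the decay of initial effects such as damping, approximated via complex eigenvalues. This increases simulation speed and enables real-time performance.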

Comments Paper currently in submission for journal publication

DOI: 10.1080/15397734.2025.2559325
Journal ref: Mechanics Based Design of Structures and Machines 54(1), 2026

English abstract

In computational engineering, enhancing simulation speed and efficiency is a perpetual goal. To fully take advantage of neural network techniques and hardware, we present the SLiding-window Initially-truncated Dynamic-response Estimator (SLIDE), a deep learning-based method designed to estimate output sequences of mechanical or multibody systems with primarily, but not exclusively, forced excitation. A key advantage of SLIDE is its ability to estimate the dynamic response of damped systems without requiring the full system state, making it particularly effective for flexible multibody systems. The method truncates the output window based on the decay of initial effects, such as damping, which is approximated by the complex eigenvalues of the system's linearized equations. In addition, a second neural network is trained to provide an error estimate, further enhancing the method's applicability. The method is applied to a diverse selection of systems, including the Duffing oscillator, a flexible slider-crank system, and an industrial 6R manipulator mounted on a flexible socket. Our results demonstrate speedups of up to several million times relative to the simulation, substantially exceeding real-time performance.
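
The truncation idea can be illustrated with a worked example. For a stable linearized system $\dot x = A x$, each mode decays like $e^{\mathrm{Re}(\lambda) t}$, so the slowest-decaying eigenvalue sets how long initial effects persist. The sketch below turns this into a window length in time steps under an assumed criterion: the slowest mode must fall below a tolerance, with tol = 1e-3 chosen arbitrarily. The abstract does not give the exact rule SLIDE uses, and the function name is hypothetical.

```python
import numpy as np


def truncation_steps(A, dt, tol=1e-3):
    """Steps until the slowest mode of the linearized system x' = A x decays below tol.

    Each eigenvalue lam contributes a term proportional to exp(lam.real * t), so the
    slowest-decaying mode reaches tol after t = ln(1/tol) / min(-lam.real)."""
    decay = -np.linalg.eigvals(A).real
    if np.any(decay <= 0.0):
        raise ValueError("linearized system is not asymptotically stable")
    t_trunc = np.log(1.0 / tol) / decay.min()
    return int(np.ceil(t_trunc / dt))


# Damped oscillator m x'' + c x' + k x = 0 in first-order form.
m, c, k = 1.0, 0.4, 4.0
A = np.array([[0.0, 1.0], [-k / m, -c / m]])
print(truncation_steps(A, dt=0.01))  # eigenvalues -0.2 ± 1.99i -> 3454 steps
```

Here ln(1000) / 0.2 ≈ 34.54 s, or 3454 steps of 0.01 s. Heavier damping (larger c) shortens the window that has to be discarded.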

2409.04777 2026-05-21 cs.LG math.OC (updated version)

Optimization Hyper-parameter Laws for Large Language Models

Xingyu Xie, Kuangyu Ding, Shuicheng Yan, Kim-Chuan Toh, Tianwen Wei

Affiliations: Department of Mathematics, National University of Singapore, Singapore; School of Computing, National University of Singapore; Department of Mathematics and Institute of Operations Research and Analytics, National University of Singapore, Singapore; Skywork AI, Beijing

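AI summary: This paper proposes the Opt-Laws framework, which predicts final training loss by analyzing SDE-based convergence and escape properties. This makes it possible to pre-select learning-rate schedules from small-scale experiments and improves the accuracy of hyper-parameter selection.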

Abstract

Large Language Models have driven significant AI advancements, yet their training is resource-intensive and highly sensitive to hyper-parameter selection. While scaling laws provide valuable guidance on model size and data requirements, they fall short in choosing dynamic hyper-parameters, such as learning-rate (LR) schedules, that evolve during training. To bridge this gap, we present Optimization Hyper-parameter Laws (Opt-Laws), a framework that predicts final training loss as a function of LR schedule, model size, and data size. Grounded in SDE-based convergence and escape analyses, Opt-Laws yield interpretable convergence and escape features that predict final training loss across model scales, enabling schedule pre-selection from small-scale experiments. Empirically, Opt-Laws achieve a 94% Top-2 hit rate for identifying near-optimal schedule candidates on held-out configurations, correctly identify the best-performing schedule family in all five evaluated out-of-family settings, and detect training divergence with F1 = 0.92.

2408.08812 2026-05-21 cs.LG version update

TRAM: Test-Time Risk Adaptation with Mixture of Agents

Mohamad Fares El Hajj Chehade, Amrit Singh Bedi, Amy Zhang, Hao Zhu

Affiliations: UT Austin; University of Central Florida; MIT; UMD (University of Maryland, College Park)

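AI summary: This paper studies zero-update adaptation, in which no parameters are updated at deployment time. It proposes TRAM, which uses a mixture of agents to score source policies on a risk-adjusted basis, reducing deployment risk while preserving reward.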

Abstract

Deployed reinforcement learning agents often face safety requirements that are specified only after training, such as new hazard maps, revised risk thresholds, or behavioral alignment constraints. We study zero-update deployment-time adaptation, where a fixed library of risk-neutral source policies is reused under a newly specified reward-risk tradeoff. We propose TRAM (Test-Time Risk Adaptation via Mixture of Agents), a source-scored composition rule that evaluates each source policy under the target reward and an occupancy-based deployment risk, then selects actions using risk-adjusted source scores. Unlike training-time risk-sensitive methods tied to a fixed surrogate such as return variance, TRAM supports spatial barrier exposure, divergence from a reference behavior, and local volatility risks specified at test time. We explicitly characterize TRAM as a surrogate method: it does not solve the full occupancy-control problem of the stitched policy, but admits a measurable source-hull mismatch term connecting source-scored risk to realized risk. Experiments in gridworlds, MuJoCo Reacher, Safety-Gymnasium, and an LLM alignment setting show that TRAM reduces deployment risk while preserving reward, without requiring any parameter updates at test time.
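The source-scored selection step described above can be sketched at a single state. The sketch makes two assumptions. First, each frozen source policy already has an estimated target-reward value and an estimated occupancy-based risk. Second, the risk adjustment is a hypothetical linear penalty, value minus λ·risk. How TRAM actually estimates these quantities and combines them is not shown here.

```python
import numpy as np


def tram_style_action(source_values, source_risks, source_actions, risk_weight):
    """Hypothetical sketch of source-scored composition at one state.

    source_values:  estimated target-reward value of following each source policy
    source_risks:   estimated occupancy-based deployment risk of each source policy
    source_actions: the action each (frozen) source policy proposes in this state
    risk_weight:    reward-risk trade-off specified at deployment time
    """
    scores = np.asarray(source_values, float) - risk_weight * np.asarray(source_risks, float)
    best = int(np.argmax(scores))            # pick the source with the best risk-adjusted score
    return source_actions[best], scores


# Two frozen source policies: one slightly more rewarding but riskier, one safer
action, scores = tram_style_action([1.0, 0.9], [0.5, 0.0], ["left", "right"], risk_weight=1.0)
print(action, scores)
```

Changing `risk_weight` at test time changes which source is followed without touching any policy parameters. This mirrors the zero-update setting in the abstract.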

2407.08976 2026-05-21 stat.ML cs.LG math.ST stat.TH version update

Computational-Statistical Trade-off in Kernel Two-Sample Testing with Random Fourier Features

Ikjun Choi, Ilmun Kim

Affiliations: Department of Statistics and Data Sciences, The University of Texas at Austin; Department of Mathematical Sciences, KAIST

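AI summary: This paper studies the trade-off between computational cost and statistical power when the MMD test is approximated with random Fourier features. It proves that, with a suitable choice of the number of random features, the approximated test attains the same minimax separation rate as the MMD test in sub-quadratic time.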

Abstract

Recent years have seen a surge in methods for two-sample testing, among which the Maximum Mean Discrepancy (MMD) test has emerged as an effective tool for handling complex and high-dimensional data. Despite its success and widespread adoption, the primary limitation of the MMD test has been its quadratic-time complexity, which poses challenges for large-scale analysis. While various approaches have been proposed to expedite the procedure, it has been unclear whether it is possible to attain the same power guarantee as the MMD test at sub-quadratic time cost. To fill this gap, we revisit the approximated MMD test using random Fourier features, and investigate its computational-statistical trade-off. We start by revealing that the approximated MMD test is pointwise consistent in power only when the number of random features approaches infinity. We then consider the uniform power of the test and study the time-power trade-off under the minimax testing framework. Our result shows that, by carefully choosing the number of random features, it is possible to attain the same minimax separation rates as the MMD test within sub-quadratic time. We demonstrate this point under different distributional assumptions such as densities in a Sobolev ball. Our theoretical findings are corroborated by simulation studies.
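The approximated test replaces the quadratic-time kernel computation with a mean embedding in D random features, where D is the number of random features whose choice drives the trade-off. The sketch below makes several assumptions: a Gaussian kernel with a fixed bandwidth, a permutation calibration, and default parameter values chosen arbitrarily. The paper's kernel, feature count, and calibration may differ.

```python
import numpy as np


def rff_mmd_test(X, Y, n_features=100, bandwidth=1.0, n_perm=200, alpha=0.05, seed=0):
    """Permutation two-sample test with an RFF approximation of the Gaussian-kernel MMD.

    Each statistic costs O((n + m) * n_features * d) instead of O((n + m)^2 * d).
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # The Gaussian kernel exp(-||x - y||^2 / (2 h^2)) has spectral density N(0, h^-2 I)
    W = rng.normal(scale=1.0 / bandwidth, size=(n_features, d))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    Z = np.vstack([X, Y])
    phi = np.sqrt(2.0 / n_features) * np.cos(Z @ W.T + b)   # (n + m, n_features)
    n = X.shape[0]

    def stat(features):
        # Squared distance between the mean feature embeddings of the two samples
        diff = features[:n].mean(axis=0) - features[n:].mean(axis=0)
        return float(diff @ diff)

    observed = stat(phi)
    exceed = sum(stat(phi[rng.permutation(len(phi))]) >= observed for _ in range(n_perm))
    p_value = (1 + exceed) / (1 + n_perm)
    return observed, p_value, p_value <= alpha


rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(300, 2))
Y = rng.normal(1.0, 1.0, size=(300, 2))
print(rff_mmd_test(X, Y))
```

The features are computed once, and each permutation only reshuffles their rows. As a result, the per-permutation cost also stays linear in the sample size.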

2406.07125 2026-05-21 cs.CR cs.AI cs.LG version update

CARACAS: vehiCular ArchitectuRe for detAiled Can Attacks Simulation

Sadek Misto Kirdi, Nicola Scarano, Franco Oberti, Luca Mannella, Stefano Di Carlo, Alessandro Savino

Affiliations: Politecnico di Torino, Department of Control and Computer Engineering

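AI summary: This paper proposes CARACAS, a vehicle model for simulating detailed CAN attacks. By combining simulation frameworks such as Simulink with a robust representation of attack models, it generates synthetic datasets to improve the detection capability of IDSs. It focuses on simulating torque-control attacks on a battery electric vehicle.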

Comments 6 pages, 8 figures, TrustAICyberSec workshop - IEEE ISCC 2024

DOI: 10.1109/ISCC61673.2024.10733705
Journal ref: Proceeding of the 29th IEEE Symposium on Computers and Communications, ISCC 2024

Abstract

Modern vehicles are increasingly vulnerable to attacks that exploit network infrastructures, particularly the Controller Area Network (CAN). To counter such threats effectively using contemporary tools like Intrusion Detection Systems (IDSs) based on data analysis and classification, large datasets of CAN messages are imperative. This paper explores the feasibility of generating synthetic datasets by combining the modeling capabilities of simulation frameworks such as Simulink with a robust representation of attack models. We present CARACAS, a vehicular model that includes component control via CAN messages and attack injection capabilities. CARACAS demonstrates the efficacy of this methodology on a Battery Electric Vehicle (BEV) model, focusing on attacks that target torque control in two distinct scenarios.

2312.01386 2026-05-21 cs.LG stat.ML version update

On the Suboptimality of GP-UCB under Polynomial Effective Optimism

Wenjia Wang, Xiaowei Zhang

Affiliations: Department of Industrial Systems Engineering and Management, National University of Singapore; Department of Industrial Engineering and Decision Analytics, The Hong Kong University of Science and Technology

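AI summary: This paper studies the suboptimality of GP-UCB under polynomial effective optimism. It defines the effective optimism level as the product of the exploration coefficient and the regularization parameter in kernel ridge regression. Under a uniform confidence assumption, it proves a new regret lower bound for GP-UCB with Matérn kernels. The bound shows that polynomial growth of the effective optimism level rules out minimax-optimal regret rates, which identifies an obstacle to proving minimax optimality for standard GP-UCB.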

Abstract

Gaussian process upper confidence bound (GP-UCB) is widely used for sequential optimization of expensive black-box functions. Although many upper bounds on its cumulative regret have been established in the literature, whether GP-UCB is minimax optimal remains open. We study this question through the effective optimism level, defined as the product of the exploration coefficient and the regularization parameter in kernel ridge regression. Under a uniform confidence assumption, we prove a new regret lower bound for GP-UCB with Matérn kernels. The bound shows that polynomial growth of the effective optimism level, up to logarithmic factors, rules out the minimax-optimal regret rate. Since this is the regime covered by most existing analyses, our result identifies a concrete obstacle to proving minimax optimality for standard GP-UCB. More broadly, it suggests that the gap between current upper bounds and minimax lower bounds may reflect a real limitation of the algorithm, not only of the analysis.
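The abstract defines the effective optimism level as the product of the exploration coefficient and the kernel ridge regression regularization parameter. The sketch below shows where both parameters enter one GP-UCB step on a 1-D grid with a Matérn-5/2 kernel. It assumes the common acquisition form "mean + beta * std" with the posterior computed from (K + lam I). The paper's exact scaling of the confidence width may differ, so treat `beta * lam` here only as the product the abstract names. The lengthscale, grid, and data are arbitrary.

```python
import numpy as np


def matern52(a, b, ell=0.2):
    """Matern-5/2 kernel between 1-D point sets a and b (unit prior variance)."""
    s = np.sqrt(5.0) * np.abs(a[:, None] - b[None, :]) / ell
    return (1.0 + s + s**2 / 3.0) * np.exp(-s)


def gp_ucb_next(x_obs, y_obs, grid, beta, lam, ell=0.2):
    """One GP-UCB step: kernel ridge regression mean plus beta times posterior std."""
    K = matern52(x_obs, x_obs, ell) + lam * np.eye(len(x_obs))   # regularized Gram matrix
    k_star = matern52(grid, x_obs, ell)                           # (n_grid, n_obs)
    mean = k_star @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(k_star * np.linalg.solve(K, k_star.T).T, axis=1)
    ucb = mean + beta * np.sqrt(np.clip(var, 0.0, None))
    return grid[int(np.argmax(ucb))]


beta, lam = 100.0, 1e-3
x_obs = np.array([0.1, 0.5, 0.9])
y_obs = -(x_obs - 0.3) ** 2
x_next = gp_ucb_next(x_obs, y_obs, np.linspace(0.0, 1.0, 101), beta, lam)
effective_optimism = beta * lam      # the product whose growth the lower bound concerns
print(x_next, effective_optimism)
```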

2307.11925 2026-05-21 cs.LG math.CA version update

Mercer Large-Scale Kernel Machines from Ridge Function Perspective

Karol Dziedziul, Sergey Kryzhevich, Paweł Wieczyński

Affiliations: Faculty of Applied Mathematics, The Gdańsk University of Technology, ul. G. Narutowicza 11/12, 80-952 Gdańsk, Poland

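AI summary: Taking a ridge-function perspective, this paper studies the Mercer property of large-scale kernel machines. It examines whether kernel functions can be approximated by sums of products of cosine functions, analyzes the obstacles to this approach, and applies it to the one-vs-one approach in image processing.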

Comments 17 pages, 3 figures

2304.12906 2026-05-21 cs.LG stat.ML version update

The Score-Difference Flow for Implicit Generative Modeling

Romann M. Weber

Affiliations: Disney Research

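AI summary: This paper proposes the score-difference flow, which optimally reduces the KL divergence between two distributions, as a new approach to implicit generative modeling. It shows the flow's equivalence to denoising diffusion models and connects it to the hidden data-optimization sub-problem in generative adversarial network training.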

Comments 25 pages, 5 figures, 4 tables. Updated final version of a paper originally published in Transactions on Machine Learning Research (TMLR), including minor typographical corrections and post-publication commentary connecting the SD flow to drifting models

Journal ref: Transactions on Machine Learning Research (7/2023)

Abstract

Implicit generative modeling (IGM) aims to produce samples of synthetic data matching the characteristics of a target data distribution. Recent work (e.g. score-matching networks, diffusion models) has approached the IGM problem from the perspective of pushing synthetic source data toward the target distribution via dynamical perturbations or flows in the ambient space. In this direction, we present the score difference (SD) between arbitrary target and source distributions as a flow that optimally reduces the Kullback-Leibler divergence between them. We apply the SD flow to convenient proxy distributions, which are aligned if and only if the original distributions are aligned. We demonstrate the formal equivalence of this formulation to denoising diffusion models under certain conditions. We also show that the training of generative adversarial networks includes a hidden data-optimization sub-problem, which induces the SD flow under certain choices of loss function when the discriminator is optimal. As a result, the SD flow provides a theoretical link between model classes that individually address the three challenges of the "generative modeling trilemma" (high sample quality, mode coverage, and fast sampling), thereby setting the stage for a unified approach.
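The flow moves samples along the difference between the target score and the source score. For one-dimensional Gaussians both scores are linear, so the idea fits in a few lines. This is a minimal illustration under stated assumptions. The source density is approximated by a Gaussian refitted to the particles at each step, whereas the paper applies the flow to proxy distributions. The step size and iteration count are arbitrary.

```python
import numpy as np


def gaussian_score(x, mu, var):
    """Score d/dx log N(x; mu, var)."""
    return -(x - mu) / var


def sd_flow_1d(particles, mu_target, var_target, step=0.05, n_steps=500):
    """Move particles along score(target) - score(source).

    Simplification (assumption): the source density is a Gaussian refitted to the
    particles at every step, instead of the paper's proxy distributions.
    """
    x = np.array(particles, dtype=float)
    for _ in range(n_steps):
        mu_src, var_src = x.mean(), x.var()
        x = x + step * (gaussian_score(x, mu_target, var_target)
                        - gaussian_score(x, mu_src, var_src))
    return x


rng = np.random.default_rng(0)
moved = sd_flow_1d(rng.normal(-3.0, 0.5, size=2000), mu_target=2.0, var_target=1.5**2)
print(moved.mean(), moved.std())     # approaches the target mean 2.0 and std 1.5
```

Each update is affine in x. The particle mean therefore relaxes toward the target mean, and the spread relaxes toward the target standard deviation. The flow stops once the source and target scores coincide.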

2212.08989 2026-05-21 cs.LG version update

Deep learning applied to computational mechanics: A comprehensive review, state of the art, and the classics

Loc Vu-Quoc, Alexander Humer

Affiliations: University of Illinois at Urbana-Champaign; Johannes Kepler University

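AI summary: This paper reviews applications of deep learning in computational mechanics, including solid mechanics, fluid mechanics, and finite-element technology. It discusses the role of hybrid and pure machine learning methods in solving nonlinear PDEs and introduces techniques such as LSTM, attention mechanisms, and kernel methods.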

Comments 275 pages, 158 figures. Appeared online on 2023.03.01 at CMES-Computer Modeling in Engineering & Sciences

DOI: 10.32604/cmes.2023.028130
Journal ref: CMES-Computer Modeling in Engineering & Sciences, Vol. 137, No. 2, pp.1069-1343, 2023

Abstract

Three recent breakthroughs due to AI in arts and science serve as motivation: an award-winning digital image, protein folding, and fast matrix multiplication. Many recent developments in artificial neural networks, particularly deep learning (DL), that are applied and relevant to computational mechanics (solids, fluids, finite-element technology) are reviewed in detail. Both hybrid and pure machine learning (ML) methods are discussed. Hybrid methods combine traditional PDE discretizations with ML methods either (1) to help model complex nonlinear constitutive relations, (2) to nonlinearly reduce the model order for efficient simulation (turbulence), or (3) to accelerate the simulation by predicting certain components in the traditional integration methods. Here, methods (1) and (2) relied on the Long Short-Term Memory (LSTM) architecture, and method (3) relied on convolutional neural networks. Pure ML methods to solve (nonlinear) PDEs are represented by Physics-Informed Neural Network (PINN) methods, which could be combined with an attention mechanism to address discontinuous solutions. Both LSTM and attention architectures, together with modern and generalized classic optimizers that include stochasticity for DL networks, are extensively reviewed. Kernel machines, including Gaussian processes, are covered in sufficient depth to support more advanced topics such as shallow networks with infinite width. The review is not addressed only to experts. Readers are assumed to be familiar with computational mechanics but not with DL, whose concepts and applications are built up from the basics, with the aim of bringing first-time learners quickly to the forefront of research. The history and limitations of AI are recounted and discussed, with particular attention to pointing out misstatements or misconceptions of the classics, even in well-known references. Positioning and pointing control of a large-deformable beam is given as an example.

1908.05972 2026-05-21 cs.LG stat.ML version update

AI-based Prediction of Independent Construction Safety Outcomes from Universal Attributes

Henrietta Baker, Matthew R. Hallowell, Antoine J. -P. Tixier

Affiliations: University of Edinburgh, UK; University of Colorado at Boulder, USA

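AI summary: This paper improves and validates an approach from previous research in which safety outcomes are predicted from attributes with machine learning. NLP is used to extract attributes, and models are trained to predict injury severity, injury type, body part impacted, and incident type. Independent human annotation removes potential artificial correlation, and the results show that the attributes remain highly predictive. The study also introduces a larger dataset, new models, model stacking, and more appropriate evaluation metrics. It ultimately predicts injury severity successfully, a significant advancement.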

Comments Added author contributions and journal reference, updated corresponding author, fixed a few typos

Journal ref: Automation in Construction 118 (2020): 103146

Abstract

This paper significantly improves on, and completes the validation of, an approach proposed in previous research in which safety outcomes were predicted from attributes with machine learning. As in the original study, we use Natural Language Processing (NLP) to extract fundamental attributes from raw incident reports, and machine learning models are trained to predict safety outcomes. The outcomes predicted here are injury severity, injury type, body part impacted, and incident type. However, unlike in the original study, safety outcomes were not extracted via NLP but were provided by independent human annotations, eliminating any potential source of artificial correlation between predictors and predictands. Results show that attributes are still highly predictive, confirming the validity of the original approach. Other improvements brought by the current study include the use of (1) a much larger dataset featuring more than 90,000 reports, (2) two new models, XGBoost and linear SVM (Support Vector Machines), (3) model stacking, (4) a more straightforward experimental setup with more appropriate performance metrics, and (5) an analysis of per-category attribute importance scores. Finally, the injury severity outcome is well predicted, which was not the case in the original study. This is a significant advancement.

2605.20539 2026-05-21 cs.LG version update

OpenSeisML: Open Large-Scale Real Seismic and well-log Dataset for Generative AI

Ipsita Bhar, Huseyin Tuna Erdinc, Thales Souza, Charles Jones, Felix J. Herrmann

Affiliations: Georgia Institute of Technology; Osokey Ltd

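AI summary: This paper presents OpenSeisML, an open, large-scale dataset of real seismic and well-log data that supports generative AI for seismic inversion. An automated data curation pipeline provides reproducible seismic data preparation. The prepared data are used to train generative models that capture the statistical distribution of subsurface properties and yield multiple statistically consistent realizations for uncertainty quantification.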

Comments 5 pages, 8 figures

Abstract

The advent of machine learning (ML) and computer vision has significantly accelerated seismic inversion workflows by reducing the computational cost of traditionally expensive iterative methods. However, the development and evaluation of ML methods remain limited by the scarcity of realistic velocity models, as most high-quality data are privately owned by oil and gas companies. To address this gap, we present OpenSeisML, a collection of real seismic datasets designed to support generative AI (Gen-AI) workflows for seismic inversion. The datasets are curated from publicly available surveys in the UK National Data Repository (NDR). When seismic volumes are in the time domain and wells are in depth, a time-to-depth conversion is required. We use checkshot data to establish the time-depth relationship and construct a velocity model through interpolation for accurate conversion of post-stack seismic data. Here, we present an automated data curation pipeline that enables seismic data preparation while ensuring reproducibility. The objective is to train a generative model that captures the statistical distribution of subsurface properties, enabling the synthesis of multiple statistically consistent realizations for uncertainty quantification; these realizations can act as a prior for seismic inversion.
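The checkshot-based time-to-depth step can be sketched with plain linear interpolation. The sketch assumes checkshots are given as (two-way time in ms, depth in m) pairs and that linear interpolation between them is adequate. The function names are hypothetical, and the paper's interpolation scheme for building the velocity model may differ.

```python
import numpy as np


def checkshot_twt_to_depth(sample_twt_ms, cs_twt_ms, cs_depth_m):
    """Map seismic sample two-way times (ms) to depth (m) by linear interpolation
    of checkshot (two-way time, depth) pairs. Outside the checkshot range np.interp
    holds the end values, so deeper samples would need an extrapolated velocity."""
    order = np.argsort(cs_twt_ms)
    t = np.asarray(cs_twt_ms, float)[order]
    z = np.asarray(cs_depth_m, float)[order]
    return np.interp(sample_twt_ms, t, z)


def interval_velocity(cs_twt_ms, cs_depth_m):
    """Interval velocity (m/s) between consecutive checkshots sorted by time:
    v = 2 * dz / dt, with the factor 2 because times are two-way."""
    t = np.asarray(cs_twt_ms, float) / 1000.0
    z = np.asarray(cs_depth_m, float)
    return 2.0 * np.diff(z) / np.diff(t)


cs_twt, cs_depth = [0.0, 200.0, 600.0], [0.0, 200.0, 800.0]
print(checkshot_twt_to_depth([100.0, 400.0], cs_twt, cs_depth))   # depths for two samples
print(interval_velocity(cs_twt, cs_depth))                         # layer velocities
```

Linear interpolation in (time, depth) corresponds to a piecewise-constant interval velocity between checkshots. A smoother velocity model would need a different interpolant.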

2605.20534 2026-05-21 cs.LG cs.AI stat.ML version update

Axiomatizing Neural Networks via Pursuit of Subspaces

Mehmet Yamac, Mert Duman, Ugur Akpinar, Felix Rojas Casadiego, Serkan Kiranyaz, Marcel van Gerven, Moncef Gabbouj

Affiliations: Tampere University, Faculty of ITC, Finland; Department of Electrical Engineering, Qatar University, Qatar; Donders Institute, Radboud University, The Netherlands

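AI summary: This paper proposes a framework based on geometric axioms to explain the behavior of neural networks. Through a pursuit-of-subspaces hypothesis, it unifies perspectives on representation, computation, and generalization across shallow and deep architectures.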

Comments 43 pages, 25 figures. Code and additional materials will be released

2605.20533 2026-05-21 cs.LG version update

Exponential Mechanism Based on Quadratic Approximation for Fine-Tuning Machine Learning Models with Privacy Guarantees

Hoang Tran, Jorge Ramirez, Jiayi Wang, Alberto Bocchinfuso, Christopher Stanley, M. Paul Laiu

Affiliations: Computer Science and Mathematics Division, Oak Ridge National Laboratory; Computational Science and Engineering Division, Oak Ridge National Laboratory; HPC Department, Cineca

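AI summary: This paper proposes a randomized algorithm based on the exponential mechanism for fine-tuning pretrained models with differential privacy guarantees. It builds the utility function by combining a local quadratic approximation with information from the new dataset, and it introduces a random-projection strategy to scale to high-dimensional models.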

Abstract

Fine-tuning adapts a pretrained machine learning model to a small, sensitive dataset, but this process risks memorizing individual new data points, making the model vulnerable to adversaries who seek to extract sensitive information. In this work, we develop a randomized algorithm based on the exponential mechanism for fine-tuning while ensuring differential privacy. Our key idea is to construct a simple utility function that combines a local quadratic approximation of the pretrained model with information from the new dataset. The resulting exponential mechanism admits exact sampling from a multivariate normal distribution in closed form. We establish theoretical privacy guarantees, sensitivity bounds, and accuracy estimates for our method. We further introduce a random-projection strategy that makes the approach scalable to high-dimensional models. Numerical experiments on the MNIST benchmark and the MIMIC clinical dataset demonstrate competitive performance against existing differentially private fine-tuning techniques.
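The closed-form sampling claim follows from a standard identity. With a quadratic utility u(θ) = −½(θ − m)ᵀA(θ − m), the exponential-mechanism density ∝ exp(ε·u(θ)/(2Δu)) is the Gaussian N(m, (ε/(2Δu)·A)⁻¹). The sketch below treats the center m, the positive-definite matrix A, and the sensitivity bound Δu as given inputs. How the paper builds them from the pretrained model and the new data, and how it bounds Δu, is not reproduced. The optional orthonormal `proj` stands in, hypothetically, for the random-projection strategy.

```python
import numpy as np


def quadratic_exp_mech_sample(center, A, eps, sensitivity, rng, proj=None):
    """Exact sample from an exponential mechanism with a quadratic utility.

    Utility u(theta) = -0.5 * (theta - center)^T A (theta - center), A symmetric positive definite.
    Density proportional to exp(eps * u / (2 * sensitivity)) is the Gaussian
    N(center, (eps / (2 * sensitivity) * A)^{-1}).
    If proj (d x k, orthonormal columns) is given, sample theta = center + proj @ z
    with z drawn under the restricted quadratic form proj^T A proj.
    """
    if proj is not None:
        A = proj.T @ A @ proj                    # restrict the quadratic form to the subspace
    precision = (eps / (2.0 * sensitivity)) * A
    L = np.linalg.cholesky(precision)            # precision = L @ L.T
    z = np.linalg.solve(L.T, rng.standard_normal(A.shape[0]))   # cov(z) = precision^{-1}
    return center + (proj @ z if proj is not None else z)


rng = np.random.default_rng(0)
center, A = np.array([1.0, -1.0]), np.diag([4.0, 1.0])
print(quadratic_exp_mech_sample(center, A, eps=2.0, sensitivity=1.0, rng=rng))
```

A random orthonormal basis for `proj` can be drawn with `np.linalg.qr(rng.standard_normal((d, k)))[0]`. The Cholesky factor then has size k × k instead of d × d.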

2605.20515 2026-05-21 cs.LG eess.SP version update

Online Conformal Prediction with Corrupted Feedback

Bowen Wang, Matteo Zecchin, Osvaldo Simeone

Affiliations: Department of Engineering, King’s College London; Communication Systems Department, EURECOM; Institute for Intelligent Networked Systems, Northeastern University London

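AI summary: This paper studies the robustness of online conformal prediction when feedback is corrupted. It proposes two robust schemes and shows experimentally that they improve performance under corrupted feedback.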

Abstract

Modern artificial intelligence systems require calibrated uncertainty estimates that remain reliable in sequential and non-stationary environments. Online conformal prediction (OCP) addresses this challenge through adaptively updated prediction sets that provide deterministic long-run miscoverage guarantees. These guarantees, however, hinge on the assumption of perfect feedback about the coverage of past prediction sets. In practice, the observed miscoverage indicator may be corrupted by noise, communication failures, or adversarial manipulation, which can severely degrade OCP's calibration guarantees. In this paper, we study OCP under corrupted feedback. We first model feedback corruption as an arbitrary binary flip sequence, and analyze how feedback corruption affects and degrades the miscoverage performance of standard OCP. We then propose two robust schemes: robust OCP via filtering, which leverages the structural properties of the predicted threshold to filter corrupted feedback, and robust OCP via active compensation, which incorporates an active compensation mechanism to mitigate the effect of corrupted feedback. For both methods, we establish explicit miscoverage guarantees, which are further specialized for an independent stochastic flip model and for an arbitrary error model with memory bounds. Experiments on real-world datasets validate the proposed approach, showing markedly improved calibration and significantly smaller prediction sets compared with baseline OCP methods under corrupted feedback.
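The degradation that motivates the robust schemes can be reproduced with standard OCP alone. Under an independent flip model with rate p, the update drives the *observed* miscoverage to α. The true rate m therefore satisfies m(1 − 2p) + p = α, so m = (α − p)/(1 − 2p), which is about 0.056 for α = 0.1 and p = 0.05. The prediction sets become overly conservative as a result. The sketch makes three assumptions: a threshold-based set, uniform scores, and a fixed step size. It does not implement the paper's filtering or active-compensation schemes.

```python
import numpy as np


def ocp_miscoverage(scores, alpha, eta, flip_prob, rng):
    """Run standard OCP on a stream of nonconformity scores and return the *true*
    long-run miscoverage when each feedback bit is flipped with prob. flip_prob.

    The set at time t is {y : score(y) <= q_t}; only the (possibly flipped)
    indicator drives the update q_{t+1} = q_t + eta * (observed_err - alpha).
    """
    q, true_errors = 0.0, 0.0
    for s in scores:
        err = 1.0 if s > q else 0.0                       # true miscoverage indicator
        observed = 1.0 - err if rng.random() < flip_prob else err
        q += eta * (observed - alpha)                     # standard OCP threshold update
        true_errors += err
    return true_errors / len(scores)


scores = np.random.default_rng(0).uniform(size=100_000)
for p in (0.0, 0.05):
    m = ocp_miscoverage(scores, alpha=0.1, eta=0.01, flip_prob=p, rng=np.random.default_rng(1))
    print(p, round(m, 4))
```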

2605.20506 2026-05-21 cs.LG cs.CL version update

Reinforcing Human Behavior Simulation via Verbal Feedback

Weiwei Sun, Xuhui Zhou, Jiarui Liu, Weihua Du, Haojia Sun, Yiqing Xie, Qianou Ma, Sihao Chen, Mengting Wan, Longqi Yang, Pei Zhou, Sherry Wu, Sean Welleck, Graham Neubig, Yiming Yang, Maarten Sap

Affiliations: Carnegie Mellon University; Microsoft

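AI summary: This paper proposes DITTO, which improves the ability of LLMs to simulate human behavior by treating verbal feedback as a first-class signal in reinforcement learning. It also introduces the SOUL benchmark and reports substantial gains across multiple tasks.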

Abstract

Humans learn social norms and behaviors from verbal feedback (e.g., a parent saying "that was rude" or a friend explaining "here's why that hurt"). Yet, learning from feedback for LLMs has largely focused on domains like code and math, where RL rewards are directly verifiable and condensed into scalar values. As LLMs are increasingly used to simulate human behavior, e.g., standing in for users, patients, students, and other personas, there is a pressing need to make them more human-like, which requires embracing a fundamentally different kind of signal: feedback that is verbal, subjective, and multi-faceted. We present DITTO, a model trained by treating verbal feedback as a first-class signal in reinforcement learning. After each rollout, DITTO receives verbal feedback and generates a feedback-conditioned improved rollout; both outputs are jointly optimized with GRPO, distilling verbal guidance into the base policy without requiring feedback at test time. We also introduce SOUL (Simulation gym Of hUman-Like behavior), a unified benchmark and training data suite spanning 10 tasks across six categories: Theory of Mind, character role play, social skill, learner simulation, user simulation, and persona simulation. DITTO achieves an average 36% improvement over the base model and exceeds GPT-5.4 on 6 of 10 SOUL benchmarks, demonstrating that RL with verbal feedback is a promising direction for training LLMs to simulate human behavior.
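GRPO standardizes each rollout's reward against the other rollouts in its group. The abstract does not say how DITTO forms groups from a rollout and its feedback-conditioned revision. The sketch below assumes, hypothetically, that both are pooled into a single group, so revisions that score higher than the originals receive positive advantage. The reward values are made up.

```python
import numpy as np


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: each reward standardized within its group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


def joint_advantages(first_rewards, revised_rewards):
    """Hypothetical grouping: pool first rollouts and their feedback-conditioned
    revisions into one group and split the advantages back afterwards."""
    pooled = np.concatenate([np.asarray(first_rewards, float), np.asarray(revised_rewards, float)])
    adv = group_relative_advantages(pooled)
    n = len(first_rewards)
    return adv[:n], adv[n:]


first_adv, revised_adv = joint_advantages([0.0, 0.5], [0.75, 1.0])
print(first_adv, revised_adv)
```

Because advantages are relative within the group, no learned value function is needed. This is the main reason GRPO is cheap to apply to judged or feedback-derived rewards.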

2605.20502 2026-05-21 cs.LG cs.AI cs.CV stat.AP stat.ML version update

Tippett-minimum Fusion of Representation-space Diffusion Models for Multi-Encoder Out-of-Distribution Detection

Neelkamal Bhuyan

Affiliations: Georgia Institute of Technology

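AI summary: This paper proposes a multi-encoder fusion of representation-space diffusion models for out-of-distribution detection. It statistically analyzes each encoder's sensitivity to specific types of distribution shift and introduces the EncMin2L gating mechanism. EncMin2L improves OOD detection at lower parameter cost without using OOD labels and reaches an AUROC of at least 0.94 on all four shift types.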

Comments 14 pages

Abstract

We address out-of-distribution (OOD) detection across the full spectrum of distribution shifts (global domain changes, semantic divergence, texture differences, and covariate corruptions) through a multi-encoder fusion of per-encoder representation-space diffusion models (RDMs). We statistically identify each encoder's sensitivity to specific shift types from ID data alone and introduce EncMin2L, an encoder-agnostic two-level $\min(\cdot)$-gate that combines and calibrates per-encoder diffusion-based likelihood detectors without OOD labels, outperforming monolithic multi-encoder baselines at $2.3\times$ lower parameter cost. Two ID-data diagnostics, $\eta^2$ (class-conditional F-test) and $\Delta\mu$ (log-likelihood shift under synthetic corruptions), quantify encoder specialization, while a Tippett minimum $p$-value combination aggregates per-encoder scores into a single, calibration-stable OOD signal. EncMin2L achieves $\geq 0.94$ AUROC across all four shift types simultaneously, outperforming the state-of-the-art representation-space diffusion OOD detectors across overlapping benchmarks.
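Tippett's method combines k p-values through their minimum. If the p-values are independent and uniform under the null, P(min ≤ p_min) = 1 − (1 − p_min)^k. The sketch below pairs this with a rank-based empirical p-value per encoder computed from ID calibration scores. It assumes that higher scores mean "more OOD", for example a negative log-likelihood under that encoder's diffusion model. The calibration lists are hypothetical, and the two-level EncMin2L gate itself is not reproduced.

```python
def empirical_p_value(test_score, calibration_scores):
    """Rank-based p-value; higher score = more OOD (e.g. negative log-likelihood
    under one encoder's representation-space diffusion model)."""
    exceed = sum(1 for s in calibration_scores if s >= test_score)
    return (1 + exceed) / (1 + len(calibration_scores))


def tippett_combine(p_values):
    """Tippett's method: for k independent U(0,1) p-values, P(min <= p_min) = 1 - (1 - p_min)^k."""
    k = len(p_values)
    return 1.0 - (1.0 - min(p_values)) ** k


# Three encoders with hypothetical ID calibration scores; only the second flags the input
calib = [list(range(100))] * 3
test_scores = [20, 500, 30]
p_per_encoder = [empirical_p_value(t, c) for t, c in zip(test_scores, calib)]
print(p_per_encoder, tippett_combine(p_per_encoder))
```

Taking the minimum lets a single specialized encoder trigger detection for the shift type it is sensitive to. The (1 − p_min)^k correction keeps the combined score on a calibrated p-value scale, under the independence assumption.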

2605.20494 2026-05-21 cs.LG physics.ao-ph stat.AP version update

A 10,000-Year Global Stochastic Tropical Cyclone Catalog with Wind-Dependent Track Transitions (WHITS)

Jennifer Nakamura, Upmanu Lall

Affiliations: Lamont-Doherty Earth Observatory, Columbia University; School of Complex Adaptive Systems, Arizona State University; Earth and Environmental Engineering, Columbia University

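AI summary: This paper presents WHITS, a non-parametric semi-Markov track generator that produces a 10,000-year global synthetic tropical cyclone catalog to make insured-loss assessment more reliable.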

Abstract

Reliable assessment of tropical cyclone (TC) risk is limited by the brevity and spatial sparsity of the historical record, particularly for the rare, high-intensity landfalls that dominate insured loss. We present WHITS (Wind-focused Hurricane Interactive Track Simulator), a non-parametric semi-Markov track generator that extends the HITS framework of Nakamura et al. (2015) in three ways: transitions between historical track segments are conditioned on local wind speed in addition to position, age, and forward vector; the kernel selection on the comparative-vector term is sharpened to suppress dynamically inconsistent jumps; and a short smoothing window is applied across each transition to remove the position and wind discontinuities reported by downstream surge users. WHITS is fit to the full available best-track record in each of six basins in IBTrACS, extending in the North Atlantic to 1851 and in other basins to the earliest year of reliable best-track data. The resulting 10,000-yr global synthetic catalog reproduces observed track density and the annual hurricane/typhoon-force wind-hit probability across all basins. The catalog is intended for catastrophe-risk applications where a large, low-bias sample of physically plausible tracks is more useful than a small, statistically corrected one.
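Non-parametric track generators of this kind continue a synthetic storm by matching its current state to historical segment starts and resampling among the closest matches. The sketch below is a hypothetical illustration of that pattern, including wind speed as a matching feature and smoothing across each transition. Several choices are assumptions rather than WHITS's documented kernel: the scaled Euclidean distance, the 1/rank weights, the feature set, and the linear blend.

```python
import numpy as np


def knn_transition(state, candidates, scales, k, rng):
    """Pick the historical segment to continue a synthetic track from.

    state:      (f,) features of the current synthetic storm, e.g. position, age,
                forward vector and local wind speed (feature set is an assumption)
    candidates: (n, f) the same features at candidate historical segment starts
    scales:     (f,) per-feature scales so distances are comparable
    """
    dist = np.sqrt((((candidates - state) / scales) ** 2).sum(axis=1))
    nearest = np.argsort(dist)[:k]
    weights = 1.0 / np.arange(1, len(nearest) + 1)       # rank kernel: 1/j for the j-th neighbour
    weights /= weights.sum()
    return int(nearest[rng.choice(len(nearest), p=weights)])


def smooth_join(old_tail, new_head):
    """Blend the last points of the old segment into the first points of the new one
    with linear weights, removing position/wind jumps at the transition."""
    w = np.linspace(0.0, 1.0, len(old_tail))[:, None]
    return (1.0 - w) * old_tail + w * new_head


rng = np.random.default_rng(0)
# Hypothetical features: [lat, lon, age_h, wind_kt]
cands = np.array([[20.0, -60.0, 48.0, 80.0], [21.0, -61.0, 50.0, 85.0], [30.0, -40.0, 10.0, 40.0]])
state = np.array([20.5, -60.5, 49.0, 83.0])
print(knn_transition(state, cands, scales=np.array([1.0, 1.0, 12.0, 10.0]), k=2, rng=rng))
```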

2605.20485 2026-05-21 cs.LG version update

ZEBRA: Zero-shot Budgeted Resource Allocation for LLM Orchestration

May Hamri, Inbal Talgam-Cohen

Affiliations: Tel Aviv University

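AI summary: This study proposes ZEBRA, a framework that reduces multi-phase budget allocation to a continuous nonlinear knapsack problem in order to split a budget effectively across a multi-agent pipeline. Experiments show that it outperforms direct budget allocation by an LLM on multiple tasks.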

详情

AI中文摘要

随着自主代理在固定货币预算下执行端到端任务，关键问题从预算是否被尊重转变为如何有效使用预算。现有预算感知方法通常在单一代理内逐步控制推理过程，或通过强化学习学习资源分配策略。本文提出ZEBRA，一种零样本框架，将多阶段预算分配转化为连续非线性背包问题：一个LLM控制器估计各阶段的效用曲线，通过拉格朗日乘数的水填充搜索返回各阶段的分配。加法和乘法聚合统一在同一个求解器下。在150个任务APPS编码基准测试中，ZEBRA变体在所有聚合指标上均优于LLM直接分配方法。在预算为无约束支出的α=0.5时，ZEBRA恢复了94.4%的无约束质量，而LLM直接分配仅为88.1%。该优势具有统计显著性，并且在编码之外也具有转移性：在3阶段的HotpotQA流水线中，ZEBRA比LLM直接分配高出14.3个百分点，分配在经验上对曲线估计噪声具有鲁棒性。在HotpotQA中，ZEBRA达到的预算分配（近平衡）与APPS中的分配（偏向细化阶段）不同，显示出对流水线结构的适应性。更广泛地说，我们展示了在推理时间使用轻量级算法指导可以改善自主多智能体系统的经济行为。

英文摘要

As autonomous agents increasingly execute end-to-end tasks under fixed monetary budgets, the pressing open question shifts from whether the budget is respected, to how to spend it effectively. Existing budget-aware methods typically control reasoning step-by-step within a single agent, or learn resource allocation policies via RL. None address how to split a budget across the composing phases of a multi-agent pipeline at inference time. We propose ZEBRA, a zero-shot framework that reduces multi-phase budget allocation to a continuous nonlinear knapsack problem: an LLM controller estimates per-phase utility curves, and a water-filling search on the Lagrange multiplier returns the per-phase split. Additive and multiplicative aggregations are unified under the same solver. On a $150$-task APPS coding benchmark, both ZEBRA variants outperform LLM-direct (budget allocation directly by an LLM) on every aggregate metric. At a budget of $\alpha = 0.5$ times the unconstrained spend, ZEBRA recovers $94.4\%$ of unconstrained quality, versus $88.1\%$ for LLM-direct. The advantage is statistically significant and transfers beyond coding: on a $3$-phase HotpotQA pipeline, ZEBRA beats LLM-direct by $14.3$ percentage points, with allocations empirically robust to curve-estimation noise. On HotpotQA, ZEBRA arrives at a different budget split (near-balanced) compared to the APPS one (skewed towards a refinement phase), showing adaptation to the pipeline structure. More broadly, we show that lightweight algorithmic guidance at inference time can improve the economic behavior of autonomous multi-agent systems.
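
The water-filling step can be illustrated with a small solver for the additive case. In ZEBRA the per-phase utility curves come from an LLM controller; here, as an assumption for the sketch, each phase has the concave curve u_i(b) = w_i log(1 + b / c_i), so the stationarity condition u_i'(b_i) = lambda gives b_i(lambda) = max(0, w_i / lambda - c_i), and bisection on lambda finds the price at which total spend equals the budget. The names `allocate`, `w` and `c` are hypothetical.

```python
def allocate(budget, w, c, iters=200):
    """Split `budget` across phases by water-filling on the Lagrange multiplier.

    Hypothetical concave utility per phase: u_i(b) = w[i] * log(1 + b / c[i]).
    Stationarity u_i'(b_i) = lam gives b_i(lam) = max(0, w[i] / lam - c[i]).
    Total spend decreases in lam, so bisection finds the lam that uses the budget.
    """
    def spend(lam):
        return [max(0.0, wi / lam - ci) for wi, ci in zip(w, c)]

    lo = 1e-12                                   # tiny price: spend is huge
    hi = max(wi / ci for wi, ci in zip(w, c))    # price at which every phase gets 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(spend(mid)) > budget:
            lo = mid   # over budget: raise the price
        else:
            hi = mid   # within budget: try a lower price
    return spend(hi)


split = allocate(3.0, w=[1.0, 2.0, 1.0], c=[1.0, 1.0, 1.0])
# Closed form: all phases funded, 4/lam - 3 = 3, so lam = 2/3 and split = [0.5, 2.0, 0.5]
starved = allocate(1.0, w=[1.0, 4.0], c=[1.0, 1.0])
# Here phase 0 is priced out entirely: [0.0, 1.0]
```

Phases whose marginal utility never reaches the clearing price receive nothing, which is how a skewed split (such as one favouring a refinement phase) can emerge. For a multiplicative aggregation, one option (an assumption, not necessarily the paper's) is to maximise the sum of log u_i instead, which keeps the same separable structure when each log u_i is concave.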

2605.20482 2026-05-21 cs.LG cs.SY eess.SY Version update

Quadratic Characterizations for Reachability Analysis of Neural Networks

Elias Khalife, Mazen Farhood, Pierre-Loic Garoche

Affiliations: Kevin T. Crofton Department of Aerospace and Ocean Engineering, Virginia Tech; Federation ENAC ISAE-SUPAERO ONERA, Université de Toulouse

AI summary: The paper proposes a framework for constructing verified quadratic characterizations of scalar relations in the two-dimensional real plane: candidate quadratic inequalities are generated locally and then verified globally, reducing conservatism in QC-based reachability analysis of neural networks.

Abstract

Quadratic constraints (QCs) are widely used to characterize nonlinearities and uncertainties, but generic analytical characterizations can be conservative on bounded domains. This paper develops a framework for constructing verified quadratic characterizations of scalar relations in the two-dimensional real plane. Candidate quadratic inequalities are locally generated by solving convex quadratic programs using samples from the relation and exterior sample points. They are then verified globally using sum-of-squares certificates over an exact semialgebraic description or, in the case of nonpolynomial relations, over relaxed polynomial descriptions. The resulting verified constraints define a sound overapproximation of the scalar relations over the considered domains. These constraints are directly compatible with existing analysis frameworks based on QCs and pointwise integral quadratic constraints (IQCs) for static nonlinearities and uncertainties, and they can also be embedded in QC-based semidefinite programs for reachability and safety analysis of feedforward neural networks. For smooth activations such as $\tanh$, the method yields domain-dependent quadratic characterizations that constitute an alternative to generic sector- or slope-based descriptions. For ReLU networks, we give methods to reduce conservatism in QC-based reachability analysis of feedforward networks by exploiting dependencies between neurons and tighter local bounds. Numerical examples demonstrate improved reachability results for smooth activations, reduced conservatism for ReLU networks, and applicability beyond neural networks through an example involving saturation.
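
To make the contrast between generic and domain-dependent characterizations concrete, the sketch below uses the classic sector constraint (phi(x) - alpha x)(phi(x) - beta x) <= 0 for phi = tanh. On the whole real line the generic sector is [0, 1]; on a bounded domain [-a, a] the lower slope can be raised to tanh(a)/a because tanh(x)/x decreases in |x|. This is a hand-derived sector bound checked on a grid, not the paper's QP-plus-sum-of-squares procedure, and a grid check is a sanity test rather than a certificate.

```python
import math


def sector_residual(x, alpha, beta=1.0):
    """(tanh(x) - alpha*x) * (tanh(x) - beta*x); the sector QC requires this to be <= 0."""
    phi = math.tanh(x)
    return (phi - alpha * x) * (phi - beta * x)


def local_lower_slope(a):
    """Tightest lower sector slope of tanh on [-a, a].

    tanh(x)/x decreases in |x|, so its minimum over the domain is tanh(a)/a.
    """
    return math.tanh(a) / a


a = 2.0
alpha = local_lower_slope(a)              # about 0.48 instead of the generic 0
grid = [-a + 2 * a * i / 400 for i in range(401)]
worst_local = max(sector_residual(x, alpha) for x in grid)
worst_generic = max(sector_residual(x, 0.0) for x in grid)
outside = sector_residual(4.0, alpha)     # positive: the local sector fails off-domain
```

The tighter sector shrinks the set of input/output pairs a QC-based analysis must consider, but it is only sound on the domain it was derived for, which is why such constraints have to be paired with bounds on the pre-activations.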

2605.20479 2026-05-21 cs.CV cs.LG Version update

Oracle Supervision Transfers for Hyperparameter Prediction in Model-Based Image Denoising

Jianmin Liao, Lixin Shen, Yuesheng Xu

Affiliations: Department of Mathematics, Syracuse University; Department of Mathematics & Statistics, Old Dominion University

AI summary: The study proposes HyperDn, a single configuration-conditioned predictor that pools oracle supervision across source configurations to predict heterogeneous hyperparameters for new denoiser-noise configurations. In a cross-paradigm experiment that transfers from relatively cheap TV/TGV variational sources to the more expensive diffusion-based DiffPIR, it comes within 0.90 dB of the oracle with only 2 target oracle labels, and it reaches near-oracle PSNR without any target labels on unseen noise mixtures and on larger target images.

Abstract

Hyperparameter prediction is a critical practical bottleneck for model-based image denoisers, ranging from classical TV/TGV variational solvers to modern diffusion-based models such as DiffPIR. While existing learned predictors can achieve near-oracle performance, this approach scales poorly: each new configuration conventionally requires its own oracle-labeled training set, and each label requires a hierarchical grid search evaluated against clean ground truth. We therefore ask whether oracle supervision collected on source configurations can transfer to target configurations with few or no target oracle labels. We propose HyperDn, a single configuration-conditioned predictor that pools oracle supervision across source configurations and predicts heterogeneous hyperparameters for new denoiser--noise configurations. In a cross-paradigm experiment, HyperDn transfers from relatively cheap TV/TGV variational sources to more expensive diffusion-based DiffPIR. With only $2$ target oracle labels, it reaches $30.23$\,dB, within $0.90$\,dB of the oracle, and outperforms the $64$-label per-configuration predictor trained from scratch, using $1/32$ as many target labels as that baseline point. Without any target oracle labels, HyperDn also reaches near-oracle PSNR on two unseen mixtures of seen noise types and on transfer from relatively cheap $96\times 96$ source images to $512\times 768$ targets. Together, these results show that expensive oracle supervision for hyperparameter prediction can be transferred from source to new target configurations, reducing the need to rebuild oracle labels for each new denoising configuration.

2605.20477 2026-05-21 cs.LG cs.AI cs.CL Version update

Training Language Agents to Learn from Experience

Yuval Shalev, Zifeng Ding, Mateja Jamnik

Affiliations: University of Cambridge

AI summary: The paper proposes In-context Training (ICT), a task framework for evaluating cross-task self-improvement in language agents, together with an RL-based training pipeline that learns reflections directly from experience. On ALFWorld and MiniHack, the trained reflectors outperform an untrained baseline on most held-out task families, showing that the ability to learn from experience can itself be learned.

Abstract

Language agents can adapt from experience in interactive environments, but current reflection-based methods can only self-correct within a single task instance. Whether such experience can be distilled into reusable lessons that improve performance on future unseen tasks remains unclear. We address this problem by introducing the In-context Training (ICT) task, a framework for evaluating cross-task self-improvement in language agents. In ICT, a reflector model observes trajectories collected by an actor model and generates system prompts intended to improve the actor's performance on future unseen tasks. We then propose an RL-based training pipeline for learning such reflections directly from experience, without human-provided examples. Across ALFWorld and MiniHack, our trained reflectors outperform an untrained baseline on most held-out task families, showing that the ability to learn from experience can itself be learned. In some cases, we observe generalisation beyond the benchmark on which the reflector was trained, to substantially different environments. Finally, we introduce MetaGym, a generic Python library for constructing meta-environments, enabling future research on self-improving language agents.

2605.20473 2026-05-21 cs.SE cs.AI cs.LG Version update

Code Generation by Differential Test Time Scaling

Yifeng He, Ethan Wang, Jicheng Wang, Xuanxin Ouyang, Hao Chen

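Affiliations: University of California, Davis

AI summary: The paper proposes DiffCodeGen, a test-time scaling method for code generation based on coverage-guided differential analysis. It generates diverse code candidates, synthesizes inputs with coverage-guided fuzzing without existing test cases or an LLM, and selects among candidates without extra model calls, which keeps it efficient and scalable.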

Comments 16 pages main text, 21 pages with references

Abstract

Test-time scaling has emerged as a promising approach for improving code generation by exploring large solution spaces at inference time. However, existing methods often rely on public test cases that are unavailable in practice, or require extensive LLM inference for candidate selection, leading to significant token consumption and time overhead. We present DiffCodeGen, a novel test-time scaling method for code generation based on coverage-guided differential analysis. DiffCodeGen generates diverse code candidates using various sampling and prompting strategies, then applies coverage-guided fuzzing to synthesize inputs without requiring any existing tests or large language models. By executing all candidates on these inputs, DiffCodeGen captures their dynamic behavior and clusters candidates based on behavioral similarity. DiffCodeGen selects the medoid of the largest cluster as the final output. Unlike prior test-time scaling methods that invoke additional LLM inference for candidate selection, DiffCodeGen performs selection without any extra model calls, incurring little to no additional token consumption. DiffCodeGen is fully asynchronous, naturally suited to the current trend of agentic coding, and is thus efficient and highly scalable. We evaluate DiffCodeGen across 4 large language models, demonstrating consistent improvements over baselines. Compared to state-of-the-art test-time scaling methods, DiffCodeGen achieves competitive or superior performance while using only a fraction of time and tokens. DiffCodeGen is model-agnostic and can be combined with reasoning models to further boost performance.
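
The selection stage (run every candidate on shared inputs, group candidates by observed behaviour, return the medoid of the largest group) can be sketched in a few lines. Two substitutions are assumptions of this sketch: random integer inputs stand in for coverage-guided fuzzing, and single-linkage clustering on the fraction of disagreeing outputs stands in for whatever clustering the paper uses. The toy `candidates` are hypothetical implementations of absolute value.

```python
import random
from itertools import combinations


def behaviour(candidate, inputs):
    """Record each candidate's output (or exception type) on every input."""
    trace = []
    for x in inputs:
        try:
            trace.append(("ok", candidate(x)))
        except Exception as exc:  # crashes are part of the observed behaviour
            trace.append(("err", type(exc).__name__))
    return trace


def distance(t1, t2):
    """Fraction of inputs on which two behaviour traces disagree."""
    return sum(a != b for a, b in zip(t1, t2)) / len(t1)


def select(candidates, inputs, threshold=0.0):
    """Cluster candidates by behaviour and return the medoid of the largest cluster."""
    traces = [behaviour(c, inputs) for c in candidates]
    parent = list(range(len(candidates)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Single-linkage clustering: merge any pair within `threshold` of each other.
    for i, j in combinations(range(len(candidates)), 2):
        if distance(traces[i], traces[j]) <= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(candidates)):
        clusters.setdefault(find(i), []).append(i)
    largest = max(clusters.values(), key=len)
    return min(largest, key=lambda i: sum(distance(traces[i], traces[j]) for j in largest))


candidates = [
    lambda x: abs(x),
    lambda x: x if x >= 0 else -x,
    lambda x: x,            # buggy: ignores the sign
    lambda x: max(x, -x),
    lambda x: x * x,        # buggy: squares instead
]
rng = random.Random(0)
inputs = [rng.randint(-50, 50) for _ in range(64)] + [-3, 0, 3]
best = select(candidates, inputs)
```

The three correct variants agree on every input and form the largest cluster, so one of them is returned without any model call; with `threshold=0.0` the clustering reduces to grouping exactly equivalent behaviours.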

2605.20450 2026-05-21 cs.LG cs.CR Version update

SMA-DP: Spectral Memory-Aware Differential Privacy for Deep Learning

Mohammad Partohaghighi, Roummel Marcia

Affiliations: Department of Electrical Engineering and Computer Science; University of California, Merced; Department of Applied Mathematics

AI summary: The paper proposes SMA-DP-SGD, a differentially private SGD method that augments DP-SGD with a spectral memory branch built only from previously privatized releases, achieving competitive or better accuracy than several DP optimization baselines on CIFAR-100, CIFAR-10, and MNIST.

Abstract

Differentially private stochastic gradient descent (DP-SGD) enables private deep learning through per-example clipping and calibrated Gaussian noise, but its high-variance updates can reduce utility on challenging datasets. We propose SMA-DP-SGD, a Spectral Memory-Aware Differentially Private Stochastic Gradient Descent method that augments DP-SGD with a fractional memory branch built only from previously privatized noisy releases. WeightWatcher-inspired power-law spectral exponents provide group-wise reliability signals, instantiated layer-wise in our experiments, to adapt the decay and effective memory depth. Private-history alignment, norm matching, and warm-up activation stabilize the memory contribution. Privacy remains transparent: conditioned on the private release history, the memory branch is fixed, and the only newly data-dependent term is the current clipped sum scaled by a fixed coefficient $β$. Hence, SMA-DP-SGD preserves a clean conditional sensitivity structure and exactly recovers group-wise DP-SGD when $β=1$. Experiments on CIFAR-100, CIFAR-10, and MNIST show competitive or superior accuracy over several DP optimization baselines, with the largest gains on CIFAR-100 and CIFAR-10. CIFAR-10 ablations show that $β$ controls the privacy--utility trajectory, while spectral and memory diagnostics confirm a controlled short-to-moderate effective memory depth and a small memory-branch ratio. Runtime analysis shows that the mechanism incurs additional overhead, about $2.94\times$ DP-SGD in our CIFAR-10 implementation, revealing a practical trade-off between adaptive private memory and computational cost.
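
The structural claim (the memory branch depends only on past privatized releases, the new data-dependent term is the clipped sum scaled by beta, and beta = 1 recovers DP-SGD) can be illustrated with the sketch below. The exact update rule is not given in the abstract, so the way the two terms are combined, the power-law weights controlled by `alpha`, and the fixed `depth` are hypothetical; the spectral (WeightWatcher-based) adaptation and the privacy accounting are not modelled.

```python
import numpy as np


def clip_rows(grads, clip_norm):
    """Per-example clipping: rescale each row to L2 norm at most clip_norm."""
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    return grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))


def dp_sgd_direction(grads, clip_norm, sigma, rng):
    """Standard DP-SGD: clipped sum plus Gaussian noise, averaged over the batch."""
    n, d = grads.shape
    noise = rng.normal(0.0, sigma * clip_norm, size=d)
    return (clip_rows(grads, clip_norm).sum(axis=0) + noise) / n


def memory_term(history, alpha=0.5, depth=8):
    """Power-law weighted average of past privatized releases (hypothetical weights)."""
    if not history:
        return 0.0
    recent = history[-depth:][::-1]                     # most recent first
    w = np.arange(1, len(recent) + 1, dtype=float) ** -(1.0 + alpha)
    w /= w.sum()
    return sum(wk * r for wk, r in zip(w, recent))


def memory_direction(grads, history, clip_norm, sigma, beta, rng):
    """Hypothetical memory-augmented step: beta scales the only new data-dependent term."""
    n, d = grads.shape
    noise = rng.normal(0.0, sigma * clip_norm, size=d)
    release = (beta * clip_rows(grads, clip_norm).sum(axis=0) + noise) / n
    direction = release + (1.0 - beta) * memory_term(history)
    history.append(release)          # only noisy releases ever enter the memory
    return direction
```

Because `memory_term` reads only from `history`, which stores already-noised releases, it adds no new dependence on the current batch; setting `beta=1.0` zeroes the memory contribution and leaves exactly the DP-SGD direction.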

2605.20449 2026-05-21 cs.LG cs.AI Version update

LLM Pretraining Shapes a Generalizable Manifold: Insights into Cross-Modal Transfer to Time Series

Alexis Roger, Prateek Humane, Zhenghan Tai, Gwen Legate, Andrei Mircea, Vasilii Feofanov, Irina Rish

Affiliations: McGill University; Mila - Quebec AI Institute; Université de Montréal; University of Toronto; Concordia University; 42.com

AI summary: The study asks whether language-pretrained transformers can become effective time-series forecasters and explains the mechanism behind cross-modal transfer: pretraining builds a reusable manifold, and finetuning projects numerical dynamics onto task-relevant directions.

Abstract

Can language-pretrained transformers become effective time-series forecasters, and why? In this paper, we show that cross-modal transfer arises because language pretraining preconditions time series training with a reusable manifold. A linear probe on frozen LLM states decodes realistic time-series trajectories without paired supervision, and retrieval in this projected space yields competitive forecasts, showing that structure and dynamics exist before finetuning. Pretrained initialization also improves optimization, producing coherent gradients and a highly anisotropic loss landscape unlike random initialization. Finetuning then acts as low-dimensional alignment, reusing existing directions rather than learning temporal primitives from scratch, as evidenced by low-rank updates, subspace alignment, and shared features for periodicity, trend, and repetition. Together, these results support a geometric account of LLM-to-time-series transfer: language pretraining builds the manifold, and finetuning projects numerical dynamics onto task-relevant directions.
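
A linear probe of the kind mentioned above is a closed-form ridge regression from frozen hidden states to targets. The sketch shows only the mechanics: `H` is random stand-in data with a planted linear readout rather than real LLM states, the ridge strength is an arbitrary assumption, and a real probe would be evaluated on held-out sequences.

```python
import numpy as np


def fit_linear_probe(H, Y, ridge=1e-3):
    """Closed-form ridge regression from frozen states H (n, d) to targets Y (n, k)."""
    Hb = np.hstack([H, np.ones((H.shape[0], 1))])        # append a bias column
    A = Hb.T @ Hb + ridge * np.eye(Hb.shape[1])
    return np.linalg.solve(A, Hb.T @ Y)                 # (d + 1, k) probe weights


def apply_probe(H, W):
    """Decode targets from states with a fitted probe."""
    return np.hstack([H, np.ones((H.shape[0], 1))]) @ W


# Stand-in data: random "hidden states" with a planted linear readout.
rng = np.random.default_rng(0)
H = rng.normal(size=(500, 32))
Y = H @ rng.normal(size=(32, 4)) + 0.01 * rng.normal(size=(500, 4))
W = fit_linear_probe(H, Y)
pred = apply_probe(H, W)
r2 = 1.0 - ((Y - pred) ** 2).sum() / ((Y - Y.mean(axis=0)) ** 2).sum()
```

Keeping the probe linear is what makes a high decoding score informative: any structure it recovers must already be linearly present in the frozen representation.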

2605.20441 2026-05-21 cs.LG cs.AI cs.NE Version update

Weight Decay Regimes in Grokking Transformers: Cheap Online Diagnostics

Lucky Verma

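Affiliations: Independent Researcher

AI summary: The study examines the sharp transitions between memorization, generalization, and collapse in transformers trained on modular arithmetic and shows that weight decay acts as a scalar empirical control parameter for these regimes. It introduces two cheap online diagnostics that track training dynamics from attention activations alone and complement loss-landscape diagnostics at lower compute cost.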

Comments 28 pages, 11 figures, 5 tables. Code and aggregate JSONs: https://github.com/lucky-verma/grokking-diagnostics. Per-run JSONs: https://huggingface.co/datasets/lucky-verma/grokking-diagnostics-runs. Lean 4/mathlib v4.29.0 formal checks available in the code repository

Abstract

Transformers trained on modular arithmetic exhibit sharp transitions between memorization, generalization, and collapse. We show that weight decay acts as a scalar empirical control parameter for these regimes, and introduce two cheap online diagnostics, mean pairwise attention-head cosine similarity and entropy standard deviation, that track training dynamics from attention activations alone and complement loss-landscape diagnostics at lower compute cost. Across eleven experimental conditions and three model scales (0.82M to 85M parameters), the weight-decay axis separates memorization, developmental grokking, and collapse. A near-transition logistic fit localizes the memorization-to-developmental boundary at $λ_c=0.0158$ (95% CI [0.0109, 0.0200], N=210); a power-law fit gives an empirical exponent $ν=0.757$ (CI [0.725, 0.799]). Reference exponents $ν=1/2$ and 3D Ising $ν\approx 0.63$ lie outside this empirical CI under our four-bin grid, so we report $ν$ as empirical and defer universality-class identification to denser finite-size-scaling work. A horizon-matched multi-task replication (n=280, four modular operations) preserves the weight-decay control pattern; a paired attention-head re-initialization experiment at $λ=0.05$ changes Phase-2 amplitude (Cohen's $d=-1.190$, n=10, $p_t=4.5 \times 10^{-3}$), while matched weight-norm clipping does not. Three cross-architecture probes (4L MLP, 4L LSTM, and 4L Mamba; each n=70) replicate the weight-decay-controlled transition with architecture-specific $λ_c$ values. Main diagnostic claims are scoped to modular arithmetic in small transformer attention models; the non-attention experiments are scope probes, and architecture-wide, language-model, and universality-class claims are out of scope.
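
The two attention-only diagnostics can be computed from one layer's attention probabilities as below. The abstract names them (mean pairwise head cosine similarity and entropy standard deviation) but not their exact aggregation, so flattening each head's attention map and averaging row entropies per head are assumptions of this sketch; averaging over layers or batches is left out.

```python
import numpy as np


def mean_head_cosine(attn):
    """Mean pairwise cosine similarity between heads' flattened attention maps.

    attn: array (heads, queries, keys) of attention probabilities.
    """
    h = attn.shape[0]
    flat = attn.reshape(h, -1)
    unit = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    sim = unit @ unit.T
    i, j = np.triu_indices(h, k=1)       # each unordered head pair once
    return float(sim[i, j].mean())


def head_entropy_std(attn, eps=1e-12):
    """Std across heads of each head's mean attention-row entropy (nats)."""
    row_entropy = -(attn * np.log(attn + eps)).sum(axis=-1)   # (heads, queries)
    return float(row_entropy.mean(axis=1).std())


uniform = np.full((3, 4), 0.25)          # head 0 spreads attention evenly
one_hot = np.zeros((3, 4))
one_hot[:, 0] = 1.0                      # head 1 attends to a single key
attn = np.stack([uniform, one_hot])
cos = mean_head_cosine(attn)             # 0.5 for this pair
ent_std = head_entropy_std(attn)         # log(4) / 2 = log(2)
```

Both quantities need only a forward pass, which is why they are cheap enough to log at every evaluation step: heads collapsing onto the same pattern push the cosine towards 1, while heads specialising to sharp versus diffuse attention raise the entropy spread.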

2605.20440 2026-05-21 cs.LG cs.AI math.RA Version update

Group-Algebraic Tensors: Provably-optimal Equivariant Learning and Physical Symmetry Discovery

Paulina Hoyos, Shashanka Ubaru, Dongsung Huh, Vasileios Kalantzis, Kenneth L. Clarkson, Misha Kilmer, Haim Avron, Lior Horesh

Affiliations: UT Austin; IBM Research; Independent; Tufts University; Tel-Aviv University

AI summary: The paper proposes a group-algebraic tensor framework that builds the multiplication rule of a finite group G into the tensor algebra, making equivariance an algebraic property rather than an architectural constraint. The framework rests on three theoretical pillars: (i) an Eckart-Young optimality guarantee for the $\star_G$-SVD; (ii) composition of multiple symmetries via Kronecker factorization; and (iii) a 600-line Lean 4 formalization. It offers capabilities that equivariant neural networks lack: a closed-form per-irrep decomposition of every prediction and data-driven discovery of the best-fitting symmetry group. On QM9 molecular geometry, decomposition over the octahedral subgroup recovers the selection rules of angular momentum, demonstrating data-driven physical discovery.

Abstract

We introduce the $\star_G$ tensor algebra, in which any finite group $G$ defines the multiplication rule, making equivariance an intrinsic algebraic property rather than an architectural constraint. The framework rests on three machine-verified theoretical pillars: (i) an Eckart-Young optimality guarantee for the $\star_G$-SVD: the first such result for symmetry-preserving tensor approximation, exact and polynomial-time; (ii) a Kronecker factorization that composes multiple symmetries by replacing $F_G$ with $F_{G_1} \otimes F_{G_2}$ with no architectural redesign; and (iii) a 600-line Lean 4 formalization of the $\star_G$ algebra. The framework provides capabilities that equivariant neural networks (ENNs) structurally cannot: a closed-form per-irreducible-representation decomposition of every prediction, and data-driven discovery of the symmetry group that best fits a dataset. As a non-trivial empirical demonstration, decomposing QM9 molecular geometry over the chiral octahedral subgroup of SO(3) recovers the Wigner--Eckart selection rules of angular momentum from data alone, with no quantum mechanical input: scalar properties are A$_1$-dominated, dipole components are T$_1$-dominated, the isotropic polarizability is uniquely insensitive to $l=1$ as the rank-2-trace decomposition $l=0 \oplus l=2$ requires, and the T$_1$/A$_1$ predictive-power ratio separates vector observables from scalar observables by a factor of five. On full QM9 (130,831 molecules), $\star_G$-SVD with ridge regression provides closed-form predictions at $\sim50-90\times$ fewer parameters than parameter-matched MLPs. Algebraic equivariance thus complements architectural equivariance not as a faster-better-cheaper alternative but as a different mathematical affordance: provably-optimal symmetry-preserving compression, per-irrep interpretability, and data-driven physical discovery.
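
For the simplest case, the cyclic group G = Z_n, and assuming the $\star_G$ product is the group-algebra (convolution) product over the group axis, the construction reduces to the familiar tubal t-product: the DFT plays the role of $F_G$ and turns the product into independent matrix products, one per frequency. The sketch checks that the direct and Fourier computations agree and truncates per-frequency SVDs in the style of the t-SVD. General finite groups need the group's own Fourier transform over its irreducible representations, which this sketch does not implement.

```python
import numpy as np


def star_cyclic(A, B):
    """Group-algebra product over G = Z_n, computed directly.

    A: (m, p, n) and B: (p, q, n); the last axis is indexed by group elements.
    C[:, :, k] = sum over g + h = k (mod n) of A[:, :, g] @ B[:, :, h].
    """
    n = A.shape[2]
    C = np.zeros((A.shape[0], B.shape[1], n), dtype=np.result_type(A, B))
    for g in range(n):
        for h in range(n):
            C[:, :, (g + h) % n] += A[:, :, g] @ B[:, :, h]
    return C


def star_cyclic_fft(A, B):
    """Same product via the DFT, which turns it into independent per-frequency matmuls."""
    Ah, Bh = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    Ch = np.einsum("ipk,pqk->iqk", Ah, Bh)
    return np.real(np.fft.ifft(Ch, axis=2))


def star_cyclic_truncate(A, r):
    """Rank-r approximation by truncating the SVD of every Fourier-domain face."""
    Ah = np.fft.fft(A, axis=2)
    out = np.empty_like(Ah)
    for k in range(A.shape[2]):
        U, s, Vh = np.linalg.svd(Ah[:, :, k], full_matrices=False)
        out[:, :, k] = (U[:, :r] * s[:r]) @ Vh[:r]
    return np.real(np.fft.ifft(out, axis=2))


rng = np.random.default_rng(0)
A = rng.normal(size=(3, 2, 5))
B = rng.normal(size=(2, 4, 5))
C_direct = star_cyclic(A, B)
C_fft = star_cyclic_fft(A, B)
```

For a direct product of cyclic groups, the DFT matrix of Z_a x Z_b is the Kronecker product of the two factors' DFT matrices, which matches the $F_{G_1} \otimes F_{G_2}$ composition described in the abstract.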

2605.20439 2026-05-21 cs.LG cs.HC Version update

Can Conversational XAI Improve User Performance? An Experimental Study

Sven Kruschel, Julian Rosenberger, Lasse Bohlen, Mathias Kraus, Patrick Zschech

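Affiliations: TU Dresden; University of Regensburg

AI summary: The study proposes an experimental design for evaluating how conversational XAI affects user performance, measured by prediction accuracy, model understanding, and error identification. In a preliminary test (N=42), participants with either conversational or Q&A-based assistance outperformed the model, with no performance difference between the two assistance types.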

Comments Accepted at Thirty-Fourth European Conference on Information Systems (ECIS 2026), Milan, Italy

Abstract

Explainable AI (XAI) techniques aim to provide insights into predictive models and enhance user performance, yet they often fall short of these expectations. Conversational XAI assistants promise to overcome such limitations, but empirical evidence on their impact on objective performance measures remains limited. We propose an experimental design for evaluating explanation assistance through prediction accuracy, model understanding, and error identification. Using an explainable-by-design prediction model, we create conditions where users can outperform the model by identifying and compensating for systematic errors. We compare conversational assistance against Q&A-based assistance to assess which better supports users in working with model explanations. Preliminary results from testing our experimental design show that participants (N=42) in both treatments significantly outperformed the model but reveal no performance differences between assistance types and modest engagement overall. These findings inform refinements for our planned full study, including enhanced engagement interventions and investigation of the mechanisms driving improved predictions.

2605.20434 2026-05-21 stat.ML cs.DM cs.LG Version update

Contradiction Graphs Determine VC Dimension

Jesse Campbell, Daniel Ibaibarriaga, Lev Reyzin

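Affiliations: Department of Mathematics, Statistics, & Computer Science

AI summary: The paper studies contradiction graphs of binary concept classes. By analyzing the structure of these graphs, it identifies thresholds that determine the VC dimension, which allows the VC dimension to be computed exactly and distinguishes finite from infinite VC dimension.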

2605.20413 2026-05-21 cs.LG Version update

Supervised Latent Restructuring for Small-Data Quantum Learning in Plant Phenomics

Alakananda Mitra, David H. Fleisher, Vangimalla Reddy, Chittaranjan Ray

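Affiliations: Nebraska Water Center, IANR, University of Nebraska–Lincoln; Adaptive Cropping Systems Laboratory, USDA-ARS; Nebraska Water Center, DWFI, University of Nebraska–Lincoln

AI summary: The paper studies how supervised latent restructuring improves the geometric separability of compressed high-dimensional plant-phenomics features in small-data regimes. It proposes a hybrid workflow that restructures the latent space with PCA followed by LDA and then applies GPU-accelerated Quantum Kernel Alignment, and it finds that latent geometry is a key design variable in small-data quantum learning, though not sufficient on its own for strong quantum performance.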

Comments 11 pages, 4 Tables, 3 Figures

Abstract

High-dimensional biological data often exhibit a severe mismatch between feature dimensionality and sample size, making reliable classification difficult in extremely small-data regimes. In these settings, kernel methods can lose discriminative power when latent compression fails to preserve class-separating structure. We study this problem in fine-grained plant phenomics and propose a hybrid workflow that compresses 1280-dimensional deep image embeddings into a 64-dimensional PCA space and then restructures them into an 11-dimensional supervised latent space using Linear Discriminant Analysis (LDA), followed by GPU-accelerated Quantum Kernel Alignment (QKA) on NVIDIA L40S hardware. Empirically, supervised latent restructuring substantially improves the geometric separability of the compressed representation, increasing the Silhouette coefficient from 0.003 in the raw embedding space and -0.006 in PCA-64 to 0.197 in the supervised LDA-11 space. However, downstream classical evaluation reveals a clear compression trade-off: Linear SVM and XGBoost improve in the restructured latent space, whereas RBF-SVM and Random Forest degrade under the same 11-dimensional bottleneck. Under a constrained optimization budget, QKA in this regime remains challenging, indicating that latent geometry alone is not sufficient for strong trainable quantum performance. These findings position representation geometry as a central design variable in small-data quantum learning and expose the practical difficulty of recovering nonlinear discriminative structure from aggressively compressed biological representations.

2605.20408 2026-05-21 cs.LG Version update

Spectral Souping: A Unified Framework for Online Preference Alignment

Yinlam Chow, Guy Tennenholtz, Ted Yun, James Harrison, Arthur Gretton, Andre Barreto, Bo Dai

Affiliations: Google DeepMind; Google Research

AI summary: The paper proposes Spectral Souping, a unified framework for online preference alignment. It identifies a universal spectral representation within LLMs that is amenable to model merging, so that a basis of specialized policies learned offline can be merged at inference time to adapt quickly to individual user preferences without costly online retraining.

Abstract

Reinforcement Learning from Human Feedback (RLHF) effectively aligns Large Language Models (LLMs) with aggregate human preferences but often fails to address the diverse and conflicting needs of individual users. To overcome this issue, we introduce Spectral Souping, a unified framework for efficient, online preference alignment. Our contribution is the discovery of a universal spectral representation within LLMs, which is proven to be highly amenable to model merging. This theoretical insight enables a two-phase methodology: we first learn a basis of specialized policies offline, each focused on a distinct, fine-grained preference dimension. An online adaptation algorithm then efficiently "soups" these policies at inference time, either by merging their outputs or parameters, enabling rapid model adaptation without the need for costly online retraining w.r.t. tailored preference rewards. Experiments on online preference alignment benchmarks demonstrate that our method achieves significant performance improvements over existing state-of-the-art approaches, presenting a scalable and computationally efficient solution for dynamically adapting LLMs to individual user preferences.
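
The two merging modes named above, merging outputs or merging parameters, can be sketched generically. The mixing weights here are fixed by hand, whereas in the paper they are produced by the online adaptation algorithm; the spectral representation that justifies merging is not modelled, and the policy names `helpful` and `concise` are hypothetical.

```python
import numpy as np


def _normalise(weights):
    w = np.asarray(weights, dtype=float)
    return w / w.sum()


def soup_parameters(policies, weights):
    """Parameter merging: weighted average of per-policy parameter dicts."""
    w = _normalise(weights)
    return {name: sum(wi * p[name] for wi, p in zip(w, policies)) for name in policies[0]}


def soup_outputs(logits_list, weights):
    """Output merging: weighted mixture of the policies' next-token distributions."""
    w = _normalise(weights)
    mixed = 0.0
    for wi, logits in zip(w, logits_list):
        z = np.exp(logits - logits.max())        # numerically stable softmax
        mixed = mixed + wi * z / z.sum()
    return mixed


helpful = {"W": np.array([[1.0, 0.0], [0.0, 1.0]])}
concise = {"W": np.array([[0.0, 2.0], [2.0, 0.0]])}
merged = soup_parameters([helpful, concise], weights=[1.0, 3.0])
probs = soup_outputs([np.array([2.0, 0.0, -1.0]), np.array([0.0, 1.0, 0.0])], [0.5, 0.5])
```

Parameter merging produces a single model and costs one forward pass per token, while output merging keeps every basis policy but needs one forward pass per policy; adapting to a new user then only means choosing new weights.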

2605.20400 2026-05-21 stat.AP cs.LG stat.ML Version update

STELLAR: Scaling Large 3D Perception Models for Autonomous Driving

Yingwei Li, Xin Huang, Yang Liu, Yang Fu, Alex Zihao Zhu, Chen Song, Junwen Yao, Anant Subramanian, Hao Xiang, Weijing Shi, Yuliang Zou, Tom Hoddes, Zhaoqi Leng, Govind Thattai, Dragomir Anguelov, Mingxing Tan

Affiliations: Waymo; UCSD

AI summary: The paper studies large-scale training for autonomous-driving perception. By extending the input modalities to LiDAR, radar, camera, and map prior and training models with up to 500 million parameters on 50 million driving examples, it establishes a new state of the art on the Waymo Open Dataset challenge.

Abstract

Model scaling has demonstrated remarkable success through large-scale training on diverse datasets. It remains an open question whether the same paradigm would apply to autonomous driving perception systems due to unique challenges, such as fusing heterogeneous sensor data and the need for sophisticated 3D spatial understanding. To bridge this gap, we present a comprehensive study that systematically analyzes the impact of scale on these systems. We develop our STELLAR model based on Sparse Window Transformer, by extending the input modalities to include LiDAR, radar, camera, and map prior. We train the model on a large-scale dataset of 50 million driving examples with up to 500 million parameters. Our large-scale experiments reveal empirical scaling trends that connect model performance to model size, data, and compute. The resulting model establishes a new state-of-the-art on the Waymo Open Dataset challenge, outperforming prior work by a large margin. Our work demonstrates that large-scale training is a highly promising path for advancing the capabilities of perception models for autonomous driving.
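
Empirical scaling trends of the kind described above are commonly summarised by a power law, loss approximately a * N^(-b), fitted by least squares in log-log space. The abstract does not state the functional form used, so the pure power law is an assumption, and the sizes and losses below are synthetic values generated from a known curve, not results from the paper.

```python
import numpy as np


def fit_power_law(sizes, losses):
    """Fit loss ~= a * size**(-b) by linear least squares in log-log space."""
    slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
    return float(np.exp(intercept)), float(-slope)


sizes = np.array([1e7, 3e7, 1e8, 3e8, 5e8])   # illustrative parameter counts
losses = 5.0 * sizes ** -0.1                  # synthetic curve, not paper results
a, b = fit_power_law(sizes, losses)
```

A pure power law has no irreducible-loss floor; if the measured curve flattens, a form such as a * N^(-b) + c needs a nonlinear fit instead, and any extrapolation beyond the largest trained model should be treated with caution.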

2605.20389 2026-05-21 cs.LG cs.AI Version update

Nonlocal operator learning for fMRI encoding and decoding tasks

Andreas Kramer, Saugat Acharya, Alice Giola, Emanuele Zappala

Affiliations: Department of Computer Science, Idaho State University; Department of Mathematics and Statistics, Idaho State University

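AI summary: This paper proposes a framework based on neural integral operators for fMRI encoding and decoding tasks, examines the role of nonlocal spatiotemporal context, and experimentally evaluates how longer temporal windows and visual-cortex versus whole-brain recordings affect performance and latent-space geometry.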

Comments 18 pages, 4 figures, 5 tables. Comments are welcome!

English abstract

Functional MRI data exhibit high-dimensional spatiotemporal structure, making both prediction and decoding challenging. In this work, we investigate neural integral-operator-based models for encoding and decoding tasks in fMRI, with particular emphasis on the role of nonlocal spatiotemporal context. We implement a latent neural integral operator framework that performs fixed-point iterations in an auxiliary space, from which a decoder performs classification and stimulus prediction. We evaluate our model on two open-source fMRI datasets. Our experiments examine both decoding of stimuli from fMRI recordings and encoding of fMRI dynamics from stimulus representations. A main focus is the effect of spatiotemporal context: we systematically compare short and long temporal windows, as well as visual-cortex vs. whole-brain recordings, and analyze their influence on performance and latent-space geometry. Across tasks and datasets, larger temporal windows generally improve results and produce more structured learned representations. In decoding experiments, the learned latent space often provides clearer class separation than the raw data. In encoding experiments, although absolute performance remains moderate due to the difficulty of the task, longer temporal windows still yield consistent gains. These findings suggest that neural integral operators provide a promising framework for modeling fMRI dynamics and that broader spatiotemporal context can be beneficial for both prediction and representation learning. More broadly, the results indicate that exploiting distributed nonlocal structure in brain dynamics requires model architectures specifically designed to capture such dependencies.
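The fixed-point step can be illustrated with a small discretized integral operator. This is a minimal sketch under stated assumptions, not the paper's model: the latent state is a scalar time series, the kernel is a hypothetical fixed Gaussian over time steps (the paper's operator is learned), and `gaussian_kernel`, `apply_operator` and `solve_fixed_point` are illustrative names. A wider kernel plays the role of a longer temporal window, because each output time step then integrates over more of the sequence.

```python
import math

def gaussian_kernel(n_steps, width):
    """Dense temporal kernel K[t][s]; a larger width means a more nonlocal window."""
    return [[math.exp(-((t - s) / width) ** 2) for s in range(n_steps)]
            for t in range(n_steps)]

def apply_operator(z, x, kernel, lam):
    """One application of T(z)[t] = x[t] + (lam / n) * sum_s K[t][s] * tanh(z[s])."""
    n = len(z)
    return [x[t] + lam / n * sum(kernel[t][s] * math.tanh(z[s]) for s in range(n))
            for t in range(n)]

def solve_fixed_point(x, kernel, lam=0.5, tol=1e-10, max_iter=500):
    """Picard iteration z <- T(z). Entries of K lie in (0, 1] and tanh is
    1-Lipschitz, so T is a contraction with constant at most lam when lam < 1."""
    z = list(x)
    for iteration in range(1, max_iter + 1):
        z_next = apply_operator(z, x, kernel, lam)
        change = max(abs(a - b) for a, b in zip(z_next, z))
        z = z_next
        if change < tol:
            return z, iteration
    return z, max_iter
```

For example, `solve_fixed_point([math.sin(0.3 * t) for t in range(20)], gaussian_kernel(20, 10.0))` returns the converged latent sequence and the iteration count; with `lam < 1` the contraction argument in the docstring guarantees convergence.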

2605.20369 2026-05-21 cs.CL cs.AI cs.LG Updated version

DEL: Digit Entropy Loss for Numerical Learning of Large Language Models

Zhaohui Zheng, Chenhang He, Shihao Wang, Yuxuan Li, Ming-Ming Cheng, Lei Zhang

Affiliations: The Hong Kong Polytechnic University; VCIP, College of Computer Science, Nankai University

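AI summary: This paper proposes Digit Entropy Loss (DEL) for autoregressive numerical learning in LLMs. It redesigns conventional unsupervised entropy optimization by using digit conditional probabilities and binary cross-entropy to make it supervised, and generalizes integer-based numerical learning to floating-point numbers, improving the accuracy of number prediction.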

English abstract

Number prediction stands as a fundamental capability of large language models (LLMs) in mathematical problem-solving and code generation. The widely adopted maximum likelihood estimation (MLE) for LLM training is not tailored to number prediction. Recently, penalty-driven approaches, e.g., Number Token Loss and Discretized Distance Loss, introduce an inductive bias of numerical distance but induce over-sharpened and over-flattened digit distributions, respectively. In this paper, we present an in-depth analysis of LLM numerical learning and show that existing numerical learning methods conceptually follow a criterion-distance formulation, where the criterion term represents the optimization pattern and the distance term instills a geometric prior. Consequently, we present Digit Entropy Loss (DEL) for auto-regressive numerical learning, which reformulates the conventional unsupervised entropy optimization with three key designs: leveraging digit conditional probability and binary cross-entropy to turn the entropy optimization into a supervised one; discarding the distance term to bypass the issue of numerical distance; and generalizing integer-based numerical learning to floating-point number optimization, enabling more accurate number prediction. Our DEL formulation can incorporate integers, decimals, and decimal points, expanding the learning objective from a single digit to the floating-point number domain. Experiments conducted on seven mathematical reasoning benchmarks with four representative LLMs, including CodeLlama, Mistral, DeepSeek, and Qwen-2.5, demonstrate that DEL consistently outperforms its counterparts in both overall prediction accuracy and numerical distance. Source code is available at https://github.com/PolyU-VCLab/DEL
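Two ingredients named in the abstract, digit-conditional probabilities and binary cross-entropy without a distance term, can be sketched for a single next-digit position. This is one hypothetical reading, not the DEL objective itself: the entropy reformulation and the handling of decimal points are defined in the paper and are not reproduced here, and `digit_conditional_probs` and `digit_bce_loss` are illustrative names.

```python
import math

DIGITS = "0123456789"

def digit_conditional_probs(logits):
    """Renormalize next-token logits over the ten digit tokens only.
    `logits` maps token strings to logits; non-digit tokens are ignored."""
    m = max(logits[d] for d in DIGITS)
    exps = {d: math.exp(logits[d] - m) for d in DIGITS}
    total = sum(exps.values())
    return {d: e / total for d, e in exps.items()}

def digit_bce_loss(logits, target_digit, eps=1e-12):
    """One-vs-rest binary cross-entropy over digit-conditional probabilities:
    the target digit is the positive class and every other digit a negative.
    No numerical-distance term is involved."""
    probs = digit_conditional_probs(logits)
    loss = 0.0
    for d, p in probs.items():
        target = 1.0 if d == target_digit else 0.0
        loss -= target * math.log(max(p, eps)) + (1.0 - target) * math.log(max(1.0 - p, eps))
    return loss
```

Because the probabilities are conditioned on "the next token is a digit", mass assigned to non-digit tokens (such as `"+"`) does not affect the loss.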

2605.20357 2026-05-21 cs.LG cs.AI Updated version

Consistently Informative Soft-Label Temperature for Knowledge Distillation

Hoang-Chau Luong, Nghia Van Vo, Kaiqi Zhao, Lingwei Chen

Affiliations: Rochester Institute of Technology; Oakland University

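AI summary: This paper proposes CIST, which assigns separate sample-wise adaptive temperatures to the teacher and the student. It addresses the inconsistent entropy of teacher soft labels and the overly rigid teacher-student logit-scale alignment of the conventional fixed-temperature design, thereby improving knowledge distillation.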

English abstract

Knowledge distillation (KD) transfers knowledge from a high-capacity teacher to a compact student by matching their predictive distributions, with temperature scaling serving as a central mechanism for smoothing teacher predictions and exposing informative "dark knowledge" beyond the hard label. However, the standard fixed-temperature design is inherently sample-agnostic. Since samples differ in logit scale and learning difficulty, a single global temperature produces teacher soft labels with highly inconsistent entropy: some predictions remain overly sharp and provide limited inter-class information, whereas others become over-smoothed and lose class-discriminative information. Moreover, sharing the same temperature between teacher and student further imposes rigid logit-scale alignment despite their capacity mismatch. To address these limitations, we propose CIST (Consistently Informative Soft-label Temperature), which assigns separate sample-wise adaptive temperatures to the teacher and student. This design produces consistently informative teacher soft labels while relaxing rigid teacher-student logit-scale matching. It also reweights the distillation objective according to teacher confidence and student learning difficulty. Theoretically, we show that teacher-label entropy is largely governed by the ratio between the maximum teacher logit and the temperature, providing a principled basis for adaptive smoothing. Empirically, CIST mitigates the inconsistency induced by fixed temperature, and experiments on both vision and language distillation tasks show consistent improvements over standard KD and strong baselines with negligible computational overhead.
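The claim that teacher-label entropy is governed by the ratio of the maximum logit to the temperature can be checked on a toy case. In this sketch two samples share the same logit ranking but differ by a factor of 3 in scale. A single global temperature gives them very different soft-label entropies, while a hypothetical per-sample rule that fixes max(logits) / T (`ratio_temperature`, an assumption and not CIST's actual rule) gives identical entropies, since softmax depends only on logits / T. CIST's separate student temperature and its loss reweighting are not shown.

```python
import math

def softmax(logits, temperature):
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def ratio_temperature(logits, target_ratio=2.0):
    """Hypothetical sample-wise rule: choose T so that max(logits) / T == target_ratio.
    Assumes the maximum logit is positive."""
    return max(logits) / target_ratio

sample_a = [4.0, 1.0, 0.5, -1.0]
sample_b = [3.0 * z for z in sample_a]  # same ranking, 3x larger logit scale

# One global temperature for both samples.
fixed = [entropy(softmax(s, 4.0)) for s in (sample_a, sample_b)]
# Per-sample temperatures chosen by the hypothetical ratio rule.
adaptive = [entropy(softmax(s, ratio_temperature(s))) for s in (sample_a, sample_b)]
```

Real teacher logits are rarely exact rescalings of one another, so the ratio rule only equalizes entropy approximately in general, which matches the abstract's "largely governed" wording.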

2605.20355 2026-05-21 cs.RO cs.HC cs.LG Updated version

Proximal State Nudging: Reducing Skill Atrophy from AI Assistance

Megha Srivastava, Jonathan Ouyang, Eric Zhou, Andrew Silva, Emily Sumner, Dorsa Sadigh, Yuchen Cui, Deepak Gopinath, Guy Rosman

Affiliations: Stanford University; University of California Los Angeles; Toyota Research Institute

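AI summary: This paper proposes Proximal State Nudging (PSN), a shared-autonomy algorithm that nudges users toward the states estimated to be most learnable, jointly optimizing skill development and task performance to reduce skill atrophy under AI assistance.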

Comments 9 pages

English abstract

Skill atrophy, the gradual decline of human capability under AI assistance, poses a safety risk in shared-control of semi-autonomous systems, where operators may be unable to distinguish their own inputs from autonomous corrections. We propose Proximal State Nudging (PSN), a shared autonomy algorithm that jointly optimizes for skill development and task performance by nudging users toward states estimated to be most learnable. We first show that PSN outperforms existing shared autonomy baselines in balancing student improvement in unassisted reward with overall shared performance, using simulated students in the classic LunarLander environment. We then present, to the best of our knowledge, the first human subject studies of a planner incorporating learning-compatible shared autonomy: across two driving tasks in the CARLA simulator (High Performance Racing and Parallel Parking, n = 60), PSN produces up to 7x larger gains in unassisted skill than standard blended shared autonomy, while incurring 50% fewer collisions than unassisted self-practice.

2605.20345 2026-05-21 stat.ML cs.LG Updated version

Corrected Integrated Laplace Approximation for Bayesian Inference in Latent Gaussian Models

Jinlin Lai, Charles C. Margossian, Daniel R. Sheldon

Affiliations: Manning College of Information and Computer Sciences, University of Massachusetts Amherst; Department of Statistics, University of British Columbia

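AI summary: This paper proposes an importance-sampling scheme that corrects the error introduced by the integrated Laplace approximation (ILA) in latent Gaussian models (LGMs). Increasing the number of importance samples makes the approximate posterior converge to the correct posterior, and the method is implemented in an automatic-differentiation framework to support gradient-based algorithms for hyperparameter inference, in particular Hamiltonian Monte Carlo.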

English abstract

Latent Gaussian models (LGMs) are a popular class of Bayesian hierarchical models that include Gaussian processes, as well as certain spatial models and mixed-effect models. Efficient Bayesian inference of LGMs often requires marginalizing out the latent variables. For LGMs with a non-Gaussian likelihood, exact marginalization is not possible and a popular approach is to do approximate marginalization with an integrated Laplace approximation (ILA). Using ILA produces an approximate posterior which, in some settings, can differ significantly from the correct posterior, which impacts downstream applications. We propose an importance sampling scheme to correct the error introduced by ILA. By increasing the number of samples in importance sampling, the posterior with ILA converges to the correct posterior. This idea is realized with various techniques, including pseudo-marginalization, quasi-Monte Carlo and randomized quasi-Monte Carlo. We implement our methods in an automatic differentiation framework to support gradient-based algorithms when doing inference on the hyperparameters. For the latter, we specifically consider the use of Hamiltonian Monte Carlo. We demonstrate the benefits of reduced error in various applied models.
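The correction can be illustrated on a one-dimensional toy LGM with fixed hyperparameters: x ~ N(0, 1) and a single observation y = 1 with y | x ~ Bernoulli(sigmoid(x)). The exact marginal likelihood p(y = 1) is 0.5 by symmetry, because sigmoid(x) + sigmoid(-x) = 1. The sketch below is not the paper's implementation (it omits pseudo-marginal MCMC, quasi-Monte Carlo, and hyperparameter inference); it compares the Laplace estimate with an importance-sampling estimate that uses the Laplace Gaussian as its proposal, and all function names are hypothetical.

```python
import math
import random

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def log_joint(x):
    """log p(y = 1, x) for the toy LGM x ~ N(0, 1), y | x ~ Bernoulli(sigmoid(x))."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi) + log_sigmoid(x)

def laplace_fit(n_newton=50):
    """Mode m and negative Hessian h of log_joint, found with Newton's method."""
    m = 0.0
    for _ in range(n_newton):
        s = 1.0 / (1.0 + math.exp(-m))
        grad = -m + (1.0 - s)
        hess = -1.0 - s * (1.0 - s)
        m -= grad / hess
    s = 1.0 / (1.0 + math.exp(-m))
    return m, 1.0 + s * (1.0 - s)

def laplace_marginal():
    """Laplace estimate of p(y = 1), the integral of exp(log_joint(x)) over x."""
    m, h = laplace_fit()
    return math.exp(log_joint(m)) * math.sqrt(2.0 * math.pi / h)

def importance_corrected_marginal(n_samples, seed=0):
    """Unbiased importance-sampling estimate with the Laplace Gaussian N(m, 1/h) as proposal."""
    rng = random.Random(seed)
    m, h = laplace_fit()
    sd = 1.0 / math.sqrt(h)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(m, sd)
        log_q = -0.5 * ((x - m) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))
        total += math.exp(log_joint(x) - log_q)
    return total / n_samples
```

The Laplace estimate keeps a fixed bias (a few thousandths in this toy case), whereas the importance-sampling error shrinks as `n_samples` grows. In a pseudo-marginal scheme, an unbiased estimate like this one replaces the exact marginal inside the MCMC acceptance ratio.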

2605.20314 2026-05-21 cs.LG cs.AI Updated version

Less Data, Faster Training: repeating smaller datasets speeds up learning via sampling biases

Jingwen Liu, Ezra Edelman, Surbhi Goel, Bingbin Liu

Affiliations: Columbia University; University of Pennsylvania; Harvard University

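AI summary: This work studies the "small-data vs. large-data gap": repeatedly training on fewer samples can be more compute-efficient than training on a larger dataset. The speedup arises from layer-wise growth and sampling-bias mechanisms, which provide a new inductive bias for optimization.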

Comments ICML 2026

2605.20311 2026-05-21 cs.LG Updated version

WaveGraphNet: Physics-Consistent Guided-Wave Damage Localization through Coupled Inverse-Forward Graph Learning

Vinay Sharma, Aditya Bharade, Olga Fink

Affiliations: EPFL, Intelligent Maintenance and Operations Systems; EPFL, Intelligent Maintenance

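AI summary: This paper proposes WaveGraphNet, a coupled inverse-forward graph learning framework for guided-wave damage localization in carbon-fiber-reinforced polymer plates. It models the sensing layout as a graph whose connectivity represents the measured propagation paths, and combines an inverse branch with a forward branch to make damage localization more robust.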

English abstract

Guided-wave structural health monitoring enables damage localization in composite plates using sparse networks of bonded piezoelectric transducers. However, inferring the spatial location of defects from pitch-catch measurements remains weakly constrained when only a limited set of damage locations is available for training. As a result, models trained to predict defect locations may perform well on seen cases but generalize poorly to unseen regions of the structure. This paper proposes WaveGraphNet, a coupled inverse--forward graph learning framework for guided-wave damage localization in Carbon Fiber Reinforced Polymer (CFRP) plates. The sensing layout is explicitly modeled as a graph, where transducers are represented as nodes and measured propagation paths define the graph connectivity. An inverse branch maps graph-structured spectral descriptors of differential guided-wave responses to a damage location, while a forward branch predicts the path-wise energy-deviation patterns of measured wave responses associated with a candidate location. During training, the forward branch serves as a physics-consistent regularizer, discouraging location estimates that are numerically plausible but inconsistent with the measured redistribution of wave-response energy. This coupling encourages agreement between inferred damage coordinates and the underlying wave propagation behavior. Within this benchmark, the proposed graph-based formulation provides a strong localization model for sparse guided-wave sensing and demonstrates improved robustness in extrapolation to held-out regions compared to both non-graph and graph baselines. These results highlight the potential of coupled inverse-forward graph learning as an effective strategy for guided-wave localization under limited spatial coverage.
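The graph construction and the role of a forward model can be illustrated with a toy straight-ray proxy. In this sketch the transducers are nodes, every pitch-catch pair is an edge, and a hypothetical forward model says that a path's energy deviation decays with the distance between the damage and the line joining its two transducers. WaveGraphNet instead learns both branches from spectral descriptors, so `forward_energy_deviation` and `localize` are illustrative assumptions; the grid search stands in for the inverse branch and shows how a forward model rejects locations inconsistent with the measured path pattern.

```python
import itertools
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b (2-D tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    t = 0.0 if length_sq == 0 else ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def build_sensor_graph(n_transducers):
    """Nodes are transducers; every unordered actuator-sensor pair is an edge (path)."""
    return list(itertools.combinations(range(n_transducers), 2))

def forward_energy_deviation(transducers, edges, damage, width=0.1):
    """Toy forward model: deviation decays with the damage-to-path distance."""
    return {
        (i, j): math.exp(-(point_segment_distance(damage, transducers[i], transducers[j]) / width) ** 2)
        for i, j in edges
    }

def localize(measured, transducers, edges, candidates):
    """Pick the candidate whose predicted path pattern best matches the measurement."""
    def mismatch(candidate):
        predicted = forward_energy_deviation(transducers, edges, candidate)
        return sum((predicted[e] - measured[e]) ** 2 for e in edges)
    return min(candidates, key=mismatch)
```

With four transducers at the corners of a unit plate and damage at the center, only the two diagonal paths are strongly perturbed, and the grid search recovers the center.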

2605.20308 2026-05-21 cs.CV cs.AI cs.LG Updated version

SDM: A Powerful Tool for Evaluating Model Robustness

Xinlei Liu, Tao Hu, Jichao Xie, Peng Yi, Hailong Ma, Baolin Li

Affiliations: Information Engineering University, Zhengzhou, China; Key Laboratory of Cyberspace Endogenous Safety & Security of Henan Province, Zhengzhou, China; Key Laboratory of Cyberspace Security, Ministry of Education of China, Zhengzhou, China; Songshan Laboratory, Zhengzhou, China

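AI summary: This paper proposes SDM, a new gradient-based attack that redefines the objective of adversarial example generation to address the performance degradation caused by "high-loss non-adversarial examples" in previous methods, and experimentally demonstrates its advantages in attack performance and cost-effectiveness.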

Comments 16 pages

Journal ref: Forty-third International Conference on Machine Learning (ICML 2026)

English abstract

Gradient-based attacks are important methods for evaluating model robustness. However, since the proposal of APGD, it has been difficult for such methods to achieve significant breakthroughs. To achieve such a breakthrough, we first analyze the issue of "high-loss non-adversarial examples" that degrades attack performance in previous methods, and prove that this issue arises from inappropriate objectives for adversarial example generation. Subsequently, we reconstruct the objective as "maximizing the difference between the non-ground-truth label probability upper bound and the ground-truth label probability", and propose a novel and powerful gradient-based attack method named Sequential Difference Maximization (SDM). SDM establishes a three-layer optimization framework of "cycle-stage-step". It adopts the negative probability loss function and the Directional Probability Difference Ratio (DPDR) loss function in the initial and subsequent optimization stages, respectively, and approaches the ideal objective of adversarial example generation via stage-wise sequential optimization. Experiments demonstrate that, compared with previous state-of-the-art methods, SDM not only achieves stronger attack performance but also exhibits superior cost-effectiveness. The code is available at https://github.com/X-L-Liu/ICML-SDM.
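The "high-loss non-adversarial example" issue and the reconstructed objective can be shown with plain softmax arithmetic. In this sketch a flat but correctly classified prediction has a higher cross-entropy than a misclassified one, while the probability margin max over j != y of p_j, minus p_y, is positive exactly when the example is adversarial. The negative probability loss follows the abstract's description; the DPDR loss and the cycle-stage-step schedule are not reproduced, and all function names are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    return -math.log(softmax(logits)[label])

def probability_margin(logits, label):
    """max_{j != y} p_j - p_y: positive iff some wrong class beats the true one."""
    probs = softmax(logits)
    best_other = max(p for j, p in enumerate(probs) if j != label)
    return best_other - probs[label]

def negative_probability_loss(logits, label):
    """-p_y; an attacker maximizes it, i.e. drives p_y down."""
    return -softmax(logits)[label]

# Still classified correctly (class 0 wins), yet with high cross-entropy:
flat_correct = [1.0] + [0.9] * 9
# Misclassified (class 1 wins), yet with lower cross-entropy:
misclassified = [2.0, 2.1] + [0.0] * 8
```

Maximizing cross-entropy alone can therefore stall on examples like `flat_correct`, whereas the margin objective only becomes positive once the label actually flips.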

2605.20300 2026-05-21 cs.LG cs.AI Updated version

Spectral Forgetting: Post-hoc Capability Recovery Without Retraining

Aarash Abro, Muhammad Tahir

Affiliations: Zeta Labs; Lahore University of Management Sciences

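AI summary: This work studies how fine-tuning a language model on a target task degrades capabilities that the training data never explicitly threatened, and proposes a post-hoc repair method that uses only the pretrained and fine-tuned checkpoints, applying spectral repair to recover the damaged capabilities while preserving target-task gains.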

English abstract

Fine-tuning a language model for a target task routinely degrades capabilities the training data never explicitly threatened. We study this phenomenon, known as catastrophic forgetting, and propose a post-hoc repair solution that uses only the pretrained checkpoint $W_{\mathrm{base}}$ and its fine-tuned descendant $W_{\mathrm{ft}}$. The goal is not merely to revert the model toward the base checkpoint, but to recover capabilities damaged by fine-tuning while preserving both the target-task gains and any beneficial held-out improvements. We introduce DG-Hard, a checkpoint-only spectral repair method for the fine-tuning update $\Delta = W_{\mathrm{ft}} - W_{\mathrm{base}}$. DG-Hard treats $\Delta$ as a low-rank task-aligned signal embedded in an IID-like noise residual that gradient descent has no incentive to remove, and applies the Donoho-Gavish hard singular-value threshold to each weight-delta matrix, keeping the structured high-energy part of the update and removing the spectral bulk. This reduces repair to a closed-form SVD filtering step requiring no data-dependent tuning. A central difficulty is evaluation: average accuracy hides per-benchmark failures, while naive recovery scores reward models that simply revert toward the base. We therefore introduce a partition-conditional metric that separately tracks healing, preservation, non-damage, and target-task retention. Across $14$ (model, task) settings and nine cross-domain held-out benchmarks, DG-Hard achieves the strongest balanced repair among post-hoc baselines. DG-Hard also restores safety alignment degraded by benign fine-tuning on three independent safety axes, despite using no alignment data. These results suggest that part of fine-tuning-induced capability loss is not an unavoidable consequence of specialization, but a removable spectral residue in the weight update itself.
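The thresholding step can be sketched on the singular values of one weight-delta matrix, assuming they have already been computed (for example with `numpy.linalg.svd`). The coefficient uses the polynomial approximation ω(β) that Gavish and Donoho give for an unknown noise level, applied to the median singular value; whether DG-Hard uses exactly this variant is an assumption, and the function names are hypothetical. The repaired update would then be rebuilt as U diag(filtered) V^T from the kept components.

```python
import statistics

def omega(beta):
    """Gavish-Donoho polynomial approximation of the optimal hard-threshold
    coefficient for unknown noise; beta = min(m, n) / max(m, n)."""
    return 0.56 * beta ** 3 - 0.95 * beta ** 2 + 1.82 * beta + 1.43

def hard_threshold(singular_values, n_rows, n_cols):
    """Zero out singular values below tau = omega(beta) * median(singular values)."""
    beta = min(n_rows, n_cols) / max(n_rows, n_cols)
    tau = omega(beta) * statistics.median(singular_values)
    filtered = [s if s > tau else 0.0 for s in singular_values]
    return filtered, tau

# Two strong task directions on top of a noise-like bulk, for a 10 x 10 delta.
spectrum = [50.0, 30.0, 1.3, 1.2, 1.1, 1.0, 0.9, 0.8, 0.7, 0.6]
filtered, tau = hard_threshold(spectrum, 10, 10)
```

Because the threshold is tied to the median of the spectrum, it adapts to each matrix without any held-out data, which is the "no data-dependent tuning" property the abstract describes.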

2605.20295 2026-05-21 cs.LG cs.AI Updated version

Quant.npu: Enabling Efficient Mobile NPU Inference for on-device LLMs via Fully Static Quantization

Jinghe Zhang, Daliang Xu, Chenghua Wang, Weikai Xie, Tao Qi, Yun Ma, Mengwei Xu, Gang Huang

Affiliations: Qualcomm

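AI summary: This paper proposes Quant.npu, a fully static quantization framework for efficient mobile NPU inference. It resolves the incompatibility of conventional post-training quantization methods with NPU hardware constraints and achieves high accuracy with lower inference latency on real mobile NPUs.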

English abstract

Large language models (LLMs) are increasingly deployed on mobile devices, where Neural Processing Units (NPUs) necessitate fully static quantization for optimal inference efficiency. However, existing post-training quantization (PTQ) methods predominantly rely on dynamic activation quantization, rendering them incompatible with NPU hardware constraints. To bridge the gap between high-fidelity PTQ and NPU-constrained inference, we propose Quant.npu, an integer-only fully static quantization framework. It incorporates learnable quantization parameters and rotation matrices, enabling low-bit activation-weight quantization without recomputing quantization parameters at runtime. Crucially, we identify that initialization and selective optimization of quantization parameters are pivotal for optimization stability, as improper initialization and naive joint optimization induce gradient instability that disrupts the optimization of rotation matrices. To address this, we propose a rotation-and-bit-width-aware initialization tailored to diverse activation profiles and a distribution-aware selective optimization (two-stage quantization pipeline) tailored to rotated and unrotated tensors. Furthermore, we introduce a sensitivity-guided adaptive mixed-precision scheme to balance accuracy with inference efficiency. Extensive experiments on real-world mobile NPUs demonstrate that Quant.npu achieves comparable accuracy to state-of-the-art methods, while reducing inference latency by up to 15.1%.
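The difference between static and dynamic activation quantization can be sketched with generic symmetric per-tensor int8 quantization. The scale is calibrated once offline and then reused unchanged at runtime, which is the property the abstract says NPUs require. This is a baseline illustration, not Quant.npu (no learnable parameters, rotations, or mixed precision), and the function names are hypothetical.

```python
def calibrate_static_scale(calibration_batches, n_bits=8):
    """Offline: one symmetric per-tensor scale from calibration activations."""
    qmax = 2 ** (n_bits - 1) - 1
    max_abs = max(abs(v) for batch in calibration_batches for v in batch)
    return max_abs / qmax

def quantize(values, scale, n_bits=8):
    """Round to the integer grid and clip; the scale is a fixed constant at runtime."""
    qmin, qmax = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    return [max(qmin, min(qmax, round(v / scale))) for v in values]

def dequantize(q_values, scale):
    return [q * scale for q in q_values]

def dynamic_scale(values, n_bits=8):
    """What dynamic activation quantization does instead: a new scale for every input."""
    return max(abs(v) for v in values) / (2 ** (n_bits - 1) - 1)
```

Values outside the calibrated range are clipped (with a scale of about 0.01, an activation of 2.0 saturates at 127), which is one reason fully static quantization is harder to make accurate than dynamic quantization.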

2605.20293 2026-05-21 cs.LG cs.AI cs.NE Updated version

FusionCell: Cross-Attention Fusion of Layout Geometry and Netlist Topology for Standard-Cell Performance Prediction

Haoyi Zhang, Kairong Guo, Bojie Zhang, Yibo Lin, Runsheng Wang

Affiliations: School of Integrated Circuits, Peking University, Beijing, China

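AI summary: This paper proposes FusionCell, which fuses layout geometry and netlist topology through cross-attention to improve the accuracy of standard-cell performance prediction, addressing the coupling and layout-dependent effects missed by conventional methods that ignore layout geometry.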

English abstract

Standard cells form the building blocks of digital circuits, so their delay and power critically influence chip-level performance; yet characterization still relies on slow simulation sweeps, and many fast predictors ignore layout geometry, missing coupling and layout-dependent effects. The challenge is to jointly represent layout geometry and netlist topology so models capture fine-grained spatial details together with structural connectivity for accurate performance prediction. We introduce FusionCell, a dual-modality predictor that treats routed layout geometry and netlist topology as inputs and fuses them explicitly in a unified model. A DeiT encoder processes three-layer routed layouts, while a graph transformer models heterogeneous device/net graphs. The modalities are integrated through a topology-guided mechanism, where the netlist acts as a structural "map" to actively query relevant physical regions in the layout for joint geometric and topological reasoning. We build a 7nm dataset based on the ASAP7 PDK with over 19.5k cells spanning 149 types using automatic tools, targeting six metrics: signal rise/fall delay, transition, and power. Experimental results demonstrate that FusionCell reduces regression error, with an average MAPE of 0.92 percent, and improves Spearman/Kendall ranking over baselines, while accelerating the characterization process by orders of magnitude compared to circuit simulation.
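The topology-guided fusion can be sketched as single-head cross-attention in which netlist-node embeddings act as queries and layout-patch embeddings act as keys and values. The embeddings below are hand-made toy vectors standing in for the graph-transformer and DeiT outputs, and `cross_attention` is an illustrative name; FusionCell's layer sizes, number of heads, and learned projections are not given in the abstract.

```python
import math

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.
    queries: netlist-node embeddings; keys/values: layout-patch embeddings.
    Returns one fused vector and one weight distribution per query."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]  # where this node "looks" in the layout
        fused = [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]
        outputs.append(fused)
        all_weights.append(weights)
    return outputs, all_weights
```

A query that aligns with one patch key concentrates almost all of its weight on that patch, which is the sense in which the netlist "queries relevant physical regions" of the layout; an uninformative query averages over all patches.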

2605.20286 2026-05-21 cs.CR cs.LG Updated version

Adaptive Probe-based Steering for Robust LLM Jailbreaking

Junxi Chen, Junhao Dong, Xiaohua Xie

Affiliations: School of Computer Science and Engineering, Sun Yat-Sen University, China; Nanyang Technological University, Singapore

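AI summary: This paper proposes an adaptive probe-based steering method built on model extraction that dynamically adjusts the steering strength to improve the robustness and effectiveness of LLM jailbreaking, requiring no extra contrastive prompts or manual tuning and substantially improving attack performance.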

Comments 19 pages, 13 figures, accepted by ICML 2026

2605.20285 2026-05-21 cs.LG cs.AI Updated version

Modality-Decoupled Online Recursive Editing

Siyuan Li, Youyuan Zhang, Fangming Liu, Jing Li

Affiliations: Harbin Institute of Technology, Shenzhen, China; Peng Cheng Laboratory, China; Huazhong University of Science and Technology, China

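AI summary: This paper proposes M-ORE, a modality-decoupled online recursive editor for lifelong adaptation of multimodal large language models. A unified proximal-projection formulation with a Sherman-Morrison recursion gives constant per-edit overhead; by maintaining module-wise locality statistics and a fixed orthogonal low-rank edit subspace, it reduces long-horizon interference and improves reliability, generality, and locality.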

English abstract

Online model editing for multimodal large language models (MLLMs) requires assimilating a stream of corrections under tight compute and memory budgets. Yet editors developed for text-only LLMs often degrade on MLLMs: visually dominant activations skew the statistics that shape updates, causing cross-modal conflict, while sequential writes become entangled in a shared edit space and amplify long-horizon interference, causing inter-edit interference. To address these, we propose M-ORE, a modality-decoupled online recursive editor for lifelong MLLM adaptation. M-ORE is derived from a unified proximal-projection formulation and admits a closed-form update with a Sherman-Morrison recursion, yielding constant per-edit overhead. It maintains module-wise locality statistics for the text stack and the visual projector to avoid visually dominated update shaping and performs continual updates in a fixed orthogonal low-rank edit subspace via a Sherman-Morrison recursion to mitigate long-horizon interference. Experiments on multiple MLLM backbones and online editing benchmarks show that our M-ORE method consistently improves reliability, generality, and locality over strong baselines, while achieving favorable quality-efficiency scaling. Our code is publicly available at https://github.com/lab-klc/M-ORE.
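The constant per-edit cost comes from the Sherman-Morrison identity (A + u v^T)^{-1} = A^{-1} - A^{-1} u v^T A^{-1} / (1 + v^T A^{-1} u), which updates a stored inverse in O(d^2) time instead of re-inverting in O(d^3). The sketch keeps a hypothetical per-module key statistic C = λI + Σ k k^T and its inverse; M-ORE's proximal-projection update and its fixed low-rank subspace are not reproduced, and all names are illustrative. Keeping one such object per module (for example, one for the text stack and one for the visual projector) mirrors the module-wise statistics described in the abstract.

```python
def sherman_morrison_update(a_inv, u, v):
    """Return (A + u v^T)^{-1} from A^{-1} in O(d^2) operations."""
    d = len(u)
    a_inv_u = [sum(a_inv[i][k] * u[k] for k in range(d)) for i in range(d)]  # A^{-1} u
    v_a_inv = [sum(v[k] * a_inv[k][j] for k in range(d)) for j in range(d)]  # v^T A^{-1}
    denom = 1.0 + sum(v[i] * a_inv_u[i] for i in range(d))
    return [[a_inv[i][j] - a_inv_u[i] * v_a_inv[j] / denom for j in range(d)]
            for i in range(d)]

class RecursiveKeyStatistic:
    """Hypothetical per-module statistic C = lam * I + sum of k k^T over edit keys.
    Only the inverse is needed for updates; C itself is kept here for checking."""

    def __init__(self, dim, lam=1.0):
        self.c = [[lam if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.c_inv = [[1.0 / lam if i == j else 0.0 for j in range(dim)] for i in range(dim)]

    def add_edit_key(self, key):
        d = len(key)
        for i in range(d):
            for j in range(d):
                self.c[i][j] += key[i] * key[j]
        self.c_inv = sherman_morrison_update(self.c_inv, key, key)
```

Since C stays positive definite, the denominator 1 + k^T C^{-1} k is always at least 1, so the recursion is numerically safe for a stream of edits.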

2605.20272 2026-05-21 cs.LG cs.AI Updated version

Smaller Abstract State Spaces Enable Cross-Scale Generalization in Reinforcement Learning

Nasehatul Mustakim, Lucas Lehnert

Affiliations: Department of Computer Science, University of Saskatchewan, Saskatoon, Saskatchewan, Canada

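AI summary: This paper proposes a theoretical model that extends the state-abstraction framework to POMDPs and defines a successor-weighted model reduction, enabling cross-scale generalization in reinforcement-learning agents, and analyzes how the size of the abstract state space affects generalization.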

English abstract

While humans readily generalize abstract concepts to more complex or larger tasks, building Reinforcement Learning (RL) systems with this ability remains elusive. Here, we present the first theoretical model of how such Out-of-Distribution (OOD) generalization can be achieved in RL agents. Our approach considers Partially Observable Markov Decision Processes (POMDPs) and assumes that an intelligent agent uses an abstraction function to determine which experiences can be treated as equivalent and which must be distinguished. First, we extend the existing state abstraction framework and proof techniques to POMDPs. Then, we define a successor-weighted model reduction, a model reduction variant that enables compression into smaller abstract spaces than prior definitions allow. We derive a bound on the agent's OOD test performance, thereby defining the conditions under which OOD generalization is achievable. This bound decomposes an agent's performance loss into approximation and estimation errors, revealing how reducing an agent's abstract state space size improves test performance and OOD generalization. Our analysis suggests that constraining an agent to operate over a small, finite set of abstract states is necessary for achieving generalization to more complex tasks. Our results motivate further research into learning RL architectures that scale across tasks of varying complexity levels.
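The intuition that a small abstract state space can transfer across task sizes can be illustrated with a corridor task. In this sketch a hypothetical abstraction maps every position to one of three abstract states (left of, right of, or at the goal), so a policy defined on those three states works unchanged for corridors of any length. It only illustrates the idea; it is not the paper's successor-weighted model reduction or its bound, and all names are assumptions.

```python
def abstract_state(position, goal):
    """Hypothetical abstraction: keep only the direction to the goal."""
    if position < goal:
        return "left_of_goal"
    if position > goal:
        return "right_of_goal"
    return "at_goal"

# A policy over three abstract states, e.g. learned on a short corridor.
ABSTRACT_POLICY = {"left_of_goal": +1, "right_of_goal": -1, "at_goal": 0}

def run_episode(length, start, goal, max_steps=10_000):
    """Act in a corridor of any length using only the abstract policy."""
    position, steps = start, 0
    while abstract_state(position, goal) != "at_goal" and steps < max_steps:
        move = ABSTRACT_POLICY[abstract_state(position, goal)]
        position = min(length - 1, max(0, position + move))
        steps += 1
    return position == goal, steps
```

A tabular policy over raw positions would need a new entry for every cell of a longer corridor, whereas the three-state policy needs nothing new, which is the kind of size-independence the abstract's bound formalizes.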

2605.20271 2026-05-21 stat.ML cs.LG Updated version

Multi-Head Attention as Ensemble Nadaraya-Watson Estimation: Variance Reduction, Decorrelation, and Optimal Head Diversity

Ernest Fokoué

Affiliations: School of Mathematics and Statistics, College of Science

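AI summary: This paper shows that multi-head attention can be viewed as an ensemble of Nadaraya-Watson kernel regression estimators. By analyzing the decorrelation of head outputs, it derives the relationship between variance reduction and head diversity, introduces a Head Diversity Index to measure inter-head decorrelation, and derives the optimal allocation of the number of heads and the per-head dimension.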

Comments 14 pages

English abstract

We develop a rigorous statistical theory of multi-head attention (MHA) as an ensemble of Nadaraya-Watson (NW) kernel regression estimators. Building on the algebraic identity between single-head softmax attention and the NW estimator, we prove that MHA is a structured ensemble of H NW estimators, each operating in a distinct learned projection subspace of the key space. We derive an explicit Bias-Variance-Covariance decomposition of the MHA mean squared error, showing that variance reduction depends not merely on the number of heads H but fundamentally on the decorrelation of head outputs. Decorrelation is governed by the principal angles between learned projection subspaces: orthogonal projections yield maximum variance reduction; aligned projections yield none. We introduce the Head Diversity Index (HDI), a computable spectral measure of inter-head decorrelation, and prove that MHA mean squared error is monotonically decreasing in HDI. This provides the first rigorous theoretical explanation for the empirically observed specialization of attention heads. Under a fixed total-dimension budget D = H * d_k, we solve the optimal head-dimension allocation problem, deriving the MSE-minimizing pair (H*, d_k*) from the data distribution and regression smoothness. The solution yields a new architectural scaling law: the optimal per-head dimension grows logarithmically with training set size, while the optimal number of heads grows nearly linearly with the total budget D. Our framework unifies three strands of prior work: the NW theory of single-head attention, the general weighting theory for ensemble learning, and the decorrelation-variance-reduction isomorphism between biological and computational ensembles. Multi-head attention is the Transformer's instantiation of a universal principle: identical agents plus diversity-enforcing mechanisms yield emergent optimality.
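The algebraic identity behind the framework can be checked numerically: softmax attention with scale 1/sqrt(d) equals a Nadaraya-Watson estimate with kernel exp(q·k / h) and bandwidth h = sqrt(d). The sketch also includes the textbook variance of an average of H estimators with common variance σ² and pairwise correlation ρ, namely σ²/H + (1 - 1/H)ρσ², which shows why decorrelated heads reduce variance and perfectly aligned heads do not. Scalar values and the function names are assumptions for brevity; the Head Diversity Index is not implemented.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def nw_estimate(query, keys, values, bandwidth):
    """Nadaraya-Watson regression with exponential kernel K(q, k) = exp(q . k / h)."""
    weights = [math.exp(dot(query, k) / bandwidth) for k in keys]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def softmax_attention(query, keys, values):
    """Single-head scaled dot-product attention for one query and scalar values."""
    scores = [dot(query, k) / math.sqrt(len(query)) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    return sum(e * v for e, v in zip(exps, values)) / sum(exps)

def ensemble_variance(sigma2, rho, n_heads):
    """Variance of the mean of n_heads estimators with common variance sigma2
    and common pairwise correlation rho."""
    return sigma2 / n_heads + (1.0 - 1.0 / n_heads) * rho * sigma2
```

With eight heads of unit variance, uncorrelated heads give variance 1/8, while fully correlated heads leave it at 1, matching the abstract's "orthogonal projections yield maximum variance reduction; aligned projections yield none".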

2605.20270 2026-05-21 cs.LG cs.AI stat.ML Updated

Conformal Selective Acting: Anytime-Valid Risk Control for RLVR-Trained LLMs

Hamed Khosravi, Xiaoming Huo

Affiliations: Georgia Institute of Technology

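AI summary: This work proposes Conformal Selective Acting, a method for anytime-valid risk control when deploying RLVR-trained LLMs. It fills an empty cell forced by deployment requirements, uses e-processes on a Bonferroni grid to maintain pathwise validity, and demonstrates its effectiveness on multiple benchmarks.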

Abstract

A local specialist LLM, fine-tuned with reinforcement learning from verifiable rewards (RLVR) on operator-local data, is installed in a regulated organization with per-deployment error budget $α$. The operator needs a safety certificate for this deployment's stream at every round: no pooling across deployments, no waiting for a long-run average. Existing wrappers cannot deliver this on adaptive, online-updated streams: offline conformal-risk methods require exchangeability; online-conformal methods bound only long-run averages; non-exchangeable extensions are marginally valid; and the closest anytime wrapper, A-RCPS, controls marginal rather than selective risk. Using a (test statistic, validity guarantee, deployment rule) framework, we identify one empty cell forced by deployment requirements: e-process per threshold, selective risk, anytime-pathwise validity, max-certified-threshold rule. Conformal Selective Acting (CSA) fills it as a per-round wrapper maintaining a Ville-type e-process per threshold on a Bonferroni grid, evaluated against the RLVR filtration. Under predictable updates and isotonic-calibrated monotone risk we prove (i) an anytime-pathwise selective-risk bound $R_T^{\mathrm{act}}\leα+O(N_T^{-1/2})$, (ii) rate-optimal certification matching $Θ(\barη^{-2}\log(1/δ))$, and (iii) a horizon-independent release-rate gap. Across eight specialist benchmarks ($480$ streams), sixteen adversarial distribution-shift cells ($160$ streams), and five live Expert-Iteration RLVR cells with online LoRA over four base models in three architecture families ($10{,}300$ rounds), CSA is the only method among ten compared that satisfies pathwise validity and non-refusing deployment on every cell. We do not propose a new LLM, training algorithm, or policy class; CSA is the deployment-side complement, orthogonal to the model, for operators who cannot use a frontier API.
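
To make the e-process idea concrete, here is a minimal testing-by-betting sketch; it is not the CSA algorithm itself. For each confidence threshold τ on a grid of m values, the null hypothesis is that the selective risk (the error rate among rounds where the score is at least τ) is at least α. A fixed bet λ in [0, 1/(1-α)] gives the wealth process E = Π(1 + λ(α - L_i)), a nonnegative supermartingale under that null, so by Ville's inequality certifying τ once E ≥ m/δ keeps the false-certification probability at δ/m per threshold and at δ over the whole grid (Bonferroni). The fixed bet, the simulated stream, and the rule of acting at the lowest certified threshold are assumptions for illustration; the paper's max-certified-threshold rule and its isotonic calibration may differ.

```python
import random

def certify_thresholds(stream, grid, alpha, delta, bet_fraction=0.5):
    """Certify confidence thresholds whose selective risk is at most alpha.

    stream: iterable of (score, error) pairs with error in {0, 1}.
    grid: candidate thresholds; the model acts at threshold tau when score >= tau.
    Returns {tau: round at which tau was first certified}.
    """
    m = len(grid)
    lam = bet_fraction / (1.0 - alpha)  # fixed bet, must lie in [0, 1/(1-alpha)]
    wealth = {tau: 1.0 for tau in grid}
    certified = {}
    for t, (score, error) in enumerate(stream, start=1):
        for tau in grid:
            if tau in certified or score < tau:
                continue  # only rounds where the model would act at tau update its test
            # If the selective risk is >= alpha, each factor has conditional mean <= 1,
            # so the wealth is a nonnegative supermartingale (an e-process).
            wealth[tau] *= 1.0 + lam * (alpha - error)
            if wealth[tau] >= m / delta:  # Ville's inequality + Bonferroni over the grid
                certified[tau] = t
    return certified

def deploy_threshold(certified):
    """Hypothetical rule: act at the lowest (most permissive) certified threshold."""
    return min(certified) if certified else None  # None means abstain on every input

rng = random.Random(0)
stream = []
for _ in range(5000):
    score = rng.random()
    error = 1 if rng.random() < 0.3 * (1.0 - score) else 0  # errors rarer at high scores
    stream.append((score, error))
cert = certify_thresholds(stream, grid=[0.2, 0.4, 0.6, 0.8], alpha=0.1, delta=0.05)
print(cert, deploy_threshold(cert))
```

A fixed bet keeps the sketch short, but wealth grows slowly, or not at all, when the true risk sits only slightly below α; predictable bets chosen from past rounds are the usual refinement.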

2605.20269 2026-05-21 cs.LG cs.AI stat.ML Updated

Catching a Moving Subspace: Low-Rank Bandits Beyond Stationarity

Hamed Khosravi, Xiaoming Huo

Affiliations: H. Milton Stewart School of Industrial and Systems Engineering; Georgia Institute of Technology

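AI summary: This paper studies low-rank linear contextual bandits under subspace drift and proposes a new algorithm, SPSC, that tracks subspace changes while attaining a rank-based dynamic-regret rate.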

Abstract

Many bandit deployments (recommendation, clinical dosing, ad targeting) share two facts prior work handles only in isolation: rewards live on a low-dimensional latent subspace, and that subspace drifts. Stationary low-rank bandits exploit rank but break under subspace change; non-stationary linear bandits adapt to drift but pay ambient rate $\widetilde{O}(d\sqrt{T})$. We study piecewise-stationary low-rank linear contextual bandits with scalar feedback: $θ_t = B_k^\star w_t$ with rank-$r$ factor $B_k^\star\in\mathbb{R}^{d\times r}$ constant within each of $K$ unknown segments and able to shift at boundaries. Our results are tight along three axes. (i) Identification boundary. With single-play scalar rewards, the moving subspace is recoverable through quadratic functionals of rewards iff three probe-side conditions hold: known noise variance, bounded state-noise coupling, and full-dimensional probe support. Each is necessary in the unrestricted-second-moment problem, and jointly they are sufficient, characterizing the boundary of the solvable region. (ii) Algorithm and dynamic regret. SPSC interleaves isotropic probes with windowed projected ridge-UCB exploitation inside the learned $r$-dimensional subspace; a CUSUM-style variant discovers segment boundaries online. The costed dynamic regret is $\widetilde{O}(r\sqrt{T})+\widetilde{O}(T^{2/3})+O(W\,V_{\mathrm{in}})$, replacing the ambient $d\sqrt{T}$ rate with the intrinsic rank. (iii) Empirics. On eleven benchmarks spanning synthetic, UCI/MovieLens, semi-synthetic clinical, and ZOZOTOWN production-log data, SPSC outperforms non-stationary and low-rank baselines whenever $d-r\gtrsim T^{1/6}$, matching the analytical crossover. To our knowledge, this is the first work to characterize the identification boundary and attain the intrinsic-rank dynamic-regret rate in this setting.
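
The CUSUM-style variant mentioned above builds on a classic change-point detector. As a generic illustration, not SPSC's actual statistic (which the abstract does not specify), the sketch below runs a two-sided CUSUM on a scalar monitoring signal, for example a per-round reward residual, and reports the rounds at which a segment boundary would be declared. The drift allowance k, threshold h, and the restart-after-alarm policy are assumptions.

```python
def cusum_boundaries(xs, mu0, k, h):
    """Two-sided CUSUM change-point detector on a scalar stream.

    xs: observations; mu0: in-control mean; k: drift allowance; h: alarm threshold.
    Returns the indices at which a change (segment boundary) is declared.
    """
    alarms = []
    s_pos = s_neg = 0.0
    baseline = mu0
    for t, x in enumerate(xs):
        s_pos = max(0.0, s_pos + (x - baseline) - k)  # evidence of an upward shift
        s_neg = max(0.0, s_neg - (x - baseline) - k)  # evidence of a downward shift
        if s_pos > h or s_neg > h:
            alarms.append(t)
            s_pos = s_neg = 0.0
            baseline = x  # assumed restart policy: re-anchor on the current value
    return alarms

signal = [0.0] * 50 + [2.0] * 50  # piecewise-constant signal with one shift at index 50
print(cusum_boundaries(signal, mu0=0.0, k=0.5, h=4.0))  # [52]
```

The shift at index 50 is flagged at index 52; raising h trades a longer detection delay for fewer false alarms.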

2605.20268 2026-05-21 cs.LG cs.AI cs.CL Updated

Chronicle: A Multimodal Foundation Model for Joint Language and Time Series Understanding

Paul Quinlan, Jeremy Levasseur, Qingguo Li, Xiaodan Zhu

Affiliations: InertialAI; Department of Electrical and Computer Engineering, Queen's University; Department of Mechanical and Materials Engineering, Queen's University

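AI summary: This paper proposes Chronicle, a multimodal foundation model trained jointly on language and time series, in which a unified architecture lets both modalities share parameters, yielding strong results across multiple tasks.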

Abstract

Real-world time series come with text: metadata, descriptions, news, reports. Yet time series foundation models process numerical sequences in isolation, and the multimodal text-and-time-series models that attempt to bridge the two all adapt a pretrained language model post hoc, inheriting representations shaped without ever seeing temporal data. These models are also evaluated almost exclusively against other multimodal baselines, not against the strongest unimodal foundation models in either domain, leaving open whether joint training is needed at all. We present Chronicle, a compact 324M-parameter decoder-only transformer trained from scratch on natural language and time series within a single unified architecture. Both modalities share the same transformer blocks, attention mechanism, and residual stream; the bulk of pretraining uses unimodal batches so cross-modal capability emerges purely from shared parameters, with a short alignment stage that interleaves the two. To our knowledge, Chronicle is the first model jointly pretrained on text and time series from scratch, and the first multimodal model evaluated against dedicated foundation models in both domains. It matches Gemma-3-270M-PT on 19 NLU tasks, sets a new bar for frozen-embedding time series classification on 24 UCR/UEA datasets, and produces multimodal forecasts on Time-MMD that beat every supervised fusion baseline, all from a single backbone.

2605.20262 2026-05-21 cs.LG cs.AI Updated

Residual Paving: Diagnosing the Routing Bottleneck in Selective Refusal Editing

Bryce Hinkley, Peyman Najafirad

Affiliations: University of Texas at San Antonio

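AI summary: This paper frames selective refusal editing as a three-way control problem and introduces Residual Paving, which separates route selectivity (whether to intervene) from residual-edit capacity, reducing the edit refusal rate while improving preservation of the benign and harmful distributions.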

Abstract

We study selective refusal editing as a three-way control problem: induce non-refusal on designated edit prompts while preserving benign behavior and harmful refusals outside the edit set. We introduce Residual Paving, a routed residual editing method for frozen instruction-tuned transformers that separates route selectivity, whether to intervene, from residual-edit capacity, what edit to apply. An early-layer router predicts a scalar gate and expert mixture; when active, prompt-conditioned bottleneck residual experts apply later-layer residual updates while leaving the backbone unchanged. This decomposition supports an oracle-routing diagnostic where only the learned scalar gate is replaced with the held-out edit/keep label, leaving the residual editor and frozen backbone fixed. On the primary Gemma-3-4B-IT held-out split, learned Residual Paving reduces edit refusal from 88.6% to 4.0%, with 95.5% benign distribution preservation and 87.3% harmful distribution preservation. Same-protocol one-direction steering controls are much weaker on edit success, leaving edit refusal at 86.8% for Edit-target ActAdd and 78.9% for DIM-style refusal steering. The remaining failure is off-target harmful-keep degradation: harmful refusal remains below the frozen-base rate, 65.3% vs. 81.6%. Across six backbones, oracle routing improves the keep-side diagnostic score on every reported row, with median gain +12.9 pp, supporting the interpretation that learned route selectivity is the main observed bottleneck. Trajectory diagnostics on two backbones further suggest directed movement toward edit-target continuations rather than generic refusal suppression.
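
A minimal pure-Python sketch of the routed residual edit described above, under stated assumptions: a router reads an early-layer hidden state and outputs a sigmoid gate g and softmax expert weights π; each bottleneck expert maps a later-layer hidden state through a down-projection, ReLU, and up-projection; and the edited state is h_late + g · Σ_i π_i expert_i(h_late). Oracle routing simply overrides g with the held-out edit/keep label while the experts stay fixed. The shapes, activations, and the `oracle_label` argument are hypothetical, not taken from the paper.

```python
import math

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    m = max(zs)
    e = [math.exp(z - m) for z in zs]
    s = sum(e)
    return [v / s for v in e]

def routed_residual_edit(h_early, h_late, router, experts, oracle_label=None):
    """Return the edited later-layer state; backbone weights are never touched.

    router: {'gate': (weights, bias), 'mix': one row of weights per expert}.
    experts: list of (down, up) matrices forming bottleneck residual experts.
    oracle_label: if given (1 = edit, 0 = keep), it replaces the learned gate.
    """
    w_gate, b_gate = router["gate"]
    g = sigmoid(sum(w * x for w, x in zip(w_gate, h_early)) + b_gate)
    if oracle_label is not None:
        g = float(oracle_label)  # oracle-routing diagnostic
    pi = softmax(matvec(router["mix"], h_early))
    delta = [0.0] * len(h_late)
    for p, (down, up) in zip(pi, experts):
        z = [max(0.0, v) for v in matvec(down, h_late)]  # bottleneck + ReLU
        delta = [d + p * u for d, u in zip(delta, matvec(up, z))]
    return [h + g * d for h, d in zip(h_late, delta)]

router = {"gate": ([0.0, 0.0], 0.0), "mix": [[0.0, 0.0]]}  # one expert, gate = 0.5
experts = [([[1.0, 1.0, 0.0]], [[1.0], [0.0], [-1.0]])]    # 3 -> 1 -> 3 bottleneck
h_early, h_late = [1.0, -1.0], [0.5, 0.5, 0.0]
print(routed_residual_edit(h_early, h_late, router, experts))                  # [1.0, 0.5, -0.5]
print(routed_residual_edit(h_early, h_late, router, experts, oracle_label=0))  # [0.5, 0.5, 0.0]
```

Because only the gate is swapped in the oracle run, any improvement on the keep side isolates route selectivity from residual-edit capacity, which is the point of the diagnostic.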

2605.20258 2026-05-21 cs.LG cs.AI cs.CR Updated

It Takes Two: Complementary Self-Distillation for Contextual Integrity in LLMs

Sangwoo Park, Woongyeong Yeo, Seanie Lee, Yumin Choi, Hyomin Lee, Kangsan Kim, Jinheon Baek, Seong Joon Oh, Sung Ju Hwang

Affiliations: KAIST

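AI summary: This paper proposes the SELFCI framework, which addresses the privacy-utility trade-off in large language models by decoupling information suppression from task resolution and improves contextual integrity through complementary self-distillation.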

Comments 28 pages, 16 figures

Abstract

Contextual Integrity (CI) defines privacy not merely as keeping information hidden, but as governing information flows according to the norms of a given context. As large language models are increasingly deployed as personal agents handling sensitive workflows, adhering to CI becomes critical. However, even frontier models remain unreliable in making disclosure decisions, and existing mitigation strategies often degrade underlying task performance. To overcome this privacy-utility trade-off, we propose SELFCI, a complementary self-distillation framework that decouples information suppression from task resolution. SELFCI jointly optimizes two independent reverse KL divergences over distinct teacher distributions derived from feedback: one encourages preserving task-relevant information for utility, while the other enforces minimal and appropriate disclosure. This complementary formulation induces a Product-of-Experts (PoE) target, aligning the policy with the intersection of capability and privacy requirements. Empirical evaluations demonstrate that SELFCI, without relying on costly external supervision, consistently outperforms competitive baselines such as online reinforcement learning algorithms (e.g., GRPO). These trends further extend to out-of-domain settings involving agentic workflows and accumulated private context, suggesting that SELFCI provides a practical path toward CI alignment.
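
Why two reverse KL terms induce a Product-of-Experts target can be checked directly. For a single categorical step, Σ_i KL(π‖p_i) = 2 KL(π‖q) - 2 log Z with q = √(p_u p_c)/Z, so the sum is minimized at q, the normalized geometric mean of the utility teacher p_u and the privacy teacher p_c: mass survives only where both teachers agree. The sketch below is an idealized single-distribution calculation, not the paper's training loop; the teacher distributions and the equal weighting of the two terms are assumptions.

```python
import math
import random

def kl(p, q):
    """KL(p || q) for categorical distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def poe_target(p_utility, p_privacy):
    """Normalized geometric mean, the minimizer of KL(pi||p_u) + KL(pi||p_c)."""
    g = [math.sqrt(a * b) for a, b in zip(p_utility, p_privacy)]
    z = sum(g)
    return [v / z for v in g], z

def objective(pi, p_utility, p_privacy):
    """Sum of the two reverse KL terms."""
    return kl(pi, p_utility) + kl(pi, p_privacy)

# Hypothetical teachers over 4 continuations: the utility teacher favours tokens 0 and 1,
# the privacy teacher nearly vetoes token 0 (say, a disclosure of private data).
p_u = [0.45, 0.40, 0.10, 0.05]
p_c = [0.01, 0.49, 0.30, 0.20]
target, z = poe_target(p_u, p_c)
print([round(v, 3) for v in target])  # [0.086, 0.565, 0.221, 0.128]

# Sanity check: no random distribution achieves a lower objective than the PoE target.
rng = random.Random(0)
best = objective(target, p_u, p_c)
for _ in range(1000):
    w = [rng.random() for _ in range(4)]
    alt = [v / sum(w) for v in w]
    assert objective(alt, p_u, p_c) >= best - 1e-12
```

Token 0, which only the utility teacher likes, drops from 0.45 to about 0.09, while token 1, acceptable to both, becomes the mode.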

2605.20257 2026-05-21 cs.LG cs.AI Updated

Instance Discrimination for Link Prediction

Valentin Cuzin-Rambaud, Mathieu Lefort, Rémy Cazabet

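AI summary: This paper proposes L-GRACE and L-BGRL, new models based on link representations, to improve link prediction performance, especially on unattributed graphs, and shows they are competitive in both supervised and self-supervised settings.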

Abstract

Recently, instance discrimination models have emerged as a major solution for self-supervised learning. Having already demonstrated its effectiveness in the image domain, instance discrimination learning is now proving equally convincing in the graph domain, in particular for node classification. However, fewer contributions have tackled the link prediction task. In this contribution, we propose to adapt existing methods to this context. We first provide a rigorous evaluation of existing self-supervised models in the field of link prediction, showing that the main performance depends on the augmentation process (like in computer vision). We then propose a new structural augmentation based on the community structure that is relevant for link prediction. Our main contribution introduces two new models, L-GRACE and L-BGRL, based on link representations instead of node representations, which improve the performance of the existing methods, especially on unattributed graphs, and we show that they perform on par with the state of the art, both in supervised and self-supervised contexts.
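
One way to picture instance discrimination over links rather than nodes: embed each link from two augmented views of the graph, then apply an InfoNCE loss in which the same link in the other view is the positive and every other link is a negative. The sketch assumes the link representation is the Hadamard product of its endpoint embeddings, uses a temperature τ = 0.5, and keeps only cross-view negatives (GRACE also uses intra-view negatives). These are illustrative choices; the paper's L-GRACE may construct link representations differently.

```python
import math

def link_embedding(z, u, v):
    """Hypothetical link representation: Hadamard product of endpoint embeddings."""
    return [a * b for a, b in zip(z[u], z[v])]

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a)) or 1.0  # guard against zero vectors
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def link_infonce(z1, z2, links, tau=0.5):
    """Mean InfoNCE loss: link i in view 1 should identify link i in view 2.

    z1, z2: node embeddings from two augmented views; links: list of (u, v) pairs.
    """
    h1 = [link_embedding(z1, u, v) for u, v in links]
    h2 = [link_embedding(z2, u, v) for u, v in links]
    total = 0.0
    for i, a in enumerate(h1):
        logits = [cosine(a, b) / tau for b in h2]
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_denominator - logits[i]  # -log softmax probability of the positive
    return total / len(h1)

e = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
links = [(0, 1), (2, 3), (4, 5)]
view_a = [e[0], e[0], e[1], e[1], e[2], e[2]]  # node embeddings, view 1
view_b = [e[1], e[1], e[2], e[2], e[0], e[0]]  # a view that scrambles link identities
print(link_infonce(view_a, view_a, links) < link_infonce(view_a, view_b, links))  # True
```

In the paper's setting the two views would come from graph augmentations such as the proposed community-based structural augmentation; here they are fixed by hand to show what the loss rewards.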

2605.20256 2026-05-21 cs.LG cs.AI Updated

CP-MoE: Consistency-Preserving Mixture of Experts for Continual Learning

Yang Liu, Toan Nguyen, Flora D. Salim

Affiliations: School of Computer Science and Engineering, University of New South Wales

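AI summary: This paper proposes CP-MoE, a continual learning framework built around a transient expert that uses a consistency-preserving routing bias and a transient-expert-guided regularization mechanism to reduce parameter interference and forgetting while preserving cross-task knowledge transfer.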

Abstract

Catastrophic forgetting remains a major obstacle to continual learning in large language models (LLMs) and vision-language models (VLMs). Although Mixture-of-Experts (MoE) architectures offer an efficient path to scaling, existing LoRA-based MoE continual learning methods still face a fundamental trade-off: they either isolate experts too aggressively, limiting knowledge transfer across tasks, or allow task-specific updates to overwrite important existing parameters, leading to severe forgetting. To address this, we propose CP-MoE, a continual learning framework built around a transient expert that captures early task-specific updates and guides their integration into stable experts. CP-MoE introduces a consistency-preserving routing bias, which uses the transient expert to estimate representation similarity with stable experts and steer routing towards more compatible expert selection, and a transient expert-guided regularisation mechanism, which selectively protects important historical parameters during merging. Together, these components reduce parameter interference and forgetting while preserving cross-task knowledge transfer. We validate CP-MoE on both unimodal and multimodal continual learning benchmarks with LLM-based and VLM-based MoE models. On the SuperNI benchmark, spanning diverse sequential language tasks, CP-MoE achieves state-of-the-art performance and stronger zero-shot transfer to unseen tasks. On the VQA v2 dataset, it scales effectively to multimodal visual reasoning, consistently reduces forgetting, and outperforms strong MoE baselines.

2605.20245 2026-05-21 cs.SI cs.LG Updated

Prism: Structural Symmetry Scanning via Duality-Constrained Laplacian Projection

Jiatong Xie

Affiliations: Independent researcher

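AI summary: Prism uses a duality-constrained Laplacian projection, computing a structural-symmetry defect from the graph Laplacian and a duality operator to measure how far a complex network deviates from structural self-consistency, and validates its effectiveness for community detection and structural-stress detection on several datasets.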

Comments 10 pages, 4 tables, 1 figure. This work presents a first-principles unsupervised network structural diagnosis framework based on symmetric involution operator and Laplacian commutator constraint. It achieves noise-robust community detection and early structural risk detection in financial time-series networks without supervised training data

Abstract

We introduce Prism, a framework for structural symmetry diagnosis in complex networks. Given a graph Laplacian $L$ and a duality operator $P$ (a symmetric involution), Prism computes the duality defect $δ(L,P) = \|LP - PL\|_F / \|L\|_F$, a scalar measuring how far the network deviates from structural self-consistency. When $P$ encodes the network's true symmetry, $δ$ starts near zero and rises monotonically as structure degrades; an arbitrary $P$ gives noise. We prove that the optimal $L'$ satisfying $[L', P] = 0$ is given by a closed-form block-diagonal projection, and provide an unsupervised alternating optimization that learns $P$ from the graph's own Fiedler vector. Experiments on synthetic networks show the true-$P$ defect is $3.38\times$ more sensitive to structural degradation than an index-reversal baseline and more sensitive than modularity. On Zachary's Karate Club with edge noise, Prism achieves $94.5\%$ community detection accuracy at $5\%$ noise versus $76.6\%$ for the raw Laplacian baseline. Applied to live S&P 500 data (2026-05-17), Prism detects rising structural stress (defect $0.43 \to 0.73$ over 90 days) while surface correlations remain low, a signal invisible to correlation-based methods. In a historical backtest spanning five major stress events (2011-2020), the duality defect exhibits a consistent pattern: it reaches elevated levels before the correlation spike that accompanies each crisis, and sustains high readings during periods of structural fragility that conventional metrics classify as calm. The duality defect is a first-principles structural admissibility condition, requiring no training data and computable in milliseconds.
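
The duality defect is straightforward to compute, and the commuting projection has a simple form worth making explicit. Because P is a symmetric involution (P = P^T and P^2 = I), the map L -> (L + PLP)/2 is the Frobenius-orthogonal projection onto matrices that commute with P; in P's eigenbasis it keeps exactly the blocks acting within the +1 and -1 eigenspaces. That is a standard linear-algebra fact; that it coincides with the paper's closed-form block-diagonal projection is our assumption. The pure-Python sketch uses a 4-node path graph, whose end-to-end reversal is an exact symmetry, then adds one edge to break it.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def frob(A):
    return math.sqrt(sum(x * x for row in A for x in row))

def laplacian(n, edges):
    """Combinatorial graph Laplacian L = D - A of an undirected graph."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][v] -= 1.0
        L[v][u] -= 1.0
        L[u][u] += 1.0
        L[v][v] += 1.0
    return L

def duality_defect(L, P):
    """delta(L, P) = ||LP - PL||_F / ||L||_F."""
    LP, PL = matmul(L, P), matmul(P, L)
    comm = [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(LP, PL)]
    return frob(comm) / frob(L)

def commuting_projection(L, P):
    """Frobenius-closest matrix to L that commutes with a symmetric involution P."""
    PLP = matmul(matmul(P, L), P)
    return [[(a + b) / 2.0 for a, b in zip(r1, r2)] for r1, r2 in zip(L, PLP)]

n = 4
P = [[1.0 if j == n - 1 - i else 0.0 for j in range(n)] for i in range(n)]  # reversal
path = laplacian(n, [(0, 1), (1, 2), (2, 3)])
broken = laplacian(n, [(0, 1), (1, 2), (2, 3), (0, 2)])
print(duality_defect(path, P))                             # 0.0: reversal is a symmetry
print(duality_defect(broken, P) > 0)                       # True
print(duality_defect(commuting_projection(broken, P), P))  # 0.0 after projection
```

Applying the projection twice changes nothing, as expected of a projection.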

2605.20244 2026-05-21 cs.LO cs.AI cs.CL cs.LG cs.SE Updated

Lean Refactor: Multi-Objective Controllable Proof Optimization via Agentic Strategy Search

Jialin Lu, Soonho Kong, Rodrigo Stehling, Kaiyu Yang, Zhangyang Wang, Weiran Sun, Wuyang Chen

Affiliations: Simon Fraser University; Amazon Web Services; MiroMind; University of Texas at Austin

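AI summary: This paper proposes the Lean Refactor framework, which addresses multi-objective, controllable, and version-robust refactoring of Lean proofs through retrieval-augmented agentic strategy search; its main contribution is efficient proof optimization driven by a pre-annotated database of multi-objective refactoring strategies.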

Abstract

We present Lean Refactor, a plug-and-play retrieval-augmented agentic framework for multi-objective, controllable, and version-robust refactoring of Lean proofs. LLM-generated proofs are notoriously correct-but-verbose and brittle across library versions, yet existing refactoring works overlook three practical challenges: 1) Lean refactoring is natively multi-objective (proof length, compilation cost, and version compatibility are often in tension); 2) Lean repositories have fragile compatibility, whereas LLM releases are unaware of Lean/Mathlib versions; 3) Training-based pipelines require repeated fine-tuning with each new LLM release, scaling neither with model churn nor with Lean's release cycle. Lean Refactor steers a frozen agentic LLM with retrievals from a curated database of multi-objective refactoring strategies, each densely annotated with metadata such as supported Lean/Mathlib versions and expected compilation-cost reduction. Experiments show over $70\%$ token-level compression on competition benchmarks, over $20\%$ on research repositories, and up to $60\%$ compilation-time reduction, outperforming prior work and Claude Code. Version-filtered retrieval further improves compression on the target Lean version, and refactored miniF2F proofs exhibit stronger zero-shot version transfer to future Lean releases than their unrefactored counterparts.
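
The version-filtered retrieval step can be pictured as a metadata filter followed by a weighted ranking. In the hypothetical sketch below, each strategy record carries a supported Lean version range plus expected token and compilation-time reductions; retrieval keeps only strategies compatible with the target version and ranks them by a user-chosen weighting, which is where multi-objective controllability enters. The record fields, version tuples, weights, and strategy names are illustrative assumptions, not the paper's database schema.

```python
from dataclasses import dataclass

@dataclass
class Strategy:
    name: str
    min_version: tuple        # oldest supported Lean version, e.g. (4, 9, 0)
    max_version: tuple        # newest supported Lean version
    token_reduction: float    # expected fraction of proof tokens removed
    compile_reduction: float  # expected fraction of compilation time saved

def retrieve(strategies, target_version, w_tokens=0.5, w_compile=0.5, k=2):
    """Return the top-k strategy names compatible with target_version.

    Version filtering happens first; the weights then trade off the two objectives.
    """
    def score(s):
        return w_tokens * s.token_reduction + w_compile * s.compile_reduction
    compatible = [s for s in strategies if s.min_version <= target_version <= s.max_version]
    return [s.name for s in sorted(compatible, key=score, reverse=True)[:k]]

db = [  # hypothetical strategy records
    Strategy("collapse-simp-chain", (4, 9, 0), (4, 20, 0), 0.40, 0.10),
    Strategy("replace-nlinarith-with-linarith", (4, 0, 0), (4, 20, 0), 0.05, 0.60),
    Strategy("legacy-tactic-rewrite", (4, 0, 0), (4, 8, 0), 0.50, 0.20),
]
print(retrieve(db, (4, 12, 0)))                               # balanced weights
print(retrieve(db, (4, 12, 0), w_tokens=0.9, w_compile=0.1))  # compression-first weights
```

The legacy strategy never appears for Lean 4.12 no matter how the weights are set, which is the behaviour version filtering is meant to guarantee.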

2605.20242 2026-05-21 cs.LG cond-mat.mtrl-sci cs.AI physics.chem-ph Updated

LEAP: A closed-loop framework for perovskite precursor additive discovery

Xin-De Wang, Zhi-Rui Chen, Ze-Feng Gao, Peng-Jie Guo, Cheng Mu, Zhong-Yi Lu

Affiliations: School of Physics, Renmin University of China; School of Chemistry and Life Resource, Renmin University of China

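AI summary: This work proposes the LEAP framework, which combines a large language model with active learning and uses literature-derived, mechanism-relevant descriptors with Bayesian optimization to discover additives for perovskite solar cells efficiently; benchmarks show the domain-specialized model outperforms general-purpose models, and experimental validation shows improved device performance.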

Comments 30 pages; 11 figures

Abstract

Efficient discovery of precursor additives is essential for improving the performance of perovskite solar cells, yet the large chemical space makes conventional trial-and-error screening inefficient. We develop LEAP (LLM-driven Exploration via Active Learning for Perovskites), an expert-in-the-loop closed framework that couples a domain-specialized large language model (LLM) with active learning for iterative additive prioritization. The LLM is trained to extract mechanism-relevant knowledge from the perovskite additive literature and to represent candidate molecules through interpretable descriptors, which are further integrated into a Bayesian optimization workflow for uncertainty-aware prioritization under low-data conditions. Benchmark results on unseen literature show that the domain-specialized model outperforms general-purpose models in mechanism-consistent reasoning. Experimental validation in an expert-in-the-loop proof-of-concept study suggests improved additive prioritization across three screening rounds, leading to average device PCEs of 20.13% and 20.87% for the later-round 6-CDQ- and 2-CNA-treated devices, respectively, compared with 19.25% for the control, with a champion PCE of 21.32%. These results provide preliminary evidence that literature-grounded mechanistic descriptors, when coupled with Bayesian optimization and expert feasibility review, can support mechanism-aware additive prioritization in perovskite photovoltaics.
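
For the uncertainty-aware prioritization step, a common Bayesian-optimization acquisition function is expected improvement (EI): for maximization, EI = (μ - f*) Φ(z) + σ φ(z) with z = (μ - f*)/σ. The sketch assumes a surrogate model (for example a Gaussian process over the LLM-derived descriptors) has already produced a posterior mean μ and standard deviation σ of device PCE for each candidate, and ranks candidates by EI over the best PCE observed so far, f*. The candidate names and numbers are placeholders, and the abstract does not say which acquisition function LEAP actually uses.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, best, xi=0.0):
    """EI for maximization, given the posterior mean/std of the objective."""
    improvement = mu - best - xi
    if sigma <= 0.0:
        return max(improvement, 0.0)  # no uncertainty: EI is the plain improvement
    z = improvement / sigma
    return improvement * normal_cdf(z) + sigma * normal_pdf(z)

def prioritize(candidates, best):
    """Sort (name, mu, sigma) tuples by expected improvement, highest first."""
    return sorted(candidates, key=lambda c: expected_improvement(c[1], c[2], best), reverse=True)

# Placeholder posterior predictions of PCE (%) for three hypothetical additives.
candidates = [("additive-A", 19.8, 0.2), ("additive-B", 19.5, 1.0), ("additive-C", 20.1, 0.05)]
for name, mu, sigma in prioritize(candidates, best=20.0):
    print(name, round(expected_improvement(mu, sigma, 20.0), 4))
```

The uncertain candidate B outranks C even though C has the higher predicted mean, which is how EI balances exploration against exploitation when data are scarce.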

2605.20241 2026-05-21 cs.LG cs.AI cs.CL Updated

Advanced Scientific Methodologies Applied to Rossini

Silvia Licciardi, Daniela Macchione, Emmanuel Caronna, Elisa Francomano

Affiliations: University of Palermo, Department of Engineering; Conservatory Alfredo Casella

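AI summary: This paper uses computational analysis to study the structure of Rossini's settings of Metastasio's "Mi lagnerò tacendo", revealing their melodic, harmonic, and text-setting choices and providing a new basis for systematic research in music philology.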

2605.20209 2026-05-21 cs.GR cs.LG cs.RO Updated

SOLAR: A Self-Optimizing Open-Ended Autonomous Agent for Lifelong Learning and Continual Adaptation

Nitin Vetcha, Dianbo Liu

Affiliations: Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Department of Computational and Data Sciences, Indian Institute of Science, Bangalore, Karnataka, India

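AI summary: This paper proposes SOLAR, a self-optimizing, open-ended autonomous agent that self-improves through parameter-level meta-learning, addressing concept drift and the high cost of gradient-based adaptation in dynamic real-world settings, and shows superior performance on common-sense, mathematical, medical, coding, social, and logical reasoning tasks.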

Comments Accepted at "Association for the Advancement of Artificial Intelligence 2026 Conference" in Streaming Continual Learning Bridge. Published in CEUR Workshop Proceedings (Original version at https://ceur-ws.org/Vol-4183/paper2.pdf)

Journal ref: CEUR Workshop Proceedings, Vol. 4183, 2026

Abstract

Despite the remarkable success of large language models (LLMs), they still face bottlenecks when deployed in dynamic, real-world settings, the primary challenges being concept drift and the high cost of gradient-based adaptation. Traditional fine-tuning (FT) struggles to adapt to non-stationary data streams without causing catastrophic forgetting or requiring extensive manual data curation. To address these limitations within the streaming and continual learning paradigm, we propose the Self-Optimizing Lifelong Autonomous Reasoner (SOLAR), an open-ended autonomous agent that leverages parameter-level meta-learning to self-improve, treating model weights as an environment for exploration. It begins by consolidating a strong prior over common-sense knowledge, making it effective for transfer learning. Using a multi-level reinforcement learning approach, SOLAR autonomously discovers adaptation strategies, enabling efficient test-time adaptation to unseen domains. Crucially, SOLAR maintains an evolving knowledge base of valid modification strategies, which implicitly acts as an episodic memory buffer to balance plasticity (adaptation to new tasks) and stability (retention of meta-knowledge). Experiments demonstrate that SOLAR outperforms strong baselines on common-sense, mathematical, medical, coding, social, and logical reasoning tasks, marking a significant step toward autonomous agents capable of lifelong adaptation in evolving environments.

2605.20188 2026-05-21 cs.LG cs.AI Updated

GraphDiffMed: Knowledge-Constrained Differential Attention with Pharmacological Graph Priors for Medication Recommendation

Krati Saxena, Tomohiro Shibata

Affiliations: Kyushu Institute of Technology

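AI summary: This paper proposes GraphDiffMed, a medication recommendation framework that combines noise-aware attention with pharmacological constraints; dual-scale differential attention filters spurious signals at the intra-visit and inter-visit levels, improving recommendation quality and safety.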

Abstract

Recommending safe and effective medication combinations from electronic health records (EHRs) is a core clinical AI problem, yet it remains difficult because patient trajectories are long, noisy, and clinically heterogeneous. Existing methods typically excel at either temporal modeling across visits or pharmacological knowledge integration (e.g., drug-drug interactions, DDIs), but rarely achieve both while robustly suppressing noise. We present GraphDiffMed, a knowledge-constrained medication recommendation framework built on dual-scale Differential Attention v2. Differential attention is applied at both intra-visit and inter-visit levels to filter spurious signals within encounters and across longitudinal history, while pharmacological constraints are incorporated during learning. Experiments on MIMIC-III and ablation studies show that this design consistently improves recommendation quality and ranking over strong baselines while achieving a more favorable balance between safety and performance. We further find that the strongest-performing configuration uses only demographic auxiliary features under our experimental setting. Overall, GraphDiffMed demonstrates that combining noise-aware attention with pharmacological constraints yields more reliable and clinically meaningful medication recommendation. We open-source our code at https://github.com/saxenakrati09/GraphDiffMed.
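
Differential attention, as introduced in the Differential Transformer, subtracts two softmax attention maps so that attention mass both maps assign to irrelevant context cancels: DiffAttn = (softmax(Q1 K1^T/√d) - λ softmax(Q2 K2^T/√d)) V. The sketch below implements that original formulation for a single query in pure Python, with λ fixed rather than learned and the per-head normalization omitted. The "v2" variant GraphDiffMed builds on may parameterize λ or normalize differently, and the dual-scale (intra-visit and inter-visit) wiring is not shown.

```python
import math

def softmax(zs):
    m = max(zs)
    e = [math.exp(z - m) for z in zs]
    s = sum(e)
    return [v / s for v in e]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def diff_attention(q1, q2, K1, K2, V, lam):
    """Differential attention for one query: (softmax_1 - lam * softmax_2) @ V.

    (q1, K1) and (q2, K2) are the two query/key groups; lam weights the second map
    (learnable in the original formulation, a fixed scalar here).
    """
    scale = 1.0 / math.sqrt(len(q1))
    a1 = softmax([dot(q1, k) * scale for k in K1])
    a2 = softmax([dot(q2, k) * scale for k in K2])
    weights = [x - lam * y for x, y in zip(a1, a2)]  # weights may be negative
    out = [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
    return out, weights

# Hypothetical visit codes: key 0 is relevant; keys 1 and 2 draw attention in both maps.
K = [[3.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
out, w = diff_attention([1.0, 0.0], [0.0, 1.0], K, K, [[1.0], [0.0], [0.0]], lam=0.5)
print([round(x, 3) for x in w])
```

Since each softmax row sums to 1, the differential weights always sum to 1 - λ; when the two maps coincide and λ = 1, the output vanishes entirely.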

2605.20187 2026-05-21 cs.LG cs.AI cs.IT math.IT Updated

Neural Estimation of Pairwise Mutual Information in Masked Discrete Sequence Models

Jai Sharma, Yifan Wang, Bryan Li

Affiliations: University of California, Berkeley, CA, USA

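AI summary: This paper proposes a neural framework that estimates pairwise conditional mutual information (MI) directly from the hidden states of pretrained masked diffusion models (MDMs), supervised by ground-truth MI computed from the model's own conditional distributions. The estimator captures the model's internal beliefs about dependency structure and predicts the full MI matrix in a single forward pass, enabling MI-guided parallel decoding.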

Comments 6 pages, 3 figures; submitting to ICML 2026

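Abstract

Understanding dependencies among variables is essential for interpretability and efficient generation in masked diffusion models (MDMs), yet these models primarily expose marginal conditional distributions and do not explicitly represent inter-variable dependencies. We propose a neural framework that estimates pairwise conditional mutual information (MI) directly from the hidden states of a pretrained MDM, supervised with ground-truth MI computed from the model's own conditional distributions. The resulting estimator captures the model's internal beliefs about dependency structure and predicts the full MI matrix in a single forward pass, enabling MI-guided parallel decoding by identifying conditionally independent subsets of variables. We evaluate our method on Sudoku and on protein sequence generation with ESM-C, where the MI maps recover known structural constraints and, while preserving generation quality, reduce inference-time forward passes by 3-5× relative to sequential decoding, outperforming entropy-based parallelization methods.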
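
Two ingredients of this approach can be sketched concretely. First, the mutual information between two masked positions follows from their joint distribution, I(X;Y) = Σ p(x,y) log[p(x,y) / (p(x) p(y))]; in the paper this ground truth is computed from the MDM's own conditionals, whereas here a joint table is simply given. Second, with an (estimated) MI matrix in hand, one simple way to choose positions to unmask in parallel is a greedy pass that admits a position only if its MI with every already-chosen position is below a threshold ε. The greedy rule, the candidate ordering, and ε are assumptions for illustration, not the paper's decoding algorithm.

```python
import math

def mutual_information(joint):
    """I(X;Y) in nats from a joint probability table joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(p * math.log(p / (px[i] * py[j]))
               for i, row in enumerate(joint) for j, p in enumerate(row) if p > 0)

def parallel_unmask_set(mi, order, eps=0.05, max_size=None):
    """Greedily choose masked positions that are nearly pairwise independent.

    mi: symmetric matrix of estimated pairwise MI between masked positions.
    order: candidate positions, e.g. sorted by model confidence (assumed).
    """
    chosen = []
    for pos in order:
        if all(mi[pos][c] < eps for c in chosen):
            chosen.append(pos)
        if max_size is not None and len(chosen) >= max_size:
            break
    return chosen

independent = [[0.25, 0.25], [0.25, 0.25]]  # two fair, independent binary variables
copy = [[0.5, 0.0], [0.0, 0.5]]             # Y is an exact copy of X
print(round(mutual_information(independent), 6))  # 0.0
print(round(mutual_information(copy), 6))         # 0.693147 (= log 2)
```

Positions with near-zero pairwise MI can be sampled from their marginals in the same forward pass with little joint error, which is why such a selection can cut the number of decoding steps.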

WriteSAE: Sparse Autoencoders for Recurrent State

Jack Young

Affiliations: Indiana University

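AI summary: This paper proposes WriteSAE, a sparse autoencoder for the matrix updates written into recurrent language-model state; it replaces the model's original write operations in the recurrent cache to steer generation and validates the approach on several models.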

Comments 26 pages, 14 figures, 21 tables; code at https://github.com/JackYoung27/writesae

Abstract

We introduce WriteSAE, a sparse autoencoder for the matrix updates written into recurrent language-model state. In Gated DeltaNet, Mamba-2, and RWKV-7, each token writes a matrix-shaped update to a recurrent cache; a residual-stream SAE has vector-shaped atoms and cannot replace that update directly. WriteSAE learns rank-1 matrix atoms with the same shape as the model's own write. This lets us test a direct replacement: at positions where the SAE activates an atom, we remove the model's write, insert the atom scaled by its SAE activation, and continue the forward pass. The atom gives a closer final token distribution than deleting the write on 92.4% of evaluated positions; averaged per atom, the rate is 89.8%. For Gated DeltaNet, a formula using the forget gate, read query, and output embedding predicts the resulting logit change with $R^2 = 0.98$. The same replacement test transfers to Mamba-2-370M at 88.1%. In generation, the formula chooses a write direction; writing it into three consecutive cache positions at $3\times$ the norm of the model's write makes tokens initially ranked 100--1000 by the unmodified model appear in 100% of continuations, up from 33.3%. To our knowledge this is the first cache-level steering intervention reported in a state-space or hybrid recurrent layer.
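To make the shape argument concrete, here is a minimal sketch of a rank-1 write in a simplified gated linear recurrence, S_t = g_t·S_{t-1} + W_t, read out as qᵀS. This recurrence is an assumption for illustration: Gated DeltaNet additionally applies a delta-rule correction, and all names below are hypothetical. The point is that an SAE atom u vᵀ scaled by its activation a has exactly the shape of the model's own write, so it can be swapped in, and its effect on a later read-out decays through the product of the subsequent forget gates.

```python
def outer(u, v):
    # Rank-1 matrix u v^T as nested lists (d_k x d_v).
    return [[ui * vj for vj in v] for ui in u]

def step(state, gate, write):
    # One recurrent update: S_t = gate * S_{t-1} + write.
    return [[gate * s + w for s, w in zip(srow, wrow)] for srow, wrow in zip(state, write)]

def read(state, q):
    # Read-out q^T S, a vector of length d_v.
    return [sum(q[i] * state[i][j] for i in range(len(q))) for j in range(len(state[0]))]

def run(writes, gates, q, d_k, d_v):
    # Apply the writes in order from an all-zero cache, then read once at the end.
    state = [[0.0] * d_v for _ in range(d_k)]
    for gate, write in zip(gates, writes):
        state = step(state, gate, write)
    return read(state, q)

# Replace the first write with an atom a * u v^T; later positions write nothing.
a, u, v, q = 2.0, [1.0, 0.0], [0.5, -1.0], [3.0, 1.0]
zero = [[0.0, 0.0], [0.0, 0.0]]
atom = [[a * x for x in row] for row in outer(u, v)]
out = run([atom, zero, zero], [0.9, 0.5, 0.8], q, 2, 2)
```

In this toy recurrence the read-out is exactly (g_2·g_3)·a·(q·u)·v, so the contribution of a replaced write is a scaled copy of the atom's value direction.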

2605.06395 2026-05-21 cs.LG cs.AI eess.SP Updated version

Consistent Geometric Deep Learning via Hilbert Bundles and Cellular Sheaves

Kartik Tandon, Julian Gould, Tanishq Bhatia, Francesca Dominici, Alejandro Ribeiro, Claudio Battiloro

Affiliations: University of Pennsylvania; Sakana AI; Northeastern University; Harvard University

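AI summary: This paper proposes a new convolutional learning framework for possibly infinite-dimensional signals supported on manifolds. It uses the connection Laplacian associated with a Hilbert bundle as the convolution operator, introduces filters and neural networks called HilbNets, and makes them implementable via a two-stage sampling procedure. It proves that the sheaf Laplacian of the sampling-induced Hilbert cellular sheaf converges to the underlying connection Laplacian, generalizing the convergence result of Belkin and Niyogi to the infinite-dimensional bundle setting, and validates the framework on synthetic and real-world tasks.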

Comments 51 pages, 3 figures, 5 tables

Abstract

Modern deep learning architectures increasingly contend with sophisticated signals that are natively infinite-dimensional, such as time series, probability distributions, or operators, and are defined over irregular domains. Yet a unified learning theory for these settings has been lacking. To start addressing this gap, we introduce a novel convolutional learning framework for possibly infinite-dimensional signals supported on a manifold. Namely, we use the connection Laplacian associated with a Hilbert bundle as a convolutional operator, and we derive filters and neural networks, dubbed HilbNets. We make HilbNets and, more generally, the convolution operation implementable via a two-stage sampling procedure. First, we show that sampling the manifold induces a Hilbert Cellular Sheaf, a generalized graph structure with Hilbert feature spaces and edge-wise coupling rules, and we prove that its sheaf Laplacian converges in probability to the underlying connection Laplacian as the sampling density increases. Notably, this result generalizes to the infinite-dimensional bundle setting the convergence result of Belkin & Niyogi (2008) for the graph Laplacian to the manifold Laplacian, a theoretical cornerstone of geometric learning methods. Second, we discretize the signals and prove that the discretized (implementable) HilbNets converge to the underlying continuous architectures and are transferable across different samplings of the same bundle, providing consistency for learning. Finally, we validate our framework on synthetic and real-world tasks. Overall, our results broaden the scope of geometric learning as a whole by lifting classical Laplacian-based frameworks to settings where the signal at each point lives in its own Hilbert space.
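In finite dimensions, a sampled connection Laplacian is easy to write down: with edge weights w_ij and orthogonal transport maps O_ij between neighbouring fibres, (Lx)_i = Σ_j w_ij (x_i − O_ij x_j), and a convolutional filter is a polynomial h(L)x = Σ_k h_k L^k x. The sketch below is only a finite-dimensional stand-in, assuming 2-dimensional fibres with rotation matrices as transports; it does not model the infinite-dimensional Hilbert fibres of HilbNets, the paper's filter parameterization may differ, and all names are hypothetical. Each undirected edge is listed twice, with O_ji = O_ijᵀ.

```python
import math

def rotation(theta):
    # 2x2 rotation matrix used as a toy parallel-transport map between fibres.
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matvec(m, x):
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]

def connection_laplacian(x, edges):
    """(L x)_i = sum_j w_ij (x_i - O_ij x_j); edges are directed tuples (i, j, w_ij, O_ij)."""
    out = [[0.0] * len(xi) for xi in x]
    for i, j, w, o_ij in edges:
        transported = matvec(o_ij, x[j])
        for d in range(len(x[i])):
            out[i][d] += w * (x[i][d] - transported[d])
    return out

def polynomial_filter(x, edges, coeffs):
    """Convolutional filter h(L) x = sum_k coeffs[k] * L^k x."""
    result = [[0.0] * len(xi) for xi in x]
    power = x  # holds L^k x
    for k, h in enumerate(coeffs):
        for i in range(len(x)):
            for d in range(len(x[i])):
                result[i][d] += h * power[i][d]
        if k + 1 < len(coeffs):
            power = connection_laplacian(power, edges)
    return result
```

A signal that is parallel under the transports (x_i = O_ij x_j on every edge) lies in the kernel of L, so every filter tap with k ≥ 1 vanishes on it and the filter simply scales it by h_0.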

2605.03601 2026-05-21 cs.LG cs.DM math.CO Updated version

Most ReLU Networks Admit Identifiable Parameters

Moritz Grillo, Guido Montúfar

Affiliations: Max Planck Institute for Mathematics in the Sciences

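AI summary: Studies the realization map of deep ReLU networks, asking whether a function determines its parameters up to scaling and permutation. The paper introduces a framework based on weighted polyhedral complexes and proves that, for architectures whose input and hidden layers have width at least 2, there is an open set of identifiable parameters and the functional dimension equals the number of parameters minus the number of hidden neurons; it also establishes a universal depth hierarchy.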
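The "up to scaling and permutation" caveat comes from a simple symmetry: since ReLU(c·z) = c·ReLU(z) for c > 0, multiplying a hidden neuron's incoming weights and bias by c and its outgoing weight by 1/c leaves the network function unchanged, and so does reordering hidden neurons. The rescaling gives one free parameter per hidden neuron, which accounts for subtracting the number of hidden neurons in the dimension count above. A minimal check on a hypothetical one-hidden-layer network:

```python
def relu(z):
    return max(0.0, z)

def forward(x, w1, b1, w2, b2):
    """One-hidden-layer ReLU network with a scalar output."""
    hidden = [relu(sum(wi * xi for wi, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    return sum(a * h for a, h in zip(w2, hidden)) + b2

def rescale_neuron(w1, b1, w2, k, c):
    """Scale neuron k's incoming weights and bias by c > 0 and its outgoing weight by 1/c."""
    w1 = [list(row) for row in w1]
    b1, w2 = list(b1), list(w2)
    w1[k] = [c * w for w in w1[k]]
    b1[k] *= c
    w2[k] /= c
    return w1, b1, w2

def permute_neurons(w1, b1, w2, perm):
    """Reorder hidden neurons consistently in both layers."""
    return [w1[p] for p in perm], [b1[p] for p in perm], [w2[p] for p in perm]

# Hypothetical 2-2-1 network: 9 parameters, 2 hidden neurons.
w1, b1, w2, b2 = [[1.0, -2.0], [0.5, 0.3]], [0.1, -0.4], [2.0, -1.5], 0.2
sw1, sb1, sw2 = rescale_neuron(w1, b1, w2, 0, 3.0)
pw1, pb1, pw2 = permute_neurons(w1, b1, w2, [1, 0])
```

Identifiability in the sense of the paper asks whether these are the only parameter changes that preserve the function.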

2605.03562 2026-05-21 cs.LG cs.AI Updated version

HeadQ: Model-Visible Distortion and Score-Space Correction for KV-Cache Quantization

Jorge L. Ruiz Williams

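AI summary: This paper proposes HeadQ, which addresses model-visible distortion in KV-cache quantization by storing a low-rank residual side code on the key side and applying it, in a query basis learned during calibration, as an additive logit correction. It shows that score-space error predicts attention KL divergence better than raw key MSE.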

Comments Withdrawn by the author because ethical concerns were identified after posting

Gated Normalization Removal and Scale Anchoring in Pre-Norm Transformers

Andrei Kanavalau, Carmen Amo Alonso, Sanjay Lall

Affiliations: Department of Electrical Engineering; Department of Computer Science; Stanford University

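AI summary: This paper studies whether normalization layers are necessary in pre-norm transformers and proposes a gated normalization-removal approach in which TaperNorm gradually converts normalization into sample-independent linear or affine maps. It also reveals that the final normalization layer anchors the scale of the pre-logit representation.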

Abstract

Normalization layers are standard in transformers, but it is not clear whether their sample-dependent computations are necessary throughout both training and inference. This work develops a gated normalization-removal approach for pre-norm transformers. The approach is implemented using TaperNorm, which starts from standard RMSNorm/LayerNorm and gradually tapers to learned sample-independent linear or affine maps. Once the gate reaches zero, per-token statistics are no longer computed in the tapered layers and the resulting maps can be folded into adjacent linear projections. The results indicate that internal normalization can be tapered in the tested pre-training and fine-tuning settings with small validation-loss increases. Our approach helps reveal a distinct role for final normalization, namely that it anchors the scale of the pre-logit representation. With this anchor present, radial changes in the last hidden state do not directly reduce the loss; when it is removed, reducing cross-entropy can be achieved by increasing logit magnitudes. A fixed-target scale loss provides an explicit alternative anchor and enables fully norm-free ablations in the tested regimes. Finally, in a KV-cached autoregressive decoding benchmark, tapering internal norms gives up to $1.14\times$ higher throughput with explicit scaling operations and up to $1.18\times$ after folding.
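A minimal sketch of the tapering and folding steps, assuming a hypothetical parameterization y = g·(RMSNorm(x)⊙γ) + (1 − g)·(a⊙x + b), where the gate g is annealed from 1 to 0 during training; TaperNorm's exact form may differ. Once g = 0 the layer is a fixed affine map, so it can be folded into the next linear projection W via W(a⊙x + b) = (W·diag(a))x + Wb, removing the per-token statistics entirely.

```python
import math

def rms_norm(x, gamma, eps=1e-6):
    # Sample-dependent: divides by this token's root-mean-square.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gamma, x)]

def taper_norm(x, gate, gamma, a, b):
    """Blend the sample-dependent norm with a sample-independent affine map."""
    normed = rms_norm(x, gamma)
    affine = [ai * xi + bi for ai, xi, bi in zip(a, x, b)]
    return [gate * n + (1.0 - gate) * f for n, f in zip(normed, affine)]

def linear(w, x, bias=None):
    out = [sum(wij * xj for wij, xj in zip(row, x)) for row in w]
    if bias is not None:
        out = [o + c for o, c in zip(out, bias)]
    return out

def fold(w, a, b):
    """Fold the gate-0 affine map x -> a*x + b into the following projection W."""
    w_folded = [[wij * aj for wij, aj in zip(row, a)] for row in w]
    return w_folded, linear(w, b)

x = [0.5, -1.0, 2.0]
gamma, a, b = [1.0, 1.0, 1.0], [0.8, 1.2, 0.5], [0.1, 0.0, -0.2]
w = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.3]]
unfolded = linear(w, taper_norm(x, 0.0, gamma, a, b))
wf, bias = fold(w, a, b)
folded = linear(wf, x, bias)
```

At g = 0 the folded and unfolded paths agree, which is what allows the throughput gain after folding: the tapered layer no longer costs a separate operation at decode time.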

2602.08686 2026-05-21 cs.LG cs.AI Updated version

CompilerKV: Risk-Adaptive KV Compression via Offline Experience Compilation

Ning Yang, Chengzhi Wang, Yibo Liu, Baoliang Tian, Haijun Zhang

Affiliations: Institute of Automation, Chinese Academy of Sciences; University of Electronic Science and Technology of China; The Chinese University of Hong Kong, Shenzhen; ByteDance; University of Science and Technology Beijing

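AI summary: This paper proposes CompilerKV, a risk-adaptive KV compression method based on offline experience compilation. Corrective tables are compiled offline from a calibration corpus, reducing online correction to $O(1)$ lookups plus a budget clamp; the method achieves compressed-SOTA across multiple model architectures and remains the strongest under different pressure regimes.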

Abstract

Prefill-only KV compression freezes a token subset at the end of prefill and decodes from it without further eviction. The retention decision is therefore irreversible, yet existing methods estimate the corrective signals it relies on, per-head reliability and prompt-level compression sensitivity, online from a single noisy prompt. We argue this is the wrong statistical unit: these signals exhibit far higher cross-prompt regularity than within-prompt signal-to-noise. We introduce CompilerKV, a KV-retention policy whose corrective tables are compiled offline from a calibration corpus, reducing online correction after the standard observation-window scan to $O(1)$ lookups plus a budget clamp. We find that compiled retention tables behave as portable architectural priors: rankings transfer across disjoint corpora on four backbones (mean Spearman $\bar{\rho} = 0.90$), and direct model-to-model table transfer costs only $0.4$-$0.8$ LongBench points on average. At a 512-token budget, CompilerKV attains compressed-SOTA on all four backbones, improving over the strongest prefill-only baseline by $+1.67$ points on average (task-bootstrap 95% CI $[+1.08, +2.37]$). Pressure regimes amplify the gap: under a fixed $512/32k$ cache ratio, CompilerKV remains the strongest compressed method through 128k RULER ($\sim 73$ vs. FullKV $\sim 79$ and SnapKV $\sim 38$); on 32k NIAH it reaches $0.89$ vs. SnapKV $0.42$; and at 32k input, retaining only $1.56\%$ of the prefill KV, batch-16 serving remains feasible where FullKV is OOM.
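The offline/online split can be sketched under loudly hypothetical assumptions: the compiled table is a per-head weight averaged over calibration prompts, and the online step is a constant-time table lookup followed by a rounding-and-clamp step that makes integer per-head token budgets sum exactly to the global budget. This illustrates the "compile once, look up at serving time" structure only; CompilerKV's actual corrective signals and clamp are not reproduced.

```python
import math

def compile_table(calibration_scores):
    """Offline: average each head's score over all calibration prompts."""
    n_prompts = len(calibration_scores)
    n_heads = len(calibration_scores[0])
    return [sum(prompt[h] for prompt in calibration_scores) / n_prompts for h in range(n_heads)]

def allocate_budget(table, budget):
    """Online: proportional per-head budgets, clamped so they sum exactly to `budget`."""
    total = sum(table)
    raw = [budget * w / total for w in table]
    alloc = [math.floor(r) for r in raw]
    # Give the leftover tokens to the heads with the largest fractional parts.
    remainder = budget - sum(alloc)
    order = sorted(range(len(raw)), key=lambda h: raw[h] - alloc[h], reverse=True)
    for h in order[:remainder]:
        alloc[h] += 1
    return alloc

table = compile_table([[0.2, 0.5, 0.3], [0.4, 0.3, 0.3]])  # hypothetical per-head scores
alloc = allocate_budget(table, 512)
```

Nothing in `allocate_budget` depends on the incoming prompt beyond the lookup, which is the property the abstract credits for replacing noisy single-prompt estimates.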

2602.07832 2026-05-21 cs.LG cs.AI Updated version

rePIRL: Learn PRM with Inverse RL for LLM Reasoning

Xian Wu, Kaijie Zhu, Ying Zhang, Lun Wang, Wenbo Guo

Affiliations: Meta AI; Department of Computer Science, University of California, Santa Barbara; Google DeepMind; Independent Researcher

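AI summary: This paper proposes rePIRL, a framework that uses inverse reinforcement learning to learn effective PRMs without strong assumptions about the expert policy, addressing inherent limitations of earlier methods such as entropy collapse. It improves LLM reasoning through a dual learning process with customized techniques and is validated on math and coding datasets.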

Abstract

Process rewards have been widely used in deep reinforcement learning to improve training efficiency, reduce variance, and prevent reward hacking. In LLM reasoning, existing works also explore various solutions for learning effective process reward models (PRMs) with or without the help of an expert policy. However, existing methods either rely on strong assumptions about the expert policies (e.g., requiring their reward functions) or suffer from intrinsic limitations (e.g., entropy collapse), resulting in weak PRMs or limited generalizability. In this paper, we introduce rePIRL, an inverse RL-inspired framework that learns effective PRMs with minimal assumptions about expert policies. Specifically, we design a dual learning process that updates the policy and the PRM alternately. Our learning algorithm has customized techniques to address the challenges of scaling traditional inverse RL to LLMs. We theoretically show that our proposed learning framework can unify both online and offline PRM learning methods, justifying that rePIRL can learn PRMs with minimal assumptions. Empirical evaluations on standardized math and coding reasoning datasets demonstrate the effectiveness of rePIRL over existing methods. We further show the application of our trained PRM in test-time training, test-time scaling, and providing an early signal for training on hard problems. Finally, we validate our training recipe and key design choices via a detailed ablation study.

2601.18973 2026-05-21 cs.LG cs.AI cs.SY eess.SY quant-ph Updated version

When Does Adaptation Win? Scaling Laws for Meta-Learning in Quantum Control

Nima Leclerc, Chris Miller, Nicholas Brawand

Affiliations: The MITRE Corporation

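AI summary: This paper studies when adaptation via meta-learning pays off in quantum control and derives scaling laws for the adaptation gain, showing that the gain saturates exponentially with the number of gradient steps and grows linearly with task variance, which gives a quantitative criterion for deciding when adaptation is worthwhile.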

Comments 28 pages, 11 figures

Abstract

Quantum hardware suffers from intrinsic device heterogeneity and environmental drift, forcing practitioners to choose between suboptimal non-adaptive controllers and costly per-device recalibration. We derive a scaling-law lower bound for meta-learning showing that the adaptation gain (expected fidelity improvement from task-specific gradient steps) saturates exponentially with gradient steps and scales linearly with task variance, providing a quantitative criterion for when adaptation justifies its overhead. Validation on quantum gate calibration shows negligible benefits for low-variance tasks but >40% fidelity gains on two-qubit gates under extreme out-of-distribution conditions (10$\times$ the training noise), with implications for reducing per-device calibration time on cloud quantum processors. Further validation on classical linear-quadratic control confirms that these laws emerge from general optimization geometry rather than quantum-specific physics. We further introduce a few-shot pre-adaptation protocol that estimates the optimal adaptation budget from $N = 3$-5 probe steps to within 3-19% relative error across out-of-distribution regimes.
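The qualitative shape of such a law already appears in a toy quadratic model, used here as an assumption rather than the paper's derivation. If each task's optimum θ* lies at variance σ² around the meta-learned initialization, the task loss is (λ/2)(θ − θ*)², and adaptation takes K gradient steps of size η, then the expected gain is G(K) = (λσ²/2)[1 − (1 − ηλ)^{2K}], which is linear in σ² and saturates exponentially in K. The sketch also shows one hypothetical way to estimate the saturation rate from a few probe steps: successive gain increments form a geometric sequence with ratio ρ = (1 − ηλ)².

```python
import math

def adaptation_gain(k, sigma2, lam, eta):
    """Expected loss reduction after k gradient steps in the toy quadratic model."""
    rho = (1.0 - eta * lam) ** 2
    return 0.5 * lam * sigma2 * (1.0 - rho ** k)

def estimate_from_probes(gains):
    """Fit rho and the asymptotic gain from gains measured at steps 0, 1, ..., N."""
    increments = [g1 - g0 for g0, g1 in zip(gains, gains[1:])]
    ratios = [d1 / d0 for d0, d1 in zip(increments, increments[1:])]
    rho = sum(ratios) / len(ratios)
    # Sum of the geometric series of increments: G_inf = G_0 + d_0 / (1 - rho).
    g_inf = gains[0] + increments[0] / (1.0 - rho)
    return rho, g_inf

def steps_for_fraction(rho, fraction):
    """Smallest K with 1 - rho**K >= fraction, i.e. the gain is `fraction` saturated."""
    return math.ceil(math.log(1.0 - fraction) / math.log(rho))

sigma2, lam, eta = 0.04, 2.0, 0.2  # hypothetical task variance, curvature, step size
gains = [adaptation_gain(k, sigma2, lam, eta) for k in range(4)]  # 3 probe steps
rho, g_inf = estimate_from_probes(gains)
```

Here ρ = (1 − 0.4)² = 0.36, so about five steps already capture 99% of the attainable gain; comparing g_inf with the cost per step is one way to decide whether adaptation justifies its overhead.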

2601.05639 2026-05-21 cs.CV cs.LG Updated version

TriagerX: Dual Transformers for Content- and Interaction-Based Bug Triaging

Md Afif Al Mamun, Gias Uddin, Lan Xia, Longyu Zhang

Affiliations: University of Calgary; York University; IBM Canada

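AI summary: This paper proposes TriagerX, a dual-transformer architecture that combines content and interaction information to improve recommendation accuracy in bug triaging, outperforming existing state-of-the-art methods.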

Comments Accepted to IEEE Transactions on Software Engineering (TSE). 17 pages, 15 figures

DOI: 10.1109/TSE.2026.3685573

Abstract

Pretrained language models (PLMs) are transformer-based architectures that can be used in bug triaging tasks. PLMs can better capture token semantics than traditional Machine Learning (ML) models that rely on statistical features (e.g., TF-IDF, bag of words). However, PLMs may still attend to less relevant tokens in a bug report, which can impact their effectiveness. In addition, the model's recommendations can be sub-optimal when the interaction history of developers around similar bugs is not taken into account. We designed TriagerX to address these limitations. First, to assess token semantics more reliably, we leverage a dual-transformer architecture. Unlike current state-of-the-art (SOTA) baselines that employ a single transformer architecture, TriagerX collects recommendations from two transformers, each offering recommendations via its last three layers. This setup generates a robust content-based ranking of candidate developers. TriagerX then refines this ranking by employing a novel interaction-based ranking methodology, which considers developers' historical interactions with similar fixed bugs. Across five datasets, TriagerX surpasses all nine transformer-based methods, including SOTA baselines, often improving Top-1 and Top-3 developer recommendation accuracy by over 10%. We worked with our large industry partner to successfully deploy TriagerX in their development environment. The partner required both developer and component recommendations, with components acting as proxies for team assignments, which is particularly useful in cases of developer turnover or team changes. We trained TriagerX on the partner's dataset for both tasks, and it outperformed SOTA baselines by up to 10% for component recommendations and 54% for developer recommendations.
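The two-stage ranking can be pictured as score fusion. The sketch below is hypothetical throughout: it averages per-developer scores over all recommendation heads (two transformers times three layers would give six), computes an interaction score from similarity-weighted involvement in similar fixed bugs, and mixes the two with a weight `alpha`. TriagerX's actual interaction-based ranking may weight these signals differently.

```python
def content_scores(model_outputs):
    """Average per-developer scores across all (transformer, layer) recommendation heads."""
    n = len(model_outputs)
    return {d: sum(out[d] for out in model_outputs) / n for d in model_outputs[0]}

def interaction_scores(similar_bugs):
    """similar_bugs: list of (similarity, developers who interacted with that fixed bug)."""
    scores = {}
    for sim, developers in similar_bugs:
        for d in developers:
            scores[d] = scores.get(d, 0.0) + sim
    top = max(scores.values(), default=0.0) or 1.0  # avoid dividing by zero
    return {d: s / top for d, s in scores.items()}

def rank_developers(model_outputs, similar_bugs, alpha=0.7):
    """Rank content-based candidates by a convex mix of content and interaction scores."""
    content = content_scores(model_outputs)
    inter = interaction_scores(similar_bugs)
    fused = {d: alpha * c + (1 - alpha) * inter.get(d, 0.0) for d, c in content.items()}
    return sorted(fused, key=fused.get, reverse=True)

outputs = [{"alice": 0.6, "bob": 0.4}, {"alice": 0.5, "bob": 0.5}]  # toy head outputs
history = [(0.9, ["bob"]), (0.8, ["bob", "carol"])]                  # toy similar bugs
ranking = rank_developers(outputs, history, alpha=0.5)
```

With no interaction history the content ranking is returned unchanged; a developer who repeatedly handled similar bugs can overtake a slightly better content match, which is the refinement step the abstract describes.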

2506.20764 2026-05-21 math.OC cs.LG Updated version

FLUME-FNO: Efficient and Scalable Prediction of 3D Wind and Temperature Fields in Unseen Urban Morphologies

Shaoxiang Qin, Theodore Potsis, Dongxue Zhan, Xue Liu, Ted Stahopoulos, Liangzhu Leon Wang

Affiliations: Department of Building, Civil and Environmental Engineering, Center for Zero Energy Building Studies, Concordia University; School of Computer Science, McGill University; Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates

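AI summary: This paper proposes FLUME-FNO, which efficiently and scalably predicts 3D wind and temperature fields in unseen urban morphologies using only building geometry, addressing the high computational cost of conventional CFD and the dependence of deep learning methods on large training datasets.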

Abstract

Urban microclimate, encompassing wind and temperature fields shaped by building geometry, significantly impacts energy consumption, pedestrian winds, pollutant dispersion, urban heat islands, and public health. Accurately predicting microclimate is crucial yet challenging. Conventional Computational Fluid Dynamics (CFD) is computationally prohibitive for rapid assessments, while many deep learning approaches require extensive training data and struggle to generalize to unseen configurations. We present the Fast Localized Urban Microclimate Emulation Fourier Neural Operator (FLUME-FNO), a data-efficient and scalable framework for rapid prediction of 3D wind and temperature fields based solely on building geometry. FLUME-FNO assumes the local urban microclimate is primarily governed by surrounding geometry directly visible from a specific location. To encode this, the framework introduces a novel Multi-Directional Distance Feature (MDDF), representing visible open-space structures by measuring directional distances to surrounding buildings. By computing MDDF over the full domain and cropping encoded geometric features into smaller 3D patches, FLUME-FNO effectively augments limited CFD data, enabling robust learning from just 23 CFD simulations. The model achieves mean absolute errors of 0.2 m/s for wind speed and 0.19 °C for temperature on unseen configurations. Addressing the need for trustworthy fast microclimate prediction, the framework is further assessed using a deep ensemble as a practical proxy for FLUME-FNO uncertainty, which ranges from 3% to 40% depending on location. This uncertainty quantification (UQ) framework demonstrates that FLUME-FNO provides resilient, trustworthy predictions within acceptable accuracy thresholds for wind engineering and microclimate studies, highlighting its potential for real-world applications.
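A 2D analogue of the Multi-Directional Distance Feature can be sketched by marching from each open cell along a fixed set of directions until a building cell or the domain edge blocks the ray, then cropping the resulting feature map into patches. The grid encoding, the eight directions, and recording the open distance travelled (rather than the distance to the blocking surface) are assumptions for illustration; the paper computes the feature in 3D and its exact definition may differ.

```python
import math

# (d_row, d_col) in order E, SE, S, SW, W, NW, N, NE; rows grow southwards.
DIRECTIONS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]

def mddf(grid, cell_size=1.0):
    """grid[r][c] == 1 marks a building cell. Returns 8 open distances per open cell."""
    rows, cols = len(grid), len(grid[0])
    features = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1:
                continue  # no feature inside buildings
            dists = []
            for dr, dc in DIRECTIONS:
                steps, rr, cc = 0, r + dr, c + dc
                while 0 <= rr < rows and 0 <= cc < cols and grid[rr][cc] == 0:
                    steps += 1
                    rr, cc = rr + dr, cc + dc
                # Open distance travelled before a building or the edge blocks the ray.
                dists.append(steps * math.hypot(dr, dc) * cell_size)
            features[r][c] = dists
    return features

def crop_patches(features, size, stride):
    """Cut the full-domain feature map into overlapping size x size patches."""
    rows, cols = len(features), len(features[0])
    return [[row[c:c + size] for row in features[r:r + size]]
            for r in range(0, rows - size + 1, stride)
            for c in range(0, cols - size + 1, stride)]

grid = [[0, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]
f = mddf(grid)
patches = crop_patches(f, 2, 1)
```

Because the features are computed on the full domain before cropping, each patch still "sees" buildings outside its own window, which is what makes patch-level training samples informative.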

2503.15105 2026-05-21 math.NA cs.LG cs.NA math.OC Updated version

Control, Optimal Transport and Neural Differential Equations in Supervised Learning

Minh-Nhat Phung, Minh-Binh Tran

Affiliations: Department of Mathematics, Texas A&M University, College Station, TX 77843, USA

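AI summary: This paper studies the fundamental computational problem of approximating optimal transport equations with neural differential equations and proposes a new framework for approximating unbalanced optimal transport (UOT) in the continuum with Neural ODEs. By generalizing a discrete UOT problem with Pearson divergence, it constructs vector fields that converge to the true UOT dynamics, advancing the mathematical foundations of computational transport and machine learning.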

Abstract

We study the fundamental computational problem of approximating optimal transport (OT) equations using neural differential equations (Neural ODEs). More specifically, we develop a novel framework for approximating unbalanced optimal transport (UOT) in the continuum using Neural ODEs. By generalizing a discrete UOT problem with Pearson divergence, we constructively design vector fields for Neural ODEs that converge to the true UOT dynamics, thereby advancing the mathematical foundations of computational transport and machine learning. To this end, we design a numerical scheme inspired by the Sinkhorn algorithm to solve the corresponding minimization problem and rigorously prove its convergence, providing explicit error estimates. From the obtained numerical solutions, we derive vector fields defining the transport dynamics and construct the corresponding transport equation. Finally, from the numerically obtained transport equation, we construct a neural differential equation whose flow converges to the true transport dynamics in an appropriate limiting regime.
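For orientation, the classical Sinkhorn scaling iterations for entropic unbalanced OT are sketched below. Note the swap: this standard variant penalizes marginal mismatch with a KL divergence of strength ρ, which yields the exponent ρ/(ρ + ε) in the updates, whereas the paper works with a Pearson divergence and its own Sinkhorn-inspired scheme, which is not reproduced here. The cost matrix, masses, and parameter values are hypothetical.

```python
import math

def unbalanced_sinkhorn(cost, a, b, eps, rho, iters=2000):
    """Entropic UOT with KL marginal penalties of strength rho (standard scaling form)."""
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    power = rho / (rho + eps)  # tends to 1 (balanced Sinkhorn) as rho -> infinity
    for _ in range(iters):
        kv = [sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        u = [(a[i] / kv[i]) ** power for i in range(n)]
        ktu = [sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
        v = [(b[j] / ktu[j]) ** power for j in range(m)]
    # Transport plan P = diag(u) K diag(v).
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

plan = unbalanced_sinkhorn([[0.0, 1.0], [1.0, 0.0]], [0.5, 0.5], [0.5, 0.5], eps=0.1, rho=1e6)
```

With a very large ρ the marginal penalty effectively enforces the constraints and the plan's row sums approach a; with smaller ρ the iteration is allowed to create or destroy mass, which is the defining feature of the unbalanced problem.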

2406.03506 2026-05-21 cs.LG cs.AI Updated version

Fuzzy Convolution Neural Networks for Tabular Data Classification

Arun D. Kulkarni

Affiliations: Computer Science Department, University of Texas at Tyler

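AI summary: This paper proposes a fuzzy convolutional neural network (FCNN) for tabular data classification that maps feature values to fuzzy memberships and converts them into images used to train a CNN, enabling effective learning on tabular data and performance superior to existing methods.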

Comments 10 pages, 16 figures, Submitted to IEEE Access

DOI: 10.1109/ACCESS.2024.3479882
Journal ref: IEEE Access, vol. 12, pp. 151846-151855 (2024)

Abstract

Recently, convolutional neural networks (CNNs) have attracted a great deal of attention due to their remarkable performance in various domains, particularly in image and text classification tasks. However, their application to tabular data classification remains underexplored. Non-image data are prevalent in many fields, such as bioinformatics, finance, and medicine, and adapting CNNs to classify non-image data remains highly challenging. This paper investigates the efficacy of CNNs for tabular data classification, aiming to bridge the gap between traditional machine learning approaches and deep learning techniques. We propose a novel framework, the fuzzy convolution neural network (FCNN), tailored specifically for tabular data to capture local patterns within feature vectors. In our approach, we map feature values to fuzzy memberships. The fuzzy membership vectors are converted into images that are used to train the CNN model. The trained CNN model is used to classify unknown feature vectors. To validate our approach, we generated six complex noisy data sets. From each data set, we randomly selected 70% of the samples for training and used the remaining 30% for testing. The data sets were also classified using state-of-the-art machine learning algorithms such as the decision tree (DT), support vector machine (SVM), fuzzy neural network (FNN), Bayes classifier, and random forest (RF). Experimental results demonstrate that our proposed model can effectively learn meaningful representations from tabular data, achieving competitive or superior performance compared to existing methods. Overall, our findings suggest that the proposed FCNN model holds promise as a viable alternative for tabular data classification tasks, offering a fresh perspective and potentially unlocking new opportunities for leveraging deep learning in structured data analysis.
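The feature-to-image step can be sketched with triangular membership functions. The assumptions here are hypothetical: each feature is min-max normalized to [0, 1] and described by three fuzzy sets (low, medium, high) centred at 0, 0.5 and 1, and the membership vectors are stacked into an n_features × 3 matrix that serves as a single-channel image. The paper's membership functions and image layout may differ.

```python
def triangular(x, left, centre, right):
    """Triangular membership: 1 at `centre`, falling linearly to 0 at `left` and `right`."""
    if x == centre:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < centre:
        return (x - left) / (centre - left)
    return (right - x) / (right - centre)

FUZZY_SETS = [(-0.5, 0.0, 0.5), (0.0, 0.5, 1.0), (0.5, 1.0, 1.5)]  # low, medium, high

def to_fuzzy_image(row, mins, maxs):
    """Map one tabular row to an (n_features x 3) membership 'image'."""
    image = []
    for x, lo, hi in zip(row, mins, maxs):
        z = (x - lo) / (hi - lo) if hi > lo else 0.0
        z = min(max(z, 0.0), 1.0)  # clip test-time values outside the training range
        image.append([triangular(z, *s) for s in FUZZY_SETS])
    return image

img = to_fuzzy_image([5.0, 0.0, 10.0], [0.0, 0.0, 0.0], [10.0, 10.0, 10.0])
```

With this choice of sets each row of memberships sums to 1, so every feature contributes the same total "intensity" to the image; the CNN then learns patterns across neighbouring rows and columns of this matrix.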

2305.09620 2026-05-21 cs.CL cs.AI cs.LG Updated version

AI-Augmented Surveys: Leveraging Large Language Models and Surveys for Opinion Prediction

Junsol Kim, Byungkyu Lee

Affiliations: Department of Sociology; University of Chicago; New York University; Chicago, IL; New York, NY

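AI summary: This paper proposes a large-language-model-based framework that predicts missing responses in repeated cross-sectional surveys by combining embeddings of questions, respondents, and survey periods, compensating for the limited ability of traditional surveys to capture historical change.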

Abstract

Nationally representative surveys track public opinion, yet they ask only a limited set of questions each year, which limits their potential to capture historical changes. To fill this gap, we develop a large language model (LLM)-based framework for predicting missing responses in repeated cross-sectional surveys by incorporating embeddings for questions, respondents, and survey periods. We introduce two new applications of LLMs to survey research: retrodiction (predicting year-level missing opinions) and unasked opinion prediction (predicting entirely missing opinions). Using data from the 1972-2021 General Social Surveys, our LLM-based models perform strongly in retrodicting masked GSS opinions through cross-validation and public opinions measured by other organizations in years when the GSS did not ask them. These capabilities enable us to recover missing trends and pinpoint when public attitudes changed, such as the rising support for same-sex marriage. However, performance remains modest for unasked opinion prediction. We show when our models outperform established benchmarks, examine which opinions and respondents are more predictable, and evaluate whether our approach reduces LLMs' tendency to homogenize predicted responses. Our study demonstrates that LLMs and surveys can mutually enhance each other: LLMs broaden survey potential, while surveys calibrate LLMs for simulating human opinions.
