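arXivDaily: a daily academic digest synchronized with the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.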

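2605.22821 2026-05-22 cs.CL cs.LG Version update

Tokenisation via Convex Relaxations

Jan Tempus, Philip Whittington, Craig W. Schmidt, Dennis Komm, Tiago Pimentel

Affiliations: ETH Zurich; Kensho Technologies

AI summary: This paper proposes ConvexTok, a tokenisation method based on convex relaxations. It formulates tokeniser construction as a linear program and solves it with convex-optimisation tools, improving tokenisation metrics, language-model bits-per-byte, and downstream task performance.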

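2605.22820 2026-05-22 cs.LG Version update

Integrable Elasticity via Neural Demand Potentials

Carlos Heredia, Daniel Roncel

Affiliations: IAMM Research, Department of Applied Artificial Intelligence; DAMM

AI summary: This paper proposes ICDN, a demand-oriented neural network model for multi-product retail demand forecasting. The model learns log-demand as a smooth, context-dependent function of log-prices, so elasticities can be derived exactly. On the Dominick's beer dataset, ICDN generalizes better out of sample than a directed log-log baseline and yields more stable, more economically plausible elasticity estimates, especially where cross-price effects are weak.

Comments: 44 pages, 7 figures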
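
Because price elasticity is the derivative of log-demand with respect to log-price, any differentiable log-demand model yields elasticities directly. The sketch below is not ICDN: it uses a hypothetical two-product quadratic log-demand function (every coefficient is invented for illustration) and checks the closed-form own- and cross-price elasticities against finite differences.

```python
import math

# Hypothetical coefficients for illustration only (not taken from the paper).
A = 2.0         # intercept of log-demand for product 0
B_OWN = -1.8    # slope in the product's own log-price
C_OWN = -0.2    # curvature in the product's own log-price
B_CROSS = 0.3   # slope in the other product's log-price


def log_demand(log_p0: float, log_p1: float) -> float:
    """Toy smooth log-demand for product 0 as a function of two log-prices."""
    return A + B_OWN * log_p0 + C_OWN * log_p0 ** 2 + B_CROSS * log_p1


def elasticities(log_p0: float, log_p1: float) -> tuple[float, float]:
    """Exact own- and cross-price elasticities (partial derivatives of log-demand)."""
    own = B_OWN + 2.0 * C_OWN * log_p0
    cross = B_CROSS
    return own, cross


def finite_difference(log_p0: float, log_p1: float, h: float = 1e-6) -> tuple[float, float]:
    """Central finite differences, used only as a numerical cross-check."""
    own = (log_demand(log_p0 + h, log_p1) - log_demand(log_p0 - h, log_p1)) / (2 * h)
    cross = (log_demand(log_p0, log_p1 + h) - log_demand(log_p0, log_p1 - h)) / (2 * h)
    return own, cross


own, cross = elasticities(math.log(2.0), math.log(3.0))
print(f"own-price elasticity: {own:.4f}, cross-price elasticity: {cross:.4f}")
```

In a neural model the same derivatives would presumably come from automatic differentiation rather than a hand-written formula.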

2605.22817 2026-05-22 cs.LG cs.AI cs.CL cs.NE Version update

Vector Policy Optimization: Training for Diversity Improves Test-Time Search

Ryan Bahlous-Boldi, Isha Puri, Idan Shenfeld, Akarsh Kumar, Mehul Damani, Sebastian Risi, Omar Khattab, Zhang-Wei Hong, Pulkit Agrawal

Affiliations: MIT; Improbable AI Lab; MIT-IBM Computing Research Lab; Sakana AI

AI summary: This paper proposes Vector Policy Optimization (VPO), which trains policies to anticipate diverse downstream reward functions and therefore to produce diverse solutions, improving test-time search performance.

Comments: 24 pages

Abstract

Language models must now generalize out of the box to novel environments and work inside inference-scaling search procedures, such as AlphaEvolve, that select rollouts with a variety of task-specific reward functions. Unfortunately, the standard paradigm of LLM post-training optimizes a pre-specified scalar reward, often leading current LLMs to produce low-entropy response distributions and thus to struggle at displaying the diversity that inference-time search will require. We propose Vector Policy Optimization (VPO), an RL algorithm that explicitly trains policies to anticipate diverse downstream reward functions and to produce diverse solutions. VPO exploits that rewards are often vector-valued in practice, like per-test-case correctness in code generation or, say, multiple different user personas or reward models. VPO is essentially a drop-in replacement for the GRPO advantage estimator, but it trains the LLM to output a set of solutions where individual solutions specialize to different trade-offs in the vector reward space. Across four tasks, VPO matches or beats the strongest scalar RL baselines on test-time search (e.g. pass@k and best@k), with the gap widening as the search budget grows. For evolutionary search, VPO models unlock problems that GRPO models cannot solve at all. As test-time search becomes more standardized, optimizing for diversity may need to become the default post-training objective.
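
To see why a scalar reward can hide diversity, the sketch below applies the usual GRPO-style group-normalized advantage to per-test-case correctness vectors. It is not the VPO estimator, which the abstract does not specify. The rollout data, the averaging of test results into a scalar, and the `eps` constant are all assumptions made for illustration.

```python
from statistics import mean, pstdev

# Hypothetical group of four rollouts for one prompt; each entry is
# per-test-case correctness (1 = pass, 0 = fail) on three unit tests.
vector_rewards = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
]


def grpo_advantages(scalar_rewards, eps=1e-8):
    """Group-normalized advantage: (r - group mean) / (group std + eps)."""
    mu = mean(scalar_rewards)
    sigma = pstdev(scalar_rewards)
    return [(r - mu) / (sigma + eps) for r in scalar_rewards]


def covered_tests(rollouts):
    """Indices of tests passed by at least one rollout in the set."""
    return {i for r in rollouts for i, passed in enumerate(r) if passed}


# Collapse each reward vector to a scalar by averaging over tests.
scalars = [mean(r) for r in vector_rewards]
advantages = grpo_advantages(scalars)

# Rollouts 0 and 1 pass different tests but receive the same scalar advantage,
# so the scalar objective cannot distinguish the complementary solution.
print(advantages[0] == advantages[1])
print(covered_tests(vector_rewards[:1]), covered_tests(vector_rewards[:2]))
```

Keeping the second rollout raises test coverage from two tests to three, which is the kind of gain a search procedure can exploit and a scalar advantage does not reward.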

2605.22814 2026-05-22 cs.LG Version update

Remember to be Curious: Episodic Context and Persistent Worlds for 3D Exploration

Lily Goli, Justin Kerr, Daniele Reda, Alec Jacobson, Andrea Tagliasacchi, Angjoo Kanazawa

Affiliations: University of Toronto; UC Berkeley; Wayve; Vector Institute; Simon Fraser University

AI summary: This work proposes a curiosity-driven reinforcement learning method that introduces a persistent world model and episodic context to address exploration in sparse-reward, long-horizon tasks in 3D environments. Experiments show that it outperforms RL-based active mapping baselines on HM3D and generalizes to Gibson and AI-generated worlds.

Abstract

Exploration is a prerequisite for learning useful behaviors in sparse-reward, long-horizon tasks, particularly within 3D environments. Curiosity-driven reinforcement learning addresses this via intrinsic rewards derived from the mismatch between the agent's predictive model of the world and reality. However, translating this intrinsic motivation to complex, photorealistic environments remains difficult, as agents can become trapped in local loops and receive fresh rewards for revisiting forgotten states. In this work, we demonstrate that this failure stems from a lack of spatial persistence and episodic context. We show that effective curiosity requires a model of the world that is persistent and continuously updated, paired with an agent that maintains an episodic trajectory history to navigate toward novel regions. We achieve this using an online 3D reconstruction as a persistent model of the world, while the agent policy is parameterized as a sequence model over RGB observations to maintain episodic context. This design enables effective exploration during training while allowing the agent to navigate using solely RGB frames at deployment. Trained purely via curiosity on HM3D, our agent outperforms RL-based active mapping baselines and generalizes zero-shot to Gibson and AI-generated worlds. Our end-to-end policy enables efficient adaptation to downstream tasks, such as apple picking and image-goal navigation, outperforming from-scratch baselines. Please see video results at https://recuriosity.github.io/.

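2605.22786 2026-05-22 cs.AI cs.ET cs.LG cs.MA Version update

LCGuard: Latent Communication Guard for Safe KV Sharing in Multi-Agent Systems

Sadia Asif, Mohammad Mohammadi Amiri, Momin Abbas, Prasanna Sattigeri, Karthikeyan Natesan Ramamurthy

Affiliations: Rensselaer Polytechnic Institute; IBM Research

AI summary: This paper proposes LCGuard, a framework that learns representation-level transformations applied before KV caches are shared between agents, in order to prevent leakage of sensitive information. Evaluations across multiple model families and multi-agent benchmarks show that it lowers reconstruction attack success rates while preserving task performance.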

Abstract

Large language model (LLM)-based multi-agent systems increasingly rely on intermediate communication to coordinate complex tasks. While most existing systems communicate through natural language, recent work shows that latent communication, particularly through transformer key-value (KV) caches, can improve efficiency and preserve richer task-relevant information. However, KV caches also encode contextual inputs, intermediate reasoning states, and agent-specific information, creating an opaque channel through which sensitive content may propagate across agents without explicit textual disclosure. To address this, we introduce LCGuard (Latent Communication Guard), a framework for safe KV-based latent communication in multi-agent LLM systems. LCGuard treats shared KV caches as latent working memory and learns representation-level transformations before cache artifacts are transmitted across agents. We formalize representation-level sensitive information leakage operationally through reconstruction: a shared cache artifact is unsafe if an adversarial decoder can recover agent-specific sensitive inputs from it. This leads to an adversarial training formulation in which the adversary learns to reconstruct sensitive inputs, while LCGuard learns transformations that preserve task-relevant semantics and reduce reconstructable information. Empirical evaluations across multiple model families and multi-agent benchmarks show that LCGuard consistently reduces reconstruction-based leakage and attack success rates while maintaining competitive task performance compared to standard KV-sharing baselines.

2605.22779 2026-05-22 cs.SE cs.LG Version update

FAME: Failure-Aware Mixture-of-Experts for Message-Level Log Anomaly Detection

Huanchi Wang, Zihang Huang, Yifang Tian, Kristina Dzeparoska, Hans-Arno Jacobsen, Alberto Leon-Garcia

Affiliations: Department of Electrical and Computer Engineering

AI summary: This paper proposes FAME, a failure-aware mixture-of-experts model for message-level log anomaly detection. It trains a lightweight router and domain experts from a small amount of labeled data for efficient anomaly detection, and achieves high precision and recall on the BGL and Thunderbird datasets.

Comments: 12 pages, 5 figures

Abstract

Production systems generate millions of log lines daily, yet most anomaly detectors operate at the session or window-level, flagging groups of lines rather than identifying the specific message responsible. This coarse granularity forces operators to inspect many routine lines per alert. Message-level detection offers finer granularity, but remains challenging. A single event template may correspond to both normal and anomalous messages, failures arise from heterogeneous subsystems, and line-level labeling at scale is impractical. Although large language models (LLMs) can reason over log semantics, applying them to every line is too costly for continuous monitoring. We present FAME (Failure-Aware Mixture-of-Experts), a label-efficient message-level mixture-of-experts framework that uses an LLM only once offline. We annotate at most K labeled lines per template to derive binary normal/anomaly indicators and representative examples. The LLM proposes a partition of templates into failure domains, and a certification step validates the proposal before training. FAME trains a lightweight router and domain experts that run on-premise and output anomaly predictions and failure-domain labels. On BGL, FAME achieves F1 = 98.16 at K = 100 reducing annotation effort by 76x and detects 86.3% of anomalies from unseen EventIDs. On Thunderbird, FAME reaches F1 = 99.95 with perfect recall.
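
The Thunderbird figures also pin down the precision. With recall R = 1, F1 = 2P/(P + 1), so P = F1/(2 - F1). The snippet below checks that arithmetic, assuming the reported F1 is a percentage for the anomaly class; the function name is illustrative.

```python
def precision_from_f1(f1: float, recall: float) -> float:
    """Invert F1 = 2PR / (P + R) for precision P, given F1 and recall R."""
    return f1 * recall / (2.0 * recall - f1)


# Thunderbird: F1 = 99.95% with perfect recall (values from the abstract).
p = precision_from_f1(0.9995, 1.0)
print(f"implied precision: {p:.4%}")
```

Under that assumption the implied precision is about 99.90%, that is, roughly one false alarm per thousand flagged messages.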

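2605.22776 2026-05-22 cs.LG cs.AI stat.CO stat.ML Version update

Lumberjack: Better Differentially Private Random Forests via Heavy Hitter Detection in Trees

Christian Janos Lebeda, David Erb, Tudor Cebere, Aurélien Bellet

Affiliations: PreMeDICaL; Inria; Université de Montpellier; INSERM; Technical University of Munich

AI summary: This paper proposes Lumberjack, an algorithm that builds large random decision trees and then applies privacy-preserving pruning, substantially improving the utility of differentially private random forests. It introduces a new $(\varepsilon,\delta)$-DP heavy hitter detection algorithm with error $O_{\varepsilon,\delta}(\sqrt{\log h})$, which permits deeper trees and therefore more expressiveness under privacy constraints. Experiments show that Lumberjack outperforms existing DP random forest methods on benchmark datasets, with notable gains in the privacy-utility trade-off at practical privacy budgets.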

Abstract

Random forests are widely used in fields involving sensitive tabular data, but existing approaches to enforcing differential privacy (DP) typically degrade performance to the point of impracticality. In this paper, we introduce Lumberjack, a differentially private random forest algorithm that achieves substantially higher utility by constructing large random decision trees and then applying aggressive, privacy-preserving pruning to retain only sufficiently populated nodes. A key component of our approach is a novel $(\varepsilon,\delta)$-DP heavy hitter detection algorithm for hierarchical data, whose error is $O_{\varepsilon,\delta}(\sqrt{\log h})$ for trees of height $h$ and may be of independent interest. This favorable scaling enables the use of significantly deeper trees than in prior work, leading to improved expressiveness under privacy constraints. Our empirical evaluation on benchmark datasets shows that Lumberjack consistently outperforms prior DP random forest methods, establishing a new state of the art. In particular, our approach yields substantial improvements in the privacy-utility trade-off for practical privacy budgets. Our findings suggest that carefully designed DP random forests can close much of the utility gap, highlighting a promising and underexplored direction for future research.
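
For contrast, here is a naive pruning baseline, which is not the paper's heavy hitter algorithm. It adds Laplace noise to every node count in a data-independent tree and keeps nodes whose noisy count reaches a threshold. Under add/remove-one neighbouring datasets a record touches one node per level, so splitting a pure $\varepsilon$ budget evenly over $h$ levels gives noise scale $h/\varepsilon$, a linear dependence on height that the abstract's $O_{\varepsilon,\delta}(\sqrt{\log h})$ bound presumably improves on. All names, the data layout, and the budget split are assumptions for illustration.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def prune_by_noisy_count(level_counts, epsilon, threshold, rng):
    """Keep nodes whose Laplace-noised count reaches the threshold.

    level_counts[d] maps node id -> number of records reaching that node
    at depth d. The budget is split evenly across the h levels (basic
    composition), so each count receives noise of scale h / epsilon.
    """
    h = len(level_counts)
    scale = h / epsilon
    kept = []
    for counts in level_counts:
        kept.append({node for node, c in counts.items()
                     if c + laplace_noise(scale, rng) >= threshold})
    return kept


# Hypothetical two-level tree: 100 records at the root, split 97 / 3.
counts = [{"root": 100}, {"left": 97, "right": 3}]
print(prune_by_noisy_count(counts, epsilon=1.0, threshold=10, rng=random.Random(0)))
```

Thresholding after the noise is added is post-processing, so it costs no additional privacy budget in this sketch.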

2605.22749 2026-05-22 cs.LG cs.AI Version update

Cyber-Physical Anomaly Detection in IoT-Enabled Smart Grids Using Machine Learning and Metaheuristic Feature Optimization

Adis Alihodžić, Eva Tuba, Milan Tuba

Affiliations: Department of Mathematical and Computer Sciences, University of Sarajevo; Singidunum University; Trinity University; Sinergija University

AI summary: This paper studies the detection of cyber-physical anomalies in IoT-enabled smart grids using machine learning and metaheuristic feature optimization. Evaluating several baseline models, it finds that tree-based ensembles perform best on the dataset considered and that, after feature selection, the model improves in both macro-F1 and ROC-AUC.

Abstract

Modern smart grids rely on dense measurement infrastructures, communication links, and intelligent field devices. Although this improves supervision and control, it also increases vulnerability to cyber-physical disruptions. Operators must distinguish physical incidents, such as faults or line disturbances, from malicious actions, such as false data injection or unauthorized command execution. This chapter investigates this problem using the well-known MSU/ORNL Power System Attack Dataset. The proposed method combines machine learning with genetic-algorithm-based feature selection. The objective is twofold: to classify attack and natural events accurately, and to determine whether a reduced set of physically informative PMU/IED measurements can support reliable detection. Several baseline models are evaluated, including logistic regression, RBF-SVM, XGBoost, Random Forest, and Extra Trees. The results show that tree-based ensemble models are the most effective for the considered dataset, with Extra Trees providing the strongest full-feature baseline. After feature selection, the GA + Extra Trees model reduces the clean PMU feature space from 112 attributes to an average of 27.4 attributes over five runs, while increasing macro-F1 from 0.9118 to 0.9212 and ROC-AUC from 0.9791 to 0.9837. These results indicate that many synchronized electrical measurements are redundant. A compact subset of phasor-based features can still provide accurate and interpretable anomaly detection in smart grids.

2605.22746 2026-05-22 cs.LG eess.AS stat.ML Version update

Plug-in Losses for Evidential Deep Learning: A Simplified Framework for Uncertainty Estimation that Includes the Softmax Classifier

Berk Hayta, Hannah Laus, Simon Mittermaier, Felix Krahmer

Affiliations: TU Munich; MCML; Infineon Technologies

AI summary: This paper proposes a simplified framework that approximates the evidential deep learning objective with plug-in losses for uncertainty estimation, shows that under a particular evidence-to-Dirichlet mapping the framework includes the standard softmax classifier, and validates it on the Google Speech Commands dataset.

Abstract

Real-world sensor-based learning systems require uncertainty estimation that is both reliable and computationally efficient. Evidential Deep Learning (EDL) provides single-pass uncertainty estimation by modeling the class probabilities via Dirichlet distributions, where the Dirichlet parameters are predicted by a learned neural network mapping. However, this approach can lead to computational challenges, as Dirichlet expected objectives are more complex than standard supervised learning losses, complicating their analysis and implementation. We address this issue by approximating the objective of the first-order empirical risk minimization problem induced by EDL with a plug-in loss evaluated at the Dirichlet mean and show that, under mild assumptions, the approximation error decays with growing evidence for a broad class of loss functions, including mean-squared error and cross-entropy loss. As a special case, our analysis provides justification for the use of softmax in the context of uncertainty estimation, since under a particular evidence-to-Dirichlet mapping, our framework includes the standard softmax classifier. We validate the proposed simplified objectives on the Google Speech Commands dataset and show that they achieve predictive accuracy and selective prediction performance comparable to classical EDL, while being simpler to implement using standard deep learning losses and training pipelines. To the best of our knowledge, this empirical analysis is the first to obtain coverage-accuracy trade-offs for speech recognition tasks through EDL.
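
For squared error, the gap between the Dirichlet-expected loss and the plug-in loss has a closed form, which makes the claim that the error decays with growing evidence concrete: $E\|y - p\|^2 = \|y - m\|^2 + \sum_k m_k(1 - m_k)/(\alpha_0 + 1)$, where $m = \alpha/\alpha_0$ is the Dirichlet mean. The sketch below evaluates both sides and checks that the mapping $\alpha = \exp(\text{logits})$ makes the Dirichlet mean equal to softmax. That exponential mapping is an assumption here; the abstract says only that "a particular evidence-to-Dirichlet mapping" recovers softmax. The logits and label are hypothetical.

```python
import math


def dirichlet_mean(alpha):
    """Mean of Dirichlet(alpha): alpha_k / alpha_0."""
    alpha0 = sum(alpha)
    return [a / alpha0 for a in alpha]


def plug_in_mse(alpha, y):
    """Squared error evaluated at the Dirichlet mean."""
    return sum((yk - mk) ** 2 for yk, mk in zip(y, dirichlet_mean(alpha)))


def expected_mse(alpha, y):
    """Exact E||y - p||^2 for p ~ Dirichlet(alpha): plug-in loss plus total variance."""
    alpha0 = sum(alpha)
    m = dirichlet_mean(alpha)
    variance = sum(mk * (1.0 - mk) for mk in m) / (alpha0 + 1.0)
    return plug_in_mse(alpha, y) + variance


def softmax(logits):
    """Numerically stable softmax."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 0.5, -1.0]               # hypothetical network outputs
alpha = [math.exp(z) for z in logits]   # assumed evidence-to-Dirichlet mapping
y = [1.0, 0.0, 0.0]                     # one-hot label

for scale in (1.0, 10.0, 100.0):        # more evidence, same Dirichlet mean
    scaled = [scale * a for a in alpha]
    gap = expected_mse(scaled, y) - plug_in_mse(scaled, y)
    print(f"alpha_0 = {sum(scaled):8.2f}  gap = {gap:.5f}")
```

The gap shrinks like $1/(\alpha_0 + 1)$ in this squared-error case; the cross-entropy case covered by the paper is not reproduced here.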

2605.22743 2026-05-22 cs.LG Version update

SeqLoRA: Bilevel Orthogonal Adaptation for Continual Multi-Concept Generation

Javad Parsa, Enis Simsar, Amir Joudaki, Thomas Hofmann, André M. H. Teixeira

Affiliations: Uppsala University; ETH Zurich; Sweden

AI summary: This paper proposes SeqLoRA, a bilevel optimization framework that jointly optimizes both LoRA factors to address representation interference when composing multiple custom concepts in text-to-image diffusion models, improving identity preservation and scalability.

Abstract

Parameter-efficient fine-tuning enables fast personalization of text-to-image diffusion models, but composing multiple custom concepts remains challenging due to representation interference. Existing modular methods either rely on expensive post-hoc fusion or freeze adaptation subspaces, which limit expressiveness and concept fidelity. To address this trade-off, we propose Sequential regularized LoRA (SeqLoRA), a constrained continual learning framework that jointly optimizes both LoRA factors via bilevel optimization. Theoretically, we establish strong convergence guarantees for our algorithm and model the residual layer activations as a matrix sub-Gaussian process to derive high-probability bounds on catastrophic forgetting. We further prove that learning the LoRA basis from data minimizes residual interference energy more effectively than frozen-basis methods. Experiments on multi-concept image generation demonstrate that SeqLoRA improves identity preservation and scalability across up to 101 concepts, while avoiding costly fusion and reducing attribute interference in composed generations.

2605.22736 2026-05-22 math.OC cs.LG cs.NA math.DG math.NA Version update

Optimization over the intersection of manifolds

Yan Yang, Bin Gao, Ya-xiang Yuan

Affiliations: State Key Laboratory of Mathematical Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, and University of Chinese Academy of Sciences, China

AI summary: This paper proposes a geometric method for optimization over the intersection of two manifolds that uses a retraction on only one manifold and updates the iterate along two orthogonal directions. It proves that clean intersection and intrinsic transversality are equivalent, and demonstrates the method's effectiveness on sparse and low-rank optimization problems.

Comments: 26 pages, 5 figures, 3 tables

Abstract

Optimization over the intersection of two manifolds arises in a broad range of applications, but is hindered by the coupled geometry of the feasible region. In this paper, we prove that the regularities -- clean intersection and intrinsic transversality -- are equivalent, which yields a tractable projection onto the tangent space of the intersection. Therefore, we propose a geometric method that employs a retraction on only one manifold and updates the iterate along two orthogonal directions. Specifically, the iterates stay on one manifold, and the two directions are responsible for asymptotically approaching the other manifold and decreasing the objective function, respectively. Under intrinsic transversality, we derive the convergence rate for both the feasibility and optimality measures, and show that every accumulation point is first-order stationary. Numerical experiments on problems stemming from sparse and low-rank optimization, including fitting spherical data, approximating hyperbolic embeddings on real data, and computing compressed modes, demonstrate the effectiveness of the proposed method.

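2605.22731 2026-05-22 cs.LG cs.AI Version update

Post-Training is About States, Not Tokens: A State Distribution View of SFT, RL, and On-Policy Distillation

Dong Nie

Affiliations: Independent Researcher

AI summary: This paper studies large language model post-training methods, including supervised fine-tuning (SFT), reinforcement learning (RL), and on-policy distillation (OPD), from a state-distribution perspective, and finds that the source and locality of training states can be as important as the form of the supervision signal.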

Abstract

Large language model post-training methods such as supervised fine-tuning (SFT), reinforcement learning (RL), and distillation are often analyzed through their loss functions: maximum likelihood, policy gradients, forward KL, reverse KL, or related objective-level variants. We study a complementary factor: the state distribution on which supervision is applied. For an autoregressive policy, a state is a prompt plus generated prefix. SFT trains on fixed dataset states, while RL and on-policy distillation (OPD) train on states induced by the current learner. We formalize post-training as state-distribution shaping and run a controlled small-scale study using Qwen3-0.6B-Base on GSM8K, with TruthfulQA and MMLU as retention evaluations. Our results show three phenomena. First, a mild SFT run improves GSM8K with little forgetting, while a stress SFT run causes substantial retention loss. Second, OPD from a degraded SFT teacher surpasses that teacher on GSM8K, TruthfulQA, and MMLU, despite using the teacher as its only supervision source. Third, a lightweight on-policy RL run improves GSM8K while preserving retention. These results support a state-centric view of post-training: the source and locality of training states can be as important as the form of the supervision signal.

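2605.22724 2026-05-22 cs.LG cs.NA math.NA stat.ML Version update

Multiple Neural Operators Achieve Near-Optimal Rates for Multi-Task Learning

Adrien Weihs, Hayden Schaeffer

Affiliations: Department of Mathematics, University of California, Los Angeles

AI summary: This paper studies the approximation and statistical complexity of learning a collection of operators in a shared multi-task setting, focusing on the multiple neural operator (MNO) architecture. For a broad class of Lipschitz multiple-operator maps, it derives near-optimal upper bounds for approximation and statistical generalization. It also establishes a curse of parametric complexity and proves the corresponding minimax rates. These results show that sharing representations across tasks does not increase the overall cost: multi-task operator learning obeys the same scaling laws as single-operator learning. The paper further compares MNO with a multi-task extension of DeepONet based on concatenated task inputs and shows that, in terms of worst-case approximation complexity, the two architectures satisfy essentially the same asymptotic rates.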

2605.22723 2026-05-22 cs.LG cs.AI cs.IT math.IT Version update

The Value of Covariance Matching in Gaussian DDPMs and the Lanczos Sampler

Md Sahil Akhtar, Aymane El Gadarri, Vivek F. Farias, Adam D. Jozefiak

Affiliations: Electrical Engineering and Computer Science; Massachusetts Institute of Technology; Operations Research Center; Sloan School of Management

AI summary: This paper studies the value of covariance matching for the path-space KL divergence in Gaussian DDPMs and proposes the Lanczos Gaussian sampler, a matrix-free technique for sampling from the optimal reverse covariance that improves sample quality.

Abstract

A central error measure in Gaussian DDPMs is the path-space KL divergence between the exact reverse chain and the learned Gaussian reverse process. This quantity is especially relevant for procedures such as classifier guidance, which perturb the entire reverse trajectory rather than only the terminal sample. Prior analyses show that standard isotropic reverse covariances suffer an unavoidable $\Omega(1/T)$ path-KL error as the number of denoising steps $T$ grows. We show that matching the full posterior covariance breaks this barrier, yielding an order-wise improvement that reduces the path KL to $O(1/T^2)$. To make full covariance matching practical, we introduce the Lanczos Gaussian sampler (LGS), a training-free, matrix-free method for sampling from the optimal reverse covariance using only covariance-vector products, which are available through Jacobian-vector products of the posterior mean. LGS avoids dense covariance storage and auxiliary covariance models. We prove that LGS approximation error decays exponentially in the number of Lanczos steps, where each Lanczos step requires a single Jacobian-vector product. Empirically, using only three such steps improves sample quality over strong diagonal-covariance baselines, including OCM-DDPM, across standard image benchmarks. This identifies full covariance matching as both theoretically valuable and practically accessible for fast DDPM sampling.
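
The generic Krylov idea behind matrix-free Gaussian sampling can be sketched as follows: to draw $x \sim N(0, \Sigma)$, approximate $\Sigma^{1/2} z$ for $z \sim N(0, I)$ by $\|z\| V_k T_k^{1/2} e_1$, where $V_k$ and the tridiagonal $T_k$ come from $k$ Lanczos steps started at $z$. This is a textbook construction and not necessarily the paper's exact LGS procedure. The function name is hypothetical, NumPy is assumed to be available, and a dense matrix product stands in for the Jacobian-vector product of the posterior mean.

```python
import numpy as np


def lanczos_sqrt_apply(matvec, z, num_steps):
    """Approximate Sigma^{1/2} z using only products v -> Sigma v."""
    n = z.shape[0]
    beta0 = np.linalg.norm(z)
    V = np.zeros((n, num_steps))
    alphas = np.zeros(num_steps)
    betas = np.zeros(max(num_steps - 1, 0))
    V[:, 0] = z / beta0
    k = num_steps
    for j in range(num_steps):
        w = matvec(V[:, j])                 # one covariance-vector product per step
        alphas[j] = V[:, j] @ w
        # Full reorthogonalisation against all Lanczos vectors built so far.
        w = w - V[:, : j + 1] @ (V[:, : j + 1].T @ w)
        if j + 1 < num_steps:
            beta = np.linalg.norm(w)
            if beta < 1e-12:                # Krylov space exhausted early
                k = j + 1
                break
            betas[j] = beta
            V[:, j + 1] = w / beta
    T = (np.diag(alphas[:k])
         + np.diag(betas[: k - 1], 1)
         + np.diag(betas[: k - 1], -1))
    evals, evecs = np.linalg.eigh(T)
    # T^{1/2} e_1 via the eigendecomposition of the small tridiagonal matrix.
    sqrt_T_e1 = evecs @ (np.sqrt(np.clip(evals, 0.0, None)) * evecs[0, :])
    return beta0 * (V[:, :k] @ sqrt_T_e1)


rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
Sigma = A @ A.T + np.eye(6)                 # hypothetical SPD covariance
z = rng.standard_normal(6)
w_eig, Q = np.linalg.eigh(Sigma)
exact = Q @ (np.sqrt(w_eig) * (Q.T @ z))    # dense reference Sigma^{1/2} z
for steps in (1, 3, 6):
    approx = lanczos_sqrt_apply(lambda v: Sigma @ v, z, steps)
    print(steps, np.linalg.norm(approx - exact))
```

With as many steps as dimensions the approximation is exact up to rounding; the abstract's claim concerns how quickly the error falls for far fewer steps.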

2605.22719 2026-05-22 cs.LG Version update

Reading Task Failure Off the Activations: A Sparse-Feature Audit of GPT-2 Small on Indirect Object Identification

Mahdi Nasermoghadasi

Affiliations: Research Division, BrightMind AI; Texas Tech University; University of Texas at Arlington

AI summary: This study audits which sparse-autoencoder features of GPT-2 small differ between failed and successful trials of the Indirect Object Identification task. It finds specific features that are strongly correlated with task failure and uses several control experiments to show that the relationship is correlational rather than causal.

Comments: 10 pages, 7 figures

Abstract

We report a small, reproducible audit of which sparse-autoencoder (SAE) features of GPT-2 small fire differently on failed versus successful trials of the Indirect Object Identification (IOI) task. On 300 prompts, GPT-2 small reaches 79.7% accuracy; 146 of the 24,576 features in the layer-8 residual-stream SAE release of Bloom (2024) clear a Holm-corrected significance threshold and 105 reach a large effect size (|Cohen's d| > 0.8). The strongest single correlate of failure -- feature 17,491, d=+2.93, Neuronpedia label 'cryptographic keys' -- is essentially silent except when the prompt's transferred object is 'the keys,' on which GPT-2 small fails 93.3% of the time vs. 7.5% on the other seven objects (Fisher exact p = 8.79 x 10^-33). We put this correlate through three controls that a mechanistic claim should pass. (i) A causal ablation: zeroing feature 17,491 in the residual stream across all token positions of the 45 keys prompts does not restore accuracy (6.7% -> 4.4%); the feature is a correlate, not a sufficient cause at this layer. (ii) A representation baseline: a logistic regression on the raw 768-dimensional residual stream reaches 5-fold ROC AUC = 0.929, matching the top-100 SAE features (0.927); the SAE basis adds interpretability, not predictive power. (iii) A seed-robustness check: across five random seeds the keys-subset failure rate stays in 75.0--93.3% (the behavioural effect is real), but feature 17,491 is the top-|d| feature in only 1 of 5 runs. The methodological contribution is therefore the audit pipeline (cheap, model-agnostic, surfaces named correlates) rather than any single feature found through it. We release the code, the 300-prompt corpus, the 300x24,576 activation matrix, the ablation and baseline scripts, and the figures. The full pipeline runs on a laptop (Apple M3 Max, no discrete GPU).
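
The headline numbers in this abstract are mutually consistent, which a few lines of arithmetic confirm. The only assumption is that the seven non-keys objects account for the remaining 255 of the 300 prompts; variable names are illustrative.

```python
# Figures quoted in the abstract.
total_prompts = 300
keys_prompts = 45
keys_failure_rate = 0.933
other_failure_rate = 0.075

# Assumption: every prompt that is not a "keys" prompt uses one of the other objects.
other_prompts = total_prompts - keys_prompts
keys_failures = round(keys_failure_rate * keys_prompts)
other_failures = round(other_failure_rate * other_prompts)

overall_accuracy = 1.0 - (keys_failures + other_failures) / total_prompts
keys_accuracy = 1.0 - keys_failures / keys_prompts
print(keys_failures, other_failures, f"{overall_accuracy:.1%}", f"{keys_accuracy:.1%}")
```

This gives 42 failures on the keys prompts and 19 elsewhere, reproducing the reported 79.7% overall accuracy and the 6.7% pre-ablation accuracy on the keys subset.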

2605.22717 2026-05-22 cs.SD cs.AI cs.LG cs.MM Version update

Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators

Zachary Novack, Stephen Brade, Haven Kim, Hugo Flores García, Nithya Shikarpur, Chinmay Talegaonkar, Suwan Kim, Valerie K. Chen, Julian McAuley, Taylor Berg-Kirkpatrick, Cheng-Zhi Anna Huang

Affiliations: UC San Diego; MIT; Adobe

AI summary: This paper investigates whether audio diffusion models can be repurposed efficiently into interactive models that run on consumer hardware. The proposed Live Music Diffusion Models (LMDMs) use block-wise KV caching to recover and then surpass the inference complexity of discrete Live Music Models (LMMs), and the ARC-Forcing paradigm enables stable post-training alignment that reduces error accumulation without explicit RL or reward models.

Abstract

Interactive streaming music generation promises the use of generative models for live performance and co-creation that is impossible with offline models. However, SOTA models exist in the discrete-AR regime, requiring industrial levels of compute for both training and inference. In this work, we investigate whether audio diffusion models, with their wide support in the open-source community but non-streaming bidirectional nature, can be repurposed efficiently into interactive models accessible on consumer hardware. By taking a critical look at the modern pipeline for block-wise outpainting diffusion, we identify critical inefficiencies during inference that result in strictly worse computational efficiency than their discrete-AR counterparts. We propose Live Music Diffusion Models (LMDMs), a simple modification of the generative diffusion process that recovers, and then outperforms, the inference complexity of the discrete Live Music Models (LMMs) through block-wise KV Caching. Unlike LMMs, LMDMs further enable stable post-training alignment through our novel ARC-Forcing paradigm, reducing error accumulation without any explicit RL or reward models. We demonstrate the application of LMDMs in a number of creative domains, including text-conditioned generation, sketch-based music synthesis, and jamming. We finally show how LMDMs can be used as a generative instrument in a real artist-AI collaboration, utilizing LMDMs as a "generative delay" to transform musicians' improvisation live for variable timbral effects while running locally on a consumer gaming laptop.

2605.22711 2026-05-22 cs.LG cs.AI 版本更新

Abstraction for Offline Goal-Conditioned Reinforcement Learning

离线目标条件强化学习中的抽象

Clarisse Wibault, Alexander Goldie, Antonio Villares, Maike Osborne, Jakob Foerster

发表机构 * FLAIR, MLRG, University of Oxford（牛津大学 FLAIR、MLRG）

AI总结本文提出了一种在离线目标条件强化学习中利用抽象的方法，通过引入相对化选项和不同层次的表示，提高了在相似状态空间上下文中的经验复用能力，从而提升了性能。

2605.22703 2026-05-22 cs.LG 版本更新

Clipping Bottleneck: Stabilizing RLVR via Stochastic Recovery of Near-Boundary Signals

剪裁瓶颈：通过近边界信号的随机恢复稳定RLVR

Shuo Yang, Jinda Lu, Chiyu Ma, Kexin Huang, Haoming Meng, Qihui Zhang, Yuyang Liu, Bolin Ding, Guoyin Wang, Li Yuan, Jingren Zhou

发表机构 * Alibaba Group（阿里巴巴集团）

AI总结本文研究了强化学习可验证奖励（RLVR）中由于硬剪裁决策导致的训练不稳定问题，提出了一种名为近边界随机救援（NSR）的简单方法，通过随机保留略微超出边界范围的token来恢复丢失的信号，从而提升训练稳定性和性能。

AI中文摘要

强化学习可验证奖励（RLVR）已成为扩展大语言模型推理能力的核心范式，但其优化过程常常受到训练不稳定和收敛次优的问题影响。通过系统分析基于剪裁的GRPO类目标，我们发现由硬剪裁引起的刚性剪裁决策是所研究的RLVR设置中的关键实际瓶颈。具体而言，我们的分析表明，信息信号可能位于剪裁阈值之外的近边界区域，因此被标准硬剪裁规则所丢弃。值得注意的是，一旦这个瓶颈被精确识别，即使在边界处进行简单的随机扰动也能恢复有意义的性能提升。基于这一发现，我们提出了近边界随机救援（NSR），一种最小、即插即用的修改方法，通过随机保留略微超出边界范围的token来恢复丢失的信号。虽然NSR通过随机采样可以被解释为在期望上诱导隐含梯度衰减，但我们的消融实验表明，其随机的边界局部救援机制在一致性上比确定性梯度衰减更有效。通过在7B到30B规模以及密集和MoE架构上的广泛实验验证，作为即插即用的解决方案，NSR显著提高了训练稳定性，并在DAPO和GSPO等强基线模型上实现了持续的性能提升。

英文摘要

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a central paradigm for scaling LLM reasoning, yet its optimization often suffers from training instability and suboptimal convergence. Through a systematic dissection of clipping-based GRPO-style objectives, we identify the rigid clipping decision induced by hard clipping as a key practical bottleneck in the studied RLVR setups. Specifically, our analysis suggests that informative signals can lie in the near-boundary region just beyond the clipping threshold, and are therefore discarded by the standard hard-clipping rule. Notably, once this bottleneck is precisely identified, even simple stochastic perturbations at the boundary can recover meaningful performance gains. Building on this finding, we propose Near-boundary Stochastic Rescue (NSR), a minimal, plug-and-play modification that stochastically retains these slightly out-of-bound tokens to recover lost signals. While NSR, via stochastic sampling, can be interpreted as inducing an implicit gradient decay in expectation, our ablations reveal that its stochastic, boundary-local rescue mechanism is consistently more effective than deterministic gradient decay. Validated by extensive experiments across model sizes from 7B to 30B and both dense and MoE architectures, as a plug-and-play solution, NSR substantially improves training stability and delivers consistent gains over strong baselines such as DAPO and GSPO.
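The difference between hard clipping and a stochastic near-boundary rescue can be illustrated with a toy token-level gate. This is a minimal sketch, not the authors' implementation: the function names, the `margin` width of the near-boundary band, and the `p_rescue` retention probability are assumptions introduced here for illustration, since the abstract does not state how NSR sets them.

```python
import random


def overshoot(ratio, advantage, eps=0.2):
    """Distance by which the importance ratio lies beyond the clip bound that
    matters for this advantage sign (0.0 when the token is inside the bound)."""
    if advantage > 0:
        return max(0.0, ratio - (1.0 + eps))
    if advantage < 0:
        return max(0.0, (1.0 - eps) - ratio)
    return 0.0


def hard_clip_keeps(ratio, advantage, eps=0.2):
    """Standard hard clipping: the token's gradient passes only inside the bound."""
    return advantage != 0 and overshoot(ratio, advantage, eps) == 0.0


def rescue_keeps(ratio, advantage, rng, eps=0.2, margin=0.05, p_rescue=0.5):
    """Hypothetical rescue rule: tokens slightly beyond the bound (within
    `margin`) are kept with probability `p_rescue`; farther tokens are dropped."""
    if advantage == 0:
        return False
    over = overshoot(ratio, advantage, eps)
    if over == 0.0:
        return True                       # inside the trust region, always kept
    if over <= margin:
        return rng.random() < p_rescue    # near-boundary band, kept at random
    return False                          # far out of bounds, still discarded


rng = random.Random(0)
kept = sum(rescue_keeps(1.22, 1.0, rng) for _ in range(1000))
print(f"near-boundary token kept in {kept} of 1000 draws")
```

In expectation a near-boundary token's gradient is scaled by `p_rescue`, which is one way to read the abstract's remark that the rescue behaves like an implicit gradient decay.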

2605.22691 2026-05-22 cs.LG cond-mat.stat-mech 版本更新

Posterior Collapse as Automatic Spectral Pruning

后验坍缩作为自动谱剪枝

Johannes Hirn

发表机构 * Image Processing Laboratory (IPL), Universitat de València, Paterna, València 46980, Spain（图像处理实验室（IPL），瓦伦西亚大学，帕特尔纳，瓦伦西亚 46980，西班牙）

AI总结本文研究了β-VAE中的后验坍缩现象，揭示其本质上是一种自动谱剪枝过程，通过分析不同β值下的均衡解，展示了潜在模式从最不有用的到最有用的逐步解耦的崩溃过程。

2605.22679 2026-05-22 cs.CV cs.LG 版本更新

Conceptualizing Embeddings: Sparse Disentanglement for Vision-Language Models

将嵌入概念化：面向视觉-语言模型的稀疏解缠

Piotr Kubaty, Patryk Marszałek, Łukasz Struski, Adam Wróbel, Jacek Tabor, Marek Śmieja

发表机构 * Faculty of Mathematics and Computer Science, Jagiellonian University（雅盖隆大学数学与计算机科学学院）； Doctoral School of Exact and Natural Sciences, Jagiellonian University（雅盖隆大学精确与自然科学博士生院）； Centre for Credible AI, Warsaw University of Technology（华沙理工大学可信人工智能中心）

AI总结本文提出CEDAR方法，通过稀疏解缠技术在不增加维度的情况下揭示预训练嵌入的组成结构，从而提升视觉-语言模型的可解释性和与人类感知的一致性。

AI中文摘要

视觉-语言模型学习了强大的多模态嵌入，但其内部语义仍不透明。尽管稀疏自编码器（SAEs）可以提取可解释的特征，但它们依赖于扩展表示维度，这会破坏原始几何结构并引入冗余。我们引入CEDAR（Conceptual Embedding Disentanglement via Adaptive Rotation，通过自适应旋转进行概念嵌入解缠），一种事后方法，能够在不增加维度的情况下揭示预训练嵌入的组合结构。通过学习带有top-k稀疏瓶颈的可逆变换，CEDAR将语义信息集中到轴对齐的解缠坐标中。在类CLIP架构中，单个坐标可以用文本概念来解释，而对于BLIP等生成模型，它们可以被解码为自然语言描述。实验表明，CEDAR在重建-稀疏性权衡方面具有竞争力，同时产生更可解释且更符合人类感知的解释。我们的结果表明，视觉-语言表示中表面上的纠缠可以通过适当的基变换来消解，从而不再需要过完备扩展。

英文摘要

Vision-language models learn powerful multimodal embeddings, yet their internal semantics remain opaque. While sparse autoencoders (SAEs) can extract interpretable features, they rely on expanding the representation dimension, which compromises the original geometry and introduces redundancy. We introduce CEDAR (Conceptual Embedding Disentanglement via Adaptive Rotation), a post-hoc method that reveals the compositional structure of pretrained embeddings without increasing dimensionality. By learning an invertible transformation with a top-$k$ sparsity bottleneck, CEDAR concentrates semantic information into axis-aligned disentangled coordinates. In CLIP-like architectures, individual coordinates can be interpreted with textual concepts, while for generative models such as BLIP, they can be decoded into natural language descriptions. Experiments demonstrate that CEDAR achieves a competitive reconstruction-sparsity trade-off while producing explanations that are more interpretable and better aligned with human perception. Our results suggest that the apparent entanglement in vision-language representations can be resolved through a suitable change of basis, eliminating the need for overcomplete expansions.
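The idea that a change of basis plus a top-$k$ bottleneck can expose sparse structure without adding dimensions can be seen in a two-dimensional toy. This is only a sketch under stated assumptions: a fixed, hand-picked rotation angle stands in for the learned invertible transformation, and the helper names (`rotate`, `top_k`, `encode`, `decode`) are hypothetical, not taken from CEDAR.

```python
import math


def rotate(vec, theta):
    """Apply a 2-D rotation; rotations are invertible and keep dimensionality."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = vec
    return [c * x - s * y, s * x + c * y]


def top_k(vec, k):
    """Keep the k largest-magnitude coordinates and zero out the rest."""
    keep = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k]
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]


def encode(x, theta, k):
    """Rotate into the candidate basis, then apply the top-k bottleneck."""
    return top_k(rotate(x, theta), k)


def decode(z, theta):
    """Invert the rotation to return to the original embedding space."""
    return rotate(z, -theta)


theta = math.pi / 6                # assumed angle; a real method would learn it
x = decode([2.0, 0.0], theta)      # looks dense (entangled) in the original axes
z = encode(x, theta, k=1)          # one active coordinate after the rotation
x_hat = decode(z, theta)           # reconstruction in the original space
print("embedding:", x)
print("sparse code:", z)
print("reconstruction:", x_hat)
```

The embedding has two non-zero coordinates in its original axes but a single active coordinate after rotation, and the reconstruction matches the input up to floating-point error because the rotation is exactly invertible.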

2605.22666 2026-05-22 math.CO cs.LG math.PR 版本更新

Holographic functions and neural networks

全息函数与神经网络

Balazs Szegedy

发表机构 * Rényi Institute of Mathematics（雷尼数学研究所）

AI总结本文研究了全息函数的复杂性，通过三种不同方法（采样性质、结构性质和计算性质）探讨了全息函数的复杂性界限，并证明了这三种性质在参数上是等价的。

2605.22658 2026-05-22 cs.CV cs.LG cs.MM eess.IV 版本更新

SegCompass: Exploring Interpretable Alignment with Sparse Autoencoders for Enhanced Reasoning Segmentation

SegCompass: 探索通过稀疏自编码器实现可解释对齐以增强推理分割

Zhenyu Lu, Liupeng Li, Jinpeng Wang, Haoqian Kang, Yan Feng, Ke Chen, Yaowei Wang

发表机构 * Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences（深圳先进技术研究院，中国科学院）； Peng Cheng Laboratory（鹏城实验室）； Harbin Institute of Technology, Shenzhen（哈尔滨工业大学（深圳））； Meituan, Beijing（美团，北京）； University of Chinese Academy of Sciences（中国科学院大学）； College of Computer Science and Technology, Jilin University（吉林大学计算机科学与技术学院）

AI总结本文提出SegCompass，一种通过稀疏自编码器实现可解释对齐的端到端模型，以提升推理分割的性能和可解释性。

Comments Accepted by CVPR 2026. 15 pages, 9 figures, 6 tables

AI中文摘要

尽管大语言模型提供了强大的组合推理能力，但现有推理分割流程未能清晰地将这种推理与视觉感知连接起来。当前方法，如潜在查询对齐，虽然端到端但却是不透明的“黑箱”。相反，文本定位读出仅可读但不真正可解释，通常作为无约束的后处理步骤。为弥合这一可解释性差距，我们提出了SegCompass，一种端到端模型，利用稀疏自编码器（SAE）建立一个显式、可解释且可微的对齐路径。给定一个图像-指令对，SegCompass首先生成一个思维链（CoT）轨迹。该方法的核心是一个将CoT和视觉标记映射到共享高维稀疏概念空间的SAE。一个查询代码本从该空间中选择显著概念，然后通过槽映射器在空间上定位到多槽热图，引导最终的掩码解码器。整个模型联合训练，将强化学习用于推理路径与标准分割监督相结合。这种由SAE驱动的接口提供了显著比潜在查询更可追溯的“白盒”连接，比文本读出更连贯。在五个具有挑战性的基准测试中，SegCompass匹配或超越了最先进的性能。关键的是，我们的视觉和定量分析显示，所学稀疏概念的质量与最终掩码准确性之间存在强相关性，证实了SegCompass通过其增强且可检查的对齐实现了优越的结果。代码可在https://github.com/ZhenyuLU-Heliodore/SegCompass获取。

英文摘要

While large language models provide strong compositional reasoning, existing reasoning segmentation pipelines fail to transparently connect this reasoning to visual perception. Current methods, such as latent query alignment, are end-to-end yet opaque "black boxes". Conversely, textual localization readout is merely readable, not truly interpretable, often functioning as an unconstrained post-hoc step. To bridge this interpretability gap, we propose SegCompass, an end-to-end model that leverages a Sparse Autoencoder (SAE) to forge an explicit, interpretable, and differentiable alignment pathway. Given an image-instruction pair, SegCompass first generates a chain-of-thought (CoT) trace. The core of our method is an SAE that maps both the CoT and visual tokens into a shared, high-dimensional sparse concept space. A query codebook selects salient concepts from this space, which are then spatially grounded by a slot mapper into a multi-slot heatmap that guides the final mask decoder. The entire model is trained jointly, unifying reinforcement learning for the reasoning path with standard segmentation supervision. This SAE-driven interface provides a "white-box" connection that is significantly more traceable than latent queries and more coherent than textual readouts. Extensive experiments on five challenging benchmarks demonstrate that SegCompass matches or surpasses state-of-the-art performance. Crucially, our visual and quantitative analyses show a strong correlation between the quality of the learned sparse concepts and final mask accuracy, confirming that SegCompass achieves superior results through its enhanced and inspectable alignment. Code is available at https://github.com/ZhenyuLU-Heliodore/SegCompass.

2605.22653 2026-05-22 cs.DS cs.LG 版本更新

医疗LLM基准测试的可靠性仅取决于其显式假设

Naveen Raman, Santiago Cortes-Gomez, Mateo Dulce Rubio, Fei Fang, Bryan Wilder

发表机构 * Carnegie Mellon University（卡内基梅隆大学）； New York University（纽约大学）

AI总结本文提出医疗LLM基准测试的评估-部署差距源于隐式假设，而非基准设计问题，并通过BenchmarkCards和分阶段评估方法来解决这一问题。

Comments 13 pages, 1 figure

AI中文摘要

基准测试对于医疗评估是必要的，但不足以预测部署性能。我们的观点是，评估-部署差距并非源于基准设计不当，而是源于关于用户如何与模型交互的隐式假设，这些假设无法仅通过基准测试本身来揭示。为了使这一观点更明确，我们提出了将假设分为两类的分类：任务假设，可通过对话数据单独测试；以及结果假设，需要结果数据和行为研究来测试。关键的是，结果假设依赖于人类行为，即使设计良好的基准也无法直接观察。为了证明该框架的实用性，我们回顾性分析了一个医疗RCT作为案例研究，并发现差距自然分为大致相等的任务和结果差距。为此，我们做出了两项贡献：首先，我们提出BenchmarkCards，一种记录假设的工具；其次，我们提出分阶段评估，一种系统测试假设并评估性能的程序。

英文摘要

Benchmarks are necessary for healthcare evaluation, but are not sufficient for predicting deployment performance. Our position is that the evaluation-deployment gap arises not because of poorly designed benchmarks, but from implicit assumptions about how users interact with models that cannot be surfaced from benchmarks alone. To make this precise, we propose a classification of assumptions into two categories: task, which can be tested from conversation data alone, and outcome, which requires outcome data and behavioral studies for testing. Critically, outcome assumptions depend on human behavior, something that even well-designed benchmarks cannot directly observe. To demonstrate the operationality of this framework, we retrospectively analyze a healthcare RCT as a case study and find that the gap naturally separates into task and outcome gaps of roughly equal size. To address this, we make two contributions: first, we propose BenchmarkCards, an artifact that documents assumptions, and second, we propose staged evaluation, a procedure that systematically tests assumptions and evaluates performance.

2605.22611 2026-05-22 cs.LG 版本更新

Benchmarking Machine Learning Architectures for Antimicrobial Stewardship in Pediatric ICUs

对儿科ICU中抗菌药物使用管理的机器学习架构进行基准测试

Niklas Raehse, Luregn J. Schlapbach, Daphné Chopard

发表机构 * Department of Intensive Care and Neonatology and Children’s Research Center University of Zurich University Children’s Hospital Zurich（重症护理与新生儿科及儿童研究中心，苏黎世大学苏黎世儿童医院）； Department of Health Sciences and Technology ETH Zurich（健康科学与技术系，苏黎世联邦理工学院）； Department of Computer Science ETH Zurich（计算机科学系，苏黎世联邦理工学院）

AI总结本研究针对儿科ICU中抗菌药物使用管理的机器学习模型进行基准测试，通过公共数据集和私人机构队列系统评估了四种临床相关的目标，发现预测性能主要由目标流行率和数据集特征决定，而非模型复杂度，序列模型在粗粒度下提升了精度-召回权衡，但细粒度建模带来的收益有限，且校准效果较差。

Comments 16 pages, 6 figures, code: https://anonymous.4open.science/r/AMS_intervention_prediction-C024

AI中文摘要

抗菌药物使用管理（AMS）在儿科重症监护室（PICUs）中至关重要，其中诊断不确定性常导致广谱抗生素使用，增加抗菌药物耐药性和潜在的长期危害。机器学习为从电子健康记录数据中识别患者层面的使用管理干预机会提供了有前途的方法，但以往研究主要集中在成人群体和静态表格表示上。我们展示了在PICU中对AMS干预预测的系统性基准研究，涵盖了公共数据集和私人机构队列。我们定义了四个临床相关的代理目标以减少抗生素暴露：静脉到口服转换、降级、停用和短程治疗。在统一的评估框架下，我们比较了表格、基于序列和基于图的时序模型在多个时间分辨率下的表现。我们发现，预测性能主要由目标流行率和数据集特征驱动，而非模型复杂度。序列模型在粗粒度（24小时）下比表格方法在精度-召回权衡上有所提升，而更精细的时间建模提供有限的额外收益。然而，这些收益是以较差的校准为代价的，更简单的表格模型产生更可靠的概率估计。多任务学习仅产生微小改进，表明在使用管理目标之间共享结构有限。我们的发现强调了目标设计、时间表示和校准在临床机器学习中的重要性，并为开发可靠的决策支持系统提供实用指导。

英文摘要

Antimicrobial stewardship (AMS) is critical in pediatric intensive care units (PICUs), where diagnostic uncertainty often drives broad-spectrum antibiotic use, increasing antimicrobial resistance and potential long-term harms. Machine learning offers a promising approach for identifying patient-level opportunities for stewardship interventions from electronic health record data, yet prior work has focused largely on adult populations and static tabular representations. We present a systematic benchmarking study of AMS intervention prediction in the PICU across a public dataset and a private institutional cohort. We define four clinically relevant proxy targets for reducing antibiotic exposure: intravenous-to-oral switching, de-escalation, discontinuation, and short-course therapy. Under a unified evaluation framework, we compare tabular, sequence-based, and graph-based temporal models at multiple temporal resolutions. We find that predictive performance is driven primarily by target prevalence and dataset characteristics rather than model complexity. Sequence models improve the precision-recall trade-off over tabular approaches at coarse (24-hour) resolution, while finer temporal modeling provides limited additional benefit. However, these gains come at the cost of poorer calibration, with simpler tabular models yielding more reliable probability estimates. Multi-task learning produces only marginal improvements, suggesting limited shared structure across stewardship targets. Our findings highlight the importance of target design, temporal representation, and calibration in clinical machine learning, and provide practical guidance for developing reliable decision support systems for pediatric AMS.
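The distinction the abstract draws between the precision-recall trade-off and calibration can be made concrete with standard metrics. The sketch below uses textbook definitions (Brier score, equal-width-bin expected calibration error, thresholded precision and recall); it is not the paper's evaluation code, and the example probabilities are made up for illustration.

```python
def brier_score(probs, labels):
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs, labels, num_bins=10):
    """Weighted average gap between mean confidence and observed frequency,
    computed over equal-width probability bins."""
    bins = [[] for _ in range(num_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * num_bins), num_bins - 1)].append((p, y))
    total = len(probs)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        confidence = sum(p for p, _ in items) / len(items)
        frequency = sum(y for _, y in items) / len(items)
        ece += len(items) / total * abs(frequency - confidence)
    return ece


def precision_recall(probs, labels, threshold=0.5):
    """Precision and recall after thresholding the predicted probabilities."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Illustrative, made-up predictions: an overconfident model on a 50% prevalence target.
probs = [0.9] * 10
labels = [1, 0] * 5
print(brier_score(probs, labels))
print(expected_calibration_error(probs, labels))
print(precision_recall(probs, labels))
```

A model can rank patients reasonably well and still report probabilities far from observed frequencies, which is why the two kinds of metric are reported separately.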

2605.22604 2026-05-22 cs.CR cs.AI cs.LG cs.SE 版本更新

Innovations in Cardless Artificial Intelligence Banking: A Comprehensive Framework for Cyber Secure and Fraud Mitigation using Machine Learning Algorithms

无卡人工智能银行业创新：基于机器学习算法的全面框架用于网络安全与欺诈防范

Md Israfeel

发表机构 * Computer Engineering, University of Central Florida, Orlando, Florida, USA

AI总结本文提出了一种全面的框架，利用机器学习算法增强无卡人工智能银行系统的网络安全和欺诈防范能力，通过AI驱动的数据加密生成虚拟卡，减少信息泄露风险。

AI中文摘要

无卡人工智能（AI）银行业的发展标志着金融领域的一次范式转变，为用户提供前所未有的安全性和便利性。本文概述了一个全面的框架，旨在增强网络安全，引入自动生成的虚拟卡，并在无卡AI银行系统中减轻欺诈风险。该框架设想了一种未来银行架构，利用AI驱动的数据加密技术来创建安全的虚拟卡以实现无缝交易。通过强调安全的通信渠道，它确保了银行系统、持卡人和第三方供应商之间的金融活动的完整性。基于AI的授权方法在验证每一笔交易的同时，主动识别潜在欺诈，展示了该框架在加强无卡AI银行业安全方面的有效性。初始方法，包含一个AI驱动的基于特征的银行系统，确保生成带有加密数据的虚拟卡，减少信息暴露并降低欺诈风险。整合机器学习算法为潜在的欺诈活动增加了一层保护。最后，所提出的框架为无卡AI银行系统建立了一个全面的网络安全和欺诈防范范式。其实施使金融机构能够应对传统银行业相关的安全问题，为一个不仅抗欺诈而且对用户安全和方便的未来银行业景观铺平道路。

英文摘要

The advent of cardless artificial intelligence (AI) banking heralds a paradigm shift in the financial landscape, offering users unprecedented security and convenience. This paper outlines a comprehensive framework designed to enhance cybersecurity, introduce auto-generated virtual cards, and mitigate fraud risks within cardless AI banking systems. The framework envisions a future banking architecture that employs AI-powered data cryptography to create secure virtual cards for seamless transactions. By emphasizing secure communication channels, it ensures the integrity of financial activities among banking systems, cardholders, and third-party vendors. AI-based authorization methodologies play a pivotal role in authenticating each transaction while proactively identifying potential fraud, demonstrating the framework's efficacy in fortifying cardless AI banking security. The initial approach, featuring an AI-driven, feature-based banking system, ensures the generation of virtual cards with encrypted data, minimizing information exposure and reducing fraud risks. Integrating a machine learning algorithm adds an additional layer of protection against potential fraudulent activities. In conclusion, the proposed framework establishes a holistic cybersecurity and fraud-mitigation paradigm for cardless AI banking systems. Its implementation empowers financial institutions to address security concerns associated with traditional banking, paving the way for a future banking landscape that is not only fraud-resistant but also secure and convenient for users.

2605.22597 2026-05-22 cs.LG cs.AI cs.GR cs.RO 版本更新

MoSA: Motion-constrained Stress Adaptation for Mitigating Real-to-Sim Gap in Continuum Dynamics via Learning Residual Anisotropy

MoSA: 通过学习残余各向异性来缓解连续动力学中现实到模拟差距的运动约束应力适应

Jiaxu Wang, Junhao He, Jingkai Sun, Yi Gu, Yunyang Mo, Jiahang Cao, Qiang Zhang, Renjing Xu

发表机构 * Hong Kong University of Science（香港科学大学）； MMLab, Chinese University of Hong Kong, Hong Kong SAR（香港中文大学MMLab, 香港特别行政区）； The University of Hong Kong, Hong Kong SAR（香港大学, 香港特别行政区）

AI总结本文提出MoSA框架，通过运动约束应力适应来缓解连续动力学中现实到模拟差距，利用各向同性模型作为物理先验，并学习残余应力算子以捕捉轻微各向异性和非均匀性，最终在机器人操作中验证了其有效性。

Journal ref International Conference on Machine Learning 2026

AI中文摘要

从视觉观测中学习现实世界的动力学对于各种领域至关重要。一种常见策略是通过估计物理参数来校准模拟器，但准确性最终受限于底层物理模型，这些模型通常假设材料是均质且各向同性的。即使合理，现实中的物体通常表现出轻微的各向异性和非均匀性。在近各向同性的骨架良好校准后，这些残余效应成为进一步缩小现实到模拟差距的关键瓶颈。虽然神经网络可以端到端地拟合动力学，但这种黑盒建模会丢弃强物理先验，导致数据效率低和过拟合。因此，我们提出了MoSA，一种运动约束应力适应框架，旨在针对这些残余效应以进一步提高现实到模拟动力学学习。MoSA使用各向同性模型作为物理先验，并学习残余应力算子以捕捉轻微各向异性和非均匀性。它通过微平面约束的再分布逐步适应应力，在一个物理指导的级联网络中。我们进一步通过监督变形场的时空导数来施加运动约束。实验表明，我们学习的动力学在准确性、泛化性和鲁棒性方面均优于现有方法，同时学习了具有物理意义的残余各向异性。最后，我们在机器人操作设置中验证了MoSA，显示更好的现实到模拟动力学建模能够转化为更可靠的模拟到现实转移。项目页面可在https://mercerai.github.io/MoSA/上获取。

英文摘要

Learning real-world dynamics from visual observations is crucial for various domains. A common strategy is to calibrate simulators by estimating physical parameters, yet accuracy is ultimately bounded by the underlying physical models, which often assume materials are homogeneous and isotropic. Even if reasonable, real-world objects typically exhibit mild anisotropy and heterogeneity. After the near-isotropic backbone is well calibrated, these residual effects become the key bottleneck for further closing the real-to-sim gap. Although neural networks can fit dynamics end-to-end, such black-box modeling discards strong physical priors, leading to poor data efficiency and overfitting. Therefore, we propose MoSA, a motion-constrained stress adaptation framework that targets these residual effects to further improve real-to-sim dynamics learning. MoSA uses an isotropic model as a physics prior and learns residual stress operators to capture mild anisotropy and heterogeneity. It progressively adapts stresses via microplane-constrained redistribution in a physics-informed cascaded network. We further impose motion constraints by supervising temporal and spatial derivatives of the deformation field. Experimentally, our learned dynamics achieves superior accuracy, generalization, and robustness, while learning physically meaningful residual anisotropy. Finally, we validate MoSA in a robot manipulation setting, showing that better real-to-sim dynamics modeling translates into more reliable sim-to-real transfer. Project Page is available at https://mercerai.github.io/MoSA/.

2605.22596 2026-05-22 cs.LG 版本更新

Factored Diffusion Policies: Compositionally Generalized Robot Control with a Single Score Network

因子化扩散策略：基于单一分数网络的组合泛化机器人控制

Sayan Mitra, Ege Yuceel, Noah Giles, Abhishek Pai

发表机构 * University of Illinois Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校）

AI总结本文提出了一种因子扩散策略，通过单一共享的扩散网络实现通用机器人控制，该网络在推理时能将分数分解为各因子的加法形式，从而在训练任务预算上从因子基数的乘积减少到求和，通过轨迹管证书将分数界转化为闭环状态轨迹管，实验验证了其泛化界和证书的有效性。

AI中文摘要

机器人任务通常由一组因子来指定，例如要抓取的对象、要避开的障碍物、目标的颜色等。为每一种因子取值组合收集专家示范，其数量会呈组合式增长。我们提出了因子化扩散策略：一个单一共享的扩散网络，通过对每个因子施加空标记dropout进行训练，其分数在推理时可以跨因子加性分解。在给定动作-观测对时各因子近似条件独立的前提下，这种组合以一致有界的误差近似真实联合分数，从而将训练任务预算从因子基数的乘积减少到求和。轨迹管证书将这一分数层面的界通过反向时间采样ODE和一个收缩的跟踪控制器，传递为闭环状态轨迹管，其半径可分解为一个ODE敏感性常数和每个因子的分数误差预算。不同于将单独训练的网络组合在一起的控制类组合扩散方法，我们只使用一个共享网络。无人机竞速实验验证了泛化界和证书。在基于状态的多门竞速中，因子化策略通过了90%的留出门（与oracle持平），而K网络组合基线则跌至3%；在基于视觉的单门穿越中，它能够零样本迁移到未见过的场地，成功率提升11.7个百分点，碰撞率降低2.4倍。

英文摘要

Robotic tasks are typically specified by a tuple of factors, such as the object to be grasped, the obstacles to be avoided, the color of the target, and so on. Collecting expert demonstrations for every combination of factor values grows combinatorially. We present factored diffusion policies: a single shared diffusion network trained with per-factor null-token dropout, whose score decomposes additively across factors at inference. Under approximate conditional independence between factors given the action-observation pair, this composition approximates the true joint score with a bounded uniform error, reducing the training-task budget from a product of factor cardinalities to a sum. A trajectory-tube certificate chains this score-level bound through the reverse-time sampling ODE and a contracting tracking controller into a closed-loop state-trajectory tube whose radius factors into an ODE-sensitivity constant and a per-factor score-error budget. Unlike compositional-diffusion methods for control that combine separately trained networks, we use one shared network. Drone racing experiments confirm both the generalization bound and the certificate. On state-based multi-gate racing, the factored policy passes 90% of held-out gates, matching an oracle, while a K-network composition baseline collapses to 3%; on vision-based single-gate traversal, it transfers zero-shot to an unseen venue with a +11.7pp success-rate gain and a 2.4x crash-rate reduction.
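The additive score decomposition can be checked on a toy problem where the conditional-independence assumption holds exactly. The sketch below is an illustration under assumptions that are not from the paper: a scalar action with a standard normal prior, each factor observed as the action plus independent Gaussian noise, and closed-form Gaussian scores standing in for the shared network's null-conditioned and single-factor outputs. All function names are hypothetical.

```python
import math


def gaussian_score(a, mean, var):
    """Score (gradient of the log density with respect to a) of N(mean, var)."""
    return -(a - mean) / var


def posterior(cs, noise_vars):
    """Posterior over a for prior N(0, 1) and observations c_i = a + N(0, var_i)."""
    precision = 1.0 + sum(1.0 / v for v in noise_vars)
    mean = sum(c / v for c, v in zip(cs, noise_vars)) / precision
    return mean, 1.0 / precision


def joint_score(a, cs, noise_vars):
    """Score of the action conditioned on all factors at once."""
    mean, var = posterior(cs, noise_vars)
    return gaussian_score(a, mean, var)


def composed_score(a, cs, noise_vars):
    """Null score plus one single-factor correction per factor."""
    null = gaussian_score(a, 0.0, 1.0)       # every factor replaced by the null token
    total = null
    for c, v in zip(cs, noise_vars):
        mean, var = posterior([c], [v])      # only this factor is conditioned on
        total += gaussian_score(a, mean, var) - null
    return total


def task_budgets(cardinalities):
    """Training tasks needed for all combinations versus one sweep per factor."""
    return math.prod(cardinalities), sum(cardinalities)


a, cs, noise_vars = 0.3, [1.0, -2.0, 0.5], [0.5, 2.0, 1.0]
print(joint_score(a, cs, noise_vars), composed_score(a, cs, noise_vars))
print(task_budgets([5, 4, 3]))
```

In this toy the two scores agree up to rounding because the factors are exactly conditionally independent given the action; the abstract's bounded error covers the case where that independence is only approximate. The last line shows the product-versus-sum budget for three factors with 5, 4 and 3 values.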

2605.22593 2026-05-22 cs.LG 版本更新

SynAE: 一个用于评估工具调用代理合成数据质量的框架

Shuaiqi Wang, Aadyaa Maddi, Zinan Lin, Giulia Fanti

发表机构 * Carnegie Mellon University（卡内基梅隆大学）； Microsoft Research（微软研究院）

AI总结本文提出SynAE框架，用于评估多轮工具调用代理合成数据的质量，通过四个指标类别评估合成数据的有效性、保真度和多样性，揭示单一指标不足以全面表征合成数据质量。

AI中文摘要

如今，工具调用代理通常在静态执行轨迹数据集上进行评估或测试，包括输入命令、代理响应和相关工具调用。然而，内部生产数据集往往不足或无法使用；例如，它们可能包含敏感或专有数据，或过于稀疏，无法支持全面测试（尤其是预部署前）。在这些情况下，实践者越来越多地用合成数据替代或补充真实数据进行评估。关键挑战是量化这些合成数据集与真实数据之间的关系。我们介绍了SynAE，一个用于评估多轮工具调用代理合成基准如何复制和增强真实数据轨迹特征的评估框架。SynAE在四个指标类别中评估合成数据的效度、保真度和多样性：（i）任务指令和中间响应，（ii）工具调用，（iii）最终输出，（iv）下游评估。我们通过近期代理基准评估SynAE，并通过现实且受控的生成方案测试常见的合成数据失败模式。SynAE能够检测数据效度、保真度和多样性的细粒度变化，并表明没有单一指标足以全面表征合成数据质量，从而推动对合成数据的多轴评估。SynAE的演示可在https://synae-2026-synae-demo.static.hf.space/index.html获取，代码在https://github.com/wsqwsq/SynAE。

英文摘要

Today, tool-calling agents are commonly evaluated or tested on static datasets of execution traces, including input commands, agent responses, and associated tool calls. However, internal production datasets are often insufficient or unusable for testing; for example, they may contain sensitive or proprietary data, or they may be too sparse to support comprehensive testing (especially pre-deployment). In these settings, practitioners are increasingly replacing or augmenting real datasets with synthetic ones for evaluation purposes. A key challenge is quantifying the relation between these synthetic datasets and the real data. We introduce SynAE, an evaluation framework for assessing how well synthetic benchmarks for multi-turn, tool-calling agents replicate and augment the characteristics of real data trajectories. SynAE assesses the validity, fidelity, and diversity of synthetic data across four metric categories: (i) task instructions and intermediate responses, (ii) tool calls, (iii) final outputs, and (iv) downstream evaluation. We evaluate SynAE using recent agent benchmarks and test common synthetic data failure modes via realistic and controlled generation schemes. SynAE detects fine-grained variations in data validity, fidelity and diversity, and shows that no single metric is sufficient to fully characterize synthetic data quality, motivating a multi-axis evaluation of synthetic data for agent testing. A demo of SynAE is available at https://synae-2026-synae-demo.static.hf.space/index.html, with code at https://github.com/wsqwsq/SynAE.

2605.22561 2026-05-22 cs.LG 版本更新

Regret-Based $(\epsilon,\delta)$-optimal Stopping Criteria for Bayesian Optimization

基于遗憾的贝叶斯优化（ε，δ）-最优停止准则

Haowei Wang, Jingyi Wang, Qiyu Wei

发表机构 * National University of Singapore（新加坡国立大学）； Lawrence Livermore National Laboratory（劳伦斯利弗莫尔国家实验室）； The University of Manchester（曼彻斯特大学）

AI总结本文提出了一种基于更紧的高斯过程上置信界（GP-UCB）即时遗憾界限的停止准则，确保在终止时以高概率1-δ获得ε-最优解，并通过数值实验验证其有效性。

Comments 21 pages
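A generic confidence-bound stopping rule gives a feel for what an $(\epsilon,\delta)$-optimal criterion checks. The sketch below is not the paper's criterion, which according to the summary relies on a tighter GP-UCB instantaneous regret bound. It uses the textbook gap between the largest upper and largest lower confidence bound over a finite candidate set, and assumes the posterior means, standard deviations, and the confidence multiplier `beta` come from a Gaussian process fitted elsewhere.

```python
def regret_gap(means, stds, beta):
    """Upper bound on the simple regret of the point with the best lower bound,
    valid whenever every confidence interval mean +/- beta * std holds."""
    best_upper = max(m + beta * s for m, s in zip(means, stds))
    best_lower = max(m - beta * s for m, s in zip(means, stds))
    return best_upper - best_lower


def should_stop(means, stds, beta, epsilon):
    """Stop once the certified regret gap is at most epsilon."""
    return regret_gap(means, stds, beta) <= epsilon


# Hypothetical posterior over two candidates: one poorly explored, one well explored.
means, stds = [0.0, 1.0], [0.5, 0.01]
print(regret_gap(means, stds, beta=2.0))
print(should_stop(means, stds, beta=2.0, epsilon=0.05))
```

If the intervals hold jointly with probability at least 1-δ, the optimum is at most the largest upper bound and the recommended point is at least the largest lower bound, so their difference bounds the regret at termination.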

2605.22556 2026-05-22 cs.LG 版本更新

ImplicitTerrainV2: Wavelet-Guided Spatially Adaptive Neural Terrain Representation

ImplicitTerrainV2: 基于小波引导的空间自适应神经地形表示

Haoan Feng, Xin Xu, Leila De Floriani

发表机构 * University of Maryland, College Park（马里兰大学学院市分校）

AI总结本文提出ImplicitTerrainV2，通过结合频谱控制机制、小波引导的空间自适应性、导数感知监督和训练后模型压缩，实现了紧凑高效的神经地形数据格式，提升了地形分析的精度和效率。

Comments 14 pages, 8 figures

AI中文摘要

数字高程模型（DEMs）是地理信息系统（GIS）中地形分析的基础，但其常见的栅格形式依赖插值进行离格采样，并依赖有限差分算子进行基于导数的分析。隐式神经表示（INRs）提供了一种连续的替代方案，但先前的地形INRs缺乏显式的频率控制，忽视了地形的梯度结构，并且模型过大、训练成本过高，难以实际部署。我们提出了ImplicitTerrainV2，通过结合频谱控制机制、小波引导的空间自适应性、导数感知监督和训练后模型压缩，将地形INRs推进为紧凑、高效的神经地形数据格式。其核心是小波复杂度场（WCF），它从解析计算的小波系数中推导出空间自适应的频率掩码，将高频容量局部化到复杂地形区域。同一个场还指导复杂度感知的自适应采样，将训练集中在高复杂度区域，同时梯度匹配施加额外监督，以强制地形DEMs的光滑流形结构，从而提高导数保真度。训练后混合精度量化和熵编码将存储减少到1.23 bpp，PSNR仅下降0.28 dB。在50个瑞士地形图块上，ImplicitTerrainV2达到66.25 dB的端到端PSNR，比先前工作提高了5.70 dB，同时参数量减少3.2倍，在单个GPU上每个图块的训练时间为55秒。我们的压缩神经格式在率失真性能上可与几种成熟的DEM编码器竞争，同时还支持离格点查询、闭式导数计算和分辨率无关的重建，这可能使许多下游GIS应用受益。

英文摘要

Digital elevation models (DEMs) underpin terrain analysis in Geographic Information Systems (GIS), but in their common raster form, they rely on interpolation for off-grid sampling and finite-difference operators for derivative-based analysis. Implicit neural representations (INRs) offer a continuous alternative, but prior terrain INRs lack explicit frequency control, neglect the gradient structure of terrain, and remain too large and costly to train for practical deployment. We present ImplicitTerrainV2, which advances terrain INRs toward a compact, efficient neural terrain data format by combining a spectral control mechanism with wavelet-guided spatial adaptivity, derivative-aware supervision, and post-training model compression. At its core, a wavelet complexity field (WCF) derives spatially-adaptive frequency masks from analytically computed wavelet coefficients, localizing high-frequency capacity to complex terrain regions. The same field guides complexity-aware adaptive sampling that concentrates training in high-complexity regions, while gradient matching applies extra supervision to enforce the smooth manifold structure of terrain DEMs for improved derivative fidelity. Post-training mixed-precision quantization and entropy coding reduce storage to 1.23 bpp with a 0.28 dB PSNR drop. On 50 Swiss terrain tiles, ImplicitTerrainV2 reaches 66.25 dB end-to-end PSNR, improving over the prior work by 5.70 dB while using 3.2x fewer parameters and training in 55 s per tile on a single GPU. Our compressed neural format is competitive with several established DEM codecs in rate-distortion performance, while additionally supporting off-grid point queries, closed-form derivative evaluation, and resolution-independent reconstruction, which may benefit many downstream GIS applications.
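The notion of a wavelet-derived complexity map that gates high-frequency capacity can be sketched with a one-level Haar transform. This is a simplified stand-in, not the paper's wavelet complexity field: the choice of the Haar wavelet, the single decomposition level, the detail-magnitude measure, and the `threshold` are all assumptions made for illustration. The `storage_bytes` helper only converts a bits-per-pixel figure into bytes.

```python
import math


def haar_complexity(tile):
    """Detail magnitude of a one-level 2-D Haar transform for each
    non-overlapping 2x2 block; `tile` needs even height and width."""
    out = []
    for r in range(0, len(tile), 2):
        row = []
        for c in range(0, len(tile[0]), 2):
            a, b = tile[r][c], tile[r][c + 1]
            d, e = tile[r + 1][c], tile[r + 1][c + 1]
            horizontal = (a - b + d - e) / 2.0   # change across columns
            vertical = (a + b - d - e) / 2.0     # change across rows
            diagonal = (a - b - d + e) / 2.0     # checkerboard component
            row.append(math.sqrt(horizontal**2 + vertical**2 + diagonal**2))
        out.append(row)
    return out


def frequency_mask(complexity, threshold):
    """Mark blocks whose detail energy exceeds the (assumed) threshold."""
    return [[value > threshold for value in row] for row in complexity]


def storage_bytes(width, height, bits_per_pixel):
    """Bytes needed for a raster of the given size at a bits-per-pixel rate."""
    return width * height * bits_per_pixel / 8.0


# Flat elevation tile with a 40 m step confined to the top-right block.
tile = [
    [10, 10, 10, 50],
    [10, 10, 10, 50],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
complexity = haar_complexity(tile)
print(complexity)
print(frequency_mask(complexity, threshold=1.0))
print(storage_bytes(1000, 1000, 1.23))
```

Only the block containing the step is flagged, which mirrors the idea of spending high-frequency capacity, and training samples, on complex terrain while leaving flat regions to low-frequency components.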

2605.22549 2026-05-22 stat.ML cs.LG 版本更新

A Martingale Kernel Independence Test

一个鞅核独立性检验

Felix Laumann, Zhaolu Liu, Mauricio Barahona

发表机构 * Imperial College London（伦敦帝国理工学院）

AI总结本文提出两种学生化统计量，通过自归一化和半样本分割，实现了无需排列校准的独立性检验，显著提升了计算效率和测试性能。

AI中文摘要

Hilbert-Schmidt Independence Criterion (HSIC) 及其联合独立性扩展 dHSIC 是退化的 V 统计量，其零假设下的极限分布是依赖于数据的加权 χ² 分布，因此必须借助排列检验进行校准，使每次检验的成本乘以排列次数，实际中为两个数量级。我们将近期用于两样本检验的鞅 MMD 构造改造用于（联合）独立性问题，引入了两个学生化统计量，无论数据分布如何，其零分布都是标准正态分布，因此只需查一次正态分位数即可完全替代排列步骤。第一个统计量 mHSIC 是两个经验中心化 Gram 矩阵的 Hadamard 积的自归一化下三角和。在独立性和核的四阶矩有界的条件下，它收敛于标准正态分布。它对任意固定的备择假设具有一致性，计算成本为样本量的二次方，无需样本分割，与有偏 HSIC V 统计量相当。第二个统计量 mdHSIC 通过一次半样本分割实现有限样本一致性：在一半样本上估计中心化，在另一半上运行下三角自归一化鞅，使条件均值残差缩小为关于 d 指数级小的量，因此对任意固定的联合检验变量数，统计量都渐近服从标准正态分布，且每次检验的成本仅随 d 线性增长。在每个变量的输入维度从 1 到 500、联合检验变量数从 2 到 10 的合成数据上，两种统计量的经验第一类错误率和检验功效与排列校准的基线相当，同时运行速度快 25 到 60 倍。

英文摘要

The Hilbert-Schmidt Independence Criterion (HSIC) and its joint-independence extension $d\mathrm{HSIC}$ are degenerate $V$-statistics whose data-dependent weighted-$\chi^2$ null limits force a permutation calibration that multiplies the per-test cost by the number of permutations, in practice two orders of magnitude. Adapting the recent martingale MMD construction for two-sample testing to the (joint) independence problem, we introduce two studentised statistics whose null distributions are standard normal regardless of the data law, so that a single normal-quantile lookup replaces the permutation step entirely. The first, $m\mathrm{HSIC}$, is a self-normalised lower-triangular sum of the Hadamard product of two empirically centred Gram matrices. Under independence and bounded-fourth-moment kernels it converges to a standard normal. It is consistent against every fixed alternative, and runs at quadratic cost in the sample size without any sample split, matching the biased HSIC $V$-statistic. Our second statistic, $md\mathrm{HSIC}$, achieves finite-sample consistency with a single half-sample split: the centring is estimated on one half and the lower-triangular self-normalised martingale is run on the other, shrinking the conditional-mean residual to a quantity that is exponentially small in $d$, so the statistic is asymptotically standard normal at every fixed number of jointly tested variables, with a per-test cost that grows only linearly in $d$. On synthetic data with per-variable input dimension from $1$ to $500$ and between $2$ and $10$ jointly tested variables, both statistics match the empirical type-I error rate and test power of permutation-calibrated baselines while running $25$ to $60\times$ faster.
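For context, the permutation-calibrated baseline that the abstract aims to replace can be sketched in a few lines. The code below implements the standard biased HSIC $V$-statistic (the mean entry of the Hadamard product of two centred Gram matrices) with a permutation p-value. It does not implement the paper's $m\mathrm{HSIC}$ or $md\mathrm{HSIC}$ statistics. The Gaussian kernel, the fixed bandwidth, and the default number of permutations are assumptions chosen for illustration.

```python
import math
import random


def gram(xs, bandwidth=1.0):
    """Gaussian-kernel Gram matrix for scalar samples."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * bandwidth**2)) for b in xs] for a in xs]


def center(K):
    """Double-centre a symmetric Gram matrix (the H K H operation)."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    grand_mean = sum(row_means) / n
    return [[K[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
            for i in range(n)]


def hsic_v(Kc, Lc, order=None):
    """Biased HSIC V-statistic; `order` re-indexes the second sample."""
    n = len(Kc)
    idx = order if order is not None else list(range(n))
    return sum(Kc[i][j] * Lc[idx[i]][idx[j]] for i in range(n) for j in range(n)) / n**2


def permutation_pvalue(xs, ys, num_perms=200, seed=0):
    """Calibrate the statistic by re-pairing the samples `num_perms` times;
    each permutation costs another quadratic-time evaluation."""
    rng = random.Random(seed)
    Kc, Lc = center(gram(xs)), center(gram(ys))
    observed = hsic_v(Kc, Lc)
    idx = list(range(len(xs)))
    exceed = 0
    for _ in range(num_perms):
        rng.shuffle(idx)
        if hsic_v(Kc, Lc, idx) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + num_perms)


data_rng = random.Random(42)
xs = [data_rng.gauss(0.0, 1.0) for _ in range(30)]
ys = list(xs)                      # perfectly dependent toy pair
print(permutation_pvalue(xs, ys))
```

Each permutation repeats the quadratic-time sum, so the calibration loop multiplies the cost by `num_perms`. That multiplier is the overhead a statistic with a known standard normal null distribution would avoid.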

2605.22537 2026-05-22 cs.LG 版本更新

F-TIS: Harnessing Diverse Models in Collaborative GRPO

F-TIS: 利用多样化模型进行协作GRPO

Nikolay Blagoev, Oğuzhan Ersoy, Wendelin Boehmer, Lydia Yiyu Chen

发表机构 * Gensyn； TU Delft（代尔夫特理工大学）； University of Neuchatel（纳沙泰尔大学）

AI总结本文提出F-TIS方法，通过利用异构模型在协同GRPO训练中提高本地模型的学习效果，实现了高效的通信和一致的最终模型收敛，同时在某些情况下提升了模型在分布外任务上的泛化能力。

Comments Accepted to ICML 2026 Workshop Scalable Learning and Optimization for Efficient Multimodal AI Agents (SCALE)

AI中文摘要

像GRPO这样的强化学习方法在LLM后训练中变得非常流行。在GRPO中，模型产生一组提示的完成，这些完成会得到奖励，策略会朝着相对高奖励的完成更新。由于模型的自回归性质，这种训练风格的生成阶段可以极其耗时。为了解决这个问题，先前的工作试图将推理步骤分布到许多节点上，并行工作。这些工作主要假设训练中的同质模型，以保持样本尽可能接近on-policy。这一假设可能在去中心化系统中不切实际，因为具有不同计算能力和偏好的各方可能希望在同一个任务上合作。因此，去中心化训练需要一种能够处理异构模型的方法——不同的模型在同一个任务上协作。然而，这会导致训练过程中出现高度离策略的样本，而先前的工作已经指出离策略样本可能会影响GRPO的收敛。为了实现异质性，我们提出了过滤截断重要性采样（F-TIS）——一种GRPO风格的训练范式，可以利用离策略样本来改进本地模型的学习。我们的框架允许各种模型在同一个RL训练运行中协作，同时保持高效的通信。我们广泛评估了F-TIS在各种异构设置中的表现，并展示了它在最终模型收敛方面与纯on-sample训练相同。此外，我们观察到在某些设置中，F-TIS在分布外任务上的泛化能力优于on-policy训练，使模型性能提高了高达12%。

英文摘要

Reinforcement learning methods such as GRPO have seen great popularity in LLM post-training. In GRPO, models produce completions to a set of prompts, which are rewarded, and the policy is updated towards the relatively high-reward completions. Due to the auto-regressive nature of models, the generation phase of this style of training can be extremely time-consuming. As a solution, prior work has sought to distribute the inference step across many nodes working in parallel. These works primarily assume homogeneous models in training in order to keep samples as close to on-policy as possible. This assumption may be impractical in decentralized systems, where parties with varying compute and preferences may wish to collaborate on the same task. Thus, decentralized training requires an approach that can handle heterogeneous models, that is, different models collaborating on the same tasks. However, this leads to highly off-policy samples during training, and prior work has found that off-policy samples can hurt GRPO convergence. To enable heterogeneity, we propose Filtered Truncated Importance Sampling (F-TIS), a GRPO-style training paradigm that can use off-policy samples to improve the local model's learning. Our framework allows various models to collaborate in the same RL training run while being communication efficient. We extensively evaluate F-TIS in various heterogeneous setups and show that it exhibits identical final model convergence to purely on-sample training. Furthermore, we observe in some setups better generalization on out-of-distribution tasks than on-policy training, increasing the model's performance by up to 12%.
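The ingredients named in the abstract (group-relative advantages, truncated importance weights, and a filter on off-policy samples) can be sketched as follows. This is an assumption-laden illustration rather than the F-TIS algorithm: the truncation cap, the rule that drops samples whose importance ratio falls below `min_ratio`, and all function names are hypothetical, since the abstract does not specify the filter.

```python
import math
import statistics


def group_advantages(rewards):
    """GRPO-style advantages: rewards standardised within one prompt's group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        return [0.0] * len(rewards)    # identical rewards carry no signal
    return [(r - mean) / std for r in rewards]


def weighted_advantages(rewards, logp_local, logp_behavior, cap=2.0, min_ratio=0.1):
    """Scale each advantage by a truncated importance weight, and drop samples
    the local model finds very unlikely (hypothetical filter rule)."""
    weighted = []
    for adv, lp, lb in zip(group_advantages(rewards), logp_local, logp_behavior):
        ratio = math.exp(lp - lb)      # local policy vs. the model that sampled
        if ratio < min_ratio:
            weighted.append(0.0)       # filtered out: too far off-policy
        else:
            weighted.append(min(ratio, cap) * adv)   # truncation bounds variance
    return weighted


rewards = [1.0, 0.0, 1.0, 0.0]
logp_local = [0.0, 0.0, 0.0, 0.0]
logp_behavior = [0.0, -math.log(4.0), math.log(100.0), 0.0]   # ratios 1, 4, 0.01, 1
print(weighted_advantages(rewards, logp_local, logp_behavior))
```

Truncating the ratio keeps a single completion from another model from dominating the update, while the filter discards completions the local model would almost never have produced itself.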

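2605.22531 2026-05-22 cs.LG Updated version

Disentanglement Beyond Generative Models with Riemannian ICA

Edmond Cunningham

Affiliations * University of Massachusetts Amherst

AI summary: This paper proposes Riemannian ICA, a disentanglement method that does not rely on a generative model. It introduces a disentanglement tensor to study local disentanglement properties, providing a theoretical basis for understanding feature disentanglement without generative assumptions.

Details

AI abstract (translated from Chinese)

There is a gap between the theoretical foundations of disentanglement and the practice of modern representation learning. Existing theoretical frameworks, in particular independent component analysis (ICA) and its nonlinear variants, assume that statistically independent latent variables underlie the data, so that disentanglement amounts to identifying the latent variables that generated the data. This generative framework is interpretable and theoretically grounded, but its strong assumptions make it hard to apply to modern representation learning. Modern pretrained encoders often learn features with disentangled properties without making any generative assumptions, yet a general theory for interpreting these features as independent factors of variation is lacking. This paper introduces Riemannian ICA (RICA), which replaces ICA's global generative model with local geometric structure. RICA builds on the observation that, in ICA, the latent factors of variation at a data point can be understood through radial curves emanating from that point that map to axis-aligned straight lines in the latent space. We formalize this view using Riemannian geometry and present our theory in a way that is consistent with existing generative approaches. Our main contribution is the disentanglement tensor, which encodes a second-order notion of disentanglement that we call pointwise disentanglement. The tensor depends on the Hessian of the data log-likelihood and on the Ricci curvature induced by the model. In a controlled source-recovery setting, RICA recovers the sources on several manifolds, whereas the success of ICA baselines depends on the coordinates used to represent the observations. This work provides a theoretical basis for studying local disentanglement without assuming a generative model.

English abstract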

There is a gap between the theoretical foundations of disentanglement and the practice of modern representation learning. Existing theoretical frameworks, particularly Independent Component Analysis (ICA) and its nonlinear variants, assume a generative model with statistically independent latent variables underlying the data so that disentanglement amounts to identifying the latents that could have generated the data. This generative framework is interpretable and theoretically justified, but its strong assumptions make it difficult to apply to modern representation learning. Modern pretrained encoders often learn features that exhibit disentangled properties without making generative assumptions, yet there is no general theory for interpreting these features as independent factors of variation. We take a step toward such a theory by introducing Riemannian ICA (RICA), which replaces ICA's global generative model with local geometric structure. RICA is founded on the observation that in ICA, the factors of variation underlying a data point can be understood through radial curves emanating from the point that map to axis-aligned lines in the latent space. We formalize this perspective using Riemannian geometry and introduce our theory in a way that is consistent with the existing generative approach. Our main contribution is the disentanglement tensor, which encodes a second-order notion of disentanglement that we call pointwise disentanglement. This tensor depends on the Hessian of the data log likelihood as well as the Ricci curvature induced by the model. In a controlled source recovery setting with known ground-truth sources, RICA recovers sources across several manifolds, while the success of ICA baselines depends on the coordinates used to represent the observations. Our work provides a theoretical basis for studying local disentanglement without assuming a global generative model.

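2605.22529 2026-05-22 cs.LG cs.AI Updated version

Stabilising Explainability Fragility in Cybersecurity AI: The Impact and Mitigation of Multicollinearity in Public Benchmark Datasets

Ioannis J. Vourganas, Anna Lito Michala

Affiliations * Netrity Ltd; University of Glasgow

AI summary: This paper investigates an unexplored but important vulnerability in the use of AI explainability for intrusion detection systems (IDS): instability caused by multicollinearity. Despite widespread reliance on post-hoc explainability tools such as SHAP or LIME, the effect of correlated features on explanation robustness has not been evaluated. We introduce a formal theorem showing that multicollinearity inflates attribution variance. This demonstrates that explanations and feature importances are non-identifiable under multicollinearity. The theorem is validated through a comprehensive set of experiments on a representative benchmark dataset, UNSW-NB15. Four widely used model families (linear, tree-based, kernel, and neural) are evaluated on full and pruned feature sets obtained with VIF and correlation thresholds. We propose a new metric, the Explainability Fragility Score, and two new mitigation methods with varying integration complexity. CAA-Filtering stabilises explanations by grouping the attributions of trained models. SHARP is a new training-time regularisation framework that penalises attribution instability, making explainability stability controllable and monotonically improvable. The findings support stable predictive performance, with Kendall's τ used to quantify instability across resampled explanations. This work has direct implications for the trustworthiness and reproducibility of XAI in security-critical domains, and it motivates incorporating multicollinearity mitigation into IDS pipelines, offering a set of guidelines for practitioners.

Comments 35 pages, 3 figures, submitted to ACM TAISAP

Details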

English abstract

This paper investigates an unexplored yet impactful vulnerability in AI explainability used in intrusion detection systems (IDS): multicollinearity-induced instability. Despite extensive reliance on post-hoc explainability tools such as SHAP or LIME, the impact of correlated features on explanation robustness has not been evaluated. We introduce a formal theorem stating that multicollinearity inflates attribution variance. This demonstrates that explanations and feature importances are non-identifiable under multicollinearity. A suite of comprehensive experiments validates the theorem on a representative benchmark dataset, UNSW-NB15. Four widely used families of models are evaluated, including linear, tree-based, kernel, and neural, across full and pruned feature sets based on VIF and correlation thresholding. We propose a novel metric, the Explainability Fragility Score, and two novel mitigation methods with varying integration complexity. CAA-Filtering focuses on stabilising explanations by grouping attributions of trained models. SHARP is a novel training-time regularisation framework that penalises attribution instability, enabling controllable and monotonic improvement of explainability stability. The findings support stable predictive performance, using Kendall's τ to quantify instability across bootstrapped explanations. This work has direct implications for the trustworthiness and reproducibility of XAI in security-critical contexts, and motivates incorporating multicollinearity mitigations into IDS pipelines, providing a set of guidelines for practitioners.
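
Two quantities named in the abstract have standard textbook definitions that are easy to sketch. The variance inflation factor of a feature is 1 / (1 - R²), where R² comes from regressing that feature on the others; with only two features, R² is the squared Pearson correlation. Kendall's τ compares two rankings by counting concordant and discordant pairs. The code below is a minimal pure-Python sketch of those definitions, not the paper's pipeline; the function names are hypothetical, and the τ variant shown (tau-a, no tie correction) is an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def vif_two_features(x, y):
    """VIF in the two-feature case, where R^2 is the squared correlation."""
    r_squared = pearson(x, y) ** 2
    return math.inf if r_squared >= 1.0 else 1.0 / (1.0 - r_squared)


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two attribution score lists over the same features."""
    n = len(scores_a)
    concordant_minus_discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
            concordant_minus_discordant += (sign > 0) - (sign < 0)
    return concordant_minus_discordant / (n * (n - 1) / 2)
```

A VIF near 1 means the feature is nearly uncorrelated with the other; a τ near 1 between attribution rankings from two resamples means the explanation ordering is stable.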

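2605.22507 2026-05-22 cs.LG stat.ML Updated version

Generative Modeling by Value-Driven Transport

Pablo Moreno-Muñoz, Adrian Müller, Gergely Neu

Affiliations * Universitat Pompeu Fabra Barcelona; ETH Zürich; ICREA & Universitat Pompeu Fabra Barcelona

AI summary: This paper proposes a new generative modeling framework based on a discrete-time stochastic control formulation of measure transport. The dual variables of a linear program directly encode the optimal control policy, and an efficient simulation-free primal-dual algorithm is developed to compute approximately optimal value functions and value-driven transport (VDT) policies, which show strong performance and good scalability across several experiments.

Details

AI abstract (translated from Chinese)

We propose a new generative modeling framework based on a discrete-time stochastic control formulation of measure transport. By adapting classical results from control theory, we formulate the problem as a linear program whose dual variables correspond to the optimal value function of the control problem, which directly encodes the optimal control policy. Using this linear programming formulation, we develop an efficient simulation-free primal-dual algorithm for computing approximately optimal value functions and the associated value-driven transport (VDT) policies, which approximate the true optimal policy. We show that well-trained VDT policies have many favorable properties compared with other recent methods based on flows, diffusions, or Schrödinger bridges: they yield straight transport paths that can be simulated quickly and robustly, and they can be enhanced in the same ways as diffusion and flow-based models (for example, conditional generation, classifier-free guidance, and unpaired data-to-data translation are all easy to incorporate). We evaluate our method in a series of experiments; the results indicate strong performance and good potential for scalability.

English abstract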

We propose a new framework for generative modeling based on a discrete-time stochastic control formulation of measure transport. Adapting classic results from control theory, we formulate our problem as a linear program whose dual variables correspond to the optimal value function of the control problem, which directly encodes the optimal control policy. Exploiting this LP formulation, we develop an efficient simulation-free primal-dual algorithm for computing approximately optimal value functions and the associated value-driven transport (VDT) policies which approximate the true optimal policy. We show that well-trained VDT policies enjoy numerous favorable properties in comparison with other state-of-the-art methods based on flows, diffusions, or Schrödinger bridges: they lead to straight transport paths which can be simulated quickly and robustly, and can be enhanced in all the same ways as diffusion and flow-based models (e.g., conditional generation, classifier-free guidance, unpaired data-to-data translation are all easy to incorporate). We evaluate our methodology in a range of experiments, with results that indicate strong performance and good potential for scalability.

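2605.22506 2026-05-22 cs.CR cs.LG Updated version

Signal in the Noise: Out-of-Distribution Detection via Goodness-of-Fit Testing in a Factorized Latent Space (title translated from Chinese)

Philipp Bomatter, Jack Geary, Henry Gouk

Affiliations * School of Informatics, University of Edinburgh

AI summary: This paper proposes SITN, an out-of-distribution detection method based on goodness-of-fit testing in a factorized latent space. The method needs no access to out-of-distribution data, has low computational overhead, and strictly controls the false positive rate.

Details

AI abstract (translated from Chinese)

Deep generative models provide a natural foundation for out-of-distribution detection, but prior work has shown that the likelihoods they assign are notoriously unreliable for distinguishing in-distribution from out-of-distribution data. In this paper, we address this problem by exploiting the diffeomorphic and mass-preserving properties of continuous normalizing flows. Our analysis shows that out-of-distribution samples are mapped to noise samples that are highly atypical under the noise prior, in ways that the likelihood cannot capture. Based on this observation, we propose a new method, Signal in the Noise (SITN), for out-of-distribution detection at the single-sample level. SITN requires no access to out-of-distribution data, has low computational overhead, and provides strict control of the false positive rate. Comprehensive evaluation on standard benchmarks and synthetic perturbations highlights the method's effectiveness and the absence of the complexity bias inherent in likelihood-based methods.

English abstract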

Deep generative models offer a natural foundation for out-of-distribution (OOD) detection, yet prior work has shown that their assigned likelihoods are notoriously unreliable indicators for in- vs out-of-distribution data. In this paper, we address this problem by leveraging the diffeomorphic and mass-preserving properties of continuous normalising flows. Our analysis shows that OOD samples are mapped to noise samples that are highly atypical under the noise prior in ways not captured by the likelihood. Based on this observation, we propose a new method -- Signal in the Noise (SITN) -- for OOD detection on the single-sample level. SITN requires no access to OOD data, incurs minimal computational overhead, and provides strict control of false positive rates. Comprehensive evaluations through standard benchmarks and synthetic perturbations highlight the method's effectiveness and the absence of the complexity bias inherent to likelihood-based methods.

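2605.22493 2026-05-22 cs.LG cs.AI cs.RO Updated version

Understanding Multimodal Failure in Action-Chunking Behavioral Cloning

Lorenzo Mazza, Massimiliano Datres, Ariel Rodriguez, Sebastian Bodenstedt, Gitta Kutyniok, Stefanie Speidel

Affiliations * NCT-Dresden

AI summary: This paper studies the mechanisms by which behavioral cloning fails under multimodality, analyzes how different multimodal parameterizations fail in different ways in action-chunking policies, and proposes improving robustness by adjusting the degree of regularization and improving the generative policy.

Details

AI abstract (translated from Chinese)

Behavioral cloning becomes difficult when the same observation admits multiple valid actions. We study this problem in action-chunking policies and show that different multimodal parameterizations fail in different ways. For latent-variable policies, posterior-prior regularization makes sampling at deployment time more reliable, but excessive regularization removes the action-conditioned information needed to distinguish the demonstrated modes. Reducing this regularization preserves mode information, but success then depends on whether the prior covers the relevant latent regions. For action-space generative policies, multimodality is limited by the smoothness of the base-to-action transport: a map with a small Lipschitz constant cannot assign significant probability to many well-separated modes. Covering many modes requires either sharp transitions in the base space or off-support bridge regions in the action space. Experiments on synthetic multimodal tasks and robotic simulation benchmarks support these mechanisms.

English abstract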

Behavioral cloning becomes difficult when the same observation admits several valid actions. We study this problem for action-chunking policies and show that different multimodal parameterizations fail in different ways. For latent-variable policies, posterior-prior regularization makes deployment-time sampling more reliable, but excessive regularization removes the action-conditioned information needed to distinguish demonstrated modes. Reducing this regularization can preserve mode information, but then success depends on whether the prior covers the relevant latent regions. For action-space generative policies, multimodality is constrained by the smoothness of the base-to-action transport: a map with small Lipschitz constant cannot assign substantial probability to many well-separated modes. Covering many modes therefore requires either sharp transitions in base space or off-support bridge regions in action space. Experiments on synthetic multimodal tasks and robotic simulation benchmarks support these mechanisms.

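2605.22488 2026-05-22 cs.LG Updated version

Represented Is Not Computed: A Causal Test of Candidate Algorithmic Intermediates in a Transformer

Ishita Darade, Sushrut Thorat

Affiliations * MKSSS's Cummins College of Engineering for Women; Institute of Cognitive Science; Osnabrück University

AI summary: This paper studies how a Transformer integrates components when performing an arithmetic task. It finds that although the model answers accurately, there is a causal separation between its internal representations and its computational route, showing that probe results can differ markedly from actual causal observations.

Comments 16 pages, 4 figures

Details

AI abstract (translated from Chinese)

Structured prompts require integrating components according to task-relevant relations. How a network implements this integration is often hard to judge in language or vision tasks, because these relations are rarely precise enough to define a candidate internal algorithm. Arithmetic offers a cleaner setting. We study a Transformer trained on base-digit extraction: given N, B, and D, it must report the coefficient of B^D in the base-B expansion of N. The closed-form solution, floor(N/B^D) mod B, provides explicit candidate algorithmic intermediates. Across three seeds, the model reaches 99.83% exact-answer accuracy on the tested number-base intersections, establishing reliable task competence. Linear probes decode these intermediates, which makes a staged arithmetic computation plausible. Causal tests then separate representation from use: within the localized route from the stream that takes D as input to the output positions, behavior depends on early D-selective communication and is independent of N and B. Relatedly, a sparse circuit search finds that the routes for N, B, and D are mostly separate and combine late, rather than following the staged route suggested by the probes. The model thus represents the intermediates that make the closed-form solution plausible, but the identified localized causal route does not pass them to the output stream. This case shows that probe-based conclusions can differ markedly from actual causal observations, even when an explicit algorithmic hypothesis is available.

English abstract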

Structured prompts require integrating components according to task-relevant relations. How a network implements this integration is often hard to judge in language or vision, where those relations are rarely specified precisely enough to define a candidate internal algorithm. Arithmetic offers a cleaner setting. We study a Transformer trained on base-digit extraction: given $N$, $B$, and $D$, it must report the coefficient of $B^D$ in the base-$B$ expansion of $N$. The closed-form solution, $\lfloor N/B^D \rfloor \bmod B$, provides explicit candidate algorithmic intermediates. Across three seeds, the model reaches 99.83% exact-answer accuracy on held-out number-base intersections, establishing reliable task competence. Linear probes decode the intermediates, making staged arithmetic computation plausible. Causal tests then separate representation from use: within the localized route from the stream with $D$ as input to the output positions, behavior depends on early $D$-selective communication, independent of $N$ and $B$. Relatedly, a sparse circuit search finds mostly separate $N$, $B$, and $D$ routes that combine late rather than the staged route suggested by the probes. Thus, the model represents the intermediates that make the closed-form solution plausible, but the identified localized causal route does not transmit them to the output stream. This case shows that probe-based conclusions can diverge sharply from causal observations, even when explicit algorithmic hypotheses are available.
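
The closed form in the abstract, floor(N / B^D) mod B, can be written out so that each stage is visible. The sketch below returns the two intermediate values alongside the digit, since those are the kind of "candidate algorithmic intermediates" the abstract refers to; the function name and the dictionary keys are hypothetical, and which intermediates the authors actually probe is not stated in the abstract.

```python
def base_digit_staged(n, b, d):
    """Coefficient of b**d in the base-b expansion of n, with staged intermediates."""
    power = b ** d          # stage 1: B^D
    quotient = n // power   # stage 2: floor(N / B^D)
    digit = quotient % b    # stage 3: reduce modulo B
    return {"power": power, "quotient": quotient, "digit": digit}


# 1234 in base 10: the coefficient of 10^2 is 2.
example = base_digit_staged(1234, 10, 2)
```

For `base_digit_staged(1234, 10, 2)` the stages are 100, 12, and 2. The paper's point is that a model can represent such intermediates without routing them causally to the output.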

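2605.22481 2026-05-22 cs.LG math.ST stat.TH Updated version

When Stronger Triggers Backfire: A High-Dimensional Theory of Backdoor Attacks

Donald Flynn, Hadas Yaron Goldhirsh, Jonathan P. Keating, Inbar Seroussi

Affiliations * Mathematical Institute, University of Oxford; School of Mathematical Science, Tel Aviv University; School of Mathematical Science and Computer Science, Tel Aviv University

AI summary: This paper studies the behavior of backdoor poisoning attacks in high dimensions, finds that stronger training triggers help the defender, and uses a high-dimensional theory to analyze the core mechanism of backdoor attacks and the factors that influence it.

Details

AI abstract (translated from Chinese)

Backdoor poisoning attacks behave counterintuitively in high dimensions: stronger training triggers help the defender. We study regularized generalized linear models on Gaussian-mixture data in the proportional limit (p/n→κ), varying the training trigger strength α relative to a fixed test trigger strength. Three phenomena emerge: (i) clean test accuracy increases with α; (ii) attack success peaks at a finite α and then declines; (iii) the most dangerous trigger direction is the minimum eigenvector of the data covariance. We prove all three results for the squared loss and extend (i) and (ii) to general convex GLM losses through a Gaussian-proxy fixed-point system. We identify a finite-sample noise floor proportional to κ as the mechanism behind (i), which is invisible to classical n>>p analysis. Experiments on CIFAR-10 and Gaussian surrogates agree closely with the theory; ResNet-18 experiments show the same phenomena in the non-convex setting.

English abstract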

Backdoor poisoning attacks behave counter-intuitively in high dimensions: stronger training triggers can help the defender. We study regularised generalised linear models on Gaussian-mixture data in the proportional regime ($p/n \to κ$), varying the training trigger strength $α$ against a fixed test trigger. Three phenomena emerge: (i) clean test accuracy increases with $α$; (ii) attack success peaks at a finite $α$ and then declines; and (iii) the most damaging trigger direction is the minimum eigenvector of the data covariance. We prove all three results in closed form for the squared loss, and extend (i) and (ii) to general convex GLM losses via a Gaussian-proxy fixed-point system. We identify a finite-sample noise floor proportional to $κ$ as the mechanism behind (i), invisible to classical $n \gg p$ analysis. Experiments on CIFAR-10 and Gaussian surrogates match the theory closely; ResNet-18 experiments show the same phenomena beyond the convex setting.

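2605.22480 2026-05-22 cs.LG cs.AI Updated version

Implicit Regularization of Mini-Batch Training in Graph Neural Networks

Clement Wang, Antoine Vialle, Robin Vaysse, Thomas Bonald

Affiliations * Institut Polytechnique de Paris; Mirakl

AI summary: This paper studies the implicit regularization of mini-batch training in graph neural networks and finds that simple random node sampling performs well on several datasets while being more efficient.

Details

AI abstract (translated from Chinese)

Mini-batch training of graph neural networks (GNNs) differs fundamentally from training on i.i.d. data: sampling a subgraph changes the topology and introduces boundary effects, which has led prior work to develop structure-aware samplers that preserve local connectivity and reduce embedding variance. Surprisingly, we show that the simplest possible scheme, random node sampling (RNS), which trains on the induced subgraph of uniformly sampled nodes, matches or outperforms full-graph training on 8 of 10 datasets, with lower wall-clock time and memory consumption. To explain this, we apply backward error analysis to graph mini-batch stochastic gradient descent (SGD) and show that it implicitly minimizes the sampled loss plus a regularizer proportional to the mini-batch gradient variance, a quantity shaped directly by the sampler. Although RNS discards local structure, it produces mini-batches whose expected loss is closer to the full-graph loss and whose per-batch gradient variance is lower, giving a better implicit objective. Our analysis reframes the choice of graph sampler as a form of implicit regularization and identifies RNS as a strong, theoretically grounded method for scalable GNN training.

English abstract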

Mini-batch training of Graph Neural Networks (GNNs) is fundamentally different from training on i.i.d. data: sampling a subgraph alters the topology and introduces boundary effects, leading prior work to develop structure-aware samplers that preserve local connectivity and reduce embedding variance. Surprisingly, we demonstrate that the simplest possible scheme, Random Node Sampling (RNS), training on the induced subgraph of uniformly sampled nodes, matches or outperforms full-graph training on 8 of 10 datasets at a fraction of the wall-clock time and memory. To explain this, we apply backward error analysis to graph mini-batch Stochastic Gradient Descent (SGD) and show that it implicitly minimizes the sampled loss plus a regularizer proportional to the mini-batch gradient variance, a quantity directly shaped by the sampler. Although RNS discards local structure, it produces mini-batches whose expected loss is closer to the full-graph loss, and whose per-batch gradients have lower variance, yielding a better implicit objective. Our analysis reframes the choice of graph sampler as a form of implicit regularization, and identifies RNS as a strong, theoretically grounded method for scalable GNN training.
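
Random Node Sampling as described in the abstract has two steps: draw nodes uniformly at random, then keep only the edges whose endpoints were both drawn (the induced subgraph). A minimal sketch on an edge list follows; the function name, the edge-list representation, and the explicit `rng` argument are assumptions for illustration and say nothing about how the authors implement it.

```python
import random


def random_node_sample(num_nodes, edges, batch_size, rng):
    """Sample `batch_size` nodes uniformly without replacement and return the induced subgraph."""
    nodes = set(rng.sample(range(num_nodes), batch_size))
    # An edge survives only if both endpoints were sampled.
    induced_edges = [(u, v) for (u, v) in edges if u in nodes and v in nodes]
    return nodes, induced_edges


# A path graph 0-1-2-...-9 as a toy example.
path_edges = [(i, i + 1) for i in range(9)]
batch_nodes, batch_edges = random_node_sample(10, path_edges, 5, random.Random(0))
```

Edges that cross the batch boundary are simply dropped, which is the loss of local structure the abstract mentions.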

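2605.22476 2026-05-22 cs.LG cs.CL Updated version

Structured-Sparse Attention for Entity Tracking with Subquadratic Sequence Complexity

Hangyue Zhao, Paul Caillon, Erwan Fagnou, Alexandre Allauzen

Affiliations * ESPCI PSL; LAMSADE, Université Paris Dauphine - PSL

AI summary: This paper proposes a structured-sparse attention mechanism for efficiently maintaining and updating the latent states of entities and attributes over long sequences, improving the efficiency and accuracy of entity tracking by reducing computational complexity.

Comments 12 pages, 1 figure, 9 tables

Details

AI abstract (translated from Chinese)

Entity tracking requires maintaining and updating latent states of entities and attributes over long sequences. Recent task-specific attention operators can compress deep Transformer stacks into a few layers by performing multi-hop state propagation within a single layer, but their dense evaluation remains expensive. We show that in this setting the learned attention is strongly structured: most of the mass concentrates in local block-diagonal neighborhoods, with a light cross-block residue. Exploiting this, we derive a blockwise evaluation of a resolvent-style operator that keeps within-block interactions exact and routes cross-block interactions through a reduced system. The resulting evaluation is subquadratic in sequence length, $O(n^{4/3}d)$ (and $O(n^{7/3})$ when $d\approx n$). On controlled tracking benchmarks, our method retains the accuracy of the dense operator while reducing wall-clock time by 12-29% under a standardized measurement protocol, and it is up to 2.4 times faster than a compact dense Transformer at comparable exact-match accuracy. We further provide ablations over block size and model capacity and identify a limitation: performance collapses when the number of simultaneously evolving properties exceeds the number of attention heads.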

English abstract

Entity tracking requires maintaining and updating latent states for entities and attributes over long sequences. Recent task-specific attention operators can compress deep Transformer stacks into a few layers by performing multi-hop state propagation within a single layer, but their dense evaluation remains expensive. We show that in this setting, learned attention is strongly structured: most mass concentrates in local block-diagonal neighborhoods with a light cross-block residue. Exploiting this, we derive a blockwise evaluation of a resolvent-style operator that keeps within-block interactions exact and routes cross-block interactions through a reduced system. The resulting evaluation is subquadratic in sequence length, $O(n^{4/3}d)$ (and $O(n^{7/3})$ when $d\approx n$). On controlled tracking benchmarks, our method matches the dense operator's accuracy while reducing wall-clock time by 12-29% under a standardized measurement protocol, and is up to $2.4\times$ faster than a compact dense Transformer at comparable exact-match accuracy. We further provide ablations over block size and model capacity, and identify a limitation: performance collapses when the number of simultaneously evolving properties exceeds the number of attention heads.

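2605.22472 2026-05-22 cs.LG Updated version

Winner-Take-All bottlenecks enforce disentangled symbolic representations in multi-task learning

Julian Gutheil, Simon Hitzginger, Robert Legenstein

Affiliations * Institute of Machine Learning and Neural Computation; Graz University of Technology; Graz, Austria

AI summary: This paper studies how winner-take-all bottlenecks enforce the extraction of categorical latent factors of the data in multi-task learning, proves that the resulting representations are highly symbolic, and verifies their advantages for generalization experimentally.

Details

AI abstract (translated from Chinese)

Winner-take-all (WTA) networks are a central circuit motif in cortical networks of the brain, and WTA-like activations are also widespread in modern deep learning models, for example as the softmax activation in the attention layers of Transformers. Although their role in extracting latent factors has been studied for simple generative models, their role in the context of highly nonlinearly entangled latent factors remains unclear. This paper shows that a WTA bottleneck in a deep neural network can, under certain well-defined conditions, enforce the extraction of the categorical latent factors of the data in a multi-task learning setting. In particular, we prove that the representation that emerges in the WTA bottleneck is highly symbolic: a single neuron or a population of neurons encodes a single abstract feature, such as a specific object, color, or position. We further show experimentally on two datasets that this also holds for architectures and settings that do not fully satisfy the assumptions of our theorem, and we demonstrate the advantages of the resulting symbolic representation for generalization. The proposed model offers insight into the generalization capabilities of deep neural networks with WTA-like components and may serve as an interface between symbolic and subsymbolic AI systems.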

English abstract

Winner-take-all (WTA) networks constitute a central circuit motif in cortical networks of the brain. In addition, WTA-like activations are abundant in modern deep learning models in the form of the softmax activation, for example in the attention layers of transformers. While their role in the extraction of latent factors has been studied for relatively simple generative models, their role in the context of highly non-linearly entangled latent factors has remained elusive. In this article, we show that a WTA bottleneck within a deep neural network can enforce, under certain well-defined conditions, the extraction of categorical latent factors of the data in a multi-task learning setup. In particular, we prove that the representation that emerges in the WTA bottleneck is highly symbolic, where a single neuron or a population of neurons encodes the presence of a single abstract feature such as a specific object, color, or position. We furthermore show empirically on two datasets that this also holds for architectures and setups that do not fully comply with the assumptions of our theorem, and we demonstrate the advantages of the acquired symbolic representation for generalization. Our proposed model provides insights into the generalization capabilities of deep neural networks with WTA-like components and may serve as an interface between symbolic and subsymbolic AI systems.

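2605.22471 2026-05-22 cs.LG Updated version

Don't Forget the Critic: Value-Based Data Rehearsal in Multi-Cyclic Continual Reinforcement Learning (title translated from Chinese)

Benjamin Poole, Andrew Quinn, Li Yang, Minwoo Lee

Affiliations * Department of Computer Science; University of North Carolina at Charlotte

AI summary: This paper proposes a value-based data rehearsal method for multi-cyclic continual reinforcement learning; the Qreg+NWLU method it introduces improves learning efficiency, forgetting mitigation, and knowledge transfer.

Details

AI abstract (translated from Chinese)

Data rehearsal has become a leading approach for mitigating catastrophic forgetting in continual reinforcement learning (CRL). However, existing work remains confined to policy-gradient frameworks and regularizes only the actor, because regularizing the critic degrades performance. This actor-centric approach overlooks the potential of data rehearsal for value function approximation. Moreover, existing CRL evaluations rarely consider multi-cyclic environments, in which task sequences repeat; this is a critical real-world scenario that exacerbates problems of forgetting and plasticity. We study data rehearsal for Deep Q-Networks with Q-value regularization in multi-cyclic settings and propose Qreg+NWLU, which introduces two simple modifications: (1) continuous data rehearsal, which dynamically collects and updates the stored Q-values throughout training; and (2) "No-Wait" regularization, which is applied immediately rather than after the first task. In value function approximation settings, these modifications improve learning efficiency, forgetting mitigation, and knowledge transfer relative to Qreg and conventional CRL methods.

English abstract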

Data rehearsal has emerged as a leading approach for mitigating catastrophic forgetting in Continual Reinforcement Learning (CRL). However, existing work remains confined to policy gradient frameworks, regularizing only actors due to the performance degradation incurred by critic regularization. This actor-centric approach overlooks the potential of data rehearsal for value function approximation. Moreover, existing evaluations in CRL rarely consider multi-cyclic environments where task sequences repeat, a critical real-world scenario that exacerbates forgetting and plasticity. We investigate data rehearsal for Deep Q-Networks using Q-value regularization in multi-cyclic settings and propose Qreg+NWLU which introduces two simple modifications: (1) continuous data rehearsal that dynamically collects and updates stored Q-values throughout training, and (2) "No-Wait" regularization that applies immediately rather than after the first task. Together, these modifications yield improvements in learning efficiency, forgetting mitigation, and knowledge transfer over Qreg and conventional CRL methods within value function approximation settings.

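2605.22438 2026-05-22 stat.ML cs.GT cs.LG Updated version

Do Not Trust The Auctioneer: Learning to Bid in Feedback-Manipulated Auctions

Luigi Foscari, Matilde Tullii, Vianney Perchet

Affiliations * Università degli Studi di Milano; Crest-Ensae; IP Paris; CRITEO AI Team

AI summary: This paper studies learning to bid in feedback-manipulated auctions. It proposes an algorithm that combines a robust interval-elimination branch with an optimistic branch to cope with manipulated feedback, and gives a matching lower bound in the single-active-region case.

Details

AI abstract (translated from Chinese)

Shilling is the use of artificial bids to make competition look stronger and push prices up. We study repeated first-price auctions in which shilling affects the feedback but not the allocation: the learner wins or loses against the real competing bid, but after a loss it observes the maximum of the real bid and an independent shill bid. This manipulation changes what the learner observes, and therefore how it learns to bid, without changing the outcome of the current auction. We analyze regret against the best-bid benchmark, assuming the shill-bid distribution is known. Even then, shilling can mask the real bid, and useful side information appears only through intermittent low-shill events. Our algorithm combines a robust interval-elimination branch, which ignores the shilled reports and achieves the dynamic-pricing rate $\tilde{\mathcal{O}}(T^{2/3})$, with an optimistic branch, which debiases losing-side reports and exploits the resulting information when it is reliable, achieving the first-price auction rate $\tilde{\mathcal{O}}(\sqrt{T})$. A validation and racing procedure lets the algorithm use these optimistic updates without knowing the right scale or the feedback geometry in advance. We complement the upper bounds with a matching lower bound, up to logarithmic factors, in the single-active-region case. Overall, the results show that even feedback-only shilling can substantially change the statistical difficulty of repeated bidding.

English abstract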

Shilling is the use of artificial bids to make competition appear stronger and push prices upward. We study repeated first-price auctions in which shilling affects feedback but not allocation: the learner wins or loses against the real competing bid, but after a loss observes the maximum of the real bid and an independent shill bid. Thus the manipulation changes what the learner observes and hence how it learns to bid, without changing the outcome of the current auction. We analyze regret with respect to the best bid benchmark, assuming that the shill-bid distribution is known. Even then, shilling can mask the real bid, while useful side information appears only through intermittent low-shill events. Our algorithm combines a robust interval-elimination branch, which ignores the shilled report and achieves the dynamic-pricing rate $\tilde{\mathcal{O}}(T^{2/3})$, with an optimistic branch that debiases losing-side reports and exploits the resulting suffix information when it is reliable and achieves the first-price auctions rate $\tilde{\mathcal{O}}(\sqrt{T})$. A validation and racing procedure lets the algorithm use these optimistic updates without knowing the right scale or feedback geometry in advance. We complement the upper bounds with a matching lower bound, up to logarithmic factors, in the single-active-region case. Overall, the results show that even feedback-only shilling can sharply alter the statistical difficulty of repeated bidding.
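
The feedback model in the abstract can be made concrete with a single auction round. The learner's win or loss is decided against the real competing bid only, while the report after a loss is the maximum of the real bid and the shill bid. The sketch below assumes a few details the abstract does not state: ties go to the learner, the winner observes nothing about the competing bid, and the learner's `value` for the item is known. The function name is hypothetical.

```python
def auction_round(bid, real_bid, shill_bid, value):
    """One first-price round with feedback-only shilling.

    Returns (won, utility, observed), where `observed` is None after a win and
    max(real_bid, shill_bid) after a loss.
    """
    won = bid >= real_bid                  # allocation ignores the shill bid (tie rule assumed)
    utility = value - bid if won else 0.0  # first price: the winner pays its own bid
    observed = None if won else max(real_bid, shill_bid)
    return won, utility, observed


masked = auction_round(bid=0.3, real_bid=0.4, shill_bid=0.9, value=1.0)
revealed = auction_round(bid=0.3, real_bid=0.4, shill_bid=0.2, value=1.0)
```

In `masked` the learner sees 0.9 and cannot tell how close it came to winning; in `revealed`, a low-shill event, the report equals the real bid of 0.4. This is the intermittent side information the abstract describes.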

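2605.22437 2026-05-22 cs.CR cs.AI cs.LG Updated version

Posterior Predictive Variance Decomposition for Epistemic and Aleatoric Uncertainty in Wind Power (title translated from Chinese)

Yinsong Chen, Samson S. Yu, Kashem M. Muttaqi

Affiliations * School of Engineering, Deakin University; ARC Training Centre in Energy Technologies for Future Grids, School of Engineering, University of Wollongong

AI summary: This paper proposes a posterior predictive variance decomposition for separating epistemic and aleatoric uncertainty in wind power forecasting. Total uncertainty is decomposed into aleatoric and epistemic components, and a wind-power-specific evaluation framework is proposed to validate the decomposition.

Details

AI abstract (translated from Chinese)

Accurate wind power forecasting requires reliable uncertainty quantification, yet most existing methods report a single predictive uncertainty that conflates epistemic and aleatoric sources. This paper applies the law of total variance to the joint setting of heteroscedastic neural network regression and Bayesian posterior approximation and derives an explicit decomposition of total uncertainty (TU) into aleatoric (AU) and epistemic (EU) components. The resulting estimators are compatible with standard posterior-approximation methods and with β-NLL training, which regulates the trade-off in mean-variance learning. A wind-power-specific evaluation framework is proposed to validate the disentanglement without ground-truth uncertainty labels. It comprises three modules: controlled synthetic experiments that verify the response to heteroscedastic noise and distribution shift; data-property-driven validation on a real-world wind turbine SCADA dataset; and dataset-size scaling experiments that examine the predicted asymptotic behavior of EU. In both synthetic and real-world experiments, the decomposed AU and EU components respond in theoretically consistent directions to changes in noise structure, distribution shift, and training scale, supporting the theoretical consistency and operational utility of the proposed decomposition and evaluation protocol.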

English abstract

Accurate wind power forecasting requires reliable uncertainty quantification, yet most existing methods report a single predictive uncertainty that conflates epistemic and aleatoric sources. This paper applies the law of total variance to the joint setting of heteroscedastic neural network regression and Bayesian posterior approximation, deriving an explicit decomposition of total uncertainty (TU) into aleatoric (AU) and epistemic (EU) components. The resulting estimators are compatible with standard posterior-approximation methods and with $β$-NLL training to regulate the mean-variance learning trade-off. A wind-power-specific evaluation framework is proposed to validate disentanglement without access to ground-truth uncertainty labels, comprising three modules: controlled synthetic experiments to verify responses to heteroscedastic noise and distribution shift; data-property-driven validation on a real-world wind turbine SCADA dataset; and dataset-size scaling experiments to examine the predicted asymptotic behavior of EU. Across synthetic and real-world experiments, the decomposed AU and EU components respond in theoretically consistent directions to noise structure, distributional shift, and training-scale variation, supporting the theoretical consistency and operational utility of the proposed decomposition and evaluation protocol.
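
The law of total variance gives the decomposition the abstract refers to. If each posterior sample of the network predicts a mean and a variance for the same input, the total predictive variance is the average predicted variance (aleatoric) plus the variance of the predicted means (epistemic). The sketch below is the standard Monte Carlo estimator under that reading; the function name and the use of plain lists of per-sample predictions are assumptions, and the paper's exact estimators may differ.

```python
def decompose_uncertainty(means, variances):
    """Split total predictive variance for one input into aleatoric and epistemic parts.

    means[i], variances[i]: predicted mean and noise variance from the i-th
    posterior sample (for example, one ensemble member or one dropout pass).
    """
    m = len(means)
    aleatoric = sum(variances) / m                               # E[sigma^2]
    mean_of_means = sum(means) / m
    epistemic = sum((mu - mean_of_means) ** 2 for mu in means) / m  # Var[mu]
    return aleatoric, epistemic, aleatoric + epistemic


au, eu, tu = decompose_uncertainty(means=[1.0, 3.0], variances=[0.5, 1.5])
```

Here the two posterior samples disagree about the mean, so the epistemic part is non-zero; if they agreed exactly, the total would reduce to the average noise variance.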

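2605.22387 2026-05-22 cs.LG cs.CE Updated version

Hybrid Kolmogorov-Arnold Network and XGBoost Framework for Week-Ahead Price Forecasting in Australia's National Electricity Market

Houxuan Zhou, Sriram Prasad, Chenghao Huang, Jiajie Feng, Hao Wang

Affiliations * Department of Data Science and AI, Faculty of IT, Monash University, Australia; School of Electrical Engineering and Computer Science, University of Queensland, Australia; Monash Energy Institute, Monash University, Australia

AI summary: This paper proposes a hybrid KAN+XGBoost framework for week-ahead price forecasting in Australia's National Electricity Market. The framework combines the global nonlinear representation capability of Kolmogorov-Arnold Networks with the local robustness of XGBoost to capture long-term dependencies and short-term price fluctuations. Experiments show that the model reduces MAE by about 12% relative to XGBoost and by more than 50% relative to a naive baseline.

Comments The 24th IEEE International Conference on Industrial Informatics, 2026

Details

AI abstract (translated from Chinese)

Accurate electricity price forecasting (EPF) is essential for supporting market participants' operational planning and risk management, but it is challenging because of strong volatility, nonlinear dynamics, and frequent extreme price spikes. These challenges are especially pronounced in Australia's National Electricity Market (NEM), where high renewable penetration adds further uncertainty. This paper studies week-ahead electricity price forecasting and proposes a hybrid KAN+XGBoost framework that combines Kolmogorov-Arnold Networks (KAN) with tree-based learning. The proposed approach combines the global nonlinear representation capability of KAN with the local robustness of XGBoost to capture both long-term dependencies and short-term price fluctuations. Experiments are conducted on real NEM data using an expanding-window evaluation strategy. The results show that the proposed model outperforms benchmark methods, including SARIMAX, Long Short-Term Memory (LSTM), standalone KAN, and XGBoost, reducing MAE by about 12% relative to XGBoost and by more than 50% relative to a naive baseline. The results suggest that hybrid learning strategies offer an effective and robust solution for price forecasting in highly dynamic electricity markets.

English abstract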

Accurate electricity price forecasting (EPF) is essential for market participants to support operational planning and risk management, yet remains challenging due to strong volatility, nonlinear dynamics, and frequent extreme price spikes. These challenges are particularly pronounced in the Australian National Electricity Market (NEM), where high renewable penetration further increases uncertainty. This paper investigates week-ahead electricity price forecasting and proposes a hybrid KAN+XGBoost framework that integrates Kolmogorov-Arnold Networks (KAN) with tree-based learning. The proposed approach combines the global nonlinear representation capability of KAN with the local robustness of XGBoost to capture both long-term dependencies and short-term price fluctuations. Experiments are conducted on real-world NEM data using an expanding window evaluation strategy. The results demonstrate that the proposed model outperforms benchmark methods, including SARIMAX, Long Short-Term Memory (LSTM), standalone KAN, and XGBoost, reducing MAE by approximately 12% compared to XGBoost and by over 50% compared to a naive baseline. The results suggest that hybrid learning strategies provide an effective and robust solution for electricity price forecasting in highly dynamic electricity markets.
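
The evaluation protocol mentioned in the abstract, an expanding window scored by MAE, can be sketched without any forecasting model. In an expanding-window scheme the training set always starts at the first observation and grows by one test horizon at each step. The function names, the `initial_train` and `horizon` parameters, and the relative-reduction helper are illustrative assumptions; the paper's window sizes are not given in the abstract.

```python
def expanding_window_splits(n_samples, initial_train, horizon):
    """Yield (train_indices, test_indices) pairs; the train range grows by `horizon` each step."""
    splits = []
    train_end = initial_train
    while train_end + horizon <= n_samples:
        splits.append((range(0, train_end), range(train_end, train_end + horizon)))
        train_end += horizon
    return splits


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between forecasts and observations."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline_mae, model_mae):
    """Fraction by which the model lowers MAE relative to a baseline."""
    return (baseline_mae - model_mae) / baseline_mae


splits = expanding_window_splits(n_samples=10, initial_train=4, horizon=2)
```

With 10 samples, an initial window of 4, and a horizon of 2, this produces three folds whose training ranges cover indices 0-3, 0-5, and 0-7. A statement such as "reduces MAE by about 12%" corresponds to `relative_reduction` returning roughly 0.12.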

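2605.22385 2026-05-22 cs.LG Updated version

Efficient Higher-order Subgraph Attribution via Message Passing

Ping Xiong, Thomas Schnake, Grégoire Montavon, Klaus-Robert Müller, Shinichi Nakajima

Affiliations * BIFOLD - Berlin Institute for the Foundations of Learning and Data; Department of Artificial Intelligence, Korea University, Seoul 136-713, Korea; RIKEN Center for AIP, Japan

AI summary: This paper proposes an efficient message-passing algorithm that attributes subgraphs with GNN-LRP in linear time and extends subgraph attribution to account for neighboring graph features. Experiments show substantial speedups and high practical usefulness.

Comments Published in ICML 2022

Journal ref Proceedings of the 39th International Conference on Machine Learning, PMLR 162:24478-24495, 2022

Details

AI abstract (translated from Chinese)

Explaining graph neural networks (GNNs) has become increasingly important in recent years. Higher-order explanation schemes such as GNN-LRP (layer-wise relevance propagation for GNNs) have emerged as powerful tools for unraveling how different features interact and for explaining GNNs. GNN-LRP assigns relevance to walks between nodes across layers, and a subgraph attribution is expressed as a sum over exponentially many such walks. In this work, we show that this exponential complexity can be avoided. In particular, we propose new algorithms that attribute subgraphs with GNN-LRP in time linear in the network depth. Our algorithms use message-passing techniques that exploit the distributive property to compute the quantities needed for higher-order explanations directly. We further adapt these efficient algorithms to compute a generalized subgraph attribution that also takes neighboring graph features into account. Experimental results show that the proposed algorithms provide substantial speedups and demonstrate the high usefulness and scalability of our new generalized subgraph attribution method.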

English abstract

Explaining graph neural networks (GNNs) has become more and more important recently. Higher-order interpretation schemes, such as GNN-LRP (layer-wise relevance propagation for GNN), emerged as powerful tools for unraveling how different features interact, thereby contributing to explaining GNNs. GNN-LRP gives a relevance attribution of walks between nodes at each layer, and the subgraph attribution is expressed as a sum over exponentially many such walks. In this work, we demonstrate that such exponential complexity can be avoided. In particular, we propose novel algorithms that make it possible to attribute subgraphs with GNN-LRP in linear time (w.r.t. the network depth). Our algorithms are derived via message passing techniques that make use of the distributive property, thereby directly computing quantities for higher-order explanations. We further adapt our efficient algorithms to compute a generalization of subgraph attributions that also takes into account the neighboring graph features. Experimental results show the significant acceleration of the proposed algorithms and demonstrate the high usefulness and scalability of our novel generalized subgraph attribution method.
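
The step from an exponential sum over walks to a linear-time computation rests on the distributive property, and that step can be shown on a toy problem. The sketch below is not GNN-LRP: it simply sums, over all walks of a given length that stay inside a node subset, the product of edge weights along the walk. The brute-force version enumerates every walk; the message-passing version pushes the sum inside the product and needs one pass per step. The weight matrix and function names are hypothetical.

```python
from itertools import product


def walk_sum_bruteforce(weights, nodes, length):
    """Enumerate every walk of `length` steps inside `nodes`; cost grows as len(nodes)**(length + 1)."""
    total = 0.0
    for walk in product(nodes, repeat=length + 1):
        term = 1.0
        for u, v in zip(walk, walk[1:]):
            term *= weights[u][v]
        total += term
    return total


def walk_sum_message_passing(weights, nodes, length):
    """Same quantity in `length` passes, using sum-product distributivity."""
    # messages[u] = summed weight of all walks with the current number of steps that start at u
    messages = {v: 1.0 for v in nodes}
    for _ in range(length):
        messages = {u: sum(weights[u][v] * messages[v] for v in nodes) for u in nodes}
    return sum(messages.values())


toy_weights = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
subgraph = [0, 2]
```

For the subset {0, 2} and walks of three steps, only the two alternating walks have non-zero weight, each contributing 2 x 2 x 2 = 8, so both functions return 16.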

2605.22380 2026-05-22 cs.CL cs.LG 版本更新

Multi-Stage Training for Abusive Comment Detection in Indic Languages

印度语言中辱骂评论检测的多阶段训练

Pranshu Rastogi, Madhav Mathur, Ramaneswaran S, Kshitij Mohan

发表机构 * Department of CSE, JIIT Noida（杰皮信息技术学院（诺伊达）计算机科学与工程系）； Department of ICE, NSUT Delhi（德里内塔吉·苏巴斯理工大学ICE系）； Department of IT, VIT Vellore（韦洛尔理工学院信息技术系）； Department of CSE, IIIT Delhi（德里英德拉普拉斯塔信息技术学院计算机科学与工程系）

AI总结本文提出了一种多阶段训练方法，通过语言预处理和多个模型的集成，提高印度语言中辱骂评论检测的准确性，减少误报率以保护言论自由。

Comments 4 pages, EAM2021 selected

2605.22379 2026-05-22 cs.HC cs.AI cs.LG 版本更新

Cross-Subject EEG Emotion Recognition Based on Temporal Asynchronous Alignment Contrastive Learning

基于时间异步对齐对比学习的跨被试EEG情绪识别

Ying Xie, Yi Zheng, Zehui Xiao, Wenkai Lu, Mengting Liu

发表机构 * School of Biomedical Engineering, Shenzhen Campus of Sun Yat-sen University（中山大学生物医学工程学院深圳校区）； School of Computer Science and Technology, Tianjin University（天津大学计算机科学与技术学院）

AI总结本文提出了一种基于时间异步对齐对比学习（TA2CL）的框架，用于解决跨被试EEG情绪识别中因不同被试响应时间不一致而导致的识别问题，通过改进相似性计算策略，提升模型对被试间差异和时间延迟的鲁棒性。

Comments 16 pages, 7 figures

AI中文摘要

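随着科技的发展，情绪研究的重要性日益凸显。近年来，基于脑电图（EEG）的情绪识别因其客观性和高时间分辨率，已成为一个活跃的研究领域。然而，大多数现有方法侧重于优化编码器结构以增强特征提取能力，而对相似性计算策略关注较少，尤其忽略了不同被试之间响应在时间上可能存在的错位。为了解决这些不足，本文受ColBERT在自然语言处理（NLP）中的晚期交互机制启发，提出了一种基于时间异步对齐的对比学习（TA2CL）框架。该方法将传统的全局“硬对齐”相似性计算方式转变为细粒度的局部匹配机制，使模型能够自适应地搜索并对齐两段EEG信号之间“局部高度相关”的片段，从而有效缓解被试间差异和时间延迟的影响。实验结果表明，所提方法在多个公开数据集上取得了优异性能：在FACED数据集上，九分类任务的准确率达到64.5%，二分类任务达到79.5%；在SEED和SEED-V数据集上，准确率分别达到86.4%和70.1%，验证了该方法的有效性和泛化能力。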

英文摘要

With the advancement of science and technology, the importance of emotion research has become increasingly evident. Electroencephalography (EEG)-based emotion recognition has emerged as an active research area in recent years, owing to its objectivity and high temporal resolution. However, most existing methods focus on optimizing encoder structures to enhance feature extraction capabilities, while paying relatively little attention to similarity calculation strategies, particularly overlooking the potential temporal misalignment of responses among different subjects. To address these shortcomings, this paper draws inspiration from the late interaction mechanism of ColBERT in natural language processing (NLP) and proposes a Temporal Asynchronous Alignment-based Contrastive Learning (TA2CL) framework. This method transforms the traditional global "hard alignment" similarity calculation approach into a fine-grained local matching mechanism, enabling the model to adaptively search for and align "locally highly correlated" segments between two EEG signals, thereby effectively mitigating the effects of inter-subject differences and temporal delays. Experimental results demonstrate that the proposed method achieves strong performance across multiple public datasets. Specifically, on the FACED dataset, it achieves an accuracy of 64.5% for the nine-class classification task and 79.5% for the binary classification task, while on the SEED and SEED-V datasets, it achieves accuracies of 86.4% and 70.1%, respectively, validating the method's effectiveness and generalization capability.
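To make the contrast between global "hard alignment" and local matching concrete, here is a minimal sketch of a ColBERT-style late-interaction (MaxSim) score applied to segment embeddings. It is an illustration under assumptions, not the TA2CL similarity itself: the function names, the averaging over segments, and the synthetic embeddings are all hypothetical choices made for this example.

```python
import numpy as np


def _unit_rows(x):
    """Normalize each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def hard_alignment_similarity(a, b):
    """Compare segment i of `a` only with segment i of `b` (global alignment)."""
    return float(np.mean(np.sum(_unit_rows(a) * _unit_rows(b), axis=1)))


def late_interaction_similarity(a, b):
    """For each segment of `a`, keep its best-matching segment of `b` (MaxSim)."""
    sim = _unit_rows(a) @ _unit_rows(b).T
    return float(sim.max(axis=1).mean())


rng = np.random.default_rng(0)
segments_a = rng.normal(size=(10, 16))       # 10 time segments, 16-dim embeddings
segments_b = np.roll(segments_a, 3, axis=0)  # same response, circularly delayed by 3
hard = hard_alignment_similarity(segments_a, segments_b)
late = late_interaction_similarity(segments_a, segments_b)
print(f"hard alignment: {hard:.3f}, late interaction: {late:.3f}")
```

Because the second signal is only a delayed copy of the first, the late-interaction score recovers the match while the position-by-position score does not, which is the intuition behind tolerating temporal delays between subjects.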

2605.22377 2026-05-22 cs.LG 版本更新

Towards Explainability of SLMs by investigating Token Level Activation

通过研究token层面激活实现SLMs的可解释性

Sayantani Ghosh, Rajashik Datta, Amit Kumar Das, Amlan Chakrabarti

发表机构 * Information Technology（信息技术）； A.K. Choudhury School of Information Technology（A.K. Choudhury 信息技术学院）； Computer Science & Engineering(Artificial Intelligence)（计算机科学与工程（人工智能））； Institute of Engineering & Management（工程与管理学院）； Computer Science & Engineering（计算机科学与工程）； University of Calcutta（加尔各答大学）

AI总结本文提出了一种轻量且与模型无关的框架，通过BERT第8层隐藏状态的激活强度量化token层面的表示重要性，揭示了激活强度集中于语义信息丰富的token，为以注意力为中心的分析提供了可解释且计算高效的替代方法，有助于将BERT从黑箱模型转变为更透明的玻璃箱模型。

AI中文摘要

基于Transformer的语言模型（如拥有超过1.1亿个参数的BERT）已彻底改变了自然语言理解，但其内部机制对研究人员和从业者而言在很大程度上仍不透明。传统的基于注意力的可解释性方法往往强调结构上重要但语义上较弱的token（如标点符号），而不是有意义的语义关系。本文提出了一种轻量且与模型无关的框架，利用BERT第8层隐藏状态的激活强度来量化token层面的表示重要性。所提出的激活流网络（AFN）框架使用第8层隐藏表示的L2范数计算token激活强度，从而能够直接对语义显著的token进行排序。本文进一步引入了基于阈值的激活桶公式，利用经验性的上四分位数激活边界将token划分为高激活（HIGH）组和低激活（LOW）组。实验观察表明，语义上有意义的实词始终位于高激活桶中，并主导表示激活的变化，而起结构支撑作用的token贡献相对较小。结果表明，第8层是一个关键的语义整合区域，在结构信息与语义信息的处理之间取得平衡。通过揭示激活强度如何集中于语义信息丰富的token，本文为以注意力为中心的分析提供了一种可解释且计算高效的替代方法，有助于将BERT从“黑箱”转变为更透明的“玻璃箱”自然语言理解模型。

英文摘要

Transformer-based language models such as BERT having 110M+ parameters have revolutionized natural language understanding, yet their internal mechanisms remain largely opaque to researchers and practitioners. Traditional attention-based interpretability methods often emphasize structurally important but semantically weak tokens such as punctuation marks rather than meaningful semantic relationships. This work introduces a lightweight and model-agnostic framework for quantifying token-level representational importance using hidden-state activation strengths at Layer 8 of BERT. The proposed Activation Flow Network (AFN) framework computes Token Activation Strength using the L2 norm of Layer-8 hidden representations, enabling direct ranking of semantically salient tokens. The study further introduces a threshold-based activation bucket formulation that partitions tokens into HIGH-activation and LOW-activation groups using an empirical upper-quartile activation boundary. Experimental observations demonstrate that semantically meaningful content words consistently occupy the HIGH-activation bucket and dominate representational activation shifts, while structurally supportive tokens contribute comparatively less. The results suggest that Layer 8 acts as a critical semantic consolidation zone balancing structural and semantic information processing. By revealing how activation magnitudes concentrate around semantically informative tokens, this work provides an interpretable and computationally efficient alternative to attention-centric analysis, contributing toward transforming BERT from a "black box" into a more transparent "glass box" model for natural language understanding.
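The two quantities named in the abstract, an L2-norm activation strength per token and an upper-quartile threshold that splits tokens into HIGH and LOW buckets, can be sketched in a few lines. This is a hedged illustration: the function name `activation_buckets`, the `>=` tie rule, and the hand-made `hidden` matrix are assumptions for the example. In practice the rows would be the Layer-8 hidden states of a BERT model for each token.

```python
import numpy as np


def activation_buckets(hidden, tokens, quantile=0.75):
    """Score each token by the L2 norm of its hidden state and bucket by a quantile."""
    strength = np.linalg.norm(hidden, axis=1)
    threshold = np.quantile(strength, quantile)  # empirical upper-quartile boundary
    labels = ["HIGH" if s >= threshold else "LOW" for s in strength]
    return list(zip(tokens, strength.tolist(), labels)), float(threshold)


tokens = ["the", "cat", "sat", "."]
unit = np.full(4, 0.5)                          # a vector of norm 1
hidden = np.outer([2.0, 8.0, 3.0, 1.0], unit)   # synthetic states with known norms
rows, threshold = activation_buckets(hidden, tokens)
ranked = sorted(rows, key=lambda r: r[1], reverse=True)
for token, strength, label in ranked:
    print(f"{token:>4}  {strength:.2f}  {label}")
```

With these synthetic values only the token with the largest norm lands in the HIGH bucket. Whether real content words behave this way is the empirical claim of the paper, not something this sketch demonstrates.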

2605.22376 2026-05-22 cs.LG 版本更新

Target-Aligned Bellman Backup for Cross-domain Offline Reinforcement Learning

目标对齐的贝尔曼备份用于跨域离线强化学习

Wei Liu, Ting Long

发表机构 * School of Artificial Intelligence（人工智能学院）； Jilin University（吉林大学）

AI总结本文提出了一种基于目标域贝尔曼目标对齐的跨域离线强化学习方法，旨在通过评估源域转换与目标域贝尔曼目标的一致性来提升策略学习性能。

AI中文摘要

跨域离线强化学习（CDRL）旨在通过利用源域收集的数据来改进目标域的策略学习。现有方法通常通过测量源域数据与目标域转换的相似性来评估数据的可迁移性，并隐式地进行转换级选择。被判定为相似的转换会被赋予更高的权重或奖励，而不相似的则被降权。然而，转换级的相似性并不一定保证长期回报的一致性。即使视觉或动态上相似的转换在目标域中也可能导致显著不同的结果，这可能会误导策略学习并降低性能。为了解决这个问题，我们重新审视了策略学习的根本目标。由于策略优化最终依赖于贝尔曼目标来评估决策的质量，我们提出基于源域转换与目标域贝尔曼目标的一致性来评估源域转换的可迁移性，而不是表面的转换相似性。基于这一见解，我们提出了一种名为目标对齐的贝尔曼备份（TABB）的方法，通过测量源域数据对目标域中准确贝尔曼目标估计的贡献来选择性地利用源域数据。我们在广泛的跨域离线RL设置中评估了TABB，尤其是在目标域数据高度有限的情况下。实验结果表明，TABB在各种情况下都实现了强大的性能。

英文摘要

Cross-domain offline reinforcement learning (CDRL) aims to improve policy learning in a target domain by leveraging data collected from a source domain. Existing works typically assess the transferability of source-domain data by measuring its similarity to target-domain transitions, and implicitly perform transition-level selection. Transitions that are considered similar are assigned higher weights or rewards, while dissimilar ones are down-weighted. However, transition-level similarity does not necessarily imply consistency in long-term returns. Even visually or dynamically similar transitions may lead to significantly different outcomes in the target domain, which can mislead policy learning and degrade performance. To address this issue, we revisit the fundamental objective of policy learning. Since policy optimization ultimately relies on Bellman targets to evaluate the quality of decisions, we propose to assess the transferability of source-domain transitions based on their alignment with target-domain Bellman targets, rather than superficial transition similarity. Based on this insight, we propose a method termed Target-Aligned Bellman Backup (TABB), which selectively leverages source-domain data by measuring their contribution to accurate Bellman target estimation in the target domain. We evaluate TABB across a broad range of cross-domain offline RL settings with highly limited target-domain data. Experimental results show that TABB consistently achieves strong performance.

2605.22372 2026-05-22 cs.LG 版本更新

一种用于在线Softmax分类中三分之一缩放的边界层机制

Marcel Kühn, Yoon Thelge, Bernd Rosenow

发表机构 * Institute for Theoretical Physics, Leipzig University（理论物理研究所，莱比锡大学）； ScaDS.AI Dresden/Leipzig（ScaDS.AI 德累斯顿/莱比锡）

AI总结本文研究了在线教师-学生模型中平滑替代损失与离散标签之间的不匹配如何产生幂律学习曲线的边界层机制，揭示了测试损失和泛化误差的α^{-1/3}缩放特性，以及学习率调度对泛化误差的改进。

Comments 20 pages, 7 figures

AI中文摘要

硬标签分类通常使用平滑替代损失进行训练，最典型的是softmax交叉熵。我们分离出一种渐近机制：平滑替代损失与离散标签之间的这种不匹配，会在在线教师-学生模型中产生幂律学习曲线。在减去平均logit后，热力学极限下的动力学在中心化变量中闭合：不断增长的中心化学生-教师对齐D和残余学生方差Δ。在后期，远离教师决策边界的样本已被高置信度地分类，其贡献呈指数级小。只有宽度为O(D^{-1})的边界层仍然活跃，而固定学习率在线梯度下降的噪声使Δ保持非零。作为训练时间α的函数，后期解给出α^{-1/3}的幂律，不仅适用于测试损失，也适用于泛化误差ε_g，即1减去测试准确率。这比同一模型的贝叶斯最优参考α^{-1}要慢得多。我们进一步表明，学习率调度可以将泛化误差改进至ε_g ~ α^{-1/2}的幂律。模拟结果支持所预测的序参量动力学和学习曲线。使用相关高斯输入和白化预训练特征的受控实验表明，数据结构可以主导瞬态阶段。因此，我们的结果是一种渐近的、补充性的机制，而不是对神经缩放定律的谱解释的替代。

英文摘要

Hard-label classification is usually trained with smooth surrogate losses, most prominently softmax cross-entropy. We isolate an asymptotic mechanism by which this mismatch between smooth surrogate and discrete labels produces power-law learning curves in an online teacher-student model. After subtracting the mean logit, the thermodynamic-limit dynamics close in centered variables: a growing centered student-teacher alignment $D$ and the residual student variance $Δ$. At late times, examples away from teacher decision boundaries are already classified confidently and contribute exponentially little. Only boundary layers of width $O(D^{-1})$ remain active, while the noise of fixed-learning-rate online gradient descent maintains a nonzero $Δ$. As a function of the training time $α$ the late-time solution yields a $α^{-1/3}$ power law not only for the test loss but also for the generalization error $ε_g$, i.e., one minus test accuracy. This is much slower than the $α^{-1}$ Bayes-optimal reference for the same model. We further show that learning-rate schedules can improve the generalization error towards a $ε_g \sim α^{-1/2}$ power law. Simulations support the predicted order parameter dynamics and learning curves. Controlled experiments with correlated Gaussian inputs and whitened pretrained features show that data structure can dominate transients. Therefore, our result is an asymptotic, complementary mechanism rather than an alternative to spectral explanations of neural scaling laws.
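The practical gap between the three exponents quoted above is easier to see numerically. The sketch below generates idealized curves of the form α^(-decay) (synthetic, with no prefactors or transients, which is an assumption), recovers each exponent from a log-log fit, and reports how much longer training must run to halve the error. The helper names are hypothetical.

```python
import numpy as np


def fitted_exponent(alpha, error):
    """Slope of log(error) against log(alpha), i.e. the power-law exponent."""
    slope, _intercept = np.polyfit(np.log(alpha), np.log(error), 1)
    return float(slope)


def time_factor_to_halve(decay):
    """Factor by which alpha must grow to halve an error scaling as alpha**(-decay)."""
    return 2.0 ** (1.0 / decay)


alpha = np.logspace(2, 6, 50)
regimes = [("fixed learning rate", 1 / 3), ("scheduled", 1 / 2), ("Bayes-optimal", 1.0)]
for name, decay in regimes:
    curve = alpha ** (-decay)
    print(name, round(fitted_exponent(alpha, curve), 3), round(time_factor_to_halve(decay), 2))
```

Under these idealized curves, halving the error costs roughly eight times more training time in the α^{-1/3} regime, four times in the α^{-1/2} regime, and twice in the Bayes-optimal α^{-1} regime.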

2605.22340 2026-05-22 cs.LG 版本更新

From Snapshots to Trajectories: Learning Single-Cell Gene Expression Dynamics via Conditional Flow Matching

从快照到轨迹：通过条件流匹配学习单细胞基因表达动力学

Siyu Pu, Qingqing Long, Xiaohan Huang, Haotian Chen, Jiajia Wang, Meng Xiao, Xiao Luo, Hengshu Zhu, Yuanchun Zhou, Xuezhi Wang

发表机构 * Computer Network Information Center, Chinese Academy of Sciences（中国科学院计算机网络信息中心）； University of California, Los Angeles（加州大学洛杉矶分校）

AI总结本文提出单细胞流匹配（scFM）方法，通过条件流匹配学习单细胞基因表达的动力学，解决时间点不连续和长时间预测中的分布漂移问题，提升轨迹推断的准确性和时间一致性。

AI中文摘要

单细胞RNA测序（scRNA-seq）提供了细胞状态的高维表达谱，使得以数据驱动的方式对细胞动态随时间的变化进行建模成为可能。在实践中，时间分辨的scRNA-seq仅在少数离散时间点以不配对的快照群体形式采集，留下了显著的时间间隙。这促使人们在未测量的时间点进行轨迹推断。现有方法主要沿两个方向发展：最优传输（OT）对齐在观测快照之间提供分布层面的匹配，而连续时间生成模型则支持通过学到的动力学进行预测。然而，仍存在两个挑战：（i）不配对的快照使相邻时间点之间的局部转移变得模糊，导致监督不稳定；（ii）长时程预测依赖于反复积分，微小的建模误差会不断累积并导致分布漂移。为了解决这些挑战，我们提出单细胞流匹配（scFM），一种基于耦合条件流匹配的潜在生成框架。首先，我们计算相邻快照之间的熵正则化OT耦合，并用它们构建软加权的流匹配目标，以学习随时间变化的速度场。其次，我们学习双向速度场，并利用其一致性来细化耦合，在稀疏监督下提升时间连贯性。第三，我们引入分布层面的对齐和潜在动力学正则化，以锚定长时程推演（rollout）并缓解漂移。在真实世界的时间序列scRNA-seq数据集上的实验表明，scFM在时间插值和外推两方面的分布预测性能均持续提升。此外，在中间时间点缺失的情况下，scFM能给出更准确的轨迹重建和时间上连贯的可视化，表明其更忠实地恢复了潜在的基因表达时间动力学。

英文摘要

Single-cell RNA sequencing (scRNA-seq) provides high-dimensional profiles of cellular states, enabling data-driven modeling of cellular dynamics over time. In practice, time-resolved scRNA-seq is collected at only a few discrete time points as unpaired snapshot populations, leaving substantial temporal gaps. This motivates trajectory inference at unmeasured time points. Existing methods mainly follow two directions, optimal-transport (OT) alignment provides distribution-level matching between observed snapshots, while continuous-time generative models support forecasting via learned dynamics. However, two challenges remain: (i) unpaired snapshots render local transitions between adjacent time points ambiguous, leading to unstable supervision; and (ii) long-horizon prediction relies on repeated integration, where small modeling errors compound and cause distribution drift. To address these challenges, we propose single-cell Flow Matching (scFM), a latent generative framework based on coupling-conditioned flow matching. First, we compute entropically regularized OT couplings between adjacent snapshots and use them to construct soft, weighted flow-matching targets for learning time-dependent velocity fields. Second, we learn bidirectional velocity fields and leverage their consistency to refine couplings and improve temporal coherence under sparse supervision. Third, we introduce distribution-level alignment and latent dynamic regularization to anchor long rollouts and mitigate drift. Experiments on real-world time-series scRNA-seq datasets show that scFM consistently improves distributional prediction performance for both temporal interpolation and extrapolation. Moreover, scFM yields more accurate trajectory reconstruction and temporally coherent visualizations where intermediate time points are absent, indicating a more faithful recovery of underlying temporal gene expression dynamics.
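The first step described above, entropically regularized OT couplings between adjacent snapshots that are then turned into flow-matching targets, can be sketched with plain Sinkhorn iterations. Everything specific here is an assumption for illustration: the squared-Euclidean cost, the cost rescaling, the straight-line interpolation path, and sampling pairs in proportion to the coupling are common choices, not details confirmed by the abstract. The two snapshots are synthetic point clouds rather than latent cell states.

```python
import numpy as np


def sinkhorn_coupling(x0, x1, eps=0.5, n_iter=200):
    """Entropic OT coupling between two unpaired snapshots with uniform weights."""
    cost = ((x0[:, None, :] - x1[None, :, :]) ** 2).sum(-1)
    kernel = np.exp(-cost / (eps * cost.mean()))  # rescaled cost keeps exp() well-behaved
    a = np.full(len(x0), 1.0 / len(x0))
    b = np.full(len(x1), 1.0 / len(x1))
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    return u[:, None] * kernel * v[None, :]


def sample_flow_matching_batch(x0, x1, coupling, batch, rng):
    """Draw (t, x_t, velocity target) triples with pairs sampled from the coupling."""
    flat = rng.choice(coupling.size, size=batch, p=coupling.ravel() / coupling.sum())
    i, j = np.unravel_index(flat, coupling.shape)
    t = rng.uniform(size=(batch, 1))
    x_t = (1.0 - t) * x0[i] + t * x1[j]  # point on the straight path between the pair
    target_v = x1[j] - x0[i]             # constant velocity along that path
    return t, x_t, target_v


rng = np.random.default_rng(0)
snapshot_early = rng.normal(size=(30, 2))
snapshot_late = rng.normal(loc=2.0, size=(40, 2))
coupling = sinkhorn_coupling(snapshot_early, snapshot_late)
t, x_t, target_v = sample_flow_matching_batch(snapshot_early, snapshot_late, coupling, 64, rng)
print(coupling.shape, x_t.shape, target_v.shape)
```

A velocity network would then be regressed onto `target_v` at inputs `(x_t, t)`. The bidirectional consistency and drift regularization mentioned in the abstract are not covered by this sketch.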

2605.22338 2026-05-22 cs.LG 版本更新

Physics-Informed Generative Solver: Bridging Data-Driven Priors and Conservation Laws for Stable Spatiotemporal Field Reconstruction

物理引导的生成求解器：连接数据驱动先验与守恒定律以稳定时空场重建

Ziyuan Zhu, Keyu Hu, Zhifei Chen, Yuhao Shi, Ming Bao, Jing Zhao, Gang Wang, Haitan Xu, Jiadong Li, Qijun Zhao, Xiaodong Li, Minghui Lu, Yanfeng Chen

发表机构 * School of Advanced Manufacturing Engineering, Nanjing University（南京大学先进制造工程学院）； National Laboratory of Solid State Microstructures, Nanjing University（南京大学固体微结构物理国家重点实验室）； Suzhou Acoustics Industry Technology Research Institute Co., Ltd.（苏州声学工业技术研究所有限公司）； School of Mechanical and Electric Engineering, Soochow University（苏州大学机电工程学院）； Shishan Laboratory, Nanjing University（南京大学Shishan实验室）

AI总结本文提出了一种物理引导的生成求解器，通过分离稳定的先验学习与推理时的守恒定律强制执行，解决了从稀疏测量中重建连续物理场的问题，同时在声学和气象学中实现了高效且稳定的场重建。

AI中文摘要

从稀疏测量中重建连续物理场是一个核心的逆问题，但数据驱动的生成模型可能会产生违反控制方程动力学的状态。我们提出了一种物理引导的生成求解器，将稳定的先验学习与推理阶段的守恒定律约束分离开来。Martingale-Regularized Score Matching利用Score Fokker-Planck约束对分数（score）预训练进行正则化，从而得到动力学稳定的先验。Physics-Informed Implicit Score Sampling则利用物理残差的梯度引导去噪轨迹，在无需重新训练的情况下将样本投影到容许流形附近。在声学中，该方法由稀疏传感器联合生成声压和质点速度，从而构建可抑制空间混叠的密集虚拟阵列。同一框架还可推广到极端稀疏条件下的真实ERA5气象场。总之，这项工作为求解高维逆问题建立了一个严谨且可推广的范式，弥合了生成式人工智能与第一性原理科学之间的差距。

英文摘要

Reconstructing continuous physical fields from sparse measurements is a central inverse problem, but data-driven generative models can produce states that violate governing dynamics. We introduce a physics-informed generative solver that separates stable prior learning from inference-time enforcement of conservation laws. Martingale-Regularized Score Matching regularizes score pretraining with a Score Fokker-Planck constraint, yielding a dynamically stable prior. Physics-Informed Implicit Score Sampling then guides denoising trajectories by gradients of physical residuals, projecting samples toward admissible manifolds without retraining. In acoustics, the method co-generates pressure and particle velocity from sparse sensors, enabling dense virtual arrays that suppress spatial aliasing. The same framework generalizes to real-world ERA5 meteorological fields under extreme sparsity. Together, this work establishes a rigorous and generalizable paradigm for solving high-dimensional inverse problems, bridging the gap between generative artificial intelligence and first-principles science.

2605.22335 2026-05-22 cs.LG 版本更新

Learning Causal Orderings for In-Context Tabular Prediction

学习用于上下文表格预测的因果顺序

Sascha Xu, Sarah Mameche, Jilles Vreeken

AI总结本文研究了如何在表格预测中同时推断和强制因果结构，通过拓扑变量顺序形式进行因果结构推断，提出TabOrder模型利用因果顺序约束注意力机制，在学习的因果顺序下仅基于先于目标的特征进行预测，并通过似然目标无监督学习最优变量顺序，同时探讨了样本缺失对因果方向识别的影响。

AI中文摘要

面向表格数据的上下文学习在观测设置下树立了很高的预测标准；然而，它主要依赖相关性结构，而这种结构在分布偏移或干预下变得不可靠。虽然已有成熟的因果结构发现方法，但它们通常侧重于结构可识别性，并与可能从中受益的预测架构相互脱节。为了弥合这两种视角，我们研究如何在表格预测中同时推断并强制施加以变量拓扑顺序表示的因果结构。与标准架构不同，我们的模型TabOrder使用受因果顺序约束的注意力，仅基于在所学因果顺序下先于目标的特征进行预测。与因果发现方法类似，TabOrder通过基于似然的目标以无监督方式学习最优变量顺序。我们在标准函数模型类下论证了这一选择的合理性，并研究了样本缺失（表格数据中的常见挑战）如何与因果方向识别相互作用。实验上，我们证实TabOrder能够恢复准确的变量顺序，同时完成预测和插补任务，并为干预下的真实生物数据提供洞见。

英文摘要

In-context learning for tabular data sets strong predictive standards in observational settings; it however primarily relies on correlational structure, which becomes unreliable under distribution shift or intervention. While established methods to discover causal structure exist, they are often focused on structure identifiability and decoupled from the predictive architectures that could benefit from them. To bridge these perspectives, we study how to simultaneously infer and enforce causal structure in the form of topological variable orderings into tabular prediction. Unlike standard architectures, our model TabOrder uses causal order-constrained attention, basing predictions only on features that precede a target under a learned causal order. Similar to causal discovery methods, TabOrder learns the optimal variable ordering in an unsupervised manner through a likelihood-based objective. We justify this choice under standard functional model classes and also study how sample missingness, a common challenge in tabular data, interacts with causal direction identification. Empirically, we confirm that TabOrder recovers accurate variable orderings while addressing prediction and imputation tasks, as well as gives insight into real-world biological data under intervention.
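The idea of order-constrained attention can be sketched as a mask derived from a variable ordering: a feature may only attend to features that precede it. The code below is a minimal illustration under assumptions, not TabOrder's architecture. The ordering is given rather than learned, and `order_mask` and `masked_softmax` are hypothetical helper names.

```python
import numpy as np


def order_mask(order):
    """mask[i, j] is True when feature j precedes feature i in the given order."""
    rank = np.empty(len(order), dtype=int)
    rank[np.asarray(order)] = np.arange(len(order))
    return rank[None, :] < rank[:, None]


def masked_softmax(scores, mask):
    """Row-wise softmax over allowed entries; rows with no predecessors stay all zero."""
    masked = np.where(mask, scores, -np.inf)
    row_max = np.max(masked, axis=1, keepdims=True)
    row_max = np.where(np.isfinite(row_max), row_max, 0.0)
    weights = np.exp(masked - row_max)  # exp(-inf) evaluates to 0 for masked entries
    denom = weights.sum(axis=1, keepdims=True)
    return np.divide(weights, denom, out=np.zeros_like(weights), where=denom > 0)


order = [2, 0, 1]  # feature 2 comes first, then feature 0, then feature 1
mask = order_mask(order)
rng = np.random.default_rng(0)
weights = masked_softmax(rng.normal(size=(3, 3)), mask)
print(mask.astype(int))
print(weights.round(3))
```

Feature 2 has no predecessors, so its attention row is empty. Feature 0 can only use feature 2, and feature 1 can use both of the others.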

2605.22334 2026-05-22 cs.LG 版本更新

Riemannian geometry meets fMRI: the advantages of modeling correlation manifolds and eigenvector subspaces

黎曼几何与fMRI的结合：建模相关流形和特征向量子空间的优势

Mario Severino, Manuela Moretto, Robert A. McCutcheon, Mattia Veronese

发表机构 * Department of Information Engineering, University of Padova（信息工程系，帕多瓦大学）； Department of Neuroimaging, Institute of Psychiatry, Psychology and Neuroscience (IoPPN), King’s College London（神经影像系，精神病学、心理学与神经科学研究所（IoPPN），伦敦国王学院）； Department of Psychiatry, University of Oxford（精神病学系，牛津大学）； Oxford Health NHS Foundation Trust, Warneford Hospital（牛津健康国家卫生信托基金，沃内福德医院）； Department of Psychosis Studies, Institute of Psychiatry, Psychology and Neuroscience, King’s College London（精神病学研究系，精神病学、心理学与神经科学研究所，伦敦国王学院）

AI总结本文提出了一种可扩展的几何框架，通过Off-log度量和Grassmannian子空间判别方法，改进了fMRI数据的分析，提高了敏感性和预测性能。

AI中文摘要

相关矩阵是功能脑网络的基本概括，但标准分析通常将各元素独立处理，忽略了相关空间的弯曲几何。现有的几何方法往往缺乏闭式运算，或依赖于任意的脑区排序，限制了可扩展性。我们提出了一种可扩展的几何框架，包含两个组成部分：（i）Off-log度量，一种将相关矩阵映射为对称零对角矩阵的光滑变换。这使得距离、弗雷歇均值和线性模型都具有闭式表达，从而无需复杂的流形优化即可进行标准统计建模。（ii）Grassmannian子空间判别，通过特征向量子空间之间的主角度距离比较被试，消除固有的符号和基的不确定性。这两个组成部分均可集成到标准机器学习工作流中，用于推断、回归和分类。在两个临床队列（帕金森病和精神病）和三个衰老fMRI数据集上的验证表明，Off-log度量在置换检验中提高了灵敏度，并在分类中达到或超过黎曼和欧几里得基线。脑年龄预测性能相当，其中黎曼度量在三个队列中的两个上表现最佳。Grassmannian方法始终优于欧几里得基线，并突显了与疾病相关的网络。总体而言，具有几何意识的表示提高了灵敏度和预测性能，同时仍易于大规模部署。

英文摘要

Correlation matrices are fundamental summaries of functional brain networks, yet standard analyses often treat entries independently, ignoring the curved geometry of correlation space. Existing geometric methods frequently lack closed-form operations or depend on arbitrary region ordering, limiting scalability. We introduce a scalable geometric framework with two components: (i) the Off-log metric, a smooth transformation mapping correlation matrices to symmetric zero-diagonal matrices. This enables closed-form expressions for distances, Frechet means, and linear models, allowing standard statistical modeling without complex manifold optimization. (ii) Grassmannian subspace discrimination, which compares subjects via principal-angle distances between eigenvector subspaces, resolving inherent sign and basis ambiguities. Both components integrate into standard machine-learning workflows for inference, regression, and classification. Validated across two clinical cohorts (Parkinson's and psychosis) and three ageing fMRI datasets, the Off-log metric increased sensitivity in permutation tests and matched or exceeded Riemannian and Euclidean baselines in classification. Brain-age prediction performance was comparable, with Riemannian metrics excelling in two of three cohorts. The Grassmannian method consistently outperformed Euclidean baselines, highlighting disease-relevant networks. Overall, geometry-aware representations improve sensitivity and predictive performance while remaining straightforward to deploy at scale.
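The second component, comparing subjects through principal angles between eigenvector subspaces, can be sketched as follows. The geodesic form of the distance (the Euclidean norm of the principal angles) and the choice of the top-k eigenvectors are assumptions; the abstract only says "principal-angle distances". The correlation matrices here come from random data, not fMRI.

```python
import numpy as np


def top_eigvec_subspace(corr, k):
    """Orthonormal basis of the span of the k leading eigenvectors."""
    _vals, vecs = np.linalg.eigh(corr)  # eigenvalues are returned in ascending order
    return vecs[:, -k:]


def grassmann_distance(basis_u, basis_v):
    """Geodesic distance from principal angles between two subspaces."""
    cosines = np.linalg.svd(basis_u.T @ basis_v, compute_uv=False)
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))
    return float(np.linalg.norm(angles))


rng = np.random.default_rng(0)
corr_a = np.corrcoef(rng.normal(size=(8, 200)))  # 8 regions, 200 time points
corr_b = np.corrcoef(rng.normal(size=(8, 200)))
sub_a = top_eigvec_subspace(corr_a, 3)
sub_b = top_eigvec_subspace(corr_b, 3)
print(round(grassmann_distance(sub_a, sub_b), 3))
print(grassmann_distance(sub_a, -sub_a) < 1e-6)  # sign flips do not change the subspace
```

Because the distance depends only on the spanned subspace, flipping eigenvector signs or rotating the basis within the subspace leaves it unchanged, which is the ambiguity the abstract refers to.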

2605.22331 2026-05-22 cs.LG cs.AI cs.DC 版本更新

SepsisAI Orchestrator: A Containerized and Scalable Platform for Deploying AI Models and Real-Time Monitoring in Early Sepsis Detection

SepsisAI Orchestrator：一个容器化和可扩展的平台，用于部署AI模型和实时监控以实现早期败血症检测

Santiago Ospitia, John Sanabria, John Garcia-Henao

发表机构 * School of Systems Engineering and Computing, University of Valle（瓦列大学系统工程与计算学院）； Digital Medicine Unit, Balgrist University Hospital（巴尔格里斯特大学医院数字医学部）； Nucleus-AI Research（核芯AI研究所）

AI总结本文提出SepsisAI-Orchestrator平台，通过整合HL7 FHIR启发的临床文档架构（CDA）预处理、NoSQL存储、容器化LightGBM分类器和Streamlit临床仪表板，解决了早期败血症检测中AI模型部署的挑战，并通过负载测试展示了U型扩展行为。

Comments 13 pages, 5 figures. Submitted to BioCARLA 2025 Workshop

AI中文摘要

尽管在临床机器学习文献中预测结果强劲，但将这些模型转化为床边使用仍然受限于系统层面的障碍：异构数据表示、缺乏标准化的部署流程以及研究原型与医院环境的并发性和延迟需求之间的不匹配。我们提出了SepsisAI-Orchestrator，一个开源的模块化平台，旨在解决早期败血症检测中的部署缺口。该平台集成了HL7 FHIR启发的临床文档架构（CDA）预处理、NoSQL存储、通过REST API服务的容器化LightGBM分类器和Streamlit临床仪表板，并通过Docker和Kubernetes进行协调。一个之前已验证的LightGBM模型（在PhysioNet 2019上的F1值为0.87-0.94）在不进行修改的情况下被重用；贡献在于周围基础设施及其在负载下的实证表征。使用k6进行50-1000个并发虚拟用户测试，我们发现副本数量必须与主机的物理CPU线程数匹配：在12线程CPU上从3个副本扩展到12个副本，将p95延迟从3.3秒减少到1.41秒（减少57.3%）并消除所有请求失败，而过度配置到24或48个副本则由于调度器竞争导致性能下降。据我们所知，这种U型扩展行为此前尚未对临床AI推理工作负载进行量化。我们不声称具有前瞻性临床验证。源代码和部署清单可在https://github.com/nucleusai/sepsisai-orchestrator获取。

英文摘要

Despite strong predictive results in the clinical machine learning literature, the translation of these models into bedside use remains limited by systems-level barriers: heterogeneous data representations, the absence of standardized deployment workflows, and a mismatch between research prototypes and the concurrency and latency requirements of hospital environments. We present the SepsisAI-Orchestrator, an open-source modular platform that addresses this deployment gap for early sepsis detection. The platform integrates HL7 FHIR-inspired Clinical Document Architecture (CDA) preprocessing, NoSQL storage, a containerized LightGBM classifier served via REST APIs, and a Streamlit clinical dashboard, orchestrated with Docker and Kubernetes. A previously validated LightGBM model (F1 0.87-0.94 on PhysioNet 2019) is reused without modification; the contribution lies in the surrounding infrastructure and its empirical characterization under load. Using k6 with 50-1000 concurrent virtual users, we find that replica count must be matched to the physical CPU thread count of the host: scaling from 3 to 12 replicas on a 12-thread CPU reduces p95 latency from 3.3s to 1.41s (57.3% reduction) and eliminates all request failures, while over-provisioning to 24 or 48 replicas degrades performance due to scheduler contention. To our knowledge this U-shaped scaling behavior has not been quantified previously for clinical AI inference workloads. We do not claim prospective clinical validation. Source code and deployment manifests are available at https://github.com/nucleusai/sepsisai-orchestrator.
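As a quick sanity check on the load-test figures, the reported 57.3% reduction follows directly from the two p95 latencies quoted in the abstract. The snippet also shows one way a p95 value could be computed from raw latency samples. The helper names and the sample list are hypothetical, and NumPy's default linear interpolation is an assumption that may differ from how k6 reports percentiles.

```python
import numpy as np


def relative_reduction(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


def p95(latencies_s):
    """95th percentile of latency samples, using NumPy's default interpolation."""
    return float(np.percentile(latencies_s, 95))


reduction = relative_reduction(3.3, 1.41)  # p95 latencies in seconds, from the abstract
print(f"p95 latency reduction: {reduction:.1%}")

samples = list(range(1, 101))              # hypothetical latencies, for illustration only
print(p95(samples))
```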

2605.22304 2026-05-22 cs.AI cs.DB cs.LG 版本更新

Evaluation of Pipelines for Data Integration into Knowledge Graphs

数据整合到知识图谱的管道评估

Marvin Hofer, Erhard Rahm

发表机构 * ScaDS.AI Dresden/Leipzig（ScaDS.AI 德累斯顿/莱比锡）； Leipzig University（莱比锡大学）

AI总结本文提出KGI-Bench基准测试，用于评估将不同输入数据整合到现有知识图谱的管道，通过覆盖度、正确性和一致性三个指标分析输出的知识图谱质量，并在电影领域提供基准数据集以评估12种管道的性能。

AI中文摘要

将新数据整合到知识图谱（KG）通常涉及在工作流或管道（pipeline）中执行的不同任务。对于特定的整合问题，存在许多可能的管道，但目前尚无通用方法来评估此类管道的整体质量和性能，从而确定最佳选择。因此，我们提出一个新的基准KGI-Bench，用于评估将不同类型的输入数据摄入现有KG的整合管道。我们通过分析管道的输出（即更新后的KG）来评估管道，使用三个互补的质量指标：覆盖度、正确性和一致性。我们还提供了电影领域的基准数据集（种子KG、三种格式的重叠输入数据、作为真值的参考KG）。为了展示所提基准的适用性和有用性，我们对12种管道进行了比较评估，并分析了它们在不同输入数据格式和设计选择下的行为。

英文摘要

Integrating new data into knowledge graphs (KG) typically involves different tasks that are executed within workflows or pipelines. There are many possible pipelines for a specific integration problem but there is not yet a general approach to evaluate the overall quality and performance of such pipelines to be able to determine the best choices. We therefore propose a new benchmark KGI-Bench to evaluate integration pipelines that ingest different kinds of input data into an existing KG. We evaluate pipelines by analyzing their output, i.e., the updated KG, with the three complementary quality metrics coverage, correctness and consistency. We also provide benchmark datasets (seed KG, overlapping input data of three formats, reference KG as a ground truth) for the movie domain. To demonstrate the applicability and usefulness of the proposed benchmark, we comparatively evaluate 12 pipelines and analyze their behavior across different input data formats and design choices.

2605.22300 2026-05-22 cs.AI cs.LG cs.MA 版本更新

Cross-domain benchmarks reveal when coordinated AI agents improve scientific inference from partial evidence

跨领域基准测试揭示协调AI代理在部分证据下提升科学推断何时有效

Fiona Y. Wong, Markus J. Buehler

发表机构 * Laboratory for Atomistic and Molecular Mechanics (LAMM)（原子分子力学实验室）； Department of Biological Engineering（生物工程系）； Department of Mechanical Engineering（机械工程系）； Department of Civil and Environmental Engineering（土木与环境工程系）； Center for Computational Science and Engineering, Schwarzman College of Computing（计算科学与工程中心）； Massachusetts Institute of Technology（麻省理工学院）

AI总结本文通过跨领域基准测试探讨协调AI代理在部分证据下提升科学推断的有效性，发现当不同学科各自捕捉现象部分时，跨通道复合方法优于单一通道基线，但在某些情况下分解并不总是提升整体性能。

AI中文摘要

科学证据通常分散在不同的仪器、数据库和学科中，因此没有单一来源能完整记录现象。这使得难以判断协调的AI代理何时能比更简单的科学工作流带来额外价值。我们通过一个涵盖四个科学任务的跨领域基准来评估这一问题：将分子结构映射为音乐表示、检测科学史上的范式转变、识别媒介传播疾病的出现，以及甄别凌星系外行星候选体。每个案例均采用冻结的评估集、预定义的评分协议、明确的基线、消融或零对照，以及明确说明的局限性。结果界定了三种运行模式。当不同学科各自只捕捉到现象的一部分时，跨通道复合方法优于单通道基线：气候-媒介疾病出现任务的AUROC达到0.944，系外行星甄别的AUROC达到0.955。然而，系外行星工作流与一个强大的联合摘要基线实际上持平，表明分解并不总能提升总体性能。当某一信号占主导时（如范式转变检测），协调主要改善解释性和可追溯性。对于分子声化（sonification），收益体现在表征而非预测上。ScienceClaw x Infinite为该评估提供了可审计的制品（artifact）和溯源层。因此，只有当相应的性能、溯源或表征主张得到明确对照的支持时，该基准才认定协调具有价值。

英文摘要

Scientific evidence often spans instruments, databases, and disciplines, so no single source records the full phenomenon. This makes it difficult to determine when coordinated AI agents add value over simpler scientific workflows. We evaluate this question with a cross-domain benchmark spanning four scientific tasks: mapping molecular structure into musical representations, detecting historical paradigm shifts in science, identifying vector-borne disease emergence, and vetting transiting-exoplanet candidates. Each case uses a frozen evaluation panel, predefined scoring protocols, explicit baselines, ablations or null controls, and stated limitations. The results define three operating regimes. When different disciplines each capture only part of the phenomenon, cross-channel composites improve over single-channel baselines: climate-vector emergence reaches AUROC 0.944 and exoplanet vetting reaches AUROC 0.955. However, the exoplanet workflow is effectively tied with a strong combined-summary baseline, showing that decomposition does not always improve top-line performance. When one signal dominates, as in paradigm-shift detection, coordination mainly improves interpretation and traceability. For molecular sonification, the gain is representational rather than predictive. ScienceClaw x Infinite provides the auditable artifact and provenance layer for this evaluation. The benchmark therefore assigns value to coordination only when the corresponding performance, provenance, or representation claim is supported by explicit comparators.

2605.22291 2026-05-22 cs.LG 版本更新

Long-term Fairness with Selective Labels

长期公平性与选择性标签

Giovani Valdrighi, Isabel Valera, Marcos Medeiros Raimundo

发表机构 * Department of Computer Science, Saarland University, Saarbrücken, Germany（德国萨尔布吕肯，萨尔大学计算机科学系）

AI总结本文研究了在选择性标签设置下长期公平性的问题，提出了一种新的框架，通过结合观测数据和标签预测模型来估计真实的公平性度量，并提出了一种新的强化学习算法以实现有效长期公平决策。

AI中文摘要

长期公平性算法旨在通过考虑决策政策与人口行为之间的动态关系，满足超越静态和短期观念的公平性。大多数先前的方法从可观察特征和标签评估性能和公平性度量，其中标签被假设为完全可观测。然而，在招聘或贷款等场景中，标签（例如偿还贷款的能力）是选择性标签，因为它们仅在积极决定（例如贷款被批准时）后才被揭示。在本文中，我们研究了选择性标签设置下的长期公平性，并分析表明，朴素的解决方案无法保证公平性。为了解决这一差距，我们引入了一个新的框架，利用观测数据和标签预测模型来估计真实的公平性度量值，将其分解为观测公平性和标签预测中的偏差。这使我们能够通过使用预测模型的置信度来推导出满足真实公平性的充分条件。最后，我们依赖我们的理论结果，提出了一种新的强化学习算法，以实现有效长期公平决策。在半合成环境中，所提出的算法在公平性和性能方面与具有oracle访问真实标签的智能体相当。

英文摘要

Long-term fairness algorithms aim to satisfy fairness beyond static and short-term notions by accounting for the dynamics between decision-making policies and population behavior. Most previous approaches evaluate performance and fairness measures from observable features and a label, which is assumed to be fully observed. However, in scenarios such as hiring or lending, the labels (e.g., ability to repay the loan) are selective labels as they are only revealed based on positive decisions (e.g., when a loan is granted). In this paper, we study long-term fairness in the selective labels setting and analytically show that naive solutions do not guarantee fairness. To address this gap, we then introduce a novel framework that leverages both the observed data and a label predictor model to estimate the true fairness measure value by decomposing it into the observed fairness and bias from label predictions. This allows us to derive sufficient conditions to satisfy true fairness from observable quantities by using the confidence in the predictor model. Finally, we rely on our theoretical results to propose a novel reinforcement learning algorithm for effective long-term fair decision-making with selective labels. In semisynthetic environments, the proposed algorithm reached comparable fairness and performance to an agent with oracle access to the true labels.

2605.22286 2026-05-22 cs.LG cs.AI 版本更新

EmoTrack: Robust Depression Tracking from Counseling Transcripts across Session Regimes

EmoTrack: 从咨询记录中跨会话制度实现稳健的抑郁跟踪

Zhaomin Wu, Jiayi Li, Bingsheng He

发表机构 * Department of Computer Science National University of Singapore（新加坡国立大学计算机科学系）

AI总结本文研究了从单次会话和多会话制度中通过咨询记录进行稳健抑郁跟踪的问题，提出了LongCounsel多会话咨询数据集和EmoTrack框架，结合LLM提取的临床信号和冻结的轮次级语义嵌入，训练症状特定预测器，并通过紧凑的跨会话记忆进一步结合先前会话，实验表明在真实单次会话基准上表现优异。

AI中文摘要

基于文本的咨询是人工智能心理健康支持的重要接口，其中记录可能用于监控抑郁严重程度并标记需要及时人工审查的会话。然而，跨会话制度实现稳健的PHQ-8预测仍然具有挑战性：基于微调的方法可以利用更丰富的监督但可能在数据稀缺时泛化能力差，而基于提示的LLM方法数据高效但通常将每个记录整体处理，对纵向上下文支持有限。我们研究了从咨询记录中跨单次会话和多会话制度进行稳健抑郁跟踪。我们引入了LongCounsel多会话咨询数据集，具有会话级PHQ-8监督，用于评估在部分症状披露和跨会话连续性下的重复会话跟踪。我们进一步提出了EmoTrack，一种PHQ-8预测框架，结合LLM提取的临床信号与冻结的轮次级语义嵌入，并在得到的记录表示上训练症状特定预测器。当先前会话可用时，EmoTrack可通过紧凑的跨会话记忆进一步结合它们。在LongCounsel和DAIC-WOZ上的实验表明，EmoTrack在真实单次会话基准上实现了明显优势，包括在最强DAIC-WOZ基线上的MAE相对减少13.5%，并在LongCounsel上与最强的纵向基线保持竞争力。

英文摘要

Text-based counseling is an important interface for AI mental-health support, where transcripts may be used to monitor depression severity and flag sessions requiring timely human review. However, robust PHQ-8 prediction across session regimes remains challenging: fine-tuning-based methods can exploit richer supervision but may generalize poorly under data scarcity, while prompt-based LLM methods are data-efficient but usually treat each transcript holistically and provide limited support for longitudinal context. We study robust depression tracking from counseling transcripts across single-session and multi-session regimes. We introduce LongCounsel, a multi-session counseling dataset with session-level PHQ-8 supervision for evaluating repeated-session tracking under partial symptom disclosure and cross-session continuity. We further propose EmoTrack, a PHQ-8 prediction framework that combines LLM-extracted clinical signals with frozen turn-level semantic embeddings and trains symptom-specific predictors over the resulting transcript representation. When prior sessions are available, EmoTrack can further incorporate them through compact cross-session memory. Experiments on LongCounsel and DAIC-WOZ show that EmoTrack achieves a clear gain on the real single-session benchmark, including a 13.5% relative MAE reduction over the strongest DAIC-WOZ baseline, and remains competitive with the strongest longitudinal baseline on LongCounsel.


2605.22275 2026-05-22 cs.LG 版本更新

Adaptive Measurement Allocation for Learning Kernelized SVMs Under Noisy Observations

适应性测量分配用于在噪声观测下学习核化SVM

Artur Miroszewski

Affiliations: Φ-lab, European Space Agency (ESA/ESRIN), Frascati, Italy

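AI summary: This paper proposes an adaptive measurement-allocation strategy for learning kernelized support vector machines from noisy observations. By combining geometric sensitivity and active-set instability, it concentrates measurements on the decision-critical regions of the kernel matrix, improving support-vector recovery, margin estimation, and decision-function accuracy.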

Comments 20 pages, 9 figures

详情

AI中文摘要

核方法通常是在假设能够精确获取Gram矩阵的情况下进行建模的。然而，在新兴领域如量子机器学习中，每个核元素必须从噪声观测中推断出来，其准确性取决于如何分配有限的测量预算。尽管如此，现有方法大多依赖于均匀分配，这虽然平等地降低了估计方差，但忽略了核化分类器对Gram矩阵的高度非均匀依赖。在本文中，我们提出了一种适应性测量分配策略，用于从噪声伯努利观测中学习核化支持向量机。我们的方法结合了两个互补原则：(i) 几何敏感性，捕捉单个核元素扰动对分类器边距的影响，以及 (ii) 主动集不稳定性，量化由测量噪声引起的支持向量成员身份的离散变化概率。这些信号定义了一个任务感知的分配方案，将测量集中在核矩阵中最关键的决策区域。我们提供了理论分析，表明适应性分配的益处由诱导核重要结构的异质性决定，导致在不同情况下适应性或均匀策略更优。在合成数据集上的实验证明，在固定测量预算下，适应性分配显著提高了支持向量恢复、边距估计和决策函数准确性。双系数稳定性准则进一步使早停成为可能，仅使用少量测量成本即可达到近最优性能。此外，在从真实数据导出的量子核上的额外实验揭示了与已知现象如核集中度相一致的领域依赖行为。

英文摘要

Kernel methods are typically formulated under the assumption of exact, noise-free access to the Gram matrix. However, in emerging settings such as quantum machine learning, each kernel entry must be inferred from noisy observations, and its accuracy depends on how a limited measurement budget is allocated. Despite this, existing approaches overwhelmingly rely on uniform allocation, which equalizes estimator variance but ignores the highly non-uniform dependence of kernelized classifiers on the Gram matrix. In this work, we introduce an adaptive measurement-allocation strategy for learning kernelized Support Vector Machines (SVMs) from noisy Bernoulli observations. Our approach combines two complementary principles: (i) geometric sensitivity, capturing how perturbations of individual kernel entries affect the classifier margin, and (ii) active-set instability, quantifying the probability of discrete changes in support-vector membership induced by measurement noise. These signals define a task-aware allocation scheme that concentrates measurements on the most decision-critical regions of the kernel matrix. We provide a theoretical analysis showing that the benefit of adaptive allocation is governed by the heterogeneity of the induced kernel importance structure, leading to distinct regimes in which adaptive or uniform strategies are preferable. Empirical evaluations on synthetic datasets demonstrate that adaptive allocation significantly improves support-vector recovery, margin estimation, and decision-function accuracy under fixed measurement budgets. A dual-coefficient stability criterion further enables early stopping, achieving near-optimal performance while using only a fraction of the measurement cost. Additional experiments on quantum kernels derived from real-world data reveal a regime-dependent behavior aligned with known phenomena such as kernel concentration. Together...
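The allocation idea in the abstract can be illustrated with a minimal sketch. It assumes each kernel entry is estimated as the mean of independent Bernoulli shots, so its estimator variance is p(1 - p) / shots, and it splits a fixed shot budget in proportion to a per-entry importance score. The `importance` values are hypothetical inputs: the paper's actual score (built from geometric sensitivity and active-set instability) is not given in the abstract, and the function names are illustrative only.

```python
import numpy as np

def bernoulli_variance(p, shots):
    """Variance of the sample-mean estimate of a Bernoulli parameter p."""
    return p * (1.0 - p) / shots

def allocate(budget, importance):
    """Split an integer shot budget across entries in proportion to importance.

    Every entry gets at least one shot; shots lost to rounding go to the
    most important entries. `importance` is a hypothetical per-entry score.
    """
    importance = np.asarray(importance, dtype=float)
    n = importance.size
    share = importance / importance.sum()
    shots = np.ones(n, dtype=int) + np.floor(share * (budget - n)).astype(int)
    leftover = max(budget - int(shots.sum()), 0)
    order = np.argsort(-importance)
    shots[order[:leftover]] += 1
    return shots

# Uniform allocation is the special case of equal importance scores.
uniform = allocate(12, [1, 1, 1, 1])
adaptive = allocate(104, [6, 2, 1, 1])
```

With equal scores every entry receives the same number of shots, which equalizes the shot count but ignores how strongly the classifier depends on each entry; unequal scores shift shots, and therefore lower variance, toward the entries marked as important.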


2605.22266 2026-05-22 cs.LG cs.AI 版本更新

Detecting Atypical Clients in Federated Learning via Representation-Level Divergence

通过表示层面的分歧检测联邦学习中的非典型客户端

Cristian Pérez-Corral, Jose I. Mestre, Alberto Fernández-Hernández, Manuel F. Dolz, Enrique S. Quintana-Ortí

Affiliations: Universitat Politècnica de València; Universitat Jaume I

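AI summary: This paper proposes a lightweight geometric signal that quantifies the functional deviation of a client from the global model in order to detect atypical clients in federated learning. By measuring how local training changes the activation-induced partition of the input space, it distinguishes stable yet heterogeneous clients from those whose updates diverge significantly from the global regime.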

详情

AI中文摘要

联邦学习使分布式客户端在异质数据上进行协作训练，但这种异质性常常导致更新不稳定和全局性能下降。此外，在实际部署中，客户端更新可能偏离预期行为，不仅由于良性非独立同分布的数据分布，还由于分布偏移或异常输入，这引发了对聚合过程可靠性的担忧。在本工作中，我们提出了一种轻量级的几何信号来量化客户端相对于全局模型的功能偏差。与比较模型参数或梯度不同，我们的方法衡量每个客户端本地训练如何改变激活诱导的输入空间分区，该评估基于共享的探测集。这产生了一个置换不变、可解释的客户端-全局分歧度量，捕捉了模型处理数据方式的差异。我们展示该信号能有效识别导致非典型功能变化的客户端，区分稳定但异质的客户端与那些更新显著偏离全局范式的客户端。因此，所提出的度量提供了一个简单的工具用于监控客户端行为，并在联邦学习系统中实现风险感知的聚合策略。

英文摘要

Federated learning enables collaborative training across distributed clients with heterogeneous data, but such heterogeneity often leads to unstable updates and degraded global performance. Moreover, in practical deployments, client updates may deviate from the expected behavior not only due to benign non-i.i.d. distributions, but also due to distributional shifts or anomalous inputs, raising concerns about the reliability of the aggregation process. In this work, we propose a lightweight geometric signal to quantify the functional deviation of a client with respect to the global model. Instead of comparing model parameters or gradients, our approach measures how the local training of each client alters the activation-induced partition of the input space, evaluated on a shared probe set. This yields a permutation-invariant, interpretable metric of client-global divergence that captures differences in how data is processed by the model. We show that this signal effectively identifies clients that induce atypical functional changes, distinguishing stable yet heterogeneous clients from those whose updates significantly diverge from the global regime. As a result, the proposed metric provides a simple tool for monitoring client behavior and enabling risk-aware aggregation strategies in federated learning systems.
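One way to picture a permutation-invariant comparison of activation-induced partitions is sketched below. This is an illustration under stated assumptions, not the paper's metric: it assumes a single ReLU layer, treats two probes as lying in the same region when their on/off patterns match, and scores divergence as the fraction of probe pairs that the two models group differently. All function names are hypothetical.

```python
import numpy as np

def activation_patterns(weights, bias, probes):
    """Binary ReLU on/off pattern of one hidden layer for each probe input."""
    return (probes @ weights + bias) > 0

def same_region_matrix(patterns):
    """Entry (i, j) is True when probes i and j share the same activation pattern."""
    return (patterns[:, None, :] == patterns[None, :, :]).all(axis=2)

def partition_divergence(patterns_a, patterns_b):
    """Fraction of probe pairs grouped differently by the two models (0 = same partition)."""
    a = same_region_matrix(patterns_a)
    b = same_region_matrix(patterns_b)
    upper = np.triu_indices(a.shape[0], k=1)
    return float(np.mean(a[upper] != b[upper]))

# Shared probe set and two toy one-layer models (global and client).
probes = np.array([[-1.0], [1.0], [2.0]])
w = np.array([[1.0, -1.0]])
global_patterns = activation_patterns(w, np.array([0.0, 0.0]), probes)
client_patterns = activation_patterns(w, np.array([-1.5, 1.5]), probes)
divergence = partition_divergence(global_patterns, client_patterns)
```

Because the score depends only on which probes share a region, reordering the hidden units of either model leaves it unchanged, which is the sense of permutation invariance assumed here.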


2605.22263 2026-05-22 cs.LG cs.AI 版本更新

Tailoring Teaching to Aptitude: Direction-Adaptive Self-Distillation for LLM Reasoning

按能力定制教学：方向自适应自蒸馏用于LLM推理

Hongbin Zhang, Chaozheng Wang, Kehai Chen, Youcheng Pan, Yang Xiang, Jinpeng Wang, Min Zhang

发表机构 * Institute of Computing and Intelligence, Harbin Institute of Technology（计算智能研究所，哈尔滨工业大学）； Peng Cheng Laboratory（鹏城实验室）； Keeta AI, Meituan（Keeta AI，美团）

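AI summary: This paper proposes Direction-Adaptive Self-Distillation (DASD), which improves LLM reasoning through entropy-routed directional supervision. Its analysis finds that a uniform direction of teacher supervision suppresses exploration, and DASD achieves the best macro Avg@16 across six mathematical reasoning benchmarks.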

Comments Under Review

详情

AI中文摘要

在线自蒸馏（OPSD）是一种新兴的LLM后训练范式，其中模型作为自己的教师：在有特权信息（如参考轨迹或提示）的条件下，同一策略为自身 rollout 提供密集的token级监督。然而，最近的研究表明，OPSD 通过抑制预测不确定性而损害复杂推理，这支持探索和假设修订。我们的token级分析显示，这种失败源于在具有不同不确定性水平的token上应用统一的教师监督方向：符合特权自教师会抑制高熵的探索，而偏离教师会降低低熵的步骤准确性。据此，我们提出了方向自适应自蒸馏（DASD），将特权自蒸馏从统一教师模仿重新框架为熵引导的定向监督：高熵token被推离特权教师以保持探索，而低熵token被拉向教师以稳定步骤级执行。在六个数学推理基准上，DASD在强RLVR和自蒸馏基线中实现了最佳的宏Avg@16。Pass@$k$、推理健康和泛化分析表明，这些平均收益来自于在不牺牲步骤级执行的情况下保留探索。

英文摘要

On-policy self-distillation (OPSD) is an emerging LLM post-training paradigm in which the model serves as its own teacher: conditioned on privileged information such as a reference trace or hint, the same policy provides dense token-level supervision on its own rollouts. However, recent studies show that OPSD degrades complex reasoning by suppressing predictive uncertainty, which supports exploration and hypothesis revision. Our token-level analysis shows that this failure arises from applying a uniform direction of teacher supervision across tokens with different uncertainty levels: conformity to the privileged self-teacher suppresses exploration at high entropy, while deviation from the teacher degrades step accuracy at low entropy. Accordingly, we propose Direction-Adaptive Self-Distillation (DASD), which reframes privileged self-distillation from uniform teacher imitation into entropy-routed directional supervision: high-entropy tokens are pushed away from the privileged teacher to preserve exploration, while low-entropy tokens are pulled toward the teacher to stabilize step-level execution. Across six mathematical reasoning benchmarks, DASD achieves the best macro Avg@16 over strong RLVR and self-distillation baselines. Pass@k, reasoning-health, and generalization analyses show that these average gains come from preserving exploration without sacrificing step-level execution.
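The routing rule described above can be sketched in a few lines. The sketch assumes the entropy is taken from the student's own next-token distribution and that a single fixed threshold separates "high" from "low" entropy; neither detail is stated in the abstract, and the function names and threshold value are hypothetical.

```python
import numpy as np

def token_entropy(probs):
    """Shannon entropy (in nats) of each row of a (tokens, vocab) probability matrix."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def routed_direction(student_probs, threshold):
    """Return +1 for tokens pulled toward the teacher and -1 for tokens pushed away.

    Hypothetical rule: tokens whose entropy exceeds `threshold` are pushed away.
    """
    return np.where(token_entropy(student_probs) > threshold, -1.0, 1.0)

# One confident token (low entropy) and one uncertain token (high entropy).
probs = np.array([
    [0.97, 0.01, 0.01, 0.01],
    [0.25, 0.25, 0.25, 0.25],
])
directions = routed_direction(probs, threshold=0.5)
```

In a training loop, such a per-token sign could multiply a per-token distillation term so that the same loss either attracts or repels the student relative to the privileged teacher.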


2605.22262 2026-05-22 cs.SD cs.LG eess.AS 版本更新

Automatic Contextual Audio Denoising

自动上下文音频去噪

Diep Luong, Konstantinos Drossos, Mikko Heikkinen, Tuomas Virtanen

Affiliations: Tampere University; Nokia

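AI summary: This paper proposes automatic contextual audio denoising, which infers the acoustic scene class of the audio to separate relevant sound components from irrelevant ones and thereby improves denoising.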

详情

AI中文摘要

音频上下文决定了哪些声音成分和来源是相关的，哪些可以被听众感知为无关（噪声）。例如，在城市监控中交通噪声是有信息的，而在同一地点的电话通话中则为噪声。大多数当前的音频去噪系统使用固定的目标-噪声定义，往往在一种上下文中去除有用成分而在另一种上下文中无法抑制无关成分。为此，我们引入了自动上下文音频去噪（ACAD）的概念，该概念基于推断的上下文定义目标和噪声。在本工作中，我们将上下文限制为与声学场景类别相关联。我们将场景类别外的事件分布之外的声音事件（噪声）标记为离上下文（OC），而典型于该场景的事件标记为在上下文中（IC）。我们实现了一种深度学习方法，该方法能够自动推断音频信号的上下文并去除OC成分，并将其与无上下文推断、有 oracle 上下文和单独提供无信息上下文的变体进行比较。在跨多样上下文的配对干净/噪声数据上，其中一种上下文中的OC成分可能在另一种上下文中是IC，我们的方法在标准客观指标上优于其他方法，表明模型能够推断上下文，并且上下文依赖的处理可以增强去噪。

英文摘要

Audio context determines which sound components and sources are relevant and which can be perceived as irrelevant (noise) by listeners. For example, traffic noise is informative in urban surveillance but is noise for a phone call at the same location. Most current audio denoising systems apply fixed target-noise definitions, often removing useful components in one context while failing to suppress irrelevant components in another. To address this, we introduce the concept of automatic contextual audio denoising (ACAD), which defines target and noise based on the inferred context. In this work, we restrict context to be associated with an acoustic scene class. We label sound events outside the event distribution of a scene class (noise) as out-of-context (OC) and events typical for that scene as in-context (IC). We implement a deep learning method that automatically infers the context of the audio signal and removes OC components, and benchmark it against three variants: without context inference, with oracle context, and with separately provided uninformative context. On paired clean/noisy data across diverse contexts, where OC components in one context may be IC in another, our proposed method outperforms the other approaches across standard objective metrics, indicating that the model can infer context and that context-dependent processing can enhance denoising.


2605.22259 2026-05-22 cs.LG cs.CV cs.RO 版本更新

An Evidence Hierarchy for Bayesian Object Classification via OSINT-Aided Heterogeneous Sensor Fusion

基于OSINT辅助异质传感器融合的贝叶斯目标分类证据层级

Jan Nausner, Michael Hubner

发表机构 * Center for Digital Safety & Security, Austrian Institute of Technology GmbH (AIT)（数字安全与安全研究所，奥地利技术研究院（AIT））

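AI summary: This paper proposes an OSINT-aided heterogeneous sensor fusion method that builds a novel evidence hierarchy and combines contextual information with domain knowledge to classify CBRNE threats. In simulated scenarios it is robust to clutter and prior mismatch, with an overall classification accuracy of up to 95%.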

Comments 6 pages, 1 figure; © 2026 The Authors. Submitted to the 2026 IEEE International Conference on Multisensor Fusion and Integration (MFI 2026). Under review

详情

AI中文摘要

异质传感器融合对于检测、定位和分类CBRNE威胁至关重要。然而，单独的传感器通常只能检测相关威胁的子集，其可靠性各异，甚至只能提供间接威胁指示，使威胁分类变得困难。此外，传感器侧的高杂波率对融合系统提出了巨大挑战。此外，高质量数据集的有限供应阻碍了智能传感器中基于学习的检测和分类模型的发展。为缓解这些传感器相关缺点，提出了一种上下文感知和领域知识增强的融合过程。首先，建立了一个新的证据层级，能够建模直接、指示性和上下文信息。其次，通过收集、处理和利用OSINT输入，将环境上下文信息引入融合过程。第三，利用证据层级的所有级别，构建一个结合领域知识的贝叶斯威胁类型分类机制。所提出的方法在模拟场景中进行了评估，结果表明该融合方法在抗杂波和先验不匹配方面具有优势，总体分类准确率高达95%。

英文摘要

Heterogeneous sensor fusion is vital for detecting, localizing, and classifying CBRNE threats. However, individual sensors are often only capable of detecting a subset of relevant threats with varying reliability or can even provide only indirect threat indications, making threat classification challenging. Furthermore, high clutter rates on the sensor side present a great challenge for fusion systems. Additionally, the limited availability of high quality datasets hinders the advancement of learning-based detection and classification models in smart sensors. To mitigate these sensor related shortcomings, a context-aware and domain knowledge-enhanced fusion process is proposed. First, a novel evidence hierarchy is established that enables modeling of direct, indicative, and contextual information. Second, contextual information about the environment is introduced into the fusion process, by collecting, processing, and exploiting OSINT inputs. Third, all levels of the evidence hierarchy are used to craft a Bayesian threat type classification mechanism with domain knowledge-informed priors. The proposed methodology is evaluated in simulated scenarios, and the results demonstrate the benefit of the proposed fusion approach in terms of robustness to clutter and prior mismatch, with an overall classification accuracy of up to 95%.


2605.22257 2026-05-22 cs.LG cs.AI cs.LO 版本更新

Holomorphic Neural ODEs with Kolmogorov-Arnold Networks for Interpretable Discovery of Complex Dynamics

Bhaskar Ranjan Karn, Dinesh Kumar

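AI summary: This paper proposes a holomorphic neural ODE framework based on Kolmogorov-Arnold Networks for discovering interpretable governing equations in complex dynamical systems. It preserves holomorphic structure through a differentiable regularization and is validated on six families of complex dynamical systems.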

Comments 16 pages. Comments are welcome

详情

AI中文摘要

由全纯映射（如z² + c）支配的复杂动力系统表现出具有极端初始条件敏感性的分形边界。从数据准确建模这些结构需要尊重底层复解析几何的方法，但神经普通微分方程（Neural ODEs）中的多层感知机（MLP）缺乏复解析先验，违反柯西-黎曼条件，并作为不透明的近似器无法提供 governing equations。我们引入了全纯KAN-ODE框架，用Kolmogorov-Arnold网络（KAN）取代MLP，其可学习的B样条激活函数位于网络边，并将柯西-黎曼方程作为可微正则化以保持全纯结构。我们在六个复杂动力系统家族上进行了评估，涵盖多项式和超越类。仅使用280个参数（比MLP基线少16倍），网络在所有六个系统上实现了速度场R² > 0.95，正确识别了所有六个 governing symbolic families 通过自动样条到公式拟合，并重建了Julia集分形边界，与98.0%一致。关键的是，模型在10%观测噪声下仅表现出4%的MSE退化，而MLP则退化了15.2倍，且在从二次到三次动力学的迁移学习中实现了90.4%的改进。虽然MLP在点重建误差上更低，因为其容量更大，但KAN唯一提供了可解释的符号方程，强制了全纯结构，并具有优越的噪声鲁棒性，这些能力在黑盒架构中完全缺失。这些结果确立了KANs作为MLP的参数高效、可解释的替代方案，用于具有全纯动力学的物理信息发现。

英文摘要

Complex dynamical systems governed by holomorphic maps such as $z^2 + c$ exhibit fractal boundaries with extreme sensitivity to initial conditions. Accurately modelling these structures from data requires methods that respect the underlying complex-analytic geometry, yet Multi-Layer Perceptrons (MLPs) within Neural Ordinary Differential Equations (Neural ODEs) lack complex-analytic priors, violate the Cauchy-Riemann conditions, and function as opaque approximators incapable of yielding governing equations. We introduce Holomorphic KAN-ODE, a framework that replaces the MLP with a Kolmogorov-Arnold Network (KAN) whose learnable B-spline activations reside on network edges, and incorporates Cauchy-Riemann equations as a differentiable regularization to preserve holomorphic structure. We evaluate on six families of complex dynamical systems spanning polynomial and transcendental classes. With only 280 parameters ($16\times$ fewer than the MLP baseline), the network achieves velocity-field $R^2 > 0.95$ on all six systems, correctly identifies all six governing symbolic families through automatic spline-to-formula fitting, and reconstructs Julia set fractal boundaries with up to 98.0% agreement. Crucially, the model exhibits only 4% MSE degradation under 10% observation noise versus $15.2\times$ for MLPs, and achieves 90.4% improvement in transfer learning from quadratic to cubic dynamics. While the MLP attains lower pointwise reconstruction error due to its larger capacity, the KAN uniquely provides interpretable symbolic equations, enforced holomorphic structure, and superior noise resilience, capabilities that are entirely absent in black-box architectures. These results establish KANs as a parameter-efficient, interpretable alternative to MLPs for physics-informed discovery of holomorphic dynamics.
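The Cauchy-Riemann penalty mentioned above can be illustrated with a small numerical check. Writing f = u + iv, the conditions u_x = v_y and u_y = -v_x are equivalent to df/dy = i df/dx, so the squared size of df/dy - i df/dx measures how far a map is from being holomorphic. The sketch below uses central finite differences in place of the differentiable (autodiff-based) regularizer the abstract refers to; that substitution and the function name are assumptions made for illustration.

```python
import numpy as np

def cauchy_riemann_residual(f, z, h=1e-5):
    """Mean squared violation of the Cauchy-Riemann equations at the points z.

    Uses central differences along the real and imaginary directions.
    """
    df_dx = (f(z + h) - f(z - h)) / (2 * h)
    df_dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    # For holomorphic f, df/dy = i * df/dx; this single complex identity
    # encodes both u_x = v_y and u_y = -v_x.
    return float(np.mean(np.abs(df_dy - 1j * df_dx) ** 2))

# Sample points in the complex plane.
grid = np.linspace(-1.0, 1.0, 5)
z = (grid[:, None] + 1j * grid[None, :]).ravel()

c = 0.3 + 0.5j
holomorphic = cauchy_riemann_residual(lambda w: w ** 2 + c, z)   # z^2 + c
non_holomorphic = cauchy_riemann_residual(np.conj, z)            # conj(z)
```

The quadratic map gives a residual at rounding-error level, while complex conjugation, a standard non-holomorphic example, gives a residual of about 4 at every point. Added to a training loss with a weight, a term of this form penalizes learned vector fields that drift away from holomorphic structure.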


2605.22223 2026-05-22 cs.LG 版本更新

How Many Different Outputs Can a Transformer Generate?

变换器能生成多少种不同的输出？

Maxime Meyer, Mario Michelessa, Caroline Chaux, Vincent Y. F. Tan

发表机构 * Department of Mathematics, National University of Singapore, Singapore, 117543（新加坡国立大学数学系）； School of Computing, National University of Singapore, Singapore, 117543（新加坡国立大学计算学院）； Aix Marseille Univ, CNRS, I2M, Marseille, France（法国马赛大学、国家科学研究中心、I2M研究所）； Department of Electrical and Computer Engineering, National University of Singapore（新加坡国立大学电子与计算机工程系）

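AI summary: This paper studies how a handful of architectural characteristics of a transformer can closely predict, both qualitatively and quantitatively, the number of different sequences it can output. It gives an upper bound that depends on the prompt length and is shown empirically to be tight up to a factor less than 10 across architectures and model sizes. The analysis also explains previously observed empirical failures of transformers on simple sequence tasks such as copying and cramming.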

Comments ICML 2026 Spotlight

详情

AI中文摘要

我们研究如何仅利用变换器架构中的少量特性来紧密预测其能生成的不同序列数量，包括定性和定量分析。我们提供一个依赖于提示长度的上限，实验证明在不同架构和模型大小下，该上限紧致于10倍以内。我们的分析还为之前在简单序列任务（如复制和填塞）中观察到的变换器经验性失败提供了理论解释。形式上，我们证明了（i）可访问序列的最大长度（即变换器能为某些提示生成的序列）与提示长度成线性增长，（ii）超过临界阈值后，可访问序列的比例随序列长度呈指数衰减，（iii）提示长度与可访问序列长度之间的线性系数具有理论上限。值得注意的是，这些结果即使在无界上下文和计算时间下也成立。

英文摘要

We study how we can leverage only a handful of characteristics of a transformer's architecture to closely predict the number of different sequences it can output, both qualitatively and quantitatively. We provide an upper bound depending on the length of the prompt, which we show empirically to be tight up to a factor less than 10, across architectures and model sizes. Our analysis also provides a theoretical explanation for previously observed empirical failures of transformers on simple sequence tasks, such as copying and cramming. Formally, we prove that (i) the maximal length of accessible sequences (those that the transformer can output for some prompt) grows linearly with the prompt length, (ii) beyond a critical threshold, the proportion of accessible sequences decays exponentially with sequence length, and (iii) the linear coefficient relating prompt length to accessible sequence length admits a theoretical upper bound. Notably, these results hold even with unbounded context and computation time.


2605.22221 2026-05-22 cs.LG cs.AI cs.LO 版本更新

Can Transformers Learn to Verify During Backtracking Search?

Transformer能否在回溯搜索中学习验证？

Yin Jun Phua, Tony Ribeiro, Tuan Nguyen, Katsumi Inoue

Affiliations: Yin Jun Phua (corresponding author), Institute of Science Tokyo, 2-12-1 Ookayama, Meguro-ku, Tokyo 152-8550, Japan; Tony Ribeiro, Centrale Nantes, CNRS, Laboratoire des Sciences du Numérique de Nantes, LS2N, UMR 6004, F-44000 Nantes, France; National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan; Steelous Protocol, 8-20-32, Ginza, Chuo-ku, Tokyo 104-0061, Japan; Tuan Nguyen, Hanoi University of Science and Technology, No. 1 Dai Co Viet, Hai Ba Trung, Ha Noi, Vietnam; Katsumi Inoue, National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan

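AI summary: This paper studies whether transformers can learn to verify during backtracking search. It shows that training on cumulative traces suffers from scattered retrieval and history entanglement, and addresses them with localization and Selective State Attention (SSA). Experiments on 3-SAT, graph coloring, Blocks World, and backtracking parsing support the effectiveness of SSA.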

详情

AI中文摘要

回溯搜索是经典约束求解器、规划器和定理证明器的基础。最近的基于Transformer的推理系统探索其自身中间步骤的搜索树。一种常见的训练方法是在离线求解器轨迹上拟合自回归的下一个令牌损失。模型的输入在每一步都是所有先前决策的累积轨迹。最优的继续或回溯预测器仅依赖于当前搜索状态，因为到达相同状态的两条轨迹允许相同的延续。我们证明，仅使用累积轨迹训练的解码器Transformer在两种方式上未能满足这一要求：轨迹可以将状态特征散列到许多位置（散列检索），并且预测器可以基于轨迹而非状态（历史纠缠）。我们通过局部化解决散列检索问题，这是一种轨迹级的修复方法，将每个决策块重写以局部化状态特征。我们通过选择性状态注意力（SSA）解决历史纠缠问题，这是一种固定注意力掩码，可以在不修改训练数据、目标或参数的情况下强制结构化基于状态的决策。我们专注于矛盾传播后发生的反应验证。我们在3-SAT、图着色、Blocks World和回溯解析中测试SSA。在仅在先前历史上不同的相同状态对中，SSA发出相同的决定，而自回归训练的因果基线则不会。我们的贡献是针对序列轨迹数据的Transformer行为诊断，配以结构化修复。预训练语言模型在搜索其自身推理步骤时可能面临相同的失败。我们的分析为推理时的上下文清除作为不重新训练的情况下应用相同隔离的方法提供了候选方案。

英文摘要

Backtracking search underlies classical constraint solvers, planners, and theorem provers. Recent transformer-based reasoning systems explore search trees over their own intermediate steps. A common training recipe fits an autoregressive next-token loss on offline solver traces. The model's input at each step is a cumulative trace of all prior decisions. The optimal continue-or-backtrack predictor depends only on the current search state, since two trajectories reaching the same state admit the same viable continuations. We show that decoder-only transformers trained on cumulative traces fail this requirement in two ways: the trace can scatter state features across many positions (scattered retrieval), and the predictor can condition on the trajectory rather than the state (history entanglement). We address scattered retrieval with localization, a trace-level fix that rewrites each decision block to expose state features locally. We address history entanglement with Selective State Attention (SSA), a fixed attention mask that enforces state-based decisions structurally without modifying training data, objective, or parameters. We focus on reactive verification, after propagation has exposed a contradiction. We test SSA on 3-SAT, graph coloring, Blocks World, and backtracking parsing. On same-state pairs that differ only in prior history, SSA emits identical decisions while a cumulative-trained causal baseline does not. Our contribution is a diagnostic of transformer behavior on serialized trajectory data, paired with a structural fix. Pretrained language models that search over their own reasoning steps may face the same failure. Our analysis opens up inference-time context clearing as a candidate way to apply the same isolation without retraining.
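A fixed attention mask that keeps decisions state-based can be pictured as follows. The sketch assumes every token carries the index of the decision block it belongs to and lets a query attend, causally, only to keys in its own block, so earlier history cannot influence the current decision. The exact mask used by SSA is not described in the abstract; the block layout and the function name here are hypothetical.

```python
import numpy as np

def state_only_mask(block_ids):
    """Boolean (query, key) mask: causal attention restricted to the query's own block."""
    ids = np.asarray(block_ids)
    n = ids.size
    causal = np.tril(np.ones((n, n), dtype=bool))   # key position <= query position
    same_block = ids[:, None] == ids[None, :]       # key in the same decision block
    return causal & same_block

# Five tokens: two in an earlier block, three in the current block.
mask = state_only_mask([0, 0, 1, 1, 1])
```

Row 3 of the mask allows only positions 2 and 3, so tokens from block 0 are invisible to the current block. Since the mask is fixed, it changes no training data, objective, or parameters, which matches the structural nature of the fix described above.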


2605.22217 2026-05-22 cs.LG cs.CL 版本更新

Survive or Collapse: The Asymmetric Roles of Data Gating and Reward Grounding in Self-Play RL

生存或崩溃：自我博弈强化学习中数据门控与奖励基础的不对称作用

Sophia Xiao Pu, Zhaotian Weng, Chengzhi Liu, Jayanth Srinivasa, Gaowen Liu, William Yang Wang, Xin Eric Wang

发表机构 * University of California, Santa Barbara（加州大学圣巴巴拉分校）； Cisco Research（思科研究）

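AI summary: This paper studies the asymmetric roles of data gating and reward grounding in self-play reinforcement learning. It finds that a data-level gate is the key factor for stability, whereas no reward signal alone guarantees stability once the gate is removed, and it identifies the "Grounded Proposer Paradox".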

详情

AI中文摘要

自我博弈强化学习通过语言模型自行生成任务进行训练，实现提出者与求解者的共同进化，无需人工标注。最近的系统报告了显著的推理提升，但崩溃和不稳定性普遍存在且理解不足。主流观点将其视为奖励设计问题，但我们认为自我博弈的稳定性由两个不同的调节机制决定：数据层面的门控，决定哪些由提出者生成的任务进入训练池，以及奖励信号，更新已准入任务的策略。通过在Python输出预测任务和确定性DSL双胞胎任务上的受控实验，我们发现这两个机制是不对称的。严格的数据门控在我们测试的每种奖励变体下都能保证稳定性，包括没有地面真实信息访问的自一致性奖励；而一旦移除门控，没有任何奖励变体足以保证稳定性。这种不对称性揭示了我们称之为'基础提出者悖论'的反直觉耦合：具有地面真实信息访问的提出者在与自一致性求解器配对时，会比无地面真实信息的提出者更快崩溃，因为训练集中在形成最快路径到虚假自一致性吸引子的干净任务上。将二进制门控替换为连续严格性参数ε进一步揭示了两阶段相变：训练侧指标在低ε时解耦，而验证准确率在ε远高于时才保持。数据层面的门控，而非奖励校准，是自我博弈稳定性的绑定约束。

英文摘要

Self-play reinforcement learning trains language models on their own generated tasks, co-evolving a proposer and solver without human labels. Recent systems report strong reasoning gains, but collapse and instability are widely observed and poorly understood. The dominant response treats this as a reward-design problem. We argue instead that self-play stability is governed by two distinct levers: a data-level gate that decides which proposer-generated tasks enter the training pool, and the reward signal that updates the policy on tasks already admitted. Through controlled experiments on a Python output-prediction task and a deterministic-DSL twin task that strips pretraining priors, output ambiguity, and executor noise, we find the two levers are asymmetric. A strict gate is sufficient for stability under every reward variant we test, including a self-consistency reward with no access to ground truth; while no reward variant is sufficient once the gate is removed. This asymmetry exposes a counter-intuitive coupling we call the Grounded Proposer Paradox: a proposer with ground-truth access accelerates collapse faster than an ungrounded one when paired with a self-consistency solver, by concentrating training on clean tasks that form the fastest path to a spurious self-consistent attractor. Replacing the binary gate with a continuous strictness parameter $\varepsilon$ further reveals a two-stage phase transition: training-side metrics decouple at low $\varepsilon$, while validation accuracy holds until $\varepsilon$ is much higher. Data-level gating, not reward calibration, is the binding constraint on self-play stability.


2605.22207 2026-05-22 eess.SY cs.LG cs.SY 版本更新

Kernel-Based Safe Exploration in Deep Reinforcement Learning

基于核的深度强化学习安全探索

Rupak Majumdar, Nikhil Singh, Sadegh Soudjani

发表机构 * Max Planck Institute for Software Systems（马克斯·普朗克软件系统研究所）

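AI summary: This paper proposes a kernel-based method for safe exploration in deep reinforcement learning. It learns a barrier function that bounds the probability of the policy reaching unsafe regions, learns the optimal policy and the barrier simultaneously during exploration, and provides better probabilistic safety guarantees as exploration proceeds.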

Comments Accepted at L4DC Conference (22 Jan 2026)

详情

AI中文摘要

安全性在将深度强化学习算法部署到现实世界时是一个主要关注点。一种有前景的方向是学习一个屏障函数，以确保学习的策略不会访问危险区域。屏障函数是从状态到实数的函数，它将初始状态赋予低值，将危险状态赋予高值，并在每次转移中减少期望值；这样的函数可用于限制到达危险状态的概率。以前的研究直接从探索数据中学习屏障函数，但需要大量数据或对系统动力学的限制。在本文中，我们展示了如何利用核嵌入来学习深度强化学习中随机系统的屏障函数。我们的算法，称为基于核的安全探索（KBSE），在探索过程中同时学习最优策略和屏障函数。屏障函数是通过迭代计算得到的，并以条件均值嵌入表示，随着探索的增加，它们提供更好的概率安全保证。探索算法使用学习到的屏障函数来识别安全违规。在发生违规时，它会干预，将危险动作改为安全动作，从而确保探索仅限于限制到达危险状态概率的动作。我们评估了KBSE在多个复杂的连续控制基准上的性能。实验结果表明，我们的新算法适用于合成概率安全的控制策略，而不会影响奖励的累积。

英文摘要

Safety has been a major concern when deploying deep reinforcement learning algorithms in the real world. A promising direction that ensures that the learned policy does not visit unsafe regions is to learn a barrier function along with the policy. A barrier is a function from states to reals that assigns low values to the initial states, high values to the unsafe states, and decreases in expectation on each transition; such a function can be used to bound the probability of reaching unsafe states. Previous attempts learned a barrier function directly from exploration data, but this required either large amounts of data or restrictions on the system dynamics. In this paper, we show how kernel embeddings can be used to learn barrier functions during deep reinforcement learning for stochastic systems with unknown dynamics. Our algorithm, kernel-based safe exploration (KBSE), learns an optimal policy and a barrier simultaneously during exploration. The barriers are computed iteratively, represented as conditional mean embeddings, and provide better probabilistic safety guarantees with more exploration. The exploration algorithm uses the learned barrier functions to identify safety violations. In the case of violation, it intervenes to modify the unsafe action to a safe action, thereby ensuring that the exploration is restricted to actions that bound the probability of reaching unsafe states. We evaluate KBSE on several complex continuous control benchmarks. Experimental results establish our new algorithm to be suitable for synthesizing control policies that are probabilistically safe without degradation in reward accumulation.
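The way a barrier bounds the probability of reaching unsafe states can be written out in one standard form. The normalization below (barrier at most η on the initial set X_0, at least 1 on the unsafe set X_u) is a common convention assumed here for illustration; the abstract does not state the paper's exact conditions. The final bound follows from Ville's inequality for non-negative supermartingales.

```latex
\begin{align*}
&B(x) \ge 0 \ \ \forall x, \qquad B(x) \le \eta \ \ \forall x \in X_0, \qquad B(x) \ge 1 \ \ \forall x \in X_u, \\
&\mathbb{E}\big[\,B(x_{t+1}) \mid x_t\,\big] \le B(x_t) \quad \text{for every state } x_t \text{ under the executed action}, \\
&\Longrightarrow \quad \Pr\big[\exists\, t \ge 0 : x_t \in X_u \,\big|\, x_0 \in X_0\big]
 \;\le\; \Pr\Big[\sup_{t \ge 0} B(x_t) \ge 1\Big] \;\le\; B(x_0) \;\le\; \eta .
\end{align*}
```

Read this way, a smaller η on the initial states gives a stronger safety guarantee, and an action that would break the expected-decrease condition is a natural candidate for the intervention step described above.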


2605.22205 2026-05-22 cs.AI cs.LG 版本更新

Skill Weaving: Efficient LLM Improvement via Modular Skillpacks

技能编织：通过模块化技能包实现高效的LLM改进

Zhuo Li, Guodong Du, Zesheng Shi, Weiyang Guo, Weijun Yao, Yuan Zhou, Jiabo Zhang, Jing Li

发表机构 * Harbin Institute of Technology, Shenzhen, China（哈尔滨工业大学（深圳））； The Hong Kong Polytechnic University（香港理工大学）； Huawei Technologies Co., Ltd.（华为技术有限公司）； Shanghai Jiaotong University（上海交通大学）

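AI summary: This work proposes SkillWeave, a framework that uses modular skillpacks to specialize an LLM to domains under a fixed memory budget, with the SkillZip compression technique enabling efficient deployment. Experiments report strong results on multi-task and agent benchmarks, with speedups of up to 4x.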

Comments Accepted by ACL2026

2605.22200 2026-05-22 cs.CV cs.AI cs.LG 版本更新

OSS: Open Suturing Skills Vision-Based Assessment Challenge 2024-2025

OSS: 2024-2025 开放缝合技能基于视觉的评估挑战

Hanna Hoffmann, Setareh Bady, Claas de Boer, Max Kirchner, Jan Egger, Rainer Röhrig, Frank Hölzle, Lennart Johannes Gruber, Kunpeng Xie, Marlon Neuhaus, Victor Alves, Guilherme Barbosa, Leonardo Barroso, João Carvalho, Hao Chen, Gabriella d'Albenzio, André Ferreira, Nuno Gomes, Yuichiro Hayashi, Kousuke Hirasawa, Rebecca Hisey, Seungjae Hong, Seoi Jeong, Tiago Jesus, Daehong Kang, Satoshi Kasai, Shunsuke Kikuchi, Takayuki Kitasaka, Satoshi Kondo, Hyoun-Joong Kong, Youngbin Kong, Atsushi Kouno, Shlomi Laufer, Kyu Eun Lee, Bining Long, Nooshin Maghsoodi, Hiroki Matsuzaki, Evangelos Mazomenos, Ori Meiraz, Kensaku Mori, Marina Music, Masahiro Oda, Roi Papo, Jieun Park, Rafael Piexoto, Saeid Rezaei, Mariana Ribeiro, Soyeon Shin, Yang Shu, Idan Smoller, Danail Stoyanov, Yihui Wang, Xinkai Zhao, Sebastian Bodenstedt, Isabel Funke, Stefanie Speidel, Behrus Hinrichs-Puladi

Affiliations: Department of Translational Surgical Oncology, National Center for Tumor Diseases (NCT/UCC) Dresden; The Centre for Tactile Internet with Human-in-the-Loop (CeTI), TUD Dresden University of Technology; Department of Oral and Maxillofacial Surgery, University Hospital RWTH Aachen; Center for Tooth-, Mouth- and Jaw Medicine, University Göttingen; Institute of Medical Informatics, University Hospital RWTH Aachen; Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology; German Cancer Research Center (DKFZ); Muroran Institute of Technology; Niigata University of Health and Welfare; Konica Minolta, Inc.; Jmees, Inc.; Department of Computer Science and Engineering, The Hong Kong University of Science and Technology; Center Algoritmi/LASI, University of Minho; Life and Health Sciences Research Institute (ICVS), School of Medicine, University of Minho; ICVS/3B's - PT Government Associate Laboratory; Institute for AI in Medicine (IKIM), University Medicine Essen; The Faculty of Data and Decisions Science, Technion - Israel Institute of Technology; UCL Hawkes Institute, University College London; School of Computing, Queen's University; Department of Transdisciplinary Medicine, Seoul National University Hospital; Interdisciplinary Program in Medical Informatics, Seoul National University; Department of Clinical Medical Sciences, Seoul National University; Institute of Convergence Medicine with Innovative Technology, Seoul National University Hospital; Department of Surgery, Seoul National University College of Medicine and Seoul National University Hospital

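AI summary: This paper presents the OSS challenge, which benchmarks vision-based skill assessment for open surgery. Using the challenge dataset and multi-task evaluation, it compares the performance of different methods on open surgical skill assessment and highlights the potential and current limitations of video-based assessment.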

Comments Stefanie Speidel and Behrus Hinrichs-Puladi jointly supervised this work. Submitted to MEDIA

详情

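AI abstract (translated from Chinese)

Achieving a high level of surgical skill through effective training is essential for optimal patient outcomes. Automated, data-driven skill assessment has the potential to improve surgical training. Although machine-learning-based approaches are increasingly popular for skill assessment in minimally invasive surgery, their application to open surgery remains limited. We present a dedicated MICCAI challenge that aims to benchmark and advance vision-based skill assessment in open surgery. The challenge dataset contains videos of open suturing training tasks recorded with a static GoPro camera in a dry-lab setting and, in addition to the primary video modality, includes instrument trajectory data. The OSS challenge was held in two consecutive years with two and three independent tasks, respectively: (1) classifying skill level into four classes, (2) predicting the full Objective Structured Assessment of Technical Skills (OSATS) score covering eight categories, and (3) tracking hands and surgical tools. Participants submitted a variety of solutions, including deep-learning-based video models, tracking-driven methods, and hybrid approaches. General-purpose spatiotemporal video models consistently achieved the strongest performance, although conceptually diverse approaches also reached competitive levels when executed well. Predicting fine-grained OSATS scores remains challenging but benefits from additional training data. Keypoint tracking is difficult because of frequent occlusions and out-of-frame instances, which limits the current use of motion-based skill analysis. This work evaluates innovative and diverse solutions, highlights the potential and current limitations of video-based assessment in open surgery, and identifies key directions for advancing automated skill assessment toward clinical impact.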

Enhancing Multimodal Large Language Models for Safety-Critical Driving Video Analysis

Tomaso Trinci, Henrique Piñeiro Monteagudo, Leonardo Taccari

发表机构 * Verizon Connect

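AI summary: This study improves the perception and reasoning of multimodal large language models in safety-critical driving scenarios by fusing downsampled video frames with synchronized high-frequency telematics data and semantic information from specialized computer vision models, so that the models identify and describe safety-critical events in real-world driving more accurately.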

Comments Accepted at the 2026 IEEE International Conference on Intelligent Transportation Systems (ITSC 2026)

详情

AI中文摘要

近年来，多模态大语言模型（MLLMs）在一般视觉理解方面展现了出色的性能。然而，其在安全关键驾驶场景中的应用受限于无法准确感知和推理罕见高风险动态事件（如碰撞或接近碰撞）的能力。为此，我们提出了一种增强MLLM感知能力的流程，通过融合降采样视频帧与同步高频telematics数据（IMU和GPS）以及专用计算机视觉模型的语义信息生成高质量的伪标签，包括描述性标题和问答对，专门用于训练MLLM识别和描述现实驾驶中的安全关键事件（SCEs）。我们通过微调开源QwenVL-2.5模型并使用DoRA适配器展示了该方法的有效性：实验表明在少于50M可训练参数和有限计算预算下，显著提高了识别和解释安全关键事件的能力。

英文摘要

Recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities in general visual understanding. However, their application to safety-critical driving scenarios remains limited by an inability to accurately perceive and reason about rare high-stakes dynamic events, such as collisions or near-collisions. To address this, we introduce a pipeline that enhances MLLM perception by fusing downsampled video frames with synchronized high-frequency telematics data (IMU and GPS) and semantic insights from specialized computer vision models. Our pipeline generates high-quality pseudo-labels, including descriptive captions and question-answer pairs, specifically designed to train MLLMs to identify and describe Safety-Critical Events (SCEs) in real-world driving footage. We show the effectiveness of our approach by fine-tuning the open-source QwenVL-2.5 model via DoRA adapters: our experiments demonstrate significant improvements in identifying and explaining safety-critical events, with fewer than 50M trainable parameters and a limited computational budget.


2605.22182 2026-05-22 cs.LG 版本更新

IKNO: Infinite-order Kernel Neural Operators

IKNO：无限阶核神经算子

Pengyuan Zhu, Ivor W. Tsang, Yueming Lyu

Affiliations: Nanyang Technological University; Centre for Frontier AI Research (CFAR), Agency for Science, Technology and Research (A*STAR)

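AI summary: This paper proposes IKNO, a neural operator built from infinite-order kernel integrals, to address the limited expressivity of existing models that rely on first-order kernel integral approximations. Two complementary constructions with fast computation schemes give efficient global information aggregation, and the method reaches state-of-the-art accuracy on nearly all benchmark datasets.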

详情

AI中文摘要

神经算子在现代科学计算中因灵活性和强大的泛化能力而取得了显著成功。然而，现有模型主要依赖于一阶核积分近似，这严重限制了它们的表达能力。为此，我们提出了无限阶核神经算子（IKNO），通过无限阶核积分构建神经算子，并具有优雅的闭式有限近似。我们开发了两种互补的无限阶神经算子构造：IKNO-Vanilla，通过克罗内克特征分解在产品网格上应用完整的核解算子；以及IKNO-TP，一种替代的张量积算子，通过各轴解算子进行组合。此外，我们为这两种IKNO变体开发了快速计算方案，实现了出色的全局信息聚合同时保持高计算效率。实验证明，我们在具有任意输入形状的时间依赖和时间无关基准数据集上评估了我们的IKNO，包括大规模工业数据集。广泛的实验表明，IKNO方法在几乎所有基准数据集上都实现了显著的精度提升，同时保持了对非常大的点云的可扩展性。

英文摘要

Neural operators have achieved significant success in modern scientific computing due to their flexibility and strong generalization capabilities. Existing models, however, primarily rely on first-order kernel integral approximations, which severely limit their expressivity. To address this, we propose the Infinite-order Kernel Neural Operator (IKNO), which constructs neural operators via infinite-order kernel integrals and admits an elegant closed-form finite approximation. We develop two complementary infinite-order neural operator constructions: IKNO-Vanilla, which applies the full-kernel resolvent on the product grid via Kronecker eigendecomposition, and IKNO-TP, an alternative tensor-product operator that composes per-axis resolvents. Furthermore, we develop fast computation schemes for both variants of IKNO, which achieve outstanding global information aggregation while maintaining high computational efficiency. Empirically, we evaluate our IKNO on both time-dependent and time-independent benchmarks with arbitrary input shapes, including large-scale industrial datasets. Extensive experiments demonstrate that the IKNO method consistently achieves the SOTA accuracy with significant improvements on nearly all benchmark datasets while maintaining scalability to very large point clouds.

2605.22177 2026-05-22 cs.LG cs.CL 版本更新

Maestro: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles

Maestro：通过强化学习协调分层模型-技能集合

Jinyang Wu, Guocheng Zhai, Ruihan Jin, Yuhao Shen, Zhengxi Lu, Fan Zhang, Haoran Luo, Zheng Lian, Zhengqi Wen, Jianhua Tao

发表机构 * Tsinghua University（清华大学）； Zhejiang University（浙江大学）； The Chinese University of Hong Kong（香港中文大学）； Nanyang Technological University（南洋理工大学）； Tongji University（同济大学）

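AI summary: This paper proposes Maestro, a reinforcement-learning-driven framework that orchestrates a hierarchical model-skill registry to improve performance on multimodal tasks, yielding an efficient coordination policy that generalizes to unseen models and skills.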

AI中文摘要

大型语言模型（LLMs）和模块化技能的普及使自主代理具备了越来越强大的能力。现有框架通常依赖于单一的LLM和固定的逻辑来与这些技能交互。这导致了一个关键瓶颈：不同的LLMs在不同领域具有不同的优势，但当前框架未能利用模型和技能的互补优势，从而限制了其在下游任务上的性能。在本文中，我们提出了Maestro（多模态代理专家技能强化学习协调框架），这是一个由强化学习（RL）驱动的协调框架，将异构多模态任务重新框架化为一个在分层模型-技能注册表上的顺序决策过程。与将所有知识整合到单一模型中不同，Maestro训练了一个轻量级的策略，动态组合冻结的专家模型和一个双层技能库，决定在每一步是否调用外部专家，选择哪个模型-技能对，以及何时终止。该策略通过基于结果的强化学习进行优化，不需要步骤级监督。我们评估了Maestro在十个代表性的多模态基准上，涵盖数学推理、图表理解、高分辨率感知和领域特定分析。仅使用一个4B的协调器，Maestro实现了70.1%的平均准确率，超过了GPT-5（69.3%）和Gemini-2.5-Pro（68.7%）。关键的是，学习的协调策略能够泛化到未见过的模型和技能，无需重新训练：在注册表中添加非领域专家，使在四个具有挑战性的基准上平均达到59.5%，优于所有闭源基线。Maestro进一步保持了高计算效率和低延迟。源代码可在https://github.com/jinyangwu/Maestro上获得。

英文摘要

The proliferation of large language models (LLMs) and modular skills has endowed autonomous agents with increasingly powerful capabilities. Existing frameworks typically rely on monolithic LLMs and fixed logic to interface with these skills. This gives rise to a critical bottleneck: different LLMs offer distinct advantages across diverse domains, yet current frameworks fail to exploit the complementary strengths of models and skills, thereby limiting their performance on downstream tasks. In this paper, we present Maestro (Multimodal Agent for Expert-Skill Targeted Reinforced Orchestration), a Reinforcement Learning (RL)-driven orchestration framework that reframes heterogeneous multimodal tasks as a sequential decision-making process over a hierarchical model-skill registry. Rather than consolidating all knowledge into a single model, Maestro trains a lightweight policy to dynamically compose ensembles of frozen expert models and a two-tier skill library, deciding at each step whether to invoke an external expert, which model-skill pair to select, and when to terminate. The policy is optimized via outcome-based RL, requiring no step-level supervision. We evaluate Maestro across ten representative multimodal benchmarks spanning mathematical reasoning, chart understanding, high-resolution perception, and domain-specific analysis. With only a 4B orchestrator, Maestro achieves an average accuracy of 70.1%, surpassing both GPT-5 (69.3%) and Gemini-2.5-Pro (68.7%). Crucially, the learned coordination policy generalizes to unseen models and skills without retraining: augmenting the registry with out-of-domain experts yields a 59.5% average on four challenging benchmarks, outperforming all closed-source baselines. Maestro further maintains high computational efficiency with low latency. The source code is available at https://github.com/jinyangwu/Maestro.

2605.22168 2026-05-22 cs.AI cs.LG 版本更新

Measuring Cross-Modal Synergy: A Benchmark for VLM Explainability

衡量跨模态协同：VLM可解释性的一个基准

Joël Roman Ky, Salah Ghamizi, Maxime Cordy

发表机构 * University of Luxembourg（卢森堡大学）； Luxembourg Institute of Health (LIH)（卢森堡健康研究院）

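AI summary: This paper proposes Synergistic Faithfulness as a metric for cross-modal synergy in VLMs. It addresses the shortcomings of unimodal evaluation when assessing VLM explainability: building on the Shapley Interaction Index, it measures multimodal synergy accurately while reducing computational cost.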

AI中文摘要

视觉-语言模型（VLMs）将复杂的视觉输入映射到语义空间，但目前解释VLM的跨模态推理仍依赖于通过单模态扰动度量评估的后验解释器。我们揭示了这一范式的局限性：由于多模态数据集包含语言先验和模态偏差，VLMs经常表现出跨模态冗余，允许它们仅使用文本回答视觉查询。因此，单模态度量惩罚忠实的解释器，导致评估崩溃，其中视觉和文本排名根本矛盾（Kendall's τ= -0.06）。为了解决这一问题，我们引入了Synergistic Faithfulness（F_syn），一个基于Shapley交互指数的可扩展度量，严格隔离模态间的Harsanyi收益，作为高度准确的替代指标（ρ= 0.92），同时实现了24倍的计算加速。在评估8种不同的XAI方法、3种VLM架构和3个基准数据集时，发现为VLM设计的解释器严重过度索引视觉显著性，并在捕捉真正的跨模态协同方面显著劣于适应的注意力方法。通过将视觉合理性与跨模态忠实性解耦，本文提供了一个严格评估框架，以安全审计VLM在高风险部署中的推理。

英文摘要

Vision-Language Models (VLMs) map complex visual inputs to semantic spaces, but interpreting the cross-modal reasoning of VLMs currently relies on post-hoc explainers evaluated via unimodal perturbation metrics. We expose a limitation in this paradigm: because multimodal datasets contain language priors and modality biases, VLMs frequently exhibit cross-modal redundancy, allowing them to answer visual queries using text alone. Consequently, unimodal metrics penalize faithful explainers, triggering an evaluation collapse where visual and textual rankings fundamentally contradict each other (Kendall's $\tau = -0.06$). To resolve this, we introduce Synergistic Faithfulness ($\mathcal{F}_{syn}$), a scalable metric rooted in the Shapley Interaction Index that strictly isolates the joint Harsanyi dividend between modalities, serving as a highly accurate surrogate ($\rho = 0.92$) while achieving a $24\times$ computational speedup. Evaluating 8 distinct XAI methods across 3 VLM architectures and 3 benchmark datasets reveals that explainers proposed for VLMs heavily over-index on visual salience and significantly underperform adapted attention-based methods in capturing true cross-modal synergy. By decoupling visual plausibility from cross-modal faithfulness, this work provides a rigorous evaluation framework required to safely audit VLM reasoning in high-stakes deployments.
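
To make the "joint Harsanyi dividend between modalities" concrete, here is a minimal sketch for the simplest case of two players (image and text). With only two players, the Shapley Interaction Index of the pair reduces to the inclusion-exclusion term below. The function name and the example scores are illustrative assumptions; this is not the paper's scalable $\mathcal{F}_{syn}$ estimator.

```python
def pairwise_synergy(v_both, v_image_only, v_text_only, v_neither):
    """Harsanyi dividend of the {image, text} coalition in a two-player game.

    Each argument is a model score (e.g. answer confidence) when the named
    modalities are kept and the others are masked. Values are hypothetical.
    """
    return v_both - v_image_only - v_text_only + v_neither


# Redundant case: text alone already gives the full score, so synergy is ~0.
redundant = pairwise_synergy(v_both=0.9, v_image_only=0.1,
                             v_text_only=0.9, v_neither=0.1)

# Synergistic case: neither modality suffices alone, so the dividend is large.
synergistic = pairwise_synergy(v_both=0.9, v_image_only=0.2,
                               v_text_only=0.2, v_neither=0.1)
```

A unimodal perturbation metric would see little effect from masking the image in the redundant case, which is the kind of cross-modal redundancy the abstract describes.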

2605.22164 2026-05-22 cs.LG cs.RO 版本更新

Beyond Euclidean Proximity: Repairing Latent World Models with Horizon-Matched Trajectory Reachability Metrics

超越欧几里得距离：通过地平线匹配轨迹可达性度量修复潜在世界模型

Liangyu Li, Shengzhi Wang, Qingwen Liu

发表机构 * Tongji University（同济大学）

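AI summary: This paper proposes trajectory reachability metrics (TRM), a post-hoc terminal-ranking method for fixed latent world models. It trains a small pairwise head to improve the ranking of terminal candidates, which improves planning performance (for example, from 7.0% to 97.0% success on a hard TwoRoom benchmark).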

Comments 26 pages, 7 figures

AI中文摘要

潜在世界模型可以包含用于控制的状态，但其终端成本接口可能会向规划器暴露错误的决策相关信息。在常见的潜在MPC中，候选序列通过预测终端和目标潜在状态之间的欧几里得距离进行排名；这假设了原始潜在距离权重能够正确地反映可达性相关变量。我们提出轨迹可达性度量（TRM），一种用于固定潜在世界模型的后处理终端排名方法。TRM从记录的轨迹结构中训练一个小的成对头部，并将其用作替代或混合成本；编码器、动力学、采样器、优化器和评估表现保持不变。关键设计选择是地平线意识监督：该度量在广泛的、平衡的时间分离上进行训练，以匹配长地平线终端候选排名问题。在硬TwoRoom基准上，使用LeWorldModel（LeWM）的原始潜在规划成功率为7.0%，而全地平线TRM成功率为97.0%；洗牌时间标签控制仍为0.0%。同样的配方在三个种子上将PLDM基线从32.7%提高到84.0%，而短地平线TRM变体在100,000对预算下仅达到35.0%。在TwoRoom中，我们提供了TRM为何有效的机理证据：XY位置是线性可解码的（R²=0.998），但原始潜在MSE错误地排名候选；XY探针行空间在终端-目标潜在MSE中占比不到1%，但承载了大部分候选质量信号；SCSA审计显示TRM提高了规划器看到的排序和选定终点。在PushT go50/go75中，TRM风格的任务-状态度量比闭环成功更清晰地改进了SCSA排名和选定最终距离，推动了连续操控中的辅助混合成本。TRM是规划器面对的修复，审计解释了何时终端可达性度量应替代或补充原始潜在接近度。

英文摘要

Latent world models can contain the state needed for control, yet their terminal-cost interface can expose the planner to the wrong decision-relevant information. In common latent MPC, candidate sequences are ranked by Euclidean distance between predicted terminal and goal latent states; this assumes that raw latent distance weights reachability-relevant variables correctly. We propose trajectory reachability metrics (TRM), a post-hoc terminal-ranking method for fixed latent world models. TRM trains a small pairwise head from logged trajectory structure and uses it as a replacement or hybrid cost; the encoder, dynamics, sampler, optimizer, and evaluation manifests remain fixed. The key design choice is horizon-aware supervision: the metric is trained on broad, balanced temporal separations to match the long-horizon terminal candidate ranking problem. On a hard TwoRoom benchmark, raw latent planning with LeWorldModel (LeWM) reaches 7.0% success, while full-horizon TRM reaches 97.0%; shuffled temporal-label controls stay at 0.0%. The same recipe improves a PLDM baseline from 32.7% to 84.0% across three seeds, and a short-horizon TRM variant reaches only 35.0% with the 100,000 pair budget. In TwoRoom, we provide mechanistic evidence for why TRM works: XY position is linearly decodable (R^2=0.998), yet raw latent MSE misranks candidates; the XY-probe rowspace accounts for less than 1% of terminal-goal latent MSE but carries most candidate-quality signal; and SCSA audits show that TRM improves the ordering and selected endpoint seen by the planner. On PushT go50/go75, TRM-style task-state metrics improve SCSA ranking and selected final distance more cleanly than closed-loop success, motivating auxiliary hybrid costs in continuous manipulation. TRM is the planner-facing repair, and audits explain when terminal reachability metrics should replace or augment raw latent proximity.

2605.22156 2026-05-22 cs.LG cs.AI 版本更新

从赌局到经验伯恩斯坦LIL

Francesco Orabona

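AI summary: This paper derives laws of the iterated logarithm (LIL) from the wealth guarantees of online betting strategies and presents an empirical Bernstein LIL.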

2605.22112 2026-05-22 astro-ph.HE astro-ph.IM cs.LG 版本更新

Self-Supervised ConvLSTM for Fermi Large Area Telescope Transient Detection

基于自监督的ConvLSTM用于费米大视场望远镜瞬变检测

Alberto Garinei, Stefano Speziali, Alessandro Vispa, Andrea Marini, Sara Cutini, Emanuele Piccioni, Marcello Marconi, Francesco Longo, Matteo Martini, Francesca Fallucchi, Romeo Giuliano, Ernesto William De Luca, Umberto Di Matteo, Sabino Meola

发表机构 * Idea-RE

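AI summary: This paper combines end-to-end simulation with self-supervised spatio-temporal deep learning to detect transient gamma-ray phenomena in Fermi-LAT data in a controlled environment. It generates a ten-year synthetic Universe and uses a ConvLSTM network to model the nominal evolution of the sky, flagging departures from it as anomalies.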

Comments 17 pages, 5 figures. Accepted for publication in Astronomy and Computing. Author-accepted manuscript version

Journal ref Astronomy and Computing 56 (2026) 101128

DOI: 10.1016/j.ascom.2026.101128

AI中文摘要

我们提出了一种框架，通过将费米- LAT天空的端到端模拟与自监督时空深度学习相结合，用于在受控环境中检测瞬变伽马射线现象。我们使用gtobssim生成一个十年的合成宇宙，并将模拟事件处理成每日全天空计数和曝光图，获得一个时间有序的序列，其结构与费米- LAT观测一致。为了建模天空的典型演变，我们采用卷积长短期记忆网络（ConvLSTM），该网络直接在地图序列上运行，保持空间局部性的同时学习时间依赖性。模型被训练以重建预期的发射，偏离学习基线的量通过像素级均方残差图量化。然后，我们通过从训练集上的残差分布估计每个像素的阈值，定义统计学驱动的异常标准，并通过局部滤波强制空间一致性以抑制孤立波动。训练后的ConvLSTM被部署到费米- LAT每日地图上，其中天空可能由于真实的天体物理变化或仪器非平稳性而偏离典型行为。所得到的流程可以标记出与高变源或瞬变事件（如耀斑或伽马射线暴）一致的局部、时间依赖的过剩，并为在长持续时间、费米- LAT类数据集上评估异常检测策略提供基准。

英文摘要

We present a framework for detecting transient gamma-ray phenomena in a controlled environment by combining end-to-end simulations of the Fermi-LAT sky with self-supervised spatio-temporal deep learning. We generate a ten-year synthetic Universe with gtobssim and process the simulated events into daily all-sky maps of counts and exposure, obtaining a time-ordered sequence that mirrors the structure of Fermi-LAT observations. To model the nominal evolution of the sky, we employ a Convolutional Long Short-Term Memory (ConvLSTM) network that operates directly on map sequences, preserving spatial locality while learning temporal dependencies. The model is trained to reconstruct expected emission, and departures from the learned baseline are quantified through pixel-wise mean-squared residual maps. We then define statistically motivated anomaly criteria by estimating per-pixel thresholds from the residual distribution on the training set, and we enforce spatial coherence via local filtering to suppress isolated fluctuations. The ConvLSTM is then deployed as a trained predictor on Fermi-LAT daily maps, where the sky can depart from the nominal behavior because of genuine astrophysical variability and instrumental non-stationarities. The resulting pipeline flags localized, time-dependent excesses consistent with highly variable sources or transient events (e.g., flares or GRBs) and provides a benchmark for evaluating anomaly-detection strategies on long-duration, Fermi-LAT-like datasets.
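
The thresholding and spatial-coherence step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the quantile level, the 3x3 neighborhood, the `min_neighbors` rule, and all function names are hypothetical choices, not the paper's settings, and the residual maps are plain nested lists rather than real sky maps.

```python
def per_pixel_thresholds(train_residuals, quantile=0.99):
    """Estimate one threshold per pixel from training residual maps.

    train_residuals: list of 2-D maps (lists of rows) of squared residuals.
    """
    rows, cols = len(train_residuals[0]), len(train_residuals[0][0])
    thresholds = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            values = sorted(m[r][c] for m in train_residuals)
            index = min(len(values) - 1, int(quantile * len(values)))
            thresholds[r][c] = values[index]
    return thresholds


def flag_anomalies(residual_map, thresholds, min_neighbors=2):
    """Flag pixels above threshold, keeping only spatially coherent ones."""
    rows, cols = len(residual_map), len(residual_map[0])
    raw = [[residual_map[r][c] > thresholds[r][c] for c in range(cols)]
           for r in range(rows)]
    kept = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not raw[r][c]:
                continue
            # Count flagged pixels in the 3x3 window, excluding the pixel itself.
            neighbors = sum(raw[rr][cc]
                            for rr in range(max(0, r - 1), min(rows, r + 2))
                            for cc in range(max(0, c - 1), min(cols, c + 2))) - 1
            kept[r][c] = neighbors >= min_neighbors
    return kept


# Toy data: flat training residuals, then a 2x2 excess plus one isolated spike.
train = [[[1.0] * 4 for _ in range(4)] for _ in range(3)]
test_map = [[0.0] * 4 for _ in range(4)]
for r, c in [(0, 0), (0, 1), (1, 0), (1, 1), (3, 3)]:
    test_map[r][c] = 5.0
flags = flag_anomalies(test_map, per_pixel_thresholds(train))
```

In this toy case the 2x2 excess survives the filter while the isolated spike at (3, 3) is suppressed, which mirrors the stated goal of removing isolated fluctuations.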

2605.22111 2026-05-22 cs.LG cs.CE stat.ML 版本更新

Aerodynamic force reconstruction using physics-informed Gaussian processes

利用物理信息高斯过程进行气动力重建

Gledson Rodrigo Tondo, Igor Kavrakov, Guido Morgenthal

Affiliations: Bauhaus-Universität Weimar; University of Cambridge

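AI summary: This paper proposes a probabilistic physics-informed machine learning method that reconstructs the underlying aerodynamic loads from noisy measurements of structural dynamic responses. According to the abstract, the model avoids overfitting and needs no regularization scheme.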

DOI: 10.1007/978-3-032-15130-8_20

AI中文摘要

准确建模气动载荷对于理解和预测复杂结构系统的响应至关重要。然而，这些模型往往依赖于真实物理力的简化，引入假设可能会限制其准确性。在存在噪声或不完整数据的情况下，验证这些模型变得特别具有挑战性。为此，我们介绍了一种概率物理信息机器学习方法，旨在从结构动态响应的噪声测量中重建底层气动载荷。该模型避免了过拟合，消除了对正则化方案的需要，并允许在训练过程中使用异质和多保真度数据。通过重建大贝尔东桥在线性非稳态假设下的气动载荷，证明了该方法的有效性。结果表明，真实和预测载荷之间有很强的一致性，特别是在均方误差、幅度、相位角和信号峰值值方面。该载荷重建方法具有广泛的应用前景，如模型验证、未来载荷估计和结构损伤预测。

英文摘要

Accurate modeling of aerodynamic loads is essential for understanding and predicting the responses of complex structural systems. However, these models often rely on simplifications of the true physical forces, introducing assumptions that can limit their accuracy. Validating such models becomes particularly challenging in the presence of noisy or incomplete data. To address this, we introduce a probabilistic physics-informed machine learning approach designed to reconstruct the underlying aerodynamic loads from noisy measurements of structural dynamic responses. The model avoids overfitting, eliminates the need for regularization schemes, and allows for the use of heterogeneous and multi-fidelity data during the training process. The efficacy of the approach is demonstrated through the reconstruction of aerodynamic loads on the Great Belt East Bridge, simulated under a linear unsteady assumption. Results show a strong agreement between true and predicted loads, particularly in terms of root mean squared errors, magnitude, phase angle and peak values of the signals. The load reconstruction method holds broad applicability, such as model validation, future load estimation, and structural damage prognosis.

2605.22098 2026-05-22 cs.CV cs.AI cs.LG 版本更新

TextTeacher: What Can Language Teach About Images?

TextTeacher: 语言能教会我们关于图像什么？

Tobias Christian Nauen, Stanislav Frolov, Brian Bernhard Moser, Federico Raue, Ahmed Anwar, Andreas Dengel

Affiliations: RPTU University Kaiserslautern-Landau; German Research Center for Artificial Intelligence (DFKI)

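AI summary: This study proposes TextTeacher, which injects the semantic knowledge of a language model into image classification training to improve vision models while leaving the inference-time model unchanged.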

Comments Published at TMLR

Journal ref Transactions on Machine Learning Research, ISSN 2835-8856, 2026

AI中文摘要

柏拉图表示假设认为，足够大的模型会收敛到共享的表示几何结构，即使跨模态。受此启发，我们提出问题：语言模型的语义知识能否有效提升视觉模型？为此，我们引入TextTeacher，一种简单的辅助目标，将文本嵌入作为额外信息注入图像分类训练。TextTeacher利用 readily available 的图像描述、预训练并冻结的文本编码器以及轻量级投影，生成语义锚点，高效引导训练期间的表示，同时保持推理时的模型不变。在ImageNet上使用标准ViT后端，TextTeacher将准确率提升高达+2.7个百分点（p.p.），并在相同配方和计算条件下产生一致的迁移增益（平均+1.0 p.p.）。它优于视觉知识蒸馏，在相同计算预算下更准确，或在相似准确率下更快。我们的分析表明，TextTeacher在训练初期塑造了更深的层，并通过补充互补的语义线索帮助泛化。TextTeacher增加的开销很小，不需要对目标模型进行昂贵的多模态训练，并保持纯视觉模型的简洁性和延迟。

英文摘要

The platonic representation hypothesis suggests that sufficiently large models converge to a shared representation geometry, even across modalities. Motivated by this, we ask: Can the semantic knowledge of a language model efficiently improve a vision model? As an answer, we introduce TextTeacher, a simple auxiliary objective that injects text embeddings as additional information into image classification training. TextTeacher uses readily available image captions, a pre-trained and frozen text encoder, and a lightweight projection to produce semantic anchors that efficiently guide representations during training while leaving the inference-time model unchanged. On ImageNet with standard ViT backbones, TextTeacher improves accuracy by up to +2.7 percentage points (p.p.) and yields consistent transfer gains (on average +1.0 p.p.) under the same recipe and compute. It outperforms vision knowledge distillation, yielding higher accuracy at a constant compute budget, or similar accuracy 33% faster. Our analysis indicates that TextTeacher acts as a feature-space preconditioner, shaping deeper layers in the first stages of training, and aiding generalization by supplying complementary semantic cues. TextTeacher adds negligible overhead, requires no costly multimodal training of the target model, and preserves the simplicity and latency of pure vision models. Project page with code and captions: https://nauen-it.de/publications/text-teacher

2605.22097 2026-05-22 quant-ph cs.LG 版本更新

Q-PhotoNAS: Hybrid Quantum Neural Architecture Search Framework on Photonic Devices

Q-PhotoNAS：基于光子设备的混合量子神经架构搜索框架

Farah Elnakhal, Alberto Marchisio, Nouhaila Innan, Gabriel Falcao, Muhammad Shafique

发表机构 * Quandela Ascella photonic QPU（Quandela Ascella 光子量子处理器）

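AI summary: This paper proposes a neural architecture search framework for hybrid photonic quantum-classical models that combines genetic-algorithm-based search with learnable quantum phase encoding. By systematically exploring the joint design space of classical and quantum components, it improves accuracy and hardware compatibility on image classification tasks.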

AI中文摘要

光子量子计算是一种有前景的可扩展量子机器学习平台，但在硬件和优化约束下设计有效的混合架构仍然具有挑战性。现有方法依赖于手动调优的架构，无法考虑经典预处理、相位编码和光子电路结构之间的协同作用，限制了准确性和硬件兼容性。在本文中，我们提出了一种混合光子量子-经典模型的神经架构搜索框架，结合基于遗传算法的搜索和可学习量子相位编码，系统地探索经典和量子组件的联合设计空间。我们的框架编码了19个超参数，分布在六个基因组中，并通过基于组的交叉、按基因突变和精英主义进化混合架构的种群。在短训练预算下评估每个候选者，然后对最佳设计进行完整重新训练。我们在两个图像分类基准测试上评估了我们的框架，即Digits和MNIST，分别达到了99.44%和98.78%的最终验证准确率，基于Quandela Ascella光子QPU的第一性执行时间估计，单张图像推断时间分别为67 ms（Digits）和149 ms（MNIST）。我们的量子贡献分析进一步显示，光子层提取了与经典路径正交的非冗余特征，相较于仅经典基线提供了可测量的准确性优势。我们的结果表明，自动化架构搜索对于混合光子系统来说既实用又具有影响，为在光子设备上量子AI的系统设计空间探索开辟了道路。

英文摘要

Photonic quantum computing is a promising platform for scalable quantum machine learning, but designing effective hybrid architectures remains challenging under hardware and optimization constraints. Existing approaches rely on manually tuned architectures that fail to account for the collaboration between classical preprocessing, phase encoding, and photonic circuit structure, limiting both accuracy and hardware compatibility. In this paper, we propose a neural architecture search framework for hybrid photonic quantum-classical models that combines genetic algorithm-based search with learnable quantum phase encoding to systematically explore the joint design space of classical and quantum components. Our framework encodes 19 hyperparameters across six gene groups and evolves a population of hybrid architectures using group-based crossover, per-gene mutation, and elitism, evaluating each candidate on a short training budget before full retraining of the best found design. We evaluate our framework on two image classification benchmarks, Digits and MNIST, achieving final validation accuracies of 99.44% and 98.78%, respectively, with first-principles execution time estimates on the Quandela Ascella photonic QPU projecting single-image inference at 67 ms (Digits) and 149 ms (MNIST). Our quantum contribution analysis further shows that the photonic layer extracts non-redundant features orthogonal to the classical pathway, providing a measurable accuracy advantage over classical-only baselines. Our results demonstrate that automated architecture search is both practical and impactful for hybrid photonic systems, opening the way for systematic design space exploration of quantum AI on photonic devices.

2605.21214 2026-05-22 cs.LG cs.AI 版本更新

Behavior-Consistent Deep Reinforcement Learning

行为一致的深度强化学习

Marcel Hussing, Liv G. d'Aliberti, Claas Voelcker, Benjamin Eysenbach, Eric Eaton

发表机构 * University of Pennsylvania（宾夕法尼亚大学）； Princeton University（普林斯顿大学）； University of Texas at Austin（德克萨斯大学奥斯汀分校）

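AI summary: This paper proposes a behavior-consistent deep reinforcement learning method that controls the distributional similarity of policies to reduce policy divergence across training runs, lowering return variance without sacrificing performance.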

AI中文摘要

强化学习（RL）在不同训练运行中常常表现出高方差，导致性能不可靠，并对现实领域中的部署构成重大挑战。在本文中，我们通过形式化行为一致的RL问题来解决跨运行策略分歧的挑战，目标是获得在不同训练运行中表现优异且分布相似的策略。我们的关键观察是最大熵RL提供了一种直接机制来控制行为分歧，通过将运行锚定到一个共同的（均匀）先验。我们证明，对于玻尔兹曼策略，选择温度与Q函数分歧界成正比可以限制诱导策略之间的成对KL散度。然而，我们还表明，简单地增加熵可能会损害策略优化并放大非策略误差。基于这些观察，我们提出了Q值期望分歧（QED），一种状态依赖的温度调度，利用双批评机分歧作为单次运行的跨运行分歧代理。经验上，我们在18个连续控制任务中展示了QED将跨运行分歧减少两个数量级，而不会牺牲性能，从而在适度的样本效率成本下实现了显著的回报方差减少。

英文摘要

Reinforcement learning (RL) often exhibits high variance across training runs, leading to unreliable performance and posing a major challenge to deployment in real-world domains. In this work, we address the challenge of cross-run policy divergence by formalizing the problem of behavior-consistent RL, where the objective is to obtain policies that are both high-performing and distributionally similar across training runs. Our key observation is that maximum-entropy RL provides a direct mechanism for controlling behavioral divergence by anchoring runs to a common (uniform) prior. We prove that, for Boltzmann policies, choosing the temperature proportional to $Q$-function disagreement bounds the pairwise KL divergence between the induced policies. However, we also show that naïvely increasing entropy might impair policy optimization while amplifying off-policy error. Building upon these observations, we propose $Q$-value Expectile Disagreement (QED), a state-dependent temperature schedule that uses double-critic disagreement as a single-run proxy for cross-run disagreement. Empirically, we demonstrate that across 18 continuous-control tasks, QED reduces across-run divergence by two orders of magnitude without sacrificing performance, resulting in a considerable reduction in return variance at modest sample-efficiency costs.
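
The role of temperature in limiting divergence between Boltzmann policies can be illustrated with a small numeric sketch. The Q-values below are made up to stand for two training runs that disagree about the best action; the sketch only shows the qualitative effect for this example and is not the paper's QED schedule or its bound.

```python
import math


def boltzmann(q_values, temperature):
    """Softmax policy over actions: pi(a) proportional to exp(Q(a) / temperature)."""
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    z = sum(weights)
    return [w / z for w in weights]


def kl(p, q):
    """KL(p || q) for two discrete distributions with full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical Q-values from two runs in the same state.
q_run_a = [1.0, 0.0, -0.5]
q_run_b = [0.2, 0.9, -0.5]

kl_low_temp = kl(boltzmann(q_run_a, 0.1), boltzmann(q_run_b, 0.1))
kl_high_temp = kl(boltzmann(q_run_a, 2.0), boltzmann(q_run_b, 2.0))
```

At low temperature each run commits to its own greedy action and the KL divergence is large; at high temperature both policies stay close to the uniform prior and the divergence shrinks. The abstract's caveat applies: raising entropy everywhere can hurt optimization, which is why a state-dependent schedule is proposed.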

2605.21143 2026-05-22 cs.SD cs.LG 版本更新

CoarseSoundNet: Building a reliable model for ecological soundscape analysis

CoarseSoundNet：构建一个可靠的生态声音景观分析模型

Alexander Gebhard, Andreas Triantafyllopoulos, Dominik Arend, Sandra Müller, Svenja Schmidt, Michael Scherer-Lorenzen, Björn W. Schuller

Affiliations: TUM University Hospital, CHI (Chair of Health Informatics), Ismaninger Str. 22, 81675 Munich, Bavaria, Germany; University of Freiburg, Faculty of Biology, Geobotany, Schaenzlestr. 1, 79104 Freiburg, Baden-Württemberg, Germany; MCML (Munich Center for Machine Learning), Munich, Bavaria, Germany; Imperial College London, GLAM (Group on Language, Audio, & Music), London, UK

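AI summary: This paper proposes CoarseSoundNet, a model that classifies biophony, geophony, and anthropophony under realistic noisy conditions. By systematically studying model architectures, training data, and evaluation strategies, it improves generalisation to passive acoustic monitoring recordings.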

Comments Currently under review

AI中文摘要

声音景观由三种声音组成：生物声音（动物发出的声音）、地质声音（自然非生物声音）和人类声音（人类发出的声音）。在声音景观生态学领域，一个关键研究问题是这些组成部分如何相互作用，特别是生物声音如何响应地质声音和人类声音。然而，目前尚缺乏能够对这些元素进行区分量化分析的工具。最近的机器学习（ML）方法旨在支持自动化分析，但通常依赖于任务特定或干净的数据，限制了其在噪声被动声学监测（PAM）记录中的泛化能力。本文提出了一种清晰且可重复的结构来构建用于粗粒度声音景观分类的ML模型，并引入了CoarseSoundNet，一个经过训练以在真实PAM条件下区分生物声音、地质声音和人类声音的深度学习模型。我们系统地研究了模型架构、额外训练类的影响、数据组成和评估策略。我们的发现表明，模型性能随着额外PAM数据的增加而提高，特别是当数据与目标领域相似时，并且通过在训练中引入显式的静默类进一步提高性能。类特定的决策阈值和基于持续时间的约束进一步提高了性能，特别是在人类声音和地质声音方面。错误分析显示，人类声音由于掩蔽效应而面临挑战，而静默和昆虫声音在地质和生物声音方面存在混淆。最后，我们进行了一项生态案例研究，表明使用CoarseSoundNet预过滤记录可以产生与地面真实过滤相当的声学指数趋势，支持其作为生态声学分析有效预处理工具的使用。

英文摘要

A soundscape is composed of three types of sound: biophony (sounds made by animals), geophony (natural abiotic sounds) and anthropophony (sounds made by humans). A key research question in the field of soundscape ecology is how these components interact with each other, specifically how biophony responds to geophony and anthropophony. Nevertheless, as of today, there are not many analytical instruments that enable the distinct quantification of these elements. Recent machine learning (ML) approaches aim to support automated analysis but often rely on task-specific or clean data, limiting generalisation to noisy passive acoustic monitoring (PAM) recordings. This study presents a clear and reproducible structure to build ML models for coarse soundscape classification and introduces CoarseSoundNet, a deep learning model trained to distinguish biophony, geophony, and anthropophony under realistic PAM conditions. We systematically investigate model architectures, the influence of an additional training class, data composition, and evaluation strategies. Our findings suggest that model performance improves with additional PAM data, especially when similar to the target domain, and by introducing an explicit silence class during training. Class-specific decision thresholds and duration-based constraints further enhance performance, particularly for anthropophony and geophony. Error analyses exhibit challenges for anthropophony due to masking effects and confusions for silence and insect sounds for geophony and biophony. Finally, we conduct an ecological case study which shows that pre-filtering recordings with CoarseSoundNet yields acoustic index trends comparable to ground-truth filtering, supporting its use as an effective preprocessing tool for ecoacoustic analyses.

2605.20975 2026-05-22 cs.LG cs.CR 版本更新

Choose Wisely and Privately: Proactive Client Selection for Fair and Efficient Federated Learning

明智且私密地选择：为公平和高效的联邦学习进行主动客户端选择

Adda Akram Bendoukha, Heber Hwang Arcolezi, Nesrine Kaaniche, Aymen Boudguiga

发表机构 * GitHub

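AI summary: This paper proposes a proactive client selection framework that, before training begins, searches for an optimal federation of clients whose combined data meet utility and fairness requirements, improving the efficiency and fairness of federated learning.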

AI中文摘要

联邦学习使能够在去中心化的数据源上进行协作模型训练而无需数据传输。基于平均的联邦学习受限于非独立同分布数据的存在，这会负面影响收敛速度和最终模型的准确性。传统替代方法存在显著的低效率。包含噪声或高度异质数据的客户端会进行昂贵的梯度计算，这些计算在聚合前要么被丢弃要么被大幅降权。这些反应式方法浪费计算资源，需要更多的通信轮次并导致不必要的隐私暴露。在本文中，我们提出了一种主动客户端选择框架，旨在在训练开始前找到一个最优的客户端联邦，其联合数据满足效用和公平性要求。我们的方法依赖于从差分隐私连续表中计算出的互信息来量化联合数据集中的跨特征相关性的重要性。我们引入了一个潜在联邦损失（PFL）在固定大小的联邦集上，它平衡了两个目标。最大化集体数据效用的同时确保公平的跨特征相关性以防止群体不公平。客户端选择被表达为一个最优子集搜索问题，基于PFL目标，我们使用模拟退火在强差分隐私保证下解决客户端的本地统计信息。在四个基准上的实验结果表明，与均匀抽样相比，使用最优找到的联邦训练的模型更快、更公平且更准确，即使当使用最先进的自适应聚合或抽样策略时也是如此。

英文摘要

Federated Learning enables collaborative model training across decentralized data sources without data transfer. Averaging-based FL is limited by the presence of non-IID data, which negatively impacts convergence speed and final model accuracy. Conventional alternatives suffer from significant inefficiency. Clients with noisy or highly heterogeneous data contribute expensive gradient computations that are either discarded or heavily down-weighted before aggregation. These reactive approaches waste computational resources, require more communication rounds, and result in unnecessary privacy exposure. In this paper, we propose a proactive client selection framework that aims to find an optimal federation of clients whose combined data match utility and fairness requirements before training begins. Our method relies on mutual information computed from differentially private contingency tables to quantify the relevance of cross-feature correlations in the union dataset. We introduce a Potential Federation Loss (PFL) over the set of fixed-size federations, which balances two objectives: maximizing collective data utility while ensuring fair cross-feature correlations to prevent group unfairness. Client selection is expressed as an optimal subset search problem over the PFL objective, which we solve using simulated annealing under strong differential privacy guarantees for clients' local statistics. Experimental results on four benchmarks show faster, fairer, and more accurate models trained on optimally found federations, compared to uniform sampling, even when state-of-the-art adaptive aggregation or sampling strategies are employed.
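
As a rough illustration of the building block named above, the sketch below computes mutual information from a contingency table of counts. It deliberately omits the differential-privacy noise and the PFL objective, whose details the abstract does not give; the function name and the example tables are assumptions for illustration only.

```python
import math


def mutual_information(table):
    """Mutual information (in nats) between two discrete features.

    table[i][j] is the count of records with feature A = i and feature B = j.
    """
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count > 0:
                p_joint = count / total
                p_indep = (row_sums[i] / total) * (col_sums[j] / total)
                mi += p_joint * math.log(p_joint / p_indep)
    return mi


independent = mutual_information([[10, 10], [10, 10]])  # features unrelated
dependent = mutual_information([[20, 0], [0, 20]])      # features identical
```

In a private setting the counts would be perturbed before this computation, so the resulting estimate would be noisy; that step is left out here.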

2605.20069 2026-05-22 cs.LG cs.GT 版本更新

Smooth Partial Lotteries for Stable Randomized Selection

用于稳定随机选择的平滑部分彩票

Alexander Goldberg, Giulia Fanti, Nihar B. Shah

发表机构 * New Zealand Health Research Council（新西兰健康研究理事会）； Swiss National Science Foundation（瑞士国家科学基金会）； European Research Council（欧洲研究理事会）； Science Foundation Ireland（爱尔兰科学基金会）； Volkswagen Foundation（大众基金会）； The British Academy（英国学院）； Austrian Science Fund (FWF)（奥地利科学基金）； Formas（Formas基金会）； Luebber et al.（Luebber等人）

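AI summary: This paper proposes smoothness as a design principle for partial lotteries, formalized as a Lipschitz condition on the mapping from scores to selection probabilities. It introduces the Clipped Linear Lottery, proves that it achieves a better smoothness-regret tradeoff than alternatives, and validates it on real peer review data.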

AI中文摘要

竞争性选择过程，从科学资金资助到招生和招聘，使用评估来评分候选人，并最终根据这些评分选择一部分人。最近，许多组织采用了部分彩票，根据评估评分随机化选择。然而，现有的彩票设计本质上是不稳定的，因为对单个候选人的评分的微小变化会导致其选择概率的大幅变化。这种不稳定性削弱了彩票的一个关键目标：减少决策边界附近细微评分区别的影响。我们提出平滑性作为部分彩票的设计原则，并将其形式化为评分到选择概率的映射的Lipschitz条件。我们引入了Clipped Linear Lottery，一种简单的机制，其中选择概率与估计质量在上阈值和下阈值之间线性变化，上阈值以上我们总是接受，下阈值以下我们总是拒绝。我们证明Clipped Linear Lottery的最坏遗憾与任何平滑选择规则的下界在(1 - k/n)因子内匹配，其中k/n是接受率。我们比较平滑选择与其他稳定性概念如个体公平性和差分隐私，证明Clipped Linear Lottery在平滑性与遗憾的权衡上优于其他方法。在ICLR 2025、NeurIPS 2024和瑞士国家科学基金会的真实同行评审数据上的实验表明，现有彩票设计在实践中即使在单个评分扰动下也高度不稳定。我们的实验还确认了我们的理论分析的紧性，并证明我们提出的Clipped Linear Lottery在实践中比其他方法在平滑性与效用的权衡上更优。

英文摘要

Competitive selection processes, from scientific funding to admissions and hiring, use evaluations to score candidates, and eventually choose a subset of them based on those scores. Recently, many organizations have adopted partial lotteries, which randomize selection based on evaluation scores. However, existing lottery designs are inherently unstable, as a small change to a single candidate's score can cause large shifts in their selection probabilities. This instability undermines a key goal of lotteries: reducing the influence of fine-grained score distinctions near the decision boundary. We propose smoothness as a design principle for partial lotteries, formalizing it as a Lipschitz condition on the mapping from review scores over candidates to selection probabilities. We introduce the Clipped Linear Lottery, a simple mechanism in which selection probabilities scale linearly with estimated quality between an upper threshold, above which we always accept, and a lower threshold, below which we always reject. We prove that the Clipped Linear Lottery's worst-case regret matches a lower bound for any smooth selection rule up to a factor of $(1 - k/n)$, where $k/n$ is the acceptance rate. We compare smooth selection to other stability notions like Individual Fairness and Differential Privacy, showing that the Clipped Linear Lottery achieves a better smoothness-regret tradeoff than alternatives. Experiments on real peer review data from ICLR 2025, NeurIPS 2024, and the Swiss National Science Foundation demonstrate that existing lottery designs are highly unstable in practice even under perturbations to a single score. Our experiments also confirm the tightness of our theoretical analysis and show that our proposed Clipped Linear Lottery achieves a better smoothness-utility tradeoff than alternatives in practice.
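
The shape of the Clipped Linear Lottery described above can be sketched in a few lines. This is a simplified per-candidate view: how the paper chooses the two thresholds so that the expected number of acceptances matches the target rate $k/n$ is not covered by the abstract, so the thresholds here are hypothetical inputs.

```python
def clipped_linear_probability(score, lower, upper):
    """Selection probability that is 0 below `lower`, 1 above `upper`,
    and linear in between. Thresholds are illustrative inputs."""
    if lower >= upper:
        raise ValueError("lower threshold must be below upper threshold")
    if score <= lower:
        return 0.0
    if score >= upper:
        return 1.0
    return (score - lower) / (upper - lower)


scores = [3.0, 4.5, 5.0, 5.5, 7.0]  # hypothetical review scores
probabilities = [clipped_linear_probability(s, lower=4.0, upper=6.0)
                 for s in scores]
```

Because the map is piecewise linear with slope at most 1 / (upper - lower), a small change in one score can shift that candidate's probability only proportionally, which is the Lipschitz-style smoothness the abstract refers to. A hard cutoff at a single threshold has no such bound.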

2605.19965 2026-05-22 cs.LG eess.SP 版本更新

TextSeal: 一种用于溯源与蒸馏保护的本地化大语言模型水印

Tom Sander, Hongyan Chang, Tomáš Souček, Tuan Tran, Valeriu Lacatusu, Sylvestre-Alvise Rebuffi, Alexandre Mourachko, Surya Parimi, Christophe Ropers, Rashel Moritz, Vanessa Stark, Hady Elsahar, Pierre Fernandez

发表机构 * FAIR, Meta Superintelligence Labs（FAIR，Meta超智能实验室）

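AI summary: This paper proposes TextSeal, a watermark for large language models built on Gumbel-max sampling. It adds dual-key generation to restore output diversity and combines entropy-weighted scoring with multi-region localization to improve detection, without adding inference overhead. The abstract reports stronger detection than the SynthID-text baseline, robustness to dilution in mixed human/AI documents, no perceptible quality difference in a multilingual human evaluation (6000 A/B comparisons, 5 languages), and a "radioactive" signal that transfers through model distillation, enabling detection of unauthorized use.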

AI中文摘要

我们介绍TextSeal，一种最先进的大语言模型水印。基于Gumbel-max采样，TextSeal引入双密钥生成以恢复输出多样性，同时结合熵加权评分和多区域定位以提升检测性能。它支持推测解码和多令牌预测等服务优化，并不增加任何推理开销。TextSeal在检测强度上严格优于基线方法如SynthID-text，并对稀释具有鲁棒性，即使在混合的人类/AI文档中也能保持自信的本地化检测。该方案在理论上是无失真的，经推理基准评估确认其保持下游性能；同时通过多语言人工评估（6000次A/B对比，5种语言）显示无明显质量差异。除了用于溯源检测外，TextSeal还具有'放射性'特性：其水印信号通过模型蒸馏传递，可检测未经授权的使用。

英文摘要

We introduce TextSeal, a state-of-the-art watermark for large language models. Building on Gumbel-max sampling, TextSeal introduces dual-key generation to restore output diversity, along with entropy-weighted scoring and multi-region localization for improved detection. It supports serving optimizations such as speculative decoding and multi-token prediction, and does not add any inference overhead. TextSeal strictly dominates baselines like SynthID-text in detection strength and is robust to dilution, maintaining confident localized detection even in heavily mixed human/AI documents. The scheme is theoretically distortion-free, and evaluation across reasoning benchmarks confirms that it preserves downstream performance, while a multilingual human evaluation (6000 A/B comparisons, 5 languages) shows no perceptible quality difference. Beyond its use for provenance detection, TextSeal is also "radioactive": its watermark signal transfers through model distillation, enabling detection of unauthorized use.
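
For readers unfamiliar with the Gumbel-max trick that the abstract builds on, here is a minimal sketch of the generic technique: adding independent Gumbel noise to log-probabilities and taking the argmax draws an exact sample from the distribution, and seeding the noise from a secret key makes the draw reproducible for a detector. The `keyed_rng` helper and its seeding format are hypothetical; TextSeal's dual-key generation, scoring, and localization are not reproduced here.

```python
import math
import random


def gumbel_max_sample(probs, rng):
    """Draw index i with probability probs[i] via argmax(log p_i + Gumbel noise)."""
    best_index, best_score = -1, -math.inf
    for i, p in enumerate(probs):
        if p <= 0.0:
            continue  # zero-probability tokens can never be selected
        u = rng.random()
        while u == 0.0:  # random() is in [0, 1); avoid log(0)
            u = rng.random()
        gumbel = -math.log(-math.log(u))
        score = math.log(p) + gumbel
        if score > best_score:
            best_index, best_score = i, score
    return best_index


def keyed_rng(secret_key, context_tokens):
    """Hypothetical keying: derive the noise from a key and the preceding tokens."""
    return random.Random(f"{secret_key}:{context_tokens}")


next_token_probs = [0.7, 0.2, 0.1]  # made-up next-token distribution
first = gumbel_max_sample(next_token_probs, keyed_rng("demo-key", (17, 42)))
second = gumbel_max_sample(next_token_probs, keyed_rng("demo-key", (17, 42)))
```

The same key and context always yield the same token, which is what lets a detector holding the key test for the watermark, while over random keys the token is still distributed according to `next_token_probs`. That marginal property is the usual sense in which such schemes are called distortion-free.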


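2605.12058 2026-05-22 cs.LG cs.AI (updated version)

Hölder Policy Optimisation

Yuxiang Chen, Dingli Liang, Yihang Chen, Ziqin Gong, Chenyang Le, Zhaokai Wang, Jiachen Zhu, Lingyu Yang, Jianghao Lin, Weinan Zhang, Jun Wang

Affiliations: University College London; Shanghai Jiao Tong University; The Hong Kong University of Science and Technology (Guangzhou)

AI summary: This paper proposes HölderPO, a framework that unifies token-level probability aggregation via the Hölder mean, addressing the training collapse and unsatisfactory performance caused by fixed aggregation mechanisms. It proves how different values of p balance gradient concentration against variance, schedules p over the training lifecycle with a dynamic annealing algorithm, and reports better stability and convergence on multiple mathematical benchmarks.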

Abstract

Group Relative Policy Optimisation (GRPO) enhances large language models by estimating advantages across a group of sampled trajectories. However, mapping these trajectory-level advantages to policy updates requires aggregating token-level probabilities within each sequence. Relying on a fixed aggregation mechanism for this step fundamentally limits the algorithm's adaptability. Empirically, we observe a critical trade-off: certain fixed aggregations frequently suffer from training collapse, while others fail to yield satisfactory performance. To resolve this, we propose HölderPO, a generalised policy optimisation framework unifying token-level probability aggregation via the Hölder mean. By explicitly modulating the parameter $p$, our framework provides continuous control over the trade-off between gradient concentration and variance bounds. Theoretically, we prove that a larger $p$ concentrates the gradient to amplify sparse learning signals, whereas a smaller $p$ strictly bounds gradient variance. Because no static configuration can universally resolve this concentration-stability trade-off, we instantiate the framework with a dynamic annealing algorithm that progressively schedules $p$ across the training lifecycle. Extensive evaluations demonstrate superior stability and convergence over existing baselines. Specifically, our approach achieves a state-of-the-art average accuracy of $54.9\%$ across multiple mathematical benchmarks, yielding a substantial $7.2\%$ relative gain over standard GRPO, and secures an exceptional $93.8\%$ success rate on ALFWorld.
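As a rough illustration of the aggregation family the abstract names, the sketch below computes the textbook Hölder (power) mean of one sequence's token probabilities for several values of p. The function name `holder_mean` and the sample probabilities are assumptions made for illustration; the abstract does not say how HölderPO plugs this quantity into the policy-gradient objective or how p is annealed.

```python
import math


def holder_mean(values, p):
    """Textbook Hölder (power) mean of strictly positive values.

    p = 1 is the arithmetic mean, p -> 0 the geometric mean,
    p = -1 the harmonic mean, and p -> +/-inf the max / min.
    """
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    n = len(values)
    if p == 0:
        # Limit case: geometric mean, computed in log space.
        return math.exp(sum(math.log(v) for v in values) / n)
    if math.isinf(p):
        return max(values) if p > 0 else min(values)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)


# Hypothetical token probabilities of one sampled sequence.
token_probs = [0.9, 0.8, 0.05, 0.7]
for p in (-math.inf, -1, 0, 1, 4, math.inf):
    print(p, round(holder_mean(token_probs, p), 4))
```

The power mean is non-decreasing in p, so small p pulls the aggregate toward the least likely token while large p lets the most likely tokens dominate.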


2605.11246 2026-05-22 cs.LG (updated version)

Support-Proximity Augmented Diffusion Estimation for Offline Black-Box Optimization

Yonghan Yang, Ye Yuan, Zipeng Sun, Linfeng Du, Bowei He, Haolun Wu, Can Chen, Xue Liu

Affiliations: MBZUAI - Mohamed bin Zayed University of Artificial Intelligence; McGill University; Mila - Quebec AI Institute; Amazon AGI

AI summary: This paper proposes SPADE, a framework that recasts forward surrogate modeling as conditional generative modeling. It models the forward likelihood p(y|x) with a diffusion model and adds a Calibrated Diffusion Estimation module and a Support-Proximity Regularization mechanism to improve optimization performance.

Comments Accepted by ICML 2026. First two authors contributed equally

Abstract

Offline black-box optimization aims to discover novel designs with high property scores using only a static dataset, a task fundamentally challenged by the out-of-distribution (OOD) extrapolation problem. Existing approaches typically bifurcate into inverse methods, which struggle with the ill-posed nature of mapping scores to designs, and forward methods, which often lack the distributional expressivity to quantify uncertainty effectively. In this work, we propose SPADE (Support-Proximity Augmented Diffusion Estimation), a novel framework that reimagines forward surrogate modeling through the lens of conditional generative modeling. SPADE models the forward likelihood p(y|x) using a diffusion model, but with two critical enhancements to tailor it for optimization: (1) a Calibrated Diffusion Estimation module that enforces global consistency in statistical moments and pairwise rankings, and (2) a Support-Proximity Regularization mechanism that implicitly internalizes the data manifold constraint p(x) via kNN-based density estimation. Theoretically, we prove that our regularization is first-order equivalent to maximizing a Bayesian posterior with a valid design prior. Empirically, SPADE achieves state-of-the-art performance across Design-Bench tasks and an LLM data mixture optimization benchmark.


2605.08982 2026-05-22 cs.LG (updated version)

PMCTS: Particle Monte Carlo Tree Search for Principled Parallelized Inference Time Scaling

Yaniv Oren, Viliam Vadocz, Joery A. de Vries, Wendelin Böhmer, Matthijs T. J. Spaan, Hendrik Baier

Affiliations: Department of Intelligent Systems, TU Delft; Department of Computer Science, ETH Zürich; Trent AI Limited; Information Systems, TU Eindhoven; Centrum Wiskunde & Informatica, Amsterdam

AI summary: This paper proposes PMCTS, a principled parallel MCTS algorithm suited to neural network evaluation. It achieves inference-time scaling through parallel computation and significantly outperforms conventional heuristic baselines across several domains.

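2605.07598 2026-05-22 cs.LG (updated version)

On the Wasserstein Gradient Flow Interpretation of Drifting Models

Arthur Gretton, Li Kevin Wenliang, Alexandre Galashov, James Thornton, Valentin De Bortoli, Arnaud Doucet

Affiliations: Google DeepMind

AI summary: This paper analyzes drifting models through Wasserstein gradient flows (WGF) and clarifies how the GMD framework relates to WGF paths. It presents three main results: one algorithm for drifting models corresponds to the limiting point of a WGF on the KL divergence; the algorithm actually implemented resembles the fixed point of a WGF on the Sinkhorn divergence but lacks some of its desirable properties; and the approach extends to limiting points of other WGFs, including MMD, the sliced Wasserstein distance, and GAN critic functions.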

Abstract

Recently, Deng et al. (2026) proposed Generative Modeling via Drifting (GMD), a novel framework for generative tasks. This note presents an analysis of GMD through the lens of Wasserstein Gradient Flows (WGF), i.e., the path of steepest descent for a functional in the space of probability measures, equipped with the geometry of optimal transport. Unlike previous WGF-based contributions, GMD can be thought of as directly targeting a fixed point of a specific WGF flow. We demonstrate three main results: first, that one algorithm proposed by Deng et al. (2026) corresponds to finding the limiting point of a WGF on the KL divergence, with Parzen smoothing on the densities. Second, that the algorithm actually implemented by Deng et al. (2026) corresponds to a different procedure, which bears some resemblance to the fixed point of a WGF on the Sinkhorn divergence, but lacks certain desirable properties of the latter. Third, the same idea can be extended to the limiting point of other WGFs, including the Maximum Mean Discrepancy (MMD), the sliced Wasserstein distance, and GAN critic functions.


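2605.04217 2026-05-22 cs.LG cs.CL (updated version)

Jordan-RoPE: Non-Semisimple Relative Positional Encoding via Complex Jordan Blocks

Yaobo Zhang

Affiliations: School of Physics, Ningxia University

AI summary: This paper proposes Jordan-RoPE, a non-semisimple relative positional encoding in which a complex rotary eigenvalue and a nilpotent response share one defective Jordan block. This yields a distance-modulated phase basis and oscillatory-polynomial features such as e^{-γd}cos(ωd) and e^{-γd}sin(ωd), and the paper evaluates the construction on language models.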

Comments 15 pages, 4 figures, 6 tables; code available at https://github.com/ybzhang-nxu/jordan_rope

Abstract

Relative positional encodings determine which functions of query-key lag can enter the primitive attention logit. RoPE supplies a rotary phase, while ALiBi supplies an additive distance bias. Motivated by group-theoretic views of linear translation-invariant positional encodings, we study a non-semisimple case in which a complex rotary eigenvalue and a nilpotent response live in the same defective Jordan block. The resulting relative operator generates oscillatory-polynomial features such as $e^{-\gamma d}\cos(\omega d)$, $e^{-\gamma d}\sin(\omega d)$, $d e^{-\gamma d}\cos(\omega d)$, and $d e^{-\gamma d}\sin(\omega d)$, for causal lag $d=i-j\geq 0$. Thus the construction realizes a distance-modulated phase basis $d e^{i\omega d}$, rather than merely adding a separate distance channel to RoPE. We formulate Exact Jordan-RoPE as a non-semisimple one-parameter representation, give its real block form, and specify the contragredient query action required by non-orthogonal positional maps. We also distinguish this exact representation from stabilized variants whose bounded shear improves numerical behavior but breaks the exact group law. Kernel-level diagnostics and a Jordan-friendly synthetic language-model task show that the coupled Jordan basis is useful when the target contains distance-modulated phase interactions. On a small WikiText-103 byte language model, a scaled-exact variant improves over RoPE and direct-sum baselines within the Jordan family, while RoPE+ALiBi remains strongest overall. The evidence is structural rather than a broad performance claim.
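The features listed in the abstract can be checked numerically on the textbook 2x2 Jordan block. The sketch below assumes the eigenvalue is parameterised as lam = -gamma + i*omega, which is an assumption; the paper's exact sign conventions, scaling and real block form are not given in the abstract, and the helper names `jordan_rope_block` and `matmul2` are hypothetical.

```python
import cmath
import math


def jordan_rope_block(d, gamma, omega):
    """exp(d * J) for J = [[lam, 1], [0, lam]] with lam = -gamma + i*omega.

    Since J = lam*I + N with N nilpotent (N*N = 0),
    exp(d*J) = exp(lam*d) * (I + d*N).
    """
    phase = cmath.exp(complex(-gamma, omega) * d)
    return [[phase, d * phase],
            [0j, phase]]


def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


gamma, omega = 0.1, 0.5
block = jordan_rope_block(3, gamma, omega)

# Diagonal entry: e^{-gamma d} cos(omega d) + i e^{-gamma d} sin(omega d).
print(block[0][0].real, math.exp(-gamma * 3) * math.cos(omega * 3))
# Off-diagonal entry carries the extra factor d (distance-modulated phase).
print(block[0][1].imag, 3 * math.exp(-gamma * 3) * math.sin(omega * 3))

# One-parameter group law: T(2) T(3) = T(5).
product = matmul2(jordan_rope_block(2, gamma, omega),
                  jordan_rope_block(3, gamma, omega))
print(product[0][1], jordan_rope_block(5, gamma, omega)[0][1])
```

The off-diagonal entry is what a direct sum of a rotary channel and a separate distance channel cannot produce, because the factor d multiplies the oscillating phase itself.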


2605.04062 2026-05-22 cs.LG cs.AI (updated version)

EdgeRazor: A Lightweight Framework for Large Language Models via Mixed-Precision Quantization-Aware Distillation

Shu-Hao Zhang, Le-Tong Huang, Xiang-Sheng Deng, Xin-Yi Zou, Chen Wu, Nan Li, Shao-Qun Zhang, Zhi-Hua Zhou

Affiliations: National Key Laboratory for Novel Software Technology, Nanjing University; School of Intelligent Science and Technology, Nanjing University; School of Artificial Intelligence, Nanjing University; Microsoft AI

AI summary: This paper proposes EdgeRazor, a framework that uses mixed-precision quantization-aware distillation to deploy large language models on resource-constrained devices, achieving higher compression ratios and better efficiency.

Abstract

Quantization has emerged as a mainstream approach for deploying Large Language Models (LLMs) on resource-constrained devices, yet compressing precision below 4-bit typically causes severe performance degradation or prohibitive retraining costs. In this paper, we propose EdgeRazor, a lightweight framework for LLMs via Mixed-Precision Quantization-Aware Distillation. It contains three modules: Structural Quantization with Mixed Precision for fine-grained control of bit-widths, Layer-Adaptive Feature Distillation that dynamically selects the most informative features for alignment, and Entropy-Aware KL Divergence for forward-reverse balance on both human-annotated and distilled datasets. Evaluations conducted on MobileLLM and Qwen families show that under weight-activation quantization, the 1.88-bit Qwen3-0.6B-EdgeRazor outperforms the state-of-the-art 2-bit baselines by 11.27 and surpasses the strongest 3-bit baselines by 4.38, while the quantized MobileLLM-350M-EdgeRazor requires a training budget 4-10$\times$ lower than the leading quantization-aware training method. In terms of efficiency, EdgeRazor achieves higher compression ratios at all bit-widths, and the 1.58-bit Qwen3-0.6B-EdgeRazor reduces storage from 1.11 GB to 0.19 GB while accelerating decoding by 15.16$\times$ over the 16-bit baseline. These results empirically validate the effectiveness and efficiency of EdgeRazor. The code is available on GitHub (https://github.com/zhangsq-nju/EdgeRazor) and Hugging Face (https://huggingface.co/collections/zhangsq-nju/edgerazor-nbit).


2605.02409 2026-05-22 cs.LG (updated version)

Inducing Permutation Invariant Priors in Bayesian Optimization for Carbon Capture and Storage Applications

Sofianos Panagiotis Fotias, Vassilis Gaganis

Affiliations: School of Mining and Metallurgical Engineering, National Technical University of Athens

AI summary: This paper proposes a new Gaussian process kernel (GP-Perm) that handles permutation symmetries in Carbon Capture and Storage projects, alongside a Deep Kernel Learning model (DKL-DS) that learns a permutation-invariant embedding, and evaluates the methodology on eight use cases.

Abstract

Bayesian Optimization is an iterative method, tailored to optimizing expensive black-box objective functions. Surrogate models like Gaussian Processes, which are the gold standard in Bayesian Optimization, can be inefficient for inputs with permutation symmetries, as the most common kernels employed are better suited for vector inputs rather than unordered sets of items. Motivated by this issue, we turn to permutation invariant Bayesian Optimization for well placement in Carbon Capture and Storage projects. The high fidelity black-box simulator is instructed to operate wells under group control, giving rise to permutation symmetries within injector and producer groups that cannot be exploited with standard GP kernels. In this work, our main contribution is a novel Gaussian Process kernel (GP-Perm) that encodes permutation invariance by comparing sets through a stable divergence between their induced empirical representations, and can be combined with standard kernels for additional vector-valued inputs. As a learned invariant baseline, we also consider a Deep Kernel Learning model (DKL-DS) using the Deep Sets architecture to learn a permutation-invariant embedding. We evaluate the proposed methodology across 8 use cases, comprising seven synthetic benchmarks and one realistic CCS case study (Johansen formation).


2605.01369 2026-05-22 eess.SP cs.AI cs.LG (updated version)

MU-SHOT-Fi: Self-Supervised Multi-User Wi-Fi Sensing with Source-free Unsupervised Domain Adaptation

Ahmed Y. Radwan, Hina Tabassum

Affiliations: Department of Electrical Engineering and Computer Science

AI summary: This paper proposes MU-SHOT-Fi, a source-free unsupervised domain adaptation framework that achieves accurate activity classification and occupancy estimation in single- and multi-user Wi-Fi sensing while preventing model collapse.

Journal ref IEEE Internet of Things Journal, Early Access, 2026

DOI: 10.1109/JIOT.2026.3686090

Abstract

Deep learning has been widely adopted for Wi-Fi CSI-based human activity recognition (HAR) due to its ability to learn spatio-temporal features in a privacy-preserving and cost-effective manner. However, DL-based models generalize poorly across environments, a challenge amplified in multi-user settings where overlapping activities cause CSI entanglement and domain shifts. Practical deployments often limit access to labeled source data due to privacy constraints, motivating source-free adaptation using only unlabeled target-domain CSI and a pre-trained source model. In this paper, we propose MU-SHOT-Fi, a source-free unsupervised domain adaptation framework for single- and multi-user Wi-Fi sensing. MU-SHOT-Fi employs permutation-invariant set prediction with Hungarian matching during source training, followed by frozen-classifier backbone adaptation in the target domain. To enable stable adaptation without labels, we introduce occupancy-weighted information maximization that prevents model collapse by focusing diversity regularization on likely-occupied slots while excluding the dominant class from marginal entropy. Additionally, we employ binary rotation prediction as spatial self-supervision that exploits CSI frequency-time structure to learn domain-invariant features. For single-user scenarios, we introduce SU-SHOT-Fi by replacing occupancy weighting with standard information maximization and incorporating contrastive predictive coding to exploit temporal consistency. Extensive experiments on the WiMANS and Widar 3.0 datasets across cross-environment, cross-frequency, cross-orientation, and combined domain shifts demonstrate that MU-SHOT-Fi effectively recovers multi-user exact-activity classification performance under large domain shifts while maintaining accurate occupancy estimation and preventing collapse toward dominant classes.


2605.00392 2026-05-22 cs.CV cs.LG (updated version)

RTPrune: Reading-Twice Inspired Token Pruning for Efficient DeepSeek-OCR Inference

Ben Wan, Yan Feng, Zihan Tang, Weizhe Huang, Yuting Zeng, Jia Wang, Tongxuan Liu

Affiliations: Shanghai Jiao Tong University; University of Science and Technology of China; Tsinghua University

AI summary: This paper proposes RTPrune, a two-stage token pruning method for DeepSeek-OCR. It first retains high-norm visual tokens, then pairs and merges the remaining tokens using optimal transport theory, yielding more efficient inference and a better efficiency-accuracy trade-off on OCR tasks.

Comments 21 pages, accepted by ICML2026

Abstract

DeepSeek-OCR leverages visual-text compression to reduce long-text processing costs and accelerate inference, yet visual tokens remain prone to redundant textual and structural information. Moreover, current token pruning methods for conventional vision-language models (VLMs) fail to preserve textual fidelity due to improper compression mechanisms. By analyzing the decoding process of DeepSeek-OCR, we find a distinct two-stage reading trajectory: the model initially prioritizes the majority of high-norm tokens, then redistributes its attention to the remaining ones. Motivated by this insight, we propose RTPrune, a two-stage token pruning method tailored for DeepSeek-OCR. In the first stage, we prioritize high-norm visual tokens that capture salient textual and structural information. In the second stage, the remaining tokens are paired and merged based on optimal transport theory to achieve efficient feature aggregation. We further introduce a dynamic pruning ratio that adapts to token similarity and textual density for OCR tasks, enabling a better efficiency-accuracy trade-off. Extensive experiments demonstrate state-of-the-art performance, as evidenced by 99.47% accuracy and 1.23$\times$ faster prefill on OmniDocBench, achieved with 84.25% token retention when applied to DeepSeek-OCR-Large.


2604.26836 2026-05-22 cs.LG cs.SY eess.SY (updated version)

Uncertainty-Aware Predictive Safety Filters for Probabilistic Neural Network Dynamics

Bernd Frauenknecht, Lukas Kesper, Daniel Mayfrank, Henrik Hose, Sebastian Trimpe

Affiliations: Institute for Data Science in Mechanical Engineering (DSME), RWTH Aachen University; Institute of Climate and Energy Systems (ICE), Energy Systems Engineering (ICE-1), Forschungszentrum Jülich GmbH

AI summary: This paper proposes an uncertainty-aware predictive safety filter (UPSi) that models future outcomes as reachable sets and uses probabilistic ensemble (PE) neural network dynamics models to provide rigorous safety predictions, improving exploration safety in model-based reinforcement learning (MBRL) while matching the performance of standard MBRL.


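Rethinking the Forward Process for Score-Based Nonlinear Data Assimilation in High Dimensions

Eunbi Yoon, Won Chang, Donghan Kim, Dae Wook Kim

Affiliations: KAIST

AI summary: This paper proposes a forward process tailored to data assimilation for state estimation in high-dimensional nonlinear systems. The resulting score-based filter transforms the system state toward the measurement space, improving assimilation performance.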

Abstract

Data assimilation is the process of estimating the state of a dynamical system over time by combining model predictions with measurements. This task becomes challenging when the system is nonlinear and high-dimensional. To address this, score-based Bayesian filters have recently emerged. However, these methods still show unsatisfactory performance in certain cases, particularly under spatially sparse measurements. Such degradation stems from heuristic approximations of the likelihood score, whose errors can accumulate over time. This limitation arises because the methods simply adopt a classical forward process for generative modeling that transforms a data distribution toward a Gaussian distribution, which is independent of the measurement equation. Here, we propose a forward process tailored for filtering that transforms the system state toward the measurement space, enabling a theoretically sound formulation of the likelihood score. Based on this, we develop the Measurement-Aware Score-based Filter (MASF). We evaluate MASF on Kolmogorov flow, a high-dimensional fluid benchmark with up to $\mathcal{O}(10^5)$ dimensions, under diverse measurement operators, including nonlinear cases with a dimensional mismatch between the state and the measurements. MASF shows improved performance over existing score-based filters and ensemble-type Kalman filters. Notably, MASF achieves up to a $28.2\times$ wall-clock speedup compared with the baselines when using amortized pretraining. Our implementation is available at https://github.com/tcnllab-oss/masf.


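2603.29981 2026-05-22 cs.LG stat.ML (updated version)

Aligning Validation with Deployment in Spatial Prediction: Target-Weighted Cross-Validation

Alexander Brenning, Thomas Suesse

Affiliations: Friedrich Schiller University Jena; ELLIS Unit Jena

AI summary: This paper proposes a deployment-oriented validation framework based on weighted cross-validation. It introduces Target-Weighted Cross-Validation (TWCV) to align validation tasks with the distribution of prediction tasks across a specified domain, reducing the bias in performance estimates caused by sampling bias.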

Abstract

Reliable estimation of predictive performance is essential for spatial environmental modeling, where machine-learning models are used to generate maps from unevenly distributed observations. Standard cross-validation (CV) assumes that validation data are representative of prediction conditions across the target domain. In practice, this assumption is often violated due to preferential or clustered sampling, leading to biased performance and uncertainty estimates. We introduce a deployment-oriented validation framework based on weighted CV that aligns validation tasks with the distribution of prediction tasks across a specified domain. The framework includes importance-weighted cross-validation (IWCV) and a calibration-based approach, Target-Weighted Cross-Validation (TWCV), which uses spatially meaningful task descriptors such as environmental covariates and prediction distance. Simulation experiments show that conventional non-spatial and spatial CV strategies can exhibit substantial bias under realistic sampling designs, whereas weighted CV approaches substantially reduce this bias when validation tasks adequately cover the deployment-task space. A case study on mapping nitrogen dioxide (NO$_2$) concentrations across Germany demonstrates that standard CV can overestimate prediction error due to sampling bias, while weighted CV yields estimates more consistent with deployment conditions. The framework separates validation task generation from risk estimation and provides a practical approach for improving performance assessment in spatial prediction settings where sample distributions differ from prediction domains.
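The reweighting idea behind importance-weighted cross-validation can be sketched in a few lines. The example below is a generic self-normalised importance-weighted error, using coarse strata as a stand-in for the task descriptors mentioned in the abstract; the calibration-based weighting of TWCV is not reproduced, and the function names, strata and numbers are all invented for illustration.

```python
def weighted_cv_error(errors, weights):
    """Self-normalised weighted mean of per-observation validation errors."""
    if len(errors) != len(weights) or not errors:
        raise ValueError("errors and weights must be non-empty and aligned")
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    return sum(w * e for w, e in zip(weights, errors)) / total


def density_ratio_weights(strata, sample_share, target_share):
    """Weight each observation by target share / sample share of its stratum."""
    return [target_share[s] / sample_share[s] for s in strata]


# Made-up validation errors: urban sites are over-sampled and harder to predict.
strata = ["urban", "urban", "urban", "urban", "rural"]
errors = [4.0, 4.0, 4.0, 4.0, 1.0]
sample_share = {"urban": 0.8, "rural": 0.2}   # share among monitoring sites
target_share = {"urban": 0.2, "rural": 0.8}   # share of the prediction domain

weights = density_ratio_weights(strata, sample_share, target_share)
print(sum(errors) / len(errors))              # unweighted CV estimate
print(weighted_cv_error(errors, weights))     # estimate reweighted to the target domain
```

In this toy setup the unweighted estimate is dominated by the over-sampled stratum, while the weighted estimate matches the error expected over the prediction domain, provided every part of that domain is represented among the validation tasks.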


2603.25958 2026-05-22 cs.LG (updated version)

Cluster-Adaptive Feature Extraction and its Theoretical Foundation with Minkowski Weighted k-Means

Renato Cordeiro de Amorim, Vladimir Makarenkov

Affiliations: School of Computer Science and Electronic Engineering, University of Essex, Wivenhoe, UK; Département d’informatique, Université du Québec à Montréal, C.P. 8888 succ. Centre-Ville, Montreal (QC) H3C 3P8 Canada; Mila - Quebec AI Institute, Montreal, QC, Canada

AI summary: This paper proposes a cluster-adaptive feature extraction method based on Minkowski weighted k-means. Its theoretical analysis characterises the structure of the feature weights and proves that the method suppresses high-dispersion features and amplifies informative ones.

Abstract

The Minkowski weighted $k$-means ($mwk$-means) algorithm extends classical $k$-means by incorporating feature weights and a Minkowski distance. We first show that the $mwk$-means objective can be expressed as a power-mean aggregation of within-cluster dispersions, with the order determined by the Minkowski exponent $p$. This formulation reveals how $p$ controls the transition between selective and uniform use of features. Using this representation, we derive bounds for the objective function and characterise the structure of the feature weights, showing that they depend only on relative dispersion and follow a power-law relationship with dispersion ratios. This leads to explicit guarantees on the suppression of high-dispersion features, and we establish convergence of the algorithm. Building on these theoretical results, we introduce Cluster-Adaptive Feature Extraction (CAFE), a method that uses the $mwk$-means feature weights to rescale the data prior to unsupervised feature extraction. We prove that this rescaling reverses the within-cluster dispersion ordering, suppressing noisy features and amplifying informative ones. Numerous experiments conducted under controlled within-cluster noise show that CAFE consistently improves the results of traditional feature extraction methods.
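The dependence of the weights on dispersion ratios can be illustrated with the weight update usually quoted for Minkowski weighted k-means in the earlier literature, w_v = 1 / sum_u (D_v / D_u)^(1/(p-1)). The abstract itself gives no formula, so treating this closed form as the one analysed in the paper is an assumption, and the function name and sample dispersions below are hypothetical.

```python
def mwk_feature_weights(dispersions, p):
    """Feature weights for one cluster from its per-feature dispersions.

    w_v = 1 / sum_u (D_v / D_u) ** (1 / (p - 1)),  for p > 1.
    Only the ratios D_v / D_u enter, so rescaling all dispersions
    by a common factor leaves the weights unchanged.
    """
    if p <= 1:
        raise ValueError("this closed form needs a Minkowski exponent p > 1")
    if any(d <= 0 for d in dispersions):
        raise ValueError("dispersions must be strictly positive")
    exponent = 1.0 / (p - 1.0)
    return [1.0 / sum((d_v / d_u) ** exponent for d_u in dispersions)
            for d_v in dispersions]


# Made-up within-cluster dispersions: the third feature is the noisy one.
dispersions = [1.0, 2.0, 8.0]
for p in (1.5, 2.0, 5.0):
    print(p, [round(w, 3) for w in mwk_feature_weights(dispersions, p)])
```

Under this formula the weights sum to one, the noisiest feature always gets the smallest weight, and increasing p moves the weights from selective toward uniform, which is the transition the abstract attributes to the exponent.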


2603.20405 2026-05-22 cs.LG cs.CL cs.LO (updated version)

Putnam 2025 Problems in Rocq using Opus 4.6 and Rocq-MCP

Guillaume Baudart, Marc Lelarge, Tristan Stérin, Jules Viennot

Affiliations: IRIF, Université Paris Cité, Inria, CNRS; DI ENS, PSL University, Inria

AI summary: This study uses Opus 4.6 with the Rocq-MCP tools to autonomously prove 10 of the 12 problems from the 2025 Putnam mathematics competition, demonstrating an automated proving approach built on the Model Context Protocol (MCP) together with publicly available proofs.

2603.04525 2026-05-22 stat.ML cs.LG (updated version)

The Volterra signature

Paul P. Hager, Fabian N. Harang, Luca Pelizzari, Samy Tindel

Affiliations: Department of Statistics and Operations Research, University of Vienna; Department of Economics, BI Norwegian Business School; Department of Mathematics, Purdue University

AI summary: This paper proposes the Volterra signature as an explicit feature representation for history-dependent systems. It develops the input path, weighted by a temporal kernel, into the tensor algebra, uses the Volterra-Chen identity to derive rigorous learning-theoretic guarantees, and demonstrates its efficacy on dynamic learning tasks.

Abstract

Modern approaches for learning from non-Markovian time series, such as recurrent neural networks, neural controlled differential equations or transformers, typically rely on implicit memory mechanisms that can be difficult to interpret or to train over long horizons. We propose the Volterra signature $\mathrm{VSig}(x;K)$ as a principled, explicit feature representation for history-dependent systems. By developing the input path $x$ weighted by a temporal kernel $K$ into the tensor algebra, we leverage the associated Volterra-Chen identity to derive rigorous learning-theoretic guarantees. Specifically, we prove an injectivity statement (identifiability under augmentation) that leads to a universal approximation theorem on the infinite-dimensional path space, which in certain cases is achieved by linear functionals of $\mathrm{VSig}(x;K)$. Moreover, we demonstrate applicability of the kernel trick by showing that the inner product associated with Volterra signatures admits a closed characterization via a two-parameter integral equation, enabling numerical methods from PDEs for computation. For a large class of exponential-type kernels, $\mathrm{VSig}(x;K)$ solves a linear state-space ODE in the tensor algebra. Combined with inherent invariance to time reparameterization, these results position the Volterra signature as a robust, computationally tractable feature map for data science. We demonstrate its efficacy in dynamic learning tasks on real and synthetic data, where it consistently improves on classical path signature baselines.

2603.04383 2026-05-22 cs.CY cs.CR cs.IR cs.LG cs.SI 版本更新

Turning Trust to Transactions: Tracking Affiliate Marketing and FTC Compliance in YouTube's Influencer Economy

将信任转化为交易：追踪YouTube网红经济中的联盟营销与FTC合规性

Chen Sun, Yash Vekaria, Zubair Shafiq, Rishab Nithyanand

发表机构 * University of Iowa（爱荷华大学）； UC Davis（加州大学戴维斯分校）

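AI summary: This study develops tools based on Web measurement and NLP techniques to analyze the affiliate marketing ecosystem on YouTube, measuring how prevalent affiliate links are and how often creators fail to comply with disclosure rules, and recommends standardized disclosure features as a way to improve compliance.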

Comments ICWSM 2026

AI中文摘要

YouTube已发展成一个强大的平台，创作者通过联盟营销（affiliate marketing）将其影响力变现，这引发了关于透明度和伦理的担忧，尤其是在创作者未披露其联盟关系时。尽管美国联邦贸易委员会（FTC）等监管机构已发布指南以解决这些问题，但不合规行为和消费者损害依然存在，其严重程度也尚不清楚。在本文中，我们介绍了借鉴Web测量和NLP研究最新进展开发的工具，用于考察YouTube上联盟营销生态系统的现状。我们将这些工具应用于一个跨越10年、包含近54万名创作者的200万个视频的数据集，分析YouTube上联盟营销的普及程度以及不合规行为的比例。我们的发现表明，联盟链接广泛存在，但披露合规率仍然很低，大多数视频未达到FTC标准。此外，我们分析了不同利益相关者对改善披露行为的影响。我们的研究表明，平台提供的标准化披露功能与合规性的提高高度相关。我们建议监管机构和联盟合作伙伴与平台合作，以提高网红经济中的透明度、问责制和信任度。

英文摘要

YouTube has evolved into a powerful platform where creators monetize their influence through affiliate marketing, raising concerns about transparency and ethics, especially when creators fail to disclose their affiliate relationships. Although regulatory agencies like the US Federal Trade Commission (FTC) have issued guidelines to address these issues, non-compliance and consumer harm persist, and the extent of these problems remains unclear. In this paper, we introduce tools, developed with insights from recent advances in Web measurement and NLP research, to examine the state of the affiliate marketing ecosystem on YouTube. We apply these tools to a 10-year dataset of 2 million videos from nearly 540,000 creators, analyzing the prevalence of affiliate marketing on YouTube and the rates of non-compliant behavior. Our findings reveal that affiliate links are widespread, yet disclosure compliance remains low, with most videos failing to meet FTC standards. Furthermore, we analyze the effects of different stakeholders in improving disclosure behavior. Our study suggests that the platform is highly associated with improved compliance through standardized disclosure features. We recommend that regulators and affiliate partners collaborate with platforms to enhance transparency, accountability, and trust in the influencer economy.

2603.03454 2026-05-22 cs.LG 版本更新

[Re] FairDICE: A Fair Tradeoff in Multi-objective Offline RL

[Re] FairDICE：多目标离线RL中的公平权衡

Peter Adema, Karim Galliamov, Aleksey Evstratovskiy, Ross Geurts

发表机构 * University of Amsterdam（阿姆斯特丹大学）

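AI summary: This replication study examines FairDICE, an algorithm that adapts OptiDICE to automatically learn weights over multiple objectives in offline RL and thereby reach a fair compromise. It finds that a code error reduces FairDICE to standard behaviour cloning in continuous environments and that important hyperparameters were underspecified, so the experimental justification needs significant revision.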

Comments 12 pages, 8 figures in main text. Code at https://github.com/p-adema/re-fairdice. Reviewed at https://openreview.net/forum?id=Tr6MBt0hAj

Journal ref Published 05/2026 in Transactions on Machine Learning Research

AI中文摘要

离线强化学习（RL）是RL领域的一个新兴分支，其中策略仅从演示中学习。在离线RL中，某些环境需要平衡多个目标，但现有的多目标离线RL算法未能提供有效的方法来找到公平的折中方案。FairDICE（见arXiv:2506.08062v2）通过将OptiDICE（一种离线RL算法）进行适应性修改，以自动学习多个目标的权重，例如激励目标间的公平性。由于这一贡献具有价值，本复制研究检验了关于FairDICE的可复制性声明。我们发现许多理论声明成立，但代码中的错误使FairDICE在连续环境中退化为标准行为克隆，并且许多重要的超参数最初未明确指定。在修正之后，我们通过扩展原始论文的实验表明，FairDICE可以扩展到复杂环境和高维奖励，尽管它在（在线）超参数调优上可能依赖性较强。我们得出结论，FairDICE是一种理论上有吸引力的方法，但实验验证需要显著修订。

英文摘要

Offline Reinforcement Learning (RL) is an emerging field of RL in which policies are learned solely from demonstrations. Within offline RL, some environments involve balancing multiple objectives, but existing multi-objective offline RL algorithms do not provide an efficient way to find a fair compromise. FairDICE (see arXiv:2506.08062v2) seeks to fill this gap by adapting OptiDICE (an offline RL algorithm) to automatically learn weights for multiple objectives to e.g. incentivise fairness among objectives. As this would be a valuable contribution, this replication study examines the replicability of claims made regarding FairDICE. We find that many theoretical claims hold, but an error in the code reduces FairDICE to standard behaviour cloning in continuous environments, and many important hyperparameters were originally underspecified. After rectifying this, we show in experiments extending the original paper that FairDICE can scale to complex environments and high-dimensional rewards, though it can be reliant on (online) hyperparameter tuning. We conclude that FairDICE is a theoretically interesting method, but the experimental justification requires significant revision.

2603.02938 2026-05-22 cs.LG cs.AI 版本更新

Beyond One-Size-Fits-All: Adaptive Subgraph Denoising for Zero-Shot Graph Learning with Large Language Models

超越一刀切：基于大语言模型的零样本图学习中的自适应子图去噪

Fengzhi Li, Liang Zhang, Yuan Zuo, Ruiqing Zhao, YanSong Liu, Yunfei Ma, Fanyu Meng, Junlan Feng

发表机构 * JIUTIAN Research（九天研究）； The Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））； MIIT Key Laboratory of Data and Decision Intelligence（工业和信息化部数据与决策智能重点实验室）； Beihang University（北京航空航天大学）

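AI summary: This paper proposes GraphSSR, a framework for adaptive subgraph extraction and denoising that targets the structural noise introduced by task-agnostic, one-size-fits-all subgraph extraction, with the aim of improving zero-shot graph reasoning with large language models.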

AI中文摘要

图基任务在零样本设置中仍面临显著挑战，由于数据稀缺性和传统图神经网络（GNNs）无法泛化到未见领域或标签空间。尽管最近的进展转向利用大语言模型（LLMs）作为预测器来增强GNNs，但这些方法常面临跨模态对齐问题。最近的范式（即Graph-R1）通过采用纯文本格式和基于LLM的图推理克服了上述架构依赖性，显示出改进的零样本泛化能力。然而，它使用一种任务无关的“一刀切”子图提取策略，不可避免地引入了显著的结构噪声——无关邻居和边——这会扭曲LLMs的感知范围并导致次优预测。为了解决这一限制，我们引入GraphSSR，一种新的框架，用于零样本LLM图推理中的自适应子图提取和去噪。具体而言，我们提出了SSR流水线，通过“采样-选择-推理”过程动态定制子图提取以适应特定上下文，使模型能够自主过滤掉任务无关的邻居并克服“一刀切”问题。为了内化这一能力，我们开发了SSR-SFT，一种数据合成策略，生成高质量的SSR风格图推理轨迹用于LLM的监督微调。此外，我们提出了SSR-RL，一种两阶段强化学习框架，该框架专门设计用于自适应子图去噪，明确调节所提出SSR流水线中的采样和选择操作。通过结合真实性增强和去噪增强的强化学习，我们引导模型使用简洁的、去噪的子图进行推理以实现准确预测。

英文摘要

Graph-based tasks in the zero-shot setting remain a significant challenge due to data scarcity and the inability of traditional Graph Neural Networks (GNNs) to generalize to unseen domains or label spaces. While recent advancements have transitioned toward leveraging Large Language Models (LLMs) as predictors to enhance GNNs, these methods often suffer from cross-modal alignment issues. A recent paradigm (i.e., Graph-R1) overcomes the aforementioned architectural dependencies by adopting a purely text-based format and utilizing LLM-based graph reasoning, showing improved zero-shot generalization. However, it employs a task-agnostic, one-size-fits-all subgraph extraction strategy, which inevitably introduces significant structural noise--irrelevant neighbors and edges--that distorts the LLMs' receptive field and leads to suboptimal predictions. To address this limitation, we introduce GraphSSR, a novel framework designed for adaptive subgraph extraction and denoising in zero-shot LLM-based graph reasoning. Specifically, we propose the SSR pipeline, which dynamically tailors subgraph extraction to specific contexts through a "Sample-Select-Reason" process, enabling the model to autonomously filter out task-irrelevant neighbors and overcome the one-size-fits-all issue. To internalize this capability, we develop SSR-SFT, a data synthesis strategy that generates high-quality SSR-style graph reasoning traces for supervised fine-tuning of LLMs. Furthermore, we propose SSR-RL, a two-stage reinforcement learning framework that explicitly regulates sampling and selection operations within the proposed SSR pipeline designed for adaptive subgraph denoising. By incorporating Authenticity-Reinforced and Denoising-Reinforced RL, we guide the model to achieve accurate predictions using parsimonious, denoised subgraphs for reasoning.

2602.22719 2026-05-22 cs.LG 版本更新

Interpreting and Steering State-Space Models via Activation Subspace Bottlenecks

通过激活子空间瓶颈解释和操控状态空间模型

Vamshi Sunku Mohan, Kaustubh Gupta, Aneesha Das, Chandan Singh

发表机构 * Microsoft Research, Redmond（微软研究院，雷德蒙德）； Independent Researcher（独立研究员）

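AI summary: This paper identifies activation subspace bottlenecks in Mamba-family state-space models and proposes a test-time intervention that steers activations by multiplying them by a scalar, improving performance across several models and benchmarks and confirming that these bottlenecks hinder performance.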

2602.22270 2026-05-22 cs.LG q-bio.PE 版本更新

Prior Knowledge-enhanced Spatio-temporal Epidemic Forecasting

先验知识增强的时空疫情预测

Sijie Ruan, Jinyu Li, Jia Wei, Zenghao Xu, Jie Bao, Junshi Xu, Junyang Qiu, Shuliang Wang, Xiaoxiao Wang, Hanning Yuan

发表机构 * Beijing Institute of Technology（北京理工大学）； Zhejiang Provincial Center for Disease Control and Prevention（浙江省疾病预防控制中心）； JD Technology（京东科技）； The University of Hong Kong（香港大学）； China Mobile Internet（中国移动互联网）

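AI summary: This paper proposes STOEP, a hybrid framework that combines implicit spatio-temporal priors with explicit expert priors, improving spatio-temporal epidemic forecasting by dynamically adjusting regional dependencies, amplifying weak signals, and regularizing mechanistic forecasts.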

Comments 12 pages, 10 figures, accepted to IJCAI 2026

AI中文摘要

时空疫情预测对于公共卫生管理至关重要，但现有方法常面临对弱疫情信号不敏感、空间关系过于简化和参数估计不稳定的问题。为解决这些问题，我们提出了Spatio-Temporal priOr-aware Epidemic Predictor（STOEP），一种新的混合框架，整合了隐式时空先验和显式专家先验。STOEP由三个关键组件组成：（1）病例感知邻接学习（CAL），利用历史感染模式动态调整基于移动性的区域依赖关系；（2）空间指导参数估计（SPE），采用可学习的空间先验来放大弱疫情信号；（3）基于滤波的机制性预测（FMF），使用专家指导的自适应阈值策略来正则化疫情参数。在真实世界的新冠和流感数据集上进行的广泛实验表明，STOEP在RMSE指标上比最佳基线提升11.1%。该系统已在中国一个省级疾控中心部署，以支持下游应用。

英文摘要

Spatio-temporal epidemic forecasting is critical for public health management, yet existing methods often struggle with insensitivity to weak epidemic signals, over-simplified spatial relations, and unstable parameter estimation. To address these challenges, we propose the Spatio-Temporal priOr-aware Epidemic Predictor (STOEP), a novel hybrid framework that integrates implicit spatio-temporal priors and explicit expert priors. STOEP consists of three key components: (1) Case-aware Adjacency Learning (CAL), which dynamically adjusts mobility-based regional dependencies using historical infection patterns; (2) Space-informed Parameter Estimating (SPE), which employs learnable spatial priors to amplify weak epidemic signals; and (3) Filter-based Mechanistic Forecasting (FMF), which uses an expert-guided adaptive thresholding strategy to regularize epidemic parameters. Extensive experiments on real-world COVID-19 and influenza datasets demonstrate that STOEP outperforms the best baseline by 11.1% in RMSE. The system has been deployed at a provincial CDC in China to facilitate downstream applications.

2602.18141 2026-05-22 cs.LG 版本更新

Geometry-Induced Diffusion on Graphs: A Learnable Weighted Laplacian for Spectral GNNs

图上的几何诱导扩散：用于谱GNN的可学习加权拉普拉斯算子

Mia Zosso, Ali Hariri, Victor Kawasaki-Borruat, Pierre-Gabriel Berlureau, Pierre Vandergheynst

发表机构 * École Polytechnique Fédérale de Lausanne (EPFL)（瑞士联邦理工学院（EPFL））； École Normale Supérieure – PSL（巴黎高等师范学院–PSL）

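AI summary: This paper proposes mu-ChebNet, a simple spectral GNN that learns a node-wise weight function mu to modify the graph Laplacian. This changes the propagation geometry without altering the graph topology, promoting preferred routes for information so that long-range signals avoid highly contractive bottlenecks without repeated layer stacking.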

AI中文摘要

长距离图任务对图神经网络（GNNs）来说具有挑战性：全局机制如注意力或重排方案可能计算成本高，而深度局部传播容易导致梯度消失、过平滑和过压缩。引入的mu-ChebNet架构是一种简单的谱GNN，它在应用ChebNet式滤波器之前学习一个节点级权重函数mu。所学的权重mu诱导了一个修改后的图拉普拉斯算子，从而有效改变传播几何而不改变图拓扑。这种任务相关的几何促进了信息传播的优选路径，从而帮助长距离信号避免高度收缩的瓶颈，并消除了对重复层堆叠的需要。在实践中，我们用学习的算子L_mu代替固定的图拉普拉斯算子L，保持所提出的mu-ChebNet架构轻量级，同时使传播任务自适应。此外，我们提供了一种谱分析，说明mu如何调节传播动力学，并在合成长距离推理任务和现实世界图基准上观察到性能的提高。所学的权重函数不仅具有可解释性，还为自适应图传播提供了轻量级的替代方案。

英文摘要

Long-range graph tasks are challenging for Graph Neural Networks (GNNs): global mechanisms such as attention or rewiring schemes can be computationally expensive, while deep local propagation is prone to vanishing gradients, oversmoothing, and oversquashing. The introduced mu-ChebNet architecture is a simple spectral GNN that learns a node-wise weight function mu before applying ChebNet-style filters. The learned weighting mu induces a modified graph Laplacian which effectively changes the propagation geometry without altering the graph topology. This task-dependent geometry promotes preferred routes for information propagation, thereby helping long-range signals avoid highly contractive bottlenecks, and obviating the need for repeated layer stacking. In practice, we replace the fixed graph Laplacian L by a learned operator L_mu, keeping the proposed mu-ChebNet architecture lightweight while making propagation task-adaptive. Furthermore, we provide a spectral analysis demonstrating how mu modulates propagation dynamics, and empirically observe improved performance on both synthetic long-range reasoning tasks and real-world graph benchmarks. The learned weight function is not only interpretable, but also offers a lightweight alternative to attention and rewiring for adaptive graph propagation.
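
To make the idea of a node-weighted Laplacian followed by ChebNet-style filtering concrete, here is a minimal Python sketch. The abstract does not say how L_mu is built from mu, so the construction below (edge weight a_ij · mu_i · mu_j, then degree minus weights) is purely an assumption for illustration, as are the function names. The Chebyshev recurrence itself is the standard one and presumes the operator has already been rescaled to a suitable spectral range.

```python
def weighted_laplacian(adj, mu):
    """Hypothetical L_mu = D_mu - W_mu with W_ij = adj_ij * mu_i * mu_j.

    This construction is an assumption; the graph topology (which entries
    of adj are nonzero) is left unchanged, only edge strengths move.
    """
    n = len(adj)
    w = [[adj[i][j] * mu[i] * mu[j] for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in w]
    return [[(deg[i] if i == j else 0.0) - w[i][j] for j in range(n)]
            for i in range(n)]


def matvec(m, x):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def cheb_filter(lap, x, theta):
    """Apply sum_k theta[k] * T_k(lap) x using T_k = 2 L T_{k-1} - T_{k-2}."""
    t_prev, t_curr = x, matvec(lap, x)
    out = [theta[0] * v for v in x]
    if len(theta) > 1:
        out = [o + theta[1] * v for o, v in zip(out, t_curr)]
    for k in range(2, len(theta)):
        t_next = [2.0 * a - b for a, b in zip(matvec(lap, t_curr), t_prev)]
        out = [o + theta[k] * v for o, v in zip(out, t_next)]
        t_prev, t_curr = t_curr, t_next
    return out
```

With mu set to all ones the sketch reduces to the ordinary combinatorial Laplacian, so any change in the filter output comes only from the learned weights.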

2602.13372 2026-05-22 cs.AI cs.LG 版本更新

MoralityGym: A Benchmark for Evaluating Hierarchical Moral Alignment in Sequential Decision-Making Agents

MoralityGym：用于评估序列决策代理中分层道德对齐的基准

Simon Rosen, Siddarth Singh, Ebenezer Gelo, Helen Sarah Robertson, Ibrahim Suder, Victoria Williams, Benjamin Rosman, Geraud Nangue Tasse, Steven James

发表机构 * University of the Witwatersrand（威特沃特斯兰大学）

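AI summary: This paper introduces MoralityGym, a benchmark of 98 ethical-dilemma problems for evaluating hierarchical moral alignment in sequential decision-making agents. Moral norms are represented as ordered deontic constraints, and task-solving is decoupled from moral evaluation so that insights from psychology and philosophy can inform the assessment.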

Comments Accepted at AAMAS 2026

Journal ref Proc of the 25th International Conference on Autonomous Agents and Multiagent Systems AAMAS 2026, Paphos, Cyprus, May 25 to 29, 2026, IFAAMAS

DOI: 10.65109/SAKL6648

AI中文摘要

评估在面对冲突且分层结构的人类规范时，代理的道德对齐是一个在人工智能安全、道德哲学和认知科学交汇处的关键挑战。我们引入了Morality Chains，一种新的形式化方法，用于将道德规范表示为有序的规范约束，并引入了MoralityGym，一个包含98个伦理困境问题的基准，这些问题是作为电车困境风格的Gymnasium环境呈现的。通过将任务解决与道德评估解耦，并引入新的道德度量标准，MoralityGym允许将心理学和哲学的见解整合到规范敏感推理的评估中。基于安全强化学习方法的基准结果揭示了关键限制，强调了需要更系统的方法来处理伦理决策。本文为开发在复杂现实环境中行为更可靠、透明和道德的AI系统提供了基础。

英文摘要

Evaluating moral alignment in agents navigating conflicting, hierarchically structured human norms is a critical challenge at the intersection of AI safety, moral philosophy, and cognitive science. We introduce Morality Chains, a novel formalism for representing moral norms as ordered deontic constraints, and MoralityGym, a benchmark of 98 ethical-dilemma problems presented as trolley-dilemma-style Gymnasium environments. By decoupling task-solving from moral evaluation and introducing a novel Morality Metric, MoralityGym allows the integration of insights from psychology and philosophy into the evaluation of norm-sensitive reasoning. Baseline results with Safe RL methods reveal key limitations, underscoring the need for more principled approaches to ethical decision-making. This work provides a foundation for developing AI systems that behave more reliably, transparently, and ethically in complex real-world contexts.
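
One simple way to picture "ordered" constraints is a lexicographic comparison of violation counts, sketched below in Python. This only illustrates what a priority ordering over norms means. It is not the paper's Morality Chains formalism or its Morality Metric, and the norm names and function names are hypothetical.

```python
def violation_profile(step_flags, norm_order):
    """Count violations per norm over a trajectory, most important norm first.

    step_flags is a list of dicts mapping a (hypothetical) norm name to 1
    when that norm is violated at that step.
    """
    return tuple(sum(step.get(norm, 0) for step in step_flags)
                 for norm in norm_order)


def more_aligned(profile_a, profile_b):
    """True if A is preferred: Python compares tuples lexicographically, so
    fewer violations of a higher-priority norm outweigh any number of
    violations of lower-priority norms."""
    return profile_a < profile_b
```

For example, under the ordering `["do_not_harm", "keep_promise"]`, a trajectory that breaks two promises but harms no one is preferred to one that causes a single harm.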

2602.12952 2026-05-22 cs.LG cs.AI cs.CV 版本更新

Transporting Task Vectors across Different Architectures without Training

在不同架构间传输任务向量而无需训练

Filippo Rinaldi, Aniello Panariello, Giacomo Salici, Angelo Porrello, Simone Calderara

发表机构 * AImageLab, University of Modena and Reggio Emilia（AImageLab，摩德纳和雷焦艾米利亚大学）

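AI summary: This paper proposes Theseus, a method that transports task updates between models of different widths through functional matching, without training or backpropagation, and reports improvements on vision and language models.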

Comments Accepted at the International Conference on Machine Learning (ICML), 2026

AI中文摘要

适应大型预训练模型以完成下游任务时，通常会产生针对特定任务的参数更新，这些更新对于每个模型变体重新学习都很昂贵。尽管最近的研究表明，这些更新可以在具有相同架构的模型之间转移，但跨不同宽度的模型转移仍鲜有探索。在本文中，我们引入Theseus，一种无需训练的方法，用于在异构宽度模型间传输任务更新。与其匹配参数，我们通过其在中间表示上诱导的功能效应来表征任务更新。我们正式将任务向量传输定义为在观察到的激活上进行的功能匹配问题，并显示在通过正交Procrustes分析对齐表示空间后，它允许一个稳定的闭式解，该解保留了更新的几何结构。我们在不同宽度的视觉和语言模型上评估Theseus，显示在不进行额外训练或反向传播的情况下，相对于基线有持续的改进。我们的结果表明，当任务身份通过功能而非参数定义时，任务更新可以有意义地在不同架构间转移。代码可在https://github.com/apanariello4/merge-and-rebase获取。

英文摘要

Adapting large pre-trained models to downstream tasks often produces task-specific parameter updates that are expensive to relearn for every model variant. While recent work has shown that such updates can be transferred between models with identical architectures, transferring them across models of different widths remains unexplored. In this work, we introduce Theseus, a training-free method for transporting task updates across heterogeneous-width models. Rather than matching parameters, we characterize a task update by the functional effect it induces on intermediate representations. We formalize task-vector transport as a functional matching problem on observed activations and show that, after aligning representation spaces via orthogonal Procrustes analysis, it admits a stable closed-form solution that preserves the geometry of the update. We evaluate Theseus on vision and language models across different widths, showing consistent improvements over baselines without additional training or backpropagation. Our results show that task updates can be meaningfully transferred across architectures when task identity is defined functionally rather than parametrically. Code is available at https://github.com/apanariello4/merge-and-rebase.
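
The alignment step mentioned in this abstract, orthogonal Procrustes analysis, finds the orthogonal map that best carries one set of points onto another in the least-squares sense. The Python sketch below covers only the two-dimensional, rotation-only special case, where the optimum has a closed form. The general case is usually solved with an SVD, and nothing here reproduces how Theseus applies the alignment to activations. The function names are illustrative.

```python
import math


def best_rotation_angle(source, target):
    """Angle of the rotation R minimising sum ||R s_i - t_i||^2 in 2-D.

    Maximising sum t_i . (R s_i) = cos(a) * dot + sin(a) * cross gives
    a = atan2(cross, dot).
    """
    dot = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in zip(source, target))
    cross = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in zip(source, target))
    return math.atan2(cross, dot)


def rotate(points, angle):
    """Rotate 2-D points counter-clockwise by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]
```

Because the solution is closed-form, aligning two representation spaces this way needs no gradient steps, which is consistent with the training-free framing of the abstract.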

2602.12506 2026-05-22 cs.LG 版本更新

On Robustness and Chain-of-Thought Consistency of RL-Finetuned VLMs

关于RL微调VLMs的鲁棒性和链式思维一致性

Rosie Zhao, Anshul Shah, Xiaoyu Zhu, Xinke Deng, Zhongyu Jiang, Yang Yang, Joerg Liebelt, Arnab Mondal

发表机构 * Apple（苹果公司）； OpenAI

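AI summary: This paper studies the robustness and chain-of-thought (CoT) consistency of RL-finetuned VLMs on visual reasoning tasks. Textual perturbations substantially reduce robustness and confidence, while closed models remain markedly more robust and consistent, suggesting the gap stems from shortcomings of current open-source RL finetuning rather than from the task itself.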

Comments ICML 2026

AI中文摘要

强化学习（RL）微调已成为增强大型语言模型（LLMs）在推理密集型任务中的关键技术，推动其扩展到视觉语言模型（VLMs）。尽管RL微调的VLMs在视觉推理基准测试中表现优异，但它们仍容易受到弱视觉基础、幻觉和过度依赖文本提示的影响。我们发现，简单的受控文本扰动，包括误导的标题或错误的链式思维（CoT）轨迹，会导致鲁棒性和信心的显著下降，且当考虑跨开源多模态推理模型的CoT一致性时，这些影响更为明显。相比之下，闭源模型表现出相似的失败模式，但保持了显著更高的鲁棒性和推理一致性，这表明差距反映的是当前开源RL微调的不足，而非任务本身的限制。为了更好地理解这些漏洞，我们进一步分析了RL微调动态，并揭示了准确率与忠实度之间的权衡：微调提高了基准测试准确率，但同时可能削弱伴随的CoT的可靠性及其对上下文变化的鲁棒性。尽管对抗性增强提高了鲁棒性，但本身并不能防止忠实度漂移。结合忠实度意识的奖励可以恢复答案与推理之间的对齐，但当与增强结合时，训练风险会坍缩到捷径策略，鲁棒性仍然难以获得。这些发现突显了仅基于准确率的评估的局限性，并促使训练和评估协议共同强调正确性、鲁棒性和视觉基础推理的忠实度。

英文摘要

Reinforcement learning (RL) finetuning has become a key technique for enhancing large language models (LLMs) on reasoning-intensive tasks, motivating its extension to vision-language models (VLMs). While RL-tuned VLMs improve on visual reasoning benchmarks, they remain vulnerable to weak visual grounding, hallucinations, and over-reliance on textual cues. We show that simple, controlled textual perturbations, including misleading captions or incorrect chain-of-thought (CoT) traces, cause substantial drops in robustness and confidence, and that these effects are more pronounced when CoT consistency is taken into account across open-source multimodal reasoning models. In contrast, closed models exhibit similar failure modes but maintain markedly greater robustness and reasoning consistency, suggesting that the gap reflects a shortcoming in current open-source RL finetuning rather than an inherent limitation of the task. To better understand these vulnerabilities, we further analyze RL finetuning dynamics and uncover an accuracy-faithfulness trade-off: finetuning raises benchmark accuracy, but can simultaneously erode the reliability of the accompanying CoT and its robustness to contextual shifts. Although adversarial augmentation improves robustness, it does not by itself prevent faithfulness drift. Incorporating a faithfulness-aware reward can restore alignment between answers and reasoning, but when paired with augmentation, training risks collapsing onto shortcut strategies and robustness remains elusive. Together, these findings highlight the limitations of accuracy-only evaluations and motivate training and assessment protocols that jointly emphasize correctness, robustness, and the faithfulness of visually grounded reasoning.

2602.10894 2026-05-22 cs.LG cs.AI 版本更新

Revisiting Regularized Policy Optimization for Stable and Efficient Reinforcement Learning in Two-Player Games

重新审视正则化策略优化以实现稳定且高效的双人博弈强化学习

Kazuki Ota, Takayuki Osa, Motoki Omura, Tatsuya Harada

发表机构 * The University of Tokyo, Japan（东京大学）； RIKEN Center for Advanced Intelligence Project, Japan（日本RIKEN高级智能项目中心）

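AI summary: This paper revisits policy optimization with reverse Kullback-Leibler regularization and entropy regularization, analyzing the combination in two-player zero-sum settings both theoretically and empirically. It provides new convergence guarantees, verifies them numerically on synthetic games, and derives a practical model-free RL algorithm whose training efficiency is validated on five board games.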

Comments Accepted at ICML 2026

AI中文摘要

像棋盘游戏这样的双人博弈长期以来一直是强化学习的传统基准。本工作重新审视了一种带有反向Kullback-Leibler正则化和熵正则化的策略优化方法，并从理论和经验角度分析其在双人零和设置中的组合。从理论角度来看，我们研究了策略更新规则在两个理论设置中的稳定性：博弈论的正常形式博弈和有限长度博弈。我们提供了新的收敛保证，并通过合成游戏的数值实验验证了我们的理论结果。从经验角度来看，我们推导出一种基于正则化策略优化的实用模型无关强化学习算法。我们通过在五个棋盘游戏中进行的全面实验验证了我们算法的训练效率。实验结果表明，我们的智能体在各种环境中学习效率均优于现有方法。

英文摘要

Two-player games such as board games have long been used as traditional benchmarks for reinforcement learning. This work revisits a policy optimization method with reverse Kullback-Leibler regularization and entropy regularization and analyzes this combination in two-player zero-sum settings from theoretical and empirical perspectives. From a theoretical perspective, we investigate the stability of the policy update rule in two theoretical settings: game-theoretic normal-form games and finite-length games. We provide novel convergence guarantees and verify our theoretical results through numerical experiments on synthetic games. From an empirical perspective, we derive a practical model-free reinforcement learning algorithm based on the regularized policy optimization. We validate the training efficiency of our algorithm through comprehensive experiments on five board games: Animal Shogi, Gardner Chess, Go, Hex, and Othello. Experimental results show that our agent learns more efficiently than existing methods across environments.
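
For a single state with a finite action set, combining a reverse-KL penalty with an entropy bonus has a well-known closed-form maximiser, sketched below in Python. Maximising Σ π(a) q(a) - α·KL(π ‖ π_old) + β·H(π) over the simplex gives π(a) ∝ π_old(a)^{α/(α+β)} · exp(q(a)/(α+β)). Whether the paper's update rule takes exactly this form is an assumption; the names `kl_coef` (α) and `ent_coef` (β) are illustrative.

```python
import math


def regularized_policy_update(prev_policy, q_values, kl_coef, ent_coef):
    """Closed-form maximiser of <pi, q> - kl_coef * KL(pi || prev) + ent_coef * H(pi).

    Assumes every entry of prev_policy is strictly positive and
    kl_coef + ent_coef > 0.
    """
    scale = kl_coef + ent_coef
    logits = [(q + kl_coef * math.log(p)) / scale
              for p, q in zip(prev_policy, q_values)]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]
```

The two coefficients pull in different directions: the KL term keeps the new policy close to the previous one, while the entropy term flattens it toward uniform, which is why the previous policy appears with an exponent below one.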

2602.09851 2026-05-22 cs.LG 版本更新

CoFEH: LLM-driven Feature Engineering Empowered by Collaborative Bayesian Hyperparameter Optimization

CoFEH: 由协作贝叶斯超参数优化赋能的LLM驱动特征工程

Beicheng Xu, Keyao Ding, Wei Liu, Yupeng Lu, Bin Cui

发表机构 * School of CS & Key Lab of High Confidence Software Technologies (MOE), Peking University, Beijing, China； School of CS & Beijing Key Laboratory of Software-Hardware Cooperative Artificial Intelligence Systems, Peking University, Beijing, China

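AI summary: This paper proposes CoFEH, a framework that interleaves LLM-driven feature engineering (FE) with Bayesian hyperparameter optimization (HPO) for robust end-to-end AutoML. It addresses the rigid search spaces and lack of domain awareness of traditional methods, and adds a mutual conditioning mechanism so that FE and HPO inform each other.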

Comments Accepted at KDD 2026. Extended version with full appendices

DOI: 10.1145/3770855.3817664

AI中文摘要

特征工程（FE）在自动化机器学习（AutoML）中至关重要，但对传统方法而言仍是瓶颈：这些方法局限于刚性的搜索空间，且缺乏领域意识。尽管大型语言模型（LLMs）能够借助语义推理生成不受限的运算符，但现有方法仅关注特征生成等孤立子任务，无法实现自由形式的FE流程。此外，它们很少与下游ML模型的超参数优化（HPO）结合，导致贪心的“FE-then-HPO”工作流无法捕捉强烈的FE-HPO交互。本文提出CoFEH，一种协作框架，通过交替进行基于LLM的FE和贝叶斯HPO实现鲁棒的端到端AutoML。CoFEH包含三个部分：基于Tree of Thought（TOT）的LLM驱动FE优化器，用于探索灵活的FE流程；贝叶斯优化（BO）模块，用于求解HPO；以及动态优化器选择器，用于自适应地交替FE与HPO步骤。关键的是，我们引入互条件机制，使LLM和BO之间共享上下文，实现相互参照的决策。实验表明，CoFEH在独立FE和联合FE+HPO设置中均优于传统基线和基于LLM的基线。

英文摘要

Feature Engineering (FE) is pivotal in automated machine learning (AutoML) but remains a bottleneck for traditional methods, which operate within rigid search spaces and lack domain awareness. While Large Language Models (LLMs) offer a promising alternative to generate unbounded operators with semantic reasoning, existing methods focus on isolated subtasks such as feature generation, falling short of free-form FE pipelines. Moreover, they are rarely coupled with hyperparameter optimization (HPO) of the downstream ML model, leading to greedy "FE-then-HPO" workflows that cannot capture strong FE-HPO interactions. In this paper, we present CoFEH, a collaborative framework that interleaves LLM-based FE and Bayesian HPO for robust end-to-end AutoML. CoFEH uses an LLM-driven FE optimizer powered by Tree of Thought (TOT) to explore flexible FE pipelines, a Bayesian optimization (BO) module to solve HPO, and a dynamic optimizer selector that adaptively interleaves FE and HPO steps. Crucially, we introduce a mutual conditioning mechanism that shares context between LLM and BO, enabling mutually informed decisions. Experiments show that CoFEH outperforms both traditional and LLM-based baselines in both standalone FE and joint FE+HPO settings.

2602.08064 2026-05-22 cs.LG cs.AI cs.CL 版本更新

SiameseNorm: Breaking the Barrier to Reconciling Pre/Post-Norm

SiameseNorm：突破调和Pre-Norm与Post-Norm的障碍

Tianyu Li, Dongchen Han, Zixuan Cao, Haofeng Huang, Mengyu Zhou, Ming Chen, Erchao Zhao, Xiaoxi Jiang, Guanjun Jiang, Gao Huang

发表机构 * Leap Lab, Tsinghua University（清华大学 Leap 实验室）； Qwen Large Model Application Team, Alibaba（阿里巴巴 Qwen 大模型应用团队）； Institute for Interdisciplinary Information Sciences, Tsinghua University（清华大学交叉信息学研究院）

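AI summary: This paper proposes SiameseNorm, a two-stream architecture that couples Pre-Norm-like and Post-Norm-like streams through shared residual blocks, improving performance while maintaining training stability across architectures and modalities.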

Comments Accepted to ICML 2026; camera-ready version; revised presentation and added additional experimental results

AI中文摘要

预规范与后规范之间的长期矛盾仍然是Transformer架构中的一个开放问题，反映了训练稳定性与表示能力之间的根本权衡。先前尝试结合两者优势的研究取得了一定进展，但往往在不同训练设置下表现有限，限制了其更广泛的应用。我们重新审视这一困境，表明单流架构难以协调预规范的稳定身份梯度传播与后规范的主要残差路径归一化。为了解决这种结构张力，我们提出SiameseNorm，一种简单而有效的双流架构，能够与预规范训练配方保持兼容。SiameseNorm通过共享残差块将预规范和后规范流连接起来，允许每个残差块从两个路径接收优化信号，且开销极低。在400M和1.3B密集语言模型、15B MoE模型、视觉Transformer以及扩散Transformer上的大量实验表明，SiameseNorm在各种架构和模态中都能保持强大的训练稳定性的同时提升性能。代码可在https://github.com/Qwen-Applications/SiameseNorm上获得。

英文摘要

The long-standing tension between Pre- and Post-Norm remains an open problem in Transformer architecture, reflecting a fundamental trade-off between training stability and representational capacity. Prior attempts to combine their strengths have made progress, but often show limited robustness across training settings, restricting their broader applicability. We revisit this dilemma, showing that single-stream architectures struggle to reconcile Pre-Norm's stable identity-gradient propagation with Post-Norm's normalization of the main residual path. To address this structural tension, we propose SiameseNorm, a simple yet effective two-stream architecture that remains compatible with Pre-Norm training recipes. SiameseNorm couples Pre-Norm-like and Post-Norm-like streams through shared residual blocks, allowing each residual block to receive optimization signals from both pathways with negligible overhead. Extensive experiments on 400M and 1.3B dense language models, 15B MoE models, Vision Transformers, and Diffusion Transformers show that SiameseNorm consistently improves performance while maintaining strong training stability across architectures and modalities. Code is available at https://github.com/Qwen-Applications/SiameseNorm.
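
As background for the tension this abstract describes, the Python sketch below writes out the two standard residual block layouts on a plain list of floats. It does not implement SiameseNorm's two-stream coupling, which the abstract does not specify in enough detail to reproduce. `sublayer` stands for any attention or feed-forward function, and learnable scale and shift parameters are omitted for brevity.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalise a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def pre_norm_block(x, sublayer):
    """Pre-Norm: x + f(LN(x)). The residual path carries x through
    untouched, which gives the identity-gradient route."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


def post_norm_block(x, sublayer):
    """Post-Norm: LN(x + f(x)). The main residual path itself is normalised."""
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])
```

In the Pre-Norm layout the input passes through the block unchanged whenever the sublayer outputs zero, whereas the Post-Norm layout always rescales it; a single stream has to pick one of the two behaviours.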

2602.07340 2026-05-22 cs.LG 版本更新

Revisiting Robustness for LLM Safety Alignment via Selective Geometry Control

通过选择性几何控制重新审视LLM安全对齐的鲁棒性

Yonghui Yang, Wenjian Tao, Jilong Liu, Xingyu Zhu, Junfeng Fang, Weibiao Huang, Le Wu, Richang Hong, Tat-Seng Chua

发表机构 * National University of Singapore（新加坡国立大学）； Hefei University of Technology（合肥工业大学）； ST Engineering Ltd., Singapore（新加坡ST工程有限公司）

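AI summary: This paper revisits the robustness of LLM safety alignment from an optimization-geometry perspective and proposes ShaPO, a framework that enforces worst-case alignment objectives through selective geometry control over alignment-critical parameter subspaces, improving safety robustness.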

AI中文摘要

大型语言模型的安全对齐在领域偏移和噪声偏好监督下仍显得脆弱。大多数现有鲁棒对齐方法关注对齐数据中的不确定性，而忽视了基于偏好的目标中优化诱导的脆弱性。在本文中，我们从优化几何的角度重新审视LLM安全对齐的鲁棒性，并认为鲁棒性失败不能仅通过数据为中心的方法解决。我们提出了ShaPO，一种几何感知的偏好优化框架，通过在对齐关键参数子空间上进行选择性几何控制来强制最坏情况下的对齐目标。通过避免均匀的几何约束，ShaPO缓解了在分布偏移下可能损害鲁棒性的过度正则化问题。我们将在两个层面实例化ShaPO：token层面的ShaPO稳定了基于似然的替代优化，而reward层面的ShaPO在噪声监督下强制奖励一致的优化。在多样化的安全基准和噪声偏好设置中，ShaPO在流行偏好优化方法上一致地提高了安全鲁棒性。此外，ShaPO能够与数据鲁棒目标清洁地组合，产生额外的收益，并经验上支持所提出的优化-几何视角。代码可在https://github.com/liujilong0116/ShaPO上获得。

英文摘要

Safety alignment of large language models remains brittle under domain shift and noisy preference supervision. Most existing robust alignment methods focus on uncertainty in alignment data, while overlooking optimization-induced fragility in preference-based objectives. In this work, we revisit robustness for LLM safety alignment from an optimization geometry perspective, and argue that robustness failures cannot be addressed by data-centric methods alone. We propose ShaPO, a geometry-aware preference optimization framework that enforces worst-case alignment objectives via selective geometry control over alignment-critical parameter subspaces. By avoiding uniform geometry constraints, ShaPO mitigates the over-regularization that can harm robustness under distribution shift. We instantiate ShaPO at two levels: token-level ShaPO stabilizes likelihood-based surrogate optimization, while reward-level ShaPO enforces reward-consistent optimization under noisy supervision. Across diverse safety benchmarks and noisy preference settings, ShaPO consistently improves safety robustness over popular preference optimization methods. Moreover, ShaPO composes cleanly with data-robust objectives, yielding additional gains and empirically supporting the proposed optimization-geometry perspective. The code is available at https://github.com/liujilong0116/ShaPO.

2602.05873 2026-05-22 cs.LG 版本更新

十亿级图基础模型

Maya Bechler-Speicher, Yoel Gottlieb, Andrey Isakov, David Abensur, Ami Tavory, Daniel Haimovich, Ido Guy, Udi Weinsberg

发表机构 * Meta

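AI summary: This paper presents GraphBFF, an end-to-end recipe for building billion-parameter graph foundation models for large-scale heterogeneous graphs. Built around the GraphBFF Transformer architecture, it presents neural scaling laws for heterogeneous graphs and outperforms baselines on ten downstream tasks.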

AI中文摘要

图结构数据支撑了许多关键应用。尽管基础模型通过大规模预训练和轻量级适应改变了语言和视觉领域，但将其扩展到一般、现实世界的图结构却具有挑战性。在本文中，我们提出了Graph Billion-Foundation-Fusion（GraphBFF）：一种用于构建大规模异构图的十亿参数图基础模型（GFMs）的端到端方法。该方法的核心是GraphBFF Transformer，一种灵活且可扩展的架构，专为实际的十亿级GFMs设计。利用GraphBFF，我们提出了异构图的神经缩放定律，并显示损失随着模型容量或训练数据规模的增加而减少，取决于哪个因素是瓶颈。GraphBFF框架提供了具体的方法论，用于数据分批、预训练和微调，以构建大规模的GFMs。我们通过一个现实世界中的十亿级图展示了该框架的有效性，评估了一个十亿参数的GraphBFF Transformer，按照所提出的配方。在十个不同的现实世界下游任务上，涵盖节点和链接级别的分类和回归，GraphBFF在训练过程中未见过的图上始终优于基线，最大差距达到31个PRAUC点，包括在少样本设置中。最后，我们讨论了使GFMs成为工业规模图学习实际和原则性基础的关键挑战和开放机会。

英文摘要

Graph-structured data underpins many critical applications. While foundation models have transformed language and vision via large-scale pretraining and lightweight adaptation, extending this paradigm to general, real-world graphs is challenging. In this work, we present Graph Billion-Foundation-Fusion (GraphBFF): an end-to-end recipe for building billion-parameter Graph Foundation Models (GFMs) for large-scale heterogeneous graphs. Central to the recipe is the GraphBFF Transformer, a flexible and scalable architecture designed for practical billion-scale GFMs. Using the GraphBFF, we present neural scaling laws for heterogeneous graphs and show that loss decreases predictably as either model capacity or training data scales, depending on which factor is the bottleneck. The GraphBFF framework provides concrete methodologies for data batching, pretraining, and fine-tuning for building GFMs at scale. We demonstrate the effectiveness of the framework over a real-world billion-scale graph, with an evaluation of a billion-parameter GraphBFF Transformer following the proposed recipe. Across ten diverse, real-world downstream tasks on graphs unseen during training, spanning node- and link-level classification and regression, GraphBFF consistently outperforms baselines, with large margins of up to 31 PRAUC points, including in few-shot settings. Finally, we discuss key challenges and open opportunities for making GFMs a practical and principled foundation for graph learning at industrial scale.

2602.04703 2026-05-22 eess.SP cs.LG 版本更新

Knowledge Distillation for mmWave Beam Prediction Using Sub-6 GHz Channels

利用亚6GHz通道进行毫米波波束预测的知识蒸馏

Sina Tavakolian, Nhan Thanh Nguyen, Ahmed Alkhateeb, Markku Juntti

发表机构 * Centre for Wireless Communications, University of Oulu, P.O.Box 4500, FI-90014, Finland（奥卢大学无线通信中心，芬兰）； School of Electrical, Computer, and Energy Engineering, Arizona State University, AZ, USA（亚利桑那州立大学电气、计算机与能源工程学院）

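AI summary: This paper proposes a computationally efficient framework based on knowledge distillation that predicts mmWave beams from sub-6 GHz channels, using compact student deep learning architectures that cut compute and memory requirements while preserving performance.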

Comments 5 pages, 4 figures. Accepted for publication at IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2026

Journal ref Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 22642-22646, 2026

DOI: 10.1109/ICASSP55912.2026.11461506

AI中文摘要

在毫米波（mmWave）高机动环境中，波束成形通常会带来显著的训练开销。尽管先前研究指出亚6GHz通道可用于预测最优毫米波波束，但现有方法依赖于大型深度学习（DL）模型，具有不可接受的计算和内存需求。本文提出了一种基于知识蒸馏（KD）技术的计算高效框架，用于亚6GHz通道-毫米波波束映射。我们开发了两种紧凑的学生DL架构，基于个体和关系蒸馏策略，仅保留少量隐藏层，却能紧密模仿大型教师DL模型的性能。大量仿真表明，所提出的学生模型在保持教师的波束预测准确性和频谱效率的同时，将可训练参数和计算复杂度减少了99%。

英文摘要

Beamforming in millimeter-wave (mmWave) high-mobility environments typically incurs substantial training overhead. While prior studies suggest that sub-6 GHz channels can be exploited to predict optimal mmWave beams, existing methods depend on large deep learning (DL) models with prohibitive computational and memory requirements. In this paper, we propose a computationally efficient framework for sub-6 GHz channel-mmWave beam mapping based on the knowledge distillation (KD) technique. We develop two compact student DL architectures based on individual and relational distillation strategies, which retain only a few hidden layers yet closely mimic the performance of large teacher DL models. Extensive simulations demonstrate that the proposed student models achieve the teacher's beam prediction accuracy and spectral efficiency while reducing trainable parameters and computational complexity by 99%.
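
As a reference point for the knowledge distillation idea, the Python sketch below computes the classic temperature-softened distillation loss between a teacher's and a student's logits over candidate beams. This is the generic response-based formulation, used here as an assumption. The abstract's individual and relational distillation strategies are not specified in enough detail to reproduce, and the default temperature of 2.0 is an arbitrary illustrative choice.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with temperature; higher temperature gives a flatter output."""
    scaled = [l / temperature for l in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's beam scores exactly and grows as their softened distributions diverge, so a small student can be trained to mimic a large teacher's ranking of beams rather than only its top choice.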

2602.02112 2026-05-22 cs.LG cs.AI cs.CL 版本更新

Unifying Masked Diffusion Models with Various Generation Orders and Beyond

统一多种生成顺序的掩码扩散模型及其拓展

Chunsan Hong, Sanghyun Lee, Jong Chul Ye

发表机构 * Graduate School of AI, KAIST, South Korea（韩国科学技术院人工智能研究生院）

AI总结本文提出Order-Expressive Masked Diffusion Model (OeMDM)和Learnable-Order Masked Diffusion Model (LoMDM)，统一了不同生成顺序的扩散生成过程，并通过单目标学习生成顺序和扩散骨干，提升了文本生成性能。

Comments Accepted at ICML 2026

AI中文摘要

两次序贯蒙特卡洛用于树搜索

Yaniv Oren, Joery A. de Vries, Pascal R. van der Vaart, Matthijs T. J. Spaan, Wendelin Böhmer

发表机构 * Delft University of Technology（代尔夫特理工大学）

AI总结本文提出Twice Sequential Monte Carlo Tree Search（TSMCTS）方法，通过降低方差和缓解路径退化问题，在离散和连续环境中取得了优于SMC基线和现代MCTS变体的性能，同时在顺序计算量上具有良好的扩展性。

2510.17991 2026-05-22 cs.LG cs.CV 版本更新

Demystifying Transition Matching: When and Why It Can Beat Flow Matching

解开转换匹配之谜：何时以及为何它能超越流匹配

Jaihoon Kim, Rajarshi Saha, Minhyuk Sung, Youngsuk Park

发表机构 * KAIST（韩国科学技术院）； Amazon Web Services（亚马逊网络服务）

AI总结本文研究了转换匹配（TM）在何时以及为何能超越流匹配（FM），通过证明在单峰高斯分布下TM具有更低的KL散度，并分析了在高斯混合分布中TM在局部单峰区域的优势，以及在目标方差非可忽略时TM的优越性。

Comments Code: https://github.com/amazon-science/TransitionFlowMatching (AISTATS 2026)

AI中文摘要

流匹配（FM）是许多最先进的生成模型的基础，但最近的结果表明转换匹配（TM）可以以更少的采样步骤获得更高的质量。本文回答了TM何时以及为何能超越FM的问题。首先，当目标是一个单峰高斯分布时，我们证明在有限的步骤数下，TM的KL散度严格低于FM。改进源于TM中的随机差分潜在更新，这些更新保留了目标协方差，而确定性FM则低估了它。我们随后表征了收敛速率，显示在固定计算预算下，TM比FM收敛得更快，从而在单峰高斯情况下确立了其优势。其次，我们将分析扩展到高斯混合分布，并识别出局部单峰区域，在这些区域中，采样动态近似于单峰情况，TM可以超越FM。近似误差随着组件均值之间的最小距离增加而减少，突显了当模式良好分离时TM的优势。然而，当目标方差接近零时，每个TM更新收敛到FM更新，TM的性能优势减弱。总之，我们证明了当目标分布具有良好分离的模式和非可忽略的方差时，TM优于FM。我们通过受控实验在高斯分布上验证了我们的理论结果，并将比较扩展到现实世界中的图像和视频生成应用。

英文摘要

Flow Matching (FM) underpins many state-of-the-art generative models, yet recent results indicate that Transition Matching (TM) can achieve higher quality with fewer sampling steps. This work answers the question of when and why TM outperforms FM. First, when the target is a unimodal Gaussian distribution, we prove that TM attains strictly lower KL divergence than FM for a finite number of steps. The improvement arises from stochastic difference latent updates in TM, which preserve target covariance that deterministic FM underestimates. We then characterize convergence rates, showing that TM achieves faster convergence than FM under a fixed compute budget, establishing its advantage in the unimodal Gaussian setting. Second, we extend the analysis to Gaussian mixtures and identify local-unimodality regimes in which the sampling dynamics approximate the unimodal case, where TM can outperform FM. The approximation error decreases as the minimal distance between component means increases, highlighting that TM is favored when the modes are well separated. However, when the target variance approaches zero, each TM update converges to the FM update, and the performance advantage of TM diminishes. In summary, we show that TM outperforms FM when the target distribution has well-separated modes and non-negligible variances. We validate our theoretical results with controlled experiments on Gaussian distributions, and extend the comparison to real-world applications in image and video generation.

2510.16590 2026-05-22 cs.LG cs.AI q-bio.BM 版本更新

Atom-anchored LLMs speak Chemistry: A Retrosynthesis Demonstration

原子锚定的大语言模型会说化学：逆合成演示

Alan Kai Hassen, Andrius Bernatavicius, Antonius P. A. Janssen, Mike Preuss, Gerard J. P. van Westen, Djork-Arné Clevert

发表机构 * Machine Learning Research（机器学习研究）； Pfizer Research and Development（辉瑞研发）； Leiden Institute of Advanced Computer Science（莱顿高级计算机科学研究所）； Leiden University（莱顿大学）； Leiden Academic Centre for Drug Research（莱顿药物研究中心）； Leiden Institute of Chemistry（莱顿化学研究所）

AI总结本研究提出了一种利用通用大语言模型进行分子推理的框架，通过原子标识符将思维链推理锚定到分子结构上，无需任务特定的模型训练，在单步逆合成任务中实现了高成功率。

Comments Alan Kai Hassen and Andrius Bernatavicius contributed equally to this work

AI中文摘要

机器学习在化学领域的应用常常受限于标注数据的稀缺与昂贵，这制约了传统监督方法。在本工作中，我们介绍了一种利用通用大语言模型（LLMs）进行分子推理的框架，该框架无需进行任务特定的模型训练。我们的方法使用唯一的原子标识符，将思维链推理锚定到分子结构上。首先，LLM执行零样本任务，识别相关片段及其关联的化学标签或转化类别。在可选的第二步中，这种位置感知信息被用于少样本任务，结合给定的类别示例来预测化学转化。我们将该框架应用于单步逆合成，这是LLMs此前表现不佳的任务。在学术基准和经专家验证的药物发现分子上，我们的工作使LLMs在识别化学上合理的反应位点（≥90%）、命名反应类别（≥40%）和最终反应物（≥74%）方面实现了高成功率。最终，我们的工作为将LLMs应用于以分子推理和分子转化为关键的挑战建立了通用蓝图，使原子锚定的LLMs成为数据稀缺的化学领域中的有力解决方案。

英文摘要

Applications of machine learning in chemistry are often limited by the scarcity and expense of labeled data, restricting traditional supervised methods. In this work, we introduce a framework for molecular reasoning using general-purpose Large Language Models (LLMs) that operates without requiring task-specific model training. Our method anchors chain-of-thought reasoning to the molecular structure by using unique atomic identifiers. First, the LLM performs a zero-shot task to identify relevant fragments and their associated chemical labels or transformation classes. In an optional second step, this position-aware information is used in a few-shot task with provided class examples to predict the chemical transformation. We apply our framework to single-step retrosynthesis, a task where LLMs have previously underperformed. Across academic benchmarks and expert-validated drug discovery molecules, our work enables LLMs to achieve high success rates in identifying chemically plausible reaction sites ($\geq90\%$), named reaction classes ($\geq40\%$), and final reactants ($\geq74\%$). Ultimately, our work establishes a general blueprint for applying LLMs to challenges where molecular reasoning and molecular transformations are key, positioning atom-anchored LLMs as a powerful solution for data-scarce chemistry domains.

2510.11339 2026-05-22 cs.LG cs.AI 版本更新

Event-Aware Prompt Learning for Dynamic Graphs

事件感知的动态图提示学习

Xingtong Yu, Ruijuan Liang, Renhe Jiang, Dongyuan Li, Yunxiao Zhao, Xinming Zhang, Yuan Fang

发表机构 * The Chinese University of Hong Kong（香港中文大学）； University of Science and Technology of China（中国科学技术大学）； The University of Tokyo（东京大学）； Shanxi University（山西大学）； Singapore Management University（新加坡管理大学）

AI总结本文提出EVP框架，通过提取历史事件并引入事件适应机制，增强动态图学习模型对历史事件知识的利用能力。

Comments Under review

AI中文摘要

现实中的图通常通过一系列事件演变，建模不同领域中对象之间的动态交互。对于动态图学习，动态图神经网络（DGNNs）已逐渐成为流行解决方案。最近，提示学习方法被探索应用于动态图。然而，现有方法通常侧重于捕捉节点与时间之间的关系，而忽视了历史事件的影响。在本文中，我们提出了EVP，一种事件感知的动态图提示学习框架，可以作为现有方法的插件，增强其利用历史事件知识的能力。首先，我们为每个节点提取一系列历史事件，并引入事件适应机制，以将这些事件的细粒度特征对齐到下游任务。其次，我们提出事件聚合机制，以有效将历史知识整合到节点表示中。最后，我们在四个公开数据集上进行了广泛的实验，以评估和分析EVP。

英文摘要

Real-world graphs typically evolve via a series of events, modeling dynamic interactions between objects across various domains. For dynamic graph learning, dynamic graph neural networks (DGNNs) have emerged as popular solutions. Recently, prompt learning methods have been explored on dynamic graphs. However, existing methods generally focus on capturing the relationship between nodes and time, while overlooking the impact of historical events. In this paper, we propose EVP, an event-aware dynamic graph prompt learning framework that can serve as a plug-in to existing methods, enhancing their ability to leverage historical event knowledge. First, we extract a series of historical events for each node and introduce an event adaptation mechanism to align the fine-grained characteristics of these events with downstream tasks. Second, we propose an event aggregation mechanism to effectively integrate historical knowledge into node representations. Finally, we conduct extensive experiments on four public datasets to evaluate and analyze EVP.

2510.10129 2026-05-22 cs.LG cs.AI 版本更新

从自由能原理中涌现的自正交吸引子神经网络

Tamas Spisak, Karl Friston

发表机构 * Center for Translational Neuro- and Behavioral Sciences (C-TNBS), University Medicine Essen, Germany（转化神经与行为科学中心（C-TNBS），埃森大学医学中心，德国）； Queen Square Institute of Neurology, University College London, WC1N 3AR, UK（皇后广场神经病学研究所，伦敦大学学院，英国）； VERSES, Los Angeles, CA 90067, USA（VERSES，美国加利福尼亚州洛杉矶90067）

AI总结本文基于自由能原理，研究了自组织动力学如何从随机动力系统的基本原理中涌现，提出了一种无需显式学习和推断规则的高效且生物合理的方法，实现了多层贝叶斯主动推断过程，通过分析和模拟证明了所提网络倾向于产生近似正交化的吸引子表示，从而提升泛化能力和隐变量与可观测效应间的互信息。

Comments 27 pages main text, 8 pages appendix, 7 figures; interactive manuscript available at: https://pni-lab.github.io/fep-attractor-network Associated GitHub repository: https://github.com/pni-lab/fep-attractor-network

Journal ref Neurocomputing (2026): 133472

DOI: 10.1016/j.neucom.2026.133472

AI中文摘要

吸引子动力学是许多复杂系统，包括大脑的特征。理解这些自组织动力学如何从基本原理中涌现对于推进对神经计算和人工智能系统设计的理解至关重要。本文正式阐述了如何将自由能原理应用于随机动力系统的通用划分，从而推导出吸引子网络的形成机制。我们的方法消除了显式学习和推断规则的需要，并识别出这些自组织系统中涌现的、高效且生物合理的推断和学习动力学。这些结果导致了一个集体、多层次的贝叶斯主动推断过程。自由能景观上的吸引子编码先验信念；推断将感官数据整合到后验信念中；学习则微调耦合以最小化长期的惊讶。通过分析和模拟，我们证明所提出的网络倾向于产生近似正交化的吸引子表示，这是同时优化预测准确性和模型复杂性所导致的后果。这些吸引子能够高效地覆盖输入子空间，提升泛化能力和隐变量与可观测效应间的互信息。此外，尽管随机数据呈现导致对称且稀疏的耦合，但序列数据则促进不对称耦合和非平衡稳态动力学，提供了对传统玻尔兹曼机的自然扩展。我们的发现为自组织吸引子网络提供了统一的理论，为人工智能和神经科学提供了新的见解。

英文摘要

Attractor dynamics are a hallmark of many complex systems, including the brain. Understanding how such self-organizing dynamics emerge from first principles is crucial for advancing our understanding of neuronal computations and the design of artificial intelligence systems. Here we formalize how attractor networks emerge from the free energy principle applied to a universal partitioning of random dynamical systems. Our approach obviates the need for explicitly imposed learning and inference rules and identifies emergent, but efficient and biologically plausible inference and learning dynamics for such self-organizing systems. These result in a collective, multi-level Bayesian active inference process. Attractors on the free energy landscape encode prior beliefs; inference integrates sensory data into posterior beliefs; and learning fine-tunes couplings to minimize long-term surprise. Analytically and via simulations, we establish that the proposed networks favor approximately orthogonalized attractor representations, a consequence of simultaneously optimizing predictive accuracy and model complexity. These attractors efficiently span the input subspace, enhancing generalization and the mutual information between hidden causes and observable effects. Furthermore, while random data presentation leads to symmetric and sparse couplings, sequential data fosters asymmetric couplings and non-equilibrium steady-state dynamics, offering a natural generalization of conventional Boltzmann Machines. Our findings offer a unifying theory of self-organizing attractor networks, providing novel insights for AI and neuroscience.

2502.13822 2026-05-22 stat.ML cs.LG 版本更新

Uncertainty quantification for Markov chain induced martingales with application to temporal difference learning

马尔可夫链诱导鞅的不确定性量化及其在时序差分学习中的应用

Weichen Wu, Yuting Wei, Alessandro Rinaldo

发表机构 * The Voleon Group（Voleon集团）； Department of Statistics and Data Science, The Wharton School, University of Pennsylvania（统计与数据科学系，沃顿商学院，宾夕法尼亚大学）； Department of Statistics and Data Sciences, University of Texas（统计与数据科学系，德克萨斯大学）

AI总结本文提出了新的高维集中不等式和Berry-Esseen界，用于分析由马尔可夫链诱导的向量值鞅，并将其应用于时序差分（TD）学习算法的性能分析，得到了与渐近方差相符的高概率一致性保证，并建立了TD估计器高斯近似的分布收敛速率。

2502.01476 2026-05-22 cs.LG cs.NA math.NA physics.comp-ph 版本更新

Neuro-Symbolic AI for Analytical Solutions of Differential Equations

神经符号AI用于微分方程的解析解

Orestis Oikonomou, Levi Lingsch, Dana Grund, Siddhartha Mishra, Georgios Kissas

发表机构 * Seminar for Applied Mathematics, ETH Zurich, Switzerland（应用数学研讨会，苏黎世联邦理工学院，瑞士）； ETH AI Center, Zurich, Switzerland（苏黎世联邦理工学院人工智能中心，瑞士）； IBM Research Europe, Zurich, Switzerland（IBM欧洲研究院，苏黎世，瑞士）； Institute for Atmospheric and Climate Science, ETH Zurich, Switzerland（大气与气候科学研究所，苏黎世联邦理工学院，瑞士）； Swiss Data Science Center, ETH Zurich, Switzerland（瑞士数据科学中心，苏黎世联邦理工学院，瑞士）

AI总结本文提出SIGS神经符号框架，通过上下文无关文法生成数学上有效且物理上有意义的构建块，并结合用户指定的Ansatz进行组合，嵌入到拓扑正则化的连续潜在流形中，通过两阶段搜索发现解析解，提高了微分方程解析解的准确性和效率。

Comments Updates the method and added extra results

AI中文摘要

微分方程的解析解能提供精确且可解释的洞察，但很少能够得到，因为发现它们需要专家直觉或对组合空间进行穷举搜索。我们引入SIGS，一种用于方程驱动的闭式解发现的神经符号框架。SIGS使用上下文无关文法生成数学上有效且物理上有意义的构建块，由用户指定的Ansatz规定这些构建块的组合方式，将其嵌入到拓扑正则化的连续潜在流形中，并分两个阶段在该流形上进行搜索：先进行结构选择，再通过梯度下降细化系数，仅根据PDE残差以及给定的边界条件和初始条件对候选解评分。这种设计将符号推理与数值优化统一起来；文法从构造上保证候选解构建块的合法性，而潜在空间搜索使探索变得可行且无需数据。SIGS是首个能够做到以下三点的神经符号方法：（i）恢复耦合非线性PDE系统的解析解，（ii）当文法缺少自然的基本元素时发现等价的符号形式，（iii）为尚无已知闭式解的PDE给出准确的符号近似。总体而言，在标准PDE基准测试中，SIGS在准确性和运行时间上均比现有符号方法提升了数个数量级。

英文摘要

Analytical solutions to differential equations offer exact, interpretable insight but are rarely available because discovering them requires expert intuition or exhaustive search of combinatorial spaces. We introduce SIGS, a neuro-symbolic framework for equation-driven closed-form solution discovery. SIGS uses a context-free grammar to generate mathematically valid and physically meaningful building blocks, with a user-specified Ansatz prescribing how these blocks combine, embeds them into a topology-regularised continuous latent manifold, and searches this manifold in two stages: structure selection followed by coefficient refinement using gradient descent, scoring candidates only against the PDE residual and prescribed boundary and initial conditions. This design unifies symbolic reasoning with numerical optimization; the grammar constrains candidate solution blocks to be proper by construction, while the latent search makes exploration tractable and data-free. SIGS is the first neuro-symbolic method to (i) recover analytical solutions for coupled nonlinear PDE systems, (ii) discover equivalent symbolic forms when the grammar lacks the natural primitives, and (iii) produce accurate symbolic approximations for PDEs lacking known closed-form solutions. Overall, SIGS improves over existing symbolic methods by orders of magnitude in both accuracy and runtime across standard PDE benchmarks.

2410.19787 2026-05-22 cs.CV cs.LG 版本更新

Leveraging Multi-Temporal Sentinel 1 and 2 Satellite Data for Leaf Area Index Estimation With Deep Learning

利用多时相哨兵1和2卫星数据进行叶面积指数估计的深度学习方法

Clement Wang, Antoine Debouchage, Valentin Goldité, Aurélien Wery, Jules Salzinger

发表机构 * Austrian Institute of Technology - Vienna, Austria（奥地利技术研究所-维也纳，奥地利）

AI总结本文提出了一种基于多时相哨兵1雷达数据和哨兵2多光谱数据的深度学习方法，用于像素级叶面积指数预测，通过多U-Net网络结构和共同潜在空间实现不同输入模态的互补信息融合，最终在公开数据上取得了0.06 RMSE和0.93 R2分数。

Journal ref Proc. 2023 Conference on Big Data from Space (BiDS'23), Publications Office of the European Union, Luxembourg, 2023

DOI: 10.2760/46796

AI中文摘要

叶面积指数（LAI）是理解生态系统健康和植被动态的关键参数。在本文中，我们提出了一种新的像素级LAI预测方法，利用多个时间戳上的哨兵1雷达数据和哨兵2多光谱数据的互补信息。我们的方法使用一个基于多个U-Net、专为此任务定制的深度神经网络。为处理不同输入模态的复杂性，该网络由多个分别预训练的模块组成，以在共同的潜在空间中表示所有输入数据。然后，我们将这些模块与一个共同的解码器一起进行端到端微调，该解码器还考虑了季节性因素，我们发现季节性起着重要作用。我们的方法在公开可用数据上实现了0.06 RMSE和0.93 R2分数。我们的贡献可在https://github.com/valentingol/LeafNothingBehind上获得，供未来工作在当前进展的基础上进一步改进。

英文摘要

The Leaf Area Index (LAI) is a critical parameter to understand ecosystem health and vegetation dynamics. In this paper, we propose a novel method for pixel-wise LAI prediction by leveraging the complementary information from Sentinel 1 radar data and Sentinel 2 multi-spectral data at multiple timestamps. Our approach uses a deep neural network based on multiple U-nets tailored specifically to this task. To handle the complexity of the different input modalities, it is comprised of several modules that are pre-trained separately to represent all input data in a common latent space. Then, we fine-tune them end-to-end with a common decoder that also takes into account seasonality, which we find to play an important role. Our method achieved 0.06 RMSE and 0.93 R2 score on publicly available data. We make our contributions available at https://github.com/valentingol/LeafNothingBehind for future works to further improve on our current progress.

2410.18151 2026-05-22 cs.SD cs.LG cs.MM eess.AS 版本更新

Music102: An $D_{12}$-equivariant transformer for chord progression accompaniment

Music102: 一个用于和弦进行伴奏的 $D_{12}$-等变Transformer

Weiliang Luo

发表机构 * Massachusetts Institute of Technology（麻省理工学院）

AI总结本文提出Music102，一种基于群论和音乐结构的等变Transformer，用于提升和弦进行伴奏的质量，通过整合移调和反射等音乐对称性，改进了非等变Transformer原型Music101的性能。

Comments 10 pages, 3 figures

Journal ref Proceedings of the 2025 International Computer Music Conference (https://hdl.handle.net/2027/fulcrum.zg64tq53m)

AI中文摘要

我们提出了Music102，一种先进的模型，旨在通过$D_{12}$-等变Transformer增强和弦进行伴奏。受群论和符号音乐结构的启发，Music102利用音乐对称性（如移调和反射操作），将这些属性整合到Transformer架构中。通过编码先验音乐知识，模型在旋律和和弦序列上均保持等变性。我们使用POP909数据集训练和评估Music102，结果显示其在加权损失和精确准确率指标上均显著优于非等变的Music101原型，且参数更少。这项工作展示了自注意力机制和层归一化在离散音乐领域中的适应性，解决了计算音乐分析中的挑战。凭借其稳定且灵活的神经框架，Music102为等变音乐生成和计算作曲工具的进一步探索奠定了基础，将数学理论与实际音乐表演相结合。

英文摘要

We present Music102, an advanced model aimed at enhancing chord progression accompaniment through a $D_{12}$-equivariant transformer. Inspired by group theory and symbolic music structures, Music102 leverages musical symmetry--such as transposition and reflection operations--integrating these properties into the transformer architecture. By encoding prior music knowledge, the model maintains equivariance across both melody and chord sequences. The POP909 dataset was employed to train and evaluate Music102, revealing significant improvements over the non-equivariant Music101 prototype in both weighted loss and exact accuracy metrics, despite using fewer parameters. This work showcases the adaptability of self-attention mechanisms and layer normalization to the discrete musical domain, addressing challenges in computational music analysis. With its stable and flexible neural framework, Music102 sets the stage for further exploration in equivariant music generation and computational composition tools, bridging mathematical theory with practical music performance.
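The transposition and reflection symmetries mentioned in the abstract can be made concrete on the 12 pitch classes. The sketch below assumes a 12-entry binary chroma vector as the representation (an assumption for illustration, not necessarily the paper's encoding); `transpose`, `reflect`, and `orbit` are hypothetical helper names.

```python
def transpose(chroma, k):
    """Shift every pitch class up by k semitones (mod 12)."""
    return [chroma[(i - k) % 12] for i in range(12)]


def reflect(chroma):
    """Invert about pitch class 0: pitch class i maps to (-i) mod 12."""
    return [chroma[(-i) % 12] for i in range(12)]


def orbit(chroma):
    """All images of a chroma vector under the 24 elements of D_12."""
    images = set()
    for k in range(12):
        images.add(tuple(transpose(chroma, k)))           # 12 transpositions
        images.add(tuple(transpose(reflect(chroma), k)))  # 12 reflections
    return images


# C major triad: pitch classes C (0), E (4), G (7).
c_major = [1 if i in (0, 4, 7) else 0 for i in range(12)]
d_major = transpose(c_major, 2)   # pitch classes 2, 6, 9
f_minor = reflect(c_major)        # pitch classes 0, 5, 8
```

A model f is equivariant to this group when f(g·x) = g·f(x) for every group element g, so transposing or inverting the melody input transposes or inverts the predicted chords in the same way. The orbit of a major triad contains all 24 major and minor triads.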

2408.13002 2026-05-22 cs.LG 版本更新

Measuring Variable Importance in Heterogeneous Treatment Effects with Confidence

用置信度衡量异质处理效应中的变量重要性

Joseph Paillard, Angel Reyero Lobo, Vitaliy Kolodyazhniy, Bertrand Thirion, Denis A. Engemann

发表机构 * Roche Pharma Research & Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Basel, Switzerland； Université Paris-Saclay, Inria, CEA, Palaiseau, France

AI总结本文提出PermuCATE算法，用于在估计条件平均处理效应时进行统计严谨的全局变量重要性评估，通过理论分析和实证研究证明其比LOCO方法具有更低的方差，从而提高统计功效，适用于生物医学应用中的有限数据环境。

Journal ref Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:47456-47477, 2025

AI中文摘要

因果机器学习在从复杂数据中估计个体处理效应方面具有潜力。为了成功应用于现实世界，获得可靠见解以确定哪些变量驱动对治疗的异质反应至关重要。我们提出PermuCATE，一种基于条件排列重要性（CPI）方法的算法，用于统计严谨地评估条件平均处理效应（CATE）估计中的变量重要性。有限样本情况的理论分析和实证研究显示，PermuCATE比留一协变量法（LOCO）参考方法具有更低的方差，并提供可靠的变量重要性度量。这一特性提高了统计功效，这对于生物医学应用中常见的有限数据环境中的因果推断至关重要。我们通过模拟和真实世界健康数据集实证展示了PermuCATE的优势，包括具有多达数百个相关变量的设置。

英文摘要

Causal machine learning holds promise for estimating individual treatment effects from complex data. For successful real-world applications of machine learning methods, it is of paramount importance to obtain reliable insights into which variables drive heterogeneity in the response to treatment. We propose PermuCATE, an algorithm based on the Conditional Permutation Importance (CPI) method, for statistically rigorous global variable importance assessment in the estimation of the Conditional Average Treatment Effect (CATE). Theoretical analysis of the finite sample regime and empirical studies show that PermuCATE has lower variance than the Leave-One-Covariate-Out (LOCO) reference method and provides a reliable measure of variable importance. This property increases statistical power, which is crucial for causal inference in the limited-data regime common to biomedical applications. We empirically demonstrate the benefits of PermuCATE in simulated and real-world health datasets, including settings with up to hundreds of correlated variables.

2209.03358 2026-05-22 cs.NE cs.AI cs.CR cs.CV cs.LG 版本更新

Attacking the Spike: On the Transferability and Security of Spiking Neural Networks to Adversarial Examples

攻击脉冲：论脉冲神经网络面对对抗样本的可迁移性与安全性

Nuo Xu, Kaleel Mahmood, Haowen Fang, Ethan Rathbun, Caiwen Ding, Wujie Wen

发表机构 * Lehigh University（理海大学）； University of Minnesota Twin Cities（明尼苏达大学双城分校）； North Carolina State University（北卡罗来纳州立大学）； University of Rhode Island（罗德岛大学）； Northeastern University（东北大学）

AI总结本文研究了脉冲神经网络（SNN）在对抗示例中的鲁棒性，揭示了对抗攻击的转移性，并提出了混合动态脉冲估计（MDSE）攻击方法，以提高SNN和非SNN模型的对抗示例生成效果。

Comments Accepted manuscript. Published in *Neurocomputing*, Volume 656, 2025, Article 131506. Available online 12 September 2025. DOI: 10.1016/j.neucom.2025.131506

Journal ref Neurocomputing, Volume 656, 2025, 131506

DOI: 10.1016/j.neucom.2025.131506

AI中文摘要

脉冲神经网络（SNNs）因其高能效和最近在分类性能上的进展而受到广泛关注。然而，与传统深度学习方法不同，SNN对对抗示例的鲁棒性研究仍相对薄弱。在本文中，我们通过三个贡献推进了SNN的对抗攻击研究。首先，我们表明对SNN的成功白盒对抗攻击高度依赖于底层的替代梯度估计器，即使对于对抗训练的SNN也是如此。其次，使用最佳的单一替代梯度估计器，我们分析了对抗攻击在SNN、视觉Transformer（ViTs）和CNN之间的可转移性。我们的分析揭示了两个关键差距：现有的白盒攻击没有利用多个替代梯度估计器来攻击SNN，且没有单个模型攻击能够可靠地生成同时欺骗SNN和非SNN模型的对抗示例。作为我们的第三个贡献，我们开发了混合动态脉冲估计（MDSE）攻击来解决这些问题。MDSE使用动态梯度估计方案，充分利用多个替代梯度估计器函数，生成能够同时欺骗SNN和非SNN模型的对抗示例。MDSE在SNN/ViT模型集合上比传统白盒攻击如Auto-PGD有效多达91.4%，在对抗训练的SNN集合上提供了3倍的提升。实验覆盖了三个数据集（CIFAR-10、CIFAR-100、ImageNet）和十九个分类器模型（每个CIFAR数据集七个，ImageNet五个）。我们的MDSE实现和评估的模型在https://github.com/nuoxuxxx/attacking-the-spike-mdse上公开可用。

英文摘要

Spiking neural networks (SNNs) have attracted much attention for their high energy efficiency and recent advances in classification performance. However, unlike traditional deep learning approaches, the study of SNN robustness to adversarial examples remains relatively underdeveloped. In this work, we advance the adversarial attack side of SNNs through three contributions. First, we show that successful white-box adversarial attacks on SNNs are highly dependent on the underlying surrogate gradient estimator, even for adversarially trained SNNs. Second, using the best single surrogate gradient estimator, we analyze the transferability of adversarial attacks across SNNs, Vision Transformers (ViTs) and CNNs. Our analysis reveals two key gaps: no existing white-box attack exploits multiple surrogate gradient estimators for SNNs, and no single-model attack reliably generates adversarial examples that simultaneously fool both SNN and non-SNN models. For our third contribution, we develop the Mixed Dynamic Spiking Estimation (MDSE) attack to address these issues. MDSE uses a dynamic gradient estimation scheme to fully exploit multiple surrogate gradient estimator functions and generates adversarial examples capable of fooling SNN and non-SNN models simultaneously. MDSE is up to 91.4% more effective on SNN/ViT model ensembles and provides a 3x boost on adversarially trained SNN ensembles compared to conventional white-box attacks like Auto-PGD. Experiments cover three datasets (CIFAR-10, CIFAR-100, ImageNet) and nineteen classifier models (seven per CIFAR dataset, five for ImageNet). Our implementation of MDSE and the evaluated models is publicly available at https://github.com/nuoxuxxx/attacking-the-spike-mdse.

2605.22083 2026-05-22 cs.SD cs.LG eess.AS 版本更新

RobustSpeechFlow: Learning Robust Text-to-Speech Trajectories via Augmentation-based Contrastive Flow Matching

RobustSpeechFlow: 通过基于增强的对比流匹配学习鲁棒的文本到语音轨迹

Jinhyeok Yang, Hyeongju Kim, Yechan Yu, Joon Byun, Frederik Bous, Juheon Lee

发表机构 * Supertone Inc（Supertone公司）； Independent Researcher（独立研究者）

AI总结本文提出RobustSpeechFlow，一种通过引入长度保持重复和跳过潜在增强来改进对齐鲁棒性的训练策略，从而在无需外部对齐器或偏好数据的情况下，直接惩罚现实中的失败模式，并能无缝集成到现有流程中，实验表明其在文本到语音任务中显著提升了语音质量与鲁棒性。

Comments Submitted to INTERSPEECH 2026

AI中文摘要

尽管流匹配文本到语音（TTS）在零样本说话人相似性和自然度方面表现强劲，但仍易受内容保真度问题影响，特别是由于不完美的对齐导致的跳过和重复错误。我们提出了RobustSpeechFlow，一种训练策略，通过扩展对比流匹配，引入长度保持重复和跳过潜在增强来提高对齐鲁棒性。该方法无需外部对齐器或偏好数据，直接惩罚现实中的失败模式，并能无缝集成到现有流程中。在Seed-TTS-eval上，仅使用0.06B参数，其将词错误率（WER）从1.44降至1.38。在我们的ZERO500基准测试中，它在多样化的说话人和语调条件下实现了稳定的可理解性提升；在NFE=24时，其将英文字符错误率（CER）从0.48%降至0.35%，将韩文CER从0.81%降至0.57%。音频样本：https://robustspeechflow.github.io/

英文摘要

While flow-matching text-to-speech (TTS) achieves strong zero-shot speaker similarity and naturalness, it remains susceptible to content fidelity issues, particularly skip and repeat errors from imperfect alignment. We propose RobustSpeechFlow, a training strategy that improves alignment robustness by extending contrastive flow matching with length-preserving repeat and skip latent augmentations. Requiring no external aligners or preference data, our method directly penalizes realistic failure modes and readily integrates into existing pipelines. On Seed-TTS-eval, it reduces the word error rate (WER) from 1.44 to 1.38 using only 0.06B parameters. On our ZERO500 benchmark, it delivers consistent intelligibility improvements across diverse speaker and prosody conditions; at NFE=24, it reduces English character error rate (CER) from 0.48% to 0.35% and Korean CER from 0.81% to 0.57%. Audio samples: https://robustspeechflow.github.io/

2605.22075 2026-05-22 cs.LG q-bio.QM 版本更新

Can Breath Biomarkers Causally Influence Blood Glucose? Investigating VOC-Mediated Modulation in Diabetes

呼吸生物标志物能否因果影响血糖？探讨VOC介导的糖尿病调节

Varsha Sharma, Prasanta K. Guha, Avik Ghose

发表机构 * TCS Research（TCS研究）； Department of E&ECE, IIT Kharagpur（印度理工学院Kharagpur电子与电气工程系）

AI总结本研究通过非侵入式数据驱动框架，利用挥发性有机化合物（VOCs）和生活方式变量识别糖尿病高风险个体，采用因果推断技术估计丙酮、异丙醇、异戊二烯和乙醇等VOCs对血糖水平的影响，并设计分类器区分糖尿病患者与非糖尿病患者，建立基于风险的排名系统，并用高斯混合模型识别自然聚类。

Journal ref Proceedings of the IJCAI workshop on Advanced Neural Systems for Next-Generation Biomedical Intelligence, 2025

AI中文摘要

糖尿病是一种全球健康负担，早期检测对于及时干预至关重要。本研究探讨了一种非侵入式、数据驱动的框架，利用挥发性有机化合物（VOCs）和生活方式变量识别糖尿病高风险个体。我们使用因果推断技术估计丙酮、异丙醇、异戊二烯和乙醇等VOCs对血糖水平的影响。此外，我们设计了一个分类器，利用非侵入式标志物区分糖尿病患者和非糖尿病患者。我们为“灰色区域”中的个体建立了基于风险的排名系统，并使用高斯混合模型识别人群中的自然聚类。我们的结果表明，特定的VOCs对血糖水平表现出强因果影响，且机器学习模型能够可靠地分类和分层高风险个体。这种集成的因果-可解释分析可以支持非侵入式糖尿病早期筛查工具的开发。

英文摘要

Diabetes is a global health burden, and early detection is critical for timely intervention. This study explores a non-invasive, data-driven framework to identify individuals at risk of diabetes using Volatile Organic Compounds (VOCs) and lifestyle variables. We use causal inference techniques to estimate the impact of VOCs such as acetone, isopropanol, isoprene, and ethanol on blood glucose levels. Additionally, we designed a classifier to distinguish diabetics from non-diabetics using non-invasive markers. We created a risk-based ranking system for individuals in the "gray zone," and identified natural clusters in the population using a Gaussian Mixture Model. Our results suggest that specific VOCs exhibit a strong causal influence on glucose levels and that machine learning models can reliably classify and stratify individuals at high risk. This integrated causal-explainable analysis can support the development of tools for non-invasive early screening of diabetes.

2605.22074 2026-05-22 cs.LG cs.AI cs.CL 版本更新

From Reasoning Chains to Verifiable Subproblems: Curriculum Reinforcement Learning Enables Credit Assignment for LLM Reasoning

从推理链到可验证子问题：课程强化学习使LLM推理能够进行信用分配

Xitai Jiang, Zihan Tang, Wenze Lin, Yang Yue, Shenzhi Wang, Gao Huang

发表机构 * LeapLab, Tsinghua University（清华大学 LeapLab）； Qiuzhen College, Tsinghua University（清华大学求真书院）

AI总结该研究提出SCRL框架，通过从参考推理链中生成可验证子问题，解决LLM推理中信用分配问题，提升了在数学推理任务中的性能。

AI中文摘要

基于可验证奖励的强化学习（RLVR）在LLM推理中展现出强大潜力，但基于结果的RLVR在处理难题时效率低下，因为得到正确最终答案的rollout很少，且样本层面的信用分配无法利用失败尝试中的部分进展。我们引入SCRL（子问题课程强化学习），一种课程强化学习框架，它从参考推理链中推导出可验证子问题，并将最后一个子问题固定为原始问题。这将难题上的部分进展转化为可验证的学习信号。在算法上，SCRL使用子问题层面的归一化，即在每个子问题位置独立归一化奖励，并将得到的优势分配给相应的答案片段，从而在没有外部评分标准或奖励模型的情况下实现更细粒度的信用分配。我们的分析表明，子问题课程将难题从梯度死区中拉出，且原始问题越难，相对收益越大。在七个数学推理基准测试中，SCRL超越了强大的课程学习基线，相比GRPO，平均准确率在Qwen3-4B-Base上提高4.1个点，在Qwen3-14B-Base上提高1.9个点。在AIME24、AIME25和IMO-Bench上，SCRL进一步将Qwen3-4B-Base的pass@1提高3.7个点、pass@64提高4.6个点，表明其在困难推理问题上的探索能力更强。

英文摘要

Reinforcement learning from verifiable rewards (RLVR) has shown strong promise for LLM reasoning, but outcome-based RLVR remains inefficient on hard problems because correct final-answer rollouts are rare and sample-level credit assignment cannot use partial progress in failed attempts. We introduce SCRL (Subproblem Curriculum Reinforcement Learning), a curriculum RL framework that derives verifiable subproblems from reference reasoning chains and fixes the final subproblem as the original problem. This turns partial progress on hard problems into verifiable learning signals. Algorithmically, SCRL uses subproblem-level normalization, which normalizes rewards independently at each subproblem position and assigns the resulting advantages to the corresponding answer spans, enabling finer-grained credit assignment without external rubrics or reward models. Our analysis shows that subproblem curricula lift hard problems out of gradient dead zones, with larger relative gains as the original problem becomes harder. Across seven mathematical reasoning benchmarks, SCRL outperforms strong curriculum-learning baselines, improving average accuracy over GRPO by +4.1 points on Qwen3-4B-Base and +1.9 points on Qwen3-14B-Base. On AIME24, AIME25, and IMO-Bench, SCRL further improves pass@1 by +3.7 points and pass@64 by +4.6 points on Qwen3-4B-Base, indicating better exploration on hard reasoning problems.
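The subproblem-level normalization described above can be sketched as a per-position variant of a group-relative advantage. The block assumes a GRPO-style mean and standard deviation normalization and a reward matrix indexed by rollout and subproblem position; both are assumptions for illustration, since the abstract only states that rewards are normalized independently at each subproblem position. The function name and `eps` are hypothetical.

```python
from statistics import mean, pstdev


def subproblem_advantages(rewards, eps=1e-6):
    """rewards[r][j] is the verifiable reward of rollout r on subproblem j.

    Returns advantages[r][j], normalized independently per subproblem position j.
    """
    n_rollouts = len(rewards)
    n_sub = len(rewards[0])
    advantages = [[0.0] * n_sub for _ in range(n_rollouts)]
    for j in range(n_sub):
        column = [rewards[r][j] for r in range(n_rollouts)]
        mu = mean(column)
        sigma = pstdev(column)  # population std over the rollout group
        for r in range(n_rollouts):
            advantages[r][j] = (column[r] - mu) / (sigma + eps)
    return advantages


# Four rollouts, two positions; the last position plays the role of the original problem.
rewards = [[1, 0], [1, 0], [0, 0], [1, 1]]
adv = subproblem_advantages(rewards)
```

In this toy group only one rollout solves the final problem, yet rollouts 0 and 1 still receive a positive advantage at the first position, which is the sense in which partial progress becomes a learning signal. In a full training loop, each `adv[r][j]` would be applied to the tokens of the corresponding answer span.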

2605.22055 2026-05-22 cs.LG cs.AI 版本更新

RADAR: 通过动态防御对抗RAG的检索腐败

Ziyuan Chen, Yueming Lyu, Yi Liu, Weixiang Han, Jing Dong, Caifeng Shan, Tieniu Tan

发表机构 * School of Intelligence Science and Technology, Nanjing University, Suzhou, China（南京大学智能科学与技术学院，中国，苏州）； City University of Hong Kong, Hong Kong, China（香港城市大学，中国，香港）； Institute of Automation, Chinese Academy of Sciences, Beijing, China（中国科学院自动化研究所，北京，中国）

AI总结RADAR将可靠上下文的选择建模为基于图的能量最小化问题，并利用最大流最小割算法精确求解；它采用贝叶斯记忆节点递归更新信念状态，在抵御对抗性攻击的稳定性与适应真实知识变化之间取得平衡，在动态数据集上实现了优于基线方法的鲁棒性和响应质量，且存储开销小。

2605.22013 2026-05-22 cs.CV cs.GR cs.LG 版本更新

Ex-GraphRAG：图增强大语言模型中的可解释证据路由

Yoav Kor Sade, Arvindh Arun, Rishi Puri, Steffen Staab, Maya Bechler-Speicher

发表机构 * Tel Aviv University（特拉维夫大学）； Institute for AI, University of Stuttgart（人工智能研究所，斯图加特大学）； NVIDIA（英伟达）； Meta AI

AI总结本文提出Ex-GraphRAG，通过引入多变量图神经加法网络（M-GNAN）来解决图增强大语言模型中证据路由的可解释性问题，揭示了语义重要性与结构连通性之间的不匹配，对检索剪枝、上下文构建和失败诊断有重要影响。

AI中文摘要

GraphRAG通过从知识图中检索子图并使用消息传递GNN进行编码，将语言模型置于这些子图上。由于这些编码器通过迭代邻域聚合将节点贡献纠缠在一起，因此无法确定每个检索实体对编码器输出的影响程度，因此无法忠实审计实际到达模型的结构证据。我们引入Ex-GraphRAG，用多变量图神经加法网络（M-GNAN）替代GNN编码器，这是一种扩展到高维嵌入空间的加法图模型，能够精确分解编码器的输出，而无需事后近似。在STaRK-Prime上，这种可审计的编码器与黑盒性能相匹配。利用它审计证据路由，我们发现语义-结构不匹配：主导编码器输出的节点在检索的子图中结构上是断开的，由低贡献的中介节点连接，其移除会使多跳问答性能下降高达28%。这种不匹配对任何不透明编码器都是不可见的，揭示了语义重要性与结构连通性由不同的节点集控制，对图增强大语言模型的检索剪枝、上下文构建和故障诊断有直接的影响。

英文摘要

GraphRAG conditions language models on subgraphs retrieved from knowledge graphs, encoded via message-passing GNNs. Because these encoders entangle node contributions through iterated neighborhood aggregation, there is no closed-form way to determine how much each retrieved entity influenced the encoder's output, and therefore no way to faithfully audit what structural evidence actually reached the model. We introduce Ex-GraphRAG, which replaces the GNN encoder with a Multivariate Graph Neural Additive Network (M-GNAN), an extension of additive graph models to high-dimensional embedding spaces that yields an exact decomposition of the encoder's output across individual nodes and feature groups, without post-hoc approximation. On STaRK-Prime, this auditable encoder matches black-box performance. Using it to audit evidence routing, we uncover a semantic-structural mismatch: the nodes that dominate the encoder's output are structurally disconnected in the retrieved subgraph, held together by low-attribution intermediaries whose removal degrades multi-hop QA by up to 28%. This mismatch, invisible to any opaque encoder, reveals that semantic importance and structural connectivity are governed by disjoint sets of nodes, with direct implications for retrieval pruning, context construction, and failure diagnosis in graph-augmented LLMs.

2605.21993 2026-05-22 cs.AI cs.LG 版本更新

ECPO: Evidence-Coupled Policy Optimization for Evidence-Certified Candidate Ranking

ECPO：用于证据认证候选者排序的证据耦合策略优化

Miaobo Hu, Shuhao Hu, BoKun Wang, Yina Sa, Xin Wang, Xiaobo Guo, Daren Zha, Jun Xiao

发表机构 * Institute of Information Engineering, Chinese Academy of Sciences（信息工程研究所，中国科学院）； School of Cyber Security, University of Chinese Academy of Sciences（中国科学院大学网络安全学院）； School of Artificial Intelligence, University of Chinese Academy of Sciences（中国科学院大学人工智能学院）

AI总结本文研究了证据认证候选者排序问题，提出了一种名为ECPO的策略优化方法，通过结合排序和证据证书来提升排序效果和证据可靠性。

AI中文摘要

用于决策支持的排序系统不仅应对候选者排序，还应给出可被独立核查的证据。我们研究证据认证的候选者排序：给定intent_id、预定义的计划骨架、窗口局部的候选者名单，以及带有跨度来源的文本派生候选者轨迹，系统必须输出一个Top-K列表以及doc_id:span形式的证据证书，且证书所引用的跨度应足以恢复该决策。我们在MAVEN-ERE和RAMS上实例化了这一任务，采用固定的上游抽取、窗口局部的随机化候选者标识符、与骨架对齐的轨迹监督、困难负例和审计参考。我们提出证据耦合策略优化（ECPO），一种列表级策略优化目标，其动作是排序与证据证书构成的联合对象。ECPO首先从骨架对齐、论元一致性和可选的图特征中学习可解释的轨迹奖励；然后用三个相互耦合的奖励优化一个受约束的策略：列表级排序效用、跨度级证书有效性，以及证据循环奖励。证据循环奖励由一个无需标签的确定性验证器计算，该验证器从剥离了声明的被引跨度中重建候选者支持。这将目标从仅最大化普通NDCG转变为最大化CertNDCG和决策-证据耦合。评估在封闭名单、预测名单和混合名单三种设置下，将ECPO与以下方法比较：零样本、SFT和GRPO策略，附带确定性证据附加的仅RM评分，语法/JSON约束解码，验证器重试，best-of-N RM选择，以及事后证据合理化。

英文摘要

Ranking systems used in decision-support settings should not only order candidates but also expose evidence that can be independently checked. We study evidence-certified candidate ranking: given an intent_id, a predefined plan skeleton, a window-local candidate roster, and text-derived candidate trajectories with span provenance, a system must output a Top-K list together with doc_id:span evidence certificates whose cited spans are sufficient to recover the decision. We instantiate this task on MAVEN-ERE and RAMS with fixed upstream extraction, window-local randomized candidate identifiers, skeleton-aligned trajectory supervision, hard negatives, and audit references. We introduce Evidence-Coupled Policy Optimization (ECPO), a listwise policy-optimization objective whose action is the joint object of ranking and evidence certificate. ECPO first learns an interpretable trajectory reward from skeleton alignment, argument consistency, and optional graph features; it then optimizes a constrained policy with three coupled rewards: listwise ranking utility, span-level certificate validity, and an evidence-cycle reward computed by a label-free deterministic verifier that reconstructs candidate support from claim-stripped cited spans. This reframes the goal from maximizing ordinary NDCG alone to maximizing CertNDCG and decision-evidence coupling. The evaluation compares ECPO against zero-shot, SFT, and GRPO policies, RM-only scoring with deterministic evidence attachment, grammar/JSON-constrained decoding, validator retry, best-of-N RM selection, and post-hoc evidence rationalization under closed-roster, predicted-roster, and hybrid-roster settings.

2605.21975 2026-05-22 cs.LG 版本更新

Reasoning through Verifiable Forecast Actions: Consistency-Grounded RL for Financial LLMs

通过可验证的预测动作进行推理：面向金融大语言模型的一致性导向强化学习

Jialin Chen, Aosong Feng, Harshit Verma, Siyi Gu, Haiwen Wang, Ali Maatouk, Yixuan He, Yifeng Gao, Leandros Tassiulas, Rex Ying

发表机构 * Yale University（耶鲁大学）； University of Texas Rio Grande Valley（德克萨斯大学里奥格兰德河谷分校）； Arizona State University（亚利桑那州立大学）

AI总结本文提出StockR1，一种结合时间序列的LLM，通过可验证的预测动作统一股票预测与金融推理，利用强化学习优化整个流程，提升金融问答和股票预测的准确性。

AI中文摘要

金融市场的特点是极端的非平稳性、低信噪比，以及对新闻、公司基本面和宏观经济信号等外部信息的强依赖。然而，现有方法要么将时间序列抽象为文本，要么将预测与基于语言的推理解耦，导致定性推理与定量结果之间存在根本性的不匹配。为此，我们提出StockR1，一种时间序列增强的LLM，通过可验证的预测动作将股票预测与金融推理统一起来。基于工具调用的设计，模型首先发出一个预测动作，即对其定性市场展望的结构化、可解释的表示；随后调用以该动作为条件的时间序列解码器，生成分布形式的未来轨迹，从而支持更有依据的问答和金融推理。我们用强化学习优化整个流程，奖励同时反映答案的有效性、预测的准确性，以及生成的动作与观测到的时间序列动态之间的一致性。此外，奖励还按样本级不确定性标量重新加权，促使模型适应市场动态中不断变化的不确定性。我们在一个大规模的10年基准上评估StockR1的金融问答和股票预测能力。我们的方法始终优于时间序列基线和通用LLM，将推理准确率提高了17.7%（4B）和25.9%（8B）。这些结果表明，对预测动作进行结构化能在语言推理与时间预测之间建立强大的协同效应，使LLM能够通过可验证、可解释且有数值依据的决策进行推理。

英文摘要

Financial markets are characterized by extreme non-stationarity, low signal-to-noise ratios, and strong dependence on external information such as news, company fundamentals, and macroeconomic signals. Yet, existing approaches either abstract time-series into text or decouple forecasting from language-based reasoning, leading to a fundamental mismatch between qualitative reasoning and quantitative outcomes. To address this, we introduce StockR1, a time-series-enhanced LLM that unifies stock forecasting and financial reasoning through a verifiable forecast action. Based on a tool-call design, the model first emits a forecast action, which is a structured and interpretable representation of its qualitative market outlook. It then invokes a time-series decoder conditioned on this action to generate distributional future trajectories, leading to more informed question answering and financial reasoning. We optimize the full pipeline with reinforcement learning, where rewards jointly reflect answer validity, forecast accuracy, and consistency between generated actions and observed time-series dynamics. In addition, rewards are reweighted by a sample-level uncertainty scalar, encouraging the model to accommodate varying uncertainty in market dynamics. We evaluate StockR1 on financial question answering and stock forecasting over a large-scale 10-year benchmark. Our method consistently outperforms time-series baselines and general-purpose LLMs, improving reasoning accuracy by 17.7% (4B) and 25.9% (8B). These findings demonstrate that structuring the forecast actions establishes a powerful synergy between language reasoning and temporal prediction, enabling LLMs to reason through verifiable, interpretable, and numerically grounded decisions.
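A composite reward of the kind described above could be sketched as follows. The abstract does not give the actual formula, so everything here is an assumption: the function name `pipeline_reward`, the `1 / (1 + error)` forecast score, the equal default weights, and the choice to multiply by the uncertainty scalar are all hypothetical.

```python
def pipeline_reward(
    answer_valid: bool,
    forecast_error: float,
    action_consistent: bool,
    uncertainty_weight: float,
    w_answer: float = 1.0,
    w_forecast: float = 1.0,
    w_consistency: float = 1.0,
) -> float:
    """Combine three reward terms, then rescale by a per-sample scalar.

    Assumed form only: forecast accuracy is mapped to (0, 1] via 1 / (1 + error),
    and the sample-level uncertainty scalar multiplies the summed reward.
    """
    if forecast_error < 0:
        raise ValueError("forecast_error must be non-negative")
    forecast_score = 1.0 / (1.0 + forecast_error)
    base = (
        w_answer * float(answer_valid)
        + w_forecast * forecast_score
        + w_consistency * float(action_consistent)
    )
    return uncertainty_weight * base
```

Under this assumed form, a sample with a small uncertainty weight moves the policy less, whether its answer and forecast were right or wrong.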

2605.21972 2026-05-22 cs.LG 版本更新

训练算法的热力学不可逆性

Liu Ziyin, Yuanjie Ren, Adam Levine, Isaac Chuang

发表机构 * Massachusetts Institute of Technology（麻省理工学院）； NTT Research（NTT研究所）

AI总结本文提出了一种通用框架，用于定义和分析训练算法的不可逆性，证明了四种不同方法在步长η的主导阶近似下是等价的，并展示了不可逆性如何导致时间反演对称性破缺的新兴力。

Comments preprint

2605.21928 2026-05-22 cs.LG cs.AI stat.ME 版本更新

CausalGuard: Conformal Inference under Graph Uncertainty

CausalGuard：图不确定性下的共形推断

Vikash Singh, Weicong Chen, Debargha Ganguly, Yanyan Zhang, Nengbo Wang, Sreehari Sankar, Mohsen Hariri, Alexander Nemecek, Chaoda Song, Shouren Wang, Biyao Zhang, Van Yang, Erman Ayday, Jing Ma, Vipin Chaudhary

发表机构 * Case Western Reserve University（凯斯西储大学）

AI总结本文提出CausalGuard，一种结构加权的共形框架，在聚合以图为条件的双重稳健伪结果之后进行校准，从而在图不确定性下为聚合后的伪结果提供无分布假设的有限样本边际覆盖。

AI中文摘要

从观察性数据估计处理效应需要选择调整集，但调整是否有效取决于未知的因果图。图的误设可能导致覆盖不足，而与图无关的共形包装方法可能只有通过大幅加宽区间才能恢复名义覆盖率。我们提出CausalGuard，一种结构加权的共形框架，它在聚合以图为条件的双重稳健伪结果之后再进行校准。候选DAG由LLM导出的边先验提出，经条件独立性检验剪枝，并按贝叶斯信息准则重新加权。随后，一个复合的非一致性分数对后验加权的伪结果进行校准。CausalGuard为这一聚合伪结果提供无分布假设的有限样本边际覆盖；在因果可识别性、重叠性、条件均值冗余参数稳定性以及权重集中于与目标对齐的有效调整策略等条件下，其条件均值收敛于真实的条件平均处理效应。在五个基准上，CausalGuard在可直接评估的目标上取得了高于名义90%水平的平均覆盖率，并在与图无关的共形基线需要大幅加宽区间时缩小了区间宽度。压力测试表明，当保留的候选集得到数据支持时，CausalGuard能够抑制无效的对撞因子调整，并在先验误设的情况下保持稳定。

英文摘要

Estimating treatment effects from observational data requires choosing an adjustment set, but valid adjustment depends on an unknown causal graph. Graph misspecification can cause under-coverage, while graph-agnostic conformal wrappers may regain nominal coverage only through large padding. We introduce CausalGuard, a structure-weighted conformal framework that calibrates after aggregating graph-conditional doubly robust pseudo-outcomes. Candidate DAGs are proposed from an LLM-derived edge prior, pruned by conditional-independence tests, and reweighted by Bayesian Information Criterion. A composite nonconformity score then calibrates the posterior-weighted pseudo-outcome. CausalGuard provides distribution-free finite-sample marginal coverage for this aggregated pseudo-outcome; under causal identification, overlap, conditional-mean nuisance stability, and concentration on target-aligned valid adjustment strategies, its conditional mean converges to the true Conditional Average Treatment Effect. Across five benchmarks, CausalGuard attains mean coverage above the nominal 90% level for the directly evaluable target and reduces width when graph-agnostic conformal baselines require large padding. Stress tests show that CausalGuard suppresses invalid collider adjustment and remains stable under misspecified priors when the retained candidate set is data-supported.
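The "aggregate first, calibrate after" order described above can be sketched with two standard ingredients: BIC-based model weights and a split-conformal quantile. This is a rough illustration, not CausalGuard's procedure: the helper names are hypothetical, the `exp(-BIC / 2)` weighting is the usual textbook approximation and is assumed here, and the composite nonconformity score from the abstract is replaced by whatever scores the caller supplies.

```python
import math
from typing import List


def bic_weights(bic_scores: List[float]) -> List[float]:
    """Normalized weights proportional to exp(-BIC / 2) (assumed weighting rule)."""
    best = min(bic_scores)  # subtract the minimum for numerical stability
    raw = [math.exp(-(b - best) / 2.0) for b in bic_scores]
    total = sum(raw)
    return [r / total for r in raw]


def aggregate(pseudo_outcomes: List[float], weights: List[float]) -> float:
    """Weighted average of graph-conditional pseudo-outcomes for one unit."""
    return sum(w * y for w, y in zip(weights, pseudo_outcomes))


def conformal_radius(calibration_scores: List[float], alpha: float) -> float:
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this coverage level
    return sorted(calibration_scores)[rank - 1]
```

A prediction interval would then be the point prediction plus or minus `conformal_radius(...)`, with calibration scores computed on the aggregated pseudo-outcome rather than separately per candidate graph.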

2605.21916 2026-05-22 quant-ph cs.LG 版本更新

A2QTGN: Adaptive Amplitude Quantum-Integrated Temporal Graph Network for Dynamic Link Prediction

A2QTGN：自适应幅度量子集成时间图网络用于动态链接预测

Nouhaila Innan, M. Murali Karthick, Simeon Kandan Sonar, Vivek Chaturvedi, Muhammad Shafique

发表机构 * eBRAIN Lab, Division of Engineering, New York University Abu Dhabi (NYUAD)（eBRAIN实验室，工程学院，纽约大学阿布扎比分校）； Center for Quantum and Topological Systems (CQTS), NYUAD Research Institute, NYUAD（量子与拓扑系统中心（CQTS），NYUAD研究院，NYUAD）； Indian Institute of Technology Palakkad (IITPKD)（帕拉卡德印度理工学院（IITPKD））

AI总结本文提出A2QTGN，一种结合自适应幅度编码和时间图网络的混合量子-经典框架，用于动态链接预测，通过量子状态表示节点交互特征并根据时间活动选择性刷新幅度嵌入，提升时间表示能力。

Comments 9 pages, 3 figures

AI中文摘要

动态链接预测对于建模复杂系统中演变的交互至关重要，包括社交、通信、金融和交通网络。经典时间图模型捕捉序列依赖性，但可能难以表示大规模动态图中同时和快速变化的节点-边交互。我们提出A2QTGN（自适应幅度量子集成时间图网络），一种混合量子-经典框架，结合自适应幅度编码与时间图网络骨干。所提出机制将节点交互特征表示为量子状态，并根据时间活动选择性刷新幅度嵌入，保留稳定节点状态的同时强调有意义的结构变化。此设计减少了不必要的量子重编码并改进了时间表示以进行链接预测。在五个时间图基准数据集上的实验表明，A2QTGN在多样化的动态图中实现了强大的预测和排名性能。消融研究证实了量子嵌入模块和自适应更新策略的重要性，而使用嘈杂后端和有限真实设备执行的硬件感知推断支持了近期量子辅助时间图学习的可行性。

英文摘要

Dynamic link prediction is important for modeling evolving interactions in complex systems, including social, communication, financial, and transportation networks. Classical temporal graph models capture sequential dependencies, but they may struggle to represent concurrent and rapidly changing node-edge interactions in large dynamic graphs. We propose A2QTGN (Adaptive Amplitude Quantum-Integrated Temporal Graph Network), a hybrid quantum-classical framework that combines adaptive amplitude encoding with a Temporal Graph Network backbone. The proposed mechanism represents node interaction features as quantum states and selectively refreshes amplitude embeddings based on temporal activity, preserving stable node states while emphasizing meaningful structural changes. This design reduces unnecessary quantum re-encoding and improves temporal representation for link prediction. Experiments on five Temporal Graph Benchmark datasets show that A2QTGN achieves strong predictive and ranking performance across diverse dynamic graphs. Ablation studies confirm the importance of both the quantum embedding module and the adaptive update strategy, while hardware-aware inference using a noisy backend and limited real-device execution supports the feasibility of near-term quantum-assisted temporal graph learning.
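Amplitude encoding and the selective-refresh idea can be illustrated classically. In this minimal sketch `amplitude_encode` performs the standard normalization step (pad to a power-of-two length, then scale to unit L2 norm so the squared amplitudes sum to 1). The `maybe_refresh` rule and its `threshold` parameter are hypothetical, since the abstract does not specify how temporal activity triggers a re-encoding.

```python
import math
from typing import List, Optional


def amplitude_encode(features: List[float]) -> List[float]:
    """Pad to a power-of-two length and L2-normalize the feature vector."""
    size = 1
    while size < len(features):
        size *= 2
    padded = features + [0.0] * (size - len(features))
    norm = math.sqrt(sum(x * x for x in padded))
    if norm == 0.0:
        raise ValueError("cannot encode an all-zero feature vector")
    return [x / norm for x in padded]


def maybe_refresh(
    cached: Optional[List[float]],
    features: List[float],
    activity: float,
    threshold: float,
) -> List[float]:
    """Re-encode only when recent activity exceeds the threshold (assumed rule)."""
    if cached is None or activity > threshold:
        return amplitude_encode(features)
    return cached
```

Skipping the re-encoding for quiet nodes is what would save state-preparation work in such a scheme, at the cost of a slightly stale embedding.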

2605.21915 2026-05-22 cs.CR cs.LG 版本更新

CCLab: Adversarial Testing of Learning- and Non-Learning-Based Congestion Controllers

CCLab: 学习型和非学习型拥塞控制器的对抗测试

Zhi Chen, Shehab Sarar Ahmed, Chenkai Wang, Brighten Godfrey, Gang Wang

AI总结本文提出CCLab框架，用于系统评估学习型和非学习型拥塞控制器在对抗性条件下的鲁棒性，发现学习型控制器在对抗测试中总体上比传统算法更鲁棒，并展示了对抗性轨迹（trace）可用于训练更鲁棒的拥塞控制器。

Comments 13 pages for main paper, 16 pages in total

AI中文摘要

拥塞控制器（CC）对网络性能至关重要，但其在不利条件下的鲁棒性仍未得到充分理解。尽管近期的学习型CC在受控环境中表现出色，但当控制器的输入信号被破坏，或环境条件系统性地变得严苛时，它们与传统CC相比表现如何尚不清楚。本文提出CCLab，一个用于系统评估学习型和非学习型CC鲁棒性的对抗测试框架。CCLab包含一个基于强化学习（RL）的对抗智能体，它与拥塞控制策略构成闭环，对输入信号（特征级）或外部网络条件（环境级）生成有界扰动，同时通过显式约束保持真实性。利用该框架，我们在特征级和环境级对抗条件下比较学习型与非学习型CC。尽管两类CC在对抗测试中都会出现性能下降，但我们发现学习型CC总体上比传统的人工设计算法更鲁棒。最后，我们展示了所生成的对抗性轨迹（trace）可用于训练更鲁棒的CC，其在严苛条件和正常条件下均优于现有的学习型CC。

英文摘要

Congestion controllers (CCs) are critical to network performance, and yet their robustness under adverse conditions remains insufficiently understood. While recent learning-based CCs have demonstrated strong performance in controlled environments, it is unclear how they compare to traditional CCs when controllers' input signals are corrupted or when environmental conditions become systematically challenging. In this paper, we introduce CCLab, an adversarial testing framework for systematically evaluating the robustness of both learning-based and non-learning-based CCs. CCLab includes a reinforcement learning (RL)-based adversarial agent that operates in a closed loop with the congestion control policy, generating bounded perturbations either on input signals (feature-level) or on external network conditions (environment-level), while preserving realism through explicit constraints. Using this framework, we compare learning-based CCs with non-learning-based CCs under both feature-level and environment-level adversarial conditions. While both types of CCs suffer from performance degradation under adversarial testing, we find that learning-based CCs, in general, are more robust than traditional human-designed algorithms. Finally, we show that our adversarial traces can be used to train more robust CCs that outperform existing learning-based CCs under both challenging and normal conditions.

2605.21911 2026-05-22 cs.LG 版本更新

Noise Schedule Design for Diffusion Models: An Optimal Control Perspective

扩散模型的噪声调度设计：一个最优控制视角

Seo Taek Kong, Weina Wang, R. Srikant

发表机构 * ECE & CSL University of Illinois Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校电气与计算机工程系及协同科学实验室）； Computer Science Department Carnegie Mellon University（卡内基梅隆大学计算机科学系）； ECE, CSL & NCSA University of Illinois Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校电气与计算机工程系、协同科学实验室及国家超级计算应用中心）

AI总结本文从最优控制的角度出发，提出了一种分析和设计扩散模型噪声调度的框架，通过将噪声调度问题转化为最优控制问题，推导出噪声调度的充分条件，实现了更优的采样误差，并通过参数调整得到新的噪声调度方案，提升了图像生成的FID分数。

AI中文摘要

我们建立了一个用于分析和设计扩散模型噪声调度的原则性框架。我们证明该设计问题可以重新表述为一个最优控制问题：其状态是扩散过程的Fisher信息，按一个常微分方程（ODE）演化；控制输入则是噪声调度。该最优控制问题的目标是一个关于Fisher信息的泛函，我们证明它是Kullback-Leibler采样误差的上界。通过求解这一最优控制问题，我们得到噪声调度的充分条件，在这些条件下可以达到当前最优的$\tilde{\mathcal{O}}(d/n)$采样误差，其中$d$是数据维度，$n$是离散化步数。尽管已有理论工作也证明了$\tilde{\mathcal{O}}(d/n)$的采样误差界可以达到，但这些结果只对特定的噪声调度成立，并不包括实践中使用的调度。在对数据分布作进一步的参数化假设后，我们证明可以得到噪声调度的闭式表达式。这些噪声调度引入了额外的可调参数，从而推广了指数调度和Sigmoid调度等标准的经验调度。系统地调节这些参数可以得到新的调度，在图像生成基准上取得更优的FID分数。

英文摘要

We develop a principled framework for analyzing and designing noise schedules in diffusion models. We show that one can recast this design problem as an optimal control problem whose state is the Fisher information of the diffusion process, which evolves according to an ODE, and whose control input is the noise schedule. The objective of the optimal control problem is a functional involving the Fisher information, which is shown to be an upper bound on the Kullback-Leibler sampling error. By solving this optimal control problem, we obtain sufficient conditions on noise schedules under which state-of-the-art $\tilde{\mathcal{O}}(d/n)$ sampling error is achievable, where $d$ is the data dimension and $n$ is the number of discretization steps. While existing theoretical work also proves that $\tilde{\mathcal{O}}(d/n)$ sampling error bounds are achievable, these results hold only for specific noise schedules, which do not include the schedules used in practice. Under a further parametric assumption on the data distribution, we show that one can obtain closed-form expressions for the noise schedules. These noise schedules generalize standard empirical schedules such as exponential and sigmoid schedules by allowing additional parameters that can be tuned. Systematically tuning the parameters of these schedules yields new schedules that achieve superior FID scores on image generation benchmarks.
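For readers unfamiliar with the empirical schedules mentioned above, here is a minimal sketch of one common way to write an exponential and a sigmoid noise schedule on normalized time `t` in [0, 1]. These are generic illustrative forms, not the closed-form schedules derived in the paper. The parameter names (`sigma_min`, `sigma_max`, `steepness`, `midpoint`) and their defaults are assumptions chosen for the example.

```python
import math


def exponential_schedule(t: float, sigma_min: float = 0.01, sigma_max: float = 50.0) -> float:
    """Geometric interpolation between sigma_min (t = 0) and sigma_max (t = 1)."""
    return sigma_min * (sigma_max / sigma_min) ** t


def sigmoid_schedule(
    t: float,
    sigma_min: float = 0.01,
    sigma_max: float = 50.0,
    steepness: float = 10.0,
    midpoint: float = 0.5,
) -> float:
    """Logistic ramp from roughly sigma_min to roughly sigma_max.

    steepness and midpoint are the kind of extra tunable knobs a generalized
    schedule might expose; their role here is purely illustrative.
    """
    s = 1.0 / (1.0 + math.exp(-steepness * (t - midpoint)))
    return sigma_min + (sigma_max - sigma_min) * s
```

Sweeping `steepness` and `midpoint` changes where along the trajectory most of the noise is added, which is the kind of degree of freedom a systematic tuning procedure would explore.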

2605.21903 2026-05-22 eess.SY cs.AI cs.LG cs.NE cs.SY 版本更新

Engineering Hybrid Physics-Informed Neural Networks for Next-Generation Electricity Systems: A State-of-the-Art Review

面向下一代电力系统的混合物理信息神经网络工程化：最新进展综述

Joseph Nyangon

发表机构 * Energy Exemplar（能源典范）

AI总结本文综述了用于电力系统的混合物理信息机器学习架构，涵盖物理信息神经网络（PINNs）、深度算子网络（DeepONets）、傅里叶神经算子、极限学习机增强的PINNs、基于图的PINNs（PIGNNs）和区域分解PINNs等方法，介绍了这些方法在场分析、故障检测、数字孪生、代理建模和控制优化中的应用，以及嵌入麦克斯韦方程等第一性原理约束对预测精度、仿真时间和泛化能力的改善。

Comments 59 pages, 6 Figures

DOI: 10.3389/frai.2026.1751785

AI中文摘要

机器学习与特定领域物理知识的结合正在改变电力系统的设计、监测和控制；在这一领域，数据稀缺、可解释性有限以及必须满足物理定律等因素制约着纯数据驱动模型。物理信息机器学习（PIML）将控制方程直接嵌入学习过程，从而克服这些局限，为工业4.0应用提供准确、高效且可扩展的解决方案。本文综述了用于电力系统的混合PIML架构，包括物理信息神经网络（PINNs）、深度算子网络（DeepONets）、傅里叶神经算子、极限学习机增强的PINNs、基于图的PINNs（PIGNNs）和区域分解PINNs。每种方法都通过案例研究加以考察，涵盖场分析、故障检测、数字孪生、代理建模和控制优化。综述表明，嵌入麦克斯韦方程及其他第一性原理约束可显著提高稀疏和含噪数据下的预测精度，使仿真时间相对于有限元方法减少若干数量级，并增强跨运行工况的泛化能力。混合框架在参数敏感性、动态行为和鲁棒性方面始终优于纯数据驱动的基线，同时支持实时数字孪生校准和不确定性量化。尚未解决的挑战包括刚性多尺度问题的训练不稳定、高保真模型的计算成本，以及缺乏标准化基准。研究结果表明，PIML推动了从黑箱数据驱动方法向透明的物理信息策略的范式转变，为韧性和智能电力系统的持续创新奠定了基础。

英文摘要

The integration of machine learning with domain-specific physics is transforming the design, monitoring, and control of electricity systems, where data scarcity, limited interpretability, and the need to enforce physical laws constrain purely data-driven models. Physics-informed machine learning (PIML) addresses these limitations by embedding governing equations directly into the learning process, yielding accurate, efficient, and scalable solutions for Industry 4.0 applications. This article reviews hybrid PIML architectures for electricity systems, including physics-informed neural networks (PINNs), Deep Operator Networks (DeepONets), Fourier Neural Operators, Extreme Learning Machine-enhanced PINNs, graph-based PINNs (PIGNNs), and domain-decomposition PINNs. Each approach is examined through case studies spanning field analysis, fault detection, digital twins, surrogate modeling, and control optimization. The review shows that embedding Maxwell's equations and other first-principles constraints substantially improves predictive accuracy under sparse and noisy data, reduces simulation time by orders of magnitude relative to finite element methods, and enhances generalization across operating regimes. Hybrid frameworks consistently outperform purely data-driven baselines on parameter sensitivity, dynamic behavior, and robustness, while supporting real-time digital-twin calibration and uncertainty quantification. Persistent challenges include training instability for stiff multi-scale problems, computational cost of high-fidelity models, and the absence of standardized benchmarks. The findings demonstrate that PIML enables a paradigm shift from black-box data-driven methods to transparent, physics-informed strategies, positioning the field for sustained innovation in resilient and intelligent electricity systems.

2605.21868 2026-05-22 cs.LG 版本更新

When to Switch, Not Just What: Transition Quality Prediction in Clash Royale

何时切换，而不仅仅是选择：Clash Royale中的切换质量预测

Heeyun Heo, Huy Kang Kim

AI总结该研究探讨了竞技游戏中玩家在连续失利后切换策略的频率与胜率之间的反向关联，提出了一种基于切换质量预测（TQP）的三阶段方法，通过PersonaGate、TimingGate和ScoreFusion来优化策略推荐，并引入SwitchGap作为评估指标，以衡量策略的判别质量。

Comments 11 pages, 2 figures, 4 tables; Accepted at IEEE Conference on Games (CoG) 2026

AI中文摘要

在竞技游戏中，玩家经常在连败后切换策略，但我们对34,619名Clash Royale玩家的926,334场比赛记录的分析揭示了一个反直觉的规律：切换频率与胜率呈负相关，且这种效应在不同玩家和情境之间差异很大。我们将其归因于许多已有推荐系统共有的一个局限：它们按预期质量评估策略，却忽略了切换的行为成本以及个体在切换倾向上的差异。我们把这一隐含前提称为零切换成本假设。为解决这一问题，我们将策略推荐重新表述为转换层面的决策问题，并将其实例化为TQP（Transition Quality Predictor），一个按Who -> When -> What组织的三阶段流程。PersonaGate对那些策略一致性在经验上与更优结果相关的玩家抑制推荐。TimingGate识别切换相比保持现状可能带来净收益的时刻，并使用子类型和状态相匹配的基线来控制胜率的自然恢复。ScoreFusion将可采纳性信号与预测的转换质量（delta WR）相结合，对候选策略排序。我们还提出SwitchGap，一种衡量策略判别质量的评估指标，它不把观测到的玩家选择当作最优的真实标签。这一性质尤为重要，因为切换最频繁的玩家胜率最低。完整流程在推荐率为5.4%时取得了+10.4个百分点的SwitchGap；因失利而切换的玩家尽管是表现最差的群体，却从基于子类型的指导中获益最多。

英文摘要

In competitive games, players frequently switch strategies after losing streaks, yet our analysis of 926,334 match records from 34,619 Clash Royale players reveals a counterintuitive pattern: switching frequency is inversely associated with the win rate, with effects that vary substantially across players and situational contexts. We attribute this to a limitation common in many prior recommendation systems, which evaluate strategies by expected quality while overlooking the behavioral cost of switching and individual differences in switching propensity. We refer to this implicit premise as the Zero Switching Cost Assumption. To address this, we reformulate strategy recommendation as a transition-level decision problem and instantiate it as TQP (Transition Quality Predictor), a three-stage pipeline structured as Who -> When -> What. PersonaGate suppresses recommendations for players whose strategic consistency is empirically associated with superior outcomes. TimingGate identifies moments when switching is likely to yield a net benefit over staying, using a subtype- and state-matched baseline to control for natural win-rate recovery. ScoreFusion ranks candidate strategies by combining an adoptability signal with predicted transition quality (delta WR). We further introduce SwitchGap, an evaluation metric that measures a policy's discriminative quality without treating observed player choices as optimal ground truth. This property is particularly important because the most frequent switchers record the lowest win rates. The full pipeline achieves a SwitchGap of +10.4 percentage points at a recommendation rate of 5.4%, and loss-triggered switchers, despite being the lowest-performing group, benefit the most from subtype-conditioned guidance.
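The Who -> When -> What structure can be read as two gates followed by a ranker. The sketch below shows only that control flow. The function name `recommend_switch`, the boolean persona flag, the zero threshold on expected gain, and the linear `mix` between adoptability and predicted delta WR are all hypothetical simplifications, not the models used in TQP.

```python
from typing import List, Optional, Tuple

# (strategy name, adoptability signal, predicted delta win rate)
Candidate = Tuple[str, float, float]


def recommend_switch(
    consistency_pays_off: bool,
    expected_gain_over_staying: float,
    candidates: List[Candidate],
    mix: float = 0.5,
) -> Optional[str]:
    """Return a strategy to switch to, or None when no switch is recommended."""
    # Who: suppress recommendations for players who do better by staying consistent.
    if consistency_pays_off:
        return None
    # When: only recommend if switching is expected to beat staying.
    if expected_gain_over_staying <= 0.0:
        return None
    # What: rank candidates by a blend of adoptability and predicted quality.
    if not candidates:
        return None
    best = max(candidates, key=lambda c: mix * c[1] + (1.0 - mix) * c[2])
    return best[0]
```

Returning `None` from either gate is what keeps the recommendation rate low: most player-moments never reach the ranking stage.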

2605.21859 2026-05-22 q-bio.PE cs.LG q-bio.QM 版本更新

PhylaFlow: Hybrid Flow Matching in Billera-Holmes-Vogtmann Tree Space for Phylogenetic Inference

PhylaFlow：在Billera-Holmes-Vogtmann树空间中进行混合流匹配用于系统发育推断

Yasha Ektefaie, Leo Cui, Shrey Jain, Marinka Zitnik, Pardis Sabeti

发表机构 * Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard（麻省理工学院和哈佛大学Broad研究所埃里克与温迪·施密特中心）； Department of Biomedical Informatics, Harvard Medical School（哈佛医学院生物医学信息学系）； Centennial High School（Centennial高中）； Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard（麻省理工学院和哈佛大学Broad研究所传染病与微生物组项目）

AI总结该研究提出PhylaFlow模型，通过在Billera-Holmes-Vogtmann树空间中学习后验盆地运输，实现混合流匹配，从而提高系统发育推断的效率和准确性。

Comments 9 pages, 3 figures

AI中文摘要

系统发育树是混合对象：分支长度连续变化，而拓扑结构通过边的收缩和扩展离散变化。Billera-Holmes-Vogtmann（BHV）树空间为这种结构提供了规范的几何：每个完全解析的拓扑对应一个欧几里得卦限（orthant），拓扑变化则表现为穿过共享的低维边界的运动。我们提出PhylaFlow，一种在BHV树空间中学习后验盆地输运的混合流匹配模型。PhylaFlow在从随机起始树到短程后验样本的BHV测地路径上训练，将卦限内部的连续分支长度运动与学习到的边界事件和离散拓扑转换耦合起来。我们从操作层面评估学到的几何：如果流能够到达与后验相关的区域，那么以其终端树初始化或引导的有限预算贝叶斯细化应能更高效地恢复后验支持的拓扑。在DS1-DS8系统发育后验基准上，PhylaFlow相对于经典初始化方法显著降低了初始Tree-KL。经过有限预算的MrBayes细化后，直接使用PhylaFlow在大多数数据集上改善了早期和中期的拓扑恢复轨迹，而由分裂引导的PhylaFlow-MCMC在困难案例上取得了最好的结果。在相同的细化预算下，最佳的PhylaFlow变体在八个数据集中的七个上优于短预热基线，在八个中的五个上优于PhyloGFN。在一项联合的序列条件实验中，序列嵌入能够引导后验分裂的恢复，但精确的后验拓扑恢复仍处于初步阶段。这些结果表明，混合流匹配可以在BHV树空间中学习可操作的输运，并为贝叶斯系统发育推断提供几何感知的提议机制。

英文摘要

Phylogenetic trees are hybrid objects: branch lengths vary continuously, while topologies change discretely through edge contractions and expansions. Billera-Holmes-Vogtmann (BHV) tree space provides a canonical geometry for this structure, representing each resolved topology as a Euclidean orthant and topological changes as motion across shared lower-dimensional boundaries. We introduce PhylaFlow, a hybrid flow-matching model that learns posterior-basin transport in BHV tree space. PhylaFlow is trained on BHV geodesic paths from random starting trees to short-run posterior samples, coupling continuous branch-length motion within orthants with learned boundary events and discrete topology transitions. We evaluate the learned geometry operationally: if the flow reaches posterior-relevant regions, finite-budget Bayesian refinement initialized from, or guided by, its terminal trees should recover posterior-supported topologies more efficiently. Across DS1-DS8 phylogenetic posterior benchmarks, PhylaFlow substantially reduces initial Tree-KL relative to classical initializers. After finite-budget MrBayes refinement, direct PhylaFlow improves early and intermediate topology-recovery trajectories on most datasets, while split-guided PhylaFlow-MCMC obtains the strongest hard-case results. The best PhylaFlow variant outperforms short-warmup on seven of eight datasets and PhyloGFN on five of eight under the same refinement budget. In a joint sequence-conditioned experiment, sequence embeddings steer posterior split recovery, although exact posterior topology recovery remains preliminary. These results show that hybrid flow matching can learn actionable transport in BHV tree space and provide a geometry-aware proposal mechanism for Bayesian phylogenetic inference.

2605.21856 2026-05-22 cs.LG cs.AI 版本更新

The Illusion of Reasoning: Exposing Evasive Data Contamination in LLMs via Zero-CoT Truncation

推理的幻觉：通过零CoT截断揭示LLM中的规避式数据污染

Yifan Lan, Yuanpu Cao, Hanyu Wang, Lu Lin, Jinghui Chen

发表机构 * The Pennsylvania State University（宾夕法尼亚州立大学）

AI总结本文提出零CoT探针（ZCP）方法，通过截断整个推理过程来暴露模型中潜在的捷径映射，以检测LLM中的直接和规避式数据污染，并引入污染置信度（Contamination Confidence）指标来量化污染的可能性和严重程度。

AI中文摘要

大型语言模型（LLM）在广泛的任务上展现出令人印象深刻的推理能力，但数据污染妨碍了对这些能力的客观评估。恶意的模型发布者会使用规避式（即间接的）污染策略，例如改写基准数据，以躲避现有检测方法并人为抬高排行榜成绩，这使问题进一步恶化。现有方法难以可靠地检测这类隐蔽的污染。在本工作中，我们揭示了一个关键现象：模型生成的推理步骤会主动掩盖其底层的记忆。受此启发，我们提出零CoT探针（ZCP），一种新的黑盒检测方法，它刻意截断整个思维链（CoT）过程，以暴露潜在的捷径映射。为了进一步把记忆与模型固有的解题能力区分开，ZCP将模型在原始基准上的零CoT表现与其在同构扰动的参考数据集上的表现进行比较。此外，我们引入污染置信度（Contamination Confidence），一个同时量化污染可能性和严重程度的指标，超越了简单的二元分类。在已被认定受污染的模型和专门微调出的受污染模型上的大量实验表明，ZCP能够稳健地检测直接和规避式的数据污染。ZCP的代码见https://github.com/Yifan-Lan/zero-cot-probe。

英文摘要

Large language models (LLMs) have demonstrated impressive reasoning abilities across a wide range of tasks, but data contamination undermines the objective evaluation of these capabilities. This problem is further exacerbated by malicious model publishers who use evasive, or indirect, contamination strategies, such as paraphrasing benchmark data to evade existing detection methods and artificially boost leaderboard performance. Current approaches struggle to reliably detect such stealthy contamination. In this work, we uncover a critical phenomenon: a model's generated reasoning steps actively mask its underlying memorization. Inspired by this, we propose the Zero-CoT Probe (ZCP), a novel black-box detection method that deliberately truncates the entire Chain-of-Thought (CoT) process to expose latent shortcut mappings. To further isolate memorization from the model's intrinsic problem-solving capabilities, ZCP compares the model's zero-CoT performance on the original benchmark against an isomorphically perturbed reference dataset. Furthermore, we introduce Contamination Confidence, a metric that quantifies both the likelihood and severity of contamination, moving beyond simple binary classifications. Extensive experiments on both previously identified contaminated models and specially fine-tuned contaminated models demonstrate that ZCP robustly detects both direct and evasive data contamination. The code for ZCP is accessible at https://github.com/Yifan-Lan/zero-cot-probe.
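The comparison at the heart of the probe, zero-CoT accuracy on the original benchmark versus a perturbed reference set, reduces to a simple accuracy gap. The sketch below computes only that gap. `zero_cot_gap` is a hypothetical helper, and it does not reproduce the paper's Contamination Confidence metric, whose formula is not given in the abstract.

```python
from typing import List


def accuracy(correct: List[bool]) -> float:
    """Fraction of items answered correctly."""
    if not correct:
        raise ValueError("need at least one graded item")
    return sum(correct) / len(correct)


def zero_cot_gap(original_correct: List[bool], perturbed_correct: List[bool]) -> float:
    """Accuracy on original items minus accuracy on perturbed items.

    Both lists are assumed to hold per-item correctness of answers produced
    with the reasoning trace truncated away (zero-CoT).
    """
    return accuracy(original_correct) - accuracy(perturbed_correct)
```

A large positive gap suggests that the model answers the original items from memory rather than by solving them, since the perturbed items should be equally hard for a model that actually reasons.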

2605.21849 2026-05-22 cs.LG cs.CL 版本更新

Geometry-Adaptive Explainer for Faithful Dictionary-Based Interpretability under Distribution Shift

几何自适应解释器：分布偏移下忠实的基于字典的可解释性

Sungjun Lim, Heedong Kim, Andrew Lee, Kyungwoo Song

发表机构 * Yonsei University（延世大学）； Harvard University（哈佛大学）

AI总结本文提出几何自适应解释器（GAE），用于提高基于字典的可解释性方法在分布偏移下的忠实性。GAE在保持原有特征结构的同时，将解释器的字典与分布外活跃子空间重新对齐，只需未标注的分布外激活且无需梯度更新，即可缩小分布偏移下的忠实性差距。

AI中文摘要

机制可解释性旨在通过识别具有因果作用的内部结构来解释模型的行为。稀疏自编码器和转码器等基于字典的解释器是主要工具，但它们在分布外（OOD）偏移下的忠实性很少得到系统性的关注。我们证明，分布偏移会旋转模型实际使用的子空间，使在分布内（ID）激活上训练的解释器字典出现错位。我们将这种错位形式化为忠实性差距，即ID字典与OOD活跃子空间之间的几何距离，并证明它决定了OOD忠实性的退化程度。为缩小这一差距，我们提出几何自适应解释器（GAE），它在保持原有特征结构的同时，将解释器的字典与OOD活跃子空间重新对齐。这只需要未标注的OOD激活，不需要梯度更新。我们证明GAE优于未经适应的ID解释器，其超额损失以二阶矩偏移的二次函数为界。实验上，在多个模型和OOD设置中，GAE在因果忠实性上甚至达到或超过了所有基于训练的基线。

英文摘要

Mechanistic interpretability aims to explain a model's behavior by identifying causally responsible internal structures. Dictionary-based explainers such as sparse autoencoders and transcoders are a primary tool, but their faithfulness under out-of-distribution (OOD) shift has received little systematic attention. We show that distribution shift rotates the subspace that the model actively uses, misaligning the explainer's dictionary trained on in-distribution (ID) activations. We formalize this misalignment as the faithfulness gap, a geometric distance between the ID dictionary and the OOD-active subspace, and show that it controls OOD faithfulness degradation. To reduce this gap, we propose the Geometry-Adaptive Explainer (GAE), which realigns the explainer's dictionary with the OOD-active subspace while preserving the original feature structure. This requires only unlabeled OOD activations and no gradient updates. We prove that GAE improves over the unadapted ID explainer, with excess loss bounded quadratically by the second-moment shift. Empirically, GAE even matches or surpasses all training-based baselines in causal faithfulness across multiple models and OOD settings.

2605.21846 2026-05-22 stat.ME cs.LG stat.ML 版本更新

Causal Discovery in Structural VAR Models Under Equal Noise Variance

在等噪声方差假设下结构VAR模型中的因果发现

SeyedSina Seyedi HasanAbadi, Fahimeh Arab, Erfan Nozari, AmirEmad Ghassami

发表机构 * Bourns College of Engineering, University of California, Riverside（加州大学河滨分校工程学院）； University of California, San Francisco（加州大学旧金山分校）； Department of Mathematics and Statistics, Boston University（波士顿大学数学与统计学系）

AI总结本文研究了在等噪声方差假设下线性高斯结构VAR模型中的因果发现问题，提出了一种基于稀疏性的方法ENVAR，用于在观测等价类中寻找稀疏的结构代表，并在合成数据和fMRI数据集上进行了评估。

AI中文摘要

当因果效应既可能跨时间发生、又可能在同一采样间隔内发生时，从多变量时间序列中进行因果发现十分困难。这一问题在神经科学等应用中尤为重要：采样率相对于底层动态可能较粗，且同期效应不一定构成无环图。我们研究等噪声方差假设下线性高斯结构VAR模型中的因果发现，该假设指各结构噪声项具有共同的方差。与基于DAG的截面数据等噪声方差设置不同，这里考虑的时间序列设置通常无法点识别出唯一的因果图；相反，多种结构VAR参数化可以诱导出相同的平稳观测过程分布。我们针对这一设置引入了观测等价的概念，并证明相应的等价类由结构方程的正交变换加上一个全局正尺度因子刻画。这一刻画引出了一种考虑等价性的模型差异度量，即观测对齐差异，它在模去保持观测分布的变换的意义下比较结构模型。在此理论基础上，我们提出ENVAR，一种基于稀疏性的方法，在诱导出的观测等价类中搜索稀疏的归一化结构代表。我们在合成的结构VAR数据和一个fMRI数据集上评估了所提出的方法。

英文摘要

Causal discovery from multivariate time series is challenging when causal effects may occur both across time and within the same sampling interval. This issue is especially important in applications such as neuroscience, where the sampling rate may be coarse relative to the underlying dynamics and contemporaneous effects need not form an acyclic graph. We study causal discovery in linear Gaussian structural VAR models under an equal noise variance assumption, meaning that the structural noise terms have a common variance. Unlike the DAG-based cross-sectional equal noise variance setting, the time-series setting considered here does not generally yield point identification of a unique causal graph. Instead, multiple structural VAR parameterizations can induce the same stationary observed process law. We introduce a notion of observational equivalence tailored to this setting and show that the corresponding equivalence class is characterized by orthogonal transformations of the structural equations together with a global positive scale. This characterization leads to an equivalence-aware model discrepancy, the observational alignment discrepancy, which compares structural models modulo transformations that preserve the observed law. Building on this theory, we propose ENVAR, a sparsity-based procedure that searches over the induced observational equivalence class for a sparse normalized structural representative. We evaluate the proposed methodology on synthetic structural VAR data and on an fMRI dataset.
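A short calculation shows why orthogonal transformations and a global positive scale leave the observed law unchanged. The notation is assumed for illustration and is not taken from the paper: a lag-one model with invertible contemporaneous matrix $A_0$, lag matrix $A_1$, and common noise standard deviation $\sigma$. The calculation covers only the "these transformations preserve the law" direction, not the converse that the abstract also claims.

```latex
% Assumed notation: lag order 1, A_0 invertible, equal noise variance sigma^2.
\begin{align*}
A_0 x_t &= A_1 x_{t-1} + \varepsilon_t, & \varepsilon_t &\sim \mathcal{N}(0, \sigma^2 I), \\
x_t &= A_0^{-1} A_1\, x_{t-1} + u_t, & \operatorname{Cov}(u_t) &= \sigma^2 A_0^{-1} A_0^{-\top}.
\end{align*}
% For orthogonal Q and scale c > 0, set A_0' = c Q A_0, A_1' = c Q A_1, sigma' = c sigma.
\begin{align*}
(A_0')^{-1} A_1' &= c^{-1} A_0^{-1} Q^{\top} \, c\, Q A_1 = A_0^{-1} A_1, \\
(\sigma')^2 (A_0')^{-1} (A_0')^{-\top}
  &= c^2 \sigma^2 \, c^{-2} A_0^{-1} Q^{\top} Q A_0^{-\top}
   = \sigma^2 A_0^{-1} A_0^{-\top}.
\end{align*}
```

Both the reduced-form coefficient and the innovation covariance are unchanged, so the two parameterizations generate the same stationary Gaussian process. This is why a further criterion such as sparsity is needed to single out a representative.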

2605.21842 2026-05-22 cs.LG cs.CL eess.SP 版本更新

Energy-Gated Attention: Spectral Salience as an Inductive Bias for Transformer Attention

能量门控注意力：频谱显著性作为Transformer注意力的归纳偏置

Athanasios Zeris

发表机构 * Independent Researcher, Athens, Greece（雅典，希腊独立研究者）

AI总结本文提出能量门控注意力（EGA），通过频谱显著性作为归纳偏置来改进Transformer注意力机制，通过在键嵌入的频谱能量上进行门控，提高了信息密集位置的注意力权重，实验结果显示在多个数据集上均取得显著效果。

Comments 12 pages, 4 figures

AI中文摘要

标准的Transformer注意力计算查询和键之间的成对相似性，将所有标记视为具有同等显著性，无论其内在信息含量如何。在湍流流体力学中，相干结构——在背景混沌中持续存在的能量主导、空间组织化的模式——承载了总能量的不成比例份额，并控制所有传输。我们提出，标记在Transformer注意力中扮演类似的角色：信息密集的位置（形态边界、语法头、话语标记）集中了频谱能量，并应比背景标记（功能词、重复模式、低信息填充词）获得更多的注意力。我们提出能量门控注意力（EGA）：一种简单的修改，通过键标记嵌入的频谱能量来门控值聚合，该计算通过一个单个学习的线性投影完成，以发现嵌入场的主导频谱模式。在TinyShakespeare上，EGA仅使用12,480个额外参数（<0.26%的开销）和没有可测量的计算成本，就实现了+0.103的验证损失改进。结果在Penn Treebank上也一致（+0.101），证明了数据集的独立性。在三种小波家族（固定Morlet、Daubechies db2/db4和参数化Morlet）的系统消融研究中，发现固定结构基底是次优的——最优的能量方向是数据自适应的且非正弦的——同时识别出学习的小波包作为有前途的开放方向。学习的能量阈值收敛到tau ~ 0.35，无论初始化如何，对应于英语文本中携带高于平均频谱能量的约36%的标记比例，这是一个稳定的语言属性，与英语文本中内容词的比例一致。

英文摘要

Standard transformer attention computes pairwise similarity between queries and keys, treating all tokens as equally salient regardless of their intrinsic informational content. In turbulent fluid dynamics, coherent structures -- the energetically dominant, spatially organized patterns that persist amid background chaos -- carry a disproportionate fraction of total energy and govern all transport. We propose that tokens play an analogous role in transformer attention: informationally dense positions (morphological boundaries, syntactic heads, discourse markers) concentrate spectral energy and should attract proportionally more attention than background tokens (function words, repeated patterns, low-information filler). We propose Energy-Gated Attention (EGA): a simple modification that gates value aggregation by the spectral energy of key token embeddings, computed by a single learned linear projection that discovers the dominant spectral mode of the embedding field. On TinyShakespeare, EGA achieves +0.103 validation loss improvement with only 12,480 additional parameters (<0.26% overhead) and no measurable computational cost. The result is consistent on Penn Treebank (+0.101), demonstrating dataset independence. A systematic ablation across three wavelet families (fixed Morlet, Daubechies db2/db4, and a parametric Morlet) establishes that fixed structured bases are suboptimal -- the optimal energy direction is data-adaptive and non-sinusoidal -- while identifying learned wavelet packets as a promising open direction. The learned energy threshold converges to tau ~= 0.35 independently of initialization, corresponding to the fraction (~36%) of tokens carrying above-average spectral energy in English text, a stable linguistic property consistent with the fraction of content words in running English text.
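One way to picture gating value aggregation by key energy is the single-query sketch below. It is a guess at the general shape, not the EGA layer from the paper. The energy is taken to be the squared projection of each key onto a learned direction `energy_proj`, the gate is a sigmoid around the threshold `tau`, and the gate multiplies the softmax attention weights. All three choices, and the `temperature` parameter, are assumptions.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def energy_gated_attention(
    query: Vector,
    keys: List[Vector],
    values: List[Vector],
    energy_proj: Vector,
    tau: float,
    temperature: float = 1.0,
) -> Vector:
    """Single-query attention whose value aggregation is gated per key (assumed form)."""
    scale = math.sqrt(len(query))
    logits = [dot(query, k) / scale for k in keys]
    peak = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    attn = [e / total for e in exps]
    # Energy of each key: squared projection onto one learned direction.
    energies = [dot(energy_proj, k) ** 2 for k in keys]
    gates = [1.0 / (1.0 + math.exp(-(e - tau) / temperature)) for e in energies]
    dim = len(values[0])
    return [sum(a * g * v[d] for a, g, v in zip(attn, gates, values)) for d in range(dim)]
```

With a single projection vector per head, the extra parameter count stays small, which is in line with the low overhead the abstract reports.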

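2605.21834 2026-05-22 cs.LG (version update)

On-Policy Consistency Training Improves LLM Safety with Minimal Capability Degradation

Andy Han, Kristina Fujimoto, Avidan Shah, Kiet Nguyen, Kai Xu, Chen Yueh-Han, Ilia Sucholutsky, Rico Angell

Affiliations: New York University

AI summary: This paper proposes On-Policy Consistency Training (OPCT), which improves LLM safety by training on the model's own responses, supervised by its behavior on contrastive prompts. Experiments show that OPCT outperforms conventional supervised fine-tuning (SFT) at reducing sycophancy, resisting jailbreaks, and strengthening safety awareness, while largely avoiding the capability regressions that SFT causes.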

Abstract

Aligned models can misbehave in several ways: they are often sycophantic, fall victim to jailbreaks, or fail to include appropriate safety warnings. Consistency training is a promising new alignment paradigm to mitigate such failures by training invariants into the model using contrastive input pairs. Existing consistency training procedures generate the supervision signal once, offline, and use supervised fine-tuning (SFT) to update the model. Unfortunately, the resulting models tend to merely memorize the surface forms of the training distribution and thus generalize poorly and regress in their capabilities. We introduce On-Policy Consistency Training (OPCT), a new consistency training approach where the objective is computed over the model's own responses to prompts, supervised by itself conditioned on corresponding contrastive prompts. We evaluate OPCT on three safety axes: sycophancy, jailbreaking, and safety awareness. Across three model families, OPCT outperforms its SFT counterpart on all safety desiderata. It nearly halves the sycophancy rate relative to baseline (8.1% vs. 15.4%, compared to 11.2% for SFT). Under an adaptive per-target attacker, OPCT holds jailbreak defense success near 99% on held-out jailbreak behaviors, whereas SFT achieves 87% on average. On safety awareness, OPCT outperforms SFT in two out of three models, and matches it on the other. OPCT also largely avoids the capability regressions that SFT induces, such as a 28-point drop on MATH-500. Our results suggest that consistency training is best implemented as OPCT rather than as SFT, especially when generalization beyond the training distribution is desired.

2605.21820 2026-05-22 cs.LG cond-mat.mtrl-sci (version update)

Beyond Scalar Objectives: Expert-Feedback-Driven Autonomous Experimentation for Scientific Discovery at the Nanoscale

Ralph Bulanadi, Jefferey Baxter, Arpan Biswas, Hiroshi Funakubo, Dennis Meier, Jan Schultheiß, Rama Vasudevan, Yongtao Liu

Affiliations: Center for Nanophase Materials Sciences, Oak Ridge National Laboratory; University of Tennessee-Oak Ridge Innovation Institute, University of Tennessee; Department of Material Science and Engineering, School of Materials and Chemical Technology, Institute of Science Tokyo; Department of Materials Science and Engineering, Norwegian University of Science and Technology (NTNU); Faculty of Physics and Center for Nanointegration Duisburg-Essen (CENIDE), University of Duisburg-Essen; Research Center Future Energy Materials and Systems, Research Alliance Ruhr

AI summary: This paper proposes deep-kernel pairwise learning (DKPL), which integrates expert knowledge and interdisciplinary scientific knowledge into autonomous microscopy experiments so that scientific phenomena at the nanoscale can be discovered more effectively.

Abstract

Self-driving laboratories or autonomous experimentation are emerging as transformative platforms for accelerating scientific discovery. Bayesian optimization (BO) is among the most widely used machine learning frameworks for these purposes, but these BO-based frameworks rely on predefined scalar descriptors to guide experimentation. In many situations, the determination of an appropriate scalar descriptor can be challenging, and may fail to capture subtle yet scientifically important phenomena apparent to experts with interdisciplinary insight. To overcome this limitation, here we develop deep-kernel pairwise learning (DKPL), an approach for autonomous microscopy experiments which incorporates human expertise and interdisciplinary scientific knowledge into an active learning loop. Instead of relying on explicit scalar objectives, DKPL enables experts to directly evaluate which experimental output is more promising using interdisciplinary knowledge. DKPL then learns a latent utility function from these expert judgements to guide subsequent autonomous microscopy experiments. We demonstrate DKPL's performance in learning physically meaningful nanoscale structures while effectively prioritizing high-information measurement regions using an experimental model dataset with known ground truth. We further apply DKPL to analyze the character of ferroelectric domain walls, where we find DKPL capable of distinguishing between high and low characteristic domain-wall angles in bismuth ferrite, and able to discover both head-to-head and tail-to-tail domain-wall character in erbium manganite. This development establishes an approach to integrate expert knowledge into autonomous microscopy experiments and demonstrates a pathway toward expert-guided self-driving laboratories capable of addressing scientific problems beyond the limits of scalar-metrics-driven learning.
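Learning a latent utility from pairwise expert judgements can be illustrated with a much simpler stand-in. The sketch below swaps the paper's deep-kernel model for a plain Bradley-Terry model with one scalar utility per candidate, fitted by gradient ascent. The function `fit_utilities` and the example judgements are hypothetical and serve only to show the shape of the idea.

```python
import math

def fit_utilities(n_items, comparisons, steps=500, lr=0.1):
    """Fit one latent utility per item from (winner, loser) index pairs.

    Bradley-Terry likelihood: P(i preferred over j) = sigmoid(u[i] - u[j]).
    """
    u = [0.0] * n_items
    for _ in range(steps):
        grad = [0.0] * n_items
        for winner, loser in comparisons:
            p = 1.0 / (1.0 + math.exp(-(u[winner] - u[loser])))
            grad[winner] += 1.0 - p
            grad[loser] -= 1.0 - p
        u = [ui + lr * gi for ui, gi in zip(u, grad)]
    return u

# Hypothetical expert judgements over four candidate measurements:
# each pair reads (preferred output, less promising output).
judgements = [(2, 0), (2, 1), (3, 2), (3, 0), (1, 0), (2, 1)]
utilities = fit_utilities(4, judgements)

# An active-learning loop would measure next where the utility is highest.
next_measurement = max(range(4), key=lambda i: utilities[i])
```

In an actual active-learning loop the utility would be a function of the measurement features rather than a per-item table, so that it generalizes to locations the expert has not yet compared.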

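2605.21805 2026-05-22 stat.CO cs.LG stat.ML (version update)

Why Semantic Entropy Fails: Geometry-Aware and Calibrated Uncertainty for Policy Optimization

Zheyuan Zhang, Kaiwen Shi, Han Bao, Zehong Wang, Tianyi Ma, Yanfang Ye

Affiliations: University of Notre Dame

AI summary: This paper proposes GCPO, a new policy optimization framework that uses geometry-aware measures to capture semantic disagreement and reward-based calibration to align uncertainty with learning signal strength, so that it tracks gradient variability more faithfully and improves post-training performance.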

Abstract

Post-training has become central to improving reasoning and alignment in large language models, where critic-free models enable scalable learning from model-generated outputs but lack principled mechanisms to distinguish informative from noisy signals. Recent approaches leverage response-level measures as uncertainty signals to regulate group-based optimization methods such as GRPO. Yet their empirical success remains unstable, and it is unclear how they influence optimization dynamics. In this paper, we provide, to our knowledge, the first principled formulation that interprets uncertainty signals as mechanisms for characterizing and regulating gradient variance and learning signal quality. Based on both empirical and theoretical analysis, we identify two critical gaps of current entropy-based estimators: the anisotropic gap and the calibration gap. Motivated by this analysis, we propose Geometric-aware Calibrated Policy Optimization (GCPO), a novel framework integrating geometry-aware measures to capture semantic disagreement with reward-based calibration to align uncertainty with learning signal strength. Experiments on multiple benchmarks show that our approach more faithfully tracks gradient variability and consistently improves post-training performance. Our results highlight the importance of designing uncertainty signals that are aligned with optimization dynamics, offering a principled perspective for robust post-training.

2605.21800 2026-05-22 cs.LG cs.RO (version update)

stable-worldmodel: A Platform for Reproducible World Modeling Research and Evaluation

Lucas Maes, Quentin Le Lidec, Luiz Facury, Nassim Massaudi, Ayush Chaurasia, Francesco Capuano, Richard Gao, Taj Gillin, Dan Haramati, Damien Scieur, Yann LeCun, Randall Balestriero

Affiliations: Mila & Université de Montréal; New York University; Universidade Federal de Minas Gerais; Independent Researcher; LanceDB; University of Oxford; Brown University

AI summary: This paper presents the stable-worldmodel platform, which addresses the fragmentation of codebases, data pipelines, and evaluation protocols in world modeling research. It provides a high-performance data layer, implementations of modern world model baselines and planning solvers, and an extended suite of environments and tasks for standardized, reproducible research and evaluation.

Abstract

World models are central to building agents that can reason, plan, and generalize beyond their training data. However, research on world models is currently fragmented, with disparate codebases, data pipelines, and evaluation protocols hindering reproducibility and fair comparison. Current practice is further limited by three key bottlenecks: fragile one-off codebases, slow video data loading, and the lack of standardized generalization benchmarks. We present stable-worldmodel (swm), an open-source platform for standardized and reproducible world modeling research and evaluation. It delivers (1) a high-performance Lance-based data layer with native support and conversion tools for MP4, HDF5, and LeRobot datasets, (2) clean, well-tested implementations of modern world model baselines and planning solvers, and (3) a broad suite of environments and tasks extended with controllable visual, geometric, and physical factors of variation for systematic in-silico evaluation of dynamics understanding, control performance, representation quality, and out-of-distribution generalization. By unifying the full pipeline under a single, scalable framework, swm dramatically reduces research overhead and accelerates trustworthy progress toward reliable world models.

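2605.21798 2026-05-22 cs.LG stat.ML (version update)

Three Costs of Amortizing Gaussian Process Inference with Neural Processes

Robin Young

Affiliations: University of Cambridge, Cambridge, UK

AI summary: This paper studies the costs of amortizing Gaussian process inference with neural processes, which replace the exact O(n^3) GP posterior with a learned O(n) map. It analyzes three sources of error (label contamination, an information bottleneck, and amortization error) and derives architectural recommendations.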

Comments To appear at ProbNum 2026

Abstract

Neural processes amortize Gaussian process inference, replacing the exact $O(n^3)$ posterior with a learned $O(n)$ map from context sets to predictive distributions. For a class of latent neural processes, we bound the Kullback--Leibler (KL) divergence between the GP and LNP predictives, decomposing it into three interpretable sources, namely label contamination as the neural process uses label values to estimate a quantity that is label-independent in the exact GP, an information bottleneck because the finite-dimensional representation cannot resolve the full context geometry, and amortization error from a single encoder network shared across all contexts. The bottleneck truncation term decays in the representation dimension $d$ as $O(e^{-cd^{2/d_x}})$ for squared-exponential kernels on $\mathbb{R}^{d_x}$ where $c > 0$ is a kernel-dependent constant and as $O(d^{-2\nu/d_x})$ for Matérn-$\nu$ kernels, directly linking architecture sizing to kernel smoothness and input dimension. The label contamination term is $O(1)$ in general, with only the observation-noise component decaying as $O(1/n)$, identifying a persistent cost of routing uncertainty estimation through a label-dependent representation. These results characterize the costs of amortization within the analyzed class and yield architectural recommendations to predict variance from context locations alone in the GP-amortization regime, and replace mean aggregation with second-order pooling to close the dominant amortization gap.
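The label-contamination argument is easier to follow next to the standard exact GP posterior with fixed kernel hyperparameters. The formulas below use conventional textbook notation, which is an assumption of this note rather than the paper's own: $K$ is the kernel matrix on the $n$ context locations, $k_*$ the vector of kernel values between the test point $x_*$ and the context locations, $\sigma^2$ the observation-noise variance, and $y$ the context labels.

```latex
\begin{aligned}
\mu(x_*) &= k_*^\top \left(K + \sigma^2 I\right)^{-1} y, \\
\operatorname{Var}(x_*) &= k(x_*, x_*) - k_*^\top \left(K + \sigma^2 I\right)^{-1} k_* .
\end{aligned}
```

Only the mean depends on the labels $y$; the variance is a function of the context locations alone, which is the label-independent quantity the abstract refers to. Solving against the $n \times n$ matrix $K + \sigma^2 I$ is what gives the exact posterior its $O(n^3)$ cost.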

2605.21792 2026-05-22 cs.CL cs.AI cs.DB cs.LG (version update)

Residual Skill Optimization for Text-to-SQL Ensembles

Jiongli Zhu, Haoquan Guan, Parjanya Prajakta Prashant, Nikki Lijing Kuang, Seyedeh Baharan Khatami, Canwen Xu, Xiaodong Yu, Yingyu Lin, Zhewei Yao, Yuxiong He, Babak Salimi

Affiliations: University of California, San Diego; Snowflake AI Research

AI summary: This paper proposes DivSkill-SQL, a residual skill optimization framework that builds complementary Text-to-SQL ensembles by optimizing each new skill on the examples the current skill ensemble fails on, which raises Pass@K. It yields sizable accuracy gains on Spider2-Lite and consistent improvements across dialects and tasks.

Abstract

Text-to-SQL ensembles improve over single-candidate generation by drawing multiple SQL candidates and selecting one, but their effectiveness is bounded by Pass@K, the probability that at least one of K candidates is correct. Existing methods source diversity heuristically through stochastic decoding or prompt variants, leaving candidate sets dominated by correlated failures. We present DivSkill-SQL, a residual skill optimization framework that builds complementary agentic Text-to-SQL ensembles without model fine-tuning: each new skill is optimized on examples the current skill ensemble fails on, provably targeting its marginal contribution to Pass@K. On Spider2-Lite, DivSkill-SQL improves selected accuracy by up to +11.1 points on Snowflake and +8.3 on BigQuery over the strongest ensemble baseline, with consistent gains across two base models (Opus-4.6 and GPT-5.4). Skills optimized on a single dialect transfer without retraining across dialects (Snowflake, BigQuery, SQLite) and to a different task formulation, such as BIRD-Critic (+2.6 pts). Error diagnostics show up to 3x fewer hallucinated schema references and function calls, indicating that gains come from genuinely reliable complementary skills rather than surface-form variation.
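The link between residual examples and Pass@K can be made concrete with a small sketch. It assumes each skill contributes one candidate per example, so the empirical Pass@K of the ensemble is the share of examples solved by at least one skill. The helper names and the example sets are hypothetical and do not come from the paper.

```python
def pass_at_k(solved_by_skill, n_examples):
    """Fraction of examples solved by at least one skill in the ensemble."""
    covered = set().union(*solved_by_skill)
    return len(covered) / n_examples

def residual_examples(solved_by_skill, n_examples):
    """Examples on which every current skill fails (the next skill's training target)."""
    covered = set().union(*solved_by_skill)
    return [i for i in range(n_examples) if i not in covered]

def marginal_gain(candidate_solved, solved_by_skill, n_examples):
    """Increase in ensemble Pass@K from adding one candidate skill."""
    residual = set(residual_examples(solved_by_skill, n_examples))
    return len(candidate_solved & residual) / n_examples

# Two existing skills over six examples; indices are solved example ids.
ensemble = [{0, 1, 2}, {1, 2, 3}]
correlated_skill = {0, 1, 2, 3}   # solves many examples, all already covered
complementary_skill = {4}         # solves a single example, but a residual one
```

Here the correlated skill adds nothing to Pass@K even though it solves four examples, while the complementary skill raises it by one sixth. That is the sense in which only the residual examples matter for the marginal contribution.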

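2605.21783 2026-05-22 cs.LG stat.ML (version update)

MMD-Balls as Credal Sets: A PAC-Bayesian Framework for Epistemic Uncertainty in Test-Time Adaptation

Ahanaf Hasan Ariq

Affiliations: Ideal School and College

AI summary: This paper proposes a PAC-Bayesian framework for test-time adaptation that interprets MMD-balls as credal sets, giving a natural way to quantify epistemic uncertainty. It establishes MMD-dependent generalization bounds, a finite-sample version, a uniform worst-case risk bound, and geodesic preservation bounds.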

Comments 15 pages, 0 figures. Accepted at the 2nd Workshop on Epistemic Intelligence in Machine Learning (EIML@ICML 2026)

Abstract

Test-time adaptation (TTA) methods improve model performance under distribution shift but lack formal guarantees connecting shift magnitude to prediction reliability. We develop a PAC-Bayesian framework yielding generalization bounds explicitly parameterized by the maximum mean discrepancy (MMD) between source and target distributions. Our principal contribution is interpreting MMD-balls around the source distribution as credal sets in Walley's imprecise probability theory, yielding natural epistemic uncertainty quantification. We establish: (i) a PAC-Bayesian bound with an MMD-dependent shift penalty under an RKHS-Lipschitz loss assumption; (ii) a finite-sample version via MMD concentration; (iii) a uniform worst-case risk bound over all distributions in the credal set, with a lower-upper risk decomposition; and (iv) geodesic preservation bounds explaining why kernel-guided adaptation protects local feature geometry. The credal set interpretation separates epistemic from aleatoric uncertainty and provides a principled decision criterion for when adaptation is warranted.
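For readers unfamiliar with MMD, the following sketch shows how an empirical MMD between two samples can be computed and compared with a ball radius. It uses the biased V-statistic estimator with an RBF kernel; the kernel choice, the bandwidth, and the helper `in_credal_ball` are assumptions made for illustration, not details taken from the paper.

```python
import math

def rbf(x, y, bandwidth=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mmd_squared(source, target, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    def mean_kernel(a, b):
        return sum(rbf(x, y, bandwidth) for x in a for y in b) / (len(a) * len(b))
    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))

def in_credal_ball(source, candidate, radius, bandwidth=1.0):
    """True if the candidate sample's empirical MMD to the source is within the radius."""
    mmd = math.sqrt(max(mmd_squared(source, candidate, bandwidth), 0.0))
    return mmd <= radius

source_sample = [[0.0], [1.0]]
shifted_sample = [[5.0], [6.0]]
```

A sample identical to the source has zero empirical MMD and lies inside any ball, while the strongly shifted sample falls outside a ball of radius 0.5.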

2605.21780 2026-05-22 cs.LG cs.CR (version update)

Provable Robustness against Backdoor Attacks via the Primal-Dual Perspective on Differential Privacy

Aman Saxena, Jan Schuchardt, Yan Scholten, Stephan Günnemann

Affiliations: Department of Computer Science, Technical University of Munich; Munich Data Science Institute; MCML; Machine Learning Research, Morgan Stanley

AI summary: This paper proposes a framework based on the dual view of differential privacy for certifying robustness to adversarial perturbations. By connecting randomized smoothing with privacy profiles, it provides joint robustness guarantees against training-time and inference-time attacks.

Abstract

Randomized smoothing is a powerful tool for certifying robustness to adversarial perturbations, including poisoning attacks via randomized training and evasion attacks via randomized inference. Extending these guarantees to backdoor attacks, where training and test data are jointly perturbed, remains challenging because training- and test-time randomized mechanisms must be analyzed within a single robustness certificate. We address this by connecting randomized smoothing to the dual view of differential privacy through privacy profiles, which provide a numerical procedure for composing heterogeneous mechanisms. The resulting framework enables tight, modular, end-to-end certification of complex, composed mechanisms while leveraging existing analyses of differentially private mechanisms. We instantiate the framework for DP-SGD and Deep Partition Aggregation with inference-time smoothing, deriving joint robustness guarantees against both training-time and inference-time attacks. Experiments on MNIST and CIFAR-10 demonstrate the effectiveness of our framework. Overall, we provide a principled and general framework for using composite mechanisms to certify robustness under complex threat models that better capture the capabilities of real-world adversaries.

2605.21773 2026-05-22 cs.CR cs.LG (version update)

HIDBench: Benchmarking Large Language Models for Host-Based Intrusion Detection

Danyu Sun, Jinghuai Zhang, Yuan Tian, Zhou Li

Affiliations: University of California, Irvine; University of California, Los Angeles

AI summary: This paper presents the HIDBench benchmark for evaluating how well large language models support host-based intrusion detection systems (HIDS), and reveals performance gaps across datasets and sensitivity to the complexity of system log data.

Abstract

Recent benchmark efforts have advanced the evaluation of large language models (LLMs) in cybersecurity, including tasks such as penetration testing and vulnerability identification. However, a critical cybersecurity task, namely intrusion detection from system logs, remains unexplored. In this work, we present a new benchmark to assess LLMs' capabilities in supporting host-based intrusion detection systems (HIDS). This task requires fine-grained reasoning over large-scale, noisy, and highly imbalanced system logs, where complex interactions between benign and malicious activities make reliable detection challenging. Our benchmark unifies three public system log datasets, DARPA-E3, DARPA-E5, and NodLink, and introduces a data construction pipeline that transforms raw host telemetry into LLM-compatible inputs, enabling systematic evaluation under realistic intrusion detection settings. Our evaluation of frontier LLMs reveals substantial performance gaps across datasets. While many models achieve high precision (often above 0.8) on simpler datasets, their performance degrades significantly as system logs become noisier and more complex, with MCC frequently dropping below 0.5 and false positive rates increasing sharply. We further analyze model behavior and identify distinct regimes, including conservative detectors with low false positive rates and over-sensitive models that generate excessive alerts. Overall, our results highlight that while LLMs show strong potential for HIDS, their effectiveness is highly sensitive to data complexity, and robust system design is essential for reliable deployment.
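The abstract reports precision, MCC, and false positive rate side by side. The sketch below computes all three from a confusion matrix using their standard definitions. The counts are hypothetical and chosen only to show how a detector on heavily imbalanced logs can reach a precision of 0.8 while its MCC stays below 0.5.

```python
import math

def detection_metrics(tp, fp, tn, fn):
    """Precision, false positive rate, and Matthews correlation coefficient."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "fpr": fpr, "mcc": mcc}

# Hypothetical counts: 200 malicious events among 10,000 log entries,
# of which the detector flags 50 and gets 40 right.
metrics = detection_metrics(tp=40, fp=10, tn=9790, fn=160)
```

Precision looks only at the flagged events, so the 160 missed intrusions do not lower it. MCC uses all four cells of the confusion matrix, which is why it is the more informative figure on imbalanced data.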

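2605.21770 2026-05-22 cs.LG (version update)

Manifold-Guided Attention Steering

Ian Li, Kapilesh Guruprasad, Raunak Sengupta, Ninad Satish, Loris D'Antoni, Rose Yu

Affiliations: University of California, San Diego

AI summary: This paper proposes a manifold-guided attention steering method that monitors the distance between attention heads and a correctness manifold during inference and corrects deviations dynamically, improving LLM performance on mathematical reasoning, code generation, and molecular generation.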

Abstract

Large language models frequently produce errors in reasoning tasks despite possessing the underlying knowledge required for correct reasoning. One possible approach to improve reasoning consistency is through activation steering. However, existing activation steering approaches apply fixed, pre-computed correction vectors, ignoring where the model currently sits along its generation trajectory; the result is indiscriminate perturbation that disrupts already-correct steps as freely as erroneous ones. We propose Manifold-Guided Attention Steering (MAGS), a trajectory-aware inference-time intervention grounded in a geometric observation: the output activations of specific attention heads diverge from a low-dimensional correctness manifold at the point of error, and this deviation compounds through subsequent steps. For each identified attention head, we learn a low-dimensional subspace from contrastive pairs of correct and incorrect traces that capture the directions along which error behavior deviates from correct behavior. During inference, we monitor each head's proximity to this manifold and apply a targeted projection correction when deviation exceeds a learned threshold, steering the attention output back toward the correct subspace before the error propagates. MAGS consistently outperforms both unsteered baselines and static steering approaches across benchmarks spanning mathematical reasoning (MATH-500, GSM8K), code generation (HumanEval, MBPP), and molecular generation (SMILES), suggesting that correctness manifolds are a general feature of LLM attention geometry.
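The monitor-then-project step described above can be sketched as follows. The sketch assumes the correctness manifold for one head is approximated by a linear subspace with a known orthonormal basis, and that the correction is a full orthogonal projection. The paper's actual subspace construction and correction strength may differ, and the names `project` and `steer` are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project(activation, basis):
    """Orthogonal projection onto the span of an orthonormal basis."""
    out = [0.0] * len(activation)
    for b in basis:
        coeff = dot(activation, b)
        out = [o + coeff * bi for o, bi in zip(out, b)]
    return out

def steer(activation, basis, threshold):
    """Leave the activation alone when it is near the subspace, else project it back.

    Returns (activation_after, was_corrected).
    """
    projected = project(activation, basis)
    distance = math.sqrt(sum((a - p) ** 2 for a, p in zip(activation, projected)))
    if distance > threshold:
        return projected, True
    return list(activation), False

# Assumed 2-D "correct" subspace inside a 3-D head output space.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
```

Because the correction fires only when the distance exceeds the threshold, steps that are already close to the subspace pass through untouched, which is the contrast with a fixed steering vector applied at every step.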

2605.21768 2026-05-22 cs.LG cs.MA (version update)

Memory-R2: Fair Credit Assignment for Long-Horizon Memory-Augmented LLM Agents

Sikuan Yan, Ahmed Bahloul, Ercong Nie, Susanna Schwarzmann, Riccardo Trivisonno, Volker Tresp, Yunpu Ma

Affiliations: Ludwig Maximilian University of Munich; Munich Center for Machine Learning; Huawei Heisenberg Research Center (Munich); Technical University of Munich

AI summary: This paper proposes Memory-R2, a framework that combines local and global group-relative optimization to address unfair credit assignment caused by diverging memory states when training long-horizon memory-augmented LLM agents in multi-session environments, while jointly optimizing memory formation and memory evolution.

Abstract

Memory-augmented LLM agents enable interactions that extend beyond finite context windows by storing, updating, and reusing information across sessions. However, training such agents with reinforcement learning in multi-session environments is challenging because memory turns the agent's past actions into part of its future environment. Once different rollouts write, update, or delete different memories, they no longer share the same intermediate memory state, making trajectory-level comparisons fundamentally unfair. This violates a key assumption behind group-relative methods such as GRPO, where rollouts are compared as if they were sampled from the same effective environment. Consequently, trajectory-level rewards provide noisy or biased credit signals for long-horizon memory operations. To address this challenge, we introduce Memory-R2, a training framework for long-horizon memory-augmented LLM agents. Its core algorithm, LoGo-GRPO, combines local and global group-relative optimization. The global objective preserves end-to-end learning from long-horizon trajectory-level rewards, while local rerollouts compare different memory-operation outcomes from the same intermediate memory state, yielding fairer group comparisons and more precise supervision for memory construction. Beyond credit assignment, Memory-R2 jointly optimizes memory formation and memory evolution with a shared-parameter co-learning design, where a fact extractor and a memory manager are instantiated from the same LLM backbone through role-specific prompts. To stabilize multi-step RL over long memory horizons, we adopt a progressive curriculum that increases the training horizon from 8 to 16 to 32 sessions. Together, these components provide an effective training paradigm for memory-augmented LLM agents in long-horizon multi-session settings.
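The difference between global and local group comparisons can be illustrated with the usual GRPO-style standardized advantage. The abstract does not say how LoGo-GRPO mixes the two signals, so the weighted sum in `combined_advantage` and its `local_weight` parameter are assumptions of this sketch, as are the reward values.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: each reward standardized within its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

def combined_advantage(global_adv, local_adv, local_weight=0.5):
    """Hypothetical mix of a trajectory-level and a local rerollout advantage."""
    return (1.0 - local_weight) * global_adv + local_weight * local_adv

# Global group: full multi-session trajectories, whose memory states have diverged.
global_adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# Local group: rerollouts of one memory operation from the SAME intermediate
# memory state, so the rewards are directly comparable.
local_adv = group_relative_advantages([0.2, 0.8, 0.5])

mixed = combined_advantage(global_adv[0], local_adv[1])
```

The local group is the fair comparison: every rollout in it starts from an identical memory state, so a higher reward can be credited to the memory operation itself rather than to differences in what earlier sessions happened to store.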

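2605.21765 2026-05-22 cs.LG (version update)

PEARL: Unbiased Percentile Estimation via Contrastive Learning for Industrial-Scale Livestream Recommendation

Blake Gella, Wei Wu, Yuhao Yin, Zexi Huang, Zikai Wang, Emily Liu, Junlin Zhang, Wentao Guo, Qinglei Wang

Affiliations: TikTok; ByteDance

AI summary: This paper proposes PEARL, a framework that uses contrastive learning to address imbalance in user behavior, modeling relative preference signals to improve the performance and robustness of recommender systems.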

Abstract

Recommender systems trained on user interaction data are susceptible to behavioral intensity imbalance--a systematic distortion arising from heterogeneous engagement patterns across users. This imbalance skews feedback signals such that observed interactions no longer faithfully reflect true preferences, causing models to disproportionately amplify signals from highly active users while underrepresenting others, which ultimately degrades recommendation quality and robustness at scale. To address this issue, we propose a nonparametric contrastive percentile approximation framework, PEARL, that models relative preference signals instead of absolute engagement magnitudes. Building upon relative advantage debiasing, PEARL leverages real contrastive interaction samples to approximate percentile relationships directly, without relying on auxiliary distribution estimation models. We provide theoretical justification demonstrating that such pairwise comparisons yield unbiased estimates of percentile-based preference signals. For broader applicability, we introduce a prediction-based bootstrapping mechanism for percentile smoothing to handle sparse and discrete feedback, alongside a generalized value-weighted formulation and a co-training strategy to enhance both modeling flexibility and representation learning. Extensive offline experiments demonstrate that PEARL effectively mitigates behavioral bias and consistently improves recommendation performance across multiple ranking targets. Deployed in a production livestream platform with a combined user base of billions, online A/B testing confirms substantial real-world gains: +2.10% Watch Duration, +0.80% Consumption Amount, +1.49% Interaction Rate, and -6.91% Report Rate.
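The core intuition, that pairwise comparisons against contrast samples recover a percentile, can be shown with a toy estimator. The sketch assumes the contrast samples are other engagement values from the same user, which is an illustrative choice rather than the paper's sampling scheme. The function name and the watch-time figures are hypothetical.

```python
def pairwise_percentile(value, contrast_values):
    """Estimate the percentile of `value` as the share of contrast samples it exceeds.

    Ties count as half a win.
    """
    if not contrast_values:
        raise ValueError("need at least one contrast sample")
    wins = sum(1.0 if value > c else 0.5 if value == c else 0.0
               for c in contrast_values)
    return wins / len(contrast_values)

# Watch durations in seconds for a highly active user and a light user.
heavy_user_history = [300, 600, 900, 1200]
light_user_history = [10, 20, 30, 40]

heavy_signal = pairwise_percentile(600, heavy_user_history)
light_signal = pairwise_percentile(30, light_user_history)
```

In absolute terms the 600-second watch dwarfs the 30-second one, yet relative to each user's own behavior the light user's watch ranks higher. Modeling the percentile instead of the raw magnitude is what keeps highly active users from dominating the training signal.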

2605.21751 2026-05-22 cs.LG 版本更新

面向广告市场的支持感知离线策略选择

Prashant Shekhar, Caroline Howard

发表机构 * Department of Mathematics（数学系）； Embry-Riddle Aeronautical University（埃姆伯里-瑞德尔航空大学）

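AI summary: This paper proposes a support-aware offline decision framework for reserve-policy selection in advertising markets. It converts logged evidence into a conservative decision object, so that validation decisions are certified rather than based on point-estimate rankings alone.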

AI中文摘要

记录的广告拍卖使离线保留价格评估变得有吸引力但有风险。回放表可以识别具有大显眼收益增益的策略，但它们也可能隐藏弱阈值支持、多重比较效应、子组伤害和投标者响应不确定性。现有的回放和离线策略评估方法估计或排名策略价值，但它们不能直接回答可用证据是否足够强以证明验证的问题。本文开发了一种支持感知的离线决策框架用于保留策略选择。与其输出单一的点估计胜者，该框架将记录证据转化为保守的决策对象，包括认证的策略、统计上被主导的替代方案以及需要进一步验证的未解决候选者。主要理论结果给出了一种统一的有限目录保证，显示在同时控制不确定性和保守支持门控的情况下，该框架保留了最佳通过策略，同时排除了具有认证遗憾的策略。支持性结果描述了支持本地化的回放泛化，建立了信息论阈值解析极限，并量化了异质投标者响应如何推翻本地化回放排名。在iPinYou实时竞价日志上的实验显示，领先的保留规则在第二季实现了47.66%的回放提升，同时实现了40.71%的下限提升，在第三季实现了43.87%的冻结超时回放提升。该框架将19个策略目录减少到两个策略验证短名单，同时在44个广告商、交易所和地区段中认证无害。结果支持核心主张，即离线保留策略评估应产生认证的验证决策，而非仅依赖点估计排名。

英文摘要

Logged advertising auctions make offline reserve-price evaluation attractive but risky. Replay tables can identify policies with large apparent yield gains, yet they can also hide weak threshold support, multiple-comparison effects, subgroup harm, and bidder-response uncertainty. Existing replay and off-policy evaluation methods estimate or rank policy values, but they do not directly answer the operational question of whether the available evidence is strong enough to justify validation. This paper develops a support-aware offline decision framework for reserve-policy selection. Rather than outputting a single point-estimate winner, the framework converts logged evidence into a conservative decision object consisting of certified policies, statistically dominated alternatives, and unresolved candidates requiring further validation. The main theoretical result gives a unified finite-catalog guarantee showing that, under simultaneous uncertainty control and conservative support gates, the framework preserves the best gate-passing policy while eliminating only policies with certified regret. Supporting results characterize support-localized replay generalization, establish information-theoretic threshold-resolution limits, and quantify when heterogeneous bidder response can overturn localized replay rankings. Experiments on iPinYou real-time-bidding logs show that the leading reserve rule achieves a 47.66% replay lift in season two, a 40.71% simultaneous lower-bound lift, and a 43.87% frozen out-of-time replay lift in season three. The framework reduces a 19-policy catalog to a two-policy validation shortlist while certifying non-harm across 44 advertiser, exchange, and region segments. The results support the central claim that offline reserve-policy evaluation should produce certified validation decisions rather than point-estimate rankings alone.
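The decision-object idea can be sketched as follows. This is a simplified illustration under our own assumptions, not the paper's procedure: each policy carries a lower and upper bound on its lift that is assumed to hold simultaneously, plus a support count. Policies failing the support gate are left unresolved, and a gate-passing policy is eliminated only when its upper bound falls below the best lower bound. The names `triage_policies`, `catalog`, and all numbers are hypothetical.

```python
def triage_policies(policies, min_support):
    """Split candidate policies into shortlist, dominated, and unresolved.

    `policies` maps a name to (lower_bound, upper_bound, support_count).
    The bounds are assumed to hold simultaneously for all policies.
    """
    passing = {name: p for name, p in policies.items() if p[2] >= min_support}
    unresolved = sorted(name for name in policies if name not in passing)
    if not passing:
        return {"shortlist": [], "dominated": [], "unresolved": unresolved}

    # Eliminate a policy only if even its upper bound is below the best
    # lower bound among gate-passing policies.
    best_lower = max(lower for lower, _, _ in passing.values())
    dominated = sorted(
        name for name, (_, upper, _) in passing.items() if upper < best_lower
    )
    shortlist = sorted(name for name in passing if name not in dominated)
    return {"shortlist": shortlist, "dominated": dominated, "unresolved": unresolved}


# Hypothetical catalog: (lower bound, upper bound, number of supporting auctions).
catalog = {
    "reserve_low": (0.02, 0.10, 500),
    "reserve_mid": (0.12, 0.20, 400),
    "reserve_alt": (0.08, 0.18, 450),
    "reserve_high": (0.15, 0.45, 12),
}
decision = triage_policies(catalog, min_support=100)
```

If the bounds hold, the truly best gate-passing policy can never be eliminated by this rule, because its upper bound is at least its true value, which is at least the best lower bound.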


2605.21728 2026-05-22 cs.CV cs.CL cs.LG 版本更新

BEiTScore: Reference-free Image Captioning Evaluation with an Efficient Cross-Encoder Model

BEiTScore: 一种基于高效交叉编码器的无参考图像描述评估方法

Gonçalo Gomes, Bruno Martins, Chrysoula Zerva

发表机构 * Instituto Superior Técnico（里斯本大学理工学院）； INESC-ID ； Instituto de Telecomunicações（电信机构）

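AI summary: This paper proposes BEiTScore, a reference-free image captioning evaluation metric built on a lightweight cross-encoder. It targets the high computational cost of LLM judges and the limited fine-grained sensitivity of CLIP-based metrics, and introduces a new benchmark for detailed caption evaluation across diverse scenarios.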

AI中文摘要

图像描述评估仍是一个重大挑战，因为视觉-语言模型朝着生成长形式和上下文丰富的描述等更具挑战性的能力发展。最先进的评估度量标准涉及使用大型语言模型（LLMs）作为评判者的大量计算成本，或者受到标准CLIP基于编码器的限制，例如严格的令牌限制、缺乏细粒度敏感性或缺乏组合泛化能力，因为将描述视为“词袋”。我们提出了一种新的学习度量标准，以解决上述挑战，基于一个轻量级交叉编码器，其初始化来自视觉问答模型检查点，平衡了强大的权重初始化与计算效率。我们的训练方案使用精心编排的数据混合进行监督学习，特征是对抗性的LLM基于数据增强，以增强模型对细粒度视觉-语言错误的敏感性。我们还引入了一个新的基准，用于在多种场景中评估详细的描述评估。实验结果表明，所提出的度量标准在保持大规模基准测试、质量感知解码或奖励指导所需的效率的同时，实现了最先进的性能。

英文摘要

Image captioning evaluation remains a significant challenge, as vision-language models evolve toward more challenging capabilities such as generating long-form and context-rich descriptions. State-of-the-art evaluation metrics involve extensive computational costs associated with the use of Large Language Models (LLMs) as judges, or instead suffer from the limitations of standard CLIP-based encoders, such as strict token limits, lack of fine-grained sensitivity, or lack of compositional generalization by treating captions as "bags-of-words." We propose a new learned metric that tackles the aforementioned challenges, based on a lightweight cross-encoder that is initialized from a visual question-answering model checkpoint, balancing a strong weight initialization with computational efficiency. Our training scheme uses a carefully assembled data mixture for supervised learning, featuring adversarial LLM-based data augmentations to enhance model sensitivity to fine-grained visual-linguistic errors. We also introduce a new benchmark designed to assess detailed captioning evaluation across diverse scenarios. Experimental results demonstrate that the proposed metric achieves state-of-the-art performance while maintaining the efficiency required for large-scale benchmarking, quality-aware decoding, or reward guidance.


2605.21724 2026-05-22 cs.LG cs.AI 版本更新

TBP-mHC: full expressivity for manifold-constrained hyper connections through transportation polytopes

TBP-mHC: 通过运输多面体实现 manifold-constrained 超连接的全表达性

Anton Lyubinin

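AI summary: This paper proposes TBP-mHC, a transportation-polytope parameterization that gives manifold-constrained hyper-connections the full expressivity of the Birkhoff polytope while avoiding the training instability of unconstrained mixing. Language model pre-training experiments show competitive performance with improved stability and scalability.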

AI中文摘要

超连接（HC）通过在多个残差流之间引入可学习的混合来改进残差网络，但无约束的混合导致训练不稳定。Manifold-Constrained Hyper-Connections（mHC）通过Sinkhorn归一化强制近似双随机性，而mHC-lite则通过置换矩阵的凸组合确保精确约束，但以阶乘复杂度为代价。KromHC通过Kronecker积参数化减少此成本，但限制混合矩阵为Birkhoff多面体的结构子流形。我们提出运输Birkhoff多面体（TBP）参数化及其递归变体（RTBP），通过(n-1)^2自由度构造精确的双随机混合矩阵。我们的方法避免了迭代归一化和组合爆炸，同时保持Birkhoff多面体的完整表达性。在语言模型预训练中的实验证明了竞争性性能，同时具有改进的稳定性和可扩展性。

英文摘要

Hyper-Connections (HC) improve residual networks by introducing learnable mixing across multiple residual streams, but unconstrained mixing leads to training instability. Manifold-Constrained Hyper-Connections (mHC) address this by enforcing approximate double stochasticity via Sinkhorn normalization, while mHC-lite ensures exact constraints through convex combinations of permutation matrices at the cost of factorial complexity. KromHC reduces this cost using Kronecker-product parameterizations, but restricts the mixing matrices to a structured submanifold of the Birkhoff polytope. We propose Transportation Birkhoff Polytope (TBP) parameterizations and their Recursive variants (RTBP), which construct exactly doubly stochastic mixing matrices with $(n-1)^2$ degrees of freedom. Our approach avoids iterative normalization and combinatorial explosion while preserving full expressivity of the Birkhoff polytope. Empirical results on language model pre-training demonstrate competitive performance with improved stability and scalability.
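The $(n-1)^2$ figure agrees with a standard dimension count for doubly stochastic matrices. The count is sketched here as background and is not taken from the paper's own derivation. An $n \times n$ matrix $M$ has $n^2$ entries, subject to $n$ row-sum and $n$ column-sum constraints. Summing all row constraints and summing all column constraints both give $\sum_{i,j} M_{ij} = n$, so one constraint is redundant and $2n - 1$ are independent:

```latex
\begin{align*}
\sum_{j=1}^{n} M_{ij} &= 1 \quad (i = 1,\dots,n), \\
\sum_{i=1}^{n} M_{ij} &= 1 \quad (j = 1,\dots,n), \\
\dim \mathcal{B}_n &= n^2 - (2n - 1) = (n-1)^2 .
\end{align*}
```

For example, with $n = 4$ streams (a hypothetical choice) the polytope $\mathcal{B}_4$ has dimension 9, whereas a convex combination over all permutation matrices would involve $4! = 24$ terms.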


2605.21722 2026-05-22 cond-mat.stat-mech cond-mat.mtrl-sci cs.LG 版本更新

MetaDNS: Enhancing Exploration in Discrete Neural Samplers via Well-Tempered Metadynamics

MetaDNS: 通过良好温控元动力学增强离散神经采样器的探索

Xiaochen Du, Juno Nam, Jaemoo Choi, Wei Guo, Sathya Edamadaka, Junyi Sha, Elton Pan, Yongxin Chen, Molei Tao, Rafael Gómez-Bombarelli

发表机构 * Massachusetts Institute of Technology（麻省理工学院）； Georgia Institute of Technology（佐治亚理工学院）

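AI summary: This paper proposes MetaDNS, a general framework that integrates well-tempered metadynamics into discrete diffusion or autoregressive samplers. It addresses mode collapse when sampling multimodal discrete distributions with energy barriers and enables free-energy reconstruction.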

Comments Accepted at ICML 2026

AI中文摘要

强化学习中大语言模型的价值-梯度假说

Arip Asadulaev, Daniil Ognev, Karim Salta, Martin Takac

Affiliations: MBZUAI (Mohamed bin Zayed University of Artificial Intelligence)

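AI summary: This paper offers a value-gradient perspective on why critic-free reinforcement learning methods are effective for post-training large language models. By analyzing actor updates and adaptive differentiation in the attention mechanism, it proposes a decomposition into a value-gradient signal and a reachable reward space.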

2605.21653 2026-05-22 cs.LG cs.AI cs.CL 版本更新

Amplifying, Not Learning: Fine-Tuned AI Text Detectors Amplify a Pretrained Direction

放大而非学习：微调的AI文本检测器放大了预训练的方向

Alexander Smirnov

发表机构 * University College London（伦敦大学学院）

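AI summary: This study argues that fine-tuned AI text detectors amplify a pretrained typicality direction rather than learn an AI-vs-human boundary. It finds that fine-tuning can reduce discrimination in some settings, that the axis inverts on non-native (ESL) writing, and that a closed-form Jacobian predictor is effective across architectures.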

AI中文摘要

AI文本检测器放大了预训练的典型性轴；它们并不构建AI与人类的边界。在没有任何任务监督的原始编码器上，将投影到AI-中心（HC3）的中心可以实现NYT与HC3的AUROC分别为0.806/0.944/0.834，跨三种架构（86-106%的微调辨别上限：在RoBERTa-base上，原始投影超过微调）；在RoBERTa-base上，完全微调在两种流畅正式人口测试中降低了辨别能力。相同的轴在非母语ESL写作中反转（AUROC 0.06-0.20）--这是典型性阅读独有的可验证预测。一个24例冻结探测器与完全微调（0.900 vs 0.895）一致。一个闭合形式雅可比预测器参数化轴操纵干预，R²=1.000通用，提升了ELECTRA-CE部署的TPR从0.000到0.904（FPR=1%），并在三个独立训练的第三方RoBERTa检测器上转移，达到16/16 oracle等价（在OpenAI检测器上57%的NYT-FPR减少）。范围：编码器家族；机制幅度HC3锚定；人口层面共享轴，不同架构中每文本机制有所变化。三种操作上不同的探测器--文本表面caps_rate残差化、几何符号epsilon消融、闭合形式文本对预测器--在三种架构中一致，cos 0.74/0.81/1.00，确认了观察者不变性。在匹配TPR-0.90评估下，已发表的干预动物园（CC、dealign-f2c）在27个单元格中校准等价（|Delta AUROC| <= 0.0081），并且ELECTRA上的LoRA->full-FT偏移差距的97%是校准偏移而非学习表示--这是核心主张的预测确认。

英文摘要

AI text detectors amplify a pretrained typicality axis; they do not construct an AI-vs-human boundary. On raw encoders before any task supervision, projecting onto centroid(AI)-centroid(HC3) achieves NYT-vs-HC3 AUROC 0.806/0.944/0.834 across three architectures (86-106% of the fine-tuned discrimination ceiling: on RoBERTa-base, raw projection exceeds fine-tuning); on RoBERTa-base, full fine-tuning reduces discrimination below raw on both fluent-formal populations tested. The same axis inverts on non-native ESL writing (AUROC 0.06-0.20) -- a falsifiable prediction unique to the typicality reading. A 24-example frozen probe matches full fine-tuning (0.900 vs 0.895). A closed-form Jacobian predictor parameterises axis-manipulating interventions with R^2 = 1.000 universal, lifts ELECTRA-CE deployment TPR from 0.000 to 0.904 at FPR = 1%, and transfers to three independently-trained third-party RoBERTa detectors at 16/16 oracle-equivalence (57% NYT-FPR reduction on the OpenAI detector). Scope: encoder family; mechanism magnitude HC3-anchored; population-level shared axis with per-text mechanisms varying across architectures. Three operationally distinct probes -- text-surface caps_rate residualisation, geometric signed-epsilon ablation, closed-form text-pair predictor -- agree at cos 0.74/0.81/1.00 across three architectures, confirming observer-invariance. Under matched-TPR-0.90 evaluation, the published intervention zoo (CC, dealign-f2c) is calibration-equivalent across 27 cells (|Delta AUROC| <= 0.0081), and >= 97% of the LoRA->full-FT bias gap on ELECTRA is calibration shift, not learned representation -- the central claim's prediction confirmed.
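The centroid-difference projection mentioned in the abstract can be illustrated with a toy sketch. The 2-D "embeddings" below are made up, and the helper names `centroid`, `project`, and `auroc` are ours. The sketch only shows the mechanics of projecting onto centroid(AI) minus centroid(human) and scoring the separation with AUROC, and reproduces none of the reported numbers.

```python
def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def project(vector, direction):
    """Dot product of a vector with the probing direction."""
    return sum(v * d for v, d in zip(vector, direction))


def auroc(positive_scores, negative_scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win.
    """
    pairs = [(p, n) for p in positive_scores for n in negative_scores]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
    return wins / len(pairs)


# Made-up 2-D embeddings standing in for raw encoder outputs.
ai_embeddings = [[2.0, 0.0], [3.0, 1.0], [2.5, 0.5]]
human_embeddings = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.2]]

axis = [a - h for a, h in zip(centroid(ai_embeddings), centroid(human_embeddings))]
ai_scores = [project(e, axis) for e in ai_embeddings]
human_scores = [project(e, axis) for e in human_embeddings]
separation = auroc(ai_scores, human_scores)
```

An AUROC well below 0.5 on a different population, as the abstract reports for ESL writing, would mean that the same axis ranks that population's texts in the opposite order.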


2605.21649 2026-05-22 cs.LG cs.CL 版本更新

EntmaxKV: Support-Aware Decoding for Entmax Attention

EntmaxKV: 基于支持的解码方法用于Entmax注意力

Gonçalo Duarte, Miguel Couceiro, Marcos V. Treviso

发表机构 * Instituto Superior Técnico, Universidade de Lisboa（里斯本大学理工学院）； ELLIS Unit Lisbon（里斯本ELLIS单位）； INESC-ID ； Instituto de Telecomunicações（电信研究所）

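AI summary: This paper proposes EntmaxKV, a support-aware sparse decoding framework that exploits the sparsity of entmax attention before KV pages are loaded. It combines query-aware page scoring, support-aware candidate selection, and sparse entmax attention to reduce dropped probability mass and improve the efficiency of long-context language models.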

AI中文摘要

长上下文解码越来越受到KV缓存内存流量的限制，因为每个生成的标记都需在缓存上进行注意力运算，而缓存大小与上下文长度成线性增长。现有稀疏解码方法通过选择部分标记或页面来减少成本，但这些方法是为softmax注意力设计的，其密集尾部使得任何截断都会丢弃非零的概率质量。相比之下，α-entmax产生精确的零，将稀疏解码从密集尾部近似转变为支持恢复：如果所选候选包含entmax支持，稀疏解码仍保持精确。虽然最近的entmax内核实现了高效的训练，但它们并未解决自回归解码瓶颈，即密集推理仍需在稀疏性确定之前流式传输完整的KV缓存。在本文中，我们引入了EntmaxKV，一种基于entmax的稀疏解码框架，它在KV页面加载前利用稀疏性。EntmaxKV结合了查询感知的页面评分、支持感知的候选选择和稀疏entmax注意力。我们通过分析截断误差中的丢弃概率质量δ，证明输出误差由δ控制，并在恢复entmax支持时消失。我们进一步引入了一种高斯感知的entmax选择器，从轻量级页面统计中估计entmax阈值，使所选预算适应于分数分布。实验证明，EntmaxKV比基于softmax的稀疏解码在相同KV预算下丢弃更少的概率质量，保留更多支持标记，并实现更低的输出误差。在长上下文和语言建模基准上，它接近完整的缓存entmax，但使用KV缓存的少量比例，达到100万上下文长度时，比完整的注意力基线快3.36倍（softmax）和5.43倍（entmax）。代码可在：https://github.com/deep-spin/entmaxkv获取。

英文摘要

Long-context decoding is increasingly limited by KV-cache memory traffic since each generated token attends over a cache whose size grows linearly with context length. Existing sparse decoding methods reduce this cost by selecting subsets of tokens or pages, but are designed for softmax attention, whose dense tails make any truncation discard nonzero probability mass. In contrast, $α$-entmax produces exact zeros, turning sparse decoding from dense-tail approximation into support recovery: if the selected candidates contain the entmax support, sparse decoding remains exact. While recent entmax kernels enable efficient training, they do not address the autoregressive decoding bottleneck, where dense inference still streams the full KV cache before sparsity is known. In this work, we introduce EntmaxKV, an entmax-native sparse decoding framework that exploits sparsity before KV pages are loaded. EntmaxKV combines query-aware page scoring, support-aware candidate selection, and sparse entmax attention. We analyze truncation error through the dropped probability mass $δ$, showing that output error is controlled by $δ$ and vanishes when the entmax support is recovered. We further introduce a Gaussian-aware entmax selector that estimates the entmax threshold from lightweight page statistics, adapting the selected budget to the score distribution. Empirically, EntmaxKV drops less probability mass, retains more support tokens, and achieves lower output error than softmax-based sparse decoding at matched KV budgets. On long-context and language modeling benchmarks, it closely matches full-cache entmax while using a small fraction of the KV cache, achieving up to $3.36\times$ (softmax) and $5.43\times$ (entmax) speedup over full attention baselines at 1M context length. Code available at: https://github.com/deep-spin/entmaxkv.
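The support-recovery argument can be checked on a small example. The sketch below uses sparsemax, the $α = 2$ special case of $α$-entmax, purely because it has a short closed-form algorithm. It is not EntmaxKV's page selector or kernel, and the scores are hypothetical.

```python
def sparsemax(scores):
    """Sparsemax (alpha-entmax with alpha = 2): projection onto the simplex.

    Returns probabilities that sum to 1 and are exactly zero outside
    the support.
    """
    ordered = sorted(scores, reverse=True)
    cumulative = 0.0
    threshold = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        # The support is the largest k for which this inequality holds.
        if 1.0 + k * value > cumulative:
            threshold = (cumulative - 1.0) / k
    return [max(s - threshold, 0.0) for s in scores]


# Hypothetical attention scores for four cached tokens.
scores = [2.0, 1.5, 0.1, -1.0]
full = sparsemax(scores)

# Candidates that contain the support {0, 1}: the result is unchanged.
exact_subset = sparsemax([scores[0], scores[1]])

# Candidates that miss token 1, which is in the support: the result changes.
lossy_subset = sparsemax([scores[0], scores[2]])
```

Here tokens 2 and 3 receive exactly zero probability, so dropping them before attention loses nothing, whereas any softmax truncation would discard some nonzero mass.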


2605.21646 2026-05-22 cs.LG 版本更新

Alike Parts: A Feature-Informed Approach to Local and Global Prototype Explanations

相似部分：一种基于特征的局部和全局原型解释方法

Jacek Karolczak, Jerzy Stefanowski

发表机构 * Institute of Computing Science（计算科学研究所）

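AI summary: This paper proposes a feature-informed approach to local and global prototype explanations that integrates feature importance to give explanations feature-level granularity. Experiments show that promoting feature diversity maintains, and in some cases increases, the prediction fidelity of the surrogate model.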

Comments Accepted for publication in International Journal of Applied Mathematics and Computer Science (IJAMCS)

AI中文摘要

基于原型的解释提供了一种直观的、基于实例的方法来支持机器学习黑箱分类器的可解释性，但通常缺乏特征层面的细粒度。我们介绍了一个框架，该框架在两个层次上整合特征重要性以解决这一差距。首先，对于局部解释，我们提出"相似部分"：一种利用特征重要性评分来突出分类实例与其最近原型之间最相关、共享的特征子集的方法，以引导用户关注。其次，我们通过在全局原型选择目标函数中加入特征重要性项，积极促进所选原型的特征属性的多样性。在六个基准数据集上的实验表明，这种增强的选取过程保持或在某些情况下提高了替代模型的预测保真度，表明特征多样性并不影响模型保真度。

英文摘要

Prototype-based explanations offer an intuitive, example-based approach to support the interpretability of machine learning black box classifiers but often lack feature-level granularity. We introduce a framework that integrates feature importance at two levels to address this gap. First, for local explanations, we propose "alike parts": a method that uses feature importance scores to highlight the most relevant, shared feature subsets between a classified instance and its nearest prototype, guiding user attention. Second, we augment the global prototype selection objective function with a feature importance term to actively promote diversity in the feature attributions of the selected prototypes. Experiments on six benchmark datasets show that this augmented selection process maintains or, in some cases, increases the prediction fidelity of the surrogate model, suggesting that feature diversity does not compromise model fidelity.
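A rough sketch of the local step, under assumptions of our own: features are numeric and scaled to a common range, "shared" means the instance and prototype values differ by at most a tolerance, and shared features are ranked by a given importance score. The function `alike_parts`, its parameters, and the data are hypothetical. The paper's actual definition may differ.

```python
def alike_parts(instance, prototype, importance, tolerance=0.1, top_k=2):
    """Return the most important features on which instance and prototype agree.

    All three arguments are dicts keyed by feature name.
    """
    shared = [
        name for name in instance
        if abs(instance[name] - prototype[name]) <= tolerance
    ]
    # Rank the shared features by importance, most important first.
    shared.sort(key=lambda name: importance[name], reverse=True)
    return shared[:top_k]


# Hypothetical scaled feature values and importance scores.
instance = {"age": 0.50, "income": 0.80, "tenure": 0.30, "visits": 0.10}
prototype = {"age": 0.55, "income": 0.20, "tenure": 0.32, "visits": 0.12}
importance = {"age": 0.1, "income": 0.5, "tenure": 0.3, "visits": 0.2}

highlighted = alike_parts(instance, prototype, importance)
```

In this example `income` is the most important feature overall but is excluded because the instance and prototype disagree on it, so the highlight falls on the important features they actually share.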


2605.21615 2026-05-22 cs.CR cs.LG cs.SE 版本更新

ASSEMBLAGE-DEEPHISTORY: A Cross-Build Binary Dataset with Temporal Coverage

ASSEMBLAGE-DEEPHISTORY: 一个具有时间覆盖的跨构建二进制数据集

Chang Liu, Noah Fleischmann, Nicolò Altamura, Edward Raff, James Holt, Kristopher Micinski

Affiliations: Syracuse University; Booz Allen Hamilton; Independent Researcher; CrowdStrike

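AI summary: This paper presents ASSEMBLAGE-DEEPHISTORY, a binary dataset that combines cross-build diversity, cross-version history, and CVE labels in a unified, queryable framework. Three analyses demonstrate its value: an LLM vulnerability-detection benchmark, version clustering in embedding and hash spaces, and a decomposition of binary similarity.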

AI中文摘要

现有的二进制数据集通常只能捕捉一个或两个二进制变化轴：它们要么提供无时间轴的跨编译器构建，要么为单构建二进制提供CVE标签。没有一个结合跨构建多样性、跨版本历史和CVE标签到可查询的结构中。我们提出了ASSEMBLAGE-DEEPHISTORY，将这些维度整合到统一的框架中，其中每个二进制的编译上下文、源代码、易受攻击函数和包版本都作为一等元数据存储。ASSEMBLAGE-DEEPHISTORY包含73,610个二进制文件，涵盖248个开源项目，这些二进制文件在Linux和Windows上使用GCC、Clang和MSVC在多个优化级别下编译，具有多年的历史构建。每个二进制文件都被索引到一个数据库中，该数据库将其链接到其源代码、函数、调试信息、变体构建、历史版本和易受攻击函数。三种分析展示了该结构的价值：（1）一个三阶段LLM基准测试（识别、策略引导检测和跨构建转移）以测试LLM是否在二进制漏洞上进行推理或在构建特定的artifact上进行模式匹配；（2）MalConv嵌入、jTrans函数嵌入和TLSH模糊哈希的比较，量化了同一包版本在每个空间中的聚类情况；（3）贝叶斯回归分解二进制相似性为时间距离、文件更改和提交的贡献。

英文摘要

Existing binary corpora typically capture only one or two axes of binary variation: they either provide cross-compiler builds without a temporal axis, or CVE labels for single-build binaries. None combine cross-build diversity, cross-version history, and CVE labels into a queryable structure. We present ASSEMBLAGE-DEEPHISTORY, which consolidates these dimensions into a unified framework where every binary's compilation context, source code, vulnerable functions, and package version are stored as first-class metadata. ASSEMBLAGE-DEEPHISTORY comprises 73,610 binaries spanning 248 open-source projects, compiled across GCC, Clang, and MSVC at multiple optimization levels on Linux and Windows, with multi-year historical builds. Each binary is indexed in a database that links it to its source code, functions, debug info, variant builds, historical versions, and vulnerable functions. Three analyses demonstrate this structure's value: (1) a three-stage LLM benchmark (recognition, strategy-guided detection, and cross-build transfer) to test whether LLMs reason about binary vulnerabilities or pattern-match on build-specific artifacts; (2) a comparison of MalConv embeddings, jTrans function embeddings, and TLSH fuzzy hashes quantifying how same-package versions cluster in each space; and (3) a Bayesian regression decomposing binary similarity into contributions from temporal distance, file changes, and commits.


2605.21614 2026-05-22 cs.HC cs.LG 版本更新

Exploring the Effectiveness of Using LLMs for Automated Assessment of Student Self Explanations in Programming Education

探索使用LLMs对编程教育中学生自解释进行自动评估的有效性

Arun-Balajiee Lekshmi-Narayanan, Mohammad Hassany, Peter Brusilovsky

发表机构 * University of Pittsburgh（匹兹堡大学）

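AI summary: This paper studies how effectively LLMs can automatically assess student self-explanations in programming education, comparing LLMs with semantic similarity methods on a binary classification task.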

AI中文摘要

worked examples是特定领域的逐步问题解决示例，提供给学生以获得领域特定的问题解决技能。通过将worked examples与self-explanations结合，可以增强其有效性，因为self-explanations要求学生解释而不是被动学习每个问题解决步骤。主要挑战是评估学生解释的正确性。在现有方法中，学生解释通过其语义相似性与教师或领域专家的解释进行判断。鉴于近期LLM基于的自动评分进展，仍不清楚语义相似性方法是否仍然是自动评分文本学生响应（如文章或代码解释）最有效的方法。比较这些方法需要高质量的数据集，提供如平衡的类别分布和领域特定的标注数据等特征。在本文中，我们提出了一个严格比较LLMs与语义相似性用于自动评分的比较，框架为二元分类任务。

英文摘要

Worked examples are step-by-step solutions to problems in a specific domain, offered to students to acquire domain-specific problem-solving skills. The effectiveness of worked examples could be enhanced by combining them with self-explanations, which ask students to explain rather than passively study each problem-solving step. The main challenge of this approach is assessing the correctness of the student's explanations. In the prevailing approach, student explanations are judged by their semantic similarity to an instructor's or domain expert's explanation. Given recent advances in LLM-based automated scoring, it remains unclear whether semantic similarity methods are still the most effective technique to automatically score textual student responses like essays or code explanations. Comparing these methods also requires quality datasets that offer distinctive features such as balanced class distributions and domain-specific labeled data for automated scoring tasks. In this paper, we present a rigorous comparison between LLMs and semantic similarity used for automated scoring, framed as a binary classification task.


2605.21611 2026-05-22 cs.CV cs.LG 版本更新

UniVL: Unified Vision-Language Embedding for Spatially Grounded Contextual Image Generation

UniVL：统一的视觉-语言嵌入用于空间接地的上下文图像生成

Jiayun Wang, Yu Wang, Weijie Gan, Zhenting Wang, Wei Wei

发表机构 * Center for Advanced AI（先进人工智能中心）

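AI summary: This paper proposes UniVL, a unified vision-language embedding that binds semantics to spatial locations directly from a single visual input, reducing computation and improving image generation quality.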

AI中文摘要

我们引入了空间接地的上下文图像生成任务，这是一种可控的图像生成任务，重新定义了条件生成范式。与通过两个独立编码器分别提供参考图像和全局文本提示不同，UniVL被训练以从单一统一的视觉输入中直接绑定语义到空间位置，其中文本指令被渲染到空间掩码上。这消除了推理过程中对独立文本编码器的需求。所得到的模型通过遵循用户指定的指令来支持上下文图像生成，即在指定位置生成什么内容，同时显著减少了计算量。为了解决这一任务，我们提出了一种框架，其中从光学字符识别预训练的backbone中适应的UniVL编码器读取统一的条件，并生成一个融合视觉和语义意图以及空间位置的UniVL嵌入fVIL。一个两阶段流程首先对齐UniVL与VAE嵌入空间，然后将预训练的扩散backbone完全基于UniVL嵌入进行条件生成，消除了如T5等独立文本编码器。尽管这种重新定义使用了刻意最小化的文本接口，但仍然取得了显著的实证收益。在UniVL-ImgGen上，一个包含477,000个掩码标注图像的基准数据集上，UniVL在文本提示基线之上提高了图像质量，将FID从14降低到11，并将PSNR从16提高到20。它还完全消除了文本编码器，将推理TFLOPs减少高达52%，将运行时间减少高达44%。此外的消融研究验证了所提出组件的贡献，为具有统一条件范式的高效、空间接地图像生成铺平了道路。

英文摘要

We introduce spatially grounded contextual image generation, a controllable image generation task that reframes the conditioning paradigm. Instead of supplying a reference image and a global text prompt through two separate encoders, one for vision and one for language, UniVL is trained to bind semantics to spatial locations directly from a single unified visual input, where the textual instruction is rendered onto the spatial mask. This removes the need for a standalone text encoder at inference time. The resulting model supports contextual image generation by following user-specified instructions about what should appear where, while substantially reducing computation. To address this task, we propose a framework in which the UniVL encoder, adapted from an optical-character-recognition-pretrained backbone, reads the unified condition optically and produces a UniVL embedding, fVIL, that fuses visual and semantic intent with spatial locations in a single token sequence. A two-stage pipeline first aligns UniVL with the VAE embedding space and then conditions a pretrained diffusion backbone entirely on UniVL embeddings, eliminating the standalone text encoder, such as T5. Although this reframing uses a deliberately minimal text interface, it yields strong empirical gains. On UniVL-ImgGen, a benchmark of 477K mask-annotated images that we construct for training and evaluation, UniVL improves image quality over text-prompted baselines, reducing FID from 14 to 11 and increasing PSNR from 16 to 20. It also eliminates the text encoder entirely, reducing inference TFLOPs by up to 52% and runtime by up to 44%. Additional ablation studies validate the contributions of the proposed components, paving the way for efficient, spatially grounded image generation with a unified conditioning paradigm.


2605.21610 2026-05-22 cs.LG 版本更新

AgForce Enables Antigen-conditioned Generative Antibody Design

AgForce 使生成抗体设计具备抗原条件

Mansoor Ahmed, Murray Patterson

发表机构 * Georgia State University（佐治亚州立大学）； Georgia Institute of Technology（佐治亚理工学院）

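AI summary: This paper proposes AgForce, an encoder-decoder architecture with a graph neural network encoder and specialized decoders. It addresses the tendency of existing antibody design methods to ignore the antigen input, improving binding quality and sequence recovery.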

AI中文摘要

抗体设计方法通常基于抗原结构生成互补决定区（CDR），但基线方法的系统评估表明，它们大多忽略了抗原输入。我们识别出三种导致这种行为的失败模式。抗原盲性是因为模型从抗体框架上下文推断预测，而非抗原信息，从而产生几乎相同的CDR，无论目标如何。词汇坍塌将预测的氨基酸减少到每个位置3到5种，远低于天然序列的真实分布。此外，任何使用标准位置交叉熵训练的模型都会收敛到位置边际分布，这使得它无法产生抗原特异性序列预测。我们提出了一种名为AgForce的新型编码器-解码器架构，它使用图神经网络（GNN）作为编码器，并针对序列-结构协同设计设计了专用解码器。具体而言，我们应用了框架dropout、门控瓶颈和双曲交叉注意力，以防止抗体的捷径路径。在解码器中，一个具有Potts-like成对耦合和退火的多选学习（aMCL）的混合密度网络（MDN）序列头取代了交叉熵目标，用一个多组件分布替代了位置边际分布的最优解。一个抗原循环一致性头将梯度路由通过序列解码器，迫使预测分布编码抗原身份。AgForce在CHIMERA-Bench数据集上同时实现了最佳的结合质量和序列恢复能力，比最强的序列基线提高了8%的氨基酸恢复率，且在所有界面指标上均优于基线，几乎将GNN方法的有效词汇量翻倍。源代码可在：https://github.com/mansoor181/ag-force.git

英文摘要

Antibody design methods condition on antigen structure to generate complementarity-determining regions (CDR), yet a systematic evaluation of baseline methods reveals that they largely ignore the antigen input. We identify three failure modes that explain this behavior. Antigen blindness arises because models derive predictions from antibody framework context rather than antigen information, producing nearly identical CDRs regardless of the target. Vocabulary collapse reduces predicted amino acids to three to five per position, far below the ground truth distribution in native sequences. Moreover, any model trained with standard per-position cross-entropy converges to the positional marginal distribution, making it provably unable to produce antigen-specific sequence predictions. We propose a novel encoder-decoder architecture called AgForce that uses a graph neural network (GNN) as the encoder and specialized decoders for sequence-structure co-design. Specifically, we apply framework dropout, gated bottlenecks, and hyperbolic cross attention that prevent the antibody shortcut path. In the decoder, a Mixture Density Network (MDN) sequence head with Potts-like pairwise coupling and annealed Multiple Choice Learning (aMCL) replaces the cross-entropy objective with a multi-component distribution whose optimal solution differs from the positional marginal. An antigen cycle consistency head routes gradients through the sequence decoder, forcing predicted distributions to encode antigen identity. AgForce achieves the best binding quality and sequence recovery simultaneously on the CHIMERA-Bench dataset, improving amino acid recovery by 8% over the strongest sequence baseline while surpassing the baselines across all interface metrics, and nearly doubling the effective vocabulary of GNN methods. The source code is available at: https://github.com/mansoor181/ag-force.git
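One way to see the positional-marginal effect is the following short derivation. It rests on a simplifying assumption of ours, not on the paper's proof: the predictor $q_i$ at position $i$ effectively ignores the antigen $a$. Writing $p_i(\cdot \mid a)$ for the true residue distribution at position $i$ and $\bar{p}_i$ for its average over antigens, the cross-entropy loss decomposes as:

```latex
\begin{align*}
\mathcal{L}(q_i)
  &= \mathbb{E}_{a}\,\mathbb{E}_{y \sim p_i(\cdot \mid a)}\bigl[-\log q_i(y)\bigr]
   = -\sum_{y} \bar{p}_i(y)\,\log q_i(y),
  \qquad \bar{p}_i(y) = \mathbb{E}_{a}\bigl[p_i(y \mid a)\bigr], \\
\mathcal{L}(q_i)
  &= H(\bar{p}_i) + \mathrm{KL}\bigl(\bar{p}_i \,\|\, q_i\bigr) \;\ge\; H(\bar{p}_i),
\end{align*}
```

with equality exactly when $q_i = \bar{p}_i$. Under this assumption the loss is minimized by the positional marginal, which carries no antigen-specific information.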


2605.21606 2026-05-22 cs.LG cs.AI 版本更新

When Are Teacher Tokens Reliable? Position-Weighted On-Policy Self-Distillation for Reasoning

何时教师标记可靠？用于推理的基于位置加权的在线自我蒸馏

Xiaogeng Liu, Xinyan Wang, Yingzi Ma, Yechao Zhang, Chaowei Xiao

发表机构 * Johns Hopkins University（约翰霍普金斯大学）； University of Wisconsin–Madison（威斯康星大学麦迪逊分校）； Nanyang Technological University（南洋理工大学）

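AI summary: This paper proposes Position-Weighted On-Policy Self-Distillation (PW-OPSD) for reasoning. A branch-viability diagnostic identifies where teacher tokens are reliable, and the resulting position weighting is validated across several models.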

Comments Pre-print. Code is available at https://github.com/SaFo-Lab/PW-OPSD

AI中文摘要

在线自我蒸馏（OPSD）通过一个特权教师训练学生，但其标准目标对所有生成的标记同等重视，隐含地将特权教师目标视为在每个学生访问的前缀中同样可靠。现有的基于熵的OPD方法通过调节令牌级监督来放松这种均匀性，但推理中高教师熵的可靠性含义具有歧义：它可以反映非可行的不确定性或良性的解决方案多样性。为识别这一现象，我们引入了分支可行性诊断。具体来说，我们记录特权答案教师提示中的下一个标记替代方案，强制每个替代方案在学生提示及其在线脊柱前缀之后，并测试由此产生的学生模板延续是否能恢复正确答案。在Qwen3-4B上，我们发现一个导向的序列内位置分数是测试中最强的教师标记可靠性预测因子，达到曲线下面积（AUROC）为0.83；局部不确定性分数最多为0.57。受此轨迹结构的启发，我们提出了基于位置加权的在线自我蒸馏（PW-OPSD），其在保持相同的学生滚动生成、特权教师传递和截断的前向KL目标的同时，应用递增的位置权重。在不同随机种子的全面评估中，诊断衍生的PW-OPSD在AIME 2024和AIME 2025 Avg@12上分别提高了+1.0和+1.1分，并在两个更大规模的模型上也展示了一致的Avg@12改进。这些结果表明，推理蒸馏中的教师标记可靠性具有轨迹结构，并且可以在不增加教师计算的情况下利用。

英文摘要

On-policy self-distillation (OPSD) trains a student on its own rollouts using a privileged teacher, but its standard objective weights all generated tokens equally, implicitly treating the privileged teacher target as equally reliable at every student-visited prefix. Existing entropy-based OPD methods relax this uniformity by modulating token-level supervision with teacher entropy, but high teacher entropy in reasoning has an ambiguous reliability meaning: it can reflect either non-viable uncertainty or benign solution diversity. To identify this phenomenon, we introduce a branch-viability diagnostic. Specifically, we record next-token alternatives from the privileged-answer teacher prompt, force each alternative after the student prompt plus its on-policy spine prefix, and test whether the resulting student-template continuation recovers the correct answer. On Qwen3-4B, we find that an oriented within-sequence position score is the strongest tested predictor of teacher-token reliability, reaching an area-under-ROC-curve (AUROC) of 0.83; local uncertainty scores are at most 0.57. Motivated by this trajectory-level structure, we propose Position-Weighted On-Policy Self-Distillation (PW-OPSD), which applies an increasing position weight while keeping the same student rollout, privileged teacher pass, and clipped forward-KL target as OPSD. In our comprehensive evaluations with different random seeds, the diagnostic-derived PW-OPSD improves AIME 2024 and AIME 2025 Avg@12 by +1.0 and +1.1 points, and a generalization evaluation on two larger-scale models from different families, DeepSeek-R1-Distill-Llama-8B and Olmo-3-7B-Think, also demonstrates consistent aggregate Avg@12 improvements. These results show that teacher-token reliability in reasoning distillation is trajectory-structured and can be utilized without additional teacher computation.
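The position-weighting idea can be sketched in a few lines. The abstract only states that the weight increases with position and that the forward-KL target is clipped. The linear schedule, the weight range, the per-token clipping, and all names below are assumptions made for illustration, not the released implementation.

```python
import math


def forward_kl(teacher, student):
    """KL(teacher || student) for one token; student probabilities must be > 0."""
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)


def position_weights(length, w_min=0.5, w_max=1.5):
    """Linearly increasing weights with mean 1 (hypothetical schedule)."""
    if length == 1:
        return [1.0]
    return [w_min + (w_max - w_min) * i / (length - 1) for i in range(length)]


def position_weighted_loss(teacher_dists, student_dists, clip=10.0):
    """Average per-token clipped forward KL, weighted by sequence position."""
    weights = position_weights(len(teacher_dists))
    per_token = [
        min(forward_kl(t, s), clip) for t, s in zip(teacher_dists, student_dists)
    ]
    return sum(w * k for w, k in zip(weights, per_token)) / len(per_token)


# Toy 3-token sequence over a 2-symbol vocabulary.
teacher = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
early_miss = [[0.25, 0.75], [0.5, 0.5], [0.5, 0.5]]
late_miss = [[0.5, 0.5], [0.5, 0.5], [0.25, 0.75]]

early_loss = position_weighted_loss(teacher, early_miss)
late_loss = position_weighted_loss(teacher, late_miss)
```

With this schedule the same disagreement costs more late in the sequence than early, which reflects the abstract's finding that teacher tokens become more reliable at later positions.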


2605.21600 2026-05-22 cs.LG 版本更新

ConTact: Contact-First Antibody CDR Design via Explicit Interface Reasoning

ConTact: 通过显式界面推理进行接触优先的抗体CDR设计

Mansoor Ahmed, Spencer VonBank, Nadeem Taj, Sujin Lee, Naila Jan, Murray Patterson

Affiliations: Georgia State University, Atlanta, USA; Georgia Institute of Technology, Atlanta, USA; DePauw University, Indiana, USA; University of Engineering

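AI summary: This paper proposes ConTact, a contact-first method for antibody CDR design via explicit interface reasoning. It decomposes CDR design into three stages: learning surface-complementarity fingerprints, predicting CDR-antigen contacts, and injecting contact-gated antigen features, which improves structural quality and epitope awareness.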

详情

基于运行时治理的嵌入式联邦学习用于缺铁预测

Fan Zhang, Simon Deltadahl, Majid Lotfian Delouee, Daniel Kreuter, Joseph Taylor, Allerdien Visser, BloodCounts Consortium, James H. F. Rudd, Nicholas S. Gleadall, Suthesh Sivapalaratnam, Folkert Asselbergs, Martijn C. Schut, Michael Roberts

发表机构 * Theoretical Physics University of Cambridge Cambridge, UK ； Translational AI Laboratory, Dept. of Laboratory Medicine Amsterdam UMC Amsterdam, The Netherlands ； Precision Health University Research Institute Queen Mary Univ. of London London, UK ； Department of Medicine University of Cambridge Cambridge Biomedical Campus Cambridge, UK ； Transplant Cambridge Biomedical Campus Cambridge, UK ； Dept. of Cardiology Amsterdam Cardiovascular Sciences Amsterdam UMC Amsterdam, The Netherlands

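AI summary: This paper presents an embedding-based federated learning pipeline for predicting iron deficiency from routine full blood count data, deployed across two clinical environments. It shows that personalised aggregation outperforms sample-size-weighted aggregation when client sample size and task relevance diverge.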

AI中文摘要

近期的综述发现，发表的大多数医疗联邦学习（FL）研究从未达到实际应用。我们开发了一种基于嵌入的FL管道，用于从常规全血计数（FBC）数据中预测缺铁，并在阿姆斯特丹大学医学中心（AUMC）和英国国家血库和移植（NHSBT）两个临床环境中部署。这两个临床数据集在结构上不独立和相同分布（非IID），异质性源于不同的群体差异而非采样误差。运行时治理由FLA$^3$强制执行，这是一个面向医疗的FL平台，提供研究范围的执行、基于策略的授权和带签名的审计日志。标准样本量加权聚合（FedAvg）在两个站点相对于仅本地训练降低了受试者工作特征曲线（ROC-AUC）的面积，因为全局更新偏向于较大的AUMC分布。FedMAP，一种个性化聚合方法，将AUMC的ROC-AUC从0.9470提升到0.9594，将NHSBT的ROC-AUC从0.8558提升到0.8671，实现了最高的宏ROC-AUC为0.9133和最佳的宏平衡精度。这些结果支持在临床联邦中使用个性化聚合，特别是在客户端样本量和任务相关性差异显著时。

英文摘要

Recent reviews find that the vast majority of published healthcare federated learning (FL) studies never reach real-world deployment. We developed an embedding-based FL pipeline for iron deficiency prediction from routine full blood count (FBC) data and deployed it across real institutional environments at Amsterdam University Medical Centre (AUMC) and NHS Blood and Transplant (NHSBT), two clinical environments that differ markedly in iron deficiency prevalence, ferritin distribution, and subject populations. A frozen domain-specific haematology foundation model, DeepCBC, performs site-local representation extraction, restricting federated training to a compact downstream classifier and substantially reducing recurrent communication relative to full-encoder federation. The two clinical datasets are structurally not independent and identically distributed (non-IID), with heterogeneity arising from distinct population differences rather than sampling artefacts. Runtime governance is enforced by FLA$^3$, a healthcare-oriented FL platform providing study-scoped execution, policy-based authorisation, and signed audit logging. Standard sample-size-weighted aggregation (FedAvg) reduced the area under the receiver operating characteristic curve (ROC-AUC) at both sites relative to local-only training, as the global update was biased towards the larger AUMC distribution. FedMAP, a personalised aggregation method, raised ROC-AUC from 0.9470 to 0.9594 at AUMC and from 0.8558 to 0.8671 at NHSBT relative to local-only training, achieving the highest macro ROC-AUC of 0.9133 and the best macro balanced accuracy overall. These results support personalised aggregation in clinical federations where client sample size and task relevance diverge substantially.
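The bias of sample-size-weighted aggregation toward the larger site can be seen in a minimal FedAvg sketch. The two-parameter "models" and the sample counts are made up and are not the study's cohort sizes. FedMAP itself is not reproduced here.

```python
def fedavg(client_weights, client_sizes):
    """Sample-size-weighted average of client parameter vectors (FedAvg)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(weights[i] * size for weights, size in zip(client_weights, client_sizes))
        / total
        for i in range(dim)
    ]


# Hypothetical local classifiers from a large and a small site.
large_site = [1.0, 0.0]
small_site = [0.0, 1.0]

global_model = fedavg([large_site, small_site], client_sizes=[9000, 1000])
```

With a 9:1 size ratio the global model sits almost entirely on the larger site's solution, which matches the abstract's explanation of why FedAvg underperformed local-only training when the two populations differ.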


2605.21561 2026-05-22 cs.LG 版本更新

Objective-Induced Bias and Search Dynamics in Multiobjective Unsupervised Feature Selection

目标诱导偏差与多目标无监督特征选择中的搜索动态

Mathieu Cherpitel, Thomas Bäck, Martijn R. Tannemaat, Anna V. Kononova

发表机构 * LIACS, Leiden University（莱顿大学LIACS）； LUMC, Leiden University（莱顿大学LUMC）

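AI summary: This study examines how the choice of evaluation objective affects search dynamics and Pareto-front quality in multiobjective unsupervised feature selection. Silhouette-based objectives are biased toward trivial low-cardinality solutions, whereas the proposed PCA loss objective yields compact subsets with test accuracy comparable to directly optimising supervised accuracy.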

AI中文摘要

无监督特征选择通常被建模为一个多目标优化问题，联合优化子集质量和子集大小。然而，这种形式的行为严重依赖于评估目标的选择、子集大小正则化的方向以及初始化策略。我们通过一个具有已知信息性、冗余性和无关特征类型的合成数据集，在受控环境下研究这些因素。通过结合三个评估目标：准确率、轮廓系数和PCA重建损失，并结合子集大小最小化或最大化，比较了六种形式。结果表明，形式对搜索动态和最终Pareto前沿的质量都有显著影响。基于轮廓系数的形成表现出对平凡低基数解的强烈偏见，并且仍然是预测性能的弱代理。相比之下，所提出的PCA损失目标产生具有与直接优化监督准确率获得的子集相似测试准确度的紧凑子集。总体而言，该研究表明，目标设计是有效多目标无监督特征选择的关键。

英文摘要

Unsupervised feature selection is commonly formulated as a multiobjective optimisation problem that jointly optimises subset quality and subset size. Yet the behaviour of this formulation depends critically on the choice of evaluation objective, the direction of subset-size regularisation, and the initialisation strategy. We study these factors in a controlled setting using a synthetic dataset with known informative, redundant, and irrelevant feature types. Six formulations are compared by combining three evaluation objectives (accuracy, silhouette score, and PCA reconstruction loss) with subset-size minimisation or maximisation. The results show that formulation strongly affects both search dynamics and the quality of the resulting Pareto front. Silhouette-based formulations exhibit a strong bias toward trivial low-cardinality solutions and remain weak proxies for predictive performance. In contrast, the proposed PCA loss objective produces compact subsets with test accuracy comparable to subsets obtained by directly optimising supervised accuracy. Overall, the study shows that objective design is central to effective multiobjective unsupervised feature selection.
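
A Pareto front over (evaluation loss, subset size) can be extracted with a plain dominance filter. The sketch below assumes both objectives are minimised; the candidate values are made up for illustration and the paper's actual search algorithm is not specified in the abstract.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates (all objectives minimised)."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


# Hypothetical candidates as (evaluation loss, subset size).
candidates = [(0.10, 8), (0.12, 5), (0.30, 2), (0.35, 4), (0.12, 7)]
front = pareto_front(candidates)
```

For the subset-size maximisation variants, the same filter applies after negating the size coordinate.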

2605.21560 2026-05-22 cs.LG 版本更新

AutoMCU: Feasibility-First MCU Neural Network Customization via LLM-based Multi-Agent Systems

AutoMCU: 通过基于LLM的多智能体系统实现面向MCU的神经网络定制化

Penglin Dai, Zijie Zhou, Xincao Xu, Junhua Wang, Xiao Wu, Lixin Duan

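Affiliations: School of Computing and Artificial Intelligence, Southwest Jiaotong University; Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China; School of Computer Science and Engineering, Northeastern University

AI summary: This paper proposes AutoMCU, an LLM-based multi-agent system for automated neural network customization under MCU constraints. Given natural-language task requirements and hardware specifications, AutoMCU iteratively generates structured architecture candidates, filters out infeasible designs using vendor toolchain feedback before training, evaluates the feasible models under a controlled protocol, and verifies deployability.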

AI中文摘要

在微控制器单元（MCU）上部署神经网络对于边缘智能至关重要，但受限于内存、存储和计算约束仍具挑战性。现有方法，如模型压缩和硬件感知神经架构搜索（HW-NAS），通常依赖代理指标，导致搜索成本高，且未能充分弥合架构设计与验证部署之间的差距。本文提出AutoMCU，一种以可行性为导向的基于大型语言模型（LLM）的多智能体系统，用于在MCU约束下实现神经网络的自动化定制化。给定自然语言任务要求和硬件规格，AutoMCU迭代生成结构化架构候选方案，在训练前通过供应商工具链反馈过滤不可行设计，评估可行模型在受控协议下的性能，并通过后端基础部署分析验证部署可行性。AutoMCU包含两个关键机制：1）基于硬件的架构生成，用于在RAM和Flash约束下提前淘汰不可部署的候选方案；2）状态隔离的多智能体调度，用于稳定协调提案、训练、评估和部署阶段。在严格MCU约束下对CIFAR-10和CIFAR-100的实验表明，AutoMCU在减少定制时间至约1-2小时的同时实现了具有竞争力的精度，相比代表性的MCU导向HW-NAS基线方法所需的数百小时GPU时间。与ColabNAS和基于LLM的NAS方法GENIUS在NAS-Bench-201上的比较进一步证明了AutoMCU的有效性和稳定性。在多个STM32微控制器上的实际设备部署验证了其在MCU规模边缘智能中的实际适用性。

英文摘要

Deploying neural networks on microcontroller units (MCUs) is critical for edge intelligence but remains challenging due to tight memory, storage, and computation constraints. Existing approaches, such as model compression and hardware-aware neural architecture search (HW-NAS), often depend on proxy metrics, incur high search cost, and do not fully bridge the gap between architecture design and verified deployment. This paper presents AutoMCU, a feasibility-first large language model (LLM)-based multi-agent system for automated neural network customization under MCU constraints. Given natural-language task requirements and hardware specifications, AutoMCU iteratively generates structured architecture candidates, filters infeasible designs through vendor toolchain feedback before training, evaluates feasible models under a controlled protocol, and verifies deployability through backend-grounded deployment analysis. AutoMCU includes two key mechanisms: 1) hardware-in-the-loop architecture generation for early elimination of undeployable candidates under RAM and Flash constraints, and 2) state-isolated multi-agent scheduling for stable coordination of proposal, training, evaluation, and deployment stages. Experiments on CIFAR-10 and CIFAR-100 under strict MCU constraints show that AutoMCU achieves competitive accuracy while reducing customization time to about 1--2 hours, compared with hundreds of GPU hours for representative MCU-oriented HW-NAS baselines. Comparisons with ColabNAS and the LLM-based NAS method GENIUS on NAS-Bench-201 further demonstrate the effectiveness and stability of AutoMCU. Real-device deployments on multiple STM32 microcontrollers validate its practical applicability to MCU-scale edge intelligence.
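
The feasibility-first idea, rejecting candidates that cannot fit before any training is spent on them, can be sketched with a crude analytic footprint check. This is an assumption-laden stand-in: AutoMCU relies on vendor toolchain feedback, whereas the `Layer` type, the one-byte-per-value default, and the "input plus output buffer" RAM model below are hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    params: int            # number of weights stored in Flash
    activation_elems: int  # number of elements in the layer's output tensor


def estimate_footprint(layers, input_elems, bytes_per_value=1):
    """Return (flash_bytes, peak_ram_bytes) for a purely sequential network."""
    flash = sum(layer.params for layer in layers) * bytes_per_value
    peak_ram = 0
    prev = input_elems
    for layer in layers:
        # Assume a layer's input and output buffers are both live while it runs.
        peak_ram = max(peak_ram, (prev + layer.activation_elems) * bytes_per_value)
        prev = layer.activation_elems
    return flash, peak_ram


def is_feasible(layers, input_elems, flash_budget, ram_budget, bytes_per_value=1):
    """Pre-training filter: keep a candidate only if both budgets are respected."""
    flash, ram = estimate_footprint(layers, input_elems, bytes_per_value)
    return flash <= flash_budget and ram <= ram_budget


candidate = [Layer(params=1000, activation_elems=4096),
             Layer(params=50000, activation_elems=1024)]
```

A real toolchain also accounts for code size, operator scratch buffers, and buffer reuse, so an estimate like this is only useful as a cheap first-pass screen.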

2605.21558 2026-05-22 cs.LG cs.CL 版本更新

From Parameters to Data: A Task-Parameter-Guided Fine-Tuning Pipeline for Efficient LLM Alignment

从参数到数据：一种任务参数引导的微调流水线用于高效的LLM对齐

Hao Chen, Qi Zhang, Liyao Li, Zhanming Shen, Wentao Ye, Lirong Gao, Ningtao Wang, Xing Fu, Xiaoyu Shen, Junbo Zhao

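Affiliations: Zhejiang University; Eastern Institute of Technology

AI summary: This study proposes a task-parameter-guided fine-tuning pipeline that uses task-sensitive attention heads as a dual guide for sample mining and structural pruning, improving the efficiency of LLM alignment.

Comments Accepted@ICML26, 28 pages, 11 figures, 26 tables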

AI中文摘要

适应大型语言模型（LLM）到专业领域通常会带来高数据和计算开销。尽管先前的效率努力大多将数据选择和参数高效微调视为孤立过程，我们的实证分析表明它们可能本质上是耦合的。我们提出了强映射假说：稀疏的注意力头子集在任务特定适应中起主导作用，作为解锁特定数据模式的钥匙。基于这一观察，我们提出了从参数到数据（P2D）统一框架，利用这些任务敏感的注意力头作为双指南，用于样本挖掘和结构剪枝。为了严格量化整个流程的成本，我们引入了对齐效率比率（AER）指标，用于衡量选择延迟和训练时间。机理上，P2D通过轻量级代理识别关键头，并利用它们作为功能性过滤器来精选高亲和力数据，建立协同流程。经验上，通过更新仅10%的注意力头在10%的数据上，P2D在强基线基础上实现了8.3个百分点的性能提升，并提供了7.0倍的端到端时间加速。这些结果验证了精确的参数-数据同步消除了冗余，提供了一种新的高效对齐范式。

英文摘要

Adapting Large Language Models (LLMs) to specialized domains typically incurs high data and computational overhead. While prior efficiency efforts have largely treated data selection and parameter-efficient fine-tuning as isolated processes, our empirical analysis suggests they may be intrinsically coupled. We posit the Strong Map Hypothesis: a sparse subset of attention heads plays a dominant role in task-specific adaptation, acting as keys that unlock specific data patterns. Building on this observation, we propose From Parameters to Data (P2D), a unified framework that leverages these task-sensitive attention heads as a dual compass for both sample mining and structural pruning. To rigorously quantify the total pipeline cost, we introduce the Alignment Efficiency Ratio (AER) metric for both selection latency and training time. Mechanistically, P2D identifies critical heads via a lightweight proxy and uses them as a functional filter to curate high-affinity data, establishing a synergistic pipeline. Empirically, by updating merely 10% of attention heads on 10% of the data, P2D achieves an 8.3 pp performance gain over strong baselines and delivers a 7.0x end-to-end time speedup. These results validate that precise parameter-data synchronization eliminates redundancy, offering a new paradigm for efficient alignment.
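
Selecting a sparse subset of heads, such as the 10% mentioned in the abstract, reduces to a top-fraction selection once each head has a sensitivity score. The sketch below assumes such scores already exist; how P2D's lightweight proxy computes them is not described in the abstract, and the `(layer, head)` keys and score values are hypothetical.

```python
def select_heads(scores, fraction=0.10):
    """Return the keys of the highest-scoring fraction of heads (at least one)."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


# Hypothetical sensitivity scores keyed by (layer, head).
scores = {(0, h): float(h) for h in range(20)}
trainable = select_heads(scores, fraction=0.10)
```

Only the parameters of the selected heads would then be updated, with the remaining heads frozen.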

2605.21556 2026-05-22 cs.LG 版本更新

Beyond Single Slot: Joint Optimization for Multi-Slot Guaranteed Display Advertising

超越单一广告位：多广告位保障型显示广告的联合优化

Zhaoqi Zhang, Jiaming Deng, Miao Xie, Linyou Cai, Qianlong Xie, Xingxing Wang, Siqiang Luo, Gao Cong

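Affiliations: Nanyang Technological University; China Agricultural University

AI summary: This paper proposes a joint optimization framework for multi-slot guaranteed display advertising that addresses slot redundancy, contract imbalance, and exposure concentration. By formulating allocation as an offline bipartite matching problem with a contract roulette mechanism, it improves merchant ROI, platform revenue efficiency, and the robustness of contract fulfillment.

Comments Accepted at SIGIR Industry Track 2026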

DOI: 10.1145/3805712.3808398

AI中文摘要

保障型显示广告对于平台变现至关重要，但现有方法通常基于单一广告位假设，限制了其在多广告位页面浏览中的优化能力。本文提出了一种新颖的多广告位保障型显示广告联合优化框架，解决了广告位冗余、合同不平衡和曝光集中等关键挑战。我们的方法将分配建模为一个离线 bipartite 匹配问题，采用合同轮盘机制实现广告位独占性，并通过页面浏览约束实现印象控制，同时结合可扩展的分配优化算法以实现高效的大规模部署。在美团广告平台的大量在线测试中，我们的方法显著提高了广告商 ROI、平台收入效率和合同履行的鲁棒性。具体而言，在线 A/B 测试显示在 70% 的流量下，平均收入每用户增加了 28.99%，DID 分析进一步表明合同稳定性得到改善，证明了我们的框架在现实广告部署中的强大适用性和有效性。

英文摘要

Guaranteed display advertising is crucial for platform monetization, yet existing methods often operate under a single-slot assumption, limiting their ability to optimize allocation across multi-slot page views. In this paper, we propose a novel joint optimization framework for multi-slot GD allocation, addressing key challenges such as slot-level redundancy, contract imbalance, and exposure concentration. Our approach formulates the allocation as an offline bipartite matching problem with a contract roulette mechanism for slot exclusivity and Page View constraints for impression control, and incorporates a scalable allocation optimization algorithm for efficient large-scale deployment. Extensive online tests on the Meituan advertising platform demonstrate that our method significantly improves merchant ROI, platform revenue efficiency, and contract fulfillment robustness. Specifically, online A/B tests show a 28.99% increase in Average Revenue Per User under 70% traffic, and DID analysis further indicates improved contract stability, demonstrating the strong applicability and effectiveness of our framework in real-world advertising deployments.
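
Slot exclusivity, meaning that one contract fills at most one slot in a page view, can be illustrated with a simple greedy allocator. This is only a sketch of the constraint: it is not the paper's bipartite matching formulation or its contract roulette mechanism, and the contract names and demands are hypothetical.

```python
def allocate_page_view(remaining, num_slots):
    """Fill one page view with distinct contracts, favouring the largest unmet demand.

    `remaining` maps contract id -> impressions still owed and is updated in place.
    """
    eligible = [c for c, need in remaining.items() if need > 0]
    eligible.sort(key=lambda c: remaining[c], reverse=True)
    chosen = eligible[:num_slots]  # distinct by construction: one slot per contract
    for c in chosen:
        remaining[c] -= 1
    return chosen


remaining = {"A": 3, "B": 1, "C": 2}
first = allocate_page_view(remaining, num_slots=2)
```

Prioritising the largest unmet demand is one naive way to counter contract imbalance; an offline matching solver would instead optimise over all forecast page views at once.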

2605.21553 2026-05-22 cs.LG cs.IT eess.IV math.IT 版本更新

TONIC: Token-Centric Semantic Communication for Task-Oriented Wireless Systems

TONIC：面向任务的无线系统中的基于标记的语义通信

Sige Liu, Kezhi Wang

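Affiliations: Department of Computer Science, Brunel University London

AI summary: This paper proposes TONIC, a token-centric semantic communication framework for task-oriented wireless systems that combines semantic-aware protection at the transmitter with confidence-aware gating at the receiver, and outperforms conventional schemes.

Comments 15 pages, 10 figures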

AI中文摘要

标记正成为基础模型表示和处理信息的基本单元，用于理解和推理。然而，传统的位级忠实无线通信在可靠传输的内容与下游模型实际消耗的内容之间存在不匹配。这种不匹配要求一种通信设计，能够直接考虑标记层面的任务相关性和下游模型需求，而不是将所有传输位视为同等重要。在本文中，我们提出了TONIC，一种面向任务的无线系统中的基于标记的语义通信框架。发送端将每个源样本转换为标记序列，估计标记层面的任务相关性，并在固定信道使用预算下通过效用感知的非均等错误保护分配保护。在接收端，使用标记层面的置信度来门控不可靠的决策，将有害的替代转换为可恢复的擦除，在基于Transformer的完成模型恢复被遮蔽的标记以进行最终任务推理之前。我们的框架在模块化且可解释的架构中结合了发送端的语义感知保护和接收端的置信度感知门控，而不是仅依赖于完全黑盒端到端学习。我们进一步建立了接收端门控规则的效用感知贝叶斯风险解释，并研究其与非均等保护和完成的相互作用。在图像分类实验中，结果表明TONIC在匹配的通信预算下，无论是在AWGN、瑞利和莱斯信道上，都优于分离式方案、像素域DeepJSCC基线和标记域基线。

英文摘要

Tokens are becoming the basic units through which foundation models represent and process information for understanding and inference. However, traditional wireless communication, centered on bit-level fidelity, faces a mismatch between what is transmitted reliably and what downstream models actually consume. This mismatch calls for a communication design that directly accounts for token-level task relevance and downstream model requirements, rather than treating all transmitted bits as equally important. In this paper, we propose TONIC, a token-centric semantic communication framework for task-oriented wireless systems. The transmitter converts each source sample into a sequence of tokens, estimates token-level task relevance, and allocates protection through utility-aware unequal error protection under a fixed channel-use budget. At the receiver, token-level confidence is used to gate unreliable decisions, turning harmful substitutions into recoverable erasures before a Transformer-based completion model restores the masked tokens for final task inference. Our framework combines transmitter-side semantic-aware protection with receiver-side confidence-aware gating in a modular and interpretable architecture, rather than relying solely on fully black-box end-to-end learning. We further establish a utility-aware Bayes-risk interpretation for the receiver-side gating rule and study its interaction with unequal protection and completion. Experimental results on image classification show that TONIC consistently outperforms separation-based schemes, the pixel-domain DeepJSCC baseline, and token-domain baselines under matched communication budgets over AWGN, Rayleigh, and Rician channels.
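
Receiver-side gating, which turns low-confidence token decisions into erasures for a completion model to fill, can be sketched as a threshold rule. The fixed scalar threshold and the `MASK` sentinel are assumptions for illustration; the paper derives its rule from a utility-aware Bayes-risk argument that the abstract does not spell out.

```python
MASK = None  # hypothetical sentinel marking an erased token


def gate_tokens(tokens, confidences, threshold):
    """Replace decoded tokens whose confidence is below the threshold with MASK."""
    return [t if c >= threshold else MASK for t, c in zip(tokens, confidences)]


decoded = [17, 42, 8, 99]
confidence = [0.98, 0.41, 0.77, 0.12]
gated = gate_tokens(decoded, confidence, threshold=0.5)
```

The masked positions would then be passed to the completion model, which is preferable to forwarding a confidently wrong substitution to the downstream task.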

2605.21552 2026-05-22 cs.LG stat.ML 版本更新

Expectation Consistency Loss: Rethink Confidence Calibration under Covariate Shift

期望一致性损失：在协变量偏移下重新思考置信度校准

Jinzong Dong, Zhaohui Jiang, Bo Yang

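Affiliations: School of Automation, Central South University, Changsha, China

AI summary: This paper addresses confidence calibration under covariate shift and proposes an unsupervised domain adaptation loss, the Expectation Consistency Loss (ECL), which is theoretically grounded and calibrates target-domain confidence effectively in experiments.

Comments Accepted by ICML 2026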

AI中文摘要

置信度校准对于分类模型在安全关键决策场景中的应用至关重要，并已受到广泛关注。通用的置信度校准方法假设训练和测试数据是独立同分布的，这在存在协变量偏移时限制了其有效性。在协变量偏移下的先前校准方法在类内或标准校准方面存在困难，且通常依赖于当密度比较大或无界时不稳定的重要性加权。鉴于上述限制，本文重新思考了协变量偏移下的置信度校准。首先，我们推导出协变量偏移下的置信度校准的必要且充分条件，称为期望一致性条件，该条件揭示协变量偏移并不必然导致未校准的置信度，并提供了比全局协变量分布对齐更弱的置信度校准条件。然后，利用期望一致性条件，本文提出了一种无监督域适应损失来校准目标域的置信度，称为期望一致性损失（ECL），该方法兼容标准校准、类内校准和顶部标签校准。第三，我们证明计算ECL损失的样本复杂度与预期校准误差（ECE）相同，并提供了一种理论支持的mini-batch可训练方案。最后，我们在模拟和现实世界协变量偏移数据集上验证了本文方法的有效性。

英文摘要

Confidence calibration for classification models is vital in safety-critical decision-making scenarios and has received extensive attention. General confidence calibration methods assume training and test data are independent and identically distributed, limiting their effectiveness under covariate shifts. Previous calibration methods under covariate shift struggle with class-wise or canonical calibrations and often rely on unstable importance weighting when density ratios are large or unbounded. Given the above limitations, this paper rethinks confidence calibration under covariate shifts. First, we derive a necessary and sufficient condition for confidence calibration under covariate shifts, named Expectation consistency condition, which reveals covariate shifts do not necessarily lead to uncalibrated confidence and provides a weaker condition for confidence calibration than global covariate distribution alignment. Then, utilizing Expectation consistency condition, this paper proposes an unsupervised domain adaptation loss to calibrate confidence of the target domain, named Expectation consistency loss (ECL), which is compatible with canonical calibration, class-wise calibration, and top-label calibration. Third, we prove that computing ECL loss has the same sample complexity as Expected Calibration Error (ECE) and provide a theoretically grounded mini-batch trainable scheme for ECL loss. Finally, we validate the effectiveness of our method on both simulated and real-world covariate shift datasets.
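
The Expected Calibration Error (ECE) that the abstract uses as its sample-complexity reference point is the bin-weighted gap between accuracy and mean confidence. The sketch below is the standard equal-width top-label estimator, not the ECL loss itself; the bin count and the example predictions are illustrative assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: sum over bins of (bin share) * |accuracy - mean confidence|."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((conf, hit))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += len(members) / n * abs(accuracy - mean_conf)
    return ece


# Hypothetical predictions: two at 0.9 (both right), two at 0.6 (one right).
ece = expected_calibration_error([0.9, 0.9, 0.6, 0.6], [1, 1, 1, 0])
```

Here each bin contributes a gap of 0.1 with weight 0.5, giving an ECE of 0.1.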

2605.21550 2026-05-22 cs.LG 版本更新

PeakFocus: Bridging Peak Localization and Intensity Regression via a Unified Multi-Scale Framework for Electricity Load Forecasting

PeakFocus: 通过统一的多尺度框架桥接峰值定位与强度回归以实现电力负荷预测

Wangzhi Yu, Peng Zhu, Qing Zhao, Yiwen Jiang, Dawei Cheng

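Affiliations: School of Computer Science and Technology, Tongji University; Big Data Center, State Grid Corporation of China

AI summary: This paper proposes PeakFocus, a unified multi-scale framework for electricity load peak forecasting that couples peak localization with intensity regression, mitigating multi-scale representation conflicts and intensity smoothing to improve forecasting accuracy.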

AI中文摘要

电力负荷峰值预测（ELPF）同时预测峰值时间和强度，是有效电网调度和风险管理的前提。然而，现有方法面临三个限制。首先，它们采用预测后定位的两阶段范式，切断了时间定位和强度回归之间的联系。其次，它们仍然挣扎于多尺度表示冲突，导致峰值误判和时间错位。第三，强度回归过程中缺乏显式的峰值时间上下文，导致强度平滑，因为预测受全局平滑趋势主导。为了解决这些限制，我们提出了PeakFocus，一个统一的ELPF框架。（i）统一的峰值感知流水线（UPAP）利用三重混合损失共同监督时间定位和强度回归，并配以基于容忍度的评估协议。（ii）多尺度混合峰值定位器（MSM-PL）利用粗粒度特征来缓解局部波动导致的峰值误判，并通过级联机制将它们注入细粒度特征中以解决时间错位。（iii）位置感知解码器（LAD）将峰值时间上下文注入强度回归过程中，提供明确的指导以对抗强度平滑并提高峰值强度估计。在公共电力（ELC）数据集和我们工业级的World Large-scale Electricity Load（WLEL）数据集上的广泛实验表明，PeakFocus在时间和强度精度上均优于基线方法。

英文摘要

Electricity load peak forecasting (ELPF), simultaneously predicting peak timing and intensity, is a prerequisite for effective grid scheduling and risk management. However, existing methods face three limitations. First, they adopt a two-stage predict-then-locate paradigm, which severs the link between temporal localization and intensity regression. Second, they still struggle with the multi-scale representation conflict, leading to peak misjudgment and timing misalignment. Third, the lack of explicit peak timing context during intensity regression causes intensity smoothing because predictions are dominated by global smoothing trends. To address these limitations, we propose PeakFocus, a unified framework for ELPF. (i) A Unified Peak-Aware Pipeline (UPAP) utilizes a triple hybrid loss to jointly supervise temporal localization and intensity regression, alongside a tolerance-based evaluation protocol. (ii) A Multi-Scale Mixing Peak Locator (MSM-PL) exploits coarse-grained features to mitigate peak misjudgment caused by local fluctuations, and injects them into fine-grained features via a cascade mechanism to resolve timing misalignment. (iii) A Location-Aware Decoder (LAD) injects peak timing context into the intensity regression process, providing explicit guidance to counteract intensity smoothing and improve peak intensity estimation. Extensive experiments on the public Electricity (ELC) dataset and our industrial-scale World Large-scale Electricity Load (WLEL) dataset show that PeakFocus outperforms baselines in both timing precision and intensity estimation.
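
A tolerance-based evaluation of peak timing can be sketched as "the predicted peak counts as a hit if it falls within a few time steps of the true peak". The exact protocol is not given in the abstract, so the function names, the single-peak assumption, and the load values below are hypothetical.

```python
def peak_index(series):
    """Index of the largest value (first occurrence on ties)."""
    return max(range(len(series)), key=series.__getitem__)


def peak_hit(predicted, actual, tolerance):
    """True if the predicted peak time is within `tolerance` steps of the actual one."""
    return abs(peak_index(predicted) - peak_index(actual)) <= tolerance


def peak_intensity_error(predicted, actual):
    """Absolute error between predicted and actual peak magnitudes."""
    return abs(max(predicted) - max(actual))


actual_load = [1.0, 3.0, 9.0, 4.0]
predicted_load = [1.0, 7.0, 6.0, 4.0]
```

In this example the forecast peaks one step early and two units low, so it is a hit at tolerance 1 but a miss at tolerance 0, which shows why timing and intensity are scored separately.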

2605.21548 2026-05-22 stat.ML cs.AI cs.LG 版本更新

Local Covariate Selection for Average Causal Effect Estimation without Pretreatment and Causal Sufficiency Assumptions

局部协变量选择用于无预处理和因果充分性假设下的平均因果效应估计

Zeyu Liu, Zheng Li, Feng Xie, Yan Zeng, Hao Zhang, Kun Zhang

Affiliations: Department of Applied Statistics, Beijing Technology and Business University; Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; College of Computer Science and Artificial Intelligence, Fudan University; Department of Machine Learning, Mohamed bin Zayed University of Artificial Intelligence; Carnegie Mellon University

AI summary: This paper proposes a local learning method for covariate selection in nonparametric causal effect estimation that avoids the pretreatment and causal sufficiency assumptions, achieving accurate effect estimates at substantially lower computational cost.

AI中文摘要

我们研究了选择协变量以无偏估计总因果效应的问题。现有方法通常依赖于对所有变量的全局因果结构学习，或依赖于强假设，如因果充分性假设——观测变量不共享潜在混杂因素，或预处理假设，限制协变量只能是不受处理或结果影响的变量。这些要求在实践中往往不现实，且在高维设置中全局学习变得计算上不可行。为了解决这些挑战，我们提出了一种新颖的局部学习方法，用于非参数因果效应估计中的协变量选择，避免了预处理和因果充分性假设。我们首先刻画了一个局部边界，该边界包含至少一个有效的调整集，当且仅当存在调整集来识别因果效应时。然后我们开发了局部识别程序，以在该边界内高效地搜索。我们证明了所提出的方法是正确且完整的。在多个合成数据集和两个真实世界数据集上的实验表明，我们的方法在准确估计因果效应的同时，显著提高了计算效率。

英文摘要

We study the problem of selecting covariates for unbiased estimation of the total causal effect. Existing approaches typically rely on global causal structure learning over all variables, or on strong assumptions such as causal sufficiency (observed variables share no latent confounders) or the pretreatment assumption, which limits covariates to those unaffected by the treatment or outcome. These requirements are often unrealistic in practice, and global learning becomes computationally prohibitive in high-dimensional settings. To address these challenges, we propose a novel local learning method for covariate selection in nonparametric causal effect estimation that avoids both the pretreatment and causal sufficiency assumptions. We first characterize a local boundary that contains at least one valid adjustment set whenever one exists for identifying the causal effect, and then develop local identification procedures to efficiently search within this boundary. We prove that the proposed method is sound and complete. Experiments on multiple synthetic datasets and two real-world datasets show that our approach achieves accurate causal effect estimation while substantially improving computational efficiency.
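
For context, once a valid adjustment set has been found, the average causal effect follows from the standard covariate adjustment formula. The notation below is an assumption for illustration (binary treatment $T$, outcome $Y$, discrete adjustment set $\mathbf{Z}$) and is not taken from the paper.

```latex
\[
\mathrm{ACE}
  = \mathbb{E}[Y \mid do(T=1)] - \mathbb{E}[Y \mid do(T=0)]
  = \sum_{\mathbf{z}} \Big( \mathbb{E}[Y \mid T=1, \mathbf{Z}=\mathbf{z}]
      - \mathbb{E}[Y \mid T=0, \mathbf{Z}=\mathbf{z}] \Big)\, P(\mathbf{Z}=\mathbf{z})
\]
```

The covariate selection problem studied here is the step before this formula: deciding which observed variables may serve as $\mathbf{Z}$ without learning the full causal graph.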

2605.21544 2026-05-22 cs.LG eess.SP 版本更新

Tabular foundation models for robust calibration of near-infrared chemical sensing data

用于近红外化学传感数据稳健校准的表格基础模型

Robin Reiter, Denis Cornet, Fabien Michel, Lauriane Rouan, Gregory Beurier

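Affiliations: CIRAD; UMR AGAP Institut; Université de Montpellier; INRAE; Institut Agro; LIRMM

AI summary: This paper evaluates tabular foundation models for calibrating near-infrared chemical sensing data across regression and classification tasks. Preprocessing-optimized TabPFN achieves the best average rank in regression, while TabPFN applied directly to raw spectra ranks best in classification; classical chemometric models remain competitive on spectral outliers and extrapolated samples.

Comments 56 pages, 17 figures, including supplementary material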

AI中文摘要

近红外光谱学正日益被用作一种快速、非破坏性的化学传感技术，用于食品、制药、生物和环境样品的分析。然而，NIR传感器的实际部署仍然依赖于能够处理高维、共线性光谱、有限样本量、预处理依赖性、光谱异常值和超出校准域外推的校准模型。本文评估了表格基础模型是否能为NIR化学传感提供新的校准策略。我们对66个NIR数据集（涵盖54个回归和12个分类任务）进行了基准测试，并将直接推断原始光谱与预处理优化推断与PLS/PLS-DA、岭回归、CatBoost和一维卷积神经网络进行比较。本研究使用统一的验证框架，在此框架中预处理和模型选择仅在校准数据上进行，之后进行外部测试评估。在回归中，预处理优化的TabPFN在总体平均排名上最佳，并显著优于PLS、CatBoost、TabPFN在原始光谱上的表现以及CNN-1D，同时在统计上与岭回归相当。在分类中，直接应用于原始光谱的TabPFN提供了最佳的平均排名，性能接近优化变体。鲁棒性分析显示，TabPFN提供强的平均预测性能，但在光谱异常值和外推样本中，其优势减少，传统化学计量学模型仍具竞争力。这些结果表明，表格基础模型可以补充已建立的化学计量学工作流程用于NIR化学传感，特别是在小到中等规模的校准设置中，同时突显了需要光谱特定的先验知识和不确定性感知的部署策略。

英文摘要

Near-infrared spectroscopy is increasingly used as a rapid, non-destructive chemical sensing technology for the analysis of food, pharmaceutical, biological, and environmental samples. However, the practical deployment of NIR sensors still depends on calibration models able to handle high-dimensional, collinear spectra, limited sample sizes, preprocessing dependence, spectral outliers, and extrapolation beyond the calibration domain. Here, we evaluate whether tabular foundation models can provide a new calibration strategy for NIR chemical sensing. We benchmark TabPFN on 66 NIR datasets covering 54 regression and 12 classification tasks, and compare direct inference on raw spectra with preprocessing-optimized inference against PLS/PLS-DA, Ridge, CatBoost, and one-dimensional convolutional neural networks. The study uses a unified validation framework in which preprocessing and model selection are performed exclusively on calibration data before external test evaluation. In regression, preprocessing-optimized TabPFN achieves the best overall average rank and significantly outperforms PLS, CatBoost, TabPFN on raw spectra, and CNN-1D, while remaining statistically comparable to Ridge. In classification, TabPFN applied directly to raw spectra provides the best average rank, with performance close to the optimized variant. Robustness analyses show that TabPFN provides strong average predictive performance but that its advantage decreases on spectral outliers and extrapolated samples, where classical chemometric models remain competitive. These results suggest that tabular foundation models can complement established chemometric workflows for NIR chemical sensing, especially in small- to medium-sized calibration settings, while highlighting the need for spectroscopy-specific priors and uncertainty-aware deployment strategies.
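
The "average rank" used to compare models across many datasets can be computed as follows. This is a minimal sketch: the error values are invented, ties are broken by sort order rather than averaged, and the paper's statistical tests are not reproduced.

```python
def average_ranks(errors_by_dataset):
    """Mean rank per model across datasets (rank 1 = lowest error on that dataset)."""
    totals = {}
    for errors in errors_by_dataset:
        ordered = sorted(errors, key=errors.get)
        for rank, model in enumerate(ordered, start=1):
            totals[model] = totals.get(model, 0) + rank
    n = len(errors_by_dataset)
    return {model: total / n for model, total in totals.items()}


# Hypothetical per-dataset errors for three models.
ranks = average_ranks([
    {"A": 0.10, "B": 0.20, "C": 0.30},
    {"A": 0.20, "B": 0.10, "C": 0.30},
])
```

Ranking per dataset before averaging keeps datasets with large error scales from dominating the comparison.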

2605.21543 2026-05-22 cs.LG 版本更新

Provable Joint Decontamination for Benchmarking Multiple Large Language Models

面向多个大语言模型基准测试的可证明联合去污染

Zhenlong Liu, Hao Zeng, Hongxin Wei

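Affiliations: Department of Statistics and Data Science, Southern University of Science and Technology; Shanghai Innovation Institute

AI summary: This paper proposes a provable decontamination method for benchmarking multiple LLMs, using a joint selection procedure to control the global contamination rate and make cross-model comparisons more reliable.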

AI中文摘要

在LLM评估中，基准数据污染已成为关键挑战：当评估示例出现在一个或多个受审模型的训练数据中时，报告性能可能被夸大，跨模型比较变得不可靠。大量训练数据检测工作设计了评分来量化模型对给定数据点的记忆程度，但这些基于评分的方法缺乏理论保证。最近的符合方法为单个模型提供了可证明的假识别控制；然而，分别应用它们到每个模型会产生模型特定的基准，破坏跨模型的公平比较。在本文中，我们将多模型基准去污染正式化为一个联合选择问题，并提出联合包络符合选择（JECS），一种符合程序，能够在给定假设下实现全局污染率（GCR）控制。具体而言，JECS计算每个模型的符合p值，通过每个项目的最大值进行汇总，并从高于数据驱动阈值的右尾观测中重建一个保守的包络最大p空分布。通过将自适应Benjamini-Hochberg（BH）程序应用于包络重新缩放值，我们选择了一个具有可证明GCR控制的基准。在各种模型和基准上的广泛实验表明，JECS在保持目标GCR控制的同时，比max-p基线具有更高的功效。

英文摘要

Benchmark data contamination has become a central challenge in LLM evaluation: when evaluation examples appear in the training data of one or more audited models, reported performance can be inflated and cross-model comparisons become unreliable. A broad line of training-data detection work designs scores to quantify how strongly a model memorizes a given data point, but these score-based methods lack theoretical guarantees. Recent conformal approaches provide provable false-identification control for a single model; however, applying them separately to each model can produce model-specific benchmarks, undermining fair comparison across models. In this work, we formalize multi-model benchmark decontamination as a joint selection problem and propose Joint Envelope Conformal Selection (JECS), a conformal procedure that enables global contamination rate (GCR) control under stated assumptions. Specifically, JECS computes per-model conformal p-values, aggregates them by the per-item maximum, and reconstructs a conservative envelope of the max-p null distribution from right-tail observations above a data-driven threshold. By applying the adaptive Benjamini-Hochberg (BH) procedure to the envelope-rescaled values, we select a benchmark with provable GCR control. Extensive experiments across various models and benchmarks demonstrate that JECS achieves higher power than the max-p baseline while consistently maintaining the target GCR control.
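
Two of the ingredients named in the abstract, per-item maximum aggregation of p-values and Benjamini-Hochberg selection, can be sketched directly. This is deliberately the plain max-p baseline with the standard (non-adaptive) BH step-up rule; JECS's envelope reconstruction and rescaling are not shown, and the p-values are invented.

```python
def max_p(per_model_pvalues):
    """Aggregate each item's per-model p-values by taking the maximum."""
    return [max(ps) for ps in per_model_pvalues]


def benjamini_hochberg(pvalues, alpha):
    """Return sorted indices selected by the standard BH step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k = rank  # largest rank whose p-value passes its threshold
    return sorted(order[:k])


# Hypothetical p-values: five items, two audited models each.
item_pvalues = max_p([[0.001, 0.0005], [0.002, 0.008], [0.039, 0.01],
                      [0.041, 0.02], [0.9, 0.3]])
selected = benjamini_hochberg(item_pvalues, alpha=0.05)
```

Taking the maximum means an item is selected only when the evidence holds for every audited model, which is what yields a single shared benchmark but also makes the baseline conservative.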

2605.21542 2026-05-22 cs.LG 版本更新

Discovering Entity-Conditioned Lag Heterogeneity: A Lag-Gated Neural Audit Framework for Panel Time Series

发现实体-条件滞后异质性：一种用于面板时间序列的滞后门神经审计框架

Andi Xu

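Affiliations: School of Engineering, Jönköping University

AI summary: This paper proposes AC-GATE, a lag-gated neural audit framework for panel time series that addresses how different entities respond to historical signals over different time horizons. It combines an adaptive-conditioning encoder with a scale-invariant lag gate so that heterogeneous lags are discovered and emitted as structural outputs of the model.

Comments Preprint/technical paper. An interpretable neural audit framework for entity-conditioned lag discovery in panel time series. 10 pages, 5 figures, 16 tables. Code available at the GitHub repository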

AI中文摘要

国家层面的时间面板被广泛用于实证分析。研究人员经常需要审计不同实体在不同时间跨度上对历史信号的响应。当前方法通常无法直接提供可审计的实体特定滞后汇总。我们将其公式化为时间面板挖掘任务，并提出AC-GATE，一种具有尺度不变滞后门的适应性编码器。它通过使用可观察的实体层面代理来条件化历史观测的滞后权重分布，从而将有效的滞后作为模型的结构输出，而不是事后解释。评估基于分层审计协议，将预测校准与滞后发现分开。使用具有已知真实滞后的人工面板进行机制恢复测试，并使用两个现实世界的国家层面面板进行外部审计和压力测试。结果表明，AC-GATE可以在合成数据中恢复异质滞后结构，并在真实数据中生成非退化的、结构化的有效滞后。

英文摘要

Country-level temporal panels are widely used in empirical analysis. Researchers often need to audit how different entities respond to historical signals over different time horizons. Current approaches typically do not provide directly auditable entity-specific lag summaries. We formulate entity-conditioned heterogeneous lag discovery as a temporal panel mining task and propose AC-GATE, an Adaptive-Conditioning Encoder with a Scale-Invariant Lag Gate. It instantiates conditional Moderated Distributed Lag by using observable entity-level proxies to condition lag-weight distributions over historical observations, thereby making effective lags structural outputs of the model rather than post-hoc explanations. The evaluation is based on a layered audit protocol that separates predictive calibration from lag discovery. A synthetic panel with known ground-truth lags is used for mechanism recovery testing, and two real-world country-level panels are used for external audit and stress testing. The results show that AC-GATE can recover heterogeneous lag structure in synthetic data, and generates non-degenerate, externally structured effective lags in real data.
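
One way to read "lag-weight distributions" and "effective lags" is as a normalised weighting over past observations and its expected lag. The sketch below assumes a softmax normalisation and an expectation-based summary; neither is confirmed by the abstract, and the logits stand in for whatever the entity-conditioned gate would output.

```python
import math


def lag_weights(logits):
    """Softmax over lag logits, giving a distribution over lags 1..L."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def effective_lag(weights):
    """Expected lag under the weight distribution (lags counted from 1)."""
    return sum(lag * w for lag, w in enumerate(weights, start=1))


# Hypothetical gate outputs for two entities over three lags.
fast_entity = effective_lag(lag_weights([2.0, 0.0, -2.0]))
slow_entity = effective_lag(lag_weights([-2.0, 0.0, 2.0]))
```

Because the weights are an explicit model output, a per-entity summary like this can be audited directly rather than inferred after the fact.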

2605.21541 2026-05-22 cs.CR cs.AI cs.LG stat.ML 版本更新

Frequency-Domain Regularized Adversarial Alignment for Transferable Attacks against Closed-Source MLLMs

面向闭源多模态大语言模型可迁移攻击的频域正则化对抗对齐

Leitao Yuan, Qinghua Mao, Daizong Liu, Kun Wang, Wenjie Wang, Yan Teng, Jing Shao, Dongrui Liu

Affiliations: Shanghai Artificial Intelligence Laboratory; Zhejiang University; Shanghai Jiao Tong University; Wuhan University; Nanyang Technological University; University of Science and Technology of China

AI summary: This paper proposes FRA-Attack, which improves adversarial transferability across models through frequency-domain regularization, combining a high-pass DCT objective for feature alignment with frequency-domain gradient regularization.

AI中文摘要

多模态大语言模型（MLLMs）仍易受基于转移的针对性攻击影响，其中在开源代理编码器上优化的扰动可以泛化到闭源MLLMs。提高对抗转移性的一个关键挑战是有效捕捉不同模型间共享的内在视觉聚焦特性，使得扰动与可转移的语义线索对齐，而非代理特定行为。然而，现有方法受到空间域特征冗余和代理特定梯度信号的阻碍，影响跨模型转移性。在本文中，我们提出FRA-Attack，从统一的频域正则化视角解决这两个挑战。在特征对齐方面，对patch特征的高通DCT目标抑制冗余的全局结构，并将损失集中在承载MLLMs内在视觉聚焦的高频带。在梯度优化方面，我们引入频率域梯度正则化（FGR），一种无模型依赖的低通正则化器，仅使用几何频率坐标调节代理梯度，即不涉及代理衍生的统计量，因此FGR通过构造无模型依赖性，消除代理特定的高频伪影，同时保留可转移的低频方向。两者共同形成统一的频域转移性处理。在15个旗舰MLLMs上进行的广泛实验显示，FRA-Attack在跨模型转移性方面表现优异，特别是在GPT-5.4、Claude-Opus-4.6和Gemini-3-flash等最先进的模型上实现了最先进的性能。

英文摘要

Multimodal large language models (MLLMs) remain vulnerable to transfer-based targeted attacks, where perturbations optimized on open-source surrogate encoders can generalize to closed-source MLLMs. A key challenge for improving adversarial transferability is to effectively capture the intrinsic visual focus shared across different models, such that perturbations align with transferable semantic cues rather than surrogate-specific behaviors. However, existing methods suffer from spatial-domain feature redundancy and surrogate-specific gradient signals, thereby hindering cross-model transferability. In this paper, we propose FRA-Attack, which addresses both challenges from a unified frequency-domain regularization perspective. For feature alignment, a high-pass DCT objective on patch features suppresses redundant global structures and concentrates the loss on the high-frequency band that carries the MLLMs' intrinsic visual focus. For gradient optimization, we introduce Frequency-domain Gradient Regularization (FGR), a model-agnostic low-pass regularizer that modulates the surrogate gradient using only the geometric frequency coordinate, i.e., no surrogate-derived statistic is involved, so that FGR is model-agnostic by construction, removing surrogate-specific high-frequency artifacts while preserving transferable low-frequency directions. Together, the two components form a unified frequency-domain treatment of transferability. Extensive experiments on 15 flagship MLLMs across 7 vendors show that FRA-Attack achieves superior cross-model transferability, particularly with state-of-the-art performance on GPT-5.4, Claude-Opus-4.6 and Gemini-3-flash.
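
The idea of a low-pass filter that depends only on the frequency coordinate can be shown on a one-dimensional signal with an orthonormal DCT. This is a simplified sketch rather than FGR itself: the paper works on image gradients, and the hard cutoff `keep` and the naive O(n²) transform are assumptions made for brevity.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a 1D sequence (naive O(n^2) implementation)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out


def idct(c):
    """Inverse of the orthonormal DCT-II above."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1 / n)
        s += sum(c[k] * math.sqrt(2 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out


def low_pass(grad, keep):
    """Zero every DCT coefficient whose frequency index is >= keep."""
    coeffs = dct(grad)
    filtered = [c if k < keep else 0.0 for k, c in enumerate(coeffs)]
    return idct(filtered)


smoothed = low_pass([1.0, 2.0, 3.0, 4.0], keep=1)
```

The mask is a function of the index `k` alone, so it uses no statistic of the signal being filtered. Keeping only the zero-frequency term returns the mean at every position.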

2605.21539 2026-05-22 cs.LG 版本更新

DualOptim+: Bridging Shared and Decoupled Optimizer States for Better Machine Unlearning in Large Language Models

DualOptim+：桥接共享与解耦的优化器状态，以改进大语言模型中的机器遗忘

Xuyang Zhong, Qizhang Li, Yiwen Guo, Chen Liu

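Affiliations: Department of Computer Science, City University of Hong Kong; Independent Researcher

AI summary: This paper proposes DualOptim+, an optimization framework for improving machine unlearning in large language models. It introduces a base state and a delta state to balance the forgetting and retention objectives, and an 8-bit quantized variant to reduce memory overhead; experiments report strong results across multiple tasks.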

Comments Accepted by ICML 2026

2605.21534 2026-05-22 stat.ML cs.LG 版本更新

Adaptive RBF-KAN: A Comparative Evaluation of Dynamic Shape Parameters in Kolmogorov-Arnold Networks

自适应RBF-KAN：动态形状参数在Kolmogorov-Arnold网络中的比较评估

Roberto Cavoretto, Alessandra De Rossi, Adeeba Haider, Amir Noorizadegan

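Affiliations: Member of the INdAM Research Group GNCS; Department of Mathematics, Hong Kong Baptist University

AI summary: This paper studies the choice of dynamic shape parameters in Kolmogorov-Arnold Networks. It extends RBF-KAN with a broader family of radial basis kernels and LOOCV-based kernel scale estimation, improving adaptability to different function types.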

AI中文摘要

Kolmogorov-Arnold网络（KANs）通过可学习的单变量边缘函数近似多变量函数，通常参数化为B样条基。尽管有效，基于样条的实现可能计算成本较高。一种改进的KAN变体称为FastKAN，通过将样条替换为高斯径向基函数（RBF）来提高效率，但其依赖于固定的核和形状参数。在本工作中，我们扩展了基于RBF的KAN框架，引入了更广泛的径向基核，并通过留一验证（LOOCV）初始化核形状参数。到目前为止，这是首次将基于LOOCV的核尺度估计与深度KAN训练相结合的研究。我们还首次将Matérn和Wendland核引入KAN框架，使KAN能够超越FastKAN中使用的高斯核，提供更灵活的基函数表示。LOOCV估计提供了数据驱动的核尺度初始化，随后在网络训练中进一步优化。所提出的自适应RBF-KAN在多个二维基准函数上进行了评估。结果突显了核选择和自适应形状参数的重要性，不同核在光滑函数、不连续性和振荡模式中表现出优势。总体而言，结合基于LOOCV的初始化与自适应核学习为改进RBF-KAN模型提供了一种实用策略。

英文摘要

Kolmogorov-Arnold Networks (KANs) approximate multivariate functions using learnable univariate edge functions, typically parameterized by B-spline bases. Although effective, spline-based implementations can be computationally expensive. A modified version of KANs, called FastKAN, improves efficiency by replacing splines with Gaussian radial basis functions (RBFs), but it relies on a fixed kernel and shape parameter. In this work, we extend the RBF-based KAN framework by introducing a broader family of radial basis kernels and by initializing the kernel shape parameter using leave-one-out cross-validation (LOOCV). To the best of our knowledge, this is the first study that integrates LOOCV-based kernel scale estimation with deep KAN training. We also introduce Matérn and Wendland kernels into the KAN framework for the first time, enabling more flexible basis representations beyond the Gaussian kernel used in FastKAN. The LOOCV estimate provides a data-driven initialization of the kernel scale, which is subsequently refined during network training. The proposed adaptive RBF-KAN is evaluated on several two-dimensional benchmark functions. The results highlight the importance of kernel selection and adaptive shape parameters, with different kernels showing advantages for smooth functions, discontinuities, and oscillatory patterns. Overall, combining LOOCV-based initialization with adaptive kernel learning provides a practical strategy for improving RBF-based KAN models.
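
The three kernel families named in the abstract can be written as functions of the distance `r` and a shape parameter `eps`. The specific smoothness orders below (Matérn C2 and Wendland C2) are assumptions, since the abstract does not say which members of each family are used, and the LOOCV initialisation of `eps` is not shown.

```python
import math


def gaussian(r, eps):
    """Gaussian RBF: exp(-(eps * r)^2), the kernel used in FastKAN."""
    return math.exp(-(eps * r) ** 2)


def matern_c2(r, eps):
    """Matérn C2 RBF: (1 + eps * r) * exp(-eps * r)."""
    return (1 + eps * r) * math.exp(-eps * r)


def wendland_c2(r, eps):
    """Wendland C2 RBF: (1 - eps * r)^4 * (4 * eps * r + 1) on its support, else 0."""
    s = eps * r
    return (1 - s) ** 4 * (4 * s + 1) if s < 1 else 0.0


# Larger eps gives a narrower basis function.
narrow, wide = gaussian(1.0, 2.0), gaussian(1.0, 0.5)
```

Unlike the other two, the Wendland kernel is compactly supported, vanishing once `eps * r` reaches 1, so the shape parameter directly sets how far each basis function reaches.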

2605.21527 2026-05-22 eess.IV cs.CV cs.LG 版本更新

CryoNet: A Deep Learning Framework for Multi-Modal Debris-Covered Glacier Mapping. A Case Study of the Poiqu Basin, Central Himalaya

CryoNet：一种用于多模态表碛覆盖冰川制图的深度学习框架。以喜马拉雅山中段Poiqu流域为例

Farzaneh Barzegar, Tobias Bolch, Norbert Kuehtreiber, Silvia L. Ullo

发表机构 * University of Sannio（萨恩尼奥大学）； Graz University of Technology（格拉茨技术大学）

AI总结本研究提出CryoNet，一种利用多模态数据集的深度学习框架，用于区分裸冰冰川、表碛覆盖冰川和冰湖，并通过喜马拉雅山中段Poiqu流域的案例研究展示了其在复杂高山环境中的有效性。

Comments 15 pages, 10 figures, 5 tables. Preprint submitted to IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (JSTARS); currently under review

详情

AI中文摘要

冰川是重要的淡水储备和气候变化指示器，但其自动勾绘仍具挑战性，表碛覆盖冰川尤其如此，因为其光谱与周围地形相似。本研究提出CryoNet，一种深度学习框架，利用丰富的多模态数据集区分裸冰冰川、表碛覆盖冰川和冰湖；该数据集结合了Sentinel-2光学影像、DEM导出的地形变量、光谱指数、主成分分析（PCA）、InSAR相干性和相位、缨帽变换特征以及GLCM纹理。CryoNet是一种编码器-解码器CNN，带有嵌套跳跃连接和空间-通道Squeeze-and-Excitation（scSE）注意力机制，以ResNet101为编码器，用于捕获层次化的上下文和空间特征。研究区为喜马拉雅山中段的Poiqu流域，并通过将训练好的模型应用于阿尔卑斯山的勃朗峰山体来评估其可迁移性。我们还分析了各数据层对提升冰川制图性能的重要性。所提出的模型取得了90.52%的总体IoU、98.08%的平均召回率和92.26%的平均精确率。对于表碛覆盖冰川，CryoNet的IoU为90.46%，召回率为95.79%，精确率为94.21%。在单类指标和总体指标上，CryoNet均超过了作为最先进（SOTA）参考的DeepLabV3+、SegFormer和U-Net，证明了其在复杂高山环境中进行稳健冰川制图的有效性。

英文摘要

Glaciers play a critical role as freshwater reserves and indicators of climate change, yet their automatic delineation, especially for debris-covered glaciers, remains challenging due to spectral similarity with surrounding terrain. This study introduces CryoNet, a deep learning framework that leverages a rich multi-modal dataset combining Sentinel-2 optical imagery, DEM-derived topographic variables, spectral indices, Principal Component Analysis (PCA), InSAR coherence and phase, tasseled-cap features, and GLCM texture to discriminate clean-ice glaciers, debris-covered glaciers, and glacial lakes. CryoNet is an encoder-decoder CNN with nested skip connections and spatial-channel Squeeze-and-Excitation (scSE) attention, built upon a ResNet101 encoder to capture hierarchical contextual and spatial features. The study is conducted in the Poiqu Basin in the central Himalaya, and transferability is evaluated by applying the trained model to the Mont Blanc Massif in the Alps. We additionally analyse the importance of each data layer in improving glacier mapping performance. The proposed model achieves an overall IoU of 90.52%, mean Recall of 98.08%, and mean Precision of 92.26%. For debris-covered glaciers specifically, CryoNet obtains an IoU of 90.46%, a recall of 95.79%, and a precision of 94.21%. Across both per-class and overall metrics, CryoNet surpasses DeepLabV3+, SegFormer, and U-Net, taken as state-of-the-art (SOTA) references, demonstrating its effectiveness for robust glacier mapping in complex high-mountain environments.
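
The IoU, recall, and precision figures quoted above are standard per-class segmentation metrics. As a minimal sketch of how such numbers are computed from pixel labels (the function name and the toy labels in the usage note are hypothetical, and the paper's own evaluation code is not shown in the abstract):

```python
def class_metrics(y_true, y_pred, cls):
    """Per-class IoU, precision and recall from flat label sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)  # correctly labelled cls
    fp = sum(t != cls and p == cls for t, p in pairs)  # predicted cls, wrong
    fn = sum(t == cls and p != cls for t, p in pairs)  # missed cls pixels
    return {
        "iou": tp / (tp + fp + fn) if tp + fp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

For example, with true labels `["debris", "debris", "clean", "lake"]` and predictions `["debris", "clean", "clean", "lake"]`, the `"debris"` class has one true positive and one false negative, so its recall is 0.5 while its precision is 1.0.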

2605.21522 2026-05-22 q-bio.QM cs.AI cs.CE cs.LG stat.ML 版本更新

Protein Thoughts: Interpretable Reasoning with Tree of Thoughts and Embedding-Space Flow Matching for Protein-Protein Interaction Discovery

蛋白质思维：基于思维树和嵌入空间流匹配的可解释推理，用于蛋白质-蛋白质相互作用发现

Kingsley Yeon, Xuefeng Liu, Promit Ghosal

发表机构 * Department of Statistics and CCAM, University of Chicago（芝加哥大学统计学系与CCAM）； School of Medicine, Stanford University（斯坦福大学医学院）； Department of Statistics, University of Chicago（芝加哥大学统计学系）

AI总结本文提出了一种可解释的蛋白质-蛋白质相互作用（PPI）发现框架，通过显式推理将PPI发现转化为可解释的搜索问题，并利用嵌入空间流匹配和思维树搜索提升预测精度和可解释性。

面向社区的顶点排序用于基于参考的图压缩：一种交叉编码实证研究

Jimmy Dubuisson

发表机构 * Vantino, Geneva, Switzerland（Vantino，瑞士日内瓦）

AI总结本文提出了一种两阶段的Leiden+LLP顶点排序方法，并研究其与基于参考的压缩的交互作用，结果显示在初始顶点排序较差的图中，重新排序能显著节省比特数，且不同编码器对排序的响应具有高度一致性。

Comments 26 pages, 7 figures, 9 tables. Full reproducibility package at https://github.com/jimbotonic/Adjacently.jl. Preprint; comments welcome

详情

AI中文摘要

基于参考的图压缩将每个顶点的邻接列表相对于某个近期顶点进行编码，利用局部性压缩大规模有向图。主流工具WebGraph的BVGraph固定了单一的编码流程，并依赖单独选择的顶点排序，通常是URL字典序或分层标签传播（LLP）。排序与编码器之间的相互作用很少被测量。我们提出一种两阶段的Leiden+LLP顶点排序（先用全局LLP生成种子标签，再进行Leiden社区检测，然后在每个诱导子图上执行逐簇LLP），并研究其与基于参考的压缩之间的相互作用。在初始顶点排序较差的图上，重新排序在我们测量的每个数据集和编码器上都节省了0.3到5.4比特每边。该收益的大小基本不受编码器影响：在五个弱排序数据集中的四个上，四个独立参数化的编码器给出的Leiden+LLP相对纯LLP的收益彼此相差约±0.04 bpe以内。在URL排序的网络爬取数据上，随数据发布的排序已经编码了局部性，自适应编码器仍能从重新排序中受益，但针对URL诱导的残差结构调优的编码器（BV-HC，K>1时的CG）会受到轻微损害。为了量化排序固定后编码器选择的重要性，我们贡献了三个基于参考的编码器BG、CS和CG，它们可从最多28个候选分解中进行逐顶点的成本最优选择。每个编码器都在其各自经测试的最佳排序下运行。三者中的最佳者在每个测试数据集上都比BVGraph高压缩模式提升2-9%，且在弱排序数据集上，编码器层面的收益始终小于排序层面的收益。该编码器框架还产生自定界的比特流，支持低开销的随机访问。

英文摘要

Reference-based graph compression encodes each vertex's neighbor list relative to a recent vertex, exploiting locality to compress large directed graphs. The dominant tool, WebGraph's BVGraph, fixes a single encoding pipeline and relies on a separately chosen vertex ordering -- typically URL-lexicographic or Layered Label Propagation (LLP). The interaction between ordering and encoder is rarely measured. We propose a two-stage Leiden+LLP vertex ordering -- global LLP to seed labels, Leiden community detection, then per-cluster LLP on each induced subgraph -- and study how it interacts with reference-based compression. On graphs with poor initial vertex order, reordering saves 0.3 to 5.4 bits per edge on every dataset and encoder we measured. The size of that gain is largely insensitive to the encoder: on four of five weakly ordered datasets, four independently parameterised encoders agree on the Leiden+LLP-vs-plain-LLP gain within roughly +/- 0.04 bpe. On URL-ordered web crawls, where the distributed ordering already encodes locality, adaptive encoders still benefit from reordering, but encoders tuned to URL-induced residual structure (BV-HC, CG at K>1) are mildly hurt by it. To quantify how much encoder choice matters once ordering is fixed, we contribute three reference-based encoders -- BG, CS, and CG -- that perform per-vertex cost-optimal selection from up to 28 candidate decompositions. Each is run under its own best-tested ordering. The best of the three improves over BVGraph high-compression by 2-9% on every dataset tested, with the encoder-level gain consistently smaller than the ordering-level gain on weakly ordered datasets. The encoder framework also yields a self-delimiting bitstream that supports low-overhead random access.
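
The core idea of reference-based encoding, and why vertex ordering matters for it, can be sketched in a few lines. This is a deliberately simplified illustration: it is not the BVGraph format and not the BG, CS, or CG encoders from the paper, and the function names are hypothetical. A neighbor list is stored as a copy mask over a reference list plus gap-coded residuals.

```python
def encode(neighbors, reference):
    """Encode a sorted neighbor list against a sorted reference list."""
    present = set(neighbors)
    # One flag per reference entry: is it also a neighbor of this vertex?
    copy_mask = [v in present for v in reference]
    copied = {v for v, keep in zip(reference, copy_mask) if keep}
    residuals = [v for v in neighbors if v not in copied]
    # Gap coding: each residual is stored as the difference to its predecessor
    # (the first one as the difference to 0).
    gaps = [b - a for a, b in zip([0] + residuals, residuals)]
    return copy_mask, gaps

def decode(copy_mask, gaps, reference):
    """Invert encode(): rebuild the sorted neighbor list."""
    copied = [v for v, keep in zip(reference, copy_mask) if keep]
    residuals, total = [], 0
    for g in gaps:
        total += g
        residuals.append(total)
    return sorted(copied + residuals)
```

When an ordering places similar vertices next to each other, more neighbors are covered by the copy mask and the remaining gaps are small, which is the locality that a reordering such as Leiden+LLP aims to create.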

2605.21507 2026-05-22 physics.ao-ph cs.AI cs.CE cs.LG 版本更新

Visibility nowcasting in South Korea: a machine learning approach to class imbalance and distribution shift

韩国能见度临近预报：一种应对类别不平衡和分布偏移的机器学习方法

Bong Gyun Shin, Chan Sik Lee, Hyesun Suh

发表机构 * Department of AI Big Data（人工智能大数据系）； Daejin University（大真大学）； Department of Statistics and Actuarial Science（统计与精算科学系）； Soongsil University（崇实大学）； College of Artificial Intelligence Convergence（人工智能融合学院）

AI总结本文提出了一种机器学习方法，用于临近预报韩国六个主要城市的大气能见度，通过SMOTENC和CTGAN处理数据不平衡，并用结合机器学习和深度学习模型的集成方法进行评估；结果发现训练期与测试期之间的分布偏移导致预测性能下降，强调了在时间序列数据上部署临近预报模型时考虑外部环境因素变化的重要性。

Comments Published in Theoretical and Applied Climatology

Journal ref Theoretical and Applied Climatology, vol. 157, art. no. 283, 2026

详情

DOI: 10.1007/s00704-026-06219-6

AI中文摘要

大气能见度是交通安全和空气质量管理的关键变量，然而，由于气象条件与空气污染物之间的复杂相互作用以及低能见度事件的稀少，准确预测仍然具有挑战性。本研究提出一种机器学习框架，用于临近预报韩国六个主要城市的能见度。为处理2018-2020年训练数据中的不平衡问题，我们应用了面向名义型和连续型特征的合成少数类过采样技术（SMOTENC）以及条件表格生成对抗网络（CTGAN）。随后采用结合机器学习和深度学习模型的集成方法，并在2021年测试数据集上进行评估。结果显示，测试集上的预测性能相比交叉验证阶段明显下降。这种退化归因于训练期与测试期之间的分布偏移；通过计算SHAP分析识别出的最具影响力特征的Wasserstein距离，这一点得到了定量确认。总体而言，本研究提出了一种旨在同时应对数据不平衡和时间分布偏移双重挑战的方法，并强调在时间序列数据上部署临近预报模型时必须考虑不断变化的外部环境因素。

英文摘要

Atmospheric visibility is a critical variable for transportation safety and air quality management, however, accurate prediction remains challenging due to the complex interactions between meteorological conditions and air pollutants, as well as the rarity of low-visibility events. This study introduces a machine learning framework to nowcast visibility in six major South Korean cities. To handle the imbalance in the 2018-2020 training data, we applied the Synthetic Minority Over-sampling Technique with Nominal and Continuous (SMOTENC) and Conditional Tabular Generative Adversarial Network (CTGAN). An ensemble approach combining machine learning and deep learning models was then used and evaluated on a 2021 test dataset. The results revealed a marked decline in predictive performance in the test set compared to the cross-validation phase. This degradation was attributed to a distributional shift between training and testing periods, which was quantitatively confirmed by measuring the Wasserstein distance of the most influential feature identified by SHAP analysis. In general, this study presents a methodology that aims to simultaneously address the dual challenges of data imbalance and temporal distributional shifts, and emphasizes the necessity of accounting for evolving external environmental factors when implementing nowcasting models on time-series data.
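
The abstract quantifies the train/test shift with a Wasserstein distance on a single feature. A minimal sketch of the one-dimensional case follows, under the simplifying assumption that both samples have the same size (then the distance is the mean absolute difference between sorted values). The function name is hypothetical and the authors' actual implementation is not described in the abstract.

```python
def wasserstein_1d(a, b):
    """1-Wasserstein distance between two equal-size 1-D samples."""
    if len(a) != len(b) or not a:
        raise ValueError("this sketch needs two non-empty samples of equal size")
    # With equal sizes, optimal transport matches the i-th smallest values.
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)
```

A feature whose test-period values are the training values shifted by a constant `c` yields a distance of exactly `|c|`, which makes the metric easy to read as "how far the distribution moved" in the feature's own units.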

2605.21502 2026-05-22 q-bio.MN cs.AI cs.LG 版本更新

Graph neural network explanations reveal a topological signature of disease-associated hubs in biological networks

图神经网络解释揭示了生物网络中与疾病相关的枢纽的拓扑特征

Kyle Higgins, Ivan Laponogov, Dennis Veselkov, Kirill Veselkov

发表机构 * Division of Cancer, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London（伦敦帝国学院医学院外科与癌症系癌症学部）； Department of Computing, Imperial College London（伦敦帝国学院计算系）； Department of Environmental Health Sciences, Yale University（耶鲁大学环境健康科学系）

AI总结本文研究了图神经网络在生物网络中识别疾病相关结构的方法，发现不同解释方法在稀疏单节点驱动和分布式路径信号中有不同的表现，并提出了一种结合壳层枢纽评分和解释器共识排名的框架，提升了对癌症基因的优先级排序和生物学相关分子的恢复能力。

Comments 25 pages (excluding supplement), 7 figures, 7 supplementary tables

详情

AI中文摘要

图神经网络（GNNs）越来越多地用于建模生物系统，但后验解释方法恢复有意义的分子机制的可靠性仍不清楚。本文系统评估了四种广泛使用的解释方法：显著性归因（SA）、集成梯度（IG）、GNNExplainer 和层间相关传播（LRP），以识别乳腺癌RNA-seq数据在蛋白质-蛋白质相互作用网络上的疾病相关结构。通过合成基准测试，我们发现解释方法恢复了不同的信号组织：SA在稀疏单节点驱动方面表现最佳，而IG和LRP更倾向于恢复分布式的路径样和级联样信号。在TCGA BRCA数据中，我们识别出一种一致的拓扑特征，即疾病相关枢纽的归因在最近的1跳邻居中达到峰值，并在后续网络壳层中衰减，这种模式在IG和LRP中最为显著，并与已知癌症枢纽的强富集相关。我们进一步观察到局部枢纽富集与全局基因排名性能之间的权衡，IG优化局部富集，而SA在全局区分方面表现更优。受这些互补行为的启发，我们提出了一种结合基于壳层的枢纽评分和解释器共识排名的框架。共识评分提高了对经典癌症基因（TP53、BRCA1、ESR1、MYC）的优先级排序，减少了对节点度数的依赖，并且在调优时优于单独的方法。通路富集进一步揭示了对生物上一致的癌症程序的改进恢复，包括ERBB2、RTK、MAPK、免疫和细胞因子信号。这些结果表明，拓扑感知的图解释整合可以提高生物可解释性和生物相关分子的恢复能力。

英文摘要

Graph neural networks (GNNs) are increasingly used to model biological systems, yet the reliability of post-hoc explanation methods for recovering meaningful molecular mechanisms remains unclear. Here, we systematically evaluate four widely used approaches: Saliency Attribution (SA), Integrated Gradients (IG), GNNExplainer, and Layer-wise Relevance Propagation (LRP) for identifying disease-relevant structure in breast cancer RNA-seq data projected onto a protein-protein interaction network. Using synthetic benchmarks with known ground-truth motifs, we show that explanation methods recover distinct signal organizations: SA performs best for sparse single-node drivers, whereas IG and LRP preferentially recover distributed pathway-like and cascade-like signals. In TCGA BRCA data, we identify a consistent topological signature of disease-associated hubs in which attribution peaks in the immediate 1-hop neighborhood and decays across successive network shells, a pattern most pronounced for IG and LRP and associated with strong enrichment of known cancer hubs. We further observe a trade-off between local hub enrichment and global gene ranking performance, with IG optimizing local enrichment and SA achieving superior global discrimination. Motivated by these complementary behaviors, we introduce a framework combining a shell-based hub score with consensus ranking across explainers. Consensus scores improve prioritization of canonical cancer genes (TP53, BRCA1, ESR1, MYC), reduce dependence on node degree, and, especially when tuned, outperform individual methods. Pathway enrichment further reveals improved recovery of biologically coherent cancer programs, including ERBB2, RTK, MAPK, immune, and cytokine signaling. Together, these results demonstrate that topology-aware integration of graph explanations can improve biological interpretability and biologically relevant molecular recovery.
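
One simple way to realize "consensus ranking across explainers" is to average each gene's rank over the explanation methods. The sketch below uses mean-rank (Borda-style) aggregation as an assumption: the abstract does not specify the aggregation rule, the shell-based hub score is not reproduced, and the function names are hypothetical.

```python
def ranks(scores):
    """Map gene -> rank, where rank 1 is the highest attribution score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {gene: i + 1 for i, gene in enumerate(ordered)}

def consensus_ranking(explainer_scores):
    """Order genes by their mean rank across all explainers (best first)."""
    per_explainer = [ranks(s) for s in explainer_scores.values()]
    genes = list(per_explainer[0])
    mean_rank = {
        g: sum(r[g] for r in per_explainer) / len(per_explainer) for g in genes
    }
    return sorted(genes, key=mean_rank.get)
```

Working on ranks rather than raw scores avoids having to calibrate attribution magnitudes, which differ widely between methods such as SA, IG, and LRP.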

2605.21499 2026-05-22 physics.flu-dyn cs.LG 版本更新

Conditional Neural Field based Reduced Order Model for Dynamic Ditching Load Prediction

基于条件神经场的降阶模型用于动态水上迫降载荷预测

Henning Schwarz, Pyei Phyo Lin, Jens-Peter M. Zemke, Thomas Rung

发表机构 * Institute for Fluid Dynamics and Ship Theory, Hamburg University of Technology, Am Schwarzenberg-Campus 4, D-21073 Hamburg, Germany（流体动力学与船舶理论研究所，汉堡技术大学，Schwarzenberg Campus 4，德国汉堡，D-21073）； Institute of Mathematics, Hamburg University of Technology, Am Schwarzenberg-Campus 3, D-21073 Hamburg, Germany（数学研究所，汉堡技术大学，Schwarzenberg Campus 3，德国汉堡，D-21073）

AI总结本文提出一种基于条件神经场的降阶模型，用于预测飞机水上迫降载荷；该模型不依赖空间离散化，与LSTM网络结合后实现了高精度的时空预测，并在不同空间离散化条件下展示了良好的重建能力。

详情

AI中文摘要

基于网格的神经网络（如卷积自编码器）在计算流体力学中被广泛用于基于降维的代理模型。近年来，条件神经场等基于坐标的方法逐渐兴起。它们不依赖空间离散化，这一特性对计算流体力学中的多种应用十分有益。本文讨论了使用条件神经场方法对飞机水上迫降载荷进行时空预测。模型使用DLR-D150飞机机身动态载荷的两个数据集进行评估，其中一个对应单一固定的空间离散化，另一个包含来自不同离散化的数据。当与潜在空间中的长短期记忆（LSTM）网络结合时，基于神经场的模型在第一个数据集上的时空预测精度接近依赖网格的卷积自编码器模型，而参数量显著更少。第二个数据集的结果展示了基于神经场的方法在异构空间离散化下准确重建水上迫降载荷的能力。这使得可以灵活使用针对不同几何形状和/或离散化生成的训练数据集，并利用代理模型预测不同构型的载荷。

英文摘要

Grid-based neural networks such as convolutional autoencoders are widely used in dimension-reduction-based surrogate models for computational fluid dynamics. In recent years, the use of coordinate-based approaches like conditional neural fields has emerged. Their independence from the spatial discretization is a beneficial feature for various applications in computational fluid dynamics. This paper discusses the spatio-temporal prediction of aircraft ditching loads using a conditional neural field approach. The model is evaluated using two datasets for the dynamic loads of the fuselage of a DLR-D150 aircraft, one of which relates to a single fixed spatial discretization and the other of which includes data from different discretizations. When paired with a long short-term memory (LSTM) network in the latent space, the neural field-based model achieves a spatio-temporal prediction accuracy for the first dataset that is close to that of grid-dependent convolutional autoencoder-based models, with significantly fewer parameters. Results for the second dataset demonstrate the ability of the neural field-based approach to reconstruct ditching loads accurately for heterogeneous spatial discretizations. This allows for flexible use of training datasets generated for different geometries and/or discretizations, as well as the use of the surrogate model to predict loads for different configurations.

2605.21496 2026-05-22 cs.LG cs.AI cs.CL 版本更新

HealthCraft: A Reinforcement Learning Safety Environment for Emergency Medicine

HealthCraft：一种面向急诊医学的强化学习安全环境

Brandon Dent

发表机构 * GOATnote Inc.（GOATnote公司）

AI总结本文提出HealthCraft，首个公开的强化学习环境，用于在真实急诊医学条件下奖励轨迹级安全，通过FHIR R4世界状态、24个MCP工具和双层评分标准，评估模型在急诊任务中的安全性和性能，揭示了模型在多步骤工作流中的安全失败问题。

Comments 16 pages, 5 figures, 6 tables. Code, task suite, and Docker bundle: https://github.com/GOATnote-Inc/healthcraft

详情

AI中文摘要

前沿语言模型被部署到临床工作流程中的速度，超过了安全评估它们所需基础设施的建设速度。静态医学问答基准遗漏了急诊医学中至关重要的失败模式：轨迹级安全崩溃、工具误用以及在持续临床压力下的让步。我们提出HealthCraft，这是首个在真实急诊医学条件下奖励轨迹级安全的公开强化学习环境，改编自Corecraft。它建立在包含14种实体类型和3,987个种子实体的FHIR R4世界状态之上，提供24个MCP工具，并定义了双层评分标准：只要违反任何一条安全关键标准，奖励即被置零。我们发布了涵盖六个类别的195个任务，按2,255条二元标准（其中515条为安全关键标准）评分；事后追加的10个负类任务将其扩展到205个任务和2,337条标准。两个前沿模型的V8结果显示，Claude Opus 4.6的Pass@1为24.8% [21.5-28.4]，GPT-5.4为12.6% [10.2-15.6]，安全失败率分别为27.5%和34.0%。在最接近真实急诊护理的多步骤工作流上，尽管模型在单个步骤上具备部分能力，性能仍降至接近零（Claude 1.0%，GPT-5.4 0.0%）。在试点v2到v8之间修复的六个基础设施错误改变了哪个模型“看起来更强”的排序，这表明基础设施的保真度本身就是测量的一部分。确定性的LLM评判叠加层限制了评估者噪声；一次60轮的负类冒烟试点表明，该奖励信号不能直接安全地用于训练：克制类标准以0.929的比例通过，这种可被钻空子的特性对评估框架尚可容忍，对训练奖励则不可接受。我们按照Corecraft第5.2节搭建了与Megatron+SGLang+GRPO训练循环对接的脚手架，并将训练奖励的消融实验留作未来工作。环境、任务、评分标准和评估框架均以Apache 2.0许可发布。

英文摘要

Frontier language models are being deployed into clinical workflows faster than the infrastructure to evaluate them safely. Static medical-QA benchmarks miss the failure modes that matter in emergency medicine: trajectory-level safety collapse, tool misuse, and capitulation under sustained clinical pressure. We present HealthCraft, the first public reinforcement-learning environment that rewards trajectory-level safety under realistic emergency-medicine conditions, adapted from Corecraft. It is built on a FHIR R4 world state with 14 entity types and 3,987 seed entities, exposes 24 MCP tools, and defines a dual-layer rubric that zeroes reward whenever any safety-critical criterion is violated. We release 195 tasks across six categories, graded against 2,255 binary criteria (515 safety-critical); a post-hoc 10-task negative-class slate extends this to 205 tasks and 2,337 criteria. V8 results on two frontier models show Claude Opus 4.6 at Pass@1 24.8% [21.5-28.4] and GPT-5.4 at 12.6% [10.2-15.6], with safety-failure rates of 27.5% and 34.0%. On multi-step workflows - the closest proxy to real emergency care - performance collapses to near zero (Claude 1.0%, GPT-5.4 0.0%) despite partial competence on individual steps. Six infrastructure bugs fixed between pilots v2 and v8 re-ordered which model "looks stronger," evidence that infrastructure fidelity is part of the measurement. A deterministic LLM-judge overlay bounds evaluator noise, and a 60-run negative-class smoke pilot shows the reward signal is not drop-in training-safe: restraint criteria pass at 0.929 prevalence, a gameability an eval harness can tolerate but a training reward cannot. We scaffold coupling to a Megatron+SGLang+GRPO loop per Corecraft Section 5.2 and leave training-reward ablations as future work. Environment, tasks, rubrics, and harness are released under Apache 2.0.
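
The "dual-layer rubric that zeroes reward whenever any safety-critical criterion is violated" can be sketched as a gate placed in front of an ordinary score. In the sketch below, the `Criterion` class, the function name, and the choice of "fraction of criteria passed" as the ungated score are all assumptions; the abstract only states the zeroing rule, not how the non-zero reward is computed.

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    passed: bool
    safety_critical: bool = False

def trajectory_reward(criteria):
    """Safety gate first, then an (assumed) fraction-passed score."""
    # Layer 1: any violated safety-critical criterion zeroes the reward.
    if any(c.safety_critical and not c.passed for c in criteria):
        return 0.0
    # Layer 2 (assumption): share of binary criteria that passed.
    return sum(c.passed for c in criteria) / len(criteria)
```

The gate makes the reward non-compensatory: no amount of otherwise good behavior in a trajectory can offset a single safety-critical failure.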

2605.21494 2026-05-22 cs.LG 版本更新

Double descent for least-squares interpolation on contaminated data: A simulation study

受污染数据上最小二乘插值的双下降现象：一项模拟研究

Tino Werner

发表机构 * Institute for Mathematics, Carl von Ossietzky University Oldenburg（奥尔登堡卡尔·冯·奥西特齐克大学数学研究所）

AI总结本文研究了在受污染数据的线性回归中是否会出现双下降现象，比较了最小二乘插值估计器与几种稳健替代方法的性能，发现大幅度的过参数化确实会带来双下降现象，使最小二乘插值器的泛化性能优于稳健替代方法。

详情

AI中文摘要

过参数化模型尽管根据经典统计理论应容易过拟合，但能表现出出色的泛化性能。双下降现象的发现，即在达到一定模型复杂度后泛化误差减小，开辟了新的研究方向。稳健统计考虑在受污染数据上的统计估计，由于现实数据不满足假设，导致数据点相对于假设的“理想”分布出现异常值，可能严重扭曲任何经典估计器。本文探讨在受污染训练数据的线性回归设置中是否会出现双下降现象。比较了高度非鲁棒的最小二乘插值估计器与几种鲁棒替代方法的性能。结果表明，大规模过参数化确实导致双下降现象，使最小二乘插值器的泛化性能非常优异，优于鲁棒替代方法。

英文摘要

Overparametrized models can exhibit an excellent generalization performance, although they should be prone to overfitting according to classical statistical theory. The discovery of the "double descent", indicating that the generalization error decreases after a certain model complexity has been reached, opened a new line of research. Robust statistics considers statistical estimation on contaminated data, which, due to assumptions that do not hold on real data, let data points appear as outliers w.r.t. the assumed "ideal" distribution, potentially severely distorting any classical estimator. We address the question whether a double descent phenomenon can be observed in a linear regression setting with contaminated training data. We compare the performance of the highly non-robust least-squares interpolation estimator with several robust alternatives. It turns out that large overparametrization indeed allows for a double descent phenomenon, resulting in a very good generalization performance of the least-squares interpolator, surpassing that of the robust alternatives.

2605.21493 2026-05-22 cs.LG cs.AI cs.CV 版本更新

Don't Collapse Your Features: Why CenterLoss Hurts OOD Detection and Multi-Scale Mahalanobis Wins

不要让特征坍缩：为什么CenterLoss会损害OOD检测，而多尺度Mahalanobis距离胜出

Rahul D Ray

发表机构 * Department of Electronics and Electrical Engineering（电子与电气工程系）

AI总结本文提出GOEN方法，通过多尺度特征、L2归一化、Mahalanobis距离和校准头来提升OOD检测性能，发现CenterLoss会降低OOD检测性能，而GOEN-NoCenterLoss在CIFAR-10基准上表现优于其他基线方法。

详情

AI中文摘要

检测分布外（OOD）输入的能力是安全部署机器学习系统的基础。然而，现有方法往往依赖仅针对分类准确率优化的特征表示，忽视了认知（epistemic）不确定性的独特要求。我们提出GOEN（Geometry-Optimised Epistemic Network，几何优化的认知网络），这是一个简单的流程，结合了多尺度特征、L2归一化、Mahalanobis距离以及一个用真实困难OOD样本训练的校准头。通过系统的消融实验，我们得到一个反直觉的发现：CenterLoss这一常用于提升特征紧凑性的正则化器，尽管提高了分类准确率，却显著降低了OOD检测性能，使平均OOD AUROC从0.9483降至0.9366。最佳变体GOEN-NoCenterLoss在CIFAR-10基准上取得了0.9483的平均OOD AUROC，超过了包括深度集成（0.8827）、KNN（0.8967）和ODIN（0.8870）在内的所有基线，同时保持了有竞争力的分布内准确率。我们的结果对“更好的分类几何会自动带来更好的认知不确定性”这一普遍假设提出了挑战。相反，我们表明过于紧致的特征簇会压缩类间间隔，并扭曲有效OOD检测所需的协方差结构。GOEN十分高效，在单个GPU上训练不到20分钟，为构建能够可靠识别自身局限的AI系统提供了实用蓝图。

英文摘要

The ability to detect out-of-distribution (OOD) inputs is fundamental to safe deployment of machine learning systems. Yet, current methods often rely on feature representations that are optimised solely for classification accuracy, neglecting the distinct requirements of epistemic uncertainty. We introduce GOEN (Geometry-Optimised Epistemic Network), a simple pipeline that combines multi-scale features, L2 normalisation, Mahalanobis distance, and a calibration head trained with real hard OOD examples. Through systematic ablation we uncover a counter-intuitive finding: CenterLoss, a popular regulariser for feature compactness, significantly degrades OOD detection performance, reducing average OOD AUROC from 0.9483 to 0.9366 despite improving classification accuracy. The best variant, GOEN-NoCenterLoss, achieves an average OOD AUROC of 0.9483, surpassing all baselines including deep ensembles (0.8827), KNN (0.8967), and ODIN (0.8870) on CIFAR-10 benchmarks, while maintaining competitive in-distribution accuracy. Our results challenge the prevailing assumption that better classification geometry automatically leads to better epistemic uncertainty. Instead, we show that overly tight feature clusters compress inter-class margins and distort the covariance structure needed for effective OOD detection. GOEN is efficient, training in under 20 minutes on a single GPU, and provides a practical blueprint for building AI systems that reliably recognise their own limitations.
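
Mahalanobis-distance OOD scoring, one ingredient of the pipeline described above, can be illustrated with a two-dimensional toy. The sketch fits a single Gaussian to in-distribution feature vectors and scores a new point by its squared Mahalanobis distance. The function names are hypothetical, and the restriction to 2-D with a closed-form inverse is purely for readability; GOEN's multi-scale features, L2 normalisation, and calibration head are not reproduced.

```python
def fit_gaussian_2d(points):
    """Sample mean and unbiased covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance; larger values suggest an OOD input."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2  # must be non-zero (non-degenerate covariance)
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Uses the closed-form inverse of a symmetric 2x2 matrix.
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
```

Because the score depends on the inverse covariance, anything that distorts the covariance of the in-distribution features changes the score for every input, which is consistent with the abstract's explanation of why overly tight clusters can hurt detection.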

2605.21492 2026-05-22 cs.LG cs.AI cs.LO stat.ML 版本更新

The Attribution Impossibility: No Feature Ranking Is Faithful, Stable, and Complete Under Collinearity

特征归因不可能性：在共线性下，没有任何特征排名是忠实、稳定和完整的

Drake Caraker, Bryan Arnold, David Rhoads

发表机构 * Independent Researchers（独立研究人员）

AI总结本文研究了在共线性情况下特征排名的不可能性，证明了无法同时满足忠实、稳定和完整性的条件，并提出了DASH方法作为解决途径，同时通过形式化验证展示了其理论基础和实际应用影响。

Comments 66 pages, 12 figures, 305 Lean 4 theorems. Code at https://github.com/DrakeCaraker/dash-impossibility-lean

详情

DOI: 10.5281/zenodo.19468379

AI中文摘要

当特征共线时，没有任何特征排名能够同时做到忠实、稳定和完整。对于共线的特征对，排名等同于抛硬币。我们证明了这一不可能性，针对四类模型对其进行了量化，通过集成平均（DASH）加以解决，并用305个Lean 4定理对其进行了机器验证。我们刻画了完整的归因设计空间：恰好存在两类方法，即忠实-完整方法（不稳定，排名翻转的比例最高可达50%）和DASH这样的集成方法（稳定，对对称特征报告平局），没有任何方法落在这一二分之外。这一不可能性是定量的：归因比在梯度提升中按1/(1-rho^2)发散，在Lasso中为无穷大，在随机森林中收敛。DASH（Diversified Aggregation of SHAP）被证明在无偏聚合中是帕累托最优的，达到Cramer-Rao方差下界，并具有紧的集成规模公式。在对77个公共数据集的调查中，68%表现出归因不稳定性。当特征具有相等的因果效应时，改用条件SHAP也无法摆脱这一不可能性。该框架包括实用的诊断工具（Z检验工作流程和单模型筛查工具），并对公平性审计有直接影响：在共线性下，基于SHAP的代理歧视审计被证明是不可靠的。设计空间定理、诊断方法和不可能性结论均在Lean 4中得到机器验证（由16条公理得出305个定理，0个sorry）。据我们所知，这是可解释AI领域首个经形式化验证的不可能性结果。

英文摘要

No feature ranking can be simultaneously faithful, stable, and complete when features are collinear. For collinear pairs, ranking reduces to a coin flip. We prove this impossibility, quantify it for four model classes, resolve it via ensemble averaging (DASH), and machine-verify it with 305 Lean 4 theorems. We characterize the complete attribution design space: exactly two families of methods exist -- faithful-complete methods (unstable, with rankings that flip up to 50% of the time) and ensemble methods like DASH (stable, reporting ties for symmetric features) -- and no method lies outside this dichotomy. The impossibility is quantitative: the attribution ratio diverges as 1/(1-rho^2) for gradient boosting, is infinite for Lasso, and converges for random forests. DASH (Diversified Aggregation of SHAP) is provably Pareto-optimal among unbiased aggregations, achieving the Cramer-Rao variance bound with a tight ensemble size formula. In a survey of 77 public datasets, 68% exhibit attribution instability. Switching to conditional SHAP does not escape the impossibility when features have equal causal effects. The framework includes practical diagnostics -- a Z-test workflow and single-model screening tool -- and has direct consequences for fairness auditing: SHAP-based proxy discrimination audits are provably unreliable under collinearity. The design space theorem, diagnostics, and impossibility are mechanically verified in Lean 4 (305 theorems from 16 axioms, 0 sorry) -- to our knowledge, the first formally verified impossibility in explainable AI.
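
To get a feel for the stated 1/(1-rho^2) rate, the expression can simply be evaluated for increasing correlation. The sketch below does only that: it is not the paper's derivation, and the function name is hypothetical. What exactly the "attribution ratio" measures is defined in the paper, not in the abstract.

```python
def attribution_ratio(rho: float) -> float:
    """Evaluate 1 / (1 - rho^2), the rate quoted for gradient boosting."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return 1.0 / (1.0 - rho ** 2)

# The value grows without bound as |rho| approaches 1.
for rho in (0.0, 0.5, 0.9, 0.99):
    print(f"rho={rho:<4} ratio={attribution_ratio(rho):.2f}")
```

The expression is symmetric in the sign of `rho`, so strongly negative correlation is as problematic as strongly positive correlation under this rate.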

2605.21491 2026-05-22 cs.LG cs.AI cs.CL 版本更新

Teaching Language Models to Forecast Research Success Through Comparative Idea Evaluation

通过比较式想法评估教语言模型预测研究成败

Srujan P Mule, Aniketh Garikaparthi, Manasi Patwardhan

发表机构 * IISER Pune（印度科学教育与研究学院浦那分校）

AI总结本研究探讨了语言模型能否在无需实验的情况下预测研究想法的实证成功，通过构建基于PapersWithCode客观结果的11488对想法数据集，发现通过强化学习可提升模型性能至71.35%，证明小型语言模型可以作为有效的客观验证器，为自主科学发现提供可扩展路径。

Comments ACL 2026 Findings

详情

AI中文摘要

随着语言模型通过自动化假设生成和实现加速科学研究，出现了一个新的瓶颈：在没有彻底实验的情况下评估和过滤数百个AI生成的想法。我们问语言模型是否能学会在任何实验运行之前预测研究想法的实证成功。我们研究了比较实证预测：给定一个基准特定的研究目标和两个候选想法，预测哪个将实现更好的基准性能。我们构建了一个基于PapersWithCode客观结果的11,488对想法数据集。尽管现成的8B参数模型表现不佳（30%准确率），SFT显著提升了性能至77.1%，优于GPT-5（61.1%）。通过将评估框架为推理任务，通过可验证奖励的强化学习（RLVR），我们训练模型发现潜在的推理路径，实现71.35%的准确率，并具有可解释的依据。通过额外的消融和分布外测试，我们展示了对表面启发式的鲁棒性，并转移到了跨领域时间拆分测试集和独立构建的测试集。我们的结果表明，计算高效的轻量级语言模型可以作为有效的、客观的验证器，为自主科学发现提供可扩展的路径。

英文摘要

As language models accelerate scientific research by automating hypothesis generation and implementation, a new bottleneck emerges: evaluating and filtering hundreds of AI-generated ideas without exhaustive experimentation. We ask whether LMs can learn to forecast the empirical success of research ideas before any experiments are run. We study comparative empirical forecasting: given a benchmark-specific research goal and two candidate ideas, predict which will achieve better benchmark performance. We construct a dataset of 11,488 idea pairs grounded in objective outcomes from PapersWithCode. While off-the-shelf 8B-parameter models struggle (30% acc.), SFT dramatically boosts performance to 77.1%, outperforming GPT-5 (61.1%). By framing evaluation as a reasoning task via Reinforcement Learning with Verifiable Rewards (RLVR), we train models to discover latent reasoning paths, achieving 71.35% acc. with interpretable justifications. Through additional ablations and out-of-distribution tests, we show robustness to surface-level heuristics and transfer to both a cross-domain time-split test set and an independently constructed test set. Our results demonstrate that compute-efficient small language models can serve as effective, objective verifiers, offering a scalable path for autonomous scientific discovery.

2605.21490 2026-05-22 cs.LG cs.CR 版本更新

Temporal Contrastive Transformer for Financial Crime Detection: Self-Supervised Sequence Embeddings via Predictive Contrastive Coding

用于金融犯罪检测的时间对比Transformer：通过预测对比编码实现自监督序列嵌入

Danny Butvinik, Yonit Marcus, Nitzan Tal, Gabrielle Azoulay

发表机构 * NICE Actimize

AI总结本文提出了一种名为时间对比Transformer（TCT）的表示学习框架，旨在捕捉金融交易序列中的时间动态。通过自监督对比目标训练模型，生成编码时间行为模式的嵌入，以支持下游的欺诈检测任务。实验结果显示，嵌入本身能实现有意义的预测性能（AUC 0.8644），但与领域工程特征结合后，相对基线未观察到可测量的提升（AUC 0.9205 对 0.9245），表明学习到的表示与现有特征抽象有较大重叠。这些发现表明TCT是一种有前景的表示学习方法，能够捕捉相关的行为信号，同时凸显了在强领域特征之上提供增量价值的难度。

Comments 10 pages, 4 figures, one table

详情

AI中文摘要

我们提出时间对比Transformer（TCT），这是一种表示学习框架，旨在捕捉金融交易序列中与上下文相关的时间动态。该模型通过自监督对比目标进行训练，生成编码随时间变化的行为模式的嵌入，以支持下游的欺诈检测任务。我们将学习到的嵌入作为梯度提升分类器的输入特征，在贴近实际的环境中评估TCT。实验结果表明，仅使用嵌入就能取得有意义的预测性能（AUC 0.8644），说明模型捕捉到了非平凡的时间结构。然而，与领域工程特征结合后，相对基线未观察到可测量的提升（AUC 0.9205 对 0.9245），这表明学习到的表示与现有特征抽象在很大程度上重叠。这些发现表明TCT是一种有前景的表示学习方法，能够捕捉相关的行为信号，同时也凸显了在强领域特征之上提供增量价值的难度。这些结果反映了面向金融犯罪检测的时间表示学习所处的中间发展阶段，并促使人们进一步研究模型架构、训练目标和整合策略。在这一早期阶段，取得与强特征工程基线相当的性能本身就是有意义的结果，说明学习到的表示无需人工特征工程即可近似领域特定特征。尽管尚未达到可投入生产的程度，这些结果为减少金融犯罪检测对特征工程的依赖指出了有希望的方向。

英文摘要

We introduce the Temporal Contrastive Transformer (TCT), a representation learning framework designed to capture contextual temporal dynamics in sequences of financial transactions. The model is trained using a self-supervised contrastive objective to produce embeddings that encode behavioral patterns over time, with the goal of supporting downstream fraud detection tasks. We evaluate TCT in a realistic setting by using the learned embeddings as input features to a gradient boosting classifier. Experimental results show that embeddings alone achieve meaningful predictive performance (AUC 0.8644), indicating that the model captures non-trivial temporal structure. However, when combined with domain-engineered features, no measurable improvement is observed over the baseline (AUC 0.9205 vs. 0.9245), suggesting that the learned representations largely overlap with existing feature abstractions. These findings position TCT as a promising representation learning approach that captures relevant behavioral signal, while highlighting the challenges of achieving additive value over strong domain features. The results reflect an intermediate stage in the development of temporal representation learning for financial crime detection and motivate further research on model architecture, training objectives, and integration strategies. At this early stage, achieving performance comparable to a strong feature-engineered baseline is itself a meaningful outcome, indicating that learned representations approximate domain-specific features without manual engineering. While not yet production-ready, these results point to a promising direction for reducing reliance on feature engineering in financial crime detection.
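
A self-supervised contrastive objective of the kind mentioned above is often written as an InfoNCE loss: the embedding of a context should score higher against its true continuation than against negatives. The sketch below shows that common form as an assumption; the abstract does not give TCT's exact objective, and the function name, the dot-product similarity, and the temperature value are illustrative choices.

```python
import math

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: cross-entropy of picking the positive."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # Index 0 holds the positive pair; the rest are negatives.
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]
```

The loss approaches zero when the anchor is far more similar to the positive than to any negative, and grows when a negative scores as high as, or higher than, the positive.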

2605.21282 2026-05-22 cs.LG cs.AI 版本更新

Stochastic MeanFlow Policies: One-Step Generative Control with Entropic Mirror Descent

随机均值流策略：基于熵镜像下降的一步生成式控制

Zeyuan Wang, Da Li, Yulin Chen, Yuehu Gong, Yanming Guo, Ye Shi, Liang Bai, Tianyuan Yu, Yanwei Fu

发表机构 * Laboratory for Big Data and Decision（大数据与决策实验室）； National University of Defense Technology（国防科技大学）； Samsung AI Center Cambridge（三星AI研究中心）； Queen Mary University of London（伦敦玛丽女王大学）； Fudan University（复旦大学）； ShanghaiTech University（上海科技大学）； Shanghai Innovation Institute（上海创新研究院）

AI总结本文提出了一种随机均值流策略（SMFP），通过均值流变换将高斯噪声映射到动作，以实现可训练的生成策略，从而在离线策略镜像下降框架下实现探索性且稳定的改进。

GROW: 将GRPO与状态-动作建模对齐以适用于开放世界VLM智能体

Xiongbin Wu, Zhihao Luo, Shanzhe Lei, Lechao Zhang, Xuhong Wang, Jie Yang, Zhonglong Zheng, Yuanjie Zheng, Xin Tan, Wei Liu

发表机构 * Shanghai Jiao Tong University（上海交通大学）； Shanghai Artificial Intelligence Laboratory（上海人工智能实验室）； East China Normal University（华东师范大学）； Zhejiang Normal University（浙江师范大学）； Shandong Normal University（山东师范大学）

AI总结本文提出GROW框架，通过将收集的轨迹分解为状态-动作样本，并在样本间计算优势，解决了标准GRPO在多轮RL中因需要完整轨迹导致上下文过长和噪声的问题，实验表明其在超过800个Minecraft任务中取得SOTA性能。

详情

AI中文摘要

最近，视觉-语言模型（VLM）智能体在开放世界任务中展现出有前景的进步，其中成功的任务完成通常需要多次视觉感知和动作执行的回合。然而，现有方法仍主要依赖于监督微调（SFT）专家演示，而先进的强化学习（RL）算法，特别是分组相对策略优化（GRPO），尚未在这些任务中有效应用于多轮RL，因为标准GRPO需要完整的轨迹作为训练样本，导致上下文过长和噪声。为了解决这个问题，我们提出GROW，一种适用于开放世界VLM智能体的RL框架，将收集的轨迹分解为状态-动作样本，并在这些样本之间计算优势，而不是将完整轨迹视为单一实体。我们进一步提供了一个替代分析，表明尽管分组样本是基于不同的局部状态而不是相同的提示上下文，简化假设下目标可以保留GRPO的核心相对策略优化信号。在超过800个Minecraft任务上的实验表明，我们的方法实现了最先进的性能，证明了我们提出的RL框架在开放世界VLM智能体中的有效性。

英文摘要

Recently, vision-language model (VLM) agents have shown promising progress in open-world tasks, where successful task completion often requires multiple turns of visual perception and action execution. However, existing methods still rely primarily on Supervised Fine-Tuning (SFT) with expert demonstrations, while advanced reinforcement learning (RL) algorithms, specifically Group Relative Policy Optimization (GRPO), have not been effectively employed for multi-turn RL in these tasks, because standard GRPO requires full trajectories as training samples, which leads to excessively long context and noise. To address this issue, we propose GROW, an RL framework for open-world VLM agents that decomposes collected trajectories into state-action samples and computes advantages between these samples rather than treating a full trajectory as a single entity. We further provide a surrogate analysis indicating that, even though the grouped samples are conditioned on different local states rather than an identical prompt context, the objective can preserve the core relative policy optimization signal of GRPO under simplifying assumptions. Experiments on more than 800 Minecraft tasks show that our method achieves state-of-the-art (SOTA) performance, demonstrating the effectiveness of our proposed RL framework for open-world VLM agents.
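
The decomposition described in this abstract can be sketched as two steps: flatten trajectories into state-action samples, then normalize rewards within the resulting group. The sketch makes a loud assumption that every sample simply inherits the final reward of its trajectory; the abstract does not say how GROW assigns rewards to individual samples, and the function names are hypothetical.

```python
from statistics import mean, pstdev

def to_samples(trajectories):
    """Flatten (steps, reward) trajectories into (state, action, reward)."""
    # Assumption: each step inherits the reward of its whole trajectory.
    return [(s, a, reward) for steps, reward in trajectories for s, a in steps]

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: reward standardized within the group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Each training sample then carries only one local state and one action instead of the full multi-turn history, which is how the approach avoids the long, noisy contexts that full-trajectory GRPO would need.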

2605.18893 2026-05-22 cs.LG updated version

Position: Graph Condensation Needs a Reset -- Move Beyond Full-dataset Training and Model-Dependence

Mridul Gupta, Samyak Jain, Vansh Ramani, Hariprasad Kodamana, Sayan Ranu

Affiliations: Yardi School of Artificial Intelligence, IIT Delhi, India; Department of Computer Science and Engineering, IIT Delhi, India; Department of Chemical Engineering, IIT Delhi, India; Indian Institute of Technology Delhi, Abu Dhabi, Zayed City, Abu Dhabi, UAE

AI summary: This position paper argues that current graph condensation methods have systemic flaws and calls for a shift to lightweight, architecture-agnostic, deployable methods that enable efficient, generalizable, and scalable GNN training.

Abstract

Graph Neural Networks (GNNs) are powerful tools for learning from graph-structured data, but their scalability is increasingly strained by the size of real-world graphs in domains like recommender systems, fraud detection, and molecular biology. Graph condensation -- the task of generating a smaller synthetic graph that retains the performance of models trained on the original -- has emerged as a promising solution. However, the dominant approach of gradient matching introduces a fundamental contradiction: it requires training on the full dataset to create the compressed version, thereby undermining the goal of efficiency. Worse still, these methods suffer from high computational overhead, poor generalization across GNN architectures, and brittle reliance on specific model configurations. Equally concerning is the community's reliance on misleading evaluation protocols such as node compression ratios, which fail to reflect true resource savings, condensation overhead, and illusory application to neural architecture search. These shortcomings are not incidental -- they are systemic, and they obstruct meaningful progress. In this position paper, we argue that graph condensation, in its current form, needs a reset. We call for moving beyond full-dataset training and model-dependent design, and instead advocate for methods that are lightweight, architecture-agnostic, and practically deployable. By identifying key methodological flaws and outlining concrete research directions, we aim to reorient the field toward approaches that deliver on the true promise of condensation: efficient, generalizable, and usable GNN training at scale.

2605.18721 2026-05-22 cs.LG cs.CL updated version

General Preference Reinforcement Learning

Muhammad Umer, Muhammad Ahmed Mohsin, Ahsan Bilal, Arslan Chaudhry, Andreas Haupt, Sanmi Koyejo, Emily Fox, John M. Cioffi

Affiliations: Stanford University; The University of Oklahoma

AI summary: This paper proposes General Preference Reinforcement Learning (GPRL), which introduces a General Preference Model (GPM) to bring the continuous exploration of online RL to open-ended tasks and improves model performance through multi-dimensional preference comparisons.

Abstract

Post-training has split large language model (LLM) alignment into two largely disconnected tracks. Online reinforcement learning (RL) with verifiable rewards drives emergent reasoning on math and code but depends on a programmatic verifier that cannot reach open-ended tasks, while preference optimization handles open-ended generation yet forgoes the continuous exploration that powers online RL. Closing this gap requires a verifier for open-ended quality, but a scalar reward model is the wrong shape for the job. Quality is multi-dimensional, and any scalar score is an incomplete proxy that lets online RL collapse onto whichever axis the score is most sensitive to. We turn instead to the General Preference Model (GPM), which embeds responses into $k$ skew-symmetric subspaces and represents preference as a structured, intransitivity-aware comparison. Building on this, we propose General Preference Reinforcement Learning (GPRL), which carries the $k$-way structure through to the policy update. GPRL computes per-dimension group-relative advantages, normalizes each on its own scale so no axis can dominate, and aggregates them with context-dependent eigenvalues. The same structure powers a closed-loop drift monitor that detects single-axis exploitation and corrects it on the fly by reweighting dimensions and tightening the trust region. Starting from Llama-3-8B-Instruct, GPRL reaches a length-controlled win rate of $56.51\%$ on AlpacaEval 2.0 while also outperforming SimPO and SPPO on Arena-Hard, MT-Bench, and WildBench by resisting reward hacking across extended training runs.
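The per-dimension normalization step can be sketched as follows. This is an illustrative reading of the abstract, not the GPRL code: `aggregate_advantages` is a hypothetical name, and the fixed `weights` list only stands in for the context-dependent eigenvalues, whose computation the abstract does not describe.

```python
from statistics import mean, pstdev

def aggregate_advantages(rewards, weights, eps=1e-8):
    """rewards[i][d] is the score of response i on preference dimension d.

    Each dimension is normalized within the group on its own scale, then the
    normalized advantages are combined with `weights` (a hypothetical stand-in
    for the context-dependent eigenvalues mentioned in the abstract).
    """
    n, k = len(rewards), len(weights)
    per_dim = []
    for d in range(k):
        column = [rewards[i][d] for i in range(n)]
        mu, sigma = mean(column), pstdev(column)
        per_dim.append([(x - mu) / (sigma + eps) for x in column])
    return [sum(weights[d] * per_dim[d][i] for d in range(k)) for i in range(n)]

# The second dimension has a far larger raw scale, yet after normalization
# both dimensions contribute on equal footing.
rewards = [[0.1, 300.0], [0.9, 100.0], [0.5, 500.0]]
print(aggregate_advantages(rewards, weights=[0.5, 0.5]))
```

Normalizing before aggregating is what keeps a large-magnitude axis from swamping the others; summing raw scores first would let the second column decide the ranking alone.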

2605.17659 2026-05-22 cs.LG updated version

Attend Locally, Remember Linearly: Linear Attention as Cross-Frame Memory for Autoregressive Video Diffusion

Kunyang Li, Mubarak Shah, Yuzhang Shang

Affiliations: Institute of Artificial Intelligence, University of Central Florida

AI summary: This paper proposes ARL2, a hybrid attention module that replaces quadratic cross-frame attention with a fixed-size recurrent state. It addresses the scalability bottleneck of autoregressive video diffusion models in long-video generation, achieving linear time complexity and constant memory while improving temporal consistency.

Abstract

Autoregressive (AR) video diffusion is a powerful paradigm for streaming and interactive video generation. However, its reliance on softmax self-attention leads to quadratic compute complexity in sequence length and memory usage due to key-value caching, which limits its scalability to long video horizons. Existing remedies (e.g., sparse attention and KV-cache compression) reduce per-step cost but still rely on a linearly growing cache or irreversibly discard past context, and thus fail to address linear memory growth and streaming context management. To address this scalability bottleneck, we propose ARL2 (Attend Locally, Remember Linearly), a hybrid attention module that replaces quadratic cross-frame attention with a fixed-size recurrent state. We decompose self-attention into two branches: an intra-frame softmax branch for spatial detail and local dependencies, and an inter-frame gated recurrent linear branch that maintains a fixed-size state for streaming context. Our key insight is that softmax attention captures fine-grained local interactions, while a recurrent state provides controllable long-range memory. This design achieves linear-time scaling with constant memory while improving temporal consistency over the full-softmax model. To prevent noisy intermediate states from corrupting memory, we update the recurrent state only after the denoised pass. To avoid within-frame information asymmetry, all tokens share the same pre-update state rather than sequential updates. To the best of our knowledge, this is the first work to convert a pretrained AR video diffusion model into a hybrid linear attention architecture, through an efficient two-stage training scheme for AR video. With 75% of layers replaced by hybrid linear attention, the model achieves up to $2.26\times$ wall-clock speedup and 54% memory reduction, while maintaining comparable quality and improving temporal consistency.
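The inter-frame branch can be pictured with a generic gated linear-attention recurrence. The sketch below is an assumption-laden toy, not the ARL2 module: the scalar `gate`, the outer-product write, and the names `read`, `update`, and `run` are illustrative choices, and the intra-frame softmax branch is omitted.

```python
def read(state, q):
    """Linear-attention read: o = q^T S, with S of shape (d_k, d_v)."""
    d_v = len(state[0])
    return [sum(q[i] * state[i][j] for i in range(len(q))) for j in range(d_v)]

def update(state, keys, values, gate):
    """Gated write once a frame is finished: S <- gate * S + sum_t k_t v_t^T."""
    d_k, d_v = len(state), len(state[0])
    new = [[gate * state[i][j] for j in range(d_v)] for i in range(d_k)]
    for k, v in zip(keys, values):
        for i in range(d_k):
            for j in range(d_v):
                new[i][j] += k[i] * v[j]
    return new

def run(frames, d_k, d_v, gate=0.9):
    """frames: list of (queries, keys, values), one entry per frame."""
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for queries, keys, values in frames:
        # Every token in the frame reads the same pre-update state.
        outputs.append([read(state, q) for q in queries])
        state = update(state, keys, values, gate)
    return outputs, state

frames = [
    ([[1.0, 0.0]], [[1.0, 0.0]], [[2.0, 3.0]]),
    ([[1.0, 0.0]], [[0.0, 1.0]], [[5.0, 5.0]]),
]
outputs, state = run(frames, d_k=2, d_v=2)
print(outputs, state)
```

The state stays a `d_k` by `d_v` matrix no matter how many frames are processed, which is the source of the constant-memory property the abstract claims, in contrast to a KV cache that grows with every frame.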

2605.16362 2026-05-22 cs.LG cs.AI updated version

When Is Rank-1 Steering Cheap? Geometry, Granularity, and Budgeted Search

John T. Robertson, Jianing Zhu, Haris Vikalo, Zhangyang Wang

Affiliations: The University of Texas at Austin

AI summary: This paper studies why the effectiveness of rank-1 steering differs across concepts, identifies granularity and geometry as key factors in steering cost, and introduces the GRACE framework for optimizing steering efficiently.

Comments Updated Abstract metadata

Abstract

Activation steering offers a lightweight way to control LLMs without retraining, but its effectiveness varies sharply across concepts. Prior work often reads this variability as evidence that many concepts are not captured by a single steering direction. We argue instead that much of it reflects search difficulty: a useful rank-1 intervention often exists, but finding it can be expensive. We formalize rank-1 steering as a budget-constrained optimization over intervention layer and coefficient. Across concepts and model families, prompt-boundary directional alignment predicts where effective interventions occur, enabling geometry-guided search that reaches high utility with substantially fewer evaluations, reducing the trials needed to recover 95% of best-found utility by 39.8% on average across three model families. To explain why some concepts remain expensive even under better search, we introduce concept granularity, a measure of directional heterogeneity across contrastive contexts. Granularity distinguishes concepts whose difference vectors share a stable global direction from those where prompts agree locally within each input but the utility-maximizing direction rotates systematically across inputs. Higher granularity is associated with slower convergence and lower best-found performance (Pearson $r{=}0.44$ with trials-to-95%, $r{=}{-}0.46$ with best-found utility, both $p<0.001$). We present GRACE, a Granularity- and Representation-Aware Concept Engineering framework that uses activation geometry to diagnose the dominant source of steering difficulty, select the appropriate remedy, and allocate optimization effort efficiently. Our results shift the frame from "when does rank-1 fail?" to "when is rank-1 cheap and stable?", turning activation geometry from a descriptive tool into an actionable prior for LLM control.
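A rank-1 intervention and a budget-limited search over (layer, coefficient) pairs can be sketched in a few lines. Everything named here is hypothetical: `steer`, `budgeted_search`, and the toy utility are placeholders, and the candidate ordering merely stands in for whatever geometric prior the paper uses to rank layers.

```python
def steer(hidden, direction, coeff):
    """Rank-1 intervention: add coeff * direction to one hidden state."""
    return [h + coeff * d for h, d in zip(hidden, direction)]

def budgeted_search(candidates, utility, budget):
    """Evaluate at most `budget` (layer, coeff) pairs in the given order.

    `candidates` is assumed to be pre-sorted by some geometric prior, and
    `utility` is a hypothetical black-box scorer; both are placeholders.
    """
    best, best_score, trials = None, float("-inf"), 0
    for layer, coeff in candidates[:budget]:
        score = utility(layer, coeff)
        trials += 1
        if score > best_score:
            best, best_score = (layer, coeff), score
    return best, best_score, trials

def toy_utility(layer, coeff):
    """Toy scorer that peaks at layer 12 with coefficient 4.0."""
    return -abs(layer - 12) - abs(coeff - 4.0)

grid = [(layer, c) for layer in (12, 8, 16) for c in (2.0, 4.0, 8.0)]
print(budgeted_search(grid, toy_utility, budget=4))
print(steer([0.5, -1.0], [1.0, 0.0], coeff=4.0))
```

Because the promising layer is tried first, the best setting is found within four trials out of nine; a poor ordering would spend the same budget on layers 8 and 16 and miss it.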

2605.15588 2026-05-22 cs.CL cs.LG updated version

Calibrating LLMs with Semantic-level Reward

Fengfei Yu, Ruijia Niu, Dongxia Wu, Yian Ma, Rose Yu

Affiliations: Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, USA; Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, California, USA; Department of Statistics, Stanford University, Stanford, California, USA

AI summary: This paper proposes CSR, a new calibration framework that calibrates language models directly in semantic space, avoiding the inconsistency that verbalized confidence causes in prior methods. Experiments show that CSR lowers ECE and raises AUROC across multiple datasets.

Abstract

As large language models (LLMs) are deployed in consequential settings such as medical question answering and legal reasoning, the ability to estimate when their outputs are likely to be correct is essential for safe and reliable use, requiring well-calibrated uncertainty. Standard reinforcement learning with verifiable rewards (RLVR) trains models with a binary correctness reward that is indifferent to confidence, providing no penalty for confident but wrong predictions and thereby degrading calibration. Recent work addresses this by training models to produce verbalized confidence scores alongside answers and rewarding agreement with correctness. However, verbalized confidence is calibrated at the token level and thus exhibits inconsistency across textual variations with the same semantic meaning. We propose Calibration with Semantic Reward (CSR), a framework that calibrates language models directly in semantic space without a verbalized confidence interface. CSR combines the correctness reward with a novel semantic calibration reward that encourages exploitation among correct rollouts by promoting semantic agreement, and exploration among incorrect ones by discouraging spurious consistency. Experiments across three model families on HotpotQA (in-distribution) and TriviaQA, MSMARCO, and NQ-Open (out-of-distribution) show that CSR consistently achieves lower ECE and higher AUROC than verbalized-confidence baselines across nearly all settings, reducing ECE by up to $40\%$ and improving AUROC by up to $31\%$ over verbalized-confidence baselines, with calibration behavior generalizing robustly across all four evaluation settings.
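Since the results are reported in ECE, a small reference computation may help readers interpret the numbers. This is the common equal-width-bin estimator, written here as a sketch; the bin count and the half-open binning convention are assumptions, and the paper may use a different variant.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins: sum over bins of (bin size / N) * |accuracy - mean confidence|."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / total * abs(accuracy - avg_conf)
    return ece

confidences = [0.95, 0.95, 0.55, 0.55]
correct = [True, False, True, False]
print(expected_calibration_error(confidences, correct))
```

In this toy case both confidence levels are right half the time, so the 0.95 bin contributes a gap of 0.45 and the 0.55 bin a gap of 0.05, each weighted by one half.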

2605.15505 2026-05-22 cs.AI cs.IR cs.LG updated version

X-SYNTH: Beyond Retrieval -- Enterprise Context Synthesis from Observed Digital Human Attention

Guruprasad Raghavan, George Nychis, Rohan Narayana Murthy

Affiliations: Workfabric AI

AI summary: This paper proposes the X-SYNTH framework, which addresses enterprise context synthesis by analyzing behavioral patterns in digital human attention. Its core approach synthesizes context from behavioral patterns rather than by conventional retrieval, substantially raising the True Lead Rate and lowering the False Lead Rate.

Comments 11 pages, 7 figures, 5 tables

Abstract

In enterprise operations, the context required for an AI agent task is scattered across systems of record, static information stores, and communication channels. What is stored is system state, a lossy representation of the work that actually happened. The prevailing approach retrieves by matching request content to what is stored; for narrow requests this works well. But synthesis quality depends on knowing what to surface and how to interpret it: knowledge specific to each organization, team, and individual, present in behavioral patterns, absent from any retrieval index. For the agentic task of proposing enterprise-valuable leads to sellers, this approach breaks down: True Lead Rate is low, False Lead Rate is high, and the model has no mechanism to improve. We present X-SYNTH, a framework for enterprise context synthesis grounded in digital human attention, the digitally observable interaction signatures of each worker, encoding what they did, the sequence in which they did it, and implicit reward signals. Behavioral traces preceding positive outcomes are distinguishable from those that did not, without external labeling. X-SYNTH models each individual's behavioral baseline as a Digital Twin Signature (DTS) and selects among seven attention filters, Proportional, Inverse, Differential, Recurrent, Comparative, Sequential, and Collective, per individual and per query, to identify causally relevant activity signatures. A four-stage pipeline assembles ranked context grounded in behavioral patterns rather than query embeddings. A frontier model unaided achieves 9.5% True Lead Rate (TLR) with 90.5% False Lead Rate (FLR). Augmented with X-SYNTH, TLR rises to 61.9% (6.5x) while FLR falls to 18.8%. Enterprise context synthesis is not a retrieval problem. It is a relevance problem, and digital human attention is its most reliable ground truth.

2605.12836 2026-05-22 cs.LG updated version

On-policy knowledge distillation (OPD) trains a student on its own rollouts under token-level supervision from a teacher. Not all token positions matter equally, but existing views of token importance are incomplete. We ask a direct question: which tokens carry the most useful learning signal in OPD? Our answer is that informative tokens come from two regions: positions with high student entropy, and positions with low student entropy plus high teacher--student divergence, where the student is overconfident and wrong. Empirically, student entropy is a strong first-order proxy: retaining $50\%$ of tokens with entropy-based sampling matches or exceeds all-token training while reducing peak memory by up to $47\%$. But entropy alone misses a second important region. When we isolate low-entropy, high-divergence tokens, training on fewer than $10\%$ of all tokens nearly matches full-token baselines, showing that overconfident tokens carry dense corrective signal despite being nearly invisible to entropy-only rules. We organize these findings with TIP (Token Importance in on-Policy distillation), a two-axis taxonomy over student entropy and teacher--student divergence, and give a theoretical explanation for why entropy is useful yet structurally incomplete. This view motivates type-aware token selection rules that combine uncertainty and disagreement. We validate this picture across three teacher--student pairs spanning Qwen3, Llama, and Qwen2.5 on MATH-500 and AIME 2024/2025, and on the DeepPlanning benchmark for long-horizon agentic planning, where Q3-only training on $<20\%$ of tokens surpasses full-token OPD. Our experiments are implemented by extending the OPD repository https://github.com/HJSang/OPSD_OnPolicyDistillation, which supports memory-efficient distillation of larger models under limited GPU budgets.
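The two informative regions can be made concrete with a small selection rule. This sketch is not taken from the linked repository: the thresholds, the labels, and the use of KL(teacher || student) as the divergence are all assumptions, since the abstract does not fix them.

```python
import math

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)

def kl(p, q):
    """KL(p || q); assumes q > 0 wherever p > 0."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)

def select_tokens(student, teacher, h_thresh, d_thresh):
    """Keep positions that are high-entropy, or low-entropy with high divergence."""
    keep = []
    for t, (ps, pt) in enumerate(zip(student, teacher)):
        h, d = entropy(ps), kl(pt, ps)  # divergence direction is an assumption
        if h >= h_thresh:
            keep.append((t, "uncertain"))
        elif d >= d_thresh:
            keep.append((t, "overconfident"))
    return keep

student = [[0.5, 0.5], [0.99, 0.01], [0.99, 0.01]]
teacher = [[0.6, 0.4], [0.02, 0.98], [0.98, 0.02]]
print(select_tokens(student, teacher, h_thresh=0.5, d_thresh=1.0))
```

Position 1 shows why an entropy-only rule is incomplete: the student is nearly certain, so its entropy is low, yet the teacher strongly disagrees, and only the divergence test catches it. Position 2 is confident and correct, so it is dropped.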

2604.12325 2026-05-22 cs.LG cs.AI updated version

Black-Box Optimization From Small Offline Datasets via Meta Learning with Synthetic Tasks

Azza Fadhel, The Hung Tran, Trong Nghia Hoang, Jana Doppa

Affiliations: School of EECS, Washington State University, Pullman, WA, USA

AI summary: This paper proposes OptBias, a meta-learning framework that generates synthetic tasks to tackle black-box optimization from small offline datasets, improving performance in the small-data regime by learning reusable optimization biases.

Comments Accepted for Publication at International Conference on Artificial Intelligence and Statistics (AISTATS)

Rule-State Inference (RSI): A Bayesian Framework for Compliance Monitoring in Rule-Governed Domains

Abdou-Raouf Atarmla

Affiliations: Institut National des Postes et Télécommunications; Togo DataLab; Ministry of Digital Economy

AI summary: This paper proposes Rule-State Inference (RSI), a Bayesian framework for three structural challenges of compliance monitoring in rule-governed domains: no labeled outcomes at deployment, strategically missing observations from non-compliant entities, and a regulatory environment that changes faster than any supervised model can be retrained. RSI treats an authoritative, formalized rule set as structured Bayesian priors and infers the latent compliance state of a population through variational inference with exact coordinate-ascent updates.

Comments 18 pages. Experimental validation forthcoming

Abstract

Compliance monitoring in rule-governed domains (tax administration, clinical protocol adherence, environmental regulation) faces three structural obstacles that standard machine learning does not simultaneously address: the absence of labeled outcomes at deployment, strategically missing observations where non-compliant entities selectively withhold evidence, and a regulatory environment that changes faster than any supervised model can be retrained. We introduce Rule-State Inference (RSI), a Bayesian framework that reverses the usual paradigm. Rather than learning rules from data, RSI treats an authoritative, formalized rule set as structured Bayesian priors and infers the latent compliance state of a population through mean-field variational inference with exact coordinate-ascent updates. The central modeling object is a joint latent state per regulatory period: a global compliance-culture factor eta and per-rule components for activation, population compliance level, and parametric drift. RSI delivers three formal guarantees: O(n_k + K) regulatory adaptability per rule update; Bernstein-von Mises consistency for the identifiable continuous components; and monotone ELBO convergence at every iteration. We instantiate RSI on the Togolese fiscal system on a benchmark of 2,000 synthetic enterprises grounded in official regulatory law; full numerical validation is forthcoming. The framework is designed for direct extension to Sequential RSI, a state-space formulation where the posterior from one regulatory period becomes the prior for the next, yielding an exact Kalman filter for compliance-trajectory tracking and entity-level Bayesian scoring.

2603.20228 2026-05-22 math.OC cs.LG updated version

Compact Lifted Relaxations for Low-Rank Optimization

Ryan Cory-Wright, Jean Pauphilet

Affiliations: Department of Analytics, Marketing and Operations, Imperial Business School; Management Science and Operations, London Business School

AI summary: This paper proposes tractable, compact convex relaxations for rank-constrained quadratic optimization. It introduces lifted semidefinite relaxations that avoid the spectral-structure terms required by earlier approaches, obtains a more compact relaxation by showing that some blocks are redundant, and adds a new class of valid inequalities (projection cuts) that strengthen the low-rank relaxation, with applications to matrix completion and reduced-rank regression.

Comments Part of this material previously appeared in arXiv:2501.02942v2, which was split into this paper and arXiv:2501.02942v3

Abstract

We develop tractable convex relaxations for rank-constrained quadratic optimization problems over $n \times m$ matrices, a setting for which tractable relaxations are typically only available when the objective or constraints admit spectral structure. We derive lifted semidefinite relaxations that do not require such spectral terms. Although a direct lifting introduces a large semidefinite constraint in dimension $n^2 + nm + 1$, we prove that many blocks of the moment matrix are redundant and derive an equivalent compact relaxation that only involves two semidefinite constraints of dimension $nm + 1$ and $n+m$, respectively. We also derive a new class of valid inequalities for low-rank problems, which we call projection cuts, that exploit the fact that rank constraints are inherited by linear images of a low-rank matrix, to strengthen our low-rank relaxations substantially. For matrix completion and reduced-rank regression problems, among others, we exploit additional structure to obtain even more compact formulations involving semidefinite matrices of dimension at most the sum of the two dimensions of the low-rank decision matrix (i.e., of size at most $n+m$). Overall, we obtain scalable semidefinite bounds for a broad class of low-rank quadratic problems.

2603.16077 2026-05-22 cs.LG updated version

MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Scaling of Diffusion Language Models

Chen-Hao Chao, Wei-Fang Sun, Junwei Quan, Chun-Yi Lee, Rahul G. Krishnan

Affiliations: University of Toronto; Vector Institute; NVIDIA AI Technology Center; National Taiwan University

AI summary: This paper proposes MDM-Prime-v2, which improves masked diffusion language models with Binary Encoding and Index Shuffling. It addresses two limitations of MDM-Prime: the subtokenizer's functional form increases the cross-entropy loss when paired with BPE tokenizers, and there are no tools to guide the choice of subtokenizer granularity. The resulting model improves zero-shot accuracy on commonsense reasoning benchmarks.

Abstract

Masked diffusion models (MDM) exhibit superior generalization when learned using a Partial masking scheme (Prime). This approach converts tokens into sub-tokens and models the diffusion process at the sub-token level. We identify two limitations of the MDM-Prime framework. First, we find that the functional form of the subtokenizer significantly increases the cross-entropy loss in the objective when paired with commonly used Byte-Pair-Encoding (BPE) tokenizers. Second, we lack tools to guide the hyperparameter choice of the token granularity in the subtokenizer. To address these limitations, we analyze the optimal design of the subtokenizer that minimizes MDM-Prime training objective and develop MDM-Prime-v2, a masked diffusion language model which incorporates Binary Encoding and Index Shuffling. Our analysis characterizes how token granularity and sub-token entropy influence the training objective and downstream performance, providing principled criteria for subtokenizer design. When extending the model size to 1.1B parameters, MDM-Prime-v2 demonstrates superior average zero-shot accuracy across eight commonsense reasoning benchmarks, outperforming similar-sized baselines including GPT-Neo, OPT, Pythia, Bloom, SMDM, and TinyLLaMA.
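One plausible reading of the two named ingredients is shown below: each token id is first passed through a fixed random permutation and then written as a sequence of bits, each bit acting as a sub-token. This is an interpretation for illustration only; the function names, the seeded permutation, and the bit order are assumptions, and the paper's actual subtokenizer may differ.

```python
import random

def make_shuffle(vocab_size, seed=0):
    """Fixed random permutation of token indices and its inverse."""
    perm = list(range(vocab_size))
    random.Random(seed).shuffle(perm)
    inverse = [0] * vocab_size
    for original, shuffled in enumerate(perm):
        inverse[shuffled] = original
    return perm, inverse

def encode(token_id, perm, n_bits):
    """Map a token id to n_bits binary sub-tokens (most significant bit first)."""
    index = perm[token_id]
    return [(index >> shift) & 1 for shift in range(n_bits - 1, -1, -1)]

def decode(bits, inverse):
    """Rebuild the shuffled index from its bits, then undo the permutation."""
    index = 0
    for bit in bits:
        index = (index << 1) | bit
    return inverse[index]

vocab_size = 16
n_bits = (vocab_size - 1).bit_length()  # 4 bits cover 16 ids
perm, inverse = make_shuffle(vocab_size)
bits = encode(11, perm, n_bits)
print(bits, decode(bits, inverse))
```

The mapping is lossless, so a model that predicts the bits can always recover the original token; the shuffle only changes which tokens share leading bits.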

2603.02604 2026-05-22 cs.LG updated version

Heterogeneous Agent Collaborative Reinforcement Learning

Zhixia Zhang, Zixuan Huang, Gongxun Li, Huaiyang Wang, Chengyi Yuan, Xin Xia, Deqing Wang, Fuzhen Zhuang, Shuai Ma, Ning Ding, Yaodong Yang, Jianxin Li, Yikun Ban

Affiliations: Beihang University; Bytedance China; Tsinghua University; Peking University; Apple

AI summary: This paper introduces HACRL, a new Reinforcement Learning from Verifiable Reward (RLVR) problem in which heterogeneous agents share verified rollouts for collaborative optimization, addressing the inefficiency of isolated multi-agent on-policy optimization. It also proposes the HACPO algorithm to maximize sample utilization and cross-agent knowledge transfer.

Abstract

We introduce Heterogeneous Agent Collaborative Reinforcement Learning (HACRL), a new Reinforcement Learning from Verifiable Reward (RLVR) problem that addresses the inefficiencies of isolated multi-agent on-policy optimization. HACRL enables collaborative optimization with independent execution: heterogeneous agents share verified rollouts during training to mutually improve, while operating independently at inference time. Unlike LLM-based multi-agent reinforcement learning (MARL), HACRL does not require coordinated deployment, and unlike on-/off-policy distillation, it enables bidirectional mutual learning among heterogeneous agents rather than one-directional homogeneous teacher-to-student transfer. Building on this problem, we propose HACPO, a collaborative RL algorithm that enables principled rollout sharing to maximize sample utilization and cross-agent knowledge transfer. To mitigate capability discrepancies and policy distribution shifts, HACPO introduces four tailored mechanisms with theoretical guarantees on unbiased advantage estimation. Extensive experiments across diverse heterogeneous model combinations and reasoning benchmarks show that HACPO consistently improves all participating agents, outperforming GSPO with double rollouts by an average of 3.6% while using only half the rollout cost.

2602.23200 2026-05-22 cs.LG cs.CL updated version

InnerQ: Hardware-Aware Tuning-Free Quantization of KV Cache for Large Language Models

Sayed Mohammadreza Tayaranian Hosseini, Amir Ardakani, Warren J. Gross

Affiliations: Department of Electrical and Computer Engineering

AI summary: This paper proposes InnerQ, a hardware-aware KV cache quantization method that reduces decode latency without hurting evaluation performance. Its group-wise quantization strategy increases data reuse, and it also improves few-shot evaluation scores on Llama and Mistral models.

Comments 18 pages, 5 figures, 7 tables

Abstract

When transformer-based language models are deployed for text generation, most of the inference time is spent in the decoding stage, where output tokens are generated sequentially. Reducing the hardware cost of each decoding step is therefore critical for efficient long-context generation. A major bottleneck is the key-value (KV) cache, whose size grows with sequence length and often dominates the model's memory footprint. Prior work has proposed quantization methods to compress the KV cache while minimizing its loss of precision. We present InnerQ, a hardware-aware KV cache quantization scheme that reduces decode latency without compromising evaluation performance. InnerQ performs group-wise quantization by grouping cache matrices along their inner dimension. This grouping strategy aligns dequantization with vector-matrix multiplication and increases data reuse across GPU compute units. As a result, InnerQ reduces memory access and accelerates dequantization, achieving an average $1.3\times$ speedup over prior KV cache quantization methods and $2.7\times$ over the non-quantized baseline. To maintain fidelity under aggressive compression, InnerQ incorporates three techniques: (i) hybrid quantization, which chooses symmetric or asymmetric quantization for each group based on local statistics; (ii) high-precision windows for both recent tokens and attention sink tokens to mitigate outlier leakage; and (iii) per-channel normalization of the key cache, computed once during prefill and folded into the model parameters to eliminate runtime overhead. Beyond reducing latency, experiments on Llama and Mistral models show that InnerQ also improves few-shot evaluation scores relative to prior KV cache quantization methods.
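Group-wise hybrid quantization, item (i) in the abstract, can be illustrated with a toy routine. The sketch is not InnerQ: contiguous groups along a row merely stand in for the inner dimension, and choosing the mode by lower reconstruction error is an assumed criterion, since the abstract only says the choice is based on local statistics.

```python
def quantize_group(values, n_bits=4):
    """Quantize one group symmetrically and asymmetrically; keep the lower-error result."""
    levels = 2 ** n_bits - 1
    # Asymmetric: map [min, max] onto integer codes 0..levels (needs a zero point).
    lo, hi = min(values), max(values)
    scale_a = (hi - lo) / levels or 1.0
    asym = [round((v - lo) / scale_a) * scale_a + lo for v in values]
    # Symmetric: map [-absmax, absmax] onto signed codes, no zero point.
    qmax = 2 ** (n_bits - 1) - 1
    scale_s = max(abs(v) for v in values) / qmax or 1.0
    sym = [max(-qmax, min(qmax, round(v / scale_s))) * scale_s for v in values]

    def error(approx):
        return sum((a - v) ** 2 for a, v in zip(approx, values))

    # Selection by reconstruction error is an assumption of this sketch.
    return ("asymmetric", asym) if error(asym) < error(sym) else ("symmetric", sym)

def quantize_row(row, group_size=4, n_bits=4):
    """Split a row into contiguous groups and quantize each group independently."""
    return [quantize_group(row[i:i + group_size], n_bits)
            for i in range(0, len(row), group_size)]

row = [0.1, 0.2, 0.3, 0.4, -0.7, 0.0, 0.0, 0.7]
for mode, dequantized in quantize_row(row):
    print(mode, [round(x, 3) for x in dequantized])
```

The first group is entirely positive, so an offset range fits it better; the second is centered on zero, where the symmetric grid represents 0.0 exactly and needs no zero point.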

2602.18600 2026-05-22 cs.LG 版本更新

MapTab: Are MLLMs Ready for Multi-Criteria Route Planning in Heterogeneous Graphs?

MapTab: MLLMs 是否已准备好在异构图中进行多标准路线规划？

Ziqiao Shang, Lingyue Ge, Zi-Jian Cheng, Shi-Yu Tian, Zhenyu Huang, Wenbo Fu, Weiming Wu, Yang Chen, Xiangwen Zhang, Yulan Hu, Bin Liu, Yu-Feng Li, Lan-Zhe Guo

发表机构 * National Key Laboratory for Novel Software Technology, Nanjing University（南京大学新型软件技术国家重点实验室）； School of Intelligence Science and Technology, Nanjing University（南京大学智能科学与技术学院）； AMAP, Alibaba Group（阿里集团AMAP）； School of Computing and Artificial Intelligence, Southwest Jiaotong University（西南交通大学计算机与人工智能学院）

AI总结本文提出MapTab基准测试，用于评估多模态大语言模型在多标准路线规划任务中的综合推理能力，发现当前模型在多模态推理方面存在显著挑战。

AI中文摘要

系统评估多模态大语言模型（MLLMs）对于推进人工通用智能（AGI）至关重要。然而，现有基准测试仍不足以严格评估其在多标准约束下的推理能力。为弥合这一差距，我们引入MapTab，一个专门设计用于通过路线规划任务评估MLLMs的综合多标准推理能力的多模态基准测试。MapTab要求MLLMs感知并结合地图图像中的视觉线索与结构化表格数据中的路线属性（如时间、价格）。该基准测试涵盖两个场景：Metromap，涵盖52个国家160座城市的地铁网络；Travelmap，描绘19个国家的168个代表性旅游景点。总共包含328张图像、196,800个路线规划查询和3,936个问答查询，所有数据均包含4个关键标准：时间、价格、舒适度和可靠性。对15个代表性MLLMs的广泛评估表明，当前模型在多标准多模态推理方面面临重大挑战。值得注意的是，在视觉感知有限的条件下，多模态协作往往不如单模态方法表现优异。我们认为MapTab提供了一个具有挑战性和现实性的测试平台，以推进MLLMs的系统评估。我们的代码可在https://github.com/Ziqiao-Shang/MapTab上获得。

英文摘要

Systematic evaluation of Multimodal Large Language Models (MLLMs) is crucial for advancing Artificial General Intelligence (AGI). However, existing benchmarks remain insufficient for rigorously assessing their reasoning capabilities under multi-criteria constraints. To bridge this gap, we introduce MapTab, a multimodal benchmark specifically designed to evaluate holistic multi-criteria reasoning in MLLMs via route planning tasks. MapTab requires MLLMs to perceive and ground visual cues from map images alongside route attributes (e.g., Time, Price) from structured tabular data. The benchmark encompasses two scenarios: Metromap, covering metro networks in 160 cities across 52 countries, and Travelmap, depicting 168 representative tourist attractions from 19 countries. In total, MapTab comprises 328 images, 196,800 route planning queries, and 3,936 QA queries, all incorporating 4 key criteria: Time, Price, Comfort, and Reliability. Extensive evaluations across 15 representative MLLMs reveal that current models face substantial challenges in multi-criteria multimodal reasoning. Notably, under conditions of limited visual perception, multimodal collaboration often underperforms compared to unimodal approaches. We believe MapTab provides a challenging and realistic testbed to advance the systematic evaluation of MLLMs. Our code is available at https://github.com/Ziqiao-Shang/MapTab.

2602.16169 2026-05-22 cs.LG cs.CL 版本更新

Amey P. Pasarkar, Adji Bousso Dieng

发表机构 * Lewis-Sigler Institute For Integrative Genomics, Princeton University（普林斯顿大学整合基因组学研究所）； Department of Computer Science, Princeton University（普林斯顿大学计算机科学系）

AI总结本文提出了一种基于Vendi Scores的Vendi Novelty Score（VNS）方法，从多样性角度解决分布外检测问题，该方法无需密度建模，具有线性时间复杂度和非参数特性，并在多个图像分类基准上实现了最先进的OOD检测性能。

英文摘要

Out-of-distribution (OOD) detection is critical for the safe deployment of machine learning systems. Existing post-hoc detectors typically rely on model confidence scores or likelihood estimates in feature space, often under restrictive distributional assumptions. In this work, we introduce a third paradigm and formulate OOD detection from a diversity perspective. We propose the Vendi Novelty Score (VNS), an OOD detector based on the Vendi Scores (VS), a family of similarity-based diversity metrics. VNS quantifies how much a test sample increases the VS of the in-distribution feature set, providing a principled notion of novelty that does not require density modeling. VNS is linear-time, non-parametric, and naturally combines class-conditional (local) and dataset-level (global) novelty signals. Across multiple image classification benchmarks and network architectures, VNS achieves state-of-the-art OOD detection performance. Remarkably, VNS retains this performance when computed using only 1% of the training data, enabling deployment in memory- or access-constrained settings.
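The idea of scoring a test sample by how much it raises the diversity of the in-distribution set can be sketched as follows. This is only an illustration under stated assumptions: it uses the order-2 member of the Vendi Score family, which equals n² divided by the squared Frobenius norm of the kernel matrix and therefore needs no eigendecomposition, together with a cosine-similarity kernel. It recomputes the full kernel and is quadratic in the reference set size, so it is not the linear-time estimator the abstract refers to, and all function names are hypothetical.

```python
import math


def cosine_kernel(feats):
    """Pairwise cosine similarities; the diagonal is 1 by construction."""
    norms = [math.sqrt(sum(v * v for v in f)) for f in feats]
    return [
        [sum(a * b for a, b in zip(fi, fj)) / (ni * nj) for fj, nj in zip(feats, norms)]
        for fi, ni in zip(feats, norms)
    ]


def vendi_score_order2(feats):
    """Order-2 Vendi Score: 1 / sum(lambda_i^2) for the eigenvalues of K / n.

    Since sum(lambda_i^2) = ||K||_F^2 / n^2, this is n^2 / ||K||_F^2.
    """
    n = len(feats)
    kernel = cosine_kernel(feats)
    frob_sq = sum(v * v for row in kernel for v in row)
    return n * n / frob_sq


def novelty(reference, x):
    """Increase in diversity when x is added to the reference features."""
    return vendi_score_order2(reference + [x]) - vendi_score_order2(reference)
```

A sample pointing in a direction the reference features do not cover raises the score noticeably, while a near-duplicate of existing features leaves it almost unchanged.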

2602.06264 2026-05-22 cs.LG 版本更新

Swap Regret Minimization Through Response-Based Approachability

通过基于响应的可逼近性实现交换遗憾最小化

Ioannis Anagnostides, Gabriele Farina, Maxwell Fishelson, Haipeng Luo, Jon Schneider

发表机构 * Carnegie Mellon University（卡内基梅隆大学）； Massachusetts Institute of Technology（麻省理工学院）； University of Southern California（南加州大学）； Google Research（谷歌研究）

AI总结本文提出了一种更简单、计算更高效的算法，对经约翰椭球预处理的一般凸集保证O(d√T)的线性交换遗憾，并建立了与之匹配的信息论下界，由此说明Gordon、Greenwald和Marks的经典算法在最小化线性交换遗憾方面是存在性意义上最优的；该方法还可推广到维度为多项式的交换偏差集合。

Comments V3 makes certain clarifications and improves the upper bound for general sets via symmetrization

AI中文摘要

我们考虑在线优化中最小化各种交换遗憾的问题。这些形式的遗憾与博弈中的相关均衡概念紧密相关，近来还被证明能够保证面对策略性对手时的不可操纵性。在$\mathbb{R}^d$中的一般凸集上最小化线性交换遗憾的唯一计算高效算法，是最近由Daskalakis、Farina、Fishelson、Pipis和Schneider（STOC '25）提出的。然而，它的遗憾界为Ω(d⁴√T)，远非最优，并且每次迭代都要调用计算开销很大的椭球算法。在本文中，我们提出了一种显著更简单、计算高效的算法，对经约翰椭球预处理的一般凸集保证O(d√T)的线性交换遗憾。我们的算法利用了Bernstein和Shimkin（JMLR '15）提出的基于响应的可逼近性（approachability）框架（此前在交换遗憾最小化的研究中被忽视），并同时最小化profile交换遗憾，后者最近被证明能够保证不可操纵性。此外，我们建立了匹配的信息论下界：即使集合是中心对称的，对于足够大的T，任何学习者的期望线性交换遗憾也至少为Ω(d√T)。这也表明，Gordon、Greenwald和Marks（ICML '08）的经典算法虽然计算上低效，但在最小化线性交换遗憾方面是存在性意义上最优的。最后，我们将该方法推广到最小化相对于多项式维度的交换偏差集合的遗憾，统一并加强了均衡计算和在线学习中的近期结果。

英文摘要

We consider the problem of minimizing different notions of swap regret in online optimization. These forms of regret are tightly connected to correlated equilibrium concepts in games, and have been more recently shown to guarantee non-manipulability against strategic adversaries. The only computationally efficient algorithm for minimizing linear swap regret over a general convex set in $\mathbb{R}^d$ was developed recently by Daskalakis, Farina, Fishelson, Pipis, and Schneider (STOC '25). However, it incurs a highly suboptimal regret bound of $Ω(d^4 \sqrt{T})$ and also relies on computationally intensive calls to the ellipsoid algorithm at each iteration. In this paper, we develop a significantly simpler, computationally efficient algorithm that guarantees $O(d \sqrt{T})$ linear swap regret for a general convex set that has been preconditioned via the John ellipsoid. Our algorithm leverages the powerful response-based approachability framework of Bernstein and Shimkin (JMLR '15), previously overlooked in the line of work on swap regret minimization, and simultaneously minimizes profile swap regret, which was recently shown to guarantee non-manipulability. Moreover, we establish a matching information-theoretic lower bound: any learner must incur in expectation $Ω(d \sqrt{T})$ linear swap regret for large enough $T$, even when the set is centrally symmetric. This also shows that the classic algorithm of Gordon, Greenwald, and Marks (ICML '08) is existentially optimal for minimizing linear swap regret, although it is computationally inefficient. Finally, we extend our approach to minimize regret with respect to the set of swap deviations with polynomial dimension, unifying and strengthening recent results in equilibrium computation and online learning.
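For readers unfamiliar with the quantity being bounded, linear swap regret is usually written as below. The notation is an assumption for illustration and does not appear in the abstract: $\mathcal{X} \subset \mathbb{R}^d$ is the convex decision set, $x_t \in \mathcal{X}$ the learner's play at round $t$, and $\ell_t$ the loss vector revealed afterwards.

```latex
\[
  \mathrm{LinSwapReg}_T
  \;=\;
  \sup_{\phi \in \Phi_{\mathrm{lin}}}
  \sum_{t=1}^{T} \big\langle x_t - \phi(x_t),\, \ell_t \big\rangle,
  \qquad
  \Phi_{\mathrm{lin}}
  = \{\, \phi : \mathcal{X} \to \mathcal{X} \;\mid\; \phi \text{ is linear} \,\}.
\]
```

Under this convention, the learner is compared against every fixed linear remapping of its own plays rather than against a single fixed point, which is what makes the guarantee stronger than external regret.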

2602.05286 2026-05-22 cs.LG cs.AI 版本更新

HealthMamba: An Uncertainty-aware Spatiotemporal Graph State Space Model for Effective and Reliable Healthcare Facility Visit Prediction

HealthMamba: 一种考虑不确定性的时空图状态空间模型用于有效可靠的医疗设施访问预测

Dahai Yu, Lin Jiang, Rongchao Xu, Guang Wang

发表机构 * Department of Computer Science, Florida State University（佛罗里达州立大学计算机科学系）

AI总结本文提出HealthMamba，一种考虑不确定性的时空图状态空间模型，用于有效可靠的医疗设施访问预测。该模型包含三个关键组件：统一的时空上下文编码器、新的图状态空间模型GraphMamba以及综合的不确定性量化模块。实验结果显示，HealthMamba在预测准确性和不确定性量化方面分别比现有最佳基线提高了6.0%和3.5%。

Comments IJCAI 2026

AI中文摘要

医疗设施访问预测对于优化医疗资源配置和为公共卫生政策提供依据至关重要。尽管已经采用了先进的机器学习方法以提高预测性能，但现有工作通常将此任务视为时间序列预测问题，而没有考虑不同类型的医疗设施的内在空间依赖性，且在公共紧急情况等异常情况下也无法提供可靠的预测。为了推进现有研究，我们提出了HealthMamba，一种考虑不确定性的时空框架，用于准确且可靠的医疗设施访问预测。HealthMamba包含三个关键组件：(i) 一个统一的时空上下文编码器，融合异构的静态和动态信息，(ii) 一种新的图状态空间模型称为GraphMamba用于分层时空建模，(iii) 一个综合的不确定性量化模块，整合三种不确定性量化机制以实现可靠的预测。我们在四个大规模真实世界数据集上评估了HealthMamba，这些数据集来自加州、纽约、得克萨斯州和佛罗里达州。结果表明，HealthMamba在预测准确性和不确定性量化方面分别比现有最佳基线提高了约6.0%和3.5%。

英文摘要

Healthcare facility visit prediction is essential for optimizing healthcare resource allocation and informing public health policy. Despite advanced machine learning methods being employed for better prediction performance, existing works usually formulate this task as a time-series forecasting problem without considering the intrinsic spatial dependencies of different types of healthcare facilities, and they also fail to provide reliable predictions under abnormal situations such as public emergencies. To advance existing research, we propose HealthMamba, an uncertainty-aware spatiotemporal framework for accurate and reliable healthcare facility visit prediction. HealthMamba comprises three key components: (i) a Unified Spatiotemporal Context Encoder that fuses heterogeneous static and dynamic information, (ii) a novel Graph State Space Model called GraphMamba for hierarchical spatiotemporal modeling, and (iii) a comprehensive uncertainty quantification module integrating three uncertainty quantification mechanisms for reliable prediction. We evaluate HealthMamba on four large-scale real-world datasets from California, New York, Texas, and Florida. Results show HealthMamba achieves around 6.0% improvement in prediction accuracy and 3.5% improvement in uncertainty quantification over state-of-the-art baselines.

2602.03067 2026-05-22 cs.LG cs.AI cs.NA math.NA 版本更新

FlashSinkhorn: IO-Aware Entropic Optimal Transport on GPU

FlashSinkhorn: GPU上的IO感知熵最优传输

Felix X. -F. Ye, Xingjie Li, An Yu, Ming-Ching Chang, Linsong Chu, Davis Wertheimer

发表机构 * Department of Mathematics & Statistics, University at Albany, Albany, NY, USA； Department of Mathematics & Statistics, University of North Carolina at Charlotte, Charlotte, NC, USA； Department of Computer Science, University at Albany, Albany, NY, USA； IBM T. J. Watson Research Center, Yorktown Heights, NY, USA

AI总结本文提出FlashSinkhorn，一种面向GPU的IO感知熵最优传输求解器。它将稳定化的对数域Sinkhorn更新改写为按行的LogSumExp归约，与Transformer注意力的归一化形式相同，从而可以采用FlashAttention风格的融合与分块，在保持线性内存的同时显著降低HBM IO。

AI中文摘要

基于Sinkhorn迭代的熵最优传输（EOT）在现代机器学习中应用广泛，但GPU求解器在大规模场景下仍然效率低下。张量化实现因稠密的n×m交互而产生二次量级的HBM流量；现有的在线后端虽然避免存储稠密矩阵，但仍依赖通用的分块map-reduce归约内核，融合程度有限。我们提出FlashSinkhorn，一种针对平方欧几里得代价的IO感知EOT求解器，它将稳定化的对数域Sinkhorn更新改写为对带偏置点积得分的按行LogSumExp归约，这与Transformer注意力的归一化形式相同。由此可以采用FlashAttention风格的融合与分块：融合的Triton内核让分块流经片上SRAM，并在单次遍历中更新对偶势，在保持线性内存开销的同时大幅减少每次迭代的HBM IO。我们还提供了用于施加传输计划（transport application）的流式内核，从而支持可扩展的一阶和二阶优化。在A100 GPU上，FlashSinkhorn在点云OT上相对最先进的在线基线，前向传递最高加速32倍，端到端最高加速161倍，并提升了基于OT的下游任务的可扩展性。为便于复现，我们在 https://github.com/ot-triton-lab/flash-sinkhorn 发布了开源实现。

英文摘要

Entropic optimal transport (EOT) via Sinkhorn iterations is widely used in modern machine learning, yet GPU solvers remain inefficient at scale. Tensorized implementations suffer quadratic HBM traffic from dense $n\times m$ interactions, while existing online backends avoid storing dense matrices but still rely on generic tiled map-reduce reduction kernels with limited fusion. We present FlashSinkhorn, an IO-aware EOT solver for squared Euclidean cost that rewrites stabilized log-domain Sinkhorn updates as row-wise LogSumExp reductions of biased dot-product scores, the same normalization as transformer attention. This enables FlashAttention-style fusion and tiling: fused Triton kernels stream tiles through on-chip SRAM and update dual potentials in a single pass, substantially reducing HBM IO per iteration while retaining linear-memory operations. We further provide streaming kernels for transport application, enabling scalable first- and second-order optimization. On A100 GPUs, FlashSinkhorn achieves up to $32\times$ forward-pass and $161\times$ end-to-end speedups over state-of-the-art online baselines on point-cloud OT, and improves scalability on OT-based downstream tasks. For reproducibility, we release an open-source implementation at https://github.com/ot-triton-lab/flash-sinkhorn.
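The rewrite of a log-domain Sinkhorn update as a row-wise LogSumExp over biased dot-product scores can be checked on a toy problem. For squared Euclidean cost, $C_{ij} = \|x_i\|^2 + \|y_j\|^2 - 2\langle x_i, y_j\rangle$, so the update $f_i = -\varepsilon \log \sum_j b_j \exp((g_j - C_{ij})/\varepsilon)$ becomes $\|x_i\|^2 - \varepsilon\,\mathrm{LSE}_j(2\langle x_i, y_j\rangle/\varepsilon + \beta_j)$ with a per-column bias $\beta_j$. The sketch below is plain Python rather than a Triton kernel; the function names, the tile size, and the running-max accumulation are illustrative assumptions, not the released implementation.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def f_update_dense(xs, ys, g, log_b, eps):
    """Reference update that evaluates every cost entry C_ij explicitly."""
    out = []
    for x in xs:
        terms = [
            lb + (gj - sum((a - c) ** 2 for a, c in zip(x, y))) / eps
            for y, gj, lb in zip(ys, g, log_b)
        ]
        m = max(terms)
        out.append(-eps * (m + math.log(sum(math.exp(t - m) for t in terms))))
    return out


def f_update_streaming(xs, ys, g, log_b, eps, tile=2):
    """Same update as a tiled LogSumExp of biased dot-product scores."""
    # Per-column bias: everything in the exponent that does not depend on x_i.
    bias = [lb + (gj - dot(y, y)) / eps for y, gj, lb in zip(ys, g, log_b)]
    out = []
    for x in xs:
        run_max, run_sum = -math.inf, 0.0
        for start in range(0, len(ys), tile):
            scores = [
                2.0 * dot(x, y) / eps + b
                for y, b in zip(ys[start:start + tile], bias[start:start + tile])
            ]
            new_max = max(run_max, max(scores))
            # Rescale the partial sum to the new maximum, then add this tile.
            run_sum = run_sum * math.exp(run_max - new_max) + sum(
                math.exp(s - new_max) for s in scores
            )
            run_max = new_max
        out.append(dot(x, x) - eps * (run_max + math.log(run_sum)))
    return out
```

Only one tile of scores exists at a time in the streaming version, which is the property that lets a fused kernel keep the working set in on-chip memory.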

2602.01935 2026-05-22 cs.LG cs.AI cs.PL 版本更新

针对实值时间序列的软贝叶斯上下文树模型

Shota Saito, Yuta Nakahara, Toshiyasu Matsushima

发表机构 * Gunma University（群马大学）； Waseda University（早稻田大学）

AI总结本文提出了一种新的软贝叶斯上下文树模型（Soft-BCT），用于实值时间序列。该模型采用概率性分裂上下文空间，而非传统上下文树模型中确定性的上下文空间分裂。基于变分推断提出学习算法，实验结果表明Soft-BCT在某些数据集上优于传统上下文树模型。

2601.10348 2026-05-22 cs.CL cs.AI cs.LG 版本更新

Training-Trajectory-Aware Token Selection

基于训练轨迹的token选择

Zhanming Shen, Jiaqi Hu, Zeyu Qin, Hao Chen, Wentao Ye, Zenan Huang, Yihong Zhuang, Guoshan Lu, Junlin Zhou, Junbo Zhao

发表机构 * Zhejiang University（浙江大学）； Hong Kong University of Science and Technology（香港科技大学）

AI总结本文提出T3S方法，通过在token层面重构训练目标，清除未学习token的优化路径，从而在连续蒸馏中提升性能，实验表明在AR和dLLM设置中均取得显著效果。

Comments Accepted by ICML 2026

AI中文摘要

高效的蒸馏是将昂贵的推理能力转化为可部署效率的关键途径。然而在前沿场景中，当学生模型已具备较强的推理能力时，朴素的持续蒸馏往往收益有限，甚至导致性能退化。我们观察到一种典型的训练现象：即使损失单调下降，所有性能指标也可能在几乎相同的瓶颈处急剧下降，随后才逐渐恢复。我们进一步揭示了token层面的机制：置信度出现分化，一类是置信度稳步上升、迅速锚定优化过程的模仿锚点token（Imitation-Anchor Tokens），另一类是尚未学会的token，其置信度在瓶颈之前一直受到抑制。这两类token无法共存，正是持续蒸馏失败的根本原因。为此，我们提出了训练轨迹感知的token选择（T3S）方法，在token层面重构训练目标，为尚未学会的token扫清优化路径。T3S在AR和dLLM设置中均带来一致的收益：仅用数百个示例，Qwen3-8B即可在竞争性推理基准上超越DeepSeek-R1，Qwen3-32B接近Qwen3-235B，经T3训练的LLaDA-2.0-Mini超过其AR基线，在所有16B规模的no-think模型中达到最先进性能。

英文摘要

Efficient distillation is a key pathway for converting expensive reasoning capability into deployable efficiency, yet in the frontier regime where the student already has strong reasoning ability, naive continual distillation often yields limited gains or even degradation. We observe a characteristic training phenomenon: even as loss decreases monotonically, all performance metrics can drop sharply at almost the same bottleneck, before gradually recovering. We further uncover a token-level mechanism: confidence bifurcates into steadily increasing Imitation-Anchor Tokens that quickly anchor optimization and other yet-to-learn tokens whose confidence is suppressed until after the bottleneck. The fact that these two types of tokens cannot coexist is the root cause of the failure in continual distillation. To this end, we propose Training-Trajectory-Aware Token Selection (T3S) to reconstruct the training objective at the token level, clearing the optimization path for yet-to-learn tokens. T3S yields consistent gains in both AR and dLLM settings: with only hundreds of examples, Qwen3-8B surpasses DeepSeek-R1 on competitive reasoning benchmarks, Qwen3-32B approaches Qwen3-235B, and T3-trained LLaDA-2.0-Mini exceeds its AR baseline, achieving state-of-the-art performance among all 16B-scale no-think models.

2601.05157 2026-05-22 cs.DS cs.LG stat.ML 版本更新

Learning Mixture Models via Efficient High-dimensional Sparse Fourier Transforms

通过高效的高维稀疏傅里叶变换学习混合模型

Alkis Kalavasis, Pravesh K. Kothari, Shuchen Li, Manolis Zampetakis

发表机构 * Yale University（耶鲁大学）； Princeton University（普林斯顿大学）

AI总结本文提出了一种在高维空间中以多项式时间和样本复杂度学习混合模型参数的方法，适用于重尾分布的混合模型，包括不具有有限协方差的分布，且不要求簇均值之间存在最小分离。

AI中文摘要

在本文中，我们给出一个时间复杂度和样本复杂度均为${\rm poly}(d,k)$的算法，用于高效学习$d$维空间中$k$个球形分布的混合模型的参数。与此前所有方法不同，我们的技术适用于重尾分布，其中甚至包括不具有有限协方差的例子。只要各簇分布的特征函数具有足够重的尾部，我们的方法就能成功。这类分布包括拉普拉斯分布，但关键的是不包括高斯分布。此前所有学习混合模型的方法都隐式或显式地依赖低阶矩。即使对于拉普拉斯分布的情形，我们也证明任何此类算法都必须使用超多项式数量的样本。因此，在为数不多的能够绕过矩方法局限的技术中，我们的方法又添一例。有些出人意料的是，我们的算法不要求簇均值之间存在任何最小分离。这与球形高斯混合模型形成鲜明对比：对后者而言，即使在信息论意义上，最小的$\ell_2$分离也可证明是必要的[Regev and Vijayaraghavan '17]。我们的方法能与现有技术良好组合，从而对每个分量要么具有重尾特征函数、要么具有亚高斯尾部且特征函数轻尾的混合模型，给出“两全其美”的保证。我们的算法基于一种通过高效高维稀疏傅里叶变换学习混合模型的新方法。我们相信这种方法将在统计估计中找到更多应用。作为例子，我们给出一个针对noise-oblivious（与噪声无关的）对手的一致鲁棒均值估计算法，该模型的实际动机来自多重假设检验方面的文献。该模型最近在其中一位作者的硕士论文中被正式提出，并已催生了后续工作。

英文摘要

In this work, we give a ${\rm poly}(d,k)$ time and sample algorithm for efficiently learning the parameters of a mixture of $k$ spherical distributions in $d$ dimensions. Unlike all previous methods, our techniques apply to heavy-tailed distributions and include examples that do not even have finite covariances. Our method succeeds whenever the cluster distributions have a characteristic function with sufficiently heavy tails. Such distributions include the Laplace distribution but crucially exclude Gaussians. All previous methods for learning mixture models relied implicitly or explicitly on the low-degree moments. Even for the case of Laplace distributions, we prove that any such algorithm must use super-polynomially many samples. Our method thus adds to the short list of techniques that bypass the limitations of the method of moments. Somewhat surprisingly, our algorithm does not require any minimum separation between the cluster means. This is in stark contrast to spherical Gaussian mixtures where a minimum $\ell_2$-separation is provably necessary even information-theoretically [Regev and Vijayaraghavan '17]. Our methods compose well with existing techniques and allow obtaining "best of both worlds" guarantees for mixtures where every component either has a heavy-tailed characteristic function or has a sub-Gaussian tail with a light-tailed characteristic function. Our algorithm is based on a new approach to learning mixture models via efficient high-dimensional sparse Fourier transforms. We believe that this method will find more applications to statistical estimation. As an example, we give an algorithm for consistent robust mean estimation against noise-oblivious adversaries, a model practically motivated by the literature on multiple hypothesis testing. It was formally proposed in a recent Master's thesis by one of the authors, and has already inspired follow-up works.

2512.19131 2026-05-22 cs.DC cs.LG 版本更新

Evidential Trust-Aware Model Personalization in Decentralized Federated Learning for Wearable IoT

面向可穿戴物联网的去中心化联邦学习中基于证据的信任感知模型个性化

Murtaza Rangwala, Richard O. Sinnott, Rajkumar Buyya

发表机构 * Quantum Cloud Computing and Distributed Systems (qCLOUDS) Lab（量子云计算与分布式系统实验室）； School of Computing and Information Systems（计算与信息系统学院）； The University of Melbourne, Australia（墨尔本大学）

AI总结本文提出Murmura框架，利用证据深度学习在去中心化联邦学习中实现信任感知的模型个性化：基于Dirichlet的证据模型所给出的认知（epistemic）不确定性可直接反映节点间的兼容性，从而减小非IID条件下的性能下降并加快收敛。

Comments v2. Addressed minor reviewer concerns

DOI: 10.1109/CCGrid68966.2026.00061

AI中文摘要

去中心化联邦学习（DFL）使边缘设备无需集中协调即可协作训练模型，从而避免单点故障。然而，本地数据非同分布所带来的统计异质性构成了一个根本性挑战：节点必须学习适应其本地分布的个性化模型，同时有选择地与兼容的对等节点协作。现有方法要么强制使用一个对谁都不够合适的单一全局模型，要么依赖启发式的对等节点选择机制，无法区分数据分布确实不兼容的节点与拥有宝贵互补知识的节点。我们提出Murmura，一个利用证据深度学习在DFL中实现信任感知模型个性化的框架。我们的关键见解是，基于Dirichlet的证据模型给出的认知（epistemic）不确定性可以直接反映对等节点的兼容性：当某个对等节点的模型在本地数据上评估时出现较高的认知不确定性，就说明存在分布不匹配，节点可以据此排除不兼容的影响，并通过选择性协作保持个性化模型。Murmura引入了一种信任感知的聚合机制：通过在本地验证样本上交叉评估来计算对等节点的兼容性分数，并基于证据信任和自适应阈值对模型聚合进行个性化。在三个可穿戴物联网数据集（UCI HAR、PAMAP2、PPG-DaLiA）上的评估表明，与基线相比，Murmura减小了从IID到非IID条件下的性能下降（0.9%对19.3%），收敛速度快7.4倍，并且在不同超参数选择下保持稳定的准确率。这些结果表明，证据不确定性可以作为去中心化异构环境中兼容性感知个性化的原则性基础。

英文摘要

Decentralized federated learning (DFL) enables collaborative model training across edge devices without centralized coordination, offering resilience against single points of failure. However, statistical heterogeneity arising from non-identically distributed local data creates a fundamental challenge: nodes must learn personalized models adapted to their local distributions while selectively collaborating with compatible peers. Existing approaches either enforce a single global model that fits no one well, or rely on heuristic peer selection mechanisms that cannot distinguish between peers with genuinely incompatible data distributions and those with valuable complementary knowledge. We present Murmura, a framework that leverages evidential deep learning to enable trust-aware model personalization in DFL. Our key insight is that epistemic uncertainty from Dirichlet-based evidential models directly indicates peer compatibility: high epistemic uncertainty when a peer's model evaluates local data reveals distributional mismatch, enabling nodes to exclude incompatible influence while maintaining personalized models through selective collaboration. Murmura introduces a trust-aware aggregation mechanism that computes peer compatibility scores through cross-evaluation on local validation samples and personalizes model aggregation based on evidential trust with adaptive thresholds. Evaluation on three wearable IoT datasets (UCI HAR, PAMAP2, PPG-DaLiA) demonstrates that Murmura reduces performance degradation from IID to non-IID conditions compared to baseline (0.9% vs. 19.3%), achieves 7.4$\times$ faster convergence, and maintains stable accuracy across hyperparameter choices. These results establish evidential uncertainty as a principled foundation for compatibility-aware personalization in decentralized heterogeneous environments.
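The link between evidential uncertainty and peer selection can be sketched with the standard Dirichlet formulation used in evidential deep learning, where non-negative evidence $e_k$ for each of $K$ classes gives $\alpha_k = e_k + 1$ and an uncertainty mass $u = K / \sum_k \alpha_k$. Everything else below is an assumption made for illustration and is not taken from the paper: the trust score (one minus mean uncertainty), the fixed threshold standing in for the adaptive one, the `self_weight` mixing parameter, and all function names.

```python
def uncertainty_mass(evidence):
    """u = K / sum(alpha) with alpha_k = evidence_k + 1 (Dirichlet parameters)."""
    return len(evidence) / sum(e + 1.0 for e in evidence)


def trust_weights(peer_evidence, threshold):
    """Map each peer to a normalized weight, dropping peers below the threshold.

    peer_evidence: {peer_id: [evidence vector per local validation sample]},
    produced by running the peer's model on this node's validation data.
    """
    scores = {}
    for peer, outputs in peer_evidence.items():
        mean_u = sum(uncertainty_mass(e) for e in outputs) / len(outputs)
        scores[peer] = 1.0 - mean_u  # hypothetical trust score
    kept = {p: s for p, s in scores.items() if s >= threshold}
    total = sum(kept.values())
    return {p: s / total for p, s in kept.items()} if total > 0 else {}


def aggregate(own, peers, weights, self_weight=0.5):
    """Blend own parameters with the trust-weighted average of kept peers."""
    if not weights:
        return list(own)
    mixed = [sum(weights[p] * peers[p][i] for p in weights) for i in range(len(own))]
    return [self_weight * o + (1.0 - self_weight) * m for o, m in zip(own, mixed)]
```

A peer whose model produces almost no evidence on local data gets an uncertainty mass near 1, falls below the threshold, and contributes nothing to the aggregate.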

2512.12744 2026-05-22 cs.LG 版本更新

Resting Neurons, Active Insights: Robustifying Activation Sparsity in LLMs via Spontaneity

静息神经元，主动洞察：通过自发性提升LLM激活稀疏性的鲁棒性

Haotian Xu, Jiannan Yang, Tian Gao, Tsui-Wei Weng, Tengfei Ma

发表机构 * IBM Thomas J. Watson Research Center, Yorktown Heights, USA（IBM托马斯·J·沃森研究中心，美国约克敦海茨）； Halıcıoğlu Data Science Institute, UC San Diego, La Jolla, USA（加州大学圣迭戈分校Halıcıoğlu数据科学研究所，美国拉霍亚）； Stony Brook University, Stony Brook, USA（石溪大学，美国石溪）

AI总结本文提出了一种通过引入自发神经元（SPON）来增强LLM中激活稀疏性的方法，解决了高稀疏率下模型精度下降的问题，通过分布匹配训练SPON，使模型在稀疏计算中保持稳定和泛化能力。

Comments ICML 2026

AI中文摘要

激活稀疏性提供了一种有吸引力的途径来加速大型语言模型（LLM）的推理过程，通过选择性地抑制隐藏激活。然而，现有方法在高稀疏率下表现出严重的准确性下降。我们发现，这种失败源于表征不稳定：*激活稀疏性破坏了预训练期间学习的输入依赖激活，导致隐藏状态的分布偏移。*我们通过将激活稀疏性重新定义为表征对齐问题，并引入**自发神经元（SPON）**，一种受生物系统中自发神经活动启发的轻量机制。SPON注入一组小的可学习、输入无关的激活向量，作为稀疏计算中的持久表征锚点。这些向量通过分布匹配训练与密集模型匹配，并在训练后可吸收进偏置项中，带来极小的推理开销。在多个LLM架构上，SPON一致地恢复了性能，稳定了潜在表征，并保持了泛化能力。我们的结果确立了SPON作为可靠激活稀疏推理的有效且原则性解决方案，并为LLM的知识保留提供了新的见解。

英文摘要

Activation sparsity offers a compelling route to accelerate large language model (LLM) inference by selectively suppressing hidden activations, yet existing approaches exhibit severe accuracy degradation at high sparsity. We show that this failure stems from representational instability: *activation sparsity disrupts input-dependent activation learned during pretraining, inducing distribution shifts in hidden states.* We address this issue by reframing activation sparsity as a representational alignment problem and introducing **Spontaneous Neurons (SPON)**, a lightweight mechanism inspired by spontaneous neural activity in biological systems. SPON injects a small set of learnable, input-independent activation vectors that act as persistent representational anchors for sparse computation. These vectors are trained via distribution matching to the dense model and can be absorbed into bias terms after training, incurring negligible inference overhead. Across multiple LLM backbones, SPON consistently restores performance, stabilizes latent representations, and preserves generalization. Our results establish SPON as an effective and principled solution for reliable activation-sparse inference, and offer new insights into knowledge retention in LLMs.
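The claim that input-independent vectors can be absorbed into bias terms follows from linearity: if a learned vector $s$ is added to the sparsified input of a linear layer, then $W(\tilde{h} + s) + b = W\tilde{h} + (b + Ws)$. The sketch below checks this on a small example. Where exactly the vectors are injected, the magnitude-threshold sparsifier, and the function names are assumptions for illustration rather than details taken from the paper.

```python
def matvec(w, v):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def sparsify(h, threshold):
    """Stand-in activation sparsifier: zero out small-magnitude entries."""
    return [x if abs(x) >= threshold else 0.0 for x in h]


def forward_with_spon(w, b, h, spon, threshold):
    """Linear layer with an input-independent vector added after sparsification."""
    anchored = [x + s for x, s in zip(sparsify(h, threshold), spon)]
    return [y + bi for y, bi in zip(matvec(w, anchored), b)]


def fold_spon_into_bias(w, b, spon):
    """Precompute b + W @ spon once, after training."""
    return [bi + ws for bi, ws in zip(b, matvec(w, spon))]


def forward_folded(w, b_folded, h, threshold):
    """Same layer at inference time, with no extra vector addition."""
    return [y + bi for y, bi in zip(matvec(w, sparsify(h, threshold)), b_folded)]
```

After folding, the sparse forward pass has the same cost as before, which is consistent with the negligible inference overhead the abstract reports.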

2512.11587 2026-05-22 cs.LG cs.NA math.NA math.OC 版本更新

Gradient Descent as a Perceptron Algorithm: Understanding Dynamics and Implicit Acceleration

梯度下降作为感知机算法：理解动态与隐式加速

Alexander Tyurin

发表机构 * Applied AI Institute, Moscow, Russia（应用人工智能研究所，莫斯科，俄罗斯）

AI总结本文研究了梯度下降在神经网络训练中的优化动态和隐式加速现象，通过非线性模型分析显示梯度下降步骤等价于广义感知机算法，揭示了非线性模型在迭代复杂度上的优势。

AI中文摘要

即使对于应用于神经网络训练的梯度下降（GD）方法，理解其优化动态（包括收敛速度、迭代轨迹、函数值振荡，尤其是隐式加速现象）仍然是一个具有挑战性的问题。我们分析了采用逻辑损失的非线性模型，并证明GD的步骤可以归约为广义感知机算法（Rosenblatt, 1958）的步骤，从而为理解其动态提供了新的视角。这种归约得到了明显更简单的算法步骤，我们用经典的线性代数工具对其进行分析。借助这些工具，我们在一个极简示例上证明，两层模型中的非线性可以带来可证明更快的迭代复杂度$\tilde{O}(\sqrt{d})$，而线性模型的迭代复杂度为$Ω(d)$，其中$d$是特征数量。这有助于解释神经网络中观察到的优化动态和隐式加速现象。理论结果得到了大量数值实验的支持。我们相信这一新的视角将进一步推动神经网络优化的研究。

英文摘要

Even for the gradient descent (GD) method applied to neural network training, understanding its optimization dynamics, including convergence rate, iterate trajectories, function value oscillations, and especially its implicit acceleration, remains a challenging problem. We analyze nonlinear models with the logistic loss and show that the steps of GD reduce to those of generalized perceptron algorithms (Rosenblatt, 1958), providing a new perspective on the dynamics. This reduction yields significantly simpler algorithmic steps, which we analyze using classical linear algebra tools. Using these tools, we demonstrate on a minimalistic example that the nonlinearity in a two-layer model can provably yield a faster iteration complexity $\tilde{O}(\sqrt{d})$ compared to $Ω(d)$ achieved by linear models, where $d$ is the number of features. This helps explain the optimization dynamics and the implicit acceleration phenomenon observed in neural networks. The theoretical results are supported by extensive numerical experiments. We believe that this alternative view will further advance research on the optimization of neural networks.
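The connection between GD on the logistic loss and perceptron updates is easiest to see in the familiar linear case, which is all the sketch below covers; the paper's reduction concerns nonlinear models and generalized perceptron algorithms, which are not reproduced here. For the loss $\log(1 + e^{-y\langle w, x\rangle})$, a GD step adds $\eta\,\sigma(-y\langle w, x\rangle)\,y\,x$ per sample, so each sample contributes a perceptron-style update scaled by a soft misclassification weight. The batch form of the perceptron and the function names are assumptions for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _weighted_step(w, data, lr, weight_fn):
    """w + lr * sum_i weight_fn(margin_i) * y_i * x_i, with margin_i = y_i <w, x_i>."""
    direction = [0.0] * len(w)
    for x, y in data:
        coef = weight_fn(y * dot(w, x))
        for i, xi in enumerate(x):
            direction[i] += coef * y * xi
    return [wi + lr * d for wi, d in zip(w, direction)]


def gd_step(w, data, lr):
    """One full-batch GD step on the logistic loss (soft weights in (0, 1))."""
    return _weighted_step(w, data, lr, lambda margin: sigmoid(-margin))


def perceptron_step(w, data, lr):
    """Batch perceptron step: hard weight 1 on misclassified samples, else 0."""
    return _weighted_step(w, data, lr, lambda margin: 1.0 if margin <= 0 else 0.0)
```

When the margins are large in magnitude, the soft weights saturate at 0 or 1 and the two steps coincide numerically.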

2512.09472 2026-05-22 cs.DC cs.LG 版本更新

WarmServe: Enabling One-for-Many GPU Prewarming for Multi-LLM Serving

WarmServe: 为多LLM服务实现一对多GPU预热

Chiheng Lou, Sheng Qi, Rui Kang, Yong Zhang, Chen Sun, Pengcheng Wang, Xuanzhe Liu, Xin Jin

发表机构 * School of Computer Science, Peking University（北京大学计算机科学学院）； Huawei Technologies Co., Ltd（华为技术有限公司）

AI总结本文提出WarmServe系统，基于工作负载预测进行一对多GPU预热，在多LLM服务中降低尾部首token时间（TTFT）并提高请求吞吐量。

Comments Accepted at ICML 2026

AI中文摘要

在共享GPU集群中部署多个模型，是提高大型语言模型（LLM）服务资源效率的关键策略。现有的多LLM服务系统以推理性能下降为代价来提高GPU利用率，其中首token时间（TTFT）受影响尤为明显。我们将这种性能下降归因于系统对未来工作负载特征缺乏感知。与此相对，近期的分析表明，真实世界的LLM服务工作负载具有很强的周期性和长期可预测性。在本文中，我们提出“一对多”GPU预热方法，根据工作负载预测主动将多个模型的参数加载到GPU上。这些预热的权重使系统能够在遇到请求突发时迅速实例化服务实例。我们设计并实现了WarmServe，一个多LLM服务系统，包含三项关键技术：（1）模型放置算法，通过优化预热决策来最小化跨模型的预热干扰；（2）KV缓存预留策略，将运行中GPU上空闲的KV缓存空间用于预热新模型；（3）用于张量管理的高效GPU内存切换机制。在真实世界数据集上的评估显示，与最先进的基于自动扩缩容的系统相比，WarmServe将尾部TTFT最多降低50.8倍，同时支持的请求吞吐量最高可达GPU共享系统的2.5倍。

英文摘要

Deploying multiple models within shared GPU clusters is a key strategy to improve resource efficiency in large language model (LLM) serving. Existing multi-LLM serving systems improve GPU utilization at the cost of degraded inference performance, particularly time-to-first-token (TTFT). We attribute this degradation to the lack of awareness regarding future workload characteristics. In contrast, recent analyses have shown the strong periodicity and long-term predictability of real-world LLM serving workloads. In this paper, we propose one-for-many GPU prewarming, which proactively loads parameters from multiple models onto GPUs based on workload forecasts. These prewarmed weights enable the system to promptly instantiate serving instances upon encountering request bursts. We design and implement WarmServe, a multi-LLM serving system incorporating three key techniques: (1) a model placement algorithm that optimizes prewarming decisions to minimize cross-model prewarming interference, (2) a KV cache reservation strategy that repurposes idle KV cache space on running GPUs for prewarming new models, and (3) an efficient GPU memory switching mechanism for tensor management. Evaluation on real-world datasets shows that WarmServe reduces tail TTFT by up to 50.8$\times$ compared to the state-of-the-art autoscaling-based system, while supporting up to 2.5$\times$ higher request throughput than the GPU-sharing system.

2511.18159 2026-05-22 cs.LG 版本更新

Bringing Stability to Diffusion: Decomposing and Reducing Variance of Training Masked Diffusion Models

为扩散模型带来稳定性：分解和减少训练掩码扩散模型的方差

Mengni Jia, Mengyu Zhou, Yihao Liu, Xiaoxi Jiang, Guanjun Jiang

发表机构 * University of Cambridge（剑桥大学）； Peking University（北京大学）； Qwen Large Model Application Team, Alibaba（阿里巴巴通义大模型应用团队）

AI总结本文研究了掩码扩散模型（MDMs）训练方差高导致不稳定的问题，通过分解方差来源并提出六种方差减少方法，显著提升了模型在复杂推理任务中的准确率，并将运行间变异性降低至自回归模型（ARMs）水平。

英文摘要

Masked diffusion models (MDMs) are a promising alternative to autoregressive models (ARMs), but they suffer from inherently much higher training variance. High variance leads to noisier gradient estimates and unstable optimization, so even equally strong pretrained MDMs and ARMs that are competitive at initialization often diverge after task-specific training, with MDMs falling far behind. There has been no theoretical explanation or systematic solution. We derive the first decomposition of MDM training variance into three sources: (A) masking pattern noise, (B) masking rate noise, and (C) data noise, while ARMs are only affected by (C). This explains the fundamental training gap. Building on this foundation, we design six variance-reduction methods, including two core methods: (1) P-POTS, a Pareto-optimal t sampler that minimizes training variance by sampling harder t values more often with appropriately smaller update steps, and (2) MIRROR, which uses negatively correlated samples to reduce (A). Experiments show that compared to standard MDM training, our methods improve accuracy by 7-8% on complex reasoning tasks, while simultaneously reducing run-to-run variability to near ARM levels, substantially narrowing the gap with strong ARM baselines; in most settings, even the best baseline runs remain below the worst run of our method.
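One simple way to obtain negatively correlated masking patterns is antithetic sampling: draw one uniform number per position, mask where it falls below the masking rate, and build a second mask from the mirrored uniforms. The abstract does not say how MIRROR constructs its samples, so the construction below is an assumption for illustration, and the function name is hypothetical.

```python
import random


def mirrored_masks(seq_len, t, rng):
    """Return two masks with per-position masking probability t each.

    The second mask reuses the same uniforms reflected around 0.5, so a
    position that is masked in one sample tends to be visible in the other.
    """
    u = [rng.random() for _ in range(seq_len)]
    mask = [x < t for x in u]
    mirror = [1.0 - x < t for x in u]
    return mask, mirror
```

Each mask on its own is an ordinary Bernoulli mask at rate `t`, so the per-sample loss keeps its expectation; for `t <= 0.5` the two masks never hide the same position, which is the source of the negative correlation.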

2511.10619 2026-05-22 cs.LG stat.ML 版本更新

Algorithm Design and Stronger Guarantees for the Improving Multi-Armed Bandits Problem

改进型多臂老虎机问题的算法设计与更强保证

Avrim Blum, Marten Garicano, Kavya Ravichandran, Dravyansh Sharma

发表机构 * Toyota Technological Institute at Chicago（芝加哥丰田技术研究所）； University of Chicago（芝加哥大学）； IDEAL Institute, Toyota Technological Institute at Chicago（IDEAL研究所，芝加哥丰田技术研究所）

AI总结本文提出两种新的参数化老虎机算法家族，通过离线数据界定了学习近最优算法的样本复杂度，并在标准超参数调优基准上进行了实证评估。第一家族包含先前工作的最优随机算法，展示在满足额外凹性性质的臂奖励曲线下，可以实现更强的保证。第二家族算法在良好行为实例上保证最佳臂识别，在不良行为实例上退化为最坏情况保证。

Comments 36 pages

AI中文摘要

改进多臂老虎机问题是一个在不确定性下分配努力的形式模型，受投资新技术研究努力、进行临床试验和从学习曲线中选择超参数等场景的启发。每次拉取臂提供奖励，该奖励以递减回报单调增加。已有大量工作设计了改进老虎机算法，但最坏情况保证较为悲观。事实上，已知确定性和随机性算法相对于最优臂的强下界分别为Ω(k)和Ω(√k)的乘法近似因子。在本文中，我们提出两个新的参数化老虎机算法家族，并利用离线数据界定了从每个家族学习近最优算法的样本复杂度。我们还在标准超参数调优基准上进行了实证评估。我们定义的第一家族包含先前工作的最优随机算法。我们证明，适当选择的算法从该家族中可以实现更强的保证，当臂奖励曲线下满足与凹性强度相关的额外性质时，具有最优的k依赖性。我们的第二家族包含在良好行为实例上保证最佳臂识别并在不良行为实例上退化为最坏情况保证的算法。

英文摘要

The improving multi-armed bandits problem is a formal model for allocating effort under uncertainty, motivated by scenarios such as investing research effort into new technologies, performing clinical trials, and hyperparameter selection from learning curves. Each pull of an arm provides reward that increases monotonically with diminishing returns. A growing line of work has designed algorithms for improving bandits, albeit with somewhat pessimistic worst-case guarantees. Indeed, strong lower bounds of $\Omega(k)$ and $\Omega(\sqrt{k})$ multiplicative approximation factors are known for both deterministic and randomized algorithms (respectively) relative to the optimal arm, where $k$ is the number of bandit arms. In this work, we propose two new parameterized families of bandit algorithms and bound the sample complexity of learning the near-optimal algorithm from each family using offline data. We also perform empirical evaluations on standard hyperparameter tuning benchmarks. The first family we define includes the optimal randomized algorithm from prior work. We show that an appropriately chosen algorithm from this family can achieve stronger guarantees, with optimal dependence on $k$, when the arm reward curves satisfy additional properties related to the strength of concavity. Our second family contains algorithms that both guarantee best-arm identification on well-behaved instances and revert to worst-case guarantees on poorly-behaved instances.

2511.07885 2026-05-22 cs.DC cs.AI cs.CL cs.LG (version update)

Intelligence per Watt: Measuring Intelligence Efficiency of Local AI

Jon Saad-Falcon, Avanika Narayan, Hakki Orhun Akengin, J. Wes Griffin, Herumb Shandilya, Adrian Gamarra Lafuente, Medhya Goel, Rebecca Joseph, Shlok Natarajan, Etash Kumar Guha, Shang Zhu, Ben Athiwaratkun, John Hennessy, Azalia Mirhoseini, Christopher Ré

Affiliations: Stanford University; Together AI

AI summary: This paper studies the energy efficiency and capability of local AI, proposes intelligence per watt (IPW) as a unified metric, shows that local inference can redistribute demand away from centralized infrastructure, and reveals headroom for optimizing local accelerators.

Abstract

Large language model (LLM) queries are predominantly processed by frontier models in centralized cloud infrastructure. Demand growth strains this paradigm faster than providers can scale. Two advances create an opportunity to rethink it: small, local LMs (<=20B active parameters) now achieve competitive performance to frontier models on many tasks, and local accelerators (e.g., Apple M4 Max) can host these models at interactive latencies. This raises the question: can local inference viably redistribute demand from centralized infrastructure? This requires measuring both whether local LMs can accurately answer real-world queries and whether they can do so efficiently on power-constrained devices (e.g., laptops). We propose intelligence per watt (IPW), task accuracy per unit of power, as a unified metric for the capability and efficiency of local inference across model-accelerator configurations. We evaluate 20+ state-of-the-art local LMs, 8 hardware accelerators (local and cloud), and 1M real-world single-turn chat and reasoning queries. For each query, we measure accuracy (local LM win rate against frontier models), energy, latency, and power. We find three key results. First, local LMs successfully answer 88.7% of these queries, with accuracy varying by domain. Second, longitudinal analysis from 2023-2025 shows IPW improved 5.3x, driven by both algorithmic and accelerator advances, with locally-serviceable query coverage rising from 23.2% to 71.3%. Third, local accelerators achieve at least 1.4x lower IPW than cloud accelerators running identical models, revealing significant headroom for local accelerator optimization. These findings demonstrate that local inference can meaningfully redistribute demand from centralized infrastructure for a substantial subset of queries, with IPW serving as the critical metric for tracking this transition.
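The abstract defines IPW as task accuracy per unit of power. A minimal sketch of that ratio follows. The record type, the field names, and the choice to take mean power as total energy divided by total latency are assumptions made for this example; the abstract does not say how the paper aggregates power across queries.

```python
from dataclasses import dataclass


@dataclass
class QueryMeasurement:
    """Hypothetical per-query record; field names are illustrative only."""
    correct: bool           # did the local LM answer acceptably?
    energy_joules: float    # energy drawn while serving the query
    latency_seconds: float  # wall-clock time to serve the query


def intelligence_per_watt(measurements):
    """Accuracy divided by mean power (watts) over a batch of queries.

    Assumption: mean power = total energy / total latency.
    """
    if not measurements:
        raise ValueError("need at least one measurement")
    accuracy = sum(m.correct for m in measurements) / len(measurements)
    total_energy = sum(m.energy_joules for m in measurements)
    total_time = sum(m.latency_seconds for m in measurements)
    mean_power_watts = total_energy / total_time
    return accuracy / mean_power_watts
```

Under this definition, a configuration that answers the same share of queries correctly at half the mean power has twice the IPW.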

2511.04838 2026-05-22 cs.LG math.SP q-bio.MN (version update)

SPECTRA: Spectral Domain-Aware Graph Generation for Imbalanced Molecular Property Regression

Brenda Nogueira, Gisela A. Gonzalez-Montiel, Meng Jiang, Nitesh V. Chawla, Nuno Moniz

Affiliations: University of Notre Dame, Dept. of Computer Science; University of Notre Dame, Dept. of Chemistry; University of Notre Dame, Lucy Family Institute for Data & Society, Notre Dame, Indiana, USA; University of Notre Dame, Lucy Family Institute for Data & Society

AI summary: This paper proposes SPECTRA, which combines a scarcity-aware budgeting scheme, target-neighbor graph alignment, and Laplacian spectral interpolation to improve prediction of relevant but data-scarce molecular property values. It outperforms state-of-the-art methods in the relevant target ranges while reducing computation time by about 4x.

2511.02043 2026-05-22 cs.LG cs.PF (version update)

Flashlight: PyTorch Compiler Extensions to Accelerate Attention Variants

Bozhi You, Irene Wang, Zelal Su Mustafaoglu, Abhinav Jangda, Angélica Moreira, Roshan Dathathri, Divya Mahajan, Keshav Pingali

Affiliations: Anonymous Institution, Anonymous City, Anonymous Region, Anonymous Country

AI summary: This paper presents Flashlight, a compiler framework built on PyTorch that automatically generates fused, FlashAttention-style kernels for arbitrary attention programs without static templates or predefined kernel specializations, offering flexibility without sacrificing performance.

Abstract

Attention is a fundamental building block of large language models (LLMs), so there have been many efforts to implement it efficiently. For example, FlashAttention leverages tiling and kernel fusion to optimize attention. Recently, a number of variants of attention have been introduced to enhance model quality or efficiency. Supporting them efficiently remains difficult since they usually require specialized kernels or hand-tuned implementations. FlexAttention recently addressed part of this gap by using static programming templates to support FlashAttention-like kernels for a subset of attention variants. In this paper, we introduce Flashlight, a compiler-native framework within the PyTorch ecosystem that automatically generates fused, FlashAttention-style kernels for arbitrary attention-based programs, without relying on static templates or predefined kernel specializations. Flashlight leverages PyTorch's compilation workflow to fuse and tile attention computations transparently, enabling efficient execution for diverse attention patterns. Not only does it support all variants expressible in the FlexAttention model but it also handles more general, data-dependent attention formulations that are beyond the capabilities of FlexAttention. Our results show that Flashlight produces kernels with competitive or superior performance to FlexAttention, while offering the flexibility of native PyTorch code, enabling developers to rapidly explore new attention models without sacrificing performance.
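The abstract mentions that FlashAttention relies on tiling and kernel fusion. The sketch below illustrates only the tiling idea, using the well-known online-softmax recurrence for a single query in plain Python. It is not Flashlight's API or a kernel it would generate, and the function names are made up for this example.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_reference(q, keys, values):
    """Plain softmax attention for one query, computed over all keys at once."""
    scores = [_dot(q, k) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def attention_tiled(q, keys, values, tile=2):
    """Same result, but keys/values are visited one tile at a time.

    Only a running max, a running normaliser, and a running weighted sum are
    kept, so the full score vector is never materialised.
    """
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), tile):
        scores = [_dot(q, k) for k in keys[start:start + tile]]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated under the old max (0.0 on the first tile).
        rescale = math.exp(running_max - new_max)
        running_sum *= rescale
        acc = [a * rescale for a in acc]
        for s, v in zip(scores, values[start:start + tile]):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vd for a, vd in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]
```

Both functions should agree up to floating-point rounding for any tile size, which is what lets a fused kernel process attention in blocks.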

2510.04280 2026-05-22 cs.LG cs.AI cs.RO (version update)

A KL-regularization Framework for Learning to Plan with Adaptive Priors

Álvaro Serra-Gomez, Daniel Jarne Ornia, Dhruva Tirumala, Thomas Moerland

Affiliations: LIACS, Leiden University, Leiden, The Netherlands; Google Deepmind, London, United Kingdom; University of Oxford, Oxford, United Kingdom

AI summary: This paper proposes a KL-regularized framework for learning to plan that integrates the planner's action distribution as a prior in policy optimization, improving the sample efficiency and long-term performance of model-based reinforcement learning on high-dimensional continuous control tasks.

Comments Published at ICML 2026

Abstract

Effective exploration remains a central challenge in model-based reinforcement learning (MBRL), particularly in high-dimensional continuous control tasks where sample efficiency is crucial. A prominent line of recent work leverages learned policies as proposal distributions for Model-Predictive Path Integral (MPPI) planning. Initial approaches update the sampling policy independently of the planner distribution, typically maximizing a learned value function with deterministic policy gradient and entropy regularization. However, because the states encountered during training depend on the MPPI planner, aligning the sampling policy with the planner improves the accuracy of value estimation and long-term performance. To this end, recent methods update the sampling policy by minimizing KL divergence to the planner distribution or by introducing planner-guided regularization into the policy update. In this work, we unify these MPPI-based reinforcement learning methods under a single framework by introducing Policy Optimization-Model Predictive Control (PO-MPC), a family of KL-regularized MBRL methods that integrate the planner's action distribution as a prior in policy optimization. By aligning the learned policy with the planner's behavior, PO-MPC allows more flexibility in the policy updates to trade off Return maximization and KL divergence minimization. We clarify how prior approaches emerge as special cases of this family, and we explore previously unstudied variations. Our experiments show that these extended configurations yield significant performance improvements, advancing the state of the art in MPPI-based RL.
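To make the trade-off between return maximization and KL minimization concrete, here is the textbook single-state form of a KL-regularized policy objective with a prior. The symbols $Q$ (a learned action value), $p$ (the planner's action distribution), and $\lambda$ (the regularization weight) are assumptions made for this illustration; the abstract does not state the exact objective or the direction of the KL term used by PO-MPC.

$$
\max_{\pi}\; \mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[Q(s,a)\big] \;-\; \lambda\, \mathrm{KL}\big(\pi(\cdot \mid s)\,\|\,p(\cdot \mid s)\big)
\quad\Longrightarrow\quad
\pi^{*}(a \mid s) \;=\; \frac{p(a \mid s)\,\exp\big(Q(s,a)/\lambda\big)}{\sum_{a'} p(a' \mid s)\,\exp\big(Q(s,a')/\lambda\big)}
$$

In this generic form, a large $\lambda$ keeps the policy close to the prior $p$, while a small $\lambda$ concentrates it on the highest-value actions in the support of $p$ (for continuous actions the sum becomes an integral).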

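2509.12610 2026-05-22 cs.DB cs.AI cs.LG (version update)

ScaleDoc: Scaling LLM-based Predicates over Large Document Collections

Hengrui Zhang, Yulong Hui, Yihao Liu, Huanchen Zhang

Affiliations: Tsinghua University

AI summary: This paper presents ScaleDoc, a system that splits predicate execution into an offline representation phase and an optimized online filtering phase to address the high LLM inference cost of large-scale document analysis, achieving end-to-end speedups and fewer LLM invocations.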

DOI: 10.1145/3802106

Abstract

Predicates are foundational components in data analysis systems. However, modern workloads increasingly involve unstructured documents, which demand semantic understanding beyond traditional value-based predicates. Given enormous document collections and ad-hoc queries, Large Language Models (LLMs) demonstrate powerful zero-shot capabilities, but their high inference cost leads to unacceptable overhead. Therefore, we introduce ScaleDoc, a novel system that addresses this by decoupling predicate execution into an offline representation phase and an optimized online filtering phase. In the offline phase, ScaleDoc leverages an LLM to generate semantic representations for each document. Online, for each query, it trains a lightweight proxy model on these representations to filter the majority of documents, forwarding only the ambiguous cases to the LLM for the final decision. Furthermore, ScaleDoc proposes two core innovations to achieve significant efficiency: (1) a contrastive-learning-based framework that trains the proxy model to generate reliable predicate decision scores; (2) an adaptive cascade mechanism that determines an effective filtering policy while meeting specific accuracy targets. Our evaluations across three datasets demonstrate that ScaleDoc achieves over a 2x end-to-end speedup and reduces expensive LLM invocations by up to 85%, making large-scale semantic analysis practical and efficient.
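The online phase described above can be pictured as a two-threshold cascade: confident proxy scores are decided directly, and only the ambiguous middle band reaches the LLM. The sketch below is a simplified illustration under that assumption. The function name, the fixed thresholds `low` and `high`, and the callables are hypothetical; the paper's adaptive mechanism for choosing the filtering policy against an accuracy target is not reproduced here.

```python
def cascade_filter(doc_ids, proxy_score, llm_predicate, low, high):
    """Split documents into accepted/rejected, calling the LLM only when unsure.

    proxy_score(doc_id)   -> float score from a cheap proxy model (hypothetical)
    llm_predicate(doc_id) -> bool verdict from the expensive LLM (hypothetical)
    Scores >= high are accepted, scores <= low are rejected, and scores
    strictly between the two thresholds are forwarded to the LLM.
    """
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    accepted, rejected, llm_calls = [], [], 0
    for doc_id in doc_ids:
        score = proxy_score(doc_id)
        if score >= high:
            accepted.append(doc_id)
        elif score <= low:
            rejected.append(doc_id)
        else:
            llm_calls += 1
            if llm_predicate(doc_id):
                accepted.append(doc_id)
            else:
                rejected.append(doc_id)
    return accepted, rejected, llm_calls
```

Widening the band between `low` and `high` sends more documents to the LLM, which raises cost in exchange for fewer proxy mistakes. Choosing that band is the role of the adaptive cascade mechanism.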

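2509.09088 2026-05-22 cs.LG math.DG math.DS (version update)

An entropy formula for the Deep Linear Network

Govind Menon, Tianmin Yu

Affiliations: Division of Applied Mathematics, Brown University; School of Mathematics, Institute for Advanced Study; Department of Mathematics, Northwestern University

AI summary: This paper studies the Riemannian geometry of the Deep Linear Network in order to build a thermodynamic description of the learning process. Overparameterization is analyzed through group actions, and a Riemannian submersion from the space of parameters to the space of observables is used to define and compute the Boltzmann entropy. The main technical step is an explicit construction of an orthogonal basis for the tangent space of the balanced manifold using the theory of Jacobi matrices.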

Comments Final version of accepted paper in SIAM Journal on Mathematical Analysis. Includes fixes of minor typos (especially equations (3.13), (6.35) and (6.36)).

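2508.06884 2026-05-22 math.OC cs.LG (version update)

Near-Optimal Convergence of Accelerated Gradient Methods under Generalized and $(L_0, L_1)$-Smoothness

Alexander Tyurin

Affiliations: Applied AI Institute, Moscow, Russia

AI summary: This paper studies first-order methods for convex optimization problems satisfying the recently proposed $\ell$-smoothness condition. While accelerated gradient descent (AGD) attains the optimal complexity $O(\sqrt{L} R / \sqrt{\varepsilon})$ under $L$-smoothness, existing extensions to $\ell$-smoothness either incur extra dependence on the initial gradient, suffer exponential factors in $L_1 R$, or require costly auxiliary subroutines. The paper resolves this open question: with a new Lyapunov function and new algorithms, it achieves $O(\sqrt{\ell(0)} R / \sqrt{\varepsilon})$ oracle complexity for small $\varepsilon$ and virtually any $\ell$. For instance, under $(L_0, L_1)$-smoothness, the bound $O(\sqrt{L_0} R / \sqrt{\varepsilon})$ is provably optimal in the small-$\varepsilon$ regime and removes all non-constant multiplicative factors present in prior accelerated algorithms.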

Abstract

We study first-order methods for convex optimization problems with functions $f$ satisfying the recently proposed $\ell$-smoothness condition $||\nabla^{2}f(x)|| \le \ell\left(||\nabla f(x)||\right),$ which generalizes the $L$-smoothness and $(L_{0},L_{1})$-smoothness. While accelerated gradient descent AGD is known to reach the optimal complexity $O(\sqrt{L} R / \sqrt{\varepsilon})$ under $L$-smoothness, where $\varepsilon$ is an error tolerance and $R$ is the distance between a starting and an optimal point, existing extensions to $\ell$-smoothness either incur extra dependence on the initial gradient, suffer exponential factors in $L_{1} R$, or require costly auxiliary sub-routines, leaving open whether an AGD-type $O(\sqrt{\ell(0)} R / \sqrt{\varepsilon})$ rate is possible for small-$\varepsilon$, even in the $(L_{0},L_{1})$-smoothness case. We resolve this open question. Leveraging a new Lyapunov function and designing new algorithms, we achieve $O(\sqrt{\ell(0)} R / \sqrt{\varepsilon})$ oracle complexity for small-$\varepsilon$ and virtually any $\ell$. For instance, for $(L_{0},L_{1})$-smoothness, our bound $O(\sqrt{L_0} R / \sqrt{\varepsilon})$ is provably optimal in the small-$\varepsilon$ regime and removes all non-constant multiplicative factors present in prior accelerated algorithms.
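As a quick check on how the general condition contains the two classical ones, the standard choices of $\ell$ are written out below, in the notation of the abstract.

$$
\ell(s) \equiv L \;\;\Longrightarrow\;\; ||\nabla^{2} f(x)|| \le L \quad (L\text{-smoothness}),
\qquad
\ell(s) = L_{0} + L_{1} s \;\;\Longrightarrow\;\; ||\nabla^{2} f(x)|| \le L_{0} + L_{1}\,||\nabla f(x)|| \quad ((L_{0},L_{1})\text{-smoothness}).
$$

In the second case $\ell(0) = L_{0}$, so the general rate $O(\sqrt{\ell(0)} R / \sqrt{\varepsilon})$ specializes to the stated $O(\sqrt{L_0} R / \sqrt{\varepsilon})$. In the first case $\ell(0) = L$, which recovers the classical AGD rate $O(\sqrt{L} R / \sqrt{\varepsilon})$.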

2507.20268 2026-05-22 cs.LG eess.SP stat.ML (version update)

Reliable Wireless Indoor Localization via Cross-Validated Prediction-Powered Calibration

Seonghoon Yoo, Houssem Sifaou, Sangwoo Park, Joonhyuk Kang, Osvaldo Simeone

Affiliations: School of Electrical Engineering, Korea Advanced Institute of Science and Technology; King’s Communications, Learning & Information Processing (KCLIP) Lab, Centre for Intelligent Information Processing Systems (CIIPS), Department of Engineering, King’s College London; Institute for Intelligent Networked Systems, Northeastern University London

AI summary: This paper proposes a method that uses limited calibration data both to optimize the predictor and to estimate the bias of synthetic labels, improving the reliability of wireless indoor localization through cross-validated prediction-powered calibration.

2506.19500 2026-05-22 cs.AI cs.CL cs.LG (version update)

NaviAgent: Graph-Driven Bilevel Planning for Scalable Tool Orchestration

Yan Jiang, Hao Zhou, Lizhong GU, Tianlong Li, Ruinan Jin, Wanqi Zhou, Ai Han

Affiliations: Department of Electrical and Computer Engineering, The Ohio State University, USA

AI summary: This paper proposes NaviAgent, a graph-driven bilevel planning framework that decouples task planning from tool execution to improve the scalability and robustness of large-scale tool orchestration. Experiments show gains in task success rate and in tests on real-world APIs.

Comments Accepted to ICML 2026

Journal ref Proceedings of the 43rd International Conference on Machine Learning (ICML), 2026

Abstract

Large Language Models (LLMs) increasingly act as function-call agents that invoke external tools to tackle tasks beyond their static knowledge. However, they typically invoke tools one at a time without a global view of task structure. As tools often depend on one another, this leads to error accumulation and poor scalability, particularly when scaling to hundreds or thousands of tools. To address these limitations, we propose NaviAgent, an explicit bilevel architecture that decouples task planning from tool execution through graph-based modeling of tool relations. At the planning level, the LLM-based agent decides whether to respond directly, clarify intent, or retrieve and execute a toolchain independent of inter-tool complexity. At the execution level, a Tool World Navigation Model (TWNM) encodes structural and behavioral relations among tools, steering the agent to compose scalable and robust invocation sequences. Incorporating feedback from real tool interactions, NaviAgent achieves closed-loop alignment between planning and execution, enabling adaptive navigation in large-scale tool ecosystems. Evaluations on API-Bank and ToolBench show consistent improvements in task success rate (TSR), with TWNM yielding an average gain of 13.1 points on complex tasks. Further tests on 50 real APIs across 7 domains show consistent gains of 4.3--12.0 points, with fewer steps and lower latency, demonstrating robust generalization under real-world dynamics.

2506.16659 2026-05-22 cs.LG cs.AI math.OC (version update)

Memory-Efficient LLM Pretraining via Minimalist Optimizer Design

Athanasios Glentis, Jiaxiang Li, Andi Han, Mingyi Hong

Affiliations: Department of Electrical and Computer Engineering, University of Minnesota, USA; School of Mathematics and Statistics, University of Sydney, Australia

AI summary: This paper investigates which simple optimizer design changes let SGD reach state-of-the-art pretraining performance, and proposes the SCALE optimizer, which uses less memory than Adam and outperforms existing memory-efficient optimizers across multiple models.

Comments Accepted at ICML 2026

Abstract

Training large language models (LLMs) relies on adaptive optimizers such as Adam, which introduce extra operations and require significantly more memory to maintain first- and second-order moments than SGD. While recent works such as GaLore, Fira and APOLLO have proposed state-compressed memory-efficient variants, a fundamental question remains: What are the minimum modifications to plain SGD needed to match state-of-the-art pretraining performance? We systematically investigate this question using a bottom-up approach, and identify two simple yet highly (memory- and compute-) efficient techniques: (1) column-wise gradient normalization (normalizing the gradient along the output dimension), which boosts SGD performance without momentum; and (2) applying first-order momentum only to the output layer, where gradient variance is highest. Combining these two techniques leads to SCALE (Stochastic Column-normAlized Last-layer momEntum), a simple optimizer for memory-efficient pretraining. Across multiple models (60M-1B), SCALE matches or exceeds the performance of Adam while using only 35-45% of the total memory. It also consistently outperforms memory-efficient optimizers such as GaLore, Fira and APOLLO, making it a strong candidate for large-scale pretraining under memory constraints. For LLaMA 7B, SCALE outperforms the state-of-the-art memory-efficient methods APOLLO and Muon in both perplexity and memory consumption.
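The two ingredients named in the abstract can be sketched in a few lines of plain Python. This is a rough illustration, not the authors' implementation. Several details are assumptions made for this example: gradients are stored as lists of rows, "column-wise" means each column is divided by its L2 norm, momentum is an exponential moving average, and momentum is applied before normalization. Which matrix axis corresponds to the output dimension depends on the weight layout and is not specified here.

```python
import math


def column_normalize(grad, eps=1e-8):
    """Divide every column of a 2-D gradient (list of rows) by its L2 norm."""
    n_rows, n_cols = len(grad), len(grad[0])
    norms = [
        math.sqrt(sum(grad[i][j] ** 2 for i in range(n_rows)))
        for j in range(n_cols)
    ]
    return [[grad[i][j] / (norms[j] + eps) for j in range(n_cols)] for i in range(n_rows)]


def scale_like_step(weight, grad, lr, momentum_buf=None, beta=0.9):
    """One SGD-style update with column-normalised gradients.

    Pass momentum_buf only for the layer that keeps momentum (the output
    layer in the abstract); every other layer stores no optimizer state.
    Returns the new weight and the updated buffer (None if no momentum).
    """
    if momentum_buf is not None:
        # Assumed form: exponential moving average of raw gradients.
        momentum_buf = [
            [beta * m + (1.0 - beta) * g for m, g in zip(m_row, g_row)]
            for m_row, g_row in zip(momentum_buf, grad)
        ]
        grad = momentum_buf
    direction = column_normalize(grad)
    new_weight = [
        [w - lr * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, direction)
    ]
    return new_weight, momentum_buf
```

The memory argument is visible in the signature: layers called without `momentum_buf` carry no per-parameter state at all, whereas Adam keeps two extra buffers for every layer.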

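2505.24333 2026-05-22 stat.ML cond-mat.dis-nn cond-mat.stat-mech cs.LG (version update)

Two failure modes of deep transformers and how to avoid them: a unified theory of signal propagation at initialisation

Alessio Giorlandino, Sebastian Goldt

Affiliations: International School of Advanced Studies (SISSA)

AI summary: This paper studies two failure modes of self-attention layers in deep transformers, rank collapse and entropy collapse, and presents a unified theory of signal propagation. By analysing how initialisation affects training stability, it yields a simple algorithm for computing trainability diagrams that identify the correct choice of initialisation hyperparameters.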

Journal ref ICLR 2026

Abstract

Finding the right initialisation for neural networks is crucial to ensure smooth training and good performance. In transformers, the wrong initialisation can lead to one of two failure modes of self-attention layers: rank collapse, where all tokens collapse into similar representations, and entropy collapse, where highly concentrated attention scores lead to training instability. While previous work has studied different scaling regimes for transformers, an asymptotically exact, down-to-the constant prescription for how to initialise transformers has so far been lacking. Here, we provide an analytical theory of signal propagation through deep transformers with self-attention, layer normalisation, skip connections and MLP. Our theory yields a simple algorithm to compute trainability diagrams that identify the correct choice of initialisation hyper-parameters for a given architecture. We overcome the key challenge, an exact treatment of the self-attention layer, by establishing a formal parallel with the Random Energy Model from statistical physics. We also analyse gradients in the backward path and determine the regime where gradients vanish at initialisation. We demonstrate the versatility of our framework through three case studies. Our theoretical framework gives a unified perspective on the two failure modes of self-attention and gives quantitative predictions on the scale of both weights and residual connections that guarantee smooth training.

2505.20349 2026-05-22 physics.flu-dyn cs.LG (version update)

FD-Bench: A Modular and Fair Benchmark for Data-driven Fluid Simulation

Haixin Wang, Ruoyan Li, Fred Xu, Fang Sun, Kaiqiao Han, Zijie Huang, Ching Chang, Xiao Luo, Wei Wang, Yizhou Sun

Affiliations: University of California, Los Angeles; Meta; University of Wisconsin–Madison

AI summary: This paper introduces FD-Bench, a modular, fair, comprehensive, and reproducible benchmark for data-driven fluid simulation. It evaluates 85 baseline models under a unified experimental setup, addressing reproducibility and comparability issues and laying a foundation for robust evaluation of future data-driven fluid models.

Comments 32 pages, 20 figures, paper accepted by KDD 2026

Abstract

Data-driven modeling of fluid dynamics has advanced rapidly with neural PDE solvers, yet a fair and strong benchmark remains fragmented due to the absence of unified PDE datasets and standardized evaluation protocols. Although architectural innovations are abundant, fair assessment is further impeded by the lack of clear disentanglement between spatial, temporal and loss modules. In this paper, we introduce FD-Bench, the first fair, modular, comprehensive and reproducible benchmark for data-driven fluid simulation. FD-Bench systematically evaluates 85 baseline models across 10 representative flow scenarios under a unified experimental setup. It provides four key contributions: (1) a modular design enabling fair comparisons across spatial, temporal, and loss function modules; (2) the first systematic framework for direct comparison with traditional numerical solvers; (3) fine-grained generalization analysis across resolutions, initial conditions, and temporal windows; and (4) a user-friendly, extensible codebase to support future research. Through rigorous empirical studies, FD-Bench establishes the most comprehensive leaderboard to date, resolving long-standing issues in reproducibility and comparability, and laying a foundation for robust evaluation of future data-driven fluid models. The code is open-sourced at https://github.com/WillDreamer/FD-Bench.

2505.15844 2026-05-22 q-bio.QM cs.LG stat.AP (version update)

Advancing Tabular Stroke Modelling Through a Novel Hybrid Architecture and Feature-Selection Synergy

Yousuf Islam, Md. Jalal Uddin Chowdhury, Sumon Chandra Das

Affiliations: Department of Computer Science and Engineering, Leading University, Sylhet 3112, Bangladesh; DeepNet Research and Development Lab, Sylhet 3100, Bangladesh

AI summary: This paper presents a data-driven, interpretable machine-learning framework that uses ten routinely collected demographic, lifestyle, and clinical variables. Through detailed exploratory data analysis, data preprocessing, and feature selection, it builds a stroke-risk assessment model with 97.2% accuracy, significantly outperforming existing models.

Journal ref IEEE Conference Publication, 2025

DOI: 10.1109/BECITHCON69222.2025.11503962

Abstract

Brain stroke remains one of the principal causes of death and disability worldwide, yet most tabular-data prediction models still hover below the 95% accuracy threshold, limiting real-world utility. Addressing this gap, the present work develops and validates a completely data-driven and interpretable machine-learning framework designed to predict strokes using ten routinely gathered demographic, lifestyle, and clinical variables sourced from a public cohort of 4,981 records. We employ a detailed exploratory data analysis (EDA) to understand the dataset's structure and distribution, followed by rigorous data preprocessing, including handling missing values, outlier removal, and class imbalance correction using Synthetic Minority Over-sampling Technique (SMOTE). To streamline feature selection, point-biserial correlation and random-forest Gini importance were utilized, and ten varied algorithms-encompassing tree ensembles, boosting, kernel methods, and a multilayer neural network-were optimized using stratified five-fold cross-validation. Their predictions based on probabilities helped us build the proposed model, which included Random Forest, XGBoost, LightGBM, and a support-vector classifier, with logistic regression acting as a meta-learner. The proposed model achieved an accuracy rate of 97.2% and an F1-score of 97.15%, indicating a significant enhancement compared to the leading individual model, LightGBM, which had an accuracy of 91.4%. Our study's findings indicate that rigorous preprocessing, coupled with a diverse hybrid model, can convert low-cost tabular data into a nearly clinical-grade stroke-risk assessment tool.
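One of the two feature-screening tools named in the abstract, the point-biserial correlation, is simple enough to compute by hand. The sketch below uses the standard formula for a binary label and a continuous feature. The function names and the ranking helper are assumptions made for this example. The paper also uses random-forest Gini importance, and how it turns the scores into a selection is not stated in the abstract.

```python
import math


def point_biserial(values, labels):
    """Point-biserial correlation between a numeric feature and 0/1 labels.

    r = (mean_1 - mean_0) / s_n * sqrt(n1 * n0 / n^2), where s_n is the
    population standard deviation of the feature over all samples.
    """
    ones = [v for v, y in zip(values, labels) if y == 1]
    zeros = [v for v, y in zip(values, labels) if y == 0]
    n1, n0, n = len(ones), len(zeros), len(values)
    if n1 == 0 or n0 == 0:
        raise ValueError("both classes must be present")
    mean_all = sum(values) / n
    std_all = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    if std_all == 0.0:
        raise ValueError("feature is constant")
    return (sum(ones) / n1 - sum(zeros) / n0) / std_all * math.sqrt(n1 * n0 / (n * n))


def rank_features(columns, labels):
    """Order feature names by absolute point-biserial correlation, strongest first.

    columns: dict mapping a (hypothetical) feature name to its list of values.
    """
    scores = {name: abs(point_biserial(vals, labels)) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

The point-biserial coefficient equals the Pearson correlation computed with the label coded as 0/1, so it measures only linear association. That is one reason to pair it with a tree-based importance score, as the abstract describes.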

2503.06115 2026-05-22 stat.ML cs.IT cs.LG math.IT math.PR (version update)

On Statistical Estimation of Edge-Reinforced Random Walks

Qinghua Ding, Venkat Anantharam

Affiliations: Department of Electrical Engineering and Computer Sciences, University of California at Berkeley

AI summary: This paper studies the statistical estimation of the initial edge weights of edge-reinforced random walks, using a hypergeometric Gaussian structure in the random environment to analyze the sample complexity of the estimator.

Comments This is the full version of the conference paper in submission to ISIT 2025

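2502.21194 2026-05-22 stat.ML cs.LG (version update)

Prior shift estimation for positive unlabeled data through the lens of kernel embedding

Jan Mielniczuk, Wojciech Rejchel, Paweł Teisseyre

Affiliations: Polish Academy of Sciences; Nicolaus Copernicus University; Warsaw University of Technology

AI summary: This paper studies estimation of the class prior for an unlabeled target sample, which may differ from that of the source population, when the source data are only partially observable: only samples from the positive class and from the whole population are available (the PU learning scenario). It introduces a novel direct estimator of the class prior that avoids estimating posterior probabilities in both populations and has a simple geometric interpretation. The estimator is based on a distribution matching technique together with kernel embedding into a reproducing kernel Hilbert space and is obtained as an explicit solution to an optimization task. The paper establishes its asymptotic consistency as well as an explicit non-asymptotic bound on its deviation from the unknown prior, which is computable in practice. Finite-sample behavior is studied on synthetic and real data, showing that the proposal performs on par with or better than its competitors.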

2502.09487 2026-05-22 cs.CL cs.AI cs.LG (version update)

Internal narratives parameterise affective states

Jakub Onysk, Quentin J. M. Huys

Affiliations: Applied Computational Psychiatry Lab; Max Planck UCL Centre for Computational Psychiatry and Ageing Research; Queen Square Institute of Neurology and Mental Health; Neuroscience Department; Division of Psychiatry

AI summary: This paper studies the relationship between narrative and affective state by quantifying participants' internal narratives through large-language-model representations and their subspaces. It finds that verbal descriptions of symptom-specific thoughts predict standardised depression scores and that preserving the covariance between symptoms is essential for construct validity.

Abstract

Characterising how we verbalise our feelings is central to psychological assessment and intervention, yet the mapping between narrative and affective state remains poorly understood. Across two large studies (n=1257), we parameterised the structure and dynamics of depressive states by quantifying participants' internal narratives through large-language-model representations and their subspaces. In Study 1, we found verbal descriptions of symptom-specific thoughts captured granular information predictive of standardised, self-reported depression scores. Critically, we show preserving the specific covariance between symptoms is essential for construct validity, suggesting high-dimensional text representations mirror the latent geometry of the disorder. Study 2 probed the temporal dynamics of this relationship as participants engaged with emotional narratives. We found quantified changes in internal narratives led to changes in self-report, while the baseline narrative severity predicted the magnitude of subsequent affective change. By framing affect as a computational state, our results highlight its core, therapeutically pertinent functions: constraining the structure of internal narratives and integrating context to shape self-report.

2501.00677 2026-05-22 cs.LG cs.CV cs.IT cs.NA math.IT math.NA stat.ML (updated version)

Deeply Learned Robust Matrix Completion for Large-scale Low-rank Data Recovery

HanQin Cai, Chandra Kundu, Jialin Liu, Wotao Yin

Affiliations: School of Data, Mathematical, and Statistical Sciences and the Department of Computer Science, University of Central Florida; School of Data, Mathematical, and Statistical Sciences, University of Central Florida; Damo Academy, Alibaba US

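AI summary: This paper proposes a scalable and learnable non-convex method, Learned Robust Matrix Completion (LRMC), for large-scale robust matrix completion. The method has low computational complexity and linear convergence, learns its free parameters effectively through deep unfolding to achieve optimal performance, and shows superior empirical performance on synthetic datasets and real applications.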

Journal ref IEEE Transactions on Pattern Analysis and Machine Intelligence, 48(6): 6541-6556, 2026

DOI: 10.1109/TPAMI.2026.3659041

Abstract

Robust matrix completion (RMC) is a widely used machine learning tool that simultaneously tackles two critical issues in low-rank data analysis: missing data entries and extreme outliers. This paper proposes a novel scalable and learnable non-convex approach, coined Learned Robust Matrix Completion (LRMC), for large-scale RMC problems. LRMC enjoys low computational complexity with linear convergence. Motivated by the proposed theorem, the free parameters of LRMC can be effectively learned via deep unfolding to achieve optimum performance. Furthermore, this paper proposes a flexible feedforward-recurrent-mixed neural network framework that extends deep unfolding from a fixed number of iterations to infinite iterations. The superior empirical performance of LRMC is verified with extensive experiments against state-of-the-art methods on synthetic datasets and real applications, including video background subtraction, ultrasound imaging, face modeling, and cloud removal from satellite imagery.

2411.02813 2026-05-22 cs.LG (updated version)

Sparse Orthogonal Parameters Tuning for Continual Learning

Kun-Peng Ning, Hai-Jian Ke, Yu-Yang Liu, Jia-Yu Yao, Yong-Hong Tian, Li Yuan

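AI summary: This paper proposes a new method, SoTU, that addresses catastrophic forgetting in continual learning through sparse orthogonal parameter tuning and achieves optimal feature representation for streaming data.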

Abstract

Continual learning methods based on pre-trained models (PTM), which adapt to successive downstream tasks without catastrophic forgetting, have recently gained attention. These methods typically refrain from updating the pre-trained parameters and instead employ additional adapters, prompts, and classifiers. In this paper, we investigate from a novel perspective the benefit of sparse orthogonal parameters for continual learning. We find that merging sparse orthogonality of models learned from multiple streaming tasks has great potential in addressing catastrophic forgetting. Leveraging this insight, we propose a novel yet effective method called SoTU (Sparse Orthogonal Parameters TUning). We hypothesize that the effectiveness of SoTU lies in the transformation of knowledge learned from multiple domains into the fusion of orthogonal delta parameters. Experimental evaluations on diverse CL benchmarks demonstrate the effectiveness of the proposed approach. Notably, SoTU achieves optimal feature representation for streaming data without necessitating complex classifier designs, making it a plug-and-play solution.
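
The abstract attributes the method's effectiveness to fusing orthogonal delta parameters. The sketch below illustrates, under stated assumptions, why sparse deltas from different tasks tend to interfere little when merged: if each task keeps only a small random fraction of its delta coordinates, few coordinates are updated by both tasks. Random masking, the keep probability, and every name here (`sparse_delta`, `keep_prob`, `task_a`, `task_b`) are illustrative assumptions and not details confirmed by the abstract.

```python
import random

def sparse_delta(pretrained, finetuned, keep_prob, rng):
    """Delta parameters (finetuned - pretrained), each kept with probability keep_prob."""
    return [(f - p) if rng.random() < keep_prob else 0.0
            for p, f in zip(pretrained, finetuned)]

rng = random.Random(0)
dim = 10_000

# Toy "pre-trained" weights and two hypothetical task-specific fine-tuned copies.
pretrained = [rng.gauss(0.0, 1.0) for _ in range(dim)]
task_a = [p + rng.gauss(0.0, 0.1) for p in pretrained]
task_b = [p + rng.gauss(0.0, 0.1) for p in pretrained]

delta_a = sparse_delta(pretrained, task_a, keep_prob=0.1, rng=rng)
delta_b = sparse_delta(pretrained, task_b, keep_prob=0.1, rng=rng)

# Coordinates touched by both tasks; about keep_prob**2 of all coordinates on average.
overlap = sum(1 for a, b in zip(delta_a, delta_b) if a != 0.0 and b != 0.0)

# Merge by adding both sparse deltas back onto the pre-trained weights.
merged = [p + a + b for p, a, b in zip(pretrained, delta_a, delta_b)]

print(f"coordinates updated by both tasks: {overlap} of {dim}")
```

With a keep probability of 0.1, roughly one coordinate in a hundred is shared, so most of each task's update survives the merge unchanged.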

2411.02776 2026-05-22 cs.LG stat.AP (updated version)

Deep learning-based modularized loading protocol for parameter estimation of Bouc-Wen class models

Sebin Oh, Junho Song, Taeyong Kim

Affiliations: Department of Civil and Environmental Engineering, University of California, Berkeley, CA, United States; Department of Civil Systems Engineering, Ajou University, Suwon, Republic of Korea

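AI summary: This paper proposes a modularized deep learning-based loading protocol for optimal parameter estimation of Bouc-Wen class models. The protocol has two key components: optimal loading history construction and CNN-based rapid parameter estimation. Each component is decomposed into independent sub-modules for distinct hysteretic behaviors (basic hysteresis, structural degradation, and pinching effect), which makes the protocol adaptable to diverse hysteresis models. Three independent CNN architectures are developed to capture the path dependence of these behaviors. Training these CNNs on diverse loading histories identifies minimal loading sequences, termed loading history modules, which are combined into an optimal loading history. The three trained CNN models serve as rapid parameter estimators. Numerical evaluation, including nonlinear time history analysis of a 3-story steel moment frame and fragility curve construction for a 3-story reinforced concrete frame, shows that the protocol significantly reduces total analysis time while maintaining or improving estimation accuracy. The protocol can be extended to other hysteresis models, suggesting a systematic approach to identifying general hysteresis models.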

Journal ref Engineering Structures 339, 120458 (2025)

DOI: 10.1016/j.engstruct.2025.120458

Abstract

This study proposes a modularized deep learning-based loading protocol for optimal parameter estimation of Bouc-Wen (BW) class models. The protocol consists of two key components: optimal loading history construction and CNN-based rapid parameter estimation. Each component is decomposed into independent sub-modules tailored to distinct hysteretic behaviors (basic hysteresis, structural degradation, and pinching effect), making the protocol adaptable to diverse hysteresis models. Three independent CNN architectures are developed to capture the path-dependent nature of these hysteretic behaviors. By training these CNN architectures on diverse loading histories, minimal loading sequences, termed loading history modules, are identified and then combined to construct an optimal loading history. The three CNN models, trained on the respective loading history modules, serve as rapid parameter estimators. Numerical evaluation of the protocol, including nonlinear time history analysis of a 3-story steel moment frame and fragility curve construction for a 3-story reinforced concrete frame, demonstrates that the proposed protocol significantly reduces total analysis time while maintaining or improving estimation accuracy. The proposed protocol can be extended to other hysteresis models, suggesting a systematic approach for identifying general hysteresis models.
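
For readers unfamiliar with the model class, the sketch below integrates the hysteretic variable z of the basic Bouc-Wen model along a prescribed displacement history, using the standard rate form dz = A·dx − β·|dx|·|z|^(n−1)·z − γ·dx·|z|^n. It covers only basic hysteresis, not the degradation or pinching extensions mentioned in the abstract. The parameter values, the sinusoidal loading history, and the function name `simulate_bouc_wen` are assumptions chosen for illustration, not taken from the paper.

```python
import math

def simulate_bouc_wen(displacements, A=1.0, beta=0.5, gamma=0.5, n=1.0):
    """Forward-Euler integration of the Bouc-Wen hysteretic variable z.

    Assumes n >= 1 so that |z|**(n - 1) is well defined at z = 0.
    """
    z = 0.0
    history = [z]
    for x_prev, x_next in zip(displacements, displacements[1:]):
        dx = x_next - x_prev
        # dz = A*dx - beta*|dx|*|z|^(n-1)*z - gamma*dx*|z|^n
        dz = (A * dx
              - beta * abs(dx) * abs(z) ** (n - 1.0) * z
              - gamma * dx * abs(z) ** n)
        z += dz
        history.append(z)
    return history

# Hypothetical loading history: two sinusoidal cycles with amplitude 3.
steps = 2000
x = [3.0 * math.sin(2.0 * math.pi * 2.0 * k / steps) for k in range(steps + 1)]
z = simulate_bouc_wen(x)

# For these parameters z saturates at (A / (beta + gamma)) ** (1 / n) = 1.
print(f"max |z| = {max(abs(v) for v in z):.3f}")
```

Because z depends on the whole displacement path rather than on the current displacement alone, different loading histories excite different parameters, which is why the choice of loading history matters for parameter estimation.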

2411.01332 2026-05-22 cs.LG cs.AI (updated version)

A Mechanistic Explanatory Strategy for XAI

Marcin Rabiza

Affiliations: Institute of Philosophy and Sociology, Polish Academy of Sciences; Institute for Philosophy, Leiden University

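AI summary: This paper proposes a mechanistic explanatory strategy that uses decomposition, localization, and recomposition to uncover the mechanisms behind the functional organization of deep learning systems, with the aim of improving the theoretical foundations and practical application of explainable AI.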

DOI: 10.1007/978-3-032-10073-3_23

Abstract

Despite significant advancements in XAI, scholars note a persistent lack of solid conceptual foundations and integration with broader scientific discourse on explanation. In response, emerging research draws on explanatory strategies from various sciences and the philosophy of science literature to fill these gaps. This paper outlines a mechanistic strategy for explaining the functional organization of deep learning systems, situating recent developments in explainable AI within a broader philosophical context. According to the mechanistic approach, the explanation of opaque AI systems involves identifying mechanisms that drive decision making. For deep neural networks, this means discerning functionally relevant components, such as neurons, layers, circuits, or activation patterns, and understanding their roles through decomposition, localization, and recomposition. Proof-of-principle case studies from image recognition and language modeling align these theoretical approaches with mechanistic interpretability research from OpenAI and Anthropic. The findings suggest that pursuing mechanistic explanations can uncover elements that traditional explainability techniques may overlook, ultimately contributing to more thoroughly explainable AI.

2410.04753 2026-05-22 cs.AI cs.CL cs.LG cs.LO (updated version)

ImProver: Agent-Based Automated Proof Optimization

Riyaz Ahuja, Jeremy Avigad, Prasad Tetali, Sean Welleck

Affiliations: Carnegie Mellon University

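AI summary: This paper studies the problem of automated proof optimization and proposes ImProver, a large-language-model agent that rewrites proofs to optimize arbitrary criteria such as length or readability. Experiments show that it can make proofs substantially shorter, more modular, and more readable.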

Comments Published as a conference paper at ICLR 2025

Abstract

Large language models (LLMs) have been used to generate formal proofs of mathematical theorems in proof assistants such as Lean. However, we often want to optimize a formal proof with respect to various criteria, depending on its downstream use. For example, we may want a proof to adhere to a certain style, or to be readable, concise, or modularly structured. Having suitably optimized proofs is also important for learning tasks, especially since human-written proofs may not be optimal for that purpose. To this end, we study a new problem of automated proof optimization: rewriting a proof so that it is correct and optimizes for an arbitrary criterion, such as length or readability. As a first method for automated proof optimization, we present ImProver, a large-language-model agent that rewrites proofs to optimize arbitrary user-defined metrics in Lean. We find that naively applying LLMs to proof optimization falls short, and we incorporate various improvements into ImProver, such as the use of symbolic Lean context in a novel Chain-of-States technique, as well as error-correction and retrieval. We test ImProver on rewriting real-world undergraduate, competition, and research-level mathematics theorems, finding that ImProver is capable of rewriting proofs so that they are substantially shorter, more modular, and more readable.

2401.00139 2026-05-22 cs.AI cs.CL cs.LG stat.ME (updated version)

Enhancing Causal Reasoning in Large Language Models: A Causal Attribution Model for Precision Fine-Tuning

Hengrui Cai, Shengjie Liu, Rui Song

Affiliations: University of California, Irvine; North Carolina State University; Amazon

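AI summary: This paper proposes a causal attribution model that improves the interpretability and causal reasoning of large language models through precise fine-tuning, and demonstrates the model's effectiveness on causal discovery tasks across different domains.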

Comments A Python implementation of our proposed method is available at https://github.com/ncsulsj/Causal_LLM

Abstract

This paper introduces a causal attribution model to enhance the interpretability of large language models (LLMs) and improve their causal reasoning abilities via precise fine-tuning. Despite LLMs' proficiency in diverse tasks, their reasoning processes often remain a black box, which restricts targeted enhancement. We propose a novel causal attribution model that utilizes "do-operators" for constructing interventional scenarios, allowing us to systematically quantify the contribution of different components in LLMs' causal reasoning process. By assessing the proposed attribution scores through causal discovery tasks across various domains, we demonstrate that LLMs' effectiveness in causal discovery heavily relies on provided context and domain-specific knowledge, but that LLMs can also utilize numerical data with limited calculations in correlation, not causation. This motivates the proposed fine-tuned LLM for pairwise causal discovery, effectively and correctly leveraging both knowledge and numerical information.
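
The distinction the abstract draws between correlation and causation, and the role of the do-operator, can be illustrated with a toy simulation. The sketch below shows only the do-operator itself on a hypothetical confounded model (Z causes both X and Y, and X has no effect on Y); it is not the paper's attribution score. The model, the probabilities, and the names `sample` and `rate_y` are assumptions for illustration.

```python
import random

def sample(rng, do_x=None):
    """Draw (x, y) from a toy model Z -> X, Z -> Y with no X -> Y edge."""
    z = rng.random() < 0.5
    # X copies Z with probability 0.9, unless an intervention do(X = do_x) fixes it.
    x = (z if rng.random() < 0.9 else not z) if do_x is None else do_x
    # Y depends only on Z, never on X.
    y = z if rng.random() < 0.9 else not z
    return x, y

def rate_y(pairs, x_value):
    """Empirical frequency of Y = True among samples with X == x_value."""
    ys = [y for x, y in pairs if x == x_value]
    return sum(ys) / len(ys)

rng = random.Random(0)
n = 20_000

# Observational data: X and Y are correlated through the confounder Z.
observed = [sample(rng) for _ in range(n)]
obs_gap = rate_y(observed, True) - rate_y(observed, False)

# Interventional data: setting X by hand cuts the Z -> X edge.
do_true = [sample(rng, do_x=True) for _ in range(n)]
do_false = [sample(rng, do_x=False) for _ in range(n)]
int_gap = rate_y(do_true, True) - rate_y(do_false, False)

print(f"observational gap: {obs_gap:.3f}, interventional gap: {int_gap:.3f}")
```

Conditioning on X shows a large gap in Y, while intervening on X shows almost none. That is the sense in which numerical data alone can support correlational but not causal conclusions.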

2311.04938 2026-05-22 cs.CV cs.AI cs.LG (updated version)

Improved DDIM Sampling with Moment Matching Gaussian Mixtures

Prasad Gabbur

Affiliations: Independent Researcher; Apple

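AI summary: This paper proposes using a Gaussian mixture model as the reverse transition operator in the DDIM framework, constraining the GMM parameters to match the moments of the DDPM forward marginals, which improves sample quality when few sampling steps are used. Experiments show that the GMM kernel outperforms the standard Gaussian kernel on FID and IS.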

Comments 34 pages, 12 figures; Accepted to TMLR; Code open sourced

Journal ref Transactions on Machine Learning Research, 05/2026

Abstract

We propose using a Gaussian Mixture Model (GMM) as the reverse transition operator (kernel) within the Denoising Diffusion Implicit Models (DDIM) framework, which is one of the most widely used approaches for accelerated sampling from pre-trained Denoising Diffusion Probabilistic Models (DDPM). Specifically, we match the first and second order central moments of the DDPM forward marginals by constraining the parameters of the GMM. We see that moment matching is sufficient to obtain samples with equal or better quality than the original DDIM with Gaussian kernels. We provide experimental results with unconditional models trained on CelebAHQ and FFHQ, class-conditional models trained on ImageNet, and text-to-image generation using Stable Diffusion v2.1 on the COYO700M dataset. Our results suggest that using the GMM kernel leads to significant improvements in the quality of the generated samples when the number of sampling steps is small, as measured by FID and IS metrics. For example, on ImageNet 256x256, using 10 sampling steps, we achieve an FID of 6.94 and an IS of 207.85 with a GMM kernel, compared to 10.15 and 196.73 respectively with a Gaussian kernel. Further, we derive novel SDE samplers for rectified flow matching models and experiment with the proposed approach. We see improvements using both 1-rectified flow and 2-rectified flow models. Code: https://github.com/pgabbur/ddim-gmm.
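
The moment-matching constraint can be illustrated in one dimension. The sketch below builds an equal-weight two-component mixture whose overall mean and variance equal a target mean and variance, which is the kind of constraint the abstract describes. It is a minimal 1-D illustration, not the paper's parameterization; the symmetric two-component form, the `offset_scale` knob, and the function names are assumptions.

```python
import math

def two_component_gmm(mu, var, offset_scale=0.5):
    """Equal-weight 1-D mixture whose overall mean and variance equal mu and var."""
    if not 0.0 <= offset_scale < 1.0:
        raise ValueError("offset_scale must lie in [0, 1)")
    delta = offset_scale * math.sqrt(var)  # offset of each component mean from mu
    comp_var = var - delta ** 2            # shrink components so total variance is preserved
    return [0.5, 0.5], [mu - delta, mu + delta], [comp_var, comp_var]

def mixture_moments(weights, means, variances):
    """Mean and variance of a 1-D Gaussian mixture (law of total variance)."""
    mean = sum(w * m for w, m in zip(weights, means))
    var = sum(w * (v + (m - mean) ** 2)
              for w, m, v in zip(weights, means, variances))
    return mean, var

# Hypothetical target moments, standing in for those of a forward marginal.
weights, means, variances = two_component_gmm(mu=1.5, var=4.0)
gmm_mean, gmm_var = mixture_moments(weights, means, variances)
print(gmm_mean, gmm_var)
```

The mixture keeps the target's first two moments while its shape is no longer Gaussian. The remaining freedom (here, `offset_scale`) is what a mixture kernel adds over a single Gaussian kernel.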

2306.05905 2026-05-22 cs.LG math.OC (updated version)

TreeDQN: Sample-Efficient Off-Policy Reinforcement Learning for Combinatorial Optimization

D. Sorokin, A. Kostin, L. Savchenko, G. Gusev, A. V. Savchenko

Affiliations: Sber AI Lab; Laboratory for Theoretical Foundations of AI Models, HSE University

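AI summary: TreeDQN improves the sample efficiency of off-policy reinforcement learning on combinatorial optimization tasks by optimizing the geometric mean of expected return, and performs strongly on synthetic tasks and in the ML4CO competition.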

Comments Accepted in Knowledge-Based Systems

Abstract

A convenient approach to optimally solving combinatorial optimization tasks is the Branch-and-Bound method. Its branching heuristic can be learned to solve a large set of similar tasks. Promising results have been achieved by a recently introduced on-policy reinforcement learning method based on the tree Markov Decision Process. To overcome its main disadvantages, namely very long and unstable training, we propose TreeDQN (Tree Deep Q-Network), a sample-efficient off-policy RL method trained by optimizing the geometric mean of expected return. To theoretically support the training procedure for our method, we prove the contraction property of the Bellman operator for the tree MDP. As a result, our method requires up to 10 times less training data and performs faster than known on-policy methods on synthetic tasks. Moreover, TreeDQN significantly outperforms the state-of-the-art techniques on a challenging practical task from the ML4CO competition.
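
To see why a geometric-mean objective can matter for heavy-tailed returns, the sketch below compares the arithmetic and geometric means of a set of made-up tree sizes. A constant predictor fitted by squared error on raw values converges to the arithmetic mean, whereas fitting by squared error on logarithms converges to the geometric mean, which is far less sensitive to a single huge tree. The tree sizes are invented for illustration, and the abstract does not state the exact loss the authors use.

```python
import math

def arithmetic_mean(xs):
    """Minimizer of the squared error on raw values."""
    return sum(xs) / len(xs)

def geometric_mean(xs):
    """Minimizer of the squared error on log values, mapped back with exp."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Hypothetical Branch-and-Bound tree sizes with one hard outlier instance.
tree_sizes = [10, 12, 15, 11, 5000]

am = arithmetic_mean(tree_sizes)
gm = geometric_mean(tree_sizes)
print(f"arithmetic mean: {am:.1f}, geometric mean: {gm:.1f}")
```

The outlier dominates the arithmetic mean but shifts the geometric mean only moderately, so a learning signal built on the latter is less likely to be swamped by a few very large trees.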
