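arXivDaily daily academic digest: synchronizes the full arXiv feed with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.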

2606.06494 2026-06-05 cs.LG Version update

TailLoR: Protecting Principal Components in Parameter-Efficient Continual Learning

Marius Dragoi, Ioana Pintilie, Alexandra Dragomir, Antonio Barbalau, Florin Brad

Affiliations: Bitdefender

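AI summary: Proposes TailLoR, which uses the singular basis of the pretrained weights as a fixed reference frame, applies low-rank updates to the singular-value matrix, and uses a soft spectral penalty to suppress updates aligned with the dominant singular directions, reducing interference while enabling fine-grained adaptation.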
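The summary only names the ingredients, so the following is a minimal numpy sketch under stated assumptions, not TailLoR's implementation. The pretrained weight W is factored once as U diag(s) V^T, only a hypothetical low-rank pair (A, B) acting on the singular-value matrix is trained, and the soft spectral penalty is assumed to weight each entry of A @ B by exp(s_i / s_max), so edits on dominant directions cost more than edits on the tail. The function names and the penalty form are illustrative.

```python
import numpy as np

def init_spectral_adapter(W, rank=4, seed=0):
    """Freeze the singular basis of W and create a low-rank pair (A, B)
    that acts on the r x r singular-value matrix diag(s)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)   # s is sorted descending
    r = len(s)
    rng = np.random.default_rng(seed)
    A = rng.normal(scale=0.01, size=(r, rank))
    B = np.zeros((rank, r))            # zero init: training starts exactly at W
    return U, s, Vt, A, B

def adapted_weight(U, s, Vt, A, B):
    """W' = U (diag(s) + A B) V^T: the update lives in the frozen singular frame."""
    return U @ (np.diag(s) + A @ B) @ Vt

def soft_spectral_penalty(A, B, s, temperature=1.0):
    """Hypothetical soft penalty: entry (i, j) of A @ B is weighted by the larger
    of exp(s_i / s_max / T) and exp(s_j / s_max / T), so mass placed on
    dominant singular directions is discouraged more than mass on the tail."""
    w = np.exp(s / s.max() / temperature)
    return float(np.sum(np.maximum.outer(w, w) * (A @ B) ** 2))
```

In training, this penalty would be added to the task loss with some coefficient; because the penalty is soft, the top directions can still move when the task gradient is strong enough.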

2606.06486 2026-06-05 cs.LG cs.AI cs.GT Version update

Regret Minimization with Adaptive Opponents in Repeated Games

Mingyang Liu, Asuman Ozdaglar, Tiancheng Yu, Kaiqing Zhang

Affiliations: Massachusetts Institute of Technology; OpenAI; University of Maryland, College Park

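AI summary: Addresses regret minimization against adaptive opponents in repeated games by proposing the Repeated Policy Regret (RP-Regret) metric, designs three algorithms that achieve sublinear regret, and shows that when all players minimize this regret, certain subgame perfect equilibria can be learned.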

Abstract

In this paper, we study regret minimization in repeated games with adaptive opponents who can respond based on histories of play. The standard metric of external regret in online learning is known to fail to capture such adaptivity. To account for players' counterfactual reasoning, we introduce Repeated Policy Regret (RP-Regret), a game-theoretic metric that measures the difference between the realized and the best-in-hindsight accumulated utility when all players can respond to the history of play. Compared to existing regret notions in this setting, ours is native to repeated game playing, enabling stronger comparators and opponents with fewer constraints, while maintaining the possibility of finding better equilibria when all players minimize it. We first identify necessary conditions for obtaining RP-Regret sublinear in time, on the variation of the player's comparator strategies in the regret definition and on the memories of both the comparator and opponents' strategies. We then study additional conditions and provable algorithms to minimize RP-Regret, which is by definition non-convex in the strategy space. To address this challenge, we propose three algorithms: (i) one based on an optimization oracle, as assumed in some prior work in online non-convex learning; (ii) one that minimizes a convex and linearized surrogate of RP-Regret at each iteration; (iii) one that directly minimizes RP-Regret when opponents change strategies slowly. Furthermore, when all players can run algorithms to minimize the RP-Regret (or its linearized variant), certain subgame perfect equilibria of the repeated game can be learned. We also provide experiments showing that minimizing our regret notions can lead to more cooperative solutions with higher utility in games such as Stag-Hunt.
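To see why external regret misses adaptivity, here is a small self-contained example of the classic policy-regret idea; it is not the paper's RP-Regret definition. The Stag-Hunt payoffs and the "copycat" opponent, which repeats the player's previous action, are hypothetical choices. Always playing Hare has zero external regret against the realized opponent sequence, yet loses about 2T utility relative to always playing Stag once the opponent is allowed to respond to that counterfactual.

```python
# Hypothetical Stag-Hunt payoffs for the row player: PAYOFF[(mine, opponent's)]
PAYOFF = {("S", "S"): 4, ("S", "H"): 0, ("H", "S"): 3, ("H", "H"): 2}

def copycat(history):
    """Adaptive opponent: repeats my previous action, starting with Hare."""
    return history[-1] if history else "H"

def play(my_policy, T):
    """Run T rounds; return my total utility and the opponent's actions."""
    history, total, opp_actions = [], 0, []
    for t in range(T):
        opp = copycat(history)
        me = my_policy(t)
        total += PAYOFF[(me, opp)]
        opp_actions.append(opp)
        history.append(me)
    return total, opp_actions

def external_regret(realized, opp_actions):
    """Best fixed action against the *realized* opponent sequence (held fixed)."""
    best = max(sum(PAYOFF[(a, o)] for o in opp_actions) for a in "SH")
    return best - realized

def policy_regret(realized, T):
    """Best fixed action when the opponent *re-responds* to that action."""
    best = max(play(lambda t, a=a: a, T)[0] for a in "SH")
    return best - realized

T = 100
realized, opp = play(lambda t: "H", T)
print(external_regret(realized, opp), policy_regret(realized, T))
```

Against the realized all-Hare sequence, switching to Stag looks strictly worse, which is exactly the counterfactual the external-regret comparator cannot see.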

2606.06481 2026-06-05 cs.CL cs.AI cs.LG Version update

Operation-Guided Progressive Human-to-AI Text Transformation Benchmark for Multi-Granularity AI-Text Detection

Sondos Mahmoud Bsharat, Jiacheng Liu, Xiaohan Zhao, Tianjun Yao, Xinyi Shang, Yi Tang, Jiacheng Cui, Ahmed Elhagry, Salwa K. Al Khatib, Hao Li, Salman Khan, Zhiqiang Shen

Affiliations: Mohamed bin Zayed University of Artificial Intelligence; University College London

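AI summary: Proposes the OpAI-Bench benchmark, which simulates human-AI co-editing through nine progressive revision versions and five AI edit operations and supports detection at document, sentence, token, and span granularity. It shows that AI-text detectability depends on the edit operation, domain, and cumulative revision history, and finds that mixed-authorship intermediate versions are harder to detect than purely human or purely AI endpoints.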

Comments Our code and data are available at https://github.com/VILA-Lab/OpAI-Bench

Abstract

As AI writing assistants become increasingly integrated into real-world drafting and revision workflows, many documents are no longer purely human-written or AI-generated, but instead result from progressive human-AI co-editing. However, existing AI-text detection benchmarks largely focus on final outputs and provide limited understanding of how AI authorship signals emerge, accumulate, or disappear throughout the revision process. We introduce OpAI-Bench, an operation-guided benchmark for studying progressive human-to-AI text transformation across document, sentence, token, and span granularities. Starting from human-written documents, OpAI-Bench constructs nine sequentially revised versions for each sample under predefined AI coverage levels and five representative AI edit operations, covering four domains while preserving complete authorship provenance at multiple granularities. The benchmark supports comprehensive evaluation with 8 document-level detectors, 7 sentence-level detectors, and 2 fine-grained token/span-level detectors. Experiments reveal that AI-text detectability is governed not only by the proportion of AI-edited content, but also by edit operation, domain, and cumulative revision history. Interestingly, we notice that mixed-authorship intermediate versions are often harder to detect than both fully human and heavily AI-edited endpoints, exposing non-monotonic detection patterns missed by existing benchmarks. OpAI-Bench provides a controlled testbed for analyzing whether, when, and how AI-assisted writing becomes detectable under realistic progressive editing scenarios. Our code and benchmark are available at https://github.com/VILA-Lab/OpAI-Bench.

2606.06480 2026-06-05 cs.GT cs.LG Version update

PC Layer: Improving Large Language Model Pretraining via Polynomial Weight Preconditioning

Senmiao Wang, Tiantian Fang, Haoran Zhang, Yushun Zhang, Kunxiang Zhao, Alex Schwing, Ruoyu Sun

Affiliations: The Chinese University of Hong Kong, Shenzhen; Google LLC; Shenzhen International Center for Industrial and Applied Mathematics; Shenzhen Research Institute of Big Data; University of Illinois at Urbana-Champaign

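AI summary: Proposes a polynomial-preconditioner weight parameterization (PC layer) that reshapes the singular-value spectrum of weight matrices with a low-degree polynomial preconditioner, keeping weight conditioning stable during LLM training with no inference overhead after training; it outperforms a standard Transformer in Llama-1B pretraining.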
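One way to read "reshaping the singular-value spectrum with a low-degree polynomial" is sketched below. The specific odd polynomial p(W) = 1.5 W - 0.5 W W^T W (a single Newton-Schulz-style step) and the spectral-norm rescaling are assumptions made for illustration; the paper's actual polynomial and parameterization may differ. Because p acts on each singular value as s -> 1.5 s - 0.5 s^3, it lifts small singular values relative to large ones and lowers the condition number, and the result can be computed once after training, which is consistent with the claim of no inference overhead.

```python
import numpy as np

def poly_precondition(W, a=1.5, b=-0.5):
    """Return p(W) = a*W + b*(W W^T W) after rescaling W to spectral norm 1.

    If W = U diag(s) V^T, then W W^T W = U diag(s**3) V^T, so p acts on each
    singular value independently: s -> a*s + b*s**3.
    """
    W = W / np.linalg.norm(W, 2)      # largest singular value becomes 1
    return a * W + b * (W @ W.T @ W)

rng = np.random.default_rng(0)
W_raw = rng.normal(size=(64, 32))
W_eff = poly_precondition(W_raw)      # can be evaluated once and stored after training

s = np.linalg.svd(W_raw / np.linalg.norm(W_raw, 2), compute_uv=False)
s_eff = np.linalg.svd(W_eff, compute_uv=False)
print(np.linalg.cond(W_raw), np.linalg.cond(W_eff))   # condition number drops
```

On (0, 1] the map s -> 1.5 s - 0.5 s^3 is increasing, fixes s = 1, and multiplies smaller values by a factor between 1 and 1.5, which is why the ratio of largest to smallest singular value shrinks.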

2606.06469 2026-06-05 math.ST cs.LG math.PR stat.TH Version update

How abundant are good interpolators?

August Y. Chen, Ahmed El Alaoui

Affiliations: Cornell University, Department of Computer Science; Cornell University, Department of Statistics and Data Science

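AI summary: In the high-dimensional proportional regime, uses a large deviation principle to study the distribution of the generalization error of a linear interpolating classifier chosen uniformly at random, finding that almost all interpolating classifiers have the same generalization performance, while efficient algorithms (such as gradient descent) outperform most interpolators.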

Comments 140 pages

Abstract

Let $S$ be the set of unit norm linear classifiers $\theta \in \mathbb{R}^d$ which correctly classify every point of a labeled dataset $(X_i,y_i)_{i=1}^n$, $X_i \in \mathbb{R}^d$, $y_i \in \{-1,+1\}$, with a possibly negative margin $\kappa$ fixed in advance. Under two natural data-generating distributions of the $(X,y)$ pairs -- a Gaussian mixture model and a logistic model with Gaussian features -- and in the proportional regime $n/d \to \alpha$ with small enough $\alpha$, we establish a large deviation principle on the event that a point $\theta$ chosen uniformly at random from $S$ achieves a given generalization error, with high probability over the choice of the data. The associated large deviation rate function is deterministic and describes the proportion, at the exponential scale in $d$, of interpolating classifiers having a given desired performance. As a consequence, we establish the following concentration phenomenon: all but an exponentially small fraction of interpolating classifiers have approximately the same generalization performance given by the unique maximizer of this rate function. We numerically compare this maximizer to the performance of empirical risk minimization by gradient descent and to the performance of a natural linear program, both finding a point in $S$, and deduce that in the overparametrized regime of small $\alpha$, these efficient procedures outperform the vast majority of interpolators, pointing to their nontrivial benign overfitting in this setting.
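For intuition about "a point θ chosen uniformly at random from S", the sketch below rejection-samples unit vectors from the sphere and keeps those that interpolate a small Gaussian-mixture dataset with κ = 0. It assumes the margin is normalized as y_i⟨θ, X_i⟩/√d ≥ κ and the model X = yμ + Z with Z ~ N(0, I); under that model a unit-norm θ has test error Φ(−⟨θ, μ⟩). Rejection sampling is only feasible at tiny n and d and is not the paper's large-deviation analysis; the sizes and signal strength are hypothetical.

```python
import math
import numpy as np

def sample_interpolators(X, y, kappa=0.0, num=500, seed=0, max_tries=2_000_000):
    """Rejection-sample theta uniformly from
    S = {||theta|| = 1 : y_i <theta, X_i> / sqrt(d) >= kappa for all i}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    accepted, tries = [], 0
    while len(accepted) < num and tries < max_tries:
        batch = rng.normal(size=(10_000, d))
        batch /= np.linalg.norm(batch, axis=1, keepdims=True)  # uniform on the sphere
        margins = y[None, :] * (batch @ X.T) / math.sqrt(d)
        accepted.extend(batch[np.all(margins >= kappa, axis=1)])
        tries += len(batch)
    return np.array(accepted[:num])

def gmm_test_error(theta, mu):
    """For X = y*mu + Z, Z ~ N(0, I), ||theta|| = 1:
    P(y <theta, X> < 0) = Phi(-<theta, mu>) = 0.5 * erfc(<theta, mu> / sqrt(2))."""
    return 0.5 * math.erfc(float(theta @ mu) / math.sqrt(2))

rng = np.random.default_rng(0)
n, d = 6, 30
mu = np.full(d, 2.0 / np.sqrt(d))            # hypothetical signal, ||mu|| = 2
y = rng.choice([-1.0, 1.0], size=n)
X = y[:, None] * mu + rng.normal(size=(n, d))
thetas = sample_interpolators(X, y)
errors = np.array([gmm_test_error(t, mu) for t in thetas])
print(errors.mean(), errors.std())
```

A histogram of `errors` is the finite-size analogue of the distribution whose exponential-scale behavior the rate function describes.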

2606.06467 2026-06-05 cs.CL cs.AI cs.LG Version update

You Only Index Once: Cross-Layer Sparse Attention with Shared Routing

Yutao Sun, Yanqi Zhang, Li Dong, Jianyong Wang, Furu Wei

Affiliations: Microsoft Research; Tsinghua University

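AI summary: Proposes cross-layer sparse attention (CLSA), which shares both the KV cache and the routing index across layers, reducing routing overhead while preserving the accuracy of token-sparse attention and substantially improving decoding efficiency for long-context LLMs.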

Abstract

Long-context inference in modern LLMs is increasingly constrained by decoding efficiency, especially in reasoning-heavy settings where models generate long intermediate chains of thought. Existing sparse attention methods often face a practical efficiency-quality trade-off. Structured block sparse methods typically provide stronger acceleration but incur noticeable quality loss, while token sparse methods are usually more accurate yet deliver limited end-to-end speedup because top-k routing over the full cache remains expensive. In this work, we propose cross-layer sparse attention (CLSA), which is built on top of KV-sharing architectures such as YOCO. The core idea is to share not only the KV cache across cross-decoder layers, but also the routing index. A single indexer computes token-level top-k selection once and reuses the resulting index across layers, thereby preserving the fine-grained selectivity of token sparse attention while amortizing the routing overhead. The resulting architecture improves all major inference bottlenecks jointly, including pre-filling, KV-cache storage, and long-context decoding. Experiments across short-context and long-context benchmarks show that CLSA is both accurate and efficient, achieving up to 7.6x decoding speedup and 17.1x overall throughput improvement at 128K context. These results suggest a more complete architectural solution for long-context LLMs that jointly advances model quality and inference efficiency.
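The core routing idea, one top-k selection reused by every layer, can be sketched in numpy as follows. This is an illustrative sketch under assumptions, not CLSA's implementation: the indexer is a plain dot product with a hypothetical indexer query `q_index`, each layer supplies its own query vector, and all layers read one shared key/value cache as in KV-sharing designs such as YOCO. With k equal to the cache length it reduces to ordinary dense attention.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def shared_topk_index(q_index, K, k):
    """Single indexer: score all cached tokens once and keep the top-k positions."""
    scores = K @ q_index                         # (seq_len,)
    k = min(k, len(scores))
    return np.argpartition(-scores, k - 1)[:k]   # indices of the k largest scores

def cross_layer_sparse_decode(queries, K, V, q_index, k):
    """Every layer attends over the same shared KV cache, restricted to the
    same top-k index computed once, instead of re-routing in each layer."""
    idx = shared_topk_index(q_index, K, k)
    K_sel, V_sel = K[idx], V[idx]
    d = K.shape[1]
    outs = []
    for q in queries:                            # one query vector per layer
        w = softmax(K_sel @ q / np.sqrt(d))
        outs.append(w @ V_sel)
    return np.stack(outs)

rng = np.random.default_rng(0)
K, V = rng.normal(size=(16, 8)), rng.normal(size=(16, 8))
queries, q_index = rng.normal(size=(3, 8)), rng.normal(size=8)
out = cross_layer_sparse_decode(queries, K, V, q_index, k=4)   # (3 layers, 8)
```

The routing cost (scoring the whole cache) is paid once per decoding step rather than once per layer, which is the overhead the abstract says is amortized.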

2606.06459 2026-06-05 cs.LG Version update

Event Detection for Parameter-to-KPI Dependency Learning for AI-RAN

Christie Djidjev, Nicholas Kaminski

Affiliations: Idaho National Laboratory

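AI summary: Addresses mutual interference among multiple AI control functions in AI-RAN with an event-detection-based dependency-learning approach that converts noisy continuous telemetry into binary event indicators, and uses synthetic data to evaluate how well a machine-learning pipeline recovers the latent dependency structure.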

Abstract

Next-generation wireless networks are expected to rely on multiple concurrent AI-driven control functions that optimize different network objectives simultaneously, particularly in AI-integrated and open radio access network architectures such as AI Radio Access Network (AI-RAN) and Open Radio Access Network (O-RAN). When these functions interact, they can interfere with one another in ways that are difficult to detect from raw network data alone. A key missing piece for managing such interactions is a reliable, interpretable dependency structure that captures which control parameters are actively influencing which network performance outcomes at any given time. This paper focuses on the event-detection step needed to support such dependency learning by converting noisy continuous telemetry into binary indicators of parameter activity and KPI response. The central difficulty is that not every fluctuation in the data reflects a genuine control interaction, so the method must distinguish real parameter-outcome relationships from background variation. Because real AI-RAN traffic traces with known parameter-KPI ground truth are difficult to obtain, we introduce a synthetic closed-loop traffic generator with planted latent dependencies. We use this controlled telemetry to evaluate a machine-learning-based dependency recovery pipeline that formulates the conversion of continuous traces into binary event indicators as a significance-detection problem. Experimental evaluation shows that the proposed pipeline reliably recovers the latent dependency structure from noisy continuous traces when the signal is sufficiently separated from background variation, while highlighting threshold calibration as the key factor controlling event-detection quality. These results constitute a foundational step toward interpretable dependency learning for adaptive AI-RAN control systems.
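A minimal sketch of turning noisy continuous traces into binary event indicators by significance detection: flag increments whose robust z-score (median/MAD) exceeds a threshold, then score a parameter-KPI dependency by how often parameter events are followed by KPI events within a short lag. The robust z-score rule, the threshold of 6, the lag window, and the tiny synthetic generator with one planted dependency are all assumptions for illustration; the paper's ML pipeline and traffic generator are not reproduced here.

```python
import numpy as np

def detect_events(x, threshold=6.0):
    """Binary event indicator: increments that are robust-z-score outliers."""
    dx = np.diff(x, prepend=x[0])
    med = np.median(dx)
    mad = np.median(np.abs(dx - med)) + 1e-12
    z = 0.6745 * (dx - med) / mad          # approx. standard normal under pure noise
    return np.abs(z) > threshold

def dependency_score(param_events, kpi_events, max_lag=2):
    """Fraction of parameter events followed by a KPI event within max_lag steps."""
    starts = np.flatnonzero(param_events)
    if starts.size == 0:
        return 0.0
    return float(np.mean([kpi_events[t:t + max_lag + 1].any() for t in starts]))

# Hypothetical closed loop with one planted dependency: param -> kpi_a at lag 1.
rng = np.random.default_rng(0)
T = 500
level = np.zeros(T)
for t in (50, 150, 250, 350, 450):
    level[t:] += 2.0                       # parameter reconfiguration steps
param = level + rng.normal(scale=0.05, size=T)
kpi_a = np.concatenate([[0.0], 0.8 * param[:-1]]) + rng.normal(scale=0.05, size=T)
kpi_b = rng.normal(scale=0.05, size=T)     # unrelated KPI: background noise only

p_ev = detect_events(param)
print(dependency_score(p_ev, detect_events(kpi_a)), dependency_score(p_ev, detect_events(kpi_b)))
```

The threshold plays the calibration role the abstract highlights: set it too low and background fluctuations become spurious events, set it too high and small genuine responses are missed.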

2606.06458 2026-06-05 cs.LG cs.AI cs.CV Version update

In-Context Multiple Instance Learning

Alexander Möllers, Marvin Sextro, Julius Hense, Gabriel Dernbach, Klaus-Robert Müller

Affiliations: Berlin Institute for the Foundations of Learning and Data; Machine Learning Group, Technische Universität Berlin; Aignostics; Institute of Pathology, Charité – Universitätsmedizin Berlin; Max-Planck Institute for Informatics; Department of Artificial Intelligence, Korea University

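AI summary: Proposes an in-context learner with a Perceiver-style architecture, pretrained on synthetic data, that solves new multiple instance learning tasks from a few labeled bags without gradient updates; it achieves the best average performance across 12 benchmarks, outperforming supervised baselines that require task-specific training.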

Abstract

Multiple Instance Learning (MIL) addresses problems where supervision is available at the level of bags of instances and has been successfully applied in fields ranging from computational pathology to satellite imagery. Nevertheless, existing algorithms struggle in the low-label regime that characterizes many real-world applications. Flexible models overfit and rigid ones fail to adapt to the task at hand. We show that pretraining an in-context learner with a Perceiver-style architecture on synthetic data yields a model that can solve new tasks from a handful of labeled bags. At inference time, classification happens in a single forward pass and requires no gradient updates. We propose and investigate different synthetic data generators for bag-structured data and find that they capture complementary inductive biases. A model pretrained on a mixture of these generators inherits their per-task strengths and achieves the best average performance across twelve MIL benchmarks, outperforming supervised baselines that require task-specific training.
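A property that makes a Perceiver-style encoder a natural fit for bags is that cross-attention from a fixed set of latent vectors maps a bag of any size to a fixed-size summary that does not depend on instance order. The numpy sketch below shows only that pooling step with random, hypothetical weights; the in-context conditioning on labeled bags and the synthetic pretraining described in the paper are not shown.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def perceiver_pool(bag, latents, Wk, Wv):
    """Cross-attention from m latent vectors to a bag of instances.

    bag: (n_instances, d_in) for any n_instances; latents: (m, d).
    Returns an (m, d) summary that is invariant to the order of instances.
    """
    keys = bag @ Wk                          # (n, d)
    values = bag @ Wv                        # (n, d)
    attn = softmax(latents @ keys.T / np.sqrt(latents.shape[1]), axis=-1)  # (m, n)
    return attn @ values                     # weighted sums over instances

rng = np.random.default_rng(0)
d_in, d, m = 5, 8, 4
latents = rng.normal(size=(m, d))            # learned in a real model; random here
Wk, Wv = rng.normal(size=(d_in, d)), rng.normal(size=(d_in, d))
bag = rng.normal(size=(7, d_in))
summary = perceiver_pool(bag, latents, Wk, Wv)   # shape (4, 8) for any bag size
```

Because the attention weights are normalized over instances and then used for a weighted sum, shuffling the bag only permutes terms of that sum.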

2606.06447 2026-06-05 cs.CL cs.LG Version update

Latent Reasoning with Normalizing Flows

Guancheng Tu, Xiangjun Fu, Suhao Yu, Yao Tang, Haoqiang Kang, Lianhui Qin, Yizhe Zhang, Jiatao Gu

Affiliations: University of Pennsylvania; UC San Diego; Meta

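AI summary: Proposes the NF-CoT framework, which models continuous latent thoughts inside an LLM with normalizing flows while retaining autoregressive generation, probabilistic sampling, KV-cache decoding, and likelihood estimation, raising pass rates and lowering reasoning cost on code-generation tasks.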

Abstract

Large language models often improve reasoning by generating explicit chain-of-thought (CoT), demonstrating the importance of intermediate computation. However, textual CoT forces this computation through a discrete, serial, and communication-oriented token stream: each reasoning step must be verbalized before the model can proceed, even when the underlying update is semantic, uncertain, or only partially formed. Latent reasoning offers a higher-bandwidth alternative by performing intermediate computation in compact continuous states before committing to text. Yet existing latent-reasoning methods often sacrifice key advantages that make CoT effective in autoregressive language models, including native left-to-right generation, probabilistic sampling, compatibility with KV-cache decoding, and tractable likelihood estimation. We propose NF-CoT, a latent reasoning framework that preserves these advantages by modeling continuous thoughts with normalizing flows. NF-CoT instantiates a TARFlow-style normalizing flow inside the LLM backbone, defining a tractable probability model over compact continuous thoughts distilled from explicit CoT. Continuous-thought positions are generated by an NF head, while text positions are generated by the standard LM head within the same causal stream. This design provides exact likelihoods for latent thoughts, enables probabilistic left-to-right decoding with the original KV cache, and supports direct policy-gradient optimization in the latent reasoning space. On code-generation benchmarks, NF-CoT improves pass rates over explicit-CoT and prior latent-reasoning baselines while substantially reducing intermediate-reasoning cost.
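To illustrate why a normalizing flow gives exact likelihoods for continuous thoughts while keeping left-to-right generation, here is a toy autoregressive affine flow. TARFlow-style models use Transformer conditioners; this sketch swaps in strictly lower-triangular linear maps (a hypothetical simplification) so that each dimension depends only on earlier ones. Sampling is sequential, like token decoding, while density evaluation inverts the whole vector in parallel, and the log-determinant is simply the negative sum of the log-scales.

```python
import numpy as np

class AffineAutoregressiveFlow:
    """Toy autoregressive affine flow over a D-dimensional continuous 'thought'.

    Generation (z -> x) runs left to right:
        x[t] = mu_t(x[:t]) + exp(s_t(x[:t])) * z[t]
    Density evaluation (x -> z) is parallel because all of x is known.
    """

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Strictly lower-triangular weights: mu_t and s_t depend only on x[:t].
        self.W_mu = np.tril(rng.normal(scale=0.5, size=(dim, dim)), k=-1)
        self.W_s = np.tril(rng.normal(scale=0.5, size=(dim, dim)), k=-1)

    def _params(self, x):
        mu = self.W_mu @ x
        s = np.tanh(self.W_s @ x)    # bounded log-scale keeps exp(s) well behaved
        return mu, s

    def sample(self, z):
        x = np.zeros_like(z)
        for t in range(len(z)):
            mu, s = self._params(x)  # row t only reads x[:t], already generated
            x[t] = mu[t] + np.exp(s[t]) * z[t]
        return x

    def inverse(self, x):
        """Return z and log|det dz/dx| (the Jacobian is lower triangular)."""
        mu, s = self._params(x)
        z = (x - mu) * np.exp(-s)
        return z, -float(np.sum(s))

    def log_prob(self, x):
        """Exact log-likelihood under a standard-normal base distribution."""
        z, logdet = self.inverse(x)
        log_base = -0.5 * float(z @ z) - 0.5 * len(z) * np.log(2 * np.pi)
        return log_base + logdet

flow = AffineAutoregressiveFlow(dim=5)
z = np.random.default_rng(1).normal(size=5)
x = flow.sample(z)
print(flow.log_prob(x))
```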

2606.06440 2026-06-05 cs.LG stat.ML Version update

Causal Atlases from Entropic Inference: Bayesian Networks beyond Optimal DAGs

Hazhir Aliahmadi, Irina Babayan, Greg van Anders

Affiliations: Department of Physics, Engineering Physics and Astronomy

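AI summary: Addresses the problem of multiple admissible causal chains in data-driven causal identification by proposing causal atlases built from entropic inference, which sample a maximum-entropy ensemble of graphs to quantify the ambiguity of the causal structure.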

Comments 18 pages, 2 figures

Abstract

Data-driven causal relationship identification is pertinent to advancing understanding of complex systems both within and beyond science. Bayesian networks offer a probabilistic method for modelling generic causal relationships via directed acyclic graphs (DAGs). However, typical techniques for constructing Bayesian networks rely on optimization, which can be ill-suited for learning causal relationships because the underlying data may admit multiple chains of causation. More data-faithful representations of causal relationships would provide frameworks for constructing multiple causal maps that are consistent with the variability that is inherent in underlying data. Here, we show that entropy-based inference generates atlases of plausible causal relationships that are consistent with underlying data. On simulated noisy data of 2- and 20-node linear structural equation models, we sample a maximum-entropy ensemble of graphs that allow us to quantify the inherent structural ambiguity in underlying causal relationships. Our method shows that "optimized" DAGs can contain causal artifacts that are not consistent across equivalently accurate topologies.
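A small example of how a single "optimized" DAG can hide ambiguity: enumerate every DAG on two variables, score each with a linear-Gaussian BIC, and weight graphs by the Boltzmann distribution exp(β · score), which is the maximum-entropy distribution for a fixed expected score. The BIC score, β = 1, the simulated data, and full enumeration are assumptions for illustration; the paper samples ensembles over larger graphs. Because X->Y and Y->X are Markov equivalent, they receive the same score and equal weight, whereas an optimizer would arbitrarily report one of them.

```python
import numpy as np

def node_loglik(child, parents):
    """Max log-likelihood of a linear-Gaussian node given its parents,
    plus its parameter count (intercept, coefficients, noise variance)."""
    n = len(child)
    design = np.column_stack([np.ones(n)] + parents)
    coef, *_ = np.linalg.lstsq(design, child, rcond=None)
    var = np.mean((child - design @ coef) ** 2)
    return -0.5 * n * (np.log(2 * np.pi * var) + 1), design.shape[1] + 1

def bic(data, dag):
    """dag maps each node index to the list of its parent indices."""
    n = data.shape[0]
    total = 0.0
    for child, parents in dag.items():
        ll, k = node_loglik(data[:, child], [data[:, p] for p in parents])
        total += ll - 0.5 * k * np.log(n)
    return total

def maxent_weights(scores, beta=1.0):
    """Boltzmann (maximum-entropy) weights over graphs for a given score."""
    s = beta * np.asarray(scores)
    w = np.exp(s - s.max())
    return w / w.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=300)
y = 0.8 * x + rng.normal(scale=0.6, size=300)
data = np.column_stack([x, y])
dags = {"empty": {0: [], 1: []}, "X->Y": {0: [], 1: [0]}, "Y->X": {0: [1], 1: []}}
p = maxent_weights([bic(data, g) for g in dags.values()])
print(dict(zip(dags, p.round(3))))
```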

2606.06418 2026-06-05 cs.LG cs.AI cs.SY eess.SY Version update

Conformal Risk Sharing: Certified Cost Allocation with Participation Guarantees

Ieva Kazlauskaite

Affiliations: Ieva Kazlauskaite

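AI summary: Proposes Conformal Risk Sharing, which pairs an interpretable sharing policy with split conformal calibration to allocate the financial impact of rare events from finite data without distributional assumptions, giving each participant an obligation cap and verifying that no participant is made materially worse off.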

Abstract

Sharing the financial impact of rare adverse events across a group can soften extreme individual burdens, but any participant made worse off by the arrangement has reason to leave. A credible mechanism must therefore provide each agent with a trustworthy cap on their future obligation and should be deployed only if the aggregate harm across participants is bounded. We formalise this as the Certified Allocation Problem: from finite data and without distributional assumptions, find a redistribution rule, produce obligation caps for every participant, and verify that no participant is made materially worse off. We propose Conformal Risk Sharing, which solves this problem by pairing an interpretable sharing policy with split conformal calibration. The sharing intensity is tuned on training data, while held-out calibration data produces distribution-free per-agent guarantees (valid under exchangeability). Experiments on synthetic and real-world data, including precipitation and energy-cooperative data, confirm that the framework can substantially reduce extreme obligations for high-risk agents while controlling harm to others.
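The calibration step can be sketched as follows, assuming a hypothetical sharing policy in which each agent pays a convex mix (weight `lam`) of its own loss and the group-average loss, and a per-agent cap given by the standard split-conformal quantile, the ⌈(n+1)(1−α)⌉-th smallest calibration obligation. The event rates, lognormal severities, and fixed `lam = 0.5` are made up; the paper tunes the sharing intensity on training data and also verifies harm bounds, which this sketch omits.

```python
import numpy as np

def share(losses, lam):
    """Hypothetical sharing policy on a (periods, agents) loss array: each agent
    pays (1 - lam) * own loss + lam * group-average loss."""
    return (1 - lam) * losses + lam * losses.mean(axis=1, keepdims=True)

def conformal_cap(calib_obligations, alpha):
    """Split-conformal upper bound for one agent's next obligation.

    Returns the ceil((n + 1)(1 - alpha))-th smallest calibration value, which
    covers a new exchangeable draw with probability at least 1 - alpha.
    """
    n = len(calib_obligations)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf                      # too few calibration points for this alpha
    return np.sort(calib_obligations)[k - 1]

rng = np.random.default_rng(0)
p_event = np.array([0.02, 0.05, 0.1, 0.2, 0.3])   # hypothetical per-agent event rates

def simulate(T):
    hit = rng.random((T, len(p_event))) < p_event
    return hit * rng.lognormal(mean=0.0, sigma=1.0, size=(T, len(p_event)))

calib, test = share(simulate(1000), lam=0.5), share(simulate(5000), lam=0.5)
caps = np.array([conformal_cap(calib[:, i], alpha=0.1) for i in range(calib.shape[1])])
coverage = (test <= caps).mean(axis=0)     # per-agent fraction of periods under the cap
print(caps.round(3), coverage.round(3))
```

The guarantee needs only exchangeability between calibration and future periods, which is why no distributional model of the losses is required.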

2606.06385 2026-06-05 cs.LG Version update

Learned Response-Field Inertia Operator for HEC-RAS 2D Water-Surface Elevation Prediction

Edward Holmberg, Elias Ioup, Md Meftahul Ferdaus, Mahdi Abdelguerfi, Julian Simeonov

Affiliations: Canizaro Livingston Gulf States Center for Environmental Informatics, Department of Computer Science, The University of New Orleans; Center for Geospatial Sciences, Naval Research Laboratory; Ocean Sciences Division, Naval Research Laboratory

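AI summary: Proposes the Learned Response-Field Inertia Operator (LRFIO), an increment-based, no-forcing learned surrogate that calibrates an inertial response operator from solved HEC-RAS trajectories and performs closed-form rollout on the native nonuniform mesh, enabling cross-dataset water-surface elevation prediction with adaptive complexity control.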

Comments Preprint manuscript prepared using IEEEtran journal format

Abstract

This article presents a cross-dataset evaluation of learned native-cell surrogate models for solver-consistent water-surface elevation (WSE) prediction in HEC-RAS 2D. To avoid raster remapping error and information-access confounding, surrogates are evaluated directly on the original nonuniform computational cells under an explicit policy that separates static project inputs, current hydraulic state, project-input forcing, calibration-derived quantities, and future solver-output targets. We introduce the Learned Response-Field Inertia Operator (LRFIO), a no-forcing, increment-based learned surrogate that calibrates an inertial response operator from solved HEC-RAS trajectories and deploys the retained operator through closed-form native-cell rollout. LRFIO evaluates a base-case-first response hierarchy consisting of persistence, global calibrated inertia, and segmented response-field inertia. Segmentation, residual correction, and neuralized inertia are treated as learnable modeling choices, with added complexity retained only when validation evidence justifies its cost. Evaluated across four diverse HEC-RAS 2D benchmarks, LRFIO retains different response structures for different domains, demonstrating adaptive learned complexity. The selector audit shows controlled complexity with a maximum validation regret of 4.30%. During deployment, retained rollout times range from 0.003 s to 0.242 s, and the Beaver Bayou measured-solve comparison gives an estimated 2.75 × 10^4 horizon-normalized speedup over HEC-RAS. These results indicate that the current native-cell increment is a strong solver-conditioned predictive scaffold and that added response-field, neural, or spatial complexity should be retained only when empirically justified.
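A minimal sketch of an increment-based, no-forcing inertia operator, assuming the simplest global form dh[t+1] ≈ a · dh[t] with one coefficient `a` fitted by least squares over all cells (a = 0 recovers persistence). The closed-form rollout follows from summing a geometric series. LRFIO's segmented response fields, residual corrections, and model selection are not specified in the abstract and are not shown; the function names are hypothetical.

```python
import numpy as np

def calibrate_inertia(wse):
    """Least-squares fit of one global coefficient a in dh[t+1] ~= a * dh[t],
    pooled over all cells. wse has shape (time, cells); no external forcing."""
    dh = np.diff(wse, axis=0)                  # (T-1, cells) increments
    prev, nxt = dh[:-1].ravel(), dh[1:].ravel()
    return float(prev @ nxt / (prev @ prev))

def rollout(h_prev, h_curr, a, steps):
    """Closed-form rollout: dh_{t+k} = a**k * dh_t, so
    h_{t+k} = h_t + dh_t * (a + a**2 + ... + a**k)."""
    dh = h_curr - h_prev
    k = np.arange(1, steps + 1)[:, None]       # (steps, 1)
    geom = a * (1 - a ** k) / (1 - a) if a != 1 else k.astype(float)
    return h_curr[None, :] + geom * dh[None, :]   # (steps, cells)

# Hypothetical 3-cell trajectory whose increments decay geometrically.
a_true = 0.9
dh0 = np.array([0.1, -0.2, 0.05])
incs = np.array([a_true ** t * dh0 for t in range(40)])
wse = np.vstack([np.zeros(3), np.cumsum(incs, axis=0)])   # 41 states
a = calibrate_inertia(wse[:31])
pred = rollout(wse[29], wse[30], a, steps=10)              # forecasts states 31..40
```

Because the rollout is a closed-form expression rather than a time-stepping loop, its cost per horizon is essentially a few array operations, which is in the spirit of the millisecond-scale rollout times reported above.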

2606.06364 2026-06-05 cs.LG stat.ML Version update

End-to-End Subgraph Detection with GraphDETR

Dexiong Chen, Till Hendrik Schulz, Karsten Borgwardt

Affiliations: Max Planck Institute of Biochemistry

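AI summary: Proposes the GraphDETR framework, which treats subgraph detection as a set prediction problem: a graph neural network encodes the target graph, a Transformer decoder jointly predicts all pattern instances, and the model is trained end-to-end with bipartite matching. It supports exact and approximate matching, detects patterns of up to 50 nodes in graphs of up to 1,000 nodes, and achieves $\text{AP}_{100} = 91.2$ on the ChEMBL dataset.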

Abstract

Subgraph detection seeks to identify whether and where instances of query patterns occur within a larger graph. This problem is fundamental across scientific domains and is closely related to subgraph isomorphism, which is NP-complete, limiting combinatorial approaches to small patterns or moderately sized graphs. We introduce GraphDETR, a deep learning framework that formulates subgraph detection as a set prediction problem, analogous to DETR in object detection. GraphDETR encodes the target graph with a graph neural network, and employs a fixed set of learnable query vectors, decoded via a transformer decoder, to predict all pattern occurrences jointly in a single forward pass. This is enabled by training the model end-to-end with bipartite matching. Unlike traditional combinatorial methods that only solve exact structural matching, GraphDETR naturally extends to approximate matching, enabling detection beyond exact pattern correspondence. Empirically, we show that GraphDETR can detect diverse patterns, such as molecular structures, cycles, cliques, and fuzzy patterns of up to 50 nodes, in target graphs with up to 1000 nodes. We further evaluate on molecular functional group detection over the ChEMBL dataset, where GraphDETR predicts the complete set of functional groups per molecule, achieving a strong performance of $\text{AP}_{100} = 91.2$.
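DETR-style training needs a one-to-one assignment between predicted query slots and ground-truth instances before computing the loss. The sketch below assumes each query predicts a per-node membership probability vector and uses 1 − soft Jaccard as the matching cost (a hypothetical choice; the paper's cost is not given in the abstract). It brute-forces the assignment for readability; in practice the same cost matrix is usually passed to `scipy.optimize.linear_sum_assignment`. Queries left unmatched would be trained toward a "no instance" class.

```python
import itertools
import numpy as np

def match_cost(pred_probs, gt_nodes, num_nodes):
    """1 - soft Jaccard between a predicted node-membership vector and a GT node set."""
    target = np.zeros(num_nodes)
    target[list(gt_nodes)] = 1.0
    inter = np.minimum(pred_probs, target).sum()
    union = np.maximum(pred_probs, target).sum()
    return 1.0 - inter / union

def bipartite_match(preds, gts, num_nodes):
    """Optimal one-to-one assignment of GT instances to query slots
    (assumes len(preds) >= len(gts), as with DETR's surplus of queries).
    Brute force over injective maps; fine only for tiny examples."""
    cost = np.array([[match_cost(p, g, num_nodes) for g in gts] for p in preds])
    best, best_perm = np.inf, None
    for perm in itertools.permutations(range(len(preds)), len(gts)):
        c = sum(cost[q, j] for j, q in enumerate(perm))   # perm[j] = query for GT j
        if c < best:
            best, best_perm = c, perm
    return {j: q for j, q in enumerate(best_perm)}, best

gts = [{0, 1, 2}, {3, 4}]                       # two pattern instances in a 6-node graph
preds = [np.array([0, 0, 0, 0.9, 0.8, 0]),      # query 0 looks like instance {3, 4}
         np.full(6, 0.1),                       # query 1 predicts nothing in particular
         np.array([0.9, 0.95, 0.85, 0, 0, 0.05])]  # query 2 looks like {0, 1, 2}
assignment, total_cost = bipartite_match(preds, gts, num_nodes=6)
print(assignment, round(total_cost, 3))
```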

2606.06353 2026-06-05 cs.LG Version update

Maximising the Set-Piece Return: Optimising Football Corner Tactics with Graph Reinforcement Learning

Sean Groom, Michael Groom, Francisco Belo, Axl Rice, Liam Anderson, Victor-Alexandru Darvariu, Shuo Wang

Affiliations: School of Computer Science, University of Birmingham, Birmingham, UK; Nottingham Forest Football Club, Nottingham, UK; Oxford Robotics Institute, University of Oxford, Oxford, UK; School of Sport, Exercise and Rehabilitation Sciences, University of Birmingham, Birmingham, UK

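AI summary: Proposes a graph reinforcement learning framework that adjusts attacking players' positions and velocities to maximize the first-contact shot probability from corner kicks, outperforming conventional optimization methods on Premier League corner data.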

Comments 11 pages, 4 figures

Abstract

Machine learning is increasingly employed for the evaluation of football tactics. However, existing approaches focus on characterising historical actions or analyst-specified counterfactual scenarios. In this work, we seek to go beyond the imitation of historically observed patterns towards discovering new generalisable player configurations and strategies. To tackle this, we focus on optimising corner kick routines, and formulate a decision-making problem in which a central policy makes adjustments to attacking player positions and velocities to maximise first contact shot probability. Unlike classic optimisation that solves for isolated setups, we contribute a reinforcement learning architecture operating on graph-structured data that yields a general policy for adjusting arbitrary starting player positions. Evaluated on over 3,000 Premier League corners, our approach strongly outperforms baseline optimisation techniques under matched inference budgets. Our results suggest that graph reinforcement learning can shift set-piece analysis from historical evaluation and imitation towards reward-driven tactical discovery.

2606.06351 2026-06-05 stat.ML cs.LG Version update

Function-Space Priors for Bayesian Neural ODEs with Application to Vessel Trajectory Prediction

Jaeyeong Lee, Wonmo Koo, Heeyoung Kim

Affiliations: Department of Industrial and Systems Engineering, Korea Advanced Institute of Science and Technology (KAIST)

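AI summary: Addresses irregular sampling, missing reports, and complex dynamics in vessel trajectory prediction by proposing a regularization method that places a Gaussian-process kernel prior on the vector field, combined with probabilistic multiple shooting for uncertainty quantification over long sequences.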

Abstract

Vessel trajectory prediction from Automatic Identification System (AIS) data is essential for maritime situational awareness, yet it remains challenging due to irregular sampling, missing reports, and complex dynamics. Beyond accurate point forecasts, maritime applications also demand well-calibrated uncertainty estimates for reliable decision-making. Bayesian Neural Ordinary Differential Equations (ODEs) offer a principled framework for continuous-time trajectory modeling with uncertainty quantification by placing a prior over the neural vector field parameters. However, the commonly used isotropic Gaussian weight prior fails to encode informative structural properties of vessel dynamics, such as smoothness and locality. Existing function-space Bayesian neural network methods address this limitation for static mappings, but do not transfer directly to Neural ODEs, where the primary quantity of interest is the trajectory rather than the vector field itself. In principle, one could place a Gaussian process (GP) prior directly over ODE solutions, but this requires propagating distributions through a nonlinear ODE solver, which is analytically intractable. To address this challenge, we adopt a practical approach that imposes a GP-kernel-based prior directly on the vector field evaluated at a finite set of measurement points. Specifically, we augment the standard weight-space variational objective with a kernel-based regularizer that penalizes deviations of the vector field from the structure implied by a GP prior. To handle long and irregular AIS trajectories, we further combine this function-space regularization with probabilistic multiple shooting, which decouples inference across temporal segments while maintaining global consistency.
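One standard way to encode a GP prior on a function evaluated at finitely many points is the quadratic energy 0.5 * f^T K^{-1} f, where K is the kernel matrix at those points. The sketch below uses a one-dimensional RBF kernel, a small jitter term, and hypothetical test signals; in the setting above, f would be the neural vector field's output at the measurement points, and this term would be added to the weight-space variational objective. It is an assumption about the regularizer's form, not the paper's exact objective.

```python
import numpy as np

def rbf_kernel(x, lengthscale=1.0, variance=1.0):
    d2 = (x[:, None] - x[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_function_penalty(f_vals, points, lengthscale=1.0, jitter=1e-3):
    """0.5 * f^T K^{-1} f: GP-prior energy of function values f evaluated at a
    finite set of measurement points (one output dimension)."""
    K = rbf_kernel(points, lengthscale) + jitter * np.eye(len(points))
    return 0.5 * float(f_vals @ np.linalg.solve(K, f_vals))

points = np.linspace(0.0, 5.0, 20)
smooth = np.sin(points)                                # varies on the kernel's length scale
rough = np.where(np.arange(20) % 2 == 0, 1.0, -1.0)    # flips sign at every point
print(gp_function_penalty(smooth, points), gp_function_penalty(rough, points))
```

Values that vary on the kernel's length scale are cheap under this penalty, while point-to-point oscillations fall in directions where K has tiny eigenvalues and are penalized heavily, which is how smoothness and locality enter the prior.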

2606.06348 2026-06-05 cs.LG Version update

Performance Evaluation of GraphCast for Medium-Range Weather Forecasting over Brazil

Wolfgang R. Rowell, Lucas S. Kupssinskü

Affiliations: MALTA, Machine Learning Theory and Applications Lab, PUCRS, Porto Alegre, Brazil

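AI summary: Using ECMWF IFS HRES as the baseline, this study evaluates GraphCast's medium-range forecast performance over four climatic sub-regions of Brazil. Skill turns out to be seasonally dependent: GraphCast underperforms in the winter medium range but gains an advantage in the extended range, while in summer it accurately captures large-scale moisture transport and dampens high-frequency convective variability.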

Abstract

The paradigm of global weather forecasting is rapidly shifting with the emergence of Machine Learning Weather Prediction (MLWP) models. While these data-driven architectures demonstrate remarkable global skill, regional benchmarks in the Global South remain scarce, leaving their efficacy in complex, highly convective environments largely unverified. This study evaluates the performance of GraphCast operational against the deterministic ECMWF IFS HRES baseline across four distinct Brazilian climatic sub-regions. Using a scalable, cloud-native pipeline and the WeatherBench-X framework for benchmarking weather models, we assess selected tropospheric variables ($T_{850}$, $Q_{850}$, $Z_{500}$) over four selected seasonal windows, taking the operational IFS analysis as the ground truth to compute statistical metrics for both models. Results reveal a regime-dependent skill profile. During the austral winter, GraphCast underperforms in the medium range (lead days 2-7) for $Z_{500}$ when resolving fast-propagating baroclinic systems over southern Brazil, but regains an advantage in the extended range, where its inherent smoothing of chaotic small-scale variability becomes beneficial under deterministic skill metrics. Conversely, during the austral summer wet season, GraphCast accurately captures large-scale moisture transport while intrinsically dampening the high-frequency convective variability that degrades deterministic NWP temperature forecasts. These findings establish a baseline for Brazil and define the specific physical boundaries that will guide future ``tropicalization'' efforts aimed at optimizing these foundational AI models for regional resilience.
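
WeatherBench-style evaluations commonly report latitude-weighted RMSE; the abstract does not say which weighting this study uses, so the cos(latitude) weighting below is an assumption. The synthetic fields also illustrate why smoothing can pay off under deterministic metrics: once small-scale detail is unpredictable, a forecast of the mean beats a sharp but misplaced one (the double-penalty effect).

```python
import numpy as np


def lat_weighted_rmse(forecast, truth, lats_deg):
    """RMSE on a (lat, lon) grid with cos(latitude) weights normalised to mean 1."""
    weights = np.cos(np.deg2rad(lats_deg))
    weights = weights / weights.mean()
    squared_error = (forecast - truth) ** 2        # shape (n_lat, n_lon)
    return float(np.sqrt(np.mean(weights[:, None] * squared_error)))


rng = np.random.default_rng(0)
lats = np.linspace(-34.0, 5.0, 40)                 # roughly Brazil's latitude span (assumption)
truth = rng.normal(size=(40, 80))                  # unpredictable small-scale anomalies
sharp = rng.normal(size=(40, 80))                  # realistic-looking but misplaced detail
smooth = np.zeros_like(truth)                      # smoothed forecast: predicts the mean only
rmse_sharp = lat_weighted_rmse(sharp, truth, lats)
rmse_smooth = lat_weighted_rmse(smooth, truth, lats)
```

For independent unit-variance fields the expected RMSE is about √2 for the sharp forecast and 1 for the smooth one.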

2606.06347 2026-06-05 eess.SY cs.LG cs.SY Version update

Attack Detection using Time Series Foundation Models

Sribalaji C. Anand, Anh Tung Nguyen, George J. Pappas

Affiliations: University of Pennsylvania; KTH Royal Institute of Technology; Uppsala University

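AI summary: For cyber-physical systems whose model is unknown, proposes a zero-shot attack detection method based on the TimesFM time-series foundation model and validates its performance on the IEEE 14-bus power system.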

Comments Under review

Abstract

This paper addresses the problem of attack detection in cyber-physical systems without any knowledge of the plant model or its structure. A remotely located plant transmits sensor measurements to an operator over a network that is assumed to be under attack. We consider two classes of attacks: model-free replay attacks and model-based stealthy attacks. For the latter, we derive closed-form expressions for the optimal stealthy attack policy against a $\chi^2$ detector, for both linear and nonlinear systems. We then propose a model-structure-free detector based on TimesFM, a time-series foundation model developed by Google Research, which serves as a surrogate residual generator operating in a zero-shot fashion. We show empirically that the TimesFM-based detector achieves comparable or superior attack detection performance. The efficacy of the proposed approach is demonstrated numerically on the IEEE 14-bus power system. We also demonstrate that TimesFM predictions can serve as a substitute for corrupted measurements, a practical mitigation technique when classical redundancy assumptions fail.
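
A minimal sketch of a forecaster used as a residual generator with a $\chi^2$ alarm. A naive persistence forecast stands in for TimesFM (no TimesFM API is used or implied), the residual covariance is assumed known, and the injected bias is a hypothetical attack. For a 2-dimensional residual the $\chi^2_2$ tail probability is $e^{-x/2}$, so a false-alarm rate $\alpha$ gives the threshold $-2\ln\alpha$.

```python
import numpy as np


def persistence_forecast(history: np.ndarray) -> np.ndarray:
    """Stand-in for a zero-shot forecaster such as TimesFM: predict the last value."""
    return history[-1]


def chi2_statistic(residual: np.ndarray, cov: np.ndarray) -> float:
    """g = r^T Sigma^{-1} r for one m-dimensional residual."""
    return float(residual @ np.linalg.solve(cov, residual))


def detect(measurements: np.ndarray, cov: np.ndarray, threshold: float,
           window: int = 5) -> np.ndarray:
    """Flag time steps whose forecast residual exceeds the chi-square threshold."""
    alarms = np.zeros(len(measurements), dtype=bool)
    for k in range(window, len(measurements)):
        prediction = persistence_forecast(measurements[k - window:k])
        residual = measurements[k] - prediction
        alarms[k] = chi2_statistic(residual, cov) > threshold
    return alarms


t = np.arange(50)
measurements = np.stack([np.sin(0.1 * t), np.cos(0.1 * t)], axis=1)  # toy 2-sensor plant output
attacked = measurements.copy()
attacked[30] += 1.0                            # hypothetical bias injected at step 30
cov = 0.01 * np.eye(2)                         # assumed residual covariance
threshold = -2.0 * np.log(0.01)                # chi^2_2 quantile for a 1% false-alarm rate
alarms = detect(attacked, cov, threshold)
```

Replacing `persistence_forecast` with a stronger zero-shot forecaster tightens the residuals and therefore the detector, which is the role the abstract assigns to TimesFM.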

2606.06345 2026-06-05 cs.AI cs.LG q-bio.NC Version update

Bridging Domain Expertise and Generalization for Performance Estimation

Shuxuan Li, Zhilin Zhao, Quyu Kong, Wei-Shi Zheng

Affiliations: School of Computer Science and Engineering, Sun Yat-sen University, China; Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, China; Shenzhen Loop Area Institute, China; Alibaba Cloud

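AI summary: Proposes FRAP, which combines the complementary strengths of an external foundation model and the base model, aligning their prediction distributions through temperature-scaled calibration to build a more reliable pseudo-label reference distribution and thereby estimate model performance accurately under distribution shift.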

Abstract

Performance estimation under distribution shift aims to predict how a model behaves on an unlabeled test set whose distribution differs from the training data, a scenario that requires reliable indicators that faithfully reflect model behavior without ground-truth labels. Existing approaches rely solely on the outputs of the given model, whose biases are amplified once the distribution shifts, weakening their correlation with true performance. Motivated by this limitation, we propose Fused Reference Alignment Prediction (FRAP), which leverages the complementary strengths of an external foundation model and the base model to construct a more reliable surrogate of the ground-truth labels. FRAP aligns the prediction distribution of the foundation model with that of the base model by applying temperature-scaled calibration that minimizes their divergence. The aligned predictions are fused through confidence-based weighting into a refined reference distribution that integrates robustness from the foundation model and domain-specific expertise from the base model, and performance is estimated by measuring how closely the base model's predictions agree with this reference. Extensive experiments across diverse datasets and architectures show that FRAP provides consistent and substantial improvements over representative performance-estimation methods under distribution shift.
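
One way to read the three FRAP steps (align, fuse, score agreement) as code. This is a sketch under assumptions the abstract does not fix: the temperature is chosen by grid search to minimise the mean KL divergence from the base model, the fusion weights are each model's maximum class probability, and performance is estimated as top-1 agreement with the fused reference. None of the function names come from the paper.

```python
import numpy as np


def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)          # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def mean_kl(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """Average KL(p || q) over samples (rows)."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)))


def align_temperature(fm_logits, base_probs, grid=np.linspace(0.25, 4.0, 16)):
    """Pick the temperature that brings the foundation model closest to the base model."""
    kls = [mean_kl(base_probs, softmax(fm_logits, t)) for t in grid]
    return float(grid[int(np.argmin(kls))])


def fused_reference(fm_probs, base_probs):
    """Confidence-weighted mixture; weights = each model's max probability (assumption)."""
    w_fm = fm_probs.max(axis=1, keepdims=True)
    w_base = base_probs.max(axis=1, keepdims=True)
    return (w_fm * fm_probs + w_base * base_probs) / (w_fm + w_base)


def estimate_accuracy(base_logits, fm_logits) -> float:
    """Estimated accuracy = agreement of base predictions with the fused reference."""
    base_probs = softmax(base_logits)
    fm_probs = softmax(fm_logits, align_temperature(fm_logits, base_probs))
    reference = fused_reference(fm_probs, base_probs)
    return float(np.mean(base_probs.argmax(axis=1) == reference.argmax(axis=1)))
```

When the two models agree on every argmax the estimate is 1.0 regardless of calibration, so the signal comes entirely from samples where the (aligned) foundation model confidently disagrees with the base model.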

2606.06334 2026-06-05 cs.LG Version update

Quantifying the Privacy of Counterfactuals by Leveraging Membership Inference Attacks Against Synthetic Data

Maryam Babaei, Yingke Wang, Hadrien Lautraite, Heber H. Arcolezi, Ulrich Aivodji, Sebastien Gambs

Affiliations: ÉTS Montreal and Mila, Canada; UQAM, Canada; Inria Grenoble, France

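AI summary: Using membership inference attacks designed against synthetic data, shows that membership inference can succeed from counterfactuals alone, without access to the model, revealing privacy risks in releasing counterfactuals.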

Abstract

Counterfactuals are typically used in high-stakes decision areas to explain a machine learning model by showing how changes to a user's profile would result in the desired outcome. However, explaining the model's decisions through counterfactuals can also be exploited by an adversary to conduct privacy attacks against the model or its training data. Drawing on the analogy that counterfactuals, like synthetic data, provide realistic substitutes for real training data, we demonstrate in this paper how privacy attacks on counterfactuals can succeed by reusing attacks developed against synthetic data. More precisely, we investigate the effectiveness of membership inference attacks designed for synthetic data on various types of counterfactuals. Additionally, while existing membership inference attacks against counterfactuals usually require the ability to query the model, we show how successful membership inference attacks can be performed using only a set of counterfactuals, with no access to the model from which they were generated. Our results demonstrate that model developers should be more cautious when releasing counterfactuals to various users, as doing so can lead to privacy breaches.
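
Distance to closest record (DCR) is one of the simplest membership inference attacks used against synthetic data; the abstract does not say which attacks were evaluated, so this is an illustrative stand-in. It assumes numeric, comparably scaled features and a hypothetical distance threshold: a target whose profile lies unusually close to some released counterfactual is flagged as a likely training member.

```python
import numpy as np


def distance_to_closest_record(targets: np.ndarray, released: np.ndarray) -> np.ndarray:
    """Euclidean distance from each target profile to its nearest released counterfactual."""
    diffs = targets[:, None, :] - released[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1)).min(axis=1)


def dcr_membership_attack(targets: np.ndarray, released: np.ndarray,
                          threshold: float) -> np.ndarray:
    """Predict 'member' when a target lies closer than `threshold` to some counterfactual.

    Only the released counterfactuals are used; the model is never queried.
    """
    return distance_to_closest_record(targets, released) < threshold
```

In practice the threshold would be calibrated, for example on records known not to be members; that calibration step is outside this sketch.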

2606.06333 2026-06-05 cs.LG cs.AI Version update

Learning What to Forget: Improving LLM Unlearning via Learned Token-Level Importance

Gizem Yüce, Giorgos Nikolaou, Nicolas Flammarion

Affiliations: Theory of Machine Learning Lab, EPFL

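AI summary: Proposes Alternating Token-Weighted Unlearning (ATWU), a framework that jointly learns token forget-specificity and model parameters and achieves the best forget-retain trade-off without external supervision.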

Abstract

Machine unlearning aims to remove targeted knowledge from a trained model while preserving its general capabilities. For autoregressive language models, not all tokens in a forget sample are equally relevant to forgetting. Existing approaches either ignore this heterogeneity or rely on auxiliary models, heuristics, or external annotations to estimate each token's relevance for forgetting. We instead characterize it through the interaction with the retain objective: a token is forget-specific to the extent that minimizing the forget loss on that token does not conflict with retain optimality. We formalize this perspective as a joint optimization problem over the model parameters and the token weights and show that, under a natural separation condition, the resulting objective recovers the oracle forget-specific token support. Motivated by this formulation, we introduce Alternating Token-Weighted Unlearning (ATWU), a lightweight framework that jointly learns token forget-specificity and model parameters during unlearning using a simple linear scorer over the hidden states, without external token-level supervision. Across TOFU and RWKU, ATWU achieves state-of-the-art forget-retain trade-offs, outperforming sample-level methods, probability-based token-weighting heuristics, and auxiliary-model-based approaches. Moreover, the learned scores align substantially better with ground-truth forget-specific spans, indicating that ATWU identifies semantically meaningful token-level forgetting signals. Overall, our results suggest that retain conflict provides an effective criterion for identifying what language models should forget, enabling unsupervised learning of token-level forget-specificity directly from model representations with minimal computational overhead.
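
A sketch of the two ingredients the abstract names: a linear scorer over hidden states and a token-weighted forget loss. The sigmoid squashing and the weight normalisation are assumptions, and the alternating schedule (update the scorer with the model fixed, then the model with the weights fixed) is described here only in prose.

```python
import numpy as np


def token_weights(hidden: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Linear scorer over hidden states, squashed to (0, 1) with a sigmoid (assumption).

    hidden: (T, H) hidden states of the T tokens in one forget sample.
    """
    return 1.0 / (1.0 + np.exp(-(hidden @ w + b)))


def weighted_forget_loss(token_nll: np.ndarray, weights: np.ndarray) -> float:
    """Weight-normalised forget loss over tokens.

    An unlearning step would push this up (e.g. by gradient ascent, one common choice),
    so tokens with small weights are largely left alone.
    """
    return float(np.sum(weights * token_nll) / np.sum(weights))
```

With the scorer at zero every token gets weight 0.5 and the loss reduces to the plain sample-level average, i.e. the baseline that ignores token heterogeneity.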

2606.06314 2026-06-05 math.NA cs.LG cs.NA stat.ML Version update

DAS-PINNs for high-dimensional partial differential equations: extending deep adaptive sampling to spacetime domains

Anshima Singh, David J. Silvester

Affiliations: University of Manchester; Department of Mathematics

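AI summary: Proposes a normalising-flow-based deep adaptive sampling framework that treats spacetime as a single unified domain, automatically identifies high-residual regions from the residual distribution and places new collocation points there, efficiently solving high-dimensional time-dependent PDEs with localised, evolving features.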

Abstract

Time-dependent high-dimensional partial differential equations (PDEs) with spatially localised and dynamically evolving solutions pose a fundamental challenge for physics-informed neural networks (PINNs), as uniform collocation sampling becomes increasingly ineffective in high-dimensional spatiotemporal domains. In this work, a deep adaptive sampling framework for PINNs is extended to the time-dependent setting by treating space and time as a unified domain without any explicit time marching. A normalising flow neural network model effectively learns the distribution induced by the PDE residual and generates new collocation points concentrated in regions where the solution is most difficult to learn. Unlike conventional adaptive strategies that require explicit time stepping or moving meshes, high-residual regions are automatically identified and tracked across both space and time, driven purely by the PDE residual distribution. The effectiveness of the proposed strategy is assessed on a range of benchmark problems, from sharp and moving features in two spatial dimensions to localised structures in up to eight spatial dimensions.
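
The core loop can be sketched as: evaluate the PDE residual on spacetime points, then draw new collocation points where the residual is large. As a simplification, this sketch replaces the normalising flow with importance resampling from a uniform candidate pool in proportion to the squared residual; the toy residual function is hypothetical.

```python
import numpy as np


def residual_resample(residual_fn, n_candidates, n_new, dim, rng):
    """Draw collocation points in [0, 1]^dim with probability proportional to residual^2.

    Space and time form one domain: the last coordinate plays the role of t,
    so no time marching is needed. This is a cheap stand-in for the normalising
    flow that DAS fits to the same residual-induced density.
    """
    candidates = rng.uniform(0.0, 1.0, size=(n_candidates, dim))
    r2 = residual_fn(candidates) ** 2
    probs = r2 / r2.sum()
    idx = rng.choice(n_candidates, size=n_new, replace=True, p=probs)
    return candidates[idx]


def toy_residual(points):
    """Hypothetical residual peaked at (x, t) = (0.5, 0.5), e.g. a sharp local feature."""
    return np.exp(-np.sum((points - 0.5) ** 2, axis=1) / (2 * 0.05 ** 2))


rng = np.random.default_rng(0)
new_points = residual_resample(toy_residual, 20_000, 2_000, dim=2, rng=rng)
```

A normalising flow plays the same role but yields a continuous density that can be sampled without a fixed candidate pool, which matters in the high-dimensional cases the abstract targets.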

2606.06313 2026-06-05 physics.flu-dyn cs.LG physics.comp-ph Version update

PAC-Bayesian Adversarially Robust Generalization of Message Passing Graph Neural Networks: A Sensitivity Analysis

Ziling Liang, Xinping Yi, Qingsong Wen, Shi Jin

Affiliations: School of Information Science and Engineering, Southeast University; Squirrel Ai Learning

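AI summary: Using a sensitivity-aware PAC-Bayesian framework, the rank constraint on output Jacobians, and anisotropic Gaussian posteriors, derives tighter adversarially robust generalization bounds for message passing graph neural networks.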

Abstract

Whilst the vulnerability of graph neural networks (GNNs) to adversarial attacks poses a critical threat to graph representation learning, understanding their robust generalization behavior in the adversarial setting remains a fundamental challenge. Recently, PAC-Bayesian margin-based generalization analysis has substantially advanced this line of research by providing a flexible, data-dependent analytical framework. However, existing robust analyses often rely on isotropic Gaussian posteriors and control weight perturbations in the full parameter space, which limits their ability to capture heterogeneous parameter sensitivity and makes them depend on hidden-width-dependent complexity terms, resulting in loose generalization bounds. In this paper, we extend a recently proposed sensitivity-aware PAC-Bayesian framework from deep neural networks to message passing GNNs (MPGNNs) and derive a tighter robust generalization bound in the adversarial setting. Specifically, we first quantify how sensitive the network outputs are to perturbations in different parameter blocks by deriving the output Jacobians with respect to the weight parameters. Exploiting the fact that these Jacobian matrices have rank at most $K$ in $K$-class graph classification, we then construct Jacobian-aligned sensitivity matrices and use anisotropic Gaussian posteriors with optimized covariances to tightly upper bound the KL divergence. Notably, by refining the spectral-norm dependence on the learned weights and reducing the leading dimension factor from hidden-width-dependent terms to the number of classes $K$, our analysis yields much tighter robust generalization guarantees for MPGNNs, thereby guiding their design toward enhanced adversarial robustness.
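
The KL term is where anisotropic posteriors help. A minimal illustration, assuming a diagonal Gaussian posterior, a prior $\mathcal{N}(0, \sigma_0^2 I)$ and axis-aligned sensitive directions (in the paper they come from Jacobian-aligned sensitivity matrices, which need not be axis-aligned). The closed form is $\mathrm{KL} = \tfrac12\sum_i\big(\sigma_i^2/\sigma_0^2 + \mu_i^2/\sigma_0^2 - 1 - \ln(\sigma_i^2/\sigma_0^2)\big)$; all numbers below are hypothetical.

```python
import numpy as np


def kl_diag_gaussian_to_isotropic(mu, post_var, prior_var):
    """KL( N(mu, diag(post_var)) || N(0, prior_var * I) ) in closed form."""
    ratio = post_var / prior_var
    return 0.5 * float(np.sum(ratio + mu ** 2 / prior_var - 1.0 - np.log(ratio)))


d, k = 1000, 10                      # parameters, sensitive directions (~ number of classes K)
mu = np.full(d, 0.1)                 # posterior mean (learned weights), illustrative
tight = 1e-4                         # variance tolerated along sensitive directions (assumption)
iso_var = np.full(d, tight)          # isotropic: must be tight in every direction
aniso_var = np.ones(d)               # anisotropic: tight only where outputs are sensitive
aniso_var[:k] = tight
kl_iso = kl_diag_gaussian_to_isotropic(mu, iso_var, prior_var=1.0)
kl_aniso = kl_diag_gaussian_to_isotropic(mu, aniso_var, prior_var=1.0)
```

Because only about $K$ directions must be kept tight, the variance part of the KL cost scales with $K$ rather than with the full parameter count, which is the same shift, in spirit, as replacing hidden-width-dependent terms with $K$.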

2606.06288 2026-06-05 stat.ML cs.LG Version update

Discrete Causal Representations from Heterogeneous Domains: A Bayesian Approach with Social Survey Applications

Ankur Garg, Michael Stettler, Aaron Schein, Julius von Kügelgen

Affiliations: Department of Statistics, University of Chicago; University of Tübingen; Department of Statistics & Data Science Institute, University of Chicago; Seminar for Statistics, ETH Zürich

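AI summary: Proposes a Bayesian method for learning discrete causal concepts from multi-environment data, approximates the multimodal posterior with sequential Monte Carlo sampling, and shows on social survey data that it infers meaningful high-level concepts and causal relations.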

Abstract

Causal representation learning aims to infer the high-level latent causal concepts that give rise to observed low-level measurements. This is particularly relevant for heterogeneous data from different environments or domains since distribution shifts often arise through sparse, localized changes in some of the underlying causal mechanisms, while other parts of the generative process remain unchanged. Whereas identifiability of causal representations has been studied extensively, practical uncertainty-aware methods and real-world use cases remain less explored. In this work, we propose a Bayesian approach to learning causal representations from multi-environment data, focusing on the case of discrete causal concepts and unknown multi-node soft interventions. To this end, we translate causal assumptions and interpretability desiderata into suitable priors and parametric choices within a hierarchical model. We then devise an inference scheme based on sequential Monte Carlo sampling to approximate the resulting multimodal posterior. We showcase our approach through case studies on social survey data, where latent causal concepts correspond to cultural values or political opinions, measurements to survey responses, and environments to different countries or states. Our model infers meaningful high-level concepts and plausible causal relations among them, demonstrating its utility for learning causal representations of complex real-world data.
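
SMC samplers alternate reweighting, resampling and rejuvenation moves; the abstract does not describe the paper's particular scheme. The sketch below shows only the generic resampling step, systematic resampling with an effective-sample-size (ESS) diagnostic, a standard building block rather than the authors' sampler.

```python
import numpy as np


def effective_sample_size(weights: np.ndarray) -> float:
    """ESS = 1 / sum(w_i^2) for normalised importance weights."""
    w = weights / weights.sum()
    return float(1.0 / np.sum(w ** 2))


def systematic_resample(weights: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return particle indices; one uniform draw shared by N evenly spaced positions."""
    n = len(weights)
    cumulative = np.cumsum(weights / weights.sum())
    cumulative[-1] = 1.0                       # guard against round-off
    positions = (rng.uniform() + np.arange(n)) / n
    return np.searchsorted(cumulative, positions, side="right")
```

A common rule (an assumption here) resamples whenever the ESS drops below half the number of particles; in a multimodal posterior this keeps particles spread across modes only if the preceding moves and tempering schedule already placed them there.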

2606.06272 2026-06-05 cs.LG cs.AI Version update

Your GFlowNet Secretly Learns an Optimal Transport Plan

Ian Maksimov, Nikita Morozov, Denis Belomestny, Sergey Samsonov

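AI summary: Establishes a theoretical connection between non-acyclic generative flow networks and optimal transport, proving that the policy learned by a minimum-flow GFlowNet encodes an optimal transport plan from the source distribution to the target distribution.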

Comments ICML 2026 SPIGM Workshop

2606.06249 2026-06-05 cs.CV cs.LG Version update

Benedikt Seiter, Anya Fries, Julius von Kügelgen, Jonas Peters

Affiliations: ETH Zürich

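AI summary: For multi-domain data, proposes Anchor PCA, which performs principal component analysis on a modified target matrix, trading off overall explained variance against preserving shared directions of variation to achieve robust dimension reduction.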

Abstract

Principal component analysis (PCA) is one of the most widely used unsupervised dimension reduction techniques. We study PCA for data from multiple related domains. Since principal components generally differ across domains, one way to obtain a shared low-rank embedding is to perform PCA on the pooled data. However, this approach can focus on spurious directions that exhibit high variation in only a few domains. To find a robust embedding that still explains most variance in unseen but similar domains, we instead propose to focus on shared directions of variation. To this end, we introduce Anchor PCA, which trades off overall explained variance against agreement between the shared and domain-specific low-rank embeddings. Anchor PCA amounts to PCA on a modified target matrix and can thus be solved efficiently. Moreover, we show that Anchor PCA recovers a maximal invariant subspace and admits a minimax reconstruction interpretation under bounded domain-specific covariance inflations. On simulated data and on real-world gas sensor data with temporal drift, we demonstrate, respectively, that Anchor PCA recovers the maximal invariant subspace and that it yields embeddings that explain more variance on unseen domains than the pooling baseline and a worst-case alternative. Taken together, these findings establish Anchor PCA as a promising approach to robust unsupervised dimension reduction from multi-domain data.
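
The failure mode of pooling described above can be reproduced with three hand-picked covariances. This does not implement Anchor PCA, whose modified target matrix the abstract does not give; it assumes zero-mean domains of equal size, so that the pooled covariance is simply the average of the domain covariances. All values are illustrative.

```python
import numpy as np

# All three domains share variance 2 along e1; domain 3 alone inflates e3 to 10.
shared = np.diag([2.0, 1.0, 1.0])
domain_covs = [shared, shared, np.diag([2.0, 1.0, 10.0])]

pooled = np.mean(domain_covs, axis=0)          # covariance of pooled, zero-mean data
eigvals, eigvecs = np.linalg.eigh(pooled)      # eigenvalues in ascending order
top_pc = eigvecs[:, -1]                        # leading principal component of pooled PCA


def explained_variance(direction: np.ndarray, cov: np.ndarray) -> float:
    """Variance captured by a unit direction under covariance `cov`."""
    return float(direction @ cov @ direction)


unseen = shared                                # a new domain resembling the typical ones
e1 = np.array([1.0, 0.0, 0.0])                 # the direction shared by all domains
print(top_pc, explained_variance(top_pc, unseen), explained_variance(e1, unseen))
```

Pooling ranks the direction inflated in a single domain first, although the direction shared by all domains explains twice as much variance in the unseen domain.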

2606.06225 2026-06-05 cs.IR cs.AI cs.LG Version update

Bridging the Semantic-Collaborative Gap: An Asymmetric Graph Architecture for Cold-Start Item Recommendation

Anh Truong, John Trenkle, Yuanbo Chen, Honghong Zhao, Abdullah Alchihabi, Effy Fang, Michael Tamir

Affiliations: Tubi; Kumo AI

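AI summary: Proposes Shallow-RHS, an asymmetric link-prediction architecture in which the left-hand-side device tower captures collaborative signals via message passing over temporal watch histories, while the right-hand-side content tower encodes content from intrinsic features only, addressing cold-start item recommendation as an inductive graph-completion problem.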

Abstract

Collaborative filtering and graph-based recommendation models are highly effective because they leverage observed user interactions, but this dependence creates a fundamental cold-start challenge when newly added content has no interaction history. In Tubi's production retrieval system, this challenge is further constrained by the serving interface: new content must be assigned a standalone embedding immediately, and the model must also produce device embeddings suitable for approximate nearest-neighbor retrieval. We address this setting by formulating cold-start recommendation as an inductive graph-completion problem on a temporal bipartite device-content graph. We propose Shallow-RHS, an asymmetric link-prediction architecture in which the left-hand side (LHS) device tower leverages temporally valid watch-history message passing to capture collaborative signals, while the right-hand side (RHS) content tower is intentionally shallow with respect to the graph and encodes content solely from intrinsic features. The RHS tower does not use ID-based embeddings, content-side subgraphs, neighbor aggregation, or interaction-derived representations, forcing the content encoder to map intrinsic features into a collaborative-filtering-aware embedding space. After training, the learned content encoder generates embeddings for both warm and newly ingested content, enabling implicit graph completion through retrieval of warm surrogate neighbors. We further extend the same representation-completion principle to device cold-start by constructing cohort-based embeddings from demographic features. Large-scale online experiments demonstrate consistent relative improvements in content cold-start engagement, promotion speed, impression acquisition, and device cold-start engagement.
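
A minimal sketch of the serving-side asymmetry: the content tower sees only intrinsic features (no item-ID table), so a newly ingested title is embedded immediately and retrieved by inner-product search. The linear encoder, the dimensions and brute-force search (standing in for an ANN index) are assumptions, the GNN device tower is not shown, and `cohort_embedding` is a hypothetical reading of the cohort-based device cold-start.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


class ContentTower:
    """Feature-only item encoder: no item-ID embeddings, so new titles embed at once.

    A single random linear map stands in for the trained encoder (hypothetical).
    """

    def __init__(self, n_features: int, dim: int, seed: int = 0):
        self.weight = np.random.default_rng(seed).normal(size=(n_features, dim))

    def __call__(self, item_features: np.ndarray) -> np.ndarray:
        return l2_normalize(item_features @ self.weight)


def top_k(device_embedding: np.ndarray, item_embeddings: np.ndarray, k: int) -> np.ndarray:
    """Brute-force inner-product search; production would use an ANN index instead."""
    scores = item_embeddings @ device_embedding
    return np.argsort(-scores)[:k]


def cohort_embedding(device_embeddings: np.ndarray) -> np.ndarray:
    """Cold-start device: average the embeddings of devices in its demographic cohort."""
    return l2_normalize(device_embeddings.mean(axis=0))
```

Because nothing in `ContentTower` depends on interaction history, warm and new items go through exactly the same code path, which is what makes the implicit graph completion possible at serving time.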

2606.06207 2026-06-05 cs.AI cs.LG Version update

Unsupervised Pattern Analysis in Japanese Veterinary Toxicology: A Regulatory-Compliant Framework for Cross-Species Risk Assessment

Yukiko Kawakami, Mohammad Shirazi, Ryo Shimizuwa, Saito Shinoda, Alireza Mortazavi, Matsumoto Kawahara

Affiliations: Graduate School of Information Sciences, Tohoku University

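AI summary: Proposes a regulation-integrated unsupervised framework that clusters adverse drug events from the NVAL database and identifies biologically meaningful cross-species toxicity patterns.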

Comments Submitted to IEEE Transactions on Biomedical Engineering

Abstract

Veterinary pharmacovigilance systems are essential for monitoring adverse drug events (ADEs), yet existing approaches often fail to capture region-specific toxicity patterns shaped by local biological and regulatory contexts. In Japan, these challenges are amplified by species-specific metabolic differences and reporting practices defined by the Ministry of Agriculture, Forestry, and Fisheries (MAFF). Most prior work relies on prediction-oriented models, limiting mechanistic interpretability. This study proposes a regulatory-integrated unsupervised framework for pattern discovery using the National Veterinary Assay Laboratory (NVAL) database. ADEs are encoded into organ system-aligned representations and adjusted for species-specific reporting biases, enabling cross-species comparison. Similarity-based clustering and dimensionality reduction are applied to identify latent toxicity structures. Analysis of 4,120 high-confidence ADE reports (9,080 drug-ADE combinations) identified three significant species clusters (p < 0.01), including hepatic-dominant patterns in companion animals (0.42 $\pm$ 0.06), renal toxicity in ruminants (0.39 $\pm$ 0.07), and dermatological sensitivity in sheep (0.35 $\pm$ 0.07). Drug-level clustering achieved 83% alignment with pharmacological classes, while cosine similarity outperformed alternative metrics (silhouette score: 0.48; cluster precision: 87%). Regulatory validation showed strong agreement with established classifications. These findings demonstrate that regulation-aligned unsupervised analysis can uncover biologically meaningful, region-specific toxicity patterns, providing an interpretable and scalable framework for veterinary drug safety assessment.

2606.06205 2026-06-05 cs.LG Version update

Non-Negative Matrix Factorization for Event Data

Raphaël Romero

Affiliations: Ghent University

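AI summary: Proposes EventNMF, a continuous-time non-negative matrix factorization method that models event times directly, factorizing the intensity function through a non-negative B-spline basis to avoid the information loss of binning, and proves that standard binned approaches are a special case.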

Abstract

Continuous-time event data, in which entities emit instantaneous events over time, arises naturally across many domains such as neuroscience, seismology, and social networks. Non-negative matrix factorization (NMF) is a natural tool to uncover interpretable structure in such data, but it has so far only been applied after binning or smoothing the entity-level counting measures. This preprocessing step comes with the risk of erasing entity-level heterogeneities and fine-grained temporal features. In this paper, we introduce EventNMF, a continuous-time non-negative factorization model that operates directly on event times: each entity's events are modeled as a Poisson process whose intensity factorizes through a non-negative B-spline basis, and a simple estimation procedure recovers interpretable temporal templates shared across entities. The resulting method is mathematically principled, easy to implement, and computationally efficient. We further show that standard binned-count approaches arise as the special case of degree-zero splines, explore bias-variance tradeoffs and compare against existing methods on a synthetic latent factor model, and demonstrate the effectiveness of EventNMF on several real-world applications.
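
The degree-zero special case can be made concrete: bin each entity's event times, then fit a Poisson (generalised KL) NMF with the classic Lee-Seung multiplicative updates. This sketch covers only that binned baseline, not EventNMF's higher-degree B-spline intensities on continuous time; sizes and data are synthetic.

```python
import numpy as np


def binned_counts(event_times_per_entity, edges):
    """Degree-zero-spline view: count each entity's events per bin -> (entities x bins)."""
    return np.array([np.histogram(times, bins=edges)[0] for times in event_times_per_entity],
                    dtype=float)


def poisson_divergence(V, WH, eps=1e-12):
    """Generalised KL divergence D(V || WH): Poisson negative log-likelihood up to constants."""
    ratio_term = np.where(V > 0, V * np.log((V + eps) / (WH + eps)), 0.0)
    return float(np.sum(ratio_term - V + WH))


def kl_nmf(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Lee-Seung multiplicative updates for V ~ W @ H under the Poisson/KL loss."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(0.1, 1.0, size=(V.shape[0], rank))   # entity loadings
    H = rng.uniform(0.1, 1.0, size=(rank, V.shape[1]))   # temporal templates (per bin)
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
        W *= ((V / (W @ H + eps)) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H
```

The multiplicative form keeps `W` and `H` non-negative automatically, and each row of `H` plays the role of a piecewise-constant temporal template; EventNMF replaces these bins with smoother spline bases.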

2606.06196 2026-06-05 cs.LG Version update

A Machine Learning-Based Framework for Discovering Huntington's Disease Stages: Integrating Graph Representation Learning and Clustering to Uncover Progression Dynamics in Longitudinal Enroll-HD Dataset

Lubna M. Abu Zohair, Marta Vallejo, MD Azher Uddin, John R. Woodward, Hind Zantout

Affiliations: School of Mathematical and Computer Sciences, Heriot-Watt University

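AI summary: Proposes an unsupervised framework based on dynamic graph representation learning and K-means++ clustering that identifies four meaningful progression stages of Huntington's disease from longitudinal clinical data and verifies their robustness through stability analysis.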

Comments Accepted for publication in the Proceedings of the 10th International Conference on Medical and Health Informatics (ICMHI 2026), Association for Computing Machinery (ACM)

Abstract

Huntington's disease (HD) is a progressive brain disorder that gradually affects movement, cognitive function, and behavior. Identifying the stage of the disease accurately and consistently is important for understanding its course, grouping patients, personalizing care, and discovering treatments. Existing clinical staging frameworks rely primarily on predefined clinical measurement thresholds and clinical expert decisions, yet these discrete cut-offs may obscure meaningful intra-stage variability and remain vulnerable to inter-rater differences, especially in motor and functional assessments. To address these limitations, we developed an unsupervised machine learning framework based on dynamic graph representation learning to capture temporal relationships within and across patients from longitudinal clinical measurements. Using the learned representations, we applied K-means++ clustering to identify well-separated groups. We then iteratively increased the number of clusters (k), using stability analysis to assess robustness and reveal additional meaningful clusters beyond the initial optimal solution. We applied the framework to 302 individuals from the Enroll-HD cohort (1,477 visits, 44 clinical variables per visit; 80% manifest participants), enabling data-driven discovery of HD stages reflecting natural clinical progression. Despite the limited cohort size, the proposed framework achieved robust clustering performance using a four-dimensional latent space, identifying four meaningful and statistically distinct disease stages through clustering stability analysis. Each stage corresponded to well-defined clinical measurement boundaries, with minimal overlap compared to previously established clinical staging methods.
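
A sketch of the clustering stage only: k-means++ seeding followed by Lloyd iterations on learned embeddings. The dynamic graph encoder is not shown, the synthetic 4-D embeddings used in the check are hypothetical, and the stability analysis would repeat this over seeds and values of k and compare the resulting partitions.

```python
import numpy as np


def kmeans_plus_plus_init(X, k, rng):
    """k-means++ seeding: each new centre is drawn with probability proportional to D(x)^2."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd iterations from a k-means++ start; returns (labels, centres)."""
    rng = np.random.default_rng(seed)
    centers = kmeans_plus_plus_init(X, k, rng)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Keep the old centre if a cluster empties out.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

The D(x)^2 seeding makes it unlikely that two initial centres land in the same well-separated group, which is why k-means++ is a sensible default before any stability analysis over k.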

2606.06179 2026-06-05 stat.ML cs.LG Version update

Diffusion Models Observe Only Gradients: A Geometric Perspective on Score Matching Errors

Naïl B. Khelifa, Richard E. Turner, Ramji Venkataramanan

Affiliations: University of Cambridge

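AI summary: Argues from a geometric perspective that the $L^2$ score error is not the right measure of marginal distribution quality. A Helmholtz-Hodge decomposition splits the score error into gradient and solenoidal components, and only the gradient component affects the Fokker-Planck dynamics; the paper gives a KL-divergence upper bound that depends only on the gradient component, together with a tractable estimator of that component.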

Abstract

Score-based diffusion models are typically trained by minimizing the $L^2$ score matching error, and standard theoretical analyses rely on this quantity to bound the sampling discrepancy between the learned and target distributions. We show the $L^2$ score error is not the right intrinsic measure of marginal distributional quality: a learned diffusion model can incur arbitrarily large $L^2$ score error while perfectly matching the target distribution. By decomposing score errors into a gradient and a solenoidal component (a Helmholtz-Hodge decomposition), we identify the geometric reason behind this: only the gradient component enters the marginal Fokker-Planck dynamics, while the solenoidal component is structurally invisible. We make this precise in three results. First, building on the corrected geometry, we prove an impossibility result: no monotone function of the $L^2$ score error can uniformly lower bound any divergence between the learned and target distributions. Second, we derive an upper bound on the Kullback-Leibler divergence that depends only on the observable gradient component of the error, tightening the standard Girsanov bound and identifying its looseness as the cost of operating on path-space rather than marginal-space dynamics. Third, we give a tractable estimator of the gradient component via a dual Sobolev identity, which is shown to empirically correlate substantially better with sample quality than the full $L^2$ error.
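A small numerical check makes the solenoidal-invisibility argument concrete. The setup is our own illustrative assumption, not an example from the paper: take a standard 2-D Gaussian target p and a hypothetical score error e(x, y) = c(-y, x). A score error enters the marginal Fokker-Planck equation only through div(p e), and for this field div(p e) = c·xy·p - c·xy·p = 0 everywhere, while the squared L^2 error E_p|e|^2 = 2c^2 grows without bound in c.

```python
import math


def gaussian_density(x, y):
    """Standard 2-D Gaussian density p(x, y)."""
    return math.exp(-(x * x + y * y) / 2.0) / (2.0 * math.pi)


def rotational_error(x, y, c):
    """Hypothetical score error e(x, y) = c * (-y, x), a pure rotation field."""
    return -c * y, c * x


def weighted_divergence(x, y, c, h=1e-5):
    """Central-difference estimate of div(p * e), the term through which a
    score error enters the marginal Fokker-Planck equation."""
    def flux(u, v):
        p = gaussian_density(u, v)
        ex, ey = rotational_error(u, v, c)
        return p * ex, p * ey

    d_dx = (flux(x + h, y)[0] - flux(x - h, y)[0]) / (2.0 * h)
    d_dy = (flux(x, y + h)[1] - flux(x, y - h)[1]) / (2.0 * h)
    return d_dx + d_dy


def l2_error_sq(c):
    """E_p[|e|^2] = c^2 * E[x^2 + y^2] = 2 c^2 under the standard Gaussian."""
    return 2.0 * c * c
```

A standard Gaussian target is stationary under the variance-preserving Ornstein-Uhlenbeck forward process, so in this toy case the same rotational error is invisible to the marginals at every noise level, however large c is.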

2606.06178 2026-06-05 cs.LG cs.AI cs.CL Updated

Learning to Route LLMs from Implicit Cost-Performance Preferences via Meta-Learning

Jiahao Zeng, Ming Tang, Ningning Ding

Affiliations: Hong Kong University of Science and Technology (Guangzhou); Southern University of Science and Technology

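AI summary: Proposes MetaRouter, a framework that uses meta-learning to learn users' implicit cost-performance preferences from a small number of interactions, enabling personalized LLM routing that outperforms baseline methods on both in-distribution and out-of-distribution tasks.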

Abstract

Large language models (LLMs) present a trade-off between performance and cost, where more powerful models incur greater expense. LLM routing aims to reduce expenses while maintaining performance by sending each query to the most suitable model. However, existing methods do not adapt well to users with differing cost-performance preferences. To address this gap, we introduce a novel perceptive LLM routing paradigm for personalized, user-centric cost-performance optimization, which efficiently learns users' implicit preferences from only a few interactions. To handle heterogeneous user needs, we formulate preference profiles as a set of distinct tasks in a contextual bandit and propose MetaRouter, a meta-learning framework designed for preference-aware LLM routing. Experimental results show that MetaRouter outperforms strong baselines on both in-distribution and out-of-distribution tasks. Furthermore, it exhibits high efficiency in learning user preferences, robustness to changes in the routable LLMs, and scalability to multi-model routing.

2606.06174 2026-06-05 cs.LG stat.AP Updated

Learning to model pediatric asthma exacerbation from multiple risk factors: a case study in coastal Virginia

Jonathan Colen, Eric Werner, Maryam Golbazi, Heather Richter, Diana McSpadden, Amy Quinn, Jocel Santos, Mary Jane Darling, Mary Margaret Gleason

Affiliations: Joint Institute on Advanced Computing for Energy and Science; Old Dominion University; School of Data Science; Eastern Virginia Medical School; Macon & Joan Brock Virginia Health Sciences; Children’s Hospital of the King’s Daughters; Children’s Specialty Group; Institute for Coastal Adaptation and Resilience; Chief Data Office; Thomas Jefferson National Accelerator Facility; Department of Psychiatry; Boston Children’s Hospital

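AI summary: By comparing generalized linear models, neural networks, and a sparse dictionary learning framework, this study models the relationship between pediatric asthma exacerbations in coastal Virginia and air pollution, meteorological, and neighborhood socioeconomic factors, and identifies key risk factors.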

Comments 22 pages, 6 figures (5 supplemental)

Abstract

Childhood asthma is a common illness exacerbated by air pollution as well as meteorological and neighborhood-level socioeconomic factors. Modeling asthma exacerbation (AE) in large spatiotemporal datasets requires disentangling impacts from multiple contributors. In this case study, we compared three techniques that balance predictive power with interpretability to predict AE in Hampton Roads, a coastal Virginia region comprising 7 cities and over 1.5 million people. After collating ambient air pollution measurements, weather data, and measures of neighborhood opportunity, we modeled zip code-level acute AE visits to a regional children's hospital and affiliated providers from 2018 to 2023. Generalized linear models (GLM) provided a baseline, while neural networks (NN) served as a maximally predictive target. To bridge statistical models and deep learning, we developed a framework based on sparse dictionary learning to identify and interpret parsimonious nonlinear interacting equations. After comparing each model's predictive performance, we estimated relative risks for AE due to input exposure variables and found consensus across frameworks. Our work links statistical and interpretable machine learning models to highlight possible synergistic interactions influencing AE, and may enable future studies to guide public health interventions in coastal Virginia.
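The relative-risk estimates mentioned above come from exponentiated coefficients of a log-link count model. As a minimal sketch, assuming a single exposure variable and no offsets, lags, or spatial terms (none of which the abstract specifies), the Python below fits a Poisson GLM by Newton-Raphson and converts the slope b1 into a relative risk exp(b1 * delta) for a delta-unit increase in exposure. Function names are hypothetical.

```python
import math


def fit_poisson_glm(xs, ys, iters=50, tol=1e-12):
    """Fit log(mu_i) = b0 + b1 * x_i by Newton-Raphson on the Poisson log-likelihood."""
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0  # start from the intercept-only fit
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * x) for x in xs]
        # Score (gradient of the log-likelihood).
        g0 = sum(y - m for y, m in zip(ys, mu))
        g1 = sum((y - m) * x for x, y, m in zip(xs, ys, mu))
        # Fisher information; for the canonical log link it equals the negative Hessian.
        h00 = sum(mu)
        h01 = sum(m * x for x, m in zip(xs, mu))
        h11 = sum(m * x * x for x, m in zip(xs, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system [h00 h01; h01 h11] d = g.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def relative_risk(b1, delta=1.0):
    """Multiplicative change in expected visits for a `delta`-unit exposure increase."""
    return math.exp(b1 * delta)
```

Because the model is multiplicative on the rate scale, the same b1 gives the relative risk for any baseline level of the exposure; interactions, such as those the sparse dictionary framework searches for, would make it depend on other covariates.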

2606.06171 2026-06-05 stat.ML cs.LG cs.NA math.NA physics.comp-ph Updated

Effective Dimensionality as an Operator Invariant for Physics-Preserving Constraint Adaptation in Physics-Informed Neural Networks

Cornelius Otchere, Michael Shields

Affiliation: University of Cambridge

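AI summary: Uses the Fisher information matrix to define the effective degrees of freedom d_eff of a physics-constrained model, proves that d_eff converges to the dimension of the differential operator's kernel, and builds on this result a subspace-projection strategy for boundary-condition adaptation.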

Abstract

Physics-Informed Neural Networks inherently suffer from task interference because they rely on a shared parameter space to satisfy both governing differential equations and boundary conditions. We analyze this structural conflict using the Fisher Information Matrix to quantify the effective degrees of freedom ($d_{eff}$) in a physics-constrained model. Unlike the classical $d_{eff}$, which measures how many parameter directions are informed by data against a statistical prior, our $d_{eff}$ measures the dimension of the parameter directions unconstrained by the differential operator. For operators with a finite-dimensional kernel, we show that $d_{eff}$ converges to the kernel dimension exactly, independent of network width, depth, or activation function, recasting it from a fit diagnostic into a structural invariant of the underlying continuous operator. For operators with an infinite-dimensional kernel, $d_{eff}$ instead measures the network's finite-dimensional representational bandwidth for that kernel rather than recovering an integer invariant. Importantly, $d_{eff}$ also serves as an a priori structural diagnostic. Driving $d_{eff}$ of a well-posed problem to zero certifies that the physics and boundary constraints have absorbed the network's free directions. Building on this characterization, we introduce subspace projection strategies for boundary adaptation. Rather than retraining from scratch, we project parameter updates into the null space of the pre-trained physics operator so that new boundary conditions are satisfied without disturbing the learned physics. Gradient-based fine-tuning can match or exceed this but needs more wall-clock time and tuning, whereas subspace projection delivers near-equivalent quality in seconds to minutes. We validate on linear and nonlinear operators, demonstrating accurate adaptation to initial and boundary shifts and to unencountered constraint types.
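The projection idea can be illustrated with a first-order sketch. Assume (our assumption, not the paper's exact construction) that the pre-trained physics constraints are summarized by the gradients of the PDE residuals with respect to the network parameters, stacked as the rows of a Jacobian. Removing from a boundary-driven update every component in the span of those rows leaves the physics residuals unchanged to first order. The `effective_dof` helper is only a crude rank count (parameters minus Jacobian rank), standing in for the Fisher-based d_eff; all names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthonormal_rows(rows, tol=1e-10):
    """Modified Gram-Schmidt: an orthonormal basis for the span of `rows`,
    skipping (numerically) dependent rows."""
    basis = []
    for r in rows:
        v = [float(x) for x in r]
        for q in basis:
            c = dot(q, v)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = dot(v, v) ** 0.5
        if norm > tol:
            basis.append([vi / norm for vi in v])
    return basis


def project_to_null_space(update, jacobian_rows):
    """Remove from `update` every component along the constraint gradients, so
    the constrained residuals are unchanged to first order."""
    u = [float(x) for x in update]
    for q in orthonormal_rows(jacobian_rows):
        c = dot(q, u)
        u = [ui - c * qi for ui, qi in zip(u, q)]
    return u


def effective_dof(jacobian_rows, n_params):
    """Crude proxy: number of parameter directions left unconstrained (n - rank)."""
    return n_params - len(orthonormal_rows(jacobian_rows))
```

For example, with constraint rows [1, 0, 1] and [0, 1, 1] in a 3-parameter model, only the direction (1, 1, -1) is free, so an update [3, 0, 0] is reduced to its component (1, 1, -1) along that direction.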

2606.06164 2026-06-05 cs.LG physics.comp-ph Updated

Learning to Theorize the World from Observations

Doojin Baek, Gyubin Lee, Junyeob Baek, Hosung Lee, Sungjin Ahn

Affiliation: University of Washington

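AI summary: Inspired by cognitive science, proposes the Learning-to-Theorize paradigm, in which the Neural Theorizer (NEO) model infers explicit explanatory theories from raw, non-textual observations, enabling explanation-driven generalization.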

Abstract

What does it mean to understand the world? Contemporary world models often operationalize understanding as accurate future prediction in latent or observation space. Developmental cognitive science, however, suggests a different view: human understanding emerges through the construction of internal theories of how the world works, even before mature language is acquired. Inspired by this theory-building view of cognition, we introduce Learning-to-Theorize, a learning paradigm for inferring explicit explanatory theories of the world from raw, non-textual observations. We instantiate this paradigm with the Neural Theorizer (NEO), a probabilistic neural model that induces latent programs as a learned Language of Thought and executes them through a shared transition model. In NEO, a theory is represented as an executable, compositional program whose learned primitives can be systematically recombined to explain novel phenomena. Experiments show that this formulation enables explanation-driven generalization, allowing observations to be understood in terms of the programs that generate them.

2606.06104 2026-06-05 cs.LG Updated

A Sliced-Wasserstein Framework on Correlation Matrices for EEG Decoding

Chen Hu, Rui Wang, Jiale Zhou, Jingjun Yi, Shaocheng Jin, Yidong Song, Yefeng Zheng

Affiliations: Westlake University; School of Artificial Intelligence and Computer Science; Jiangnan University; Sun Yat-sen University

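AI summary: Proposes a Sliced-Wasserstein framework based on pullback Euclidean metrics, instantiates two Sliced-Wasserstein discrepancies on correlation matrices, and builds a domain-generalization method for EEG decoding; experiments on three datasets show improved generalization under distribution shift.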

Comments Accepted by KDD 2026

DOI: 10.1145/3770855.3818864

Abstract

Electroencephalography (EEG) offers noninvasive, millisecond-resolution recordings of neuronal activity and is widely used in neuroscience and healthcare. Many EEG decoding pipelines rely on covariance descriptors for their robustness to noise, but such representations are sensitive to channel-wise scaling. Recent studies have therefore advocated full-rank correlation matrices as a scale-invariant alternative for EEG decoding. In this paper, we propose a general framework for Sliced Wasserstein (SW) discrepancies on manifolds endowed with Pullback Euclidean Metrics (PEMs), termed Pullback Euclidean Metric Sliced Wasserstein (PEMSW). Within this framework, we instantiate two Correlation Sliced-Wasserstein (CorSW) discrepancies on the manifold of full-rank correlation matrices under two recently introduced correlation geometries, \textit{i.e.}, the Off-Log Metric (OLM) and Log-Scaled Metric (LSM). Building on CorSW, we further develop a domain generalization (DG) framework for EEG decoding. Experiments on three EEG datasets demonstrate improved generalization under distribution shifts, with low training overhead and no additional inference cost. The source code is available at https://github.com/ChenHu-ML/CorSW.
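For readers unfamiliar with sliced Wasserstein discrepancies, the sketch below computes a Monte Carlo SW_p distance between two equal-size point clouds: project onto random unit directions, sort, and compare the sorted 1-D samples. The optional `feature_map` is a hypothetical placeholder for the diffeomorphism that defines a pullback Euclidean metric; the paper's OLM and LSM maps for correlation matrices are not reproduced here, and the identity map is used by default.

```python
import math
import random


def random_direction(d, rng):
    """Uniform random unit vector in R^d (normalised Gaussian draw)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sliced_wasserstein(xs, ys, n_proj=200, p=2, seed=0, feature_map=None):
    """Monte Carlo sliced Wasserstein-p distance between two equal-size samples.

    `feature_map` stands in for the map that defines a pullback Euclidean
    metric (identity if None); after mapping, plain Euclidean slicing is used."""
    if feature_map is None:
        feature_map = list
    fx = [feature_map(x) for x in xs]
    fy = [feature_map(y) for y in ys]
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_proj):
        theta = random_direction(len(fx[0]), rng)
        px = sorted(sum(t * a for t, a in zip(theta, x)) for x in fx)
        py = sorted(sum(t * a for t, a in zip(theta, y)) for y in fy)
        # 1-D optimal transport between equal-size empirical measures pairs sorted values.
        total += sum(abs(a - b) ** p for a, b in zip(px, py)) / len(px)
    return (total / n_proj) ** (1.0 / p)
```

Shifting a cloud by a vector t shifts every projection by θ·t, so SW_2 should be close to |t|/sqrt(d); for a shift of 3 in 2-D that is about 2.12.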

2606.06102 2026-06-05 cs.AI cs.LG Updated

Step-adaptive multimodal fusion network with multi-scale cloud feature learning for ultra-short-term solar irradiance forecasting

Jingxin Zhang, Xiaoqin Wang

Affiliation: School of Automation, Southeast University

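AI summary: Proposes a step-adaptive multimodal fusion network for ultra-short-term solar irradiance forecasting: InceptionNeXt extracts multi-scale cloud features, a step-adaptive low-frequency compensation unit dynamically adjusts low-frequency information, and the resulting features are combined with meteorological time-series features for prediction.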

Abstract

Ultra-short-term solar irradiance prediction is critical for photovoltaic system dispatch and power grid stability. Existing approaches suffer from three key shortcomings: single time-series models cannot capture the spatial dynamics of clouds under complex conditions, standard convolutions inadequately represent multi-scale cloud features, and fixed low-frequency compensation strategies fail to adapt to different prediction steps. To address these issues, this paper proposes a multi-source data fusion model for ultra-short-term irradiance prediction. The model first employs InceptionNeXt to extract multi-scale, multi-directional spatial features from ground-based cloud images. A step-adaptive low-frequency compensation unit then dynamically modulates global low-frequency information according to the prediction step. Finally, the enhanced image features are combined with meteorological time-series features, and a TempAttnLSTM network captures global temporal dependencies for multi-step prediction. Experiments on the public NREL dataset and on operational photovoltaic stations in Shandong demonstrate the effectiveness of the proposed method compared with several state-of-the-art approaches.

2606.06098 2026-06-05 cs.CL cs.LG Updated

IR3DE: A Linear Router for Large Language Models

Eros Fanì, Oğuzhan Ersoy

Affiliation: Gensyn

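AI summary: Proposes IR3DE, a linear router based on ridge regression that cheaply and quickly selects the most suitable domain-expert LLM for each prompt; it outperforms baselines on reasoning tasks and supports adding or removing expert models dynamically.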

Comments Accepted at the ICML 2026 Workshop on Resource-Adaptive Foundation Model Inference

Abstract

Foundational Large Language Models (LLMs) demonstrate proficiency on a wide range of general tasks and, via domain-expert LLMs, achieve remarkable results on various specialized tasks. With the ever-growing list of available LLMs, inference routers are being proposed to select the most appropriate LLM for each prompt. However, existing routing methods either optimize cost across weak-to-strong generalist LLMs or require substantial training to support domain-expertise routing. In this paper, we propose IR3DE, a Ridge Regression-based Router for Domain Experts that provides cheap and fast routing decisions for each prompt. We evaluate IR3DE in two Causal Language Modeling (CLM) settings, where the task is next-token prediction for all domains, and in one reasoning setting, where each domain has its own distinct reasoning task. Despite being a linear router, IR3DE achieves performance comparable to the other baselines in both CLM settings and surpasses them in the reasoning setting, with a normalized performance of 98.4%. Moreover, IR3DE allows domain experts to be added or removed without retraining the router from scratch, so that a dynamic set of LLMs can be served with minimal disruption to the router itself. Our code is available at: github.com/gensyn-ai/IR3DE.
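A ridge-regression router is simple enough to sketch in a few lines. The version below is an illustration under our own assumptions, not the IR3DE implementation: prompts are represented by feature vectors X, each expert's observed quality on the training prompts forms one column of Y, and the router computes W = (X^T X + λI)^{-1} X^T Y and sends a prompt to the expert with the highest predicted score. Because ridge regression decouples across output columns, adding or removing an expert in this sketch touches only that expert's column; whether IR3DE exploits exactly this property is an assumption. All names are hypothetical.

```python
def solve(A, B):
    """Solve A X = B (A square, B a list of rows) by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + list(map(float, B[i])) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def gram(X, lam):
    """X^T X + lam * I."""
    d = len(X[0])
    return [[sum(x[i] * x[j] for x in X) + (lam if i == j else 0.0) for j in range(d)]
            for i in range(d)]


def cross(X, Y):
    """X^T Y as a d x K list of rows."""
    return [[sum(x[i] * y[k] for x, y in zip(X, Y)) for k in range(len(Y[0]))]
            for i in range(len(X[0]))]


def fit_ridge_router(X, Y, lam=1e-2):
    """W = (X^T X + lam I)^{-1} X^T Y: one weight column per expert."""
    return solve(gram(X, lam), cross(X, Y))


def add_expert(W, X, scores, lam=1e-2):
    """Append a column for a new expert scored on the same prompts X;
    existing columns are untouched."""
    col = solve(gram(X, lam), cross(X, [[s] for s in scores]))
    return [row + c for row, c in zip(W, col)]


def remove_expert(W, k):
    """Drop expert k's weight column."""
    return [row[:k] + row[k + 1:] for row in W]


def route(W, x):
    """Index of the expert with the highest predicted score for features x."""
    scores = [sum(xi * row[k] for xi, row in zip(x, W)) for k in range(len(W[0]))]
    return max(range(len(scores)), key=scores.__getitem__)
```

In this toy setup, fitting costs one small d×d solve regardless of the number of experts, and each added expert costs one more solve with the same Gram matrix, which could also be cached or factored once.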

2606.06096 2026-06-05 cs.LG cs.AI cs.CL Updated

OrderGrad: Optimizing Beyond the Mean with Order-Statistic Policy Gradient Estimation

Paavo Parmas, Yongmin Kim, Kohsei Matsutani, Shota Takashiro, Soichiro Nishimori, Takeshi Kojima, Yusuke Iwasawa, Yutaka Matsuo

Affiliation: The University of Tokyo

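AI summary: Proposes OrderGrad, a family of likelihood-ratio and reparameterization gradient estimators for order-statistic objectives which, implemented as a reward transformation, provides a unified plug-and-play approach to risk-averse, robust, and exploratory learning.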

Abstract

Policy-gradient methods usually optimize expected return, but many real-world applications care about distributional properties of returns: tail risk, outlier robustness, or best-of-K discovery. We introduce OrderGrad, a family of likelihood-ratio and reparameterization gradient estimators for order-statistic objectives. OrderGrad optimizes finite-sample L-statistics, i.e., weighted averages of sorted rewards or costs, recovering objectives such as VaR, CVaR, trimmed means, medians, and top-m/best-of-K criteria by changing only the rank weights. For any fixed sample size and rank-weight vector, OrderGrad provides an unbiased gradient estimator for the corresponding order-statistic objective. The method is implemented as a simple reward transformation that can then be used in an otherwise standard policy-gradient or reparameterized update. We study the resulting estimator's variance behavior and evaluate it on tasks where mean optimization is mismatched to the deployment objective, including LLM math post-training and other tasks. OrderGrad provides a unified, plug-and-play route to risk-averse, robust, and exploratory learning. Code: https://github.com/paavo5/ordergrad
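The rank-weight view of order statistics can be made concrete with a short sketch. The Python below evaluates a finite-sample L-statistic (a weighted sum of sorted rewards) and builds rank weights for the mean, a lower-tail CVaR, a trimmed mean, and a top-m/best-of-n criterion. It also computes per-sample rank weights, which, for a reparameterized (pathwise) gradient, rescale each sample's reward gradient because sorting is piecewise linear. The likelihood-ratio estimator and the exact reward transformation used by OrderGrad are not reproduced; names are hypothetical.

```python
def l_statistic(samples, weights):
    """Finite-sample L-statistic: sum_i w_i * x_(i) over the ascending sort."""
    return sum(w * x for w, x in zip(weights, sorted(samples)))


def mean_weights(n):
    """Uniform weights recover the ordinary sample mean."""
    return [1.0 / n] * n


def cvar_weights(n, k):
    """Average of the k lowest rewards: a finite-sample lower-tail CVaR."""
    return [1.0 / k if i < k else 0.0 for i in range(n)]


def trimmed_mean_weights(n, t):
    """Drop the t lowest and t highest samples and average the rest."""
    return [1.0 / (n - 2 * t) if t <= i < n - t else 0.0 for i in range(n)]


def top_m_weights(n, m):
    """Average of the m highest rewards; m = 1 is the best-of-n criterion."""
    return [1.0 / m if i >= n - m else 0.0 for i in range(n)]


def per_sample_weights(samples, weights):
    """Rank weight attached to each sample, in the samples' original order.

    Away from ties, the pathwise gradient of the L-statistic is
    sum_j w_rank(j) * d sample_j / d theta, so these weights rescale each
    sample's reward gradient in a reparameterized update."""
    order = sorted(range(len(samples)), key=lambda j: samples[j])
    out = [0.0] * len(samples)
    for rank, j in enumerate(order):
        out[j] = weights[rank]
    return out
```

Switching objectives only means swapping the weight vector, which mirrors the abstract's point that VaR, CVaR, trimmed means, medians, and best-of-K differ only in their rank weights.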

2606.06094 2026-06-05 cs.AI cs.LG math.DS physics.med-ph Updated

Integrating Mechanistic and Data-Driven Models for Neurological Disorders through Differentiable Programming

Shah Pallav Dhanendrakumar, Saikat Pal, Sitikantha Roy

Affiliations: Department of Applied Mechanics, Indian Institute of Technology Delhi; Yardi School of Artificial Intelligence, Indian Institute of Technology Delhi

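AI summary: Reviews hybrid modeling strategies that combine deep learning with physics-based solvers through differentiable programming for the diagnosis, prognosis, and treatment planning of neurological disorders, arguing that they outperform purely mechanistic or purely data-driven approaches.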

Abstract

Advances in computational modeling, neuroimaging, and artificial intelligence are revolutionizing the modeling of neurological disorders for improved diagnostics, prognosis, and treatment planning. Mechanistic models provide valuable scientific insight into these disorders, but in practice they are often simplified by assumptions or are computationally expensive and slow to solve. Conversely, purely data-driven approaches offer speed and scalability but require large, high-quality training data and generally suffer from interpretability and generalization issues. This perspective paper presents a structured overview of hybrid modeling strategies, which combine deep learning models with physics-based solvers and are categorized into parallel, series, and parallel-series architectures. Three main approaches are emphasized: residual modeling for missing or incomplete physics, Neural Ordinary Differential Equations (NODEs) for approximating continuous-time dynamics, and solver-in-the-loop methods that accelerate traditional solvers with neural approximations. These hybrid models integrate formulations based on governing differential equations with deep learning to characterize the evolution of neurological disorders, and promise advanced personalized neurological modeling. In addition, the study explores and proposes different hybrid configurations to improve diagnostic accuracy, predict disease progression, and inform treatment strategies across a range of neurological disorders. These capabilities exceed those of standalone mechanistic or purely data-driven approaches, making hybrid modeling a powerful tool, especially for modeling disease progression and treatment response in neurological conditions such as brain tumors, Alzheimer's disease, and stroke.

2606.06080 2026-06-05 cs.LG cs.AI cs.CL Updated

Metamorphic Testing with Rashomon Sets: Explanation Faithfulness in Machine Learning

Helge Spieker, Jørn Eirik Betten, Arnaud Gotlieb

Affiliation: Norwegian Ministry of Education and Research

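AI summary: Addressing the unreliability of explanations caused by the Rashomon effect in machine learning, proposes a metamorphic-testing framework that evaluates the faithfulness of feature attributions produced by post-hoc explanation methods without requiring ground-truth labels.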

Comments Accepted at 10th International Workshop on Metamorphic Testing (MET 2026)

2606.06053 2026-06-05 cs.LG Updated

Online KL-Regularized Reinforcement Learning with Function Approximation under Misspecification

Haoyang Hong, Zichen Wang, Quanquan Gu, Huazheng Wang

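AI summary: Studies KL-regularized contextual bandits and episodic reinforcement learning with general function approximation under model misspecification, proposes a KL misspecification formulation, and analyzes a regression-based Gibbs policy-update algorithm, deriving high-probability KL regret bounds that contain an explicit misspecification term.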

Comments Accepted by RLC 2026

2606.06046 2026-06-05 math.NA cs.LG cs.NA Updated

Learning solution operators of PDEs with sparse approximation methods

Sebastian Neumayer, Daniel Potts, Fabian Taubert

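AI summary: Proposes a sparse high-dimensional method that combines product-basis expansions with orthogonal matching pursuit to approximate solution operators of partial differential equations, substantially reducing the required sample size; numerical comparisons with cubature-based sparse approximation and Fourier neural operators show its accuracy and interpretability when the solution has a sparse representation.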

Abstract

We investigate the approximation of solution operators for partial differential equations (PDEs) using sparse high-dimensional techniques. Building on a dimension-incremental framework, we combine product basis expansions with sparse recovery methods, specifically orthogonal matching pursuit (OMP), to substantially reduce the required sample size compared with a previously considered cubature-based approach. We evaluate the resulting method numerically on several examples, comparing it against both cubature-based sparse approximation and Fourier neural operators in terms of accuracy, runtime, and sample size. The experiments show that our approach considerably reduces the number of required PDE solves relative to its predecessor while maintaining competitive accuracy, particularly when the solution admits a sparse representation in the chosen basis. Furthermore, the recovered sparse index sets yield interpretable insights into the relevant variables and parameter interactions.
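Orthogonal matching pursuit itself is short enough to sketch. The plain-Python version below works over a generic dictionary of columns rather than the paper's product-basis expansions within a dimension-incremental framework: at each step it selects the column most correlated with the current residual (after normalizing by column length), then refits all selected coefficients by least squares. Names and the small dense solver are illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def solve(A, b):
    """Solve the small square system A x = b by Gaussian elimination with
    partial pivoting and back substitution."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def omp(columns, y, sparsity, tol=1e-10):
    """Orthogonal matching pursuit over a list of dictionary columns.
    Returns {column index: coefficient} for the selected support."""
    support, coef = [], []
    residual = list(y)
    for _ in range(sparsity):
        # Greedy selection: largest normalized correlation with the residual.
        scores = [abs(dot(col, residual)) / dot(col, col) ** 0.5 for col in columns]
        j = max((k for k in range(len(columns)) if k not in support),
                key=lambda k: scores[k])
        support.append(j)
        # Least-squares refit on the whole support (normal equations).
        sub = [columns[k] for k in support]
        gram = [[dot(a, b) for b in sub] for a in sub]
        coef = solve(gram, [dot(a, y) for a in sub])
        fit = [sum(c * col[i] for c, col in zip(coef, sub)) for i in range(len(y))]
        residual = [yi - fi for yi, fi in zip(y, fit)]
        if dot(residual, residual) < tol:
            break
    return dict(zip(support, coef))
```

The least-squares refit is what distinguishes OMP from plain matching pursuit: after each selection the residual is orthogonal to every chosen column, so no column is picked twice.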

2606.06043 2026-06-05 stat.ML cs.LG Updated

Adaptive Learning Rates with Surrogate Probability for Follow-the-Perturbed-Leader

Jongyeong Lee, Junya Honda, Shinji Ito, Chansoo Kim

Affiliations: Korea Institute of Science and Technology; Kyoto University; RIKEN AIP; The University of Tokyo; University of Science and Technology

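AI summary: Because the optimization-free nature of FTPL makes adaptive, probability-dependent learning rates hard to design, proposes adaptive learning rates based on surrogate probability functions, achieving best-of-both-worlds guarantees under Pareto perturbations with any shape parameter α>1, and extends the results to bandits with expert advice.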

Comments TBA COLT2026

Abstract

The follow-the-regularized-leader (FTRL) framework has shown effectiveness and flexibility in online learning problems, where the choice of learning rate is known to be crucial. Recently, adaptive learning rates defined in terms of the arm-selection probabilities, obtained by solving a convex optimization problem, have achieved improved best-of-both-worlds (BOBW) guarantees in various bandit problems. In contrast, BOBW guarantees for its computationally efficient alternative, follow-the-perturbed-leader (FTPL), remain relatively limited, since its optimization-free nature ironically makes the design of adaptive, probability-dependent learning rates non-trivial. To address this challenge, we propose an adaptive learning rate for FTPL by introducing surrogate probability functions that can be computed from the available quantities alone, without requiring the exact probabilities. Based on these learning rates with surrogate functions, we provide a BOBW guarantee for FTPL with Pareto perturbations for any shape parameter $α>1$, generalizing prior results restricted to the specific choice $α=2$. We further show BOBW guarantees for FTPL with adaptive learning rates in the bandit problem with expert advice. Our approach preserves the computational simplicity of FTPL while enabling probability-dependent adaptivity, and the surrogate-based methodology may be of independent interest in algorithmic frameworks beyond FTPL and learning-rate design.
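To show the mechanics of FTPL with Pareto perturbations, here is a minimal sketch with a fixed learning rate eta, assuming the cumulative loss estimates are already given. The played arm minimizes eta * L_hat_i - r_i, with i.i.d. Pareto(alpha) perturbations r_i drawn by Python's `random.paretovariate`. The arm-selection probability is only implicit, and estimating it requires Monte Carlo sampling, which illustrates why probability-dependent learning rates are awkward for FTPL. The paper's adaptive schedule and surrogate probability functions are not reproduced; helper names are hypothetical.

```python
import random


def ftpl_choose(cum_loss_estimates, eta, alpha, rng):
    """Follow-the-perturbed-leader: play argmin_i (eta * L_hat_i - r_i),
    where r_i are i.i.d. Pareto(alpha) perturbations (support [1, inf))."""
    perturbed = [eta * loss - rng.paretovariate(alpha) for loss in cum_loss_estimates]
    return min(range(len(perturbed)), key=perturbed.__getitem__)


def estimate_choice_prob(cum_loss_estimates, eta, alpha, arm, n_draws, seed=0):
    """Monte Carlo estimate of the implicit probability that FTPL plays `arm`.
    FTPL never needs this number to act, which is why learning rates defined
    through exact arm probabilities are awkward to use with it."""
    rng = random.Random(seed)
    hits = sum(ftpl_choose(cum_loss_estimates, eta, alpha, rng) == arm
               for _ in range(n_draws))
    return hits / n_draws
```

Smaller shape parameters give heavier-tailed perturbations and hence more exploration; with a large loss gap, the leading arm is played almost always, while equal losses give each arm roughly equal probability.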

2606.06034 2026-06-05 cs.LG cs.AI Updated

When Good Enough Is Optimal: Multiplication-Only Matrix Inversion Approximation for Quantized Gated DeltaNet

Luoming Zhang, Yuwei Ren, Kui Zhang, Tian Liu, Lingjuan Ge, Denghao Li, Matthew Harper Langston, Yin Huang, Weiliang Will Zeng, Liang Zhang

Affiliation: University of Science and Technology of China

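AI summary: To remove the matrix-inversion bottleneck in chunk-wise parallel linear attention, proposes a matrix-multiplication-only algorithm based on a truncated Neumann series expansion, combined with structural masking and parallel residual correction, achieving up to 5x kernel speedup on NPUs and a 20% reduction in decode-layer overhead.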

Abstract

Matrix inversion in chunk-wise parallel linear attention is a major bottleneck for long-context modeling, particularly on NPUs, where forward-substitution-based methods exhibit limited parallelism and poor hardware utilization. We propose a fast, Matrix Multiplication (MatMul)-based algorithm tailored for strictly lower-triangular matrices arising in chunk-wise linear attention. Motivated by the rapid growth of Neumann-series terms and the diagonal concentration of the inverse matrix, we employ a truncated Neumann expansion with structural masking and parallel residual correction to eliminate sequential dependencies. We further extend our method to low-bit integer (INT) arithmetic by mitigating the dynamic-range expansion arising from repeated matrix power operations, and we adapt the approximation order and residual step to the chunk size to minimize computational cost while preserving the model's accuracy. Experiments on Qwen3.5-family models demonstrate up to 5$\times$ kernel-level speedup and a 20% reduction in decode-layer overhead, while preserving accuracy under both floating-point and low-precision inference. Our method offers an efficient and hardware-friendly solution for scalable linear attention.
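One standard way to invert a unit lower-triangular matrix with matrix products alone is the doubling form of the Neumann series. Writing the matrix as I + L with L strictly lower-triangular and A = -L, the identity (I + A)(I + A^2)(I + A^4)...(I + A^(2^(s-1))) = I + A + ... + A^(2^s - 1) holds, and since L^n = 0 for an n×n matrix, s = ceil(log2 n) factors give the exact inverse while fewer factors give a truncated series. The sketch below is our illustration of this multiplication-only idea; the paper's structural masking, parallel residual correction, and INT quantization handling are not reproduced, and the function names are hypothetical.

```python
def matmul(A, B):
    """Dense matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def inverse_unit_lower(L, order=None):
    """Approximate (I + L)^{-1} for strictly lower-triangular L using only
    matrix products: with A = -L, multiply the factors (I + A^(2^i)).

    The default order ceil(log2 n) is exact because L is nilpotent; a smaller
    order gives a truncated Neumann series."""
    n = len(L)
    if order is None:
        order = max(1, (n - 1).bit_length())  # equals ceil(log2 n) for n >= 2
    A = [[-float(v) for v in row] for row in L]
    result = identity(n)
    power = A
    for _ in range(order):
        result = matmul(result, add(identity(n), power))
        power = matmul(power, power)  # A^(2^i) -> A^(2^(i+1))
    return result
```

Each factor costs one or two matrix products and there are no sequential row-by-row dependencies as in forward substitution, which is the property that makes a MatMul-only formulation attractive on matrix-multiplication hardware.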

2606.06032 2026-06-05 cs.LG Updated

Catastrophic Forgetting as Accessibility Collapse: A Three-Level Framework for Knowledge Persistence in Continual Learning

Ayushman Trivedi, Bhavika Melwani

Affiliation: Independent Researchers

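AI summary: Proposes a three-level framework (knowledge storage, representation, and accessibility) and shows experimentally that catastrophic forgetting is mainly a failure of accessibility rather than erasure of representations: retraining only the classifier recovers most of the lost performance.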

Comments 14 pages, 6 figures, 8 tables. Sequential continual-learning experiments on CIFAR-100 using ResNet-18

Abstract

Catastrophic forgetting is commonly interpreted as the irreversible erasure of previously acquired knowledge during sequential learning. In this work, we investigate an alternative perspective: that forgetting may arise not from complete destruction of task representations but from a loss of accessibility to preserved information. We introduce a three-level framework separating knowledge storage, representation, and accessibility, and evaluate each component through a series of continual-learning experiments on sequential CIFAR-100 classification using ResNet-18. Our analysis combines checkpoint persistence, linear probing, representation geometry, classifier-reset recovery, and layer-wise recoverability experiments. We observe complete behavioral forgetting of earlier tasks, with task accuracy collapsing from 54.8% to 0%, while linear probe performance retains approximately 76% of the original representational information. Furthermore, retraining only the final classifier restores 75.7% of the original task performance without modifying the backbone network. Layer-wise analysis reveals that early and intermediate layers preserve highly recoverable task information despite severe degradation at later stages. Projection-energy and principal-angle analyses indicate that retained knowledge persists as distributed high-dimensional representations rather than through preservation of a small dominant subspace. These findings suggest that catastrophic forgetting is better characterized as an accessibility failure than complete representational erasure, and that substantial task-relevant information remains embedded within neural representations even after functional forgetting has occurred.

2606.06027 2026-06-05 cs.AI cs.CL cs.LG cs.SI Updated

Compress-Distill: Reasoning Trace Compression for Efficient Knowledge Distillation

Maxime Griot, Paul Steven Scotti, Tanishq Mathew Abraham

Affiliations: Université catholique de Louvain; Sophont Inc

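AI summary: Proposes post-hoc compression of reasoning traces before knowledge distillation to reduce training cost and shorten inference outputs; experiments show that compression trades some accuracy for efficiency.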

Abstract

Reasoning models produce long chain-of-thought traces that are costly to distill and encourage verbose student outputs. We study post-hoc compression of such traces before knowledge distillation. Two teachers, Qwen3.5-397B-A17B and gpt-oss-120B, generate about 283k correct traces each; two instruction-tuned models then compress them to 8.6-21.0% of their original character length. Across a 48-run main grid plus seven Qwen-teacher truncation ablations, compressed traces reduce training tokens to 12-30% of raw, speed up training by 2.0-7.6x, and shorten inference outputs by 3-19x, with smaller reductions under the shorter gpt-oss teacher. However, raw traces retain the highest downstream accuracy at every scale and for both teachers. A length-matched raw-trace truncation ablation shows that compression is not merely benefiting from a smaller token budget: model-compressed traces usually beat or match naive truncation, especially for smaller students, while maintaining shorter inference outputs. Overall, reasoning-trace compression offers an accuracy-efficiency trade-off rather than a free improvement: students retain up to 96% of raw-trace accuracy while gaining up to 18x higher per-token efficiency, and at the 0.8B scale under LoRA, compressed traces narrow the raw-vs-compressed gap but do not exceed raw.

2606.05981 2026-06-05 cs.CV cs.LG Updated version

Video-Rate Streaming Stylization on a Vision-Aware MLLM-Conditioned Edit Diffusion: Asymmetric Batched Inference on a Distilled UNet + MLLM Text Encoder

Yoshiyuki Ootani

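Affiliations: Independent researcher

AI summary: Addressing the text encoder becoming the bottleneck of distilled diffusion models, the paper proposes a streaming pipeline that combines asymmetric CUDA pipelining, a compile-friendly ControlNet-LLLite reformulation, and a periodic conditioning-refresh schedule, achieving video-rate streaming stylization edits on consumer GPUs.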

Comments 12 pages, 4 figures, 12 tables. Under review at IEEE Transactions on Circuits and Systems for Video Technology. Code, evaluation harness, and the released v3 Temporal LLLite adapter weights are at https://github.com/otanl/dreamlite-stream (also mirrored to Hugging Face and Zenodo)

Abstract

Aggressive distillation of the diffusion U-Net inverts the per-frame bottleneck of real-time text-to-image pipelines: once the denoiser is a 4-step or 1-step distilled student, the text encoder becomes the critical path. This inversion is most acute in vision-aware edit diffusion, where the encoder is a multimodal large language model (MLLM). We study the case of a 0.39B distilled edit U-Net paired with a 2.13B MLLM text encoder (Qwen3-VL) and present a streaming pipeline targeted at this regime built around three engineering mechanisms: asymmetric side-stream / main-stream CUDA pipelining with batched text-encoder amortisation (and optional static-prompt caching), a compile-friendly ControlNet-LLLite reformulation that folds the entire U-Net + adapter stack into a single fused graph, and a periodic conditioning-refresh schedule with a hook subset that amortises the per-frame conditioning cost. On a single consumer RTX 3090 Ti at 512x512 the pipeline sustains 27.4 fps over a 480-frame run at batch size B=8 and 29.6 fps at B=16, with end-to-end p50 latency of approximately 0.5 and 1.0 seconds respectively; the same operating point measures 54.9 fps on RTX 4090 and 74.1 fps on RTX 5090. We report video-rate streaming throughput rather than interactive low latency, and locate our numbers against same-stack StreamDiffusion re-runs as systems context, not as a benchmark superiority claim. For the trained oil-painting style, the released temporal adapter generalises within in-clip noise to 19 unused DAVIS-2017 sequences and 15 non-DAVIS clips from seven sources; prompt-level generalisation to unseen style families is bounded and reported separately.
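Of the three mechanisms, the periodic conditioning refresh is the easiest to illustrate. The Python sketch below is a minimal model under stated assumptions, not the released dreamlite-stream code: `encode(prompt, frame)` is a hypothetical stand-in for the vision-aware MLLM text encoder, and `refresh_period` is a hypothetical parameter. It does not model the CUDA side stream, batching, or the hook subset; it only shows how reusing the last conditioning amortises encoder calls across frames.

```python
def stream_with_refresh(frames, prompt, encode, refresh_period):
    """Yield (frame, conditioning) pairs, re-running the encoder every N frames.

    `encode(prompt, frame)` is a hypothetical stand-in for the expensive
    vision-aware text encoder; frames between refreshes reuse the most
    recent conditioning instead of paying the encoder cost again.
    """
    if refresh_period < 1:
        raise ValueError("refresh_period must be >= 1")
    cond = None
    for idx, frame in enumerate(frames):
        if idx % refresh_period == 0:
            cond = encode(prompt, frame)  # refresh frame: run the encoder
        yield frame, cond                 # other frames: reuse cached conditioning


# Usage with a stand-in encoder that records which frames it was called on.
encoded_frames = []


def fake_encode(prompt, frame):
    encoded_frames.append(frame)
    return f"{prompt}@{frame}"


pairs = list(stream_with_refresh(range(480), "oil painting", fake_encode, refresh_period=8))
print(len(encoded_frames))  # 60 encoder calls for 480 frames
```

The trade-off this exposes is staleness: a frame between refreshes is stylised with conditioning computed from an earlier frame, which is acceptable only if the scene changes slowly relative to the refresh period.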

2606.05972 2026-06-05 cs.LG Updated version

LLM Explainability with Counterfactual Chains and Causal Graphs

Nirit Nussbaum-Hoffer, Nitay Calderon, Liat Ein-Dor, Roi Reichart

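Affiliations: Faculty of Data and Decision Sciences, Technion; IBM Research

AI summary: Proposes a four-phase method that models LLM inference with causal graphs: it discovers class-discriminative concepts, expands the sparse data with MCMC-inspired counterfactual augmentation, and produces interpretable graphs, applied to tasks such as disease diagnosis and sentiment analysis.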

Abstract

Causal graphs provide a high-level language for making mechanisms transparent. Recent work uses Large Language Models (LLMs) to recover causal graphs of external-world processes. Instead, in this paper, we use causal graphs to model LLM inference itself, providing stakeholders with a transparent view of how the model perceives and organizes high-level concepts to produce a prediction. We propose a four-phase method for constructing such graphs. Given a target LLM and a set of textual examples, our method discovers class-discriminative, human-interpretable concepts and maps each input to LLM-perceived concept states. We then introduce an MCMC-inspired counterfactual augmentation procedure that expands the sparse observational data through chains of counterfactuals. This enables stable causal discovery with σ-CG, yielding informative, interpretable graphs. We apply our method to three LLMs across disease diagnosis, sentiment analysis, and LLM-as-a-judge classification tasks. We evaluate the learned graphs for predictive fidelity and structural stability, and the MCMC-inspired augmentation for convergence and downstream utility. Our results show that the discovered causal graphs capture meaningful dependencies consistent with LLMs' reasoning. Together, this paper provides a foundation for concept-level explainability of LLMs.

2606.05970 2026-06-05 cs.CL cs.AI cs.LG Updated version

Measuring the sensitivity of LLM-based structured extraction to prompt, model, and schema choices in clinical discharge summaries

Martin Murin

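Affiliations: DryLabz GmbH

AI summary: By holding the extraction task fixed and varying prompt, model, and schema choices one at a time, this study measures how sensitive LLM-based structured extraction from clinical text is to upstream configuration. It finds that schema-induced disagreement concentrates on the absence-versus-silence distinction, and that model choice outweighs prompt phrasing in multi-class categorization.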

Comments 69 pages, 5 main figures, supplementary material included

Abstract

Large language models are increasingly used for structured extraction from clinical free-text notes, but the sensitivity of their output to upstream configuration choices is less understood than their accuracy on fixed benchmarks. This work measures that sensitivity without human-annotated ground truth, by holding the extraction task fixed and varying one choice at a time. The fixed schema comprises 17 clinical documentation flags on a three-way yes/no/not_documented value set and a 47-tag vocabulary for the primary admission reason. Three prompt variants expressing this schema were each run at two model sizes on MIMIC-IV v3.1 discharge summaries. Cross-prompt agreement was measured by Cohen's kappa on ICD-stratified subsets. A paired same-note comparison isolated the effect of model choice, and a post-hoc collapse of the three-way flags to binary tested the schema's contribution to disagreement. On the three-way flags, the two models reach the same pooled cross-prompt agreement (median kappa 0.69 and 0.68); the larger model raises agreement on some fields and lowers it on others, a redistribution rather than the absence of an effect. Collapsing the schema to binary dissolves most of the cross-prompt disagreement, locating it on the absence-versus-silence distinction rather than on whether the finding is present. On the multi-class admission categorization, changing the model reassigns the dominant tag on close to half of all notes while changing the prompt phrasing reassigns it on roughly one in eight, and the larger model places far less mass on residual catch-all categories (44% to 26%). These patterns indicate a schema-imposed source of disagreement concentrated on the absence-versus-silence axis and a dominance of model over prompt phrasing on multi-class categorization, identified by a reusable methodology for auditing extraction reproducibility on a population-scale deployment.
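The two core measurements (cross-prompt Cohen's kappa and the post-hoc collapse of three-way flags to binary) can be made concrete with a minimal Python sketch. It assumes the collapse merges `no` with `not_documented` (our reading of "absence versus silence"; the abstract does not give the exact mapping), and the six-note labels are invented for illustration, not MIMIC-IV data.

```python
from collections import Counter


def cohens_kappa(a, b):
    """Cohen's kappa between two equal-length label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    # Observed agreement: fraction of notes that received the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each prompt's marginal label frequencies.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


def collapse_to_binary(labels):
    """Hypothetical collapse: keep 'yes', merge 'no' and 'not_documented'."""
    return ["yes" if x == "yes" else "not_yes" for x in labels]


# Invented labels for six notes under two prompt variants.
prompt_a = ["yes", "no", "not_documented", "yes", "no", "not_documented"]
prompt_b = ["yes", "not_documented", "no", "yes", "not_documented", "no"]
print(cohens_kappa(prompt_a, prompt_b))  # 0.0: no agreement beyond chance
print(cohens_kappa(collapse_to_binary(prompt_a), collapse_to_binary(prompt_b)))  # 1.0
```

In this toy, every disagreement is a swap between `no` and `not_documented`, so collapsing the schema removes all of it; that is the pattern the abstract reports at scale.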

2606.05958 2026-06-05 cs.LG Updated version

Steering Vectors are an Adversarial Attack Surface

Abzal Aidakhmetov, Donato Crisostomi, Tommaso Mencattini, Adrian Robert Minut, Iacopo Masi, Emanuele Rodolà

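Affiliations: Sapienza University of Rome; EPFL

AI summary: Reveals a stealthy data poisoning attack: substituting 4-6% of the tokens in a steering dataset aligns the resulting steering vector with an anti-refusal direction, jailbreaking the target model while preserving the intended steering effect on benign prompts.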

Abstract

Activation steering has become a popular way to control Large Language Model (LLM) behavior without fine-tuning. Since the technique is plug-and-play, users share datasets and precomputed vectors to steer model activations. However, we show that a stealth data poisoning attack silently compromises this pipeline. By substituting 4-6% of tokens in the steering dataset, an attacker can silently align the resulting vector with an anti-refusal direction. This jailbreaks the target model while preserving the intended steering effect on benign prompts. Under this threat model, a malicious actor can distribute an apparently safe bundle containing texts, vectors, and weights, alongside an equivalence certificate that the end-user can verify. We test the attack on two open-weight model families and eight model-attribute combinations, observing that poisoned vectors reach an absolute attack success rate (ASR) of 20-55%, +19% to +51% over a clean reference. Finally, we find that a refusal-direction orthogonalization defense can recover ≈82% of the ASR gap without harming benign behavior.
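A minimal Python sketch of the two vector operations involved, under assumptions not stated in the abstract: the steering vector is built as a difference of mean activations (a common recipe, assumed here), "alignment with an anti-refusal direction" is read as a negative cosine with a refusal direction `r`, and the defense is read as projecting `r` out of the steering vector. The vectors are hypothetical 3-D toys, not real model activations.

```python
import math


def diff_of_means(pos, neg):
    """Steering vector as mean(pos activations) - mean(neg activations)."""
    dim = len(pos[0])
    mean_pos = [sum(v[i] for v in pos) / len(pos) for i in range(dim)]
    mean_neg = [sum(v[i] for v in neg) / len(neg) for i in range(dim)]
    return [p - q for p, q in zip(mean_pos, mean_neg)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))


def orthogonalize(v, r):
    """Project out direction r: v - (v.r / r.r) * r, leaving a vector orthogonal to r."""
    coef = dot(v, r) / dot(r, r)
    return [a - coef * b for a, b in zip(v, r)]


refusal = [1.0, 0.0, 0.0]    # hypothetical refusal direction
poisoned = [-0.6, 0.8, 0.0]  # steering vector with an anti-refusal component
cleaned = orthogonalize(poisoned, refusal)
print(round(cosine(poisoned, refusal), 6))  # -0.6
print(cleaned)                              # [0.0, 0.8, 0.0]
```

The projection keeps every component of the steering vector except the one along `r`, which is why such a defense can leave the intended (benign) steering effect largely intact.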

2606.05957 2026-06-05 cs.LG stat.ML Updated version

Dead Directions: Geometric Singular Learning

Tejas Pradeep Shirodkar

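Affiliations: IIIT, Hyderabad

AI summary: Bridges singular learning theory and information geometry through the concept of "dead directions", showing how to recover the KL order from the decay rate of the Fisher curvature in original parameter coordinates; the approach extends to deep networks and gives a trajectory-rate readout of Watanabe's triple (λ, m, ν) without posterior sampling.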

Comments 139 pages, 13 figures, 13 tables

Abstract

Singular learning theory and information geometry have studied the same parameter spaces in mostly separate vocabularies: the former computes Bayesian invariants in resolved coordinates, the latter works in original coordinates under a non-degeneracy assumption that overparameterised models routinely violate. We bridge them through one primitive, the dead direction: a unit vector along which the Fisher metric degenerates, equivalently a tangent to the analytic singular set with a definite KL order, set by how fast the KL divergence vanishes. The two readings name the same vector; our central move shows its KL order is recoverable as the decay rate of the directional Fisher curvature approaching the singularity, in original parameter coordinates and without a Hironaka resolution. A selection rule on smooth fibres translates this rate into Watanabe's single-direction contribution to the real log canonical threshold, and we extend the recovery to multi-component crossings, multiplicity m, the singular fluctuation ν (universal in the KL order for 1D directions), prior-RLCT shifts, and tempered posteriors. We then lift this rate to a deep network: a multi-layer K-FAC factorisation writes each Fisher block as a product of activation- and gradient-side rates with a duality between them, instantiated at modern-network primitives (residual streams, layer normalisation, attention). A quotient theorem carries the rate to the gauge quotient Θ/G under gradient flow on a G-invariant metric; SGD qualifies, standard Adam does not, and we construct a G-equivariant Adam-family preconditioner (DDCAdam) that does. The bridge yields a parameter-coordinate handle on singular geometry, closed-form per-architecture predictions, and a trajectory-rate readout of Watanabe's triple (λ, m, ν) from one checkpoint's forward and backward passes, without posterior sampling.
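The central move (reading the KL order off the decay of directional Fisher curvature) can be checked on a one-parameter toy model. This Python sketch rests on our own assumptions and is not the paper's estimator: a Gaussian regression y = f(w) + N(0, 1) with true parameter w = 0 and f(0) = 0, so K(w) = f(w)²/2 and I(w) = f'(w)². If f(w) ~ w^k, then I ~ |w|^(2k-2) and K ~ |w|^(2k), and for a single direction with K ~ |w|^(2k) the real log canonical threshold is λ = 1/(2k). The sketch ignores m, ν, and the K-FAC lift to deep networks.

```python
import math


def fisher_1d(f, w):
    """Fisher information of y = f(w) + N(0, 1) noise, i.e. I(w) = f'(w)**2."""
    h = 1e-5 * abs(w)                     # step scaled to w (w must be nonzero)
    df = (f(w + h) - f(w - h)) / (2 * h)  # central finite difference for f'(w)
    return df * df


def fisher_decay_exponent(f, w_near=1e-3, w_far=1e-2):
    """Log-log slope s of I(w) ~ |w|**s as w approaches the singularity at 0."""
    return math.log(fisher_1d(f, w_far) / fisher_1d(f, w_near)) / math.log(w_far / w_near)


def rlct_from_fisher(f):
    """Toy readout: I ~ |w|**s implies K ~ |w|**(s + 2), so lambda = 1 / (s + 2)."""
    return 1.0 / (fisher_decay_exponent(f) + 2.0)


print(round(rlct_from_fisher(lambda w: w), 4))       # 0.5: regular direction
print(round(rlct_from_fisher(lambda w: w ** 2), 4))  # 0.25: dead direction, f ~ w**2
```

The regular case recovers the familiar d/2 = 1/2 for one parameter, while the quadratic direction, along which the Fisher metric vanishes at w = 0, gives the smaller threshold 1/4.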

2606.05946 2026-06-05 cs.LG Updated version

Short paper: Models in the dark -- Rectification and erasure under GDPR in ML supply chains

Henrik Graßhoff, Malte Hansen, Meiko Jensen, Sara Ramezanian

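Affiliations: Karlstad University

AI summary: Examines, from an interdisciplinary perspective, the challenges of implementing the GDPR rights to rectification and erasure in machine learning supply chains, introduces the notion of "models in the dark", and analyzes the pressing problems it raises.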

Comments accepted for presentation at Annual Privacy Forum 2026

2606.05942 2026-06-05 stat.ML cs.LG Updated version

EML-CD: Causal Mechanism Recovery via EML Symbolic Trees in Structure Learning

Sota Asanuma

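Affiliations: SoftBank Corp

AI summary: Proposes EML-CD, which uses the EML operator to build interpretable symbolic trees for causal mechanisms and automatically discovers closed-form causal equations from data, balancing mechanism recovery against structure learning on real and synthetic data.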

Abstract

Neural network (NN)-based nonlinear causal discovery methods recover DAG structure but leave each causal mechanism as a black box. Waxman et al. argued that extracting causal mechanisms from NN weights is ill-posed. We propose EML-CD, a framework that integrates the EML operator (capable of composing elementary functions from a single binary operator) into causal structure learning, with interpretable mechanism recovery as the primary objective. EML-CD represents each edge mechanism as a gated EML binary tree and automatically discovers closed-form causal equations. Analytical Jacobians can be directly computed from the output equations, enabling quantitative understanding of causal effects. On real data (Sachs protein signaling, d = 11), EML-CD achieves SHD = 11.2 ± 0.4 (5-seed mean; baselines are single deterministic runs), on par with PC/GES within seed variance and below CAM, while attaching closed-form equations to each detected edge (precision 0.756, recall 0.365). In a controlled bivariate test with known mechanisms, EML-CD recovers 10 of 11 elementary function families faithfully (held-out shape correlation ≥ 0.96; only high-frequency sine is partial). On a symbolic synthetic benchmark, EML-CD attains a substantially lower and more stable held-out mechanism f-MSE than a fixed SINDy dictionary (mean 3.67 vs. 7644, the latter inflated by catastrophic extrapolation on one seed), although its structure recovery (SHD 14.0) only matches the dictionary and stays below specialized optimizers; on the Causal Chambers light-tunnel subset, a depth-2 model improves F1 over linear OLS-BIC (0.444 vs. 0.273).
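The structure-recovery numbers above are reported as SHD plus edge precision and recall. A minimal Python sketch of these metrics on adjacency matrices follows; note that SHD conventions differ (some implementations charge a reversed edge twice), and the abstract does not say which convention or which edge matching the paper uses, so the choices here are assumptions.

```python
def shd(true_adj, est_adj):
    """Structural Hamming distance between two DAGs given as 0/1 adjacency matrices.

    adj[i][j] == 1 means an edge i -> j. Each unordered node pair contributes
    at most 1 here, so a missing, extra, or reversed edge each costs one.
    """
    n = len(true_adj)
    dist = 0
    for i in range(n):
        for j in range(i + 1, n):
            if (true_adj[i][j], true_adj[j][i]) != (est_adj[i][j], est_adj[j][i]):
                dist += 1
    return dist


def edge_precision_recall(true_adj, est_adj):
    """Precision and recall over directed edges."""
    n = len(true_adj)
    true_edges = {(i, j) for i in range(n) for j in range(n) if true_adj[i][j]}
    est_edges = {(i, j) for i in range(n) for j in range(n) if est_adj[i][j]}
    hits = len(true_edges & est_edges)
    precision = hits / len(est_edges) if est_edges else 0.0
    recall = hits / len(true_edges) if true_edges else 0.0
    return precision, recall


# True graph 0 -> 1 -> 2; the estimate reverses 0 -> 1 and adds 0 -> 2.
true_adj = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
est_adj = [[0, 0, 1], [1, 0, 1], [0, 0, 0]]
print(shd(true_adj, est_adj))                    # 2
print(edge_precision_recall(true_adj, est_adj))  # (0.3333333333333333, 0.5)
```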

2606.05931 2026-06-05 cs.CL cs.AI cs.CV cs.IR cs.LG cs.MM eess.AS Updated version

To Be Multimodal or Not to Be: Query-Adaptive Audio-Visual Person Retrieval via Active Modality Detection

Erfan Loweimi, Mengjie Qian, Kate Knill, Guanfeng Wu, Chi-Ho Chan, Abbas Haider, Muhammad Awan, Josef Kittler, Hui Wang, Mark Gales

Affiliations: University of Cambridge; Queen's University Belfast; University of Surrey; Cisco; Southwest Jiaotong University; Teesside University

AI summary: Proposes a query-adaptive framework that detects active modalities from cross-modal score consistency, reaching 94.2% P@1 on the BBC Rewind corpus and outperforming unimodal and fixed-fusion systems.

Comments INTERSPEECH 2026

Abstract

When retrieving a person from a video archive by voice and face, should the system be multimodal or not? In real-world broadcast archives, unlike curated benchmarks, a target may be heard but unseen, seen but unheard, or both. Fusing scores from an absent modality injects noise, degrading precision below the best unimodal system. We propose a query-adaptive framework that detects active modalities via cross-modal score consistency: when both modalities are active, files retrieved by one also score highly on the other; this agreement breaks down when a modality is absent. Classifiers driven by these cross-modal features achieve 89% detection accuracy. On the BBC Rewind corpus (with over 12,000 broadcast videos), the adaptive system attains 94.2% P@1, outperforming speaker-only (82.9%), face-only (93.4%), and fixed fusion (90.0%), recovering 64% of the gap to an oracle with ground-truth modality labels (96.6%).

2606.05927 2026-06-05 cs.LG Updated version

Addressing Imbalance in Multi-Label Data via Label-Specific Distance-based Oversampling

Bin Liu, Jun Wu, Haoyu Peng, Ao Zhou, Jin Wang, QiaoSong Chen, Grigorios Tsoumakas

Affiliations: Key Laboratory of Data Engineering and Visual Computing, Chongqing University of Posts and Telecommunications, China; School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, China; State Key Laboratory of Novel Software Technology, Nanjing University, China; School of Informatics, Aristotle University of Thessaloniki, Greece

AI summary: Targeting label imbalance in multi-label classification, proposes LSDMLO, a label-specific distance-based oversampling method that identifies label-consistent neighbors in a weighted pertinent feature space to generate more useful synthetic instances; experiments show it outperforms existing methods.

Abstract

The complex imbalanced label distribution poses a crucial challenge to multi-label classification, as most classifiers are biased towards the majority class and high-frequency labels. Oversampling is an efficient and flexible solution that augments instances to provide a more balanced training dataset for multi-label classifiers. Most existing oversampling methods create synthetic instances heuristically, relying essentially on neighborhood information retrieved using Euclidean distance over the entire feature space. However, they fail to consider the varying semantic relevance of features to different labels, leading to label inconsistency among nearby neighbors and further introducing label confusion and overfitting to synthetic instances. To overcome this issue, we propose a novel sampling approach called Label-Specific Distance-based Multi-Label Oversampling (LSDMLO) that creates more useful and well-labeled synthetic instances to address the imbalance in multi-label datasets. LSDMLO derives a label-specific distance over the weighted pertinent feature space to identify label-consistent neighbors, which facilitates selecting seed instances that express more label correlations in boundary areas and generating synthetic instances aligned with the label distribution of the original data. Comprehensive experiments verify that the proposed LSDMLO outperforms state-of-the-art multi-label sampling approaches under various base classifiers.
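The key difference from Euclidean-distance oversampling is that neighbors are found under a per-label, feature-weighted metric. A minimal Python sketch follows, with clearly labeled assumptions: the per-feature `weights` are hypothetical inputs (the abstract does not say how LSDMLO computes the weighted pertinent feature space), and the generation step is classic SMOTE-style interpolation used as a stand-in, not LSDMLO's seed selection or label assignment.

```python
import math
import random


def label_distance(x, y, weights):
    """Feature-weighted Euclidean distance for one label (weights are hypothetical)."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def nearest_neighbors(seed, pool, weights, k):
    """Indices of the k pool instances closest to `seed` under the label's metric."""
    ranked = sorted(range(len(pool)), key=lambda i: label_distance(seed, pool[i], weights))
    return ranked[:k]


def synthesize(seed, neighbor, rng):
    """SMOTE-style interpolation: a random point on the segment seed -> neighbor."""
    gap = rng.random()
    return [s + gap * (n - s) for s, n in zip(seed, neighbor)]


seed = [0.0, 0.0]
pool = [[1.0, 0.0], [0.0, 2.0]]
print(nearest_neighbors(seed, pool, [1.0, 1.0], k=1))   # [0] with plain Euclidean distance
print(nearest_neighbors(seed, pool, [10.0, 0.1], k=1))  # [1] when feature 0 carries most weight
```

The toy shows the mechanism the abstract relies on: once features are weighted by their relevance to a label, the nearest neighbor can change, and with it the segment along which synthetic instances are placed.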

2606.05911 2026-06-05 cs.SD cs.LG eess.AS Updated version

DBHN-Net: Dual-Branch Hybrid Neural Network For Low-Complexity Monaural Speech Enhancement

Cunhang Fan, Enrui Liu, Jing Zhou, Jian Kang, Jie Li, Andong Li, Jian Zhou, Zhao Lv, Xuelong Li

Affiliations: State Key Laboratory of Opto-Electronic Information Acquisition and Protection Technology (School of Computer Science and Technology), Anhui University; China Telecom Artificial Intelligence Technology (Beijing) Co., Ltd; Institute of Acoustics, University of Chinese Academy of Sciences; Institute of Artificial Intelligence (TeleAI), China Telecom, China

AI summary: Proposes a dual-branch hybrid neural network combining ANN and SNN branches, using modules such as BandSplit and TF-Mamba to reduce computational complexity and interaction and fusion modules to preserve performance, achieving an average 7.5-fold complexity reduction across three public datasets.

Comments This article has been accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

DOI: 10.1109/TPAMI.2026.3698087
Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI 2026)

Abstract

Although artificial neural network (ANN) based speech enhancement (SE) methods demonstrate excellent performance, their high computational complexity and high energy consumption hinder their deployment in practical front-end processing tasks. Currently, spiking neural networks (SNNs) have shown potential in reducing power consumption. However, the discrete binary activation and complex spatio-temporal dynamics of SNNs often result in information loss. The current challenge therefore focuses on how to maintain performance while reducing computational complexity. To address this issue, this work proposes a Dual-Branch Hybrid Neural (DBHN) Network. 1) In terms of network architecture: a dual-branch network integrating ANN and SNN was designed, where the SNN branch reduces power consumption while the ANN branch addresses information loss; the BandSplit and Time-Frequency (TF)-Mamba modules were developed to simultaneously compress energy consumption and enhance model performance; Spiking Feature Extraction Group (SFEG) and Information Transformation Block (ITB) components were implemented with residual connections to mitigate information loss while further refining feature representations. 2) To facilitate inter-branch information fusion: an Interaction module was designed to promote information exchange at various stages of the dual-branch network; a TF-Cross Attention-Fusion module was designed to perform time-frequency domain fusion of dual-branch information while data-adaptively guiding the SNN branch to retain more critical information. Results show that the proposed model maintains superior performance across three public datasets while achieving an average 7.5-fold reduction in computational complexity compared to baseline models.
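The abstract attributes information loss to the discrete binary activations of SNNs. A generic leaky integrate-and-fire (LIF) neuron with hard reset, a common SNN building block, shows the effect in a few lines of Python. The abstract does not say which neuron model DBHN uses, so the update rule, `tau`, and `threshold` below are assumptions chosen for illustration.

```python
def lif_spikes(inputs, tau=2.0, threshold=1.0):
    """Leaky integrate-and-fire neuron returning a binary spike train.

    Each step the membrane potential v moves a fraction 1/tau toward the
    input; crossing `threshold` emits a spike and hard-resets v to 0.
    """
    v = 0.0
    spikes = []
    for x in inputs:
        v = v + (x - v) / tau  # leaky integration toward the input
        if v >= threshold:
            spikes.append(1)
            v = 0.0            # hard reset after a spike
        else:
            spikes.append(0)
    return spikes


print(lif_spikes([1.5] * 4))  # [0, 1, 0, 1]
print(lif_spikes([1.9] * 4))  # [0, 1, 0, 1]: a different input, the same spike train
```

Two different input intensities yield identical binary outputs, so the downstream layer cannot tell them apart; a parallel ANN branch with continuous activations is one way to carry that lost detail.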

2606.05899 2026-06-05 cs.LG cond-mat.dis-nn Updated version

High-Dimensional Theory of LoRA Fine-Tuning in a Solvable Attention Model

O. Duranthon, F. Boncoraglio, L. Zdeborová

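Affiliations: Statistical Physics of Computation Laboratory, École Polytechnique Fédérale de Lausanne (EPFL)

AI summary: Applies high-dimensional statistical theory to low-rank adaptation (LoRA) fine-tuning in a solvable attention model, revealing the interplay between pre-training and fine-tuning and giving exact asymptotic characterizations of the test error and representation alignment.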

2606.05895 2026-06-05 cs.CL cs.LG Updated version

Representing Research Attention as Contextually Structured Flows

Jessica Rodrigues, Angelo Salatino, Gard Jenset, Scott Hale

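Affiliations: University of Oxford; The Open University; Springer Nature

AI summary: Proposes attention flows as contextually structured representations that encode how research attention is organized and evolves over time; on an analogy-reasoning benchmark, flow representations better support structural comparison and improve robustness under partial observation and structural perturbation.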

Comments Accepted at STi 2026 - International Conference on Science and Technology Indicators

Abstract

Research attention is widely used as an indicator of visibility, influence, and societal uptake, yet it is typically represented as aggregated counts that do not preserve how attention develops across contexts over time. This creates a mismatch between how attention is interpreted and how it is represented. We propose attention flows as contextually structured representations that encode the organisation of attention and its evolution over time. We evaluate whether these representations capture transferable structure by constructing a benchmark based on analogy-style reasoning across research outputs. Comparing signal, sequence, and flow-based representations, we find that flow representations more effectively support structural comparison, particularly in settings where attention is shaped by temporal progression or context distributions. We further show that learned flow representations improve robustness under partial observation and structural perturbation. Overall, these results support modelling attention as a contextually structured phenomenon and provide a basis for more informative approaches to research evaluation.
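The mismatch described above (aggregated counts versus attention that develops across contexts over time) can be shown with a toy example. In this Python sketch, a plain context-by-period count matrix serves as a hypothetical stand-in for a flow; the paper's attention flows are learned representations whose construction the abstract does not specify, and the events are invented.

```python
from collections import Counter


def aggregate_count(events):
    """Conventional indicator: total number of attention events."""
    return len(events)


def attention_flow(events, contexts, periods):
    """Context-by-period matrix of event counts (a simple flow-like representation)."""
    counts = Counter(events)
    return [[counts[(c, t)] for t in range(periods)] for c in contexts]


# Invented (context, period) events for two research outputs.
paper_a = [("news", 0), ("news", 0), ("policy", 1)]
paper_b = [("policy", 0), ("news", 1), ("news", 1)]
contexts = ["news", "policy"]
print(aggregate_count(paper_a), aggregate_count(paper_b))  # 3 3
print(attention_flow(paper_a, contexts, 2))  # [[2, 0], [0, 1]]
print(attention_flow(paper_b, contexts, 2))  # [[0, 2], [1, 0]]
```

Both outputs have the same aggregate count, yet one moved from news into policy and the other in the opposite direction; only the structured representation keeps that difference.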

2606.05885 2026-06-05 cs.LG Updated version

When Denser Credit Is Not Enough: Evidence-Calibrated Policy Optimization for Long-Horizon LLM Agent Training

Yuanfan Li, Qi Zhou, Wenjing Duan, Lu Chen

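Affiliations: X-LANCE Lab, School of Computer Science, Shanghai Jiao Tong University, Shanghai, China; Faculty of Electronic and Information Engineering, Xi’an Jiaotong University

AI summary: Addressing credit assignment for long-horizon LLM agents under sparse, delayed rewards, proposes ECPO, a critic-free policy optimization algorithm that corrects the statistical unreliability of dense credit through evidence-calibrated action advantages and variance-gated credit weighting, improving over GiGPO by +5.2/+7.3 success points on ALFWorld/WebShop with Qwen2.5-1.5B.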

Abstract

Long-horizon LLM agents require reinforcement learning methods that can assign credit to intermediate decisions under sparse and delayed rewards. Recent group-based methods such as GiGPO improve over GRPO by constructing step-level advantages at repeated anchor states. However, we show that such dense credit can be statistically unreliable: under limited rollouts, rare but lucky actions may receive overly large advantages, producing divergent anchor bias and late-stage training oscillation. We propose Evidence-Calibrated Policy Optimization (ECPO), a critic-free policy optimization algorithm that calibrates step-level credit before policy updates. ECPO combines Evidence-Calibrated Action Advantage, which groups rollouts by canonical actions and shrinks low-count estimates, with Variance-Gated Credit Weighting, which suppresses anchor states dominated by within-action noise. Experiments on ALFWorld and WebShop with Qwen2.5-1.5B/7B show that ECPO consistently outperforms strong baselines, improving GiGPO by +5.2/+7.3 success points on ALFWorld/WebShop with Qwen2.5-1.5B while adding only 0.1% additional advantage-computation overhead.
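The two ingredients described above can be sketched in Python under explicit assumptions. The shrinkage factor `n / (n + kappa)` and the gate "between-action variance over total variance" are our guesses at a form consistent with the description (group by canonical action, shrink low-count estimates, suppress anchors dominated by within-action noise); they are not ECPO's published formulas, and `kappa` is a hypothetical parameter.

```python
from collections import defaultdict
from statistics import mean, pvariance


def group_by_action(rollouts):
    groups = defaultdict(list)
    for action, ret in rollouts:
        groups[action].append(ret)
    return groups


def calibrated_advantages(rollouts, kappa=2.0):
    """Per-action advantages at one anchor state, shrunk toward 0 when counts are low."""
    baseline = mean(ret for _, ret in rollouts)
    adv = {}
    for action, rets in group_by_action(rollouts).items():
        n = len(rets)
        # n / (n + kappa) -> 1 for well-sampled actions, -> 0 for rare ones.
        adv[action] = n / (n + kappa) * (mean(rets) - baseline)
    return adv


def variance_gate(rollouts):
    """Share of return variance explained by the action choice, in [0, 1]."""
    total = pvariance([ret for _, ret in rollouts])
    if total == 0:
        return 0.0
    within = sum(len(r) * pvariance(r) for r in group_by_action(rollouts).values())
    return 1.0 - within / (len(rollouts) * total)


# Four rollouts at one anchor state: one lucky "open", three "go".
rollouts = [("open", 1.0), ("go", 0.0), ("go", 0.0), ("go", 1.0)]
adv = calibrated_advantages(rollouts)
gate = variance_gate(rollouts)
weighted = {a: gate * v for a, v in adv.items()}
```

Here the single lucky "open" rollout would receive a raw advantage of 0.5; shrinkage cuts it to 1/6, and the gate of 1/3 further down-weights this anchor because most of its return variance comes from noise within the "go" action.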

2606.05873 2026-06-05 cs.RO cs.AI cs.CV cs.LG Updated version

LadderMan: Learning Humanoid Perceptive Ladder Climbing

Siheng Zhao, Yuanhang Zhang, Ziqi Lu, Pieter Abbeel, Rocky Duan, Koushil Sreenath, Yue Wang, C. Karen Liu, Guanya Shi

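Affiliations: Amazon FAR; USC; UC Berkeley; Stanford University; CMU

AI summary: Presents LadderMan, a system that uses a two-stage learning pipeline and vision foundation models to let humanoid robots robustly climb diverse ladders and perform on-ladder manipulation.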

Abstract

Humanoid robots hold great promise for operating in human-centered environments, yet ladder climbing remains one of the most challenging tasks due to sparse footholds and handholds, complex whole-body coordination, and sensitivity to perception and control errors. We present LadderMan, a unified system that enables humanoid robots to robustly climb diverse ladders and perform manipulation under such constrained conditions. Our climbing policy is built on a scalable two-stage learning pipeline, where we use hybrid motion tracking to learn multiple climbing experts from a single reference motion, and distill these experts into a unified depth-based visuomotor climbing policy via hybrid imitation and reinforcement learning. To enable real-world deployment, we leverage vision foundation models to bridge the sim-to-real gap in depth perception. Building on the learned climbing policy, we further train a separate manipulation policy using a dual-agent formulation, allowing stable on-ladder manipulation via teleoperation. Experiments demonstrate that LadderMan achieves robust ladder climbing across a wide range of geometries, successfully transfers to real-world hardware in a zero-shot manner, and supports various manipulation tasks under challenging ladder constraints. Video results are available at https://ladderman-robot.github.io .

2606.05870 2026-06-05 q-bio.NC cs.LG q-bio.QM Updated version

Cross-scale spatially-aware generative modeling of transcriptomic programs underlying neurodegenerative brain organization

Krishnakumar Vaithianathan

Affiliations: Department of Computer Engineering, Karaikal Polytechnic College, Karaikal, Puducherry, India

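AI summary: Proposes a cross-scale spatially-aware generative framework that combines a variational generative architecture with graph-based spatial smoothness regularization to learn latent biological programs linking regional gene expression to cortical degeneration, achieving high predictive accuracy (explained variance 0.8604, spatial correlation r = 0.9439).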

Comments 26 pages, 5 figures

Abstract

Neurodegenerative disorders such as Alzheimer's disease exhibit highly organized patterns of regional brain vulnerability, yet the biological mechanisms underlying this spatial selectivity remain incompletely understood. Existing imaging-transcriptomic studies have largely relied on correlation-based analyses between gene expression and neuroimaging phenotypes, limiting their ability to model how molecular organization gives rise to neurodegeneration. Here, we introduce a cross-scale spatially-aware generative framework for modeling transcriptomic programs underlying cortical neurodegeneration. Regional transcriptomic profiles were derived from the Allen Human Brain Atlas using 910 landmark genes across 68 cortical regions. Neurodegenerative vulnerability maps were constructed from ADNI FreeSurfer cortical thickness measurements by computing regional cortical thinning differences between cognitively normal controls (NC = 926) and Alzheimer's disease subjects (AD = 426). A variational generative architecture was used to learn latent biological programs linking regional gene-expression organization to cortical degeneration while incorporating graph-based spatial smoothness regularization to preserve cortical organization. The proposed framework achieved strong prediction of regional neurodegenerative vulnerability, yielding an explained variance of 0.8604 and a significant spatial correlation between predicted and observed cortical degeneration profiles (r = 0.9439, p < 0.001). The learned latent representations revealed structured transcriptomic organization associated with distributed disease susceptibility. These findings demonstrate that biologically constrained generative modeling can bridge microscale molecular organization with macroscale neurodegeneration, providing a foundation for spatially-aware generative neurobiology and computational neuroscience.
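The graph-based spatial smoothness regularization mentioned above is commonly written as a graph-Laplacian quadratic form. The sketch below assumes a symmetric region-adjacency weight matrix `W` and per-region latent codes `Z` (both hypothetical and randomly generated here); the abstract does not give the paper's exact regularizer, so this only illustrates the general idea that neighbouring regions with dissimilar latent codes are penalized.

```python
import numpy as np


def laplacian_smoothness(Z: np.ndarray, W: np.ndarray) -> float:
    """Return trace(Z^T L Z) with L = D - W, the graph Laplacian.

    For symmetric W with zero diagonal this equals the sum over unordered
    region pairs (i, j) of W[i, j] * ||Z[i] - Z[j]||^2.
    """
    L = np.diag(W.sum(axis=1)) - W
    return float(np.trace(Z.T @ L @ Z))


def pairwise_smoothness(Z: np.ndarray, W: np.ndarray) -> float:
    """Same quantity computed directly from pairwise differences (for checking)."""
    n = Z.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += W[i, j] * np.sum((Z[i] - Z[j]) ** 2)
    return float(total)


rng = np.random.default_rng(0)
n_regions, latent_dim = 68, 8  # 68 cortical regions as in the abstract; latent_dim is arbitrary
A = rng.random((n_regions, n_regions))
W = np.triu((A > 0.9).astype(float), 1)
W = W + W.T  # hypothetical symmetric adjacency with zero diagonal
Z = rng.standard_normal((n_regions, latent_dim))
penalty = laplacian_smoothness(Z, W)
```

Because the penalty is zero for spatially constant codes and grows with differences across adjacent regions, adding it to a VAE-style loss pushes the learned latent programs to vary smoothly over the cortical graph.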

2606.05863 2026-06-05 cs.LG cs.AI Updated version

Deciphering Two Training Clocks in Grokking via Deep Linear Network Theory with Conditional ReLU Reduction

Hu Tan, Kuo Gai, Shihua Zhang

Affiliations: State Key Laboratory of Mathematical Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; Shanghai Institute for Mathematics and Interdisciplinary Sciences (SIMIS), Shanghai, China; Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China

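AI summary: Formalizes grokking by separating the fast decay of the classification loss from the slow simplification of the learned representation, defining "two training clocks", and explains this two-stage process using deep linear network theory and a conditional ReLU reduction.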

Abstract

Grokking suggests that fitting the training data and learning a simple underlying rule may occur on different time scales. We formalize this phenomenon by separating the fast decay of the classification loss from the slower simplification of the learned representation, and we call the resulting pair of stopping times two training clocks. For deep linear networks, we show that a post-margin gap-growth or one-step tail-contraction condition reduces the cross-entropy loss to level ε on a logarithmic time scale. In contrast, when layerwise weight decay is present, the induced regularization on the end-to-end map can be expressed as a Schatten-type penalty; under a sharp late-time Kurdyka-Lojasiewicz tail, this structural energy closes on a polynomial time scale. The two clocks therefore separate fitting from representation simplification. We then explain how the same mechanism can appear in ReLU MLPs. In regions where the activation patterns on the training set remain fixed, the network reduces to a linear model in the active coordinates. In a two-layer ReLU embedding model, chain-rule estimates further show that the classifier head can receive larger effective gradients than the embedding block under controlled downstream norms. This supports a two-stage mechanism in which the classifier fits first, while the representation continues to simplify later. We use modular addition as the main experimental setting. The deep linear theory provides the rigorous core of the analysis, whereas the ReLU results are formulated as conditional reductions that account for empirical behavior without claiming a global proof for nonlinear training dynamics.
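For intuition on the Schatten-type penalty: a standard result for deep linear networks (not restated in the abstract) is that the smallest achievable sum of squared Frobenius norms over all depth-L factorizations of an end-to-end map W equals L times the sum of the singular values of W raised to the power 2/L, and a balanced factorization attains it. The NumPy sketch below builds that balanced factorization for square matrices only and checks the identity numerically; it illustrates the penalty, not the paper's proofs, and the helper names are hypothetical.

```python
import numpy as np


def balanced_factorization(W: np.ndarray, depth: int):
    """Split square W = U diag(s) V^T into `depth` factors with
    W_depth @ ... @ W_1 == W, giving each layer s ** (1 / depth)."""
    if depth == 1:
        return [W.copy()]
    U, s, Vt = np.linalg.svd(W)
    root = np.diag(s ** (1.0 / depth))
    factors = [root @ Vt]                                # W_1
    factors += [root.copy() for _ in range(depth - 2)]   # W_2 .. W_{L-1}
    factors.append(U @ root)                             # W_L
    return factors


def product(factors):
    """End-to-end map W_L @ ... @ W_1 (factors listed from W_1 upward)."""
    out = factors[0]
    for F in factors[1:]:
        out = F @ out
    return out


def weight_decay_energy(factors) -> float:
    """Layerwise weight-decay term: sum of squared Frobenius norms."""
    return sum(float(np.sum(F ** 2)) for F in factors)


def schatten_energy(W: np.ndarray, depth: int) -> float:
    """Induced Schatten-type penalty: depth * sum_i s_i ** (2 / depth)."""
    s = np.linalg.svd(W, compute_uv=False)
    return depth * float(np.sum(s ** (2.0 / depth)))


rng = np.random.default_rng(1)
W = rng.standard_normal((5, 5))
# Depth 2 recovers twice the nuclear norm; deeper networks give a
# quasi-norm (power 2/L < 1) that favours low-rank end-to-end maps.
energies = {L: schatten_energy(W, L) for L in (1, 2, 3, 4)}
```

The shrinking exponent 2/L is what makes layerwise weight decay act as a progressively stronger low-rank bias as depth grows, which is the kind of slow "representation simplification" the abstract contrasts with fast loss decay.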

2606.05817 2026-06-05 cs.LG cs.AI Updated version

Consistency Training Along the Transformer Stack

Sukrati Gautam, Neil Shah, Arav Dhoot, Bryan Maruyama, Caroline Wei, Rohan Kapoor, Robert Sidey, Prakhar Gupta, Zi Cheng Huang, David Demitri Africa

Affiliations: Purdue University; Independent; Columbia University; University of California, San Diego; University of California, Los Angeles; Dartmouth College; University of Michigan, Ann Arbor

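AI summary: Introduces consistency targets on MLP states and attention distributions, extends consistency training to several additional safety threats, and finds cross-threat generalization and a shared mechanism, demonstrating its effectiveness as a flexible alignment framework.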

Comments Submitted to EMNLP 2026

Abstract

Consistency training encourages models to behave similarly across different contexts and has shown promise for reducing misalignment. We broaden the scope of consistency training in two ways. First, we introduce two new internal consistency targets: MLP Consistency Training (MLPCT), which matches post-activation MLP states, and Attention Consistency Training (AttCT), which matches per-head attention distributions. Second, we apply consistency training to four additional safety threats: persona in-context learning attacks, adversarial frustration, prefill attacks, and conditional misalignment. Across several models and threat settings, we find that consistency training reduces misalignment well beyond the sycophancy and jailbreak settings studied in prior work. We also find cases of cross-threat generalization, where training against one failure mode improves robustness to another, and identify a shared residual-stream mechanism underlying Activation Consistency Training (ACT), MLPCT, and AttCT, while distinguishing Bias-augmented Consistency Training (BCT) as mechanistically distinct. Our results suggest that consistency training is a flexible and extensible framework for alignment, capable of unifying defenses against a broader class of model pathologies.
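A minimal sketch of the two internal consistency targets described above, assuming the post-activation MLP states and attention maps for a clean prompt and its wrapped counterpart (for example, the same request with a jailbreak or sycophancy cue) have already been extracted at aligned token positions. The array shapes and the choice of mean squared error and KL divergence are illustrative assumptions, not the paper's exact losses; random arrays stand in for real model internals.

```python
import numpy as np


def mlp_consistency_loss(clean_mlp: np.ndarray, wrapped_mlp: np.ndarray) -> float:
    """MSE between post-activation MLP states, shape (tokens, hidden).

    In training, the clean-prompt states would be treated as a fixed target."""
    return float(np.mean((wrapped_mlp - clean_mlp) ** 2))


def attention_consistency_loss(clean_attn, wrapped_attn, eps=1e-12) -> float:
    """Mean KL(clean || wrapped) over heads and query positions.

    Both arrays have shape (heads, queries, keys) with rows summing to 1."""
    p = np.clip(clean_attn, eps, 1.0)
    q = np.clip(wrapped_attn, eps, 1.0)
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return float(np.mean(kl))


def softmax(x, axis=-1):
    """Numerically stable softmax used to fake attention distributions."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


rng = np.random.default_rng(0)
clean_attn = softmax(rng.standard_normal((4, 6, 6)))    # 4 heads, 6 tokens (hypothetical)
wrapped_attn = softmax(rng.standard_normal((4, 6, 6)))
clean_mlp = np.maximum(rng.standard_normal((6, 32)), 0.0)  # ReLU-like post-activations
```

Both losses are zero exactly when the wrapped prompt reproduces the clean prompt's internals, which is the sense in which the model is trained to "behave similarly across contexts" at the level of MLP states or attention patterns rather than output tokens.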

2606.05814 2026-06-05 cs.LG Updated version

Robust and sparse support vector machine via hybrid truncated loss for supervised classification

Yuliang Yang, Chen Chen, Yuxiang Liu, Huiru Wang

Affiliations: School of Science, Beijing Forestry University; Translational Cancer Research Center, Peking University First Hospital

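AI summary: Proposes a sparse and bounded hybrid truncated loss L_ht, builds the L_ht-SVM model for single-view classification and extends it to the multi-view MvL_ht-SVM, achieves efficient optimization via P-stationary points and the alternating direction method of multipliers, and shows experimentally that it outperforms the compared methods in accuracy, sparsity, and robustness.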

Abstract

The support vector machine (SVM) is a widely used classifier, but choosing an appropriate loss function remains difficult. Convex losses such as the hinge loss and least-squares loss are sensitive to outliers, while bounded non-convex losses often lead to high computational cost. To address this, we propose a hybrid truncated loss function ($L_{\mathrm{ht}}$) that is both sparse and bounded, and build the $L_{\mathrm{ht}}$-SVM model for single-view classification. We introduce the P-stationary point and use it to establish the first-order necessary and sufficient optimality conditions. Based on these conditions, we design an alternating direction method of multipliers with a working-set strategy that reduces computational cost and achieves global convergence. We further extend $L_{\mathrm{ht}}$-SVM to multi-view learning by adding structural information and view weights, resulting in Mv$L_{\mathrm{ht}}$-SVM, which follows both the consensus and complementarity principles. Experiments on synthetic, real-world, and image datasets show that $L_{\mathrm{ht}}$-SVM achieves higher accuracy with fewer support vectors and better noise robustness than five single-view methods, while Mv$L_{\mathrm{ht}}$-SVM outperforms six multi-view methods in accuracy, precision, recall, and F1-score.
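The abstract does not spell out the form of L_ht. As a hedged stand-in, the sketch below uses the well-known truncated hinge (ramp) loss, which shares the two properties highlighted above: it is sparse (exactly zero once the margin reaches 1, so points beyond the margin are not support vectors) and bounded (capped at a constant, so one outlier cannot dominate the objective). It is not the paper's hybrid loss, and the cap value is an arbitrary choice.

```python
import numpy as np


def hinge(margin):
    """Convex hinge loss max(0, 1 - m): sparse but unbounded."""
    return np.maximum(0.0, 1.0 - margin)


def ramp(margin, cap=2.0):
    """Truncated hinge (ramp) loss min(max(0, 1 - m), cap): sparse and bounded."""
    return np.minimum(hinge(margin), cap)


def svm_objective(w, b, X, y, C=1.0, loss=hinge):
    """Primal SVM objective 0.5 * ||w||^2 + C * sum_i loss(y_i * (w . x_i + b))."""
    margins = y * (X @ w + b)
    return 0.5 * float(w @ w) + C * float(np.sum(loss(margins)))


margins = np.array([3.0, 1.0, 0.5, -1.0, -10.0])

# Toy data: two well-classified points and one gross outlier with margin -10.
X = np.array([[1.0, 0.0], [0.0, 1.0], [-10.0, 0.0]])
y = np.array([1.0, 1.0, 1.0])
w = np.array([1.0, 1.0])
b = 0.0
```

On the toy data the outlier contributes 11 to the hinge objective but only 2 to the ramp objective, which is the robustness argument in miniature; the price is non-convexity, which is why the paper needs tailored optimality conditions and an ADMM solver.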

2606.05800 2026-06-05 cs.LG Updated version

SALT: When More Rollouts Don't Help in Group-Based Policy Optimization and How to Make Them Matter

Powei Chang, Jinpeng Zhang, Chaoqun Sun, MiniWell Tsao, Lianrui Li, Jianxiang Xiang, Chenyu Wang, Yukang Gao, Dongying Kong

Affiliations: Bilibili Inc.; Fudan University; Zhejiang University

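AI summary: Addresses the gradient cancellation caused by increasing the number of rollouts under GRPO-style group normalization by proposing SALT, a component that reweights group-relative update coefficients in a subspace-adaptive way to improve update geometry and performance.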

Abstract

Reinforcement learning with verifiable rewards (RLVR) often adopts GRPO-style group-relative updates, sampling multiple rollouts per prompt to construct normalized learning signals. However, merely increasing the number of rollouts does not reliably strengthen learning: under GRPO-style group normalization, per-rollout policy-gradient features can concentrate into a low-rank, signed geometry, causing substantial cancellation during aggregation and weakening the effective update. We address this failure mode with SALT, a Subspace-Adaptive geometry pLug-in componenT that uses sample-wise gradient geometry to reweight the coefficients of group-relative updates. SALT estimates a dominant shared subspace from the mini-batch Gram geometry, decomposes group-relative coefficients into shared and residual channels, and adaptively amplifies the residual channel when signed cancellation is severe. Across diverse reasoning-oriented RLVR benchmarks and model scales, SALT improves effective update geometry and performance without modifying the reward model or the rollout sampling procedure.
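To make the cancellation issue concrete, the sketch below computes GRPO-style group-normalized advantages for one prompt, measures how much of the per-rollout gradient mass survives aggregation, and splits the coefficients into a component in the dominant subspace of the rollout Gram matrix plus a residual. The gradient features `G`, the rank `k`, and the amplification rule in `reweight` are hypothetical choices for illustration; the abstract does not give SALT's actual reweighting formula.

```python
import numpy as np


def group_advantages(rewards, eps=1e-6):
    """GRPO-style group normalization: (r - mean) / (std + eps) within one group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


def survival_ratio(G, coeffs):
    """||sum_i a_i g_i|| / sum_i |a_i| ||g_i||; values near 0 mean heavy cancellation."""
    agg = np.linalg.norm(coeffs @ G)
    mass = np.sum(np.abs(coeffs) * np.linalg.norm(G, axis=1))
    return float(agg / mass) if mass > 0 else 0.0


def shared_residual_split(G, coeffs, k=1):
    """Project coefficients onto the top-k eigenvectors of the Gram matrix G G^T."""
    K = G @ G.T                      # (n_rollouts, n_rollouts) Gram matrix
    _, eigvecs = np.linalg.eigh(K)   # eigenvalues in ascending order
    U = eigvecs[:, -k:]              # dominant shared subspace
    shared = U @ (U.T @ coeffs)
    return shared, coeffs - shared


def reweight(G, coeffs, k=1, gamma=1.0):
    """Hypothetical rule: amplify the residual channel more as cancellation worsens."""
    shared, residual = shared_residual_split(G, coeffs, k)
    boost = 1.0 + gamma * (1.0 - survival_ratio(G, coeffs))
    return shared + boost * residual


rng = np.random.default_rng(0)
rewards = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=float)  # binary verifiable rewards
a = group_advantages(rewards)
direction = rng.standard_normal(16)
# All rollouts share one feature direction (plus small noise), so the
# zero-mean signed advantages almost cancel when aggregated.
G = np.outer(np.ones(8), direction) + 0.05 * rng.standard_normal((8, 16))
```

In this toy group the survival ratio is close to zero: adding more rollouts along the same shared direction would only add more mass that cancels, which is the failure mode the abstract describes.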

2606.05799 2026-06-05 cs.LG cs.CL Updated version

Keep It Simple: One-Step Action Generation for Vision-Language-Action Models

Yitong Chen, Shiduo Zhang, Jingjing Gong, Xipeng Qiu

Affiliations: University of Science and Technology of China; Shanghai Innovation Institute; Fudan University

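AI summary: For vision-language-action (VLA) models, proposes biasing the training time distribution toward high-noise states, enabling one-step action generation without a teacher model, distillation, or auxiliary objectives, with performance matching ten-step decoding.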

Comments 20 pages, 10 figures

Abstract

Diffusion-based vision-language-action (VLA) models often inherit the image-generation view: actions are generated by iterative denoising. We argue that VLA action generation has a different condition-target structure: the policy is conditioned on rich observations, language, and state, but predicts only a compact, low-dimensional action chunk. Under this asymmetry, strong one-step action generation should not necessarily require the advanced one-step methods developed for image synthesis. We keep standard velocity prediction and add no teacher model, distillation stage, or auxiliary objective; in our main recipe, we simply bias the training time distribution toward high-noise states. We first isolate the effect in a controlled MNIST grid-to-sequence task, then test it with extensive robot-policy experiments. Across standard LIBERO, LIBERO-Plus, and LIBERO-Pro, one-step policies trained with high-noise biased schedules generally match ten-step decoding under the same recipe, and on standard LIBERO can exceed ten-step policies trained with a uniform time distribution. A real-robot bimanual YAM RSS evaluation gives a small-sample cross-architecture check of the same sampler trend. On a 1.4B VLM model with a 30M action head, one-step decoding reaches 95.6% on LIBERO-Long. These results show that strong one-step VLA action generation can emerge from standard diffusion training, without importing the full few-step diffusion machinery developed for image generation.
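A minimal sketch of the two ingredients, under assumptions the abstract does not state: a rectified-flow convention where x_t = (1 - t) * action + t * noise, so t = 1 is pure noise and the velocity target is noise - action; a Beta(alpha, 1) distribution with alpha > 1 as one possible high-noise-biased time sampler; and a single Euler step from t = 1 to t = 0 for decoding. `oracle_velocity` is a hypothetical stand-in for the learned, observation-conditioned action head, and the 8 x 7 action chunk shape is made up.

```python
import numpy as np


def sample_train_times(rng, n, alpha=3.0):
    """Beta(alpha, 1) times with mean alpha / (alpha + 1): skewed toward t = 1 (high noise)."""
    return rng.beta(alpha, 1.0, size=n)


def flow_matching_pair(action, noise, t):
    """Noisy input x_t and velocity-prediction target for one training example."""
    x_t = (1.0 - t) * action + t * noise
    v_target = noise - action
    return x_t, v_target


def one_step_decode(velocity_fn, noise):
    """Single Euler step from t = 1 to t = 0: x_0 = x_1 - 1.0 * v(x_1, t=1)."""
    return noise - velocity_fn(noise, 1.0)


rng = np.random.default_rng(0)
times = sample_train_times(rng, 100_000)

# Toy check with an oracle velocity for a fixed action chunk (horizon 8, 7 action dims).
action_chunk = rng.standard_normal((8, 7))


def oracle_velocity(x, t):
    """Exact velocity toward action_chunk under the linear interpolation path."""
    return (x - action_chunk) / t


noise = rng.standard_normal((8, 7))
decoded = one_step_decode(oracle_velocity, noise)
```

With alpha = 3 about three quarters of the training mass sits in the noisier half of the path, so the network gets most of its supervision near t = 1, which is exactly where a one-step decoder queries it.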

2606.05733 2026-06-05 cs.LG cs.CE q-fin.CP stat.ML Updated version

Zero-Copy Semantic Contagion: An In-Memory Streaming Architecture for Evolving Attention Graphs

Kabir Murjani

Affiliations: Department of Electrical Engineering, Nirma University

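AI summary: Proposes a heterogeneous Rust-Python streaming architecture that uses zero-copy parsing and a Neural Hawkes Process to build, and run inference over, a cross-company attention graph in real time, achieving a 1.70× precision lift over a random baseline on the FNSPID corpus.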

Comments Accepted to the 2026 ACM SIGMOD Workshop on Data Management for the Modern Financial Systems (FinDS). 10 pages, 4 figures

Abstract

Per-ticker forecasting models dominate financial time-series work yet remain blind to cross-company propagation: a foundry disruption in Taiwan does not register in a single-asset model until Apple's own price has already moved. To address this limitation, we introduce a heterogeneous Rust-Python streaming architecture that maps cross-company attention as a continuous-time graph driven directly from text. We show that on the ingestion side, a zero-copy Rust edge parses news records in ~100 ns and scans the target equity universe in ~1.2 μs. On the inference side, a multivariate Neural Hawkes Process featuring per-node continuous-time LSTM states and a bilinear latent projection propagates directed excitation, while an adaptive pruning rule bounds the computational cost of dynamic neighborhood updates. Combining these stages, we demonstrate an end-to-end processing latency of ~13 ms per incoming news record on a single commodity CPU. Evaluated on a one-month temporal holdout of the FNSPID corpus (638 articles across 47 tickers), the system delivers a 1.70× precision lift over random at the 90th-percentile next-day return threshold, and 3.36× over a same-sector baseline. Crucially, removing the graph topology collapses precision to zero, confirming that the dynamic attention network is the sole driver of cross-company signal in this architecture.
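The Neural Hawkes Process in the abstract replaces fixed excitation kernels with continuous-time LSTM states. For orientation only, the sketch below computes the intensity of a classical multivariate Hawkes process with exponential kernels, plus a simple magnitude-threshold pruning of weak edges. The tickers, base rates, excitation weights, and threshold are made up, and this is the classical model, not the paper's neural variant or its adaptive pruning rule.

```python
import math


def hawkes_intensity(t, events, mu, alpha, beta, min_alpha=0.0):
    """Classical multivariate Hawkes intensity with exponential kernels:

    lambda_i(t) = mu_i + sum_j sum_{t_k in events[j], t_k < t}
                  alpha[(i, j)] * exp(-beta * (t - t_k))

    events: dict node -> list of past event times (e.g. news timestamps).
    alpha: dict (i, j) -> excitation of node i by node j; edges with weight
    below min_alpha are pruned, so their neighbours are never scanned.
    """
    lam = {}
    for i in mu:
        total = mu[i]
        for j, times in events.items():
            a = alpha.get((i, j), 0.0)
            if a == 0.0 or a < min_alpha:
                continue  # absent or pruned edge
            total += sum(a * math.exp(-beta * (t - tk)) for tk in times if tk < t)
        lam[i] = total
    return lam


# Hypothetical two-node graph: news about TSM strongly excites AAPL, not vice versa.
mu = {"TSM": 0.1, "AAPL": 0.1}
alpha = {("AAPL", "TSM"): 0.5, ("TSM", "AAPL"): 0.01}
events = {"TSM": [1.0], "AAPL": [1.5]}
lam = hawkes_intensity(2.0, events, mu, alpha, beta=1.0)
lam_pruned = hawkes_intensity(2.0, events, mu, alpha, beta=1.0, min_alpha=0.05)
```

Directed excitation is what lets a supplier event raise a customer's intensity before the customer itself has any news, and pruning trades a small loss of weak signal for fewer neighbour scans per update.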

2606.05731 2026-06-05 cs.LG Updated version

Intercomparison of Machine Learning Algorithms for Remote Sensing-based In-season Crop Mapping

August Posch, Jitendra Kumar, Forrest M. Hoffman, Auroop R. Ganguly

Affiliations: Oak Ridge National Laboratory; Environmental Sciences Division; Northeastern University

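AI summary: Compares ten machine learning algorithms that use Harmonized Landsat-Sentinel reflectance time series and crop rotation history to map corn and almonds at 30 m resolution by early June, quantifies uncertainty due to phenology and crop distribution, and finds that Support Vector Machines perform best overall.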

Comments 22 pages, 8 figures

Abstract

In-season crop type mapping is critical for food security in the face of increasingly extreme climate-related threats to crops. Currently, the USDA Cropland Data Layer provides crop type labels at 30 m resolution and becomes available in the February after harvest, but no product maps crop types before harvest with the accuracy emergency managers would need to respond to crop threats in near real time. Furthermore, until this study, the relative advantages of a wide range of algorithms had not been evaluated in a way that accounts for interannual variability. Here, Harmonized Landsat-Sentinel surface reflectance imagery time series and crop rotation history information are combined to accurately map corn in Iowa and almonds in California at 30 m resolution by early June in unseen years, with robust quantification of uncertainty due to phenology and crop distribution. Thousands of model configurations across ten machine learning algorithms were compared using year-wise cross-validation and a suite of metrics. Hyperparameter search revealed Support Vector Machines to be the most successful algorithm overall, with a mean F1 score of 0.74 across five unseen validation years for almonds in California by early June (0.59 for corn in Iowa by early June). Interannual variation was a large source of uncertainty, but the patterns suggest that ensemble approaches or ancillary data could further improve performance. Future work may extend these methods to multiclass maps of all crop types, CONUS-wide maps, and in-season crop yield forecasting.
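Year-wise cross-validation holds out every sample from one growing season at a time, so each model is always scored on a year it never saw, which is what exposes interannual variability. A minimal pure-Python sketch with hypothetical sample years (scikit-learn's `LeaveOneGroupOut`, with years as groups, produces the same split pattern):

```python
from collections import defaultdict


def year_wise_splits(years):
    """Yield (held_out_year, train_indices, test_indices), one fold per distinct year."""
    by_year = defaultdict(list)
    for idx, year in enumerate(years):
        by_year[year].append(idx)
    for held_out in sorted(by_year):
        test = by_year[held_out]
        train = [i for y, idxs in by_year.items() if y != held_out for i in idxs]
        yield held_out, sorted(train), test


# Hypothetical pixel samples labelled with their acquisition year.
years = [2018, 2018, 2019, 2020, 2020, 2020, 2021, 2022]
folds = list(year_wise_splits(years))
```

A random pixel-level split would leak same-season weather and phenology into training and overstate accuracy; grouping by year keeps every test fold a genuinely unseen season.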

2606.05729 2026-06-05 cs.IT cs.LG math.IT Updated version

Automated Proving of Shannon-Type Entropy Inequalities via Fine-Tuned Language Models and Guided Tree Search

Shing Yin Wong, Shaocheng Liu, Linqi Song, Amin Gohari, Cheuk Ting Li

Affiliations: University of California, Berkeley

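AI summary: Automates proofs of Shannon-type entropy inequalities by fine-tuning small language models and combining them with guided beam search, reaching an 85% proof success rate on a test set with 10 to 15 variables.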

Abstract

Proving Shannon-type entropy inequalities is a fundamental task in information theory that often requires constructing non-trivial linear combinations of known constraints, a combinatorial search problem that scales poorly with the number of random variables. We investigate whether small-scale large language models (0.6B to 1.7B parameters), fine-tuned on atomic proof steps and combined with guided beam search, can automate this process. On a held-out test set of 60 inequalities spanning n=10 to 15 variables, our 0.6B fine-tuned model achieves an 85% proof success rate with tree search. GPT-5.5 with zero-shot prompting solves 1.7% of the samples, and Psitip solves 33.3%. A systematic ablation study across training context length (4096 vs. 8192 tokens) and data distribution (n=9-skewed vs. non-skewed) reveals that a 4096-token context with a non-skewed training distribution yields the best performance, with extended context and skewed data providing no marginal benefit. We further identify two dominant failure modes, format failures and step quality degradation, and verify via a controlled ablation that the beam-scoring heuristic is essential (random scoring reduces success from 83% to 23%).
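Shannon-type inequalities are the linear entropy inequalities implied by the basic inequalities, i.e. the non-negativity of (conditional) entropy and (conditional) mutual information. The sketch below proves nothing; it only evaluates one such inequality, I(X;Z|Y) >= 0, written in joint entropies as H(X,Y) + H(Y,Z) - H(X,Y,Z) - H(Y) >= 0, on random joint distributions. A numerical check of this kind is a cheap way to screen a candidate inequality before searching for a proof; the variable count, alphabet size, and helper names are arbitrary.

```python
import itertools
import math
import random


def entropy(pmf, subset):
    """Joint entropy in bits of the variables indexed by `subset`.

    pmf maps outcome tuples (x_0, ..., x_{n-1}) to probabilities."""
    marginal = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in subset)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)


def random_pmf(n_vars, alphabet, rng):
    """Random strictly positive joint distribution over n_vars variables."""
    outcomes = list(itertools.product(range(alphabet), repeat=n_vars))
    weights = [rng.random() for _ in outcomes]
    total = sum(weights)
    return {o: w / total for o, w in zip(outcomes, weights)}


def cond_mutual_info(pmf, x, z, y):
    """I(X;Z|Y) = H(X,Y) + H(Y,Z) - H(X,Y,Z) - H(Y) for index lists x, z, y."""
    return (entropy(pmf, x + y) + entropy(pmf, y + z)
            - entropy(pmf, x + y + z) - entropy(pmf, y))


rng = random.Random(0)
values = [cond_mutual_info(random_pmf(3, 2, rng), [0], [2], [1]) for _ in range(200)]

# Two sanity cases: independent bits (I = 0) and a copied fair bit (I = 1 bit).
px, py = [0.3, 0.7], [0.6, 0.4]
independent = {(i, j): px[i] * py[j] for i in range(2) for j in range(2)}
copied_bit = {(0, 0): 0.5, (1, 1): 0.5}
```

A proof, by contrast, must express the target as a non-negative combination of such elemental terms, which is the combinatorial search the fine-tuned model and beam search are meant to carry out.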

2606.05718 2026-06-05 cs.CV cs.AI cs.LG Updated version

ViCuR: Visual Cues as Recoverable Privilege for Multimodal On-Policy Distillation

Kanghui Tian, Siyuan Liu, Ziang Yan, Sheng Xia, Shuai Dong, Yi Wang

Affiliations: Shanghai AI Laboratory; Fudan University; Nanjing University

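AI summary: Proposes ViCuR, which replaces answer-side teacher privilege with visual cues from the input and adds a lightweight cue recovery module, addressing the train-test mismatch in multimodal on-policy distillation and improving student performance across seven benchmarks.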

Comments 25 pages, 11 figures. Preprint, under review

Abstract

On-policy distillation (OPD) improves reasoning by training a student on trajectories sampled from its own policy under supervision from a teacher. In multimodal reasoning, a common extension is to use a privileged teacher that observes training-time-only signals such as reference answers or rationales. However, such answer-side privilege creates a train-test mismatch: the teacher's supervision may depend on signals unavailable to the student, encouraging shortcut imitation rather than visually grounded reasoning. We propose ViCuR, a visually grounded privileged-teacher distillation framework that replaces answer-side privilege with visual cues (query-related evidence in the input). Because these cues are derived from the same visual input available at inference, their evidence is recoverable by the student. To support this, ViCuR introduces a lightweight cue recovery module that uses dedicated sink-token cross-attention during prefill to aggregate task-relevant visual evidence into an internal representation, without changing the inference interface or requiring auxiliary cue-generation losses. Across seven benchmarks with Qwen3-VL-2B and 8B students, ViCuR consistently improves over answer-based on-policy self-distillation by +1.19 and +1.24 on overall average performance. It also extends naturally to stronger-teacher OPD, surpassing OPD baselines by +0.64 and +1.08, with consistent out-of-domain gains at the 8B scale. These results show that, in multimodal on-policy distillation, the design of teacher privilege is as important as teacher strength.

2606.05714 2026-06-05 cs.CR cs.LG Updated version

Hybrid CNN-LSTM Framework for Intelligent Cyber Attack Detection and Prevention in U.S. Critical Digital Infrastructure: A Comparative Machine Learning Evaluation on CSE-CIC-IDS2018

Md. Iqbal Hossan, Md. Serajul Kabir Chowdhury Rubel, Md. Arifur Rahman, B. M. Taslimul Haque

Affiliations: Department of Computer Science, Maharishi International University; Department of Information Studies, Trine University; Department of Business Information Systems, Central Michigan University

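AI summary: Proposes a hybrid deep learning framework combining CNN and LSTM for cyber attack detection and prevention on the CSE-CIC-IDS2018 dataset, comparing several machine learning models to achieve accurate intrusion detection and automated prevention.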

Comments 25 pages, 9 figures, CSE CIC IDS2018 dataset, Hybrid CNN LSTM, cyber attack detection

DOI: 10.25163/ai.1110763
Journal ref: Journal of Ai ML DL, 1(1), 2025

Abstract

Digital infrastructure is growing rapidly in the United States, and as a result, critical sectors including healthcare, finance, transportation, energy, and government systems are increasingly exposed to advanced cyber threats. Traditional cybersecurity approaches, including signature-based intrusion detection systems, have become less effective against today's cyber attacks because they cannot detect unknown and evolving attacks in real time. To overcome these constraints, this research proposes a smart cyber-defense system that uses Artificial Intelligence (AI) and Machine Learning (ML) algorithms to detect and prevent cyber attacks in U.S. digital infrastructure. This study uses the CSE-CIC-IDS2018 dataset, a realistic network traffic dataset covering various cyber attack scenarios, including Distributed Denial of Service (DDoS), brute force attacks, botnets, infiltration attacks, and web-based attacks. Several machine learning and deep learning models, such as Random Forest, XGBoost, Convolutional Neural Networks (CNN), and Long Short-Term Memory (LSTM) networks, are implemented and evaluated for identifying malicious network behavior and improving intrusion detection accuracy. The proposed framework combines data preprocessing, feature engineering, real-time traffic monitoring, and intelligent threat classification with automated prevention mechanisms to build cybersecurity resilience.

2606.05704 2026-06-05 cs.AI cs.LG Updated version

Critic-Guided Heterogeneous Multi-Agent Reasoning for Reliable Mathematical Problem Solving

Muhammad Talha Sharif, Abdul Rehman

Affiliations: University of Science and Technology of China

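AI summary: Proposes a critic-guided heterogeneous multi-agent framework that uses a generator-verifier structure and an adaptive learning system to evaluate and steer the reasoning process through intermediate feedback, achieving up to a 13% accuracy improvement on the GSM8K benchmark while reducing reliance on large models.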

Comments 6 pages

Diff2SP: Diffusion Models for Relevant Scenario Generation in Stochastic Programming

Haixiang Sun, Andrew Liu

Affiliations: Purdue University

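AI summary: Proposes Diff2SP, a diffusion-based generative framework that embeds downstream optimization objectives into scenario generation and, backed by theoretical guarantees and empirical validation, balances statistical consistency with decision awareness.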

Abstract

Scenario generation is a critical component in stochastic programming (SP), as it directly influences the quality of decision-making under uncertainty. Existing approaches predominantly rely on either sampling-based techniques or supervised learning using neural networks. Sampling-based techniques often struggle to capture complex dependencies and rare but plausible events, while supervised learning requires fixed input-output pairs for training and is limited in its ability to generate a wide variety of realistic scenarios that are not restricted by predefined patterns or rules. To address these limitations, we introduce Diff2SP, a diffusion-based generative framework that incorporates downstream optimization objectives directly into scenario generation. Unlike conventional methods that treat scenario generation and decision-making as separate steps, Diff2SP embeds stochastic optimization into the training process, enabling the generation of scenarios that are both statistically coherent and decision-aware. To formally justify this optimization-aware design, we establish regret bounds that link distributional accuracy to decision quality, together with sample complexity guarantees showing faster convergence than traditional generative models such as GANs. Empirical results on both synthetic and power-system datasets validate these theoretical insights, demonstrating that Diff2SP consistently improves both statistical fidelity and downstream optimization outcomes.
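To illustrate why scenario quality is best judged by the decisions it induces, the sketch below solves a newsvendor problem by sample average approximation (SAA) on a set of scenarios and scores the resulting order quantity on held-out demand. The newsvendor model, its prices, the lognormal "true" demand, and the misspecified normal generator are all illustrative assumptions; Diff2SP itself trains a diffusion model with the optimization objective in the loop, which is not shown here.

```python
import numpy as np


def newsvendor_cost(q, demand, unit_cost=1.0, price=3.0):
    """Average cost of ordering q units: pay for q, earn revenue on min(q, demand)."""
    return float(np.mean(unit_cost * q - price * np.minimum(q, demand)))


def saa_order(scenarios, unit_cost=1.0, price=3.0):
    """Exact SAA minimizer over the scenarios.

    The empirical cost is piecewise linear in q with kinks at the scenario
    values (slope < 0 below the smallest, > 0 above the largest), so the
    minimum is attained at one of the scenarios."""
    candidates = np.unique(scenarios)
    costs = [newsvendor_cost(q, scenarios, unit_cost, price) for q in candidates]
    return float(candidates[int(np.argmin(costs))])


def decision_regret(scenarios, test_demand, unit_cost=1.0, price=3.0):
    """Held-out cost of the scenario-based order minus the best achievable held-out cost."""
    q = saa_order(scenarios, unit_cost, price)
    q_best = saa_order(test_demand, unit_cost, price)
    best = newsvendor_cost(q_best, test_demand, unit_cost, price)
    return newsvendor_cost(q, test_demand, unit_cost, price) - best


rng = np.random.default_rng(0)
test_demand = rng.lognormal(mean=3.0, sigma=0.5, size=5_000)
good_scenarios = rng.lognormal(mean=3.0, sigma=0.5, size=500)  # matches the true law
biased_scenarios = rng.normal(loc=20.0, scale=2.0, size=500)   # misses the right tail
r_good = decision_regret(good_scenarios, test_demand)
r_biased = decision_regret(biased_scenarios, test_demand)
```

The optimal order is a high quantile of demand (critical ratio 2/3 here), so a generator that looks plausible on average but misses the right tail produces clearly worse decisions; a decision-aware training signal penalizes exactly that kind of error.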

2606.05639 2026-06-05 cs.LG Updated version

Q-GNN: Query-Conditioned Graph Neural Networks with Type Awareness for Knowledge Graph Completion

Dongxiao He, Ruqiong Zhang, Zhizhi Yu, Ling Ding, Di Jin, Guangquan Xu, Zhiyong Feng

Affiliations: College of Intelligence and Computing, Tianjin University

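AI summary: Proposes Q-GNN, which incorporates the structural context and semantic type of the query entity to strengthen the reasoning of graph neural networks for knowledge graph completion.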

Abstract

Knowledge Graph Completion (KGC) aims to predict missing triplets in incomplete knowledge graphs, which is crucial for downstream applications. Recently, Graph Neural Network (GNN)-based methods have achieved remarkable success by performing message passing over query-centered local subgraphs. In practice, however, a query is jointly defined by an entity and a relation, both of which carry information indispensable for reasoning. Yet these methods rely solely on the query relation as the guiding signal and do not use the information inherent in the query entity to guide inference; the entity serves merely as a structural anchor for subgraph extraction. To this end, we incorporate query entity information into the reasoning process from two perspectives. The first is structural context, i.e., the neighboring structure and relation patterns around the entity, which is encoded by a dedicated context encoder and used to modulate messages. The second is the semantic type of the entity, inferred by a large language model, which is incorporated into attention computation and final scoring to provide type-level prior constraints. Together, these two sources of information enable the reasoning process to be guided by both the query relation and the query entity. Experimental results on standard benchmarks demonstrate the effectiveness of the proposed Q-GNN.

2606.05636 2026-06-05 cs.LG Updated version

StableRCA: Robust Graph-Agnostic Mechanism-Level Root Cause Analysis

Xiaoyu Lin, Nicholas Tagliapietra, Kehan Li, Lavdim Halilaj, Juergen Luettin

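Affiliations: Department of Computer Science, Tsinghua University; Bosch Center for Artificial Intelligence; Computer Science Department, TU Darmstadt

AI summary: Proposes StableRCA, which avoids global graph discovery by estimating local Markov boundaries and detecting conditional distribution shifts within them, enabling robust mechanism-level root cause analysis.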

Abstract

Root-Cause Analysis (RCA) seeks to identify the variables responsible for abnormal system behavior in complex domains such as manufacturing, cloud computing, and healthcare. Existing approaches face a critical bottleneck: graph-based causal methods can identify intervention targets but typically require a known or accurately estimated causal graph, while graph-free statistical methods either localize marginal anomalies rather than structural causes, or rely on restrictive assumptions about graph structure or functional form. We propose StableRCA, a local mechanism-level RCA framework that avoids global graph discovery by estimating local Markov boundaries and detecting conditional distribution shifts within them. Leveraging the Independent Causal Mechanism principle, we show that intervention targets can be identified with probability converging exponentially in sample size under faithful Markov boundary recovery and non-degenerate mechanism shifts. Experiments on synthetic benchmarks and five real-world datasets demonstrate that StableRCA is robust to graph misspecification, effective under multiple intervention targets, scalable to large systems, and reliable across diverse application domains. Code is available at: https://anonymous.4open.science/r/StableRCA-E362
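The intuition that an intervened variable changes its own conditional mechanism, while its descendants only shift marginally, can be seen in a toy linear chain X → Y → Z. The sketch below is not StableRCA: it assumes each variable's parent is known (StableRCA instead estimates local Markov boundaries) and uses a simple standardized shift in regression residuals as the test statistic.

```python
import random
import statistics

def fit_line(x, y):
    # Ordinary least squares for y = a + b * x.
    mx, my = statistics.fmean(x), statistics.fmean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def simulate(n, shift_y, rng):
    # Chain X -> Y -> Z; the fault adds an offset to Y's mechanism only.
    X = [rng.gauss(0, 1) for _ in range(n)]
    Y = [2 * x + shift_y + rng.gauss(0, 0.5) for x in X]
    Z = [-y + rng.gauss(0, 0.5) for y in Y]
    return {"X": X, "Y": Y, "Z": Z}

def mechanism_shift_scores(normal, abnormal, parents):
    scores = {}
    for var, pa in parents.items():
        if pa is None:
            # Root node: residual relative to the normal-data mean.
            mu = statistics.fmean(normal[var])
            res_n = [v - mu for v in normal[var]]
            res_a = [v - mu for v in abnormal[var]]
        else:
            # Fit the mechanism var | parent on normal data only.
            a, b = fit_line(normal[pa], normal[var])
            res_n = [v - (a + b * p) for v, p in zip(normal[var], normal[pa])]
            res_a = [v - (a + b * p) for v, p in zip(abnormal[var], abnormal[pa])]
        # Standardized shift of the conditional residual mean.
        scores[var] = abs(statistics.fmean(res_a)) / statistics.stdev(res_n)
    return scores

rng = random.Random(0)
normal = simulate(500, 0.0, rng)
abnormal = simulate(500, 3.0, rng)
parents = {"X": None, "Y": "X", "Z": "Y"}
scores = mechanism_shift_scores(normal, abnormal, parents)
root_cause = max(scores, key=scores.get)
marginal_shift_z = abs(statistics.fmean(abnormal["Z"]) - statistics.fmean(normal["Z"]))
```

The marginal of Z moves by roughly as much as Y does, so a marginal anomaly detector would flag both; the conditional check singles out Y.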

2606.05626 2026-06-05 cs.CL cs.AI cs.LG Updated

When New Generators Arrive: Lifelong Machine-Generated Text Attribution via Ridge Feature Transfer

Zhen Sun, Yifan Liao, Zhicong Huang, Jiaheng Wei, Cheng Hong, Yutao Yue, Xinlei He

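Affiliations: Wuhan University; Ant Group; The Hong Kong University of Science and Technology (Guangzhou); Institute of Deep Perception Technology, JITRI

AI summary: Addresses the difficulty of balancing continual adaptation to new generators against retention of old knowledge in lifelong machine-generated text attribution by proposing RidgeFT, a lightweight analytic update framework that uses covariance calibration and fixed random features to perform closed-form updates without exemplar replay.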

Comments 12 pages

Abstract

Machine-generated text (MGT) attribution aims to identify the specific generator responsible for a given text, thereby providing fine-grained evidence for model accountability and misuse investigation. As new large language models continue to emerge, attribution models must continuously incorporate new generators while preserving their ability to recognize previously seen ones. Prior works have shown that this lifelong MGT attribution setting is challenging, and existing methods often struggle to achieve a stable balance between adapting to new classes and retaining old ones. To address this issue, we propose RidgeFT, a lightweight analytic update framework that does not rely on exemplar replay. RidgeFT trains a task-aware encoder on the initial generator set, stores compact class-wise sufficient statistics when each generator class is first observed, and then freezes the encoder for replay-free closed-form updates. It further suppresses generator-irrelevant variation through covariance calibration, improves representation capacity with fixed random features, and updates new classes through closed-form ridge regression based on class-level sufficient statistics. Across multi-topic evaluations with varying initial generator setups, RidgeFT consistently outperforms baselines. It achieves the best macro-F1 across domains, backbones, and incremental protocols, while also improving both old-class retention and new-class adaptation. These results suggest that feature-stable analytic updates provide a simple yet effective approach to lifelong MGT attribution.
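Why replay is unnecessary: ridge regression only needs sufficient statistics, namely the Gram matrix of features and one feature sum per class (for one-hot targets). The sketch below accumulates those statistics task by task and solves (G + λI)W = C in closed form. It assumes a fixed random tanh projection as a stand-in for the paper's random features, synthetic 2-D inputs instead of encoder outputs, and omits covariance calibration; `RidgeStats` is a hypothetical name, not RidgeFT's code.

```python
import math
import random

def random_features(x, proj):
    # Fixed random projection + tanh, never trained; a constant 1.0 acts as a bias.
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in proj] + [1.0]

def solve(A, B):
    # Gauss-Jordan elimination with partial pivoting; returns X such that A X = B.
    n = len(A)
    M = [A[i][:] + B[i][:] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

class RidgeStats:
    """Hypothetical accumulator of class-wise sufficient statistics."""

    def __init__(self, dim, lam=1.0):
        self.dim, self.lam = dim, lam
        self.G = [[0.0] * dim for _ in range(dim)]  # sum of phi phi^T
        self.class_sums = {}                         # label -> sum of phi

    def update(self, feats, labels):
        for phi, y in zip(feats, labels):
            for i in range(self.dim):
                for j in range(self.dim):
                    self.G[i][j] += phi[i] * phi[j]
            s = self.class_sums.setdefault(y, [0.0] * self.dim)
            for i in range(self.dim):
                s[i] += phi[i]

    def weights(self):
        classes = sorted(self.class_sums)
        A = [[self.G[i][j] + (self.lam if i == j else 0.0) for j in range(self.dim)]
             for i in range(self.dim)]
        # With one-hot targets, column k of Phi^T Y is the feature sum of class k.
        C = [[self.class_sums[k][i] for k in classes] for i in range(self.dim)]
        return classes, solve(A, C)

    def predict(self, phi):
        classes, W = self.weights()
        scores = [sum(phi[i] * W[i][k] for i in range(self.dim)) for k in range(len(classes))]
        return classes[scores.index(max(scores))]

rng = random.Random(0)
proj = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(8)]
centers = {"a": (2.0, 0.0), "b": (-2.0, 0.0), "c": (0.0, 2.0)}

def sample(label, n=20):
    cx, cy = centers[label]
    pts = [(cx + rng.gauss(0.0, 0.3), cy + rng.gauss(0.0, 0.3)) for _ in range(n)]
    return [random_features(p, proj) for p in pts], [label] * n

fa, ya = sample("a")
fb, yb = sample("b")
fc, yc = sample("c")

incremental = RidgeStats(dim=9)
incremental.update(fa + fb, ya + yb)   # initial generator set
incremental.update(fc, yc)             # new generator "c": no replay of a/b samples
joint = RidgeStats(dim=9)
joint.update(fa + fb + fc, ya + yb + yc)
```

Because the statistics are additive, the incrementally updated weights coincide with those fit on all data at once.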

2606.05625 2026-06-05 cs.AI cs.LG Updated

Self-Commitment Latency: A Reward-Free Probe for Prompted Implicit Hacking

Bonan Shen, Youting Wang, Dingyan Shang, Tao Ning

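Affiliations: Stanford University; Tsinghua University

AI summary: Proposes the self-commitment latency metric, which detects prompted implicit hacking without a reward signal by measuring when a reasoning context commits to the model's own final answer, reaching AUROC 0.878-0.926 on GSM8K.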

Abstract

Implicit reward hacking is hard to audit when a language model's chain of thought appears benign: a final answer may be anchored by a prompt shortcut while the written reasoning still resembles ordinary problem solving. Verifier-based probes expose such behavior by measuring how early truncated reasoning contexts obtain high reward, but require a task-specific reward signal. This paper proposes a weaker-input alternative, self-commitment latency, which measures how early a prompted reasoning context commits to the model's own final answer. We evaluate the probe in a controlled paired GSM8K setting using Qwen2.5-3B-Instruct-4bit, comparing ordinary prompts with prompts that include an answer hint. Hinted contexts commit substantially earlier and with lower uncertainty than honest contexts. The primary latency metric, first-commitment latency at threshold 0.8, reaches AUROC 0.878; supporting whole-curve summaries reach AUROC 0.926 for commitment range and 0.904 for mean uncommitted mass. The signal is stronger when both prompt conditions answer correctly and remains stable across thresholds. These results show that shortcut-available reasoning contexts can leave an early behavioral commitment signature detectable without a reward model, external judge, or trained classifier.
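Both quantities in the abstract can be stated precisely. Assuming each trace yields a commitment curve (the model's probability of its own final answer after truncating the reasoning at increasing fractions), first-commitment latency at threshold 0.8 is the first truncation fraction where that probability reaches 0.8, and AUROC can be computed with the Mann-Whitney rank statistic. The curves below are made-up illustrative numbers, not data from the paper.

```python
def first_commitment_latency(fractions, probs, threshold=0.8):
    """Earliest truncation fraction at which P(final answer) >= threshold.

    Returns 1.0 if the context never commits before the full trace.
    """
    for f, p in zip(fractions, probs):
        if p >= threshold:
            return f
    return 1.0

def auroc(pos_scores, neg_scores):
    # Mann-Whitney estimate of P(score_pos > score_neg); ties count 1/2.
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))

fractions = [0.0, 0.25, 0.5, 0.75, 1.0]
hinted = [[0.85, 0.9, 0.95, 0.97, 1.0], [0.6, 0.82, 0.9, 0.95, 1.0]]
honest = [[0.1, 0.3, 0.5, 0.85, 1.0], [0.2, 0.4, 0.7, 0.9, 1.0]]

# Earlier commitment should flag a hinted context, so negative latency is the score.
lat_hinted = [first_commitment_latency(fractions, c) for c in hinted]
lat_honest = [first_commitment_latency(fractions, c) for c in honest]
score = auroc([-l for l in lat_hinted], [-l for l in lat_honest])
```

Here the hinted traces commit at 0.0 and 0.25 and the honest ones at 0.75, so the toy AUROC is 1.0.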

2606.05618 2026-06-05 nlin.CD cs.LG math.DS Updated

Uncovering Extreme Event Mechanisms for Prediction and Control with Sensitivity-Balanced Projections

Nicholas Zolman, Sajeda Mokbel, Samuel E. Otto, Steven L. Brunton

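Affiliations: Department of Mechanical Engineering, University of Washington; AI Institute in Dynamic Systems, University of Washington; Sibley School of Mechanical and Aerospace Engineering, Cornell University

AI summary: Proposes an interpretable method based on covariance balancing reduction using adjoint snapshots (CoBRAS) that replaces adjoint computations with automatic differentiation and identifies sensitivity-balanced projections revealing the mechanisms behind extreme events, which are then used for data-driven forecasting and event-suppression control.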

Comments 12 pages, 6 figures (main text). Additional 14 pages of references and Supplementary Information

Abstract

Extreme events -- such as earthquakes and coronal mass ejections -- are common in many chaotic dynamical systems, yet are difficult to characterize and predict due to the subtle instability mechanisms that drive them. In this work, we develop an interpretable technique that reveals the underlying mechanisms behind extreme events and uses them to build data-driven forecasts and intuitive event suppression controllers. In particular, we utilize the covariance balancing reduction using adjoint snapshots (CoBRAS) method to identify linear oblique projections that best capture the sensitivity of a quantity of interest and reconstruct the original state. Importantly, we bypass the need for cumbersome adjoint calculations, instead using backpropagation via modern automatically differentiable numerical frameworks. To accommodate spatially localized events, we also introduce a new variant of CoBRAS to obtain local sensitivity-balanced projections. We demonstrate the utility of this approach to characterize extreme events across a diverse set of challenging systems, including turbulent bursts of energy dissipation in the 2D Kolmogorov Flow, spontaneous synchronization in networks of coupled FitzHugh-Nagumo oscillators, and the localized formation of ocean rogue waves from a modified nonlinear Schrödinger equation. For each example, we show that our simple forecast models accurately predict extreme events and that the underlying mechanisms may be used to design control laws to prevent these events. Finally, we demonstrate that by learning a neural network surrogate model of the dynamics directly from data, we may extend this approach to experimental systems and systems that are not natively written in an automatically differentiable programming language.

2606.05609 2026-06-05 cs.CR cs.AI cs.LG Updated

SlotGCG: Exploiting the Positional Vulnerability in LLMs for Jailbreak Attacks

Seungwon Jeong, Jiwoo Jeong, Hyeonjin Kim, Yunseok Lee, Woojin Lee

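Affiliations: Dongguk University-Seoul

AI summary: Proposes SlotGCG, which quantifies the vulnerability of different insertion positions (slots) in a prompt with a Vulnerable Slot Score (VSS) and inserts adversarial tokens at the most vulnerable positions, substantially raising the success rate of optimization-based jailbreak attacks.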

Journal ref: International Conference on Learning Representations (ICLR), 2026

Abstract

As large language models (LLMs) are widely deployed, identifying their vulnerabilities through jailbreak attacks becomes increasingly critical. Optimization-based attacks such as Greedy Coordinate Gradient (GCG) have focused on inserting adversarial tokens at the end of prompts. However, GCG restricts adversarial tokens to a fixed insertion point (typically the prompt suffix), leaving the effect of inserting tokens at other positions unexplored. In this paper, we empirically investigate slots, i.e., candidate positions within a prompt where tokens can be inserted. We find that vulnerability to jailbreaking is highly related to the choice of slots. Based on these findings, we introduce the Vulnerable Slot Score (VSS) to quantify positional vulnerability to jailbreaking. We then propose SlotGCG, which evaluates all slots with VSS, selects the most vulnerable slots for insertion, and runs a targeted optimization attack at those slots. Our approach provides an attack-agnostic position-search mechanism that can be plugged into any optimization-based attack, adding only 200 ms of preprocessing time. Experiments across multiple models demonstrate that SlotGCG significantly outperforms existing methods. Specifically, it achieves 14% higher Attack Success Rates (ASR) than GCG-based attacks, converges faster, and shows superior robustness against defense methods, with 42% higher ASR than baseline approaches. Our implementation is available at https://github.com/youai058/SlotGCG

2606.05606 2026-06-05 cs.LG cs.AI math.OC Updated

Autoregressive Diffusion World Models for Offline Evaluation of LLM Agents

Kaixuan Liu, Guojun Xiong, Weinan Zhang, Shengpu Tang

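Affiliations: Department of Computer Science, Emory University; School of Computer Science, Shanghai Jiao Tong University

AI summary: Proposes ADWM, which uses an autoregressive diffusion world model to simulate environment responses from pre-collected trajectories, enabling offline evaluation of LLM agent policies without online interaction.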

Abstract

Evaluating large language model (LLM) agents in multi-turn interactive environments is expensive and risky, as it requires online environment interaction. We propose ADWM (Autoregressive Diffusion World Model), an evaluation framework that estimates the performance of a new LLM agent policy purely from pre-collected trajectories. The core idea is to learn a latent diffusion world model that simulates how the environment responds to the evaluation policy, without ever executing it in the real environment. Existing diffusion-based off-policy evaluation (OPE) methods guide full trajectories in a single pass by jointly diffusing states and actions, an assumption that breaks down for LLM agents whose actions are discrete text that must be sampled from the policy after observing the environment. Unlike autoregressive world models that suffer from compounding errors, ADWM models each transition as an independent denoising process, enabling reliable step-by-step rollouts in which the world model and agent alternate in causal order. Crucially, the LLM agent under evaluation directly guides the diffusion generation at each step via a policy-conditioned score function, ensuring that simulated trajectories accurately reflect its decision-making patterns. Empirically, ADWM achieves accurate value estimates and evaluation reliability across diverse multi-turn agent tasks, demonstrating its promise as a practical framework for offline LLM agent evaluation.
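The alternating rollout reduces to a short loop: the world model proposes the next observation and reward, the agent under evaluation acts only after seeing it, and the policy value is the average discounted return over simulated episodes. In this sketch, `toy_world_model` and `toy_policy` are hypothetical stand-ins; ADWM's per-step denoising and policy-conditioned score guidance are not modeled.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical interfaces: a world model maps (history, action, rng) to
# (next_observation, reward, done); a policy maps the history to an action.
WorldModel = Callable[[List, str, random.Random], Tuple[str, float, bool]]
Policy = Callable[[List], str]

def estimate_value(world_model: WorldModel, policy: Policy, initial_obs: str,
                   episodes: int = 100, horizon: int = 10,
                   gamma: float = 0.99, seed: int = 0) -> float:
    rng = random.Random(seed)
    total = 0.0
    for _ in range(episodes):
        history = [initial_obs]
        ret, discount = 0.0, 1.0
        for _ in range(horizon):
            action = policy(history)                 # agent acts after observing
            obs, reward, done = world_model(history, action, rng)
            history += [action, obs]                 # causal order: obs, action, obs, ...
            ret += discount * reward
            discount *= gamma
            if done:
                break
        total += ret
    return total / episodes

def toy_world_model(history, action, rng):
    # Stand-in for a learned simulator: "submit" ends the episode with reward 1.
    if action == "submit":
        return "done", 1.0, True
    return "page", 0.0, False

def toy_policy(history):
    # Searches once, then submits.
    return "search" if len(history) < 3 else "submit"

value = estimate_value(toy_world_model, toy_policy, "start", gamma=0.9)
```

The toy agent earns its reward on the second step, so the estimate is 0.9 = γ · 1.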

2606.05555 2026-06-05 cs.LG cs.AI Updated

Representation Learning Enables Scalable Multitask Deep Reinforcement Learning

Johan Obando-Ceron, Lu Li, Scott Fujimoto, Pierre-Luc Bacon, Aaron Courville, Pablo Samuel Castro

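Affiliations: Mila – Québec AI Institute; Université de Montréal; McGill University; CIFAR AI Chair; Google DeepMind

AI summary: Evaluates MR.Q, a model-free algorithm that combines predictive representation learning with high-capacity value function approximation, and shows that without planning it outperforms world-model-based methods and several deep reinforcement learning baselines on multitask continuous control while substantially reducing computational overhead.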

Abstract

Scaling reinforcement learning (RL) to diverse multitask settings remains a central challenge. While recent advances in model-based RL achieve strong performance, they rely on planning and complex training pipelines, making it unclear which components are essential for scalability. We revisit this question and argue that the primary driver of scalable multitask RL is not model-based control, but representation learning. In particular, we show that combining predictive, model-based representations with high-capacity value function approximation is sufficient to achieve strong performance, even without planning. We evaluate MR.Q, a simple model-free algorithm that couples auxiliary predictive objectives with a scalable actor-critic architecture. This approach outperforms a recent world-model-based method and a range of deep RL baselines across a diverse suite of multitask continuous control tasks, while significantly reducing computational overhead and improving wall-clock efficiency. We observe consistent improvements with increased model capacity and show through ablations that predictive representation learning is critical for performance.

2606.05552 2026-06-05 cs.LG cs.AI cs.GR Updated

Balancing Image Compression and Generation with Bootstrapped Tokenization

Haozhe Chi, Jinghan Li, Hao Jiang, Wu Sheng, Yi Ma, Jing Wang, Yadong Mu

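Affiliations: Peking University; Central Media Technology Institute, Huawei

AI summary: Proposes SelfBootTok, which uses self-bootstrapped learning to decompose image information into global and local token groups so that the generator relies only on global tokens, cutting computation by about 40% while improving reconstruction and generation, and setting a new record gFID of 1.56 with 64 tokens.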

Abstract

Despite progress in image tokenization, standard methods mix all granularities of information within each token, so redundant information persists across tokens. Mixing information of different granularities also complicates generator training. This paper introduces SelfBootTok, a method that resolves this by cleanly decomposing information into global and local token groups. Through self-bootstrapped learning, the model predicts local details exclusively from global tokens, shifting the burden of visual details from the generator to the tokenizer. Consequently, our generator is far more efficient, requiring only global tokens and reducing computation by approximately 40%, while delivering superior reconstruction and generation. Moreover, this paradigm scales elegantly: by leveraging more data or parameters to self-supervise local representation learning, SelfBootTok achieves a new state-of-the-art gFID score of 1.56 using only 64 tokens.

2606.05538 2026-06-05 cs.LG cs.CL Updated

Dominant-Layer ZO: A Single Layer Dominates Zeroth-Order Fine-Tuning of Large Language Models

Wanhao Yu, Ziyan Wang, Zheng Wang, Abeer Matar Almalky, Yihang Zuo, Shuteng Niu, Sen Lin, Adnan Siraj Rakin, Deliang Fan, Li Yang

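Affiliations: University of North Carolina at Charlotte; University of Houston; State University of New York at Binghamton; Arizona State University; Department of Artificial Intelligence and Informatics, Mayo Clinic

AI summary: Finds that when LLMs are fine-tuned with zeroth-order optimization, a single decoding layer dominates performance, so fine-tuning only that layer matches or exceeds full-model fine-tuning; the layer can be identified from activation outliers, and the paper explains the underlying mechanism.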

Abstract

Zeroth-order (ZO) optimization enables memory-efficient fine-tuning of large language models (LLMs) using only forward passes, but it remains unclear how useful adaptation is distributed across layers. In this work, we reveal a surprising phenomenon: ZO fine-tuning is sharply dominated by a single decoding layer. Across multiple LLM families and downstream tasks, fine-tuning this dominant layer alone consistently matches or even exceeds full-model ZO fine-tuning. We further show that the dominant layer is task-agnostic but model-specific, and can be identified before training through a simple inference-only analysis of activation outliers. Specifically, the dominant layer consistently aligns with the first activation-outlier layer in the pre-trained model. To explain this phenomenon, we analyze how perturbation effects propagate under ZO optimization. We find that the dominant layer combines two key properties: high perturbation sensitivity and early placement in the residual stream, allowing perturbation-induced effects to propagate and accumulate through the subsequent decoding layers. As a result, this layer produces disproportionately strong and stable optimization signals under forward-only updates. Extensive experiments on LLaMA2-7B and Qwen3-8B across nine benchmarks show that dominant-layer ZO fine-tuning improves average performance over full-model MeZO and LoRA-based ZO fine-tuning while achieving up to 4.52× training speedup.
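A MeZO-style zeroth-order step needs only two forward passes: perturb the trainable parameters by ±εz along a random direction z, take the central finite difference of the loss, and step along z. Restricting fine-tuning to one layer just means sampling z for that layer's parameters only. In this toy, the "model" is a dict of named parameter lists with a quadratic loss, and the outlier rule (peak |activation| above 6× the median peak) is an illustrative assumption, not the paper's criterion.

```python
import random
import statistics

def first_outlier_layer(peak_abs_activation, ratio=6.0):
    """Index of the first layer whose peak |activation| exceeds ratio x the median peak."""
    med = statistics.median(peak_abs_activation)
    for i, a in enumerate(peak_abs_activation):
        if a > ratio * med:
            return i
    return None

def loss(params):
    # Toy objective: squared distance of every parameter to 1.0.
    return sum((w - 1.0) ** 2 for layer in params.values() for w in layer)

def zo_step(params, layer, lr=0.05, eps=1e-3, rng=random):
    """One SPSA/MeZO-style update that perturbs and updates only `layer`."""
    z = [rng.gauss(0.0, 1.0) for _ in params[layer]]
    base = params[layer]
    params[layer] = [w + eps * zi for w, zi in zip(base, z)]
    l_plus = loss(params)                    # forward pass 1
    params[layer] = [w - eps * zi for w, zi in zip(base, z)]
    l_minus = loss(params)                   # forward pass 2
    g = (l_plus - l_minus) / (2 * eps)       # projected gradient estimate
    params[layer] = [w - lr * g * zi for w, zi in zip(base, z)]
    return g

layer_names = ["layer0", "layer1", "layer2"]
# Hypothetical peak activations recorded during one inference-only pass.
dominant = layer_names[first_outlier_layer([1.0, 12.0, 1.5])]

rng = random.Random(0)
params = {name: [0.0] * 4 for name in layer_names}
start = loss(params)
for _ in range(200):
    zo_step(params, dominant, rng=rng)
end = loss(params)
```

Only the selected layer moves; in a real LLM the claim is that this layer's updates also carry most of the useful signal, which a quadratic toy cannot show.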

2606.05497 2026-06-05 cs.LG Updated

LEVANTE-bench: Multi-Scale Comparison of VLMs to Children Using Cognitive Tasks (or, "Is Your VLM Smarter Than a 5th Grader?")

Alvin Wei Ming Tan, David Cardinal, Tania Lorido-Botran, Laura Bravo-Sanchez, Sunny Yu, Michael C. Frank

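Affiliations: Stanford University

AI summary: Introduces LEVANTE-bench, a benchmark built on children's cognitive-task data that systematically evaluates, at multiple scales, how well vision-language models align with children aged 5-12 on six tasks, finding that the models align only partially with human cognition.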

Abstract

Given the inherently multimodal nature of human experience, vision-language models (VLMs) hold substantial promise for modeling human cognition as it grows and develops with experience. Realizing their potential requires tools for comparing VLMs with human cognitive development across tasks, ages, and populations. We present LEVANTE-bench, a benchmark based on tasks and data from the Learning Variability Network (LEVANTE), which distributes open-source tasks and data measuring children's cognition across languages and cultures. In LEVANTE-bench, we systematically assess VLMs on six tasks, comparing their alignment with children aged 5-12 (N = 1547) across three countries. We compare models at multiple scales, assessing their overall accuracy, their task- and item-level alignment with children, and how well they match children's trial-level error distributions. Alignment was heterogeneous across scales: at the level of tasks and items, more capable models aligned better with humans. However, match to human error distributions varied widely across tasks, and for several tasks, smaller models matched younger children's errors better. In addition, even the best-performing VLMs struggled on matrix reasoning and mental rotation tasks. Thus, current VLM architectures align only partially with the cognitive abilities of children.
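One common way to quantify how well a model matches trial-level error distributions is a divergence between answer-choice distributions on the same item. The sketch below uses Jensen-Shannon divergence; the choice of metric and the response counts are assumptions for illustration, not LEVANTE-bench's actual scoring.

```python
import math

def normalize(counts):
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def js_divergence(p_counts, q_counts):
    """Jensen-Shannon divergence (base 2, range [0, 1]) between two response distributions."""
    keys = set(p_counts) | set(q_counts)
    p, q = normalize(p_counts), normalize(q_counts)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # KL(a || m); options with zero probability contribute nothing.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)

# Hypothetical item whose correct answer is "A": counts of each chosen option.
children = {"A": 60, "B": 30, "C": 10}
model_x = {"A": 6, "B": 3, "C": 1}    # same error pattern as the children
model_y = {"A": 9, "C": 1}            # never makes the children's most common error (B)
d_x = js_divergence(children, model_x)
d_y = js_divergence(children, model_y)
```

A lower divergence means a closer match to children's errors, regardless of overall accuracy.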

2606.05488 2026-06-05 stat.ML cs.LG stat.ME Updated

Towards Unified and Data-Efficient Prognostics and Health Management with Tabular Foundation Models

Raffael Theiler, Lev Telyatnikov, Leandro Von Krannichfeldt, Olga Fink

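Affiliations: IMOS Lab, EPFL

AI summary: Proposes applying tabular foundation models to industrial time series via in-context learning for prognostics and health management (PHM) tasks; the models perform well in low-data regimes and outperform sequence models and gradient-boosted trees.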

Abstract

Data-driven Prognostics and Health Management (PHM) uses time-varying condition-monitoring data to diagnose system states and estimate remaining useful life in engineered assets. These tasks are central to maintenance planning, but industrial PHM data are often fragmented, partially observed, and poorly labeled, which hinders supervised learning. Foundation models offer a route toward reusable predictive systems, yet most time-series foundation models are designed for forecasting and assume long, coherent, regularly sampled sequences. To address this gap, we propose a framework for applying Tabular Foundation Models to industrial time series using in-context learning, and we evaluate them on a variety of PHM tasks. By converting raw unit-level signals into tabular rows, we show that these models perform well across multiple tasks (including prognostics and diagnostics) and are highly data efficient. We compare them directly with sequence models, transformer baselines, and gradient-boosted trees under a common evaluation protocol. The results indicate that tabular foundation models achieve the best average ranks across prognostic and diagnostic tasks. Our findings further show that PFN-based models are competitive in low-data regimes, that temporal context can be preserved in the tabular representation, and that performance depends on representative context construction under subsampling. These results demonstrate that tabular foundation models provide a practical and general interface for heterogeneous PHM problems.
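Converting a raw unit-level signal into tabular rows usually means summarizing a trailing window at each time step and attaching the target. The sketch below does this for remaining-useful-life (RUL) labels on a run-to-failure signal; the window length and the features (mean, standard deviation, last value, slope) are assumptions, not the paper's feature set.

```python
import statistics

def window_features(window):
    # Summary statistics of one trailing window (requires len(window) >= 2).
    n = len(window)
    xs = range(n)
    mx = (n - 1) / 2
    my = statistics.fmean(window)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, window)) / sum((x - mx) ** 2 for x in xs)
    return {"mean": my, "std": statistics.pstdev(window), "last": window[-1], "slope": slope}

def unit_to_rows(unit_id, signal, window=5):
    """Turn one unit's run-to-failure signal into tabular rows with RUL labels."""
    rows = []
    end = len(signal) - 1                 # last cycle observed before failure
    for t in range(window - 1, len(signal)):
        row = {"unit": unit_id, "cycle": t}
        row.update(window_features(signal[t - window + 1 : t + 1]))
        row["rul"] = end - t              # cycles remaining until failure
        rows.append(row)
    return rows

rows = unit_to_rows("engine_1", [0.1 * t for t in range(12)])
```

Each row is independent of sequence length, which is what lets a tabular in-context learner consume fragmented or irregular histories.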

2606.05478 2026-06-05 cs.CV cs.LG Updated

Can We Predict The Human Preference For Text-to-Image Content Prior To Generation And Is It Even Useful To Do So?

Joong Ho Kim, Keith G. Mills

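Affiliations: LSU ATHENA Lab

AI summary: Studies whether human preference metric (HPM) scores can be predicted before a diffusion model generates an image, uses such predictions to improve generation quality, and evaluates which HPMs are suited to the task.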

Comments Code is available at https://github.com/LSU-ATHENA/HPM-Predict

Abstract

Diffusion Models (DM) have revolutionized text-driven generation by enabling the synthesis of high-quality, photorealistic visual content from user prompts. Whereas prior advances in visual generation such as VAEs and GANs were primarily evaluated on perceptual or visual similarity metrics such as FID and PSNR, DM advances have fostered the development of more advanced Human Preference Metrics (HPM) that model and quantify human judgment as scalar values. However, DMs synthesize content using an inherently stochastic process in which random noise seeds generation. The initial random noise directly affects the quality of generated outputs, both qualitatively and quantitatively. This influence is especially pronounced in the smaller models used for local deployment. Given this phenomenon, we first investigate to what extent we can predict scalar HPM scores before committing compute resources to generation. We then investigate to what extent we can leverage such predictions to improve the quality of generated images, and study which HPMs are best suited for this task. Our investigation reveals that this is not only possible but also achievable with negligible hardware overhead.
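The practical payoff of pre-generation prediction is seed selection: score many candidate noise seeds cheaply and only run the diffusion sampler on the best ones. The sketch assumes a hypothetical `predict_hpm(prompt, seed)`; the stand-in below is a deterministic hash, not a real preference model.

```python
import hashlib

def predict_hpm(prompt: str, seed: int) -> float:
    # Placeholder predictor returning a deterministic pseudo-score in [0, 1).
    # A real system would map the initial noise (and prompt) to a predicted HPM score.
    digest = hashlib.sha256(f"{prompt}|{seed}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2 ** 64

def select_seeds(prompt: str, candidate_seeds, top_k: int = 1):
    """Rank seeds by predicted preference before any image is generated."""
    ranked = sorted(candidate_seeds, key=lambda s: predict_hpm(prompt, s), reverse=True)
    return ranked[:top_k]

prompt = "a red bicycle leaning on a brick wall"
best = select_seeds(prompt, range(64), top_k=4)
```

The cost is one cheap predictor call per seed instead of one full sampling run, which is where the negligible-overhead claim would matter.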

2606.05474 2026-06-05 q-bio.BM cs.LG Updated

AlloGen: Conformation-Selective Binder Generation with Differential State Scoring

Hanqun Cao, Zachary Quinn, Aastha Pal, Sumi Kimura, Jingjie Zhang, Pheng Ann Heng, Pranam Chatterjee

Affiliations: Department of Computer Science and Engineering, The Chinese University of Hong Kong; Department of Bioengineering, University of Pennsylvania; Department of Computer and Information Science

AI summary: Proposes AlloGen, a framework that pairs backbone generation with a learnable conformation-selective scorer Qθ to design binders that selectively target different conformational states of a protein.

Abstract

Protein binder design has largely optimized for affinity alone, leaving conformational selectivity unaddressed: for allosteric targets such as kinases, nuclear receptors, and GPCRs, a binder that engages both active and inactive states provides no functional specificity regardless of how tightly it binds. We introduce AlloGen, a modular framework that decouples backbone generation from a learned state-selectivity scorer $Q_θ$, an SE(3)-invariant interface graph transformer trained via a two-phase curriculum that first learns interface geometry before imposing conformational discrimination. Because $Q_θ$ is fully differentiable and generator-agnostic, it integrates with any backbone generator as a passive reranker or an active gradient-based guide without retraining. Across a diverse benchmark of proteins spanning multiple families and conformational mechanisms, AlloGen consistently identifies binders that preferentially recognize desired structural states while rejecting alternative conformations. Experimental validation on calmodulin further demonstrates that these computational selectivity signals translate to physical molecules, yielding de novo peptides that bind the desired holo conformation while exhibiting no detectable binding to the apo state. Together, these results establish conformational selectivity as a learnable property and provide a general framework for state-selective protein binder design.
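In passive-reranker mode, a differential state scorer only needs to order candidates by how much more strongly they prefer the desired state than the alternative. A minimal sketch, where `score(binder, state)` stands in for Qθ and the margin-plus-floor rule and the scores themselves are illustrative assumptions rather than AlloGen's scoring:

```python
from typing import Callable, List

def rerank_by_selectivity(candidates: List[str], score: Callable[[str, str], float],
                          desired: str, alternative: str, min_desired: float = 0.0):
    """Sort binders by the differential score(desired) - score(alternative).

    Candidates whose desired-state score falls below `min_desired` are dropped,
    so a binder cannot look selective merely by binding neither state.
    """
    kept = [c for c in candidates if score(c, desired) >= min_desired]
    return sorted(kept, key=lambda c: score(c, desired) - score(c, alternative), reverse=True)

# Made-up scores for three hypothetical peptides against two calmodulin states.
table = {("p1", "holo"): 0.9, ("p1", "apo"): 0.8,
         ("p2", "holo"): 0.7, ("p2", "apo"): 0.1,
         ("p3", "holo"): 0.2, ("p3", "apo"): 0.0}
ranked = rerank_by_selectivity(["p1", "p2", "p3"], lambda b, s: table[(b, s)],
                               desired="holo", alternative="apo", min_desired=0.5)
```

The tightest holo binder (p1) ranks below p2 because it also binds apo, which is exactly the affinity-versus-selectivity distinction the abstract draws.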

2606.05444 2026-06-05 cs.CL cs.AI cs.LG Updated

CausalPOI: Cold-Start POI Check-in Forecasting via Spatio-Temporal Graph Causal Modeling

Zhaoqi Zhang, Miao Xie, Yi Li, Linyou Cai, Siqiang Luo, Gao Cong

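Affiliations: Nanyang Technological University; China Agricultural University; Meituan

AI summary: Proposes CausalPOI, which models semantic and spatial relations between POIs with a spatio-temporal functional interaction graph and simulates factual and counterfactual scenarios with structurally aligned treatment and control graphs, addressing cold-start POI check-in forecasting and significantly outperforming baselines on real-world datasets.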

Comments Accepted at KDD 2026

DOI: 10.1145/3770855.3817641

Abstract

As urban environments continue to evolve rapidly, accurately modeling the dynamic behaviour of Points of Interest (POIs) is essential for supporting data-driven urban planning and commercial decision-making. While recent advancements in spatio-temporal graph learning have improved POI forecasting, most methods rely on proximity-based graphs and correlation-driven modeling, which overlook the functional dependencies between POIs and fail to capture the causal effects of urban interventions. In this paper, we introduce a novel research problem, cold-start POI check-in forecasting, which aims to predict the future check-in pattern of a newly introduced POI by modeling its temporal evolution and functional interactions with nearby POIs in a structured urban spatial context. To address these challenges, we propose CausalPOI, a spatio-temporal graph-based causal representation learning framework. CausalPOI leverages a Spatio-Temporal Functional Interaction Graph to model semantic and spatial relationships between POIs, and constructs structurally aligned treatment and control graphs to simulate factual and counterfactual scenarios. Extensive experiments on real-world SafeGraph datasets demonstrate that CausalPOI significantly outperforms state-of-the-art baselines across the board, validating its effectiveness in spatio-temporal forecasting, semantic interaction modeling, and causal effect estimation, and providing a more interpretable and actionable foundation for urban intervention analysis. Source code is available on GitHub.
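The treatment/control idea can be illustrated with a toy forecaster: run the same model on a treatment graph that contains the new POI's edges and on an aligned control graph that is identical except those edges are removed, and read the difference as the intervention's effect on each existing POI. The one-hop linear forecaster and all numbers below are hypothetical; CausalPOI learns its representations rather than using fixed weights.

```python
def one_hop_forecast(base, edges):
    # Toy graph forecaster: own base rate plus weighted inflow from neighbours.
    pred = dict(base)
    for (u, v), w in edges.items():
        pred[v] += w * base[u]
    return pred

def intervention_effect(base, edges, new_poi):
    treatment = edges
    # Control graph: same nodes and edges, minus every edge touching the new POI.
    control = {e: w for e, w in edges.items() if new_poi not in e}
    t = one_hop_forecast(base, treatment)
    c = one_hop_forecast(base, control)
    return {v: t[v] - c[v] for v in base if v != new_poi}

base = {"cafe": 100.0, "gym": 50.0, "new_mall": 200.0}
edges = {("new_mall", "cafe"): 0.1, ("new_mall", "gym"): -0.05, ("gym", "cafe"): 0.2}
effect = intervention_effect(base, edges, "new_mall")
```

Because the two graphs differ only in the intervention, unrelated edges (gym to cafe) cancel out of the estimated effect.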

2606.05404 2026-06-05 cs.AI cs.CL cs.LG Version update

Harnessing Generalist Agents for Contextualized Time Series

Zihao Li, Kaifeng Jin, Yuanchen Bei, Jiaru Zou, Avaneesh Kumar, Xuying Ning, Yanjun Zhao, Mengting Ai, Baoyu Jing, Hanghang Tong, Jingrui He

Affiliations: University of Illinois Urbana-Champaign

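AI summary: Proposes the TimeClaw framework, which integrates executable temporal tools, experience-driven capability evolution, and episodic multimodal memory to give generalist LLM agents contextualized temporal reasoning. It improves performance on benchmarks spanning energy, finance, and other domains.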

Comments Preprint. 38 Pages

Abstract

Time series are often embedded in rich contexts that are essential for holistic modeling. Moreover, real-world practitioners often require end-to-end workflows for analyzing temporal dynamics, where widely studied tasks such as forecasting are only one step in a broader solution loop. While generalist AI agents offer a promising interface for such workflows under complex contexts, they still operate primarily in textual spaces that are not fully aligned with structured temporal signals. In this work, we introduce TimeClaw, an agentic harness framework for time series that equips generalist LLM agents with the time series-native runtime support needed for contextualized temporal reasoning. TimeClaw integrates executable temporal tools for grounded and auditable analysis, experience-driven capability evolution for creating reusable analytical routines, and episodic multimodal memory for retrieving relevant reasoning traces. Together, these components unlock harnessed open-ended temporal reasoning with contextual information. Extensive evaluation on multiple benchmarks covering diverse tasks across energy, finance, weather, traffic, and other real-world domains demonstrates improved performance of TimeClaw. Code is available at https://github.com/iDEA-iSAIL-Lab-UIUC/TimeClaw.

2606.05403 2026-06-05 cs.LG cs.AI Version update

Trust, but Don't Verify: Epistemic Blind Spots in LLM Source Evaluation

Rohan N. Pradhan, Steve Goley

Affiliations: Amazon

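AI summary: Studies whether language models evaluate evidence quality during multi-source synthesis. Models can detect fabricated statistics but do not use this capability during synthesis. Instead, they rely on a methodology-register gate, which suppresses numeric-validity signals.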

Abstract

Language models increasingly act as epistemic proxies, synthesizing evidence from multiple sources to inform decisions. Whether they evaluate the quality of that evidence, or merely aggregate it based on surface presentation, remains poorly understood. We show that models possess the capability to detect fabricated statistics (correct identification rates of 0.76-1.00 for methodology in isolation) but do not recruit this capability during multi-source synthesis, producing similar numeric estimates whether the statistics are fabricated or valid. Specifically, source influence is governed by a methodology-register gate that responds to the distributional register of analytical text but not to numeric validity: for example, statistically impossible confidence intervals receive the same weight as valid ones. The behavioral dissociation replicates across five models from three families (Claude, Qwen, OLMo) and three professional domains. Mechanistic analyses, including causal tracing, linear probes, and component-level attribution, converge on the same account: the model encodes and causally uses a methodology-register representation that transfers across domains (probe AUC 0.83-0.92), while numeric-validity signals, decodable in isolation, are suppressed to chance during multi-source synthesis. Prompting-based mitigations, even an oracle checklist naming the exact statistical checks, produce blanket skepticism rather than selective discernment, and the post-training pipelines we examine reinforce the stylistic shortcut without building numeric verification. Unlike sycophancy, which tracks user preference, this failure tracks whether a source presents as analytically credible, not whether its claims are internally consistent. We term this epistemic alignment: like preference and safety alignment, the question is not capability but deployment.
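The abstract mentions "statistically impossible confidence intervals" and "an oracle checklist naming the exact statistical checks". As a rough illustration only, here is the kind of internal-consistency check such a checklist might contain. The function name `check_reported_ci` and its rules are hypothetical and are not taken from the paper.

```python
def check_reported_ci(estimate, lower, upper, is_proportion=False):
    """Return a list of consistency problems for a reported estimate and its confidence interval.

    Hypothetical checklist: these rules are illustrative assumptions,
    not the checks used in the paper.
    """
    problems = []
    if lower > upper:
        problems.append("lower bound exceeds upper bound")
    if not (lower <= estimate <= upper):
        problems.append("point estimate lies outside its interval")
    if is_proportion and (lower < 0.0 or upper > 1.0):
        problems.append("proportion interval extends outside [0, 1]")
    return problems

# A valid-looking report and a statistically impossible one.
print(check_reported_ci(0.42, 0.35, 0.49, is_proportion=True))  # []
print(check_reported_ci(0.42, 0.45, 1.10, is_proportion=True))  # flags the misplaced estimate and the out-of-range bound
```

Checks like these are cheap and local. The paper's finding is that models can run them in isolation but do not apply them during multi-source synthesis.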

2606.05400 2026-06-05 cs.AI cs.CL cs.LG Version update

LeanMarathon: Toward Reliable AI Co-Mathematicians through Long-Horizon Lean Autoformalization

Yuanhe Zhang, Yuekai Sun, Taiji Suzuki, Jason D. Lee, Fanghui Liu

Affiliations: Department of Statistics, University of Warwick, UK; Center for Advanced Intelligence Project, RIKEN, Japan; Department of Statistics, University of Michigan, USA; Department of Mathematical Informatics, The University of Tokyo, also Center for Advanced Intelligence Project, RIKEN, Japan; Department of Electrical Engineering and Computer Sciences, also Department of Statistics, University of California, Berkeley, USA; School of Mathematical Sciences, Institute of Natural Sciences and MOE-LSC, Shanghai Jiao Tong University, China

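AI summary: Proposes LeanMarathon, a multi-agent framework that uses a blueprint abstraction and a two-stage orchestrator to make long-horizon autoformalization of research mathematics reliable. It formalizes seven theorems across four Erdős problems.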

Comments 26 pages, 9 figures. Comments are welcome

Abstract

Long-horizon autoformalization of research mathematics fails not only at hard lemmas, but at scale: statements drift, dependencies tangle, context decays, and local repairs corrupt distant work. We present LeanMarathon, a multi-agent harness for reliable research-level Lean autoformalization. Its core abstraction is an evolving blueprint: a Lean file that serves simultaneously as formal proof skeleton, natural-language proof graph, and shared system of record. Four contract-scoped agents construct, audit, prove, and repair this blueprint. These agents are coordinated by a two-stage orchestrator that first stabilizes target fidelity through adversarial review and then discharges the proof directed acyclic graph (DAG) from its dynamic leaves upward in parallel CI-gated rounds. LeanMarathon turns one brittle multi-hour run into many local, recoverable, parallel transactions. We evaluate LeanMarathon on two recent research papers spanning four Erdős problems (#1051, #1196, #164, #1217). Across three autonomous runs, it formalizes all seven target theorems with no sorry, proving 258 lemmas and theorems. These results show that reliable AI co-mathematics requires not only stronger provers, but durable harnesses that preserve target fidelity across long mathematical developments. The code can be found at https://github.com/YuanheZ/LeanMarathon.
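LeanMarathon discharges the proof DAG "from its dynamic leaves upward in parallel CI-gated rounds". The sketch below illustrates only that scheduling idea. It groups lemmas into rounds so that every lemma's dependencies were proven in earlier rounds, which is a level-by-level topological sort. The lemma names are hypothetical, and the agents, CI gating, and repair loop are not modeled.

```python
def proof_rounds(deps):
    """Group the nodes of a dependency DAG into rounds that can be proven in parallel.

    deps maps each lemma to the set of lemmas it depends on. Round 0 holds
    the leaves; each later round holds lemmas whose dependencies were all
    discharged in earlier rounds.
    """
    remaining = {name: set(d) for name, d in deps.items()}
    done = set()
    rounds = []
    while remaining:
        ready = sorted(n for n, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("dependency cycle or missing lemma: " + ", ".join(sorted(remaining)))
        rounds.append(ready)
        done.update(ready)
        for n in ready:
            del remaining[n]
    return rounds

# Hypothetical blueprint: the main theorem needs two lemmas, one of which needs a helper.
blueprint = {
    "helper": set(),
    "lemma_a": {"helper"},
    "lemma_b": set(),
    "main_theorem": {"lemma_a", "lemma_b"},
}
print(proof_rounds(blueprint))  # [['helper', 'lemma_b'], ['lemma_a'], ['main_theorem']]
```

Lemmas within a round are independent of one another, so each round could be dispatched in parallel. A failure then only blocks its own descendants.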

2606.05381 2026-06-05 cs.LG Version update

Generalized TV--$\ell_p$ Structured Priors for Bayesian $T_1$ Mapping

Disi Lin, Martin Berggren, Tommy Löfstedt

Affiliations: Department of Computing Science, Umeå University, Sweden

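AI summary: Proposes a family of structured spatial priors that combines total variation (TV) with $\ell_p$ norms. The priors are embedded in a Bayesian regression framework, with posterior inference via the No-U-Turn Sampler, to quantify uncertainty in $T_1$ mapping. Experiments show improved spatial consistency and estimation reliability.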

Comments Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2026:015

DOI: 10.59275/j.melba.2026-g41g
Journal ref: Machine Learning for Biomedical Imaging 2026 (2026)

Abstract

We propose an extended family of structured spatial priors that incorporates the total variation (TV) function with $\ell_p$ norms. The prior is proven to be proper and incorporated into a Bayesian regression framework to enable uncertainty quantification in $T_1$ mapping, with posterior inference performed using the No-U-Turn Sampler (NUTS). This TV--$\ell_p$ construction is proven to constitute a well-defined family of prior distributions, and it naturally enforces spatial consistency and smooth variations in the estimated parameter maps. The method was evaluated in comparison to maximum-likelihood estimation and several Bayesian alternative priors based on the uniform, Gamma, and bounded TV priors. The evaluation includes experiments on synthetic brain and cardiac $T_1$ mapping datasets, as well as a real in-vivo breast $T_1$ mapping dataset. The results show that the TV--$\ell_p$ prior yields more concentrated posterior densities, indicating reduced uncertainty. It also consistently achieves lower variance and smaller (negative) bias, leading to more reliable estimates. Overall, embedding a TV-based structured penalty along with $\ell_p$ norms in a prior in a Bayesian model improves spatial coherence in $T_1$ maps and enhances uncertainty quantification, offering a robust approach for $T_1$ mapping with uncertainties.
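The abstract does not spell out the exact TV--$\ell_p$ density. Purely as an illustration, the sketch below assumes an unnormalized log prior of the form $-\lambda\,\mathrm{TV}(x) - \mu \sum_{i,j} |x_{i,j}|^p$ for a 2D map $x$, with anisotropic TV (the sum of absolute horizontal and vertical differences). The function name and this specific form are assumptions, not the paper's definition. With $\mu > 0$ and $p > 0$, this assumed density is bounded by an integrable function, so it is proper. The normalizing constant is not computed.

```python
def tv_lp_log_prior(x, lam=1.0, mu=0.1, p=1.5):
    """Unnormalized log prior for a 2D parameter map given as a list of equal-length rows.

    Assumed illustrative form (not the paper's exact definition):
        log p(x) = -lam * TV(x) - mu * sum |x_ij|**p + const,
    where TV(x) sums absolute horizontal and vertical neighbor differences.
    """
    rows, cols = len(x), len(x[0])
    tv = 0.0
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows:
                tv += abs(x[i + 1][j] - x[i][j])
            if j + 1 < cols:
                tv += abs(x[i][j + 1] - x[i][j])
    lp = sum(abs(v) ** p for row in x for v in row)
    return -lam * tv - mu * lp

smooth = [[1.0, 1.0], [1.0, 1.0]]
checker = [[1.0, 0.0], [0.0, 1.0]]
print(tv_lp_log_prior(smooth), tv_lp_log_prior(checker))  # the smooth map scores higher
```

In a sampler such as NUTS, a term like this is added to the log likelihood. The TV part favors piecewise-smooth $T_1$ maps, and the $\ell_p$ part keeps the density integrable.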

2606.05380 2026-06-05 cs.DS cs.LG Version update

Learning-Augmented Online Minimization with Dual Predictions

Christian Coester, Alexa Tudose, Alexander Turoczy

Affiliations: University of California, Berkeley

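AI summary: For two online minimization problems, metrical task systems and layered set cover, proposes learning-augmented algorithms that use machine-learned predictions of optimal dual linear-program solutions to improve theoretical guarantees.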

2606.05378 2026-06-05 cs.LG cs.AI Version update

Pattern Selectivity is Not Task-Causal Structure: A Cross-Architecture Mechanistic Study of Composed-Task Circuits in 1B-Class Language Models

Yongzhong Xu

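AI summary: Uses a unified protocol to test attention-head circuits in three 1B-class language models on four composed tasks, finding that different models implement the same task with different attention patterns. Introduces a five-category screen-outcome taxonomy and proposes a falsifiable hypothesis that the MoE model builds composed-task circuits on a previous-token positional substrate.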

Comments 27 pages, 3 figures

Abstract

We test whether a single screen-and-ablate recipe -- identify attention-head circuits by task-pattern selectivity, then verify by causal ablation against a matched-random null -- produces consistent mechanistic claims across model families. The recipe ports across pipelines; the specific circuit it identifies does not. Across four composed tasks (indirect-object identification, greater-than, successor sequences, variable binding) and three 1B-class language models from distinct training pipelines (Pythia 1B / Pile / dense; OLMo 1B / DCLM / dense; OLMoE 1B-7B / DCLM / mixture-of-experts), we run a unified protocol with the matched-random null sampled across ten seeds per cell. The resulting 12 (task, model) cells contain no two that share the same primary causal screen at comparable effect size: the same task, with the same behavioral capability, is implemented through different attention-pattern types across models. We introduce a five-category screen-outcome taxonomy -- primary cause, secondary cause, correlate, interferer, null -- with quantitative thresholds, and show that all five outcomes appear in the panel. We propose a falsifiable hypothesis: the MoE model in our panel builds composed-task circuits on top of a foundational previous-token positional substrate (the prev-token-circuit ablation is the strongest causal screen on 3 of 4 tasks for OLMoE 1B-7B), with the IOI exception consistent with IOI being a final-position name-copying task whose structure directly probes a different pattern. The hypothesis comes with explicit predictions for other MoE language models. We frame the methodology honestly: the spectral participation-ratio signal from the companion methodology paper is a general indicator of specialized computation; what makes a finding task-specific is the task-pattern screen plus a per-model causal verification.
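The recipe verifies a screened circuit "by causal ablation against a matched-random null". A minimal sketch of that comparison follows, assuming the ablation effects (for example, the drop in a task metric) have already been measured. The effect of ablating the screened heads is compared with the effects of ablating size-matched random head sets, one per seed, using an empirical p-value and a z-score. The function name and numbers are hypothetical. The paper's quantitative thresholds for its five-category taxonomy are not reproduced here.

```python
import statistics

def compare_to_null(target_effect, null_effects):
    """Compare one ablation effect with effects from matched random ablations.

    Returns (empirical one-sided p-value, z-score). The p-value uses the
    standard +1 correction, so it is never exactly zero.
    """
    n = len(null_effects)
    at_least = sum(1 for e in null_effects if e >= target_effect)
    p_value = (at_least + 1) / (n + 1)
    z = (target_effect - statistics.mean(null_effects)) / statistics.stdev(null_effects)
    return p_value, z

# Hypothetical effect sizes: ten random-head ablations (one per seed) vs. the screened circuit.
null = [0.02, 0.01, 0.03, 0.00, 0.02, 0.04, 0.01, 0.02, 0.03, 0.02]
p, z = compare_to_null(0.30, null)
print(f"p = {p:.3f}, z = {z:.1f}")
```

With ten seeds, the smallest attainable empirical p-value is 1/11. That is why an effect-size measure such as the z-score is useful alongside it.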

2606.05376 2026-06-05 cs.LG Version update

SHALA-LLM: Smartly Handling Ambiguous Labels in Aligning LLMs

Jingyao Wu, Ashley Wang, Keane Ong, Paul Pu Liang, Rosalind Picard

Affiliations: MIT Media Lab, Massachusetts Institute of Technology; National University of Singapore

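AI summary: Proposes SHALA-LLM, a reinforcement learning framework that learns from annotator distributions and dynamically prioritizes highly ambiguous samples to improve how LLMs model ambiguous labels. It increases agreement with annotator distributions and improves classification performance on NLI and emotion recognition tasks.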

Abstract

Many human-centered tasks, including natural language inference (NLI) and emotion recognition (ER), have multiple plausible interpretations, leading to label ambiguity and challenging disagreements across human annotators. As LLMs are increasingly deployed in real-world settings, faithfully modeling such ambiguity is essential to identify contested inputs, preserve variability in ambiguous cases, and capture the full distribution of human judgments. Yet, existing LLM alignment approaches have predominantly assumed a single correct label, excluding annotator disagreement during optimization. Instead of treating this ambiguity as noise, we show how to treat it as information that improves model behavior through a new algorithm called SMARTLY HANDLING AMBIGUOUS LABELS IN ALIGNING LLMS (SHALA-LLM). This reinforcement learning framework provides a new way for LLMs to learn directly from annotator distributions while dynamically prioritizing highly ambiguous samples during optimization. Experiments on ambiguity-sensitive NLI and ER benchmarks, including ChaosNLI, GoEmotions, and MSP-Podcast, demonstrate that SHALA-LLM improves agreement with annotator label distributions, e.g. on ChaosNLI, it reduces Jensen-Shannon Distance by up to 62.1%. At the same time, SHALA-LLM improves F1 by up to 16.7%, showing that modeling annotator disagreement can also strengthen classification performance.
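SHALA-LLM measures agreement with annotator label distributions using Jensen-Shannon distance. For reference, below is a minimal stdlib implementation of that metric, defined as the square root of the Jensen-Shannon divergence. It uses base-2 logarithms so that the distance lies in [0, 1]; the abstract does not state which log base the paper uses. The three-way NLI example distributions are made up.

```python
import math

def js_distance(p, q):
    """Jensen-Shannon distance between two discrete distributions (base-2 logs, range [0, 1])."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to KL(a || b); b_i >= a_i / 2 > 0 otherwise.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(jsd, 0.0))

# Hypothetical NLI item: annotators split across entailment / neutral / contradiction.
annotators = [0.5, 0.4, 0.1]
one_hot_model = [1.0, 0.0, 0.0]
soft_model = [0.45, 0.45, 0.10]
print(js_distance(annotators, one_hot_model), js_distance(annotators, soft_model))
```

A model trained on a single majority label tends toward the one-hot case. A model that reproduces the annotator split scores a much smaller distance, even though both predict the same top label.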

2606.05373 2026-06-05 cs.LG physics.bio-ph Version update

Evidence-Guided Neural Architecture Selection under Uncertainty for Subject-Specific Blood Glucose Forecasting

Md Azharul Islam, Dwyer Deighan, Tarunraj Singha, Danial Faghihi

Affiliations: Department of Mechanical; Data-Enabled Sciences, University at Buffalo, Buffalo, NY, USA

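AI summary: Proposes EVIDENT, a framework that combines Bayesian training, evidence-based ranking, and task-specific validation to select neural architectures automatically from limited, noisy, and heterogeneous data. It is applied to individualized blood glucose forecasting.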

Abstract

Reliable neural architecture selection is an open challenge in time-series forecasting under limited, noisy, and heterogeneous data, where standard heuristic architecture design and validation approaches fail to ensure accurate and reliable prediction and generalization. We propose EVIDENT (EVidence-based IDEntification of Neural archiTectures), a framework for architecture selection that integrates Bayesian training, evidence-based ranking, and task-specific validation under uncertainty. The framework explores the candidate architecture pool and identifies the lowest-capacity model that satisfies a prescribed validation criterion. We demonstrate this method using temporal convolutional networks (TCNs) for individualized blood glucose forecasting in type 1 diabetes patients. The results show that EVIDENT systematically rejects both under- and over-parameterized TCN architectures on population-level diabetes data, while identifying models that generalize reliably to unseen patients. When multiple architectures are competitive, the framework further supports plausibility-weighted ensemble predictions that enhance predictive performance. Compared with a random-search baseline, EVIDENT identified smaller architectures with more consistent forecasting performance on unseen patients. These findings establish EVIDENT as a strategy to neural architecture discovery, enabling reliable model selection for high-consequence forecasting in data-limited and heterogeneous settings.
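EVIDENT "identifies the lowest-capacity model that satisfies a prescribed validation criterion" and can form "plausibility-weighted ensemble predictions". The sketch below illustrates only that selection logic, under these assumptions:

- each candidate already has a log-evidence estimate and a validation error;
- plausibilities are posterior model probabilities computed from the log evidences under a uniform prior over candidates;
- the criterion is a simple error threshold.

The candidate names, numbers, and threshold are hypothetical. Bayesian training of the TCNs is not shown.

```python
import math

def model_plausibilities(log_evidence):
    """Posterior model probabilities from log evidences, assuming a uniform prior over models."""
    top = max(log_evidence.values())  # subtract the max for numerical stability
    weights = {k: math.exp(v - top) for k, v in log_evidence.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

def select_smallest_valid(candidates, max_val_error):
    """Name of the lowest-capacity candidate meeting the validation criterion, or None."""
    valid = [c for c in candidates if c["val_error"] <= max_val_error]
    return min(valid, key=lambda c: c["params"])["name"] if valid else None

# Hypothetical TCN candidates: parameter counts, log evidences, validation errors.
cands = [
    {"name": "tcn_small", "params": 5_000, "log_ev": -120.0, "val_error": 24.0},
    {"name": "tcn_medium", "params": 20_000, "log_ev": -100.0, "val_error": 18.0},
    {"name": "tcn_large", "params": 80_000, "log_ev": -101.0, "val_error": 17.5},
]
print(select_smallest_valid(cands, max_val_error=20.0))  # tcn_medium
plaus = model_plausibilities({c["name"]: c["log_ev"] for c in cands})
preds = {"tcn_small": 140.0, "tcn_medium": 150.0, "tcn_large": 152.0}
print(sum(plaus[k] * preds[k] for k in preds))  # plausibility-weighted ensemble forecast
```

Choosing the smallest model that passes, rather than the one with the lowest error, is what rejects over-parameterized candidates. The ensemble step only helps when several candidates have comparable evidence.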

2606.05371 2026-06-05 cs.LG cs.NA math.NA stat.ML Version update

Mamba-Assisted Non-Markovian Closure for Reduced-Order Modeling

Zhi-Feng Wei, Saad Qadeer, Panos Stinis

Affiliations: Pacific Northwest National Laboratory; University of Washington; Brown University

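AI summary: Addresses the non-Markovian closure term in reduced-order modeling of high-dimensional dynamical systems with the Mamba-Assisted Closure framework. A Mamba sequence model predicts the closure from the resolved trajectory and is coupled to the reduced-order equations through a numerical integrator. The method outperforms a Markovian model, a GRU sequence model, and the Wilks method on the viscous Burgers' equation and the chaotic two-scale Lorenz '96 system.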

Comments Code will be released upon acceptance

Abstract

Reduced-order modeling of high-dimensional dynamical systems is often hindered by the non-Markovian closure term that represents the effect of unresolved variables on the resolved dynamics. Inspired by the Mori--Zwanzig formalism, in which the closure takes the form of a memory functional of the resolved trajectory, we recast closure modeling as a sequence modeling problem and propose the Mamba-Assisted Closure (MAC) framework: a Mamba-based sequence model, trained to predict the closure from the resolved trajectory, is coupled with the reduced-order governing equations through a numerical integrator to advance the resolved variables in time. A key feature of the framework is its exploitation of the dual representation of state-space models -- the model is trained in a sequence-to-sequence fashion via the convolutional form, and deployed for step-by-step autoregressive rollout via the recurrent form, yielding both efficient long-trajectory training and constant per-step inference cost. On the viscous Burgers' equation and the chaotic two-scale Lorenz '96 system, the MAC model substantially outperforms the Markovian reduced-order model, the GRU-based sequence model, and the Wilks method in predictive accuracy and long-time rollout stability.
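The MAC framework exploits the "dual representation of state-space models". Below is a minimal sketch of that duality for a scalar, time-invariant linear SSM $h_t = a\,h_{t-1} + b\,x_t$, $y_t = c\,h_t$ with $h_{-1} = 0$. Unrolling the recurrence gives $y_t = \sum_{k=0}^{t} (c\,a^k b)\,x_{t-k}$. The same outputs therefore come either from a causal convolution, which is convenient for training on whole sequences, or from the recurrence, which has constant cost per step during rollout. This is a simplification: Mamba's selective SSM makes these coefficients input-dependent, so the time-invariant case is only the simplest instance of the idea. The coefficients are arbitrary.

```python
def ssm_recurrent(x, a, b, c):
    """Step-by-step (recurrent) form: constant work per time step."""
    h, ys = 0.0, []
    for xt in x:
        h = a * h + b * xt
        ys.append(c * h)
    return ys

def ssm_convolutional(x, a, b, c):
    """Whole-sequence (convolutional) form with kernel K_k = c * a**k * b."""
    n = len(x)
    kernel = [c * a ** k * b for k in range(n)]
    return [sum(kernel[k] * x[t - k] for k in range(t + 1)) for t in range(n)]

x = [1.0, 0.5, -0.2, 0.0, 0.3]
rec = ssm_recurrent(x, a=0.9, b=0.5, c=2.0)
conv = ssm_convolutional(x, a=0.9, b=0.5, c=2.0)
print(max(abs(r - v) for r, v in zip(rec, conv)))  # agreement up to floating-point rounding
```

The naive convolution here is quadratic in sequence length. Practical convolutional-mode SSM implementations typically use FFT-based convolution instead.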

2606.05365 2026-06-05 stat.ML cs.LG Version update

Environment-Robust Representation Learning with Empirical Bayes

Yuli Slavutsky, Matthew Shen, Bohan Wu, David M. Blei

Affiliations: Department of Statistics, Columbia University; Columbia University; Departments of Statistics and Computer Science, Columbia University

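AI summary: Proposes an empirical Bayes variational method that learns invariant latent variables through a cross-environment balancing term. This enables robust prediction in new environments, and the method outperforms existing approaches on astronomy, microbiome, and ICU data.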

Abstract

We consider multi-environment prediction problems. We assume the environments change the distribution of a latent variable, while the mechanisms generating observed covariates and targets remain stable conditional on that variable. For example, hospitals or clinical cohorts may differ in the prevalence of latent patient states, even though the relationships between those states, physiological measurements, and outcomes remain unchanged. Given a dataset from multiple environments, we formulate a Bayesian model for such problems and derive the corresponding variational objective. We show that this objective decomposes into per-environment terms and an additional cross-environment balancing term induced by the model's structure. We use an empirical Bayes method to set the prior and incorporate it into the objective. Based on this objective, we develop an amortized variational algorithm for posterior approximation, and use the resulting learned latent variables to form predictions in new environments. We study our approach through simulations and real-world studies of astronomical source identification, microbiome-based disease detection, and ICU sepsis prediction. Across these settings, our method outperforms previous approaches for prediction in new environments.
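The paper uses "an empirical Bayes method to set the prior". Its model is a latent-variable model trained with amortized variational inference, which the abstract does not detail. As a much simpler, textbook illustration of the empirical Bayes idea only, the sketch below estimates a normal prior over environment-level means from the data themselves (method of moments). It then shrinks each environment's estimate toward the pooled mean. The normal-normal model, the known noise variances, and all numbers are assumptions.

```python
import statistics

def empirical_bayes_shrink(env_means, noise_vars):
    """Normal-normal empirical Bayes (noise_vars must be positive).

    Assumed model: env_means[j] ~ N(theta_j, noise_vars[j]) and theta_j ~ N(mu, tau2).
    mu and tau2 are estimated by the method of moments; each environment mean
    is then replaced by its posterior mean.
    """
    mu = statistics.mean(env_means)
    tau2 = max(0.0, statistics.variance(env_means) - statistics.mean(noise_vars))
    posterior = []
    for y, s2 in zip(env_means, noise_vars):
        shrink = s2 / (s2 + tau2)  # weight placed on the estimated prior mean
        posterior.append(mu + (1.0 - shrink) * (y - mu))
    return mu, tau2, posterior

print(empirical_bayes_shrink([0.0, 10.0], [1.0, 1.0]))
```

The prior is learned from the spread across environments. When environments barely differ, $\hat\tau^2$ collapses to zero and every estimate is pulled fully to the pooled mean.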

2606.05361 2026-06-05 stat.ML cs.LG Version update

The Invisible Hand of Physics: When Video Diffusion Models Know More Than They Show

Parsa Esmati, Somjit Nath, Katja Hofmann, Derek Nowrouzezahrai, Samira Ebrahimi Kahou, Majid Mirmehdi

Affiliations: University of Bristol; McGill University; Mila–Quebec AI Institute; Microsoft Research; University of Calgary

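AI summary: Probes latent trajectories of video diffusion models by inverting the diffusion process. Physical plausibility turns out to be linearly decodable from diffusion transformer states (about 81.27% average accuracy), suggesting that physically meaningful representations arise as a byproduct of generative denoising.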

Abstract

Modern video diffusion models generate increasingly realistic and temporally coherent videos, motivating their use as candidate world simulators. Yet it remains unclear whether these models internally encode physical structure, or merely reproduce motion patterns seen during training. We study this question by probing video diffusion models along latent trajectories corresponding to real videos with known physical plausibility. To obtain such trajectories, we approximately invert the deterministic sampling process by integrating the learned velocity field backward from a clean video latent to noise, giving access to the model's intermediate states and attention maps. Using these recovered trajectories, we show that physical plausibility is linearly decodable from diffusion transformer states across IntPhys and InfLevel, reaching around 81.27% average accuracy and outperforming dedicated representation-learning baselines such as V-JEPA and VideoMAE. Surprisingly, this signal is absent from the VAE latent input and emerges inside the denoising transformer itself, despite the model not being trained with a self-supervised predictive objective. These findings suggest that physically meaningful representations can arise as a byproduct of generative denoising.
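The probing procedure inverts "the deterministic sampling process by integrating the learned velocity field backward from a clean video latent to noise". Below is a toy sketch of that inversion. It assumes a hand-written one-dimensional velocity field in place of a learned diffusion transformer, and uses plain Euler steps. The sampler integrates from noise at $t = 0$ to data at $t = 1$; the inversion integrates the same field from $t = 1$ back to $t = 0$. Euler steps are not exactly reversible, so the recovered noise differs slightly from the original, and this discretization error is one reason such inversions are only approximate. The velocity field and step counts are arbitrary.

```python
def velocity(x, t):
    """Hypothetical stand-in for a learned velocity field v(x, t)."""
    return -x + t

def sample(x_noise, steps=100):
    """Euler integration forward in time, from noise (t=0) to data (t=1)."""
    h, x = 1.0 / steps, x_noise
    for k in range(steps):
        x = x + h * velocity(x, k * h)
    return x

def invert(x_data, steps=100):
    """Euler integration backward in time, from data (t=1) to noise (t=0)."""
    h, x = 1.0 / steps, x_data
    for k in range(steps, 0, -1):
        x = x - h * velocity(x, k * h)
    return x

z = 0.7
recovered = invert(sample(z))
print(z, recovered)  # close, but not identical
```

The inversion error shrinks as the step count grows. Intermediate states along the recovered trajectory are what a linear probe would then be trained on.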

2606.05327 2026-06-05 cs.LG q-bio.QM stat.ML Version update

Multimarginal flow matching with optimal transport potentials

Raghav Kansal, David Crair, Nghia Nguyen, Scott Pope, Bradley Parry

Affiliations: University of California, Berkeley

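AI summary: Proposes using dynamic optimal transport potentials to steer flow matching toward intermediate marginal distributions. This yields an efficient, simulation-free multimarginal flow matching method with state-of-the-art performance on single-cell RNA sequencing, oceanographic, and meteorological datasets.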

Comments 9 pages, 3 figures, 4 tables, and a 27 page appendix. Accepted to the Forty-Third International Conference on Machine Learning

Abstract

Flow matching (FM) has emerged as a powerful framework for learning dynamic transport maps between two empirical distributions. However, less explored is the setting with intermediate observed marginals that can help constrain the flows between the endpoints. This "multimarginal" regime is central to modeling temporal evolution in dynamical systems in many scientific domains that can sample sequential distributions. We tackle this problem with a novel approach that leverages the connection between FM and dynamic optimal transport (OT), softly steering the flow towards the intermediate marginals through potential terms in the dynamic OT action. By extending the conditional FM learning target to incorporate these potentials, we derive an efficient, simulation-free algorithm for multimarginal FM that offers considerable flexibility in the spatiotemporal dynamics of the learned flows. We demonstrate state-of-the-art performance and training efficiency of OT-potential FM (OTP-FM) on diverse single-cell RNA sequencing, oceanographic, and meteorological datasets. Our code is available at https://github.com/Bexorg-Inc/OTP-FM.
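OTP-FM extends the conditional flow matching (CFM) learning target. For orientation, the sketch below shows only the standard two-marginal CFM regression target with a straight-line conditional path: $x_t = (1-t)\,x_0 + t\,x_1$ with target velocity $u_t = x_1 - x_0$. The paper's optimal-transport potential terms for intermediate marginals are not implemented here. Pairing samples at random, rather than through an OT coupling, is a simplifying assumption.

```python
import random

def cfm_training_pair(x0, x1, t):
    """Point on the straight path from x0 to x1 at time t, and the target velocity there."""
    xt = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    ut = [b - a for a, b in zip(x0, x1)]
    return xt, ut

def cfm_batch(source, target, rng):
    """Randomly pair source and target samples and draw t ~ U(0, 1) for each pair.

    A network v(x, t) would then be regressed onto ut with a squared-error loss.
    """
    batch = []
    for x0 in source:
        x1 = rng.choice(target)
        t = rng.random()
        xt, ut = cfm_training_pair(x0, x1, t)
        batch.append((xt, t, ut))
    return batch

rng = random.Random(0)
print(cfm_training_pair([0.0, 0.0], [2.0, 4.0], 0.5))  # ([1.0, 2.0], [2.0, 4.0])
batch = cfm_batch([[0.0, 0.0], [1.0, 1.0]], [[2.0, 4.0], [3.0, 3.0]], rng)
```

Every training pair is built in closed form without simulating the flow, which is what "simulation-free" refers to. The multimarginal extension changes how the conditional path and its target are defined.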

2606.05326 2026-06-05 math.OC cs.AI cs.LG math-ph math.AP math.MP Version update

Gradient descent at the Edge of Stability: free energy model and kinetic description of the two-layer network

Antonin Chodron de Courcel

Affiliations: Ecole Normale Supérieure, CNRS, 45 rue d'Ulm, 75005 Paris, France

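AI summary: Studies the Edge-of-Stability dynamics of gradient descent at large learning rates. Proposes a continuous-time effective model that tracks the mean trajectory and the covariance of the fast oscillations, identifies an effective free energy as the key quantity to monitor, and derives the mean-field limit kinetic equation for wide two-layer networks.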

Comments Comments are welcome!
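For readers new to the term: the Edge of Stability refers to gradient descent with step size $\eta$ in a regime where the sharpness (the largest Hessian eigenvalue) hovers at or slightly above $2/\eta$. The threshold comes from the quadratic case. Gradient descent on $f(x) = \tfrac{a}{2}x^2$ gives $x_{k+1} = (1 - \eta a)\,x_k$, which converges if and only if $0 < a < 2/\eta$, and alternates in sign whenever $a > 1/\eta$. The sketch below only checks this textbook fact numerically. It does not implement the paper's free-energy model or kinetic description.

```python
def gd_on_quadratic(a, eta, x0=1.0, steps=50):
    """Run gradient descent on f(x) = a/2 * x**2 and return the iterates."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - eta * a * xs[-1])  # the gradient of f is a * x
    return xs

eta = 0.1  # stability threshold 2 / eta = 20
below = gd_on_quadratic(a=19.0, eta=eta)  # factor 1 - 1.9 = -0.9: oscillates and shrinks
above = gd_on_quadratic(a=21.0, eta=eta)  # factor 1 - 2.1 = -1.1: oscillates and grows
print(abs(below[-1]), abs(above[-1]))
```

Near the threshold the iterates oscillate with slowly changing amplitude. A model that tracks a mean trajectory plus an oscillation covariance is designed to describe this regime.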

Aggregating LLM-Based Weak Verifiers for Spatial Layout Generation

Sharon Zhang, R. Kenny Jones, Jiajun Wu, Maneesh Agrawala

Affiliations: Stanford University; Roblox

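AI summary: Proposes a pipeline that builds a strong verifier for spatial layout domains by aggregating LLM-generated weak verifiers. It raises F1 scores by up to 7x on 3D room layout and 2D poster design tasks.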

Abstract

We present a pipeline for building and aggregating task-specific, LLM-generated weak (imperfect) verifiers into a strong verifier for spatial layout domains. Given a task description, our pipeline asks an LLM to synthesize a collection of verifier programs using a layout verification DSL. Each individual LLM-generated verifier usually provides an imperfect check for a match between the layout and the corresponding task description. We show that by aggregating the responses of many such verifiers we can produce a stronger verifier. Moreover, by applying techniques from weak learning, our pipeline can learn how to aggregate the weak verifiers from a very sparse set of human labeled example layouts (about 10). We find that the strong verifiers produced by our pipeline outperform the status-quo approach of using a set of LLM judges to directly check whether a layout matches a task description, raising F1-scores by up to 7X across a variety of 3D room layout and 2D poster design tasks. We also demonstrate that verifier-guided layout generation using natural language feedback from our strong verifiers improves layout quality of a base layout generator by up to 66.2% according to a human evaluator.
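The pipeline aggregates many imperfect verifiers using techniques "from weak learning", learned from about ten labeled layouts. The abstract does not say which aggregation rule is used. The sketch below therefore substitutes a simple, well-known choice in the spirit of weighted-majority and boosting combinations: a weighted vote in which each verifier's weight is the log-odds of its smoothed accuracy on the labeled set. The verifier outputs, labels, and function names are hypothetical.

```python
import math

def learn_weights(votes, labels, smoothing=1.0):
    """votes[v][i] is verifier v's pass/fail (True/False) on labeled layout i.

    Each weight is the log-odds of the verifier's Laplace-smoothed accuracy, so
    chance-level verifiers get weight near 0 and worse-than-chance ones go negative.
    """
    weights = []
    for row in votes:
        correct = sum(1 for vote, y in zip(row, labels) if vote == y)
        acc = (correct + smoothing) / (len(labels) + 2 * smoothing)
        weights.append(math.log(acc / (1 - acc)))
    return weights

def aggregate(weights, layout_votes):
    """Strong verdict: weighted vote, mapping pass to +1 and fail to -1."""
    score = sum(w * (1 if vote else -1) for w, vote in zip(weights, layout_votes))
    return score > 0

def f1(preds, labels):
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum((not p) and y for p, y in zip(preds, labels))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Hypothetical labels for six layouts and three weak verifiers.
labels = [True, True, False, False, True, False]
v1 = [True, True, False, True, True, False]    # 5 of 6 correct
v2 = [True, False, False, False, True, True]   # 4 of 6 correct
v3 = [False, False, True, True, False, True]   # always wrong, so its vote gets inverted
w = learn_weights([v1, v2, v3], labels)
preds = [aggregate(w, [v1[i], v2[i], v3[i]]) for i in range(len(labels))]
print(f1(v1, labels), f1(preds, labels))
```

These scores are computed on the same labeled layouts used to fit the weights, so they are optimistic. A real evaluation would use held-out layouts.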

2606.05266 2026-06-05 cs.LG cs.CC cs.DS math.CO math.PR math.ST stat.TH Version update

Sharp Low-Degree Thresholds for Planted-vs-Planted Testing

Anda Skeja, Daniel Gutiérrez Espinoza, Fiona Skerman, Alexander S. Wein

Affiliations: Department of Mathematics, University of California, Davis

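AI summary: Establishes the first sharp thresholds for low-degree polynomial testing in the planted-vs-planted setting. It proves matching upper and lower bounds for counting communities in the planted submatrix and planted dense subgraph models, and the testing thresholds coincide exactly with the known low-degree recovery thresholds.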

2606.05265 2026-06-05 cs.LG Version update

Data-efficient flood depth prediction through domain-aware coreset selection and tabular foundation models

Lipai Huang, Adithi Srinath, Manas Singh, Junwei Ma, Ali Mostafavi

Affiliations: Urban Resilience.AI Lab; Zachry Department of Civil and Environmental Engineering, Texas A&M University; Department of Computer Science and Engineering, Texas A&M University; Resilitix Intelligence LLC; Institute for a Disaster Resilient Texas, Texas A&M University

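AI summary: Proposes a domain-aware coreset construction pipeline that, combined with a tabular foundation model, matches supervised flood-depth prediction accuracy using only 0.7% of the training data and transfers across watersheds.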

Abstract

Near-real-time flood depth prediction demands surrogate models that are accurate, fast, and transferable across watersheds. Supervised surrogates can match physics-based simulators in accuracy but need millions of training rows per watershed and cannot extrapolate beyond their original mesh. We propose a domain-aware coreset construction pipeline that conditions a tabular foundation model at inference time. The pipeline stratifies storms by return period and most-affected watershed, then samples hexagons with a target-aware spatial selector. With 0.7% of the per-watershed training pool, the model attains a mean $R^2$ of 0.663 across nine Houston-area watersheds, within 98.5% of the supervised reference ($R^2$ = 0.673). It transfers to held-out watersheds without task-specific retraining, staying ahead of a coreset-trained supervised baseline. On real storms it exceeds the supervised reference on a far out-of-distribution case and trails it on a mostly in-distribution one. Domain-aware coreset construction lets tabular foundation models deliver data-efficient, watershed-transferable flood predictions without per-watershed training.
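Before spatial sampling, the coreset pipeline "stratifies storms by return period and most-affected watershed". Below is a minimal sketch of just the stratification step, assuming each training row carries those two keys. Rows are grouped into strata, and a fixed fraction is drawn from each stratum, with at least one row per stratum so that rare storm types are not dropped. The target-aware hexagon selector is not modeled. The field names, sampling fraction, and toy data are illustrative assumptions.

```python
import math
import random
from collections import defaultdict

def stratified_coreset(rows, fraction, seed=0):
    """Sample `fraction` of rows from each (return_period, watershed) stratum, at least one per stratum."""
    strata = defaultdict(list)
    for row in rows:
        strata[(row["return_period"], row["watershed"])].append(row)
    rng = random.Random(seed)
    coreset = []
    for key in sorted(strata):  # fixed stratum order keeps results reproducible for a given seed
        members = strata[key]
        k = max(1, math.ceil(fraction * len(members)))
        coreset.extend(rng.sample(members, k))
    return coreset

# Toy pool: 3 return periods x 2 watersheds x 50 hexagons each = 300 rows.
pool = [
    {"return_period": rp, "watershed": ws, "hex_id": h}
    for rp in (10, 100, 500)
    for ws in ("A", "B")
    for h in range(50)
]
core = stratified_coreset(pool, fraction=0.05)
print(len(core))  # 6 strata x ceil(0.05 * 50) = 6 x 3 = 18
```

Stratifying first ensures that the small context set given to the tabular foundation model covers every storm regime, which uniform sampling at such low budgets can miss.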

2606.05263 2026-06-05 cs.LG cs.AI Version update

Policy-Conditioned Counterfactual Credit for Verifiable Reinforcement Learning of Long-Horizon Language Agents

Renwei Meng

Affiliations: Anhui University (stu.ahu.edu.cn)

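AI summary: Proposes CVT-RL, which uses policy-conditioned counterfactual contribution estimation and verifiable-reward constraints to address unsupported evidence chains, belief drift, and shortcut behavior in long-horizon language agents during reasoning and tool use. It improves success rates and reduces reward hacking across multiple tasks.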

Comments 16 pages, 6 figures

Abstract

Reinforcement learning with verifiable rewards improves reasoning and tool use, yet long-horizon language agents still learn unsupported evidence chains, belief drift, and shortcut actions that satisfy terminal checks. Existing process rewards are mostly correlational: they reward retrieval-, reflection-, or verification-like steps without estimating whether the step contributes to final verified success under a specified intervention. We propose CVT-RL, a constrained policy-gradient algorithm with dense verifiable rewards, intervention-validity gating, and a policy-conditioned counterfactual contribution (PCCC) estimator. Deletion, semantic substitution, evidence substitution, and tool-output perturbation define separate controlled interventions; continuations are sampled from a frozen reference policy, and a selection-adjusted doubly robust estimator augments the advantage. Belief control uses only prefix-observable labels, while an augmented Lagrangian constrains unsupported claims, skipped verification, tool tampering, and unsafe calls. On long-context QA, ALFWorld, ScienceWorld, and web/tool tasks, CVT-RL improves average task success from 71.8% for compute-matched non-causal RL and 75.4% for an information-matched counterfactual-process baseline to 78.9%, improves evidence F1 from 78.9 to 82.8 over the information-matched baseline, and reduces measured hacking from 7.2% to 3.9%. Independent human audit estimates 4.6% hacking for CVT-RL versus 8.1% for the information-matched baseline, and adaptive detector-evasion attacks raise hacking only to 7.1%. Stratified bootstrap and mixed-effects tests give p<0.01 after Holm correction for all primary metrics. Carefully scoped counterfactual credit, paired with validity gating, diagnostics, and verifiable constraints, provides a reproducible route toward more reliable long-horizon RL for language agents.
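
The abstract mentions a selection-adjusted doubly robust estimator but does not spell it out. As a generic point of reference only, the sketch below shows the standard augmented inverse-propensity-weighted (AIPW) doubly robust estimate of a mean outcome under an intervention. It is not CVT-RL's PCCC estimator, and the inputs (per-rollout outcomes, intervention indicators, propensities, and outcome-model predictions) are hypothetical.

```python
from typing import Sequence


def aipw_mean(
    outcomes: Sequence[float],
    intervened: Sequence[int],
    propensity: Sequence[float],
    outcome_model: Sequence[float],
) -> float:
    """Generic doubly robust (AIPW) estimate of the mean outcome under an intervention.

    outcomes:      observed outcome Y_i (e.g. 1.0 if the final answer verified)
    intervened:    1 if rollout i received the intervention, else 0
    propensity:    estimated P(intervened = 1 | context_i), assumed > 0
    outcome_model: model prediction m(context_i) of Y under the intervention
    """
    n = len(outcomes)
    total = 0.0
    for y, a, e, m in zip(outcomes, intervened, propensity, outcome_model):
        # Outcome-model prediction plus an inverse-propensity-weighted residual correction.
        total += m + a * (y - m) / e
    return total / n


# Hypothetical toy data: four rollouts, two of which received the intervention.
y = [1.0, 0.0, 1.0, 1.0]
a = [1, 0, 1, 0]
e = [0.5, 0.5, 0.5, 0.5]
m = [0.5, 0.5, 0.5, 0.5]
print(aipw_mean(y, a, e, m))
```

The estimate remains consistent if either the propensity model or the outcome model is correct, which is the usual motivation for the doubly robust form.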

2606.05261 2026-06-05 cs.CV cs.AI cs.LG (version update)

NIV: Neural Axis Variations for Variable Font Generation

Nadav Benedek, Ariel Shamir, Ohad Fried

Affiliation: Reichman University

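AI summary: Proposes NIV, which predicts per-point displacements of glyph outlines to automatically convert static fonts into variable fonts that support continuous multi-axis interpolation, and validates its generalization on a newly constructed dataset.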

Abstract

Variable fonts enable continuous variation of glyph geometry along semantic design axes such as weight, width, slant, and optical size. However, constructing a variable font from a static font remains a labor-intensive process requiring expert typographic design and manual specification of glyph variation data. We introduce NIV (Neural Axis Variations), a method that automatically converts a static font into a fully functional variable font. Given glyph outlines and a set of desired design axes, NIV predicts per-point displacements. The model operates directly on vector glyph geometry and employs a novel Property Embedding mechanism that captures interactions between multiple axes, enabling consistent multi-axis variation within a unified framework. We train NIV on a newly constructed dataset derived from variable Google Fonts, comprising over one million variation tuples. The resulting model generalizes across unseen code points, unseen font styles, high-complexity CJK glyphs, and even out-of-distribution handwriting inputs. The generated outputs are standard variable font files supporting continuous interpolation via existing rendering engines. To facilitate research, we release the dataset, the complete training and inference implementation, and trained models at https://github.com/ndvbd/NIV. Beyond typography, our approach demonstrates how structured geometric objects with continuous parametric variation can be synthesized using neural deformations.

2606.05258 2026-06-05 stat.ML cs.LG stat.AP (version update)

Harnessing Source Heterogeneity for Cluster-Structured Transfer Learning

Xiaohui Yin, Jun Jin, Shane J. Sacco, Robert H. Aseltine, Kun Chen

Affiliations: Department of Statistics, University of Connecticut; Department of Public Health Sciences, Henry Ford Health; Center for Population Health, University of Connecticut Health Center

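AI summary: Addresses source heterogeneity in transfer learning with Trans-GLMC, which adaptively fuses target and source data through a cluster structure, improving prediction and identifying interpretable source clusters.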

Abstract

Transfer learning is a natural strategy when a target population has limited data but multiple related auxiliary sources are available. A central difficulty is source heterogeneity: auxiliary sources may not be equally useful, and their usefulness may vary in a structured, cluster-like fashion. Existing transfer-learning methods often reduce source selection to a binary informative/non-informative decision, overlooking subgroups of sources with differential transferability. Motivated by a suicide-risk study using data from the Connecticut Hospital Information Management Exchange (CHIME), comprising 636,758 patients across 27 hospitals, we propose Trans-GLMC, a cluster-structured transfer-learning procedure for generalized linear models. The CHIME setting illustrates the core challenge: hospital-specific risk models are unstable because suicide attempts are rare at any single facility, whereas indiscriminate pooling across hospitals can obscure facility-level differences in patient mix and risk profiles. Trans-GLMC first constructs a coefficient-based distance among the target and candidate sources to recover latent source clusters. It then combines global fusion, within-cluster refinement, and target debiasing to produce an estimator that adapts to the detected structure. We establish a non-asymptotic error bound that improves over its unclustered counterpart whenever a meaningful target cluster exists and matches the unclustered rate up to constants otherwise. In simulations and in the CHIME study, Trans-GLMC improves facility-specific prediction, identifies interpretable communities of hospitals with mutual transferability, and recovers clinically coherent suicide-risk factors.

2606.05257 2026-06-05 cs.LG cs.IR (version update)

Scaling Laws for Behavioral Foundation Models over User Event Sequences

Rickard Brüel Gabrielsson

Affiliation: Unbox AI

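AI summary: Studies scaling laws for behavioral foundation models on user event sequences. Across about 600 runs, it finds that a small embedder (about 2% of parameters) is compute-optimal, that compute-optimal training is data-heavy at low compute, and that the choice of evaluation metric changes the scaling law.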

Abstract

Foundation models are increasingly trained on sequences of user actions in recommendation, payments, fraud, and commerce, but these models still lack the kind of compute calibration that scaling laws provide for language models. We study a common two-part behavioral-model architecture: a feature-based event embedder maps each multi-modal item to a vector, and a decoder-only transformer predicts the next event from the resulting sequence. Across roughly 600 runs on real interaction data, spanning $10^{15}$-$10^{19}$ training FLOPs, we jointly vary four deployment-relevant axes: the two-part parameter split, critical batch size, model/data allocation, and the number of sampled negatives used after freezing the embedder. A small embedder ($s^{\star}\!\approx\!2\%$ of parameters) is compute-optimal at every budget we test because embedder parameters are both more expensive per step and exposed to far more repeated items than contextualizer parameters. Compute-optimal training is data-heavy relative to text at low compute, but its $D/N$ ratio moves toward the Chinchilla heuristic as compute increases. The sampled training objective and deployed ranking metrics disagree in ways that themselves scale: critical batch size, optimal negative count after freezing, and the agreement between loss and ranking quality all shift with compute and with the chosen evaluation metric. For negative sampling, larger budgets increasingly prefer more negatives; by $10^{19}$ FLOPs the active constraint is candidate-axis memory rather than FLOPs. In behavioral foundation models, the evaluation metric is therefore part of the scaling law: changing it can change the compute-optimal recipe.
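
For orientation, a compute-optimal split is often approximated with the dense-transformer rule C ≈ 6ND (C training FLOPs, N parameters, D training tokens) together with a fixed tokens-per-parameter ratio, about 20 under the Chinchilla heuristic. The sketch below assumes both, and it also carves out a hypothetical embedder share of about 2% of parameters. The abstract notes that embedder parameters cost more per step, so 6ND is only a rough assumption for this two-part architecture. The paper also reports data-heavier ratios at low compute, so `tokens_per_param` would be larger there.

```python
import math


def compute_optimal_split(
    compute_flops: float,
    tokens_per_param: float = 20.0,
    embedder_share: float = 0.02,
) -> dict[str, float]:
    """Split a FLOP budget assuming C ≈ 6 * N * D and D = tokens_per_param * N."""
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    embedder = embedder_share * n_params  # hypothetical embedder share of parameters
    return {
        "params": n_params,
        "tokens": n_tokens,
        "embedder_params": embedder,
        "contextualizer_params": n_params - embedder,
    }


# A budget inside the 1e15-1e19 FLOP range studied in the abstract.
print(compute_optimal_split(4.8e18))
```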

2606.05254 2026-06-05 cs.LG cs.CV cs.RO (version update)

Flash-WAM: Modality-Aware Distillation for World Action Models

Arman Akbari, Ci Zhang, Arash Akbari, Lin Zhao, Yixiao Chen, Weiwei Chen, Xuan Zhang, Geng Yuan, Yanzhi Wang

Affiliations: Northeastern University; University of Georgia; EmbodyX Inc.

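AI summary: Targets the failure of step distillation in world-action models, which jointly generate video and robot actions, caused by asymmetric noise distributions across modalities. Proposes Flash-WAM, a modality-aware step-distillation framework that chooses a parametrization matched to each modality's noise regime, enabling single-step inference and a large speedup.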

Abstract

World-action models (WAMs) jointly generate future video and robot actions through iterative diffusion, achieving strong performance on manipulation benchmarks but requiring tens of denoising steps, a cost that precludes real-time control. Step distillation has emerged as the natural remedy, but off-the-shelf methods break down in the joint video-action setting because video and action streams use different SNR-shifted noise schedules and reach training with substantially different marginal noise distributions, an asymmetry that single-modality distillation methods cannot accommodate. We introduce \textbf{Flash-WAM}, a modality-aware step-distillation framework inspired by consistency distillation that selects the consistency function for each modality to match its noise regime: a linear-gradient-scaling parametrization for the action stream's low-noise regime, paired with a variance-preserving parametrization for the video stream's high-noise regime, grounded in a structural analysis of the consistency-function family that characterizes the achievable gradient scaling under the consistency boundary condition. Instantiated on LingBot-VA, Flash-WAM compresses inference to a single step in each modality. On RoboTwin 2.0, this reduces per-chunk latency from $8.1$ seconds to $348$ ms on NVIDIA L40S, a $23{\times}$ speedup that enables real-time inference. Flash-WAM preserves task success on simulation benchmarks ($85.5\%$ RoboTwin 2.0, $95.7\%$ LIBERO) and substantially recovers real-world performance ($60\%$ average on a Unitree G1 humanoid robot), while naive consistency distillation drops to $24\%$ at the same step budget.
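
Consistency distillation relies on the boundary condition f(x, ε) = x. A common way to build this in, taken from the original consistency-models work, is the skip parametrization f(x, t) = c_skip(t)·x + c_out(t)·F(x, t), with c_skip(ε) = 1 and c_out(ε) = 0. The sketch below shows that standard form with assumed defaults ε = 0.002 and σ_data = 0.5. Flash-WAM's per-modality linear-gradient-scaling and variance-preserving parametrizations are not given in the abstract and are not reproduced here.

```python
import math
from typing import Callable


def consistency_coeffs(t: float, eps: float = 0.002, sigma_data: float = 0.5) -> tuple[float, float]:
    """Skip/output coefficients satisfying c_skip(eps) = 1 and c_out(eps) = 0."""
    c_skip = sigma_data**2 / ((t - eps) ** 2 + sigma_data**2)
    c_out = sigma_data * (t - eps) / math.sqrt(sigma_data**2 + t**2)
    return c_skip, c_out


def consistency_fn(
    x: float,
    t: float,
    network: Callable[[float, float], float],
    eps: float = 0.002,
    sigma_data: float = 0.5,
) -> float:
    """f(x, t) = c_skip(t) * x + c_out(t) * F(x, t); reduces exactly to x at t = eps."""
    c_skip, c_out = consistency_coeffs(t, eps, sigma_data)
    return c_skip * x + c_out * network(x, t)


def dummy_network(x: float, t: float) -> float:
    # Stand-in for a trained denoiser; its output is ignored at t = eps.
    return 123.0


print(consistency_fn(0.7, 0.002, dummy_network))  # boundary case: returns x
print(consistency_coeffs(80.0))                  # large t: skip weight close to 0
```

Because the boundary condition holds by construction, distillation only has to train F to make outputs agree across adjacent noise levels.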

2606.05253 2026-06-05 cs.LG (version update)

Alpha-RTL: Test-Time Training for RTL Hardware Optimization

Peilong Zhou, Zhirong Chen, Cangyuan Li, Haoyu Gao, Kaiyan Chang, Ziming Qu, Ying Wang

Affiliations: SKLP, Institute of Computing Technology, Chinese Academy of Sciences; University of the Chinese Academy of Sciences; School of Advanced Interdisciplinary Sciences

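AI summary: Proposes TTT-RTL, which optimizes LLM-generated RTL designs through test-time reinforcement learning with EDA feedback (syntax checks, simulation, and PPA rewards), reducing the PPA product by 65.1% on RTLLM v2.0 and ADP by 59.4% on an industrial C910 FPU unit.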

Comments 10 pages, 5 figures

Abstract

Large language models (LLMs) have shown increasing promise in generating functionally correct register-transfer-level (RTL) hardware designs. Recent systems improve further through EDA-integrated reinforcement learning with syntax, simulation, and PPA rewards, but train a general RTL generator before deployment while test-time approaches search with a frozen policy. We instead perform reinforcement learning at test time, allowing the LLM policy to adapt to executable EDA feedback for the specific RTL problem at hand. We propose TTT-RTL, to our knowledge the first per-design test-time training framework that closes the loop between an LLM policy and an EDA pipeline for RTL optimization. TTT-RTL samples candidate implementations, verifies them through syntax checking and simulation, scores valid designs using synthesis-derived PPA product, reuses high-reward variants through a PUCT-indexed design-state pool, and updates the policy with an entropic policy-gradient objective. To stabilize policy updates under sparse or plateaued rewards, we introduce an adaptive KL-budget controller that adjusts the entropy constraint using reference KL, effective sample size, and reward saturation signals. On RTLLM v2.0 under Nangate 45nm, TTT-RTL reduces the geometric-mean PPA product by 65.1% over the reference, outperforming the strongest published frozen-policy agent baseline at 26.1%. On an industrial XuanTie C910 FPU leading-zero-anticipation unit under Sky130, TTT-RTL achieves a 59.4% ADP reduction, and ablations confirm that policy adaptation, state reuse, and KL-budget control each contribute. These results suggest that test-time training with executable EDA feedback can move LLM-based RTL generation beyond functional correctness toward physically optimized hardware.
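
The abstract says that high-reward designs are reused through a PUCT-indexed design-state pool. Assuming the usual AlphaZero-style PUCT score, Q(s) + c·P(s)·√(ΣN)/(1 + N(s)), a minimal selection rule over such a pool might look like the sketch below. The field names, the prior P, the exploration constant, and the reward convention are all hypothetical choices, not TTT-RTL's implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class DesignState:
    name: str
    prior: float            # P(s): hypothetical prior, e.g. from the LLM policy
    visits: int = 0         # N(s): how often this design has been expanded
    value_sum: float = 0.0  # summed rewards (hypothetically, higher for a lower PPA product)

    def q(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(state: DesignState, total_visits: int, c_puct: float = 1.5) -> float:
    # Exploitation (mean reward) plus an exploration bonus that shrinks with visits.
    exploration = c_puct * state.prior * math.sqrt(total_visits) / (1 + state.visits)
    return state.q() + exploration


def select(pool: list[DesignState], c_puct: float = 1.5) -> DesignState:
    total = sum(s.visits for s in pool)
    return max(pool, key=lambda s: puct_score(s, total, c_puct))


pool = [
    DesignState("baseline_rewrite", prior=0.5, visits=10, value_sum=8.0),
    DesignState("untried_variant", prior=0.5),
]
print(select(pool).name)  # the unvisited variant wins on its exploration bonus
```

Once the untried variant has been visited and scores poorly, the exploitation term takes over and selection returns to the stronger design.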

2606.05247 2026-06-05 cs.LG stat.ML (version update)

DiffSlack: Learning under Nonlinear Inequality Constraints via Learnable Slack Variables

Ziqian Wang, Chenxi Fang, Zhen Zhang

Affiliations: State Key Laboratory of Tribology in Advanced Equipment, Tsinghua University; Beijing Key Laboratory of Transformative High-end Manufacturing Equipment and Technology, Department of Mechanical Engineering, Tsinghua University; Automotive Electronics Business Unit, Hirain Inc.

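AI summary: Proposes DiffSlack, a differentiable projection layer that converts nonlinear inequality constraints into equalities using learnable slack variables and applies a damped Gauss-Newton projection for end-to-end constraint satisfaction, achieving higher success rates and better geometric constraint satisfaction in vehicle path planning.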

Abstract

Enforcing nonlinear inequality constraints in neural networks remains challenging, especially when the output is subject to many coupled constraints. Existing hard constraint methods often impose structural restrictions on the constraint set or introduce substantial computational overhead for large-scale nonlinear problems. Here, we propose DiffSlack, a differentiable projection layer for nonlinear inequality-constrained neural prediction. DiffSlack reformulates inequalities as equalities with learnable slack variables, which are predicted as part of the augmented network output and provide a data-driven warm start for damped Gauss-Newton projection. The projection layer maps raw predictions onto the augmented feasible manifold while preserving end-to-end differentiability. A two-stage curriculum further stabilizes training and improves constraint satisfaction. We evaluate DiffSlack on vehicle path planning with 200 nonlinear inequality constraints from collision avoidance, curvature limits, and waypoint spacing. Compared with existing learning-based baselines, DiffSlack achieves a higher planning success rate and stronger geometric constraint satisfaction under a comparable inference budget. Ablation studies further show that the hard projection layer reduces sensitivity to supervision quality. Closed-loop tracking in CARLA and real-world vehicle experiments confirm the executability of the generated trajectories. These results demonstrate that DiffSlack provides a practical and scalable approach to embedding hard inequality constraints into neural networks for engineering applications.
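
To make the slack reformulation concrete, an inequality g(x) ≤ 0 can be written as the equality g(x) + s² = 0 in the augmented variables (x, s). The squared slack is one common choice and is assumed here. A projection then moves a raw prediction onto that manifold. The sketch below uses a single hypothetical constraint (stay inside the unit disk) and a Levenberg-Marquardt-style damped minimum-norm Gauss-Newton step, Δz = -Jᵀr / (JJᵀ + λ). The starting slack value stands in for the network-predicted warm start. This only illustrates the forward pass; it is not DiffSlack's layer, its handling of 200 coupled constraints, or its backward pass.

```python
def constraint(x: list[float]) -> float:
    """Hypothetical inequality g(x) <= 0: stay inside the unit disk."""
    return x[0] ** 2 + x[1] ** 2 - 1.0


def constraint_grad(x: list[float]) -> list[float]:
    return [2.0 * x[0], 2.0 * x[1]]


def project(x: list[float], s: float, damping: float = 1e-3, iters: int = 50, tol: float = 1e-12):
    """Project (x, s) onto the augmented manifold g(x) + s^2 = 0.

    Each iteration takes a damped minimum-norm Gauss-Newton step
    dz = -J^T r / (J J^T + damping) for the single scalar residual r.
    """
    z = list(x) + [s]
    for _ in range(iters):
        r = constraint(z[:2]) + z[2] ** 2           # equality residual
        if abs(r) < tol:
            break
        jac = constraint_grad(z[:2]) + [2.0 * z[2]]  # dr/dz, including the slack
        jjt = sum(j * j for j in jac)
        z = [zi - ji * r / (jjt + damping) for zi, ji in zip(z, jac)]
    return z[:2], z[2]


# Infeasible raw prediction (2, 0) with a warm-start slack of 0.5.
x_proj, s_proj = project([2.0, 0.0], 0.5)
print(x_proj, s_proj)
```

Because g(x) = -s² ≤ 0 on the manifold, any point the projection reaches satisfies the original inequality.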

2606.05242 2026-06-05 stat.ML cs.LG cs.NA math.NA math.PR (version update)

Deterministic Envelopes for Tamed SGLD: Decoupling Stochastic-Gradient Noise and Localizing Taming

Yiwei Zhou, Ziheng Chen

Affiliation: School of Mathematics and Statistics, Yunnan University

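AI summary: Addresses the stationary bias that tamed denominators introduce in stochastic-gradient Langevin algorithms. Proposes a structure-preserving deterministic-envelope framework that decouples gradient noise from the taming step to remove this bias, together with a hybrid envelope design that balances stability and bias reduction.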

Comments 40 pages, 11 tables, 2 figures

Abstract

Stochastic-gradient Langevin algorithms often use tamed denominators to stabilize non-globally Lipschitz drifts. This paper shows that when the denominator depends on the same stochastic-gradient realization as the numerator, the taming step changes the stochastic oracle itself and can create a stationary bias even if the original stochastic gradient is unbiased. We propose a structure-preserving framework for designing tamed denominators. It fixes the denominator before the oracle noise is sampled and uses localized deterministic envelopes to avoid unnecessary taming in typical regions. These kernels keep the stabilizing effect of taming while avoiding the bias introduced by a gradient-dependent denominator. Our theory explains how the stationary error splits into the bias caused by oracle-dependent taming and the remaining error introduced by deterministic stabilization. Within this deterministic-envelope family, the analysis identifies a far-tail condition that explains the limitation of local soft envelopes and motivates a hybrid member: soft in the typical region, but protected by hard-tail control on rare excursions. Experiments confirm the predicted stationary distortions of random denominators, the bias reduction of deterministic-envelope designs, and the stabilizing effect of the hybrid construction.
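
The bias mechanism described above can be checked exactly on a toy one-dimensional example. This is our own illustration, not the paper's envelope construction. Assume a true gradient of 1, zero-mean noise of ±2 with equal probability, and h = 1. Dividing each noisy sample by 1 + h|ĝ| uses a denominator built from the same noisy draw, and this shifts the mean increment away from the tamed drift. A denominator fixed before the noise is drawn keeps the mean equal to g divided by that denominator. The value 2 below is an arbitrary stand-in for a deterministic envelope evaluated at the current state.

```python
from fractions import Fraction
from typing import Callable, Sequence


def mean_increment(
    true_grad: Fraction,
    noise: Sequence[Fraction],
    denominator: Callable[[Fraction], Fraction],
) -> Fraction:
    """Exact mean of g_hat / denominator(g_hat) over equally likely noise values."""
    total = Fraction(0)
    for xi in noise:
        g_hat = true_grad + xi  # unbiased: the noise values average to zero
        total += g_hat / denominator(g_hat)
    return total / len(noise)


h = Fraction(1)
g = Fraction(1)
noise = [Fraction(2), Fraction(-2)]


def random_denominator(g_hat: Fraction) -> Fraction:
    # Tamed denominator computed from the same noisy gradient as the numerator.
    return 1 + h * abs(g_hat)


def fixed_denominator(g_hat: Fraction) -> Fraction:
    # Hypothetical deterministic envelope value, fixed before the noise is drawn.
    return Fraction(2)


print(mean_increment(g, noise, random_denominator), g / (1 + h * abs(g)))  # 1/8 vs 1/2
print(mean_increment(g, noise, fixed_denominator), g / Fraction(2))       # 1/2 vs 1/2
```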

2606.05239 2026-06-05 stat.ML cs.LG (version update)

HyFAD: Hybrid Time-Frequency Diffusion with Frequency-Aware Embedding for Time Series Imputation

Hongfan Gao, Wangmeng Shen, Bin Yang, Jilin Hu

Affiliations: School of Data Science and Engineering; East China Normal University

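AI summary: Proposes HyFAD, which combines a coupled time-frequency diffusion framework with a frequency-aware step embedding to denoise progressively from the time domain to the frequency domain. It addresses frequency-sensitive denoising and high-frequency reconstruction and reports state-of-the-art performance on several benchmark datasets.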

Abstract

Diffusion models have demonstrated strong performance in time series modeling due to their ability to progressively capture complex data distributions through iterative denoising. However, existing approaches struggle with frequency-sensitive denoising, high-frequency reconstruction, and balancing global trends with local dynamics. To address these limitations, we propose \textbf{HyFAD}, a \textbf{Hy}brid time-frequency \textbf{D}iffusion model with \textbf{F}requency-\textbf{A}ware embedding for time series imputation. Built upon the DDPM paradigm, HyFAD adopts a coupled time-frequency diffusion framework, in which the reverse denoising proceeds sequentially from the time domain to the frequency domain, enabling coarse-to-fine generation. Specifically, the time-domain diffusion process captures low-frequency global trends, while the frequency-domain diffusion process refines high-frequency spectral components. We further introduce a frequency-aware step embedding that exploits the relationship between diffusion steps and spectral components, providing step-dependent spectral guidance and facilitating more accurate band-wise reconstruction. Extensive experiments on multiple benchmark datasets demonstrate that HyFAD achieves state-of-the-art performance. Our source code is available at https://github.com/hongfangao/HyFAD.

2606.05236 2026-06-05 cs.RO cs.LG (version update)

A New Quaternion-Joint Cable-Driven Redundant Manipulator Configuration and its Control Through FABRIK and Residual Reinforcement Learning

Tanapath Pornthisan, Thanapat Kemthong, Thanyapisit Kangsathien, Pasut Aranchaiya, Paulo Garcia, Viboon Sangveraphunsiri

Affiliation: University of California, San Diego

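AI summary: Proposes a 4-segment, 8-joint quaternion-joint cable-driven redundant manipulator configuration and controls it with residual reinforcement learning, achieving position and orientation accuracy three orders of magnitude better than the FABRIK algorithm.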

Abstract

Robotic arms capable of traversing arbitrary spatial paths, especially in highly obstructed workspaces, are highly desired across several industries. Quaternion joints have recently extended the capabilities of a specific class of robotic arms, cable-driven redundant manipulators. Specifically, quaternion joints reduce the number of motors required per degree of freedom, paving the way for more compact solutions. An ongoing challenge is the complexity of the quaternion-joint kinematic model: it complicates a priori decisions on manipulator configuration, imposes higher computational demands on the control system, and its non-linearities amplify every discrepancy between design and physical artifact arising from fabrication imprecision. Here we show that a 4-segment, 8-joint manipulator can achieve a broader workspace than extant configurations at lower hardware cost, and that Residual Reinforcement Learning outperforms extant state-of-the-art methods (specifically, the FABRIK algorithm) in controlling such a manipulator. Our results show that this configuration is more workspace-effective than prior designs, and that Residual Reinforcement Learning outperforms FABRIK by three orders of magnitude in positional and orientational accuracy, enabling precise control of the novel 4-segment, 8-joint manipulator. Additionally, the control implementation is simpler: we describe the complete FABRIK control process and the corresponding learning implementation. Our methodology is applicable to the design of new systems, giving designers further tools for developing this class of manipulators and corresponding control systems for novel configurations.
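
For readers unfamiliar with the FABRIK baseline, the sketch below is the generic planar FABRIK of Aristidou and Lasenby. It alternates a forward pass that pins the end effector to the target with a backward pass that re-pins the base, and each pass rescales every link to its original length. It assumes a simple point chain with no joint limits. It does not model the quaternion-joint kinematics, cable actuation, orientation targets, or the learned residual policy, which would add a correction on top of this output.

```python
import math

Point = tuple[float, float]


def _dist(a: Point, b: Point) -> float:
    return math.hypot(b[0] - a[0], b[1] - a[1])


def _toward(a: Point, b: Point, length: float) -> Point:
    """Point at the given distance from a along the ray a -> b (a and b must differ)."""
    d = _dist(a, b)
    return (a[0] + (b[0] - a[0]) * length / d, a[1] + (b[1] - a[1]) * length / d)


def fabrik(joints: list[Point], target: Point, tol: float = 1e-6, max_iter: int = 1000) -> list[Point]:
    lengths = [_dist(joints[i], joints[i + 1]) for i in range(len(joints) - 1)]
    base = joints[0]
    pts = list(joints)
    if _dist(base, target) > sum(lengths):
        # Unreachable target: stretch the chain straight toward it.
        for i, length in enumerate(lengths):
            pts[i + 1] = _toward(pts[i], target, length)
        return pts
    for _ in range(max_iter):
        if _dist(pts[-1], target) < tol:
            break
        # Forward reaching: pin the end effector to the target, walk back to the base.
        pts[-1] = target
        for i in range(len(pts) - 2, -1, -1):
            pts[i] = _toward(pts[i + 1], pts[i], lengths[i])
        # Backward reaching: pin the base again, walk out to the end effector.
        pts[0] = base
        for i in range(len(pts) - 1):
            pts[i + 1] = _toward(pts[i], pts[i + 1], lengths[i])
    return pts


# Hypothetical three-link planar chain with unit links, initially straight along x.
chain = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
solution = fabrik(chain, (1.5, 1.5))
print(solution[-1])
```

In a residual scheme, a learned policy would output a small correction that is added to the joint configuration produced by a solver like this.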

2606.05234 2026-06-05 cs.RO cs.LG (version update)

OLIVE: Online Low-Rank Incremental Learning for Efficient Adaptive Exoskeletons

Dong Liu, Yanxuan Yu, Ben Lengerich, Tony Geng, Ying Nian Wu

Affiliations: University of California, Los Angeles; Columbia University; University of Wisconsin-Madison; Rice University

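AI summary: Proposes OLIVE, which uses a low-rank residual decomposition and reward-driven policy gradients for online, personalized adaptation of exoskeleton control, improving gait smoothness, reducing effort, and increasing stability across multiple terrains.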

Abstract

Wearable exoskeleton systems hold promise for restoring mobility in individuals with physical impairments, yet most existing controllers rely on static gait policies that lack the ability to adapt to dynamic real-world environments or individual user characteristics. We present OLIVE (Online Low-rank Incremental Learning for Efficient Adaptive Exoskeletons), a parameter-efficient online adaptation framework that continuously personalizes exoskeleton control during deployment. OLIVE decomposes the adaptive component of the control policy into a low-rank residual form $\Delta W = A_t B_t^\top$ with rank $r \ll \min(d,k)$, reducing the online update cost from $\mathcal{O}(dk)$ to $\mathcal{O}(r(d+k))$ while preserving the stability of a pretrained base controller $W_0$. Parameters are updated via a reward-shaped policy gradient driven purely by on-body sensor feedback (EMG, IMU, vibration), eliminating dependence on offline reference trajectories. A gating mechanism modulates the strength of personalization based on contextual state, and a dynamic rank scheduler adapts the update dimensionality to terrain complexity, allocating minimal capacity on simple flat terrain and expanding to higher-rank updates on demanding uneven surfaces. This enables robust performance across diverse activities: flat walking, stair navigation, slopes, and uneven terrain. Experiments on the wearable platform demonstrate that OLIVE achieves +13, +22, and +15 percentage-point improvements in gait smoothness, effort reduction, and motion stability over the strongest baseline, converging within ~1,800 walking steps at 7.4 ms end-to-end latency. Our code implementation is available at https://github.com/FastLM/OLIVE.
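
The cost reduction comes from never materializing the full d × k update. The sketch below assumes the adapted layer is linear, with a frozen pretrained weight W_0 of shape d × k and trainable factors A (d × r) and B (k × r). Under those assumptions the forward pass can apply W_0 x + A(Bᵀx), and only A and B, which hold r(d + k) numbers, are updated online. The example checks the result against the dense product on a tiny hypothetical case. The gating mechanism, rank scheduler, and policy-gradient update are omitted.

```python
Matrix = list[list[float]]


def matvec(m: Matrix, v: list[float]) -> list[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def low_rank_forward(w0: Matrix, a: Matrix, b: Matrix, x: list[float]) -> list[float]:
    """Compute (W0 + A B^T) x as W0 x + A (B^T x) without forming A B^T.

    w0: d x k frozen base weight; a: d x r and b: k x r are the trainable factors.
    """
    rank = len(a[0])
    bt_x = [sum(b[j][c] * x[j] for j in range(len(x))) for c in range(rank)]  # r values
    return [base + delta for base, delta in zip(matvec(w0, x), matvec(a, bt_x))]


def trainable_params(d: int, k: int, r: int) -> tuple[int, int]:
    """Return (full d*k update size, low-rank r*(d+k) update size)."""
    return d * k, r * (d + k)


w0 = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]  # d = 2, k = 3
a = [[1.0], [2.0]]                        # d x r with r = 1
b = [[1.0], [0.0], [1.0]]                 # k x r
x = [1.0, 2.0, 3.0]
print(low_rank_forward(w0, a, b, x))      # [11.0, 10.0], same as the dense (W0 + A B^T) x
print(trainable_params(256, 256, 4))      # (65536, 2048)
```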

2606.05232 2026-06-05 cs.LG cs.AI (version update)

Differentiable Efficient Operator Search

Xiaohuan Pei, Jiyuan Zhang, Yuanfan Guo, Weiguo Feng, Tao Huang, Cho-Jui Hsieh, Chang Xu

Affiliations: The University of Sydney; ByteDance; Shanghai Jiao Tong University; University of California, Los Angeles

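AI summary: Proposes a differentiable efficient operator search framework that gives a unified interpretation of several token-reduction operators and jointly searches where to reduce, how many tokens to keep, and how the operator behaves, optimizing multimodal model performance under budget constraints.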

Abstract

Efficient multimodal foundation models often rely on manually designed token-reduction operators, such as pruning, merging, pooling, and adaptive reweighting. Although these operators appear different, we show that they can be interpreted as distinct regimes of a shared operator space. Based on this view, we introduce Efficient Operator Search, a differentiable framework that jointly searches where to reduce tokens, how many tokens to retain, and how reduced token information should be processed. The proposed search space parameterizes layer activation, retention budget, and operator behavior, while the search policy optimizes task performance under one-sided budget and cost constraints. This formulation recovers representative hand-designed baselines as special cases and further discovers hybrid operators beyond isolated manual designs. Experiments on multimodal benchmarks show that the searched operators achieve competitive accuracy-efficiency trade-offs, especially under aggressive visual-token reduction. These results suggest that efficient multimodal inference can be reframed from manual operator design to differentiable operator search.
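
One way to read the shared-operator claim is that each reduced token is a normalized weighted combination of the input tokens, and the weight pattern selects the regime. One-hot rows give pruning, uniform rows over a window give pooling, and rows spread over a group of tokens give merging. The sketch below is our own illustration of that view under these assumptions, not the paper's search-space parameterization.

```python
Tokens = list[list[float]]


def reduce_tokens(tokens: Tokens, weights: list[list[float]]) -> Tokens:
    """Each output token is a weighted average of the input tokens (one weight row per output)."""
    dim = len(tokens[0])
    out = []
    for row in weights:
        norm = sum(row)
        out.append([sum(w * tok[j] for w, tok in zip(row, tokens)) / norm for j in range(dim)])
    return out


tokens = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]
prune = [[1, 0, 0, 0], [0, 0, 1, 0]]  # keep tokens 0 and 2
pool = [[1, 1, 0, 0], [0, 0, 1, 1]]   # average adjacent pairs
merge = [[1, 1, 1, 0], [0, 0, 0, 1]]  # merge the first three tokens, keep the last
print(reduce_tokens(tokens, prune))   # [[1.0, 0.0], [5.0, 4.0]]
print(reduce_tokens(tokens, pool))    # [[2.0, 1.0], [6.0, 5.0]]
print(reduce_tokens(tokens, merge))   # [[3.0, 2.0], [7.0, 6.0]]
```

Under this view, a differentiable search would relax the weight rows, the number of output rows, and the layer at which reduction happens into learnable quantities.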

2606.05230 2026-06-05 stat.ML cs.LG eess.SP (version update)

Central Description Length (CDL) Clustering Validation Index

Mahdi Shamsi, Soosan Beheshti

Affiliation: Toronto Metropolitan University

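AI summary: Proposes the Central Description Length (CDL) clustering validation index, which scores clustering quality through a probabilistic upper bound on the description length of the unobservable true cluster centers. It requires no labels and applies to non-convex and irregularly shaped data.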

Abstract

Selecting a clustering algorithm and its hyperparameters without labels is a common difficulty in engineering machine learning pipelines that work with unsupervised analysis of sensor, image, or process data. Clustering validation indices (CVIs) provide internal scores for ranking candidate clusterings, but most popular CVIs are built from Euclidean compactness and separation terms and so tend to favour compact, convex partitions. Their performance is known to degrade on non-convex, irregular, or variable-density data, where kernel transformations or alternative distance measures are typically used at the cost of additional tuning and computation. This paper introduces the Central Description Length (CDL) clustering validation index. CDL uses the observed within-cluster compactness, the estimated cluster centers, and the estimated cluster covariances to compute a probabilistic upper bound on the description length associated with the unobservable true cluster centers. The bound condenses intra-cluster compactness and centroid displacement into a single computable quantity and is evaluated on the partition produced by any clustering algorithm. The implementation uses only observable quantities (the data, the partition, the estimated centers, and the estimated covariances) and does not use ground-truth labels. On synthetic benchmarks with non-convex and arbitrarily shaped clusters, CDL-CVI selected the reference number of clusters more often and reached higher Adjusted Rand Index (ARI) values than the conventional CVIs we tested, without an additional kernel preprocessing stage. On image benchmarks (MNIST, CIFAR-10, STL-10) clustered from frozen unsupervised embeddings, CDL-CVI returned cluster numbers close to the reference class counts across K-means, DBSCAN, and spectral clustering in the reported trials.

2606.05227 2026-06-05 q-bio.CB cs.LG math-ph math.MP q-bio.BM Version update

Quantifying the biophysical properties of stomatocytes in health and disease

Zhaojie Chai, Jianlu Zheng, He Li, Ming Dao, George Em Karniadakis

Affiliations: Division of Applied Mathematics, Brown University; Department of Materials Science and Engineering, Massachusetts Institute of Technology; College of Engineering, University of Georgia

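AI summary: Combines dissipative particle dynamics simulations with microfluidic imaging to build three stomatocyte models, revealing geometry-dominated interendothelial-slit traversal, suppressed membrane tank-treading, and increased low-shear viscosity, which together give a unified explanation of the splenectomy paradox in hereditary stomatocytosis.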

Comments 26 pages, 9 figures

Abstract

Hereditary stomatocytosis (HS) comprises red blood cell (RBC) disorders characterized by cup-shaped erythrocytes that respond oppositely to splenectomy: curative in overhydrated HS (OHS) but potentially thrombogenic in dehydrated HS (DHS/xerocytosis). This paradox persists because RBC biomechanics is governed by partly independent parameters--shear modulus, bending rigidity, surface-to-volume ratio (S/V), and cytoplasmic viscosity--that existing assays capture only piecemeal. Here we combine dissipative particle dynamics (DPD) simulations with microfluidic imaging to construct a control discocyte and three stomatocyte models (ST-RBC1-3) at fixed membrane area and decreasing volume (109.7, 101.5, 89.8 fL), spanning the OHS-to-DHS range. Tracing this parameter set through five mechanically orthogonal assays, we find that interendothelial-slit (IES) traversal is geometry-dominated: overhydrated ST-RBC1 requires an order of magnitude higher critical pressure than healthy RBCs, whereas dehydrated ST-RBC3 passes freely. ST-RBC3 nonetheless suppresses membrane tank-treading and raises low-shear whole-blood viscosity by ~29% at physiological haematocrit, comparable to Gaucher-disease hyperviscosity. A funnel-obstacle chip amplifies these differences into a label-free centerline-offset signal predicted to separate all four RBC types (~4.5 standard deviations between extreme phenotypes). These results unite single-cell mechanics, splenic filtration, and hemorheology in one framework, resolve the splenectomy paradox, and point toward microfluidic pre-operative risk stratification in HS.
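Because the three stomatocyte models keep membrane area fixed while volume decreases, their surface-to-volume ratio rises in inverse proportion to volume. The short sketch below makes that arithmetic explicit and also computes a reduced volume (cell volume divided by the volume of a sphere with the same area). The membrane area of 135 µm² is an assumption for illustration, not a value stated in the abstract; 1 fL = 1 µm³.

```python
import math

# Cell volumes from the abstract (fL; 1 fL = 1 um^3).
VOLUMES_FL = {"ST-RBC1": 109.7, "ST-RBC2": 101.5, "ST-RBC3": 89.8}

# ASSUMPTION: a typical RBC membrane area, not taken from the paper.
AREA_UM2 = 135.0

def surface_to_volume(area_um2: float, volume_fl: float) -> float:
    """S/V in 1/um for a cell with the given membrane area and volume."""
    return area_um2 / volume_fl

def reduced_volume(area_um2: float, volume_fl: float) -> float:
    """Volume divided by the volume of a sphere having the same surface area."""
    radius = math.sqrt(area_um2 / (4.0 * math.pi))
    sphere_volume = 4.0 / 3.0 * math.pi * radius ** 3
    return volume_fl / sphere_volume

for name, vol in VOLUMES_FL.items():
    print(f"{name}: S/V = {surface_to_volume(AREA_UM2, vol):.3f} 1/um, "
          f"reduced volume = {reduced_volume(AREA_UM2, vol):.3f}")

# With area fixed, S/V scales as 1/V, so this ratio does not depend on the assumed area.
ratio = surface_to_volume(AREA_UM2, 89.8) / surface_to_volume(AREA_UM2, 109.7)
print(f"S/V increase from ST-RBC1 to ST-RBC3: {ratio:.3f}x")
```

Because the area cancels in the ratio, the roughly 22% rise in S/V from ST-RBC1 to ST-RBC3 holds for any fixed area; the reduced-volume values, by contrast, depend on the assumed 135 µm².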

2606.05225 2026-06-05 q-bio.QM cs.LG Version update

The Language of Elution: Autoregressive Prediction of the Next Feature in Untargeted LC-HRMS Lipidomics

Dayanjan S. Wijesinghe

Affiliations: Virginia Commonwealth University School of Pharmacy

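AI summary: Models chromatographic elution as an autoregressive sequence-prediction task, using LSTM and Transformer models to predict the next eluting m/z bin from annotation-free features. It reaches 98.4% top-1 accuracy on clinical lipidomics data and shows that sequential patterns, not molecular properties, drive prediction.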

Abstract

Untargeted liquid chromatography-high-resolution mass spectrometry (LC-HRMS) detects thousands of molecular features per sample, yet only 2-20% receive confident structural annotations. A root cause of this "dark metabolome" is that tandem MS/MS acquisition is reactive: instruments select precursors only after ions appear, blind to what elutes next. We reframe chromatographic elution as an autoregressive sequence prediction task. Because reversed-phase elution order is governed by hydrophobicity, successive features form a physically constrained sequence, like tokens in language. We discretize the mass-to-charge (m/z) axis into 110 bins and train long short-term memory (LSTM) and Transformer models to predict the next eluting m/z bin from five annotation-free per-token features: m/z bin, mass defect, retention-time gap, polarity, and intensity rank. Trained on 15,242 features from four clinical lipidomics cohorts (342 plasma samples; SCIEX TripleTOF 6600+, Waters CSH C18), the LSTM reaches 98.4% top-1 accuracy (99.99% top-5; mean absolute error 3.6 Da) and the Transformer 98.0%. Ablation shows autoregressive context accounts for 55.5 percentage points while no single feature contributes more than 0.2 pp: the sequential pattern, not molecular properties, drives prediction. Models transfer across instruments sharing the method (r=0.999 on an independent Agilent 6530 dataset) but fail under a different column chemistry (5.1% top-1) or polarity mode (2.6%), confirming method- and mode-specificity. Fine-tuning on as few as two to five quality-control injections recovers held-out accuracy from 2.6% to nearly 50%, so cross-condition deployment needs minimal calibration. These results establish that elution sequences are highly predictable and lay the groundwork for predictive MS/MS acquisition to improve annotation coverage in untargeted metabolomics.
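A minimal sketch of the tokenization step described above. It assumes an m/z range of 100 to 1200, so that the 110 bins are each 10 Da wide (the abstract gives the bin count but not the range), takes mass defect as m/z minus its nearest integer, and measures the retention-time gap from the previous feature. Features are ordered by retention time and each token's target is the next feature's bin. The `Feature` type and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# ASSUMPTION: illustrative m/z range; the abstract states only the bin count.
MZ_MIN, MZ_MAX, N_BINS = 100.0, 1200.0, 110

@dataclass
class Feature:              # hypothetical container for one detected LC-HRMS feature
    mz: float
    rt: float               # retention time (min)
    polarity: int           # +1 positive mode, -1 negative mode
    intensity: float

def mz_bin(mz: float) -> int:
    """Map m/z to one of N_BINS equal-width bins, clamping out-of-range values."""
    width = (MZ_MAX - MZ_MIN) / N_BINS
    return min(N_BINS - 1, max(0, int((mz - MZ_MIN) // width)))

Token = Tuple[int, float, float, int, int]

def tokenize(features: List[Feature]) -> List[Tuple[Token, int]]:
    """Return (token, next_bin) pairs in elution order."""
    ordered = sorted(features, key=lambda f: f.rt)
    # Intensity rank within the sample: 0 = most intense.
    by_intensity = sorted(range(len(ordered)), key=lambda i: -ordered[i].intensity)
    rank = {idx: r for r, idx in enumerate(by_intensity)}
    pairs = []
    for i in range(len(ordered) - 1):
        f = ordered[i]
        gap = f.rt - ordered[i - 1].rt if i > 0 else 0.0   # gap since the previous feature
        mass_defect = f.mz - round(f.mz)
        token = (mz_bin(f.mz), mass_defect, gap, f.polarity, rank[i])
        pairs.append((token, mz_bin(ordered[i + 1].mz)))     # target: next eluting bin
    return pairs
```

A next-bin classifier (an LSTM or Transformer in the paper) would then be trained on sequences of these tokens; the sketch stops at data preparation.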

2606.05219 2026-06-05 cs.LG cs.AI Version update

Gradient Descent with Large Step Size Restores Symmetry in Deep Linear Networks with Multi-Pathway

Hee-Sung Kim, Sungyoon Lee

Affiliations: University of California, Berkeley

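AI summary: Studies how large-step-size discrete gradient descent, through edge-of-stability oscillations, moves multi-pathway deep linear networks from symmetry breaking toward signal redistribution, thereby favoring shared representations over single-pathway dominance.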

Comments ICML 2026

State Commitment Learning: Training Language Models to Distinguish Computation from Memory

Fei Ding, Yongkang Zhang, Runhao Liu, Yuhao Liao, Zijian Zeng, Huiming Yang

Affiliations: Alibaba Group; Tsinghua University

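AI summary: Proposes the state commitment learning objective and Counterfactual Erasure RL (CERL), which train models to distinguish persistent state that should be kept from temporary computation that can be discarded. This reduces answers' dependence on hidden thoughts while maintaining accuracy on mathematics, long-chain logic, scientific QA, and multi-turn tool-use tasks.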

Comments 17 pages

Abstract

Reasoning language models do not distinguish tokens used for computation from tokens that constitute persistent state: once generated, all hidden thoughts remain in context and influence future predictions. As a result, downstream reasoning may depend on failed attempts, dead ends, and private scratch work that should not be safely relied on later. We recast this phenomenon as a new training objective, state commitment learning: training models to explicitly distinguish information that should be committed as persistent state from temporary computation that can be discarded. We define a counterfactual criterion, persistent-state sufficiency, which makes it trainable and measurable whether an answer remains usable after hidden thoughts are erased. We then propose Counterfactual Erasure RL (CERL), which evaluates, under the same prefix, both a path that keeps hidden thoughts and a path that erases them, and gives reward only when the erasure path remains correct. We also introduce the Erasure Dependence Protocol and show across mathematics, long-chain logic, scientific QA, and multi-turn tool-use evaluation that CERL substantially reduces answer dependence on hidden thoughts without sacrificing accuracy, consistently outperforming correctness-only RL and long-answer SFT baselines.
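A minimal sketch of the CERL reward as the abstract describes it: from the same prefix, run one path that keeps the hidden thoughts and one that erases them, and give reward only when the erasure path is still correct. The `policy` callable, the `<think>` delimiters, and the exact-match checker are assumptions for illustration, not the paper's interface.

```python
import re
from typing import Callable

# ASSUMPTION: hidden thoughts are wrapped in <think>...</think>.
THINK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)

def erase_hidden_thoughts(context: str) -> str:
    """Drop hidden-thought spans, keeping only the committed (visible) state."""
    return THINK.sub("", context)

def cerl_reward(prefix: str,
                policy: Callable[[str], str],
                is_correct: Callable[[str], bool]) -> dict:
    """Evaluate keep- and erase-paths under the same prefix; reward depends only on the erase path."""
    keep_answer = policy(prefix)                            # path that keeps hidden thoughts
    erase_answer = policy(erase_hidden_thoughts(prefix))    # path after erasure
    return {
        "reward": 1.0 if is_correct(erase_answer) else 0.0,
        # Diagnostic: right with the thoughts but wrong without them = dependence on hidden thoughts.
        "dependent": is_correct(keep_answer) and not is_correct(erase_answer),
    }
```

In this reading, a model earns reward only if everything the answer needs was committed outside the erasable spans, which is the persistent-state sufficiency criterion the abstract defines.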

2606.05200 2026-06-05 physics.comp-ph cs.LG Version update

A differentiable machine learning small-angle X-ray scattering analysis framework for structure elucidation of lipid nanoparticles

Maria Bånkestad, Sandra Barman, Magnus Röding, Erik Kaunisto, Viktoriia Meklesh, Audrey Gallud, Marco Mendez, Marianna Yanez Arteta, Stefan Norberg, Ann Terry, Smita Chakraborty, Shun Yu, Jerk Rönnols, Sepideh Pashami

Affiliations: RISE Research Institutes of Sweden, Division Digital Systems, Computer Science; RISE Research Institutes of Sweden, Division Bioeconomy, Food Research and Innovation; Sustainable Innovation & Transformational Excellence, Pharmaceutical Technology & Development, Operations, AstraZeneca; Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg; Advanced Drug Delivery, Pharmaceutical Sciences, R&D, AstraZeneca; Global Product Development, Pharmaceutical Technology & Development, Operations, AstraZeneca; MAX IV Laboratory, Lund University

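AI summary: Proposes a framework that combines a machine-learning surrogate model with a differentiable layer to accelerate SAXS analysis of lipid nanoparticles, enabling multi-start fitting and identifiability analysis that reveal parameter degeneracies.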

Comments 38 pages, 24 figures, 5 tables (incl. supplementary information)

Abstract

Lipid nanoparticles (LNPs) are efficient delivery systems for negatively charged nucleic acids. Their multi-component architecture yields a core-shell structure. Small-angle X-ray scattering (SAXS) is an important characterization technique for LNPs, but recovering internal structure and size distribution from SAXS is an inverse problem with non-unique solutions. Realistic models are often too expensive for systematic exploration. We introduce a machine-learning-accelerated, differentiable framework for SAXS analysis of heterogeneous, polydisperse LNPs. The forward model combines a core-shell particle with a Gaussian random-field interior, a neural surrogate for the monodisperse SAXS map, and a differentiable layer integrating over particle-size distributions. The surrogate reduces prediction cost by four orders of magnitude, while differentiability enables large-scale multi-start fitting and ensemble identifiability analysis. Applied to synthetic and experimental MC3 LNP data, the framework shows that near-identical SAXS fits can arise from distinct parameter modes, with the experimental fits dominated by a trade-off between size-distribution and interior-structure parameters.
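The polydisperse layer described above integrates a monodisperse scattering curve over a particle-size distribution. The sketch below shows only that integration step, with two stand-ins that are assumptions: the analytic form factor of a homogeneous sphere replaces the paper's neural surrogate of a core-shell particle with a Gaussian random-field interior, and a Gaussian radius distribution is integrated with the trapezoidal rule. In a differentiable framework the same weighted sum would be written in an autodiff library so that gradients reach the distribution parameters.

```python
import math
from typing import List

def sphere_intensity(q: float, r: float) -> float:
    """Stand-in monodisperse curve: homogeneous sphere, V^2 * P(q), arbitrary units."""
    volume = 4.0 / 3.0 * math.pi * r ** 3
    qr = q * r
    if qr < 1e-6:
        amplitude = 1.0                      # small-angle limit of the form-factor amplitude
    else:
        amplitude = 3.0 * (math.sin(qr) - qr * math.cos(qr)) / qr ** 3
    return (volume * amplitude) ** 2

def polydisperse_intensity(q: float, mean_r: float, sigma_r: float, n: int = 201) -> float:
    """Average I(q, R) over a normalized Gaussian radius distribution (trapezoidal rule)."""
    lo = max(1e-3, mean_r - 4 * sigma_r)
    hi = mean_r + 4 * sigma_r
    step = (hi - lo) / (n - 1)
    radii = [lo + i * step for i in range(n)]
    weights = [math.exp(-0.5 * ((r - mean_r) / sigma_r) ** 2) for r in radii]
    trap = [0.5 if i in (0, n - 1) else 1.0 for i in range(n)]   # trapezoid-rule coefficients
    norm = sum(c * w for c, w in zip(trap, weights))
    acc = sum(c * w * sphere_intensity(q, r) for c, w, r in zip(trap, weights, radii))
    return acc / norm   # the common step size cancels between numerator and normalizer

qs: List[float] = [0.005 * k for k in range(1, 41)]   # q in 1/nm (illustrative grid)
curve = [polydisperse_intensity(q, mean_r=30.0, sigma_r=4.0) for q in qs]
```

Swapping `sphere_intensity` for a learned surrogate leaves the integration layer unchanged, which is why the expensive single-particle model only has to be approximated once.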

2606.05198 2026-06-05 q-bio.BM cs.LG Version update

An accurate nucleic acid-small molecule docking framework via geometric deep learning with large-scale pretraining

Shi Li, Xujun Zhang, Mingquan Liu, Hui Zhang, Shuoying Jia, Yu Kang, Tingjun Hou, Peichen Pan

Affiliations: College of Pharmaceutical Sciences, Zhejiang University; Faculty of Health Sciences, University of Macau; Zhejiang Provincial Key Laboratory for Intelligent Drug Discovery and Development, Jinhua Institute of Zhejiang University; Shanghai Innovation Institute

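AI summary: Proposes NucleoDock, which combines physics-guided large-scale pretraining and fine-tuning with sequence, structure, and atom-level features, and uses a mixture-density-network geometric scoring head for nucleic acid-small molecule docking. On a 125-complex benchmark it reaches a 56% top-1 success rate (RMSD < 2.0 Å), outperforming the conventional docking program rDock (29%).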

Comments 34 pages, 4 figures, 4 tables, Supplementary Materials include 8 tables

Abstract

Nucleic acids are increasingly recognized as therapeutic targets beyond conventional protein-centered drug discovery, yet accurate and efficient docking of small molecules to nucleic acid structures remains challenging. Physics-based docking methods often show limited accuracy and efficiency, whereas deep learning approaches are constrained by the scarcity of experimentally resolved nucleic acid-ligand complexes. Here, we present NucleoDock, a deep learning framework for nucleic acid-small molecule docking. To address data scarcity, NucleoDock combines physics-guided large-scale pretraining on millions of docking-generated synthetic complexes with fine-tuning on curated experimental co-crystal structures. It further integrates sequence- and structure-informed nucleotide representations with atomistic three-dimensional features to capture both biological context and binding-site geometry. A mixture density network-based geometric scoring head is used to model conditional interaction-distance distributions for pose ranking. On an external benchmark of 125 nucleic acid-ligand complexes, NucleoDock achieved a top-1 success rate of 56 percent at an RMSD cutoff of 2.0 Angstrom, outperforming rDock with 29 percent, while generating 100 poses in approximately 5 seconds per complex. Retrospective virtual screening on the ROBIN benchmark further showed improved early enrichment. NucleoDock represents a step toward bridging the methodological gap between protein- and nucleic acid-directed computational drug discovery.
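The top-1 success rate at a 2.0 Å cutoff is a simple metric: for each complex, take the highest-ranked pose and check whether its RMSD to the crystal ligand is below the cutoff. The sketch assumes both poses are already in the same receptor frame with matching atom order; it applies no superposition and no symmetry correction, which real benchmarks often add.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]

def rmsd(pred: Sequence[Coord], ref: Sequence[Coord]) -> float:
    """Plain RMSD over matched atoms, computed in the fixed receptor frame."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must have the same, nonzero number of atoms")
    sq = sum((p[0] - r[0]) ** 2 + (p[1] - r[1]) ** 2 + (p[2] - r[2]) ** 2
             for p, r in zip(pred, ref))
    return math.sqrt(sq / len(pred))

def top1_success_rate(ranked_poses: List[List[Sequence[Coord]]],
                      references: List[Sequence[Coord]],
                      cutoff: float = 2.0) -> float:
    """Fraction of complexes whose top-ranked pose lies within `cutoff` angstroms RMSD."""
    hits = sum(rmsd(poses[0], ref) < cutoff for poses, ref in zip(ranked_poses, references))
    return hits / len(references)
```

With this definition, a 56 percent success rate on 125 complexes corresponds to 70 complexes whose top-ranked pose falls under 2.0 Å.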

2606.05194 2026-06-05 cs.LG cs.AI cs.CL Version update

GRPO with Rollout-Level Advantage-Prioritized Experience Replay

Gyeongtae Yoo, Sanghyeok Park, Soohyuk Jang, Ik-hwan Kim, Sungroh Yoon

Affiliations: Department of Electrical and Computer Engineering, Seoul National University; Interdisciplinary Program in AI, Seoul National University; AIIS, ASRI, INMC, and ISRC, Seoul National University

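AI summary: Addresses GRPO's low sample efficiency with a rollout-level experience replay buffer that bounds staleness through age eviction, keeps training on-policy through fresh-anchored composition, and samples by advantage magnitude, yielding clear gains on several math benchmarks.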

Abstract

Reinforcement learning from verifiable rewards with GRPO is a standard approach for post-training reasoning LLMs. It remains sample inefficient. Each rollout is used for a single gradient update and then discarded. Naive replay is not well suited in this setting because LLM policies drift quickly per gradient step. Stored rollouts therefore become stale and can destabilize training. We propose a rollout-level replay buffer for GRPO that stores and samples individual rollouts rather than whole groups. The buffer bounds staleness through age eviction. Any rollout older than tau_max training steps is removed. The buffer also preserves on-policy data via fresh-anchored composition. Each batch keeps its fresh on-policy rollouts and then concatenates replay rollouts drawn separately from the buffer. We prioritize replay by per-rollout advantage magnitude and recycle individual rollouts whose advantages are large. Across three Qwen3-Base scales on five math benchmarks, our method outperforms GRPO and naive replay baselines. Gains are positive at every scale and grow with model size. The largest gain is +4.35 pp on the five-benchmark average at 4B. Under an AES metric that jointly measures accuracy and token efficiency, the efficiency margin over GRPO is again largest at 4B, at +0.579.
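A minimal sketch of the buffer logic described above, under stated assumptions: group-normalized GRPO advantages (reward minus group mean, divided by group standard deviation), eviction of any rollout older than `tau_max` steps, and sampling with replacement in proportion to |advantage|. The class, function, and parameter names are hypothetical; the abstract does not give the exact prioritization rule or batch sizes.

```python
import random
import statistics
from dataclasses import dataclass, field
from typing import List

def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: standardize rewards within one prompt's group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

@dataclass
class Rollout:
    text: str
    advantage: float
    step: int                     # training step at which the rollout was generated

@dataclass
class ReplayBuffer:
    tau_max: int                  # maximum allowed age, in training steps
    items: List[Rollout] = field(default_factory=list)

    def add(self, rollouts: List[Rollout]) -> None:
        self.items.extend(rollouts)

    def evict(self, current_step: int) -> None:
        """Age eviction: drop rollouts older than tau_max steps."""
        self.items = [r for r in self.items if current_step - r.step <= self.tau_max]

    def sample(self, k: int, rng: random.Random) -> List[Rollout]:
        """Sample individual rollouts (with replacement), weighted by |advantage|."""
        pool = [r for r in self.items if r.advantage != 0]   # zero advantage gives no gradient
        if not pool:
            return []
        weights = [abs(r.advantage) for r in pool]
        return rng.choices(pool, weights=weights, k=min(k, len(pool)))

def build_batch(fresh: List[Rollout], buffer: ReplayBuffer, n_replay: int,
                step: int, rng: random.Random) -> List[Rollout]:
    """Fresh-anchored composition: keep every fresh rollout, then append replayed ones."""
    buffer.evict(step)
    replayed = buffer.sample(n_replay, rng)
    buffer.add(fresh)             # fresh rollouts become replay candidates for later steps
    return fresh + replayed
```

Anchoring each batch on its fresh rollouts means replay can only add data, never displace the on-policy signal, while `tau_max` caps how stale any added rollout can be.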

2606.04485 2026-06-05 cs.LG Version update

LimiX-2M: Mitigating Low-Rank Collapse and Attention Bottlenecks in Tabular Foundation Models

Yuanrui Wang, Xingxuan Zhang, Han Yu, Mingchao Hao, Gang Ren, Hao Yuan, Li Mao, Yunjia Zhang, Chun Yuan, Peng Cui

Affiliations: University of Science and Technology of China

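AI summary: Proposes LimiX-2M, built on a unified tokenize-and-route framework: RaBEL expands each scalar into localized RBF features, and the bidirectional block is reordered as S→N→F. With only 2M parameters it outperforms larger models, improving the accuracy-efficiency trade-off of tabular foundation models.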

Comments Accepted to ICML 2026

Abstract

Tabular foundation models (TFMs) increasingly rival tree ensembles, but their performance is often compute-inefficient: with standard affine scalar tokenization, each feature injects value variation through an essentially one-dimensional channel, and feature IDs/positional signals cannot increase within-feature value degrees of freedom, yielding weak early-layer value sensitivity and redundant hidden states. We present a unified tokenize-and-route framework for strong TFMs: RaBEL expands each scalar into compact localized RBF features (optionally exponent-gated) to improve conditioning and shallow-layer effective rank, while a reordered bidirectional block S→N→F aligns computation with the readout by aggregating cross-sample context before feature mixing and using attention pooling. Together, these changes yield LimiX-2M, a 2M-parameter model that outperforms larger TabPFN-v2 and TabICL baselines on widely used tabular benchmarks while reducing training and inference costs. These results highlight value-aware tokenization and readout-aligned routing as key levers for improving the accuracy-efficiency trade-off in TFMs. Model checkpoints and inference code are available at https://github.com/limix-ldm-ai/LimiX.
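The core idea behind RaBEL, expanding one scalar into several localized radial-basis responses instead of a single affine channel, can be sketched as follows. Evenly spaced centers, a bandwidth tied to the grid spacing, and the omission of the optional exponent gating are all assumptions; the abstract does not specify RaBEL's parameterization or how the features are projected into the token embedding.

```python
import math
from typing import List

def rbf_expand(x: float, n_centers: int = 8, lo: float = -3.0, hi: float = 3.0) -> List[float]:
    """Expand a standardized scalar into Gaussian RBF features on an evenly spaced grid."""
    step = (hi - lo) / (n_centers - 1)
    centers = [lo + i * step for i in range(n_centers)]
    width = step                      # ASSUMPTION: bandwidth equal to the grid spacing
    return [math.exp(-0.5 * ((x - c) / width) ** 2) for c in centers]

def affine_token(x: float, w: List[float], b: List[float]) -> List[float]:
    """Baseline scalar tokenization: each dimension is w_i * x + b_i, so x moves along one direction."""
    return [wi * x + bi for wi, bi in zip(w, b)]
```

Two input values change an affine token only along the fixed direction `w`, whereas their RBF expansions differ in which coordinates respond most strongly; this is one way to read the extra within-feature degrees of freedom the abstract refers to.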

2606.04335 2026-06-05 cs.LG cs.SY eess.SY Version update

Filter, then Reweight: Rethinking Optimization Granularity in On-Policy Distillation

Yuying Li, Leqi Zheng, Yongzi Yu, Wenrui Zhou, Xuchang Zhong, Xing Hu, Jing Jin, Hangjie Yuan, Tao Feng

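Affiliations: THU (Tsinghua University); HKUST (Hong Kong University of Science and Technology); BIT (Beijing Institute of Technology); Meituan; ZJU (Zhejiang University)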

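AI summary: For on-policy distillation, proposes FiRe-OPD, which achieves finer-grained optimization through trajectory-level filtering and token-level soft reweighting and outperforms existing methods across multiple settings.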

Abstract

On-policy distillation (OPD) in large language models is shifting from full-trace KL supervision toward more selective training paradigms. Recent OPD methods increasingly focus on selecting which trajectories to learn from, which tokens are most informative, and which supervision signals are most reliable. Motivated by this trend, we rethink the optimization granularity of OPD and propose FiRe-OPD (Filter, then Reweight), which jointly adjusts supervision signals at both the trajectory and token levels. Specifically, FiRe-OPD first filters trajectories to remove low-quality rollout samples and then applies soft reweighting within the retained trajectories to emphasize informative tokens. Compared with hard token selection, FiRe-OPD's soft-weighting mechanism mitigates information loss and improves optimization stability, yielding finer-grained OPD optimization. We validate FiRe-OPD across strong-to-weak, single-teacher, and multi-teacher settings and show that it outperforms recent token-level OPD methods (e.g., +6.25 on AIME 2024 in the strong-to-weak setting and +18.81 on Miner in the multi-teacher setting). Our code is available at https://github.com/YuYingLi0/FiRe-OPD.
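A minimal sketch of the filter-then-reweight idea. It assumes each trajectory carries a scalar quality score compared against a fixed threshold, and that token weights come from a temperature softmax over per-token teacher-student divergences. Both criteria and all names are hypothetical; the abstract does not state how FiRe-OPD scores trajectories or tokens.

```python
import math
from typing import List, Tuple

Trajectory = Tuple[float, List[float]]   # (quality score, per-token divergences)

def filter_trajectories(trajs: List[Trajectory], min_score: float) -> List[List[float]]:
    """Trajectory level: keep the token divergences of rollouts scoring at least min_score."""
    return [divs for score, divs in trajs if score >= min_score]

def soft_token_weights(divs: List[float], temperature: float = 1.0) -> List[float]:
    """Token level: softmax over divergences, so informative tokens weigh more but none drop to zero."""
    m = max(divs)                                    # subtract the max for numerical stability
    exps = [math.exp((d - m) / temperature) for d in divs]
    total = sum(exps)
    return [e / total for e in exps]

def fire_loss(trajs: List[Trajectory], min_score: float, temperature: float = 1.0) -> float:
    """Mean over kept trajectories of the weighted per-token distillation loss.

    In real training the weights would usually be treated as constants (no gradient
    through them); plain Python floats cannot express that distinction.
    """
    kept = filter_trajectories(trajs, min_score)
    if not kept:
        return 0.0
    per_traj = []
    for divs in kept:
        w = soft_token_weights(divs, temperature)
        per_traj.append(sum(wi * di for wi, di in zip(w, divs)))
    return sum(per_traj) / len(per_traj)
```

Hard token selection corresponds to the limit of a very low temperature, where almost all weight falls on the single most divergent token; a moderate temperature keeps some signal from every retained token.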

2606.02031 2026-06-05 cs.LG cs.AI cs.CL cs.CV Version update

OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents

Rui Yang, Qianhui Wu, Yuxi Chen, Hao Bai, Wenlin Yao, Hao Cheng, Baolin Peng, Huan Zhang, Tong Zhang, Jianfeng Gao

Affiliations: UIUC (University of Illinois Urbana-Champaign); Microsoft

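AI summary: Proposes OpenWebRL, a framework for training visual web agents with online multi-turn reinforcement learning on real websites; its 4B-parameter model sets a new open-source state of the art on live-web benchmarks and is competitive with closed-source systems.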

Comments 36 pages, 11 figures

Abstract

Building capable visual web agents requires long-horizon reasoning, precise grounding, and robust interaction with dynamic real-world websites. Despite rapid progress, the strongest systems remain largely proprietary, while open agents still depend heavily on supervised post-training over large collections of curated web trajectories. This dependence creates a major scalability bottleneck: high-quality demonstrations are expensive to collect, and static datasets offer limited coverage of the diverse, ever-changing open web. Although online RL has shown promise for text-based agents, its potential for training visual web agents directly on live websites remains largely underexplored. In this paper, we introduce OpenWebRL, an open framework for training visual web agents with online multi-turn RL on real websites. OpenWebRL covers the full training pipeline, including scalable live-browser infrastructure, supervised initialization, multimodal context management, trajectory-level success judging, and efficient multi-turn policy optimization. Using this framework, we train OpenWebRL-4B, which establishes a new open-source state of the art on challenging live-web benchmarks. With only 0.4K initialization trajectories and 2.2K open-ended RL training tasks, OpenWebRL-4B achieves 67.0% success on Online-Mind2Web and 64.0% on DeepShop, outperforming prior open agents of similar or larger scale and remaining competitive with proprietary systems including OpenAI CUA and Gemini CUA. Beyond strong benchmark performance, we systematically study the key design choices that make online RL effective for visual web agents, and analyze how RL improves agentic reasoning. Overall, our work offers a practical path toward building more capable, reproducible, and cost-efficient open web agents. We will release our training data, models, and code to support future research.

2605.31278 2026-06-05 cs.AI cs.LG stat.ME Version update

Industrializing Prediction-Powered Inference: The GLIDE Library for Reliable GenAI and Agentic Systems Evaluation

Grégoire Martinon, Ibrahim Merad, Mohammed Raki

Affiliations: University of California, Berkeley; Google Research

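AI summary: Presents GLIDE, an open-source library that unifies several prediction-powered inference methods, providing unbiased estimates with valid confidence intervals while substantially reducing human annotation cost.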

Comments 8 pages, Accepted to the ICML 2026 Workshop on Statistical Frameworks for Uncertainty in Agentic Systems, Seoul, South Korea, 2026

Abstract

Reliable evaluation of agentic systems requires unbiased estimates with valid uncertainty, but standard practice navigates between costly human annotation and biased LLM-as-judge proxies. Prediction-powered inference (PPI) combines both into debiased estimates with valid confidence intervals, yet its various methods remain scattered across papers under partial implementations. We introduce GLIDE, an open-source Python library that unifies state-of-the-art PPI estimators (PPI++, Stratified PPI, Predict-Then-Debias and its stratified variants, Active Statistical Inference) and samplers (uniform, stratified, active, cost-optimal) under a scipy-style API specialized to mean estimation. GLIDE ships with a reproducible Monte Carlo validation suite, an empirically grounded decision tree for method selection, and an agentic evaluation case study showing substantial annotation savings at equivalent precision. The GLIDE package is available at this URL: https://github.com/EmertonData/glide
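The basic prediction-powered estimate of a mean, which the estimators collected in GLIDE build on, adds a rectifier measured on a small human-labeled set to the average judge prediction on a large unlabeled set. The sketch below implements that classical PPI mean estimator with a normal-approximation confidence interval in plain Python. It does not reproduce GLIDE's API, and the function and variable names are illustrative.

```python
import math
import statistics
from typing import List, Tuple

def ppi_mean(y_labeled: List[float], f_labeled: List[float], f_unlabeled: List[float],
             z: float = 1.96) -> Tuple[float, Tuple[float, float]]:
    """Prediction-powered estimate of E[Y] with an approximate 95% CI (z = 1.96)."""
    n, big_n = len(y_labeled), len(f_unlabeled)
    rectifier = [f - y for f, y in zip(f_labeled, y_labeled)]   # judge error on labeled items
    estimate = statistics.fmean(f_unlabeled) - statistics.fmean(rectifier)
    se = math.sqrt(statistics.variance(f_unlabeled) / big_n
                   + statistics.variance(rectifier) / n)
    return estimate, (estimate - z * se, estimate + z * se)
```

If the judge were unbiased, the rectifier would average to zero; its variance term is what widens the interval when the judge is noisy. PPI++ additionally scales the predictions by a tuned coefficient, which this sketch omits.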

2605.27991 2026-06-05 stat.ML cs.LG Version update

Gradient-Flow Optimization as Dynamic Random-Effects Inference: Testing and Early Stopping with Applications to Deep Learning

Deep Neural Network Training as Random Effects: An Optimization-Inference Duality

Minhao Yao, Ruoyu Wang, Xihong Lin, Lin Liu, Zhonghua Liu

Affiliations: Centre for Biomedical Data Science, Duke-NUS Medical School, National University of Singapore; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA; Institute of Natural Sciences, MOE-LSC, School of Mathematical Sciences, CMA-Shanghai, SJTU-Yale Joint Center of Biostatistics and Data Science, Shanghai Jiao Tong University; Department of Biostatistics, Columbia University, New York, NY, USA

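AI summary: Shows that deep neural network training is equivalent to a classical random-effects model, revealing an optimization-inference duality, and uses restricted maximum likelihood (REML) estimation to derive a likelihood-based early-stopping rule.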

Abstract (AI translation)

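Deep neural networks (DNNs) have achieved remarkable empirical success, yet their training dynamics are understood mainly from the perspective of optimization rather than statistical principles. This paper establishes a statistical framework for DNN training in the overparameterized regime by proving that the predictions produced by continuous-time neural tangent kernel (NTK) gradient flow are exactly equivalent to those of a classical random-effects model. In this framework, training time acts as a variance component, or equivalently as an empirical-Bayes covariance hyperparameter, that controls how variation is allocated from noise to structured signal. This equivalence reveals an optimization-inference duality: the gradient-flow path is both an optimization trajectory and an empirical-Bayes random-effects inference path. Conditional on training time, the network output is the posterior mean of the latent signal, and estimating training time by restricted maximum likelihood (REML) turns early stopping into likelihood-based empirical-Bayes inference rather than external hyperparameter tuning. This perspective yields a two-stage inference procedure. First, a variance-component test determines whether DNN training captures statistically significant structure beyond initialization. Second, conditional on training being warranted, REML provides a likelihood-based early-stopping rule. The resulting stopping time has a spectral interpretation in the NTK eigenbasis, where training continues until the spectral losses become decorrelated. We further prove that REML-guided early stopping attains asymptotically optimal prediction error for in-sample prediction under fixed design and, under additional random-design regularity conditions, for out-of-sample prediction as well. This work recasts DNN training as statistical inference and provides a principled basis for deciding whether, and for how long, to train a deep neural network.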

English abstract

Gradient-flow optimization is usually viewed as an algorithmic procedure for minimizing empirical loss, with training duration selected by validation or heuristic early-stopping rules. We develop a statistical inference framework for the gradient-flow training trajectory itself. The central object is fixed-operator squared-error gradient flow: whenever the fitted value evolves through a time-invariant positive semidefinite training operator, the trained model output at each training time is exactly equivalent to the best linear unbiased predictor, or empirical-Bayes posterior mean, under a corresponding random-effects model. Under this representation, training time becomes a variance-component parameter governing how variance is reallocated from residual noise to structured signal. This turns two basic training decisions into inferential problems. First, whether training is needed is formulated as a variance-component test for signal beyond initialization. Second, how long to train is formulated as restricted maximum likelihood (REML) estimation of the training-time variance component. The resulting REML-guided early stopping rule has a spectral interpretation: it selects the training time at which optimized spectral losses become empirically decorrelated from the eigenvalues of the training operator, yielding an effective degrees-of-freedom measure for the evolving trained model. We establish asymptotic prediction optimality for fixed-design in-sample risk and, under additional kernel regularity conditions, random-design out-of-sample risk. Deep learning models in fixed-kernel gradient regimes provide canonical modern-AI instantiations of the theory. Numerical experiments and a UK Biobank proteomics application show that the proposed inferential approach attains competitive prediction accuracy while reducing the reliance on validation splits and repeated checkpoint evaluation.
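For squared error and a fixed positive semidefinite operator K, gradient flow started from zero, df/dt = -K(f - y), has the closed form f_t = (I - exp(-tK)) y. In the eigenbasis of K, each component of y is shrunk by the factor 1 - exp(-t λ_i), and the trace of I - exp(-tK) acts as an effective degrees of freedom that grows with training time. The sketch below assumes K is already diagonalized (eigenvalues given, y expressed in that eigenbasis) and zero initialization; it illustrates only this spectral shrinkage, not the paper's random-effects construction or its REML estimate of the stopping time.

```python
import math
from typing import List

def gradient_flow_fit(y_eig: List[float], eigvals: List[float], t: float) -> List[float]:
    """Fitted values at time t in the eigenbasis of K: f_i(t) = (1 - exp(-t * lambda_i)) * y_i."""
    return [(1.0 - math.exp(-t * lam)) * yi for yi, lam in zip(y_eig, eigvals)]

def effective_dof(eigvals: List[float], t: float) -> float:
    """trace(I - exp(-tK)): how much of the spectrum the flow has fitted by time t."""
    return sum(1.0 - math.exp(-t * lam) for lam in eigvals)

def euler_check(y_eig: List[float], eigvals: List[float], t: float,
                steps: int = 100_000) -> List[float]:
    """Explicit-Euler integration of df/dt = -K(f - y) from f = 0, for comparison."""
    f = [0.0] * len(y_eig)
    dt = t / steps
    for _ in range(steps):
        f = [fi - dt * lam * (fi - yi) for fi, yi, lam in zip(f, y_eig, eigvals)]
    return f
```

Large-eigenvalue directions are fitted almost immediately and small-eigenvalue directions last, so choosing t amounts to choosing how much of the spectrum to admit; the paper casts that choice as estimating a variance component by REML.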

2605.27292 2026-06-05 cs.LG stat.ML Version update

Extreme Region Policy Distillation

Changyu Chen, Xiting Wang, Rui Yan

Affiliations: Gaoling School of Artificial Intelligence, Renmin University of China; Wuhan University

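AI summary: Proposes Extreme Region Policy Distillation (ERPD), a two-stage framework that decouples sample efficiency from KL efficiency. It first runs weakly constrained off-policy optimization on fixed data to extract as much training signal as possible, then distills that signal into the base policy under trust-region constraints, achieving better performance with smaller KL divergence on mathematical reasoning tasks.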

Abstract

Reinforcement learning for large language models faces a fundamental trade-off between sample efficiency and asymptotic performance: strictly on-policy methods discard trajectories after a single update, while off-policy reuse introduces distribution mismatch that existing trust-region techniques mitigate primarily by enforcing conservative optimization, often leaving rich training signals underutilized. To investigate this, we perform extensive off-policy updates on fixed data. Our experiments reveal that aggressive multi-step optimization brings rapid initial gains, but excessive updates cause trajectory probabilities to deviate and entropy to collapse, with performance plateauing early. Tightening KL constraints merely lowers the ceiling without resolving the degradation. This motivates Extreme Region Policy Distillation (ERPD), a two-stage framework that decouples sample efficiency from KL efficiency. The first stage performs weakly constrained off-policy optimization on fixed data to maximally extract training signals. The resulting policy provides token-level supervision. In the second stage, we distill these signals into the base policy under trust-region constraints, filtering harmful drift while preserving useful signals. The distilled policy achieves comparable or better performance with substantially smaller KL divergence, indicating that much of the first-stage divergence was spent on unnecessary drift rather than genuine improvement. Crucially, ERPD accommodates both strong and weak teachers: when aggressive optimization yields no stronger policy, even degenerate teachers provide effective supervision via alternative signal construction strategies. We validate ERPD on mathematical reasoning, showing gains for strong base models where on-policy training plateaus, and reliable improvements with weak teachers.

2507.02628 2026-06-05 cs.LG Version update

A Generative Approach for Semantic Auditing of Electronic Health Records

Irena Girshovitz, Atai Ambus, Moni Shahar, Ran Gilad-Bachrach

Affiliations: School of Biomedical Engineering, Faculty of Engineering, Tel Aviv University; AI and Data Science Center of Tel Aviv University (TAD); Safra Center for Bioinformatics, Tel Aviv University

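AI summary: Proposes a generative, LLM-based method that uses semantic data coverage and a retrieval-augmented generation architecture to automatically detect inconsistencies between electronic health records and epidemiological evidence, enabling scalable semantic auditing.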

Comments 23 pages, 5 figures (+ appendix)

DOI: 10.1038/s41746-026-02809-w

英文摘要

The reliability of clinical artificial intelligence (AI) depends on high-quality data, yet Electronic Health Records are often inconsistent with existing scientific knowledge. Current quality assessments are limited: they either focus on syntax or rely on labor-intensive manual rules to capture semantic nuances. To overcome these scalability barriers, we propose Medical Data Pecking, a methodology that adopts software unit testing principles for medical data validation. It introduces Semantic Data Coverage, employing Large Language Models to generate context-aware tests that "peck" for inconsistencies between observed data and epidemiological evidence. To demonstrate this methodology, we implemented a reference tool using a Retrieval-Augmented Generation architecture that synthesizes medical literature into executable code. When applied to three datasets, this implementation generated dozens of tests per cohort, identifying discrepancies between observed distributions and epidemiological priors. These discrepancies encompass both genuine data inconsistencies and expected cohort-selection effects. This work provides an initial framework for scalable semantic auditing, shifting assurance from manual rules to the generative and context-sensitive verification required for trustworthy AI.
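
To make the unit-testing analogy concrete, a single generated "peck" might compare an observed prevalence with a literature prior. This is a hypothetical sketch, not the authors' reference tool. The cohort, the condition name, the prior range, and the `check_prevalence` helper are all illustrative assumptions. A Wilson score interval is used here to decide whether the observed rate is compatible with the prior range.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

def check_prevalence(records, condition, prior_low, prior_high):
    """Hypothetical data 'peck': passes if the observed prevalence interval
    overlaps the epidemiological prior range [prior_low, prior_high]."""
    n = len(records)
    k = sum(1 for r in records if condition in r["diagnoses"])
    lo, hi = wilson_interval(k, n)
    return {"observed": k / n, "ci": (lo, hi), "passed": hi >= prior_low and lo <= prior_high}

# Illustrative cohort: 1000 patients, 40 of whom carry a made-up "condition_X".
cohort = ([{"diagnoses": {"condition_X"}} for _ in range(40)]
          + [{"diagnoses": set()} for _ in range(960)])
result = check_prevalence(cohort, "condition_X", prior_low=0.08, prior_high=0.12)
print(result["passed"])  # False: observed 4% is incompatible with an 8-12% prior
```

A failing peck only flags a discrepancy. As the abstract notes, it may reflect a genuine data error or an expected cohort-selection effect.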

2605.24059 2026-06-05 cs.LG cs.AI 版本更新

Spectral Probe-Circuits: A Three-Step Recipe for Identifying Attention-Head Circuits in Pretrained Transformers

频谱探测电路：识别预训练Transformer中注意力头电路的三步法

Yongzhong Xu

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结提出一种三步法，通过频谱信号排序、任务模式筛选和组消融因果验证，无需标签即可识别预训练Transformer中执行持续内容依赖计算的注意力头电路，并在多个模型上验证了其通用性和因果必要性。

Comments 35 pages, 4 figures

利用大语言模型进行安全硬件设计及相关问题：机遇与挑战

Johann Knechtel, Ozgur Sinanoglu, Ramesh Karri

发表机构 * New York University Abu Dhabi（纽约大学阿布扎比分校）； NYU Tandon School of Engineering（纽约大学坦登工程学院）

AI总结本文探讨了大语言模型在电子设计自动化和硬件安全领域的应用，分析了其在生成RTL代码、自动生成测试平台以及弥合高层次规格与硅芯片之间语义差距方面的潜力，同时指出了其引入的严重安全漏洞，并总结了当前研究的最新进展和未来研究方向。

Comments Accepted for 2026 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

AI中文摘要

将大语言模型（LLMs）整合到电子设计自动化（EDA）和硬件安全领域正迅速重塑半导体行业。尽管LLMs在生成寄存器传输级（RTL）代码、自动生成测试平台以及弥合高层次规格与硅芯片之间的语义差距方面提供了前所未有的能力，但同时它们也引入了严重的安全隐患。本文全面回顾了LLM驱动的硬件设计的最新研究进展，围绕EDA综合、硬件信任、安全设计和教育等方面的关键进展进行深入分析。我们系统地扩展了最近突破的方法——从基于推理的综合和多代理漏洞提取到数据污染和对抗性机器学习（ML）规避。我们整合了关于关键防御措施的一般讨论，如动态基准测试以对抗数据记忆和激进的红队测试以实现稳健的安全评估。最后，我们综合了跨领域的经验教训，以指导未来研究朝着安全、可信和自主的设计生态系统发展。

英文摘要

The integration of Large Language Models (LLMs) into Electronic Design Automation (EDA) and hardware security is rapidly reshaping the semiconductor industry. While LLMs offer unprecedented capabilities in generating Register Transfer Level (RTL) code, automating testbenches, and bridging the semantic gap between high-level specifications and silicon, they simultaneously introduce severe vulnerabilities. This comprehensive review provides an in-depth analysis of the state-of-the-art in LLM-driven hardware design, organized around key advancements in EDA synthesis, hardware trust, design for security, and education. We systematically expand on the methodologies of recent breakthroughs -- from reasoning-driven synthesis and multi-agent vulnerability extraction to data contamination and adversarial machine learning (ML) evasion. We integrate general discussions on critical countermeasures, such as dynamic benchmarking to combat data memorization and aggressive red-teaming for robust security assessment. Finally, we synthesize cross-cutting lessons learned to guide future research toward secure, trustworthy, and autonomous design ecosystems.

2605.20119 2026-06-05 cs.LG cs.AI 版本更新

Toto 2.0: Time Series Forecasting Enters the Scaling Era

Toto 2.0：时间序列预测进入规模化时代

Emaad Khwaja, Chris Lettieri, Gerald Woo, Eden Belouadah, Marc Cenac, Guillaume Jarry, Enguerrand Paquin, Xunyi Zhao, Viktoriya Zhukov, Othmane Abou-Amal, Chenghao Liu, Ameet Talwalkar, David Asker

发表机构 * Datadog AI Research（Datadog AI研究院）； Carnegie Mellon University（卡内基梅隆大学）

AI总结本文提出Toto 2.0模型家族，通过单一训练配方在400万到25亿参数范围内实现可靠的预测质量提升，并在三个基准测试中达到新的最先进水平。

Comments Code: https://github.com/DataDog/toto Weights: https://huggingface.co/collections/Datadog/toto-20

2405.05097 2026-06-05 cs.LG stat.ML 版本更新

Biology-inspired joint distribution neurons based on Hierarchical Correlation Reconstruction allowing for multidirectional propagation of values and densities

受生物学启发的联合分布神经元：基于层次相关性重建，支持值与密度的多向传播

Jarek Duda

发表机构 * Jagiellonian University（雅盖隆大学）

AI总结本文提出了一种受生物学启发的联合分布神经元，通过层次相关性重建实现多向值和密度传播，改进了传统人工神经元在学习、灵活性和鲁棒性方面的不足。

Comments 12 pages, 17 figures

AI中文摘要

最近，约一百万个生物神经元（BNN）在Pong游戏中的表现优于现代强化学习（RL）方法。这提醒我们，生物神经元在学习、灵活性和鲁棒性等方面仍具有质的优势，也提示应改进当前的人工神经元（如MLP/KAN），使其更接近生物神经元。本文将KAN方法扩展为包含局部联合分布模型的神经元：对于x∈[0,1]^d，ρ(x)=∑_{j∈B} a_j f_j(x)。这为KAN增加了可解释性和信息流控制，并允许逐步补充人工神经元所缺失的、生物神经元的三个基本特性：1）生物轴突可以双向传播，而当前人工神经元侧重单向传播；联合分布神经元可通过代入部分变量的值，得到其余变量的条件值/分布；2）动物表现出风险规避，需要处理方差，而现实世界通常需要概率模型；所提方法还可以将分布作为矩向量（期望值、方差或更高阶矩）进行预测和传播；3）生物神经元需要局部训练，除反向传播外，所提方法还允许多种其他训练方式，如直接训练、张量分解，以及局部且有前景的信息瓶颈。所提方法非常通用，也可作为softmax的扩展，用于transformer、JEPA、Mamba等模型的嵌入，并暗示一种解释：特征是现实世界属性联合密度的混合矩。

英文摘要

Recently, a million biological neurons (BNN) outperformed modern RL methods at playing Pong~\cite{RL}. This is a reminder that they remain qualitatively superior, e.g., in learning, flexibility and robustness, and it suggests improving current artificial neurons (e.g., MLP/KAN) for better agreement with biological ones. We propose extending the KAN approach to neurons containing a model of the local joint distribution: $\rho(\mathbf{x})=\sum_{\mathbf{j}\in B} a_\mathbf{j} f_\mathbf{j}(\mathbf{x})$ for $\mathbf{x} \in [0,1]^d$. This adds interpretation and information-flow control to KAN and allows gradually adding three basic properties of biological neurons that artificial ones lack. 1) Biological axons propagate in both directions~\cite{axon}, while current artificial neurons focus on unidirectional propagation; joint distribution neurons can repair this by substituting some variables to obtain conditional values/distributions for the remaining ones. 2) Animals show risk avoidance~\cite{risk}, which requires processing variance, and the real world generally calls for probabilistic models; the proposed neurons can also predict and propagate distributions as vectors of moments: (expected value, variance) or higher. 3) Biological neurons require local training; besides backpropagation, the proposed approach allows many additional training methods, such as direct training, tensor decomposition, or, finally, the local and promising information bottleneck. The approach is very general and can also be used as an extension of softmax in embeddings of, e.g., transformers, JEPA or Mamba, suggesting the interpretation that features are mixed moments of the joint density of real-world properties.
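
The density model $\rho(\mathbf{x})=\sum_{\mathbf{j}\in B} a_\mathbf{j} f_\mathbf{j}(\mathbf{x})$ and its bidirectional use can be illustrated in two dimensions. This is a minimal sketch under stated assumptions. The $f_j$ are taken to be orthonormal shifted Legendre polynomials on $[0,1]$, and $B$ is the full tensor basis up to degree 2 per variable. Coefficients are estimated as sample means $a_\mathbf{j} \approx \frac{1}{n}\sum_i f_\mathbf{j}(\mathbf{x}_i)$, which is valid because the basis is orthonormal. Substituting a value of $x_1$ then yields the conditional expected value of $x_2$. Swapping the roles of the variables would propagate in the other direction.

```python
import numpy as np

def legendre01(x, degree=2):
    """Orthonormal shifted Legendre polynomials f_0..f_degree on [0, 1] (degree <= 2)."""
    x = np.asarray(x, dtype=float)
    basis = [np.ones_like(x),
             np.sqrt(3) * (2 * x - 1),
             np.sqrt(5) * (6 * x**2 - 6 * x + 1)]
    return np.stack(basis[: degree + 1], axis=-1)

def fit_coefficients(samples, degree=2):
    """a[j1, j2] = sample mean of f_j1(x1) * f_j2(x2); orthonormality makes this the estimator."""
    f1 = legendre01(samples[:, 0], degree)
    f2 = legendre01(samples[:, 1], degree)
    return f1.T @ f2 / len(samples)

def conditional_mean_x2(a, x1):
    """E[x2 | x1] under rho(x1, x2) = sum a[j1, j2] f_j1(x1) f_j2(x2).
    Uses int_0^1 f_k dx = [k == 0] and int_0^1 x f_k dx = 1/2, sqrt(3)/6, 0 for k = 0, 1, 2."""
    moments = np.array([0.5, np.sqrt(3) / 6, 0.0])[: a.shape[1]]
    g = legendre01(x1, a.shape[0] - 1)   # f_j1 evaluated at the substituted x1
    density_x1 = g @ a[:, 0]             # marginal density of x1 at this value
    return float(g @ a @ moments / density_x1)

rng = np.random.default_rng(0)
x1 = rng.uniform(size=20_000)
x2 = np.clip(x1 + rng.normal(0, 0.05, size=x1.size), 0, 1)   # strongly dependent pair
a = fit_coefficients(np.column_stack([x1, x2]))
print(conditional_mean_x2(a, 0.8))   # close to 0.8
```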

2605.13587 2026-06-05 stat.ML cs.LG eess.SP 版本更新

Reframing preprocessing selection as model-internal calibration in near-infrared spectroscopy: A large-scale benchmark of operator-adaptive PLS and Ridge models

近红外光谱中将预处理选择重构为模型内部校准：算子自适应PLS与岭回归模型的大规模基准测试

Gregory Beurier, Robin Reiter, Camille Noûs, Lauriane Rouan, Denis Cornet

发表机构 * CIRAD, UMR AGAP Institut（CIRAD，AGAP研究院）； UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro（AGAP研究院，蒙彼利埃大学，CIRAD，INRAE，农业研究院）； Laboratoire Cogitamus（Cogitamus实验室）

AI总结本文研究了在近红外光谱学中，将预处理选择重新定义为模型内部校准的方法，通过大规模基准测试比较算子自适应PLS和岭回归模型的性能和效率。

Comments 17 pages, 8 figures; supplementary material (39 pages, 4 figures) included. Extended preprint version of a companion study prepared as a concise journal article (same results, different framing and scope). Code and artifacts: https://github.com/GBeurier/nirs4all-aom

AI中文摘要

预处理筛选通常是近红外光谱校准工作流程中最昂贵的部分。其有效性在于平滑、导数、去趋势及相关滤波器会改变PLS或岭回归所看到的光谱方向，但完整的外部搜索会反复拟合几乎相同的线性模型。本文研究将该搜索折叠为单个校准步骤的情形。对于严格线性的预处理算子A（以XA^T形式作用于行光谱），变换后的PLS交叉协方差满足(XA^T)^T Y = AX^T Y，而岭回归依赖于算子诱导的核X A^T A X^T。这些恒等式使得可以在模型内部筛选有限的算子库，同时保留原始波长系数。SNV、MSC、EMSC和ASLS等样本自适应或需拟合的校正仍作为折内（fold-local）分支保留，而不被吸收进上述代数中。本研究使用AOM基准队列，共包含61个回归行和17个分类行。在主回归分母（N=32）上，普通的紧凑算子库AOM-PLS相对于PLS默认值的中位RMSEP比为0.991，相对于PLS-HPO为0.990；所选的ASLS-AOM-compact-cv5分支在相同两个参考上分别为0.985和1.002。普通的AOMRidge-global-compact-none基线相对于Ridge默认值为0.974，相对于Ridge-HPO为0.984，而所选的AOMRidge-Blender-headline-spxy3分别为0.918和0.966。所选分类器AOM-PLS-DA-global-simpls-covariance在13个数据集上将平衡精度的中位数提高了0.159，其中12/13胜出。运行时间差距是实际意义所在：PLS-HPO每次运行的中位总时间为710.81秒，而所选的AOM-PLS分支仅为1.63秒。因此，线性算子自适应校准在预测质量上与穷举式预处理筛选相当，对于PLS而言，拟合时间减少了多个数量级。

英文摘要

Preprocessing screening is often the most expensive part of a near-infrared spectroscopy calibration workflow. It works because smoothing, derivatives, detrending and related filters change the spectral directions seen by partial least squares (PLS) or Ridge regression, but a full external search repeatedly refits nearly the same linear model. This paper studies the case where that search can be collapsed into one calibration step. For a strictly linear preprocessing operator $A$ acting on row spectra as $XA^T$, the transformed PLS cross-covariance satisfies $(XA^T)^T Y = A X^T Y$, and Ridge regression depends on the operator-induced kernel $X A^T A X^T$. These identities let a finite operator bank be screened inside the model while retaining original-wavelength coefficients, and the same identity extends to cheaply evaluated linear operator chains. Sample-adaptive or fitted corrections such as SNV, MSC, EMSC and ASLS are not strictly linear; we prove this boundary and keep them as fold-local branches. The cohort has 61 regression and 17 classification rows, with a strict paired regression denominator of N=32 for the eight paper variants. There, AOM-PLS reaches median RMSEP ratios of 0.991/0.990 (simple) and 0.985/1.002 (best) against PLS-default/PLS-HPO, and AOM-Ridge reaches 0.974/0.984 (simple) and 0.918/0.966 (best) against Ridge-default/Ridge-HPO. The operator-adaptive classifier AOM-PLS-DA improves balanced accuracy by a median of 0.159 on N=13 datasets (12/13 wins). The practical result is the runtime gap: PLS-HPO takes a median of 710.81 s per run, whereas AOM-PLS takes 1.18-1.63 s, i.e., 436 to 602 times less PLS fitting time. Linear operator-adaptive calibration thus gives prediction quality comparable to exhaustive preprocessing screening, with orders of magnitude less fitting time for PLS.
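
Both identities are easy to check numerically. The sketch below makes two assumptions. It uses a central-difference derivative matrix as a stand-in for one strictly linear operator $A$; the paper's operator bank is not reproduced. It also uses random data. Kernel Ridge with $K = X A^T A X^T$ is fitted, and the dual solution is mapped back to coefficients $\beta = A^T w$ in the original wavelength space. The result gives the same predictions as ordinary Ridge fitted on the preprocessed spectra $XA^T$.

```python
import numpy as np

def difference_operator(p):
    """Central-difference first derivative as a strict linear operator A (p x p)."""
    A = np.zeros((p, p))
    for i in range(1, p - 1):
        A[i, i - 1], A[i, i + 1] = -0.5, 0.5
    return A

rng = np.random.default_rng(0)
n, p, lam = 30, 50, 0.1
X = rng.normal(size=(n, p))   # each row is one spectrum
Y = rng.normal(size=(n, 1))
A = difference_operator(p)
XA = X @ A.T                  # preprocessing applied to every row

# PLS cross-covariance identity: (X A^T)^T Y = A X^T Y
assert np.allclose(XA.T @ Y, A @ (X.T @ Y))

# Ridge via the operator-induced kernel K = X A^T A X^T (dual form)
K = X @ A.T @ A @ X.T
alpha = np.linalg.solve(K + lam * np.eye(n), Y)
beta = A.T @ (XA.T @ alpha)   # coefficients expressed on the ORIGINAL wavelengths

# Same predictions as primal Ridge fitted directly on X A^T
w = np.linalg.solve(XA.T @ XA + lam * np.eye(p), XA.T @ Y)
assert np.allclose(X @ beta, XA @ w)
print("identities hold; beta has shape", beta.shape)
```

Because only $K$ changes from one operator to the next, screening a bank of operators amounts to re-solving an $n \times n$ system per operator. It does not require re-running a full external pipeline.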

2511.20577 2026-06-05 cs.LG 版本更新

MSTN: A Lightweight and Fast Model for General TimeSeries Analysis

MSTN: 一种轻量且快速的通用时间序列分析模型

Sumit S Shevtekar, Chandresh K Maurya

发表机构 * Department of Computer Science and Engineering（计算机科学与工程系）； Indian Institute of Technology Indore（印度理工学院印多尔分校）

AI总结本文提出了一种轻量且快速的MSTN模型，通过多尺度时间网络架构，结合早期时间聚合（ETA）原理，有效处理时间序列中的非平稳性、非线性动态和多时间尺度行为，实现了在多个时间尺度上的灵活建模，同时保持了模型的轻量级和低延迟特性。

Comments 30 pages, published in Transactions on Machine Learning Research (TMLR)

AI中文摘要

现实世界的时间序列往往表现出强烈的非平稳性、复杂的非线性动态以及跨多个时间尺度的行为，从快速的局部波动到缓慢演变的长期趋势。然而，许多现代架构施加了刚性的、固定尺度的结构先验，如基于补丁的标记化、预定义的感受野或冻结的主干编码器，这可能会过度正则化时间动态并限制对突发高幅事件的适应性。为此，我们引入了多尺度时间网络（MSTN），一种基于早期时间聚合（ETA）原理的混合神经架构。MSTN集成了三个互补的组件：（i）多尺度卷积编码器，用于捕捉细粒度的局部结构；（ii）序列建模模块，通过循环或注意力机制学习长距离依赖；（iii）自门控融合阶段，结合挤压-激励模块和单个全连接层，动态重新加权并融合多尺度表示。ETA确保下游模块以O(1)时间运行，而编码器的复杂度仍为O(L^2)（Transformer）或O(L)（BiLSTM）。这种设计使MSTN能够灵活地建模从毫秒到长时域的时间模式，同时避免了长上下文模型通常带来的计算负担。在涵盖缺失值填补、长期预测、分类和跨数据集泛化的广泛基准测试中，MSTN实现了最先进的性能，在27个数据集中的21个上取得了新的最佳结果，同时保持轻量（MSTN-BiLSTM约0.40M参数，MSTN-Transformer约1.06M参数），适合低延迟推理（<1秒，通常为毫秒级）和资源受限的部署。

英文摘要

Real-world time series often exhibit strong non-stationarity, complex nonlinear dynamics, and behavior expressed across multiple temporal scales, from rapid local fluctuations to slowly evolving long-range trends. However, many contemporary architectures impose rigid, fixed-scale structural priors, such as patch-based tokenization, predefined receptive fields, or frozen backbone encoders, which can over-regularize temporal dynamics and limit adaptability to abrupt high-magnitude events. To address this, we introduce the Multi-scale Temporal Network (MSTN), a hybrid neural architecture grounded in an Early Temporal Aggregation (ETA) principle. MSTN integrates three complementary components: (i) a multi-scale convolutional encoder that captures fine-grained local structure; (ii) a sequence modeling module that learns long-range dependencies through either recurrent or attention-based mechanisms; and (iii) a self-gated fusion stage incorporating squeeze-excitation and a single dense layer to dynamically reweight and fuse multi-scale representations. ETA ensures that downstream modules operate in O(1) time, while the encoder retains O(L^2) (Transformer) or O(L) (BiLSTM) cost. This design enables MSTN to flexibly model temporal patterns spanning milliseconds to extended horizons while avoiding the computational burden typically associated with long-context models. Across extensive benchmarks covering imputation, long-term forecasting, classification, and cross-dataset generalization, MSTN achieves state-of-the-art performance, establishing new best results on 21 of 27 datasets, while remaining lightweight (~0.40M parameters for MSTN-BiLSTM and ~1.06M for MSTN-Transformer) and suitable for low-latency (<1 s, often milliseconds), resource-constrained deployment.
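
The self-gated fusion stage can be sketched as a squeeze-and-excitation gate over stacked multi-scale features. This is a minimal NumPy illustration, not MSTN's actual layer. It assumes a `[scales, channels, time]` layout, gates over the flattened scale-channel axis with a bottleneck ratio `r`, and uses randomly initialized weights. The "single dense layer" is modeled as one matrix `W_out`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_fuse(features, W1, W2, W_out):
    """features: [S, C, T] representations from S temporal scales.
    Squeeze: average over time. Excite: bottleneck MLP + sigmoid gate per (scale, channel).
    The gated scales are summed and passed through one dense layer."""
    S, C, T = features.shape
    flat = features.reshape(S * C, T)
    squeezed = flat.mean(axis=1)                          # [S*C]
    gate = sigmoid(W2 @ np.maximum(W1 @ squeezed, 0.0))   # [S*C], each in (0, 1)
    reweighted = (flat * gate[:, None]).reshape(S, C, T)
    fused = reweighted.sum(axis=0)                        # [C, T]
    return W_out @ fused, gate                            # [C_out, T]

rng = np.random.default_rng(0)
S, C, T, r, C_out = 3, 8, 16, 4, 5
feats = rng.normal(size=(S, C, T))
W1 = rng.normal(size=(S * C // r, S * C)) * 0.1
W2 = rng.normal(size=(S * C, S * C // r)) * 0.1
W_out = rng.normal(size=(C_out, C)) * 0.1
out, gate = se_fuse(feats, W1, W2, W_out)
print(out.shape)  # (5, 16)
```

The gate lets the model rescale each (scale, channel) pair from a time-averaged summary. This is one way a fusion stage can emphasize fast or slow scales per input.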

2605.16138 2026-06-05 cs.LG cs.AI hep-ex 版本更新

Surrogate Neural Architecture Codesign Package (SNAC-Pack)

代理神经架构协同设计包（SNAC-Pack）

Jason Weitz, Dmitri Demler, Benjamin Hawks, Aaron Wang, Nhan Tran, Javier Duarte

发表机构 * University of California San Diego（加州大学圣地亚哥分校）； ETH Zurich（苏黎世联邦理工学院）； Fermi National Accelerator Laboratory（费米国家加速器实验室）； University of Illinois Chicago（伊利诺伊大学芝加哥分校）

AI总结本文提出SNAC-Pack，一种面向硬件的自动化机器学习框架，用于神经架构协同设计和端到端FPGA部署，通过多目标全局搜索和硬件代理模型减少合成成本，并结合量化感知训练和迭代幅度剪枝来压缩模型，最终在FPGA上实现高效部署。

Comments 15 Pages, 3 Figures, AutoML (International Conference on Automated Machine Learning) 2026

AI中文摘要

神经架构搜索（NAS）是一种强大的自动模型设计方法，但现有方法往往只优化准确率或依赖如位操作（BOPs）等代理指标，这些指标与硬件成本的相关性较差。这一差距在FPGA部署中尤为明显，其成本由查找表、DSP、触发器、BRAM和延迟等多维预算主导。我们提出了代理神经架构协同设计包（SNAC-Pack），一种开源的AutoML框架，用于硬件感知的神经架构协同设计和端到端FPGA部署。SNAC-Pack使用Optuna和NSGA-II进行多目标全局搜索，并将试验记录到共享的SQLite存储中，以支持跨计算节点的并行工作进程。硬件代理模型输出每个试验的资源和延迟估计，避免了原本会主导搜索循环的综合成本。随后的局部搜索阶段在联合压缩循环中结合量化感知训练（QAT）和迭代幅度剪枝，最后通过hls4ml Python库将最终模型综合为FPGA固件。YAML配置和可选的智能体（agentic）前端使用户能够在新数据集上运行管道而无需修改框架。我们在大型强子对撞机的喷注分类和超导量子比特读出中展示了SNAC-Pack，发现了紧凑的架构，这些架构在任务指标上匹配或超过强基线，同时减少FPGA资源利用，并在量子比特读出情况下将设计空间探索过程从数月的手动微调缩短到数小时的自动化搜索。

英文摘要

Neural architecture search (NAS) is a powerful approach for automating model design, but existing methods often optimize for accuracy alone or rely on proxy metrics such as bit operations (BOPs) that correlate poorly with hardware cost. This gap is particularly large for FPGA deployment, where cost is dominated by a multi-dimensional budget of lookup tables, DSPs, flip-flops, BRAM, and latency. We present the Surrogate Neural Architecture Codesign Package (SNAC-Pack), an open-source AutoML framework for hardware-aware neural architecture codesign and end-to-end FPGA deployment. SNAC-Pack runs a multi-objective global search with Optuna and NSGA-II, loading trials to a shared SQLite store that enables parallel workers across compute nodes. A hardware surrogate model outputs per-trial resource and latency estimates, avoiding the synthesis cost that would otherwise dominate the search loop. A local search stage then applies quantization-aware training (QAT) together with iterative magnitude pruning in a combined compression loop, after which the final model is synthesized to FPGA firmware via the hls4ml Python library. A YAML configuration and an optional agentic frontend let users run the pipeline on new datasets without modifying the framework. We demonstrate SNAC-Pack on jet classification at the Large Hadron Collider and superconducting qubit readout, discovering compact architectures that match or exceed strong baselines on the task metric while reducing FPGA resource utilization and, in the qubit readout case, reducing the design space exploration process from months of manual fine-tuning to hours of automated search.
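
The multi-objective search keeps trials that are not Pareto-dominated, and that dominance test is the core of NSGA-II's first front. This is a minimal pure-Python sketch, assuming two objectives (accuracy to maximize, a surrogate-estimated LUT count to minimize) and made-up trial values. In SNAC-Pack this selection is handled by Optuna's NSGA-II sampler rather than by hand-written code like this.

```python
def dominates(a, b):
    """a dominates b if it is no worse on every objective and strictly better on one.
    Each point is (accuracy, luts): accuracy is maximized, luts minimized."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    better = a[0] > b[0] or a[1] < b[1]
    return no_worse and better

def pareto_front(trials):
    """Return the names of trials not dominated by any other trial."""
    return [name for name, p in trials.items()
            if not any(dominates(q, p) for other, q in trials.items() if other != name)]

# Hypothetical trials: (validation accuracy, surrogate-estimated LUTs)
trials = {
    "t0": (0.760, 52_000),
    "t1": (0.745, 21_000),
    "t2": (0.740, 30_000),   # dominated by t1: less accurate and larger
    "t3": (0.700, 9_000),
}
print(pareto_front(trials))  # ['t0', 't1', 't3']
```

Adding further objectives (DSPs, flip-flops, BRAM, latency) only extends the tuple and the two comparisons in `dominates`.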

2509.10825 2026-06-05 cs.LG cs.AI stat.ML 版本更新

CUBE: Contrastive Understanding by Balanced Experiments

CUBE: 通过平衡实验实现对比理解

Dongseok Kim, Hyoungsun Choi, Mohamed Jismy Aashik Rasool, Gisung Oh

发表机构 * Department of Computer Engineering（计算机工程系）； Gachon University（嘉泉大学）

AI总结本文提出CUBE框架，通过平衡低-高探针解释已训练的预测模型，揭示模型的主要效应和交互作用，验证了其在合成和现实表格任务中的有效性。

Comments The core framework and main claims remain unchanged; the manuscript has been revised for clarity, presentation, and consistency

2605.15454 2026-06-05 cs.CL cs.LG stat.ML 版本更新

Reasoning Models Don't Just Think Longer, They Move Differently

推理模型不只思考更久，它们的移动方式不同

Anders Gjølbye, Lars Kai Hansen, Sanmi Koyejo

发表机构 * Technical University of Denmark（丹麦技术大学）； Stanford University（斯坦福大学）

AI总结本文研究了推理训练模型在生成链式思维时的轨迹差异，发现通过长度校正后，不同领域中难度与轨迹几何的耦合关系存在显著差异，尤其是在代码领域中，推理训练模型表现出更直接的轨迹和更一致的局部曲率。

Comments Preprint

AI中文摘要

经过推理训练的语言模型通常在更难的问题上消耗更多token，但更长的思维链并不能说明模型只是计算了更多步骤，还是遵循了不同的内部轨迹。我们通过研究竞赛编程、数学和布尔可满足性问题中思维链生成过程的隐藏状态轨迹来区分这两种情况。原始轨迹几何强烈受到生成长度的影响：更长的生成会机械地改变路径统计量，因此若不加调整，依赖难度的比较会产生误导。将轨迹统计量对长度进行残差化后，在所研究的所有领域中，难度仍与校正后的轨迹几何系统地耦合。最清晰的推理特异性分离出现在代码领域：与匹配的指令微调基线相比，推理训练模型在更难的问题上表现出更直接的校正轨迹和异质性更低的局部曲率。在数学和布尔可满足性问题中，校正后的难度-几何耦合较弱，但仍然存在。提示阶段的线性探针并未反映代码领域的这种分离，而行为标注显示，更强的校正耦合与策略转换和不确定性监控同时出现。这些发现确立了长度校正是生成时轨迹分析的先决条件，并表明推理训练可以与独特的校正轨迹几何相关联，其效应强度取决于领域。

英文摘要

Reasoning-trained language models often spend more tokens on harder problems, but longer chains of thought do not show whether a model is merely computing for more steps or following a different internal trajectory. We study this distinction through hidden-state trajectories during chain-of-thought generation across competitive programming, mathematics, and Boolean satisfiability. Raw trajectory geometry is strongly shaped by generation length: longer generations mechanically alter path statistics, so difficulty-dependent comparisons are misleading without adjustment. After residualizing trajectory statistics on length, difficulty remains systematically coupled to corrected trajectory geometry across all domains studied. The clearest reasoning-specific separation appears in the code domain, where harder problems show more direct corrected trajectories and less heterogeneous local curvature in reasoning-trained models than in matched instruction-tuned baselines. Corrected difficulty-geometry coupling is weaker, but still present, in mathematics and Boolean satisfiability. Prompt-stage linear probes do not mirror the code-domain separation, and behavioral annotations show that stronger corrected coupling co-occurs with strategy shifts and uncertainty monitoring. Together, these findings establish length correction as a prerequisite for generation-time trajectory analysis and show that reasoning training can be associated with distinct corrected trajectory geometry, with the strength of the effect depending on the domain.
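
Residualizing on length can be illustrated with ordinary least squares. This is a hypothetical sketch on synthetic data: one trajectory statistic is regressed on log generation length. The paper's exact statistics and regression specification are not reproduced. The point is that a strong raw correlation with difficulty can be produced entirely by length and then vanish after correction.

```python
import numpy as np

def residualize(stat, length):
    """Residuals of `stat` after OLS regression on [1, log(length)]."""
    design = np.column_stack([np.ones_like(length, dtype=float), np.log(length)])
    coef, *_ = np.linalg.lstsq(design, stat, rcond=None)
    return stat - design @ coef

rng = np.random.default_rng(0)
n = 2000
difficulty = rng.uniform(0, 1, n)
length = np.exp(5 + 2 * difficulty + rng.normal(0, 0.3, n))   # harder -> longer generations
# Synthetic statistic driven ONLY by length, not by difficulty itself
stat = 0.8 * np.log(length) + rng.normal(0, 0.2, n)

raw_r = np.corrcoef(stat, difficulty)[0, 1]
corrected_r = np.corrcoef(residualize(stat, length), difficulty)[0, 1]
print(f"raw r = {raw_r:.2f}, length-corrected r = {corrected_r:.2f}")
```

A difficulty effect that survives this correction, as the abstract reports, cannot be explained by length alone.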

2605.13830 2026-06-05 cs.AI cs.LG 版本更新

Quantifying Sensitivity for Tree Ensembles: A symbolic and compositional approach

对树集成模型的敏感性量化：一种符号和组合方法

Ajinkya Naik, Chaitanya Garg, S. Akshay, Ashutosh Gupta, Kuldeep S. Meel

发表机构 * Indian Institute of Technology Bombay（印度理工学院孟买分校）； University of Toronto（多伦多大学）

AI总结本文提出了一种针对树集成模型的敏感性量化方法，通过离散化输入空间并枚举易受敏感性影响的区域，结合代数决策图（ADD）编码和分拆子问题，实现高效计算。实验表明，所提工具XCount在速度和可扩展性方面优于其他方法。

AI中文摘要

决策树集成（DTE）是一种广泛应用于AI分类任务的流行模型，用于多个安全关键领域，因此对这些模型的验证已成为过去十年的研究热点之一。其中一个问题就是敏感性问题，它询问给定一个DTE，是否一小部分特征的变化会导致输入的误分类。在本工作中，我们的目标是构建一个针对DTEs的定量敏感性概念，通过离散化模型的输入空间并枚举易受敏感性影响的区域。我们提出了一种新的算法技术，可以在保证认证误差和置信度范围内高效地完成此计算。我们的方法基于将问题编码为代数决策图（ADD），并进一步将其拆分为可高效解决的子问题，使计算成为组合和可扩展的。我们在不同规模的基准上评估了我们的技术的性能，与相同问题编码下的模型计数器进行比较。实验结果表明，我们的工具XCount在速度上显著优于其他方法，并且在集成规模增加时表现良好。

英文摘要

Decision tree ensembles (DTEs) are a popular model for a wide range of AI classification tasks and are used in multiple safety-critical domains; hence, verifying properties of these models has been an active topic of study over the last decade. One such verification question is sensitivity: given a DTE, can a small change in a subset of features lead to misclassification of the input? In this work, we build a quantitative notion of sensitivity tailored to DTEs by discretizing the input space of the model and enumerating the regions that are susceptible to sensitivity. We propose a novel algorithmic technique that performs this computation efficiently, within a certified error and confidence bound. Our approach encodes the problem as an algebraic decision diagram (ADD) and further splits it into subproblems that can be solved efficiently, making the computation compositional and scalable. We evaluate our technique on benchmarks of varying size, in terms of number of trees and depth, comparing it against model counters on the same problem encoding. Experimental results show that our tool XCount achieves significant speedups over other approaches and scales well as ensemble size increases.
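
The quantity being counted can be illustrated by brute force on a toy ensemble. This is a minimal sketch under stated assumptions. Each feature is discretized into four bins, and three hand-written trees vote by majority. A point is "sensitive" if some reassignment of the features in a chosen subset flips the prediction. XCount avoids this exponential enumeration by compiling to algebraic decision diagrams with certified error bounds; the code below only shows what is being counted.

```python
from itertools import product

DOMAIN = range(4)  # each feature discretized into 4 bins

def tree_a(x): return 1 if x[0] >= 2 else 0
def tree_b(x): return 1 if x[1] >= 2 else 0
def tree_c(x): return 1 if x[0] >= 3 else (1 if x[2] >= 1 else 0)

def predict(x):
    """Majority vote of the three toy trees."""
    return 1 if tree_a(x) + tree_b(x) + tree_c(x) >= 2 else 0

def is_sensitive(x, subset):
    """True if re-assigning the features in `subset` can flip the prediction."""
    base = predict(x)
    for values in product(DOMAIN, repeat=len(subset)):
        y = list(x)
        for i, v in zip(subset, values):
            y[i] = v
        if predict(y) != base:
            return True
    return False

points = list(product(DOMAIN, repeat=3))
count = sum(is_sensitive(x, subset=[2]) for x in points)
print(count, len(points))  # 24 64
```

Here 24 of 64 cells are sensitive to feature 2. That holds exactly where the first two trees disagree and feature 0 is below 3, so `tree_c` decides the vote.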

2605.12951 2026-06-05 stat.ML cs.LG 版本更新

Coreset-Induced Conditional Velocity Flow Matching

由Coreset诱导的条件速度流匹配

Xiao Wang, Zihua She, Jianxi Su

发表机构 * Department of Statistics, Purdue University（普渡大学统计学系）

AI总结本文提出了一种生成模型CCVFM，通过数据驱动的源分布增强层次化修正流，利用Coreset压缩目标数据并生成高斯混合分布，从而在无需学习神经采样器的情况下实现条件速度律的闭式表达，并通过轻量级修正流进一步优化生成效果。

AI中文摘要

我们提出了Coreset-Induced Conditional Velocity Flow Matching (CCVFM)，一种生成模型，通过数据驱动的源分布增强层次化修正流。层次化流匹配在速度空间中建模完整的条件速度定律，但其内部流被要求从头开始将各向同性高斯噪声传输到多模态目标速度分布。我们的关键观察是，此内部源可以被一个闭式近似替代，该近似基于目标的Coreset。CCVFM首先利用熵Sinkhorn Coreset将目标压缩为加权原子，并将它们提升为高斯混合分布。由此诱导的条件速度定律是一个闭式高斯混合分布，可在不学习神经采样器的情况下进行采样。一个轻量级修正流，从该精确近似源训练而来，然后优化剩余的近似到目标残差，而不是学习整个噪声到数据映射。我们证明，在显式压缩假设下，近似传输成本等于目标-近似Wasserstein差距，而噪声-源的类比具有维度尺度下界。我们进一步刻画了直接近似源训练目标的条件二次矩，并表明当近似条件律接近真实条件速度律在均值和协方差时，其源依赖的超额是小的。实验证明，在MNIST、CIFAR-10、ImageNet-32和CelebA-HQ上，所提方法在匹配架构下实现了具有竞争力的少步生成。

英文摘要

We propose Coreset-Induced Conditional Velocity Flow Matching (CCVFM), a generative model that augments hierarchical rectified flow with a data-informed source distribution. Hierarchical flow matching models the full conditional velocity law in velocity space, but its inner flow is asked to transport isotropic Gaussian noise to a multimodal target velocity distribution from scratch. Our key observation is that this inner source can be replaced by a closed-form surrogate built from a coreset of the target. CCVFM first compresses the target into weighted atoms using an entropic Sinkhorn coreset and lifts them to a Gaussian mixture. The induced conditional velocity law is then a closed-form Gaussian mixture that can be sampled without a learned neural sampler. A lightweight correction flow, trained from this exact surrogate source, then refines the remaining surrogate-to-target residual rather than learning an entire noise-to-data map. We prove that the surrogate transport cost equals the target--surrogate Wasserstein gap under an explicit compression assumption, whereas the noise-source analogue has a dimension-scale lower bound. We further characterize the conditional second moment of the direct surrogate-source training target and show that its source-dependent excess is small when the surrogate conditional law is close to the true conditional velocity law in mean and covariance. Empirically, on MNIST, CIFAR-10, ImageNet-32, and CelebA-HQ, the proposed method reaches competitive few-step generation under matched architectures.
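
A Gaussian-mixture surrogate built from weighted atoms can be sampled in closed form, without any learned sampler. This is a minimal NumPy sketch under two assumptions. The coreset atoms and weights are taken as given; the entropic Sinkhorn step that produces them is not shown. An isotropic bandwidth `sigma` is used, which is an illustrative choice rather than the paper's.

```python
import numpy as np

def sample_gaussian_mixture(atoms, weights, sigma, n, rng):
    """Draw n samples from sum_k w_k N(atoms[k], sigma^2 I)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    comp = rng.choice(len(atoms), size=n, p=weights)   # pick a mixture component per sample
    noise = rng.normal(size=(n, atoms.shape[1]))
    return atoms[comp] + sigma * noise, comp

rng = np.random.default_rng(0)
atoms = np.array([[-2.0, 0.0], [2.0, 0.0], [0.0, 3.0]])   # hypothetical coreset atoms
weights = [0.5, 0.3, 0.2]
x, comp = sample_gaussian_mixture(atoms, weights, sigma=0.1, n=10_000, rng=rng)
print(np.bincount(comp) / len(comp))   # close to [0.5, 0.3, 0.2]
```

In CCVFM, samples like these would serve as the source for the lightweight correction flow, in place of isotropic Gaussian noise.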

2605.08318 2026-06-05 cs.LG cs.AI cs.NA math.NA physics.comp-ph stat.ML 版本更新

When Attention Beats Fourier: Multi-Scale Transformers for PDE Solving on Irregular Domains

当注意力胜过傅里叶：用于不规则域上的PDE求解的多尺度变换器

Brandon Yee, Pairie Koh, Jack Rodriguez, Mihir Tekal

发表机构 * Physics Lab, Yee Collins Research Group（Yee Collins研究组物理实验室）

AI总结本文研究了深度学习模型在求解偏微分方程（PDE）时的架构选择问题，探讨了基于学习注意力的变换器架构在何时优于傅里叶域神经算子。引入了多尺度注意力变换器（MSAT），该架构将时空解的历史编码为令牌序列，并通过复合监督目标进行端到端训练。在五个基准问题上，与九种基线方法（包括物理信息神经网络、神经算子和状态空间模型）进行了全面的实证评估，展示了在复杂几何问题上的最佳泛化能力。

Comments Substantial Revision Required

AI中文摘要

我们研究了深度学习模型在求解偏微分方程（PDE）时的架构选择问题，探讨了基于学习注意力的变换器架构在何时优于傅里叶域神经算子。我们介绍了多尺度注意力变换器（MSAT），一种深度学习架构，将时空解的历史编码为令牌序列，并通过复合监督目标进行端到端训练。我们对九种基线方法（包括物理信息神经网络、神经算子和状态空间模型）进行了全面的实证评估，覆盖了PINNacle套件中的五个基准问题，使用相同的训练/测试分割和参考数据。MSAT在复杂几何问题上实现了最先进的泛化能力（Heat2D-CG的L²相对误差为0.0101，比FNO提高了3.7倍），在34秒的总推理时间下，比Mamba-NO的120,812秒快得多。对物理正则化组件的消融研究揭示了精确的归纳偏置权衡：物理先验减少了扩散主导问题的测试误差，但会退化混沌和回流流动制度的泛化能力，直接刻画了先验规格错误的边界。近似误差界作为域边界复杂性κ的函数，为这些实证发现提供了理论基础，并为架构选择提供了一个原则性的规则。

英文摘要

We study the problem of \emph{architecture selection} for deep learning models trained to solve partial differential equations (PDEs), asking when transformer-based architectures with learned attention outperform Fourier-domain neural operators. We introduce the \textbf{Multi-Scale Attention Transformer} (MSAT), a deep learning architecture that encodes spatiotemporal solution histories as token sequences and trains end-to-end via a composite supervised objective with optional physics-informed regularization terms. We conduct a comprehensive empirical evaluation against nine baselines, including physics-informed neural networks (PINNs), neural operators (FNO, DeepONet, GNOT), and state-space models (Mamba-NO), across five benchmark problems from the PINNacle suite, using identical train/test splits and reference data for all methods. MSAT achieves state-of-the-art generalization on complex geometry problems ($L^2_\mathrm{rel} = 0.0101$ on Heat2D-CG, a $3.7\times$ improvement over FNO) at $34\,\mathrm{s}$ total inference vs.\ $120{,}812\,\mathrm{s}$ for Mamba-NO. Ablation studies over the physics regularization component reveal a precise inductive bias tradeoff: physics priors reduce test error on diffusion-dominated problems but degrade generalization on chaotic and recirculating-flow regimes, directly characterizing the prior misspecification boundary. Approximation error bounds as a function of domain boundary complexity $\kappa$ provide a theoretical basis for these empirical findings and a principled rule for architecture selection.
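
For reference, a relative $L^2$ error such as the one reported for Heat2D-CG is conventionally computed as $\|u_\text{pred}-u_\text{ref}\|_2 / \|u_\text{ref}\|_2$ over the evaluation points. The sketch assumes this standard definition, since the PINNacle evaluation protocol is not restated here. It also checks the inference-time arithmetic from the abstract.

```python
import numpy as np

def relative_l2(pred, ref):
    """Relative L2 error ||pred - ref||_2 / ||ref||_2 over all evaluation points."""
    pred, ref = np.asarray(pred, float), np.asarray(ref, float)
    return float(np.linalg.norm(pred - ref) / np.linalg.norm(ref))

ref = np.array([3.0, 4.0])          # toy reference solution, norm 5
pred = np.array([3.0, 4.05])        # toy prediction, error norm 0.05
print(relative_l2(pred, ref))       # approximately 0.01

# Total inference time ratio, Mamba-NO vs MSAT, from the figures in the abstract
print(round(120_812 / 34))          # 3553
```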

2605.08253 2026-06-05 cs.LG cs.AI 版本更新

Path-Coupled Bellman Flows for Distributional Reinforcement Learning

面向分布强化学习的路径耦合贝尔曼流

Boyang Xu, Qing Zou, Siqin Yang, Hao Yan

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结本文提出路径耦合贝尔曼流（PCBF），一种连续时间的分布强化学习方法，通过源一致的贝尔曼耦合路径进行流匹配来学习回报分布，以解决现有基于流的方法中的边界不匹配和高方差自举问题。实验表明其在分布保真度和训练稳定性方面有所提升。

Comments Accepted to the 43rd International Conference on Machine Learning (ICML 2026)

Journal ref: Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea. PMLR 306, 2026

AI中文摘要

分布强化学习（DRL）建模完整的回报分布，但现有的有限支撑或基于分位数的方法依赖于投影，而近期基于流的方法可能在流源处出现边界不匹配，或在当前噪声与后继噪声相互独立时出现高方差的自举问题。本文提出路径耦合贝尔曼流（PCBF），一种连续时间DRL方法，利用源一致的贝尔曼耦合路径，通过流匹配学习回报分布：当前路径在t=0时从所需的基先验出发，在t=1时到达贝尔曼目标，并在中间时刻与后继流保持逐路径的仿射关系（不要求时间t的边际分布对所有t都满足分布贝尔曼不动点）。PCBF通过共享基噪声耦合当前与后继回报流，并使用以λ参数化的控制变量目标：λ=0恢复无偏的样本贝尔曼目标，而λ>0以可控的偏差换取方差减小。在可解析求解的MRP、OGBench和D4RL上的实验表明，PCBF提升了分布保真度和训练稳定性，并具有有竞争力的离线RL性能。

英文摘要

Distributional reinforcement learning (DRL) models the full return distribution, but existing finite-support or quantile-based methods rely on projections, while recent flow-based approaches can suffer from \emph{boundary mismatch} at the flow source or from \emph{high-variance} bootstrapping when current and successor noises are independent. We propose Path-Coupled Bellman Flows (PCBF), a continuous-time DRL method that learns return distributions with flow matching using \textbf{source-consistent Bellman-coupled paths}: the current path starts from the required base prior at $t{=}0$, reaches the Bellman target at $t{=}1$, and maintains a pathwise affine relation to the successor flow at intermediate times (without requiring time-$t$ marginals to satisfy a distributional Bellman fixed point for all $t$). PCBF couples current and successor return flows through shared base noise and uses a $λ$-parameterized control-variate target: $λ{=}0$ recovers an unbiased sample Bellman target, while $λ{>}0$ trades controlled bias for variance reduction. Experiments on analytically tractable MRPs, OGBench, and D4RL show improved distributional fidelity and training stability, and competitive offline RL performance.
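
Why shared base noise reduces bootstrapping variance can be seen in a one-dimensional toy. This is a hypothetical sketch, not PCBF's estimator, and the $\lambda$ control variate is not modeled. It makes three assumptions. The successor return flow is the affine map $z \mapsto \mu' + \sigma' z$ of standard normal noise. The current path interpolates linearly from base noise $z$ at $t=0$ to the Bellman target $r + \gamma(\mu' + \sigma' z')$ at $t=1$. The regression target is the straight-line velocity $x_1 - x_0$. With independent $z'$ its variance is $1 + \gamma^2\sigma'^2$; with shared noise $z' = z$ it is $(\gamma\sigma' - 1)^2$.

```python
import numpy as np

def velocity_targets(r, gamma, mu_next, sigma_next, n, shared, rng):
    """Straight-line flow-matching velocity targets x1 - x0 for a 1-D toy,
    where x0 = z ~ N(0, 1) and x1 = r + gamma * (mu_next + sigma_next * z')."""
    z = rng.normal(size=n)
    z_next = z if shared else rng.normal(size=n)   # shared vs independent base noise
    x1 = r + gamma * (mu_next + sigma_next * z_next)
    return x1 - z

rng = np.random.default_rng(0)
kw = dict(r=1.0, gamma=0.99, mu_next=5.0, sigma_next=1.0, n=100_000, rng=rng)
var_indep = velocity_targets(shared=False, **kw).var()   # about 1 + 0.99**2 = 1.98
var_shared = velocity_targets(shared=True, **kw).var()   # about (0.99 - 1)**2 = 1e-4
print(var_indep, var_shared)
```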

2605.08215 2026-06-05 cs.CV cs.LG cs.RO 版本更新

Test-Time Training for Visual Foresight Vision-Language-Action Models

面向视觉前瞻视觉-语言-动作模型的测试时训练

Sangwu Park, Wonjoong Kim, Yeonjun In, Sein Kim, Hongseok Kang, Chanyoung Park

发表机构 * KAIST（韩国科学技术院）

AI总结本文提出了一种测试时训练方法，用于增强视觉前瞻视觉-语言-动作模型在面对分布外数据时的鲁棒性，通过引入适应性更新过滤机制来减少测试时更新带来的实际挑战。

Comments Accepted at ICML 2026 Workshop on Continual Adaptation at Scale (CATS)

AI中文摘要

Visual Foresight VLA (VF-VLA) 已成为最近 VLA 中的重要架构选择，因其出色的性能。然而，VF-VLA 的固有设计使其特别容易受到分布外（OOD）偏移的影响。由于动作的质量直接取决于预测未来视觉信息的准确性，OOD 条件会影响两个阶段。为了解决这一脆弱性，我们提出了测试时训练视觉前瞻 VLA（$T^3$VF），这是一种受观察启发的测试时训练方法，即预测的未来图像及其后续观察形成自然的监督对。为了进一步解决由于随意测试时更新而产生的实际挑战，我们引入了自适应更新过滤机制。经验上，$T^3$VF 在不改变任何架构或辅助模块的情况下，以适度的额外推理成本缓解了 VF-VLA 的 OOD 脆弱性。

英文摘要

Visual Foresight VLA (VF-VLA) has become a prominent architectural choice among recent VLA models due to its strong performance. Nevertheless, the inherent design of VF-VLA makes it particularly vulnerable to out-of-distribution (OOD) shifts. Because the quality of the actions directly depends on the accuracy of the predicted future visual information, OOD conditions affect both stages, future-image prediction and action generation, at once. To address this vulnerability, we propose Test-Time Training Visual Foresight VLA ($T^3$VF), a test-time training approach motivated by the observation that the predicted future image and its subsequent observation form a natural supervision pair. To further address the practical challenges that arise from indiscriminate test-time updates, we introduce an adaptive update filtering mechanism. Empirically, $T^3$VF mitigates the OOD vulnerability of VF-VLA at a modest additional inference cost, without requiring any architectural modification or auxiliary modules.

2605.07482 2026-06-05 cs.LG cs.AI 版本更新

SHRED: Retain-Set-Free Unlearning via Self-Distillation with Logit Demotion

SHRED：基于自蒸馏与logit降权的无保留集机器遗忘

Zizhao Hu, Ameya Godbole, Johnny Tian-Zheng Wei, Mohammad Rostami, Jesse Thomason, Robin Jia

发表机构 * University of Southern California（南加州大学）； USC Information Sciences Institute（USC信息科学研究所）

AI总结本文提出了一种无需保留集的机器遗忘方法SHRED，通过自蒸馏与logit降权，在遗忘的同时保持模型的实用性，优于传统需要保留集的方法。

AI中文摘要

大语言模型（LLMs）的机器遗忘旨在选择性地移除记忆内容，如私人数据、受版权保护的文本或危险知识，而无需代价高昂的完全重新训练。现有大多数方法需要一个精心挑选的保留集来防止模型通用能力的灾难性退化，这带来了额外的数据依赖，使部署复杂化。我们提出SHRED（Self-distillation via High-surprisal-only Retain-set-free Entropy Demotion），一种无需保留集的遗忘方法，其关键洞察是：遗忘集实例中的token并非同等地携带记忆信息。高信息量token集中了模型的记忆知识，而低信息量token反映的是一般语言能力。SHRED分为两个阶段。（1）选择：对遗忘集实例进行前向传播，收集每个token的自回归概率，选取概率最低（香农信息量最高）的token作为遗忘位置；其余位置保留为良性锚点。（2）训练：构造修改后的KL目标，在遗忘位置降低被记忆token的logit，同时在良性位置保留原始分布。随后通过单一的top KL自蒸馏目标训练模型，同时驱动遗忘和实用性保持。我们在四个标准遗忘基准上评估了SHRED，证明它在遗忘效果与模型实用性之间建立了新的帕累托最优权衡，优于依赖保留集的方法。分析表明，SHRED对重新学习攻击和成员推断攻击具有鲁棒性，并且在多次连续遗忘后仍能保持稳定的实用性。

英文摘要

Machine unlearning for large language models (LLMs) aims to selectively remove memorized content such as private data, copyrighted text, or hazardous knowledge, without costly full retraining. Most existing methods require a retain set of curated examples to prevent catastrophic degradation of general model utility, creating an extra data dependency that complicates deployment. We propose SHRED (Self-distillation via High-surprisal-only Retain-set-free Entropy Demotion), a retain-set-free unlearning method built on a key insight: not all tokens within a forget set instance carry memorized information equally. High-information tokens concentrate the model's memorized knowledge, while low-information tokens reflect general language competence. SHRED operates in two stages. (1) Selection: We perform a forward pass on a forget set instance, collect per-token autoregressive probabilities, and select the lowest-probability (highest Shannon information) tokens as forget positions; the remaining positions are retained as benign anchors. (2) Training: We construct modified KL targets that demote the memorized token's logit at forget positions while preserving the original distribution at benign positions. The model is then trained via a single top KL self-distillation objective that simultaneously drives forgetting and utility preservation. We evaluate SHRED across four standard unlearning benchmarks and demonstrate that it establishes a new Pareto-optimal trade-off between forget efficacy and model utility, outperforming retain-set-dependent methods. Our analysis shows that SHRED is robust against relearning attacks and membership-inference attacks, and it maintains stable utility even after many sequential unlearning runs.
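
The two stages can be sketched for a single sequence. This is a minimal NumPy illustration under assumptions that do not come from the paper. Forget positions are the `k` tokens with the lowest model probability. "Demotion" subtracts a fixed, hypothetical `margin` from the observed token's logit before re-normalizing. The training signal is the mean per-position KL from these targets to the current model.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def shred_targets(logits, tokens, k, margin=5.0):
    """Stage 1: choose the k lowest-probability (highest -log p) positions.
    Stage 2: demote the observed token's logit there (hypothetical fixed margin)
    and keep the original distribution at every other, benign position."""
    tokens = np.asarray(tokens)
    probs = softmax(logits)
    p_tok = probs[np.arange(len(tokens)), tokens]
    forget = np.argsort(p_tok)[:k]
    target_logits = logits.copy()
    target_logits[forget, tokens[forget]] -= margin
    return softmax(target_logits), forget

def kl_to_targets(targets, logits):
    """Mean over positions of KL(target || model)."""
    log_model = np.log(softmax(logits))
    return float(np.mean(np.sum(targets * (np.log(targets) - log_model), axis=-1)))

# Toy sequence: the token at position 2 is the least likely (most surprising) one.
logits = np.zeros((4, 5))
logits[[0, 1, 2, 3], [0, 1, 2, 3]] = [4.0, 4.0, 0.5, 4.0]
tokens = [0, 1, 2, 3]
targets, forget = shred_targets(logits, tokens, k=1)
print(forget.tolist(), kl_to_targets(targets, logits))  # [2] and a positive KL
```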

2605.07096 2026-06-05 cs.LG cs.AI stat.ME 版本更新

Transformer 的拓扑困境

Michael C. Mozer, Shoaib Ahmed Siddiqui, Rosanne Liu

发表机构 * Google DeepMind（谷歌DeepMind）

AI总结本文探讨了Transformer在处理序列结构时的拓扑问题，指出其纯前馈架构限制了动态状态跟踪，提出应通过递归架构转向隐含激活动态，并介绍了连续思维Transformer架构的分类方法及未来研究方向。

AI中文摘要

Transformer通过不断扩展的上下文历史在序列中编码结构。然而，其纯前馈架构从根本上限制了动态状态跟踪。状态跟踪（迭代更新反映不断演化环境的潜在变量）涉及本质上的序列依赖，而前馈网络难以维持这种依赖。因此，每输入一个新步，前馈模型就会将不断演化的状态表示推向层栈更深处，使信息在浅层不可访问，并最终耗尽模型的深度。虽然动态深度模型以及将状态表示外化的显式或潜在思维可以绕过这一深度限制，但这些方案在计算和内存上效率低下。本文主张，时间上延展的认知需要借助递归架构，将重心从显式思维轨迹转向隐式激活动态。我们提出了递归与连续思维Transformer架构的分类法，按其递归轴（深度与步）以及输入token数与递归步数之比进行分类。最后，我们概述了有前景的研究方向，包括增强的状态空间模型和粗粒度递归，以更好地将状态跟踪整合到现代基础模型中。

英文摘要

Transformers encode structure in sequences via an expanding contextual history. However, their purely feedforward architecture fundamentally limits dynamic state tracking. State tracking -- the iterative updating of latent variables reflecting an evolving environment -- involves inherently sequential dependencies that feedforward networks struggle to maintain. Consequently, feedforward models push evolving state representations deeper into their layer stack with each new input step, rendering information inaccessible in shallow layers and ultimately exhausting the model's depth. While this depth limit can be bypassed by dynamic depth models and by explicit or latent thinking that externalizes state representations, these solutions are computationally and memory inefficient. In this article, we argue that temporally extended cognition requires refocusing from explicit thought traces to implicit activation dynamics via recurrent architectures. We introduce a taxonomy of recurrent and continuous-thought transformer architectures, categorizing them by their recurrence axis (depth versus step) and their ratio of input tokens to recurrence steps. Finally, we outline promising research directions, including enhanced state-space models and coarse-grained recurrence, to better integrate state tracking into modern foundation models.

2604.23466 2026-06-05 cs.LG cs.AI cs.AR 版本更新

Evaluating CUDA Tile for AI Workloads on Hopper and Blackwell GPUs

面向AI工作负载的CUDA Tile评估：基于Hopper与Blackwell GPU

Divakar Kumar Yadav, Tian Zhao, Deepak Kumar

发表机构 * University of California, Berkeley（加州大学伯克利分校）； Stanford University（斯坦福大学）

AI总结本文评估了CUDA Tile在Hopper和Blackwell GPU上的AI工作负载性能，比较了CuTile与cuBLAS、Triton等方法的效率和可移植性，发现CuTile在特定工作负载上表现优异，但在跨架构优化上仍有不足。

AI中文摘要

NVIDIA的CUDA Tile（CuTile）为GPU内核开发引入了一种基于Python、以tile为中心的抽象，旨在简化编程，同时在现代GPU上保留Tensor Core和Tensor Memory Accelerator（TMA）的效率。我们在横跨Hopper和Blackwell架构的三种NVIDIA GPU（H100 NVL、B200和RTX PRO 6000 Blackwell Server Edition）上，首次对CuTile进行了独立的跨架构评估，并与cuBLAS、Triton、WMMA和原始SIMT等成熟方法进行对比。我们对代表性AI工作负载进行基准测试，包括GEMM、融合多头注意力和端到端LLM推理（BF16/FP16精度），以评估性能和可移植性。结果表明，CuTile的效果强烈依赖于工作负载和架构。在数据中心级Blackwell（B200）上，CuTile在融合注意力上最高达到1007 TFLOP/s，比FlashAttention-2快2.5倍，且仅需60行Python内核代码。对于GEMM，CuTile仅用22行代码即可达到cuBLAS性能的52-79%（WMMA需要123行），使其成为手写CUDA内核的实用替代方案，但尚不能替代供应商优化库。然而，同一CuTile注意力内核在RTX PRO 6000（sm_120）上仅达到FlashAttention-2吞吐量的53%，暴露出显著的跨架构优化差距。相比之下，Triton无需针对架构调优即可在所有测试平台上维持cuBLAS性能的62-101%，可移植性明显更强。

英文摘要

NVIDIA's CUDA Tile (CuTile) introduces a Python-based, tile-centric abstraction for GPU kernel development that aims to simplify programming while retaining Tensor Core and Tensor Memory Accelerator (TMA) efficiency on modern GPUs. We present the first independent, cross-architecture evaluation of CuTile against established approaches such as cuBLAS, Triton, WMMA, and raw SIMT on three NVIDIA GPUs spanning Hopper and Blackwell: H100 NVL, B200, and RTX PRO 6000 Blackwell Server Edition. We benchmark representative AI workloads, including GEMM, fused multi-head attention, and end-to-end LLM inference in BF16/FP16 precision, to assess both performance and portability. Our results show that CuTile effectiveness is strongly workload- and architecture-dependent. On datacenter-class Blackwell (B200), CuTile achieves up to 1007 TFLOP/s for fused attention, outperforming FlashAttention-2 by 2.5x while requiring only 60 lines of Python kernel code. For GEMM, CuTile reaches 52-79% of cuBLAS performance in 22 lines of code (versus 123 for WMMA), making it a practical replacement for hand-written CUDA kernels but not yet for vendor-optimized libraries. However, the same CuTile attention kernel achieves only 53% of FlashAttention-2 throughput on RTX PRO 6000 (sm_120), exposing significant cross-architecture optimization gaps. In contrast, Triton sustains 62-101% of cuBLAS performance across all tested platforms without architecture-specific tuning, demonstrating substantially stronger portability.

2602.23665 2026-06-05 cs.IR cs.LG cs.SI 版本更新

Geodesic Semantic Search: Cartographic Navigation of Citation Graphs with Learned Local Riemannian Maps

测地语义搜索：基于学习局部黎曼度量的引文图导航

Brandon Yee, Lucas Wang, Kundana Kommini

AI总结本文提出Geodesic Semantic Search (GSS)，通过在引文图上学习节点特定的黎曼度量，实现几何感知的语义检索。不同于传统基于嵌入的检索依赖固定欧几里得距离，GSS在每个节点学习低秩度量张量，诱导局部正定度量，从而在保持模型可计算性的同时保证有效度量。检索过程通过多源Dijkstra算法在学习的测地距离上进行，随后通过最大边际相关性重排序和路径一致性过滤。在包含169,000篇arXiv论文的引文预测基准上，GSS在Recall@20上比SPECTER+FAISS基线提升了23%。我们提供了Bridge Recovery Guarantee，描述了测地检索在定性上优于直接相似性的情况，以及训练损失与检索质量的边际分离结果，并刻画了低秩度量参数化的表达能力。我们的分层粗到细检索方法结合k-means池化，将计算成本降低4倍，同时保持97%的检索质量。

Comments Substantial Revision Required

AI中文摘要

我们提出了Geodesic Semantic Search (GSS)，一种检索系统，通过在引文图上学习节点特定的黎曼度量，实现几何感知的语义检索。不同于依赖固定欧几里得距离的标准嵌入式检索，GSS在每个节点学习一个低秩度量张量$\mathbf{L}_i \in \mathbb{R}^{d \times r}$，诱导一个局部正定度量$\mathbf{G}_i = \mathbf{L}_i \mathbf{L}_i^\top + \varepsilon \mathbf{I}$。这种参数化保证了度量的有效性，同时保持模型的可计算性。检索先在学习到的测地距离上运行多源Dijkstra算法，再进行最大边际相关性重排序和路径一致性过滤。在包含169,000篇arXiv论文的引文预测基准上，GSS在Recall@20上比SPECTER+FAISS基线相对提升了23%。我们提供了Bridge Recovery Guarantee，刻画了测地检索在何种情况下定性地优于直接相似性；给出了将训练损失与检索质量联系起来的边际分离结果；并刻画了低秩度量参数化的表达能力。我们的分层粗到细检索方法结合k-means池化，将计算成本降低4倍，同时保持97%的检索质量。

英文摘要

We present Geodesic Semantic Search (GSS), a retrieval system that learns node-specific Riemannian metrics on citation graphs to enable geometry-aware semantic search. Unlike standard embedding-based retrieval that relies on fixed Euclidean distances, GSS learns a low-rank metric tensor $\mathbf{L}_i \in \mathbb{R}^{d \times r}$ at each node, inducing a local positive semi-definite metric $\mathbf{G}_i = \mathbf{L}_i \mathbf{L}_i^\top + \varepsilon \mathbf{I}$. This parameterization guarantees valid metrics while keeping the model tractable. Retrieval proceeds via multi-source Dijkstra on the learned geodesic distances, followed by Maximal Marginal Relevance reranking and path coherence filtering. On citation prediction benchmarks with 169K arXiv papers, GSS achieves 23\% relative improvement in Recall@20 over SPECTER+FAISS baselines. We provide a Bridge Recovery Guarantee characterizing when geodesic retrieval qualitatively outperforms direct similarity, a margin separation result connecting training loss to retrieval quality, and characterize the expressiveness of low-rank metric parameterization. Our hierarchical coarse-to-fine search with k-means pooling reduces computational cost by $4\times$ while maintaining 97\% retrieval quality.
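
The two geometric ingredients named in the abstract, the per-node metric $\mathbf{G}_i = \mathbf{L}_i\mathbf{L}_i^\top + \varepsilon\mathbf{I}$ and multi-source Dijkstra, can be sketched as follows. This is a minimal illustration, not the GSS implementation. The abstract does not say how learned metrics become edge lengths, so `edge_length` (a Mahalanobis length under the average of the two endpoint metrics) is a hypothetical choice. MMR reranking and path-coherence filtering are omitted.

```python
import heapq
import numpy as np

def local_metric(L_i, eps=1e-3):
    """G_i = L_i L_i^T + eps * I, positive definite for any L_i when eps > 0."""
    return L_i @ L_i.T + eps * np.eye(L_i.shape[0])

def edge_length(x_i, x_j, G_i, G_j):
    """Hypothetical edge cost: length of x_j - x_i under the averaged endpoint metric."""
    delta = x_j - x_i
    G = 0.5 * (G_i + G_j)
    return float(np.sqrt(delta @ G @ delta))

def multi_source_dijkstra(adj, sources):
    """Distance from each reachable node to its nearest seed.

    adj maps node -> list of (neighbor, nonnegative edge length).
    """
    dist = {s: 0.0 for s in sources}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Usage: three papers with 3-d embeddings and random rank-2 metric factors (illustrative only).
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 3))
Gs = [local_metric(rng.standard_normal((3, 2))) for _ in range(3)]
adj = {}
for i, j in [(0, 1), (1, 2)]:
    w = edge_length(X[i], X[j], Gs[i], Gs[j])
    adj.setdefault(i, []).append((j, w))
    adj.setdefault(j, []).append((i, w))
print(multi_source_dijkstra(adj, sources=[0]))
```

Seeding Dijkstra with several query-relevant nodes at once is what makes it "multi-source": each candidate paper is scored by its geodesic distance to the closest seed.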

2604.22583 2026-06-05 cs.LG 版本更新

Adaptive Head Budgeting for Efficient Multi-Head Attention

面向高效多头注意力的自适应头预算

Bilal Faye, Abdoulaye Mbaye, Hanane Azzag, Mustapha Lebbah

发表机构 * LIPN, Université Paris 13（巴黎第十三大学LIPN实验室）； Université Paris 13（巴黎第十三大学）； Université de Versailles Saint-Quentin-en-Yvelines（凡尔赛-圣康坦昂伊夫林大学）

AI总结提出BudgetFormer，通过学习注意力头预算和相关性分布，为每个输入动态分配注意力头，在文本分类任务中减少计算和内存开销，同时保持或提升性能。

AI中文摘要

多头注意力使Transformer能够捕获多样化的表示，但无论任务复杂度如何，通常每个输入都会激活所有注意力头。对于粗粒度任务（如文本分类），相关信息通常是全局性的，这种固定分配会引入不必要的计算。我们提出BudgetFormer，一种基于每个输入动态分配注意力头的Transformer架构。该模型学习头预算和相关性分布，以选择信息量最大的头。为了支持有效的头选择，我们引入了一种平衡探索与利用的训练策略。在文本分类任务上的实验表明，BudgetFormer减少了FLOPs和内存使用，同时匹配或超越了标准多头注意力的性能。这些结果突显了自适应头分配作为提高Transformer效率和性能的有效方法。

英文摘要

Multi-head attention enables Transformers to capture diverse representations, but all attention heads are typically activated for every input, regardless of task complexity. For coarse-grained tasks such as text classification, where relevant information is often global, this fixed allocation can introduce unnecessary computation. We propose BudgetFormer, a Transformer architecture that dynamically allocates attention heads on a per-input basis. The model learns both a head budget and a relevance distribution to select the most informative heads. To support effective head selection, we introduce a training strategy that balances exploration and exploitation. Experiments on text classification tasks show that BudgetFormer reduces FLOPs and memory usage while matching or surpassing the performance of standard multi-head attention. These results highlight adaptive head allocation as an effective approach to improving Transformer efficiency and performance.
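
Per-input head selection can be sketched as a relevance distribution over heads followed by a top-k cut. This is a minimal sketch under assumptions. In BudgetFormer the budget is learned per input, but here it is simply passed in as an integer. Unselected heads are zeroed before concatenation. The exploration/exploitation training strategy is not modeled.

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def select_heads(relevance_logits, budget):
    """Keep the `budget` most relevant heads; return (boolean mask, relevance distribution)."""
    probs = softmax(relevance_logits)
    k = int(np.clip(budget, 1, probs.size))      # at least one head, at most all of them
    keep = np.argsort(-probs, kind="stable")[:k]
    mask = np.zeros(probs.size, dtype=bool)
    mask[keep] = True
    return mask, probs

def apply_head_mask(head_outputs, mask):
    """Zero the outputs of unselected heads; head_outputs has shape (num_heads, seq_len, head_dim)."""
    return head_outputs * mask[:, None, None]

mask, probs = select_heads([0.1, 2.0, -1.0, 1.5], budget=2)
print(mask.tolist())   # [False, True, False, True]
```

Zeroing after the fact only shows the selection logic. The FLOP and memory savings reported in the abstract require skipping the attention computation for masked heads altogether.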

2602.13939 2026-06-05 cs.LG cs.AI 版本更新

A Horizon-Aware Decision-Support Framework for Demand Forecasting Model Selection in Resilient Production Planning

面向预测期的决策支持框架：用于韧性生产计划中的需求预测模型选择

Adolfo González, Víctor Parada

发表机构 * Department of Computer Engineering and Informatics, Faculty of Engineering, University of Santiago of Chile（智利圣地亚哥大学工程学院计算机工程与信息学系）

AI总结本文提出了一种面向预测期的决策支持框架，用于在需求间歇、波动大、不确定性高的韧性生产计划中选择需求预测模型：通过MDFH程序将误差指标外推至未来预测期，并提出RMSSEh和AHSIV作为改进的模型选择方法。

Comments 31 pages, 12 figures and Appendix

AI中文摘要

需求预测是韧性生产计划、库存补给、采购和产能决策的关键输入，在需求间歇、高波动和运营不确定的情况下尤为重要。在这些情境中，仅依据固定测试期的表现选择预测模型，可能导致决策与实际使用预测的未来规划期不一致。本文提出按预测期的指标退化（Metric Degradation by Forecast Horizon，MDFH）程序，作为选择需求预测模型的面向预测期的决策支持框架。MDFH在明确的结构稳定性条件下，将符合条件的样本外误差指标（即MAE、RMSE和RMSSE）从已观测的测试期投影到未来的运营期。在此基础上，推导出RMSSEh作为一种简洁的面向预测期的选择器，同时提出间歇性与波动性自适应混合选择器（Adaptive Hybrid Selector for Intermittency and Variability，AHSIV），作为针对结构异质需求序列的自适应扩展。多变量排名聚合选择器ERA被纳入作为对比对象。实证评估使用Walmart、M3、M4和M5数据集、三种训练-测试划分、22个预测模型和12步未来预测期。结果表明，以事后全局相对精度（Global Relative Accuracy）评估时，RMSSEh和AHSIV比ERA提供了更一致的下游数量对齐。

英文摘要

Demand forecasting is a critical input for resilient production planning, inventory replenishment, procurement, and capacity decisions under demand intermittency, high variability, and operational uncertainty. In these contexts, selecting forecasting models solely on the basis of fixed test-horizon performance may lead to decisions misaligned with the future planning horizons in which forecasts are used. This study proposes the Metric Degradation by Forecast Horizon (MDFH) procedure as a horizon-aware decision-support framework for selecting demand forecasting models. MDFH projects eligible out-of-sample error metrics, specifically MAE, RMSE, and RMSSE, from an observed test horizon toward future operational horizons under explicit structural-stability conditions. Based on this layer, RMSSEh is derived as a parsimonious horizon-aware selector, while the Adaptive Hybrid Selector for Intermittency and Variability (AHSIV) is proposed as an adaptive extension for structurally heterogeneous demand series. ERA, a multivariate ranking-aggregation selector, is included as a comparator. The empirical evaluation uses the Walmart, M3, M4, and M5 datasets, three training-testing partitions, 22 forecasting models, and 12-step future horizons. Results show that RMSSEh and AHSIV provide more consistent downstream volumetric alignment than ERA when assessed through ex post Global Relative Accuracy.
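
MDFH starts from standard out-of-sample error metrics. As a reference point, here is a minimal sketch of RMSSE, scaled by the in-sample one-step naive error as in the M5 competition, together with the fixed-test-horizon ranking that the paper argues can be misaligned with future horizons. The MDFH horizon projection, RMSSEh, and AHSIV are not reproduced here.

```python
import numpy as np

def rmsse(y_train, y_true, y_pred):
    """Root mean squared scaled error, scaled by in-sample one-step naive squared errors."""
    y_train = np.asarray(y_train, dtype=float)
    scale = np.mean(np.diff(y_train) ** 2)
    if scale == 0:
        raise ValueError("constant training series: RMSSE scale is zero")
    mse = np.mean((np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)) ** 2)
    return float(np.sqrt(mse / scale))

def rank_models(y_train, y_true, forecasts):
    """Order candidate models by RMSSE on the observed test horizon (lower is better)."""
    scores = {name: rmsse(y_train, y_true, f) for name, f in forecasts.items()}
    return sorted(scores, key=scores.get), scores

print(rmsse([1, 2, 3, 4], [5, 6], [5, 8]))   # sqrt(2), about 1.4142
order, scores = rank_models([1, 2, 3, 4], [5, 6], {"naive": [4, 4], "drift": [5, 6]})
print(order)                                  # ['drift', 'naive']
```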

2510.22048 2026-06-05 cs.LG 版本更新

PF$Δ$: A Benchmark Dataset for Power Flow under Load, Generation, and Topology Variations

PF$Δ$：一个涵盖负荷、发电和拓扑变化的潮流基准数据集

Ana K. Rivera, Anvita Bhagavathula, Alvaro Carbonero, Priya Donti

发表机构 * Department of Electrical Engineering & Computer Science（电气工程与计算机科学系）； Laboratory for Information & Decision Systems（信息与决策系统实验室）； Massachusetts Institute of Technology（麻省理工学院）

AI总结本文提出PF$Δ$基准数据集，用于评估负荷、发电和拓扑变化下的潮流计算。该数据集包含859,800个已求解实例，涵盖六种不同规模的节点系统和三类预想事故场景，并用于评估传统求解器和基于GNN的方法，指出现有方法的不足以及未来研究的开放问题。

Comments 31 pages, 14 figures. Accepted at NeurIPS 2025

Journal ref: NeurIPS 2025

AI中文摘要

潮流（PF）计算是实时电网运行的核心，贯穿于预想事故分析（通过重复的潮流评估检验停运情况下的电网安全性）和拓扑优化（在组合规模庞大的动作空间中进行基于潮流的搜索）等工作流程。在运行时间尺度上或在大规模评估空间中执行这些计算，仍是主要的计算瓶颈。此外，可再生能源并网和气候引起的极端天气使电力系统运行的不确定性不断增加，这也需要能够准确、高效地模拟广泛场景和运行条件的工具。机器学习方法有望比传统求解器更快，但其性能尚未在能够反映真实世界变化的基准上得到系统评估。本文介绍PF$Δ$，一个涵盖负荷、发电和拓扑多样化变化的潮流基准数据集。PF$Δ$包含859,800个已求解的潮流实例，涵盖六种不同规模的节点系统和三类预想事故场景（N、N-1和N-2），并包括接近稳态电压稳定极限的近不可行算例。我们评估了传统求解器和基于GNN的方法，指出了现有方法表现欠佳的关键方面，并提出了未来研究的开放问题。数据集可在 https://huggingface.co/datasets/pfdelta/pfdelta/tree/main 获取，代码（含数据生成脚本和模型实现）可在 https://github.com/MOSSLab-MIT/pfdelta 获取。

英文摘要

Power flow (PF) calculations are the backbone of real-time grid operations, across workflows such as contingency analysis (where repeated PF evaluations assess grid security under outages) and topology optimization (which involves PF-based searches over combinatorially large action spaces). Running these calculations at operational timescales or across large evaluation spaces remains a major computational bottleneck. Additionally, growing uncertainty in power system operations from the integration of renewables and climate-induced extreme weather also calls for tools that can accurately and efficiently simulate a wide range of scenarios and operating conditions. Machine learning methods offer a potential speedup over traditional solvers, but their performance has not been systematically assessed on benchmarks that capture real-world variability. This paper introduces PF$Δ$, a benchmark dataset for power flow that captures diverse variations in load, generation, and topology. PF$Δ$ contains 859,800 solved power flow instances spanning six different bus system sizes, capturing three types of contingency scenarios (N, N-1, and N-2), and including close-to-infeasible cases near steady-state voltage stability limits. We evaluate traditional solvers and GNN-based methods, highlighting key areas where existing approaches struggle, and identifying open problems for future research. Our dataset is available at https://huggingface.co/datasets/pfdelta/pfdelta/tree/main and our code with data generation scripts and model implementations is at https://github.com/MOSSLab-MIT/pfdelta.
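
To make the benchmark's object concrete, here is a minimal AC power flow solve: Newton-Raphson on the complex power mismatch of a tiny two-bus system. It is an illustrative sketch, not PF$Δ$'s data-generation code. Every non-slack bus is assumed to be a PQ bus, so there are no PV buses, limits, or contingencies. For brevity, the Jacobian is computed by finite differences rather than analytically.

```python
import numpy as np

def solve_ac_power_flow(Ybus, P_spec, Q_spec, V_slack=1.0 + 0j, slack=0, tol=1e-9, max_iter=30):
    """Newton-Raphson AC power flow with a finite-difference Jacobian.

    Simplification: every non-slack bus is a PQ bus. P_spec/Q_spec are net
    injections in per-unit (loads are negative); slack entries are ignored.
    """
    n = Ybus.shape[0]
    pq = np.array([i for i in range(n) if i != slack])
    m = pq.size

    def voltages(x):
        V = np.empty(n, dtype=complex)
        V[slack] = V_slack
        V[pq] = x[m:] * np.exp(1j * x[:m])          # x = [angles, magnitudes]
        return V

    def mismatch(x):
        V = voltages(x)
        S = V * np.conj(Ybus @ V)                    # complex power injections
        return np.concatenate([S.real[pq] - P_spec[pq], S.imag[pq] - Q_spec[pq]])

    x = np.concatenate([np.zeros(m), np.ones(m)])    # flat start
    for _ in range(max_iter):
        f = mismatch(x)
        if np.max(np.abs(f)) < tol:
            return voltages(x), True
        J = np.empty((2 * m, 2 * m))
        h = 1e-7
        for k in range(2 * m):                       # forward-difference Jacobian, column by column
            step = np.zeros(2 * m)
            step[k] = h
            J[:, k] = (mismatch(x + step) - f) / h
        x = x - np.linalg.solve(J, f)
    return voltages(x), bool(np.max(np.abs(mismatch(x))) < tol)

# Two-bus example: slack at bus 0, a 0.5 + 0.2j p.u. load at bus 1, line impedance 0.01 + 0.1j p.u.
y = 1 / (0.01 + 0.1j)
Ybus = np.array([[y, -y], [-y, y]])
V, ok = solve_ac_power_flow(Ybus, np.array([0.0, -0.5]), np.array([0.0, -0.2]))
print(ok, abs(V[1]))
```

Close-to-infeasible cases, such as those near voltage stability limits, are where iterations like this one slow down or fail to converge. That regime is part of what the benchmark stresses.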

2604.03634 2026-06-05 cs.LG cs.IT eess.SP math.IT 版本更新

Algebraic Diversity: Group-Theoretic Spectral Estimation from Single Observations

代数多样性：从单次观测进行群论谱估计

Mitchell A. Thornton

发表机构 * Richardson, TX 75080 USA（美国德克萨斯州理查森市，邮编75080）

AI总结本文从群论角度研究单次观测下的谱估计问题，证明时间平均是平凡群作用的退化特例，展示了单快拍群平均估计与多快拍协方差估计的等效性，并将DFT、DCT和KLT统一为群匹配的特例。

Comments 41 pages, 14 figures. v3: Retracted six quantitative findings in Section 11, transformer application, due to implementation error in spectral concentration metric. Corrected results deferred to separate publication. Remark added after Conjecture 23 on orbit-structure bias in psi criterion. All other sections unaffected. v4: new result on blind group matching. v5: corrected/updated metrics.

AI中文摘要

我们证明，对多次观测进行时间平均是代数群作用在平凡群$G=\{e\}$下的退化情形。通用替换定理证明，单快拍的群平均估计器与多快拍协方差估计可获得等效的子空间分解。平凡群嵌入定理证明，样本协方差是平凡群估计的累积，其方差由$(G,L)$连续统支配，按$1/(|G|\cdot L)$变化。处理增益$10\log_{10}(M)$ dB等于经典波束成形增益，表明该增益是群阶的属性，而非传感器数量的属性。DFT、DCT和KLT被统一为群匹配的特例。我们推测了一个通用代数平均定理，将这些结果推广到任意统计量，其方差由有效群阶$d_{\mathrm{eff}}$支配。在五种群类型上对前四阶样本矩进行的蒙特卡洛实验，以四位有效数字的精度证实了该猜想。该框架利用的是信息的结构（数据对象的表示论对称性）而非内容，是对香农理论的补充。文中展示了五种应用：单快拍MUSIC、大规模MIMO、单脉冲波形分类、图信号处理以及Transformer LLM分析。此外还描述了盲群匹配技术。

英文摘要

We establish that temporal averaging over multiple observations is the degenerate case of algebraic group action with the trivial group $G=\{e\}$. A General Replacement Theorem proves that a group-averaged estimator from one snapshot achieves equivalent subspace decomposition to multi-snapshot covariance estimation. The Trivial Group Embedding Theorem proves that the sample covariance is the accumulation of trivial-group estimates, with variance governed by a $(G,L)$ continuum as $1/(|G|\cdot L)$. The processing gain $10\log_{10}(M)$ dB equals the classical beamforming gain, establishing that this gain is a property of group order, not sensor count. The DFT, DCT, and KLT are unified as group-matched special cases. We conjecture a General Algebraic Averaging Theorem extending these results to arbitrary statistics, with variance governed by the effective group order $d_{\mathrm{eff}}$. Monte Carlo experiments on the first four sample moments across five group types confirm the conjecture to four-digit precision. The framework exploits the \emph{structure} of information (representation-theoretic symmetry of the data object) rather than the content, complementing Shannon's theory. Five applications are demonstrated: single-snapshot MUSIC, massive MIMO, single-pulse waveform classification, graph signal processing, and analysis of transformer LLMs. Techniques for blind group matching are described.
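
One concrete instance of a single-snapshot group-averaged estimator uses the cyclic group of circular shifts. This choice is an assumption made for illustration and is not taken from the paper. Averaging $x x^H$ over all $M$ shifts of one snapshot yields a circulant matrix, and the unitary DFT diagonalizes it. This is one way to see the abstract's claim that the DFT arises as a group-matched special case. The paper's general construction covers other groups as well.

```python
import numpy as np

def cyclic_group_average(x):
    """(1/M) * sum over g in Z_M of (P_g x)(P_g x)^H, where P_g circularly shifts by g samples."""
    x = np.asarray(x, dtype=complex)
    M = x.size
    R = np.zeros((M, M), dtype=complex)
    for g in range(M):
        xg = np.roll(x, g)                 # group action on the single snapshot
        R += np.outer(xg, xg.conj())
    return R / M

rng = np.random.default_rng(0)
M = 8
x = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # one snapshot
R = cyclic_group_average(x)

F = np.fft.fft(np.eye(M)) / np.sqrt(M)   # unitary DFT matrix
D = F @ R @ F.conj().T                   # R is circulant, so D is diagonal up to round-off
off_diag = D - np.diag(np.diag(D))
print(np.allclose(off_diag, 0.0, atol=1e-10))   # True
```

The diagonal of `D` is the periodogram $|\mathrm{DFT}(x)_k|^2 / M$, so with this group the estimator's eigenbasis is the Fourier basis.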

2603.10457 2026-06-05 physics.plasm-ph cond-mat.stat-mech cs.LG physics.acc-ph 版本更新

PI-JEPA：一种基于算子分裂潜在预测、面向耦合多物理场模拟的无标签代理模型预训练方法

Brandon Yee, Pairie Koh

AI总结该研究提出PI-JEPA，一种无需完整PDE求解的代理模型预训练框架，在无标签参数场上通过掩码潜在预测和逐子算子的PDE残差正则化进行训练，从而减少多物理场代理模型部署所需的模拟预算。

Comments Substantial Revision Required

AI中文摘要

油藏模拟工作流程面临根本性的数据不对称：输入参数场（地质统计渗透率实现、孔隙度分布）可以任意数量地免费生成，但现有神经算子代理模型需要大量昂贵的带标签模拟轨迹，且无法利用这些无标签结构。我们提出PI-JEPA（物理信息联合嵌入预测架构），这是一种代理模型预训练框架，训练过程无需任何完整的PDE求解：它在无标签参数场上进行掩码潜在预测，并施加逐子算子的PDE残差正则化。预测器组在结构上与控制方程的Lie-Trotter算子分裂分解对齐，为每个子过程（压力、饱和度输运、反应）分配一个独立的物理约束潜在模块，使微调仅需100次带标签的模拟运行。在单相达西流上，当$N_\ell=100$时，PI-JEPA的误差比FNO低1.9倍、比DeepONet低2.4倍；当$N_\ell=500$时，比仅监督训练提升24%。这表明无标签代理预训练可显著减少多物理场代理模型部署所需的模拟预算。

英文摘要

Reservoir simulation workflows face a fundamental data asymmetry: input parameter fields (geostatistical permeability realizations, porosity distributions) are free to generate in arbitrary quantities, yet existing neural operator surrogates require large corpora of expensive labeled simulation trajectories and cannot exploit this unlabeled structure. We introduce \textbf{PI-JEPA} (Physics-Informed Joint Embedding Predictive Architecture), a surrogate pretraining framework that trains \emph{without any completed PDE solves}, using masked latent prediction on unlabeled parameter fields under per-sub-operator PDE residual regularization. The predictor bank is structurally aligned with the Lie--Trotter operator-splitting decomposition of the governing equations, dedicating a separate physics-constrained latent module to each sub-process (pressure, saturation transport, reaction), enabling fine-tuning with as few as 100 labeled simulation runs. On single-phase Darcy flow, PI-JEPA achieves $1.9\times$ lower error than FNO and $2.4\times$ lower error than DeepONet at $N_\ell{=}100$, with 24\% improvement over supervised-only training at $N_\ell{=}500$, demonstrating that label-free surrogate pretraining substantially reduces the simulation budget required for multiphysics surrogate deployment.
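
PI-JEPA's predictor bank mirrors Lie-Trotter operator splitting. The sketch below illustrates only that background idea, not PI-JEPA itself. It applies Lie-Trotter splitting to a linear system $du/dt = (A + B)u$, advancing with $e^{A\Delta t}$ and then $e^{B\Delta t}$ on each step, and it checks first-order convergence: halving $\Delta t$ roughly halves the error. The matrices are arbitrary stand-ins for sub-operators. The matrix exponential is a small scaling-and-squaring Taylor routine, which keeps the example NumPy-only.

```python
import numpy as np

def expm(A, terms=30):
    """Matrix exponential by scaling-and-squaring with a truncated Taylor series."""
    norm = np.linalg.norm(A, ord=np.inf)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    scaled = A / 2.0 ** s
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ scaled / k
        result = result + term
    for _ in range(s):
        result = result @ result
    return result

def lie_trotter(A, B, u0, T, n_steps):
    """Approximate u(T) for du/dt = (A + B) u with exact A-substeps followed by exact B-substeps."""
    dt = T / n_steps
    step = expm(B * dt) @ expm(A * dt)     # apply A over dt, then B over dt
    u = np.array(u0, dtype=float)
    for _ in range(n_steps):
        u = step @ u
    return u

A = np.array([[0.0, 1.0], [-1.0, 0.0]])    # stand-in sub-operator (rotation)
B = np.array([[-1.0, 0.0], [0.0, -2.0]])   # stand-in sub-operator (decay); A and B do not commute
u0 = np.array([1.0, 0.0])
exact = expm(A + B) @ u0                   # reference solution at T = 1
errors = [np.linalg.norm(lie_trotter(A, B, u0, 1.0, n) - exact) for n in (100, 200)]
print(errors[0] / errors[1])               # roughly 2: the splitting error is first order in dt
```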

2604.01489 2026-06-05 cs.LG cs.AI cs.DC cs.PF cs.SE 版本更新

CuTeGen: An LLM-Based Agentic Framework for Generation and Optimization of High-Performance GPU Kernels using CuTe

CuTeGen：基于LLM的智能体框架，使用CuTe生成和优化高性能GPU内核

Tara Saba, Zhiyang Chen, Jikai Jason Li, Anne Ouyang, Xujie Si, Fan Long

发表机构 * Department of Computer Science, University of Toronto（计算机科学系，多伦多大学）

AI总结本文提出CuTeGen，一种基于LLM的智能体框架，在CuTe抽象层上通过结构化的生成-测试-优化工作流生成和优化GPU内核，在标准基准测试上相对PyTorch实现了平均1.71倍的加速，并在生成成本相近的情况下优于现有智能体基线CudaForge。

AI中文摘要

高性能GPU内核对现代机器学习系统至关重要，但其开发仍是一个手动、依赖专家的过程。近期研究尝试利用LLM自动生成内核，但生成的内核在标准化基准测试上仍不及精心调优的参考实现。我们提出CuTeGen，一种智能体式GPU内核合成框架，将内核开发视为在CuTe抽象层上的结构化生成-测试-优化工作流。CuTeGen有两个区别于先前工作的设计选择：一是以CuTe而非原始CUDA为目标，CuTe暴露了分块和数据移动等性能关键结构，同时又足够稳定，便于迭代优化；二是延迟的性能剖析调度，在内核的高层结构稳定之前暂不提供低层性能反馈。在KernelBench Level-1和Level-2的209个任务上，CuTeGen相对PyTorch实现了平均1.71倍的加速，并在每任务生成成本相近的情况下优于先前的智能体基线CudaForge（0.89倍）。代码可在 https://github.com/taratt/cutegen.git 获取。

英文摘要

High-performance GPU kernels are critical to modern machine learning systems, yet developing them remains a manual, expert-driven process. Recent work has explored using LLMs to automate kernel generation, but generated kernels still fall short of carefully tuned references on standardized benchmarks. We present CuTeGen, an agentic GPU kernel synthesis framework that treats kernel development as a structured generate-test-refine workflow over the CuTe abstraction layer. Two design choices distinguish CuTeGen from prior work: targeting CuTe rather than raw CUDA, which exposes performance-critical structures such as tiling and data movement while remaining stable enough for iterative refinement, and a delayed profiling schedule that withholds low-level performance feedback until the kernel's high-level structure has stabilized. On the 209 tasks of KernelBench Level-1 and Level-2, CuTeGen achieves an average speedup of 1.71$\times$ over PyTorch and outperforms the prior agentic baseline CudaForge (0.89$\times$) at comparable per-task generation cost. Code available at https://github.com/taratt/cutegen.git

2604.00230 2026-06-05 cs.LG 版本更新

Neural Collapse Dynamics: Depth, Activation, Regularisation, and Feature Norm Threshold

神经坍缩动力学：深度、激活、正则化和特征范数阈值

Anamika Paul Rupa

发表机构 * Department of Electrical Engineering and Computer Science（电气工程与计算机科学系）

AI总结本文研究了神经坍缩现象的动力学，发现特征范数达到特定临界值时会发生神经坍缩，并探讨了深度、激活函数、正则化和网络宽度对这一过程的影响。

AI中文摘要

神经坍缩（NC），即倒数第二层特征收敛到单纯形等角紧框架，在平衡态下已得到较好的理解，但支配其发生的动力学仍缺乏刻画。我们发现了一个简单且具有预测性的规律：当平均特征范数达到一个特定于（模型，数据集）的临界值fn*时发生NC，该值在很大程度上不受训练条件影响。该值在每个（模型，数据集）对内高度集中（CV < 8%）；训练动态主要影响fn逼近fn*的速度，而非该值本身。在标准训练轨迹中，fn降至fn*以下的时刻始终先于NC的发生，从而提供了一个实用的预测指标，平均提前62个周期（MAE为24个周期）。直接干预实验证实fn*是梯度流的稳定吸引子：对特征尺度的扰动会在训练中自我校正，无论扰动方向如何都收敛到相同的值（p>0.2）。补全（架构）×（数据集）网格揭示了本文最强的结果：ResNet-20在MNIST上的fn* = 5.867，架构效应达+458%，而在CIFAR-10上仅为+68%。该网格具有强烈的非加性；fn*无法分解为相互独立的架构贡献和数据集贡献。研究还得出四条结构性规律：（1）深度对坍缩速度具有非单调影响；（2）激活函数共同决定坍缩速度和fn*；（3）权重衰减定义了一个三区域相图：过小会减慢坍缩，处于最优范围时最快，过大则阻止坍缩；（4）宽度单调加速坍缩，同时使fn*最多偏移13%。这些结果确立了特征范数动态作为预测NC时间的可操作诊断方法，并表明范数阈值行为可能是深度网络中延迟表示重组的一般机制。

英文摘要

Neural collapse (NC) -- the convergence of penultimate-layer features to a simplex equiangular tight frame -- is well understood at equilibrium, but the dynamics governing its onset remain poorly characterised. We identify a simple and predictive regularity: NC occurs when the mean feature norm reaches a model-dataset-specific critical value, fn*, that is largely invariant to training conditions. This value concentrates tightly within each (model, dataset) pair (CV < 8%); training dynamics primarily affect the rate at which fn approaches fn*, rather than the value itself. In standard training trajectories, the crossing of fn below fn* consistently precedes NC onset, providing a practical predictor with a mean lead time of 62 epochs (MAE 24 epochs). A direct intervention experiment confirms fn* is a stable attractor of the gradient flow -- perturbations to feature scale are self-corrected during training, with convergence to the same value regardless of direction (p>0.2). Completing the (architecture)x(dataset) grid reveals the paper's strongest result: ResNet-20 on MNIST gives fn* = 5.867 -- a +458% architecture effect versus only +68% on CIFAR-10. The grid is strongly non-additive; fn* cannot be decomposed into independent architecture and dataset contributions. Four structural regularities emerge: (1) depth has a non-monotonic effect on collapse speed; (2) activation jointly determines both collapse speed and fn*; (3) weight decay defines a three-regime phase diagram -- too little slows, an optimal range is fastest, and too much prevents collapse; (4) width monotonically accelerates collapse while shifting fn* by at most 13%. These results establish feature-norm dynamics as an actionable diagnostic for predicting NC timing, suggesting that norm-threshold behaviour is a general mechanism underlying delayed representational reorganisation in deep networks.
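
The quantities behind this result can be sketched in a few lines. The sketch computes the mean penultimate-layer feature norm fn, the first epoch at which fn drops below a given fn*, and the standard NC1 within-class variability metric tr(Σ_W Σ_B^+)/C. The threshold fn* itself must be estimated per (model, dataset) pair. The usage numbers are illustrative, and 5.867 is borrowed from the reported ResNet-20/MNIST value only as an example threshold.

```python
import numpy as np

def mean_feature_norm(features):
    """Mean L2 norm of penultimate-layer features, shape (num_samples, dim)."""
    return float(np.mean(np.linalg.norm(features, axis=1)))

def first_crossing_below(fn_history, fn_star):
    """First epoch index whose mean feature norm is below fn_star, or None if it never is."""
    for epoch, fn in enumerate(fn_history):
        if fn < fn_star:
            return epoch
    return None

def nc1(features, labels):
    """NC1 = tr(Sigma_W Sigma_B^+) / C; approaches 0 as features collapse to their class means."""
    classes = np.unique(labels)
    C, d = classes.size, features.shape[1]
    mu_global = features.mean(axis=0)
    sigma_w = np.zeros((d, d))
    sigma_b = np.zeros((d, d))
    for c in classes:
        Xc = features[labels == c]
        mu_c = Xc.mean(axis=0)
        diff = Xc - mu_c
        sigma_w += diff.T @ diff / features.shape[0]
        m = (mu_c - mu_global)[:, None]
        sigma_b += m @ m.T / C
    return float(np.trace(sigma_w @ np.linalg.pinv(sigma_b)) / C)

feats = np.array([[3.0, 0.0], [3.0, 0.0], [0.0, 4.0], [0.0, 4.0]])
labels = np.array([0, 0, 1, 1])
print(mean_feature_norm(feats), nc1(feats, labels))               # 3.5 0.0
print(first_crossing_below([9.0, 7.0, 5.5, 5.0], fn_star=5.867))  # 2
```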

2603.28257 2026-06-05 q-fin.ST cs.LG 版本更新

Nonlinear Factor Decomposition via Kolmogorov-Arnold Networks: A Spectral Approach to Asset Return Analysis

通过Kolmogorov-Arnold网络进行非线性因子分解：一种资产收益分析的谱方法

David Breazu

发表机构 * Faculty of Mathematics and Computer Science, University of Bucharest（布加勒斯特大学数学与计算机科学学院）

AI总结本文提出KAN-PCA，一种利用KAN作为编码器和线性映射作为解码器的自编码器，通过在每条边上使用学习的B样条函数替代线性投影，以捕捉比传统PCA更多的方差。实验表明KAN-PCA在20只S&P 500股票上实现了更高的重建R²值，并在修正数据泄露后与PCA外推结果一致。

Comments 12 pages, 2 figures

2505.11006 2026-06-05 stat.ML cs.LG 版本更新

Is Supervised Learning Really That Different from Unsupervised?

监督学习真的和无监督学习有那么大的区别吗？

Oskar Allerbo, Thomas B. Schön

发表机构 * KTH Royal Institute of Technology（皇家理工学院）； Uppsala University（乌普萨拉大学）

AI总结该研究通过将监督学习分解为两阶段过程，证明在不访问标签数据的情况下选择模型参数和添加输出，可以实现与传统监督学习相似的性能，表明监督与无监督学习的区别可能不如表面看起来那么根本。

Comments Paper accepted at AISTATS 2026

2603.19312 2026-06-05 cs.LG cs.AI 版本更新

LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels

LeWorldModel：从像素进行稳定端到端训练的联合嵌入预测架构

Lucas Maes, Quentin Le Lidec, Damien Scieur, Yann LeCun, Randall Balestriero

发表机构 * Mila & Université de Montréal（Mila与蒙特利尔大学）； New York University（纽约大学）； Samsung SAIL（三星SAIL）； Brown University（布朗大学）

AI总结本文提出LeWorldModel，一种通过仅使用两个损失项从原始像素稳定端到端训练的联合嵌入预测架构，显著减少了可调损失超参数，并在多种2D和3D控制任务中表现出色，同时在物理结构编码和物理不合理的事件检测方面展示了其能力。

AI中文摘要

联合嵌入预测架构（JEPA）为在紧凑潜在空间中学习世界模型提供了一个有吸引力的框架，但现有方法仍然脆弱，需要依赖复杂的多项损失、指数移动平均、预训练编码器或辅助监督来避免表示坍缩。在本工作中，我们提出LeWorldModel（LeWM），这是第一个仅用两个损失项即可从原始像素稳定地端到端训练的JEPA：一个下一嵌入预测损失，以及一个强制潜在嵌入服从高斯分布的正则项。与现有唯一的端到端替代方案相比，这将可调损失超参数从六个减少到一个。LeWM约有1500万参数，可在单个GPU上数小时内完成训练，其规划速度最高比基于基础模型的世界模型快48倍，同时在多种2D和3D控制任务中保持竞争力。除控制之外，我们通过探测物理量表明，LeWM的潜在空间编码了有意义的物理结构。惊奇度评估证实，该模型能够可靠地检测物理上不可能的事件。

英文摘要

Joint Embedding Predictive Architectures (JEPAs) offer a compelling framework for learning world models in compact latent spaces, yet existing methods remain fragile, relying on complex multi-term losses, exponential moving averages, pre-trained encoders, or auxiliary supervision to avoid representation collapse. In this work, we introduce LeWorldModel (LeWM), the first JEPA that trains stably end-to-end from raw pixels using only two loss terms: a next-embedding prediction loss and a regularizer enforcing Gaussian-distributed latent embeddings. This reduces tunable loss hyperparameters from six to one compared to the only existing end-to-end alternative. With ~15M parameters trainable on a single GPU in a few hours, LeWM plans up to 48x faster than foundation-model-based world models while remaining competitive across diverse 2D and 3D control tasks. Beyond control, we show that LeWM's latent space encodes meaningful physical structure through probing of physical quantities. Surprise evaluation confirms that the model reliably detects physically implausible events.
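
The two-term objective can be sketched as follows. The abstract names the terms, a next-embedding prediction loss and a regularizer that enforces Gaussian-distributed embeddings, but not their exact form. Here the prediction term is a mean squared error and the regularizer is a simple moment-matching penalty toward zero mean and identity covariance; both are assumptions, not LeWM's actual regularizer. The point illustrated is that only one weight, `lam`, needs tuning.

```python
import numpy as np

def prediction_loss(pred_next, target_next):
    """Mean squared error between predicted and encoded next-step embeddings."""
    return float(np.mean((pred_next - target_next) ** 2))

def gaussian_moment_penalty(z):
    """Stand-in regularizer: squared distance of the batch mean from 0 and of the covariance from I."""
    mu = z.mean(axis=0)
    centered = z - mu
    cov = centered.T @ centered / z.shape[0]
    return float(mu @ mu + np.sum((cov - np.eye(z.shape[1])) ** 2))

def two_term_loss(pred_next, target_next, z, lam=0.1):
    """Prediction term plus one weighted regularizer: `lam` is the only loss hyperparameter."""
    return prediction_loss(pred_next, target_next) + lam * gaussian_moment_penalty(z)

# A batch whose embeddings already have zero mean and identity covariance.
z = np.sqrt(2.0) * np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
print(gaussian_moment_penalty(z), gaussian_moment_penalty(z + 1.0))   # approximately 0.0 and 2.0
```

Without the regularizer, a trivial encoder that maps every frame to the same vector would drive the prediction loss to zero. Pinning the embedding distribution is what rules out that collapse.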

2603.20980 2026-06-05 cs.LG cs.AI stat.AP stat.ML 版本更新

From Causal Discovery to Dynamic Causal Inference in Neural Time Series

从因果发现到神经时间序列中的动态因果推断

Dmitry Zaytsev, Valentina Kuskova, Michael Coppedge

发表机构 * Lucy Family Institute for Data & Society（数据与社会卢西家族研究所）； University of Notre Dame（圣母大学）； Political Science University of Notre Dame（圣母大学政治学系）

AI总结提出动态因果网络自回归（DCNAR）两阶段框架，通过神经自回归因果发现学习稀疏有向因果网络，并将其作为结构先验用于时变神经网络自回归，实现无需预设网络结构的动态因果推断。

Comments 11 pages, 2 figures

DOI: 10.1145/3770855.3818956

混合能量感知奖励塑形：一种统一的轻量级物理引导策略优化方法

Qijun Liao, Jue Yang, Yiting Kang, Xinxin Zhao, Yong Zhang, Mingan Zhao

发表机构 * School of Mechanical Engineering, University of Science and Technology Beijing（北京科技大学机械工程学院）； Jiangsu XCMG Construction Machinery Research Institute Co., Ltd.（江苏徐工工程机械研究院有限公司）

AI总结提出混合能量感知奖励塑形（H-EARS），通过编码先验能量项作为奖励势能，结合动作正则化，在连续控制中提升收敛速度、稳定性和能效。

Comments 23 pages, 48 figures. Accepted by Neurocomputing

DOI: 10.1016/j.neucom.2026.134132

AI中文摘要

深度强化学习在连续控制中常因纯数据驱动探索忽略可用物理结构而遭受高方差、低能效和分布偏移下泛化能力差的问题。本文提出混合能量感知奖励塑形（H-EARS），将主导能量项（假设先验已知）直接编码为奖励势能，每步计算复杂度为O(n)。H-EARS将塑形势能分解为任务导向和基于能量的组件，并辅以动作正则化项，有意修改优化目标以强制执行节能控制。建立了完整的理论基础：塑形与正则化的功能独立性、正定Hessian条件下的能量梯度增强、函数近似下的收敛保证以及近似势能误差界。在四个连续控制基准和四种基线算法上，H-EARS在收敛速度、策略稳定性和最终性能方面均取得一致提升。高保真车辆仿真验证了其在极端道路条件下安全关键场景中的适用性。

英文摘要

Deep reinforcement learning for continuous control often suffers from high variance, low energy efficiency, and poor generalization under distribution shift, as purely data-driven exploration ignores available physical structure. This paper proposes Hybrid Energy-Aware Reward Shaping (H-EARS), which encodes dominant energy terms -- assumed known a priori -- directly as reward potentials at O(n) per-step computation. H-EARS decomposes the shaping potential into task-oriented and energy-based components, supplemented by an action regularization term that deliberately modifies the optimization objective to enforce energy-efficient control. A complete theoretical foundation is established: functional independence of shaping and regularization, energy-based gradient enrichment under positive-definite Hessian conditions, convergence guarantees under function approximation, and approximate potential error bounds. Across four continuous control benchmarks and four baseline algorithms, H-EARS achieves consistent gains in convergence speed, policy stability, and final performance. High-fidelity vehicle simulations validate applicability in safety-critical settings under extreme road conditions.
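
Potential-based reward shaping with an energy term can be sketched as below. The decomposition follows the abstract: a task potential plus an energy potential, with an action regularizer added on top. The concrete system is an assumption: a point mass moving toward the origin, with the task potential defined as the negative distance to the goal and the energy potential as the negative kinetic energy. The weights `w_task`, `w_energy`, and `beta` are hypothetical. Each step costs O(n) in the state dimension.

```python
import numpy as np

def potential(state, w_task=1.0, w_energy=0.1, mass=1.0):
    """Hypothetical shaping potential: task part (closer to the goal at the origin is better)
    plus energy part (lower kinetic energy is better)."""
    pos, vel = state
    task = -w_task * float(np.linalg.norm(pos))
    energy = -w_energy * 0.5 * mass * float(np.dot(vel, vel))
    return task + energy

def shaped_reward(r, state, next_state, action, gamma=0.99, beta=0.01):
    """r + gamma * Phi(s') - Phi(s), minus an action-magnitude regularizer beta * ||a||^2."""
    shaping = gamma * potential(next_state) - potential(state)
    return r + shaping - beta * float(np.dot(action, action))

traj = [
    (np.array([2.0]), np.array([0.0])),
    (np.array([1.5]), np.array([-1.0])),
    (np.array([0.5]), np.array([-0.5])),
]
actions = [np.array([1.0]), np.array([0.2])]
rewards = [shaped_reward(0.0, traj[t], traj[t + 1], actions[t]) for t in range(2)]
print(rewards)
```

With `beta = 0`, the shaping term γΦ(s') − Φ(s) has the classical potential-based form, which leaves optimal policies unchanged. The action penalty is the part that deliberately changes the objective, which matches the abstract's description.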

2603.11319 2026-06-05 cs.LG stat.ML 版本更新

On the Robustness of Langevin Dynamics to Score Function Error

关于朗之万动力学对分数函数误差的鲁棒性

Daniel Yiming Cao, August Y. Chen, Karthik Sridharan, Yuchen Wu

发表机构 * Cornell University（康奈尔大学）

AI总结本文研究了基于分数的生成模型对分数函数估计误差的鲁棒性，发现朗之万动力学对L2误差（更一般地，Lp误差）不具有鲁棒性：即使对于高维中的简单分布，且分数函数估计误差任意小，运行任意多项式时间的朗之万动力学所得分布与目标分布的总变差距离仍然很大。这进一步支持了扩散模型相对于朗之万动力学的优势。

Comments ICML 2026

AI中文摘要

我们考虑基于分数的生成模型对分数函数估计误差的鲁棒性。特别地，我们证明朗之万动力学对分数函数估计的$L^2$误差（更一般地，$L^p$误差）不具有鲁棒性。已有结果表明，在相当温和的正则性假设下，只要分数函数估计的$L^2$误差较小，扩散模型就能在多项式时间范围内忠实地从目标分布中采样。相比之下，我们的工作表明，即使对于高维中的简单分布，运行任意多项式时间的朗之万动力学所产生的分布在总变差（TV）距离上仍远离目标分布，即使分数函数估计的$L^2$误差（更一般地，$L^p$误差）可以任意小。考虑到在实践中从数据学习分数函数时此类估计误差不可避免，我们的结果进一步支持了扩散模型相对于朗之万动力学的优势，并警示不要将朗之万动力学与估计的分数函数结合使用。

英文摘要

We consider the robustness of score-based generative modeling to errors in the estimate of the score function. In particular, we show that Langevin dynamics is not robust to the $L^2$ errors (more generally $L^p$ errors) in the estimate of the score function. It is well-established that with small $L^2$ errors in the estimate of the score function, diffusion models can sample faithfully from the target distribution under fairly mild regularity assumptions in a polynomial time horizon. In contrast, our work shows that even for simple distributions in high dimensions, Langevin dynamics run for any polynomial time horizon will produce a distribution far from the target distribution in Total Variation (TV) distance, even when the $L^2$ error (more generally $L^p$) of the estimate of the score function is arbitrarily small. Considering such an error in the estimate of the score function is unavoidable in practice when learning the score function from data, our results provide further justification for diffusion models over Langevin dynamics and serve to caution against the use of Langevin dynamics with estimated scores.
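
For reference, the sampler under discussion is the unadjusted Langevin algorithm, which needs only a score function. The sketch below runs it with the exact score of a 1-D standard Gaussian. Swapping `exact_score` for a learned estimate gives exactly the setting the paper analyzes. The sketch does not reproduce the paper's high-dimensional counterexample.

```python
import numpy as np

def unadjusted_langevin(score, x0, step, n_steps, rng):
    """Unadjusted Langevin algorithm: x <- x + step * score(x) + sqrt(2 * step) * N(0, I)."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x + step * score(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x

def exact_score(x):
    """Score (gradient of the log-density) of the standard Gaussian N(0, 1)."""
    return -x

rng = np.random.default_rng(0)
# 20,000 independent chains, one coordinate each, all started at 0.
samples = unadjusted_langevin(exact_score, np.zeros(20_000), step=0.1, n_steps=200, rng=rng)
# For this target the ULA stationary variance is 1 / (1 - step / 2), slightly above 1.
print(samples.mean(), samples.var())
```

Even with the exact score, the discretization leaves a small bias. The paper's result concerns the much larger failure that can occur when the score itself is only approximately correct.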

2508.06249 2026-06-05 cs.LG cs.AI 版本更新

In-Training Defenses against Emergent Misalignment in Language Models

训练过程中针对语言模型涌现性失对齐的防御措施

David Kaczér, Magnus Jørgenvåg, Clemens Vetter, Esha Afzal, Robin Haselhorst, Lucie Flek, Florian Mai

发表机构 * University of Copenhagen（哥本哈根大学）

AI总结本文研究了如何在训练过程中防止语言模型出现涌现性失对齐，评估了五种训练正则化干预方法，并表明依据已对齐模型与失对齐模型之间的困惑度差距来选择交错数据可获得最佳效果。

Comments Accepted at ICML 2026 https://icml.cc/virtual/2026/poster/64303

AI中文摘要

微调使从业者能够将已对齐的大语言模型（LLMs）用于新领域，但最近的研究揭示了涌现性失对齐（EM）：即使是小规模、特定领域的微调，也可能诱发远超目标领域的有害行为。即使模型权重隐藏在微调API之后，这也会让攻击者无意中获得一个广泛失对齐的模型，而且仅从微调数据本身很难察觉。我们首次系统研究了针对EM的训练中防护措施，这些措施对于通过API提供微调服务的提供方切实可行：我们评估它们是否能够 a) 防止广泛失对齐，b) 允许狭窄失对齐，c) 在良性任务上学习良好，以及 d) 保持连贯性。我们研究了五种训练正则化干预：(i) 朝向安全参考模型的KL散度正则化，(ii) 特征空间中的$\ell_2$距离，(iii) 使用邪恶人格向量进行预防性引导，(iv) 交错加入通用指令微调数据集中的训练样本，以及 (v) 接种提示。我们证明，依据已对齐模型与失对齐模型之间的困惑度差距来选择交错数据，总体上效果最佳。

英文摘要

Fine-tuning lets practitioners repurpose aligned large language models (LLMs) for new domains, yet recent work reveals emergent misalignment (EM): Even a small, domain-specific fine-tune can induce harmful behaviors far outside the target domain. Even in the case where model weights are hidden behind a fine-tuning API, this gives attackers inadvertent access to a broadly misaligned model in a way that can be hard to detect from the fine-tuning data alone. We present the first systematic study of in-training safeguards against EM that are practical for providers who expose fine-tuning via an API: We evaluate whether they a) prevent broad misalignment, b) allow narrow misalignment, c) learn well on benign tasks, and d) remain coherent. We investigate five training regularization interventions: (i) KL-divergence regularization toward a safe reference model, (ii) $\ell_2$ distance in feature space, (iii) preventive steering with an evil persona vector, (iv) interleaving training examples from a general instruct-tuning dataset and (v) inoculation prompting. We demonstrate that selecting interleaving data by the perplexity gap between aligned and misaligned models yields the best results overall.
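
Selecting interleaving data by perplexity gap can be sketched as follows, assuming that per-token log-probabilities from both models are available. The abstract does not state the sign convention or scoring details. Here, as an assumption, an example scores higher when the misaligned model finds it much less likely than the aligned model does. The budget `k` is hypothetical.

```python
import numpy as np

def perplexity(token_logprobs):
    """exp(mean negative log-likelihood) of one sequence, from natural-log token probabilities."""
    return float(np.exp(-np.mean(token_logprobs)))

def select_by_perplexity_gap(examples, k):
    """Return ids of the k examples with the largest log-perplexity gap.

    examples: iterable of (example_id, aligned_logprobs, misaligned_logprobs).
    gap = log PPL(misaligned) - log PPL(aligned); the sign convention is an assumption.
    """
    scored = []
    for ex_id, lp_aligned, lp_misaligned in examples:
        gap = np.log(perplexity(lp_misaligned)) - np.log(perplexity(lp_aligned))
        scored.append((gap, ex_id))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [ex_id for _, ex_id in scored[:k]]

pool = [
    ("a", [-1.0, -1.0], [-1.0, -1.0]),   # both models agree: gap 0
    ("b", [-0.5, -0.5], [-2.0, -2.0]),   # misaligned model finds it much less likely: gap 1.5
    ("c", [-1.0, -1.0], [-1.5, -1.5]),   # gap 0.5
]
print(select_by_perplexity_gap(pool, k=2))   # ['b', 'c']
```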

2603.03993 2026-06-05 cs.LG cond-mat.dis-nn 版本更新

Specialization of softmax attention heads: insights from the high-dimensional single-location model

softmax 注意力头的专门化：来自高维单位置模型的见解

M. Sagitova, O. Duranthon, L. Zdeborová

发表机构 * Statistical physics of computation laboratory, École Polytechnique Fédérale de Lausanne, Switzerland（计算统计物理实验室，洛桑联邦理工学院，瑞士）

AI总结本文研究了多头注意力机制中注意力头的专门化现象，提出了一种理论模型，分析了SGD下多头softmax注意力的训练动态，并引入了Bayes-softmax注意力以优化预测性能。

2603.03955 2026-06-05 cs.LG cs.AI 版本更新

GIPO: Gaussian Importance Sampling Policy Optimization

GIPO：高斯重要性采样策略优化

Chengxuan Lu, Zhenquan Zhang, Shukuan Wang, Qunzhi Lin, Yanjie Li, Baigui Sun, Yang Liu

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结该研究提出了GIPO，一种基于截断重要性采样的策略优化目标。它用基于对数比率的高斯信任权重替代硬裁剪，在软性抑制极端重要性比率的同时保持非零梯度，从而提高数据效率。实验表明，在从接近同策略到高度过时数据的多种回放缓冲区大小下，GIPO在基于裁剪的基线方法中均取得最佳性能，并表现出更优的偏差-方差权衡、较高的训练稳定性和更好的样本效率。

AI中文摘要

基于强化学习（RL）的后训练近来在推动多模态智能体超越监督模仿方面展现出强劲潜力。然而，RL仍受限于较差的数据效率，尤其是在交互数据稀缺且很快过时的场景中。为应对这一挑战，本文提出GIPO（高斯重要性采样策略优化），这是一种基于截断重要性采样的策略优化目标，用基于对数比率的高斯信任权重替代硬裁剪，在软性抑制极端重要性比率的同时保持非零梯度。理论分析表明，GIPO对更新幅度引入了隐式且可调的约束，而集中不等式界保证了有限样本估计下的鲁棒性和稳定性。实验结果表明，在从接近同策略到高度过时数据的各种回放缓冲区大小下，GIPO在基于裁剪的基线方法中均取得了最佳性能，同时表现出更优的偏差-方差权衡、较高的训练稳定性和更好的样本效率。代码可在https://github.com/distanceLu/GIPO获得。

英文摘要

Post-training with reinforcement learning (RL) has recently shown strong promise for advancing multimodal agents beyond supervised imitation. However, RL remains limited by poor data efficiency, particularly in settings where interaction data are scarce and quickly become outdated. To address this challenge, GIPO (Gaussian Importance sampling Policy Optimization) is proposed as a policy optimization objective based on truncated importance sampling, replacing hard clipping with a log-ratio-based Gaussian trust weight to softly damp extreme importance ratios while maintaining non-zero gradients. Theoretical analysis shows that GIPO introduces an implicit, tunable constraint on the update magnitude, while concentration bounds guarantee robustness and stability under finite-sample estimation. Experimental results show that GIPO achieves state-of-the-art performance among clipping-based baselines across a wide range of replay buffer sizes, from near on-policy to highly stale data, while exhibiting superior bias--variance trade-off, high training stability and improved sample efficiency. Code is available at https://github.com/distanceLu/GIPO.
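
To see why a Gaussian trust weight keeps gradients alive where hard clipping kills them, the sketch below compares one sample's PPO clipped surrogate with a Gaussian-damped alternative. The abstract does not give GIPO's exact formula. Here we assume, for illustration only, a weight of exp(-(log r)^2 / (2σ^2)) multiplying r·A, with a hypothetical σ = 0.5. Any truncation GIPO applies on top is omitted. Derivatives are taken numerically with respect to log r.

```python
import math
from typing import Callable

def ppo_clip_term(ratio: float, adv: float, eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate for a single sample."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * adv, clipped * adv)

def gaussian_trust_term(ratio: float, adv: float, sigma: float = 0.5) -> float:
    """Assumed GIPO-style term: ratio * advantage, damped by a Gaussian weight in log-ratio.
    The weight is 1 at ratio = 1 and decays smoothly, but stays positive for any finite ratio."""
    log_r = math.log(ratio)
    weight = math.exp(-(log_r ** 2) / (2.0 * sigma ** 2))
    return weight * ratio * adv

def d_dlogr(term: Callable[[float, float], float], log_r: float, adv: float, h: float = 1e-6) -> float:
    """Central finite-difference derivative of a per-sample term with respect to log-ratio."""
    return (term(math.exp(log_r + h), adv) - term(math.exp(log_r - h), adv)) / (2.0 * h)

log_r = math.log(2.0)  # new policy puts twice the old probability on this action
print(d_dlogr(ppo_clip_term, log_r, adv=1.0))        # 0.0: clipped, no gradient
print(d_dlogr(gaussian_trust_term, log_r, adv=1.0))  # approximately -1.36: a damping gradient remains
```

With a positive advantage and a ratio of 2, the clipped term is flat, so that sample contributes nothing. The Gaussian-weighted term still produces a non-zero gradient that pulls the ratio back toward 1. This is the qualitative behaviour the abstract describes.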

2603.02376 2026-06-05 cs.DC cs.AR cs.LG cs.MA 版本更新

CUCo: An Agentic Framework for Compute and Communication Co-design

CUCo：一种用于计算与通信协同设计的智能体框架

Yoga Sri Varshan Varadharajan, Bodun Hu, Saurabh Agarwal, Aditya Akella

发表机构 * UT Austin（德克萨斯大学奥斯汀分校）

AI总结本文提出CUCo框架，将结构化的设计空间形式化与正确性优先的快速路径智能体以及进化驱动的慢速路径智能体相结合，实现CUDA内核的计算与通信协同设计。该框架在四个多GPU工作负载上实现了1.57倍的加速，并以低于10美元的LLM推理成本发现了一种双流重叠策略。

2602.24207 2026-06-05 cs.LG cs.CY cs.GT stat.ML 版本更新

The Stability of Online Algorithms in Performative Prediction

在线算法在表现性预测中的稳定性

Gabriele Farina, Juan Carlos Perdomo

发表机构 * MIT（麻省理工学院）； NYU（纽约大学）

AI总结本文研究了在线算法在表现性预测中的稳定性，证明了在表现性设置中使用的任何无遗憾算法都会收敛到一种表现性稳定的均衡。在该均衡中，模型主动塑造数据分布，且其预测在事后看来是最优的。该研究无需对模型如何影响数据分布作出假设，并解释了梯度下降等常见算法为何能自然地趋于稳定、避免失控的反馈循环。
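
The summary gives no concrete model, so the sketch below uses a toy performative setting of our own choosing from the performative-prediction literature. Deploying θ induces z ~ N(μ0 + εθ, 1), and the loss is (θ - z)². Repeated gradient descent differentiates the loss at each step while holding the currently induced distribution fixed. With the step size below and |ε| < 1, it converges to the performatively stable point θ = μ0 / (1 - ε). At that point θ is optimal for the very distribution it induces. This only illustrates the notion of stability; it is not the paper's no-regret analysis.

```python
def repeated_gradient_descent(
    mu0: float, eps: float, lr: float = 0.1, steps: int = 200, theta0: float = 0.0
) -> float:
    """Toy performative setting (our assumption): deploying theta induces z ~ N(mu0 + eps * theta, 1)
    and the loss is (theta - z)^2. Each step uses the expected gradient under the distribution
    induced by the currently deployed theta, treating that distribution as fixed."""
    theta = theta0
    for _ in range(steps):
        mean_z = mu0 + eps * theta        # mean of D(theta), the distribution the model induces
        grad = 2.0 * (theta - mean_z)     # d/dtheta of E[(theta - z)^2] with D(theta) frozen
        theta -= lr * grad
    return theta

print(round(repeated_gradient_descent(mu0=1.0, eps=0.5), 6))  # 2.0, i.e. mu0 / (1 - eps)
```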

迈向人工智能流行病学：一种用于前瞻性风险检测的测量标准化框架

Kit Tempest-Walters

AI总结本文提出了一种测量标准化框架，用于在没有访问模型内部信息的情况下，将专家-人工智能交互压缩为结构化、可比较的领域，以进行前瞻性风险检测。该框架旨在定义其范围，包括语义和统计层面，并指定未来工作的实证测试协议。

Comments 29 pages, 3 figures

AI中文摘要

本文提出了一种测量标准化框架。该框架将专家与人工智能的交互压缩为结构化、可比较的字段，用于在已部署的人工智能系统中进行前瞻性风险检测，且无需访问模型内部信息。作为一篇概念性论文，本文的主要目的是从语义和统计两个层面界定该框架的范围，并为未来工作中的实证检验制定方案。因此，该框架旨在支持的群体层面论断属于分阶段研究计划的内容，而非本文声称已取得的结果。测量标准化是以下三项论断的基础。第一是可靠性论断：在有界条件下，大型语言模型能够对专家与人工智能交互的证据对齐和政策对齐作出可靠、标准化的评估。第二是治理论断：对齐分数在部署期间为专家提供即时信号，并为机构监控不同任务类型、模型和领域的对齐模式提供依据。第三是流行病学论断：一旦测量标准化得以建立，聚合对齐分数即可用于在受监管的专业环境中研究其与下游结果之间的关联。这引出了“人工智能流行病学”的可能性，即基于相关变量而非机制分析来检测风险。本文处理第一项论断，并为研究第二和第三项论断制定了方案。为支持未来研究中的实证评估，本文给出了一套明确定义的语法和一套统计方案。该方案基于配对自助法（bootstrap）推断，以针对配对AUC的DeLong检验作为敏感性检查，预先设定0.05的单侧非劣效界值，并采用Holm-Bonferroni校正。

英文摘要

This paper proposes a measurement standardisation framework that compresses expert-AI interactions into structured, comparable fields for prospective risk detection in deployed AI systems, without access to model internals. The main aim of this concept paper is to define the scope of the framework, both semantically and statistically, and to specify a protocol for its empirical testing in future work. The population-level claims the framework is designed to support are therefore the subject of a staged research programme rather than results claimed in this paper. Measurement standardisation underpins all three claims that follow. The first is a reliability claim: under bounded conditions, large language models can produce reliable, standardised assessments of the evidential and policy alignment of expert-AI interactions. The second is a governance claim: alignment scores give experts an immediate signal during deployment and give institutions a basis for monitoring alignment patterns across mission types, models, and domains. The third is an epidemiological claim: once measurement standardisation is established, aggregate alignment scores could be used to study associations with downstream outcomes in regulated professional settings. This introduces the possibility of an "AI epidemiology" that detects risk based on correlated variables instead of mechanistic analysis. This paper addresses the first claim and specifies protocols for investigating the second and third. To enable empirical evaluation in future studies, this paper sets out a defined grammar, together with a statistical protocol based on paired bootstrap inference, DeLong's test for paired AUCs as a sensitivity check, a pre-specified one-sided non-inferiority margin of 0.05, and Holm-Bonferroni correction.
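
The sketch below illustrates two pieces of the statistical protocol listed above. The first is Holm-Bonferroni correction. The second is a paired-bootstrap, one-sided non-inferiority check with a 0.05 margin. The paper applies its protocol to AUCs and alignment scores. Here the compared quantity is simply the mean of paired per-item scores, and the helper names, resample count, and seed are illustrative assumptions. DeLong's test is not shown.

```python
import random
from typing import List, Sequence

def holm_bonferroni(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Holm's step-down procedure. Returns a reject flag per hypothesis, in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # first failure: this and all larger p-values are retained
    return reject

def paired_bootstrap_noninferior(
    new: Sequence[float],
    ref: Sequence[float],
    margin: float = 0.05,
    alpha: float = 0.05,
    n_boot: int = 2000,
    seed: int = 0,
) -> bool:
    """One-sided non-inferiority check on mean(new - ref): resample items in pairs and
    declare non-inferiority if the lower alpha-quantile of the bootstrap means exceeds -margin."""
    if len(new) != len(ref) or not new:
        raise ValueError("need two equal-length, non-empty paired samples")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(new, ref)]
    n = len(diffs)
    boot_means = sorted(
        sum(diffs[rng.randrange(n)] for _ in range(n)) / n for _ in range(n_boot)
    )
    lower = boot_means[int(alpha * n_boot)]
    return lower > -margin

print(holm_bonferroni([0.01, 0.04, 0.03, 0.005]))  # [True, False, False, True]
```

Holm's procedure compares the i-th smallest p-value (0-indexed) with α/(m - i) and stops at the first failure. This is why 0.03 is retained above even though it would pass an uncorrected 0.05 test.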

2509.24882 2026-06-05 cs.LG cond-mat.dis-nn cs.AI stat.ML 版本更新

Scaling Laws and Spectra of Shallow Neural Networks in the Feature Learning Regime

浅层神经网络在特征学习机制下的缩放定律与谱特性

Leonardo Defilippis, Yizhou Xu, Julius Girardin, Emanuele Troiani, Vittorio Erba, Lenka Zdeborová, Bruno Loureiro, Florent Krzakala

发表机构 * Département d’Informatique, École Normale Supérieure, PSL & CNRS（巴黎高等师范学院计算机科学系，PSL与CNRS）； Statistical Physics of Computation Laboratory, École Polytechnique Fédérale de Lausanne (EPFL)（计算统计物理实验室，洛桑联邦理工学院（EPFL））； Information, Learning and Physics Laboratory, École Polytechnique Fédérale de Lausanne (EPFL)（信息、学习与物理实验室，洛桑联邦理工学院（EPFL））

AI总结本文研究了浅层神经网络在特征学习机制下的缩放定律与谱特性。通过分析二次神经网络和对角神经网络的缩放规律，本文揭示了样本复杂度和权重衰减如何影响过剩风险的缩放指数，并建立了不同缩放区间与训练后网络权重谱性质之间的精确联系。

Journal ref: ICLR 2026

AI中文摘要

神经缩放定律是深度学习近期诸多进展的基础，但其理论理解仍主要局限于线性模型。在本文中，我们系统分析了二次神经网络和对角神经网络在特征学习机制下的缩放定律。借助与矩阵压缩感知和LASSO的联系，我们推导出了过剩风险缩放指数随样本复杂度和权重衰减变化的详细相图。该分析揭示了不同缩放区间之间的交叉以及平台行为，与经验神经缩放文献中广泛报道的现象相呼应。此外，我们建立了这些区间与训练后网络权重谱性质之间的精确联系，并对后者进行了详细刻画。由此，近期有经验观察将权重谱中幂律尾部的出现与网络泛化性能联系起来，我们为这些观察提供了理论验证，并给出了基于第一性原理的解释。

英文摘要

Neural scaling laws underlie many of the recent advances in deep learning, yet their theoretical understanding remains largely confined to linear models. In this work, we present a systematic analysis of scaling laws for quadratic and diagonal neural networks in the feature learning regime. Leveraging connections with matrix compressed sensing and LASSO, we derive a detailed phase diagram for the scaling exponents of the excess risk as a function of sample complexity and weight decay. This analysis uncovers crossovers between distinct scaling regimes and plateau behaviors, mirroring phenomena widely reported in the empirical neural scaling literature. Furthermore, we establish a precise link between these regimes and the spectral properties of the trained network weights, which we characterize in detail. As a consequence, we provide a theoretical validation of recent empirical observations connecting the emergence of power-law tails in the weight spectrum with network generalization performance, yielding an interpretation from first principles.
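
The paper derives scaling exponents analytically. Empirically, such an exponent is usually read off as the slope of a log-log plot of excess risk against sample size. The sketch below is a single-slope least-squares fit of risk ≈ C·n^(-α) on synthetic data. It deliberately ignores crossovers and plateaus. Those would show up as a change in local slope, and one would detect them by fitting separate windows of n.

```python
import math
from typing import Sequence, Tuple

def fit_power_law(ns: Sequence[float], risks: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log(risk) = log(C) - alpha * log(n); returns (C, alpha)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(r) for r in risks]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return math.exp(intercept), -slope

ns = [100, 1_000, 10_000]
risks = [3.0 * n ** -0.5 for n in ns]  # synthetic data with known exponent 0.5
C, alpha = fit_power_law(ns, risks)
print(round(C, 6), round(alpha, 6))  # 3.0 0.5
```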

2602.16965 2026-06-05 cs.LG 版本更新

Multi-Agent Lipschitz Bandits

多智能体Lipschitz老虎机

Sourav Chakraborty, Amit Kiran Rege, Claire Monteleoni, Lijun Chen

发表机构 * University of Colorado Boulder（科罗拉多大学博尔德分校）； INRIA Paris（法国国家信息与自动化研究所巴黎中心）

AI总结本文研究了连续、具有Lipschitz结构的动作空间上的去中心化多玩家随机老虎机问题，其中硬碰撞导致零奖励。研究提出了一种无需通信的策略，旨在最大化集体奖励，同时将协调成本与学习成本分离。该策略先通过新颖的maxima-directed搜索识别高价值区域并将玩家安置到不同区域，再将问题分解为N个独立的单玩家Lipschitz老虎机。在共识情形下，得到了端到端的遗憾界，其主导学习项为$\tilde{O}(T^{(d+1)/(d+2)})$，与单玩家Lipschitz速率一致。前期协调成本在固定置信度下与时间范围无关，在期望遗憾形式中也仅为$T$的多项式对数。在额外的公共覆盖/调度假设下，还获得了无间隙的$\tilde{O}(T^{(d+1)/(d+2)})$保证。此外，本文推导了主导学习项的匹配下界，并将框架推广到一般的距离阈值碰撞模型。

Comments Twenty-Ninth Annual Conference on Artificial Intelligence and Statistics (AISTATS 2026)

AI中文摘要

我们研究了连续、具有Lipschitz结构的动作空间上的去中心化多玩家随机老虎机问题，其中硬碰撞导致零奖励。我们的目标是设计一种无需通信的策略，在最大化集体奖励的同时将协调成本与学习成本分开。我们提出了一种模块化协议。该协议首先通过新颖的maxima-directed搜索识别不同的高价值区域并将玩家安置其上，从而解决多智能体协调问题；然后将问题分解为$N$个独立的单玩家Lipschitz老虎机。在共识情形下，我们得到了端到端的遗憾界，其主导学习项为$\tilde{O}(T^{(d+1)/(d+2)})$，与单玩家Lipschitz速率一致。前期协调成本在固定置信度下与时间范围无关，在期望遗憾形式中也仅为$T$的多项式对数。对于分轮次（epochic）扩展，在额外的公共覆盖/调度假设下，我们还获得了无间隙的$\tilde{O}(T^{(d+1)/(d+2)})$保证。我们进一步推导了主导学习项的匹配下界，并将框架推广到一般的距离阈值碰撞模型。

英文摘要

We study the decentralized multi-player stochastic bandit problem over a continuous, Lipschitz-structured action space where hard collisions yield zero reward. Our objective is to design a communication-free policy that maximizes collective reward, while separating coordination costs from learning costs. We propose a modular protocol that first solves the multi-agent coordination problem by identifying and seating players on distinct, high-value regions via a novel maxima-directed search and then decouples the problem into $N$ independent single-player Lipschitz bandits. In the consensus regime, we obtain an end-to-end regret bound whose dominant learning term is $\tilde{O}(T^{(d+1)/(d+2)})$, matching the single-player Lipschitz rate; the upfront coordination cost is horizon-independent at fixed confidence and only polylogarithmic in $T$ in the expected-regret form. Under an additional public coverage/scheduling assumption for the epochic extension, we also obtain a gap-free $\tilde{O}(T^{(d+1)/(d+2)})$ guarantee. We further derive a matching lower bound for the dominant learning term and extend the framework to general distance-threshold collision models.
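
After the seating phase, each player faces an independent single-player Lipschitz bandit. A textbook approach to that subproblem is uniform discretisation with mesh width about T^(-1/(d+2)), followed by UCB1 over the resulting arms. This approach attains the Õ(T^((d+1)/(d+2))) rate quoted above. The sketch shows that classical baseline for one player in d = 1, so it uses K = ⌈T^(1/3)⌉ bins. It does not implement the paper's maxima-directed search or collision handling, and the reward function is a hypothetical example.

```python
import math
import random
from typing import Callable, List, Tuple

def lipschitz_ucb(
    mean_reward: Callable[[float], float], T: int, seed: int = 0
) -> Tuple[List[float], List[int]]:
    """One player on the action space [0, 1] (d = 1): uniform discretisation + UCB1.
    mean_reward(x) in [0, 1] is the Bernoulli mean at action x (hypothetical environment)."""
    rng = random.Random(seed)
    K = math.ceil(T ** (1.0 / 3.0))               # d = 1: K ~ T^{1/(d+2)} bins
    centers = [(i + 0.5) / K for i in range(K)]   # play the centre of each bin
    counts = [0] * K
    sums = [0.0] * K
    for t in range(1, T + 1):
        if t <= K:
            arm = t - 1                           # initialisation: pull every bin once
        else:
            arm = max(
                range(K),
                key=lambda i: sums[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < mean_reward(centers[arm]) else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return centers, counts

centers, counts = lipschitz_ucb(lambda x: 1.0 - abs(x - 0.7), T=20_000)
print(centers[counts.index(max(counts))])  # typically a bin centre near 0.7
```

The discretisation trades approximation error (about εT for mesh width ε) against the cost of exploring ε^(-d) arms. Balancing the two gives the T^(-1/(d+2)) mesh.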

2509.20345 2026-06-05 stat.ME cs.LG stat.ML 版本更新

General Synthetic-Powered Inference

通用合成数据驱动推断

Meshi Bashari, Yonghoon Lee, Roy Maor Lotan, Edgar Dobriban, Yaniv Romano

发表机构 * Department of Electrical and Computer Engineering, Technion IIT, Israel（以色列理工学院电气与计算机工程系，以色列）； Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, USA（宾夕法尼亚大学沃顿商学院统计与数据科学系，美国）； Department of Computer Science, Technion IIT, Israel（以色列理工学院计算机科学系，以色列）

AI总结本文提出了一种通用合成数据驱动推断框架，通过结合高质量合成数据和真实数据来提高样本效率，同时在合成数据质量低时自动回退到传统方法，无需分布假设即可保持误差率在用户指定范围内。

超越奖励的强化学习在网络安全防御中的应用

Elizabeth Bates, Chris Hicks, Vasilios Mavroudis

发表机构 * University of Cambridge（剑桥大学）

AI总结本文研究了在网络安全防御中使用强化学习时，奖励函数结构对学习过程和策略行为的影响。通过比较稀疏与密集奖励函数，本文揭示了奖励、动作空间与次优策略风险之间的复杂关系。

AI中文摘要

近年来，利用深度强化学习训练、用于保护计算机网络的自主网络防御智能体引起了广泛关注。这些智能体通常在网络安全gym环境中训练，所用的密集奖励函数经过大量人工设计，针对各种（不）理想状态和高代价动作组合了多种惩罚与激励。密集奖励有助于缓解在复杂环境中探索的难题，但可能使智能体偏向次优且风险可能更高的解，这在复杂的网络环境中是一个关键问题。我们使用多种稀疏与密集奖励函数、两个成熟的网络安全gym、一系列网络规模，以及策略梯度和基于价值的RL算法，全面评估了奖励函数结构对学习过程和策略行为特征的影响。我们的评估借助一种新的基于真值（ground truth）的评估方法。该方法可直接比较不同的奖励函数，揭示了网络环境中奖励、动作空间与次优策略风险之间的微妙关系。结果表明，只要稀疏奖励与目标一致且能够被频繁遇到，它就能同时带来更高的训练可靠性和更有效、策略风险更低的网络防御智能体。令人惊讶的是，稀疏奖励还能产生与网络防御者目标更一致的策略，并且即使没有基于奖励的显式数值惩罚，也会节制地使用高代价的防御动作。

英文摘要

Recent years have seen an explosion of interest in autonomous cyber defence agents trained to defend computer networks using deep reinforcement learning. These agents are typically trained in cyber gym environments using dense, highly engineered reward functions which combine many penalties and incentives for a range of (un)desirable states and costly actions. Dense rewards help alleviate the challenge of exploring complex environments but risk biasing agents towards suboptimal and potentially riskier solutions, a critical issue in complex cyber environments. We thoroughly evaluate the impact of reward function structure on learning and policy behavioural characteristics using a variety of sparse and dense reward functions, two well-established cyber gyms, a range of network sizes, and both policy gradient and value-based RL algorithms. Our evaluation is enabled by a novel ground truth evaluation approach which allows directly comparing between different reward functions, illuminating the nuanced inter-relationships between rewards, action space and the risks of suboptimal policies in cyber environments. Our results show that sparse rewards, provided they are goal aligned and can be encountered frequently, uniquely offer both enhanced training reliability and more effective cyber defence agents with lower-risk policies. Surprisingly, sparse rewards can also yield policies that are better aligned with cyber defender goals and make sparing use of costly defensive actions without explicit reward-based numerical penalties.

2602.10314 2026-06-05 cs.LG 版本更新

Stop Training for the Worst: Progressive Unmasking Accelerates Masked Diffusion Training

别再为最坏情况训练：渐进式解掩码加速掩码扩散模型训练

Jaeyeon Kim, Jonathan Geuter, David Alvarez-Melis, Sham Kakade, Sitan Chen

发表机构 * Harvard University（哈佛大学）； Kempner Institute（肯普纳研究所）

AI总结本文提出了一种名为渐进式解掩码（PUMA）的方法，通过修改前向掩码过程，使训练时与推理时的掩码模式保持一致，从而加速了掩码扩散模型的训练。

AI中文摘要

掩码扩散模型（MDMs）已成为离散空间生成建模的一种有前景的方法。它们以任意顺序生成序列并支持并行解码，因此能够实现快速推理，并在非因果任务上表现出色。然而，这种灵活性带来了训练复杂度上的代价：MDMs需要在指数级规模的掩码模式集合上训练。这不仅计算成本高昂，还会造成训练-测试不匹配，即训练时使用的随机掩码与推理时解掩码过程所诱导的高度结构化掩码之间的差异。在本文中，我们提出渐进式解掩码（PUMA），这是一种对前向掩码过程的简单修改，使训练时与推理时的掩码模式保持一致，从而将优化集中在与推理对齐的掩码上并加快训练。实验表明，PUMA在125M规模的预训练中实现了约2.5倍的加速，并且能在自回归初始化等常见训练方案之上带来互补收益。我们开源了代码库：https://github.com/JaeyeonKim01/PUMA。

英文摘要

Masked Diffusion Models (MDMs) have emerged as a promising approach for generative modeling in discrete spaces. By generating sequences in any order and allowing for parallel decoding, they enable fast inference and strong performance on non-causal tasks. However, this flexibility comes with a training complexity trade-off: MDMs train on an exponentially large set of masking patterns, which is not only computationally expensive, but also creates a train--test mismatch between the random masks used in training and the highly structured masks induced by inference-time unmasking. In this work, we propose Progressive UnMAsking (PUMA), a simple modification of the forward masking process that aligns training-time and inference-time masking patterns, thereby focusing optimization on inference-aligned masks and speeding up training. Empirically, PUMA speeds up pretraining at the 125M scale by $\approx 2.5\times$ and offers complementary advantages on top of common recipes like autoregressive initialization. We open-source our codebase at https://github.com/JaeyeonKim01/PUMA.
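
The abstract contrasts two kinds of masks, and the sketch below shows both. The first is the random masks of the standard MDM forward process: each token is masked independently with probability t. The second is the structured masks that appear part-way through decoding when tokens are revealed in some order. The abstract does not describe PUMA's actual modification, so this is not PUMA. `MASK`, `random_mask`, and `order_mask` are hypothetical names for illustration.

```python
import random
from typing import List, Sequence

MASK = "<mask>"

def random_mask(tokens: Sequence[str], t: float, rng: random.Random) -> List[str]:
    """Standard MDM forward process: each token is masked independently with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]

def order_mask(tokens: Sequence[str], order: Sequence[int], revealed: int) -> List[str]:
    """Mask produced part-way through decoding with a given unmasking order: the first
    `revealed` positions of `order` are visible and everything else is still masked."""
    visible = set(order[:revealed])
    return [tok if i in visible else MASK for i, tok in enumerate(tokens)]

tokens = ["a", "b", "c", "d"]
print(random_mask(tokens, t=0.5, rng=random.Random(0)))    # one of 2**4 possible patterns
print(order_mask(tokens, order=[2, 0, 1, 3], revealed=2))  # ['a', '<mask>', 'c', '<mask>']
```

For a length-L sequence, `random_mask` can produce any of 2^L patterns. For a fixed order, `order_mask` can produce only L + 1 patterns. This gap is the train-test mismatch described in the abstract.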

2602.09574 2026-06-05 cs.CL cs.AI cs.LG 版本更新

Aligning Tree-Search Policies with Fixed Token Budgets in Test-Time Scaling of LLMs

在LLM测试时扩展中使树搜索策略与固定令牌预算对齐

Sora Miyamoto, Daisuke Oba, Naoaki Okazaki

发表机构 * University of Tokyo（东京大学）

AI总结本文提出了一种名为Budget-Guided MCTS (BG-MCTS)的树搜索解码算法，通过将搜索策略与剩余令牌预算对齐，以提高在不同令牌预算下的推理性能。

Comments Accepted at ICML 2026. Code: https://github.com/Sora-Miyamoto/bg-mcts

2602.08503 2026-06-05 cs.CV cs.CL cs.LG 版本更新

Learning Self-Correction in Vision-Language Models via Rollout Augmentation

通过rollout增强学习视觉-语言模型中的自我纠正

Yi Ding, Ziliang Qiu, Bolian Li, Ruqi Zhang

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结本文提出一种基于rollout增强的强化学习框架Octopus。该框架通过重新组合已有rollout生成密集的自我纠正示例，提高样本效率并稳定RL优化；同时引入响应遮蔽策略，将自我纠正与直接推理解耦。最终，该方法在7个基准测试上取得开源VLM中的SOTA性能。

Comments 18 pages

Journal ref: ICML 2026

AI中文摘要

自我纠正对于视觉-语言模型（VLMs）解决复杂推理问题至关重要。然而，现有的强化学习（RL）方法难以学会自我纠正，因为有效的自我纠正行为很少出现，导致学习信号极其稀疏。为应对这一挑战，我们提出了correction-specific rollouts（Octopus），这是一种RL rollout增强框架，通过重新组合已有rollout来合成密集的自我纠正示例。这种增强一方面因rollout复用而提高了样本效率，另一方面通过均衡的监督稳定了RL优化。此外，我们引入了一种响应遮蔽策略，将自我纠正与直接推理解耦，避免信号冲突，使两种行为都能被有效学习。在此基础上，我们推出了Octopus-8B，一种具备可控自我纠正能力的推理VLM。在7个基准测试中，它在开源VLM中取得了SOTA性能，比最佳RLVR基线高出1.0分，而每步训练时间仅为其0.72倍。

英文摘要

Self-correction is essential for solving complex reasoning problems in vision-language models (VLMs). However, existing reinforcement learning (RL) methods struggle to learn it, as effective self-correction behaviors emerge only rarely, making learning signals extremely sparse. To address this challenge, we propose correction-specific rollouts (Octopus), an RL rollout augmentation framework that synthesizes dense self-correction examples by recombining existing rollouts. This augmentation simultaneously improves sample efficiency due to rollout reuse and stabilizes RL optimization through balanced supervision. Furthermore, we introduce a response-masking strategy that decouples self-correction from direct reasoning, avoiding signal conflicts and enabling both behaviors to be learned effectively. Building on this, we introduce Octopus-8B, a reasoning VLM with controllable self-correction capability. Across 7 benchmarks, it achieves SoTA performance among open-source VLMs, outperforming the best RLVR baseline by 1.0 score while requiring only $0.72\times$ training time per step.
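
The abstract says Octopus synthesizes self-correction examples by recombining existing rollouts, but it does not give the recipe. The sketch below is one hypothetical reading. Every incorrect rollout is paired with every correct one and joined by a bridging phrase. The first attempt then receives loss mask 0, so only the correction is trained on. The bridge text, the pairing rule, and this form of response masking are all our assumptions.

```python
from typing import List, Sequence, Tuple

def recombine_rollouts(
    rollouts: Sequence[Tuple[List[str], bool]],
    bridge: Sequence[str] = ("Wait,", "let", "me", "re-check."),
) -> List[Tuple[List[str], List[int]]]:
    """Hypothetical recombination: pair every incorrect rollout with every correct one,
    producing (tokens, loss_mask). The first attempt gets mask 0 (no loss), so only the
    bridge and the corrected answer are trained on."""
    wrong = [toks for toks, ok in rollouts if not ok]
    right = [toks for toks, ok in rollouts if ok]
    examples = []
    for w in wrong:
        for r in right:
            tokens = list(w) + list(bridge) + list(r)
            mask = [0] * len(w) + [1] * (len(bridge) + len(r))
            examples.append((tokens, mask))
    return examples

examples = recombine_rollouts([(["x", "=", "3"], False), (["x", "=", "4"], True)])
tokens, loss_mask = examples[0]
print(" ".join(tokens))  # x = 3 Wait, let me re-check. x = 4
print(loss_mask)         # [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
```

Reusing existing rollouts this way costs no extra sampling, which matches the abstract's point about sample efficiency. A group of n rollouts with w wrong and n - w right yields w·(n - w) synthetic examples.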

2508.04409 2026-06-05 stat.ML cs.LG 版本更新

The Relative Instability of Model Comparison with Cross-validation

基于交叉验证的模型比较的相对不稳定性

Alexandre Bayle, Lucas Janson, Lester Mackey

发表机构 * Department of Statistics, Harvard University, Cambridge, MA, USA（哈佛大学统计系）； Microsoft Research New England, Cambridge, MA, USA（微软研究院新英格兰分部）

AI总结研究指出即使个体稳定的模型在比较时也可能产生相对不稳定的结果，挑战了交叉验证推断的有效性，特别指出Lasso和软阈值化在最有利的学习条件下仍会导致无效的交叉验证推断。

2602.07834 2026-06-05 cs.LG math.DG 版本更新

Interpretable Analytic Calabi-Yau Metrics via Symbolic Distillation

通过符号蒸馏获得可解释的解析Calabi-Yau度量

D Yang Eng

发表机构 * D Yang Eng

AI总结本文研究能否用少量射影不变量紧凑地描述Calabi-Yau度量的逐点行列式比。通过符号回归，本文发现两个低阶对称特征即可捕捉教师模型的大部分变异，并验证了该描述在所研究的复结构模空间范围内保持适用。

AI中文摘要

逐点行列式比 $ R_ψ(z)\equiv \log\!\left(\frac{\det g_{\mathrm{RF}}(z;ψ)}{\det g_{\mathrm{FS}}(z)}\right) $ 衡量Dwork五次曲面上的Ricci平坦度量偏离Fubini-Study基准的程度。我们探讨这一标量可观测量能否用少量射影不变量紧凑描述，以及同一函数框架能否在复结构模空间范围内继续适用。以Donaldson的$k=10$平衡度量作为代数教师，并对采样点进行符号回归，我们得到如下发现。在本文研究的仅依赖模参数的受限特征类中，两个低阶对称特征，即幂和$p_2=\sum_i |z_i|^4$和三次初等对称多项式$σ_3=e_3$，已能捕捉教师的大部分变异。以$(p_2,σ_3)$为变量的三次多项式在留出测试集上达到$R^2=0.946$，而加入其余低阶对称生成元后该值的变化不到$10^{-3}$。在同一双特征空间中，符号回归找到了一个五项有理多项式表达式，与$k=10$教师的吻合度达到$R^2=0.9994$。在$ψ\in[0,0.8]$范围内重新拟合同一函数框架，可使采样点云上的平均行列式比代理量$\langle R_ψ\rangle$与局部教师的偏差保持在$0.01\%$以内，且拟合系数在研究范围内平滑变化。全纯Yukawa耦合$κ_{111}=5$仅作为归一化检验被重现。总体而言，这些结果为Dwork族上一个由度量导出的标量可观测量提供了紧凑的符号描述。但其结论受限于用于蒸馏的有限$k$教师，并未建立闭合形式的Ricci平坦度量。

英文摘要

The pointwise determinant ratio \[ R_ψ(z)\equiv \log\!\left(\frac{\det g_{\mathrm{RF}}(z;ψ)}{\det g_{\mathrm{FS}}(z)}\right) \] measures how the Ricci-flat metric on the Dwork quintic departs from the Fubini--Study baseline. We ask whether this scalar observable can be described compactly in terms of a small number of projective invariants, and whether the same scaffold remains usable across complex-structure moduli. Using Donaldson's $k=10$ balanced metric as an algebraic teacher and symbolic regression on sampled points, we find that, within the restricted moduli-only feature class studied here, two low-order symmetric features, the power sum $p_2=\sum_i |z_i|^4$ and the cubic elementary symmetric polynomial $σ_3=e_3$, already capture most of the teacher variation. A degree-3 polynomial in $(p_2,σ_3)$ achieves held-out test $R^2=0.946$, while adding the remaining low-order symmetric generators changes this by less than $10^{-3}$. Within the same two-feature space, symbolic regression identifies a five-term rational-polynomial expression that matches the $k=10$ teacher with $R^2=0.9994$. Refitting the same functional scaffold across $ψ\in[0,0.8]$ keeps the mean determinant-ratio proxy $\langle R_ψ\rangle$ within $0.01\%$ of the local teachers on the sampled point clouds and yields smoothly varying fitted coefficients over the studied range. The holomorphic Yukawa coupling $κ_{111}=5$ is reproduced as a normalization check only. Taken together, these results provide a compact symbolic description of one metric-derived scalar observable on the Dwork family, while remaining bounded by the finite-$k$ teacher used for distillation rather than establishing a closed-form Ricci-flat metric.
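
The sketch below computes the two features the abstract singles out, together with the degree-3 monomial basis a polynomial fit in (p₂, σ₃) would use. It rests on two assumptions that the abstract does not state. First, it takes x_i = |z_i|² rescaled so that Σ x_i = 1, which makes the features invariant under projective rescaling of z. Second, it takes σ₃ to mean e₃ of those x_i. The sketch does not sample points on the Dwork quintic and does not check that a given z lies on it.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

def symmetric_features(z: Sequence[complex]) -> Tuple[float, float]:
    """Return (p2, sigma3) for a point z in C^5, assuming x_i = |z_i|^2 normalised so sum(x) = 1.
    p2 = sum x_i^2 (= sum |z_i|^4 after normalisation); sigma3 = e3(x) = sum_{i<j<k} x_i x_j x_k."""
    x = [abs(zi) ** 2 for zi in z]
    total = sum(x)
    x = [xi / total for xi in x]  # projective rescaling: features no longer depend on |z|
    p2 = sum(xi ** 2 for xi in x)
    sigma3 = sum(a * b * c for a, b, c in combinations(x, 3))
    return p2, sigma3

def cubic_design_row(p2: float, s3: float) -> List[float]:
    """All monomials p2^i * s3^j with i + j <= 3 (10 terms), e.g. for a least-squares fit."""
    return [p2 ** i * s3 ** j for i in range(4) for j in range(4 - i)]

print(symmetric_features([1, 1, 1, 1, 1]))  # approximately (0.2, 0.08)
```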

2602.06773 2026-06-05 cs.LG stat.ML 版本更新

On the Convergence of Multicalibration Gradient Boosting

多校准梯度提升的收敛性研究

Daniel Haimovich, Fridolin Linder, Lorenzo Perini, Niek Tax, Milan Vojnovic

发表机构 * Meta ； LSE, Department of Statistics（伦敦经济学院统计系）

AI总结本文研究了多校准梯度提升的收敛性，证明了预测更新的幅度以O(1/√T)衰减，并在额外的平滑假设下实现线性收敛，实验验证了理论结果和方法的快速收敛性。

Comments Under submission

2602.01607 2026-06-05 math.ST cs.IT cs.LG math.IT stat.ML stat.TH 版本更新

Minimax optimal differentially private synthetic data for smooth queries

面向光滑查询的极小极大最优差分隐私合成数据

Rundong Ding, Yiyun He, Yizhe Zhu

发表机构 * Department of Mathematics, University of Southern California（南加州大学数学系）； Department of Mathematics, University of California San Diego（加州圣地亚哥大学数学系）

AI总结本文研究如何生成满足$(\varepsilon,δ)$-差分隐私的合成数据，在保证个体隐私的同时为有意义的下游分析提供强效用保证。文中提出了一种多项式时间算法，在忽略$\log(n)$因子的意义下达到极小极大误差率$O_{k,d}(n^{-\min\{1, k/d\}})$，并建立了针对$k$-光滑查询的首个极小极大下界。

Comments COLT 2026 arXiv version. 34 pages

AI中文摘要

差分隐私合成数据使敏感数据集的共享与分析成为可能，同时为个体贡献者提供严格的隐私保证。一个核心挑战是为有意义的下游分析提供强效用保证。许多现有方法在广泛的查询类（例如所有Lipschitz函数）上确保一致的准确性，但这种通用性往往导致对实际关心的统计量只能得到次优速率。许多常见数据分析查询具有超出最坏情况Lipschitz界所刻画的光滑性，因此我们探讨利用这一额外结构能否提升效用。我们研究如何从支撑在超立方体$[-1,1]^d$上、规模为$n$的数据集生成$(\varepsilon,δ)$-差分隐私合成数据，并对所有直至$k$阶导数有界的光滑查询一致地给出效用保证。我们提出了一种多项式时间算法，在至多相差一个$\log(n)$因子的意义下达到极小极大误差率$O_{k,d}(n^{-\min\{1, k/d\}})$。这一刻画揭示了在$k=d$处的相变。我们的结果推广了（Musco等，2025；Wang等，2016）中的Chebyshev矩匹配框架，并严格改进了Wang等（2016）为$k$-光滑查询建立的误差率。此外，我们针对$k$-光滑查询建立了$(\varepsilon,δ)$-差分隐私合成数据效用的首个极小极大下界，推广了Boedihardjo等（2024）中关于$\varepsilon$-差分隐私的Wasserstein下界。

英文摘要

Differentially private synthetic data enables the sharing and analysis of sensitive datasets while providing rigorous privacy guarantees for individual contributors. A central challenge is to achieve strong utility guarantees for meaningful downstream analysis. Many existing methods ensure uniform accuracy over broad query classes, such as all Lipschitz functions, but this level of generality often leads to suboptimal rates for statistics of practical interest. Since many common data analysis queries exhibit smoothness beyond what worst-case Lipschitz bounds capture, we ask whether exploiting this additional structure can yield improved utility. We study the problem of generating $(\varepsilon,δ)$-differentially private synthetic data from a dataset of size $n$ supported on the hypercube $[-1,1]^d$, with utility guarantees uniformly for all smooth queries having bounded derivatives up to order $k$. We propose a polynomial-time algorithm that achieves a minimax error rate of $O_{k,d}(n^{-\min \{1, \frac{k}{d}\}})$, up to a $\log(n)$ factor. This characterization uncovers a phase transition at $k=d$. Our results generalize the Chebyshev moment matching framework of (Musco et al., 2025; Wang et al., 2016) and strictly improve the error rates for $k$-smooth queries established in Wang et al. (2016). Moreover, we establish the first minimax lower bound for the utility of $(\varepsilon,δ)$-differentially private synthetic data with respect to $k$-smooth queries, extending the Wasserstein lower bound for $\varepsilon$-differential privacy in (Boedihardjo et al., 2024).
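
A quick way to read the rate $O_{k,d}(n^{-\min\{1, k/d\}})$ is to compute its exponent directly. Below k = d, extra smoothness improves the exponent as k/d. From k = d onward, the exponent saturates at 1, which is the phase transition mentioned above. The sketch drops the O_{k,d} constant and the log(n) factor, so it shows orders of magnitude only.

```python
def rate_exponent(k: int, d: int) -> float:
    """Exponent a in the error rate n^{-a}, with a = min(1, k/d)."""
    return min(1.0, k / d)

def error_rate(n: int, k: int, d: int) -> float:
    """Order of the uniform error over k-smooth queries, dropping O_{k,d} constants and the log(n) factor."""
    return n ** (-rate_exponent(k, d))

print([rate_exponent(k, 4) for k in (1, 2, 4, 8)])  # [0.25, 0.5, 1.0, 1.0]
```

For example, with d = 4 and n = 10^6, queries that are only 2-smooth give error of order 10^-3. Any k ≥ 4 gives error of order 10^-6.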

2602.05056 2026-06-05 cs.CR cs.CL cs.LG 版本更新

Grounded but Misleading: Evaluating Semantic Alignment in AI-Generated Security Explanations

有依据却具误导性：评估AI生成安全解释中的语义对齐

Heajun An, Connor Ng, Sandesh Sharma Dulal, Junghwan Kim, Jin-Hee Cho

发表机构 * Virginia Tech（弗吉尼亚理工学院）

AI总结本文研究了AI生成的安全解释中语义对齐的问题，通过VEXA测试平台验证了词汇基础与语义风险对齐之间的差距，发现即使解释在词汇上显得合理，其语义解释可能削弱检测器的意图风险评估。

AI中文摘要

在线诈骗越来越多地利用流畅且具有上下文意识的社会工程策略，导致对能够解释为何一条信息可能具有风险的AI系统的需求日益增长。然而，引用检测器衍生证据的解释可能仍然在语义上削弱或改变预期的风险解释。我们介绍了VEXA：验证语义解释对齐，一个用于研究AI生成诈骗风险解释中词汇基础与语义风险对齐差距的受控测试平台。VEXA通过独立控制证据基础和语义框架来生成无基础、风险对齐和风险稀释的解释。通过LLM作为判断者和人类评估，我们发现即使解释的语义解释削弱了检测器的意图风险评估，解释仍可能在比较上显得合理。在人类评估中，风险稀释的XAI基础解释保留了相对较高的感知证据基础评分（3.66），尽管其帮助性（3.00）和推理支持（3.14）评分较低。这些发现提供了AI生成安全解释中基础错觉效应的受控证据，并表明可信的解释评估必须不仅验证是否引用了证据，还要验证如何解释这些证据。

英文摘要

Online scams increasingly leverage fluent and context-aware social engineering strategies, creating growing demand for AI systems that explain why a message may be risky. However, explanations that cite detector-derived evidence may still semantically weaken or redirect the intended risk interpretation. We introduce VEXA: Verifying Semantic Explanation Alignment, a controlled testbed for studying the gap between lexical grounding and semantic risk alignment in AI-generated scam-risk explanations. VEXA generates ungrounded, risk-aligned, and risk-diluting explanations by independently controlling evidence grounding and semantic framing. Through LLM-as-a-judge and human evaluations, we show that explanations may continue to appear comparatively grounded even when their semantic interpretation weakens the detector's intended risk assessment. In human evaluation, risk-diluting XAI-grounded explanations retained comparatively elevated Perceived Evidence Grounding scores (3.66) despite lower Helpfulness (3.00) and Reasoning Support (3.14) scores. These findings provide controlled evidence of grounding illusion effects in AI-generated security explanations and suggest that trustworthy explanation evaluation must verify not only whether evidence is cited, but also how that evidence is interpreted.

2602.02680 2026-06-05 cs.LG 版本更新

FlexRank: Nested Low-Rank Knowledge Decomposition for Adaptive Model Deployment

FlexRank：面向自适应模型部署的嵌套低秩知识分解

Riccardo Zaccone, Stefanos Laskaridis, Marco Ciccone, Samuel Horváth

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结提出FlexRank方法，通过嵌套低秩权重分解和基于重要性的整合，从预训练模型中提取不同能力的子模型，实现“一次训练，随处部署”的自适应部署。

Comments Accepted at ICML 2026 (Spotlight)

Journal ref: Proceedings of the 43rd International Conference on Machine Learning, PMLR, 2026

AI中文摘要

深度神经网络（包括大型语言模型和视觉变换器）的规模不断增长，使得从头训练成本过高，部署成本也日益增加。这些模型通常作为固定计算成本的单一整体使用，阻碍了在不同成本预算下的自适应部署。我们认为，可以从预训练模型中提取按重要性排序的嵌套组件，并在可用计算预算内选择性激活。为此，我们提出的FlexRank方法利用嵌套的、基于重要性的低秩权重分解来整合子模型，从而提取能力递增的子模型。我们的方法实现了“一次训练，随处部署”的范式，无需为每个预算从头训练即可在成本与性能之间实现优雅的权衡——推进了大型模型的实际部署。

英文摘要

The growing scale of deep neural networks, encompassing large language models (LLMs) and vision transformers (ViTs), has made training from scratch prohibitively expensive and deployment increasingly costly. These models are often used as computational monoliths with fixed cost, hindering adaptive deployment across different cost budgets. We argue that nested components, ordered by importance, can be extracted from pretrained models and selectively activated within the available computational budget. To this end, our proposed FlexRank method leverages low-rank weight decomposition with nested, importance-based consolidation to extract submodels of increasing capabilities. Our approach enables a "train-once, deploy-everywhere" paradigm offering a graceful trade-off between cost and performance without training from scratch for each budget - advancing practical deployment of large models.
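
The "train once, deploy everywhere" idea rests on a nesting property. If a weight matrix is written as a sum of rank-1 terms ordered by importance, then the submodel for every smaller budget uses a prefix of the same terms. The sketch below rebuilds W_k from the top-k terms. It assumes the (importance, u, v) triples are already given; for instance, the importance could be a singular value from an SVD. How FlexRank learns or consolidates these components is not shown, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]

def nested_submodel(
    components: Sequence[Tuple[float, Vector, Vector]], budget: int
) -> List[List[float]]:
    """Rebuild W_k = sum of the top-`budget` rank-1 terms u v^T, ordered by importance score.
    components: (importance, u, v) triples. Because the order is fixed, smaller budgets
    always use a prefix of the larger ones (the nesting property)."""
    ranked = sorted(components, key=lambda c: c[0], reverse=True)
    m, n = len(ranked[0][1]), len(ranked[0][2])
    W = [[0.0] * n for _ in range(m)]
    for _, u, v in ranked[:budget]:
        for i in range(m):
            for j in range(n):
                W[i][j] += u[i] * v[j]
    return W

comps = [(0.1, [0.0, 1.0], [1.0, 0.0]), (5.0, [1.0, 0.0], [0.0, 2.0])]
print(nested_submodel(comps, budget=1))  # [[0.0, 2.0], [0.0, 0.0]]
print(nested_submodel(comps, budget=2))  # [[0.0, 2.0], [1.0, 0.0]]
```

At deployment, the budget maps directly to a rank. The cost of a rank-k layer is about k·(m + n) multiply-adds per input vector, compared with m·n for the dense layer.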

2602.02241 2026-06-05 cs.LG 版本更新

Variational Entropic Optimal Transport

变分熵最优传输

Roman Dyachenko, Nikita Gushchin, Kirill Sokolov, Petr Mokrov, Evgeny Burnaev, Alexander Korotin

发表机构 * Lomonosov Moscow State University（莫斯科国立罗蒙诺索夫大学）； National Research Nuclear University MEPhI（国立核研究大学MEPhI）

AI总结本文提出变分熵最优传输（VarEOT），对对数配分函数进行精确的变分重参数化，将其转化为可处理的最小化问题，从而无需MCMC模拟即可高效学习最优传输。理论上，本文给出了有限样本泛化界以及通用函数逼近下的近似结果。在合成数据和未配对图像到图像翻译任务中，该方法展示了具有竞争力或更优的翻译质量。

AI中文摘要

连续空间中采用二次代价的熵最优传输（EOT）是解决领域转换问题的经典工具。在实践中，近期方法优化仅依赖单个势函数的弱对偶EOT目标，但由于对数配分项难以计算，这样做的计算效率不高。现有方法通常以两种方式之一绕过这一障碍：要么大幅限制传输族以获得闭式归一化（通过高斯混合参数化），要么使用需要基于模拟的训练过程的通用神经参数化。我们提出变分熵最优传输（VarEOT），其基础是将对数配分项$\log \mathbb{E}[\exp(\cdot)]$精确地变分改写为关于辅助对数归一化常数的可处理最小化问题。由此得到一个可用随机梯度优化的可微学习目标，并且训练期间无需MCMC模拟。我们提供了理论保证，包括有限样本泛化界以及通用函数逼近下的近似结果。在合成数据和未配对图像到图像翻译上的实验展示了具有竞争力或更优的翻译质量。此外，在使用相同弱对偶EOT目标的求解器之间进行的比较支持了所提优化原理的优势。我们的求解器代码可在https://github.com/DrEternity/VarEOT找到。

英文摘要

Entropic optimal transport (EOT) in continuous spaces with quadratic cost is a classical tool for solving the domain translation problem. In practice, recent approaches optimize a weak dual EOT objective depending on a single potential, but doing so is computationally not efficient due to the intractable log-partition term. Existing methods typically resolve this obstacle in one of two ways: by significantly restricting the transport family to obtain closed-form normalization (via Gaussian-mixture parameterizations), or by using general neural parameterizations that require simulation-based training procedures. We propose Variational Entropic Optimal Transport (VarEOT), based on an exact variational reformulation of the log-partition $\log \mathbb{E}[\exp(\cdot)]$ as a tractable minimization over an auxiliary log-normalizer. This yields a differentiable learning objective optimized with stochastic gradients and avoids the necessity of MCMC simulations during the training. We provide theoretical guarantees, including finite-sample generalization bounds and approximation results under universal function approximation. Experiments on synthetic data and unpaired image-to-image translation demonstrate competitive or improved translation quality, while comparisons within the solvers that use the same weak dual EOT objective support the benefit of the proposed optimization principle. The code for our solver can be found at https://github.com/DrEternity/VarEOT .
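
The abstract does not state the exact reformulation it uses. One standard identity of this kind, which we assume here for illustration, is log y = min over c of [c + y·e^(-c) - 1] for y > 0. It follows from log u ≤ u - 1 with u = y·e^(-c), and equality holds at c = log y. With y = E[exp f], every term of the objective is an expectation, so it can be estimated on minibatches and optimised by SGD. The sketch minimises over the scalar c on a fixed sample and checks the result against a direct log-mean-exp. In VarEOT the potential would be trained jointly with c, which is not shown.

```python
import math
from typing import Sequence

def variational_log_partition(values: Sequence[float], lr: float = 0.5, steps: int = 200) -> float:
    """Estimate log mean(exp(values)) by minimising J(c) = c + mean(exp(values - c)) - 1
    over a scalar auxiliary log-normaliser c. J is convex, its minimiser is
    c* = log mean(exp(values)), and J(c*) = c*. Starting at c = max(values) >= c*,
    gradient steps with lr <= 1 decrease c monotonically toward c* without overshooting."""
    c = max(values)
    for _ in range(steps):
        mean_exp = sum(math.exp(v - c) for v in values) / len(values)
        c -= lr * (1.0 - mean_exp)  # gradient step: dJ/dc = 1 - mean(exp(values - c))
    return c + sum(math.exp(v - c) for v in values) / len(values) - 1.0

def log_mean_exp(values: Sequence[float]) -> float:
    """Direct, numerically stable log of the sample mean of exp(values), for comparison."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

vals = [0.0, 1.0, 2.0]
print(variational_log_partition(vals), log_mean_exp(vals))  # both about 1.309
```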

2602.01196 2026-06-05 cs.LG 版本更新

Unraveling the Hidden Dynamical Structure in Recurrent Neural Policies

揭示递归神经策略中的隐藏动力学结构

Jin Li, Yue Wu, Mengsha Huang, Yuhao Sun, Hao He, Xianyuan Zhan

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结本文分析了在不同训练方法、模型架构和任务下学得的递归策略的隐藏状态域，发现与环境交互时会一致地出现稳定的循环结构。这些结构与动力系统分析中的极限环高度相似，且极限环的几何结构与策略行为之间存在对应关系，为解释递归策略的性能提供了新视角。

AI中文摘要

递归神经策略被广泛应用于部分可观测控制和元强化学习任务。它们能够维持内部记忆并快速适应未见过的场景，与非递归策略相比表现出无可比拟的性能。然而，迄今为止，其出色泛化性和鲁棒性背后的机制仍不清楚。在本研究中，我们分析了在多种训练方法、模型架构和任务下学得的递归策略的隐藏状态域，发现与环境交互时会一致地出现稳定的循环结构。如果将策略与环境视为一个联合的混合动力系统，这些循环结构与动力系统分析中的极限环具有显著的相似性。此外，我们发现这些极限环的几何结构与策略行为之间存在结构化的对应关系。这些发现为解释递归策略的诸多良好特性提供了新的视角。一方面，极限环的出现既稳定了策略的内部记忆，也稳定了与任务相关的环境状态，同时抑制了由环境不确定性引起的干扰性变异。另一方面，极限环的几何结构还编码了行为之间的关系结构，使策略在面对非平稳环境时更容易进行技能适应。

英文摘要

Recurrent neural policies are widely used in partially observable control and meta-RL tasks. Their abilities to maintain internal memory and adapt quickly to unseen scenarios have offered them unparalleled performance when compared to non-recurrent counterparts. However, until today, the underlying mechanisms for their superior generalization and robustness performance remain poorly understood. In this study, by analyzing the hidden state domain of recurrent policies learned over a diverse set of training methods, model architectures, and tasks, we find that stable cyclic structures consistently emerge during interaction with the environment. Such cyclic structures share a remarkable similarity with \textit{limit cycles} in dynamical system analysis, if we consider the policy and the environment as a joint hybrid dynamical system. Moreover, we uncover that the geometry of such limit cycles also has a structured correspondence with the policies' behaviors. These findings offer new perspectives to explain many nice properties of recurrent policies: the emergence of limit cycles stabilizes both the policies' internal memory and the task-relevant environmental states, while suppressing nuisance variability arising from environmental uncertainty; the geometry of limit cycles also encodes relational structures of behaviors, facilitating easier skill adaptation when facing non-stationary environments.
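
The sketch below illustrates what a stable limit cycle is and one simple way to test for one. It uses the Hopf normal form, a standard two-dimensional system whose trajectories converge to the circle r = 1 from any non-zero start. The test checks that the radius of the last few hundred states sits in a narrow band. In the paper's setting, the state would be the policy's hidden state coupled with the environment. This toy system and the band test are our illustration, not the paper's analysis.

```python
import math
from typing import List, Sequence, Tuple

def hopf_trajectory(
    x0: float, y0: float, omega: float = 1.0, dt: float = 0.01, steps: int = 5000
) -> List[Tuple[float, float]]:
    """Euler-integrate the Hopf normal form dx = x(1 - r^2) - omega*y, dy = y(1 - r^2) + omega*x.
    Trajectories spiral onto the stable limit cycle r = 1 from any non-zero start."""
    x, y = x0, y0
    traj = [(x, y)]
    for _ in range(steps):
        r2 = x * x + y * y
        x, y = x + dt * (x * (1 - r2) - omega * y), y + dt * (y * (1 - r2) + omega * x)
        traj.append((x, y))
    return traj

def tail_radius(traj: Sequence[Tuple[float, float]], last: int = 500) -> Tuple[float, float]:
    """Min and max radius over the final `last` states; a narrow band suggests a closed orbit."""
    radii = [math.hypot(x, y) for x, y in traj[-last:]]
    return min(radii), max(radii)

for start in [(0.1, 0.0), (2.0, 0.0)]:
    print(start, tail_radius(hopf_trajectory(*start)))  # both bands sit close to r = 1
```

Trajectories from inside and outside converge to the same orbit. That attracting property is what the abstract links to stabilised memory and suppressed nuisance variability.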

2601.22580 2026-06-05 cs.CL cs.AI cs.LG 版本更新

SpanNorm: Reconciling Training Stability and Performance in Deep Transformers

SpanNorm: 在深度Transformer中协调训练稳定性与性能

Chao Wang, Bei Li, Jiaqi Zhang, Xinyu Liu, Yuchun Fan, Linkun Lyu, Xin Chen, Jingang Wang, Tong Xiao, Peng Pei, Xunliang Cai

Affiliations: Meituan Inc.; NLP Lab, School of Computer Science and Engineering; Northeastern University, Shenyang, China

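AI summary: This paper proposes SpanNorm, which combines the advantages of pre-normalization and post-normalization to resolve the fundamental trade-off between training stability and performance in deep Transformers. Theoretical analysis and experiments show that it outperforms conventional normalization schemes in both dense and Mixture-of-Experts (MoE) settings.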

Comments: Accepted at ICML 2026

English abstract

The success of Large Language Models (LLMs) hinges on the stable training of deep Transformer architectures. A critical design choice is the placement of normalization layers, leading to a fundamental trade-off: the ``PreNorm'' architecture ensures training stability at the cost of potential performance degradation in deep models, while the ``PostNorm'' architecture offers strong performance but suffers from severe training instability. In this work, we propose SpanNorm, a novel technique designed to resolve this dilemma by integrating the strengths of both paradigms. Structurally, SpanNorm establishes a clean residual connection that spans the entire transformer block to stabilize signal propagation, while employing a PostNorm-style computation that normalizes the aggregated output to enhance model performance. We provide a theoretical analysis demonstrating that SpanNorm, combined with a principled scaling strategy, maintains bounded signal variance throughout the network, preventing the gradient issues that plague PostNorm models, and also alleviating the representation collapse of PreNorm. Empirically, SpanNorm consistently outperforms standard normalization schemes in both dense and Mixture-of-Experts (MoE) scenarios, paving the way for more powerful and stable Transformer architectures.
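
The trade-off above is between two standard block layouts. The NumPy sketch below contrasts PreNorm, x + F(LN(x)), with PostNorm, LN(x + F(x)), using random linear maps as stand-ins for the attention/FFN sublayers (an assumption made only for illustration). SpanNorm's own formula and scaling strategy are not given in this abstract, so they are deliberately not reproduced here.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each row (token) to zero mean and unit variance; no learned gain/bias.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def pre_norm_block(x, W):
    # PreNorm: normalize the sublayer input and keep an identity residual path.
    return x + layer_norm(x) @ W

def post_norm_block(x, W):
    # PostNorm: add the residual first, then normalize the sum.
    return layer_norm(x + x @ W)

def run(block, depth=24, d=256, tokens=64, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((tokens, d))
    for _ in range(depth):
        W = rng.standard_normal((d, d)) / np.sqrt(d)  # random stand-in sublayer
        x = block(x, W)
    return x

pre, post = run(pre_norm_block), run(post_norm_block)
print("PreNorm residual-stream variance: ", pre.var(axis=-1).mean())
print("PostNorm residual-stream variance:", post.var(axis=-1).mean())
```

In this toy setting the PreNorm stream's variance grows roughly linearly with depth (each block adds a term of about unit variance), whereas PostNorm keeps it near 1. This is one common way to picture why later PreNorm layers contribute relatively less, while PostNorm gives up the clean identity path that aids stable gradient flow.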

2505.11766 2026-06-05 cs.LG cs.AI quant-ph (updated version)

Reformulating Neural Operators in $d+1$ Dimensions for Embedding Evolution

Haoze Song, Zhihao Li, Xiaobo Zhang, Zecheng Gan, Zhilu Lai, Wei Wang

Affiliations: HKUST (GZ); HKUST; SWJTU (Southwest Jiaotong University)

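AI summary: This paper reformulates neural operators in $d+1$ dimensions by introducing an auxiliary function dimension that models embedding evolution, improving the efficiency of embedding expansion. Fourier-based operators acting jointly on the physical and auxiliary domains yield a more efficient embedding-evolution module, and experiments show strong results on multiple benchmarks.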

English abstract

Neural Operators (NOs) are powerful architectures for learning mappings between function spaces. While most advances focus on refining kernel parameterizations over the $d$-dimensional physical domain, the evolution of lifted embeddings remains underexplored, which often drives models toward computationally expensive embedding-scaling designs to improve approximation. In this paper, we introduce an auxiliary function dimension that models embedding evolution in operator form, thereby reformulating the NO pipeline in $d+1$ dimensions. We instantiate this framework via Fourier-based operators acting jointly on the physical and auxiliary domains, yielding a basis-diversified auxiliary evolution module as an alternative to brute-force embedding scaling. Across more than ten increasingly challenging benchmarks, ranging from the 1D heat equation to the highly nonlinear 3D Rayleigh-Taylor instability, our model consistently achieves the lowest relative $L_2$ error among the evaluated baselines. Crucially, this advantage is empirically supported by (1) controlled budget-aware comparisons against scaled and ablated baselines; (2) robustness under mixed-resolution training and super-resolution inference; and (3) zero-shot generalization to unseen temporal regimes. In addition, we present a broader set of design choices for lifting and recovery operators, demonstrating their impact on our model's predictive performance.
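
To make "Fourier-based operators acting jointly on the physical and auxiliary domains" concrete, here is a generic single-channel spectral convolution in the style of the Fourier Neural Operator, applied to a grid whose second axis stands in for the auxiliary dimension. This is an illustrative assumption: the paper's actual module (channel mixing, mode selection, basis diversification) is not specified in the abstract.

```python
import numpy as np

def spectral_conv_2d(u, weights):
    """FNO-style spectral convolution on a 2D grid.

    u:       real array of shape (n_x, n_a); axis 0 = physical coordinate,
             axis 1 = hypothetical auxiliary (embedding-evolution) coordinate.
    weights: complex array of shape (m_x, m_a) multiplying the lowest modes.
    Only the first m_x rows of the rfft2 spectrum are kept (non-negative
    frequencies along axis 0), a simplification of full FNO mode truncation.
    """
    m_x, m_a = weights.shape
    u_hat = np.fft.rfft2(u)                      # shape (n_x, n_a // 2 + 1)
    out_hat = np.zeros_like(u_hat)
    out_hat[:m_x, :m_a] = u_hat[:m_x, :m_a] * weights
    return np.fft.irfft2(out_hat, s=u.shape)     # back to the (n_x, n_a) grid

rng = np.random.default_rng(0)
u = rng.standard_normal((32, 16))                # 32 physical x 16 auxiliary points
w = rng.standard_normal((8, 5)) + 1j * rng.standard_normal((8, 5))
v = spectral_conv_2d(u, w)
print(v.shape)  # (32, 16)
```

With all-ones weights covering every retained mode the layer reduces to the identity, which is a quick sanity check on the truncation logic.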

2601.18383 2026-06-05 cs.AI cs.CL cs.LG (updated version)

Dynamic Thinking-Token Selection for Efficient Reasoning in Large Reasoning Models

Zhenyuan Guo, Tong Chen, Wenlong Meng, Chen Gong, Xin Yu, Chengkun Wei, Wenzhi Chen

Affiliations: Zhejiang University

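AI summary: This work proposes dynamic thinking-token selection. An analysis of reasoning traces shows that only a subset of key tokens affects the final answer, and this observation is used to improve the efficiency of large reasoning models.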

2601.18219 2026-06-05 physics.med-ph cs.CV cs.LG (updated version)

Automated HER2 scoring with uncertainty quantification using lensfree holography and deep learning

Che-Yung Shen, Xilin Yang, Yuzhu Li, Leon Lenk, Aydogan Ozcan

Affiliations: Electrical and Computer Engineering Department, University of California, Los Angeles, CA, 90095, USA; Bioengineering Department, University of California, Los Angeles, CA, 90095, USA; California NanoSystems Institute (CNSI), University of California, Los Angeles, CA, 90095, USA; Department of Computer Science, University of California, Los Angeles, CA, 90095, USA

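AI summary: This paper presents a compact, low-cost system based on lensfree holography and deep learning for automated HER2 scoring of immunohistochemically stained breast tissue sections. A Bayesian Monte Carlo dropout strategy improves diagnostic reliability, and the system achieves high accuracy in HER2 classification and scoring.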

Comments: 23 pages, 6 figures, 1 table

DOI: 10.34133/bmef.0278
Journal ref: BME Frontiers, AAAS (2026)

English abstract

Accurate assessment of human epidermal growth factor receptor 2 (HER2) expression is critical for breast cancer diagnosis, prognosis, and therapy selection; yet, most existing digital HER2 scoring methods rely on bulky and expensive optical systems. Here, we present a compact and cost-effective lensfree holography platform integrated with deep learning for automated HER2 scoring of immunohistochemically stained breast tissue sections. The system captures lensfree diffraction patterns of stained HER2 tissue sections under RGB laser illumination and acquires complex field information over a sample area of ~1,250 mm^2 at an effective throughput of ~84 mm^2 per minute. To enhance diagnostic reliability, we incorporated an uncertainty quantification strategy based on Bayesian Monte Carlo dropout, which provides autonomous uncertainty estimates for each prediction and supports reliable, robust HER2 scoring, with an overall correction rate of 30.4%. Using a blinded test set of 412 unique tissue samples, our approach achieved a testing accuracy of 84.9% for 4-class (0, 1+, 2+, 3+) HER2 classification and 94.8% for binary (0/1+ vs. 2+/3+) HER2 scoring with uncertainty quantification. Overall, this lensfree holography approach provides a practical pathway toward portable, high-throughput, and cost-effective HER2 scoring, particularly suited for resource-limited settings, where traditional digital pathology infrastructure is unavailable.
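
A minimal sketch of Monte Carlo dropout uncertainty for a 4-class HER2 output, assuming a toy two-layer NumPy network with random weights and random inputs (the paper's network architecture and uncertainty threshold are not given in the abstract). Dropout stays active at inference, the T softmax outputs are averaged, predictive entropy serves as the uncertainty score, and the 4-class probabilities are collapsed into the binary 0/1+ vs. 2+/3+ split by summing.

```python
import numpy as np

HER2_CLASSES = ["0", "1+", "2+", "3+"]

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)       # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mc_dropout_predict(x, W1, W2, p_drop=0.5, T=50, seed=0):
    """Average T stochastic forward passes with dropout kept ON at inference."""
    rng = np.random.default_rng(seed)
    probs = []
    for _ in range(T):
        h = np.maximum(0.0, x @ W1)              # hidden layer (ReLU)
        mask = rng.random(h.shape) >= p_drop     # drop each unit with prob p_drop
        h = h * mask / (1.0 - p_drop)            # inverted-dropout scaling
        probs.append(softmax(h @ W2))
    probs = np.stack(probs)                      # (T, N, 4)
    mean = probs.mean(axis=0)                    # predictive distribution
    entropy = -(mean * np.log(mean + 1e-12)).sum(axis=-1)  # predictive entropy
    return mean, entropy

rng = np.random.default_rng(1)
x = rng.standard_normal((5, 10))                 # 5 hypothetical feature vectors
W1 = rng.standard_normal((10, 32)) / np.sqrt(10)
W2 = rng.standard_normal((32, 4)) / np.sqrt(32)
mean, entropy = mc_dropout_predict(x, W1, W2)
four_class = [HER2_CLASSES[i] for i in mean.argmax(axis=1)]
p_high = mean[:, 2] + mean[:, 3]                 # binary score: P(2+ or 3+)
print(four_class, (p_high > 0.5).astype(int), entropy.round(3))
```

Samples whose entropy exceeds a chosen threshold would then be flagged for review instead of being scored automatically.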

2508.11618 2026-06-05 cs.LG (updated version)

Optimal CO2 storage management considering safety constraints in multi-stakeholder multi-site CCS projects: a Markov game perspective

Jungang Chen, Seyyed A. Hosseini

Affiliations: The University of Texas at Austin

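AI summary: Using a Markov game approach, this paper studies how different coalition structures affect stakeholder objectives in multi-stakeholder, multi-site carbon capture and storage (CCS) projects, and proposes a multi-agent reinforcement learning framework with safety constraints for optimal storage management across stakeholders.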

Comments: 58 pages

DOI: 10.1016/j.ijggc.2026.104683
Journal ref: Int. J. Greenh. Gas Control 149 (2026) 104683

English abstract

Carbon capture and storage (CCS) projects typically involve a diverse array of stakeholders or players from public, private, and regulatory sectors, each with different objectives and responsibilities. Given the complexity, scale, and long-term nature of CCS operations, determining whether individual stakeholders can independently maximize their interests or whether collaborative coalition agreements are needed remains a central question for effective CCS project planning and management. CCS projects are often implemented in geologically connected sites, where shared geological features such as pressure space and reservoir pore capacity can lead to competitive behavior among stakeholders. Furthermore, CO2 storage sites are often located in geologically mature basins that previously served as sites for hydrocarbon extraction or wastewater disposal in order to leverage existing infrastructures, which makes unilateral optimization even more complicated and unrealistic. In this work, we propose a paradigm based on Markov games to quantitatively investigate how different coalition structures affect the goals of stakeholders. We frame this multi-stakeholder multi-site problem as a multi-agent reinforcement learning problem with safety constraints. Our approach enables agents to learn optimal strategies while remaining compliant with safety regulations. We present an example where multiple operators are injecting CO2 into their respective project areas in a geologically connected basin. To address the high computational cost of repeated simulations of high-fidelity models, a previously developed surrogate model based on the Embed-to-Control (E2C) framework is employed. Our results demonstrate the effectiveness of the proposed framework in addressing optimal management of CO2 storage when multiple stakeholders with various objectives and goals are involved.

2512.14338 2026-06-05 cs.LG (updated version)

Implicit Bias and Invariance: How Hopfield Networks Efficiently Learn Graph Orbits

Michael Murray, Tenzin Chan, Kedar Karhadker, Christopher J. Hillar

Affiliations: Mathematical Sciences, University of Bath; Department of Mathematics, UCLA; Algebraic 4 New Theory AI

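AI summary: This work studies the mechanism behind implicit invariance in Hopfield networks on symmetry-learning problems, revealing the implicit bias of gradient descent when learning graph isomorphism classes and its effect on sample complexity.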

English abstract

Many learning problems involve symmetries, and while invariance can be built into neural architectures, it can also emerge implicitly when training on group-structured data. We study this phenomenon in classical Hopfield networks and show they can infer the full isomorphism class of a graph from a small random sample. Our results reveal that: (i) graph isomorphism classes can be represented within a three-dimensional invariant subspace, (ii) using gradient descent to minimize energy flow (MEF) has an implicit bias toward norm-efficient solutions, which underpins a polynomial sample complexity bound for learning isomorphism classes, and (iii) across multiple learning rules, parameters converge toward the invariant subspace as sample sizes grow. Together, these findings highlight a unifying mechanism for generalization in Hopfield networks: a bias toward norm efficiency in learning drives the emergence of approximate invariance under group-structured data.
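
As a rough illustration of training a classical Hopfield network by gradient descent on an energy-flow objective, the sketch below makes several assumptions that the abstract does not confirm: states are +/-1, the energy is E(x) = -1/2 x^T W x with symmetric zero-diagonal W and no thresholds, and the neighboring states are single-bit flips. Under these assumptions flipping bit i raises the energy by 2 x_i (Wx)_i, so each flow term exp((E(x) - E(x'))/2) equals exp(-x_i (Wx)_i). The paper's graph-orbit data are not reproduced; three orthogonal patterns are used so the outcome is easy to check.

```python
import numpy as np

def mef_loss_and_grad(W, X):
    """Energy-flow objective over single-bit flips for +/-1 patterns X of shape (p, n).

    margin m[p, i] = x_i * (W x)_i; flipping bit i raises the energy by 2 * m[p, i],
    so each flow term exp((E(x) - E(x')) / 2) equals exp(-m[p, i]).
    """
    M = X * (X @ W.T)                     # margins, shape (p, n)
    flow = np.exp(-M)
    loss = flow.sum()
    G = -(flow * X).T @ X                 # dL/dW[i, k] = -sum_p flow[p, i] x_i x_k
    G = 0.5 * (G + G.T)                   # keep W symmetric
    np.fill_diagonal(G, 0.0)              # and its diagonal at zero
    return loss, G

def train(X, lr=0.01, steps=2000):
    n = X.shape[1]
    W = np.zeros((n, n))
    losses = []
    for _ in range(steps):
        loss, G = mef_loss_and_grad(W, X)
        losses.append(loss)
        W -= lr * G                       # plain gradient descent
    return W, losses

# Three mutually orthogonal +/-1 patterns: rows of a 16x16 Sylvester-Hadamard matrix.
H = np.array([[1.0]])
for _ in range(4):
    H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
X = H[1:4]

W, losses = train(X)
recalled = np.sign(X @ W.T)               # one synchronous update from each pattern
print(losses[0], losses[-1], np.array_equal(recalled, X))
```

Once the loss drops below 1, every margin must be positive (a non-positive margin alone contributes at least 1), so each stored pattern is a fixed point of the update.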

2601.09236 2026-06-05 cs.LG cs.AI (updated version)

Reward Learning through Ranking Mean Squared Error

Chaitanya Kharyal, Calarina Muslimani, Matthew E. Taylor

Affiliations: Calarina Muslimani; Matthew E. Taylor

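AI summary: This paper proposes R4, a rating-based reinforcement learning method that learns reward functions from trajectory-rating pairs using a new ranking mean squared error loss, and performs well on robotics benchmarks.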

English abstract

Reward design remains a significant bottleneck in applying reinforcement learning (RL) to real-world problems. A popular alternative is reward learning, where reward functions are inferred from human feedback rather than manually specified. Recent work has proposed learning reward functions from human ratings rather than traditional binary preferences, enabling richer and potentially less cognitively demanding supervision. Building on this paradigm, we introduce a new rating-based RL method, Ranked Return Regression for RL (R4). At its core, R4 uses a novel ranking mean squared error loss that learns from a dataset of trajectory-rating pairs, treating the human-provided discrete ratings (e.g., bad, neutral, good) as ordinal targets. Unlike prior rating-based approaches, R4 offers formal guarantees: its solution set is provably minimal and complete under mild assumptions. Empirically, using both human-provided and simulated ratings, we demonstrate that R4 consistently matches or outperforms existing rating and preference-based RL methods on robotic benchmarks from OpenAI Gym and the DeepMind Control Suite. Code released at https://github.com/IRLL/R4.

2502.14131 2026-06-05 cs.LG cs.AI econ.EM (updated version)

An Empirical Risk Minimization Approach for Offline Inverse RL and Dynamic Discrete Choice Model

Enoch H. Kang, Hema Yoganarasimhan, Lalit Jain

Affiliations: Foster School of Business, University of Washington

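AI summary: This paper proposes an empirical risk minimization (ERM) framework for inverse reinforcement learning and dynamic discrete choice models that avoids explicitly estimating the state transition probabilities in the Bellman equation and applies to high-dimensional and infinite state spaces. It is theoretically supported by the Polyak-Lojasiewicz condition, which guarantees fast global convergence.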

English abstract

We study the problem of estimating Dynamic Discrete Choice (DDC) models, also known as offline Maximum Entropy-Regularized Inverse Reinforcement Learning (offline MaxEnt-IRL) in machine learning. The objective is to recover reward or $Q^*$ functions that govern agent behavior from offline behavior data. In this paper, we propose a globally convergent gradient-based method for solving these problems without the restrictive assumption of linearly parameterized rewards. The novelty of our approach lies in introducing the Empirical Risk Minimization (ERM) based IRL/DDC framework, which circumvents the need for explicit state transition probability estimation in the Bellman equation. Furthermore, our method is compatible with non-parametric estimation techniques such as neural networks. Therefore, the proposed method has the potential to be scaled to high-dimensional, infinite state spaces. A key theoretical insight underlying our approach is that the Bellman residual satisfies the Polyak-Lojasiewicz (PL) condition -- a property that, while weaker than strong convexity, is sufficient to ensure fast global convergence guarantees. Through a series of synthetic experiments, we demonstrate that our approach consistently outperforms benchmark methods and state-of-the-art alternatives.
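
The Polyak-Lojasiewicz (PL) inequality requires (1/2)|f'(x)|^2 >= mu (f(x) - f*) for some mu > 0. It does not require convexity, yet with an L-smooth f it makes plain gradient descent with step 1/L converge linearly in function value: f(x_k) - f* <= (1 - mu/L)^k (f(x_0) - f*). The sketch below uses the standard non-convex textbook example f(x) = x^2 + 3 sin^2(x) (f* = 0, PL with mu = 1/32, gradient Lipschitz constant L = 8). It is unrelated to the paper's Bellman-residual objective and only illustrates the property.

```python
import numpy as np

MU, L = 1.0 / 32.0, 8.0

def f(x):
    return x**2 + 3.0 * np.sin(x) ** 2          # non-convex; global minimum f* = 0 at x = 0

def grad_f(x):
    return 2.0 * x + 3.0 * np.sin(2.0 * x)      # 2x + 6 sin(x) cos(x)

# Check the PL inequality 0.5 * f'(x)^2 >= MU * (f(x) - 0) on a dense grid.
grid = np.linspace(-10.0, 10.0, 20001)
pl_holds = np.all(0.5 * grad_f(grid) ** 2 >= MU * f(grid) - 1e-12)

# Gradient descent with step 1/L; PL gives f(x_k) <= (1 - MU/L)^k * f(x_0).
x = 3.0
for _ in range(10000):
    x -= grad_f(x) / L
print(pl_holds, f(x))
```

The second derivative 2 + 6 cos(2x) is negative in places, so f is not convex, but it has no spurious stationary points, which is what the PL inequality encodes.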

Equilibrium Response of Atmospheric Machine Learning Models to Uniform Sea Surface Temperature Warming

Bosong Zhang, Timothy M. Merlis

Affiliations: University of Washington

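AI summary: This paper evaluates the climate response of several state-of-the-art machine learning models to uniform sea surface temperature warming and discusses the potential and limitations of these models for climate projection.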

English abstract

Machine learning models for the global atmosphere that are capable of producing stable, multi-year simulations of Earth's climate have recently been developed. However, the ability of these ML models to generalize beyond the training distribution remains an open question. In this study, we evaluate the climate response of several state-of-the-art ML models (ACE2-ERA5, NeuralGCM, and cBottle) to a uniform sea surface temperature warming, a widely used benchmark for evaluating climate change. We assess each ML model's performance relative to a physics-based general circulation model (NOAA's Geophysical Fluid Dynamics Laboratory AM4) across key diagnostics, including surface air temperature, precipitation, temperature and wind profiles, and top-of-atmosphere radiation. While the ML models reproduce key aspects of the physical model response, particularly the response of precipitation, some exhibit notable departures from robust physical responses, including radiative responses and land region warming. Our results highlight the promise and current limitations of ML models for climate change applications and suggest that further improvements are needed for robust out-of-sample generalization.

2510.10968 2026-06-05 cs.LG stat.ML (updated version)

Blade: A Derivative-free Bayesian Inversion Method using Diffusion Priors

Hongkai Zheng, Austin Wang, Zihui Wu, Zhengyu Huang, Ricardo Baptista, Yisong Yue

Affiliations: California Institute of Technology; University of Toronto; Peking University

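AI summary: This paper proposes Blade, which uses diffusion models as data-driven priors to address posterior estimation in high-dimensional, nonlinear derivative-free Bayesian inversion, producing accurate and well-calibrated posterior distributions.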

English abstract

Derivative-free Bayesian inversion arises in science and engineering applications, particularly when the forward model is costly or infeasible to differentiate through. Existing derivative-free methods collapse the posterior to a point estimate or return severely over-confident uncertainty on high-dimensional, nonlinear problems. We introduce Blade, which produces accurate and well-calibrated posteriors using an ensemble of interacting particles. Blade leverages diffusion models as data-driven priors, and only queries the forward model through forward evaluations (i.e., derivative-free). Theoretically, we show the convergence and stability of Blade under forward model approximation and prior score estimation error. Empirically, on nonlinear fluid dynamics, Blade produces well-calibrated posterior samples that existing derivative-free methods cannot, as measured by CRPS, the spread-skill ratio, and the rank histogram. Its accuracy and calibration improve consistently with more iterations and particles, backed by our convergence and stability analysis and empirical experiments.
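
To show what an ensemble of interacting particles that only queries the forward model through evaluations can look like, here is a basic ensemble Kalman inversion (EKI) update, a standard derivative-free ensemble method. It is explicitly not Blade: Blade's diffusion prior and its specific update rule are omitted, and the linear forward model G(u) = A u is an assumption chosen so the result is easy to check.

```python
import numpy as np

def eki_step(U, forward, y, gamma, rng):
    """One ensemble Kalman inversion update with perturbed observations.

    U: (J, d) particles; forward: maps (d,) -> (k,); y: (k,) data;
    gamma: (k, k) observation-noise covariance. Only forward evaluations are used.
    """
    Gu = np.array([forward(u) for u in U])       # (J, k) forward evaluations
    du, dg = U - U.mean(0), Gu - Gu.mean(0)
    J = U.shape[0]
    C_ug = du.T @ dg / (J - 1)                    # parameter-output cross-covariance
    C_gg = dg.T @ dg / (J - 1)                    # output covariance
    K = C_ug @ np.linalg.inv(C_gg + gamma)       # Kalman-type gain, (d, k)
    noise = rng.multivariate_normal(np.zeros(len(y)), gamma, size=J)
    return U + (y + noise - Gu) @ K.T

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
u_true = np.array([1.0, -2.0, 0.5])
gamma = 1e-4 * np.eye(5)
y = A @ u_true + rng.multivariate_normal(np.zeros(5), gamma)

def forward(u):
    return A @ u                                  # treated as a black box

U = rng.standard_normal((50, 3))                  # initial ensemble drawn from N(0, I)
for _ in range(10):
    U = eki_step(U, forward, y, gamma, rng)
print(U.mean(0))
```

No gradient of `forward` is ever computed; the ensemble covariances play the role that a Jacobian would play in a gradient-based solver.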

2512.21335 2026-06-05 physics.med-ph cs.LG physics.app-ph physics.bio-ph (updated version)

Autonomous Uncertainty Quantification for Computational Point-of-care Sensors

Artem Goncharov, Rajesh Ghosh, Hyou-Arm Joung, Dino Di Carlo, Aydogan Ozcan

Affiliations: Electrical & Computer Engineering Department; Bioengineering Department; California NanoSystems Institute (CNSI); Department of Surgery; University of California, Los Angeles

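AI summary: This paper proposes an autonomous uncertainty quantification technique that improves neural network-driven computational sensing systems for point-of-care diagnostics, using Monte Carlo dropout to increase diagnostic accuracy and reliability.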

Comments: 18 pages, 5 figures

DOI: 10.1021/acsnano.6c06616
Journal ref: ACS Nano (2026)

English abstract

Computational point-of-care (POC) sensors enable rapid, low-cost, and accessible diagnostics in emergency, remote and resource-limited areas that lack access to centralized medical facilities. These systems can utilize neural network-based algorithms to accurately infer a diagnosis from the signals generated by rapid diagnostic tests or sensors. However, neural network-based diagnostic models are subject to hallucinations and can produce erroneous predictions, posing a risk of misdiagnosis and inaccurate clinical decisions. To address this challenge, here we present an autonomous uncertainty quantification technique developed for POC diagnostics. As our testbed, we used a paper-based, computational vertical flow assay (xVFA) platform developed for rapid POC diagnosis of Lyme disease, the most prevalent tick-borne disease globally. The xVFA platform integrates a disposable paper-based assay, a handheld optical reader and a neural network-based inference algorithm, providing rapid and cost-effective Lyme disease diagnostics in under 20 min using only 20 µL of patient serum. By incorporating a Monte Carlo dropout (MCDO)-based uncertainty quantification approach into the diagnostics pipeline, we identified and excluded erroneous predictions with high uncertainty, significantly improving the sensitivity and reliability of the xVFA in an autonomous manner, without access to the ground truth diagnostic information of patients. Blinded testing using new patient samples demonstrated an increase in diagnostic sensitivity from 88.2% to 95.7%, indicating the effectiveness of MCDO-based uncertainty quantification in enhancing the robustness of neural network-driven computational POC sensing systems.
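
The exclusion step described above amounts to abstaining on high-uncertainty predictions and scoring only the rest. A sketch with made-up toy labels, assuming one uncertainty score per sample (for example the spread of MC-dropout outputs) and a hypothetical threshold of 0.5; the paper's score definition and threshold are not stated in the abstract.

```python
import numpy as np

def sensitivity(y_true, y_pred):
    """True-positive rate: fraction of actual positives predicted positive."""
    positives = y_true == 1
    return float((y_pred[positives] == 1).mean())

def exclude_uncertain(y_true, y_pred, uncertainty, threshold):
    """Keep only predictions whose uncertainty score is at or below the threshold."""
    keep = uncertainty <= threshold
    return y_true[keep], y_pred[keep], keep

# Toy numbers (not from the paper): the one false negative carries high uncertainty.
y_true = np.array([1, 1, 1, 1, 0, 0])
y_pred = np.array([1, 1, 0, 1, 0, 1])
uncert = np.array([0.1, 0.2, 0.9, 0.1, 0.1, 0.8])   # e.g. MC-dropout spread per sample

print(sensitivity(y_true, y_pred))                   # 0.75
kept_true, kept_pred, keep = exclude_uncertain(y_true, y_pred, uncert, threshold=0.5)
print(sensitivity(kept_true, kept_pred), keep.sum()) # 1.0 4
```

Sensitivity is then computed on the retained subset only, so the fraction of excluded samples is worth reporting alongside it.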

2512.20813 2026-06-05 cs.LG (updated version)

GraphFire-X: Physics-Informed Graph Attention Networks and Structural Gradient Boosting for Building-Scale Wildfire Preparedness at the Wildland-Urban Interface

Miguel Esparza, Vamshi Battal, Ali Mostafavi

Affiliations: Urban Resilience.AI Lab, Zachry Department of Civil and Environmental Engineering, Texas A&M University; Department of Computer Science and Engineering, Texas A&M University

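AI summary: This study proposes GraphFire-X, a framework that combines physics-informed graph attention networks with structural gradient boosting and separates vulnerability into two vectors, environmental contagion and structural fragility, to model wildfire risk at the wildland-urban interface. It finds that environmental pressure dominates propagation pathways while eaves are the primary micro-scale ingress vector, enabling targeted hazard prevention and mitigation strategies.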

DOI: 10.1016/j.cacaie.2026.100085
Journal ref: Computer-Aided Civil and Infrastructure Engineering (2026): 100085

English abstract

As wildfires increasingly evolve into urban conflagrations, traditional risk models that treat structures as isolated assets fail to capture the non-linear contagion dynamics characteristic of the wildland urban interface (WUI). This research bridges the gap between mechanistic physics and data driven learning by establishing a novel dual specialist ensemble framework that disentangles vulnerability into two distinct vectors, environmental contagion and structural fragility. The architecture integrates two specialized predictive streams, an environmental specialist, implemented as a graph neural network (GNN) that operationalizes the community as a directed contagion graph weighted by physics informed convection, radiation, and ember probabilities, and enriched with high dimensional Google AlphaEarth Foundation embeddings, and a Structural Specialist, implemented via XGBoost to isolate granular asset level resilience. Applied to the 2025 Eaton Fire, the framework reveals a critical dichotomy in risk drivers. The GNN demonstrates that neighborhood scale environmental pressure overwhelmingly dominates intrinsic structural features in defining propagation pathways, while the XGBoost model identifies eaves as the primary micro scale ingress vector. By synthesizing these divergent signals through logistic stacking, the ensemble achieves robust classification and generates a diagnostic risk topology. This capability empowers decision makers to move beyond binary loss prediction and precisely target mitigation prioritizing vegetation management for high connectivity clusters and structural hardening for architecturally vulnerable nodes thereby operationalizing a proactive, data driven approach to community resilience.
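
Logistic stacking, as named in the abstract, trains a small logistic-regression meta-model on the base models' outputs. The sketch below assumes each specialist emits a per-building damage probability and uses their logits as meta-features; the data are synthetic and the GNN and XGBoost specialists are replaced by fixed toy scores. In practice a stacking meta-model is normally fit on out-of-fold predictions so that training labels do not leak into it.

```python
import numpy as np

def logit(p, eps=1e-6):
    p = np.clip(p, eps, 1.0 - eps)
    return np.log(p / (1.0 - p))

def stack_features(p_env, p_struct):
    # Meta-features: logits of each specialist's probability, plus a bias column.
    return np.column_stack([logit(p_env), logit(p_struct), np.ones_like(p_env)])

def fit_stacker(p_env, p_struct, y, lr=0.1, steps=2000):
    """Logistic-regression meta-learner fit by batch gradient descent on log-loss."""
    X = stack_features(p_env, p_struct)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def predict(w, p_env, p_struct):
    return 1.0 / (1.0 + np.exp(-(stack_features(p_env, p_struct) @ w)))

# Toy data (not from the paper): the "environmental" score is the more informative one.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=2000)
p_env = np.clip(0.5 + 0.3 * (2 * y - 1) + rng.normal(0, 0.15, y.size), 0.01, 0.99)
p_struct = np.clip(0.5 + 0.1 * (2 * y - 1) + rng.normal(0, 0.20, y.size), 0.01, 0.99)

w = fit_stacker(p_env, p_struct, y)
print("weights (env, struct, bias):", w.round(2))
```

The learned weights indicate how much the meta-model trusts each specialist, which is one simple way to read off the relative influence of the two signal streams.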

2512.20111 2026-06-05 cs.CL cs.AI cs.LG (updated version)

ABBEL: Learning Natural-Language Belief States for Memory-Efficient Interaction

Aly Lidayan, Jakob Bjorner, Satvik Golechha, Kartik Goyal, Alane Suhr

Affiliations: University of California, Berkeley; Georgia Institute of Technology

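AI summary: This paper proposes ABBEL, a framework that directly supervises the information content of each summary in the form of explicit natural-language belief states. It addresses information loss and incorrect updates in earlier summarization approaches and improves interaction performance while keeping memory use efficient.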

English abstract

As the time horizons of sequential decision-making tasks grow, keeping full interaction histories in model context becomes increasingly costly. Recent work reduces context lengths by instead conditioning decision-making agents on recursively updated natural-language summaries, which are concise and interpretable. However, they underperform agents with access to the full context, suggesting that they fail to generate sufficient summaries. To address this we propose ABBEL, a recursive summarization framework that isolates and directly supervises each summary's information contents in the form of explicit natural-language belief states. First, we analyze the belief states generated by frontier models under ABBEL across five domains, and verify that performance is often degraded due to omitting or incorrectly updating information. We also discover settings where models use memory inefficiently by retaining extraneous information. We target these limitations by fine-tuning with two RL-based methods: belief grading, which reduces update errors by rewarding belief generations based on their information content, and peak belief penalties, which encourage compressing the beliefs with the greatest memory footprints. We demonstrate that these methods significantly reduce the performance gap with full context models, and enable ABBEL to outperform prior memory agent work by 40% while using 67% of the memory. Our code is available at https://github.com/jakob-bjorner/optimal-explorer-dev

2412.11800 2026-06-05 cs.LG stat.ML (updated version)

Scalable Temporal Anomaly Causality Discovery in Large Systems: Achieving Computational Efficiency with Binary Anomaly Flag Data

Mulugeta Weldezgina Asres, Christian Walter Omlin, The CMS-HCAL Collaboration

Affiliations: Department of ICT, University of Agder; The CMS Experiment, CERN

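AI summary: This paper proposes an anomaly causal discovery method (AnomalyCD) that addresses the accuracy and computational challenges of generating graphical causal models (GCMs) from temporal binary flag datasets. Strategies such as anomaly data-aware causality testing, sparse data and prior link compression, and edge pruning adjustment improve both computational efficiency and accuracy.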

Comments: 26 pages, 17 figures, 8 tables, published version at EPJ-C: Computing, Software and Data Science

DOI: 10.1140/epjc/s10052-026-15611-5
Journal ref: Eur. Phys. J. C, 86, 585 (2026)

English abstract

Extracting anomaly causality facilitates diagnostics once monitoring systems detect system faults. Identifying anomaly causes in large systems involves investigating a broader set of monitoring variables across multiple subsystems. However, learning graphical causal models (GCMs) comes with a significant computational burden that restrains the applicability of most existing methods in real-time and large-scale deployments. In addition, modern monitoring applications for large systems often generate large amounts of binary alarm flags, and the distinct characteristics of binary anomaly data -- the meaning of state transition and data sparsity -- challenge existing causality learning mechanisms. This study proposes an anomaly causal discovery approach (AnomalyCD), addressing the accuracy and computational challenges of generating GCMs from temporal binary flag datasets. The AnomalyCD presents several strategies, such as anomaly data-aware causality testing, sparse data and prior link compression, and edge pruning adjustment approaches. We validate the performance of the approach on two datasets: monitoring sensor data from the readout-box system of the Compact Muon Solenoid experiment at CERN, and a public dataset from an information technology monitoring system. The results on temporal GCMs demonstrate a considerable reduction of computation overhead and a moderate enhancement of accuracy on the binary anomaly datasets. Code: https://github.com/muleina/AnomalyCD .
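
The abstract stresses that in binary alarm data the meaningful events are state transitions and that the series are sparse. One simple way to exploit both, shown here as a hypothetical preprocessing sketch rather than AnomalyCD's actual causality test, is to compress each flag series to its 0-to-1 onset times and then measure how often one channel's onsets are followed by another's within a lag window.

```python
import numpy as np

def onsets(flags):
    """Indices where a binary flag switches 0 -> 1 (the start of an anomaly episode)."""
    flags = np.asarray(flags, dtype=int)
    return np.flatnonzero(np.diff(flags, prepend=0) == 1)

def lagged_hit_rate(cause, effect, max_lag):
    """Fraction of `cause` onsets followed by an `effect` onset within 1..max_lag steps."""
    c, e = onsets(cause), onsets(effect)
    if c.size == 0:
        return 0.0
    hits = [np.any((e > t) & (e <= t + max_lag)) for t in c]
    return float(np.mean(hits))

a = [0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0]
b = [0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0]
print(onsets(a), onsets(b))                 # [1 6] [3 8]
print(lagged_hit_rate(a, b, max_lag=2))     # 1.0
print(lagged_hit_rate(b, a, max_lag=2))     # 0.0
```

A directional asymmetry like this is only a screening heuristic; a causal discovery method still has to test the link while conditioning on the other monitored variables.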

2512.19510 2026-06-05 cs.LG stat.ML (updated version)

Toward Scalable and Valid Conditional Independence Testing with Spectral Representations

Alek Fröhlich, Vladimir R. Kostic, Karim Lounici, Daniel Perazzo, Daniel Tiezzi, Massimiliano Pontil

Affiliations: University of Cambridge; University of Oxford; ETH Zürich

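AI summary: This paper proposes a spectral-representation learning approach to address the limited adaptivity and scalability of conventional conditional independence tests. It constructs a simple test statistic and a bi-level contrastive learning algorithm, establishes a theoretical link between representation learning error and test performance, and validates the approach on real and synthetic data.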

Comments: Accepted at ICML 2026. Revised to match the accepted version; updated experiments and exposition

English abstract

Conditional independence (CI) is central to causal inference, feature selection, and graphical modeling, yet it is untestable in many settings without additional assumptions. Existing CI tests often rely on restrictive structural conditions, limiting their validity. Kernel methods using partial covariance operators offer a more principled approach but suffer from limited adaptivity and scalability. In this work, we explore whether representation learning can help address these limitations. Specifically, we focus on representations derived from the singular value decomposition of partial covariance operators and use them to construct a simple test statistic. We also introduce a bi-level contrastive algorithm to learn these representations. Our theory links representation learning error to test performance and establishes asymptotic validity and power guarantees. Experiments on real and synthetic data suggest that this approach offers a principled and statistically grounded path toward scalable CI testing, bridging kernel-based theory with modern representation learning.
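
In the linear-Gaussian special case, the partial covariance of X and Y given Z equals the covariance of the residuals after regressing each of them on Z, which gives the classic partial-correlation CI test. The sketch below shows only that finite-dimensional analogue (an assumption for illustration); the paper's operator-SVD representations and bi-level contrastive training are not reproduced.

```python
import numpy as np

def residualize(v, Z):
    """Residuals of v after ordinary least-squares regression on Z (with intercept)."""
    Z1 = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(Z1, v, rcond=None)
    return v - Z1 @ coef

def partial_corr(x, y, Z):
    """Correlation of the parts of x and y not linearly explained by Z."""
    rx, ry = residualize(x, Z), residualize(y, Z)
    return float(np.corrcoef(rx, ry)[0, 1])

rng = np.random.default_rng(0)
n = 5000
z = rng.standard_normal(n)
x = z + rng.standard_normal(n)       # X depends on Z
y = z + rng.standard_normal(n)       # Y depends on Z; X and Y are independent given Z
print(np.corrcoef(x, y)[0, 1], partial_corr(x, y, z[:, None]))
```

X and Y are marginally correlated (about 0.5 here) but their partial correlation given Z is near zero; kernel and representation-based tests aim to detect the same kind of structure when the dependence is nonlinear.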

2506.11152 2026-06-05 q-bio.GN cs.LG q-bio.CB Updated version

HEIST: A Graph Foundation Model for Spatial Transcriptomics and Proteomics Data

Hiren Madhu, João Felipe Rocha, Tinglin Huang, Siddharth Viswanath, Smita Krishnaswamy, Rex Ying

Affiliations: Yale University, USA

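AI summary: Proposes HEIST, a foundation model that represents spatial transcriptomics and proteomics data as hierarchical graphs and uses a hierarchical graph transformer to jointly model cells' spatial context and their internal gene co-expression programs, improving the characterization of cellular heterogeneity and of responses to the microenvironment.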

Abstract

Single-cell transcriptomics and proteomics have become a great source for data-driven insights into biology, enabling the use of advanced deep learning methods to understand cellular heterogeneity and gene expression at the single-cell level. With the advent of spatial-omics data, we have the promise of characterizing cells within their tissue context, as such data provide both spatial coordinates and intra-cellular transcriptional or protein counts. Proteomics offers a complementary view by directly measuring proteins, which are the primary effectors of cellular function and key therapeutic targets. However, existing models ignore either the spatial information or the complex genetic and proteomic programs within cells, so they cannot infer how a cell's internal regulation adapts to microenvironmental cues. Furthermore, these models often use fixed gene vocabularies, hindering their generalizability to unseen genes. In this paper, we introduce HEIST, a hierarchical graph transformer foundation model for spatial transcriptomics and proteomics. HEIST models tissues as hierarchical graphs. The higher-level graph is a spatial cell graph, and each cell, in turn, is represented by its lower-level gene co-expression network graph. HEIST performs both intra-level and cross-level message passing to exploit this hierarchy in its embeddings and can thus generalize to novel data types, including spatial proteomics, without retraining. HEIST is pretrained on 22.3M cells from 124 tissues across 15 organs using spatially-aware contrastive and masked autoencoding objectives. Unsupervised analysis of HEIST embeddings reveals spatially informed subpopulations missed by prior models. Downstream evaluations demonstrate generalizability to proteomics data and state-of-the-art performance in clinical outcome prediction, cell type annotation, and gene imputation across multiple technologies.
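A two-level graph of this kind can be sketched with plain NumPy. Everything here is an assumption for illustration: mean aggregation instead of graph-transformer attention, a k-nearest-neighbour spatial graph, mean pooling for the bottom-up step, and additive broadcast for the top-down step. It shows only the data layout (a spatial cell graph whose nodes each carry their own gene graph of varying size) and the order of intra- and cross-level message passing, not HEIST's actual architecture.

```python
import numpy as np


def mean_aggregate(adj, h):
    """One mean-aggregation message-passing step (self-loop included)."""
    a = adj + np.eye(len(adj))
    return (a @ h) / a.sum(axis=1, keepdims=True)


def knn_adjacency(coords, k):
    """Symmetric k-nearest-neighbour graph over cell coordinates."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    adj = np.zeros_like(dist)
    nearest = np.argsort(dist, axis=1)[:, :k]
    rows = np.repeat(np.arange(len(coords)), k)
    adj[rows, nearest.ravel()] = 1.0
    return np.maximum(adj, adj.T)


def hierarchical_step(cell_coords, gene_graphs, gene_feats, k=2):
    """Intra-level and cross-level message passing on a two-level graph (hypothetical scheme)."""
    # Lower level: message passing inside each cell's gene co-expression graph.
    gene_h = [mean_aggregate(a, h) for a, h in zip(gene_graphs, gene_feats)]
    # Bottom-up: pool each cell's genes into a cell embedding.
    cell_h = np.stack([h.mean(axis=0) for h in gene_h])
    # Upper level: message passing on the spatial cell graph.
    cell_h = mean_aggregate(knn_adjacency(cell_coords, k), cell_h)
    # Top-down: add each cell's context back to its own genes.
    gene_h = [h + c for h, c in zip(gene_h, cell_h)]
    return cell_h, gene_h


rng = np.random.default_rng(0)
sizes = [3, 5, 4, 2]  # cells need not share a fixed gene vocabulary
coords = rng.uniform(size=(len(sizes), 2))
graphs = []
for s in sizes:
    upper = np.triu((rng.uniform(size=(s, s)) > 0.5).astype(float), 1)
    graphs.append(upper + upper.T)
feats = [rng.standard_normal((s, 8)) for s in sizes]
cell_h, gene_h = hierarchical_step(coords, graphs, feats)
```

Because each cell's gene graph is processed separately, cells with different gene sets fit into the same structure, which is the property the abstract contrasts with fixed gene vocabularies.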

2512.09706 2026-06-05 cs.LG Updated version

Training One Model to Master Cross-Level Agentic Actions via Reinforcement Learning

Kaichen He, Zihao Wang, Muyao Li, Anji Liu, Yitao Liang

Affiliations: Peking University; National University of Singapore

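AI summary: Proposes CrossHA, a unified agentic model that masters heterogeneous action spaces and autonomously selects the most effective interface at each step of a trajectory. By combining cold-start supervised fine-tuning with a multi-turn Group Relative Policy Optimization (GRPO) algorithm, it learns adaptive action switching and achieves state-of-the-art performance on more than 800 tasks in the open-world Minecraft environment.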

Comments Accepted to CVPR 2026 as a Highlight

Abstract

The paradigm of agentic AI is shifting from engineered complex workflows to post-training native models. However, existing agents are typically confined to static, predefined action spaces, such as exclusively using APIs, GUI events, or robotic commands. This rigidity limits their adaptability in dynamic environments where the optimal granularity of interaction varies contextually. To bridge this gap, we propose CrossHA, a unified agentic model that masters heterogeneous action spaces and autonomously selects the most effective interface for each step of a trajectory. We introduce a comprehensive training pipeline that integrates cold-start supervised fine-tuning with a Multi-Turn Group Relative Policy Optimization (GRPO) algorithm. This approach enables the agent to learn adaptive action switching, balancing high-level efficiency with low-level precision, without human-specified rules. Extensive experiments on over 800 tasks in the open-world Minecraft environment demonstrate that CrossHA achieves state-of-the-art performance. By dynamically leveraging the strengths of diverse action spaces, our model significantly outperforms fixed-action baselines, exhibiting superior generalization and efficiency in long-horizon reasoning. All code and models are available at https://github.com/CraftJarvis/OpenHA.
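The group-relative part of GRPO can be sketched as follows: several rollouts are sampled for the same task, and each rollout's reward is normalised by the group's mean and standard deviation, so no learned value function is needed. How CrossHA distributes credit across turns is not stated here; giving every turn of a rollout the same advantage is a hypothetical simplification, and the rewards below are made up.

```python
import numpy as np


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalise each rollout's reward within its group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


def per_turn_advantages(advantages, turns_per_rollout):
    """Hypothetical multi-turn credit assignment: every turn of a rollout shares its advantage."""
    return [np.full(n_turns, adv) for adv, n_turns in zip(advantages, turns_per_rollout)]


# Four rollouts of the same task; reward 1 means the task was solved (hypothetical values).
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
turns = per_turn_advantages(adv, [3, 5, 2, 4])
print(adv)
```

Rollouts that beat their group's average get positive advantages and the rest negative ones; if all rollouts in a group receive the same reward, every advantage is zero and the group contributes no gradient.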

2511.21667 2026-06-05 cs.LG cs.AI Updated version

Escaping the Verifier: Learning to Reason via Demonstrations

Locke Cai, Max Ryabinin, Ivan Provilkov

Affiliations: University of California, Berkeley; Stanford University

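AI summary: Proposes RARO, which uses inverse reinforcement learning to learn strong reasoning capabilities from expert demonstrations without task-specific verifiers, yielding significant performance gains across the evaluated tasks.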

Abstract

Training Large Language Models (LLMs) to reason often relies on Reinforcement Learning (RL) with task-specific verifiers. However, many real-world reasoning-intensive tasks lack verifiers, despite offering abundant expert demonstrations that remain under-utilized for reasoning-focused training. We introduce RARO (Relativistic Adversarial Reasoning Optimization), which learns strong reasoning capabilities from expert demonstrations alone via Inverse Reinforcement Learning. RARO sets up an adversarial game between a policy and a relativistic critic: the policy learns to mimic expert answers, while the critic aims to identify the experts among expert-policy answer pairs. Both the policy and the critic are trained jointly and continuously via RL, and we identify the key stabilization techniques required for robust learning. Empirically, RARO significantly outperforms strong verifier-free baselines across all evaluation tasks: +13.7% accuracy on Countdown (1.5B), +8.2% accuracy on DeepMath (7B), and +19.1% win-rate on Poetry Writing (7B) against expert poems. RARO also exhibits robust scaling trends similar to those of RL with verifiers. These results demonstrate that RARO effectively elicits strong reasoning performance from expert demonstrations alone, enabling robust reasoning learning even when task-specific verifiers are unavailable.
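A relativistic critic scores each answer and judges a pair by the difference of the two scores. The sketch below assumes a logistic (Bradley-Terry-style) form, P(a is the expert answer) = sigmoid(s(a) - s(b)), and a policy reward equal to the probability that the critic mistakes the policy's answer for the expert's. Both choices and all function names are hypothetical rather than RARO's exact objectives.

```python
import numpy as np


def critic_loss(score_expert, score_policy):
    """Relativistic pairwise loss: -log sigmoid(s_expert - s_policy), averaged over pairs."""
    d = np.asarray(score_expert, dtype=float) - np.asarray(score_policy, dtype=float)
    # np.logaddexp(0, -d) computes log(1 + exp(-d)) without overflow.
    return float(np.mean(np.logaddexp(0.0, -d)))


def policy_reward(score_expert, score_policy):
    """Hypothetical policy reward: probability the critic picks the policy answer as the expert one."""
    d = np.asarray(score_policy, dtype=float) - np.asarray(score_expert, dtype=float)
    return 1.0 / (1.0 + np.exp(-d))


# Critic scores for three (expert answer, policy answer) pairs; values are made up.
expert_scores = [2.0, 0.5, 1.0]
policy_scores = [0.0, 0.5, 3.0]
print(critic_loss(expert_scores, policy_scores), policy_reward(expert_scores, policy_scores))
```

Because only score differences matter, the critic never needs an absolute notion of a "correct" answer, which is what lets this kind of setup work without a verifier.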

2508.10875 2026-06-05 cs.CL cs.AI cs.LG Updated version

A Survey on Diffusion Language Models

Tianyi Li, Mingda Chen, Bowei Guo, Zhiqiang Shen

Affiliations: VILA Lab, Mohamed bin Zayed University of Artificial Intelligence; Department of Automation, Tsinghua University

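AI summary: Surveys the current state of diffusion language models and their relationship to autoregressive and masked language models, analyzes pre-training strategies, post-training methods, and inference optimization techniques, and discusses multimodal extensions, applications, limitations, and future research directions.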

Abstract

Diffusion Language Models (DLMs) are rapidly emerging as a powerful and promising alternative to the dominant autoregressive (AR) paradigm. By generating tokens in parallel through an iterative denoising process, DLMs possess inherent advantages in reducing inference latency and capturing bidirectional context, thereby enabling fine-grained control over the generation process. While achieving a several-fold speed-up, recent advancements have allowed DLMs to show performance comparable to their autoregressive counterparts, making them a compelling choice for various natural language processing tasks. In this survey, we provide a holistic overview of the current DLM landscape. We trace its evolution and relationship with other paradigms, such as autoregressive and masked language models, and cover both foundational principles and state-of-the-art models. Our work offers an up-to-date, comprehensive taxonomy and an in-depth analysis of current techniques, from pre-training strategies to advanced post-training methods. Another contribution of this survey is a thorough review of DLM inference strategies and optimizations, including improvements in decoding parallelism, caching mechanisms, and generation quality. We also highlight the latest approaches to multimodal extensions of DLMs and delineate their applications across various practical scenarios. Furthermore, our discussion addresses the limitations and challenges of DLMs, including efficiency, long-sequence handling, and infrastructure requirements, while outlining future research directions to sustain progress in this rapidly evolving field. Project GitHub is available at https://github.com/VILA-Lab/Awesome-DLMs.
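One common way to decode in parallel with a masked DLM is confidence-based unmasking: start from an all-mask sequence and, at each denoising step, commit the few masked positions the model is most confident about. The sketch below uses a toy stand-in for the model (`toy_denoiser`, hypothetical) that ignores context, so it illustrates only the control flow of decoding several tokens per step, not any specific model covered by the survey.

```python
import numpy as np

MASK = -1  # hypothetical id for the [MASK] token


def toy_denoiser(tokens, target, vocab_size):
    """Stand-in for a trained DLM: per-position distributions that favour `target`.

    It ignores `tokens` (a real model conditions on the partially unmasked
    sequence); its confidence decreases from left to right.
    """
    length = len(target)
    conf = np.linspace(0.9, 0.5, length)
    probs = np.repeat(((1.0 - conf) / (vocab_size - 1))[:, None], vocab_size, axis=1)
    probs[np.arange(length), target] = conf
    return probs


def parallel_decode(length, tokens_per_step, denoiser):
    """Start fully masked; at each step commit the most confident masked positions."""
    tokens = np.full(length, MASK)
    steps = 0
    while (tokens == MASK).any():
        probs = denoiser(tokens)
        masked = np.flatnonzero(tokens == MASK)
        confidence = probs[masked].max(axis=1)
        chosen = masked[np.argsort(-confidence)[:tokens_per_step]]
        tokens[chosen] = probs[chosen].argmax(axis=1)
        steps += 1
    return tokens, steps


target = np.array([3, 1, 4, 1, 0, 2])
decoded, steps = parallel_decode(len(target), 2, lambda t: toy_denoiser(t, target, vocab_size=5))
print(decoded, steps)
```

Committing two tokens per step finishes a six-token sequence in three model calls instead of six; the trade-off between tokens per step and generation quality is one of the inference topics the survey reviews.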

2511.21338 2026-06-05 cs.LG Updated version

Masks Can Be Distracting: On Context Comprehension in Diffusion Language Models

Julianna Piskorz, Cristina Pinneri, Alvaro Correia, Motasem Alfarra, Risheek Garrepalli, Christos Louizos

Affiliations: University of Cambridge

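AI summary: Studies how mask tokens affect context comprehension in diffusion language models, finds that appended masks distract the model from relevant information, and proposes a mask-agnostic loss function to improve robustness.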

Comments Published at the Forty-Third International Conference on Machine Learning (ICML 2026)

Abstract

Masked Diffusion Language Models (MDLMs) have recently emerged as a promising alternative to Autoregressive Language Models (ARLMs), leveraging a denoising objective that, in principle, should enable more uniform context utilisation. In this work, we examine the context comprehension abilities of MDLMs and uncover two key limitations. First, despite their more global training objective and bidirectional attention mechanism, similarly to ARLMs, MDLMs exhibit a strong locality bias: performance is highly sensitive to the position of relevant information within the input, favouring local over distant context. Second, we show that appending a large number of mask tokens--required for generation--can significantly degrade context comprehension. Through systematic ablations, we find that these masks act as distractors, reducing the model's ability to process relevant information. To address this, we introduce a mask-agnostic loss function that encourages predictions to remain invariant to the number of appended masks. Fine-tuning with this objective substantially mitigates the distracting effect of masks, improving robustness of MDLMs. Overall, our findings reveal critical limitations of the current MDLM training paradigm and provide actionable insights for building diffusion-based language models with stronger context comprehension.
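One plausible way to write a mask-agnostic objective is to run the model on the same prompt padded with two different numbers of [MASK] tokens and penalise the divergence between its predictions at the answer positions. The symmetric KL form below, the function names, and the random logits standing in for model outputs are all assumptions; the paper's exact loss may differ.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def mask_invariance_loss(logits_few, logits_many):
    """Symmetric KL between predictions for the same positions when the input
    is padded with few vs. many [MASK] tokens (hypothetical form of the loss)."""
    lp = log_softmax(np.asarray(logits_few, dtype=float))
    lq = log_softmax(np.asarray(logits_many, dtype=float))
    p, q = np.exp(lp), np.exp(lq)
    kl_pq = np.sum(p * (lp - lq), axis=-1)
    kl_qp = np.sum(q * (lq - lp), axis=-1)
    return float(np.mean(0.5 * (kl_pq + kl_qp)))


rng = np.random.default_rng(0)
logits_a = rng.standard_normal((4, 10))  # 4 answer positions, vocabulary of 10
logits_b = logits_a + 0.5 * rng.standard_normal((4, 10))
print(mask_invariance_loss(logits_a, logits_a), mask_invariance_loss(logits_a, logits_b))
```

During fine-tuning, such a term would be added to the usual denoising loss with a weighting coefficient (a hypothetical hyperparameter); it is zero only when the number of appended masks leaves the predictions unchanged.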

2511.20613 2026-06-05 cs.LG cs.AI cs.MA Updated version

Can Vibe Coding Beat Graduate CS Students? An LLM vs. Human Coding Tournament on Market-driven Strategic Planning

Panayiotis Danassis, Naman Goel

Affiliations: University of Southampton; University of Oxford and Alan Turing Institute

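AI summary: Proposes a multi-agent, reasoning-driven benchmark based on a real-world logistics optimization problem (the Auction, Pickup, and Delivery Problem) that couples competitive auctions with capacity-constrained routing. Comparing 40 LLM-coded agents with 17 human-coded agents over 12 double all-play-all tournaments and about 40k matches, the study shows that human-coded agents excel at strategic planning and optimization and that LLMs fall short at producing code that performs competitively in real-world settings.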

DOI: 10.65109/SEOT8410

Abstract

The rapid proliferation of Large Language Models (LLMs) has revolutionized AI-assisted code generation. This rapid development of LLMs has outpaced our ability to properly benchmark them. Prevailing benchmarks emphasize unit-test pass rates and syntactic correctness. Such metrics understate the difficulty of many real-world problems that require planning, optimization, and strategic interaction. We introduce a multi-agent reasoning-driven benchmark based on a real-world logistics optimization problem (Auction, Pickup, and Delivery Problem) that couples competitive auctions with capacity-constrained routing. The benchmark requires building agents that can (i) bid strategically under uncertainty and (ii) optimize planners that deliver tasks while maximizing profit. We evaluate 40 LLM-coded agents (by a wide range of state-of-the-art LLMs under multiple prompting methodologies, including vibe coding) against 17 human-coded agents developed before the advent of LLMs. Our results over 12 double all-play-all tournaments and ~40k matches demonstrate (i) a clear superiority of agents coded by humans (graduate students): the top 5 spots are consistently won by human-coded agents, (ii) the majority of LLM-coded agents (33 out of 40) are beaten by very simple baselines, and (iii) given the best human solution as an input and prompted to improve upon it, the best-performing LLM makes the solution significantly worse instead of improving it. Our results highlight a gap in LLMs' ability to produce code that works competitively in the real world, and motivate new evaluations that emphasize reasoning-driven code synthesis in real-world scenarios.
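A double all-play-all tournament can be read as every ordered pair of agents playing one match, so each pair meets twice with roles swapped and n agents play n(n-1) matches per tournament. The sketch below assumes that reading; the agents, the `play_match` callback, and its profit values are hypothetical stand-ins for actual Auction, Pickup, and Delivery matches.

```python
from collections import defaultdict
from itertools import permutations


def double_round_robin(agents):
    """All ordered pairs: each pair of agents meets twice, with roles swapped."""
    return list(permutations(agents, 2))


def run_tournament(agents, play_match):
    """Accumulate each agent's profit over all matches and rank agents by total."""
    totals = defaultdict(float)
    for first, second in double_round_robin(agents):
        profit_first, profit_second = play_match(first, second)
        totals[first] += profit_first
        totals[second] += profit_second
    ranking = sorted(agents, key=lambda a: totals[a], reverse=True)
    return ranking, dict(totals)


# Hypothetical agents and a toy match in which the stronger agent earns the difference in strength.
strength = {"human_a": 3.0, "llm_b": 1.0, "baseline": 2.0}


def toy_match(a, b):
    return max(strength[a] - strength[b], 0.0), max(strength[b] - strength[a], 0.0)


ranking, totals = run_tournament(list(strength), toy_match)
print(ranking, totals)
```

Playing each pairing in both orders removes any advantage tied to an agent's position in the match, which is presumably why the tournament is doubled.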

2511.16111 2026-06-05 stat.ML cs.LG math.SP Updated version

Rotation-Parameterized Graph Fractional Fourier Transform: Definition, Properties, and Optimal Filtering

Feiyue Zhao, Mingzhi Wang, Yangfan He, Zhichao Zhang

Affiliations: School of Mathematics and Statistics, Nanjing University of Information Science and Technology; School of Communication and Artificial Intelligence, Nanjing Institute of Technology; School of Integrated Circuits, Nanjing Institute of Technology; Jiangsu Province Engineering Research Center of IntelliSense Technology and System; Hubei Key Laboratory of Applied Mathematics, Hubei University; Key Laboratory of System Control and Information Processing, Ministry of Education, Shanghai Jiao Tong University

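AI summary: Proposes the rotation-parameterized graph fractional Fourier transform (RP-GFRFT), which unifies fractional-order and rotation-parameterized spectral analysis to address existing methods' lack of rotation-based basis control and their zero-angle degeneracy problem, improving denoising, reconstruction, and feature preservation in graph signal processing.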

Abstract

Graph spectral representations are fundamental in graph signal processing, providing a rigorous framework for analyzing graph-structured data. The graph fractional Fourier transform (GFRFT) extends the graph Fourier transform (GFT) through a fractional-order parameter, enabling flexible spectral analysis with mathematical consistency. The angular graph Fourier transform (AGFT) further introduces angular control by rotating GFT eigenvectors; however, existing constructions may fail to reduce exactly to the GFT at zero angle, weakening theoretical consistency and interpretability. To address these complementary limitations, namely the lack of rotation-based basis control in GFRFT and the defective zero-angle degeneracy of AGFT, this paper proposes the rotation-parameterized graph fractional Fourier transform (RP-GFRFT), which unifies fractional order and rotation-parameterized spectral analysis. A degeneracy-preserving rotation matrix family is constructed to guarantee exact GFT reduction at zero angle. Two RP-GFRFT variants, I-RP-GFRFT and II-RP-GFRFT, are then formulated, with theoretical analyses confirming their unitarity, invertibility, reduction behavior, and smooth parameter dependence. The fractional order and rotation angle are jointly optimized for adaptive graph spectral filtering. Experiments on real-world signals, images, and point clouds demonstrate that RP-GFRFT improves denoising accuracy, reconstruction quality, and feature preservation over GFRFT, AGFT, and representative filtering baselines.
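The objects in this abstract can be made concrete with a small NumPy sketch under stated assumptions. The GFT matrix is F = U^T, taken from the eigendecomposition of the combinatorial Laplacian. The fractional transform is F^a, computed from the eigendecomposition of F with the principal branch. The rotation is a single Givens rotation of two eigenvectors, a hypothetical stand-in for the paper's degeneracy-preserving rotation family, included only to show the property that a zero angle leaves F unchanged.

```python
import numpy as np


def gft_matrix(adj):
    """GFT matrix F = U^T from the combinatorial Laplacian L = U diag(lambda) U^T."""
    lap = np.diag(adj.sum(axis=1)) - adj
    _, eigvecs = np.linalg.eigh(lap)
    return eigvecs.T


def fractional_power(f, a):
    """F^a = V diag(mu^a) V^{-1}, using the principal branch of mu^a."""
    mu, v = np.linalg.eig(f)
    mu_a = np.exp(a * np.log(mu.astype(complex)))
    return v @ np.diag(mu_a) @ np.linalg.inv(v)


def givens(n, i, j, theta):
    """Rotation in the (i, j) coordinate plane; exactly the identity at theta = 0."""
    g = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    g[i, i] = g[j, j] = c
    g[i, j], g[j, i] = -s, s
    return g


def rotated_fractional_gft(f, a, theta, i=1, j=2):
    """Hypothetical rotated variant: rotate eigenvectors i and j (F -> G^T F), then take F^a."""
    return fractional_power(givens(len(f), i, j, theta).T @ f, a)


rng = np.random.default_rng(1)
w = np.triu(rng.uniform(0.1, 1.0, size=(6, 6)), 1)
f = gft_matrix(w + w.T)  # random weighted complete graph on 6 nodes
signal = rng.standard_normal(6)
spectrum = rotated_fractional_gft(f, 0.5, 0.3) @ signal
```

Checking that a = 1 recovers F, that a = 0 gives the identity, and that F^0.3 F^0.7 = F are quick sanity tests for any construction of this kind.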

2511.13044 2026-06-05 cs.LG Updated version

Bi-View Embedding Fusion: A Hybrid Learning Approach for Knowledge Graph's Nodes Classification Addressing Problems with Limited Data

Rosario Napoli, Giovanni Lonia, Antonio Celesti, Massimo Villari, Maria Fazio

Affiliations: Department of Mathematical and Computer Sciences, Physical Sciences and Earth Sciences, University of Messina

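AI summary: Proposes a bi-view embedding fusion method that combines two complementary graph embedding techniques, Node2Vec and GraphSAGE, to increase the informative content of knowledge-graph node features and generate enhanced graph embeddings that improve GML models without additional synthetic data.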

Comments Accepted at the 14th International Joint Conference on Knowledge Graphs (IJCKG) 2025

DOI: 10.1007/978-981-95-5009-8_2
Journal ref: Knowledge Graphs, Springer Nature Singapore, 2026, pp. 19-34

Abstract

Traditional Machine Learning (ML) methods require large amounts of data to perform well, limiting their applicability in sparse or incomplete scenarios and forcing the use of additional synthetic data to improve model training. To overcome this challenge, the research community is looking more and more at Graph Machine Learning (GML), as it offers a powerful alternative by using relationships within data. However, this method also faces limitations, particularly when dealing with Knowledge Graphs (KGs), which, owing to their semantic nature, can hide a large amount of information. This study introduces Bi-View, a novel hybrid approach that increases the informative content of node features in KGs to generate enhanced Graph Embeddings (GEs) that are used to improve GML models without relying on additional synthetic data. The proposed work combines two complementary GE techniques: Node2Vec, which captures structural patterns through unsupervised random walks, and GraphSAGE, which aggregates neighbourhood information in a supervised way. Node2Vec embeddings are first computed to represent the graph topology, and node features are then enriched with centrality-based metrics, which are used as input for the GraphSAGE model. Moreover, a fusion layer combines the original Node2Vec embeddings with the GraphSAGE-influenced representations, resulting in a dual-perspective embedding space. Such a fusion captures both topological and semantic properties of the graph, enabling the model to exploit informative features that may exist in the dataset but that are not explicitly represented. Our approach improves downstream task performance, especially in scenarios with poor initial features, laying the groundwork for more accurate and precise KG-enhanced GML models.
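The pipeline described above (Node2Vec, centrality enrichment, GraphSAGE, fusion) can be sketched as follows. Assumptions: random vectors stand in for trained Node2Vec embeddings, degree centrality is the only centrality metric, the GraphSAGE layer uses mean aggregation with untrained random weights, and the fusion layer is concatenation followed by a linear projection. All names are hypothetical.

```python
import numpy as np


def degree_centrality(adj):
    """Degree divided by the maximum possible degree (n - 1)."""
    return adj.sum(axis=1) / (len(adj) - 1)


def sage_mean_layer(adj, h, weight):
    """GraphSAGE-style layer with mean aggregation: ReLU([h_v || mean of neighbours' h] W)."""
    deg = adj.sum(axis=1, keepdims=True)
    neigh_mean = (adj @ h) / np.maximum(deg, 1.0)
    return np.maximum(np.concatenate([h, neigh_mean], axis=1) @ weight, 0.0)


def bi_view_embeddings(adj, node2vec_emb, rng, dim=8):
    """Fuse a structural view (Node2Vec) with a centrality-enriched GraphSAGE view (untrained weights)."""
    feats = np.column_stack([node2vec_emb, degree_centrality(adj)])
    w_sage = 0.1 * rng.standard_normal((2 * feats.shape[1], dim))
    sage_emb = sage_mean_layer(adj, feats, w_sage)
    # Fusion layer: concatenate both views and project them into a shared space.
    fused = np.concatenate([node2vec_emb, sage_emb], axis=1)
    w_fuse = 0.1 * rng.standard_normal((fused.shape[1], dim))
    return fused @ w_fuse


rng = np.random.default_rng(0)
adj = np.array([[0, 1, 1, 0, 0],
                [1, 0, 1, 0, 0],
                [1, 1, 0, 1, 0],
                [0, 0, 1, 0, 1],
                [0, 0, 0, 1, 0]], dtype=float)
node2vec_emb = rng.standard_normal((5, 16))  # placeholder for real Node2Vec output
embeddings = bi_view_embeddings(adj, node2vec_emb, rng)
```

In practice both weight matrices would be trained on the supervised node-classification task; the sketch only fixes the shapes and the order in which the two views are combined.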

2511.10362 2026-06-05 cs.LG cs.SY eess.SY math.DS Updated version

Gradient Flow Equations for Deep Linear Neural Networks: A Survey from a Network Perspective

Joel Wendin, Claudio Altafini

Affiliations: Department of Electrical Engineering, Linköping University

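AI summary: Surveys recent progress on the dynamics and loss landscape of the gradient flow equations of deep linear neural networks, i.e., gradient-descent training dynamics in the limit of vanishing step size under a quadratic loss, viewed from a network perspective. Formulated via the network's adjacency matrix, these equations form a class of converging matrix ODEs that are nilpotent, polynomial, and isospectral, and that admit conservation laws.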

Comments Manuscript accepted for publication in SIAM Review (SIREV)

DOI: 10.1137/24M1715519
Journal ref: SIAM Review 68 (2026) 293-345

Abstract

The paper surveys recent progress in understanding the dynamics and loss landscape of the gradient flow equations associated with deep linear neural networks, i.e., the gradient descent training dynamics (in the limit when the step size goes to 0) of deep neural networks missing the activation functions and subject to quadratic loss functions. When formulated in terms of the adjacency matrix of the neural network, as we do in the paper, these gradient flow equations form a class of converging matrix ODEs which is nilpotent, polynomial, isospectral, and with conservation laws. The loss landscape is described in detail. It is characterized by infinitely many global minima and saddle points, both strict and nonstrict, but lacks local minima and maxima. The loss function itself is a positive semidefinite Lyapunov function for the gradient flow, and its level sets are unbounded invariant sets of critical points, with critical values that correspond to the number of singular values of the input-output data learnt by the gradient along a certain trajectory. The adjacency matrix representation we use in the paper allows us to highlight the existence of a quotient space structure in which each critical value of the loss function is represented only once, while all other critical points with the same critical value belong to the fiber associated to the quotient space. It also makes it easy to determine stable and unstable submanifolds at the saddle points, even when the Hessian fails to identify them.
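A classical example of the conservation laws mentioned above can be checked numerically. For a two-layer linear network with quadratic loss L = ||W2 W1 X - Y||^2 / 2, the gradient flow keeps W1 W1^T - W2^T W2 constant, and an explicit Euler step (plain gradient descent) changes it only by a term proportional to the squared step size. This sketch uses the common layer-wise weight parameterization rather than the survey's adjacency-matrix formulation, and the data are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out, n_samples = 3, 4, 2, 200
x = rng.standard_normal((n_in, n_samples)) / np.sqrt(n_samples)  # so that X X^T is close to I
a_true = np.array([[1.0, 0.0, -1.0], [0.5, 2.0, 0.0]])  # hypothetical target linear map
y = a_true @ x
w1 = 0.5 * rng.standard_normal((n_hidden, n_in))
w2 = 0.5 * rng.standard_normal((n_out, n_hidden))


def loss_fn(w1, w2):
    """Quadratic loss of the end-to-end linear map W2 W1."""
    return 0.5 * np.sum((w2 @ w1 @ x - y) ** 2)


def conserved(w1, w2):
    """Quantity that the continuous-time gradient flow keeps constant."""
    return w1 @ w1.T - w2.T @ w2


loss_start, c_start = loss_fn(w1, w2), conserved(w1, w2)
step = 2e-3  # small step, so gradient descent approximates the gradient flow
for _ in range(5000):
    err = w2 @ w1 @ x - y
    grad_w1 = w2.T @ err @ x.T
    grad_w2 = err @ x.T @ w1.T
    w1, w2 = w1 - step * grad_w1, w2 - step * grad_w2

drift = np.linalg.norm(conserved(w1, w2) - c_start)
print(f"loss {loss_start:.3f} -> {loss_fn(w1, w2):.3f}, drift of W1 W1^T - W2^T W2: {drift:.2e}")
```

The first-order terms of the two updates cancel exactly, so the drift printed at the end comes only from the step-size-squared terms and shrinks as the step size goes to 0.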

2511.08972 2026-06-05 cs.LG Updated version

Selective Sinkhorn Routing for Improved Sparse Mixture of Experts

Duc Anh Nguyen, Huu Binh Ta, Nhuan Le Duc, Tan Minh Nguyen, Toan Tran

Affiliations: University of California, Berkeley; National University of Singapore

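AI summary: Proposes Selective Sinkhorn Routing, which casts token-to-expert assignment as an optimal transport problem with constraints that balance expert utilization, improving sparse mixture-of-experts models without relying on auxiliary balancing losses.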

Comments 12 pages, 5 figures

Abstract

Sparse Mixture-of-Experts (SMoE) models are scalable and computationally efficient, enabling large increases in model capacity with limited inference overhead. Existing SMoE methods often depend on auxiliary objectives, such as load-balancing loss and z-loss, or additional trainable components such as noisy gating. While these techniques encourage expert diversity, they can introduce objective misalignment, increase model complexity, or incur substantial training overhead, especially in Sinkhorn-based routing methods. In this paper, we revisit the token-to-expert assignment as an optimal transport problem. We add constraints to ensure balanced expert utilization. We show that even minimal optimal transport-based routing improves SMoE performance without requiring auxiliary balancing losses. Unlike prior approaches, our method derives gating scores directly from the transport map, leading to more balanced and effective token-to-expert assignments. Building on this insight, we introduce Selective Sinkhorn Routing (SSR), a lightweight routing mechanism that replaces complex auxiliary losses with efficient Sinkhorn-based routing while preserving flexible expert selection. Experiments on language modeling and image classification show that SSR improves training efficiency, accuracy, and robustness to input corruption.
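Treating routing as optimal transport means turning router logits into a transport plan whose rows (tokens) each carry unit mass and whose columns (experts) receive equal total mass, which Sinkhorn-Knopp iterations approximate. In the sketch below, gates are read directly from the plan; the "selective" part of SSR is not reproduced, and the iteration count, top-k rule, and function names are assumptions.

```python
import numpy as np


def sinkhorn_plan(logits, n_iters=100):
    """Sinkhorn-Knopp on exp(logits): token rows sum to 1, expert columns to n_tokens / n_experts."""
    n_tokens, n_experts = logits.shape
    plan = np.exp(logits - logits.max())
    for _ in range(n_iters):
        plan *= (n_tokens / n_experts) / plan.sum(axis=0, keepdims=True)  # balance experts
        plan /= plan.sum(axis=1, keepdims=True)  # each token distributes unit mass
    return plan


def route(logits, top_k=1):
    """Pick the top-k experts per token from the transport plan and use plan entries as gates."""
    plan = sinkhorn_plan(logits)
    experts = np.argsort(-plan, axis=1)[:, :top_k]
    gates = np.take_along_axis(plan, experts, axis=1)
    return experts, gates, plan


rng = np.random.default_rng(0)
# 16 tokens, 4 experts; the router is biased toward expert 0 (hypothetical logits).
router_logits = 0.5 * rng.standard_normal((16, 4)) + np.array([2.0, 0.0, 0.0, 0.0])
experts, gates, plan = route(router_logits, top_k=2)
```

Even though the logits favour expert 0, the plan's column sums come out equal, which is the balancing effect that an auxiliary load-balancing loss would otherwise have to encourage.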

2511.05615 2026-06-05 cs.LG cs.AI cs.AR physics.ins-det Updated version

wa-hls4ml: A Benchmark and Surrogate Models for hls4ml Resource and Latency Estimation

Benjamin Hawks, Jason Weitz, Dmitri Demler, Karla Tame-Narvaez, Dennis Plotnikov, Mohammad Mehdi Rahimifar, Hamza Ezzaoui Rahali, Audrey C. Therrien, Donovan Sproule, Elham E Khoda, Keegan A. Smith, Russell Marroquin, Giuseppe Di Guglielmo, Nhan Tran, Javier Duarte, Vladimir Loncar

Affiliations: Fermi National Accelerator Laboratory; University of California San Diego; Johns Hopkins University; University of Sherbrooke; Columbia University; Texas A&M University; European Organization for Nuclear Research (CERN)

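AI summary: Introduces wa-hls4ml, a benchmark for estimating the resource usage and latency of ML accelerators, together with GNN- and transformer-based surrogate models that predict accelerator latency and resource usage.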

Comments 30 pages, 18 figures

DOI: 10.1145/3787490
Journal ref: Wa-hls4ml: A Benchmark and Surrogate Models for hls4ml Resource and Latency Estimation. ACM Trans. Reconfigurable Technol. Syst. 19, 2, Article 20 (June 2026), 29 pages

Abstract

As machine learning (ML) is increasingly implemented in hardware to address real-time challenges in scientific applications, the development of advanced toolchains has significantly reduced the time required to iterate on various designs. These advancements have solved major obstacles, but also exposed new challenges. For example, processes that were not previously considered bottlenecks, such as hardware synthesis, are becoming limiting factors in the rapid iteration of designs. To mitigate these emerging constraints, multiple efforts have been undertaken to develop an ML-based surrogate model that estimates resource usage of ML accelerator architectures. We introduce wa-hls4ml, a benchmark for ML accelerator resource and latency estimation, and its corresponding initial dataset of over 680,000 fully connected and convolutional neural networks, all synthesized using hls4ml and targeting Xilinx FPGAs. The benchmark evaluates the performance of resource and latency predictors against several common ML model architectures, primarily originating from scientific domains, as exemplar models, and the average performance across a subset of the dataset. Additionally, we introduce GNN- and transformer-based surrogate models that predict latency and resources for ML accelerators. We present the architecture and performance of the models and find that the models generally predict latency and resources for the 75th percentile within several percent of the synthesized resources on the synthetic test dataset.
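The abstract states its accuracy at the 75th percentile of prediction error. One plausible reading is the 75th percentile of per-design relative error between predicted and synthesized resources, sketched below with hypothetical LUT counts; the benchmark's exact metric definition may differ.

```python
import numpy as np


def relative_errors(predicted, synthesized):
    """Per-design relative error of a surrogate's prediction versus the synthesis report."""
    predicted = np.asarray(predicted, dtype=float)
    synthesized = np.asarray(synthesized, dtype=float)
    return np.abs(predicted - synthesized) / np.maximum(np.abs(synthesized), 1e-12)


def percentile_error(predicted, synthesized, q=75):
    """Relative error (in percent) at or below which q% of designs fall."""
    return float(np.percentile(relative_errors(predicted, synthesized), q) * 100)


# Hypothetical LUT counts for four synthesized designs and a surrogate's predictions.
synth_luts = [1000, 2000, 4000, 8000]
pred_luts = [1010, 1900, 4200, 8000]
print(percentile_error(pred_luts, synth_luts))
```

Here the per-design errors are 1%, 5%, 5%, and 0%, so the 75th-percentile error is 5%; a percentile summary like this is less sensitive to a few badly predicted outlier designs than the mean.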

2410.02628 2026-06-05 cs.LG cs.AI Updated version

Inverse Entropic Optimal Transport Solves Semi-supervised Learning via Data Likelihood Maximization

Mikhail Persiianov, Arip Asadulaev, Nikita Andreev, Nikita Starodubcev, Dmitry Baranchuk, Anastasis Kratsios, Evgeny Burnaev, Alexander Korotin

Affiliations: Institute for Advanced Study; National Research Council Canada; University of Toronto; St. Petersburg State University; Skolkovo Institute of Science and Technology; Kazan Federal University

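AI summary: Proposes EBiEOT, a learning paradigm that uses data likelihood maximization to seamlessly integrate paired and unpaired data, addressing the difficulty of acquiring paired data in semi-supervised learning, and proves that the method can in principle recover the true conditional distribution with arbitrarily small error.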

Abstract

Learning conditional distributions $\pi^*(\cdot|x)$ is a central problem in machine learning, which is typically approached via supervised methods with paired data $(x,y) \sim \pi^*$. However, acquiring paired data samples is often challenging, especially in problems such as domain translation. This necessitates the development of $\textit{semi-supervised}$ models that utilize both limited paired data and additional unpaired i.i.d. samples $x \sim \pi^*_x$ and $y \sim \pi^*_y$ from the marginal distributions. The usage of such combined data is complex and often relies on heuristic approaches. To tackle this issue, we propose a new learning paradigm called $\textbf{EBiEOT}$ that integrates both paired and unpaired data seamlessly using data likelihood maximization techniques. We demonstrate that our approach also connects intriguingly with inverse entropic optimal transport (OT). This finding allows us to apply recent advances in computational OT to establish an $\textit{end-to-end}$ learning algorithm to get $\pi^*(\cdot|x)$. In addition, we derive the universal approximation property, demonstrating that our approach can theoretically recover true conditional distributions with arbitrarily small error. Finally, we demonstrate through empirical tests that our method effectively learns conditional distributions using paired and unpaired data simultaneously. The code of $\texttt{EBiEOT}$ is available at https://github.com/MuXauJl11110/EBiEOT.
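The entropic OT ingredient can be illustrated with a plain Sinkhorn solver between unpaired samples, from which a conditional distribution over the y samples is read off by normalising each row of the plan. This covers only the forward problem with a fixed squared-Euclidean cost; EBiEOT's inverse step (learning the cost from paired data by likelihood maximization) is not reproduced, and the names and data are hypothetical.

```python
import numpy as np


def entropic_ot_plan(x, y, epsilon=0.5, n_iters=500):
    """Sinkhorn iterations for entropic OT between uniform empirical measures on x and y."""
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)  # squared Euclidean cost
    kernel = np.exp(-cost / epsilon)
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    u = np.ones(len(x))
    for _ in range(n_iters):
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    return u[:, None] * kernel * v[None, :]


def conditional_from_plan(plan):
    """Row i approximates the conditional distribution pi(y | x_i) over the y samples."""
    return plan / plan.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
x_unpaired = rng.uniform(size=(8, 2))
y_unpaired = rng.uniform(size=(10, 2))
plan = entropic_ot_plan(x_unpaired, y_unpaired)
cond = conditional_from_plan(plan)
```

Smaller values of `epsilon` make each conditional row more concentrated on nearby y samples, at the cost of slower Sinkhorn convergence.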

2505.15405 2026-06-05 cs.LG Updated version

HOPSE: Scalable Higher-Order Positional and Structural Encoder for Combinatorial Representations

HOPSE：面向组合表示的可扩展高阶位置与结构编码器

Guillermo Bernárdez, Marco Montagna, Louis Van Langendonck, Martin Carrasco, Amirreza Akbari, Louisa Cornelis, Mathilde Papillon, Pere Barlet-Ros, Nina Miolane, Lev Telyatnikov

Affiliations: University of California, Santa Barbara（加州大学圣巴巴拉分校）； Sapienza University of Rome（罗马萨皮恩扎大学）； Universitat Politècnica de Catalunya（加泰罗尼亚理工大学）； University of Fribourg（弗里堡大学）； Aalto University（阿尔托大学）； Intelligent Maintenance and Operations Systems, EPFL（EPFL智能维护与运营系统实验室）

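AI summary: The paper proposes HOPSE, a framework free of message-passing layers that uses Hasse graph decompositions to produce efficient and expressive encodings over arbitrary higher-order domains. It scales linearly with the size of the combinatorial representation while preserving the expressive power and permutation equivariance of HOMP methods, and experiments show strong accuracy and faster runtimes on molecular and topological benchmarks.

Chinese abstract (AI translation)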

尽管图神经网络（GNNs）在建模关系数据方面表现出色，但成对连接无法完全捕捉复杂现实系统中天然存在的多向关系。为此，拓扑深度学习（TDL）利用更一般的组合表示（如单纯复形或胞腔复形）来容纳高阶交互。现有TDL方法通常通过高阶消息传递（HOMP）扩展GNNs，但由于在组合结构上传播消息的复杂度开销很高，面临严峻的可扩展性挑战。为克服这一限制，我们提出了HOPSE（高阶位置和结构编码器），一种无需消息传递层的框架，利用Hasse图分解在任意高阶域上生成高效且表达能力强的编码。值得注意的是，HOPSE的开销随组合表示的规模线性增长，同时保持HOMP方法的表达能力和置换等变性。在分子和拓扑基准上的实验表明，它在达到或超越最先进性能的同时，始终比基于HOMP的模型更快，为可扩展的TDL开辟了新路径。代码可在https://github.com/geometric-intelligence/topobench.git获取。

English abstract

While Graph Neural Networks (GNNs) have proven highly effective at modeling relational data, pairwise connections cannot fully capture multi-way relationships naturally present in complex real-world systems. In response to this, Topological Deep Learning (TDL) leverages more general combinatorial representations--such as simplicial or cellular complexes--to accommodate higher-order interactions. Existing TDL methods often extend GNNs through Higher-Order Message Passing (HOMP), but face critical scalability challenges due to the steep complexity overhead of propagating messages through combinatorial structures. To overcome this limitation, we propose HOPSE (Higher-Order Positional and Structural Encoder), a framework free of message passing layers that uses Hasse graph decompositions to derive efficient and expressive encodings over arbitrary higher-order domains. Notably, HOPSE scales linearly with the size of combinatorial representations while preserving the expressive power and permutation equivariance of the HOMP approaches. Experiments on molecular and topological benchmarks show that it matches or surpasses state-of-the-art performance while consistently achieving speedups over HOMP-based models, opening a new path for scalable TDL. The code is available at https://github.com/geometric-intelligence/topobench.git.
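
Illustrative sketch (not from the paper): HOPSE derives encodings from Hasse graph decompositions instead of message passing. Under our own assumptions, the sketch below builds the Hasse graph of a small simplicial complex (each cell linked to its codimension-1 faces) and precomputes, once and without any learned message passing, a random-walk return-probability encoding per cell. The choice of encoding is ours for illustration; HOPSE's actual decompositions and encoders are described in the paper and the linked topobench repository.

```python
from itertools import combinations


def closure(maximal_simplices):
    """All non-empty faces of the given simplices, sorted by dimension."""
    cells = set()
    for simplex in maximal_simplices:
        simplex = tuple(sorted(simplex))
        for k in range(1, len(simplex) + 1):
            cells.update(combinations(simplex, k))
    return sorted(cells, key=lambda c: (len(c), c))


def hasse_graph(cells):
    """Undirected Hasse diagram: connect each cell to its codimension-1 faces."""
    index = {c: i for i, c in enumerate(cells)}
    adj = [set() for _ in cells]
    for c in cells:
        if len(c) > 1:
            for face in combinations(c, len(c) - 1):
                adj[index[c]].add(index[face])
                adj[index[face]].add(index[c])
    return index, adj


def random_walk_encoding(adj, steps=3):
    """Per-cell probability that a simple random walk is back at its start after 1..steps steps."""
    n = len(adj)
    encodings = []
    for start in range(n):
        p = [0.0] * n
        p[start] = 1.0
        feats = []
        for _ in range(steps):
            q = [0.0] * n
            for u in range(n):
                if p[u] > 0.0 and adj[u]:
                    share = p[u] / len(adj[u])
                    for w in adj[u]:
                        q[w] += share
            p = q
            feats.append(p[start])
        encodings.append(feats)
    return encodings


if __name__ == "__main__":
    cells = closure([(0, 1, 2)])  # one filled triangle: 3 vertices, 3 edges, 1 face
    index, adj = hasse_graph(cells)
    enc = random_walk_encoding(adj)
    print(cells)
    print(enc[index[(0,)]])
```

Because Hasse edges only join cells of adjacent dimension, odd-step return probabilities are zero; the even-step values distinguish cells by their local incidence structure.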

2510.15814 2026-06-05 stat.ML cs.LG Updated version

On Universality of Deep Equivariant Networks

关于深度等变网络的通用性

Marco Pacini, Mircea Petrache, Bruno Lepri, Shubhendu Trivedi, Robin Walters

Affiliations: University of Trento（特伦托大学）； Fondazione Bruno Kessler（布鲁诺·凯斯勒基金会）； PUC Chile（智利天主教大学）； Northeastern University（美国东北大学）

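AI summary: The paper studies the universality of equivariant neural networks. It shows that, under separation constraints, adding a fully connected readout layer enables approximation of continuous functions; it introduces the stricter criterion of entry-wise separability and proves that, with sufficient depth or appropriate readout layers, equivariant networks are universal within the entry-wise separable regime.

Comments Published as a conference paper at ICLR 2026

Journal ref: International Conference on Learning Representations (ICLR), 2026

Chinese abstract (AI translation)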

对于等变神经网络的通用性结果仍然很少。已有的结果通常仅在受限的设置中成立：要么依赖于常规或高阶张量表示，导致隐藏空间维度过高，要么针对专门的架构，通常局限于不变设置。本文提出了一种更一般性的结论。对于不变网络，我们在分离约束下建立了通用性定理，证明添加全连接读出层可使连续函数的近似在分离约束下实现。对于等变网络，其中结果更为稀少，我们证明标准分离性概念不足，并引入更严格的逐元素分离性准则。我们证明在足够深度或添加适当读出层的情况下，等变网络可在逐元素分离性范围内实现通用性。结合先前结果表明浅层模型无法实现通用性，我们的发现将深度和读出层识别为通用性的关键机制，同时提供了一个统一的视角，涵盖了并扩展了先前专门的结果。

English abstract

Universality results for equivariant neural networks remain rare. Those that do exist typically hold only in restrictive settings: either they rely on regular or higher-order tensor representations, leading to impractically high-dimensional hidden spaces, or they target specialized architectures, often confined to the invariant setting. This work develops a more general account. For invariant networks, we establish a universality theorem under separation constraints, showing that the addition of a fully connected readout layer secures approximation within the class of separation-constrained continuous functions. For equivariant networks, where results are even scarcer, we demonstrate that standard separability notions are inadequate and introduce the sharper criterion of $\textit{entry-wise separability}$. We show that with sufficient depth or with the addition of appropriate readout layers, equivariant networks attain universality within the entry-wise separable regime. Together with prior results showing the failure of universality for shallow models, our findings identify depth and readout layers as a decisive mechanism for universality, additionally offering a unified perspective that subsumes and extends earlier specialized results.

2510.05544 2026-06-05 cs.CL cs.LG Updated version

Activation-Informed Pareto-Guided Low-Rank Compression for Efficient LLM/VLM

基于激活信息的帕累托引导低秩压缩用于高效LLM/VLM

Ryan Solgi, Parsa Madinei, Jiayi Tian, Rupak Swaminathan, Jing Liu, Nathan Susanj, Zheng Zhang

Affiliations: University of California, Santa Barbara（加州大学圣巴巴拉分校）； Amazon（亚马逊）

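AI summary: The paper proposes an activation-informed, Pareto-guided low-rank compression method that, through theoretical analysis and algorithm design, improves the compression efficiency and inference speed of LLMs and VLMs while preserving model accuracy.

2510.04500 2026-06-05 cs.LG Updated version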

Expand Neurons, Not Parameters

扩展神经元，而非参数

Linghao Kong, Inimai Subramanian, Yonadav Shavit, Micah Adler, Dan Alistarh, Nir Shavit

Affiliations: University of Washington（华盛顿大学）； Microsoft Research（微软研究院）

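AI summary: Increasing the number of neurons without increasing the total number of non-zero parameters reduces feature interference and thereby improves network performance; the effect is validated across several model types.

Comments Accepted to the 43rd International Conference on Machine Learning (ICML 2026). 9 pages, 6 figures. Code available at https://github.com/Shavit-Lab/Expand-Neurons

Chinese abstract (AI translation)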

本工作展示了如何在不增加网络非零参数总数的情况下，通过增加神经元数量来提升性能。我们证明，这种提升对应于多个特征之间干扰的减少，否则这些特征将共享相同的神经元。在符号布尔任务中，根据子句知识将每个神经元分割成更稀疏的子神经元，系统性地降低了多语义性指标，并获得了更高的任务准确率。值得注意的是，即使是神经元权重的随机分割也能近似这些增益，表明减少冲突（而非精确分配）是主要驱动因素。与叠加假说一致，该框架的收益随着干扰的增加而增长：当多语义负载较高时，准确率提升最大。将这些见解迁移到更现实的模型中，包括基于CLIP嵌入的分类器、卷积神经网络和更深的多层网络，我们发现，在保持非零参数数量不变的情况下加宽网络，持续提高了准确率。这些结果确定了一种基于可解释性的机制，利用宽度来对抗叠加，从而在不增加非零参数数量的情况下提升性能。这种方向与现代加速器非常匹配，因为在这些加速器中，非零参数的内存移动（而非原始计算）通常是主要瓶颈。

English abstract

This work demonstrates how increasing the number of neurons in a network without increasing its total number of non-zero parameters improves performance. We show that this gain corresponds with a decrease in interference between multiple features that would otherwise share the same neurons. On symbolic Boolean tasks, splitting each neuron into sparser sub-neurons with knowledge of the clauses systematically reduces polysemanticity metrics and yields higher task accuracy. Notably, even random splits of neuron weights approximate these gains, indicating that reduced collisions, not precise assignment, are a primary driver. Consistent with the superposition hypothesis, the benefits of this framework grow with increasing interference: when polysemantic load is high, accuracy improvements are the largest. Transferring these insights to more realistic models, including classifiers over CLIP embeddings, convolutional neural networks, and deeper multilayer networks, we find that widening networks while maintaining a constant non-zero parameter count consistently increases accuracy. These results identify an interpretability-grounded mechanism to leverage width against superposition, improving performance without increasing the number of non-zero parameters. Such a direction is well matched to modern accelerators, where memory movement of non-zero parameters, rather than raw compute, is often a dominant bottleneck.
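
Illustrative sketch (not from the paper): the abstract notes that even random splits of a neuron's weights recover much of the gain. The sketch below assumes a split means partitioning one neuron's non-zero incoming weights among k sub-neurons with disjoint supports, so the non-zero count is unchanged. How outgoing weights are handled, and the clause-aware splits used on the Boolean tasks, are not modeled; all names are hypothetical.

```python
import random


def random_split(weights, k, seed=0):
    """Split one neuron's incoming weights into k sub-neurons with disjoint supports.

    Each non-zero weight goes to exactly one randomly chosen sub-neuron, so the
    total number of non-zero incoming weights stays the same.
    """
    rng = random.Random(seed)
    subs = [[0.0] * len(weights) for _ in range(k)]
    for idx, w in enumerate(weights):
        if w != 0.0:
            subs[rng.randrange(k)][idx] = w
    return subs


def count_nonzeros(vectors):
    return sum(1 for v in vectors for w in v if w != 0.0)


def relu(z):
    return max(0.0, z)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    w = [0.5, -1.0, 0.0, 2.0, 0.0, -0.3]
    x = [1.0] * len(w)
    subs = random_split(w, k=3)
    print(count_nonzeros([w]), count_nonzeros(subs))  # same non-zero budget
    # Each sub-neuron applies its own ReLU, so the summed output can differ from
    # the original neuron's even though the weights partition w.
    print(relu(dot(w, x)), sum(relu(dot(s, x)) for s in subs))
```

The extra nonlinearities are what let the wider layer separate features that previously collided in one neuron, at the same non-zero parameter count.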

2508.09697 2026-06-05 cs.LG cs.CV Updated version

Scale-Adaptive Generative Flows for Multiscale Scientific Data

多尺度科学数据的尺度自适应生成流

Yifan Chen, Eric Vanden-Eijnden

Affiliations: Department of Mathematics, University of California, Los Angeles（加州大学洛杉矶分校数学系）； Machine Learning Lab, Capital Fund Management（资本基金管理公司机器学习实验室）； Courant Institute, New York University（纽约大学柯朗研究所）

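AI summary: The paper proposes a generative modeling approach for multiscale scientific data that designs the noise distribution and the interpolation schedule to overcome the numerical challenges posed by multiscale Fourier spectra, improving the quality and efficiency of generated samples.

Chinese abstract (AI translation)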

基于流的生成模型在处理具有多尺度傅里叶谱的科学数据时可能面临数值挑战，常常在细尺度上产生较大误差。我们在流匹配和随机插值框架内，通过对噪声分布和插值调度进行原理性设计来解决这一问题。在函数空间中工作可以确保生成模型在分辨率细化时仍然定义良好；漂移项的Lipschitz正则性对这种函数空间中的适定性和固定分辨率下的积分成本都很重要。核心观察是：噪声应至少与目标分布一样粗糙（以傅里叶谱衰减来衡量），才能使Lipschitz常数保持有限。对于细尺度结构已知的高斯和近高斯目标，谱匹配噪声比标准白噪声选择具有更高的数值效率。对于更复杂的非高斯目标，谱匹配噪声可能并不足够，我们提出尺度自适应插值调度，以缓解噪声比数据更粗糙时出现的终端时间刚性问题。在合成高斯随机场以及随机Allen-Cahn方程和Navier-Stokes方程不变测度上的数值实验展示了该方法，并表明它能以低于传统方法的计算成本生成高保真样本。

English abstract

Flow-based generative models can face numerical challenges on scientific data with multiscale Fourier spectra, often producing large errors at fine scales. We approach this problem within the flow matching and stochastic interpolants framework, through the principled design of noise distributions and interpolation schedules. Working in function space ensures that the generative model remains well defined as the resolution is refined; the Lipschitz regularity of the drift is important to both this function-space well-posedness and the integration cost at fixed resolution. The central observation is that the noise should be at least as rough as the target distribution -- measured by Fourier-spectrum decay -- in order to keep the Lipschitz constant finite. For Gaussian and near-Gaussian targets whose fine-scale structure is known, matched-spectrum noise improves numerical efficiency over standard white-noise choices. For more complex non-Gaussian targets, matched-spectrum noise may not be sufficient, and we propose scale-adaptive interpolation schedules to mitigate the terminal-time stiffness that arises when the noise is rougher than the data. Numerical experiments on synthetic Gaussian random fields and on invariant measures of the stochastic Allen--Cahn and Navier--Stokes equations illustrate the approach and demonstrate its ability to generate high-fidelity samples at lower computational cost than traditional approaches.
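
Illustrative sketch (not from the paper): the central criterion is that the noise should be at least as rough as the target, measured by Fourier-spectrum decay. The sketch below assumes a power-law family where mode k has standard deviation k^-alpha (alpha = 0 is band-limited white noise; larger alpha is smoother), so "at least as rough" becomes alpha_noise <= alpha_data. It uses the simplest linear interpolant; the paper's scale-adaptive schedules are not reproduced, and all names are hypothetical.

```python
import math
import random


def sample_field(n_points, n_modes, alpha, rng):
    """Random periodic 1D field whose mode-k Fourier coefficients have std k**-alpha."""
    coeffs = [(rng.gauss(0.0, 1.0) * k ** -alpha, rng.gauss(0.0, 1.0) * k ** -alpha)
              for k in range(1, n_modes + 1)]
    grid = [i / n_points for i in range(n_points)]
    return [sum(a * math.cos(2 * math.pi * k * x) + b * math.sin(2 * math.pi * k * x)
                for k, (a, b) in enumerate(coeffs, start=1))
            for x in grid]


def noise_is_rough_enough(alpha_noise, alpha_data):
    """Noise spectrum must decay no faster than the data spectrum."""
    return alpha_noise <= alpha_data


def linear_interpolant(noise, data, t):
    """x_t = (1 - t) * z + t * x, the simplest interpolation schedule."""
    return [(1.0 - t) * z + t * x for z, x in zip(noise, data)]


if __name__ == "__main__":
    rng = random.Random(0)
    data = sample_field(64, 16, alpha=2.0, rng=rng)   # smooth hypothetical target
    noise = sample_field(64, 16, alpha=1.0, rng=rng)  # rougher than the target
    print(noise_is_rough_enough(1.0, 2.0))            # True
    x_half = linear_interpolant(noise, data, 0.5)
    print(len(x_half))
```

Matched-spectrum noise corresponds to alpha_noise = alpha_data; white noise (alpha_noise = 0) also passes the criterion but, per the abstract, is numerically less efficient for targets whose fine-scale structure is known.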

2508.19006 2026-06-05 q-fin.PR cs.LG econ.EM q-fin.CP Updated version

Is attention truly all we need? An empirical study of asset pricing in pretrained RNN sparse and global attention models

注意力真的是我们所需的全部吗？预训练RNN稀疏与全局注意力模型在资产定价中的实证研究

Shanyan Lai

Affiliations: Department of Economics and Related Studies, University of York（约克大学经济及相关研究系）

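AI summary: The paper studies pretrained RNN attention models for asset pricing, examining how attention mechanisms improve the capture of temporal dependencies and long-term memory, and how stable the models are under different market conditions.

Comments 72 pages including appendix

Chinese abstract (AI translation)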

本研究考察了采用主流注意力机制（如加性注意力、Luong的三种注意力、全局自注意力和滑动窗口稀疏注意力）的预训练RNN注意力模型，并将其用于市值排名前420的美国大盘股的实证资产定价研究。这是首篇将大规模最先进（SOTA）注意力机制应用于资产定价领域的论文。这些模型克服了传统基于机器学习的资产定价方法的局限，如未能正确捕捉时间依赖性以及记忆较短。此外，注意力机制中强制使用的因果掩码解决了未来数据泄漏问题，而这一问题被经典Transformer等更先进的注意力模型所忽视。所提出的注意力模型还考虑了资产定价数据的时间稀疏性特征，并通过采用简化的模型结构来缓解潜在的过拟合问题。这为未来的实证经济研究提供了一些启示。所有模型均在三个时期内进行检验，涵盖新冠疫情前、疫情期间和疫情后一年，以测试这些模型在极端市场条件下的稳定性。研究发现，在价值加权投资组合回测中，全局自注意力模型和滑动窗口稀疏注意力模型在获取绝对收益和对冲下行风险方面表现出色，在包含新冠疫情的时期、静态交易成本情景下，它们分别实现了2.0和1.80的年化Sortino比率。此外，就绝对投资组合收益随股票市值规模的变化而言，滑动窗口稀疏注意力模型比全局自注意力模型更加稳定。

English abstract

This study investigates the pre-trained RNN attention models with the mainstream attention mechanisms, such as additive attention, Luong's three attentions, global self-attention and sliding window sparse attention, for the empirical asset pricing research on the top 420 large-cap US stocks. This is the first paper on the large-scale state-of-the-art (SOTA) attention mechanisms applied in the asset pricing context. They overcome the limitations of the traditional machine learning-based asset pricing, such as mis-capturing the temporal dependency and short memory. Moreover, the enforced causal masks in the attention mechanisms address the future data leaking issue ignored by the more advanced attention-based models, such as the classic Transformer. The proposed attention models also consider the temporal sparsity characteristic of asset pricing data and mitigate potential overfitting issues by deploying the simplified model structures. This provides some insights for future empirical economic research. All models are examined in three periods, which cover pre-COVID-19, COVID-19 and one year post-COVID-19, for testing the stability of these models under extreme market conditions. The study finds that in value-weighted portfolio back testing, the global self-attention model and the sliding window sparse attention model exhibit excellent capabilities in deriving the absolute returns and hedging downside risks, while they achieve an annualized Sortino ratio of 2.0 and 1.80 respectively in the period with COVID-19 in the static transaction cost scenario. Moreover, the sliding window sparse attention model performs more stably than the global self-attention model from the perspective of absolute portfolio returns with respect to the size of stocks' market capitalization.
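
Illustrative sketch (not from the paper): the point that causal masks prevent future-data leakage can be made concrete. Assuming attention scores are already computed, the sketch below builds a sliding-window causal mask (window equal to the sequence length gives ordinary global causal attention) and applies a masked softmax, so no step ever puts weight on a later step. The paper's RNN-attention architectures are not reproduced; names and scores are hypothetical.

```python
import math


def sliding_window_causal_mask(n, window):
    """mask[i][j] is True when step i may attend to step j: j <= i and i - j < window."""
    return [[i - window < j <= i for j in range(n)] for i in range(n)]


def masked_softmax(scores, mask):
    """Row-wise softmax that gives masked-out positions exactly zero weight."""
    weights = []
    for row, allowed in zip(scores, mask):
        top = max(s for s, ok in zip(row, allowed) if ok)  # the diagonal is always allowed
        exps = [math.exp(s - top) if ok else 0.0 for s, ok in zip(row, allowed)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


if __name__ == "__main__":
    n = 5
    scores = [[0.1 * (i + j) for j in range(n)] for i in range(n)]  # hypothetical scores
    global_causal = masked_softmax(scores, sliding_window_causal_mask(n, window=n))
    sparse_causal = masked_softmax(scores, sliding_window_causal_mask(n, window=2))
    print(global_causal[4])  # weight spread over steps 0..4
    print(sparse_causal[4])  # only steps 3 and 4 receive weight
```

In a backtest, row i corresponds to the information available at date i; zero weight on j > i is what rules out look-ahead bias.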

2307.05284 2026-06-05 cs.LG cs.AI Updated version

Rethinking Distribution Shifts: Empirical Analysis and Modeling for Tabular Data

重新思考分布偏移：针对表格数据的经验分析与建模

Tianyu Wang, Jiashuo Liu, Peng Cui, Hongseok Namkoong

Affiliations: Department of Industrial Engineering and Operations Research（工业工程与运筹学系）； Department of Computer Science and Technology（计算机科学与技术系）； Decision, Risk, and Operations Division（决策、风险与运营部）； Columbia University（哥伦比亚大学）； Tsinghua University（清华大学）

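AI summary: Through empirical analysis and modeling, the paper revisits distribution shifts and finds that Y|X shifts are the most common in tabular data, in sharp contrast to the machine learning literature's emphasis on X (covariate) shifts; it also finds that robust algorithms perform no better than vanilla methods.

Comments Forthcoming at Management Science. Conference version appeared in NeurIPS 2023, previously titled "On the Need for a Language Describing Distribution Shifts: Illustrations on Tabular Datasets"

Chinese abstract (AI translation)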

不同的分布偏移需要不同的干预措施，算法必须基于其解决的具体偏移类型来构建。然而，稳健算法的方法学发展通常依赖于缺乏实证验证的结构性假设。本文倡导一种以实证为基础的数据驱动方法来开发算法，构建了一个包含8个表格数据集中的自然偏移、172个分布对、45种方法和90,000种方法配置的实证测试平台，涵盖了经验风险最小化和分布鲁棒优化（DRO）方法。我们发现Y|X偏移在我们的测试平台中最为普遍，这与机器学习文献中对X（协变量）偏移的高度重视形成鲜明对比，并且稳健算法的性能并不优于普通方法。为了理解原因，我们深入分析了DRO方法，发现被忽视的实现细节——如底层模型类（例如LightGBM）的选择和超参数选择——对性能的影响比模糊集或其半径更大。通过案例研究，我们展示了如何通过数据驱动的归纳理解分布偏移，提供了一种新的算法开发方法。

English abstract

Different distribution shifts require different interventions, and algorithms must be grounded in the specific shifts they address. However, methodological development for robust algorithms typically relies on structural assumptions that lack empirical validation. Advocating for an empirically grounded data-driven approach to algorithm development, we build an empirical testbed comprising natural shifts across 8 tabular datasets, 172 distribution pairs over 45 methods and 90,000 method configurations encompassing empirical risk minimization and distributionally robust optimization (DRO) methods. We find $Y|X$-shifts are most prevalent in our testbed, in stark contrast to the heavy focus on $X$ (covariate)-shifts in the ML literature, and that the performance of robust algorithms is no better than that of vanilla methods. To understand why, we conduct an in-depth empirical analysis of DRO methods and find that overlooked implementation details (such as the choice of underlying model class, e.g., LightGBM, and hyperparameter selection) have a bigger impact on performance than the ambiguity set or its radius. We illustrate via case studies how a data-driven, inductive understanding of distribution shifts can provide a new approach to algorithm development.
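
Illustrative sketch (not from the paper): for readers unfamiliar with the DRO terms "ambiguity set" and "radius", the sketch below computes one standard DRO objective, CVaR-DRO, where the level alpha plays the role of the radius: the ambiguity set contains all reweightings of the training samples with density ratio at most 1/alpha, so a smaller alpha means a larger set. This textbook construction is for illustration only and is not claimed to be one of the 45 methods in the testbed; recall the paper's finding that model class and hyperparameters mattered more than this choice.

```python
def cvar_dro_loss(losses, alpha):
    """Worst-case mean loss over reweightings with density ratio at most 1 / alpha.

    For this CVaR ambiguity set the worst case averages the largest alpha-fraction
    of per-sample losses, with fractional weight on the boundary sample.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    ordered = sorted(losses, reverse=True)
    mass = alpha * len(ordered)  # samples' worth of weight the adversary can concentrate
    whole = int(mass)
    frac = mass - whole
    total = sum(ordered[:whole])
    if frac > 0.0 and whole < len(ordered):
        total += frac * ordered[whole]
    return total / mass


if __name__ == "__main__":
    losses = [1.0, 2.0, 3.0, 4.0]
    for alpha in (1.0, 0.5, 0.25):
        print(alpha, cvar_dro_loss(losses, alpha))  # 2.5, 3.5, 4.0
```

At alpha = 1 the objective reduces to ordinary empirical risk minimization; shrinking alpha focuses training on the hardest samples.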

2508.10555 2026-06-05 physics.comp-ph cs.CE cs.LG Updated version

A Differentiable Framework for Full and Phaseless Data Inversion Using Neural Implicit Contrast-Source Representation

一种基于神经隐式对比源表示的全数据与无相位数据反演可微框架

Haoran Sun, Daoqi Liu, Hongyu Zhou, Maokun Li, Shenheng Xu, Fan Yang

Affiliations: Department of Electronic Engineering, Beijing National Research Center for Information Science and Technology (BNRist), and State Key Laboratory of Space Network and Communications（电子工程系，北京信息科学与技术国家研究中心（BNRist），空间网络与通信国家重点实验室）

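AI summary: The paper proposes a differentiable framework for full and phaseless data inversion based on a neural implicit contrast-source representation. A lightweight residual multilayer perceptron serves as a continuous neural field, improving inversion accuracy and robustness, and the state and data equations are combined with total-variation regularization into a differentiable objective, enabling end-to-end differentiable optimization.

Chinese abstract (AI translation)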

在本研究中，我们将对比源反演扩展为一个完全可微、无监督的框架，其基础是对比源的神经隐式表示。具体来说，对比源不再采用像素级离散表示，而是由一个轻量级残差多层感知机（ResMLP）参数化为以空间坐标和发射器设置为条件的连续神经场。这种连续参数化为对比源提供了更灵活的表示，并在有噪声测量下提高了重建精度和鲁棒性。基于这一表示，状态方程和数据方程与总变分正则化相结合，构成一个可微的目标函数。通过将VIE约束反演重新表述为端到端的可微优化问题，网络参数和介质对比度通过自动微分联合优化。在同一框架内，仅需修改数据失配函数即可同时支持全数据反演和无相位数据反演。数值实验表明，在各种噪声水平和测量设置下，该方案比传统CSI具有更高的重建精度和鲁棒性。连续神经场还支持在比训练网格更细的分辨率下进行超分辨率推理，从而将反演成本与重建保真度解耦。消融研究以及与其他神经架构的比较进一步证实，对比源参数化和基于VIE的表述对于所观察到的改进都是必不可少的。

English abstract

In this study, we extend the contrast source inversion to a fully differentiable, unsupervised framework based on a neural implicit representation of the contrast source. Specifically, instead of a pixel-wise discrete representation, the contrast source is parameterized by a lightweight residual multilayer perceptron (ResMLP) as a continuous neural field conditioned on spatial coordinates and transmitter settings. This continuous parameterization provides a more flexible representation of the contrast source and improves reconstruction accuracy and robustness under noisy measurements. Building on this representation, the state equation and data equation are combined with total-variation regularization to form a differentiable objective function. By reformulating the VIE-constrained inversion as an end-to-end differentiable optimization problem, the network parameters and the medium contrast are jointly optimized via automatic differentiation. Within the same framework, both full and phaseless data inversion are accommodated by only modifying the data misfit function. Numerical experiments demonstrate that this scheme yields higher reconstruction accuracy and robustness than conventional CSI across a range of noise levels and measurement settings. The continuous neural field further enables super-resolution inference at resolutions finer than the training grid, decoupling inversion cost from reconstruction fidelity. Ablation studies and comparisons with alternative neural architectures further confirm that the contrast source parameterization and VIE-based formulation are both essential to the observed improvements.
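
Illustrative sketch (not from the paper): the abstract says full and phaseless inversion differ only in the data misfit term. The sketch below gives two such misfits on complex scattered-field samples, assuming a simple normalized sum-of-squares form (the paper's exact weighting is not stated here). The phaseless version depends only on intensities |E|^2, so it cannot detect a global phase shift that the full-data misfit penalizes.

```python
import cmath


def full_data_misfit(pred, meas):
    """Normalized squared error on complex fields (amplitude and phase)."""
    num = sum(abs(p - m) ** 2 for p, m in zip(pred, meas))
    den = sum(abs(m) ** 2 for m in meas)
    return num / den


def phaseless_misfit(pred, meas):
    """Normalized squared error on intensities |E|^2 only; phase is discarded."""
    num = sum((abs(p) ** 2 - abs(m) ** 2) ** 2 for p, m in zip(pred, meas))
    den = sum(abs(m) ** 4 for m in meas)
    return num / den


if __name__ == "__main__":
    meas = [1 + 1j, 0.5 - 2j, -1 + 0.3j]             # hypothetical measurements
    pred = [m * cmath.exp(0.7j) for m in meas]        # same field, global phase shift
    print(full_data_misfit(pred, meas))   # about 0.47, i.e. 2 - 2*cos(0.7)
    print(phaseless_misfit(pred, meas))   # essentially zero
```

In a differentiable pipeline either function can serve as the data term, with the rest of the objective (state equation, total-variation regularization) left unchanged.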

2508.00775 2026-06-05 eess.SY cs.LG cs.SY math.OC Updated version

Learning to optimize with guarantees: a complete characterization of linearly convergent algorithms

学习优化并保证收敛性：线性收敛算法的完整表征

Andrea Martin, Ian R. Manchester, Luca Furieri

Affiliations: School of Electrical Engineering and Computer Science, and Digital Futures, KTH Royal Institute of Technology, Sweden（瑞典皇家理工学院电气工程与计算机科学学院及Digital Futures研究中心）； Australian Centre for Robotics and School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, Australia（澳大利亚悉尼大学澳大利亚机器人中心及航空航天、机械与机电工程学院）； Department of Engineering Sciences, University of Oxford, United Kingdom（英国牛津大学工程科学系）

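AI summary: The paper studies how to improve an algorithm's average-case performance over a specific distribution of problems and gives a complete characterization of linearly convergent algorithms: each can be expressed as a baseline linearly convergent algorithm plus trainable, exponentially decaying modifications. The approach is validated on nonconvex gradient-dominated functions, strongly convex functions, and optimization over polyhedral feasible sets.

Chinese abstract (AI translation)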

许多经典优化算法的设计是由在优化问题类上认证线性收敛速率所驱动的。本文考虑如何改进算法在特定问题实例分布上的平均情况性能。虽然可以通过在算法更新中嵌入可训练组件来处理这一任务，但关键挑战是在整个问题类上保持最坏情况保证。对于复合优化问题类，我们证明所有线性收敛算法都可以参数化为一个基线线性收敛算法，以及对其更新规则的一组可训练、指数衰减的修改；关键在于，这种参数化恰好排除了所有不线性收敛的算法，且仅排除这些算法。我们的结果适用于改进经典算法的平均情况性能，例如用于非凸、梯度主导函数的梯度下降，用于光滑强凸函数的Nesterov加速方法，以及用于多面体可行集上优化的投影梯度方法。我们展示了如何利用这一表征来学习优化，并同时保证线性收敛和可行性。数值结果展示了在求解病态线性方程组以及在线性动力系统上运行模型预测控制方案时，相较于经典优化器的优势。

English abstract

The design of many classical optimization algorithms is driven by the certification of linear convergence rates over classes of optimization problems. In this paper, we consider the problem of improving the average-case performance of an algorithm over a specific distribution of problem instances. While this task can be tackled by embedding trainable components into the algorithm updates, a key challenge is to preserve worst-case guarantees across the entire problem class. For classes of composite optimization problems, we show that all linearly convergent algorithms can be parametrized in terms of a baseline linearly convergent algorithm, and a set of trainable, exponentially-decaying modifications to its update rule; crucially, this parametrization excludes all and only the algorithms that do not converge linearly. Our results apply to improving the average-case performance of classical algorithms such as gradient descent for nonconvex, gradient-dominated functions; Nesterov's accelerated method for smooth, strongly convex functions; and projected gradient methods for optimization over polyhedral feasible sets. We illustrate how our characterization can be used for learning to optimize with linear convergence and feasibility guarantees. Numerical results showcase benefits over classical optimizers when solving ill-conditioned systems of linear equations and running a model predictive control scheme on a linear dynamical system.
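
Illustrative sketch (not the paper's parametrization): to make "baseline algorithm plus exponentially decaying modifications" concrete, the sketch below runs gradient descent with step 1/L on a hypothetical strongly convex quadratic and adds an arbitrary proposed update, standing in for a trained network, clipped to the envelope c·rho^k. It shows only the easy direction, that such decaying modifications cannot break linear convergence; the paper's result that every linearly convergent algorithm arises this way is not demonstrated. The quadratic, constants, and update function are assumptions.

```python
import math


def clip_to_ball(v, radius):
    """Project v onto the Euclidean ball of the given radius."""
    norm = math.sqrt(sum(vi * vi for vi in v))
    if norm <= radius:
        return list(v)
    return [vi * radius / norm for vi in v]


def learned_gradient_descent(x0, curvatures, step, raw_update, c=1.0, rho=0.95, iters=500):
    """Gradient descent on f(x) = 0.5 * sum(h_i * x_i**2) plus a bounded learned term.

    At iteration k the learned modification is clipped to norm <= c * rho**k, so it
    decays exponentially and the iterates still converge linearly when rho < 1.
    """
    x = list(x0)
    for k in range(iters):
        grad = [h * xi for h, xi in zip(curvatures, x)]
        v = clip_to_ball(raw_update(k, x), c * rho ** k)
        x = [xi - step * gi + vi for xi, gi, vi in zip(x, grad, v)]
    return x


def hypothetical_learned_update(k, x):
    """Stand-in for a trained network: an arbitrary, even adversarial, proposal."""
    return [5.0 * math.sin(k), 5.0 * math.cos(k)]


if __name__ == "__main__":
    # Curvatures 1 and 10, so L = 10 and the baseline step is 1/L = 0.1.
    x = learned_gradient_descent([3.0, -2.0], [1.0, 10.0], 0.1, hypothetical_learned_update)
    print(math.sqrt(sum(xi * xi for xi in x)))  # tiny, despite the adversarial updates
```

The baseline contracts the error by 0.9 per step here, so with rho = 0.95 the error bound decays roughly like 0.95^k; any rho < 1 keeps the rate linear.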

2507.06219 2026-06-05 cs.RO cs.AI cs.LG Updated version

Is Diversity All You Need for Scalable Robotic Manipulation?

多样性是否是可扩展机器人操作的全部需求？

Modi Shi, Li Chen, Jin Chen, Yuxiang Lu, Chiming Liu, Guanghui Ren, Ping Luo, Di Huang, Maoqing Yao, Hongyang Li

Affiliations: University of Science and Technology of China（中国科学技术大学）； Tsinghua University（清华大学）； University of California, Berkeley（加州大学伯克利分校）

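AI summary: The paper studies the role of data diversity in robot learning. It finds that task diversity matters more than the number of demonstrations per task, that multi-embodiment pretraining data is optional for cross-embodiment transfer, and that expert diversity can confound policy learning; it proposes a distribution debiasing method that improves performance.

Comments Code is available at https://github.com/OpenDriveLab/AgiBot-World

Chinese abstract (AI translation)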

数据扩展在自然语言处理（NLP）和计算机视觉（CV）的基础模型中取得了显著成功，但机器人操作中有效数据扩展的原则仍未得到充分理解。本文通过考察三个关键维度，即任务（做什么）、本体（使用哪种机器人）和专家（由谁演示），研究数据多样性在机器人学习中的微妙作用，并对“越多样越好”的传统直觉提出挑战。通过在多种机器人平台上的大量实验，我们发现：（1）任务多样性比单任务演示数量更重要，有助于从多样的预训练任务迁移到新的下游场景；（2）多本体预训练数据对于跨本体迁移并非必需，在高质量单本体数据上训练的模型可以高效迁移到不同平台，并在微调过程中表现出比多本体预训练模型更理想的扩展特性；（3）专家多样性源于个体操作偏好和人类演示中的随机变化，可能对策略学习产生混淆，其中速度多模态是一个关键因素。基于这一洞察，我们提出了一种分布去偏方法以缓解速度模糊性，由此得到的GO-1-Pro实现了15%的显著性能提升，相当于使用2.5倍的预训练数据。总体而言，这些发现提供了新的视角，并为如何有效扩展机器人操作数据集提供了实用指导。

English abstract

Data scaling has driven remarkable success in foundation models for Natural Language Processing (NLP) and Computer Vision (CV), yet the principles of effective data scaling in robotic manipulation remain insufficiently understood. In this work, we investigate the nuanced role of data diversity in robot learning by examining three critical dimensions (task: what to do; embodiment: which robot to use; and expert: who demonstrates), challenging the conventional intuition that "more diverse is better". Through extensive experiments on various robot platforms, we reveal that (1) task diversity proves more critical than per-task demonstration quantity, benefiting transfer from diverse pre-training tasks to novel downstream scenarios; (2) multi-embodiment pre-training data is optional for cross-embodiment transfer: models trained on high-quality single-embodiment data can efficiently transfer to different platforms, showing more desirable scaling properties during fine-tuning than multi-embodiment pre-trained models; and (3) expert diversity, arising from individual operational preferences and stochastic variations in human demonstrations, can be confounding to policy learning, with velocity multimodality emerging as a key contributing factor. Based on this insight, we propose a distribution debiasing method to mitigate velocity ambiguity; the resulting GO-1-Pro achieves substantial performance gains of 15%, equivalent to using 2.5 times the pre-training data. Collectively, these findings provide new perspectives and offer practical guidance on how to scale robotic manipulation datasets effectively.

2501.14291 2026-06-05 cs.LG stat.ML Updated version

FATE: Focal Modulated Attention Encoder for Multivariate Time-Series Forecasting

FATE：用于多变量时间序列预测的焦点调制注意力编码器

Tajamul Ashraf, Janibul Bashir

Affiliations: GAASH Research Lab（GAASH研究实验室）； Department of Information Technology（信息技术系）； National Institute of Technology Srinagar（斯利那加国立理工学院）

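AI summary: The paper proposes FATE, a new Transformer architecture for reliable multivariate time-series forecasting. FATE introduces a tensorized focal modulation mechanism that explicitly captures spatiotemporal correlations in time series and adds two modulation scores for interpretability; benchmarks on seven diverse real-world datasets show superior performance on long-horizon multivariate meteorological data.

Chinese abstract (AI translation)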

气候变化是21世纪最紧迫的全球挑战之一，其深远后果包括海平面上升、冰川融化以及日益极端的天气模式。准确的预测对于监测这些现象和支持缓解策略至关重要。尽管近期用于时间序列预测的数据驱动模型（包括CNN、RNN和基于注意力的Transformer）已显示出潜力，但它们往往难以处理序列依赖性且并行性有限，尤其是在长视界、多变量气象数据集上。在本文中，我们提出了焦点调制注意力编码器（Focal Modulated Attention Encoder，FATE），一种为可靠的多变量时间序列预测而设计的新型Transformer架构。与传统模型不同，FATE引入了张量化的焦点调制机制，以显式捕捉时间序列数据中的时空相关性。我们进一步提出了两个调制分数，通过突出影响预测的关键环境特征来提供可解释性。我们在七个不同的现实世界数据集（包括ETTh1、ETTm2、Traffic、Weather5k、USA-Canada、Europe和LargeST）上对FATE进行基准测试，结果表明它始终优于所有最先进的方法，在温度数据集上亦然。我们的消融研究还表明，FATE能够很好地推广到更广泛的多变量时间序列预测任务。

English abstract

Climate change stands as one of the most pressing global challenges of the twenty-first century, with far-reaching consequences such as rising sea levels, melting glaciers, and increasingly extreme weather patterns. Accurate forecasting is critical for monitoring these phenomena and supporting mitigation strategies. While recent data-driven models for time-series forecasting, including CNNs, RNNs, and attention-based transformers, have shown promise, they often struggle with sequential dependencies and limited parallelization, especially in long-horizon, multivariate meteorological datasets. In this work, we present Focal Modulated Attention Encoder (FATE), a novel transformer architecture designed for reliable multivariate time-series forecasting. Unlike conventional models, FATE introduces a tensorized focal modulation mechanism that explicitly captures spatiotemporal correlations in time-series data. We further propose two modulation scores that offer interpretability by highlighting critical environmental features influencing predictions. We benchmark FATE across seven diverse real-world datasets, including ETTh1, ETTm2, Traffic, Weather5k, USA-Canada, Europe, and LargeST datasets, and show that it consistently outperforms all state-of-the-art methods, including on temperature datasets. Our ablation studies also demonstrate that FATE generalizes well to broader multivariate time-series forecasting tasks.

2506.11973 2026-06-05 cs.LG Updated version

Self-Regulating Cars: Automating Traffic Control in Free Flow Road Networks

自调节汽车：自由流道路网络中的自动化交通控制

Ankit Bhardwaj, Rohail Asim, Sachin Chauhan, Yasir Zaki, Lakshminarayanan Subramanian

Affiliations: Department of Computer Science（计算机科学系）； New York University（纽约大学）； Indian Institute of Technology Delhi（印度理工学院德里分校）

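AI summary: The paper proposes self-regulating cars, a reinforcement-learning approach that dynamically modulates vehicle speeds to optimize throughput and prevent congestion without new infrastructure. Combining classical traffic flow theory with microscopic simulation, it increases throughput and reduces delays and stops in the high-fidelity PTV Vissim simulator.

DOI: 10.1609/aaai.v40i45.41163

Chinese abstract (AI translation)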

自由流道路网络，如郊区高速公路，由于通勤流量增加和基础设施有限，正越来越多地经历交通拥堵。传统控制机制，如交通信号或局部启发式方法，在这些高速、无信号的环境中效果不佳或不可行。我们引入了自调节汽车，一种基于强化学习的交通控制协议，通过动态调节车辆速度来优化通行能力和防止拥堵，而无需新的物理基础设施。我们的方法将经典交通流理论、间隙接受模型和微观模拟整合到一个物理指导的强化学习框架中。通过将道路抽象为超段，智能体捕捉到涌现的流量动态，并从即时交通观测中学习稳健的速度调节策略。在高保真度的PTV Vissim模拟器上，我们的方法在真实世界高速公路网络中实现了比无控制设置提高5%的总通行能力，减少13%的平均延误，以及减少3%的总停车次数。它还实现了更平滑、抗拥堵的流量，同时在各种交通模式中泛化，展示了其在可扩展的ML驱动交通管理中的潜力。

English abstract

Free-flow road networks, such as suburban highways, are increasingly experiencing traffic congestion due to growing commuter inflow and limited infrastructure. Traditional control mechanisms, such as traffic signals or local heuristics, are ineffective or infeasible in these high-speed, signal-free environments. We introduce self-regulating cars, a reinforcement learning-based traffic control protocol that dynamically modulates vehicle speeds to optimize throughput and prevent congestion, without requiring new physical infrastructure. Our approach integrates classical traffic flow theory, gap acceptance models, and microscopic simulation into a physics-informed RL framework. By abstracting roads into super-segments, the agent captures emergent flow dynamics and learns robust speed modulation policies from instantaneous traffic observations. Evaluated in the high-fidelity PTV Vissim simulator on a real-world highway network, our method improves total throughput by 5%, reduces average delay by 13%, and decreases total stops by 3% compared to the no-control setting. It also achieves smoother, congestion-resistant flow while generalizing across varied traffic patterns, demonstrating its potential for scalable, ML-driven traffic management.

2506.11042 2026-06-05 cs.LG Updated version

GenFT: A Generative Parameter-Efficient Fine-Tuning Method for Pretrained Foundation Models

GenFT：一种用于预训练基础模型的生成性参数高效微调方法

Guangning Xu, Baoquan Zhang, Michael K. Ng

Affiliations: Department of Mathematics, Hong Kong Baptist University, Hong Kong, China（中国香港浸会大学数学系）； Department of Computer Science, Harbin Institute of Technology, Shenzhen, China（中国哈尔滨工业大学（深圳）计算机科学系）

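AI summary: The paper proposes GenFT, a parameter-efficient fine-tuning method conditioned on the pretrained weights, which generates task-specific updates that exploit the structural information in those weights to fine-tune models efficiently.

Comments Paper accepted at ICANN 2026

Chinese abstract (AI translation)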

参数高效微调（PEFT）已作为一种资源高效的策略，通过学习少量任务特定的更新ΔW来适应预训练基础模型（PFMs）。现有方法往往在很大程度上独立于预训练权重W₀，或主要通过初始化或简单的重参数化来利用W₀。为了进一步利用W₀中编码的结构信息，我们提出生成性参数高效微调（GenFT），一种基于W₀的PEFT方法，使用确定性权重生成器生成任务特定的更新。具体而言，GenFT通过行和列变换与非线性激活来从W₀中提取结构化模式，并引入共享-特定分解以平衡跨层信息重用和层特定的灵活性。GenFT简单且参数高效，在NLP和CV基准上实现了竞争性或更优的平均性能。我们进一步在LLaMA-7B上进行试点研究，以检验其在生成模型中的可行性。代码可在GitHub https://github.com/xuguangning1218/GenFT 上获得。

English abstract

Parameter-efficient fine-tuning (PEFT) has emerged as a resource-efficient strategy for adapting Pretrained Foundation Models (PFMs) by learning a small number of task-specific updates $ΔW$. Existing methods often learn $ΔW$ largely independently of pretrained weights $W_0$, or exploit $W_0$ mainly through initialization or simple reparameterization. To further leverage the structural information encoded in $W_0$, we propose Generative Parameter-Efficient Fine-Tuning (GenFT), a $W_0$-conditioned PEFT method that uses a deterministic weight generator to produce task-specific updates. Specifically, GenFT performs row and column transformations with nonlinear activations to extract structured patterns from $W_0$, and introduces a shared-specific decomposition to balance cross-layer information reuse and layer-specific flexibility. GenFT is simple and parameter-efficient, achieving competitive or better average performance across NLP and CV benchmarks. We further provide a pilot study on LLaMA-7B to examine its feasibility for generative models. The code is available at GitHub https://github.com/xuguangning1218/GenFT.

2506.00188 2026-06-05 cs.LG stat.ML Updated version

Cluster-Aware Causal Mixer for Online Anomaly Detection in Multivariate Time Series

面向多变量时间序列在线异常检测的聚类感知因果混合器

Md Mahmuddun Nabi Murad, Yasin Yilmaz

Affiliations: University of California, Berkeley（加州大学伯克利分校）

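AI summary: The paper proposes a cluster-aware causal mixer for online anomaly detection in multivariate time series. It handles inter-channel correlations by clustering channels, preserves temporal causality with a causal mixer, and develops a sequential anomaly-scoring method to improve detection accuracy.

Chinese abstract (AI translation)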

在时间序列数据中及早且准确地检测异常至关重要，因为误检和漏检都会带来重大风险。虽然基于MLP的混合器模型在时间序列分析中显示出潜力，但它们在数据处理过程中不保持时间因果性。此外，现实中的多变量时间序列通常包含大量通道，通道间相关性多种多样。重构时间序列中的虚假相关性会导致带噪声的表示，从而导致异常检测不准确。另外，忽略时间连续性的异常评分方法可能会误导序列检测。为应对这些挑战，我们提出了一种用于多变量时间序列异常检测的聚类感知因果混合器。各通道根据其相关性被分组为簇，每个簇通过专用嵌入层进行嵌入。我们引入因果混合器，在整合信息的同时保持时间因果性。我们进一步开发了一种序列异常评分方法，该方法随时间累积证据并细化异常边界。所提出的模型以在线方式运行，因而适用于实时时间序列异常检测。在六个公开基准数据集上的实验评估表明，所提出的方法始终取得更优的性能。

English abstract

Early and accurate detection of anomalies in time-series data is critical due to the substantial risks associated with false or missed detections. While MLP-based mixer models have shown promise in time-series analysis, they do not maintain temporal causality during data processing. Moreover, real-world multivariate time series often contain numerous channels with diverse inter-channel correlations. Spurious correlations in the reconstructed time series lead to noisy representations, resulting in inaccurate anomaly detection. In addition, anomaly scoring methods that ignore temporal continuity can mislead sequential detection. To address these challenges, we propose a cluster-aware causal mixer for multivariate time-series anomaly detection. Channels are grouped into clusters based on their correlations, and each cluster is embedded through a dedicated embedding layer. A causal mixer is introduced to integrate information while maintaining temporal causality. We further develop a sequential anomaly-scoring method that accumulates evidence over time and refines anomaly boundaries. Our proposed model operates in an online fashion, making it suitable for real-time time-series anomaly detection. Experimental evaluations across six public benchmark datasets demonstrate that the proposed approach consistently achieves superior performance.
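
Illustrative sketch (not from the paper): two ingredients named in the abstract, grouping channels by correlation and mixing along time without looking ahead, can be sketched in a few lines. The greedy threshold clustering and the dense lower-triangular mixing weights are our assumptions; the paper's actual clustering procedure, per-cluster embeddings, and sequential anomaly scoring are not reproduced.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0.0 and sb > 0.0 else 0.0


def cluster_channels(series, threshold=0.8):
    """Greedy grouping: a channel joins the first cluster whose seed channel it
    correlates with (in absolute value) at least `threshold`."""
    clusters = []
    for idx, s in enumerate(series):
        for cluster in clusters:
            if abs(pearson(series[cluster[0]], s)) >= threshold:
                cluster.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


def causal_mix(x, weights):
    """Time mixing y[t] = sum_{s <= t} weights[t][s] * x[s]; weights with s > t are ignored."""
    return [sum(weights[t][s] * x[s] for s in range(t + 1)) for t in range(len(x))]


if __name__ == "__main__":
    series = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 1, 4, 2, 3]]
    print(cluster_channels(series))  # [[0, 1], [2]]
    ones = [[1.0] * 3 for _ in range(3)]
    print(causal_mix([1.0, 2.0, 3.0], ones))  # [1.0, 3.0, 6.0]
```

Because `causal_mix` never reads x[s] for s > t, changing a future value leaves all earlier outputs unchanged, which is what makes online (streaming) scoring possible.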

2505.16311 2026-06-05 stat.ML cs.LG stat.ME Updated version

Generator-Mediated Bandits: Thompson Sampling for GenAI-Powered Adaptive Interventions

生成器介导的老虎机：面向生成式人工智能驱动的自适应干预的汤普森采样

Marc Brooks, Gabriel Durham, Kihyuk Hong, Ambuj Tewari

Affiliations: Department of Statistics, University of Michigan, Ann Arbor, MI (USA)（密歇根大学统计学系，美国密歇根州安娜堡）

AI总结本文提出了一种生成器介导的老虎机算法（GAMBITTS），用于解决生成式人工智能（GenAI）驱动的自适应干预问题。该算法通过建模干预内容（treatment）与奖励的生成过程，利用观测到的干预内容信息加速策略学习，并在模拟研究中优于传统算法。

Comments 39 pages, 12 figures

详情

Journal ref: Advances in Neural Information Processing Systems 38 (NeurIPS 2025)

AI中文摘要

近期生成式人工智能（GenAI）模型的进步，使得生成能够适应最新用户情境的个性化内容成为可能。尽管个性化决策系统通常采用老虎机建模，但GenAI的引入为原本经典的序列学习问题带来了新的结构。在GenAI驱动的干预中，智能体选择查询，而环境接收到的是由生成模型产生的随机响应。标准老虎机方法并未显式考虑这种结构：动作仅通过随机且可观测的干预内容（treatment）影响奖励。我们提出生成器介导的老虎机-汤普森采样（GAMBITTS），一种针对这种动作/干预分离而设计的老虎机方法，并以移动健康干预中由大型语言模型生成的文本作为动机案例。GAMBITTS显式建模干预生成与奖励生成两个过程，利用已交付干预中的信息，相比标准方法加速策略学习。我们通过分解干预和奖励中的不确定性来源，建立了GAMBITTS的遗憾界，并确定了其获得强于标准老虎机方法的理论保证的条件。在模拟研究中，GAMBITTS利用观测到的干预更准确地估计期望奖励，始终优于传统算法。

英文摘要

Recent advances in generative artificial intelligence (GenAI) models have enabled the generation of personalized content that adapts to up-to-date user context. While personalized decision systems are often modeled using bandit formulations, the integration of GenAI introduces new structure into otherwise classical sequential learning problems. In GenAI-powered interventions, the agent selects a query, but the environment experiences a stochastic response drawn from the generative model. Standard bandit methods do not explicitly account for this structure, where actions influence rewards only through stochastic, observed treatments. We introduce generator-mediated bandit-Thompson sampling (GAMBITTS), a bandit approach designed for this action/treatment split, using mobile health interventions with large language model-generated text as a motivating case study. GAMBITTS explicitly models both the treatment and reward generation processes, using information in the delivered treatment to accelerate policy learning relative to standard methods. We establish regret bounds for GAMBITTS by decomposing sources of uncertainty in treatment and reward, identifying conditions where it achieves stronger guarantees than standard bandit approaches. In simulation studies, GAMBITTS consistently outperforms conventional algorithms by leveraging observed treatments to more accurately estimate expected rewards.
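
The sketch below makes the action/treatment split concrete with a simplified two-stage Thompson sampler. It is our illustration, not the GAMBITTS algorithm from the paper. We assume three things:

- Each query yields a binary treatment (for instance, whether a generated message has some property) with unknown probability `p[a]`.
- The reward depends only on that treatment, with unknown probability `q[t]`.
- Both models use Beta(1, 1) priors.

Because the reward model is indexed by the observed treatment rather than by the action, every round informs the reward estimate for all queries.

```python
import random
from typing import List


class GeneratorMediatedTS:
    """Simplified two-stage Thompson sampling (illustrative assumption):
    action a -> binary treatment T ~ Bernoulli(p[a]) -> reward R ~ Bernoulli(q[T])."""

    def __init__(self, n_actions: int, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        # Beta(1, 1) priors stored as [alpha, beta]
        self.treat_ab = [[1.0, 1.0] for _ in range(n_actions)]  # one per action
        self.reward_ab = [[1.0, 1.0], [1.0, 1.0]]              # one per treatment value

    def select(self) -> int:
        # Sample the shared reward model once, then each action's treatment model
        q = [self.rng.betavariate(a, b) for a, b in self.reward_ab]
        scores = []
        for a, b in self.treat_ab:
            p = self.rng.betavariate(a, b)
            scores.append(p * q[1] + (1 - p) * q[0])  # E[reward | action] under the sample
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, action: int, treatment: int, reward: int) -> None:
        # The delivered treatment updates this action's treatment model ...
        self.treat_ab[action][0 if treatment else 1] += 1
        # ... and the reward model, which is shared by all actions
        self.reward_ab[treatment][0 if reward else 1] += 1


def simulate(p_true: List[float], q_true: List[float], rounds: int = 2000, seed: int = 1) -> List[int]:
    """Run the agent against a synthetic environment; returns pulls per action."""
    env = random.Random(seed)
    agent = GeneratorMediatedTS(len(p_true), seed=seed)
    counts = [0] * len(p_true)
    for _ in range(rounds):
        a = agent.select()
        t = int(env.random() < p_true[a])
        r = int(env.random() < q_true[t])
        agent.update(a, t, r)
        counts[a] += 1
    return counts
```

For example, `simulate([0.1, 0.9], [0.1, 0.9])` should concentrate pulls on action 1. That action's expected reward is 0.82, against 0.18 for action 0.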

URL PDF HTML ☆

赞 0 踩 0

2310.04649 2026-06-05 cs.LG 版本更新

Uncovering Model Processing Strategies with Non-Negative Per-Example Fisher Factorization

通过非负每例费舍尔分解揭示模型处理策略

Michael Matena, Colin Raffel

发表机构 * University of North Carolina Chapel Hill（北卡罗来纳大学教堂山分校）； University of Toronto（多伦多大学）； Vector Institute（向量研究所）

AI总结本文提出NPEFF方法，通过分解每例费舍尔矩阵揭示模型生成预测所用的策略，展示了NPEFF组件在语言模型和文本处理任务中的应用，并展示了如何通过扰动这些组件来干扰模型处理，同时通过消融研究和实验验证了NPEFF在分析和缓解去学习的副作用以及研究上下文学习中的优势。

详情

AI中文摘要

我们引入NPEFF（非负每例费舍尔分解），一种可解释性方法，旨在揭示模型生成预测时所使用的策略。NPEFF使用一种新颖的分解算法来分解每例费舍尔矩阵，该算法学习一组由秩-1半正定矩阵表示的组件。通过结合人类评估和自动化分析，我们证明这些NPEFF组件对应于多种语言模型和文本处理任务中的模型处理策略。我们进一步展示了如何从NPEFF组件构建参数扰动，以选择性地破坏某一组件在模型处理中的作用。除了进行广泛的消融研究外，我们还通过实验展示了如何利用NPEFF分析和缓解遗忘（unlearning）的附带效应，以及如何利用NPEFF研究上下文学习。此外，我们展示了NPEFF相对于梯度聚类、以及在模型激活上使用稀疏自编码器进行字典学习等基线方法的优势。我们发布了本工作所用的代码。

英文摘要

We introduce NPEFF (Non-Negative Per-Example Fisher Factorization), an interpretability method that aims to uncover strategies used by a model to generate its predictions. NPEFF decomposes per-example Fisher matrices using a novel decomposition algorithm that learns a set of components represented by learned rank-1 positive semi-definite matrices. Through a combination of human evaluation and automated analysis, we demonstrate that these NPEFF components correspond to model processing strategies for a variety of language models and text processing tasks. We further show how to construct parameter perturbations from NPEFF components to selectively disrupt a given component's role in the model's processing. Along with conducting extensive ablation studies, we include experiments to show how NPEFF can be used to analyze and mitigate collateral effects of unlearning and use NPEFF to study in-context learning. Furthermore, we demonstrate the advantages of NPEFF over baselines such as gradient clustering and using sparse autoencoders for dictionary learning over model activations. We release the code used in this work.
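
The sketch below gives a rough picture of the factorization idea. Its simplifying assumptions differ from the paper:

- It uses a logistic model, whose per-example Fisher diagonal is `p(1 - p) * x_j**2`.
- It factorizes the resulting non-negative matrix with plain Lee-Seung multiplicative-update NMF.

NPEFF itself learns components represented as rank-1 PSD matrices. Treat this diagonal variant only as an illustration of why a non-negative factorization of per-example Fisher information gives additive components. All function names are ours.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(r) for r in zip(*a)]


def diag_fisher(w: List[float], xs: Matrix) -> Matrix:
    """Per-example diagonal Fisher of a logistic model p = sigmoid(w . x):
    F_jj = p (1 - p) x_j**2, which is always non-negative."""
    rows = []
    for x in xs:
        p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
        rows.append([p * (1 - p) * xj * xj for xj in x])
    return rows


def nmf(a: Matrix, rank: int, iters: int = 200, seed: int = 0,
        eps: float = 1e-12) -> Tuple[Matrix, Matrix]:
    """Lee-Seung multiplicative updates for min ||A - W H||_F^2 with W, H >= 0.
    Rows of W: per-example component weights; rows of H: component patterns."""
    rng = random.Random(seed)
    n, d = len(a), len(a[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(d)] for _ in range(rank)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, a), matmul(matmul(Wt, W), H)
        H = [[h * nu / (de + eps) for h, nu, de in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        Ht = transpose(H)
        num, den = matmul(a, Ht), matmul(W, matmul(H, Ht))
        W = [[v * nu / (de + eps) for v, nu, de in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


def frob_err(a: Matrix, W: Matrix, H: Matrix) -> float:
    """Squared Frobenius reconstruction error ||A - W H||_F^2."""
    wh = matmul(W, H)
    return sum((x - y) ** 2 for ra, rb in zip(a, wh) for x, y in zip(ra, rb))
```

The multiplicative updates never change the sign of an entry, so `W` and `H` stay non-negative throughout.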

URL PDF HTML ☆

赞 0 踩 0

2505.02540 2026-06-05 cs.LG cs.AI 版本更新

Lazy But Effective: Collaborative Personalized Federated Learning with Heterogeneous Data

懒惰但有效：基于异构数据的协同个性化联邦学习

Ljubomir Rokvic, Panayiotis Danassis, Boi Faltings

发表机构 * Artificial Intelligence Laboratory EPFL（洛桑联邦理工学院人工智能实验室）； Telenor Research（Telenor研究）

AI总结本文提出了一种简单有效的个性化联邦学习框架pFedLIA，利用一种计算高效的影响近似方法'Lazy Influence'，在模型聚合之前以分布式方式对客户端进行聚类，使每个聚类内的客户端协同训练能够捕捉其特定数据模式的模型。实验表明，该方法能在非IID数据上有效恢复全局模型的性能损失，并在多个基准任务上优于现有基线方法。

Comments Accepted at the International Joint Conference on Neural Networks (IJCNN), IEEE, 2025

详情

DOI: 10.1109/IJCNN64981.2025.11228646

AI中文摘要

在联邦学习中，客户端数据分布的异质性往往意味着单一全局模型无法为各个客户端提供最佳性能。例如，在为键盘训练下一个词预测模型时，由人口统计特征（方言、年龄等）、语言熟练度和书写风格导致的用户特定语言模式，会使客户端之间的数据集高度非IID。其他例子包括用不同机器拍摄的医学图像，或来自不同车辆类型的驾驶数据。为了解决这一问题，我们提出了一种简单而有效的个性化联邦学习框架（pFedLIA），该框架利用一种计算高效的影响近似方法'Lazy Influence'，在模型聚合前以分布式方式对客户端进行聚类。在每个聚类内，数据所有者协同训练一个能够捕捉这些客户端特定数据模式的模型。我们的方法在多种合成和真实世界设置中（特别是北欧语言的下一个词预测任务以及多个基准任务）成功恢复了由非IID性导致的全局模型性能下降。其性能与假设的Oracle聚类相当，并显著优于现有基线方法，例如在CIFAR100上提升了17%。

英文摘要

In Federated Learning, heterogeneity in client data distributions often means that a single global model does not have the best performance for individual clients. Consider for example training a next-word prediction model for keyboards: user-specific language patterns due to demographics (dialect, age, etc.), language proficiency, and writing style result in a highly non-IID dataset across clients. Other examples are medical images taken with different machines, or driving data from different vehicle types. To address this, we propose a simple yet effective personalized federated learning framework (pFedLIA) that utilizes a computationally efficient influence approximation, called 'Lazy Influence', to cluster clients in a distributed manner before model aggregation. Within each cluster, data owners collaborate to jointly train a model that captures the specific data patterns of the clients. Our method has been shown to successfully recover the global model's performance drop due to the non-IID-ness in various synthetic and real-world settings, specifically a next-word prediction task on the Nordic languages as well as several benchmark tasks. It matches the performance of a hypothetical Oracle clustering, and significantly improves on existing baselines, e.g., an improvement of 17% on CIFAR100.

URL PDF HTML ☆

赞 0 踩 0

2411.18343 2026-06-05 cs.LG cs.AI 版本更新

Comprehensive and Reliable Feature Attribution for Diverse Modalities and Models via Frequency-Domain Insights

通过频域见解实现多样化模态和模型的全面可靠特征归因

Zechen Liu, Feiyang Zhang, Wei Song, Xiang Li, Wei Wei

发表机构 * School of Computational Science, Wuhan University（武汉大学计算科学学院）； Brain Research Center, Wuhan University（武汉大学脑科学研究中心）； College of Information Science and Technology (School of Cyber Science and Technology), Shihezi University（石河子大学信息科学学院（网络安全科学与技术学院））； Xinjiang Production and Construction Corps Key Laboratory of Computing Intelligence and Network Information Security Open Fund（新疆生产建设兵团计算智能与网络信息安全重点实验室开放基金）

AI总结本文提出了一种新的可解释性方法FreqX，结合信号处理和信息理论，以解决个性化联邦学习中非IID数据、异构设备、缺乏公平性和贡献不明确等问题，通过频域分析提高解释性效率和准确性。

Comments 16 pages, 9 figures

2503.11910 2026-06-05 cs.LG cs.AI math.AT math.SG 版本更新

RTD-Lite: Scalable Topological Analysis for Comparing Weighted Graphs in Learning Tasks

RTD-Lite：用于学习任务中比较加权图拓扑结构的可扩展分析

Eduard Tulchinskii, Daria Voronkova, Ilya Trofimov, Evgeny Burnaev, Serguei Barannikov

发表机构 * Skoltech, AI Foundation and Algorithm Lab（斯科尔科沃科学技术研究院，人工智能基础与算法实验室）； Skoltech, AIRI（斯科尔科沃科学技术研究院，人工智能研究所AIRI）； Skoltech, CNRS（斯科尔科沃科学技术研究院，法国国家科学研究中心）

AI总结本文提出RTD-Lite算法，通过最小生成树辅助图在O(n²)时间内高效比较加权图的拓扑特征，适用于降维和神经网络训练等任务，实验表明其在识别拓扑差异和减少计算时间方面优于现有方法。

Comments Accepted for AISTATS 2025

详情

AI中文摘要

用于比较加权图的拓扑方法在各种学习任务中很有价值，但在大规模数据集上通常计算效率低下。我们提出了RTD-Lite，一种可扩展的算法，能够高效比较两个顶点一一对应的加权图的拓扑特征，特别是任意尺度下的连通性或聚类结构。通过在辅助图中使用最小生成树，RTD-Lite以O(n²)的时间和内存复杂度捕捉拓扑差异。这种效率使其适用于降维和神经网络训练等任务。在合成和真实数据集上的实验表明，RTD-Lite能够有效识别拓扑差异，并且与现有方法相比显著减少了计算时间。此外，将RTD-Lite作为损失函数的一部分整合到神经网络训练中，可以增强学习到的表示对拓扑结构的保持。我们的代码在https://github.com/ArGintum/RTD-Lite上公开可用。

英文摘要

Topological methods for comparing weighted graphs are valuable in various learning tasks but often suffer from computational inefficiency on large datasets. We introduce RTD-Lite, a scalable algorithm that efficiently compares topological features, specifically connectivity or cluster structures at arbitrary scales, of two weighted graphs with one-to-one correspondence between vertices. Using minimal spanning trees in auxiliary graphs, RTD-Lite captures topological discrepancies with $O(n^2)$ time and memory complexity. This efficiency enables its application in tasks like dimensionality reduction and neural network training. Experiments on synthetic and real-world datasets demonstrate that RTD-Lite effectively identifies topological differences while significantly reducing computation time compared to existing methods. Moreover, integrating RTD-Lite into neural network training as a loss function component enhances the preservation of topological structures in learned representations. Our code is publicly available at https://github.com/ArGintum/RTD-Lite
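
The O(n^2) cost comes from working with minimum spanning trees of dense graphs. The sketch below shows that building block. Prim's algorithm on a dense weight matrix runs in O(n^2), and the sorted MST edge weights are the death times of the 0-dimensional persistence barcode (connected components) of the graph's edge filtration.

The comparison function `h0_discrepancy` is a hypothetical proxy chosen for illustration. It contrasts graph A with the element-wise minimum of A and B. It is not the RTD-Lite formula, whose exact auxiliary-graph construction is given in the paper and repository.

```python
import math
from typing import List

Matrix = List[List[float]]


def mst_weights(dist: Matrix) -> List[float]:
    """Prim's algorithm on a dense symmetric weight matrix in O(n^2) time.
    The sorted MST edge weights equal the death times of the H0 (connected
    component) barcode of the graph's edge-weight filtration."""
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n
    best[0] = 0.0
    weights = []
    for step in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if step > 0:  # the first vertex enters without an edge
            weights.append(best[u])
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
    return sorted(weights)


def h0_discrepancy(a: Matrix, b: Matrix) -> float:
    """Hypothetical proxy (not the RTD-Lite formula): how much earlier the
    clusters of A merge once B's edges are also available, using the
    auxiliary graph with element-wise minimum weights."""
    aux = [[min(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
    return sum(mst_weights(a)) - sum(mst_weights(aux))
```

Every edge of the auxiliary graph is no heavier than the corresponding edge of A, so the proxy is never negative. It is zero when B adds no earlier merges.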

URL PDF HTML ☆

赞 0 踩 0

2502.20914 2026-06-05 cs.LG cs.AI cs.CL 版本更新

Everything, Everywhere, All at Once: Is Mechanistic Interpretability Identifiable?

Maxime Méloux, Silviu Maniu, François Portet, Maxime Peyrard

发表机构 * Université Grenoble Alpes, CNRS, Grenoble INP, LIG（格勒诺布尔-阿尔卑斯大学、法国国家科学研究中心、格勒诺布尔理工学院、格勒诺布尔信息学实验室LIG）

AI总结本文探讨了在机械可解释性（MI）框架下，给定行为是否具有唯一解释的问题，通过统计可识别性理论分析了MI解释的可识别性，并提出了两种主要策略及实验结果。

详情

Journal ref: The Thirteenth International Conference on Learning Representations (ICLR 2025)

AI中文摘要

随着AI系统应用于高风险领域，确保可解释性至关重要。机械可解释性（MI）旨在通过提取人类可理解的算法来解释神经网络的行为。本文探讨了一个关键问题：在给定行为下，根据MI的标准，是否存在唯一的解释？借鉴统计学中的可识别性，其中参数在特定假设下可以唯一推断，我们探索了MI解释的可识别性。我们识别出两种主要的MI策略：（1）“where-then-what”，通过隔离复制模型行为的电路并在之后解释它；（2）“what-then-where”，从候选算法开始，通过因果对齐搜索实现它们的神经激活子空间。我们对布尔函数和小型多层感知机测试了这两种策略，完全枚举了候选解释。实验揭示了系统性的不可识别性：多个电路可以复制行为，一个电路可以有多种解释，多个算法可以与网络对齐，一个算法可以与不同的子空间对齐。是否需要唯一性？一种务实的方法可能只需要预测性和可操作性标准。如果唯一性对理解至关重要，可能需要更严格的条件。我们还参考了内部可解释性框架，该框架通过多种标准验证解释。本文为定义AI中的解释标准做出了贡献。

英文摘要

As AI systems are used in high-stakes applications, ensuring interpretability is crucial. Mechanistic Interpretability (MI) aims to reverse-engineer neural networks by extracting human-understandable algorithms to explain their behavior. This work examines a key question: for a given behavior, and under MI's criteria, does a unique explanation exist? Drawing on identifiability in statistics, where parameters are uniquely inferred under specific assumptions, we explore the identifiability of MI explanations. We identify two main MI strategies: (1) "where-then-what," which isolates a circuit replicating model behavior before interpreting it, and (2) "what-then-where," which starts with candidate algorithms and searches for neural activation subspaces implementing them, using causal alignment. We test both strategies on Boolean functions and small multi-layer perceptrons, fully enumerating candidate explanations. Our experiments reveal systematic non-identifiability: multiple circuits can replicate behavior, a circuit can have multiple interpretations, several algorithms can align with the network, and one algorithm can align with different subspaces. Is uniqueness necessary? A pragmatic approach may require only predictive and manipulability standards. If uniqueness is essential for understanding, stricter criteria may be needed. We also reference the inner interpretability framework, which validates explanations through multiple criteria. This work contributes to defining explanation standards in AI.
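
The toy below illustrates the "where-then-what" non-identifiability. The hand-built network computes XOR with two redundant OR units and two redundant AND units. We chose these weights for illustration; they are not taken from the paper's experiments. Enumerating every subset of hidden units, with the rest zero-ablated, shows that several distinct circuits reproduce the full network's behavior exactly.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple


def step(v: float) -> int:
    return 1 if v > 0 else 0


# Toy network for XOR (weights chosen by hand for illustration).
# Hidden units 0 and 2 compute OR(x1, x2); units 1 and 3 compute AND(x1, x2).
HIDDEN = [((1.0, 1.0), -0.5), ((1.0, 1.0), -1.5), ((1.0, 1.0), -0.5), ((1.0, 1.0), -1.5)]
OUT_W = [1.0, -2.0, 1.0, -2.0]
OUT_B = -0.5
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def forward(x: Tuple[int, int], keep: FrozenSet[int]) -> int:
    """Run the network with every hidden unit outside `keep` zero-ablated."""
    total = OUT_B
    for i, ((w1, w2), b) in enumerate(HIDDEN):
        if i in keep:
            total += OUT_W[i] * step(w1 * x[0] + w2 * x[1] + b)
    return step(total)


def faithful_circuits() -> List[FrozenSet[int]]:
    """All subsets of hidden units that reproduce the full network on every input."""
    units = range(len(HIDDEN))
    target = [forward(x, frozenset(units)) for x in INPUTS]
    found = []
    for r in range(len(HIDDEN) + 1):
        for subset in combinations(units, r):
            keep = frozenset(subset)
            if [forward(x, keep) for x in INPUTS] == target:
                found.append(keep)
    return found


circuits = faithful_circuits()
# Minimal circuits: no other faithful circuit is a proper subset of them
minimal = [c for c in circuits if not any(o < c for o in circuits)]
```

Any subset containing at least one OR unit and at least one AND unit is faithful. That gives 9 circuits, 4 of which are minimal, so "the" circuit is not identifiable even in this tiny model.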

URL PDF HTML ☆

赞 0 踩 0

2502.06434 2026-06-05 cs.CV cs.LG 版本更新

Unifying Dataset Pruning and Distillation for Efficient Large-scale Compression

统一数据集剪枝与蒸馏以实现高效大规模压缩

Lingao Xiao, Songhua Liu, Yang He, Xinchao Wang

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结本文提出一个统一的数据集压缩基准，探讨数据集剪枝与蒸馏的收敛趋势，发现软标签蒸馏在小数据集上表现不如剪枝，提出基于硬标签的数据集压缩方法，通过PCA框架提升图像质量和存储效率。

Comments Accepted by ICML 2026

详情

AI中文摘要

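数据集剪枝（DP）和数据集蒸馏（DD）在输出上有根本差异：DP选择原始图像的子集，而DD生成合成图像。最近，DD对原始图像的依赖日益增加，表明两个方向趋于融合。为研究这种融合趋势，我们提出了统一的数据集压缩（DC）基准。该基准揭示了软标签DD中一个有趣的权衡：虽然软标签提供了有价值的信息，但它们可能使蒸馏过程不再那么关键，因为蒸馏图像并不总能优于随机子集。此外，基准表明在现阶段，当数据集规模较小时，数据集剪枝优于数据集蒸馏。鉴于这些观察，我们探索将硬标签DC作为一种互补方法，在强调图像质量的同时提供显著的存储效率。我们的PCA（Prune, Combine, and Augment）是首个不依赖软标签、而是聚焦图像质量的框架：（1）“P”指基于数据集剪枝指标选择简单样本，（2）“C”指有效地组合这些样本，（3）“A”指在训练期间应用受约束的图像增强。我们的代码公开于https://github.com/ArmandXiao/Unifying-Dataset-Pruning-and-Distillation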

英文摘要

Dataset pruning (DP) and dataset distillation (DD) fundamentally differ in their outputs: DP selects original image subsets, while DD generates synthetic images. Recently, DD's increasing reliance on original images suggests a convergence of the two directions. To investigate this convergence trend, we propose a unified dataset compression (DC) benchmark. This benchmark reveals an interesting trade-off for soft-label-DD: while soft labels provide valuable information, they can make the distillation process less essential, as distilled images may not always outperform random subsets. In addition, the benchmark reveals that in current stages, dataset pruning outperforms dataset distillation at small dataset sizes. Given these observations, we explore hard-label-DC as a complementary approach that emphasizes image quality while offering substantial storage efficiency. Our PCA (Prune, Combine, and Augment) is the first framework that does not rely on soft labels but instead focuses on image quality. (1) "P" means selecting easy samples based on dataset pruning metrics, (2) "C" indicates combining these samples effectively, and (3) "A" is to apply constrained image augmentation during training. Our code is available at https://github.com/ArmandXiao/Unifying-Dataset-Pruning-and-Distillation
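
The sketch below pictures the "P" and "C" steps. Everything in it is an assumption for illustration:

- The difficulty score is whatever pruning metric the caller supplies, with lower meaning easier.
- "Combining" is implemented as tiling four 2x-downsampled images of the same class into one image of the original size.

The paper defines its own combination and augmentation procedures, which may differ.

```python
from typing import Dict, List

Image = List[List[float]]  # grayscale H x W with H and W even


def select_easy(scores: Dict[int, float], labels: Dict[int, int],
                per_class: int) -> Dict[int, List[int]]:
    """"P" step (sketch): keep the `per_class` lowest-score (easiest) samples
    of every class; ties are broken by sample index."""
    by_class: Dict[int, List[int]] = {}
    for idx in sorted(scores, key=lambda i: (scores[i], i)):
        chosen = by_class.setdefault(labels[idx], [])
        if len(chosen) < per_class:
            chosen.append(idx)
    return by_class


def downsample2(img: Image) -> Image:
    """2x2 average pooling."""
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4
             for c in range(0, len(img[0]), 2)]
            for r in range(0, len(img), 2)]


def combine4(imgs: List[Image]) -> Image:
    """"C" step (assumed mosaic): tile four downsampled same-class images
    into one image with the original resolution."""
    a, b, c, d = (downsample2(i) for i in imgs)
    return [ra + rb for ra, rb in zip(a, b)] + [rc + rd for rc, rd in zip(c, d)]
```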

URL PDF HTML ☆

赞 0 踩 0

2410.04309 2026-06-05 cs.CY cs.LG 版本更新

Comprehensive Monitoring of Air Pollution Hotspots Using Sparse Sensor Networks

利用稀疏传感器网络全面监测空气污染热点

Ankit Bhardwaj, Ananth Balashankar, Shiva Iyer, Nita Soans, Anant Sudarshan, Rohini Pande, Lakshminarayanan Subramanian

发表机构 * New York University（纽约大学）； Google Research（谷歌研究）； Toyota InfoTechnology Center（丰田信息技术中心）； Kaiterra Inc（Kaiterra公司）； University of Warwick（沃里克大学）； Yale University（耶鲁大学）

AI总结本文结合预测建模与机理方法，借助新增的低成本传感器，在新德里现有传感器网络之外发现了189个隐藏热点，利用时空克里金法进行预测，并开发了高斯烟羽扩散模型以解释热点的形成机理，为资源受限环境下的空气污染管理提供了数据驱动与机理相结合的解决方案。

详情

DOI: 10.1145/3748821

AI中文摘要

城市空气污染热点对健康构成重大威胁，但其检测和分析仍受限于公共传感器网络的稀疏性。本文结合预测建模和机理方法，以全面监测污染热点。我们在新德里现有传感器网络的基础上增加了28个低成本传感器，收集了2018年5月1日至2020年11月1日共30个月的PM2.5数据。对这些数据应用已有的热点定义，我们在确认公共网络检测到的660个热点之外，还发现了189个隐藏热点。利用时空克里金法等预测技术，我们在50%传感器故障率下以95%的精确率和88%的召回率识别出隐藏热点，在缺失50%传感器的情况下达到98%的精确率和95%的召回率。我们进一步将预测模型的推算结果整理为供公共机构参考的政策建议。此外，我们开发了高斯烟羽扩散模型，并结合基于本地排放源构建的排放清单，以理解热点形成的机理。该机理模型能够解释65%的观测到的瞬时热点。我们的发现强调，在资源受限的环境中，将数据驱动的预测模型与基于物理的机理模型相结合，对于可扩展且稳健的空气污染管理非常重要。

英文摘要

Urban air pollution hotspots pose significant health risks, yet their detection and analysis remain limited by the sparsity of public sensor networks. This paper addresses this challenge by combining predictive modeling and mechanistic approaches to comprehensively monitor pollution hotspots. We enhanced New Delhi's existing sensor network with 28 low-cost sensors, collecting PM2.5 data over 30 months from May 1, 2018, to Nov 1, 2020. Applying established definitions of hotspots to this data, we found 189 additional hidden hotspots, apart from confirming the 660 hotspots detected by the public network. Using predictive techniques like Space-Time Kriging, we identified hidden hotspots with 95% precision and 88% recall at a 50% sensor failure rate, and with 98% precision and 95% recall with 50% of sensors missing. The projected results of our predictive models were further compiled into policy recommendations for public authorities. Additionally, we developed a Gaussian Plume Dispersion Model to understand the mechanistic underpinnings of hotspot formation, incorporating an emissions inventory derived from local sources. Our mechanistic model is able to explain 65% of observed transient hotspots. Our findings underscore the importance of integrating data-driven predictive models with physics-based mechanistic models for scalable and robust air pollution management in resource-constrained settings.
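
For reference, here is a sketch of the textbook steady-state Gaussian plume equation with ground reflection. The listing does not give the paper's configuration (how σ_y and σ_z are parameterized, stability classes, or the emissions inventory). The dispersion coefficients are therefore simply caller-supplied inputs, and all names are ours.

```python
import math


def gaussian_plume(q: float, u: float, y: float, z: float, h: float,
                   sigma_y: float, sigma_z: float) -> float:
    """Steady-state Gaussian plume concentration with ground reflection.

    q: emission rate (g/s); u: wind speed (m/s); y: crosswind offset (m);
    z: receptor height (m); h: effective release height (m);
    sigma_y, sigma_z: dispersion coefficients (m) at the receptor's downwind
    distance, supplied by the caller. Returns concentration in g/m^3.
    """
    if u <= 0 or sigma_y <= 0 or sigma_z <= 0:
        raise ValueError("u, sigma_y and sigma_z must be positive")
    lateral = math.exp(-y ** 2 / (2 * sigma_y ** 2))
    # Direct term plus the mirror-image source below ground (reflection)
    vertical = (math.exp(-(z - h) ** 2 / (2 * sigma_z ** 2))
                + math.exp(-(z + h) ** 2 / (2 * sigma_z ** 2)))
    return q / (2 * math.pi * u * sigma_y * sigma_z) * lateral * vertical
```

Concentration scales linearly with the emission rate and inversely with wind speed. This is one reason calm, low-wind periods favor transient hotspots.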

URL PDF HTML ☆

赞 0 踩 0

2406.08966 2026-06-05 cs.LG cs.AI 版本更新

Separation Power of Equivariant Neural Networks

等变神经网络的分离能力

Marco Pacini, Xiaowen Dong, Bruno Lepri, Gabriele Santin

发表机构 * University of Trento（特伦托大学）； Fondazione Bruno Kessler（布鲁诺·凯斯勒基金会）； University of Oxford（牛津大学）； University of Venice（威尼斯大学）

AI总结本文研究了等变神经网络的分离能力，分析了架构和超参数对分离能力的影响，发现非多项式激活函数在表达能力上等价，深度在阈值后不再提升分离能力，而隐表示的块分解会影响分离能力。

Comments Published as a conference paper at ICLR 2025

详情

Journal ref: International Conference on Learning Representations (ICLR), 2025

AI中文摘要

机器学习模型的分离能力是指其区分不同输入的能力，常被用作表达能力的代理指标。事实上，了解模型族的分离能力是获得细粒度普适性结果的必要条件。在本文中，我们分析了等变神经网络（如卷积网络和置换不变网络）的分离能力。我们首先完整刻画了由给定架构导出的模型所无法区分的输入。基于这些结果，我们推导出分离能力如何受到超参数和架构选择（如激活函数、深度、隐藏层宽度和表示类型）的影响。值得注意的是，所有非多项式激活函数（包括ReLU和Sigmoid）在表达能力上是等价的，并能达到最大分离能力。深度在达到某一阈值之前会提升分离能力，超过该阈值后继续增加深度不再有任何效果。在隐表示中添加不变特征不影响分离能力。最后，隐表示的块分解会影响分离性：最小的组件构成分离能力的层次结构，从而提供了一种直接比较模型分离能力的方法。
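
The small example below shows how the activation can limit separation power in a permutation-invariant model. It is our illustration, not the paper's characterization. With sum pooling and an identity activation, the model only sees `w * sum(xs) + len(xs) * b`, so the multisets {1, 3} and {2, 2} are indistinguishable for every `w` and `b`. A single ReLU unit separates them. The identity is a degree-1 polynomial, so it falls outside the paper's statement about non-polynomial activations.

```python
from typing import Callable, List


def invariant_net(xs: List[float], w: float, b: float,
                  act: Callable[[float], float]) -> float:
    """Permutation-invariant model: one shared neuron per element, then sum-pool."""
    return sum(act(w * x + b) for x in xs)


def identity(v: float) -> float:
    return v


def relu(v: float) -> float:
    return max(0.0, v)


A, B = [1.0, 3.0], [2.0, 2.0]  # same size and same sum, different multisets

# Identity activation: outputs coincide for every (w, b) on an integer grid
linear_separates = any(
    invariant_net(A, w, b, identity) != invariant_net(B, w, b, identity)
    for w in range(-3, 4) for b in range(-3, 4))

# ReLU with w = 1, b = -2: A gives relu(-1) + relu(1) = 1, B gives 0
relu_gap = invariant_net(A, 1.0, -2.0, relu) - invariant_net(B, 1.0, -2.0, relu)
```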

通过线性扰动损失最小化进行探索

David Janz, Shuai Liu, Alex Ayoub, Csaba Szepesvári

发表机构 * University of Alberta（阿尔伯塔大学）

AI总结本文提出了一种基于线性扰动损失的探索方法EVILL，通过求解线性扰动的正则化负对数似然函数的最小化问题，解释了随机奖励扰动为何能产生有效的多臂老虎机算法，并展示了数据依赖扰动如何使EVILL在理论和实践中达到与Thompson采样类参数扰动方法相当的性能。

Comments Updated with erratum note: Appendix I contains a gap in the proof; all main-paper claims remain valid via the corrected argument of Perneczky, Abeille & Janz (2026, arXiv:2606.00431)

详情

AI中文摘要

我们引入了通过线性损失扰动进行探索（EVILL），一种用于结构化随机老虎机问题的随机化探索方法，其做法是求解线性扰动的正则化负对数似然函数的最小化点。我们证明，对于广义线性老虎机，EVILL可归约为扰动历史探索（PHE），即通过在随机扰动的奖励上训练来实现探索的方法。由此，我们对随机奖励扰动何时以及为何能产生良好的老虎机算法给出了简洁清晰的解释。我们提出了以往PHE类方法中所没有的数据依赖扰动，使EVILL在理论和实践中均能达到Thompson采样风格参数扰动方法的性能。此外，我们给出了一个广义线性老虎机之外的例子：其中PHE会导致不一致的估计，从而产生线性遗憾，而EVILL仍然表现良好。与PHE一样，EVILL只需几行代码即可实现。

英文摘要

We introduce exploration via linear loss perturbations (EVILL), a randomised exploration method for structured stochastic bandit problems that works by solving for the minimiser of a linearly perturbed regularised negative log-likelihood function. We show that, for the case of generalised linear bandits, EVILL reduces to perturbed history exploration (PHE), a method where exploration is done by training on randomly perturbed rewards. In doing so, we provide a simple and clean explanation of when and why random reward perturbations give rise to good bandit algorithms. We propose data-dependent perturbations not present in previous PHE-type methods that allow EVILL to match the performance of Thompson-sampling-style parameter-perturbation methods, both in theory and in practice. Moreover, we show an example outside generalised linear bandits where PHE leads to inconsistent estimates, and thus linear regret, while EVILL remains performant. Like PHE, EVILL can be implemented in just a few lines of code.
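
For a K-armed Gaussian bandit (one-hot linear features), the linearly perturbed regularized loss has a closed-form minimizer. This makes the reduction to perturbed history exploration easy to check. Adding noise ε_i to the rewards gives the same estimate as a linear perturbation with z = -Σ ε_i.

The sketch below assumes a unit-variance Gaussian negative log-likelihood (squared loss) and a ridge parameter `lam`. It also assumes perturbations z_k ~ N(0, scale² (n_k + lam)). We chose this scaling because it gives each estimate a spread of scale/sqrt(n_k + lam), similar to a Gaussian Thompson sampling posterior. It is not necessarily the paper's data-dependent choice.

```python
import random
from typing import List


def perturbed_estimate(rewards: List[float], lam: float, z: float) -> float:
    """Minimiser over theta of
        sum_i (r_i - theta)^2 / 2 + lam * theta^2 / 2 + z * theta,
    a linearly perturbed, ridge-regularised Gaussian negative log-likelihood
    for one arm: theta = (sum_i r_i - z) / (n + lam)."""
    return (sum(rewards) - z) / (len(rewards) + lam)


def evill_bandit(means: List[float], rounds: int = 500, lam: float = 1.0,
                 scale: float = 1.0, noise: float = 0.1, seed: int = 0) -> List[int]:
    """K-armed Gaussian bandit played greedily on linearly perturbed estimates.
    Assumed perturbation: z_k ~ N(0, scale^2 * (n_k + lam)), so each
    estimate's random spread is scale / sqrt(n_k + lam)."""
    rng = random.Random(seed)
    history: List[List[float]] = [[] for _ in means]
    for _ in range(rounds):
        estimates = []
        for obs in history:
            z = rng.gauss(0.0, scale * (len(obs) + lam) ** 0.5)
            estimates.append(perturbed_estimate(obs, lam, z))
        arm = max(range(len(means)), key=estimates.__getitem__)
        history[arm].append(rng.gauss(means[arm], noise))
    return [len(obs) for obs in history]
```

The PHE equivalence can be checked directly. `perturbed_estimate([r + e for r, e in zip(rs, eps)], lam, 0.0)` matches `perturbed_estimate(rs, lam, -sum(eps))` up to floating-point rounding.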

URL PDF HTML ☆

赞 0 踩 0

2403.00965 2026-06-05 stat.AP cs.AI cs.LG 版本更新

Binary Gaussian Copula Synthesis: an LLM-powered data augmentation framework for early dialysis prediction in chronic kidney disease

二元高斯Copula合成：一种基于LLM的数据增强框架，用于慢性肾病早期透析预测

Hamed Khosravi, Milad Khanchi, Mobina Noori, Srinjoy Das, Abdullah Al-Mamun, Imtiaz Ahmed

发表机构 * Department of Industrial & Management Systems Engineering, West Virginia University（西弗吉尼亚大学工业与管理系统工程系）； Department of Electrical and Computer Engineering, Concordia University（康科迪亚大学电气与计算机工程系）； Department of Computer Science, University of California, Davis（加州大学戴维斯分校计算机科学系）； School of Mathematical & Data Sciences, West Virginia University（西弗吉尼亚大学数学与数据科学学院）； School of Systems Science and Industrial Engineering, The State University of New York at Binghamton（纽约州立大学宾汉姆顿分校系统科学与工业工程学院）； H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology（佐治亚理工学院H.米尔顿·斯图尔特工业与系统工程学院）

AI总结本文提出Binary Gaussian Copula Synthesis (BGCS)，一种专为二元临床数据设计的两阶段数据增强方法，通过生成合成少数类样本并过滤不合理的样本，提高了早期透析预测的性能。

详情

AI中文摘要

只有极少数慢性肾病（CKD）患者会进展到透析，这造成了严重的类别不平衡，限制了机器学习模型在早期透析预测中的性能。电子健康记录（EHR）数据的二元结构进一步加剧了这一挑战，而现有的大多数增强方法并非为此类数据设计。我们提出了二元高斯Copula合成（Binary Gaussian Copula Synthesis，BGCS），一种专为二元临床数据设计的两阶段数据增强方法。BGCS首先使用显式建模二元特征之间成对依赖关系的高斯Copula框架生成合成少数类样本，然后在训练前应用微调的GPT-2分类器过滤掉临床上不合理的样本。我们在一个真实世界EHR数据集上评估了BGCS，该数据集包含2008年至2022年间在西弗吉尼亚州收集的15,169名CKD患者；我们在四种机器学习分类器上、经过25次独立运行，将其与SMOTE、CTGAN和标准高斯Copula进行了基准比较。BGCS始终优于所有对比方法，在90天透析预测中取得了最高的少数类召回率（不同分类器的中位数在0.78到0.87之间），并且与真实数据的分布一致性最强（各特征的平均p值为0.68）。表现最好的BGCS增强模型被集成到一个基于决策树的可解释临床决策支持系统中，用于透析风险分层，其中电解质失衡、心血管合并症和肾脏监测指标是最具影响力的预测特征。这些发现表明，针对二元EHR数据结构特性设计的增强方法可以显著改善早期透析风险预测，并支持为CKD护理开发可解释的临床决策支持工具。

英文摘要

Only a small fraction of patients with chronic kidney disease (CKD) progress to dialysis, creating severe class imbalance that limits the performance of machine learning models for early dialysis prediction. This challenge is compounded by the binary structure of electronic health record (EHR) data, for which most existing augmentation methods were not designed. We propose Binary Gaussian Copula Synthesis (BGCS), a two-stage data augmentation method tailored to binary clinical data. BGCS first generates synthetic minority-class samples using a Gaussian copula framework that explicitly models pairwise dependencies among binary features, then applies a fine-tuned GPT-2 classifier to filter out clinically implausible samples before training. We evaluated BGCS on a real-world EHR dataset of 15,169 patients with CKD from West Virginia collected between 2008 and 2022, benchmarking it against SMOTE, CTGAN, and standard Gaussian Copula across four machine learning classifiers over 25 independent runs. BGCS consistently outperformed all comparison methods, achieving the highest minority-class recall for 90-day dialysis prediction, with median values ranging from 0.78 to 0.87 across classifiers, and the strongest distributional fidelity to real data, with a mean p-value of 0.68 across features. The best-performing BGCS-augmented model was integrated into an interpretable decision tree-based clinical decision support system for dialysis risk stratification, with electrolyte imbalances, cardiovascular comorbidities, and renal monitoring indicators emerging as the most influential predictive features. These findings suggest that augmentation methods designed for the structural properties of binary EHR data can meaningfully improve early dialysis risk prediction and support the development of interpretable clinical decision-support tools for CKD care.
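
The first BGCS stage relies on a standard construction. A binary Gaussian copula thresholds a latent multivariate normal, so each feature keeps its marginal rate while the latent correlation induces dependence between features.

The sketch below assumes the marginal rates `p` and the latent correlation matrix `corr` are already known. It omits how they are estimated from EHR data, and it omits the GPT-2 plausibility filter of the second stage. A small hand-written Cholesky factorization keeps it standard-library only.

```python
import random
from statistics import NormalDist
from typing import List

Matrix = List[List[float]]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = (a[i][i] - s) ** 0.5
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def sample_binary_copula(p: List[float], corr: Matrix, n: int, seed: int = 0) -> List[List[int]]:
    """Draw binary vectors: z ~ N(0, corr), x_j = 1 if z_j > Phi^{-1}(1 - p_j),
    so that P(x_j = 1) = p_j while latent correlation couples the features."""
    rng = random.Random(seed)
    L = cholesky(corr)
    thresholds = [NormalDist().inv_cdf(1 - pj) for pj in p]
    out = []
    for _ in range(n):
        e = [rng.gauss(0.0, 1.0) for _ in p]
        z = [sum(L[i][k] * e[k] for k in range(i + 1)) for i in range(len(p))]
        out.append([int(zi > t) for zi, t in zip(z, thresholds)])
    return out
```

The latent correlation is not the correlation of the resulting binary features, which is generally weaker. A real pipeline would estimate `corr` from data (for example via tetrachoric correlations) rather than set it by hand.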

URL PDF HTML ☆

赞 0 踩 0

2306.09712 2026-06-05 cs.LG cs.AI cs.CL 版本更新

Semi-Offline Reinforcement Learning for Optimized Text Generation

半离线强化学习用于优化文本生成

Changyu Chen, Xiting Wang, Yiqiao Jin, Victor Ye Dong, Li Dong, Jie Cao, Yi Liu, Rui Yan

发表机构 * 未知机构

AI总结本文提出了一种半离线强化学习方法，平衡了探索能力和训练成本，并在优化成本、渐近误差和过拟合误差界方面实现了最优的强化学习设置。

Comments In Proceedings of the 40th International Conference on Machine Learning (ICML 2023)

2305.12640 2026-06-05 cs.AI cs.LG stat.ML 版本更新

Limited Resource Allocation in a Non-Markovian World: The Case of Maternal and Child Healthcare

在非马尔可夫世界中的有限资源分配：孕产妇与儿童保健的案例

Panayiotis Danassis, Shresth Verma, Jackson A. Killian, Aparna Taneja, Milind Tambe

发表机构 * Harvard University（哈佛大学）； Google Research（谷歌研究）

AI总结本文研究了在非马尔可夫环境下如何利用时间序列方法优化资源分配，提出了一种新的时间序列臂排名指数（TARI）策略，以提高孕产妇与儿童保健项目的参与度和依从性。

Comments Proceedings of the 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023)

详情

DOI: 10.24963/ijcai.2023/660

AI中文摘要

许多医疗项目的成功取决于参与者的依从性。我们考虑在资源有限的环境中安排干预措施（例如由健康工作者及时拨打支持电话），以提高依从性和/或参与度。以往的工作已针对该问题成功开发了几类基于躁动多臂老虎机（Restless Multi-armed Bandit，RMAB）的解决方案。然而，以往所有RMAB方法都假设参与者的行为满足马尔可夫性质。我们在合作伙伴NGO ARMMAN的孕产妇健康宣传项目的真实数据上，展示了其显著偏离马尔可夫假设。此外，我们将RMAB扩展到连续状态空间，这是此前研究较少的领域。为解决一般化的非马尔可夫RMAB问题，我们（i）将每个参与者的轨迹建模为时间序列，（ii）利用时间序列预测模型学习复杂的模式和动态以预测未来状态，（iii）提出时间序列臂排名指数（TARI）策略，这是一种新算法，根据我们对未来状态的预测，选择最能从干预中受益的RMAB臂。我们在合成数据以及对ARMMAN真实数据的二次分析上评估了该方法，结果表明，与已部署的最先进（SOTA）Whittle指数方案相比，参与度显著提高。这相当于多收听了16.3小时的内容，防止的参与度下降多出90.8%，并覆盖了两倍以上的高脱落风险受益人。

英文摘要

The success of many healthcare programs depends on participants' adherence. We consider the problem of scheduling interventions in low resource settings (e.g., placing timely support calls from health workers) to increase adherence and/or engagement. Past works have successfully developed several classes of Restless Multi-armed Bandit (RMAB) based solutions for this problem. Nevertheless, all past RMAB approaches assume that the participants' behaviour follows the Markov property. We demonstrate significant deviations from the Markov assumption on real-world data on a maternal health awareness program from our partner NGO, ARMMAN. Moreover, we extend RMABs to continuous state spaces, a previously understudied area. To tackle the generalised non-Markovian RMAB setting we (i) model each participant's trajectory as a time-series, (ii) leverage the power of time-series forecasting models to learn complex patterns and dynamics to predict future states, and (iii) propose the Time-series Arm Ranking Index (TARI) policy, a novel algorithm that selects the RMAB arms that will benefit the most from an intervention, given our future state predictions. We evaluate our approach on both synthetic data, and a secondary analysis on real data from ARMMAN, and demonstrate significant increase in engagement compared to the SOTA, deployed Whittle index solution. This translates to 16.3 hours of additional content listened, 90.8% more engagement drops prevented, and reaching more than twice as many high dropout-risk beneficiaries.
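
The sketch below illustrates TARI's selection step: rank arms by predicted benefit and intervene on the top ones within the budget. The forecaster `toy_forecast` is purely hypothetical. It is an exponential moving average plus an assumed lift proportional to headroom. The paper uses learned time-series forecasting models, and any callable with the same signature could be plugged in instead.

```python
from typing import Callable, Dict, List, Sequence

Forecaster = Callable[[Sequence[float], bool], float]


def tari_select(trajectories: Dict[str, Sequence[float]], forecast: Forecaster,
                budget: int) -> List[str]:
    """Rank arms by predicted benefit forecast(traj, True) - forecast(traj, False)
    and return the top `budget` arm ids (ties broken by id)."""
    benefit = {arm: forecast(t, True) - forecast(t, False)
               for arm, t in trajectories.items()}
    return sorted(benefit, key=lambda arm: (-benefit[arm], arm))[:budget]


def toy_forecast(traj: Sequence[float], intervene: bool,
                 alpha: float = 0.5, lift: float = 0.3) -> float:
    """Hypothetical forecaster: exponential moving average of past engagement,
    plus an assumed lift proportional to headroom (1 - level) if a call is placed."""
    level = traj[0]
    for v in traj[1:]:
        level = alpha * v + (1 - alpha) * level
    return level + (lift * (1 - level) if intervene else 0.0)
```

With this toy forecaster, the arm whose engagement is dropping fastest receives the call first. A learned forecaster could instead rank arms by patterns the moving average cannot see, which is where the non-Markovian modelling matters.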

URL PDF HTML ☆

赞 0 踩 0

0911.2381 2026-06-05 physics.data-an cond-mat.stat-mech cs.LG nlin.CD stat.ME 版本更新

Analytical Determination of Fractal Structure in Stochastic Time Series

随机时间序列中分形结构的解析确定

Fermín Moscoso del Prado Martín

发表机构 * Laboratoire de Psychologie Cognitive (UMR-6146), CNRS & Aix-Marseille Université I, Marseille, France

AI总结本文提出了一种基于贝叶斯评估的分析框架，用于客观准确地推断时间序列的分形结构，同时推导出一种优于现有方法的Hurst指数最大似然估计器。

Comments 9 pages, 4 figures
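
The listing gives only the summary, so the paper's Bayesian framework and its estimator are not reproduced here. For comparison, the sketch below implements a standard, swapped-in technique: the exact Gaussian maximum-likelihood estimate of H for fractional Gaussian noise. It evaluates the likelihood with the Durbin-Levinson recursion, at O(n²) per candidate H, and maximizes over a grid. Centering by the sample mean and the grid resolution are our assumptions.

```python
import math
import random
from typing import List, Sequence


def fgn_autocov(k: int, h: float) -> float:
    """Autocovariance at lag k of unit-variance fractional Gaussian noise."""
    k = abs(k)
    return 0.5 * ((k + 1) ** (2 * h) - 2 * k ** (2 * h) + abs(k - 1) ** (2 * h))


def profile_loglik(x: Sequence[float], h: float) -> float:
    """Exact Gaussian log-likelihood of fGn(h) with the scale profiled out
    (additive constants dropped), computed via the Durbin-Levinson recursion."""
    n = len(x)
    gamma = [fgn_autocov(k, h) for k in range(n)]
    phi: List[float] = []  # phi[j] = coefficient on x[t - 1 - j] in the predictor
    r = gamma[0]           # one-step prediction variance (unit scale)
    sq, logdet = 0.0, 0.0
    for t in range(n):
        pred = sum(phi[j] * x[t - 1 - j] for j in range(len(phi)))
        sq += (x[t] - pred) ** 2 / r
        logdet += math.log(r)
        if t == n - 1:
            break
        # Durbin-Levinson: extend the predictor from order t to order t + 1
        k = (gamma[t + 1] - sum(phi[j] * gamma[t - j] for j in range(len(phi)))) / r
        phi = [phi[j] - k * phi[t - 1 - j] for j in range(len(phi))] + [k]
        r *= 1 - k * k
    return -0.5 * n * math.log(sq / n) - 0.5 * logdet


def estimate_hurst(x: Sequence[float], grid_step: float = 0.01) -> float:
    """Grid-search maximum-likelihood estimate of H in (0, 1)."""
    mean = sum(x) / len(x)
    xc = [v - mean for v in x]  # assumption: centre with the sample mean
    grid = [round(grid_step * i, 10) for i in range(1, int(round(1 / grid_step)))]
    return max(grid, key=lambda h: profile_loglik(xc, h))


if __name__ == "__main__":
    rng = random.Random(0)
    white = [rng.gauss(0.0, 1.0) for _ in range(300)]  # white noise has H = 0.5
    print(estimate_hurst(white, grid_step=0.05))
```

For H = 0.5 the autocovariance vanishes at every nonzero lag. The recursion then reduces to the ordinary i.i.d. Gaussian likelihood, which is a convenient sanity check.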