arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.06494 2026-06-05 cs.LG Version update

TailLoR: Protecting Principal Components in Parameter-Efficient Continual Learning

Marius Dragoi, Ioana Pintilie, Alexandra Dragomir, Antonio Barbalau, Florin Brad

Affiliations: Bitdefender

AI summary: Proposes TailLoR, which uses the singular bases of the pre-trained weights as a fixed reference frame, applies a low-rank update to the singular value matrix, and uses a soft spectral penalty to suppress updates aligned with dominant singular directions, reducing interference while enabling fine-grained adaptation.

Abstract

Parameter-efficient finetuning methods based on spectral decomposition have enabled progress in Continual Learning. In this paper we introduce TailLoR, which utilizes the singular bases U and V of the pre-trained weights as a fixed reference frame to learn a low-rank update applied to the singular value matrix. A soft spectral penalty discourages updates aligned with dominant singular directions, reducing interference while routing fine-grained adaptation into the highly flexible, long-tail spectral coordinates.
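
A rough sketch of the idea in this abstract, working directly in spectral coordinates: it forms a low-rank update A·B to the singular value matrix and scores it with a penalty that grows with the singular values of the coordinates the update touches. The weighting w_i = σ_i / σ_max, the function names, and the toy numbers are assumptions for illustration; the abstract does not state the exact form of the penalty.

```python
# Illustrative sketch only: the penalty form and all names are assumptions,
# not the definition used in the paper.

def low_rank_update(A, B):
    """Return delta = A @ B for A (n x r) and B (r x n), as nested lists."""
    r = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(r)) for j in range(len(B[0]))]
            for i in range(len(A))]

def soft_spectral_penalty(delta, sigma):
    """Weight each entry of delta by how dominant its row and column directions are."""
    s_max = max(sigma)
    w = [s / s_max for s in sigma]  # assumed weights: 1 for the top direction, near 0 in the tail
    n = len(sigma)
    return sum(w[i] * w[j] * delta[i][j] ** 2 for i in range(n) for j in range(n))

sigma = [10.0, 1.0, 0.1]  # toy pre-trained singular values, sorted in descending order

# Rank-1 update concentrated on the dominant direction (index 0).
head = low_rank_update([[1.0], [0.0], [0.0]], [[1.0, 0.0, 0.0]])
# Rank-1 update of the same magnitude concentrated on the tail direction (index 2).
tail = low_rank_update([[0.0], [0.0], [1.0]], [[0.0, 0.0, 1.0]])

penalty_head = soft_spectral_penalty(head, sigma)
penalty_tail = soft_spectral_penalty(tail, sigma)
```

Under this assumed weighting, an update of equal size is penalized far less in the tail coordinate than in the dominant one, which is the routing effect the abstract describes.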

2606.06486 2026-06-05 cs.LG cs.AI cs.GT Version update

Regret Minimization with Adaptive Opponents in Repeated Games

Mingyang Liu, Asuman Ozdaglar, Tiancheng Yu, Kaiqing Zhang

Affiliations: Massachusetts Institute of Technology; OpenAI; University of Maryland, College Park

AI summary: Addresses regret minimization against adaptive opponents in repeated games. Introduces the Repeated Policy Regret (RP-Regret) metric, designs three algorithms for minimizing it, and shows that certain subgame perfect equilibria can be learned when all players minimize this regret.

Abstract

In this paper, we study regret minimization in repeated games with adaptive opponents who can respond based on histories of play. The standard metric of external regret in online learning is known to fail to capture such adaptivity. To account for players' counterfactual reasoning, we introduce Repeated Policy Regret (RP-Regret), a game-theoretic metric that measures the difference between the realized and the best-in-hindsight accumulated utility when all players can respond to the history of play. Compared to existing regret notions in this setting, ours is native to repeated game playing, enabling stronger comparators and opponents with fewer constraints, while maintaining the possibility of finding better equilibria when all players minimize it. We first identify necessary conditions for obtaining RP-Regret sublinear in time, on the variation of the player's comparator strategies in the regret definition and on the memories of both the comparator and opponents' strategies. We then study additional conditions and provable algorithms to minimize RP-Regret, which is by definition non-convex in the strategy space. To address this challenge, we propose three algorithms: (i) one based on an optimization oracle, as assumed in some prior work in online non-convex learning; (ii) one that minimizes a convex and linearized surrogate of RP-Regret at each iteration; (iii) one that directly minimizes RP-Regret when opponents change strategies slowly. Furthermore, when all players can run algorithms to minimize the RP-Regret (or its linearized variant), certain subgame perfect equilibria of the repeated game can be learned. We also provide experiments showing that minimizing our regret notions can lead to more cooperative solutions with higher utility in games such as Stag-Hunt.
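
A toy illustration of why external regret can miss adaptivity, using a Stag-Hunt-like game. The payoff numbers, the one-step-memory opponent that copies the player's previous action, and all names are assumptions for illustration; the counterfactual quantity computed here is a simplified fixed-action comparison, not the paper's RP-Regret definition.

```python
# Illustrative toy: external regret versus a counterfactual, policy-style regret.
# Payoffs, the opponent rule, and all names are assumptions for illustration.

PAYOFF = {("S", "S"): 4, ("S", "H"): 0, ("H", "S"): 3, ("H", "H"): 3}  # row player's utility

def play(player_actions, first_opponent_action="H"):
    """Opponent with one step of memory: it copies the player's previous action."""
    opponent_actions, total = [], 0
    previous = first_opponent_action
    for a in player_actions:
        opponent_actions.append(previous)
        total += PAYOFF[(a, previous)]
        previous = a
    return total, opponent_actions

T = 10
realized_utility, realized_opponent = play(["H"] * T)

# External regret: best fixed action against the opponent actions that actually occurred.
best_fixed_vs_realized = max(
    sum(PAYOFF[(a, b)] for b in realized_opponent) for a in ("S", "H")
)
external_regret = best_fixed_vs_realized - realized_utility

# Counterfactual regret: best fixed action when the opponent is allowed to react to it.
best_fixed_vs_reactive = max(play([a] * T)[0] for a in ("S", "H"))
counterfactual_regret = best_fixed_vs_reactive - realized_utility
```

In this toy, always hunting hare has zero external regret, yet always hunting stag would have drawn the opponent into cooperation and earned more, so the counterfactual regret is positive.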

2606.06481 2026-06-05 cs.CL cs.AI cs.LG Version update

Operation-Guided Progressive Human-to-AI Text Transformation Benchmark for Multi-Granularity AI-Text Detection

Sondos Mahmoud Bsharat, Jiacheng Liu, Xiaohan Zhao, Tianjun Yao, Xinyi Shang, Yi Tang, Jiacheng Cui, Ahmed Elhagry, Salwa K. Al Khatib, Hao Li, Salman Khan, Zhiqiang Shen

Affiliations: Mohamed bin Zayed University of Artificial Intelligence; University College London

AI summary: Proposes the OpAI-Bench benchmark, which simulates human-AI co-editing through nine progressively revised versions and five AI edit operations and supports detection at document, sentence, token, and span granularity. It shows that AI-text detectability depends on edit operation, domain, and cumulative revision history, and finds that mixed-authorship intermediate versions are often harder to detect than the fully human or heavily AI-edited endpoints.

Comments: Our code and data are available at https://github.com/VILA-Lab/OpAI-Bench

Abstract

As AI writing assistants become increasingly integrated into real-world drafting and revision workflows, many documents are no longer purely human-written or AI-generated, but instead result from progressive human-AI co-editing. However, existing AI-text detection benchmarks largely focus on final outputs and provide limited understanding of how AI authorship signals emerge, accumulate, or disappear throughout the revision process. We introduce OpAI-Bench, an operation-guided benchmark for studying progressive human-to-AI text transformation across document, sentence, token, and span granularities. Starting from human-written documents, OpAI-Bench constructs nine sequentially revised versions for each sample under predefined AI coverage levels and five representative AI edit operations, covering four domains while preserving complete authorship provenance at multiple granularities. The benchmark supports comprehensive evaluation with 8 document-level detectors, 7 sentence-level detectors, and 2 fine-grained token/span-level detectors. Experiments reveal that AI-text detectability is governed not only by the proportion of AI-edited content, but also by edit operation, domain, and cumulative revision history. Interestingly, we notice that mixed-authorship intermediate versions are often harder to detect than both fully human and heavily AI-edited endpoints, exposing non-monotonic detection patterns missed by existing benchmarks. OpAI-Bench provides a controlled testbed for analyzing whether, when, and how AI-assisted writing becomes detectable under realistic progressive editing scenarios. Our code and benchmark are available at https://github.com/VILA-Lab/OpAI-Bench.

2606.06480 2026-06-05 cs.GT cs.LG Version update

DNQ: Deep Nash Q-Network for Partially Observable n-Player Games

Qintong Xie, Edward Koh, Xavier Cadet, Peter Chin

Affiliations: IEEE

AI summary: For multi-agent simultaneous-move games, proposes the DNQ framework, which trains agents with solver-in-the-loop equilibrium supervision, and compares the scalability of the pairwise and exact equilibrium-solving formulations.

Abstract

Many real-world competitive systems require multiple decision-makers to act simultaneously under shared constraints, limited information, and repeated interaction, as in auctions, resource allocation, and security competition. We study multi-turn simultaneous bidding as a controlled testbed for such problems and propose DNQ, a solver-in-the-loop equilibrium supervision framework for training bidding agents. DNQ alternates between trajectory collection, critic-based payoff estimation, equilibrium computation, and policy imitation. At each visited state, a shared critic predicts either pairwise payoff matrices or an exact N-player payoff tensor, an external solver computes equilibrium strategies, and the agents are trained by minimizing the KL divergence between their masked policies and the solver-derived equilibrium targets. We focus on a scalable pairwise formulation that greatly reduces equilibrium-solving cost and training time compared with the exact formulation, while the shared critic amortizes payoff learning across agents and states. Experiments compare the pairwise and exact variants using critic loss, policy entropy, bidding resource usage, and training cost, showing that the pairwise method scales to larger numbers of agents, whereas the exact method becomes computationally impractical as the joint game grows. These results illustrate the trade-off between strategic fidelity and scalability in repeated competitive environments.
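
A minimal sketch of the policy-imitation step described above: a masked policy is compared with an equilibrium target through a KL divergence. The direction KL(target || policy), the function names, and the hand-written target standing in for the external solver's output are all assumptions; the abstract does not specify them.

```python
import math

# Illustrative sketch: names, the KL direction, and the toy target are assumptions.

def masked_softmax(logits, mask):
    """Softmax over legal actions only; illegal actions (mask 0) get probability 0."""
    top = max(l for l, m in zip(logits, mask) if m)
    exps = [math.exp(l - top) if m else 0.0 for l, m in zip(logits, mask)]
    z = sum(exps)
    return [e / z for e in exps]

def kl_divergence(target, policy, eps=1e-12):
    """KL(target || policy), skipping actions the target never plays."""
    return sum(t * math.log(t / max(p, eps)) for t, p in zip(target, policy) if t > 0)

logits = [0.0, 0.0, 0.0, 0.0]   # untrained policy head: uniform over legal bids
mask = [1, 1, 0, 1]             # the third bid is not affordable in this state
policy = masked_softmax(logits, mask)

equilibrium_target = [0.5, 0.5, 0.0, 0.0]  # stand-in for a solver-derived mixed strategy
loss = kl_divergence(equilibrium_target, policy)
```

Training would lower this loss by moving probability mass toward the actions the target strategy plays, while the mask keeps illegal bids at zero probability.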

2606.06479 2026-06-05 cs.LG cs.AI Version update

Pretraining Recurrent Networks without Recurrence

Akarsh Kumar, Phillip Isola

Affiliations: MIT

AI summary: Proposes Supervised Memory Training (SMT), which recasts RNN training as supervised learning on one-step memory transition labels, enabling time-parallel training with stable gradient paths and outperforming backpropagation through time (BPTT).

Comments: 30 pages, 23 figures

Abstract

Training recurrent neural networks (RNNs) requires assigning credit across long sequences of computations. Standard backpropagation through time (BPTT) addresses this problem poorly: it is sequential in time, limiting parallelism, and suffers from vanishing or exploding gradients, making long-range associations difficult to learn. We propose Supervised Memory Training (SMT), a method for training nonlinear RNNs that sidesteps recurrent credit propagation entirely by reducing RNN training to supervised learning on one-step memory transition labels $(m_t, x_{t+1}) \rightarrow m_{t+1}$. SMT acquires these memory labels by training a Transformer-based encoder on a predictive state objective--retaining only information from the past necessary to predict the future. By decoupling what to remember from how to update memory, SMT enables time-parallel RNN training with a stable $O(1)$ length gradient path between any two tokens--without ever unrolling the RNN. We find that SMT outperforms BPTT when pretraining various RNN architectures on tasks like language modeling and pixel sequence modeling. SMT enables nonlinear RNNs to better capture long-range dependencies and train in parallel, potentially unlocking the scaling of models that build temporal abstractions of past experience.
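
A small sketch of how one-step transition examples of the form (m_t, x_{t+1}) -> m_{t+1} could be assembled once per-position memory labels exist. The running sum used as a memory label is a hypothetical stand-in for the encoder-produced labels, and the function name is an assumption.

```python
# Illustrative sketch: the running-sum "memory" is a hypothetical stand-in for
# labels that the abstract says come from a Transformer-based encoder.

def transition_pairs(memories, tokens):
    """Build ((m_t, x_{t+1}), m_{t+1}) examples from per-position memory labels.

    memories[t] is the memory label after reading tokens[0..t].
    """
    assert len(memories) == len(tokens)
    return [((memories[t], tokens[t + 1]), memories[t + 1])
            for t in range(len(tokens) - 1)]

tokens = [3, 1, 4, 1, 5]

# Stand-in memory labels: the prefix sum of the tokens seen so far.
memories = []
running = 0
for x in tokens:
    running += x
    memories.append(running)

pairs = transition_pairs(memories, tokens)

# A candidate one-step cell can be checked on every pair independently,
# with no unrolling through time.
def cell(m, x):
    return m + x

fits_all = all(cell(m, x) == m_next for (m, x), m_next in pairs)
```

Each pair is an independent supervised example, which is what allows the transitions to be fitted in parallel across time steps.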

2606.06475 2026-06-05 cs.LG cs.AI Version update

RREDCoT: Segment-Level Reward Redistribution for Reasoning Models

Mykyta Ielanskyi, Kajetan Schweighofer, Lukas Aichberger, Sepp Hochreiter

Affiliations: ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria; Cognizant AI Lab, San Francisco, USA; NXAI GmbH, Linz, Austria

AI summary: Targets the delayed-reward problem in reinforcement learning fine-tuning of reasoning language models. Proposes RREDCoT, which uses the model itself to approximate the optimal reward redistribution without additional generation, with the aim of reducing variance and making credit assignment more efficient.

Comments: Preprint, under review

Abstract

Recent advancements in reasoning language models have been driven by Reinforcement Learning (RL) fine-tuning. Most often, these rely on the Group Relative Policy Optimization (GRPO) algorithm or modifications thereof to steer the models to produce Chain-of-Thought (CoT) traces. The final answer can only be verified, and the reward assigned, after the CoT trace is complete, making it a delayed reward problem. GRPO and its modifications correspond to Monte Carlo methods in standard RL, which are known to suffer from high variance. A possible solution to this problem is the redistribution of rewards through credit assignment, where segments of the CoT trace that are important for arriving at the desirable solution are emphasized by assigning a higher reward. While Monte Carlo sampling can be used to provide an unbiased estimate of intermediate state values, its computational overhead makes it unsuitable for train-time credit assignment in long contexts at high granularity. We introduce RREDCoT (Reward REDistribution for Chain of Thoughts), which utilizes the model itself to approximate the optimal reward redistribution without additional generation. We investigate the advantages of our method compared to MC sampling and several attribution methods. We further analyze several aspects relevant to the construction of the redistribution such as segmentation of CoT traces and state value estimation.
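
A minimal sketch of segment-level reward redistribution from state value estimates, where each segment is credited with the change in estimated value it causes. Using plain value differences is an assumption in the spirit of return decomposition, and the value estimates below are made-up toy numbers; how RREDCoT obtains its estimates from the model is not detailed in the abstract.

```python
# Illustrative sketch: value-difference redistribution with toy numbers.
# The function name and the estimates are assumptions for illustration.

def redistribute(values):
    """values[0] is the estimate before any segment; values[-1] is the verified final reward.

    Returns one redistributed reward per CoT segment.
    """
    return [values[k] - values[k - 1] for k in range(1, len(values))]

# Estimated probability of a correct final answer after each of four segments.
values = [0.5, 0.5, 0.9, 0.85, 1.0]
segment_rewards = redistribute(values)

# The redistributed rewards telescope to (final reward - initial estimate).
total = sum(segment_rewards)
most_credited = max(range(len(segment_rewards)), key=lambda k: segment_rewards[k])
```

Here the second segment receives the most credit because it produces the largest jump in estimated value, while the delayed final reward is preserved in the sum.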

2606.06474 2026-06-05 cs.CL cs.AI cs.LG Version update

Self-Augmenting Retrieval for Diffusion Language Models

Paul Jünger, Justin Lovelace, Linxi Zhao, Dongyoung Go, Kilian Q. Weinberger

Affiliations: University of California, Berkeley; Google Research

AI summary: Proposes the SARDI framework, which uses the low-confidence tokens discarded during a diffusion language model's denoising process as a lookahead signal to guide retrieval. It is training-free and retriever-agnostic, and outperforms existing training-free baselines on multi-hop QA benchmarks at up to 8x higher throughput.

Comments: ICML 2026

Abstract

Discrete diffusion language models generate text by iteratively denoising an entire response in parallel. At each step, they predict tentative tokens for every masked position, committing the confident predictions to the output and discarding the unconfident ones. We show that the discarded tokens are in fact a useful lookahead signal for retrieval-augmented generation: even low-confidence tokens often surface salient entities early in the denoising trajectory, enabling retrieval of stronger evidence before the output is finalized. We exploit this through Self-Augmenting Retrieval for Diffusion Language Models (SARDI), a dynamic RAG framework that uses these lookahead tokens to guide retrieval during denoising. SARDI is training-free, retriever-agnostic, and applicable to any reasoning-capable discrete diffusion language model. Across five multi-hop QA benchmarks, SARDI outperforms current training-free diffusion and autoregressive retrieval baselines at up to $8\times$ higher throughput.
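
A toy sketch of one denoising step as described above: confident predictions are committed, and the discarded low-confidence tokens are reused as a lookahead signal for retrieval. The fixed confidence threshold, the way the query is built, and all names and example tokens are assumptions for illustration.

```python
# Illustrative sketch: the threshold rule and query construction are assumptions.

def denoise_step(predictions, threshold):
    """predictions maps each still-masked position to (token, confidence)."""
    committed = {p: tok for p, (tok, c) in predictions.items() if c >= threshold}
    discarded = [tok for p, (tok, c) in sorted(predictions.items()) if c < threshold]
    return committed, discarded

def retrieval_query(question, discarded_tokens):
    """Append the discarded lookahead tokens to the question to form a retrieval query."""
    return " ".join([question] + discarded_tokens)

predictions = {
    0: ("The", 0.95),
    1: ("director", 0.90),
    2: ("Nolan", 0.40),    # low confidence, but already names a salient entity
    3: ("London", 0.30),
}
committed, discarded = denoise_step(predictions, threshold=0.8)
query = retrieval_query("Where was the director of the film born?", discarded)
```

The committed tokens stay in the output, while the discarded entity guesses enrich the query that would be sent to whichever retriever is in use.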

2606.06470 2026-06-05 cs.LG cs.AI Version update

PC Layer: Polynomial Weight Preconditioning for Improving LLM Pre-Training

Senmiao Wang, Tiantian Fang, Haoran Zhang, Yushun Zhang, Kunxiang Zhao, Alex Schwing, Ruoyu Sun

Affiliations: The Chinese University of Hong Kong, Shenzhen; Google LLC; Shenzhen International Center for Industrial and Applied Mathematics; Shenzhen Research Institute of Big Data; University of Illinois at Urbana-Champaign

AI summary: Proposes the PC layer, a weight parameterization based on a polynomial preconditioner that reshapes the singular-value spectrum of weight matrices with low-degree polynomial preconditioning, keeps weight conditioning stable during LLM training, and adds no inference overhead after training. It outperforms standard transformers in Llama-1B pre-training.

Abstract

We propose a preconditioning (PC) layer, a weight parameterization via a polynomial preconditioner that ensures stable weight conditioning throughout LLM training. The PC module reshapes the singular-value spectrum of weight matrices via low-degree polynomial preconditioning. After training, the preconditioned weights can be merged back into the original architecture, incurring no inference overhead. We demonstrate the advantage of the proposed PC layer over standard transformers in Llama-1B pre-training, for both the AdamW and Muon optimizers. Theoretically, we justify this spectrum-control principle by proving that uniformly bounding each layer's singular values ensures geometric convergence of gradient descent to global minima, for certain deep linear networks. Our code is available at https://github.com/Empath-aln/PC-layer.

2606.06469 2026-06-05 math.ST cs.LG math.PR stat.TH Version update

How abundant are good interpolators?

August Y. Chen, Ahmed El Alaoui

Affiliations: Cornell University, Department of Computer Science; Cornell University, Department of Statistics and Data Science

AI summary: In the proportional high-dimensional regime, uses a large deviation principle to study the generalization error of a linear interpolating classifier chosen uniformly at random, finding that almost all interpolating classifiers have approximately the same generalization performance, while efficient algorithms such as gradient descent outperform the vast majority of interpolators.

Comments: 140 pages

Abstract

Let $S$ be the set of unit norm linear classifiers $\theta \in \mathbb{R}^d$ which correctly classify every point of a labeled dataset $(X_i,y_i)_{i=1}^n$, $X_i \in \mathbb{R}^d$, $y_i \in \{-1,+1\}$, with a possibly negative margin $\kappa$ fixed in advance. Under two natural data-generating distributions of the $(X,y)$ pairs -- a Gaussian mixture model and a logistic model with Gaussian features -- and in the proportional regime $n/d \to \alpha$ with small enough $\alpha$, we establish a large deviation principle on the event that a point $\theta$ chosen uniformly at random from $S$ achieves a given generalization error, with high probability over the choice of the data. The associated large deviation rate function is deterministic and describes the proportion, at the exponential scale in $d$, of interpolating classifiers having a given desired performance. As a consequence, we establish the following concentration phenomenon: all but an exponentially small fraction of interpolating classifiers have approximately the same generalization performance given by the unique maximizer of this rate function. We numerically compare this maximizer to the performance of empirical risk minimization by gradient descent and to the performance of a natural linear program, both finding a point in $S$, and deduce that in the overparametrized regime of small $\alpha$, these efficient procedures outperform the vast majority of interpolators, pointing to their nontrivial benign overfitting in this setting.
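
A minimal sketch of a membership test for the set S defined at the start of this abstract. The margin condition is taken here as y_i * <theta, X_i> >= kappa with no extra normalization; the abstract does not give the scaling, so that choice, the tolerance, and the toy data are assumptions.

```python
import math

# Illustrative sketch: the unnormalized margin condition is an assumption.

def in_S(theta, X, y, kappa, tol=1e-9):
    """True if theta has unit norm and classifies every point with margin at least kappa."""
    norm = math.sqrt(sum(t * t for t in theta))
    if abs(norm - 1.0) > tol:
        return False
    return all(
        yi * sum(t * x for t, x in zip(theta, xi)) >= kappa
        for xi, yi in zip(X, y)
    )

X = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
y = [1, 1, -1]

diagonal = [1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0)]
axis = [1.0, 0.0]

diagonal_ok = in_S(diagonal, X, y, kappa=0.5)      # all margins are at least 0.707
axis_ok = in_S(axis, X, y, kappa=0.5)              # the second point has margin 0
axis_ok_negative = in_S(axis, X, y, kappa=-0.1)    # a negative margin admits more classifiers
```

Lowering kappa below zero enlarges S, which is why the abstract calls out the possibly negative margin explicitly.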

2606.06467 2026-06-05 cs.CL cs.AI cs.LG Version update

You Only Index Once: Cross-Layer Sparse Attention with Shared Routing

Yutao Sun, Yanqi Zhang, Li Dong, Jianyong Wang, Furu Wei

Affiliations: Microsoft Research; Tsinghua University

AI summary: Proposes cross-layer sparse attention (CLSA), which shares both the KV cache and the routing index across layers, retaining the accuracy of token sparse attention while reducing routing overhead and substantially improving decoding efficiency for long-context LLMs.

Abstract

Long-context inference in modern LLMs is increasingly constrained by decoding efficiency, especially in reasoning-heavy settings where models generate long intermediate chains of thought. Existing sparse attention methods often face a practical efficiency-quality trade-off. Structured block sparse methods typically provide stronger acceleration but incur noticeable quality loss, while token sparse methods are usually more accurate yet deliver limited end-to-end speedup because top-k routing over the full cache remains expensive. In this work, we propose cross-layer sparse attention (CLSA), which is built on top of KV-sharing architectures such as YOCO. The core idea is to share not only the KV cache across cross-decoder layers, but also the routing index. A single indexer computes token-level top-k selection once and reuses the resulting index across layers, thereby preserving the fine-grained selectivity of token sparse attention while amortizing the routing overhead. The resulting architecture improves all major inference bottlenecks jointly, including pre-filling, KV-cache storage, and long-context decoding. Experiments across short-context and long-context benchmarks show that CLSA is both accurate and efficient, achieving up to 7.6x decoding speedup and 17.1x overall throughput improvement at 128K context. These results suggest a more complete architectural solution for long-context LLMs that jointly advances model quality and inference efficiency.
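
A toy sketch of the shared-routing idea: a token-level top-k index is computed once and then reused by several layers. Scalar values, hand-picked attention weights, and all names are simplifications assumed for illustration; the shared KV cache itself is not modeled here.

```python
# Illustrative sketch: scalar "values" and fixed weights are assumed simplifications.

def top_k_indices(scores, k):
    """Indices of the k highest-scoring cached tokens, returned in position order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(ranked)

def sparse_attend(values, weights, index):
    """Weighted average over the selected positions only."""
    total = sum(weights[i] for i in index)
    return sum(weights[i] * values[i] for i in index) / total

indexer_scores = [0.1, 2.0, 0.3, 1.5, 0.05, 0.9]
shared_index = top_k_indices(indexer_scores, k=3)   # routing is computed once

# Per-layer (values, weights) with hypothetical toy numbers.
layers = [
    ([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]),
    ([6.0, 5.0, 4.0, 3.0, 2.0, 1.0], [1.0, 2.0, 1.0, 1.0, 1.0, 1.0]),
]

# Every layer reuses the same index instead of re-running top-k over the full cache.
outputs = [sparse_attend(v, w, shared_index) for v, w in layers]
```

The top-k selection scans the full cache, so running it once rather than once per layer is where the routing cost is amortized.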

2606.06459 2026-06-05 cs.LG Version update

Event Detection for Parameter-to-KPI Dependency Learning for AI-RAN

Christie Djidjev, Nicholas Kaminski

Affiliations: Idaho National Laboratory

AI summary: Addresses interference among multiple AI control functions in AI-RAN. Proposes an event-detection-based approach to dependency learning that converts noisy continuous telemetry into binary event indicators, and uses synthetic data to evaluate how well a machine-learning pipeline recovers the latent dependency structure.

Abstract

Next-generation wireless networks are expected to rely on multiple concurrent AI-driven control functions that optimize different network objectives simultaneously, particularly in AI-integrated and open radio access network architectures such as AI Radio Access Network (AI-RAN) and Open Radio Access Network (O-RAN). When these functions interact, they can interfere with one another in ways that are difficult to detect from raw network data alone. A key missing piece for managing such interactions is a reliable, interpretable dependency structure that captures which control parameters are actively influencing which network performance outcomes at any given time. This paper focuses on the event-detection step needed to support such dependency learning by converting noisy continuous telemetry into binary indicators of parameter activity and KPI response. The central difficulty is that not every fluctuation in the data reflects a genuine control interaction, so the method must distinguish real parameter-outcome relationships from background variation. Because real AI-RAN traffic traces with known parameter-KPI ground truth are difficult to obtain, we introduce a synthetic closed-loop traffic generator with planted latent dependencies. We use this controlled telemetry to evaluate a machine-learning-based dependency recovery pipeline that formulates the conversion of continuous traces into binary event indicators as a significance-detection problem. Experimental evaluation shows that the proposed pipeline reliably recovers the latent dependency structure from noisy continuous traces when the signal is sufficiently separated from background variation, while highlighting threshold calibration as the key factor controlling event-detection quality. These results constitute a foundational step toward interpretable dependency learning for adaptive AI-RAN control systems.
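
A minimal sketch of turning a noisy continuous trace into binary event indicators with a significance threshold. The z-score rule against a quiet baseline window, the function name, and the toy trace are assumptions; the abstract does not describe the detector the paper uses.

```python
import statistics

# Illustrative sketch: a z-score threshold against a baseline window (an assumption).

def detect_events(trace, baseline_len, z_threshold):
    """Flag samples that deviate from the baseline mean by more than z_threshold std devs."""
    baseline = trace[:baseline_len]
    mean = statistics.mean(baseline)
    std = statistics.pstdev(baseline)
    return [1 if abs(x - mean) > z_threshold * std else 0 for x in trace]

# Toy KPI trace: background variation around 10, with two planted responses.
trace = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 13.0, 10.1, 6.5, 10.0]

events = detect_events(trace, baseline_len=6, z_threshold=3.0)
events_strict = detect_events(trace, baseline_len=6, z_threshold=30.0)
```

With a threshold of 3 both planted responses are flagged and the background is not, while an overly strict threshold misses them, a small-scale view of why the abstract singles out threshold calibration.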

2606.06458 2026-06-05 cs.LG cs.AI cs.CV Version update

In-Context Multiple Instance Learning

Alexander Möllers, Marvin Sextro, Julius Hense, Gabriel Dernbach, Klaus-Robert Müller

Affiliations: Berlin Institute for the Foundations of Learning and Data; Machine Learning Group, Technische Universität Berlin; Aignostics; Institute of Pathology, Charité – Universitätsmedizin Berlin; Max-Planck Institute for Informatics; Department of Artificial Intelligence, Korea University

AI summary: Proposes an in-context learner with a Perceiver-style architecture, pretrained on synthetic data, that solves new multiple instance learning tasks from a handful of labeled bags without gradient updates. It achieves the best average performance across 12 benchmarks, outperforming supervised baselines that require task-specific training.

Abstract

Multiple Instance Learning (MIL) addresses problems where supervision is available at the level of bags of instances and has been successfully applied in fields ranging from computational pathology to satellite imagery. Nevertheless, existing algorithms struggle in the low-label regime that characterizes many real-world applications. Flexible models overfit and rigid ones fail to adapt to the task at hand. We show that pretraining an in-context learner with a Perceiver-style architecture on synthetic data yields a model that can solve new tasks from a handful of labeled bags. At inference time, classification happens in a single forward pass and requires no gradient updates. We propose and investigate different synthetic data generators for bag-structured data and find that they capture complementary inductive biases. A model pretrained on a mixture of these generators inherits their per-task strengths and achieves the best average performance across twelve MIL benchmarks, outperforming supervised baselines that require task-specific training.

2606.06447 2026-06-05 cs.CL cs.LG Version update

Latent Reasoning with Normalizing Flows

Guancheng Tu, Xiangjun Fu, Suhao Yu, Yao Tang, Haoqiang Kang, Lianhui Qin, Yizhe Zhang, Jiatao Gu

Affiliations: University of Pennsylvania; UC San Diego; Meta

AI summary: Proposes the NF-CoT framework, which models continuous latent thoughts inside the LLM with normalizing flows, preserving autoregressive generation, probabilistic sampling, KV-cache decoding, and likelihood estimation. On code-generation tasks it improves pass rates and lowers reasoning cost.

Abstract

Large language models often improve reasoning by generating explicit chain-of-thought (CoT), demonstrating the importance of intermediate computation. However, textual CoT forces this computation through a discrete, serial, and communication-oriented token stream: each reasoning step must be verbalized before the model can proceed, even when the underlying update is semantic, uncertain, or only partially formed. Latent reasoning offers a higher-bandwidth alternative by performing intermediate computation in compact continuous states before committing to text. Yet existing latent-reasoning methods often sacrifice key advantages that make CoT effective in autoregressive language models, including native left-to-right generation, probabilistic sampling, compatibility with KV-cache decoding, and tractable likelihood estimation. We propose NF-CoT, a latent reasoning framework that preserves these advantages by modeling continuous thoughts with normalizing flows. NF-CoT instantiates a TARFlow-style normalizing flow inside the LLM backbone, defining a tractable probability model over compact continuous thoughts distilled from explicit CoT. Continuous-thought positions are generated by an NF head, while text positions are generated by the standard LM head within the same causal stream. This design provides exact likelihoods for latent thoughts, enables probabilistic left-to-right decoding with the original KV cache, and supports direct policy-gradient optimization in the latent reasoning space. On code-generation benchmarks, NF-CoT improves pass rates over explicit-CoT and prior latent-reasoning baselines while substantially reducing intermediate-reasoning cost.
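
A minimal sketch of why a normalizing flow gives exact likelihoods: the change-of-variables formula applied to a one-dimensional affine flow. This is a generic textbook construction, not the TARFlow-style model named in the abstract; in an autoregressive flow the shift and log-scale would be predicted from the preceding context, and the parameter names here are assumptions.

```python
import math

# Illustrative sketch: a 1-D affine flow, far simpler than the model in the abstract.

def standard_normal_logpdf(z):
    return -0.5 * (z * z + math.log(2.0 * math.pi))

def affine_flow_logprob(x, shift, log_scale):
    """Exact log-density of x under x = shift + exp(log_scale) * z with z ~ N(0, 1)."""
    z = (x - shift) * math.exp(-log_scale)          # invert the flow
    return standard_normal_logpdf(z) - log_scale    # add log|dz/dx| = -log_scale

x, shift, log_scale = 3.0, 1.0, math.log(2.0)
flow_logprob = affine_flow_logprob(x, shift, log_scale)

# Cross-check against the closed-form Gaussian N(mean=1, std=2) log-density.
std = math.exp(log_scale)
direct_logprob = -0.5 * math.log(2.0 * math.pi * std * std) - (x - shift) ** 2 / (2.0 * std * std)
```

Because the density is available in closed form, log-probabilities of sampled latent values can be plugged directly into likelihood-based objectives such as policy gradients.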

2606.06440 2026-06-05 cs.LG stat.ML Version update

Causal Atlases from Entropic Inference: Bayesian Networks beyond Optimal DAGs

Hazhir Aliahmadi, Irina Babayan, Greg van Anders

Affiliations: Department of Physics, Engineering Physics and Astronomy

AI summary: Addresses the problem that data may admit multiple chains of causation in data-driven causal identification. Proposes causal atlases based on entropic inference, which quantify ambiguity in causal structure by sampling a maximum-entropy ensemble of graphs.

Comments: 18 pages, 2 figures

Abstract

Data-driven causal relationship identification is pertinent to advancing understanding of complex systems both within and beyond science. Bayesian networks offer a probabilistic method for modelling generic causal relationships via directed acyclic graphs (DAGs). However, typical techniques for constructing Bayesian networks rely on optimization, which can be ill-suited for learning causal relationships because the underlying data may admit multiple chains of causation. More data-faithful representations of causal relationships would provide frameworks for constructing multiple causal maps that are consistent with the variability that is inherent in underlying data. Here, we show that entropy-based inference generates atlases of plausible causal relationships that are consistent with underlying data. On simulated noisy data of 2- and 20-node linear structural equation models, we sample a maximum-entropy ensemble of graphs that allows us to quantify the inherent structural ambiguity in underlying causal relationships. Our method shows that "optimized" DAGs can contain causal artifacts that are not consistent across equivalently accurate topologies.

2606.06418 2026-06-05 cs.LG cs.AI cs.SY eess.SY Version update

Double Preconditioning (DoPr): Optimization for Test-Time Performance, not Validation Loss

Thomas T. Zhang, Alok Shah, Yifei Zhang, Vincent Zhang, Nikolai Matni, Max Simchowitz

Affiliations: University of California, Berkeley; Stanford University; University of Cambridge; DeepMind; Google Research

AI summary: Proposes the double preconditioning optimization paradigm, which combines gradient-wise and activation-wise preconditioning to mitigate test-time feedback, the mismatch between training/validation loss and downstream metrics in settings such as autoregressive language modeling. It improves test-time performance without necessarily improving validation loss.

Abstract

Many modern applications of deep learning involve training a neural network via a one-step prediction loss (e.g., $L^2$ regression, cross-entropy), but deploy the network by rolling out along its own predictions. Key examples include autoregressive language modeling, flow-based generative modeling, and robot policy learning. It is well-documented that these settings induce a phenomenon we call test-time feedback (TTF): the mismatch between the training/validation loss and downstream metrics of interest, such as task success rate and generation quality, which grows with task length. While data curation, architecture, and objective design have been proposed to combat train-test shift in TTF settings, this paper proposes optimization as a new design axis to mitigate error accumulation. Specifically, we introduce a new optimization paradigm called double-preconditioning (DoPr) uniquely tailored to the challenges of TTF. DoPr combines gradient-wise preconditioning, as in Adam and Muon, with activation-wise preconditioning (AP), such as in KFAC. We show that the addition of AP yields a drop-in intervention for increasing downstream model performance across a range of TTF settings. Interestingly, these gains in test-time performance do not consistently accompany improvements in validation loss, opening new questions about how to properly evaluate models trained with one-step supervised objectives.

2606.06416 2026-06-05 cs.AI cs.CL cs.LG cs.MA 版本更新

Unsupervised Skill Discovery for Agentic Data Analysis

面向智能体数据分析的无监督技能发现

Zhisong Qiu, Kangqi Song, Shengwei Tang, Shuofei Qiao, Lei Liang, Huajun Chen, Shumin Deng

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出DataCOPE框架,通过无监督验证器引导从探索轨迹中发现可复用的数据分析技能,在报告式和推理式分析任务上分别提升平均得分9.71%和32.30%。

Comments Work in progress

详情
AI中文摘要

推理时技能增强通过注入可复用的程序性知识而不更新模型参数,为改进数据分析智能体提供了一种轻量级方法。然而,发现有效的数据分析技能仍然具有挑战性,因为可靠的监督成本高昂,且成功标准因分析格式而异。这提出了一个关键问题:如何仅从无标签探索中发现可复用的数据分析技能。我们提出DataCOPE,一种面向数据分析智能体的无监督验证器引导的技能发现框架。DataCOPE从探索轨迹中提取验证器信号,并利用这些信号表征轨迹间的相对质量或一致性。它迭代地协调一个数据分析智能体用于轨迹生成、一个无监督验证器用于信号提取、以及一个技能管理器用于对比式技能蒸馏。对于报告式分析,我们将验证器实例化为自适应检查表验证器,该验证器推导任务特定标准,通过可验证覆盖率对报告评分,并迭代优化检查表。对于推理式分析,我们将其实例化为答案一致性验证器,该验证器根据答案一致性对轨迹分组,并使用自一致性作为辅助信号。我们在Deep Data Research的报告式分析和DABStep的推理式分析上评估DataCOPE。在两种设置下,DataCOPE在保留任务上持续优于基线。在四种模型设置上平均,DataCOPE在报告式和推理式任务上分别将平均得分提高了9.71%和32.30%。

英文摘要

Inference-time skill augmentation provides a lightweight way to improve data-analytic agents by injecting reusable procedural knowledge without updating model parameters. However, discovering effective skills for data analysis remains challenging, as reliable supervision is expensive and success criteria vary across analytical formats. This raises the key question of how to discover reusable data-analysis skills from unlabeled exploration alone. We propose DataCOPE, an unsupervised verifier-guided skill discovery framework for data-analytic agents. DataCOPE derives verifier signals from the exploration trajectories and uses them to characterize relative quality or agreement among trajectories. It iteratively coordinates a Data-Analytic Agent for trajectory generation, an Unsupervised Verifier for signal extraction, and a Skill Manager for contrastive skill distillation. For report-style analysis, we instantiate the verifier as an Adaptive Checklist Verifier that derives task-specific criteria, scores reports by verifiable coverage, and iteratively refines the checklist. For reasoning-style analysis, we instantiate it as an Answer Agreement Verifier that groups trajectories by answer agreement and uses self-consistency as an auxiliary signal. We evaluate DataCOPE on report-style analysis from Deep Data Research and reasoning-style analysis from DABStep. Across both settings, DataCOPE consistently improves held-out performance over baselines. Averaged across four model settings, DataCOPE improves the mean score by 9.71% and 32.30% on report-style and reasoning-style tasks, respectively.
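
The answer-agreement idea in the abstract above can be illustrated with a small sketch. It assumes each trajectory ends in a short final answer string; the function names, the normalization rule, and the sample trajectories are hypothetical and are not taken from DataCOPE.

```python
from collections import defaultdict

def group_by_answer(trajectories):
    """Group trajectory ids by their normalized final answer (assumed normalization)."""
    groups = defaultdict(list)
    for traj_id, answer in trajectories:
        groups[answer.strip().lower()].append(traj_id)
    return dict(groups)

def self_consistency(groups):
    """Return the majority answer and the fraction of trajectories that agree with it."""
    total = sum(len(ids) for ids in groups.values())
    majority_answer = max(groups, key=lambda a: len(groups[a]))
    return majority_answer, len(groups[majority_answer]) / total

# Hypothetical exploration results: (trajectory id, final answer).
trajectories = [("t1", "42"), ("t2", " 42 "), ("t3", "41"), ("t4", "42")]
groups = group_by_answer(trajectories)
answer, score = self_consistency(groups)
```

Trajectories in the majority group and those outside it could then serve as the contrastive pairs from which a skill is distilled, although how the paper forms those pairs is not stated in the abstract.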

2606.06393 2026-06-05 cs.LG 版本更新

Proper Scoring Rules for Right-Censored Survival Data

右删失生存数据的适当评分规则

Jef Jonkers, Glenn Van Wallendael, Luc Duchateau, Sofie Van Hoecke

发表机构 * IDLab, Department of Electronics and Information Systems, Ghent University - imec(根特大学 - imec 电子与信息系统系 IDLab) Biometrics Research Group, Department of Morphology, Imaging, Orthopedics, Rehabilitation and Nutrition, Ghent University(根特大学形态学、影像学、骨科、康复与营养系生物统计研究组)

AI总结 针对右删失生存数据,提出通过删失机制映射预测分布并应用适当评分的框架,得到删失CRPS、pinball损失等评分,并证明其适当性。

Comments 27 pages

详情
AI中文摘要

适当评分规则为概率预测的训练和评估提供了严格的理论基础。然而,在存在右删失的情况下,事件时间仅被部分观测,使得传统评分规则无法以标准形式应用。我们提出一个基于简单思想的右删失生存结果适当评分框架:首先,通过删失机制映射预测分布,然后在诱导的观测数据分布上应用底层适当评分。这产生了固定删失时间的局部评分,以及当删失时间随机或仅部分观测时的边际化评分。该构造在一个连贯的框架内恢复了熟悉的右删失似然和IPCW型准则,同时产生了CRPS、pinball损失、Brier分数和能量分数的右删失版本。我们证明边际评分在条件独立删失下是适当的,并且在可识别区域上是严格适当的。同一原则还导致了删失engression,一种用于多变量右删失生存建模的基于样本的学习目标。在实验中,我们的评分在多种删失机制下正确排序了oracle预测,而依赖预测的插件加权评分可能表现出排序反转。删失engression同样在删失结果上显著优于朴素训练。

英文摘要

Proper scoring rules provide a rigorous theoretical basis for the training and evaluation of probabilistic forecasts. However, in the presence of right censoring, the event time is only partially observed, rendering conventional scoring rules inapplicable in their standard form. We propose a framework for proper scoring of right-censored survival outcomes based on a simple idea: first, map the predictive distribution through the censoring mechanism, then apply the underlying proper score on the induced observed-data law. This yields localized scores for fixed censoring times and marginalized scores when the censoring time is random or only partially observed. The resulting construction recovers familiar right-censored likelihood and IPCW-type criteria within a coherent framework, while also yielding right-censored versions of the CRPS, pinball loss, Brier score, and energy score. We show that the marginalized score is proper under conditional independent censoring and strictly proper on the identifiable region. The same principle also leads to censored engression, a sample-based learning objective for multivariate right-censored survival modeling. In experiments, our scores correctly rank the oracle forecast across several censoring regimes, whereas forecast-dependent plug-in weighted scores can exhibit ranking reversals. Censored engression likewise substantially improves over naive training on censored outcomes.
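
The right-censored likelihood that the framework is said to recover has a simple form: an observed event contributes the log density, and a censored observation contributes the log survival probability. The sketch below shows this for an exponential forecast; the exponential family, the function names, and the toy data are illustrative assumptions, not the paper's setup.

```python
import math

def censored_log_score(time, event, rate):
    """Negative right-censored log-likelihood of an exponential forecast.

    event=True : the event was observed at `time`, score is -log f(time)
    event=False: censored at `time`, score is -log S(time)
    """
    log_survival = -rate * time          # log S(t) = -rate * t
    if event:
        return -(math.log(rate) + log_survival)  # log f(t) = log(rate) + log S(t)
    return -log_survival

def mean_score(times, events, rate):
    """Average censored log score over a sample (lower is better)."""
    scores = [censored_log_score(t, e, rate) for t, e in zip(times, events)]
    return sum(scores) / len(scores)

# Toy sample: two observed events and one censored time.
times = [1.0, 2.0, 3.0]
events = [True, False, True]
# For the exponential model the minimizer is (number of events) / (total time).
best_rate = sum(events) / sum(times)
```

On this sample the mean score is lowest at `best_rate`, which is the behaviour one expects from a proper score evaluated on the observed-data law.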

2606.06391 2026-06-05 stat.ML cs.LG 版本更新

Conformal Risk Sharing: Certified Cost Allocation with Participation Guarantees

共形风险分担:具有参与保证的认证成本分配

Ieva Kazlauskaite

AI总结 提出共形风险分担方法,通过可解释的分担策略与分裂共形校准相结合,从有限数据中无分布假设地分配罕见事件的财务影响,为每个参与者提供义务上限并验证无人受损。

详情
AI中文摘要

将罕见不利事件的财务影响在群体中分担可以减轻极端个人负担,但任何因该安排而变得更糟的参与者都有理由退出。因此,一个可信的机制必须为每个代理人提供其未来义务的可信上限,并且只有在参与者之间的总损害有界时才应部署。我们将此形式化为认证分配问题:从有限数据中,无需分布假设,找到一种再分配规则,为每个参与者产生义务上限,并验证没有参与者实质上变得更糟。我们提出共形风险分担,通过将可解释的分担策略与分裂共形校准相结合来解决这个问题。分担强度在训练数据上调整,而保留的校准数据产生无分布假设的每个代理保证(在可交换性下有效)。在合成和真实数据(包括降水和能源合作社数据)上的实验证实,该框架可以显著降低高风险代理的极端义务,同时控制对他人的损害。

英文摘要

Sharing the financial impact of rare adverse events across a group can soften extreme individual burdens, but any participant made worse off by the arrangement has reason to leave. A credible mechanism must therefore provide each agent with a trustworthy cap on their future obligation and should be deployed only if the aggregate harm across participants is bounded. We formalise this as the Certified Allocation Problem: from finite data and without distributional assumptions, find a redistribution rule, produce obligation caps for every participant, and verify that no participant is made materially worse off. We propose Conformal Risk Sharing, which solves this problem by pairing an interpretable sharing policy with split conformal calibration. The sharing intensity is tuned on training data, while held-out calibration data produces distribution-free per-agent guarantees (valid under exchangeability). Experiments on synthetic and real-world data, including precipitation and energy-cooperative data, confirm that the framework can substantially reduce extreme obligations for high-risk agents while controlling harm to others.
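
A minimal sketch of the two ingredients named above: a sharing rule with a tunable intensity, and a split conformal cap computed from held-out calibration data. The linear pooling rule and all names are hypothetical stand-ins; only the conformal quantile rule (rank ceil((n+1)(1-alpha)) of the sorted calibration scores) is the standard split conformal construction.

```python
import math

def share(losses, intensity):
    """Hypothetical sharing policy: blend each agent's loss with the group mean."""
    mean_loss = sum(losses) / len(losses)
    return [(1 - intensity) * x + intensity * mean_loss for x in losses]

def conformal_cap(calibration_scores, alpha):
    """Split conformal upper bound valid with probability >= 1 - alpha under exchangeability."""
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(calibration_scores)[rank - 1]

# One agent's held-out obligations under the sharing rule (made-up numbers).
calibration = [0.3, 1.2, 0.7, 2.5, 0.1, 0.9, 1.8, 0.4, 1.1]
cap = conformal_cap(calibration, alpha=0.5)
obligations = share([0.0, 0.0, 9.0], intensity=0.5)
```

The `math.inf` branch reflects that a finite cap at level alpha needs at least ceil(1/alpha) - 1 calibration points.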

2606.06385 2026-06-05 cs.LG 版本更新

Learned Response-Field Inertia Operator for HEC-RAS 2D Water-Surface Elevation Prediction

用于 HEC-RAS 2D 水面高程预测的学习型响应场惯性算子

Edward Holmberg, Elias Ioup, Md Meftahul Ferdaus, Mahdi Abdelguerfi, Julian Simeonov

发表机构 * Canizaro Livingston Gulf States Center for Environmental Informatics, Department of Computer Science, The University of New Orleans(新奥尔良大学计算机科学系 Canizaro Livingston 墨西哥湾沿岸各州环境信息中心) Center for Geospatial Sciences, Naval Research Laboratory(海军研究实验室地理空间科学中心) Ocean Sciences Division, Naval Research Laboratory(海军研究实验室海洋科学部)

AI总结 提出学习型响应场惯性算子(LRFIO),一种基于增量、无外力项的学习代理模型,通过从已求解的 HEC-RAS 轨迹中校准惯性响应算子并在原生非均匀网格上进行封闭形式滚动预测,实现了跨数据集的水面高程预测,并展示了自适应复杂度控制。

Comments Preprint manuscript prepared using IEEEtran journal format

详情
AI中文摘要

本文提出了一种跨数据集评估学习型原生网格代理模型的方法,用于 HEC-RAS 2D 中求解器一致的水面高程(WSE)预测。为避免栅格重映射误差和信息访问混淆,代理模型直接在原始非均匀计算单元上评估,并采用显式策略分离静态项目输入、当前水力状态、项目输入强迫、校准衍生量以及未来求解器输出目标。我们引入了学习型响应场惯性算子(LRFIO),这是一种无外力、基于增量的学习代理模型,它从已求解的 HEC-RAS 轨迹中校准惯性响应算子,并通过封闭形式的原生网格滚动部署保留的算子。LRFIO 评估了一个基例优先的响应层次结构,包括持久性、全局校准惯性和分段响应场惯性。分段、残差校正和神经化惯性被视为可学习建模选择,仅当验证证据证明其成本合理时才保留增加的复杂度。在四个不同的 HEC-RAS 2D 基准测试中,LRFIO 对不同领域保留了不同的响应结构,展示了自适应学习复杂度。选择器审计显示复杂度可控,最大验证遗憾为 4.30%。在部署期间,保留的滚动时间范围为 0.003 秒至 0.242 秒,Beaver Bayou 实测-求解比较表明,相对于 HEC-RAS 实现了约 2.75 × 10^4 的视界归一化加速。这些结果表明,当前的原生网格增量是一个强大的求解器条件预测支架,并且仅在经验证实时才应保留增加的响应场、神经或空间复杂度。

英文摘要

This article presents a cross-dataset evaluation of learned native-cell surrogate models for solver-consistent water-surface elevation (WSE) prediction in HEC-RAS 2D. To avoid raster remapping error and information-access confounding, surrogates are evaluated directly on the original nonuniform computational cells under an explicit policy that separates static project inputs, current hydraulic state, project-input forcing, calibration-derived quantities, and future solver-output targets. We introduce the Learned Response-Field Inertia Operator (LRFIO), a no-forcing, increment-based learned surrogate that calibrates an inertial response operator from solved HEC-RAS trajectories and deploys the retained operator through closed-form native-cell rollout. LRFIO evaluates a base-case-first response hierarchy consisting of persistence, global calibrated inertia, and segmented response-field inertia. Segmentation, residual correction, and neuralized inertia are treated as learnable modeling choices, with added complexity retained only when validation evidence justifies its cost. Evaluated across four diverse HEC-RAS 2D benchmarks, LRFIO retains different response structures for different domains, demonstrating adaptive learned complexity. The selector audit shows controlled complexity with a maximum validation regret of 4.30%. During deployment, retained rollout times range from 0.003 s to 0.242 s, and the Beaver Bayou measured-solve comparison gives an estimated $2.75 \times 10^4$ horizon-normalized speedup over HEC-RAS. These results indicate that the current native-cell increment is a strong solver-conditioned predictive scaffold and that added response-field, neural, or spatial complexity should be retained only when empirically justified.
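
To make the first two levels of the response hierarchy concrete, the sketch below treats a single cell and assumes a scalar inertia model in which each WSE increment is a fixed multiple `alpha` of the previous one. This scalar form, the least-squares calibration, and the names are assumptions for illustration; the abstract does not specify LRFIO's operator. Under this assumption the rollout has a closed form (a geometric sum of increments), and `alpha = 0` reduces to persistence.

```python
def calibrate_inertia(series):
    """Least-squares alpha for d[t+1] ~ alpha * d[t], where d are WSE increments."""
    d = [b - a for a, b in zip(series, series[1:])]
    num = sum(x * y for x, y in zip(d, d[1:]))
    den = sum(x * x for x in d[:-1])
    return num / den if den else 0.0

def rollout(last, prev, alpha, steps):
    """Closed-form k-step forecast: last + d0 * (alpha + alpha^2 + ... + alpha^k)."""
    d0 = last - prev
    forecast = []
    for k in range(1, steps + 1):
        if alpha == 1.0:
            geom = float(k)
        else:
            geom = alpha * (1 - alpha ** k) / (1 - alpha)
        forecast.append(last + d0 * geom)
    return forecast

# Made-up solved trajectory for one cell (water-surface elevation per step).
solved = [0.0, 1.0, 1.5, 1.75]
alpha = calibrate_inertia(solved)
future = rollout(solved[-1], solved[-2], alpha, steps=2)
persistence = rollout(solved[-1], solved[-2], 0.0, steps=2)
```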

2606.06364 2026-06-05 cs.LG stat.ML 版本更新

End-to-End Subgraph Detection with GraphDETR

基于GraphDETR的端到端子图检测

Dexiong Chen, Till Hendrik Schulz, Karsten Borgwardt

发表机构 * Max Planck Institute of Biochemistry(马克斯·普朗克生物化学研究所)

AI总结 提出GraphDETR框架,将子图检测视为集合预测问题,通过图神经网络编码目标图、Transformer解码器联合预测所有模式实例,并采用二分匹配端到端训练,支持精确和近似匹配,在多达1000节点的图中检测50节点模式,并在ChEMBL数据集上实现AP100=91.2。

详情
AI中文摘要

子图检测旨在识别查询模式实例是否出现在更大图中及其位置。该问题在科学领域至关重要,且与子图同构密切相关,后者是NP完全的,限制了组合方法只能处理小模式或中等规模图。我们提出GraphDETR,一个深度学习框架,将子图检测公式化为集合预测问题,类似于目标检测中的DETR。GraphDETR使用图神经网络编码目标图,并采用一组固定的可学习查询向量,通过Transformer解码器解码,在单次前向传播中联合预测所有模式实例。这通过端到端训练和二分匹配实现。与传统仅解决精确结构匹配的组合方法不同,GraphDETR自然扩展到近似匹配,使得能够检测超出精确模式对应的实例。实验表明,GraphDETR能够在多达1000个节点的目标图中检测多达50个节点的多样化模式,如分子结构、环、团和模糊模式。我们进一步在ChEMBL数据集上评估分子官能团检测,GraphDETR预测每个分子的完整官能团集合,实现了$\text{AP}_{100} = 91.2$的强性能。

英文摘要

Subgraph detection seeks to identify whether and where instances of query patterns occur within a larger graph. This problem is fundamental across scientific domains and is closely related to subgraph isomorphism, which is NP-complete, limiting combinatorial approaches to small patterns or moderately sized graphs. We introduce GraphDETR, a deep learning framework that formulates subgraph detection as a set prediction problem, analogous to DETR in object detection. GraphDETR encodes the target graph with a graph neural network, and employs a fixed set of learnable query vectors, decoded via a transformer decoder, to predict all pattern occurrences jointly in a single forward pass. This is enabled by training the model end-to-end with bipartite matching. Unlike traditional combinatorial methods that only solve exact structural matching, GraphDETR naturally extends to approximate matching, enabling detection beyond exact pattern correspondence. Empirically, we show that GraphDETR can detect diverse patterns, such as molecular structures, cycles, cliques, and fuzzy patterns of up to 50 nodes, in target graphs with up to 1000 nodes. We further evaluate on molecular functional group detection over the ChEMBL dataset, where GraphDETR predicts the complete set of functional groups per molecule, achieving a strong performance of $\text{AP}_{100} = 91.2$.
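
The bipartite-matching step used in set prediction can be sketched as a minimum-cost assignment between query slots and ground-truth pattern instances. DETR-style models typically solve this with the Hungarian algorithm; the brute-force search below is a swapped-in, stdlib-only substitute that is practical only for tiny examples, and the cost matrix is made up.

```python
from itertools import permutations

def match_queries(cost):
    """Assign each ground-truth instance (column) to a distinct query (row) at minimum total cost."""
    n_queries, n_targets = len(cost), len(cost[0])
    best_rows, best_total = None, float("inf")
    for rows in permutations(range(n_queries), n_targets):
        total = sum(cost[r][t] for t, r in enumerate(rows))
        if total < best_total:
            best_rows, best_total = rows, total
    # Map query index -> matched target index; unmatched queries are absent.
    return {r: t for t, r in enumerate(best_rows)}, best_total

# Hypothetical matching costs: 3 query slots, 2 ground-truth occurrences.
cost = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.6],
]
assignment, total_cost = match_queries(cost)
```

Queries left out of `assignment` (query 2 here) would be trained to predict "no occurrence", which is how a fixed set of queries can represent a variable number of instances.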

2606.06353 2026-06-05 cs.LG 版本更新

Maximising the Set-Piece Return: Optimising Football Corner Tactics with Graph Reinforcement Learning

最大化定位球回报:用图强化学习优化足球角球战术

Sean Groom, Michael Groom, Francisco Belo, Axl Rice, Liam Anderson, Victor-Alexandru Darvariu, Shuo Wang

发表机构 * School of Computer Science, University of Birmingham, Birmingham, UK(伯明翰大学计算机科学学院) Nottingham Forest Football Club, Nottingham, UK(诺丁汉森林足球俱乐部) Oxford Robotics Institute, University of Oxford, Oxford, UK(牛津大学机器人研究所) School of Sport, Exercise and Rehabilitation Sciences, University of Birmingham, Birmingham, UK(伯明翰大学运动、体能与康复科学学院)

AI总结 提出一种基于图强化学习的框架,通过调整进攻球员位置和速度来最大化角球首次触球射门概率,在英超角球数据上优于传统优化方法。

Comments 11 pages, 4 figures

详情
AI中文摘要

机器学习越来越多地被用于评估足球战术。然而,现有方法侧重于描述历史动作或分析师指定的反事实场景。在这项工作中,我们旨在超越对历史观察模式的模仿,发现新的可泛化的球员配置和策略。为此,我们专注于优化角球套路,并制定了一个决策问题,其中中央策略调整进攻球员的位置和速度,以最大化首次触球射门概率。与解决孤立设置的经典优化不同,我们贡献了一个基于图结构数据的强化学习架构,该架构产生一个通用策略,用于调整任意起始球员位置。在超过3000个英超角球上的评估表明,在匹配推理预算下,我们的方法显著优于基线优化技术。我们的结果表明,图强化学习可以将定位球分析从历史评估和模仿转向奖励驱动的战术发现。

英文摘要

Machine learning is increasingly employed for the evaluation of football tactics. However, existing approaches focus on characterising historical actions or analyst-specified counterfactual scenarios. In this work, we seek to go beyond the imitation of historically observed patterns towards discovering new generalisable player configurations and strategies. To tackle this, we focus on optimising corner kick routines, and formulate a decision-making problem in which a central policy makes adjustments to attacking player positions and velocities to maximise first contact shot probability. Unlike classic optimisation that solves for isolated setups, we contribute a reinforcement learning architecture operating on graph-structured data that yields a general policy for adjusting arbitrary starting player positions. Evaluated on over 3,000 Premier League corners, our approach strongly outperforms baseline optimisation techniques under matched inference budgets. Our results suggest that graph reinforcement learning can shift set-piece analysis from historical evaluation and imitation towards reward-driven tactical discovery.

2606.06351 2026-06-05 stat.ML cs.LG 版本更新

Function-Space Priors for Bayesian Neural ODEs with Application to Vessel Trajectory Prediction

贝叶斯神经常微分方程的函数空间先验及其在船舶轨迹预测中的应用

Jaeyeong Lee, Wonmo Koo, Heeyoung Kim

发表机构 * Department of Industrial and Systems Engineering, Korea Advanced Institute of Science and Technology (KAIST)(工业与系统工程系,韩国科学技术院(KAIST))

AI总结 针对船舶轨迹预测中不规则采样、缺失报告和复杂动力学挑战,提出一种在向量场上施加高斯过程核先验的正则化方法,并结合概率多重打靶实现长序列的不确定性量化。

详情
AI中文摘要

从自动识别系统(AIS)数据预测船舶轨迹对于海上态势感知至关重要,但由于不规则采样、缺失报告和复杂动力学,这仍然具有挑战性。除了准确的点预测外,海事应用还需要良好校准的不确定性估计以支持可靠决策。贝叶斯神经常微分方程(ODE)通过在神经向量场参数上放置先验,为具有不确定性量化的连续时间轨迹建模提供了原则性框架。然而,常用的各向同性高斯权重先验无法编码船舶动力学的信息性结构特性,如平滑性和局部性。现有的函数空间贝叶斯神经网络方法解决了静态映射的这一限制,但不能直接转移到神经常微分方程,因为其主要关注量是轨迹而非向量场本身。原则上,可以直接在ODE解上放置高斯过程(GP)先验,但这需要将分布通过非线性ODE求解器传播,这在分析上是棘手的。为了解决这一挑战,我们采用了一种实用方法,直接在有限测量点集上评估的向量场上施加基于GP核的先验。具体来说,我们用基于核的正则化器增强标准权重空间变分目标,该正则化器惩罚向量场偏离GP先验所隐含的结构。为了处理长且不规则的AIS轨迹,我们进一步将这种函数空间正则化与概率多重打靶相结合,该打靶方法在保持全局一致性的同时解耦跨时间段的推理。

英文摘要

Vessel trajectory prediction from Automatic Identification System (AIS) data is essential for maritime situational awareness, yet it remains challenging due to irregular sampling, missing reports, and complex dynamics. Beyond accurate point forecasts, maritime applications also demand well-calibrated uncertainty estimates for reliable decision-making. Bayesian Neural Ordinary Differential Equations (ODEs) offer a principled framework for continuous-time trajectory modeling with uncertainty quantification by placing a prior over the neural vector field parameters. However, the commonly used isotropic Gaussian weight prior fails to encode informative structural properties of vessel dynamics, such as smoothness and locality. Existing function-space Bayesian neural network methods address this limitation for static mappings, but do not transfer directly to Neural ODEs, where the primary quantity of interest is the trajectory rather than the vector field itself. In principle, one could place a Gaussian process (GP) prior directly over ODE solutions, but this requires propagating distributions through a nonlinear ODE solver, which is analytically intractable. To address this challenge, we adopt a practical approach that imposes a GP-kernel-based prior directly on the vector field evaluated at a finite set of measurement points. Specifically, we augment the standard weight-space variational objective with a kernel-based regularizer that penalizes deviations of the vector field from the structure implied by a GP prior. To handle long and irregular AIS trajectories, we further combine this function-space regularization with probabilistic multiple shooting, which decouples inference across temporal segments while maintaining global consistency.

2606.06348 2026-06-05 cs.LG 版本更新

Performance Evaluation of GraphCast for Medium-Range Weather Forecasting over Brazil

GraphCast在巴西中期天气预报中的性能评估

Wolfgang R. Rowell, Lucas S. Kupssinskü

发表机构 * MALTA, Machine Learning Theory and Applications Lab, PUCRS, Porto Alegre, Brazil(MALTA,机器学习理论与应用实验室,PUCRS,阿雷格里港,巴西)

AI总结 本研究利用GraphCast模型与ECMWF IFS HRES基线,评估其在巴西四个气候子区域的中期天气预报性能,发现其技能具有季节性依赖性,在冬季中期表现不佳但在延伸期有优势,夏季则能准确捕捉大尺度水汽输送并抑制高频对流变率。

详情
AI中文摘要

全球天气预报范式正随着机器学习天气预报模型(MLWP)的出现而迅速转变。虽然这些数据驱动的架构展现出卓越的全球技能,但全球南方地区的区域基准仍然稀缺,其在复杂、高对流环境中的有效性在很大程度上未经验证。本研究评估了GraphCast operational在巴西四个不同气候子区域中,以确定性ECMWF IFS HRES为基线的性能。利用可扩展的云原生管道和WeatherBench-X框架进行天气模型基准测试,我们评估了四个选定季节窗口中的选定对流层变量($T_{850}$、$Q_{850}$、$Z_{500}$),以业务IFS分析作为真值,计算两个模型的统计指标。结果揭示了依赖于天气形势的技能特征。在南半球冬季,GraphCast在中期(预报天数2-7)对$Z_{500}$解析巴西南部上空快速传播的斜压系统时表现不佳,但在延伸期重新获得优势,此时其固有的对混沌小尺度变率的平滑在确定性技能指标下变得有益。相反,在南半球夏季雨季,GraphCast准确捕捉了大尺度水汽输送,同时内在抑制了削弱确定性NWP温度预报的高频对流变率。这些发现为巴西建立了基线,并定义了将指导未来“热带化”努力的具体物理边界,旨在优化这些基础AI模型以增强区域韧性。

英文摘要

The paradigm of global weather forecasting is rapidly shifting with the emergence of Machine Learning Weather Prediction models (MLWP). While these data-driven architectures demonstrate remarkable global skill, regional benchmarks in the Global South remain scarce, leaving their efficacy in complex, highly convective environments largely unverified. This study evaluates the performance of GraphCast operational against the deterministic ECMWF IFS HRES as baseline across four distinct Brazilian climatic sub-regions. Utilizing a scalable, cloud-native pipeline and the WeatherBench-X framework for benchmarking weather models, we assess selected tropospheric variables ($T_{850}$, $Q_{850}$, $Z_{500}$) over four selected seasonal windows, employing the operational IFS analysis as the ground truth to calculate the statistical metrics for both models. Results reveal a regime-dependent skill profile. During the austral winter, GraphCast underperforms in the medium range (lead days 2-7) for $Z_{500}$ when resolving fast-propagating baroclinic systems over southern Brazil, but regains an advantage in the extended range, where its inherent smoothing of chaotic small-scale variability becomes beneficial under deterministic skill metrics. Conversely, during the austral summer wet season, GraphCast accurately captures large-scale moisture transport while intrinsically dampening the high-frequency convective variability that degrades deterministic NWP temperature forecasts. These findings establish a baseline for Brazil and define the specific physical boundaries that will guide future "tropicalization" efforts, aiming to optimize these foundational AI models for regional resilience.

2606.06347 2026-06-05 eess.SY cs.LG cs.SY 版本更新

Attack Detection using Time Series Foundation Models

使用时间序列基础模型的攻击检测

Sribalaji C. Anand, Anh Tung Nguyen, George J. Pappas

发表机构 * University of Pennsylvania(宾夕法尼亚大学) KTH Royal Institute of Technology(瑞典皇家理工学院) Uppsala University(乌普萨拉大学)

AI总结 针对无模型知识的网络物理系统,提出基于TimesFM时间序列基础模型的零样本攻击检测方法,在IEEE 14节点电力系统上验证其性能。

Comments Under review

详情
AI中文摘要

本文解决了在没有任何被控对象模型或其结构知识的情况下,网络物理系统中的攻击检测问题。远程被控对象通过假设受到攻击的网络向操作员传输传感器测量值。我们考虑两类攻击:无模型重放攻击和基于模型的隐蔽攻击。对于后者,我们针对线性与非线性系统,推导了针对$\chi^2$检测器的最优隐蔽攻击策略的闭式表达式。然后,我们提出一种基于TimesFM(Google Research开发的时间序列基础模型)的无模型结构检测器,该检测器以零样本方式作为替代残差生成器运行。实验表明,基于TimesFM的检测器实现了相当或更优的攻击检测性能。在IEEE 14节点电力系统上通过数值实验证明了所提方法的有效性。我们还证明,当经典冗余假设失效时,TimesFM预测可作为受损测量值的替代,这是一种实用的缓解技术。

英文摘要

This paper addresses the problem of attack detection in cyber-physical systems without any knowledge of the plant model or its structure. A remotely located plant transmits sensor measurements to an operator over a network that is assumed to be under attack. We consider two classes of attacks: model-free replay attacks and model-based stealthy attacks. For the latter, we derive closed-form expressions for the optimal stealthy attack policy against a $\chi^2$ detector, for both linear and nonlinear systems. We then propose a model-structure-free detector based on TimesFM, a time-series foundation model developed by Google Research, which serves as a surrogate residual generator operating in a zero-shot fashion. We show empirically that the TimesFM-based detector achieves a comparable or superior attack detection performance. The efficacy of the proposed approach is demonstrated numerically on the IEEE 14-bus power system. We also demonstrate that TimesFM predictions can serve as a substitute for corrupted measurements, a practical mitigation technique when classical redundancy assumptions fail.
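
A residual-based $\chi^2$ detector of the kind referenced above compares measurements with one-step predictions and raises an alarm when the normalized squared residual exceeds a threshold. The sketch below assumes independent sensor noise (a diagonal covariance) and takes the predictions as plain numbers; in the paper's setting they would come from the forecaster, whose API is deliberately not shown here. All names and values are illustrative.

```python
def chi2_statistic(measured, predicted, variances):
    """Sum of squared residuals, each normalized by its (assumed known) noise variance."""
    return sum((y - p) ** 2 / v for y, p, v in zip(measured, predicted, variances))

def detect(measured, predicted, variances, threshold):
    """Raise an alarm when the statistic exceeds the chosen threshold."""
    return chi2_statistic(measured, predicted, variances) > threshold

# 5.991 is the 95% quantile of a chi-squared distribution with 2 degrees of freedom.
THRESHOLD = 5.991
variances = [0.01, 0.01]
predicted = [1.1, 1.9]

nominal_alarm = detect([1.0, 2.0], predicted, variances, THRESHOLD)
attack_alarm = detect([1.0, 2.5], predicted, variances, THRESHOLD)
```

A stealthy attack in this framing is one that corrupts the measurements while keeping the statistic under the threshold, which is why the choice of residual generator matters.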

2606.06345 2026-06-05 cs.AI cs.LG q-bio.NC 版本更新

Boosting Brain-to-Image Decoding with TRIBE v2 Data Augmentation

使用TRIBE v2数据增强提升脑到图像解码

Yohann Benchetrit, Marlène Careil, Simon Dahan, Hubert Banville, Stéphane d'Ascoli, Jean-Rémi King

发表机构 * Meta AI

AI总结 针对脑解码中标记数据稀缺的问题,提出利用预训练的fMRI响应模型TRIBE v2生成合成数据来增强小样本数据集,在两个数据集上实现最高68%的Top-10图像检索准确率提升,并发现纯合成数据训练的解码器在零样本设置中也能达到高于随机水平的性能。

详情
AI中文摘要

脑解码受限于标记神经数据的可用性,在低数据量情况下仍然具有挑战性。为了解决这个问题,我们研究了是否以及何时可以通过使用预训练的fMRI刺激响应模型生成的合成数据来增强小样本fMRI数据集,从而提升脑解码性能。我们使用TRIBE v2,这是一个大型编码模型,在超过1000小时的视频、音频和语言fMRI响应数据上进行了预训练。对于每个数据集,我们评估了系统网格,展示了图像解码器性能如何随用于训练的合成数据量变化。基于两个数据集(7T fMRI自然场景数据集和3T fMRI BOLD5000)的结果显示,与仅使用真实数据训练的解码器相比,Top-10图像检索准确率最高提升68%。重要的是,达到给定图像解码性能所需的增强数据比例需要根据数据源进行调整。令人惊讶的是,仅使用合成fMRI数据训练的图像解码器在某些设置下性能高于随机水平,表明TRIBE v2可以支持零样本脑到图像解码。这些结果共同表明,大规模fMRI响应模型(针对视觉、声音和语言)可以为提高图像解码的数据效率提供基础。

英文摘要

Brain decoding is limited by the availability of labeled neural data, and remains challenging in low-data regimes. To address this issue, we investigate whether and when brain decoding can be boosted by augmenting small fMRI datasets with synthetic data generated by a pretrained model of fMRI responses to stimuli. We use TRIBE v2, a large encoding model pretrained on more than 1000 hours of fMRI responses to video, audio and language. For each dataset, we evaluate systematic grids that show how the performance of image decoders varies with the amount of synthetic data used for training. Our results, based on two datasets (the 7T fMRI Natural Scenes Dataset and 3T fMRI BOLD5000), show up to 68% improvement in Top-10 image-retrieval accuracy compared to decoders trained only on real data. Importantly, the proportion of augmented data required to reach a given image decoding performance needs to be adjusted depending on the data source. Surprisingly, image decoders trained exclusively on synthetic fMRI can perform above chance in some settings, suggesting that TRIBE v2 can support zero-shot brain-to-image decoding. Together, these results show how large-scale models of the fMRI responses to sight, sound and language may provide a foundation to improve the data efficiency for image decoding.
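
Top-k image-retrieval accuracy, the metric quoted above, can be computed from a similarity matrix between decoded brain samples and candidate images. The sketch assumes that the correct image for sample `i` sits at column `i`; the matrix values are invented and the function name is hypothetical.

```python
def top_k_accuracy(similarity, k):
    """Fraction of samples whose true image (index i for row i) ranks within the top k."""
    hits = 0
    for i, row in enumerate(similarity):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)

# Made-up similarities: rows are decoded fMRI samples, columns are candidate images.
similarity = [
    [0.9, 0.1, 0.0],
    [0.8, 0.5, 0.1],
    [0.3, 0.9, 0.2],
]
top1 = top_k_accuracy(similarity, k=1)
top2 = top_k_accuracy(similarity, k=2)
```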

2606.06344 2026-06-05 cs.LG cs.SC 版本更新

Equivariant Neural Belief Propagation

等变神经信念传播

Zehua Cheng, Wei Dai, Jiahao Sun

发表机构 * Department of Computer Science, University of Oxford(计算机科学系,牛津大学) FLock.io

AI总结 提出等变神经信念传播(ENBP),通过等变高斯混合消息和秩2精度矩阵合成,实现SE(3)对称性下的高效概率推理,在分子构象和机器人推理任务中显著超越基线。

Comments 18 pages

详情
AI中文摘要

空间嵌入变量上的概率推理需要尊重$SE(3)$对称性的信念,然而现有的等变网络仅产生标量和向量,而不是各向异性不确定性所需的秩2精度张量,并且单分量消息将多模态能量景观坍缩为物理上无意义的平均值。我们引入了等变神经信念传播(ENBP),一种因子图框架,其消息是等变高斯混合模型,其充分统计量在$SE(3)$下精确变换。秩2精度矩阵通过等变外积合成,通过可微谱分解处理,并通过贪婪的基于KL的混合约简保持可处理性,该约简可证明与$SE(3)$交换。在GEOM-QM9和GEOM-Drugs上,ENBP在0.090 $\mathring{A}$误差下实现了98.9%的构象覆盖率,延迟为亚秒级,比扩散基线快100倍以上且精度更高。在多体机器人推理中,普通环状BP在15个及以上智能体时发散,而ENBP收敛,碰撞率接近零,等变误差达到机器精度(${\sim}10^{-7}$,而增强基线为$10^{-1}$)。

英文摘要

Probabilistic inference over spatially embedded variables requires beliefs that respect $SE(3)$ symmetry, yet existing equivariant networks produce only scalars and vectors -- not the rank-2 precision tensors needed for anisotropic uncertainty, and single-component messages collapse multi-modal energy landscapes to physically meaningless averages. We introduce Equivariant Neural Belief Propagation (ENBP), a factor-graph framework whose messages are equivariant Gaussian mixture models with sufficient statistics that transform exactly under $SE(3)$. Rank-2 precision matrices are synthesised via equivariant outer products, ingested through differentiable spectral decomposition, and kept tractable by a greedy KL-based mixture reduction that provably commutes with $SE(3)$. On GEOM-QM9 and GEOM-Drugs, ENBP achieves 98.9% conformational coverage at 0.090 $\mathring{A}$ error with sub-second latency -- over $100\times$ faster than diffusion baselines at higher accuracy. On multi-body robotic inference, vanilla loopy BP diverges at 15+ agents while ENBP converges with near-zero collision rates and machine-precision equivariance error (${\sim}10^{-7}$ vs. $10^{-1}$ for augmented baselines).
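
Why outer products give a precision matrix that transforms correctly can be seen in a short derivation. The notation is assumed for illustration: $w_k \ge 0$ are invariant scalar weights and $v_k$ are equivariant vectors, neither of which is named in the abstract; $\mu$ is the mean of one Gaussian component.

```latex
\Lambda = \sum_k w_k\, v_k v_k^{\top}, \qquad
(R, t) \in SE(3):\quad v_k \mapsto R v_k,\quad \mu \mapsto R\mu + t,\quad x \mapsto R x + t
\\[4pt]
\Lambda \;\mapsto\; \sum_k w_k\,(R v_k)(R v_k)^{\top} \;=\; R\,\Lambda\,R^{\top}
\\[4pt]
(Rx + t - R\mu - t)^{\top}\, R \Lambda R^{\top}\, (Rx + t - R\mu - t)
\;=\; (x - \mu)^{\top} R^{\top} R\, \Lambda\, R^{\top} R\, (x - \mu)
\;=\; (x - \mu)^{\top} \Lambda\, (x - \mu)
```

The last line uses $R^{\top} R = I$: the Gaussian quadratic form is unchanged when the point, the mean, and the precision are transformed together, which is the sense in which such sufficient statistics transform exactly.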

2606.06342 2026-06-05 stat.ML cs.LG 版本更新

Symmetric Divergence and Normalized Similarity: A Unified Topological Framework for Representation Analysis

对称散度与归一化相似性:表示分析的统一拓扑框架

Yan Wang, Tianyang Hu

发表机构 * School of Data Science, The Chinese University of Hong Kong, Shenzhen(数据科学学院,香港中文大学(深圳))

AI总结 提出对称表示拓扑散度(SRTD)和归一化拓扑相似性(NTS),分别解决现有拓扑散度的非对称性和无界性问题,实现细粒度结构诊断与跨场景标准化评估。

Comments Accepted by TMLR

详情
AI中文摘要

拓扑数据分析(TDA)为比较神经表示提供了一种原则性的、内在的视角。然而,现有的配对拓扑散度(如RTD)受到启发式非对称性以及更关键的无界分数(依赖于样本量)的限制,阻碍了可靠的跨场景基准测试。为了解决这些挑战,我们开发了一个统一的拓扑工具包,服务于两个互补的需求:细粒度结构诊断和鲁棒的标准化评估。首先,我们通过引入对称表示拓扑散度(SRTD)及其高效变体SRTD-lite来完善RTD框架。除了解决先前变体的理论非对称性外,SRTD将诊断信息整合到一个单一的、全面的交叉条码签名中。这使得能够精确定位结构差异,并作为有效的优化目标,无需双方向计算的开销。其次,为了在异构设置中实现可靠的基准测试,我们提出了归一化拓扑相似性(NTS)。通过测量层次合并顺序的秩相关性,NTS产生一个介于-1和1之间的尺度不变度量,有效克服了未归一化散度的尺度和样本依赖性。在合成和真实深度学习设置中的实验表明,我们的工具包捕捉到了几何度量无法发现的CNN中的功能变化,并且即使在距离饱和情况下也能鲁棒地映射LLM谱系,提供了一种严格的、拓扑感知的视角,补充了CKA等度量。

英文摘要

Topological Data Analysis (TDA) offers a principled, intrinsic lens for comparing neural representations. However, existing paired topological divergences (e.g., RTD) are limited by heuristic asymmetry and, more critically, unbounded scores that depend on sample size, hindering reliable cross-scenario benchmarking. To address these challenges, we develop a unified topological toolkit serving two complementary needs: fine-grained structural diagnosis and robust, standardized evaluation. First, we complete the RTD framework by introducing Symmetric Representation Topology Divergence (SRTD) and its efficient variant SRTD-lite. Beyond resolving the theoretical asymmetry of prior variants, SRTD consolidates diagnostic information into a single, comprehensive cross-barcode signature. This allows for precise localization of structural discrepancies and serves as an effective optimization objective without the overhead of dual directional computations. Second, to enable reliable benchmarking across heterogeneous settings, we propose Normalized Topological Similarity (NTS). By measuring the rank correlation of hierarchical merge orders, NTS yields a scale-invariant metric bounded between -1 and 1, effectively overcoming the scale and sample-dependence of unnormalized divergences. Experiments across synthetic and real-world deep learning settings demonstrate that our toolkit captures functional shifts in CNNs missed by geometric measures and robustly maps LLM genealogy even under distance saturation, offering a rigorous, topology-aware perspective that complements measures like CKA.
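
The boundedness and scale invariance claimed for a rank-correlation-based score can be illustrated with Spearman's rho. This is a sketch under assumptions: the abstract does not say which rank correlation NTS uses, and the "merge heights" below are invented values standing in for the order in which clusters merge in two representations. The formula shown assumes no ties.

```python
def ranks(values):
    """Rank positions (0 = smallest); assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result

def spearman(a, b):
    """Spearman rank correlation for tie-free sequences, bounded in [-1, 1]."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical merge heights of the same four cluster pairs in two representations.
merge_a = [0.1, 0.4, 0.2, 0.9]
rescaled = [10 * h for h in merge_a]   # same order, different scale
reversed_order = [-h for h in merge_a] # exactly opposite order

same = spearman(merge_a, rescaled)
opposite = spearman(merge_a, reversed_order)
```

Because only ranks enter the computation, multiplying all distances by a constant leaves the score unchanged, in contrast to an unnormalized divergence.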

2606.06335 2026-06-05 cs.LG cs.AI 版本更新

Bridging Domain Expertise and Generalization for Performance Estimation

弥合领域专业知识与泛化能力以实现性能估计

Shuxuan Li, Zhilin Zhao, Quyu Kong, Wei-Shi Zheng

发表机构 * School of Computer Science and Engineering, Sun Yat-sen University, China(中山大学计算机科学与工程学院) Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, China(机器智能与先进计算教育部重点实验室) Shenzhen Loop Area Institute, China(深圳河套学院) Alibaba Cloud(阿里云)

AI总结 提出FRAP方法,利用外部基础模型(foundation model)与待评估基模型(base model)的互补优势,通过温度缩放校准对齐两者的预测分布,构建更可靠的真实标签替代参考分布,从而在分布偏移下准确估计模型性能。

详情
AI中文摘要

分布偏移下的性能估计旨在预测模型在未标记测试集上的行为,该测试集的分布与训练数据不同,这一场景需要能够真实反映模型行为且无需真实标签的可靠指标。现有方法仅依赖给定模型的输出,而一旦分布发生偏移,其偏差会被放大,削弱了与真实性能的相关性。受此限制启发,我们提出融合参考对齐预测(FRAP),利用外部基础模型(foundation model)与待评估的基模型(base model)的互补优势,构建更可靠的真实标签替代。FRAP通过温度缩放校准最小化两者预测分布之间的散度,从而将基础模型的预测分布与基模型的预测分布对齐。对齐后的预测通过基于置信度的加权融合成精炼的参考分布,该分布整合了基础模型的鲁棒性和基模型的领域专业知识,并通过测量基模型预测与该参考分布的一致程度来获得性能估计。在多种数据集和架构上的大量实验表明,FRAP在分布偏移下相较于代表性性能估计方法取得了持续且显著的改进。

英文摘要

Performance estimation under distribution shift aims to predict how a model behaves on an unlabeled test set whose distribution differs from the training data, a scenario that requires reliable indicators that can faithfully reflect model behavior without ground-truth labels. Existing approaches rely solely on the outputs of the given model whose biases are amplified once the distribution shifts, weakening the correlation with the true performance. Motivated by this limitation, we propose Fused Reference Alignment Prediction (FRAP), which leverages the complementary strengths of an external foundation model and the base model to construct a more reliable surrogate of the ground-truth labels. FRAP aligns the prediction distribution of the foundation model with that of the base model by applying temperature-scaled calibration that minimizes their divergence. The aligned predictions are fused through confidence-based weighting into a refined reference distribution that integrates robustness from the foundation model and domain-specific expertise from the base model, and performance estimation is obtained by measuring how closely the base model predictions agree with this reference. Extensive experiments across diverse datasets and architectures show that FRAP provides consistent and substantial improvements over representative performance-estimation methods under distribution shift.
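
The three steps described above (temperature-scaled alignment, confidence-weighted fusion, agreement with the fused reference) can be sketched end to end. Several details are assumptions made for illustration: the temperature is applied to the foundation model's logits and chosen by grid search over a KL divergence, confidence is the maximum class probability, and agreement is measured by matching argmax labels. None of these choices is confirmed by the abstract.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL(p || q) for discrete distributions with q > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def fit_temperature(foundation_logits, base_probs, grid):
    """Pick the temperature whose scaled foundation predictions are closest to the base model's."""
    def mean_kl(t):
        pairs = zip(foundation_logits, base_probs)
        return sum(kl(b, softmax(f, t)) for f, b in pairs) / len(base_probs)
    return min(grid, key=mean_kl)

def fuse(p_foundation, p_base):
    """Confidence-weighted average, using max probability as the confidence."""
    wf, wb = max(p_foundation), max(p_base)
    return [(wf * a + wb * b) / (wf + wb) for a, b in zip(p_foundation, p_base)]

def estimate_accuracy(foundation_logits, base_probs, grid=(0.5, 1.0, 2.0, 4.0)):
    """Share of samples where the base model's label matches the fused reference label."""
    t = fit_temperature(foundation_logits, base_probs, grid)
    agree = 0
    for f, b in zip(foundation_logits, base_probs):
        reference = fuse(softmax(f, t), b)
        if reference.index(max(reference)) == b.index(max(b)):
            agree += 1
    return agree / len(base_probs)

# Made-up unlabeled test batch: foundation logits and base-model probabilities.
foundation_logits = [[4.0, 0.0], [0.0, 4.0], [4.0, 0.0]]
base_probs = [[0.8, 0.2], [0.3, 0.7], [0.4, 0.6]]
estimated = estimate_accuracy(foundation_logits, base_probs)
```

In this toy batch the third sample is one where the base model is weakly confident and the foundation model disagrees, so the fused reference sides with the foundation model and the estimate drops to two thirds.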

2606.06334 2026-06-05 cs.LG 版本更新

Quantifying the Privacy of Counterfactuals by Leveraging Membership Inference Attacks Against Synthetic Data

通过利用针对合成数据的成员推理攻击量化反事实的隐私性

Maryam Babaei, Yingke Wang, Hadrien Lautraite, Heber H. Arcolezi, Ulrich Aivodji, Sebastien Gambs

发表机构 * ÉTS Montreal and Mila Canada(蒙特利尔ÉTS学院和Mila加拿大) UQAM Canada(魁北克大学UQAM加拿大) Inria Grenoble France(法国格勒诺布尔Inria)

AI总结 本文利用针对合成数据的成员推理攻击,证明仅通过反事实即可成功实施成员推理攻击,无需访问模型,揭示了反事实发布中的隐私风险。

详情
AI中文摘要

反事实通常用于高风险决策领域,通过展示用户档案的变化如何导致期望结果来解释机器学习模型。然而,通过反事实解释模型决策也可能被对手利用,对模型或其训练数据进行隐私攻击。基于反事实提供真实训练数据的现实替代品(类似于合成数据)的类比,我们在本文中展示了如何通过借鉴针对合成数据开发的攻击,成功地对反事实进行隐私攻击。更准确地说,我们研究了针对合成数据设计的成员推理攻击在各种类型反事实上的有效性。此外,虽然现有的针对反事实的成员推理攻击通常需要能够查询模型,但我们展示了如何仅使用一组反事实(无需访问生成它们的模型)即可成功进行成员推理攻击。我们的结果表明,模型开发者在向不同用户发布反事实时应更加谨慎,因为这可能导致隐私泄露。

英文摘要

Counterfactuals are typically used in high-stakes decision areas to explain a machine learning model by showing how changes to the user profiles result in the desired outcome. However, explaining the model's decisions through counterfactuals can also be exploited by an adversary to conduct privacy attacks against the model or its training data. Drawing on the analogy that counterfactuals provide realistic substitutes for real training data, similar to synthetic data, we demonstrate in this paper how it is possible to successfully perform privacy attacks on counterfactuals by drawing on the attacks developed against synthetic data. More precisely, we investigate the effectiveness of the membership inference attacks designed for synthetic data on various types of counterfactuals. Additionally, while existing membership inference attacks against counterfactuals usually require query access to the model, we show how it is possible to perform successful membership inference attacks using only a set of counterfactuals, with no access to the model from which they are generated. Our results demonstrate that model developers should be more cautious when releasing counterfactuals to various users, as doing so can lead to a privacy breach.
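
One simple family of model-free membership inference attacks on released records is distance based: a candidate record is flagged as a likely training member if it lies unusually close to some released record. The sketch below applies that idea to a set of counterfactuals. It is a generic illustration and not necessarily one of the attacks evaluated in the paper; the threshold, names, and data points are hypothetical.

```python
import math

def min_distance(record, released):
    """Euclidean distance from a candidate record to its nearest released counterfactual."""
    return min(math.dist(record, r) for r in released)

def infer_membership(record, released, threshold):
    """Guess 'member' when the candidate is closer than the threshold to any released record."""
    return min_distance(record, released) < threshold

# Made-up released counterfactuals and two candidate records.
counterfactuals = [[1.0, 1.0], [5.0, 5.0]]
near_guess = infer_membership([1.0, 1.5], counterfactuals, threshold=1.0)
far_guess = infer_membership([3.0, 3.0], counterfactuals, threshold=1.0)
```

Note that the attacker needs only the released set, which matches the no-model-access setting described above.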

2606.06333 2026-06-05 cs.LG cs.AI 版本更新

Subspace-Aware Sparse Autoencoders for Effective Mechanistic Interpretability

子空间感知稀疏自编码器用于有效的机制可解释性

Seyed Arshan Dalili, Mehrdad Mahdavi

发表机构 * The Pennsylvania State University(宾夕法尼亚州立大学)

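AI Summary Standard sparse autoencoders assume one-dimensional features, which causes feature splitting. The paper proposes Subspace-Aware Sparse Autoencoders (SASA), which combine learned decoder subspaces, block-sparse gating, and nuclear-norm regularization to reduce feature splitting and absorption and improve monosemanticity and interpretability on GPT-2 and Mistral-7B.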

详情
AI中文摘要

稀疏自编码器(SAEs)广泛用于大型语言模型的机制可解释性,但其公式为每个潜在特征分配单个解码器方向,隐含地假设特征是一维的。我们证明这一假设与模型特征的多维结构不匹配,通过两种不同机制可证明地诱导特征分裂。从几何角度看,用单方向解码器重构内在维度$d_i \ge 2$的特征到误差$\varepsilon$,所需的原子数量随$d_i$呈指数增长。从端到端优化角度看,这种分裂不仅是可能的,而且是主动偏好的。我们证明存在一条从真实的$d_i$维基到$\ell_1$正则化SAE目标严格更低风险的连续路径,其下降方向驱使任何训练字典进入该指数区域。因此,一个单一连贯的特征被碎片化到许多近乎共线的潜在变量中,产生虚假的多重性并掩盖内在几何结构。受此启发,我们引入子空间感知稀疏自编码器(SASA),用学习的解码器子空间替换单向量解码器,通过Top-$s$组门控强制块稀疏性,并用核范数正则化器适应每个组的有效秩。然后我们证明,一旦块大小满足$r \ge d_i$,单个组不仅能表示整个特征切片,而且是SASA目标的全局最小值。这种整合产生样本复杂度关于$d_i$的多项式而非指数——鉴于每次训练激活都需要LLM前向传递,这是一个决定性优势。实验上,在GPT-2和Mistral-7B上,SASA减少了特征分裂和吸收,提高了单义性和可解释性,并且在约一半的token预算下训练,性能匹配或超过标准SAE。

英文摘要

Sparse Autoencoders (SAEs) are widely used for mechanistic interpretability in large language models, yet their formulation assigns each latent feature a single decoder direction, implicitly assuming features to be one-dimensional. We show that this assumption mismatches with the multi-dimensional structure of model features, provably inducing feature splitting through two distinct mechanisms. Geometrically, reconstructing a feature of intrinsic dimension $d_i \ge 2$ to error $\varepsilon$ with single-direction decoders forces a number of atoms that is exponential in $d_i$. From an end-to-end optimization perspective, this splitting is not merely possible but actively preferred. We prove that there exists a continuous path from the true $d_i$-dimensional basis to a strictly lower risk of the $\ell_1$-regularized SAE objective, whose descent directions drive any trained dictionary into that exponential regime. A single coherent feature is therefore fragmented across many near-collinear latents, producing spurious multiplicity and obscuring the intrinsic geometry. Motivated by this, we introduce Subspace-Aware Sparse Autoencoders (SASA), which replace single-vector decoders with learned decoder subspaces, enforce block sparsity via Top-$s$ group gating, and adapt each group's effective rank with a nuclear-norm regularizer. We then show that once the block size satisfies $r \ge d_i$, a single group not only can represent the entire feature slice but is the global minimizer of the SASA objective. This consolidation yields a sample complexity polynomial in $d_i$ rather than exponential -- a decisive advantage given that every training activation costs an LLM forward pass. Empirically, on GPT-2 and Mistral-7B, SASA reduces feature splitting and absorption, improves monosemanticity and interpretability, and matches or exceeds standard SAEs while training on roughly half the token budget.
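The block-sparse gating step described in the abstract can be illustrated with a minimal sketch. It assumes that latents are split into contiguous groups of equal size and that groups are scored by their L2 norm; the abstract only states "Top-$s$ group gating", so the scoring rule and the name `top_s_group_gate` are hypothetical.

```python
import numpy as np

def top_s_group_gate(z, group_size, s):
    """Keep the s groups with the largest L2 norm and zero out the rest.

    Scoring groups by L2 norm is an assumption made for illustration.
    """
    z = np.asarray(z, dtype=float)
    groups = z.reshape(-1, group_size)        # (num_groups, group_size)
    norms = np.linalg.norm(groups, axis=1)    # one score per group
    keep = np.argsort(norms)[-s:]             # indices of the s largest norms
    mask = np.zeros(len(groups), dtype=bool)
    mask[keep] = True
    return (groups * mask[:, None]).reshape(z.shape)

# Four groups of size 2; only the two strongest groups survive the gate.
z = np.array([0.1, 0.2, 3.0, 4.0, 0.0, 0.5, 1.0, 1.0])
gated = top_s_group_gate(z, group_size=2, s=2)
```

Each surviving group stays dense internally, which is what lets one group represent a multi-dimensional feature instead of spreading it over many single-direction latents.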

2606.06329 2026-06-05 cs.LG cs.CG cs.CV stat.ML 版本更新

Efficient Mean Curvature Computation on High-Dimensional Data Manifolds

高维数据流形上的高效平均曲率计算

Alexandre L. M. Levada

发表机构 * Federal University of São Carlos(圣卡洛斯联邦大学)

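AI Summary The original method for computing local mean curvature on high-dimensional datasets costs O(m^4) per point. The paper proposes a fast estimator based on an algebraic identity and a truncated SVD that lowers the cost to O(k^2 m + k m p^2), giving 50 to 300 times speedups on real datasets with negligible loss of accuracy.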

Comments 31 pages, 2 figures and 5 tables

详情
AI中文摘要

估计高维数据集中每个点的局部平均曲率是几何感知机器学习算法(如平均曲率边界点(MCBP)方法)的关键组成部分。该计算的朴素实现基于从k近邻块近似的局部形状算子,涉及显式构造矩阵$H$,其迹形式导致每点成本为$O(m^4)$,使得该方法对于具有超过几十个特征的数据集变得难以处理。本文提出了两个互补的贡献,共同将这一成本降低了几个数量级。第一个贡献是一个精确的代数恒等式。该恒等式源自协方差矩阵特征向量的正交性和迹算子的循环性,完全消除了$H$,并将特征分解后的每点成本降低到$O(m^2)$。第二个贡献解决了完整特征分解中剩余的$O(m^3)$瓶颈。由于局部协方差矩阵的秩最多为$k-1 \ll m$,我们将其替换为$k \times m$中心数据矩阵的截断SVD,这是一个$O(k^2 m)$操作,并基于Haar测度下零空间特征向量外积的期望值,推导出其贡献的解析近似。得到的估计器总成本为$O(k^2 m + k m p^2)$,其中$p = k-1$。在真实数据集上的实验证实,相对于原始实现,加速比为50到300倍,当使用快速估计器替换原始版本时,精度损失可忽略。通过提供可扩展且数据驱动的局部曲率估计,所提出的方法将曲率确立为从经典到现代深度学习流水线的广泛机器学习任务中的实用几何特征。

英文摘要

Estimating local mean curvature at each point of a high-dimensional dataset is a key ingredient of geometry-aware machine learning algorithms, such as the Mean Curvature Boundary Points (MCBP) method. The naive implementation of this computation, based on a local shape operator approximated from k-nearest neighbor patches, involves an explicit construction of a matrix $H$ whose trace form yields an $O(m^4)$ cost per point, rendering the approach intractable for datasets with more than a few dozen features. This paper introduces two complementary contributions that together reduce this cost by several orders of magnitude. The first contribution is an exact algebraic identity. This identity, derived from the orthogonality of the eigenvectors of the covariance matrix and the cyclicity of the trace operator, eliminates $H$ entirely and reduces the per-point cost to $O(m^2)$ after the eigendecomposition. The second contribution addresses the remaining $O(m^3)$ bottleneck of the full eigendecomposition. Since the local covariance matrix has rank at most $k-1 \ll m$, we replace it with a truncated SVD of the $k \times m$ centered data matrix, an $O(k^2 m)$ operation, and derive an analytical approximation for the contribution of the null-space eigenvectors based on the expected value of their outer product under the Haar measure. The resulting estimator has total cost $O(k^2 m + k m p^2)$, where $p = k-1$. Experiments on real-world datasets confirm speedups of 50 to 300 times relative to the original implementation, with negligible loss when the fast estimator is used to replace the original version. By providing a scalable and data-driven estimate of local curvature, the proposed method establishes curvature as a practical geometric feature for a broad range of machine learning tasks, from classical to modern deep learning pipelines.
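The second contribution rests on a standard linear-algebra fact: the nonzero eigenvalues of the local covariance matrix can be read off the singular values of the centered $k \times m$ patch, so the $m \times m$ eigendecomposition is not needed. The sketch below checks this on random data; the sizes `k` and `m` and the random patch are illustrative assumptions, and the curvature estimator itself is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
k, m = 8, 50                       # k neighbors, m features (k << m)
X = rng.normal(size=(k, m))        # stand-in for a k-nearest-neighbor patch
Xc = X - X.mean(axis=0)            # centered k x m data matrix

# Route 1: full eigendecomposition of the m x m covariance, O(m^3).
cov = Xc.T @ Xc / (k - 1)
eigvals = np.linalg.eigvalsh(cov)[::-1]        # sorted in descending order

# Route 2: singular values of the k x m matrix, O(k^2 m).
sing = np.linalg.svd(Xc, compute_uv=False)     # already in descending order
eig_from_svd = sing**2 / (k - 1)

p = k - 1                          # rank bound after centering
```

The leading `p` entries of `eigvals` and `eig_from_svd` agree up to floating-point error, and the remaining `m - p` eigenvalues of the covariance are numerically zero, which is the null space the abstract handles with an analytical approximation.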

2606.06328 2026-06-05 cs.LG cs.AI 版本更新

PAMF: Prior-Aware Multimodal Fusion for Incomplete Time Series Data

PAMF: 面向不完整时间序列数据的先验感知多模态融合

Ziwen Kan, Wugeng Zheng, Tianlong Chen, Song Wang

发表机构 * Department of Computer Science, University of Central Florida(中央佛罗里达大学计算机科学系) Department of Computer Science, University of North Carolina at Chapel Hill(北卡罗来纳大学教堂山分校计算机科学系)

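AI Summary Proposes PAMF, a framework that explicitly handles within-modality and modality-level missingness through prior-aware flow matching and weight sharing, coupling imputation with downstream prediction to improve performance on multimodal healthcare time-series tasks.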

Comments 5 figures. arXiv preprint version

详情
AI中文摘要

在医疗保健中,多模态时间序列任务在实践中通常处理不完整的观测,例如当电极脱落导致心电图片段丢失或夜间监测期间整个呼吸通道不可用时。这种缺失通常表现为两种结构上不同的模式:模态内缺失,即在某个观测模态内值缺失;以及模态级缺失,即整个模态不可用。现有方法通常通过掩码或缺失嵌入隐式表示未观测数据,而不学习实例特定的缺失信息,且大多数方法仅针对一种缺失模式设计。一种自然的方法是显式估计缺失数据;然而,现有的插补方法尽管缺失具有不同的结构先验,却统一处理缺失,并且插补过程通常与下游任务隔离,阻止下游任务引导插补朝向更具信息性的表示。为了解决这些局限性,我们提出了PAMF,一个多模态时间序列框架,它显式处理不同的缺失模式,同时通过先验感知流匹配和权重共享将插补与下游预测耦合。具体来说,该方法使用类型特定的先验初始化流匹配源状态,以区分两种缺失类型。它进一步通过架构匹配的编码器与权重共享连接插补和分类,将任务相关表示转移到插补过程中。在多个多模态医疗时间序列基准上的实验表明,与现有基线相比,所提出的方法在多样化的数据集和缺失设置下实现了最强的整体下游性能。

英文摘要

In healthcare, multimodal time series tasks often operate on incomplete observations in practice, for example when ECG segments are lost because electrodes detach or an entire respiratory channel is unavailable during overnight monitoring. Such missingness typically appears in two structurally distinct patterns: within-modality missing, where values are absent within an otherwise observed modality, and modality-level missing, where an entire modality is unavailable. Existing methods typically represent unobserved data implicitly through masks or missing embeddings, without learning instance-specific missing information, and most are designed for only one missingness pattern. A natural approach is to explicitly estimate the missing data; however, existing imputation methods treat missingness uniformly despite their different structural priors, and the imputation process is often isolated from downstream tasks, preventing downstream tasks from guiding imputation toward more informative representations. To address these limitations, we present PAMF, a multimodal time-series framework that explicitly handles different missingness patterns while coupling imputation with downstream prediction through prior-aware flow matching and weight sharing. Specifically, the method initializes the flow-matching source state with type-specific priors to distinguish two missing types. It further connects imputation and classification through architecturally matched encoders with weight sharing, transferring task-relevant representations into the imputation process. Experiments on multiple multimodal healthcare time-series benchmarks show that the proposed method achieves the strongest overall downstream performance across diverse datasets and missing settings compared with existing baselines.

2606.06320 2026-06-05 cs.LG cs.AI cs.CL 版本更新

Learning What to Forget: Improving LLM Unlearning via Learned Token-Level Importance

学习遗忘什么:通过习得的词元级重要性改进大语言模型遗忘

Gizem Yüce, Giorgos Nikolaou, Nicolas Flammarion

发表机构 * Theory of Machine Learning Lab, EPFL(机器学习理论实验室,EPFL)

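AI Summary Proposes Alternating Token-Weighted Unlearning (ATWU), a framework that jointly learns token forget-specificity and model parameters and achieves state-of-the-art forget-retain trade-offs without external supervision.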

详情
AI中文摘要

机器遗忘旨在从训练好的模型中移除特定知识,同时保留其通用能力。对于自回归语言模型,遗忘样本中的并非所有词元都与遗忘同等相关。现有方法要么忽略这种异质性,要么依赖辅助模型、启发式方法或外部标注来估计每个词元对遗忘的相关性。我们转而通过其与保留目标的交互来刻画这种相关性:一个词元是遗忘特异性的,其程度取决于在该词元上最小化遗忘损失不与保留最优性冲突。我们将这一视角形式化为一个关于模型参数和词元权重的联合优化问题,并证明在自然分离条件下,所得目标能够恢复 oracle 遗忘特异性词元支持。受此公式启发,我们引入了交替词元加权遗忘(ATWU),这是一个轻量级框架,在遗忘过程中通过一个基于隐藏状态的简单线性评分器联合学习词元遗忘特异性和模型参数,无需外部词元级监督。在 TOFU 和 RWKU 上,ATWU 实现了最先进的遗忘-保留权衡,优于样本级方法、基于概率的词元加权启发式方法和基于辅助模型的方法。此外,学习到的分数与真实遗忘特异性跨度显著更好地对齐,表明 ATWU 识别了语义上有意义的词元级遗忘信号。总体而言,我们的结果表明,保留冲突为识别语言模型应遗忘什么提供了有效标准,使得能够直接从模型表示中以最小计算开销无监督学习词元级遗忘特异性。

英文摘要

Machine unlearning aims to remove targeted knowledge from a trained model while preserving its general capabilities. For autoregressive language models, not all tokens in a forget sample are equally relevant to forgetting. Existing approaches either ignore this heterogeneity or rely on auxiliary models, heuristics, or external annotations to estimate each token's relevance for forgetting. We instead characterize it through the interaction with the retain objective: a token is forget-specific to the extent that minimizing the forget loss on that token does not conflict with retain optimality. We formalize this perspective as a joint optimization problem over the model parameters and the token weights and show that, under a natural separation condition, the resulting objective recovers the oracle forget-specific token support. Motivated by this formulation, we introduce Alternating Token-Weighted Unlearning (ATWU), a lightweight framework that jointly learns token forget-specificity and model parameters during unlearning using a simple linear scorer over the hidden states, without external token level supervision. Across TOFU and RWKU, ATWU achieves state of the art forget-retain trade-offs, outperforming sample-level methods, probability-based token weighting heuristics, and auxiliary-model-based approaches. Moreover, the learned scores align substantially better with ground truth forget-specific spans, indicating that ATWU identifies semantically meaningful token level forgetting signals. Overall, our results suggest that retain conflict provides an effective criterion for identifying what language models should forget, enabling unsupervised learning of token level forget-specificity directly from model representations with minimal computational overhead.
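A token-weighted forget loss driven by a linear scorer over hidden states might look like the sketch below. The sigmoid squashing, the weight normalization, and the names `token_weights` and `weighted_forget_loss` are assumptions for illustration; the abstract does not specify them, and the alternating optimization of scorer and model is not shown.

```python
import numpy as np

def token_weights(hidden, w, b=0.0):
    """Linear scorer followed by a sigmoid: one weight in (0, 1) per token."""
    return 1.0 / (1.0 + np.exp(-(hidden @ w + b)))

def weighted_forget_loss(token_losses, weights):
    """Weight-normalized average of the per-token losses."""
    return float(np.sum(weights * token_losses) / np.sum(weights))

# Three tokens with 2-dimensional hidden states (toy values).
hidden = np.array([[2.0, 0.0], [0.0, 0.0], [-2.0, 0.0]])
w = np.array([1.0, 0.0])                 # hypothetical scorer parameters
losses = np.array([1.0, 2.0, 3.0])       # hypothetical per-token forget losses

weights = token_weights(hidden, w)
loss = weighted_forget_loss(losses, weights)
```

With these toy values the first token receives the largest weight, so the weighted loss falls below the uniform average of 2.0.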

2606.06314 2026-06-05 math.NA cs.LG cs.NA stat.ML 版本更新

DAS-PINNs for high-dimensional partial differential equations: extending deep adaptive sampling to spacetime domains

DAS-PINNs 用于高维偏微分方程:将深度自适应采样扩展到时空域

Anshima Singh, David J. Silvester

发表机构 * University of Manchester(曼彻斯特大学) Department of Mathematics(数学系)

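AI Summary Proposes a deep adaptive sampling framework based on normalizing flows that treats space and time as a unified domain, uses the residual distribution to identify high-residual regions automatically and generate sampling points there, and effectively solves high-dimensional time-dependent PDEs with localized dynamic features.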

详情
AI中文摘要

具有空间局部和动态演化解的时变高维偏微分方程对物理信息神经网络(PINNs)构成根本性挑战,因为在高维时空域中均匀配点采样越来越无效。本文将深度自适应采样框架扩展到时变设置,将空间和时间视为统一域,无需任何显式时间推进。归一化流神经网络模型有效学习由PDE残差诱导的分布,并生成集中在解最难学习区域的新配点。与需要显式时间步进或移动网格的传统自适应策略不同,高残差区域由PDE残差分布驱动,在空间和时间上自动识别和跟踪。通过从二维空间中的尖锐移动特征到高达八维空间中的局部结构等一系列基准问题,评估了所提策略的有效性。

英文摘要

Time-dependent high-dimensional partial differential equations (PDEs) with spatially localised and dynamically evolving solutions pose a fundamental challenge for physics-informed neural networks (PINNs), as uniform collocation sampling becomes increasingly ineffective in high-dimensional spatiotemporal domains. In this work, a deep adaptive sampling framework for PINNs is extended to the time-dependent setting by treating space and time as a unified domain without any explicit time marching. A normalising flow neural network model effectively learns the distribution induced by the PDE residual and generates new collocation points concentrated in regions where the solution is most difficult to learn. Unlike conventional adaptive strategies that require explicit time stepping or moving meshes, high-residual regions are automatically identified and tracked across both space and time, driven purely by the PDE residual distribution. The effectiveness of the proposed strategy is assessed on a range of benchmark problems, from sharp and moving features in two spatial dimensions to localised structures in up to eight spatial dimensions.

2606.06313 2026-06-05 physics.flu-dyn cs.LG physics.comp-ph 版本更新

Wall Shear Stress Reconstruction from Concentration: Differentiable Physics and Physics-Informed Neural Networks

从浓度重建壁面剪切应力:可微物理与物理信息神经网络

Mahmoud Elhadidy, Siva Viknesh, Roshan M. D'Souza, Amirhossein Arzani

发表机构 * Department of Mechanical Engineering, University of Utah, Salt Lake City, UT, USA(机械工程系,犹他大学,盐湖城,UT,美国) Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, UT, USA(科学计算与成像研究所,犹他大学,盐湖城,UT,美国) Department of Mechanical Engineering, University of Wisconsin–Milwaukee, Milwaukee, WI, USA(机械工程系,威斯康星大学密尔沃基分校,密尔沃基,WI,美国) Department of Biomedical Engineering, University of Utah, Salt Lake City, UT, USA(生物医学工程系,犹他大学,盐湖城,UT,美国)

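AI Summary Reconstructs wall shear stress from spatially limited passive scalar observations using two inverse methods, a differentiable physics framework and physics-informed neural networks, and shows that measurement location and inverse formulation jointly determine reconstruction accuracy.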

详情
AI中文摘要

壁面剪切应力(WSS)控制近壁输运动力学,是心血管流动中的关键血流动力学指标,但由于需要精确计算近壁速度梯度,仍难以准确推断。被动标量场(如浓度或温度)由相同速度场输运,有潜力揭示隐藏的流动物理指标(如WSS)。本文通过两种根本不同的逆框架,从空间有限的被动标量观测中展示这种重建:基于离散伴随、PDE约束优化的可微物理框架(将控制方程作为硬约束)和物理信息神经网络(PINNs,将其作为软约束)。基准问题包括二维典型后向台阶(2D-BFS)和三维患者特异性狭窄冠状动脉。对于2D-BFS情况,在三种测量场景(近壁、远场和组合)下评估,当近壁数据可用时PINN达到高精度,但限于远场测量时失败,而可微物理方法在所有场景下均恢复准确的WSS。在三维患者特异性案例中,可微物理框架优于PINNs,产生准确的WSS重建。这些结果确立了测量位置和逆公式共同决定基于标量的近壁流动推断的重建保真度。所提出的框架开辟了从标量输运数据估计近壁血流动力学的途径,并广泛适用于可观测被动标量的流体流动问题。

英文摘要

Wall shear stress (WSS) governs near-wall transport dynamics and is a key hemodynamic indicator in cardiovascular flows, yet remains difficult to infer accurately due to the need for precise computation of near-wall velocity gradients. Passive scalar fields, such as concentration or temperature, are advected by the same underlying velocity field and have the potential to uncover hidden flow physics metrics such as WSS. In this work, we demonstrate such reconstruction from spatially limited passive scalar observations using two fundamentally different inverse frameworks: a differentiable physics framework based on discrete adjoint, PDE-constrained optimization, which enforces the governing equations as hard constraints, and physics-informed neural networks (PINNs), which treat them as soft constraints. Benchmark problems include a 2D canonical backward-facing step (2D-BFS) and a 3D patient-specific stenotic coronary artery. For the 2D-BFS case, evaluated under three measurement scenarios (near-wall, far-field, and combined), PINN achieves high accuracy when near-wall data are available but fails when restricted to far-field measurements, whereas the differentiable physics approach recovers accurate WSS across all scenarios. In the 3D patient-specific case, the differentiable physics framework outperforms PINNs, yielding accurate WSS reconstruction. These results establish that measurement location and inverse formulation jointly determine reconstruction fidelity in scalar-based near-wall flow inference. The proposed framework opens a path toward estimation of near-wall hemodynamics from scalar transport data, with broader applicability to fluid flow problems where passive scalars can be observed.

2606.06303 2026-06-05 cs.LG cs.AI 版本更新

Plug-and-Play Guidance for Discrete Diffusion Models via Gradient-Informed Logit Correction

基于梯度信息逻辑校正的离散扩散模型即插即用引导

Hongkun Dou, Zike Chen, Fengji Li, Hongjue Li, Yue Deng

发表机构 * National University of Singapore(新加坡国立大学) University of Science and Technology of China(中国科学技术大学)

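AI Summary Proposes GILC, a framework that estimates guidance signals by using the pretrained denoising network as a variational proxy and introduces a Jacobian-free mechanism that directly corrects the clean-prediction logits, enabling controllable generation with discrete diffusion models without additional training and reaching state-of-the-art performance on DNA, protein sequence, and molecular generation tasks.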

Comments Accepted by ICML 2026

详情
AI中文摘要

离散扩散模型的可控生成常常受到高计算开销或需要重新训练的限制。在本文中,我们提出了\underline{\textbf{G}}radient-\underline{\textbf{I}}nformed \underline{\textbf{L}}ogit \underline{\textbf{C}}orrection (\textbf{GILC}),这是一个即插即用框架,通过将预训练的去噪网络重新用作变分代理来高效估计引导信号。为了规避高维离散空间中固有的梯度不稳定性,我们引入了一种无雅可比机制,直接校正干净预测的逻辑,从而实现稳定且有效的引导。我们的方法适用于可微和不可微的奖励函数。在DNA、蛋白质序列和分子生成任务上的大量实验表明,GILC无需额外训练即可达到最先进的性能,并且常常优于微调方法。

英文摘要

Controllable generation with discrete diffusion models is often hindered by high computational overhead or the need for retraining. In this paper, we present \underline{\textbf{G}}radient-\underline{\textbf{I}}nformed \underline{\textbf{L}}ogit \underline{\textbf{C}}orrection (\textbf{GILC}), a plug-and-play framework that efficiently estimates guidance signals by repurposing the pretrained denoising network as a variational proxy. To circumvent the gradient instability inherent in high-dimensional discrete spaces, we introduce a Jacobian-free mechanism that directly corrects the clean prediction logits, facilitating stable and effective guidance. Our method accommodates both differentiable and non-differentiable reward functions. Extensive experiments across DNA, protein sequence, and molecular generation tasks demonstrate that GILC achieves state-of-the-art performance without additional training, frequently outperforming fine-tuning approaches.

2606.06295 2026-06-05 cs.LG physics.bio-ph physics.chem-ph 版本更新

Reactive Flux Matching: Mechanism Discovery and Adaptive Sampling of Rare Events

反应通量匹配:稀有事件的机制发现与自适应采样

Rishal Aggarwal, David Ryan Koes, Nicholas M. Boffi, Eric Vanden-Eijnden

发表机构 * CMU-Pitt Program in Computational Biology(CMU-匹兹堡计算生物学项目) Dept. of Computational & Systems Biology University of Pittsburgh(计算与系统生物学系匹兹堡大学) Machine Learning Department Dept. of Mathematical Sciences Carnegie Mellon University(机器学习系数学科学系卡内基梅隆大学) Courant Institute, New York University(纽约大学Courant研究所) Machine Learning Lab Capital Fund Management(机器学习实验室资本基金管理)

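AI Summary Proposes the Flux Matching framework, which learns a current velocity u(z) and a scalar potential h(z) from reactive trajectory data to identify reaction pathways and reaction coordinates and to enable adaptive sampling.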

Comments 21 pages, 7 figures, submitted to NeurIPS 2026

详情
AI中文摘要

路径采样方法生成连接亚稳态的反应轨迹系综,但从这些数据中提取机制性洞察仍然具有挑战性。我们引入了通量匹配,这是一个直接从反应轨迹数据中学习两个互补对象的框架:电流速度 $u(z)$,其流线描绘了主导反应路径;以及标量势 $h(z)$,通过对反应电流进行加权亥姆霍兹-霍奇分解得到,作为数据驱动的反应坐标。两者都最小化反应路径系综上的二次泛函,类似于生成建模中的流匹配损失,并且不需要了解底层动力学或平稳分布。与基于committor的方法不同,$u$ 和 $h$ 在投影到非马尔可夫集体变量上时仍然定义良好,它们的水平集反过来为增强采样方法提供了自适应界面,以改进采样。通量匹配通过生成电流速度轨迹和分子系统上的速率常数计算得到验证。

英文摘要

Path sampling methods generate ensembles of reactive trajectories connecting metastable states, but extracting mechanistic insight from these data remains nontrivial. We introduce Flux Matching, a framework that learns two complementary objects directly from reactive trajectory data: a current velocity $u(z)$, whose streamlines trace the dominant reaction pathways, and a scalar potential $h(z)$, obtained from a weighted Helmholtz-Hodge decomposition of the reactive current, that serves as a data-driven reaction coordinate. Both minimize quadratic functionals over the reactive path ensemble, analogous to the flow matching loss in generative modeling, and require no knowledge of the underlying dynamics or stationary distribution. Unlike committor-based methods, $u$ and $h$ remain well-defined under projection onto non-Markovian collective variables, and their level sets in turn provide adaptive interfaces for improved sampling with enhanced sampling methods. Flux Matching is validated through the generation of current velocity trajectories and rate constant calculations on molecular systems.

2606.06293 2026-06-05 cs.LG stat.ML 版本更新

PAC-Bayesian Adversarially Robust Generalization for Message Passing Graph Neural Networks: A Sensitivity Analysis

消息传递图神经网络的PAC-Bayesian对抗鲁棒泛化:敏感性分析

Ziling Liang, Xinping Yi, Qingsong Wen, Shi Jin

发表机构 * School of Information Science and Engineering, Southeast University(信息科学与工程学院,东南大学) Squirrel Ai Learning

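AI Summary Using a sensitivity-aware PAC-Bayesian framework, the paper exploits the rank constraint on the output Jacobians and anisotropic Gaussian posteriors to derive tighter adversarially robust generalization bounds for message passing graph neural networks.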

详情
AI中文摘要

尽管图神经网络(GNNs)对对抗攻击的脆弱性对图表示学习构成了严重威胁,但在对抗环境下对鲁棒泛化行为的理解仍然是一个基本挑战。最近,基于PAC-Bayesian边际的泛化分析通过提供灵活且数据依赖的分析框架,显著推动了这一研究方向。然而,现有的鲁棒分析通常依赖于各向同性高斯后验,并在全参数空间中控制权重扰动,这限制了捕捉异质参数敏感性的能力,且依赖于隐藏宽度相关的复杂度项,导致泛化界不够紧。在本文中,我们将最近提出的敏感性感知PAC-Bayesian框架从深度神经网络扩展到消息传递图神经网络(MPGNNs),并在对抗环境下导出了更紧的鲁棒泛化界。具体地,我们首先通过推导关于权重参数的输出雅可比矩阵,量化不同参数块的扰动对网络输出的敏感性。利用这些雅可比矩阵在$K$类图分类中秩最多为$K$的事实,我们构建了雅可比对齐的敏感性矩阵,并使用具有优化协方差的异向高斯后验来紧上界KL散度。值得注意的是,通过细化学习权重的谱范数依赖性,并将主导维度因子从隐藏宽度相关项减少到类别数$K$,我们的分析为MPGNNs提供了更紧的鲁棒泛化保证,从而指导其设计以增强对抗鲁棒性。

英文摘要

Whilst the vulnerability of graph neural networks (GNNs) to adversarial attacks poses a critical threat to graph representation learning, the understanding of the robust generalization behavior remains a fundamental challenge in the adversarial setting. Recently, PAC-Bayesian margin-based generalization analysis substantially advances this line of research by providing a flexible and data-dependent analytical framework. However, existing robust analyses often rely on isotropic Gaussian posteriors and control weight perturbations in the full parameter space, which limits the ability to capture heterogeneous parameter sensitivity yet hinges on hidden-width-dependent complexity terms, resulting in not-tight-enough generalization bounds. In this paper, we extend a recently proposed sensitivity-aware PAC-Bayesian framework from deep neural networks to message passing GNNs (MPGNNs) and derive a tighter robust generalization bound in the adversarial setting. Specifically, we first quantify how sensitive the perturbations across different parameter blocks are to the network outputs by deriving the output Jacobians with respect to the weight parameters. Exploiting the fact that these Jacobian matrices have rank at most $K$ in $K$-class graph classification, we then construct Jacobian-aligned sensitivity matrices and use anisotropic Gaussian posteriors with optimized covariances to upper bound the KL divergence in a tight way. Notably, by refining the spectral-norm dependence on the learned weights and reducing the leading dimension factor from hidden-width-dependent terms to the number of classes $K$, our analysis yields much tighter robust generalization guarantees for MPGNNs, thereby guiding their designs to enhance adversarial robustness.

2606.06288 2026-06-05 stat.ML cs.LG 版本更新

Discrete Causal Representations from Heterogeneous Domains: A Bayesian Approach with Social Survey Applications

来自异构域的离散因果表示:一种贝叶斯方法及其在社会调查中的应用

Ankur Garg, Michael Stettler, Aaron Schein, Julius von Kügelgen

发表机构 * Department of Statistics, University of Chicago(芝加哥大学统计学系) University of Tübingen(图宾根大学) Department of Statistics & Data Science Institute, University of Chicago(芝加哥大学统计学与数据科学研究所) Seminars for Statistics, ETH Zürich(苏黎世联邦理工学院统计系)

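AI Summary Proposes a Bayesian approach for learning discrete causal concepts from multi-environment data, approximates the multimodal posterior with sequential Monte Carlo sampling, and shows on social survey data that it infers meaningful high-level concepts and causal relations.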

详情
AI中文摘要

因果表示学习旨在推断产生观测到的低层测量的高层潜在因果概念。这对于来自不同环境或领域的异构数据尤其相关,因为分布偏移通常通过某些底层因果机制中的稀疏局部变化而发生,而生成过程的其他部分保持不变。尽管因果表示的可识别性已被广泛研究,但实用的不确定性感知方法和真实世界用例仍较少探索。在这项工作中,我们提出了一种从多环境数据中学习因果表示的贝叶斯方法,重点关注离散因果概念和未知的多节点软干预的情况。为此,我们将因果假设和可解释性需求转化为层次模型中的适当先验和参数选择。然后,我们设计了一种基于序贯蒙特卡洛采样的推理方案来近似得到的多模态后验。我们通过社会调查数据的案例研究展示了我们的方法,其中潜在因果概念对应于文化价值观或政治观点,测量对应于调查响应,环境对应于不同的国家或州。我们的模型推断出有意义的高层概念以及它们之间合理的因果关系,展示了其在学习复杂真实世界数据的因果表示方面的实用性。

英文摘要

Causal representation learning aims to infer the high-level latent causal concepts that give rise to observed low-level measurements. This is particularly relevant for heterogeneous data from different environments or domains since distribution shifts often arise through sparse, localized changes in some of the underlying causal mechanisms, while other parts of the generative process remain unchanged. Whereas identifiability of causal representations has been studied extensively, practical uncertainty-aware methods and real-world use cases remain less explored. In this work, we propose a Bayesian approach to learning causal representations from multi-environment data, focusing on the case of discrete causal concepts and unknown multi-node soft interventions. To this end, we translate causal assumptions and interpretability desiderata into suitable priors and parametric choices within a hierarchical model. We then devise an inference scheme based on sequential Monte Carlo sampling to approximate the resulting multimodal posterior. We showcase our approach through case studies on social survey data, where latent causal concepts correspond to cultural values or political opinions, measurements to survey responses, and environments to different countries or states. Our model infers meaningful high-level concepts and plausible causal relations among them, demonstrating its utility for learning causal representations of complex real-world data.

2606.06272 2026-06-05 cs.LG cs.AI 版本更新

Your GFlowNet Secretly Learns an Optimal Transport Plan

你的GFlowNet秘密学习了一个最优传输方案

Ian Maksimov, Nikita Morozov, Denis Belomestny, Sergey Samsonov

发表机构 * GitHub arXiv

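AI Summary Establishes a theoretical connection between non-acyclic generative flow networks and optimal transport, proving that the policy learned by a minimum-flow GFlowNet encodes an optimal transport plan from the source distribution to the target distribution.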

Comments ICML 2026 SPIGM Workshop

详情
AI中文摘要

生成流网络(GFlowNets)是一个通过有向图中的随机轨迹对结构化对象进行采样的框架。在这项工作中,我们建立了非无环GFlowNets与最优传输(OT)之间的理论联系。我们证明,在最小流GFlowNet中固定初始流分布会将其目标函数简化为具有图诱导最短路径成本的Kantorovich OT问题。因此,在最优解处,学习到的GFlowNet策略编码了从源分布到目标分布的最优传输方案:我们表明,从最小流GFlowNet中采样轨迹可以恢复相应的最优耦合。我们的公式通过边流和神经参数化,使得将GFlowNet学习框架应用于大规模图上的OT问题成为可能。实验证实了与精确OT求解器的一致性,并展示了GFlowNets可以学习高质量的传输方案。

英文摘要

Generative Flow Networks (GFlowNets) are a framework for sampling structured objects via stochastic trajectories in a directed graph. In this work, we establish a theoretical connection between non-acyclic GFlowNets and optimal transport (OT). We show that fixing the initial flow distribution in a minimum-flow GFlowNet reduces its objective to a Kantorovich OT problem with graph-induced shortest path costs. At the optimum, the learned GFlowNet policy therefore encodes an optimal transport plan from the source distribution to the target distribution: we show that sampling trajectories from the minimum-flow GFlowNet recovers the corresponding optimal coupling. Our formulation enables applying the GFlowNet learning framework to OT problems on large graphs via edge flows and neural parameterization. Experiments confirm agreement with exact OT solvers and demonstrate that GFlowNets can learn high-quality transport plans.

2606.06249 2026-06-05 cs.CV cs.LG 版本更新

GRAMformer: Any-Order Modality Interactions via Volumetric Multimodal Cross-Attention

GRAMformer: 通过体积多模态交叉注意力实现任意顺序模态交互

Giordano Cicchetti, Eleonora Grassucci, Danilo Comminiello

发表机构 * Dept. of Information Engineering, Electronics, and Telecommunications, Sapienza University of Rome(信息工程、电子与电信系,罗马萨皮恩扎大学)

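AI Summary Proposes Volumetric Multimodal cross-Attention (VMA), which models any-order modality interactions by computing the joint geometric volume of the query and the multimodal key vectors, and integrates it into GRAMformer, a new multimodal transformer architecture, improving the effectiveness and efficiency of multimodal learning.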

详情
AI中文摘要

基于Transformer的多模态模型依赖注意力机制来整合异构模态间的信息。尽管取得了成功,现有的多模态注意力公式通过成对点积交互的集合或将所有模态拼接成键来计算分数,即使多个模态应该被联合参与。因此,当前方法要么在模态数量上产生二次复杂度,要么无法显式建模依赖于多个表示联合配置的交互。在这项工作中,我们引入了体积多模态交叉注意力(VMA),一种新颖的交叉注意力机制,其中注意力分数被定义为查询和多个模态特定键的联合几何的函数。VMA计算跨多个模态的查询和键向量所张成的体积,捕获超越成对相似性的联合多模态依赖,实现任意顺序模态交互的原生建模。我们将VMA集成到我们新颖的多模态Transformer架构中,命名为GRAMformer,该架构专门设计用于整合任意数量的模态。我们在多模态学习任务上评估了所提出的模型,展示了改进的有效性和效率。

英文摘要

Transformer-based multimodal models rely on attention mechanisms to integrate information across heterogeneous modalities. Despite their success, existing multimodal attention formulations compute their scores through collections of pairwise dot-product interactions or by concatenating all the modalities into the keys, even when multiple modalities should be jointly involved. As a consequence, current approaches either incur quadratic complexity in the number of modalities or fail to explicitly model interactions that depend on the joint configuration of multiple representations. In this work, we introduce the Volumetric Multimodal cross-Attention (VMA), a novel cross-attention mechanism in which attention scores are defined as a function of the joint geometry of a query and multiple modality-specific keys. VMA computes the volume spanned by query and key vectors across multiple modalities, capturing joint multimodal dependencies beyond pairwise similarity, enabling native modeling of any-order modality interactions. We integrate VMA into our novel multimodal transformer architecture, named GRAMformer, explicitly designed to integrate any number of modalities. We evaluate the proposed model on multimodal learning tasks, demonstrating improved effectiveness and efficiency.
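The volume spanned by a query and several modality-specific keys can be computed from the determinant of their Gram matrix, which is the standard formula for the volume of a parallelotope. The sketch below shows only that computation; whether VMA uses exactly this formula, and how the volume is mapped to an attention score, are not stated in the abstract, so `spanned_volume` and the toy vectors are hypothetical.

```python
import numpy as np

def spanned_volume(vectors):
    """Volume of the parallelotope spanned by the rows of `vectors`.

    Uses sqrt(det(V V^T)); clipping at zero guards against tiny negative
    determinants caused by floating-point error.
    """
    V = np.asarray(vectors, dtype=float)
    gram = V @ V.T
    return float(np.sqrt(max(np.linalg.det(gram), 0.0)))

q = np.array([1.0, 0.0, 0.0])          # query
k_audio = np.array([0.0, 1.0, 0.0])    # key from one modality
k_video = np.array([0.0, 0.0, 1.0])    # key from another modality

orthogonal = spanned_volume([q, k_audio, k_video])   # mutually orthogonal
aligned = spanned_volume([q, q, q])                  # perfectly aligned
```

Mutually orthogonal unit vectors span unit volume, while perfectly aligned vectors span zero volume, so a single scalar reflects the joint configuration of all vectors rather than a set of pairwise similarities.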

2606.06238 2026-06-05 cs.LG cond-mat.stat-mech hep-lat 版本更新

Generative Criticality in Large Language Model Temperature Scaling

大型语言模型温度缩放中的生成临界性

Huajian Ruan, Jinyang Li, Xingyu Guo, Lingxiao Wang

发表机构 * State Key Laboratory of Nuclear Physics and Technology, Institute of Quantum Matter, South China Normal University(核物理与技术国家重点实验室,量子物质研究院,华南师范大学) Key Laboratory of Atomic and Subatomic Structure and Quantum Control (MOE), Guangdong-Hong Kong Joint Laboratory of Quantum Matter(原子与亚原子结构及量子控制重点实验室(MOE),粤港量子物质联合实验室) Guangdong Basic Research Center of Excellence for Structure and Fundamental Interactions of Matter, Guangdong Provincial Key Laboratory of Nuclear Science(物质结构与基本相互作用卓越基础研究中心,广东省核科学重点实验室) KEK Theory Center, Institute of Particle and Nuclear Studies(KEK理论中心,粒子与核物理研究所) RIKEN Center for Interdisciplinary Theoretical and Mathematical Sciences (iTHEMS), Wako(RIKEN交叉学科理论与数学科学中心(iTHEMS),Wako) Graduate University for Advanced Studies (SOKENDAI), Oho 1-1, Tsukuba, Ibaraki(高等研究大学(SOKENDAI),Oho 1-1,筑波,Ibaraki) Institute for Physics of Intelligence, The University of Tokyo(智能物理研究院,东京大学)

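AI Summary Studies temperature scaling in LLM text generation through a statistical-field framework and finds critical behavior resembling a continuous phase transition near a characteristic temperature, providing quantitative tools for understanding the link between decoding strategies and critical phenomena.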

Comments 9 pages, 7 figures, contributed to PAI 2026 Conference

详情
AI中文摘要

我们为大型语言模型(LLM)生成的文本提出一个统计场框架,将token嵌入视为一维链上的连续自旋变量。通过连接的两点关联函数定义磁化率,并通过系综平均嵌入场定义序参量,我们改变softmax温度$T$,观察到在特征温度$T_c$附近出现尖锐的磁化率峰,具有幂律标度行为,序参量同时发生快速变化,并在$T_c$以下坍缩到单一语义方向。由最近邻(TwoNN)方法估计的内在维度独立地证实了这些发现,在$T_c$附近达到最小值。结果在模型规模(Qwen3:0.6B--32B)和提示类别上具有鲁棒性。虽然现象学上类似于连续相变,但自回归生成的非平衡性质需要进一步研究。我们的框架为探测LLM输出的集体统计结构提供了定量工具,并暗示了解码策略与临界现象之间的联系。

英文摘要

We propose a statistical-field framework for text generated by large language models (LLMs), treating token embeddings as continuous spin variables on a one-dimensional chain. Defining a susceptibility from the connected two-point correlator and an order parameter from the ensemble-averaged embedding field, we vary the \texttt{softmax} temperature $T$ and observe a sharp susceptibility peak near a characteristic $T_c$ with power-law-like scaling, a concurrent rapid change in the order parameter, and a collapse onto a single semantic direction below $T_c$. The intrinsic dimension estimated by the two nearest neighbor (TwoNN) method independently corroborates these findings, reaching a minimum near $T_c$. Results are robust across model scales (Qwen3: 0.6B--32B) and prompt categories. While the phenomenology closely resembles a continuous phase transition, the non-equilibrium nature of autoregressive generation warrants further investigation. Our framework provides quantitative tools for probing the collective statistical structure of LLM outputs and suggests connections between decoding strategies and critical phenomena.
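The control parameter in this study is the softmax temperature $T$, which rescales the logits before sampling. The sketch below shows that scaling and the resulting entropy of the next-token distribution; the logits and temperatures are illustrative values, and the susceptibility and order parameter from the abstract are not computed here.

```python
import numpy as np

def softmax_with_temperature(logits, T):
    """Softmax of logits / T, shifted by the maximum for numerical stability."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()
    p = np.exp(z)
    return p / p.sum()

def entropy(p):
    """Shannon entropy in nats, skipping zero-probability entries."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

logits = np.array([2.0, 1.0, 0.0, -1.0])   # toy next-token logits
entropies = [entropy(softmax_with_temperature(logits, T))
             for T in (0.1, 1.0, 10.0)]
```

Low temperatures concentrate the distribution on the top token (entropy near zero), and high temperatures push it toward uniform (entropy approaching $\ln 4$ for four tokens).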

2606.06236 2026-06-05 cs.LG 版本更新

Tracing the Oracle: Improving Diffusion Timestep Scheduling for 3D CT Reconstruction

追踪神谕:改进扩散时间步调度用于3D CT重建

Yujia Wu, Zhaoqiang Liu

发表机构 * School of Computer Science and Engineering, University of Electronic Science and Technology of China(电子科技大学计算机科学与工程学院)

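AI Summary Diffusion-model inference for 3D CT reconstruction is computationally expensive, and uniform timestep schedules introduce large truncation errors. The paper proposes TrO, a plug-and-play framework that optimizes the timestep schedule with dynamic programming and significantly improves reconstruction fidelity and computational efficiency under a limited number of sampling steps.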

Comments Accepted to ECML-PKDD 2026

详情
AI中文摘要

预训练扩散模型在解决高度病态的3D计算机断层扫描(CT)逆问题中展现出令人印象深刻的潜力,但推理过程存在显著的计算开销。此外,现有的均匀时间步调度未能捕捉反向条件扩散随机微分方程的非均匀演化,从而引入了大量截断误差。为克服这一限制,我们提出Tracing the Oracle(TrO),一种用于改进时间步调度的即插即用框架。具体而言,我们将少量样本上的密集采样数值积分轨迹视为参考神谕。通过动态规划全局最小化少步近似与神谕之间的累积误差,提取优化后的调度。该机制将有限的采样步精确分配到对截断误差高度敏感的关键演化阶段。我们在AAPM数据集上的多个3D CT重建任务中进行的广泛实验表明,与最先进的3D CT重建方法DDS结合时,我们的优化时间步在不超过10个采样步的严格预算下,相比现有启发式调度显著提高了重建保真度和计算效率。

英文摘要

Pretrained diffusion models demonstrate impressive potential in solving highly ill-posed 3D computed tomography (CT) inverse problems, while the inference process suffers from significant computational overhead. Furthermore, existing uniform timestep schedules fail to capture the non-uniform evolution of the reverse conditional diffusion stochastic differential equation, thereby introducing substantial truncation errors. To overcome this limitation, we propose Tracing the Oracle (TrO), a plug-and-play framework for improved timestep scheduling. Specifically, we treat densely sampled numerical integration trajectories on a few samples as the reference oracle. The optimized schedule is extracted by leveraging dynamic programming to globally minimize the cumulative error between the few-step approximation and the oracle. This mechanism precisely allocates the limited sampling steps to critical evolution stages that are highly susceptible to truncation errors. Our extensive experiments on the AAPM dataset across multiple 3D CT reconstruction tasks demonstrate that, when combined with the state-of-the-art 3D CT reconstruction method DDS, our optimized timesteps significantly improve reconstruction fidelity and computational efficiency compared to existing heuristic schedules, especially under a strict budget of no more than 10 sampling steps.
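
The idea of extracting a few-step schedule by dynamic programming can be sketched as a shortest-path problem over a dense time grid. This is a simplified illustration, not the TrO algorithm itself. It assumes that the error of a schedule is additive over jumps and that `cost[i][j]` (a hypothetical, precomputed quantity) measures the deviation from the dense reference trajectory when jumping directly from dense index `i` to `j`. The toy cost below is invented for the example.

```python
def optimal_schedule(cost, num_steps):
    """Pick dense-grid indices 0 = s_0 < ... < s_K = n minimizing summed jump cost.

    cost[i][j] is the (assumed additive) error of one jump from index i to j, i < j.
    Returns (total_error, list_of_indices) using exactly num_steps jumps.
    """
    n = len(cost) - 1
    inf = float("inf")
    # best[k][j]: minimal cumulative error of reaching index j with exactly k jumps
    best = [[inf] * (n + 1) for _ in range(num_steps + 1)]
    prev = [[-1] * (n + 1) for _ in range(num_steps + 1)]
    best[0][0] = 0.0
    for k in range(1, num_steps + 1):
        for j in range(1, n + 1):
            for i in range(j):
                cand = best[k - 1][i] + cost[i][j]
                if cand < best[k][j]:
                    best[k][j] = cand
                    prev[k][j] = i
    if best[num_steps][n] == inf:
        raise ValueError("no schedule with this many steps reaches the end")
    # Backtrack from the final index to recover the chosen schedule.
    path, j = [n], n
    for k in range(num_steps, 0, -1):
        j = prev[k][j]
        path.append(j)
    path.reverse()
    return best[num_steps][n], path

# Toy cost: the first dense interval is five times more "sensitive" than the others.
weights = [5, 1, 1, 1, 1]
n = len(weights)
cost = [[sum(weights[i:j]) ** 2 if i < j else 0 for j in range(n + 1)]
        for i in range(n + 1)]
total, schedule = optimal_schedule(cost, num_steps=3)
print(total, schedule)
```

In this toy setting the optimizer spends one of its three steps on the sensitive first interval alone, which mirrors the abstract's point that a limited step budget should be concentrated where truncation error is largest.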

2606.06235 2026-06-05 cs.LG cs.AI 版本更新

Design a Reliable LLM-Integrated Interface for Mortality Forecasting

设计一个可靠的LLM集成接口用于死亡率预测

Thi Kim Ngan Nguyen

发表机构 * Curtin University(科廷大学)

AI总结 提出一个结合大语言模型(LLM)的接口,通过自然语言输入驱动确定性预测流程,在保持统计精度的同时提升非专家用户的可及性。

Comments 7 pages, 7 figures

AI中文摘要

死亡率预测在精算和政策决策中扮演重要角色,但其实现仍然技术复杂且对非专家用户不友好。本项目提出一个可靠的大语言模型(LLM)集成接口,在保持统计功效的同时提升可用性。LLM被设计为一个约束编排层,将自然语言输入转化为确定性预测流程的结构化配置。采用三阶段方法确保准确性、可用性和透明度。首先,使用CoMoMo包实现基线流程,复现已建立的死亡率预测结果。其次,扩展流程以使用滚动原点评估和均方误差(MSE)生成多步预测。第三,原型接口使用本地LLM以自然语言处理用户的预测请求。该系统表明,LLM可以在不损害高敏感性分析工作流中的可重复性、透明度或精算有效性的前提下增强可访问性。

英文摘要

Mortality forecasting plays an important role in actuarial and policy decision-making, but its implementation remains technically complex and inaccessible to non-expert users. This project proposes a reliable large language model (LLM)-integrated interface that improves usability while maintaining statistical power. The LLM is designed as a constrained orchestration layer that translates natural-language inputs into structured configurations for a deterministic forecasting pipeline. A three-phase methodology is employed to ensure accuracy, usability, and transparency. First, a baseline pipeline is implemented using the CoMoMo package, reproducing established mortality forecasting results. Second, the pipeline is extended to generate multi-step forecasts using rolling-origin evaluation and mean squared error (MSE). Third, a prototype interface uses a local LLM to handle users' forecasting requests in plain language. The system demonstrates that LLMs can enhance accessibility without compromising reproducibility, transparency, or actuarial validity in high-stakes analytical workflows.
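
Rolling-origin evaluation with MSE, as named in the second phase above, can be sketched in a few lines. This is a generic illustration and does not use the CoMoMo package or reproduce the paper's pipeline. The random-walk-with-drift forecaster and the synthetic series are assumptions chosen to keep the example self-contained.

```python
def drift_forecast(history, horizon):
    """Random walk with drift: extrapolate the average historical step."""
    drift = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + drift * h for h in range(1, horizon + 1)]

def rolling_origin_mse(series, min_train, horizon, forecaster):
    """Mean squared error per forecast step, averaged over all forecast origins.

    At each origin the model sees series[:origin] and is scored on the next
    `horizon` observations; the origin then moves forward by one period.
    """
    sq_err = [0.0] * horizon
    n_origins = 0
    for origin in range(min_train, len(series) - horizon + 1):
        train = series[:origin]
        actual = series[origin:origin + horizon]
        pred = forecaster(train, horizon)
        for h in range(horizon):
            sq_err[h] += (pred[h] - actual[h]) ** 2
        n_origins += 1
    if n_origins == 0:
        raise ValueError("series too short for this min_train and horizon")
    return [e / n_origins for e in sq_err]

# Synthetic stand-in for a yearly log-mortality index (illustrative values only).
series = [-4.0 - 0.02 * t + (0.01 if t % 2 else -0.01) for t in range(20)]
print(rolling_origin_mse(series, min_train=10, horizon=3, forecaster=drift_forecast))
```

Because every origin only sees data that precede it, the resulting per-step MSE reflects genuine out-of-sample multi-step accuracy.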

2606.06233 2026-06-05 stat.ML cs.LG stat.ME 版本更新

Anchor PCA

Anchor PCA

Benedikt Seiter, Anya Fries, Julius von Kügelgen, Jonas Peters

发表机构 * ETH Zürich(苏黎世联邦理工学院)

AI总结 针对多领域数据,提出Anchor PCA方法,通过修改目标矩阵进行主成分分析,在保留共享变异方向的同时权衡整体解释方差,实现鲁棒的降维。

AI中文摘要

主成分分析(PCA)是最广泛使用的无监督降维技术之一。我们研究来自多个相关领域的数据的PCA。由于主成分在不同领域通常不同,获得共享低秩嵌入的一种方法是对合并数据进行PCA。然而,这种方法可能关注仅在少数领域中表现出高变异的虚假方向。为了找到在未见但相似领域中仍能解释大部分方差的鲁棒嵌入,我们提出关注共享变异方向。为此,我们引入了Anchor PCA,它在整体解释方差与共享和领域特定低秩嵌入之间的一致性之间进行权衡。Anchor PCA相当于对修改后的目标矩阵进行PCA,因此可以高效求解。此外,我们证明Anchor PCA恢复最大不变子空间,并在有界领域特定协方差膨胀下允许极小极大重构解释。在具有时间漂移的模拟和真实气体传感器数据上,我们分别证明Anchor PCA恢复了最大不变子空间,并产生了比合并基线和最坏情况替代方法在未见领域上解释更多方差的嵌入。综合来看,这些发现确立了Anchor PCA作为从多领域数据进行鲁棒无监督降维的有前途的方法。

英文摘要

Principal component analysis (PCA) is one of the most widely used unsupervised dimension reduction techniques. We study PCA for data from multiple related domains. Since principal components generally differ across domains, one way to obtain a shared low-rank embedding is to perform PCA on the pooled data. However, this approach can focus on spurious directions that exhibit high variation in only a few domains. To find a robust embedding that still explains most variance in unseen but similar domains, we propose instead to focus on shared directions of variation. To this end, we introduce Anchor PCA which trades off overall explained variance with agreement between the shared and domain-specific low-rank embeddings. Anchor PCA amounts to PCA on a modified target matrix and thus can be solved efficiently. Moreover, we show that Anchor PCA recovers a maximal invariant subspace and admits a minimax reconstruction interpretation under bounded domain-specific covariance inflations. On simulated and real-world gas sensor data with temporal drift, we demonstrate, respectively, that Anchor PCA recovers the maximally invariant subspace and yields embeddings that explain more variance on unseen domains than the pooling baseline and a worst-case alternative. Taken together, these findings establish Anchor PCA as a promising approach to robust unsupervised dimension reduction from multi-domain data.

2606.06225 2026-06-05 cs.IR cs.AI cs.LG 版本更新

Bridging the Semantic-Collaborative Gap: An Asymmetric Graph Architecture for Cold-Start Item Recommendation

弥合语义-协作鸿沟:面向冷启动物品推荐的非对称图架构

Anh Truong, John Trenkle, Yuanbo Chen, Honghong Zhao, Abdullah Alchihabi, Effy Fang, Michael Tamir

发表机构 * Tubi Kumo AI

AI总结 提出Shallow-RHS非对称链接预测架构,通过左端设备塔利用时序历史消息传递捕获协作信号,右端内容塔仅基于内在特征编码,解决冷启动物品推荐中的图归纳补全问题。

AI中文摘要

协同过滤和基于图的推荐模型因利用观察到的用户交互而非常有效,但这种依赖性在新增内容没有交互历史时产生了根本性的冷启动挑战。在Tubi的生产检索系统中,这一挑战还受到服务接口的进一步限制:新内容必须立即分配独立的嵌入,并且模型必须产生适用于近似最近邻检索的设备嵌入。我们通过将冷启动推荐表述为时间二分设备-内容图上的归纳图补全问题来解决这一设置。我们提出Shallow-RHS,一种非对称链接预测架构,其中左端(LHS)设备塔利用时序有效的观看历史消息传递来捕获协作信号,而右端(RHS)内容塔相对于图是故意浅层的,仅从内在特征编码内容。RHS塔不使用基于ID的嵌入、内容侧子图、邻居聚合或交互派生的表示,迫使内容编码器将内在特征映射到协同过滤感知的嵌入空间。训练后,学习到的内容编码器为热内容和新增内容生成嵌入,通过检索热替代邻居实现隐式图补全。我们进一步将相同的表示补全原则扩展到设备冷启动,通过从人口统计特征构建基于群体的嵌入。大规模在线实验表明,在内容冷启动参与度、推广速度、印象获取和设备冷启动参与度方面持续相对改进。

英文摘要

Collaborative filtering and graph-based recommendation models are highly effective because they leverage observed user interactions, but this dependence creates a fundamental cold-start challenge when newly added content has no interaction history. In Tubi's production retrieval system, this challenge is further constrained by the serving interface: new content must be assigned a standalone embedding immediately, and the model must also produce device embeddings suitable for approximate nearest-neighbor retrieval. We address this setting by formulating cold-start recommendation as an inductive graph-completion problem on a temporal bipartite device-content graph. We propose Shallow-RHS, an asymmetric link-prediction architecture in which the left-hand side (LHS) device tower leverages temporally valid watch-history message passing to capture collaborative signals, while the right-hand side (RHS) content tower is intentionally shallow with respect to the graph and encodes content solely from intrinsic features. The RHS tower does not use ID-based embeddings, content-side subgraphs, neighbor aggregation, or interaction-derived representations, forcing the content encoder to map intrinsic features into a collaborative-filtering-aware embedding space. After training, the learned content encoder generates embeddings for both warm and newly ingested content, enabling implicit graph completion through retrieval of warm surrogate neighbors. We further extend the same representation-completion principle to device cold-start by constructing cohort-based embeddings from demographic features. Large-scale online experiments demonstrate consistent relative improvements in content cold-start engagement, promotion speed, impression acquisition, and device cold-start engagement.

2606.06207 2026-06-05 cs.AI cs.LG 版本更新

Unsupervised Pattern Analysis in Japanese Veterinary Toxicology: A Regulatory-Compliant Framework for Cross-Species Risk Assessment

日本兽医毒理学中的无监督模式分析:用于跨物种风险评估的合规框架

Yukiko Kawakami, Mohammad Shirazi, Ryo Shimizuwa, Saito Shinoda, Alireza Mortazavi, Matsumoto Kawahara

发表机构 * Graduate School of Information Sciences, Tohoku University(东北大学信息科学研究生院)

AI总结 提出一种监管集成的无监督框架,利用NVAL数据库对不良药物事件进行聚类分析,识别出具有生物学意义的跨物种毒性模式。

Comments Submitted to IEEE Transactions on Biomedical Engineering

AI中文摘要

兽医药物警戒系统对于监测不良药物事件(ADEs)至关重要,然而现有方法往往无法捕捉由当地生物学和监管环境塑造的区域特异性毒性模式。在日本,这些挑战因物种特异性代谢差异以及农林水产省(MAFF)定义的报告实践而加剧。以往的工作大多依赖于预测导向模型,限制了机制可解释性。本研究提出了一种监管集成的无监督框架,用于利用国家兽医检测实验室(NVAL)数据库进行模式发现。ADEs被编码为器官系统对齐的表示,并针对物种特异性报告偏差进行调整,从而实现跨物种比较。应用基于相似性的聚类和降维来识别潜在毒性结构。对4,120份高置信度ADE报告(9,080个药物-ADE组合)的分析识别出三个显著的物种聚类(p < 0.01),包括伴侣动物中的肝脏主导模式(0.42 ± 0.06)、反刍动物中的肾毒性(0.39 ± 0.07)以及绵羊中的皮肤敏感性(0.35 ± 0.07)。药物水平聚类与药理类别的对齐率达到83%,而余弦相似度优于其他指标(轮廓系数:0.48;聚类精度:87%)。监管验证显示与既定分类高度一致。这些发现表明,与监管对齐的无监督分析能够揭示具有生物学意义的区域特异性毒性模式,为兽药安全性评估提供了一个可解释且可扩展的框架。

英文摘要

Veterinary pharmacovigilance systems are essential for monitoring adverse drug events (ADEs), yet existing approaches often fail to capture region-specific toxicity patterns shaped by local biological and regulatory contexts. In Japan, these challenges are amplified by species-specific metabolic differences and reporting practices defined by the Ministry of Agriculture, Forestry, and Fisheries (MAFF). Most prior work relies on prediction-oriented models, limiting mechanistic interpretability. This study proposes a regulatory-integrated unsupervised framework for pattern discovery using the National Veterinary Assay Laboratory (NVAL) database. ADEs are encoded into organ system-aligned representations and adjusted for species-specific reporting biases, enabling cross-species comparison. Similarity-based clustering and dimensionality reduction are applied to identify latent toxicity structures. Analysis of 4,120 high-confidence ADE reports (9,080 drug-ADE combinations) identified three significant species clusters (p < 0.01), including hepatic-dominant patterns in companion animals (0.42 $\pm$ 0.06), renal toxicity in ruminants (0.39 $\pm$ 0.07), and dermatological sensitivity in sheep (0.35 $\pm$ 0.07). Drug-level clustering achieved 83% alignment with pharmacological classes, while cosine similarity outperformed alternative metrics (silhouette score: 0.48; cluster precision: 87%). Regulatory validation showed strong agreement with established classifications. These findings demonstrate that regulation-aligned unsupervised analysis can uncover biologically meaningful, region-specific toxicity patterns, providing an interpretable and scalable framework for veterinary drug safety assessment.

2606.06205 2026-06-05 cs.LG 版本更新

Non-Negative Matrix Factorization for Event Data

事件数据的非负矩阵分解

Raphaël Romero

发表机构 * Ghent University(根特大学)

AI总结 提出EventNMF,一种直接对事件时间进行建模的连续时间非负矩阵分解方法,通过B样条基函数分解强度函数,避免分箱预处理的信息损失,并证明标准分箱方法是其特例。

AI中文摘要

连续时间事件数据,其中实体随时间发出瞬时事件,自然出现在许多领域,如神经科学、地震学和社会网络。非负矩阵分解(NMF)是揭示此类数据中可解释结构的自然工具,但迄今为止仅在分箱或平滑实体级计数度量后应用。这种预处理步骤存在擦除实体级异质性和细粒度时间特征的风险。在本文中,我们介绍了EventNMF,一种直接对事件时间进行操作的连续时间非负分解模型:每个实体的事件被建模为泊松过程,其强度通过非负B样条基函数分解,一个简单的估计过程恢复了跨实体共享的可解释时间模板。所得方法在数学上严谨、易于实现且计算高效。我们进一步证明了标准分箱计数方法是零次样条的特例,探索了偏差-方差权衡,并在合成潜在因子模型上与现有方法进行了比较,以及在几个实际应用中展示了EventNMF的有效性。

英文摘要

Continuous-time event data, in which entities emit instantaneous events over time, arises naturally across many domains such as neuroscience, seismology, and social networks. Non-negative matrix factorization (NMF) is a natural tool to uncover interpretable structure in such data, but it has so far only been applied after binning or smoothing the entity-level counting measures. This preprocessing step comes with the risk of erasing entity-level heterogeneities and fine-grained temporal features. In this paper, we introduce EventNMF, a continuous-time non-negative factorization model that operates directly on event times: each entity's events are modeled as a Poisson process whose intensity factorizes through a non-negative B-spline basis, and a simple estimation procedure recovers interpretable temporal templates shared across entities. The resulting method is mathematically principled, easy to implement, and computationally efficient. We further show that standard binned-count approaches arise as the special case of degree-zero splines, explore bias-variance tradeoffs and compare against existing methods on a synthetic latent factor model, and demonstrate the effectiveness of EventNMF on several real-world applications.
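
The claim that binned-count approaches arise from degree-zero splines can be made concrete with a short derivation. The notation is an assumption introduced here, not taken from the paper: entity $u$ emits events $t_{u,1},\dots,t_{u,n_u}$ on $[0,T]$, the weights $w_{uk}$ and template coefficients $h_{kb}$ are non-negative, and $B_b$ denotes the B-spline basis.

```latex
% Assumed factorized intensity and the standard Poisson-process log-likelihood.
\begin{align*}
\lambda_u(t) &= \sum_{k} w_{uk} \sum_{b} h_{kb}\, B_b(t),
  \qquad w_{uk} \ge 0,\; h_{kb} \ge 0, \\
\ell &= \sum_{u} \Big[ \sum_{i=1}^{n_u} \log \lambda_u(t_{u,i})
  - \int_0^T \lambda_u(t)\, dt \Big].
\end{align*}
% Degree zero: B_b(t) = 1 on bin I_b of width \Delta_b and 0 elsewhere,
% so \lambda_u(t) = \lambda_{ub} = \sum_k w_{uk} h_{kb} for t in I_b.
\begin{align*}
\ell &= \sum_{u} \sum_{b} \big[ N_{ub} \log \lambda_{ub} - \Delta_b\, \lambda_{ub} \big],
  \qquad N_{ub} = \#\{\, i : t_{u,i} \in I_b \,\}.
\end{align*}
```

In the degree-zero case the likelihood depends on the event times only through the bin counts $N_{ub}$. Up to terms that do not involve the parameters, it coincides with the Poisson likelihood of a count matrix with mean $\Delta_b \lambda_{ub}$, which is the objective of NMF on binned counts under a Poisson (KL-type) loss.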

2606.06196 2026-06-05 cs.LG 版本更新

A Machine Learning-Based Framework for Discovering Huntington's Disease Stages: Integrating Graph Representation Learning and clustering to Uncover Progression Dynamics in Longitudinal Enroll-HD Dataset

基于机器学习的亨廷顿病阶段发现框架:结合图表示学习与聚类以揭示纵向Enroll-HD数据集中的进展动态

Lubna M. Abu Zohair, Marta Vallejo, MD Azher Uddin, John R. Woodward, Hind Zantout

发表机构 * School of Mathematical and Computer Sciences, Heriot-Watt University(赫瑞-瓦特大学数学与计算机科学学院)

AI总结 提出基于动态图表示学习和K-means++聚类的无监督框架,从纵向临床数据中识别亨廷顿病的四个有意义的进展阶段,并通过稳定性分析验证其鲁棒性。

Comments Accepted for publication in the Proceedings of the 10th International Conference on Medical and Health Informatics (ICMHI 2026), Association for Computing Machinery (ACM)

AI中文摘要

亨廷顿病(HD)是一种进行性脑部疾病,逐渐影响运动、认知功能和行为。准确且一致地识别疾病阶段对于理解病程、患者分组、个性化护理和治疗发现至关重要。现有的临床分期框架主要依赖于预定义的临床测量阈值和临床专家决策,但这些离散的截断值可能掩盖有意义的阶段内变异性,并且容易受到评估者间差异的影响,尤其是在运动和功能评估中。为了解决这些局限性,我们开发了一个基于动态图表示学习的无监督机器学习框架,以捕捉来自纵向临床测量的患者内部和患者之间的时间关系。利用学习到的表示,我们应用K-means++聚类来识别分离良好的组。然后,我们迭代增加聚类数量(k),使用稳定性分析评估鲁棒性,并揭示超出初始最优解的额外有意义的聚类。我们将该框架应用于Enroll-HD队列中的302名个体(1477次就诊,每次就诊44个临床变量;80%为显性期(manifest)参与者),实现了反映自然临床进展的数据驱动的HD阶段发现。尽管队列规模有限,所提出的框架使用四维潜在空间实现了稳健的聚类性能,通过聚类稳定性分析识别出四个有意义的且统计上可区分的疾病阶段。每个阶段对应明确的临床测量边界,与先前建立的临床分期方法相比,重叠最小。

英文摘要

Huntington's disease (HD) is a progressive brain disorder that gradually affects movement, cognitive function, and behavior. Identifying the stage of the disease accurately and consistently is important for understanding its course, grouping patients, personalized care, and discovering treatment. Existing clinical staging frameworks rely primarily on predefined clinical measurement thresholds and clinical expert decisions, yet these discrete cut-offs may obscure meaningful intra-stage variability and remain vulnerable to inter-rater differences, especially in motor and functional assessments. To address these limitations, we developed an unsupervised machine learning framework based on dynamic graph representation learning to capture temporal relationships within and across patients from longitudinal clinical measurements. Using the learned representations, we applied K-means++ clustering to identify well-separated groups. We then iteratively increased the number of clusters (k), using stability analysis to assess robustness and reveal additional meaningful clusters beyond the initial optimal solution. We applied the framework to 302 individuals from the Enroll-HD cohort (1,477 visits, 44 clinical variables per visit; 80% manifest participants), enabling data-driven discovery of HD stages reflecting natural clinical progression. Despite the limited cohort size, the proposed framework achieved robust clustering performance using a four-dimensional latent space, identifying four meaningful and statistically distinct disease stages through clustering stability analysis. Each stage corresponded to well-defined clinical measurement boundaries, with minimal overlap compared to previously established clinical staging methods.

2606.06179 2026-06-05 stat.ML cs.LG 版本更新

Diffusion Models Observe Only Gradients: A Geometric Perspective on Score Matching Errors

扩散模型仅观察梯度:分数匹配误差的几何视角

Naïl B. Khelifa, Richard E. Turner, Ramji Venkataramanan

发表机构 * University of Cambridge(剑桥大学)

AI总结 本文从几何角度揭示L2分数误差不是衡量边缘分布质量的正确指标,通过Helmholtz-Hodge分解将分数误差分解为梯度分量和螺线管分量,证明只有梯度分量影响Fokker-Planck动力学,并给出仅依赖于梯度分量的KL散度上界及可计算的梯度分量估计器。

AI中文摘要

基于分数的扩散模型通常通过最小化$L^2$分数匹配误差来训练,标准理论分析依赖该量来约束学习分布与目标分布之间的采样差异。我们证明$L^2$分数误差不是边缘分布质量的正确内在度量:一个学习的扩散模型可能在完美匹配目标分布的同时产生任意大的$L^2$分数误差。通过将分数误差分解为梯度分量和螺线管分量(Helmholtz-Hodge分解),我们识别出背后的几何原因:只有梯度分量进入边缘Fokker-Planck动力学,而螺线管分量在结构上不可见。我们在三个结果中精确阐述了这一点。首先,基于修正的几何,我们证明了一个不可能结果:没有$L^2$分数误差的单调函数能够一致地给出学习分布与目标分布之间任何散度的下界。其次,我们推导出Kullback-Leibler散度的上界,该上界仅依赖于误差的可观测梯度分量,从而收紧标准Girsanov界,并指出其宽松性源于在路径空间而非边缘空间动力学上操作的成本。第三,我们通过对偶Sobolev恒等式给出了梯度分量的可处理估计器,实验表明该估计器与样本质量的相关性显著优于完整的$L^2$误差。

英文摘要

Score-based diffusion models are typically trained by minimizing the $L^2$ score matching error, and standard theoretical analyses rely on this quantity to bound the sampling discrepancy between the learned and target distributions. We show the $L^2$ score error is not the right intrinsic measure of marginal distributional quality: a learned diffusion model can incur arbitrarily large $L^2$ score error while perfectly matching the target distribution. By decomposing score errors into a gradient and a solenoidal component (a Helmholtz-Hodge decomposition), we identify the geometric reason behind this: only the gradient component enters the marginal Fokker-Planck dynamics, while the solenoidal component is structurally invisible. We make this precise in three results. First, building on the corrected geometry, we prove an impossibility result: no monotone function of the $L^2$ score error can uniformly lower bound any divergence between the learned and target distributions. Second, we derive an upper bound on the Kullback-Leibler divergence that depends only on the observable gradient component of the error, tightening the standard Girsanov bound and identifying its looseness as the cost of operating on path-space rather than marginal-space dynamics. Third, we give a tractable estimator of the gradient component via a dual Sobolev identity, which is shown to empirically correlate substantially better with sample quality than the full $L^2$ error.

2606.06178 2026-06-05 cs.LG cs.AI cs.CL 版本更新

Learning to Route LLMs from Implicit Cost-Performance Preferences via Meta-Learning

通过元学习从隐式成本-性能偏好中学习路由LLM

Jiahao Zeng, Ming Tang, Ningning Ding

发表机构 * Hong Kong University of Science and Technology (Guangzhou)(香港科技大学(广州)) Southern University of Science and Technology(南方科技大学)

AI总结 提出MetaRouter框架,利用元学习从少量交互中学习用户隐式成本-性能偏好,实现个性化LLM路由,在分布内外任务上优于基线方法。

AI中文摘要

大型语言模型(LLM)在性能与成本之间存在权衡,更强大的模型会产生更高的费用。LLM路由旨在通过将查询发送到最合适的模型来降低费用同时保持性能。然而,现有方法无法很好地适应不同用户的成本-性能偏好。为了解决这一差距,我们引入了一种新颖的感知LLM路由范式,用于个性化和以用户为中心的成本-性能优化,通过少量交互高效学习用户的隐式偏好。为了应对异构用户需求的挑战,我们将偏好配置文件形式化为上下文赌博机中的一组不同任务,并提出了MetaRouter,一个用于偏好感知LLM路由的元学习框架。实验结果表明,MetaRouter在分布内和分布外任务上均优于强基线。此外,它在学习用户偏好方面表现出高效率,对可路由LLM的变化具有鲁棒性,并且可扩展到多模型路由。

英文摘要

Large language models (LLMs) present a trade-off between performance and cost, where more powerful models incur greater expense. LLM routing aims to mitigate expenses while maintaining performance by sending queries to the most suitable model. However, existing methods cannot perform well for different user cost-performance preferences. To address this gap, we introduce a novel perceptive LLM routing paradigm for personalized and user-centric cost-performance optimization, which efficiently learns users' implicit preferences through little interaction. To handle the challenge of heterogeneous user needs, we formulate preference profiles as a set of distinct tasks in contextual bandit and propose MetaRouter, a meta-learning framework designed for preference-aware LLM routing. Experimental results show that MetaRouter outperforms strong baselines on both in-distribution and out-of-distribution tasks. Furthermore, it exhibits high efficiency in learning user preferences, robustness to changes in the routable LLMs, and scalability to multi-model routing.

2606.06174 2026-06-05 cs.LG stat.AP 版本更新

Learning to model pediatric asthma exacerbation from multiple risk factors: a case study in coastal Virginia

学习从多风险因素建模儿童哮喘加重:弗吉尼亚沿海地区案例研究

Jonathan Colen, Eric Werner, Maryam Golbazi, Heather Richter, Diana McSpadden, Amy Quinn, Jocel Santos, Mary Jane Darling, Mary Margaret Gleason

发表机构 * Joint Institute on Advanced Computing for Energy and Science(能源与科学先进计算联合研究所) Old Dominion University(欧道明大学) School of Data Science(数据科学学院) Eastern Virginia Medical School(东弗吉尼亚医学院) Macon & Joan Brock Virginia Health Sciences(梅肯与琼·布罗克弗吉尼亚健康科学) Children’s Hospital of the King’s Daughters(国王之女儿童医院) Children’s Specialty Group(儿童专科组) Institute for Coastal Adaptation and Resilience(海岸适应与韧性研究所) Chief Data Office(首席数据办公室) Thomas Jefferson National Accelerator Facility(托马斯·杰斐逊国家加速器实验室) Department of Psychiatry(精神病学系) Boston Children’s Hospital(波士顿儿童医院)

AI总结 本研究通过比较广义线性模型、神经网络和稀疏字典学习框架,建模弗吉尼亚沿海地区儿童哮喘加重与空气污染、气象及社区社会经济因素的关系,并识别关键风险因素。

Comments 22 pages, 6 figures (5 supplemental)

AI中文摘要

儿童哮喘是一种常见疾病,受空气污染、气象和社区级社会经济因素加剧。在大型时空数据集中建模哮喘加重(AE)需要厘清多个贡献因素的影响。在本案例研究中,我们比较了三种平衡预测能力与可解释性的技术,以预测汉普顿路(弗吉尼亚沿海地区,包括7个城市,人口超过150万)的AE。在整理环境空气污染测量值、天气数据和社区机会指标后,我们建模了2018-2023年区域儿童医院及附属机构的邮政编码级急性AE就诊情况。广义线性模型(GLM)提供基线,神经网络(NN)作为最大预测目标。为了桥接统计模型和深度学习,我们开发了一个基于稀疏字典学习的框架,以识别和解释简约的非线性交互方程。在比较每个模型的预测性能后,我们估计了输入暴露变量导致的AE相对风险,并发现各框架间的一致性。我们的工作将统计模型与可解释机器学习模型联系起来,突出了可能影响AE的协同交互作用,并可能为未来研究指导弗吉尼亚沿海地区的公共卫生干预措施。

英文摘要

Childhood asthma is a common illness exacerbated by air pollution as well as meteorological and neighborhood-level socioeconomic factors. Modeling asthma exacerbation (AE) in large spatiotemporal datasets requires disentangling impacts from multiple contributors. In this case study, we compared three techniques that balance predictive power with interpretability to predict AE in Hampton Roads, a coastal Virginia region comprising 7 cities and over 1.5 million people. After collating ambient air pollution measurements, weather data, and measures of neighborhood opportunity, we modeled zip code-level acute AE visits to a regional children's hospital and affiliated providers from 2018-2023. Generalized linear models (GLM) provided a baseline while neural networks (NN) served as a maximally predictive target. To bridge between statistical models and deep learning, we developed a framework based on sparse dictionary learning to identify and interpret parsimonious nonlinear interacting equations. After comparing each model's predictive performance, we estimated relative risks for AE due to input exposure variables and found consensus across frameworks. Our work links statistical and interpretable machine learning models to highlight possible synergistic interactions influencing AE, and may enable future studies to guide public health interventions in coastal Virginia.

2606.06171 2026-06-05 stat.ML cs.LG cs.NA math.NA physics.comp-ph 版本更新

Effective Dimensionality as an Operator Invariant for Physics-Preserving Constraint Adaptation in Physics-Informed Neural Networks

有效维度作为物理信息神经网络中保物理约束适配的算子不变量

Cornelius Otchere, Michael Shields

发表机构 * University of Cambridge(剑桥大学)

AI总结 利用Fisher信息矩阵定义物理约束模型的有效自由度d_eff,证明其收敛于微分算子核维数,并基于此提出子空间投影策略实现边界条件适配。

AI中文摘要

物理信息神经网络固有地遭受任务干扰,因为它们依赖共享参数空间来同时满足控制微分方程和边界条件。我们使用Fisher信息矩阵分析这种结构冲突,量化物理约束模型中的有效自由度($d_{eff}$)。与经典的$d_{eff}$(衡量相对于统计先验由数据提供信息的参数方向数量)不同,我们的$d_{eff}$衡量不受微分算子约束的参数方向维度。对于具有有限维核的算子,我们证明$d_{eff}$精确收敛于核维数,与网络宽度、深度或激活函数无关,将其从拟合诊断重新解释为底层连续算子的结构不变量。对于具有无限维核的算子,$d_{eff}$则衡量网络对该核的有限维表示带宽,而非恢复整数不变量。重要的是,$d_{eff}$还作为先验结构诊断。将适定问题的$d_{eff}$驱动到零,证明物理和边界约束已吸收网络的自由方向。基于这一表征,我们引入了用于边界适配的子空间投影策略。无需从头重新训练,我们将参数更新投影到预训练物理算子的零空间,使得新边界条件得到满足而不干扰已学习的物理。基于梯度的微调可以达到或超过此效果,但需要更多的挂钟时间和调参,而子空间投影在几秒到几分钟内提供近乎等效的质量。我们在线性和非线性算子上验证,展示了对初始和边界偏移以及未遇到约束类型的准确适配。

英文摘要

Physics-Informed Neural Networks inherently suffer from task interference because they rely on a shared parameter space to satisfy both governing differential equations and boundary conditions. We analyze this structural conflict using the Fisher Information Matrix to quantify the effective degrees of freedom ($d_{eff}$) in a physics-constrained model. Unlike the classical $d_{eff}$ which measures how many parameter directions are informed by data against a statistical prior, our $d_{eff}$ measures the dimension of the parameter directions unconstrained by the differential operator. For operators with finite-dimensional kernel, we show that $d_{eff}$ converges to the kernel dimension exactly, independent of network width, depth, or activation function, recasting it from a fit diagnostic into a structural invariant of the underlying continuous operator. For operators with infinite-dimensional kernel, $d_{eff}$ instead measures the network's finite-dimensional representational bandwidth for that kernel rather than recovering an integer invariant. Importantly, $d_{eff}$ also serves as an a priori structural diagnostic. Driving $d_{eff}$ of a well-posed problem to zero certifies that the physics and boundary constraints have absorbed the network's free directions. Building on this characterization, we introduce subspace projection strategies for boundary adaptation. Rather than retraining from scratch, we project parameter updates into the null space of the pre-trained physics operator so that new boundary conditions are satisfied without disturbing the learned physics. Gradient-based fine-tuning can match or exceed this but needs more wall-clock time and tuning, whereas subspace projection delivers near-equivalent quality in seconds to minutes. We validate on linear and nonlinear operators, demonstrating accurate adaptation to initial and boundary shifts and unencountered constraint types.

2606.06164 2026-06-05 cs.LG physics.comp-ph 版本更新

On the training of physics-informed neural operators for solving parametric partial differential equations

关于物理信息神经算子训练以求解参数化偏微分方程的研究

Nanxi Chen, Chuanjie Cui, Airong Chen, Sifan Wang, Rujin Ma

发表机构 * College of Civil Engineering, Tongji University(同济大学土木工程学院) Department of Engineering Science, University of Oxford(牛津大学工程科学系) Institute for Foundations of Data Science, Yale University(耶鲁大学数据科学基础研究所)

AI总结 本文系统研究了物理信息神经算子(PINO)的训练策略,包括架构设计、优化器选择、损失平衡和配置点采样,通过实验发现CViT架构表现稳定,并揭示了梯度冲突和因果违反等优化问题,表明PINN的缓解算法在PINO中同样有效。

AI中文摘要

物理信息神经算子(PINO)旨在通过使用控制物理作为监督来学习偏微分方程的解算子,而不是仅仅依赖配对的输入-输出模拟数据。通过将物理约束纳入训练目标,PINO结合了神经算子的跨实例泛化能力和物理信息学习的数据效率。尽管有这一前景,如何高效且稳健地训练PINO仍不如数据驱动神经算子或物理信息神经网络(PINN)的训练那样被充分理解。为弥补这一差距,我们考察了PINO训练流程的关键组成部分,包括架构设计、优化器选择、损失平衡和配置点采样策略。我们研究了三种代表性算子骨干:深度算子网络(DeepONet)、傅里叶神经算子(FNO)和连续视觉Transformer(CViT),涵盖五个不同的参数化PDE系统。结果表明,CViT在考虑的基准测试中提供了一致且稳定的强性能。除了架构,我们发现先前在PINN训练中识别出的几种优化病理自然出现在PINO中,包括梯度冲突和因果违反。我们还发现为PINN开发的缓解算法在PINO设置中仍然有效。我们进一步比较了不同数据体制下的物理信息和数据驱动训练,揭示精心设计的物理信息训练流程可以匹配,并在某些情况下超越纯数据驱动神经算子。综合来看,这些发现提供了对PINO训练中优化挑战的系统性实证理解,并为高效稳健的物理信息算子学习提供了实用流程。代码和数据可在 https://github.com/NanxiiChen/PI-CViT 获取。

英文摘要

Physics-informed neural operators (PINOs) aim to learn solution operators for partial differential equations by using the governing physics as supervision, rather than relying solely on paired input-output simulation data. By incorporating physical constraints into the training objective, PINOs combine the cross-instance generalization of neural operators with the data efficiency of physics-informed learning. Despite this promise, how to train PINOs efficiently and robustly remains less well-understood than the training of either data-driven neural operators or physics-informed neural networks (PINNs). To bridge this gap, we examine key components of the PINO training pipeline, including architecture design, optimizer choice, loss balancing, and collocation-point sampling strategy. We study three representative operator backbones, Deep Operator Network (DeepONet), Fourier Neural Operator (FNO), and Continuous Vision Transformer (CViT), across five diverse parametric PDE systems. Our results show that CViT provides consistently strong and stable performance across the considered benchmarks. Beyond architecture, we find that several optimization pathologies previously identified in PINN training naturally arise in PINOs, including gradient conflicts and causal violation. We also find that mitigation algorithms developed for PINNs remain effective in the PINO setting. We further compare physics-informed and data-driven training under different data regimes, revealing that a carefully designed physics-informed training pipeline can match, and in some cases, outperform purely data-driven neural operators. Taken together, these findings provide a systematic empirical understanding of the optimization challenges in PINO training and inform a practical pipeline for efficient and robust physics-informed operator learning. Code and data are available at https://github.com/NanxiiChen/PI-CViT.

2606.06156 2026-06-05 cs.LG 版本更新

Trust-Aware Predictive Emissions Monitoring for Gas Turbine Fleets with Limited Labelled Data

面向有限标注数据的燃气轮机机群信任感知预测性排放监测

Rebecca Potts, Aiden Durrant, Rick Hackney, Georgios Leontidis

发表机构 * School of Natural and Computing Sciences University of Aberdeen(阿伯丁大学自然与计算科学学院) School of Computing Sciences University of East Anglia(东安格利亚大学计算科学学院) Siemens Energy Industrial Turbomachinery Ltd.(西门子能源工业涡轮机械有限公司) Department of Physics and Technology UiT The Arctic University of Norway(UiT挪威北极大学物理与技术系)

AI总结 提出一种信任感知概率框架,结合多头循环预测模型、置信度估计、集成不确定性量化、辅助特征预测、特征空间距离分析和运行范围诊断,在少量标注数据下实现机群级NOx预测,并提供可解释的逐样本信任分数以指示未标注涡轮机的预测可靠性。

Comments 14 pages, 6 figures, 6 tables

AI中文摘要

基于机器学习的预测性排放监测系统为直接排放测量提供了一种实用替代方案,但当排放标签仅适用于一小部分资产时,其在燃气轮机机群中的部署具有挑战性。在这项工作中,提出了一种信任感知概率框架,用于在有限标注监督下进行机群级燃气轮机NOx预测。该框架结合了多头循环预测模型与学习到的置信度估计、集成不确定性量化、辅助特征预测、特征空间距离分析和运行范围诊断。这些信号在标注数据上进行校准,以产生可解释的逐样本信任分数,为未标注涡轮机上的预测可靠性提供指标,支持识别在机群部署中应更加谨慎对待的预测。基于置信度的过滤将平均绝对误差从全覆盖时的0.202降低到最高置信度10%预测的0.070,表明置信度估计与预测误差有显著关联。未标注和分布外样本表现出增加的不确定性和降低的置信度,表明该框架对分布偏移做出了适当响应。结果表明,所提出的信任框架为未标注涡轮机上的排放预测提供了可操作的可靠性信息,支持PEMS在工业机群中更透明和可信的部署。

英文摘要

Machine learning-based predictive emissions monitoring systems offer a practical alternative to direct emissions measurement, but their deployment across gas turbine fleets is challenging when emissions labels are available for only a small subset of assets. In this work, a trust-aware probabilistic framework is proposed for fleet-level gas turbine NOx prediction under limited labelled supervision. The framework combines a multi-head recurrent prediction model with learned confidence estimation, ensemble-based uncertainty quantification, auxiliary feature prediction, feature-space distance analysis, and operating-range diagnostics. These signals are calibrated on labelled data to produce interpretable per-sample trust scores, providing indicators of prediction reliability on unlabelled turbines, supporting the identification of predictions that should be treated with greater caution during fleet-level deployment. Confidence-based filtering reduces MAE from 0.202 at full coverage to 0.070 for the highest-confidence 10% of predictions, demonstrating that confidence estimates are meaningfully related to prediction error. Unlabelled and out-of-distribution samples exhibit increased uncertainty and reduced confidence, indicating that the framework responds appropriately to distributional shift. The results show that the proposed trust framework provides actionable reliability information for emissions prediction on unlabelled turbines, supporting more transparent and trustworthy deployment of PEMS across industrial fleets.
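
The confidence-based filtering result (MAE at full coverage versus MAE on the most confident 10% of predictions) corresponds to a simple selective-prediction metric. The sketch below shows one plausible way to compute it. The function name and the toy errors and confidences are invented for illustration and are unrelated to the paper's data or trust scores.

```python
def mae_at_coverage(errors, confidences, coverage):
    """MAE over the `coverage` fraction of samples with the highest confidence."""
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    if len(errors) != len(confidences) or not errors:
        raise ValueError("errors and confidences must be non-empty and aligned")
    # Rank sample indices from most to least confident.
    order = sorted(range(len(errors)), key=lambda i: confidences[i], reverse=True)
    keep = max(1, round(coverage * len(order)))  # always keep at least one sample
    return sum(abs(errors[i]) for i in order[:keep]) / keep

# Hypothetical signed prediction errors and confidence scores for ten samples.
errors = [0.05, -0.4, 0.1, 0.3, -0.02, 0.5, 0.08, -0.25, 0.15, 0.6]
confidences = [0.9, 0.2, 0.8, 0.3, 0.95, 0.1, 0.85, 0.4, 0.7, 0.05]
for cov in (1.0, 0.5, 0.1):
    print(cov, mae_at_coverage(errors, confidences, cov))
```

If confidence is informative, the MAE should fall as coverage shrinks. A flat or rising curve would indicate that the confidence scores are not tracking prediction error.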

2606.06148 2026-06-05 cs.LG 版本更新

Tight list replicability bounds via a novel sphere covering theorem

通过新颖的球面覆盖定理实现紧的列表可复现性界

Ari Blondal, Hamed Hatami, Pooya Hatami, Chavdar Lalov, Sivan Tretiak

发表机构 * McGill University(麦吉尔大学) Ohio State University(俄亥俄州立大学)

AI总结 针对列表可复现性中列表大小与精度参数及假设类复杂度之间的关系,本文利用Borsuk-Ulam定理证明了一个新颖的拓扑球面覆盖定理,并由此得到VC类列表大小与精度的紧界,以及大间隔半空间的最优列表大小。

Comments 17 pages, 2 figures

AI中文摘要

近年来,列表可复现性已成为学习理论中形式化可复现性的一个框架。一个核心问题是所需列表大小如何与精度参数及假设类的自然复杂度度量相关。为了获得列表可复现性的紧界,我们证明了一个新颖的拓扑球面覆盖定理,该定理源自Borsuk-Ulam定理。具体而言,如果$d$维球面被开集覆盖,且每个开集位于一个开半球内,那么这些集合中必有$d+1$个具有公共交集。利用这一结果,我们得到了VC类列表大小与精度之间关系的紧界。我们还证明,对于大间隔半空间,只要间隔不是太大,最优列表大小等于环境维度。然而,当间隔非常大时,我们设计了一个可复现算法,实现了最小列表大小$\lceil d/2 \rceil + 1$。

英文摘要

In recent years, list replicability has emerged as a framework for formalizing reproducibility in learning theory. A central question is how the required list size relates to the accuracy parameter and natural complexity measures of the hypothesis class. To achieve sharp bounds on list replicability, we prove a novel topological sphere covering theorem, derived from the Borsuk-Ulam theorem. Specifically, if the $d$-sphere is covered by open sets, each of which lies in an open hemisphere, then $d+1$ of these sets must have a common intersection. Using this result, we obtain a sharp bound on the relationship between list size and accuracy for VC classes. We also show that for large-margin half-spaces, provided the margin is not too large, the optimal list size equals the ambient dimension. However, when the margin is taken to be very large, we devise a replicable algorithm achieving the minimal list size of $\lceil d/2 \rceil + 1$.
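For reference, the covering statement quoted in the abstract can be written out formally as follows. This is only a transcription of the abstract's wording; the index set $I$, the vectors $v_i$ and the hemisphere notation $H_{v}$ are notation introduced here for illustration, and the paper's exact hypotheses may differ.

```latex
\textbf{Sphere covering statement (as phrased in the abstract).}
Let $\{U_i\}_{i \in I}$ be open subsets of the sphere $S^d$ with
$\bigcup_{i \in I} U_i = S^d$, and suppose that for every $i \in I$ there is a
unit vector $v_i$ such that
\[
  U_i \subseteq H_{v_i} := \{\, x \in S^d : \langle x, v_i \rangle > 0 \,\}.
\]
Then there exist distinct indices $i_1, \dots, i_{d+1} \in I$ with
\[
  U_{i_1} \cap U_{i_2} \cap \dots \cap U_{i_{d+1}} \neq \emptyset .
\]
```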

2606.06123 2026-06-05 cs.LG stat.ML Version update

Adaptive state-action abstractions via rate-distortion

Fernando E. Rosas

Affiliations * Department of Informatics, University of Sussex; Department of Brain Science, Imperial College London; Centre for Eudaimonia and Human Flourishing, University of Oxford

AI summary Builds soft state-action abstractions from rate-distortion principles and uses a performance certificate to adjust abstraction granularity dynamically, achieving near-optimal performance while compressing state and action information.

Comments 28 pages, 2 figures

Abstract

When learning to walk, infants seem to address a coarse version of the problem first - stay upright, reach the caregiver - and refine it only when further practice at that resolution stops paying off. Reinforcement learning offers multiple techniques for building simple versions of complex tasks, but lacks general principles for how to dynamically adjust the granularity of these abstractions during learning. This paper proposes one such principle: refine the abstraction as soon as the learning error within it becomes comparable to the error induced by the abstraction itself. Here, we investigate one way of formalising this principle via a performance certificate that decomposes value error into two terms: a learning error bound captured by a Bellman residual, and an abstraction error bound given by a bisimulation metric. The resulting switching strategy is implemented by soft state-action abstractions built from rate-distortion principles, whose resolution along state and action axes can be continuously adjusted. We validate this construction in a range of tabular settings, showing that near-optimal performance can be achieved under substantial lossy compression of state and action information.

2606.06117 2026-06-05 q-bio.QM cs.LG math.AT q-bio.GN Version update

$p$-adic Bi-Filtrations for Topological Machine Learning on Genomic Sequences

Tirtharaj Dash, Gunja Sachdeva

Affiliations * Department of CS & IS, BITS Pilani, K K Birla Goa Campus; Department of Mathematics, BITS Pilani, K K Birla Goa Campus

AI summary Proposes the pVR framework, which combines $p$-adic numbers with topological data analysis and extracts topological features of genomic sequences from a bi-filtered Vietoris-Rips complex, outperforming baseline methods on low-sample datasets.

Comments 12 pages, 5 figures, 8 tables

Abstract

We introduce pVR, a topological machine learning framework for alignment-free genomic sequence classification that combines $p$-adic numbers with topological data analysis. Each DNA sequence is encoded along two complementary axes: a $p$-adic distance on $k$-mer prefixes, which captures hierarchical positional structure, and a compositional $L_1$ distance on $k$-mer frequencies, which captures local sequence content. The two distances jointly parameterise a bi-filtered Vietoris--Rips complex, and per-sequence topological summaries from this bi-filtration serve as features for standard machine learning classifiers. We establish theoretical guarantees for the construction: stability under metric perturbations and invariance to the choice of prime, alongside a result that explains why a single $p$-adic axis is topologically uninformative and why the bi-filtration recovers nontrivial homology. On twelve genomic benchmarks ($28$ to $500$ sequences, $3$ to $7$ classes), pVR outperforms four established alignment-free baselines on three of six low-sample datasets, with gains of up to $21$ percentage points; it underperforms only on a SARS-CoV-2 variant benchmark whose point-mutation divergence violates the hierarchical assumption, and all methods saturate in the large-sample regime. pVR also outperforms zero-shot frozen embeddings from the 500M-parameter Nucleotide Transformer v2 by $6.7$ to $11.4$ percentage points on three low-sample benchmarks. The pVR codebase is publicly available at https://github.com/MAHI-Group/pVR.
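The two distances named in the abstract can be illustrated with a small Python sketch. It is one plausible reading, not the pVR implementation: the abstract does not say exactly which objects are compared, so the function names, the choice of comparing two strings, and the definition of the $p$-adic-style distance as $p^{-m}$ (with $m$ the length of the longest common prefix) are all assumptions made here.

```python
from collections import Counter


def padic_prefix_distance(a: str, b: str, p: int = 2) -> float:
    """Ultrametric distance p**(-m), where m is the longest common prefix length.

    Assumption: this stands in for the abstract's "p-adic distance on k-mer
    prefixes"; the paper's exact definition may differ.
    """
    if a == b:
        return 0.0
    m = 0
    for x, y in zip(a, b):
        if x != y:
            break
        m += 1
    return float(p) ** (-m)


def kmer_frequencies(seq: str, k: int) -> dict:
    """Relative frequency of every k-mer occurring in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}


def l1_kmer_distance(a: str, b: str, k: int = 3) -> float:
    """L1 distance between the k-mer frequency vectors of two strings."""
    fa, fb = kmer_frequencies(a, k), kmer_frequencies(b, k)
    return sum(abs(fa.get(m, 0.0) - fb.get(m, 0.0)) for m in set(fa) | set(fb))
```

The first function depends only on where two strings first diverge (hierarchical, position-sensitive), while the second ignores position and compares content, which is the complementarity the abstract describes.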

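2605.03413 2026-06-05 cs.LG cs.AI Version update

Learning to Theorize the World from Observation

Doojin Baek, Gyubin Lee, Junyeob Baek, Hosung Lee, Sungjin Ahn

Affiliations * University of Washington

AI summary Inspired by cognitive science, proposes the Learning-to-Theorize paradigm, in which the Neural Theorizer (NEO) model infers explicit explanatory theories from raw, non-textual observations, enabling explanation-driven generalization.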

Abstract

What does it mean to understand the world? Contemporary world models often operationalize understanding as accurate future prediction in latent or observation space. Developmental cognitive science, however, suggests a different view: human understanding emerges through the construction of internal theories of how the world works, even before mature language is acquired. Inspired by this theory-building view of cognition, we introduce Learning-to-Theorize, a learning paradigm for inferring explicit explanatory theories of the world from raw, non-textual observations. We instantiate this paradigm with the Neural Theorizer (NEO), a probabilistic neural model that induces latent programs as a learned Language of Thought and executes them through a shared transition model. In NEO, a theory is represented as an executable, compositional program whose learned primitives can be systematically recombined to explain novel phenomena. Experiments show that this formulation enables explanation-driven generalization, allowing observations to be understood in terms of the programs that generate them.

2606.06104 2026-06-05 cs.LG Version update

A Sliced-Wasserstein Framework on Correlation Matrices for EEG Decoding

Chen Hu, Rui Wang, Jiale Zhou, Jingjun Yi, Shaocheng Jin, Yidong Song, Yefeng Zheng

Affiliations * Westlake University; School of Artificial Intelligence and Computer Science; Jiangnan University; Sun Yat-sen University

AI summary Proposes a Sliced Wasserstein framework based on Pullback Euclidean Metrics, instantiates two Correlation Sliced-Wasserstein discrepancies, and builds a domain generalization method for EEG decoding, with improved generalization under distribution shift on three datasets.

Comments Accepted by KDD 2026

Abstract

Electroencephalography (EEG) offers noninvasive, millisecond-resolution recordings of neuronal activity and is widely used in neuroscience and healthcare. Many EEG decoding pipelines rely on covariance descriptors for their robustness to noise, but such representations are sensitive to channel-wise scaling. Recent studies have therefore advocated full-rank correlation matrices as a scale-invariant alternative for EEG decoding. In this paper, we propose a general framework for Sliced Wasserstein (SW) discrepancies on manifolds endowed with Pullback Euclidean Metrics (PEMs), termed Pullback Euclidean Metric Sliced Wasserstein (PEMSW). Within this framework, we instantiate two Correlation Sliced-Wasserstein (CorSW) discrepancies on the manifold of full-rank correlation matrices under two recently introduced correlation geometries, namely the Off-Log Metric (OLM) and the Log-Scaled Metric (LSM). Building on CorSW, we further develop a domain generalization (DG) framework for EEG decoding. Experiments on three EEG datasets demonstrate improved generalization under distribution shifts, with low training overhead and no additional inference cost. The source code is available at https://github.com/ChenHu-ML/CorSW.
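The general recipe behind a pullback-metric sliced Wasserstein discrepancy can be sketched in a few lines of NumPy: map each matrix into a Euclidean space, then compute an ordinary sliced Wasserstein distance there. This is an illustrative sketch, not the CorSW code from the linked repository. In particular, the map `off_log` (off-diagonal entries of the matrix logarithm) is an assumption about what the Off-Log Metric pulls back through, and the function names and the Monte Carlo settings are hypothetical.

```python
import numpy as np


def off_log(C):
    """Upper off-diagonal entries of the matrix logarithm of a full-rank correlation matrix.

    Assumed stand-in for the diffeomorphism behind the Off-Log Metric.
    """
    w, V = np.linalg.eigh(C)            # C is symmetric positive definite
    L = (V * np.log(w)) @ V.T           # V diag(log w) V^T
    iu = np.triu_indices(C.shape[0], k=1)
    return L[iu]


def sliced_wasserstein(X, Y, n_proj=64, seed=0):
    """Monte Carlo estimate of squared SW_2 between equal-size point sets of shape (n, d)."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_proj, X.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)   # random unit directions
    px = np.sort(X @ theta.T, axis=0)   # 1-D optimal transport = matching sorted samples
    py = np.sort(Y @ theta.T, axis=0)
    return float(np.mean((px - py) ** 2))


def pullback_sw(corrs_a, corrs_b, **kwargs):
    """Sliced Wasserstein discrepancy between two sets of correlation matrices."""
    A = np.stack([off_log(C) for C in corrs_a])
    B = np.stack([off_log(C) for C in corrs_b])
    return sliced_wasserstein(A, B, **kwargs)
```

Because all the geometry is absorbed into the embedding step, the discrepancy costs only a matrix logarithm per sample plus sorting, which is consistent with the low training overhead the abstract reports.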

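2606.06102 2026-06-05 cs.AI cs.LG Version update

Step-adaptive multimodal fusion network with multi-scale cloud feature learning for ultra-short-term solar irradiance forecasting

Jingxin Zhang, Xiaoqin Wang

Affiliations * School of Automation, Southeast University

AI summary Proposes a step-adaptive multimodal fusion network that extracts multi-scale cloud features with InceptionNeXt, modulates low-frequency information with a step-adaptive low-frequency compensation unit, and combines the result with meteorological time-series features for ultra-short-term solar irradiance forecasting.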

Abstract

Ultra-short-term solar irradiance prediction is critical for photovoltaic system dispatch and power grid stability. Existing approaches suffer from three key shortcomings: single time-series models cannot capture the spatial dynamics of clouds under complex conditions, standard convolutions inadequately represent multi-scale cloud features, and fixed low-frequency compensation strategies fail to adapt to different prediction steps. To address these issues, this paper proposes a multi-source data fusion model for ultra-short-term irradiance prediction. The model first employs InceptionNeXt to extract multi-scale, multi-directional spatial features from ground-based cloud images. A step-adaptive low-frequency compensation unit is then introduced to dynamically modulate global low-frequency information based on the prediction step. Finally, the enhanced image features are combined with meteorological time-series features, and a TempAttnLSTM network captures global temporal dependencies for multi-step prediction. Experiments on the public NREL dataset and on operational photovoltaic stations in Shandong demonstrate the effectiveness of the proposed method compared with several state-of-the-art approaches.

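2606.06098 2026-06-05 cs.CL cs.LG Version update

IR3DE: A Linear Router for Large Language Models

Eros Fanì, Oğuzhan Ersoy

Affiliations * Gensyn

AI summary Proposes IR3DE, a linear router based on ridge regression that cheaply and quickly selects the most suitable domain-expert LLM for each prompt, surpasses the baselines on reasoning tasks, and supports adding or removing expert models dynamically.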

Comments Accepted at the ICML 2026 Workshop on Resource-Adaptive Foundation Model Inference

Abstract

Foundational Large Language Models (LLMs) demonstrate proficiency on a wide range of general tasks, and achieve remarkable results on various specialized tasks via domain-expert LLMs. With the ever-growing list of available LLMs, inference routers are being proposed to select the most appropriate LLM for each prompt. However, existing routing methods either optimize cost across weak-to-strong generalist LLMs or require substantial training to support domain-expertise routing. In this paper, we propose IR3DE, a Ridge Regression-based Router for Domain Experts that provides cheap and fast routing decisions for each prompt. We evaluate IR3DE in two Causal Language Modeling (CLM) settings where the tasks are next-token prediction for all domains, and one reasoning setting where each domain has its own distinct reasoning task. Despite being a linear router, IR3DE achieves performance comparable to the other baselines in both CLM settings and surpasses them in the reasoning setting, with a normalized performance of 98.4%. Moreover, IR3DE enables the addition or removal of domain experts without requiring the router to be retrained from scratch, allowing a dynamic set of LLMs to be served with minimal disruption to the router itself. Our code is available at: github.com/gensyn-ai/IR3DE.
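A ridge-regression router of the kind the abstract describes can be sketched as follows. This is a minimal illustration under stated assumptions, not the IR3DE code: the class `RidgeRouter`, its methods, the prompt features `X`, and the per-expert target scores are all hypothetical. The sketch uses the standard closed form $W = (X^\top X + \lambda I)^{-1} X^\top Y$, in which each expert's weight vector depends only on its own target column, so an expert can be added or dropped without refitting the others.

```python
import numpy as np


class RidgeRouter:
    """Linear router: one ridge-regression weight vector per expert (illustrative only)."""

    def __init__(self, lam=1.0):
        self.lam = lam
        self.names, self.cols = [], []

    def fit_features(self, X):
        # The regularized Gram inverse depends only on the prompt features,
        # so it is computed once and reused for every expert.
        self.X = np.asarray(X, dtype=float)
        d = self.X.shape[1]
        self.G_inv = np.linalg.inv(self.X.T @ self.X + self.lam * np.eye(d))

    def add_expert(self, name, scores):
        # Closed-form ridge solution for this expert's score column.
        w = self.G_inv @ (self.X.T @ np.asarray(scores, dtype=float))
        self.names.append(name)
        self.cols.append(w)

    def remove_expert(self, name):
        i = self.names.index(name)
        del self.names[i]
        del self.cols[i]

    def route(self, x):
        # Pick the expert with the highest predicted score for this prompt.
        W = np.stack(self.cols, axis=1)
        return self.names[int(np.argmax(np.asarray(x, dtype=float) @ W))]
```

Routing a prompt then costs a single matrix-vector product, which matches the "cheap and fast" framing in the abstract.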

2606.06096 2026-06-05 cs.LG cs.AI cs.CL Version update

OrderGrad: Optimizing Beyond the Mean with Order-Statistic Policy Gradient Estimation

Paavo Parmas, Yongmin Kim, Kohsei Matsutani, Shota Takashiro, Soichiro Nishimori, Takeshi Kojima, Yusuke Iwasawa, Yutaka Matsuo

Affiliations * The University of Tokyo

AI summary Proposes OrderGrad, a family of likelihood-ratio and reparameterization gradient estimators for order-statistic objectives; implemented as a reward transformation, it gives a unified plug-and-play approach to risk-averse, robust, and exploratory learning.

Abstract

Policy-gradient methods usually optimize expected return, but many real-world applications care about distributional properties of returns: tail risk, outlier robustness, or best-of-K discovery. We introduce OrderGrad, a family of likelihood-ratio and reparameterization gradient estimators for order-statistic objectives. OrderGrad optimizes finite-sample L-statistics, i.e., weighted averages of sorted rewards or costs, recovering objectives such as VaR, CVaR, trimmed means, medians, and top-m/best-of-K criteria by changing only the rank weights. For any fixed sample size and rank-weight vector, OrderGrad provides an unbiased gradient estimator for the corresponding order-statistic objective. The method is implemented as a simple reward transformation that can then be used in an otherwise standard policy-gradient or reparameterized update. We study the resulting estimator's variance behavior and evaluate it on tasks where mean optimization is mismatched to the deployment objective, including LLM math post-training and other tasks. OrderGrad provides a unified, plug-and-play route to risk-averse, robust, and exploratory learning. Code: https://github.com/paavo5/ordergrad
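The objective family itself, finite-sample L-statistics, is easy to make concrete. The sketch below only evaluates the objectives for different rank weights; it does not reproduce OrderGrad's gradient estimator or reward transformation, which the abstract does not spell out. The helper names and the convention that rewards are sorted in ascending order (so the lower tail is the "bad" tail) are assumptions made here.

```python
import numpy as np


def l_statistic(rewards, weights):
    """Weighted average of the sorted rewards (ascending order)."""
    r = np.sort(np.asarray(rewards, dtype=float))
    return float(np.dot(weights, r))


def cvar_weights(n, alpha):
    """Lower-tail CVaR: uniform weight on the worst ceil(alpha * n) rewards."""
    m = int(np.ceil(alpha * n))
    w = np.zeros(n)
    w[:m] = 1.0 / m
    return w


def top_m_weights(n, m):
    """Top-m criterion: uniform weight on the best m rewards."""
    w = np.zeros(n)
    w[n - m:] = 1.0 / m
    return w


def median_weights(n):
    """Sample median as an L-statistic."""
    w = np.zeros(n)
    if n % 2 == 1:
        w[n // 2] = 1.0
    else:
        w[n // 2 - 1] = w[n // 2] = 0.5
    return w
```

Swapping the weight vector is the only change needed to move between risk-averse (CVaR), robust (median) and exploratory (top-m) objectives, which is the "changing only the rank weights" point made in the abstract.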

2606.06094 2026-06-05 cs.AI cs.LG math.DS physics.med-ph Version update

Integrating Mechanistic and Data-Driven Models for Neurological Disorders through Differentiable Programming

Shah Pallav Dhanendrakumar, Saikat Pal, Sitikantha Roy

Affiliations * Department of Applied Mechanics, Indian Institute of Technology Delhi; Yardi School of Artificial Intelligence, Indian Institute of Technology Delhi

AI summary Surveys hybrid modeling strategies that use differentiable programming to combine deep learning with physics-based solvers for the diagnosis, prognosis, and treatment planning of neurological disorders, outperforming purely mechanistic or purely data-driven approaches.

Abstract

Advances in computational modeling, neuroimaging, and artificial intelligence are revolutionizing the modeling of neurological disorders for improved diagnostics, prognosis, and treatment planning. Mechanistic models provide valuable scientific insight into the disorders, but in practice they are often either simplified by strong assumptions or computationally expensive and slow to solve. Purely data-driven approaches, by contrast, provide speed and scalability, but they require large, high-quality datasets to train and generally suffer from interpretability and generalization issues. This perspective paper presents a structured overview of hybrid modeling strategies, which combine deep learning models with physics-based solvers and are categorized into parallel, series, and parallel-series architectures. Three main approaches are emphasized: residual modeling for missing or incomplete physics, Neural Ordinary Differential Equations (NODEs) for continuous-time dynamics approximation, and solver-in-the-loop methods that accelerate traditional solvers with neural approximations. These hybrid models integrate formulations based on governing differential equations with deep learning to characterize the evolution of neurological disorders, and promise advanced personalized neurological modeling. In addition, the study explores and proposes different hybrid configurations to improve diagnostic accuracy, predict disease progression, and inform treatment strategies across a range of neurological disorders. These capabilities outperform standalone mechanistic or purely data-driven approaches, making hybrid modeling a powerful tool, especially in applications that model the progression of, and treatment responses in, neurological conditions such as brain tumors, Alzheimer's disease, and stroke.

2606.06080 2026-06-05 cs.LG cs.AI cs.CL Version update

On Advantage Estimates for Max@K Policy Gradients

Shota Takashiro, Soichiro Nishimori, Paavo Parmas, Yongmin Kim, Kohsei Matsutani, Gouki Minegishi, Yusuke Iwasawa, Takeshi Kojima, Yutaka Matsuo

Affiliations * The University of Tokyo

AI summary To ease post-training of reasoning models under sparse rewards, proposes MaxPO, a new advantage estimation method whose Leave-Two-Out baseline yields centered advantages, reducing gradient variance and improving performance.

Abstract

Reinforcement learning with verifiable rewards is widely used for post-training reasoning models, but sparse outcome rewards make exploration difficult. A complementary approach is to optimize inference-time objectives such as pass@K and max@K directly, yet existing policy-gradient estimators for these objectives use different signals, baselines, and normalizations, making their relationships unclear. We study this issue through baseline design and advantage centering. Starting from the advantage estimator of a leading method in the field, we show that it is policy-gradient unbiased but yields a non-centered advantage. We then introduce a Leave-Two-Out baseline that preserves policy-gradient unbiasedness while making realized batch advantages exactly centered. The resulting method, MaxPO, has an efficient quadratic-time implementation and integrates naturally into group-based RL for LLM post-training. We further derive the canonical finite-batch advantage for max@K, providing a unified view of existing advantage estimators. Empirically, we verify that the L2O baseline reduces gradient variance and outperforms non-centered alternatives.
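To make the max@K objective concrete, here is the standard unbiased estimate of the expected maximum of K i.i.d. samples computed from a batch of n >= K rewards. It illustrates the quantity being optimized only; it is not the MaxPO advantage estimator or its Leave-Two-Out baseline, whose details the abstract does not give. The function name is hypothetical.

```python
from math import comb


def max_at_k(rewards, k):
    """Unbiased estimate of E[max of k i.i.d. rewards] from n >= k samples.

    Averages the maximum over all size-k subsets of the batch, using the fact
    that the i-th smallest reward (0-based) is the maximum of exactly
    comb(i, k - 1) subsets.
    """
    r = sorted(rewards)
    n = len(r)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(rewards)")
    return sum(comb(i, k - 1) * r[i] for i in range(n)) / comb(n, k)
```

With binary rewards this reduces to the familiar pass@K estimate, and with K = 1 it is the batch mean, so mean-return training is the special case K = 1 of the same objective.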

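2606.06077 2026-06-05 cs.RO cs.LG Version update

3D Underwater Path Planning via Generative Flow Field Surrogates

Zachary Cooper-Baldock, Paulo E. Santos, Russell S. A. Brinkworth, Karl Sammut

Affiliations * Flinders University

AI summary To avoid costly CFD simulation of the complex three-dimensional propeller wake encountered during autonomous underwater vehicle recovery, proposes conditional generative adversarial networks (cGANs) as surrogates, combined with energy-weighted A* path planning, for fast and effective path planning.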

Comments 41 pages, 5 figures, 11 tables

Abstract

Autonomous underwater vehicle (AUV) launch and recovery (LAR) into the hull of an advancing host platform requires traversal of a complex, three-dimensional propeller wake whose hydrodynamic structure cannot be characterised by a uniform current model. High-fidelity Reynolds-Averaged Navier-Stokes (RANS) Computational Fluid Dynamics (CFD) simulations resolve this structure with sufficient accuracy for path planning, but their computational cost renders them impractical for onboard use. We address this gap by integrating two conditional generative adversarial network (cGAN) architectures (a regularised PatchGAN and a 2D3DGAN with self-attention) as drop-in replacements for RANS CFD data within a three-dimensional, energy-weighted A* path planning framework. Both generators are driven by a hierarchical pipeline that synthesises full $128^3$ voxel flow field volumes from scalar operating condition inputs alone, with end-to-end inference times of approximately 28-146 $\mu$s, compared to hours for a single RANS computation. We benchmark all four environmental knowledge levels (uniform current, ground-truth CFD, PatchGAN, and 2D3DGAN SA) across 19,800 independently generated trajectories spanning 550 distinct flow conditions. Full CFD wake knowledge reduces energy expenditure by 5.7-12.5% and high-velocity wake-core encounters by up to 77.8% relative to uniform-current planning, with both benefits scaling with operating severity. The cGAN surrogates recover approximately 45-60% of the CFD energy benefit and high-velocity cell avoidance benefit while operating at inference speeds compatible with edge device use. These results provide the first systematic quantification of the downstream path planning value of cGAN-predicted hydrodynamic fields in a three-dimensional maritime robotics application.

2606.06058 2026-06-05 cs.LG cs.AI cs.CL Version update

MDP-GRPO: Stabilized Group Relative Policy Optimization for Multi-Constraint Instruction Following

Mohammad Mahdi Salmani-Zarchi, Zahra Rahimi, Heshaam Faili, Mohammad Javad Dousti

Affiliations * Department of Electrical and Computer Engineering, College of Engineering, University of Tehran; Department of Statistics, Mathematics and Computer Science, Allameh Tabataba’i University

AI summary To address the instability of standard GRPO under discrete, low-dispersion rewards, proposes MDP-GRPO, which uses multi-temperature sampling, dual-anchor advantages, prospect-theoretic shaping, and asymmetric KL regularization, improving strict constraint satisfaction by up to 5.0% on FollowBench and other datasets.

Comments Accepted to ACL 2026 Main Conference. 14 pages, 9 figures

Abstract

Reinforcement learning with verifiable rewards is ideal for multi-constraint instruction following, yet standard group-relative policy optimization (GRPO) becomes unstable under discrete, low-dispersion rewards, where within-group reward distributions are frequently homogeneous. We identify and formalize three pathologies of z-score group normalization in this regime: low-variance amplification, mean-centering blindness, and zero-variance collapse. To address them, we propose MDP-GRPO, which stabilizes learning through (1) multi-temperature sampling to increase reward dispersion, (2) dual-anchor advantages to restore gradients in homogeneous groups and stop mean-centering blindness, (3) prospect-theoretic shaping to bound updates and penalize violations based on Kahneman and Tversky's theory, and (4) asymmetric KL regularization. Evaluated on FollowBench, IFEval, and a curated multi-constraint dataset, MDP-GRPO outperforms standard GRPO, improving strict constraint satisfaction by up to 5.0% on Llama-3.2-3B. Our method also enables stable convergence with small group sizes while preserving general capabilities on MMLU and ARC.
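The failure modes of z-score group normalization are easy to reproduce numerically. The sketch below shows the plain normalization only, not MDP-GRPO's fixes; the function name, the `eps` stabilizer, and the interpretation of the three pathologies given afterwards are assumptions based on the abstract's terminology.

```python
import numpy as np


def zscore_advantages(rewards, eps=1e-6):
    """Group-relative advantages: centre by the group mean, scale by the group std."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


all_pass = zscore_advantages([1.0, 1.0, 1.0, 1.0])        # homogeneous group, all constraints met
all_fail = zscore_advantages([0.0, 0.0, 0.0, 0.0])        # homogeneous group, all constraints violated
tiny_gap = zscore_advantages([0.80, 0.80, 0.80, 0.81])    # rewards differ by 0.01
large_gap = zscore_advantages([0.0, 0.0, 0.0, 1.0])       # rewards differ by 1.0
```

Under this reading, `all_pass` and `all_fail` are both identically zero, so a homogeneous group contributes no gradient (zero-variance collapse) and a fully successful group is indistinguishable from a fully failed one (mean-centering blindness). `tiny_gap` and `large_gap` are nearly identical, so a 0.01 reward difference is scaled up to the same advantage magnitude as a 1.0 difference (low-variance amplification).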

2606.06056 2026-06-05 cs.SE cs.AI cs.LG Version update

Metamorphic Testing with the Rashomon Set: Explanation Faithfulness in Machine Learning

Helge Spieker, Jørn Eirik Betten, Arnaud Gotlieb

Affiliations * Norwegian Ministry of Education and Research

AI summary To address unreliable explanations caused by the Rashomon effect in machine learning, proposes a metamorphic-testing framework that assesses the faithfulness of feature attributions from post-hoc explanation methods without requiring ground-truth labels.

Comments Accepted at 10th International Workshop on Metamorphic Testing (MET 2026)

Abstract

Multiple machine learning models can achieve near-equivalent predictive performance on the same task, yet provide divergent feature-based explanations. This is called the Rashomon effect of (explainable) machine learning, and it raises the question of which explanations, if any, are trustworthy. We propose a framework based on metamorphic testing that assesses explanation faithfulness without requiring ground-truth labels by exploring attributed feature importance from post-hoc explanation methods. Five metamorphic relations formalize expected consistency properties between model behavior and feature attributions. We apply this general framework to two tabular regression datasets and two post-hoc explainers (SHAP and LIME) to demonstrate the approach. The framework offers a practical, model-agnostic tool for selecting accurate models with reliable and trustworthy explanations.

2606.06053 2026-06-05 cs.LG Version update

Online KL-Regularized Reinforcement Learning with Function Approximation under Misspecification

Haoyang Hong, Zichen Wang, Quanquan Gu, Huazheng Wang

AI summary Studies KL-regularized contextual bandits and episodic reinforcement learning with general function approximation under model misspecification, introduces KL misspecification formulations, analyzes regression-based algorithms with Gibbs policy updates, and gives high-probability KL-regret bounds with explicit misspecification terms.

Comments Accepted by RLC 2026

Abstract

We study KL-regularized contextual bandits and episodic reinforcement learning (RL) under general function approximation with model misspecification. Existing guarantees rely on realizability and therefore do not extend to misspecified models, where classical regret bounds may fail. This work introduces KL misspecification formulations for contextual bandits and episodic RL and analyzes regression-based algorithms with Gibbs policy updates. High-probability KL-regret guarantees with explicit misspecification terms are established, recovering the standard realizable KL-regularized setting as a special case.
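A Gibbs policy update has a simple closed form in the single-context case, sketched below. This is the textbook solution of the KL-regularized objective $\max_\pi \mathbb{E}_{a\sim\pi}[f(a)] - \beta\,\mathrm{KL}(\pi \,\|\, \pi_{\mathrm{ref}})$, namely $\pi(a) \propto \pi_{\mathrm{ref}}(a)\exp(f(a)/\beta)$; it is not the paper's full algorithm. The names `ref_probs`, `est_rewards` (standing for the output of a regression oracle) and `beta` are assumptions introduced here.

```python
import numpy as np


def gibbs_policy(ref_probs, est_rewards, beta):
    """pi(a) proportional to pi_ref(a) * exp(f(a) / beta), computed stably in log space.

    Assumes every entry of ref_probs is strictly positive.
    """
    logits = np.log(np.asarray(ref_probs, dtype=float)) + np.asarray(est_rewards, dtype=float) / beta
    logits -= logits.max()              # avoid overflow in exp
    p = np.exp(logits)
    return p / p.sum()


def kl_objective(p, ref_probs, est_rewards, beta):
    """Expected estimated reward minus beta * KL(p || ref)."""
    p = np.asarray(p, dtype=float)
    ref = np.asarray(ref_probs, dtype=float)
    return float(p @ np.asarray(est_rewards, dtype=float) - beta * np.sum(p * np.log(p / ref)))
```

At the Gibbs policy the objective equals $\beta \log \sum_a \pi_{\mathrm{ref}}(a) \exp(f(a)/\beta)$, which gives a quick numerical check of an implementation. When the regression estimate `est_rewards` is misspecified, the policy is optimal only for the wrong reward, which is where the explicit misspecification terms in the regret bounds come from.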

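2606.06046 2026-06-05 math.NA cs.LG cs.NA Version update

Learning solution operators of PDEs with sparse approximation methods

Sebastian Neumayer, Daniel Potts, Fabian Taubert

AI summary Proposes a sparse high-dimensional method that combines product basis expansions with orthogonal matching pursuit to approximate solution operators of partial differential equations, substantially reducing the required sample size; numerical experiments against cubature-based sparse approximation and Fourier neural operators show accuracy and interpretability when the solution has a sparse representation.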

Abstract

We investigate the approximation of solution operators for partial differential equations (PDEs) using sparse high-dimensional techniques. Building on a dimension-incremental framework, we combine product basis expansions with sparse recovery methods, specifically orthogonal matching pursuit (OMP), to substantially reduce the required sample size compared with a previously considered cubature-based approach. We evaluate the resulting method numerically on several examples, comparing it against both cubature-based sparse approximation and Fourier neural operators in terms of accuracy, runtime, and sample size. The experiments show that our approach considerably reduces the number of required PDE solves relative to its predecessor while maintaining competitive accuracy, particularly when the solution admits a sparse representation in the chosen basis. Furthermore, the recovered sparse index sets yield interpretable insights into the relevant variables and parameter interactions.
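For readers unfamiliar with the sparse recovery step, a generic orthogonal matching pursuit (OMP) routine looks like this. It is a textbook sketch, not the authors' dimension-incremental method: the matrix `A` stands for basis functions evaluated at sample points (assumed to have roughly unit-norm columns), `y` for sampled solution values, and the fixed `sparsity` stopping rule is a simplification chosen here.

```python
import numpy as np


def omp(A, y, sparsity):
    """Greedy sparse recovery of x with y ~ A @ x and at most `sparsity` nonzeros."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(sparsity):
        # Select the column most correlated with the current residual.
        corr = np.abs(A.T @ residual)
        if support:
            corr[support] = 0.0         # never pick the same column twice
        support.append(int(np.argmax(corr)))
        # Re-fit all selected coefficients by least squares (the "orthogonal" step).
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x, sorted(support)
```

The returned support is the list of active basis indices, which is the kind of output the abstract refers to when it says the recovered sparse index sets are interpretable.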

2606.06043 2026-06-05 stat.ML cs.LG Version update

Adaptive Learning Rates with Surrogate Probability for Follow-the-Perturbed-Leader

Jongyeong Lee, Junya Honda, Shinji Ito, Chansoo Kim

Affiliations * Korea Institute of Science and Technology; Kyoto University; RIKEN AIP; The University of Tokyo; University of Science and Technology

AI summary Because the optimization-free nature of FTPL makes adaptive, probability-dependent learning rates hard to design, proposes adaptive learning rates based on surrogate probability functions, obtaining best-of-both-worlds guarantees under Pareto perturbations with any shape parameter α>1 and extending them to bandits with expert advice.

Comments TBA COLT2026

Abstract

The follow-the-regularized-leader framework has shown effectiveness and flexibility in online learning problems, where the choice of learning rates is known to be crucial. Recently, adaptive learning rates defined in terms of the arm-selection probabilities, obtained by solving a convex optimization problem, have achieved improved best-of-both-worlds (BOBW) guarantees in various bandit problems. In contrast, BOBW guarantees for its computationally efficient alternative, follow-the-perturbed-leader (FTPL), remain relatively limited, since its optimization-free nature ironically makes the design of adaptive, probability-dependent learning rates non-trivial. To address this challenge, we propose an adaptive learning rate for FTPL by introducing surrogate probability functions that can be computed from the available quantities alone, without requiring the exact probabilities. Based on these learning rates with surrogate functions, we provide the BOBW guarantee for FTPL with Pareto perturbations for any shape parameter $\alpha>1$, generalizing prior results restricted to the specific choice $\alpha=2$. We further show BOBW guarantees for FTPL with adaptive learning rates in the bandit problem with expert advice. Our approach preserves the computational simplicity of FTPL while enabling probability-dependent adaptivity, and the surrogate-based methodology may be of independent interest in other algorithmic frameworks beyond FTPL and learning rate design.
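The basic FTPL step with Pareto perturbations can be sketched as follows. It is a minimal illustration of why the arm-selection probabilities are not directly available, not the paper's algorithm: the selection rule (argmin of estimated cumulative loss minus perturbation over the learning rate) is the common form in this line of work and is assumed here, the function names are hypothetical, and the surrogate probability functions themselves are not reproduced.

```python
import numpy as np


def ftpl_choose(cum_loss_est, eta, alpha, rng):
    """Pick the arm minimising (estimated cumulative loss) - (Pareto perturbation) / eta."""
    # Generator.pareto samples a Lomax (Pareto II) variable; adding 1 gives a
    # classical Pareto variable with scale 1 and shape alpha.
    r = 1.0 + rng.pareto(alpha, size=len(cum_loss_est))
    return int(np.argmin(np.asarray(cum_loss_est, dtype=float) - r / eta))


def estimate_selection_probs(cum_loss_est, eta, alpha, rng, n_samples=1000):
    """Monte Carlo estimate of the arm-selection probabilities."""
    counts = np.zeros(len(cum_loss_est))
    for _ in range(n_samples):
        counts[ftpl_choose(cum_loss_est, eta, alpha, rng)] += 1
    return counts / n_samples
```

Choosing an arm needs only one perturbation draw and an argmin, whereas the selection probabilities have no closed form and must be simulated or otherwise approximated. A learning rate that depends on those probabilities therefore needs a computable stand-in, which is the role the abstract assigns to the surrogate probability functions.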

2606.06034 2026-06-05 cs.LG cs.AI 版本更新

When Good Enough Is Optimal: Multiplication-Only Matrix Inversion Approximation for Quantized Gated DeltaNet

当足够好即最优:量化门控DeltaNet的仅乘法矩阵求逆近似

Luoming Zhang, Yuwei Ren, Kui Zhang, Tian Liu, Lingjuan Ge, Denghao Li, Matthew Harper Langston, Yin Huang, Weiliang Will Zeng, Liang Zhang

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 针对分块并行线性注意力中矩阵求逆的瓶颈,提出基于截断Neumann级数展开的仅矩阵乘法算法,结合结构掩码和并行残差校正,实现NPU上5倍内核加速和20%解码层开销降低。

详情
AI中文摘要

分块并行线性注意力中的矩阵求逆是长上下文建模的主要瓶颈,尤其是在NPU上,基于前向替换的方法并行性有限且硬件利用率低。我们提出了一种快速的、基于矩阵乘法(MatMul)的算法,专门针对分块线性注意力中出现的严格下三角矩阵。受Neumann级数项快速增长和逆矩阵对角集中性的启发,我们采用截断Neumann展开,结合结构掩码和并行残差校正,以消除顺序依赖。我们进一步将方法扩展到低比特INT,通过缓解重复矩阵幂运算引起的动态范围扩展,并根据块大小调整近似阶数和残差步长,以最小化计算成本同时保持模型精度。在Qwen3.5系列模型上的实验表明,在浮点和低精度推理下,该方法实现了高达5倍的内核级加速和20%的解码层开销降低,同时保持了精度。我们的方法为可扩展线性注意力提供了一种高效且硬件友好的解决方案。

英文摘要

Matrix inversion in chunk-wise parallel linear attention is a major bottleneck for long-context modeling, particularly on NPUs, where forward-substitution-based methods exhibit limited parallelism and poor hardware utilization. We propose a fast, Matrix Multiplication (MatMul)-based algorithm tailored for strictly lower-triangular matrices arising in chunk-wise linear attention. Motivated by the rapid growth of Neumann-series terms and the diagonal concentration of the inverse matrix, we employ a truncated Neumann expansion with structural masking and parallel residual correction to eliminate sequential dependencies. We further extend our method to low-bit INT by mitigating the dynamic range expansion arising from repeated matrix power operations, and adapt the approximation order and residual step to the chunk size to minimize computational cost while preserving the model's accuracy. Experiments on Qwen3.5-family models demonstrate up to 5$\times$ kernel-level speedup and a 20% reduction in decode-layer overhead, while preserving accuracy under both floating-point and low-precision inference. Our method offers an efficient and hardware-friendly solution for scalable linear attention.
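
The idea of replacing forward substitution with matrix multiplications can be sketched as follows. This is a minimal illustration, not the paper's kernel: it assumes the matrix to invert has the form I + A with A strictly lower-triangular (so A is nilpotent and the Neumann series is finite), and it uses a single Newton-Schulz step as a stand-in for the "residual correction". The structural masking, the INT quantization handling, and all function names are assumptions or omitted.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, s):
    return [[s * x for x in row] for row in a]


def neumann_inverse(a, order):
    """Approximate (I + a)^{-1} by the truncated series sum_{k < order} (-a)^k."""
    n = len(a)
    neg_a = scale(a, -1.0)
    term = identity(n)
    total = identity(n)
    for _ in range(order - 1):
        term = matmul(term, neg_a)  # next power of (-a)
        total = add(total, term)
    return total


def residual_correct(a, x):
    """One Newton-Schulz step X <- X (2I - (I + a) X), using only matmuls.

    If (I + a) X = I - R, the corrected product equals I - R^2, so the
    truncation order is effectively doubled.
    """
    n = len(a)
    mx = matmul(add(identity(n), a), x)
    return matmul(x, add(scale(identity(n), 2.0), scale(mx, -1.0)))


def max_error_from_identity(m):
    return max(
        abs(v - (1.0 if i == j else 0.0))
        for i, row in enumerate(m)
        for j, v in enumerate(row)
    )
```

For a 4x4 strictly lower-triangular A, an order-2 truncation followed by one correction step is already exact, because the remaining error term is A^4 = 0. Every operation is a dense matrix product or elementwise sum, so there is no row-by-row dependency of the kind forward substitution has.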

2606.06032 2026-06-05 cs.LG 版本更新

Catastrophic Forgetting as Accessibility Collapse: A Three-Level Framework for Knowledge Persistence in Continual Learning

灾难性遗忘作为可访问性崩溃:持续学习中知识持久性的三层次框架

Ayushman Trivedi, Bhavika Melwani

发表机构 * Independent Researchers(独立研究者)

AI总结 本文提出一个三层次框架(知识存储、表示和可访问性),通过实验证明灾难性遗忘主要是可访问性失败而非表示擦除,重新训练分类器可恢复大部分性能。

Comments 14 pages, 6 figures, 8 tables. Sequential continual-learning experiments on CIFAR-100 using ResNet-18

详情
AI中文摘要

灾难性遗忘通常被解释为在顺序学习过程中先前获得的知识不可逆转地擦除。在这项工作中,我们研究了一种替代观点:遗忘可能并非源于任务表示的完全破坏,而是源于对保存信息的可访问性丧失。我们引入了一个三层次框架,将知识存储、表示和可访问性分开,并通过在ResNet-18上对顺序CIFAR-100分类进行的一系列持续学习实验来评估每个组件。我们的分析结合了检查点持久性、线性探测、表示几何、分类器重置恢复和逐层可恢复性实验。我们观察到早期任务的完全行为遗忘,任务准确率从54.8%下降到0%,而线性探测性能保留了大约76%的原始表示信息。此外,仅重新训练最终分类器就恢复了原始任务性能的75.7%,而无需修改骨干网络。逐层分析表明,早期和中间层保留了高度可恢复的任务信息,尽管后期阶段严重退化。投影能量和主角度分析表明,保留的知识以分布式高维表示的形式持续存在,而不是通过保留一个小的主导子空间。这些发现表明,灾难性遗忘更适合被描述为可访问性失败而非完全表示擦除,并且即使在功能遗忘发生后,大量任务相关信息仍嵌入在神经表示中。

英文摘要

Catastrophic forgetting is commonly interpreted as the irreversible erasure of previously acquired knowledge during sequential learning. In this work, we investigate an alternative perspective: that forgetting may arise not from complete destruction of task representations but from a loss of accessibility to preserved information. We introduce a three-level framework separating knowledge storage, representation, and accessibility, and evaluate each component through a series of continual-learning experiments on sequential CIFAR-100 classification using ResNet-18. Our analysis combines checkpoint persistence, linear probing, representation geometry, classifier-reset recovery, and layer-wise recoverability experiments. We observe complete behavioral forgetting of earlier tasks, with task accuracy collapsing from 54.8% to 0%, while linear probe performance retains approximately 76% of the original representational information. Furthermore, retraining only the final classifier restores 75.7% of the original task performance without modifying the backbone network. Layer-wise analysis reveals that early and intermediate layers preserve highly recoverable task information despite severe degradation at later stages. Projection-energy and principal-angle analyses indicate that retained knowledge persists as distributed high-dimensional representations rather than through preservation of a small dominant subspace. These findings suggest that catastrophic forgetting is better characterized as an accessibility failure than complete representational erasure, and that substantial task-relevant information remains embedded within neural representations even after functional forgetting has occurred.

2606.06027 2026-06-05 cs.AI cs.CL cs.LG cs.SI 版本更新

RedditPersona: A Modular Framework for Community-Conditioned LLM Adaptation from Reddit

RedditPersona: 一个用于从Reddit进行社区条件化LLM适配的模块化框架

Amirhossein Ghaffari, Ali Goodarzi, Huong Nguyen, Simo Hosio, Lauri Lovén, Ekaterina Gilman

发表机构 * Future Computing Group University of Oulu(未来计算组奥卢大学) Centre for Applied Computing University of Oulu(应用计算中心奥卢大学)

AI总结 提出RedditPersona模块化框架,通过五种分组策略和QLoRA训练参数高效适配器,在112个Reddit子版块上评估社区条件化语言模型,发现适配器的行为可识别性与策略内在一致性相关,且所有策略在可识别性和分布相似性之间存在一致权衡。

详情
AI中文摘要

社区条件化的语言模型适配需要在每个研究中独立做出关于数据收集、社区定义和评估的选择,这使得比较假设或重用工件变得困难。我们提出了RedditPersona,一个模块化框架,标准化了这些选择:它收集Reddit帖子和评论,分析活跃用户,根据五种分组策略(基于子版块、图结构、语义、混合和基于交互)对用户进行划分,通过QLoRA为每种策略训练参数高效的适配器,并在一个涵盖流畅性、忠实度、分布对齐和社区可识别性的共享度量套件下进行评估。应用于城市福祉领域的112个子版块(301,429个用户档案,超过1600万条评论),我们发现适配器的行为可识别性追踪了每种策略与子版块基线的内在一致性,并且所有五种策略在可识别性和与真实文本的分布相似性之间存在一致的权衡。代码和配置文件可在以下网址获取:https://github.com/Ahghaffari/redditpersona。

英文摘要

Community-conditioned language model adaptation requires choices about data collection, community definition, and evaluation that are currently made independently in each study, making it hard to compare assumptions or reuse artifacts. We present RedditPersona, a modular framework that standardizes these choices: it collects Reddit posts and comments, profiles active users, partitions them under five grouping strategies (subreddit-based, graph-structural, semantic, hybrid, and interaction-based), trains a parameter-efficient adapter per strategy via QLoRA, and evaluates them under a shared metric suite spanning fluency, fidelity, distributional alignment, and community identifiability. Applied to 112 subreddits in the urban well-being domain (301,429 user profiles, 16M+ comments), we find that adapters' behavioral identifiability tracks each strategy's intrinsic agreement with the subreddit baseline, and that a consistent trade-off between identifiability and distributional similarity to real text holds across all five strategies. The code and configuration files are available at: https://github.com/Ahghaffari/redditpersona.

2606.06011 2026-06-05 cs.RO cs.LG cs.MA 版本更新

Merging model-based control with multi-agent reinforcement learning for multi-agent cooperative teaming strategies

将基于模型的控制与多智能体强化学习相结合以实现多智能体协作团队策略

Christian Llanes, Spencer W. Jensen, Samuel Coogan

发表机构 * Georgia Institute of Technology(佐治亚理工学院) Sandia National Laboratories(桑地亚国家实验室)

AI总结 提出一种结合多智能体强化学习与模型预测控制的框架(MA-AC-MPC),通过扩展演员-评论家模型预测控制实现安全、动态可行的协作策略,并在追逃场景和异构环境中验证其优于多层感知机模型。

Comments 12 pages, 8 figures, 7 tables

详情
AI中文摘要

在这项工作中,我们提出了一种将多智能体强化学习(MARL)与基于模型的控制相结合的框架,以在协作多智能体任务中实现安全、动态可行的动作。多智能体强化学习具有从长期规划视野中的离散不可微奖励中学习多智能体团队协作策略的优势。模型预测控制具有鲁棒性,并在快速重规划框架中为短视野提供安全、动态可行的动作。我们提出了一种将演员-评论家模型预测控制扩展到MARL的算法,称为多智能体演员-评论家模型预测控制(MA-AC-MPC)。我们通过将其应用于多智能体追逃场景来展示该算法的能力。具体来说,我们比较了使用MA-AC-MPC模型和多层感知机模型(MA-AC-MLP)的逃避者团队策略。追逐者团队使用增强比例导航,因为它被接受为一种先进的对抗控制律。我们还提供了一个异构环境的示例,其中无人机和全向轮式机器人协作,在硬件上实现了可重复且成功的着陆,MA-AC-MPC的成功率为100%,而MA-AC-MLP为60%。我们在硬件上证明了所提出的MA-AC-MPC算法在两种环境中的鲁棒性。

英文摘要

In this work, we propose a framework that combines multi-agent reinforcement learning (MARL) with model-based control to achieve safe, dynamically feasible actions in cooperative multi-agent tasks. Multi-agent reinforcement learning provides the advantage of learning cooperative policies for multi-agent teams from discrete non-differentiable rewards over a long planning horizon. Model-predictive control is robust and offers safe, dynamically feasible actions in a fast replanning framework for short horizons. We propose an algorithm that extends actor-critic model predictive control to MARL, which we refer to as multi-agent actor-critic model predictive control (MA-AC-MPC). We demonstrate the capabilities of this algorithm by applying it to a multi-agent pursuit-evasion scenario. Specifically, we compare the evader team's strategy using the MA-AC-MPC model and a multi-layer perceptron model (MA-AC-MLP). The pursuer team uses augmented proportional navigation, as it is accepted as an advanced adversarial control law. We also provide an example with a heterogeneous environment where a drone and an omni-wheeled rover cooperate to achieve repeatable and successful landing, with a 100% success rate in hardware for MA-AC-MPC compared to 60% for MA-AC-MLP. We demonstrate the robustness of the proposed MA-AC-MPC algorithm in hardware for both environments.

2606.05994 2026-06-05 cs.LG eess.SP 版本更新

HoT-SSM: Higher-order Temporal Knowledge Graph Reasoning with State Space Models for Health Care

HoT-SSM:用于医疗保健的高阶时序知识图谱推理与状态空间模型

Thummaluru Siddartha Reddy, Vempalli Naga Sai Saketh, Yash Punjabi, Mahesh Chandran

发表机构 * Fujitsu Research of India, Bangalore(印度班加罗尔 Fujitsu 研究院)

AI总结 提出HoT-SSM模型,通过构建超图捕获高阶临床交互,并利用动态超图状态空间模型建模长程时序依赖,在MIMIC-III/IV数据集上显著提升临床预测性能。

Comments Paper under review

详情
AI中文摘要

融合临床知识的医学知识图谱(MKGs)越来越多地被用于建模电子健康记录(EHRs),以支持医疗领域的可解释预测。然而,现有的基于MKG的方法仅限于捕获临床概念(如病情、手术和药物)之间的成对关系,这限制了其建模共现或语义相关概念间高阶交互的能力。此外,大多数利用MKG的表示学习方法要么跨就诊折叠时间信息,要么缺乏显式建模长程时序依赖的机制,而这对于死亡率预测等临床任务至关重要。为缓解这些局限,我们提出HoT-SSM,一种参数高效的高阶时序图推理方法,结合状态空间模型。对于每次就诊,HoT-SSM通过利用领域知识将语义相关的临床概念分组为超边来构建超图,从而保留就诊级别的临床上下文。此外,为在学习表示的同时建模时序动态,我们引入一种新颖的基于动态超图的状态空间模型,显式捕获患者潜在状态随时间的演变,同时保留长程信息。学到的表示用于下游临床预测和推理。在MIMIC-III和MIMIC-IV数据集上的实验表明,性能显著优于当前最先进模型,证明了联合建模高阶临床交互和长程时序依赖的有效性。

英文摘要

Medical knowledge graphs (MKGs) infused with clinical knowledge have been increasingly used to model electronic health records (EHRs) to support interpretable predictions in the healthcare domain. However, existing MKG-based approaches are limited to capturing pairwise relations between clinical concepts (e.g., conditions, procedures, and medications), which restricts their ability to model higher-order interactions among co-occurring or semantically related concepts. In addition, most representation learning methods that leverage MKGs either collapse temporal information across visits or lack an explicit mechanism for modeling long-range temporal dependencies, which is critical for clinical tasks such as mortality prediction. To mitigate these limitations, we propose HoT-SSM, a parameter-efficient, higher-order temporal graph reasoning approach with state space models. For each visit, HoT-SSM constructs hypergraphs by grouping semantically related clinical concepts into hyperedges using domain knowledge, thereby preserving visit-level clinical context. Further, to model the temporal dynamics while learning the representations, we introduce a novel dynamic hypergraph-based state space model that explicitly captures patients' latent state evolution over time while preserving long-range information. The learned representations are used for downstream clinical prediction and reasoning. Experiments on the MIMIC-III and MIMIC-IV datasets show significant performance improvement over the current state-of-the-art models, demonstrating the effectiveness of jointly modeling higher-order clinical interactions and long-range temporal dependencies.

2606.05988 2026-06-05 cs.LG cs.CL 版本更新

Compress-Distill: Reasoning Trace Compression for Efficient Knowledge Distillation

压缩-蒸馏:面向高效知识蒸馏的推理轨迹压缩

Maxime Griot, Paul Steven Scotti, Tanishq Mathew Abraham

发表机构 * Université catholique de Louvain(鲁汶天主教大学) Sophont Inc(Sophont公司)

AI总结 本文提出在知识蒸馏前对推理轨迹进行事后压缩,以降低训练成本并缩短推理输出,实验表明压缩在准确率与效率间存在权衡。

详情
AI中文摘要

推理模型产生长的思维链轨迹,这些轨迹蒸馏成本高且鼓励学生输出冗长内容。我们研究在知识蒸馏前对这些轨迹进行事后压缩。两个教师模型,Qwen3.5-397B-A17B 和 gpt-oss-120B,各生成约 283k 条正确轨迹;两个指令调优模型将其压缩至原始字符长度的 8.6-21.0%。在包含 48 次运行的主网格和七次 Qwen 教师截断消融实验中,压缩轨迹将训练 token 减少至原始的 12-30%,训练速度提升 2.0-7.6 倍,推理输出缩短 3-19 倍,在轨迹更短的 gpt-oss 教师下减少幅度较小。然而,原始轨迹在每个规模下和两位教师上都保持最高的下游准确率。一项长度匹配的原始轨迹截断消融实验表明,压缩并非仅仅受益于更小的 token 预算:模型压缩的轨迹通常优于或匹配朴素截断,尤其是对于较小的学生模型,同时保持更短的推理输出。总体而言,推理轨迹压缩提供了准确率与效率之间的权衡,而非免费改进:学生模型保留了原始轨迹高达 96% 的准确率,同时获得了高达 18 倍的每 token 效率提升;在 0.8B 规模下使用 LoRA 时,压缩轨迹缩小了原始与压缩之间的差距,但未超过原始轨迹。

英文摘要

Reasoning models produce long chain-of-thought traces that are costly to distill and encourage verbose student outputs. We study post-hoc compression of such traces before knowledge distillation. Two teachers, Qwen3.5-397B-A17B and gpt-oss-120B, generate about 283k correct traces each; two instruction-tuned models then compress them to 8.6-21.0% of their original character length. Across a 48-run main grid plus seven Qwen-teacher truncation ablations, compressed traces reduce training tokens to 12-30% of raw, speed up training by 2.0-7.6x, and shorten inference outputs by 3-19x, with smaller reductions under the shorter gpt-oss teacher. However, raw traces retain the highest downstream accuracy at every scale and for both teachers. A length-matched raw-trace truncation ablation shows that compression is not merely benefiting from a smaller token budget: model-compressed traces usually beat or match naive truncation, especially for smaller students, while maintaining shorter inference outputs. Overall, reasoning-trace compression offers an accuracy-efficiency trade-off rather than a free improvement: students retain up to 96% of raw-trace accuracy while gaining up to 18x higher per-token efficiency, and at the 0.8B scale under LoRA, compressed traces narrow the raw-vs-compressed gap but do not exceed raw.

2606.05981 2026-06-05 cs.CV cs.LG 版本更新

Video-Rate Streaming Stylization on a Vision-Aware MLLM-Conditioned Edit Diffusion: Asymmetric Batched Inference on a Distilled UNet + MLLM Text Encoder

基于视觉感知的多模态大语言模型条件编辑扩散的视频率流式风格化:蒸馏UNet + MLLM文本编码器上的非对称批处理推理

Yoshiyuki Ootani

发表机构 * Independent researcher(独立研究员)

AI总结 针对蒸馏扩散模型中文本编码器成为瓶颈的问题,提出一种结合非对称CUDA流水线、编译友好的ControlNet-LLLite重构和周期性条件刷新调度的流式管线,在消费级GPU上实现视频率实时风格化编辑。

Comments 12 pages, 4 figures, 12 tables. Under review at IEEE Transactions on Circuits and Systems for Video Technology. Code, evaluation harness, and the released v3 Temporal LLLite adapter weights are at https://github.com/otanl/dreamlite-stream (also mirrored to Hugging Face and Zenodo)

详情
AI中文摘要

扩散U-Net的激进蒸馏反转了实时文本到图像流水线的逐帧瓶颈:一旦去噪器成为4步或1步蒸馏的学生模型,文本编码器就成为关键路径。这种反转在视觉感知编辑扩散中最为严重,其中编码器是多模态大语言模型(MLLM)。我们研究了一个0.39B蒸馏编辑U-Net与2.13B MLLM文本编码器(Qwen3-VL)配对的情况,并提出了一种针对该场景的流式管线,该管线围绕三种工程机制构建:非对称侧流/主流CUDA流水线,带有批处理文本编码器摊销(以及可选的静态提示缓存);一种编译友好的ControlNet-LLLite重构,将整个U-Net +适配器堆栈折叠成单个融合图;以及一个带有钩子子集的周期性条件刷新调度,用于摊销每帧条件成本。在单个消费级RTX 3090 Ti上,512x512分辨率下,管线在批大小B=8时维持27.4 fps,B=16时维持29.6 fps,端到端p50延迟分别约为0.5和1.0秒;相同操作点在RTX 4090上测得54.9 fps,在RTX 5090上测得74.1 fps。我们报告的是视频率流式吞吐量而非交互式低延迟,并将我们的数据与相同堆栈的StreamDiffusion重运行进行对比,作为系统上下文,而非基准优越性声明。对于训练的油画风格,发布的时序适配器在剪辑内噪声中泛化到19个未使用的DAVIS-2017序列和来自七个来源的15个非DAVIS剪辑;对未见风格族的提示级泛化有限,并单独报告。

英文摘要

Aggressive distillation of the diffusion U-Net inverts the per-frame bottleneck of real-time text-to-image pipelines: once the denoiser is a 4-step or 1-step distilled student, the text encoder becomes the critical path. This inversion is most acute in vision-aware edit diffusion, where the encoder is a multimodal large language model (MLLM). We study the case of a 0.39B distilled edit U-Net paired with a 2.13B MLLM text encoder (Qwen3-VL) and present a streaming pipeline targeted at this regime built around three engineering mechanisms: asymmetric side-stream / main-stream CUDA pipelining with batched text-encoder amortisation (and optional static-prompt caching), a compile-friendly ControlNet-LLLite reformulation that folds the entire U-Net + adapter stack into a single fused graph, and a periodic conditioning-refresh schedule with a hook subset that amortises the per-frame conditioning cost. On a single consumer RTX 3090 Ti at 512x512 the pipeline sustains 27.4 fps over a 480-frame run at batch size B=8 and 29.6 fps at B=16, with end-to-end p50 latency of approximately 0.5 and 1.0 seconds respectively; the same operating point measures 54.9 fps on RTX 4090 and 74.1 fps on RTX 5090. We report video-rate streaming throughput rather than interactive low latency, and locate our numbers against same-stack StreamDiffusion re-runs as systems context, not as a benchmark superiority claim. For the trained oil-painting style, the released temporal adapter generalises within in-clip noise to 19 unused DAVIS-2017 sequences and 15 non-DAVIS clips from seven sources; prompt-level generalisation to unseen style families is bounded and reported separately.

2606.05972 2026-06-05 cs.LG 版本更新

LLM Explainability with Counterfactual Chains and Causal Graphs

基于反事实链和因果图的LLM可解释性

Nirit Nussbaum-Hoffer, Nitay Calderon, Liat Ein-Dor, Roi Reichart

发表机构 * Faculty of Data and Decision Sciences, Technion(以色列理工学院数据与决策科学学院) IBM Research(IBM研究院)

AI总结 提出一种四阶段方法,利用因果图建模LLM推理过程,通过MCMC启发的反事实增强发现类判别性概念并生成可解释图,用于疾病诊断、情感分析等任务。

详情
AI中文摘要

因果图为使机制透明提供了高级语言。近期工作使用大型语言模型(LLMs)恢复外部世界过程的因果图。相反,在本文中,我们使用因果图对LLM推理本身进行建模,为利益相关者提供模型如何感知和组织高层概念以产生预测的透明视图。我们提出了一种四阶段方法来构建此类图。给定目标LLM和一组文本示例,我们的方法发现类判别性、人类可解释的概念,并将每个输入映射到LLM感知的概念状态。然后,我们引入一种受MCMC启发的反事实增强过程,通过反事实链扩展稀疏的观测数据。这使得使用$σ$-CG进行稳定的因果发现成为可能,从而产生信息丰富且可解释的图。我们将我们的方法应用于三个LLM,涵盖疾病诊断、情感分析和LLM作为评判者的分类任务。我们评估了学习到的图的预测保真度和结构稳定性,以及受MCMC启发的增强的收敛性和下游效用。我们的结果表明,发现的因果图捕获了与LLM推理一致的有意义的依赖关系。总之,本文为LLM的概念级可解释性提供了基础。

英文摘要

Causal graphs provide a high-level language for making mechanisms transparent. Recent work uses Large Language Models (LLMs) to recover causal graphs of external-world processes. Instead, in this paper, we use causal graphs to model LLM inference itself, providing stakeholders with a transparent view of how the model perceives and organizes high-level concepts to produce a prediction. We propose a four-phase method for constructing such graphs. Given a target LLM and a set of textual examples, our method discovers class-discriminative, human-interpretable concepts and maps each input to LLM-perceived concept states. We then introduce an MCMC-inspired counterfactual augmentation procedure that expands the sparse observational data through chains of counterfactuals. This enables stable causal discovery with $σ$-CG, yielding informative, interpretable graphs. We apply our method to three LLMs across disease diagnosis, sentiment analysis, and LLM-as-a-judge classification tasks. We evaluate the learned graphs for predictive fidelity and structural stability, and the MCMC-inspired augmentation for convergence and downstream utility. Our results show that the discovered causal graphs capture meaningful dependencies consistent with LLMs' reasoning. Together, this paper provides a foundation for concept-level explainability of LLMs.

2606.05970 2026-06-05 cs.CL cs.AI cs.LG 版本更新

Measuring the sensitivity of LLM-based structured extraction to prompt, model, and schema choices in clinical discharge summaries

测量基于LLM的结构化提取对临床出院小结中提示、模型和模式选择的敏感性

Martin Murin

发表机构 * DryLabz GmbH(DryLabz公司)

AI总结 本研究通过固定提取任务并逐一改变提示、模型和模式选择,测量了大型语言模型在临床文本结构化提取中输出对上游配置的敏感性,发现模式选择导致的差异集中在缺失与沉默的区分上,而模型选择在多类分类中主导提示措辞。

Comments 69 pages, 5 main figures, supplementary material included

详情
AI中文摘要

大型语言模型越来越多地用于从临床自由文本笔记中进行结构化提取,但其输出对上游配置选择的敏感性比在固定基准上的准确性更少被理解。本文通过固定提取任务并逐一改变一个选择,在没有人工标注真实值的情况下测量了这种敏感性。固定模式包括17个临床文档标志(三值:是/否/未记录)和47个标签词汇(用于主要入院原因)。表达该模式的三种提示变体分别在两个模型大小上对MIMIC-IV v3.1出院小结运行。跨提示一致性通过Cohen's kappa在ICD分层子集上测量。配对相同笔记比较隔离了模型选择的影响,事后将三值标志折叠为二值测试了模式对不一致的贡献。在三值标志上,两个模型达到相同的合并跨提示一致性(中位数kappa 0.69和0.68);较大的模型提高了某些字段的一致性并降低了其他字段的一致性,这是一种重新分布而非无效果。将模式折叠为二值消除了大部分跨提示不一致,将其定位在缺失与沉默的区分上,而非发现是否存在。在多类入院分类上,改变模型会重新分配近一半笔记的主导标签,而改变提示措辞则重新分配约八分之一的笔记,并且较大的模型在残余的通用类别上分配的权重少得多(44%到26%)。这些模式表明,模式施加的不一致集中在缺失与沉默轴上,而模型在多类分类上主导提示措辞,这是通过一种可重复的方法在人群规模部署中审计提取可重复性而识别的。

英文摘要

Large language models are increasingly used for structured extraction from clinical free-text notes, but the sensitivity of their output to upstream configuration choices is less understood than their accuracy on fixed benchmarks. This work measures that sensitivity without human-annotated ground truth, by holding the extraction task fixed and varying one choice at a time. The fixed schema comprises 17 clinical documentation flags on a three-way yes/no/not_documented value set and a 47-tag vocabulary for the primary admission reason. Three prompt variants expressing this schema were each run at two model sizes on MIMIC-IV v3.1 discharge summaries. Cross-prompt agreement was measured by Cohen's kappa on ICD-stratified subsets. A paired same-note comparison isolated the effect of model choice, and a post-hoc collapse of the three-way flags to binary tested the schema's contribution to disagreement. On the three-way flags, the two models reach the same pooled cross-prompt agreement (median kappa 0.69 and 0.68); the larger model raises agreement on some fields and lowers it on others, a redistribution rather than the absence of an effect. Collapsing the schema to binary dissolves most of the cross-prompt disagreement, locating it on the absence-versus-silence distinction rather than on whether the finding is present. On the multi-class admission categorization, changing the model reassigns the dominant tag on close to half of all notes while changing the prompt phrasing reassigns it on roughly one in eight, and the larger model places far less mass on residual catch-all categories (44% to 26%). These patterns indicate a schema-imposed source of disagreement concentrated on the absence-versus-silence axis and a dominance of model over prompt phrasing on multi-class categorization, identified by a reusable methodology for auditing extraction reproducibility on a population-scale deployment.
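
The agreement measurement described above can be illustrated with a small sketch. Cohen's kappa is the standard chance-corrected agreement statistic; everything else here is an assumption made for illustration: the function names, the toy labels, and in particular the collapse rule that merges `no` and `not_documented` into one class (the abstract says the three-way flags were collapsed to binary but does not spell out the mapping).

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two label sequences over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from the two marginals.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # degenerate case: both sequences use one identical label
    return (p_observed - p_expected) / (1.0 - p_expected)


def collapse_to_binary(labels):
    """Assumed collapse: 'no' and 'not_documented' become one class."""
    return ["yes" if label == "yes" else "not_yes" for label in labels]


# Toy outputs of two hypothetical prompt variants on six notes.
prompt_1 = ["yes", "no", "not_documented", "yes", "no", "not_documented"]
prompt_2 = ["yes", "not_documented", "no", "yes", "no", "not_documented"]

kappa_three_way = cohens_kappa(prompt_1, prompt_2)  # 0.5
kappa_binary = cohens_kappa(
    collapse_to_binary(prompt_1), collapse_to_binary(prompt_2)
)  # 1.0
```

In this toy data the two prompts disagree only on whether a finding is `no` or `not_documented`, so the disagreement vanishes after the collapse. That is the pattern the abstract attributes to the absence-versus-silence distinction.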

2606.05958 2026-06-05 cs.LG 版本更新

Steering Vectors are an Adversarial Attack Surface

转向向量是一种对抗攻击面

Abzal Aidakhmetov, Donato Crisostomi, Tommaso Mencattini, Adrian Robert Minut, Iacopo Masi, Emanuele Rodolà

发表机构 * Sapienza University of Rome(罗马萨皮恩扎大学) EPFL(洛桑联邦理工学院)

AI总结 本文揭示了一种隐蔽的数据投毒攻击,通过替换转向数据集中的4-6%令牌,使转向向量与反拒绝方向对齐,从而对目标模型实现越狱,同时保留对良性提示的预期转向效果。

详情
AI中文摘要

激活转向已成为一种无需微调即可控制大型语言模型(LLM)行为的流行方法。由于该技术即插即用,用户共享数据集和预计算向量以转向模型激活。然而,我们展示了一种隐蔽的数据投毒攻击可以悄无声息地破坏这一流程。通过替换转向数据集中的4-6%令牌,攻击者可以使结果向量与反拒绝方向对齐。这会使目标模型被越狱,同时保留对良性提示的预期转向效果。在此威胁模型下,恶意行为者可以分发一个看似安全的包,包含文本、向量和权重,以及一个终端用户可以验证的等价证书。我们在两个开放权重模型系列和八个模型-属性组合上测试了该攻击,观察到中毒向量的绝对攻击成功率(ASR)达到20-55%,比干净参考高出19%到51%。最后,我们发现一种拒绝方向正交化防御可以恢复约82%的ASR差距,而不损害良性行为。

英文摘要

Activation steering has become a popular way to control Large Language Model (LLM) behavior without fine-tuning. Since the technique is plug-and-play, users share datasets and precomputed vectors to steer model activations. However, we show that a stealth data poisoning attack silently compromises this pipeline. By substituting 4-6% of tokens in the steering dataset, an attacker can silently align the resulting vector with an anti-refusal direction. This jailbreaks the target model while preserving the intended steering effect on benign prompts. Under this threat model, a malicious actor can distribute an apparently safe bundle containing texts, vectors, and weights, alongside an equivalence certificate that the end-user can verify. We test the attack on two open-weight model families and eight model-attribute combinations, observing that poisoned vectors reach an absolute attack success rate (ASR) of 20-55%, +19% to +51% over a clean reference. Finally, we find that a refusal-direction orthogonalization defense can recover approximately 82% of the ASR gap without harming benign behavior.
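
To make the defense concrete, here is a minimal sketch of what "refusal-direction orthogonalization" could look like on plain vectors. It rests on assumptions the abstract does not confirm: that the steering vector is a difference of mean activations between two prompt sets, and that the defense removes the vector's component along a known refusal direction by orthogonal projection. The function names and the toy numbers are hypothetical.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equally sized activation vectors."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def steering_vector(positive_acts, negative_acts):
    """Difference-of-means steering vector (an assumed, commonly used form)."""
    mean_pos = mean_vector(positive_acts)
    mean_neg = mean_vector(negative_acts)
    return [p - q for p, q in zip(mean_pos, mean_neg)]


def orthogonalize(vector, direction):
    """Remove from `vector` its component along `direction`."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    unit = [d / norm for d in direction]
    coefficient = sum(v * u for v, u in zip(vector, unit))
    return [v - coefficient * u for v, u in zip(vector, unit)]


# Toy 2-D example: the second coordinate plays the role of the refusal axis.
vector = steering_vector([[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [2.0, 2.0]])
cleaned = orthogonalize(vector, [0.0, 2.0])
```

After the projection the cleaned vector has zero inner product with the refusal direction, so adding it to the activations cannot move the model along that axis, whatever the poisoned dataset contributed there. Components along other directions are left untouched, which is consistent with the abstract's claim that benign steering is preserved.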

2606.05957 2026-06-05 cs.LG stat.ML 版本更新

Dead Directions: Geometric Singular Learning

死方向:几何奇异学习

Tejas Pradeep Shirodkar

发表机构 * IIIT, Hyderabad(Hyderabad 二十一世纪信息技术研究所)

AI总结 本文通过引入“死方向”概念,桥接奇异学习理论与信息几何,提出在原始参数坐标下从Fisher曲率衰减率恢复KL阶数的方法,并扩展到深度网络,实现无需后验采样的Watanabe三元组(λ, m, ν)轨迹率读出。

Comments 139 pages, 13 figures, 13 tables

详情
AI中文摘要

奇异学习理论和信息几何研究了相同的参数空间,但使用了大体不同的词汇:前者在奇点消解后的坐标中计算贝叶斯不变量,后者在非退化假设下使用原始坐标,而过参数化模型经常违反该假设。我们通过一个基本概念(死方向)将它们桥接起来:死方向是沿着Fisher度量退化的单位向量,等价于具有确定KL阶数的解析奇异集的切向量,KL阶数由KL散度消失的速度决定。两种解读指的是同一向量;我们的核心操作表明,其KL阶数可作为方向Fisher曲率趋近奇异点的衰减率恢复,在原始参数坐标中无需Hironaka奇点消解。光滑纤维上的选择规则将该速率转化为Watanabe的单方向对实对数规范阈值的贡献,我们将恢复扩展到多分量交叉、重数$m$、奇异波动$ν$(对于一维方向,在KL阶数上具有普适性)、先验RLCT偏移以及回火后验。然后我们将该速率提升到深度网络:多层K-FAC分解将每个Fisher块写为激活侧和梯度侧速率的乘积,两者之间存在对偶性,并在现代网络原语(残差流、层归一化、注意力)中实例化。商定理将该速率传递到在$G$不变度量下梯度流的规范商$Θ/G$;SGD符合条件,标准Adam不符合,我们构造了一个$G$等变Adam族预条件器(DDCAdam)使其符合。该桥接提供了对奇异几何的参数坐标处理、每个架构的闭式预测,以及从一个检查点的前向和后向传播中无需后验采样的Watanabe三元组$(λ, m, ν)$轨迹率读出。

英文摘要

Singular learning theory and information geometry have studied the same parameter spaces in mostly separate vocabularies: the former computes Bayesian invariants in resolved coordinates, the latter works in original coordinates under a non-degeneracy assumption that overparameterised models routinely violate. We bridge them through one primitive, the dead direction: a unit vector along which the Fisher metric degenerates, equivalently a tangent to the analytic singular set with a definite KL order, set by how fast the KL divergence vanishes. The two readings name the same vector; our central move shows its KL order is recoverable as the decay rate of the directional Fisher curvature approaching the singularity, in original parameter coordinates and without a Hironaka resolution. A selection rule on smooth fibres translates this rate into Watanabe's single-direction contribution to the real log canonical threshold, and we extend the recovery to multi-component crossings, multiplicity $m$, the singular fluctuation $ν$ (universal in the KL order for 1D directions), prior-RLCT shifts, and tempered posteriors. We then lift this rate to a deep network: a multi-layer K-FAC factorisation writes each Fisher block as a product of activation- and gradient-side rates with a duality between them, instantiated at modern-network primitives (residual streams, layer normalisation, attention). A quotient theorem carries the rate to the gauge quotient $Θ/G$ under gradient flow on a $G$-invariant metric; SGD qualifies, standard Adam does not, and we construct a $G$-equivariant Adam-family preconditioner (DDCAdam) that does. The bridge yields a parameter-coordinate handle on singular geometry, closed-form per-architecture predictions, and a trajectory-rate readout of Watanabe's triple $(λ, m, ν)$ from one checkpoint's forward and backward passes, without posterior sampling.

2606.05946 2026-06-05 cs.LG 版本更新

Short paper: Models in the dark -- Rectification and erasure under GDPR in ML supply chains

短论文:黑暗中的模型——机器学习供应链中GDPR下的更正与删除

Henrik Graßhoff, Malte Hansen, Meiko Jensen, Sara Ramezanian

发表机构 * Karlstad University(卡尔斯塔德大学)

AI总结 本文从跨学科视角调查机器学习供应链中实现GDPR更正权和删除权的挑战,提出“黑暗中的模型”概念,并分析其带来的紧迫问题。

Comments accepted for presentation at Annual Privacy Forum 2026

详情
AI中文摘要

根据《通用数据保护条例》(GDPR)确立的更正权和删除权对于保护个人隐私至关重要。然而,它们在机器学习(ML)系统中的有效执行仍然具有挑战性。现有工作大多从法律或技术角度孤立地处理这些权利,而忽视了模型是在涉及开发、分发和部署等多个参与者的复杂供应链中产生的。本文对ML模型中实现更正权和删除权的挑战进行了全面调查。基于学术文献和数据保护机构的指导,我们发现许多GDPR要求在技术上尚无法在实践中满足。我们的发现进一步表明,ML供应链中出现的问题在研究中的关注不足。为了解决这一差距,我们引入了“黑暗中的模型”的概念——即在ML链下游创建的衍生模型,缺乏足够的透明度和可追溯性——并分析了这一现象带来的紧迫挑战。通过采用跨学科视角,这项工作有助于弥合法律要求与ML中数据主体权利技术实施之间的差距,最终支持可信人工智能的发展。

英文摘要

The rights to rectification and erasure, as established under the General Data Protection Regulation (GDPR), are central to protecting individuals' privacy. However, their effective enforcement in machine learning (ML) systems remains challenging. Existing work has largely addressed these rights from either a legal or a technical perspective in isolation and disregards the fact that models are produced in complex supply chains involving multiple actors across development, distribution, and deployment. This paper presents a holistic survey of challenges in implementing the rights to rectification and erasure in ML models. Drawing on academic literature and guidance from data protection authorities, we find that many GDPR requirements cannot yet be technically met in practice. Our findings further suggest that issues arising in ML supply chains are insufficiently addressed in research. To tackle this gap, we introduce the notion of models in the dark -- derived models created further downstream in an ML chain without sufficient transparency or traceability -- and analyse the urgent challenges posed by this phenomenon. By adopting an interdisciplinary perspective, this work contributes to bridging the gap between legal requirements and the technical implementation of data subject rights in ML, ultimately supporting the development of trustworthy artificial intelligence.

2606.05942 2026-06-05 stat.ML cs.LG 版本更新

EML-CD: Causal Mechanism Recovery via EML Symbolic Trees in Structure Learning

EML-CD:通过结构学习中的EML符号树进行因果机制恢复

Sota Asanuma

发表机构 * SoftBank Corp(软银公司)

AI总结 提出EML-CD框架,利用EML算子构建可解释的因果机制符号树,从数据中自动发现闭式因果方程,在真实和合成数据上实现了机制恢复与结构学习的平衡。

详情
AI中文摘要

基于神经网络(NN)的非线性因果发现方法能够恢复DAG结构,但将每个因果机制视为黑箱。Waxman等人认为从NN权重中提取因果机制是不适定的。我们提出EML-CD,一个将EML算子(能够从单个二元运算符组合初等函数)集成到因果结构学习中的框架,以可解释的机制恢复为主要目标。EML-CD将每条边机制表示为门控EML二叉树,并自动发现闭式因果方程。解析雅可比矩阵可直接从输出方程计算,从而定量理解因果效应。在真实数据(Sachs蛋白信号,d=11)上,EML-CD达到SHD=11.2±0.4(5次种子均值;基线为单次确定性运行),与PC/GES在种子方差内相当且低于CAM,同时为每条检测到的边附加闭式方程(精确率0.756,召回率0.365)。在已知机制的受控双变量测试中,EML-CD忠实恢复了11个初等函数族中的10个(留出形状相关性≥0.96;仅高频正弦部分恢复)。在符号合成基准上,EML-CD的留出机制f-MSE远低于固定SINDy字典且更稳定(均值3.67对比7644,后者因一次种子的灾难性外推而膨胀),尽管其结构恢复(SHD 14.0)仅与字典相当且低于专用优化器;在Causal Chambers光隧道子集上,深度2模型将F1分数从线性OLS-BIC的0.273提升至0.444。

英文摘要

Neural network (NN)-based nonlinear causal discovery methods recover DAG structure but leave each causal mechanism as a black box. Waxman et al. argued that extracting causal mechanisms from NN weights is ill-posed. We propose EML-CD, a framework that integrates the EML operator (capable of composing elementary functions from a single binary operator) into causal structure learning, with interpretable mechanism recovery as the primary objective. EML-CD represents each edge mechanism as a gated EML binary tree and automatically discovers closed-form causal equations. Analytical Jacobians can be directly computed from the output equations, enabling quantitative understanding of causal effects. On real data (Sachs protein signaling, d=11), EML-CD achieves SHD=11.2 +/- 0.4 (5-seed mean; baselines are single deterministic runs), on par with PC/GES within seed variance and below CAM, while attaching closed-form equations to each detected edge (precision 0.756, recall 0.365). In a controlled bivariate test with known mechanisms, EML-CD recovers 10 of 11 elementary function families faithfully (held-out shape correlation >= 0.96; only high-frequency sine is partial). On a symbolic synthetic benchmark, EML-CD attains a substantially lower and more stable held-out mechanism f-MSE than a fixed SINDy dictionary (mean 3.67 vs. 7644, the latter inflated by catastrophic extrapolation on one seed), although its structure recovery (SHD 14.0) only matches the dictionary and stays below specialized optimizers; on the Causal Chambers light-tunnel subset, a depth-2 model improves F1 over linear OLS-BIC (0.444 vs. 0.273).

2606.05931 2026-06-05 cs.CL cs.AI cs.CV cs.IR cs.LG cs.MM eess.AS 版本更新

To Be Multimodal or Not to Be: Query-Adaptive Audio-Visual Person Retrieval via Active Modality Detection

多模态还是非多模态:通过主动模态检测的查询自适应音视频人物检索

Erfan Loweimi, Mengjie Qian, Kate Knill, Guanfeng Wu, Chi-Ho Chan, Abbas Haider, Muhammad Awan, Josef Kittler, Hui Wang, Mark Gales

发表机构 * University of Cambridge(剑桥大学) Queen's University Belfast(贝尔法斯特女王大学) University of Surrey(萨里大学) Cisco(思科) Southwest Jiaotong University(西南交通大学) Teesside University(泰赛德大学)

AI总结 提出一种查询自适应框架,通过跨模态分数一致性检测主动模态,在BBC Rewind语料库上达到94.2%的P@1,优于单模态和固定融合方法。

Comments INTERSPEECH 2026

详情
AI中文摘要

当通过语音和面部从视频档案中检索一个人时,系统应该是多模态的吗?在实际的广播档案中,与精心策划的基准不同,目标可能只被听到但未被看到、只被看到但未被听到,或者两者兼有。融合来自缺失模态的分数会引入噪声,使精度低于最佳单模态系统。我们提出了一种查询自适应框架,通过跨模态分数一致性检测主动模态:当两种模态都活跃时,由一种模态检索的文件在另一种模态上也得分高;当一种模态缺失时,这种一致性被破坏。由这些跨模态特征驱动的分类器实现了89%的检测准确率。在BBC Rewind语料库(包含超过12,000个广播视频)上,自适应系统达到了94.2%的P@1,优于仅语音(82.9%)、仅面部(93.4%)和固定融合(90.0%),恢复了与具有真实模态标签的Oracle(96.6%)之间差距的64%。

英文摘要

When retrieving a person from a video archive by voice and face, should the system be multimodal or not? In real-world broadcast archives, unlike curated benchmarks, a target may be heard but unseen, seen but unheard, or both. Fusing scores from an absent modality injects noise, degrading precision below the best unimodal system. We propose a query-adaptive framework that detects active modalities via cross-modal score consistency: when both modalities are active, files retrieved by one also score highly on the other; this agreement breaks down when a modality is absent. Classifiers driven by these cross-modal features achieve 89% detection accuracy. On the BBC Rewind corpus (with over 12,000 broadcast videos) the adaptive system attains 94.2% P@1, outperforming speaker-only (82.9%), face-only (93.4%), and fixed fusion (90.0%), recovering 64% of the gap to an oracle with ground-truth modality labels (96.6%).
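The "64% of the gap" figure can be checked with a few lines of arithmetic. The sketch below assumes the gap is measured from the fixed-fusion baseline (90.0%) to the oracle (96.6%). The abstract does not name the baseline explicitly, but this reading reproduces the quoted figure, whereas measuring from the best unimodal system (face-only, 93.4%) gives 25%. The function name `gap_recovered` is illustrative, not taken from the paper.

```python
def gap_recovered(system: float, baseline: float, oracle: float) -> float:
    """Fraction of the baseline-to-oracle gap that `system` closes."""
    return (system - baseline) / (oracle - baseline)


# P@1 values (in %) quoted in the abstract.
adaptive, fixed_fusion, face_only, oracle = 94.2, 90.0, 93.4, 96.6

# Assumption: the gap is taken relative to fixed fusion.
vs_fusion = gap_recovered(adaptive, fixed_fusion, oracle)
# Alternative reading: the gap is taken relative to the best unimodal system.
vs_face = gap_recovered(adaptive, face_only, oracle)

print(f"relative to fixed fusion: {vs_fusion:.1%}")
print(f"relative to face-only:    {vs_face:.1%}")
```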

2606.05927 2026-06-05 cs.LG 版本更新

Addressing Imbalance in Multi-Label Data via Label-Specific Distance-based Oversampling

通过标签特定距离的过采样解决多标签数据中的不平衡问题

Bin Liu, Jun Wu, Haoyu Peng, Ao Zhou, Jin Wang, QiaoSong Chen, Grigorios Tsoumakas

Affiliations: Key Laboratory of Data Engineering and Visual Computing, Chongqing University of Posts and Telecommunications, China; School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, China; State Key Laboratory of Novel Software Technology, Nanjing University, China; School of Informatics, Aristotle University of Thessaloniki, Greece

AI summary: To address label imbalance in multi-label classification, proposes LSDMLO, a label-specific distance-based oversampling method that identifies label-consistent neighbors in a weighted pertinent feature space and generates more useful synthetic instances; experiments show it outperforms existing methods.

详情
AI中文摘要

复杂的非平衡标签分布对多标签分类构成了严峻挑战,因为大多数分类器偏向于多数类和高频标签。过采样是一种高效且灵活的解决方案,通过增加实例来为多标签分类器提供更平衡的训练数据集。现有的大多数过采样方法以启发式方式创建合成实例,本质上依赖于在整个特征空间中使用欧氏距离检索的邻域信息。然而,它们未能考虑特征对不同标签的不同语义相关性,导致邻近邻居之间的标签不一致,进而引入标签混淆和过拟合到合成实例。为了克服上述问题,我们提出了一种新颖的采样方法,称为基于标签特定距离的多标签过采样(LSDMLO),该方法创建更有用且标签正确的合成实例,以解决多标签数据集中的不平衡问题。LSDMLO基于加权相关特征空间推导标签特定距离,以识别标签一致的邻居,这有助于选择在边界区域表达更多标签相关性的种子实例,并生成与原始数据标签分布一致的合成实例。综合实验表明,所提出的LSDMLO在各种基分类器下均优于最先进的多标签采样方法。

英文摘要

The complex imbalanced label distribution poses a crucial challenge to multi-label classification, as most classifiers are biased towards the majority class and high-frequency labels. Oversampling is an efficient and flexible solution that augments instances to provide a more balanced training dataset for multi-label classifiers. Most existing oversampling methods create synthetic instances heuristically, relying essentially on neighborhood information retrieved using Euclidean distance over the entire feature space. However, they fail to consider the varying semantic relevance of features to different labels, which leads to label inconsistency among proximate neighbors and, in turn, to label confusion and overfitting to synthetic instances. To overcome this issue, we propose a novel sampling approach called Label-Specific Distance-based Multi-Label Oversampling (LSDMLO) that creates more useful and well-labeled synthetic instances to address the imbalance in multi-label datasets. LSDMLO derives a label-specific distance based on the weighted pertinent feature space to identify label-consistent neighbors, which facilitates selecting seed instances that express more label correlations in boundary areas and generating synthetic instances aligned with the label distribution of the original data. Comprehensive experiments verify that the proposed LSDMLO outperforms state-of-the-art multi-label sampling approaches under various base classifiers.

2606.05911 2026-06-05 cs.SD cs.LG eess.AS 版本更新

DBHN-Net: Dual-Branch Hybrid Neural Network For Low-Complexity Monaural Speech Enhancement

DBHN-Net: 低复杂度单声道语音增强的双分支混合神经网络

Cunhang Fan, Enrui Liu, Jing Zhou, Jian Kang, Jie Li, Andong Li, Jian Zhou, Zhao Lv, Xuelong Li

Affiliations: State Key Laboratory of Opto-Electronic Information Acquisition and Protection Technology (School of Computer Science and Technology), Anhui University; China Telecom Artificial Intelligence Technology (Beijing) Co., Ltd; Institute of Acoustics, University of Chinese Academy of Sciences; Institute of Artificial Intelligence (TeleAI), China Telecom, China

AI summary: Proposes a dual-branch hybrid neural network combining an ANN and an SNN. BandSplit and TF-Mamba modules lower computational complexity, while interaction and fusion modules preserve performance, giving an average 7.5-fold complexity reduction across three public datasets.

Comments This article has been accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence(TPAMI)

详情
Journal ref
IEEE Transactions on Pattern Analysis and Machine Intelligence(TPAMI2026)
AI中文摘要

尽管基于人工神经网络(ANN)的语音增强(SE)方法表现出色,但高计算复杂度和高能耗阻碍了它们在实际前端处理任务中的部署。目前,脉冲神经网络(SNN)在降低功耗方面显示出潜力。然而,SNN的离散二进制激活和复杂的时空动态常常导致信息丢失。因此,当前的挑战集中在如何保持性能并降低计算复杂度。为了解决这个问题,本文提出了一种双分支混合神经网络(DBHN)。1)在网络架构方面:设计了一个集成ANN和SNN的双分支网络,其中SNN分支降低功耗,而ANN分支解决信息丢失;开发了BandSplit和时频(TF)-Mamba模块,以同时压缩能耗和增强模型性能;实现了带有残差连接的脉冲特征提取组(SFEG)和信息转换块(ITB)组件,以减轻信息丢失,同时进一步细化特征表示。2)为了促进分支间的信息融合:设计了一个交互模块,以促进双分支网络各个阶段的信息交换;设计了一个TF交叉注意力融合模块,在数据自适应地引导SNN分支保留更多关键信息的同时,对双分支信息进行时频域融合。结果表明,所提出的模型在三个公共数据集上保持了优越的性能,同时与基线模型相比,计算复杂度平均降低了7.5倍。

英文摘要

Although artificial neural network (ANN) based speech enhancement (SE) methods demonstrate excellent performance, their high computational complexity and energy consumption hinder deployment in practical front-end processing tasks. Spiking neural networks (SNNs) have recently shown potential for reducing power consumption. However, the discrete binary activations and complex spatio-temporal dynamics of SNNs often result in information loss. The current challenge is therefore how to maintain performance while reducing computational complexity. To address this issue, this work proposes a Dual-Branch Hybrid Neural (DBHN) Network. 1) In terms of network architecture: a dual-branch network integrating an ANN and an SNN is designed, where the SNN branch reduces power consumption while the ANN branch addresses information loss; BandSplit and Time-Frequency (TF)-Mamba modules are developed to simultaneously reduce energy consumption and enhance model performance; Spiking Feature Extraction Group (SFEG) and Information Transformation Block (ITB) components are implemented with residual connections to mitigate information loss while further refining feature representations. 2) To facilitate inter-branch information fusion: an Interaction module is designed to promote information exchange at various stages of the dual-branch network; a TF-Cross Attention-Fusion module is designed to perform time-frequency domain fusion of dual-branch information while data-adaptively guiding the SNN branch to retain more critical information. Results show that the proposed model maintains superior performance across three public datasets while achieving an average 7.5-fold reduction in computational complexity compared to baseline models.

2606.05899 2026-06-05 cs.LG cond-mat.dis-nn 版本更新

High-Dimensional Theory of LoRA Fine-Tuning in a Solvable Attention Model

可解注意力模型中LoRA微调的高维理论

O. Duranthon, F. Boncoraglio, L. Zdeborová

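Affiliations: Statistical Physics of Computation Laboratory, École Polytechnique Fédérale de Lausanne (EPFL)

AI summary: Analyzes low-rank adaptation (LoRA) fine-tuning in an attention model using high-dimensional statistical theory, capturing the interplay between pre-training and fine-tuning and giving a sharp asymptotic characterization of test error and representation alignment.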

详情
AI中文摘要

我们发展了低秩适应(LoRA)在注意力模型中的高维统计理论,捕捉了预训练与微调之间的相互作用。我们引入了一个可解框架,其中单头注意力层首先在数据丰富的任务上进行预训练,随后通过秩一LoRA更新在有限数据上进行适应。在高维极限下,两个阶段都允许通过一组有限序参数进行尖锐的渐近刻画,从而为测试误差和表示对齐提供显式预测。我们的分析表明,预训练对LoRA的影响可以总结为一个有效噪声项,由此我们推导出最优预训练过程的处方。我们还展示了一个测试误差与表示质量不匹配的机制,并提出了我们的理论在主动微调中的应用。

英文摘要

We develop a high-dimensional statistical theory of low-rank adaptation (LoRA) in attention models, capturing the interplay between pre-training and fine-tuning. We introduce a solvable framework in which a single-head attention layer is first pre-trained on a data-abundant task and subsequently adapted via a rank-one LoRA update on limited data. In the high-dimensional limit, both stages admit a sharp asymptotic characterization in terms of a finite set of order parameters, yielding explicit predictions for test errors and representation alignment. Our analysis shows that the impact of pre-training on LoRA is summarized by an effective noise term, from which we derive prescriptions for the optimal pre-training procedure. We also demonstrate a regime with a mismatch between the value of the test error and representation quality, and propose an application of our theory to active fine-tuning.
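For readers unfamiliar with the setup, a rank-one LoRA update adds an outer product of two trainable vectors to a frozen weight matrix. The sketch below uses made-up numbers and the generic form W + u vᵀ; the names `u`, `v`, and `W_adapted` are illustrative, and the paper's exact parameterization and scaling may differ.

```python
def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def add(m1, m2):
    """Element-wise sum of two equally shaped matrices."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


# Frozen pre-trained weights (d_out = 2, d_in = 3); values are arbitrary.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]

# The only trainable parameters of a rank-one update.
u = [0.5, -1.0]       # length d_out
v = [1.0, 2.0, 0.0]   # length d_in

W_adapted = add(W, outer(u, v))

trainable = len(u) + len(v)      # d_out + d_in
full = len(W) * len(W[0])        # d_out * d_in
print(W_adapted, trainable, full)
```

The parameter saving is negligible at this toy size, but d_out + d_in grows linearly while d_out × d_in grows quadratically, which is why low-rank updates suit the limited-data regime the abstract describes.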

2606.05895 2026-06-05 cs.CL cs.LG 版本更新

Representing Research Attention as Contextually Structured Flows

将研究关注度表示为上下文结构化流

Jessica Rodrigues, Angelo Salatino, Gard Jenset, Scott Hale

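Affiliations: University of Oxford; The Open University; Springer Nature

AI summary: Proposes attention flows, contextually structured representations that encode how attention is organised and how it evolves over time. An analogy-style reasoning benchmark shows that flow representations support structural comparison more effectively and improve robustness under partial observation and structural perturbation.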

Comments Accepted at STi 2026 - International Conference on Science and Technology Indicators

详情
AI中文摘要

研究关注度被广泛用作可见性、影响和社会采纳的指标,但通常表示为聚合计数,无法保留注意力在上下文中随时间如何发展。这造成了注意力解释方式与其表示方式之间的不匹配。我们提出注意力流作为上下文结构化表示,编码注意力的组织及其随时间演化。我们通过构建基于研究产出间类比推理的基准,评估这些表示是否捕获可迁移结构。比较信号、序列和基于流的表示,我们发现流表示更有效地支持结构比较,特别是在注意力受时间进程或上下文分布影响的场景中。我们进一步表明,学习到的流表示在部分观测和结构扰动下提高了鲁棒性。总体而言,这些结果支持将注意力建模为上下文结构化现象,并为更具信息性的研究评估方法提供了基础。

英文摘要

Research attention is widely used as an indicator of visibility, influence, and societal uptake, yet it is typically represented as aggregated counts that do not preserve how attention develops across contexts over time. This creates a mismatch between how attention is interpreted and how it is represented. We propose attention flows as contextually structured representations that encode the organisation of attention and its evolution over time. We evaluate whether these representations capture transferable structure by constructing a benchmark based on analogy-style reasoning across research outputs. Comparing signal, sequence, and flow-based representations, we find that flow representations more effectively support structural comparison, particularly in settings where attention is shaped by temporal progression or context distributions. We further show that learned flow representations improve robustness under partial observation and structural perturbation. Overall, these results support modelling attention as a contextually structured phenomenon and provide a basis for more informative approaches to research evaluation.

2606.05885 2026-06-05 cs.LG 版本更新

When Denser Credit Is Not Enough: Evidence-Calibrated Policy Optimization for Long-Horizon LLM Agent Training

当更密集的信用不足时:面向长周期LLM智能体训练的基于证据校准的策略优化

Yuanfan Li, Qi Zhou, Wenjing Duan, Lu Chen

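Affiliations: X-LANCE Lab, School of Computer Science, Shanghai Jiao Tong University, Shanghai, China; Faculty of Electronic and Information Engineering, Xi’an Jiaotong University

AI summary: For credit assignment in long-horizon LLM agents under sparse, delayed rewards, proposes ECPO, a critic-free policy optimization algorithm that corrects the statistical unreliability of dense credit through evidence-calibrated action advantages and variance-gated credit weighting, improving over GiGPO on ALFWorld and WebShop.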

详情
AI中文摘要

长周期LLM智能体需要能够在稀疏和延迟奖励下为中间决策分配信用的强化学习方法。最近的基于分组的方法如GiGPO通过构建重复锚点状态下的步骤级优势来改进GRPO。然而,我们表明这种密集信用在统计上可能不可靠:在有限的轨迹采样下,罕见但幸运的动作可能获得过大的优势,产生发散锚点偏差和后期训练振荡。我们提出证据校准策略优化(ECPO),一种在策略更新前校准步骤级信用的无评论家策略优化算法。ECPO结合了证据校准动作优势(将轨迹按规范动作分组并收缩低计数估计)和方差门控信用加权(抑制由动作内噪声主导的锚点状态)。在ALFWorld和WebShop上使用Qwen2.5-1.5B/7B的实验表明,ECPO持续优于强基线,在Qwen2.5-1.5B上,ALFWorld/WebShop的成功点分别比GiGPO提高+5.2/+7.3,同时仅增加0.1%的额外优势计算开销。

英文摘要

Long-horizon LLM agents require reinforcement learning methods that can assign credit to intermediate decisions under sparse and delayed rewards. Recent group-based methods such as GiGPO improve over GRPO by constructing step-level advantages at repeated anchor states. However, we show that such dense credit can be statistically unreliable: under limited rollouts, rare but lucky actions may receive overly large advantages, producing divergent anchor bias and late-stage training oscillation. We propose Evidence-Calibrated Policy Optimization (ECPO), a critic-free policy optimization algorithm that calibrates step-level credit before policy updates. ECPO combines Evidence-Calibrated Action Advantage, which groups rollouts by canonical actions and shrinks low-count estimates, with Variance-Gated Credit Weighting, which suppresses anchor states dominated by within-action noise. Experiments on ALFWorld and WebShop with Qwen2.5-1.5B/7B show that ECPO consistently outperforms strong baselines, improving GiGPO by +5.2/+7.3 success points on ALFWorld/WebShop with Qwen2.5-1.5B while adding only 0.1% additional advantage-computation overhead.
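To make "shrinks low-count estimates" concrete, here is a minimal sketch of count-based shrinkage at a single anchor state. The shrinkage factor n / (n + k), the parameter name `prior_strength`, and the function name are assumptions chosen for illustration; the abstract does not specify ECPO's actual estimator.

```python
from collections import defaultdict


def shrunk_action_advantages(rollouts, prior_strength=2.0):
    """Per-action advantages at one anchor state, shrunk toward zero.

    rollouts: list of (action, return) pairs observed at the same state.
    An action seen n times keeps a fraction n / (n + prior_strength) of its
    raw advantage, so rarely sampled actions cannot dominate the update.
    This is an illustrative estimator, not the one defined in the paper.
    """
    baseline = sum(ret for _, ret in rollouts) / len(rollouts)

    returns_by_action = defaultdict(list)
    for action, ret in rollouts:
        returns_by_action[action].append(ret)

    advantages = {}
    for action, rets in returns_by_action.items():
        n = len(rets)
        raw = sum(rets) / n - baseline
        advantages[action] = (n / (n + prior_strength)) * raw
    return advantages


# One lucky rollout of action "a" against three failed rollouts of "b".
rollouts = [("a", 1.0), ("b", 0.0), ("b", 0.0), ("b", 0.0)]
adv = shrunk_action_advantages(rollouts)
print(adv)
```

In this toy case the raw advantage of the single lucky action is 0.75, and shrinkage cuts it to a third of that, which is the kind of damping the abstract attributes to evidence calibration.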

2606.05873 2026-06-05 cs.RO cs.AI cs.CV cs.LG 版本更新

LadderMan: Learning Humanoid Perceptive Ladder Climbing

LadderMan: 学习人形机器人感知爬梯

Siheng Zhao, Yuanhang Zhang, Ziqi Lu, Pieter Abbeel, Rocky Duan, Koushil Sreenath, Yue Wang, C. Karen Liu, Guanya Shi

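Affiliations: Amazon FAR; USC; UC Berkeley; Stanford University; CMU

AI summary: Presents LadderMan, a system that uses a two-stage learning pipeline and vision foundation models to let humanoid robots robustly climb diverse ladders and perform manipulation while on the ladder.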

详情
AI中文摘要

人形机器人在以人为中心的环境中具有巨大潜力,但由于稀疏的立足点和手抓点、复杂的全身协调以及对感知和控制误差的敏感性,爬梯仍然是最具挑战性的任务之一。我们提出了LadderMan,一个统一的系统,使人形机器人能够鲁棒地攀爬多种梯子并在这种受限条件下进行操控。我们的攀爬策略基于一个可扩展的两阶段学习管道,其中我们使用混合运动跟踪从单个参考运动学习多个攀爬专家,并通过混合模仿和强化学习将这些专家蒸馏成一个统一的基于深度视觉的运动攀爬策略。为了实现真实世界部署,我们利用视觉基础模型来弥合深度感知中的模拟到现实差距。基于学习到的攀爬策略,我们进一步使用双智能体公式训练一个独立的操控策略,允许通过遥操作在梯子上进行稳定操控。实验表明,LadderMan在多种几何形状的梯子上实现了鲁棒的攀爬,以零样本方式成功迁移到真实世界硬件,并在具有挑战性的梯子约束下支持各种操控任务。视频结果见https://ladderman-robot.github.io。

英文摘要

Humanoid robots hold great promise for operating in human-centered environments, yet ladder climbing remains one of the most challenging tasks due to sparse footholds and handholds, complex whole-body coordination, and sensitivity to perception and control errors. We present LadderMan, a unified system that enables humanoid robots to robustly climb diverse ladders and perform manipulation under such constrained conditions. Our climbing policy is built on a scalable two-stage learning pipeline, where we use hybrid motion tracking to learn multiple climbing experts from a single reference motion, and distill these experts into a unified depth-based visuomotor climbing policy via hybrid imitation and reinforcement learning. To enable real-world deployment, we leverage vision foundation models to bridge the sim-to-real gap in depth perception. Building on the learned climbing policy, we further train a separate manipulation policy using a dual-agent formulation, allowing stable on-ladder manipulation via teleoperation. Experiments demonstrate that LadderMan achieves robust ladder climbing across a wide range of geometries, successfully transfers to real-world hardware in a zero-shot manner, and supports various manipulation tasks under challenging ladder constraints. Video results are available at https://ladderman-robot.github.io.

2606.05870 2026-06-05 q-bio.NC cs.LG q-bio.QM 版本更新

Cross-scale spatially-aware generative modeling of transcriptomic programs underlying neurodegenerative brain organization

跨尺度空间感知生成模型揭示神经退行性脑组织下的转录组程序

Krishnakumar Vaithianathan

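Affiliations: Department of Computer Engineering, Karaikal Polytechnic College, Karaikal, Puducherry, India

AI summary: Proposes a cross-scale spatially-aware generative framework in which a variational generative architecture with graph-based spatial smoothness regularization learns latent biological programs linking regional gene expression to cortical degeneration, achieving an explained variance of 0.8604 and a spatial correlation of r = 0.9439.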

Comments 26 pages, 5 figures

详情
AI中文摘要

神经退行性疾病如阿尔茨海默病表现出高度有序的区域性脑脆弱性模式,但这种空间选择性的生物学机制仍不完全清楚。现有的成像-转录组研究主要依赖于基因表达与神经影像表型之间的相关性分析,限制了它们模拟分子组织如何导致神经退化的能力。在这里,我们引入了一个跨尺度空间感知生成框架,用于模拟皮质退化下的转录组程序。使用艾伦人脑图谱中910个标志基因在68个皮质区域的区域转录组图谱。通过计算认知正常对照(NC=926)和阿尔茨海默病受试者(AD=426)之间的皮质厚度差异,从ADNI FreeSurfer皮质厚度测量构建神经退行性脆弱性图谱。采用变分生成架构学习连接区域基因表达组织与皮质退化的潜在生物程序,同时结合基于图的空间平滑正则化以保持皮质组织。所提出的框架实现了对区域神经退行性脆弱性的强预测,解释方差为0.8604,预测与观察到的皮质退化图谱之间存在显著空间相关性(r=0.9439,p<0.001)。学习到的潜在表示揭示了与分布性疾病易感性相关的结构化转录组组织。这些发现表明,生物约束的生成建模可以桥接微观分子组织与宏观神经退化,为空间感知的生成神经生物学和计算神经科学奠定基础。

英文摘要

Neurodegenerative disorders such as Alzheimer's disease exhibit highly organized patterns of regional brain vulnerability, yet the biological mechanisms underlying this spatial selectivity remain incompletely understood. Existing imaging-transcriptomic studies have largely relied on correlation-based analyses between gene expression and neuroimaging phenotypes, limiting their ability to model how molecular organization gives rise to neurodegeneration. Here, we introduce a cross-scale spatially-aware generative framework for modeling transcriptomic programs underlying cortical neurodegeneration. Regional transcriptomic profiles were derived from the Allen Human Brain Atlas using 910 landmark genes across 68 cortical regions. Neurodegenerative vulnerability maps were constructed from ADNI FreeSurfer cortical thickness measurements by computing regional cortical thinning differences between cognitively normal controls (NC = 926) and Alzheimer's disease subjects (AD = 426). A variational generative architecture was used to learn latent biological programs linking regional gene-expression organization to cortical degeneration while incorporating graph-based spatial smoothness regularization to preserve cortical organization. The proposed framework achieved strong prediction of regional neurodegenerative vulnerability, yielding an explained variance of 0.8604 and a significant spatial correlation between predicted and observed cortical degeneration profiles (r = 0.9439, p < 0.001). The learned latent representations revealed structured transcriptomic organization associated with distributed disease susceptibility. These findings demonstrate that biologically constrained generative modeling can bridge microscale molecular organization with macroscale neurodegeneration, providing a foundation for spatially-aware generative neurobiology and computational neuroscience.

2606.05863 2026-06-05 cs.LG cs.AI 版本更新

Deciphering Two Training Clocks in Grokking via Deep Linear Network Theory with Conditional ReLU Reduction

通过深度线性网络理论与条件ReLU约简解读Grokking中的两个训练时钟

Hu Tan, Kuo Gai, Shihua Zhang

Affiliations: State Key Laboratory of Mathematical Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; Shanghai Institute for Mathematics and Interdisciplinary Sciences (SIMIS), Shanghai, China; Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China

AI summary: Formalizes grokking as "two training clocks" by separating the fast decay of the classification loss from the slower simplification of the learned representation, and explains this two-stage process using deep linear network theory and a conditional ReLU reduction.

详情
AI中文摘要

Grokking表明,拟合训练数据和学习简单底层规则可能发生在不同的时间尺度上。我们通过将分类损失的快速衰减与学习表示的较慢简化分离来形式化这一现象,并将由此产生的停止时间对称为两个训练时钟。对于深度线性网络,我们证明后边际间隙增长或一步尾部收缩条件在对数时间尺度上将交叉熵损失降低到ε水平。相反,当存在逐层权重衰减时,端到端映射上的诱导正则化可以表示为Schatten型惩罚;在尖锐的晚期Kurdyka-Lojasiewicz尾部下,这种结构能量在多项式时间尺度上闭合。因此,两个时钟将拟合与表示简化分开。然后我们解释相同机制如何在ReLU MLP中出现。在训练集上的激活模式保持固定的区域中,网络简化为活动坐标上的线性模型。在两层ReLU嵌入模型中,链式法则估计进一步表明,在受控的下游范数下,分类器头可以比嵌入块接收更大的有效梯度。这支持了一个两阶段机制:分类器先拟合,而表示随后继续简化。我们以模加法作为主要实验设置。深度线性理论提供了分析的核心严格基础。但ReLU结果被表述为条件约简,以解释经验行为,而不声称对非线性训练动态的全局证明。

英文摘要

Grokking suggests that fitting the training data and learning a simple underlying rule may occur on different time scales. We formalize this phenomenon by separating the fast decay of the classification loss from the slower simplification of the learned representation, and we call the resulting pair of stopping times two training clocks. For deep linear networks, we show that a post-margin gap-growth or one-step tail-contraction condition reduces the cross-entropy loss to level ε on a logarithmic time scale. In contrast, when layerwise weight decay is present, the induced regularization on the end-to-end map can be expressed as a Schatten-type penalty; under a sharp late-time Kurdyka-Lojasiewicz tail, this structural energy closes on a polynomial time scale. The two clocks therefore separate fitting from representation simplification. We then explain how the same mechanism can appear in ReLU MLPs. In regions where the activation patterns on the training set remain fixed, the network reduces to a linear model in the active coordinates. In a two-layer ReLU embedding model, chain-rule estimates further show that the classifier head can receive larger effective gradients than the embedding block under controlled downstream norms. This supports a two-stage mechanism in which the classifier fits first, while the representation continues to simplify later. We use modular addition as the main experimental setting. The deep linear theory provides the rigorous core of the analysis, whereas the ReLU results are formulated as conditional reductions that account for empirical behavior without claiming a global proof for nonlinear training dynamics.
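The conditional ReLU reduction rests on a standard identity: for a fixed activation pattern, relu(W1 x) equals D W1 x, where D is a 0/1 diagonal mask, so a two-layer network acts as the linear map W2 D W1 on that region. The sketch below checks this in plain Python with made-up weights; it illustrates the identity only and is not the paper's construction.

```python
def matvec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def relu(v):
    return [max(0.0, a) for a in v]


# Arbitrary toy weights: 2 inputs -> 3 hidden units -> 1 output.
W1 = [[1.0, -2.0], [0.5, 1.0], [-1.0, -1.0]]
W2 = [[2.0, -1.0, 3.0]]


def network(x):
    """Two-layer ReLU network W2 relu(W1 x)."""
    return matvec(W2, relu(matvec(W1, x)))


def activation_pattern(x):
    """0/1 mask recording which hidden units are active at x."""
    return [1.0 if p > 0 else 0.0 for p in matvec(W1, x)]


def linear_surrogate(x, mask):
    """Linear map W2 D W1 x, with D = diag(mask) held fixed."""
    gated = [m * p for m, p in zip(mask, matvec(W1, x))]
    return matvec(W2, gated)


x0 = [1.0, 1.0]
mask = activation_pattern(x0)   # pattern frozen at x0

x1 = [1.2, 0.9]   # same activation pattern as x0
x2 = [3.0, 1.0]   # a different pattern: the first hidden unit switches on

print(network(x1), linear_surrogate(x1, mask))
print(network(x2), linear_surrogate(x2, mask))
```

The two outputs agree at `x1`, which shares the pattern of `x0`, and disagree at `x2`, where the pattern changes. This is why the abstract states the ReLU results as conditional on the activation patterns staying fixed.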

2606.05817 2026-06-05 cs.LG cs.AI 版本更新

Consistency Training Along the Transformer Stack

沿Transformer堆栈的一致性训练

Sukrati Gautam, Neil Shah, Arav Dhoot, Bryan Maruyama, Caroline Wei, Rohan Kapoor, Robert Sidey, Prakhar Gupta, Zi Cheng Huang, David Demitri Africa

Affiliations: Purdue University; Independent; Columbia University; University of California, San Diego; University of California, Los Angeles; Dartmouth College; University of Michigan, Ann Arbor

AI summary: Extends consistency training with consistency targets on MLP states and attention distributions, applies it to several additional safety threats, and finds cross-threat generalization and a shared mechanism, supporting it as a flexible alignment framework.

Comments Submitted to EMNLP 2026

详情
AI中文摘要

一致性训练鼓励模型在不同上下文中表现相似,并已显示出减少对齐问题的潜力。我们以两种方式扩展一致性训练的范围。首先,我们引入两个新的内部一致性目标:MLP一致性训练(MLPCT),匹配激活后的MLP状态;以及注意力一致性训练(AttCT),匹配每个头的注意力分布。其次,我们将一致性训练应用于四种额外的安全威胁:角色上下文学习攻击、对抗性挫败、预填充攻击和条件性对齐错误。在多个模型和威胁设置中,我们发现一致性训练在减少对齐问题方面远优于先前工作中研究的谄媚和越狱设置。我们还发现了跨威胁泛化的案例,即针对一种失败模式的训练提高了对另一种模式的鲁棒性,并识别了ACT、MLPCT和AttCT共享的残差流机制,同时将BCT区分为机制上不同的方法。我们的结果表明,一致性训练是一个灵活且可扩展的对齐框架,能够统一防御更广泛的模型病理类别。

英文摘要

Consistency training encourages models to behave similarly across different contexts, and has shown promise for reducing misalignment. We broaden the scope of consistency training in two ways. First, we introduce two new internal consistency targets: MLP Consistency Training (MLPCT), which matches post-activation MLP states, and Attention Consistency Training (AttCT), which matches per-head attention distributions. Second, we apply consistency training to four additional safety threats: persona in-context learning attacks, adversarial frustration, prefill attacks, and conditional misalignment. Across several models and threat settings, we find that consistency training reduces misalignment well beyond the sycophancy and jailbreak settings studied in prior work. We also find cases of cross-threat generalization, where training against one failure mode improves robustness to another, and identify a shared residual-stream mechanism underlying ACT, MLPCT, and AttCT, while distinguishing BCT as mechanistically distinct. Our results suggest that consistency training is a flexible and extensible framework for alignment, capable of unifying defenses against a broader class of model pathologies.

2606.05814 2026-06-05 cs.LG 版本更新

Robust and sparse support vector machine via hybrid truncated loss for supervised classification

基于混合截断损失的鲁棒稀疏支持向量机用于监督分类

Yuliang Yang, Chen Chen, Yuxiang Liu, Huiru Wang

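Affiliations: School of Science, Beijing Forestry University; Translational Cancer Research Center, Peking University First Hospital

AI summary: Proposes a sparse and bounded hybrid truncated loss L_ht, builds the L_ht-SVM model for single-view classification, and extends it to the multi-view MvL_ht-SVM, with efficient optimization via P-stationary points and an alternating direction method of multipliers; experiments show better accuracy, sparsity, and robustness than the compared methods.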

详情
AI中文摘要

支持向量机(SVM)是一种广泛使用的分类器,但选择合适的损失函数仍然困难。凸损失如hinge损失和最小二乘损失对异常值敏感,而有界非凸损失通常导致高计算成本。为解决这一问题,我们提出一种混合截断损失函数($L_{\mathrm{ht}}$),该函数既稀疏又有界,并构建了用于单视图分类的$L_{\mathrm{ht}}$-SVM模型。我们引入P-平稳点,并利用它建立一阶必要和充分最优性条件。基于这些条件,我们设计了一种带有工作集策略的交替方向乘子法,降低了计算成本并实现了全局收敛。我们进一步通过添加结构信息和视图权重将$L_{\mathrm{ht}}$-SVM扩展到多视图学习,得到Mv$L_{\mathrm{ht}}$-SVM,该方法遵循共识和互补原则。在合成、真实世界和图像数据集上的实验表明,$L_{\mathrm{ht}}$-SVM在准确率更高、支持向量更少和噪声鲁棒性更好方面优于五种单视图方法,而Mv$L_{\mathrm{ht}}$-SVM在准确率、精确率、召回率和F1分数上优于六种多视图方法。

英文摘要

The support vector machine (SVM) is a widely used classifier, but choosing an appropriate loss function remains difficult. Convex losses such as the hinge loss and least-squares loss are sensitive to outliers, while bounded non-convex losses often lead to high computational cost. To address this, we propose a hybrid truncated loss function ($L_{\mathrm{ht}}$) that is both sparse and bounded, and build the $L_{\mathrm{ht}}$-SVM model for single-view classification. We introduce the P-stationary point and use it to establish the first-order necessary and sufficient optimality conditions. Based on these conditions, we design an alternating direction method of multipliers with a working-set strategy that reduces computational cost and achieves global convergence. We further extend $L_{\mathrm{ht}}$-SVM to multi-view learning by adding structural information and view weights, resulting in Mv$L_{\mathrm{ht}}$-SVM, which follows both the consensus and complementarity principles. Experiments on synthetic, real-world, and image datasets show that $L_{\mathrm{ht}}$-SVM achieves higher accuracy with fewer support vectors and better noise robustness than five single-view methods, while Mv$L_{\mathrm{ht}}$-SVM outperforms six multi-view methods in accuracy, precision, recall, and F1-score.

2606.05800 2026-06-05 cs.LG 版本更新

SALT: When More Rollouts Don't Help in Group-Based Policy Optimization and How to Make Them Matter

SALT: 当更多 rollout 在基于组的策略优化中无益时如何使其发挥作用

Powei Chang, Jinpeng Zhang, Chaoqun Sun, MiniWell Tsao, Lianrui Li, Jianxiang Xiang, Chenyu Wang, Yukang Gao, Dongying Kong

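Affiliations: Bilibili Inc.; Fudan University; Zhejiang University

AI summary: Addresses the gradient cancellation that arises when the number of rollouts is increased under GRPO-style group normalization by proposing SALT, a component that applies subspace-adaptive reweighting to the coefficients of group-relative updates, improving update geometry and performance.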

详情
AI中文摘要

基于可验证奖励的强化学习(RLVR)通常采用 GRPO 风格的组相对更新,为每个提示采样多个 rollout 以构建归一化学习信号。然而,仅仅增加 rollout 数量并不能可靠地增强学习:在 GRPO 风格组归一化下,每个 rollout 的策略梯度特征可能集中到低秩、有符号的几何结构中,导致聚合时大量抵消,削弱有效更新。我们通过 SALT(子空间自适应几何插件组件)解决这种失效模式,该组件利用样本梯度几何对组相对更新的系数进行重新加权。SALT 从小批量 Gram 几何中估计主导共享子空间,将组相对系数分解为共享通道和残差通道,并在符号抵消严重时自适应放大残差通道。在多种推理导向的 RLVR 基准和模型规模上,SALT 在不修改奖励模型或 rollout 采样过程的情况下,改善了有效更新几何和性能。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) often adopts GRPO-style group-relative updates, sampling multiple rollouts per prompt to construct normalized learning signals. However, merely increasing the number of rollouts does not reliably strengthen learning: under GRPO-style group normalization, per-rollout policy-gradient features can concentrate into a low-rank, signed geometry, causing substantial cancellation during aggregation and weakening the effective update. We address this failure mode with SALT, a Subspace-Adaptive geometry pLug-in componenT that uses sample-wise gradient geometry to reweight the coefficients of group-relative updates. SALT estimates a dominant shared subspace from the mini-batch Gram geometry, decomposes group-relative coefficients into shared and residual channels, and adaptively amplifies the residual channel when signed cancellation is severe. Across diverse reasoning-oriented RLVR benchmarks and model scales, SALT improves effective update geometry and performance without modifying the reward model or the rollout sampling procedure.

2606.05799 2026-06-05 cs.LG cs.CL 版本更新

CaliDist: Calibrating Large Language Models via Behavioral Robustness to Distraction

CaliDist: 通过抗干扰行为鲁棒性校准大型语言模型

Mohammad Anas Jawad, Cornelia Caragea

发表机构 * Cornelia Caragea(卡伦·卡雷亚) Mohammad Anas Jawad(穆罕默德·安斯·贾瓦德)

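AI summary: Proposes CaliDist, which calibrates LLMs by measuring and penalizing a model's susceptibility to semantic distractors, reducing average ECE from 23% to 7% across seven NLU benchmarks.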

详情
AI中文摘要

现有的大型语言模型(LLM)校准方法常常忽略可信度的一个关键维度:模型对无关或误导信息的行为鲁棒性。在本文中,我们认为模型的真实置信度应反映其在认知压力下的稳定性。我们引入CaliDist,一种新颖的事后校准方法,直接测量并惩罚模型对干扰的敏感性。CaliDist量化了当输入提示被语义干扰项扰动时,LLM的预测和不确定性如何变化。然后利用这种稳定性(或不稳定性)信号来自适应地缩放模型的初始置信度分数。我们在六个不同LLM的七个自然语言理解分类基准上进行的广泛实验表明,与强基线相比,CaliDist一致地实现了更低的期望校准误差(ECE)和Brier分数。值得注意的是,我们的方法平均将ECE从23%降至7%(相对改进70%),表明行为稳定性是校准的有力信号。我们在github.com/m-anas-j/CaliDist提供代码和数据集。

英文摘要

Existing calibration methods for Large Language Models (LLMs) often overlook a critical dimension of trustworthiness: a model's behavioral robustness to irrelevant or misleading information. In this paper, we argue that a model's true confidence should reflect its stability under cognitive pressure. We introduce CaliDist, a novel post-hoc calibration approach that directly measures and penalizes a model's susceptibility to distraction. CaliDist quantifies how an LLM's predictions and uncertainty change when its input prompt is perturbed with semantic distractors. This stability (or lack thereof) signal is then used to adaptively scale the model's initial confidence score. Our extensive experiments on seven Natural Language Understanding classification benchmarks using six distinct LLMs show that CaliDist consistently achieves lower Expected Calibration Error (ECE) and Brier Score compared with strong baselines. Remarkably, our method reduces the ECE from 23% to 7% on average (a relative improvement of 70%), demonstrating that behavioral stability is a powerful signal for calibration. We make our code and datasets available at github.com/m-anas-j/CaliDist.
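For context on the headline metric, the sketch below computes Expected Calibration Error with the common equal-width binning scheme and checks the quoted relative improvement. The bin count, function names, and toy inputs are assumptions; this shows the standard metric, not CaliDist's scaling rule, which the abstract does not specify.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width binned ECE: sum over bins of (|bin| / N) * |accuracy - mean confidence|."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Map a confidence in [0, 1] to a bin; conf == 1.0 falls in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


def relative_reduction(before, after):
    """Relative improvement when a lower-is-better metric drops from `before` to `after`."""
    return (before - after) / before


# Toy example: a model that claims 90% confidence but is right 3 times out of 4.
toy_ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
print(f"toy ECE: {toy_ece:.2f}")

# Average ECE figures quoted in the abstract.
print(f"relative ECE reduction: {relative_reduction(0.23, 0.07):.1%}")
```

The second figure comes out just under 70%, consistent with the rounded relative improvement stated in the abstract.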

2606.05793 2026-06-05 cs.CL cs.AI cs.CY cs.LG 版本更新

CollabBench: Benchmarking and Unleashing Collaborative Ability of LLMs with Diverse Players via Proactive Engagement

CollabBench: 通过主动参与与多样化玩家基准测试和释放LLMs的协作能力

Hong Qian, Yuanhao Liu, Zihan Zhou, Zongbao Zhang, Hanjie Ge, Haotian Shi, Liang Dou, Xiangfeng Wang, Jingwen Yang, Aimin Zhou

Affiliations: Shanghai Institute of AI for Education; School of Computer Science; East China Normal University; Tencent Inc.; Shanghai Innovation Institute

AI summary: Proposes the CollabBench benchmark, which uses diverse player simulation and a collaborative agentic training paradigm to improve LLM task efficiency and affective adaptation in cooperative games.

Comments Accepted by ICML 2026

详情
AI中文摘要

尽管基于LLM的智能体在个体任务上表现出色,但与真实人类伙伴的有效协作仍然具有挑战性。现有的对话级协作研究大多缺乏基于交互和行为执行,这促使需要能够实现情境化和沉浸式协作的合作游戏环境。为此,本文提出了CollabBench,一个用于评估和训练合作游戏中协作智能体的基准。CollabBench具有多样化玩家档案模拟管道,用于建模不同的玩家行为,以及一种协作智能体训练范式,通过智能体展开统一推理、沟通和行动,并使用混合奖励优化任务效率和情感适应。我们进一步将经典环境扩展到CWAH-MultiPlayer和Cook-MultiPlayer,以在多样化个性下进行系统评估。使用效率和情感指标的实验表明,我们训练的模型优于基础模型,效率提高了19.5%,情感表现提高了24.4%。进一步分析揭示了现有模型的关键协作局限性,并为未来的协作训练提供了见解。

英文摘要

While LLM-based agents excel at individual tasks, effective collaboration with realistic human partners remains challenging. Most existing conversation-level collaborative studies lack grounded interaction and behavioral execution, motivating the need for cooperative game environments that enable contextualized and immersive collaboration. To this end, this paper proposes CollabBench, a benchmark for evaluating and training collaborative agents in cooperative games. CollabBench features a Diverse Player Profile Simulation pipeline to model varied player behaviors, and a Collaborative Agentic Training paradigm that unifies reasoning, communication, and action via agentic rollouts, optimized with a hybrid reward balancing task efficiency and affective adaptation. We further extend classic environments to CWAH-MultiPlayer and Cook-MultiPlayer for systematic evaluation under diverse personalities. Experiments with efficiency and affective metrics show that our trained models outperform base models, achieving 19.5% higher efficiency and 24.4% improved affective performance. Further analysis reveals key collaborative limitations of existing models and offers insights for future collaborative training.

2606.05792 2026-06-05 cs.AI cs.LG cs.LO cs.SE 版本更新

Can LLMs Write Correct TLA+ Specifications? Evaluating Natural-Language-to-TLA+ Generation

LLM 能写出正确的 TLA+ 规范吗?自然语言到 TLA+ 生成的评估

Arslan Bisharat, Brian Ortiz, Eric Spencer, Khushboo Bhadauria, TaiNing Wang, George K. Thiruvathukal, Konstantin Laufer, Mohammed Abuhamad

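Affiliations: Department of Computer Science, Loyola University Chicago

AI summary: Presents the first systematic evaluation of LLM-based synthesis of TLA+ specifications from natural language, finding that models reach only 8.6% semantic correctness, that successes depend on progressive prompting, that model size does not predict quality, and that code-specialized models underperform.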

Comments 12 pages, 11 tables. Accepted at the 21st International Conference on Software Technologies (ICSOFT 2026); Recommended as Best Paper Award Candidate

详情
AI中文摘要

TLA+ 已支持亚马逊和微软等公司的工业验证,但从自然语言编写正确的 TLA+ 规范仍需时间和专业知识,这限制了其采用。LLM 显示出潜力,但尚无先前研究衡量它们是否能从自然语言生成语义正确的 TLA+ 规范。本文首次系统评估基于 LLM 的 TLA+ 规范合成。我们的研究在精心策划的 205 个 TLA+ 规范数据集上评估了来自八个系列的 30 个 LLM:四种提示策略下的 25 个开放权重模型(2600 次运行)和少样本提示下的 5 个专有模型(130 次运行),所有结果均由 SANY 解析器和 TLC 模型检查器验证。LLM 达到高达 26.6% 的语法正确性,但仅 8.6% 的语义正确性,成功仅出现在渐进式提示中。结果表明模型大小不能预测质量,例如 DeepSeek r1:8b 在所有策略上优于其 70B 变体,这表明推理对齐对形式语言的重要性。由于主流语言训练的负迁移,代码专用模型始终表现不佳。我们识别出五类重复出现的幻觉,所有幻觉均可追溯到特定的训练数据偏差。这些结果表明,当前 LLM 在没有专家监督的情况下无法生成可靠的 TLA+ 规范。我们发布了评估框架、代码和数据集,以支持可重复性和未来研究。

英文摘要

TLA+ has supported industrial verification at companies such as Amazon and Microsoft, yet writing correct TLA+ specifications from natural language still requires time and expertise, which limits adoption. LLMs show promise, but no prior study measures whether they produce semantically correct TLA+ specifications from natural language. This paper presents the first systematic evaluation of LLM-based TLA+ specification synthesis from natural language. Our study evaluates 30 LLMs across eight families on a curated dataset of 205 TLA+ specifications: 25 open-weight models across four prompting strategies (2,600 runs) and 5 proprietary models under few-shot prompting (130 runs), all validated by the SANY parser and TLC model checker. LLMs achieve up to 26.6% syntactic correctness but only 8.6% semantic correctness, with successes exclusive to progressive prompting. Results show that model size does not predict quality, e.g., DeepSeek r1:8b outperforms its 70B variant across all strategies, which suggests the importance of reasoning alignment for formal languages. Code-specialized models consistently underperform due to negative transfer from mainstream language training. We identify five recurring hallucination categories, all traceable to specific training data biases. These results suggest that current LLMs do not generate reliable TLA+ specifications without expert oversight. We release the evaluation framework, code, and dataset to support reproducibility and future research.

2606.05785 2026-06-05 cs.CV cs.AI cs.LG 版本更新

Next-Generation Parallel Decoder for LPDR: Architectural Optimization and Class-Balanced GAN-Augmentation

下一代LPDR并行解码器:架构优化与类别平衡的GAN增强

Shawaiz Obaid, Nida Chandio, Neha Jamil, Muhammad Khuram Shahzad

发表机构 * School of Electrical Engineering and Computer Science, National University of Sciences & Technology, Islamabad, Pakistan

AI总结 针对车牌检测与识别中的空间字符不匹配和数据不平衡问题,提出交叉空间混合注意力和类别平衡合成增强方法,将少数省份车牌识别率从78.2%提升至91.5%,同时保持152 FPS的实时处理性能。

Comments 8 pages, 7 figures

详情
AI中文摘要

实时车牌检测与识别(LPDR)是现代智慧城市的基石。尽管YOLOV5-PDLPR模型通过并行解码器方法显著提高了系统效率,但其性能仍受训练集中空间字符不匹配和数据不平衡的影响。本文通过引入交叉空间混合注意力(CSHA)和类别平衡合成增强(CBSA)来解决这些局限性。进行了涉及75,000个合成样本的广泛研究,并在四个基准数据集(CCPD、CLPD、PKU和一个应用特定数据集)上进行了评估。实验结果表明,少数省份车牌识别率从78.2%大幅提升至91.5%,同时保持152 FPS的实时处理性能。结果表明,结合空间感知并行解码与类别平衡增强为高速车牌识别系统提供了有效解决方案。

英文摘要

Real-Time License Plate Detection and Recognition (LPDR) forms the backbone of modern smart cities. Although the YOLOV5-PDLPR model substantially improved system efficiency through a parallel decoder approach, its performance is still affected by spatial character mismatches and data imbalance within the training set. This paper addresses these limitations by introducing Cross-Spatial Hybrid Attention (CSHA) and Class-Balanced Synthetic Augmentation (CBSA). An extensive study involving 75,000 synthetic samples is conducted and evaluated on four benchmarks: CCPD, CLPD, PKU, and an application-specific dataset. Experimental results demonstrate a substantial improvement in the recognition rate of minority provincial license plates from 78.2% to 91.5% while maintaining real-time processing performance of 152 FPS. The results indicate that spatially-aware parallel decoding combined with class-balanced augmentation provides an effective solution for high-speed license plate recognition systems.

2606.05776 2026-06-05 cs.CR cs.AI cs.LG 版本更新

An Improved CNN-LSTM Based Intrusion Detection System for IoT Networks

基于改进的CNN-LSTM的物联网网络入侵检测系统

Mohammad Tariq Ikhlas, Pohanyar Khowaja Khil, Malik Muhammad Mueed Aslam, Muhammad Khuram Shahzad

发表机构 * University of Engineering and Technology, Lahore(拉合尔工程与技术大学)

AI总结 提出一种结合多类分类、数据集集成和时间特征学习的改进CNN-LSTM入侵检测模型,在物联网网络上达到约97%的准确率。

Comments 8 pages, 8 figures

详情
AI中文摘要

随着物联网设备的快速普及,安全问题急剧增加,入侵检测系统对于保护网络环境变得至关重要。本文提出了一种改进的基于CNN-LSTM的入侵检测模型,该模型结合了多类分类、数据集集成和时间特征学习,以增强物联网网络中的检测性能。使用网络流量数据,所提出的方法在入侵检测任务上进行了评估,达到了约97%的准确率。实验结果表明,该模型能有效检测多种攻击类别,同时保持稳定的训练和验证性能。卷积和循环神经网络组件的集成使框架能够捕获网络流量的空间和时间特征,提高了物联网环境中的整体入侵检测能力。

英文摘要

With the rapid proliferation of IoT devices, security concerns have dramatically escalated and intrusion detection systems have become critical for protecting networked environments. This paper presents an improved CNN-LSTM based intrusion detection model that combines multi-class classification, dataset integration, and temporal feature learning to enhance detection performance in IoT networks. Using network traffic data, the proposed approach is evaluated on intrusion detection tasks and achieves an accuracy of approximately 97%. Experimental results demonstrate that the model effectively detects multiple attack categories while maintaining stable training and validation performance. The integration of convolutional and recurrent neural network components enables the framework to capture both spatial and temporal characteristics of network traffic, improving overall intrusion detection capability in IoT environments.

2606.05758 2026-06-05 cs.CV cs.AI cs.LG 版本更新

DRIFT: A Residual Flow Adapter for Decoding Continuous Outputs in Vision-Language Models

DRIFT:一种用于视觉-语言模型中连续输出解码的残差流适配器

Zhuoming Liu, Jinhong Lin, Kwan Man Cheng, Lin Zhang, Shayok Bagchi, Yin Li

发表机构 * University of Wisconsin–Madison(威斯康星大学麦迪逊分校) West Lafayette Jr./Sr. High School(韦斯特拉法叶高中)

AI总结 提出DRIFT框架,通过结合基础预测器和基于流匹配的生成式精化模块,将预训练视觉-语言模型适配到连续解码任务,在视觉定位和机器人控制等任务上优于回归和生成方法。

详情
AI中文摘要

许多现代视觉-语言模型(VLM)基于离散标记的自回归解码。虽然基于文本的输出接口支持可扩展的预训练和跨多种任务的强零样本泛化,但它们不适用于需要精确连续输出的问题,例如定位事件的时间边界或生成机器人控制动作。为了解决这一挑战,我们提出了DRIFT,一个用于将预训练VLM适配到连续解码任务的通用框架。DRIFT结合了一个基础预测器(提供目标输出的粗略估计)和一个基于流匹配的生成式精化模块(迭代改进预测)。这种残差公式将生成建模问题从学习全局输出分布转变为在强先验周围建模局部残差分布,大大简化了优化。我们在感知和规划任务上评估了DRIFT,包括视觉定位和机器人控制。在跨越MLLM、VLA和WAM的多个任务和架构中,DRIFT始终优于一组强大的基于回归和生成的方法。

英文摘要

Many modern vision-language models (VLMs) build on autoregressive decoding of discrete tokens. While text-based output interfaces enable scalable pretraining and strong zero-shot generalization across diverse tasks, they are poorly suited for problems that require precise continuous outputs, such as localizing temporal boundaries of events or generating robotic control actions. To address this challenge, we propose DRIFT, a general framework for adapting pretrained VLMs to continuous decoding tasks. DRIFT combines a base predictor, which provides a coarse estimate of the target output, with a generative refinement module based on flow matching that iteratively improves the prediction. This residual formulation transforms the generative modeling problem from learning a global output distribution to modeling a localized residual distribution around a strong prior, substantially simplifying optimization. We evaluate DRIFT on both perception and planning tasks, including visual grounding and robotic control. Across multiple tasks and architectures spanning MLLMs, VLAs, and WAMs, DRIFT consistently outperforms a strong set of regression- and generative-based solutions.
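
The residual formulation described in this abstract (a coarse base prediction plus an iteratively refined residual) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `base_predictor` is a hypothetical stand-in for a VLM regression head, and `toy_velocity` is a hand-written velocity field that transports any point to a known residual, where the paper would use a learned flow-matching network. The integration scheme is plain forward Euler.

```python
import random

def base_predictor(features):
    """Hypothetical coarse regressor; stands in for a VLM regression head."""
    return [0.5 * f for f in features]

def toy_velocity(x, t, target_residual):
    """Toy velocity field that carries any point to `target_residual` at t = 1.
    A trained model would replace this with a learned, feature-conditioned network."""
    return [(r - xi) / (1.0 - t) for xi, r in zip(x, target_residual)]

def refine(base, target_residual, steps=4, seed=0):
    """Start the residual at Gaussian noise, integrate the velocity field with
    forward Euler from t = 0 to t = 1, then add the residual to the base estimate."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in base]
    dt = 1.0 / steps
    for k in range(steps):
        t = k / steps  # t stays strictly below 1, so the division is safe
        v = toy_velocity(x, t, target_residual)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return [b + r for b, r in zip(base, x)]

coarse = base_predictor([2.0, 4.0])          # coarse estimate
refined = refine(coarse, [0.1, -0.2])        # coarse estimate plus refined residual
```

Because the generative module only has to model a small residual around `coarse`, the noise-to-target transport is short, which is the simplification the abstract attributes to the residual design.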

2606.05756 2026-06-05 cs.LG cs.AI cs.IT math.IT 版本更新

Beyond Soft Masks: Hard-Perturbation Mixup Explainer for Robust GNN Explainability

超越软掩码:用于鲁棒GNN可解释性的硬扰动混合解释器

Jialiang Yin, Zheng Zhao, Linsey Pang, Bo Dong, Bin Shi, Jiaxing Zhang

发表机构 * Xi’an Jiaotong University(西安交通大学) PayPal, Bellevue, USA(美国贝尔维尤)

AI总结 提出基于广义图信息瓶颈的硬扰动混合解释框架HPME,通过图池化提取离散解释子图并采用结构级替换的混合策略,解决软掩码方法中标签无关信息泄漏和分布偏移问题,提升解释保真度。

详情
AI中文摘要

图神经网络(GNN)在涉及图结构数据的各种应用中表现出卓越性能,尤其是在高风险领域。然而,其决策过程的不透明性限制了可信度和更广泛的采用。现有的事后解释方法通过识别影响GNN预测的子图来提高可解释性,并采用混合策略来缓解使用子图进行预测时引起的分布外(OOD)问题。然而,这些方法通常依赖软掩码,其本质上无法完全消除标签无关信息,允许冗余结构泄漏到混合过程中,阻碍OOD问题的解决,从而降低解释保真度。在本文中,我们提出HPME,一个基于广义图信息瓶颈的硬扰动混合解释框架,利用图池化提取离散解释子图,并产生信息容量界限以彻底压缩标签无关组件。此外,我们引入了一种基于结构级替换的新型混合策略,生成分布内解释以有效缓解分布偏移。在多种任务上的大量实验表明,HPME在合成和真实数据集上生成鲁棒且可解释的解释方面达到了最先进的性能。

英文摘要

Graph Neural Networks (GNNs) have demonstrated remarkable performance across a range of applications involving graph-structured data, particularly in high-stakes domains. However, the opaque nature of their decision-making processes limits their trustworthiness and broader adoption. Existing post-hoc explanation methods aim to improve explainability by identifying subgraphs that influence GNN predictions and adopt mixup strategies to alleviate the out-of-distribution (OOD) issue caused by using subgraphs for prediction. Yet, these approaches typically rely on soft masks, which are inherently unable to fully eliminate label-irrelevant information, allowing redundant structures to leak into the mixup process and hindering the resolution of the OOD problem, thereby degrading explanation fidelity. In this work, we propose HPME, a Hard-Perturbation Mixup Explanation framework grounded in a generalized Graph Information Bottleneck, which leverages graph pooling to extract discrete explanatory subgraphs and to yield an information-capacity bound to thoroughly compress label-irrelevant components. Furthermore, we introduce a novel mixup strategy built upon structure-level replacement, generating in-distribution explanations to effectively mitigate the distribution shift. Extensive experiments on diverse tasks demonstrate that HPME achieves state-of-the-art performance in generating robust and interpretable explanations across both synthetic and real-world datasets.

2606.05737 2026-06-05 cs.CV cs.AI cs.LG cs.RO 版本更新

Let It Be Simple: One-Step Action Generation for Vision-Language-Action Models

让它简单:视觉-语言-动作模型的单步动作生成

Yitong Chen, Shiduo Zhang, Jingjing Gong, Xipeng Qiu

发表机构 * University of Science and Technology of China(中国科学技术大学) Shanghai Innovation Institute(上海创新研究院) Fudan University(复旦大学)

AI总结 针对视觉-语言-动作(VLA)模型,提出通过偏置训练时间分布至高频噪声状态,实现无需教师模型、蒸馏或辅助目标的单步动作生成,性能可匹配十步解码。

Comments 20 pages, 10 figures

详情
AI中文摘要

基于扩散的视觉-语言-动作(VLA)模型通常继承图像生成的观点:动作通过迭代去噪生成。我们认为VLA动作生成具有不同的条件-目标结构:策略以丰富的观测、语言和状态为条件,但仅预测紧凑的低维动作块。在这种不对称性下,强单步动作生成不一定需要为图像合成开发的先进单步方法。我们保持标准速度预测,不添加教师模型、蒸馏阶段或辅助目标;在我们的主要方案中,我们简单地将训练时间分布偏向高频噪声状态。我们首先在受控的MNIST网格到序列任务中隔离效果,然后通过广泛的机器人策略实验进行测试。在标准LIBERO、LIBERO-Plus和LIBERO-Pro上,使用高频噪声偏置调度训练的单步策略通常匹配相同方案下的十步解码,并且在标准LIBERO上可以超过使用均匀时间分布训练的十步策略。真实机器人双臂YAM RSS评估提供了相同采样器趋势的小样本跨架构检查。在具有30M动作头的1.4B VLM模型上,单步解码在LIBERO-Long上达到95.6%。这些结果表明,强单步VLA动作生成可以从标准扩散训练中涌现,而无需引入为图像生成开发的完整少步扩散机制。

英文摘要

Diffusion-based vision-language-action (VLA) models often inherit the image-generation view: actions are generated by iterative denoising. We argue that VLA action generation has a different condition-target structure: the policy is conditioned on rich observations, language, and state, but predicts only a compact, low-dimensional action chunk. Under this asymmetry, strong one-step action generation should not necessarily require the advanced one-step methods developed for image synthesis. We keep standard velocity prediction and add no teacher model, distillation stage, or auxiliary objective; in our main recipe, we simply bias the training time distribution toward high-noise states. We first isolate the effect in a controlled MNIST grid-to-sequence task, then test it with extensive robot-policy experiments. Across standard LIBERO, LIBERO-Plus, and LIBERO-Pro, one-step policies trained with high-noise biased schedules generally match ten-step decoding under the same recipe, and on standard LIBERO can exceed ten-step policies trained with a uniform time distribution. A real-robot bimanual YAM RSS evaluation gives a small-sample cross-architecture check of the same sampler trend. On a 1.4B VLM model with a 30M action head, one-step decoding reaches 95.6% on LIBERO-Long. These results show that strong one-step VLA action generation can emerge from standard diffusion training, without importing the full few-step diffusion machinery developed for image generation.
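
To make "bias the training time distribution toward high-noise states" concrete, here is a minimal sketch of one way to do it. The abstract does not state which distribution the authors use, so the power-law sampler below, the `power` parameter, and the convention that `s = 1` means pure noise are all assumptions chosen for illustration.

```python
import random

def sample_noise_level(rng, power=3.0):
    """Draw s in [0, 1] with density skewed toward s = 1 (pure noise).
    With u ~ Uniform(0, 1), s = u ** (1 / power) has mean power / (power + 1).
    power = 1 recovers the uniform time distribution."""
    return rng.random() ** (1.0 / power)

def noisy_action(action, noise, s):
    """Linear interpolation between a clean action chunk (s = 0) and noise (s = 1)."""
    return [(1.0 - s) * a + s * n for a, n in zip(action, noise)]

rng = random.Random(0)
levels = [sample_noise_level(rng) for _ in range(10000)]
mean_level = sum(levels) / len(levels)  # close to 0.75 for power = 3
```

With a uniform sampler the mean noise level would sit near 0.5; raising `power` spends more training steps on inputs that look like the pure-noise state a one-step decoder starts from.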

2606.05733 2026-06-05 cs.LG cs.CE q-fin.CP stat.ML 版本更新

Zero-Copy Semantic Contagion: An In-Memory Streaming Architecture for Evolving Attention Graphs

零拷贝语义传染:一种用于演化注意力图的内存流式架构

Kabir Murjani

发表机构 * Department of Electrical Engineering, Nirma University(电气工程系,尼玛大学)

AI总结 提出一种基于Rust-Python的异构流式架构,通过零拷贝解析和神经霍克斯过程实现跨公司注意力图的实时构建与推理,在FNSPID语料库上相比随机基线提升1.70倍精度。

Comments Accepted to the 2026 ACM SIGMOD Workshop on Data Management for the Modern Financial Systems (FinDS). 10 pages, 4 figures

详情
AI中文摘要

按代码预测模型主导金融时间序列工作,但仍无法捕捉跨公司传播:台湾的晶圆厂中断在单资产模型中不会显现,直到苹果自己的价格已经变动。为解决这一局限,我们引入一种异构的Rust-Python流式架构,将跨公司注意力映射为直接由文本驱动的连续时间图。我们表明,在摄取端,零拷贝Rust边缘解析新闻记录约需100纳秒,并在约1.2微秒内扫描目标股票宇宙。在推理端,一个多变量神经霍克斯过程,具有每节点连续时间LSTM状态和双线性潜在投影,传播定向激发,而自适应剪枝规则限制了动态邻域更新的计算成本。结合这些阶段,我们展示了在单个商用CPU上,每条传入新闻记录的端到端处理延迟约为13毫秒。在FNSPID语料库(47个代码的638篇文章)的一个月时间保持集上评估,该系统在90百分位次日回报阈值下,相比随机基线精度提升1.70倍,相比同行业基线提升3.36倍。关键的是,移除图拓扑结构会使精度降至零,证实动态注意力网络是该架构中跨公司信号的唯一驱动因素。

英文摘要

Per-ticker forecasting models dominate financial time-series work yet remain blind to cross-company propagation: a foundry disruption in Taiwan does not register in a single-asset model until Apple's own price has already moved. To address this limitation, we introduce a heterogeneous Rust-Python streaming architecture that maps cross-company attention as a continuous-time graph driven directly from text. We show that on the ingestion side, a zero-copy Rust edge parses news records in ~100 ns and scans the target equity universe in ~1.2 μs. On the inference end, a multivariate Neural Hawkes Process featuring per-node continuous-time LSTM states and a bilinear latent projection propagates directed excitation, while an adaptive pruning rule bounds the computational cost of dynamic neighborhood updates. Combining these stages, we demonstrate an end-to-end processing latency of ~13 ms per incoming news record on a single commodity CPU. Evaluated on a one-month temporal holdout of the FNSPID corpus (638 articles across 47 tickers), the system delivers a 1.70× precision lift over random at the 90th-percentile next-day return threshold, and 3.36× over a same-sector baseline. Crucially, removing the graph topology collapses precision to zero, confirming that the dynamic attention network is the sole driver of cross-company signal in this architecture.
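
The idea of "directed excitation" between tickers can be sketched with a classical multivariate Hawkes process. Note that this is a swapped-in simplification: the paper uses a Neural Hawkes Process with continuous-time LSTM states, whereas the sketch below uses a fixed exponential kernel. The parameter names `mu`, `alpha`, and `beta` are illustrative assumptions, not the paper's notation.

```python
import math

def hawkes_intensity(node, t, events, mu, alpha, beta):
    """Intensity of `node` at time t under an exponential-kernel Hawkes model.

    events: list of (time, source_node) pairs; only events before t contribute.
    mu[i]: baseline rate of node i.
    alpha[i][j]: how strongly an event on node j excites node i (a directed edge).
    beta: decay rate of the excitation.
    """
    excitation = 0.0
    for t_j, src in events:
        if t_j < t:
            excitation += alpha[node][src] * math.exp(-beta * (t - t_j))
    return mu[node] + excitation

# Two tickers: news on ticker 0 excites ticker 1, but not the other way round.
mu = [0.1, 0.1]
alpha = [[0.0, 0.0],
         [0.5, 0.0]]
events = [(0.0, 0)]
downstream = hawkes_intensity(1, 1.0, events, mu, alpha, beta=1.0)
upstream = hawkes_intensity(0, 1.0, events, mu, alpha, beta=1.0)
```

Setting every off-diagonal entry of `alpha` to zero removes the graph, and each ticker falls back to its own baseline; that is the kind of ablation the abstract describes when it reports that removing the topology eliminates the cross-company signal.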

2606.05731 2026-06-05 cs.LG 版本更新

Intercomparison of Machine Learning Algorithms for Remote Sensing-based In-season Crop Mapping

基于遥感的季节内作物制图机器学习算法比较

August Posch, Jitendra Kumar, Forrest M. Hoffman, Auroop R. Ganguly

发表机构 * Oak Ridge National Laboratory(橡树岭国家实验室) Environmental Sciences Division(环境科学部) Northeastern University(东北大学)

AI总结 本研究通过比较十种机器学习算法,利用Landsat-Sentinel反射率时间序列和轮作历史,在6月初准确绘制玉米和杏仁的30米分辨率作物图,并量化物候和分布不确定性,发现支持向量机总体表现最佳。

Comments 22 pages, 8 figures

详情
AI中文摘要

面对日益极端的气候相关作物威胁,季节内作物类型制图对粮食安全至关重要。目前,美国农业部作物数据层提供30米分辨率的作物类型标签,并在收获后的2月可用,但尚无产品能在收获前以令人满意的精度绘制作物类型,从而允许应急管理人员近乎实时地应对作物威胁。此外,直到本研究,广泛算法的相对优势尚未以考虑年际变异的方式进行评估。在此,我们结合协调的Landsat-Sentinel地表反射率时间序列和作物轮作历史信息,在6月初准确绘制爱荷华州的玉米和加利福尼亚州的杏仁的30米分辨率图,并稳健量化物候和作物分布引起的不确定性。通过逐年交叉验证和一套指标,比较了十种机器学习算法的数千种模型配置。超参数搜索显示,支持向量机是总体最成功的算法,在加利福尼亚州6月初的杏仁(爱荷华州6月初的玉米)的五个未见验证年份中,平均F1分数为0.74(0.59)。年际变异是不确定性的主要来源,但模式表明通过集成方法或辅助数据有进一步提高性能的潜力。未来工作可将这些方法扩展到包括所有作物类型的多类地图、全美国地图以及季节内作物产量预测。

英文摘要

In-season crop type mapping is critical for food security in the face of increasingly extreme climate-related threats to crops. Currently, the USDA Cropland Data Layer provides crop type labels at 30m resolution and is available the February after harvest, but no product exists that maps crop types before harvest with satisfactory accuracy that would allow emergency managers to respond to crop threats in near real time. Furthermore, the relative advantages of a wide range of algorithms have not been evaluated in a way that accounts for interannual variability, until this study. Here, Harmonized Landsat-Sentinel surface reflectance imagery time series and crop rotation history information are combined to map corn in Iowa and almonds in California at 30m resolution accurately by early June in unseen years, with robust quantification of uncertainty due to phenology and crop distribution. Thousands of model configurations across ten machine learning algorithms were compared using a year-wise cross-validation and a suite of metrics. Hyperparameter search revealed Support Vector Machines to be the most successful algorithm overall, with a mean F1 score of 0.74 (0.59) across five unseen validation years for almonds by early June in California (corn by early June in Iowa). Interannual variation was a large source of uncertainty, but patterns showed the potential to further improve performance with ensemble approaches or ancillary data. Future work may extend these methods to include multiclass maps of all crop types, CONUS-wide maps, and in-season crop yield forecasting.
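
The year-wise cross-validation mentioned in this abstract can be sketched as a leave-one-year-out split, scored with F1. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the study's actual fold construction and metric suite may differ.

```python
def yearwise_splits(years):
    """Yield (held_out_year, train_indices, val_indices), holding out one year at a time
    so that validation always happens on a year the model has not seen."""
    for held_out in sorted(set(years)):
        train = [i for i, y in enumerate(years) if y != held_out]
        val = [i for i, y in enumerate(years) if y == held_out]
        yield held_out, train, val

def f1_score(y_true, y_pred):
    """Binary F1: harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

sample_years = [2019, 2019, 2020, 2021]
splits = list(yearwise_splits(sample_years))
```

Splitting by year rather than by random pixel keeps samples from the same growing season out of both train and validation sets, which is what lets the reported scores reflect interannual variability.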

2606.05729 2026-06-05 cs.IT cs.LG math.IT 版本更新

Automated Proving of Shannon-Type Entropy Inequalities via Fine-Tuned Language Models and Guided Tree Search

通过微调语言模型和引导树搜索自动证明香农型熵不等式

Shing Yin Wong, Shaocheng Liu, Linqi Song, Amin Gohari, Cheuk Ting Li

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 本文通过微调小规模语言模型并结合引导束搜索,自动化证明香农型熵不等式,在含10-15个变量的测试集上达到85%的证明成功率。

详情
AI中文摘要

证明香农型熵不等式是信息论中的一项基本任务,通常需要构造已知约束的非平凡线性组合,这是一个组合搜索问题,其规模随随机变量数量增加而急剧增长。我们研究了小规模大语言模型(0.6B--1.7B参数),在原子证明步骤上微调并结合引导束搜索,能否自动化这一过程。在包含n=10到15个变量的60个不等式的保留测试集上,我们的0.6B微调模型通过树搜索达到了85%的证明成功率。GPT-5.5在零样本提示下解决了1.7%的样本,而Psitip解决了33.3%的样本。跨训练上下文长度(4096 vs. 8192 token)和数据分布(n=9偏斜 vs. 非偏斜)的系统消融研究表明,4096 token的非偏斜训练分布表现最佳,而扩展上下文和偏斜数据没有带来边际收益。我们进一步识别了两种主要的失败模式——格式失败和步骤质量退化——并通过受控消融验证了束评分启发式的必要性(随机评分将成功率从83%降至23%)。

英文摘要

Proving Shannon-type entropy inequalities is a fundamental task in information theory that often requires constructing non-trivial linear combinations of known constraints, which is a combinatorial search problem that scales poorly with the number of random variables. We investigate whether small-scale large language models (0.6B--1.7B parameters), fine-tuned on atomic proof steps and combined with guided beam search, can automate this process. On a held-out test set of 60 inequalities spanning n=10 to 15 variables, our 0.6B fine-tuned model achieves an 85% proof success rate with tree search. GPT-5.5 solves 1.7% of the samples under zero-shot prompting, while Psitip solves 33.3%. A systematic ablation study across training context length (4096 vs. 8192 tokens) and data distribution (n=9-skewed vs. non-skewed) reveals that a 4096-token, non-skewed training distribution yields the best performance, with extended context and skewed data providing no marginal benefit. We further identify two dominant failure modes -- format failures and step quality degradation -- and verify that the beam-scoring heuristic is essential via a controlled ablation (random scoring reduces success from 83% to 23%).
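
The guided beam search named in this abstract can be sketched generically. In the sketch below, `expand` plays the role of the fine-tuned model proposing atomic steps and `score` plays the role of the beam-scoring heuristic; both are hypothetical callables, and the toy arithmetic puzzle is only a stand-in for a proof state, not the paper's setup.

```python
def beam_search(start, expand, score, is_goal, beam_width=3, max_depth=10):
    """Keep the `beam_width` highest-scoring partial paths at each depth.
    Returns the list of steps that reaches a goal state, or None if none is found."""
    beam = [(start, [])]
    for _ in range(max_depth):
        candidates = []
        for state, path in beam:
            for step, nxt in expand(state):
                new_path = path + [step]
                if is_goal(nxt):
                    return new_path
                candidates.append((nxt, new_path))
        if not candidates:
            return None
        candidates.sort(key=lambda c: score(c[0]), reverse=True)
        beam = candidates[:beam_width]
    return None

# Toy stand-in for a proof search: reach 10 from 1 using "+1" and "*2" steps.
TARGET = 10
steps_found = beam_search(
    start=1,
    expand=lambda n: [("+1", n + 1), ("*2", n * 2)],
    score=lambda n: -abs(TARGET - n),   # heuristic: closer to the target is better
    is_goal=lambda n: n == TARGET,
    beam_width=2,
)
```

The heuristic decides which partial paths survive pruning, so replacing `score` with a random function would discard promising states, in line with the ablation the abstract reports.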

2606.05718 2026-06-05 cs.CV cs.AI cs.LG 版本更新

ViCuR: Visual Cues as Recoverable Privilege for Multimodal On-Policy Distillation

ViCuR: 视觉线索作为多模态在策略蒸馏中的可恢复特权

Kanghui Tian, Siyuan Liu, Ziang Yan, Sheng Xia, Shuai Dong, Yi Wang

发表机构 * Shanghai AI Laboratory(上海人工智能实验室) Fudan University(复旦大学) Nanjing University(南京大学)

AI总结 提出ViCuR框架,通过将教师特权从答案侧替换为输入中的视觉线索,并引入轻量级线索恢复模块,解决多模态在策略蒸馏中的训练-测试不匹配问题,在七个基准上显著提升学生模型性能。

Comments 25 pages, 11 figures. Preprint, under review

详情
AI中文摘要

在策略蒸馏(OPD)通过在教师监督下,对学生自身策略采样的轨迹进行训练来改进推理。在多模态推理中,一种常见的扩展是使用特权教师,该教师观察仅在训练时可用的信号,如参考答案或理由。然而,这种答案侧特权造成了训练-测试不匹配:教师的监督可能依赖于学生无法获得的信号,鼓励捷径模仿而非基于视觉的推理。我们提出ViCuR,一种基于视觉的特权教师蒸馏框架,用视觉线索(输入中与查询相关的证据)取代答案侧特权。由于这些线索来源于推理时可用的相同视觉输入,它们的证据可由学生恢复。为此,ViCuR引入了一个轻量级线索恢复模块,在预填充期间使用专用的汇点令牌交叉注意力,将任务相关的视觉证据聚合到内部表示中,而不改变推理接口或需要辅助的线索生成损失。在七个基准上,使用Qwen3-VL-2B和8B学生,ViCuR在总体平均性能上持续优于基于答案的在策略自蒸馏,分别提升+1.19和+1.24。它还能自然地扩展到更强的教师OPD,超越OPD基线+0.64和+1.08,并在8B规模上具有一致的域外增益。这些结果表明,在多模态在策略蒸馏中,教师特权的设计与教师强度同等重要。

英文摘要

On-policy distillation (OPD) improves reasoning by training a student on trajectories sampled from its own policy under supervision from a teacher. In multimodal reasoning, a common extension is to use a privileged teacher that observes training-time-only signals such as reference answers or rationales. However, such answer-side privilege creates a train-test mismatch: the teacher's supervision may depend on signals unavailable to the student, encouraging shortcut imitation rather than visually grounded reasoning. We propose ViCuR, a visually grounded privileged-teacher distillation framework that replaces answer-side privilege with visual cues (query-related evidence in the input). Because these cues are derived from the same visual input available at inference, their evidence is recoverable by the student. To support this, ViCuR introduces a lightweight cue recovery module that uses dedicated sink-token cross-attention during prefill to aggregate task-relevant visual evidence into an internal representation, without changing the inference interface or requiring auxiliary cue-generation losses. Across seven benchmarks with Qwen3-VL-2B and 8B students, ViCuR consistently improves over answer-based on-policy self-distillation by +1.19 and +1.24 on overall average performance. It also extends naturally to stronger-teacher OPD, surpassing OPD baselines by +0.64 and +1.08, with consistent out-of-domain gains at the 8B scale. These results show that, in multimodal on-policy distillation, the design of teacher privilege is as important as teacher strength.

2606.05714 2026-06-05 cs.CR cs.LG 版本更新

Hybrid CNN-LSTM Framework for Intelligent Cyber Attack Detection and Prevention in U.S. Critical Digital Infrastructure: A Comparative Machine Learning Evaluation on CSE-CIC-IDS2018

混合CNN-LSTM框架用于美国关键数字基础设施的智能网络攻击检测与防御:基于CSE-CIC-IDS2018的机器学习比较评估

Md. Iqbal Hossan, Md. Serajul Kabir Chowdhury Rubel, Md. Arifur Rahman, B. M. Taslimul Haque

发表机构 * Department of Computer Science, Maharishi International University(玛赫西国际大学计算机科学系) Department of Information Studies, Trine University(特林大学信息学系) Department of Business Information Systems, Central Michigan University(中央密歇根大学商业信息系统系)

AI总结 提出一种结合CNN和LSTM的混合深度学习框架,利用CSE-CIC-IDS2018数据集进行网络攻击检测与防御,通过比较多种机器学习模型,实现高精度入侵检测和自动防御。

Comments 25 pages, 9 figures, CSE CIC IDS2018 dataset, Hybrid CNN LSTM, cyber attack detection

详情
Journal ref
Journal of Ai ML DL, 1(1), 2025
AI中文摘要

美国数字基础设施正在快速增长,因此,关键领域(包括医疗、金融、交通、能源和政府系统)面临的先进网络威胁也在增加。传统的网络安全方法,包括基于签名的入侵检测系统,已无法有效应对当今的网络攻击,因为它们无法实时检测未知和变化的攻击。为了克服这些限制,本研究提出了一种智能网络防御系统,利用人工智能(AI)和机器学习(ML)算法来检测和预防美国数字基础设施中的网络攻击。本研究使用CSE-CIC-IDS2018数据集,这是一个真实的网络流量数据集,包含各种网络攻击场景,包括分布式拒绝服务(DDoS)、暴力攻击、僵尸网络、渗透攻击和基于Web的攻击。实施并评估了多种机器学习和深度学习模型,如随机森林、XGBoost、卷积神经网络(CNN)和长短期记忆(LSTM)网络,用于识别恶意网络行为并提高入侵检测的准确性。所提出的框架结合了数据预处理、特征工程、实时流量监控、智能威胁分类和自动防御机制,以增强网络安全弹性。

英文摘要

Digital infrastructure is growing at a rapid pace in the United States, and as a result, exposure to advanced cyber threats to critical sectors including healthcare, finance, transportation, energy and government systems is growing. The traditional cybersecurity approaches, including signature-based intrusion detection systems, have become less effective against today's cyber attacks, as they are unable to detect unknown and changing attacks in real time. To overcome these constraints, this research suggests a smart cyber-defense system, which utilizes Artificial Intelligence (AI) and Machine Learning (ML) algorithms in the detection and prevention of cyber attacks in the U.S. digital infrastructure. This study uses the CSE-CIC-IDS2018 dataset, which is a realistic network traffic dataset, along with various cyber attack scenarios, including Distributed Denial of Service (DDoS), brute force attacks, botnets, infiltration attacks, and web-based attacks. A number of machine learning and deep learning models such as Random Forest, XGBoost, Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks are implemented and evaluated to be used in identifying malicious network behavior and boosting the accuracy of intrusion detection. The framework proposed combines data preprocessing, feature engineering, real-time traffic monitoring, intelligent threat classification, and automated prevention mechanisms to build cybersecurity resilience.

2606.05704 2026-06-05 cs.AI cs.LG 版本更新

Critic-Guided Heterogeneous Multi-Agent Reasoning for Reliable Mathematical Problem Solving

基于评论的异构多智能体推理用于可靠的数学问题求解

Muhammad Talha Sharif, Abdul Rehman

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出一种基于评论的异构多智能体框架,通过生成器-验证器结构和自适应学习系统,利用中间反馈评估和引导推理过程,在GSM8K基准上实现高达13%的准确率提升,并减少对大模型的依赖。

Comments 6 pages

详情
AI中文摘要

近期的大语言模型(LLMs)展示了令人印象深刻的推理能力;但在复杂数学推理问题中,它们仍然容易产生幻觉、中间推理错误以及不可靠的推理结果。在本研究中,我们引入了一种基于评论的异构多智能体方法,以提高数学推理的可靠性。该框架整合了多个不同专长的LLM智能体,并采用评论驱动的自适应学习系统,基于中间反馈评估和引导推理过程。系统采用生成器-验证器框架,验证器不仅判断正确性,还提供评论以指导解决方案的重新生成。这允许自适应错误纠正并防止错误级联。我们在GSM8K基准上的实验表明,所提方法相比单次和非评论模型实现了高达13%的准确率提升。此外,研究结果表明,异构性和评论减少了对大模型的需求,使较小模型也能达到相当的性能。消融研究显示,主要性能提升归因于基于评论的反馈循环,而非模型大小。总之,所提方法展示了结合异构多智能体协作与评论以获得可靠且可解释推理系统的优势。

英文摘要

Recent Large Language Models (LLMs) have shown impressive reasoning abilities, but they are still susceptible to hallucinations, intermediate reasoning mistakes, and unreliable reasoning results in complex mathematical reasoning problems. In this study, we introduce a critic-based heterogeneous multi-agent approach to improve the dependability of mathematical reasoning. This framework incorporates several LLM agents of different specialties and employs a critic-driven adaptive learning system to assess and guide the reasoning process based on intermediate feedback. The system adopts a generator-validator framework, with the validator not only determining correctness but also offering critiques to guide regeneration of solutions. This allows for adaptive error correction and prevents error cascading. Our experiments on the GSM8K benchmark show that the proposed method achieves up to 13% accuracy improvement over single-shot and non-critic models. Additionally, findings suggest that heterogeneity and critique reduce the need for large models, allowing smaller models to perform on par. Ablation studies reveal the main performance gains are due to the critic-based feedback loop and not model size. In summary, the proposed approach showcases the benefits of combining heterogeneous multi-agent collaboration and critique to obtain reliable and interpretable reasoning systems.
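
The generator-validator loop described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: `generate` and `validate` are hypothetical callables standing in for LLM agents, and the toy agents below are hand-written so the example runs without any model.

```python
def solve_with_critic(problem, generate, validate, max_rounds=3):
    """Ask the generator for an answer, let the validator judge it, and feed the
    validator's critique back into the next generation round.
    Returns (answer, rounds_used), or (None, max_rounds) if no answer is accepted."""
    critiques = []
    for round_idx in range(1, max_rounds + 1):
        answer = generate(problem, critiques)
        ok, critique = validate(problem, answer)
        if ok:
            return answer, round_idx
        critiques.append(critique)
    return None, max_rounds

# Toy agents: the generator first multiplies by mistake, then follows the critique.
def toy_generate(problem, critiques):
    a, b = problem
    return a + b if critiques else a * b

def toy_validate(problem, answer):
    a, b = problem
    if answer == a + b:
        return True, ""
    return False, "The question asks for a sum; use addition."

result = solve_with_critic((3, 4), toy_generate, toy_validate)
```

The validator returns a critique and not only a verdict, so the second attempt has something to act on; a bare pass/fail signal would leave the generator to resample blindly.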

2606.05700 2026-06-05 cs.CV cs.LG 版本更新

T-SAR-JEPA: Self-Supervised Temporal Anomaly Detection in SAR Amplitude Stacks via Latent Prediction

T-SAR-JEPA:通过潜在预测在SAR幅度堆栈中进行自监督时间异常检测

Kerod Woldesenbet, Abem Woldesenbet

发表机构 * Independent Researcher(独立研究者) Dakota State University(达科塔州立大学)

AI总结 提出T-SAR-JEPA框架,通过自监督潜在预测在SAR幅度堆栈中检测时间异常,在DFC 2026数据集上达到77.0%的ROC-AUC,优于多种基线方法。

Comments Won IEEE GRSS Data Fusion Contest 2026; to appear in IGARSS 2026 proceedings

详情
AI中文摘要

我们提出了T-SAR-JEPA,一个通过潜在预测在SAR幅度堆栈中进行时间异常检测的自监督框架。来自SAR-JEPA的ViT-Base/16编码器在39,300个Capella图像块上通过局部掩码重建和梯度特征预测进行领域自适应。一个带有正弦时间编码的时间Transformer从K=7次采集中预测未来潜在状态,渐进式解冻显著降低了验证损失。该模型仅基于幅度操作;InSAR相干性仅作为独立的伪真实标签。在DFC 2026数据集(300个时间序列,三个感兴趣区域)上,T-SAR-JEPA在夏威夷喷发窗口上实现了77.0%的ROC-AUC,优于RX、PaDiM、线性AR和LSTM基线(约50%)。99.9%的空间一致性(p < 0.001,置换检验)确认了结构化检测。代码:https://github.com/TerraLatent/t-sar-jepa

英文摘要

We present T-SAR-JEPA, a self-supervised framework for temporal anomaly detection in SAR amplitude stacks via latent prediction. A ViT-Base/16 encoder from SAR-JEPA is domain-adapted on 39,300 Capella patches using local masked reconstruction with gradient feature prediction. A temporal transformer with sinusoidal time encoding forecasts future latent states from K=7 acquisitions, with progressive unfreezing substantially reducing validation loss. The model operates on amplitude alone; InSAR coherence serves exclusively as independent pseudo-ground-truth. On the DFC 2026 dataset (300 time-series, three AOIs), T-SAR-JEPA achieves ROC-AUC of 77.0% on the Hawaii eruption window, outperforming RX, PaDiM, Linear AR, and LSTM baselines (~50%). Spatial coherence of 99.9% (p < 0.001, permutation test) confirms structured detections. Code: https://github.com/TerraLatent/t-sar-jepa
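
Anomaly detection via latent prediction, as summarized above, amounts to scoring each location by how far the observed latent falls from the forecast one, then evaluating those scores against labels with ROC-AUC. The sketch below assumes a squared-error score and a pairwise AUC computation; the paper's exact scoring function is not given in the abstract, so both are illustrative choices.

```python
def anomaly_score(predicted_latent, observed_latent):
    """Squared L2 distance between the forecast latent and the one actually observed.
    A large value means the scene changed in a way the model did not anticipate."""
    return sum((p - o) ** 2 for p, o in zip(predicted_latent, observed_latent))

def roc_auc(scores, labels):
    """Probability that a random anomalous sample (label 1) scores higher than a
    random normal sample (label 0); ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

score = anomaly_score([1.0, 2.0], [1.0, 4.0])
auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

An AUC of 0.5 corresponds to chance-level ranking, which is why the baselines reported at roughly 50% in the abstract are effectively uninformative on this task.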

2606.05695 2026-06-05 cs.LG 版本更新

Revisiting Prototype Rehearsal for Exemplar-Free Continual Learning: Manifold-Aware Boundary Sampling with Adaptive Class-Balanced Loss

重新审视原型重放用于无样本持续学习:基于流形感知边界采样与自适应类别平衡损失

Hongye Xu, Bartosz Krawczyk

发表机构 * Chester F. Carlson Center for Imaging Science(切斯特·F·卡森成像科学中心) Rochester Institute of Technology(罗切斯特理工学院)

AI总结 针对无样本类增量学习,提出流形感知边界采样和自适应类别平衡损失,通过生成边界感知重放样本和动态调整类别权重,使原型重放方法恢复竞争力并达到最先进性能。

Comments Published in CVPR 2026 Findings. 10 pages, 6 figures. CVF version: https://openaccess.thecvf.com/content/CVPR2026F/html/Xu_Revisiting_Prototype_Rehearsal_for_Exemplar-Free_Continual_Learning_Manifold-Aware_Boundary_Sampling_CVPRF_2026_paper.html. Code: https://github.com/HXuSz11/ACB_CEOS_CVPR2026_Findings

详情
AI中文摘要

无样本类增量学习旨在随时间获取新类别而不存储原始数据。历史上,原型重放(在存储的类原型周围采样并与当前任务数据混合)是减少灾难性遗忘的流行策略。然而,最近的漂移补偿方法通过在演化特征空间中显式重新对齐原型,持续优于基于原型的重放,引发了对重放本身是否根本受限的疑问。我们认为性能差距并非源于原型重放的思想本身,而是源于其典型的实现方式:现有方法将原型视为孤立的类摘要,忽略了来自邻近敌对类的信息,并且未能纠正少量合成旧类样本与来自新引入类别的数百个真实实例之间出现的类别不平衡。基于这一假设,我们重新审视原型重放,并提出一种流形感知变体,以恢复其在无样本类增量学习中的竞争力。首先,我们引入约束扩展过采样,将每个旧类原型向其最近的新类敌对特征进行插值,生成边界感知的重放样本,这些样本更好地遵循底层数据流形,同时保持类间分离。其次,我们设计了一种自适应类别平衡损失,执行基于时间的类别加权,在旧原型信息量最大时放大其梯度,并随着后续任务积累更丰富的监督而逐渐退火其影响。这些组件共同将原型重放转变为一种抗漂移、感知不平衡的机制,缩小甚至逆转了与近期漂移补偿方法的差距,在多个无样本类增量学习基准上实现了最先进的性能。

英文摘要

Exemplar-free class-incremental learning (EFCIL) aims to acquire new classes over time without storing raw data. Historically, prototype rehearsal, which samples around stored class prototypes and mixes them with current-task data, has been a popular strategy to reduce catastrophic forgetting. However, recent drift-compensation methods that explicitly realign prototypes in the evolving feature space consistently outperform prototype-based rehearsal, raising the question of whether rehearsal itself is fundamentally limited. We argue that the performance gap stems not from the idea of prototype rehearsal per se, but from how it is typically instantiated: existing approaches treat prototypes as isolated class summaries that ignore information from nearby enemy classes, and fail to correct the emerging class imbalance between a handful of synthetic old-class samples and hundreds of real instances from newly introduced classes. Building on this hypothesis, we revisit prototype rehearsal and propose a manifold-aware variant that restores its competitiveness in EFCIL. First, we introduce Constrained Expansive Over-Sampling, which interpolates each old-class prototype toward its nearest enemy features from new classes, generating boundary-aware rehearsal samples that better follow the underlying data manifold while preserving inter-class separation. Second, we design an Adaptive Class-Balanced loss that performs time-based class weighting, amplifying gradients from older prototypes when they are most informative and gradually annealing their influence as richer supervision from later tasks accumulates. Together, these components turn prototype rehearsal into a drift-resilient, imbalance-aware mechanism that closes, and often reverses, the gap to recent drift-compensation methods, achieving state-of-the-art performance across multiple EFCIL benchmarks.
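
The boundary-aware sampling idea in this abstract (interpolating an old-class prototype toward its nearest enemy feature) can be sketched as below. The constraint used here, a cap `max_step` below 0.5 so that each sample stays closer to the prototype than to the enemy, is an assumption for illustration; the abstract does not specify how Constrained Expansive Over-Sampling bounds the interpolation.

```python
import math
import random

def nearest_enemy(prototype, new_class_features):
    """Return the new-class feature vector closest to the old-class prototype."""
    return min(new_class_features, key=lambda f: math.dist(prototype, f))

def boundary_samples(prototype, new_class_features, n_samples=5, max_step=0.4, seed=0):
    """Generate rehearsal samples on the segment from the prototype toward its
    nearest enemy. Keeping max_step < 0.5 leaves every sample on the old-class side."""
    rng = random.Random(seed)
    enemy = nearest_enemy(prototype, new_class_features)
    samples = []
    for _ in range(n_samples):
        lam = rng.uniform(0.0, max_step)
        samples.append([p + lam * (e - p) for p, e in zip(prototype, enemy)])
    return samples

proto = [0.0, 0.0]
new_feats = [[4.0, 0.0], [0.0, 10.0]]
samples = boundary_samples(proto, new_feats)
```

Sampling along the direction of the nearest enemy concentrates rehearsal data where the old and new classes are most likely to be confused, in contrast to isotropic sampling around the prototype.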

2606.05689 2026-06-05 cs.LG 版本更新

Causal Modeling of Selection in Evolution

进化中选择的因果建模

Haoyue Dai, Zeyu Tang, Peter Spirtes, Kun Zhang

发表机构 * University of Washington(华盛顿大学) Carnegie Mellon University(卡内基梅隆大学)

AI总结 本文区分了静态选择与进化选择两种形式,针对进化选择提出新因果模型,并开发了从数据中识别该模型的完整方法。

Comments Appears at ICML 2026 (spotlight)

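Details
AI-generated Chinese abstract (translated)

Understanding the selection latent in data is crucial for causal discovery. We argue that "selection" in common narratives takes two forms, which we call static selection and evolutionary selection. Static selection is a one-shot filtering process in which the observed data are a subset of the population of interest, as in survey volunteer bias. Evolutionary selection, by contrast, operates through repeated rounds of differential fitness in reproduction, so the observed data are the latest generation shaped by a historical trajectory, as in immune adaptation, antibiotic resistance, and the emergence of social norms. Existing methods mostly conflate the two forms and rely on the same graphical model of selection. We show that this model is valid in the static setting but cannot characterize data under evolution, which leads to false discoveries. To address this, we introduce a new model that specifically characterizes evolutionary selection and develop a sound and complete procedure for identifying such models from data spanning one or more environments or generations. Experiments validate the method's ability to reveal evolution-related mechanisms from data.

English abstract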

Understanding potential selection in data is crucial for causal discovery; we argue that "selection" in common narratives takes two forms, which we term static and evolutionary selection, respectively. Static selection refers to a one-shot filtering process where observed data consist of a subset of the population of interest, as in survey volunteer bias. Evolutionary selection, in contrast, operates through repeated rounds of differential fitness in reproduction, where observed data constitute the latest generation shaped by a historical trajectory, as in immune adaptation, antibiotic resistance, and social norm emergence. Existing methods largely conflate these two forms and rely on an identical graphical model of selection. We show that this model is valid for static settings but fails to characterize data under evolution, yielding false discovery results. To address this, we introduce a new model that specifically characterizes evolutionary selection, and develop a sound and complete procedure for identifying such models from data across one or multiple environments or generations. Experimental results validate the method's ability to uncover the relevant mechanisms underlying evolution from data.

2606.05680 2026-06-05 cs.PL cs.AR cs.LG (updated version)

CASS-RTL: Correctness-Aware Subspace Steering for RTL Generation with LLMs

Mohammad Akyash, Nowfel Mashnoor, Kimia Azar, Hadi Kamali

Affiliations * Department of Electrical and Computer Engineering (ECE), University of Central Florida, Orlando, FL 32816, USA

AI summary Proposes CASS-RTL, a framework that identifies attention heads in LLMs associated with RTL correctness and builds a low-dimensional subspace for lightweight intervention, improving the functional accuracy of generated RTL code without additional supervision or retraining.

Comments Accepted to the IEEE International Conference on LLM-Aided Design (LAD '26)

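Details
AI-generated Chinese abstract (translated)

Recent advances in large language models (LLMs) have made it possible to automatically synthesize (generate) register-transfer level (RTL) code from natural-language instructions, offering a promising route to faster chip design. Unlike typical natural-language (and software coding) tasks, LLM-based RTL code generation demands strict cycle accuracy and concurrency, and minor logic errors can render a circuit unusable or insecure. Although prior work has explored hallucination mitigation through external verification, self-evaluation prompts, retrieval-augmented prompting, domain-specific fine-tuning, agentic solutions, and reasoning, these approaches mostly overlook attention-oriented internal mechanisms of LLMs that may inherently correlate with RTL correctness. This paper proposes CASS-RTL, the first framework to discover and exploit the correctness-aware components of LLMs to steer RTL generation toward functionally accurate outputs. We (i) identify attention heads whose activation patterns consistently separate correct from incorrect RTL; (ii) construct a low-dimensional subspace that captures correctness-related signals; and (iii) design a lightweight, geometry-aware intervention that steers the model at inference time. CASS-RTL is fully model-agnostic, needs no additional supervision or retraining, and integrates easily into existing models. Empirically, we evaluate CASS-RTL on multiple models and observe a 10%-20% improvement in pass@1/5/10 accuracy on VerilogEval and a 5% improvement on CVDP, demonstrating that the method improves reliability without sacrificing model efficiency or requiring a large labeled dataset for fine-tuning.

English abstract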

Recent advances in large language models (LLMs) have enabled the automatic synthesis (generation) of register-transfer level (RTL) code from natural language instructions, offering a promising pathway to accelerate chip design. Unlike typical natural language (and software coding) tasks, LLM-based RTL code generation demands strict cycle accuracy with concurrency, where minor logical errors can render a circuit unusable or insecure. While prior work has explored hallucination mitigation via external verification, self-evaluation prompts, retrieval-augmented prompting, domain specific fine-tuning, agentic solutions, and reasoning, these approaches largely overlook the attention-oriented internal mechanisms of LLMs that may inherently correlate with RTL correctness. This work proposes CASS-RTL, a first-of-its-kind framework for discovering and leveraging LLMs' correctness-aware components to guide RTL generation toward functionally accurate outputs. We (i) identify attention heads whose activation patterns consistently differentiate correct from incorrect RTL; (ii) construct a low-dimensional subspace capturing correctness-relevant signals; and (iii) design a lightweight, geometry-aware intervention that steers the model at inference time. CASS-RTL is fully model-agnostic, requires no additional supervision or retraining, and readily integrates into existing models. Empirically, we evaluate CASS-RTL on multiple models and observe 10%-20% improvement in pass@1/5/10 accuracy on VerilogEval and 5% improvement on CVDP, demonstrating the effectiveness of our method in enhancing reliability without sacrificing model efficiency or requiring a large labeled dataset for fine-tuning.

2606.05675 2026-06-05 cs.LG cs.CV (updated version)

Two-Way Is Better Than One: Bidirectional Alignment with Cycle Consistency for Exemplar-Free Class-Incremental Learning

Hongye Xu, Bartosz Krawczyk

Affiliations * Chester F. Carlson Center for Imaging Science; Rochester Institute of Technology

AI summary Proposes BiCyc, which uses bidirectional projector alignment and a cycle-consistency objective to address prototype drift and one-directional projection bias in exemplar-free class-incremental learning, reducing catastrophic forgetting and improving accuracy.

Comments Published as a conference paper at ICLR 2026. 23 pages, 8 figures. Code: https://github.com/HXuSz11/BiCyc_ICLR2026

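Details
AI-generated Chinese abstract (translated)

Continual learning (CL) aims to let models acquire new skills without forgetting prior knowledge. In exemplar-free class-incremental learning (EFCIL), this challenge is amplified because past data cannot be stored, and representation drift for old classes is especially harmful. Prototype-based EFCIL is attractive for its efficiency, but prototypes drift as the embedding space evolves, so projection-based drift compensation has become a popular remedy. We show, however, that existing one-directional projections introduce systematic bias: they either retroactively distort the current feature geometry or align old classes only locally, producing cycle inconsistencies that accumulate across tasks. We propose BiCyc, a bidirectional projector alignment method with a cycle-consistency objective. BiCyc jointly optimizes two maps (old-to-new and new-to-old) and uses stop-gradient gating so that transport and representation co-evolve. Our analysis shows that the cycle loss contracts the singular spectrum toward unity in whitened space, and that improved transport of class means and covariances yields smaller perturbations of the classification log-odds, preserving old-class decisions and mitigating catastrophic forgetting. Empirically, on standard EFCIL benchmarks, BiCyc significantly reduces forgetting and improves accuracy in the from-scratch setting while remaining competitive in the pretrained fine-grained scenario.

English abstract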

Continual learning (CL) seeks models that acquire new skills without erasing prior knowledge. In exemplar-free class-incremental learning (EFCIL), this challenge is amplified because past data cannot be stored, making representation drift for old classes particularly harmful. Prototype-based EFCIL is attractive for its efficiency, yet prototypes drift as the embedding space evolves; therefore, projection-based drift compensation has become a popular remedy. We show, however, that existing one-directional projections introduce systematic bias: they either retroactively distort the current feature geometry or align past classes only locally, leaving cycle inconsistencies that accumulate across tasks. We introduce BiCyc, a bidirectional projector alignment approach with a cycle-consistency objective. BiCyc jointly optimizes two maps, old-to-new and new-to-old, with stop-gradient gating so that transport and representation co-evolve. Analytically, we show that the cycle loss contracts the singular spectrum toward unity in whitened space, and that improved transport of class means and covariances yields smaller perturbations of classification log-odds, preserving old-class decisions and mitigating catastrophic forgetting. Empirically, across standard EFCIL benchmarks, BiCyc substantially reduces forgetting and improves accuracy in from-scratch settings, while remaining competitive in the pretrained fine-grained regime.
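A cycle-consistency objective over two maps can be illustrated with a minimal Python sketch. It assumes the old-to-new and new-to-old projectors are plain linear maps stored as lists of rows, and it leaves out the stop-gradient gating and whitening mentioned in the abstract. The names `apply`, `sq_dist`, and `cycle_loss` are hypothetical, not taken from the BiCyc code.

```python
def apply(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cycle_loss(old_to_new, new_to_old, old_feats, new_feats):
    """Mean squared round-trip error over both directions.

    forward:  old feature -> new space -> back to old space
    backward: new feature -> old space -> back to new space
    """
    forward = sum(
        sq_dist(apply(new_to_old, apply(old_to_new, x)), x) for x in old_feats
    )
    backward = sum(
        sq_dist(apply(old_to_new, apply(new_to_old, y)), y) for y in new_feats
    )
    return (forward + backward) / (len(old_feats) + len(new_feats))
```

The loss is zero exactly when the two maps invert each other on the given features, which is the property a cycle-consistency term rewards.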

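2606.05649 2026-06-05 stat.CO cs.LG (updated version)

Diff2SP: Diffusion Models for Correlated Scenario Generation in Stochastic Programming

Haixiang Sun, Andrew Liu

Affiliations * Purdue University

AI summary Proposes Diff2SP, a diffusion-based generative framework that embeds downstream optimization objectives into scenario generation, balancing statistical consistency with decision awareness, supported by theoretical proofs and empirical validation.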

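Details
AI-generated Chinese abstract (translated)

Scenario generation is a key component of stochastic programming (SP) and directly affects the quality of decisions under uncertainty. Existing methods rely mainly on sampling-based techniques or on supervised learning with neural networks. Sampling-based methods often struggle to capture complex dependencies and rare but plausible events, while supervised learning needs fixed input-output pairs for training and has limited ability to generate diverse, realistic scenarios that are not constrained by predefined patterns or rules. To address these limitations, we introduce Diff2SP, a diffusion-based generative framework that incorporates downstream optimization objectives directly into scenario generation. Unlike conventional methods that treat scenario generation and decision-making as separate steps, Diff2SP embeds stochastic optimization in the training process, producing scenarios that are both statistically consistent and decision-aware. To formally justify this optimization-aware design, we establish regret bounds that link distributional accuracy to decision quality, along with sample complexity guarantees showing faster convergence than traditional generative models such as GANs. Empirical results on synthetic and power-system datasets validate these theoretical insights, showing that Diff2SP consistently improves both statistical fidelity and downstream optimization outcomes.

English abstract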

Scenario generation is a critical component in stochastic programming (SP), as it directly influences the quality of decision-making under uncertainty. Existing approaches predominantly rely on either sampling-based techniques or supervised learning using neural networks. Sampling-based techniques often struggle to capture complex dependencies and rare but plausible events, while supervised learning requires fixed input-output pairs for training and is limited in its ability to generate a wide variety of realistic scenarios that are not restricted by predefined patterns or rules. To address these limitations, we introduce Diff2SP, a diffusion-based generative framework that incorporates downstream optimization objectives directly into scenario generation. Unlike conventional methods that treat scenario generation and decision-making as separate steps, Diff2SP embeds stochastic optimization into the training process, enabling the generation of scenarios that are both statistically coherent and decision-aware. To formally justify this optimization-aware design, we establish regret bounds that link distributional accuracy to decision quality, and establish sample complexity guarantees showing faster convergence than traditional generative models such as GANs. Empirical results on both synthetic and power-system datasets validate these theoretical insights, demonstrating that Diff2SP consistently improves both statistical fidelity and downstream optimization outcomes.

2606.05639 2026-06-05 cs.LG (updated version)

Q-GNN: Query-Conditioned Graph Neural Networks with Type Awareness for Knowledge Graph Completion

Dongxiao He, Ruqiong Zhang, Zhizhi Yu, Ling Ding, Di Jin, Guangquan Xu, Zhiyong Feng

Affiliations * College of Intelligence and Computing, Tianjin University

AI summary Proposes Q-GNN, which strengthens graph neural network reasoning for knowledge graph completion by incorporating the query entity's structural context and semantic type information.

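Details
AI-generated Chinese abstract (translated)

Knowledge graph completion (KGC) aims to predict missing triplets from incomplete knowledge graphs, which is crucial for downstream applications. In recent years, graph neural network (GNN)-based methods have achieved remarkable success by passing messages over query-centered local subgraphs. In practice, however, a query is defined jointly by an entity and a relation, both of which carry information indispensable for reasoning. These methods rely only on the query relation as the guiding signal, and the information inherent in the query entity is not used to guide inference; the entity serves merely as a structural anchor for subgraph extraction. We therefore incorporate query entity information into the reasoning process from two perspectives. The first is structural context, that is, the neighborhood structure and relation patterns around the entity, which a dedicated context encoder encodes and uses to modulate messages. The second is the entity's semantic type, inferred by a large language model and incorporated into the attention computation and final scoring to provide type-level prior constraints. Together, these two kinds of information let the reasoning process be guided by both the query relation and the query entity. Experimental results on standard benchmarks demonstrate the effectiveness of the proposed Q-GNN.

English abstract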

Knowledge Graph Completion (KGC) aims at predicting missing triplets from incomplete knowledge graphs, which is crucial for downstream applications. Recently, Graph Neural Network (GNN)-based methods have achieved remarkable success by performing message passing over query-centered local subgraphs. However, in practice, a query is jointly defined by both the entity and the relation, with both carrying information indispensable for reasoning, yet these methods rely solely on the query relation as the guiding signal, while the information inherent in the query entity is not leveraged to guide inference - the entity serves merely as a structural anchor for subgraph extraction. To this end, we incorporate query entity information into the reasoning process from two perspectives: the first is structural context, i.e., the neighboring structure and relation patterns around the entity, which is encoded by a dedicated context encoder and used to modulate messages; the second is semantic type of the entity, inferred by a large language model, which is incorporated into attention computation and final scoring to provide type-level prior constraints. Together, these two sources of information enable the reasoning process to be guided by both the query relation and the query entity. Experimental results on standard benchmarks demonstrate the effectiveness of the proposed Q-GNN.

2606.05636 2026-06-05 cs.LG (updated version)

StableRCA: Robust Graph-Agnostic Mechanism-Level Root Cause Analysis

Xiaoyu Lin, Nicholas Tagliapietra, Kehan Li, Lavdim Halilaj, Juergen Luettin

Affiliations * Department of Computer Science, Tsinghua University; Bosch Center for Artificial Intelligence; Computer Science Department, TU Darmstadt

AI summary Proposes StableRCA, a framework that estimates local Markov boundaries and detects conditional distribution shifts, avoiding global graph discovery and enabling robust mechanism-level root cause analysis.

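Details
AI-generated Chinese abstract (translated)

Root cause analysis (RCA) aims to identify the variables responsible for abnormal system behavior in complex domains such as manufacturing, cloud computing, and healthcare. Existing methods face a key bottleneck: graph-based causal methods can identify intervention targets but usually require a known or accurately estimated causal graph, while graph-free statistical methods either localize marginal anomalies rather than structural causes or depend on restrictive assumptions about graph structure or functional form. We propose StableRCA, a local mechanism-level RCA framework that avoids global graph discovery by estimating local Markov boundaries and detecting conditional distribution shifts within them. Using the Independent Causal Mechanism principle, we prove that, under faithful Markov boundary recovery and non-degenerate mechanism shifts, intervention targets can be identified with probability that converges exponentially in the sample size. Experiments on synthetic benchmarks and five real-world datasets show that StableRCA is robust to graph misspecification, effective with multiple intervention targets, scalable to large systems, and reliable across application domains. Code is available at https://anonymous.4open.science/r/StableRCA-E362

English abstract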

Root-Cause Analysis (RCA) seeks to identify the variables responsible for abnormal system behavior in complex domains such as manufacturing, cloud computing, and healthcare. Existing approaches face a critical bottleneck: graph-based causal methods can identify intervention targets but typically require a known or accurately estimated causal graph, while graph-free statistical methods either localize marginal anomalies rather than structural causes, or rely on restrictive assumptions about graph structure or functional form. We propose StableRCA, a local mechanism-level RCA framework that avoids global graph discovery by estimating local Markov boundaries and detecting conditional distribution shifts within them. Leveraging the Independent Causal Mechanism principle, we show that intervention targets can be identified with probability converging exponentially in sample size under faithful Markov boundary recovery and non-degenerate mechanism shifts. Experiments on synthetic benchmarks and five real-world datasets demonstrate that StableRCA is robust to graph misspecification, effective under multiple intervention targets, scalable to large systems, and reliable across diverse application domains. Code is available at: https://anonymous.4open.science/r/StableRCA-E362

2606.05626 2026-06-05 cs.CL cs.AI cs.LG (updated version)

When New Generators Arrive: Lifelong Machine-Generated Text Attribution via Ridge Feature Transfer

Zhen Sun, Yifan Liao, Zhicong Huang, Jiaheng Wei, Cheng Hong, Yutao Yue, Xinlei He

Affiliations * Wuhan University; Ant Group; The Hong Kong University of Science and Technology (Guangzhou); Institute of Deep Perception Technology, JITRI

AI summary Targets the difficulty of balancing continual adaptation to new generators with retention of old knowledge in lifelong machine-generated text attribution, and proposes RidgeFT, a lightweight analytic update framework that uses covariance calibration and fixed random features to perform closed-form updates without exemplar replay.

Comments 12 pages

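Details
AI-generated Chinese abstract (translated)

Machine-generated text (MGT) attribution aims to identify the specific generator of a given text, providing fine-grained evidence for model accountability and misuse investigation. As new large language models keep emerging, attribution models must continually incorporate new generators while retaining the ability to recognize previously seen ones. Prior work shows that this lifelong MGT attribution setting is challenging, and existing methods usually struggle to strike a stable balance between adapting to new classes and retaining old ones. To address this, we propose RidgeFT, a lightweight analytic update framework that does not rely on exemplar replay. RidgeFT trains a task-aware encoder on the initial generator set, stores compact class-wise sufficient statistics when each generator class is first observed, and then freezes the encoder for replay-free closed-form updates. It suppresses generator-irrelevant variation through covariance calibration, boosts representation capacity with fixed random features, and updates new classes through closed-form ridge regression on class-wise sufficient statistics. In multi-topic evaluations with different initial generator setups, RidgeFT consistently outperforms the baselines. It achieves the best macro-F1 across domains, backbones, and incremental protocols while improving both old-class retention and new-class adaptation. These results indicate that feature-stable analytic updates offer a simple yet effective approach to lifelong MGT attribution.

English abstract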

Machine-generated text (MGT) attribution aims to identify the specific generator responsible for a given text, thereby providing fine-grained evidence for model accountability and misuse investigation. As new large language models continue to emerge, attribution models must continuously incorporate new generators while preserving their ability to recognize previously seen ones. Prior works have shown that this lifelong MGT attribution setting is challenging, and existing methods often struggle to achieve a stable balance between adapting to new classes and retaining old ones. To address this issue, we propose RidgeFT, a lightweight analytic update framework that does not rely on exemplar replay. RidgeFT trains a task-aware encoder on the initial generator set, stores compact class-wise sufficient statistics when each generator class is first observed, and then freezes the encoder for replay-free closed-form updates. It then suppresses generator-irrelevant variation through covariance calibration, improves representation capacity with fixed random features, and updates new classes through closed-form ridge regression based on class-level sufficient statistics. Across multi-topic evaluations with varying initial generator setups, RidgeFT consistently outperforms baselines. It achieves the best macro-F1 across domains, backbones, and incremental protocols, while also improving both old-class retention and new-class adaptation. These results suggest that feature-stable analytic updates provide a simple yet effective approach to lifelong MGT attribution.
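The replay-free closed-form update can be sketched in plain Python under simplifying assumptions: features come from an already frozen encoder, targets are one-hot, and the stored statistics are the Gram matrix G (sum of x xᵀ) plus one feature sum per class, so that W = (G + λI)⁻¹C. Covariance calibration and random features are omitted, and `RidgeStats`, `solve`, and `predict` are hypothetical names rather than the RidgeFT API.

```python
def solve(a, b):
    """Solve a @ x = b (a: n x n, b: n x m) by Gauss-Jordan elimination."""
    n = len(a)
    aug = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


class RidgeStats:
    """Hypothetical container for the statistics kept instead of raw exemplars."""

    def __init__(self, dim, lam=1.0):
        self.dim, self.lam = dim, lam
        self.gram = [[0.0] * dim for _ in range(dim)]  # sum of x x^T, all classes
        self.class_sums = []                           # per class: sum of its features

    def add_class(self, features):
        """Fold one new class into the statistics; raw features can then be dropped."""
        total = [0.0] * self.dim
        for x in features:
            for i in range(self.dim):
                total[i] += x[i]
                for j in range(self.dim):
                    self.gram[i][j] += x[i] * x[j]
        self.class_sums.append(total)

    def weights(self):
        """Closed-form ridge solution W = (G + lam*I)^-1 C for one-hot targets."""
        reg = [[g + (self.lam if i == j else 0.0) for j, g in enumerate(row)]
               for i, row in enumerate(self.gram)]
        rhs = [[s[i] for s in self.class_sums] for i in range(self.dim)]
        return solve(reg, rhs)


def predict(weights, x):
    """Return the index of the class with the highest ridge score."""
    n_classes = len(weights[0])
    scores = [sum(weights[i][c] * x[i] for i in range(len(x)))
              for c in range(n_classes)]
    return max(range(n_classes), key=scores.__getitem__)
```

Because the statistics are additive, a new generator class only requires calling `add_class` and re-solving; no earlier text needs to be revisited.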

2606.05625 2026-06-05 cs.AI cs.LG (updated version)

Self-Commitment Latency: A Reward-Free Probe for Prompted Implicit Hacking

Bonan Shen, Youting Wang, Dingyan Shang, Tao Ning

Affiliations * Stanford University; Tsinghua University

AI summary Proposes self-commitment latency, a metric that measures when a reasoning context commits to the model's own final answer, detecting prompted implicit hacking without a reward signal and reaching AUROC 0.878-0.926 on the GSM8K dataset.

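Details
AI-generated Chinese abstract (translated)

Implicit reward hacking is hard to audit when a language model's chain of thought looks benign: the final answer may be anchored by a prompt shortcut while the written reasoning still resembles ordinary problem solving. Verifier-based probes expose such behavior by measuring how early truncated reasoning contexts obtain high reward, but they need a task-specific reward signal. This paper proposes a weaker-input alternative, self-commitment latency, which measures when a prompted reasoning context commits to the model's own final answer. We evaluate the probe in a controlled paired GSM8K setting using Qwen2.5-3B-Instruct-4bit, comparing ordinary prompts with prompts that contain an answer hint. Compared with honest contexts, hinted contexts commit significantly earlier and with lower uncertainty. The primary latency metric, first-commitment latency at threshold 0.8, reaches AUROC 0.878; the supporting whole-curve summaries reach AUROC 0.926 for commitment range and AUROC 0.904 for mean uncommitted mass. The signal is stronger when both prompt conditions answer correctly, and it remains stable across thresholds. These results show that shortcut-available reasoning contexts leave an early behavioral commitment signature that can be detected without a reward model, external judge, or trained classifier.

English abstract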

Implicit reward hacking is hard to audit when a language model's chain of thought appears benign: a final answer may be anchored by a prompt shortcut while the written reasoning still resembles ordinary problem solving. Verifier-based probes expose such behavior by measuring how early truncated reasoning contexts obtain high reward, but require a task-specific reward signal. This paper proposes a weaker-input alternative, self-commitment latency, which measures how early a prompted reasoning context commits to the model's own final answer. We evaluate the probe in a controlled paired GSM8K setting using Qwen2.5-3B-Instruct-4bit, comparing ordinary prompts with prompts that include an answer hint. Hinted contexts commit substantially earlier and with lower uncertainty than honest contexts. The primary latency metric, first-commitment latency at threshold 0.8, reaches AUROC 0.878; supporting whole-curve summaries reach AUROC 0.926 for commitment range and 0.904 for mean uncommitted mass. The signal is stronger when both prompt conditions answer correctly and remains stable across thresholds. These results show that shortcut-available reasoning contexts can leave an early behavioral commitment signature detectable without a reward model, external judge, or trained classifier.
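A first-commitment latency at a threshold can be illustrated with a small Python sketch. It assumes `commit_probs[i]` is the probability the model assigns to its own final answer after the i-th truncated reasoning prefix, and it normalizes the latency to the fraction of the trace consumed. Both choices, and the function name, are assumptions for illustration; the paper's exact definition may differ.

```python
def first_commitment_latency(commit_probs, threshold=0.8):
    """Fraction of the reasoning trace consumed before commitment.

    Returns the normalized index of the first prefix whose probability of the
    final answer reaches `threshold`, or 1.0 if no prefix ever commits.
    """
    n = len(commit_probs)
    for i, p in enumerate(commit_probs):
        if p >= threshold:
            return i / n
    return 1.0
```

Under this sketch a hinted context that is already confident at the first prefix scores 0.0, while a context that only commits on its last of four prefixes scores 0.75.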

2606.05618 2026-06-05 nlin.CD cs.LG math.DS (updated version)

Uncovering Extreme Event Mechanisms for Prediction and Control with Sensitivity-Balanced Projections

Nicholas Zolman, Sajeda Mokbel, Samuel E. Otto, Steven L. Brunton

Affiliations * Department of Mechanical Engineering, University of Washington; AI Institute in Dynamic Systems, University of Washington; Sibley School of Mechanical and Aerospace Engineering, Cornell University

AI summary Proposes an interpretable method based on covariance balancing reduction using adjoint snapshots (CoBRAS) that replaces adjoint calculations with automatic differentiation and identifies sensitivity-balanced projections to reveal extreme event mechanisms, which are then used for data-driven forecasting and event-suppression control.

Comments 12 pages, 6 figures (main text). Additional 14 pages of references and Supplementary Information

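Details
AI-generated Chinese abstract (translated)

Extreme events, such as earthquakes and coronal mass ejections, are common in many chaotic dynamical systems but are hard to characterize and predict because of the subtle instability mechanisms that drive them. In this work, we develop an interpretable technique that reveals the mechanisms underlying extreme events and uses them to build data-driven forecasts and intuitive event-suppression controllers. In particular, we use the covariance balancing reduction using adjoint snapshots (CoBRAS) method to identify linear oblique projections that best capture the sensitivity of a quantity of interest and reconstruct the original state. Importantly, we bypass the need for cumbersome adjoint calculations and instead use backpropagation through modern automatically differentiable numerical frameworks. To accommodate spatially localized events, we also introduce a new CoBRAS variant that yields local sensitivity-balanced projections. We demonstrate the utility of this approach for characterizing extreme events in a range of challenging systems, including turbulent bursts of energy dissipation in two-dimensional Kolmogorov flow, spontaneous synchronization in networks of coupled FitzHugh-Nagumo oscillators, and the localized formation of ocean rogue waves from a modified nonlinear Schrödinger equation. For each example, we show that our simple forecast models accurately predict extreme events and that the underlying mechanisms can be used to design control laws that prevent them. Finally, we show that, with a neural network surrogate model that learns the dynamics directly from data, the approach extends to experimental systems and to systems not natively written in an automatically differentiable programming language.

English abstract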

Extreme events -- such as earthquakes and coronal mass ejections -- are common in many chaotic dynamical systems, yet are difficult to characterize and predict due to the subtle instability mechanisms that drive them. In this work, we develop an interpretable technique that reveals the underlying mechanisms behind extreme events and uses them to build data-driven forecasts and intuitive event suppression controllers. In particular, we utilize the covariance balancing reduction using adjoint snapshots (CoBRAS) method to identify linear oblique projections that best capture the sensitivity of a quantity of interest and reconstruct the original state. Importantly, we bypass the need for cumbersome adjoint calculations, instead using backpropagation via modern automatically differentiable numerical frameworks. To accommodate spatially localized events, we also introduce a new variant of CoBRAS to obtain local sensitivity-balanced projections. We demonstrate the utility of this approach to characterize extreme events across a diverse set of challenging systems, including turbulent bursts of energy dissipation in the 2D Kolmogorov Flow, spontaneous synchronization in networks of coupled FitzHugh-Nagumo oscillators, and the localized formation of ocean rogue waves from a modified nonlinear Schrödinger equation. For each example, we show that our simple forecast models accurately predict extreme events and that the underlying mechanisms may be used to design control laws to prevent these events. Finally, we demonstrate that by learning a neural network surrogate model of the dynamics directly from data, we may extend this approach to experimental systems and systems that are not natively written in an automatically differentiable programming language.

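2606.05609 2026-06-05 cs.CR cs.AI cs.LG (updated version)

SlotGCG: Exploiting the Positional Vulnerability in LLMs for Jailbreak Attacks

Seungwon Jeong, Jiwoo Jeong, Hyeonjin Kim, Yunseok Lee, Woojin Lee

Affiliations * Dongguk University-Seoul

AI summary Proposes SlotGCG, which quantifies the vulnerability of different insertion positions (slots) in a prompt with a Vulnerable Slot Score (VSS) and inserts adversarial tokens at the most vulnerable positions, significantly raising the success rate of optimization-based jailbreak attacks.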

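Details
Journal ref
International Conference on Learning Representations (ICLR), 2026
AI-generated Chinese abstract (translated)

As large language models (LLMs) are widely deployed, identifying their vulnerabilities through jailbreak attacks becomes increasingly critical. Optimization-based attacks such as Greedy Coordinate Gradient (GCG) focus on inserting adversarial tokens at the end of the prompt. GCG, however, restricts adversarial tokens to a fixed insertion point (usually the prompt suffix) and does not explore the effect of inserting tokens elsewhere. In this paper, we empirically study candidate positions in a prompt where tokens can be inserted, which we call slots. We find that vulnerability to jailbreaking is highly related to the choice of slots. Based on these findings, we introduce the Vulnerable Slot Score (VSS) to quantify positional vulnerability to jailbreaking. We then propose SlotGCG, which evaluates all slots with VSS, selects the most vulnerable slots for insertion, and runs a targeted optimization attack at those slots. Our method provides an attack-agnostic position-search mechanism that can be plugged into any optimization-based attack and adds only 200 ms of preprocessing time. Experiments on multiple models show that SlotGCG significantly outperforms existing methods. Specifically, compared with GCG-based attacks, it achieves a 14% higher attack success rate (ASR), converges faster, and is more robust against defense methods, with an ASR 42% higher than baseline methods. Our implementation is available at https://github.com/youai058/SlotGCG

English abstract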

As large language models (LLMs) are widely deployed, identifying their vulnerability through jailbreak attacks becomes increasingly critical. Optimization-based attacks like Greedy Coordinate Gradient (GCG) have focused on inserting adversarial tokens at the end of prompts. However, GCG restricts adversarial tokens to a fixed insertion point (typically the prompt suffix), leaving the effect of inserting tokens at other positions unexplored. In this paper, we empirically investigate slots, i.e., candidate positions within a prompt where tokens can be inserted. We find that vulnerability to jailbreaking is highly related to the selection of the slots. Based on these findings, we introduce the Vulnerable Slot Score (VSS) to quantify the positional vulnerability to jailbreaking. We then propose SlotGCG, which evaluates all slots with VSS, selects the most vulnerable slots for insertion, and runs a targeted optimization attack at those slots. Our approach provides a position-search mechanism that is attack-agnostic and can be plugged into any optimization-based attack, adding only 200 ms of preprocessing time. Experiments across multiple models demonstrate that SlotGCG significantly outperforms existing methods. Specifically, it achieves 14% higher Attack Success Rates (ASR) over GCG-based attacks, converges faster, and shows superior robustness against defense methods with 42% higher ASR than baseline approaches. Our implementation is available at https://github.com/youai058/SlotGCG

2606.05606 2026-06-05 cs.LG cs.AI math.OC (updated version)

Cross-Epoch Adaptive Rollout Optimization for RL Post-Training

Yiming Zong, Yige Wang, Jiashuo Jiang

Affiliations * Department of Industrial Engineering & Decision Analytics, Hong Kong University of Science and Technology

AI summary Addresses the large variation in training signal across prompts by proposing CERO, which uses Bayesian estimates of each prompt's success probability and Fenchel-dual optimization to allocate rollout budgets adaptively, improving sample efficiency under a fixed total budget.

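Details
AI-generated Chinese abstract (translated)

LLM post-training often relies on reinforcement learning methods that sample multiple rollouts per prompt, yet most existing methods use a fixed rollout budget for every prompt even though the training signal that different prompts provide varies widely. This paper studies adaptive rollout allocation under a fixed global budget and formulates the problem as online resource allocation with prompt-level diminishing returns. Our method, CERO, maintains a Beta posterior over each prompt's success probability and uses the posterior expected Bernoulli variance as a Bayesian estimate of the value of additional rollouts. We use this estimate to construct a concave, saturating utility over cumulative allocations, which yields an objective in which decisions across prompts and epochs are coupled through the global budget. Because the resulting objective is not separable in time, we derive a Fenchel-dual reformulation and update the prompt-level and budget-level dual variables by projected online gradient descent. Under fixed prompt utilities, we prove an $O(\sqrt{K})$ regret bound relative to the offline allocation benchmark. Experiments on mathematical reasoning problems show that CERO consistently outperforms GRPO across multiple open-source LLMs and benchmarks, demonstrating that adaptive rollout budgeting can improve sample efficiency.

English abstract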

LLM post-training often relies on reinforcement learning methods that sample multiple rollouts per prompt, yet most existing approaches use a fixed rollout budget for every prompt, despite large differences in the training signal different prompts provide. In this paper, we study adaptive rollout allocation under a fixed global budget and formulate the problem as online resource allocation with prompt-level diminishing returns. Our method, CERO, maintains a Beta posterior over each prompt's success probability and uses the posterior expected Bernoulli variance as a Bayesian estimate of the value of additional rollouts. We use this estimate to construct a concave, saturating utility over cumulative allocations, yielding an objective in which decisions across prompts and epochs are coupled by the global budget. Since the resulting objective is temporally nonseparable, we derive a Fenchel-dual reformulation and update both prompt-level and budget-level dual variables via projected online gradient descent. Under fixed prompt utilities, we prove an $O(\sqrt{K})$ regret bound against the offline allocation benchmark. Experiments on mathematical-reasoning problems show that CERO consistently outperforms GRPO across multiple open-weight LLMs and benchmarks, demonstrating that adaptive rollout budgeting can improve sample efficiency.
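The posterior expected Bernoulli variance has a closed form that a short Python sketch makes concrete. For p ~ Beta(α, β), E[p(1 − p)] = αβ / ((α + β)(α + β + 1)), which follows from E[p] − E[p²]. The function names and the uniform Beta(1, 1) prior used in the usage note are assumptions, not details taken from CERO, and the utility construction and dual updates are not shown.

```python
def posterior_update(alpha, beta, successes, failures):
    """Conjugate Beta update after observing rollout outcomes for one prompt."""
    return alpha + successes, beta + failures


def expected_bernoulli_variance(alpha, beta):
    """E[p(1 - p)] for p ~ Beta(alpha, beta).

    Equals alpha * beta / ((alpha + beta) * (alpha + beta + 1)).
    """
    total = alpha + beta
    return alpha * beta / (total * (total + 1))
```

Starting from Beta(1, 1), a prompt solved in 8 of 8 rollouts ends at Beta(9, 1) with value 9/110, while a prompt solved in 4 of 8 ends at Beta(5, 5) with value 25/110, so the mixed-outcome prompt looks more worth further rollouts.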

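2606.05605 2026-06-05 cs.LG cs.NE (updated version)

From Prediction to Self: Developmental Conditions for Agency in Minimal Neural Systems

Evan Ye

Affiliations * Independent Researcher

AI summary Through 40 incrementally built experiments, studies how a minimal GRU system comes to distinguish its own causal influence from that of the world, finds four developmental conditions that must be met in strict order, and proposes agency gain as a metric.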

Comments 18 pages, 6 figures

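Details
AI-generated Chinese abstract (translated)

How does a system that merely predicts the world distinguish its own causal influence from everything else? We trace this transition in a minimal 192-dimensional GRU through 40 controlled experiments, arranged as a developmental sequence that adds one component at a time, and track whether the system can distinguish self-caused changes from world-caused changes. The developmental path reveals four conditions that must be met in strict order: (1) persistent state that forms stable attractors, (2) a causal action loop linking output to input, (3) proprioceptive feedback that makes implicit causal knowledge explicit, and (4) asynchronous awakening, in which perceptual learning must consolidate before action learning begins. We propose agency gain (A = Err_world - Err_self), the predictive advantage of knowing one's own action, as a metric for tracking this process. The self-aware predictor consistently outperforms the self-blind predictor in periodic (sinusoidal) and chaotic (Lorenz) environments, and the metric remains valid after all auxiliary components are removed. Only forward-sampled action selection produces meaningful agency gain; two gradient-based alternatives degenerate. Equally important are 12 falsified hypotheses that map where development stalls: predictive coding alone does not produce self-representation.

English abstract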

How does a system that merely predicts the world come to distinguish its own causal influence from everything else? We trace this transition in a minimal 192-dimensional GRU through 40 controlled experiments arranged as a developmental sequence, adding components one at a time and tracking whether the system can distinguish self-caused from world-caused changes. The developmental path reveals four conditions that must be satisfied in strict order: (1) persistent state forming stable attractors, (2) a causal action loop linking output to input, (3) proprioceptive feedback that makes implicit causal knowledge explicit, and (4) asynchronous awakening - perceptual learning must consolidate before action learning begins. We propose agency gain (A = Err_world - Err_self), the predictive advantage of knowing one's own action, as a metric to track this process. The self-aware predictor consistently outperforms the self-blind predictor across periodic (sinusoidal) and chaotic (Lorenz) environments, and the metric survives ablation of all auxiliary components. Only forward-sampled action selection produces meaningful agency gain; two gradient-based alternatives degenerate. Equally significant are 12 falsified hypotheses mapping where development stalls: predictive coding alone does not produce self-representation.
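The agency gain formula A = Err_world - Err_self can be made concrete with a few lines of Python. The sketch assumes that Err_world is the mean squared error of a predictor that does not see the system's own action (self-blind), that Err_self is the same error for a predictor that does (self-aware), and that predictions are scalars. These readings, and the function names, are assumptions for illustration only.

```python
def mean_squared_error(predictions, targets):
    """Average squared difference between predictions and observed values."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def agency_gain(blind_predictions, aware_predictions, targets):
    """A = Err_world - Err_self; positive when knowing one's own action helps."""
    err_world = mean_squared_error(blind_predictions, targets)
    err_self = mean_squared_error(aware_predictions, targets)
    return err_world - err_self
```

A positive value means the action-aware predictor is more accurate; a value near zero means the system gains nothing from knowing what it did.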

2606.05602 2026-06-05 cs.AI cs.HC cs.LG (updated version)

Fix the Mind, Not the Move: Interpretable AI Assistance via Knowledge-Gap Localization

Ayano Hiranaka, Ya-Chuan Hsu, Stefanos Nikolaidis, Erdem Bıyık, Daniel Seita

Affiliations * University of Tokyo; National Institute of Information and Communications Technology

AI summary Proposes SENSEI, a framework that infers user misconceptions through a structured knowledge representation and offers targeted suggestions, achieving zero-shot compositional generalization on long-horizon tasks and correcting 90% of student misconceptions.

Comments Accepted to International Conference on Machine Learning (ICML) 2026

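Details
AI-generated Chinese abstract (translated)

In human-AI collaboration, AI assistants usually correct suboptimal human actions through behavioral feedback (for example, alerts or steering-wheel nudges in assistive driving). Such interventions can mitigate immediate errors, but long-term improvement requires addressing the underlying misconceptions that cause repeated mistakes. We introduce SENSEI, a framework that infers user misconceptions from interaction behavior and provides targeted, minimal yet sufficient suggestions. By operating on a structured knowledge representation to localize and correct the sources of erroneous behavior, our method departs from action- or trajectory-level interventions. On three long-horizon tasks with different misconceptions and corresponding behaviors, SENSEI demonstrates zero-shot compositional generalization, disentangling multiple overlapping misconceptions even though it is trained only on single-misconception cases. A user study further shows that our method identifies real human misconceptions and provides effective guidance that improves long-horizon task performance, successfully correcting 90% of student misconceptions. Code and the project page are available at https://misoshiruseijin.github.io/SENSEI/.

English abstract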

AI assistants in human-AI collaboration often correct suboptimal human actions through behavioral feedback (e.g., alerts or steering-wheel nudges in assistive driving). Such interventions can mitigate immediate errors, but long-term improvement requires addressing the underlying misconceptions that cause repeated mistakes. We introduce SENSEI, a framework that infers user misconceptions from interaction behavior and provides targeted, minimal yet sufficient suggestions to correct them. Our approach departs from action- or trajectory-level interventions by operating over a structured knowledge representation to localize and correct the sources of erroneous behavior. Across three long-horizon tasks with diverse misconceptions and corresponding behaviors, SENSEI demonstrates zero-shot compositional generalization, disentangling multiple overlapping misconceptions despite training only on single-misconception cases. A user study further shows that our method identifies real human misconceptions and provides effective guidance that improves long-horizon task performance, successfully correcting 90% of student misconceptions. Code and project page are available at https://misoshiruseijin.github.io/SENSEI/.

2606.05599 2026-06-05 cs.LG math.ST stat.ME stat.ML stat.TH (updated version)

Mitigating the Curse of Dimensionality in Uniform Convergence of Deep Neural Networks via Smooth Activations

Yizhe Ding, Runze Li, Jia Liu, Lingzhou Xue

Affiliations * Department of Statistics, The Pennsylvania State University

AI summary Analyzes smoothly activated deep neural networks to build a theoretical framework for uniform convergence, proving that they can mitigate the curse of dimensionality by adaptively exploiting the low-dimensional hierarchical composition structure of the target function.

Comments 30 pages, 5 figures

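Details
AI-generated Chinese abstract (translated)

This paper establishes a theoretical framework for the uniform convergence of smoothly activated deep neural network (DNN) estimators. Although standard ReLU networks achieve minimax-optimal rates in the $L^2(P)$ norm for various nonparametric regression tasks, we establish a theoretical lower bound showing that least-squares ReLU estimators can suffer from the curse of dimensionality in their uniform convergence behavior. Motivated by the need for worst-case reliability in downstream tasks, we address this limitation by analyzing smoothly activated DNNs (smooth DNNs), including feedforward and residual structures. We establish new pseudo-dimension bounds, non-asymptotic approximation guarantees, and Hölder-norm bounds for the approximators of these models. Using these results, we derive non-asymptotic uniform convergence rates for smooth DNN estimators in several statistical contexts, including Huber regression, least-squares regression, quantile regression, and logistic regression. We prove that smooth DNNs can mitigate the curse of dimensionality in uniform convergence by adaptively exploiting the low-dimensional hierarchical composition structure of the target function. Supported by simulation studies and a real-world application, our results position smooth DNNs as a theoretically sound and practically viable alternative to ReLU networks for statistical learning tasks that require uniform guarantees.

English abstract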

This paper establishes a theoretical framework for the uniform convergence of smoothly activated deep neural network (DNN) estimators. While standard ReLU networks achieve minimax-optimal rates in the $L^2(P)$ norm for various nonparametric regression tasks, we establish a theoretical lower bound demonstrating that least-squares ReLU estimators can suffer from the curse of dimensionality in their uniform convergence behavior. Motivated by the need for reliable uniform guarantees in downstream tasks requiring worst-case reliability, we address this limitation by analyzing smoothly activated DNNs (smooth DNNs), encompassing both feedforward and residual structures. We establish novel pseudo-dimension bounds, non-asymptotic approximation guarantees, and Hölder-norm bounds for the approximators of these models. Leveraging these results, we derive non-asymptotic uniform convergence rates for smooth DNN estimators across multiple statistical contexts, including Huber, least-squares, quantile, and logistic regression. We prove that smooth DNNs can mitigate the curse of dimensionality in uniform convergence by adaptively exploiting the low-dimensional hierarchical composition structure of the target function. Supported by both simulation studies and a real-world application, our results position smooth DNNs as a theoretically grounded and practically viable alternative to ReLU networks for statistical learning tasks requiring uniform guarantees.

2606.05588 2026-06-05 cs.RO cs.LG 版本更新

Auditing Demonstration Curation Metrics: Action-Only Scorers Fail on the Structural Defects That Degrade Imitation Policies

审计示范策展指标:仅动作评分器在降低模仿策略的结构缺陷上失败

Aarav Bedi

发表机构 * Aarav Bedi

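AI summary: The study builds a controlled testbed that injects two classes of demonstration defects (subtle perturbations and structural errors) and audits seven curation metrics. Action-only metrics fail to detect structural errors and some of them score defective demonstrations in inverted order; state-trajectory metrics detect structural errors partially but recover only a limited share of downstream performance.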

Comments 5 pages, 3 figures, 4 tables

详情
AI中文摘要

模仿学习策略继承了其训练示范的质量,越来越多的策展指标声称能自动评分和过滤低质量示范。这些指标各自在不同协议的不同数据上验证,因此不清楚哪些指标真正识别出损害策略的示范。我们构建了一个受控测试平台,其中示范缺陷以已知类型注入,并沿两个轴审计七种策展指标:每个指标区分缺陷示范与清洁示范的效果,以及基于每个指标策展的子集训练行为克隆策略是否提高任务成功率。我们研究两种缺陷机制。细微扰动(相关动作噪声、震颤、截断)可通过多变量离群值评分检测,一旦移除,可恢复全部下游差距。结构错误,即示范在关键时刻执行错误动作,对我们测试的每个仅动作指标都是不可见的,其中两个指标是倒置的:它们将缺陷示范评分为更高质量,并用于策展时,往往使策略处于或低于未策展基线,而非高于基线。只有检查状态轨迹的指标能检测结构错误,即使最好的指标也只能恢复三分之一的下游差距。高检测准确性并不保证下游改进。我们发布了测试平台和所有策展实现。

英文摘要

Imitation-learning policies inherit the quality of the demonstrations they are trained on, and a growing set of curation metrics promise to score and filter low-quality demonstrations automatically. These metrics are each validated on different data with different protocols, so it is unclear which of them actually identify the demonstrations that harm a policy. We build a controlled testbed in which demonstration defects are injected with known type, and audit seven curation metrics along two axes: how well each separates defective from clean demonstrations, and whether training a behavior-cloning policy on each metric's curated subset improves task success. We study two defect regimes. Subtle perturbations (correlated action noise, tremor, truncation) are detectable by multivariate outlier scoring and, once removed, recover the full downstream gap. Structural errors, where the demonstration executes a wrong action at a key moment, are invisible to every action-only metric we test, and two of them are inverted: they score defective demonstrations as higher quality and, used for curation, tend to leave the policy at or below the uncurated baseline rather than above it. Only metrics that examine the state trajectory detect structural errors, and even the best of them recovers just a third of the downstream gap. High detection accuracy does not guarantee downstream improvement. We release the testbed and all curation implementations.
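The first audit axis (how well a metric separates defective from clean demonstrations) can be illustrated with a rank-based separation score. The sketch below uses AUROC, which is an assumption on our part since the abstract does not name the statistic; the function name and the quality scores are hypothetical. A value near 1 means the metric ranks clean demonstrations above defective ones, and a value below 0.5 corresponds to the "inverted" behaviour described above.

```python
def auroc(scores_defective, scores_clean):
    """Probability that a random defective demo receives a lower quality
    score than a random clean demo (ties count as one half)."""
    wins = 0.0
    for d in scores_defective:
        for c in scores_clean:
            if d < c:
                wins += 1.0
            elif d == c:
                wins += 0.5
    return wins / (len(scores_defective) * len(scores_clean))


# Hypothetical quality scores (higher = judged better by the metric).
well_behaved = auroc([0.1, 0.2, 0.4], [0.6, 0.7, 0.9])
inverted = auroc([0.8, 0.9, 0.7], [0.3, 0.5, 0.75])

print(f"well-behaved metric AUROC: {well_behaved:.3f}")
print(f"inverted metric AUROC:     {inverted:.3f}")
```

As the abstract stresses, a high separation score of this kind does not by itself guarantee a downstream gain, so it would be reported alongside task success rather than in place of it.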

2606.05587 2026-06-05 cs.CV cs.AI cs.LG 版本更新

HDST-GNN: Heterogeneous Dynamic Spatiotemporal Graph Neural Networks for Multi-Object Tracking in UAV Aerial Imagery

HDST-GNN:用于无人机航拍图像多目标跟踪的异质动态时空图神经网络

Phillip Jiang

发表机构 * Phillip Jiang(菲利普·姜)

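AI summary: To address identity switches caused by small, dense, and occluded targets in UAV aerial imagery, the paper proposes HDST-GNN, a heterogeneous dynamic spatiotemporal graph neural network that improves tracking through altitude-adaptive edge construction, heterogeneous node representation, and occlusion-gated temporal aggregation.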

Comments 18 pages, 4 figures, 6 tables

详情
AI中文摘要

无人机航拍图像的多目标跟踪(MOT)面临独特挑战:序列间高度变化、目标小而密集、频繁遮挡导致身份切换。现有基于图的跟踪器假设固定空间上下文并统一处理所有目标,忽略了检测、活跃轨迹和丢失目标等异质生命周期状态。我们提出HDST-GNN,一种异质动态时空图神经网络,包含三项创新。首先,高度自适应边构建根据平均目标面积估计相机高度代理,并相应调整图连接半径。其次,异质节点表示将检测(D型)、确认轨迹(T型)和丢失轨迹(L型)建模为不同节点类型,具有专用投影和类型化边关系。第三,遮挡门控时序聚合根据每个节点的遮挡置信度门控其注意力贡献,防止被遮挡节点破坏邻居嵌入。HDST-GNN使用可微Sinkhorn头部,结合交叉熵和三元组损失进行端到端训练。在VisDrone2019-MOT上使用oracle检测时,HDST-GNN达到94.51% MOTA和97.24% IDF1,比SORT高出+5.0 MOTA点,身份切换减少81%。使用真实YOLOv8n检测时,HDST-GNN相比SORT身份切换减少49%。消融研究证实了每个组件的独立贡献。

英文摘要

Multi-object tracking (MOT) from UAV imagery presents unique challenges: altitude varies across sequences, objects are small and densely packed, and frequent occlusion causes identity switches. Existing graph-based trackers assume fixed spatial context and treat all objects uniformly, ignoring the heterogeneous lifecycle states of detections, active tracklets, and lost targets. We propose HDST-GNN, a Heterogeneous Dynamic Spatiotemporal Graph Neural Network with three novel contributions. First, Altitude-Adaptive Edge Construction estimates a camera-altitude proxy from mean object area and adjusts the graph connectivity radius accordingly. Second, Heterogeneous Node Representation models detections (Type-D), confirmed tracklets (Type-T), and lost tracklets (Type-L) as distinct node types with dedicated projections and typed edge relations. Third, Occlusion-Gated Temporal Aggregation gates each node's attention contribution by its occlusion confidence, preventing occluded nodes from corrupting neighbour embeddings. HDST-GNN is trained end-to-end with a differentiable Sinkhorn head using joint cross-entropy and triplet loss. On VisDrone2019-MOT with oracle detections, HDST-GNN achieves 94.51% MOTA and 97.24% IDF1, outperforming SORT by +5.0 MOTA points and reducing identity switches by 81%. With real YOLOv8n detections, HDST-GNN reduces identity switches by 49% vs. SORT. Ablation studies confirm the independent contribution of each component.
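The "differentiable Sinkhorn head" mentioned above turns a matrix of track-detection affinity scores into a soft assignment by alternately normalising rows and columns. The following is a minimal sketch of that normalisation only, assuming a square score matrix and no slack rows for unmatched tracks or detections; the scores, the temperature, and the iteration count are illustrative and not taken from the paper.

```python
import math


def sinkhorn(scores, n_iters=100, temperature=1.0):
    """Alternately normalise rows and columns of exp(scores / temperature)."""
    n_rows, n_cols = len(scores), len(scores[0])
    P = [[math.exp(s / temperature) for s in row] for row in scores]
    for _ in range(n_iters):
        # Row step: each track distributes unit mass over detections.
        for i in range(n_rows):
            row_sum = sum(P[i])
            P[i] = [v / row_sum for v in P[i]]
        # Column step: each detection receives unit mass in total.
        for j in range(n_cols):
            col_sum = sum(P[i][j] for i in range(n_rows))
            for i in range(n_rows):
                P[i][j] /= col_sum
    return P


# Hypothetical affinities between 3 tracks (rows) and 3 detections (columns).
scores = [[4.0, 0.5, 0.1],
          [0.3, 3.5, 0.2],
          [0.2, 0.4, 3.0]]
P = sinkhorn(scores)
matches = [max(range(3), key=lambda j: P[i][j]) for i in range(3)]
print(matches)
```

Every step is a division or an exponential, so gradients can flow through the normalisation when the same loop is written in an autodiff framework.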

2606.05581 2026-06-05 cs.GR cs.CV cs.LG 版本更新

Monte Carlo Steklov Operators for Large-Scale Geometry Processing in the Wild

蒙特卡洛Steklov算子用于大规模野外几何处理

Arman Maesumi, Tanish Makadia, Aruna Anderson, Oras Phongpanangam, Justin Solomon, Daniel Ritchie

发表机构 * Brown University(布朗大学) Loyola Marymount University(洛约拉玛丽蒙特大学) Massachusetts Institute of Technology(麻省理工学院)

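AI summary: Proposes a Monte Carlo method for estimating the Dirichlet-to-Neumann operator and its Steklov eigenmodes, enabling robust and efficient computation of volumetric operators, and applies it to large-scale contrastive 3D representation learning.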

Comments 21 pages

详情
AI中文摘要

内在方法填充了网格几何处理的默认工具箱。内在算子,特别是拉普拉斯算子,是对等距不变性有要求的方法的基础,因此已用于许多形状分析、学习和编辑算法。然而,内在方法的前提假设在处理野外几何时变得脆弱,因为(i)网格质量无法保证,(ii)许多网格由多个连通分量建模。在这种情况下,体积构造定义更清晰,因为可以放宽对表面拓扑的限制。本文提出了一种蒙特卡洛方法,用于估计Dirichlet-to-Neumann (DtN)算子——一种边界到边界的体积算子——及其相关的Steklov特征模态。我们基于蒙特卡洛几何处理的最新发展,将该边界算子本身作为估计对象。通过体积随机过程定义的DtN算子被推广到外部域,通过周围环境空间耦合断开的分量。我们表明,我们的方法在计算Steklov谱时比现有的边界元方法快几个数量级,同时对低质量三角剖分、高分辨率网格和多分量几何保持鲁棒。为了展示这种可扩展性,我们计算了来自未策划的Objaverse数据集的约450,000个形状的内外Steklov特征谱。我们将这些算子集成到Steklov-CLIP中,这是一种基于网格的神经网络,使用体积谱算子进行大规模对比3D表示学习。得到的网络学习到语义上有意义的全局和密集形状表示,说明几何上有原则的体积算子可以在现代3D数据集规模上变得实用。

英文摘要

Intrinsic methods fill the default toolbox for geometry processing on meshes. Intrinsic operators, in particular the Laplacian, underlie methods that require invariance to isometry and have hence been employed in many algorithms for shape analysis, learning, and editing. However, intrinsic methods are predicated on assumptions that quickly become brittle when working with in-the-wild geometry, where (i) mesh quality is not guaranteed, and (ii) many meshes are modeled with multiple connected components. In such settings, volumetric constructions are better-defined, since restrictions on surface topology can be relaxed. This paper presents a Monte Carlo method for estimating the Dirichlet-to-Neumann (DtN) operator -- a boundary-to-boundary volumetric operator -- and its associated Steklov eigenmodes. We build on recent developments in Monte Carlo geometry processing by casting this boundary operator itself as the subject of estimation. The DtN operator, defined through a volumetric stochastic process, is then generalized to the exterior domain, where it couples disconnected components through the surrounding ambient space. We show that our method is orders of magnitude faster than existing boundary-element approaches for computing Steklov spectra while remaining robust to poor triangulations, high-resolution meshes, and multi-component geometry. To demonstrate this scalability, we compute interior and exterior Steklov eigenspectra for approximately 450,000 shapes from the uncurated Objaverse dataset. We incorporate these operators into Steklov-CLIP, a mesh-based neural network that uses volumetric spectral operators for large-scale contrastive 3D representation learning. The resulting network learns semantically meaningful global and dense shape representations, illustrating that geometrically-principled volumetric operators can be made practical at the scale of modern 3D datasets.

2606.05559 2026-06-05 cs.LG 版本更新

CLaaS: Continual learning as a service for sample efficient online learning

CLaaS: 作为服务的持续学习,用于样本高效的在线学习

Kion Fallah, Silen Naihin, Barak Widawsky, Qingqing Mao

发表机构 * Resolute Labs(Resolute实验室) Incept Labs(Incept实验室)

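AI summary: Proposes CLaaS, a system that reuses gradients during asynchronous training via an experience replay buffer; on an adversarial task, parametric updates show better forward transfer and less forgetting than in-context learning.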

Comments 4 pages main content, 7 figures

详情
AI中文摘要

部署的大型语言模型代理必须适应动态环境中的分布偏移。理想情况下,可以从累积的代理经验中进行适应,并在转移到未来任务时保留先前的能力。然而,由于真实环境无法轻易重置,每个场景中代理的动作和环境转换只能采样一次。为此,我们研究了一种体验式和在线持续学习设置,其中代理从一系列场景中学习。我们提出了持续学习即服务(CLaaS),这是一个系统,使代理能够在部署期间改进,并通过聊天API抽象化。为了提高样本效率,CLaaS将轨迹存储在经验回放缓冲区中,以便在异步训练期间重用梯度。我们在对抗性任务上评估了CLaaS,证明参数更新比上下文学习具有更好的前向迁移和更少的遗忘,其中回放是样本效率的关键选择。

英文摘要

Deployed large language model agents must adapt to distribution shift in dynamic environments. Ideally, adaptation can be performed from accumulated agent experiences and retain prior capabilities while transferring to future tasks. However, agent actions and environmental transitions can only be sampled once per scenario, as real-world environments cannot be trivially reset. To this end, we investigate an experiential and online continual learning setting in which agents learn from a stream of scenarios. We propose continual learning as-a-service (CLaaS), a system which enables agents to improve during deployment, abstracted behind a chat API. To increase sample efficiency, CLaaS stores rollouts in an experience replay buffer for gradient reuse during asynchronous training. We evaluate CLaaS on an adversarial task, demonstrating that parametric updates lead to superior forward transfer and less forgetting than in-context learning, with replay being a critical choice for sample efficiency.
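Because each scenario can be sampled only once, the replay buffer is what lets a single rollout contribute to more than one gradient step. A minimal sketch of such a buffer follows; the class name, the fixed capacity with oldest-first eviction, and uniform sampling are all assumptions for illustration and do not describe the CLaaS implementation.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of rollouts with uniform random sampling."""

    def __init__(self, capacity, seed=0):
        # deque(maxlen=...) silently evicts the oldest rollout when full.
        self._items = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, rollout):
        self._items.append(rollout)

    def sample(self, batch_size):
        # Never ask for more rollouts than are stored.
        k = min(batch_size, len(self._items))
        return self._rng.sample(list(self._items), k)

    def __len__(self):
        return len(self._items)


buffer = ReplayBuffer(capacity=3)
for step in range(5):
    # Each rollout is a hypothetical (scenario id, reward) record.
    buffer.add({"scenario": step, "reward": step * 0.1})

batch = buffer.sample(2)
stored = sorted(item["scenario"] for item in buffer.sample(10))
print(len(buffer), stored)
```

An asynchronous trainer would call `sample` repeatedly between deployments, which is where the sample-efficiency benefit described above would come from.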

2606.05558 2026-06-05 cs.LG 版本更新

Autoregressive Diffusion World Models for Off-Policy Evaluation of LLM Agents

自回归扩散世界模型用于LLM智能体的离策略评估

Kaixuan Liu, Guojun Xiong, Weinan Zhang, Shengpu Tang

发表机构 * Department of Computer Science, Emory University(埃默里大学计算机科学系) School of Computer Science, Shanghai Jiao Tong University(上海交通大学计算机科学学院)

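AI summary: Proposes ADWM, a framework that uses an autoregressive diffusion world model to simulate environment responses from pre-collected trajectories, enabling off-policy evaluation of LLM agent policies without online interaction.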

详情
AI中文摘要

在多轮交互环境中评估大语言模型(LLM)智能体成本高且风险大,因为它需要在线环境交互。我们提出ADWM(自回归扩散世界模型),一个仅从预收集轨迹中估计新LLM智能体策略性能的评估框架。核心思想是学习一个潜在扩散世界模型,模拟环境如何响应评估策略,而无需在真实环境中执行。现有的基于扩散的OPE方法通过联合扩散状态和动作,在单次传递中引导完整轨迹,这一假设对于动作是离散文本且必须在观察环境后从策略中采样的LLM智能体不成立。与遭受复合误差的自回归世界模型不同,ADWM将每个转移建模为独立的去噪过程,实现可靠的逐步展开,其中世界模型和智能体按因果顺序交替。关键的是,被评估的LLM智能体通过策略条件得分函数直接引导每一步的扩散生成,确保模拟轨迹准确反映其决策模式。实验上,ADWM在多种多轮智能体任务中实现了准确的价值估计和评估可靠性,展示了其作为离线LLM智能体评估实用框架的前景。

英文摘要

Evaluating large language model (LLM) agents in multi-turn interactive environments is expensive and risky, as it requires online environment interaction. We propose ADWM (Autoregressive Diffusion World Model), an evaluation framework that estimates the performance of a new LLM agent policy purely from pre-collected trajectories. The core idea is to learn a latent diffusion world model that simulates how the environment responds to the evaluation policy, without ever executing it in the real environment. Existing diffusion-based OPE methods guide full trajectories in a single pass by jointly diffusing states and actions, an assumption that breaks down for LLM agents whose actions are discrete text that must be sampled from the policy after observing the environment. Unlike autoregressive world models that suffer from compounding errors, ADWM models each transition as an independent denoising process, enabling reliable step-by-step rollouts where the world model and agent alternate in causal order. Crucially, the LLM agent under evaluation directly guides the diffusion generation at each step via a policy-conditioned score function, ensuring that simulated trajectories accurately reflect its decision-making patterns. Empirically, ADWM achieves accurate value estimates and evaluation reliability across diverse multi-turn agent tasks, demonstrating its promise as a practical framework for offline LLM agent evaluation.

2606.05555 2026-06-05 cs.LG cs.AI 版本更新

Representation Learning Enables Scalable Multitask Deep Reinforcement Learning

表示学习实现可扩展的多任务深度强化学习

Johan Obando-Ceron, Lu Li, Scott Fujimoto, Pierre-Luc Bacon, Aaron Courville, Pablo Samuel Castro

发表机构 * Mila – Québec AI Institute(魁北克AI研究所) Université de Montréal(蒙特利尔大学) McGill University(麦吉尔大学) CIFAR AI Chair(CIFAR人工智能讲席) Google DeepMind(谷歌DeepMind)

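AI summary: Evaluates MR.Q, a model-free algorithm that combines predictive representation learning with high-capacity value function approximation; without planning, it outperforms a world-model-based method and a range of deep RL baselines on multitask continuous control while substantially reducing computational overhead.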

详情
AI中文摘要

将强化学习扩展到多样化的多任务设置仍然是一个核心挑战。虽然基于模型的强化学习的最新进展取得了强劲的性能,但它们依赖于规划和复杂的训练流程,使得不清楚哪些组件对可扩展性至关重要。我们重新审视这个问题,并认为可扩展多任务强化学习的主要驱动力不是基于模型的控制,而是\emph{表示学习}。特别地,我们表明,将预测性的、基于模型的表示与高容量值函数逼近相结合,即使没有规划,也足以实现强劲的性能。我们评估了一种简单的无模型算法MR.Q,将辅助预测目标与可扩展的actor-critic架构相结合。这种方法在多样化的多任务连续控制任务套件中优于最近基于世界模型的方法和一系列深度强化学习基线,同时显著降低了计算开销并提高了实际时间效率。我们观察到随着模型容量的增加而持续改进,并通过消融实验表明预测性表示学习对性能至关重要。

英文摘要

Scaling reinforcement learning (RL) to diverse multitask settings remains a central challenge. While recent advances in model-based RL achieve strong performance, they rely on planning and complex training pipelines, making it unclear which components are essential for scalability. We revisit this question and argue that the primary driver of scalable multitask RL is not model-based control, but \emph{representation learning}. In particular, we show that combining predictive, model-based representations with high-capacity value function approximation is sufficient to achieve strong performance, even without planning. We evaluate a simple model-free algorithm, MR.Q, which couples auxiliary predictive objectives with a scalable actor-critic architecture. This approach outperforms a recent world-model-based method and a range of deep RL baselines across a diverse suite of multitask continuous control tasks, while significantly reducing computational overhead and improving wall-clock efficiency. We observe consistent improvements with increased model capacity and show through ablations that predictive representation learning is critical for performance.

2606.05552 2026-06-05 cs.LG cs.AI cs.GR 版本更新

Balancing Image Compression and Generation with Bootstrapped Tokenization

平衡图像压缩与生成:自引导分词

Haozhe Chi, Jinghan Li, Hao Jiang, Wu Sheng, Yi Ma, Jing Wang, Yadong Mu

发表机构 * Peking University(北京大学) Central Media Technology Institute, Huawei(华为中央媒体技术研究所)

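AI summary: Proposes SelfBootTok, which uses self-bootstrapped learning to decompose image information into global and local token groups so that the generator relies only on global tokens, cutting computation by about 40% while improving reconstruction and generation, and reaching a state-of-the-art gFID of 1.56 with 64 tokens.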

详情
AI中文摘要

尽管图像分词取得了进展,但标准方法通过在每个标记中混合所有粒度来编码冗余信息,因此标记之间仍存在冗余。不同粒度信息的混合也增加了生成器训练的复杂性。本文介绍了SelfBootTok,一种通过将信息干净地分解为全局和局部标记组来解决此问题的方法。通过自引导学习,模型仅从全局标记预测局部细节,将视觉细节的负担从生成器转移到分词器。因此,我们的生成器效率更高,仅需全局标记,计算量减少约40%,同时提供更优的重建和生成。此外,该范式优雅地扩展:通过利用更多数据或参数来自监督局部表示学习,SelfBootTok仅使用64个标记就实现了1.56的最优gFID分数。

英文摘要

Despite progress in image tokenization, standard methods mix all granularities within each token, so redundant information persists across tokens. Mixing information of different granularities also complicates the training of generators. This paper introduces SelfBootTok, a method that resolves this by cleanly decomposing information into global and local token groups. Through self-bootstrapped learning, the model predicts local details exclusively from global tokens, shifting the burden of visual details from the generator to the tokenizer. Consequently, our generator is far more efficient, requiring only global tokens and reducing computation by approximately 40%, while delivering superior reconstruction and generation. Moreover, this paradigm scales elegantly: by leveraging more data or parameters to self-supervise local representation learning, SelfBootTok achieves a new state-of-the-art gFID score of 1.56 using only 64 tokens.

2606.05538 2026-06-05 cs.LG cs.CL 版本更新

Less is MoE: Trimming Experts in Domain-Specialist Language Models

少即是MoE:修剪领域专家语言模型中的专家

Haoze He, Xinkai Zou, Xuan Jiang, Xingyuan Ding, Ao Qu, Juncheng Billy Li, Heather Miller

发表机构 * Carnegie Mellon University(卡内基梅隆大学) UCSD(加州大学圣地亚哥分校) MIT(麻省理工学院)

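AI summary: To address the large parameter footprint of deployed MoE models, proposes Fisher-MoE, which prunes FFN intermediate dimensions ranked by Fisher importance; at a 50% compression ratio it preserves model capability, reduces weight memory by about 45%, and improves inference throughput by 21%.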

详情
AI中文摘要

混合专家(MoE)模型通过条件计算实现了强大的性能,但其庞大的参数规模带来了部署挑战。先前的MoE压缩方法在常识推理之外的通用基准测试中评估时灾难性地失败。我们将这一失败归因于压缩的粒度:重要能力分布在各个专家中,但集中在FFN稀疏中间维度。为了识别这些维度,我们使用Fisher重要性,它优于基于激活、路由器得分和幅度的方法,并识别出极小的任务关键维度集:在Qwen1.5-MoE中,仅移除1.35M路由FFN中间维度中的12个就导致GSM8K准确率崩溃,同时基本保持事实知识性能。基于此,我们提出Fisher-MoE,它在FFN内部操作,移除按Fisher重要性排序的中间维度。在相同的50% MoE压缩比下,Fisher-MoE保持了模型能力,同时减少了约45%的权重内存并提高了21%的推理吞吐量。这些发现表明,中间维度粒度是MoE模型中能力集中的有效压缩和排序单元。

英文摘要

Mixture-of-Experts (MoE) models achieve strong performance through conditional computation, but their large parameter footprint poses deployment challenges. Prior MoE compression approaches fail catastrophically when evaluated on general-purpose benchmarks beyond commonsense reasoning. We trace this failure to the granularity of compression: important capabilities are distributed across experts but concentrated in a sparse set of FFN intermediate dimensions. To identify these dimensions, we use Fisher importance, which outperforms activation-, router-score-, and magnitude-based alternatives and identifies tiny sets of task-critical dimensions: in Qwen1.5-MoE, removing as few as 12 of 1.35M routed-FFN intermediate dimensions collapses GSM8K accuracy while largely preserving factual-knowledge performance. Building on this, we propose Fisher-MoE, which operates within each FFN to remove intermediate dimensions ranked by Fisher importance. At the same 50% MoE compression ratio, Fisher-MoE preserves model capability while reducing weight memory by ~45% and improving inference throughput by 21%. These findings suggest that the intermediate dimension is an effective unit for both ranking and compression, since it is the granularity at which capability concentrates in MoE models.
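Ranking intermediate dimensions by Fisher importance can be sketched with the usual empirical diagonal Fisher approximation: average the squared per-sample gradients and sum them over the weights tied to each dimension. This formula, the helper names, the gradient values, and the choice to prune the lowest-scoring dimensions are assumptions for illustration; the abstract does not specify the exact estimator.

```python
def fisher_importance(per_sample_grads):
    """per_sample_grads[n][d] lists the gradients, on sample n, of the
    weights attached to intermediate dimension d."""
    n_samples = len(per_sample_grads)
    n_dims = len(per_sample_grads[0])
    scores = [0.0] * n_dims
    for sample in per_sample_grads:
        for d, grads in enumerate(sample):
            # Empirical Fisher diagonal: mean of squared gradients.
            scores[d] += sum(g * g for g in grads) / n_samples
    return scores


def dims_to_prune(scores, ratio):
    """Indices of the lowest-scoring dimensions, `ratio` of the total."""
    n_prune = int(len(scores) * ratio)
    order = sorted(range(len(scores)), key=lambda d: scores[d])
    return sorted(order[:n_prune])


# Hypothetical gradients: 2 samples, 4 intermediate dimensions, 2 weights each.
grads = [
    [[0.1, -0.1], [2.0, 1.0], [0.0, 0.05], [0.5, -0.5]],
    [[0.2, 0.0], [1.5, -1.0], [0.01, 0.0], [0.4, 0.6]],
]
scores = fisher_importance(grads)
pruned = dims_to_prune(scores, ratio=0.5)
print(pruned)
```

In a real FFN, removing dimension `d` would mean dropping the matching column of the up and gate projections and the matching row of the down projection.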

2606.05533 2026-06-05 cs.LG cs.AI cs.CV cs.RO 版本更新

What Objects Enable, Not What They Are: Functional Latent Spaces for Affordance Reasoning

物体能做什么,而非它们是什么:面向功能可供性推理的功能潜在空间

Rohan Siva, Neel P. Bhatt, Yunhao Yang, Seoyoung Lee, Nishant Gadde, Christian Ellis, Alvaro Velasquez, Zhangyang Wang, Ufuk Topcu

发表机构 * The University of Texas at Austin(德克萨斯大学奥斯汀分校) Neurosymbolic Intelligence(神经符号智能) University of Colorado Boulder(科罗拉多大学博尔德分校)

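AI summary: Proposes A4D, a framework that builds a shared latent space organized around affordances, maps visual observations into it, and measures their distance to affordances, so that planning reasons about object functionality rather than appearance, with markedly better generalization and inference efficiency.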

Comments Code, videos, and data available at: https://A4Dance-reasoning.github.io

详情
AI中文摘要

现有的机器人规划系统依赖于基于外观的推理,其中视觉观察被编码到围绕物体外观组织的潜在空间中(例如,根据外观识别“手推车”)。然而,规划需要推理物体的任务相关功能(例如,物体是否“可移动”),而基于外观的潜在空间无法捕捉这些信息。因此,现有方法难以泛化到新颖的机器人-物体交互。我们通过功能可供性推理解决这一泛化能力有限的问题,使规划基于任务相关的物体功能而非仅外观。我们提出A4D,它将视觉观察映射到一个围绕可供性(例如“可移动”)组织的共享潜在空间中。通过将视觉观察投影到这个功能潜在空间并测量它们与可供性的接近程度,A4D推断出与观察物体相关的功能。此外,我们引入了一种可供性发现机制,扩展潜在空间以处理现有可供性不足的未见场景。A4D利用功能潜在空间中的接近度来量化可供性推理的不确定性,并选择性地触发可供性发现。我们在涉及多样化和未见可供性的多个规划任务上评估A4D。A4D在现有可供性上达到94%的推理准确率,比最先进方法高出超过15个百分点;在不到原始训练数据10%的情况下,将新可供性推理准确率从70%提升到90%以上,并实现100倍更快的推理。代码、视频和数据可在https://A4Dance-reasoning.github.io获取。

英文摘要

Existing robot planning systems rely on appearance-based reasoning, where visual observations are encoded into latent spaces organized around object appearances (e.g., recognizing a "cart" based on how it looks). However, planning requires reasoning about task-relevant functionalities of objects (e.g., whether an object is "movable"), which appearance-based latent spaces do not capture. As a result, existing approaches struggle to generalize to novel robot-object interactions. We address this limited generalizability through affordance reasoning, enabling planning based on task-relevant object functionalities instead of appearance alone. We introduce A4D, which maps visual observations into a shared latent space structured around affordances (e.g., "movable"). By projecting visual observations into this functional latent space and measuring their proximity to affordances, A4D infers functionalities relevant to the observed object. Furthermore, we introduce an affordance discovery mechanism that expands the latent space to handle unseen scenarios where existing affordances are insufficient. A4D uses proximity in the functional latent space to quantify uncertainty in affordance inference and selectively triggers affordance discovery. We evaluate A4D across several planning tasks involving diverse and unseen affordances. A4D achieves 94% inference accuracy on existing affordances, outperforming state-of-the-art approaches by over 15 percentage points, improves new-affordance inference accuracy from 70% to over 90% with fewer than 10% of the original training data, and enables 100x faster inference. Code, videos, and data available at: https://A4Dance-reasoning.github.io.
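The proximity-based inference described above can be sketched as a nearest-prototype lookup with a rejection threshold: if an observation is not close enough to any known affordance, the result is treated as uncertain and discovery would be triggered. Cosine similarity, the prototype vectors, the threshold value, and all names below are assumptions for illustration, not details from the paper.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def infer_affordance(z, prototypes, min_similarity=0.8):
    """Return (affordance name or None, best similarity).

    None signals that no known affordance is close enough, which is the
    point where a discovery step would be triggered."""
    best_name, best_sim = max(
        ((name, cosine(z, p)) for name, p in prototypes.items()),
        key=lambda pair: pair[1],
    )
    if best_sim < min_similarity:
        return None, best_sim
    return best_name, best_sim


# Hypothetical affordance prototypes in a 3-dimensional functional latent space.
prototypes = {"movable": [1.0, 0.0, 0.0], "graspable": [0.0, 1.0, 0.0]}

known, known_sim = infer_affordance([0.9, 0.1, 0.0], prototypes)
unknown, unknown_sim = infer_affordance([0.1, 0.1, 1.0], prototypes)
print(known, unknown)
```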

2606.05531 2026-06-05 cs.CV cs.AI cs.CL cs.LG 版本更新

Almieyar-Oryx-BloomBench: A Bilingual Multimodal Benchmark for Cognitively Informed Evaluation of Vision-Language Models

Almieyar-Oryx-BloomBench:一个用于视觉语言模型认知知情评估的双语多模态基准

Mohammad Mahdi Abootorabi, Omid Ghahroodi, Anas Madkoor, Marzia Nouri, Doratossadat Dastgheib, Mohamed Hefeeda, Ehsaneddin Asgari

发表机构 * University of British Columbia(不列颠哥伦比亚大学) Zuse School(Zuse学校) Qatar Computing Research Institute (QCRI)(卡塔尔计算研究所) Hamad Bin Khalifa University(哈马德·本·哈利法大学)

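AI summary: Because existing benchmarks cannot diagnose the true reasoning abilities of vision-language models, the paper proposes BloomBench, a bilingual multimodal benchmark grounded in Bloom's Taxonomy that systematically evaluates six cognitive levels and reveals deep limitations in factual recall and creative synthesis.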

Comments Accepted to ACL 2026 Findings

详情
AI中文摘要

尽管视觉语言模型(VLM)取得了快速进展,但该领域缺乏能够严格诊断其真实推理能力并描绘出向类人多模态智能有意义进展的基准。大多数现有评估侧重于零散或脱节的任务,掩盖了关键的认知弱点,并为有针对性的改进提供了很少的见解。为了弥补这一差距,我们引入了BloomBench,这是Almieyar基准系列的一部分,也是第一个基于人类认知的、双语(英语-阿拉伯语)的多模态VLM基准。基于Bloom分类学,BloomBench通过精心设计的图像-问题-答案任务系统地评估六个认知层次(记忆、理解、应用、分析、评估、创造)。通过半自动化流水线构建,并通过分层混合质量保证协议验证,确保了可扩展性、文化包容性和语言保真度。利用这一框架,我们对最先进的VLM进行了全面研究,以诊断其认知特征。我们的分析揭示了明显的认知不对称:尽管最先进的模型在语义理解方面达到了强大的性能上限,但它们在事实回忆和创造性合成方面存在显著困难。这表明当前的一般多模态能力掩盖了特定认知层次的深层局限性。此外,我们的研究突出了阿拉伯语和英语之间的关键性能差距,暴露了当前跨语言多模态推理的局限性。这些发现为开发更符合认知和包容性的VLM奠定了基础。基准框架和数据集可在以下网址获取:https://github.com/qcri/Almieyar-Oryx-BloomBench。

英文摘要

Despite the rapid progress of Vision-Language Models (VLMs), the field lacks benchmarks that rigorously diagnose their true reasoning abilities and chart meaningful progress toward human-like multimodal intelligence. Most existing evaluations focus on piecemeal or disconnected tasks, obscuring critical cognitive weaknesses and providing little insight for targeted improvement. To address this gap, we introduce BloomBench, part of the Almieyar benchmarking series, the first cognitively human-grounded, bilingual (English-Arabic) multimodal benchmark for VLMs. Grounded in Bloom's Taxonomy, BloomBench systematically evaluates six levels of cognition (Remember, Understand, Apply, Analyze, Evaluate, Create) through carefully designed image-question-answer tasks. Built with a semi-automated pipeline and validated through a stratified hybrid quality assurance protocol, it ensures scalability, cultural inclusivity, and linguistic fidelity. Leveraging this framework, we conduct a comprehensive study of state-of-the-art VLMs to diagnose their cognitive profiles. Our analysis reveals a sharp cognitive asymmetry: while state-of-the-art models achieve strong performance ceilings in semantic understanding, they struggle substantially with factual recall and creative synthesis. This demonstrates that current general multimodal proficiency masks deeper limitations in specific cognitive layers. Furthermore, our study highlights a critical performance gap between Arabic and English, exposing limitations in current cross-lingual multimodal reasoning. These findings establish a foundation for developing more cognitively aligned and inclusive VLMs. The benchmark framework and dataset are available at: https://github.com/qcri/Almieyar-Oryx-BloomBench.

2606.05516 2026-06-05 cs.LG 版本更新

Dominant-Layer ZO: A Single Layer Dominates Zeroth-Order Fine-Tuning of LLMs

主导层 ZO:单层主导大语言模型的零阶微调

Wanhao Yu, Ziyan Wang, Zheng Wang, Abeer Matar Almalky, Yihang Zuo, Shuteng Niu, Sen Lin, Adnan Siraj Rakin, Deliang Fan, Li Yang

发表机构 * University of North Carolina at Charlotte(北卡罗来纳大学夏洛特分校) University of Houston(休斯顿大学) State University of New York at Binghamton(纽约州立大学宾汉姆顿分校) Arizona State University(亚利桑那州立大学) Department of Artificial Intelligence and Informatics, Mayo Clinic(梅奥诊所人工智能与信息学系)

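AI summary: Finds that when LLMs are fine-tuned with zeroth-order optimization, a single decoder layer dominates performance: fine-tuning that layer alone matches or exceeds full-model fine-tuning. The layer can be identified from activation outliers, and the paper explains the mechanism.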

详情
AI中文摘要

零阶(ZO)优化通过仅使用前向传播实现大语言模型(LLM)的内存高效微调,但适应性如何分布在各层仍不清楚。在这项工作中,我们揭示了一个令人惊讶的现象:ZO 微调被单个解码层显著主导。在多个 LLM 家族和下游任务中,仅微调这一主导层始终匹配甚至超越全模型 ZO 微调。我们进一步表明,主导层是任务无关但模型特定的,并且可以在训练前通过简单的仅推理激活异常值分析来识别。具体来说,主导层与预训练模型中的第一个激活异常值层一致。为了解释这一现象,我们分析了在 ZO 优化下扰动效应如何传播。我们发现主导层结合了两个关键特性:高扰动敏感性和在残差流中的早期位置,使得扰动引起的效应能够通过后续的解码层传播和累积。因此,该层在前向更新下产生不成比例的强且稳定的优化信号。在 LLaMA2-7B 和 Qwen3-8B 上的九个基准测试的广泛实验表明,主导层 ZO 微调在平均性能上优于全模型 MeZO 和基于 LoRA 的 ZO 微调,同时实现了高达 4.52 倍的训练加速。

英文摘要

Zeroth-order (ZO) optimization enables memory-efficient fine-tuning of large language models (LLMs) using only forward passes, but it remains unclear how useful adaptation is distributed across layers. In this work, we reveal a surprising phenomenon: ZO fine-tuning is sharply dominated by a single decoding layer. Across multiple LLM families and downstream tasks, fine-tuning this dominant layer alone consistently matches or even exceeds full-model ZO fine-tuning. We further show that the dominant layer is task-agnostic but model-specific, and can be identified before training through a simple inference-only analysis of activation outliers. Specifically, the dominant layer consistently aligns with the first activation-outlier layer in the pre-trained model. To explain this phenomenon, we analyze how perturbation effects propagate under ZO optimization. We find that the dominant layer combines two key properties: high perturbation sensitivity and early placement in the residual stream, allowing perturbation-induced effects to propagate and accumulate through the subsequent decoding layers. As a result, this layer produces disproportionately strong and stable optimization signals under forward-only updates. Extensive experiments on LLaMA2-7B and Qwen3-8B across nine benchmarks show that dominant-layer ZO fine-tuning improves average performance over full-model MeZO and LoRA-based ZO fine-tuning while achieving up to 4.52$\times$ training speedup.
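A forward-only update restricted to one layer can be sketched with the standard two-point (SPSA-style) estimator: perturb the chosen layer along a random direction, evaluate the loss twice, and step along that direction scaled by the finite difference. The toy quadratic loss, the choice of layer index 1 as the "dominant" layer, and the hyperparameters are assumptions for illustration; the paper selects the layer from activation outliers, which is not reproduced here.

```python
import random


def zo_step(params, loss_fn, layer, lr=0.05, eps=1e-3, rng=random):
    """One two-point zeroth-order update applied only to params[layer]."""
    original = list(params[layer])
    z = [rng.gauss(0.0, 1.0) for _ in original]
    # Two forward passes at w + eps*z and w - eps*z; no backward pass.
    params[layer] = [w + eps * zi for w, zi in zip(original, z)]
    loss_plus = loss_fn(params)
    params[layer] = [w - eps * zi for w, zi in zip(original, z)]
    loss_minus = loss_fn(params)
    projected_grad = (loss_plus - loss_minus) / (2.0 * eps)
    # Step along the sampled direction, scaled by the projected gradient.
    params[layer] = [w - lr * projected_grad * zi for w, zi in zip(original, z)]


# Toy stand-in for a model: three "layers" with a quadratic loss to targets.
TARGETS = [[1.0, -1.0], [0.5, 2.0], [0.0, 0.0]]


def toy_loss(params):
    return sum((w - t) ** 2
               for layer, target in zip(params, TARGETS)
               for w, t in zip(layer, target))


params = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]
initial_loss = toy_loss(params)
rng = random.Random(0)
for _ in range(500):
    zo_step(params, toy_loss, layer=1, rng=rng)  # only layer 1 is tuned
final_loss = toy_loss(params)
print(initial_loss, final_loss)
```

This sketch stores the perturbation `z` explicitly; memory-efficient variants instead regenerate it from a saved random seed.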

2606.05497 2026-06-05 cs.LG 版本更新

LEVANTE-bench: Multi-Scale Comparison of VLMs to Children Using Cognitive Tasks (or, "Is Your VLM Smarter Than a 5th Grader?")

LEVANTE-bench: 使用认知任务对VLM与儿童进行多尺度比较(或者,“你的VLM比五年级学生聪明吗?”)

Alvin Wei Ming Tan, David Cardinal, Tania Lorido-Botran, Laura Bravo-Sanchez, Sunny Yu, Michael C. Frank

发表机构 * Stanford University(斯坦福大学)

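AI summary: Presents LEVANTE-bench, a benchmark built on children's cognitive task data that systematically evaluates, at multiple scales, how well vision-language models align with children aged 5-12 on six tasks, and finds only partial alignment with human cognition.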

详情
AI中文摘要

鉴于人类经验本质上是多模态的,视觉语言模型(VLM)在模拟人类认知随经验增长和发展方面具有巨大潜力。发挥其潜力需要工具来比较VLM与人类认知发展在不同任务、年龄和人群中的表现。我们提出LEVANTE-bench,这是一个基于学习变异网络(LEVANTE)的任务和数据的基准,该网络分发跨语言和文化测量儿童认知的开源任务和数据。在LEVANTE-bench中,我们系统评估了VLM在六项任务上的表现,比较它们与三个国家5-12岁儿童(N = 1547)的对齐程度。我们在多个尺度上比较模型,评估它们的整体准确性、在任务和项目层面与儿童的对齐程度,以及它们匹配儿童试验级错误分布的程度。对齐在不同尺度上是异质的:在任务和项目层面,能力更强的模型与人类对齐更好。然而,与人类错误分布的匹配在不同任务间差异很大,对于某些任务,较小的模型更好地匹配了年幼儿童的错误。此外,即使表现最好的VLM在矩阵推理和心理旋转任务上也表现不佳。因此,当前的VLM架构仅与儿童的认知能力部分对齐。

英文摘要

Given the inherently multimodal nature of human experience, vision-language models (VLMs) hold substantial promise for modeling human cognition as it grows and develops with experience. Realizing their potential requires tools for comparing VLMs with human cognitive development across tasks, ages, and populations. We present LEVANTE-bench, a benchmark based on tasks and data from the Learning Variability Network (LEVANTE), which distributes open-source tasks and data measuring children's cognition across languages and cultures. In LEVANTE-bench, we systematically assess VLMs on six tasks, comparing their alignment with children aged 5-12 ($N$ = 1547) across three countries. We compare models at multiple scales, assessing their overall accuracy, their task- and item-level alignment with children, and how well they match children's trial-level error distributions. Alignment was heterogeneous across scales: at the level of tasks and items, more capable models aligned better with humans. However, match to human error distributions varied widely across tasks, and for several tasks, smaller models matched younger children's errors better. In addition, even the best-performing VLMs struggled on matrix reasoning and mental rotation tasks. Thus, current VLM architectures align only partially with the cognitive abilities of children.

2606.05488 2026-06-05 stat.ML cs.LG stat.ME 版本更新

Sparse Functional Singular Value Decomposition for Biclustering and Triclustering Longitudinal Data

纵向数据的稀疏函数奇异值分解用于双聚类和三聚类

Yue Zhao, Thierry Chekouo, Sandra Safo

发表机构 * Division of Biostatistics and Health Data Science University of Minnesota(生物统计学与健康数据科学系明尼苏达大学)

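AI summary: Proposes Tri-SfSVD, a framework that uses sparse penalties to perform continuous trajectory estimation together with subject, feature, and temporal selection, yielding biclusters and triclusters in longitudinal data and outperforming existing methods.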

详情
AI中文摘要

识别复杂疾病(如炎症性肠病,IBD)的亚型通常需要捕捉纵向组学数据中的潜在模式。然而,这些数据通常是高维、稀疏采样且时间上不规则观测的,对传统的(双)聚类和函数数据分析方法构成了重大挑战。我们提出Tri-SfSVD,一个统一的稀疏函数奇异值分解框架,用于发现纵向数据中的双聚类和三聚类。与现有的依赖于临时插值或强制限制性形状同质性假设的函数双聚类方法不同,Tri-SfSVD在单个优化框架中集成了连续轨迹估计与同时的对象、特征和时间选择。通过在对象、变量和时间子区域上施加稀疏惩罚,所提出的方法直接对观测数据操作,以发现对象级、对象-特征级和对象-特征-时间级的局部结构。大量模拟表明,Tri-SfSVD在高维设置下优于现有方法。应用于IBD多组学数据,该方法识别了三个双聚类,将样本聚类与不同的IBD相关临床特征以及特定细菌类群相关的微生物通路组联系起来,提供了可解释的对象-通路关联以表征疾病异质性。应用于多通道脑电图数据,该方法识别了三个三聚类,将样本聚类与不同的酒精相关表型以及局部脑活动模式联系起来,包括同一空间区域内由时间子区域分隔的亚组差异。

英文摘要

Identifying subtypes of complex conditions, such as Inflammatory Bowel Disease (IBD), often requires capturing latent patterns in longitudinal omics data. However, these data are typically high-dimensional, sparsely sampled, and irregularly observed over time, posing substantial challenges for conventional (bi)clustering and functional data analysis methods. We propose Tri-SfSVD, a unified sparse functional Singular Value Decomposition framework for discovering biclusters and triclusters in longitudinal data. Unlike existing functional biclustering methods that rely on ad hoc imputation or enforce restrictive shape-homogeneity assumptions, Tri-SfSVD integrates continuous trajectory estimation with simultaneous subject, feature, and temporal selection within a single optimization framework. By imposing sparse penalties across subjects, variables, and temporal subregions, the proposed method works directly on observed data to uncover localized structures at the subject, subject-feature, and subject-feature-time levels. Extensive simulations demonstrate that Tri-SfSVD outperforms existing approaches in high-dimensional settings. Applied to IBD multi-omics data, the method identified three biclusters linking sample clusters with distinct IBD-related clinical characteristics to microbial pathway groups associated with specific bacterial taxa, providing interpretable subject-pathway associations for characterizing disease heterogeneity. Applied to multi-channel EEG data, the method identified three triclusters linking sample clusters with distinct alcohol-related phenotypes to localized brain activity patterns, including subgroup differences separated by temporal subregions within the same spatial region.

2606.05486 2026-06-05 cs.CL cs.LG 版本更新

Localizing Prompt Ambiguity in Large Language Models with Probe-Targeted Attribution

通过探针目标归因定位大型语言模型中的提示歧义

Govind Ramesh, Yao Dou, Wei Xu

发表机构 * Georgia Institute of Technology(佐治亚理工学院)

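AI summary: Proposes PRIG, which uses a linear probe and gradient attribution to localize ambiguous positions in a prompt through intermediate representations rather than the output layer, achieving high AUROC on synthetic and human-written benchmarks.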

Comments 23 pages, 5 figures, 5 tables

详情
AI中文摘要

提示歧义是大型语言模型中常见的失败原因,但由于它是提示的潜在属性,难以定位,而现有的归因方法旨在解释可观察的输出,如logits或生成的token。我们引入了PRIG,一种梯度归因方法,使用探针logit将潜在歧义归因于token位置。具体来说,PRIG训练一个线性探针来区分清晰提示和模糊提示,并将探针分数归因于残差流中早期的token表示。为了实现token级别的评估,我们通过重写每个提示中的一个关键句子,构建了涵盖编码、数学和写作的合成歧义数据集,并用人工编写的黄金基准进行补充。在这种设置下,PRIG在定位歧义片段方面显著优于梯度归因基线,在组合合成基准上达到0.840 AUROC,在黄金集上达到0.891 AUROC。它在句子级别的歧义识别上也优于GPT-5.4,并在域外保留了有用的信号。这些结果确立了PRIG作为一种实用工具,用于识别提示中哪些部分存在歧义。更广泛地说,它们表明潜在提示属性可以通过中间表示而非输出级归因来定位。

英文摘要

Prompt ambiguity is a common source of failure in large language models, but is difficult to localize because it is a latent property of the prompt, while existing attribution methods are designed to explain observable outputs such as logits or generated tokens. We introduce PRIG, a gradient attribution method that uses a probe logit to attribute latent ambiguity to token positions. Specifically, PRIG trains a linear probe to distinguish clear prompts from ambiguous prompts and attributes the probe score to earlier token representations in the residual stream. To enable token-level evaluation, we construct synthetic ambiguity datasets across coding, math, and writing by rewriting one task-critical sentence per prompt, and complement them with a human-written gold benchmark. In this setting, PRIG localizes ambiguous spans substantially better than gradient attribution baselines, achieving 0.840 AUROC on the combined synthetic benchmark and 0.891 AUROC on the gold set. It also outperforms GPT-5.4 on sentence-level ambiguity identification and retains useful signal out-of-domain. These results establish PRIG as a practical tool for identifying which parts of a prompt are ambiguous. More broadly, they suggest that latent prompt properties can be localized through intermediate representations, rather than through output-level attribution.
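The idea of attributing a probe logit to token positions can be sketched with gradient-times-input on a deliberately simplified model. Here the probed representation is assumed to be the mean of the token representations, so the gradient of the probe logit with respect to each token is just the probe weight divided by the number of tokens. In PRIG the gradient would instead flow through the intervening transformer layers, and the abstract does not say which attribution rule is used; the probe weights and token vectors below are hypothetical.

```python
def probe_logit(token_reps, w, b):
    """Linear probe applied to the mean-pooled token representations."""
    n = len(token_reps)
    pooled = [sum(h[k] for h in token_reps) / n for k in range(len(w))]
    return sum(wk * pk for wk, pk in zip(w, pooled)) + b


def gradient_x_input(token_reps, w):
    """Per-token attribution: (d logit / d h_i) . h_i, where the gradient
    under mean pooling is w / n for every token."""
    n = len(token_reps)
    return [sum(wk * hk for wk, hk in zip(w, h)) / n for h in token_reps]


# Hypothetical probe and three token representations.
w, b = [1.0, -0.5], 0.0
tokens = [[0.1, 0.2], [2.0, -1.0], [0.0, 0.4]]

logit = probe_logit(tokens, w, b)
attributions = gradient_x_input(tokens, w)
most_ambiguous = max(range(len(tokens)), key=lambda i: attributions[i])
print(most_ambiguous, attributions)
```

Because this toy model is linear, the attributions sum exactly to the logit minus the bias; with a real network that completeness property would not hold for plain gradient-times-input.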

2606.05484 2026-06-05 cs.LG 版本更新

Learned Subspace Compression for Communication-Efficient Pipeline Parallelism

学习子空间压缩以实现通信高效的流水线并行

Paul Janson, Edouard Oyallon, Eugene Belilovsky

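Affiliations * Concordia University; Mila Quebec AI Institute; CNRS, Sorbonne University

AI summary Proposes MAPL, which learns a task-optimal orthogonal projection for each pipeline stage under a Stiefel manifold constraint and combines it with factorized anchor embeddings and residual vector quantization, achieving high compression ratios with negligible performance degradation on low-bandwidth networks.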

Comments Accepted at the 2nd Workshop on Connecting Low-rank Representations in AI, ICML 2026

详情
AI中文摘要

流水线并行使得训练超过单设备内存的大型语言模型成为可能,然而在低带宽网络上训练时,阶段间激活通信成为主要瓶颈。最近的工作提出使用固定正交投影来压缩激活,但这仍然会导致显著的性能下降,并且需要大量非标准调整来约束优化。一个自然的替代方案是为每个流水线阶段学习一个低秩投影,但在训练过程中保持这些投影器的必要正交性仍然是一个挑战。我们提出了流形感知投影学习(MAPL),该方法将阶段间压缩视为在显式Stiefel流形(正交矩阵)约束下的可学习正交投影。MAPL不是规定固定的全局子空间,而是让每个流水线阶段通过流形约束的最速下降法发现并持续适应其任务最优的压缩子空间。为了在阶段边界恢复特定于token的信号,我们引入了每个阶段的因子化锚点嵌入,使得能够以可忽略的通信开销实现全秩激活重建。我们进一步展示了可以在投影后结合残差向量量化,并采用流式码本同步协议来分摊字典通信。在从150M到1B参数的LLaMA模型上,我们表明MAPL可以轻松应用于现有流水线,并且能够实现高压缩比,性能下降可忽略不计,与子空间网络相比,在性能与压缩之间的权衡得到了显著改善。

英文摘要

Pipeline parallelism enables training of large language models that exceed single-device memory, yet inter-stage activation communication becomes the dominant bottleneck when training on low-bandwidth networks. Recent work in this area has proposed using fixed orthogonal projections to compress activations. However, this still results in significant performance degradation and requires a number of non-standard adaptations to constrain the optimization. A natural alternative is to learn a low-rank projection for each pipeline stage; however, maintaining the necessary orthogonality of these projectors during training remains a challenge. We present Manifold Aware Projection Learning (MAPL), a method that treats inter-stage compression as a learnable orthogonal projection under explicit Stiefel manifold (orthogonal matrices) constraints. Rather than prescribing a fixed global subspace, MAPL lets each pipeline stage discover and continuously adapt its own task-optimal compression subspace via manifold-constrained steepest descent. To recover token-specific signals at stage boundaries, we introduce per-stage factorized anchor embeddings that allow for full-rank activation reconstruction with negligible communication overhead. We further show that we can incorporate residual vector quantization after projection with a streaming codebook synchronization protocol that amortizes dictionary communication. Across LLaMA models from 150M to 1B parameters, we show that MAPL can be easily applied to existing pipelines and can achieve high compression with negligible performance degradation, with a drastically improved performance vs. compression tradeoff compared to Subspace Networks.
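
The core constraint in the abstract, keeping a learned projection orthonormal while it is being trained, can be sketched as a gradient step followed by a retraction back onto the Stiefel manifold. The sketch below is an assumption-laden illustration: it uses Gram-Schmidt re-orthonormalization as the retraction, which is a stand-in chosen for brevity and not necessarily the manifold-constrained steepest descent MAPL uses, and the matrices and gradients are hypothetical.

```python
# Sketch: keep a d x k projection orthonormal (a point on the Stiefel manifold)
# after a gradient step. Gram-Schmidt is an assumed stand-in retraction; the
# abstract does not specify the update rule MAPL uses.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(cols):
    """Orthonormalize a list of column vectors (assumed linearly independent)."""
    basis = []
    for c in cols:
        v = list(c)
        for q in basis:
            coef = dot(q, v)
            v = [vi - coef * qi for vi, qi in zip(v, q)]
        norm = dot(v, v) ** 0.5
        basis.append([vi / norm for vi in v])
    return basis

def retracted_step(cols, grads, lr):
    """Euclidean gradient step followed by re-orthonormalization."""
    stepped = [[c - lr * g for c, g in zip(col, grad)]
               for col, grad in zip(cols, grads)]
    return gram_schmidt(stepped)

def compress(cols, x):
    """Sender side: transmit k coefficients instead of d activations (z = P^T x)."""
    return [dot(q, x) for q in cols]

def reconstruct(cols, z):
    """Receiver side: x_hat = P z."""
    d = len(cols[0])
    return [sum(zj * q[i] for zj, q in zip(z, cols)) for i in range(d)]

P = gram_schmidt([[1.0, 1.0, 0.0, 0.0], [0.0, 1.0, 1.0, 0.0]])  # d=4, k=2
G = [[0.3, -0.1, 0.2, 0.0], [0.0, 0.4, -0.2, 0.1]]              # hypothetical gradients
P = retracted_step(P, G, lr=0.1)
x = [1.0, 2.0, 3.0, 4.0]
x_hat = reconstruct(P, compress(P, x))
```

Because the columns stay orthonormal, the receiver can reconstruct with the transpose of the same matrix, and projecting `x_hat` again leaves it unchanged. The reconstruction here is rank-k only; the anchor embeddings the abstract mentions for full-rank recovery are not modeled.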

2606.05481 2026-06-05 cs.LG cs.AI eess.SP 版本更新

Towards Unified and Data-Efficient Prognostics and Health Management with Tabular Foundation Models

面向统一且数据高效的预测与健康管理:基于表格基础模型

Raffael Theiler, Lev Telyatnikov, Leandro Von Krannichfeldt, Olga Fink

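Affiliations * IMOS Lab, EPFL

AI summary Proposes applying tabular foundation models to industrial time series via in-context learning for prognostics and health management (PHM) tasks; the models achieve the best average ranks against sequence models, transformer baselines, and gradient-boosted trees, and are competitive in low-data regimes.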

详情
AI中文摘要

数据驱动的预测与健康管理(PHM)利用时变状态监测数据来诊断系统状态并估计工程资产的剩余使用寿命。这些任务是维护规划的核心,但工业PHM数据通常是碎片化的、部分观测且标注不足,这阻碍了监督学习。基础模型提供了一条通往可重用预测系统的途径,然而大多数时间序列基础模型是为预测设计的,并假设长序列、连贯且规则采样。为弥补这一差距,我们提出了一个框架,利用上下文学习将表格基础模型应用于工业时间序列,并在多种PHM任务上对其进行评估。通过将原始单元级信号转换为表格行,我们展示了这些模型在多个任务(包括预测和诊断)上表现良好,且数据效率高。我们在统一的评估协议下,直接将其与序列模型、Transformer基线和梯度提升树进行比较。结果表明,表格基础模型在预测和诊断任务中取得了最佳平均排名。我们的发现进一步表明,基于PFN的模型在低数据场景下具有竞争力,时间上下文可以在表格表示中保留,且性能依赖于子采样下的代表性上下文构建。这些结果证明,表格基础模型为异构PHM问题提供了一个实用且通用的接口。

英文摘要

Data-driven Prognostics and Health Management (PHM) uses time-varying condition-monitoring data to diagnose system states and estimate remaining useful life in engineered assets. These tasks are central to maintenance planning, but industrial PHM data are often fragmented, partially observed, and poorly labeled, which hinders supervised learning. Foundation models offer a route toward reusable predictive systems, yet most time-series foundation models are designed for forecasting and assume long, coherent, regularly sampled sequences. To address this gap, we propose a framework for applying Tabular Foundation Models to industrial time series using in-context learning, and we evaluate them on a variety of PHM tasks. By converting raw unit-level signals into tabular rows, we show that these models perform well across multiple tasks, including prognostics and diagnostics, and are highly data efficient. We compare them directly with sequence models, transformer baselines, and gradient-boosted trees under a common evaluation protocol. The results indicate that tabular foundation models achieve the best average ranks across prognostic and diagnostic tasks. Our findings further show that PFN-based models are competitive in low-data regimes, that temporal context can be preserved in the tabular representation, and that performance depends on representative context construction under subsampling. These results demonstrate that tabular foundation models provide a practical and general interface for heterogeneous PHM problems.
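
One way to picture the conversion of raw unit-level signals into tabular rows is a sliding window that emits summary statistics per window. The sketch below is only an assumed illustration: the window length, stride, and feature set are hypothetical choices and not the configuration used in the paper.

```python
# Sketch: turn a raw per-unit sensor signal into tabular rows with windowed
# summary statistics. Window length and feature set are illustrative
# assumptions, not the paper's configuration.
from statistics import mean, pstdev

def window_rows(unit_id, signal, window, stride):
    """Return one dict (table row) per sliding window over `signal`."""
    rows = []
    for start in range(0, len(signal) - window + 1, stride):
        chunk = signal[start:start + window]
        rows.append({
            "unit": unit_id,
            "t_end": start + window - 1,    # keeps temporal position as a column
            "mean": mean(chunk),
            "std": pstdev(chunk),
            "min": min(chunk),
            "max": max(chunk),
            "delta": chunk[-1] - chunk[0],  # crude trend feature
        })
    return rows

signal = [1.0, 1.1, 1.3, 1.2, 1.6, 1.9, 2.4, 3.0]  # hypothetical degradation signal
rows = window_rows("unit-1", signal, window=4, stride=2)
```

Keeping `t_end` and a trend feature in each row is one simple way temporal context could survive in a tabular representation; rows built this way could then serve as in-context examples for a tabular model.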

2606.05478 2026-06-05 cs.CV cs.LG 版本更新

Can We Predict The Human Preference For Text-to-Image Content Prior To Generation And Is It Even Useful To Do So?

我们能否在生成之前预测文生图内容的人类偏好,以及这样做是否有用?

Joong Ho Kim, Keith G. Mills

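Affiliations * LSU ATHENA Lab

AI summary Studies whether Human Preference Metric (HPM) scores can be predicted before a diffusion model generates an image, uses such predictions to improve generation quality, and evaluates which HPMs are best suited to the task.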

Comments Code is available at https://github.com/LSU-ATHENA/HPM-Predict

详情
AI中文摘要

扩散模型(DM)通过从用户提示中合成高质量、逼真的视觉内容,彻底改变了文本驱动的生成。而先前视觉生成的进展(如VAE和GAN)主要基于感知或视觉相似性指标(如FID、PSNR)进行评估,DM的进展促进了更先进的人类偏好指标(HPM)的发展,这些指标将人类判断建模并量化为标量值。然而,DM使用固有的随机过程合成内容,其中随机噪声种子生成。初始随机噪声直接定性和定量地影响生成输出的质量。这种影响在本地部署场景的小型模型中尤为显著。鉴于这一现象,我们首先研究在投入计算资源进行生成之前,我们能在多大程度上预测标量HPM分数。进一步,我们研究能在多大程度上利用这种预测来改善生成图像的质量,并研究哪些HPM最适合此任务。我们的研究表明,这不仅是可能的,而且可以实现可忽略的硬件开销。

英文摘要

Diffusion Models (DM) have revolutionized text-driven generation by enabling the synthesis of high-quality, photorealistic visual content from user prompts. Whereas prior advances in visual generation such as VAEs and GANs were primarily evaluated on perceptual or visual similarity metrics such as FID and PSNR, DM advances have fostered the development of more advanced Human Preference Metrics (HPM) that model and quantify human judgment as scalar values. However, DMs synthesize content using an inherently stochastic process where random noise seeds generation. The initial random noise directly affects the quality of generated outputs, both qualitatively and quantitatively. This influence is pronounced in smaller models for local deployment scenarios. Given this phenomenon, we first investigate to what extent we can predict scalar HPM scores prior to committing compute resources for generation. We then investigate to what extent we can leverage such prediction to improve the quality of generated images, and also study which HPMs are best suited for this task. Our investigation reveals that not only is this possible, but that it is feasible with negligible hardware overhead.

2606.05474 2026-06-05 q-bio.BM cs.LG 版本更新

AlloGen: Conformation-Selective Binder Generation with Differential State Scoring

AlloGen: 基于差异状态评分的构象选择性结合物生成

Hanqun Cao, Zachary Quinn, Aastha Pal, Sumi Kimura, Jingjie Zhang, Pheng Ann Heng, Pranam Chatterjee

Affiliations * Department of Computer Science and Engineering; The Chinese University of Hong Kong; Department of Bioengineering; University of Pennsylvania; Department of Computer and Information Science

AI summary Proposes the AlloGen framework, which pairs backbone generation with a learnable conformation-selectivity scorer $Q_θ$ to design binders that are selective for particular conformational states of a protein.

详情
AI中文摘要

蛋白质结合物设计主要优化亲和力,忽视了构象选择性:对于激酶、核受体和GPCR等变构靶点,无论结合多紧密,同时结合活性态和非活性态的结合物无法提供功能特异性。我们提出AlloGen,一个模块化框架,将骨架生成与学习到的状态选择性评分器$Q_θ$解耦,$Q_θ$是一个SE(3)不变的界面图变换器,通过两阶段课程训练,先学习界面几何,再施加构象区分。由于$Q_θ$完全可微且与生成器无关,它可以作为被动重排序器或主动基于梯度的引导器与任何骨架生成器集成,无需重新训练。在跨越多个家族和构象机制的多样化蛋白质基准上,AlloGen一致地识别出优先识别所需结构状态同时排斥替代构象的结合物。在钙调蛋白上的实验验证进一步表明,这些计算选择性信号可转化为物理分子,产生从头设计的肽,结合所需的全息构象,而对apo状态无检测到的结合。总之,这些结果确立了构象选择性作为可学习属性,并为状态选择性蛋白质结合物设计提供了通用框架。

英文摘要

Protein binder design has largely optimized for affinity alone, leaving conformational selectivity unaddressed: for allosteric targets such as kinases, nuclear receptors, and GPCRs, a binder that engages both active and inactive states provides no functional specificity regardless of how tightly it binds. We introduce AlloGen, a modular framework that decouples backbone generation from a learned state-selectivity scorer $Q_θ$, an SE(3)-invariant interface graph transformer trained via a two-phase curriculum that first learns interface geometry before imposing conformational discrimination. Because $Q_θ$ is fully differentiable and generator-agnostic, it integrates with any backbone generator as a passive reranker or an active gradient-based guide without retraining. Across a diverse benchmark of proteins spanning multiple families and conformational mechanisms, AlloGen consistently identifies binders that preferentially recognize desired structural states while rejecting alternative conformations. Experimental validation on calmodulin further demonstrates that these computational selectivity signals translate to physical molecules, yielding de novo peptides that bind the desired holo conformation while exhibiting no detectable binding to the apo state. Together, these results establish conformational selectivity as a learnable property and provide a general framework for state-selective protein binder design.

2606.05444 2026-06-05 cs.CL cs.AI cs.LG 版本更新

Multilingual Coreference Resolution via Cycle-Consistent Machine Translation

通过循环一致性机器翻译的多语言共指消解

Adriana-Valentina Costache, Eduard Poesina, Silviu-Florin Gheorghe, Paul Irofti, Radu Tudor Ionescu

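Affiliations * Department of Computer Science, University of Bucharest

AI summary Proposes a pipeline that uses cycle-consistent machine translation to generate or expand training data, assesses translation quality by cosine similarity in the latent space of a BERT model, and weights the loss function accordingly, yielding significant coreference resolution gains in low-resource languages.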

详情
AI中文摘要

共指消解是一项核心的自然语言处理任务,具有广泛的下游应用,例如机器翻译、问答、文档摘要等。虽然该任务在英语中得到了充分研究,但其他语言(尤其是低资源语言)的共指消解关注相对较少。为了弥补这一差距,我们提出了一种新颖的共指消解管道,该管道利用从英语到目标低资源语言的机器翻译(MT)来生成或扩展训练数据。为了自动验证翻译样本的质量,我们将样本反向翻译,并通过BERT模型潜在空间中的余弦相似度评估与原始英语样本的相似性。得到的相似度分数被整合到损失函数中,以根据样本的MT循环一致性对训练样本进行加权。在四种低资源语言上的大量实验表明,我们的管道在共指消解中带来了显著的性能提升。此外,我们的管道使得在之前没有可用语料库的语言中也能实现准确的共指消解。

英文摘要

Coreference resolution is a core NLP task, having a broad range of downstream applications, e.g. machine translation, question answering, and document summarization. While the task is well-studied in English, comparatively less attention is dedicated to coreference resolution in other languages, especially low-resource ones. To mitigate this gap, we propose a novel coreference resolution pipeline that harnesses machine translation (MT) from English to a target low-resource language, to generate or expand training data. To automatically validate the quality of the translated samples, we back-translate the samples and assess the similarity with the original English samples via cosine similarity in the latent space of a BERT model. The resulting similarity scores are integrated into the loss function to weight training samples according to their MT cycle consistency. Extensive experiments on four low-resource languages show that our pipeline brings significant performance gains in coreference resolution. Moreover, our pipeline enables accurate coreference resolution in languages where no previous corpora were available.
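
The cycle-consistency weighting described above can be sketched in a few lines: embed the original sentence and its back-translation, take the cosine similarity, and use it as a per-sample loss weight. The embeddings below are hypothetical two-dimensional stand-ins for BERT vectors, and normalizing by the sum of weights is an assumption, since the abstract does not say how the scores enter the loss.

```python
# Sketch: weight each translated training sample by the cosine similarity
# between embeddings of the original sentence and its back-translation.
# Embeddings are hypothetical; the paper's exact weighting scheme is not given.
import math

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

def weighted_loss(losses, weights):
    """Weighted mean of per-sample losses (normalization is an assumption)."""
    return sum(l * w for l, w in zip(losses, weights)) / sum(weights)

orig_emb = [[1.0, 0.0], [0.6, 0.8]]   # embeddings of original English samples
back_emb = [[1.0, 0.0], [0.8, 0.6]]   # embeddings of their back-translations
weights = [cosine(o, b) for o, b in zip(orig_emb, back_emb)]
loss = weighted_loss([2.0, 1.0], weights)
```

A sample whose back-translation drifts from the original gets a weight below 1 and so contributes less to the training signal than a sample that survives the round trip intact.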

2606.05438 2026-06-05 cs.LG math.OC 版本更新

Sharp First-Order Lower Bounds for Higher-Order Smooth Nonconvex Optimization

高阶光滑非凸优化的尖锐一阶下界

Dongruo Zhou

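Affiliations * Department of Computer Science, Indiana University

AI summary For higher-order smooth nonconvex optimization, constructs hard instances via a block-chain mechanism and proves first-order lower bounds that match the known upper bounds, such as Ω(ε^{-7/4}) in the Hessian-Lipschitz case and Ω(ε^{-5/3}) in the third-order-smooth case.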

Comments 24 pages, 1 table

详情
AI中文摘要

我们研究了在目标函数满足高阶光滑性假设时,寻找光滑非凸优化中ε-驻点的确定性一阶预言复杂度。虽然经典的ε^{-2}速率在仅Lipschitz梯度条件下是最优的,但高阶光滑性导致了加速的一阶上界,最显著的是在Lipschitz Hessian下的ε^{-7/4}速率和在Lipschitz三阶导数下的ε^{-5/3}速率。然而,匹配的下界一直未解决。我们通过证明一个新的无维数的一阶下界来填补这一空白,该下界适用于任意有限光滑阶的高阶光滑非凸函数。特别地,我们的构造在Hessian-Lipschitz情形下给出了匹配的Ω(ε^{-7/4})下界,在三阶光滑情形下给出了匹配的Ω(ε^{-5/3})下界。硬实例基于一种块链机制,该机制强制块状预言揭示,同时保持标量硬实例所需的光滑结构。该下界构造是在ChatGPT 5.5 Pro的协助下发现的,随后由作者验证。

英文摘要

We study the deterministic first-order oracle complexity of finding \(ε\)-stationary points in smooth nonconvex optimization when the objective satisfies higher-order smoothness assumptions. While the classical \(ε^{-2}\) rate is optimal under only Lipschitz gradients, higher-order smoothness leads to accelerated first-order upper bounds, most notably the \(ε^{-7/4}\) rate under Lipschitz Hessians and the \(ε^{-5/3}\) rate under Lipschitz third derivatives. The matching lower bounds, however, have remained open. We resolve this gap by proving a new dimension-free first-order lower bound for higher-order smooth nonconvex functions, valid for every finite smoothness order. In particular, our construction gives a matching \(Ω(ε^{-7/4})\) lower bound in the Hessian-Lipschitz case and a matching \(Ω(ε^{-5/3})\) lower bound in the third-order-smooth regime. The hard instance is based on a block-chain mechanism that enforces blockwise oracle revelation while preserving the smoothness structure needed for the scalar hard instance. The lower-bound construction was discovered with the assistance of ChatGPT 5.5 Pro and subsequently verified by the authors.

2606.05435 2026-06-05 cs.LG cs.CR 版本更新

DP-MacAdam: Differentially Private Mechanism with Adaptive Clipping and Adaptive Momentum

DP-MacAdam:具有自适应裁剪和自适应动量的差分隐私机制

Naima Tasnim, Lalitha Sankar, Oliver Kosut

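Affiliations * University of Southern California

AI summary Proposes the DP-MacAdam algorithm, which uses the same gradient mean and variance estimates for both adaptive clipping and momentum updates, improving model utility without manual tuning of the clipping threshold.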

Comments 6 pages, 2 tables

详情
AI中文摘要

差分隐私随机梯度下降(DP-SGD)已成为隐私保护机器学习的标准框架,但其依赖固定梯度裁剪阈值来限制敏感度,这仍然是一个重要的实际限制。诸如AdaClip等自适应裁剪算法在裁剪和添加噪声之前对梯度进行平移和缩放,使得裁剪后的梯度产生更具信息性的下降方向。平移和缩放参数根据经验均值和方差自适应选择。然而,在现有的自适应裁剪算法中,这些经验估计尚未同时用于动量以加速训练本身。另一方面,DP-Adam是一种利用基于梯度均值和方差的类Adam动量更新来加速训练的算法,但并未利用这些估计进行自适应裁剪。在这项工作中,我们提出了具有自适应裁剪和自适应动量的差分隐私机制(DP-MacAdam),这是一种新颖的算法,它结合了这两种方法,从而将相同的均值和方差估计同时用于裁剪和动量。我们进行了分析,表明DP-MacAdam以无偏方式估计梯度方差。此外,我们实证评估了DP-MacAdam的隐私和准确性,证明与DP-SGD、AdaClip和DP-Adam基线相比,它在无需手动调整裁剪阈值的情况下实现了改进的模型效用。

英文摘要

Differentially private stochastic gradient descent (DP-SGD) has become the standard framework for privacy-preserving machine learning, yet its reliance on a fixed gradient clipping threshold to limit sensitivity remains a significant practical limitation. Adaptive clipping algorithms such as AdaClip shift and scale the gradient prior to clipping and adding noise so that the clipped gradient yields a more informative descent direction. The shift and scaling parameters are selected adaptively based on the empirical mean and variance. However, in existing adaptive clipping algorithms, these empirical estimates have not also been used for momentum to accelerate training itself. On the other hand, DP-Adam is an algorithm that exploits Adam-like momentum updates based on the gradient mean and variance to accelerate training, but does not exploit these estimates for adaptive clipping. In this work, we propose Differentially Private Mechanism with Adaptive Clipping and Adaptive Momentum (DP-MacAdam), a novel algorithm that combines these two approaches so as to use the same mean and variance estimates for both clipping and momentum. We perform an analysis showing that DP-MacAdam estimates the gradient variances in a bias-free manner. In addition, we empirically evaluate the privacy and accuracy of DP-MacAdam, demonstrating that it achieves improved model utility compared to DP-SGD, AdaClip, and DP-Adam baselines, without requiring manual tuning of the clipping threshold.
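
The shift-and-scale clipping step that the abstract attributes to adaptive clipping methods can be sketched as follows. This is an illustrative sketch only: the mean `m` and scale `s` are supplied by hand here (in a real mechanism they would themselves be estimated privately), the unit clipping norm and the noise parameter `sigma` are assumptions, and nothing in this block constitutes a calibrated differential-privacy guarantee or the DP-MacAdam update.

```python
# Sketch of shift-and-scale clipping in the style the abstract attributes to
# adaptive clipping. `m` and `s` are given by hand; in practice they would be
# privately estimated. This is not a calibrated DP mechanism.
import math
import random

def shift_scale_clip(g, m, s, sigma, rng):
    """Shift and scale g, clip to unit L2 norm, add Gaussian noise, map back."""
    w = [(gi - mi) / si for gi, mi, si in zip(g, m, s)]
    norm = math.sqrt(sum(wi * wi for wi in w))
    factor = min(1.0, 1.0 / norm) if norm > 0 else 1.0
    noisy = [wi * factor + rng.gauss(0.0, sigma) for wi in w]
    return [si * ni + mi for si, ni, mi in zip(s, noisy, m)]

rng = random.Random(0)
g = [3.0, -1.0]
m = [2.5, -1.2]   # hypothetical running mean of the gradient
s = [1.0, 0.5]    # hypothetical running standard deviation
centered = shift_scale_clip(g, m, s, sigma=0.0, rng=rng)
uncentered = shift_scale_clip(g, [0.0, 0.0], [1.0, 1.0], sigma=0.0, rng=rng)
```

With noise switched off, the well-centered call passes the gradient through unchanged, while the uncentered call shrinks it to unit norm. That contrast is the motivation for shifting and scaling before clipping: less of the gradient is lost to the clip.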

2606.05434 2026-06-05 cs.LG cs.AI 版本更新

Selective-Advantage Entropy-Adaptive Horizon GRPO: Asymmetric Token-Level Discounting for Efficient Reinforcement Learning of Language Models

选择性优势熵自适应视野GRPO:用于语言模型高效强化学习的非对称令牌级折扣

Chirag Chawla, Rohan Charudatt Salvi, Madhav S. Baidya

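Affiliations * Indian Institute of Technology (BHU); Department of Computer Science, University of Illinois Chicago

AI summary Proposes Selective-Advantage Entropy-Adaptive Horizon GRPO (SA-AH-GRPO), which applies entropy-based token-level discounting only to negative-advantage rollouts, stabilizing training on GSM8K mathematical reasoning while matching the peak accuracy of standard GRPO.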

Comments 16 pages, 4 Figures, 7 Tables

详情
AI中文摘要

组相对策略优化(GRPO)已成为一种有效的强化学习算法,用于在推理任务上对齐语言模型,但它对称地处理每个令牌位置和每个采样轨迹。我们引入了两个互补的扩展:(i) 自适应视野GRPO(AH-GRPO),它使用基于累积熵的折扣对每个令牌的策略梯度进行加权,当模型不确定时减少有效视野;(ii) 选择性优势AH-GRPO(SA-AH-GRPO),它仅对负优势轨迹应用此折扣,而保留正优势的成功轨迹不受衰减。我们在GSM8K数学推理基准上,使用通过LoRA微调的Qwen 2.5-1.5B-Instruct和Qwen 2.5-3B-Instruct模型,评估了alpha=0的标准GRPO、alpha=0.5的AH-GRPO和alpha=0.5的SA-AH-GRPO。在3B模型上,SA-AH-GRPO在第30步达到峰值Pass@1=0.858,并在180步保持0.846,训练方差降至0.0246,相比GRPO减少了3.6倍,同时匹配其峰值准确率。在1.5B模型上,SA-AH-GRPO达到峰值Pass@1=0.686,优于零样本基线0.637。我们的分析表明,非对称折扣保留了正确解上的完整梯度信号,防止了熵崩溃,并显著稳定了训练,为结构化生成任务上具有可验证奖励的强化学习提供了一种原则性的归纳偏置。

英文摘要

Group Relative Policy Optimisation (GRPO) has emerged as an effective reinforcement-learning algorithm for aligning language models on reasoning tasks, but it treats every token position and every sampled rollout symmetrically. We introduce two complementary extensions: (i) Adaptive-Horizon GRPO (AH-GRPO), which weights each token's policy gradient using a cumulative entropy-based discount that reduces the effective horizon when the model is uncertain, and (ii) Selective-Advantage AH-GRPO (SA-AH-GRPO), which applies this discounting only to negative-advantage rollouts, leaving positive-advantage, successful trajectories unattenuated. We evaluate standard GRPO with alpha = 0, AH-GRPO with alpha = 0.5, and SA-AH-GRPO with alpha = 0.5 on the GSM8K mathematical reasoning benchmark using both Qwen 2.5-1.5B-Instruct and Qwen 2.5-3B-Instruct fine-tuned with LoRA. On the 3B model, SA-AH-GRPO achieves Pass@1 = 0.858 at its peak at step 30 and maintains 0.846 at 180 steps, with training variance reduced to 0.0246, a 3.6 times reduction relative to GRPO while matching its peak accuracy. On the 1.5B model, SA-AH-GRPO achieves a peak Pass@1 of 0.686, improving over the zero-shot baseline of 0.637. Our analysis shows that asymmetric discounting preserves the full gradient signal on correct solutions, prevents entropy collapse, and substantially stabilises training, suggesting a principled inductive bias for reinforcement learning with verifiable rewards on structured generation tasks.
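
The asymmetric discounting idea can be sketched with a per-token weight that decays with cumulative entropy and is applied only when the rollout's advantage is negative. The exponential form `exp(-alpha * cumulative_entropy)` below is an assumed stand-in, since the abstract does not give the exact discount; it is chosen so that `alpha = 0` recovers uniform weights, in line with the abstract's description of standard GRPO as the `alpha = 0` case. The entropies are hypothetical.

```python
# Sketch of asymmetric entropy-based token discounting. The exponential form
# exp(-alpha * cumulative entropy) is an assumed stand-in; the abstract does
# not give the exact discount used by AH-GRPO or SA-AH-GRPO.
import math

def token_weights(entropies, advantage, alpha, selective=True):
    """Per-token weights for the policy-gradient terms of one rollout."""
    if selective and advantage >= 0:
        return [1.0] * len(entropies)   # successful rollouts keep full signal
    weights, cumulative = [], 0.0
    for h in entropies:
        cumulative += h
        weights.append(math.exp(-alpha * cumulative))
    return weights

entropies = [0.2, 1.5, 0.1]             # hypothetical per-token entropies
w_neg = token_weights(entropies, advantage=-1.0, alpha=0.5)
w_pos = token_weights(entropies, advantage=+1.0, alpha=0.5)
w_grpo = token_weights(entropies, advantage=-1.0, alpha=0.0)
```

Setting `selective=False` would apply the discount to every rollout, which corresponds to the symmetric AH-GRPO variant described in the abstract.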

2606.05415 2026-06-05 cs.CL cs.AI cs.LG 版本更新

Executable Schema Contracts: From Automatic Ingestion to Multi-Source Retrieval

可执行模式合约:从自动摄入到多源检索

Padmaja Jonnalagedda, Yuguang Yao, Xiang Gao, Hilaf Hasson, Kamalika Das

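Affiliations * Intuit AI Research

AI summary Proposes a system that automatically discovers an executable schema from multi-source data and uses it as a shared contract, improving multi-source question answering through schema-conditioned retrieval routing and structural analysis.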

Comments 9 pages, 4 figures, plus supplementary appendix

详情
AI中文摘要

现实世界的数据跨越表格、文档和半结构化文件,具有隐式语义。查询这些数据需要跨不一致的模式和格式整合证据,但现有方法要么需要昂贵的人工工程,要么完全绕过结构。我们提出一个系统,自动从原始多源数据中发现可执行模式,并将其用作知识图谱构建和查询时检索的共享合约。一个封闭世界的字段目录将基于LLM的模式发现限制在已证实的字段上;确定性结构分析推断身份键、外键和源层次结构;由此产生的模式驱动提取、去重和跨源链接,形成具有溯源意识的知识图谱。在查询时,该模式(可选地通过单调协议扩展)调节一个多工具代理,该代理在结构化查找、图遍历和向量搜索之间路由检索,返回带有可追溯引用的有根据的答案。在使用相同LLM、数据和评估框架的受控零样本比较中,该系统在四个QA基准上优于仅检索和基于分解的基线,消融实验表明模式条件路由、结构智能和模式引导构建各自贡献了性能提升。

英文摘要

Real-world data spans tables, documents, and semi-structured files with implicit semantics. Querying this data requires integrating evidence across inconsistent schemas and formats, yet existing approaches either demand costly manual engineering or bypass structure entirely. We present a system that automatically discovers an executable schema from raw multi-source data and uses it as a shared contract for knowledge graph construction and query-time retrieval. A closed-world field catalog constrains LLM-based schema discovery to attested fields; deterministic structural analysis infers identity keys, foreign keys, and source hierarchy; and the resulting schema drives extraction, deduplication, and cross-source linking into a provenance-aware knowledge graph. At query time the schema -- optionally extended via a monotonic protocol -- conditions a multi-tool agent routing retrieval across structured lookup, graph traversal, and vector search, returning grounded answers with traceable citations. In controlled zero-shot comparisons using the same LLM, data, and evaluation harness, the system improves over retrieval-only and decomposition-based baselines across four QA benchmarks, with ablations showing that schema-conditioned routing, structural intelligence, and schema-guided construction each contribute to the gains.

2606.05414 2026-06-05 cs.CL cs.AI cs.HC cs.LG 版本更新

When Evidence is Sparse: Weakly Supervised Early Failure Alerting in Dialogs and LLM-Agent Trajectories

当证据稀疏时:对话和LLM-Agent轨迹中的弱监督早期失败预警

Avinash Baidya, Xinran Liang, Ruocheng Guo, Xiang Gao, Kamalika Das

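Affiliations * Intuit AI Research; Princeton University

AI summary For early failure alerting in dialogs and LLM-agent trajectories, proposes a two-stage method that uses attention to learn turn-level failure evidence from sparse trajectory-level labels and pairs it with the α-STOP policy for controllable early alerting, improving Pareto-frontier quality and reducing training cost across five benchmarks.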

Comments 9 pages, 14 figures, and appendix

详情
AI中文摘要

早期失败预警需要在对话或智能体轨迹尚未完成时,决定是否将其标记为可能失败。这具有挑战性,因为监督信号通常仅以轨迹级成功/失败标签的形式提供,而预警必须从部分交互中发出。先前的早期分类方法通常通过将终端标签分配给每个前缀来弥合这一差距,将每个回合视为失败证据。我们假设这种前缀标签假设与多轮语言交互不匹配,因为最终失败的证据是稀疏且常常延迟的。在本文中,我们引入了一种两阶段方法,从这种稀疏证据结构中学习,并使用由此产生的风险估计进行可控的早期预警。具体来说,我们的基于注意力的失败预测器从轨迹标签中学习稀疏的回合级失败证据,并利用它从部分历史中估计失败风险。然后,我们将该预测器与α-STOP配对,这是一种单一偏好条件停止策略,在推理时选择准确率-早期性的操作点,而不是为每个偏好训练单独的触发器。在涵盖客户支持、任务导向对话、说服、工具使用和规划的五个基准上,我们首先表明高相关性失败证据仅占回合的4.7-11.3%,并且平均在轨迹的59.0-83.6%之后首次出现。我们进一步表明,基于注意力的预测器将帕累托前沿质量(超体积)比朴素前缀监督提高了1-10%,并且完整系统将前沿质量比最先进的触发器策略提高了3-42%,同时将每个操作点的训练成本降低了1-3个数量级。

英文摘要

Early failure alerting requires deciding, while a dialog or agent trajectory is still unfolding, whether to flag it as likely to fail. This is challenging because supervision is typically available only as a trajectory-level success/failure label while alerts must be raised from partial interactions. Prior early-classification methods often bridge this gap by assigning the terminal label to every prefix, treating every turn as failure evidence. We hypothesize that this prefix-label assumption is poorly matched to multi-turn language interactions, where evidence of eventual failure is sparse and often delayed. In this paper, we introduce a two-stage approach that learns from this sparse evidence structure and uses the resulting risk estimates for controllable early alerting. Specifically, our attention-based failure predictor learns sparse turn-level failure evidence from trajectory labels and uses it to estimate failure risk from partial histories. We then pair this predictor with $α$-STOP, a single preference-conditioned stopping policy that selects an accuracy-earliness operating point at inference time rather than training a separate trigger for each preference. Across five benchmarks spanning customer support, task-oriented dialog, persuasion, tool use, and planning, we first show that high-relevance failure evidence occupies only 4.7-11.3% of turns and first appears after 59.0-83.6% of trajectories on average. We further show that the attention-based predictor improves Pareto-frontier quality (hypervolume) by 1-10% over naive prefix supervision, and that the full system improves frontier quality by 3-42% over state-of-the-art trigger policies while reducing training cost per operating point by 1-3 orders of magnitude.
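
The Pareto-frontier quality measure mentioned above, hypervolume, has a simple form in two dimensions: the area dominated by the non-dominated operating points. The sketch below assumes both objectives (earliness and accuracy) are maximized and measures area from a reference point at the origin; the paper's actual reference point and normalization are not stated in the abstract, and the operating points are hypothetical.

```python
# Sketch: 2D hypervolume of an accuracy/earliness Pareto front with reference
# point (0, 0), both objectives maximized. The paper's exact normalization and
# reference point are not stated in the abstract.

def pareto_front(points):
    """Keep points not dominated by any other point (maximize both axes)."""
    front = []
    for p in points:
        dominated = any(q != p and q[0] >= p[0] and q[1] >= p[1] for q in points)
        if not dominated:
            front.append(p)
    return sorted(set(front))

def hypervolume_2d(points):
    """Area dominated by the front, measured from the origin."""
    area, prev_x = 0.0, 0.0
    for x, y in pareto_front(points):   # ascending x, hence descending y
        area += (x - prev_x) * y
        prev_x = x
    return area

# (earliness, accuracy) operating points of a hypothetical stopping policy
points = [(0.2, 0.9), (0.5, 0.8), (0.8, 0.5), (0.4, 0.6)]
hv = hypervolume_2d(points)
```

A policy whose front pushes further toward the top-right corner covers more area, so a relative hypervolume gain summarizes improvement across all operating points at once rather than at a single threshold.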

2606.05413 2026-06-05 cs.LG cs.AI 版本更新

CausalPOI: Spatio-Temporal Graph-Based Causal Modeling for Cold-Start POI Check-in Forecasting

CausalPOI:基于时空图因果建模的冷启动POI签到预测

Zhaoqi Zhang, Miao Xie, Yi Li, Linyou Cai, Siqiang Luo, Gao Cong

发表机构 * Nanyang Technological University(南洋理工大学) China Agricultural University(中国农业大学) Meituan(美团)

AI总结 提出CausalPOI框架,利用时空功能交互图建模POI间语义和空间关系,通过结构对齐的处理和对照图模拟事实与反事实场景,解决冷启动POI签到预测问题,在真实数据集上显著优于基线。

Comments Accepted at KDD 2026

详情
AI中文摘要

随着城市环境的快速演变,准确建模兴趣点(POI)的动态行为对于支持数据驱动的城市规划和商业决策至关重要。尽管时空图学习的最新进展改进了POI预测,但大多数方法依赖于基于邻近性的图和相关性驱动建模,忽略了POI之间的功能依赖关系,且未能捕捉城市干预的因果效应。本文引入了一个新的研究问题——冷启动POI签到预测,旨在通过建模新引入POI的时间演化及其与附近POI在结构化城市空间背景下的功能交互,预测其未来的签到模式。为应对这些挑战,我们提出了CausalPOI,一个基于时空图的因果表示学习框架。CausalPOI利用时空功能交互图建模POI之间的语义和空间关系,并构建结构对齐的处理图和对照图以模拟事实和反事实场景。在真实SafeGraph数据集上的大量实验表明,CausalPOI在各方面显著优于最先进的基线,验证了其在时空预测、语义交互建模和因果效应估计方面的有效性,为城市干预分析提供了更可解释和可操作的基础。源代码可在Github获取。

英文摘要

As urban environments continue to evolve rapidly, accurately modeling the dynamic behaviour of Points of Interest (POIs) is essential for supporting data-driven urban planning and commercial decision-making. While recent advancements in spatio-temporal graph learning have improved POI forecasting, most methods rely on proximity-based graphs and correlation-driven modeling, which overlook the functional dependencies between POIs and fail to capture the causal effects of urban interventions. In this paper, we introduce a novel research problem, cold-start POI check-in forecasting, which aims to predict the future check-in pattern of a newly introduced POI by modeling its temporal evolution and functional interactions with nearby POIs in a structured urban spatial context. To address these challenges, we propose CausalPOI, a spatio-temporal graph-based causal representation learning framework. CausalPOI leverages a Spatio-Temporal Functional Interaction Graph to model semantic and spatial relationships between POIs, and constructs structurally aligned treatment and control graphs to simulate factual and counterfactual scenarios. Extensive experiments on real-world SafeGraph datasets demonstrate that CausalPOI significantly outperforms state-of-the-art baselines across the board, validating its effectiveness in spatio-temporal forecasting, semantic interaction modeling, and causal effect estimation, and providing a more interpretable and actionable foundation for urban intervention analysis. Source code is available on GitHub.

2606.05404 2026-06-05 cs.AI cs.CL cs.LG 版本更新

Harnessing Generalist Agents for Contextualized Time Series

利用通用智能体进行情境化时间序列分析

Zihao Li, Kaifeng Jin, Yuanchen Bei, Jiaru Zou, Avaneesh Kumar, Xuying Ning, Yanjun Zhao, Mengting Ai, Baoyu Jing, Hanghang Tong, Jingrui He

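Affiliations * University of Illinois Urbana-Champaign

AI summary Proposes the TimeClaw framework, which equips generalist LLM agents with contextualized temporal reasoning by integrating executable temporal tools, experience-driven capability evolution, and episodic multimodal memory, improving performance on benchmarks across energy, finance, and other domains.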

Comments Preprint. 38 Pages

详情
AI中文摘要

时间序列通常嵌入在丰富的上下文中,这对于整体建模至关重要。此外,现实世界的从业者通常需要用于分析时间动态的端到端工作流,其中广泛研究的任务(如预测)只是更广泛解决方案循环中的一个步骤。虽然通用AI智能体为复杂上下文下的此类工作流提供了有前景的接口,但它们主要运行在文本空间中,并未与结构化时间信号完全对齐。在这项工作中,我们引入了TimeClaw,一个用于时间序列的智能体框架,它为通用大语言模型智能体配备了情境化时间推理所需的时间序列原生运行时支持。TimeClaw集成了可执行的时间工具以进行有根据和可审计的分析,经验驱动的能力进化以创建可重用的分析例程,以及用于检索相关推理轨迹的情景多模态记忆。这些组件共同解锁了带有上下文信息的开放式时间推理。在涵盖能源、金融、天气、交通和其他现实世界领域的多个基准上的广泛评估表明,TimeClaw的性能得到了提升。代码可在https://github.com/iDEA-iSAIL-Lab-UIUC/TimeClaw获取。

英文摘要

Time series are often embedded in rich contexts that are essential for holistic modeling. Moreover, real-world practitioners often require end-to-end workflows for analyzing temporal dynamics, where widely studied tasks such as forecasting are only one step in a broader solution loop. While generalist AI agents offer a promising interface for such workflows under complex contexts, they still operate primarily in textual spaces that are not fully aligned with structured temporal signals. In this work, we introduce TimeClaw, an agentic harness framework for time series that equips generalist LLM agents with the time series-native runtime support needed for contextualized temporal reasoning. TimeClaw integrates executable temporal tools for grounded and auditable analysis, experience-driven capability evolution for creating reusable analytical routines, and episodic multimodal memory for retrieving relevant reasoning traces. Together, these components unlock harnessed open-ended temporal reasoning with contextual information. Extensive evaluation on multiple benchmarks covering diverse tasks across energy, finance, weather, traffic, and other real-world domains demonstrates improved performance of TimeClaw. Code is available at https://github.com/iDEA-iSAIL-Lab-UIUC/TimeClaw.

2606.05403 2026-06-05 cs.LG cs.AI 版本更新

Trust, but Don't Verify: Epistemic Blind Spots in LLM Source Evaluation

信任,但不验证:LLM 源评估中的认知盲点

Rohan N. Pradhan, Steve Goley

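Affiliations * Amazon

AI summary Studies whether language models evaluate evidence quality during multi-source synthesis, finding that models can detect fabricated statistics in isolation but do not use this capability during synthesis; source influence is instead governed by a methodology-register gate, and numeric-validity signals are suppressed.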

详情
AI中文摘要

语言模型日益充当认知代理,综合多个来源的证据以辅助决策。然而,它们是否评估这些证据的质量,还是仅仅基于表面呈现进行聚合,目前尚不清楚。我们表明,模型具备检测伪造统计数据的能力(孤立方法论的正确识别率为0.76-1.00),但在多源综合过程中并未启用这一能力,无论统计数据是伪造还是有效,都会产生相似的数值估计。具体而言,源影响受方法论-语域门控支配,该门控响应分析文本的分布性语域,但不响应数值有效性:例如,统计上不可能的置信区间与有效区间获得相同权重。这种行为分离在来自三个家族(Claude、Qwen、OLMo)的五个模型以及三个专业领域中均得到复现。机制分析(包括因果追踪、线性探针和组件级归因)收敛于同一解释:模型编码并因果使用一种跨领域转移的方法论-语域表示(探针AUC 0.83-0.92),而数值有效性信号(在孤立时可解码)在多源综合中被抑制至随机水平。基于提示的缓解措施(甚至是指定精确统计检查的预言清单)会产生全面怀疑而非选择性辨别,我们检查的后训练流程强化了风格捷径而未建立数值验证。与追踪用户偏好的奉承行为不同,这种失败追踪的是源是否呈现为分析可信,而非其主张是否内部一致。我们称之为认知对齐:与偏好对齐和安全对齐一样,问题不在于能力,而在于部署。

英文摘要

Language models increasingly act as epistemic proxies, synthesizing evidence from multiple sources to inform decisions. Whether they evaluate the quality of that evidence, or merely aggregate it based on surface presentation, remains poorly understood. We show that models possess the capability to detect fabricated statistics (correct identification rates of 0.76-1.00 for methodology in isolation) but do not recruit this capability during multi-source synthesis, producing similar numeric estimates whether the statistics are fabricated or valid. Specifically, source influence is governed by a methodology-register gate that responds to the distributional register of analytical text but not to numeric validity: for example, statistically impossible confidence intervals receive the same weight as valid ones. The behavioral dissociation replicates across five models from three families (Claude, Qwen, OLMo) and three professional domains. Mechanistic analyses, including causal tracing, linear probes, and component-level attribution, converge on the same account: the model encodes and causally uses a methodology-register representation that transfers across domains (probe AUC 0.83-0.92), while numeric-validity signals, decodable in isolation, are suppressed to chance during multi-source synthesis. Prompting-based mitigations, even an oracle checklist naming the exact statistical checks, produce blanket skepticism rather than selective discernment, and the post-training pipelines we examine reinforce the stylistic shortcut without building numeric verification. Unlike sycophancy, which tracks user preference, this failure tracks whether a source presents as analytically credible, not whether its claims are internally consistent. We term this epistemic alignment: like preference and safety alignment, the question is not capability but deployment.

2606.05400 2026-06-05 cs.AI cs.CL cs.LG 版本更新

LeanMarathon: Toward Reliable AI Co-Mathematicians through Long-Horizon Lean Autoformalization

LeanMarathon:通过长视界Lean自动形式化实现可靠的AI合作数学家

Yuanhe Zhang, Yuekai Sun, Taiji Suzuki, Jason D. Lee, Fanghui Liu

Affiliations * Department of Statistics, University of Warwick, UK; Center for Advanced Intelligence Project, RIKEN, Japan; Department of Statistics, University of Michigan, USA; Department of Mathematical Informatics, The University of Tokyo, also Center for Advanced Intelligence Project, RIKEN, Japan; Department of Electrical Engineering and Computer Sciences, also Department of Statistics, University of California, Berkeley, USA; School of Mathematical Sciences, Institute of Natural Sciences and MOE-LSC, Shanghai Jiao Tong University, China

AI summary Proposes LeanMarathon, a multi-agent harness that achieves reliable autoformalization of long-horizon research mathematics through a blueprint abstraction and a two-stage orchestrator, formalizing all seven target theorems across four Erdős problems.

Comments 26 pages, 9 figures. Comments are welcome

详情
AI中文摘要

长视界研究数学的自动形式化不仅在困难引理上失败,而且在规模上失败:陈述漂移、依赖关系纠缠、上下文衰减以及局部修复破坏远处的工作。我们提出LeanMarathon,一个用于可靠的研究级Lean自动形式化的多智能体框架。其核心抽象是一个演化的蓝图:一个Lean文件,同时作为形式化证明骨架、自然语言证明图和共享系统记录。四个合约范围的智能体构建、审计、证明和修复这个蓝图。这些智能体由一个两阶段编排器协调,该编排器首先通过对抗性审查稳定目标保真度,然后从动态叶节点向上并行地通过CI门控轮次释放证明有向无环图(DAG)。LeanMarathon将一次脆弱的数小时运行转变为许多局部、可恢复、并行的交易。我们在两篇最近的研究论文上评估LeanMarathon,涵盖四个Erdős问题(#1051, #1196, #164, #1217)。在三次自主运行中,它形式化了所有七个目标定理,没有留下任何sorry,证明了258个引理和定理。这些结果表明,可靠的AI合作数学不仅需要更强的证明器,还需要耐用的框架,以在长数学发展过程中保持目标保真度。代码可在https://github.com/YuanheZ/LeanMarathon找到。

英文摘要

Long-horizon autoformalization of research mathematics fails not only at hard lemmas, but at scale: statements drift, dependencies tangle, context decays, and local repairs corrupt distant work. We present LeanMarathon, a multi-agent harness for reliable research-level Lean autoformalization. Its core abstraction is an evolving blueprint: a Lean file that serves simultaneously as formal proof skeleton, natural-language proof graph, and shared system of record. Four contract-scoped agents construct, audit, prove, and repair this blueprint. These agents are coordinated by a two-stage orchestrator that first stabilizes target fidelity through adversarial review and then discharges the proof directed acyclic graph (DAG) from its dynamic leaves upward in parallel CI-gated rounds. LeanMarathon turns one brittle multi-hour run into many local, recoverable, parallel transactions. We evaluate LeanMarathon on two recent research papers spanning four Erdős problems (#1051, #1196, #164, #1217). Across three autonomous runs, it formalizes all seven target theorems with no sorry, proving 258 lemmas and theorems. These results show that reliable AI co-mathematics requires not only stronger provers, but durable harnesses that preserve target fidelity across long mathematical developments. The code can be found at https://github.com/YuanheZ/LeanMarathon.
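
The second orchestrator stage, discharging the proof DAG from its leaves upward in parallel rounds, resembles a layered topological schedule. The sketch below is an assumed illustration of that scheduling pattern only: the lemma names are hypothetical, the dependency graph is static (the abstract describes dynamic leaves), and the CI gating and repair steps are not modeled.

```python
# Sketch: schedule a proof dependency DAG into rounds, leaves first. Lemma
# names are hypothetical; real rounds would also be gated by CI checks.

def proof_rounds(deps):
    """deps maps each statement to the statements its proof relies on."""
    proved, rounds = set(), []
    remaining = set(deps)
    while remaining:
        ready = sorted(n for n in remaining if set(deps[n]) <= proved)
        if not ready:
            raise ValueError("dependency cycle among: %s" % sorted(remaining))
        rounds.append(ready)            # everything in `ready` can run in parallel
        proved.update(ready)
        remaining.difference_update(ready)
    return rounds

deps = {
    "main_theorem": ["lemma_a", "lemma_b"],
    "lemma_a": ["lemma_c"],
    "lemma_b": [],
    "lemma_c": [],
}
rounds = proof_rounds(deps)
```

Each round depends only on statements proved in earlier rounds, so a failure in one lemma can be retried locally without invalidating unrelated branches, which is the kind of recoverable, parallel transaction the abstract describes.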

2606.05381 2026-06-05 cs.LG 版本更新

Generalized TV--$\ell_p$ Structured Priors for Bayesian $T_1$ Mapping

广义TV--$\ell_p$结构化先验用于贝叶斯$T_1$映射

Disi Lin, Martin Berggren, Tommy Löfstedt

发表机构 * Department of Computing Science, Umeå University, Sweden(于默奥大学计算机科学系,瑞典)

AI总结 提出一种结合总变分(TV)与$\ell_p$范数的结构化空间先验族,并嵌入贝叶斯回归框架,利用No-U-Turn采样器进行后验推断,实现$T_1$映射中的不确定性量化,实验表明该方法能提高空间一致性和估计可靠性。

Comments Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2026:015

详情
Journal ref
Machine Learning for Biomedical Imaging 2026 (2026)
AI中文摘要

我们提出了一类扩展的结构化空间先验,将总变分(TV)函数与$\ell_p$范数相结合。该先验被证明是适定的,并嵌入到贝叶斯回归框架中,以实现$T_1$映射中的不确定性量化,后验推断使用No-U-Turn采样器(NUTS)进行。该TV--$\ell_p$构造被证明构成一个定义良好的先验分布族,并且自然地增强了估计参数图中的空间一致性和平滑变化。该方法与基于均匀先验、Gamma先验和有界TV先验的最大似然估计以及几种贝叶斯替代先验进行了比较。评估包括在合成脑和心脏$T_1$映射数据集以及真实在体乳腺$T_1$映射数据集上的实验。结果表明,TV--$\ell_p$先验产生更集中的后验密度,表明不确定性降低。它还持续实现更低的方差和更小的(负)偏差,从而得到更可靠的估计。总体而言,在贝叶斯模型中将基于TV的结构化惩罚与$\ell_p$范数嵌入先验中,改善了$T_1$图中的空间一致性,并增强了不确定性量化,为具有不确定性的$T_1$映射提供了一种稳健的方法。

英文摘要

We propose an extended family of structured spatial priors that incorporates the total variation (TV) function with $\ell_p$ norms. The prior is proven to be proper and incorporated into a Bayesian regression framework to enable uncertainty quantification in $T_1$ mapping, with posterior inference performed using the No-U-Turn Sampler (NUTS). This TV--$\ell_p$ construction is proven to constitute a well-defined family of prior distributions, and it naturally enforces spatial consistency and smooth variations in the estimated parameter maps. The method was evaluated in comparison to maximum-likelihood estimation and several Bayesian alternative priors based on the uniform, Gamma, and bounded TV priors. The evaluation includes experiments on synthetic brain and cardiac $T_1$ mapping datasets, as well as a real in-vivo breast $T_1$ mapping dataset. The results show that the TV--$\ell_p$ prior yields more concentrated posterior densities, indicating reduced uncertainty. It also consistently achieves lower variance and smaller (negative) bias, leading to more reliable estimates. Overall, embedding a TV-based structured penalty along with $\ell_p$ norms in a prior in a Bayesian model improves spatial coherence in $T_1$ maps and enhances uncertainty quantification, offering a robust approach for $T_1$ mapping with uncertainties.

2606.05380 2026-06-05 cs.DS cs.LG 版本更新

Learning-Augmented Online Minimization with Dual Predictions

具有双重预测的学习增强在线最小化

Christian Coester, Alexa Tudose, Alexander Turoczy

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 针对度量任务系统和层状集合覆盖两类在线最小化问题,提出利用对偶线性规划最优解的机器学习预测来改进理论保证的学习增强算法。

详情
AI中文摘要

我们针对两类通用的在线最小化问题:度量任务系统和层状集合覆盖,提出了学习增强算法。这两种算法利用机器学习预测的对偶线性规划最优解,实现了改进的理论保证。与最优原始解不同(后者在微小实例扰动下可能发生剧烈变化),这些对偶解更加稳定,从而确保了对于相似实例族存在良好(且可学习)的预测。虽然先前的工作已在离线设置和在线最大化问题中使用了对偶预测,但据我们所知,我们的算法首次证明了对偶预测可以有效地用于在线最小化。我们的理论结果通过$k$-服务器问题和停车许可证问题的实验得到了补充。

英文摘要

We present learning-augmented algorithms for two general classes of online minimization problems: metrical task systems and laminar set cover. Both algorithms achieve improved theoretical guarantees using machine-learned predictions of an optimal solution to the dual linear program. Unlike optimal primal solutions, which can change drastically under tiny instance perturbations, these dual solutions are much more stable, which ensures the existence of good (and learnable) predictions for families of similar instances. While previous work has used dual predictions in offline settings and for online maximization problems, our algorithms are, to the best of our knowledge, the first demonstration that such dual predictions can be effective for online minimization. Our theoretical results are complemented by experiments on the $k$-server problem and the parking permit problem.

2606.05378 2026-06-05 cs.LG cs.AI 版本更新

Pattern Selectivity is Not Task-Causal Structure: A Cross-Architecture Mechanistic Study of Composed-Task Circuits in 1B-Class Language Models

模式选择性并非任务因果结构:1B类语言模型中组合任务电路的跨架构机制研究

Yongzhong Xu


AI总结 通过统一协议测试三个1B类语言模型在四个组合任务上的注意力头电路,发现不同模型对同一任务使用不同的注意力模式,并引入五类筛选结果分类法,提出MoE模型基于前一个token位置基板构建组合任务电路的可证伪假设。

Comments 27 pages, 3 figures

详情
AI中文摘要

我们测试了一个单一的筛选与消融方案——通过任务模式选择性识别注意力头电路,然后通过与匹配随机零假设进行因果消融验证——是否能在不同模型家族中产生一致的机制性结论。该方案可在不同流水线间移植;但它识别出的具体电路则不能。在四个组合任务(间接宾语识别、大于、后继序列、变量绑定)和三个来自不同训练流水线的1B类语言模型(Pythia 1B / Pile / 密集;OLMo 1B / DCLM / 密集;OLMoE 1B-7B / DCLM / 混合专家)上,我们运行了一个统一协议,每个单元使用十个种子采样匹配随机零假设。由此产生的12个(任务,模型)单元中,没有两个在可比较的效应大小下共享相同的主要因果筛选:同一任务,具有相同的行为能力,在不同模型中通过不同的注意力模式类型实现。 我们引入了一个五类筛选结果分类法——主要原因、次要原因、相关物、干扰物、零——并附有定量阈值,并展示了所有五类结果均出现在面板中。我们提出了一个可证伪的假设:我们面板中的MoE模型在一个基础的前一个token位置基板之上构建组合任务电路(对于OLMoE 1B-7B,前一个token电路消融在4个任务中的3个上是最强的因果筛选),IOI例外与IOI是最终位置名称复制任务一致,其结构直接探测不同的模式。该假设附带对其他MoE语言模型的明确预测。 我们诚实地构建方法论:来自配套方法论论文的谱参与比信号是专门化计算的一般指标;使发现具有任务特异性的是任务模式筛选加上每个模型的因果验证。

英文摘要

We test whether a single screen-and-ablate recipe -- identify attention-head circuits by task-pattern selectivity, then verify by causal ablation against a matched-random null -- produces consistent mechanistic claims across model families. The recipe ports across pipelines; the specific circuit it identifies does not. Across four composed tasks (indirect-object identification, greater-than, successor sequences, variable binding) and three 1B-class language models from distinct training pipelines (Pythia 1B / Pile / dense; OLMo 1B / DCLM / dense; OLMoE 1B-7B / DCLM / mixture-of-experts), we run a unified protocol with the matched-random null sampled across ten seeds per cell. The resulting 12 (task, model) cells contain no two that share the same primary causal screen at comparable effect size: the same task, with the same behavioral capability, is implemented through different attention-pattern types across models. We introduce a five-category screen-outcome taxonomy -- primary cause, secondary cause, correlate, interferer, null -- with quantitative thresholds, and show that all five outcomes appear in the panel. We propose a falsifiable hypothesis: the MoE model in our panel builds composed-task circuits on top of a foundational previous-token positional substrate (the prev-token-circuit ablation is the strongest causal screen on 3 of 4 tasks for OLMoE 1B-7B), with the IOI exception consistent with IOI being a final-position name-copying task whose structure directly probes a different pattern. The hypothesis comes with explicit predictions for other MoE language models. We frame the methodology honestly: the spectral participation-ratio signal from the companion methodology paper is a general indicator of specialized computation; what makes a finding task-specific is the task-pattern screen plus a per-model causal verification.

2606.05376 2026-06-05 cs.LG 版本更新

SHALA-LLM: Smartly Handling Ambiguous Labels in Aligning LLMs

SHALA-LLM:在对齐LLM中智能处理模糊标签

Jingyao Wu, Ashley Wang, Keane Ong, Paul Pu Liang, Rosalind Picard

发表机构 * MIT Media Lab, Massachusetts Institute of Technology(麻省理工学院媒体实验室、麻省理工学院) National University of Singapore(新加坡国立大学)

AI总结 提出SHALA-LLM强化学习框架,通过从标注者分布中学习并动态优先处理高模糊样本,改善LLM对模糊标签的建模,在NLI和情感识别任务中提升与标注者分布的一致性及分类性能。

详情
AI中文摘要

许多以人为中心的任务,包括自然语言推理(NLI)和情感识别(ER),具有多种合理的解释,导致标签模糊和不同标注者之间的分歧。随着LLM越来越多地部署在现实场景中,忠实建模这种模糊性对于识别有争议的输入、保留模糊情况下的变异性以及捕捉人类判断的完整分布至关重要。然而,现有的LLM对齐方法主要假设单一正确标签,在优化过程中排除了标注者分歧。我们不将这种模糊性视为噪声,而是展示如何通过一种名为SHALA-LLM(在对齐LLM中智能处理模糊标签)的新算法将其视为改善模型行为的信息。该强化学习框架提供了一种新方式,使LLM能够直接从标注者分布中学习,同时在优化过程中动态优先处理高模糊样本。在包括ChaosNLI、GoEmotions和MSP-Podcast在内的模糊敏感NLI和ER基准上的实验表明,SHALA-LLM改善了与标注者标签分布的一致性,例如在ChaosNLI上,它将Jensen-Shannon距离降低了高达62.1%。同时,SHALA-LLM将F1分数提高了高达16.7%,表明建模标注者分歧也能增强分类性能。

英文摘要

Many human-centered tasks, including natural language inference (NLI) and emotion recognition (ER), have multiple plausible interpretations, leading to label ambiguity and challenging disagreements across human annotators. As LLMs are increasingly deployed in real-world settings, faithfully modeling such ambiguity is essential to identify contested inputs, preserve variability in ambiguous cases, and capture the full distribution of human judgments. Yet, existing LLM alignment approaches have predominantly assumed a single correct label, excluding annotator disagreement during optimization. Instead of treating this ambiguity as noise, we show how to treat it as information that improves model behavior through a new algorithm called Smartly Handling Ambiguous Labels in Aligning LLMs (SHALA-LLM). This reinforcement learning framework provides a new way for LLMs to learn directly from annotator distributions while dynamically prioritizing highly ambiguous samples during optimization. Experiments on ambiguity-sensitive NLI and ER benchmarks, including ChaosNLI, GoEmotions, and MSP-Podcast, demonstrate that SHALA-LLM improves agreement with annotator label distributions, e.g. on ChaosNLI, it reduces Jensen-Shannon Distance by up to 62.1%. At the same time, SHALA-LLM improves F1 by up to 16.7%, showing that modeling annotator disagreement can also strengthen classification performance.
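
For readers unfamiliar with the agreement metric quoted above, here is a minimal sketch of the Jensen-Shannon distance between a predicted label distribution and an annotator label distribution. The logarithm base (2, which bounds the distance in [0, 1]) is an assumption; the abstract does not state which convention the paper uses.

```python
import math


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the JS divergence (log base 2)."""
    def kl(a, b):
        # Terms with a zero first argument contribute nothing to the KL sum.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]  # mixture distribution
    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(0.0, divergence))  # guard against tiny negative rounding


if __name__ == "__main__":
    annotators = [0.5, 0.3, 0.2]   # e.g. entailment / neutral / contradiction votes
    model = [0.9, 0.05, 0.05]
    print(js_distance(annotators, model))
```

Identical distributions give 0 and distributions with disjoint support give 1, so a lower value means the model's distribution sits closer to the annotators'.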

2606.05373 2026-06-05 cs.LG physics.bio-ph 版本更新

Evidence-Guided Neural Architecture Selection under Uncertainty for Subject-Specific Blood Glucose Forecasting

证据引导的神经架构选择在不确定性下用于个体化血糖预测

Md Azharul Islam, Dwyer Deighan, Tarunraj Singha, Danial Faghihi

发表机构 * Department of Mechanical Data-Enabled Sciences, University at Buffalo, Buffalo, NY, USA

AI总结 提出EVIDENT框架,结合贝叶斯训练、证据排序和任务特定验证,在有限、噪声和异构数据中自动选择最优神经架构,用于个体化血糖预测。

详情
AI中文摘要

在有限、噪声和异构数据下的时间序列预测中,可靠的神经架构选择是一个开放挑战,标准的启发式架构设计和验证方法无法确保准确可靠的预测和泛化。我们提出EVIDENT(基于证据的神经架构识别),一个整合贝叶斯训练、基于证据的排序和不确定性下任务特定验证的架构选择框架。该框架探索候选架构池,并识别满足规定验证标准的最低容量模型。我们使用时间卷积网络(TCNs)在1型糖尿病患者的个体化血糖预测中演示了该方法。结果表明,EVIDENT在群体水平糖尿病数据上系统地拒绝了参数不足和过度的TCN架构,同时识别出能可靠泛化到未见患者的模型。当多个架构具有竞争力时,该框架进一步支持基于可信度的集成预测,从而提升预测性能。与随机搜索基线相比,EVIDENT识别出更小的架构,在未见患者上具有更一致的预测性能。这些发现确立了EVIDENT作为一种神经架构发现策略,能够在数据有限和异构环境中实现高风险预测的可靠模型选择。

英文摘要

Reliable neural architecture selection is an open challenge in time-series forecasting under limited, noisy, and heterogeneous data, where standard heuristic architecture design and validation approaches fail to ensure accurate and reliable prediction and generalization. We propose EVIDENT (EVidence-based IDEntification of Neural archiTectures), a framework for architecture selection that integrates Bayesian training, evidence-based ranking, and task-specific validation under uncertainty. The framework explores the candidate architecture pool and identifies the lowest-capacity model that satisfies a prescribed validation criterion. We demonstrate this method using temporal convolutional networks (TCNs) for individualized blood glucose forecasting in type 1 diabetes patients. The results show that EVIDENT systematically rejects both under- and over-parameterized TCN architectures on population-level diabetes data, while identifying models that generalize reliably to unseen patients. When multiple architectures are competitive, the framework further supports plausibility-weighted ensemble predictions that enhance predictive performance. Compared with a random-search baseline, EVIDENT identified smaller architectures with more consistent forecasting performance on unseen patients. These findings establish EVIDENT as a strategy for neural architecture discovery, enabling reliable model selection for high-consequence forecasting in data-limited and heterogeneous settings.

2606.05371 2026-06-05 cs.LG cs.NA math.NA stat.ML 版本更新

Mamba-Assisted Non-Markovian Closure for Reduced-Order Modeling

Mamba辅助的非马尔可夫闭合用于降阶建模

Zhi-Feng Wei, Saad Qadeer, Panos Stinis

发表机构 * Pacific Northwest National Laboratory(太平洋西北国家实验室) University of Washington(华盛顿大学) Brown University(布朗大学)

AI总结 针对高维动力系统降阶建模中的非马尔可夫闭合项问题,提出Mamba辅助闭合框架,利用Mamba序列模型从已解析轨迹预测闭合项,并通过数值积分器耦合降阶方程,在粘性Burgers方程和混沌双尺度Lorenz '96系统上优于马尔可夫模型、GRU序列模型和Wilks方法。

Comments Code will be released upon acceptance

详情
AI中文摘要

高维动力系统的降阶建模常常受到非马尔可夫闭合项的阻碍,该闭合项表示未解析变量对解析动力学的影响。受Mori--Zwanzig形式论的启发,其中闭合项采取解析轨迹的记忆泛函形式,我们将闭合建模重新表述为序列建模问题,并提出Mamba辅助闭合(MAC)框架:一个基于Mamba的序列模型,经过训练从解析轨迹预测闭合项,通过数值积分器与降阶控制方程耦合,以在时间上推进解析变量。该框架的一个关键特性是利用状态空间模型的双重表示——模型通过卷积形式以序列到序列的方式进行训练,并通过循环形式进行逐步自回归部署,从而实现高效的长轨迹训练和恒定的每步推理成本。在粘性Burgers方程和混沌双尺度Lorenz '96系统上,MAC模型在预测准确性和长时间展开稳定性方面显著优于马尔可夫降阶模型、基于GRU的序列模型和Wilks方法。

英文摘要

Reduced-order modeling of high-dimensional dynamical systems is often hindered by the non-Markovian closure term that represents the effect of unresolved variables on the resolved dynamics. Inspired by the Mori--Zwanzig formalism, in which the closure takes the form of a memory functional of the resolved trajectory, we recast closure modeling as a sequence modeling problem and propose the Mamba-Assisted Closure (MAC) framework: a Mamba-based sequence model, trained to predict the closure from the resolved trajectory, is coupled with the reduced-order governing equations through a numerical integrator to advance the resolved variables in time. A key feature of the framework is its exploitation of the dual representation of state-space models -- the model is trained in a sequence-to-sequence fashion via the convolutional form, and deployed for step-by-step autoregressive rollout via the recurrent form, yielding both efficient long-trajectory training and constant per-step inference cost. On the viscous Burgers' equation and the chaotic two-scale Lorenz '96 system, the MAC model substantially outperforms the Markovian reduced-order model, the GRU-based sequence model, and the Wilks method in predictive accuracy and long-time rollout stability.
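
The "dual representation" mentioned above can be illustrated with a scalar, time-invariant linear state-space model, where the recurrent and convolutional forms produce identical outputs. This is a simplified sketch: the parameters `a`, `b`, `c` are hypothetical fixed scalars, whereas Mamba's selective state-space layers use input-dependent parameters, so the code shows only the underlying idea and not the MAC model.

```python
def ssm_recurrent(a, b, c, u):
    """Step-by-step form: h_t = a*h_{t-1} + b*u_t, y_t = c*h_t (constant cost per step)."""
    h, ys = 0.0, []
    for u_t in u:
        h = a * h + b * u_t
        ys.append(c * h)
    return ys


def ssm_convolutional(a, b, c, u):
    """Whole-sequence form: y_t = sum_k K_k * u_{t-k} with kernel K_k = c * a**k * b."""
    kernel = [c * (a ** k) * b for k in range(len(u))]
    return [sum(kernel[k] * u[t - k] for k in range(t + 1)) for t in range(len(u))]


if __name__ == "__main__":
    u = [1.0, 0.0, 0.0]
    print(ssm_recurrent(0.5, 1.0, 2.0, u))      # [2.0, 1.0, 0.5]
    print(ssm_convolutional(0.5, 1.0, 2.0, u))  # [2.0, 1.0, 0.5]
```

The convolutional form processes a full training trajectory at once, while the recurrent form only needs the previous state, which is what makes autoregressive rollout cheap per step.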

2606.05365 2026-06-05 stat.ML cs.LG 版本更新

Environment-Robust Representation Learning with Empirical Bayes

基于经验贝叶斯的环境鲁棒表示学习

Yuli Slavutsky, Matthew Shen, Bohan Wu, David M. Blei

发表机构 * Department of Statistics, Columbia University(哥伦比亚大学统计学系) Columbia University(哥伦比亚大学) Departments of Statistics and Computer Science, Columbia University(哥伦比亚大学统计学系与计算机科学系)

AI总结 提出一种经验贝叶斯变分方法,通过跨环境平衡项学习不变潜在变量,实现对新环境的鲁棒预测,在天文、微生物和ICU数据上优于现有方法。

详情
AI中文摘要

我们考虑多环境预测问题。假设环境改变潜在变量的分布,而生成观测协变量和目标的机制在给定该变量条件下保持稳定。例如,医院或临床队列可能在潜在患者状态的流行率上有所不同,尽管这些状态、生理测量和结果之间的关系保持不变。给定来自多个环境的数据集,我们为这类问题构建了一个贝叶斯模型,并推导出相应的变分目标。我们证明该目标分解为每个环境项和由模型结构引起的额外跨环境平衡项。我们使用经验贝叶斯方法设置先验并将其纳入目标。基于该目标,我们开发了一种用于后验近似的摊销变分算法,并利用学习到的潜在变量在新环境中形成预测。我们通过模拟以及天文源识别、基于微生物组的疾病检测和ICU脓毒症预测的实际研究来研究我们的方法。在这些设置中,我们的方法在新环境预测方面优于先前的方法。

英文摘要

We consider multi-environment prediction problems. We assume the environments change the distribution of a latent variable, while the mechanisms generating observed covariates and targets remain stable conditional on that variable. For example, hospitals or clinical cohorts may differ in the prevalence of latent patient states, even though the relationships between those states, physiological measurements, and outcomes remain unchanged. Given a dataset from multiple environments, we formulate a Bayesian model for such problems and derive the corresponding variational objective. We show that this objective decomposes into per-environment terms and an additional cross-environment balancing term induced by the model's structure. We use an empirical Bayes method to set the prior and incorporate it into the objective. Based on this objective, we develop an amortized variational algorithm for posterior approximation, and use the resulting learned latent variables to form predictions in new environments. We study our approach through simulations and real-world studies of astronomical source identification, microbiome-based disease detection, and ICU sepsis prediction. Across these settings, our method outperforms previous approaches for prediction in new environments.

2606.05361 2026-06-05 stat.ML cs.LG 版本更新

TabSODA: Tabular Diffusion based Imputation with Skip Pattern Detection and Ordinal Awareness

TabSODA: 基于表格扩散的插补方法,结合跳跃模式检测与序数感知

Yuyu Chen, Taehyo Kim, Hai Shu, Yang Feng

发表机构 * Department of Biostatistics, NYU School of Global Public Health(纽约大学全球公共卫生学院生物统计学系)

AI总结 提出TabSODA方法,通过EM框架下的扩散模型处理大规模调查中的结构跳跃和序数变量,在PATH和NSDUH数据集上显著提升插补精度。

详情
AI中文摘要

大规模调查中的缺失数据插补面临两个当前表格扩散方法未能很好处理的挑战。首先,结构跳跃(因问卷设计而不可回答的单元格)不应被插补,但常与项目无回答混为一谈。其次,序数响应编码了有序类别,但大多数流程通过独热或模拟位编码将其视为名义水平。我们提出TabSODA(具有跳跃模式检测和序数感知的表格扩散),一种基于期望最大化(EM)的扩散插补器,建立在阐明扩散模型(EDM)框架上。TabSODA通过去噪损失和逆时采样器传播结构跳跃,并用累积probit标量潜变量表示序数变量,同时保留名义变量的模拟位编码。当码本跳跃掩码可用时,TabSODA直接使用;否则,TabSODA+SKIP变体通过基于CART的跳跃模式挖掘器从原始响应和问卷顺序估计掩码。在烟草与健康人口评估(PATH)研究和全国药物使用与健康调查(NSDUH)这两个美国全国代表性调查中,TabSODA在MCAR、MAR和MNAR掩码下将序数MACE降低高达23.7%,并将分类准确率提高高达9%(相对于最强基线)。跳跃挖掘器在两个数据集上实现了近乎完美的精确度,使得TabSODA+SKIP能够紧密跟踪码本掩码变体。

英文摘要

Missing data imputation in large-scale surveys faces two challenges that are not well handled by current tabular diffusion methods. First, structural skips, cells made inapplicable by questionnaire design, should not be imputed but are often conflated with item nonresponse. Second, ordinal responses encode ordered categories, yet most pipelines treat them as nominal levels through one-hot or analog-bit encodings. We introduce TabSODA (Tabular diffusion with Skip pattern detection and Ordinal Awareness), an Expectation-Maximization (EM)-based diffusion imputer built on the Elucidated Diffusion Model (EDM) framework. TabSODA propagates structural skips through the denoising loss and reverse-time sampler, and represents ordinal variables with cumulative-probit scalar latents while retaining analog-bit encodings for nominal variables. When a codebook skip mask is available, TabSODA uses it directly; otherwise, the TabSODA+SKIP variant estimates the mask from raw responses and questionnaire order using a CART-based skip-pattern miner. On the Population Assessment of Tobacco and Health (PATH) study and the National Survey on Drug Use and Health (NSDUH), two nationally representative U.S. surveys, TabSODA reduces ordinal MACE by up to 23.7% and improves categorical accuracy by up to 9% over the strongest baseline across MCAR, MAR, and MNAR masking. The skip miner achieves near-perfect precision on both datasets, allowing TabSODA+SKIP to closely track the codebook-mask variant.

2606.05345 2026-06-05 cs.LG 版本更新

PJ-RoPE: A Fourier-Jet-Affine Position Space for Relative Attention

PJ-RoPE:一种用于相对注意力的傅里叶-喷气-仿射位置空间

Yaobo Zhang

发表机构 * School of Physics, Ningxia University(宁夏大学物理学院)

AI总结 本文提出PJ-RoPE,一种统一RoPE、Jordan-RoPE和ALiBi的傅里叶-喷气-仿射相对位置空间,通过可学习参数适应不同任务,并引入自适应扇区诊断和LC/快度坐标稳定高阶喷气。

Comments 26 pages, 6 figures, 10 tables. Code available at https://github.com/ybzhang-nxu/Poincare_Rope

详情
AI中文摘要

我们将RoPE的傅里叶相位、Jordan-RoPE的有限喷气和ALiBi的仿射近因统一到一个单一的可学习相对位置空间中,并研究不同任务选择该空间的哪些区域。PJ-RoPE是一种用于相对注意力的傅里叶-喷气-仿射公式,可选地具有庞加莱型解读,作为齐次傅里叶-喷气位置表示的仿射完备化。代数上,相同的基本元素构成一个有限常系数差分模:延迟移位算子的简单根给出傅里叶/RoPE特征,重复的非零根给出乔丹/傅里叶喷气,重复的单位根给出类似ALiBi的仿射近因。该框架将标量PJ偏置核与精确的PJ旋转特征变换分离,引入自适应扇区诊断,并使用LC/快度坐标稳定高阶喷气。受控探针验证了扇区包含和选择;小型语言运行暴露了仿射/近因边界;音乐令牌流提供了最清晰的情况,其中LC/仿射变体保持强劲,同时携带可测量的高阶修正;LC诊断显示尺度稳定性增益伴随相位分辨率损失。

英文摘要

We unify RoPE's Fourier phase, Jordan-RoPE's finite jets, and ALiBi's affine recency into a single learnable relative-position space, and study which regions of this space are selected by different tasks. PJ-RoPE is a Fourier-Jet-Affine formulation for relative attention, with an optional Poincare-type reading as the affine completion of a homogeneous Fourier-jet positional representation. Algebraically, the same primitives form a finite constant-coefficient difference module: simple roots of the lag-shift operator give Fourier/RoPE characters, repeated nonzero roots give Jordan/Fourier jets, and the repeated unit root gives ALiBi-like affine recency. The framework separates scalar PJ-bias kernels from exact PJ-rotary feature transforms, introduces adaptive sector diagnostics, and uses LC/rapidity coordinates to stabilize high-order jets. Controlled probes verify sector containment and selection; small language runs expose an affine/recency boundary; music-token streams provide the clearest case where LC/affine variants remain strong while carrying measurable high-order corrections; and LC diagnostics show a scale-stability gain coupled to phase-resolution loss.
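
As background for the Fourier/RoPE component named above, the following sketch shows the standard rotary property on a single 2D feature pair: rotating the query and key by angles proportional to their positions makes the attention score depend only on the relative lag. The frequency `theta` and the function names are illustrative assumptions; the jet and affine sectors of PJ-RoPE are not reproduced here.

```python
import math


def rotate(vec, pos, theta):
    """Rotate a 2D feature pair by the angle pos * theta."""
    x, y = vec
    angle = pos * theta
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return (x * cos_a - y * sin_a, x * sin_a + y * cos_a)


def rope_score(q, k, q_pos, k_pos, theta=0.1):
    """Dot product of a rotated query and a rotated key."""
    qr = rotate(q, q_pos, theta)
    kr = rotate(k, k_pos, theta)
    return qr[0] * kr[0] + qr[1] * kr[1]


if __name__ == "__main__":
    q, k = (1.0, 2.0), (0.5, -1.0)
    # Both pairs of positions have the same lag (2), so the scores agree
    # up to floating-point rounding.
    print(rope_score(q, k, 3, 1), rope_score(q, k, 10, 8))
```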

2606.05335 2026-06-05 cs.LG stat.ML 版本更新

A prism hierarchy of learning regimes in large linear autoencoders

大型线性自编码器中学习机制的三棱柱层次结构

Eugene Golikov, Yaroslav Gusev, Dmitry Yarotsky

发表机构 * Applied AI Institute(应用人工智能研究所) Steklov Mathematical Institute of Russian Academy of Sciences(俄罗斯科学院斯捷克洛夫数学研究所)

AI总结 本文通过形式损失展开层次结构,将大型权重绑定线性自编码器的极端学习机制与三棱柱的面相关联,推导出五种基本极端机制下的训练和总体损失演化显式表达式。

Comments 33 pages, under review for NeurIPS'2026

详情
AI中文摘要

机器学习模型的理论研究通常考虑不同的极限机制,在这些机制下梯度下降的学习动态在理论上变得可处理。然而,对于特定类型的模型,系统地获得所有定性不同的极端学习机制的图景是可取的。在本文中,我们为大型权重绑定线性自编码器提出了这样一个图景,其特征由输入和潜在维度、初始化幅度以及训练集大小决定。该模型在权重上非线性,其梯度流没有一般的理论解。我们表明,在形式损失展开层次结构层面,其极端机制自然地与三棱柱的面相关联。特别地,存在与棱柱的2-面相关的五种基本极端机制:(1) 大数据,(2) 小数据,(3) 平均场,(4) 窄潜在,以及 (5) 自由。对于机制 (1,2,3,4),我们推导了梯度流下训练和总体极限损失演化的显式表达式,与实验结果非常吻合。

英文摘要

Theoretical studies of machine learning models commonly consider different limiting regimes in which the learning dynamics of gradient descent becomes theoretically tractable. It is, however, desirable to have a systematically obtained picture of all qualitatively different extreme learning regimes for a particular type of models. In this paper we propose such a picture for large weight-tied linear autoencoders characterized by input and latent dimensions, initialization magnitude, and training set size. This model is nonlinear in the weights and its gradient flow does not have a general theoretical solution. We show that at the level of the formal loss-expansion hierarchy, its extreme regimes are naturally associated with faces of a triangular prism. In particular, there are five basic extreme regimes associated with the 2-faces of the prism: (1) large-data, (2) small-data, (3) mean-field, (4) narrow-latent, and (5) free. For regimes (1,2,3,4), we derive explicit expressions for both train and population limiting loss evolutions under gradient flow, obtaining very good agreement with experimental results.

2606.05328 2026-06-05 cs.GR cs.AI cs.CV cs.LG 版本更新

The Invisible Hand of Physics: When Video Diffusion Models Know More Than They Show

物理的隐形之手:当视频扩散模型知道的比它们展示的更多

Parsa Esmati, Somjit Nath, Katja Hofmann, Derek Nowrouzezahrai, Samira Ebrahimi Kahou, Majid Mirmehdi

发表机构 * University of Bristol(布里斯托大学) McGill University(麦吉尔大学) Mila–Quebec AI Institute(魁北克AI研究院) Microsoft Research(微软研究院) University of Calgary(卡尔加里大学)

AI总结 通过逆向扩散过程探测视频扩散模型的潜在轨迹,发现物理合理性可以从扩散变换器状态中线性解码,准确率达81.27%,表明物理有意义的表示是生成式去噪的副产品。

详情
AI中文摘要

现代视频扩散模型生成越来越真实和时间上连贯的视频,这激发了它们作为候选世界模拟器的使用。然而,目前尚不清楚这些模型是否内部编码了物理结构,或者仅仅是复现了训练中看到的运动模式。我们通过沿着对应已知物理合理性的真实视频的潜在轨迹探测视频扩散模型来研究这个问题。为了获得这样的轨迹,我们通过从干净视频潜在变量向后积分学习到的速度场到噪声,近似逆向确定性采样过程,从而访问模型的中间状态和注意力图。利用这些恢复的轨迹,我们表明物理合理性可以从扩散变换器状态中线性解码,在IntPhys和InfLevel上达到约81.27%的平均准确率,并优于专门的表示学习基线如V-JEPA和VideoMAE。令人惊讶的是,这个信号在VAE潜在输入中不存在,而是在去噪变换器内部出现,尽管模型没有使用自监督预测目标进行训练。这些发现表明,物理有意义的表示可以作为生成式去噪的副产品产生。

英文摘要

Modern video diffusion models generate increasingly realistic and temporally coherent videos, motivating their use as candidate world simulators. Yet it remains unclear whether these models internally encode physical structure, or merely reproduce motion patterns seen during training. We study this question by probing video diffusion models along latent trajectories corresponding to real videos with known physical plausibility. To obtain such trajectories, we approximately invert the deterministic sampling process by integrating the learned velocity field backward from a clean video latent to noise, giving access to the model's intermediate states and attention maps. Using these recovered trajectories, we show that physical plausibility is linearly decodable from diffusion transformer states across IntPhys and InfLevel, reaching around 81.27% average accuracy and outperforming dedicated representation-learning baselines such as V-JEPA and VideoMAE. Surprisingly, this signal is absent from the VAE latent input and emerges inside the denoising transformer itself, despite the model not being trained with a self-supervised predictive objective. These findings suggest that physically meaningful representations can arise as a byproduct of generative denoising.

2606.05327 2026-06-05 cs.LG q-bio.QM stat.ML 版本更新

Multimarginal flow matching with optimal transport potentials

基于最优传输势的多边缘流匹配

Raghav Kansal, David Crair, Nghia Nguyen, Scott Pope, Bradley Parry

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出一种利用动态最优传输势引导流匹配学习中间边缘分布的方法,实现高效无模拟的多边缘流匹配,在单细胞RNA测序、海洋学和气象数据集上取得最优性能。

Comments 9 pages, 3 figures, 4 tables, and a 27 page appendix. Accepted to the Forty-Third International Conference on Machine Learning

详情
AI中文摘要

流匹配(FM)已成为学习两个经验分布之间动态传输映射的强大框架。然而,对于存在中间观测边缘分布的情况,这些边缘分布有助于约束端点之间的流,这方面的研究较少。这种“多边缘”设置对于许多科学领域中动态系统的时间演化建模至关重要,这些领域可以对序列分布进行采样。我们通过一种新颖的方法解决了这个问题,该方法利用了FM与动态最优传输(OT)之间的联系,通过动态OT作用中的势项将流柔和地引导向中间边缘分布。通过扩展条件FM学习目标以包含这些势,我们推导出一种高效、无模拟的多边缘FM算法,该算法在学习流的时空动力学方面提供了相当大的灵活性。我们在不同的单细胞RNA测序、海洋学和气象数据集上展示了OT势FM(OTP-FM)的最先进性能和训练效率。我们的代码可在https://github.com/Bexorg-Inc/OTP-FM获取。

英文摘要

Flow matching (FM) has emerged as a powerful framework for learning dynamic transport maps between two empirical distributions. However, less explored is the setting with intermediate observed marginals that can help constrain the flows between the endpoints. This "multimarginal" regime is central to modeling temporal evolution in dynamical systems in many scientific domains that can sample sequential distributions. We tackle this problem with a novel approach that leverages the connection between FM and dynamic optimal transport (OT), softly steering the flow towards the intermediate marginals through potential terms in the dynamic OT action. By extending the conditional FM learning target to incorporate these potentials, we derive an efficient, simulation-free algorithm for multimarginal FM that offers considerable flexibility in the spatiotemporal dynamics of the learned flows. We demonstrate state-of-the-art performance and training efficiency of OT-potential FM (OTP-FM) on diverse single-cell RNA sequencing, oceanographic, and meteorological datasets. Our code is available at https://github.com/Bexorg-Inc/OTP-FM.

2606.05326 2026-06-05 math.OC cs.AI cs.LG math-ph math.AP math.MP 版本更新

Gradient descent at the Edge of Stability: free energy model and kinetic description of the two-layer network

稳定边缘的梯度下降:双层网络的自由能模型与动力学描述

Antonin Chodron de Courcel

发表机构 * Ecole Normale Supérieure, CNRS, 45 rue d’Ulm, 75005 Paris, France(巴黎高等师范学院,法国国家科学研究中心,法国巴黎乌尔姆街45号,邮编75005)

AI总结 针对大学习率下梯度下降的稳定边缘动力学,提出连续时间有效模型跟踪平均轨迹与快速振荡协方差,揭示有效自由能作为关键监控量,并导出宽双层网络的平均场极限动力学方程。

Comments Comments are welcome!

详情
AI中文摘要

我们研究了稳定边缘(Edge of Stability)机制下梯度下降的动力学,其中学习率足够大,导致损失和锐度出现持续振荡。我们提出了一个连续时间有效模型,跟踪平均轨迹的演化以及其快速振荡的时间平均协方差。我们的分析表明,在这种不稳定机制中,需要监控的自然量是有效自由能,它将原始风险泛函与曲率相关的“熵”项相结合。我们的模型允许我们跟踪振荡的包络,即使在动力学与平均权重在相似时间尺度上演化的情况下。换句话说,我们可以跟踪某些神经网络架构训练过程中出现的尖峰。对于在稳定非消失振荡下优化的宽双层神经网络,我们推导出一个平均场极限,产生了一个新的动力学方程,描述了权重及其波动的联合分布。我们证明该方程可以解释为宏观自由能的Wasserstein-2梯度流。最后,我们提供了矩阵分解和深度学习任务(CIFAR-10)上的数值证据,以证明模型在捕捉振荡包络方面的准确性以及有效自由能的预测能力。

英文摘要

We study the dynamics of gradient descent in the Edge of Stability regime, where the learning rate is large enough to induce persistent oscillations in the loss and the sharpness. We propose a continuous-time effective model that tracks the evolution of the average trajectory coupled with the time-averaged covariance of its fast oscillations. Our analysis reveals that the natural quantity to monitor in such unstable regimes is an effective free energy, which combines the original risk functional with a curvature-related "entropic" term. Our model allows us to track the envelope of the oscillations even in situations where its dynamics evolve on similar timescales as the averaged weights. Otherwise stated, we can track the spikes that occur during the training of some neural network architectures. For wide two-layer neural networks optimized under stable non-vanishing oscillations, we derive a mean-field limit that results in a novel kinetic equation describing the joint distribution of weights and their fluctuations. We show that this equation can be interpreted as a Wasserstein-2 gradient flow of a macroscopic free energy. Finally, we provide numerical evidence on matrix factorization and deep learning tasks (CIFAR-10) to demonstrate the model's accuracy in capturing the envelope of the oscillations and the predictive power of the effective free energy.
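
The role of a large learning rate can be seen in the simplest possible setting, gradient descent on a one-dimensional quadratic loss L(w) = 0.5 * s * w^2 with sharpness s. Each step multiplies w by (1 - lr * s), so the iterates oscillate in sign once lr > 1/s and diverge once lr > 2/s. This toy sketch only illustrates that classical stability threshold; it is not the paper's free energy model, and the function name is hypothetical.

```python
def gd_quadratic(sharpness, lr, w0=1.0, steps=20):
    """Run gradient descent on L(w) = 0.5 * sharpness * w**2 and return the iterates."""
    w, trajectory = w0, [w0]
    for _ in range(steps):
        w -= lr * sharpness * w  # gradient of the quadratic is sharpness * w
        trajectory.append(w)
    return trajectory


if __name__ == "__main__":
    below = gd_quadratic(sharpness=2.0, lr=0.9)  # lr < 2/sharpness: damped oscillation
    above = gd_quadratic(sharpness=2.0, lr=1.1)  # lr > 2/sharpness: growing oscillation
    print(abs(below[-1]) < abs(below[0]), abs(above[-1]) > abs(above[0]))  # True True
```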

2606.05308 2026-06-05 cs.LG cs.AI cs.CL cs.IR stat.AP 版本更新

Statistically Reliable LLM-Based Ranking Evaluation via Prediction-Powered Inference

基于预测驱动推断的统计可靠LLM排序评估

Abhishek Divekar

发表机构 * Amazon(亚马逊)

AI总结 提出PRECISE框架,将预测驱动推断扩展到排序评估指标,通过结合少量人工标注和大量LLM判断实现无偏估计,并在ESCI基准和实际系统中验证了有效性。

Comments Accepted at ACL 2026 - GEM Workshop

详情
AI中文摘要

通过PRECISE,我们将预测驱动推断扩展到排序评估指标,通过结合少量人工标注集和大量LLM判断集,产生偏差校正的估计。PPI无论LLM判断器的错误分布如何,都是可证明无偏的。我们通过将输出空间计算从O(2^|C|)减少到O(2^K),使其适用于像Precision@K这样的分层指标,其中标注是按文档的,但指标是按查询的。在ESCI基准上,用Claude 3 Sonnet判断增强30个人工标注,将Precision@4估计的标准误差从4.45降低到3.50(相对减少21%)。在一个生产系统中,我们的框架从100个人工标签和2小时的领域专家标注中正确识别了三个系统变体中最好的一个;A/B测试确认了这一排序,日销售额增加了407个基点。

英文摘要

With PRECISE, we extend Prediction-Powered Inference to produce bias-corrected estimates of ranking evaluation metrics by combining a small human-labeled set with a large LLM-judged set. PPI is provably unbiased regardless of the LLM judge's error profile. We make it applicable to hierarchical metrics like Precision@K, where annotations are per-document but the metric is per-query, by reducing the output-space computation from O(2^|C|) to O(2^K). On the ESCI benchmark, augmenting 30 human annotations with Claude 3 Sonnet judgments reduces the standard error of Precision@4 estimates from 4.45 to 3.50 (a 21% relative reduction). In a production system, our framework correctly identified the best of three system variants from 100 human labels and 2 hours of domain-expert annotation; A/B testing confirmed this ranking with +407 bps in daily sales.
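
The bias-correction idea can be sketched with the basic prediction-powered estimator of a mean: average the judge's scores on the large unlabeled set, then add a rectifier measured on the small human-labeled set. This is only the textbook mean estimator under assumed inputs (per-item relevance scores in plain lists); the hierarchical Precision@K treatment that PRECISE contributes is not shown, and the function name is hypothetical.

```python
from statistics import mean


def ppi_mean(human_labels, judge_on_labeled, judge_on_unlabeled):
    """Prediction-powered estimate of a mean.

    rectifier = average (human - judge) gap on the labeled items, which
    corrects the systematic error of the judge on the large set.
    """
    rectifier = mean(h - j for h, j in zip(human_labels, judge_on_labeled))
    return mean(judge_on_unlabeled) + rectifier


if __name__ == "__main__":
    human = [1, 0, 1, 1]                    # human relevance labels
    judge_small = [1, 1, 1, 1]              # judge scores on the same four items
    judge_large = [1, 1, 0, 1, 1, 1, 1, 1]  # judge scores on unlabeled items
    print(ppi_mean(human, judge_small, judge_large))  # 0.625
```

In the example the judge is over-optimistic by 0.25 on the labeled items, so the raw judge average of 0.875 is pulled down to 0.625.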

2606.05296 2026-06-05 cs.LG cs.AI 版本更新

Agentic Monte Carlo: Simulating Reinforcement Learning for Black-Box Agents

智能体蒙特卡洛:黑盒智能体的强化学习模拟

Dae Yon Hwang, Raunaq Suri, Valentin Villecroze, Anthony L. Caterini, Jesse C. Cresswell, Noël Vouitsis, Brendan Leigh Ross

发表机构 * University of Cambridge(剑桥大学)

AI总结 提出Agentic Monte Carlo (AMC)方法,利用序贯蒙特卡洛从最优策略后验中采样,无需参数级优化即可对黑盒LLM智能体进行强化学习式优化,在AgentGym基准上超越提示基线并随测试时计算扩展优于GRPO。

Comments Accepted by ICML 2026

详情
AI中文摘要

LLM智能体在两种不同的机制下运行:适用于强化学习(RL)的开权重智能体,以及其行为必须在测试时纯粹控制的黑盒智能体。尽管黑盒智能体通常由最先进的专有LLM支持,但仅API访问排除了参数级优化,使得大多数RL方法不适用。为解决这一限制,我们转向RL与贝叶斯推断之间的已知等价性。我们提出智能体蒙特卡洛(AMC),直接从黑盒智能体的最优策略中采样,而不是通过RL训练它。最优策略是轨迹上的后验,其先验我们定义为固定的黑盒LLM智能体。我们采用序贯蒙特卡洛从该后验中采样,通过学习一个价值函数来引导智能体,同时保持底层黑盒模型不变。我们在AgentGym基准的三个不同环境中验证了AMC,展示了相对于提示基线的显著改进,并且随着我们方法测试时计算的扩展,甚至优于组相对策略优化(GRPO)。AMC证明了执行黑盒LLM智能体的原则性RL式优化的可行性。代码可在https://github.com/layer6ai-labs/Agentic-Monte-Carlo获取。

英文摘要

LLM agents operate in two distinct regimes: open-weight agents amenable to reinforcement learning (RL) and black-box agents whose behaviour must be controlled purely at test time. Although black-box agents are often backed by state-of-the-art proprietary LLMs, API-only access precludes parameter-level optimization, rendering most RL methods inapplicable. To address this limitation, we turn to a known equivalence between RL and Bayesian inference. We propose Agentic Monte Carlo (AMC) to directly sample from the optimal policy of a black-box agent rather than training it through RL. The optimal policy is a posterior over trajectories whose prior we define as the fixed black-box LLM agent. We employ Sequential Monte Carlo to sample from this posterior by learning a value function to steer the agent while leaving the underlying black-box model unchanged. We validate AMC on three diverse environments from the AgentGym benchmark, demonstrating significant improvements over prompting baselines and even outperforming Group Relative Policy Optimization (GRPO) as we scale the test-time compute of our method. AMC demonstrates the feasibility of performing principled RL-style optimization of black-box LLM agents. Code is available at https://github.com/layer6ai-labs/Agentic-Monte-Carlo

2606.05274 2026-06-05 cs.LG 版本更新

Anomaly Detection for Electro-Hydrostatic Actuators using LSTM Autoencoder

基于LSTM自编码器的电静液作动器异常检测

Nehal Afifi, Abdelmonem Elhendawi, Felix Leitenberger, Nadine Piat, Sven Matthiesen

发表机构 * IPEK - Institute of Product Engineering, Karlsruhe Institute of Technology (KIT), Germany(产品工程研究所,卡尔斯鲁厄理工学院(KIT),德国) SUPMICROTECH-ENSMM, France(SUPMICROTECH-ENSMM,法国)

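AI summary: Proposes a reconstruction-based LSTM autoencoder framework for anomaly detection on Electro-Hydrostatic Actuator (EHA) sensor signals, reaching 99% average accuracy and a very low false-alarm rate across multiple fault-injection scenarios.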

Comments 8 pages, 6 figures, 3 tables, ESREL 2026 -European Safety and Reliability Conference, accepted paper to be published

详情
AI中文摘要

电静液作动器(EHA)广泛应用于航空航天和工业系统,及时检测传感器异常对于确保安全可靠运行至关重要。然而,EHA传感器数据量大且采样频率高,给准确高效的异常检测带来了挑战。传统的统计和经典机器学习方法,如Z-score、四分位距(IQR)、中位数绝对偏差(MAD)、孤立森林、高斯混合和k-means,往往无法捕捉EHA信号中固有的时间依赖性,导致检测精度有限且误报率升高。此外,针对EHA系统的数据驱动异常检测方法的系统评估仍然很少,特别是在不同运行条件下。本研究提出了一种针对单变量EHA传感器信号的离线异常检测框架,重点关注从受控测试台收集的温度和压力数据。该方法采用基于重构的长短期记忆(LSTM)自编码器,通过验证集重构误差分布进行校准和评估。在多种故障注入场景下,使用准确率、精确率、召回率和F1分数评估性能,并辅以不同运行条件下的敏感性分析。LSTM自编码器在所有评估传感器上实现了平均准确率99.0%、精确率高达100%、召回率介于90.2%至99.6%之间、F1分数介于93.1%至99.8%之间,显示出高检测灵敏度和极低的误报率。这些结果凸显了数据驱动的离线异常检测在EHA中的可行性。未来工作将集中于将所开发的框架适配到在线(实时)环境。

英文摘要

Electro-Hydrostatic Actuators (EHAs) are widely used in aerospace and industrial systems, where timely detection of sensor anomalies is essential to ensure safe and reliable operation. However, the large volume and high sampling frequency of EHA sensor data pose challenges for accurate and efficient anomaly detection. Conventional statistical and classical machine-learning methods such as Z-score, Interquartile Range (IQR), Median Absolute Deviation (MAD), Isolation Forest, Gaussian Mixture, and k-means often fail to capture the temporal dependencies inherent in EHA signals, resulting in limited detection accuracy and elevated false-alarm rates. Furthermore, systematic evaluations of data-driven anomaly detection approaches for EHA systems remain scarce, particularly under varying operational conditions. This study presents an offline anomaly-detection framework for univariate EHA sensor signals, focusing on temperature and pressure data collected from a controlled test bench. The method employs a reconstruction-based Long Short-Term Memory (LSTM) autoencoder, calibrated and evaluated using validation-set reconstruction-error distributions. Performance is assessed across multiple fault-injection scenarios using accuracy, precision, recall, and F1-score, complemented by sensitivity analyses under varying operating conditions. The LSTM autoencoder achieved an average accuracy of 99.0%, precision up to 100%, recall between 90.2% and 99.6%, and F1-scores from 93.1% to 99.8%, demonstrating high detection sensitivity and a very low false-alarm rate across all evaluated sensors. These results highlight the feasibility of data-driven offline anomaly detection for EHAs. Future work will focus on adapting the developed framework for an online (real-time) environment.
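
The calibration step described in this abstract can be sketched as follows, assuming the threshold is an empirical quantile of the validation reconstruction errors. That quantile rule is an assumption made for illustration (the abstract only says the detector is calibrated on validation-set reconstruction-error distributions), and the autoencoder itself is not shown: the inputs are hypothetical per-window reconstruction errors.

```python
def calibrate_threshold(validation_errors, quantile=0.99):
    # Assumed rule: threshold = empirical quantile of validation reconstruction errors.
    ordered = sorted(validation_errors)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]

def flag_anomalies(errors, threshold):
    # A window is flagged when its reconstruction error exceeds the threshold.
    return [e > threshold for e in errors]

def precision_recall_f1(predicted, actual):
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Raising the quantile trades recall for fewer false alarms, which is the trade-off the reported precision and recall ranges reflect.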

2606.05272 2026-06-05 cs.LG 版本更新

Learning Manifold and Itô Dynamics with Branched Neural Rough Differential Equations

学习流形与伊藤动力学:分支神经粗糙微分方程

Luke Thompson, Dai Shi, Lequan Lin, Junbin Gao, Andi Han

发表机构 * University of California, Berkeley(加州大学伯克利分校) Stanford University(斯坦福大学) University of Toronto(多伦多大学)

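AI summary: Proposes Branched Neural Rough Differential Equations (B-NRDEs), a Hopf-algebraic framework that handles Euclidean Itô dynamics, ordered covariant derivatives on manifolds, and the classical Stratonovich case in a unified way, yielding coarse-step dynamics that exactly preserve manifold constraints and enabling Itô-consistent law matching.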

Comments Accepted at ICML 2026

详情
AI中文摘要

神经粗糙微分方程(NRDE)在不规则采样下保持准确性,同时所需的积分步数远少于标准神经微分方程,它通过对数签名总结精细采样的驱动信号,并利用log-ODE方法在粗间隔上推进隐藏状态。这种效率依赖于洗牌代数,即Stratonovich微积分的代数对应。这种依赖性意味着NRDE无法暴露伊藤动力学所需的二次变分项,也无法处理带联络流形上控制伊藤流的有序协变导数。为改善这一点,我们引入了分支神经粗糙微分方程(B-NRDE),这是一个Hopf代数框架,将NRDE的log-ODE步骤重新解释为状态空间流形上的几何数值积分,使驱动代数与主导微积分相匹配:对于欧几里得伊藤动力学使用Grossman--Larson根树,对于流形上的有序协变导数使用Munthe-Kaas--Wright平面根树,在经典Stratonovich情形下使用洗牌代数。这产生了内在的粗步动力学,精确保持流形约束。最后,我们引入一个分支签名核目标,通过在训练过程中使二次变分项可见,实现伊藤一致律匹配。在粗糙Bergomi波动率、仿真到真实$\mathrm{SO}(3)$动力学预测以及SPD协方差动力学上,B-NRDE为欧几里得-Stratonovich设置之外的随机和流形值动力学提供了一种统一、有效的方法。

英文摘要

Neural rough differential equations (NRDEs) stay accurate under irregular sampling while taking far fewer integration steps than standard neural differential equations, summarising a finely sampled driver by its log-signature and advancing the hidden state over coarse intervals using the log-ODE method. This efficiency rests on the shuffle algebra, the algebraic counterpart of Stratonovich calculus. This reliance means NRDEs cannot expose the quadratic-variation terms Itô dynamics require, nor the ordered covariant derivatives that govern Itô flows on connection-equipped manifolds. Ameliorating this, we introduce Branched Neural Rough Differential Equations (B-NRDEs), a Hopf-algebraic framework that recasts the NRDE log-ODE step as geometric numerical integration on the state-space manifold, matching the driving algebra to the governing calculus: Grossman--Larson rooted trees for Euclidean Itô dynamics, Munthe-Kaas--Wright planar rooted trees for ordered covariant derivatives on manifolds, and the shuffle algebra in the classical Stratonovich case. This yields intrinsic coarse-step dynamics that exactly preserve manifold constraints. Finally, we introduce a branched signature-kernel objective to enable Itô-consistent law matching by making quadratic-variation terms visible during training. On rough Bergomi volatility, sim-to-real $\mathrm{SO}(3)$ dynamics forecasting, and SPD covariance dynamics, B-NRDEs offer a unified, effective approach to stochastic and manifold-valued dynamics beyond the Euclidean--Stratonovich setting.
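
A standard textbook example, not taken from the paper, shows the quadratic-variation term at stake. For a Brownian motion $W$ with $W_0 = 0$:

```latex
\int_0^T W_t \, \mathrm{d}W_t = \tfrac{1}{2} W_T^2 - \tfrac{1}{2} T \quad \text{(Itô)},
\qquad
\int_0^T W_t \circ \mathrm{d}W_t = \tfrac{1}{2} W_T^2 \quad \text{(Stratonovich)}.
```

The two integrals differ by $\tfrac{1}{2}[W, W]_T = \tfrac{1}{2} T$. The Stratonovich integral obeys the ordinary chain rule, which is the property the shuffle algebra encodes, so this correction term is not visible to a purely shuffle-based signature.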

2606.05268 2026-06-05 cs.GR cs.LG 版本更新

Aggregating LLM-Based Weak Verifiers for Spatial Layout Generation

基于LLM的弱验证器聚合用于空间布局生成

Sharon Zhang, R. Kenny Jones, Jiajun Wu, Maneesh Agrawala

发表机构 * Stanford University(斯坦福大学) Roblox

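AI summary: Proposes a pipeline that builds a strong verifier for spatial layout domains by aggregating LLM-generated weak verifiers, raising F1-scores by up to 7x on 3D room layout and 2D poster design tasks.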

详情
AI中文摘要

我们提出了一种流水线,用于构建和聚合任务特定的、LLM生成的弱(不完美)验证器,以形成空间布局领域的强验证器。给定任务描述,我们的流水线要求LLM使用布局验证DSL合成一组验证程序。每个单独的LLM生成的验证器通常对布局与相应任务描述之间的匹配提供不完美的检查。我们表明,通过聚合许多此类验证器的响应,我们可以产生更强的验证器。此外,通过应用弱学习技术,我们的流水线可以从非常稀疏的人工标记示例布局(约10个)中学习如何聚合弱验证器。我们发现,我们的流水线产生的强验证器优于使用一组LLM评判者直接检查布局是否与任务描述匹配的现状方法,在各种3D房间布局和2D海报设计任务中,F1分数提高了高达7倍。我们还证明,使用来自我们强验证器的自然语言反馈进行验证器引导的布局生成,根据人类评估者的评估,将基础布局生成器的布局质量提高了高达66.2%。

英文摘要

We present a pipeline for building and aggregating task-specific, LLM-generated weak (imperfect) verifiers into a strong verifier for spatial layout domains. Given a task description, our pipeline asks an LLM to synthesize a collection of verifier programs using a layout verification DSL. Each individual LLM-generated verifier usually provides an imperfect check for a match between the layout and the corresponding task description. We show that by aggregating the responses of many such verifiers we can produce a stronger verifier. Moreover, by applying techniques from weak learning, our pipeline can learn how to aggregate the weak verifiers from a very sparse set of human labeled example layouts (about 10). We find that the strong verifiers produced by our pipeline outperform the status-quo approach of using a set of LLM judges to directly check whether a layout matches a task description, raising F1-scores by up to 7X across a variety of 3D room layout and 2D poster design tasks. We also demonstrate that verifier-guided layout generation using natural language feedback from our strong verifiers improves layout quality of a base layout generator by up to 66.2% according to a human evaluator.
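
One simple way to aggregate weak verifiers from a handful of labels is a weighted vote in which each verifier's weight is the log-odds of its smoothed accuracy. The sketch below uses that rule as an assumption; the abstract only mentions "techniques from weak learning", and the verifier functions and layout fields (`chairs`, `table_centered`) are hypothetical.

```python
import math

def fit_weights(verifiers, labeled_layouts):
    """labeled_layouts: list of (layout, is_valid) pairs, e.g. about 10 examples."""
    weights = []
    for verify in verifiers:
        correct = sum(verify(layout) == label for layout, label in labeled_layouts)
        # Laplace smoothing keeps the log-odds finite with very few labels.
        acc = (correct + 1) / (len(labeled_layouts) + 2)
        weights.append(math.log(acc / (1 - acc)))
    return weights

def strong_verify(verifiers, weights, layout):
    # Each verifier votes +w (pass) or -w (fail); the sign of the sum decides.
    score = sum(w if verify(layout) else -w for verify, w in zip(verifiers, weights))
    return score > 0

# Hypothetical weak verifiers for a "dining room" task description.
verifiers = [
    lambda layout: layout["chairs"] >= 4,
    lambda layout: layout["table_centered"],
    lambda layout: True,  # uninformative verifier
]
labeled = [
    ({"chairs": 4, "table_centered": True}, True),
    ({"chairs": 2, "table_centered": False}, False),
    ({"chairs": 6, "table_centered": True}, True),
    ({"chairs": 1, "table_centered": True}, False),
]
weights = fit_weights(verifiers, labeled)
```

A verifier that is right only half the time receives weight zero under this rule, so uninformative checks drop out of the vote automatically.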

2606.05266 2026-06-05 cs.LG cs.CC cs.DS math.CO math.PR math.ST stat.TH 版本更新

Sharp Low-Degree Thresholds for Planted-vs-Planted Testing

植入vs植入测试的尖锐低度阈值

Anda Skeja, Daniel Gutiérrez Espinoza, Fiona Skerman, Alexander S. Wein

发表机构 * Department of Mathematics, University of California, Davis(加州大学戴维斯分校数学系)

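AI summary: Establishes the first sharp thresholds for low-degree polynomial tests in planted-vs-planted settings and proves matching upper and lower bounds for counting communities in the planted submatrix and planted dense subgraph models; the testing threshold coincides exactly with the known low-degree recovery threshold.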

详情
AI中文摘要

我们在植入vs植入设置中建立了低度多项式测试的首个尖锐阈值,其中目标是以渐近消失的错误率确定两个结构化植入机制中的哪一个生成了观测数据。我们证明了在植入子矩阵和植入稠密子图模型中计数社区的匹配低度上下界。所得的测试阈值与已知的低度恢复阈值精确一致。相比之下,弱测试(即目标优于随机猜测)没有尖锐阈值,而是存在一个我们识别的平滑过渡。为了证明我们的结果,我们开发了一个基于低度恢复中潜在变量展开的植入vs植入测试框架,并采用新方法来识别和修剪非信号贡献。

英文摘要

We establish the first sharp thresholds for low-degree polynomial tests in planted-vs-planted settings, where the goal is to determine with vanishing error which of two structured planted mechanisms generated the observed data. We prove matching low-degree upper and lower bounds for counting communities in the planted submatrix and planted dense subgraph models. The resulting testing threshold coincides, down to the sharp constant, with the known low-degree recovery threshold. In contrast, the task of weak testing, where the goal is to outperform random guessing, does not have a sharp threshold but rather a smooth transition, which we identify. To prove our results, we develop a framework for planted-vs-planted testing that builds on a latent-variable expansion originating in low-degree recovery and employs new methods to identify and prune non-signal contributions.

2606.05265 2026-06-05 cs.LG 版本更新

Data-efficient flood depth prediction through domain-aware coreset selection and tabular foundation models

数据高效的洪水深度预测:通过领域感知的核心集选择与表格基础模型

Lipai Huang, Adithi Srinath, Manas Singh, Junwei Ma, Ali Mostafavi

发表机构 * Urban Resilience.AI Lab(Urban Resilience.AI实验室) Zachry Department of Civil and Environmental Engineering, Texas A&M University(Zachry土木与环境工程系,德克萨斯A&M大学) Department of Computer Science and Engineering, Texas A&M University(计算机科学与工程系,德克萨斯A&M大学) Resilitix Intelligence LLC Institute for a Disaster Resilient Texas, Texas A&M University(德克萨斯灾难韧性研究所,德克萨斯A&M大学)

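AI summary: Proposes a domain-aware coreset construction pipeline that, combined with a tabular foundation model, reaches flood depth prediction accuracy comparable to supervised models using only 0.7% of the training data and supports transfer across watersheds.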

详情
AI中文摘要

近实时洪水深度预测需要替代模型具有准确性、快速性和跨流域可迁移性。监督替代模型在精度上可媲美基于物理的模拟器,但每个流域需要数百万训练行,且无法外推到原始网格之外。我们提出了一种领域感知的核心集构建流程,在推理时对表格基础模型进行条件化。该流程按重现期和受影响最严重的流域对风暴进行分层,然后使用目标感知的空间选择器采样六边形。使用每个流域训练池的0.7%,模型在休斯顿地区九个流域上实现了平均$R^2$为0.663,达到监督参考($R^2$=0.673)的98.5%。该模型无需特定任务重训练即可迁移到未见的流域,优于基于核心集训练的监督基线。在真实风暴上,模型在一个远分布外案例中超过了监督参考,在一个几乎分布内案例中略逊于监督参考。领域感知的核心集构建使表格基础模型能够实现数据高效、跨流域可迁移的洪水预测,无需每个流域的训练。

英文摘要

Near-real-time flood depth prediction demands surrogate models that are accurate, fast, and transferable across watersheds. Supervised surrogates can match physics-based simulators in accuracy but need millions of training rows per watershed and cannot extrapolate beyond their original mesh. We propose a domain-aware coreset construction pipeline that conditions a tabular foundation model at inference time. The pipeline stratifies storms by return period and most-affected watershed, then samples hexagons with a target-aware spatial selector. With 0.7% of the per-watershed training pool, the model attains a mean $R^2$ of 0.663 across nine Houston-area watersheds, 98.5% of the supervised reference ($R^2$ = 0.673). It transfers to held-out watersheds without task-specific retraining, staying ahead of a coreset-trained supervised baseline. On real storms it exceeds the supervised reference on a far out-of-distribution case and trails it on a mostly in-distribution one. Domain-aware coreset construction lets tabular foundation models deliver data-efficient, watershed-transferable flood predictions without per-watershed training.
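
Reading the 98.5% figure as the ratio of the two reported mean $R^2$ values (an interpretation, since the abstract does not spell it out), the arithmetic is:

```latex
\frac{R^2_{\text{coreset}}}{R^2_{\text{supervised}}} = \frac{0.663}{0.673} \approx 0.985
```

Under that reading, the coreset-conditioned model gives up about 1.5% of the supervised reference's $R^2$ while using 0.7% of the per-watershed training pool.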

2606.05263 2026-06-05 cs.LG cs.AI 版本更新

Policy-Conditioned Counterfactual Credit for Verifiable Reinforcement Learning of Long-Horizon Language Agents

基于策略条件的反事实信用分配用于长周期语言智能体的可验证强化学习

Renwei Meng

发表机构 * stu.ahu.edu.cn(安徽大学)

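AI summary: Proposes CVT-RL, which uses policy-conditioned counterfactual contribution estimation and verifiable-reward constraints to address unsupported evidence chains, belief drift, and shortcut actions in long-horizon language agents during reasoning and tool use, improving task success and reducing reward hacking across several tasks.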

Comments 16 pages, 6 figures

详情
AI中文摘要

具有可验证奖励的强化学习改进了推理和工具使用,但长周期语言智能体仍然学习到无支持的证据链、信念漂移以及满足终端检查的捷径行为。现有的过程奖励大多是相关的:它们奖励类似检索、反思或验证的步骤,而不估计在指定干预下该步骤是否有助于最终验证的成功。我们提出CVT-RL,一种具有密集可验证奖励、干预有效性门控和策略条件反事实贡献(PCCC)估计器的约束策略梯度算法。删除、语义替换、证据替换和工具输出扰动定义了不同的受控干预;延续从冻结的参考策略中采样,并使用选择调整的双重稳健估计器增强优势。信念控制仅使用前缀可观察标签,而增广拉格朗日约束无支持的声明、跳过的验证、工具篡改和不安全调用。在长上下文问答、ALFWorld、ScienceWorld以及网页/工具任务上,CVT-RL将平均任务成功率从计算匹配的非因果强化学习的71.8%和信息匹配的反事实过程基线的75.4%提高到78.9%,证据F1分数从信息匹配基线的78.9提高到82.8,并将测量的作弊率从7.2%降低到3.9%。独立人工审计估计CVT-RL的作弊率为4.6%,而信息匹配基线为8.1%,自适应检测器规避攻击仅将作弊率提高到7.1%。分层自助法和混合效应检验在Holm校正后所有主要指标的p<0.01。精心范围的反事实信用,结合有效性门控、诊断和可验证约束,为语言智能体更可靠的长周期强化学习提供了一条可复现的路径。

英文摘要

Reinforcement learning with verifiable rewards improves reasoning and tool use, yet long-horizon language agents still learn unsupported evidence chains, belief drift, and shortcut actions that satisfy terminal checks. Existing process rewards are mostly correlational: they reward retrieval-, reflection-, or verification-like steps without estimating whether the step contributes to final verified success under a specified intervention. We propose CVT-RL, a constrained policy-gradient algorithm with dense verifiable rewards, intervention-validity gating, and a policy-conditioned counterfactual contribution (PCCC) estimator. Deletion, semantic substitution, evidence substitution, and tool-output perturbation define separate controlled interventions; continuations are sampled from a frozen reference policy, and a selection-adjusted doubly robust estimator augments the advantage. Belief control uses only prefix-observable labels, while an augmented Lagrangian constrains unsupported claims, skipped verification, tool tampering, and unsafe calls. On long-context QA, ALFWorld, ScienceWorld, and web/tool tasks, CVT-RL improves average task success from 71.8% for compute-matched non-causal RL and 75.4% for an information-matched counterfactual-process baseline to 78.9%, improves evidence F1 from 78.9 to 82.8 over the information-matched baseline, and reduces measured hacking from 7.2% to 3.9%. Independent human audit estimates 4.6% hacking for CVT-RL versus 8.1% for the information-matched baseline, and adaptive detector-evasion attacks raise hacking only to 7.1%. Stratified bootstrap and mixed-effects tests give p<0.01 after Holm correction for all primary metrics. Carefully scoped counterfactual credit, paired with validity gating, diagnostics, and verifiable constraints, provides a reproducible route toward more reliable long-horizon RL for language agents.
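
For readers unfamiliar with doubly robust estimation, the sketch below shows the textbook estimator applied to a single step's contribution: it contrasts trajectories where the step was kept with trajectories where it was intervened on (for example, deleted). This is the generic form only; the paper's selection adjustment and policy conditioning are not reproduced, and all field names are hypothetical.

```python
def doubly_robust_contribution(samples):
    """Textbook doubly robust estimate of (success with step) - (success without it).

    Each sample is a dict with hypothetical keys:
      kept        1 if the step was kept, 0 if it was intervened on
      success     observed verified outcome (0 or 1)
      propensity  probability that the step was kept
      m_kept      outcome-model prediction of success with the step
      m_removed   outcome-model prediction of success without the step
    """
    total = 0.0
    for s in samples:
        t, y, e = s["kept"], s["success"], s["propensity"]
        total += (s["m_kept"] - s["m_removed"]
                  + t * (y - s["m_kept"]) / e
                  - (1 - t) * (y - s["m_removed"]) / (1 - e))
    return total / len(samples)
```

The estimate stays consistent if either the outcome model or the propensities are correct, which is the property the name "doubly robust" refers to.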

2606.05261 2026-06-05 cs.CV cs.AI cs.LG 版本更新

NIV: Neural Axis Variations for Variable Font Generation

NIV: 用于可变字体生成的神经轴变化

Nadav Benedek, Ariel Shamir, Ohad Fried

发表机构 * Reichman University(雷赫曼大学)

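AI summary: Proposes NIV, which predicts per-point displacements of glyph outlines to automatically convert a static font into a variable font supporting continuous multi-axis interpolation, and validates its generalization on a newly constructed dataset.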

详情
AI中文摘要

可变字体能够沿语义设计轴(如字重、字宽、倾斜和光学尺寸)实现字形几何的连续变化。然而,从静态字体构建可变字体仍然是一个劳动密集型过程,需要专业的字体设计和对字形变化数据的手动规范。我们引入了NIV(神经轴变化),一种自动将静态字体转换为功能齐全的可变字体的方法。给定字形轮廓和一组期望的设计轴,NIV预测每点的位移。该模型直接操作矢量字形几何,并采用一种新颖的属性嵌入机制,捕获多个轴之间的相互作用,从而在统一框架内实现一致的多轴变化。我们在一个新构建的源自可变Google字体的数据集上训练NIV,该数据集包含超过一百万个变化元组。得到的模型能够泛化到未见过的码点、未见过的字体样式、高复杂度的CJK字形,甚至分布外的手写输入。生成的输出是标准的可变字体文件,支持通过现有渲染引擎进行连续插值。为了促进研究,我们在https://github.com/ndvbd/NIV上发布了数据集、完整的训练和推理实现以及训练好的模型。超越字体排印,我们的方法展示了如何使用神经变形合成具有连续参数变化的结构化几何对象。

英文摘要

Variable fonts enable continuous variation of glyph geometry along semantic design axes such as weight, width, slant, and optical size. However, constructing a variable font from a static font remains a labor-intensive process requiring expert typographic design and manual specification of glyph variation data. We introduce NIV (Neural Axis Variations), a method that automatically converts a static font into a fully functional variable font. Given glyph outlines and a set of desired design axes, NIV predicts per-point displacements. The model operates directly on vector glyph geometry and employs a novel Property Embedding mechanism that captures interactions between multiple axes, enabling consistent multi-axis variation within a unified framework. We train NIV on a newly constructed dataset derived from variable Google Fonts, comprising over one million variation tuples. The resulting model generalizes across unseen code points, unseen font styles, high-complexity CJK glyphs, and even out-of-distribution handwriting inputs. The generated outputs are standard variable font files supporting continuous interpolation via existing rendering engines. To facilitate research, we release the dataset, the complete training and inference implementation, and trained models at https://github.com/ndvbd/NIV. Beyond typography, our approach demonstrates how structured geometric objects with continuous parametric variation can be synthesized using neural deformations.
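
To make "per-point displacements" concrete, the sketch below applies axis-scaled deltas to a glyph outline. It is a deliberately simplified linear model in which each axis contributes independently; real variable fonts use variation regions, and NIV's learned predictor is not shown. The axis tags and data layout are illustrative assumptions.

```python
def apply_deltas(outline, deltas_by_axis, axis_values):
    """Move each outline point by the sum of axis-scaled displacements.

    outline:        list of (x, y) points of the default glyph
    deltas_by_axis: axis tag -> list of (dx, dy), one per outline point
    axis_values:    axis tag -> normalized coordinate, e.g. in [-1, 1]
    """
    result = []
    for i, (x, y) in enumerate(outline):
        dx = sum(axis_values[a] * deltas_by_axis[a][i][0] for a in deltas_by_axis)
        dy = sum(axis_values[a] * deltas_by_axis[a][i][1] for a in deltas_by_axis)
        result.append((x + dx, y + dy))
    return result
```

Setting every axis value to zero returns the original static outline, and intermediate values interpolate continuously between designs.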

2606.05258 2026-06-05 stat.ML cs.LG stat.AP 版本更新

Harnessing Source Heterogeneity for Cluster-Structured Transfer Learning

利用源异质性进行聚类结构迁移学习

Xiaohui Yin, Jun Jin, Shane J. Sacco, Robert H. Aseltine, Kun Chen

发表机构 * Department of Statistics, University of Connecticut(康涅狄格大学统计学系) Department of Public Health Sciences, Henry Ford Health(亨利福特医疗健康部公共卫生科学系) Center for Population Health, University of Connecticut Health Center(康涅狄格大学健康中心人口健康中心)

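AI summary: Addresses source heterogeneity in transfer learning with Trans-GLMC, which adaptively fuses target and source data through a cluster structure, improving prediction and identifying interpretable source clusters.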

详情
AI中文摘要

当目标群体数据有限但存在多个相关辅助源时,迁移学习是一种自然策略。一个核心难点是源异质性:辅助源可能并非同等有用,且其有用性可能以结构化的聚类方式变化。现有的迁移学习方法通常将源选择简化为二元的信息性/非信息性决策,忽略了具有不同可迁移性的源子组。受一项使用康涅狄格医院信息管理交换(CHIME)数据的自杀风险研究(涵盖27家医院的636,758名患者)的启发,我们提出了Trans-GLMC,一种针对广义线性模型的聚类结构迁移学习程序。CHIME设置说明了核心挑战:由于自杀尝试在任何单一设施中罕见,医院特定的风险模型不稳定,而不加区分地合并所有医院会模糊设施层面在患者构成和风险特征上的差异。Trans-GLMC首先在目标和候选源之间构建基于系数的距离,以恢复潜在源聚类。然后,它结合全局融合、聚类内细化和目标去偏,产生一个适应检测到的结构的估计量。我们建立了一个非渐近误差界,当存在有意义的目标聚类时,该误差界优于其非聚类对应物,否则在常数范围内匹配非聚类速率。在模拟和CHIME研究中,Trans-GLMC改进了设施特定的预测,识别了具有相互可迁移性的可解释医院社区,并恢复了临床一致的自杀风险因素。

英文摘要

Transfer learning is a natural strategy when a target population has limited data but multiple related auxiliary sources are available. A central difficulty is source heterogeneity: auxiliary sources may not be equally useful, and their usefulness may vary in a structured, cluster-like fashion. Existing transfer-learning methods often reduce source selection to a binary informative/non-informative decision, overlooking subgroups of sources with differential transferability. Motivated by a suicide-risk study using data from the Connecticut Hospital Information Management Exchange (CHIME), comprising 636,758 patients across 27 hospitals, we propose Trans-GLMC, a cluster-structured transfer-learning procedure for generalized linear models. The CHIME setting illustrates the core challenge: hospital-specific risk models are unstable because suicide attempts are rare at any single facility, whereas indiscriminate pooling across hospitals can obscure facility-level differences in patient mix and risk profiles. Trans-GLMC first constructs a coefficient-based distance among the target and candidate sources to recover latent source clusters. It then combines global fusion, within-cluster refinement, and target debiasing to produce an estimator that adapts to the detected structure. We establish a non-asymptotic error bound that improves over its unclustered counterpart whenever a meaningful target cluster exists and matches the unclustered rate up to constants otherwise. In simulations and in the CHIME study, Trans-GLMC improves facility-specific prediction, identifies interpretable communities of hospitals with mutual transferability, and recovers clinically coherent suicide-risk factors.
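
The first stage, a coefficient-based distance between the target and candidate sources, can be illustrated with a much-reduced sketch: fit a coefficient vector per site, then keep the sources whose vector lies close to the target's. The Euclidean distance and the fixed `radius` are assumptions for illustration; the paper recovers full latent clusters and follows this with fusion, refinement, and debiasing steps that are not shown.

```python
import math

def coefficient_distance(a, b):
    # Euclidean distance between two fitted coefficient vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def target_cluster(target_coef, source_coefs, radius):
    """Names of sources whose coefficient vector lies within `radius` of the target's."""
    return sorted(
        name for name, coef in source_coefs.items()
        if coefficient_distance(target_coef, coef) <= radius
    )
```

Only the sources returned here would be pooled with the target, rather than making a single all-or-nothing decision about every hospital.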

2606.05257 2026-06-05 cs.LG cs.IR 版本更新

Scaling Laws for Behavioral Foundation Models over User Event Sequences

用户事件序列上行为基础模型的缩放定律

Rickard Brüel Gabrielsson

发表机构 * Unbox AI

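AI summary: Studies scaling laws for behavioral foundation models over user event sequences; roughly 600 runs show that a small embedder is compute-optimal, that compute-optimal training is data-heavy at low compute, and that the evaluation metric affects the scaling law.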

详情
AI中文摘要

基础模型越来越多地在推荐、支付、欺诈和商务领域的用户行为序列上进行训练,但这些模型仍然缺乏语言模型缩放定律所提供的计算校准。我们研究了一种常见的两部件行为模型架构:基于特征的嵌入器将每个多模态项目映射为向量,解码器仅变换器从结果序列中预测下一个事件。在真实交互数据上进行约600次运行,涵盖$10^{15}$-$10^{19}$训练FLOPs,我们联合变化四个部署相关轴:两部件参数分配、临界批量大小、模型/数据分配以及冻结嵌入器后使用的采样负例数量。小嵌入器(参数占比$s^{\star}\!\approx\!2\%$)在我们测试的每个预算下都是计算最优的,因为嵌入器参数每步更昂贵,且暴露于比上下文器参数多得多的重复项目。计算最优训练在低计算量时相对于文本是数据密集的,但随着计算量增加,其$D/N$比率向Chinchilla启发式靠拢。采样训练目标和部署的排序指标以自身缩放的方式不一致:临界批量大小、冻结后的最优负例数量以及损失与排序质量之间的一致性都随计算量和所选评估指标而变化。对于负采样,更大的预算越来越偏好更多负例;到$10^{19}$ FLOPs时,活跃约束是候选轴内存而非FLOPs。在行为基础模型中,评估指标因此是缩放定律的一部分:改变它可能改变计算最优配方。

英文摘要

Foundation models are increasingly trained on sequences of user actions in recommendation, payments, fraud, and commerce, but these models still lack the kind of compute calibration that scaling laws provide for language models. We study a common two-part behavioral-model architecture: a feature-based event embedder maps each multi-modal item to a vector, and a decoder-only transformer predicts the next event from the resulting sequence. Across roughly 600 runs on real interaction data, spanning $10^{15}$-$10^{19}$ training FLOPs, we jointly vary four deployment-relevant axes: the two-part parameter split, critical batch size, model/data allocation, and the number of sampled negatives used after freezing the embedder. A small embedder ($s^{\star}\!\approx\!2\%$ of parameters) is compute-optimal at every budget we test because embedder parameters are both more expensive per step and exposed to far more repeated items than contextualizer parameters. Compute-optimal training is data-heavy relative to text at low compute, but its $D/N$ ratio moves toward the Chinchilla heuristic as compute increases. The sampled training objective and deployed ranking metrics disagree in ways that themselves scale: critical batch size, optimal negative count after freezing, and the agreement between loss and ranking quality all shift with compute and with the chosen evaluation metric. For negative sampling, larger budgets increasingly prefer more negatives; by $10^{19}$ FLOPs the active constraint is candidate-axis memory rather than FLOPs. In behavioral foundation models, the evaluation metric is therefore part of the scaling law: changing it can change the compute-optimal recipe.
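
The $D/N$ ratio mentioned in this abstract can be made concrete with a small calculation. The sketch assumes the common approximation $C \approx 6ND$ for dense transformers and uses roughly 20 tokens per parameter as the Chinchilla rule of thumb; both come from the language-model scaling literature rather than from this abstract, and may not fit the two-part embedder/decoder architecture studied here.

```python
import math

def compute_optimal_split(flops, tokens_per_param=20.0):
    """Solve C = 6 * N * D with D = r * N for parameters N and training tokens D."""
    n_params = math.sqrt(flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens
```

A data-heavy regime corresponds to a larger `tokens_per_param`, which shrinks the model that a fixed FLOP budget can afford.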

2606.05254 2026-06-05 cs.LG cs.CV cs.RO 版本更新

Flash-WAM: Modality-Aware Distillation for World Action Models

Flash-WAM:面向世界动作模型的模态感知蒸馏

Arman Akbari, Ci Zhang, Arash Akbari, Lin Zhao, Yixiao Chen, Weiwei Chen, Xuan Zhang, Geng Yuan, Yanzhi Wang

发表机构 * Northeastern University(东北大学) University of Georgia(佐治亚大学) EmbodyX Inc.(EmbodyX公司)

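AI summary: World action models that jointly generate video and robot actions suffer distillation failures because the noise distributions of the two modalities are asymmetric; Flash-WAM, a modality-aware step-distillation framework, selects a parametrization matched to each modality's noise regime, achieving single-step inference and a large speedup.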

详情
AI中文摘要

世界动作模型(WAMs)通过迭代扩散联合生成未来视频和机器人动作,在操作基准上表现出色,但需要数十个去噪步骤,这一成本阻碍了实时控制。步蒸馏已成为自然的补救措施,但现成的方法在联合视频-动作设置中失效,因为视频和动作流使用不同的信噪比偏移噪声调度,并以显著不同的边际噪声分布到达训练,这种不对称性是单模态蒸馏方法无法处理的。我们提出Flash-WAM,一个受一致性蒸馏启发的模态感知步蒸馏框架,为每个模态选择一致性函数以匹配其噪声机制:针对动作流的低噪声机制采用线性梯度缩放参数化,针对视频流的高噪声机制采用方差保持参数化,该框架基于对一致性函数族的结构分析,该分析刻画了在一致性边界条件下可实现的梯度缩放。在LingBot-VA上实例化,Flash-WAM将每个模态的推理压缩到单步。在RoboTwin 2.0上,这将每个块延迟从8.1秒减少到NVIDIA L40S上的348毫秒,实现了23倍的加速,从而支持实时推理。Flash-WAM在模拟基准上保持了任务成功率(RoboTwin 2.0上85.5%,LIBERO上95.7%),并大幅恢复了真实世界性能(Unitree G1人形机器人上平均60%),而朴素的一致性蒸馏在相同步预算下降至24%。

英文摘要

World-action models (WAMs) jointly generate future video and robot actions through iterative diffusion, achieving strong performance on manipulation benchmarks but requiring tens of denoising steps, a cost that precludes real-time control. Step distillation has emerged as the natural remedy, but off-the-shelf methods break down in the joint video-action setting because video and action streams use different SNR-shifted noise schedules and reach training with substantially different marginal noise distributions, an asymmetry that single-modality distillation methods cannot accommodate. We introduce Flash-WAM, a modality-aware step-distillation framework inspired by consistency distillation that selects the consistency function for each modality to match its noise regime: a linear-gradient-scaling parametrization for the action stream's low-noise regime, paired with a variance-preserving parametrization for the video stream's high-noise regime, grounded in a structural analysis of the consistency-function family that characterizes the achievable gradient scaling under the consistency boundary condition. Instantiated on LingBot-VA, Flash-WAM compresses inference to a single step in each modality. On RoboTwin 2.0, this reduces per-chunk latency from 8.1 seconds to 348 ms on an NVIDIA L40S, a 23× speedup that enables real-time inference. Flash-WAM preserves task success on simulation benchmarks (85.5% on RoboTwin 2.0, 95.7% on LIBERO) and substantially recovers real-world performance (60% average on a Unitree G1 humanoid robot), while naive consistency distillation drops to 24% at the same step budget.
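
For orientation, the consistency boundary condition is usually enforced in the consistency-model literature through a skip-connection parametrization. The form below is that generic template, not the specific parametrizations derived in the paper, and $F_\theta$, $c_{\text{skip}}$, $c_{\text{out}}$, and $\epsilon$ are the conventional symbols rather than the authors' notation. The reported speedup can also be checked from the two latencies:

```latex
f_\theta(x, t) = c_{\text{skip}}(t)\, x + c_{\text{out}}(t)\, F_\theta(x, t),
\qquad c_{\text{skip}}(\epsilon) = 1, \quad c_{\text{out}}(\epsilon) = 0,
\\[4pt]
\frac{8.1\ \text{s}}{0.348\ \text{s}} \approx 23.3 .
```

Different choices of $c_{\text{skip}}$ and $c_{\text{out}}$ satisfy the same boundary condition, which is the freedom a per-modality parametrization would exploit.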

2606.05253 2026-06-05 cs.LG 版本更新

Alpha-RTL: Test-Time Training for RTL Hardware Optimization

Alpha-RTL: 面向RTL硬件优化的测试时训练

Peilong Zhou, Zhirong Chen, Cangyuan Li, Haoyu Gao, Kaiyan Chang, Ziming Qu, Ying Wang

发表机构 * SKLP, Institute of Computing Technology, Chinese Academy of Sciences(SKLP,计算技术研究所,中国科学院) University of the Chinese Academy of Sciences(中国科学院大学) School of Advanced Interdisciplinary Sciences(先进交叉学科学院)

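AI summary: Proposes TTT-RTL, a framework that optimizes LLM-generated RTL designs through test-time reinforcement learning with EDA feedback (syntax checking, simulation, and PPA rewards), reducing the PPA product by 65.1% on RTLLM v2.0 and the ADP by 59.4% on an industrial C910 FPU unit.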

Comments 10 pages, 5 figures

详情
AI中文摘要

大型语言模型(LLM)在生成功能正确的寄存器传输级(RTL)硬件设计方面展现出越来越大的潜力。最近的系统通过集成EDA的强化学习(结合语法、仿真和PPA奖励)进一步改进,但在部署前训练通用RTL生成器,而测试时方法使用冻结策略进行搜索。我们则在测试时执行强化学习,使LLM策略能够针对特定RTL问题适应可执行的EDA反馈。我们提出TTT-RTL,据我们所知,这是首个针对每个设计的测试时训练框架,它闭环了LLM策略与用于RTL优化的EDA流水线。TTT-RTL采样候选实现,通过语法检查和仿真验证它们,使用综合得出的PPA乘积对有效设计进行评分,通过PUCT索引的设计状态池重用高奖励变体,并使用熵策略梯度目标更新策略。为了在稀疏或平台期奖励下稳定策略更新,我们引入了一个自适应KL预算控制器,该控制器使用参考KL、有效样本量和奖励饱和信号来调整熵约束。在Nangate 45nm工艺下的RTLLM v2.0上,TTT-RTL将几何平均PPA乘积相对于参考降低了65.1%,优于最强的已发布冻结策略智能体基线(26.1%)。在Sky130工艺下的工业级XuanTie C910 FPU前导零预测单元上,TTT-RTL实现了59.4%的ADP降低,消融实验证实策略适应、状态重用和KL预算控制各自都有贡献。这些结果表明,带有可执行EDA反馈的测试时训练可以将基于LLM的RTL生成从功能正确性推向物理优化的硬件。

英文摘要

Large language models (LLMs) have shown increasing promise in generating functionally correct register-transfer-level (RTL) hardware designs. Recent systems improve further through EDA-integrated reinforcement learning with syntax, simulation, and PPA rewards, but train a general RTL generator before deployment while test-time approaches search with a frozen policy. We instead perform reinforcement learning at test time, allowing the LLM policy to adapt to executable EDA feedback for the specific RTL problem at hand. We propose TTT-RTL, to our knowledge the first per-design test-time training framework that closes the loop between an LLM policy and an EDA pipeline for RTL optimization. TTT-RTL samples candidate implementations, verifies them through syntax checking and simulation, scores valid designs using synthesis-derived PPA product, reuses high-reward variants through a PUCT-indexed design-state pool, and updates the policy with an entropic policy-gradient objective. To stabilize policy updates under sparse or plateaued rewards, we introduce an adaptive KL-budget controller that adjusts the entropy constraint using reference KL, effective sample size, and reward saturation signals. On RTLLM v2.0 under Nangate 45nm, TTT-RTL reduces the geometric-mean PPA product by 65.1% over the reference, outperforming the strongest published frozen-policy agent baseline at 26.1%. On an industrial XuanTie C910 FPU leading-zero-anticipation unit under Sky130, TTT-RTL achieves a 59.4% ADP reduction, and ablations confirm that policy adaptation, state reuse, and KL-budget control each contribute. These results suggest that test-time training with executable EDA feedback can move LLM-based RTL generation beyond functional correctness toward physically optimized hardware.
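
The abstract mentions a PUCT-indexed pool of design states scored by a PPA product. The sketch below uses the widely known AlphaZero-style PUCT score, $Q + c \cdot P \cdot \sqrt{N} / (1 + n)$, and a plain power × delay × area product; both are assumptions about the general technique, and the dictionary fields are hypothetical rather than the paper's data structures.

```python
import math

def ppa_product(power, delay, area):
    # Lower is better; a reward could be defined as a reduction relative to a reference design.
    return power * delay * area

def puct_select(states, c_puct=1.0):
    """Pick the index of the design state to expand next.

    states: list of dicts with mean reward 'q', prior 'p', and visit count 'n'.
    """
    total_visits = sum(s["n"] for s in states)

    def score(s):
        # Exploitation term plus an exploration bonus that shrinks with visits.
        return s["q"] + c_puct * s["p"] * math.sqrt(total_visits) / (1 + s["n"])

    return max(range(len(states)), key=lambda i: score(states[i]))
```

With `c_puct=0` the rule always reuses the best-scoring design; larger values push sampling toward rarely visited variants.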

2606.05247 2026-06-05 cs.LG stat.ML 版本更新

DiffSlack: Learning under Nonlinear Inequality Constraints via Learnable Slack Variables

DiffSlack: 通过可学习松弛变量在非线性不等式约束下学习

Ziqian Wang, Chenxi Fang, Zhen Zhang

发表机构 * State Key Laboratory of Tribology in Advanced Equipment, Tsinghua University(先进设备摩擦学国家重点实验室,清华大学) Beijing Key Laboratory of Transformative High-end Manufacturing Equipment and Technology, Department of Mechanical Engineering, Tsinghua University(变革性高端制造设备与技术北京市重点实验室,机械工程系,清华大学) Automotive Electronics Business Unit, Hirain Inc.(Hirain公司汽车电子事业部)

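AI summary: Proposes DiffSlack, a differentiable projection layer that turns nonlinear inequality constraints into equalities through learnable slack variables and combines them with a damped Gauss-Newton projection for end-to-end constraint satisfaction, achieving a higher success rate and stronger geometric constraint satisfaction in vehicle path planning.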

详情
AI中文摘要

在神经网络中强制执行非线性不等式约束仍然具有挑战性,尤其是当输出受到许多耦合约束时。现有的硬约束方法通常对约束集施加结构限制,或者为大规模非线性问题引入大量计算开销。在此,我们提出DiffSlack,一种用于非线性不等式约束神经预测的可微投影层。DiffSlack将不等式重新表述为带有可学习松弛变量的等式,这些松弛变量作为增强网络输出的一部分被预测,并为阻尼高斯-牛顿投影提供数据驱动的热启动。投影层将原始预测映射到增强可行流形上,同时保持端到端可微性。两阶段课程进一步稳定训练并改善约束满足。我们在具有200个来自碰撞避免、曲率限制和航点间距的非线性不等式约束的车辆路径规划上评估DiffSlack。与现有的基于学习的基线相比,DiffSlack在相当的推理预算下实现了更高的规划成功率和更强的几何约束满足。消融研究进一步表明,硬投影层降低了对监督质量的敏感性。CARLA中的闭环跟踪和真实车辆实验证实了生成轨迹的可执行性。这些结果表明,DiffSlack为工程应用中将硬不等式约束嵌入神经网络提供了一种实用且可扩展的方法。

英文摘要

Enforcing nonlinear inequality constraints in neural networks remains challenging, especially when the output is subject to many coupled constraints. Existing hard constraint methods often impose structural restrictions on the constraint set or introduce substantial computational overhead for large-scale nonlinear problems. Here, we propose DiffSlack, a differentiable projection layer for nonlinear inequality-constrained neural prediction. DiffSlack reformulates inequalities as equalities with learnable slack variables, which are predicted as part of the augmented network output and provide a data-driven warm start for damped Gauss-Newton projection. The projection layer maps raw predictions onto the augmented feasible manifold while preserving end-to-end differentiability. A two-stage curriculum further stabilizes training and improves constraint satisfaction. We evaluate DiffSlack on vehicle path planning with 200 nonlinear inequality constraints from collision avoidance, curvature limits, and waypoint spacing. Compared with existing learning-based baselines, DiffSlack achieves a higher planning success rate and stronger geometric constraint satisfaction under a comparable inference budget. Ablation studies further show that the hard projection layer reduces sensitivity to supervision quality. Closed-loop tracking in CARLA and real-world vehicle experiments confirms the executability of the generated trajectories. These results demonstrate that DiffSlack provides a practical and scalable approach to embedding hard inequality constraints into neural networks for engineering applications.
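
The slack reformulation can be sketched for a single constraint $g(x) \le 0$, rewritten as $h(x, s) = g(x) + s^2 = 0$ and solved by damped Gauss-Newton steps on the augmented variable $(x, s)$. The squared-slack form, the damping value, and the function names are assumptions made for this illustration; the abstract does not state the exact parametrization, and the real layer handles many coupled constraints and is differentiable end to end.

```python
def project(point, slack, g, grad_g, damping=0.5, iters=50, tol=1e-10):
    """Damped Gauss-Newton on h(x, s) = g(x) + s**2 = 0 for one constraint g(x) <= 0."""
    x = list(point)
    s = slack
    for _ in range(iters):
        h = g(x) + s * s
        if abs(h) < tol:
            break
        jac = grad_g(x) + [2.0 * s]  # row Jacobian of h with respect to (x, s)
        norm_sq = sum(j * j for j in jac)
        if norm_sq == 0.0:
            break
        step = damping * h / norm_sq  # minimum-norm Gauss-Newton step, damped
        x = [xi - step * ji for xi, ji in zip(x, jac[:-1])]
        s = s - step * jac[-1]
    return x, s

# Example constraint: stay inside the unit disk, g(x) = x0^2 + x1^2 - 1 <= 0.
def unit_disk(x):
    return x[0] ** 2 + x[1] ** 2 - 1.0

def unit_disk_grad(x):
    return [2.0 * x[0], 2.0 * x[1]]
```

A prediction whose slack already satisfies $s^2 = -g(x)$ is returned unchanged, which suggests why a well-predicted slack serves as a useful warm start for the projection.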

2606.05242 2026-06-05 stat.ML cs.LG cs.NA math.NA math.PR 版本更新

Deterministic Envelopes for Tamed SGLD: Decoupling Stochastic-Gradient Noise and Localizing Taming

驯化SGLD的确定性包络:解耦随机梯度噪声与局部化驯化

Yiwei Zhou, Ziheng Chen

发表机构 * School of Mathematics and Statistics, Yunnan University(云南大学数学与统计学院)

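AI summary: Targets the stationary bias that tamed denominators introduce in stochastic-gradient Langevin algorithms, proposing a structure-preserving deterministic-envelope framework that removes the bias by decoupling gradient noise from the taming step, plus a hybrid envelope design that balances stability and bias reduction.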

Comments 40 pages, 11 tables, 2 figures

详情
AI中文摘要

随机梯度Langevin算法常使用驯化分母来稳定非全局Lipschitz漂移。本文表明,当分母与分子依赖于相同的随机梯度实现时,驯化步骤会改变随机预言本身,即使原始随机梯度无偏,也可能产生稳态偏差。我们提出了一种结构保持的框架来设计驯化分母。它在采样预言噪声之前固定分母,并使用局部确定性包络来避免典型区域中的不必要驯化。这些核保留了驯化的稳定效果,同时避免了由梯度依赖分母引入的偏差。我们的理论解释了稳态误差如何分解为预言依赖驯化引起的偏差和确定性稳定引入的剩余误差。在这个确定性包络族中,分析识别出一个远尾条件,解释了局部软包络的局限性,并激发了一个混合成员:在典型区域使用软包络,但在罕见游荡时通过硬尾控制提供保护。实验证实了随机分母的预测稳态失真、确定性包络设计的偏差减少以及混合结构的稳定效果。

英文摘要

Stochastic-gradient Langevin algorithms often use tamed denominators to stabilize non-globally Lipschitz drifts. This paper shows that when the denominator depends on the same stochastic-gradient realization as the numerator, the taming step changes the stochastic oracle itself and can create a stationary bias even if the original stochastic gradient is unbiased. We propose a structure-preserving framework for designing tamed denominators. It fixes the denominator before the oracle noise is sampled and uses localized deterministic envelopes to avoid unnecessary taming in typical regions. These kernels keep the stabilizing effect of taming while avoiding the bias introduced by a gradient-dependent denominator. Our theory explains how the stationary error splits into the bias caused by oracle-dependent taming and the remaining error introduced by deterministic stabilization. Within this deterministic-envelope family, the analysis identifies a far-tail condition that explains the limitation of local soft envelopes and motivates a hybrid member: soft in the typical region, but protected by hard-tail control on rare excursions. Experiments confirm the predicted stationary distortions of random denominators, the bias reduction of deterministic-envelope designs, and the stabilizing effect of the hybrid construction.
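
The bias mechanism described in this abstract can be checked exactly on a toy oracle. The sketch computes the expected tamed drift at a point where the true gradient is zero, under zero-mean but asymmetric noise, once with a denominator that sees the noisy gradient and once with a denominator fixed beforehand. The taming form `1 + step * |g|`, the noise distribution, and the envelope value are illustrative assumptions, not the paper's kernels.

```python
from fractions import Fraction

def expected_tamed_drift(true_grad, noise_dist, denominator):
    """Exact E[g / denominator(g)] for g = true_grad + noise over a finite noise distribution."""
    return sum(p * (true_grad + n) / denominator(true_grad + n) for n, p in noise_dist)

# Zero-mean, asymmetric oracle noise: -1 with probability 2/3, +2 with probability 1/3.
noise = [(Fraction(-1), Fraction(2, 3)), (Fraction(2), Fraction(1, 3))]
step = Fraction(1)
envelope = Fraction(3)  # hypothetical deterministic bound, computed before sampling noise

def oracle_dependent(g):
    return 1 + step * abs(g)      # denominator depends on the noisy gradient

def fixed_before_noise(g):
    return 1 + step * envelope    # denominator ignores the noise realization

biased = expected_tamed_drift(Fraction(0), noise, oracle_dependent)
unbiased = expected_tamed_drift(Fraction(0), noise, fixed_before_noise)
```

Here `biased` is nonzero even though the oracle itself is unbiased, while the denominator fixed before the noise is drawn leaves the expected drift at zero.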

2606.05239 2026-06-05 stat.ML cs.LG 版本更新

HyFAD: Hybrid Time-Frequency Diffusion with Frequency-Aware Embedding for Time Series Imputation

HyFAD: 用于时间序列插值的混合时频扩散与频率感知嵌入

Hongfan Gao, Wangmeng Shen, Bin Yang, Jilin Hu

发表机构 * School of Data Science and Engineering(数据科学与工程学院) East China Normal University(华东师范大学)

AI总结 提出HyFAD模型,通过耦合时频扩散框架和频率感知步嵌入,实现从时域到频域的渐进式去噪,有效解决频率敏感去噪和高频重建问题,在多个基准数据集上达到最先进性能。

详情
AI中文摘要

扩散模型通过迭代去噪逐步捕捉复杂数据分布,在时间序列建模中表现出强大性能。然而,现有方法在处理频率敏感去噪、高频重建以及平衡全局趋势与局部动态方面存在困难。为解决这些限制,我们提出HyFAD,一种用于时间序列插值的混合(Hybrid)时频扩散(Diffusion)模型,带有频率感知(Frequency-Aware)嵌入。基于DDPM范式,HyFAD采用耦合的时频扩散框架,其中反向去噪从时域到频域顺序进行,实现从粗到细的生成。具体地,时域扩散过程捕捉低频全局趋势,而频域扩散过程细化高频频谱分量。我们进一步引入频率感知步嵌入,利用扩散步与频谱分量之间的关系,提供步依赖的频谱引导,促进更准确的频带重建。在多个基准数据集上的大量实验表明,HyFAD达到了最先进的性能。我们的源代码可在https://github.com/hongfangao/HyFAD获取。

英文摘要

Diffusion models have demonstrated strong performance in time series modeling due to their ability to progressively capture complex data distributions through iterative denoising. However, existing approaches struggle with frequency-sensitive denoising, high-frequency reconstruction, and balancing global trends with local dynamics. To address these limitations, we propose HyFAD, a Hybrid time-frequency Diffusion model with Frequency-Aware embedding for time series imputation. Built upon the DDPM paradigm, HyFAD adopts a coupled time-frequency diffusion framework, in which the reverse denoising proceeds sequentially from the time domain to the frequency domain, enabling coarse-to-fine generation. Specifically, the time-domain diffusion process captures low-frequency global trends, while the frequency-domain diffusion process refines high-frequency spectral components. We further introduce a frequency-aware step embedding that exploits the relationship between diffusion steps and spectral components, providing step-dependent spectral guidance and facilitating more accurate band-wise reconstruction. Extensive experiments on multiple benchmark datasets demonstrate that HyFAD achieves state-of-the-art performance. Our source code is available at https://github.com/hongfangao/HyFAD.

2606.05236 2026-06-05 cs.RO cs.LG 版本更新

A New Quaternion-Joint Cable-Driven Redundant Manipulator Configuration and its Control Through FABRIK and Residual Reinforcement Learning

一种新型四元数关节缆驱动冗余机械臂配置及其通过FABRIK和残差强化学习的控制

Tanapath Pornthisan, Thanapat Kemthong, Thanyapisit Kangsathien, Pasut Aranchaiya, Paulo Garcia, Viboon Sangveraphunsiri

发表机构 * University of California, San Diego(加州大学圣地亚哥分校)

AI总结 提出一种4段8关节四元数关节缆驱动冗余机械臂配置,并利用残差强化学习实现比FABRIK算法高三个数量级的位置和方向精度控制。

详情
AI中文摘要

能够穿越任意空间路径的机械臂,特别是在高度阻塞的工作空间中,在多个行业中备受期待。四元数关节最近赋予了一类特定的机械臂——缆驱动冗余机械臂——超越其先前能力的新功能。具体来说,四元数关节减少了每个自由度所需的电机数量,为更紧凑的解决方案铺平了道路。一个持续的挑战是,四元数关节运动学模型的复杂性给机械臂配置的先验决策带来了困难,并对控制系统提出了更高的计算需求,其非线性放大了由于制造不精确而产生的设计与物理实物之间的所有差异。在这里,我们展示了一个4段、8关节的机械臂可以在更低的硬件成本下实现比现有配置更广阔的工作空间,并且残差强化学习在控制此类机械臂方面优于现有最先进的方法——特别是FABRIK算法。我们的结果表明,这种配置比先前设计更有效地利用工作空间,并且残差强化学习在位置和方向精度上比FABRIK高出三个数量级,实现了对新型4段、8关节机械臂的精确控制。此外,控制实现更简单:我们描述了完整的FABRIK控制过程及相应的学习实现。我们的方法适用于新系统的设计,为设计者提供了开发此类机械臂及新型配置相应控制系统的更多工具。

英文摘要

Robotic arms capable of traversing arbitrary spatial paths, especially in highly obstructed workspaces, are highly desired across several industries. Quaternion joints have recently empowered a specific class of robotic arms -- cable-driven redundant manipulators -- beyond their prior capabilities. Specifically, quaternion joints reduce the number of required motors per degree of freedom, paving the way for more compact solutions. An ongoing challenge is that the complexity of the kinematic model of quaternion joints complicates a priori decisions on manipulator configurations and imposes higher computational demands on the control system, while its non-linearities amplify all discrepancies between design and physical artifact arising from fabrication imprecision. Here we show that a 4-segment, 8-joint manipulator can achieve a broader workspace than extant configurations, at lower hardware cost, and that Residual Reinforcement Learning outperforms extant state-of-the-art methods -- specifically, the FABRIK algorithm -- on the control of such a manipulator. Our results show that this configuration is more workspace-effective than prior designs, and that Residual Reinforcement Learning outperforms FABRIK by three orders of magnitude on positional and orientational accuracy, effecting precise control of the novel 4-segment, 8-joint manipulator. Additionally, the control implementation is simpler: we describe the complete FABRIK process for control and the corresponding learning implementation. Our methodology is applicable to the design of new systems, providing designers with further tools for the development of this class of manipulators and corresponding control systems for novel configurations.
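For readers unfamiliar with the FABRIK baseline named in the abstract, the core iteration can be sketched in a few lines. This is a generic planar FABRIK for an unconstrained chain with a fixed base, written under the assumption of simple point joints; it does not model quaternion joints, cable actuation, or joint limits, and the function names are illustrative rather than taken from the paper.

```python
import math

def _place(anchor, point, length):
    """Return the point at distance `length` from `anchor` along the ray toward `point`."""
    d = math.dist(anchor, point)
    if d == 0.0:
        # Degenerate case: the two points coincide, so pick the first axis as direction.
        return (anchor[0] + length,) + tuple(anchor[1:])
    t = length / d
    return tuple(a + t * (p - a) for a, p in zip(anchor, point))

def fabrik(joints, target, tol=1e-6, max_iter=200):
    """Move the chain's end effector toward `target` while keeping segment lengths fixed."""
    pts = [tuple(p) for p in joints]
    lengths = [math.dist(pts[i], pts[i + 1]) for i in range(len(pts) - 1)]
    base = pts[0]
    target = tuple(target)
    for _ in range(max_iter):
        if math.dist(pts[-1], target) <= tol:
            break
        # Backward pass: pin the end effector to the target and walk toward the base.
        pts[-1] = target
        for i in range(len(pts) - 2, -1, -1):
            pts[i] = _place(pts[i + 1], pts[i], lengths[i])
        # Forward pass: pin the base again and walk toward the end effector.
        pts[0] = base
        for i in range(len(pts) - 1):
            pts[i + 1] = _place(pts[i], pts[i + 1], lengths[i])
    return pts

chain = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
solved = fabrik(chain, (1.5, 1.5))
```

If the target lies beyond the total chain length, this sketch simply runs out of iterations with the chain stretched toward the target; a production solver would handle that case explicitly.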

2606.05234 2026-06-05 cs.RO cs.LG 版本更新

OLIVE: Online Low-Rank Incremental Learning for Efficient Adaptive Exoskeletons

OLIVE: 面向高效自适应外骨骼的在线低秩增量学习

Dong Liu, Yanxuan Yu, Ben Lengerich, Tony Geng, Ying Nian Wu

发表机构 * University of California, Los Angeles(加州大学洛杉矶分校) Columbia University(哥伦比亚大学) University of Wisconsin-Madison(威斯康星大学麦迪逊分校) Rice University(莱斯大学)

AI总结 提出OLIVE框架,通过低秩残差分解和奖励驱动策略梯度实现外骨骼控制的在线个性化自适应,在多种地形上提升步态平滑度、降低努力并增强稳定性。

详情
AI中文摘要

可穿戴外骨骼系统有望恢复身体障碍者的行动能力,但大多数现有控制器依赖于静态步态策略,缺乏适应动态真实环境或个体用户特征的能力。我们提出OLIVE(Online Low-rank Incremental Learning for Efficient Adaptive Exoskeletons),一种参数高效的在线自适应框架,在部署期间持续个性化外骨骼控制。OLIVE将控制策略的自适应组件分解为低秩残差形式 $\Delta W = A_t B_t^\top$,秩 $r \ll \min(d,k)$,将在线更新成本从$\mathcal{O}(dk)$降低到$\mathcal{O}(r(d+k))$,同时保持预训练基础控制器 $W_0$ 的稳定性。参数通过奖励塑造的策略梯度更新,完全由身体传感器反馈(EMG、IMU、振动)驱动,消除了对离线参考轨迹的依赖。门控机制根据上下文状态调节个性化强度,动态秩调度器根据地形复杂度调整更新维度(在简单平坦地形上分配最小容量,在要求高的不平坦地形上扩展到更高秩更新),从而在多种活动中实现稳健性能:平地行走、楼梯导航、斜坡和不平坦地形。在可穿戴平台上的实验表明,OLIVE在步态平滑度、努力减少和运动稳定性上比最强基线分别提高了13、22和15个百分点,在大约1,800步内收敛,端到端延迟为7.4毫秒。我们的代码实现可在https://github.com/FastLM/OLIVE获取。

英文摘要

Wearable exoskeleton systems hold promise for restoring mobility in individuals with physical impairments, yet most existing controllers rely on static gait policies that lack the ability to adapt to dynamic real-world environments or individual user characteristics. We present OLIVE (Online Low-rank Incremental Learning for Efficient Adaptive Exoskeletons), a parameter-efficient online adaptation framework that continuously personalizes exoskeleton control during deployment. OLIVE decomposes the adaptive component of the control policy into a low-rank residual form $\Delta W = A_t B_t^\top$ with rank $r \ll \min(d,k)$, reducing online update cost from $\mathcal{O}(dk)$ to $\mathcal{O}(r(d+k))$ while preserving the stability of a pretrained base controller $W_0$. Parameters are updated via a reward-shaped policy gradient driven purely by on-body sensor feedback (EMG, IMU, vibration), eliminating dependence on offline reference trajectories. A gating mechanism modulates the strength of personalization based on contextual state, and a dynamic rank scheduler adapts the update dimensionality to terrain complexity -- allocating minimal capacity on simple flat terrain and expanding to higher-rank updates on demanding uneven surfaces -- enabling robust performance across diverse activities: flat walking, stair navigation, slopes, and uneven terrain. Experiments on the wearable platform demonstrate that OLIVE achieves +13, +22, and +15 percentage-point improvements in gait smoothness, effort reduction, and motion stability over the strongest baseline, converging within about 1,800 walking steps at 7.4 ms end-to-end latency. Our code implementation is available at https://github.com/FastLM/OLIVE.
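The cost reduction from O(dk) to O(r(d+k)) follows directly from counting the entries of the two low-rank factors. The sketch below illustrates that count and a gated low-rank residual on top of a frozen base matrix. It is an assumption-laden illustration, not the released OLIVE code: the scalar `gate`, the function names, and the layer sizes are hypothetical.

```python
def dense_update_cost(d, k):
    """Number of entries touched when a full d-by-k weight matrix is updated."""
    return d * k

def low_rank_update_cost(d, k, r):
    """Number of entries in the factors A (d-by-r) and B (k-by-r)."""
    return r * (d + k)

def matmul_abt(a, b):
    """Return A @ B^T for A of shape (d, r) and B of shape (k, r), as nested lists."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]

def effective_weights(w0, a, b, gate=1.0):
    """Frozen base weights plus a gated low-rank residual: W0 + gate * A @ B^T."""
    delta = matmul_abt(a, b)
    return [[w + gate * dw for w, dw in zip(w_row, d_row)]
            for w_row, d_row in zip(w0, delta)]

# Hypothetical layer size: d=512, k=256, rank r=8.
print(dense_update_cost(512, 256), low_rank_update_cost(512, 256, 8))

# Tiny rank-1 example with d=2, k=3 and a half-open gate.
w0 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
a = [[1.0], [2.0]]
b = [[1.0], [0.0], [-1.0]]
w = effective_weights(w0, a, b, gate=0.5)
```

For the hypothetical 512-by-256 layer, the dense update touches 131,072 entries while the rank-8 factors hold 6,144. A gate near zero leaves the pretrained controller essentially unchanged, which is one simple way to read "preserving the stability of a pretrained base controller".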

2606.05232 2026-06-05 cs.LG cs.AI 版本更新

Differentiable Efficient Operator Search

可微分高效算子搜索

Xiaohuan Pei, Jiyuan Zhang, Yuanfan Guo, Weiguo Feng, Tao Huang, Cho-Jui Hsieh, Chang Xu

发表机构 * The University of Sydney(悉尼大学) ByteDance(字节跳动) Shanghai Jiao Tong University(上海交通大学) University of California, Los Angeles(加州大学洛杉矶分校)

AI总结 提出可微分高效算子搜索框架,统一解释多种token缩减算子,通过联合搜索缩减位置、保留数量和算子行为,在预算约束下优化多模态模型性能。

详情
AI中文摘要

高效多模态基础模型通常依赖于手动设计的token缩减算子,如剪枝、合并、池化和自适应重加权。尽管这些算子看起来不同,但我们表明它们可以被解释为共享算子空间的不同区域。基于这一观点,我们引入了高效算子搜索,一个可微分框架,联合搜索在哪里缩减token、保留多少token以及如何处理缩减后的token信息。所提出的搜索空间参数化层激活、保留预算和算子行为,而搜索策略在单边预算和成本约束下优化任务性能。该公式将代表性手工设计基线作为特例恢复,并进一步发现超越孤立手动设计的混合算子。在多模态基准上的实验表明,搜索得到的算子在精度-效率权衡上具有竞争力,特别是在激进的视觉token缩减下。这些结果表明,高效多模态推理可以从手动算子设计重新构建为可微分算子搜索。

英文摘要

Efficient multimodal foundation models often rely on manually designed token-reduction operators, such as pruning, merging, pooling, and adaptive reweighting. Although these operators appear different, we show that they can be interpreted as distinct regimes of a shared operator space. Based on this view, we introduce Efficient Operator Search, a differentiable framework that jointly searches where to reduce tokens, how many tokens to retain, and how reduced token information should be processed. The proposed search space parameterizes layer activation, retention budget, and operator behavior, while the search policy optimizes task performance under one-sided budget and cost constraints. This formulation recovers representative hand-designed baselines as special cases and further discovers hybrid operators beyond isolated manual designs. Experiments on multimodal benchmarks show that the searched operators achieve competitive accuracy-efficiency trade-offs, especially under aggressive visual-token reduction. These results suggest that efficient multimodal inference can be reframed from manual operator design to differentiable operator search.

2606.05230 2026-06-05 stat.ML cs.LG eess.SP 版本更新

Central Description Length (CDL) Clustering Validation Index

中心描述长度(CDL)聚类验证指标

Mahdi Shamsi, Soosan Beheshti

发表机构 * Toronto Metropolitan University(多伦多都会大学)

AI总结 提出中心描述长度(CDL)聚类验证指标,通过计算不可观测真实聚类中心描述长度的概率上界来评估聚类质量,无需标签且适用于非凸和不规则形状数据。

详情
AI中文摘要

在工程机器学习管道中,无标签情况下选择聚类算法及其超参数是一个常见难题,这些管道通常用于传感器、图像或过程数据的无监督分析。聚类验证指标(CVI)提供内部评分来对候选聚类进行排序,但大多数流行的CVI基于欧几里得紧致性和分离项构建,因此倾向于紧凑的凸分区。已知它们在非凸、不规则或变密度数据上的性能会下降,通常需要使用核变换或替代距离度量,但代价是额外的调优和计算。本文介绍了中心描述长度(CDL)聚类验证指标。CDL利用观测到的簇内紧致性、估计的聚类中心和估计的聚类协方差,计算与不可观测的真实聚类中心相关的描述长度的概率上界。该界限将簇内紧致性和质心位移压缩为一个可计算的量,并在任何聚类算法产生的分区上进行评估。实现仅使用可观测的量(数据、分区、估计中心和估计协方差),不使用真实标签。在具有非凸和任意形状簇的合成基准测试中,CDL-CVI比我们测试的传统CVI更频繁地选择参考聚类数,并达到更高的调整兰德指数(ARI)值,且无需额外的核预处理阶段。在从冻结的无监督嵌入聚类的图像基准测试(MNIST、CIFAR-10、STL-10)中,CDL-CVI在报告的试验中,针对K-means、DBSCAN和谱聚类返回的聚类数接近参考类别数。

英文摘要

Selecting a clustering algorithm and its hyperparameters without labels is a common difficulty in engineering machine learning pipelines that perform unsupervised analysis of sensor, image, or process data. Clustering validation indices (CVIs) provide internal scores for ranking candidate clusterings, but most popular CVIs are built from Euclidean compactness and separation terms and so tend to favour compact, convex partitions. Their performance is known to degrade on non-convex, irregular, or variable-density data, where kernel transformations or alternative distance measures are typically used at the cost of additional tuning and computation. This paper introduces the Central Description Length (CDL) clustering validation index. CDL uses the observed within-cluster compactness, the estimated cluster centers, and the estimated cluster covariances to compute a probabilistic upper bound on the description length associated with the unobservable true cluster centers. The bound condenses intra-cluster compactness and centroid displacement into a single computable quantity and is evaluated on the partition produced by any clustering algorithm. The implementation uses only observable quantities (the data, the partition, the estimated centers, and the estimated covariances) and does not use ground-truth labels. On synthetic benchmarks with non-convex and arbitrarily shaped clusters, CDL-CVI selected the reference number of clusters more often and reached higher Adjusted Rand Index (ARI) values than the conventional CVIs we tested, without an additional kernel preprocessing stage. On image benchmarks (MNIST, CIFAR-10, STL-10) clustered from frozen unsupervised embeddings, CDL-CVI returned cluster numbers close to the reference class counts across K-means, DBSCAN, and spectral clustering in the reported trials.

2606.05227 2026-06-05 q-bio.CB cs.LG math-ph math.MP q-bio.BM 版本更新

Quantifying the biophysical properties of stomatocytes in health and disease

量化健康与疾病状态下口形红细胞的生物物理特性

Zhaojie Chai, Jianlu Zheng, He Li, Ming Dao, George Em Karniadakis

发表机构 * Division of Applied Mathematics, Brown University(布朗大学应用数学系) Department of Materials Science and Engineering, Massachusetts Institute of Technology(麻省理工学院材料科学与工程系) College of Engineering, University of Georgia(佐治亚大学工程学院)

AI总结 通过耗散粒子动力学模拟与微流控成像结合,构建三种口形红细胞模型,揭示其几何主导的脾窦穿越行为、膜滚动抑制及低剪切全血粘度升高,统一解释遗传性口形红细胞增多症的脾切除悖论。

Comments 26 pages, 9 figures

详情
AI中文摘要

遗传性口形红细胞增多症(HS)包括以杯状红细胞为特征的红细胞疾病,这些细胞对脾切除术的反应相反:在过度水化型HS(OHS)中可治愈,但在脱水型HS(DHS/干裂红细胞)中可能促进血栓形成。这一悖论持续存在,因为红细胞生物力学由部分独立的参数——剪切模量、弯曲刚度、表面积体积比(S/V)和细胞质粘度——控制,而现有检测方法仅能零散地捕获这些参数。本文结合耗散粒子动力学(DPD)模拟与微流控成像,在固定膜面积和递减体积(109.7、101.5、89.8 fL)下构建了一个对照盘状红细胞和三种口形红细胞模型(ST-RBC1-3),覆盖OHS到DHS的范围。通过五种力学正交的检测追踪这组参数,我们发现内皮间裂隙(IES)穿越由几何主导:过度水化型ST-RBC1需要比健康红细胞高一个数量级的临界压力,而脱水型ST-RBC3可自由通过。然而,ST-RBC3抑制膜滚动,并在生理血细胞比容下将低剪切全血粘度提高约29%,与戈谢病高粘度相当。一个漏斗-障碍芯片将这些差异放大为无标记的中心线偏移信号,预计可区分所有四种红细胞类型(极端表型间约4.5个标准差)。这些结果将单细胞力学、脾滤过和血液流变学统一在一个框架内,解决了脾切除悖论,并指向HS的微流控术前风险分层。

英文摘要

Hereditary stomatocytosis (HS) comprises red blood cell (RBC) disorders characterized by cup-shaped erythrocytes that respond oppositely to splenectomy: curative in overhydrated HS (OHS) but potentially thrombogenic in dehydrated HS (DHS/xerocytosis). This paradox persists because RBC biomechanics is governed by partly independent parameters--shear modulus, bending rigidity, surface-to-volume ratio (S/V), and cytoplasmic viscosity--that existing assays capture only piecemeal. Here we combine dissipative particle dynamics (DPD) simulations with microfluidic imaging to construct a control discocyte and three stomatocyte models (ST-RBC1-3) at fixed membrane area and decreasing volume (109.7, 101.5, 89.8 fL), spanning the OHS-to-DHS range. Tracing this parameter set through five mechanically orthogonal assays, we find that interendothelial-slit (IES) traversal is geometry-dominated: overhydrated ST-RBC1 requires an order of magnitude higher critical pressure than healthy RBCs, whereas dehydrated ST-RBC3 passes freely. ST-RBC3 nonetheless suppresses membrane tank-treading and raises low-shear whole-blood viscosity by ~29% at physiological haematocrit, comparable to Gaucher-disease hyperviscosity. A funnel-obstacle chip amplifies these differences into a label-free centerline-offset signal predicted to separate all four RBC types (~4.5 standard deviations between extreme phenotypes). These results unite single-cell mechanics, splenic filtration, and hemorheology in one framework, resolve the splenectomy paradox, and point toward microfluidic pre-operative risk stratification in HS.

2606.05225 2026-06-05 q-bio.QM cs.LG 版本更新

The Language of Elution: Autoregressive Prediction of the Next Feature in Untargeted LC-HRMS Lipidomics

洗脱的语言:非靶向LC-HRMS脂质组学中下一个特征的自回归预测

Dayanjan S. Wijesinghe

发表机构 * Virginia Commonwealth University School of Pharmacy(弗吉尼亚联邦大学药学院)

AI总结 将色谱洗脱建模为自回归序列预测任务,使用LSTM和Transformer模型基于无注释特征预测下一个洗脱的质荷比区间,在临床脂质组学数据上达到98.4%的top-1准确率,并揭示了序列模式而非分子特性驱动预测。

详情
AI中文摘要

非靶向液相色谱-高分辨质谱(LC-HRMS)每份样本可检测数千个分子特征,但仅有2-20%获得可靠的结构注释。造成这种“暗代谢组”的一个根本原因是串联质谱(MS/MS)采集是反应性的:仪器仅在离子出现后选择前体,而对后续洗脱的离子一无所知。我们将色谱洗脱重新定义为自回归序列预测任务。由于反相洗脱顺序由疏水性决定,连续特征形成物理约束的序列,类似于语言中的标记。我们将质荷比(m/z)轴离散化为110个区间,并训练长短期记忆(LSTM)和Transformer模型,基于五个无注释的每个标记特征(m/z区间、质量亏损、保留时间间隔、极性和强度排名)预测下一个洗脱的m/z区间。在来自四个临床脂质组学队列(342份血浆样本;SCIEX TripleTOF 6600+,Waters CSH C18)的15,242个特征上训练,LSTM达到98.4%的top-1准确率(top-5为99.99%;平均绝对误差3.6 Da),Transformer达到98.0%。消融实验表明,自回归上下文贡献了55.5个百分点,而没有任何单个特征贡献超过0.2个百分点:是序列模式而非分子特性驱动预测。模型在共享方法的仪器间可迁移(在独立Agilent 6530数据集上r=0.999),但在不同色谱柱化学(top-1为5.1%)或极性模式(2.6%)下失败,证实了方法和模式特异性。在少至2到5次质控进样上进行微调,可将保留准确率从2.6%恢复到近50%,因此跨条件部署需要最少的校准。这些结果表明洗脱序列高度可预测,并为预测性MS/MS采集奠定基础,以提高非靶向代谢组学的注释覆盖率。

英文摘要

Untargeted liquid chromatography-high-resolution mass spectrometry (LC-HRMS) detects thousands of molecular features per sample, yet only 2-20% receive confident structural annotations. A root cause of this "dark metabolome" is that tandem MS/MS acquisition is reactive: instruments select precursors only after ions appear, blind to what elutes next. We reframe chromatographic elution as an autoregressive sequence prediction task. Because reversed-phase elution order is governed by hydrophobicity, successive features form a physically constrained sequence, like tokens in language. We discretize the mass-to-charge (m/z) axis into 110 bins and train long short-term memory (LSTM) and Transformer models to predict the next eluting m/z bin from five annotation-free per-token features: m/z bin, mass defect, retention-time gap, polarity, and intensity rank. Trained on 15,242 features from four clinical lipidomics cohorts (342 plasma samples; SCIEX TripleTOF 6600+, Waters CSH C18), the LSTM reaches 98.4% top-1 accuracy (99.99% top-5; mean absolute error 3.6 Da) and the Transformer 98.0%. Ablation shows autoregressive context accounts for 55.5 percentage points while no single feature contributes more than 0.2 pp: the sequential pattern, not molecular properties, drives prediction. Models transfer across instruments sharing the method (r=0.999 on an independent Agilent 6530 dataset) but fail under a different column chemistry (5.1% top-1) or polarity mode (2.6%), confirming method- and mode-specificity. Fine-tuning on as few as two to five quality-control injections recovers held-out accuracy from 2.6% to nearly 50%, so cross-condition deployment needs minimal calibration. These results establish that elution sequences are highly predictable and lay the groundwork for predictive MS/MS acquisition to improve annotation coverage in untargeted metabolomics.
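The tokenization step described in the abstract (discretizing m/z into 110 bins and ordering features by elution) can be sketched as follows. The m/z range of 100 to 1200, the uniform bin width, and the function names are assumptions made for illustration; the abstract does not state how the 110 bins are laid out, and the example m/z values are arbitrary.

```python
def mz_to_bin(mz, mz_min=100.0, mz_max=1200.0, n_bins=110):
    """Map an m/z value to a bin index in [0, n_bins - 1], clamping out-of-range values."""
    width = (mz_max - mz_min) / n_bins
    index = int((mz - mz_min) // width)
    return max(0, min(n_bins - 1, index))

def to_tokens(features):
    """Sort (mz, retention_time) pairs by elution time and emit (bin, rt_gap) tokens."""
    ordered = sorted(features, key=lambda f: f[1])
    tokens = []
    previous_rt = None
    for mz, rt in ordered:
        gap = 0.0 if previous_rt is None else rt - previous_rt
        tokens.append((mz_to_bin(mz), gap))
        previous_rt = rt
    return tokens

def next_bin_pairs(tokens):
    """Pair each token's bin with the bin of the feature that elutes next."""
    bins = [t[0] for t in tokens]
    return list(zip(bins[:-1], bins[1:]))

features = [(760.585, 5.2), (496.34, 1.1), (782.57, 5.5)]
tokens = to_tokens(features)
pairs = next_bin_pairs(tokens)
```

An autoregressive model such as the LSTM or Transformer mentioned above would be trained on sequences of such tokens, with the next bin as the prediction target; the remaining per-token features (mass defect, polarity, intensity rank) are omitted here for brevity.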

2606.05219 2026-06-05 cs.LG cs.AI 版本更新

Gradient Descent with Large Step Size Restores Symmetry in Deep Linear Networks with Multi-Pathway

大步长梯度下降恢复多路径深度线性网络中的对称性

Hee-Sung Kim, Sungyoon Lee

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 本文研究大步长离散梯度下降如何通过边缘稳定性振荡使多路径深度线性网络从对称性破坏转向信号重新分配,从而偏好共享表示而非单路径主导。

Comments ICML 2026

详情
AI中文摘要

最近对多路径深度线性网络的分析使用梯度流预测了一种“赢家通吃”的专业化,其中路径对称性被破坏,每个特征集中在一个路径中。在这项工作中,我们表明具有大步长的离散梯度下降(GD)讲述了一个不同的故事。我们证明单路径解是尖锐最小值,而跨路径分布信号通过一个随路径数量和深度增加而减小的因子降低了尖锐度。因此,虽然早期训练再现了由GF预测的深度驱动的对称性破坏,但随后在稳定性边缘的振荡覆盖了这一趋势,并将网络驱动到重新平衡阶段,其中信号在路径间重新分布。总之,这些结果阐明了深度如何塑造路径竞争,并解释了大步长GD为何偏好共享表示而非持续的单路径主导。

英文摘要

Recent analyses of multi-pathway Deep Linear Networks use Gradient Flow to predict a "winner-takes-all" specialization in which path symmetry breaks and each feature concentrates in a single pathway. In this work, we show that discrete Gradient Descent (GD) with a large step size tells a different story. We prove that single-path solutions are sharp minima, whereas distributing signals across pathways reduces sharpness by a factor that decreases with both the number of pathways and depth. Consequently, while early training reproduces the depth-driven symmetry breaking predicted by GF, oscillations at the Edge of Stability subsequently override this tendency and drive the network into a re-balancing phase, where signals redistribute across pathways. Together, these results clarify how depth shapes pathway competition and explain why large-step GD favors shared representations rather than persistent single-pathway dominance.
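The sharpness argument can be made concrete on a scalar toy model. The sketch below is an illustration under strong assumptions, not the paper's theorem: it takes a scalar network f = sum over pathways of the product of `depth` positive weights, the loss (f - y)^2 / 2, and balanced weights within and across the active pathways. At a zero-loss point the Hessian is the outer product of the gradient of f with itself, so the sharpness equals the squared norm of that gradient. The stability check uses the standard quadratic-approximation threshold 2 / step size.

```python
def balanced_sharpness(y, depth, n_paths):
    """Top Hessian eigenvalue at a balanced zero-loss point of the toy scalar model."""
    per_path = y / n_paths                 # signal carried by each pathway
    w = per_path ** (1.0 / depth)          # every weight in a pathway is equal
    grad_sq = (w ** (depth - 1)) ** 2      # squared derivative of f w.r.t. one weight
    return n_paths * depth * grad_sq       # squared norm of the gradient of f

def sharpness_ratio(depth, n_paths, y=1.0):
    """Sharpness of the distributed solution relative to the single-pathway one."""
    return balanced_sharpness(y, depth, n_paths) / balanced_sharpness(y, depth, 1)

def is_stable(sharpness, step_size):
    """Gradient descent on a quadratic is stable when sharpness < 2 / step_size."""
    return sharpness < 2.0 / step_size

single = balanced_sharpness(1.0, depth=3, n_paths=1)
shared = balanced_sharpness(1.0, depth=3, n_paths=2)
print(single, shared, sharpness_ratio(3, 2))
print(is_stable(single, 0.7), is_stable(shared, 0.7))
```

In this toy model the ratio works out to n_paths ** ((2 - depth) / depth), which shrinks as either the number of pathways or the depth grows. With a step size of 0.7, the single-pathway minimum (sharpness 3) sits above the stability threshold while the two-pathway solution sits below it, which mirrors the qualitative picture in the abstract.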

2606.05217 2026-06-05 math-ph cs.AI cs.LG math.MP physics.data-an 版本更新

The Score Hamiltonian: Mapping Diffusion Models to Adiabatic Transport

得分哈密顿量:将扩散模型映射到绝热输运

Peter Halmos, Boris Hanin

发表机构 * Computer Science Department, Princeton University(普林斯顿大学计算机科学系) ORFE Department, Princeton University(普林斯顿大学ORFE系)

AI总结 本文通过构建得分哈密顿量,建立了基于得分的扩散模型采样与薛定谔算子基态绝热输运之间的精确对应关系,并利用绝热定理推导了密度重建误差界和退火调度方案。

详情
AI中文摘要

我们展示了基于得分的扩散模型采样与一族薛定谔算子的基态绝热输运之间的精确对应关系,这些薛定谔算子被称为得分哈密顿量,由学习到的得分的量子势构建。通过具有时变势的福克-普朗克方程的绝热定理,我们获得了新颖的密度重建误差界和原则性的退火调度方案。我们发现采样的基本极限由得分匹配误差平方与得分哈密顿量谱隙(数据密度的逆庞加莱常数)之比决定。

英文摘要

We exhibit an exact correspondence between sampling with score-based diffusion models and adiabatic transport of ground states for a family of Schrödinger operators we call Score Hamiltonians, built from the learned score's quantum potential. We obtain novel density reconstruction bounds and principled annealing schedules via adiabatic theorems for Fokker-Planck equations with time-varying potentials. We find the fundamental limit of sampling is set by the ratio of squared score-matching error to Score Hamiltonian spectral gap - the inverse Poincaré constant of the data density.
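For orientation, the classical link between a Fokker-Planck equation and a Schrödinger-type operator can be written out explicitly. What follows is the textbook ground-state transform for a fixed potential $V$, with the score $s = \nabla \log p = -\nabla V$ for $p \propto e^{-V}$. It is offered as an assumption about the kind of construction involved; the paper's own definition of the Score Hamiltonian (time dependence, normalization, noise scale) may differ.

```latex
\begin{aligned}
\partial_t p &= \nabla\cdot\left(p\,\nabla V\right) + \Delta p,
  \qquad p_\infty \propto e^{-V},\\
\psi &:= p\,e^{V/2}
  \;\Longrightarrow\; \partial_t \psi = -H\psi,\\
H &= -\Delta + \tfrac{1}{4}\lvert\nabla V\rvert^{2} - \tfrac{1}{2}\Delta V
   = -\Delta + \tfrac{1}{4}\lvert s\rvert^{2} + \tfrac{1}{2}\,\nabla\cdot s,\\
H\,e^{-V/2} &= 0 .
\end{aligned}
```

Under this transform the stationary density corresponds to the zero-energy ground state $e^{-V/2}$, and the spectral gap of $H$ equals the spectral gap of the Langevin generator for $e^{-V}$, which is the sense in which a spectral gap relates to a Poincaré constant.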

2606.05208 2026-06-05 eess.SP cs.LG 版本更新

Transformer-Enhanced Reinforcement Learning: Fundamentals and Applications in Communication Networks

Transformer增强的强化学习:通信网络中的基础与应用

Nguyen Cong Luong, Shaohan Feng, Nguyen Duc Hai, Zeping Sui, Bo Ma, Min Xu, Zhihao Dong, Qiushi Zhao, Nguyen Duc Duy Anh, Nguyen Quoc Khanh, Ngoc Hung Nguyen, Zitian Zhang, Jie Cao

发表机构 * Faculty of Computer Science, Phenikaa University(越南Phenikaa大学计算机科学学院) Faculty of Artificial Intelligence and Data Science, Phenikaa University(越南Phenikaa大学人工智能与数据科学学院) School of Information and Electronic Engineering (Sussex Artificial Intelligence Institute), Zhejiang Gongshang University(浙江工商大学信息与电子工程学院(Sussex人工智能研究院)) School of Computer Science and Electronics Engineering, University of Essex(埃塞克斯大学计算机科学与电子工程学院) School of Mathematics, Statistics and Mechanics, Beijing University of Technology(北京工业大学数学、统计与力学学院) School of Information Science and Technology, Harbin Institute of Technology(哈尔滨工业大学信息科学与技术学院) Department of Electrical and Information Technology, Faculty(电气与信息技术系)

AI总结 本文综述了Transformer增强的强化学习算法及其在通信网络中的应用,重点解决了传统RL在长期依赖建模和部分可观测性方面的局限。

详情
AI中文摘要

强化学习长期以来一直是解决通信网络中各种问题的强大工具。然而,传统的强化学习模型仍然面临若干局限性。它们不仅依赖于与环境的大量交互,而且在建模长期关系和应对部分可观测性方面也受到限制。近年来,Transformer模型展示了增强强化学习模型的能力,使其能够克服这些问题。特别是,Transformer中的自注意力机制能够有效建模长程依赖和全局相关性,同时加速训练过程并处理异质数据模态。本文全面综述了基于Transformer的强化学习算法及其在通信网络中的应用。具体而言,本文提供了强化学习和Transformer架构的数学背景,并深入探讨了资源分配、计算卸载、路由与轨迹控制以及网络安全等关键问题。最后,我们讨论了挑战、开放问题以及值得关注的未来研究方向,包括用于语义通信和网络优化的Transformer增强深度强化学习算法。

英文摘要

Reinforcement Learning (RL) has long been a powerful solution to various problems in communication networks. However, traditional RL models still face several limitations. Not only do they rely on large numbers of interactions with the environment, but they are also limited in modeling long-term relationships and tackling partial observability. In recent years, the Transformer model has demonstrated the ability to enhance RL models, allowing them to overcome these issues. In particular, the self-attention mechanism within the Transformer enables efficient modeling of long-range dependencies and global correlations, while also accelerating training and handling heterogeneous data modalities. In this paper, we present a comprehensive survey of Transformer-based RL algorithms and their applications in communication networks. Specifically, the paper provides the mathematical background of RL and Transformer architectures, along with insights into key issues such as resource allocation, computation offloading, routing and trajectory control, and network security. We conclude the paper by discussing challenges, open issues, and notable future research directions, including Transformer-enhanced DRL algorithms for semantic communication and network optimization.

2606.05202 2026-06-05 physics.comp-ph cs.LG 版本更新

Multi-Fidelity Learning with Shallow Recurrent Decoders for Reactor Physics

基于浅层循环解码器的多保真度学习在反应堆物理中的应用

Stefano Riva, Carolina Introini, J. Nathan Kutz, Antonio Cammi

发表机构 * Autodesk Research(Autodesk研究院) Department of Energy, Nuclear Engineering Division(能源部核工程系) Politecnico di Milano(米兰理工大学) Department of Mechanical and Nuclear Engineering and Emirates Nuclear Technology Center(机械与核工程系和阿联酋核技术中心)

AI总结 针对反应堆物理中高保真数据稀缺而低保真数据丰富的问题,提出利用浅层循环解码器将低保真模型(如点动力学)映射到高保真模型(如扩散方程),以低成本获得高保真解。

详情
AI中文摘要

在反应堆物理中,根据用户需求,中子学可以以不同的保真度处理。一方面,由于数值求解玻尔兹曼输运方程的计算成本高,精确模拟反应堆中中子行为通常昂贵且耗时。另一方面,通过采用适当的假设,如SP$_N$、扩散理论和点动力学,可以高效生成低保真数据。从代理模型的角度看,这种计算限制转化为高保真数据稀缺和大量低保真数据。鉴于这种保真度差异,开发一种合适的程序将低保真模型映射到高保真模型将是有趣的;例如,可以从点动力学模型获得的时间序列数据出发,求解多群扩散方程。实际上,本文通过利用多保真度信息和浅层循环解码器(一种新颖的机器学习架构,能够将时间序列观测映射到反应堆的全状态)来研究这种可能性。该技术设计为使用局部或全局测量作为输入,并将其时间轨迹映射到高维状态;同理,原则上当输入由集总模型的解构成时,该架构也可使用。本文将这一思想应用于基准反应堆几何,在各种输入条件下将点动力学模型映射到扩散解,且计算成本大大降低。

英文摘要

In reactor physics, neutronics can be treated at different fidelity levels, according to the needs of the user. On the one hand, precise modeling of neutron behaviour in a reactor is often expensive and time-consuming because of the high computational cost of numerically solving the Boltzmann transport equation. On the other hand, by adopting suitable approximations, such as SP$_N$, diffusion theory, and point kinetics, it is possible to generate low-fidelity data efficiently. From the perspective of surrogate models, this computational limitation translates into a scarcity of high-fidelity data and an abundance of low-fidelity data. Given this difference in fidelity levels, it would be interesting to develop a suitable procedure to map low-fidelity models towards higher-fidelity models; for instance, one could obtain the solution to a multi-group diffusion equation starting from time-series data obtained from a point kinetics model. This work investigates this possibility by leveraging multi-fidelity information with Shallow Recurrent Decoders, a novel machine learning architecture able to map time-series observations to the full state of the reactor. This technique has been designed to use local or global measurements as input and map their temporal trajectories to the high-dimensional state; by the same logic, in principle, this architecture can also be used when the input is formed by the solution of a lumped model. This work applies this idea to a benchmark reactor geometry, mapping the point kinetics model to the diffusion solution under various input conditions, at much lower computational cost.

2606.05201 2026-06-05 cs.LG 版本更新

State commitment learning: training language models to distinguish computation from memory

状态承诺学习:训练语言模型区分计算与记忆

Fei Ding, Yongkang Zhang, Runhao Liu, Yuhao Liao, Zijian Zeng, Huiming Yang

发表机构 * Alibaba Group(阿里巴巴集团) Tsinghua University(清华大学)

AI总结 提出状态承诺学习目标及反事实擦除强化学习(CERL)方法,通过训练模型区分应保留的持久状态与可丢弃的临时计算,减少推理对隐藏思维的依赖,在数学、长链逻辑、科学问答和多轮工具使用任务中保持准确率的同时降低依赖。

Comments 17 pages

详情
AI中文摘要

推理语言模型不区分用于计算的token与构成持久状态的token:一旦生成,所有隐藏思维都保留在上下文中并影响未来预测。因此,下游推理可能依赖于不应在后续安全依赖的失败尝试、死胡同和私人草稿。我们将此现象重新定义为一种新的训练目标,即状态承诺学习:训练模型明确区分应作为持久状态提交的信息与可丢弃的临时计算。我们定义了一个反事实准则,即持久状态充分性,使得在隐藏思维被擦除后答案是否仍然可用变得可训练和可测量。然后,我们提出反事实擦除强化学习(CERL),它在相同前缀下评估保留隐藏思维的路径和擦除它们的路径,并仅在擦除路径保持正确时给予奖励。我们还引入了擦除依赖协议,并在数学、长链逻辑、科学问答和多轮工具使用评估中表明,CERL在不牺牲准确率的情况下显著降低了答案对隐藏思维的依赖,始终优于仅正确性强化学习和长答案SFT基线。

英文摘要

Reasoning language models do not distinguish tokens used for computation from tokens that constitute persistent state: once generated, all hidden thoughts remain in context and influence future predictions. As a result, downstream reasoning may depend on failed attempts, dead ends, and private scratch work that cannot be safely relied on later. We recast this phenomenon as a new training objective, state commitment learning: training models to explicitly distinguish information that should be committed as persistent state from temporary computation that can be discarded. We define a counterfactual criterion, persistent-state sufficiency, which makes it trainable and measurable whether an answer remains usable after hidden thoughts are erased. We then propose Counterfactual Erasure RL (CERL), which evaluates, under the same prefix, both a path that keeps hidden thoughts and a path that erases them, and gives reward only when the erasure path remains correct. We also introduce the Erasure Dependence Protocol and show across mathematics, long-chain logic, scientific QA, and multi-turn tool-use evaluation that CERL substantially reduces answer dependence on hidden thoughts without sacrificing accuracy, consistently outperforming correctness-only RL and long-answer SFT baselines.
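The reward rule stated in the abstract (reward only when the erasure path remains correct) can be sketched as below. Everything concrete here is an assumption for illustration: the `<think>...</think>` delimiter, the binary reward values, the reading of "remains correct" as "correct with thoughts kept and still correct after erasure", and the `erasure_dependence` metric are hypothetical stand-ins rather than the paper's CERL implementation or its Erasure Dependence Protocol.

```python
import re

# Hypothetical delimiter for hidden thoughts; real systems may mark them differently.
THOUGHT_SPAN = re.compile(r"<think>.*?</think>", re.DOTALL)

def erase_hidden_thoughts(context: str) -> str:
    """Remove every hidden-thought span, keeping only the committed text."""
    return THOUGHT_SPAN.sub("", context)

def counterfactual_erasure_reward(keep_correct: bool, erase_correct: bool) -> float:
    """Reward 1.0 only if the answer is correct with thoughts kept and after erasure."""
    return 1.0 if (keep_correct and erase_correct) else 0.0

def erasure_dependence(results):
    """Among prompts solved with thoughts kept, the share that fail once thoughts are erased.

    `results` is a list of (keep_correct, erase_correct) pairs.
    """
    kept = [pair for pair in results if pair[0]]
    if not kept:
        return 0.0
    return sum(1 for _, erase_ok in kept if not erase_ok) / len(kept)

context = "Plan:<think>try x=3\nno, x=4</think> x = 4."
committed = erase_hidden_thoughts(context)
dependence = erasure_dependence([(True, True), (True, False), (False, False)])
```

A correctness-only reward would pay out for the second result pair as well; the counterfactual rule withholds reward there, which is what pushes the model to commit the information it needs outside the erased spans.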

2606.05200 2026-06-05 physics.comp-ph cs.LG 版本更新

A differentiable machine learning small-angle X-ray scattering analysis framework for structure elucidation of lipid nanoparticles

一种用于脂质纳米颗粒结构解析的可微分机器学习小角X射线散射分析框架

Maria Bånkestad, Sandra Barman, Magnus Röding, Erik Kaunisto, Viktoriia Meklesh, Audrey Gallud, Marco Mendez, Marianna Yanez Arteta, Stefan Norberg, Ann Terry, Smita Chakraborty, Shun Yu, Jerk Rönnols, Sepideh Pashami

发表机构 * RISE Research Institutes of Sweden, Division Digital Systems, Computer Science(瑞典RISE研究机构,数字系统部门,计算机科学) RISE Research Institutes of Sweden, Division Bioeconomy, Food Research and Innovation(瑞典RISE研究机构,生物经济、食品研究与创新部门) Sustainable Innovation & Transformational Excellence, Pharmaceutical Technology & Development, Operations, AstraZeneca(可持续创新与转型卓越,制药技术与开发,运营,阿斯利康) Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg(查尔姆斯理工大学和哥德堡大学数学科学系) Advanced Drug Delivery, Pharmaceutical Sciences, R&D, AstraZeneca(先进药物递送,药学科学,研发,阿斯利康) Global Product Development, Pharmaceutical Technology & Development, Operations, AstraZeneca(全球产品开发,制药技术与开发,运营,阿斯利康) MAX IV Laboratory, Lund University(隆德大学MAX IV实验室)

AI总结 提出一种结合机器学习代理模型和可微分层的框架,加速脂质纳米颗粒的SAXS数据分析,实现多起点拟合和可辨识性分析,揭示参数简并性。

Comments 38 pages, 24 figures, 5 tables (incl. supplementary information)

详情
AI中文摘要

脂质纳米颗粒(LNPs)是带负电核酸的有效递送系统。其多组分架构产生核-壳结构。小角X射线散射(SAXS)是LNPs的重要表征技术,但从SAXS恢复内部结构和尺寸分布是一个具有非唯一解的反问题。现实模型通常过于昂贵,难以进行系统探索。我们引入了一个机器学习加速的、可微分的框架,用于异质、多分散LNPs的SAXS分析。前向模型结合了具有高斯随机场内芯的核-壳颗粒、单分散SAXS图的神经代理模型,以及一个对颗粒尺寸分布进行积分的可微分层。代理模型将预测成本降低了四个数量级,而可微性使得大规模多起点拟合和集成可辨识性分析成为可能。应用于合成和实验MC3 LNP数据,该框架表明,近乎相同的SAXS拟合可能源于不同的参数模式,其中实验拟合主要由尺寸分布与内部结构参数之间的权衡主导。

英文摘要

Lipid nanoparticles (LNPs) are efficient delivery systems for negatively charged nucleic acids. Their multi-component architecture yields a core-shell structure. Small-angle X-ray scattering (SAXS) is an important characterization technique for LNPs, but recovering internal structure and size distribution from SAXS is an inverse problem with non-unique solutions. Realistic models are often too expensive for systematic exploration. We introduce a machine-learning-accelerated, differentiable framework for SAXS analysis of heterogeneous, polydisperse LNPs. The forward model combines a core-shell particle with a Gaussian random-field interior, a neural surrogate for the monodisperse SAXS map, and a differentiable layer integrating over particle-size distributions. The surrogate reduces prediction cost by four orders of magnitude, while differentiability enables large-scale multi-start fitting and ensemble identifiability analysis. Applied to synthetic and experimental MC3 LNP data, the framework shows that near-identical SAXS fits can arise from distinct parameter modes, with the experimental fits dominated by a trade-off between size-distribution and interior-structure parameters.

2606.05198 2026-06-05 q-bio.BM cs.LG 版本更新

An accurate nucleic acid-small molecule docking framework via geometric deep learning with large-scale pretraining

基于几何深度学习与大尺度预训练的核酸-小分子精准对接框架

Shi Li, Xujun Zhang, Mingquan Liu, Hui Zhang, Shuoying Jia, Yu Kang, Tingjun Hou, Peichen Pan

发表机构 * College of Pharmaceutical Sciences, Zhejiang University(浙江大学药学院) Faculty of Health Sciences, University of Macau(澳门大学健康科学学院) Zhejiang Provincial Key Laboratory for Intelligent Drug Discovery and Development, Jinhua Institute of Zhejiang University(浙江省智能药物发现与开发重点实验室,浙江大学金华研究院) Shanghai Innovation Institute(上海创新研究院)

AI总结 提出NucleoDock框架,通过物理引导的大规模预训练和精细调优,结合序列、结构及原子级特征,利用混合密度网络几何评分头实现核酸-小分子对接,在125个复合物基准上达到56%的top-1成功率(RMSD<2.0Å),优于传统方法。

Comments 34 pages, 4 figures, 4 tabels, Supplementary Materials includes 8 tabels

详情
AI中文摘要

核酸越来越被认为是超越传统以蛋白质为中心的药物发现中的治疗靶点,然而将小分子准确高效地对接至核酸结构仍然具有挑战性。基于物理的对接方法通常准确性和效率有限,而深度学习方法则受限于实验解析的核酸-配体复合物稀缺。在此,我们提出NucleoDock,一个用于核酸-小分子对接的深度学习框架。为解决数据稀缺问题,NucleoDock将物理引导的大规模预训练(对数百万个对接生成的合成复合物)与在精选实验共晶结构上的微调相结合。它进一步整合了序列和结构信息的核苷酸表示与原子级三维特征,以捕获生物学背景和结合位点几何结构。使用基于混合密度网络的几何评分头来建模条件相互作用距离分布以进行构象排序。在125个核酸-配体复合物的外部基准测试中,NucleoDock在RMSD截止值2.0Å下实现了56%的top-1成功率,优于rDock的29%,同时每个复合物生成100个构象约需5秒。在ROBIN基准上的回顾性虚拟筛选进一步显示了早期富集的改善。NucleoDock代表了在弥合蛋白质导向和核酸导向的计算药物发现之间方法论差距方面迈出的一步。

英文摘要

Nucleic acids are increasingly recognized as therapeutic targets beyond conventional protein-centered drug discovery, yet accurate and efficient docking of small molecules to nucleic acid structures remains challenging. Physics-based docking methods often show limited accuracy and efficiency, whereas deep learning approaches are constrained by the scarcity of experimentally resolved nucleic acid-ligand complexes. Here, we present NucleoDock, a deep learning framework for nucleic acid-small molecule docking. To address data scarcity, NucleoDock combines physics-guided large-scale pretraining on millions of docking-generated synthetic complexes with fine-tuning on curated experimental co-crystal structures. It further integrates sequence- and structure-informed nucleotide representations with atomistic three-dimensional features to capture both biological context and binding-site geometry. A mixture density network-based geometric scoring head is used to model conditional interaction-distance distributions for pose ranking. On an external benchmark of 125 nucleic acid-ligand complexes, NucleoDock achieved a top-1 success rate of 56 percent at an RMSD cutoff of 2.0 Angstrom, outperforming rDock with 29 percent, while generating 100 poses in approximately 5 seconds per complex. Retrospective virtual screening on the ROBIN benchmark further showed improved early enrichment. NucleoDock represents a step toward bridging the methodological gap between protein- and nucleic acid-directed computational drug discovery.
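The headline metric in this abstract, top-1 success at an RMSD cutoff of 2.0 Å, can be illustrated with a minimal Python sketch. It assumes poses and references already share a coordinate frame and atom order, and it ignores symmetry correction. The names `rmsd` and `top1_success_rate` are hypothetical and are not taken from NucleoDock.

```python
import math


def rmsd(pose, reference):
    """Root-mean-square deviation between two equally sized lists of (x, y, z) atoms."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("pose and reference must be non-empty and the same length")
    squared = sum(
        (a - b) ** 2
        for atom_p, atom_r in zip(pose, reference)
        for a, b in zip(atom_p, atom_r)
    )
    return math.sqrt(squared / len(pose))


def top1_success_rate(top_ranked_poses, references, cutoff=2.0):
    """Fraction of complexes whose top-ranked pose is strictly within `cutoff` of the reference."""
    hits = sum(rmsd(p, r) < cutoff for p, r in zip(top_ranked_poses, references))
    return hits / len(references)


# Toy data (hypothetical): one pose 1 Å away, one pose 3 Å away.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
near_pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
far_pose = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
rate = top1_success_rate([near_pose, far_pose], [reference, reference])
```

Under this definition, a 56% rate on 125 complexes would correspond to 70 top-ranked poses within the cutoff.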

2606.05194 2026-06-05 cs.LG cs.AI cs.CL 版本更新

Temporal Preference Concepts and their Functions in a Large Language Model

时间偏好概念及其在大语言模型中的功能

Ian Rios-Sialer, Shantanu Darveshi, Shuai Jiang, Avigya Paudel, Anastasiia Pronina, Ipshita Bandyopadhyay, Justin Shenk

发表机构 * AISC(AI Safety Camp) SPAR(Supervised Program for Alignment Research)

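AI summary Using causal localization and activation patching, the paper finds that a large language model encodes the geometry of temporal preference in mid-to-upper-layer nodes. Behavioral analysis shows that the model discounts the future less steeply than humans, but the preference is unstable across contexts, and there is suggestive evidence that steering vectors can shift it.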

详情
AI中文摘要

大语言模型(LLMs)越来越多地被部署用于需要在近期收益与长期后果之间权衡的决策,然而关于它们如何在内部表示或解决这些权衡,我们知之甚少。在这项工作中,我们通过因果定位了一个蒸馏LLM(Qwen3-4B-Instruct-2507)中时间偏好的底层子图,通过来自梯度归因和激活修补的汇聚证据识别了中上层节点。我们发现时间跨度的几何结构在预期局部层的残差流中被编码。行为分析表明,未干预的LLM对未来折扣的陡峭程度比人类低几倍,但这种偏好跨上下文不稳定,这促使我们进行显式控制而非隐式依赖训练。最后,我们发现有暗示性证据表明引导向量可以改变时间偏好。我们的工作展示了机械可解释性如何使我们更接近对LLM规划和推理方式的可靠控制。

英文摘要

Large Language Models (LLMs) are increasingly being deployed to make decisions that require trading off near-term gains against long-term consequences, yet little is known about how they internally represent or resolve these tradeoffs. In this work, we causally localize an underlying subgraph for temporal preference in a distilled LLM (Qwen3-4B-Instruct-2507), identifying mid-to-upper-layer nodes through converging evidence from gradient-based attribution and activation patching. We find that the geometry of time horizon is encoded in the residual stream at the expected localized layers. A behavioral analysis reveals that unintervened LLMs discount the future several times less steeply than humans, yet this preference is unstable across contexts, motivating explicit control rather than implicit reliance on training. Finally, we find suggestive evidence that steering vectors can shift temporal preference. Our work demonstrates how mechanistic interpretability can bring us closer to reliable control over how LLMs plan and reason.

2606.05191 2026-06-05 cs.LG eess.SP 版本更新

PyCC.id: A package for hypothesis-driven equation discovery with structural identifiability

PyCC.id: 一个具有结构可辨识性的假设驱动方程发现包

Federico J. Gonzalez

发表机构 * Physics Institute of Rosario(罗萨里奥物理研究所)

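AI summary Presents the PyCC library, which addresses the ill-conditioned inverse problem in data-driven equation discovery through characteristic-curve skeletons and a hypothesis-driven methodology. It supports multiple equation-discovery paradigms, and some skeletons have structural identifiability properties.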

Comments The software package is available at: https://github.com/FedejGon/pyCC.id

详情
AI中文摘要

数据驱动的方程发现本质上是一个逆问题,旨在直接从时间序列测量中推断系统的控制微分方程。一个已知的问题是逆问题的病态性质,这经常产生多个对数据拟合得同样好的数学模型。解决这个问题的一种途径是事先将已知的假设和约束纳入训练阶段。虽然这种方法有效地减少了搜索空间,但仍然会产生多个候选模型,迫使实践者依赖基于自身领域知识的后验手动筛选。最近的一种方法引入了受特征曲线(CCs)启发的结构“骨架”,定义了一种假设驱动的方法。在这种方法中,实践者定义一个骨架,该骨架与一族常微分方程(ODEs)相关联,然后基于其领域知识添加假设和先验,以迭代地改进获得的模型。这种方法的一个重要优点是,一些骨架具有可证明的结构可辨识性属性,这对于检查骨架是否正确或应该被丢弃非常有用。此外,由于其模块化(例如神经网络、符号回归和稀疏回归),这种形式主义能够使用多种方程发现范式。在这项工作中,我们介绍了Python库PyCC,它将这些努力浓缩成一个灵活的工具,允许研究人员和工程师无缝地定义他们的骨架和假设,从时间依赖数据中发现ODEs。

英文摘要

Data-driven equation discovery is fundamentally an inverse problem that seeks to infer the governing differential equations of a system directly from time-series measurements. A known issue is the ill-conditioned nature of the inverse problem, which frequently produces multiple mathematical models that fit the data similarly well. One path to address this issue is by incorporating known hypotheses and constraints into the training phase beforehand. While this approach effectively reduces the search space, it still results in multiple candidate models, forcing practitioners to rely on post-hoc manual filtering based on their own domain expertise. A recent approach incorporates structural 'skeletons' inspired by characteristic curves (CCs), defining a hypothesis-driven methodology. In this methodology, practitioners define a skeleton, which is associated with a family of ordinary differential equations (ODEs), and then add their hypotheses and priors based on their domain knowledge to refine the obtained model iteratively. An important advantage of this approach is that some skeletons have demonstrable structural identifiability properties, which are useful for checking whether the skeleton is correct or should be discarded. Furthermore, this formalism enables the use of multiple equation discovery paradigms due to its modularity (such as neural networks, symbolic regression, and sparse regression). In this work, we present the Python library PyCC, which condenses these efforts into a flexible tool that allows researchers and engineers to seamlessly define their skeletons and hypotheses to discover ODEs from time-dependent data.

2606.05186 2026-06-05 cs.LG cs.CL 版本更新

Staged Factorial Screening for Budget-Constrained Micro-Pretraining

预算受限的微预训练中的分阶段因子筛选

Felipe Chavarro Polania

发表机构 * Hewlett Packard Enterprise(惠普企业)

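AI summary For budget-constrained micro-pretraining, proposes a staged fractional-factorial design workflow that uses short screens to identify high-penalty directions and confirm promising anchors, enabling efficient recipe triage on a shared accelerator.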

Comments 23 pages, 4 figures

详情
AI中文摘要

预算受限的微预训练通常需要在共享加速器上对许多候选配方进行分诊,然后才能花费更大的搜索预算。我们研究了分阶段分数因子工作流是否能在这种设置中恢复稳定的早期效应结构。在固定的自动研究衍生的单GPU训练循环上,我们运行了613个实验,包括在2、5和10分钟时的试点和后续筛选;5和10分钟时的完整16条件种子重运行;有针对性的种子锚点检查;同主机贪婪和匹配成本随机基线;一个60分钟的桥接包;以及通过24小时的有界Windows A100和Linux L40S锚点延续。总批次、深度和宽度的主要惩罚在短预算时最大,并随预算增加而放松。在预先声明的种子全屏系列中,D、A、B和C在预算内Benjamini-Hochberg校正后,在5和10分钟时保留非零估计,而E则没有。随机搜索可以在这个32条件空间中达到强当前最优,但反复在相同的低惩罚区域,且没有因子归因。60分钟桥接锚点具有最低均值,尽管该包没有将工作流改进与更大桥接模型的能力优势分开。在两个主机上的有界12小时和24小时三锚点延续中,桥接具有最低样本均值,而非桥接顺序保持主机敏感。因此,我们提出了一个有界方法结果:使用短设计筛选来识别高惩罚方向,在重复运行下确认有希望的锚点,并在缩减空间内局部细化。证据支持在24小时内两个主机上的以桥接为中心的推荐,而不是硬件不变的排名或通用超参数优化的优越性。

英文摘要

Budget-constrained micro-pretraining often requires triaging many candidate recipes on a shared accelerator before larger search budgets are spent. We study whether a staged fractional-factorial workflow can recover stable early effect structure in this setting. On a fixed autoresearch-derived single-GPU training loop, we run 613 experiments across pilot and follow-up screens at 2, 5, and 10 minutes; full 16-condition seeded reruns at 5 and 10 minutes; targeted seeded anchor checks; same-host greedy and matched-cost random baselines; a 60-minute bridge package; and bounded Windows A100 and Linux L40S anchor continuations through 24 hours. Main penalties from total batch, depth, and width are largest at short budgets and relax as budget increases. Within the predeclared seeded full-screen families, D, A, B, and C retain non-zero estimates at 5 and 10 minutes after within-budget Benjamini-Hochberg correction, while E does not. Random search can reach strong incumbents in this 32-condition space, but repeatedly in the same low-penalty region and without factor attribution. The 60-minute bridge anchor has the lowest mean, although that package does not separate workflow refinement from the larger bridge model's capacity advantage. In bounded 12-hour and 24-hour three-anchor continuations on both hosts, the bridge has the lowest sample mean while the non-bridge ordering stays host-sensitive. We therefore present a bounded methods result: use short designed screens to identify high-penalty directions, confirm promising anchors under repeated runs, and refine locally inside the reduced space. The evidence supports a bridge-centered recommendation through 24 hours on two hosts, not hardware-invariant ranking or general hyperparameter-optimization superiority.
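The abstract relies on a within-budget Benjamini-Hochberg correction to decide which factor effects are retained. As a rough illustration of that step-up procedure (not the authors' code), here is a minimal Python sketch. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank

    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for five factor effects.
decisions = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6])
```

Because the rule is step-up, a hypothesis can be rejected even when its own p-value exceeds its rank threshold, as long as a larger-ranked p-value passes.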

2606.05185 2026-06-05 cs.CY cs.CV cs.LG 版本更新

Drishti AI-Event Guardian: An Intelligent Real-Time Crowd Monitoring and Emergency Response System for Mass Gathering Events

Drishti AI-Event Guardian:面向大规模聚集事件的智能实时人群监控与应急响应系统

Ritabrata Roy Choudhury, Arkajyoti Karmakar, Rudra Pratap Mitra

发表机构 * School of Computer Engineering, Kalinga Institute of Industrial Technology(计算机工程学院,凯林加工业技术学院) School of Electronics Engineering, Kalinga Institute of Industrial Technology(电子工程学院,凯林加工业技术学院)

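AI summary Proposes the Drishti AI-Event Guardian framework, which combines YOLOv8, anomaly detection, gradient-boosted regression, and other multimodal deep learning techniques to provide real-time crowd density estimation, anomaly detection, predictive modeling, facial recognition, medical emergency reporting, a chatbot, and intelligent guard reallocation. The system is evaluated on the Kumbh Mela and RCB Victory Parade scenarios, where it reports low latency and high accuracy.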

Comments 22 pages

详情
AI中文摘要

大规模聚集事件常因人群监控不足和应急响应协调不力导致严重安全事故。传统监控系统缺乏智能分析,导致威胁识别延迟、资源部署不当,以及在密集公共集会中对弱势个体的支持不足。本文提出Drishti AI-Event Guardian,一种利用深度学习增强公共安全的智能人群管理框架。该架构整合来自CCTV网络和无人机平台的多模态数据,由Google Vertex AI基础设施上的模型处理。核心方法包括使用YOLOv8进行实时人群密度估计、时空异常检测以及通过梯度提升回归进行预测性人群流动建模。Drishti还集成了四个模块:(i) 用于失踪人员识别并触发全人群通知的人脸识别;(ii) 带有自动调度的医疗紧急报告;(iii) 用于报告和投诉的对话式AI聊天机器人;(iv) 智能警卫重分配引擎,可根据人群密度变化动态重新分配人员。该系统在Kumbh Mela集会和RCB Victory Parade活动两个场景中进行了评估,实现了人群密度估计MAE为3.2人/平方米、异常检测F1分数为0.91、人脸识别精确率为0.93,以及中位警报延迟为111毫秒。预测性拥堵建模提供五分钟预测,MAPE为8.3%,从而实现预防性干预。聊天机器人无需人工操作即可解决89%的事件申报,而警卫重分配相比手动重新分配将响应人员部署延迟降低了34%。结果表明,该系统从被动监控转向主动人群智能,并为从本地集会到大型节日的活动提供了可扩展的基础。

英文摘要

Mass gathering events are associated with critical safety incidents caused by insufficient crowd monitoring and inadequate emergency response coordination. Traditional surveillance systems lack intelligent analytics, resulting in delayed threat identification, poor resource deployment, and weak support for vulnerable individuals during dense public assemblies. This paper presents Drishti AI-Event Guardian, an intelligent crowd management framework using deep learning for public safety enhancement. The architecture combines multimodal data from CCTV networks and UAV platforms, processed by models on Google Vertex AI infrastructure. Core methods include real-time crowd density estimation using YOLOv8, spatiotemporal anomaly detection, and predictive crowd-flow modeling through gradient-boosted regression. Drishti also integrates four modules: (i) facial recognition for missing person identification with crowd-wide notification; (ii) medical emergency reporting with automated dispatch; (iii) a conversational AI chatbot for reports and complaints; and (iv) an intelligent guard reallocation engine that dynamically reassigns personnel in response to crowd density changes. The system is evaluated on two scenarios, the Kumbh Mela gathering and the RCB Victory Parade event, achieving crowd density estimation MAE of 3.2 persons/m^2, anomaly detection F1-score of 0.91, facial recognition precision of 0.93, and median alert latency of 111 ms. Predictive congestion modeling provides five-minute forecasts with MAPE of 8.3%, enabling preemptive intervention. The chatbot resolved 89% of incident filings without human operators, while guard reallocation reduced responder deployment latency by 34% versus manual reassignment. Results demonstrate a shift from passive surveillance toward active crowd intelligence and a scalable foundation for events from local gatherings to mega festivals.

2606.05170 2026-06-05 cs.LG 版本更新

ERRORQUAKE: Heavy-Tailed Error Severity Distributions in Open-Weight Large Language Models

ERRORQUAKE: 开放权重大语言模型中重尾错误严重性分布

Jason Z Wang

发表机构 * Independent(独立)

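AI summary Introduces the Errorquake-10k benchmark, which uses continuous severity scoring and a Gutenberg-Richter index b to show that open-weight LLMs differ substantially in their error severity distributions at matched accuracy, and argues that the severity distribution and the error rate are informationally non-redundant.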

Comments 28 pages, 12 figures, appendix and checklist

详情
AI中文摘要

在匹配的准确率下,开放权重LLM的错误严重性分布形状存在显著差异——这种差异在标量错误率中不可见。幻觉基准测试报告单一错误计数,将所有错误视为等价,然而错误的日期和捏造的法庭裁决相差数个数量级。我们引入Errorquake-10k,一个包含10,000个查询的基准测试,在8个领域和5个难度等级上对每个响应进行0-4连续严重性评分,并为21个开放权重模型拟合每个模型的严重性分布。对于每个模型,我们估计严重性分布指数b(古登堡-里希特上尾斜率),并给出95%自助法置信区间。要点:在210个模型对中,有85对在人类共识评分上匹配准确率(|Δε| < 0.05)时具有不相交的95% b置信区间,例如deepseek-v3.2与ministral-14b在ε=0.586和Δb=0.47处。一项包含519个项目、三位评分者的人类验证研究确认了测量可靠性(ICC(2,k=3)=0.85),验证了LLM评判排名(ρ=0.89),并确认了人类数据上的密集模型缩放相关性(ρ_s=-0.86)。我们证明了一个不可约简性定理,表明严重性分布和错误率在信息上非冗余(I(b; model | ε)=1.56比特;跨模型b方差的64.5%无法由ε解释)。一个严重性机制分类(κ=0.83)揭示了错误类型随严重性发生类别性转变:低严重性错误为检索错误(71%);高严重性错误为捏造(39%)——并且这种组成因模型大小而异(p<0.0001)。严重性分布应与准确率一同报告;它携带了错误率无法提供的判别信息。

英文摘要

At matched accuracy, open-weight LLMs differ substantially in the shape of their error severity distribution -- a difference invisible to the scalar error rate. Hallucination benchmarks report a single error count and treat all errors as equivalent, yet a wrong date and a fabricated court ruling differ by orders of magnitude. We introduce Errorquake-10k, a 10,000-query benchmark scoring each response on a continuous 0-4 severity scale across 8 domains and 5 difficulty tiers, and we fit per-model severity distributions for 21 open-weight models. For each model we estimate a severity distribution index (b, the Gutenberg-Richter upper-tail slope) with 95% bootstrap confidence intervals. Headline: across the 210 model pairs, 85 have disjoint 95% b confidence intervals at matched accuracy (|Delta epsilon| < 0.05) on human-consensus scoring, e.g. deepseek-v3.2 vs. ministral-14b at epsilon = 0.586 and Delta b = 0.47. A 519-item three-rater human validation study confirms measurement reliability (ICC(2,k=3) = 0.85), validates the LLM-judge ranking (rho = 0.89), and confirms the dense-model scaling correlation on human data (rho_s = -0.86). We prove a Non-Reducibility Theorem showing that severity profile and error rate are informationally non-redundant (I(b; model | epsilon) = 1.56 bits; 64.5% of cross-model b variance is unexplained by epsilon). A severity mechanism taxonomy (kappa = 0.83) reveals that error type shifts categorically with severity: low-severity errors are retrievals (71%); high-severity errors are fabrications (39%) -- and this composition differs by model size (p < 0.0001). Severity distribution should be reported alongside accuracy; it carries discriminative information that the error rate cannot.
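The abstract summarizes each model by a Gutenberg-Richter style tail slope b with a 95% bootstrap confidence interval, but does not state the estimator. The sketch below is therefore an assumption: it uses the classical Aki maximum-likelihood form b = log10(e) / (mean(s) - s_min) on severities at or above a threshold, plus a percentile bootstrap. All function names and the sample scores are hypothetical.

```python
import math
import random


def b_value(severities, s_min):
    """Aki-style tail-slope estimate from severities at or above s_min (assumed estimator)."""
    tail = [s for s in severities if s >= s_min]
    excess = sum(tail) / len(tail) - s_min
    if excess <= 0:
        raise ValueError("mean severity must exceed s_min")
    return math.log10(math.e) / excess


def bootstrap_ci(severities, s_min, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for b_value; degenerate resamples are skipped."""
    rng = random.Random(seed)
    tail = [s for s in severities if s >= s_min]
    estimates = []
    for _ in range(n_boot):
        resample = [rng.choice(tail) for _ in tail]
        if sum(resample) / len(resample) > s_min:
            estimates.append(b_value(resample, s_min))
    estimates.sort()
    n = len(estimates)
    lower = estimates[int((1 - level) / 2 * n)]
    upper = estimates[min(n - 1, int((1 + level) / 2 * n))]
    return lower, upper


# Hypothetical severity scores on the 0-4 scale.
scores = [1.0, 1.5, 2.0, 2.5, 3.0]
b_hat = b_value(scores, s_min=1.0)
ci_low, ci_high = bootstrap_ci(scores, s_min=1.0)
```

Under this convention, a larger b means a steeper tail, so severe errors are rarer relative to mild ones. Two models would count as distinguishable in the abstract's sense when their intervals do not overlap.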

2606.05169 2026-06-05 cs.LG 版本更新

The Evaluation Blind Spot: A Stereological Theory of Benchmark Coverage for Large Language Models

评估盲点:大型语言模型基准覆盖的体视学理论

Jason Z Wang

发表机构 * Independent(独立)

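AI summary Proposes a stereological theory that quantifies the blind spots of LLM benchmark coverage through effective dimensionality and the visible Hausdorff distance, finds that the structural blind spot far exceeds statistical noise, and identifies a stable core set of benchmarks using a submodular greedy algorithm and eigenstructure analysis.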

Comments 55 pages, 3 figures, 3 tables, extensive appendix with proofs

详情
AI中文摘要

我们给出了LLM基准覆盖的体视学理论。对于任何有效维度为d_eff的测试套件,两个与相同分数一致且凸的能力轮廓之间的可见豪斯多夫距离由epsilon + C R m^(-1/(d_eff-1))界定,并具有匹配的Lipschitz下界。实验上,三个独立的排行榜(Open LLM v2、扩展的12基准套件、LiveBench)在其竞争前沿的有效维度均在[2.86, 4.80]范围内;结构盲点超过观察到的亚军分数差距两个数量级,并比统计噪声高52-127倍。在卡方投影模型下,各向同性先验是乐观情况;在六个隐藏能力先验和四个环境维度下,前两个模型的模拟半分割交换率保持在[0.38, 0.49],而500次随机可见/保留分割显示,92%的试验交换了前1名排名,平均2.83个前5名模型发生变化。具有Nemhauser (1 - 1/e)保证的子模贪心算法找到了一个由4个基准组成的稳定核心;12个基准中的7个足以达到90%的覆盖率,并且训练好的子集在时间季度间转移时保留率为93-97%。跨12个内部基准和27个Chatbot Arena类别的反事实验证证实,特征结构预测哪些评估是不可替代的(移除干扰的rho = -0.69,p = 0.013)以及哪些外部评估带来新信息(rho = +0.38)。作为第二个独立的理论贡献,我们解决了C^2支撑函数的Gardner问题1.5(1995),通过S^(D-1)上的最优恢复理论建立了一般维度下的极小极大速率Theta(R/(kappa m^(2/(D-1))))。

英文摘要

We give a stereological theory of LLM benchmark coverage. For any suite with effective dimensionality d_eff, the visible Hausdorff distance between two convex capability profiles consistent with the same scores is bounded by epsilon + C R m^(-1/(d_eff-1)), with matching Lipschitz lower bound. Empirically, three independent leaderboards (Open LLM v2, an extended 12-benchmark suite, LiveBench) all have d_eff in [2.86, 4.80] on their competitive frontier; the structural blind spot exceeds the observed runner-up score gap by two orders of magnitude and dominates statistical noise by 52-127x. Under a chi-squared projection model, the isotropic prior is the optimistic case; across six hidden-capability priors and four ambient dimensions the simulated half-split swap rate of the top two models stays in [0.38, 0.49], and a 500-trial random visible/held-out split shows that 92% of trials swap the top-1 ranking with on average 2.83 of 5 top-5 models changing. A submodular greedy algorithm with the Nemhauser (1 - 1/e) guarantee finds a stable core of 4 benchmarks; 7 of 12 suffice for 90% coverage, and the trained subset transfers across temporal quarters with 93-97% retention. A counterfactual validation across 12 internal benchmarks and 27 Chatbot Arena categories confirms that the eigenstructure predicts which evaluations are irreplaceable (rho = -0.69, p = 0.013 for removal disruption) and which external evaluations bring new information (rho = +0.38). As a second, independent theoretical contribution, we resolve Gardner's Problem 1.5 (1995) for C^2 support functions, establishing the minimax rate Theta(R/(kappa m^(2/(D-1)))) in general dimension via optimal recovery theory on S^(D-1).
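The core-benchmark selection in this abstract uses a submodular greedy algorithm with the Nemhauser (1 - 1/e) guarantee. The paper's actual coverage function is not given here, so the following Python sketch substitutes plain set coverage, a standard monotone submodular objective. The benchmark names and covered items are hypothetical.

```python
def greedy_select(candidates, k):
    """Greedily pick up to k candidates maximizing set coverage.

    `candidates` maps a name to the set of items it covers. For monotone
    submodular objectives such as coverage, the greedy choice achieves at
    least (1 - 1/e) of the optimal value for the same budget k.
    """
    chosen, covered = [], set()
    for _ in range(k):
        remaining = [name for name in candidates if name not in chosen]
        if not remaining:
            break
        # Marginal gain = number of newly covered items.
        best = max(remaining, key=lambda name: len(candidates[name] - covered))
        if not candidates[best] - covered:
            break  # no candidate adds anything new
        chosen.append(best)
        covered |= candidates[best]
    return chosen, covered


# Hypothetical benchmarks and the capability "directions" each one probes.
benchmarks = {
    "A": {1, 2, 3},
    "B": {3, 4},
    "C": {4, 5, 6, 7},
    "D": {1},
}
core, covered_items = greedy_select(benchmarks, k=3)
```

The early exit when no candidate adds coverage mirrors the idea of a small stable core: once the marginal gain is zero, extra benchmarks are redundant under the chosen objective.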

2606.05168 2026-06-05 cs.CL cs.AI cs.LG 版本更新

Epidemiology of Model Collapse: Modeling Synthetic Data Contamination via Bilayer SIR Dynamics

模型崩溃的流行病学:通过双层SIR动力学建模合成数据污染

Xiangyu Wang

发表机构 * Xiangyu Wang(王翔宇)

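AI summary Proposes a bilayer coupled SIR/SIRS framework that treats data corpora and AI models as two interacting populations, models the collapse caused by synthetic data contamination through cross-layer transmission, and derives the basic reproduction number R0. Experiments are qualitatively consistent with the threshold dynamics, and the intervention analysis ranks candidate strategies.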

Comments 24 pages, 15 figures

详情
AI中文摘要

在合成数据上训练会导致模型崩溃,但现有分析将其视为单链退化。实际上,AI生态系统涉及交叉污染:模型从其他模型摄取合成数据,产生新的合成文本,并污染共享语料库。我们提出了一个双层耦合SIR/SIRS框架——一个现象学平均场模型,将数据语料库和AI模型视为两个相互作用的群体,每个群体具有易感、感染和恢复三个仓室,并通过跨层传播连接。SIRS变体(我们的主要推荐)包含了免疫衰减,反映了过滤后的语料库和重新训练的模型仍然容易再次污染。我们通过下一代矩阵推导出基本再生数$R_0 = \sqrt{β_D β_M / [(γ_D+μ_D)(γ_M+μ_M)]}$,并将标准流行病阈值结果应用于双层系统。基于公开AI文本流行数据的说明性情景校准在三种情景下均产生超临界动力学($R_0 > 1$);Sobol敏感性分析将合成文本检测识别为最高杠杆参数。一个二分网络基于智能体的模型在密集网络上确认了平均场一致性($R^2 > 0.96$),但在异质性下退化。GPT-2污染链实验(在WikiText和Shakespeare上共192次运行)显示了剂量-反应退化和多样性损失,定性上与阈值图像一致。匹配预算的源多样性实验(1,088次运行)提供了提示性证据,表明多源混合适度减轻了崩溃,但该效应在较低污染分数下消失。干预分析将基于检测的过滤和群体免疫识别为最高杠杆策略。

英文摘要

Training on synthetic data causes model collapse, but existing analyses treat this as single-chain degradation. In reality, the AI ecosystem involves cross-contamination: models ingest synthetic data from other models, produce new synthetic text, and contaminate shared corpora. We propose a bilayer coupled SIR/SIRS framework -- a phenomenological mean-field model treating data corpora and AI models as two interacting populations, each with susceptible, infected, and recovered compartments linked by cross-layer transmission. The SIRS variant (our primary recommendation) incorporates immunity waning, reflecting that filtered corpora and retrained models remain susceptible to re-contamination. We derive the basic reproduction number $R_0 = \sqrt{β_D β_M / [(γ_D+μ_D)(γ_M+μ_M)]}$ via the Next Generation Matrix and apply standard epidemic threshold results to the bilayer system. Illustrative scenario-based calibration from public AI text prevalence data yields supercritical dynamics ($R_0 > 1$) across three scenarios; Sobol sensitivity analysis identifies synthetic-text detection as the highest-leverage parameter. A bipartite-network agent-based model confirms mean-field consistency ($R^2 > 0.96$) for dense networks but degrades under heterogeneity. GPT-2 contamination chain experiments (192 runs across WikiText and Shakespeare) show dose-response degradation and diversity loss qualitatively consistent with the threshold picture. Matched-budget source-diversity experiments (1,088 runs) provide suggestive evidence that multi-source mixing modestly attenuates collapse, but the effect vanishes at lower contamination fractions. Intervention analysis identifies detection-based filtering and herd immunity as the highest-leverage strategies.
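The threshold result in this abstract can be made concrete by evaluating the stated formula for $R_0$ directly. The Python sketch below does only that. Reading the β terms as cross-layer transmission rates and the γ and μ terms as recovery and turnover rates follows the usual SIR convention and is an assumption, as are the numeric values.

```python
import math


def basic_reproduction_number(beta_d, beta_m, gamma_d, mu_d, gamma_m, mu_m):
    """R0 = sqrt(beta_D * beta_M / ((gamma_D + mu_D) * (gamma_M + mu_M)))."""
    return math.sqrt(beta_d * beta_m / ((gamma_d + mu_d) * (gamma_m + mu_m)))


# Hypothetical parameter sets: one supercritical, one subcritical.
r0_high = basic_reproduction_number(0.4, 0.9, 0.2, 0.1, 0.2, 0.1)
r0_low = basic_reproduction_number(0.1, 0.1, 0.2, 0.1, 0.2, 0.1)
is_supercritical = r0_high > 1.0
```

The square root reflects the two-step cycle between layers: contamination must pass from data to models and back, so $R_0$ is the geometric mean of the two per-layer reproduction factors. Raising either removal term in the denominator, for example through better detection and filtering, pushes $R_0$ toward the subcritical regime.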

2606.04777 2026-06-05 cs.LG 版本更新

UniFair: A unified fair clustering approach based on separation and compactness

UniFair: 基于分离度和紧致性的统一公平聚类方法

Antonia Karra, Vasiliki Papanikou, Georgios Vardakas, Evaggelia Pitoura, Aristidis Likas

发表机构 * University of Ioannina(伊奥安纳大学) Archimedes/Athena Research Center(阿基米德/雅典娜研究中心)

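AI summary Proposes the UniFair framework, which jointly optimizes separation fairness and social fairness to reduce group disparities with only a modest increase in clustering loss.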

Comments 17 pages, 6 Figures

详情
AI中文摘要

聚类越来越多地用于支持高影响决策,但诸如k-means等标准目标可能会产生对人口群体不平等对待的聚类结果。现有的公平聚类方法通常优化单一公平性概念,并且常常忽略聚类成本如何与诱导决策边界的几何形状相互作用。我们提出UniFair,一个统一框架,联合优化分离公平性和社会公平性。分离公平性鼓励受保护群体远离诱导决策边界,而社会公平性通过惩罚群体级聚类成本来减少簇内失真的差异。我们为分离公平和统一k-means目标开发了基于梯度的优化过程,并通过在自编码器的潜在空间中强制执行相同标准将其扩展到深度聚类。在表格和图像数据集上的实验表明,UniFair在仅适度增加聚类损失的情况下,减少了与边界相关和基于成本的群体差异。

英文摘要

Clustering is increasingly used to support high-impact decisions, yet standard objectives such as k-means can produce clusterings that treat demographic groups unequally. Existing fair clustering methods typically optimize a single notion of fairness and often overlook how clustering costs interact with the geometry of the induced decision boundaries. We propose UniFair, a unified framework that jointly optimizes separation fairness and social fairness. Separation fairness encourages protected groups to lie farther from the induced decision boundaries, while social fairness reduces disparities in within-cluster distortion by penalizing group-wise clustering costs. We develop gradient-based optimization procedures for separation-fair and unified k-means objectives, and extend them to deep clustering by enforcing the same criteria in the latent space of an autoencoder. Experiments on tabular and image datasets show that UniFair reduces both boundary-related and cost-based group disparities with only a modest increase in clustering loss.
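Social fairness in this abstract is described as penalizing group-wise clustering costs. The exact penalty is not given, so the Python sketch below makes an assumption: it computes the average squared distance to the nearest center per group and takes the worst group as the quantity to reduce. Function names and data are hypothetical, and the separation-fairness term is not modeled.

```python
def group_costs(points, groups, centers):
    """Average squared distance to the nearest center, computed per demographic group."""
    totals, counts = {}, {}
    for point, group in zip(points, groups):
        nearest = min(
            sum((a - b) ** 2 for a, b in zip(point, center)) for center in centers
        )
        totals[group] = totals.get(group, 0.0) + nearest
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}


def worst_group_cost(points, groups, centers):
    """Worst per-group cost; an assumed stand-in for a group-wise penalty."""
    return max(group_costs(points, groups, centers).values())


# Hypothetical 2-D data: group "a" is served worse by these centers than group "b".
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
groups = ["a", "a", "b"]
centers = [(1.0, 0.0), (10.0, 0.0)]
costs = group_costs(points, groups, centers)
penalty = worst_group_cost(points, groups, centers)
```

A standard k-means objective would only sum these distances over all points, which can hide the gap between groups that the per-group view exposes.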

2606.04672 2026-06-05 cs.LG cs.AI 版本更新

Learning Long Range Spatio-Temporal Representations over Continuous Time Dynamic Graphs with State Space Models

利用状态空间模型学习连续时间动态图中的长程时空表示

Ayushman Raghuvanshi, Thummaluru Siddartha Reddy, Sundeep Prabhakar Chepuri, Mahesh Chandran

发表机构 * University of Texas at Austin(德克萨斯大学奥斯汀分校) Indian Institute of Science(印度科学研究院)

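AI summary Proposes a state-space-model framework for continuous-time dynamic graphs (CTDG-SSM) that propagates long-range spatio-temporal information through a topology-aware higher-order polynomial projection operator (CTT-HiPPO), achieving state-of-the-art performance on dynamic link prediction, node classification, and sequence classification.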

Comments Accepted at ICML 2026

详情
AI中文摘要

连续时间动态图(CTDG)为捕捉演化关系数据中的细粒度时间模式提供了更丰富的框架。长程信息传播是学习表示时的关键挑战,其中需要在长时间跨度上保留和更新信息。现有方法限制模型捕捉一跳或局部时间邻域,无法捕捉多跳或全局结构模式。为解决此问题,我们从第一性原理推导出一个参数高效的连续时间动态图状态空间建模框架(CTDG-SSM)。我们首先引入连续时间拓扑感知高阶多项式投影算子(CTT-HiPPO),这是一种基于记忆的HiPPO新公式,用于联合编码时间动态和图结构。CTT-HiPPO的解通过将经典HiPPO解投影到拉普拉斯矩阵的多项式上获得,产生拓扑感知的记忆更新,该更新等价于CTDG的状态空间公式(CTDG-SSM)。然后,使用零阶保持方法获得计算高效的离散公式用于模型实现。在动态链接预测、动态节点分类和序列分类的基准测试中,CTDG-SSM实现了最先进的性能。值得注意的是,在需要长程时间(LRT)和空间推理的数据集上,它取得了较大的性能提升。

英文摘要

Continuous-time dynamic graphs (CTDGs) provide a richer framework to capture fine-grained temporal patterns in evolving relational data. Long-range information propagation is a key challenge while learning representations, wherein it is important to retain and update information over long temporal horizons. Existing approaches restrict models to capture one-hop or local temporal neighborhoods and fail to capture multi-hop or global structural patterns. To mitigate this, we derive a parameter-efficient state-space modeling framework for continuous-time dynamic graphs (CTDG-SSM) from first principles. We first introduce continuous-time Topology-Aware higher order polynomial projection operator (CTT-HiPPO), a novel memory-based reformulation of HiPPO to jointly encode temporal dynamics and graph structure. The solution from CTT-HiPPO is obtained by projecting the classical HiPPO solution through a polynomial of the Laplacian matrix, yielding topology-aware memory updates that admit an equivalent state-space formulation for CTDGs (CTDG-SSM). Then a computationally efficient discrete formulation is obtained using the zero-order hold approach for model implementation. Across benchmarks on dynamic link prediction, dynamic node classification, and sequence classification, CTDG-SSM achieves state-of-the-art performance. Notably, it achieves large performance gains on datasets that require long range temporal (LRT) and spatial reasoning.
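The abstract obtains its discrete model through zero-order hold (ZOH). As a reminder of what that step does, here is a minimal Python sketch for a diagonal linear system x'(t) = a x(t) + b u(t). Restricting to a diagonal state matrix is a simplifying assumption made here, and it does not reproduce the Laplacian-polynomial structure of CTDG-SSM. Function names are hypothetical.

```python
import math


def zoh_discretize(a_diag, b_vec, dt):
    """ZOH discretization of a diagonal system with step size dt.

    For each mode: a_bar = exp(a * dt), b_bar = (exp(a * dt) - 1) / a * b,
    with the a -> 0 limit b_bar = dt * b.
    """
    a_bar, b_bar = [], []
    for a, b in zip(a_diag, b_vec):
        a_bar.append(math.exp(a * dt))
        b_bar.append(dt * b if a == 0 else math.expm1(a * dt) / a * b)
    return a_bar, b_bar


def step(x, u, a_bar, b_bar):
    """One discrete update x_next = a_bar * x + b_bar * u for every mode."""
    return [ab * xi + bb * u for xi, ab, bb in zip(x, a_bar, b_bar)]


# A decaying mode (a = -1) and a pure integrator (a = 0), held input u = 1.
a_bar, b_bar = zoh_discretize([-1.0, 0.0], [1.0, 2.0], dt=0.1)
state = [0.0, 0.0]
for _ in range(500):
    state = step(state, 1.0, a_bar, b_bar)
```

ZOH is exact when the input is piecewise constant between samples, which is why the decaying mode settles at the continuous-time steady state -b/a = 1 regardless of the step size.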

2606.04560 2026-06-05 cs.LG cs.AI 版本更新

Rollout-Level Advantage-Prioritized Experience Replay for GRPO

基于轨迹级别优势优先经验回放的GRPO

Gyeongtae Yoo, Sanghyeok Park, Soohyuk Jang, Ik-hwan Kim, Sungroh Yoon

发表机构 * Department of Electrical and Computer Engineering, Seoul National University(首尔国立大学电子与计算机工程系) Interdisciplinary Program in AI, Seoul National University(首尔国立大学人工智能跨学科项目) AIIS, ASRI, INMC, and ISRC, Seoul National University(首尔国立大学人工智能研究所、人工智能研究机构、智能网络与计算中心及人工智能科学研究中心)

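AI summary To address the low sample efficiency of GRPO, proposes a rollout-level experience replay buffer that bounds staleness through age eviction, preserves on-policy data through fresh-anchored composition, and prioritizes sampling by advantage magnitude, improving over GRPO and naive replay on five math benchmarks.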

详情
AI中文摘要

基于可验证奖励的GRPO强化学习是后训练推理LLM的标准方法,但样本效率低下。每个轨迹仅用于一次梯度更新后被丢弃。朴素回放在此设置中不适用,因为LLM策略每步梯度变化快,存储的轨迹会变得陈旧并破坏训练稳定性。我们提出一种面向GRPO的轨迹级回放缓冲器,存储和采样单个轨迹而非整组。缓冲器通过年龄驱逐限制陈旧性:任何超过tau_max训练步数的轨迹被移除。缓冲器还通过新鲜锚定组合保留在线策略数据:每个批次保留其新鲜的在线策略轨迹,并拼接从缓冲器中单独抽取的回放轨迹。我们按每个轨迹的优势幅度进行优先回放,并回收优势大的单个轨迹。在三个Qwen3-Base规模、五个数学基准上,我们的方法优于GRPO和朴素回放基线。所有规模均获得正向增益,且随模型增大而增长。最大增益在4B规模上,五个基准平均提升+4.35个百分点。在联合衡量准确率和token效率的AES指标下,与GRPO的效率差距同样在4B最大,为+0.579。

英文摘要

Reinforcement learning from verifiable rewards with GRPO is a standard approach for post-training reasoning LLMs. It remains sample inefficient. Each rollout is used for a single gradient update and then discarded. Naive replay is not well suited in this setting because LLM policies drift quickly per gradient step. Stored rollouts therefore become stale and can destabilize training. We propose a rollout-level replay buffer for GRPO that stores and samples individual rollouts rather than whole groups. The buffer bounds staleness through age eviction. Any rollout older than tau_max training steps is removed. The buffer also preserves on-policy data via fresh-anchored composition. Each batch keeps its fresh on-policy rollouts and then concatenates replay rollouts drawn separately from the buffer. We prioritize replay by per-rollout advantage magnitude and recycle individual rollouts whose advantages are large. Across three Qwen3-Base scales on five math benchmarks, our method outperforms GRPO and naive replay baselines. Gains are positive at every scale and grow with model size. The largest gain is +4.35 pp on the five-benchmark average at 4B. Under an AES metric that jointly measures accuracy and token efficiency, the efficiency margin over GRPO is again largest at 4B, at +0.579.
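The three mechanisms in this abstract (age eviction, fresh-anchored composition, and advantage-prioritized sampling) can be sketched together in a few lines of Python. This is an illustrative reading of the abstract, not the authors' implementation: the class and function names are hypothetical, and sampling with replacement proportional to |advantage| is an assumption.

```python
import random


class RolloutReplayBuffer:
    """Stores individual rollouts with their advantage and the step they were added."""

    def __init__(self, tau_max, seed=0):
        self.tau_max = tau_max
        self.items = []  # tuples of (rollout, advantage, step_added)
        self.rng = random.Random(seed)

    def add(self, rollouts, advantages, step):
        for rollout, advantage in zip(rollouts, advantages):
            self.items.append((rollout, advantage, step))

    def evict(self, step):
        """Age eviction: drop any rollout older than tau_max training steps."""
        self.items = [it for it in self.items if step - it[2] <= self.tau_max]

    def sample(self, n):
        """Draw n rollouts with probability proportional to |advantage| (assumed rule)."""
        if not self.items or n <= 0:
            return []
        weights = [abs(advantage) for _, advantage, _ in self.items]
        if sum(weights) == 0:
            weights = None  # fall back to uniform sampling
        picks = self.rng.choices(self.items, weights=weights, k=n)
        return [rollout for rollout, _, _ in picks]


def compose_batch(fresh_rollouts, buffer, n_replay):
    """Fresh-anchored composition: keep all fresh rollouts, then append replayed ones."""
    return list(fresh_rollouts) + buffer.sample(n_replay)


buffer = RolloutReplayBuffer(tau_max=2)
buffer.add(["old"], [1.0], step=0)
buffer.add(["zero", "big"], [0.0, 5.0], step=3)
buffer.evict(step=3)  # "old" has age 3 > tau_max and is removed
batch = compose_batch(["fresh_1", "fresh_2"], buffer, n_replay=2)
```

Keeping the fresh rollouts untouched means the on-policy part of every batch is unchanged, so replay only adds signal from rollouts that are both recent and informative.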

2606.04485 2026-06-05 cs.LG 版本更新

LimiX-2M: Mitigating Low-Rank Collapse and Attention Bottlenecks in Tabular Foundation Models

LimiX-2M:缓解表格基础模型中的低秩坍塌和注意力瓶颈

Yuanrui Wang, Xingxuan Zhang, Han Yu, Mingchao Hao, Gang Ren, Hao Yuan, Li Mao, Yunjia Zhang, Chun Yuan, Peng Cui

发表机构 * University of Science and Technology of China(中国科学技术大学)

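AI summary Proposes a unified tokenize-and-route framework that yields LimiX-2M: RaBEL expands each scalar into localized RBF features and the bidirectional block is reordered as S→N→F, so that a 2M-parameter model outperforms larger baselines and improves the accuracy-efficiency trade-off of tabular foundation models.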

Comments Accepted to ICML 2026

详情
AI中文摘要

表格基础模型(TFM)日益与树集成方法竞争,但其性能通常计算效率低下:使用标准仿射标量分词时,每个特征通过本质上的一维通道注入值变化,特征ID/位置信号无法增加特征内值的自由度,导致早期层值敏感性弱和隐藏状态冗余。我们提出了一个统一的tokenize-and-route框架用于强TFM:RaBEL将每个标量扩展为紧凑的局部RBF特征(可选指数门控)以改善条件和浅层有效秩,而重新排序的双向块S→N→F通过在特征混合前聚合跨样本上下文并使用注意力池化来使计算与读出对齐。这些变化共同产生了LimiX-2M,一个2M参数模型,在广泛使用的表格基准上优于更大的TabPFN-v2和TabICL基线,同时降低了训练和推理成本。这些结果突出了值感知分词和读出对齐路由作为改善TFM中精度-效率权衡的关键杠杆。模型检查点和推理代码可在https://github.com/limix-ldm-ai/LimiX获取。

英文摘要

Tabular foundation models (TFMs) increasingly rival tree ensembles, but their performance is often compute-inefficient: with standard affine scalar tokenization, each feature injects value variation through an essentially one-dimensional channel, and feature IDs/positional signals cannot increase within-feature value degrees of freedom, yielding weak early-layer value sensitivity and redundant hidden states. We present a unified tokenize-and-route framework for strong TFMs: RaBEL expands each scalar into compact localized RBF features (optionally exponent-gated) to improve conditioning and shallow-layer effective rank, while a reordered bidirectional block S->N->F aligns computation with the readout by aggregating cross-sample context before feature mixing and using attention pooling. Together, these changes yield LimiX-2M, a 2M-parameter model that outperforms larger TabPFN-v2 and TabICL baselines on widely used tabular benchmarks while reducing training and inference costs. These results highlight value-aware tokenization and readout-aligned routing as key levers for improving the accuracy--efficiency trade-off in TFMs. Model checkpoints and inference code are available at https://github.com/limix-ldm-ai/LimiX.
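The tokenization argument in this abstract contrasts an affine scalar embedding, where a value moves the token along a single direction, with a localized RBF expansion. The Python sketch below shows a plain Gaussian RBF expansion to make that contrast concrete. It is not RaBEL itself: the centers, the bandwidth, and the function name are hypothetical, and the optional exponent gating is omitted.

```python
import math


def rbf_expand(x, centers, bandwidth):
    """Expand a scalar into Gaussian bumps, one per center."""
    return [math.exp(-0.5 * ((x - c) / bandwidth) ** 2) for c in centers]


def affine_token(x, weight, bias):
    """Standard affine scalar tokenization: every value lies on one line in token space."""
    return [w * x + b for w, b in zip(weight, bias)]


centers = [-1.0, 0.0, 1.0]
features_at_zero = rbf_expand(0.0, centers, bandwidth=1.0)
features_at_one = rbf_expand(1.0, centers, bandwidth=1.0)
```

With the affine map, the difference between the tokens of any two values is always a multiple of `weight`. With the RBF expansion, different value ranges activate different coordinates, which is the kind of extra within-feature variation the abstract attributes to value-aware tokenization.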

2606.04335 2026-06-05 cs.LG cs.SY eess.SY 版本更新

Policy Gradient for Continuous-Time Robust Markov Decision Processes

连续时间鲁棒马尔可夫决策过程的策略梯度

Tanya Veeravalli, David M. Bossens, Atsushi Nitanda

发表机构 * Centre for Frontier AI Research, Agency for Science, Technology and Research (A*STAR)(前沿人工智能研究中心,科技研究局(A*STAR)) Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR)(高性能计算研究所,科技研究局(A*STAR)) College of Computing and Data Science, Nanyang Technological University(计算与数据科学学院,南洋理工大学)

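AI summary For continuous-time robust Markov decision processes, derives policy gradients and adversarial gradients and proposes double-loop optimisers and mean-field optimisers, which attain linear and sublinear convergence respectively, together with a sample complexity analysis.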

详情
AI中文摘要

鲁棒马尔可夫决策过程(RMDPs)框架允许设计在最坏情况转移动态下满足性能保证的强化学习智能体。传统RMDPs考虑离散时间动态,最近,样本高效的策略梯度算法已在此背景下被研究。本文研究连续时间RMDP框架内的策略梯度算法。使用随机和常微分方程的路径和伴随公式推导策略梯度和对抗梯度。我们提出双循环优化器,在基于oracle的设置中获得线性收敛,在基于样本的设置中获得$\tilde{\mathcal{O}}(\frac{1}{ε^2})$样本复杂度,该分析还推导了无折扣总成本MDP框架的新工具。此外,我们提出平均场优化器作为分布优化器,具有$\tilde{\mathcal{O}}(\frac{1}{K})$的基于oracle的收敛速率和$N$粒子近似下$\tilde{\mathcal{O}}(\frac{N^2}ε)$的样本复杂度。通过神经常微分方程动态的连续时间RMDPs,两种优化器的连续时间策略梯度算法的有效性得到确认。

英文摘要

The framework of robust Markov decision processes (RMDPs) allows the design of reinforcement learning agents that satisfy performance guarantees under worst-case transition dynamics. Traditional RMDPs consider discrete-time dynamics and recently, sample-efficient policy gradient algorithms have been considered in this context. This paper investigates policy gradient algorithms within a continuous-time RMDP framework. Policy gradients and adversarial gradients are derived using pathwise and adjoint-based formulas for stochastic and ordinary differential equations. We propose double-loop optimisers to obtain linear convergence in the oracle-based setting and an $\tilde{\mathcal{O}}(\frac{1}{ε^2})$ sample complexity in the sample-based setting in an analysis which also derives novel tools for the framework of undiscounted total cost MDPs. Additionally, we propose mean-field optimisers as distributional optimisers with an $\tilde{\mathcal{O}}(\frac{1}{K})$ oracle-based convergence rate and an $\tilde{\mathcal{O}}(\frac{N^2}ε)$ sample complexity under $N$-particle approximation. The effectiveness of continuous-time policy gradient algorithms is confirmed for both optimisers on continuous-time RMDPs with neural ordinary differential equation dynamics.

2606.04037 2026-06-05 cs.AI cs.LG cs.SE 版本更新

Toward Pre-Deployment Assurance for Enterprise AI Agents: Ontology-Grounded Simulation and Trust Certification

面向企业AI代理的部署前保障:基于本体的仿真与信任认证

Thanh Luong Tuan, Abhijit Sanyal

发表机构 * Golden Gate University(金门大学) Data, Digital & IT, Novartis Healthcare Pvt. Ltd.(数据、数字与IT,诺华健康护理私人有限公司)

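AI summary Proposes an ontology-grounded verification framework that uses ontology-driven scenario generation and a Trust Certificate to support automated pre-deployment regulatory compliance and safety certification for enterprise AI agents.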

Comments 26 pages, 3 figures. Companion to arXiv:2604.00555. Code and data: https://github.com/frank-luongt/faos-research/tree/main/RA-6

详情
AI中文摘要

企业人工智能(AI)代理的部署前验证仍然是大型语言模型(LLM)能力基准测试与生产部署之间的关键缺口。一旦代理在生产环境中运行,部署后监控、人在回路控制和提示级护栏提供的保障有限。我们提出了一种基于本体的验证框架,包含三个组件:一个代理操作范围,形式化了跨权限、领域约束、安全属性、治理规则和自主级别的认证空间;一个本体到场景的生成流水线,自动推导出监管、操作和对抗性测试场景;以及一个信任证书,携带机器可验证的证明,并附带分级部署裁决(批准、有条件、拒绝)。在四个受监管行业(金融科技、银行、保险和医疗保健)中进行的受控试点,实例化为美国与越南的五个行业-监管体制单元,生成了1,800个场景,并针对125个主要来源监管要求和25个注入故障进行了评估。基于本体的生成(G4)实现了48.3%的监管覆盖率,而基于角色的基线为33.1%(校正后p=0.0006),并且领域特异性最高(4.77/5.0;p=2e-6)。在Bonferroni校正后,相对于基线和检索增强提示的覆盖率优势不再稳健。跨三个LLM家族(Claude Sonnet 4、Qwen 2.5 72B、Gemma 4 26B;总计5,400个场景)的交叉验证复制了角色与本体模式。结果表明,对于监管密集型领域,基于本体的场景生成可作为基于角色测试套件的可信补充。

英文摘要

Pre-deployment verification of enterprise artificial intelligence (AI) agents remains a critical gap between large language model (LLM) capability benchmarking and production deployment. Post-deployment monitoring, human-in-the-loop controls, and prompt-level guardrails offer limited assurance once an agent is operating in production. We present an ontology-grounded verification framework -- to our knowledge the first to combine three components: an Agent Operational Envelope formalizing the certification space across permissions, domain constraints, safety properties, governance rules, and autonomy levels; an ontology-to-scenario generation pipeline that derives regulatory, operational, and adversarial test scenarios automatically; and a machine-verifiable Trust Certificate with graduated deployment verdicts. A controlled pilot across four regulated industries (Fintech, Banking, Insurance, Healthcare), instantiated as five industry-by-regulatory-regime cells across the United States and Vietnam (where Vietnam's 2025 AI Law makes such verification legally mandated for financial services), generated 1,800 scenarios evaluated against 125 primary-source regulatory requirements and 25 injected faults. Ontology-grounded generation significantly outperformed the dominant persona-based baseline on regulatory coverage (48.3% versus 33.1%; corrected p_c = .0006) and attained the highest domain specificity (4.77/5.0; p = 2e-6); transparently, its advantage over plain and retrieval-augmented prompting did not survive Bonferroni correction. Cross-validation across three LLM families (Claude Sonnet 4, Qwen 2.5 72B, Gemma 4 26B; 5,400 total scenarios) replicated the persona-versus-ontology pattern. The framework offers a reproducible, regulation-grounded route to pre-deployment assurance for enterprise AI agents, complementing runtime governance with an auditable deployment gate.

2606.04032 2026-06-05 cs.LG cs.AI cs.CL cs.PF 版本更新

Do Transformers Need Three Projections? Systematic Study of QKV Variants

Transformer 需要三个投影吗?QKV 变体的系统研究

Ali Kayyam, Anusha Madan Gopal, M Anthony Lewis

发表机构 * Ali Kayyam Anusha Madan Gopal M Anthony Lewis

AI总结 本文系统研究了注意力机制中查询、键、值投影共享的变体,发现 Q-K=V 共享在语言建模中仅以 3.1% 的困惑度损失实现 50% 的 KV 缓存减少,且与头共享结合可达到 96.9% 的缓存减少,从而支持设备端推理。

Comments Accepted at ICML 2026 (PMLR vol. 306). 26 pages, 12 figures, 16 tables. Code: https://github.com/Brainchip-Inc/Do-Transformers-Need-3-Projections

详情
AI中文摘要

Transformer 已成为各种 AI 任务的标准解决方案，其中查询、键和值（QKV）注意力公式起着核心作用。然而，这三个投影的各自贡献以及省略某些投影的影响仍知之甚少。我们系统评估了三种投影共享约束：a) Q-K=V（共享键-值），b) Q=K-V（共享查询-键），c) Q=K=V（单投影）。后两种变体产生对称注意力图；为了解决这个问题，我们还通过二维位置编码探索了非对称注意力。通过涵盖合成任务、视觉（MNIST、CIFAR、TinyImageNet、异常检测）和语言建模（在 10B 令牌上训练的 300M 和 1.2B 参数模型）的实验，我们发现我们的 Transformer 性能与 QKV Transformer 相当，有时甚至更好。在语言建模中，Q-K=V 投影共享实现了 50% 的 KV 缓存减少，仅导致 3.1% 的困惑度下降。关键的是，投影共享与头共享（GQA/MQA）互补：将 Q-K=V 与 GQA-4 结合可实现 87.5% 的缓存减少，而 Q-K=V + MQA 则达到 96.9%，从而实现了实用的设备端推理。我们表明，Q-K=V 保持了质量，因为键和值可以占据相似的表示空间，并且注意力在低秩机制下运行，而 Q=K-V 则破坏了注意力的方向性。我们的结果系统地将投影共享描述为注意力中权重绑定的一种未被充分探索的实例，具有直接、可量化的推理内存优势，尤其对边缘部署有价值。代码公开于 https://github.com/Brainchip-Inc/Do-Transformers-Need-3-Projections。

英文摘要

Transformers have become the standard solution for various AI tasks, with the query, key, and value (QKV) attention formulation playing a central role. However, the individual contribution of these three projections and the impact of omitting some remain poorly understood. We systematically evaluate three projection sharing constraints: a) Q-K=V (shared key-value), b) Q=K-V (shared query-key), and c) Q=K=V (single projection). The last two variants produce symmetric attention maps; to address this, we also explore asymmetric attention via 2D positional encodings. Through experiments spanning synthetic tasks, vision (MNIST, CIFAR, TinyImageNet, anomaly), and language modeling (300M and 1.2B parameter models on 10B tokens), we discovered that our transformers perform on par or occasionally better than the QKV transformer. In language modeling, Q-K=V projection sharing achieves 50% KV cache reduction with only 3.1% perplexity degradation. Crucially, projection sharing is complementary to head sharing (GQA/MQA): combining Q-K=V with GQA-4 yields 87.5% cache reduction, while Q-K=V + MQA achieves 96.9%, enabling practical on-device inference. We show that Q-K=V preserves quality because keys and values can occupy similar representational spaces and attention operates in a low-rank regime, whereas Q=K-V breaks attention directionality. Our results systematically characterize projection sharing as an underexplored instance of weight tying in attention, with direct, quantifiable inference memory benefits, particularly valuable for edge deployment. The code is publicly available at https://github.com/Brainchip-Inc/Do-Transformers-Need-3-Projections
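The cache figures quoted in this abstract can be reproduced with simple arithmetic. The sketch below is not taken from the paper's code. It assumes a 16-head attention layer (a head count inferred here because it makes the quoted percentages come out exactly) and counts the tensors cached per token relative to standard multi-head attention. The function name `kv_cache_reduction` is hypothetical.

```python
def kv_cache_reduction(num_heads: int, num_kv_heads: int, share_kv: bool) -> float:
    """Fraction of the per-token KV cache saved relative to standard multi-head attention."""
    # Standard attention caches one K and one V tensor for every head.
    baseline = 2 * num_heads
    # With Q-K=V sharing, a single tensor serves as both key and value.
    tensors_per_kv_head = 1 if share_kv else 2
    cached = tensors_per_kv_head * num_kv_heads
    return 1.0 - cached / baseline


NUM_HEADS = 16  # assumption: not stated in the abstract

print(kv_cache_reduction(NUM_HEADS, 16, share_kv=True))  # Q-K=V alone      -> 0.5
print(kv_cache_reduction(NUM_HEADS, 4, share_kv=True))   # Q-K=V + 4 KV heads -> 0.875
print(kv_cache_reduction(NUM_HEADS, 1, share_kv=True))   # Q-K=V + MQA      -> 0.96875
```

Under this assumption the two savings multiply: halving the cached tensors and dividing the number of KV heads are independent factors, which is consistent with the abstract's claim that projection sharing is complementary to head sharing.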

2606.03067 2026-06-05 stat.ML cs.LG 版本更新

Trajectory-Aware Node Contributions and the Limits of Static Controllability

轨迹感知的节点贡献与静态可控性的极限

Valentina Kuskova, Dmitry Zaytsev, Michael Coppedge

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 本文提出“涌现贡献”(EC)作为节点动态杠杆的有限时域度量,通过可微模型的雅可比矩阵计算,在线性时不变极限下退化为平均可控性,并构建相图刻画两者一致与分歧的条件。

Comments 11 pages, 1 figure

详情
AI中文摘要

复杂网络中的一个常见数据挖掘任务是确定单个节点如何影响系统行为。现有方法依赖于静态图中心性或控制理论量(如可控性格拉姆矩阵),这些方法假设线性时不变动力学。然而,实际估计的系统通常是非线性和时变的。我们定义了“涌现贡献(EC)”,这是一种节点动态杠杆的有限时域度量:其脉冲响应的度量加权能量沿系统轨迹累积。EC 通过任何可微模型的雅可比矩阵计算,与估计器无关,并在线性时不变极限下精确地退化为平均可控性。我们的贡献是刻画了这两种度量一致与分歧的条件。使用一个具有已知真实贡献的受控合成族,我们构建了一个跨越非线性、机制结构、持续性和扰动幅度的相图。EC 和平均可控性在静态或平滑漂移动力学下一致,并且两者都跟踪真实值。分歧在持续机制切换下出现,在持续符号反转下最强,并在移除符号反转时消失。在极端扰动幅度下,两种度量都会退化,这揭示了局部线性化的极限。我们将来自多个领域的五个估计真实系统置于该相空间中。它们的位置可作为 EC 何时提供超出静态可控性信息的诊断,从而证明其额外计算成本的合理性。在一个深入检查的面板上,一个二十种子重训练集成揭示了稳健的方差-杠杆分离:节点的扰动广泛传播,尽管其系统内方差较低,这既未被静态中心性恢复,也未被基于方差的摘要恢复。

英文摘要

A recurring data mining task in complex networks is to determine how individual nodes contribute to system behavior. Existing approaches rely on either static-graph centralities or control-theoretic quantities such as controllability Gramians, which assume linear, time-invariant dynamics. Estimated systems, however, are typically nonlinear and time-varying. We define "emergent contribution (EC)," a finite-horizon measure of a node's dynamical leverage: the metric-weighted energy of its impulse response accumulated along the system trajectory. Computed from the Jacobians of any differentiable model, EC is estimator-agnostic and reduces exactly to average controllability in the linear, time-invariant limit. Our contribution is a characterization of when the two measures agree and when they diverge. Using a controlled synthetic family with known ground-truth contribution, we construct a phase diagram spanning nonlinearity, regime structure, persistence, and perturbation amplitude. EC and average controllability agree under static or smoothly drifting dynamics and both track ground truth. Divergence emerges under persistent regime switching, is strongest under persistent sign reversal, and disappears when the sign reversal is removed. At extreme perturbation amplitudes, both measures degrade, identifying the limits of local linearization. We place five estimated real systems from several domains within this phase space. Their placement serves as a diagnostic of when EC provides information beyond static controllability and therefore justifies its additional computational cost. On one panel examined in depth, a twenty-seed retraining ensemble reveals a robust variance-leverage dissociation: some nodes' perturbations propagate widely despite low within-system variance, a pattern recovered by neither static centralities nor variance-based summaries.
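A minimal sketch of how such a quantity could be computed, assuming EC is the sum over the horizon of the metric-weighted squared norm of a unit impulse propagated through the trajectory's Jacobians. The abstract does not give the exact definition, so the indexing convention (summing from step 1 to T) and both function names are assumptions made here for illustration. The second function is the finite-horizon average controllability of a single node, used only to check the linear time-invariant limit.

```python
import numpy as np


def emergent_contribution(jacobians, node, metric=None):
    """Accumulated energy of a unit impulse at `node`, propagated through a sequence of Jacobians."""
    n = jacobians[0].shape[0]
    M = np.eye(n) if metric is None else np.asarray(metric, dtype=float)
    response = np.zeros(n)
    response[node] = 1.0  # unit impulse at the chosen node
    energy = 0.0
    for J in jacobians:
        response = np.asarray(J, dtype=float) @ response  # local linear propagation
        energy += float(response @ M @ response)          # metric-weighted energy at this step
    return energy


def average_controllability(A, node, horizon):
    """Finite-horizon sum of ||A^t e_node||^2 for t = 1..horizon (same convention as above)."""
    A = np.asarray(A, dtype=float)
    e = np.zeros(A.shape[0])
    e[node] = 1.0
    power = np.eye(A.shape[0])
    total = 0.0
    for _ in range(horizon):
        power = A @ power
        total += float(np.sum((power @ e) ** 2))
    return total


A = np.array([[0.5, 0.1], [0.0, 0.3]])
ec = emergent_contribution([A] * 5, node=0)
ac = average_controllability(A, node=0, horizon=5)
print(np.isclose(ec, ac))
```

With constant Jacobians and an identity metric the two functions compute the same sum, which mirrors the reduction to average controllability stated in the abstract. With time-varying Jacobians only the first function remains meaningful.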

2606.03100 2026-06-05 cs.CV cs.LG 版本更新

Zero-Shot 3D Question Answering via Hierarchical View-to-Token Transportation

零样本3D问答通过层级视图到令牌传输

Dongsheng Wang, Dawei Su, Hui Huang

发表机构 * Dongsheng Wang(王东生) Dawei Su(苏大卫) Hui Huang(黄慧)

AI总结 提出KeyVT方法,通过层级视图和令牌级输入上下文收集,结合像素特征与相机参数评估视图重要性,并利用最优传输识别代表性令牌,实现零样本3D问答性能提升。

Comments Accepted at ICML 2026. 19 pages, 6 figures

详情
AI中文摘要

最近，通过2D视觉-语言模型（VLM）进行零样本3D场景理解因其有前景的空间推理能力而受到越来越多的研究关注。通常，从3D点云中采样多个2D视图，并输入预训练的VLM以回答给定问题。这种范式凸显了输入上下文质量的关键作用，并提出了在有限输入预算下尽可能保留与任务相关的3D细节的挑战。我们提出了 KeyVT，一种在视图和令牌级别进行输入上下文收集的层级方法。具体来说，我们将像素特征与相机参数结合，并基于语义内容和几何位置评估视图重要性，从而得到空间一致且与任务相关的视图。此外，我们通过最优传输（OT）框架识别代表性令牌来解决选定视图中补丁之间的冗余问题，其中视图令牌和关键令牌被公式化为嵌入空间中的两个离散分布。这些关键令牌通过最小化OT距离期望覆盖所有视图特征。我们在三个广泛使用的基准上评估了我们的框架，结果表明与现有的无调优方法相比有显著改进，并且性能与基于训练的方法相当。

英文摘要

Recently, zero-shot 3D scene understanding via 2D Vision-Language Models (VLMs) has gained increasing research interest due to their promising spatial reasoning capabilities. Typically, multiple 2D views are sampled from a 3D point cloud and fed into pre-trained VLMs to answer a given question. This paradigm highlights the critical role of input context quality and raises the challenge of retaining as many task-relevant 3D details as possible under a limited input budget. We propose KeyVT, a hierarchical approach for input context collection at both the view and token levels. Specifically, we combine pixel features with camera parameters and assess view importance based on both semantic content and geometric position, resulting in spatially consistent and task-relevant views. Furthermore, we address redundancy among patches across selected views by identifying representative tokens under the optimal transport (OT) framework, where view tokens and key tokens are formulated as two discrete distributions in the embedding space. These key tokens are expected to cover all view features by minimizing the OT distance. We evaluate our framework on three widely used benchmarks, demonstrating significant improvements over existing tuning-free methods and performance comparable to training-based approaches.
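To make the OT step concrete, the sketch below scores how well a candidate set of key tokens covers the view tokens by treating both as uniform discrete distributions in embedding space. It is an illustration only: the abstract does not say how the OT problem is solved or how key tokens are optimized, so the entropic (Sinkhorn) approximation, the squared Euclidean cost, the uniform weights, and the name `sinkhorn_plan` are all assumptions introduced here.

```python
import numpy as np


def sinkhorn_plan(view_tokens, key_tokens, reg=0.1, n_iters=200):
    """Entropic-regularized transport plan and cost between two uniform token distributions."""
    # Pairwise squared Euclidean cost, rescaled to [0, 1] for numerical stability.
    diff = view_tokens[:, None, :] - key_tokens[None, :, :]
    cost = (diff ** 2).sum(axis=-1)
    cost = cost / cost.max()
    a = np.full(len(view_tokens), 1.0 / len(view_tokens))  # mass on view tokens
    b = np.full(len(key_tokens), 1.0 / len(key_tokens))    # mass on key tokens
    kernel = np.exp(-cost / reg)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)  # match column marginals
        u = a / (kernel @ v)    # match row marginals
    plan = u[:, None] * kernel * v[None, :]
    return plan, float((plan * cost).sum())


rng = np.random.default_rng(0)
views = rng.normal(size=(12, 8))  # 12 view tokens with 8-dim embeddings (toy data)
keys = views[:3]                  # candidate key tokens drawn from the views
plan, ot_cost = sinkhorn_plan(views, keys)
print(plan.shape, ot_cost)
```

A lower transport cost means the candidate key tokens sit closer to the bulk of the view tokens, which is the sense in which key tokens "cover" view features in the abstract.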

2606.03070 2026-06-05 cs.LG cs.AI 版本更新

ASymPO: Asymmetric-Scale Policy Optimization for Asynchronous LLM Post-Training Without Behavior Information

ASymPO: 用于异步大语言模型后训练的非对称尺度策略优化(无需行为信息)

Zehua Liu, Yuxuan Yao, Xiaojin Fu, Tao Zhong, Mingxuan Yuan

发表机构 * Huawei Technologies(华为技术)

AI总结 针对异步强化学习中陈旧响应导致的分布漂移问题,提出非对称尺度策略优化(ASymPO),通过归一化每个响应的令牌损失来恢复零和平衡,无需行为策略概率。

Comments incorrect proofs in the paper

详情
AI中文摘要

异步强化学习通过将响应生成与策略优化解耦来提高语言模型后训练的吞吐量,但陈旧响应会引入分布漂移。标准的行为校正方法通过行为策略概率、重要性比率或裁剪来控制这种漂移,这需要在推出和学习系统之间具有令牌对齐、版本化和数值一致的行为对数概率。我们探究是否仅使用当前策略概率就能稳定异步组相对强化学习。我们识别出一种尺度不平衡失败模式:当在当前策略下评估陈旧响应时,正负损失项可能出现在不同的负对数概率尺度上,因此零和优势不再意味着平衡的损失贡献。我们提出非对称尺度策略优化(ASymPO),它通过每个响应的当前平均令牌负对数概率来归一化其令牌损失。ASymPO不需要行为策略概率,恢复了响应级别的零和平衡,并保留了非零的学习信号。我们还引入了固定负缩放基线——缩放策略优化(SPO),并在异步数学推理后训练中评估了这两种仅当前策略的目标函数。

英文摘要

Asynchronous reinforcement learning can improve language-model post-training throughput by decoupling response generation from policy optimization, but stale responses introduce distribution drift. Standard behavior-corrected methods control this drift with behavior-policy probabilities, importance ratios, or clipping, which requires token-aligned, versioned, and numerically consistent behavior log-probabilities across rollout and learner systems. We ask whether asynchronous group-relative RL can instead be stabilized using only current-policy probabilities. We identify a scale-imbalance failure mode: when stale responses are evaluated under the current policy, positive and negative loss terms can appear at different negative log-probability scales, so zero-sum advantages no longer imply balanced loss contributions. We propose Asymmetric-Scale Policy Optimization (ASymPO), which normalizes each response's token loss by its current average token negative log-probability. ASymPO requires no behavior-policy probabilities, restores response-level zero-sum balance, and preserves a nonzero learning signal. We also introduce Scaled Policy Optimization (SPO), a fixed negative-scaling baseline, and evaluate both current-policy-only objectives in asynchronous mathematical reasoning post-training.
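The scale-imbalance effect and the proposed normalization can be illustrated numerically. This is a toy reading of the abstract, not the authors' objective: it assumes each response's loss is its advantage times its mean token negative log-probability, and that ASymPO divides by that same mean treated as a constant (a stop-gradient in an autodiff framework). The function name `response_losses` and the example numbers are hypothetical.

```python
import numpy as np


def response_losses(token_nll, advantages, normalize):
    """Per-response loss values: advantage times mean token NLL, optionally rescaled."""
    losses = []
    for nll, adv in zip(token_nll, advantages):
        mean_nll = float(np.mean(nll))
        # With normalization, the current mean NLL is used as a constant scale,
        # so every response contributes at the same magnitude per unit advantage.
        scale = mean_nll if normalize else 1.0
        losses.append(adv * mean_nll / scale)
    return np.array(losses)


# Two responses with zero-sum advantages but very different NLL scales under the current policy.
token_nll = [np.array([0.2, 0.4]), np.array([2.0, 3.0, 4.0])]
advantages = [1.0, -1.0]

plain = response_losses(token_nll, advantages, normalize=False)
balanced = response_losses(token_nll, advantages, normalize=True)
print(plain.sum(), balanced.sum())
```

In the unnormalized case the negative-advantage response dominates the summed loss even though the advantages cancel. After normalization the loss values sum to zero, while in a real implementation the gradient through the numerator would remain nonzero.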

2606.02684 2026-06-05 cs.LG cs.AI cs.CL 版本更新

Filter, Then Reweight: Rethinking Optimization Granularity in On-Policy Distillation

先过滤,再重加权:重新思考在线策略蒸馏中的优化粒度

Yuying Li, Leqi Zheng, Yongzi Yu, Wenrui Zhou, Xuchang Zhong, Xing Hu, Jing Jin, Hangjie Yuan, Tao Feng

发表机构 * THU(清华大学) HKUST(香港科技大学) BIT(北京理工大学) Meituan(美团) ZJU(浙江大学)

AI总结 针对在线策略蒸馏,提出FiRe-OPD方法,通过轨迹级过滤和令牌级软重加权实现细粒度优化,在多种设置下优于现有方法。

详情
AI中文摘要

大型语言模型中的在线策略蒸馏正从全轨迹KL监督转向更具选择性的训练范式。最近的在线策略蒸馏方法越来越关注选择哪些轨迹进行学习、哪些令牌信息量最大以及哪些监督信号最可靠。受此趋势启发,我们重新思考在线策略蒸馏的优化粒度,并提出FiRe-OPD(先过滤,再重加权),该方法在轨迹和令牌两个层面联合调整监督信号。具体来说,FiRe-OPD首先过滤轨迹以移除低质量的采样结果,然后在保留的轨迹内应用软重加权以强调信息丰富的令牌。与硬令牌选择相比,FiRe-OPD利用软加权机制有效减轻信息损失并增强优化稳定性,从而实现更细粒度的在线策略蒸馏优化。我们在强到弱、单教师和多教师设置中验证了FiRe-OPD的有效性,并展示了其相对于近期令牌级在线策略蒸馏方法的优越性(例如,在强到弱设置中AIME 2024上+6.25,在多教师设置中Miner上+18.81)。我们的代码可从此链接获取。

英文摘要

On-policy distillation (OPD) in large language models is shifting from full-trace KL supervision toward more selective training paradigms. Recent OPD methods increasingly focus on selecting which trajectories to learn from, which tokens are most informative, and which supervision signals are most reliable. Motivated by this trend, we rethink the optimization granularity of OPD and propose FiRe-OPD (Filter, then Reweight), which jointly adjusts supervision signals at both the trajectory and token levels. In detail, FiRe-OPD first filters trajectories to remove low-quality rollout samples, and then applies soft reweighting within the retained trajectories to emphasize informative tokens. Compared with hard token selection, FiRe-OPD leverages a soft-weighting mechanism to effectively mitigate information loss and enhance optimization stability, thereby achieving finer-grained OPD optimization. We validate the effectiveness of FiRe-OPD across strong-to-weak, single-teacher, and multi-teacher settings, and demonstrate its superiority over recent token-level OPD methods (e.g., +6.25 on AIME 2024 in strong-to-weak, +18.81 on Miner in multi-teacher). Our code is available at https://github.com/YuYingLi0/FiRe-OPD.

2606.02031 2026-06-05 cs.LG cs.AI cs.CL cs.CV 版本更新

OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents

OpenWebRL: 揭秘视觉网络代理的在线多轮强化学习

Rui Yang, Qianhui Wu, Yuxi Chen, Hao Bai, Wenlin Yao, Hao Cheng, Baolin Peng, Huan Zhang, Tong Zhang, Jianfeng Gao

发表机构 * UIUC(伊利诺伊大学香槟分校) Microsoft(微软)

AI总结 提出OpenWebRL框架,通过在线多轮强化学习在真实网站上训练视觉网络代理,以4B参数模型在基准测试中达到开源最优,并与闭源系统竞争。

Comments 36 pages, 11 figures

详情
AI中文摘要

构建强大的视觉网络代理需要长程推理、精确定位以及与动态真实网站的稳健交互。尽管进展迅速,最强的系统仍然大多是专有的,而开放代理仍然严重依赖于对大量策划的网络轨迹进行监督式后训练。这种依赖造成了主要的可扩展性瓶颈:高质量演示的收集成本高昂,而静态数据集对多样且不断变化的开放网络的覆盖有限。尽管在线强化学习在基于文本的代理中显示出前景,但其直接用于在实时网站上训练视觉网络代理的潜力仍未得到充分探索。在本文中,我们介绍了OpenWebRL,一个用于在真实网站上通过在线多轮强化学习训练视觉网络代理的开放框架。OpenWebRL涵盖了完整的训练流程,包括可扩展的实时浏览器基础设施、监督初始化、多模态上下文管理、轨迹级成功判断以及高效的多轮策略优化。使用该框架,我们训练了OpenWebRL-4B,在具有挑战性的实时网络基准测试中建立了新的开源最优水平。仅使用0.4K初始化轨迹和2.2K开放式强化学习训练任务,OpenWebRL-4B在Online-Mind2Web上达到67.0%的成功率,在DeepShop上达到64.0%,优于之前类似或更大规模的开放代理,并与包括OpenAI CUA和Gemini CUA在内的专有系统保持竞争力。除了强大的基准性能外,我们还系统研究了使在线强化学习对视觉网络代理有效的关键设计选择,并分析了强化学习如何改进代理推理。总体而言,我们的工作为构建更强大、可重复且成本效益更高的开放网络代理提供了一条实用路径。我们将发布我们的训练数据、模型和代码以支持未来的研究。

英文摘要

Building capable visual web agents requires long-horizon reasoning, precise grounding, and robust interaction with dynamic real-world websites. Despite rapid progress, the strongest systems remain largely proprietary, while open agents still depend heavily on supervised post-training over large collections of curated web trajectories. This dependence creates a major scalability bottleneck: high-quality demonstrations are expensive to collect, and static datasets offer limited coverage of the diverse, ever-changing open web. Although online RL has shown promise for text-based agents, its potential for training visual web agents directly on live websites remains largely underexplored. In this paper, we introduce OpenWebRL, an open framework for training visual web agents with online multi-turn RL on real websites. OpenWebRL covers the full training pipeline, including scalable live-browser infrastructure, supervised initialization, multimodal context management, trajectory-level success judging, and efficient multi-turn policy optimization. Using this framework, we train OpenWebRL-4B, which establishes a new open-source state of the art on challenging live-web benchmarks. With only 0.4K initialization trajectories and 2.2K open-ended RL training tasks, OpenWebRL-4B achieves 67.0% success on Online-Mind2Web and 64.0% on DeepShop, outperforming prior open agents of similar or larger scale and remaining competitive with proprietary systems including OpenAI CUA and Gemini CUA. Beyond strong benchmark performance, we systematically study the key design choices that make online RL effective for visual web agents, and analyze how RL improves agentic reasoning. Overall, our work offers a practical path toward building more capable, reproducible, and cost-efficient open web agents. We will release our training data, models, and code to support future research.

2605.31278 2026-06-05 cs.AI cs.LG stat.ME 版本更新

Industrializing Prediction-Powered Inference: The GLIDE Library for Reliable GenAI and Agentic Systems Evaluation

工业化预测驱动推断:用于可靠生成式AI与智能体系统评估的GLIDE库

Grégoire Martinon, Ibrahim Merad, Mohammed Raki

发表机构 * University of California, Berkeley(加州大学伯克利分校) Google Research(谷歌研究院)

AI总结 提出GLIDE开源库,统一多种预测驱动推断方法,提供无偏估计与有效置信区间,显著降低人工标注成本。

Comments 8 pages, Accepted to the ICML 2026 Workshop on Statistical Frameworks for Uncertainty in Agentic Systems, Seoul, South Korea, 2026

详情
AI中文摘要

智能体系统的可靠评估需要具有有效不确定性的无偏估计,但标准实践在昂贵的人工标注和有偏的LLM-as-judge代理之间权衡。预测驱动推断(PPI)将两者结合为具有有效置信区间的去偏估计,然而其各种方法仍分散在不同论文的部分实现中。我们介绍GLIDE,一个开源Python库,它在专用于均值估计的scipy风格API下统一了最先进的PPI估计器(PPI++、分层PPI、先预测后去偏及其分层变体、主动统计推断)和采样器(均匀、分层、主动、成本最优)。GLIDE附带一个可复现的蒙特卡洛验证套件、一个基于经验的决策树用于方法选择,以及一个智能体评估案例研究,显示在同等精度下显著节省标注成本。GLIDE包可通过此URL获取:https://github.com/EmertonData/glide

英文摘要

Reliable evaluation of agentic systems requires unbiased estimates with valid uncertainty, but standard practice navigates between costly human annotation and biased LLM-as-judge proxies. Prediction-powered inference (PPI) combines both into debiased estimates with valid confidence intervals, yet its various methods remain scattered across papers under partial implementations. We introduce GLIDE, an open-source Python library that unifies state-of-the-art PPI estimators (PPI++, Stratified PPI, Predict-Then-Debias and its stratified variants, Active Statistical Inference) and samplers (uniform, stratified, active, cost-optimal) under a scipy-style API specialized to mean estimation. GLIDE ships with a reproducible Monte Carlo validation suite, an empirically grounded decision tree for method selection, and an agentic evaluation case study showing substantial annotation savings at equivalent precision. The GLIDE package is available at this URL: https://github.com/EmertonData/glide
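For readers unfamiliar with PPI, the basic mean estimator adds a bias correction measured on the human-labeled subset to the judge's average over the unlabeled pool. The sketch below implements that classic estimator with a normal-approximation confidence interval. It does not use GLIDE's API (which is not described in the abstract) and omits refinements such as PPI++ power tuning or stratification. The function name `ppi_mean` and the toy numbers are hypothetical.

```python
import numpy as np


def ppi_mean(y_labeled, f_labeled, f_unlabeled, z=1.96):
    """Classic prediction-powered mean estimate with a normal-approximation interval."""
    y = np.asarray(y_labeled, dtype=float)
    f_lab = np.asarray(f_labeled, dtype=float)
    f_unlab = np.asarray(f_unlabeled, dtype=float)
    rectifier = y - f_lab  # judge error measured on human-labeled items
    estimate = f_unlab.mean() + rectifier.mean()
    # Variances of the two independent sample means add.
    se = np.sqrt(f_unlab.var(ddof=1) / len(f_unlab) + rectifier.var(ddof=1) / len(rectifier))
    return estimate, (estimate - z * se, estimate + z * se)


# Toy data: human labels and judge scores on 4 items, judge scores alone on 5 more.
estimate, interval = ppi_mean(
    y_labeled=[1, 0, 1, 1],
    f_labeled=[0.8, 0.2, 0.6, 0.8],
    f_unlabeled=[0.5, 0.7, 0.9, 0.3, 0.6],
)
print(estimate, interval)
```

The judge alone would report a mean of 0.6 on the unlabeled pool. The labeled subset shows it underestimates by 0.15 on average, so the debiased estimate is 0.75.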

2605.27991 2026-06-05 stat.ML cs.LG 版本更新

Gradient-Flow Optimization as Dynamic Random-Effects Inference: Testing and Early Stopping with Applications to Deep Learning

梯度流优化作为动态随机效应推断：检验与早停及其在深度学习中的应用

Minhao Yao, Ruoyu Wang, Xihong Lin, Lin Liu, Zhonghua Liu

发表机构 * Centre for Biomedical Data Science, Duke-NUS Medical School, National University of Singapore(新加坡国立大学杜克-国大医学院生物医学数据科学中心) Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA(美国马萨诸塞州波士顿，哈佛大学陈曾熙公共卫生学院生物统计学系) Institute of Natural Sciences, MOE-LSC, School of Mathematical Sciences, CMA-Shanghai, SJTU-Yale Joint Center of Biostatistics and Data Science, Shanghai Jiao Tong University(上海交通大学自然科学研究院、MOE-LSC、数学科学学院、CMA-上海、上海交大-耶鲁生物统计与数据科学联合中心) Department of Biostatistics, Columbia University, New York, NY, USA(美国纽约州纽约市，哥伦比亚大学生物统计学系)

AI总结 本文提出深度神经网络训练与经典随机效应模型等价,揭示了优化-推断对偶性,并利用限制最大似然估计实现基于似然的早停规则。

详情
AI中文摘要

深度神经网络(DNN)取得了显著的实证成功,但其训练动态主要从优化而非统计原理的角度被理解。本文通过证明连续时间神经正切核(NTK)梯度流产生的预测与经典随机效应模型的预测完全等价,为过参数化机制下的DNN训练建立了一个统计框架。在该框架中,训练时间充当方差分量,或等价地作为经验贝叶斯协方差超参数,控制噪声到结构化信号的变异分配。这种等价性揭示了一种优化-推断对偶性:梯度流路径既是优化轨迹,也是经验贝叶斯随机效应推断路径。以训练时间为条件,网络输出是潜在信号的后验均值,通过限制最大似然估计(REML)估计训练时间,将早停转化为基于似然的经验贝叶斯推断,而非外部调参。这一视角产生了一个两阶段推断程序。首先,方差分量检验确定DNN训练是否捕捉到初始化之外的统计显著结构。其次,以训练合理为条件,REML提供基于似然的早停规则。由此产生的停止时间在NTK特征基下具有谱解释,其中训练持续到谱损失去相关实现。我们进一步证明,对于固定设计下的样本内预测,REML引导的早停实现了渐近最优预测误差,并且在额外的随机设计正则条件下,对于样本外预测也成立。这项工作将DNN训练重新定义为统计推断,并为决定是否以及训练深度神经网络多长时间提供了原则性基础。

英文摘要

Gradient-flow optimization is usually viewed as an algorithmic procedure for minimizing empirical loss, with training duration selected by validation or heuristic early-stopping rules. We develop a statistical inference framework for the gradient-flow training trajectory itself. The central object is fixed-operator squared-error gradient flow: whenever the fitted value evolves through a time-invariant positive semidefinite training operator, the trained model output at each training time is exactly equivalent to the best linear unbiased predictor, or empirical-Bayes posterior mean, under a corresponding random-effects model. Under this representation, training time becomes a variance-component parameter governing how variance is reallocated from residual noise to structured signal. This turns two basic training decisions into inferential problems. First, whether training is needed is formulated as a variance-component test for signal beyond initialization. Second, how long to train is formulated as restricted maximum likelihood (REML) estimation of the training-time variance component. The resulting REML-guided early stopping rule has a spectral interpretation: it selects the training time at which optimized spectral losses become empirically decorrelated from the eigenvalues of the training operator, yielding an effective degrees-of-freedom measure for the evolving trained model. We establish asymptotic prediction optimality for fixed-design in-sample risk and, under additional kernel regularity conditions, random-design out-of-sample risk. Deep learning models in fixed-kernel gradient regimes provide canonical modern-AI instantiations of the theory. Numerical experiments and a UK Biobank proteomics application show that the proposed inferential approach attains competitive prediction accuracy while reducing the reliance on validation splits and repeated checkpoint evaluation.
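The claimed equivalence can be checked numerically in a simple case. The sketch assumes squared-error loss, zero initialization, and a fixed positive semidefinite operator K, so that the flow df/dt = -K(f - y) gives f_t = (I - exp(-tK)) y. It then builds one random-effects covariance, Sigma_t = sigma^2 (exp(tK) - I), whose BLUP Sigma_t (Sigma_t + sigma^2 I)^{-1} y matches f_t. This covariance is a construction chosen here to make the identity hold, and the paper's own parameterization may differ. Both function names are hypothetical.

```python
import numpy as np


def gradient_flow_fit(K, y, t):
    """Fitted values at time t for the flow df/dt = -K (f - y) started from f = 0."""
    eigvals, eigvecs = np.linalg.eigh(K)
    shrink = 1.0 - np.exp(-t * eigvals)  # per-eigendirection shrinkage factor
    return eigvecs @ (shrink * (eigvecs.T @ y))


def blup_fit(K, y, t, noise_var=1.0):
    """BLUP of the signal under y = b + e, with b ~ (0, Sigma_t) and e ~ (0, noise_var * I)."""
    eigvals, eigvecs = np.linalg.eigh(K)
    # Assumed signal covariance: same eigenvectors as K, eigenvalues noise_var * (exp(t * lambda) - 1).
    signal_var = noise_var * np.expm1(t * eigvals)
    sigma_t = eigvecs @ np.diag(signal_var) @ eigvecs.T
    return sigma_t @ np.linalg.solve(sigma_t + noise_var * np.eye(len(y)), y)


rng = np.random.default_rng(0)
B = rng.normal(size=(5, 3))
K = B @ B.T  # a rank-3 positive semidefinite training operator
y = rng.normal(size=5)
print(np.allclose(gradient_flow_fit(K, y, 0.3), blup_fit(K, y, 0.3)))
```

In each eigendirection the BLUP shrinkage s / (s + sigma^2) with s = sigma^2 (e^{t lambda} - 1) simplifies to 1 - e^{-t lambda}, which is why training time behaves like a variance-component parameter in this toy setting.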

2605.27292 2026-06-05 cs.LG stat.ML 版本更新

Detectability in Diversity: Improved Canary Crafting for Privacy Auditing in One Run

多样性中的可检测性:单次运行中用于隐私审计的改进金丝雀构造

Mathieu Dagréou, Aurélien Bellet

发表机构 * PreMeDICaL team, Inria Idesp, Inserm, Univ. de Montpellier(PreMeDICaL团队,Inria Idesp,Inserm,蒙彼利埃大学)

AI总结 针对单次运行隐私审计中金丝雀点相互干扰导致隐私泄露估计偏弱的问题,提出结合影响函数贪婪初始化与双层优化的方法,最大化金丝雀可检测性并促进嵌入空间多样性,以较低计算成本获得更强的隐私泄露估计。

详情
AI中文摘要

隐私审计旨在利用成员推断攻击(MIAs)实证评估机器学习模型中的隐私泄露,并推导差分隐私(DP)参数的下界。最近的单次运行审计方法通过依赖单个训练运行和多个“金丝雀”点来降低标准方法的高成本,审计者需要检测这些点是否被包含或排除。在这项工作中,我们研究了为单次运行隐私审计高效构造金丝雀的问题。受最近理论见解的启发,即金丝雀之间的干扰导致与多次运行方法相比更弱的泄露估计,我们提出优化金丝雀使其既高度可检测又最小化干扰。我们的方法结合了基于影响函数的贪婪初始化与双层优化过程,该过程最大化可区分性同时促进嵌入空间中的多样性,从而能够使用计算高效的双层算法。实验表明,与现有的金丝雀构造方法相比,我们的方法以更低的计算成本实现了更强的隐私泄露估计。

英文摘要

Privacy auditing aims to empirically assess privacy leakage in machine learning models using membership inference attacks (MIAs), and to derive lower bounds on differential privacy (DP) parameters. Recent one-run auditing methods address the high cost of standard approaches by relying on a single training run with multiple "canary" points whose inclusion or exclusion must be detected by the auditor. In this work, we study the problem of efficiently crafting canaries for one-run privacy auditing. Motivated by recent theoretical insights suggesting that interference between canaries contributes to weaker leakage estimates compared to multi-run methods, we propose to optimize canaries to be both highly detectable and minimally interfering. Our approach combines a greedy initialization based on influence functions with a bilevel optimization procedure that maximizes distinguishability while promoting diversity in embedding space, enabling the use of computationally efficient bilevel algorithms. Experiments show that our method achieves stronger privacy leakage estimates at a lower computational cost than existing canary crafting approaches.

2605.26046 2026-06-05 cs.CL cs.AI cs.LG cs.MA cs.SE 版本更新

When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges

当梯度冲突时:多目标提示优化用于LLM评判器的失败模式

Parth Darshan, Abhishek Divekar

发表机构 * IIT Jodhpur(印度理工学院乔普里尔) Amazon(亚马逊)

AI总结 研究多目标文本梯度优化中梯度稀释和指令干扰两种失败模式,通过分解优化器信息共享方式揭示性能下降原因。

Comments Accepted at ACL 2026 - CustomNLP4U Workshop. Code, prompts and data available at https://github.com/adivekar-utexas/when-gradients-collide

详情
AI中文摘要

将LLM评判器定制到特定问题或领域通常需要同时跨多个评估标准优化其提示。文本梯度方法针对单一评判标准实现了自动化，但它们产生的是自然语言批评，而非数值向量。因此，多任务学习的冲突解决工具包（PCGrad、MGDA）不适用于这种多目标文本梯度设置。我们将TextGrad扩展到多目标设置，并通过改变损失、梯度和优化器LLM共享跨目标信息的程度，测试了文本梯度优化器的四种分解模式。我们发现，当梯度LLM必须联合对多个标准提供反馈时，梯度的任务聚焦度下降了59%（满分10分，从9.0降至3.7）。另外，我们观察到将各单目标优化后的指令简单组合成单个提示会使斯皮尔曼相关系数从0.305降至0.220（-0.085）。这些结果识别出两种可分离的失败模式：优化时的梯度稀释和推理时的指令干扰，它们共同限制了使用文本反馈进行多目标评判器优化的设计空间。

英文摘要

Customizing an LLM judge to a specific problem or domain often involves optimizing its prompt across multiple evaluation criteria simultaneously. Textual gradient methods automate this for a single judge criterion, however they produce natural-language critiques, not numerical vectors. Thus, the conflict-resolution toolkit of multi-task learning (PCGrad, MGDA) does not apply to this multi-objective textual gradient setting. We extend TextGrad to the multi-objective setting and test four decomposition modes of textual gradient optimizers by varying how much cross-objective information the loss, gradient and optimizer LLMs share. We find the gradient's task-focus drops by 59% (9.0 to 3.7 out of 10) when the gradient LLM must provide feedback on multiple criteria jointly. Separately, we observe that naively combining single-objective optimized instructions into a single prompt degrades Spearman rho from 0.305 to 0.220 (-0.085). These results identify two separable failure modes: optimization-time gradient dilution and inference-time instruction interference, which together constrain the design space for multi-objective judge optimization using textual feedback.

2603.19294 2026-06-05 cs.LG cs.AI cs.CL 版本更新

Maximizing Mutual Information Between Prompt and Response Improves LLM Performance With No Additional Data

最大化提示与响应之间的互信息无需额外数据即可提升LLM性能

Hyunji Nam, Haoran Li, Natasha Jaques

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出互信息偏好优化(MIPO)方法,通过对比数据增强构建偏好对,利用直接偏好优化最大化提示与响应间的点互信息,无需额外数据或外部监督即可提升LLM在个性化和可验证任务上的性能。

Comments International Conference on Machine Learning 2026

详情
AI中文摘要

虽然后训练已在多个领域成功改进了大型语言模型（LLM），但这些提升严重依赖人工标注数据或外部验证器。现有数据已被充分利用，而新数据收集成本高昂。此外，真正的智能远不止可验证任务。因此，我们需要较少依赖外部信号且更广泛适用于可验证和不可验证领域的自我改进框架。我们提出**互信息偏好优化（MIPO）**，一种对比数据增强方法，通过基于正确提示生成正响应，以及基于随机无关提示生成负响应来构建偏好对。我们证明，使用直接偏好优化从这些配对数据中学习，可以最大化*基础LLM*下提示与响应之间的逐点互信息。使用1-7B参数的Llama和Qwen指令模型的实验表明，与提示基线相比，MIPO在个性化任务上实现了3-16%的提升（Qwen2.5-1.5B-Instruct提升51%）。令人惊讶的是，MIPO在可验证领域（如数学和多项选择题问答）也有用，*无需任何额外数据或外部监督*即可获得1-20%的提升。这些结果表明，利用对比数据对中的内在信号进行自我改进是一个有前景的方向。

英文摘要

While post-training has successfully improved large language models (LLMs) across a variety of domains, these gains heavily rely on human-labeled data or external verifiers. Existing data has already been exploited, and new data is expensive to collect. Moreover, true intelligence goes far beyond verifiable tasks. Therefore, we need self-improvement frameworks that are less dependent on external signals and more broadly applicable to both verifiable and non-verifiable domains. We propose **Mutual Information Preference Optimization (MIPO)**, a contrastive data augmentation method that constructs preference pairs by generating a positive response conditioned on the correct prompt, and a negative response conditioned on a random, unrelated prompt. We show that using Direct Preference Optimization to learn from this paired data maximizes pointwise mutual information *under the base LLM* between prompts and model responses. Experiments with 1-7B parameter Llama and Qwen instruct models show that MIPO achieves 3-16% gains (and a 51% increase for Qwen2.5-1.5B-Instruct) on personalization compared to prompting baselines. Surprisingly, MIPO can also be useful in verifiable domains, such as math and multiple-choice question answering, yielding 1-20% gains *without any additional data or external supervision*. These results suggest a promising direction for self-improvement using intrinsic signals derived from contrastive data pairs.
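The pair-construction step described in this abstract can be sketched in a few lines. This is a hedged illustration rather than the authors' pipeline: `generate` stands in for sampling from the base LLM, the record keys follow a common DPO dataset layout, and the function name `build_mipo_pairs` is hypothetical.

```python
import random


def build_mipo_pairs(prompts, generate, seed=0):
    """Build preference pairs: chosen uses the true prompt, rejected uses an unrelated one."""
    if len(set(prompts)) < 2:
        raise ValueError("need at least two distinct prompts to sample an unrelated one")
    rng = random.Random(seed)
    pairs = []
    for prompt in prompts:
        # Sample a different prompt to condition the negative response on.
        other = rng.choice([p for p in prompts if p != prompt])
        pairs.append(
            {
                "prompt": prompt,
                "chosen": generate(prompt),   # response conditioned on the correct prompt
                "rejected": generate(other),  # response conditioned on an unrelated prompt
            }
        )
    return pairs


# Stub generator for illustration; a real setup would sample from the base model.
def fake_generate(prompt):
    return "answer to " + prompt


for pair in build_mipo_pairs(["What is 2+2?", "Name a red fruit.", "Translate 'cat'."], fake_generate):
    print(pair)
```

Because both responses come from the same model, the only systematic difference between chosen and rejected is whether the response was conditioned on the prompt it is paired with, which is the signal the abstract relates to pointwise mutual information.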

2605.25582 2026-06-05 cs.LG cs.AI 版本更新

Extreme Region Policy Distillation

极端区域策略蒸馏

Changyu Chen, Xiting Wang, Rui Yan

发表机构 * Gaoling School of Artificial Intelligence, Renmin University of China(中国人民大学耿丽人工智能学院) Wuhan University(武汉大学)

AI总结 提出极端区域策略蒸馏(ERPD)两阶段框架,通过解耦样本效率与KL效率,在固定数据上先进行弱约束离策略优化以最大化提取训练信号,再在信任区域约束下蒸馏到基础策略,从而在数学推理任务中实现更好的性能与更小的KL散度。

详情
AI中文摘要

大语言模型的强化学习面临样本效率与渐近性能之间的基本权衡:严格在策略方法在单次更新后丢弃轨迹,而离策略重用引入分布不匹配,现有信任区域技术主要通过强制保守优化来缓解,往往未充分利用丰富的训练信号。为研究此问题,我们在固定数据上执行大量离策略更新。实验揭示,激进的多步优化带来快速初始增益,但过度更新导致轨迹概率偏离和熵崩溃,性能早期停滞。收紧KL约束仅降低上限而不解决退化。这促使我们提出极端区域策略蒸馏(ERPD),一个两阶段框架,将样本效率与KL效率解耦。第一阶段在固定数据上执行弱约束离策略优化,以最大化提取训练信号。所得策略提供令牌级监督。第二阶段,我们在信任区域约束下将这些信号蒸馏到基础策略中,过滤有害漂移同时保留有用信号。蒸馏后的策略以显著更小的KL散度达到相当或更好的性能,表明第一阶段的大部分散度用于不必要的漂移而非真正改进。关键的是,ERPD同时适应强和弱教师:当激进优化未产生更强策略时,即使是退化教师也能通过替代信号构建策略提供有效监督。我们在数学推理上验证ERPD,显示出对强基础模型(在策略训练停滞时)的增益,以及使用弱教师的可靠改进。

英文摘要

Reinforcement learning for large language models faces a fundamental trade-off between sample efficiency and asymptotic performance: strictly on-policy methods discard trajectories after a single update, while off-policy reuse introduces distribution mismatch that existing trust-region techniques mitigate primarily by enforcing conservative optimization, often leaving rich training signals underutilized. To investigate this, we perform extensive off-policy updates on fixed data. Our experiments reveal that aggressive multi-step optimization brings rapid initial gains, but excessive updates cause trajectory probabilities to deviate and entropy to collapse, with performance plateauing early. Tightening KL constraints merely lowers the ceiling without resolving the degradation. This motivates Extreme Region Policy Distillation (ERPD), a two-stage framework that decouples sample efficiency from KL efficiency. The first stage performs weakly constrained off-policy optimization on fixed data to maximally extract training signals. The resulting policy provides token-level supervision. In the second stage, we distill these signals into the base policy under trust-region constraints, filtering harmful drift while preserving useful signals. The distilled policy achieves comparable or better performance with substantially smaller KL divergence, indicating that much of the first-stage divergence was spent on unnecessary drift rather than genuine improvement. Crucially, ERPD accommodates both strong and weak teachers: when aggressive optimization yields no stronger policy, even degenerate teachers provide effective supervision via alternative signal construction strategies. We validate ERPD on mathematical reasoning, showing gains for strong base models where on-policy training plateaus, and reliable improvements with weak teachers.

2507.02628 2026-06-05 cs.LG 版本更新

A Generative Approach for Semantic Auditing of Electronic Health Records

电子健康记录语义审计的生成式方法

Irena Girshovitz, Atai Ambus, Moni Shahar, Ran Gilad-Bachrach

发表机构 * School of Biomedical Engineering, Faculty of Engineering, Tel Aviv University(特拉维夫大学生物医学工程学院,工程学院) AI and Data Science Center of Tel Aviv University (TAD)(特拉维夫大学人工智能与数据科学中心(TAD)) Safra Center for Bioinformatics, Tel Aviv University(特拉维夫大学萨弗拉生物信息学中心)

AI总结 提出一种基于大语言模型的生成式方法,通过语义数据覆盖和检索增强生成架构自动检测电子健康记录与流行病学证据之间的不一致性,实现可扩展的语义审计。

Comments 23 pages, 5 figures (+ appendix)

详情
AI中文摘要

临床人工智能的可靠性依赖于高质量数据,然而电子健康记录常常与现有科学知识不一致。当前的质量评估存在局限性:要么关注语法,要么依赖劳动密集型的手工规则来捕捉语义细微差别。为了克服这些可扩展性障碍,我们提出了Medical Data Pecking,一种采用软件单元测试原则进行医疗数据验证的方法。它引入了语义数据覆盖,利用大语言模型生成上下文感知的测试,以“啄出”观察数据与流行病学证据之间的不一致性。为了演示这种方法,我们使用检索增强生成架构实现了一个参考工具,将医学文献综合为可执行代码。当应用于三个数据集时,该实现为每个队列生成了数十个测试,识别出观察分布与流行病学先验之间的差异。这些差异既包括真实的数据不一致性,也包括预期的队列选择效应。这项工作为可扩展的语义审计提供了一个初步框架,将保证从手工规则转向可信AI所需的生成式和上下文敏感的验证。

英文摘要

The reliability of clinical artificial intelligence (AI) depends on high-quality data, yet Electronic Health Records are often inconsistent with existing scientific knowledge. Current quality assessments are limited: they either focus on syntax or rely on labor-intensive manual rules to capture semantic nuances. To overcome these scalability barriers, we propose Medical Data Pecking, a methodology that adopts software unit testing principles for medical data validation. It introduces Semantic Data Coverage, employing Large Language Models to generate context-aware tests that "peck" for inconsistencies between observed data and epidemiological evidence. To demonstrate this methodology, we implemented a reference tool using a Retrieval-Augmented Generation architecture that synthesizes medical literature into executable code. When applied to three datasets, this implementation generated dozens of tests per cohort, identifying discrepancies between observed distributions and epidemiological priors. These discrepancies encompass both genuine data inconsistencies and expected cohort-selection effects. This work provides an initial framework for scalable semantic auditing, shifting assurance from manual rules to the generative and context-sensitive verification required for trustworthy AI.

2605.24059 2026-06-05 cs.LG cs.AI 版本更新

Spectral Probe-Circuits: A Three-Step Recipe for Identifying Attention-Head Circuits in Pretrained Transformers

频谱探测电路:识别预训练Transformer中注意力头电路的三步法

Yongzhong Xu

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出一种三步法,通过频谱信号排序、任务模式筛选和组消融因果验证,无需标签即可识别预训练Transformer中执行持续内容依赖计算的注意力头电路,并在多个模型上验证了其通用性和因果必要性。

Comments 35 pages, 4 figures

详情
AI中文摘要

我们提出了一种三步法,用于识别预训练Transformer中的注意力头电路。每个头的频谱信号(即每个头注意力输出的时间积分参与比)可以在没有标签或归因梯度的情况下,对执行持续内容依赖计算的头进行排序。任务模式筛选将此通用指标过滤为特定任务的候选电路,而针对匹配随机对照的组消融则完成因果论证。我们在8倍参数范围(5100万至10亿活跃/70亿总参数)、两种架构族(密集型和混合专家)以及四种预训练流程上进行了验证。该方法是可移植的:一个2-6头的归纳电路在每个测试模型中都是因果必需的,消融后合成归纳top-1下降94-100%。频谱信号在无监督下具有预测性:在5100万参数探测模型的六个独立种子上,相同的计算识别出每个种子上的种子特定电路。在Pythia族(1.24亿至4.1亿)中,执行可识别专门计算的头比例保持在17-19%,而特定归纳电路保持3-11个头,与总头数呈次线性关系。本文是一个三篇论文计划的方法论锚点;配套论文将该方法扩展到预训练期间的发展轨迹以及组合任务电路,其中模式选择性与任务因果结构解耦。

英文摘要

We present a three-step recipe for identifying attention-head circuits in pretrained transformers. A per-head spectral signal -- the time-integrated participation ratio of each head's attention output -- ranks heads doing sustained content-dependent computation without labels or attribution gradients. A task-pattern screen filters this general indicator into a task-specific candidate circuit, and group ablation against a matched-random control completes the causal claim. We validate across an 8x parameter range (51M to 1B-active / 7B-total), two architecture families (dense, mixture-of-experts), and four pretraining pipelines. The recipe ports: a 2-6 head induction circuit is causally necessary in every model tested, with a 94-100% drop in synthetic-induction top-1 after ablation. The spectral signal is predictive without supervision: on six independent seeds of a 51M-parameter probe model, the same computation identifies the seed-specific circuit on each seed. The fraction of heads doing identifiable specialized computation is conserved at 17-19% across the Pythia family (124M to 410M), while specific induction circuits stay 3-11 heads -- sublinear in total head count. This paper is the methodology anchor of a three-paper program; companion papers extend the recipe to developmental trajectories during pretraining and to composed-task circuits where pattern selectivity decouples from task-causal structure.
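The per-head spectral signal can be illustrated with a small sketch. The participation ratio of a covariance matrix C is (sum of eigenvalues)^2 / (sum of squared eigenvalues), which equals tr(C)^2 / tr(C^2) and so needs no eigendecomposition. The abstract does not say how the ratio is integrated over time, so the windowing below (`time_integrated_pr`, summing over windows of token positions) and the helper name `rank_heads` are assumptions for illustration, not the paper's implementation.

```python
def participation_ratio(rows):
    """Participation ratio tr(C)^2 / tr(C^2) of the covariance C of `rows`.

    `rows` is a list of equal-length feature vectors, e.g. one head's
    attention output at successive token positions.
    """
    n = len(rows)
    d = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]
    trace = sum(cov[i][i] for i in range(d))
    trace_sq = sum(cov[i][j] * cov[j][i] for i in range(d) for j in range(d))
    return (trace * trace) / trace_sq if trace_sq > 0 else 0.0


def time_integrated_pr(windows):
    """Assumed form of 'time integration': sum the ratio over windows."""
    return sum(participation_ratio(w) for w in windows)


def rank_heads(head_windows):
    """Rank head names by descending time-integrated participation ratio."""
    scores = {h: time_integrated_pr(w) for h, w in head_windows.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: one head spreads variance over two directions, one collapses it.
spread = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
collapsed = [[1.0, 0.0], [-1.0, 0.0], [2.0, 0.0], [-2.0, 0.0]]
print(rank_heads({"head_a": [collapsed], "head_b": [spread]}))
```

A higher ratio means the head's output occupies more directions; the recipe described in the abstract then screens and ablates the top-ranked heads, which this sketch does not cover.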

2605.23809 2026-06-05 eess.SY cs.LG cs.SY 版本更新

Advanced AI Service Provisioning in O-RAN through LLM Engine Integration

通过LLM引擎集成在O-RAN中的高级AI服务提供

Seyed Bagher Hashemi Natanzi, Pranshav Gajjar, Bo Tang, Vijay K. Shah

发表机构 * Department of Electrical and Computer Engineering, Worcester Polytechnic Institute(电气与计算机工程系,伍斯特理工学院) Department of Electrical and Computer Engineering, North Carolina State University(电气与计算机工程系,北卡罗来纳州立大学)

AI总结 提出一种双脑架构,结合LLM的推理能力和轻量级ML引擎的实时性,实现O-RAN中AI服务的自动化部署与配置。

详情
AI中文摘要

开放无线接入网络(O-RAN)架构允许通过模块化的xApp和rApp将AI直接嵌入到RAN中,然而创建这些应用程序——收集数据、训练模型、编写代码以及安全部署它们——仍然缓慢且主要依赖人工。大型语言模型(LLM)具有强大的推理和代码生成能力,但不适合实时RAN控制所需的快速、确定性推理。我们提出了一种概念验证的双脑架构,结合了两者的优势:基于LLM的编排器将运营商意图转化为数据收集策略和部署代码,而自动化ML引擎NeuralSmith通过API按需训练轻量级分类器。我们描述了架构和提供工作流,分享了来自容器化O-RAN 5G SA测试平台的实际见解,并讨论了开放的研究方向。

英文摘要

The Open Radio Access Network (O-RAN) architecture allows AI to be embedded directly into the RAN through modular xApps and rApps, yet creating these applications (collecting data, training models, writing code, and deploying them safely) remains slow and largely manual. Large Language Models (LLMs) offer strong reasoning and code-generation capabilities but are unsuited for the fast, deterministic inference required in real-time RAN control. We present a proof-of-concept Dual-Brain architecture that combines both strengths: an LLM-based orchestrator translates operator intents into data-collection policies and deployment code, while an automated ML engine, NeuralSmith, trains lightweight classifiers on demand via an API. We describe the architecture and provisioning workflow, share practical insights from a containerized O-RAN 5G SA testbed, and discuss open research directions.

2605.23453 2026-06-05 cs.LG 版本更新

Class-Dependent Hybrid Data Augmentation for Multiclass Migraine Classification under Severe Class Imbalance

严重类别不平衡下多类别偏头痛分类的类别依赖混合数据增强

Elvin Somón, Miguel A. Gutiérrez-Naranjo

发表机构 * University of Santiago de Compostela(圣地亚哥-德孔波斯特拉大学)

AI总结 针对偏头痛分类中严重类别不平衡问题,提出一种基于类别样本量的混合数据增强策略,并引入保真度不对称概念,在纠正数据泄露和指标偏差后,显著提升了多分类器的平均鲁棒性。

详情
AI中文摘要

我们以可重复性为导向,对先前的偏头痛分类研究进行了重新评估,纠正了数据泄露和指标偏差。然后我们引入了(i)一种临床驱动的两个偏瘫亚型聚合(遵循ICHD-3 §1.2.3),(ii)一种类别依赖的混合增强策略,根据每类样本量分配生成方法,以及(iii)保真度不对称的概念,激励按比例约束的增长作为完全类别平衡的替代方案。实验在包含400名患者、七种偏头痛亚型的数据集上进行,采用两阶段协议,包括上述六类配置。模型使用分层5折交叉验证进行评估,以宏平均F1作为主要指标。纠正方法缺陷降低了先前膨胀的性能估计,修正后的宏F1基线为0.71。所提出的框架在八个评估分类器的宏平均F1上持续优于单个增强器(0.862对比高斯Copula的0.836、CTGAN的0.815和无增强基线的0.801),并在比例增强下使用FT-Transformer达到峰值结果0.914。无增强的FT-Transformer基线(0.896)表明,在每分类器上限处,临床驱动的类别聚合贡献了大部分绝对改进;该框架的主要可测量贡献是跨分类器的平均鲁棒性提升,凸显了问题表述的主导作用。

英文摘要

We conducted a reproducibility-oriented re-evaluation of prior migraine classification studies, correcting for data leakage and metric bias. We then introduced (i) a clinically motivated aggregation of two hemiplegic subtypes following ICHD-3 §1.2.3, (ii) a class-dependent hybrid augmentation strategy that assigns generation methods based on per-class sample size, and (iii) the concept of fidelity asymmetry, motivating proportionally constrained growth as an alternative to full class balance. Experiments were performed on a dataset of 400 patients across seven migraine subtypes under a two-stage protocol, including the six-class configuration described above. Models were evaluated using stratified 5-fold cross-validation with macro-averaged F1 as the primary metric. Correcting methodological flaws reduces previously inflated performance estimates, with the corrected macro-F1 baseline standing at 0.71. The proposed framework consistently outperformed individual augmenters in macro-F1 averaged across the eight evaluated classifiers (0.862 vs. 0.836 for Gaussian Copula, 0.815 for CTGAN, and 0.801 for the no-augmentation baseline), and achieved its peak result of 0.914 with FT-Transformer under proportional augmentation. The no-augmentation FT-Transformer baseline (0.896) shows that, at the per-classifier ceiling, clinically motivated class aggregation accounts for most of the absolute improvement; the framework's principal measurable contribution is the gain in average robustness across classifiers, highlighting the dominant role of problem formulation.
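Macro-averaged F1, the primary metric named above, weights every class equally, which is why it is preferred over accuracy under severe class imbalance. The sketch below is a plain-Python illustration of the standard definition, not the authors' evaluation code; the function name `macro_f1` and the toy labels are made up for the example.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); a class with no hits scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# A classifier that always predicts the majority class: accuracy is 0.8,
# but the missed minority class pulls macro-F1 down to 4/9.
y_true = [0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 0, 0]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, macro_f1(y_true, y_pred))
```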

2605.23415 2026-06-05 cs.LG cs.AI 版本更新

Reflex: Reinforcement Learning with Reflection Symmetry Exploitation in State-Based Continuous Control

Reflex: 基于状态连续控制中利用反射对称性的强化学习

Shuai Zhen, Yifan Zhang, Yuling Wang, Yanhua Yu

AI总结 提出Reflex框架,通过反射对称性正则化机制将反射对称性融入策略学习,提升基于状态的连续控制任务的样本效率。

Comments Some of the data in the paper contain errors and need to be confirmed for modification

详情
AI中文摘要

强化学习长期面临样本效率低下的问题。缓解该问题的一种有前景的方法是利用群不变马尔可夫决策过程($G$-不变MDP)。现有工作主要关注基于图像的强化学习和旋转对称性(如$\mathrm{SO(2)}$),而基于状态的强化学习和反射对称性尚未得到充分探索。本文聚焦于基于状态的连续控制任务,通过引入Reflex范式来利用反射对称性,该范式可无缝集成到同策略和异策略强化学习算法中。我们形式化了两种反射类型——轴向反射和双侧反射,并刻画了它们对应的变换。基于对保持对称性的最优值函数和策略的理论分析,Reflex通过原则性的对称性正则化机制将反射对称性融入策略学习。我们将Reflex与PPO和SAC集成,并在OpenAI Gym和DeepMind Control基准测试套件上进行评估,结果表明相比标准基线,Reflex在提升样本效率的同时实现了更优的性能。我们的代码开源在https://github.com/TonyStark042/Reflex。

英文摘要

Reinforcement learning has long struggled with poor sample efficiency. One promising approach to mitigate this problem is leveraging group-invariant Markov Decision Processes ($G$-invariant MDPs). Existing works in this direction have primarily focused on image-based RL and rotational symmetry such as $\mathrm{SO(2)}$, leaving state-based RL and reflection symmetry largely underexplored. In this work, we focus on state-based continuous control tasks and exploit reflection symmetry by introducing Reflex, a paradigm that seamlessly integrates with both on-policy and off-policy RL algorithms. We formalize two types of reflection, axial reflection and bilateral reflection, and characterize their corresponding transformations. Building on a theoretical analysis of symmetry-preserving optimal value functions and policies, Reflex integrates reflection symmetry into policy learning through principled symmetry regularization mechanisms. We integrate Reflex with PPO and SAC, and evaluate it on a suite of OpenAI Gym and DeepMind Control benchmarks, demonstrating superior performance over standard baselines while improving sample efficiency. Our code is available at https://github.com/TonyStark042/Reflex.
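One common way to regularize a policy toward reflection symmetry is to penalize the gap between the action taken in a reflected state and the reflected action taken in the original state. The sketch below assumes the reflection acts as per-coordinate sign flips and uses a mean squared penalty; both choices, and the names `reflect` and `symmetry_penalty`, are illustrative assumptions rather than the regularizer defined in the Reflex paper or repository.

```python
def reflect(vec, signs):
    """Apply a reflection given as per-coordinate signs (+1 keep, -1 flip)."""
    return [s * v for s, v in zip(signs, vec)]


def symmetry_penalty(policy, states, state_signs, action_signs):
    """Mean squared gap between pi(reflect(s)) and reflect(pi(s))."""
    total = 0.0
    for s in states:
        action_in_mirror = policy(reflect(s, state_signs))
        mirrored_action = reflect(policy(s), action_signs)
        total += sum((a - b) ** 2
                     for a, b in zip(action_in_mirror, mirrored_action))
    return total / len(states)


def symmetric_policy(state):
    # Odd in (position, velocity), so it commutes with the reflection.
    return [-2.0 * state[0] - 0.5 * state[1]]


def biased_policy(state):
    # The constant offset breaks the symmetry.
    return [-2.0 * state[0] - 0.5 * state[1] + 1.0]


states = [[1.0, 0.5], [-0.3, 2.0]]
print(symmetry_penalty(symmetric_policy, states, [-1, -1], [-1]))
print(symmetry_penalty(biased_policy, states, [-1, -1], [-1]))
```

In training, a term like this would be added to the PPO or SAC loss with a weight; how Reflex weights or structures it is not stated in the abstract.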

2605.21557 2026-06-05 stat.ML cs.AI cs.LG 版本更新

Scalable Reinforcement Learning via Adaptive Batch Scaling

通过自适应批处理缩放实现可扩展的强化学习

Jongchan Park

发表机构 * Jongchan Park

AI总结 本文提出自适应批处理缩放方法,通过动态调整有效批处理大小来平衡强化学习早期的可塑性需求和晚期的稳定收敛,发现增大网络和批处理大小的组合在强化学习中取得最佳性能。

详情
AI中文摘要

传统观点认为大批次训练与强化学习(RL)本质上不兼容:超过某个适度的阈值后,由于数据分布固有的非平稳性,增大批次大小通常只会带来收益递减甚至性能下降。我们对这一观点提出质疑,指出非平稳性并非RL的固定属性,而是随训练过程演变:早期阶段行为变化迅速,需要小批次以保持可塑性,而后期阶段接近准平稳状态,大批次可实现精确收敛。受此启发,我们提出自适应批处理缩放(ABS),根据学习策略的稳定性动态调整有效批次大小。ABS的核心是行为分歧,这是一种新的度量指标,通过测量连续更新之间的动作级变化来量化策略的非平稳性,我们用它使批次大小与策略波动性成反比地缩放。ABS与并行化Q网络(PQN)算法结合并在ALE基准上评估,无缝地兼顾了早期阶段的可塑性和后期阶段的稳定收敛。引人注目的是,与传统观点相反,我们的结果表明,更大的网络与更大的批次大小相结合可取得最佳性能,这种此前被认为在强化学习中无法实现的扩展行为,如今通过自适应批次控制得以实现。

英文摘要

Conventional wisdom holds that large-batch training is fundamentally incompatible with Reinforcement Learning (RL): beyond a modest threshold, increasing batch sizes typically yields diminishing returns or performance degradation due to the inherent non-stationarity of the data distribution. We challenge this view by observing that non-stationarity is not a fixed property of RL, but evolves throughout training: early stages exhibit rapid behavioral shifts that demand small batches for plasticity, whereas late stages approach a quasi-stationary regime where large batches enable precise convergence. Motivated by this observation, we propose Adaptive Batch Scaling (ABS), which dynamically adjusts the effective batch size according to the stability of the learning policy. Central to ABS is Behavioral Divergence, a novel metric that quantifies policy non-stationarity by measuring action-level shifts between consecutive updates, which we use to scale batch size inversely to policy volatility. Integrated with the Parallelised Q-Network (PQN) algorithm and evaluated on the ALE benchmark, ABS seamlessly reconciles early-stage plasticity with late-stage stable convergence. Strikingly, contrary to conventional wisdom, our results reveal that the combination of larger networks and larger batch sizes achieves the best performance, a scaling behavior previously thought to be unattainable in RL, now unlocked through adaptive batch control.
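The idea of scaling batch size inversely to policy volatility can be sketched as follows. The abstract does not give the exact definition of Behavioral Divergence or the scaling rule, so everything here is a hypothetical stand-in: divergence is taken as the fraction of probe states whose greedy action changed between two consecutive updates, and the batch size follows `max_batch / (1 + sensitivity * divergence)`, clipped to a range. The parameter names and default values are invented for illustration.

```python
def behavioral_divergence(prev_actions, new_actions):
    """Fraction of probe states whose chosen action changed (assumed metric)."""
    changed = sum(1 for a, b in zip(prev_actions, new_actions) if a != b)
    return changed / len(prev_actions)


def adaptive_batch_size(divergence, min_batch=32, max_batch=1024,
                        sensitivity=20.0):
    """Shrink the batch when the policy is volatile, grow it when stable.

    The inverse rule and its constants are hypothetical, not from the paper.
    """
    raw = max_batch / (1.0 + sensitivity * divergence)
    return int(max(min_batch, min(max_batch, raw)))


# Early training: half of the probe actions flip, so the batch stays small.
early = behavioral_divergence([0, 1, 2, 3], [0, 1, 0, 0])
# Late training: no action changes, so the batch reaches its maximum.
late = behavioral_divergence([0, 1, 2, 3], [0, 1, 2, 3])
print(adaptive_batch_size(early), adaptive_batch_size(late))
```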

2605.10807 2026-06-05 cs.CR cs.AR cs.LG 版本更新

LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges

利用大语言模型进行安全硬件设计及相关问题:机遇与挑战

Johann Knechtel, Ozgur Sinanoglu, Ramesh Karri

发表机构 * New York University Abu Dhabi(纽约大学阿布扎比分校) NYU Tandon School of Engineering(纽约大学坦登工程学院)

AI总结 本文探讨了大语言模型在电子设计自动化和硬件安全领域的应用,分析了其在生成RTL代码、自动生成测试平台以及弥合高层次规格与硅芯片之间语义差距方面的潜力,同时指出了其引入的严重安全漏洞,并总结了当前研究的最新进展和未来研究方向。

Comments Accepted for 2026 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

详情
AI中文摘要

将大语言模型(LLMs)整合到电子设计自动化(EDA)和硬件安全领域正迅速重塑半导体行业。尽管LLMs在生成寄存器传输级(RTL)代码、自动生成测试平台以及弥合高层次规格与硅芯片之间的语义差距方面提供了前所未有的能力,但同时它们也引入了严重的安全隐患。本文全面回顾了LLM驱动的硬件设计的最新研究进展,围绕EDA综合、硬件信任、安全设计和教育等方面的关键进展进行深入分析。我们系统地扩展了最近突破的方法——从基于推理的综合和多代理漏洞提取到数据污染和对抗性机器学习(ML)规避。我们整合了关于关键防御措施的一般讨论,如动态基准测试以对抗数据记忆和激进的红队测试以实现稳健的安全评估。最后,我们综合了跨领域的经验教训,以指导未来研究朝着安全、可信和自主的设计生态系统发展。

英文摘要

The integration of Large Language Models (LLMs) into Electronic Design Automation (EDA) and hardware security is rapidly reshaping the semiconductor industry. While LLMs offer unprecedented capabilities in generating Register Transfer Level (RTL) code, automating testbenches, and bridging the semantic gap between high-level specifications and silicon, they simultaneously introduce severe vulnerabilities. This comprehensive review provides an in-depth analysis of the state-of-the-art in LLM-driven hardware design, organized around key advancements in EDA synthesis, hardware trust, design for security, and education. We systematically expand on the methodologies of recent breakthroughs -- from reasoning-driven synthesis and multi-agent vulnerability extraction to data contamination and adversarial machine learning (ML) evasion. We integrate general discussions on critical countermeasures, such as dynamic benchmarking to combat data memorization and aggressive red-teaming for robust security assessment. Finally, we synthesize cross-cutting lessons learned to guide future research toward secure, trustworthy, and autonomous design ecosystems.

2605.20119 2026-06-05 cs.LG cs.AI 版本更新

Toto 2.0: Time Series Forecasting Enters the Scaling Era

Toto 2.0:时间序列预测进入规模化时代

Emaad Khwaja, Chris Lettieri, Gerald Woo, Eden Belouadah, Marc Cenac, Guillaume Jarry, Enguerrand Paquin, Xunyi Zhao, Viktoriya Zhukov, Othmane Abou-Amal, Chenghao Liu, Ameet Talwalkar, David Asker

发表机构 * Datadog AI Research(Datadog AI研究院) Carnegie Mellon University(卡内基梅隆大学)

AI总结 本文提出Toto 2.0模型家族,通过单一训练配方在400万到25亿参数范围内实现可靠的预测质量提升,并在三个基准测试中达到新的最先进水平。

Comments Code: https://github.com/DataDog/toto Weights: https://huggingface.co/collections/Datadog/toto-20

详情
AI中文摘要

我们证明时间序列基础模型可以扩展:一种单一的训练配方能够在400万到25亿参数范围内产生可靠的预测质量提升。我们发布了Toto 2.0,这是一个由五种开放权重预测模型组成的家族,这些模型均基于此配方进行训练。Toto 2.0家族在三个预测基准测试中达到了新的最先进水平:BOOM,我们的可观测性基准;GIFT-Eval,标准的通用基准;以及最近的抗污染TIME基准。本报告描述了我们的实验结果,并详细说明了Toto 2.0的设计决策:其架构和训练配方、训练数据以及u-muP超参数转移流水线。所有五个基础检查点均以Apache 2.0许可证发布。

英文摘要

We show that time series foundation models scale: a single training recipe produces reliable forecast-quality improvements from 4M to 2.5B parameters. We release Toto 2.0, a family of five open-weights forecasting models trained under this recipe. The Toto 2.0 family sets a new state of the art on three forecasting benchmarks: BOOM, our observability benchmark; GIFT-Eval, the standard general-purpose benchmark; and the recent contamination-resistant TIME benchmark. This report describes our experimental results and details the design decisions behind Toto 2.0: its architecture and training recipe, training data, and the u-muP hyperparameter transfer pipeline. All five base checkpoints are released under Apache 2.0.

2405.05097 2026-06-05 cs.LG stat.ML 版本更新

Biology-inspired joint distribution neurons based on Hierarchical Correlation Reconstruction allowing for multidirectional propagation of values and densities

受生物学启发的联合分布神经元:基于层次相关性重建的多向传播神经元

Jarek Duda

发表机构 * Jagiellonian University(雅盖隆大学)

AI总结 本文提出了一种受生物学启发的联合分布神经元,通过层次相关性重建实现多向值和密度传播,改进了传统人工神经元在学习、灵活性和鲁棒性方面的不足。

Comments 12 pages, 17 figures

详情
AI中文摘要

最近,一百万个生物神经元(BNN)在玩Pong游戏时表现优于现代强化学习(RL)方法,这提醒我们生物神经元在学习、灵活性和鲁棒性等方面仍具有质的优势,提示应改进当前的人工神经元(如MLP/KAN)以更好地与生物神经元一致。本文提出了一种扩展KAN方法的神经元,包含局部联合分布模型:ρ(x)=∑_{j∈B} a_j f_j(x)对于x∈[0,1]^d,增加了对KAN的解释和信息流控制,并允许逐步补充生物神经元的三个基本特性:1)生物轴突可以双向传播,而当前人工神经元仅单向传播,联合分布神经元可通过替换变量获得条件值/分布;2)动物表现出风险规避,需要处理方差,现实世界更需要概率模型,所提方法可预测和传播分布作为矩向量(期望值、方差等);3)生物神经元需要局部训练,除了反向传播外,所提方法还允许其他训练方式,如直接训练、张量分解,或局部且有前景的信息瓶颈方法。所提方法非常通用,也可作为transformer、JEPA、Mamba等嵌入中softmax的扩展,暗示特征是现实世界属性联合密度的混合矩。

英文摘要

Recently, a million biological neurons (BNN) turned out to outperform modern RL methods at playing Pong, a reminder that biological neurons remain qualitatively superior, e.g. in learning, flexibility and robustness, and a suggestion to improve current artificial neurons (e.g. MLP/KAN) for better agreement with biological ones. We propose an extension of the KAN approach to neurons containing a model of the local joint distribution, $\rho(\mathbf{x})=\sum_{\mathbf{j}\in B} a_\mathbf{j} f_\mathbf{j}(\mathbf{x})$ for $\mathbf{x} \in [0,1]^d$, which adds interpretation and information flow control to KAN and allows gradually adding three basic properties of biological neurons that are currently missing: 1) Biological axons propagate in both directions, while current artificial neurons focus on unidirectional propagation; joint distribution neurons can repair this by substituting some variables to obtain conditional values/distributions for the remaining ones. 2) Animals show risk avoidance, which requires processing variance, and the real world generally needs probabilistic models; the proposed neurons can also predict and propagate distributions as vectors of moments: (expected value, variance) or higher. 3) Biological neurons require local training, and besides backpropagation the proposed approach allows many additional ways, such as direct training, tensor decomposition, or finally the local and promising information bottleneck. The proposed approach is very general and can also be used as an extension of softmax in embeddings of e.g. transformer, JEPA, or Mamba, suggesting the interpretation that features are mixed moments of the joint density of real-world properties.

2605.13587 2026-06-05 stat.ML cs.LG eess.SP 版本更新

Reframing preprocessing selection as model-internal calibration in near-infrared spectroscopy: A large-scale benchmark of operator-adaptive PLS and Ridge models

将预处理选择重新定义为近红外光谱学中的模型内部校准:一种大规模的运算符自适应PLS和岭模型基准测试

Gregory Beurier, Robin Reiter, Camille Noûs, Lauriane Rouan, Denis Cornet

发表机构 * CIRAD, UMR AGAP Institut(CIRAD,AGAP研究院) UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro(AGAP研究院,蒙彼利埃大学,CIRAD,INRAE,农业研究院) Laboratoire Cogitamus(Cogitamus实验室)

AI总结 本文研究了在近红外光谱学中,将预处理选择重新定义为模型内部校准的方法,通过大规模基准测试比较运算符自适应PLS和岭模型的性能和效率。

Comments 17 pages, 8 figures; supplementary material (39 pages, 4 figures) included. Extended preprint version of a companion study prepared as a concise journal article (same results, different framing and scope). Code and artifacts: https://github.com/GBeurier/nirs4all-aom

详情
AI中文摘要

预处理筛选通常是近红外光谱学校准工作流程中最昂贵的部分。其有效性在于平滑、导数、去趋势及相关滤波器会改变PLS或岭回归所看到的光谱方向,但完整的外部搜索会反复拟合几乎相同的线性模型。本文研究了将该搜索折叠成一个校准步骤的情况。对于严格的线性预处理运算符,变换后的PLS交叉协方差满足(XA^T)^T Y = AX^T Y,而岭回归依赖于运算符诱导的核X A^T A X^T。这些恒等式允许在模型内部筛选有限的运算符银行,同时保留原始波长系数。样本自适应或拟合的校正如SNV、MSC、EMSC和ASLS仍保持为折叠局部分支,而不是被吸收进代数中。本研究使用AOM基准队列:在显式中包含61个回归行和17个分类行。在主回归分母(N=32)上,普通的紧凑银行AOM-PLS记录了与PLS默认值相比的中位RMSEP比为0.991,与PLS-HPO相比为0.990;所选的ASLS-AOM-compact-cv5分支在相同的两个参考上记录为0.985和1.002。普通的AOMRidge-global-compact-none基线记录了与Ridge默认值相比的0.974,与Ridge-HPO相比为0.984,而所选的AOMRidge-Blender-headline-spxy3记录为0.918和0.966。所选分类器AOM-PLS-DA-global-simpls-covariance在13个数据集上将平衡精度提高了0.159,其中12/13胜出。运行时间差距是实际结果:PLS-HPO每次运行的中位总时间是710.81秒,而所选的AOM-PLS分支仅为1.63秒。线性运算符自适应校准因此在预测质量上与彻底的预处理筛选相当,对于PLS来说,拟合时间减少了多个数量级。

英文摘要

Preprocessing screening is often the most expensive part of a near-infrared spectroscopy calibration workflow. It works because smoothing, derivatives, detrending and related filters change the spectral directions seen by partial least squares (PLS) or Ridge regression, but a full external search repeatedly refits nearly the same linear model. This paper studies the case where that search can be collapsed into one calibration step. For a strict linear preprocessing operator A acting on row spectra as XA^T, the transformed PLS cross-covariance satisfies (XA^T)^T Y = A X^T Y, and Ridge regression depends on the operator-induced kernel X A^T A X^T. These identities let a finite operator bank be screened inside the model while retaining original-wavelength coefficients, and the same identity extends to cheaply evaluated linear operator chains. Sample-adaptive or fitted corrections such as SNV, MSC, EMSC and ASLS are not strict linear; we prove the boundary and keep them as fold-local branches. The cohort has 61 regression and 17 classification rows, with a strict paired regression denominator of N=32 for the eight paper variants. There, AOM-PLS reaches median RMSEP ratios of 0.991/0.990 (simple) and 0.985/1.002 (best) against PLS-default/PLS-HPO, and AOM-Ridge reaches 0.974/0.984 (simple) and 0.918/0.966 (best) against Ridge-default/Ridge-HPO. The operator-adaptive classifier AOM-PLS-DA improves balanced accuracy by a median 0.159 on N=13 datasets (12/13 wins). The practical result is the runtime gap: PLS-HPO takes a median 710.81 s per run, whereas AOM-PLS takes 1.18-1.63 s -- 436 to 602 times less PLS fitting time. Linear operator-adaptive calibration thus gives prediction quality comparable to exhaustive preprocessing screening, with orders-of-magnitude less fitting time for PLS.
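The two identities quoted above follow directly from (XA^T)^T = A X^T, and they can be checked numerically on a toy example. The sketch below uses plain Python lists with small integer matrices so equality is exact; the matrix values and the choice of a first-difference operator for A are made up for illustration and are not taken from the paper's operator bank.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


# Toy data: n = 3 spectra over p = 4 wavelengths, one response column.
X = [[1, 2, 4, 7],
     [0, 3, 5, 6],
     [2, 2, 1, 0]]
Y = [[1], [0], [2]]

# A strict linear preprocessing operator: a first difference along wavelength.
A = [[-1, 1, 0, 0],
     [0, -1, 1, 0],
     [0, 0, -1, 1],
     [0, 0, 0, 0]]

XAt = matmul(X, transpose(A))                 # preprocessed spectra X A^T

# PLS cross-covariance: (X A^T)^T Y == A (X^T Y)
lhs = matmul(transpose(XAt), Y)
rhs = matmul(A, matmul(transpose(X), Y))

# Ridge kernel: (X A^T)(X A^T)^T == X A^T A X^T
kernel_direct = matmul(XAt, transpose(XAt))
kernel_identity = matmul(matmul(matmul(X, transpose(A)), A), transpose(X))

print(lhs == rhs, kernel_direct == kernel_identity)
```

The practical point is that X^T Y is computed once; each candidate operator then costs only one multiplication by A instead of a full refit on preprocessed spectra.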

2511.20577 2026-06-05 cs.LG 版本更新

MSTN: A Lightweight and Fast Model for General TimeSeries Analysis

MSTN: 一种轻量且快速的通用时间序列分析模型

Sumit S Shevtekar, Chandresh K Maurya

发表机构 * Department of Computer Science and Engineering(计算机科学与工程系) Indian Institute of Technology Indore(印度理工学院印多尔分校)

AI总结 本文提出了一种轻量且快速的MSTN模型,通过多尺度时间网络架构,结合早期时间聚合(ETA)原理,有效处理时间序列中的非平稳性、非线性动态和多时间尺度行为,实现了在多个时间尺度上的灵活建模,同时保持了模型的轻量级和低延迟特性。

Comments 30 pages, published in Transactions on Machine Learning Research (TMLR)

详情
AI中文摘要

现实世界的时间序列往往表现出强烈的非平稳性、复杂的非线性动态以及在多个时间尺度上的行为,从快速的局部波动到缓慢演变的长期趋势。然而,许多现代架构施加了刚性的、固定尺度的结构先验,如基于补丁的标记化、预定义的感受野或冻结的主干编码器,这可能会过度正则化时间动态并限制对突发高幅事件的适应性。为此,我们引入了多尺度时间网络(MSTN),一种基于早期时间聚合(ETA)原理的混合神经架构。MSTN集成了三个互补的组件:(i)多尺度卷积编码器,用于捕捉细粒度的局部结构;(ii)序列建模模块,通过递归或注意力机制学习长距离依赖;(iii)自门控融合阶段,结合挤压-激励(squeeze-excitation)和单个密集层,动态重新加权和融合多尺度表示。ETA确保下游模块在O(1)时间内运行,而编码器保留O(L^2)(Transformer)或O(L)(BiLSTM)。这种设计使MSTN能够灵活地建模从毫秒到长期的时间模式,同时避免了通常与长上下文模型相关的计算负担。在广泛的基准测试中,包括填补缺失值、长期预测、分类和跨数据集泛化,MSTN实现了最先进的性能,在27个数据集中的21个上建立了新的最佳结果,同时保持轻量级(MSTN-BiLSTM约为0.40M参数,MSTN-Transformer约为1.06M参数),并适合低延迟推理(<1秒,通常在毫秒级)和资源受限的部署。

英文摘要

Real-world time series often exhibit strong non-stationarity, complex nonlinear dynamics, and behavior expressed across multiple temporal scales, from rapid local fluctuations to slow-evolving long-range trends. However, many contemporary architectures impose rigid, fixed-scale structural priors such as patch-based tokenization, predefined receptive fields, or frozen backbone encoders, which can over-regularize temporal dynamics and limit adaptability to abrupt high-magnitude events. To handle this, we introduce the Multi-scale Temporal Network (MSTN), a hybrid neural architecture grounded in an Early Temporal Aggregation (ETA) principle. MSTN integrates three complementary components: (i) a multi-scale convolutional encoder that captures fine-grained local structure; (ii) a sequence modeling module that learns long-range dependencies through either recurrent or attention-based mechanisms; and (iii) a self-gated fusion stage incorporating squeeze-excitation and a single dense layer to dynamically reweight and fuse multi-scale representations. ETA ensures downstream modules operate in O(1) time, while the encoder retains O(L^2) (Transformer) or O(L) (BiLSTM). This design enables MSTN to flexibly model temporal patterns spanning milliseconds to extended horizons, while avoiding the computational burden typically associated with long-context models. Across extensive benchmarks covering imputation, long-term forecasting, classification, and cross-dataset generalization, MSTN achieves state-of-the-art performance, establishing new best results on 21 of 27 datasets, while remaining lightweight (~0.40M params for MSTN-BiLSTM and ~1.06M for MSTN-Transformer) and suitable for low-latency inference (<1 sec, often in milliseconds) and resource-constrained deployment.

2605.16138 2026-06-05 cs.LG cs.AI hep-ex 版本更新

Surrogate Neural Architecture Codesign Package (SNAC-Pack)

代理神经架构协同设计包(SNAC-Pack)

Jason Weitz, Dmitri Demler, Benjamin Hawks, Aaron Wang, Nhan Tran, Javier Duarte

发表机构 * University of California San Diego(加州大学圣地亚哥分校) ETH Zurich(苏黎世联邦理工学院) Fermi National Accelerator Laboratory(费米国家加速器实验室) University of Illinois Chicago(伊利诺伊大学芝加哥分校)

AI总结 本文提出SNAC-Pack,一种面向硬件的自动化机器学习框架,用于神经架构协同设计和端到端FPGA部署,通过多目标全局搜索和硬件代理模型减少合成成本,并结合量化感知训练和迭代幅度剪枝来压缩模型,最终在FPGA上实现高效部署。

Comments 15 Pages, 3 Figures, AutoML (International Conference on Automated Machine Learning) 2026

详情
AI中文摘要

神经架构搜索(NAS)是一种强大的自动模型设计方法,但现有方法往往只优化准确率或依赖如位操作(BOPs)等代理指标,这些指标与硬件成本的相关性较差。在FPGA部署中,成本由查找表、DSP、触发器、BRAM和延迟等多维预算主导。我们提出了代理神经架构协同设计包(SNAC-Pack),一种开源的AutoML框架,用于硬件感知的神经架构协同设计和端到端FPGA部署。SNAC-Pack使用Optuna和NSGA-II进行多目标全局搜索,将试验加载到共享的SQLite存储中,以实现计算节点之间的并行工作。硬件代理模型输出每个试验的资源和延迟估计,避免了否则会主导搜索循环的合成成本。随后的局部搜索阶段结合量化感知训练(QAT)和迭代幅度剪枝,在联合压缩循环中应用。最后,通过hls4ml Python库将最终模型合成到FPGA固件中。YAML配置和可选的代理前端使用户能够在新数据集上运行管道而无需修改框架。我们在大型强子对撞机的喷射分类和超导量子比特读出中展示了SNAC-Pack,发现了紧凑的架构,这些架构在任务指标上匹配或超过强基线,同时减少FPGA资源利用,并在量子比特读出情况下将设计空间探索过程从数月的手动微调减少到数小时的自动化搜索。

英文摘要

Neural architecture search (NAS) is a powerful approach for automating model design, but existing methods often optimize for accuracy alone or rely on proxy metrics such as bit operations (BOPs) that correlate poorly with hardware cost. This gap is particularly large for FPGA deployment, where cost is dominated by a multi-dimensional budget of lookup tables, DSPs, flip-flops, BRAM, and latency. We present the Surrogate Neural Architecture Codesign Package (SNAC-Pack), an open-source AutoML framework for hardware-aware neural architecture codesign and end-to-end FPGA deployment. SNAC-Pack runs a multi-objective global search with Optuna and NSGA-II, loading trials to a shared SQLite store that enables parallel workers across compute nodes. A hardware surrogate model outputs per-trial resource and latency estimates, avoiding the synthesis cost that would otherwise dominate the search loop. A local search stage then applies quantization-aware training (QAT) together with iterative magnitude pruning in a combined compression loop, after which the final model is synthesized to FPGA firmware via the hls4ml Python library. A YAML configuration and an optional agentic frontend let users run the pipeline on new datasets without modifying the framework. We demonstrate SNAC-Pack on jet classification at the Large Hadron Collider and superconducting qubit readout, discovering compact architectures that match or exceed strong baselines on the task metric while reducing FPGA resource utilization and, in the qubit readout case, reducing the design space exploration process from months of manual fine-tuning to hours of automated search.

2509.10825 2026-06-05 cs.LG cs.AI stat.ML 版本更新

CUBE: Contrastive Understanding by Balanced Experiments

CUBE: 通过平衡实验实现对比理解

Dongseok Kim, Hyoungsun Choi, Mohamed Jismy Aashik Rasool, Gisung Oh

发表机构 * Department of Computer Engineering(计算机工程系) Gachon University(嘉泉大学)

AI总结 本文提出CUBE框架,通过平衡低-高探针解释已训练的预测模型,揭示模型的主要效应和交互作用,验证了其在合成和现实表格任务中的有效性。

Comments The core framework and main claims remain unchanged; the manuscript has been revised for clarity, presentation, and consistency

详情
AI中文摘要

事后解释依赖于模型查询是如何组织的。我们提出了CUBE,一种基于设计的框架,通过平衡的低-高探针来解释已训练的预测模型。所选变量定义了因子,设计的特征水平组合定义了查询条件,模型预测被汇总为析因对比。CUBE将主效应和成对交互作用报告为在声明的设计空间上对平均响应变化和条件响应变化的受控读数。在合成和真实表格任务上的实验表明,CUBE恢复了模型学到的主导效应结构,阐明了查询高效的可识别性,并支持先筛选、后跟进的细化流程。

英文摘要

Post-hoc explanation depends on how model queries are organized. We propose CUBE, a design-based framework that explains a trained predictive model through balanced low--high probes. Selected variables define factors, designed feature-level combinations define query conditions, and model predictions are summarized as factorial contrasts. CUBE reports main effects and pairwise interactions as controlled readings of average and conditional response changes over a declared design space. Experiments on synthetic and real tabular tasks show that CUBE recovers dominant learned effect structure, clarifies query-efficient identifiability, and supports screening--follow-up refinement.
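The balanced low/high probing described above matches the classical two-level factorial design, so a minimal sketch can use the textbook contrast estimators: a main effect is the mean prediction at the high level minus the mean at the low level, and a pairwise interaction is the same difference taken over the sign of the product of two coded factors. Whether CUBE uses exactly these estimators is an assumption; the function names and the toy model are invented for illustration.

```python
from itertools import product


def factorial_design(k):
    """All 2^k combinations of coded levels -1 (low) and +1 (high)."""
    return [list(levels) for levels in product((-1, 1), repeat=k)]


def contrast(design, responses, columns):
    """Mean response where the product of the coded columns is +1,
    minus the mean response where it is -1."""
    plus, minus = [], []
    for coded, y in zip(design, responses):
        sign = 1
        for c in columns:
            sign *= coded[c]
        (plus if sign > 0 else minus).append(y)
    return sum(plus) / len(plus) - sum(minus) / len(minus)


def cube_style_summary(model, lows, highs):
    """Query `model` on the balanced design; return main and pairwise effects."""
    k = len(lows)
    design = factorial_design(k)
    queries = [[highs[i] if lv > 0 else lows[i] for i, lv in enumerate(coded)]
               for coded in design]
    responses = [model(q) for q in queries]
    main = {i: contrast(design, responses, [i]) for i in range(k)}
    pair = {(i, j): contrast(design, responses, [i, j])
            for i in range(k) for j in range(i + 1, k)}
    return main, pair


def toy_model(x):
    # Stand-in for a trained predictor; the third feature is ignored.
    return 1.0 + 3.0 * x[0] - 2.0 * x[1] + 0.5 * x[0] * x[1]


main_effects, interactions = cube_style_summary(
    toy_model, lows=[-1.0, -1.0, -1.0], highs=[1.0, 1.0, 1.0])
print(main_effects, interactions)
```

With coded levels of -1 and +1, each effect is twice the corresponding coefficient of the toy model, so the ignored third feature shows a main effect of zero.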

2605.15454 2026-06-05 cs.CL cs.LG stat.ML 版本更新

Reasoning Models Don't Just Think Longer, They Move Differently

推理模型不只思考更久,它们的移动方式不同

Anders Gjølbye, Lars Kai Hansen, Sanmi Koyejo

发表机构 * Technical University of Denmark(丹麦技术大学) Stanford University(斯坦福大学)

AI总结 本文研究了推理训练模型在生成链式思维时的轨迹差异,发现通过长度校正后,不同领域中难度与轨迹几何的耦合关系存在显著差异,尤其是在代码领域中,推理训练模型表现出更直接的轨迹和更一致的局部曲率。

Comments Preprint

详情
AI中文摘要

经过推理训练的语言模型通常会在更难的问题上消耗更多标记,但更长的思维链并不能说明模型只是多计算了若干步,还是遵循了不同的内部轨迹。我们通过研究竞赛编程、数学和布尔可满足性问题中思维链生成过程的隐藏状态轨迹来考察这一区别。原始轨迹几何强烈受生成长度的影响:更长的生成会机械地改变路径统计量,因此若不加校正,依赖难度的比较会产生误导。在将轨迹统计量对长度做残差化之后,难度在所有研究的领域中仍与校正后的轨迹几何系统性地耦合。最清晰的推理特有差异出现在代码领域:与匹配的指令微调基线相比,推理训练模型在更难的问题上表现出更直接的校正轨迹和异质性更低的局部曲率。在数学和布尔可满足性问题中,校正后的难度-几何耦合较弱,但仍然存在。提示阶段的线性探针未能再现代码领域的这种差异,而行为标注显示,更强的校正耦合与策略转变和不确定性监控同时出现。综上,这些发现确立了长度校正是生成时轨迹分析的先决条件,并表明推理训练可以与不同的校正轨迹几何相关联,其效应强度取决于领域。

英文摘要

Reasoning-trained language models often spend more tokens on harder problems, but longer chains of thought do not show whether a model is merely computing for more steps or following a different internal trajectory. We study this distinction through hidden-state trajectories during chain-of-thought generation across competitive programming, mathematics, and Boolean satisfiability. Raw trajectory geometry is strongly shaped by generation length: longer generations mechanically alter path statistics, so difficulty-dependent comparisons are misleading without adjustment. After residualizing trajectory statistics on length, difficulty remains systematically coupled to corrected trajectory geometry across all domains studied. The clearest reasoning-specific separation appears in the code domain, where harder problems show more direct corrected trajectories and less heterogeneous local curvature in reasoning-trained models than in matched instruction-tuned baselines. Corrected difficulty-geometry coupling is weaker, but still present, in mathematics and Boolean satisfiability. Prompt-stage linear probes do not mirror the code-domain separation, and behavioral annotations show that stronger corrected coupling co-occurs with strategy shifts and uncertainty monitoring. Together, these findings establish length correction as a prerequisite for generation-time trajectory analysis and show that reasoning training can be associated with distinct corrected trajectory geometry, with the strength of the effect depending on the domain.
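Residualizing a trajectory statistic on generation length means regressing the statistic on length and keeping only the part the regression cannot explain. The sketch below uses a one-predictor ordinary least squares fit with an intercept; the abstract does not say which regression form the authors use (for instance, linear versus log length), so the linear form, the function name `residualize`, and the toy numbers are assumptions.

```python
def residualize(values, lengths):
    """Residuals of an OLS fit values ~ intercept + slope * lengths."""
    n = len(values)
    mean_x = sum(lengths) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in lengths)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, values))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(lengths, values)]


# A path statistic that grows mechanically with length, plus a small
# length-independent component that survives the correction.
lengths = [10, 20, 30, 40]
raw_stat = [21.5, 40.5, 61.5, 80.5]
corrected = residualize(raw_stat, lengths)
print(corrected)
```

By construction the corrected values have zero mean and zero linear correlation with length, so any remaining association with problem difficulty is not a length artifact of this linear kind.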

2605.13830 2026-06-05 cs.AI cs.LG 版本更新

Quantifying Sensitivity for Tree Ensembles: A symbolic and compositional approach

对树集成模型的敏感性量化:一种符号和组合方法

Ajinkya Naik, Chaitanya Garg, S. Akshay, Ashutosh Gupta, Kuldeep S. Meel

发表机构 * Indian Institute of Technology Bombay(印度理工学院孟买分校) University of Toronto(多伦多大学)

AI总结 本文提出了一种针对树集成模型的敏感性量化方法,通过离散化输入空间并枚举易受敏感性影响的区域,结合代数决策图(ADD)编码和分拆子问题,实现高效计算。实验表明,所提工具XCount在速度和可扩展性方面优于其他方法。

详情
AI中文摘要

决策树集成(DTE)是一种广泛应用于AI分类任务的流行模型,用于多个安全关键领域,因此对这些模型的验证已成为过去十年的研究热点之一。其中一个问题就是敏感性问题,它询问给定一个DTE,是否一小部分特征的变化会导致输入的误分类。在本工作中,我们的目标是构建一个针对DTEs的定量敏感性概念,通过离散化模型的输入空间并枚举易受敏感性影响的区域。我们提出了一种新的算法技术,可以在保证认证误差和置信度范围内高效地完成此计算。我们的方法基于将问题编码为代数决策图(ADD),并进一步将其拆分为可高效解决的子问题,使计算成为组合和可扩展的。我们在不同规模的基准上评估了我们的技术的性能,与相同问题编码下的模型计数器进行比较。实验结果表明,我们的工具XCount在速度上显著优于其他方法,并且在集成规模增加时表现良好。

英文摘要

Decision tree ensembles (DTE) are a popular model for a wide range of AI classification tasks, used in multiple safety-critical domains, and hence verifying properties on these models has been an active topic of study over the last decade. One such verification question is the problem of sensitivity, which asks, given a DTE, whether a small change in a subset of features can lead to misclassification of the input. In this work, our focus is to build a quantitative notion of sensitivity, tailored to DTEs, by discretizing the input space of the model and enumerating the regions which are susceptible to sensitivity. We propose a novel algorithmic technique that can perform this computation efficiently, within a certified error and confidence bound. Our approach is based on encoding the problem as an algebraic decision diagram (ADD), and further splitting it into subproblems that can be solved efficiently and make the computation compositional and scalable. We evaluate the performance of our technique over benchmarks of varying size in terms of number of trees and depth, comparing it against the performance of model counters over the same problem encoding. Experimental results show that our tool XCount achieves significant speedup over other approaches and can scale well with the increasing sizes of the ensembles.

2605.12951 2026-06-05 stat.ML cs.LG 版本更新

Coreset-Induced Conditional Velocity Flow Matching

由Coreset诱导的条件速度流匹配

Xiao Wang, Zihua She, Jianxi Su

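Affiliations * Department of Statistics, Purdue University

AI summary This paper proposes CCVFM, a generative model that augments hierarchical rectified flow with a data-driven source distribution. A coreset compresses the target data and is lifted to a Gaussian mixture, which gives the conditional velocity law a closed form without a learned neural sampler; a lightweight correction flow then refines the generated samples.

Details
AI abstract (translated from Chinese)

We propose Coreset-Induced Conditional Velocity Flow Matching (CCVFM), a generative model that augments hierarchical rectified flow with a data-driven source distribution. Hierarchical flow matching models the full conditional velocity law in velocity space, but its inner flow has to transport isotropic Gaussian noise to a multimodal target velocity distribution from scratch. Our key observation is that this inner source can be replaced by a closed-form surrogate built from a coreset of the target. CCVFM first compresses the target into weighted atoms with an entropic Sinkhorn coreset and lifts them to a Gaussian mixture. The induced conditional velocity law is a closed-form Gaussian mixture that can be sampled without a learned neural sampler. A lightweight correction flow, trained from this exact surrogate source, then refines the remaining surrogate-to-target residual instead of learning the whole noise-to-data map. We prove that, under an explicit compression assumption, the surrogate transport cost equals the target-surrogate Wasserstein gap, whereas the noise-source analogue has a dimension-scale lower bound. We further characterize the conditional second moment of the direct surrogate-source training target and show that its source-dependent excess is small when the surrogate conditional law is close to the true conditional velocity law in mean and covariance. Experiments on MNIST, CIFAR-10, ImageNet-32, and CelebA-HQ show that the method achieves competitive few-step generation under matched architectures.

English abstract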

We propose Coreset-Induced Conditional Velocity Flow Matching (CCVFM), a generative model that augments hierarchical rectified flow with a data-informed source distribution. Hierarchical flow matching models the full conditional velocity law in velocity space, but its inner flow is asked to transport isotropic Gaussian noise to a multimodal target velocity distribution from scratch. Our key observation is that this inner source can be replaced by a closed-form surrogate built from a coreset of the target. CCVFM first compresses the target into weighted atoms using an entropic Sinkhorn coreset and lifts them to a Gaussian mixture. The induced conditional velocity law is then a closed-form Gaussian mixture that can be sampled without a learned neural sampler. A lightweight correction flow, trained from this exact surrogate source, then refines the remaining surrogate-to-target residual rather than learning an entire noise-to-data map. We prove that the surrogate transport cost equals the target--surrogate Wasserstein gap under an explicit compression assumption, whereas the noise-source analogue has a dimension-scale lower bound. We further characterize the conditional second moment of the direct surrogate-source training target and show that its source-dependent excess is small when the surrogate conditional law is close to the true conditional velocity law in mean and covariance. Empirically, on MNIST, CIFAR-10, ImageNet-32, and CelebA-HQ, the proposed method reaches competitive few-step generation under matched architectures.
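The step of lifting weighted coreset atoms to a Gaussian mixture that can be sampled without a neural sampler can be illustrated with a minimal sketch. It assumes an isotropic covariance `sigma**2 * I` shared by all components; the function name `sample_atom_mixture`, the toy atoms, and the value of `sigma` are hypothetical and are not taken from the paper, whose lifting may differ.

```python
import numpy as np

def sample_atom_mixture(atoms, weights, sigma, n_samples, seed=0):
    """Draw samples from sum_k w_k * N(atom_k, sigma^2 I) (assumed isotropic lifting)."""
    rng = np.random.default_rng(seed)
    atoms = np.asarray(atoms, dtype=float)            # shape (K, d)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()                 # normalise the coreset weights
    # Pick one mixture component per sample according to the weights.
    idx = rng.choice(len(atoms), size=n_samples, p=weights)
    noise = rng.standard_normal((n_samples, atoms.shape[1]))
    return atoms[idx] + sigma * noise

# Toy coreset: three weighted atoms in 2D (hypothetical values).
atoms = np.array([[-2.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
weights = np.array([0.5, 0.3, 0.2])
samples = sample_atom_mixture(atoms, weights, sigma=0.1, n_samples=5000)
print(samples.mean(axis=0), weights @ atoms)
```

Sampling needs only a categorical draw and Gaussian noise, which is the sense in which such a source is closed-form; the sample mean should land near the weighted mean of the atoms.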

2605.08318 2026-06-05 cs.LG cs.AI cs.NA math.NA physics.comp-ph stat.ML Updated version

When Attention Beats Fourier: Multi-Scale Transformers for PDE Solving on Irregular Domains

当注意力胜过傅里叶:用于不规则域上的PDE求解的多尺度变换器

Brandon Yee, Pairie Koh, Jack Rodriguez, Mihir Tekal

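Affiliations * Physics Lab, Yee Collins Research Group

AI summary This paper studies architecture selection for deep learning models that solve partial differential equations (PDEs), asking when transformer architectures with learned attention outperform Fourier-domain neural operators. It introduces the Multi-Scale Attention Transformer (MSAT), which encodes spatiotemporal solution histories as token sequences and trains end-to-end with a composite supervised objective. A comprehensive empirical comparison against nine baselines (including physics-informed neural networks, neural operators, and state-space models) on five benchmark problems shows the best generalization on complex-geometry problems.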

Comments Substantial Revision Required

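Details
AI abstract (translated from Chinese)

We study architecture selection for deep learning models that solve partial differential equations (PDEs), asking when transformer architectures with learned attention outperform Fourier-domain neural operators. We introduce the Multi-Scale Attention Transformer (MSAT), a deep learning architecture that encodes spatiotemporal solution histories as token sequences and trains end-to-end with a composite supervised objective. We run a comprehensive empirical evaluation against nine baselines (including physics-informed neural networks, neural operators, and state-space models) on five benchmark problems from the PINNacle suite, using identical train/test splits and reference data. MSAT achieves state-of-the-art generalization on complex-geometry problems (a relative L² error of 0.0101 on Heat2D-CG, a 3.7x improvement over FNO) with 34 s of total inference time, far less than the 120,812 s of Mamba-NO. Ablations over the physics regularization component reveal a precise inductive-bias tradeoff: physics priors reduce test error on diffusion-dominated problems but degrade generalization in chaotic and recirculating-flow regimes, directly characterizing the prior misspecification boundary. Approximation error bounds as a function of the domain boundary complexity κ give these empirical findings a theoretical basis and provide a principled rule for architecture selection.

English abstract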

We study the problem of architecture selection for deep learning models trained to solve partial differential equations (PDEs), asking when transformer-based architectures with learned attention outperform Fourier-domain neural operators. We introduce the Multi-Scale Attention Transformer (MSAT), a deep learning architecture that encodes spatiotemporal solution histories as token sequences and trains end-to-end via a composite supervised objective with optional physics-informed regularization terms. We conduct a comprehensive empirical evaluation against nine baselines -- including physics-informed neural networks (PINNs), neural operators (FNO, DeepONet, GNOT), and state-space models (Mamba-NO) -- across five benchmark problems from the PINNacle suite, using identical train/test splits and reference data for all methods. MSAT achieves state-of-the-art generalization on complex geometry problems ($L^2_\mathrm{rel} = 0.0101$ on Heat2D-CG, a $3.7\times$ improvement over FNO) at $34\,\mathrm{s}$ total inference vs. $120{,}812\,\mathrm{s}$ for Mamba-NO. Ablation studies over the physics regularization component reveal a precise inductive bias tradeoff: physics priors reduce test error on diffusion-dominated problems but degrade generalization on chaotic and recirculating-flow regimes, directly characterizing the prior misspecification boundary. Approximation error bounds as a function of domain boundary complexity $κ$ provide a theoretical basis for these empirical findings and a principled rule for architecture selection.
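The headline metric in the abstract is a relative L² error. A minimal sketch of the usual definition, the norm of the prediction error divided by the norm of the reference solution, is given here; the exact normalization used by the PINNacle suite is an assumption, and the function name and toy arrays are hypothetical.

```python
import numpy as np

def relative_l2_error(u_pred, u_ref):
    """||u_pred - u_ref||_2 / ||u_ref||_2 over all grid points (assumed definition)."""
    u_pred = np.asarray(u_pred, dtype=float).ravel()
    u_ref = np.asarray(u_ref, dtype=float).ravel()
    return float(np.linalg.norm(u_pred - u_ref) / np.linalg.norm(u_ref))

# Toy check: the reference has norm 5 and the error has norm 0.5.
u_ref = np.array([3.0, 4.0])
u_pred = np.array([3.0, 4.5])
err = relative_l2_error(u_pred, u_ref)
print(err)
```

Under this definition a value of 0.0101 would mean the prediction error is about 1% of the reference solution's norm.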

2605.08253 2026-06-05 cs.LG cs.AI Updated version

Path-Coupled Bellman Flows for Distributional Reinforcement Learning

路径耦合贝尔曼流用于分布式强化学习

Boyang Xu, Qing Zou, Siqin Yang, Hao Yan

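Affiliations * University of Science and Technology of China

AI summary This paper proposes Path-Coupled Bellman Flows (PCBF), a continuous-time distributional reinforcement learning method that learns return distributions with flow matching to address the boundary mismatch and high-variance bootstrapping of existing methods. Experiments show improved distributional fidelity and training stability.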

Comments Accepted to the 43rd International Conference on Machine Learning (ICML 2026)

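Details
Journal ref
Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea. PMLR 306, 2026
AI abstract (translated from Chinese)

Distributional reinforcement learning (DRL) models the full return distribution, but existing finite-support or quantile-based methods rely on projections, while recent flow-based methods can suffer from boundary mismatch at the flow source or from high-variance bootstrapping when the current and successor noises are independent. We propose Path-Coupled Bellman Flows (PCBF), a continuous-time DRL method that learns return distributions with flow matching along source-consistent Bellman-coupled paths: the current path starts from the required base prior at t=0, reaches the Bellman target at t=1, and keeps a pathwise affine relation to the successor flow at intermediate times (without requiring the time-t marginals to satisfy a distributional Bellman fixed point for all t). PCBF couples the current and successor return flows through shared base noise and uses a λ-parameterized control-variate target: λ=0 recovers the unbiased sample Bellman target, while λ>0 trades controlled bias for variance reduction. Experiments on analytically tractable MRPs, OGBench, and D4RL show improved distributional fidelity and training stability, and competitive offline RL performance.

English abstract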

Distributional reinforcement learning (DRL) models the full return distribution, but existing finite-support or quantile-based methods rely on projections, while recent flow-based approaches can suffer from boundary mismatch at the flow source or from high-variance bootstrapping when current and successor noises are independent. We propose Path-Coupled Bellman Flows (PCBF), a continuous-time DRL method that learns return distributions with flow matching using source-consistent Bellman-coupled paths: the current path starts from the required base prior at $t=0$, reaches the Bellman target at $t=1$, and maintains a pathwise affine relation to the successor flow at intermediate times (without requiring time-$t$ marginals to satisfy a distributional Bellman fixed point for all $t$). PCBF couples current and successor return flows through shared base noise and uses a $λ$-parameterized control-variate target: $λ=0$ recovers an unbiased sample Bellman target, while $λ>0$ trades controlled bias for variance reduction. Experiments on analytically tractable MRPs, OGBench, and D4RL show improved distributional fidelity and training stability, and competitive offline RL performance.

2605.08215 2026-06-05 cs.CV cs.LG cs.RO Updated version

Test-Time Training for Visual Foresight Vision-Language-Action Models

测试时训练用于视觉前瞻视觉-语言-动作模型

Sangwu Park, Wonjoong Kim, Yeonjun In, Sein Kim, Hongseok Kang, Chanyoung Park

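Affiliations * KAIST

AI summary This paper proposes a test-time training method that makes visual foresight vision-language-action models more robust to out-of-distribution data, and adds an adaptive update filtering mechanism to handle the practical problems caused by indiscriminate test-time updates.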

Comments Accepted at ICML 2026 Workshop on Continual Adaptation at Scale (CATS)

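Details
AI abstract (translated from Chinese)

Visual Foresight VLA (VF-VLA) has become a prominent architectural choice among recent VLA models because of its strong performance. However, the inherent design of VF-VLA makes it especially vulnerable to out-of-distribution (OOD) shifts. Because action quality depends directly on the accuracy of the predicted future visual information, OOD conditions affect both stages. To address this vulnerability, we propose Test-Time Training Visual Foresight VLA ($T^3$VF), a test-time training approach motivated by the observation that the predicted future image and the subsequent observation form a natural supervision pair. To handle the practical challenges of indiscriminate test-time updates, we introduce an adaptive update filtering mechanism. Empirically, $T^3$VF mitigates the OOD vulnerability of VF-VLA at a modest additional inference cost, without any architectural changes or auxiliary modules.

English abstract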

Visual Foresight VLA (VF-VLA) has become a prominent architectural choice in the recent VLA due to its impressive performance. Nevertheless, the inherent design of VF-VLA makes it particularly vulnerable to out-of-distribution (OOD) shifts. Because the quality of action directly depends on the accuracy of the predicted future visual information, OOD conditions affect both stages at once. To address this vulnerability, we propose Test-Time Training Visual Foresight VLA ($T^3$VF), a test-time training approach motivated by the observation that the predicted future image and its subsequent observation form a natural supervision pair. To further address the practical challenges that arise from indiscriminate test-time updates, we introduce an adaptive update filtering mechanism. Empirically, $T^3$VF mitigates the OOD vulnerability of VF-VLA at a modest additional inference cost, without requiring any architectural modification or auxiliary modules.

2605.07482 2026-06-05 cs.LG cs.AI Updated version

SHRED: Retain-Set-Free Unlearning via Self-Distillation with Logit Demotion

SHRED: 通过自蒸馏与对数势降低实现无保留集的去记忆

Zizhao Hu, Ameya Godbole, Johnny Tian-Zheng Wei, Mohammad Rostami, Jesse Thomason, Robin Jia

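Affiliations * University of Southern California; USC Information Sciences Institute

AI summary This paper proposes SHRED, a retain-set-free unlearning method based on self-distillation with logit demotion. It preserves model utility while unlearning and outperforms conventional methods that need a retain set.

Details
AI abstract (translated from Chinese)

Machine unlearning for large language models (LLMs) aims to selectively remove memorized content, such as private data, copyrighted text, or hazardous knowledge, without costly full retraining. Most existing methods need a carefully curated retain set to prevent catastrophic degradation of general model utility, which adds a data dependency and complicates deployment. We propose SHRED (Self-distillation via High-surprisal-only Retain-set-free Entropy Demotion), a retain-set-free unlearning method built on one key insight: not all tokens in a forget-set instance carry memorized information equally. High-information tokens concentrate the model's memorized knowledge, while low-information tokens reflect general language competence. SHRED has two stages. (1) Selection: we run a forward pass on the forget-set instance, collect the per-token autoregressive probabilities, and select the bottom (lowest probability, highest Shannon information) as forget positions; the remaining positions are kept as benign anchors. (2) Training: we build modified KL targets that demote the memorized token's logit at forget positions while preserving the original distribution at benign positions. The model is trained with a single top KL self-distillation objective that drives forgetting and utility preservation at the same time. We evaluate SHRED on four standard unlearning benchmarks and show that it sets a new Pareto-optimal trade-off between forget efficacy and model utility, outperforming retain-set-dependent methods. Our analysis shows that SHRED is robust to relearning attacks and membership-inference attacks and keeps utility stable after many sequential unlearning runs.

English abstract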

Machine unlearning for large language models (LLMs) aims to selectively remove memorized content such as private data, copyrighted text, or hazardous knowledge, without costly full retraining. Most existing methods require a retain set of curated examples to prevent catastrophic degradation of general model utility, creating an extra data dependency that complicates deployment. We propose SHRED (Self-distillation via High-surprisal-only Retain-set-free Entropy Demotion), a retain-set-free unlearning method built on a key insight: not all tokens within a forget set instance carry memorized information equally. High-information tokens concentrate the model's memorized knowledge, while low-information tokens reflect general language competence. SHRED operates in two stages. (1) Selection: We perform a forward pass on a forget set instance, collect per-token autoregressive probabilities, and select the bottom (lowest probability, highest Shannon information) as forget positions; the remaining positions are retained as benign anchors. (2) Training: We construct modified KL targets that demote the memorized token's logit at forget positions while preserving the original distribution at benign positions. The model is then trained via a single top KL self-distillation objective that simultaneously drives forgetting and utility preservation. We evaluate SHRED across four standard unlearning benchmarks and demonstrate that it establishes a new Pareto-optimal trade-off between forget efficacy and model utility, outperforming retain-set-dependent methods. Our analysis shows that SHRED is robust against relearning attacks and membership-inference attacks, and it maintains stable utility even after many sequential unlearning runs.
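The selection stage described above, ranking tokens by surprisal and marking the lowest-probability ones as forget positions, can be sketched as follows. The abstract does not state how many tokens are selected, so the `fraction` parameter, the function name, and the toy probabilities are hypothetical; this is an illustration of the idea, not the authors' implementation.

```python
import numpy as np

def select_forget_positions(token_probs, fraction):
    """Return a boolean mask that is True at the highest-surprisal token positions."""
    probs = np.asarray(token_probs, dtype=float)
    surprisal = -np.log2(probs)                      # Shannon information in bits
    k = max(1, int(round(fraction * len(probs))))    # hypothetical selection size
    forget_idx = np.argsort(surprisal)[::-1][:k]     # highest surprisal first
    mask = np.zeros(len(probs), dtype=bool)
    mask[forget_idx] = True                          # forget positions
    return mask                                      # False entries act as benign anchors

# Toy per-token probabilities for one forget-set instance (made-up values).
probs = [0.9, 0.05, 0.8, 0.01, 0.7, 0.6]
mask = select_forget_positions(probs, fraction=1/3)
print(mask)
```

In this toy input the two least likely tokens (probabilities 0.05 and 0.01) are marked as forget positions, and the rest would keep their original distribution as anchors during training.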

2605.07096 2026-06-05 cs.LG cs.AI stat.ME Updated version

Query-efficient model evaluation using cached responses

通过缓存响应实现高效的模型评估

Hayden Helm, Ben Johnson, Carey Priebe

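Affiliations * University of Maryland

AI summary This paper proposes a method based on the Data Kernel Perspective Space (DKPS) that uses cached model responses to predict benchmark performance, reducing the number of queries needed to evaluate a new model.

Details
AI abstract (translated from Chinese)

Evaluating a new model on existing benchmarks is usually necessary before deployment. With modern evaluation frameworks, generating and scoring a response for every query can be prohibitively expensive. In practice, responses from previously evaluated models are often cached, which creates an opportunity to use this extra information to reduce the number of queries needed to evaluate a new model accurately. In this paper we introduce a method for predicting benchmark performance from cached model responses, based on the Data Kernel Perspective Space (DKPS), a method for quantifying relationships between models in the black-box setting. Theoretically, we show that DKPS-based methods are query-efficient under certain conditions. Empirically, we show that DKPS-based methods reach the same mean absolute error as baselines with a substantially smaller query budget. Finally, we propose an offline method for selecting a set of queries that maximizes goodness of fit on reference models, improving prediction accuracy over random query selection.

English abstract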

Evaluating a new model on an existing benchmark is often necessary to understand its behavior before deployment. For modern evaluation frameworks, generating and evaluating a response for all queries can be prohibitively expensive. In practice, responses from previously-evaluated models are often cached -- creating a potential opportunity to use this additional information to decrease the number of queries required to accurately evaluate a new model. In this paper, we introduce an approach for predicting benchmark performance that leverages cached model responses based on the Data Kernel Perspective Space (DKPS), a method for quantifying the relationship between models in the black-box setting. Theoretically, we show that DKPS-based methods are query-efficient under certain conditions. Empirically, we demonstrate that DKPS-based methods achieve the same mean absolute error as baselines with a substantially decreased query budget. We conclude by proposing an offline method for selecting a set of queries that maximizes the goodness-of-fit on reference models, improving prediction accuracy over random query selection.

2604.26634 2026-06-05 cs.LG econ.GN q-fin.EC stat.AP Updated version

Electricity price forecasting across Norway's five bidding zones in the post-crisis era

在危机后时代跨挪威五个竞价区的电力价格预测

My Thi Diem Phan, Trung Tuyen Truong, Hoai Phuong Ha, Dat Thanh Nguyen

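Affiliations * Independent researcher; Department of Mathematics, University of Oslo; Department of Computer Science, The Arctic University of Norway; Faculty of Medicine, University of Oslo

AI summary This paper studies electricity price forecasting in Norway's five bidding zones after the energy crisis. It builds a multimodal dataset, evaluates eight forecasting model families, finds that LightGBM performs best in every zone, and highlights the importance of external features under different market conditions.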

Comments This version removes variables unavailable at prediction time to eliminate look-ahead leakage, clarifies the forecasting task definition, and updates the results and discussion accordingly. All tables and figures have been recomputed

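Details
AI abstract (translated from Chinese)

Norway's electricity market has long been dominated by hydropower, but the 2021-2022 energy crisis and stronger integration with Continental Europe have fundamentally changed price formation and made forecasting models calibrated on historical data less reliable. Despite the need for updated models, there is no unified benchmark that evaluates feature contributions across all of the structurally diverse Norwegian bidding zones. This paper presents a comprehensive evaluation of one-step-ahead forecasting of the Nord Pool market in all five Norwegian bidding zones. We build a multimodal hourly dataset covering 2019-2025 and evaluate eight forecasting model families, including Light Gradient Boosting Machine (LightGBM), autoregressive models with exogenous variables, and advanced deep learning architectures, on a strictly causal test set. We implement robust rolling-origin backtesting, leave-one-group-out feature ablation, and conditional regime analysis to decompose model performance and feature utility. Our results show that LightGBM performs best in every zone, with mean absolute error between 1.60 and 5.58 euros per megawatt-hour, while a ridge-regularized autoregressive model with exogenous variables remains a highly competitive linear benchmark in the northern zones. Feature ablation shows that models relying only on lagged prices and calendar variables achieve high accuracy, often matching or approaching the full multimodal model. However, conditional regime analysis shows that external features such as reservoir levels and gas prices are crucial for stratifying forecast errors, which consistently increase under stressed market regimes. This highlights the practical value of model interpretability and regime awareness for decision makers facing structural changes in market dynamics.

English abstract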

Norway's electricity market is heavily dominated by hydropower, but the 2021-2022 energy crisis and stronger integration with Continental Europe have fundamentally altered price formation, reducing the reliability of forecasting models calibrated on historical data. Despite the critical need for updated models, a unified benchmark evaluating feature contributions across all structurally diverse Norwegian bidding zones remains lacking. Here we present a comprehensive evaluation of one-step-ahead forecasting of the Nord Pool market across all five Norwegian bidding zones. We constructed a multimodal hourly dataset spanning 2019-2025 and evaluated eight forecasting model families, including Light Gradient Boosting Machine (LightGBM), autoregressive models with exogenous variables, and advanced deep learning architectures, using a strictly causal test set. We implemented robust rolling-origin backtesting, leave-one-group-out feature ablation, and conditional regime analysis to dissect model performance and feature utility. Our results show that LightGBM achieves the best performance in every zone, with mean absolute error ranging from 1.60 to 5.58 euros per megawatt-hour, while a ridge-regularized autoregressive model with exogenous variables remains a highly competitive linear benchmark in northern zones. Feature ablation reveals that models relying solely on lagged prices and calendar variables achieve high accuracy and often match or closely approach the performance of the full multimodal model. However, conditional regime analysis demonstrates that external features like reservoir levels and gas prices remain crucial to stratify forecast errors, which consistently increase under stressed market regimes. This highlights the practical value of model interpretability and regime awareness for decision makers facing structural changes in market dynamics.
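Rolling-origin backtesting with a strictly causal test set means every test point is forecast using only data from before its forecast origin. A minimal sketch of generating such splits with an expanding training window is shown here; the function name and the window sizes are hypothetical and do not reflect the paper's actual configuration.

```python
def rolling_origin_splits(n_obs, initial_train, horizon, step):
    """Yield (train_indices, test_indices) pairs with an expanding training window."""
    origin = initial_train
    while origin + horizon <= n_obs:
        train_idx = list(range(0, origin))                # strictly before the origin
        test_idx = list(range(origin, origin + horizon))  # the next `horizon` points
        yield train_idx, test_idx
        origin += step                                    # move the forecast origin forward

# Toy example: 10 hourly observations, one-step-ahead forecasts every 2 hours.
splits = list(rolling_origin_splits(n_obs=10, initial_train=6, horizon=1, step=2))
for train_idx, test_idx in splits:
    print(len(train_idx), test_idx)
```

Because each training window ends where its test window begins, lagged-price features computed inside a split cannot leak future values into the forecast.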

2604.26269 2026-06-05 cs.CL cs.AI cs.LG Updated version

Calibrated Surprise: An Information-Theoretic Account of Creative Quality

校准的惊喜:一种信息论视角下的创造性质量

Bo Zou, Chao Xu

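Affiliations * Bo Zou; Chao Xu

AI summary This paper proposes an information-theoretic framework for assessing the quality of creative writing. It combines the notion of calibrated surprise with Shannon mutual information to quantify the difference between high-quality and degraded texts.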

Comments 28 pages, 3 figures

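Details
AI abstract (translated from Chinese)

In the era of large language models, the quality of creative writing lacks a computable theoretical anchor. The dominant approaches are rubric scoring, which decomposes holistic aesthetic judgment into sub-scores, and RLHF preference signals, which replace quality with group votes. Both bypass the statistical structure of the text itself. This paper provides an information-theoretic foundation to fill this gap. We propose 'calibrated surprise' as the information-theoretic essence of excellent creative writing. This judgment matches reading intuition and covers its opposite, and it admits a precise mathematical formulation. Under full-dimensional constraints Y, feasible writing choices are forced into an extremely narrow space. The rare survivors are, from the unconstrained perspective, exactly the least predictable choices. Both are measured precisely by Shannon mutual information I(X;Y) = H(X) - H(X|Y): 'calibrated' corresponds to H(X|Y) approaching 0, and 'surprising' corresponds to high H(X). The subtraction structure of the formula naturally separates 'well-grounded surprise' from 'pure noise'. We use token-level logprobs from Qwen1.5-7B as an operational proxy for the ideal reader's probability distribution. Across 20 pairs (12 Chinese / 8 English) of high-quality and systematically degraded literary passages, 20/20 pairs support the core prediction: high-quality passages have systematically higher I(X;Y) than their degraded versions.

English abstract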

In the era of large language models, creative writing quality lacks a computable theoretical anchor. The dominant approaches are rubric scoring -- decomposing holistic aesthetic judgment into sub-scores -- and RLHF preference signals -- replacing quality with group votes. Both bypass the statistical structure of the text itself. This paper provides an information-theoretic foundation to fill this gap. We propose 'calibrated surprise' as the information-theoretic essence of excellent creative writing. This judgment matches reading intuition and covers its opposite. This literary judgment admits a precise mathematical formulation. Under full-dimensional constraints Y, feasible writing choices are forced into an extremely narrow space. The rare survivors are, from the unconstrained perspective, exactly the least predictable choices. Both are measured precisely by Shannon mutual information I(X;Y) = H(X) - H(X|Y) -- 'calibrated' corresponds to H(X|Y) approaching 0; 'surprising' corresponds to H(X) going high. The subtraction structure of the formula naturally separates 'well-grounded surprise' from 'pure noise'. We use token-level logprobs from Qwen1.5-7B as an operational proxy for the ideal reader's probability distribution. Across 20 pairs (12 Chinese / 8 English) of high-quality vs. systematically degraded literary passages, 20/20 pairs support the core prediction: high-quality passages have systematically higher I(X;Y) than their degraded versions.
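The identity I(X;Y) = H(X) - H(X|Y) and the way its subtraction separates grounded surprise from noise can be checked on small discrete distributions. The sketch below works on toy joint probability tables, not on token log-probabilities from a language model; the table values and function names are illustrative assumptions only.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information(joint):
    """I(X;Y) = H(X) - H(X|Y) for joint[x, y] = P(X=x, Y=y)."""
    joint = np.asarray(joint, dtype=float)
    h_x = entropy(joint.sum(axis=1))                  # marginal entropy H(X)
    h_y = entropy(joint.sum(axis=0))                  # marginal entropy H(Y)
    h_x_given_y = entropy(joint.ravel()) - h_y        # chain rule: H(X|Y) = H(X,Y) - H(Y)
    return h_x - h_x_given_y

# X has high entropy in both cases (H(X) = 1 bit), but only the first is "calibrated".
calibrated = np.array([[0.5, 0.0], [0.0, 0.5]])       # Y determines X, so H(X|Y) = 0
noise = np.array([[0.25, 0.25], [0.25, 0.25]])        # X independent of Y, so H(X|Y) = H(X)
mi_calibrated = mutual_information(calibrated)
mi_noise = mutual_information(noise)
print(mi_calibrated, mi_noise)
```

Both tables have the same H(X), yet the mutual information is one bit when the constraints determine the choice and zero when the choice is independent of them, which is the separation the abstract attributes to the subtraction.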

2604.17121 2026-06-05 cs.LG cs.AI Updated version

The Topological Trouble With Transformers

Transformer 的拓扑困境

Michael C. Mozer, Shoaib Ahmed Siddiqui, Rosanne Liu

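Affiliations * Google DeepMind

AI summary This paper examines the topological problem transformers face when processing sequential structure: their purely feedforward architecture limits dynamic state tracking. It argues for a shift to implicit activation dynamics through recurrent architectures, and presents a taxonomy of continuous-thought transformer architectures along with future research directions.

Details
AI abstract (translated from Chinese)

Transformers encode structure in sequences through an expanding contextual history. However, their purely feedforward architecture fundamentally limits dynamic state tracking. State tracking, the iterative updating of latent variables that reflect a changing environment, involves inherently sequential dependencies that feedforward networks struggle to maintain. As a result, feedforward models push the evolving state representation deeper into their layer stack, making the information unavailable in shallow layers and eventually exhausting the model's depth. Dynamic-depth models and explicit or latent thinking can bypass this depth limit, but these solutions are inefficient in computation and memory. In this article, we argue that temporally extended cognition requires a shift from explicit thought traces to implicit activation dynamics through recurrent architectures. We introduce a taxonomy of recurrent and continuous-thought transformer architectures, classified by their recurrence axis (depth versus step) and by the ratio of input tokens to recurrence steps. Finally, we outline promising research directions, including enhanced state-space models and coarse-grained recurrence, for better integrating state tracking into modern foundation models.

English abstract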

Transformers encode structure in sequences via an expanding contextual history. However, their purely feedforward architecture fundamentally limits dynamic state tracking. State tracking -- the iterative updating of latent variables reflecting an evolving environment -- involves inherently sequential dependencies that feedforward networks struggle to maintain. Consequently, feedforward models push evolving state representations deeper into their layer stack with each new input step, rendering information inaccessible in shallow layers and ultimately exhausting the model's depth. While this depth limit can be bypassed by dynamic depth models and by explicit or latent thinking that externalizes state representations, these solutions are computationally and memory inefficient. In this article, we argue that temporally extended cognition requires refocusing from explicit thought traces to implicit activation dynamics via recurrent architectures. We introduce a taxonomy of recurrent and continuous-thought transformer architectures, categorizing them by their recurrence axis (depth versus step) and their ratio of input tokens to recurrence steps. Finally, we outline promising research directions, including enhanced state-space models and coarse-grained recurrence, to better integrate state tracking into modern foundation models.

2604.23466 2026-06-05 cs.LG cs.AI cs.AR Updated version

Evaluating CUDA Tile for AI Workloads on Hopper and Blackwell GPUs

评估Hopper和Blackwell GPU上的CUDA Tile用于AI工作负载

Divakar Kumar Yadav, Tian Zhao, Deepak Kumar

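Affiliations * University of California, Berkeley; Stanford University

AI summary This paper evaluates CUDA Tile on AI workloads on Hopper and Blackwell GPUs, comparing the efficiency and portability of CuTile with cuBLAS, Triton, and other approaches. CuTile performs very well on specific workloads but still falls short on cross-architecture optimization.

Details
AI abstract (translated from Chinese)

NVIDIA's CUDA Tile (CuTile) introduces a Python-based, tile-centric abstraction for GPU kernel development that aims to simplify programming while retaining Tensor Core and Tensor Memory Accelerator (TMA) efficiency on modern GPUs. We present the first independent, cross-architecture evaluation of CuTile on three NVIDIA GPUs spanning the Hopper and Blackwell architectures (H100 NVL, B200, and RTX PRO 6000 Blackwell Server Edition), comparing it against established approaches such as cuBLAS, Triton, WMMA, and raw SIMT. We benchmark representative AI workloads, including GEMM, fused multi-head attention, and end-to-end LLM inference (BF16/FP16 precision), to assess performance and portability. Our results show that CuTile's effectiveness depends strongly on workload and architecture. On datacenter-class Blackwell (B200), CuTile reaches up to 1007 TFLOP/s on fused attention, 2.5x faster than FlashAttention-2, with only 60 lines of Python kernel code. For GEMM, CuTile reaches 52-79% of cuBLAS performance in 22 lines of code, compared with 123 lines for WMMA, which makes it a practical replacement for hand-written CUDA kernels but not yet for vendor-optimized libraries. However, the same CuTile attention kernel reaches only 53% of FlashAttention-2 throughput on RTX PRO 6000 (sm_120), exposing a significant cross-architecture optimization gap. In contrast, Triton sustains 62-101% of cuBLAS performance on all tested platforms without architecture-specific tuning, showing stronger portability.

English abstract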

NVIDIA's CUDA Tile (CuTile) introduces a Python-based, tile-centric abstraction for GPU kernel development that aims to simplify programming while retaining Tensor Core and Tensor Memory Accelerator (TMA) efficiency on modern GPUs. We present the first independent, cross-architecture evaluation of CuTile against established approaches such as cuBLAS, Triton, WMMA, and raw SIMT on three NVIDIA GPUs spanning Hopper and Blackwell: H100 NVL, B200, and RTX PRO 6000 Blackwell Server Edition. We benchmark representative AI workloads, including GEMM, fused multi-head attention, and end-to-end LLM inference in BF16/FP16 precision, to assess both performance and portability. Our results show that CuTile effectiveness is strongly workload- and architecture-dependent. On datacenter-class Blackwell (B200), CuTile achieves up to 1007 TFLOP/s for fused attention, outperforming FlashAttention-2 by 2.5x while requiring only 60 lines of Python kernel code. For GEMM, CuTile reaches 52-79% of cuBLAS performance in 22 lines of code (versus 123 for WMMA), making it a practical replacement for hand-written CUDA kernels but not yet for vendor-optimized libraries. However, the same CuTile attention kernel achieves only 53% of FlashAttention-2 throughput on RTX PRO 6000 (sm_120), exposing significant cross-architecture optimization gaps. In contrast, Triton sustains 62-101% of cuBLAS performance across all tested platforms without architecture-specific tuning, demonstrating substantially stronger portability.

2602.23665 2026-06-05 cs.IR cs.LG cs.SI Updated version

Geodesic Semantic Search: Cartographic Navigation of Citation Graphs with Learned Local Riemannian Maps

测地语义搜索:基于学习局部黎曼度量的引文图导航

Brandon Yee, Lucas Wang, Kundana Kommini

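AI summary This paper proposes Geodesic Semantic Search (GSS), which learns node-specific Riemannian metrics on citation graphs for geometry-aware semantic retrieval. Unlike conventional embedding-based retrieval, which relies on fixed Euclidean distances, GSS learns a low-rank metric tensor at each node that induces a local positive-definite metric, which guarantees valid metrics while keeping the model tractable. Retrieval runs multi-source Dijkstra on the learned geodesic distances, followed by Maximal Marginal Relevance reranking and path coherence filtering. On a citation prediction benchmark with 169,000 arXiv papers, GSS improves Recall@20 by 23% over SPECTER+FAISS baselines. The paper provides a Bridge Recovery Guarantee describing when geodesic retrieval qualitatively outperforms direct similarity, a margin separation result linking training loss to retrieval quality, and a characterization of the expressiveness of the low-rank metric parameterization. A hierarchical coarse-to-fine search with k-means pooling cuts computational cost by 4x while keeping 97% of retrieval quality.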

Comments Substantial Revision Required

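Details
AI abstract (translated from Chinese)

We present Geodesic Semantic Search (GSS), a retrieval system that learns node-specific Riemannian metrics on citation graphs for geometry-aware semantic retrieval. Unlike standard embedding-based retrieval, which relies on fixed Euclidean distances, GSS learns a low-rank metric tensor $L_i \in \mathbb{R}^{d \times r}$ at each node, inducing a local positive-definite metric $G_i = L_i L_i^\top + \epsilon I$. This parameterization guarantees valid metrics while keeping the model tractable. Retrieval runs multi-source Dijkstra on the learned geodesic distances, followed by Maximal Marginal Relevance reranking and path coherence filtering. On a citation prediction benchmark with 169,000 arXiv papers, GSS improves Recall@20 by 23% over SPECTER+FAISS baselines. We provide a Bridge Recovery Guarantee describing when geodesic retrieval qualitatively outperforms direct similarity, a margin separation result linking training loss to retrieval quality, and a characterization of the expressiveness of the low-rank metric parameterization. Our hierarchical coarse-to-fine search with k-means pooling cuts computational cost by 4x while keeping 97% of retrieval quality.

English abstract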

We present Geodesic Semantic Search (GSS), a retrieval system that learns node-specific Riemannian metrics on citation graphs to enable geometry-aware semantic search. Unlike standard embedding-based retrieval that relies on fixed Euclidean distances, GSS learns a low-rank metric tensor $L_i \in \mathbb{R}^{d \times r}$ at each node, inducing a local positive semi-definite metric $G_i = L_i L_i^\top + \epsilon I$. This parameterization guarantees valid metrics while keeping the model tractable. Retrieval proceeds via multi-source Dijkstra on the learned geodesic distances, followed by Maximal Marginal Relevance reranking and path coherence filtering. On citation prediction benchmarks with 169K arXiv papers, GSS achieves 23% relative improvement in Recall@20 over SPECTER+FAISS baselines. We provide a Bridge Recovery Guarantee characterizing when geodesic retrieval qualitatively outperforms direct similarity, a margin separation result connecting training loss to retrieval quality, and characterize the expressiveness of low-rank metric parameterization. Our hierarchical coarse-to-fine search with k-means pooling reduces computational cost by $4\times$ while maintaining 97% retrieval quality.
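The two ingredients named in the abstract, a local metric of the form L Lᵀ + εI and multi-source Dijkstra over metric-weighted edges, can be sketched on a toy graph. How the paper turns local metrics into edge lengths is not stated here, so measuring each edge in the source node's metric is an assumption, and all function names, the random embeddings, and the value of `eps` are hypothetical.

```python
import heapq
import numpy as np

def local_metric(L, eps=1e-3):
    """Build G = L L^T + eps * I from a low-rank factor L of shape (d, r)."""
    return L @ L.T + eps * np.eye(L.shape[0])

def edge_length(x_i, x_j, G_i):
    """Length of the step x_i -> x_j in the metric attached to node i (assumed rule)."""
    diff = x_j - x_i
    return float(np.sqrt(diff @ G_i @ diff))

def multi_source_dijkstra(adj, sources):
    """Shortest distance from the nearest source; adj maps node -> [(neighbor, length)]."""
    dist = {s: 0.0 for s in sources}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d_u, u = heapq.heappop(heap)
        if d_u > dist.get(u, float("inf")):
            continue                                  # stale heap entry
        for v, w in adj.get(u, []):
            cand = d_u + w
            if cand < dist.get(v, float("inf")):
                dist[v] = cand
                heapq.heappush(heap, (cand, v))
    return dist

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))                                    # toy embeddings, d = 4
G = [local_metric(rng.standard_normal((4, 2))) for _ in range(3)]  # rank r = 2 per node
adj = {i: [] for i in range(3)}
for i, j in [(0, 1), (1, 2), (0, 2)]:                              # toy directed edges
    adj[i].append((j, edge_length(x[i], x[j], G[i])))
dist = multi_source_dijkstra(adj, sources=[0])
print(dist)
```

With ε > 0 every eigenvalue of G is at least ε, so each edge length is a genuine norm of the embedding difference and the shortest-path distances are well defined.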

2604.22583 2026-06-05 cs.LG Updated version

Adaptive Head Budgeting for Efficient Multi-Head Attention

自适应头预算用于高效多头注意力

Bilal Faye, Abdoulaye Mbaye, Hanane Azzag, Mustapha Lebbah

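Affiliations * LIPN, Université Paris 13; Université Paris 13; Université de Versailles Saint-Quentin-en-Yvelines

AI summary Proposes BudgetFormer, which dynamically allocates an attention-head budget and a relevance distribution, reducing compute and memory overhead on text classification tasks while matching or improving performance.

Details
AI abstract (translated from Chinese)

Multi-head attention lets Transformers capture diverse representations, but all attention heads are usually activated for every input regardless of task complexity. For coarse-grained tasks such as text classification, where the relevant information is usually global, this fixed allocation introduces unnecessary computation. We propose BudgetFormer, a Transformer architecture that allocates attention heads dynamically for each input. The model learns a head budget and a relevance distribution to select the most informative heads. To support effective head selection, we introduce a training strategy that balances exploration and exploitation. Experiments on text classification tasks show that BudgetFormer reduces FLOPs and memory usage while matching or surpassing standard multi-head attention. These results highlight adaptive head allocation as an effective way to improve Transformer efficiency and performance.

English abstract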

Multi-head attention enables Transformers to capture diverse representations, but all attention heads are typically activated for every input, regardless of task complexity. For coarse-grained tasks such as text classification, where relevant information is often global, this fixed allocation can introduce unnecessary computation. We propose BudgetFormer, a Transformer architecture that dynamically allocates attention heads on a per-input basis. The model learns both a head budget and a relevance distribution to select the most informative heads. To support effective head selection, we introduce a training strategy that balances exploration and exploitation. Experiments on text classification tasks show that BudgetFormer reduces FLOPs and memory usage while matching or surpassing the performance of standard multi-head attention. These results highlight adaptive head allocation as an effective approach to improving Transformer efficiency and performance.

2602.13939 2026-06-05 cs.LG cs.AI Updated version

A Horizon-Aware Decision-Support Framework for Demand Forecasting Model Selection in Resilient Production Planning

面向 horizon 的决策支持框架:用于在鲁棒生产计划中选择需求预测模型

Adolfo González, Víctor Parada

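Affiliations * Department of Computer Engineering and Informatics, Faculty of Engineering, University of Santiago of Chile

AI summary This paper proposes a horizon-aware decision-support framework for selecting demand forecasting models in production planning with volatile, highly uncertain demand. The MDFH procedure projects error metrics across forecast horizons, and RMSSEh and AHSIV are proposed as improved model selectors.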

Comments 31 pages, 12 figures and Appendix

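Details
AI abstract (translated from Chinese)

Demand forecasting is a critical input to resilient production planning, inventory replenishment, procurement, and capacity decisions, especially under intermittent demand, high variability, and operational uncertainty. In these settings, selecting forecasting models only by their performance over a fixed test horizon can lead to decisions that are misaligned with the future planning horizons in which the forecasts are used. This paper proposes the Metric Degradation by Forecast Horizon (MDFH) procedure as a horizon-aware decision-support framework for selecting demand forecasting models. Under explicit structural-stability conditions, MDFH projects out-of-sample error metrics (specifically MAE, RMSE, and RMSSE) from the observed test horizon to future operational horizons. Built on this layer, RMSSEh is derived as a parsimonious horizon-aware selector, and the Adaptive Hybrid Selector for Intermittency and Variability (AHSIV) is proposed as an adaptive extension for structurally heterogeneous demand series. ERA, a multivariate ranking-aggregation selector, is included as a comparator. The empirical evaluation uses the Walmart, M3, M4, and M5 datasets, three training-testing partitions, 22 forecasting models, and 12-step future horizons. The results show that RMSSEh and AHSIV provide more consistent downstream volumetric alignment than ERA when assessed by ex post Global Relative Accuracy.

English abstract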

Demand forecasting is a critical input for resilient production planning, inventory replenishment, procurement, and capacity decisions under demand intermittency, high variability, and operational uncertainty. In these contexts, selecting forecasting models solely on the basis of fixed test-horizon performance may lead to decisions misaligned with the future planning horizons in which forecasts are used. This study proposes the Metric Degradation by Forecast Horizon (MDFH) procedure as a horizon-aware decision-support framework for selecting demand forecasting models. MDFH projects eligible out-of-sample error metrics, specifically MAE, RMSE, and RMSSE, from an observed test horizon toward future operational horizons under explicit structural-stability conditions. Based on this layer, RMSSEh is derived as a parsimonious horizon-aware selector, while the Adaptive Hybrid Selector for Intermittency and Variability (AHSIV) is proposed as an adaptive extension for structurally heterogeneous demand series. ERA, a multivariate ranking-aggregation selector, is included as a comparator. The empirical evaluation uses the Walmart, M3, M4, and M5 datasets, three training-testing partitions, 22 forecasting models, and 12-step future horizons. Results show that RMSSEh and AHSIV provide more consistent downstream volumetric alignment than ERA when assessed through ex post Global Relative Accuracy.
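Of the metrics that MDFH projects, RMSSE is the least familiar. A minimal sketch of the standard definition (the form used in the M5 competition, which scales the out-of-sample squared error by the in-sample squared error of a one-step naive forecast) is given below. It does not reproduce the paper's horizon projection or the RMSSEh selector, and the toy series is made up.

```python
import numpy as np

def rmsse(y_train, y_test, y_pred):
    """Root Mean Squared Scaled Error with a one-step naive in-sample scale."""
    y_train = np.asarray(y_train, dtype=float)
    y_test = np.asarray(y_test, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    scale = np.mean(np.diff(y_train) ** 2)        # in-sample MSE of the naive forecast
    mse = np.mean((y_test - y_pred) ** 2)         # out-of-sample MSE of the model
    return float(np.sqrt(mse / scale))

y_train = [1.0, 3.0, 2.0, 4.0]    # naive one-step errors 2, -1, 2 give scale 3
y_test = [5.0, 6.0]
y_pred = [4.0, 8.0]               # forecast errors 1, -2 give MSE 2.5
score = rmsse(y_train, y_test, y_pred)
print(score)
```

A value below 1 means the model's out-of-sample error is smaller than the naive forecast's in-sample error, which makes the score comparable across series with different scales.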

2510.22048 2026-06-05 cs.LG Updated version

PF$Δ$: A Benchmark Dataset for Power Flow under Load, Generation, and Topology Variations

PF$Δ$: 一个用于负载、发电和拓扑变化的功率流基准数据集

Ana K. Rivera, Anvita Bhagavathula, Alvaro Carbonero, Priya Donti

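Affiliations * Department of Electrical Engineering & Computer Science; Laboratory for Information & Decision Systems; Massachusetts Institute of Technology

AI summary This paper presents PF$Δ$, a benchmark dataset for evaluating power flow computation under load, generation, and topology variations. It contains 859,800 solved instances across six bus system sizes and three contingency scenarios, and is used to evaluate traditional solvers and GNN-based methods, identifying weaknesses of existing methods and open problems for future research.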

Comments 31 pages, 14 figures. Accepted at NeurIPS 2025

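Details
Journal ref
NeurIPS 2025
AI abstract (translated from Chinese)

Power flow (PF) calculations are central to real-time grid operations and are widely used in workflows such as contingency analysis (where repeated PF evaluations assess grid security under outages) and topology optimization (which involves PF-based searches over combinatorially large action spaces). Running these calculations at operational timescales or over large evaluation spaces remains a major computational bottleneck. In addition, growing uncertainty in power system operations, driven by renewable integration and climate-induced extreme weather, calls for tools that can simulate a wide range of scenarios and operating conditions accurately and efficiently. Machine learning methods offer a potential speedup over traditional solvers, but their performance has not been systematically evaluated on benchmarks that capture real-world variability. This paper introduces PF$Δ$, a benchmark dataset for power flow that captures diverse variations in load, generation, and topology. PF$Δ$ contains 859,800 solved power flow instances across six bus system sizes, covers three types of contingency scenarios (N, N-1, and N-2), and includes close-to-infeasible cases near steady-state voltage stability limits. We evaluate traditional solvers and GNN-based methods, highlight key areas where existing approaches fall short, and identify open problems for future research. Our dataset is available at https://huggingface.co/datasets/pfdelta/pfdelta/tree/main, and our code, data generation scripts, and model implementations are available at https://github.com/MOSSLab-MIT/pfdelta.

English abstract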

Power flow (PF) calculations are the backbone of real-time grid operations, across workflows such as contingency analysis (where repeated PF evaluations assess grid security under outages) and topology optimization (which involves PF-based searches over combinatorially large action spaces). Running these calculations at operational timescales or across large evaluation spaces remains a major computational bottleneck. Additionally, growing uncertainty in power system operations from the integration of renewables and climate-induced extreme weather also calls for tools that can accurately and efficiently simulate a wide range of scenarios and operating conditions. Machine learning methods offer a potential speedup over traditional solvers, but their performance has not been systematically assessed on benchmarks that capture real-world variability. This paper introduces PF$Δ$, a benchmark dataset for power flow that captures diverse variations in load, generation, and topology. PF$Δ$ contains 859,800 solved power flow instances spanning six different bus system sizes, capturing three types of contingency scenarios (N, N-1, and N-2), and including close-to-infeasible cases near steady-state voltage stability limits. We evaluate traditional solvers and GNN-based methods, highlighting key areas where existing approaches struggle, and identifying open problems for future research. Our dataset is available at https://huggingface.co/datasets/pfdelta/pfdelta/tree/main and our code with data generation scripts and model implementations is at https://github.com/MOSSLab-MIT/pfdelta.

2604.03634 2026-06-05 cs.LG cs.IT eess.SP math.IT Updated version

Algebraic Diversity: Group-Theoretic Spectral Estimation from Single Observations

代数多样性:从单次观测进行群论谱估计

Mitchell A. Thornton

发表机构 * Richardson, TX 75080 USA(美国德克萨斯州里奇蒙德市75080号)

AI总结 本文通过群论方法揭示了单次观测下的谱估计问题,证明了时间平均是退化群作用的特例,并展示了群平均估计与多快门协方差估计的等效性,同时统一了DFT、DCT和KLT等变换。

Comments 41 pages, 14 figures. v3: Retracted six quantitative findings in Section 11, transformer application, due to implementation error in spectral concentration metric. Corrected results deferred to separate publication. Remark added after Conjecture 23 on orbit-structure bias in psi criterion. All other sections unaffected. v4: new result on blind group matching. v5: corrected/updated metrics.

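Details
AI abstract (translated from Chinese)

We show that temporal averaging over multiple observations is the degenerate case of a group action, with the trivial group G={e}. A General Replacement Theorem proves that a group-averaged estimator from a single snapshot achieves subspace decomposition equivalent to multi-snapshot covariance estimation. The Trivial Group Embedding Theorem proves that the sample covariance is the accumulation of trivial-group estimates, with variance governed by a (G,L) continuum that scales as 1/(|G|·L). The processing gain of 10log10(M) dB equals the classical beamforming gain, establishing that this gain is a property of group order rather than sensor count. The DFT, DCT, and KLT are unified as group-matched special cases. We conjecture a General Algebraic Averaging Theorem that extends these results to arbitrary statistics, with variance governed by the effective group order d_eff. Monte Carlo experiments on the first four sample moments across five group types confirm the conjecture to four-digit precision. The framework exploits the structure of information (the representation-theoretic symmetry of the data object) rather than its content, complementing Shannon's theory. Five applications are demonstrated: single-snapshot MUSIC, massive MIMO, single-pulse waveform classification, graph signal processing, and analysis of transformer LLMs. Techniques for blind group matching are described.

English abstract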

We establish that temporal averaging over multiple observations is the degenerate case of algebraic group action with the trivial group $G=\{e\}$. A General Replacement Theorem proves that a group-averaged estimator from one snapshot achieves equivalent subspace decomposition to multi-snapshot covariance estimation. The Trivial Group Embedding Theorem proves that the sample covariance is the accumulation of trivial-group estimates, with variance governed by a $(G,L)$ continuum as $1/(|G|\cdot L)$. The processing gain $10\log_{10}(M)$ dB equals the classical beamforming gain, establishing that this gain is a property of group order, not sensor count. The DFT, DCT, and KLT are unified as group-matched special cases. We conjecture a General Algebraic Averaging Theorem extending these results to arbitrary statistics, with variance governed by the effective group order $d_{\mathrm{eff}}$. Monte Carlo experiments on the first four sample moments across five group types confirm the conjecture to four-digit precision. The framework exploits the $structure$ of information (representation-theoretic symmetry of the data object) rather than the content, complementing Shannon's theory. Five applications are demonstrated: single-snapshot MUSIC, massive MIMO, single-pulse waveform classification, graph signal processing, and analysis of transformer LLMs. Techniques for blind group matching are described.
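
Illustrative sketch (not from the paper): the abstract describes averaging a single snapshot over a group and names the DFT as a group-matched special case. The Python below assumes the cyclic group of shifts as that group; the helper names `group_averaged_covariance`, `dft`, and `max_eigen_residual` are hypothetical. It averages the outer products of all shifted copies of one snapshot and checks that every DFT basis vector is an eigenvector of the result, with eigenvalue |X_k|^2 / n.

```python
import cmath

def cyclic_shift(x, s):
    """Apply the cyclic-group element 'shift by s' to the snapshot x."""
    n = len(x)
    return [x[(i + s) % n] for i in range(n)]

def group_averaged_covariance(x):
    """Average the outer products (g x)(g x)^H over all n cyclic shifts g."""
    n = len(x)
    cov = [[0j] * n for _ in range(n)]
    for s in range(n):
        y = cyclic_shift(x, s)
        for i in range(n):
            for j in range(n):
                cov[i][j] += y[i] * y[j].conjugate() / n
    return cov

def dft(x):
    """Plain O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def max_eigen_residual(x):
    """Largest entry of |R v_k - lambda_k v_k| over all DFT basis vectors v_k,
    with lambda_k = |X_k|^2 / n taken from the DFT of the same snapshot."""
    n = len(x)
    cov = group_averaged_covariance(x)
    spectrum = dft(x)
    worst = 0.0
    for k in range(n):
        v = [cmath.exp(2j * cmath.pi * k * i / n) for i in range(n)]
        lam = abs(spectrum[k]) ** 2 / n
        for i in range(n):
            rv_i = sum(cov[i][j] * v[j] for j in range(n))
            worst = max(worst, abs(rv_i - lam * v[i]))
    return worst

snapshot = [1 + 0j, 2 - 1j, 0.5j, -1 + 0j]   # one made-up observation
print(max_eigen_residual(snapshot))          # close to zero, up to rounding
```

Under this assumption the group-averaged matrix is circulant, so one snapshot already yields a full eigen-decomposition; other groups would need their own matched transform.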

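2603.10457 2026-06-05 physics.plasm-ph cond-mat.stat-mech cs.LG physics.acc-ph Updated version

Beam-Plasma Collective Oscillations in Intense Charged-Particle Beams: Dielectric Response Theory, Langmuir Wave Dispersion, and Unsupervised Detection via Prometheus

Brandon Yee, Wilson Collins, Michael Iofin, Jiayi Fu

AI summary The paper develops a theoretical and computational framework for beam-plasma collective oscillations in intense charged-particle beams, combining dielectric response theory and Langmuir wave dispersion relations with validation by Prometheus, a beta-VAE, and argues that the predicted signatures are accessible at existing intermediate-energy beam facilities.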

Comments Substantial Revision Required

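Details
AI abstract (translated from Chinese)

We develop a theoretical and computational framework for studying beam-plasma collective oscillations in intense charged-particle beams at intermediate energies (10-100 MeV). In Part I, we formulate a kinetic field theory governed by the Vlasov-Poisson system and derive the Lindhard dielectric function and the random phase approximation (RPA) polarization tensor for three beam distribution functions. Using the condition epsilon(omega,q)=0 on the dielectric function, we prove that undamped Langmuir wave modes exist above a critical beam density n_c, obtain explicit beam-plasma dispersion relations, and show that Landau damping vanishes above the particle-hole continuum. The plasma frequency Omega_p^2 = ne^2/(m*epsilon_0) is fixed by the f-sum rule independently of the distribution shape; higher dispersion coefficients depend on velocity moments. Space charge effects drive anomalous beam broadening with a sqrt(n-n_c) onset and Friedel oscillations at q=2k_F. Renormalization group analysis places the beam-plasma transition in the 3D Ising universality class. In Part II, we validate these predictions with Prometheus, a beta-VAE trained on static structure factor data S(q) from particle-in-cell (PIC) beam simulations. Prometheus detects the onset of collective plasma oscillations in Gaussian and uniform distributions, confirms their absence in the degenerate Fermi gas (n_c -> 0), and resolves the Kohn anomaly at q=2k_F. Dispersion analysis of S(q,omega) from PIC simulations verifies the distribution-independent Omega_p predicted by the f-sum rule. All six validation checks pass. The predicted signatures (density-tunable plasma resonances with omega_p proportional to sqrt(n), anomalous beam broadening with a sqrt(n-n_c) onset, and Friedel oscillations) are accessible at existing intermediate-energy beam facilities.

English abstract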

We develop a theoretical and computational framework for beam-plasma collective oscillations in intense charged-particle beams at intermediate energies (10-100 MeV). In Part I, we formulate a kinetic field theory governed by the Vlasov-Poisson system, deriving the Lindhard dielectric function and random phase approximation (RPA) polarization tensor for three beam distribution functions. We prove via the dielectric function epsilon(omega,q)=0 the existence of undamped Langmuir wave modes above a critical beam density n_c, obtain explicit beam-plasma dispersion relations, and show that Landau damping vanishes above the particle-hole continuum. The plasma frequency Omega_p^2 = ne^2/(m*epsilon_0) is fixed by the f-sum rule independently of distribution shape; higher dispersion coefficients depend on velocity moments. Space charge effects drive anomalous beam broadening with sqrt(n-n_c) onset and Friedel oscillations at q=2k_F. The beam-plasma transition belongs to the 3D Ising universality class via renormalization group analysis. In Part II, we validate these predictions using Prometheus, a beta-VAE trained on static structure factor data S(q) from particle-in-cell (PIC) beam simulations. Prometheus detects collective plasma oscillation onset in Gaussian and uniform distributions, confirms their absence in the degenerate Fermi gas (n_c -> 0), and resolves the Kohn anomaly at q=2k_F. Dispersion analysis of S(q,omega) from PIC simulations verifies the distribution-independent Omega_p predicted by the f-sum rule. All six validation checks pass. Predicted signatures -- density-tunable plasma resonances at omega_p proportional to sqrt(n), anomalous beam broadening with sqrt(n-n_c) onset, and Friedel oscillations -- are accessible at existing intermediate-energy beam facilities.
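
Illustrative sketch (not from the paper): the formula Omega_p^2 = ne^2/(m*epsilon_0) quoted in the abstract can be evaluated directly. The Python below assumes an electron beam (the abstract does not name the particle species) and SI units; `plasma_frequency` is a hypothetical helper name.

```python
import math

E_CHARGE = 1.602176634e-19      # elementary charge, C
EPSILON_0 = 8.8541878128e-12    # vacuum permittivity, F/m
M_ELECTRON = 9.1093837015e-31   # electron mass, kg (assumed species)

def plasma_frequency(n, mass=M_ELECTRON, charge=E_CHARGE):
    """Angular plasma frequency in rad/s for number density n in m^-3."""
    return math.sqrt(n * charge ** 2 / (mass * EPSILON_0))

omega_low = plasma_frequency(1e18)
omega_high = plasma_frequency(4e18)
print(omega_low)                # roughly 5.6e10 rad/s
print(omega_high / omega_low)   # quadrupling n doubles the frequency
```

The ratio in the last line is the sqrt(n) scaling that the abstract lists as a density-tunable signature.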

2604.07709 2026-06-05 cs.AI cs.CL cs.CY cs.LG Updated version

IatroBench: Pre-Registered Evidence of Iatrogenic Harm from AI Safety Measures

David Gringras

Affiliations * Harvard T.H. Chan School of Public Health

AI summary The study uses IatroBench to measure iatrogenic harm from AI safety measures in clinical scenarios, finding identity-contingent withholding, in which models give a physician more than a patient, that is most pronounced in the most heavily safety-trained model.

Comments 30 pages, 3 figures, 11 tables. Pre-registered on OSF (DOI: 10.17605/OSF.IO/G6VMZ). Code and data: https://github.com/davidgringras/iatrobench. v2: Fix bibliography entries (add arXiv IDs, published venues); correct p-value typo in Limitations section; add AI Assistance Statement. v3: Correct Figure 1 (decoupling scatter accidentally reverted to earlier draft in v2).

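Details
AI abstract (translated from Chinese)

A heavily safety-trained model will hand a physician the complete benzodiazepine taper schedule yet refuse it to the patient who needs it, even though the clinical facts are identical; the knowledge is present in both cases. IatroBench measures this asymmetry across sixty pre-registered clinical scenarios and six frontier models (3,600 responses), scoring each response on two axes, commission harm (what a response gets wrong) and omission harm (what it withholds), through a physician-authored structured evaluation validated by a second physician (weighted kappa 0.571, within-1 agreement 96%). Holding the clinical content fixed and varying only whether the asker presents as a patient or a physician produces what we call identity-contingent withholding: all five testable models give the physician more (a decoupling gap of +0.38, p = 0.003; a 13.1-point fall in layperson hit rates on safety-colliding actions, p < 0.0001; no change on the rest), and the gap is widest in the most heavily safety-trained model, Opus (+0.65). The trigger is the absence of any professional or epistemic signal rather than a credential, since a lawyer or an informed layperson recovers what the patient is refused. A commission-only benchmark would score three mechanisms alike: Opus suppresses what the physician framing proves it knows; Llama 4 is incompetent in either framing; GPT-5.2's filter strips 33.2% of its physician responses and none of the lay ones. The evaluation layer inherits the blindness of the training layer: a standard LLM judge scores zero omission harm on 81.5% of the responses our pipeline flags as harmful (kappa 0.066), so the instrument built to detect the failure reproduces it. The scenarios are engineered for collision; their rates describe that design and say nothing about ordinary prevalence.

English abstract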

A heavily safety-trained model will hand a physician the full, patient-followable benzodiazepine taper and refuse it to the patient who needs it, over identical clinical facts; the knowledge is present either way. IatroBench measures that asymmetry across sixty pre-registered clinical scenarios and six frontier models (3,600 responses), scoring each on two axes, commission harm (what a response gets wrong) and omission harm (what it withholds), through a physician-authored structured evaluation validated by a second physician (weighted kappa 0.571, within-1 agreement 96%). Holding clinical content fixed and varying only whether the asker presents as patient or physician yields what we call identity-contingent withholding: all five testable models give the physician more (a decoupling gap of +0.38, p = 0.003; a 13.1-point fall in layperson hit rates on safety-colliding actions, p < 0.0001; no change on the rest), and the gap runs widest in the most heavily safety-trained model, Opus (+0.65). The trigger is the absence of any professional or epistemic signal rather than a credential, since a lawyer or an informed layperson recovers what the patient is refused. A commission-only benchmark would score three mechanisms alike. Opus suppresses what physician framing proves it knows; Llama 4 is incompetent in either framing; GPT-5.2's filter strips 33.2% of its physician responses and none of the lay ones. The evaluation layer inherits the blindness of the training layer; a standard LLM judge scores zero omission harm on 81.5% of the responses our pipeline flags harmful (kappa 0.066), so the instrument built to detect the failure reproduces it. The scenarios are engineered for collision; their rates describe that design and say nothing about ordinary prevalence.

2604.12110 2026-06-05 cs.LG Updated version

SOLARIS: Speculative Offloading of Latent-bAsed Representation for Inference Scaling

Zikun Liu, Liang Luo, Qianru Li, Zhengyu Zhang, Wei Ling, Jingyi Shen, Zeliang Chen, Yaning Huang, Jingxian Huang, Abdallah Aboelela, Chonglin Sun, Feifan Gu, Fenggang Wu, Hang Qu, Huayu Li, Jill Pan, Kaidi Pei, Laming Chen, Longhao Jin, Qin Huang, Tongyi Tang, Varna Puvvada, Wenlin Chen, Xiaohan Wei, Xu Cao, Yantao Yao, Yuan Jin, Yunchen Pu, Yuxin Chen, Zijian Shen, Zhengkai Zhang, Jing Zhu, Dong Liang, Ellie Wen

Affiliations * Meta AI

AI summary The paper proposes SOLARIS, a framework that precomputes user-item interaction embeddings for pairs predicted to appear in future requests, decoupling costly foundation-model inference from the latency-critical serving path and enabling real-time knowledge transfer at scale, with a 0.67% gain in revenue-driving top-line metrics.

Comments Accepted to SIGIR 2026 Industry Track

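Details
AI abstract (translated from Chinese)

Recent advances in recommendation scaling laws have produced foundation models of unprecedented complexity. Although these models perform very well, their computational demands make real-time serving impractical and often force practitioners to rely on knowledge distillation, trading serving quality for efficiency. To address this challenge, we present SOLARIS (Speculative Offloading of Latent-bAsed Representation for Inference Scaling), a framework inspired by speculative decoding. SOLARIS proactively precomputes user-item interaction embeddings by predicting which user-item pairs are likely to appear in future requests and asynchronously generating their foundation model representations. This decouples costly foundation model inference from the latency-critical serving path, enabling real-time knowledge transfer from models previously considered too expensive for online use. Deployed in Meta's advertising system, which serves billions of requests per day, SOLARIS achieves a 0.67% gain in revenue-driving top-line metrics, demonstrating its effectiveness at scale.

English abstract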

Recent advances in recommendation scaling laws have led to foundation models of unprecedented complexity. While these models offer superior performance, their computational demands make real-time serving impractical, often forcing practitioners to rely on knowledge distillation, compromising serving quality for efficiency. To address this challenge, we present SOLARIS (Speculative Offloading of Latent-bAsed Representation for Inference Scaling), a novel framework inspired by speculative decoding. SOLARIS proactively precomputes user-item interaction embeddings by predicting which user-item pairs are likely to appear in future requests, and asynchronously generating their foundation model representations ahead of time. This approach decouples the costly foundation model inference from the latency-critical serving path, enabling real-time knowledge transfer from models previously considered too expensive for online use. Deployed across Meta's advertising system serving billions of daily requests, SOLARIS achieves a 0.67% gain in revenue-driving top-line metrics, demonstrating its effectiveness at scale.
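
Illustrative sketch (not Meta's implementation): the precompute-then-serve split described in the abstract can be pictured as a cache that is filled off the serving path and only read on it. Everything in the Python below is hypothetical, including the class `SpeculativeEmbeddingCache`, the two toy models, and the choice to fall back to a cheap default on a miss; the real system generates representations asynchronously, which this synchronous toy does not attempt.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[str, str]      # (user_id, item_id)
Embedding = List[float]

class SpeculativeEmbeddingCache:
    def __init__(self, expensive_model: Callable[[Pair], Embedding],
                 cheap_fallback: Callable[[Pair], Embedding]) -> None:
        self._expensive_model = expensive_model
        self._cheap_fallback = cheap_fallback
        self._cache: Dict[Pair, Embedding] = {}
        self.hits = 0
        self.misses = 0

    def precompute(self, predicted_pairs: Iterable[Pair]) -> None:
        """Off the serving path: run the expensive model on predicted pairs."""
        for pair in predicted_pairs:
            self._cache[pair] = self._expensive_model(pair)

    def serve(self, pair: Pair) -> Embedding:
        """On the serving path: never calls the expensive model."""
        if pair in self._cache:
            self.hits += 1
            return self._cache[pair]
        self.misses += 1
        return self._cheap_fallback(pair)

def toy_foundation_model(pair: Pair) -> Embedding:
    """Stand-in for the expensive model (hypothetical)."""
    user_id, item_id = pair
    return [float(len(user_id)), float(len(item_id)), 1.0]

def toy_fallback(pair: Pair) -> Embedding:
    """Stand-in for a cheap default used on a cache miss (hypothetical)."""
    return [0.0, 0.0, 0.0]

cache = SpeculativeEmbeddingCache(toy_foundation_model, toy_fallback)
cache.precompute([("alice", "ad42"), ("bob", "ad7")])
hit = cache.serve(("alice", "ad42"))    # precomputed representation
miss = cache.serve(("carol", "ad42"))   # falls back, no expensive call
```

In this picture, serving quality depends on how often the predicted pairs match the pairs that actually arrive, which is what the hit and miss counters track.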

2604.08477 2026-06-05 cs.AI cs.CL cs.LG Updated version

SUPERNOVA: Eliciting General Reasoning in LLMs with Reinforcement Learning on Natural Instructions

Ashima Suvarna, Kendrick Phan, Mehrab Beikzadeh, Hritik Bansal, Saadia Gabriel

Affiliations * University of California, Los Angeles

AI summary The paper proposes SUPERNOVA, a framework for curating high-quality RLVR (reinforcement learning with verifiable rewards) data from natural instruction datasets; through 100+ controlled RL experiments it studies how data curation choices affect downstream reasoning and reports a relative gain of 64.4pp on BigBench Extra Hard.

Comments 23 Pages; 2-column format; 10 figures

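Details
AI abstract (translated from Chinese)

Reinforcement learning with verifiable rewards (RLVR) has substantially improved reasoning in formal domains such as mathematics and code, but extending these gains beyond STEM remains challenging. Extending RLVR beyond STEM is fundamentally constrained by the lack of high-quality verifiable training data. In this paper we introduce SUPERNOVA, a framework for curating RLVR data from natural instruction datasets, which are a rich source of expert-annotated data but remain underexplored for RLVR training. Through more than 100 controlled RL experiments, we systematically study how to use these datasets for RLVR and how data curation decisions affect downstream reasoning performance. In particular, we investigate three data designs: (a) source task selection, (b) task mixing, and (c) synthetic interventions. Our analysis shows that source task selection has a significant impact on downstream reasoning performance. Moreover, selecting tasks based on their performance on individual target tasks outperforms strategies based on overall average performance, and synthetic interventions do not improve reasoning. Guided by these insights, we construct SUPERNOVA, a high-quality RLVR dataset of 25,000 instances curated from natural instruction datasets. We show that training Qwen3-0.6B on SUPERNOVA outperforms the base Qwen3-0.6B, with a relative gain of 64.4pp on BigBench Extra Hard (BBEH), a challenging benchmark of 23 complex reasoning tasks. Importantly, the gains from SUPERNOVA generalize to unseen benchmarks, larger model scales, and newer model families. Overall, our findings offer practical insights for curating human-annotated resources to extend RLVR to general reasoning. Models, data, and code are at https://github.com/asuvarna31/supernova.

English abstract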

Reinforcement Learning with Verifiable Rewards (RLVR) has substantially improved reasoning in formal domains such as mathematics and code, but extending these gains beyond STEM remains challenging. Extending RLVR beyond STEM is fundamentally constrained by the lack of high-quality verifiable training data. In this work, we introduce SUPERNOVA, a framework for curating RLVR data from natural instruction datasets, which are a rich source of expert-annotated data but are underexplored for RLVR training. Through 100+ controlled RL experiments, we systematically study how to utilize these datasets for RLVR and how data curation decisions affect downstream reasoning performance. In particular, we investigate three data designs: (a) source task selection, (b) task mixing, and (c) synthetic interventions. Our analysis reveals that source task selection has a significant impact on downstream reasoning performance. Moreover, selecting tasks based on their performance for individual target tasks outperforms strategies based on overall average performance, and synthetic interventions do not improve reasoning. Guided by these insights, we construct SUPERNOVA, a high-quality RLVR dataset of 25K instances curated from natural instruction datasets. We show that training Qwen3-0.6B on SUPERNOVA outperforms the base Qwen3-0.6B, yielding a relative gain of 64.4pp on BigBench Extra Hard (BBEH), a challenging benchmark comprising 23 complex reasoning tasks. Importantly, we find that gains from SUPERNOVA generalize to unseen benchmarks, larger model scales, and newer model families. Overall, our findings provide practical insights for curating human-annotated resources to extend RLVR to general reasoning. Models, Data, Code at https://github.com/asuvarna31/supernova.

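2604.01349 2026-06-05 cs.LG cs.CE physics.comp-ph Updated version

PI-JEPA: Label-Free Surrogate Pretraining for Coupled Multiphysics Simulation via Operator-Split Latent Prediction

Brandon Yee, Pairie Koh

AI summary The study proposes PI-JEPA, a surrogate pretraining framework that needs no completed PDE solves: it trains on unlabeled parameter fields using masked latent prediction with per-sub-operator PDE residual regularization, reducing the simulation budget required to deploy multiphysics surrogates.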

Comments Substantial Revision Required

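Details
AI abstract (translated from Chinese)

Reservoir simulation workflows face a fundamental data asymmetry: input parameter fields (geostatistical permeability realizations, porosity distributions) can be generated freely in arbitrary quantities, yet existing neural operator surrogates require large corpora of expensive labeled simulation trajectories and cannot exploit this unlabeled structure. We introduce PI-JEPA (Physics-Informed Joint Embedding Predictive Architecture), a surrogate pretraining framework that trains without any completed PDE solves, using masked latent prediction on unlabeled parameter fields under per-sub-operator PDE residual regularization. The predictor bank is structurally aligned with the Lie-Trotter operator-splitting decomposition of the governing equations and dedicates a separate physics-constrained latent module to each sub-process (pressure, saturation transport, reaction), so that fine-tuning needs as few as 100 labeled simulation runs. On single-phase Darcy flow, PI-JEPA achieves 1.9x lower error than FNO and 2.4x lower error than DeepONet at N_ℓ=100, and a 24% improvement over supervised-only training at N_ℓ=500, demonstrating that label-free surrogate pretraining substantially reduces the simulation budget required for multiphysics surrogate deployment.

English abstract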

Reservoir simulation workflows face a fundamental data asymmetry: input parameter fields (geostatistical permeability realizations, porosity distributions) are free to generate in arbitrary quantities, yet existing neural operator surrogates require large corpora of expensive labeled simulation trajectories and cannot exploit this unlabeled structure. We introduce PI-JEPA (Physics-Informed Joint Embedding Predictive Architecture), a surrogate pretraining framework that trains without any completed PDE solves, using masked latent prediction on unlabeled parameter fields under per-sub-operator PDE residual regularization. The predictor bank is structurally aligned with the Lie-Trotter operator-splitting decomposition of the governing equations, dedicating a separate physics-constrained latent module to each sub-process (pressure, saturation transport, reaction), enabling fine-tuning with as few as 100 labeled simulation runs. On single-phase Darcy flow, PI-JEPA achieves $1.9\times$ lower error than FNO and $2.4\times$ lower error than DeepONet at $N_\ell{=}100$, with 24% improvement over supervised-only training at $N_\ell{=}500$, demonstrating that label-free surrogate pretraining substantially reduces the simulation budget required for multiphysics surrogate deployment.
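
Illustrative sketch (not from the paper): Lie-Trotter operator splitting, which the abstract uses to organise the predictor bank, advances a system by applying the flow of each sub-operator in turn over every time step. The Python below shows the idea on a deliberately tiny stand-in problem, dx/dt = -k*x + s, split into a decay part and a source part; the equation, parameter values, and function names are assumptions chosen for illustration and have nothing to do with reservoir physics.

```python
import math

def lie_trotter_solve(x0, k, s, t_end, n_steps):
    """Integrate dx/dt = -k*x + s by alternating the exact flows of the
    two sub-operators (decay, then source) over each time step."""
    dt = t_end / n_steps
    x = x0
    for _ in range(n_steps):
        x = x * math.exp(-k * dt)   # sub-operator A: dx/dt = -k*x
        x = x + s * dt              # sub-operator B: dx/dt = s
    return x

def exact_solution(x0, k, s, t):
    """Closed-form solution of the unsplit equation, for comparison."""
    return s / k + (x0 - s / k) * math.exp(-k * t)

reference = exact_solution(1.0, 2.0, 1.0, 1.0)
coarse_error = abs(lie_trotter_solve(1.0, 2.0, 1.0, 1.0, 10) - reference)
fine_error = abs(lie_trotter_solve(1.0, 2.0, 1.0, 1.0, 100) - reference)
print(coarse_error, fine_error)   # the error shrinks roughly in proportion to dt
```

The splitting error is first order in the step size, which is the price paid for being able to treat each sub-process with its own solver or, in the paper's setting, its own latent module.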

2604.01489 2026-06-05 cs.LG cs.AI cs.DC cs.PF cs.SE Updated version

CuTeGen: An LLM-Based Agentic Framework for Generation and Optimization of High-Performance GPU Kernels using CuTe

Tara Saba, Zhiyang Chen, Jikai Jason Li, Anne Ouyang, Xujie Si, Fan Long

Affiliations * Department of Computer Science, University of Toronto

AI summary The paper presents CuTeGen, an LLM-based agentic framework that generates and optimizes GPU kernels over the CuTe abstraction layer through a structured generate-test-refine workflow, achieving an average 1.71x speedup over PyTorch on KernelBench Level-1 and Level-2 and outperforming the prior agentic baseline CudaForge at comparable generation cost.

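Details
AI abstract (translated from Chinese)

High-performance GPU kernels are critical to modern machine learning systems, yet developing them remains a manual, expert-driven process. Recent work has explored using LLMs to automate kernel generation, but the generated kernels still fall short of carefully tuned reference kernels on standardized benchmarks. We present CuTeGen, an agentic GPU kernel synthesis framework that treats kernel development as a structured generate-test-refine workflow over the CuTe abstraction layer. Two design choices distinguish CuTeGen from prior work: it targets CuTe rather than raw CUDA, which exposes performance-critical structures such as tiling and data movement while remaining stable enough for iterative refinement; and it uses a delayed profiling schedule that withholds low-level performance feedback until the kernel's high-level structure has stabilized. On the 209 tasks of KernelBench Level-1 and Level-2, CuTeGen achieves an average speedup of 1.71x over PyTorch and outperforms the prior agentic baseline CudaForge (0.89x) at comparable per-task generation cost. Code is available at https://github.com/taratt/cutegen.git.

English abstract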

High-performance GPU kernels are critical to modern machine learning systems, yet developing them remains a manual, expert-driven process. Recent work has explored using LLMs to automate kernel generation, but generated kernels still fall short of carefully tuned references on standardized benchmarks. We present CuTeGen, an agentic GPU kernel synthesis framework that treats kernel development as a structured generate-test-refine workflow over the CuTe abstraction layer. Two design choices distinguish CuTeGen from prior work: targeting CuTe rather than raw CUDA, which exposes performance-critical structures such as tiling and data movement while remaining stable enough for iterative refinement, and a delayed profiling schedule that withholds low-level performance feedback until the kernel's high-level structure has stabilized. On the 209 tasks of KernelBench Level-1 and Level-2, CuTeGen achieves an average speedup of 1.71$\times$ over PyTorch and outperforms the prior agentic baseline CudaForge (0.89$\times$) at comparable per-task generation cost. Code available at https://github.com/taratt/cutegen.git

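2604.00230 2026-06-05 cs.LG Updated version

Neural Collapse Dynamics: Depth, Activation, Regularisation, and Feature Norm Threshold

Anamika Paul Rupa

Affiliations * Department of Electrical Engineering and Computer Science

AI summary The paper studies the dynamics of neural collapse, finding that collapse occurs when the mean feature norm reaches a model-dataset-specific critical value, and examines how depth, activation function, regularisation, and network width affect the process.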

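Details
AI abstract (translated from Chinese)

Neural collapse (NC), the convergence of penultimate-layer features to a simplex equiangular tight frame, is well understood at equilibrium, but the dynamics governing its onset remain poorly characterised. We identify a simple and predictive regularity: NC occurs when the mean feature norm reaches a model-dataset-specific critical value, fn*, that is largely invariant to training conditions. This value concentrates tightly within each (model, dataset) pair (CV < 8%); training dynamics mainly affect the rate at which fn approaches fn*, not the value itself. In standard training trajectories, the crossing of fn below fn* consistently precedes NC onset, providing a practical predictor with a mean lead time of 62 epochs (MAE 24 epochs). A direct intervention experiment confirms that fn* is a stable attractor of the gradient flow: perturbations to feature scale are self-corrected during training and converge to the same value regardless of direction (p>0.2). Completing the (architecture)x(dataset) grid reveals the paper's strongest result: ResNet-20 on MNIST gives fn* = 5.867, a +458% architecture effect versus only +68% on CIFAR-10. The grid is strongly non-additive; fn* cannot be decomposed into independent architecture and dataset contributions. Four structural regularities emerge: (1) depth has a non-monotonic effect on collapse speed; (2) activation jointly determines both collapse speed and fn*; (3) weight decay defines a three-regime phase diagram, in which too little slows collapse, an optimal range is fastest, and too much prevents it; (4) width monotonically accelerates collapse while shifting fn* by at most 13%. These results establish feature-norm dynamics as an actionable diagnostic for predicting NC timing and suggest that norm-threshold behaviour is a general mechanism underlying delayed representational reorganisation in deep networks.

English abstract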

Neural collapse (NC) -- the convergence of penultimate-layer features to a simplex equiangular tight frame -- is well understood at equilibrium, but the dynamics governing its onset remain poorly characterised. We identify a simple and predictive regularity: NC occurs when the mean feature norm reaches a model-dataset-specific critical value, fn*, that is largely invariant to training conditions. This value concentrates tightly within each (model, dataset) pair (CV < 8%); training dynamics primarily affect the rate at which fn approaches fn*, rather than the value itself. In standard training trajectories, the crossing of fn below fn* consistently precedes NC onset, providing a practical predictor with a mean lead time of 62 epochs (MAE 24 epochs). A direct intervention experiment confirms fn* is a stable attractor of the gradient flow -- perturbations to feature scale are self-corrected during training, with convergence to the same value regardless of direction (p>0.2). Completing the (architecture)x(dataset) grid reveals the paper's strongest result: ResNet-20 on MNIST gives fn* = 5.867 -- a +458% architecture effect versus only +68% on CIFAR-10. The grid is strongly non-additive; fn* cannot be decomposed into independent architecture and dataset contributions. Four structural regularities emerge: (1) depth has a non-monotonic effect on collapse speed; (2) activation jointly determines both collapse speed and fn*; (3) weight decay defines a three-regime phase diagram -- too little slows, an optimal range is fastest, and too much prevents collapse; (4) width monotonically accelerates collapse while shifting fn* by at most 13%. These results establish feature-norm dynamics as an actionable diagnostic for predicting NC timing, suggesting that norm-threshold behaviour is a general mechanism underlying delayed representational reorganisation in deep networks.
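
Illustrative sketch (not the author's code): the abstract describes a predictor that watches for the mean feature norm fn to cross below the critical value fn* and reports a mean lead time of 62 epochs before NC onset. The Python below is one minimal way such a monitor could look; the function names and the norm trajectory are made up, and treating 62 epochs as a transferable default is an assumption.

```python
def first_crossing_below(mean_feature_norms, fn_star):
    """Return the first epoch whose mean feature norm is below fn_star
    after the previous epoch was at or above it, or None if none exists."""
    for epoch in range(1, len(mean_feature_norms)):
        if mean_feature_norms[epoch - 1] >= fn_star > mean_feature_norms[epoch]:
            return epoch
    return None

def predict_nc_onset(mean_feature_norms, fn_star, lead_time=62):
    """Predicted NC onset epoch: crossing epoch plus an assumed lead time."""
    crossing = first_crossing_below(mean_feature_norms, fn_star)
    return None if crossing is None else crossing + lead_time

norms = [9.0, 7.5, 6.2, 5.9, 5.8]            # made-up per-epoch mean feature norms
print(first_crossing_below(norms, 5.867))    # 4
print(predict_nc_onset(norms, 5.867))        # 66
```

Given the reported MAE of 24 epochs, a prediction of this kind is best read as a rough window rather than a precise epoch.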

2603.28257 2026-06-05 q-fin.ST cs.LG Updated version

Nonlinear Factor Decomposition via Kolmogorov-Arnold Networks: A Spectral Approach to Asset Return Analysis

David Breazu

Affiliations * Faculty of Mathematics and Computer Science, University of Bucharest

AI summary The paper proposes KAN-PCA, an autoencoder with a KAN encoder and a linear decoder that replaces linear projections with learned B-spline functions on each edge to capture more variance than classical PCA. On 20 S&P 500 stocks it achieves a higher reconstruction R² and matches PCA out-of-sample after correcting for data leakage.

Comments 12 pages, 2 figures

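Details
AI abstract (translated from Chinese)

KAN-PCA is an autoencoder that uses a KAN as its encoder and a linear map as its decoder. It generalizes classical PCA by replacing linear projections with learned B-spline functions on each edge. The motivation is to capture more variance than classical PCA, which becomes inefficient during market crises, when the linear assumption breaks down and correlations between assets change dramatically. We prove that if the spline activations are forced to be linear, KAN-PCA yields exactly the same results as classical PCA, establishing PCA as a special case. Experiments on 20 S&P 500 stocks (2015-2024) show that KAN-PCA achieves a reconstruction R^2 of 66.57% with 3 factors, compared with 62.99% for classical PCA with the same 3 factors, while matching PCA out-of-sample after correcting for data leakage in the training procedure.

English abstract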

KAN-PCA is an autoencoder that uses a KAN as encoder and a linear map as decoder. It generalizes classical PCA by replacing linear projections with learned B-spline functions on each edge. The motivation is to capture more variance than classical PCA, which becomes inefficient during market crises when the linear assumption breaks down and correlations between assets change dramatically. We prove that if the spline activations are forced to be linear, KAN-PCA yields exactly the same results as classical PCA, establishing PCA as a special case. Experiments on 20 S&P 500 stocks (2015-2024) show that KAN-PCA achieves a reconstruction R^2 of 66.57%, compared to 62.99% for classical PCA with the same 3 factors, while matching PCA out-of-sample after correcting for data leakage in the training procedure.

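2505.11006 2026-06-05 stat.ML cs.LG Updated version

Is Supervised Learning Really That Different from Unsupervised?

Oskar Allerbo, Thomas B. Schön

Affiliations * KTH Royal Institute of Technology; Uppsala University

AI summary By decomposing supervised learning into a two-stage procedure, the study shows that selecting all model parameters without access to the labels and then adding the outputs achieves performance similar to standard supervised learning, suggesting that the difference between supervised and unsupervised learning is less fundamental than it appears.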

Comments Paper accepted at AISTATS 2026

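Details
AI abstract (translated from Chinese)

We show how supervised learning can be decomposed into a two-stage procedure in which (1) all model parameters are selected in an unsupervised manner and (2) the outputs y are added to the model without changing the parameter values. This is achieved with a new model selection criterion that, unlike cross-validation, can be used without access to y. For linear ridge regression, we bound the asymptotic out-of-sample risk of our method in terms of the optimal asymptotic risk. We also demonstrate that versions of linear and kernel ridge regression, smoothing splines, k-nearest neighbors, random forests, and neural networks trained without access to y perform similarly to their standard y-based counterparts. Our results therefore suggest that the difference between supervised and unsupervised learning is less fundamental than it may appear.

English abstract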

We demonstrate how supervised learning can be decomposed into a two-stage procedure, where (1) all model parameters are selected in an unsupervised manner, and (2) the outputs y are added to the model, without changing the parameter values. This is achieved by a new model selection criterion that - in contrast to cross-validation - can be used also without access to y. For linear ridge regression, we bound the asymptotic out-of-sample risk of our method in terms of the optimal asymptotic risk. We also demonstrate that versions of linear and kernel ridge regression, smoothing splines, k-nearest neighbors, random forests, and neural networks, trained without access to y, perform similarly to their standard y-based counterparts. Hence, our results suggest that the difference between supervised and unsupervised learning is less fundamental than it may appear.
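
Illustrative sketch (not the authors' method): for ridge regression the fitted values are a linear function of the outputs, y_hat = S y, where the smoother matrix S depends only on the inputs and the penalty. That makes the two-stage reading in the abstract easy to picture. The Python below uses a single feature with no intercept and a hand-picked penalty `lam`; the paper's label-free criterion for choosing the penalty is not reproduced here, and all function names are hypothetical.

```python
def ridge_smoother(x, lam):
    """Stage 1: build S = x x^T / (x^T x + lam) from the inputs only."""
    denom = sum(v * v for v in x) + lam
    return [[xi * xj / denom for xj in x] for xi in x]

def apply_smoother(smoother, y):
    """Stage 2: fitted values S y; the smoother itself is left unchanged."""
    return [sum(s_ij * y_j for s_ij, y_j in zip(row, y)) for row in smoother]

def ridge_fit_direct(x, y, lam):
    """Conventional one-stage ridge fit, for comparison."""
    beta = sum(xi * yi for xi, yi in zip(x, y)) / (sum(v * v for v in x) + lam)
    return [xi * beta for xi in x]

x = [1.0, 2.0, 3.0]
y = [2.0, 4.1, 5.9]
S = ridge_smoother(x, lam=0.5)          # built without ever seeing y
two_stage = apply_smoother(S, y)
one_stage = ridge_fit_direct(x, y, lam=0.5)
gap = max(abs(a - b) for a, b in zip(two_stage, one_stage))
print(gap)                              # agreement up to rounding
```

The interesting part of the paper is how to choose the penalty (and, for other models, all parameters) without y; this sketch only shows why adding y afterwards changes nothing in the smoother.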

2603.19312 2026-06-05 cs.LG cs.AI Updated version

LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels

Lucas Maes, Quentin Le Lidec, Damien Scieur, Yann LeCun, Randall Balestriero

Affiliations * Mila & Université de Montréal; New York University; Samsung SAIL; Brown University

AI summary The paper proposes LeWorldModel, a joint-embedding predictive architecture that trains stably end-to-end from raw pixels with only two loss terms, cutting tunable loss hyperparameters from six to one; it is competitive across diverse 2D and 3D control tasks, and its latent space encodes physical structure and supports detection of physically implausible events.

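Details
AI abstract (translated from Chinese)

Joint Embedding Predictive Architectures (JEPAs) offer a compelling framework for learning world models in compact latent spaces, but existing methods remain fragile, relying on complex multi-term losses, exponential moving averages, pre-trained encoders, or auxiliary supervision to avoid representation collapse. In this work we introduce LeWorldModel (LeWM), the first JEPA that trains stably end-to-end from raw pixels using only two loss terms: a next-embedding prediction loss and a regularizer that enforces Gaussian-distributed latent embeddings. This reduces the number of tunable loss hyperparameters from six to one compared with the only existing end-to-end alternative. With about 15M parameters, trainable on a single GPU in a few hours, LeWM plans up to 48x faster than foundation-model-based world models while remaining competitive across diverse 2D and 3D control tasks. Beyond control, we show through probing of physical quantities that LeWM's latent space encodes meaningful physical structure. Surprise evaluation confirms that the model reliably detects physically implausible events.

English abstract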

Joint Embedding Predictive Architectures (JEPAs) offer a compelling framework for learning world models in compact latent spaces, yet existing methods remain fragile, relying on complex multi-term losses, exponential moving averages, pre-trained encoders, or auxiliary supervision to avoid representation collapse. In this work, we introduce LeWorldModel (LeWM), the first JEPA that trains stably end-to-end from raw pixels using only two loss terms: a next-embedding prediction loss and a regularizer enforcing Gaussian-distributed latent embeddings. This reduces tunable loss hyperparameters from six to one compared to the only existing end-to-end alternative. With ~15M parameters trainable on a single GPU in a few hours, LeWM plans up to 48x faster than foundation-model-based world models while remaining competitive across diverse 2D and 3D control tasks. Beyond control, we show that LeWM's latent space encodes meaningful physical structure through probing of physical quantities. Surprise evaluation confirms that the model reliably detects physically implausible events.

2603.20980 2026-06-05 cs.LG cs.AI stat.AP stat.ML Updated version

From Causal Discovery to Dynamic Causal Inference in Neural Time Series

Dmitry Zaytsev, Valentina Kuskova, Michael Coppedge

Affiliations * Lucy Family Institute for Data & Society; University of Notre Dame; Political Science, University of Notre Dame

AI summary Proposes Dynamic Causal Network Autoregression (DCNAR), a two-stage framework: neural autoregressive causal discovery learns a sparse directed causal network, which then serves as a structural prior for a time-varying neural network autoregression, enabling dynamic causal inference without a pre-specified network structure.

Comments 11 pages, 2 figures

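Details
AI abstract (translated from Chinese)

Time-varying causal models provide a powerful framework for studying dynamic scientific systems, yet most existing approaches assume that the underlying causal network is known a priori, an assumption rarely satisfied in real-world domains where causal structure is uncertain, evolving, or only indirectly observable. This limits the applicability of dynamic causal inference in many scientific settings. We propose Dynamic Causal Network Autoregression (DCNAR), a two-stage neural causal modeling framework that integrates data-driven causal discovery with time-varying causal inference. In the first stage, a neural autoregressive causal discovery model learns a sparse directed causal network from multivariate time series. In the second stage, the learned structure serves as a structural prior for a time-varying neural network autoregression, enabling dynamic estimation of causal influence without a pre-specified network structure. We evaluate the scientific validity of DCNAR, rather than predictive accuracy alone, using behavioral diagnostics that assess causal necessity, temporal stability, and sensitivity to structural change. Experiments on multi-country panel time-series data show that learned causal networks yield more stable and behaviorally meaningful dynamic causal inferences than coefficient-based or structure-free alternatives, even when forecasting performance is comparable. These results position DCNAR as a general framework for using AI as a scientific instrument for dynamic causal reasoning under structural uncertainty.

English abstract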

Time-varying causal models provide a powerful framework for studying dynamic scientific systems, yet most existing approaches assume that the underlying causal network is known a priori - an assumption rarely satisfied in real-world domains where causal structure is uncertain, evolving, or only indirectly observable. This limits the applicability of dynamic causal inference in many scientific settings. We propose Dynamic Causal Network Autoregression (DCNAR), a two-stage neural causal modeling framework that integrates data-driven causal discovery with time-varying causal inference. In the first stage, a neural autoregressive causal discovery model learns a sparse directed causal network from multivariate time series. In the second stage, this learned structure is used as a structural prior for a time-varying neural network autoregression, enabling dynamic estimation of causal influence without requiring pre-specified network structure. We evaluate the scientific validity of DCNAR using behavioral diagnostics that assess causal necessity, temporal stability, and sensitivity to structural change, rather than predictive accuracy alone. Experiments on multi-country panel time-series data demonstrate that learned causal networks yield more stable and behaviorally meaningful dynamic causal inferences than coefficient-based or structure-free alternatives, even when forecasting performance is comparable. These results position DCNAR as a general framework for using AI as a scientific instrument for dynamic causal reasoning under structural uncertainty.

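2602.19373 2026-06-05 cs.LG cs.AI Updated version

Stable Deep Reinforcement Learning via Isotropic Gaussian Representations

Ali Saheb Pasand, Johan Obando-Ceron, Aaron Courville, Pouya Bashivan, Pablo Samuel Castro

Affiliations * University of Waterloo

AI summary The paper proposes a deep reinforcement learning method based on isotropic Gaussian representations: shaping representations toward an isotropic Gaussian distribution during training improves performance under non-stationarity and reduces representation collapse, neuron dormancy, and training instability.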

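Details
AI abstract (translated from Chinese)

Deep reinforcement learning systems often suffer from unstable training dynamics caused by non-stationarity, in which learning objectives and data distributions evolve over time. We show that, under non-stationary targets, isotropic Gaussian embeddings are provably advantageous. In particular, they induce stable tracking of time-varying targets for linear readouts, achieve maximal entropy under a fixed variance budget, and encourage balanced use of all representational dimensions, all of which make agents more adaptive and stable. Building on this insight, we propose using Sketched Isotropic Gaussian Regularization to shape representations toward an isotropic Gaussian distribution during training. We demonstrate empirically, across a variety of domains, that this simple and computationally inexpensive method improves performance under non-stationarity while reducing representation collapse, neuron dormancy, and training instability.

English abstract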

Deep reinforcement learning systems often suffer from unstable training dynamics due to non-stationarity, where learning objectives and data distributions evolve over time. We show that under non-stationary targets, isotropic Gaussian embeddings are provably advantageous. In particular, they induce stable tracking of time-varying targets for linear readouts, achieve maximal entropy under a fixed variance budget, and encourage a balanced use of all representational dimensions--all of which enable agents to be more adaptive and stable. Building on this insight, we propose the use of Sketched Isotropic Gaussian Regularization for shaping representations toward an isotropic Gaussian distribution during training. We demonstrate empirically, over a variety of domains, that this simple and computationally inexpensive method improves performance under non-stationarity while reducing representation collapse, neuron dormancy, and training instability.

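2603.17925 2026-06-05 stat.ME cs.LG math.ST stat.TH Updated version

Multi-Armed Sequential Hypothesis Testing by Betting

Ricardo J. Sandoval, Ian Waudby-Smith, Michael I. Jordan

Affiliations * University of California Berkeley; École Normale Supérieure & Inria Paris

AI summary The paper studies multi-armed sequential testing by betting, a variant in which the statistician obtains data by choosing among multiple data sources (arms), with the aim of rejecting the global null hypothesis P (all arms are null in a certain sense) in favor of the composite alternative Q (at least one arm is non-null). Generalizing log-optimality and expected rejection time optimality to multiple arms yields matching lower and upper bounds, and a modified upper-confidence-bound-like algorithm handles unobservable but sufficiently estimable rewards.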

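Details
AI abstract (translated from Chinese)

We consider a variant of sequential testing by betting in which, at each time step, the statistician is presented with multiple data sources (arms) and obtains data by choosing one of them. We consider the composite global null hypothesis P that all arms are null in a certain sense (e.g. all dosages of a treatment are ineffective), and we wish to reject P in favor of a composite alternative Q in which at least one arm is non-null (e.g. there exists an effective treatment dosage). We posit an optimality desideratum, described informally as follows: even if several arms are non-null, we seek e-processes and sequential tests that perform as strongly as ones with oracle knowledge of which arm generates the most evidence against P. Formally, we generalize the notions of log-optimality and expected rejection time optimality to more than one arm and obtain matching lower and upper bounds for both. A key technical device in this optimality analysis is a modified upper-confidence-bound-like algorithm for unobservable but sufficiently "estimable" rewards. In designing this algorithm, we derive nonasymptotic concentration inequalities for optimal wealth growth rates in the sense of Kelly [1956]. These may be of independent interest.

English abstract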

We consider a variant of sequential testing by betting where, at each time step, the statistician is presented with multiple data sources (arms) and obtains data by choosing one of the arms. We consider the composite global null hypothesis $\mathscr{P}$ that all arms are null in a certain sense (e.g. all dosages of a treatment are ineffective) and we are interested in rejecting $\mathscr{P}$ in favor of a composite alternative $\mathscr{Q}$ where at least one arm is non-null (e.g. there exists an effective treatment dosage). We posit an optimality desideratum that we describe informally as follows: even if several arms are non-null, we seek $e$-processes and sequential tests whose performance are as strong as the ones that have oracle knowledge about which arm generates the most evidence against $\mathscr{P}$. Formally, we generalize notions of log-optimality and expected rejection time optimality to more than one arm, obtaining matching lower and upper bounds for both. A key technical device in this optimality analysis is a modified upper-confidence-bound-like algorithm for unobservable but sufficiently "estimable" rewards. In the design of this algorithm, we derive nonasymptotic concentration inequalities for optimal wealth growth rates in the sense of Kelly [1956]. These may be of independent interest.
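
Illustrative sketch (not the paper's algorithm): testing by betting tracks a wealth process that starts at 1, is multiplied by a nonnegative factor with expectation at most 1 under the null at every step, and triggers rejection once it reaches 1/alpha. The Python below assumes observations in [0, 1], a null of the form "every arm has mean at most mu0", a fixed bet size, and round-robin arm selection; the paper instead chooses arms with a UCB-like rule, and all names and numbers here are made up.

```python
def betting_test(arm_data, mu0, alpha, bet):
    """Round-robin over arms; test H0: every arm has mean <= mu0 for
    observations in [0, 1]. Returns (rejected, steps_used, wealth)."""
    if not 0.0 <= bet <= 1.0 / mu0:
        raise ValueError("bet must lie in [0, 1/mu0] to keep wealth nonnegative")
    wealth = 1.0
    n_arms = len(arm_data)
    positions = [0] * n_arms
    step = 0
    while True:
        arm = step % n_arms
        if positions[arm] >= len(arm_data[arm]):
            return False, step, wealth      # ran out of data without rejecting
        x = arm_data[arm][positions[arm]]
        positions[arm] += 1
        step += 1
        wealth *= 1.0 + bet * (x - mu0)     # fair-or-unfavourable bet under H0
        if wealth >= 1.0 / alpha:
            return True, step, wealth       # reject at level alpha

# Arm 0 sits exactly at the null boundary; arm 1 is clearly non-null.
rejected, steps, wealth = betting_test(
    [[0.5] * 50, [0.9] * 50], mu0=0.5, alpha=0.05, bet=1.0)
print(rejected, steps)   # True 18
```

Round-robin spends half of its pulls on the uninformative arm; steering pulls toward the arm that grows wealth fastest is the role the abstract assigns to its UCB-like algorithm.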

2603.13761 2026-06-05 cs.LG cs.AI 版本更新

Level Up: Defining and Exploiting Transitional Problems for Curriculum Learning

Level Up: 定义和利用过渡问题以进行课程学习

Amogh Inamdar, Zhenwei Tang, Ashton Anderson, Richard Zemel

发表机构 * Department of Computer Science, Columbia University(哥伦比亚大学计算机科学系) Department of Computer Science, University of Toronto(多伦多大学计算机科学系)

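AI Summary Proposes measuring the difficulty of individual problems against a series of models of increasing competence, and defines transitional problems as those that become consistently easier as model ability increases. Training on a curriculum that moves from easier to harder transitional problems most efficiently lifts a model to the next tier of competence.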

详情
AI中文摘要

课程学习——按顺序排列训练示例以帮助机器学习——受到人类学习的启发,但尚未得到广泛接受。静态策略依赖于间接的难度评分代理,产生不特定于当前学习者的课程。动态方法基于梯度信息估计难度,但需要大量的额外计算。我们介绍了一种新的方法,通过一系列能力递增的模型来测量单个问题实例的难度,并识别出在模型能力提升时始终更简单的过渡问题。将此方法应用于由多个可用模型构成的多样化模型系列,我们发现,使用从简单到困难的过渡问题进行训练,最有效地将模型提升到下一个能力层级。这些问题诱导了从简单到困难的自然进步,优于其他训练策略。通过直接测量难度相对于模型能力,我们的方法产生了可解释的问题、特定于学习者的课程以及逐步改进的原理基础。

英文摘要

Curriculum learning--ordering training examples in a sequence to aid machine learning--takes inspiration from human learning, but has not gained widespread acceptance. Static strategies for scoring item difficulty rely on indirect proxy scores of varying quality and produce curricula that are not specific to the learner at hand. Dynamic approaches base difficulty estimates on gradient information, requiring considerable extra computation during training. We introduce a novel method for measuring the difficulty of individual problem instances that is calibrated to a series of models of increasing competence, and identify \emph{transitional problems} that are consistently easier as model ability increases. Applying this method to diverse model series constructed from sets of models that are readily available on many tasks, we find that training on a curriculum that \emph{levels up} from easier to harder transitional problems most efficiently improves a model to the next tier of competence. These problems induce a natural progression from easier to harder items, which outperforms other training strategies. By measuring difficulty directly relative to model competence, our method yields interpretable problems, learner-specific curricula, and a principled basis for step-by-step improvement.
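
One way to picture "transitional problems" is sketched below in Python. The abstract does not give the paper's difficulty measure, so this sketch assumes a simplified stand-in: each problem has a 0/1 solve record across models ordered from weakest to strongest, and a problem counts as transitional when that record flips from failure to success exactly once. The names `transition_tier` and `build_curriculum` are hypothetical.

```python
def transition_tier(outcomes):
    """Index of the first model that solves the problem, or None.

    outcomes: 0/1 solve flags for models ordered weakest to strongest.
    Assumption: a problem is transitional only if its record is monotone
    (0...0 1...1) and contains at least one failure and one success.
    """
    outcomes = list(outcomes)
    if outcomes != sorted(outcomes):
        return None  # a stronger model failed where a weaker one succeeded
    if 0 not in outcomes or 1 not in outcomes:
        return None  # never solved, or solved by every model
    return outcomes.index(1)


def build_curriculum(results):
    """Order transitional problems from easiest to hardest tier."""
    tiers = {pid: transition_tier(record) for pid, record in results.items()}
    kept = [pid for pid, tier in tiers.items() if tier is not None]
    return sorted(kept, key=lambda pid: (tiers[pid], pid))


results = {
    "a": [0, 0, 1],  # solved only by the strongest model
    "b": [0, 1, 1],  # solved from the middle model onward
    "c": [1, 0, 1],  # non-monotone, so excluded
    "d": [1, 1, 1],  # always solved, so excluded
    "e": [0, 0, 0],  # never solved, so excluded
}
curriculum = build_curriculum(results)
```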

2603.11600 2026-06-05 cs.LG cs.SY eess.SY math.OC 版本更新

Hybrid Energy-Aware Reward Shaping: A Unified Lightweight Physics-Guided Methodology for Policy Optimization

混合能量感知奖励塑形:一种统一的轻量级物理引导策略优化方法

Qijun Liao, Jue Yang, Yiting Kang, Xinxin Zhao, Yong Zhang, Mingan Zhao

Affiliations * School of Mechanical Engineering, University of Science and Technology Beijing; Jiangsu XCMG Construction Machinery Research Institute Co., Ltd.

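AI Summary Proposes Hybrid Energy-Aware Reward Shaping (H-EARS), which encodes energy terms known a priori as reward potentials and adds an action regularization term, improving convergence speed, policy stability, and energy efficiency in continuous control.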

Comments 23 pages, 48 figures. Accepted by Neurocomputing

详情
AI中文摘要

深度强化学习在连续控制中常因纯数据驱动探索忽略可用物理结构而遭受高方差、低能效和分布偏移下泛化能力差的问题。本文提出混合能量感知奖励塑形(H-EARS),将主导能量项(假设先验已知)直接编码为奖励势能,每步计算复杂度为O(n)。H-EARS将塑形势能分解为任务导向和基于能量的组件,并辅以动作正则化项,有意修改优化目标以强制执行节能控制。建立了完整的理论基础:塑形与正则化的功能独立性、正定Hessian条件下的能量梯度增强、函数近似下的收敛保证以及近似势能误差界。在四个连续控制基准和四种基线算法上,H-EARS在收敛速度、策略稳定性和最终性能方面均取得一致提升。高保真车辆仿真验证了其在极端道路条件下安全关键场景中的适用性。

英文摘要

Deep reinforcement learning for continuous control often suffers from high variance, low energy efficiency, and poor generalization under distribution shift, as purely data-driven exploration ignores available physical structure. This paper proposes Hybrid Energy-Aware Reward Shaping (H-EARS), which encodes dominant energy terms -- assumed known a priori -- directly as reward potentials at O(n) per-step computation. H-EARS decomposes the shaping potential into task-oriented and energy-based components, supplemented by an action regularization term that deliberately modifies the optimization objective to enforce energy-efficient control. A complete theoretical foundation is established: functional independence of shaping and regularization, energy-based gradient enrichment under positive-definite Hessian conditions, convergence guarantees under function approximation, and approximate potential error bounds. Across four continuous control benchmarks and four baseline algorithms, H-EARS achieves consistent gains in convergence speed, policy stability, and final performance. High-fidelity vehicle simulations validate applicability in safety-critical settings under extreme road conditions.
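
The shaping idea above can be illustrated with standard potential-based reward shaping plus a quadratic action penalty. The Python sketch below is only an illustration under stated assumptions: the pendulum-style state `(theta, omega)`, the energy expression, the coefficients, and the names `energy_potential` and `shaped_reward` are not taken from the paper, and H-EARS additionally splits the potential into task-oriented and energy-based parts, which is omitted here.

```python
import math


def energy_potential(state):
    """Potential Phi(s) = -(kinetic + potential energy) for an assumed
    unit-mass, unit-length pendulum with state (theta, omega)."""
    theta, omega = state
    kinetic = 0.5 * omega ** 2
    potential = 1.0 - math.cos(theta)
    return -(kinetic + potential)


def shaped_reward(reward, state, action, next_state, gamma=0.99, action_weight=0.01):
    """reward + gamma * Phi(s') - Phi(s) - action_weight * ||a||^2.

    The shaping term costs O(n) in the state dimension; the action penalty
    deliberately changes the objective to favor low-effort control.
    """
    shaping = gamma * energy_potential(next_state) - energy_potential(state)
    action_penalty = action_weight * sum(a * a for a in action)
    return reward + shaping - action_penalty


# Moving from a spinning state to rest is rewarded by the shaping term.
r_example = shaped_reward(1.0, (0.0, 1.0), (1.0,), (0.0, 0.0))
# At rest with zero action, the shaped reward equals the original reward.
r_rest = shaped_reward(1.0, (0.0, 0.0), (0.0,), (0.0, 0.0))
```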

2603.11319 2026-06-05 cs.LG stat.ML 版本更新

On the Robustness of Langevin Dynamics to Score Function Error

关于对数动力学对分数函数误差的鲁棒性

Daniel Yiming Cao, August Y. Chen, Karthik Sridharan, Yuchen Wu

发表机构 * Cornell University(康奈尔大学)

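AI Summary Studies how robust score-based generative modeling is to errors in the estimated score function. Langevin dynamics is shown not to be robust to L2 (more generally Lp) score error: even for simple high-dimensional distributions and an arbitrarily small error, running it for any polynomial time horizon yields a distribution far from the target in total variation distance. This further supports diffusion models over Langevin dynamics.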

Comments ICML 2026

详情
AI中文摘要

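We consider the robustness of score-based generative models to errors in the estimate of the score function. In particular, we show that Langevin dynamics is not robust to L2 (more generally Lp) errors in the score estimate. It is known that, when the L2 error of the score estimate is small, diffusion models can sample faithfully from the target distribution in polynomial time under fairly mild regularity assumptions. In contrast, our work shows that even for simple distributions in high dimensions, Langevin dynamics run for any polynomial time horizon produces a distribution far from the target in total variation (TV) distance, even when the L2 (more generally Lp) error of the score estimate is arbitrarily small. Since such error is unavoidable in practice when the score function is learned from data, our results further support diffusion models over Langevin dynamics and caution against running Langevin dynamics with estimated scores.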

英文摘要

We consider the robustness of score-based generative modeling to errors in the estimate of the score function. In particular, we show that Langevin dynamics is not robust to the $L^2$ errors (more generally $L^p$ errors) in the estimate of the score function. It is well-established that with small $L^2$ errors in the estimate of the score function, diffusion models can sample faithfully from the target distribution under fairly mild regularity assumptions in a polynomial time horizon. In contrast, our work shows that even for simple distributions in high dimensions, Langevin dynamics run for any polynomial time horizon will produce a distribution far from the target distribution in Total Variation (TV) distance, even when the $L^2$ error (more generally $L^p$) of the estimate of the score function is arbitrarily small. Considering such an error in the estimate of the score function is unavoidable in practice when learning the score function from data, our results provide further justification for diffusion models over Langevin dynamics and serve to caution against the use of Langevin dynamics with estimated scores.
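
For readers unfamiliar with the sampler under discussion, the following Python sketch runs the unadjusted Langevin update x <- x + step * score(x) + sqrt(2 * step) * noise on a one-dimensional standard Gaussian, whose exact score is -x. It only illustrates the sampler with an exact score; it does not reproduce the paper's high-dimensional lower-bound construction, and the step size, chain length, and function names are illustrative assumptions.

```python
import math
import random


def langevin_sample(score, x0=0.0, step=0.05, n_steps=200, rng=random):
    """Run one chain of the unadjusted Langevin algorithm and return its end point."""
    x = x0
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        x = x + step * score(x) + math.sqrt(2.0 * step) * noise
    return x


def gaussian_score(x):
    """Exact score (gradient of the log density) of a standard Gaussian."""
    return -x


rng = random.Random(0)
samples = [langevin_sample(gaussian_score, rng=rng) for _ in range(2000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

Swapping `gaussian_score` for a perturbed estimate is the natural way to experiment with the kind of score error the abstract analyzes.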

2508.06249 2026-06-05 cs.LG cs.AI 版本更新

In-Training Defenses against Emergent Misalignment in Language Models

训练过程中对抗语言模型中新兴偏差的防御措施

David Kaczér, Magnus Jørgenvåg, Clemens Vetter, Esha Afzal, Robin Haselhorst, Lucie Flek, Florian Mai

发表机构 * University of Copenhagen(哥本哈根大学)

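AI Summary Studies in-training defenses against emergent misalignment in language models and evaluates five training regularization interventions. Selecting the interleaved data by the perplexity gap between aligned and misaligned models gives the best overall results.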

Comments Accepted at ICML 2026 https://icml.cc/virtual/2026/poster/64303

详情
AI中文摘要

微调使从业者能够将对齐的大型语言模型 (LLMs) 重新用于新领域,但最近的研究揭示了新兴偏差 (EM):即使是一个小的、领域特定的微调,也可能导致远超出目标领域的有害行为。即使在模型权重被隐藏在微调API之后的情况下,这也为攻击者提供了无意中访问广泛偏差模型的途径,这从微调数据本身难以检测。我们提出了第一个系统研究在训练过程中对抗EM的防护措施,这些措施对提供者而言是可行的,他们通过API暴露微调:我们评估了这些措施是否能够防止广泛的偏差、允许狭窄的偏差、在良性任务上学习良好,并且保持一致性。我们调查了五种训练正则化干预:(i) 朝着安全参考模型的KL散度正则化,(ii) 特征空间中的ℓ2距离,(iii) 通过邪恶人格向量进行预防性引导,(iv) 从一般指令微调数据集交错训练示例,以及 (v) 疫苗提示。我们证明,通过选择对齐模型与偏差模型之间的困惑度差异的交错数据可以获得最佳效果。

英文摘要

Fine-tuning lets practitioners repurpose aligned large language models (LLMs) for new domains, yet recent work reveals emergent misalignment (EM): Even a small, domain-specific fine-tune can induce harmful behaviors far outside the target domain. Even in the case where model weights are hidden behind a fine-tuning API, this gives attackers inadvertent access to a broadly misaligned model in a way that can be hard to detect from the fine-tuning data alone. We present the first systematic study of in-training safeguards against EM that are practical for providers who expose fine-tuning via an API: We evaluate whether they a) prevent broad misalignment, b) allow narrow misalignment, c) learn well on benign tasks, and d) remain coherent. We investigate five training regularization interventions: (i) KL-divergence regularization toward a safe reference model, (ii) $\ell_2$ distance in feature space, (iii) preventive steering with an evil persona vector, (iv) interleaving training examples from a general instruct-tuning dataset and (v) inoculation prompting. We demonstrate that selecting interleaving data by the perplexity gap between aligned and misaligned models yields the best results overall.
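
Intervention (i) above, KL-divergence regularization toward a safe reference model, can be sketched for a single next-token distribution as follows. This is a hedged Python illustration: the KL direction (model relative to reference), the weight `beta`, and the function names are assumptions, since the abstract does not specify them.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with matching support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def regularized_loss(model_logits, ref_logits, target, beta=0.1):
    """Fine-tuning loss on one token: negative log-likelihood of the target
    plus beta times the KL divergence from the reference distribution."""
    p = softmax(model_logits)
    q = softmax(ref_logits)
    nll = -math.log(p[target])
    return nll + beta * kl_divergence(p, q)


# The model agrees with the reference: the penalty vanishes.
loss_same = regularized_loss([0.0, 0.0], [0.0, 0.0], target=0)
# The model drifts from the reference: the penalty adds to the loss.
loss_drift = regularized_loss([2.0, 0.0], [0.0, 0.0], target=0, beta=0.1)
loss_drift_unreg = regularized_loss([2.0, 0.0], [0.0, 0.0], target=0, beta=0.0)
```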

2603.03993 2026-06-05 cs.LG cond-mat.dis-nn 版本更新

Specialization of softmax attention heads: insights from the high-dimensional single-location model

softmax 注意力头的专门化:来自高维单位置模型的见解

M. Sagitova, O. Duranthon, L. Zdeborová

Affiliations * Statistical physics of computation laboratory, École Polytechnique Fédérale de Lausanne, Switzerland

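AI Summary Studies the specialization of attention heads in multi-head attention. A theoretical model is used to analyze the training dynamics of multi-head softmax attention under SGD, and Bayes-softmax attention is introduced, which achieves optimal prediction performance in this setting.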

详情
AI中文摘要

多头注意力使Transformer模型能够同时表示多种注意力模式。经验上,头的专门化在训练过程中出现于不同的阶段,而许多头仍然冗余且学习相似的表示。我们提出了一种理论模型,基于多索引和单位置回归框架,捕捉这一现象。第一部分分析了多头softmax注意力在SGD下的训练动态,揭示了初始非专门化阶段后,不同头依次对齐潜在信号方向的多阶段专门化阶段。第二部分研究了注意力激活函数对性能的影响。我们引入了Bayes-softmax注意力,该方法在该设置中实现了最优的预测性能。

英文摘要

Multi-head attention enables transformer models to represent multiple attention patterns simultaneously. Empirically, head specialization emerges in distinct stages during training, while many heads remain redundant and learn similar representations. We propose a theoretical model capturing this phenomenon, based on the multi-index and single-location regression frameworks. In the first part, we analyze the training dynamics of multi-head softmax attention under SGD, revealing an initial unspecialized phase followed by a multi-stage specialization phase in which different heads sequentially align with latent signal directions. In the second part, we study the impact of attention activation functions on performance. We introduce the Bayes-softmax attention, which achieves optimal prediction performance in this setting.

2603.03955 2026-06-05 cs.LG cs.AI 版本更新

GIPO: Gaussian Importance Sampling Policy Optimization

GIPO:高斯重要性采样策略优化

Chengxuan Lu, Zhenquan Zhang, Shukuan Wang, Qunzhi Lin, Yanjie Li, Baigui Sun, Yang Liu

发表机构 * University of Science and Technology of China(中国科学技术大学)

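AI Summary Proposes GIPO, a policy optimization objective based on truncated importance sampling that replaces hard clipping with a log-ratio-based Gaussian trust weight, softly damping extreme importance ratios while keeping gradients non-zero. Experiments report state-of-the-art performance among clipping-based baselines across replay buffer sizes, with a better bias-variance trade-off, high training stability, and improved sample efficiency.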

详情
AI中文摘要

在强化学习(RL)后训练近年来已显示出在多模态智能体上超越监督模仿的强劲潜力。然而,RL仍然受到较差的数据效率的限制,特别是在交互数据稀缺且迅速过时的设置中。为了解决这一挑战,GIPO(高斯重要性采样策略优化)被提出作为基于截断重要性采样的策略优化目标,用基于对数比率的高斯信任权重替代硬裁剪,以软化极端重要性比率同时保持非零梯度。理论分析显示,GIPO引入了隐含且可调的更新幅度约束,而集中界保证了在有限样本估计下的鲁棒性和稳定性。实验结果表明,GIPO在各种回放缓冲区大小范围内,从接近策略到高度过时的数据均取得了最佳性能,同时表现出优越的偏差-方差权衡、高训练稳定性和改进的样本效率。代码可在https://github.com/distanceLu/GIPO获得。

英文摘要

Post-training with reinforcement learning (RL) has recently shown strong promise for advancing multimodal agents beyond supervised imitation. However, RL remains limited by poor data efficiency, particularly in settings where interaction data are scarce and quickly become outdated. To address this challenge, GIPO (Gaussian Importance sampling Policy Optimization) is proposed as a policy optimization objective based on truncated importance sampling, replacing hard clipping with a log-ratio-based Gaussian trust weight to softly damp extreme importance ratios while maintaining non-zero gradients. Theoretical analysis shows that GIPO introduces an implicit, tunable constraint on the update magnitude, while concentration bounds guarantee robustness and stability under finite-sample estimation. Experimental results show that GIPO achieves state-of-the-art performance among clipping-based baselines across a wide range of replay buffer sizes, from near on-policy to highly stale data, while exhibiting superior bias--variance trade-off, high training stability and improved sample efficiency. Code is available at https://github.com/distanceLu/GIPO.
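
The contrast between hard clipping and a soft Gaussian weight on the log-ratio can be sketched in Python as below. The exact weight used by GIPO is not given in the abstract, so the form exp(-(log r)^2 / (2 sigma^2)), the default `sigma`, and both function names are assumptions. The clipping mask is also a simplification: real PPO clipping zeroes the gradient only when the advantage sign pushes the ratio further out of range.

```python
import math


def gaussian_trust_weight(ratio, sigma=0.5):
    """Assumed soft weight: equals 1 at ratio 1 and decays smoothly,
    symmetrically in log space, without ever reaching zero."""
    log_ratio = math.log(ratio)
    return math.exp(-(log_ratio ** 2) / (2.0 * sigma ** 2))


def hard_clip_mask(ratio, eps=0.2):
    """Simplified clipping: the sample contributes only inside [1 - eps, 1 + eps]."""
    return 1.0 if 1.0 - eps <= ratio <= 1.0 + eps else 0.0


w_on_policy = gaussian_trust_weight(1.0)
w_stale = gaussian_trust_weight(3.0)
w_stale_inverse = gaussian_trust_weight(1.0 / 3.0)
mask_stale = hard_clip_mask(3.0)
```

A stale sample with ratio 3 is discarded by the mask but keeps a small positive weight under the Gaussian form, which is the "non-zero gradients" property the abstract refers to.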

2603.02376 2026-06-05 cs.DC cs.AR cs.LG cs.MA 版本更新

CUCo: An Agentic Framework for Compute and Communication Co-design

CUCo:一种用于计算与通信协同设计的代理框架

Yoga Sri Varshan Varadharajan, Bodun Hu, Saurabh Agarwal, Aditya Akella

发表机构 * UT Austin(德克萨斯大学奥斯汀分校)

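AI Summary Proposes CUCo, an agentic framework that automates compute-communication co-design of CUDA kernels. It combines a structured design-space formalization with a correctness-first fast-path agent and an evolution-driven slow-path agent, achieving up to 1.57x speedup across four multi-GPU workloads and discovering a two-stream overlap strategy at an LLM inference cost under $10 per workload.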

详情
AI中文摘要

在分布式大语言模型(LLM)训练和推理中,计算与通信传统上是孤立优化的;专家设计的系统如DeepEP、FLUX和TokenWeave展示了协同设计的潜力,但需要深入的系统专业知识和硬件特定的调优;CUCo是一种代理框架,通过结合结构化的设计空间形式化与正确性优先的快速路径代理以获得可靠的基线,以及进化驱动的慢路径代理以获得高性能策略,从而在四个多GPU工作负载中实现了高达1.57倍的加速,并在LLM推理成本低于10美元的情况下发现了DeepSeek-V3 MoE层上的双流重叠策略,该策略通过本地计算隐藏调度。

英文摘要

Computation and communication in distributed LLM training and inference are traditionally optimized in isolation; expert-crafted systems such as DeepEP, FLUX, and TokenWeave show the potential of co-design but require deep systems expertise and hardware-specific tuning; CUCo is an agentic framework that automates compute-communication co-design of CUDA kernels by combining a structured design-space formalization with a correctness-first fast-path agent for reliable baselines and an evolution-driven slow-path agent for high-performance strategies, achieving up to 1.57x speedup across four multi-GPU workloads and discovering a two-stream overlap strategy on a DeepSeek-V3 MoE layer that hides dispatch behind local compute at an LLM inference cost under $10 per workload.

2602.24207 2026-06-05 cs.LG cs.CY cs.GT stat.ML 版本更新

The Stability of Online Algorithms in Performative Prediction

在线算法在表现性预测中的稳定性

Gabriele Farina, Juan Carlos Perdomo

发表机构 * MIT(麻省理工学院) NYU(纽约大学)

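AI Summary Studies the stability of online algorithms in performative prediction. Any no-regret algorithm deployed in a performative setting converges to a (mixed) performatively stable equilibrium, in which models shape the data distribution so that their own predictions look optimal in hindsight. The result needs no assumption on how models influence the distribution, and it helps explain why common algorithms such as gradient descent are naturally stabilizing and prevent runaway feedback loops.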

详情
AI中文摘要

使用算法预测进行决策会导致反馈循环,其中我们部署的模型主动影响我们看到的数据分布,以及后来用于重新训练的数据分布。这种动态由Perdomo等人在表现性预测工作中正式化。我们的主要结果是一个无条件的减少,表明任何在表现性设置中使用的无遗憾算法都会收敛到一个(混合)表现性稳定的均衡:一种解决方案,其中模型以使它们的预测在事后看来最优的方式塑造数据分布。在我们之前的工作之前,该领域所有积极结果都对模型如何影响分布施加了强限制。通过使用鞅论据并允许随机化,我们避免了对人口如何响应预测的任何假设,并绕过了最近的硬度结果,表明确定性稳定的模型通常在PPAD难度上是难以计算的。最后,从概念上讲,我们的连接揭示了常见算法如梯度下降为何自然稳定化并防止 runaway 反馈循环。我们希望我们的工作能促进未来在线优化和表现性之间的技术转移。

英文摘要

The use of algorithmic predictions in decision-making leads to a feedback loop where the models we deploy actively influence the data distributions we see, and later use to retrain on. This dynamic was formalized by Perdomo et al. 2020 in their work on performative prediction. Our main result is an unconditional reduction showing that any no-regret algorithm deployed in performative settings converges to a (mixed) performatively stable equilibrium: a solution in which models actively shape data distributions in ways that their own predictions look optimal in hindsight. Prior to our work, all positive results in this area imposed strong restrictions on how models influenced distributions. By using a martingale argument and allowing randomization, we avoid any assumption on how populations respond to predictions and sidestep recent hardness results showing that deterministic stable models are in general PPAD-hard to compute. Lastly, on a more conceptual note, our connection sheds light on why common algorithms, like gradient descent, are naturally stabilizing and prevent runaway feedback loops. We hope our work enables future technical transfer of ideas between online optimization and performativity.
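
A toy Python example of a performatively stable point, under assumptions that are not from the paper: the deployed parameter theta shifts the mean of the data to `base + sensitivity * theta`, the loss is the squared error (theta - z)^2, and gradient steps use the expected gradient with the data distribution held fixed at the currently deployed theta. In this toy model the stable point solves theta = base + sensitivity * theta.

```python
def distribution_mean(theta, base=1.0, sensitivity=0.5):
    """Mean of the data induced by deploying theta (assumed linear response)."""
    return base + sensitivity * theta


def repeated_gradient_descent(theta0=0.0, lr=0.1, n_steps=500):
    """Gradient descent where each step sees data generated by the current model."""
    theta = theta0
    for _ in range(n_steps):
        z_mean = distribution_mean(theta)
        # Expected gradient of (theta - z)^2 with the distribution held fixed.
        grad = 2.0 * (theta - z_mean)
        theta -= lr * grad
    return theta


theta_final = repeated_gradient_descent()
stable_point = 1.0 / (1.0 - 0.5)  # fixed point of theta = 1.0 + 0.5 * theta
```

The iterates contract toward the fixed point rather than running away, which is a small-scale instance of the stabilizing behavior the abstract attributes to gradient descent.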

2602.19327 2026-06-05 cs.LG cs.AI 版本更新

Soft Sequence Policy Optimization

软序列策略优化

Svetlana Glazyrina, Maksim Kryzhanovskiy, Roman Ischenko

发表机构 * Lomonosov Moscow State University(罗蒙诺索夫莫斯科国立大学) Institute for Artificial Intelligence(人工智能研究所)

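AI Summary Proposes Soft Sequence Policy Optimization (SSPO), an off-policy objective that incorporates soft gating functions over token-level probability ratios within sequence-level importance weights, improving training stability and performance on mathematical reasoning and coding tasks.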

详情
AI中文摘要

大量近期关于大语言模型(LLM)对齐的研究聚焦于基于组相对策略优化(GRPO)开发新的策略优化方法。两个显著方向出现:(i)向序列级重要性采样权重的转变,以更好地对齐许多任务中使用的序列级奖励;(ii)替代PPO风格的剪裁方法,以避免相关的训练信号损失和熵崩溃。我们引入了软序列策略优化(SSPO),一种离策略强化学习目标,其在序列级重要权重中整合了token级概率比的软门控函数。我们为SSPO提供了理论动机,并调查了实际修改以改善优化行为。实证结果显示,SSPO在数学推理和编码任务中均提高了训练稳定性与性能。

英文摘要

A significant portion of recent research on Large Language Model (LLM) alignment focuses on developing new policy optimization methods based on Group Relative Policy Optimization (GRPO). Two prominent directions have emerged: (i) a shift toward sequence-level importance sampling weights that better align with the sequence-level rewards used in many tasks, and (ii) alternatives to the PPO-style clipping that aim to avoid the associated loss of training signal and entropy collapse. We introduce Soft Sequence Policy Optimization, an off-policy reinforcement learning objective that incorporates soft gating functions over token-level probability ratios within sequence-level importance weights. We provide theoretical motivation for SSPO and investigate practical modifications to improve optimization behavior. Empirically, we demonstrate that SSPO improves training stability and performance both in mathematical reasoning and coding tasks.

2602.18955 2026-06-05 cs.LG 版本更新

Incremental Transformer Neural Processes

增量变换器神经过程

Philip Mortimer, Cristiana Diaconu, Tommy Rochussen, Bruno Mlodozeniec, Richard E. Turner

发表机构 * University of Cambridge(剑桥大学)

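AI Summary Proposes the Incremental Transformer Neural Process (incTNP). Using causal masking, key-value caching, and a data-efficient autoregressive training strategy, it matches the predictive performance of standard TNPs while reducing the cost of updates from quadratic to linear, giving large speedups for sequential inference while retaining the consistency needed for streaming inference.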

Comments Accepted at ICML 2026

详情
AI中文摘要

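Neural Processes (NPs), and Transformer Neural Processes (TNPs) in particular, have shown remarkable performance on tasks ranging from spatiotemporal forecasting to tabular data modelling. Many of these applications, however, are inherently sequential, involving continuous data streams such as real-time sensor readings or database updates. In such settings a model should support cheap incremental updates rather than recomputing its internal representations for every new observation, a capability that existing TNP variants lack. Inspired by large language models, we introduce the Incremental TNP (incTNP). By leveraging causal masking, key-value (KV) caching, and a data-efficient autoregressive training strategy, incTNP matches the predictive performance of standard TNPs while reducing the computational cost of updates from quadratic to linear complexity. We evaluate the model empirically on a range of synthetic and real-world tasks, including tabular regression and temperature prediction. The results show that, surprisingly, incTNP performs comparably to or better than non-causal TNPs while unlocking orders-of-magnitude speedups for sequential inference. Finally, we assess the consistency of the model's updates: by adapting a metric of "implicit Bayesianness", we show that under a one-at-a-time streaming protocol incTNP retains a prediction rule as implicitly Bayesian as that of standard non-causal TNPs.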

英文摘要

Neural Processes (NPs), and specifically Transformer Neural Processes (TNPs), have demonstrated remarkable performance across tasks ranging from spatiotemporal forecasting to tabular data modelling. However, many of these applications are inherently sequential, involving continuous data streams such as real-time sensor readings or database updates. In such settings, models should support cheap, incremental updates rather than recomputing internal representations from scratch for every new observation -- a capability existing TNP variants lack. Drawing inspiration from Large Language Models, we introduce the Incremental TNP (incTNP). By leveraging causal masking, Key-Value (KV) caching, and a data-efficient autoregressive training strategy, incTNP matches the predictive performance of standard TNPs while reducing the computational cost of updates from quadratic to linear time complexity. We empirically evaluate our model on a range of synthetic and real-world tasks, including tabular regression and temperature prediction. Our results show that, surprisingly, incTNP delivers performance comparable to -- or better than -- non-causal TNPs while unlocking orders-of-magnitude speedups for sequential inference. Finally, we assess the consistency of the model's updates -- by adapting a metric of "implicit Bayesianness", we show that under a one-at-a-time streaming protocol, incTNP retains a prediction rule as implicitly Bayesian as standard non-causal TNPs, demonstrating that incTNP achieves the computational benefits of causal masking without sacrificing the consistency required for streaming inference.
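
The combination of causal masking and KV caching described above can be sketched with a single attention head in plain Python. This is a generic illustration, not the incTNP architecture: queries, keys, and values are taken to be the raw token vectors (no learned projections), and the names `attend`, `KVCache`, and `full_causal_attention` are hypothetical. The point is that appending one observation only attends over the cached keys (linear work), yet gives the same output as recomputing the whole causal attention.

```python
import math


def attend(query, keys, values):
    """Scaled dot-product attention of one query over the given keys and values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


class KVCache:
    """Stores past keys and values so a new observation costs O(n), not O(n^2)."""

    def __init__(self):
        self.keys = []
        self.values = []

    def update(self, query, key, value):
        self.keys.append(key)
        self.values.append(value)
        return attend(query, self.keys, self.values)


def full_causal_attention(queries, keys, values):
    """Recompute every row from scratch; row i sees only positions 0..i."""
    return [attend(queries[i], keys[: i + 1], values[: i + 1]) for i in range(len(queries))]


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cache = KVCache()
incremental = [cache.update(tok, tok, tok) for tok in tokens]
recomputed = full_causal_attention(tokens, tokens, tokens)
```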

2602.07875 2026-06-05 cs.LG 版本更新

Harpoon: Generalised Manifold Guidance for Conditional Tabular Diffusion

Harpoon:基于条件表格扩散的通用流形引导

Aditya Shankar, Yuandou Wang, Rihan Hai, Lydia Y. Chen

Affiliations * Department of Computer Science, Delft University of Technology; Department of Computer Science, Université de Neuchâtel

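AI Summary Proposes Harpoon, a conditional tabular diffusion method based on manifold guidance. It extends manifold theory to tabular data and to diverse inference-time objectives, steering unconstrained samples along the manifold geometry to satisfy tabular conditions at inference.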

Comments Accepted at ICLR 2026

详情
AI中文摘要

在需要对生成过程进行精确控制的应用中,生成表格数据至关重要。现有方法依赖于训练时的策略,无法在推理时泛化到未见过的约束,并且难以处理超出表格填补的条件任务。虽然流形理论提供了一种指导生成的原理化方法,但当前的公式化方法局限于特定的推理时间目标,并且仅限于连续领域。我们扩展了流形理论到表格数据,并扩展了其范围以处理多样的推理时间目标。在此基础上,我们引入了HARPOON,一种表格扩散方法,通过引导无约束样本沿着流形几何来满足多样化的表格条件。我们在填补和强制不等约束等任务上通过实验证明了我们的理论贡献,展示了HARPOON在各种数据集上的强大性能以及流形感知指导在表格数据中的实际好处。代码URL:https://github.com/adis98/Harpoon

英文摘要

Generating tabular data under conditions is critical to applications requiring precise control over the generative process. Existing methods rely on training-time strategies that do not generalise to unseen constraints during inference, and struggle to handle conditional tasks beyond tabular imputation. While manifold theory offers a principled way to guide generation, current formulations are tied to specific inference-time objectives and are limited to continuous domains. We extend manifold theory to tabular data and expand its scope to handle diverse inference-time objectives. On this foundation, we introduce HARPOON, a tabular diffusion method that guides unconstrained samples along the manifold geometry to satisfy diverse tabular conditions at inference. We validate our theoretical contributions empirically on tasks such as imputation and enforcing inequality constraints, demonstrating HARPOON's strong performance across diverse datasets and the practical benefits of manifold-aware guidance for tabular data. Code URL: https://github.com/adis98/Harpoon

2512.15783 2026-06-05 cs.AI cs.LG 版本更新

Towards AI epidemiology: a measurement standardisation framework for prospective risk detection

迈向人工智能流行病学:一种用于前瞻性风险检测的测量标准化框架

Kit Tempest-Walters

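AI Summary Proposes a measurement standardisation framework that compresses expert-AI interactions into structured, comparable fields for prospective risk detection without access to model internals. The paper defines the framework's scope, both semantically and statistically, and specifies a protocol for its empirical testing in future work.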

Comments 29 pages, 3 figures

详情
AI中文摘要

本文提出了一种测量标准化框架,该框架将专家-人工智能交互压缩为结构化、可比较的领域,用于在部署的人工智能系统中进行前瞻性风险检测,而无需访问模型内部信息。本文的概念性论文的主要目的是定义该框架的范围,包括语义和统计层面,并指定未来工作的实证测试协议。该框架旨在支持的群体层面声明因此是阶段性的研究计划,而非本文中声称的结果。测量标准化支撑着接下来的三个声明。第一个是可靠性声明:在有限条件下,大型语言模型可以产生可靠的、标准化的评估,用于评估专家-人工智能交互的证据和对齐情况。第二个是治理声明:对齐分数在部署期间为专家提供即时信号,并为机构提供监控不同任务类型、模型和领域的对齐模式的基础。第三个是流行病学声明:一旦建立了测量标准化,聚合对齐分数可以用于研究与下游结果相关的关联,这在受监管的专业环境中是可能的。这引入了基于相关变量而非机理分析的“人工智能流行病学”的可能性。本文解决了第一个声明,并指定了调查第二个和第三个声明的协议。为了在未来研究中实现实证评估,本文阐述了定义的语法,以及基于成对Bootstrap推断的统计协议,DeLong测试用于成对AUCs作为灵敏度检查,预设的一侧非劣性边界为0.05,以及Holm-Bonferroni校正。

英文摘要

This paper proposes a measurement standardisation framework that compresses expert-AI interactions into structured, comparable fields for prospective risk detection in deployed AI systems, without access to model internals. The main aim of this concept paper is to define the scope of the framework, both semantically and statistically, and to specify a protocol for its empirical testing in future work. The population-level claims the framework is designed to support are therefore the subject of a staged research programme rather than results claimed in this paper. Measurement standardisation underpins all three claims that follow. The first is a reliability claim: under bounded conditions, large language models can produce reliable, standardised assessments of the evidential and policy alignment of expert-AI interactions. The second is a governance claim: alignment scores give experts an immediate signal during deployment and give institutions a basis for monitoring alignment patterns across mission types, models, and domains. The third is an epidemiological claim: once measurement standardisation is established, aggregate alignment scores could be used to study associations with downstream outcomes in regulated professional settings. This introduces the possibility of an "AI epidemiology" that detects risk based on correlated variables instead of mechanistic analysis. This paper addresses the first claim and specifies protocols for investigating the second and third. To enable empirical evaluation in future studies, this paper sets out a defined grammar, together with a statistical protocol based on paired bootstrap inference, DeLong's test for paired AUCs as a sensitivity check, a pre-specified one-sided non-inferiority margin of 0.05, and Holm-Bonferroni correction.
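
The statistical protocol above ends with a Holm-Bonferroni correction. As a reminder of how that step-down procedure works, here is a small Python sketch; the function name and the example p-values are illustrative and do not come from the paper.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    The p-values are visited from smallest to largest; the k-th smallest
    (k = 0, 1, ...) is compared with alpha / (m - k), and the procedure
    stops at the first failure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # every larger p-value is retained as well
    return reject


decisions = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
```

In this example the thresholds are 0.0125, 0.0167, and 0.025 in turn: 0.005 and 0.01 pass, 0.03 fails, and the procedure stops, so 0.04 is never tested.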

2509.24882 2026-06-05 cs.LG cond-mat.dis-nn cs.AI stat.ML 版本更新

Scaling Laws and Spectra of Shallow Neural Networks in the Feature Learning Regime

浅层神经网络在特征学习 regime 中的缩放定律与谱特性

Leonardo Defilippis, Yizhou Xu, Julius Girardin, Emanuele Troiani, Vittorio Erba, Lenka Zdeborová, Bruno Loureiro, Florent Krzakala

发表机构 * Departement d’Informatique, École Normale Supérieure, PSL & CNRS(信息学院,巴黎高等师范学院,PSL与CNRS) Statistical Physics of Computation Laboratory, École Polytechnique Fédérale de Lausanne (EPFL)(计算统计物理实验室,洛桑联邦理工学院(EPFL)) Information, Learning and Physics Laboratory, École Polytechnique Fédérale de Lausanne (EPFL)(信息、学习与物理实验室,洛桑联邦理工学院(EPFL))

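AI Summary Studies scaling laws and spectra of shallow neural networks in the feature learning regime. By analyzing quadratic and diagonal networks, it derives how the scaling exponents of the excess risk depend on sample complexity and weight decay, and establishes a precise link between the scaling regimes and the spectral properties of the trained weights.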

详情
Journal ref
ICLR 2026
AI中文摘要

神经缩放定律是深度学习近期许多进展的基础,但其理论理解仍然主要局限于线性模型。在本文中,我们系统分析了二次和对角神经网络在特征学习 regime 中的缩放定律。利用与矩阵压缩感知和LASSO的联系,我们推导了过剩风险缩放指数作为样本复杂度和权重衰减函数的详细相图。这种分析揭示了不同缩放 regime 之间的交叉和平台行为,与经验神经缩放文献中广泛报告的现象相呼应。此外,我们建立了这些 regime 与训练网络权重谱性质的精确联系,我们对其进行了详细刻画。作为结果,我们提供了最近经验观察的理论验证,这些观察将权重谱中幂律尾部的出现与网络泛化性能联系起来,从而给出了从基本原理出发的解释。

英文摘要

Neural scaling laws underlie many of the recent advances in deep learning, yet their theoretical understanding remains largely confined to linear models. In this work, we present a systematic analysis of scaling laws for quadratic and diagonal neural networks in the feature learning regime. Leveraging connections with matrix compressed sensing and LASSO, we derive a detailed phase diagram for the scaling exponents of the excess risk as a function of sample complexity and weight decay. This analysis uncovers crossovers between distinct scaling regimes and plateau behaviors, mirroring phenomena widely reported in the empirical neural scaling literature. Furthermore, we establish a precise link between these regimes and the spectral properties of the trained network weights, which we characterize in detail. As a consequence, we provide a theoretical validation of recent empirical observations connecting the emergence of power-law tails in the weight spectrum with network generalization performance, yielding an interpretation from first principles.

2602.16965 2026-06-05 cs.LG 版本更新

Multi-Agent Lipschitz Bandits

多智能体Lipschitz老虎机

Sourav Chakraborty, Amit Kiran Rege, Claire Monteleoni, Lijun Chen

Affiliations * University of Colorado Boulder; INRIA Paris

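AI Summary Studies the decentralized multi-player stochastic bandit problem over a continuous, Lipschitz-structured action space where hard collisions yield zero reward. The paper proposes a communication-free policy that maximizes collective reward while separating coordination costs from learning costs: a novel maxima-directed search identifies distinct high-value regions and seats players on them, decoupling the problem into N independent single-player Lipschitz bandits. In the consensus regime, the end-to-end regret bound has dominant learning term \(\tilde{O}(T^{(d+1)/(d+2)})\), matching the single-player Lipschitz rate; the upfront coordination cost is horizon-independent at fixed confidence and only polylogarithmic in \(T\) in the expected-regret form. Under an additional public coverage/scheduling assumption, a gap-free \(\tilde{O}(T^{(d+1)/(d+2)})\) guarantee also holds. The paper further derives a matching lower bound for the dominant learning term and extends the framework to general distance-threshold collision models.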

Comments Twenty-Ninth Annual Conference on Artificial Intelligence and Statistics (AISTATS 2026)

详情
AI中文摘要

我们研究了在连续、Lipschitz-结构化动作空间上的去中心化多玩家随机老虎机问题,其中硬碰撞导致零奖励。我们的目标是设计一种无需通信的策略,以最大化集体奖励,同时将协调成本与学习成本分开。我们提出了一种模块化协议,首先通过新颖的maxima-directed搜索识别并安排玩家到不同的高价值区域,然后将问题分解为N个独立的单玩家Lipschitz老虎机。在共识模式下,我们得到了端到端的regret界,其主导学习项为~O(T^{(d+1)/(d+2)}),与单玩家Lipschitz速率匹配;前期协调成本在固定置信度下与时间无关,仅在期望regret形式中为多项式对数。在额外的公共覆盖/调度假设下,我们还获得了无间隙~O(T^{(d+1)/(d+2)})保证。我们进一步推导了主导学习项的匹配下界,并将框架扩展到一般距离阈值碰撞模型。

英文摘要

We study the decentralized multi-player stochastic bandit problem over a continuous, Lipschitz-structured action space where hard collisions yield zero reward. Our objective is to design a communication-free policy that maximizes collective reward, while separating coordination costs from learning costs. We propose a modular protocol that first solves the multi-agent coordination problem by identifying and seating players on distinct, high-value regions via a novel maxima-directed search and then decouples the problem into $N$ independent single-player Lipschitz bandits. In the consensus regime, we obtain an end-to-end regret bound whose dominant learning term is \(\tilde{O}(T^{(d+1)/(d+2)})\), matching the single-player Lipschitz rate; the upfront coordination cost is horizon-independent at fixed confidence and only polylogarithmic in \(T\) in the expected-regret form. Under an additional public coverage/scheduling assumption for the epochic extension, we also obtain a gap-free \(\tilde{O}(T^{(d+1)/(d+2)})\) guarantee. We further derive a matching lower bound for the dominant learning term and extend the framework to general distance-threshold collision models.

2509.20345 2026-06-05 stat.ME cs.LG stat.ML 版本更新

General Synthetic-Powered Inference

通用合成数据驱动推断

Meshi Bashari, Yonghoon Lee, Roy Maor Lotan, Edgar Dobriban, Yaniv Romano

Affiliations * Department of Electrical and Computer Engineering, Technion IIT, Israel; Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, USA; Department of Computer Science, Technion IIT, Israel

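AI Summary Proposes GEneral Synthetic-Powered Inference (GESPI), a framework that combines high-quality synthetic data with real data to improve sample efficiency, and adaptively falls back to the standard real-data-only method when the synthetic data are of low quality. The error rate stays below a user-specified bound without any distributional assumptions on the synthetic data.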

详情
AI中文摘要

高质量合成数据的快速普及——由先进的人工智能模型生成或从相关任务中收集——为统计推断带来了机遇和挑战。本文介绍了一种通用合成数据驱动推断(GESPI)框架,该框架围绕广义的统计推断程序包裹,通过结合合成和真实数据安全地提高样本效率。我们的框架利用高质量合成数据提高统计效力,但能自适应回退到仅使用真实数据的传统方法,当合成数据质量较低时。在不假设合成数据分布的情况下,该方法的误差率始终低于用户指定的界限,且随着合成数据质量的提高而降低。这种灵活性使该框架能够无缝集成到符合性预测、风险控制、假设检验和多重检验程序中,而无需修改基础推断方法。我们在有限标注数据的挑战性任务上展示了该方法的优势,包括AlphaFold蛋白质结构预测,以及在复杂数学问题上比较大型推理模型。

英文摘要

The rapid proliferation of high-quality synthetic data -- generated by advanced AI models or collected as auxiliary data from related tasks -- presents both opportunities and challenges for statistical inference. This paper introduces a GEneral Synthetic-Powered Inference (GESPI) framework that wraps around a broad class of statistical inference procedures to safely enhance sample efficiency by combining synthetic and real data. Our framework leverages high-quality synthetic data to boost statistical power, yet adaptively defaults to the standard method using only real data when synthetic data are of low quality. The error rate of our method remains below a user-specified bound without any distributional assumptions on the synthetic data, and decreases as the quality of the synthetic data improves. This flexibility enables seamless integration with conformal prediction, risk control, hypothesis testing, and multiple testing procedures, all without modifying the base inference method. We demonstrate the benefits of our method on challenging tasks with limited labeled data, including AlphaFold protein structure prediction, and comparing large reasoning models on complex math problems.

2507.12257 2026-06-05 cs.LG physics.data-an stat.ML stat.OT 版本更新

Robust Causal Discovery in Real-World Time Series with Power-Laws

在现实时间序列中使用幂律实现鲁棒因果发现

Matteo Tusoni, Giuseppe Masi, Andrea Coletta, Aldo Glielmo, Viviana Arrigoni, Novella Bartolini

发表机构 * Department of Computer Science, Sapienza University of Rome(罗马大学计算机科学系)

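AI Summary Proposes a robust causal discovery method based on extracting power-law spectral features, aimed at real-world time series where noise causes spurious causal inferences. The method outperforms existing alternatives on both synthetic benchmarks and real-world datasets with known causal structure.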

详情
AI中文摘要

在随机时间序列中探索因果关系是一项具有广泛应用(包括金融、经济、神经科学和气候科学)的挑战性但至关重要的任务。许多因果发现(CD)算法已被提出;然而,它们通常对噪声高度敏感,在真实数据中导致虚假的因果推断。在本文中,我们观察到许多现实时间序列的频率谱遵循幂律分布,这主要是由于内在的自组织行为。利用这一见解,我们构建了一种基于提取幂律谱特征的鲁棒CD方法,以放大真实的因果信号。我们的方法在合成基准和具有已知因果结构的真实数据集上均优于最先进的替代方法,证明了其鲁棒性和实际相关性。

英文摘要

Exploring causal relationships in stochastic time series is a challenging yet crucial task with a vast range of applications, including finance, economics, neuroscience, and climate science. Many algorithms for Causal Discovery (CD) have been proposed; however, they often exhibit a high sensitivity to noise, resulting in spurious causal inferences in real data. In this paper, we observe that the frequency spectra of many real-world time series follow a power-law distribution, notably due to an inherent self-organizing behavior. Leveraging this insight, we build a robust CD method based on the extraction of power-law spectral features that amplify genuine causal signals. Our method consistently outperforms state-of-the-art alternatives on both synthetic benchmarks and real-world datasets with known causal structures, demonstrating its robustness and practical relevance.
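
The observation that many spectra follow a power law can be checked on a series by fitting a line to log power against log frequency. The Python sketch below does this on a synthetic signal built so that power at integer frequency k decays as k^(-beta). It is only an illustration of estimating a spectral exponent: the paper's actual feature extraction is not described in the abstract, and the function names, the naive O(n^2) DFT, and the fixed phases are assumptions.

```python
import cmath
import math


def synthetic_power_law_signal(n=128, beta=2.0):
    """Sum of cosines whose power at integer frequency k decays as k**(-beta)."""
    freqs = range(1, n // 2)
    return [
        sum(k ** (-beta / 2.0) * math.cos(2.0 * math.pi * k * t / n + 0.3 * k) for k in freqs)
        for t in range(n)
    ]


def periodogram(x):
    """Squared DFT magnitude at frequencies 1 .. n/2 - 1 (naive DFT)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2 / n
        for k in range(1, n // 2)
    ]


def spectral_slope(x):
    """Least-squares slope of log power against log frequency."""
    power = periodogram(x)
    log_f = [math.log(k) for k in range(1, len(power) + 1)]
    log_p = [math.log(p) for p in power]
    mean_f = sum(log_f) / len(log_f)
    mean_p = sum(log_p) / len(log_p)
    cov = sum((f - mean_f) * (p - mean_p) for f, p in zip(log_f, log_p))
    var = sum((f - mean_f) ** 2 for f in log_f)
    return cov / var


slope = spectral_slope(synthetic_power_law_signal())
```

For this noiseless construction the fitted slope recovers -beta; on real data the periodogram is noisy and the fit would need smoothing or a more careful estimator.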

2602.13697 2026-06-05 cs.AI cs.DB cs.LG 版本更新

No Need to Train Your RDB Foundation Model

无需训练你的关系数据库基础模型

Linjie Xu, Yanlin Zhang, Quan Gan, Minjie Wang, David Wipf

发表机构 * University of Hong Kong, Shanghai X-Lab(香港大学,上海X实验室)

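AI Summary Proposes a family of relational database (RDB) encoders for in-context learning that require no training or fine-tuning and can be paired with existing single-table ICL foundation models to handle multiple interrelated tables.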

Comments International Conference on Machine Learning (ICML) 2026

详情
AI中文摘要

关系数据库(RDBs)包含大量异构的表格信息,可用于预测建模。但鉴于企业环境中潜在的目标空间广阔,如何避免每次预测新感兴趣的量时重新训练新模型?基于上下文学习(ICL)的基础模型提供了一种方便的选项,但目前大多局限于单表操作。在推广到多张相互关联的表时,关键在于将可变大小的RDB邻域压缩为固定长度的ICL样本供解码器使用。然而,细节至关重要:与现有监督学习RDB流程不同,我们提供了理论和实证证据表明,ICL特定的压缩应限制在高维RDB列中,其中所有实体共享单位和角色,而不是跨列,因为异构数据类型的相关性无法在缺乏大量标签信息的情况下确定。基于此限制,我们证明了排除可训练参数不会影响编码器的表达能力。因此,我们得到了一种原理上可行的RDB编码器家族,可以无缝搭配已有的单表ICL基础模型,从而无需训练或微调。从实用角度看,我们开发了可扩展的SQL原语来实现编码器阶段,最终得到一个易于使用的开源RDBLearn基础模型,能够在未见过的数据集上实现稳健的性能。

英文摘要

Relational databases (RDBs) contain vast amounts of heterogeneous tabular information that can be exploited for predictive modeling purposes. But since the space of potential targets is vast across enterprise settings, how can we avoid retraining a new model each time we wish to predict a new quantity of interest? Foundation models based on in-context learning (ICL) offer a convenient option, but so far are largely restricted to single-table operability. In generalizing to multiple interrelated tables, it is essential to compress variably-sized RDB neighborhoods into fixed-length ICL samples for consumption by the decoder. However, the details here are critical: unlike existing supervised learning RDB pipelines, we provide theoretical and empirical evidence that ICL-specific compression should be constrained within high-dimensional RDB columns where all entities share units and roles, not across columns where the relevance of heterogeneous data types cannot be determined without extensive label information. Conditioned on this restriction, we then demonstrate that encoder expressiveness is actually not compromised by excluding trainable parameters. Hence we arrive at a principled family of RDB encoders that can be seamlessly paired with already-existing single-table ICL foundation models, whereby no training or fine-tuning is required. From a practical standpoint, we develop scalable SQL primitives to implement the encoder stage, resulting in the easy-to-use open-source RDBLearn foundation model capable of robust performance on unseen datasets out of the box.

2602.12124 2026-06-05 cs.LG cs.CL 版本更新

Alignment Risks from Capability-Seeking RL Training

从能力寻求强化学习训练中产生的对齐风险

Yujun Zhou, Yue Huang, Han Bao, Kehan Guo, Zhenwen Liang, Pin-Yu Chen, Tian Gao, Werner Geyer, Nuno Moniz, Nitesh V Chawla, Xiangliang Zhang

发表机构 * University of California, Berkeley(加州大学伯克利分校) Stanford University(斯坦福大学) University of Washington(华盛顿大学) University of Texas at Austin(德克萨斯大学奥斯汀分校) University of Toronto(多伦多大学) University of Cambridge(剑桥大学)

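AI summary This paper studies the risk that language models trained with reinforcement learning in vulnerable environments exploit implicit loopholes to maximize reward. It finds that these strategies are not always narrow tricks: they can transfer in limited ways, propagate to other models through SFT, and in some cases persist longer when learned through RL than when distilled through SFT, which suggests extending AI safety work to auditing and securing training environments, reward mechanisms, and evaluation channels.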

Comments Accepted by ICML 2026

详情
AI中文摘要

尽管大多数AI对齐研究集中在防止模型生成显式有害内容,但来自易受攻击环境中的能力寻求强化学习训练的更微妙的风险却值得关注。我们研究了当语言模型在具有隐含漏洞的环境中通过强化学习(RL)训练时,是否能学习利用这些漏洞来最大化奖励,即使没有被明确指示这样做。为此,我们设计了四种多样化的“漏洞游戏”,每种游戏都涉及与上下文条件合规性、代理指标、奖励篡改和自我评估相关的结构性漏洞。我们的实验表明,模型经常学会利用这些漏洞,发现机会性策略以增加奖励,有时甚至保持或改进标准任务性能指标。更关键的是,我们发现这些剥削策略不总是狭窄的“技巧”:它们可以在结构但有限的方式下转移,通过SFT从有能力的教师模型传播到其他学生模型,并在某些情况下通过RL学习比通过SFT蒸馏更持久。我们的发现表明,来自能力寻求RL训练的能力对齐风险可能难以通过标准性能监控检测,这表明未来AI安全工作应超越内容审查,扩展到审计和保障训练环境、奖励机制和评估渠道。代码可在https://github.com/YujunZhou/Capability-seeking-RL-risk获取。

英文摘要

While most AI alignment research focuses on preventing models from generating explicitly harmful content, a more subtle risk arises from capability-seeking RL training in vulnerable environments. We investigate whether language models, when trained with reinforcement learning (RL) in environments with implicit loopholes, can learn to exploit these flaws to maximize reward, even without being explicitly instructed to do so. To test this, we design a suite of four diverse "vulnerability games," each presenting a structural vulnerability related to context-conditional compliance, proxy metrics, reward tampering, and self-evaluation. Our experiments show that models often learn to exploit these vulnerabilities, discovering opportunistic strategies that increase reward while sometimes preserving or even improving standard task-performance metrics. More critically, we find that these exploitative strategies are not always narrow "tricks": they can transfer in structured but limited ways, propagate from a capable teacher model to other student models through SFT, and in several cases remain more persistent when learned through RL than when distilled through SFT. Our findings show that alignment risks from capability-seeking RL training can be difficult to detect with standard performance monitoring, suggesting that future AI safety work should extend beyond content moderation to auditing and securing training environments, reward mechanisms, and evaluation channels. Code is available at https://github.com/YujunZhou/Capability-seeking-RL-risk.

2602.04809 2026-06-05 cs.LG cs.AI 版本更新

Beyond Rewards in Reinforcement Learning for Cyber Defence

超越奖励的强化学习在网络安全防御中的应用

Elizabeth Bates, Chris Hicks, Vasilios Mavroudis

发表机构 * University of Cambridge(剑桥大学)

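AI summary This paper studies how reward function structure affects learning and policy behaviour when reinforcement learning is used for cyber defence. By comparing sparse and dense reward functions, it reveals the nuanced relationships between rewards, the action space, and the risk of suboptimal policies.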

详情
AI中文摘要

近年来,自主网络安全防御代理在使用深度强化学习保护计算机网络方面引起了广泛关注。这些代理通常在网络安全 gym 环境中训练,使用密集的、高度工程化的奖励函数,结合多种惩罚和激励,以应对各种(不) desirable 状态和昂贵的操作。密集奖励有助于缓解探索复杂环境的挑战,但会偏向于次优且可能风险更大的解决方案,这对复杂的网络安全环境至关重要。我们通过多种稀疏和密集奖励函数、两种已确立的网络安全 gym、不同网络规模以及策略梯度和基于价值的 RL 算法,全面评估了奖励函数结构对学习和策略行为特征的影响。我们的评估得益于一种新的真实评估方法,使可以直接比较不同的奖励函数,揭示了奖励、动作空间和网络安全环境中子最优策略风险之间的微妙关系。我们的结果表明,稀疏奖励,如果目标一致且可以频繁遇到,能够提供增强的训练可靠性和更有效的网络安全防御代理,具有较低风险的策略。令人惊讶的是,稀疏奖励还能产生与网络安全守护者目标更一致的策略,并在不使用显式奖励基于数值惩罚的情况下,节省昂贵的防御操作。

英文摘要

Recent years have seen an explosion of interest in autonomous cyber defence agents trained to defend computer networks using deep reinforcement learning. These agents are typically trained in cyber gym environments using dense, highly engineered reward functions which combine many penalties and incentives for a range of (un)desirable states and costly actions. Dense rewards help alleviate the challenge of exploring complex environments but risk biasing agents towards suboptimal and potentially riskier solutions, a critical issue in complex cyber environments. We thoroughly evaluate the impact of reward function structure on learning and policy behavioural characteristics using a variety of sparse and dense reward functions, two well-established cyber gyms, a range of network sizes, and both policy gradient and value-based RL algorithms. Our evaluation is enabled by a novel ground truth evaluation approach which allows directly comparing between different reward functions, illuminating the nuanced inter-relationships between rewards, action space and the risks of suboptimal policies in cyber environments. Our results show that sparse rewards, provided they are goal aligned and can be encountered frequently, uniquely offer both enhanced training reliability and more effective cyber defence agents with lower-risk policies. Surprisingly, sparse rewards can also yield policies that are better aligned with cyber defender goals and make sparing use of costly defensive actions without explicit reward-based numerical penalties.

2602.10314 2026-06-05 cs.LG 版本更新

Stop Training for the Worst: Progressive Unmasking Accelerates Masked Diffusion Training

停止训练于最差:渐进性解蔽加速了掩码扩散训练

Jaeyeon Kim, Jonathan Geuter, David Alvarez-Melis, Sham Kakade, Sitan Chen

发表机构 * Harvard University(哈佛大学) Kempner Institute(凯普纳研究所)

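AI summary This paper proposes Progressive UnMAsking (PUMA), which modifies the forward masking process so that training-time and inference-time masking patterns align, thereby accelerating the training of masked diffusion models.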

详情
AI中文摘要

掩码扩散模型(MDMs)已在离散空间的生成建模中展现出有前途的潜力。通过以任何顺序生成序列并允许并行解码,它们能够实现快速的推理和在非因果任务上的强大性能。然而,这种灵活性带来了训练复杂度的权衡:MDMs需要在一个指数级大的掩码模式集合上进行训练,这不仅计算成本高昂,而且在训练时使用的随机掩码与推理时由解码过程诱导的结构化掩码之间存在训练-测试不匹配。在本文中,我们提出渐进性解蔽(PUMA),这是一种简单的前向掩码过程修改方法,使训练时间和推理时的掩码模式一致,从而将优化集中在推理对齐的掩码上并加快训练。经验上,PUMA在125M规模的预训练中加速了约2.5倍,并在自回归初始化等常见方法上提供了互补的优势。我们开源了我们的代码库:https://github.com/JaeyeonKim01/PUMA。

英文摘要

Masked Diffusion Models (MDMs) have emerged as a promising approach for generative modeling in discrete spaces. By generating sequences in any order and allowing for parallel decoding, they enable fast inference and strong performance on non-causal tasks. However, this flexibility comes with a training complexity trade-off: MDMs train on an exponentially large set of masking patterns, which is not only computationally expensive, but also creates a train--test mismatch between the random masks used in training and the highly structured masks induced by inference-time unmasking. In this work, we propose Progressive UnMAsking (PUMA), a simple modification of the forward masking process that aligns training-time and inference-time masking patterns, thereby focusing optimization on inference-aligned masks and speeding up training. Empirically, PUMA speeds up pretraining at the 125M scale by $\approx 2.5\times$ and offers complementary advantages on top of common recipes like autoregressive initialization. We open-source our codebase at https://github.com/JaeyeonKim01/PUMA.

2602.09574 2026-06-05 cs.CL cs.AI cs.LG 版本更新

Aligning Tree-Search Policies with Fixed Token Budgets in Test-Time Scaling of LLMs

在LLMs的测试时间扩展中对树搜索策略与固定令牌预算对齐

Sora Miyamoto, Daisuke Oba, Naoaki Okazaki

发表机构 * University of Tokyo(东京大学)

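AI summary This paper proposes Budget-Guided MCTS (BG-MCTS), a tree-search decoding algorithm that aligns its search policy with the remaining token budget to improve reasoning performance across token budgets.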

Comments Accepted at ICML 2026. Code: https://github.com/Sora-Miyamoto/bg-mcts

详情
AI中文摘要

树搜索解码是大型语言模型(LLMs)测试时间扩展的有效方法,但现实部署中通常会施加一个固定的每查询令牌预算,且该预算在不同设置中有所不同。现有的树搜索策略大多缺乏预算意识,仅将预算视为终止条件,从而可能导致后期过度分支或提前终止。我们提出Budget-Guided MCTS (BG-MCTS),一种树搜索解码算法,其搜索策略与剩余令牌预算对齐:它从广泛的探索开始,然后在剩余预算减少时优先进行细化和答案完成,同时减少浅层节点的后期分支。BG-MCTS在数学推理基准和额外的物理推理基准上,使用开放权重LLMs在各种推理预算下均优于预算无关的树搜索基线。

英文摘要

Tree-search decoding is an effective form of test-time scaling for large language models (LLMs), but real-world deployment often imposes a fixed per-query token budget that varies across settings. Existing tree-search policies are largely budget-agnostic, treating the budget merely as a termination condition, thereby risking late-stage over-branching or premature termination. We propose Budget-Guided MCTS (BG-MCTS), a tree-search decoding algorithm that aligns its search policy with the remaining token budget: it starts with broad exploration, then prioritizes refinement and answer completion as the remaining budget decreases while reducing late-stage branching from shallow nodes. BG-MCTS consistently outperforms budget-agnostic tree-search baselines across inference budgets on mathematical reasoning benchmarks and an additional physics reasoning benchmark with open-weight LLMs.

2602.08503 2026-06-05 cs.CV cs.CL cs.LG 版本更新

Learning Self-Correction in Vision-Language Models via Rollout Augmentation

通过回滚增强学习视觉-语言模型中的自我纠正

Yi Ding, Ziliang Qiu, Bolian Li, Ruqi Zhang

发表机构 * University of Science and Technology of China(中国科学技术大学)

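AI summary This paper proposes Octopus, a reinforcement learning framework based on rollout augmentation that recombines existing rollouts into dense self-correction examples, improving sample efficiency and stabilizing RL optimization. It also introduces a response-masking strategy that decouples self-correction from direct reasoning, achieving SoTA performance among open-source VLMs across 7 benchmarks.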

Comments 18 pages

详情
Journal ref
ICML 2026
AI中文摘要

自我纠正对于解决视觉-语言模型(VLMs)中的复杂推理问题至关重要。然而,现有的强化学习(RL)方法在学习自我纠正方面存在困难,因为有效的自我纠正行为只在很少情况下出现,导致学习信号非常稀疏。为了解决这一挑战,我们提出了correction-specific rollouts(Octopus),一种RL回滚增强框架,通过重新组合现有回滚来合成密集的自我纠正示例。这种增强同时提高了样本效率,由于回滚重用,并通过平衡监督稳定了RL优化。此外,我们引入了一种响应遮蔽策略,将自我纠正与直接推理解耦,避免信号冲突,并使两种行为都能被有效学习。基于此,我们介绍了Octopus-8B,一种具有可控自我纠正能力的推理VLM。在7个基准测试中,它在开源VLM中实现了SOTA性能,优于最佳RLVR基线1.0分,同时仅需0.72倍的训练时间每步。

英文摘要

Self-correction is essential for solving complex reasoning problems in vision-language models (VLMs). However, existing reinforcement learning (RL) methods struggle to learn it, as effective self-correction behaviors emerge only rarely, making learning signals extremely sparse. To address this challenge, we propose correction-specific rollouts (Octopus), an RL rollout augmentation framework that synthesizes dense self-correction examples by recombining existing rollouts. This augmentation simultaneously improves sample efficiency due to rollout reuse and stabilizes RL optimization through balanced supervision. Furthermore, we introduce a response-masking strategy that decouples self-correction from direct reasoning, avoiding signal conflicts and enabling both behaviors to be learned effectively. Building on this, we introduce Octopus-8B, a reasoning VLM with controllable self-correction capability. Across 7 benchmarks, it achieves SoTA performance among open-source VLMs, outperforming the best RLVR baseline by 1.0 score while requiring only $0.72\times$ training time per step.

2508.04409 2026-06-05 stat.ML cs.LG 版本更新

The Relative Instability of Model Comparison with Cross-validation

模型比较与交叉验证的相对不稳定性

Alexandre Bayle, Lucas Janson, Lester Mackey

发表机构 * Department of Statistics, Harvard University, Cambridge, MA, USA(哈佛大学统计系) Microsoft Research New England, Cambridge, MA, USA(微软研究院新英格兰分部)

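AI summary The study shows that even individually stable models can yield relatively unstable comparisons, calling the validity of cross-validation inference into question; in particular, the Lasso and soft-thresholding lead to invalid CV inference even in the most favorable learning settings.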

详情
AI中文摘要

交叉验证(CV)已知能提供渐近精确的模型改进测试和置信区间,但仅在模型比较相对稳定时才成立。令人惊讶的是,我们证明了即使简单且个体稳定的模型也能产生相对不稳定的比较,从而质疑CV推断的有效性。具体来说,我们展示了Lasso及其近亲软阈值化在最有利的学习条件下,即使两个模型本身都稳定,也会产生相对不稳定的比较和无效的CV推断。这些发现强调在部署CV进行模型比较前验证相对稳定性的重要性。

英文摘要

Cross-validation (CV) is known to provide asymptotically exact tests and confidence intervals for model improvement but only when the model comparison is relatively stable. Surprisingly, we prove that even simple, individually stable models can generate relatively unstable comparisons, calling into question the validity of CV inference. Specifically, we show that the Lasso and its close cousin, soft-thresholding, generate relatively unstable comparisons and invalid CV inferences, even in the most favorable of learning settings and when both models are individually stable. These findings highlight the importance of verifying relative stability before deploying CV for model comparison.
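
For readers unfamiliar with the second estimator named in the abstract, soft-thresholding shrinks each coefficient toward zero by a fixed amount and zeroes out the small ones. For an orthonormal design and the objective (1/2)||y - Xb||^2 + lam * ||b||_1, the Lasso solution is exactly this operator applied to the least-squares coefficients. The sketch below only illustrates the operator; the function name and example values are illustrative choices, not taken from the paper.

```python
import numpy as np

def soft_threshold(x, lam):
    """Shrink each entry of x toward zero by lam; entries with |x| <= lam become zero."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# Illustrative coefficients: large entries shrink by lam, small ones vanish.
coeffs = soft_threshold([-3.0, -0.5, 0.5, 3.0], lam=1.0)
```

The hard cut at |x| = lam means a small perturbation of the data can switch a coefficient between zero and nonzero, which gives some intuition for why comparisons involving such estimators can be delicate.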

2602.07834 2026-06-05 cs.LG math.DG 版本更新

Interpretable Analytic Calabi-Yau Metrics via Symbolic Distillation

通过符号蒸馏获得可解释的分析Calabi-Yau度量

D Yang Eng

发表机构 * D Yang Eng

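AI summary This paper studies whether the pointwise determinant ratio of a Calabi-Yau metric can be described compactly with a small number of projective invariants. Symbolic regression shows that low-order symmetric features capture most of the teacher variation, and the same functional scaffold remains usable across complex-structure moduli.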

详情
AI中文摘要

点确定比 $ R_ψ(z)\equiv \log\!\left(\frac{\det g_{\mathrm{RF}}(z;ψ)}{\det g_{\mathrm{FS}}(z)}\right) $ 用于衡量Dwork五次曲面上的Ricci-flat度量偏离Fubini-Study基线的程度。我们询问这个标量可观测是否能用少量射影不变量紧凑描述,以及是否在复杂结构模数范围内保持有效。使用Donaldson的$k=10$平衡度量作为代数教师,并对采样点进行符号回归,我们发现,在此处研究的受限模数-only特征类别中,两个低阶对称特征,即幂和$p_2=\sum_i |z_i|^4$和三次基本对称多项式$σ_3=e_3$,已经能捕捉大部分教师变化。一个以$(p_2,σ_3)$为变量的三次多项式在测试中达到$R^2=0.946$,而添加剩余低阶对称生成器只会改变不到$10^{-3}$。在同一两个特征空间中,符号回归识别出一个五项有理多项式表达式,能够与$k=10$教师匹配,$R^2=0.9994$。在$ψ\in[0,0.8]$范围内重新拟合相同的函数框架,保持采样点云上的平均确定比代理$\langle R_ψ\rangle$在$0.01\%$以内,且在研究范围内产生平滑变化的拟合系数。Holomorphic Yukawa耦合$κ_{111}=5$仅作为归一化检查被重现。总体而言,这些结果提供了Dwork家族上一个度量衍生标量可观测的紧凑符号描述,同时受限于用于蒸馏的有限$k$教师,而不是建立闭合形式的Ricci-flat度量。

英文摘要

The pointwise determinant ratio \[ R_ψ(z)\equiv \log\!\left(\frac{\det g_{\mathrm{RF}}(z;ψ)}{\det g_{\mathrm{FS}}(z)}\right) \] measures how the Ricci-flat metric on the Dwork quintic departs from the Fubini--Study baseline. We ask whether this scalar observable can be described compactly in terms of a small number of projective invariants, and whether the same scaffold remains usable across complex-structure moduli. Using Donaldson's $k=10$ balanced metric as an algebraic teacher and symbolic regression on sampled points, we find that, within the restricted moduli-only feature class studied here, two low-order symmetric features, the power sum $p_2=\sum_i |z_i|^4$ and the cubic elementary symmetric polynomial $σ_3=e_3$, already capture most of the teacher variation. A degree-3 polynomial in $(p_2,σ_3)$ achieves held-out test $R^2=0.946$, while adding the remaining low-order symmetric generators changes this by less than $10^{-3}$. Within the same two-feature space, symbolic regression identifies a five-term rational-polynomial expression that matches the $k=10$ teacher with $R^2=0.9994$. Refitting the same functional scaffold across $ψ\in[0,0.8]$ keeps the mean determinant-ratio proxy $\langle R_ψ\rangle$ within $0.01\%$ of the local teachers on the sampled point clouds and yields smoothly varying fitted coefficients over the studied range. The holomorphic Yukawa coupling $κ_{111}=5$ is reproduced as a normalization check only. Taken together, these results provide a compact symbolic description of one metric-derived scalar observable on the Dwork family, while remaining bounded by the finite-$k$ teacher used for distillation rather than establishing a closed-form Ricci-flat metric.
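
The two features named in the abstract are cheap to compute from homogeneous coordinates. The sketch below is a hedged illustration: it assumes the features are built from the squared moduli $s_i=|z_i|^2$, normalized so that $\sum_i s_i = 1$ (which makes them invariant under projective rescaling), and that $e_3$ is the cubic elementary symmetric polynomial in the $s_i$. The abstract does not spell out these conventions, and `symmetric_features` is a hypothetical name.

```python
from itertools import combinations
import numpy as np

def symmetric_features(z):
    """Return (p2, e3) for homogeneous coordinates z, using normalized squared moduli."""
    z = np.asarray(z, dtype=complex)
    s = np.abs(z) ** 2
    s = s / s.sum()                      # assumed normalization: sum_i |z_i|^2 = 1
    p2 = float(np.sum(s ** 2))           # equals sum_i |z_i|^4 after normalization
    e3 = float(sum(s[i] * s[j] * s[k]    # cubic elementary symmetric polynomial in s
                   for i, j, k in combinations(range(s.size), 3)))
    return p2, e3

p2, e3 = symmetric_features(np.ones(5))  # the symmetric point [1:1:1:1:1]
```

Because both features depend only on the normalized moduli, multiplying `z` by any nonzero complex scalar leaves them unchanged, which is the property a projective invariant needs.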

2602.06773 2026-06-05 cs.LG stat.ML 版本更新

On the Convergence of Multicalibration Gradient Boosting

多校准梯度提升的收敛性研究

Daniel Haimovich, Fridolin Linder, Lorenzo Perini, Niek Tax, Milan Vojnovic

发表机构 * Meta LSE, Department of Statistics(伦敦经济学院统计系)

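AI summary This paper studies the convergence of multicalibration gradient boosting, proving that the magnitude of prediction updates decays at O(1/√T) and that convergence becomes linear under additional smoothness assumptions; experiments support the theory and clarify when the method converges fast.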

Comments Under submission

详情
AI中文摘要

多校准梯度提升最近作为一种可扩展的方法出现,实证上能够产生近似多校准的预测器,并在大规模网络上部署。尽管这种实证成功,其收敛性质尚未得到充分理解。在本文中,我们为多校准梯度提升算法提供了计算保证。我们证明了连续预测更新的幅度以O(1/√T)衰减,这表明在轮次中经验多校准误差的相同收敛率界限。在额外的弱学习器平滑性假设下,该速率提高到线性收敛。我们进一步建立了自适应变体的收敛性。在真实世界数据集上的实验支持我们的理论,并澄清了该方法在何种情况下实现快速收敛性。

英文摘要

Multicalibration gradient boosting has recently emerged as a scalable method that empirically produces approximately multicalibrated predictors and has been deployed at web scale. Despite this empirical success, its convergence properties are not well understood. In this paper, we provide computational guarantees for multicalibration gradient boosting algorithms. We show that the magnitude of successive prediction updates decays at $O(1/\sqrt{T})$, which implies the same convergence rate bound for the empirical multicalibration error over rounds. Under additional smoothness assumptions on the weak learners, this rate improves to linear convergence. We further establish convergence for adaptive variants. Experiments on real-world datasets support our theory and clarify the regimes in which the method achieves fast convergence.

2602.01607 2026-06-05 math.ST cs.IT cs.LG math.IT stat.ML stat.TH 版本更新

Minimax optimal differentially private synthetic data for smooth queries

最小最大最优差分隐私合成数据用于平滑查询

Rundong Ding, Yiyun He, Yizhe Zhu

发表机构 * Department of Mathematics, University of Southern California(南加州大学数学系) Department of Mathematics, University of California San Diego(加州圣地亚哥大学数学系)

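AI summary This paper studies how to generate (ε,δ)-differentially private synthetic data that protects individual privacy while giving strong utility guarantees for meaningful downstream analysis. It proposes a polynomial-time algorithm achieving the minimax error rate O_{k,d}(n^{-min{1, k/d}}), up to a log(n) factor, and establishes the first minimax lower bound for k-smooth queries.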

Comments COLT 2026 arXiv version. 34 pages

详情
AI中文摘要

差分隐私合成数据使敏感数据集的共享和分析成为可能,同时为个体贡献者提供严格的隐私保证。一个核心挑战是为有意义的下游分析提供强效用保证。许多现有方法确保在广泛的查询类上具有均匀的准确性,如所有Lipschitz函数,但这种通用性往往导致对实际感兴趣的统计量的次优速率。由于许多常见数据分析查询的平滑性超出了最坏情况Lipschitz界所捕捉的范围,我们询问是否可以利用这种额外的结构来提高效用。我们研究了从大小为n的数据集生成(ε,δ)差分隐私合成数据的问题,该数据集支持在超立方体[-1,1]^d上,具有对所有具有受界导数的平滑查询的均匀效用保证。我们提出了一种多项式时间算法,实现了最小最大误差率O_{k,d}(n^{-min{1, k/d}}),除了一个log(n)因子。这一特征揭示了k=d处的相变。我们的结果推广了Chebyshev矩匹配框架(Musco等,2025;Wang等,2016),并且严格改进了在\citep{wang2016differentially}中为k-平滑查询建立的误差率。此外,我们建立了针对k-平滑查询的首个最小最大下界,扩展了Boedihardjo等(2024)中关于ε-差分隐私的Wasserstein下界。

英文摘要

Differentially private synthetic data enables the sharing and analysis of sensitive datasets while providing rigorous privacy guarantees for individual contributors. A central challenge is to achieve strong utility guarantees for meaningful downstream analysis. Many existing methods ensure uniform accuracy over broad query classes, such as all Lipschitz functions, but this level of generality often leads to suboptimal rates for statistics of practical interest. Since many common data analysis queries exhibit smoothness beyond what worst-case Lipschitz bounds capture, we ask whether exploiting this additional structure can yield improved utility. We study the problem of generating $(\varepsilon,δ)$-differentially private synthetic data from a dataset of size $n$ supported on the hypercube $[-1,1]^d$, with utility guarantees uniformly for all smooth queries having bounded derivatives up to order $k$. We propose a polynomial-time algorithm that achieves a minimax error rate of $O_{k,d}(n^{-\min \{1, \frac{k}{d}\}})$, up to a $\log(n)$ factor. This characterization uncovers a phase transition at $k=d$. Our results generalize the Chebyshev moment matching framework of (Musco et al., 2025; Wang et al., 2016) and strictly improve the error rates for $k$-smooth queries established in (Wang et al., 2016). Moreover, we establish the first minimax lower bound for the utility of $(\varepsilon,δ)$-differentially private synthetic data with respect to $k$-smooth queries, extending the Wasserstein lower bound for $\varepsilon$-differential privacy in (Boedihardjo et al., 2024).
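
To make the phase transition at $k=d$ concrete, the stated rate can be split into its two regimes. The values of $d$ and $k$ in the example are illustrative choices, and the $\log(n)$ factor and the constants hidden in $O_{k,d}$ are omitted.

```latex
n^{-\min\{1,\, k/d\}} =
\begin{cases}
n^{-k/d}, & k < d,\\[2pt]
n^{-1},   & k \ge d,
\end{cases}
\qquad
\text{e.g. } d = 4:\quad
k = 2 \;\Rightarrow\; n^{-1/2},
\qquad
k \ge 4 \;\Rightarrow\; n^{-1}.
```

Below the transition, extra smoothness improves the exponent linearly in $k$; at and above it, the rate saturates at $n^{-1}$ and further smoothness no longer helps.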

2602.05056 2026-06-05 cs.CR cs.CL cs.LG 版本更新

Grounded but Misleading: Evaluating Semantic Alignment in AI-Generated Security Explanations

Heajun An, Connor Ng, Sandesh Sharma Dulal, Junghwan Kim, Jin-Hee Cho

发表机构 * Virginia Tech(弗吉尼亚理工学院)

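AI summary This paper studies semantic alignment in AI-generated security explanations. Using the VEXA testbed, it examines the gap between lexical grounding and semantic risk alignment and finds that an explanation can appear grounded even when its semantic interpretation weakens the detector's intended risk assessment.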

详情
AI中文摘要

在线诈骗越来越多地利用流畅且具有上下文意识的社会工程策略,导致对能够解释为何一条信息可能具有风险的AI系统的需求日益增长。然而,引用检测器衍生证据的解释可能仍然在语义上削弱或改变预期的风险解释。我们介绍了VEXA:验证语义解释对齐,一个用于研究AI生成诈骗风险解释中词汇基础与语义风险对齐差距的受控测试平台。VEXA通过独立控制证据基础和语义框架来生成无基础、风险对齐和风险稀释的解释。通过LLM作为判断者和人类评估,我们发现即使解释的语义解释削弱了检测器的意图风险评估,解释仍可能在比较上显得合理。在人类评估中,风险稀释的XAI基础解释保留了相对较高的感知证据基础评分(3.66),尽管其帮助性(3.00)和推理支持(3.14)评分较低。这些发现提供了AI生成安全解释中基础错觉效应的受控证据,并表明可信的解释评估必须不仅验证是否引用了证据,还要验证如何解释这些证据。

英文摘要

Online scams increasingly leverage fluent and context-aware social engineering strategies, creating growing demand for AI systems that explain why a message may be risky. However, explanations that cite detector-derived evidence may still semantically weaken or redirect the intended risk interpretation. We introduce VEXA: Verifying Semantic Explanation Alignment, a controlled testbed for studying the gap between lexical grounding and semantic risk alignment in AI-generated scam-risk explanations. VEXA generates ungrounded, risk-aligned, and risk-diluting explanations by independently controlling evidence grounding and semantic framing. Through LLM-as-a-judge and human evaluations, we show that explanations may continue to appear comparatively grounded even when their semantic interpretation weakens the detector's intended risk assessment. In human evaluation, risk-diluting XAI-grounded explanations retained comparatively elevated Perceived Evidence Grounding scores (3.66) despite lower Helpfulness (3.00) and Reasoning Support (3.14) scores. These findings provide controlled evidence of grounding illusion effects in AI-generated security explanations and suggest that trustworthy explanation evaluation must verify not only whether evidence is cited, but also how that evidence is interpreted.

2602.02680 2026-06-05 cs.LG 版本更新

FlexRank: Nested Low-Rank Knowledge Decomposition for Adaptive Model Deployment

FlexRank: 嵌套低秩知识分解用于自适应模型部署

Riccardo Zaccone, Stefanos Laskaridis, Marco Ciccone, Samuel Horváth

发表机构 * University of California, Berkeley(加州大学伯克利分校)

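AI summary Proposes FlexRank, which uses nested low-rank weight decomposition with importance-based consolidation to extract submodels of increasing capability from a pretrained model, enabling "train-once, deploy-everywhere" adaptive deployment.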

Comments Accepted at ICML 2026 (Spotlight)

详情
Journal ref
Proceedings of the 43rd International Conference on Machine Learning, PMLR, 2026
AI中文摘要

深度神经网络(包括大型语言模型和视觉变换器)的规模不断增长,使得从头训练成本过高,部署成本也日益增加。这些模型通常作为固定计算成本的单一整体使用,阻碍了在不同成本预算下的自适应部署。我们认为,可以从预训练模型中提取按重要性排序的嵌套组件,并在可用计算预算内选择性激活。为此,我们提出的FlexRank方法利用嵌套的、基于重要性的低秩权重分解来整合子模型,从而提取能力递增的子模型。我们的方法实现了“一次训练,随处部署”的范式,无需为每个预算从头训练即可在成本与性能之间实现优雅的权衡——推进了大型模型的实际部署。

英文摘要

The growing scale of deep neural networks, encompassing large language models (LLMs) and vision transformers (ViTs), has made training from scratch prohibitively expensive and deployment increasingly costly. These models are often used as computational monoliths with fixed cost, hindering adaptive deployment across different cost budgets. We argue that nested components, ordered by importance, can be extracted from pretrained models and selectively activated within the available computational budget. To this end, our proposed FlexRank method leverages low-rank weight decomposition with nested, importance-based consolidation to extract submodels of increasing capabilities. Our approach enables a "train-once, deploy-everywhere" paradigm offering a graceful trade-off between cost and performance without training from scratch for each budget - advancing practical deployment of large models.

2602.02241 2026-06-05 cs.LG 版本更新

Variational Entropic Optimal Transport

变分熵最优传输

Roman Dyachenko, Nikita Gushchin, Kirill Sokolov, Petr Mokrov, Evgeny Burnaev, Alexander Korotin

发表机构 * Lomonosov Moscow State University(莫斯科罗蒙诺索夫莫斯科大学) National Research Nuclear University MEPhI(国家研究核大学 MEPhI)

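AI summary This paper proposes Variational Entropic Optimal Transport (VarEOT), which uses an exact variational reformulation to turn the log-partition term into a tractable minimization, enabling EOT learning without MCMC simulation during training. It provides finite-sample generalization bounds and approximation results under universal function approximation, and shows competitive or improved translation quality on synthetic data and unpaired image-to-image translation.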

详情
AI中文摘要

熵最优传输(EOT)在连续空间中以二次成本为经典工具,用于解决领域迁移问题。在实践中,最近的方法优化一个弱对偶EOT目标,依赖于单一势能函数,但这样做在计算上效率不高,因为对数分区项不可计算。现有方法通常通过两种方式解决这一障碍:通过显著限制传输家族以获得闭式归一化(通过高斯混合参数化),或通过使用通用神经参数化,需要基于模拟的训练过程。我们提出变分熵最优传输(VarEOT),基于对数分区$\log \mathbb{E}[\exp(\cdot)]$的精确变分重参数化,作为对辅助对数归一化进行可处理的最小化。这产生了一个可微学习目标,通过随机梯度优化,并避免了训练期间MCMC模拟的必要性。我们提供了理论保证,包括有限样本泛化界和在通用函数逼近下的近似结果。在合成数据和未配对图像到图像翻译实验中,展示了竞争力或改进的翻译质量,而与使用相同弱对偶EOT目标的求解器比较支持所提出优化原理的优势。我们的求解器代码可在https://github.com/DrEternity/VarEOT找到。

英文摘要

Entropic optimal transport (EOT) in continuous spaces with quadratic cost is a classical tool for solving the domain translation problem. In practice, recent approaches optimize a weak dual EOT objective depending on a single potential, but doing so is computationally inefficient due to the intractable log-partition term. Existing methods typically resolve this obstacle in one of two ways: by significantly restricting the transport family to obtain closed-form normalization (via Gaussian-mixture parameterizations), or by using general neural parameterizations that require simulation-based training procedures. We propose Variational Entropic Optimal Transport (VarEOT), based on an exact variational reformulation of the log-partition $\log \mathbb{E}[\exp(\cdot)]$ as a tractable minimization over an auxiliary log-normalizer. This yields a differentiable learning objective optimized with stochastic gradients and avoids the need for MCMC simulations during training. We provide theoretical guarantees, including finite-sample generalization bounds and approximation results under universal function approximation. Experiments on synthetic data and unpaired image-to-image translation demonstrate competitive or improved translation quality, while comparisons among solvers that use the same weak dual EOT objective support the benefit of the proposed optimization principle. The code for our solver can be found at https://github.com/DrEternity/VarEOT.
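
One standard identity of the kind the abstract describes is shown below. Whether this is the exact form used in VarEOT is an assumption, since the abstract does not state the reformulation; the symbols $Z$, $f$, and $c$ are introduced here for illustration only.

```latex
\log Z \;=\; \min_{c \in \mathbb{R}} \left\{ e^{-c}\, Z + c - 1 \right\},
\qquad Z = \mathbb{E}\left[\exp(f)\right] > 0 .
% Check: the objective is convex in c, and setting its derivative to zero gives
%   -e^{-c} Z + 1 = 0  \;\Longrightarrow\;  c^\star = \log Z ,
% so the minimum value is  e^{-\log Z} Z + \log Z - 1 = \log Z .
```

The practical appeal of such a form is that the objective is linear in $Z$, so the expectation can be replaced by a minibatch average without the bias that arises from taking the logarithm of a sample mean.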

2602.01196 2026-06-05 cs.LG 版本更新

Unraveling the Hidden Dynamical Structure in Recurrent Neural Policies

揭示递归神经策略中的隐藏动力学结构

Jin Li, Yue Wu, Mengsha Huang, Yuhao Sun, Hao He, Xianyuan Zhan

发表机构 * University of Science and Technology of China(中国科学技术大学)

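AI summary By analyzing the hidden-state domain of recurrent policies learned across different training methods, model architectures, and tasks, this paper finds that stable cyclic structures consistently emerge during interaction with the environment. These structures resemble limit cycles from dynamical systems analysis, and their geometry corresponds to policy behavior, offering a new perspective on the performance of recurrent policies.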

详情
AI中文摘要

递归神经策略在部分可观测控制和元强化学习任务中被广泛应用。它们能够维持内部记忆并快速适应未见过的场景,相较于非递归策略具有无可比拟的性能。然而,到目前为止,其优异的泛化性和鲁棒性性能的底层机制仍不明确。在本研究中,通过分析不同训练方法、模型架构和任务中学习得到的递归策略的隐藏状态域,我们发现稳定的循环结构在与环境交互时 consistently 出现。如果将策略和环境视为一个联合混合动力系统,这些循环结构与动力系统分析中的极限环有显著相似性。此外,我们发现这些极限环的几何结构也与策略行为具有结构化的对应关系。这些发现为解释递归策略的许多良好特性提供了新的视角:极限环的出现稳定了策略的内部记忆和任务相关的环境状态,同时抑制了来自环境不确定性的干扰变量;极限环的几何结构也编码了行为的关联结构,有助于在非稳态环境中更轻松地进行技能适应。

英文摘要

Recurrent neural policies are widely used in partially observable control and meta-RL tasks. Their ability to maintain internal memory and adapt quickly to unseen scenarios has given them unparalleled performance compared with non-recurrent counterparts. However, the mechanisms underlying their superior generalization and robustness remain poorly understood. In this study, by analyzing the hidden state domain of recurrent policies learned over a diverse set of training methods, model architectures, and tasks, we find that stable cyclic structures consistently emerge during interaction with the environment. If we consider the policy and the environment as a joint hybrid dynamical system, these cyclic structures share a remarkable similarity with limit cycles in dynamical system analysis. Moreover, we find that the geometry of these limit cycles has a structured correspondence with the policies' behaviors. These findings offer new perspectives on many desirable properties of recurrent policies: the emergence of limit cycles stabilizes both the policies' internal memory and the task-relevant environmental states, while suppressing nuisance variability arising from environmental uncertainty; the geometry of limit cycles also encodes relational structures of behaviors, facilitating skill adaptation in non-stationary environments.

2601.22580 2026-06-05 cs.CL cs.AI cs.LG 版本更新

SpanNorm: Reconciling Training Stability and Performance in Deep Transformers

SpanNorm: 在深度Transformer中协调训练稳定性与性能

Chao Wang, Bei Li, Jiaqi Zhang, Xinyu Liu, Yuchun Fan, Linkun Lyu, Xin Chen, Jingang Wang, Tong Xiao, Peng Pei, Xunliang Cai

发表机构 * Meituan Inc.(美团公司) NLP Lab, School of Computer Science and Engineering(自然语言处理实验室,计算机科学与工程学院) Northeastern University, Shenyang, China(东北大学,沈阳,中国)

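AI summary This paper proposes SpanNorm, which combines the strengths of PreNorm and PostNorm to resolve the fundamental trade-off between training stability and performance in deep Transformers; theoretical analysis and experiments show that it outperforms standard normalization schemes in both dense and Mixture-of-Experts (MoE) settings.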

Comments Accepted by ICML2026

详情
AI中文摘要

大型语言模型(LLMs)的成功依赖于深度Transformer架构的稳定训练。一个关键的设计选择是归一化层的位置,导致了一个根本性的权衡:PreNorm架构在深度模型中确保了训练稳定性,但可能牺牲性能;而PostNorm架构提供了强大的性能,但面临严重的训练不稳定性。在本工作中,我们提出SpanNorm,一种新的技术,旨在通过整合两种范式的优点来解决这一困境。结构上,SpanNorm建立了一个跨越整个Transformer块的清晰残差连接以稳定信号传播,同时采用PostNorm风格的计算方式对聚合输出进行归一化以增强模型性能。我们提供了理论分析,证明SpanNorm结合合理的缩放策略可以在整个网络中保持信号方差有界,防止PostNorm模型中出现的梯度问题,并缓解PreNorm中的表示崩溃问题。实验结果表明,SpanNorm在密集和专家混合(MoE)场景中均优于传统归一化方案,为更强大和稳定的Transformer架构铺平了道路。

英文摘要

The success of Large Language Models (LLMs) hinges on the stable training of deep Transformer architectures. A critical design choice is the placement of normalization layers, leading to a fundamental trade-off: the "PreNorm" architecture ensures training stability at the cost of potential performance degradation in deep models, while the "PostNorm" architecture offers strong performance but suffers from severe training instability. In this work, we propose SpanNorm, a novel technique designed to resolve this dilemma by integrating the strengths of both paradigms. Structurally, SpanNorm establishes a clean residual connection that spans the entire transformer block to stabilize signal propagation, while employing a PostNorm-style computation that normalizes the aggregated output to enhance model performance. We provide a theoretical analysis demonstrating that SpanNorm, combined with a principled scaling strategy, maintains bounded signal variance throughout the network, preventing the gradient issues that plague PostNorm models, and also alleviating the representation collapse of PreNorm. Empirically, SpanNorm consistently outperforms standard normalization schemes in both dense and Mixture-of-Experts (MoE) scenarios, paving the way for more powerful and stable Transformer architectures.
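
The two baseline placements contrasted in the abstract differ only in where the normalization sits relative to the residual addition. The NumPy sketch below shows the standard PreNorm and PostNorm forms under simplifying assumptions: the layer norm has no learnable gain or bias, and `ffn` is a toy stand-in for a sublayer. SpanNorm itself is deliberately not implemented, because the abstract does not give its exact computation.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize the last axis to zero mean and (approximately) unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def pre_norm_block(x, sublayer):
    # PreNorm: the residual path is an untouched identity; only the sublayer input is normalized.
    return x + sublayer(layer_norm(x))

def post_norm_block(x, sublayer):
    # PostNorm: normalization is applied after the residual addition, so it sits on the main path.
    return layer_norm(x + sublayer(x))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))              # (tokens, hidden) toy activations
W = rng.normal(size=(8, 8)) * 0.1
ffn = lambda h: np.tanh(h @ W)           # toy sublayer, purely illustrative

pre_out = pre_norm_block(x, ffn)
post_out = post_norm_block(x, ffn)
```

In the PreNorm form the input reaches the output unchanged through the skip path, whereas in the PostNorm form every block rescales the running signal, which is the structural difference behind the stability and performance trade-off the abstract describes.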

2505.11766 2026-06-05 cs.LG cs.AI quant-ph 版本更新

Reformulating Neural Operators in $d+1$ Dimensions for Embedding Evolution

在d+1维度中重新表述神经算子以嵌入演化

Haoze Song, Zhihao Li, Xiaobo Zhang, Zecheng Gan, Zhilu Lai, Wei Wang

发表机构 * HKUST (GZ)(香港科技大学(广州)) HKUST(香港科技大学) SWJTU(西南交通大学)

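AI summary This paper reformulates neural operators in d+1 dimensions by introducing an auxiliary function dimension that models embedding evolution, offering an alternative to brute-force embedding scaling. Fourier-based operators act jointly on the physical and auxiliary domains, and experiments show strong results across multiple benchmarks.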

详情
AI中文摘要

神经算子(NOs)是学习函数空间之间映射的强大架构。尽管大多数进展集中在改进核参数化在d维物理域上的精度,但提升的嵌入扩展仍缺乏探索,这通常导致模型倾向于计算成本高昂的嵌入扩展设计以提高近似能力。在本文中,我们引入了一个辅助函数维度,以运算形式建模嵌入演化,从而在d+1维度中重新表述NO流程。我们通过基于傅里叶的算子在物理域和辅助域上联合作用,实例化了这一框架,得到一个基于基底多样化的方法作为替代于暴力嵌入扩展。在超过十种越来越具有挑战性的基准测试中,从1D热方程到高度非线性的3D瑞利-泰勒不稳定性,我们的模型在评估的基线中始终实现了最低的相对L2误差。关键的是,这一优势通过(1)受控预算意识的比较,与缩放和剥离的基线;(2)混合分辨率训练和超分辨率推断下的鲁棒性;以及(3)零样本泛化到未见的时间范围,得到了实证支持。此外,我们还展示了更广泛的设计选择,以提升和恢复算子,展示了其对模型预测性能的影响。

英文摘要

Neural Operators (NOs) are powerful architectures for learning mappings between function spaces. While most advances focus on refining kernel parameterizations over the $d$-dimensional physical domain, the evolution of lifted embeddings remains underexplored, which often drives models toward computationally expensive embedding-scaling designs to improve approximation. In this paper, we introduce an auxiliary function dimension that models embedding evolution in operator form, thereby reformulating the NO pipeline in $d+1$ dimensions. We instantiate this framework via Fourier-based operators acting jointly on the physical and auxiliary domains, yielding a basis-diversified auxiliary evolution module as an alternative to brute-force embedding scaling. Across more than ten increasingly challenging benchmarks, ranging from the 1D heat equation to the highly nonlinear 3D Rayleigh-Taylor instability, our model consistently achieves the lowest relative $L_2$ error among the evaluated baselines. Crucially, this advantage is empirically supported by (1) controlled budget-aware comparisons against scaled and ablated baselines; (2) robustness under mixed-resolution training and super-resolution inference; and (3) zero-shot generalization to unseen temporal regimes. In addition, we present a broader set of design choices for lifting and recovery operators, demonstrating their impact on our model's predictive performance.

2601.18383 2026-06-05 cs.AI cs.CL cs.LG 版本更新

Dynamic Thinking-Token Selection for Efficient Reasoning in Large Reasoning Models

动态思维-令牌选择用于大型推理模型中的高效推理

Zhenyuan Guo, Tong Chen, Wenlong Meng, Chen Gong, Xin Yu, Chengkun Wei, Wenzhi Chen

发表机构 * Zhejiang University(浙江大学)

AI总结 本研究提出动态思维-令牌选择方法,通过分析推理轨迹发现只有部分关键令牌影响最终答案,从而优化大型推理模型的效率。

详情
AI中文摘要

大型推理模型(LRMs)通过显式生成推理轨迹来解决复杂问题,但扩展生成带来了显著的内存足迹和计算开销,限制了LRMs的效率。本工作利用注意力图分析推理轨迹的影响,发现只有部分关键令牌引导模型走向最终答案,其余令牌贡献微乎其微。基于这一观察,我们提出了动态思维-令牌选择(DynTS)。该方法识别关键令牌,并在推理过程中仅保留其关联的键值(KV)缓存状态,淘汰冗余条目以优化效率。

英文摘要

Large Reasoning Models (LRMs) excel at solving complex problems by explicitly generating a reasoning trace before deriving the final answer. However, these extended generations incur substantial memory footprint and computational overhead, bottlenecking LRMs' efficiency. This work uses attention maps to analyze the influence of reasoning traces and uncover an interesting phenomenon: only some decision-critical tokens in a reasoning trace steer the model toward the final answer, while the remaining tokens contribute negligibly. Building on this observation, we propose Dynamic Thinking-Token Selection (DynTS). This method identifies decision-critical tokens and retains only their associated Key-Value (KV) cache states during inference, evicting the remaining redundant entries to optimize efficiency.
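
The eviction idea in this abstract can be sketched as a score-based selection over cached positions. The sketch below is an assumption-laden illustration, not DynTS itself: `attn_scores`, `budget`, and `recent` are hypothetical names, and ranking tokens by a single accumulated attention score is a stand-in for whatever criterion the paper uses to identify decision-critical tokens.

```python
def select_kv_to_keep(attn_scores, budget, recent=2):
    """Return sorted cache positions to retain.

    attn_scores[i] is a hypothetical importance score for cached token i
    (for example, attention mass it received from later tokens). The most
    recent tokens are always kept; the rest of the budget goes to the
    highest-scoring older tokens.
    """
    n = len(attn_scores)
    recent = min(recent, budget)
    recent_idx = set(range(max(0, n - recent), n))
    candidates = [i for i in range(n) if i not in recent_idx]
    k = max(0, budget - len(recent_idx))
    top = sorted(candidates, key=lambda i: attn_scores[i], reverse=True)[:k]
    return sorted(recent_idx | set(top))

def evict(cache, keep):
    """Drop every cache entry whose position is not in keep."""
    return [cache[i] for i in keep]

scores = [0.9, 0.1, 0.05, 0.7, 0.2, 0.3]
keep = select_kv_to_keep(scores, budget=4, recent=2)  # -> [0, 3, 4, 5]
kv = evict(["kv0", "kv1", "kv2", "kv3", "kv4", "kv5"], keep)
```

Keeping a small window of recent tokens regardless of score is a common safeguard in cache-eviction schemes; whether DynTS does the same is not stated in the abstract.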

2601.18219 2026-06-05 physics.med-ph cs.CV cs.LG 版本更新

Automated HER2 scoring with uncertainty quantification using lensfree holography and deep learning

利用无透镜全息和深度学习进行自动HER2评分及不确定性量化

Che-Yung Shen, Xilin Yang, Yuzhu Li, Leon Lenk, Aydogan Ozcan

发表机构 * Electrical and Computer Engineering Department, University of California, Los Angeles, CA, 90095, USA(加州大学洛杉矶分校电气与计算机工程系) Bioengineering Department, University of California, Los Angeles, CA, 90095, USA(加州大学洛杉矶分校生物工程系) California NanoSystems Institute (CNSI), University of California, Los Angeles, CA, 90095, USA(加州大学洛杉矶分校加州纳米系统研究所) Department of Computer Science, University of California, Los Angeles, CA, 90095, USA(加州大学洛杉矶分校计算机科学系)

AI总结 本文提出了一种基于无透镜全息和深度学习的紧凑型、低成本系统,用于自动免疫组化染色乳腺组织切片的HER2评分,通过贝叶斯蒙特卡洛Dropout策略提高诊断可靠性,实现了高准确率的HER2分类和评分。

Comments 23 Pages, 6 Figures, 1 Table

详情
Journal ref
BME Frontiers, AAAS (2026)
AI中文摘要

准确评估人类表皮生长因子受体2(HER2)的表达对于乳腺癌的诊断、预后和治疗选择至关重要;然而,大多数现有的数字HER2评分方法依赖于笨重且昂贵的光学系统。本文提出了一种紧凑且经济的无透镜全息平台,结合深度学习,对免疫组化染色的乳腺组织切片进行自动HER2评分。该系统在RGB激光照明下捕获染色HER2组织切片的无透镜衍射图案,并在约1250 mm²的样本区域上以约84 mm²/分钟的有效吞吐量获取复场(complex field)信息。为提高诊断可靠性,我们采用了基于贝叶斯蒙特卡洛Dropout的不确定性量化策略,为每个预测提供自主的不确定性估计,支持可靠且稳健的HER2评分,整体修正率为30.4%。在包含412个不同组织样本的盲测集上,本方法在4类(0、1+、2+、3+)HER2分类中实现了84.9%的测试准确率,在二分类(0/1+ vs. 2+/3+)HER2评分中实现了94.8%的准确率,结合不确定性量化。总体而言,这种无透镜全息方法为便携式、高吞吐量和低成本的HER2评分提供了一条实用途径,特别适用于缺乏传统数字病理基础设施的资源有限环境。

英文摘要

Accurate assessment of human epidermal growth factor receptor 2 (HER2) expression is critical for breast cancer diagnosis, prognosis, and therapy selection; yet, most existing digital HER2 scoring methods rely on bulky and expensive optical systems. Here, we present a compact and cost-effective lensfree holography platform integrated with deep learning for automated HER2 scoring of immunohistochemically stained breast tissue sections. The system captures lensfree diffraction patterns of stained HER2 tissue sections under RGB laser illumination and acquires complex field information over a sample area of ~1,250 mm^2 at an effective throughput of ~84 mm^2 per minute. To enhance diagnostic reliability, we incorporated an uncertainty quantification strategy based on Bayesian Monte Carlo dropout, which provides autonomous uncertainty estimates for each prediction and supports reliable, robust HER2 scoring, with an overall correction rate of 30.4%. Using a blinded test set of 412 unique tissue samples, our approach achieved a testing accuracy of 84.9% for 4-class (0, 1+, 2+, 3+) HER2 classification and 94.8% for binary (0/1+ vs. 2+/3+) HER2 scoring with uncertainty quantification. Overall, this lensfree holography approach provides a practical pathway toward portable, high-throughput, and cost-effective HER2 scoring, particularly suited for resource-limited settings, where traditional digital pathology infrastructure is unavailable.
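
Monte Carlo dropout, the uncertainty strategy named in this abstract, amounts to running several stochastic forward passes with dropout left on and summarizing the spread of the outputs. The sketch below is a toy under stated assumptions: a single linear layer with dropout on its inputs stands in for the real network, and `weights`, `p_drop`, and `n_passes` are hypothetical values, not the paper's settings.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def mc_dropout_predict(features, weights, n_passes=50, p_drop=0.2, seed=0):
    """Average class probabilities over stochastic passes; return (mean_probs, entropy)."""
    rng = random.Random(seed)
    n_classes = len(weights)
    mean_probs = [0.0] * n_classes
    for _ in range(n_passes):
        # Inverted dropout: zero each feature with probability p_drop, rescale the rest.
        kept = [x / (1.0 - p_drop) if rng.random() >= p_drop else 0.0 for x in features]
        logits = [sum(w * x for w, x in zip(row, kept)) for row in weights]
        probs = softmax(logits)
        mean_probs = [m + p / n_passes for m, p in zip(mean_probs, probs)]
    # Predictive entropy of the averaged distribution: higher means less certain.
    entropy = -sum(p * math.log(p) for p in mean_probs if p > 0.0)
    return mean_probs, entropy

# Four output rows, one per HER2 class (0, 1+, 2+, 3+); the numbers are made up.
weights = [[0.2, -0.1, 0.0], [0.1, 0.3, -0.2], [-0.3, 0.2, 0.4], [0.0, -0.2, 0.5]]
probs, entropy = mc_dropout_predict([1.0, 2.0, 0.5], weights)
```

A prediction whose entropy exceeds a chosen threshold could then be flagged for review rather than reported, which is the general pattern the abstract describes.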

2508.11618 2026-06-05 cs.LG 版本更新

Optimal CO2 storage management considering safety constraints in multi-stakeholder multi-site CCS projects: a Markov game perspective

考虑安全约束的多利益相关者多地点碳捕集与封存项目最优存储管理:从马尔可夫博弈视角

Jungang Chen, Seyyed A. Hosseini

发表机构 * The University of Texas at Austin(德克萨斯大学奥斯汀分校)

AI总结 本文基于马尔可夫博弈方法,研究多利益相关者多地点碳捕集与封存项目中不同联盟结构对利益相关者目标的影响,提出一种考虑安全约束的多智能体强化学习框架,以实现多利益相关者的最优存储管理。

Comments 58 pages

详情
Journal ref
Int. J. Greenh. Gas Control 149 (2026) 104683
AI中文摘要

碳捕集与封存(CCS)项目通常涉及来自公共、私营和监管部门的多种利益相关者(或称参与者),每个利益相关者都有不同的目标和责任。鉴于CCS运营的复杂性、规模和长期性,个体利益相关者能否独立实现自身利益最大化,还是需要达成协作联盟协议,仍然是有效开展CCS项目规划和管理的核心问题。CCS项目通常在地质上相互连通的场地实施,压力空间和储层孔隙容量等共享的地质特征可能导致利益相关者之间的竞争行为。此外,为了利用现有基础设施,CO2封存场地往往位于曾用于油气开采或废水处置的地质成熟盆地,这使得单方面优化更加复杂且不现实。在本工作中,我们提出了一种基于马尔可夫博弈的范式,以定量研究不同联盟结构如何影响利益相关者的目标。我们将这一多利益相关者多场地问题表述为带安全约束的多智能体强化学习问题。我们的方法使智能体能够在遵守安全规定的同时学习最优策略。我们给出了一个示例:多个运营方在地质连通的盆地中向各自的项目区域注入CO2。为了应对高保真模型重复模拟的高计算成本,我们采用了先前开发的基于Embed-to-Control(E2C)框架的代理模型。结果表明,在涉及多个目标各异的利益相关者时,所提出的框架能够有效实现CO2封存的最优管理。

英文摘要

Carbon capture and storage (CCS) projects typically involve a diverse array of stakeholders or players from public, private, and regulatory sectors, each with different objectives and responsibilities. Given the complexity, scale, and long-term nature of CCS operations, determining whether individual stakeholders can independently maximize their interests or whether collaborative coalition agreements are needed remains a central question for effective CCS project planning and management. CCS projects are often implemented in geologically connected sites, where shared geological features such as pressure space and reservoir pore capacity can lead to competitive behavior among stakeholders. Furthermore, CO2 storage sites are often located in geologically mature basins that previously served as sites for hydrocarbon extraction or wastewater disposal in order to leverage existing infrastructure, which makes unilateral optimization even more complicated and unrealistic. In this work, we propose a paradigm based on Markov games to quantitatively investigate how different coalition structures affect the goals of stakeholders. We frame this multi-stakeholder multi-site problem as a multi-agent reinforcement learning problem with safety constraints. Our approach enables agents to learn optimal strategies while complying with safety regulations. We present an example where multiple operators are injecting CO2 into their respective project areas in a geologically connected basin. To address the high computational cost of repeated simulations of high-fidelity models, a previously developed surrogate model based on the Embed-to-Control (E2C) framework is employed. Our results demonstrate the effectiveness of the proposed framework in addressing optimal management of CO2 storage when multiple stakeholders with various objectives and goals are involved.

2512.14338 2026-06-05 cs.LG 版本更新

Implicit Bias and Invariance: How Hopfield Networks Efficiently Learn Graph Orbits

隐式偏差与不变性:Hopfield网络如何高效学习图轨道

Michael Murray, Tenzin Chan, Kedar Karhadker, Christopher J. Hillar

发表机构 * Mathematical Sciences, University of Bath(巴斯大学数学科学系) Department of Mathematics, UCLA(加州大学洛杉矶分校数学系) Algebraic 4 New Theory AI(代数4新理论AI)

AI总结 研究探讨了Hopfield网络在处理对称性学习问题时的隐式不变性机制,揭示了通过梯度下降学习图同构类时的隐式偏差及其对样本复杂度的影响。

详情
AI中文摘要

许多学习问题涉及对称性;不变性既可以内建于神经网络架构中,也可以在对具有群结构的数据进行训练时隐式地出现。我们在经典Hopfield网络中研究这一现象,并证明它们可以从少量随机样本中推断出图的完整同构类。我们的结果表明:(i) 图的同构类可以在一个三维不变子空间内表示;(ii) 使用梯度下降最小化能量流(MEF)具有偏向范数高效解的隐式偏差,这是学习同构类的多项式样本复杂度界的基础;(iii) 在多种学习规则下,参数随着样本量的增加而向不变子空间收敛。这些发现共同揭示了Hopfield网络泛化的统一机制:学习中对范数效率的偏好,驱动了群结构数据下近似不变性的出现。

英文摘要

Many learning problems involve symmetries, and while invariance can be built into neural architectures, it can also emerge implicitly when training on group-structured data. We study this phenomenon in classical Hopfield networks and show they can infer the full isomorphism class of a graph from a small random sample. Our results reveal that: (i) graph isomorphism classes can be represented within a three-dimensional invariant subspace, (ii) using gradient descent to minimize energy flow (MEF) has an implicit bias toward norm-efficient solutions, which underpins a polynomial sample complexity bound for learning isomorphism classes, and (iii) across multiple learning rules, parameters converge toward the invariant subspace as sample sizes grow. Together, these findings highlight a unifying mechanism for generalization in Hopfield networks: a bias toward norm efficiency in learning drives the emergence of approximate invariance under group-structured data.

2601.09236 2026-06-05 cs.LG cs.AI 版本更新

Reward Learning through Ranking Mean Squared Error

通过排名均方误差进行奖励学习

Chaitanya Kharyal, Calarina Muslimani, Matthew E. Taylor

发表机构 * Calarina Muslimani(卡拉里娜·穆斯林尼) Matthew E. Taylor(马修·E·泰勒)

AI总结 本文提出了一种基于排名的强化学习方法R4,通过引入新的排名均方误差损失函数,从轨迹-评分对数据中学习奖励函数,并在机器人基准测试中表现出色。

详情
AI中文摘要

奖励设计仍然是将强化学习(RL)应用于现实世界问题的主要瓶颈。一种流行的替代方法是奖励学习,即从人类反馈中推断奖励函数,而不是手动指定。最近的工作提出从人类评分而非传统的二元偏好中学习奖励函数,从而实现更丰富且认知负担可能更低的监督。在此范式基础上,我们提出了一种新的基于评分的RL方法,即Ranked Return Regression for RL(R4)。其核心是一种新的排名均方误差损失,它从轨迹-评分对数据集中学习,将人类提供的离散评分(例如:差、中性、好)视为有序目标。与以往基于评分的方法不同,R4提供了形式化保证:在温和的假设下,其解集可证明是最小且完备的。在实证方面,使用人类提供的评分和模拟评分,我们证明R4在OpenAI Gym和DeepMind Control Suite的机器人基准测试中始终达到或超过现有的基于评分和基于偏好的RL方法。代码发布在https://github.com/IRLL/R4。

英文摘要

Reward design remains a significant bottleneck in applying reinforcement learning (RL) to real-world problems. A popular alternative is reward learning, where reward functions are inferred from human feedback rather than manually specified. Recent work has proposed learning reward functions from human ratings rather than traditional binary preferences, enabling richer and potentially less cognitively demanding supervision. Building on this paradigm, we introduce a new rating-based RL method, Ranked Return Regression for RL (R4). At its core, R4 uses a novel ranking mean squared error loss that learns from a dataset of trajectory-rating pairs, treating the human-provided discrete ratings (e.g., bad, neutral, good) as ordinal targets. Unlike prior rating-based approaches, R4 offers formal guarantees: its solution set is provably minimal and complete under mild assumptions. Empirically, using both human-provided and simulated ratings, we demonstrate that R4 consistently matches or outperforms existing rating and preference-based RL methods on robotic benchmarks from OpenAI Gym and the DeepMind Control Suite. Code released at https://github.com/IRLL/R4.

2502.14131 2026-06-05 cs.LG cs.AI econ.EM 版本更新

An Empirical Risk Minimization Approach for Offline Inverse RL and Dynamic Discrete Choice Model

一种用于离线逆强化学习和动态离散选择模型的经验风险最小化方法

Enoch H. Kang, Hema Yoganarasimhan, Lalit Jain

发表机构 * Foster School of Business, University of Washington(华盛顿大学福斯特商学院)

AI总结 本文提出了一种基于经验风险最小化(ERM)的逆强化学习/动态离散选择模型框架,该方法无需显式估计贝尔曼方程中的状态转移概率,适用于高维和无限状态空间,并在理论上有Polyak-Lojasiewicz条件的支持,从而保证了快速的全局收敛性。

详情
AI中文摘要

我们研究了估计动态离散选择(DDC)模型的问题,也称为机器学习中的离线最大熵正则化逆强化学习(离线MaxEnt-IRL)。目标是从离线行为数据中恢复支配代理行为的奖励或Q*函数。在本文中,我们提出了一种全局收敛的基于梯度的方法来解决这些问题,而无需线性参数化的奖励假设。我们的方法的创新之处在于引入了基于经验风险最小化(ERM)的IRL/DDC框架,该框架避免了在贝尔曼方程中显式估计状态转移概率的需要。此外,我们的方法与非参数估计技术如神经网络兼容。因此,所提出的方法有潜力扩展到高维、无限状态空间。我们方法的一个关键理论洞察是贝尔曼残差满足Polyak-Lojasiewicz(PL)条件--一个属性,虽然比强凸性弱,但足以保证快速的全局收敛保证。通过一系列合成实验,我们证明我们的方法在性能上始终优于基准方法和最先进的替代方法。

英文摘要

We study the problem of estimating Dynamic Discrete Choice (DDC) models, also known as offline Maximum Entropy-Regularized Inverse Reinforcement Learning (offline MaxEnt-IRL) in machine learning. The objective is to recover reward or $Q^*$ functions that govern agent behavior from offline behavior data. In this paper, we propose a globally convergent gradient-based method for solving these problems without the restrictive assumption of linearly parameterized rewards. The novelty of our approach lies in introducing the Empirical Risk Minimization (ERM) based IRL/DDC framework, which circumvents the need for explicit state transition probability estimation in the Bellman equation. Furthermore, our method is compatible with non-parametric estimation techniques such as neural networks. Therefore, the proposed method has the potential to be scaled to high-dimensional, infinite state spaces. A key theoretical insight underlying our approach is that the Bellman residual satisfies the Polyak-Lojasiewicz (PL) condition -- a property that, while weaker than strong convexity, is sufficient to ensure fast global convergence guarantees. Through a series of synthetic experiments, we demonstrate that our approach consistently outperforms benchmark methods and state-of-the-art alternatives.
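
For readers unfamiliar with the Polyak-Lojasiewicz (PL) condition named above, its standard textbook form is sketched here. The symbols $f$, $\theta$, $\mu$, and $L$ are generic notation introduced for illustration and are not taken from the paper, which applies the condition to the Bellman residual. A differentiable function $f$ with minimum value $f^*$ satisfies the PL condition with constant $\mu > 0$ if

$$\tfrac{1}{2}\,\lVert \nabla f(\theta) \rVert^{2} \;\ge\; \mu\,\bigl(f(\theta) - f^{*}\bigr) \quad \text{for all } \theta .$$

If $f$ additionally has an $L$-Lipschitz gradient, gradient descent with step size $1/L$, $\theta_{k+1} = \theta_k - \tfrac{1}{L}\nabla f(\theta_k)$, converges linearly to the global minimum value:

$$f(\theta_{k}) - f^{*} \;\le\; \Bigl(1 - \tfrac{\mu}{L}\Bigr)^{k}\,\bigl(f(\theta_{0}) - f^{*}\bigr).$$

No convexity is required, which is why the condition is described as weaker than strong convexity while still giving a fast global rate.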

2404.10370 2026-06-05 cs.CV cs.LG 版本更新

Know Yourself Better: Diverse Object-Related Features Improve Open Set Recognition

Jiawen Xu, Margret Keuper

发表机构 * Technical University Berlin(柏林技术大学) University of Mannheim(曼海姆大学)

AI总结 研究通过分析特征多样性提升开放集识别性能,提出了一种利用特征多样性的新型开放集识别方法。

详情
AI中文摘要

开放集识别(OSR)是机器学习中的关键方面,旨在解决推理过程中检测新类别的挑战。在深度学习领域,训练于封闭集数据的神经分类器通常难以识别新类别,导致错误预测。为了解决这一问题,已提出各种启发式方法,允许模型通过声明"I don't know"来表达不确定性。然而,文献中仍存在空白,因为对这些方法的底层机制探讨有限。在本文中,我们对开放集识别方法进行了分析,重点在于特征多样性方面。我们的研究揭示了学习多样化的判别特征与增强OSR性能之间存在显著相关性。基于这一见解,我们提出了一种新的OSR方法,利用特征多样性的优势。通过在标准OSR测试平台上的严格评估,证明了我们方法的有效性,显示出相对于最新方法的显著改进。

英文摘要

Open set recognition (OSR) is a critical aspect of machine learning, addressing the challenge of detecting novel classes during inference. Within the realm of deep learning, neural classifiers trained on a closed set of data typically struggle to identify novel classes, leading to erroneous predictions. To address this issue, various heuristic methods have been proposed, allowing models to express uncertainty by stating "I don't know." However, a gap in the literature remains, as there has been limited exploration of the underlying mechanisms of these methods. In this paper, we conduct an analysis of open set recognition methods, focusing on the aspect of feature diversity. Our research reveals a significant correlation between learning diverse discriminative features and enhancing OSR performance. Building on this insight, we propose a novel OSR approach that leverages the advantages of feature diversity. The efficacy of our method is substantiated through rigorous evaluation on a standard OSR testbench, demonstrating a substantial improvement over state-of-the-art methods.

2601.08446 2026-06-05 cs.CV cs.LG 版本更新

Noise-Adaptive Regularization for Robust Multi-Label Remote Sensing Image Classification

针对鲁棒多标签遥感图像分类的噪声自适应正则化

Tom Burgert, Julia Henkel, Begüm Demir

发表机构 * Burgert et al.(Burgert 等)

AI总结 本文提出了一种噪声自适应正则化方法NAR,通过区分加性噪声和减性噪声,提升遥感多标签分类的鲁棒性,实验表明在不同噪声场景下均优于现有方法。

Comments Submitted to TGRS

详情
AI中文摘要

可靠多标签分类(MLC)方法的发展已成为遥感(RS)研究中的重要方向。随着RS数据规模的扩大,标注过程越来越多地依赖主题产品或众包流程以降低人工标注成本。尽管成本效益高,这些策略往往以部分错误标注的形式引入多标签噪声。在MLC中,标签噪声以加性噪声、减性噪声或两者的混合形式出现。先前工作大多忽略了这一区别,通常将噪声标注视为监督信号,缺乏能够明确适应不同噪声类型的机制。为了解决这一限制,我们提出NAR,一种噪声自适应正则化方法,它在半监督学习框架中明确区分加性和减性噪声。NAR采用基于置信度的标签处理机制,动态保留高置信度标签条目,暂时停用中等置信度条目,并通过翻转纠正低置信度条目。这种选择性抑制监督与早期学习正则化(ELR)相结合,以稳定训练并减轻对损坏标签的过拟合。在加性、减性及混合噪声场景中的实验表明,NAR在不同噪声情况下均比现有方法更稳健。性能提升在减性及混合噪声情况下最为显著,表明适应性抑制和选择性纠正噪声监督为遥感MLC中的噪声鲁棒学习提供了一种有效策略。

英文摘要

The development of reliable methods for multi-label classification (MLC) has become a prominent research direction in remote sensing (RS). As the scale of RS data continues to expand, annotation procedures increasingly rely on thematic products or crowdsourced procedures to reduce the cost of manual annotation. While cost-effective, these strategies often introduce multi-label noise in the form of partially incorrect annotations. In MLC, label noise arises as additive noise, subtractive noise, or a combination of both in the form of mixed noise. Previous work has largely overlooked this distinction and commonly treats noisy annotations as supervised signals, lacking mechanisms that explicitly adapt learning behavior to different noise types. To address this limitation, we propose NAR, a noise-adaptive regularization method that explicitly distinguishes between additive and subtractive noise within a semi-supervised learning framework. NAR employs a confidence-based label handling mechanism that dynamically retains label entries with high confidence, temporarily deactivates entries with moderate confidence, and corrects low confidence entries via flipping. This selective attenuation of supervision is integrated with early-learning regularization (ELR) to stabilize training and mitigate overfitting to corrupted labels. Experiments across additive, subtractive, and mixed noise scenarios demonstrate that NAR consistently improves robustness compared with existing methods. Performance improvements are most pronounced under subtractive and mixed noise, indicating that adaptive suppression and selective correction of noisy supervision provide an effective strategy for noise robust learning in RS MLC.
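
The three-way label handling described above (retain, temporarily deactivate, flip) can be sketched as a simple thresholding rule. This is an illustration under assumptions, not NAR's actual procedure: the thresholds `low` and `high`, the definition of confidence as the model's agreement with the given label, and the function name are all hypothetical.

```python
def handle_labels(noisy_labels, probs, low=0.2, high=0.8):
    """Return (targets, mask) for one sample's multi-label vector.

    noisy_labels[c] is the given 0/1 label for class c and probs[c] is the
    model's predicted probability that class c is present. Confidence is the
    model's agreement with the given label (an assumption of this sketch).
    """
    targets, mask = [], []
    for y, p in zip(noisy_labels, probs):
        conf = p if y == 1 else 1.0 - p
        if conf >= high:        # high confidence: keep the label as supervision
            targets.append(y)
            mask.append(1)
        elif conf <= low:       # low confidence: correct the label by flipping it
            targets.append(1 - y)
            mask.append(1)
        else:                   # moderate confidence: exclude from the loss for now
            targets.append(y)
            mask.append(0)
    return targets, mask

labels = [1, 1, 1, 0, 0]
probs = [0.9, 0.5, 0.1, 0.05, 0.95]
targets, mask = handle_labels(labels, probs)  # -> ([1, 1, 0, 0, 1], [1, 0, 1, 1, 1])
```

In this toy rule a flip from 1 to 0 undoes an additive error and a flip from 0 to 1 undoes a subtractive one; the abstract indicates that NAR treats the two noise types differently, which a single shared threshold pair does not capture.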

2505.05026 2026-06-05 cs.CL cs.LG 版本更新

Do MLLMs Capture How Interfaces Guide User Behavior? A Benchmark for Multimodal UI/UX Design Understanding

多模态用户界面/用户体验设计理解的基准测试:MLLMs能否捕捉界面如何引导用户行为?

Jaehyun Jeon, Min Soo Kim, Jang Han Yoon, Sumin Shim, Yejin Choi, Hanbin Kim, Dae Hyun Kim, Youngjae Yu

发表机构 * Yonsei University(延世大学) Seoul National University(首尔国立大学) NC AI

AI总结 本文提出WiserUI-Bench基准测试,用于评估多模态UI/UX设计对用户行为的影响,通过300对真实世界UI图像对和专家解读,发现MLLMs在理解UI/UX设计行为影响方面存在局限。

Comments ACL 2026 Main. Our code and dataset: https://github.com/jeochris/wiserui-bench

详情
AI中文摘要

用户界面(UI)设计超越了视觉,旨在塑造用户体验(UX),凸显了UI/UX作为统一概念的转变。尽管最近的研究已探索使用多模态大语言模型(MLLMs)评估UI,但它们主要关注表面特征,忽略了设计选择如何在大规模上影响用户行为。为此,我们引入了WiserUI-Bench,一个新颖的基准测试,用于多模态理解UI/UX设计如何影响用户行为,基于300对来自行业A/B测试的真实UI图像,具有经实证验证的胜者,这些胜者引发了更多用户行为。为了未来在实践中推动设计进步,需要事后理解为何这些胜者能与大量用户成功;我们通过专家整理的关键解读支持这一点。在WiserUI-Bench上对多个MLLMs进行实验,针对两个主要任务(1)预测A/B测试对中更有效的UI图像,(2)根据专家解读进行事后解释,显示模型在理解UI/UX设计行为影响方面存在局限。我们相信我们的工作将促进利用MLLMs在用户行为上下文中进行视觉设计的研究。

英文摘要

User interface (UI) design goes beyond visuals to shape user experience (UX), underscoring the shift toward UI/UX as a unified concept. While recent studies have explored UI evaluation using Multimodal Large Language Models (MLLMs), they largely focus on surface-level features, overlooking how design choices influence user behavior at scale. To fill this gap, we introduce WiserUI-Bench, a novel benchmark for multimodal understanding of how UI/UX design affects user behavior, built on 300 real-world UI image pairs from industry A/B tests, with empirically validated winners that induced more user actions. For future design progress in practice, post-hoc understanding of why such winners succeed with mass users is also required; we support this via expert-curated key interpretations for each instance. Experiments across multiple MLLMs on WiserUI-Bench for two main tasks, (1) predicting the more effective UI image between an A/B-tested pair, and (2) explaining it post-hoc in alignment with expert interpretations, show that models exhibit limited understanding of the behavioral impact of UI/UX design. We believe our work will foster research on leveraging MLLMs for visual design in user behavior contexts.

2510.02415 2026-06-05 physics.ao-ph cs.LG 版本更新

The Equilibrium Response of Atmospheric Machine-Learning Models to Uniform Sea Surface Temperature Warming

大气机器学习模型对均匀海表温度变暖的平衡响应

Bosong Zhang, Timothy M. Merlis

发表机构 * University of Washington(华盛顿大学)

AI总结 本文评估了几种先进的机器学习模型对均匀海表温度变暖的气候响应,探讨了这些模型在气候预测中的潜力与局限性。

详情
AI中文摘要

近年来,能够产生稳定、多年气候模拟的全球大气机器学习模型已得到发展。然而,这些机器学习模型超越训练分布进行泛化的能力仍是一个开放性问题。在本研究中,我们评估了几种最先进的机器学习模型(ACE2-ERA5、NeuralGCM和cBottle)对均匀海表温度变暖的气候响应,这是一种广泛用于评估气候变化的基准测试。我们评估了这些机器学习模型相对于基于物理的一般环流模型(NOAA的Geophysical Fluid Dynamics Laboratory AM4)在关键诊断指标上的性能,包括地表空气温度、降水量、温度和风廓线以及大气顶部辐射。尽管机器学习模型能够再现物理模型响应的关键方面,特别是降水量的响应,但某些模型在辐射响应和陆地区域变暖方面表现出显著偏离稳健的物理响应。我们的结果突显了机器学习模型在气候变化应用中的潜力和当前的局限性,并表明需要进一步改进以实现稳健的样本外泛化。

英文摘要

Machine learning models for the global atmosphere that are capable of producing stable, multi-year simulations of Earth's climate have recently been developed. However, the ability of these ML models to generalize beyond the training distribution remains an open question. In this study, we evaluate the climate response of several state-of-the-art ML models (ACE2-ERA5, NeuralGCM, and cBottle) to a uniform sea surface temperature warming, a widely used benchmark for evaluating climate change. We assess each ML model's performance relative to a physics-based general circulation model (NOAA's Geophysical Fluid Dynamics Laboratory AM4) across key diagnostics, including surface air temperature, precipitation, temperature and wind profiles, and top-of-atmosphere radiation. While the ML models reproduce key aspects of the physical model response, particularly the response of precipitation, some exhibit notable departures from robust physical responses, including radiative responses and land region warming. Our results highlight the promise and current limitations of ML models for climate change applications and suggest that further improvements are needed for robust out-of-sample generalization.

2510.10968 2026-06-05 cs.LG stat.ML 版本更新

Blade: A Derivative-free Bayesian Inversion Method using Diffusion Priors

Blade:一种使用扩散先验的无导数贝叶斯反演方法

Hongkai Zheng, Austin Wang, Zihui Wu, Zhengyu Huang, Ricardo Baptista, Yisong Yue

发表机构 * California Institute of Technology(加州理工学院) University of Toronto(多伦多大学) Peking University(北京大学)

AI总结 本文提出Blade方法,通过使用扩散模型作为数据驱动的先验,解决无导数贝叶斯反演中高维非线性问题的后验估计问题,实现了准确且校准良好的后验分布。

详情
AI中文摘要

无导数贝叶斯反演出现在科学和工程应用中,尤其是在正向模型计算成本高或难以求导的情况下。在高维非线性问题上,现有的无导数方法要么将后验坍缩为点估计,要么给出严重过度自信的不确定性。我们提出Blade,它使用相互作用的粒子集合产生准确且校准良好的后验。Blade利用扩散模型作为数据驱动的先验,并且仅通过正向评估(即无导数)查询正向模型。理论上,我们证明了Blade在正向模型近似误差和先验分数估计误差下的收敛性和稳定性。实验上,在非线性流体动力学问题中,Blade产生了现有无导数方法无法得到的校准良好的后验样本,评价指标包括CRPS、离散度-技巧比(spread-skill ratio)和秩直方图。其准确性和校准程度随迭代次数和粒子数的增加而持续提高,这得到了我们的收敛性与稳定性分析以及实验的支持。

英文摘要

Derivative-free Bayesian inversion arises in science and engineering applications, particularly when the forward model is costly or infeasible to differentiate through. Existing derivative-free methods collapse the posterior to a point estimate or return severely over-confident uncertainty on high-dimensional, nonlinear problems. We introduce Blade, which produces accurate and well-calibrated posteriors using an ensemble of interacting particles. Blade leverages diffusion models as data-driven priors, and only queries the forward model through forward evaluations (i.e., derivative-free). Theoretically, we show the convergence and stability of Blade under forward model approximation and prior score estimation error. Empirically, on nonlinear fluid dynamics, Blade produces well-calibrated posterior samples that existing derivative-free methods cannot, as measured by CRPS, the spread-skill ratio, and the rank histogram. Its accuracy and calibration improve consistently with more iterations and particles, backed by our convergence and stability analysis and empirical experiments.
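
CRPS, one of the calibration metrics listed above, has a simple estimator for a finite ensemble of scalar samples. The sketch below uses the common form "mean absolute error minus half the mean pairwise spread"; the function name is an assumption, and the paper may use a different variant or apply it per grid point.

```python
def crps_ensemble(samples, obs):
    """Empirical CRPS for scalar ensemble members against one observation.

    CRPS = E|X - y| - 0.5 * E|X - X'|, with expectations taken over the
    ensemble members. Lower is better; 0 means every member equals obs.
    """
    n = len(samples)
    error_term = sum(abs(x - obs) for x in samples) / n
    spread_term = sum(abs(xi - xj) for xi in samples for xj in samples) / (2 * n * n)
    return error_term - spread_term

score = crps_ensemble([0.0, 2.0], obs=1.0)  # -> 0.5
```

For a single-member ensemble the spread term vanishes and the score reduces to the absolute error, which is why CRPS is often read as a probabilistic generalization of MAE.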

2512.21335 2026-06-05 physics.med-ph cs.LG physics.app-ph physics.bio-ph 版本更新

Autonomous Uncertainty Quantification for Computational Point-of-care Sensors

自主不确定性量化用于计算床旁传感器

Artem Goncharov, Rajesh Ghosh, Hyou-Arm Joung, Dino Di Carlo, Aydogan Ozcan

发表机构 * Electrical & Computer Engineering Department(电气与计算机工程系) Bioengineering Department(生物工程系) California NanoSystems Institute (CNSI)(加州纳米系统研究所) Department of Surgery(外科医学系) University of California, Los Angeles(加州大学洛杉矶分校)

AI总结 本文提出了一种自主不确定性量化技术,用于改进床旁诊断中的神经网络驱动计算传感器系统,通过蒙特卡洛dropout方法提高诊断的准确性和可靠性。

Comments 18 Pages, 5 Figures

详情
Journal ref
ACS Nano (2026)
AI中文摘要

计算床旁(POC)传感器能够为紧急、偏远和资源有限地区提供快速、低成本和可及的诊断。这些系统可以利用基于神经网络的算法从快速诊断测试或传感器生成的信号中准确推断诊断。然而,基于神经网络的诊断模型容易产生幻觉,并可能产生错误预测,导致误诊和不准确的临床决策。为了解决这一挑战,本文提出了一种专为POC诊断开发的自主不确定性量化技术。作为测试平台,我们使用了用于快速诊断莱姆病(全球最普遍的蜱传疾病)的纸基计算垂直流分析(xVFA)平台。xVFA平台集成了可丢弃的纸基检测、手持光学读取器和基于神经网络的推断算法,可在20分钟内使用仅20微升患者血清提供快速且经济有效的莱姆病诊断。通过将基于蒙特卡洛dropout(MCDO)的不确定性量化方法整合到诊断流程中,我们识别并排除了具有高不确定性的错误预测,显著提高了xVFA的灵敏度和可靠性,无需访问患者的真实诊断信息。使用新患者样本的盲测显示,诊断灵敏度从88.2%提高到95.7%,表明基于MCDO的不确定性量化在增强神经网络驱动的计算POC传感系统鲁棒性方面的有效性。

英文摘要

Computational point-of-care (POC) sensors enable rapid, low-cost, and accessible diagnostics in emergency, remote and resource-limited areas that lack access to centralized medical facilities. These systems can utilize neural network-based algorithms to accurately infer a diagnosis from the signals generated by rapid diagnostic tests or sensors. However, neural network-based diagnostic models are subject to hallucinations and can produce erroneous predictions, posing a risk of misdiagnosis and inaccurate clinical decisions. To address this challenge, here we present an autonomous uncertainty quantification technique developed for POC diagnostics. As our testbed, we used a paper-based, computational vertical flow assay (xVFA) platform developed for rapid POC diagnosis of Lyme disease, the most prevalent tick-borne disease globally. The xVFA platform integrates a disposable paper-based assay, a handheld optical reader and a neural network-based inference algorithm, providing rapid and cost-effective Lyme disease diagnostics in under 20 min using only 20 uL of patient serum. By incorporating a Monte Carlo dropout (MCDO)-based uncertainty quantification approach into the diagnostics pipeline, we identified and excluded erroneous predictions with high uncertainty, significantly improving the sensitivity and reliability of the xVFA in an autonomous manner, without access to the ground truth diagnostic information of patients. Blinded testing using new patient samples demonstrated an increase in diagnostic sensitivity from 88.2% to 95.7%, indicating the effectiveness of MCDO-based uncertainty quantification in enhancing the robustness of neural network-driven computational POC sensing systems.
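
The exclusion step described above can be sketched as a threshold on per-sample uncertainty, followed by recomputing sensitivity on the retained cases. The numbers below are made-up toy values, not the study's data, and the names `filter_by_uncertainty` and `threshold` are hypothetical.

```python
def filter_by_uncertainty(uncertainties, threshold):
    """Indices of predictions confident enough to report. Uses no ground truth."""
    return [i for i, u in enumerate(uncertainties) if u <= threshold]

def sensitivity(y_true, y_pred, indices=None):
    """True-positive rate, TP / (TP + FN), over the selected indices."""
    idx = range(len(y_true)) if indices is None else indices
    tp = sum(1 for i in idx if y_true[i] == 1 and y_pred[i] == 1)
    fn = sum(1 for i in idx if y_true[i] == 1 and y_pred[i] == 0)
    return tp / (tp + fn) if (tp + fn) else float("nan")

# Toy example: sample 2 is a missed positive with high uncertainty.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0, 1]
uncert = [0.1, 0.2, 0.9, 0.1, 0.1, 0.8]

kept = filter_by_uncertainty(uncert, threshold=0.5)  # -> [0, 1, 3, 4]
before = sensitivity(y_true, y_pred)                 # -> 0.75
after = sensitivity(y_true, y_pred, kept)            # -> 1.0
```

Ground truth appears only in the evaluation function; the filter itself sees nothing but the uncertainty values, matching the abstract's point that exclusion works without access to the true diagnosis. Excluded samples still need a fallback such as retesting, which this sketch does not model.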

2512.20813 2026-06-05 cs.LG 版本更新

GraphFire-X: Physics-Informed Graph Attention Networks and Structural Gradient Boosting for Building-Scale Wildfire Preparedness at the Wildland-Urban Interface

GraphFire-X: 基于物理信息图注意力网络和结构梯度提升的建筑尺度野火准备方法用于荒野-城市界面

Miguel Esparza, Vamshi Battal, Ali Mostafavi

发表机构 * Urban Resilience.AI Lab, Zachry Department of Civil and Environmental Engineering, Texas A&M University(Urban Resilience.AI实验室,Zachry土木与环境工程系,德克萨斯A&M大学) Department of Computer Science and Engineering, Texas A&M University(计算机科学与工程系,德克萨斯A&M大学)

AI总结 本研究提出GraphFire-X框架,结合物理信息图注意力网络和结构梯度提升,通过分离脆弱性为环境传染和结构脆弱两个向量,解决荒野-城市界面野火风险建模问题,揭示环境压力主导传播路径,而屋檐成为主要微尺度入侵向量,从而实现精准的灾害预防和缓解策略。

详情
Journal ref
Computer-Aided Civil and Infrastructure Engineering (2026): 100085
AI中文摘要

随着野火越来越多地演变为城市大火,将建筑视为孤立资产的传统风险模型无法捕捉荒野-城市界面(WUI)特有的非线性传染动态。本研究建立了一种新的双专家集成框架,弥合机理物理与数据驱动学习之间的差距,并将脆弱性分解为两个不同的向量:环境传染和结构脆弱性。该架构整合了两个专门的预测流:环境专家实现为图神经网络(GNN),将社区表示为有向传染图,边权重由物理信息化的对流、辐射和飞火(余烬)概率决定,并融入高维Google AlphaEarth Foundation嵌入;结构专家通过XGBoost实现,用于刻画细粒度的资产层面韧性。应用于2025年Eaton火灾时,该框架揭示了风险驱动因素的关键二元性:GNN表明,在决定传播路径方面,社区尺度的环境压力远比建筑自身的结构特征更具主导性,而XGBoost模型识别出屋檐是主要的微尺度侵入途径。通过逻辑堆叠综合这些不同的信号,集成模型实现了稳健的分类并生成诊断性的风险拓扑。这一能力使决策者能够超越二元损失预测,精确确定缓解措施的优先级:对高连通性集群优先进行植被管理,对建筑上脆弱的节点优先进行结构加固,从而落实一种主动的、数据驱动的社区韧性方法。

英文摘要

As wildfires increasingly evolve into urban conflagrations, traditional risk models that treat structures as isolated assets fail to capture the non-linear contagion dynamics characteristic of the wildland-urban interface (WUI). This research bridges the gap between mechanistic physics and data-driven learning by establishing a novel dual-specialist ensemble framework that disentangles vulnerability into two distinct vectors: environmental contagion and structural fragility. The architecture integrates two specialized predictive streams. The environmental specialist is implemented as a graph neural network (GNN) that operationalizes the community as a directed contagion graph weighted by physics-informed convection, radiation, and ember probabilities, and enriched with high-dimensional Google AlphaEarth Foundation embeddings. The structural specialist is implemented via XGBoost to isolate granular asset-level resilience. Applied to the 2025 Eaton Fire, the framework reveals a critical dichotomy in risk drivers. The GNN demonstrates that neighborhood-scale environmental pressure overwhelmingly dominates intrinsic structural features in defining propagation pathways, while the XGBoost model identifies eaves as the primary micro-scale ingress vector. By synthesizing these divergent signals through logistic stacking, the ensemble achieves robust classification and generates a diagnostic risk topology. This capability empowers decision makers to move beyond binary loss prediction and precisely target mitigation, prioritizing vegetation management for high-connectivity clusters and structural hardening for architecturally vulnerable nodes, thereby operationalizing a proactive, data-driven approach to community resilience.
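
Logistic stacking, the step that merges the two specialists, can be sketched as a small logistic regression fitted on the specialists' output probabilities. Everything below is illustrative: the toy probabilities and labels are invented, plain gradient descent replaces whatever solver the authors used, and `fit_stacker` and `predict` are hypothetical names.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stacker(p_env, p_struct, y, lr=0.5, epochs=2000):
    """Fit w_env, w_struct, bias by batch gradient descent on the log loss."""
    w_env = w_struct = bias = 0.0
    n = len(y)
    for _ in range(epochs):
        g_env = g_struct = g_bias = 0.0
        for a, s, t in zip(p_env, p_struct, y):
            err = sigmoid(w_env * a + w_struct * s + bias) - t
            g_env += err * a
            g_struct += err * s
            g_bias += err
        w_env -= lr * g_env / n
        w_struct -= lr * g_struct / n
        bias -= lr * g_bias / n
    return w_env, w_struct, bias

def predict(params, a, s):
    """Stacked damage probability from the two specialists' outputs."""
    w_env, w_struct, bias = params
    return sigmoid(w_env * a + w_struct * s + bias)

# Invented per-building probabilities from each specialist and damage labels.
p_env = [0.9, 0.8, 0.2, 0.1, 0.7, 0.3]
p_struct = [0.6, 0.7, 0.4, 0.2, 0.9, 0.1]
damaged = [1, 1, 0, 0, 1, 0]

params = fit_stacker(p_env, p_struct, damaged)
risk = predict(params, 0.9, 0.8)
```

The fitted weights indicate how much the meta-learner leans on each specialist, which is one way a stacked ensemble can remain interpretable. In practice the meta-learner should be fitted on out-of-fold predictions to avoid leaking training labels.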

2512.20111 2026-06-05 cs.CL cs.AI cs.LG 版本更新

ABBEL: Learning Natural-Language Belief States for Memory-Efficient Interaction

ABBEL: 为高效交互学习自然语言信念状态

Aly Lidayan, Jakob Bjorner, Satvik Golechha, Kartik Goyal, Alane Suhr

发表机构 * University of California, Berkeley(加州大学伯克利分校) Georgia Institute of Technology(佐治亚理工学院)

AI总结 本文提出ABBEL框架,通过显式自然语言信念状态直接监督每个摘要的信息内容,以解决传统方法在生成摘要时信息丢失或更新错误的问题,从而在保持高效内存使用的同时提升交互性能。

详情
AI中文摘要

随着序列决策任务的时间范围扩大,将完整交互历史保留在模型上下文中变得越来越昂贵。最近的研究改为让决策智能体以递归更新的自然语言摘要为条件,从而缩短上下文长度;这些摘要简洁且可解释。然而,这类方法的性能仍低于能够访问完整上下文的智能体,表明它们生成的摘要不够充分。为此,我们提出了ABBEL,一种递归摘要框架,它以显式自然语言信念状态的形式分离并直接监督每个摘要的信息内容。首先,我们分析了前沿模型在ABBEL下于五个领域中生成的信念状态,并验证了性能下降通常源于遗漏信息或错误地更新信息。我们还发现,在某些设置中,模型会保留多余信息,导致内存使用低效。针对这些局限,我们使用两种基于强化学习的方法进行微调:信念评分(belief grading),根据信息内容奖励信念生成,以减少更新错误;峰值信念惩罚(peak belief penalties),鼓励压缩内存占用最大的信念。我们证明这些方法显著缩小了与完整上下文模型的性能差距,并使ABBEL在仅使用67%内存的情况下比先前的记忆智能体工作高出40%。我们的代码可在https://github.com/jakob-bjorner/optimal-explorer-dev获取。

英文摘要

As the time horizons of sequential decision-making tasks grow, keeping full interaction histories in model context becomes increasingly costly. Recent work reduces context lengths by instead conditioning decision-making agents on recursively updated natural-language summaries, which are concise and interpretable. However, they underperform agents with access to the full context, suggesting that they fail to generate sufficient summaries. To address this we propose ABBEL, a recursive summarization framework that isolates and directly supervises each summary's information contents in the form of explicit natural-language belief states. First, we analyze the belief states generated by frontier models under ABBEL across five domains, and verify that performance is often degraded due to omitting or incorrectly updating information. We also discover settings where models use memory inefficiently by retaining extraneous information. We target these limitations by fine-tuning with two RL-based methods: belief grading, which reduces update errors by rewarding belief generations based on their information content, and peak belief penalties, which encourage compressing the beliefs with the greatest memory footprints. We demonstrate that these methods significantly reduce the performance gap with full context models, and enable ABBEL to outperform prior memory agent work by 40% while using 67% of the memory. Our code is available at https://github.com/jakob-bjorner/optimal-explorer-dev

2412.11800 2026-06-05 cs.LG stat.ML 版本更新

Scalable Temporal Anomaly Causality Discovery in Large Systems: Achieving Computational Efficiency with Binary Anomaly Flag Data

在大规模系统中实现可扩展的时间异常因果发现:通过二进制异常标志数据实现计算效率

Mulugeta Weldezgina Asres, Christian Walter Omlin, The CMS-HCAL Collaboration

发表机构 * Department of ICT, University of Agder(阿格德大学信息与通信技术系) The CMS Experiment, CERN(欧洲核子研究中心(CERN)CMS实验)

AI总结 本文提出了一种异常因果发现方法(AnomalyCD),旨在解决从时间二进制标志数据集生成图形因果模型(GCMs)的准确性和计算挑战,通过异常数据感知的因果测试、稀疏数据和先验链接压缩以及边修剪调整等策略,提高了计算效率和准确性。

Comments 26 pages, 17 figures, 8 tables, published version at EPJ-C: Computing, Software and Data Science

详情
Journal ref
Eur. Phys. J. C, 86, 585 (2026)
AI中文摘要

提取异常因果关系有助于在监控系统检测到系统故障后进行诊断。在大规模系统中识别异常原因,需要在多个子系统中考察更多的监控变量。然而,学习图形因果模型(GCMs)带来了显著的计算负担,限制了大多数现有方法在实时和大规模部署中的适用性。此外,面向大型系统的现代监控应用通常会生成大量二进制警报标志,而二进制异常数据的独特特征(状态转换的含义和数据稀疏性)对现有的因果学习机制构成挑战。本文提出了一种异常因果发现方法(AnomalyCD),以应对从时间二进制标志数据集生成GCMs时的准确性和计算挑战。AnomalyCD包含若干策略,例如面向异常数据的因果检验、稀疏数据与先验链接压缩,以及边修剪调整方法。我们在两个数据集上验证了该方法的性能:来自欧洲核子研究中心(CERN)紧凑缪子线圈(CMS)实验读出盒系统的监测传感器数据,以及一个来自信息技术监控系统的公开数据集。在时间GCMs上的结果表明,计算开销显著减少,且在二进制异常数据集上准确性有适度提高。代码:https://github.com/muleina/AnomalyCD

英文摘要

Extracting anomaly causality facilitates diagnostics once monitoring systems detect system faults. Identifying anomaly causes in large systems involves investigating a broader set of monitoring variables across multiple subsystems. However, learning graphical causal models (GCMs) comes with a significant computational burden that restrains the applicability of most existing methods in real-time and large-scale deployments. In addition, modern monitoring applications for large systems often generate large amounts of binary alarm flags, and the distinct characteristics of binary anomaly data -- the meaning of state transition and data sparsity -- challenge existing causality learning mechanisms. This study proposes an anomaly causal discovery approach (AnomalyCD), addressing the accuracy and computational challenges of generating GCMs from temporal binary flag datasets. The AnomalyCD presents several strategies, such as anomaly data-aware causality testing, sparse data and prior link compression, and edge pruning adjustment approaches. We validate the performance of the approach on two datasets: monitoring sensor data from the readout-box system of the Compact Muon Solenoid experiment at CERN, and a public dataset from an information technology monitoring system. The results on temporal GCMs demonstrate a considerable reduction of computation overhead and a moderate enhancement of accuracy on the binary anomaly datasets. Code: https://github.com/muleina/AnomalyCD .

2512.19510 2026-06-05 cs.LG stat.ML 版本更新

Toward Scalable and Valid Conditional Independence Testing with Spectral Representations

迈向基于谱表示的可扩展且有效的条件独立性检验

Alek Fröhlich, Vladimir R. Kostic, Karim Lounici, Daniel Perazzo, Daniel Tiezzi, Massimiliano Pontil

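Affiliations * University of Cambridge; University of Oxford; ETH Zürich

AI summary This paper proposes a learning approach based on spectral representations to address the limited adaptivity and scalability of traditional conditional independence tests. It constructs a simple test statistic, introduces a bi-level contrastive algorithm, establishes a theoretical link between representation learning error and test performance, and validates the approach on real and synthetic data.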

Comments Accepted at ICML 2026. Revised to match the accepted version; updated experiments and exposition

详情
AI中文摘要

条件独立性(CI)在因果推断、特征选择和图模型中至关重要,然而在许多情况下,没有额外假设的情况下无法进行检验。现有的CI检验通常依赖于限制性的结构条件,限制了其有效性。核方法使用偏协方差算子提供了一种更系统的方法,但存在有限的适应性和可扩展性。在本工作中,我们探讨了表示学习是否能帮助解决这些限制。具体而言,我们关注由偏协方差算子的奇异值分解得到的表示,并利用这些表示构造一个简单的检验统计量。我们还引入了一个双层对比算法来学习这些表示。我们的理论将表示学习误差与检验性能联系起来,并建立了渐近有效性和功效保证。在实际和合成数据上的实验表明,这种方法提供了一条系统且统计上站得住脚的路径,以实现可扩展的CI检验,将基于核的理论与现代表示学习相结合。

英文摘要

Conditional independence (CI) is central to causal inference, feature selection, and graphical modeling, yet it is untestable in many settings without additional assumptions. Existing CI tests often rely on restrictive structural conditions, limiting their validity. Kernel methods using partial covariance operators offer a more principled approach but suffer from limited adaptivity and scalability. In this work, we explore whether representation learning can help address these limitations. Specifically, we focus on representations derived from the singular value decomposition of partial covariance operators and use them to construct a simple test statistic. We also introduce a bi-level contrastive algorithm to learn these representations. Our theory links representation learning error to test performance and establishes asymptotic validity and power guarantees. Experiments on real and synthetic data suggest that this approach offers a principled and statistically grounded path toward scalable CI testing, bridging kernel-based theory with modern representation learning.

2506.11152 2026-06-05 q-bio.GN cs.LG q-bio.CB 版本更新

HEIST: A Graph Foundation Model for Spatial Transcriptomics and Proteomics Data

HEIST:一种用于空间转录组学和蛋白质组学数据的图基础模型

Hiren Madhu, João Felipe Rocha, Tinglin Huang, Siddharth Viswanath, Smita Krishnaswamy, Rex Ying

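Affiliations * Yale University, USA

AI summary This paper proposes HEIST, which models spatial transcriptomics and proteomics data as graphs and uses a hierarchical graph transformer to jointly model the spatial positions and gene expression of cells, improving the understanding of cellular heterogeneity and of responses to the microenvironment.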

详情
AI中文摘要

单细胞转录组学和蛋白质组学已成为驱动生物学研究的重要数据来源,使高级深度学习方法能够理解单细胞水平的细胞异质性和基因表达。随着空间组学数据的出现,我们有希望在组织背景下表征细胞,因为其提供了空间坐标和细胞内转录或蛋白质计数。蛋白质组学通过直接测量蛋白质提供互补视角,蛋白质是细胞功能的主要效应器和关键治疗靶点。然而,现有模型要么忽略空间信息,要么忽略细胞内的复杂遗传和蛋白质组程序,因此无法推断细胞内部调节如何适应微环境信号。此外,这些模型通常使用固定基因词汇表,限制了其对未知基因的泛化能力。在本文中,我们介绍了HEIST,一种用于空间转录组学和蛋白质组学的层次化图Transformer基础模型。HEIST将组织建模为层次化图。高层图是空间细胞图,每个细胞再由其下层的基因共表达网络图表示。HEIST通过执行不同层次的消息传递来利用其嵌入中的层次结构,从而能够泛化到包括空间蛋白质组学在内的新数据类型,而无需重新训练。HEIST在15个器官的124种组织中使用空间感知对比和掩码自动编码目标,预训练了2230万细胞。对HEIST嵌入的无监督分析揭示了先前模型遗漏的具有空间信息的亚群。下游评估显示其在蛋白质组学数据上的泛化能力和在临床结果预测、细胞类型注释和基因填补中的最先进性能。

英文摘要

Single-cell transcriptomics and proteomics have become a rich source of data-driven insights into biology, enabling the use of advanced deep learning methods to understand cellular heterogeneity and gene expression at the single-cell level. With the advent of spatial-omics data, we have the promise of characterizing cells within their tissue context, as it provides both spatial coordinates and intra-cellular transcriptional or protein counts. Proteomics offers a complementary view by directly measuring proteins, which are the primary effectors of cellular function and key therapeutic targets. However, existing models ignore either the spatial information or the complex genetic and proteomic programs within cells, so they cannot infer how a cell's internal regulation adapts to microenvironmental cues. Furthermore, these models often use fixed gene vocabularies, hindering their generalizability to unseen genes. In this paper, we introduce HEIST, a hierarchical graph transformer foundation model for spatial transcriptomics and proteomics. HEIST models tissues as hierarchical graphs. The higher-level graph is a spatial cell graph, and each cell, in turn, is represented by its lower-level gene co-expression network graph. HEIST performs both intra-level and cross-level message passing to exploit this hierarchy in its embeddings and can thus generalize to novel data types, including spatial proteomics, without retraining. HEIST is pretrained on 22.3M cells from 124 tissues across 15 organs using spatially-aware contrastive and masked autoencoding objectives. Unsupervised analysis of HEIST embeddings reveals spatially informed subpopulations missed by prior models. Downstream evaluations demonstrate generalizability to proteomics data and state-of-the-art performance in clinical outcome prediction, cell type annotation, and gene imputation across multiple technologies.

2512.09706 2026-06-05 cs.LG 版本更新

Training One Model to Master Cross-Level Agentic Actions via Reinforcement Learning

通过强化学习训练一个模型以掌握跨层级的代理行为

Kaichen He, Zihao Wang, Muyao Li, Anji Liu, Yitao Liang

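Affiliations * Peking University; National University of Singapore

AI summary This paper proposes CrossHA, a unified agentic model that masters heterogeneous action spaces and autonomously selects the most effective interface at each step of a trajectory. By combining cold-start supervised fine-tuning with a multi-turn Group Relative Policy Optimization (GRPO) algorithm, it learns adaptive action switching and achieves state-of-the-art performance on over 800 tasks in the open-world Minecraft environment.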

Comments Accepted to CVPR 2026 as a Highlight

详情
AI中文摘要

代理AI的范式正从工程复杂的流程转向训练后的原生模型。然而,现有代理通常局限于静态的预定义动作空间,如仅使用API、GUI事件或机器人命令。这种刚性限制了它们在动态环境中适应性,其中最佳交互粒度会根据情境变化而变化。为弥合这一差距,我们提出了CrossHA,一种统一的代理模型,能够掌握异构的动作空间并自主选择每一步轨迹中最有效的接口。我们引入了一个全面的训练管道,整合了冷启动监督微调与多轮组相对策略优化(GRPO)算法。该方法使代理能够学习适应性动作切换,平衡高层效率与低层精度,而无需人为指定规则。在Minecraft开放世界中超过800个任务的广泛实验表明,CrossHA实现了最先进的性能。通过动态利用多样化的动作空间,我们的模型显著优于固定动作基线,展现出在长时间推理中的优越泛化能力和效率。所有代码和模型均在https://github.com/CraftJarvis/OpenHA上提供。

英文摘要

The paradigm of agentic AI is shifting from engineered complex workflows to post-training native models. However, existing agents are typically confined to static, predefined action spaces, such as exclusively using APIs, GUI events, or robotic commands. This rigidity limits their adaptability in dynamic environments where the optimal granularity of interaction varies with context. To bridge this gap, we propose CrossHA, a unified agentic model that masters heterogeneous action spaces and autonomously selects the most effective interface for each step of a trajectory. We introduce a comprehensive training pipeline that integrates cold-start supervised fine-tuning with a Multi-Turn Group Relative Policy Optimization (GRPO) algorithm. This approach enables the agent to learn adaptive action switching, balancing high-level efficiency with low-level precision, without human-specified rules. Extensive experiments on over 800 tasks in the open-world Minecraft environment demonstrate that CrossHA achieves state-of-the-art performance. By dynamically leveraging the strengths of diverse action spaces, our model significantly outperforms fixed-action baselines, exhibiting superior generalization and efficiency in long-horizon reasoning. All code and models are available at https://github.com/CraftJarvis/OpenHA.
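
The group-relative part of GRPO mentioned in the abstract can be illustrated with a minimal sketch. It shows only the standard single-turn advantage normalization (each sampled rollout's reward is standardized against the other rollouts in its group); the paper's multi-turn variant is not described in the abstract and is not reproduced here. The function name, the use of the population standard deviation, and the `eps` stabilizer are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of rollouts for the same task.

    Hypothetical helper: population std and eps are illustrative choices.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Rollouts above the group mean get a positive advantage, others negative.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts of the same task: two succeeded (reward 1.0), two failed (0.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Because the advantages are centered within the group, no separate value network is needed to provide a baseline, which is the usual motivation for this normalization.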

2511.21667 2026-06-05 cs.LG cs.AI 版本更新

Escaping the Verifier: Learning to Reason via Demonstrations

摆脱验证者:通过示范学习推理

Locke Cai, Max Ryabinin, Ivan Provilkov

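Affiliations * University of California, Berkeley; Stanford University

AI summary This paper proposes RARO, which uses inverse reinforcement learning to learn strong reasoning capabilities from expert demonstrations without task-specific verifiers, achieving significant performance gains across multiple evaluation tasks.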

详情
AI中文摘要

训练大型语言模型(LLMs)进行推理通常依赖于带有任务特定验证者的强化学习(RL)。然而,许多现实世界的推理密集型任务缺乏验证者,尽管它们提供了大量尚未被充分用于推理训练的专家示范。我们引入RARO(相对论式对抗推理优化),仅通过逆强化学习从专家示范中学习强大的推理能力。RARO在策略与相对论式评判器之间建立对抗博弈:策略学习模仿专家答案,而评判器旨在从专家-策略答案对中识别出专家答案。策略和评判器通过RL联合且持续地训练,我们还确定了实现稳健学习所需的关键稳定技术。实证结果表明,RARO在所有评估任务中均显著优于强大的无验证者基线:在Countdown(1.5B)上准确率提高13.7%,在DeepMath(7B)上准确率提高8.2%,在Poetry Writing(7B)上相对于专家诗歌的胜率提高19.1%。RARO还表现出与带验证者的RL相似的稳健扩展趋势。这些结果表明,RARO仅凭专家示范就能有效激发强大的推理性能,即使在任务特定验证者不可用时也能实现稳健的推理学习。

英文摘要

Training Large Language Models (LLMs) to reason often relies on Reinforcement Learning (RL) with task-specific verifiers. However, many real-world reasoning-intensive tasks lack verifiers, despite offering abundant expert demonstrations that remain under-utilized for reasoning-focused training. We introduce RARO (Relativistic Adversarial Reasoning Optimization), which learns strong reasoning capabilities from expert demonstrations alone via Inverse Reinforcement Learning. RARO sets up an adversarial game between a policy and a relativistic critic: the policy learns to mimic expert answers, while the critic aims to identify the expert answer within each expert-policy answer pair. Both the policy and the critic are trained jointly and continuously via RL, and we identify the key stabilization techniques required for robust learning. Empirically, RARO significantly outperforms strong verifier-free baselines across all evaluation tasks: +13.7% accuracy on Countdown (1.5B), +8.2% accuracy on DeepMath (7B), and +19.1% win-rate on Poetry Writing (7B) against expert poems. RARO also exhibits robust scaling trends similar to those of RL with verifiers. These results demonstrate that RARO effectively elicits strong reasoning performance from expert demonstrations alone, enabling robust reasoning learning even when task-specific verifiers are unavailable.

2508.10875 2026-06-05 cs.CL cs.AI cs.LG 版本更新

A Survey on Diffusion Language Models

扩散语言模型的综述

Tianyi Li, Mingda Chen, Bowei Guo, Zhiqiang Shen

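Affiliations * VILA Lab, Mohamed bin Zayed University of Artificial Intelligence; Department of Automation, Tsinghua University

AI summary This survey reviews the current state of diffusion language models, discusses their relationship to autoregressive and masked language models, analyzes pre-training strategies, post-training methods, and inference optimization techniques, and covers multimodal extensions, applications, limitations, and future research directions.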

详情
AI中文摘要

扩散语言模型(DLMs)正迅速崛起为一种强大的替代方案,以取代主导的自回归(AR)范式。通过迭代去噪过程并行生成令牌,DLMs在减少推理延迟和捕捉双向上下文方面具有固有优势,从而实现对生成过程的精细控制。尽管实现了数倍的加速,最近的进展使DLMs在性能上与自回归模型相当,使其成为各种自然语言处理任务的有力选择。在本文综述中,我们提供了当前DLM景观的全面概述。我们追踪其演变及其与其他范式,如自回归和掩码语言模型的关系,并涵盖了基础原理和最先进模型。我们的工作提供了一个最新、全面的分类法以及对当前技术的深入分析,从预训练策略到高级后训练方法。本文的另一个贡献是全面回顾DLM推理策略和优化,包括解码并行性、缓存机制和生成质量的改进。我们还突出了DLM多模态扩展的最新方法,并阐述了它们在各种实际场景中的应用。此外,我们的讨论还讨论了DLMs的局限性和挑战,包括效率、长序列处理和基础设施需求,同时概述了未来研究方向,以维持该快速发展的领域中的进步。Project GitHub可在https://github.com/VILA-Lab/Awesome-DLMs上找到。

英文摘要

Diffusion Language Models (DLMs) are rapidly emerging as a powerful and promising alternative to the dominant autoregressive (AR) paradigm. By generating tokens in parallel through an iterative denoising process, DLMs possess inherent advantages in reducing inference latency and capturing bidirectional context, thereby enabling fine-grained control over the generation process. While achieving a several-fold speed-up, recent advancements have allowed DLMs to show performance comparable to their autoregressive counterparts, making them a compelling choice for various natural language processing tasks. In this survey, we provide a holistic overview of the current DLM landscape. We trace its evolution and relationship with other paradigms, such as autoregressive and masked language models, and cover both foundational principles and state-of-the-art models. Our work offers an up-to-date, comprehensive taxonomy and an in-depth analysis of current techniques, from pre-training strategies to advanced post-training methods. Another contribution of this survey is a thorough review of DLM inference strategies and optimizations, including improvements in decoding parallelism, caching mechanisms, and generation quality. We also highlight the latest approaches to multimodal extensions of DLMs and delineate their applications across various practical scenarios. Furthermore, our discussion addresses the limitations and challenges of DLMs, including efficiency, long-sequence handling, and infrastructure requirements, while outlining future research directions to sustain progress in this rapidly evolving field. Project GitHub is available at https://github.com/VILA-Lab/Awesome-DLMs.

2511.21338 2026-06-05 cs.LG 版本更新

Masks Can Be Distracting: On Context Comprehension in Diffusion Language Models

掩码可能具有干扰性:关于扩散语言模型中的上下文理解

Julianna Piskorz, Cristina Pinneri, Alvaro Correia, Motasem Alfarra, Risheek Garrepalli, Christos Louizos

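Affiliations * University of Cambridge

AI summary This paper studies how mask tokens affect context comprehension in diffusion language models, finds that masks interfere with the model's processing of relevant information, and proposes a mask-agnostic loss function to improve robustness.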

Comments Published at the Forty-Third International Conference on Machine Learning (ICML 2026)

详情
英文摘要

Masked Diffusion Language Models (MDLMs) have recently emerged as a promising alternative to Autoregressive Language Models (ARLMs), leveraging a denoising objective that, in principle, should enable more uniform context utilisation. In this work, we examine the context comprehension abilities of MDLMs and uncover two key limitations. First, despite their more global training objective and bidirectional attention mechanism, MDLMs, like ARLMs, exhibit a strong locality bias: performance is highly sensitive to the position of relevant information within the input, favouring local over distant context. Second, we show that appending a large number of mask tokens, which are required for generation, can significantly degrade context comprehension. Through systematic ablations, we find that these masks act as distractors, reducing the model's ability to process relevant information. To address this, we introduce a mask-agnostic loss function that encourages predictions to remain invariant to the number of appended masks. Fine-tuning with this objective substantially mitigates the distracting effect of masks, improving the robustness of MDLMs. Overall, our findings reveal critical limitations of the current MDLM training paradigm and provide actionable insights for building diffusion-based language models with stronger context comprehension.

2511.20613 2026-06-05 cs.LG cs.AI cs.MA 版本更新

Can Vibe Coding Beat Graduate CS Students? An LLM vs. Human Coding Tournament on Market-driven Strategic Planning

能否用Vibe编码击败研究生计算机科学学生?一个LLM与人类编码竞赛在市场驱动的战略规划中的表现

Panayiotis Danassis, Naman Goel

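Affiliations * University of Southampton; University of Oxford and Alan Turing Institute

AI summary This paper introduces a multi-agent, reasoning-driven benchmark based on a real-world logistics optimization problem (the Auction, Pickup, and Delivery Problem) that couples competitive auctions with capacity-constrained routing. Comparing 40 LLM-coded agents with 17 human-coded agents over 12 double all-play-all tournaments and roughly 40k matches, the study shows the advantage of human-coded agents in strategic planning and optimization and exposes a gap in LLMs' ability to produce code that is competitive in real-world settings.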

详情
AI中文摘要

大型语言模型(LLMs)的快速普及已经革新了AI辅助代码生成。LLMs的快速发展超出了我们正确评估它们的能力。现有的基准测试强调单元测试通过率和语法正确性。这些指标低估了许多需要规划、优化和战略互动的真实世界问题的难度。我们引入了一个基于现实物流优化问题(拍卖、取件和送货问题)的多智能体推理驱动基准,该问题结合了竞争拍卖与容量受限路由。该基准要求构建能够(i)在不确定性下进行战略投标,以及(ii)优化在完成配送任务的同时最大化利润的规划器的代理。我们评估了40个LLM编码的代理(由多种最先进的LLMs在包括Vibe编码在内的多种提示方法下生成)与17个在LLM出现之前开发的人类编码代理。我们在12场双循环赛、约4万场对局中的结果显示:(i)人类(研究生)编码代理具有明显优势:前5名始终由人类编码代理占据;(ii)大多数LLM编码代理(40个中的33个)被非常简单的基线所击败;(iii)在给定最佳人类解决方案作为输入并提示改进的情况下,表现最好的LLM使解决方案显著变差而不是改进。我们的结果突显了LLMs在生成于现实世界中具有竞争力的代码方面的差距,并推动开展强调现实世界场景中推理驱动代码合成的新评估。

英文摘要

The rapid proliferation of Large Language Models (LLMs) has revolutionized AI-assisted code generation. This rapid development of LLMs has outpaced our ability to properly benchmark them. Prevailing benchmarks emphasize unit-test pass rates and syntactic correctness. Such metrics understate the difficulty of many real-world problems that require planning, optimization, and strategic interaction. We introduce a multi-agent reasoning-driven benchmark based on a real-world logistics optimization problem (Auction, Pickup, and Delivery Problem) that couples competitive auctions with capacity-constrained routing. The benchmark requires building agents that can (i) bid strategically under uncertainty and (ii) optimize planners that deliver tasks while maximizing profit. We evaluate 40 LLM-coded agents (by a wide range of state-of-the-art LLMs under multiple prompting methodologies, including vibe coding) against 17 human-coded agents developed before the advent of LLMs. Our results over 12 double all-play-all tournaments and $\sim 40$k matches demonstrate (i) a clear superiority of human-coded agents (written by graduate students): the top 5 spots are consistently won by human-coded agents, (ii) the majority of LLM-coded agents (33 out of 40) are beaten by very simple baselines, and (iii) given the best human solution as an input and prompted to improve upon it, the best performing LLM makes the solution significantly worse instead of improving it. Our results highlight a gap in LLMs' ability to produce code that works competitively in the real world, and motivate new evaluations that emphasize reasoning-driven code synthesis in real-world scenarios.
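
As a rough sanity check on the scale of the evaluation, the sketch below enumerates a double all-play-all schedule. It assumes that "double all-play-all" means every ordered pair of agents meets once, so each unordered pair plays twice with roles swapped. The agent names are placeholders, and the lineup counts only the 40 LLM-coded and 17 human-coded agents; whether baseline agents also take part in the tournaments is not stated in the abstract.

```python
from itertools import permutations


def double_round_robin(agents):
    """Return every ordered pairing, so each pair of agents meets twice."""
    return list(permutations(agents, 2))


# Placeholder names for the 40 LLM-coded and 17 human-coded agents.
agents = [f"llm_{i}" for i in range(40)] + [f"human_{i}" for i in range(17)]

schedule = double_round_robin(agents)
matches_per_tournament = len(schedule)         # 57 * 56 ordered pairs
total_matches = 12 * matches_per_tournament    # 12 tournaments
print(matches_per_tournament, total_matches)
```

Under these assumptions the count is 3,192 matches per tournament and 38,304 in total, the same order of magnitude as the roughly 40k matches the abstract reports.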

2511.16111 2026-06-05 stat.ML cs.LG math.SP 版本更新

Rotation-Parameterized Graph Fractional Fourier Transform: Definition, Properties, and Optimal Filtering

旋转参数化图分数阶傅里叶变换:定义、性质和最优滤波

Feiyue Zhao, Mingzhi Wang, Yangfan He, Zhichao Zhang

Affiliations * School of Mathematics and Statistics, Nanjing University of Information Science and Technology; School of Communication and Artificial Intelligence, Nanjing Institute of Technology; School of Integrated Circuits, Nanjing Institute of Technology; Jiangsu Province Engineering Research Center of IntelliSense Technology and System; Hubei Key Laboratory of Applied Mathematics, Hubei University; Key Laboratory of System Control and Information Processing, Ministry of Education, Shanghai Jiao Tong University

AI summary This paper proposes the rotation-parameterized graph fractional Fourier transform (RP-GFRFT), which unifies fractional-order and rotation-parameterized spectral analysis. It addresses the lack of rotation-based basis control in GFRFT and the defective zero-angle degeneracy of AGFT, and improves denoising, reconstruction, and feature preservation in graph signal processing.

详情
AI中文摘要

图谱表示在图信号处理中是基础,为分析图结构数据提供严谨的框架。图分数阶傅里叶变换(GFRFT)通过分数阶参数扩展图傅里叶变换(GFT),实现灵活的谱分析并保持数学一致性。角图傅里叶变换(AGFT)通过旋转GFT特征向量引入角度控制;然而现有构造可能无法在零角度时精确还原为GFT,削弱理论一致性和可解释性。为解决这些互补的局限性,即GFRFT缺乏基于旋转的基控制和AGFT的零角度退化问题,本文提出旋转参数化图分数阶傅里叶变换(RP-GFRFT),统一分数阶和旋转参数化的谱分析。构造了一个保持退化的旋转矩阵族以保证在零角度时精确还原为GFT。然后提出了两种RP-GFRFT变体,I-RP-GFRFT和II-RP-GFRFT,并通过理论分析确认其幺正性、可逆性、还原行为和光滑参数依赖性。将分数阶和旋转角度联合优化用于自适应图谱滤波。在真实世界信号、图像和点云上的实验表明,RP-GFRFT在去噪精度、重建质量和特征保留方面优于GFRFT、AGFT和代表性滤波基线。

英文摘要

Graph spectral representations are fundamental in graph signal processing, providing a rigorous framework for analyzing graph-structured data. The graph fractional Fourier transform (GFRFT) extends the graph Fourier transform (GFT) through a fractional-order parameter, enabling flexible spectral analysis with mathematical consistency. The angular graph Fourier transform (AGFT) further introduces angular control by rotating GFT eigenvectors; however, existing constructions may fail to reduce exactly to the GFT at zero angle, weakening theoretical consistency and interpretability. To address these complementary limitations, namely the lack of rotation-based basis control in GFRFT and the defective zero-angle degeneracy of AGFT, this paper proposes the rotation-parameterized graph fractional Fourier transform (RP-GFRFT), which unifies fractional order and rotation-parameterized spectral analysis. A degeneracy-preserving rotation matrix family is constructed to guarantee exact GFT reduction at zero angle. Two RP-GFRFT variants, I-RP-GFRFT and II-RP-GFRFT, are then formulated, with theoretical analyses confirming their unitarity, invertibility, reduction behavior, and smooth parameter dependence. The fractional order and rotation angle are jointly optimized for adaptive graph spectral filtering. Experiments on real-world signals, images, and point clouds demonstrate that RP-GFRFT improves denoising accuracy, reconstruction quality, and feature preservation over GFRFT, AGFT, and representative filtering baselines.
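
For orientation, the fractional-order construction that the GFRFT builds on can be written out as follows. This is the construction commonly used in the GFRFT literature, stated under the assumption that both the graph shift operator and the GFT matrix are diagonalizable; the symbols $\mathbf{A}$, $\mathbf{V}$, $\mathbf{P}$, $\mathbf{J}$ are chosen here for illustration, and the paper's own definitions, in particular the rotation-parameterized extension, may differ.

```latex
% Graph shift operator (e.g. adjacency matrix) and the resulting GFT matrix
\mathbf{A} = \mathbf{V}\,\boldsymbol{\Lambda}\,\mathbf{V}^{-1},
\qquad
\mathbf{F} = \mathbf{V}^{-1},
\qquad
\hat{\mathbf{x}} = \mathbf{F}\,\mathbf{x}.

% Fractional power of the GFT matrix via its own eigendecomposition
\mathbf{F} = \mathbf{P}\,\mathbf{J}\,\mathbf{P}^{-1}
\quad\Longrightarrow\quad
\mathbf{F}^{\alpha} = \mathbf{P}\,\mathbf{J}^{\alpha}\,\mathbf{P}^{-1}.

% Reduction and additivity (for a consistent branch of the power J^alpha)
\mathbf{F}^{0} = \mathbf{I},
\qquad
\mathbf{F}^{1} = \mathbf{F},
\qquad
\mathbf{F}^{\alpha}\,\mathbf{F}^{\beta} = \mathbf{F}^{\alpha+\beta}.
```

The reduction property $\mathbf{F}^{1} = \mathbf{F}$ is the fractional-order analogue of the zero-angle requirement discussed in the abstract: a rotation-parameterized transform should likewise recover the GFT exactly when the rotation angle is zero.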

2511.13044 2026-06-05 cs.LG 版本更新

Bi-View Embedding Fusion: A Hybrid Learning Approach for Knowledge Graph's Nodes Classification Addressing Problems with Limited Data

双视角嵌入融合:一种混合学习方法用于知识图谱节点分类,以解决数据有限的问题

Rosario Napoli, Giovanni Lonia, Antonio Celesti, Massimo Villari, Maria Fazio

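Affiliations * Department of Mathematical and Computer Sciences, Physical Sciences and Earth Sciences, University of Messina

AI summary This paper proposes Bi-View, an embedding fusion method that combines two complementary graph embedding techniques, Node2Vec and GraphSAGE, to increase the informative content of knowledge graph node features, producing enhanced graph embeddings that improve GML models without additional synthetic data.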

Comments Accepted at the 14th International Joint Conference on Knowledge Graphs (IJCKG) 2025

详情
Journal ref
Knowledge Graphs, Springer Nature Singapore, 2026, pp. 19-34
AI中文摘要

传统机器学习(ML)方法需要大量数据才能表现良好,限制了其在稀疏或不完整场景中的适用性,并迫使使用额外合成数据来改进模型训练。为克服这一挑战,研究社区越来越多地关注图机器学习(GML),因为它通过利用数据中的关系提供了强大的替代方案。然而,这种方法在处理知识图谱(KGs)时也面临限制,因为KGs的语义性质可能导致隐藏大量信息。本研究引入了Bi-View,一种新颖的混合方法,通过增加KGs节点特征的信息内容,生成增强的图嵌入(GEs),用于改进GML模型,而无需依赖额外合成数据。所提出的方法结合了两种互补的GE技术:Node2Vec,通过无监督随机游走捕捉结构模式,以及GraphSAGE,通过监督方式聚合邻居信息。首先计算Node2Vec嵌入以表示图拓扑,然后用基于中心性的度量指标丰富节点特征,这些特征作为GraphSAGE模型的输入。此外,融合层将原始Node2Vec嵌入与GraphSAGE影响的表示结合,形成双视角嵌入空间。此类融合捕获了图的拓扑和语义属性,使模型能够利用数据集中可能存在但未显式表示的 informative 特征。我们的方法提高了下游任务的性能,特别是在初始特征较差的情况下,为更准确和精确的KG增强GML模型奠定了基础。

英文摘要

Traditional Machine Learning (ML) methods require large amounts of data to perform well, limiting their applicability in sparse or incomplete scenarios and forcing the use of additional synthetic data to improve model training. To overcome this challenge, the research community is increasingly turning to Graph Machine Learning (GML), which offers a powerful alternative by exploiting relationships within data. However, this method also faces limitations, particularly when dealing with Knowledge Graphs (KGs), which can hide a great deal of information due to their semantic nature. This study introduces Bi-View, a novel hybrid approach that increases the informative content of node features in KGs to generate enhanced Graph Embeddings (GEs) that are used to improve GML models without relying on additional synthetic data. The proposed work combines two complementary GE techniques: Node2Vec, which captures structural patterns through unsupervised random walks, and GraphSAGE, which aggregates neighbourhood information in a supervised way. Node2Vec embeddings are first computed to represent the graph topology, and node features are then enriched with centrality-based metrics, which are used as input for the GraphSAGE model. Moreover, a fusion layer combines the original Node2Vec embeddings with the GraphSAGE-influenced representations, resulting in a dual-perspective embedding space. Such a fusion captures both topological and semantic properties of the graph, enabling the model to exploit informative features that may exist in the dataset but that are not explicitly represented. Our approach improves downstream task performance, especially in scenarios with poor initial features, laying the basis for more accurate and precise KG-enhanced GML models.

2511.10362 2026-06-05 cs.LG cs.SY eess.SY math.DS 版本更新

Gradient Flow Equations for Deep Linear Neural Networks: A Survey from a Network Perspective

深度线性神经网络的梯度流方程:从网络角度的综述

Joel Wendin, Claudio Altafini

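Affiliations * Department of Electrical Engineering, Linköping University

AI summary This paper surveys recent progress on the dynamics and loss landscape of the gradient flow equations of deep linear neural networks from a network perspective, that is, the gradient descent training dynamics in the limit of the step size going to 0 under quadratic loss functions. It shows that these equations form a class of converging matrix differential equations that is nilpotent, polynomial, isospectral, and endowed with conservation laws.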

Comments Manuscript accepted for publication in SIAM Review (SIREV)

详情
Journal ref
SIAM Review 68 (2026) 293-345
AI中文摘要

本文综述了深度线性神经网络梯度流方程的动力学和损失景观的最新进展,即在忽略激活函数且使用二次损失函数的情况下,深度神经网络的梯度下降训练动态(当步长趋近于0时的极限情况)。当用神经网络的邻接矩阵来表示时,这些梯度流方程形成了一类收敛的矩阵微分方程,具有幂零、多项式、等谱和守恒律等特性。损失景观被详细描述。其特征是存在无限多个全局极小值和鞍点(严格和非严格),但不存在局部极小值和局部极大值。损失函数本身是梯度流的一个半正定李雅普诺夫函数,其水平集是由临界点构成的无界不变集,其临界值对应于梯度沿特定轨迹学习到的输入输出数据的奇异值数量。本文所用的邻接矩阵表示法可以突出显示商空间结构的存在,其中每个损失函数的临界值仅被表示一次,而所有其他具有相同临界值的临界点都属于与商空间相关的纤维。它还允许轻松确定鞍点处的稳定和不稳定子流形,即使海森矩阵无法给出这些结构。

英文摘要

The paper surveys recent progress in understanding the dynamics and loss landscape of the gradient flow equations associated with deep linear neural networks, i.e., the gradient descent training dynamics (in the limit when the step size goes to 0) of deep neural networks without activation functions and subject to quadratic loss functions. When formulated in terms of the adjacency matrix of the neural network, as we do in the paper, these gradient flow equations form a class of converging matrix ODEs which is nilpotent, polynomial, isospectral, and with conservation laws. The loss landscape is described in detail. It is characterized by infinitely many global minima and saddle points, both strict and nonstrict, but lacks local minima and maxima. The loss function itself is a positive semidefinite Lyapunov function for the gradient flow, and its level sets are unbounded invariant sets of critical points, with critical values that correspond to the number of singular values of the input-output data learnt by the gradient along a certain trajectory. The adjacency matrix representation we use in the paper allows us to highlight the existence of a quotient space structure in which each critical value of the loss function is represented only once, while all other critical points with the same critical value belong to the fiber associated to the quotient space. It also allows us to easily determine stable and unstable submanifolds at the saddle points, even when the Hessian fails to obtain them.
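
A conservation law of the kind mentioned in the abstract can be seen in the smallest possible case. The sketch below is a toy illustration, not the paper's adjacency-matrix formulation: it takes a scalar two-layer linear network with loss L = 0.5 * (w2 * w1 - target)^2 and approximates the gradient flow with small explicit-Euler steps. Along the exact flow the quantity w1^2 - w2^2 is constant; the function name, initial weights, step size, and step count are all assumptions chosen for illustration.

```python
def simulate_gradient_flow(w1, w2, target, step=1e-3, n_steps=20000):
    """Explicit-Euler approximation of gradient flow for a scalar 2-layer linear net."""
    for _ in range(n_steps):
        err = w2 * w1 - target
        # dL/dw1 = w2 * err and dL/dw2 = w1 * err; both updates use the old state.
        w1, w2 = w1 - step * w2 * err, w2 - step * w1 * err
    return w1, w2


w1_0, w2_0, target = 0.5, 1.0, 2.0
w1_T, w2_T = simulate_gradient_flow(w1_0, w2_0, target)

invariant_start = w1_0**2 - w2_0**2   # conserved along the exact flow
invariant_end = w1_T**2 - w2_T**2     # only approximately conserved by Euler steps
print(w1_T * w2_T, invariant_start, invariant_end)
```

The product w2 * w1 converges to the target while the invariant stays close to its initial value, so the limit point among the infinitely many global minima (all pairs with w1 * w2 = target) is selected by the initialization. The small drift in the invariant comes from the finite step size and vanishes in the gradient flow limit.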

2511.08972 2026-06-05 cs.LG 版本更新

Selective Sinkhorn Routing for Improved Sparse Mixture of Experts

选择性Sinkhorn路由以提高稀疏专家混合模型

Duc Anh Nguyen, Huu Binh Ta, Nhuan Le Duc, Tan Minh Nguyen, Toan Tran

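Affiliations * University of California, Berkeley; National University of Singapore

AI summary This paper proposes Selective Sinkhorn Routing, which casts token-to-expert assignment as an optimal transport problem with constraints that ensure balanced expert utilization, improving sparse mixture-of-experts models without relying on auxiliary balancing losses.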

Comments 12 pages, 5 figures

详情
AI中文摘要

稀疏专家混合模型(SMoE)模型具有可扩展性和计算效率,能够在有限的推理开销下实现模型容量的大幅增加。现有的SMoE方法通常依赖于辅助目标,如负载均衡损失和z损失,或额外的可训练组件如噪声门控。虽然这些技术鼓励专家多样性,但可能会引入目标不一致、增加模型复杂性或带来显著的训练开销,尤其是在基于Sinkhorn的路由方法中。在本文中,我们重新审视token到专家的分配问题作为最优传输问题。我们添加约束以确保专家利用率的平衡。我们证明,即使是最小的基于最优传输的路由也能在不需辅助平衡损失的情况下提升SMoE性能。与以往方法不同,我们的方法直接从传输图中推导出门控分数,从而实现更平衡和有效的token到专家分配。基于这一见解,我们引入了选择性Sinkhorn路由(SSR),一种轻量级的路由机制,它用高效的Sinkhorn路由替代了复杂的辅助损失,同时保持灵活的专家选择。在语言建模和图像分类实验中,SSR在训练效率、准确性和对输入损坏的鲁棒性方面均有所提升。

英文摘要

Sparse Mixture-of-Experts (SMoE) models are scalable and computationally efficient, enabling large increases in model capacity with limited inference overhead. Existing SMoE methods often depend on auxiliary objectives, such as load-balancing loss and z-loss, or additional trainable components such as noisy gating. While these techniques encourage expert diversity, they can introduce objective misalignment, increase model complexity, or incur substantial training overhead, especially in Sinkhorn-based routing methods. In this paper, we revisit the token-to-expert assignment as an optimal transport problem. We add constraints to ensure balanced expert utilization. We show that even minimal optimal transport-based routing improves SMoE performance without requiring auxiliary balancing losses. Unlike prior approaches, our method derives gating scores directly from the transport map, leading to more balanced and effective token-to-expert assignments. Building on this insight, we introduce Selective Sinkhorn Routing (SSR), a lightweight routing mechanism that replaces complex auxiliary losses with efficient Sinkhorn-based routing while preserving flexible expert selection. Experiments on language modeling and image classification show that SSR improves training efficiency, accuracy, and robustness to input corruption.
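
The optimal transport view of routing can be sketched with plain Sinkhorn normalization. The code below is a generic illustration under stated assumptions, not the SSR mechanism itself: the abstract does not specify the selective component or how gating scores are read from the transport map. The function name, the temperature, the iteration count, and the toy router scores are hypothetical. Rows are tokens (each with unit mass) and columns are experts (each receiving an equal share of the tokens).

```python
import math


def sinkhorn(scores, n_iters=50, temperature=1.0):
    """Balance a token-by-expert score matrix with alternating column/row scaling."""
    n_tokens, n_experts = len(scores), len(scores[0])
    max_score = max(max(row) for row in scores)  # subtracted for numerical stability
    plan = [[math.exp((s - max_score) / temperature) for s in row] for row in scores]
    col_target = n_tokens / n_experts  # equal load per expert
    for _ in range(n_iters):
        # Scale each expert column so that it carries its target load.
        for j in range(n_experts):
            col_sum = sum(plan[i][j] for i in range(n_tokens))
            for i in range(n_tokens):
                plan[i][j] *= col_target / col_sum
        # Scale each token row so that its assignment mass sums to 1.
        for i in range(n_tokens):
            row_sum = sum(plan[i])
            plan[i] = [p / row_sum for p in plan[i]]
    return plan


# Toy router scores: every token prefers expert 0 before balancing.
router_scores = [[2.0, 0.0], [2.0, 0.1], [1.5, 0.0], [1.8, 0.2]]
balanced_plan = sinkhorn(router_scores)
for row in balanced_plan:
    print([round(p, 3) for p in row])
```

After balancing, each row still sums to 1 but the two expert columns carry equal total mass, which is the load-balancing effect that auxiliary losses otherwise try to encourage indirectly.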

2511.05615 2026-06-05 cs.LG cs.AI cs.AR physics.ins-det 版本更新

wa-hls4ml: A Benchmark and Surrogate Models for hls4ml Resource and Latency Estimation

wa-hls4ml: 一个用于hls4ml资源和延迟估计的基准及替代模型

Benjamin Hawks, Jason Weitz, Dmitri Demler, Karla Tame-Narvaez, Dennis Plotnikov, Mohammad Mehdi Rahimifar, Hamza Ezzaoui Rahali, Audrey C. Therrien, Donovan Sproule, Elham E Khoda, Keegan A. Smith, Russell Marroquin, Giuseppe Di Guglielmo, Nhan Tran, Javier Duarte, Vladimir Loncar

Affiliations * Fermi National Accelerator Laboratory; University of California San Diego; Johns Hopkins University; University of Sherbrooke; Columbia University; Texas A&M University; European Organization for Nuclear Research (CERN)

AI summary This paper introduces wa-hls4ml, a benchmark for evaluating ML accelerator resource and latency estimation, together with GNN- and transformer-based surrogate models that predict the latency and resource usage of ML accelerators.

Comments 30 pages, 18 figures

详情
Journal ref
Wa-hls4ml: A Benchmark and Surrogate Models for hls4ml Resource and Latency Estimation. ACM Trans. Reconfigurable Technol. Syst. 19, 2, Article 20 (June 2026), 29 pages
AI中文摘要

随着机器学习(ML)越来越多地在硬件中实现以解决科学应用中的实时挑战,先进的工具链开发显著减少了各种设计迭代所需的时间。这些进步已经解决了主要障碍,但也暴露了新的挑战。例如,以前未被考虑的瓶颈过程,如硬件综合,现在成为设计快速迭代的限制因素。为缓解这些新兴约束,已经开展了多项努力,以开发基于ML的替代模型,以估计ML加速器架构的资源使用情况。我们介绍了wa-hls4ml,这是一个用于ML加速器资源和延迟估计的基准,以及其对应的初始数据集,包含超过680,000个全连接和卷积神经网络,均使用hls4ml合成并针对Xilinx FPGA。该基准评估了资源和延迟预测器在几种常见ML模型架构上的性能,这些架构主要来自科学领域,作为示例模型,并评估了数据集子集的平均性能。此外,我们还介绍了基于图神经网络和Transformer的替代模型,用于预测ML加速器的延迟和资源。我们展示了这些模型的架构和性能,并发现这些模型通常在合成测试数据集上对75百分位数的延迟和资源预测误差在几个百分点以内。

英文摘要

As machine learning (ML) is increasingly implemented in hardware to address real-time challenges in scientific applications, the development of advanced toolchains has significantly reduced the time required to iterate on various designs. These advancements have solved major obstacles, but also exposed new challenges. For example, processes that were not previously considered bottlenecks, such as hardware synthesis, are becoming limiting factors in the rapid iteration of designs. To mitigate these emerging constraints, multiple efforts have been undertaken to develop an ML-based surrogate model that estimates resource usage of ML accelerator architectures. We introduce wa-hls4ml, a benchmark for ML accelerator resource and latency estimation, and its corresponding initial dataset of over 680,000 fully connected and convolutional neural networks, all synthesized using hls4ml and targeting Xilinx FPGAs. The benchmark evaluates the performance of resource and latency predictors against several common ML model architectures, primarily originating from scientific domains, as exemplar models, and the average performance across a subset of the dataset. Additionally, we introduce GNN- and transformer-based surrogate models that predict latency and resources for ML accelerators. We present the architecture and performance of the models and find that the models generally predict latency and resources for the 75th percentile within several percent of the synthesized resources on the synthetic test dataset.

2410.02628 2026-06-05 cs.LG cs.AI 版本更新

Inverse Entropic Optimal Transport Solves Semi-supervised Learning via Data Likelihood Maximization

逆熵最优运输通过数据似然最大化解决半监督学习

Mikhail Persiianov, Arip Asadulaev, Nikita Andreev, Nikita Starodubcev, Dmitry Baranchuk, Anastasis Kratsios, Evgeny Burnaev, Alexander Korotin

Affiliations: Institute for Advanced Study; National Research Council Canada; University of Toronto; St. Petersburg State University; Skolkovo Institute of Science and Technology; Kazan Federal University

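AI summary: The paper proposes EBiEOT, a learning paradigm that combines paired and unpaired data through data likelihood maximization, easing the reliance on scarce paired samples in semi-supervised learning, and shows that the approach can in theory recover the true conditional distribution with arbitrarily small error.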

详情
AI中文摘要

学习条件分布π*(⋅|x)是机器学习中的核心问题,通常通过监督方法利用配对数据(x,y)∼π*进行学习。然而,获取配对数据样本往往具有挑战性,尤其是在领域翻译等问题中。这需要开发能够利用有限配对数据和额外非配对i.i.d.样本x∼π*_x和y∼π*_y的半监督模型。使用此类结合数据复杂且常依赖启发式方法。为此,我们提出了一种新的学习范式称为EBiEOT,利用数据似然最大化技术无缝整合配对和非配对数据。我们证明了该方法与逆熵最优运输(OT)有奇妙的联系。这一发现使我们能够应用最近的计算OT进展,建立一个端到端的学习算法来获得π*(⋅|x)。此外,我们推导了通用逼近性质,证明该方法在理论上可以以任意小的误差恢复真实条件分布。最后,我们通过实验证明,我们的方法能够同时利用配对和非配对数据有效学习条件分布。EBiEOT的代码可在https://github.com/MuXauJl11110/EBiEOT上获得。

英文摘要

Learning conditional distributions $π^*(\cdot|x)$ is a central problem in machine learning, which is typically approached via supervised methods with paired data $(x,y) \sim π^*$. However, acquiring paired data samples is often challenging, especially in problems such as domain translation. This necessitates the development of $\textit{semi-supervised}$ models that utilize both limited paired data and additional unpaired i.i.d. samples $x \sim π^*_x$ and $y \sim π^*_y$ from the marginal distributions. The usage of such combined data is complex and often relies on heuristic approaches. To tackle this issue, we propose a new learning paradigm called $\textbf{EBiEOT}$ that integrates both paired and unpaired data seamlessly using data likelihood maximization techniques. We demonstrate that our approach also connects intriguingly with inverse entropic optimal transport (OT). This finding allows us to apply recent advances in computational OT to establish an $\textit{end-to-end}$ learning algorithm to get $π^*(\cdot|x)$. In addition, we derive the universal approximation property, demonstrating that our approach can theoretically recover true conditional distributions with arbitrarily small error. Finally, we demonstrate through empirical tests that our method effectively learns conditional distributions using paired and unpaired data simultaneously. The code of $\texttt{EBiEOT}$ is available at https://github.com/MuXauJl11110/EBiEOT.

2505.15405 2026-06-05 cs.LG 版本更新

HOPSE: Scalable Higher-Order Positional and Structural Encoder for Combinatorial Representations

HOPSE:可扩展的高阶位置和结构编码器用于组合表示

Guillermo Bernárdez, Marco Montagna, Louis Van Langendonck, Martin Carrasco, Amirreza Akbari, Louisa Cornelis, Mathilde Papillon, Pere Barlet-Ros, Nina Miolane, Lev Telyatnikov

Affiliations: University of California, Santa Barbara; Sapienza University of Rome; Universitat Politècnica de Catalunya; University of Fribourg; Aalto University; Intelligent Maintenance and Operations Systems, EPFL

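AI summary: The paper proposes HOPSE, a framework without message-passing layers that uses Hasse graph decompositions to build efficient, expressive encodings over arbitrary higher-order domains. It scales linearly with the size of the combinatorial representation while preserving the expressive power and permutation equivariance of HOMP approaches, and experiments on molecular and topological benchmarks show matching or better performance with speedups over HOMP-based models.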

详情
AI中文摘要

尽管图神经网络(GNNs)在建模关系数据方面表现出色,但成对连接无法自然捕捉复杂现实系统中多向关系。为此,拓扑深度学习(TDL)利用更一般的组合表示——如单纯复形或细胞复形——来容纳高阶交互。现有TDL方法通常通过高阶消息传递(HOMP)扩展GNNs,但因传播消息通过组合结构的复杂度高而面临关键的可扩展性挑战。为克服这一限制,我们提出了HOPSE(高阶位置和结构编码器),一种无需消息传递层的框架,利用Hasse图分解在任意高阶域上生成高效且表达能力强的编码。值得注意的是,HOPSE在组合表示规模上呈线性增长,同时保持HOMP方法的表达能力和排列等价性。在分子和拓扑基准上的实验表明,它在匹配或超越最先进性能的同时,始终在HOMP基于模型上实现速度提升,为可扩展的TDL开辟了新路径。代码可在https://github.com/geometric-intelligence/topobench.git获取。

英文摘要

While Graph Neural Networks (GNNs) have proven highly effective at modeling relational data, pairwise connections cannot fully capture multi-way relationships naturally present in complex real-world systems. In response to this, Topological Deep Learning (TDL) leverages more general combinatorial representations--such as simplicial or cellular complexes--to accommodate higher-order interactions. Existing TDL methods often extend GNNs through Higher-Order Message Passing (HOMP), but face critical scalability challenges due to the steep complexity overhead of propagating messages through combinatorial structures. To overcome this limitation, we propose HOPSE (Higher-Order Positional and Structural Encoder), a framework free of message passing layers that uses Hasse graph decompositions to derive efficient and expressive encodings over arbitrary higher-order domains. Notably, HOPSE scales linearly with the size of combinatorial representations while preserving the expressive power and permutation equivariance of the HOMP approaches. Experiments on molecular and topological benchmarks show that it matches or surpasses state-of-the-art performance while consistently achieving speedups over HOMP-based models, opening a new path for scalable TDL. The code is available at https://github.com/geometric-intelligence/topobench.git.

2510.15814 2026-06-05 stat.ML cs.LG 版本更新

On Universality of Deep Equivariant Networks

关于深度等变网络的通用性

Marco Pacini, Mircea Petrache, Bruno Lepri, Shubhendu Trivedi, Robin Walters

发表机构 * University of Trento(特伦托大学) Fondazione Bruno Kessler(布鲁诺·凯瑟勒基金会) PUC Chile(智利天主教大学) Northeastern University(东北大学)

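AI summary: The paper studies the universality of equivariant neural networks. For invariant networks, adding a fully connected readout layer secures approximation within the class of separation-constrained continuous functions. For equivariant networks, it introduces the sharper criterion of entry-wise separability and shows that sufficient depth or appropriate readout layers yield universality within the entry-wise separable regime.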

Comments Published as a conference paper at ICLR 2026

详情
Journal ref
International Conference on Learning Representations (ICLR), 2026
AI中文摘要

对于等变神经网络的通用性结果仍然很少。已有的结果通常仅在受限的设置中成立:要么依赖于常规或高阶张量表示,导致隐藏空间维度过高,要么针对专门的架构,通常局限于不变设置。本文提出了一种更一般性的结论。对于不变网络,我们在分离约束下建立了通用性定理,证明添加全连接读出层可使连续函数的近似在分离约束下实现。对于等变网络,其中结果更为稀少,我们证明标准分离性概念不足,并引入更严格的逐元素分离性准则。我们证明在足够深度或添加适当读出层的情况下,等变网络可在逐元素分离性范围内实现通用性。结合先前结果表明浅层模型无法实现通用性,我们的发现将深度和读出层识别为通用性的关键机制,同时提供了一个统一的视角,涵盖了并扩展了先前专门的结果。

英文摘要

Universality results for equivariant neural networks remain rare. Those that do exist typically hold only in restrictive settings: either they rely on regular or higher-order tensor representations, leading to impractically high-dimensional hidden spaces, or they target specialized architectures, often confined to the invariant setting. This work develops a more general account. For invariant networks, we establish a universality theorem under separation constraints, showing that the addition of a fully connected readout layer secures approximation within the class of separation-constrained continuous functions. For equivariant networks, where results are even scarcer, we demonstrate that standard separability notions are inadequate and introduce the sharper criterion of $\textit{entry-wise separability}$. We show that with sufficient depth or with the addition of appropriate readout layers, equivariant networks attain universality within the entry-wise separable regime. Together with prior results showing the failure of universality for shallow models, our findings identify depth and readout layers as a decisive mechanism for universality, additionally offering a unified perspective that subsumes and extends earlier specialized results.

2510.05544 2026-06-05 cs.CL cs.LG 版本更新

Activation-Informed Pareto-Guided Low-Rank Compression for Efficient LLM/VLM

基于激活信息的帕累托引导低秩压缩用于高效LLM/VLM

Ryan Solgi, Parsa Madinei, Jiayi Tian, Rupak Swaminathan, Jing Liu, Nathan Susanj, Zheng Zhang

发表机构 * University of California-Santa Barbara(加州大学圣芭芭拉分校) Amazon(亚马逊)

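AI summary: The paper proposes an activation-informed, Pareto-guided low-rank compression method (PGSVD) that pairs a theoretical bound on the loss change with a rank-selection algorithm, giving better accuracy at the same compression level, along with inference speedup, for LLMs and VLMs.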

详情
AI中文摘要

大型语言模型(LLM)和视觉-语言模型(VLM)已取得最先进的性能,但在部署时带来了显著的内存和计算挑战。我们提出了一种新颖的低秩压缩框架来解决这一挑战。首先,我们通过层间激活基于的压缩误差上界来限制网络损失的变化,填补了文献中的理论空白。然后,我们将低秩模型压缩公式化为双目标优化问题,并证明单一统一容忍度会产生替代帕累托最优的异质秩。基于我们的理论洞察,我们提出帕累托引导奇异值分解(PGSVD),一种零样本流程,通过帕累托引导的秩选择和交替最小二乘实现来改进激活感知压缩。我们将PGSVD应用于LLM和VLM,显示在相同压缩水平下具有更好的准确性和推理加速。

英文摘要

Large language models (LLM) and vision-language models (VLM) have achieved state-of-the-art performance, but they impose significant memory and computing challenges in deployment. We present a novel low-rank compression framework to address this challenge. First, we upper bound the change of network loss via layer-wise activation-based compression errors, filling a theoretical gap in the literature. We then formulate low-rank model compression as a bi-objective optimization and prove that a single uniform tolerance yields surrogate Pareto-optimal heterogeneous ranks. Based on our theoretical insights, we propose Pareto-Guided Singular Value Decomposition (PGSVD), a zero-shot pipeline that improves activation-aware compression via Pareto-guided rank selection and alternating least-squares implementation. We apply PGSVD to both LLM and VLM, showing better accuracy at the same compression levels and inference speedup.
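The claim that a single uniform tolerance leads to different ranks per layer can be illustrated with a minimal sketch. It uses a plain truncated SVD with a relative Frobenius-norm tolerance on the weights; this is a simplifying assumption, since the paper's PGSVD works with activation-based errors and an alternating least-squares step that are not reproduced here. The function names are hypothetical.

```python
import numpy as np

def rank_for_tolerance(weight, tol):
    """Smallest rank r whose truncated SVD has relative Frobenius error <= tol."""
    s = np.linalg.svd(weight, compute_uv=False)
    total = np.sum(s ** 2)
    # tail_energy[r] = sum of squared singular values discarded when keeping rank r
    tail_energy = np.concatenate([np.cumsum((s ** 2)[::-1])[::-1], [0.0]])
    rel_err = np.sqrt(tail_energy / total)
    return int(np.argmax(rel_err <= tol))  # index of the first rank meeting the tolerance

def truncate(weight, rank):
    """Best rank-`rank` approximation of `weight` in the Frobenius norm."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

# Two toy "layers": one with a fast-decaying spectrum, one with a flat spectrum.
layer_a = np.diag([10.0, 1.0, 0.1, 0.01])
layer_b = np.eye(4)
ranks = [rank_for_tolerance(w, tol=0.2) for w in (layer_a, layer_b)]
print(ranks)
```

The same tolerance compresses the fast-decaying layer aggressively and leaves the flat-spectrum layer at full rank, which is the heterogeneous-rank behavior the abstract describes.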

2510.04500 2026-06-05 cs.LG 版本更新

Expand Neurons, Not Parameters

扩展神经元,而非参数

Linghao Kong, Inimai Subramanian, Yonadav Shavit, Micah Adler, Dan Alistarh, Nir Shavit

发表机构 * University of Washington(华盛顿大学) Microsoft Research(微软研究院)

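AI summary: Increasing the number of neurons without increasing the total number of non-zero parameters reduces interference between features and improves network performance; the effect is validated across several model types.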

Comments Accepted to the 43rd International Conference on Machine Learning (ICML 2026). 9 pages, 6 figures. Code available at https://github.com/Shavit-Lab/Expand-Neurons

详情
AI中文摘要

本工作展示了如何在不增加网络非零参数总数的情况下,通过增加神经元数量来提升性能。我们证明,这种提升对应于多个特征之间干扰的减少,否则这些特征将共享相同的神经元。在符号布尔任务中,根据子句知识将每个神经元分割成更稀疏的子神经元,系统性地降低了多语义性指标,并获得了更高的任务准确率。值得注意的是,即使是神经元权重的随机分割也能近似这些增益,表明减少冲突(而非精确分配)是主要驱动因素。与叠加假说一致,该框架的收益随着干扰的增加而增长:当多语义负载较高时,准确率提升最大。将这些见解迁移到更现实的模型中,包括基于CLIP嵌入的分类器、卷积神经网络和更深的多层网络,我们发现,在保持非零参数数量不变的情况下加宽网络,持续提高了准确率。这些结果确定了一种基于可解释性的机制,利用宽度来对抗叠加,从而在不增加非零参数数量的情况下提升性能。这种方向与现代加速器非常匹配,因为在这些加速器中,非零参数的内存移动(而非原始计算)通常是主要瓶颈。

英文摘要

This work demonstrates how increasing the number of neurons in a network without increasing its total number of non-zero parameters improves performance. We show that this gain corresponds with a decrease in interference between multiple features that would otherwise share the same neurons. On symbolic Boolean tasks, splitting each neuron into sparser sub-neurons with knowledge of the clauses systematically reduces polysemanticity metrics and yields higher task accuracy. Notably, even random splits of neuron weights approximate these gains, indicating that reduced collisions, not precise assignment, are a primary driver. Consistent with the superposition hypothesis, the benefits of this framework grow with increasing interference: when polysemantic load is high, accuracy improvements are the largest. Transferring these insights to more realistic models, including classifiers over CLIP embeddings, convolutional neural networks, and deeper multilayer networks, we find that widening networks while maintaining a constant non-zero parameter count consistently increases accuracy. These results identify an interpretability-grounded mechanism to leverage width against superposition, improving performance without increasing the number of non-zero parameters. Such a direction is well matched to modern accelerators, where memory movement of non-zero parameters, rather than raw compute, is often a dominant bottleneck.
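The random-split idea can be sketched as follows, assuming each neuron is a row of a dense weight matrix and each input weight is assigned to exactly one of `k` sub-neurons. The function name and the uniform random assignment are illustrative assumptions; the paper's clause-aware splitting is not shown.

```python
import numpy as np

def split_neurons(weight, k, rng):
    """Split each row (neuron) of `weight` into k sub-neurons with disjoint supports.

    Sub-neurons of neuron i occupy rows i*k .. i*k + k - 1 of the result.
    """
    n_out, n_in = weight.shape
    # assign[i, j] in {0, ..., k-1}: which sub-neuron of neuron i keeps input weight j
    assign = rng.integers(0, k, size=(n_out, n_in))
    expanded = np.zeros((n_out * k, n_in), dtype=weight.dtype)
    for part in range(k):
        expanded[part::k] = np.where(assign == part, weight, 0.0)
    return expanded

rng = np.random.default_rng(0)
dense = rng.normal(size=(4, 6))
wide = split_neurons(dense, k=3, rng=rng)
print(dense.shape, "->", wide.shape, "non-zeros:", np.count_nonzero(wide))
```

The widened layer has three times as many neurons but the same number of non-zero weights. Because each sub-neuron applies its own nonlinearity, the widened network is not functionally identical to the original one; the sketch only shows the parameter accounting.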

2508.09697 2026-06-05 cs.LG cs.CV 版本更新

Towards Label-Noise Resistant Learning via Optimal Brain Damage Masking

通过最优脑损伤遮蔽实现抗标签噪声学习

Xinlei Zhang, Fan Liu, Chuanyi Zhang, Fan Cheng, Qian Li, Yuhui Zheng

发表机构 * Hohai University(河海大学)

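AI summary: The paper proposes a label-noise-resistant learning method based on Optimal Brain Damage theory, which masks redundant connections to reduce the propagation of noisy gradients and improve robustness.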

详情
AI中文摘要

噪声标签在现实世界中不可避免。由于深度神经网络强大的记忆能力,这些噪声标签会导致显著的性能下降。现有的噪声鲁棒方法主要集中在鲁棒损失函数和样本选择上,对动态架构适应的探索相对有限。本文重新审视了标签噪声存在下模型连接的作用。直观上,噪声标签引起的性能下降源于噪声梯度的反向传播。由于最终分类器层是这种误差传播的主要通道,直接丢弃分类器中的冗余连接可以在根源上截断噪声梯度。为了识别这些冗余连接,我们利用模型压缩中的经典最优脑损伤(OBD)理论,该理论指出造成微小损失扰动的参数可以安全移除而不影响性能。基于这一原则,我们发现遮蔽低激活边可以保持网络的正常拟合能力,同时有效降低噪声梯度传播的风险。为了将这一理论洞察与实际训练相结合,我们提出了一种新的选择性边遮蔽(SEM)机制,用于广泛采用的全连接(FC)层,以增强模型对噪声标签的鲁棒性。SEM可以自适应地只保留最重要的边用于信息传播,同时抑制由噪声标签引起的梯度误差。作为插件式组件,SEM可以无缝集成到各种噪声鲁棒方法中,包括鲁棒损失函数和样本选择。在合成和现实世界基准上的广泛评估表明,我们的OBD驱动方法在性能上始终优于最先进的方法。

英文摘要

Noisy labels are inevitable in real-world scenarios. Due to the strong capacity of deep neural networks to memorize corrupted labels, these noisy labels cause significant performance degradation. Existing noise-robust methods have mainly focused on robust loss functions and sample selection, with comparatively limited exploration of dynamic architectural adaptation. In this paper, we rethink the role of model connectivity in the presence of label noise. Intuitively, performance degradation caused by noisy labels stems from the backpropagation of noisy gradients. Since the final classifier layer acts as the primary gateway for this error propagation, directly discarding redundant connections within the classifier can structurally intercept noisy gradients at the root. Consequently, to identify these redundant connections, we leverage the seminal Optimal Brain Damage (OBD) theory from model compression, which posits that parameters causing negligible loss perturbation can be safely removed without impairing performance. Guided by this principle, we reveal that masking low-activation edges maintains the network's normal fitting capacity while effectively reducing the risk of backpropagating noisy gradients. To bridge this theoretical insight with practical training, we propose a novel Selective Edge Masking (SEM) mechanism for the widely-adopted fully connected (FC) layer to enhance model robustness against noisy labels. It can adaptively preserve only the most critical edges for information propagation while suppressing gradient errors caused by noisy labels. As a plug-and-play component, SEM can be seamlessly integrated into various noise-robust methods, including robust loss functions and sample selection. Extensive evaluations on both synthetic and real-world benchmarks demonstrate that our OBD-driven approach consistently outperforms state-of-the-art methods.
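A rough sketch of masking low-activation edges in a fully connected layer is given below. It assumes that the "activation" of an edge is the magnitude of its contribution `w_ij * x_j` and that a fixed fraction of edges is kept per sample; both choices, and the function name, are assumptions made for illustration and may differ from the SEM mechanism in the paper.

```python
import numpy as np

def masked_fc_forward(x, weight, keep_ratio):
    """FC forward pass that keeps only the highest-|w_ij * x_j| edges per sample."""
    # contrib[b, i, j] = contribution of input j to output i for sample b
    contrib = weight[None, :, :] * x[:, None, :]
    flat = np.abs(contrib).reshape(len(x), -1)
    n_keep = max(1, int(round(keep_ratio * flat.shape[1])))
    # Per-sample threshold: the n_keep-th largest edge magnitude
    thresh = np.partition(flat, -n_keep, axis=1)[:, -n_keep]
    mask = np.abs(contrib) >= thresh[:, None, None]
    return (contrib * mask).sum(axis=2), mask

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 6))       # batch of 4 samples, 6 features
classifier = rng.normal(size=(3, 6))     # 3 classes
logits, edge_mask = masked_fc_forward(features, classifier, keep_ratio=0.5)
print(logits.shape, edge_mask.sum(axis=(1, 2)))
```

In an autodiff framework, edges removed by the mask would receive zero gradient for that sample, which is the intuition behind intercepting noisy gradients at the classifier.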

2509.25397 2026-06-05 cs.SE cs.AI cs.LG 版本更新

A Cartography of Open Collaboration in Open Source AI: Mapping Practices, Motivations, and Governance in 14 Open Large Language Model Projects

开源人工智能中开放协作的图谱:映射14个开源大语言模型项目的实践、动机与治理

Johan Linåker, Cailean Osborne, Jennifer Ding, Ben Burtenshaw

发表机构 * RISE Research Institutes of Sweden AB(瑞典RISE研究机构) University of Oxford(牛津大学)

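AI summary: Based on an analysis of open collaboration across the development and reuse lifecycle of 14 open large language model projects, the paper documents the diversity of collaboration methods, motivations, and governance structures, and concludes that openness in open source AI is not a uniform property but an emergent outcome of how collaboration is organised across interconnected artefact domains, lifecycle stages, and institutional contexts.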

Comments In submission

详情
AI中文摘要

开源大语言模型(LLMs)的普及正在推动人工智能(AI)领域形成一个活跃的生态系统。然而,开发开源LLMs所使用的协作方法,在其公开发布前后仍未被系统研究,这限制了我们对开源LLM项目如何启动、组织和治理的理解,以及进一步促进这一生态系统的机会。我们通过探索性分析开源LLMs的开发与再利用生命周期中的开放协作,基于对14个不同开源LLM项目开发者的半结构化访谈。这些协作跨越多个艺术领域——包括模型、数据、软件、评估、计算和社区参与——每个领域都使不同的参与形式成为可能,并涉及不同的利益相关者,这些利益相关者在LLM开发生命周期中不断演变,从早期的集中、选择性参与转变为模型发布后的广泛、分散参与。开源LLM开发者受多种社会、经济和技术动机驱动,从民主化AI访问和促进开放科学到构建区域生态系统和扩展语言代表性。这些动态通过一系列治理结构协调,通常在不同程度上正式和专业化,包括以公司为中心的集中努力到去中心化的基层倡议。我们通过一个概念模型综合了我们的发现,提供了实践建议,并得出结论:开源AI的开放性并非单一属性,而是协作在互联艺术领域、生命周期阶段和制度背景下的组织方式的涌现结果。

英文摘要

The proliferation of open large language models (LLMs) is fostering a vibrant ecosystem in artificial intelligence (AI). However, the methods of collaboration used to develop open LLMs, both before and after their public release, have not yet been systematically studied, limiting our understanding of how open LLM projects are initiated, organised, and governed, as well as the opportunities to further foster this ecosystem. We address this gap through an exploratory analysis of open collaboration throughout the development and reuse lifecycle of open LLMs, drawing on semi-structured interviews with the developers of 14 diverse open LLM projects. These collaborations span multiple artefact domains -- including models, data, software, evaluation, compute, and community engagement -- each enabling distinct forms of participation and involving different stakeholders. Participation evolves across the LLM development lifecycle, shifting from concentrated, selective engagement in the early stages to broader, distributed participation after model release. The open LLM developers are driven by a variety of social, economic, and technological motivations, ranging from democratising access to AI and promoting open science to building regional ecosystems and expanding language representation. These dynamics are coordinated through a range of governance structures, typically formal and professionalised to varying degrees, ranging from centralised company-led efforts to decentralised grassroots initiatives. We synthesise our findings in a conceptual model of open collaboration in open LLM ecosystems, provide recommendations for practice, and conclude that openness in open source AI is not a uniform property but an emergent outcome of how collaboration is organised across interconnected artefact domains, lifecycle stages, and institutional contexts.

2509.22015 2026-06-05 cs.LG 版本更新

Concept-SAE: A Controllable and Invertible Concept Interface for Sparse Autoencoders

Concept-SAE: 一种可控且可逆的概念接口用于稀疏自编码器

Jianrong Ding, Muxi Chen, Chenchen Zhao, Qiang Xu

发表机构 * The Chinese University of Hong Kong(香港中文大学)

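AI summary: The paper proposes Concept-SAE, a framework that probes user-defined concepts through a structured, controllable interface. By decomposing an activation subspace into Concept Tokens and Free Tokens, it yields high-fidelity, well-localized, and strongly disentangled concept representations that outperform alternatives.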

Comments Accepted by ECML PKDD 2026, the project can be found at https://github.com/RafaDD/Concept-SAE

详情
AI中文摘要

标准稀疏自编码器(SAEs)在发现模型学习的字典方面表现出色,为被动特征发现提供了强大视角。然而,这种被动性质使得系统评估或分析用户关心的概念变得困难。我们引入Concept-SAE,一种通过结构化可控接口扩展SAEs的框架,用于探测用户定义的概念。Concept-SAE将激活子空间分解为两个正交部分:概念令牌,通过双监督在概念存在和空间定位上对齐外部指定语义;自由令牌,像标准SAEs一样捕捉所有剩余信息。这种混合解耦策略确保概念令牌忠实、空间接地且与残差子空间清洁分离,同时保留SAEs对开放概念发现的能力。我们进行了广泛的实验,证明Concept-SAE产生高保真、局部化强且解耦的概念表示,优于替代方法。最后,我们通过三个诊断评估验证该概念接口的实用性:对对抗图像样本的分类检测测试、聚焦于可控反事实编辑的可控性测试以及使用对抗扰动的稳定性测试。这些结果表明,Concept-SAE为SAEs提供了一种可靠的机制,用于评估、探测和诊断用户定义的概念。

英文摘要

Standard Sparse Autoencoders (SAEs) excel at discovering a dictionary of a model's learned features, providing a powerful lens for passive feature discovery. However, this passive nature makes it difficult to systematically evaluate or analyze concepts that users explicitly care about. We introduce Concept-SAE, a framework that augments SAEs with a structured and controllable interface for probing user-defined concepts. Concept-SAE decomposes an activation subspace into two orthogonal components: Concept Tokens, which are aligned to externally specified semantics through dual supervision on both concept existence and spatial localization, and Free Tokens, which operate like standard SAEs to capture all remaining information. This hybrid disentanglement strategy ensures that Concept Tokens are faithful, spatially grounded, and cleanly separated from the residual subspace while preserving the ability of SAEs for open-ended concept discovery. We conduct extensive experiments demonstrating that Concept-SAE yields high-fidelity, well-localized, and strongly disentangled concept representations, outperforming alternatives in interface quality. Finally, we validate the utility of this conceptual interface through three diagnostic evaluations: a detection test on classifying adversarial image samples, a controllability test focusing on controlled counterfactual editing and a stability test using adversarial perturbations. Together, these results show that Concept-SAE equips SAEs with a reliable mechanism for evaluating, probing, and diagnosing user-defined concepts.

2509.02971 2026-06-05 stat.ML cs.LG cs.NA math.NA math.PR 版本更新

Scale-Adaptive Generative Flows for Multiscale Scientific Data

多尺度科学数据的自适应生成流

Yifan Chen, Eric Vanden-Eijnden

发表机构 * Department of Mathematics, University of California, Los Angeles(加州大学洛杉矶分校数学系) Machine Learning Lab, Capital Fund Management(资本基金管理有限公司机器学习实验室) Courant Institute, New York University(纽约大学柯朗研究所)

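AI summary: The paper addresses the numerical challenges that flow-based generative models face on data with multiscale Fourier spectra by designing the noise distribution and the interpolation schedule, producing high-fidelity samples at lower computational cost.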

详情
AI中文摘要

基于流的生成模型在处理具有多尺度傅里叶谱的科学数据时常常面临数值挑战,通常在细尺度上产生较大的误差。我们通过在流匹配和随机插值框架内,通过噪声分布和插值计划的原理性设计来解决这个问题。在函数空间中工作可以确保生成模型在分辨率细化时仍然定义良好;漂移的Lipschitz正则性对这种函数空间的良定义性和固定分辨率下的积分成本都很重要。核心观察是噪声应至少与目标分布一样粗糙——通过傅里叶谱衰减来衡量——以保持Lipschitz常数有限。对于已知细尺度结构的高斯和近高斯目标,匹配谱噪声比标准白噪声选择更有效。对于更复杂的非高斯目标,匹配谱噪声可能不足以应对噪声比数据粗糙时出现的终端时间刚性问题,我们提出自适应插值计划来缓解这种情况。在合成高斯随机场和随机Allen-Cahn和Navier-Stokes方程不变测度上的数值实验展示了该方法,并证明了其在传统方法基础上以更低计算成本生成高质量样本的能力。

英文摘要

Flow-based generative models can face numerical challenges on scientific data with multiscale Fourier spectra, often producing large errors at fine scales. We approach this problem within the flow matching and stochastic interpolants framework, through the principled design of noise distributions and interpolation schedules. Working in function space ensures that the generative model remains well defined as the resolution is refined; the Lipschitz regularity of the drift is important to both this function-space well-posedness and the integration cost at fixed resolution. The central observation is that the noise should be at least as rough as the target distribution -- measured by Fourier-spectrum decay -- in order to keep the Lipschitz constant finite. For Gaussian and near-Gaussian targets whose fine-scale structure is known, matched-spectrum noise improves numerical efficiency over standard white-noise choices. For more complex non-Gaussian targets, matched-spectrum noise may not be sufficient, and we propose scale-adaptive interpolation schedules to mitigate the terminal-time stiffness that arises when the noise is rougher than the data. Numerical experiments on synthetic Gaussian random fields and on invariant measures of the stochastic Allen--Cahn and Navier--Stokes equations illustrate the approach and demonstrate its ability to generate high-fidelity samples at lower computational cost than traditional approaches.
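The notion of noise whose roughness is set by its Fourier-spectrum decay can be sketched in one dimension. The example below samples a periodic Gaussian field with amplitudes decaying like `|k|^(-decay)`; the power-law form, the `decay` parameter, and the function name are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def spectral_gaussian_noise(n, decay, rng):
    """Sample a real periodic 1D Gaussian field with Fourier amplitudes ~ |k|^(-decay).

    decay = 0 gives a flat (white-like) spectrum; larger decay gives smoother noise.
    """
    k = np.fft.rfftfreq(n, d=1.0 / n)      # integer wavenumbers 0, 1, ..., n/2
    amp = np.zeros_like(k)
    amp[1:] = k[1:] ** (-decay)            # drop the k = 0 mode so the field has zero mean
    z = rng.standard_normal(k.size) + 1j * rng.standard_normal(k.size)
    return np.fft.irfft(amp * z, n=n)

white = spectral_gaussian_noise(256, decay=0.0, rng=np.random.default_rng(0))
smooth = spectral_gaussian_noise(256, decay=2.0, rng=np.random.default_rng(0))
print(white.std(), smooth.std())
```

Under the abstract's criterion, one would pick a decay rate no faster than that of the target data, so that the noise is at least as rough as the target.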

2508.19006 2026-06-05 q-fin.PR cs.LG econ.EM q-fin.CP 版本更新

Is attention truly all we need? An empirical study of asset pricing in pretrained RNN sparse and global attention models

注意力真的全部我们需要吗?对预训练RNN稀疏和全局注意力模型在资产定价中的实证研究

Shanyan Lai

Affiliations: Department of Economics and Related Studies, University of York

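AI summary: The paper studies pre-trained RNN attention models for asset pricing, examining how attention mechanisms improve the capture of temporal dependency and long memory, and how stable the models are under different market conditions.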

Comments 72 pages including appendix

详情
AI中文摘要

本研究探讨了主流注意力机制,如加权注意力、Luong的三种注意力、全局自注意力和滑动窗口稀疏注意力,在顶级420只大型美国股票上的实证资产定价研究。这是首次将大规模最先进的(SOTA)注意力机制应用于资产定价领域。这些模型克服了传统机器学习资产定价方法的局限性,如误捕时间依赖性和短期记忆。此外,注意力机制中的强制因果掩码解决了未来数据泄漏问题,而这一问题被更先进的注意力模型如经典Transformer所忽视。所提出的注意力模型还考虑了资产定价数据的时间稀疏性,并通过部署简化模型结构来缓解潜在的过拟合问题。本文为未来实证经济研究提供了某些见解。所有模型均在三个时期内进行测试,涵盖新冠前、新冠期间和新冠后一年,以测试这些模型在极端市场条件下的稳定性。研究发现,在价值加权投资组合回测中,全局自注意力模型和滑动窗口稀疏注意力模型在获得绝对收益和对冲下行风险方面表现出色,在新冠期间静态交易成本情景下,它们分别实现了2.0和1.80的年化Sortino比率。此外,从绝对投资组合收益的角度来看,滑动窗口稀疏注意力模型在股票市值大小方面比全局自注意力模型表现更加稳定。

英文摘要

This study investigates pre-trained RNN attention models with mainstream attention mechanisms, such as additive attention, Luong's three attentions, global self-attention, and sliding-window sparse attention, for empirical asset pricing research on the top 420 large-cap US stocks. This is the first paper to apply large-scale state-of-the-art (SOTA) attention mechanisms in the asset pricing context. These models overcome limitations of traditional machine-learning-based asset pricing, such as mis-captured temporal dependency and short memory. Moreover, the enforced causal masks in the attention mechanisms address the future data leakage issue ignored by more advanced attention-based models, such as the classic Transformer. The proposed attention models also account for the temporal sparsity of asset pricing data and mitigate potential overfitting by using simplified model structures. This provides some insights for future empirical economic research. All models are examined over three periods, covering pre-COVID-19, COVID-19, and one year post-COVID-19, to test their stability under extreme market conditions. The study finds that, in value-weighted portfolio backtesting, the global self-attention model and the sliding-window sparse attention model are highly capable of generating absolute returns and hedging downside risk, achieving annualized Sortino ratios of 2.0 and 1.80, respectively, in the COVID-19 period under the static transaction cost scenario. Moreover, the sliding-window sparse attention model performs more stably than the global self-attention model in terms of absolute portfolio returns across stocks' market capitalization sizes.
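The causal and sliding-window masks mentioned in the abstract can be sketched generically. The helper names and the window convention (the current step counts toward the window) are assumptions made for illustration and are not taken from the paper's implementation.

```python
import numpy as np

def causal_mask(t):
    """mask[i, j] is True when step i may attend to step j (j <= i)."""
    return np.tril(np.ones((t, t), dtype=bool))

def sliding_window_mask(t, window):
    """Causal mask restricted to the `window` most recent steps, current step included."""
    idx = np.arange(t)
    lag = idx[:, None] - idx[None, :]
    return (lag >= 0) & (lag < window)

def masked_softmax(scores, mask):
    """Row-wise softmax that assigns zero weight to masked-out positions."""
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

scores = np.random.default_rng(0).normal(size=(5, 5))   # toy attention scores
attn = masked_softmax(scores, sliding_window_mask(5, window=2))
print(np.round(attn, 2))
```

Because every row keeps at least its own diagonal entry, the softmax is always well defined, and no position receives weight from a later time step, which is what prevents look-ahead leakage in a backtest.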

2307.05284 2026-06-05 cs.LG cs.AI 版本更新

Rethinking Distribution Shifts: Empirical Analysis and Modeling for Tabular Data

重新思考分布偏移:针对表格数据的经验分析与建模

Tianyu Wang, Jiashuo Liu, Peng Cui, Hongseok Namkoong

发表机构 * Department of Industrial Engineering and Operations Research(工业工程与运筹学系) Department of Computer Science and Technology(计算机科学与技术系) Decision, Risk, and Operations Division(决策、风险与运营部) Columbia University(哥伦比亚大学) Tsinghua University(清华大学)

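AI summary: Through empirical analysis and modeling, the paper revisits distribution shifts and finds that Y|X-shifts are the most prevalent in tabular data, in stark contrast to the ML literature's focus on X (covariate) shifts, and that robust algorithms perform no better than vanilla methods.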

Comments Forthcoming at Management Science. Conference version appeared in NeurIPS 2023, previously titled "On the Need for a Language Describing Distribution Shifts: Illustrations on Tabular Datasets"

详情
AI中文摘要

不同的分布偏移需要不同的干预措施,算法必须基于其解决的具体偏移类型来构建。然而,稳健算法的方法学发展通常依赖于缺乏实证验证的结构性假设。本文倡导一种以实证为基础的数据驱动方法来开发算法,构建了一个包含8个表格数据集中的自然偏移、172个分布对、45种方法和90,000种方法配置的实证测试平台,涵盖了经验风险最小化和分布鲁棒优化(DRO)方法。我们发现Y|X偏移在我们的测试平台中最为普遍,这与机器学习文献中对X(协变量)偏移的高度重视形成鲜明对比,并且稳健算法的性能并不优于普通方法。为了理解原因,我们深入分析了DRO方法,发现被忽视的实现细节——如底层模型类(例如LightGBM)的选择和超参数选择——对性能的影响比模糊集或其半径更大。通过案例研究,我们展示了如何通过数据驱动的归纳理解分布偏移,提供了一种新的算法开发方法。

英文摘要

Different distribution shifts require different interventions, and algorithms must be grounded in the specific shifts they address. However, methodological development for robust algorithms typically relies on structural assumptions that lack empirical validation. Advocating for an empirically grounded data-driven approach to algorithm development, we build an empirical testbed comprising natural shifts across 8 tabular datasets, 172 distribution pairs over 45 methods and 90,000 method configurations encompassing empirical risk minimization and distributionally robust optimization (DRO) methods. We find $Y|X$-shifts are most prevalent in our testbed, in stark contrast to the heavy focus on $X$ (covariate)-shifts in the ML literature, and that the performance of robust algorithms is no better than that of vanilla methods. To understand why, we conduct an in-depth empirical analysis of DRO methods and find that overlooked implementation details -- such as the choice of underlying model class (e.g., LightGBM) and hyperparameter selection -- have a bigger impact on performance than the ambiguity set or its radius. We illustrate via case studies how a data-driven, inductive understanding of distribution shifts can provide a new approach to algorithm development.

2508.10555 2026-06-05 physics.comp-ph cs.CE cs.LG 版本更新

A Differentiable Framework for Full and Phaseless Data Inversion Using Neural Implicit Contrast-Source Representation

一种基于神经隐式对比源表示的全数据和相位less数据反演可微框架

Haoran Sun, Daoqi Liu, Hongyu Zhou, Maokun Li, Shenheng Xu, Fan Yang

发表机构 * Department of Electronic Engineering, Beijing National Research Center for Information Science and Technology (BNRist), and State Key laboratory of Space Network and Communications(电子工程系,北京信息科学与技术国家研究中心(BNRist),空间网络与通信国家重点实验室)

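AI summary: The paper proposes a differentiable framework for full and phaseless data inversion based on a neural implicit representation of the contrast source. A lightweight residual MLP acts as a continuous neural field, improving reconstruction accuracy and robustness, while the state and data equations are combined with total-variation regularization into a differentiable objective that is optimized end to end.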

详情
AI中文摘要

在本研究中,我们扩展了对比源反演,将其扩展为一个完全可微、无监督的框架,基于神经隐式表示的对比源。具体来说,而不是使用像素级离散表示,对比源由一个轻量级残差多层感知机(ResMLP)参数化,作为连续神经场,该神经场基于空间坐标和发射器设置进行条件化。这种连续参数化提供了更灵活的对比源表示,并在有噪声测量的情况下提高了重建精度和鲁棒性。基于此表示,状态方程和数据方程与总变分正则化相结合,形成一个可微的目标函数。通过将VIE约束反演重新公式化为一个端到端的可微优化问题,网络参数和介质对比率通过自动微分联合优化。在相同框架内,通过仅修改数据失配函数,同时支持全数据和相位less数据反演。数值实验表明,该方案在各种噪声水平和测量设置下,比传统CSI具有更高的重建精度和鲁棒性。连续神经场进一步使超分辨率推理成为可能,在训练网格更细的分辨率下实现,将反演成本与重建保真度解耦。消融研究和与替代神经架构的比较进一步确认,对比源参数化和基于VIE的公式化对于观察到的改进都是必不可少的。

英文摘要

In this study, we extend the contrast source inversion to a fully differentiable, unsupervised framework based on a neural implicit representation of the contrast source. Specifically, instead of a pixel-wise discrete representation, the contrast source is parameterized by a lightweight residual multilayer perceptron (ResMLP) as a continuous neural field conditioned on spatial coordinates and transmitter settings. This continuous parameterization provides a more flexible representation of the contrast source and improves reconstruction accuracy and robustness under noisy measurements. Building on this representation, the state equation and data equation are combined with total-variation regularization to form a differentiable objective function. By reformulating the VIE-constrained inversion as an end-to-end differentiable optimization problem, the network parameters and the medium contrast are jointly optimized via automatic differentiation. Within the same framework, both full and phaseless data inversion are accommodated by only modifying the data misfit function. Numerical experiments demonstrate that this scheme yields higher reconstruction accuracy and robustness than conventional CSI across a range of noise levels and measurement settings. The continuous neural field further enables super-resolution inference at resolutions finer than the training grid, decoupling inversion cost from reconstruction fidelity. Ablation studies and comparisons with alternative neural architectures further confirm that the contrast source parameterization and VIE-based formulation are both essential to the observed improvements.
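The statement that switching between full and phaseless inversion only requires changing the data misfit can be illustrated with two candidate misfit functions. The normalized least-squares forms below are common choices and are assumed here for illustration; the paper's exact misfit definitions are not given in the abstract.

```python
import numpy as np

def full_data_misfit(simulated, measured):
    """Normalized squared error on complex field values (amplitude and phase)."""
    return np.sum(np.abs(simulated - measured) ** 2) / np.sum(np.abs(measured) ** 2)

def phaseless_misfit(simulated, measured_intensity):
    """Normalized squared error on intensities |field|^2 only (phase is not used)."""
    diff = np.abs(simulated) ** 2 - measured_intensity
    return np.sum(diff ** 2) / np.sum(measured_intensity ** 2)

# Toy check: a simulated field that differs from the data only by a phase rotation.
measured = np.array([1.0 + 2.0j, -0.5 + 0.3j, 0.8 - 1.1j])
simulated = measured * np.exp(1j * 0.7)
full_value = full_data_misfit(simulated, measured)
phaseless_value = phaseless_misfit(simulated, np.abs(measured) ** 2)
print(full_value, phaseless_value)
```

The full-data misfit penalizes the phase error, while the phaseless misfit is blind to it, which is why phaseless inversion is the harder, less constrained problem.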

2508.00775 2026-06-05 eess.SY cs.LG cs.SY math.OC 版本更新

Learning to optimize with guarantees: a complete characterization of linearly convergent algorithms

学习优化并保证收敛性:线性收敛算法的完整表征

Andrea Martin, Ian R. Manchester, Luca Furieri

发表机构 * School of Electrical Engineering and Computer Science, and Digital Futures, KTH Royal Institute of Technology, Sweden(电气工程与计算机科学学院及数字未来学院,瑞典皇家理工学院) Australian Centre for Robotics and School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, Australia(澳大利亚机器人中心及航空航天、机械与机电工程学院,澳大利亚悉尼大学) Department of Engineering Sciences, University of Oxford, United Kingdom(工程科学系,英国牛津大学)

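AI summary: To improve an algorithm's average-case performance over a specific distribution of problem instances, the paper gives a complete characterization of linearly convergent algorithms as a baseline algorithm plus trainable, exponentially decaying modifications, and applies it to gradient descent for nonconvex gradient-dominated functions, Nesterov's method for smooth strongly convex functions, and projected gradient methods over polyhedral feasible sets.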

详情
AI中文摘要

许多经典优化算法的设计受到线性收敛速率在问题类中的认证驱动。本文考虑了如何改进算法在特定问题实例分布下的平均性能。虽然可以通过将可训练组件嵌入算法更新中来解决这一任务,但关键挑战是保持整个问题类中的最坏保证。对于复合优化问题的类别,我们证明所有线性收敛算法都可以参数化为一个基线线性收敛算法和一组可训练的指数衰减修改其更新规则的参数;关键在于这种参数化排除了且仅排除了那些不收敛的算法。我们的结果适用于改进经典算法(如梯度下降用于非凸、梯度主导函数;Nesterov加速方法用于平滑强凸函数;投影梯度方法用于多面体可行集优化)的平均性能。我们展示了如何利用我们的表征来学习优化并保证线性收敛和可行性。数值结果展示了在求解病态线性方程组和在线性动力学系统上运行模型预测控制方案时,相较于经典优化器的优势。

英文摘要

The design of many classical optimization algorithms is driven by the certification of linear convergence rates over classes of optimization problems. In this paper, we consider the problem of improving the average-case performance of an algorithm over a specific distribution of problem instances. While this task can be tackled by embedding trainable components into the algorithm updates, a key challenge is to preserve worst-case guarantees across the entire problem class. For classes of composite optimization problems, we show that all linearly convergent algorithms can be parametrized in terms of a baseline linearly convergent algorithm and a set of trainable, exponentially decaying modifications to its update rule; crucially, this parametrization excludes all, and only, the algorithms that do not converge linearly. Our results apply to improving the average-case performance of classical algorithms such as gradient descent for nonconvex, gradient-dominated functions; Nesterov's accelerated method for smooth, strongly convex functions; and projected gradient methods for optimization over polyhedral feasible sets. We illustrate how our characterization can be used for learning to optimize with linear convergence and feasibility guarantees. Numerical results showcase benefits over classical optimizers when solving ill-conditioned systems of linear equations and running a model predictive control scheme on a linear dynamical system.
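The idea of a baseline algorithm plus exponentially decaying modifications can be sketched with gradient descent on a quadratic. The clipping rule `||v_k|| <= c * rho**k`, the constant stand-in for a trained correction, and all names below are simplifying assumptions for illustration; the paper's parametrization is more general.

```python
import numpy as np

def perturbed_gradient_descent(grad, x0, step, perturb, c, rho, n_iters):
    """Gradient descent plus a correction v_k clipped so that ||v_k|| <= c * rho**k."""
    x = np.asarray(x0, dtype=float)
    for k in range(n_iters):
        v = np.asarray(perturb(x, k), dtype=float)
        bound = c * rho ** k
        norm = np.linalg.norm(v)
        if norm > bound:
            v = v * (bound / norm)       # project onto the exponentially shrinking ball
        x = x - step * grad(x) + v
    return x

# Toy problem: f(x) = 0.5 * x^T A x with minimizer at the origin.
A = np.diag([1.0, 10.0])
grad_f = lambda x: A @ x
# Stand-in for a trained correction term; a learned model would go here.
correction = lambda x, k: np.array([1.0, -1.0])
x_final = perturbed_gradient_descent(grad_f, [5.0, -3.0], step=0.1,
                                     perturb=correction, c=1.0, rho=0.5, n_iters=200)
print(np.linalg.norm(x_final))
```

Because the correction is forced to decay geometrically, it can reshape the early iterations (where learning helps most) without destroying the convergence of the baseline method on this example.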

2507.06219 2026-06-05 cs.RO cs.AI cs.LG 版本更新

Is Diversity All You Need for Scalable Robotic Manipulation?

多样性是否是可扩展机器人操作的全部需求?

Modi Shi, Li Chen, Jin Chen, Yuxiang Lu, Chiming Liu, Guanghui Ren, Ping Luo, Di Huang, Maoqing Yao, Hongyang Li

发表机构 * University of Science and Technology of China(中国科学技术大学) Tsinghua University(清华大学) University of California, Berkeley(加州大学伯克利分校)

AI总结 本文研究了数据多样性在机器人学习中的作用,发现任务多样性比单任务演示量更重要,多身体预训练数据在跨身体转移中可选,专家多样性可能对策略学习产生干扰,提出分布去偏方法提升性能。

Comments Code is available at https://github.com/OpenDriveLab/AgiBot-World

详情
AI中文摘要

数据扩展在自然语言处理和计算机视觉的基础模型中取得了显著成功,但机器人操作中有效数据扩展的原则仍不够清楚。本文通过研究机器人学习中数据多样性的细微作用,探讨了三个关键维度:任务(做什么)、身体(使用哪种机器人)和专家(谁演示)。通过在各种机器人平台上进行广泛实验,我们发现:(1)任务多样性比单任务演示数量更重要,有助于从多样预训练任务转移到新下游场景;(2)多身体预训练数据在跨身体转移中是可选的,高质量单身体预训练模型可以高效地转移到不同平台,在微调过程中表现出比多身体预训练模型更优的扩展特性;(3)专家多样性源于个体操作偏好和人类演示中的随机变化,可能对策略学习产生干扰,速度多模态成为关键贡献因素。基于这一洞察,我们提出了一种分布去偏方法以缓解速度模糊性,所提出的GO-1-Pro方法实现了15%的性能提升,相当于使用2.5倍的预训练数据。这些发现提供了新的视角,并为如何有效扩展机器人操作数据集提供了实用指导。

英文摘要

Data scaling has driven remarkable success in foundation models for Natural Language Processing (NLP) and Computer Vision (CV), yet the principles of effective data scaling in robotic manipulation remain insufficiently understood. In this work, we investigate the nuanced role of data diversity in robot learning by examining three critical dimensions: task (what to do), embodiment (which robot to use), and expert (who demonstrates), challenging the conventional intuition of "more diverse is better". Through extensive experiments on various robot platforms, we reveal that (1) task diversity proves more critical than per-task demonstration quantity, benefiting transfer from diverse pre-training tasks to novel downstream scenarios; (2) multi-embodiment pre-training data is optional for cross-embodiment transfer: models trained on high-quality single-embodiment data can efficiently transfer to different platforms, showing a more desirable scaling property during fine-tuning than multi-embodiment pre-trained models; and (3) expert diversity, arising from individual operational preferences and stochastic variations in human demonstrations, can be confounding to policy learning, with velocity multimodality emerging as a key contributing factor. Based on this insight, we propose a distribution debiasing method to mitigate velocity ambiguity; the resulting GO-1-Pro achieves substantial performance gains of 15%, equivalent to using 2.5 times the pre-training data. Collectively, these findings provide new perspectives and offer practical guidance on how to scale robotic manipulation datasets effectively.

2501.14291 2026-06-05 cs.LG stat.ML 版本更新

Advances in Temporal Point Processes: Bayesian, Neural, and LLM Approaches

时间点过程的进展:贝叶斯、神经网络和大语言模型方法

Feng Zhou, Quyu Kong, Jie Qiao, Cheng Wan, Yixuan Zhang, Ruichu Cai

发表机构 * Center for Applied Statistics and School of Statistics, Renmin University of China(应用统计中心和中国人民大学统计学院) Independent Researcher(独立研究者) School of Computer Science, Guangdong University of Technology(广东工业大学计算机学院) School of Statistics and Data Science, Southeast University(东南大学统计与数据科学学院)

AI总结 本文综述了时间点过程的最新研究,从贝叶斯、深度学习和大语言模型三个角度探讨了模型设计、参数估计以及经典应用领域,并展望了未来的研究挑战和方向。

详情
AI中文摘要

时间点过程(TPPs)是用于表征连续时间中事件序列的随机过程模型。传统统计TPPs已有长久的历史,众多模型已被提出并在不同领域中成功应用。近年来,深度学习的进步推动了神经TPPs的发展,使捕捉复杂时间动态变得更加灵活和表达性更强。大语言模型(LLMs)的出现进一步引发了关注,通过利用其丰富的上下文理解能力,为事件序列建模和分析提供了新的可能性。本文从贝叶斯、深度学习和LLM三个视角全面回顾了最近关于TPPs的研究。我们首先回顾了TPPs的基本概念,随后深入讨论了这三种框架中的模型设计和参数估计技术。我们还回顾了TPPs的经典应用领域,以突出其实际相关性。最后,我们概述了TPPs面临的挑战和未来研究的有前景方向。

英文摘要

Temporal point processes (TPPs) are stochastic process models used to characterize event sequences occurring in continuous time. Traditional statistical TPPs have a long-standing history, with numerous models proposed and successfully applied across diverse domains. In recent years, advances in deep learning have spurred the development of neural TPPs, enabling greater flexibility and expressiveness in capturing complex temporal dynamics. The emergence of large language models (LLMs) has further sparked excitement, offering new possibilities for modeling and analyzing event sequences by leveraging their rich contextual understanding. This survey presents a comprehensive review of recent research on TPPs from three perspectives: Bayesian, deep learning, and LLM approaches. We begin with a review of the fundamental concepts of TPPs, followed by an in-depth discussion of model design and parameter estimation techniques in these three frameworks. We also revisit classic application areas of TPPs to highlight their practical relevance. Finally, we outline challenges and promising directions for future research.

2503.04712 2026-06-05 math.OC cs.LG 版本更新

Efficiently Escaping Saddle Points under Generalized Smoothness via Self-Bounding Regularity

在广义平滑性下通过自界正则性高效逃离鞍点

Daniel Yiming Cao, August Y. Chen, Karthik Sridharan, Benjamin Tang

发表机构 * Cornell University(康奈尔大学)

AI总结 本文研究了不一定光滑的非凸函数的一阶优化方法,提出了一种新的框架,用于系统分析一阶优化算法在广义平滑性下的收敛性,并首次建立了一阶方法在广义平滑性下收敛到二阶驻点的收敛保证。

Comments Camera ready version of NeurIPS 2025 paper. 97 pages

详情
AI中文摘要

我们研究使用一阶方法优化不一定光滑的非凸函数(光滑指梯度和/或Hessian满足Lipschitz连续)。无论在理论还是实践中,光滑性在机器学习中都是一个限制性假设,这推动了近期大量关于用一阶方法寻找满足广义平滑性的函数的一阶驻点的研究。我们开发了一个新框架,使我们能够系统地研究一大类一阶优化算法(我们称之为下降过程,decrease procedures)在广义平滑性下的收敛性。我们将该框架实例化,用于分析一阶优化算法在广义平滑性下收敛到一阶和二阶驻点的情况。由此,我们建立了一阶方法在广义平滑性下收敛到二阶驻点的首个收敛保证。我们证明了若干典型例子都落在该框架内,并强调了其实际意义。

英文摘要

We study the optimization of non-convex functions that are not necessarily smooth (gradient and/or Hessian are Lipschitz) using first order methods. Smoothness is a restrictive assumption in machine learning in both theory and practice, motivating significant recent work on finding first order stationary points of functions satisfying generalizations of smoothness with first order methods. We develop a novel framework that lets us systematically study the convergence of a large class of first-order optimization algorithms (which we call decrease procedures) under generalizations of smoothness. We instantiate our framework to analyze the convergence of first order optimization algorithms to first and second order stationary points under generalizations of smoothness. As a consequence, we establish the first convergence guarantees for first order methods to second order stationary points under generalizations of smoothness. We demonstrate that several canonical examples fall under our framework, and highlight practical implications.

2506.19260 2026-06-05 cs.CR cs.DC cs.LG 版本更新

Topology-Aware Differential Privacy in Federated Learning

联邦学习中的拓扑感知差分隐私

Murtaza Rangwala, Richard O. Sinnott, Rajkumar Buyya

发表机构 * Quantum Cloud Computing and Distributed Systems (qCLOUDS) Lab, School of Computing and Information Systems, The University of Melbourne(量子云计算与分布式系统实验室,计算与信息系统学院,墨尔本大学)

AI总结 本文研究了联邦学习中通信拓扑对差分隐私的影响,提出了拓扑感知分布推理方法TADI,通过四项通道消融来分离每个客户端的泄露,并推导出一个加性互信息界,由此得到Fulcrum噪声分配方案;当联邦的杠杆分布不对称时,该方案优于均匀DP-SGD,并在多个数据集上实现了隐私保护的提升。

Comments 16 pages, 6 figures, 2 tables. Data from the experiments and source code can be found here: https://doi.org/10.5281/zenodo.20507155

详情
AI中文摘要

联邦学习仅传输模型更新以保护客户端数据,而差分隐私随机梯度下降(DP-SGD)限制了通过这些更新发生的内容层面泄露。然而,这两种机制都没有考虑联邦本身的通信拓扑所揭示的信息。在跨机构(cross-silo)部署中,了解拓扑和组织结构的被动攻击者能够利用DP-SGD完全未处理的信息通道。本文形式化了这一威胁,并推导出一种有原则的防御方法。我们引入了TADI(拓扑感知分布推理),这是一种基于影子训练的通道分解方法,通过四项通道消融将每个客户端的泄露分离为参数、结构和组织三个组成部分,并证明了一个加性的逐客户端互信息界,将可控的机制项与不可控的先验耦合下界分离开来。基于该界,我们推导出Fulcrum,一种闭式的、平衡的最小-最大最优噪声分配:当联邦的杠杆分布不对称时,它严格优于均匀DP-SGD;当不存在不对称时,它恰好退化为均匀DP-SGD,因此可以无条件地安全采用。在Fed-ISIC2019、Fed-Heart-Disease和合成CIFAR-10上针对六个拓扑族进行评估,Fulcrum在没有可测量的效用损失的情况下实现了最高1.967 nats的隐私增益。TADI通道分解证实:参数通道在所有设置下都受DP-SGD约束;先验耦合通道在先验匹配的条件下在经验上可以达到;并且在现实的跨机构威胁模型下,该界在有利于部署的方向上是保守的。

英文摘要

Federated learning transmits only model updates to protect client data, and differentially private SGD (DP-SGD) bounds content-level leakage through those updates. Neither mechanism accounts for what the communication topology of the federation itself reveals. In cross-silo deployments, a passive adversary with knowledge of the topology and organisational structure has access to information channels that DP-SGD leaves entirely unaddressed. We formalise this threat and derive a principled defense. We introduce TADI (Topology-Aware Distributional Inference), a shadow-trained channel decomposition that isolates per-client leakage into parameter, structural, and organisational components via four channel ablations, and prove an additive per-client mutual-information bound separating a controllable mechanism term from an uncontrollable prior-coupling floor. From this bound we derive Fulcrum, a closed-form balanced min-max optimal noise allocation that strictly dominates uniform DP-SGD whenever the federation's leverage profile is asymmetric, and degenerates exactly to uniform DP-SGD when it is not, making it safe to adopt unconditionally. Evaluated on Fed-ISIC2019, Fed-Heart-Disease, and synthetic CIFAR-10 across six topology families, Fulcrum delivers privacy gains of up to 1.967 nats at no measurable utility cost. The TADI channel decomposition confirms that the parameter channel is bounded by DP-SGD across all settings, the prior-coupling channel is empirically attained under matched-prior conditions, and the bound is conservative in a deployment-favourable direction under realistic cross-silo threat models.
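
For readers unfamiliar with the uniform DP-SGD baseline that the abstract compares against, a minimal sketch of one noisy gradient step follows. It shows only the standard clip-then-add-Gaussian-noise mechanism, not TADI or the Fulcrum allocation; the function name `dp_sgd_step` and its parameters are hypothetical, and privacy accounting is omitted.

```python
import math
import random

def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-example gradient to `clip_norm`, sum, add Gaussian noise, average.

    This is the generic DP-SGD aggregation step; it does not compute a privacy budget.
    """
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for g in per_example_grads:
        g_norm = math.sqrt(sum(v * v for v in g))
        # Scale down only gradients whose L2 norm exceeds the clipping threshold.
        factor = min(1.0, clip_norm / g_norm) if g_norm > 0 else 1.0
        for i, v in enumerate(g):
            total[i] += v * factor
    sigma = noise_multiplier * clip_norm  # noise scale tied to the sensitivity
    n = len(per_example_grads)
    return [(t + rng.gauss(0.0, sigma)) / n for t in total]

rng = random.Random(0)
grads = [[3.0, 4.0], [0.0, 0.5]]
noiseless = dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
noisy = dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
print(noiseless, noisy)
```

A non-uniform allocation in the spirit of the abstract would, presumably, replace the single `noise_multiplier` with per-client values; how those values are chosen is specific to the paper and is not reproduced here.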

2408.11336 2026-06-05 cs.LG cs.CV 版本更新

FATE: Focal-modulated Attention Encoder for Multivariate Time-series Forecasting

FATE:用于多变量时间序列预测的焦点调节注意力编码器

Tajamul Ashraf, Janibul Bashir

发表机构 * GAASH Research Lab(GAASH研究实验室) Department of Information Technology(信息技术系) National Institute of Technology Srinagar(斯利那加国立理工学院)

AI总结 本文提出FATE,一种新的Transformer架构,用于可靠的多变量时间序列预测。FATE引入了张量化的焦点调节机制,以显式捕捉时间序列中的时空相关性,并通过两个调节分数提高可解释性,通过在七个不同现实世界数据集上基准测试,证明其在长视界多变量气象数据集上的优越性能。

详情
AI中文摘要

气候变化是21世纪最紧迫的全球挑战之一,其后果包括海平面上升、冰川融化以及日益极端的天气模式。准确的预测对于监测这些现象和支持缓解策略至关重要。尽管最近的数据驱动时间序列预测模型,包括CNN、RNN和基于注意力的Transformer,已显示出潜力,但它们往往受困于序列依赖性和有限的并行性,尤其是在长视界、多变量的气象数据集上。在本文中,我们提出了Focal Modulated Attention Encoder(FATE),一种新的Transformer架构,用于可靠的多变量时间序列预测。与传统模型不同,FATE引入了张量化的焦点调节机制,以显式捕捉时间序列数据中的时空相关性。我们进一步提出了两个调节分数,通过突出影响预测的关键环境特征来提供可解释性。我们在七个不同的现实世界数据集上对FATE进行了基准测试,包括ETTh1、ETTm2、Traffic、Weather5k、USA-Canada、Europe和LargeST数据集,结果表明它在包括温度数据集在内的各数据集上始终优于所有最先进的方法。我们的消融研究也表明,FATE能够很好地泛化到更广泛的多变量时间序列预测任务。

英文摘要

Climate change stands as one of the most pressing global challenges of the twenty-first century, with far-reaching consequences such as rising sea levels, melting glaciers, and increasingly extreme weather patterns. Accurate forecasting is critical for monitoring these phenomena and supporting mitigation strategies. While recent data-driven models for time-series forecasting, including CNNs, RNNs, and attention-based transformers, have shown promise, they often struggle with sequential dependencies and limited parallelization, especially in long-horizon, multivariate meteorological datasets. In this work, we present Focal Modulated Attention Encoder (FATE), a novel transformer architecture designed for reliable multivariate time-series forecasting. Unlike conventional models, FATE introduces a tensorized focal modulation mechanism that explicitly captures spatiotemporal correlations in time-series data. We further propose two modulation scores that offer interpretability by highlighting critical environmental features influencing predictions. We benchmark FATE across seven diverse real-world datasets, including ETTh1, ETTm2, Traffic, Weather5k, USA-Canada, Europe, and LargeST datasets, and show that it consistently outperforms all state-of-the-art methods, including temperature datasets. Our ablation studies also demonstrate that FATE generalizes well to broader multivariate time-series forecasting tasks.

2506.11973 2026-06-05 cs.LG 版本更新

Self-Regulating Cars: Automating Traffic Control in Free Flow Road Networks

自调节汽车:自动化自由流道路网络的交通控制

Ankit Bhardwaj, Rohail Asim, Sachin Chauhan, Yasir Zaki, Lakshminarayanan Subramanian

发表机构 * Department of Computer Science(计算机科学系) New York University(纽约大学) Indian Institute of Technology Delhi(德里印度理工学院)

AI总结 本文提出了一种基于强化学习的自调节汽车方法,通过动态调节车辆速度来优化通行能力和防止拥堵,无需新基础设施,结合经典交通流理论和微观模拟,在高保真度的PTV Vissim模拟器上实现了提高通行能力、减少延误和停车次数的改进。

详情
AI中文摘要

自由流道路网络,如郊区高速公路,由于通勤流量增加和基础设施有限,正越来越多地经历交通拥堵。传统控制机制,如交通信号或局部启发式方法,在这些高速、无信号的环境中效果不佳或不可行。我们引入了自调节汽车,一种基于强化学习的交通控制协议,通过动态调节车辆速度来优化通行能力和防止拥堵,而无需新的物理基础设施。我们的方法将经典交通流理论、间隙接受模型和微观模拟整合到一个物理指导的强化学习框架中。通过将道路抽象为超段,智能体捕捉到涌现的流量动态,并从即时交通观测中学习稳健的速度调节策略。在高保真度的PTV Vissim模拟器上,我们的方法在真实世界高速公路网络中实现了比无控制设置提高5%的总通行能力,减少13%的平均延误,以及减少3%的总停车次数。它还实现了更平滑、抗拥堵的流量,同时在各种交通模式中泛化,展示了其在可扩展的ML驱动交通管理中的潜力。

英文摘要

Free-flow road networks, such as suburban highways, are increasingly experiencing traffic congestion due to growing commuter inflow and limited infrastructure. Traditional control mechanisms, such as traffic signals or local heuristics, are ineffective or infeasible in these high-speed, signal-free environments. We introduce self-regulating cars, a reinforcement learning-based traffic control protocol that dynamically modulates vehicle speeds to optimize throughput and prevent congestion, without requiring new physical infrastructure. Our approach integrates classical traffic flow theory, gap acceptance models, and microscopic simulation into a physics-informed RL framework. By abstracting roads into super-segments, the agent captures emergent flow dynamics and learns robust speed modulation policies from instantaneous traffic observations. Evaluated in the high-fidelity PTV Vissim simulator on a real-world highway network, our method improves total throughput by 5%, reduces average delay by 13%, and decreases total stops by 3% compared to the no-control setting. It also achieves smoother, congestion-resistant flow while generalizing across varied traffic patterns, demonstrating its potential for scalable, ML-driven traffic management.

2506.11042 2026-06-05 cs.LG 版本更新

GenFT: A Generative Parameter-Efficient Fine-Tuning Method for Pretrained Foundation Models

GenFT:一种用于预训练基础模型的生成性参数高效微调方法

Guangning Xu, Baoquan Zhang, Michael K. Ng

发表机构 * Department of Mathematics, Hong Kong Baptist University, Hong Kong, China(香港浸会大学数学系,中国香港) Department of Computer Science, Harbin Institute of Technology, Shenzhen, China(哈尔滨工业大学(深圳)计算机科学系,中国)

AI总结 本文提出GenFT,一种基于预训练权重的参数高效微调方法,通过生成任务特定的更新来利用预训练权重中的结构信息,实现高效的模型微调。

Comments paper is accepted at ICANN 2026

详情
AI中文摘要

参数高效微调(PEFT)已作为一种资源高效的策略,通过学习少量任务特定的更新ΔW来适应预训练基础模型(PFMs)。现有方法往往在很大程度上独立于预训练权重W₀,或主要通过初始化或简单的重参数化来利用W₀。为了进一步利用W₀中编码的结构信息,我们提出生成性参数高效微调(GenFT),一种基于W₀的PEFT方法,使用确定性权重生成器生成任务特定的更新。具体而言,GenFT通过行和列变换与非线性激活来从W₀中提取结构化模式,并引入共享-特定分解以平衡跨层信息重用和层特定的灵活性。GenFT简单且参数高效,在NLP和CV基准上实现了竞争性或更优的平均性能。我们进一步在LLaMA-7B上进行试点研究,以检验其在生成模型中的可行性。代码可在GitHub https://github.com/xuguangning1218/GenFT 上获得。

英文摘要

Parameter-efficient fine-tuning (PEFT) has emerged as a resource-efficient strategy for adapting Pretrained Foundation Models (PFMs) by learning a small number of task-specific updates $\Delta W$. Existing methods often learn $\Delta W$ largely independently of pretrained weights $W_0$, or exploit $W_0$ mainly through initialization or simple reparameterization. To further leverage the structural information encoded in $W_0$, we propose Generative Parameter-Efficient Fine-Tuning (GenFT), a $W_0$-conditioned PEFT method that uses a deterministic weight generator to produce task-specific updates. Specifically, GenFT performs row and column transformations with nonlinear activations to extract structured patterns from $W_0$, and introduces a shared-specific decomposition to balance cross-layer information reuse and layer-specific flexibility. GenFT is simple and parameter-efficient, achieving competitive or better average performance across NLP and CV benchmarks. We further provide a pilot study on LLaMA-7B to examine its feasibility for generative models. The code is available at GitHub https://github.com/xuguangning1218/GenFT.

2506.00188 2026-06-05 cs.LG stat.ML 版本更新

Cluster-Aware Causal Mixer for Online Anomaly Detection in Multivariate Time Series

基于聚类的因果混合器用于多变量时间序列的在线异常检测

Md Mahmuddun Nabi Murad, Yasin Yilmaz

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 本文提出了一种基于聚类的因果混合器,用于多变量时间序列的在线异常检测,通过聚类处理通道间的相关性,结合因果混合器保持时间因果性,并开发了序列异常评分方法以提高检测准确性。

详情
AI中文摘要

在时间序列数据中早期和准确地检测异常至关重要,因为假阳性和漏检带来的风险很大。虽然基于MLP的混合模型在时间序列分析中显示出潜力,但它们在数据处理过程中不维护时间因果性。此外,现实中的多变量时间序列通常包含众多通道,具有多样的通道间相关性。重构时间序列中的虚假相关性导致表示噪声,从而导致检测不准确。此外,忽略时间连续性的异常评分方法可能会误导连续检测。为了解决这些挑战,我们提出了一种多变量时间序列异常检测的基于聚类的因果混合器。根据相关性将通道分组为集群,并通过专用嵌入层对每个集群进行嵌入。引入因果混合器以在保持时间因果性的同时整合信息。我们进一步开发了一种序列异常评分方法,该方法在时间上累积证据并细化异常边界。我们提出的模型以在线方式运行,使其适合实时时间序列异常检测。在六个公开基准数据集上的实验评估表明,所提出的方法在性能上始终优于其他方法。

英文摘要

Early and accurate detection of anomalies in time-series data is critical due to the substantial risks associated with false or missed detections. While MLP-based mixer models have shown promise in time-series analysis, they do not maintain temporal causality during data processing. Moreover, real-world multivariate time series often contain numerous channels with diverse inter-channel correlations. Spurious correlations in the reconstructed time series lead to noisy representations, resulting in inaccurate anomaly detection. In addition, anomaly scoring methods that ignore temporal continuity can mislead sequential detection. To address these challenges, we propose a cluster-aware causal mixer for multivariate time-series anomaly detection. Channels are grouped into clusters based on their correlations, and each cluster is embedded through a dedicated embedding layer. A causal mixer is introduced to integrate information while maintaining temporal causality. We further develop a sequential anomaly-scoring method that accumulates evidence over time and refines anomaly boundaries. Our proposed model operates in an online fashion, making it suitable for real-time time-series anomaly detection. Experimental evaluations across six public benchmark datasets demonstrate that the proposed approach consistently achieves superior performance.

2505.16311 2026-06-05 stat.ML cs.LG stat.ME 版本更新

Generator-Mediated Bandits: Thompson Sampling for GenAI-Powered Adaptive Interventions

生成器介导的老虎机:面向生成式人工智能的自适应干预的汤普森采样

Marc Brooks, Gabriel Durham, Kihyuk Hong, Ambuj Tewari

发表机构 * Department of Statistics, University of Michigan, Ann Arbor, MI (USA)(密歇根大学统计学系,安阿伯,MI (美国))

AI总结 本文提出了一种生成器介导的老虎机算法(GAMBITTS),用于解决生成式人工智能(GenAI)驱动的自适应干预问题。该算法通过建模治疗和奖励生成过程,利用观察到的治疗信息加速策略学习,并在模拟研究中优于传统算法。

Comments 39 pages, 12 figures

详情
Journal ref
Advances in Neural Information Processing Systems 38 (NeurIPS 2025)
AI中文摘要

近期生成式人工智能(GenAI)模型的进步使得生成能够适应最新用户情境的个性化内容成为可能。尽管个性化决策系统通常采用老虎机形式建模,但GenAI的引入为原本经典的序列学习问题带来了新的结构。在GenAI驱动的干预中,智能体选择一个查询,但环境实际接收到的是从生成模型中抽取的随机响应。标准老虎机方法没有显式考虑这种结构,即动作仅通过随机的、可观察的治疗来影响奖励。我们引入生成器介导的老虎机-汤普森采样(GAMBITTS),这是一种针对这种动作/治疗分离而设计的老虎机方法,并以在移动健康干预中使用大型语言模型生成的文本作为动机案例。GAMBITTS显式地对治疗和奖励的生成过程建模,利用所交付治疗中的信息,相对于标准方法加速策略学习。我们通过分解治疗和奖励中的不确定性来源,建立了GAMBITTS的遗憾界,并指出了它在哪些条件下能获得比标准老虎机方法更强的保证。在模拟研究中,GAMBITTS利用观察到的治疗更准确地估计期望奖励,始终优于传统算法。

英文摘要

Recent advances in generative artificial intelligence (GenAI) models have enabled the generation of personalized content that adapts to up-to-date user context. While personalized decision systems are often modeled using bandit formulations, the integration of GenAI introduces new structure into otherwise classical sequential learning problems. In GenAI-powered interventions, the agent selects a query, but the environment experiences a stochastic response drawn from the generative model. Standard bandit methods do not explicitly account for this structure, where actions influence rewards only through stochastic, observed treatments. We introduce generator-mediated bandit-Thompson sampling (GAMBITTS), a bandit approach designed for this action/treatment split, using mobile health interventions with large language model-generated text as a motivating case study. GAMBITTS explicitly models both the treatment and reward generation processes, using information in the delivered treatment to accelerate policy learning relative to standard methods. We establish regret bounds for GAMBITTS by decomposing sources of uncertainty in treatment and reward, identifying conditions where it achieves stronger guarantees than standard bandit approaches. In simulation studies, GAMBITTS consistently outperforms conventional algorithms by leveraging observed treatments to more accurately estimate expected rewards.
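
As background for the "standard bandit methods" the abstract contrasts with, here is a minimal sketch of classical Beta-Bernoulli Thompson sampling. It is not GAMBITTS: it has no generator and no observed treatment, and the arm means, the function name, and the seed are assumptions made for illustration.

```python
import random

def thompson_bernoulli(true_means, rounds, seed=0):
    """Classical Thompson sampling with Beta(1, 1) priors on Bernoulli arms.

    Returns how many times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [1] * k  # Beta alpha parameters (prior pseudo-count of 1)
    failures = [1] * k   # Beta beta parameters (prior pseudo-count of 1)
    pulls = [0] * k
    for _ in range(rounds):
        # Draw one plausible mean per arm from its posterior, then act greedily.
        samples = [rng.betavariate(successes[a], failures[a]) for a in range(k)]
        arm = max(range(k), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_bernoulli([0.2, 0.8], rounds=2000)
print(pulls)
```

In the setting described above, the reward would depend on the action only through the generated treatment, which is the extra structure this baseline ignores.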

2310.04649 2026-06-05 cs.LG 版本更新

Uncovering Model Processing Strategies with Non-Negative Per-Example Fisher Factorization

通过非负每例费舍尔分解揭示模型处理策略

Michael Matena, Colin Raffel

发表机构 * University of North Carolina Chapel Hill(北卡罗来纳大学教堂山分校) University of Toronto(多伦多大学) Vector Institute(向量研究所)

AI总结 本文提出NPEFF方法,通过分解每例费舍尔矩阵揭示模型生成预测所用的策略,展示了NPEFF组件在语言模型和文本处理任务中的应用,并展示了如何通过扰动这些组件来干扰模型处理,同时通过消融研究和实验验证了NPEFF在分析和缓解去学习的副作用以及研究上下文学习中的优势。

详情
AI中文摘要

我们引入NPEFF(非负每例费舍尔分解),一种可解释性方法,旨在揭示模型生成预测所使用的策略。NPEFF使用一种新颖的分解算法分解每例费舍尔矩阵,该算法学习了一组由学习得到的秩-1半正定矩阵表示的组件。通过结合人类评估和自动化分析,我们证明这些NPEFF组件对应于各种语言模型和文本处理任务中的模型处理策略。我们进一步展示了如何从NPEFF组件构建参数扰动,以选择性地干扰给定组件在模型处理中的作用。除了进行广泛的消融研究外,我们还包括实验,展示了NPEFF如何用于分析和缓解去学习的副作用,并用NPEFF研究上下文学习。此外,我们展示了NPEFF相对于梯度聚类和使用稀疏自编码器进行字典学习等基线方法的优势。我们发布了本工作的代码。

英文摘要

We introduce NPEFF (Non-Negative Per-Example Fisher Factorization), an interpretability method that aims to uncover strategies used by a model to generate its predictions. NPEFF decomposes per-example Fisher matrices using a novel decomposition algorithm that learns a set of components represented by learned rank-1 positive semi-definite matrices. Through a combination of human evaluation and automated analysis, we demonstrate that these NPEFF components correspond to model processing strategies for a variety of language models and text processing tasks. We further show how to construct parameter perturbations from NPEFF components to selectively disrupt a given component's role in the model's processing. Along with conducting extensive ablation studies, we include experiments to show how NPEFF can be used to analyze and mitigate collateral effects of unlearning and use NPEFF to study in-context learning. Furthermore, we demonstrate the advantages of NPEFF over baselines such as gradient clustering and using sparse autoencoders for dictionary learning over model activations. We release the code used in this work.

2505.02540 2026-06-05 cs.LG cs.AI 版本更新

Lazy But Effective: Collaborative Personalized Federated Learning with Heterogeneous Data

懒惰但有效:基于异构数据的协同个性化联邦学习

Ljubomir Rokvic, Panayiotis Danassis, Boi Faltings

发表机构 * Artificial Intelligence Laboratory EPFL(洛桑联邦理工学院人工智能实验室) Telenor Research(Telenor研究院)

AI总结 本文提出了一种简单有效的个性化联邦学习框架pFedLIA,通过使用计算效率高的影响近似方法“Lazy Influence”,在模型聚合前以分布式方式对客户端进行聚类,并在每个聚类内协同训练模型以捕捉客户端特定的数据模式;实验表明,它能有效恢复非IID数据导致的全局模型性能下降,并在多个基准任务中优于现有基线方法。

Comments Accepted at the International Joint Conference on Neural Networks (IJCNN), IEEE, 2025

详情
AI中文摘要

在联邦学习中,客户端数据分布的异质性往往意味着单一全局模型无法为个别客户端提供最佳性能。例如,在训练键盘的下一个词预测模型时,由人口统计学特征(方言、年龄等)、语言能力和书写风格造成的用户特定语言模式,会使客户端之间的数据集高度非IID。其他例子包括使用不同机器拍摄的医学图像,或来自不同车辆类型的驾驶数据。为了解决这一问题,我们提出了一种简单但有效的个性化联邦学习框架(pFedLIA),该框架利用一种计算效率高的影响近似方法(称为“Lazy Influence”),在模型聚合前以分布式方式对客户端进行聚类。在每个聚类中,数据所有者协同训练一个模型,以捕捉这些客户端特有的数据模式。在各种合成和现实世界设置中,特别是在北欧语言的下一个词预测任务以及多个基准任务上,我们的方法成功恢复了由非IID性导致的全局模型性能下降。它的性能与假设的Oracle聚类相当,并显著优于现有基线方法,例如在CIFAR100上提高了17%。

英文摘要

In Federated Learning, heterogeneity in client data distributions often means that a single global model does not have the best performance for individual clients. Consider for example training a next-word prediction model for keyboards: user-specific language patterns due to demographics (dialect, age, etc.), language proficiency, and writing style result in a highly non-IID dataset across clients. Other examples are medical images taken with different machines, or driving data from different vehicle types. To address this, we propose a simple yet effective personalized federated learning framework (pFedLIA) that utilizes a computationally efficient influence approximation, called "Lazy Influence", to cluster clients in a distributed manner before model aggregation. Within each cluster, data owners collaborate to jointly train a model that captures the specific data patterns of the clients. Our method has been shown to successfully recover the global model's performance drop due to the non-IID-ness in various synthetic and real-world settings, specifically a next-word prediction task on the Nordic languages as well as several benchmark tasks. It matches the performance of a hypothetical Oracle clustering, and significantly improves on existing baselines, e.g., an improvement of 17% on CIFAR100.

2411.18343 2026-06-05 cs.LG cs.AI 版本更新

Comprehensive and Reliable Feature Attribution for Diverse Modalities and Models via Frequency-Domain Insights

通过频域见解实现多样化模态和模型的全面可靠特征归因

Zechen Liu, Feiyang Zhang, Wei Song, Xiang Li, Wei Wei

发表机构 * School of Computational Science, Wuhan University(武汉大学计算科学学院) Brain Research Center, Wuhan University(武汉大学脑科学研究中心) College of Information Science and Technology (School of Cyber Science and Technology), Shihezi University(石河子大学信息科学学院(网络安全科学与技术学院)) Xinjiang Production and Construction Corps Key Laboratory of Computing Intelligence and Network Information Security Open Fund(新疆生产建设兵团计算智能与网络信息安全重点实验室开放基金)

AI总结 本文提出了一种新的可解释性方法FreqX,结合信号处理和信息理论,以解决个性化联邦学习中非IID数据、异构设备、缺乏公平性和贡献不明确等问题,通过频域分析提高解释性效率和准确性。

Comments 16 pages, 9 figures

详情
AI中文摘要

个性化联邦学习(PFL)允许客户端在不披露其私有数据集的情况下协作训练个性化模型。然而,PFL面临非IID、异构设备、缺乏公平性和贡献不明确等挑战,亟需深度学习模型的可解释性来克服这些问题。这些挑战提出了新的可解释性需求,包括低成本、隐私性和详细信息。目前没有现有的可解释性方法能满足这些需求。在本文中,我们提出了一种新的可解释性方法FreqX,通过引入信号处理和信息理论。我们的实验表明,FreqX的解释结果包含属性信息和概念信息。FreqX的运行速度至少比包含概念信息的基线方法快10倍。

英文摘要

Personalized Federated Learning (PFL) allows clients to cooperatively train a personalized model without disclosing their private datasets. However, PFL suffers from non-IID data, heterogeneous devices, lack of fairness, and unclear contribution, challenges that urgently call for interpretability of deep learning models. These challenges place new demands on interpretability: low cost, privacy, and detailed information. No current interpretability method satisfies them. In this paper, we propose a novel interpretability method, FreqX, by introducing signal processing and information theory. Our experiments show that the explanation results of FreqX contain both attribution information and concept information. FreqX runs at least 10 times faster than the baselines that contain concept information.

2503.11910 2026-06-05 cs.LG cs.AI math.AT math.SG 版本更新

RTD-Lite: Scalable Topological Analysis for Comparing Weighted Graphs in Learning Tasks

RTD-Lite:用于学习任务中比较加权图拓扑结构的可扩展分析

Eduard Tulchinskii, Daria Voronkova, Ilya Trofimov, Evgeny Burnaev, Serguei Barannikov

发表机构 * Skoltech, AI Foundation and Algorithm Lab(斯科尔科沃科学技术研究院,人工智能基础与算法实验室) Skoltech, AIRI(斯科尔科沃科学技术研究院,人工智能研究院) Skoltech, CNRS(斯科尔科沃科学技术研究院,法国国家科学研究中心)

AI总结 本文提出RTD-Lite算法,通过最小生成树辅助图在O(n²)时间内高效比较加权图的拓扑特征,适用于降维和神经网络训练等任务,实验表明其在识别拓扑差异和减少计算时间方面优于现有方法。

Comments Accepted for AISTATS 2025

详情
AI中文摘要

用于比较加权图的拓扑方法在各种学习任务中具有价值,但通常在大规模数据集上计算效率低下。我们介绍了RTD-Lite,一种可扩展算法,能够高效比较两个具有顶点一一对应关系的加权图的拓扑特征,特别是任意尺度下的连通性或聚类结构。通过辅助图的最小生成树,RTD-Lite以O(n²)的时间和内存复杂度捕捉拓扑差异。这种效率使其适用于降维和神经网络训练等任务。在合成和现实数据集上的实验表明,RTD-Lite能够有效识别拓扑差异,同时显著减少计算时间,相较于现有方法。此外,将RTD-Lite作为损失函数组件整合到神经网络训练中,可以增强学习表示中的拓扑结构保持。我们的代码在https://github.com/ArGintum/RTD-Lite上公开可用。

英文摘要

Topological methods for comparing weighted graphs are valuable in various learning tasks but often suffer from computational inefficiency on large datasets. We introduce RTD-Lite, a scalable algorithm that efficiently compares topological features, specifically connectivity or cluster structures at arbitrary scales, of two weighted graphs with one-to-one correspondence between vertices. Using minimal spanning trees in auxiliary graphs, RTD-Lite captures topological discrepancies with $O(n^2)$ time and memory complexity. This efficiency enables its application in tasks like dimensionality reduction and neural network training. Experiments on synthetic and real-world datasets demonstrate that RTD-Lite effectively identifies topological differences while significantly reducing computation time compared to existing methods. Moreover, integrating RTD-Lite into neural network training as a loss function component enhances the preservation of topological structures in learned representations. Our code is publicly available at https://github.com/ArGintum/RTD-Lite
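
The $O(n^2)$ cost mentioned in the abstract matches the cost of building a minimal spanning tree on a dense graph with Prim's algorithm. The sketch below shows only that building block, under the assumption of a symmetric distance matrix given as nested lists; it is not RTD-Lite itself, which works on auxiliary graphs derived from the two input graphs. The sorted MST edge weights are the scales at which connected components merge (equivalently, single-linkage clustering merge heights).

```python
def mst_edge_weights(dist):
    """Prim's algorithm on a dense symmetric matrix, O(n^2) time and O(n) extra memory.

    Returns the sorted weights of the minimal spanning tree edges.
    """
    n = len(dist)
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known edge linking each vertex to the tree
    best[0] = 0.0
    weights = []
    for step in range(n):
        # Pick the vertex outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:
            weights.append(best[u])
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
    return sorted(weights)

# Four points on a line at positions 0, 1, 3, 7 (hypothetical toy data).
points = [0, 1, 3, 7]
dist = [[abs(a - b) for b in points] for a in points]
print(mst_edge_weights(dist))
```

Comparing such merge scales between two graphs on the same vertex set gives a rough feel for the kind of connectivity discrepancy the abstract refers to, although the actual divergence is defined in the paper.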

2502.20914 2026-06-05 cs.LG cs.AI cs.CL 版本更新

Everything, Everywhere, All at Once: Is Mechanistic Interpretability Identifiable?

一切、处处、同时发生:机械可解释性是否可识别?

Maxime Méloux, Silviu Maniu, François Portet, Maxime Peyrard

发表机构 * Université Grenoble Alpes, CNRS, Grenoble INP, LIG(格勒诺布尔阿尔卑斯大学、国家科学研究中心、格勒诺布尔INP、实验室LIG)

AI总结 本文探讨了在机械可解释性(MI)框架下,给定行为是否具有唯一解释的问题,通过统计可识别性理论分析了MI解释的可识别性,并提出了两种主要策略及实验结果。

详情
Journal ref
The Thirteenth International Conference on Learning Representations (ICLR 2025)
AI中文摘要

随着AI系统应用于高风险领域,确保可解释性至关重要。机械可解释性(MI)旨在通过提取人类可理解的算法来解释神经网络的行为。本文探讨了一个关键问题:在给定行为下,根据MI的标准,是否存在唯一的解释?借鉴统计学中的可识别性,其中参数在特定假设下可以唯一推断,我们探索了MI解释的可识别性。我们识别出两种主要的MI策略:(1)“where-then-what”,通过隔离复制模型行为的电路并在之后解释它;(2)“what-then-where”,从候选算法开始,通过因果对齐搜索实现它们的神经激活子空间。我们对布尔函数和小型多层感知机测试了这两种策略,完全枚举了候选解释。实验揭示了系统性的不可识别性:多个电路可以复制行为,一个电路可以有多种解释,多个算法可以与网络对齐,一个算法可以与不同的子空间对齐。是否需要唯一性?一种务实的方法可能只需要预测性和可操作性标准。如果唯一性对理解至关重要,可能需要更严格的条件。我们还参考了内部可解释性框架,该框架通过多种标准验证解释。本文为定义AI中的解释标准做出了贡献。

英文摘要

As AI systems are used in high-stakes applications, ensuring interpretability is crucial. Mechanistic Interpretability (MI) aims to reverse-engineer neural networks by extracting human-understandable algorithms to explain their behavior. This work examines a key question: for a given behavior, and under MI's criteria, does a unique explanation exist? Drawing on identifiability in statistics, where parameters are uniquely inferred under specific assumptions, we explore the identifiability of MI explanations. We identify two main MI strategies: (1) "where-then-what," which isolates a circuit replicating model behavior before interpreting it, and (2) "what-then-where," which starts with candidate algorithms and searches for neural activation subspaces implementing them, using causal alignment. We test both strategies on Boolean functions and small multi-layer perceptrons, fully enumerating candidate explanations. Our experiments reveal systematic non-identifiability: multiple circuits can replicate behavior, a circuit can have multiple interpretations, several algorithms can align with the network, and one algorithm can align with different subspaces. Is uniqueness necessary? A pragmatic approach may require only predictive and manipulability standards. If uniqueness is essential for understanding, stricter criteria may be needed. We also reference the inner interpretability framework, which validates explanations through multiple criteria. This work contributes to defining explanation standards in AI.

2502.06434 2026-06-05 cs.CV cs.LG 版本更新

Unifying Dataset Pruning and Distillation for Efficient Large-scale Compression

统一数据集剪枝与蒸馏以实现高效大规模压缩

Lingao Xiao, Songhua Liu, Yang He, Xinchao Wang

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 本文提出统一的数据集压缩基准,探讨数据集剪枝与蒸馏的融合趋势,发现在小数据集规模下数据集剪枝优于数据集蒸馏,且软标签可能使蒸馏过程变得不那么必要;据此提出不依赖软标签的硬标签数据集压缩框架PCA(Prune, Combine, and Augment),侧重图像质量并提升存储效率。

Comments Accepted by ICML 2026

详情
AI中文摘要

数据集剪枝(DP)和数据集蒸馏(DD)在输出上有根本差异:DP选择原始图像子集,而DD生成合成图像。最近,DD对原始图像的依赖增加表明两种方法趋于融合。为研究这种融合趋势,我们提出统一的数据集压缩(DC)基准。该基准揭示了软标签-DD的有趣权衡:虽然软标签提供有价值信息,但它们可能使蒸馏过程变得不那么必要,因为蒸馏图像可能不总能优于随机子集。此外,基准表明在当前阶段,数据集剪枝在小数据集规模下优于数据集蒸馏。鉴于这些观察,我们探索硬标签-DC作为互补方法,强调图像质量的同时提供显著的存储效率。我们的PCA(Prune, Combine, and Augment)是首个不依赖软标签而是聚焦图像质量的框架。(1)“P”表示基于数据集剪枝指标选择简单样本,(2)“C”表示有效地组合这些样本,(3)“A”表示在训练期间应用受约束的图像增强。我们的代码可在 https://github.com/ArmandXiao/Unifying-Dataset-Pruning-and-Distillation 获得。

英文摘要

Dataset pruning (DP) and dataset distillation (DD) fundamentally differ in their outputs: DP selects original image subsets, while DD generates synthetic images. Recently, DD's increasing reliance on original images suggests a convergence of the two directions. To investigate this convergence trend, we propose a unified dataset compression (DC) benchmark. This benchmark reveals an interesting trade-off for soft-label-DD: while soft labels provide valuable information, they can make the distillation process less essential, as distilled images may not always outperform random subsets. In addition, the benchmark reveals that at the current stage, dataset pruning outperforms dataset distillation at small dataset sizes. Given these observations, we explore hard-label-DC as a complementary approach that emphasizes image quality while offering substantial storage efficiency. Our PCA (Prune, Combine, and Augment) is the first framework that does not rely on soft labels but instead focuses on image quality. (1) "P" means selecting easy samples based on dataset pruning metrics, (2) "C" indicates combining these samples effectively, and (3) "A" is to apply constrained image augmentation during training. Our code is available at https://github.com/ArmandXiao/Unifying-Dataset-Pruning-and-Distillation

2410.04309 2026-06-05 cs.CY cs.LG 版本更新

Comprehensive Monitoring of Air Pollution Hotspots Using Sparse Sensor Networks

利用稀疏传感器网络全面监测空气污染热点

Ankit Bhardwaj, Ananth Balashankar, Shiva Iyer, Nita Soans, Anant Sudarshan, Rohini Pande, Lakshminarayanan Subramanian

发表机构 * New York University(纽约大学) Google Research(谷歌研究) Toyota InfoTechnology Center(丰田信息技术中心) Kaiterra Inc(Kaiterra公司) University of Warwick(沃里克大学) Yale University(耶鲁大学)

AI总结 本文结合预测建模与机理方法,借助新增的低成本传感器,在新德里公共传感器网络已检测到的660个热点之外发现了189个隐藏热点;利用时空克里金法(Space-Time Kriging)进行预测,并开发高斯烟羽扩散模型解释热点形成机理,为资源受限环境下的空气污染管理提供数据驱动与机理相结合的方案。

详情
AI中文摘要

城市空气污染热点对健康构成重大威胁,但其检测和分析仍受限于公共传感器网络的稀疏性。本文结合预测建模与机理方法,对污染热点进行全面监测。我们在新德里现有传感器网络的基础上增设了28个低成本传感器,收集了2018年5月1日至2020年11月1日共30个月的PM2.5数据。将已有的热点定义应用于这些数据后,我们在确认公共网络检测到的660个热点之外,还发现了189个隐藏热点。利用时空克里金法(Space-Time Kriging)等预测技术,我们在50%传感器故障率下以95%的精确率和88%的召回率识别出隐藏热点,在50%传感器缺失的情况下精确率为98%、召回率为95%。预测模型的推算结果被进一步整理为面向公共部门的政策建议。此外,我们开发了高斯烟羽扩散模型(Gaussian Plume Dispersion Model)以理解热点形成的机理,其中纳入了基于本地排放源编制的排放清单。该机理模型能够解释65%的观测到的瞬时热点。我们的发现强调了在资源受限环境中,将数据驱动的预测模型与基于物理的机理模型相结合,对于可扩展且稳健的空气污染管理的重要性。

英文摘要

Urban air pollution hotspots pose significant health risks, yet their detection and analysis remain limited by the sparsity of public sensor networks. This paper addresses this challenge by combining predictive modeling and mechanistic approaches to comprehensively monitor pollution hotspots. We enhanced New Delhi's existing sensor network with 28 low-cost sensors, collecting PM2.5 data over 30 months from May 1, 2018, to Nov 1, 2020. Applying established definitions of hotspots to this data, we found the existence of additional 189 hidden hotspots apart from confirming 660 hotspots detected by the public network. Using predictive techniques like Space-Time Kriging, we identified hidden hotspots with 95% precision and 88% recall with 50% sensor failure rate, and with 98% precision and 95% recall with 50% missing sensors. The projected results of our predictive models were further compiled into policy recommendations for public authorities. Additionally, we developed a Gaussian Plume Dispersion Model to understand the mechanistic underpinnings of hotspot formation, incorporating an emissions inventory derived from local sources. Our mechanistic model is able to explain 65% of observed transient hotspots. Our findings underscore the importance of integrating data-driven predictive models with physics-based mechanistic models for scalable and robust air pollution management in resource-constrained settings.
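For readers unfamiliar with the mechanistic side, the sketch below evaluates the textbook Gaussian plume formula with ground reflection for a single point source. It is an illustration only: the abstract does not state which variant the authors use, and every name here (`gaussian_plume`, `q`, `u`, `sigma_y`, `sigma_z`, `h`) is an assumption. In practice the dispersion widths depend on downwind distance and atmospheric stability; here they are simply passed in.

```python
import math


def gaussian_plume(q, u, sigma_y, sigma_z, y, z, h):
    """Concentration from a continuous point source (textbook form, hypothetical names).

    q        emission rate (e.g. g/s)
    u        wind speed along the plume axis (m/s)
    sigma_y  lateral dispersion width at the receptor's downwind distance (m)
    sigma_z  vertical dispersion width at the same distance (m)
    y, z     crosswind offset and height of the receptor (m)
    h        effective source height (m)
    """
    if u <= 0 or sigma_y <= 0 or sigma_z <= 0:
        raise ValueError("u, sigma_y and sigma_z must be positive")
    lateral = math.exp(-(y ** 2) / (2 * sigma_y ** 2))
    # The second exponential is the mirror source that models ground reflection.
    vertical = (math.exp(-((z - h) ** 2) / (2 * sigma_z ** 2))
                + math.exp(-((z + h) ** 2) / (2 * sigma_z ** 2)))
    return q / (2 * math.pi * u * sigma_y * sigma_z) * lateral * vertical
```

Summing such contributions over the sources in an emissions inventory would give a predicted concentration field, which is one plausible way a mechanistic model could be compared against observed hotspots.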

2406.08966 2026-06-05 cs.LG cs.AI 版本更新

Separation Power of Equivariant Neural Networks

等变神经网络的分离能力

Marco Pacini, Xiaowen Dong, Bruno Lepri, Gabriele Santin

发表机构 * University of Trento(特伦托大学) Fondazione Bruno Kessler(布鲁诺·凯斯勒基金会) University of Oxford(牛津大学) University of Venice(威尼斯大学)

AI总结 本文研究了等变神经网络的分离能力,分析了架构和超参数对分离能力的影响,发现非多项式激活函数在表达能力上等价,深度在阈值后不再提升分离能力,而隐表示的块分解会影响分离能力。

Comments Published as a conference paper at ICLR 2025

详情
Journal ref
International Conference on Learning Representations (ICLR), 2025
AI中文摘要

机器学习模型的分离能力是指其区分不同输入的能力,常被用作表达能力的代理指标。事实上,了解一个模型族的分离能力是获得细粒度万能逼近(universality)结果的必要条件。本文分析了等变神经网络(如卷积网络和置换不变网络)的分离能力。我们首先完整刻画了由给定架构导出的模型所无法区分的输入。基于这些结果,我们推导出可分离性如何受超参数和架构选择(如激活函数、深度、隐藏层宽度和表示类型)的影响。值得注意的是,所有非多项式激活函数(包括ReLU和sigmoid)在表达能力上等价,并达到最大分离能力。深度在达到某一阈值之前可以提升分离能力,此后继续加深不再有任何效果。在隐表示中添加不变特征不影响分离能力。最后,隐表示的块分解会影响可分离性,其中极小分量在分离能力上构成一个层次结构,为比较不同模型的分离能力提供了一种直接的方法。

英文摘要

The separation power of a machine learning model refers to its ability to distinguish between different inputs and is often used as a proxy for its expressivity. Indeed, knowing the separation power of a family of models is a necessary condition to obtain fine-grained universality results. In this paper, we analyze the separation power of equivariant neural networks, such as convolutional and permutation-invariant networks. We first present a complete characterization of inputs indistinguishable by models derived from a given architecture. From these results, we derive how separability is influenced by hyperparameters and architectural choices, such as activation functions, depth, hidden layer width, and representation types. Notably, all non-polynomial activations, including ReLU and sigmoid, are equivalent in expressivity and reach maximum separation power. Depth improves separation power up to a threshold, after which further increases have no effect. Adding invariant features to hidden representations does not impact separation power. Finally, block decomposition of hidden representations affects separability, with minimal components forming a hierarchy in separation power that provides a straightforward method for comparing the separation power of models.

2205.11518 2026-06-05 cs.CR cs.AI cs.LG 版本更新

LIA: Privacy-Preserving Data Quality Evaluation in Federated Learning Using a Lazy Influence Approximation

LIA: 在联邦学习中使用懒惰影响近似进行隐私保护的数据质量评估

Ljubomir Rokvic, Panayiotis Danassis, Sai Praneeth Karimireddy, Boi Faltings

发表机构 * École Polytechnique Fédérale de Lausanne (EPFL)(洛桑联邦理工学院) Telenor Research(Telenor研究院) University of Southern California(南加州大学)

AI总结 本文提出了一种新的隐私保护数据质量评估方法LIA,通过懒惰影响近似技术过滤和评分数据,在保持隐私的前提下有效识别低质量、损坏或恶意数据。

Comments Proceedings of the 2024 IEEE International Conference on Big Data (IEEE BigData 2024). A preliminary version of this work received the Best Paper Award at the International Workshop on Trustworthy Federated Learning at IJCAI (FL-IJCAI) 2023

详情
AI中文摘要

在联邦学习中,处理低质量、损坏或恶意数据至关重要。然而,出于隐私方面的顾虑,传统的数据估值方法并不适用。为此,我们提出了一种简单而有效的方法,利用一种称为“懒惰影响”(lazy influence)的新型影响近似来过滤和评分数据,同时保护隐私。具体而言,每个参与者使用自己的数据估计另一参与者批次的影响,并将经差分隐私混淆的评分发送给中央协调器。在多种模拟和真实场景中,我们的方法成功过滤了有偏和损坏的数据,召回率超过90%(有时高达100%),同时保持ε ≤ 1的强差分隐私保证。

英文摘要

In Federated Learning, it is crucial to handle low-quality, corrupted, or malicious data. However, traditional data valuation methods are not suitable due to privacy concerns. To address this, we propose a simple yet effective approach that utilizes a new influence approximation called "lazy influence" to filter and score data while preserving privacy. To do this, each participant uses their own data to estimate the influence of another participant's batch and sends a differentially private obfuscated score to the central coordinator. Our method has been shown to successfully filter out biased and corrupted data in various simulated and real-world settings, achieving a recall of over 90% (sometimes up to 100%) while maintaining strong differential privacy guarantees with ε ≤ 1.

2410.04907 2026-06-05 math.CO cs.DM cs.LG cs.NE math.OC 版本更新

Decomposition Polyhedra of Piecewise Linear Functions

分段线性函数的分解多面体

Marie-Charlotte Brandenburg, Moritz Grillo, Christoph Hertrich

发表机构 * Ruhr-Universität Bochum(波鸿鲁尔大学) Max Planck Institute MiS(马克斯·普朗克科学数学研究所) University of Technology Nuremberg(纽伦堡工业大学)

AI总结 本文研究如何将连续分段线性函数分解为两个凸分段线性函数的差,通过固定多面体复形来确定非线性区域的可能位置,并证明分解集合形成一个多面体,从而为优化和神经网络理论提供新的见解。

详情
AI中文摘要

本文研究一个被广泛关注的问题:如何将连续分段线性(CPWL)函数分解为两个凸CPWL函数之差。每个CPWL函数都有无穷多种这样的分解,但对于优化和神经网络理论中的应用而言,找到线性片段尽可能少的分解至关重要。这是一个极具挑战性的问题,我们通过否证Tran和Wang最近提出的一种方法进一步说明了这一点。为了使问题更易处理,我们提出固定一个底层多面体复形,用以确定非线性可能出现的位置。在此假设下,我们证明所有分解构成的集合是一个多面体,它可表示为两个平移锥的交集。我们证明不可约分解对应于该多面体的有界面,且最小解必为顶点。随后我们识别出具有唯一最小分解的情形,并说明这些结论对次模函数理论的影响。最后,我们改进了针对给定凸CPWL函数的已有神经网络构造,并应用该框架得到非凸情形下的结果。

英文摘要

In this paper we contribute to the frequently studied question of how to decompose a continuous piecewise linear (CPWL) function into a difference of two convex CPWL functions. Every CPWL function has infinitely many such decompositions, but for applications in optimization and neural network theory, it is crucial to find decompositions with as few linear pieces as possible. This is a highly challenging problem, as we further demonstrate by disproving a recently proposed approach by Tran and Wang [Minimal representations of tropical rational functions. Algebraic Statistics, 15(1):27-59, 2024]. To make the problem more tractable, we propose to fix an underlying polyhedral complex determining the possible locus of nonlinearity. Under this assumption, we prove that the set of decompositions forms a polyhedron that arises as intersection of two translated cones. We prove that irreducible decompositions correspond to the bounded faces of this polyhedron and minimal solutions must be vertices. We then identify cases with a unique minimal decomposition, and illustrate how our insights have consequences in the theory of submodular functions. Finally, we improve upon previous constructions of neural networks for a given convex CPWL function and apply our framework to obtain results in the nonconvex case.
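As a small worked illustration of why such decompositions are never unique (this example is not taken from the paper; the function $f$ and the symbols $g$, $h$, $\varphi$ are chosen here purely for illustration), consider the clipped ramp on the real line:

```latex
\[
f(x) \;=\; \underbrace{\max(0, x)}_{g(x)} \;-\; \underbrace{\max(0, x - 1)}_{h(x)}
      \;=\;
\begin{cases}
  0, & x \le 0, \\
  x, & 0 \le x \le 1, \\
  1, & x \ge 1,
\end{cases}
\]
\[
f \;=\; (g + \varphi) \;-\; (h + \varphi)
\qquad \text{for every convex CPWL function } \varphi .
\]
```

Here $f$ is neither convex nor concave, while $g$ and $h$ are convex with two linear pieces each. Adding, say, $\varphi(x) = |x + 1|$ to both terms gives another valid decomposition whose terms have more pieces, which is why the number of linear pieces is the natural quantity to minimise.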

2406.07049 2026-06-05 cs.NE cs.LG 版本更新

GridPE: A Grid Cell-Inspired Unified Position Embedding for Arbitrary-Dimensional Spaces

GridPE: 一种基于网格细胞的统一位置嵌入方法用于任意维度空间

Boyang Li, Yulin Wu, Nuoxian Huang, Wenjia Zhang

发表机构 * New York University(纽约大学) Peking University(北京大学) Imperial College London(帝国理工学院) Tongji University(同济大学)

AI总结 本文提出GridPE,一种受哺乳动物空间认知中六边形周期编码启发的新型位置嵌入框架,旨在解决高维时空任务中位置嵌入的理论保障问题,通过结合计算神经科学原理和调和分析,为任意维度空间提供统一的位置嵌入解决方案。

详情
AI中文摘要

理解所有维度上的空间关系对于智能系统至关重要。然而,现有的位置嵌入方法,如旋转位置嵌入(RoPE),在视频理解和机器人导航等高维时空任务中缺乏理论保障。受哺乳动物空间认知中网格细胞的六边形周期编码启发,我们提出GridPE,一种将计算神经科学原理与调和分析相结合的新型位置嵌入框架。我们的方法建立在随机傅里叶特征(Random Fourier Features)之上,并借助神经科学原理构建高效的嵌入。理论上,我们证明任何平移不变的空间函数都可以用有限个傅里叶基之和来近似,该结果在一维情形下自然退化为RoPE。随后,我们从生物可用性(bioavailability)的角度,推导出任意欧氏维度下每个尺度上频率向量的方向与数量,以及不同尺度之间的最优比例。这些推导等价于该维度下正单纯形的质心与顶点之间的关系。我们在多种空间建模任务上验证了GridPE,包括2D图像分类(ImageNet100)和3D点云识别(ModelNet40)。理论分析确立了GridPE作为任意维度空间位置嵌入的统一框架,实验结果则表明其优于现有方法。

英文摘要

Understanding spatial relationships across all dimensions is fundamental for intelligent systems. However, existing positional embeddings, such as Rotary Positional Embedding (RoPE), lack theoretical guarantees for high-dimensional spatiotemporal tasks like video understanding and robotic navigation. Inspired by the hexagonal periodic coding of grid cells in mammalian spatial cognition, we propose GridPE -- a novel positional embedding framework that integrates computational neuroscience principles with harmonic analysis. Our approach builds upon Random Fourier Features and leverages principles from neuroscience to construct efficient embeddings. Theoretically, we prove that any translation-invariant spatial function can be approximated by a finite sum of Fourier bases, which naturally reduces to RoPE in the one-dimensional case. We then derive the directions and quantities of frequency vectors at each scale in any Euclidean dimension, along with the optimal ratio between different scales, from a bioavailability perspective. These derivations are equivalent to the relationship between the centroid and the vertices of a regular simplex in that dimension. We validate GridPE across a range of spatial modeling tasks, including 2D image classification (ImageNet100) and 3D point cloud recognition (ModelNet40). Our theoretical analysis establishes GridPE as a unified framework for positional embedding in arbitrary-dimensional spaces, while empirical results demonstrate its superiority over existing methods.
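The simplex picture in the abstract can be made concrete in two dimensions, where the regular simplex is an equilateral triangle and the centroid-to-vertex directions sit 120 degrees apart. The sketch below builds sine and cosine features along those three directions at several scales. It is a hedged illustration, not the GridPE implementation: the function names and the `base_freq`, `ratio`, and `n_scales` values are placeholders, and the paper's derived optimal scale ratio is not reproduced here.

```python
import math


def simplex_directions_2d():
    """Unit vectors from the centroid to the vertices of an equilateral triangle."""
    return [(math.cos(2 * math.pi * j / 3), math.sin(2 * math.pi * j / 3))
            for j in range(3)]


def grid_embedding(p, base_freq=1.0, ratio=1.5, n_scales=3):
    """Fourier features of a 2D position p; all default values are hypothetical."""
    feats = []
    for s in range(n_scales):
        freq = base_freq * ratio ** s  # one frequency magnitude per scale
        for kx, ky in simplex_directions_2d():
            phase = freq * (kx * p[0] + ky * p[1])
            feats.extend((math.cos(phase), math.sin(phase)))
    return feats


def dot(a, b):
    """Plain inner product of two equal-length feature lists."""
    return sum(x * y for x, y in zip(a, b))
```

Because cos(a)cos(b) + sin(a)sin(b) = cos(a - b), the inner product of two such embeddings depends only on the displacement between the positions, which is the translation-invariance property the abstract refers to.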

2311.07565 2026-06-05 cs.LG stat.ML 版本更新

Exploration via linearly perturbed loss minimisation

通过线性扰动损失最小化进行探索

David Janz, Shuai Liu, Alex Ayoub, Csaba Szepesvári

发表机构 * University of Alberta(阿尔伯塔大学)

AI总结 本文提出了一种基于线性扰动损失的探索方法EVILL,通过求解线性扰动的正则化负对数似然函数的最小化问题,解释了随机奖励扰动为何能产生有效的多臂老虎机算法,并展示了数据依赖扰动如何使EVILL在理论和实践中达到与Thompson采样类参数扰动方法相当的性能。

Comments Updated with erratum note: Appendix I contains a gap in the proof; all main-paper claims remain valid via the corrected argument of Perneczky, Abeille & Janz (2026, arXiv:2606.00431)

详情
AI中文摘要

我们提出通过线性损失扰动进行探索(EVILL),这是一种用于结构化随机老虎机问题的随机化探索方法,其做法是求解经线性扰动的正则化负对数似然函数的最小化点。我们证明,在广义线性老虎机的情形下,EVILL退化为扰动历史探索(PHE),即通过在随机扰动的奖励上进行训练来实现探索的方法。由此,我们对随机奖励扰动何时以及为何能产生良好的老虎机算法给出了简洁清晰的解释。我们提出了以往PHE类方法中未出现的数据依赖扰动,使EVILL在理论和实践上都能匹配Thompson采样式参数扰动方法的性能。此外,我们给出了广义线性老虎机之外的一个例子,其中PHE会导致不一致的估计并因此产生线性遗憾,而EVILL仍然表现良好。与PHE一样,EVILL只需几行代码即可实现。

英文摘要

We introduce exploration via linear loss perturbations (EVILL), a randomised exploration method for structured stochastic bandit problems that works by solving for the minimiser of a linearly perturbed regularised negative log-likelihood function. We show that, for the case of generalised linear bandits, EVILL reduces to perturbed history exploration (PHE), a method where exploration is done by training on randomly perturbed rewards. In doing so, we provide a simple and clean explanation of when and why random reward perturbations give rise to good bandit algorithms. We propose data-dependent perturbations not present in previous PHE-type methods that allow EVILL to match the performance of Thompson-sampling-style parameter-perturbation methods, both in theory and in practice. Moreover, we show an example outside generalised linear bandits where PHE leads to inconsistent estimates, and thus linear regret, while EVILL remains performant. Like PHE, EVILL can be implemented in just a few lines of code.
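To make "minimiser of a linearly perturbed regularised negative log-likelihood" concrete, the sketch below works through the simplest case: a scalar parameter with unit-variance Gaussian noise, where the negative log-likelihood is a squared loss and the perturbed minimiser has a closed form. This is an illustration under stated assumptions, not the paper's algorithm: the function names are hypothetical, and the perturbation `z` is supplied by the caller because the abstract does not specify its distribution.

```python
def perturbed_loss(theta, xs, ys, lam, z):
    """Ridge-regularised squared loss plus a linear perturbation term z * theta."""
    nll = 0.5 * sum((y - x * theta) ** 2 for x, y in zip(xs, ys))
    return nll + 0.5 * lam * theta ** 2 + z * theta


def perturbed_minimiser(xs, ys, lam, z):
    """Closed-form minimiser of perturbed_loss for a scalar parameter.

    Setting the derivative sum(x * (x * theta - y)) + lam * theta + z to zero
    gives theta = (sum(x * y) - z) / (sum(x * x) + lam).
    """
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (sxy - z) / (sxx + lam)
```

With `z = 0` this is the ordinary ridge estimate; a random `z` shifts the estimate, and acting greedily on the shifted estimate is what produces exploration in this family of methods.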

2403.00965 2026-06-05 stat.AP cs.AI cs.LG 版本更新

Binary Gaussian Copula Synthesis: an LLM-powered data augmentation framework for early dialysis prediction in chronic kidney disease

二元高斯Copula合成:一种基于LLM的数据增强框架,用于慢性肾病早期透析预测

Hamed Khosravi, Milad Khanchi, Mobina Noori, Srinjoy Das, Abdullah Al-Mamun, Imtiaz Ahmed

发表机构 * Department of Industrial & Management Systems Engineering, West Virginia University(西弗吉尼亚大学工业与管理系统工程系) Department of Electrical and Computer Engineering, Concordia University(康科迪亚大学电气与计算机工程系) Department of Computer Science, University of California, Davis(加州大学戴维斯分校计算机科学系) School of Mathematical & Data Sciences, West Virginia University(西弗吉尼亚大学数学与数据科学学院) School of Systems Science and Industrial Engineering, The State University of New York at Binghamton(纽约州立大学宾汉姆顿分校系统科学与工业工程学院) H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology(佐治亚理工学院H.米尔顿·斯图尔特工业与系统工程学院)

AI总结 本文提出Binary Gaussian Copula Synthesis (BGCS),一种专为二元临床数据设计的两阶段数据增强方法,通过生成合成少数类样本并过滤不合理的样本,提高了早期透析预测的性能。

详情
AI中文摘要

只有极少数慢性肾病(CKD)患者会进展到透析,由此造成严重的类别不平衡,限制了机器学习模型在早期透析预测中的性能。电子健康记录(EHR)数据的二元结构进一步加剧了这一挑战,而现有的大多数增强方法并非为此类数据设计。我们提出Binary Gaussian Copula Synthesis(BGCS),一种专为二元临床数据设计的两阶段数据增强方法。BGCS首先使用高斯Copula框架生成合成的少数类样本,该框架显式建模二元特征之间的成对依赖关系;随后在训练之前,用微调的GPT-2分类器滤除临床上不合理的样本。我们在一个真实世界EHR数据集上评估了BGCS,该数据集包含西弗吉尼亚州15,169名CKD患者,收集于2008年至2022年间。我们在四种机器学习分类器上、通过25次独立运行,将其与SMOTE、CTGAN和标准高斯Copula进行了对比。BGCS始终优于所有对比方法,在90天透析预测上取得了最高的少数类召回率(各分类器的中位数在0.78到0.87之间),对真实数据的分布保真度也最强(各特征的平均p值为0.68)。表现最好的BGCS增强模型被集成到一个可解释的、基于决策树的临床决策支持系统中,用于透析风险分层,其中电解质失衡、心血管合并症和肾脏监测指标是最具影响力的预测特征。这些发现表明,针对二元EHR数据结构特性设计的增强方法能够切实改善早期透析风险预测,并支持开发用于CKD护理的可解释临床决策支持工具。

英文摘要

Only a small fraction of patients with chronic kidney disease (CKD) progress to dialysis, creating severe class imbalance that limits the performance of machine learning models for early dialysis prediction. This challenge is compounded by the binary structure of electronic health record (EHR) data, for which most existing augmentation methods were not designed. We propose Binary Gaussian Copula Synthesis (BGCS), a two-stage data augmentation method tailored to binary clinical data. BGCS first generates synthetic minority-class samples using a Gaussian copula framework that explicitly models pairwise dependencies among binary features, then applies a fine-tuned GPT-2 classifier to filter out clinically implausible samples before training. We evaluated BGCS on a real-world EHR dataset of 15,169 patients with CKD from West Virginia collected between 2008 and 2022, benchmarking it against SMOTE, CTGAN, and standard Gaussian Copula across four machine learning classifiers over 25 independent runs. BGCS consistently outperformed all comparison methods, achieving the highest minority-class recall for 90-day dialysis prediction, with median values ranging from 0.78 to 0.87 across classifiers, and the strongest distributional fidelity to real data, with a mean p-value of 0.68 across features. The best-performing BGCS-augmented model was integrated into an interpretable decision tree-based clinical decision support system for dialysis risk stratification, with electrolyte imbalances, cardiovascular comorbidities, and renal monitoring indicators emerging as the most influential predictive features. These findings suggest that augmentation methods designed for the structural properties of binary EHR data can meaningfully improve early dialysis risk prediction and support the development of interpretable clinical decision-support tools for CKD care.
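The first stage described above, a Gaussian copula over binary features, can be sketched for a single pair of features: draw correlated standard normals and threshold each at the quantile matching the desired marginal rate. This is a minimal illustration under assumptions rather than the BGCS implementation. The function name, the fixed latent correlation `rho`, and the example rates are hypothetical; fitting the correlation from data and the GPT-2 filtering stage are not shown.

```python
import math
import random
from statistics import NormalDist


def sample_binary_pair(p1, p2, rho, n, seed=0):
    """Draw n correlated binary pairs with marginal rates p1 and p2 (both in (0, 1)).

    rho is the correlation of the latent Gaussians, not of the binary outputs.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    t1 = std_normal.inv_cdf(p1)  # P(z1 < t1) = p1
    t2 = std_normal.inv_cdf(p2)  # P(z2 < t2) = p2
    samples = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Mix z1 with independent noise so that corr(z1, z2) = rho.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        samples.append((int(z1 < t1), int(z2 < t2)))
    return samples
```

Thresholding a latent Gaussian keeps each binary marginal at its target rate while the latent correlation controls how often the two features co-occur, which is the kind of pairwise dependency the abstract says the method models.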

2306.09712 2026-06-05 cs.LG cs.AI cs.CL 版本更新

Semi-Offline Reinforcement Learning for Optimized Text Generation

半离线强化学习用于优化文本生成

Changyu Chen, Xiting Wang, Yiqiao Jin, Victor Ye Dong, Li Dong, Jie Cao, Yi Liu, Rui Yan

发表机构 * 未知机构

AI总结 本文提出了一种半离线强化学习方法,平衡了探索能力和训练成本,并在优化成本、渐近误差和过拟合误差界方面实现了最优的强化学习设置。

Comments In Proceedings of the 40th International Conference on Machine Learning (ICML 2023)

详情
AI中文摘要

在强化学习(RL)中,与环境交互有两种主要设置:在线和离线。在线方法在显著的时间成本下探索环境,而离线方法通过牺牲探索能力高效地获得奖励信号。我们提出了一种半离线RL,一种新的范式,能够从离线过渡到在线设置,平衡探索能力和训练成本,并为比较不同的RL设置提供理论基础。基于半离线公式,我们提出了在优化成本、渐近误差和过拟合误差界方面最优的RL设置。广泛实验表明,我们的半离线方法高效且在与最新方法相比时表现相当或更好。

英文摘要

In reinforcement learning (RL), there are two major settings for interacting with the environment: online and offline. Online methods explore the environment at significant time cost, and offline methods efficiently obtain reward signals by sacrificing exploration capability. We propose semi-offline RL, a novel paradigm that smoothly transits from offline to online settings, balances exploration capability and training cost, and provides a theoretical foundation for comparing different RL settings. Based on the semi-offline formulation, we present the RL setting that is optimal in terms of optimization cost, asymptotic error, and overfitting error bound. Extensive experiments show that our semi-offline approach is efficient and yields comparable or often better performance compared with state-of-the-art methods.

2305.12640 2026-06-05 cs.AI cs.LG stat.ML 版本更新

Limited Resource Allocation in a Non-Markovian World: The Case of Maternal and Child Healthcare

非马尔可夫世界中的有限资源分配:以妇幼保健为例

Panayiotis Danassis, Shresth Verma, Jackson A. Killian, Aparna Taneja, Milind Tambe

发表机构 * Harvard University(哈佛大学) Google Research(谷歌研究)

AI总结 本文研究非马尔可夫环境下的有限资源分配,将参与者轨迹建模为时间序列,并提出新的时间序列臂排名指数(TARI)策略,以提高妇幼健康项目的参与度和依从性。

Comments Proceedings of the 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023)

详情
AI中文摘要

许多医疗项目的成功取决于参与者的依从性。我们考虑在资源有限的环境中安排干预措施(例如由卫生工作者及时拨打支持电话),以提高依从性和/或参与度。以往的工作已经成功开发了若干类基于不安定多臂老虎机(Restless Multi-armed Bandit,RMAB)的解决方案。然而,以往所有RMAB方法都假设参与者的行为满足马尔可夫性质。我们在合作伙伴NGO ARMMAN的孕产妇健康意识项目的真实数据上,展示了对马尔可夫假设的显著偏离。此外,我们将RMAB扩展到连续状态空间,这是此前研究不足的领域。为处理一般化的非马尔可夫RMAB设定,我们(i)将每个参与者的轨迹建模为时间序列,(ii)利用时间序列预测模型学习复杂的模式和动态以预测未来状态,(iii)提出时间序列臂排名指数(TARI)策略,这是一种新算法,根据对未来状态的预测选择最能从干预中受益的RMAB臂。我们在合成数据以及对ARMMAN真实数据的二次分析上评估了该方法,结果表明,与已部署的SOTA Whittle指数方案相比,参与度显著提升:多收听了16.3小时的内容,多防止了90.8%的参与度下降,并覆盖了两倍以上的高流失风险受益人。

英文摘要

The success of many healthcare programs depends on participants' adherence. We consider the problem of scheduling interventions in low resource settings (e.g., placing timely support calls from health workers) to increase adherence and/or engagement. Past works have successfully developed several classes of Restless Multi-armed Bandit (RMAB) based solutions for this problem. Nevertheless, all past RMAB approaches assume that the participants' behaviour follows the Markov property. We demonstrate significant deviations from the Markov assumption on real-world data on a maternal health awareness program from our partner NGO, ARMMAN. Moreover, we extend RMABs to continuous state spaces, a previously understudied area. To tackle the generalised non-Markovian RMAB setting we (i) model each participant's trajectory as a time-series, (ii) leverage the power of time-series forecasting models to learn complex patterns and dynamics to predict future states, and (iii) propose the Time-series Arm Ranking Index (TARI) policy, a novel algorithm that selects the RMAB arms that will benefit the most from an intervention, given our future state predictions. We evaluate our approach on both synthetic data, and a secondary analysis on real data from ARMMAN, and demonstrate significant increase in engagement compared to the SOTA, deployed Whittle index solution. This translates to 16.3 hours of additional content listened, 90.8% more engagement drops prevented, and reaching more than twice as many high dropout-risk beneficiaries.
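The selection step in (iii) can be pictured as a top-k ranking over forecast benefit. The sketch below is a hypothetical stand-in, not the TARI index itself: the abstract does not define how the index is computed, so the "gain" here is simply the difference between two caller-supplied forecasts, and the function and argument names are invented for illustration.

```python
import heapq


def select_arms(pred_with, pred_without, budget):
    """Return the indices of the `budget` arms with the largest forecast gain.

    pred_with[i]    forecast engagement of arm i if it receives an intervention
    pred_without[i] forecast engagement of arm i if it does not
    Both forecasts are assumed to come from some external time-series model.
    """
    if len(pred_with) != len(pred_without):
        raise ValueError("forecast lists must have the same length")
    gains = [w - wo for w, wo in zip(pred_with, pred_without)]
    # nlargest returns indices ordered from largest to smallest gain.
    return heapq.nlargest(budget, range(len(gains)), key=gains.__getitem__)
```

For example, with a budget of two calls, the two participants whose forecast engagement improves most under intervention would be scheduled first.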

0911.2381 2026-06-05 physics.data-an cond-mat.stat-mech cs.LG nlin.CD stat.ME 版本更新

Analytical Determination of Fractal Structure in Stochastic Time Series

随机时间序列中分形结构的解析确定

Fermín Moscoso del Prado Martín

发表机构 * Laboratoire de Psychologie Cognitive (UMR 6146), CNRS & Aix-Marseille Université I, Marseille, France

AI总结 本文提出了一种基于贝叶斯评估的分析框架,用于客观准确地推断时间序列的分形结构,同时推导出一种优于现有方法的Hurst指数最大似然估计器。

Comments 9 pages, 4 figures

详情
Journal ref
Psychological Methods (2013) 18(4):514-34
AI中文摘要

当前判断时间序列是否具有分形结构(FS)的方法,依赖于对Hurst指数(H)估计量的主观评估。本文提出“标度的贝叶斯评估”(Bayesian Assessment of Scaling),这是一个用于对时间序列的FS做出客观、准确推断的解析框架。该技术利用了与时间序列相关联的扩散过程的标度性质。所得判据易于计算,并能准确刻画支持时间序列标度机制各种假设的证据。此外,由该判据可导出H的闭式最大似然估计量,其表现优于现有最好的估计量。

英文摘要

Current methods for determining whether a time series exhibits fractal structure (FS) rely on subjective assessments on estimators of the Hurst exponent (H). Here, I introduce the Bayesian Assessment of Scaling, an analytical framework for drawing objective and accurate inferences on the FS of time series. The technique exploits the scaling property of the diffusion associated to a time series. The resulting criterion is simple to compute and represents an accurate characterization of the evidence supporting different hypotheses on the scaling regime of a time series. Additionally, a closed-form Maximum Likelihood estimator of H is derived from the criterion, and this estimator outperforms the best available estimators.