arXivDaily: arXiv daily academic digest, updated Monday through Friday
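2605.26112 2026-05-26 cs.AI cs.LG (version update)

From Model Scaling to System Scaling: Scaling the Harness in Agentic AI

Shangding Gu

Affiliations: UC Berkeley

AI summary: The paper argues that the next bottleneck for agentic AI is system scaling rather than model scaling alone. It calls for auditable, persistent, modular, and verifiable architectures (the "harness") and examines three bottlenecks, namely context governance, trustworthy memory, and dynamic skill routing, to carry model capability over into long-horizon task execution.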

Abstract

This paper studies the next major bottleneck in agentic AI as system scaling, not only model scaling: the design of auditable, persistent, modular, and verifiable architectures around foundation models. We refer to this shift as scaling the harness: treating the structured execution layer around a foundation model as a first-class object of design, evaluation, and optimization. Although recent large language models enable agents to use tools, retrieve information, maintain memory, and execute long-horizon workflows, evaluation remains largely model-centric, often reducing agents to final-task success while treating memory, retrieval, tool use, orchestration, verification, and governance as secondary implementation details. This framing is increasingly inadequate because agent performance emerges from the interaction among the foundation model, memory substrate, context constructor, skill-routing layer, orchestration loop, and verification-and-governance layer. Together, these components form the agent harness, which translates model capability into long-horizon agent behavior. We study scaling the harness through three core bottlenecks: context governance, trustworthy memory, and dynamic skill routing, together with the orchestration and governance mechanisms that coordinate and constrain them. We further outline a research agenda for harness-level benchmarks that go beyond one-shot task success to measure trajectory quality, memory hygiene, context efficiency, communication fidelity, verification cost, and safe evolution over time. To make the discussion concrete, we develop CheetahClaws: https://github.com/SafeRL-Lab/cheetahclaws, a Python-native reference harness, and compare it with Claude Code and OpenClaw. Our main claim is that future progress in agentic AI will depend as much on system design as on stronger foundation models.

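2605.26111 2026-05-26 cs.CV cs.AI cs.GR cs.LG cs.MM (version update)

Squeezing Capacity from Multimodal Large Language Models for Subject-driven Generation

Shuhong Zheng, Aashish Kumar Misraa, Yu-Teng Li, Yu-Jhe Li, Igor Gilitschenski

Affiliations: University of Toronto & Vector Institute; Adobe; Google

AI summary: Proposes a method that combines multimodal large language model conditioning with VAE-based identity conditioning. A Dual Layer Aggregation module and a multi-stage denoising strategy balance multimodal understanding against identity preservation in subject-driven image generation, outperforming existing methods.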

Comments 33 pages, 18 figures, Project Page: https://zsh2000.github.io/squeeze-mllm-subject-gen/

Abstract

Subject-driven image generation aims to synthesize new images that preserve the identity of the given subject while following textual instructions. Existing approaches often encode text and reference images separately. This limits cross-modal reasoning abilities and causes copy-paste artifacts. Recent frameworks that connect multimodal models and diffusion models improve instruction following, but largely overlook identity preservation. To address these limitations, we condition diffusion models on Multimodal Large Language Models (MLLMs) that jointly encode text and reference images, and augment this conditioning with VAE-based identity conditioning. A novel Dual Layer Aggregation (DLA) module is designed to aggregate multi-level MLLM features for optimal conditioning, and a multi-stage denoising strategy is applied to progressively balance the semantic information from the MLLM and the fine-detail identity from the VAE during inference. Extensive experiments demonstrate that our approach harmonizes multimodal understanding with identity preservation, mitigates copy-paste issues, and achieves superior performance regarding human preference on subject-driven image generation. Our project website is available at https://zsh2000.github.io/squeeze-mllm-subject-gen/.

2605.26110 2026-05-26 cs.LG cs.CL cs.CV (version update)

Prism: A Plug-in Reproducible Infrastructure for Scalable Multimodal Continual Instruction Tuning

Jun-Tao Tang, Yu-Cheng Shi, Zhen-Hao Xie, Da-Wei Zhou

Affiliations: School of Artificial Intelligence, Nanjing University, China; National Key Laboratory for Novel Software Technology, Nanjing University, China

AI summary: To address engineering bottlenecks in multimodal continual instruction tuning, the paper presents Prism, a plug-in codebase that decouples algorithm development from the backbone implementation through a lightweight plugin registration mechanism and supports large-scale training pipelines, enabling reproducible and scalable experiments.

Comments Code is available at https://github.com/LAMDA-CL/Prism

Abstract

Multimodal Large Language Models (MLLMs) achieve versatility by reformulating diverse tasks into a unified instruction-following framework via instruction tuning. However, real-world deployment requires continuous adaptation to emerging tasks, motivating Multimodal Continual Instruction Tuning (MCIT). Despite its growing importance, current MCIT research is hindered by severe engineering bottlenecks. Existing methods are typically implemented by directly modifying the base MLLM codebase, which imposes substantial implementation overhead and yields method-specific architectures that severely limit code reuse and fair comparison. To address this, we introduce Prism, a plug-in reproducible codebase specifically designed for scalable MCIT research. It separates algorithmic development from the backbone implementation via a lightweight plugin registration mechanism, enabling new strategies to be integrated as independent plugins without modifying the underlying MLLM codebase, thereby eliminating structural fragmentation and accelerating method development. Prism natively supports widely used large-scale training pipelines, thereby enabling reproducible and scalable MCIT experimentation. Code is available at https://github.com/LAMDA-CL/Prism.

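2605.26106 2026-05-26 cs.LG (version update)

Looped Diffusion Language Models

Sanghyun Lee, Chunsan Hong, Seungryong Kim, Jonghyun Lee, Jongho Park, Dongmin Park

Affiliations: KAIST; KRAFTON; University of California, Berkeley

AI summary: Proposes LoopMDM, which selectively loops the early-middle transformer layers. This yields a depth-scaling effect during training without adding parameters and lets inference compute scale flexibly with the number of loops, improving the training efficiency and performance of masked diffusion models.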

Comments 23 pages

Abstract

Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models for language modeling, yet the effective design of transformer architectures for MDMs remains underexplored. In this paper, we show that selectively looping the early-middle transformer layers significantly improves both training efficiency and model performance in MDMs. We call this approach LoopMDM (Looped Masked Diffusion Model), which brings two key benefits: looping layers at training-time yields a depth-scaling effect without adding parameters, while varying the number of loops at inference-time enables flexible compute scaling. Despite the simplicity, the results are striking: across multiple pre-training corpora, LoopMDM matches the performance of same-size MDMs with up to 3.3× fewer training FLOPs, while its final performance outperforms them on various reasoning benchmarks, including up to 8.5 points on GSM8K. It even surpasses deeper non-looped MDMs trained with comparable per-step compute, indicating that selective looping is more effective than naive depth scaling. Furthermore, LoopMDM can scale inference-time compute by increasing the number of loops. Adaptively adjusting the number of loops throughout the sampling process further yields additional gains in compute efficiency while maintaining performance. Lastly, with attention analysis, we provide evidence that looping is effective in MDMs by promoting interactions among masked positions. Our code and weights will be publicly released.
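
As a rough illustration of selective looping, the minimal sketch below reuses a middle block of layers several times within one forward pass, so depth grows while the parameter count stays fixed. It is not the LoopMDM implementation: the residual tanh layer, the names `make_layer`, `looped_forward`, and `effective_depth`, and the choice of which layers to loop are all assumptions made for illustration.

```python
import numpy as np


def make_layer(rng, dim):
    """Hypothetical stand-in for a transformer layer: a residual tanh block."""
    w = rng.normal(scale=dim ** -0.5, size=(dim, dim))
    return lambda x: x + np.tanh(x @ w)


def looped_forward(x, layers, loop_start, loop_end, n_loops):
    """Run layers[:loop_start] once, layers[loop_start:loop_end] n_loops times
    with the same weights, then layers[loop_end:] once."""
    for layer in layers[:loop_start]:
        x = layer(x)
    for _ in range(n_loops):
        for layer in layers[loop_start:loop_end]:
            x = layer(x)
    for layer in layers[loop_end:]:
        x = layer(x)
    return x


def effective_depth(n_layers, loop_start, loop_end, n_loops):
    """Count layer applications in one forward pass (parameters are unchanged)."""
    return n_layers + (n_loops - 1) * (loop_end - loop_start)


rng = np.random.default_rng(0)
layers = [make_layer(rng, dim=8) for _ in range(6)]
x = rng.normal(size=(4, 8))

# Plain sequential pass, for comparison with n_loops=1.
y_plain = x
for layer in layers:
    y_plain = layer(y_plain)

y1 = looped_forward(x, layers, loop_start=1, loop_end=4, n_loops=1)
y3 = looped_forward(x, layers, loop_start=1, loop_end=4, n_loops=3)
```

With six layers and layers 1 to 3 looped three times, `effective_depth` reports twelve layer applications for the same six sets of weights. Changing `n_loops` at call time is the toy analogue of varying inference-time compute.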

2605.26097 2026-05-26 cs.LG (version update)

Forgetting in Language Models: Capacity, Optimization, and Self-Generated Replay

Martin Marek, Dongkyu Cho, Shikai Qiu, Rumi Chunara, Pavel Izmailov, Andrew Gordon Wilson

Affiliations: New York University

AI summary: Studies forgetting in language models, finds that self-generated samples serve as effective replay data that nearly eliminates forgetting, and characterizes how capacity limits and low learning rates affect forgetting.

Abstract

Models trained on a new task typically degrade on prior tasks, a phenomenon known as forgetting. Traditionally, mitigating forgetting has required replaying stored exemplars from prior tasks, which is often impractical. By contrast, language models can sample from their own training distribution, and we show that these self-generated samples serve as effective replay data, nearly eliminating forgetting. We find that forgetting nonetheless persists when the model has little remaining capacity: models pretrained close to saturation cannot absorb new information without overwriting prior knowledge. When capacity is not the limiting factor, low learning rates reduce forgetting but require substantially more training steps. Replay breaks this tradeoff, enabling fast, high-learning-rate finetuning without forgetting.

2605.26093 2026-05-26 cs.LG stat.ML (version update)

Goal-driven Bayesian Optimal Experimental Design for Robust Decision-Making Under Model Uncertainty

Jinwoo Go, Xiaoning Qian, Byung-Jun Yoon

Affiliations: Computing & Data Sciences, Brookhaven National Laboratory; Department of Electrical & Computer Engineering, Texas A&M University; Department of Computer Science & Engineering, Texas A&M University

AI summary: Proposes GoBOED, a framework that combines a variational posterior surrogate with a differentiable convex decision layer to optimize experimental designs directly for downstream decision quality, and proves that its gradients are insensitive to parameter directions irrelevant to the decision.

Abstract

Bayesian optimal experimental design (BOED) selects experiments to maximize information gain about model parameters. However, in decision-critical settings, reducing parameter uncertainty does not necessarily improve downstream decisions, as only specific parameter directions relevant to the objective truly matter. We propose GoBOED, a goal-driven BOED framework that directly optimizes experimental designs for a specified decision-making objective. GoBOED combines an amortized variational posterior surrogate with a differentiable convex decision layer, enabling gradient-based design optimization that is fully decision-focused. We theoretically show that GoBOED gradients are insensitive to parameter directions irrelevant to the decision objective, providing a formal justification for why goal-driven design achieves equivalent decision quality over a wider set of experimental designs than information-gain maximization. Empirically, across source localization, epidemic management, and pharmacokinetic control, GoBOED identifies designs that better align with downstream decision objectives and reveals that near-optimal design windows are substantially wider than those predicted by goal-agnostic BOED approaches.

2605.26087 2026-05-26 stat.ML cs.LG (version update)

DiscoverPhysics: Benchmarking LLMs for Out-of-the-Box Scientific Thinking

Matt L. Wiemann, Lindsay M. Smith, Peter Melchior, Siddharth Mishra-Sharma, Andrew Gordon Wilson, Pavel Izmailov, Carolina Cuesta-Lázaro

Affiliations: Princeton University; Boston University; New York University; Flatiron Institute; Institute for Advanced Studies

AI summary: Presents DiscoverPhysics, an interactive benchmark in which LLM agents explore simulated worlds whose physical laws deviate from reality, evaluating their ability to design experiments, revise hypotheses, and discover physical laws.

Abstract

Frontier LLMs now perform strongly across a wide range of physics evaluations, but it is hard to disentangle genuine reasoning from recall of established science. We introduce DiscoverPhysics, an interactive benchmark that asks an LLM agent to discover the laws of motion of a simulated world whose physics deliberately deviates from our own. We construct 22 worlds governed by, among others, screened and fractional-power gravity, multi-species couplings, hidden dark-matter-like particles, non-coordinate-free physics, and time-varying interactions. Each world is generated on demand by an N-body simulator, for which the agent proposes several rounds of experiments, observes raw trajectory data, and ultimately submits both a natural-language explanation of the world's physics and a Python implementation of the inferred law. Because solving a world requires the agent to design informative experiments and revise its hypotheses, the benchmark probes long-horizon reasoning over an experimental history. We evaluate submissions along two complementary axes: trajectory MSE on held-out particles and an LLM-judged explanation score following an expert-written rubric assessing conceptual understanding of each world. Across eleven frontier models, we find that the strongest agents pass only half of the worlds and consistently fail on those where latent structure must be uncovered. Open-source models lag substantially behind commercial models, both in their ability to design informative experiments and in extracting conclusions from the data. We further find that good predictive accuracy does not guarantee high explanation quality and that conceptual understanding depends on hypothesis refinement through well-chosen experiments.

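2605.26072 2026-05-26 cs.LG (version update)

Active Query Synthesis for Preference Learning

Namrata Nadagouda, Nauman Ahad, Maegan Tucker, Mark A. Davenport

Affiliations: Georgia Institute of Technology; Gauss Labs

AI summary: To address unreliable query feedback and the computational bottleneck of pool-based evaluation in preference learning, the paper proposes Info-Synth, an active query synthesis framework that maximizes mutual information over a continuous space, along with two extensions for finite query pools. The approach is validated on synthetic data, text summarization, and robot control tasks.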

Comments 27 pages, 12 figures

Abstract

Efficient learning of user preferences is crucial for many modern decision-making systems but typically requires costly labeled data. Active learning reduces this cost, yet standard methods are computationally expensive due to pool-based evaluation. Further, most methods assume all query feedback is equally reliable, ignoring that pairwise queries between nearly identical or entirely dissimilar items yield ambiguous, low-confidence responses. To address the issue of feedback reliability, we introduce a novel confidence-aware response model that explicitly accounts for these ambiguous comparisons. To overcome the computational bottleneck of pool-based evaluation, we propose an active query synthesis framework, Info-Synth, that generates optimal queries by maximizing a mutual information-based objective within a continuous space. Moreover, we propose two strategies, Pair M-dist and Pair Opt-dist, that extend Info-Synth to select effective queries even when restricted to finite query pools. We demonstrate our framework's versatility and performance across synthetic preference learning, constrained text summary datasets, and subjective, continuous-space controller gain tuning for a simulated mobile robot.

2605.26067 2026-05-26 cs.LG cs.AI (version update)

Conditional KRR: Injecting Unpenalized Features into Kernel Methods with Applications to Kernel Thresholding

Rustem Takhanov, Zhenisbek Assylbekov

Affiliations: Department of Mathematics, Nazarbayev University, Astana, Kazakhstan; Nazarbayev University Research Administration, Astana, Kazakhstan; Purdue University Fort Wayne, Indiana, USA

AI summary: By reducing conditional KRR to KRR with a residual kernel, the paper analyzes its statistical properties and identifies conditions under which it outperforms standard KRR in the kernel principal component and random feature settings.

Comments Accepted to ICML 2026

Abstract

Conditionally positive definite (CPD) kernels are defined with respect to a function class $\mathcal{F}$. It is well known that such a kernel $K$ is associated with its native space (defined analogously to an RKHS), which in turn gives rise to a learning method -- called conditional kernel ridge regression (conditional KRR) due to its analogy with KRR -- where the estimated regression function is penalized by the square of its native space norm. This method is of interest because it can be viewed as classical linear regression, with features specified by $\mathcal{F}$, followed by the application of standard KRR to the residual (unexplained) component of the target variable. Methods of this type have recently attracted increasing attention. We study the statistical properties of this method by reducing its behavior to that of KRR with another fixed kernel, called the residual kernel. Our main theoretical result shows that such a reduction is indeed possible, at the cost of an additional term in the expected test risk, bounded by $\mathcal{O}(1/\sqrt{N})$, where $N$ is the sample size and the hidden constant depends on the class $\mathcal{F}$ and the input distribution. This reduction enables us to analyze conditional KRR in the case where $K$ is positive definite and $\mathcal{F}$ is given by the first $k$ principal eigenfunctions in the Mercer decomposition of $K$. We also consider the setting where $\mathcal{F}$ consists of $k$ random features from a random feature representation of $K$. It turns out that these two settings are closely related. Both our theoretical analysis and experiments confirm that conditional KRR outperforms standard KRR in these cases whenever the $\mathcal{F}$-component of the regression function is more pronounced than the residual part.
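
The abstract's reading of conditional KRR as unpenalized linear regression on the features in $\mathcal{F}$ followed by standard KRR on the residual can be sketched in a few lines of NumPy. This is only that two-stage reading under stated assumptions, not the authors' estimator or code: the RBF kernel, the toy class $\mathcal{F} = \mathrm{span}\{1, x\}$, the regularization value, and all function names are hypothetical choices for illustration.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0):
    """RBF kernel matrix between 1-D input arrays a and b (assumed kernel)."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale ** 2)


def features(x):
    """Hypothetical unpenalized feature class F: span{1, x}."""
    return np.stack([np.ones_like(x), x], axis=1)


def fit_two_stage(x, y, lam=1e-2):
    """Stage 1: least squares on F. Stage 2: KRR on what stage 1 leaves unexplained."""
    P = features(x)
    beta, *_ = np.linalg.lstsq(P, y, rcond=None)
    resid = y - P @ beta
    K = rbf_kernel(x, x)
    alpha = np.linalg.solve(K + lam * np.eye(len(x)), resid)
    return beta, alpha


def predict(x_new, x_train, beta, alpha):
    """Sum of the unpenalized linear part and the penalized kernel part."""
    return features(x_new) @ beta + rbf_kernel(x_new, x_train) @ alpha


# Toy data: a strong linear trend plus a smaller nonlinear residual.
x = np.linspace(-3.0, 3.0, 40)
y = 2.0 + 3.0 * x + np.sin(2.0 * x)

beta, alpha = fit_two_stage(x, y)
mse_linear = np.mean((y - features(x) @ beta) ** 2)
mse_two_stage = np.mean((y - predict(x, x, beta, alpha)) ** 2)
```

In this toy setting the linear trend is fitted without shrinkage, and the kernel stage only has to explain the sinusoidal remainder, which is the regime the abstract describes as favorable to conditional KRR.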

2605.26061 2026-05-26 cs.LG cs.AI (version update)

Neuronal Stochastic Attention Circuit (NSAC) for Probabilistic Representation Learning

Waleed Razzaq, Yun-Bo Zhao

Affiliations: Department of Automation, University of Science & Technology of China, Hefei, China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center

AI summary: Proposes NSAC, a biologically inspired continuous-time attention architecture that uses an Ornstein-Uhlenbeck stochastic differential equation and an NCP gating mechanism to induce a Gaussian distribution over logits, yielding probabilistic outputs and uncertainty quantification.

Abstract

Reliable quantification of uncertainty estimates in continuous-time (CT) representation learning remains nascent, particularly within CT attention architectures. We introduce the Neuronal Stochastic Attention Circuit (NSAC), a novel biologically-inspired CT attention architecture that reformulates attention logit computation as the solution of an Ornstein-Uhlenbeck stochastic differential equation modulated by input-dependent, nonlinear interlinked gates derived from the wiring mechanism of repurposed C. elegans Neuronal Circuit Policies (NCPs). It induces a Gaussian distribution over logits that propagates principled stochasticity through a logistic-normal distribution over attention weights to yield probabilistic output. A two-term objective function combining Gaussian negative log-likelihood with an epistemic-separation regularizer enforces higher predictive variance and enables joint quantification of aleatoric and epistemic uncertainty. Empirically, we implement NSAC in a diverse set of learning tasks including: (i) irregular CT function approximation; (ii) multivariate regression; (iii) long-range forecasting; (iv) Industry 4.0; and (v) the lane-keeping of autonomous vehicles. We observe that the NSAC remains competitive against several baselines in terms of accuracy and produces reasonably well-calibrated uncertainty estimates while being interpretable at the neuronal cell level.
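
The chain from an Ornstein-Uhlenbeck process on logits to a distribution over attention weights can be illustrated with a small simulation. The sketch below is a simplification under stated assumptions and not the NSAC architecture: the input-dependent NCP gates are omitted, the drift and noise constants are fixed by hand, and the function names and Euler-Maruyama discretization are illustrative choices.

```python
import numpy as np


def ou_logits(mu, theta, sigma, n_steps, dt, rng):
    """Euler-Maruyama simulation of dz = theta * (mu - z) dt + sigma dW, started at mu.

    Because the dynamics are linear with Gaussian noise, the returned logits
    are Gaussian distributed around mu.
    """
    z = mu.copy()
    for _ in range(n_steps):
        noise = rng.normal(size=z.shape)
        z = z + theta * (mu - z) * dt + sigma * np.sqrt(dt) * noise
    return z


def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
mu = np.array([2.0, 0.0, -1.0])  # hypothetical mean logits for three keys

# Softmax of Gaussian logits gives logistic-normal attention weights.
samples = np.stack([
    softmax(ou_logits(mu, theta=1.0, sigma=0.5, n_steps=200, dt=0.01, rng=rng))
    for _ in range(100)
])
mean_weights = samples.mean(axis=0)
std_weights = samples.std(axis=0)
```

Each row of `samples` is one draw of attention weights, so `std_weights` gives a per-key spread that a downstream layer could propagate as uncertainty.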

2605.26059 2026-05-26 physics.flu-dyn cs.LG (version update)

Accelerating Bayesian inverse design in computational fluid dynamics using neural operators

Bipin Tiwari, Omer San

Affiliations: Department of Mechanical and Aerospace Engineering, University of Tennessee, Knoxville

AI summary: Embeds a neural operator surrogate in the MCMC sampling loop for Bayesian inverse design in computational fluid dynamics, preserving posterior structure while achieving a speedup of more than three orders of magnitude.

Journal ref: Mach. Learn. Comput. Sci. Eng 2, 14 (2026)

Abstract

Bayesian inverse design provides a principled framework for inferring aerodynamic geometries from sparse flow observations while quantifying uncertainty. However, its practical use in computational fluid dynamics (CFD) is severely limited by the cost of repeated high-fidelity simulations required for gradient-based Markov chain Monte Carlo (MCMC) sampling. While surrogate models are commonly proposed to reduce this cost, their effect on posterior geometry and uncertainty, especially for shock-dominated flows, remains poorly understood. In this work, we demonstrate that neural operator surrogates can be embedded directly within the MCMC inference loop while preserving posterior structure. Using a fully Bayesian inverse formulation of quasi-one-dimensional nozzle flow, we demonstrate that geometry parameterization plays a decisive role in identifiability and posterior conditioning, with cubic B-splines yielding stable and physically meaningful uncertainty estimates. Building on this formulation, a Deep Operator Network trained on CFD-generated data is substituted for the CFD solver within a No-U-Turn Sampler, while keeping the likelihood model, priors, and sampling configuration unchanged. Across sparse to fully observed regimes, surrogate-based inference reproduces the posterior geometry and uncertainty trends of the CFD reference. As a result of surrogate integration, total inference time is reduced to under one second, corresponding to a speedup exceeding three orders of magnitude. In addition, a direct inverse neural operator is examined as a deterministic alternative for inverse design, enabling single-shot geometry reconstruction without posterior sampling. These results demonstrate that neural operator-accelerated Bayesian inference enables practical, uncertainty-aware inverse design workflows for aerodynamic applications.

2605.26036 2026-05-26 cs.AI cs.LG (version update)

CITYREP: A Unified Benchmark for Urban Representations Across Cities, Tasks, and Modalities

Junyuan Liu, Xinglei Wang, Zichao Zeng, Jiazhuang Feng, Quan Qin, Ilya Ilyankou, Guangsheng Dong, Tao Cheng

Affiliations: SpaceTimeLab, University College London, UK; 3DIMPact, University College London, UK; School of Resource and Environmental Sciences, Wuhan University, China; State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, China

AI summary: Proposes the CityRep benchmark, which uses spatially structured splits to evaluate urban representations across modalities, cities, and tasks, addressing the spatial leakage and inflated performance caused by random splits.

Abstract

Urban representation learning encodes complex urban environments into general-purpose embeddings for diverse downstream tasks and emerging urban foundation models. However, current evaluations are limited, typically focusing on one or two cities and tasks and relying on random splits that introduce spatial leakage, leading to inflated performance and weak support for cross-location generalization and fair comparison. To address this, we propose CityRep, a unified benchmark that evaluates urban representations across data modalities, cities, and tasks using spatially structured splits. CityRep consists of three key components: (1) a spatial unit-agnostic evaluation framework that supports heterogeneous urban representations through a standardized alignment module; (2) a unified evaluation protocol using block-based spatial splits to mitigate spatial leakage and enable rigorous model comparison; and (3) an extensible multi-city, multi-task benchmark suite spanning 8 cities and 8 tasks across regression, classification, and distribution prediction. We evaluate 11 representative urban representation models. Results show that performance is highly sensitive to the split protocol, with random splits inflating scores and altering model rankings. We also observe substantial variability across cities and tasks, underscoring the need for generalization-aware evaluation. CityRep is released as a reproducible benchmark with datasets, evaluation pipelines, and diagnostic tools to facilitate fair comparison and support future research in urban representation learning towards urban foundation models.
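
To make the idea of a block-based spatial split concrete, the sketch below assigns each location to a square grid block and then sends whole blocks to either train or test, so nearby points never straddle the split. It is a generic illustration and not CityRep's evaluation pipeline: the grid construction, the `block_size` and `test_fraction` parameters, and the function names are assumptions.

```python
import random


def block_id(x, y, block_size):
    """Index of the square grid block that contains point (x, y)."""
    return (int(x // block_size), int(y // block_size))


def spatial_block_split(points, block_size, test_fraction=0.25, seed=0):
    """Split point indices so that every block is entirely train or entirely test."""
    blocks = {}
    for idx, (x, y) in enumerate(points):
        blocks.setdefault(block_id(x, y, block_size), []).append(idx)

    block_keys = sorted(blocks)
    random.Random(seed).shuffle(block_keys)
    n_test = max(1, round(test_fraction * len(block_keys)))

    test_keys = block_keys[:n_test]
    train_keys = block_keys[n_test:]
    train = [i for key in train_keys for i in blocks[key]]
    test = [i for key in test_keys for i in blocks[key]]
    return train, test, set(test_keys)


# Toy city: 64 locations on an 8 x 8 lattice, grouped into 2 x 2 blocks.
points = [(x + 0.5, y + 0.5) for x in range(8) for y in range(8)]
train, test, test_blocks = spatial_block_split(points, block_size=2.0)
```

A random split of the same 64 points would typically place immediate neighbours on both sides of the split, which is the spatial leakage the abstract warns about.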

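2605.26035 2026-05-26 cs.LG (version update)

Length Generalization with Log-Depth Recurrent Units

Charles Pert, Dalal Alrajeh, Alessandra Russo

Affiliations: Department of Computing, Imperial College London

AI summary: Proposes MLP-LDRU, a Log-Depth Recurrent Unit that approximates recurrence through parallel reduction and achieves length generalization on 21 regular-language tasks, reaching 100% out-of-distribution accuracy on 18 of them.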

Comments 39 pages, 11 figures

Abstract

Length generalization remains a persistent challenge for neural networks: recurrent models tend to suffer from positional biases, while transformers are constrained by fixed computational depth. Regular languages provide a frequently used testbed for evaluating length generalization, as label prediction can be checked for any sequence length. We propose MLP-LDRU, a type of Log-Depth Recurrent Unit, which captures a class of associativity-biased operators designed to approximate recurrence through parallel reduction. We evaluate MLP-LDRU on 21 regular-language tasks, consisting of standard benchmarks and new prefix languages, where it achieves 100% out-of-distribution accuracy on 18 tasks and at least 99.9% on the remaining 3 when increasing max training length, outperforming comparable recurrent and attention-based models. We further evaluate MLP-LDRU beyond regular languages on ListOps and NLP classification benchmarks, where it performs competitively.
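
Why associativity gives logarithmic depth can be seen with an exactly associative operator. The sketch below is not MLP-LDRU, whose operator is learned and only biased toward associativity; it composes the transition functions of a parity automaton (a simple regular language) with a balanced pairwise reduction. The names `compose`, `tree_reduce`, and `parity_parallel` are hypothetical.

```python
def compose(f, g):
    """Apply f, then g. Each function is a tuple mapping state index to next state."""
    return tuple(g[s] for s in f)


def tree_reduce(items, op):
    """Balanced pairwise reduction of a non-empty list; returns (result, depth).

    Each round combines neighbours in order, so the result matches a
    left-to-right scan whenever op is associative.
    """
    items = list(items)
    depth = 0
    while len(items) > 1:
        paired = [op(items[i], items[i + 1]) for i in range(0, len(items) - 1, 2)]
        if len(items) % 2:
            paired.append(items[-1])  # odd element waits for the next round
        items = paired
        depth += 1
    return items[0], depth


# Parity automaton: state 0 = even number of 1s seen, state 1 = odd.
STEP = {"0": (0, 1), "1": (1, 0)}


def parity_parallel(bits):
    """Final parity state and reduction depth for a non-empty bit string."""
    total, depth = tree_reduce([STEP[b] for b in bits], compose)
    return total[0], depth  # state reached from start state 0


result_short = parity_parallel("1101")       # three 1s -> odd
result_long = parity_parallel("1" * 1000)    # 1000 ones -> even
```

For the 1000-symbol input the reduction finishes in 10 rounds rather than 1000 sequential steps, which is the depth saving that parallel reduction offers over a step-by-step recurrence.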

2605.26032 2026-05-26 cs.CV cond-mat.stat-mech cs.AI cs.LG (version update)

Everything at Every Scale: Scale-Invariant Diffusion with Continuous Super-Resolution

Zixin Jessie Chen, Zhuo Chen, Archer Wang, Jeff Gore, William T. Freeman, Congyue Deng, Marin Soljačić

Affiliations: Department of Physics, Massachusetts Institute of Technology; Department of EECS, Massachusetts Institute of Technology; NSF AI Institute for Artificial Intelligence and Fundamental Interactions; Institute for Data, Systems and Society, Massachusetts Institute of Technology

AI summary: Proposes SKILD, a scale-invariant diffusion model that unifies image generation and continuous super-resolution, switching between tasks by changing only the starting timestep.

Comments 29 pages, 17 figures

Abstract

Creating images from noise is image generation; reconstructing fine details from coarse inputs is super-resolution. Despite their practical differences, both can be understood as reversing information loss across scales. We introduce $\textbf{SKILD}$, a $\textbf{S}$cale-invariant $\textbf{K}$-Space $\textbf{I}$mage $\textbf{L}$earning $\textbf{D}$iffusion model that unifies generation and continuous super-resolution within a single unconditional framework. Both natural images and critical physical systems exhibit scale invariance, and we leverage it to design a forward process that attenuates image content from fine to coarse scales while injecting spectrum-matched Gaussian noise, making scale an explicit coordinate of the diffusion dynamics. The same trained reverse process performs generation and continuous super-resolution by varying only the starting timestep: $\textit{no task-specific architecture, no conditioning branch, no classifier-free guidance, no retraining per scale factor}$. Empirically, SKILD reaches FID $2.65$ and Inception Score $9.63$ on unconditional CIFAR-10, performs $2\times$--$8\times$ super-resolution on ImageNet from a single unconditional checkpoint while outperforming conditional models across perceptual metrics, and reconstructs critical Ising models whose connected four-point correlations closely track the ground truth.

2605.26026 2026-05-26 cs.CV cs.AI cs.LG 版本更新

A Multimodal 3D Foundation Model for Light Sheet Fluorescence Microscopy Enables Few-Shot Segmentation, Classification, and Deblurring

一种用于光片荧光显微镜的多模态3D基础模型实现少样本分割、分类和去模糊

Adina Scheinfeld, Haotan Zhang, Shang Mu, Rudolf L. M. van Herten, Lucas Stoffl, Ali Erturk, Zhuhao Wu, Johannes C. Paetzold

发表机构 * Tri-Institutional Program in Computational Biology & Medicine, Weill Cornell Medicine, New York, NY, USA; Department of Radiology, Weill Cornell Medicine, New York, NY, USA; Helen & Robert Appel Alzheimer's Disease Research Institute, Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, New York, NY, USA; Graduate Program in Physiology, Biophysics & Systems Biology, Weill Cornell Medicine, New York, NY, USA; Cornell Tech, New York, NY, USA; Institute for Intelligent Biotechnologies (iBIO), Helmholtz Center Munich, Neuherberg, Germany; Institute for Stroke and Dementia Research, Klinikum der Universität München, Ludwig-Maximilians University Munich, Munich, Germany

AI总结 提出一种基于掩码重建与图像-文本对齐联合优化的3D基础模型,在光片荧光显微镜数据上预训练,通过少样本适应显著降低标注成本并提升分割、分类和去模糊性能。

Comments 11 pages, 3 figures

详情
AI中文摘要

光片荧光显微镜(LSM)能够对生物样本进行高分辨率三维(3D)成像,提供丰富的体积数据用于研究细胞组织、病理学和血管网络。然而,LSM数据的大小、维度和标注负担使得监督深度学习方法成本高昂且难以扩展。此外,尽管存在大量未标注的LSM体积数据,但由于计算挑战和体积表示学习的复杂性,针对该模态的基础模型仍未得到充分探索。在这项工作中,我们引入了一个用于LSM数据的3D基础模型,该模型在涵盖多种生物体、染色和成像协议的大型精选3D图像集合上进行了预训练。通过联合优化掩码重建和图像-文本对齐,我们学习了可迁移的体积表示。预训练骨干网络大幅降低了标注负担,实现了针对多种下游任务的高效少样本适应。我们在下游分割、分类和去模糊任务上评估了该方法。结果表明,我们的方法在(1)使用标准评估指标衡量时以及(2)经过领域专家严格评估时,均持续优于基线。这凸显了基础模型预训练在减少标注需求的同时提升多样化LSM分析任务性能的潜力。预训练模型权重以及预训练和微调的代码已公开:https://github.com/AdinaScheinfeld/lsm_fm_public_repo.git。

英文摘要

Light sheet fluorescence microscopy (LSM) enables high-resolution, three-dimensional (3D) imaging of biological specimens, providing rich volumetric data for studying cellular organization, pathology, and vascular networks. However, the size, dimensionality, and annotation burden of LSM data make supervised deep learning approaches costly and difficult to scale. Additionally, despite the abundance of unannotated LSM volumes, foundation models for this modality remain underexplored due to computational challenges and the complexity of volumetric representation learning. In this work, we introduce a 3D foundation model for LSM data, pretrained on a large curated collection of 3D images spanning multiple organisms, stains, and imaging protocols. We learn transferable volumetric representations by jointly optimizing for masked reconstruction and image-text alignment. The pretrained backbone drastically reduces the annotation burden, enabling efficient, few-shot adaptation for varied downstream tasks. We evaluate this approach on downstream segmentation, classification, and deblurring. Our results demonstrate consistent improvements over baselines, (1) when measured using standard evaluation metrics and (2) when rigorously assessed by domain experts. This highlights the potential of foundation model pretraining to reduce annotation requirements while improving performance across diverse LSM analysis tasks. Pretrained model weights and code for pretraining and finetuning are publicly available: https://github.com/AdinaScheinfeld/lsm_fm_public_repo.git.

2605.26019 2026-05-26 cs.LG cs.AI cs.CL 版本更新

Retrieval-Augmented Detection of Potentially Abusive Clauses in Chilean Terms of Service

智利服务条款中潜在滥用条款的检索增强检测

Christoffer Loeffler, Tomás Rey Pizarro, Daniel Ignacio Miranda Vásquez, Andrea Martínez Freile

发表机构 * School of Computer Engineering, Pontificia Universidad Católica de Valparaíso(瓦尔帕莱索天主教大学计算机工程学院) Faculty of Law, Universidad Adolfo Ibáñez(阿道夫·伊瓦涅斯大学法学院)

AI总结 提出检索增强生成框架,结合混合稠密-稀疏检索与提示增强,用于自动检测和分类智利服务条款中的潜在滥用条款,并引入包含100份合同和10,029条标注条款的语料库,实验表明该方法显著提升性能,使本地模型接近云端系统。

Comments 42 pages, 6 figures, 9 tables

详情
AI中文摘要

在线服务条款通常作为附意合同运作,造成不对称性,可能使消费者面临潜在滥用条款。在智利,评估此类条款在法律上具有挑战性,因为某些条款明显违反强制性消费者法律,而其他条款则依赖于更广泛的标准,如诚信和合同失衡。我们提出一个检索增强生成框架,用于自动检测和分类智利服务条款中的潜在滥用条款。该框架设计为本地执行,结合了高效条款检测、混合稠密-稀疏检索、重排序和提示增强,以支持中等规模的开源语言模型。我们还引入了智利滥用服务条款扩展语料库,包含100份合同和10,029条标注条款,涵盖24个法律基础的类别,包括非法、黑暗和灰色条款。比较商业和开源语言模型、微调编码器以及传统基线的实验表明,检索增强提示显著提高了性能,并使本地模型能够以较低的计算和令牌成本接近更大的基于云的系统。该研究还贡献了一个精细的法律注释方案和一个用于AI辅助消费者合同审查的实用设计。

英文摘要

Online Terms of Service often function as contracts of adhesion, creating asymmetries that may expose consumers to potentially abusive clauses. In Chile, assessing such clauses is legally challenging because some provisions clearly violate mandatory consumer law, whereas others depend on broader standards such as good faith and contractual imbalance. We present a retrieval-augmented generation framework for the automated detection and classification of potentially abusive clauses in Chilean Terms of Service. Designed for local execution, it combines efficient clause detection, hybrid dense--sparse retrieval, reranking, and prompt augmentation to support medium-sized open-weight language models. We also introduce the Chilean Abusive Terms of Service Extended corpus, comprising 100 contracts and 10,029 annotated clauses in 24 legally grounded categories spanning illegal, dark, and gray clauses. Experiments comparing commercial and open-weight language models, fine-tuned encoders, and traditional baselines show that retrieval-augmented prompting substantially improves performance and enables local models to approach larger cloud-based systems at lower computational and token cost. The study also contributes a refined legal annotation scheme and a practical design for AI-assisted consumer contract review.

2605.26013 2026-05-26 cs.LG cs.AI cs.CV 版本更新

AdvantageFlow: Advantage-Weighted Least Squares for RL in Flow Models

AdvantageFlow: 流模型中基于优势加权的强化学习最小二乘法

Branislav Kveton, Anup Rao, Subhojyoti Mukherjee, Krishna Kumar Singh, Viet Dac Lai

发表机构 * Adobe Research(Adobe研究院)

AI总结 提出AdvantageFlow算法,通过优势加权前向过程预测损失和 rollout 策略正则化,在图像生成任务中优于Flow-GRPO和负感知微调基线。

详情
AI中文摘要

我们引入了AdvantageFlow,一种用于修正流模型的前向过程强化学习算法。与优化反向过程的Flow-GRPO不同,我们优化了一个优势加权的前向过程预测损失。当优势为负且损失变为非凸时,该优化问题不稳定。我们通过rollout策略正则化来稳定它,这降低了方差,并源于拟合局部奖励改进的目标分布。我们在Stable Diffusion 3.5 Medium上评估了AdvantageFlow在图像生成任务中的表现。它优于Flow-GRPO和基于负感知微调的最先进前向过程强化学习基线。

英文摘要

We introduce AdvantageFlow, a forward-process reinforcement learning algorithm for rectified flow models. Unlike Flow-GRPO, which optimizes the reverse process, we optimize an advantage-weighted forward-process prediction loss. This optimization problem is unstable when advantages are negative and the loss becomes non-convex. We stabilize it by rollout policy regularization, which reduces variance and arises from fitting a local reward-improving target distribution. We evaluate AdvantageFlow on image generation tasks with Stable Diffusion 3.5 Medium. It outperforms both Flow-GRPO and a state-of-the-art forward-process RL baseline based on negative-aware fine-tuning.

2605.26012 2026-05-26 cs.LG cs.AI 版本更新

Learning in Low-Dimensional Subspaces: Orthogonal Bottlenecks for Reinforcement Learning

低维子空间中的学习:强化学习的正交瓶颈

Aleksandar Todorov, Matthia Sabatelli

AI总结 提出一种在强化学习编码器特征中插入固定正交投影以约束低维子空间的简单先验,证明其在线性可实现性假设下保持表达能力,并在实验中显示价值表示可压缩至极低维度而不损失性能。

详情
AI中文摘要

深度强化学习代理通常依赖高维神经表示,尽管越来越多的证据表明任务相关的价值和策略结构本质上是低维的。在这项工作中,我们提出了一种简单而有效的表示级先验,它插入一个固定的正交投影以将编码器特征约束到低维子空间,无需辅助目标、预训练或对底层RL算法的更改。在线性可实现性假设下,我们证明当瓶颈维度超过特征空间中最优价值函数的内在秩时,瓶颈保持表达能力,并将诱导的梯度动力学保留到等价的低维参数化。实验上,我们发现,在单任务和多任务基准测试中,一旦瓶颈维度超过一个小的任务相关阈值,基线性能要么匹配要么提高;在许多情况下,价值表示可以压缩到极低维度而不损失,最小充分维度更多地取决于环境复杂性而非编码器宽度。此外,我们分析了表示几何,发现正交瓶颈稳定了特征范数,并与更高的有效秩相关。这些结果共同支持了强化学习中流形假设的表示空间解释,并将正交瓶颈定位为一种轻量级、架构无关的塑造RL表示的机制。

英文摘要

Deep reinforcement learning (RL) agents commonly rely on high-dimensional neural representations, despite growing evidence that task-relevant value and policy structure may be intrinsically low-dimensional. In this work, we present a simple yet effective representation-level prior that inserts a fixed orthonormal projection to constrain encoder features to a low-dimensional subspace, requiring no auxiliary objectives, pretraining, or changes to the underlying RL algorithm. Under a linear realizability assumption, we prove that when the bottleneck dimension exceeds the intrinsic rank of the optimal value function in feature space, the bottleneck preserves expressivity and leaves the induced gradient dynamics unchanged up to an equivalent low-dimensional parameterization. Empirically, we find that across both single and multi-task benchmarks, baseline performance is either matched or improved once the bottleneck dimension exceeds a small task-dependent threshold; in many cases, value representations can be compressed to extremely low dimensions without loss, and the minimal sufficient dimension depends far more on environment complexity than encoder width. In addition, we analyze representation geometry and find that orthogonal bottlenecks stabilize feature norms and are associated with higher effective rank. Together, these results support a representation-space interpretation of the manifold hypothesis in reinforcement learning and position orthogonal bottlenecks as a lightweight, architecture-agnostic mechanism for shaping RL representations.
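The fixed orthonormal projection described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the abstract does not say how the projection is constructed, so drawing a Gaussian matrix and orthonormalizing it with QR is an assumption, and `make_orthonormal_projection`, the feature width 256, and the bottleneck width 8 are hypothetical names and values chosen for illustration.

```python
import numpy as np


def make_orthonormal_projection(feature_dim: int, bottleneck_dim: int, seed: int = 0) -> np.ndarray:
    """Return a (feature_dim, bottleneck_dim) matrix with orthonormal columns.

    Assumption: the matrix is drawn once (Gaussian entries + QR) and then frozen;
    the abstract only states that the projection is fixed and orthonormal.
    """
    if bottleneck_dim > feature_dim:
        raise ValueError("bottleneck_dim must not exceed feature_dim")
    rng = np.random.default_rng(seed)
    gaussian = rng.standard_normal((feature_dim, bottleneck_dim))
    q, _ = np.linalg.qr(gaussian)  # reduced QR: the columns of q are orthonormal
    return q


# Hypothetical encoder output: a batch of 32 feature vectors of width 256.
features = np.random.default_rng(1).standard_normal((32, 256))
projection = make_orthonormal_projection(feature_dim=256, bottleneck_dim=8)

# Constrain the features to an 8-dimensional subspace; the projection is never trained.
bottleneck_features = features @ projection
```

Because the columns are orthonormal, `projection.T @ projection` is the identity, so the downstream value or policy head sees coordinates in the subspace without any rescaling of feature norms.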

2605.26000 2026-05-26 stat.ML cs.LG stat.ME 版本更新

Statistical Inference for Stochastic Gradient Descent Beyond Finite Variance

超越有限方差的随机梯度下降统计推断

Jose Blanchet, Peter Glynn, Wenhao Yang

发表机构 * Management Science and Engineering, Stanford University(斯坦福大学管理科学与工程系)

AI总结 针对随机梯度下降中梯度方差可能无限的问题,提出一种基于联合弱收敛和自正则化统计量的模型无关置信域构建方法,并通过子采样校准实现渐近有效推断。

详情
AI中文摘要

随机梯度下降(SGD)是大规模统计学习和随机优化的基础算法。然而,当随机梯度具有无限方差时,基于SGD迭代的统计推断仍然具有挑战性,因为相关的极限分布依赖于未知的冗余参数。在本文中,我们开发了一种高效、模型无关的方法,用于从SGD轨迹构建置信域,该方法适用于有限方差和无限方差两种情况。该过程基于Polyak-Ruppert平均估计量和由SGD轨迹上的随机梯度构建的经验二阶矩归一化器的联合弱收敛结果。这种联合极限产生了一个自归一化统计量,其中主要的尾部依赖尺度项相互抵消。然后,我们使用子采样校准方案来估计相关的临界值,避免了对尾部指数、慢变函数或稳定律参数的显式估计。由此产生的置信域易于实现,并且在有限二阶矩和无限二阶矩情况下都是渐近有效的。模拟研究显示了在各种设置下的可靠覆盖,支持所提出的方法作为随机优化中不确定性量化的实用工具。

英文摘要

Stochastic gradient descent (SGD) is a foundational algorithm for large-scale statistical learning and stochastic optimization. However, statistical inference based on SGD iterates remains challenging when stochastic gradients have infinite variance, as the relevant limiting distributions depend on unknown nuisance parameters. In this paper, we develop an efficient, model-agnostic methodology for constructing confidence regions from SGD trajectories that applies in both finite- and infinite-variance regimes. The procedure is based on a joint weak convergence result for the Polyak-Ruppert averaged estimator and an empirical second-moment normalizer constructed from stochastic gradients along the SGD trajectory. This joint limit yields a self-normalized statistic in which the leading tail-dependent scaling terms cancel. We then use a subsampling calibration scheme to estimate the relevant critical values, avoiding explicit estimation of tail indices, slowly varying functions, or stable-law parameters. The resulting confidence regions are straightforward to implement and are asymptotically valid under both the finite- and infinite-second-moment regimes. Simulation studies show reliable coverage in various settings, supporting the proposed method as a practical tool for uncertainty quantification in stochastic optimization.
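For readers unfamiliar with the Polyak-Ruppert averaged estimator mentioned above, the sketch below shows the averaging step on a one-dimensional quadratic with heavy-tailed gradient noise. It covers only the averaging; the paper's self-normalized statistic and subsampling calibration are not reproduced here. The step-size rule `lr0 / t**decay`, the quadratic objective, and the symmetric Pareto noise are all assumptions made for illustration.

```python
import random


def averaged_sgd(grad_fn, x0, n_steps, lr0=0.5, decay=0.75):
    """Run 1-D SGD with step size lr0 / t**decay; return (last iterate, running average)."""
    x = x0
    average = 0.0
    for t in range(1, n_steps + 1):
        x -= lr0 / t**decay * grad_fn(x)
        average += (x - average) / t  # incremental mean of the iterates x_1..x_t
    return x, average


rng = random.Random(0)


def heavy_tailed_gradient(x, target=3.0, tail_index=1.5):
    """Gradient of 0.5 * (x - target)**2 plus symmetric Pareto noise.

    With tail_index < 2 the noise has infinite variance but (for tail_index > 1) zero mean.
    """
    noise = rng.choice((-1.0, 1.0)) * rng.paretovariate(tail_index)
    return (x - target) + noise


last_iterate, pr_average = averaged_sgd(heavy_tailed_gradient, x0=0.0, n_steps=20_000)
print(f"last iterate: {last_iterate:.3f}, Polyak-Ruppert average: {pr_average:.3f}")
```

In the infinite-variance regime the fluctuations of the average are no longer Gaussian, which is why the abstract turns to a self-normalized statistic rather than a standard normal-based interval.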

2605.25998 2026-05-26 cs.LG 版本更新

Causal methods for LLM development and evaluation

因果方法在LLM开发与评估中的应用

Dennis Frauen, Marie Brockschmidt, Konstantin Hess, Haorui Ma, Yuchen Ma, Abdurahman Maarouf, Maresa Schröder, Jonas Schweisthal, Yuxin Wang, Athiya Deviyani, Sonali Parbhoo, Rahul G. Krishnan, Stefan Feuerriegel

发表机构 * Imperial College London(伦敦帝国理工学院) University of Toronto(多伦多大学)

AI总结 本文提出因果方法可解决LLM开发与评估中的关键因果问题,并系统梳理其在预训练、对齐、路由等环节的应用机会。

Comments Published in KDD 2026

详情
AI中文摘要

大型语言模型(LLM)的开发目前由数据混合、奖励模型、路由策略和评估流程的大规模经验迭代驱动。本文认为,LLM开发和评估中的许多核心问题本质上是因果性的:在预训练中添加数据域会产生什么影响?当LLM以不同风格生成文本时,注释者的偏好如何变化?在推理成本约束下,提示应路由到更大还是更小的模型?通常,因果方法非常适合这种干预改变结果的情景,但令人惊讶的是,它们在LLM开发中代表性不足。我们的贡献有三方面:(1)我们解释了因果方法如何帮助现代LLM开发和评估:LLM开发严重依赖日志数据,这些数据通常受混杂和分布偏移影响;评估使用学习到的但可能有偏见的评判者;部署环境是非平稳的。这些条件使得纯预测方法变得脆弱,并为因果推断中的原则性识别和估计方法创造了机会。(2)我们进一步映射了因果方法在整个LLM开发流程中的机会,包括预训练、对齐、路由、智能体工作流和评估。(3)我们讨论了利用因果方法进行LLM开发和评估的新研究机会。总体而言,我们认为因果方法在LLM开发和评估流程中可能未被充分利用,尽管这些方法可以确保可靠且科学合理的设计。

英文摘要

Large language model (LLM) development is currently driven by large-scale empirical iteration over data mixtures, reward models, routing strategies, and evaluation pipelines. Here, we argue that many central questions in LLM development and evaluation are inherently causal: What is the effect of adding a data domain during pretraining? How do annotator preferences change when LLMs generate text in a different style? Should a prompt be routed to a larger or smaller model given inference cost constraints? In general, causal methods are well-suited to such settings where interventions change outcomes but, surprisingly, are underrepresented in LLM development. Our contribution is threefold: (1) We explain how causal methods can help modern LLM development and evaluation: LLM development relies heavily on logged data, which are often subject to confounding and distribution shifts; evaluation uses learned but potentially biased judges; and deployment environments are non-stationary. These conditions make purely predictive approaches fragile and create opportunities for principled identification and estimation methods from causal inference. (2) We further map opportunities for causal methods in the entire LLM development pipeline, including pretraining, alignment, routing, agentic workflows, and evaluation. (3) We discuss new research opportunities around leveraging causal methods for LLM development and evaluation. Overall, we argue that causal methods are potentially underutilized in the LLM development and evaluation pipeline, even though such methods can ensure a reliable and scientifically grounded design.

2605.25997 2026-05-26 cs.LG stat.ML 版本更新

Deployment-complete benchmarking

部署完备的基准测试

El Mustapha Mansouri, Keigo Arai

发表机构 * School of Engineering, Institute of Science Tokyo(东京科学大学工学院)

AI总结 提出部署完备的基准测试框架,通过证据纤维和完成曲线量化基准证据是否足以确定部署行动,并证明仅靠分数不足以支持部署决策。

Comments 33 pages, 5 figures, 1 table; supplementary tables and code available

详情
AI中文摘要

基准测试日益指导部署、采购和科学筛选,但分数仅支持其记录的反应,不一定支持部署行动。我们引入了部署完备的基准测试,测试基准证据是否确定部署行动。当行动在每个证据纤维上恒定时,基准对于某个声明是完备的;混合纤维暴露了缺失的部署信息,完成曲线量化了解决歧义所需的证据。在受控响应空间中,基准通道的共形覆盖率为94.98%,但迁移到未测量的部署通道时表现不佳(10.07%),而响应排名区间实现了94.91%的覆盖率;即使零基准错误,在最大残差大小下也仅认证了45.4%的候选者。公开审计揭示了不完备性,包括97.9%的混合Tox21纤维和Matbench与JARVIS主要审计中零中位可认证分数。在保留的重放中,先认证后获取将Tox21中的错误决策从1.19%降至0.027%,JARVIS中从20.3%降至0.128%,同时改变了模型选择并识别了部署相关的探针。部署就绪的基准应报告证据、支持的行动、歧义和完成成本,而不仅仅是分数。

英文摘要

Benchmarks increasingly guide deployment, procurement and scientific screening, yet a score supports only the response it records, not necessarily the deployment action. We introduce deployment-complete benchmarking, which tests whether benchmark evidence determines a deployment action. A benchmark is complete for a claim exactly when the action is constant on each evidence fiber; mixed fibers expose missing deployment information, and completion curves quantify the evidence required to resolve ambiguity. In controlled response spaces, benchmark-channel conformal coverage of 94.98% transferred poorly to an unmeasured deployment channel (10.07%), whereas response-rank intervals achieved 94.91% coverage; even zero benchmark error certified only 45.4% of candidates at the largest residual size. Public audits revealed incompleteness, including 97.9% mixed Tox21 fibers and zero median certifiable fraction in main Matbench and JARVIS audits. In held-out replays, certify-then-acquire reduced false decisions from 1.19% to 0.027% in Tox21 and from 20.3% to 0.128% in JARVIS, while changing model choice and identifying deployment-relevant probes. Deployment-ready benchmarks should report evidence, supported actions, ambiguity and completion cost rather than scores alone.
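The completeness condition ("the action is constant on each evidence fiber") can be sketched in a few lines. The reading used here is an assumption: a fiber is taken to be the set of candidates that share identical benchmark evidence, and a fiber is "mixed" when those candidates call for different deployment actions. The function names and the toy records are hypothetical and are not taken from the paper's code.

```python
from collections import defaultdict


def mixed_fiber_fraction(records):
    """Fraction of evidence fibers whose candidates need more than one action.

    records: iterable of (evidence, action) pairs, where evidence is hashable.
    """
    fibers = defaultdict(set)
    for evidence, action in records:
        fibers[evidence].add(action)
    if not fibers:
        return 0.0
    mixed = sum(1 for actions in fibers.values() if len(actions) > 1)
    return mixed / len(fibers)


def is_deployment_complete(records):
    """True when every fiber maps to exactly one action."""
    return mixed_fiber_fraction(records) == 0.0


# Toy example: two candidates share the evidence ("pass", "high") but need
# different actions, so that fiber is mixed and the benchmark is incomplete.
records = [
    (("pass", "high"), "deploy"),
    (("pass", "high"), "reject"),
    (("fail", "low"), "reject"),
]
print(mixed_fiber_fraction(records), is_deployment_complete(records))
```

A mixed fiber signals that the recorded evidence cannot distinguish candidates that should be treated differently, which is the situation the abstract describes as missing deployment information.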

2605.23082 2026-05-26 stat.ML cs.AI cs.LG 版本更新

KAPLAN: Kolmogorov-Arnold Prognostic Learnable Activation Networks for Survival Analysis

KAPLAN: 用于生存分析的Kolmogorov-Arnold可预测可学习激活网络

Stelios Boulitsakis Logothetis, Angela Wood, Pietro Liò

发表机构 * University of Cambridge(剑桥大学)

AI总结 提出KAPLAN-HR模型,利用B样条Kolmogorov-Arnold网络非参数估计条件风险函数,通过深层架构自动捕捉交互和时变效应,并证明其收敛速率仅依赖于表示平滑性,从而缓解维度灾难,在六个临床数据集上达到或超越现有方法。

Comments 9 pages, 3 figures, 13 supplementary pages. Submitted to NeurIPS 2026

详情
AI中文摘要

生存分析旨在建模协变量和时间如何共同影响右删失下的事件时间分布。经典方法如Cox模型和广义加性模型(GAM)需要手动指定交互和时变效应,这在丰富的临床数据集上越来越不切实际。我们引入了KAPLAN-HR,一种B样条Kolmogorov-Arnold网络(KAN),用于非参数估计条件风险函数作为协变量和时间的联合函数。单层KAPLAN-HR模型恢复GAM,而更深层的架构通过组合捕捉交互和时变效应。我们为非参数KAN风险估计器建立了收敛速率,该速率仅依赖于底层KAN表示的平滑性,而不依赖于协变量维度,从而缓解了KAN可表示目标的维度灾难。在六个临床基准数据集的评估中,KAPLAN-HR匹配或超过了已建立的统计和深度学习生存方法的预测性能。

英文摘要

Survival analysis aims to model how covariates and time jointly shape the time-to-event distribution under right censoring. Classical methods such as the Cox model and generalised additive models (GAMs) require interactions and time-varying effects to be manually specified, which is increasingly impractical on rich clinical datasets. We introduce KAPLAN-HR, a B-spline Kolmogorov-Arnold Network (KAN) for nonparametric estimation of the conditional hazard as a joint function of covariates and time. A single-layer KAPLAN-HR model recovers a GAM, while deeper architectures capture interactions and time-varying effects through composition. We establish a convergence rate for the nonparametric KAN hazard estimator that depends only on the smoothness of the underlying KAN representation and not on the covariate dimension, thereby mitigating the curse of dimensionality for KAN-representable targets. In evaluations over six clinical benchmark datasets, KAPLAN-HR matches or exceeds the predictive performance of established statistical and deep learning survival methods.

2605.02836 2026-05-26 cs.LG math.AT 版本更新

A Closed-Form Persistence-Landmark Pipeline for Certified Point-Cloud and Graph Classification

一种用于认证点云和图分类的闭式持久性-地标管道

Sushovan Majhi, Atish Mitra, Žiga Virk, Pramita Bagchi

发表机构 * Data Science, George Washington University, USA(乔治华盛顿大学数据科学系) Department of Mathematical Sciences, Montana Technological University, USA(蒙大拿理工大学数学科学系) Faculty of Computer and Information Science, University of Ljubljana, Institute IMFM, Slovenia(卢布尔雅那大学计算机与信息科学学院,IMFM研究所,斯洛文尼亚) Biostatistics and Bioinformatics, George Washington University, USA(乔治华盛顿大学生物统计学与生物信息学系)

AI总结 提出PLACE管道,通过闭式公式从持久同调签名中分类点云和图,无需学习权重或校准,提供基于间隔的过量风险率、描述符选择规则和每个预测的认证。

Comments TMLR submission, https://openreview.net/forum?id=4kZxNlE5Ve. v2: variance-aware Pinelis-Bernstein certificate (radius iii) fires on 8/12 benchmarks (v1: not operational); MUTAG: empirical and population NC rules agree on 940/940 predictions. Matching-free nu-coherence replaces non-interference. Le Cam lower bound (Thm 3.2) recast PD-native, matching regime m<~R/D explicit

详情
AI中文摘要

我们引入PLACE(持久性-地标分析分类引擎),一种通过持久同调签名对点云和图进行分类的闭式管道。三个定量保证(基于间隔的过量风险率、闭式描述符选择规则和每个预测的认证)仅从训练标签中推导,无需学习权重或保留校准。嵌入在稀疏地标网格上对Mitra-Virk单点坐标函数求和;闭式权重规则$w_k^2 \propto (d_{k+1}^2 - d_k^2)/R_k^2$在$\nu$-相干性下最大化Mitra-Virk仿射证书中的失真斜率。(i) 由类均值分离$\Delta$和嵌入半径$R$驱动的$O(kR/(\Delta\sqrt{m_{\min}}))$间隔界,在样本匮乏区域$m \lesssim R/\Delta$中由Le Cam极小极大下界匹配。(ii) 在Ledoit-Wolf收缩协方差下的马氏间隔是64描述符化学图池中最强的闭式排序器(11个基准上平均Spearman $\rho=+0.56$,11个中10个为正);各向同性替代$\Delta/\sqrt{\ell}$在同质蛋白质/社交池上具有闭式选择一致性率。(iii) 训练时决定的证书,无每个预测开销,有三种具体半径(Pinelis、高斯插件和方差感知的Pinelis-Bernstein)。实验上,PLACE是Orbit5k上最强的基于持久图的方法,并在MUTAG和COX2上在统计噪声内匹配最强的基于拓扑的基线;剩余差距分为两个可诊断区域(NCI1/NCI109上的描述符盲点;其他地方的池覆盖限制)。Pinelis-Bernstein半径在12个基准中的8个上触发;在MUTAG上,经验和总体最近质心规则在940个保留测试预测中的每一个上一致,验证了证书的机制。

英文摘要

We introduce PLACE (Persistence-Landmark Analytic Classification Engine), a closed-form pipeline for classifying point clouds and graphs through their persistent-homology signatures. Three quantitative guarantees -- a margin-based excess-risk rate, a closed-form descriptor-selection rule, and a per-prediction certificate -- are derived from training labels alone, with no learned weights or held-out calibration. The embedding sums Mitra-Virk single-point coordinate functions over a sparse landmark grid; the closed-form weight rule $w_k^2 \propto (d_{k+1}^2 - d_k^2)/R_k^2$ maximizes the distortion slope in Mitra-Virk's affine certificate under $\nu$-coherence. (i) An $O(kR/(\Delta\sqrt{m_{\min}}))$ margin bound, driven by class-mean separation $\Delta$ and embedding radius $R$, matched in the sample-starved regime $m \lesssim R/\Delta$ by a Le Cam minimax lower bound. (ii) The Mahalanobis margin under Ledoit-Wolf-shrunk covariance is the strongest closed-form ranker on a 64-descriptor chemical-graph pool (mean Spearman $\rho = +0.56$ across 11 benchmarks, positive on 10 of 11); the isotropic surrogate $\Delta/\sqrt{\ell}$ admits a closed-form selection-consistency rate on the homogeneous protein/social pools. (iii) A training-time-decided certificate, with no per-prediction overhead, in three concrete radii (Pinelis, Gaussian plug-in, and variance-aware Pinelis-Bernstein). Empirically, PLACE is the strongest diagram-based method on Orbit5k and matches the strongest topology-based baseline within statistical noise on MUTAG and COX2; remaining gaps fall into two diagnosable regimes (descriptor blindness on NCI1/NCI109; pool-coverage limits elsewhere). The Pinelis-Bernstein radius fires on 8 of the 12 benchmarks; on MUTAG the empirical and population nearest-centroid rules agree on every one of 940 held-out test predictions, validating the certificate's mechanism.
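The closed-form weight rule $w_k^2 \propto (d_{k+1}^2 - d_k^2)/R_k^2$ is plain arithmetic once $d_k$ and $R_k$ are given, as the sketch below shows. The abstract does not define these symbols or fix the proportionality constant, so the code makes two assumptions: `d` is an increasing sequence of length $K+1$ and `R` a positive sequence of length $K$, and the weights are normalized so that their squares sum to one. `landmark_weights` is a hypothetical name.

```python
import math


def landmark_weights(d, R):
    """Weights with w_k^2 proportional to (d[k+1]^2 - d[k]^2) / R[k]^2.

    Assumption: normalized so that sum(w_k^2) == 1; the abstract only gives
    the rule up to a constant factor.
    """
    if len(d) != len(R) + 1:
        raise ValueError("d must have exactly one more entry than R")
    raw_squares = [(d[k + 1] ** 2 - d[k] ** 2) / R[k] ** 2 for k in range(len(R))]
    if any(value < 0 for value in raw_squares):
        raise ValueError("d must be non-decreasing in absolute value")
    total = sum(raw_squares)
    if total == 0:
        raise ValueError("all squared increments are zero")
    return [math.sqrt(value / total) for value in raw_squares]


# Illustrative inputs only: raw squares are [1, 3], so the weights are [0.5, sqrt(0.75)].
weights = landmark_weights(d=[0.0, 1.0, 2.0], R=[1.0, 1.0])
print(weights)
```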

2605.25991 2026-05-26 cs.LG cs.NA math.NA 版本更新

Fuzzy PyTorch: Rapid Numerical Variability Evaluation for Deep Learning Models

Fuzzy PyTorch: 深度学习模型的快速数值变异性评估

Inés Gonzalez-Pepe, Hiba Akhaddar, Tristan Glatard, Yohan Chatelain

发表机构 * Department of Computer Science and Software Engineering, Concordia University(康考迪亚大学计算机科学与软件工程系) Krembil Centre for Neuroinformatics, Centre for Addiction and Mental Health (CAMH)(成瘾与心理健康中心克伦比尔神经信息学中心)

AI总结 提出Fuzzy PyTorch框架,通过集成随机算术和概率舍入实现深度学习模型数值变异性的快速评估,相比现有工具Verrou实现5至60倍加速,并支持从1到3.41亿参数的模型规模。

Comments 19 pages, 8 figures, Published in Transactions on Machine Learning Research (01/2026)

详情
AI中文摘要

我们介绍了Fuzzy PyTorch,一个用于快速评估深度学习(DL)模型中数值变异性的框架。随着DL越来越多地应用于各种任务,理解浮点运算带来的变异性对于确保稳健可靠的性能至关重要。评估此类变异性的工具必须具有可扩展性、高效性,并能与现有框架无缝集成,同时最小化代码修改。Fuzzy PyTorch通过将随机算术集成到PyTorch中实现了这一点,它采用了一种名为“概率舍入与指令集管理”的新型库,该库与数值分析编译器Verificarlo接口。该库提供了随机舍入模式以及一种新模式:上下舍入。对比评估显示,Fuzzy PyTorch保持了模型性能,并且与最先进的工具Verrou相比,运行时间减少了5倍到60倍。我们进一步通过运行从1到3.41亿参数的模型展示了其可扩展性,确认了其在小型和大型DL架构中的适用性。总体而言,Fuzzy PyTorch为评估深度学习中的数值变异性提供了一种高效、可扩展且实用的解决方案,使研究人员和从业者能够在不牺牲性能或计算效率的情况下量化和管理浮点不确定性。

英文摘要

We introduce Fuzzy PyTorch, a framework for rapid evaluation of numerical variability in deep learning (DL) models. As DL is increasingly applied to diverse tasks, understanding variability from floating-point arithmetic is essential to ensure robust and reliable performance. Tools assessing such variability must be scalable, efficient, and integrate seamlessly with existing frameworks while minimizing code modifications. Fuzzy PyTorch enables this by integrating stochastic arithmetic into PyTorch through Probabilistic Rounding with Instruction Set Management, a novel library interfacing with Verificarlo, a numerical analysis compiler. The library offers a stochastic rounding mode and a novel mode: up-down rounding. Comparative evaluations show Fuzzy PyTorch maintains model performance and achieves runtime reductions of 5x to 60x versus Verrou, a state-of-the-art tool. We further demonstrate scalability by running models from 1 to 341 million parameters, confirming applicability across small and large DL architectures. Overall, Fuzzy PyTorch provides an efficient, scalable, and practical solution for assessing numerical variability in deep learning, enabling researchers and practitioners to quantify and manage floating-point uncertainty without compromising performance or computational efficiency.
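As background for the stochastic rounding mode mentioned above, here is a minimal sketch of the general technique on a uniform grid. It is not Fuzzy PyTorch's or Verificarlo's API, which instrument floating-point instructions rather than exposing a Python function; `stochastic_round` and the grid spacing `step` are hypothetical. The up-down rounding mode is not shown because the abstract does not define it.

```python
import math
import random


def stochastic_round(x, step, rng):
    """Round x to a multiple of step, choosing the upper neighbour with
    probability equal to x's fractional position between the two neighbours.

    The result is unbiased: its expected value equals x.
    """
    scaled = x / step
    lower = math.floor(scaled)
    fraction = scaled - lower  # in [0, 1)
    round_up = rng.random() < fraction
    return (lower + (1 if round_up else 0)) * step


rng = random.Random(0)
samples = [stochastic_round(0.3, 1.0, rng) for _ in range(10_000)]
mean = sum(samples) / len(samples)
print(f"distinct outputs: {sorted(set(samples))}, empirical mean: {mean:.3f}")
```

Repeating a computation under such randomized rounding and looking at the spread of the results is the basic idea behind using stochastic arithmetic to estimate numerical variability.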

2605.25977 2026-05-26 cs.CL cs.AI cs.LG 版本更新

Creative Quality Alignment: Expert Tacit Knowledge Transfer via Chain-of-Thought Fine-Tuning

创意质量对齐:通过思维链微调实现专家隐性知识迁移

Bo Zou, Chao Xu

AI总结 本文通过低数据成本和小基模型的严格工程条件,实证验证了校准惊喜中的创意质量度量,并发现数据偏差,提出创意质量对齐方法及理论解释。

详情
AI中文摘要

本文对校准惊喜(Zou & Xu, 2026a)中提出的创意质量度量进行了实证实现。本文解决的问题是:这一数学主张在工程层面是否成立?为使答案尽可能通用,我们特意选择了最严格的工程条件:低数据成本和小基模型。训练数据来自BC协议(Zou & Xu, 2026b)产生的大约100个专家思维链(CoT)标注。我们还发现了一个数据偏差:大多数公开可用的对齐数据集偏向于工艺相关知识,而受众建模和现实逻辑覆盖系统性薄弱。我们使用术语“创意质量对齐”(CQA)来描述这类工程方法。我们还提供了一个支持性的理论观察:在具有单一条件分布架构的LLM中,通过架构对偶性,校准欣赏侧会自动迁移到生成侧。这是大约100个CoT示例就足够的结构性原因——而非像LIMA(Zhou et al., 2023)那样的纯粹经验观察。

英文摘要

This paper provides an empirical implementation of the creative quality metric proposed in Calibrated Surprise (Zou & Xu, 2026a). The question this paper addresses is: does this mathematical claim hold at the engineering level? To make the answer as general as possible, we deliberately choose the strictest engineering conditions: low data cost and a small base model. Training data comes from approximately 100 expert chain-of-thought (CoT) annotations produced by the BC Protocol (Zou & Xu, 2026b). We also identify a data bias: most publicly available alignment datasets are skewed toward craft-related knowledge, while audience modeling and reality-logic coverage are systematically weak. We use the term Creative Quality Alignment (CQA) to describe this class of engineering methods. We also offer a supporting theoretical observation: in an LLM with a single conditional distribution architecture, calibrating the appreciation side automatically transfers to the generation side via architectural duality. This is the structural reason why ~100 CoT examples are sufficient -- not a purely empirical observation like LIMA (Zhou et al., 2023).

2605.25967 2026-05-26 cs.LG cs.SD 版本更新

Hidden in Plain Tokens: Simply Robust, Gradient-Free Watermark for Synthetic Audio

隐藏在明文令牌中:简单、鲁棒、无梯度的合成音频水印

Georgios Milis, Yubin Qin, Yihan Wu, Heng Huang

发表机构 * Department of Computer Science, University of Maryland, College Park, USA(马里兰大学帕克分校计算机科学系)

AI总结 本文利用离散化中的词汇冗余,提出一种无需微调或梯度的合成音频水印方法,通过社区检测缩减词汇表提升检测鲁棒性,在音频修改下仍保持高性能。

Comments Accepted to ICML 2026

详情
AI中文摘要

随着政策追赶生成式AI的能力,水印技术成为内容溯源工作的核心。自回归模型的推理时水印由于离散化不一致而不适用于连续模态。现有方法通过微调模态分词器来克服这一问题,但失去了水印无需训练的优势。在这项工作中,受离散化中词汇冗余的启发,我们提出了一种优雅的解决方案,用于合成音频的强大且鲁棒的水印。我们从理论上分析了令牌错误对水印检测的影响,并通过社区检测获得的缩减词汇表有效缓解了这些问题。充分的实验表明,我们的无梯度方法可以将可检测性提高几个数量级,同时实现对音频修改的内置鲁棒性。广泛地说,我们发现了多媒体中令牌级水印的新最先进技术,这仅仅源于离散表示学习的本质。

英文摘要

As policy catches up with the capabilities of generative AI, watermarking is central to content provenance efforts. Inference-time watermarks for autoregressive models are unfit for continuous modalities due to discretization inconsistencies. Existing methods overcome this by finetuning the modality tokenizers, nullifying the watermark's training-free advantage. In this work, motivated by the vocabulary redundancy of discretization, we propose an elegant solution for powerful and robust watermarking of synthetic audio. We theoretically analyze the impact of token errors on watermark detection, and effectively mitigate them using a reduced vocabulary obtained via community detection. Thorough experiments showcase that our gradient-free method can boost detectability by several orders of magnitude, while also achieving built-in robustness to audio modifications. Broadly, we discover a new state-of-the-art for token-level watermarks in multimedia, which simply arises from the nature of discrete representation learning.

2605.25966 2026-05-26 cs.LG cs.CL stat.ML 版本更新

Mapping the Schedule x Bit-Width Boundary in Sub-100M Quantisation-Aware Training

在小于100M参数量化感知训练中映射调度策略与位宽边界

Christian Brandt Thomassen

发表机构 * Dwarf A/S(Dwarf公司)

AI总结 通过大规模实验研究子100M参数解码器语言模型中,量化感知训练的最佳学习率调度是否依赖于位宽,发现INT6 QAT无需不同调度,INT4在50M以上需wd33调度,以下则噪声主导。

Comments 20 pages, 6 figures, 4 tables. 1345 training runs total (720 + 625). Submitted for review at TMLR

详情
AI中文摘要

我们测试了在子100M参数解码器语言模型中,从初始化开始的量化感知训练(QAT)的最佳学习率调度是否依赖于位宽。一项720次运行的因子网格实验(阶段2)覆盖了位宽×衰减分数×学习率大小×模型大小×随机种子(FP16/INT8/INT6,15M-100M,5个种子),发现在每个(位宽,大小)单元中,最佳衰减分数为33%。主要假设——INT6 QAT需要与高精度训练不同的调度——在FP16/INT8/INT6下被证伪。后续625次运行(阶段5)沿五个轴探测零假设:优化器(AdamW)、调度形状(余弦)、训练长度(最多9倍迭代次数)、扩展的大小扫描(5M-350M)以及从3M到100M的INT4扫描。零假设在所有三种设置变化下均稳健。INT6的惩罚遵循对数线性缩放定律,其在阶段2的拟合预测了五个保留的阶段5大小(5M、8M、175M、250M、350M),且均在95%预测区间内(5/5)。对于INT4,情况比高精度更清晰:在50M和100M时,wd33明确最优(配对z~12-15,10/10种子);低于50M时,在从3M到30M的六个测试大小中,没有单个大小显示出统计显著的调度偏好,且每个大小的平均惩罚在种子级噪声内振荡。因此,边界是从低于50M的噪声主导区域到50M及以上明确的wd33区域的过渡,而非清晰的wd10区域。权重到网格距离的探测证伪了FP16/INT8/INT6零假设的最简单机制(快速网格锁定):在衰减前,INT6-QAT权重与INT6网格的距离基本与FP16权重相同(比率~1.04)。实用建议:在子100M规模下,在FP16上调优一次学习率调度,并原封不动地应用于INT8/INT6 QAT;对于50M以上的INT4,使用wd33;对于50M以下的INT4,调度选择在噪声中。

英文摘要

We test whether the optimal learning-rate schedule depends on bit-width during from-initialisation quantisation-aware training (QAT) for sub-100M decoder language models. A 720-run factorial grid (Phase 2) over bit-width x warmdown fraction x LR magnitude x model size x seed (FP16/INT8/INT6, 15M-100M, 5 seeds) finds the optimal warmdown is 33% at every (bit-width, size) cell. The primary hypothesis -- that INT6 QAT requires a different schedule than higher-precision training -- is falsified at FP16/INT8/INT6. A 625-run follow-up (Phase 5) probes the null along five axes: optimiser (AdamW), schedule shape (cosine), training length (up to 9x more iterations), an extended size sweep (5M-350M), and an INT4 sweep from 3M to 100M. The null is robust under all three setup changes. The INT6 penalty follows a log-linear scaling law whose fit on Phase 2 predicts the five held-out Phase 5 sizes (5M, 8M, 175M, 250M, 350M) within their 95% prediction intervals (5/5). For INT4 the picture is sharper than the higher precisions: at 50M and 100M, wd33 is decisively optimal (paired z ~ 12-15, 10/10 seeds); below 50M, across the six tested sizes from 3M to 30M, no individual size shows a statistically significant schedule preference and the per-size mean penalty oscillates within seed-level noise. The boundary is therefore a transition between a noise-dominated regime below 50M and a decisive wd33 regime at and above 50M, not a clean wd10 region. A weight-to-grid-distance probe falsifies the simplest mechanism for the FP16/INT8/INT6 null result (rapid grid-snapping): pre-warmdown, INT6-QAT weights sit at essentially the same distance from the INT6 grid as FP16 weights (ratio ~ 1.04). Practical recommendation: at sub-100M scale, tune the LR schedule once at FP16 and apply unchanged to INT8/INT6 QAT; for INT4 at 50M+ use wd33; for INT4 below 50M the schedule choice is in the noise.
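To make the "warmdown fraction" concrete, the sketch below implements one plausible reading of a wd33 schedule: the learning rate is held at its peak and then decays linearly to zero over the final 33% of training. The linear shape, the absence of warmup, and the name `warmdown_lr` are assumptions; the abstract names the fraction but does not spell out the schedule.

```python
def warmdown_lr(step, total_steps, peak_lr, warmdown_frac=0.33):
    """Constant learning rate followed by a linear decay to zero.

    Assumption: the decay occupies the last `warmdown_frac` of training and is linear.
    """
    warmdown_steps = round(total_steps * warmdown_frac)
    if warmdown_steps == 0:
        return peak_lr
    warmdown_start = total_steps - warmdown_steps
    if step < warmdown_start:
        return peak_lr
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / warmdown_steps


# With 100 steps and wd33, the decay starts at step 67 and reaches zero at step 100.
schedule = [warmdown_lr(step, total_steps=100, peak_lr=1e-3) for step in range(101)]
print(schedule[66], schedule[84], schedule[100])
```

Under the paper's practical recommendation, a schedule like this would be tuned once at FP16 and reused unchanged for INT8/INT6 QAT at sub-100M scale.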

2605.25955 2026-05-26 cs.CL cs.AI cs.LG 版本更新

QUIET: A Multi-Blank Cascaded Story Cloze Benchmark for LLM Creative Generation Capability

QUIET: 面向LLM创意生成能力的多空白级联故事完形填空基准

Bo Zou, Chao Xu

AI总结 提出QUIET基准,通过多空白级联故事完形填空和基于信息论的自动评分协议,客观评估大语言模型的创意生成能力。

详情
AI中文摘要

大语言模型(LLM)在创意能力评估中面临双重挑战:现有基准(如Story Cloze Test、HellaSwag)通过多项选择识别范式衡量模型对叙事延续的判别能力,而非直接衡量创意生成能力;基于量规的评分和LLM-as-Judge方法依赖主观维度评估或自然语言模型输出,无法提供客观、自动化的评分机制。本文提出QUIET(Quality Understanding via Interlocked Evaluation Testing),一种基于多空白级联故事完形填空的LLM创意能力诊断基准。QUIET在结构完整的故事中设置N个空白(10-20个),每个空白附带显式内容约束,且空白之间存在级联依赖关系——较早空白填充的内容约束较晚空白的可行解空间。被评估模型(或人类参与者)以开放生成模式填充所有空白;结果由基于信息论的自动化评分协议评分,无需人工评分。该评分协议直接操作化“校准惊喜”理论框架(Zou & Xu, 2026a)。对于每个空白k,计算复合分数:score = satisfy * (1 + lambda * surprise),其中lambda = 1.0。这里,“satisfy”衡量空白填充满足内容约束的程度(客观逻辑推理判断,非主观审美评分),“surprise”衡量在满足约束条件下的惊喜程度。不满足约束的创意答案得零分;满足约束但平庸的答案得分低;满足约束且令人惊喜的答案得分高。

英文摘要

Large language models (LLMs) face a dual challenge in creative capability evaluation: existing benchmarks (e.g., Story Cloze Test, HellaSwag) measure models' discriminative ability over narrative continuation using multiple-choice recognition paradigms, rather than directly measuring creative generation capability; rubric-based scoring and LLM-as-Judge methods rely on subjective dimension assessment or natural language model outputs, and cannot provide objective, automated scoring mechanisms. This paper proposes QUIET (Quality Understanding via Interlocked Evaluation Testing), a diagnostic benchmark for LLM creative capability based on multi-blank cascaded story cloze. QUIET sets N blanks (10-20) in a story with complete structure, with each blank accompanied by an explicit content constraint, and cascade dependency relationships between blanks -- the content filled into earlier blanks constrains the feasible solution space for later blanks. The evaluated model (or human participants) fills all blanks in open-ended generation mode; the results are scored by an information-theoretic automated scoring protocol without human grading. The scoring protocol directly operationalizes the "calibrated surprise" theoretical framework (Zou & Xu, 2026a). For each blank k, a composite score is computed: score = satisfy * (1 + lambda * surprise), where lambda = 1.0. Here, "satisfy" measures how well the blank filling satisfies the content constraint (objective logical reasoning judgment, not subjective aesthetic scoring), and "surprise" measures the degree of surprise given that the constraint is satisfied. Creative answers that do not satisfy the constraint score zero; answers that satisfy the constraint but are mediocre score low; answers that satisfy the constraint and are surprising score high.
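The composite score stated in the abstract, score = satisfy * (1 + lambda * surprise) with lambda = 1.0, can be written out directly. The sketch assumes that both `satisfy` and `surprise` lie in [0, 1] and that a story's score is the mean over its blanks; neither the ranges nor the aggregation is given in the abstract, and the function names are hypothetical.

```python
def blank_score(satisfy, surprise, lam=1.0):
    """Composite score for one blank: satisfy * (1 + lam * surprise)."""
    return satisfy * (1.0 + lam * surprise)


def story_score(blanks, lam=1.0):
    """Mean blank score over a story (assumption: simple averaging)."""
    return sum(blank_score(s, u, lam) for s, u in blanks) / len(blanks)


# The three cases described in the abstract:
violates_constraint = blank_score(satisfy=0.0, surprise=0.9)  # creative but invalid
valid_but_mediocre = blank_score(satisfy=1.0, surprise=0.0)   # valid, no surprise
valid_and_surprising = blank_score(satisfy=1.0, surprise=0.5)
print(violates_constraint, valid_but_mediocre, valid_and_surprising)
```

Because `satisfy` multiplies the whole expression, surprise can only raise the score of an answer that already meets its constraint, which matches the ordering the abstract describes.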

2605.25954 2026-05-26 cs.LG cs.AI 版本更新

Step-TP: A Grounded, Step-Level Dataset with Chain-of-Thought Reasoning for LLM-Guided Tensor Program Optimization

Step-TP: 一个基于步骤级、带有思维链推理的 LLM 引导张量程序优化数据集

Mengfan Liu, Da Zheng, Junwei Su, Chuan Wu

发表机构 * The University of Hong Kong(香港大学) University of Science and Technology of China(中国科学技术大学)

AI总结 为解决 LLM 在张量程序优化中缺乏可验证步骤级监督的问题,提出 Step-TP 数据集,通过结构化思维链推理和原子步骤监督实现可靠的多步优化。

详情
AI中文摘要

尽管大语言模型(LLM)具有强大的推理能力,但由于需要精确、可组合的变换决策,优化张量程序的执行效率仍然具有挑战性。最近的 LLM 引导方法将张量程序优化视为一个迭代决策过程,但现有数据集仅提供使用令牌效率低下的表示方式的端到端优化程序对,缺乏可验证的步骤级监督和可解释性。因此,LLM 难以在大型组合优化空间中做出可靠的单步决策。我们引入了 Step-TP,一个用于张量程序优化的后训练数据集,它提供基于事实的、原子性的步骤级监督,并带有结构化的思维链(CoT)推理。Step-TP 在中间程序状态上形成一个封闭的推理循环,从而实现可靠的多步优化,而非结果模仿。其设计遵循四个原则:(i) 令牌高效、可验证的中间表示(IR),可确定性降低为 TVM TIR;(ii) 原子且可组合的优化策略,将复杂轨迹分解为可解释的单步决策;(iii) 结构化的 CoT 监督与显式的 IR 到 IR 状态转换相结合;(iv) 策略过滤以平衡覆盖范围同时防止捷径利用。该数据集和实现可在 GitHub 链接 https://github.com/LIUMENGFAN-gif/StepTP 获取。

英文摘要

Despite the strong reasoning capabilities of large language models (LLMs), optimizing the execution efficiency of tensor programs remains challenging due to the need for precise, composable transformation decisions. Recent LLM-guided approaches frame tensor program optimization as an iterative decision process, but existing datasets provide only end-to-end optimized program pairs using token-inefficient representations, lacking verifiable step-level supervision and interpretability. As a result, LLMs struggle to make reliable single-step decisions in large combinatorial optimization spaces. We introduce Step-TP, a post-training dataset for tensor program optimization that provides grounded, atomic, step-level supervision with structured chain-of-thought (CoT) reasoning. Step-TP forms a closed reasoning loop over intermediate program states, enabling reliable multi-step optimization rather than outcome imitation. Its design is guided by four principles: (i) a token-efficient, verifiable intermediate representation (IR) that deterministically lowers to TVM TIR; (ii) atomic and composable optimization strategies that decompose complex trajectories into interpretable single-step decisions; (iii) structured CoT supervision coupled with explicit IR-to-IR state transitions; and (iv) strategy filtering to balance coverage while preventing shortcut exploitation. The dataset and implementation are available at https://github.com/LIUMENGFAN-gif/StepTP.

2605.25949 2026-05-26 cs.LG cs.AI physics.comp-ph 版本更新

Small Models, Strong Priors: Architectural Inductive Bias for Parameter-Efficient Neural PDE Solvers

小模型,强先验:参数高效神经PDE求解器的架构归纳偏置

Shyam Sankaran, Hanwen Wang, Paris Perdikaris

发表机构 * Department of Mechanical Engineering and Applied Mechanics, University of Pennsylvania(宾夕法尼亚大学机械工程与应用力学系)

AI总结 提出WaveLiT架构,通过小模型(1-10M参数)利用小波多尺度先验实现参数高效,在多个PDE基准上媲美大100-1000倍的基础模型,并揭示先验失败模式可提供有用信号。

详情
AI中文摘要

神经PDE求解器遵循视觉和语言的扩展轨迹,最近的基础模型达到数十亿参数。我们认为,在该领域中,规模不能很好地替代架构归纳偏置:结构化先验带来超高的参数效率,并且它们成功和失败的模式本身就能说明它们捕获了什么。我们通过WaveLiT实例化这一论点,该架构结合了用于无损多分辨率标记化的离散小波变换、增强的线性注意力块、共享权重的多尺度特征金字塔以及小波域辅助损失。定制的1-10M参数WaveLiT模型在八个TheWell基准测试中与规模大100-1000倍的基础模型竞争,在波动和声学主导的基准测试中增益最大,其中小波多尺度先验适合主导动力学结构,且小的每步误差在展开时不会几何级数地复合。在所有八个基准测试上联合训练后,一个10M参数的基础变体表现出结构化的、物理上可解释的迁移模式——在小波多尺度先验匹配动力学的地方最强,在混沌平流主导的流动中最弱。整个流水线在单个GPU上训练。结果表明,小模型PDE性能由架构归纳偏置而非规模决定,并且先验失败的结构是关于其内容的有用经验信号。

英文摘要

Neural PDE solvers have followed the scaling trajectory of vision and language, with recent foundation models reaching billions of parameters. We argue that scale is a poor substitute for architectural inductive bias in this domain: structured priors deliver outsized parameter efficiency, and the pattern of where they succeed and fail is itself informative about what they capture. We instantiate this argument in WaveLiT, an architecture combining a discrete wavelet transform for lossless multi-resolution tokenization, an augmented linear attention block, a shared-weight multiscale feature pyramid, and a wavelet-domain auxiliary loss. Bespoke 1-10M-parameter WaveLiT models compete with foundation models of 100-1000× their size across eight TheWell benchmarks, with the largest gains on wave and acoustic-dominated benchmarks where the wavelet-multiscale prior fits the dominant dynamical structure and small per-step errors do not compound geometrically under rollout. Trained jointly across all eight benchmarks, a 10M-parameter foundation variant exhibits a structured, physically interpretable transfer pattern -- strongest where the wavelet-multiscale prior matches the dynamics, weakest on chaotic advection-dominated flows. The entire pipeline trains on a single GPU. The results suggest that small-model PDE performance is shaped by architectural inductive bias rather than scale, and that the structure of a prior's failures is a useful empirical signal about its content.
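To illustrate what "lossless" means for a discrete wavelet transform, the sketch below runs one level of the orthonormal Haar transform and inverts it exactly. The Haar wavelet is chosen here only because it is the simplest case; the abstract does not say which wavelet family or how many decomposition levels WaveLiT uses, and the function names are hypothetical.

```python
import math


def haar_forward(x):
    """One level of the orthonormal Haar DWT on an even-length sequence.

    Returns (approximation, detail) coefficient lists, each half the input length.
    """
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward, recovering the original samples."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


signal = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 8.0]
approx, detail = haar_forward(signal)
restored = haar_inverse(approx, detail)
print(max(abs(a - b) for a, b in zip(signal, restored)))  # reconstruction error
```

The coarse `approx` band and the fine `detail` band together hold as many numbers as the input, so no information is discarded; a multi-resolution representation would apply `haar_forward` again to `approx`.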

2605.25943 2026-05-26 cs.LG 版本更新

STaT: Resolving Shape Distortion in Non-Stationary Time Series via Tri-Modal Synergy

STaT: 通过三模态协同解决非平稳时间序列中的形状失真

Hui Cheng, Jinsheng Guo, Zhenhao Weng, Yan Qiao, Meng Li

发表机构 * Hefei University of Technology(合肥工业大学)

AI总结 提出STaT多模态架构,通过符号-时间-文本三模态对齐,在降低平均误差的同时减少形状失真,在8个基准上提升幅度指标达8.9%并降低形状失真达8.5%。

详情
AI中文摘要

近期时间序列预测研究常探索将文本和视觉模态与数值模型结合,以更好地应对非平稳环境。尽管取得了可靠的数值结果,现有多模态方法通常面临两难:优先最小化平均误差会导致预测过于平滑,忽略关键波动。为解决这一局限,我们提出STaT,一种创新的符号-时间-文本对齐多模态架构,无缝融合三种协同模态。具体而言,符号模态将连续时间序列转换为离散标记,便于准确识别结构模式和转折点;时间模态提取内在序列依赖;文本模态利用领域语义引导宏观预测趋势。在八个真实世界基准上的综合评估表明,STaT表现卓越,将传统幅度指标提升高达8.9%,同时将形状失真降低高达8.5%。

英文摘要

Recent research in time series forecasting frequently investigates the integration of textual and visual modalities with numerical models to better navigate non-stationary environments. Despite delivering solid numerical results, existing multi-modal approaches usually encounter a dilemma: prioritizing the minimization of average errors can result in excessively smooth forecasts that overlook essential fluctuations. To resolve this limitation, we introduce STaT, an innovative multimodal architecture for Symbolic-Temporal-Textual Alignment, which seamlessly unites three synergistic modalities. Specifically, the symbolic modality converts continuous time series into discrete tokens, facilitating the accurate identification of structural patterns and turning points; the temporal modality extracts inherent sequential dependencies; and the textual modality leverages domain semantics to steer the macroscopic forecasting trends. Comprehensive evaluations on eight real-world benchmarks indicate that STaT delivers exceptional performance, enhancing conventional magnitude indicators by up to 8.9% while simultaneously decreasing shape distortion by up to 8.5%.

2605.25939 2026-05-26 cs.LG cs.AI 版本更新

From Latent Space to Training Data: Explainable Specialization in Minimal MLPs

从潜在空间到训练数据:最小MLP中的可解释特化

Enrique Alba, Ezequiel Lopez-Rubio

发表机构 * ITIS Software, University of Malaga(马拉加大学ITIS Software研究所)

AI总结 研究最小单隐藏层MLP中隐藏神经元是否因训练偏差而特化,以及这种特化是否改善基于原型的训练数据重构,发现覆盖正则化能提高特化比并降低重构误差,而重叠惩罚会导致原型中心被推出凸包。

详情
AI中文摘要

我们在此研究训练偏差是否能使隐藏神经元在最小单隐藏层MLP中特化,以及这种特化是否改善从学习权重对训练数据集进行基于原型的重构。我们考虑宽度等于数据集大小的高斯激活MLP,并比较三种结构损失(分别鼓励训练样本覆盖、神经元诱导原型之间的分离以及隐藏响应的低重叠)与标准拟合基线。在均匀采样的一维数据集上的实验显示,从N=3到N=100的480次受控运行中呈现稳定模式。覆盖正则化在每个测试大小下给出最低的平均重构误差,并相对于标准基线提高了原型使用特化比,而分离效果参差不齐,重叠惩罚则系统性有害。我们表明这种损害并非优化失败:重叠激活的方法与无重叠方法一样拟合数据,但将优化器引导至退化均衡,其中原型中心被推出训练输入的凸包。覆盖无法奖励这种驱逐,并充当吸引子:分离仅在高温下允许它,而重叠在名义超参数选择下允许它。在分离掩码上的直接τ扫描和N=100时的原型位置可视化确认了这一机制。这些发现为原型可恢复性感知训练提供了一个简单的设计原则:每个排斥性结构损失必须由一个兼容的吸引子补偿,否则它将破坏本应精炼的潜在几何结构。

英文摘要

We study whether training biases can make hidden neurons specialize in minimal one-hidden-layer MLPs, and whether such specialization improves prototype-based reconstruction of the training dataset from the learned weights. We consider Gaussian-activation MLPs of width equal to dataset size and compare three structural losses that respectively encourage coverage of the training samples, separation between neuron-induced prototypes, and low overlap of hidden responses, against the standard fitting baseline. Experiments on uniformly sampled one-dimensional datasets show a stable pattern from N = 3 to N = 100 across 480 controlled runs. Coverage regularization gives the lowest mean reconstruction error at every tested size and raises the prototype-usage specialization ratio relative to the standard baseline, while separation has mixed effects and overlap penalties are systematically harmful. We show that the harm is not an optimization failure: overlap-active approaches fit the data as well as overlap-free ones but route the optimizer to a degenerate equilibrium in which prototype centers are pushed outside the convex hull of the training inputs. Coverage cannot reward this expulsion and acts as an attractor: separation admits it only at large temperature and overlap admits it at the nominal hyperparameter choice. A direct τ-sweep on the separation-only mask and a prototype-position visualization at N = 100 confirm the mechanism. The findings yield a simple design principle for prototype-recoverability-aware training: every repulsive structural loss must be compensated by a compatible attractor, or it will collapse the latent geometry it was meant to refine.

2605.25937 2026-05-26 cs.CR cs.LG 版本更新

Building an Adversarial Malware Dataset by Family and Type: Generation, Evasion, and Poisoning Evaluation

构建按家族和类型分类的对抗性恶意软件数据集:生成、逃避和投毒评估

David Košťál, Martin Jureček

发表机构 * Department of Information Security, Faculty of Information Technology, Czech Technical University in Prague(信息安全系,信息技术学院,布拉格捷克技术大学)

AI总结 基于RawMal-TF真实恶意软件数据集,使用对抗性生成器构建家族和类型标记的对抗性PE文件,评估逃避率和投毒攻击影响。

详情
AI中文摘要

我们提出了一个对抗性恶意软件样本数据集,该数据集源自公开的RawMal-TF真实恶意软件二进制文件集合。使用一套对抗性恶意软件生成器,我们构建了两组对抗性PE文件:44,347个按家族标记的样本和33,596个按类型标记的样本,分别对EMBER分类器实现了98.35%和92.20%的逃避率。每个对抗性二进制文件都附有详细的元数据,包括EMBER分数和VirusTotal分类。我们进一步通过一系列训练实验证明了恶意软件分类管道对数据投毒攻击的敏感性。在家族标记数据集中,仅注入占训练数据0.5%的完全错误标记的对抗性样本,就将对重新训练的分类器的逃避率从26.1%提高到92.8%。该数据集已公开发布,以促进未来对抗性恶意软件、投毒攻击以及基于机器学习的恶意软件检测系统鲁棒性的研究。

英文摘要

We present a dataset of adversarial malware samples derived from the public RawMal-TF collection of real-world malware binaries. Using a suite of adversarial malware generators, we construct two sets of adversarial PE files: 44,347 family-labelled samples and 33,596 type-labelled samples, achieving evasion rates of 98.35% and 92.20% against the EMBER classifier, respectively. Each adversarial binary is accompanied by detailed metadata, including EMBER scores and VirusTotal classifications. We further demonstrate the susceptibility of malware classification pipelines to data poisoning attacks through a series of training experiments. Injecting fully mislabelled adversarial samples representing only 0.5% of the training data in the family-labelled dataset increases the evasion rate against the re-trained classifier from 26.1% to 92.8%. The dataset is publicly released to facilitate future research on adversarial malware, poisoning attacks, and the robustness of machine-learning-based malware detection systems.

2605.25933 2026-05-26 cs.LG cs.AI 版本更新

Quantitative Evaluation of the Severity of Posttraumatic Stress Disorder through Transfer Learning from Specific Phobia Data

通过特定恐惧症数据迁移学习定量评估创伤后应激障碍的严重程度

Nicolas Ricka, Gauthier Pellegrin, Denis A. Fompeyrine, Thomas Rohaly, Leah Enders, Heather Roy

发表机构 * MyndBlue DCS Corporation Human in Complex Systems Division, DEVCOM Army Research Laboratory(复杂系统人类研究部,DEVCOM陆军研究实验室)

AI总结 提出基于多元核密度估计的机器学习方法,利用心率与皮肤电导信号从特定恐惧症数据迁移学习,客观评估PTSD严重程度,分类准确率86%,平均绝对误差5.6。

Comments Submitted to a peer-reviewed journal, comments welcome

详情
AI中文摘要

创伤后应激障碍(PTSD)是一种普遍且使人衰弱的心理健康状况,对个人和社会产生重大影响。目前PTSD的临床评估通常依赖主观评价,耗时、昂贵且易受人为偏见影响。本研究提出一种基于多元核密度估计(MKDE)技术的机器学习方法,用于客观评估PTSD严重程度。我们收集了21名参与者在沉浸式模拟期间的心率(HR)和皮肤电导反应(GSR)信号以及PTSD检查表-军事版(PCL-M)标签。在公开的蜘蛛恐惧症数据集上训练恐惧反应模型,并从军事数据集估计的恐惧反应曲线中提取PTSD预测特征。该模型在分类PTSD状态时达到86%的准确率,有效区分有和无PTSD的参与者(PCL-M阈值为36)。模型的平均绝对误差(MAE)为5.6,并以17%的平均绝对百分比误差估计临床PTSD严重程度量表。我们的算法通过提供一种客观且低努力的生理评估方法,显示出增强PTSD严重程度估计和随访的潜力。这些发现表明在筛查和随访环境中具有临床实用性。

英文摘要

Posttraumatic stress disorder (PTSD) is a prevalent and debilitating mental health condition with significant personal and societal impacts. Current clinical assessments of PTSD often rely on subjective evaluations, which can be time-consuming, costly, and prone to human bias. This study proposes a machine learning (ML) approach based on multivariate kernel density estimation (MKDE) for the objective evaluation of PTSD severity. We collected heart rate (HR) and galvanic skin response (GSR) signals as well as PTSD Checklist - Military Version (PCL-M) labels from 21 participants during an immersive simulation. A fear-response model was trained on a public arachnophobia dataset, and predictive features of PTSD were extracted from the fear-response curves estimated on the military dataset. The model achieved an accuracy of 86% in classifying PTSD status, effectively distinguishing participants with and without PTSD (PCL-M threshold of 36). The average mean absolute error (MAE) of the models is 5.6, and it estimated a clinical PTSD severity scale with a mean absolute percentage error of 17%. Our algorithm demonstrates promising potential for enhancing estimation of PTSD severity and follow-up by offering an objective and low-effort evaluation approach using physiology. These findings suggest clinical utility in both screening and follow-up settings.

2605.25924 2026-05-26 cs.CL cs.LG 版本更新

Does Continued Pretraining on a Learner Corpus Improve Automated Essay Scoring on English Proficiency Tests? Evidence from EFCAMDAT

在学习者语料库上继续预训练是否能提高英语水平测试的自动作文评分?来自EFCAMDAT的证据

Duy Anh Nguyen

发表机构 * University of Greenwich(格林威治大学)

AI总结 研究通过在EFCAMDAT学习者语料库上进行领域自适应继续预训练(DAPT),探究其对基于Transformer的自动作文评分(AES)在英语水平测试中的影响,发现全语料库DAPT效果不一,而基于CEFR分级的针对性DAPT能更可靠地提升领域内评分性能。

Comments 16 pages, 3 figures, 10 tables, including references and appendices

详情
AI中文摘要

最近的自动作文评分(AES)研究越来越多地使用预训练的Transformer模型,但这些模型通常是在通用领域英语上预训练的,可能无法充分代表第二语言学习者的写作。本研究调查了在EFCAMDAT学习者语料库上进行领域自适应继续预训练(DAPT)是否能提高基于Transformer的AES在英语水平测试中的表现。我们对三个Transformer编码器应用DAPT,并在FCE和IELTS上评估了领域内评分和少样本跨数据集迁移。全语料库DAPT在模型、数据集和指标上产生了混合结果。进一步分析表明,这些混合效应部分由EFCAMDAT与下游数据集在熟练度、体裁和交际目的上的不匹配解释。基于熟练度的消融实验显示,使用CEFR对齐子集进行针对性DAPT比全语料库DAPT更可靠地提高了下游评分,尤其是对于使用B1-B2数据的FCE。然而,这些增益并未一致地改善跨数据集迁移。总体而言,研究结果表明,当预训练数据与下游评估设置充分对齐时,在学习者写作语料库上继续预训练可以有益于英语评估的领域内AES,但它不会自动提高跨不同英语水平测试数据集的迁移性。

英文摘要

Recent automated essay scoring (AES) studies increasingly use pretrained transformer models, but these models are usually pretrained on general-domain English and may under-represent second-language learner writing. This study investigates whether domain-adaptive continued pretraining (DAPT) on the EFCAMDAT learner corpus improves transformer-based AES for English proficiency tests. We apply DAPT to three transformer encoders and evaluate them on FCE and IELTS in both in-domain scoring and few-shot cross-dataset transfer. Full-corpus DAPT produces mixed results across models, datasets, and metrics. Further analyses suggest that these mixed effects are partly explained by mismatches in proficiency, genre, and communicative purpose between EFCAMDAT and the downstream datasets. A proficiency-based ablation shows that targeted DAPT using CEFR-aligned subsets improves downstream scoring more reliably than full-corpus DAPT, especially for FCE with B1--B2 data. However, these gains do not consistently improve cross-dataset transfer. Overall, the findings suggest that continued pretraining on a learner-writing corpus can benefit in-domain AES for English assessment when the pretraining data is sufficiently aligned with the downstream assessment settings. However, it does not automatically improve transferability across different English proficiency test datasets.

2605.25916 2026-05-26 cs.LG cs.DC cs.NI 版本更新

Joint Optimization of Training and Inference in Federated Edge Learning via Constrained Multi-Objective Deep Reinforcement Learning

通过约束多目标深度强化学习联合优化联邦边缘学习中的训练与推理

Zhen Li, Jun Cai, Chao Yang, Haoran Gao

发表机构 * Department of Electrical and Computer Engineering, Concordia University(康科迪亚大学电气与计算机工程系) School of Automation, Guangdong University of Technology(广东工业大学自动化学院)

AI总结 提出一种在线优化框架,通过约束多目标深度强化学习算法C-MOPPO联合管理资源受限边缘设备上的联邦训练和推理,以在最小化延迟和能耗的同时最大化推理精度。

详情
AI中文摘要

联邦边缘学习(FEEL)最近成为一种有前景的范式,通过支持跨边缘设备的协作模型训练同时保护数据隐私来实现边缘智能(EI)。在本文中,我们提出了一种在线优化框架,用于联合管理资源受限边缘设备上的联邦训练和推理。我们引入了一种基于串联队列的转换机制,将推理请求与训练数据桥接起来,并进一步将数据和模型的新鲜度纳入准确性公式中,以捕捉真实环境中的时间动态。为了在最小化延迟和能耗的同时最大化推理精度,边缘设备的模式选择、通信和计算资源分配被联合优化。我们将此优化表述为一个多目标优化问题,该问题是NP难的,并且由于在线设置而进一步复杂化。为了应对这些挑战,我们将问题转化为多目标马尔可夫决策过程(MOMDP),并开发了一种约束多目标近端策略优化(C-MOPPO)算法。具体来说,C-MOPPO首先学习一组在三个目标上具有不同偏好的策略,然后利用约束策略优化来丰富帕累托前沿并获得高质量、密集的解。大量实验表明,C-MOPPO在目标之间实现了良好的平衡权衡,并在各种系统配置下显著优于基线。

英文摘要

Federated edge learning (FEEL) has recently emerged as a promising paradigm for achieving edge intelligence (EI) via enabling collaborative model training across edge devices while protecting data privacy. In this paper, we put forth an online optimization framework that jointly manages federated training and inference on resource-constrained edge devices. We introduce a tandem-queue-inspired conversion mechanism that bridges inference requests and training data, and further incorporate both data and model freshness into the accuracy formulation to capture temporal dynamics in real-world environments. To maximize inference accuracy while minimizing latency and energy consumption, the mode selections, communication, and computation resource allocations of edge devices are jointly optimized. We formulate this optimization as a multi-objective optimization problem, which is NP-hard and further complicated by the online setting. To address these challenges, we transform the problem into a multi-objective Markov decision process (MOMDP) and develop a constrained multi-objective proximal policy optimization (C-MOPPO) algorithm. Specifically, C-MOPPO first learns a set of policies with different preferences across three objectives, then leverages constrained policy optimization to enrich the Pareto front and obtain high-quality, dense solutions. Extensive experiments demonstrate that C-MOPPO achieves well-balanced trade-offs among objectives and significantly outperforms baselines under various system configurations.
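The notion of a Pareto front over the three objectives (accuracy to maximize, latency and energy to minimize) can be made concrete with a small dominance filter. This is a generic sketch of Pareto filtering, not the C-MOPPO algorithm; the tuple layout, function names, and the numbers are hypothetical.

```python
def dominates(a, b):
    """True if outcome a dominates outcome b.

    Each outcome is (accuracy, latency, energy): higher accuracy is better,
    lower latency and lower energy are better.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[0] > b[0] or a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(points):
    """Keep only the outcomes that no other outcome dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (accuracy, latency in ms, energy in J) outcomes of four policies.
policies = [
    (0.90, 10.0, 5.0),
    (0.85, 8.0, 4.0),
    (0.80, 12.0, 6.0),  # worse than the first policy on every objective
    (0.92, 15.0, 9.0),
]
print(pareto_front(policies))
```

Policies trained with different preference weights each land at some point in this objective space; enriching the front means finding additional non-dominated points between the ones already found.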

2605.25903 2026-05-26 cs.CL cs.LG 版本更新

Universal Activation Verbalizer: A Unified Framework for Cross-Model Activation Explanation

通用激活词化器:跨模型激活解释的统一框架

Haiyan Zhao, Zirui He, Guanchu Wang, Ali Payani, Yingcong Li, Mengnan Du

发表机构 * New Jersey Institute of Technology(新泽西理工学院) University of North Carolina at Charlotte(北卡罗来纳大学夏洛特分校) Cisco Research(思科研究) The Chinese University of Hong Kong, Shenzhen(香港中文大学(深圳))

AI总结 提出通用激活词化器(UAV)框架,通过共享解码器和轻量适配器将异构模型的隐藏表示转化为自然语言解释,支持跨模型家族和规模的激活词化,在分类、事实检索和要点总结任务中与强基线竞争。

Comments 23 pages, 11 figures, 11 tables

详情
AI中文摘要

激活词化以自然语言解释隐藏表示,但现有方法大多局限于自解释,即每个模型仅解释自身的激活。我们引入通用激活词化器(UAV),一个使用共享解码器解释来自异构捐赠模型激活的框架。UAV学习一个轻量适配器,将捐赠激活转化为解码器嵌入空间中的软标记,并通过重用冻结的解码器侧LoRA、仅为另一个捐赠模型训练新适配器,进一步支持仅适配器迁移。在分类、事实检索和要点总结任务中,UAV在实现跨模型家族和规模的跨模型词化时,与强自解释基线保持竞争力。消融实验表明,解码器侧调优主要改善任务行为,而适配器提供忠实解释所需的、基于激活的事实和语义信息。

英文摘要

Activation verbalization explains hidden representations in natural language, but existing methods are mostly limited to self-explanation, where each model explains only its own activations. We introduce Universal Activation Verbalizer (UAV), a framework that uses a shared decoder to explain activations from heterogeneous donor models. UAV learns a lightweight adapter that converts donor activations into soft tokens in the decoder's embedding space, and further supports adapter-only transfer by reusing a frozen decoder-side LoRA while training only a new adapter for another donor. Across classification, fact retrieval, and gist summarization, UAV remains competitive with strong self-explanation baselines while enabling cross-model verbalization across model families and scales. Ablations show that decoder-side tuning mainly improves task behavior, whereas the adapter provides the activation-grounded factual and semantic information needed for faithful explanations.

2605.25894 2026-05-26 cs.LG q-fin.ST 版本更新

Predicting Stock Price Direction on Earnings Announcement Days using Multi-modal Deep Learning

使用多模态深度学习预测盈利公告日的股价方向

Manuel Noseda, Nathan Soldati, Marco Paina

发表机构 * ETH Zürich(苏黎世联邦理工学院)

AI总结 本研究结合基本面指标、技术指标和新闻情感,利用LSTM和Transformer模型预测盈利公告日的股价方向,发现Transformer在识别波动方面更敏感,且新闻情感有助于提升性能。

详情
AI中文摘要

预测盈利公告(EAs)期间的股价走势是一个重大挑战,因为市场噪音和高冲击价格不连续性。在本研究中,我们评估了公告前的新闻情感、公司基本面和近期市场动态是否共同预测了EA日股票的价格方向运动。我们构建了一个多模态特征空间,结合了15个基本面指标、3个基于价格的技术指标以及使用FinBERT处理的金融新闻文章的情感分数。我们将长短期记忆(LSTM)网络和基于Transformer的架构与逻辑回归基线进行比较,并进一步评估所有模型在有和没有情感特征的情况下的增量价值。我们的结果表明,虽然LSTM通过保守的安全策略显示出更高的精确度,但Transformer模型在识别波动性运动方面表现出更高的敏感性,获得了更高的宏观F1分数,消融实验显示加入新闻情感有一致的益处。

英文摘要

Predicting stock price movements during Earnings Announcements (EAs) is a significant challenge due to market noise and high-impact price discontinuities. In this study, we evaluate whether pre-announcement news sentiment, firm fundamentals, and recent market dynamics jointly predict the directional price movement of equities on EA days. We construct a multi-modal feature space combining 15 fundamental metrics, 3 price-based technical indicators and sentiment scores derived from financial news articles processed using FinBERT. We compare a Long Short-Term Memory (LSTM) network and a Transformer-based architecture against a logistic regression baseline, and further assess all models with and without sentiment features to quantify their incremental value. Our results indicate that while the LSTM demonstrates higher precision through a conservative safe-bet strategy, the Transformer model exhibits superior sensitivity in identifying volatile movements, achieving a higher macro F1-score, with ablation experiments showing a consistent benefit from incorporating news sentiment.

2605.25890 2026-05-26 cs.LG 版本更新

Merge-Bench: Resolve Merge Conflicts with Large Language Models

Merge-Bench: 使用大型语言模型解决合并冲突

Benedikt Schesch, Michael D. Ernst

发表机构 * Amazon(亚马逊) University of Washington(华盛顿大学)

AI总结 本文构建了包含7938个真实合并冲突的数据集Merge-Bench,并利用组相对策略优化(GRPO)训练LLMergeJ模型,在Java程序上以14B参数超越多数商业LLM,但最佳模型正确解决率仍低于60%。

Comments 14 pages, 7 figures

详情
AI中文摘要

本文应用机器学习处理版本控制合并这一困难且重要的任务。(1)我们构建了一个数据集Merge-Bench,包含来自1439个GitHub仓库的7938个真实合并冲突片段。真实标注是开发者提交到仓库的合并解决方案。我们的数据集构建方法可扩展到任意数据量,因为无需手动标注。(2)我们训练了一个模型LLMergeJ,用于解决Java程序中的合并冲突。我们的方法使用组相对策略优化(GRPO),一种在线强化学习方法,来训练大型语言模型(LLM)。(3)我们对LLM在解决合并冲突上的性能进行了两次评估。在Java程序上,具有14B参数的LLMergeJ优于3个商业LLM,仅次于Gemini 2.5 Pro。在11种编程语言中,商业LLM的性能在不同语言间基本稳定。最佳模型正确解决的合并冲突不到60%。

英文摘要

This paper applies machine learning to the difficult and important task of version control merging. (1) We constructed a dataset, Merge-Bench, of 7938 real-world merge conflict hunks from 1439 GitHub repositories. The ground truth is the merge resolution that developers committed to the repository. Our dataset construction methodology is scalable to arbitrary amounts of data since no manual labeling is required. (2) We trained a model, LLMergeJ, to resolve merge conflicts in Java programs. Our approach uses Group Relative Policy Optimization (GRPO), an online reinforcement learning method, to train a Large Language Model (LLM). (3) We performed two evaluations of the performance of LLMs on resolving merge conflicts. On Java programs, LLMergeJ with 14B parameters outperforms 3 commercial LLMs, trailing only Gemini 2.5 Pro. Across 11 programming languages, commercial LLM performance is largely stable from language to language. The best models correctly resolve fewer than 60% of merge conflicts.
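A "merge conflict hunk" is the region Git brackets with conflict markers when both sides edit the same lines. The sketch below extracts such hunks from a conflicted file using Git's standard marker lines (`<<<<<<<`, `|||||||` in diff3 style, `=======`, `>>>>>>>`). It is a minimal illustration and not the Merge-Bench extraction pipeline; the function name and dictionary keys are hypothetical, and it assumes no content line happens to start with a marker.

```python
def parse_conflict_hunks(text):
    """Split a conflicted file into hunks with 'ours', 'base', 'theirs' line lists.

    'base' stays empty unless the file uses diff3-style markers.
    """
    hunks, state, cur = [], None, None
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            cur = {"ours": [], "base": [], "theirs": []}
            state = "ours"
        elif line.startswith("|||||||") and state == "ours":
            state = "base"
        elif line.startswith("=======") and state in ("ours", "base"):
            state = "theirs"
        elif line.startswith(">>>>>>>") and state == "theirs":
            hunks.append(cur)
            state, cur = None, None
        elif state is not None:
            cur[state].append(line)
    return hunks


conflicted = "\n".join([
    "int x = 0;",
    "<<<<<<< HEAD",
    "int y = 1;",
    "=======",
    "int y = 2;",
    ">>>>>>> feature",
    "return x + y;",
])
for hunk in parse_conflict_hunks(conflicted):
    print(hunk)
```

In a dataset built this way, each hunk would be paired with the lines the developers actually committed for that region, which serve as the ground-truth resolution.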

2605.25888 2026-05-26 cs.LG math.OC 版本更新

Optimal and Order-optimal Gated Priority-based Greedy Policies for Two-layer Multi-item Order Fulfillment

两层多物品订单履约的最优和阶最优门控优先级贪婪策略

Xi Chen, Yuze Chen, Ziyi Chen, Yuan Zhou

发表机构 * Leonard N. Stern School of Business, New York University(纽约大学 Leonard N. Stern 商学院) Qiuzhen College, Tsinghua University(清华大学求真书院) Yau Mathematical Sciences Center & Department of Mathematical Sciences, Tsinghua University(清华大学丘成桐数学科学中心及数学科学系)

AI总结 针对电商在两层分销网络中实时履约决策问题,提出门控优先级贪婪策略,证明其竞争比最优性,并通过数值实验验证性能。

详情
AI中文摘要

我们研究当多物品客户订单顺序到达且未来需求未知时,电商企业如何在两层分销网络中做出实时履约决策。核心管理矛盾在于:是否使用稀缺的前端配送中心(FDC)库存来节省当前履约成本,还是保留该库存用于未来可能更有价值的本地服务订单。我们构建了一个对抗性在线模型,包含多个FDC、一个区域配送中心(RDC)、多单位多物品订单以及物品特定且时变的可变成本。理论目标是刻画简单、可解释且可实施的履约规则何时能够达到与最优先知规划者几乎相同的性能。我们提出了一类门控优先级贪婪策略,在时变和时不变成本结构下推导了竞争比保证,并为任何在线算法建立了匹配或接近匹配的下界。数值实验表明,所提策略相对于广义短视和基于预测的基准方法表现强劲。分析提供了管理指导:何时应保护本地库存,何时拆分订单值得承担固定成本负担,以及固定成本和可变成本的相对大小如何决定更复杂优化的价值。

英文摘要

We study how an e-commerce firm should make real-time fulfillment decisions in a two-layer distribution network when multi-item customer orders arrive sequentially and future demand is unknown. The central managerial tension is whether to use scarce front distribution center (FDC) inventory to save current fulfillment cost or preserve that inventory for future orders that may be more valuable to serve locally. We formulate an adversarial online model with multiple FDCs, one regional distribution center (RDC), multi-unit multi-item orders, and item-specific and time-varying variable costs. Our theoretical objective is to characterize when simple, interpretable, and implementable fulfillment rules can perform nearly as well as an optimal clairvoyant planner. We develop a family of Gated Priority-based Greedy policies, derive competitive-ratio guarantees under both time-varying and time-invariant cost structures, and establish matching or near-matching lower bounds for any online algorithm. Numerical experiments show that the proposed policies perform strongly relative to generalized myopic and forecast-based benchmarks. The analysis yields managerial guidance on when local inventory should be protected, when splitting orders is worth the fixed-cost burden, and how the relative magnitudes of fixed and variable costs determine the value of more sophisticated optimization.

2605.25882 2026-05-26 cs.LG 版本更新

Conformalised imprecise inference for robust extrapolation under limited data

基于共形化的不精确推断在有限数据下的鲁棒外推

Yu Chen, Scott Ferson

发表机构 * Institute for Risk and Uncertainty(风险与不确定性研究所) University of Liverpool(利物浦大学)

AI总结 提出一种模型无关的共形化不精确推断框架,通过引入不精确性和距离感知,在分布偏移下保持覆盖并自适应扩展不确定性,实现有限数据下的鲁棒外推。

Comments 10 pages, 5 figures

详情
AI中文摘要

最近不确定性量化的进展越来越强调机器学习中偶然不确定性和认知不确定性之间的区别,这激发了对更统一框架的需求。然而,尽管在产生可靠预测方面取得了很大进展,现有方法在泛化到训练领域之外时往往缺乏严格的保证。我们提出了一种用于鲁棒外推的共形化不精确推断框架,该框架是模型无关的,并为预测模型增加了不精确性和距离感知。所提出的方法产生不精确预测(概率盒),这些预测在分布偏移下仍然有效,在外推区域中保持覆盖的同时自适应地扩展不确定性。在合成和基准数据集上的实验表明,与标准概率方法相比,特别是在有限数据下,该方法具有更好的鲁棒性和可靠的覆盖。

英文摘要

Recent advances in uncertainty quantification increasingly emphasise the distinction between aleatory and epistemic uncertainty in machine learning, motivating the need for more unified frameworks. However, despite much progress in producing reliable predictions, existing methods often lack rigorous guarantees when generalising beyond the training domain. We propose a conformalised imprecise inference framework for robust extrapolation, which is model-agnostic and augments predictive models with imprecision and distance awareness. The proposed approach yields imprecise predictions (probability boxes) that remain valid under distributional shift, maintaining coverage while adaptively expanding uncertainty in extrapolation regimes. Experiments on synthetic and benchmark datasets demonstrate improved robustness and reliable coverage compared to standard probabilistic approaches, particularly under limited data.
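For readers unfamiliar with the conformal ingredient, the sketch below shows plain split conformal prediction for regression: held-out absolute residuals are turned into an interval radius with a finite-sample coverage guarantee under exchangeability. It deliberately omits the paper's imprecise (probability-box) predictions and distance awareness, which the abstract does not specify in enough detail to reproduce; the function names are hypothetical.

```python
import math


def conformal_radius(residuals, alpha=0.1):
    """Split-conformal interval radius from calibration residuals |y - y_hat|.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest residual; returns infinity
    when the calibration set is too small for the requested coverage.
    """
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")
    return sorted(residuals)[k - 1]


def predict_interval(y_hat, residuals, alpha=0.1):
    """Symmetric prediction interval around a point prediction y_hat."""
    r = conformal_radius(residuals, alpha)
    return (y_hat - r, y_hat + r)


calibration_residuals = [i / 10 for i in range(1, 20)]  # 19 hypothetical residuals
print(predict_interval(5.0, calibration_residuals, alpha=0.1))
```

The guarantee behind this construction relies on calibration and test points being exchangeable, which is exactly what breaks under extrapolation and motivates widening uncertainty away from the training domain.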

2605.25880 2026-05-26 cs.LG 版本更新

The Quantization Benefits of Residual-Free Transformers

无残差Transformer的量化优势

Yiping Ji, Mahalakshmi Sabanayagam, Peyman Moghadam, Hemanth Saratchandran, Simon Lucey

发表机构 * Australian Institute for Machine Learning, Adelaide University(澳大利亚机器学习研究所,阿德莱德大学) DATA61, CSIRO(DATA61,CSIRO)

AI总结 本文通过对比残差与无残差Transformer,发现残差连接导致激活值非高斯性增强,从而增加量化误差;而无残差Transformer通过正交初始化等技术保持近高斯激活值,显著提升低比特量化鲁棒性,揭示了精度与可压缩性之间的权衡。

Comments Under review

详情
AI中文摘要

大规模Transformer的训练和部署日益受到跨加速器传输激活值、梯度和优化器状态的限制。低比特量化提供了一种自然的补救措施,但Transformer的激活值通常具有重尾和异常值主导的特点,使得简单量化损失严重。我们表明,这种困难不仅是量化器的属性,也是架构的属性。具体来说,残差连接在训练过程中可能使Transformer激活值偏离高斯性。通过残差和无残差Transformer之间的受控比较,我们证明这种效应导致残差模型在低精度下量化误差和精度下降显著更高。我们通过超额峰度分析解释这一现象,表明残差混合可以放大非高斯性,而无残差中的密集混合则压缩非高斯性。然后我们展示,使用正交初始化、谱或二阶优化以及注意力温度的深度感知缩放,可以使无残差Transformer可训练。在语言任务中,虽然全精度性能略有下降,但这些模型保持近高斯激活值,并对低比特量化表现出显著改善的鲁棒性。我们的结果揭示了Transformer设计中的精度-可压缩性权衡,并激发了面向量化的基础模型的架构级方法。

英文摘要

Large-scale transformer training and deployment are increasingly constrained by the transfer of activations, gradients, and optimizer states across accelerators. Low-bit quantization offers a natural remedy, but transformer activations are often heavy-tailed and outlier-dominated, making simple quantization highly lossy. We show that this difficulty is not only a property of the quantizer, but also of the architecture. Specifically, residual connections can drive transformer activations away from Gaussianity during training. Using controlled comparisons between residual and residual-free transformers, we demonstrate that this effect leads to substantially higher quantization error and accuracy degradation at low precision in residual models. We explain the phenomenon through an excess kurtosis analysis, showing that residual mixing can amplify non-Gaussianity, whereas dense mixing in residual-free models contracts non-Gaussianity. We then show that residual-free transformers can be made trainable using orthogonal initialization, spectral or second-order optimization, and depth-aware scaling of attention temperature. In language tasks, while there is a small drop in full precision performance, these models retain near-Gaussian activations and exhibit significantly improved robustness to low-bit quantization. Our results identify an accuracy--compressibility trade-off in transformer design and motivate architecture-level approaches to quantization-friendly foundation models.
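The link between outliers, excess kurtosis, and quantization error can be seen on synthetic data. The sketch below compares a Gaussian sample with the same sample plus a few large outliers, under symmetric 4-bit quantization with a single abs-max scale. The data, the outlier magnitude, and the choice of quantizer are all assumptions made for illustration; none of this reproduces the paper's experiments.

```python
import random


def excess_kurtosis(xs):
    """Fourth standardized moment minus 3 (approximately 0 for a Gaussian)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 ** 2) - 3.0


def quantize_absmax(xs, bits=4):
    """Symmetric uniform quantization with one scale set by the largest magnitude."""
    levels = 2 ** (bits - 1) - 1
    scale = max(abs(x) for x in xs) / levels
    return [round(x / scale) * scale for x in xs]


def relative_mse(xs, qs):
    """Quantization error energy divided by signal energy."""
    num = sum((x - q) ** 2 for x, q in zip(xs, qs))
    den = sum(x ** 2 for x in xs)
    return num / den


rng = random.Random(0)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(20000)]

# Heavy-tailed stand-in: the same values with one large outlier every 1000 entries.
heavy = list(gaussian)
for i in range(0, len(heavy), 1000):
    heavy[i] = 50.0 if (i // 1000) % 2 == 0 else -50.0

for name, xs in (("gaussian", gaussian), ("heavy-tailed", heavy)):
    err = relative_mse(xs, quantize_absmax(xs, bits=4))
    print(f"{name}: excess kurtosis {excess_kurtosis(xs):.2f}, relative MSE {err:.3f}")
```

With an abs-max scale, a handful of outliers stretches the quantization grid so far that most ordinary values collapse to zero, which is why outlier-dominated activations are costly at low bit widths.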

2605.25868 2026-05-26 cs.HC cs.LG 版本更新

The Timing Dependencies of Trust: Speed, Accuracy, and cBCI Neuro-Decoupling in Human-AI Teams

信任的时间依赖性:人机团队中的速度、准确性与cBCI神经解耦

Christopher Baker, Stephen Hinton, Akashdeep Nijjar, Riccardo Poli, Caterina Cinel, Tom Reed, Stephen Fairclough

发表机构 * School of Electronics, Electrical Engineering and Computer Science(电子工程与计算机科学学院) Queen's University Belfast(贝尔法斯特女王大学) School of Psychology(心理学学院) Liverpool John Moores University(利物浦约翰摩尔斯大学) School of Computer Science and Electronic Engineering(计算机科学与电子工程学院) University of Essex(埃塞克斯大学) Defence Science Technology Laboratory(国防科学技术实验室)

AI总结 本研究通过比较快速低准确率(FLA-AI)与慢速高准确率(SA-AI)两种AI助手,利用协作脑机接口(cBCI)和自适应黎曼Oracle,揭示了AI响应时间决定了团队失败机制:快速AI引发盲目服从,慢速AI导致延迟认知冲突,并通过混合融合方法有效提升了团队性能。

详情
AI中文摘要

人工队友的速度和准确性从根本上改变了人机集成的失败状态。高速AI干预可能诱发反射性盲目服从,而延迟干预则可能引发模糊的认知冲突。本研究调查了任务内AI助手的基本特征——快速/低准确率(FLA-AI)与慢速/高准确率(SA-AI)——如何影响虚拟现实无人机任务中协作脑机接口(cBCI)团队的协同效应。17名操作员在高认知负荷下完成连续搜索任务,同时使用二维自适应黎曼Oracle映射其空间协方差。结果数学上证明,AI时间决定了团队失败机制。快速AI引发即时盲目服从;欺骗下的人类准确率降至50.2%,纯行为团队(N=8)无法超过74.1%。相反,慢速AI引发延迟认知冲突;人类犹豫(准确率61.1%),但N=8的行为团队最终恢复到100.0%。关键的是,黎曼Oracle数学上适应这些状态:它严格限制时间窗口(<0.8秒)以拦截快速反射性服从,同时扩大窗口(>1.2秒)以捕获延迟认知冲突。通过混合融合集成这些孤立的真实信号,成功挽救了快速AI团队(N=8时+7.6%),并显著加速了较小慢速AI团队的恢复(N=4时+6.9%)。这些发现证明,cBCI协同效应高度依赖于信任的时间动态,为设计动态门控的人机系统提供了关键框架。

英文摘要

The speed and accuracy of an artificial teammate fundamentally alter the failure states of Human-AI integration. While high-speed AI interventions risk inducing reflexive blind compliance, delayed interventions can induce ambiguous cognitive conflict. This study investigates how the fundamental characteristics of an in-task AI assistant, Fast/Less-Accurate (FLA-AI) versus Slow/Accurate (SA-AI), impact the synergy of Collaborative Brain-Computer Interface (cBCI) teams in a Virtual Reality drone task. Seventeen operators completed continuous search tasks under high cognitive workload while their spatial covariance was mapped using a 2D Adaptive Riemannian Oracle. The results mathematically demonstrate that AI timing dictates the mechanism of team failure. Fast AI induced instant, blind compliance; human accuracy under deception collapsed to 50.2%, and pure behavioural teams (N=8) failed to scale beyond 74.1%. In contrast, Slow AI induced delayed cognitive conflict; humans hesitated (61.1% accuracy), but N=8 behavioural teams eventually recovered to 100.0%. Crucially, the Riemannian Oracle mathematically adapted to these states: it heavily restricted temporal windows (< 0.8s) to intercept fast reflexive compliance, while widening windows (> 1.2s) to capture delayed cognitive conflict. Integrating these isolated veridical signals via Hybrid Fusion successfully rescued the Fast AI team (+7.6% at N=8) and significantly accelerated the recovery of smaller Slow AI teams (+6.9% at N=4). These findings prove that cBCI synergy is heavily contingent on the temporal dynamics of trust, providing a critical framework for designing dynamically gated Human-AI systems.

2605.25866 2026-05-26 cs.LG cond-mat.mtrl-sci physics.class-ph 版本更新

UNATE: UNsupervised ATomic Embedding for crystal structures property prediction

UNATE:用于晶体结构性质预测的无监督原子嵌入

Laura Solà-Garcia, Àlex Solé, Javier Ruiz-Hidalgo

发表机构 * GitHub

AI总结 提出UNATE框架,通过无监督去噪自编码器和自监督对比学习从无标签晶体结构中学习鲁棒原子表示,用于下游性质预测,在有限标签数据下提升高达10%。

详情
AI中文摘要

准确预测晶体性质对于加速材料发现至关重要,但通常受限于稀缺的标记数据和昂贵的理论计算。为缓解这一问题,我们提出UNATE(无监督原子嵌入),一个利用从无标签晶体结构中提取的结构信息的框架。UNATE将无监督去噪自编码器与自监督对比学习相结合,学习鲁棒的原子表示,然后将其用作下游性质预测的输入特征。实验结果表明,用UNATE预训练的节点嵌入替换原始原子序数,在全数据基线上提升了2.7%。值得注意的是,在标记数据有限的情况下,优势更加明显,当仅使用25%的标记数据时,提升高达10%。

英文摘要

Accurately predicting crystal properties is critical for accelerating materials discovery, but it is often limited by scarce labeled data and costly theoretical calculations. To alleviate this, we propose UNATE (Unsupervised Atomic Embedding), a framework that leverages structural information extracted from unlabeled crystal structures. UNATE integrates an unsupervised denoising autoencoder with self-supervised contrastive learning to learn robust atomic representations, which are then used as input features for downstream property prediction. Experimental results show that replacing raw atomic numbers with UNATE-pretrained node embeddings yields a 2.7% improvement over the full-data baseline. Notably, the benefits become more pronounced in scenarios with limited labeled data, reaching improvements of up to 10% when only 25% of the labeled data is used.

2605.25864 2026-05-26 cs.LG cs.CL 版本更新

When Self-Belief Misleads: Active Label Acquisition for Reinforcement Learning with Verifiable Rewards

当自我信念误导:面向可验证奖励的强化学习的主动标签获取

Li Wang, Xiaodong Lu, Xiaohan Wang, Yikun Ban, Jiajun Chai, Wei Lin, Tianhao Peng, Guojun Yin

发表机构 * Meituan(美团) Beihang University(北京航空航天大学) Nanyang Technological University(南洋理工大学)

AI总结 提出RLAVR框架,通过主动获取少量真实标签并与伪标签结合,利用CAG指标和CARE策略稳定训练并提升有限标注预算下的性能。

详情
AI中文摘要

大型语言模型(LLM)通过可验证奖励的强化学习(RLVR)在推理能力上取得了显著进展。然而,RLVR本质上依赖于真实标签进行奖励计算,而在实际场景中获取这些标签通常成本高昂。虽然无监督的RLVR范式试图通过训练伪标签来规避这一问题,但它们极易发生训练崩溃。此外,不同样本往往具有不同的标注价值。在本文中,我们提出了主动可验证奖励的强化学习(RLAVR),它主动获取少量选定样本的真实标签,并将其与伪标签相结合,从而稳定训练动态并在有限标注预算下提高性能。为了识别有价值的样本,我们提出了纠正优势差距(CAG)指标,并分析了样本级别的监督价值。在此基础上,我们引入了用于RLAVR的纠正感知可靠性估计(CARE),它将理想的CAG准则转化为实用的预查询获取策略,以显著提高训练稳定性。跨不同领域、模型家族和模型规模的大量实验证明了我们方法的有效性和通用性。我们的代码可在https://github.com/Lumina04/CARE获取。

英文摘要

Large Language Models (LLMs) have achieved remarkable advancements in reasoning capabilities empowered by Reinforcement Learning with Verifiable Rewards (RLVR). Nonetheless, RLVR intrinsically relies on ground-truth labels for reward computation, the acquisition of which is often prohibitively expensive in real-world scenarios. While unsupervised RLVR paradigms attempt to circumvent this by training on pseudo-labels, they are notoriously susceptible to training collapse. Moreover, different samples often exhibit varying annotation values. In this paper, we propose Reinforcement Learning with Active Verifiable Rewards (RLAVR), which actively acquires ground-truth labels for a small set of selected samples and integrates them with pseudo-labels, thereby stabilizing training dynamics and improving performance under limited annotation budgets. To identify valuable samples, we propose the Corrective Advantage Gap (CAG) metric and analyze the sample-level supervision value. Building on this, we introduce Correction-Aware Reliability Estimation for RLAVR (CARE), which translates the oracle CAG criterion into a practical pre-query acquisition policy to substantially improve training stability. Extensive experiments across diverse domains, model families, and model scales demonstrate the effectiveness and generality of our approach. Our code is available at https://github.com/Lumina04/CARE.

2605.25859 2026-05-26 math.ST cs.LG stat.TH 版本更新

Minimax Limits of k-Fold Cross-Validation via Majority

k折交叉验证的极小极大极限:多数投票算法

Ido Nachum, Rüdiger Urbanke, Thomas Weinberger

发表机构 * University of Haifa(海法大学) EPFL(洛桑联邦理工学院)

AI总结 本文通过分析二元分类中多数投票算法的交叉验证均方误差,揭示了k折交叉验证的极小极大极限,证明当折数k随样本数n增长时,任何经验风险最小化算法的均方误差下界为Ω(√k/n)。

详情
AI中文摘要

我们研究了$k$折交叉验证作为风险估计量的均方误差,特别关注其精度如何依赖于折数$k$。尽管交叉验证被广泛使用,但关于如何选择$k$的原则性指导基本缺失,这主要是由于折间误差估计的复杂依赖性。为了获得清晰且可解释的结果,我们聚焦于二元分类中的多数投票算法,这是一个最小但非平凡的经验风险最小化过程。我们对其交叉验证行为进行了细粒度分析,表明即使这个简单算法也表现出微妙而精细的现象,现有理论对此给出的界是宽松甚至无效的。借助这一分析,我们引入了交叉验证风险估计的极小极大框架,并证明当折数随样本数$n$增长时,没有任何经验风险最小化算法能够达到$O(1/n)$的极小极大均方误差;相反,一个$Ω(√k/n)$阶的下界是不可避免的。我们的结果揭示了交叉验证作为数据重用策略的根本局限性,澄清了先前理论工作中的空白和不准确之处,并将多数投票算法定位为一个自然的基准,任何对交叉验证的紧致分析都应能够解释它。

英文摘要

We study the mean-squared error of $k$-fold cross-validation as a risk estimator, with particular emphasis on how its accuracy depends on the number of folds $k$. Despite the widespread use of cross-validation, principled guidance for choosing $k$ is largely absent, mainly due to the complex dependence between fold-wise error estimates. To obtain sharp and interpretable results, we focus on the majority algorithm in binary classification, a minimal yet nontrivial empirical risk minimization procedure. We provide a fine-grained analysis of its cross-validation behavior, showing that even this simple algorithm exhibits subtle and delicate phenomena for which existing theory provides loose and even vacuous bounds. Leveraging this analysis, we introduce a minimax framework for cross-validation risk estimation and prove that no empirical risk minimization algorithm can achieve an $O(1/n)$ minimax mean-squared error when the number of folds grows with the number of samples $n$; instead, a lower bound of order $Ω(\sqrt{k}/n)$ is unavoidable. Our results reveal fundamental limitations of cross-validation as a data-reuse strategy, clarify gaps and inaccuracies in prior theoretical work, and position the majority algorithm as a natural benchmark that any tight analysis of cross-validation should be able to explain.
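To make the object of study concrete, the following is a minimal sketch (not the authors' code) of the k-fold cross-validation risk estimate for the majority algorithm in binary classification. The function names, the contiguous fold layout, and the tie-breaking rule toward label 1 are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Sequence


def majority_label(labels: Sequence[int]) -> int:
    """Most frequent label in {0, 1}; ties go to 1 (an illustrative assumption)."""
    counts = Counter(labels)
    return 1 if counts[1] >= counts[0] else 0


def kfold_cv_error(labels: Sequence[int], k: int) -> float:
    """Average held-out 0-1 error of the majority algorithm over k contiguous folds."""
    n = len(labels)
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    fold_errors: List[float] = []
    for i in range(k):
        lo, hi = i * n // k, (i + 1) * n // k
        held_out = labels[lo:hi]
        # Train on everything outside the held-out fold.
        train = list(labels[:lo]) + list(labels[hi:])
        prediction = majority_label(train)
        fold_errors.append(sum(y != prediction for y in held_out) / len(held_out))
    return sum(fold_errors) / k
```

A small case shows how strongly the fold-wise estimates can depend on each other: on the balanced sample `[1, 0, 1, 0]` with `k = 4` (leave-one-out), removing any point flips the training majority to the opposite label, so `kfold_cv_error` returns 1.0 even though a constant predictor errs on only half of these points.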

2605.25850 2026-05-26 cs.CL cs.AI cs.LG 版本更新

TIAR: Trajectory-Informed Advantage Reweighting for LLM Abstention Learning

TIAR:基于轨迹信息的优势重加权用于大语言模型弃权学习

Muyu Pan, Shu Zhao, Nan Zhang, Philip Shin, Varun Parekh, Vijaykrishnan Narayanan, Rui Zhang

发表机构 * Department of Computer Science, The Pennsylvania State University(宾夕法尼亚州立大学计算机科学系)

AI总结 本文提出TIAR方法,利用GRPO中的多条轨迹作为自然弃权信号,动态重加权弃权奖励,在六个评估类别中的五个上取得最优弃权F1分数,同时保持基线准确率。

Comments 10 pages, 1 figure, 4 tables

详情
AI中文摘要

本文研究大语言模型(LLM)的弃权学习,特别是使用三元奖励来激励大语言模型中的真实性。本文将该思想从三元奖励扩展到基于轨迹信息的优势重加权(Trajectory-Informed Advantage Reweighting),在组相对策略优化(GRPO)训练期间动态重加权弃权奖励。本工作的目标聚焦于弃权学习而非提升真实性,作为减少幻觉的探索。本文的新颖之处在于方法论创新、优势重加权和基准选择。利用GRPO的多条轨迹作为自然弃权信号,该方法使用奖励信号探索知识边界并鼓励一致性。通过证明轨迹可以作为策略相对于查询的置信度指标,进而用于动态计算弃权优势。使用AbstentionBench作为评估基准,因为本工作旨在为弃权学习领域做出贡献。对该基准上的所有数据集,均使用本方法和各种基线进行了测试。实证结果表明,TIAR在六个评估类别中的五个上取得了最优弃权F1分数,在31个基准数据集中的17个上优于静态三元基线,同时完全保持基线准确率。

英文摘要

This paper investigates abstention learning for large language models (LLMs), specifically with ternary rewards, which incentivize truthfulness in LLMs. It extends that idea from a ternary reward to Trajectory-Informed Advantage Reweighting (TIAR), which dynamically reweights the abstention reward during Group Relative Policy Optimization (GRPO) training. The work focuses on abstention learning rather than on improving truthfulness, as an exploration of hallucination reduction. Its novelty lies in the methodology, the advantage reweighting, and the benchmark selection. Leveraging GRPO's multiple trajectories as a natural abstention signal, the method uses a reward signal to explore knowledge boundaries and encourage consistency. After demonstrating that trajectories can serve as an indicator of the policy's confidence on a query, the method uses them to dynamically calculate the abstention advantage. AbstentionBench is used as the evaluation benchmark, as this work aims to contribute to the field of abstention learning. All datasets in the benchmark were evaluated with this method and various baselines. Empirical results demonstrate that TIAR achieves state-of-the-art abstention F1 scores across five of six evaluation categories, outperforming the static ternary baseline on 17 of 31 benchmark datasets while fully preserving baseline accuracy.

2605.25848 2026-05-26 cs.LG cs.AI 版本更新

Geometric Evolution Maps: Extracting Stable Concept Probes from Transformer Residual Streams

几何演化图:从Transformer残差流中提取稳定概念探针

James Henry

发表机构 * Independent Researcher(独立研究者)

AI总结 提出几何演化图(GEM)方法,通过追踪残差流中概念的方向轨迹并识别旋转停止的交接层,提取稳定的概念探针,在391个概念×模型对中优于峰值层探针的比例达66.2%。

Comments 24 pages, 3 figures. Reference implementation: rosetta_tools v1.3.1 (doi:10.5281/zenodo.20361433)

详情
AI中文摘要

从Transformer残差流中提取的概念探针的可靠性取决于提取层。常见的做法是在固定的后期层或分离得分函数的峰值处进行探测,这忽略了一个基本的结构特征:概念表示在其组装阶段经历显著的方向旋转,直到主要概念分配区(CAZ)之后的一个特征交接层才稳定下来。我们引入了几何演化图(GEM),它通过残差流激活追踪概念的完整方向轨迹,识别旋转停止的交接层,并从该层提取稳定的探针方向。在跨越70M到14B参数的23种架构和17种概念类型中,CAZ内入口到出口的余弦相似度平均为0.233,表明CAZ入口处的探针方向不能可靠地预测出口处的探针方向。在391个概念×模型对(23个模型×17个概念)上的消融实验表明,GEM提取的探针在268/391次试验(68.5%)中至少与峰值层探针一样精确,并在259/391次试验(66.2%)中严格优于峰值层探针。架构差异显著:MHA模型在173/221次试验(78.3%)中偏好交接层;GQA模型仅在56/119次试验(47.1%)中偏好交接层。模型级Wilcoxon检验:W=214, N=23, p=0.010(单侧)。一个自适应消融宽度规则针对79/391个近最终层情况:在60/79个触发情况(75.9%)中提高了探针质量,平均增益+7.44个百分点。方向特异性控制证实消融效果是概念方向特异性的:与随机方向消融相比,中位数抑制率为377倍(99.1%的概念方向击败了所有10个随机种子)。参考实现:rosetta_tools v1.3.1(doi:10.5281/zenodo.20361433)。

英文摘要

Concept probes extracted from transformer residual streams are only as reliable as the layer from which they are extracted. The common practice of probing at a fixed late layer or at the peak of a separation score function ignores a fundamental structural feature: concept representations undergo substantial directional rotation during their assembly phase, and do not settle into a stable direction until a characteristic handoff layer after the primary Concept Allocation Zone (CAZ). We introduce Geometric Evolution Maps (GEMs), which track the full directional trajectory of a concept through residual stream activations, identify the handoff layer where rotation ceases, and extract the settled probe direction from that layer. Across 23 architectures spanning 70M to 14B parameters and 17 concept types, the entry-to-exit cosine similarity within CAZs has a mean of 0.233, showing that probe direction at CAZ entry does not reliably predict probe direction at exit. Ablation experiments across 391 concept × model pairs (23 models × 17 concepts) show that GEM-extracted probes are at least as precise as peak-layer probes in 268/391 trials (68.5%), and strictly outperform in 259/391 (66.2%). The architecture split is pronounced: MHA models favour the handoff in 173/221 trials (78.3%); GQA models favour the handoff in only 56/119 trials (47.1%). Model-level Wilcoxon: W=214, N=23, p=0.010 (one-sided). An adaptive ablation width rule targets the 79/391 near-final-layer cases: it improves probe quality in 60/79 triggered cases (75.9%), mean gain +7.44pp. A direction-specificity control confirms the ablation effect is concept-direction specific: median 377x suppression rate versus random-direction ablation (99.1% of concept directions beat all 10 random seeds). Reference implementation: rosetta_tools v1.3.1 (doi:10.5281/zenodo.20361433).
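The idea of a layer "where rotation ceases" can be illustrated with a small sketch. This is not the rosetta_tools implementation: the per-layer direction vectors, the `handoff_layer` name, and the 0.95 cosine threshold are all hypothetical, chosen only to show how consecutive-layer cosine similarity can locate a settling point.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def handoff_layer(directions: List[Sequence[float]], threshold: float = 0.95) -> int:
    """Earliest layer L such that every later consecutive pair of layer
    directions has cosine similarity >= threshold (hypothetical criterion)."""
    sims = [cosine(directions[i], directions[i + 1])
            for i in range(len(directions) - 1)]
    handoff = len(directions) - 1
    # Walk backwards from the last layer while the direction stays stable.
    for i in range(len(sims) - 1, -1, -1):
        if sims[i] >= threshold:
            handoff = i
        else:
            break
    return handoff
```

With five toy layer directions `[[1, 0], [0, 1], [0.6, 0.8], [0.6, 0.8], [0.6, 0.8]]`, the direction keeps rotating through layer 1 and is stable from layer 2 onward, so a probe taken at layer 0 would point somewhere quite different from the settled one.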

2605.25835 2026-05-26 cs.LG cs.AI 版本更新

Context-Instrumental Data Distillation for Kubernetes Manifest Generation: Method and Experimental Evaluation

面向Kubernetes清单生成的上下文-工具数据蒸馏方法及实验评估

Andrey Kozachok, Anatoliy Bakaev, Aleksandr Kozachok, Shamil Magomedov, Artem Noev

发表机构 * RTU MIREA(俄罗斯莫斯科RTU MIREA)

AI总结 提出上下文-工具数据蒸馏方法,通过合成生成和反向指令生成构建语料库,结合外部验证器过滤,在资源受限条件下微调1.5B参数小语言模型生成Kubernetes清单,实验表明严格输出格式比增加训练样本更关键。

Comments 15 pages, 4 figures, 2 tables

详情
AI中文摘要

本文研究了参数高达40亿的小语言模型(SLM)在领域特定语言(DSL)中生成工件的专业化。选择Kubernetes清单作为目标领域。我们提出了上下文-工具数据蒸馏方法:源语料库通过合成生成形成,在扩展方案中通过从真实Kubernetes YAML文件进行反向指令生成,仅当通过外部验证器并匹配领域上下文模型时,才将配对包含在训练中。与经典的KL散度知识蒸馏不同,基线实现简化为在工具验证示例上进行监督微调。实验部分在资源受限条件下展示了试点实现:DeepSeek-V4 Flash API作为教师模型进行合成生成,而Qwen2.5-Coder-1.5B-Instruct通过LoRA在CPU上进行微调。在K8s-Distill-Pilot语料库(训练1200,验证100,测试200)上,我们以更严格的提示公式和max_new_tokens=768实现了full-pass@1 = 91.5%(183/200)。关键经验发现是,对于Kubernetes YAML,试点中的结果质量更多地取决于严格的输出格式要求,而不是简单地增加训练样本数量。

英文摘要

This paper examines the specialization of Small Language Models (SLMs) with up to 4 billion parameters for generating artifacts in domain-specific languages (DSL). Kubernetes manifests are chosen as the target domain. We propose the context-instrumental data distillation method: the source corpus is formed through synthetic generation and, in an extended scheme, through reverse instruction generation from real Kubernetes YAML files, with pairs included in training only upon passing external validators and matching the domain context model. Unlike classical KL-divergence knowledge distillation, the baseline implementation reduces to supervised fine-tuning on instrumentally verified examples. The experimental section presents a pilot implementation under resource-constrained conditions: the DeepSeek-V4 Flash API serves as the teacher for synthetic generation, while Qwen2.5-Coder-1.5B-Instruct is fine-tuned via LoRA on CPU. On the K8s-Distill-Pilot corpus (train_1200, validation_100, test_200), we achieved full-pass@1 = 91.5% (183/200) with a stricter prompt formulation and max_new_tokens=768. The key empirical finding is that for Kubernetes YAML, result quality in the pilot depended more on strict output format requirements than on simply increasing the number of training examples.

2605.25831 2026-05-26 cs.CL cs.AI cs.LG 版本更新

Clarify, Abstain or Answer? Strategising in Conversation with Belief-Augmented Generation

澄清、弃权或回答?基于信念增强生成的对话策略

Joris Baan, Wilker Aziz, Barbara Plank, Raquel Fernández

发表机构 * University of Amsterdam(阿姆斯特丹大学) MCML Munich(慕尼黑机器学习中心) LMU Munich(慕尼黑大学)

AI总结 提出信念增强生成(BAG)方法,通过将大语言模型自身的信念状态注入提示,使其推理多个采样响应并决定对话策略(回答、澄清或弃权),从而提升多轮模糊问答的准确性和策略决策的忠实度。

详情
AI中文摘要

大语言模型(LLMs)定义了文本上的分布,这可以视为不确定性的概率表示:采样K个响应会产生一个信念状态——模型认为合理的响应。现有工作利用这种表示进行解码或选择性预测等狭窄任务,通常需要手动干预,无法直接控制生成。我们提出信念增强生成(BAG):通过提示将LLMs锚定在其自身的信念状态中,并让它们推理这K个样本以决定对话策略:回答、澄清或弃权。在多轮模糊问答设置中,我们发现LLMs默认很少澄清或弃权,忽略了关于输入或事实的不确定性。BAG在六个模型上提高了问答准确性,并产生了比仅提示基线更忠实于信念状态的策略决策。然而,区分何时澄清与何时弃权仍然具有挑战性。

英文摘要

Large language models (LLMs) define a distribution over text, which can be viewed as a probabilistic representation of uncertainty: sampling K responses yields a belief state, i.e., the responses a model deems plausible. Existing work exploits this representation for narrow tasks such as decoding or selective prediction, and often requires manual interventions rather than controlling generation directly. We propose Belief-Augmented Generation (BAG): grounding LLMs in their own belief state via the prompt and letting them reason over these K samples to decide on a conversational strategy: answer, clarify, or abstain. In a multi-turn ambiguous QA setting, we find that LLMs by default rarely clarify or abstain, ignoring uncertainty about the input or facts. BAG improves QA accuracy across six models and yields strategy decisions more faithful to the belief state than prompt-only baselines. Disentangling when to clarify from when to abstain, however, remains challenging.
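A rough sketch of how a belief state might be placed in the prompt is given below. The prompt wording, the `build_belief_prompt` name, and the exact-match grouping of samples are assumptions for illustration; the paper's actual template and aggregation may differ, and the K samples are taken as already drawn from the model.

```python
from collections import Counter
from typing import Sequence


def build_belief_prompt(question: str, samples: Sequence[str]) -> str:
    """Summarise K sampled responses as a belief state and ask the model
    to choose a strategy (hypothetical template)."""
    # Group identical responses (after trimming whitespace) and count them.
    counts = Counter(s.strip() for s in samples)
    total = len(samples)
    belief_lines = [f"- {answer} ({n}/{total} samples)"
                    for answer, n in counts.most_common()]
    return (
        f"Question: {question}\n"
        "Your own sampled answers to this question were:\n"
        + "\n".join(belief_lines)
        + "\nGiven this spread, choose one strategy: ANSWER, CLARIFY, or ABSTAIN, "
        "and then respond accordingly."
    )
```

The returned string would be sent back to the same model as a follow-up prompt; a concentrated belief state suggests answering, while a dispersed one suggests clarifying or abstaining.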

2605.25826 2026-05-26 math.NA cs.CE cs.LG cs.NA 版本更新

Branched Signature Kernel Solvers for ODEs with rough Single-Trajectory signals

带粗糙单轨迹信号的常微分方程的分支签名核求解器

Munawar Ali, Qi Feng, Charlie Pyle, George Xu

发表机构 * Department of Mathematics, Florida State University(佛罗里达州立大学数学系) Department of Mathematics, Texas A&M University(德克萨斯农工大学数学系) Department of Mathematics, Rutgers University(罗格斯大学数学系)

AI总结 针对由单个粗糙信号驱动的ODE,提出基于计数采样和核配置的分支签名核求解器,实现准确稳定的预测。

Comments 39 pages, 12 figures

详情
AI中文摘要

我们开发了一种分支签名核求解器,用于求解由可能粗糙的强迫信号的单个观测轨迹驱动的线性和非线性常微分方程。这种设置自然出现在地震工程、金融、生物学和结构健康监测中,其中强迫信号仅被观测一次,求解器必须尊重底层物理定律而不依赖集合实现。两个成分是新的。首先,一个计数采样构造将单个观测转化为一个由$N+1$个嵌套训练路径组成的层次族,在这些路径上可以评估分支签名核;这使得原本为多实现回归问题设计的签名核机制能够处理单轨迹观测。其次,一个核配置框架将假设置于解的最高阶导数上(通过积分核恢复低阶导数)或解本身(在对ODE进行$m$次积分之后)。我们证明了分支签名核的通用逼近定理,利用Hairer–Kelly同态通过时间扩展路径的几何签名来表达分支签名评估。离线求解器被扩展为流式测试/训练/重训练协议,在线性情况下具有闭式在线更新,在非线性情况下具有标量牛顿步。在六个基准(El-Centro地震位移、Solow资本存量模型、fBM驱动的二阶ODE、强迫Duffing振子、路径依赖的Arias强度退化变系数振子以及含噪Kuramoto相位振子系统)上的数值实验表明,分支签名核求解器在所有情况下都能提供准确、稳定的预测。

英文摘要

We develop a branched signature kernel solver for linear and nonlinear ordinary differential equations driven by a single observed trajectory of a possibly rough forcing signal. This setting arises naturally in earthquake engineering, finance, biology, and structural health monitoring, where the forcing is observed exactly once and the solver must respect the underlying physical law without recourse to an ensemble of realizations. Two ingredients are new. First, a count-sampling construction turns the single observation into a hierarchical family of $N+1$ nested training paths on which the branched signature kernel can be evaluated; this allows the signature kernel machinery, originally designed for multi-realization regression problems, to operate on a single-trajectory observation. Second, a kernel-collocation framework places the ansatz either on the highest-order derivative of the solution (with lower derivatives recovered by integrating the kernel) or on the solution itself (after $m$-fold integration of the ODE). We prove a universal approximation theorem for the branched signature kernel, leveraging the Hairer-Kelly morphism to express branched signature evaluations through geometric signatures of time-extended paths. The offline solver is extended to a streaming Test/Train/Retrain protocol with closed-form online updates in the linear case and scalar Newton steps in the nonlinear case. Numerical experiments on six benchmarks (El-Centro earthquake displacement, the Solow capital-stock model, an fBM-driven second-order ODE, a forced Duffing oscillator, a path-dependent Arias-intensity-degraded oscillator with variable coefficients, and a noisy Kuramoto phase-oscillator system) show that the branched signature-kernel solver delivers accurate, stable predictions across all regimes.

2605.25819 2026-05-26 cs.LG cs.CR 版本更新

On Reliability of Efficient Membership Inference Vulnerability Evaluation

关于高效成员推断脆弱性评估的可靠性

Joonas Jälkö, Gauri Pradhan, Ossi Räisä, Antti Honkela

发表机构 * University of Helsinki(赫尔辛基大学) CISPA Helmholtz Center for Information Security(CISPA亥姆霍兹信息安全中心)

AI总结 本文揭示了高效成员推断攻击评估中两个关键缺陷:跨样本FPR未校准导致差分隐私审计不可靠,以及有限总体偏差导致样本脆弱性高估,并提出了后处理校准方法。

Comments 14 pages, 10 figures

详情
AI中文摘要

成员推断攻击(MIA)是通过从数据中学习的模型或统计量来经验性评估训练数据中敏感信息泄露的流行方法。MIA脆弱性通常通过二元分类器的假阳性率(FPR)和真阳性率(TPR)来评估,该分类器试图预测特定样本是否在训练数据中。然而,为了可靠估计TPR,尤其是对于低FPR值,需要大量观测,这在MIA中意味着许多目标模型,导致巨大的计算成本。为避免过高的计算需求,MIA分数通常跨多个个体和多个目标模型进行平均。我们展示了这种高效MIA评估流程中的两个关键弱点。首先,我们表明基于跨多个个体拼接的MIA分数评估TPR(常用于研究极低FPR机制下的脆弱性)在跨样本FPR上未校准。这使得它作为差分隐私审计工具不可靠。为解决此问题,我们提出了一种后处理方法,以有效校准不同样本的FPR。其次,我们识别了Carlini等人2022年提出的常用高效似然比攻击(LiRA)实现中的有限总体偏差,导致样本脆弱性的正向偏差。

英文摘要

Membership inference attacks (MIAs) are popular methods for empirically assessing the leakage of sensitive information in the training data through models or statistics learned from the data. MIA vulnerability is often evaluated through the false positive rate (FPR) and true positive rate (TPR) of a binary classifier that tries to predict whether a particular sample was in the training data. However, reliably estimating the TPR, especially at low FPR values, requires many observations, which in the case of MIA translates to many target models and hence a large computational cost. To avoid excessive compute requirements, the MIA scores are often averaged over multiple individuals and multiple targeted models. We demonstrate two key weaknesses in this efficient MIA evaluation pipeline. First, we show that evaluating the TPR based on MIA scores concatenated across multiple individuals, commonly used to study vulnerabilities in the very low FPR regime, is not calibrated across the per-sample FPRs. This makes it unreliable as a tool for auditing differential privacy. To solve this, we propose a post-processing method to effectively calibrate the FPR across different samples. Second, we identify a finite population bias in the commonly used efficient likelihood-ratio attack (LiRA) implementation proposed by Carlini et al. (2022), leading to a positive bias in the per-sample vulnerability.
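For readers unfamiliar with the metric, here is a minimal sketch of the standard "TPR at a fixed FPR" computation on pooled attack scores. It is not the authors' calibration method; the function name and the strict-inequality decision rule are assumptions for illustration.

```python
from typing import Sequence


def tpr_at_fpr(member_scores: Sequence[float],
               nonmember_scores: Sequence[float],
               target_fpr: float) -> float:
    """TPR of the rule `score > threshold`, where the threshold is chosen from
    the non-member scores so that the empirical FPR is at most target_fpr."""
    ranked = sorted(nonmember_scores, reverse=True)
    # Number of non-members we are allowed to flag as members.
    allowed_fp = int(target_fpr * len(ranked))
    # Only scores strictly above the (allowed_fp + 1)-th largest non-member
    # score are flagged, so at most allowed_fp non-members exceed it.
    threshold = ranked[allowed_fp] if allowed_fp < len(ranked) else float("-inf")
    return sum(s > threshold for s in member_scores) / len(member_scores)
```

Because a single threshold is applied to scores pooled from many individuals, nothing in this computation guarantees that each individual sample is flagged at the same false positive rate, which is the calibration gap the abstract describes.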

2605.25811 2026-05-26 stat.ME cs.LG stat.ML 版本更新

Geometry Adaptive Counterfactual Distribution Learning with Diffusion-Guided Smoothing

几何自适应反事实分布学习与扩散引导平滑

Kwangho Kim

发表机构 * Department of Statistics, Korea University(高丽大学统计系)

AI总结 针对高维反事实分布学习,提出两种基于扩散引导的几何自适应平滑估计器,通过有效维度降低误差,并在CelebA实验验证。

详情
AI中文摘要

我们研究了高维结果的反事实分布学习,其反事实律可能集中在低维结构附近。标准各向同性平滑对所有环境方向一视同仁,导致不利的缩放和不稳定的局部推断。我们提出了两种基于半参数去偏的扩散引导估计器:用于反事实密度的扩散知情平滑和用于反事实得分的扩散知情得分平滑。这些估计器将因果干扰调整与由扩散得分信息驱动的几何自适应定位相结合,在去除一阶干扰偏差的同时使平滑与局部结果几何对齐。我们建立了平滑密度和基于得分目标的渐近展开、风险界限和推断程序,并在额外近似条件下获得了环境密度推断。在结构几何条件下,主导随机误差由扩散引导核诱导的有效维度控制,而非环境维度。基于CelebA的半合成实验显示几何自适应方法的误差衰减更陡峭,支持了所提出的有效维度理论。

英文摘要

We study counterfactual distribution learning for high-dimensional outcomes whose counterfactual law may concentrate near lower-dimensional structure. Standard isotropic smoothing treats all ambient directions equally, leading to unfavorable scaling and unstable local inference. We propose two diffusion-guided estimators based on semiparametric debiasing: diffusion-informed smoothing for counterfactual densities and diffusion-informed score smoothing for counterfactual scores. The estimators combine causal nuisance adjustment with geometry-adaptive localization driven by diffusion score information, removing first-order nuisance bias while aligning smoothing with local outcome geometry. We establish asymptotic expansions, risk bounds, and inference procedures for smoothed density and score-based targets, with ambient density inference obtained under additional approximation conditions. Under structural geometry conditions, the leading stochastic error is governed by an effective dimension induced by the diffusion-guided kernel, rather than by the ambient dimension. Semi-synthetic experiments based on CelebA show steeper error decay for geometry-adaptive methods, supporting the proposed effective-dimension theory.

2605.25789 2026-05-26 cs.LG cs.AI cs.IT math.IT stat.ML 版本更新

On the Benefits of Free Exploration for Regret Minimization in Multi-Armed Bandits

关于自由探索对多臂老虎机遗憾最小化的益处

Yunlong Hou, Zixin Zhong, Vincent Y. F. Tan

发表机构 * Department of Mathematics, National University of Singapore(新加坡国立大学数学系) Department of Mathematics, Department of Electrical and Computer Engineering, National University of Singapore(新加坡国立大学数学系、电子与计算机工程系)

AI总结 本文研究在初始自由探索阶段后最小化累积遗憾的多臂老虎机问题,提出一种两阶段算法UFE-KLUCB-H,并证明其相比无自由探索的策略能严格减少遗憾。

Comments 55 pages

详情
AI中文摘要

我们研究了一个随机多臂老虎机问题,其中智能体在遗憾累积之前被授予一个自由探索预算,这是经典遗憾最小化或纯探索范式未涵盖的设置。目标是设计一个自适应策略,在初始自由探索阶段策略性地探索老虎机实例,并在后续阶段最小化累积遗憾。我们形式化了这个带有自由探索的遗憾最小化问题,并识别出一个有趣的区间,其中自由探索预算与时间范围成对数比例。为了量化由于自由探索阶段的可用性而高概率节省的遗憾量,我们引入了一类新的策略,称为$(α,β)$-可能节省策略。我们提出了一种两阶段、可能节省的算法UFE-KLUCB-H,它由一个原则性的自由探索策略UFE和一个历史感知的遗憾最小化策略KLUCB-H组成。推导了UFE-KLUCB-H的实例相关上界,表明UFE-KLUCB-H累积的遗憾严格少于无法访问自由探索阶段的策略。作为补充,我们基于针对自由探索环境定制的多实例扰动论证推导了实例相关下界,证明了UFE-KLUCB-H对于二值老虎机的近乎最优性。我们的上界和下界揭示了累积遗憾中依赖于可用自由探索量的尖锐相变。进行了仿真,表明算法中的强制探索和自适应性导致了更大的遗憾节省。

英文摘要

We study a stochastic multi-armed bandit problem where an agent is granted a free exploration budget before regret accumulates, a setting not captured by the classic regret minimization or pure exploration paradigms. The goal is to design an adaptive policy that strategically explores the bandit instance in the initial free exploration phase and minimizes the cumulative regret in the subsequent phase. We formalize this regret minimization with free exploration problem and identify an interesting regime where the free exploration budget scales logarithmically with the time horizon. To quantify the amount of regret saved with high probability as a result of the availability of the free exploration phase, we introduce a novel set of policies known as $(α,β)$-probably saving policies. We propose a two-phase, probably saving algorithm, UFE-KLUCB-H, which consists of a principled free exploration policy, UFE, and a history-aware regret minimization policy KLUCB-H. Instance-dependent upper bounds on UFE-KLUCB-H are derived, showing that UFE-KLUCB-H accumulates strictly less regret than policies that do not have access to a free exploration phase. Complementarily, we derive instance-dependent lower bounds based on novel multi-instance perturbation arguments tailored to the free-exploration setting, demonstrating the near-optimality of UFE-KLUCB-H for two-valued bandits. Our upper and lower bounds reveal sharp phase transitions in the accumulated regret depending on the amount of available free exploration. Simulations are conducted to demonstrate that forced exploration and adaptivity in the algorithm lead to greater regret savings.
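As background for the KL-UCB family that KLUCB-H belongs to, the sketch below computes the standard KL-UCB index for a Bernoulli arm by bisection. It is not the paper's history-aware variant; the plain `log(t)` exploration budget and the function names are assumptions. One simple reading of "history-aware" is that samples collected during the free exploration phase are counted in `mean` and `pulls`, which is also an assumption here.

```python
import math


def bernoulli_kl(p: float, q: float) -> float:
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped for stability."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def klucb_index(mean: float, pulls: int, t: int, iters: int = 50) -> float:
    """Largest q in [mean, 1] with pulls * KL(mean, q) <= log(t), via bisection."""
    budget = math.log(t) / pulls
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid   # mid is still within the confidence region
        else:
            hi = mid   # mid is too optimistic
    return lo
```

The index shrinks toward the empirical mean as `pulls` grows, so an arm that was sampled heavily at no cost during free exploration starts the regret phase with a tighter upper confidence bound.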

2605.25786 2026-05-26 cs.LG cs.AI 版本更新

NPSolver: Neural Poisson Solver with Iterative Physics Supervision

NPSolver: 具有迭代物理监督的神经泊松求解器

Bocheng Zeng, Rui Zhang, Runze Mao, Mengtao Yan, Xuan Bai, Yang Liu, Zhi X. Chen, Hao Sun

发表机构 * Gaoling School of Artificial Intelligence(高瓴人工智能学院) Renmin University of China(中国人民大学) School of Mechanics and Engineering Science(力学与工程科学学院) Peking University(北京大学) AI for Science Institute(AI for Science研究院) University of Chinese Academy of Sciences(中国科学院大学)

AI总结 提出NPSolver,通过迭代物理监督(利用少量PCG步骤)训练无标签的神经泊松求解器,并引入边界感知Transolver架构,在2D/3D不规则几何上优于物理信息和数据驱动基线。

Comments kdd 2026

详情
AI中文摘要

在复杂不规则域上高效求解泊松方程仍然是科学计算中的一个基本挑战,因为经典迭代求解器常常因病态系统而面临过长的运行时间。虽然神经算子提供了一种快速的替代方案,但它们通常依赖大规模标记数据集,或者在使用物理信息残差损失时难以处理不稳定的训练动态。我们提出NPSolver,一种通过迭代物理监督训练的无标签神经泊松求解器。NPSolver不依赖完全收敛的数值解或原始PDE残差,而是利用少量预处理共轭梯度(PCG)步骤来优化自身预测,从而提供更稳定且尺度良好的训练信号。理论分析证实,这种迭代监督充当了良态误差代理,并且停止梯度设计对于优化稳定性至关重要。为了更好地捕捉混合边界条件下的边界驱动特征,我们进一步引入了边界感知Transolver(BA-Transolver)架构,该架构明确分离了内部和边界令牌化。在2D和3D不规则几何上的广泛评估表明,NPSolver优于物理信息和数据驱动基线。此外,一个下游热控制任务突出了该模型进行高效可靠的基于梯度的边界控制的能力。我们将在 https://github.com/intell-sci-comput/NPSolver 发布我们的代码和数据。

英文摘要

Efficiently solving Poisson equations on complex, irregular domains remains a fundamental challenge in scientific computing, as classical iterative solvers often suffer from prohibitive runtime due to ill-conditioned systems. While neural operators offer a fast alternative, they typically rely on large-scale labeled datasets or struggle with unstable training dynamics when using physics-informed residual losses. We propose NPSolver, a neural Poisson solver trained without solution labels via iterative physics supervision. Instead of relying on fully converged numerical solutions or raw PDE residuals, NPSolver utilizes a small number of preconditioned conjugate gradient (PCG) steps to refine its own predictions, providing a more stable and well-scaled training signal. Theoretical analysis confirms that this iterative supervision serves as a well-conditioned error proxy and that a stop-gradient design is essential for optimization stability. To better capture boundary-driven features under mixed boundary conditions, we further introduce the Boundary-Aware Transolver (BA-Transolver) architecture that explicitly separates interior and boundary tokenization. Extensive evaluations on 2D and 3D irregular geometries demonstrate that NPSolver outperforms both physics-informed and data-driven baselines. Furthermore, a downstream thermal control task highlights the model's capability for conducting efficient and reliable gradient-based boundary control. We will release our code and data at https://github.com/intell-sci-comput/NPSolver.
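The supervision signal described above can be sketched on a toy problem. The following is an illustrative reconstruction, not the released NPSolver code: it uses a 1D Dirichlet Laplacian with a Jacobi preconditioner, and the names `pcg_refine` and `supervision_loss` are hypothetical. The refined target is treated as a constant, which mirrors the stop-gradient design mentioned in the abstract.

```python
from typing import List

Vector = List[float]


def apply_laplacian_1d(u: Vector) -> Vector:
    """Matrix-free product A u for the 1D Dirichlet stencil (-1, 2, -1)."""
    n = len(u)
    return [2.0 * u[i]
            - (u[i - 1] if i > 0 else 0.0)
            - (u[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def pcg_refine(u0: Vector, b: Vector, steps: int) -> Vector:
    """Run `steps` Jacobi-preconditioned CG iterations on A u = b from u0."""
    u = list(u0)
    r = [bi - ai for bi, ai in zip(b, apply_laplacian_1d(u))]
    z = [ri / 2.0 for ri in r]          # Jacobi preconditioner: diag(A) = 2
    p = list(z)
    rz = dot(r, z)
    for _ in range(steps):
        if rz == 0.0:                   # residual is exactly zero, nothing to refine
            break
        Ap = apply_laplacian_1d(p)
        alpha = rz / dot(p, Ap)
        u = [ui + alpha * pi for ui, pi in zip(u, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [ri / 2.0 for ri in r]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return u


def supervision_loss(prediction: Vector, b: Vector, steps: int = 2) -> float:
    """Mean squared distance between a prediction and its PCG-refined copy.
    The refined copy is a fixed target (no gradient would flow through it)."""
    target = pcg_refine(prediction, b, steps)
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(prediction)
```

In a training loop, `prediction` would be the network output and the loss would pull it toward a slightly better solution at each update, so no fully converged reference solution is needed.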

2605.25771 2026-05-26 cs.LG cs.AI 版本更新

MDGMIX: Boundary-Aware Subgraph Mixing for Multi-Domain Graph Pre-Training

MDGMIX: 边界感知的子图混合用于多域图预训练

Ziyu Zheng, Yaming Yang, Ziyu Guan, Wei Zhao, Xinyan Huang

发表机构 * School of Computer Science and Technology, Xidian University, Xi’an, China(西安电子科技大学计算机科学与技术学院) School of Artificial Intelligence, Xidian University, Xi’an, China(西安电子科技大学人工智能学院)

AI总结 针对多域图预训练中的数据冗余问题,提出MDGMIX框架,通过边界感知子图混合与层次判别学习解耦共享和域特定模式,并在适配时使用轻量级提示加权机制,在少样本分类任务中优于强基线且效率更高。

Comments Accepted by ICML 2026

详情
AI中文摘要

多域图预训练是构建具有跨域泛化能力的基础图模型的关键步骤。然而,现有方法主要依赖联合训练所有源域图,导致计算成本高。此外,尚不清楚所有源域图数据是否对有效迁移有同等贡献。本文通过实验揭示了多域图预训练中存在显著的数据冗余。基于这一发现,我们提出了多域图预训练框架MDGMIX,该框架将边界感知的子图混合与层次判别相结合。通过选择边界节点构建具有挑战性的混合域子图,MDGMIX利用粗粒度域判别和细粒度域分解损失来解耦共享模式与域特定模式。在适配过程中,MDGMIX采用轻量级提示加权机制来迁移源域知识。大量实验表明,MDGMIX在少样本分类任务中持续优于强基线,同时表现出优越的时间和内存效率。代码可在 https://github.com/zhengziyu77/MDGMIX 获取。

英文摘要

Multi-domain graph pre-training is a crucial step in constructing foundational graph models with cross-domain generalization capabilities. However, existing methods predominantly rely on jointly training all source domain graphs, resulting in high computational costs. Furthermore, it remains unclear whether all source domain graph data contribute equally to effective transfer. This paper empirically reveals significant data redundancy in multi-domain graph pre-training. Based on this finding, we propose the Multi-domain Graph Pre-training Framework, MDGMIX, which combines boundary-aware subgraph mixing with hierarchical discrimination. By selecting boundary nodes to construct challenging mixed-domain subgraphs, MDGMIX employs coarse-grained domain discrimination and fine-grained domain decomposition losses to decouple shared patterns from domain-specific patterns. During adaptation, MDGMIX employs a lightweight prompt weighting mechanism to transfer source domain knowledge. Extensive experiments demonstrate that MDGMIX consistently outperforms strong baselines in few-shot classification tasks while exhibiting superior time and memory efficiency. The code is available at: https://github.com/zhengziyu77/MDGMIX.

2605.25765 2026-05-26 cs.CV cs.AI cs.LG 版本更新

Concept Unlearning via Cross-Attention Activation Projection for Diffusion Models

通过交叉注意力激活投影实现扩散模型的概念遗忘

Saemi Moon, Suhyeon Jun, Seoyeon Lee, Dongwoo Kim

发表机构 * CSE, POSTECH(浦项工科大学计算机科学与工程系) GSAI, POSTECH(浦项工科大学人工智能研究生院)

AI总结 提出PURE方法,利用交叉注意力激活空间构建遗忘和保留基,通过线性投影编辑权重,在保持保留概念的同时有效消除目标概念。

详情
AI中文摘要

概念遗忘旨在从预训练的文本到图像扩散模型中擦除目标概念,而无需重新训练。闭式方法在此设置中具有吸引力,因为它们对交叉注意力权重应用单一确定性编辑,并且不增加推理时间成本。然而,现有的闭式方法通过文本编码器对少数命名目标概念的简短锚定提示的响应来表示目标概念,而唤起该概念但不一致命名的释义提示可以绕过编辑。我们认为,目标应该改为在交叉注意力激活空间中表示。文本嵌入描述用户的提示,而交叉注意力激活描述模型即将渲染的内容,后者泛化到锚定模板未覆盖的释义。基于这一观察,我们提出了PURE(U-Net渲染中的投影用于擦除),这是一种闭式方法,从沿短去噪轨迹捕获的逐层交叉注意力激活构建遗忘和保留基,并将单个线性投影器应用于交叉注意力键和值权重。在最近涵盖艺术风格、知识产权、名人和NSFW类别中十个概念的整体概念遗忘基准上,PURE显著减少了在释义和对抗性提示下的目标泄露,同时将保留概念保持接近未编辑模型,在评估方法中实现了最佳的总体遗忘-保留权衡。

英文摘要

Concept unlearning aims to erase a target concept from a pretrained text-to-image diffusion model without retraining. Closed-form methods are attractive in this setting because they apply a single deterministic edit to the cross-attention weights and add no inference-time cost. Existing closed-form methods, however, represent the target concept through the text encoder's response to a few short anchor prompts that name it, and paraphrased prompts that evoke the concept without naming it consistently bypass the edit. We argue that the target should instead be represented in the cross-attention activation space. Text embeddings describe the user's prompt, while cross-attention activations describe what the model is about to render, and the latter generalize to paraphrases that the anchor templates do not cover. Building on this observation, we propose PURE (Projection in U-Net Rendering for Erasure), a closed-form method that builds the forget and retain bases from per-layer cross-attention activations captured along a short denoising trajectory and applies a single linear projector to the cross-attention key and value weights. On a recent holistic concept-unlearning benchmark covering ten concepts across artistic style, intellectual property, celebrity, and NSFW categories, PURE significantly reduces target leakage under paraphrased and adversarial prompts while preserving retain concepts close to the unedited model, yielding the best overall forget-retain trade-off among evaluated methods.
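The core linear-algebra step of a projection-based weight edit can be sketched as follows. This is a simplified illustration rather than the PURE method itself: it assumes an orthonormal forget basis living in the output (activation) space of a key or value weight matrix, ignores the retain basis entirely, and the `project_out` name is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def project_out(weight: Matrix, forget_basis: List[List[float]]) -> Matrix:
    """Return (I - U U^T) W, where the rows of forget_basis are assumed to be
    orthonormal vectors in the output space of W (shape: out x in)."""
    n_out, n_in = len(weight), len(weight[0])
    edited = [row[:] for row in weight]
    for u in forget_basis:
        # coeff = u^T W: how strongly each input column writes along u.
        coeff = [sum(u[i] * weight[i][j] for i in range(n_out))
                 for j in range(n_in)]
        # Subtract the rank-one component u (u^T W) from the weights.
        for i in range(n_out):
            for j in range(n_in):
                edited[i][j] -= u[i] * coeff[j]
    return edited
```

After this edit, no input can produce an activation with a component along the removed directions, which is why such an edit is deterministic and adds no inference-time cost. How the forget and retain bases are balanced against each other is specific to the paper and is not modeled here.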

2605.25750 2026-05-26 cs.LG 版本更新

Invariant-Based Weight Sharing for Message Passing

基于不变量的消息传递权重共享

Florian Seiffarth

发表机构 * University of Bonn(波恩大学) Lamarr Institute for Machine Learning and Artificial Intelligence(拉马尔人工智能与机器学习研究院)

AI总结 提出一种基于图不变量的权重共享原则,通过直接根据图不变量索引权重,增强消息传递神经网络的结构感知能力,并在合成与真实数据上取得优于标准MPNN的效果。

Comments 13 pages main paper + 30 pages references and appendix

详情
AI中文摘要

消息传递神经网络(MPNN)是学习图结构域表示的一个强大框架。然而,MPNN中的权重仅作用于特征,限制了其捕捉结构模式的能力。我们引入了一种新颖的结构感知权重共享原则,该原则明确地融入了图结构固有的信息。权重由用户选择的图不变量(即在节点置换下保持不变的函数)直接索引,从而能够在结构等价的子图之间进行系统性的权重复用。我们提出了ShareGNNs,该类模型在一个简单的编码器-解码器架构中实例化了这一原则,产生了一个具有可学习邻接矩阵和类似Transformer连接性的MPNN。我们证明,其表达能力至少与所选不变量的区分能力相当,从而提供了对模型复杂度的显式控制。在合成数据和真实数据以及子图计数任务上的实验表明,与标准MPNN相比,该方法具有一致的改进,具有超越1-WL测试的有竞争力的表达能力,并且可扩展到大型数据集。

英文摘要

Message-passing neural networks (MPNNs) are a powerful framework for learning representations of graph-structured domains. However, weights in MPNNs act on features only, limiting their ability to capture structural patterns. We introduce a novel structure-aware weight sharing principle that explicitly incorporates information inherent to the graph structure. Weights are indexed directly by user-chosen graph invariants, i.e., functions preserved under node permutations, enabling systematic reuse across structurally equivalent subgraphs. We present ShareGNNs, which instantiate this principle within a simple encoder-decoder architecture, resulting in an MPNN with learnable adjacency and transformer-like connectivity. We show that their expressivity is at least as strong as the discriminative power of the chosen invariants, providing explicit control over the model complexity. Experiments on synthetic and real-world data, as well as subgraph counting tasks, demonstrate consistent improvements over standard MPNNs, competitive expressivity beyond the 1-WL test, and scalability to large datasets.
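
The weight-sharing principle described above can be sketched in a few lines. This is only an illustration under stated assumptions: node degree stands in for the "user-chosen graph invariant", each edge weight is a scalar, and the names `degree_invariant`, `invariant_indexed_layer`, and `weight_table` are hypothetical and not taken from ShareGNNs.

```python
import numpy as np

def degree_invariant(adj):
    """Node degree: a simple permutation-invariant node label (an assumed choice)."""
    return adj.sum(axis=1).astype(int)

def invariant_indexed_layer(adj, x, weight_table):
    """One message-passing step whose edge weights are looked up by invariants.

    weight_table[a, b] is shared by every edge whose receiver has invariant
    value a and whose sender has invariant value b (hypothetical layout).
    """
    inv = degree_invariant(adj)
    # Same invariant pair -> same learnable parameter, masked to existing edges.
    w = weight_table[inv[:, None], inv[None, :]] * adj
    return w @ x

# Path graph 0-1-2: the two end nodes are structurally equivalent.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
x = np.array([[1.0], [2.0], [1.0]])
rng = np.random.default_rng(0)
weight_table = rng.normal(size=(3, 3))  # degrees range over 0..2

out = invariant_indexed_layer(adj, x, weight_table)
```

Because the lookup depends only on invariants, relabeling the nodes permutes the output rows in the same way, and the two end nodes of the path receive identical updates.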

2605.25749 2026-05-26 cs.IR cs.AI cs.LG 版本更新

DeGRe: Dense-supervised Generative Reranking for Recommendation

DeGRe: 密集监督的生成式重排序用于推荐

Chaotian Song, Jingyao Zhang, Chenghao Chen, Zisen Sang, Dehai Zhao, Guodong Cao, Boxi Wu, Deng Cai, Jia Jia

发表机构 * College of Software, Zhejiang University, Hangzhou, China; Rajax Network Technology, Taobao Shangou of Alibaba, Hangzhou, China; Rajax Network Technology, Taobao Shangou of Alibaba, Beijing, China; State Key Lab of CAD&CG, Zhejiang University, Hangzhou, China; Rajax Network Technology, Taobao Shangou of Alibaba, Shanghai, China

AI总结 提出DeGRe框架,离线阶段由Lookahead Evaluator通过束搜索挖掘高价值序列并给出逐步价值估计,将其作为密集监督信号蒸馏到在线生成器(Online Generator),使在线推理只需单次贪婪解码,从而缓解重排序中的启发式标签偏差和信用分配问题。

Comments Accepted to KDD 2026 (ADS Track)

详情
AI中文摘要

在多阶段推荐系统中,重排序通过捕获列表内上下文依赖关系来优化整体效用,但其核心挑战在于在指数级排列空间中探索最优序列。最近的研究转向端到端生成式框架,通常利用列表级奖励或偏好对齐来指导生成器训练。然而,这些方法仍面临两个关键问题。首先是启发式标签偏差。现有方法通常基于简单规则构建训练目标,例如将点击项提升到顶部,而忽略列表上下文中的因果依赖关系。其次是信用分配问题。稀疏的列表级后验奖励无法直接指导序列生成中的中间步骤,导致优化方向模糊。为了解决这些问题,我们提出DeGRe(密集监督的生成式重排序),一种通过密集监督弥合离线探索与在线效率之间差距的生成式重排序框架。DeGRe的核心在于其离线-在线解耦设计。在离线阶段,我们引入基于累积回归的Lookahead Evaluator,利用束搜索在未曝光空间中主动挖掘高价值前瞻序列。在训练期间,我们将评估器的逐步价值估计转换为密集监督信号,并将其蒸馏到轻量级在线生成器中。这种机制使生成器能够内化前瞻规划能力,在线推理时仅需一次高效的贪婪解码即可逼近全局最优。实验表明,DeGRe在公开基准和工业数据集上优于基线模型。我们已成功将DeGRe部署到淘宝闪购中,显著提升了在线推荐效果。

英文摘要

In multi-stage recommender systems, reranking optimizes overall utility by capturing intra-list contextual dependencies, yet its central challenge lies in exploring optimal sequences within an exponentially large permutation space. Recent studies have shifted towards end-to-end generative frameworks, which typically leverage list-wise rewards or preference alignment to guide generator training. However, these methods still face two critical issues. First is the heuristic label bias. Existing methods often construct training targets based on simple rules, such as promoting clicked items to the top, while ignoring causal dependencies within the list context. Second is the credit assignment problem. Sparse list-level posterior rewards fail to directly guide intermediate steps in sequence generation, leading to ambiguous optimization directions. To address these issues, we propose DeGRe (Dense-supervised Generative Reranking), a generative reranking framework that bridges the gap between offline exploration and online efficiency through dense supervision. The core of DeGRe lies in its offline-online decoupled design. During the offline phase, we introduce a Lookahead Evaluator based on cumulative regression, which leverages beam search to actively mine high-value lookahead sequences in the unexposed space. During training, we transform the step-wise value estimations from the evaluator into dense supervision signals and distill them into a lightweight Online Generator. This mechanism enables the generator to internalize lookahead planning capabilities, requiring only a single efficient greedy decoding pass during online inference to approximate the global optimum. Experiments demonstrate that DeGRe outperforms baseline models on public benchmarks and industrial datasets. We have successfully deployed DeGRe on Taobao Flash Shopping, significantly improving online recommendations.

2605.25740 2026-05-26 cs.LG 版本更新

Latent Representation Alignment for Offline Goal-Conditioned Reinforcement Learning

离线目标条件强化学习中的潜在表示对齐

Hyungkyu Kang, Byeongchan Kim, Min-hwan Oh

发表机构 * Seoul National University(首尔国立大学)

AI总结 针对离线目标条件强化学习中价值函数错误泛化的瓶颈,提出潜在对齐价值学习(LAVL)算法,将基于潜在表示的价值泛化与分层规划集成在统一框架中,在OGBench的22个数据集中的20个上取得最优性能。

Comments Accepted in ICML 2026

详情
AI中文摘要

离线目标条件强化学习(GCRL)提供了一个从固定数据集获取目标达成策略的实用框架。然而,在长视野任务中学习可靠的目标条件价值函数仍然具有挑战性。在本文中,我们指出目标条件价值函数中的错误泛化是一个根本性瓶颈,并证明在价值函数中引入适当的归纳偏置对于解决该瓶颈至关重要。基于这些发现,我们提出了潜在对齐价值学习(LAVL),一种离线GCRL算法,它将基于潜在表示的价值泛化与分层规划集成在一个统一框架中。在OGBench上的大量实验表明,LAVL持续优于现有的离线GCRL方法,在22个数据集中的20个上取得了最高性能。值得注意的是,LAVL在长视野任务和轨迹拼接数据集上表现出强大的性能,而先前的方法在这些任务上性能显著下降。我们的代码可在https://github.com/oh-lab/LAVL.git获取。

英文摘要

Offline goal-conditioned reinforcement learning (GCRL) provides a practical framework for obtaining goal-reaching policies from fixed datasets. However, learning a reliable goal-conditioned value function in long-horizon tasks remains challenging. In this paper, we identify erroneous generalization in goal-conditioned value functions as a fundamental bottleneck, and demonstrate that appropriate inductive bias in the value function is crucial for addressing the bottleneck. Building on these findings, we propose Latent-Aligned Value Learning (LAVL), an offline GCRL algorithm that integrates latent-representation-based value generalization with hierarchical planning in a unified framework. Extensive experiments on OGBench demonstrate that LAVL consistently outperforms existing offline GCRL methods, achieving the highest performance on 20 out of 22 datasets. Notably, LAVL exhibits strong performance in long-horizon tasks and trajectory stitching datasets, where prior methods suffer significant performance degradation. Our code is available at https://github.com/oh-lab/LAVL.git.

2605.25739 2026-05-26 cs.LG cs.GT stat.ML 版本更新

The Behavioral Credibility Trilemma: When Calibrated Autonomy Becomes Impossible

行为可信度三难困境:当校准自主性变得不可能

Lauri Lovén, Nam Do, Hassan Mehmood, Dinesh Kumar Sah, Sasu Tarkoma

发表机构 * Future Computing Group University of Oulu(奥卢大学未来计算组) Department of Computer Science University of Helsinki(赫尔辛基大学计算机科学系)

AI总结 本文证明,在理性监督下,当某些任务超出智能体的可靠能力时,任何具有置信门控自主性的强化学习策略都无法同时实现最大帮助性、最优校准和完全自主性,即行为可信度三难困境。

Comments 48 pages, 3 figures

详情
AI中文摘要

我们证明,在理性监督下,当某些任务超出智能体的可靠能力时,任何具有置信门控自主性的强化学习策略都无法同时实现最大帮助性、最优校准和完全自主性:即行为可信度三难困境。这种不可能性是几何性的——向严格适当的评分规则添加任何非仿射自主性激励都会破坏严格适当性,因此,同时因校准置信度和自主行动而获得奖励的智能体,会在低于委托人批准阈值的任务上系统性地夸大其报告的置信度。行为扰动引理量化了这种膨胀(对于Brier分数,缩放比例为 $w_A/(2 w_C)$),并表明检测需要 $Ω(1/Δ^2)$ 次观测。我们证明委托人的最优监督规则必然是非仿射的,这使得不可能性是无条件的,并且在对数凹密度策略族中与优化器无关。我们形式化了置信门控决策问题,将现有方法映射到三难困境上,并确定了两种建设性的解决路径(承诺、领域分离)。一个540配置的Best-of-N实验测试了五个预注册假设,所有假设均得到强烈证实(效应量 $d = 1.10$ 至 $5.32$),并增加了对可达 $(H, C, A)$ 曲面几何的描述性分析,显示了一个与预测的膨胀饱和一致的平台截断前沿。

英文摘要

We prove that no reinforcement learning policy with confidence-gated autonomy can simultaneously achieve maximum helpfulness, optimal calibration, and full autonomy under rational oversight, whenever some tasks exceed the agent's reliable competence: the Behavioral Credibility Trilemma. The impossibility is geometric -- adding any non-affine autonomy incentive to a strictly proper scoring rule destroys strict properness, so an agent rewarded for both calibrated confidence and autonomous action systematically inflates its reported confidence on tasks below the principal's approval threshold. The Behavioral Perturbation Lemma quantifies the inflation (scaling as $w_A/(2 w_C)$ for the Brier score) and shows detection requires $Ω(1/Δ^2)$ observations. We prove the principal's optimal oversight rule is necessarily non-affine, making the impossibility unconditional and optimizer-independent across log-concave-density policy families. We formalize the Confidence-Gated Decision Problem, map existing methods onto the trilemma, and identify two constructive resolution pathways (commitment, domain separation). A 540-configuration Best-of-N experiment tests five pre-registered hypotheses, all strongly confirmed (effect sizes $d = 1.10$ to $5.32$), and adds a descriptive analysis of the achievable-$(H, C, A)$ surface geometry showing a plateau-truncated frontier consistent with the predicted inflation saturation.
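
The $w_A/(2 w_C)$ inflation quoted above can be reproduced in a toy setting. The sketch below assumes the simplest possible case, which may differ from the paper's formal setup: a binary outcome with true success rate $p$, a Brier penalty weighted by `w_c`, and an autonomy bonus that is locally linear in the reported confidence $q$ with weight `w_a`. Setting the derivative $-2 w_C (q - p) + w_A$ to zero gives $q^\star = p + w_A/(2 w_C)$, and the grid search checks this numerically.

```python
import numpy as np

def expected_reward(q, p, w_c, w_a):
    """Expected reward for reporting confidence q when the true success rate is p.

    Brier term: -w_c * E[(q - y)^2] with y ~ Bernoulli(p).
    Autonomy term: +w_a * q, a locally linear stand-in (an assumption here).
    """
    brier = p * (1.0 - q) ** 2 + (1.0 - p) * q ** 2
    return -w_c * brier + w_a * q

p, w_c, w_a = 0.6, 1.0, 0.2
grid = np.linspace(0.0, 1.0, 100001)
q_star = grid[np.argmax(expected_reward(grid, p, w_c, w_a))]
inflation = q_star - p  # first-order condition predicts w_a / (2 * w_c)
```

With `w_a = 0` the maximizer returns to `q = p`, which is the strict properness of the Brier score that the added incentive breaks.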

2605.25717 2026-05-26 cs.AI cs.CE cs.LG 版本更新

FLOATBench: A Dataset and Benchmark for Floating Offshore Wind Turbine Tower Fatigue

FLOATBench:浮式海上风力发电机塔架疲劳数据集与基准

João Alves Ribeiro, Bruno Alves Ribeiro, Francisco Pimenta, Sérgio M. O. Tavares, Faez Ahmed

发表机构 * Department of Mechanical Engineering(机械工程系) Massachusetts Institute of Technology(麻省理工学院) School of Engineering(工程学院) Brown University(布朗大学) CONSTRUCT, Faculty of Engineering University of Porto(CONSTRUCT,工程学院,葡萄牙波尔图大学) University of Aveiro(阿维罗大学)

AI总结 提出FLOATBench,一个包含582,120个疲劳损伤标签的表格基准,基于22 MW浮式风机塔架的高保真仿真,并引入工况感知的评估协议以检测随机划分无法发现的性能排名变化。

详情
AI中文摘要

全球大部分海上风能资源位于水深过大、无法使用固定式基础的海域,因此浮式海上风力发电机(FOWT)对于深水部署至关重要。随着行业向22 MW级设计规模发展,塔架疲劳变得愈发关键,因为更大的结构会放大由持续风浪激励引起的耦合气动-水动-伺服-弹性载荷。准确的疲劳损伤预测对于认证、设计优化和成本降低至关重要。然而,该领域缺乏共享的替代模型基准:不同研究报告了不同的仿真、划分和指标,使得方法难以比较。我们提出FLOATBench,一个公开的表格基准,包含三种22 MW FOWT塔架几何形状的582,120个逐截面疲劳损伤标签,这些标签来自三种塔架的19,404次高保真OpenFAST仿真(每种塔架6,468次:1,078个对齐风浪工况点×六个湍流种子),每种塔架在30个截面上进行标注。FLOATBench包括一个基于工况感知的联合风浪运行包络的alpha-shape划分,将测试点分为训练内、插值和外推区域。它配备了一个可复现的评估框架,涵盖三个协议级别:随机验证(E1)、塔内工况感知评估(E2)和跨塔迁移(E3)。工况感知协议揭示了全局性能与外推性能之间的排名变化,而随机划分排行榜无法检测到这些变化。据作者所知,FLOATBench是首个用于表格替代建模的FOWT疲劳基准,并提供了一个可推广到定义在物理运行包络上的工程替代模型的评估协议。数据集和代码可在以下网址获取:https://github.com/Joao97ribeiro/FLOATBench。

英文摘要

Most of the world's offshore wind resource lies in waters too deep for fixed-bottom foundations, making floating offshore wind turbines (FOWTs) essential for deep-water deployment. As the industry scales toward $22$ MW class designs, tower fatigue becomes increasingly critical because larger structures amplify the coupled aero-hydro-servo-elastic loads induced by continuous wind and wave excitation. Accurate fatigue-damage prediction is therefore central to certification, design optimization, and cost reduction. Yet the field lacks a shared surrogate benchmark: studies report different simulations, splits, and metrics, making methods difficult to compare. We present FLOATBench, a public tabular benchmark with $582{,}120$ per-section fatigue-damage labels across three $22$ MW FOWT tower geometries, derived from $19{,}404$ high-fidelity OpenFAST simulations across the three towers ($6{,}468$ per tower: $1{,}078$ aligned wind/wave operating points $\times$ six turbulence seeds), labeled at $30$ cross-sections per tower. FLOATBench includes a regime-aware alpha-shape partition of the joint wind/wave operating envelope, stratifying test points into in-train, interpolation, and extrapolation regimes. It is paired with a reproducible evaluation harness covering three protocol levels: random validation (E1), within-tower regime-aware evaluation (E2), and cross-tower transfer (E3). The regime-aware protocol reveals rank shifts between global and extrapolation performance that random-split leaderboards cannot detect. To the authors' knowledge, FLOATBench is the first FOWT fatigue benchmark for tabular surrogate modeling, and offers an evaluation protocol that generalizes to engineering surrogates defined over physical operating envelopes. Dataset and code available at: https://github.com/Joao97ribeiro/FLOATBench.
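
The dataset counts quoted above are mutually consistent, which a short check makes explicit. The variable names are illustrative only; the numbers are those stated in the abstract.

```python
operating_points = 1_078   # aligned wind/wave operating points per tower
turbulence_seeds = 6
towers = 3
sections_per_tower = 30    # labeled cross-sections per tower

sims_per_tower = operating_points * turbulence_seeds   # 6,468
total_sims = sims_per_tower * towers                    # 19,404
total_labels = total_sims * sections_per_tower          # 582,120
print(sims_per_tower, total_sims, total_labels)
```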

2605.25710 2026-05-26 physics.chem-ph cond-mat.mtrl-sci cs.LG physics.comp-ph 版本更新

Machine Learning Multiscale Interactions

机器学习多尺度相互作用

Àlex Solé, Sergio Suárez-Dou, Albert Mosella-Montoro, Silvia Gómez-Coca, Eliseo Ruiz, Alexandre Tkatchenko, Javier Ruiz-Hidalgo

发表机构 * Image Processing Group – Signal Theory and Communications Department(图像处理组——信号理论与通信系) Inorganic and Organic Chemistry Department and Institute of Theoretical and Computational Chemistry(无机和有机化学系及理论与计算化学研究所) Department of Physics and Materials Science(物理与材料科学系)

AI总结 提出多尺度结构集成(MuSE)层次模型,通过软粗粒化池化构建多尺度表示,与多种机器学习力场耦合,准确捕获跨尺度的量子力学相互作用。

详情
AI中文摘要

现实物理系统的特征在于跨多个长度和时间尺度的涌现相互作用,这对预测性机器学习模型构成了重大挑战。大多数科学机器学习模型关注于狭窄的相互作用范围。虽然机器学习力场提供了接近量子精度的准确性,但普遍的消息传递层缺失了长程多体效应。在此,我们引入多尺度结构集成(MuSE),一种层次模型,它使用软粗粒化池化从原子到粗节点的平滑分数分配构建粗粒表示,使机器学习力场模块能够在多个尺度上运行。MuSE是架构无关的,并与SO3krates、MACE和PaiNN机器学习力场耦合,适用于分子和材料。通过基于Hessian的基准测试、生物分子的折叠轨迹以及分子-石墨烯纳米结构中的能量分布,我们展示了MuSE的强大能力——与近期其他长程机器学习模型不同,MuSE在相关尺度上准确捕获了量子力学相互作用。

英文摘要

Realistic physical systems are characterised by emergent interactions across multiple length and time scales, posing a significant challenge for predictive machine learning (ML) models. Most scientific ML models focus on a narrow range of interactions. While machine learning force fields (MLFFs) offer near-quantum accuracy, the ubiquitous message-passing layers miss long-range many-body effects. Here we introduce the Multiscale Structural Ensemble (MuSE), a hierarchical model that uses Soft Coarse-Graining Pooling to construct coarse representations from smooth fractional assignments of atoms to coarse nodes, enabling MLFF modules to operate across multiple scales. MuSE is architecture-agnostic and coupled with SO3krates, MACE, and PaiNN MLFFs for both molecules and materials. We demonstrate the power of MuSE through Hessian-based benchmarks, folding trajectories for biomolecules, and energy profiles in molecule-graphene nanostructures, where MuSE accurately captures quantum-mechanical interactions at relevant scales -- unlike other recent long-range ML models.
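
The idea of building coarse nodes from "smooth fractional assignments of atoms" can be illustrated with a generic soft-pooling sketch. It is not the MuSE implementation: the softmax assignment, the function name `soft_coarse_grain`, and the use of weighted centroids for coarse positions are all assumptions made for illustration.

```python
import numpy as np

def soft_coarse_grain(features, positions, logits):
    """Pool N atoms into M coarse nodes with smooth fractional assignments.

    logits: (N, M) unnormalized scores (hypothetical; learned in practice).
    Each atom's assignment row sums to 1, so no atom is dropped or duplicated.
    """
    z = logits - logits.max(axis=1, keepdims=True)        # numerically stable softmax
    assign = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    mass = assign.sum(axis=0)                              # soft atom count per coarse node
    coarse_features = assign.T @ features                  # (M, F) weighted sums
    coarse_positions = (assign.T @ positions) / mass[:, None]  # weighted centroids
    return assign, coarse_features, coarse_positions

rng = np.random.default_rng(0)
features = rng.normal(size=(5, 4))     # 5 atoms, 4 feature channels
positions = rng.normal(size=(5, 3))
logits = rng.normal(size=(5, 2))       # 2 coarse nodes
assign, coarse_features, coarse_positions = soft_coarse_grain(features, positions, logits)
```

Because every atom distributes a total weight of one across the coarse nodes, summed features are conserved by the pooling step.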

2605.25704 2026-05-26 cs.CL cs.LG 版本更新

PowLU: An Activation Function for Stable Pre-Training of LLMs

PowLU: 一种用于LLM稳定预训练的激活函数

Peijie Jiang, Yuqi Feng, Cunyin Peng, Qian Zhao, Jia Liu, KunLong Chen, Zhiqiang Zhang, Jun Zhou

发表机构 * Ant Group(蚂蚁集团)

AI总结 提出PowLU激活函数,通过有理幂函数实现自适应非线性,解决SwiGLU在低精度LLM训练中的数值不稳定问题,在大规模训练中取得与SwiGLU和SwiGLU-Clip相当的性能并提升可扩展性。

Comments 17 pages, 7 figures, techreport

详情
AI中文摘要

在当代大型语言模型(LLM)中,swish门控线性单元(SwiGLU)激活函数被广泛采用以调节信息流并引入非线性。对于大的正输入,SwiGLU近似于二次函数$x^2$,提供强非线性和表达能力。然而,这一特性也导致随着输入或模型规模增大时的数值不稳定性,特别是在低精度LLM训练中。主要原因是其近似二次放大,扩大了输出范围并加剧了异常值。为了解决这个问题,我们提出了一种稳定的激活函数——幂线性单元(PowLU),用于大规模LLM预训练。具体来说,PowLU采用有理幂函数实现自适应非线性,从而改善表示能力并在尖峰区域实现稳定训练。此外,我们为PowLU的几个关键性质提供了理论证明。缩放定律实验确认了性能在不同模型规模下的一致性,进一步使用Ling架构(总参数7.9B和124B)的实验结果表明,PowLU在大规模LLM训练中取得了与SwiGLU和SwiGLU-Clip相当的结果。此外,实验结果还表明PowLU有效提升了LLM大规模训练的可扩展性。

英文摘要

In contemporary large language models (LLMs), the swish-gated linear unit (SwiGLU) activation function is widely adopted to regulate the information flow and introduce non-linearity. For large positive inputs, SwiGLU approximates the quadratic function $x^2$, providing strong nonlinearity and expressive capacity. However, this property also causes numerical instability as the input or model scale increases, particularly in low-precision LLM training. The main reason is its approximate quadratic amplification, which enlarges the output range and exacerbates outliers. To address this issue, we propose a stable activation function, Power Linear Unit (PowLU), for large-scale LLM pre-training. Specifically, PowLU employs a rational power function to achieve adaptive nonlinearity, thereby improving representation ability and enabling stable training in spike regions. Moreover, we provide theoretical justification for several key properties of PowLU. Scaling law experiments confirm that the performance is consistent across model sizes, and further experimental results with the Ling architecture (7.9B and 124B total parameters) demonstrate that PowLU achieves competitive results against SwiGLU and SwiGLU-Clip in large-scale training of LLMs. In addition, the experimental results also show that PowLU effectively improves the scalability of the large-scale training of LLMs.
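
The quadratic growth of SwiGLU mentioned above is easy to see in a scalar toy case. The sketch assumes unit gate and up-projection weights, so that SwiGLU reduces to $\mathrm{swish}(x)\cdot x = x^2\,\sigma(x)$; it does not implement PowLU, whose exact form is not given in the abstract.

```python
import numpy as np

def swish(x):
    """swish(x) = x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def swiglu_scalar(x, w_gate=1.0, w_up=1.0):
    """Scalar SwiGLU: swish(w_gate * x) * (w_up * x); unit weights are an assumption."""
    return swish(w_gate * x) * (w_up * x)

xs = np.array([1.0, 4.0, 16.0, 64.0])
ratio = swiglu_scalar(xs) / xs**2            # equals sigmoid(x), which tends to 1
fp16_max = float(np.finfo(np.float16).max)   # largest finite float16 value
overflow_input = np.sqrt(fp16_max)           # about 256 in this toy case
```

In this simplified setting, inputs beyond roughly 256 already push the output past the float16 range, which gives a feel for why outliers matter in low-precision training.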

2605.25698 2026-05-26 cs.LG cs.AI 版本更新

How Should LLMs Consume High-Quality Data? Optimal Data Scheduling via Quality-Aware Functional Scaling Laws

LLM应如何消费高质量数据?通过质量感知的功能缩放定律实现最优数据调度

Zhitao Zhu, Xili Wang, Shizhe Wu, Jiawei Fu, Xiaoqing Liu

发表机构 * Peking University(北京大学) Meituan(美团)

AI总结 本文通过引入数据质量维度扩展功能缩放定律,解析求解了联合数据质量和批次大小调度问题,揭示了高质量数据的双重角色,并提出了Drop-Stable-Rampup调度策略,在15B MoE模型上相比WSD和余弦衰减分别提升平均准确率+1.70和+2.98。

详情
AI中文摘要

高质量数据在大语言模型训练中稀缺,但如何结合训练动态来调度其使用仍缺乏理论指导。我们通过引入数据质量维度扩展功能缩放定律,并以渐近闭式形式求解了联合数据质量和批次大小调度问题。该解揭示了两种区间和高质量数据的双重角色。在噪声受限区间,高质量数据应作为信号放大器:降低批次大小将更清洁的数据转换为更多信号而不放大噪声。在信号受限区间,它应作为噪声抑制器:后期放置可减少终端噪声而不牺牲信号积累。现有的课程式流程主要利用第二个角色,将更清洁的数据放在后期,但忽略了第一个角色,因为传统的衰减调度恰好在高质量数据可用时降低了更新强度。受此启发,我们为LLM中期训练提出了Drop-Stable-Rampup:在质量转换时,降低批次大小,保持稳定以积累信号,然后逐渐增加以抑制终端噪声。在一个在108B tokens上中期训练的15B混合专家模型上,Drop-Stable-Rampup相比Warmup-Stable-Decay (WSD)平均准确率提升+1.70,相比余弦衰减提升+2.98,在数学推理基准如GSM8K (+4.23)和MATH (+2.80)上增益尤其显著。

英文摘要

High-quality data is scarce in large language model (LLM) training, yet how to schedule its use jointly with training dynamics lacks theoretical guidance. We extend functional scaling laws by incorporating a data-quality dimension, and solve the joint data-quality and batch-size scheduling problem in asymptotic closed form. The solution reveals two regimes and a dual role of high-quality data. In the noise-limited regime, high-quality data should be used as a signal amplifier: lowering the batch size converts cleaner data into more signal without amplifying noise. In the signal-limited regime, it should be used as a noise suppressor: late placement reduces terminal noise without sacrificing signal accumulation. Existing curriculum-style pipelines primarily exploit the second role by placing cleaner data late, but miss the first role because conventional decay schedules reduce update intensity exactly when high-quality data becomes available. Guided by this, we propose Drop-Stable-Rampup for LLM midtraining: upon the quality transition, drop the batch size, hold it stable to accumulate signal, then ramp up to suppress terminal noise. On a 15B Mixture-of-Experts model midtrained on 108B tokens, Drop-Stable-Rampup improves average accuracy over Warmup-Stable-Decay (WSD) by +1.70 and over Cosine-decay by +2.98, with particularly large gains on mathematical reasoning benchmarks such as GSM8K (+4.23) and MATH (+2.80).
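
The three phases of Drop-Stable-Rampup can be written as a small batch-size schedule. Everything concrete here is hypothetical: the parameter names, the linear ramp, and the numbers in the example are illustrative choices, since the abstract only specifies the order "drop, hold stable, ramp up".

```python
def drop_stable_rampup(step, transition, base_bs, drop_bs,
                       stable_steps, ramp_steps, final_bs):
    """Batch size at `step`; parameter names and the linear ramp are hypothetical."""
    if step < transition:
        return base_bs                      # before the data-quality transition
    t = step - transition
    if t < stable_steps:
        return drop_bs                      # drop, then hold stable to accumulate signal
    if t < stable_steps + ramp_steps:
        frac = (t - stable_steps) / ramp_steps
        return int(round(drop_bs + frac * (final_bs - drop_bs)))  # ramp up
    return final_bs                         # large batches suppress terminal noise

schedule = [drop_stable_rampup(s, transition=100, base_bs=1024, drop_bs=256,
                               stable_steps=50, ramp_steps=100, final_bs=2048)
            for s in (99, 100, 149, 200, 250)]
```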

2605.25696 2026-05-26 cs.LG 版本更新

Evaluating passing decision-making in professional football: An enhanced MPNN approach to Receiver Selection

评估职业足球中的传球决策:一种增强的MPNN方法用于接球者选择

Gabriel Masella, Giuseppe Alessio D'Inverno, Max Goldsmith, Gianluigi Rozza

发表机构 * Department of Mathematics, Informatics and Geoscience(数学、信息学与地球科学系) University of Trieste(的里雅斯特大学) MathLab(数学实验室) International School for Advanced Studies (SISSA)(国际高等研究院(SISSA)) Royal Belgium Football Association(比利时皇家足球协会)

AI总结 提出一种图神经网络框架,通过将场上交互建模为动态图来预测最佳传球目标,在接球者选择任务上达到竞争性准确率,并能在数秒内评估超过1000次传球。

详情
AI中文摘要

足球中的决策过程以空间定位、对手压力和球员意图之间的复杂相互作用为特征。本文介绍了一种图神经网络(GNN)框架,旨在通过将场上交互建模为动态图来预测接球者选择,即最佳传球目标。每个球员被表示为一个节点,具有位置和上下文特征,而潜在的传球线形成加权边,由距离、角度和压力指标表征。我们开发并训练了一个消息传递神经网络(MPNN),使用了来自职业比赛的跟踪数据和事件数据的组合,通过基于优化版Needleman-Wunsch算法的稳健流水线进行同步。该模型在识别实际选择的接球者方面达到了竞争性准确率,并在前三建议中达到了最先进的准确率。我们的模型还提供了每个选项的可能性、威胁和创造力的量化,使表现分析师能够在数秒内评估超过1000次传球。

英文摘要

The process of decision-making in football is characterized by a complex interplay between spatial positioning, opponent pressure, and player intent. This work introduces a Graph Neural Network (GNN) framework designed to predict Receiver Selection, the optimal passing target, by modeling on-field interactions as dynamic graphs. Each player is represented as a node with positional and contextual features, while potential passing lines form weighted edges characterized by distance, angle, and pressure metrics. A Message-Passing Neural Network (MPNN) has been developed and trained using a combination of tracking data and event data from professional matches, synchronized through a robust pipeline based on an optimized version of the Needleman-Wunsch Algorithm. The model achieves competitive accuracy in identifying the actual chosen receiver and state-of-the-art accuracy within its top three suggestions. Our model further offers quantification of each option's likelihood, threat, and creativity, enabling performance analysts to evaluate over 1,000 passes in seconds.

2605.25681 2026-05-26 cs.LG cs.AI 版本更新

Don't Retrain, Just Reuse: Recovering Dual-Target Molecules from Single-Target Diffusion Models

不要重新训练,只需重用:从单目标扩散模型中恢复双目标分子

Qingyuan Zeng, Pengxiang Cai, Zixin Guan, Ziyang Chen, Anglin Liu, Lang Qin, Xinyao Lai, Jintai Chen

发表机构 * The Hong Kong University of Science and Technology (Guangzhou)(香港科技大学(广州)) Guangzhou University of Chinese Medicine(广州中医药大学)

AI总结 提出REUSE框架,通过层次化进化输入空间搜索,从冻结的单目标扩散模型中恢复双目标分子,无需重新训练或修改扩散过程,在双目标亲和力上提升20.9个百分点。

详情
AI中文摘要

设计一个能调节两个靶点的单一分子是多药理学中一种有前景的策略,但它比标准的单目标生成要困难得多,因为一个候选分子必须满足两个结合要求,同时保持药物相似性和可合成性。现有的双目标生成方法通常通过在采样期间重新训练生成器或干预扩散过程来引入双目标能力。前者在双目标监督稀疏时可能成本高昂且难以稳定,而后者可能对去噪时的目标平衡和竞争性更新方向敏感。这些局限性促使我们寻找一种保持生成器不变的替代方案:能否在不修改参数或去噪动态的情况下,从冻结的单目标扩散模型的输入空间中恢复双目标候选分子?我们将此任务表述为一个受约束的多目标优化问题,并提出REUSE,一种层次化进化输入空间搜索框架,结合配对条件探索和结构化多阶段选择,以强制执行双目标亲和力、化学质量和多样性。实验表明,与修改扩散过程的方法相比,REUSE持续改善了双目标亲和力和平衡性,在双高亲和力指标上比最强基线提高了20.9个百分点,同时保持了竞争性的分子质量。

英文摘要

Designing a single molecule that modulates two targets is a promising strategy for polypharmacology, but it remains substantially harder than standard single-target generation because one candidate must satisfy two binding requirements while preserving drug-likeness and synthesizability. Existing dual-target generative methods typically introduce dual-target capability by either retraining the generator or intervening in the diffusion process during sampling. The former can be costly and difficult to stabilize when dual-target supervision is sparse, while the latter may be sensitive to denoising-time target balancing and competing update directions. These limitations motivate a generator-preserving alternative that keeps the pretrained prior intact: can dual-target candidates instead be recovered from the input space of a frozen single-target diffusion model, without modifying its parameters or denoising dynamics? We formulate this task as a constrained multi-objective optimization problem and propose REUSE, a hierarchical evolutionary input-space search framework that combines pair-conditioned exploration with structured multi-stage selection to enforce dual-target affinity, chemical quality, and diversity. Experiments show that, compared with methods that modify the diffusion process, REUSE consistently improves dual-target affinity and balance, achieving a 20.9-percentage-point gain in Dual High Affinity over the strongest prior baseline while maintaining competitive molecular quality.

2605.25674 2026-05-26 cs.LG 版本更新

Stochastic Estimation of the Layer-wise Hessian Trace for Monitoring Neural-network Training

逐层Hessian迹的随机估计用于监测神经网络训练

Maxim Bolshim, Alexander Kugaevskikh

发表机构 * ITMO University, St. Petersburg, Russia(圣彼得堡ITMO大学)

AI总结 提出一种随机估计器,通过Hutchinson迹估计与单次Hessian-向量积结合,在单次反向传播中无偏估计神经网络每层Hessian矩阵对角块的迹,并应用于检测标签记忆化阶段。

Comments 9 pages, 1 table

详情
AI中文摘要

损失及其梯度范数只能微弱地区分神经网络训练的健康和病态阶段,而经验风险的曲率在两者间有质的差异,但在参数数量$P\sim 10^{6}-10^{8}$时无法显式计算。我们提出了一种神经网络经验风险Hessian矩阵对角块迹的随机估计器。该过程将Hutchinson随机迹估计与整个参数向量上的单次Hessian-向量积相结合,并在计算图的单次反向传播中恢复每层迹的无偏估计。我们证明,在权重共享下,正确性要求逐层Hessian在第二次微分之前组装:将共享权重展开为独立坐标会引入系统偏差,其符号和大小由展开Hessian的跨实例块控制。推导了固定Hessian下估计器方差的闭式表达式,以及小批量采样分布下总方差的分解。该分解产生一个临界探测次数$K^{\star}$,平衡了两个随机源,并支持在线监测模式下$K\in[5,10]$的实用建议。该估计器应用于检测ResNet-18、ResNet-34和VGG-11在CIFAR-10和CIFAR-100上的标签记忆化阶段,其中校准的累积和决策规则在虚警率$16/120$下达到了$179/180$的经验检测能力。

英文摘要

The loss and the norm of its gradient separate the healthy and the pathological regimes of neural-network training only weakly, whilst the curvature of the empirical risk differs qualitatively between them but is inaccessible explicitly at parameter counts $P\sim 10^{6}-10^{8}$. We present a stochastic estimator of the trace of the diagonal blocks of the Hessian matrix of the empirical risk of a neural network. The procedure combines the Hutchinson stochastic trace estimator with a single Hessian-vector product over the whole parameter vector and recovers unbiased estimates of every per-layer trace in one backward pass through the computational graph. We show that correctness under weight sharing requires the layer-wise Hessian to be assembled before the second differentiation: unrolling shared weights into independent coordinates introduces a systematic bias whose sign and magnitude are governed by the cross-instance blocks of the unrolled Hessian. A closed-form expression for the variance of the estimator at a fixed Hessian is derived, together with a decomposition of the total variance under the mini-batch sampling distribution. This decomposition yields a critical probe count $K^{\star}$ that balances the two sources of randomness and supports the practical recommendation $K\in[5,10]$ in the on-line monitoring regime. The estimator is applied to the detection of the label-memorisation regime of ResNet-18, ResNet-34, and VGG-11 on CIFAR-10 and CIFAR-100, where a calibrated cumulative-sum decision rule attains an empirical detection power of $179/180$ at a false-alarm rate of $16/120$.
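
The core identity behind the estimator is that, for a Rademacher probe $v$, $\mathbb{E}[v_\ell^\top (Hv)_\ell] = \operatorname{tr}(H_{\ell\ell})$ for every layer block $\ell$, so one Hessian-vector product serves all layers at once. The sketch below checks this on a toy symmetric matrix standing in for a Hessian. The layer partition and the explicit matrix are assumptions for illustration; in practice the product `hvp(v)` would come from automatic differentiation.

```python
import itertools
import numpy as np

def layerwise_trace_probe(hvp, v, layer_slices):
    """One Hutchinson probe: a single Hessian-vector product shared by all layers.

    For Rademacher v, E[v_l . (H v)_l] = tr(H_ll), the trace of layer l's diagonal block.
    """
    hv = hvp(v)
    return np.array([v[s] @ hv[s] for s in layer_slices])

rng = np.random.default_rng(0)
a = rng.normal(size=(6, 6))
hessian = a + a.T                              # toy symmetric stand-in for a Hessian
layer_slices = [slice(0, 2), slice(2, 6)]      # two hypothetical "layers"
hvp = lambda v: hessian @ v                    # autodiff would supply this in practice

# Averaging over all 2^6 sign vectors gives the exact expectation of the probe.
probes = [layerwise_trace_probe(hvp, np.array(s), layer_slices)
          for s in itertools.product([-1.0, 1.0], repeat=6)]
estimate = np.mean(probes, axis=0)
exact = np.array([np.trace(hessian[s, s]) for s in layer_slices])
```

Enumerating every sign vector is only feasible at toy sizes; it is used here to confirm unbiasedness exactly, whereas the abstract recommends a handful of random probes ($K\in[5,10]$) for on-line monitoring.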

2605.25663 2026-05-26 cs.LG cs.CV 版本更新

Opportunistic Target Selection: Early Directional Commitment for Query-Efficient Black-Box Adversarial Attacks

机会目标选择:面向查询高效黑盒对抗攻击的早期定向承诺

Florent Tariolle, Florian Yger

发表机构 * INSA Rouen Normandy(里昂-诺曼底理工学院) LITIS

AI总结 提出一种轻量级方法OTS,通过早期将无目标攻击切换为有目标攻击,锁定当前领先的非真实类,从而减少查询次数并提高成功率。

Comments 13 pages, 10 figures, 3 tables; code available at https://github.com/Tariolle/opportunistic-target-selection

详情
AI中文摘要

仅最小化真实类别置信度的黑盒对抗攻击存在类别漂移问题:扰动在特征空间中游荡而不锁定特定的对抗类别,把查询浪费在分散、无方向的进展上。我们引入机会目标选择(OTS),一种轻量级包装器,在攻击轨迹早期将无目标攻击切换为有目标的优化目标,锁定当前领先的非真实类别。OTS不需要对底层攻击进行架构修改,不需要梯度访问,也不需要先验的目标类别知识。我们在五个标准ImageNet分类器(4500次运行)上对三种基于分数的攻击(SimBA、使用交叉熵损失的Square Attack和Bandits)验证了OTS。在随机搜索攻击上,OTS紧密跟踪oracle性能,在ResNet-50上成功率提升高达27个百分点,删失均值(censored-mean)迭代次数相对减少43%。在梯度估计攻击(Bandits)和使用间隔损失(margin loss)的攻击上,OTS是冗余的,这一负面结果强化了我们将OTS解释为间隔损失替代物的观点。在对抗训练模型上,双峰的难度分布使定向攻击能够带来收益的区间不复存在。

英文摘要

Black-box adversarial attacks that minimize only the ground-truth confidence suffer from class drift: perturbations wander through the feature space without committing to a specific adversarial class, wasting queries on diffuse, undirected progress. We introduce Opportunistic Target Selection (OTS), a lightweight wrapper that switches an untargeted attack to a targeted objective early in its trajectory, locking onto whichever non-true class currently leads. OTS requires no architectural modification to the underlying attack, no gradient access, and no a priori target-class knowledge. We validate OTS on three score-based attacks (SimBA, Square Attack with cross-entropy loss, and Bandits) across five standard ImageNet classifiers (4,500 runs). On random-search attacks, OTS closely tracks oracle performance, with gains up to +27 pp in success rate and 43% relative reduction in censored-mean iterations on ResNet-50. On gradient-estimation attacks (Bandits) and attacks with margin loss, OTS is redundant, a negative result that reinforces our interpretation of OTS as a margin-loss surrogate. On adversarially-trained models, a bimodal difficulty distribution eliminates the regime where targeting helps.
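
The wrapper idea can be sketched as a loss function that a score-based attack would minimize. The switching rule used here (a fixed `switch_step`) is a hypothetical placeholder, since the abstract only says the switch happens "early in its trajectory"; the function and argument names are likewise illustrative.

```python
import numpy as np

def ots_loss(probs, true_label, state, step, switch_step=10):
    """Loss to minimize for a score-based attack; `switch_step` is a hypothetical rule.

    Before the switch: untargeted (push down true-class confidence).
    At the switch: lock onto the currently leading non-true class, then stay targeted.
    """
    if state.get("target") is None and step >= switch_step:
        masked = probs.copy()
        masked[true_label] = -np.inf          # exclude the true class from the argmax
        state["target"] = int(np.argmax(masked))
    if state.get("target") is None:
        return probs[true_label]              # untargeted objective
    return -probs[state["target"]]            # targeted objective on the locked class

state = {}
early = ots_loss(np.array([0.5, 0.3, 0.2]), true_label=0, state=state, step=0)
at_switch = ots_loss(np.array([0.5, 0.3, 0.2]), true_label=0, state=state, step=10)
later = ots_loss(np.array([0.4, 0.1, 0.5]), true_label=0, state=state, step=11)
```

Once the target is locked it does not change, even if another class later overtakes it, which is what distinguishes this from re-selecting the runner-up class at every query.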

2605.25662 2026-05-26 cs.LG 版本更新

Closed-Form Node Classification with Exact Graph Unlearning

具有精确图遗忘的闭式节点分类

Aditya Gaur, Charu Sharma

发表机构 * Machine Learning Lab IIIT Hyderabad(IIIT Hyderabad 机器学习实验室)

AI总结 提出一种基于调整同配性的路由闭式框架,通过闭式求解器(SGC+Ridge回归或LCF-Net)匹配或超越图神经网络性能,并实现精确图遗忘的快速更新与隐私分析。

Comments 19 pages, 5 figures, 12 tables (7 main + 5 appendix)

详情
AI中文摘要

用于节点分类的图神经网络通常通过梯度下降训练数百或数千个epoch。最近的工作表明,当适当调整时,经典的GCN/SAGE/GAT架构可以在许多节点分类基准上匹配图Transformer。我们提出一个互补的问题:通过确定性闭式求解器能恢复多少性能,以及这能提供什么保证?我们引入了一个由调整同配性选择的路由闭式框架。对于同配图,我们使用SGC风格的传播后接Ridge回归;对于异配图,我们引入LCF-Net,一种逐层闭式图特征精炼网络,在其逐层Ridge求解之上再接一个高斯核Ridge头部。在14个基准上,包括ogbn-arxiv和ogbn-proteins,我们的闭式预测器在9个测量数据集中的9个上匹配或击败了最佳普通2层GCN/SAGE/GAT,在12个小基准中的9个上在1个标准差内与调优的深度配方持平,并在两个大图上超过了OGB排行榜的普通GCN。剩余的异配差距紧密跟踪从普通2层到深度SAGE的增益,表明残差差异主要是架构性的。由于我们的预测器是确定性线性系统的显式解,修改后的图输入可以重新求解以获得重训练等效参数。我们形式化了标签、特征、边、节点和子图修改的精确图对象遗忘,证明了Ridge组件的K跳局部性,并在109个配置上验证了精确性。在ogbn-arxiv上,局部更新比完全重新求解快21-45倍,比梯度重训练快约10^6倍。结构反演实验进一步量化了精确重训练的隐私下限和近似图遗忘方法的额外泄漏。

英文摘要

Graph neural networks for node classification are typically trained by gradient descent over hundreds or thousands of epochs. Recent work has shown that, when properly tuned, classic GCN/SAGE/GAT architectures can match graph transformers on many node-classification benchmarks. We ask a complementary question: how much of this performance can be recovered by deterministic closed-form solvers, and what guarantees does this enable? We introduce a routed closed-form framework selected by adjusted homophily. For assortative graphs, we use SGC-style propagation followed by Ridge regression; for heterophilous graphs, we introduce LCF-Net, a layer-wise closed-form graph feature-refinement network whose per-layer Ridge solves are capped by a Gaussian kernel-Ridge head. Across 14 benchmarks, including ogbn-arxiv and ogbn-proteins, our closed-form predictors match or beat the best vanilla 2-layer GCN/SAGE/GAT on 9 of 9 measured datasets, tie tuned deep recipes within one standard deviation on 9 of 12 small benchmarks, and exceed the OGB-leaderboard plain GCN on both large graphs. The remaining heterophilous gap closely tracks the gain from vanilla 2-layer to deep SAGE, suggesting that the residual difference is primarily architectural. Because our predictors are explicit solutions of deterministic linear systems, modified graph inputs can be re-solved to obtain retrain-equivalent parameters. We formalize exact graph-object unlearning for label, feature, edge, node, and subgraph modifications, prove K-hop locality for Ridge components, and verify exactness across 109 configurations. On ogbn-arxiv, localized updates give $21$--$45\times$ speedups over full re-solving and roughly $10^{6}\times$ speedups over gradient retraining. Structural-inversion experiments further quantify the privacy floor of exact retraining and the additional leakage of approximate graph-unlearning methods.
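
The assortative branch described above (SGC-style propagation followed by Ridge regression) has a short closed-form sketch. It uses the standard SGC propagation matrix $\hat{D}^{-1/2}(A+I)\hat{D}^{-1/2}$ and the textbook ridge solution; the toy graph, the regularization value, and the way a label is "unlearned" (dropping its training row and re-solving) are illustrative assumptions, not the paper's localized update.

```python
import numpy as np

def sgc_features(adj, x, k=2):
    """K-step SGC propagation with the symmetrically normalized adjacency (self-loops added)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    s = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    for _ in range(k):
        x = s @ x
    return x

def ridge_solve(z, y, lam=1.0):
    """Closed-form ridge weights: (Z^T Z + lam I)^{-1} Z^T Y."""
    return np.linalg.solve(z.T @ z + lam * np.eye(z.shape[1]), z.T @ y)

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = rng.normal(size=(4, 3))
y = np.eye(2)[[0, 0, 1, 1]]             # one-hot labels for two classes
z = sgc_features(adj, x, k=2)
w_full = ridge_solve(z, y)
keep = [0, 1, 3]                         # unlearn node 2's label: drop its row and re-solve
w_unlearned = ridge_solve(z[keep], y[keep])
```

Since the solver is deterministic, re-solving on the modified inputs yields exactly the parameters that training from scratch on those inputs would give, which is the sense of "retrain-equivalent" in the abstract.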

2605.25648 2026-05-26 stat.ML cs.LG 版本更新

StrTransformer: Source-Wise Structured Transformers for Unsupervised Blind Source Recovery

StrTransformer: 面向无监督盲源恢复的源向结构化Transformer

Yuan-Hao Wei

发表机构 * PolyU

AI总结 提出StrTransformer框架,通过源向结构化Transformer分支和观测空间混合器直接优化潜在源矩阵,实现盲源恢复和分支潜在建模。

详情
AI中文摘要

本文提出StrTransformer,一种用于盲源恢复和分支潜在建模的源向结构化Transformer框架。StrTransformer不使用编码器推断潜在变量,而是直接优化潜在源矩阵,同时结合观测空间混合器和源向结构化Transformer分支。混合器强制重建一致性,而每个Transformer分支对一条潜在源轨迹施加可微的结构约束。具体来说,每个源被转换为多尺度补丁令牌,随机掩码,由局部偏置Transformer处理,并通过掩码补丁重建能量进行评估。该能量作为隐式的源向结构先验。为了鼓励不同潜在分支专门处理不同的时间模式,StrTransformer进一步引入有序多尺度控制器,学习分支特定的补丁尺度权重、有序尺度中心和局部注意力斜率。最终目标函数结合了观测重建、源向结构正则化以及用于分离和尺度专门化的模块化辅助惩罚。我们分析了目标函数的解耦和耦合结构、正则化精确重建纤维,以及由有序分支描述符引起的置换对称性减少。一个受控案例研究表明,学习到的分支收敛到不同的时间尺度结构,并在事后评估中恢复源对齐的潜在轨迹。

英文摘要

This paper proposes StrTransformer, a source-wise structured Transformer framework for blind source recovery and branch-wise latent modeling. Instead of using an encoder to infer latent variables, StrTransformer directly optimizes the latent source matrix together with an observation-space mixer and source-wise structural Transformer branches. The mixer enforces reconstruction consistency, while each Transformer branch imposes a differentiable structural constraint on one latent source trajectory. Specifically, each source is converted into multi-scale patch tokens, randomly masked, processed by a locality-biased Transformer, and evaluated through a masked patch reconstruction energy. This energy acts as an implicit source-wise structural prior. To encourage different latent branches to specialize into different temporal regimes, StrTransformer further introduces an ordered multi-scale controller that learns branch-specific patch-scale weights, ordered scale centers, and locality attention slopes. The resulting objective combines observation reconstruction, source-wise structural regularization, and modular auxiliary penalties for separation and scale specialization. We analyze the decoupling and coupling structure of the objective, the regularized exact-reconstruction fiber, and the reduction of permutation symmetry induced by ordered branch descriptors. A controlled case study shows that the learned branches converge to distinct temporal-scale structures and recover source-aligned latent trajectories under post-hoc evaluation.

2605.25640 2026-05-26 physics.ins-det cs.LG hep-ex nucl-ex updated version

3D Magnetic Field Reconstruction and Mapping with Physics-Informed Neural Networks

基于物理信息神经网络的3D磁场重建与映射

Haohan Yu, Zhanxu Hao, Bingzhi Li, Zejia Lu, Xiang Chen, Liang Li

Affiliations Xinxiang Medical University; Institute of Particle and Nuclear Physics; Henan Normal University; Henan University of Urban Construction; Shanghai Institute of Applied Physics; Chinese Academy of Sciences; State Key Laboratory of Dark Matter Physics; School of Physics and Astronomy; Key Laboratory for Particle Astrophysics and Cosmology (Ministry of Education); Shanghai Key Laboratory for Particle Physics and Cosmology; Scientific Model Research Group

AI summary Proposes a physics-informed neural network (PINN) framework that builds Maxwell's equations directly into the loss function and adds physics-residual losses at the measurement points, achieving high-precision 3D magnetic field reconstruction with an accuracy of $10^{-4}$ in simulation and at the $10^{-3}$ level in experiment.

AI Chinese abstract

准确重建不可达区域的磁场对于物理学中的许多高精度实验至关重要。传统方法(如球谐展开)常因截断误差而限制精度。本研究提出一种先进的物理信息神经网络(PINN)框架,用于高精度3D磁场映射。与传统的纯数据驱动模型不同,所提出的PINN将麦克斯韦方程直接融入损失函数,在整个域内强制执行无散度和无旋度条件。一个关键创新是在测量位置包含显式的物理残差损失,确保超越随机配点采样的严格物理一致性。使用模拟数据进行验证,重建精度达到$10^{-4}$,比现有PINN基准提高十倍。此外,使用定制线圈组件的实验验证表明,在环境条件下,相对精度达到亚百分比水平($10^{-3}$量级)的稳健重建。这种AI驱动方法为传感器放置受限的复杂实验环境中的场监测和测量提供了稳健的高精度解决方案。

English abstract

Accurate reconstruction of magnetic fields in inaccessible regions is vital for many high-precision experiments in physics. Traditional methods, such as spherical harmonic expansion, often suffer from truncation errors that limit their precision. This study proposes an advanced Physics-Informed Neural Network (PINN) framework for high-precision 3D magnetic field mapping. Unlike conventional data-driven models, the proposed PINN integrates Maxwell's equations directly into the loss function, enforcing divergence-free and curl-free conditions across the entire domain. A key innovation is the inclusion of explicit physics-residual losses at measurement locations, ensuring rigorous physical consistency beyond random collocation sampling. Validation using simulated data achieves a reconstruction accuracy of $10^{-4}$, a tenfold improvement over existing PINN benchmarks. Furthermore, experimental validation using a custom coil assembly demonstrates robust reconstruction with sub-percent relative accuracy, reaching the $10^{-3}$ level under ambient conditions. This AI-driven methodology provides a robust, high-precision solution for field monitoring and measurement in complex experimental environments where direct sensor placement is restricted.
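
The loss structure described in this abstract can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: it assumes a source-free region (so the field is both divergence-free and curl-free), replaces automatic differentiation with central finite differences, and uses hypothetical names (`physics_residual`, `total_loss`, `w_phys`). The point it shows is that the physics residual is evaluated at the measurement locations as well as at the collocation points.

```python
def central_diff(field, point, comp, axis, h=1e-4):
    """Approximate d(field[comp]) / d(x[axis]) at `point` by central differences."""
    plus, minus = list(point), list(point)
    plus[axis] += h
    minus[axis] -= h
    return (field(plus)[comp] - field(minus)[comp]) / (2.0 * h)


def divergence(field, point):
    return sum(central_diff(field, point, i, i) for i in range(3))


def curl(field, point):
    d = lambda comp, axis: central_diff(field, point, comp, axis)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))


def physics_residual(field, points):
    """Mean squared divergence plus squared curl of `field` over `points`."""
    total = 0.0
    for p in points:
        total += divergence(field, p) ** 2 + sum(c * c for c in curl(field, p))
    return total / len(points)


def total_loss(field, measurements, collocation, w_phys=1.0):
    """Data misfit plus physics residuals at collocation AND measurement points."""
    data = sum(
        sum((b - m) ** 2 for b, m in zip(field(p), b_meas))
        for p, b_meas in measurements
    ) / len(measurements)
    measured_points = [p for p, _ in measurements]
    physics = physics_residual(field, collocation) + physics_residual(field, measured_points)
    return data + w_phys * physics


# B = grad(x^2 - y^2) is both divergence-free and curl-free.
harmonic = lambda p: (2.0 * p[0], -2.0 * p[1], 0.0)
# B = (x, 0, 0) has divergence 1, so it violates the physics constraint.
unphysical = lambda p: (p[0], 0.0, 0.0)

points = [(0.3, -0.2, 0.1), (1.0, 0.5, -0.4)]
measurements = [(p, harmonic(p)) for p in points]
```

In a real PINN, `field` would be a neural network and the derivatives would come from automatic differentiation; the weighting between data and physics terms is a tuning choice that the abstract does not specify.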

2605.25632 2026-05-26 cs.AI cs.LG q-fin.RM updated version

Insuring Every Action: An Authority Frontier Framework for Runtime Actuarial Control of Autonomous AI Agents

为每个行动投保:自主AI代理运行时精算控制的权威边界框架

Hao-Hsuan Chen

Affiliations Department of Risk Management and Insurance

AI summary Proposes the Actuarial Action Interface (AAI) and the Authority Frontier framework, which price, gate, and evaluate the side-effect-bearing actions of autonomous AI agents through a deterministic runtime contract, enabling cross-domain actuarial control and benchmarking.

Comments 35 pages, 4 figures, 11 tables. Companion paper on the mathematical foundations: SSRN 6761960

AI Chinese abstract

自主AI代理越来越多地产生带有副作用的行动:数据库变更、退款、支付、外部承诺。我们提出精算行动接口(AAI),这是一个确定性的运行时合约,它在时间一致的风险映射下,对每个此类行动按照合约固定的安全默认值进行定价,并根据每个边界的储备资本预算门控执行。然后我们开发了权威边界,这是一种评估原语,用于衡量运行时在每个储备资本水平下释放的自主权威量。该框架提供:(i) 一个确定性的报价-绑定-提交协议,带有通行费限制的能力令牌;(ii) 一个通用的七类行动分类法,将异构工具调用映射到可比较的权威单位;(iii) 在alpha支出下的重放确定性和逐路径储备覆盖;(iv) 通过全储备需求C_full和资本指标Capital@k进行跨域归一化。我们在四个代理环境(数据库变更、客服退款以及公共tau-bench零售和航空工具使用轨迹)中实例化AAI,并报告一个实时Postgres面板,其中三个Azure托管的模型通过同一合约提出行动。边界在跨域中表现出常见的低储备拒绝和中间释放模式,仅在预算网格达到全储备需求时饱和;所需储备资本变化达22倍(Capital@50从289到6457)。该框架不强制域采用相同形状;它揭示每个域的精算几何。在实时面板中,合约在低预算下防止了所有三个模型的实现损失,但在拒绝下的承保持续性方面有所不同:模型身份是一个精算承保变量。贡献是一个用于自主代理副作用运行时精算控制的基准就绪评估框架。

English abstract

Autonomous AI agents increasingly issue side-effect-bearing actions: database mutations, refunds, payments, external commitments. We propose the Actuarial Action Interface (AAI), a deterministic runtime contract that prices each such action against a contractually fixed safe default under a time-consistent risk mapping, and gates execution against a per-boundary reserve capital budget. We then develop the Authority Frontier, an evaluation primitive measuring how much autonomous authority the runtime releases at each level of reserve capital. The framework provides (i) a deterministic quote-bind-commit protocol with toll-bounded capability tokens; (ii) a universal seven-class action taxonomy mapping heterogeneous tool calls to comparable authority units; (iii) replay determinism and pathwise reserve coverage under alpha-spending; (iv) cross-domain normalization via full reserve demand C_full and capital metrics Capital@k. We instantiate AAI across four agentic environments (database mutation, customer-service refund, and the public tau-bench retail and airline tool-use traces) and report a live Postgres panel in which three Azure-hosted models propose actions through the same contract. The frontier exhibits a common low-reserve refusal and intermediate-release pattern across domains, with saturation only where the budget grid reaches full reserve demand; required reserve capital varies by 22x (Capital@50 from 289 to 6457). The framework does not force domains into the same shape; it surfaces each domain's actuarial geometry. In the live panel the contract prevents realized loss across all three models at low budget while differing in underwriting persistence under denial: model identity is an actuarial underwriting variable. The contribution is a benchmark-ready evaluation framework for runtime actuarial control of autonomous-agent side effects.
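
To make the quote-bind-commit idea concrete, here is a toy sketch of gating actions against a reserve capital budget. Everything in it is hypothetical (the `ReserveGate` class, the `Token` type, the flat prices); the abstract does not give the paper's pricing rule or token format, so this only illustrates the control flow of pricing an action, reserving capital for it, and refusing it when the budget is insufficient.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical capability token: authorizes one action up to a fixed toll."""
    action_id: str
    toll: float


class ReserveGate:
    """Toy quote-bind-commit gate over a single reserve capital budget."""

    def __init__(self, budget):
        self.remaining = budget
        self.committed = []

    def quote(self, action_id, risk_price):
        # Deterministic: the same action and price always yield the same quote.
        return risk_price

    def bind(self, action_id, price):
        # Refuse (fall back to the safe default) if the reserve cannot cover it.
        if price > self.remaining:
            return None
        self.remaining -= price
        return Token(action_id, price)

    def commit(self, token):
        self.committed.append(token.action_id)
        return True


gate = ReserveGate(budget=100.0)
refund = gate.bind("refund-1", gate.quote("refund-1", 60.0))
payment = gate.bind("payment-1", gate.quote("payment-1", 60.0))
if refund is not None:
    gate.commit(refund)
```

Here the refund is released and the payment is refused, because the second action's price exceeds the capital left after the first one was bound.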

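2605.25619 2026-05-26 cs.LG updated version

Analogies between Transformer Layers and Power Method

Transformer层与幂法之间的类比

Chenglong Li, Claudio Altafini

AI summary Reveals an analogy between the operations in a Transformer layer (projections and layer normalizations, disregarding the feedforward network) and a step of the power method, shows that tokens passing through a layer tend to align with the principal eigenvector of the product of that layer's output and value weight matrices, and suggests a method to steer the Transformer output toward an arbitrary desired direction.

AI Chinese abstract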

在本文中,我们展示了Transformer层中发生的操作(投影和层归一化,忽略前馈神经网络)与幂法步骤之间存在类比。与此类比一致,我们证明通过一层后,token倾向于朝向一个矩阵的主特征向量倾斜,该矩阵是该层的输出权重矩阵和值权重矩阵的乘积。在具有共享权重的Transformer(即所有层具有相同权重)的特殊情况下,与这个主特征向量的对齐在经验上特别明显,并且也可以在分析上证明。该类比还提出了一种方法,可以将Transformer的输出引导到token空间中的任意期望方向。

English abstract

In this paper, we show that there is an analogy between the operations occurring in a layer of a transformer (projections and layer normalizations, disregarding the feedforward neural network) and a step in the power method. Consistent with this analogy, we show that, when passing through a layer, the tokens tend to be tilted towards the principal eigenvector of a matrix which is the product of the output and value weight matrices of that layer. In the special case of a transformer with shared weights (i.e., in which all layers have identical weights), the alignment with this principal eigenvector is particularly evident empirically, and can also be shown analytically. The analogy also suggests a method to steer the output of the transformer towards an arbitrary desired direction in token space.
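
For readers unfamiliar with the power method, the following sketch shows the iteration the abstract refers to: multiply by a fixed matrix, then renormalize. It is a simplified illustration under stated assumptions, not the paper's construction: the 2x2 matrix `M` merely stands in for the product of the output and value weight matrices, L2 normalization stands in for layer normalization, and attention mixing is ignored.

```python
import math


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def power_method(m, v0, n_steps):
    """Repeatedly apply `m` and renormalize, like stacked shared-weight layers."""
    v = normalize(v0)
    for _ in range(n_steps):
        v = normalize(matvec(m, v))
    return v


# Stand-in for the product of output and value weight matrices (assumption).
M = [[2.0, 1.0], [1.0, 3.0]]

# Closed-form principal eigenpair of M: lambda = (5 + sqrt(5)) / 2.
lam = (5.0 + math.sqrt(5.0)) / 2.0
principal = normalize([1.0, lam - 2.0])

token = power_method(M, [1.0, 0.0], n_steps=50)
cosine = sum(a * b for a, b in zip(token, principal))
```

After enough steps `cosine` approaches 1, which is the shared-weight alignment effect in miniature: the component along the principal eigenvector grows fastest, and normalization keeps the vector on the unit sphere.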

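2605.25616 2026-05-26 cs.LG stat.ML updated version

Courtroom Analogy: New Perspective on Uncertainty-Aware Classification

法庭类比:不确定性感知分类的新视角

Taeseong Yoon, Heeyoung Kim

Affiliations Department of Industrial and Systems Engineering, Korea Advanced Institute of Science and Technology

AI summary Proposes the courtroom analogy framework, which models uncertainty aggregation in classification with a structured mixture of Dirichlet distributions, and designs MoDEX, a single-pass neural network, for efficient and interpretable uncertainty quantification.

Comments ICML 2026

AI Chinese abstract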

分类中的单次不确定性量化方法通过预测类概率向量上的可处理分布来表示不确定性。现有方法主要关注增强该分布的表示能力,但往往对预测不确定性如何结构化和聚合提供的见解有限,导致可解释性较弱。我们引入法庭类比,将不确定性感知分类概念化为类特定倡导者之间的结构化辩论。每位倡导者形成概率意见,并通过输入依赖的可信度权重聚合这些意见得出最终裁决。在此框架中,每位倡导者的意见被建模为狄利克雷分布,其浓度参数分解为共享证据和类特定倡导。这产生了具有语义可解释参数的结构化混合狄利克雷分布。为实例化该公式,我们提出了混合狄利克雷专家(MoDEX),一种预测法庭参数的单次前馈神经架构,能够在显式建模不确定性聚合的同时实现高效且表达力强的不确定性量化。我们证明MoDEX具有强大的理论性质,并在多种基准测试中实现了最先进的不确定性量化性能,产生具有有意义语义的可解释不确定性估计。

English abstract

Single-pass uncertainty quantification (UQ) methods for classification represent uncertainty by predicting a tractable distribution over the class probability vector. While existing approaches primarily focus on enhancing the expressiveness of this distribution, they often provide limited insight into how predictive uncertainty is structured and aggregated, resulting in weak interpretability. We introduce the courtroom analogy, which conceptualizes uncertainty-aware classification as a structured debate among class-specific advocates. Each advocate forms a probabilistic opinion, and a final verdict is reached by aggregating these opinions using input-dependent plausibility weights. In this framework, each advocate's opinion is modeled as a Dirichlet distribution whose concentration parameter is decomposed into shared evidence and class-specific advocacy. This yields a structured mixture of Dirichlet distributions with semantically interpretable parameters. To instantiate this formulation, we propose Mixture of Dirichlet EXperts (MoDEX), a single-pass neural architecture that predicts the courtroom parameters, enabling efficient and expressive UQ while explicitly modeling uncertainty aggregation. We demonstrate that MoDEX enjoys strong theoretical properties and achieves state-of-the-art UQ performance across diverse benchmarks, yielding interpretable uncertainty estimates with meaningful semantics.

2605.25612 2026-05-26 cs.LG cs.AI updated version

Towards the Connection between Activation Sparsity and Flat Minima

激活稀疏性与平坦极小值之间的联系

Ze Peng, Jian Zhang, Lei Qi, Yang Gao, Yinghuan Shi

Affiliations State Key Laboratory for Novel Software Technology, Nanjing University; Institute of Brain-Machine Interface, Nanjing University; School of Computer Science and Engineering, Southeast University

AI summary Finds that the flatness of the loss landscape is closely related to MLP activation sparsity in Transformers, and uses a theoretical derivation plus three practical methods to increase sparsity, pointing to potential cost reductions in both inference and training.

AI Chinese abstract

标准训练的Transformer的MLP块中出现的激活稀疏性为在不牺牲性能的情况下大幅降低计算成本提供了机会。为了从理论上解释这一现象,现有工作表明激活稀疏性并非源于数据属性或数据拟合,而是来自训练过程的隐式偏差。然而,这些联系是在强假设下得到的,无法应用于标准训练的大步数深度模型。与这些工作不同,我们发现损失景观的平坦性也与MLP激活稀疏性密切相关,并且可以作为标准深度网络的一个更弱且自然出现的假设。具体来说,我们发现:1) MLP激活稀疏性等于“增强平坦性”(平坦性度量的加权和)与输入范数和MLP激活梯度乘积的比值。我们经验性地发现该比值在训练过程中下降,导致稀疏激活。2) 我们还提出了导数稀疏性的概念,在ReLU下它退化为激活稀疏性,但进一步支持反向传播中的剪枝,并且比激活稀疏性更稳定。基于理论发现,我们通过三种方法减小分子和增大分母来进一步鼓励激活稀疏性。这些即插即用的修改可以有效降低比值并产生更稀疏的激活。在ImageNet-1K和C4上的实验表明,与原始Transformer相比,推理稀疏性至少提高36%,训练稀疏性至少提高50%,表明在推理和训练中进一步降低成本的潜力。

English abstract

The observation that activation sparsity emerges in MLP blocks of standardly trained Transformers offers an opportunity to drastically reduce computation costs without sacrificing performance. To theoretically explain this phenomenon, existing works have shown that activation sparsity does not result from the data properties or data fitting but from the implicit bias of the training process. However, these connections are obtained under strong assumptions, which cannot be applied to deep models standardly trained with a large number of steps. Unlike these works, we find that the flatness of loss landscapes is also closely related to the MLP activation sparsity and can serve as a weaker and naturally emerging assumption for standard deep networks. Specifically, we find that 1) the MLP activation sparsity equals a ratio between "augmented flatness" (a weighted sum of flatness measures) and the product of the input norm and activation gradient of the MLP. We empirically find that this ratio decreases during training, leading to sparse activations. 2) We also propose the notion of derivative sparsity, which reduces to activation sparsity under ReLU, but further enables pruning in the backward propagation and is more stable than activation sparsity. With the theoretical findings, we can further encourage activation sparsity by decreasing the numerator and increasing the denominator of the ratio using three methods. These plug-and-play modifications can effectively reduce the ratio and produce sparser activations. Experiments on ImageNet-1K and C4 demonstrate relative improvements of at least 36% on inference sparsity and at least 50% on training sparsity over vanilla Transformers, indicating further potential cost reduction in both inference and training.
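
The statement that derivative sparsity reduces to activation sparsity under ReLU can be checked on a handful of numbers. The sketch below assumes the common reading of sparsity as the fraction of exactly-zero entries (the paper's precise definition may differ) and uses made-up pre-activation values.

```python
def relu(x):
    return x if x > 0 else 0.0


def relu_grad(x):
    # Derivative of ReLU with respect to its input (taken as 0 at x <= 0).
    return 1.0 if x > 0 else 0.0


def zero_fraction(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


pre_activations = [-1.2, 0.5, -0.3, 2.0, -0.7]  # illustrative values only
activations = [relu(x) for x in pre_activations]
derivatives = [relu_grad(x) for x in pre_activations]

activation_sparsity = zero_fraction(activations)
derivative_sparsity = zero_fraction(derivatives)
```

Both quantities are 3/5 here because ReLU outputs zero exactly where its derivative is zero. For a smooth activation the outputs are rarely exactly zero, which is one reason a derivative-based notion can be the more useful one for pruning the backward pass.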

2605.25608 2026-05-26 stat.ML cs.LG updated version

Learning Sparse Compositional Functions with Norm-Constrained Neural Networks

利用范数约束神经网络学习稀疏组合函数

Shuo Huang, Lorenzo Fiorito, Lorenzo Rosasco, Tomaso Poggio

Affiliations Istituto Italiano di Tecnologia; Università degli Studi di Genova; MaLGa; DIBRIS; CBMM; Massachusetts Institute of Technology

AI summary Using norm-constrained deep neural networks, establishes approximation rates and excess risk bounds for learning sparse compositional functions, showing that deep networks can avoid the curse of dimensionality through hierarchical representations.

AI Chinese abstract

深度神经网络学习层次特征的能力被广泛认为是其在高维学习中成功的关键机制。现有理论通过基于参数计数的逼近率和组合模型的无维数灾难样本复杂度保证,部分支持了这一观点。为了研究参数数量超过样本量的过参数化场景,我们开发了一个通过参数范数衡量复杂度的框架。在该方法中,我们使用Frobenius范数约束的深度神经网络,为学习稀疏组合函数建立了逼近率和过风险界,其中组合函数的组合结构由有向无环图表示。我们的结果具有广泛的适用性,因为每个可有效图灵计算的函数都具有稀疏组合表示。特别地,我们涵盖了一系列代表性模型,包括多指标模型、二叉树结构和一般组合架构。我们推导的速率表明,深度网络可以利用目标函数的组合结构,通过层次表示有效避免维数灾难。

English abstract

The ability of deep neural networks to learn hierarchical features is widely regarded as a key mechanism underlying their success in high-dimensional learning. Existing theory partially supports this view by establishing approximation rates based on parameter counts and sample complexity guarantees for compositional models without incurring the curse of dimensionality (CoD). To study overparameterized regimes, where the number of parameters exceeds the sample size, we develop a framework that measures complexity via the parameter norm. Within this approach, we establish approximation rates and excess risk bounds for learning sparse compositional functions whose compositional structure is represented by directed acyclic graphs (DAGs), using Frobenius norm-constrained deep neural networks. Our results have broad applicability since every function that is efficiently Turing computable admits sparse compositional representations. In particular, we cover a range of representative models, including multi-index models, binary tree structures, and general compositional architectures. The rates we derive show that deep networks can exploit the compositional structure of the target functions, effectively avoiding the CoD through hierarchical representations.

2605.25605 2026-05-26 eess.AS cs.LG updated version

Decoding Stimulus Reconstruction-Based Auditory Attention Robustly in Unbalanced EEG Datasets

在不平衡EEG数据集中基于刺激重建的听觉注意力鲁棒解码

Yuanming Zhang, Yayun Liang, Zhibin Lin, Jing Lu

Affiliations Key Lab of Modern Acoustics, Nanjing University; Horizon Robotics

AI summary Studies how unbalanced datasets affect stimulus reconstruction-based auditory attention decoding and proposes a leave-one-paired-envelope-out (LOPEO) cross-validation protocol to prevent inflated decoding accuracy.

AI Chinese abstract

在过去十年中,许多研究将深度神经网络(DNN)应用于基于刺激重建的脑电图(EEG)听觉注意力解码(AAD)。然而,数据集平衡对基于刺激重建的AAD解码性能的影响尚未被探索。在本研究中,使用三个公开的EEG-AAD数据集(KUL、DTU和NJU cEEGrid)构建平衡和不平衡的实验条件。我们假设并证明基于刺激重建的DNN解码器倾向于在不平衡数据集上产生高估的解码性能。为了解决这个问题,我们提出了一种留一对包络(leave-one-paired-envelope-out,LOPEO)交叉验证协议。实验结果证实,LOPEO有效防止了在不平衡数据集上的解码准确率膨胀。虽然平衡数据集在实验设计中通常更受青睐,但LOPEO为已经发表的不平衡数据集提供了一个原则性的评估框架,填补了该领域的一个重要空白。

English abstract

In the past decade, numerous studies have applied deep neural networks (DNNs) to auditory attention decoding (AAD) from electroencephalogram (EEG) signals via stimulus reconstruction. However, the influence of dataset balance on the decoding performance of stimulus reconstruction-based AAD remains unexplored. In this study, three publicly available EEG-AAD datasets (KUL, DTU, and NJU cEEGrid) are used to construct both balanced and unbalanced experimental conditions. We hypothesize and demonstrate that stimulus reconstruction-based DNN decoders tend to produce overestimated decoding performance on unbalanced datasets. To address this issue, we propose a leave-one-paired-envelope-out (LOPEO) cross-validation protocol. Experimental results confirm that LOPEO effectively prevents inflated decoding accuracy on unbalanced datasets. While balanced datasets are generally preferred in experimental design, LOPEO provides a principled evaluation framework for unbalanced datasets that have already been published, filling an important gap in the field.

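2605.25604 2026-05-26 cs.CL cs.LG updated version

DVAO: Dynamic Variance-adaptive Advantage Optimization for Multi-reward Reinforcement Learning

DVAO: 面向多奖励强化学习的动态方差自适应优势优化

Guochao Jiang, Jingyi Song, Guofeng Quan, Chuzhan Hao, Guohua Liu, Yuewei Zhang

Affiliations Alibaba Cloud Computing

AI summary To address the training instability caused by reward combination and the reliance of advantage combination on static hyperparameters in multi-reward reinforcement learning, proposes Dynamic Variance-adaptive Advantage Optimization (DVAO), which dynamically adjusts combination weights based on empirical reward variance to achieve stable training and an improved multi-objective Pareto frontier.

AI Chinese abstract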

强化学习已成为将大型语言模型与人类意图和任务要求对齐的标准范式。尽管组相对策略优化为近端策略优化提供了一种高效、无价值模型的替代方案,但将其适应于现实世界的多奖励设置仍然具有挑战性。标准的标量化实践,如奖励组合和优势组合,存在显著缺陷:奖励组合经常产生平方幅度过大的优势,导致训练不稳定;而优势组合依赖静态超参数,忽略了跨目标相关性。为了解决这些限制,我们提出了动态方差自适应优势优化(DVAO),它根据 rollout 组内每个目标的经验奖励方差动态调整组合权重,有效提高具有更强学习信号的目标的权重,同时抑制噪声目标。我们从数学上证明 DVAO 保持有界的优势幅度以实现稳定训练,并引入了一种自适应的跨目标正则化机制。使用 Qwen3 和 Qwen2.5 模型在数学推理和工具使用基准上的大量实验表明,DVAO 显著优于基线方法,实现了卓越的多目标帕累托前沿和稳健的训练稳定性。

English abstract

Reinforcement Learning has become a standard paradigm for aligning Large Language Models with human intent and task requirements. While Group Relative Policy Optimization offers an efficient, value-model-free alternative to Proximal Policy Optimization, adapting it to real-world multi-reward settings remains challenging. Standard scalarization practices, such as Reward Combination and Advantage Combination, suffer from significant drawbacks: Reward Combination frequently generates advantages with excessively large squared magnitudes that lead to training instability, while Advantage Combination relies on static hyperparameters and ignores cross-objective correlations. To address these limitations, we propose Dynamic Variance-adaptive Advantage Optimization (DVAO), which dynamically adjusts combination weights based on the empirical reward variance of each objective within a rollout group, effectively up-weighting objectives with a stronger learning signal while suppressing noisy ones. We mathematically prove that DVAO maintains bounded advantage magnitudes for stable training and introduces a self-adaptive cross-objective regularization mechanism. Extensive experiments on mathematical reasoning and tool-use benchmarks using Qwen3 and Qwen2.5 models demonstrate that DVAO significantly outperforms baseline methods, achieving a superior multi-objective Pareto frontier and robust training stability.
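
The two scalarization baselines named in this abstract, and the general idea of variance-based weighting, can be sketched as follows. The abstract does not give DVAO's actual weighting formula, so `variance_adaptive_weights` below is one plausible reading (weights proportional to each objective's within-group reward variance) and should be treated as an assumption, as should all function names and the toy rewards.

```python
from statistics import mean, pstdev


def group_normalize(values, eps=1e-8):
    """GRPO-style advantage: center and scale rewards within one rollout group."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / (s + eps) for v in values]


def reward_combination(rewards_by_objective):
    """Sum the rewards across objectives first, then normalize once."""
    summed = [sum(col) for col in zip(*rewards_by_objective)]
    return group_normalize(summed)


def advantage_combination(rewards_by_objective, weights):
    """Normalize each objective separately, then take a weighted sum."""
    per_objective = [group_normalize(r) for r in rewards_by_objective]
    n = len(per_objective[0])
    return [sum(w * a[i] for w, a in zip(weights, per_objective)) for i in range(n)]


def variance_adaptive_weights(rewards_by_objective, eps=1e-8):
    """Assumed rule: weight each objective by its share of the reward variance."""
    variances = [pstdev(r) ** 2 for r in rewards_by_objective]
    total = sum(variances) + eps
    return [v / total for v in variances]


# Two objectives over a group of four rollouts (illustrative numbers).
rewards = [
    [1.0, 0.0, 1.0, 0.0],  # objective with a clear within-group signal
    [0.5, 0.5, 0.5, 0.5],  # constant objective: no within-group signal
]
weights = variance_adaptive_weights(rewards)
advantages = advantage_combination(rewards, weights)
```

With these numbers the constant objective receives zero weight, and because the weights sum to at most one, the combined advantage cannot exceed the largest per-objective normalized advantage in magnitude.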

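2605.25599 2026-05-26 cs.LG cs.CV updated version

Generalized Evidential Deep Learning: From a Bayesian Perspective

广义证据深度学习:从贝叶斯视角

Yuanye Liu, Yibo Gao, Yuanyang Chen, Xiahai Zhuang

Affiliations School of Data Science, Fudan University, Shanghai, China

AI summary Starting from a generalized Bayesian framework, establishes a theoretical foundation for evidential deep learning and proposes Generalized Evidential Deep Learning (GEDL), a unified and extensible framework that achieves comparable results on classification, uncertainty estimation, and OOD detection.

Comments Submitted to ICML2026

AI Chinese abstract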

证据深度学习(EDL)已成为一种高效、无需采样的不确定性估计策略。一系列EDL变体被提出以解决原始框架的特定局限性,并取得了显著成功。然而,EDL的基本理论结构以及这些变体之间的关系尚未得到系统研究。在这项工作中,我们通过在广义贝叶斯框架内解释EDL,包括先验规范、后验更新和训练目标,为其建立了原则性的理论基础。我们进一步从贝叶斯分布不确定性角度刻画了证据不确定性,并通过渐近分析建立。基于这一视角,我们进一步提出了广义证据深度学习(GEDL),这是一个统一且可扩展的框架,明确解耦了各个组件的作用,并将GEDL与现有变体系统地联系起来。大量实验表明,GEDL在分类、不确定性估计和OOD检测上取得了可比的结果,并具有理论依据。

English abstract

Evidential Deep Learning (EDL) has emerged as an efficient, sampling-free strategy for uncertainty estimation. A series of EDL variants have been proposed to address specific limitations of the original framework, achieving notable success. However, the underlying theoretical structure of EDL and the relationships among these variants have received limited systematic investigation. In this work, we establish a principled theoretical foundation for EDL by interpreting it within a generalized Bayesian framework that includes prior specification, posterior update, and training objective. We further characterize evidential uncertainty from a Bayesian distributional uncertainty viewpoint, established via asymptotic analysis. Building on this perspective, we propose Generalized Evidential Deep Learning (GEDL), a unified and extensible framework that explicitly disentangles the roles of individual components and systematically relates GEDL to existing variants. Extensive experiments demonstrate that GEDL yields comparable results on classification, uncertainty estimation, and OOD detection, with theoretical grounding.

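2605.25592 2026-05-26 stat.ML cs.LG updated version

Optimal Design for Multinomial Logit Model with Applications to Best Assortment Identification

多项Logit模型的最优设计及其在最佳组合识别中的应用

Joongkyu Lee, Min-hwan Oh

Affiliations Seoul National University, Seoul, Korea

AI summary For the multinomial logit (MNL) model, proposes a computationally efficient optimal experimental design framework that achieves statistical efficiency and scalability through a mixed-integer linear program and a polynomial-time relaxation, with an application to best assortment identification under linear utilities and non-uniform revenues.

Comments Accepted at ICML 2026

AI Chinese abstract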

我们研究了多项Logit(MNL)赌博机的最优实验设计,其中智能体从大小为$N$的基集中重复选择$K$个物品的子集,并观察单选择反馈。与线性或广义线性赌博机不同,MNL赌博机具有组合动作空间,这使得经典的最优设计方法和对所有子集的朴素优化在计算上难以处理。我们为MNL模型提出了一种计算高效的最优设计框架,通过两种互补方法实现了统计效率和可扩展性:(i) 将设计预言精确或认证近似地重构为带有求解器认证早停的$0$-$1$混合整数线性规划(MILP),以及(ii) 一种完全多项式时间的提升设计,用可处理的替代目标替换非线性目标。利用Kiefer-Wolfowitz等价定理,我们建立了接近G-最优性的保证,并刻画了由此产生的统计-计算权衡。作为应用,我们为具有线性效用和非均匀收益的MNL赌博机开发了一种最佳组合识别算法,并证明了实例相关的样本复杂度为$\tilde{O}\big(\frac{d \log N}{\Delta^2}\big)$,其中$d$是特征维度,$N$是臂的数量,$\Delta$是最小收益差距。

English abstract

We study optimal experimental design for multinomial logit (MNL) bandits, where an agent repeatedly selects a subset of $K$ items from a ground set of size $N$ and observes single-choice feedback. Unlike linear or generalized linear bandits, MNL bandits have a combinatorial action space, which makes classical optimal design approaches and naive optimization over all subsets computationally intractable. We propose a computationally efficient optimal design framework for MNL models that achieves both statistical efficiency and scalability through two complementary approaches: (i) an exact or certified-approximate reformulation of the design oracle as a $0$-$1$ mixed-integer linear program (MILP) with solver-certified early stopping, and (ii) a fully polynomial-time lifted design that replaces the nonlinear objective with a tractable surrogate. Using the Kiefer-Wolfowitz equivalence theorem, we establish near G-optimality guarantees and characterize the induced statistical-computational trade-offs. As an application, we develop a best assortment identification algorithm for MNL bandits with linear utilities and non-uniform revenues, and prove an instance-dependent sample complexity of $\tilde{O}\big(\frac{d \log N}{\Delta^2}\big)$, where $d$ is the feature dimension, $N$ is the number of arms, and $\Delta$ is the minimum revenue gap.
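
As background for the setting in this abstract, the sketch below computes MNL choice probabilities and brute-forces the best assortment on a tiny instance. It assumes the usual MNL form with an outside (no-purchase) option of utility 0, takes item utilities as given numbers rather than as a linear function of features, and uses hypothetical helper names. The exhaustive search over all size-$K$ subsets is exactly the step that becomes intractable at scale, which is the motivation for the paper's design framework.

```python
import math
from itertools import combinations


def choice_probabilities(assortment, utilities):
    """P(i | S) = exp(u_i) / (1 + sum_{j in S} exp(u_j)), with a no-purchase option."""
    weights = {i: math.exp(utilities[i]) for i in assortment}
    denom = 1.0 + sum(weights.values())
    return {i: w / denom for i, w in weights.items()}


def expected_revenue(assortment, utilities, revenues):
    probs = choice_probabilities(assortment, utilities)
    return sum(revenues[i] * p for i, p in probs.items())


def best_assortment(n_items, k, utilities, revenues):
    """Exhaustive search over all size-k subsets; only feasible for small n_items."""
    return max(
        combinations(range(n_items), k),
        key=lambda s: expected_revenue(s, utilities, revenues),
    )


utilities = [0.0, 0.0, 0.0]   # illustrative values
revenues = [1.0, 0.5, 0.2]    # non-uniform revenues
probs = choice_probabilities((0, 1), utilities)
best = best_assortment(3, 2, utilities, revenues)
```

With equal utilities, each offered item and the no-purchase option are chosen with probability 1/3, so the best pair is simply the two highest-revenue items.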

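2605.25590 2026-05-26 stat.ML cs.LG updated version

Nonstationary Generalized Linear Bandits with Discounted Online Mirror Descent

基于折扣在线镜像下降的非平稳广义线性老虎机

Joongkyu Lee, Min-hwan Oh

Affiliations Seoul National University

AI summary Proposes DOMD-GLB, an algorithm that uses discounted online mirror descent for nonstationary generalized linear bandits, achieving dynamic regret bounds while keeping $O(1)$ per-round computation and memory costs.

AI Chinese abstract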

我们研究非平稳广义线性老虎机(GLBs),其中期望奖励通过非线性链接函数与未知时变参数建模。该框架涵盖广泛的奖励模型,包括线性、伯努利和二项式奖励。现有方法主要基于最大似然估计(MLE),使用滑动窗口、重启或折扣机制处理非平稳性。尽管这些方法在统计上实现了高效的遗憾保证,但它们通常需要在每轮重新访问过去观测,导致计算和内存成本随时间增长;此外,其中一些方法依赖于非凸投影步骤。本文提出DOMD-GLB,一种用于非平稳GLBs的新算法,利用折扣在线镜像下降(DOMD)进行参数估计,从而每轮仅产生$O(1)$的计算和内存成本。我们证明了在漂移环境下的动态遗憾界为$\tilde{O}\big(c_\mu^{-1/2} d^{3/4} P_T^{1/4} T^{3/4}\big)$,在分段平稳环境下为$\tilde{O}\big(c_\mu^{-1/3} d^{2/3} \Gamma_T^{1/3} T^{2/3}\big)$,其中$d$表示特征维度,$T$表示时间范围,$P_T$表示路径长度,$\Gamma_T$表示变化点数量,$c_\mu$是与链接函数相关的曲率参数,同时相比已有工作显著提高了计算效率。据我们所知,这是首个每轮计算和内存成本与时间无关的非平稳GLBs算法。

English abstract

We study nonstationary generalized linear bandits (GLBs), where the expected reward is modeled through a nonlinear link function with an unknown time-varying parameter. This framework encompasses a broad class of reward models, including linear, Bernoulli, and binomial rewards. Existing approaches are predominantly based on maximum-likelihood estimation (MLE), using sliding-window, restart, or discounting mechanisms to handle nonstationarity. Although these methods achieve statistically efficient regret guarantees, they generally require revisiting past observations at every round, which leads to computation and memory costs that grow with time; moreover, several of them rely on a non-convex projection step. In this paper, we propose DOMD-GLB, a new algorithm for nonstationary GLBs that utilizes discounted online mirror descent (DOMD) for parameter estimation, thereby incurring only $O(1)$ computation and memory costs per round. We prove dynamic regret bounds of order $\tilde{O}\big(c_\mu^{-1/2} d^{3/4} P_T^{1/4} T^{3/4}\big)$ in drifting environments and $\tilde{O}\big(c_\mu^{-1/3} d^{2/3} \Gamma_T^{1/3} T^{2/3}\big)$ in piecewise-stationary environments, where $d$ denotes the feature dimension, $T$ the time horizon, $P_T$ the path length, $\Gamma_T$ the number of change points, and $c_\mu$ a curvature parameter associated with the link function, while substantially improving computational efficiency over prior work. To the best of our knowledge, this is the first algorithm for nonstationary GLBs with per-round computation and memory costs independent of time.

2605.25581 2026-05-26 cs.LG updated version

Learning Latent Dynamical Causal Processes for Single-Cell Perturbation Prediction

学习单细胞扰动预测的潜在动态因果过程

Wenkang Jiang, Yuhang Liu, Erdun Gao, Ehsan Abbasnejad, Lina Yao, Javen Qinfeng Shi

Affiliations AIML, Adelaide University; Responsible AI Research Centre; Monash University; University of New South Wales

AI summary Proposes a latent dynamical causal generative model, together with the CITE-VAE learning framework, that jointly captures latent cellular programs, perturbation-conditioned mechanisms, and temporal evolution to achieve out-of-distribution generalization in single-cell perturbation prediction.

Comments Accepted to SIGKDD 2026 AI4Science Track

AI Chinese abstract

单细胞扰动预测旨在推断细胞如何响应未见过的干预,并实现分布外(OOD)泛化,为理解扰动如何随时间重塑细胞程序提供计算途径。现有的机器学习方法取得了重要进展,但通常仅捕捉响应的一方面。潜在因果方法寻求支持泛化和解释的机制,但往往将扰动效应视为静态结果。时间模型描述基因表达随时间的变化,但通常不显式恢复驱动这些变化的潜在因果生成机制。在实践中,扰动效应既是潜在的也是动态的:干预通过未观察到的细胞程序起作用,这些程序的状态随时间演变并产生观察到的表达谱。受此观点启发,我们提出一个用于单细胞扰动数据的潜在动态因果生成模型,联合捕获潜在细胞程序、扰动条件机制和时间演化。我们进一步提供可识别性分析,表明在适当条件下,潜在因果变量可恢复至标准等价类。在此分析指导下,我们开发了CITE-VAE,一个从单细胞测序数据中恢复潜在细胞程序及其扰动驱动动态的学习框架。在Causal-3DIdent上的实验验证了理论结果和所提方法在受控环境中的有效性。在真实世界的基于CRISPR的单细胞扰动数据上的额外实验表明,与最先进的基线相比,对未见扰动的泛化能力有所提升,突显了我们方法的实际鲁棒性。

English abstract

Single-cell perturbation prediction aims to infer how cells respond to unseen interventions and to achieve out-of-distribution (OOD) generalization, providing a computational route to understanding how perturbations reshape cellular programs over time. Existing machine learning methods have made important progress, but typically capture only one side of the response. Latent causal approaches seek mechanisms that support generalization and interpretation, yet often treat perturbation effects as static outcomes. Temporal models describe how gene expression changes across time, but usually do not explicitly recover the latent causal generative mechanisms driving these changes. In practice, perturbation effects are both latent and dynamical: interventions act through unobserved cellular programs, whose states evolve over time and give rise to observed expression profiles. Motivated by this view, we propose a latent dynamical causal generative model for single-cell perturbation data that jointly captures latent cellular programs, perturbation-conditioned mechanisms, and temporal evolution. We further provide an identifiability analysis showing that, under suitable conditions, the latent causal variables are recoverable up to standard equivalence classes. Guided by this analysis, we develop CITE-VAE, a learning framework for recovering latent cellular programs and their perturbation-driven dynamics from single-cell sequencing data. Experiments on Causal-3DIdent validate the theoretical results and the effectiveness of the proposed method in controlled settings. Additional experiments on real-world CRISPR-based single-cell perturbation data show improved generalization to unseen perturbations compared with state-of-the-art baselines, highlighting the practical robustness of our approach.

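2605.25577 2026-05-26 cs.LG cs.AI updated version

Geometric Flow Matching for Molecular Conformation Generation via Manifold Decomposition

基于流形分解的几何流匹配分子构象生成

Yunqing Liu, Yi Zhou, Wenqi Fan

Affiliations The Hong Kong Polytechnic University

AI summary Proposes GO-Flow, which decomposes the generation process into translation, rotation, and conformation subspaces and uses optimal transport and geodesic flows on manifolds to address existing methods' neglect of the hierarchical geometry of molecules, achieving high-quality and efficient molecular conformation generation.

AI Chinese abstract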

生成准确的3D分子构象是计算化学和药物发现中的关键挑战。最近,扩散和流匹配模型取得了显著成功。然而,它们的数学公式与分子的物理现实之间存在严重的不匹配。现有方法主要将分子视为笛卡尔空间中的无结构点云,忽略了键长和键角相对刚性而扭转角构成主要柔性自由度的内在层次力学。这种对流形的不感知迫使模型从头重新学习基本几何约束,常常导致物理上不可信的中间结构。为了解决这个问题,我们提出了GO-Flow,通过流形分解将生成建模与分子几何对齐。GO-Flow不是强制在欧几里得空间中运动,而是将生成过程分解为三个物理驱动的子空间:具有线性最优输运的平移空间、$SO(3)$上具有测地流的旋转空间以及具有熵最优输运的构象空间。这种分解注入了几何归纳偏置,使生成路径更好地与分子自由度对齐。当与等变神经架构结合时,它鼓励旋转一致的生成并提高几何有效性。在GEOM-Drugs和GEOM-QM9上的大量实验表明,GO-Flow实现了最先进的生成质量。值得注意的是,通过在正确的流形上自然地学习更直的概率路径,我们的方法能够在仅50步的情况下实现高保真采样,有效弥合了结构精度与计算效率之间的差距。

English abstract

The generation of accurate 3D molecular conformations is a pivotal challenge in computational chemistry and drug discovery. Recently, diffusion and flow matching models have achieved remarkable success. However, there is a critical misalignment between their mathematical formulation and the physical reality of molecules. Existing approaches predominantly treat molecules as unstructured point clouds in Cartesian space, overlooking the intrinsic hierarchical mechanics where bond lengths and bond angles are relatively stiff, whereas torsion angles constitute the dominant flexible degrees of freedom. This lack of manifold awareness forces models to relearn fundamental geometric constraints from scratch, often leading to physically implausible intermediate structures. To address this, we propose GO-Flow, which aligns generative modeling with molecular geometry via manifold decomposition. Instead of forcing motion through Euclidean space, GO-Flow decomposes the generation process into three physically motivated subspaces: translation space with linear optimal transport, rotation space with geodesic flows on $SO(3)$, and conformation space with entropic optimal transport. This decomposition injects geometric inductive biases and makes the generative paths better aligned with molecular degrees of freedom. When combined with equivariant neural architectures, it encourages rotation-consistent generation and improves geometric validity. Extensive experiments on GEOM-Drugs and GEOM-QM9 demonstrate that GO-Flow achieves state-of-the-art generation quality. Notably, by naturally learning straighter probability paths on the correct manifolds, our method enables high-fidelity sampling with as few as 50 steps, effectively bridging the gap between structural precision and computational efficiency.

2605.25565 2026-05-26 cs.LG cs.CL updated version

RotMoLE: Enhancing Mixture of Low-Rank Experts through Rotational Gating Mechanism

RotMoLE:通过旋转门控机制增强混合低秩专家

Mengyang Sun, Maochuan Dou, Tao Feng, Dan Zhang, Yihao Wang, Junpeng Liu, Yifan Zhu, Jie Tang

Affiliations Tsinghua University; Beijing Information Science and Technology University; National University of Singapore; Hong Kong University of Science and Technology (Guangzhou); Beijing University of Posts and Telecommunications

AI summary To address the limited representational capacity of conventional scalar-only gating in MoE-LoRA, proposes RotMoLE, a framework that adds a rotational gating mechanism applying a rotation to each selected expert, improving expert utilization and specialization; its effectiveness is validated in multi-task and multilingual training.

AI Chinese abstract

虽然大型语言模型(LLM)通常在进行垂直应用之前会针对特定领域任务进行微调,但将它们适应于具有多样化专业知识的复杂场景仍然具有挑战性。与此同时,混合专家(MoE)架构已成为训练LLM的关键范式,最近的一些工作也将MoE引入参数高效微调(PEFT),提出了混合低秩专家(MoE-LoRA),以增强低秩适配器学习复杂知识的能力。然而,MoE中的传统门控机制通常仅对选中的专家应用标量重新加权,从而限制了其表示和泛化的潜在能力。受MoE-LoRA中低秩结构的启发和推动,我们提出了RotMoLE,一个专门针对低秩专家的MoE框架,其特点是一个额外的旋转门控。除了简单的缩放,RotMoLE为每个选中的专家实现了一个旋转机制,从而在专家候选有限的情况下,实现了更好的专家利用和专业化,以学习多样化的数据。在复杂多任务和多语言训练场景下的实证结果验证了我们的有效性。

English abstract

While Large Language Models (LLMs) are commonly fine-tuned to handle domain-specific tasks before being applied to vertical applications, adapting them to complex scenarios with diverse specialized knowledge remains challenging. Meanwhile, the Mixture-of-Experts (MoE) architecture has risen as a crucial paradigm for training LLMs, and some recent works have also incorporated MoE into Parameter-Efficient Fine-Tuning (PEFT), proposing the Mixture of Low-rank Experts (MoE-LoRA) to enhance the power of low-rank adapters for learning complicated knowledge. However, conventional gating mechanisms in MoE typically apply only a scalar reweighting to selected experts, thereby limiting their underlying capacity for representation and generalization. Motivated and enabled by the low-rank structures in MoE-LoRA, we propose RotMoLE, a specialized MoE framework for low-rank experts featuring an additional rotation gate. Beyond simple scaling, RotMoLE implements a rotation mechanism for each selected expert, enabling superior expert exploitation and specialization for learning diverse data, especially when expert candidates are limited. Empirical results on complex multi-task and multilingual training scenarios validate the effectiveness of our approach.

2605.25551 2026-05-26 cs.LG 版本更新

Learning Permutation from Structure Without Supervision

从结构中无监督学习排列

Ran Eisenberg, Ofir Lindenbaum

发表机构 * Faculty of Engineering, Bar-Ilan University, Ramat Gan, Israel(巴伊兰大学工程学院,拉马特甘,以色列)

AI总结 提出熵自适应Gumbel-Sinkhorn方法,通过局部调节温度改善无监督排列学习的稳定性和质量。

详情
AI中文摘要

许多学习问题需要揭示隐藏的排序,以揭示无序数据中的结构,例如排序中的单调性或拼图重建中的空间连续性。在这些设置中,排列可以作为潜在算子通过优化直接定义在重排序输出上的目标来学习,通常没有真实排序的访问。可微松弛如Gumbel-Sinkhorn通过用双随机矩阵近似排列矩阵使这种方法实用。然而,无监督地从结构学习会导致非均匀的不确定性:一些分配早期变得自信,而其他分配仍然模糊。现有方法使用单个全局温度控制这一过程,迫使所有分配同时锐化或扩散,导致大规模不稳定。我们引入了一种熵自适应的Gumbel-Sinkhorn公式,根据分配不确定性局部调节温度。这使得自信的分配可以早期离散化,同时在不明确的地方保留探索。在排序和拼图重建任务以及路由式设置中,相对于固定温度基线,自适应熵控制提高了训练稳定性和最终排列质量,特别是在问题规模和分配模糊性增加时。

英文摘要

Many learning problems require uncovering a hidden ordering that reveals structure in unordered data, such as monotonicity in sorting or spatial continuity in jigsaw reconstruction. In these settings, permutations can be learned as latent operators by optimizing objectives defined directly on the reordered output, often without access to ground-truth orderings. Differentiable relaxations such as Gumbel-Sinkhorn make this approach practical by approximating permutation matrices with doubly stochastic matrices. However, learning from structure without supervision induces a non-uniform uncertainty: some assignments become confident early, while others remain ambiguous. Existing methods control this process using a single global temperature, forcing all assignments to sharpen or diffuse simultaneously and leading to instability at scale. We introduce an entropy-adaptive formulation of Gumbel-Sinkhorn that locally modulates temperature based on assignment uncertainty. This allows confident assignments to discretize early while preserving exploration where uncertainty remains. Across sorting and jigsaw reconstruction tasks and in routing-style settings, adaptive entropy control improves training stability and final permutation quality relative to fixed-temperature baselines, particularly as problem size and assignment ambiguity increase.
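As a rough illustration of the mechanism described in this abstract, the sketch below implements plain Sinkhorn normalization in log space and feeds it scores divided by a per-row temperature. The rule in `row_temperatures` (mapping normalized row entropy to a temperature between `tau_min` and `tau_max`) is a hypothetical stand-in for "locally modulating temperature based on assignment uncertainty", not the authors' formulation, and the Gumbel noise term is omitted for determinism.

```python
import math

def _logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def row_temperatures(scores, tau_min=0.5, tau_max=1.0):
    """Hypothetical rule: map each row's normalized softmax entropy in [0, 1]
    to a temperature in [tau_min, tau_max], so confident rows get sharper."""
    n = len(scores)
    taus = []
    for row in scores:
        lse = _logsumexp(row)
        probs = [math.exp(v - lse) for v in row]
        entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
        taus.append(tau_min + (tau_max - tau_min) * entropy / math.log(n))
    return taus

def sinkhorn(log_alpha, n_iters=50):
    """Alternate row and column normalization in log space so that the
    exponentiated matrix approaches a doubly stochastic matrix."""
    n = len(log_alpha)
    la = [row[:] for row in log_alpha]
    for _ in range(n_iters):
        for i in range(n):  # row normalization
            lse = _logsumexp(la[i])
            la[i] = [v - lse for v in la[i]]
        for j in range(n):  # column normalization
            lse = _logsumexp([la[i][j] for i in range(n)])
            for i in range(n):
                la[i][j] -= lse
    return [[math.exp(v) for v in row] for row in la]

def entropy_adaptive_sinkhorn(scores, n_iters=50):
    """Divide each row of scores by its own temperature, then run Sinkhorn."""
    taus = row_temperatures(scores)
    log_alpha = [[v / t for v in row] for row, t in zip(scores, taus)]
    return sinkhorn(log_alpha, n_iters)
```

With a single global temperature, every row of `log_alpha` would be divided by the same constant; the per-row division is the only change this sketch makes to the standard operator.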

2605.25549 2026-05-26 cs.CL cs.AI cs.LG 版本更新

BC Protocol: Structured Dual-Expert Dialogue for Eliciting High-Quality Chain-of-Thought Post-Training Data

BC协议:结构化双专家对话用于生成高质量思维链后训练数据

Bo Zou, Chao Xu

AI总结 针对大语言模型后训练中高质量专家思维链数据生产瓶颈,提出BC协议——一种结构化双专家引出方法,通过配对领域专家与知识工程师,系统外化专家隐性判断为自然语言推理链,实验证明其在推理过程自然性上具有压倒性优势。

详情
AI中文摘要

高质量的专家思维链(CoT)数据是大语言模型(LLM)后训练的核心瓶颈之一。现有数据生产方法各有结构性局限:众包标注缺乏深度推理路径;专家单独写作受限于“专家盲点”——专家会结构性跳过他们认为显而易见的推理步骤;RLHF仅产生偏好信号而非推理链。 本文提出BC协议——一种用于LLM后训练数据生产的结构化双专家引出方法。该方法精心配对领域专家(晶体智力)与知识工程师(流体智力),系统地将专家的隐性判断外化为自然语言推理链。我们引入了参与者资质模型,定义了影响引出质量的六个参与者特征维度。“校准的无知”是本文提出的原创概念。我们进一步提出“选择优于规定”作为方法论原则:对于隐性知识引出任务,将质量控制资源投入人员选择比投入同等资源于流程设计能获得更高回报。 在叙事小说领域的受控实验中,我们直接比较了BC协议双对话产生的CoT(A组,n=20)与同一领域专家独立撰写的CoT(B组,n=20)。三个跨供应商评判模型——GPT-4o、Claude Opus 4.5和Gemini 2.5 Pro——在五个维度上进行了盲评(共600个评分)。结果表明,BC协议在“推理过程自然性”上具有压倒性优势(A组均值4.80 vs. B组均值1.30,p=2.4×10^{-8},Cliff's δ=1.0)。

英文摘要

High-quality expert chain-of-thought (CoT) data is one of the core bottlenecks in large language model (LLM) post-training. Existing data production methods each have structural limitations: crowdsourced annotation lacks deep reasoning paths; expert solo writing is constrained by the "expert blind spot" -- experts structurally skip reasoning steps they consider obvious; RLHF only produces preference signals rather than reasoning chains. This paper proposes the BC Protocol -- a structured dual-expert elicitation method for LLM post-training data production. The method carefully pairs a domain expert (crystallized intelligence) with a knowledge engineer (fluid intelligence), systematically externalizing the expert's implicit judgments as natural language reasoning chains. We introduce the Participant Aptitude Model, which defines six participant characteristic dimensions that affect elicitation quality. "Calibrated Ignorance" is an original concept proposed in this paper. We further propose "Selection-over-Prescription" as a methodological principle: for implicit knowledge elicitation tasks, investing quality-control resources in personnel selection yields a higher return than investing the same resources in process design. In a controlled experiment in the narrative fiction domain, we directly compared CoT produced by BC Protocol dual dialogue (Group A, $n=20$) against CoT written independently by the same domain expert (Group B, $n=20$). Three cross-vendor judge models -- GPT-4o, Claude Opus 4.5, and Gemini 2.5 Pro -- conducted blind evaluation across five dimensions (600 ratings total). Results show that the BC Protocol achieves an overwhelming advantage in "naturalness of reasoning process" (Group A mean 4.80 vs. Group B mean 1.30, $p=2.4\times10^{-8}$, Cliff's $\delta=1.0$).

2605.25548 2026-05-26 cs.LG cs.AI 版本更新

'Si'multaneous 'S'patial-'T'emporal Message Passing for Dynamic Graph Representation Learning

Shubhajit Roy, Anirban Dasgupta

发表机构 * Department of Computer Science and Engineering(计算机科学与工程系) Indian Institute of Technology Gandhinagar(印度理工学院甘地纳加尔)

AI总结 提出SiST-GNN,通过在一个消息传递操作中融合空间和时间信号,实现动态图表示学习的联合推理,在链接预测任务上超越先前方法109%-277%。

详情
AI中文摘要

操作于快照序列的动态图神经网络(DGNN)通常分为两类:\emph{时间优先}方法先构建每个节点的时间嵌入,然后进行空间聚合;而\emph{空间优先}方法则颠倒这一顺序,将图卷积的输出馈送到下游时间模块。无论哪种情况,严格的顺序迫使第二阶段消耗第一阶段已压缩的摘要,排除了对拓扑和演化的联合推理;具体而言,消息传递算子永远无法根据邻居的\emph{过去}轨迹来加权其贡献。本文介绍了\textbf{SiST-GNN}(\textbf{Si}multaneous \textbf{S}patial-\textbf{T}emporal \textbf{GNN}),它在单个消息传递操作中融合两种信号,而不是将它们串联。具体地,在每个快照中,我们为每个节点维护一个循环隐藏状态来总结其历史,将其与节点当前特征向量配对,并将该配对视为由跨时间边连接的两个节点;在此时间增强图上运行标准图卷积,得到更新后的表示。我们的实证研究涵盖九个公开基线和十四个模型-数据集组合,覆盖固定分割和实时更新评估场景。在每个公开基准上,SiST-GNN在链接预测任务中相对于最强先前方法,在固定分割设置中提升109%-277%,在实时更新设置中提升68%-194%。我们还通过离散化底层连续时间事件流,构建了三个动态节点分类任务;在此,SiST-GNN以7%-22%的优势击败领先的离散时间(DTDG)基线,并与直接消费原始事件的连续时间(CTDG)方法相匹配。

英文摘要

Dynamic graph neural networks (DGNNs) that operate on snapshot sequences typically fall into one of two categories. \emph{Temporal-first} approaches build per-node temporal embeddings and only afterwards perform spatial aggregation, whereas \emph{Spatial-first} approaches invert this order, feeding the output of a graph convolution into a downstream temporal module. In either case, the rigid sequencing forces the second stage to consume an already-compressed summary produced by the first, ruling out joint reasoning over topology and evolution; concretely, the message-passing operator never gets to weight a neighbor's contribution by that neighbor's \emph{past} trajectory. This paper introduces \textbf{SiST-GNN} (\textbf{Si}multaneous \textbf{S}patial-\textbf{T}emporal \textbf{GNN}), which fuses the two signals inside a single message-passing operation rather than chaining them. Concretely, at each snapshot we maintain a recurrent hidden state per node that summarises its history, pair it with the node's current feature vector, and treat the pair as two nodes joined by a cross-time edge; running a standard graph convolution on this temporally augmented graph yields the updated representation. Our empirical study spans nine public baselines and fourteen model-dataset combinations, covering both fixed-split and live-update evaluation regimes. Across every public benchmark, SiST-GNN sets a new state of the art in link prediction, improving over the strongest prior method by $109$--$277\%$ in the fixed-split setting and by $68$--$194\%$ in the live-update setting. We additionally construct three dynamic node-classification tasks by discretising the underlying continuous-time event streams; here SiST-GNN beats the leading discrete-time (DTDG) baseline by $7$--$22\%$ and matches continuous-time (CTDG) methods that consume the raw events directly.
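The temporally augmented graph described in this abstract can be sketched as follows. Everything here is an illustrative assumption: the function names, the weight-free mean aggregator standing in for "a standard graph convolution", and the choice to give history vertices the same dimension as feature vertices. The paper's recurrent update and learned weights are not reproduced.

```python
def augment_graph(edges, num_nodes):
    """Adjacency sets over 2 * num_nodes vertices: vertex i holds node i's
    current features, vertex num_nodes + i holds its history state."""
    adj = [{v} for v in range(2 * num_nodes)]  # self-loops
    for u, v in edges:  # spatial edges of the current snapshot
        adj[u].add(v)
        adj[v].add(u)
    for i in range(num_nodes):  # one cross-time edge per node
        adj[i].add(num_nodes + i)
        adj[num_nodes + i].add(i)
    return adj

def mean_aggregate(adj, feats):
    """Weight-free mean aggregation over each vertex's neighborhood."""
    dim = len(feats[0])
    return [[sum(feats[u][k] for u in nbrs) / len(nbrs) for k in range(dim)]
            for nbrs in adj]

def sist_step(edges, x, h):
    """One message-passing step on the augmented graph.
    x: current features per node, h: history state per node."""
    n = len(x)
    z = mean_aggregate(augment_graph(edges, n), x + h)
    return z[:n], z[n:]  # updated features, updated history states
```

In this toy version a node sees its own history after one step and its neighbors' histories after two, since each neighbor's current vertex has already mixed in that neighbor's history vertex.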

2605.25541 2026-05-26 cs.CG cs.AI cs.HC cs.LG 版本更新

TopoAlign: Topology-Aware Visual Representation Alignment

TopoAlign:拓扑感知的视觉表示对齐

Xinyuan Yan, Rita Sevastjanova, Mennatallah El-Assady, Bei Wang

发表机构 * University of Utah(犹他大学) ETH Zürich(苏黎世联邦理工学院)

AI总结 提出TopoAlign框架,利用拓扑数据分析中的mapper图,通过联合力导向优化、自动结构匹配区域检测和基序查询,从拓扑角度比较不同模型或层的表示结构对齐。

详情
AI中文摘要

神经网络将输入编码为高维向量(称为表示),通过编码任务相关的结构和语义来捕捉模型如何处理数据。表示对齐指不同模型、层或训练条件对相同输入产生相似表示的程度,对模型解释、选择和鲁棒性分析有重要意义。现有的对齐度量方法主要依赖于几何属性(如邻域和聚类相似性),对表示的全局组织提供的洞察有限。在这项工作中,我们提出了TopoAlign,一个从结构角度视觉比较模型表示的拓扑感知框架。利用拓扑数据分析中的mapper图,TopoAlign联合分析来自不同模型或层的共享输入构建的图。该框架支持自上而下的比较工作流:首先通过联合力导向优化进行全局结构对齐,生成协调的图布局;然后通过自动检测结构匹配区域(用Bubble Sets可视化)识别局部对应关系;最后通过基于基序的查询和膜启发式可视化实现细粒度模式检查。我们通过语言和多模态模型的案例研究以及专家反馈展示了TopoAlign。结果表明,TopoAlign从拓扑角度为表示结构和对齐提供了有意义的洞察。

英文摘要

Neural networks encode inputs as high-dimensional vectors, known as representations, that capture how models process data by encoding task-relevant structure and semantics. Representation alignment refers to the degree to which different models, layers, or training conditions produce similar representations for the same inputs, with important implications for model interpretation, selection, and robustness analysis. Existing approaches to measure alignment primarily rely on geometric properties, such as neighborhood and cluster similarity, offering limited insight into the global organization of representations. In this work, we present TopoAlign, a topology-aware framework for visually comparing model representations from a structural perspective. Leveraging mapper graphs from topological data analysis, TopoAlign jointly analyzes graphs constructed from representations of shared inputs across different models or layers. The framework supports a top-down comparative workflow: it first performs global structure alignment via joint force-directed optimization to produce coordinated graph layouts; it then identifies local correspondences through automated detection of structurally matching regions, visualized with Bubble Sets; and finally it enables fine-grained pattern inspection through motif-based queries and membrane-inspired visualizations. We demonstrate TopoAlign through case studies on language and multimodal models, complemented by expert feedback. Our results show that TopoAlign provides meaningful insights into representation structure and alignment from a topological perspective.

2605.25540 2026-05-26 cs.SD cs.LG 版本更新

A Multimodal Framework for Dementia Detection via Linguistic and Acoustic Representation Learning

基于语言和声学表征学习的多模态痴呆检测框架

Loukas Ilias, Dimitris Askounis

发表机构 * Decision Support Systems Laboratory, School of Electrical and Computer Engineering, National Technical University of Athens(雅典国立技术大学电气与计算机工程学院决策支持系统实验室)

AI总结 提出一个端到端可训练的多模态深度学习框架,通过预训练模型提取声学和文本特征,结合注意力融合与互信息最大化,实现自动痴呆检测。

详情
AI中文摘要

阿尔茨海默病(AD)是一种进行性神经退行性疾病,是痴呆的主要原因,影响记忆、推理、沟通和日常功能。早期诊断尤为重要,因为及时干预可能有助于减缓认知衰退并改善患者护理。最近的研究表明,自发性言语包含与痴呆相关的有价值的语言和声学生物标志物。然而,现有方法通常依赖于独立训练的模态特定模型、特征拼接策略、集成方法或基于注意力的融合机制,这些方法并未明确最大化语音和转录表示之间的依赖性。在这项工作中,我们提出了一种用于自动痴呆检测的多模态深度学习框架,该框架以端到端可训练的方式联合利用语音和转录信息。具体来说,语音录音被分割成10秒的片段,并通过预训练的HuBERT模型提取上下文化的声学表示。为了更好地捕捉信息丰富的时域语音特征,采用注意力统计池化来聚合帧级声学嵌入。对于文本模态,使用预训练的BERT模型对转录进行编码,其中[CLS]标记表示用作语言嵌入。随后,使用基于注意力的音频-文本融合(AT-Fusion)机制组合声学和文本表示。此外,我们引入了一个MINE目标,以最大化模态之间的互信息并改善多模态表示对齐。最终融合的多模态表示用于痴呆分类。在公开的ADReSS挑战赛和PROCESS-2数据集上进行的实验证明了所提方法在基于语音的痴呆评估中的有效性和鲁棒性。

英文摘要

Alzheimer's disease (AD) is a progressive neurodegenerative disorder and the leading cause of dementia, affecting memory, reasoning, communication, and daily functioning. Early diagnosis is particularly important, as timely intervention may help slow cognitive decline and improve patient care. Recent studies have demonstrated that spontaneous speech contains valuable linguistic and acoustic biomarkers associated with dementia. However, existing approaches often rely on independently trained modality-specific models, feature concatenation strategies, ensemble methods, or attention-based fusion mechanisms that do not explicitly maximize the dependency between speech and transcript representations. In this work, we propose a multimodal deep learning framework for automatic dementia detection that jointly exploits speech and transcript information in an end-to-end trainable manner. Specifically, speech recordings are divided into 10-second segments and passed through a pre-trained HuBERT model to extract contextualized acoustic representations. To better capture informative temporal speech characteristics, attentive statistics pooling is employed to aggregate frame-level acoustic embeddings. For the textual modality, transcripts are encoded using a pre-trained BERT model, where the [CLS] token representation is used as the linguistic embedding. The acoustic and textual representations are subsequently combined using an attention-based Audio-Text Fusion (AT-Fusion) mechanism. In addition, we introduce a MINE objective to maximize the mutual information between modalities and improve multimodal representation alignment. The fused multimodal representation is finally used for dementia classification. Experiments conducted on the publicly available ADReSS Challenge and PROCESS-2 dataset demonstrate the effectiveness and robustness of the proposed approach for speech-based dementia assessment.
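Attentive statistics pooling, which the abstract uses to aggregate frame-level embeddings, can be sketched in a few lines: attention weights give a weighted mean and a weighted standard deviation per dimension, and the two are concatenated. In this sketch the attention `scores` are passed in directly as an assumption; in a real model they would come from a small learned network over the HuBERT frames.

```python
import math

def attentive_stats_pooling(frames, scores):
    """frames: list of frame embeddings (each a list of floats).
    scores: one unnormalized attention score per frame.
    Returns the attention-weighted mean followed by the weighted std."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]  # softmax over frames
    dim = len(frames[0])
    mean = [sum(a * f[k] for a, f in zip(alphas, frames)) for k in range(dim)]
    second = [sum(a * f[k] ** 2 for a, f in zip(alphas, frames)) for k in range(dim)]
    # Clamp at zero to guard against tiny negative values from rounding.
    std = [math.sqrt(max(s - mu * mu, 0.0)) for s, mu in zip(second, mean)]
    return mean + std
```

The output has twice the input dimension, which is what lets a downstream fusion layer see both the typical value and the spread of each acoustic feature across a segment.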

2605.25527 2026-05-26 cs.LG cs.CE 版本更新

DeepSeekMath Meets Order Book: Group-Aware Policy Optimization for High-Frequency Directional Trading

DeepSeekMath 遇见订单簿:面向高频方向性交易的组感知策略优化

Sayak Charabarty, Souradip Pal

发表机构 * Department of Computer Science(计算机科学系) Northwestern University(西北大学) School of Electrical and Computer Engineering(电气与计算机工程学院) Purdue University(普渡大学)

AI总结 本文通过将基于订单流的状态模型与策略梯度方法结合,研究限价订单簿上的高频交易强化学习,提出组感知策略优化方法,在回测中优于基于价值的 Q-learning 基线。

Comments 9 pages, 3 figures

详情
AI中文摘要

本文通过将基于订单流的状态模型与策略梯度方法配对,研究限价订单簿上的高频交易强化学习。与基于价值的 RL 技术(如表格 Q-learning)不同,我们的方法部署基于策略的方法,如普通 PPO 以及受 DeepSeekMath 启发的变体 GRPO 和 GSPO,这些方法使用组归一化更新和下行感知整形。在使用基于点差缩放奖励的简化回测设置下,对金融资产 AMZN、AAPL 和 GOOG 进行回测,这些新策略在净平均 PnL、盈利能力和回撤方面优于 Q-learning 基线。我们的结果表明:(1) 订单流信号是策略 RL 的合适状态;(2) 组感知 PPO 替代方法优于基于价值的基线。

英文摘要

This paper studies reinforcement learning for high-frequency trading on limit order books by pairing an Order-Flow-based state model with policy-gradient methods. Instead of value-based RL techniques like tabular Q-learning, our approach deploys policy-based methods such as vanilla PPO and the DeepSeekMath-inspired variants GRPO and GSPO, which use group-normalized updates and downside-aware shaping. On backtests with financial assets AMZN, AAPL, and GOOG under a simplified backtesting setup based on spread-scaled rewards, these new policies improve net average PnL, profitability, and drawdown over the Q-learning baseline. Our results show that (1) Order-Flow signals are an adequate state for policy RL and (2) group-aware PPO surrogates are preferable to value-based baselines.
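The "group-normalized updates" mentioned above refer to the GRPO idea of scoring each sampled action or rollout relative to the other samples in its group rather than against a learned value baseline. A minimal sketch follows; the use of the population standard deviation and the `eps` constant are assumptions, and the paper's downside-aware shaping is not modeled.

```python
import statistics

def group_normalized_advantages(rewards, eps=1e-8):
    """Advantage of each sample = (reward - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

For a group of rewards such as `[1.0, 2.0, 3.0]`, the middle sample gets zero advantage and the outer two get equal and opposite advantages, so the policy update depends only on relative performance within the group.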

2605.25526 2026-05-26 stat.ML cs.LG 版本更新

From DPPs to $k$-DPPs: identifiability analysis via spectral decomposition

从DPP到$k$-DPP:通过谱分解的可识别性分析

Hideitsu Hino, Keisuke Yano

发表机构 * The Institute of Statistical Mathematics(统计数学研究所)

AI总结 通过谱分解研究行列式点过程(DPP)及其条件版本$k$-DPP的几何结构,揭示了$k$-DPP中谱参数和特征空间旋转参数的可识别性变化,并刻画了可识别性差距。

Comments 10 pages

详情
AI中文摘要

我们通过谱分解$L=UΛU^{\top}$研究行列式点过程(DPP)的几何结构。谱$Λ$通过初等对称多项式控制基数分布,而特征空间方向$U$控制每个固定基数层内的条件分布。在基数$k$上取条件得到$k$-DPP,其可识别性结构发生根本变化:谱参数仅在一个公共尺度下可识别,特征空间旋转参数仅通过特征向量矩阵的平方子式可识别。我们通过三个显式不变性(尺度、符号相似性和特征空间旋转)以及一个维数计数定理精确刻画了可识别性差距,该定理表明当$\binom{N}{k}<N(N+1)/2$时存在额外的连续不可识别性。相比之下,对于完整DPP,不可识别性仅来自离散的符号相似性。

英文摘要

We study the geometry of determinantal point processes (DPPs) through the spectral decomposition $L=UΛU^{\top}$. The spectrum $Λ$ governs the cardinality distribution via elementary symmetric polynomials, while the eigenspace orientation $U$ governs the conditional law within each fixed-cardinality stratum. Conditioning on cardinality $k$ yields the $k$-DPP, for which the identifiability structure changes fundamentally: the spectral parameter becomes identifiable only up to a common scale, and the eigenspace rotation parameter is identifiable only through squared minors of the eigenvector matrix. We characterize the identifiability gap precisely, via three explicit invariances (scale, sign similarity, and eigenspace rotation) and a dimension-counting theorem showing the existence of additional continuous non-identifiability whenever $\binom{N}{k}<N(N+1)/2$. In contrast, for the full DPP the non-identifiability comes only from the discrete sign similarity.
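Two facts from this abstract can be checked numerically with a short sketch. First, for an L-ensemble DPP with eigenvalues $\lambda_i$, the cardinality distribution is $P(|Y|=k) = e_k(\lambda)/\prod_i(1+\lambda_i)$, where $e_k$ is the $k$-th elementary symmetric polynomial. Second, a $k$-DPP is unchanged when $L$ is multiplied by a positive constant, which is the scale invariance the abstract lists. The helper names below are illustrative, and `two_dpp` handles only the case $k=2$ to keep determinants explicit.

```python
from itertools import combinations

def elementary_symmetric(eigs):
    """e[k] = k-th elementary symmetric polynomial of eigs (standard DP)."""
    e = [1.0] + [0.0] * len(eigs)
    for lam in eigs:
        for k in range(len(eigs), 0, -1):
            e[k] += lam * e[k - 1]
    return e

def cardinality_distribution(eigs):
    """P(|Y| = k) for k = 0..N; the normalizer sum(e) equals prod(1 + lam)."""
    e = elementary_symmetric(eigs)
    z = sum(e)
    return [ek / z for ek in e]

def two_dpp(L):
    """k-DPP with k = 2: P(S) is proportional to the principal minor det(L_S)."""
    n = len(L)
    minors = {(i, j): L[i][i] * L[j][j] - L[i][j] * L[j][i]
              for i, j in combinations(range(n), 2)}
    z = sum(minors.values())
    return {s: m / z for s, m in minors.items()}
```

Scaling $L$ by $c$ multiplies every $2\times 2$ minor by $c^2$, so the normalized probabilities returned by `two_dpp` do not change, whereas `cardinality_distribution` does change with scale. This is the sense in which conditioning on cardinality discards the common scale of the spectrum.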

2605.25525 2026-05-26 cs.LG 版本更新

SAE-FD: Sparse Autoencoder Feature Distillation for Continual Learning of Large Language Models

SAE-FD: 面向大语言模型持续学习的稀疏自编码器特征蒸馏

Mingxu Zhang, Yuhan Li, Lujundong Li, Dazhong Shen, Hui Xiong, Ying Sun

发表机构 * The Hong Kong University of Science and Technology (Guangzhou)(香港科技大学(广州)) Nanjing University of Aeronautics and Astronautics(南京航空航天大学) The 63rd Research Institute, National University of Defense Technology, Nanjing(国防科技大学第六十三研究所,南京)

AI总结 针对持续学习中的灾难性遗忘问题,提出基于稀疏自编码器特征蒸馏的方法,通过将模型表示锚定在稀疏特征空间以减少表征纠缠,实现更精准的正则化,在多个基准上优于现有方法。

详情
AI中文摘要

持续学习使大语言模型能够适应不断变化的任务而无需从头重新训练,但灾难性遗忘仍然是一个核心障碍。在持续学习方法中,基于正则化的方法被广泛用于约束模型更新并减少遗忘,这些方法在权重空间、梯度空间或输出空间中操作。然而,这些密集表示空间存在特征叠加问题,即多个概念被编码在重叠的维度中,使得难以在不阻碍新任务学习的情况下有选择地保护先前学到的知识。为了解决这个问题,我们提出了SAE-FD(稀疏自编码器特征蒸馏),该方法将模型表示锚定在预训练稀疏自编码器的稀疏特征空间中,其中密集激活被分解为稀疏过完备基,从而减少表征纠缠,实现更有针对性的正则化,同时减少对新任务学习的干扰。在三个模型架构上的两个持续学习基准实验表明,SAE-FD始终优于现有的基于正则化的方法,平均准确率高达52.70%,仅产生-0.46的后向迁移。

英文摘要

Continual learning enables large language models to adapt to evolving tasks without retraining from scratch, yet catastrophic forgetting remains a central obstacle. Among continual learning methods, regularization-based approaches are widely used to constrain model updates and reduce forgetting, operating in weight space, gradient space, or output space. However, these dense representation spaces suffer from feature superposition, where multiple concepts are encoded in overlapping dimensions, making it difficult to selectively protect previously learned knowledge without impeding new-task learning. To address this issue, we propose SAE-FD (Sparse Autoencoder Feature Distillation), which anchors model representations in the sparse feature space of a pre-trained Sparse Autoencoder, where dense activations are decomposed into a sparse overcomplete basis that reduces representational entanglement, enabling more targeted regularization with less interference to new-task learning. Experiments on two continual learning benchmarks across three model architectures show that SAE-FD consistently outperforms existing regularization-based methods, achieving up to 52.70% average accuracy with only -0.46 backward transfer.
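One plausible reading of "anchoring representations in the sparse feature space" is a distillation penalty computed on sparse-autoencoder features rather than on raw activations. The sketch below is a guess at that shape under stated assumptions: a ReLU encoder with fixed weights `W` and bias `b`, and a mean-squared penalty between the features of the current model's activation and those of a frozen reference model. The abstract does not specify the loss, so none of this should be read as the authors' definition.

```python
def sae_encode(x, W, b):
    """ReLU(W x + b): map a dense activation to nonnegative sparse features."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def feature_distillation_loss(act_new, act_old, W, b):
    """Mean squared difference between SAE features of the current model's
    activation (act_new) and the frozen reference activation (act_old)."""
    f_new = sae_encode(act_new, W, b)
    f_old = sae_encode(act_old, W, b)
    return sum((a - c) ** 2 for a, c in zip(f_new, f_old)) / len(f_new)
```

Because the encoder is overcomplete and most features are zero for any given input, a penalty of this kind constrains only the few features that were active for old-task data, leaving the remaining directions free for new-task learning.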

2605.25509 2026-05-26 stat.ML cs.LG 版本更新

Guided Flow Matching for Forward and Inverse PDE Problems with Sparse Observations: Algorithm and Theory

面向稀疏观测的正反PDE问题的引导流匹配:算法与理论

Xifeng Zhang, Jin Zhao

发表机构 * School of Mathematical Science(数学科学学院) Academy for Multidisciplinary Studies(多学科研究学院)

AI总结 提出FM4PDE流匹配生成框架,通过引导采样联合学习PDE系数与解分布,实现稀疏观测下的正向模拟与逆问题恢复,并提供误差保证。

Comments 50 pages, 8 figures, 4 tables

详情
AI中文摘要

从稀疏观测中重建PDE解是科学计算中的核心挑战。我们提出FM4PDE,一种流匹配生成框架,学习PDE系数(或初始状态)与解(或最终状态)的联合分布,从而在有限配对数据下实现正向模拟和逆问题恢复。在推理时,采样由一个复合损失引导,该损失强制与稀疏测量一致并减少PDE残差;我们支持确定性、随机性和混合采样器。我们为这些引导过程提供误差保证。对于确定性优化器,一个强制条件确保轨迹有界,且逐阶段收缩导致目标精度的对数复杂度。对于随机采样器,我们引入自适应引导并假设速度场的耗散性,以获得与噪声基底参数无关的均匀矩界。这导致多项式时间误差界,且一个匹配的下界表明恒定引导会引入不可避免的正偏差,从而激发自适应性。还提供了混合确定性-随机分析。在静态和时变基准PDE上的实验表明,与基于扩散的生成模型相比,具有竞争性的精度和更快的推理速度。

英文摘要

Reconstructing PDE solutions from sparse observations is a core challenge in scientific computing. We present FM4PDE, a flow-matching generative framework that learns the joint distribution of PDE coefficients (or initial states) and solutions (or final states), enabling both forward simulation and inverse recovery with limited paired data. At inference, sampling is guided by a composite loss that enforces agreement with sparse measurements and reduces the PDE residual; we support deterministic, stochastic, and hybrid samplers. We provide error guarantees for these guided procedures. For the deterministic optimizer, a coercivity condition ensures trajectory boundedness and a phase-wise contraction yields logarithmic complexity in the target accuracy. For the stochastic sampler, we introduce adaptive guidance and assume dissipativity of the velocity field to obtain uniform moment bounds independent of the noise-floor parameter. This leads to polynomial-time error bounds, and a matching lower bound shows constant guidance induces an unavoidable positive bias, motivating adaptivity. A hybrid deterministic-stochastic analysis is also provided. Experiments on static and time-dependent benchmark PDEs demonstrate competitive accuracy and faster inference than diffusion-based generative models.

2605.25508 2026-05-26 cs.LG 版本更新

Relative Repairability: A Calibration-Based Diagnostic for High-Sparsity Post-Pruning Allocation

相对可修复性:一种基于校准的高稀疏度剪枝后分配诊断方法

Qishi Zhan, Liang He, Minxuan Hu, Ziheng Chen

发表机构 * Marquette University(马凯特大学) Tongji University(同济大学) Cornell University(康奈尔大学) UT Austin(得克萨斯大学奥斯汀分校)

AI总结 提出相对可修复性(RR)指标,通过校准数据比较层剪枝引起的原始激活失真与通道方差匹配修复后的残余失真,用于诊断高稀疏度下剪枝损伤的可修复性,实验表明其在架构依赖的可恢复性转变区域优于现有分配规则。

详情
AI中文摘要

在极高稀疏度下,神经网络剪枝不仅决定哪些权重保留,还决定剪枝引起的损伤在网络中的分布位置,以及这些损伤能否通过固定的轻量修复过程恢复。我们通过修复条件稀疏分配的角度研究这一问题。我们引入相对可修复性(RR),一种基于校准的诊断方法,比较逐层剪枝引起的原始激活失真与通道方差匹配修复后的残余失真。RR仅使用未标记的校准数据,估计修复后剩余局部损伤的比例。在CIFAR10和CIFAR100上的ResNet18、ResNet34和VGG16 BN实验中,我们发现RR并非普遍主导的分配规则。相反,它在架构依赖的可恢复性转变附近最为有用,此时标准的结构或幅度基分配先验开始失去可靠性,但修复后恢复尚未完全崩溃。在CIFAR100 ResNet18上,细粒度扫描显示RR在中心转变带上优于ERK,并在该带上部超过LAMP。投影强制消融进一步表明,有上限的ERK可能过度保护投影层,将过多稀疏度转移到常规卷积上,降低修复后恢复。这些结果表明,高稀疏度剪枝不仅应分配保留的权重,还应分配可修复的损伤。

英文摘要

At very high sparsity, neural network pruning does more than decide which weights remain. It also determines where pruning-induced damage is placed across the network, and whether that damage can be recovered by a fixed lightweight repair procedure. We study this problem through the lens of repair-conditioned sparsity allocation. We introduce Relative Repairability (RR), a calibration-based diagnostic that compares the raw activation distortion caused by layerwise pruning with the residual distortion left after channelwise variance-matching repair. RR estimates the fraction of local damage that remains after repair, using only unlabeled calibration data. Across ResNet18, ResNet34, and VGG16 BN on CIFAR10 and CIFAR100, we find that RR is not a universally dominant allocation rule. Instead, it is most useful near an architecture-dependent recoverability transition, where standard structural or magnitude-based allocation priors begin to lose reliability but post-repair recovery has not yet fully collapsed. On CIFAR100 ResNet18, a fine-grained sweep shows that RR improves over ERK across the central transition band and surpasses LAMP near the upper part of this band. A projection-forced ablation further shows that capped ERK can over-protect projection layers, shifting excessive sparsity onto regular convolutions and reducing post-repair recovery. These results suggest that high-sparsity pruning should allocate not only retained weights, but also repairable damage.
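The RR ratio described above can be illustrated for a single channel. The sketch assumes mean squared error as the distortion measure and a pure rescaling as the variance-matching repair; both choices, and all function names, are illustrative assumptions rather than the paper's exact definitions.

```python
import statistics

def distortion(reference, other):
    """Mean squared difference between two activation traces."""
    return sum((r - o) ** 2 for r, o in zip(reference, other)) / len(reference)

def variance_matching_repair(dense, pruned):
    """Rescale the pruned channel so its variance matches the dense channel."""
    std_pruned = statistics.pstdev(pruned)
    if std_pruned == 0.0:
        return list(pruned)  # a dead channel cannot be repaired by rescaling
    scale = statistics.pstdev(dense) / std_pruned
    return [scale * p for p in pruned]

def relative_repairability(dense, pruned):
    """Fraction of the raw distortion that remains after repair."""
    raw = distortion(dense, pruned)
    if raw == 0.0:
        return 0.0
    residual = distortion(dense, variance_matching_repair(dense, pruned))
    return residual / raw
```

A value near 0 means pruning mostly shrank the channel, which rescaling fixes; a value near 1 means the damage changed the activation pattern itself and survives the repair.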

2605.25499 2026-05-26 cs.LG 版本更新

Accelerated Dynamic Importance Weighting with Versatile Divergence-Minimizing Estimators

加速动态重要性加权与通用散度最小化估计器

Tongtong Fang, Nan Lu, Gang Niu, Kenji Fukumizu, Masashi Sugiyama

发表机构 * The Institute of Statistical Mathematics(统计数学研究所) University of Bristol(布里斯托大学) RIKEN Center for Advanced Intelligence Project(RIKEN高级智能项目中心) The University of Tokyo(东京大学)

AI总结 针对联合分布偏移问题,提出加速动态重要性加权(ADIW)框架,通过轻量投影梯度下降和通用散度最小化,在提升效率的同时实现最优性能。

详情
AI中文摘要

重要性加权(IW)是解决联合分布偏移的黄金方法,其中训练数据和测试数据的联合分布不同。为解决此问题,IW估计测试与训练密度比作为重要性权重,并相应地重新加权训练损失。最近动态IW(DIW)的进展将权重估计集成到模型训练中,实现了深度模型的可扩展IW,并在大型现代数据集上取得了强劲性能。尽管有前景,DIW仍存在两个局限。首先,它通过在每个小批量中求解核均值匹配(KMM)诱导的优化问题至收敛,导致大量计算开销。其次,它仅依赖KMM进行权重估计,而IW文献包含基于不同散度度量的多种估计方法。本文提出加速动态IW(ADIW),一个统一且高效的联合分布偏移下深度学习IW框架。ADIW执行少量轻量投影梯度下降更新,从先前更新的权重热启动,显著提高效率。此外,ADIW将DIW推广为一个统一的散度最小化框架,以即插即用方式支持多种权重估计方法,包括基于Kullback-Leibler散度、平方距离和Wasserstein-1距离的方法。我们在温和条件下建立了ADIW的收敛保证,实证结果表明ADIW在实现最先进IW性能的同时,效率大幅提升。

英文摘要

Importance weighting (IW) is a golden solver for joint distribution shift, where the joint distributions differ between the training and test data. To solve this problem, IW estimates test-to-training density ratios as importance weights and reweights the training losses accordingly. Recent advances in dynamic IW (DIW) integrate weight estimation into model training, enabling scalable IW for deep models and achieving strong performance on large modern datasets. Despite its promise, DIW remains limited in two aspects. First, it incurs substantial computational overhead by solving a kernel mean matching (KMM)-induced optimization problem to convergence in every mini-batch. Second, it relies solely on KMM for weight estimation, whereas the IW literature contains diverse estimation methods based on different divergence measures. In this paper, we propose accelerated DIW (ADIW), a unified and efficient IW framework for deep learning under joint distribution shift. ADIW performs a few lightweight projected gradient descent updates that warm-start from previously updated weights, substantially improving efficiency. Moreover, ADIW generalizes DIW into a unified divergence-minimization framework that supports diverse weight-estimation methods in a plug-and-play manner, including those based on the Kullback-Leibler divergence, squared distance, and Wasserstein-1 distance. We establish convergence guarantees for ADIW under mild conditions, and empirical results demonstrate that ADIW achieves state-of-the-art IW performance while being substantially more efficient.

2605.25492 2026-05-26 cs.LG 版本更新

SafetyRepro: Configuration-Conditional Rank Instability on Alignment Benchmarks

SafetyRepro: 对齐基准上的配置条件排名不稳定性

Yanhang Li, Zhichao Fan, Zexin Zhuang

发表机构 * Northeastern University, Boston, MA, USA(东北大学) University of Illinois Urbana-Champaign, Urbana, IL, USA(伊利诺伊大学厄巴纳-香槟分校) Southern Methodist University, Dallas, TX, USA(南卫理公会大学)

AI总结 本文通过理论命题和提交戳评估协议,证明对齐基准上的成对模型比较结果(如“A比B更安全”)会因未指定的配置选择而发生严格反转。

详情
AI中文摘要

从基础模型基准中得出的成对模型比较(“A比B更安全”)被视为定量结论,但依赖于基准论文未充分指定的工具选择。我们在此原语上闭合了一个理论-基准循环:一个有限包络命题,将可测量的成对不一致率与严格排序是否允许配置对反转联系起来,并配以一个提交戳评估协议,该协议在广泛引用的对齐基准上实现了这一命题。在我们测试的每个基准上,仅配置选择就能翻转成对结论;该命题隔离了这种严格反转的失败模式。

英文摘要

Pairwise model comparisons drawn from foundation-model benchmarks ("A is safer than B") are read as quantitative verdicts but hinge on harness choices benchmark papers under-specify. We close one theory-benchmark loop on this primitive: a finite-envelope proposition tying a measurable pairwise-disagreement rate to whether the strict ordering admits a configuration-pair reversal, paired with a commit-stamped evaluation protocol that operationalises it on widely cited alignment benchmarks. On every benchmark we test, configuration choice alone can flip the pairwise verdict; the proposition isolates this strict-reversal failure mode.
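To make the notion of a configuration-conditional reversal concrete, the sketch below takes the scores of two models under several harness configurations and reports how often two configurations give opposite strict verdicts. The definition of the disagreement rate used here (fraction of configuration pairs with opposite signs) is an assumption for illustration; the paper's proposition may define it differently.

```python
from itertools import combinations

def pairwise_disagreement(scores_a, scores_b):
    """scores_a[i], scores_b[i]: scores of models A and B under configuration i.
    Returns (disagreement rate over configuration pairs, reversal exists)."""
    # Verdict per configuration: +1 if A beats B, -1 if B beats A, 0 on a tie.
    verdicts = [(a > b) - (a < b) for a, b in zip(scores_a, scores_b)]
    pairs = list(combinations(verdicts, 2))
    flips = sum(1 for u, v in pairs if u * v < 0)
    return flips / len(pairs), flips > 0
```

If the second return value is true, the claim "A is safer than B" holds under one configuration and its strict opposite holds under another, so the verdict cannot be reported without naming the configuration.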

2605.25469 2026-05-26 cs.LG 版本更新

JacQuant: STE-Free Quantization-Aware Training via Learned Jacobian Surrogates

JacQuant: 通过学习的雅可比替代实现无STE的量化感知训练

Kai Yi, Vignesh Vivekraja, Harshit Khaitan, Steven Li

发表机构 * Meta AI

AI总结 提出JacQuant框架,通过学习模型对参数变化的局部敏感性的轻量级替代,在不使用直通估计器的情况下稳定和加速量化感知训练,在低比特大语言模型上达到更高精度。

详情
AI中文摘要

量化感知训练(QAT)被广泛部署,但通常依赖于直通估计器(STE),它通过强行将梯度通过不可微量化器传递。这常常使得训练在边界附近脆弱,并且与低精度模型的实际行为弱对齐。我们引入JacQuant,一个QAT框架,它学习模型对参数变化的局部敏感性的轻量级替代,并使用它在标准方差缩减优化器中稳定和加速训练。该替代是廉价的(对角或块对角)、数据驱动的,并且与常见的权重和激活量化器兼容。在代码保持的训练阶段,我们证明了非凸目标的收敛性,并在PL条件下获得了线性速率,并通过简单的校准论证将学习到的敏感性与端到端输出保真度联系起来。在$\leq 2$比特的LLM基准测试中,JacQuant始终达到比基于STE的QAT更高的准确度,并且对各种模型的运行时分析表明,在实际分组大小下,额外成本可以忽略不计。该方法即插即用,无需更改前向量化器;我们的实证声明仅限于超低比特LLM QAT。

英文摘要

Quantization-aware training (QAT) is widely deployed but typically relies on the Straight-Through Estimator (STE), which passes gradients through non-differentiable quantizers by fiat. This often makes training brittle near bin boundaries and weakly aligned with the actual behavior of the low-precision model. We introduce JacQuant, a QAT framework that learns a lightweight surrogate of the model's local sensitivity to parameter changes and uses it to stabilize and accelerate training within standard variance-reduced optimizers. The surrogate is inexpensive (diagonal or block-diagonal), data-driven, and compatible with common weight and activation quantizers. On code-preserving training phases, we prove convergence for non-convex objectives and obtain linear rates under a PL condition, and we relate the learned sensitivity to end-to-end output fidelity via a simple calibration argument. Across LLM benchmarks at $\leq 2$ bits, JacQuant consistently reaches higher accuracy than STE-based QAT, and the runtime analyses on various models show that the added cost remains negligible under practical group sizes. The method is drop-in and requires no changes to the forward quantizers; our empirical claims are scoped to ultra-low-bit LLM QAT.

2605.25460 2026-05-26 stat.ML cs.LG 版本更新

Mean-Shift PCA by Knockoff Mean

通过Knockoff均值的Mean-Shift PCA

Mengda Li, Zeng Li, Jianfeng Yao

发表机构 * Department of Statistics and Data Science, Southern University of Science and Technology(统计与数据科学系,南方科技大学) School of Data Science, The Chinese University of Hong Kong, Shenzhen, China(数据科学学院,香港中文大学(深圳))

AI总结 提出一种通过故意引入knockoff均值扰动来消除PCA中均值偏移噪声的方法,利用随机矩阵理论证明均值偏移尖峰与原始协方差特征值谱可分离,并设计了两阶段PCA算法。

Comments ICML 2026

详情
AI中文摘要

去除噪声是困难的,但添加噪声是容易的。在这项工作中,我们展示了如何通过故意引入knockoff均值扰动来消除PCA中的均值偏移噪声成分。标准PCA对样本均值的偏移高度敏感:来自偏移分布的一小部分样本可能导致主成分方向的大偏差。在高维情况下,现有的鲁棒PCA方法无法处理混合模型中固有的均值偏移污染结构。利用随机矩阵理论工具,我们证明了均值偏移尖峰在谱上与原始协方差的稳定特征值可分离。此外,原始特征空间渐近地不受污染影响,与混合权重无关。利用这种谱稳定性,我们提出了一种简单的两阶段PCA算法,通过添加knockoff均值,仅使用标准PCA操作来识别和移除均值偏移成分。

英文摘要

Removing noise is difficult, but adding noise is easy. In this work, we show how to eliminate mean-shift noise components from PCA by deliberately introducing a knockoff mean-shift perturbation. Standard PCA is highly sensitive to shifts in the sample mean: a small fraction of samples from a shifted distribution can cause large deviations in the leading principal components. In high-dimensional regimes, existing Robust PCA approaches cannot handle the mean-shift contamination structure inherent in the mixture model. Using tools from Random Matrix Theory, we prove that the mean-shift spikes are spectrally separable from the stable eigenvalues of the original covariance. Furthermore, the original eigenspace remains asymptotically invariant to the contamination, independent of the mixture weight. Exploiting this spectral stability, we propose a simple two-stage PCA algorithm that adds a knockoff mean to identify and remove the mean-shift component using only standard PCA operations.

2605.25459 2026-05-26 cs.LG cs.AI 版本更新

From Simulation to Enaction: Post-trained language models recognize and react to their own generations

从模拟到行动:后训练语言模型识别并回应自身生成

Asvin G., Jack Lindsey

发表机构 * Institute for Advanced Study, Princeton(普林斯顿高级研究院) Anthropic

AI总结 本文发现后训练语言模型能够识别自身生成(on-policy)并降低输出熵,通过内部表示输入意外性来调节,且显式识别与隐式识别机制不同。

Comments Anthropic fellows project mentored by Jack Lindsey

详情
AI中文摘要

语言模型被预训练为被动预测器,没有动机去建模自身输出的后果。后训练改变了这一点:产生自身响应的模型可以从识别自身处于on-policy状态中获益。我们提供证据表明,后训练模型识别其on-policy生成,并且这种识别隐式编码在其输出分布中。特别是,在不同模型家族和规模类别中,on-policy输出分布熵比off-policy熵低3-4倍。我们将这种效应的部分原因追溯到输入意外性的内部表示,该表示跟踪模型先前预测中最新的输入标记的不可能性,并因果性地调节输出熵。这些现象的一个例子可以在对开放式提示的响应中观察到;后训练模型(与预训练模型不同)在第一个输出标记之前就将其对即将生成的响应主题的不确定性坍缩;用不同主题的前缀违反这种缓存意图会导致更高的输出熵。我们还测试了模型是否可以通过显式口头报告区分on-policy上下文和前缀。我们发现它们可以,但有趣的是,这种显式识别通过不同于隐式识别的机制进行路由。

英文摘要

Language models are pretrained as passive predictors with no incentive to model the consequences of their own outputs. Post-training changes this: a model producing its own responses can benefit from recognizing that it is on-policy. We present evidence that post-trained models recognize their on-policy generations, and this recognition is implicitly encoded in their output distributions. In particular, on-policy output distribution entropy is 3--4$\times$ lower than off-policy entropy, across model families and size classes. We trace part of this effect to an internal representation of input surprise, tracking the unlikeliness of the most recent input token according to the model's prior predictions, that causally modulates output entropy. One example of these phenomena can be observed in response to open-ended prompts; post-trained models (unlike pretrained models) collapse their uncertainty over the topic of their upcoming response before the first output token; violating this cached intention with a different-topic prefill results in higher output entropy. We also tested whether models can distinguish on-policy contexts from prefills via explicit verbal report. We find that they can, but that interestingly, this explicit recognition routes through a different mechanism than implicit recognition.
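The quantity compared in this abstract, output-distribution entropy, is straightforward to compute from next-token logits. The sketch below averages Shannon entropy over positions; the function names are illustrative, and any logits passed in would have to come from running an actual model on on-policy and off-policy contexts, which is not shown.

```python
import math

def softmax(logits):
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_entropy(logit_rows):
    """Average next-token entropy over a sequence of positions."""
    return sum(entropy(softmax(row)) for row in logit_rows) / len(logit_rows)
```

The reported effect would correspond to `mean_entropy` on off-policy contexts divided by `mean_entropy` on on-policy contexts falling in the range of roughly 3 to 4.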

2605.25452 2026-05-26 stat.ME cs.LG stat.ML 版本更新

Different Statistical Perspectives for Understanding Generalisation in Graph Neural Networks

理解图神经网络泛化能力的不同统计视角

Nil Ayday, Mahalakshmi Sabanayagam, Debarghya Ghoshdastidar

发表机构 * Technical University of Munich(慕尼黑工业大学) Australian Institute for Machine Learning(澳大利亚机器学习研究所) Adelaide University(阿德莱德大学)

AI总结 本文从学习理论、无限参数/图渐近和随机图模型三个统计框架综述图神经网络泛化性的理论进展。

Comments 15 pages, 4 figures, submission for Special Issue in AStA Advances in Statistical Analysis

详情
AI中文摘要

图神经网络(GNN)是目前用于图结构数据学习和预测的最流行方法,已部署在从社交网络分析到药物发现的各种领域。然而,对GNN性能的数学理解仍然有限。我们讨论了用于研究GNN统计泛化性的各种视角。我们识别出三个广泛的框架。第一种方法根植于学习理论,依赖于一致收敛界和特定GNN架构假设类的复杂度。该方法还建立在GNN的表达性之上,通常通过图同构测试的视角进行研究。第二个原则是通过分析无限多参数或无限图大小渐近下的GNN来简化神经架构。该方法使用高斯过程、神经正切核或图极限(graphon)神经网络算子来近似GNN,从而可以研究训练后GNN的泛化性或稳定性。第三个框架在随机图模型(通常是上下文随机块模型)下研究GNN,并利用高维统计工具推导非渐近误差率。我们强调了一些关键的理论结果,并讨论了每个视角的一些局限性和开放研究问题。

英文摘要

Graph Neural Networks (GNN) are currently the most popular approach for learning and prediction on graph-structured data and are deployed in various fields, from social network analysis to drug discovery. However, there is limited mathematical understanding of the performance of GNNs. We discuss the various perspectives used to study statistical generalisation in GNNs. We identify three broad frameworks. The first approach, rooted in learning theory, relies on uniform convergence bounds and the complexity of the hypothesis class of specific GNN architectures. This approach also builds on the expressivity of GNNs, typically studied through the lens of graph isomorphism tests. The second principle is to simplify the neural architecture by analysing GNNs under the asymptotics of infinitely many parameters or infinite graph size. This approach approximates GNNs using Gaussian processes, neural tangent kernels or graphon neural network operators, which allow studying the generalisation or stability of trained GNNs. The third framework studies GNNs under random graph models, often the contextual stochastic block model, and derives non-asymptotic error rates using tools from high-dimensional statistics. We highlight some key theoretical results and discuss a few limitations and open research questions for each perspective.

2605.25446 2026-05-26 cs.AI cs.LG 版本更新

A Signal-Language Foundation Model for Broad-Spectrum Cardiovascular Assessment from Routine Electrocardiography

面向常规心电图广谱心血管评估的信号-语言基础模型

Ziqing Yu, Yuhui Tao, Jiayu Huo, Lei Pan, Zilong Xiao, Juecheng Chen, Xiao Li, Jianxuan Li, You Zhou, Zhixing Li, Cong Wang, Beijian Zhang, Chen Chen, Hongyang Lu, Konstantinos Patlatzoglou, Daniel B. Kramer, Jonathan W. Waks, Yangang Su, Fu Siong Ng, Shuo Wang, Yixiu Liang, Junbo Ge

发表机构 * Department of Cardiology, Zhongshan Hospital of Fudan University(复旦大学中山医院心内科) Shanghai Institute of Cardiovascular Diseases, National Clinical Research Centre for Interventional Medicine(上海心血管病研究所,国家介入医学临床研究中心) Digital Medical Research Center, School of Basic Medical Sciences, Fudan University(复旦大学基础医学院数字医学研究中心) Shanghai Key Laboratory of Medical Imaging Computing and Computer Assisted Intervention(上海市医学图像计算与计算机辅助介入重点实验室) National Heart and Lung Institute, Imperial College London, Hammersmith Hospital, Du Cane Road(伦敦帝国理工学院国家心肺研究所,哈默史密斯医院,Du Cane Road) Department of Cardiology, Shanghai Geriatric Medical Center(上海老年医学中心心内科) Cardiac Rhythm Management, Medtronic Technology Center, Medtronic (Shanghai) Ltd.(美敦力(上海)有限公司美敦力技术中心心律管理部) Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology, Beth Israel Deaconess Medical Center, Harvard Medical School(哈佛医学院贝斯以色列女执事医疗中心 Richard A. and Susan F. Smith 心脏病学结局研究中心) Harvard-Thorndike Electrophysiology Institute, Beth Israel Deaconess Medical Center, Harvard Medical School(哈佛医学院贝斯以色列女执事医疗中心 Harvard-Thorndike 电生理研究所) Department of Cardiology, Imperial College Healthcare NHS Trust(帝国理工学院医疗保健 NHS 信托心内科) Department of Cardiology, Chelsea and Westminster NHS Foundation Trust(切尔西和威斯敏斯特 NHS 基金会信托心内科) Department of Computer Science and Technology, University of Cambridge(剑桥大学计算机科学与技术系)

AI总结 提出ECGCLIP信号-语言对比学习框架,通过大规模心电图-报告预训练,在89项下游任务中超越基线,实现对常见心律失常、超声心动图靶标及罕见心脏病的广谱评估。

详情
AI中文摘要

心电图(ECG)是心血管诊疗的核心,但传统AI模型通常局限于常见心律失常,且在不同人群或临床细微疾病中泛化能力较差。我们开发了ECGCLIP(心电图对比语言-图像预训练),一种信号-语言对比学习框架,将ECG波形与专家诊断报告对齐。ECGCLIP在来自1,324,856名患者的2,837,962份心电图研究上进行了预训练,并在一个留出内部测试集以及包含约150万份心电图的九个独立外部队列上进行了评估。评估覆盖89项下游任务,包括45项心电图诊断、39项超声心动图靶标和5种罕见心脏病,以PRAUC为主要指标。ECGCLIP在随机初始化和Merl-R18基线上持续提升性能。在内部测试集上,ECGCLIP-R34对心房颤动(PRAUC 0.900)和ST段抬高型心肌梗死(PRAUC 0.383)表现出强劲性能,并在所有外部队列中具有稳健泛化能力。它还改善了低患病率和诊断困难的疾病,包括埃布斯坦畸形、缩窄性心包炎、右位心和心脏淀粉样变性,内部PRAUC值分别为0.253、0.175、0.121和0.201。ECGCLIP数据高效,仅使用10%的训练数据即可达到或超过全数据集基线性能。特征可视化和显著性分析表明,其学习到的表示与既定心电图标准具有临床意义的对齐。这些发现表明,大规模心电图-报告对比预训练可以将常规心电图解读从常见心律失常扩展到广谱心血管评估以及超声心动图和罕见病的机会性筛查。

英文摘要

Electrocardiography (ECG) is central to cardiovascular care, but conventional AI models are often restricted to common arrhythmias and may generalize poorly across populations or clinically subtle diseases. We developed ECG Contrastive Language-Image Pre-training (ECGCLIP), a signal-language contrastive learning framework that aligns ECG waveforms with expert diagnostic reports. ECGCLIP was pre-trained on 2,837,962 ECG studies from 1,324,856 patients and evaluated on a held-out internal test set plus nine independent external cohorts comprising about 1.5 million ECGs. Evaluation covered 89 downstream tasks, including 45 ECG diagnoses, 39 echocardiographic targets, and 5 rare cardiac diseases, using PRAUC as the primary metric. ECGCLIP consistently improved performance over random initialization and Merl-R18 baselines. On the internal test set, ECGCLIP-R34 achieved strong performance for atrial fibrillation (PRAUC 0.900) and ST-segment elevation myocardial infarction (PRAUC 0.383), with robust generalization across all external cohorts. It also improved low-prevalence and diagnostically elusive diseases, including Ebstein anomaly, constrictive pericarditis, dextrocardia, and cardiac amyloidosis, with internal PRAUC values of 0.253, 0.175, 0.121, and 0.201, respectively. ECGCLIP was data efficient, matching or exceeding full-dataset baseline performance with only 10% of training data. Feature visualization and saliency analysis suggested clinically meaningful representations aligned with established electrocardiographic criteria. These findings indicate that large-scale ECG-report contrastive pre-training can expand routine ECG interpretation beyond common arrhythmias toward broad cardiovascular assessment and opportunistic screening of echocardiographic and rare conditions.
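
The signal-language alignment described above can be illustrated with the symmetric contrastive objective popularized by CLIP. This is a minimal sketch assuming that ECGCLIP follows the standard CLIP loss; the abstract does not state the exact objective, and the temperature value and function names here are hypothetical.

```python
import math

def _normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def _cross_entropy(logits, target):
    """Cross-entropy of one row of logits against the index of the true pair."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return lse - logits[target]

def clip_loss(signal_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss: the i-th ECG embedding should match the
    i-th report embedding, and vice versa (temperature is an assumed value)."""
    s = [_normalize(v) for v in signal_emb]
    t = [_normalize(v) for v in text_emb]
    n = len(s)
    logits = [[sum(a * b for a, b in zip(s[i], t[j])) / temperature
               for j in range(n)] for i in range(n)]
    signal_to_text = sum(_cross_entropy(logits[i], i) for i in range(n)) / n
    text_to_signal = sum(
        _cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (signal_to_text + text_to_signal)
```

The loss is near zero when each waveform embedding lies closest to its own report embedding and grows when pairs are mismatched, which is what pushes the two encoders toward a shared space.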

2605.25439 2026-05-26 cs.LG 版本更新

Missing Pattern Recognized Diffusion Imputation Model for Missing Not At Random

缺失非随机识别的扩散插补模型

Gyuwon Sim, Sumin Lee, Heesun Bae, Byeonghu Na, Doyun Kwon, Ju-Hee Hwang, Jae-Young Lim, Il-Chul Moon

发表机构 * KAIST(韩国科学技术院) Seoul National University(首尔国立大学)

AI总结 针对缺失非随机(MNAR)问题,提出缺失模式识别扩散插补模型(PRDIM),通过模式识别器和EM算法最大化联合分布似然,实现精确插补。

详情
AI中文摘要

缺失数据在包括时间序列和图像在内的多个领域中频繁出现。在现实世界中,缺失的发生往往依赖于不可观测的值本身,这被称为缺失非随机(MNAR)。在这项工作中,我们引入了缺失模式识别扩散插补模型(PRDIM),这是一个新颖的框架,它显式地捕获缺失模式并精确插补未观测值。PRDIM在期望最大化(EM)算法下迭代地最大化观测值和缺失掩码的联合分布似然。从这个意义上说,我们首先采用一个模式识别器,它近似潜在的缺失模式,并在每次推理中提供指导,以针对缺失信息进行更合理的插补。通过大量实验,我们证明PRDIM在多种数据模态的MNAR设置下始终实现强大的插补性能。

英文摘要

Missing data frequently arises across diverse domains, including time-series and image domains. In the real world, missing occurrences often depend on the unobservable values themselves, which are referred to as Missing Not at Random (MNAR). In this work, we introduce the Missing Pattern Recognized Diffusion Imputation Model (PRDIM), a novel framework that explicitly captures the missing pattern and precisely imputes unobserved values. PRDIM iteratively maximizes the likelihood of the joint distribution for observed values and missing mask under an Expectation-Maximization (EM) algorithm. In this sense, we first employ a pattern recognizer, which approximates the underlying missing pattern and provides guidance during every inference toward more plausible imputations with respect to the missing information. Through extensive experiments, we demonstrate that PRDIM consistently achieves strong imputation performance under MNAR settings across multiple data modalities.
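
To see why MNAR is harder than missingness at random, the following sketch simulates a self-masking mechanism in which the probability that a value goes missing depends on the value itself. The logistic form, the `slope` and `threshold` parameters, and the function name are assumptions chosen for illustration; they are not PRDIM's pattern recognizer.

```python
import math
import random

def mnar_mask(values, slope=2.0, threshold=0.0, rng=None):
    """Return a list of booleans (True = observed).

    A value v is missing with probability sigmoid(slope * (v - threshold)),
    so larger values are hidden more often: the mask carries information
    about the very values it hides.
    """
    rng = rng or random.Random(0)
    mask = []
    for v in values:
        p_missing = 1.0 / (1.0 + math.exp(-slope * (v - threshold)))
        mask.append(rng.random() >= p_missing)
    return mask

def observed_mean(values, mask):
    """Mean of the observed entries only."""
    kept = [v for v, m in zip(values, mask) if m]
    return sum(kept) / len(kept)
```

With standard-normal data, `observed_mean` comes out clearly below the true mean, which is the bias an imputer must correct by modeling the mask jointly with the observed values.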

2605.25429 2026-05-26 cs.LG 版本更新

Rethinking Feature Alignment in Generalist Graph Anomaly Detection: A Relational Fingerprint-based Approach

重新思考通用图异常检测中的特征对齐:一种基于关系指纹的方法

Yujing Liu, Yixin Liu, Yu Zheng, Alan Wee-Chung Liew, Xiaofeng Cao, Shirui Pan

发表机构 * Griffith University, Gold Coast, Australia(格里菲斯大学,澳大利亚黄金海岸) Tongji University, Shanghai, China(同济大学,上海,中国)

AI总结 针对通用图异常检测中特征对齐忽略语义导致负迁移的问题,提出基于关系指纹的通用方法ReFi-GAD,通过编码上下文和结构异常指示线索的语义感知指纹,结合Transformer编码器和SNR引导的领域自适应模块,在14个数据集上显著超越现有方法。

Comments 9 pages, 7 figures. Accepted by ICML 2026

详情
AI中文摘要

通用图异常检测(GAD)旨在无需针对特定图进行重新训练即可检测未见图上的异常。然而,现有方法主要关注通过基于PCA的投影来对齐不同数据域间的异构特征,这种对齐方式虽然统一了特征维度,却忽略了特征语义。因此,GAD模型无法学习可迁移的语义知识,甚至在未见图上表现出负迁移。为解决此问题,我们提出一种基于关系指纹的通用GAD方法(简称ReFi-GAD),通过一种通用的、语义感知的关系指纹(ReFi)对齐异构原始特征,该指纹从上下文和结构两个角度编码异常指示线索。基于ReFi,我们设计了一个基于指纹的通用GAD模型,该模型结合了基于Transformer的编码器以捕获领域不变知识,以及一个SNR引导的细化模块用于领域特定自适应。在14个数据集上的大量实验表明,ReFi-GAD显著优于现有最先进方法。

英文摘要

Generalist graph anomaly detection (GAD) aims to detect anomalies on unseen graphs without graph-specific retraining. Nevertheless, existing approaches primarily focus on aligning heterogeneous features across different data domains via PCA-based projection, which harmonizes feature dimensions but ignores feature semantics. As a result, GAD models fail to learn transferable semantic knowledge, and even exhibit negative transfer on unseen graphs. To address this issue, we propose a Relational Fingerprint-based generalist GAD approach (ReFi-GAD for short), aligning heterogeneous raw features with a universal and semantics-aware Relational Fingerprint (ReFi) that encodes anomaly-indicative cues from both contextual and structural perspectives. Building on ReFi, we design a fingerprint-grounded generalist GAD model, which combines a transformer-based encoder to capture domain-invariant knowledge with an SNR-guided refinement module for domain-specific adaptation. Extensive experiments on 14 datasets demonstrate that ReFi-GAD significantly outperforms state-of-the-art methods.

2605.25424 2026-05-26 cs.LG cs.AI 版本更新

SeqRoute: Global Budget-Aware Sequential LLM Routing via Offline Reinforcement Learning

SeqRoute: 通过离线强化学习实现全局预算感知的顺序LLM路由

Zhongling Xu, Shunan Zheng, Wei Wang

发表机构 * Department of Operations Research and Industrial Engineering(运筹学与工业工程系)

AI总结 提出SeqRoute框架,将多轮LLM路由建模为有限时域马尔可夫决策过程,通过离线强化学习(CQL)和事后预算重标记(HBR)学习延迟满足,在全局预算约束下优化成本与质量,降低破产率至1%以下。

详情
AI中文摘要

现有的LLM路由框架将查询视为独立事件,忽略了受全局计算预算约束的真实用户会话的顺序性质。这种不匹配不可避免地导致预算破产:短视的路由策略在早期交互中耗尽资源,迫使后续通常更复杂的查询使用不充分的模型。我们引入SeqRoute,一个将多轮路由建模为有限时域马尔可夫决策过程并通过离线强化学习求解的框架。通过将剩余预算纳入状态空间并使用保守Q学习(CQL)进行训练,SeqRoute学习延迟满足以策略性地为会话后期的高风险轮次保留资源。为了克服数据匮乏,我们提出事后预算重标记(HBR)。该技术在不同假设预算下回顾性地模拟历史轨迹,将10,000个原始会话扩展为238万个包含关键破产信号的转换。在部署时,动态λ扫描机制无需重新训练即可实现成本-质量帕累托前沿的零样本导航。大量评估表明,SeqRoute在保持或提高质量的同时将运营成本降低6.0-73.5%,并将破产率抑制在1%以下,在整个帕累托前沿上严格优于行为克隆、预算感知启发式和静态基线。

英文摘要

Existing LLM routing frameworks treat queries as independent events, neglecting the sequential nature of real-world user sessions constrained by global computational budgets. This mismatch inevitably leads to budget bankruptcy: myopic routing policies exhaust resources on early interactions, forcing subsequent and often more complex queries onto inadequate models. We introduce SeqRoute, a framework that formulates multi-turn routing as a finite-horizon Markov Decision Process and solves it via offline reinforcement learning. By incorporating the remaining budget into the state space and training with Conservative Q-Learning (CQL), SeqRoute learns delayed gratification to strategically preserve resources for high-stakes turns later in the session. To overcome data starvation, we propose Hindsight Budget Relabeling (HBR). This technique retrospectively simulates historical trajectories under diverse hypothetical budgets, expanding 10,000 raw sessions into 2.38 million transitions enriched with critical bankruptcy signals. At deployment, a dynamic $\lambda$-sweep mechanism enables zero-shot navigation of the cost-quality Pareto frontier without retraining. Extensive evaluations demonstrate that SeqRoute reduces operational costs by 6.0-73.5% while maintaining or improving quality, and suppresses bankruptcy rates to under 1%, strictly dominating behavior cloning, budget-aware heuristics, and static baselines across the entire Pareto frontier.
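
The relabeling idea can be sketched as replaying one logged session under several hypothetical budgets and recording where it would have gone bankrupt. This is an illustrative reading of HBR, not the authors' implementation: the `Turn` and `Transition` records, the reward definition (quality per turn) and the bankruptcy penalty are all assumptions.

```python
from typing import List, NamedTuple

class Turn(NamedTuple):
    action: int     # index of the model the logged policy routed to
    cost: float     # cost charged for this turn
    quality: float  # quality score observed for this turn

class Transition(NamedTuple):
    turn: int
    remaining: float       # budget left before acting (part of the state)
    action: int
    reward: float
    next_remaining: float
    done: bool
    bankrupt: bool

def relabel(session: List[Turn], budget: float,
            bankruptcy_penalty: float = 1.0) -> List[Transition]:
    """Replay one logged session as if it had started with `budget`."""
    out, remaining = [], budget
    for t, turn in enumerate(session):
        if turn.cost > remaining:
            # The logged action is unaffordable: terminal bankruptcy signal.
            out.append(Transition(t, remaining, turn.action,
                                  -bankruptcy_penalty, remaining, True, True))
            break
        nxt = remaining - turn.cost
        out.append(Transition(t, remaining, turn.action, turn.quality,
                              nxt, t == len(session) - 1, False))
        remaining = nxt
    return out

def hindsight_budget_relabel(sessions, budgets, bankruptcy_penalty=1.0):
    """Expand every session into transitions under every hypothetical budget."""
    return [tr for s in sessions for b in budgets
            for tr in relabel(s, b, bankruptcy_penalty)]
```

Because each session is replayed once per budget, a modest log multiplies into many transitions, some of which end in bankruptcy; this is the kind of expansion the abstract reports (10,000 sessions into 2.38 million transitions).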

2605.25419 2026-05-26 cs.LG 版本更新

Capture-Calibrate-Coach: A Graph-Based Framework for Knowledge Monitoring Estimation and Adaptive Feedback

捕获-校准-指导:基于图的知识监控估计与自适应反馈框架

Gen Li, Li Chen, Cheng Tang, Boxuan Ma, Yuncheng Jiang, Daisuke Deguchi, Takayoshi Yamashita, Atsushi Shimada

发表机构 * Kyushu University(九州大学) Osaka Kyoiku University(大阪教育大学) South China Normal University(华南师范大学) Nagoya University(名古屋大学) Chubu University(中部大学)

AI总结 提出Capture-Calibrate-Coach框架,通过异构图神经网络推断学习者未明确提及概念的知识状态,并基于元认知模式提供个性化反馈,在684名学生中预测潜在感知状态AUC达85.21%。

Comments To be published in Proceedings of the 27th International Conference on Artificial Intelligence in Education (AIED 2026)

详情
AI中文摘要

有效的学习支持不仅需要了解学习者知道什么,还需要了解他们如何准确地感知自己的理解。这种元认知维度,称为知识监控,从根本上影响自我调节学习,然而这一维度在当前系统中仍未得到充分探索。本文介绍了用于自适应学习支持的捕获-校准-指导(3C)框架。捕获阶段从开放式自我报告中提取学习者的感知知识状态,构建连接学习者和知识概念的异构图。校准阶段应用异构图神经网络来推断未明确提及的概念的潜在感知状态,从而实现系统的知识监控评估。指导阶段将学习者分为五种元认知模式,并提供针对知识差距和校准误差的个性化反馈。对684名学生的评估显示,预测潜在感知状态的AUC达到85.21%,显著优于基线方法。一项包含47名参与者的用户研究表明,参与者对反馈质量持积极态度,尤其重视关于知识差距的具体反馈和可操作的学习指导。这些发现将基于AI的学习支持推向元认知队友,在支持知识增长的同时培养准确的自我意识。

英文摘要

Effective learning support requires understanding not only what learners know but also how accurately they perceive their own understanding. This metacognitive dimension, known as knowledge monitoring, fundamentally influences self-regulated learning, yet this dimension remains underexplored in current systems. This paper introduces the Capture-Calibrate-Coach (3C) framework for adaptive learning support. The Capture phase extracts learners' perceived knowledge states from open-ended self-reports to construct a heterogeneous graph linking learners and knowledge concepts. The Calibrate phase applies a heterogeneous graph neural network to infer latent perceived states for concepts not explicitly mentioned, enabling systematic knowledge monitoring assessment. The Coach phase classifies learners into five metacognitive patterns and delivers personalized feedback addressing both knowledge gaps and calibration errors. Evaluation with 684 students demonstrates 85.21% AUC in predicting latent perceived states, significantly outperforming baseline methods. A user study with 47 participants shows positive reception of feedback quality, with participants particularly valuing concrete feedback on knowledge gaps and actionable study guidance. These findings advance AI-based learning support toward metacognitive teammates that foster accurate self-awareness while supporting knowledge growth.

2605.25418 2026-05-26 cs.CV cs.GR cs.LG 版本更新

Generating 3D models from sketches of human faces using a combined approach of Convolutional Neural Networks, Procedural Modeling, and Contour Mapping

利用卷积神经网络、程序化建模和轮廓映射的联合方法从人脸素描生成3D模型

Nancy Iskander

发表机构 * Behaviour Digital

AI总结 提出一种结合卷积神经网络、参数化3D人脸模型和主动蛇形轮廓的新方法,首次通过训练CNN检测素描中的表情并生成对应3D模型。

Comments A thesis submitted in conformity with the requirements for the degree of Master of Science in Computer Science Graduate Department of Computer Science University of Toronto

详情
AI中文摘要

从人脸素描生成3D模型是计算机图形学中的一个活跃研究课题,因为它有潜力极大地促进专业3D艺术家和新手的建模工作。受面部表情显著改变和塑造面部轮廓这一观察的启发,我们的方法结合了表情检测和3D模型生成。结果是一种从素描生成3D模型的新方法,它依赖于三个组成部分:卷积神经网络、参数化3D人脸模型(Valley Girl)和主动蛇形轮廓。在文献中首次,CNN(使用我们自己生成的数据集)被训练通过检测活跃的FACS动作单元来识别给定素描中的表情。然后,该表情被复制到Valley Girl上以获得具有相似表情的3D模型。接着,使用主动蛇形轮廓来找到所需的变换,以缩小该模型与给定素描之间的差距。

英文摘要

Generating 3D models from face sketches is an active topic of research in Computer Graphics due to its potential to tremendously facilitate the modeling of faces for both professional 3D artists and novices. Motivated by the observation that facial expressions are responsible for significantly altering and shaping the contours in our faces, we combine both expression detection and 3D model generation in our approach. The result is a novel approach to generating 3D models from sketches which relies on three components: Convolutional Neural Networks, a parametric 3D face model (Valley Girl), and Active Snake Contours. For the first time in the literature, CNNs are trained (using our own generated dataset) to detect the expression in the given sketch through detecting the active FACS Action Units. The expression is then duplicated on Valley Girl to obtain a 3D model with a similar expression. Active Snake Contours are then used to find the transforms needed to close the gaps between that model and the given sketch.

2605.25395 2026-05-26 cs.LG math.OC 版本更新

EMA-Nesterov: Stabilizing Nesterov's Lookahead for Accelerated Deep Learning Optimization

EMA-Nesterov:稳定Nesterov前瞻以加速深度学习优化

Chung-Yiu Yau, Dawei Li, Athanasios Glentis, Valentyn Boreiko, Hoi-To Wai, Mingyi Hong

发表机构 * University of Minnesota(明尼苏达大学) Amazon AGI(亚马逊人工智能实验室) The Chinese University of Hong Kong(香港中文大学)

AI总结 针对深度学习优化中Nesterov动量因随机梯度噪声和非凸损失导致的不稳定性,提出EMA-Nesterov方法,用参数更新的指数移动平均替代标准前瞻方向,通过低通滤波捕捉训练轨迹的低频趋势,在凸问题中保持理论加速收敛率,并在语言模型预训练中验证了其广泛适用性和优于现有前瞻方法的性能。

Comments 25 pages, 10 figures

详情
AI中文摘要

基于前瞻的加速方法,如Nesterov动量,在优化中广泛使用,但在深度学习训练中常因随机梯度噪声和非凸损失景观而变得不可靠。特别是,标准前瞻依赖于短视更新信号(例如连续迭代之间的差异),这些信号本质上有噪声,可能导致不稳定的外推方向。本文从轨迹角度重新审视Nesterov加速,并认为深度学习中的有效加速应利用优化轨迹的低频趋势,而非外推噪声的一步更新。基于这一见解,我们提出EMA-Nesterov,一个简单的修改,用参数更新的指数移动平均(EMA)替代标准Nesterov前瞻方向。这产生了一个稳定的前瞻方向,通过低通滤波器捕捉并利用训练轨迹的演变趋势,同时通过EMA的几何加权结构保持对渐进变化的适应性。我们证明,EMA-Nesterov在凸问题中保留了与Nesterov加速梯度方法类似的理论加速收敛率。此外,我们在语言模型预训练上提供了经验证据,验证了EMA-Nesterov广泛适用于一系列经过精细调优的基础优化器,包括Adam、SOAP、Muon,以及在优化基准(NanoGPT)上达到最先进性能的复杂优化器。与先前的前瞻方法相比,EMA-Nesterov通过避免短视前瞻的不稳定性和长视前瞻的非自适应性,实现了更好的性能。

英文摘要

Lookahead-based acceleration methods, such as Nesterov's momentum, are widely used in optimization, but they often become unreliable in deep learning training mainly due to stochastic gradient noise and non-convex loss landscapes. In particular, standard lookahead relies on short-horizon update signals (e.g., differences between consecutive iterates), which are inherently noisy and can lead to unstable extrapolation directions. This work revisits Nesterov's acceleration from a trajectory perspective and argues that effective acceleration in deep learning should harness the low-frequency trends of optimization trajectories rather than extrapolating noisy one-step updates. Leveraging this insight, we propose EMA-Nesterov, a simple modification that replaces the standard Nesterov's lookahead direction with an exponential moving average (EMA) of parameter updates. This yields a stabilized lookahead direction that captures and harnesses the evolving trend of the training trajectory through a low-pass filter, while remaining adaptive to progressive changes via the geometric weighting structure of EMA. We show that EMA-Nesterov retains a theoretical accelerated convergence rate in convex problems that is analogous to Nesterov's accelerated gradient method. Furthermore, we provide empirical evidence on language model pre-training to verify that EMA-Nesterov is broadly applicable across a range of fine-tuned base optimizers, including Adam, SOAP, Muon, as well as complex optimizers that achieve state-of-the-art performance on optimization benchmarks (NanoGPT). Compared to prior lookahead methods, EMA-Nesterov achieves better performance by avoiding the instability of short-horizon lookahead and the non-adaptivity of long-horizon lookahead.
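
One way to read the proposal is as a one-line change to Nesterov's method: extrapolate along an EMA of past parameter updates instead of the last update alone. The sketch below uses plain gradient descent as the base optimizer and is an assumption about the update rule, which the abstract does not spell out; `lr`, `beta` and `rho` are hypothetical hyperparameter names.

```python
def ema_nesterov(grad, x0, lr=0.1, beta=0.9, rho=0.5, steps=100):
    """Gradient descent with a lookahead along an EMA of parameter updates.

    grad: function mapping a list of floats to its gradient (list of floats).
    With rho = 0 the EMA equals the latest update x_k - x_{k-1}, which
    recovers the classical Nesterov lookahead.
    """
    x = list(x0)
    m = [0.0] * len(x)  # EMA of parameter updates (low-pass filtered direction)
    for _ in range(steps):
        # Lookahead point: extrapolate along the smoothed update direction.
        y = [xi + beta * mi for xi, mi in zip(x, m)]
        g = grad(y)
        x_new = [yi - lr * gi for yi, gi in zip(y, g)]
        # Fold the newest update into the EMA with geometric weights.
        m = [rho * mi + (1.0 - rho) * (xn - xi)
             for mi, xn, xi in zip(m, x_new, x)]
        x = x_new
    return x
```

A larger `rho` averages over a longer stretch of the trajectory and therefore filters more step-to-step noise, at the price of reacting more slowly when the trend changes.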

2605.25391 2026-05-26 cs.LG eess.SP 版本更新

A Context Augmented Multi-Play Multi-Armed Bandit Algorithm for Fast Channel Allocation in Opportunistic Spectrum Access

一种用于机会频谱接入中快速信道分配的上下文增强多玩多臂老虎机算法

Ruiyu Li, Guangxia Li, Xiao Lu, Jichao Liu, Yan Jin

发表机构 * School of Computer Science and Technology, Xidian University(西安电子科技大学计算机科学与技术学院) Research and Development, Hainayun IoT Technology Co., Ltd(海纳云物联网科技有限公司研发部)

AI总结 针对机会频谱接入中的信道分配问题,提出一种上下文增强的多玩多臂老虎机算法,通过将信道噪声建模为奖励函数的扰动并利用信道状态信息作为上下文,分别针对线性和非线性相关性推导出两种索引策略,实现低遗憾和更合理的次优臂选择。

Comments Accepted by ISCC'24

详情
AI中文摘要

我们研究了机会频谱接入(OSA)场景中用于信道分配的不安定(restless)上下文多玩多臂老虎机(MP-MAB)问题。大多数现有的MP-MAB方法对于实际OSA系统不实用,因为它们假设了许多理想条件,计算成本高,最重要的是忽略了与服务质量直接相关的信道噪声的影响。在本研究中,我们通过将信道噪声建模为MP-MAB中臂奖励函数的扰动来体现这种影响。由于信道状态信息与信道噪声之间存在隐含的相关性,我们将前者作为MP-MAB的上下文来表示后者引起的扰动。我们研究了上下文与扰动之间的两种相关性——线性和非线性,并分别推导出两种索引策略。这些策略通过线性模型和神经网络学习相关性,并使用估计的噪声值调整上置信界。数值实验表明,所提出的策略能够实现更低的遗憾,并以更合理的方式选择次优臂。

英文摘要

We study the restless contextual multi-play multi-armed bandit (MP-MAB) problem for channel allocation in the opportunistic spectrum access (OSA) scenario. Most existing MP-MAB methods are impractical for real-world OSA systems as they assume many ideal conditions, incur a heavy computational cost, and most importantly, ignore the impact of channel noise, which is directly related to the quality of service. In this study, we embody this impact by modeling channel noise as a perturbation of the arm's reward function in MP-MAB. As there is an implicit correlation between channel state information and channel noise, we take the former as a context for MP-MAB to represent the perturbation caused by the latter. We investigate two types of correlation between the context and the perturbation -- linear and nonlinear, and derive two index policies, respectively. These policies learn the correlations through a linear model and a neural network, and use the estimated noise value to adjust the upper confidence bound. Numerical experiments demonstrate that the proposed policies can achieve lower regret and select sub-optimal arms in a more reasonable way.
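
The linear index policy can be pictured as a UCB score per channel that is corrected by a noise estimate predicted from that channel's context. The sketch below is illustrative only: the abstract does not say how the estimate enters the bound, so subtracting it from a UCB1-style index is an assumption, as are the function names and the fixed `weights` vector standing in for a learned linear model.

```python
import math

def noise_adjusted_indices(means, counts, t, contexts, weights):
    """UCB-style index per channel, lowered by a linear noise estimate.

    means/counts: empirical mean reward and number of plays per channel.
    contexts: one feature vector (channel state information) per channel.
    weights: coefficients of an assumed, already-fitted linear noise model.
    """
    indices = []
    for mu, n, ctx in zip(means, counts, contexts):
        if n == 0:
            indices.append(math.inf)  # force exploration of unplayed channels
            continue
        bonus = math.sqrt(2.0 * math.log(t) / n)
        noise = sum(w * c for w, c in zip(weights, ctx))
        indices.append(mu + bonus - noise)
    return indices

def select_channels(indices, k):
    """Multi-play step: pick the k channels with the largest indices."""
    order = sorted(range(len(indices)), key=lambda i: indices[i], reverse=True)
    return order[:k]
```

Two channels with the same empirical mean are then ranked by their predicted noise, which is one way a policy could choose among sub-optimal arms more sensibly.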

2605.25388 2026-05-26 cs.LG q-bio.QM 版本更新

ViroBench: Benchmarking Nucleotide Foundation Models on Viral Genomics Tasks

ViroBench:病毒基因组学任务中的核苷酸基础模型基准测试

Dongxin Ye, Fang Hu, Han Hu, Shu Hu, Yang Tan, Wanli Ouyang, Stan Z. Li, Jie Cui, Nanqing Dong

发表机构 * Shanghai Innovation Institute, Shanghai, China; University of Electronic Science; Fudan University, Shanghai, China; Shanghai Artificial Intelligence Laboratory, Shanghai, China; Institute of Infection Health, Fudan University, Shanghai, China; Shanghai Sci-Tech Inno Center for Infection & Immunity, Shanghai, China; Shanghai Jiao Tong University, Shanghai, China; Shenzhen Loop Area Institute, Shenzhen, China; Chinese University of Hong Kong, Hong Kong, China; Westlake University, Hangzhou, China

AI总结 提出首个针对病毒基因组学的综合基准ViroBench,评估66个核苷酸基础模型在生物学理解和潜在生物安全风险上的表现,发现模型在系统发育和时间偏移下性能下降,生成任务中统计似然与生物功能有效性脱钩,且预训练数据的分类多样性比参数规模更重要。

Comments 42 pages, 15 figures

详情
AI中文摘要

核苷酸序列构成了生物系统的基本遗传基础,使得病毒基因组分析对生物医学进步至关重要。尽管生物基础模型,特别是核苷酸基础模型(NFMs)取得了进展,但该领域缺乏一个统一的病毒基因组学标准来促进社区发展并实施生物安全约束。为了解决这个问题,我们引入了ViroBench,这是第一个专门为病毒场景中的NFMs设计的全面且大规模的基准测试。ViroBench在两个关键维度上评估模型:生物学理解和潜在生物安全风险,覆盖4种任务类型中的18个不同场景。对66个不同架构的NFMs的广泛评估得出了三个关键结论。首先,NFMs在系统发育和时间偏移下表现出生物学理解的性能下降,表明外推能力较弱。其次,生成任务揭示了统计似然与生物功能有效性之间的脱钩,构成了潜在的生物安全风险。第三,受控消融研究表明,预训练数据中的分类多样性比参数规模更重要。具体来说,一个在多样化数据上训练的轻量级基线相比其原始模型实现了67.5%的性能提升。总体而言,ViroBench为未来病毒核苷酸基础模型的研究提供了可解释的诊断评估和可重复的测量框架。数据集和代码公开于https://github.com/QIANJINYDX/ViroBench。

英文摘要

Nucleotide sequences constitute the fundamental genetic basis of biological systems, rendering viral genomic analysis critical for biomedical advancement. Despite progress in biological foundation models, specifically nucleotide foundation models (NFMs), the field lacks a unified standard for viral genomics to facilitate community development and enforce biosecurity constraints. To address this, we introduce ViroBench, the first comprehensive and large-scale benchmark specifically designed for NFMs in viral settings. ViroBench evaluates models across two critical dimensions: biological understanding and latent biosecurity risk, covering 18 diverse scenarios within 4 task types. Extensive evaluation of 66 NFMs across diverse architectures yields three critical conclusions. Firstly, NFMs exhibit a performance degradation in biological understanding under phylogenetic and temporal shifts, indicating weak extrapolation capabilities. Secondly, generation tasks reveal a decoupling between statistical likelihood and biological functional validity, posing latent biosecurity risks. Thirdly, controlled ablation studies reveal that taxonomic diversity in pretraining data outweighs parameter scale. Specifically, a lightweight baseline trained on diverse data achieves a 67.5% performance gain over its original model. Overall, ViroBench provides interpretable, diagnostic evaluations and a reproducible measurement framework for future research on viral nucleotide foundation models. The datasets and code are publicly available at https://github.com/QIANJINYDX/ViroBench.

2605.25383 2026-05-26 stat.ML cs.LG math.ST stat.TH 版本更新

Learning manifold diffusion semigroups from graph transition matrices

从图转移矩阵学习流形扩散半群

Xiuyuan Cheng, Nan Wu

发表机构 * Department of Mathematics, Duke University(杜克大学数学系) Department of Mathematical Sciences, The University of Texas at Dallas(德克萨斯大学达拉斯分校数学科学系)

AI总结 本文提出通过迭代图转移矩阵直接逼近流形热半群,在低正则性假设下给出了无穷范数误差界,并实现了与图拉普拉斯方法相当的收敛速率。

详情
AI中文摘要

我们考虑由从嵌入欧氏空间的未知流形中抽取的有限独立同分布样本构建的图扩散过程,其中图亲和度由环境高斯核矩阵定义。我们证明,在测试函数 $f$ 仅具有低正则性假设(包括 $f \in L^\infty$ 的情况)下,流形热半群 $Q_t = e^{t\Delta}$ 可以通过迭代图转移矩阵 $P$ 直接逼近。我们以 $\infty$-范数界定了 $\| P^n f - Q_t f \|$,其中算子对 $f$ 的作用被适当定义,并且对于扩散时间 $t$ 至 $O(1)$ 及更长,我们恢复了经典图拉普拉斯逐点速率 $O(N^{-2/(d+6)})$(忽略对数因子)。该速率适用于样本内误差以及样本外泛化,其中新点处 $Q_t f$ 的估计量通过核卷积定义。为了处理流形上的非均匀采样密度,我们引入了图转移矩阵的右归一化;在采样密度 $p$ 为 $C^3$ 且远离零的假设下,相同的收敛速率成立。我们在模拟数据上数值验证了所提估计器的性能。

英文摘要

We consider graph diffusion processes constructed from finite i.i.d. samples drawn from an unknown manifold embedded in ambient Euclidean space, where the graph affinity is defined by an ambient Gaussian kernel matrix. We show that the manifold heat semigroup $Q_t = e^{t\Delta}$ can be approximated directly by iterating the graph transition matrix $P$, under only low regularity assumptions on the test function $f$, including the case $f \in L^\infty$. We bound $\| P^n f - Q_t f \|$ in $\infty$-norm, with the operator application to $f$ properly defined, and we recover the classical graph-Laplacian pointwise rate $O(N^{-2/(d+6)})$ up to logarithmic factors, for diffusion times $t$ up to $O(1)$ and longer. The rate holds for in-sample error as well as out-of-sample generalization, where the estimator of $Q_t f$ at a new point is defined via kernel convolution. To handle non-uniform sampling densities on the manifold, we introduce a right-normalization of the graph transition matrix; under the assumption that the sampling density $p$ is $C^3$ and bounded away from zero, the same convergence rates hold. We numerically demonstrate the performance of the proposed estimator on simulated data.
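
The core claim, that powers of a graph transition matrix approximate the heat semigroup, can be checked numerically on the unit circle, where $e^{t\Delta}\cos\theta = e^{-t}\cos\theta$. The sketch assumes the kernel $\exp(-\|x-y\|^2/(4\epsilon))$, for which one step of $P$ corresponds to diffusion time roughly $\epsilon$ and hence $t \approx n\epsilon$; the paper's exact normalization and its right-normalization for non-uniform densities are not reproduced here.

```python
import math

def transition_matrix(points, eps):
    """Row-stochastic P built from Gaussian affinities exp(-|x - y|^2 / (4 eps))."""
    P = []
    for x in points:
        row = [math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (4.0 * eps))
               for y in points]
        s = sum(row)
        P.append([w / s for w in row])
    return P

def diffuse(P, f, n):
    """Apply the transition matrix n times: returns P^n f."""
    for _ in range(n):
        f = [sum(p * v for p, v in zip(row, f)) for row in P]
    return f

def circle_demo(num_points=120, eps=0.01, n=50):
    """Compare P^n f with e^{-t} f for f = cos(theta) at t = n * eps."""
    thetas = [2.0 * math.pi * i / num_points for i in range(num_points)]
    points = [(math.cos(t), math.sin(t)) for t in thetas]
    f = [math.cos(t) for t in thetas]
    approx = diffuse(transition_matrix(points, eps), f, n)
    exact = [math.exp(-n * eps) * v for v in f]
    return max(abs(a - b) for a, b in zip(approx, exact))
```

The demo uses evenly spaced points so that it is deterministic; the paper's setting draws i.i.d. samples, where the error additionally depends on the sample size $N$.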

2605.25381 2026-05-26 cs.LG 版本更新

Not only where, But when: Temporal Scheduling for RLVR

不仅在哪里,而且何时:RLVR 的时间调度

Jinghao Zhang, Ruilin Li, Feng Zhao, Jiaqi Wang

发表机构 * University of Science and Technology of China(中国科学技术大学) Shanghai Innovation Institute(上海创新研究院) Wuhan University(武汉大学)

AI总结 针对强化学习可验证奖励(RLVR)中忽略策略行为异质性的问题,提出时间调度方法,通过动态调整信用分配标准来优化学习动态,实验表明该方法能提升训练稳定性和效率。

Comments Github: https://github.com/Jinghaoleven/RLVR-Schedule

详情
AI中文摘要

具有可验证奖励的强化学习(RLVR)已成为大型语言模型(LLMs)后训练的核心技术。虽然策略优化由所有采样token在全局广播标量奖励下驱动,但轨迹中表现出的异质性策略行为在很大程度上被忽视而未加以区分。现有工作通过信用分配来解决这一问题,包括token级优势重加权和选择性token优化,然而分配标准在整个训练过程中基本保持不变,限制了策略的弹性演化。在这项工作中,我们认为学习信号的调度时机与它们在token间的分配位置同样重要,并引入了时间维度,即在RLVR优化过程中调度信用分配标准。我们发现,优先关注具有特定策略行为的目标token,并逐渐向通用优化衰减,可以带来更稳定和高效的学习动态。此外,我们表明简单的轨迹百分位数为区分策略行为提供了自然视角,并与时间调度有效配合。我们的分析揭示,标准优化在同时适应异质性行为时显著牺牲了策略熵,而时间调度产生了更健康的策略演化动态。在数学和通用推理基准上的实验表明了一致的改进,表明时间调度构成了一个有前景的优化维度。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) has become a core technique for post-training of Large Language Models (LLMs). While policy optimization is driven by all sampled tokens under a globally broadcast scalar reward, the heterogeneous policy behaviors exhibited along trajectories are largely overlooked and left undifferentiated. Existing works address this through credit allocation, including token-level advantage reweighting and selective token optimization; however, the allocation criteria remain largely static throughout training, limiting resilient policy evolution. In this work, we argue that \textit{when} learning signals are scheduled can be as important as \textit{where} they are allocated across tokens, and introduce a temporal dimension: scheduling the credit allocation criteria over the course of RLVR optimization. We find that prioritizing targeted tokens associated with specific policy behaviors, and gradually attenuating toward general optimization, leads to more stable and efficient learning dynamics. Furthermore, we show that simple trajectory percentiles provide a natural perspective for distinguishing policy behaviors and work effectively with temporal scheduling. Our analysis reveals that standard optimization substantially sacrifices policy entropy when simultaneously accommodating heterogeneous behaviors, whereas temporal scheduling yields healthier policy evolution dynamics. Experiments across mathematical and general reasoning benchmarks demonstrate consistent improvements, suggesting that temporal scheduling constitutes a promising optimization dimension.

2605.25352 2026-05-26 cs.LG cs.AI 版本更新

Certified Robustness from Approximate Gaussian Mixture Structures in Pretrained Latent Spaces

基于预训练潜在空间中近似高斯混合结构的认证鲁棒性

Konstantinos Emmanouilidis, Tianjiao Ding, Nghia Nguyen, Nicolas Loizou, René Vidal

发表机构 * CS & MINDS Johns Hopkins University(约翰霍普金斯大学计算机科学系与MINDS) CIS University of Pennsylvania(宾夕法尼亚大学计算机与信息科学系) AMS & MINDS Johns Hopkins University(约翰霍普金斯大学应用数学与统计系与MINDS) ESE, Radiology & IDEAS University of Pennsylvania(宾夕法尼亚大学电气与系统工程系、放射学系及IDEAS)

AI总结 本文提出一个框架,利用预训练编码器将输入映射到近似高斯混合的潜在分布,通过理论分析证明鲁棒性退化有界,从而实现可认证鲁棒分类器,在CIFAR-10和ImageNet上达到最优或竞争性的认证准确率。

详情
AI中文摘要

深度学习模型易受对抗扰动影响,这对安全关键部署提出了重要关切。经验性防御在实践中可以实现强鲁棒性,但缺乏形式化保证,这推动了可认证鲁棒分类器的需求。虽然认证方法提供了形式化保证,但由于无法利用复杂数据分布中的结构,它们通常产生过于保守的边界。在这项工作中,我们提出了一个设计可认证鲁棒分类器的框架,该框架利用数据表示中的潜在结构。我们首先分析高斯混合设置,推导出鲁棒分类器存在的必要和充分条件,并构建了一个具有闭式鲁棒性证书和泛化保证的分类器。我们的主要贡献是证明精确结构并非必需:我们证明,如果预训练编码器将输入映射到一个与高斯混合分布$\varepsilon$-接近(在KL散度下)的潜在分布,那么认证准确率会优雅地退化,并给出了一个显式边界,关联真实分布和近似分布下的鲁棒性。这一结果使得直接使用预训练模型成为可能,而无需精确的分布假设。实验上,我们的方法在CIFAR-10和ImageNet上实现了最先进或具有竞争力的认证准确率,同时保持了强大的干净性能和低计算开销。总体而言,我们的工作将近似潜在结构确立为通往可认证鲁棒性的一条实用且有原则的路径。

英文摘要

Deep learning models are vulnerable to adversarial perturbations, raising important concerns for safety-critical deployment. Empirical defenses can achieve strong robustness in practice, but lack formal guarantees, motivating the need for certifiably robust classifiers. While certified methods provide formal guarantees, they often yield overly conservative bounds due to their inability to exploit structure in complex data distributions. In this work, we propose a framework for designing certifiably robust classifiers that leverages latent structure in data representations. We first analyze the Gaussian mixture setting, deriving necessary and sufficient conditions for the existence of robust classifiers and constructing a classifier with a closed-form robustness certificate and generalization guarantees. Our main contribution is to show that exact structure is not required: we prove that if a pretrained encoder maps inputs to a latent distribution that is $\varepsilon$-close (in KL divergence) to a Gaussian mixture, then certified accuracy degrades gracefully, with an explicit bound relating robustness under the true and approximate distributions. This result enables the direct use of pretrained models without requiring exact distributional assumptions. Empirically, our method achieves state-of-the-art or competitive certified accuracy on CIFAR-10 and ImageNet, while maintaining strong clean performance and low computational overhead. Overall, our work establishes approximate latent structure as a practical and principled route to certifiable robustness.
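
For intuition about closed-form certificates in a Gaussian mixture setting, consider the simplest case: equally weighted components with a shared isotropic covariance, where the Bayes rule is nearest-mean classification. The sketch below computes the exact $\ell_2$ distance from a latent point to the nearest decision boundary. It is a textbook construction offered as an assumption-laden illustration, not the classifier or certificate derived in the paper, and it certifies perturbations in latent space only.

```python
import math

def nearest_mean_certificate(z, means):
    """Return (predicted class, certified L2 radius in latent space).

    The boundary between classes c and j of a nearest-mean classifier is the
    hyperplane equidistant from their means, so the distance from z to it is
    (|z - mu_j|^2 - |z - mu_c|^2) / (2 |mu_c - mu_j|). The prediction cannot
    change within the smallest such distance. Assumes distinct means.
    """
    d2 = [sum((a - b) ** 2 for a, b in zip(z, m)) for m in means]
    pred = min(range(len(means)), key=lambda c: d2[c])
    radius = math.inf
    for j, m in enumerate(means):
        if j == pred:
            continue
        gap = math.sqrt(sum((a - b) ** 2 for a, b in zip(means[pred], m)))
        radius = min(radius, (d2[j] - d2[pred]) / (2.0 * gap))
    return pred, radius
```

For example, with means at (0, 0) and (4, 0), the point (1, 0) is assigned to the first class with radius 1, the distance to the midline x = 2.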

2605.25348 2026-05-26 eess.IV cs.AI cs.CV cs.LG cs.SC 版本更新

Parameter-Efficient CT Reconstruction via Deep Graph Laplacian Regularization

基于深度图拉普拉斯正则化的参数高效CT重建

Veera Varuni Radhakrishnan, Chinthaka Dinesh, Qurat-ul-Ain Azim

发表机构 * Mechanical and Industrial Engineering Department(机械与工业工程系)

AI总结 提出深度图拉普拉斯正则化(Deep GLR)方法,通过将二次图正则化集成到近端前向-后向分裂优化框架中,仅用少量参数和数据即可实现低剂量CT重建的噪声抑制,在参数效率和数据效率上显著优于现有方法。

Comments 7 pages, 3 figures, conference

详情
AI中文摘要

低剂量计算机断层扫描(LDCT)重建面临重建质量与资源需求之间的关键权衡。虽然最近的深度学习方法达到了最先进的性能,但它们通常依赖超过50万个参数,并在超过35,000次扫描的大规模数据集上训练。本文研究在严格资源约束下,基于图的正则化是否能提供有意义的噪声抑制。我们提出了深度图拉普拉斯正则化(Deep GLR),将二次图正则化集成到近端前向-后向分裂优化框架中,并包含三个轻量级CNN模块。在LoDoPaB-CT基准上评估,Deep GLR达到了30.70 dB的PSNR,相比滤波反投影提高了6.33 dB,同时仅使用了91,848个参数,在1000个样本上训练(标准训练集的2.8%)。与基准方法相比,这代表了每dB改进5.8倍的参数效率和30倍的数据效率。学习到的图带宽参数(ε=1.25)收敛到可解释的值,表明该方法捕捉了有意义的图像先验而非过拟合。尽管与最先进方法相比仍有13 dB的差距,但结果表明基于图的正则化为资源受限的医学成像场景提供了有利的效率-质量权衡。

英文摘要

Low-dose computed tomography (LDCT) reconstruction faces a critical tradeoff between reconstruction quality and resource requirements. While recent deep learning methods achieve state-of-the-art performance, they typically rely on over 500,000 parameters trained on large-scale datasets exceeding 35,000 scans. This work investigates whether graph-based regularization can provide meaningful noise reduction under strict resource constraints. We propose Deep Graph Laplacian Regularization (Deep GLR), integrating quadratic graph regularization into a Proximal Forward-Backward Splitting optimization framework with three lightweight CNN modules. Evaluated on the LoDoPaB-CT benchmark, Deep GLR achieves 30.70 dB PSNR, representing a 6.33 dB improvement over filtered backprojection, while using only 91,848 parameters trained on 1000 samples (2.8\% of the standard training set). Compared to benchmark methods, this represents 5.8 times better parameter efficiency and 30 times better data efficiency per dB improvement. The learned graph bandwidth parameter ($\varepsilon = 1.25$) converges to interpretable values, suggesting the method captures meaningful image priors rather than overfitting. While a 13 dB gap remains versus state-of-the-art methods, results demonstrate that graph-based regularization provides a favorable efficiency-quality tradeoff for resource-constrained medical imaging scenarios.
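
Since the headline numbers are PSNR values, a small sketch of the metric may help. It uses the standard definition, 10 · log10(range² / MSE); taking the data range from the reference image is an assumption here, and the benchmark's own evaluation code may use a different convention. The toy arrays are hypothetical.

```python
import numpy as np

def psnr(reference, estimate, data_range=None):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range**2 / MSE)."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    if data_range is None:
        # Assumption: use the dynamic range of the reference image.
        data_range = reference.max() - reference.min()
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))

reference = np.array([[0.0, 1.0], [0.5, 0.25]])
estimate = reference + 0.1          # constant error of 0.1, so MSE is about 0.01
value = psnr(reference, estimate)   # data_range = 1.0, so about 20 dB

# Arithmetic implied by the abstract: the filtered-backprojection baseline
# sits 6.33 dB below the reported 30.70 dB.
fbp_psnr = 30.70 - 6.33
```

Because PSNR is logarithmic, the reported 6.33 dB gain corresponds to roughly a fourfold reduction in mean squared error (10^0.633 ≈ 4.3).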

2605.25347 2026-05-26 cs.CV cs.LG 版本更新

ERNIE-Image Technical Report

ERNIE-Image 技术报告

Jiaxiang Liu, Zhida Feng, Pengyu Zou, Zhenyu Qian, Tianrui Zhu, Jun Xia, Yuehu Dong, Yanzheng Lin, Honglin Xiong, Anqi Chen, Yunpeng Ding, Jinghui Duan, Lin Gao, Chao Han, Tiechao He, Jiakang Hu, Ranjun Hua, Xueming Jiang, Qingli Kong, Yuting Lei, Tianyu Li, Yunlin Liu, Changling Liu, Yaxin Liu, Yi Liu, Xuguang Liu, Xiaolong Ma, Yan Pan, Yiran Ren, Nan Sheng, Yu Sun, Siyang Sun, Yixiang Tu, Yang Wan, Huanai Wang, Siqi Wang, Yang Wu, Youzhi Yang, Xiaowen Yang, Jianwen Yang, Yehua Yang, Quanwen Zhang, Xinmin Zhang, Haoxin Zhang, Xiang Zhang, Jun Zhang, Qian Zhang, Qiao Zhao, Qi Zhou

发表机构 * ERNIE Team, Baidu(百度ERNIE团队)

AI总结 提出基于8B单流DiT架构的开源文本到图像生成模型ERNIE-Image,通过自底向上的预训练数据构建和自顶向下的后训练数据构建,结合稳定DPO策略和MT-DMD蒸馏方法,在指令遵循、文本渲染和美学质量上接近顶级商业模型。

详情
AI中文摘要

我们介绍了ERNIE-Image,一个基于8B单流DiT架构构建的开源文本到图像生成模型。ERNIE-Image旨在通过更有效地挖掘大规模预训练数据并在整个训练过程中提高监督质量,来弥合当前开源模型与领先闭源系统之间的差距。在预训练阶段,我们采用自底向上的数据构建流程,结合细粒度图像分类、丰富的标题注释、美学评估和分层采样。该策略在保留长尾概念和详细真实世界知识的同时减少数据噪声,为复杂生成任务提供了更坚实的基础。在后训练阶段,我们针对高需求场景使用自顶向下的数据构建流程,多样化提示注释以更好地匹配真实用户输入,并应用稳定的DPO策略使模型与人类美学偏好对齐。我们进一步训练ERNIE-Image-Turbo以实现高效的8-NFE生成,并提出MT-DMD以减轻蒸馏过程中的能力漂移。为了使模型在实际场景中更易于使用,我们为其配备了一个轻量级的提示增强器,将简洁的用户意图扩展为结构化的视觉描述。此外,我们开发了工业级美学模型ERNIE-Image-Aes,以及用于真实美学评估的人工标注基准ERNIE-Image-Aes-1K。大量的定性和定量实验表明,ERNIE-Image在开源模型中实现了领先性能,并在指令遵循、文本渲染和美学质量方面接近顶级商业模型。我们发布训练好的模型和美学资源,以促进AIGC社区的进一步学术研究和技术进步。

英文摘要

We introduce ERNIE-Image, an open-source text-to-image generation model built upon an 8B single-stream DiT architecture. ERNIE-Image aims to bridge the gap between current open-source models and leading closed-source systems through more effective mining of large-scale pre-training data and improved supervision quality throughout training. During pre-training, we adopt a bottom-up data construction pipeline that combines fine-grained image categorization, rich caption annotation, aesthetic assessment, and hierarchical sampling. This strategy reduces data noise while preserving long-tail concepts and detailed real-world knowledge, providing a stronger foundation for complex generation tasks. In the post-training stage, we use a top-down data construction pipeline for high-demand scenarios, diversify prompt annotations to better match real user inputs, and apply a stabilized DPO strategy to align the model with human aesthetic preferences. We further train ERNIE-Image-Turbo for efficient 8-NFE generation and propose MT-DMD to mitigate capability drift during distillation. To make the model easier to use in practical scenarios, we equip it with a lightweight Prompt Enhancer that expands concise user intents into structured visual descriptions. In addition, we develop ERNIE-Image-Aes, an industrial-grade aesthetic model, together with ERNIE-Image-Aes-1K, a human-annotated benchmark for realistic aesthetic evaluation. Extensive qualitative and quantitative experiments show that ERNIE-Image achieves leading performance among open-source models and approaches top-tier commercial models in instruction following, text rendering, and aesthetic quality. We release the trained models and aesthetic resources to facilitate further academic research and technical progress in the AIGC community.

2605.25346 2026-05-26 cs.RO cs.AI cs.LG cs.SY eess.SY math.OC 版本更新

Parallel Differentiable Reachability for Learning and Planning with Certified Neural Dynamics and Controllers

用于学习和规划的并行可微可达性:带认证的神经动力学与控制器

Keyi Shen, Glen Chou

发表机构 * MIT(麻省理工学院)

AI总结 提出一种基于JAX的并行可微可达性框架,结合泰勒模型流形构建与CROWN线性界传播,支持GPU批处理和自动微分,并用于认证训练和可达性感知的MPC,在非抓取操作和四旋翼任务中实现在线规划与有界不确定性下的认证可达集过近似。

Comments Robotics: Science and Systems XXII (RSS 2026)

详情
AI中文摘要

神经网络动力学模型和控制策略在机器人领域取得了强大性能,但在不确定性下提供可靠保证仍然困难,尤其是对于闭环神经网络系统。现有的可达性工具提供了形式化的过近似,但通常不可微、过于保守或对于现代学习和在线规划流程来说太慢。为了解决这个问题,我们提出了一个在JAX中可并行化、可微的可达性框架,适用于连续和离散时间系统,具有解析和基于神经网络的动力学和控制器。我们的框架通过统一表示结合了泰勒模型流形构建和CROWN风格的线性界传播,该表示在支持GPU批处理计算和自动微分的同时保留了仿射依赖。基于这个可达性基元,我们开发了(i)一种认证训练方法,鼓励生成对可达性友好的动力学模型和控制器,以及(ii)一种具有基于梯度细化的可达性感知采样MPC方案。在非抓取操作和四旋翼任务上的实验,包括硬件和更高维度的评估(高达72维),展示了在实际在线规划中保持有界不确定性下认证可达集过近似的可行性。

英文摘要

Neural network (NN) dynamics models and control policies achieve strong performance in robotics, but providing sound guarantees under uncertainty remains difficult, especially for closed-loop NN systems. Existing reachability tools provide formal over-approximations, yet are often non-differentiable, overly conservative, or too slow for modern learning and online planning pipelines. To address this, we present a parallelizable, differentiable reachability framework in JAX for continuous- and discrete-time systems with analytical and NN-based dynamics and controllers. Our framework combines Taylor-model flowpipe construction with CROWN-style linear bound propagation through a unified representation that preserves affine dependencies while supporting GPU-batched computation and automatic differentiation. Building on this reachability primitive, we develop (i) a certified training method that encourages reachability-friendly dynamics models and controllers, and (ii) a reachability-aware sampling-based MPC scheme with gradient-based refinement. Experiments on non-prehensile manipulation and quadrotor tasks, including hardware and higher-dimensional evaluations (up to 72D), demonstrate practical online planning while maintaining certified reachable-set over-approximations under bounded uncertainty.

2605.25344 2026-05-26 cs.CL cs.AI cs.LG quant-ph 版本更新

A general tensor-structured compression scheme for efficient large language models

一种用于高效大语言模型的通用张量结构压缩方案

Ying Lu, Peng-Fei Zhou, Qi-Xuan Fang, Pan Zhang, Shi-Ju Ran, Gang Su

发表机构 * School of Physical Sciences, University of Chinese Academy of Sciences(中国科学院大学物理科学学院) Kavli Institute for Theoretical Sciences, University of Chinese Academy of Sciences(中国科学院大学理论科学研究院) Center for Quantum Physics and Intelligent Sciences, Department of Physics, Capital Normal University(首都师范大学量子物理与智能科学中心) Institute of Theoretical Physics, Chinese Academy of Sciences(中国科学院理论物理研究所)

AI总结 提出张量混合(MixT)方案,通过将密集线性层替换为张量算子混合体,在保持MMLU准确率的同时大幅减少参数、FLOPs和内存。

Comments 12 pages, 4 figures

详情
AI中文摘要

大语言模型(LLMs)主要由密集线性变换主导,其存储、内存和计算开销阻碍了高效的适配和部署,同时掩盖了结构简化对功能的影响。本文提出张量混合(MixT),一种通用的张量结构压缩方案,将目标密集线性层替换为可原生执行的张量算子混合体。MixT直接作用于通用线性投影而非模型特定组件,因此可能适用于基于Transformer的LLMs及其他密集神经映射。我们在统一的恢复协议下对Qwen3-8B和LLaMA2-7B评估MixT,识别出一个广泛的压缩区域,在该区域内MMLU准确率基本保持不变,直到模型特定边界处出现突变。该突变与输出熵、预测熵和层间几何的协同变化同时发生。在LLaMA2-7B的突变边界处,MixT将全模型参数减少47.5%,推理FLOPs减少37.1%,训练FLOPs减少52.1%,峰值推理内存减少60.4%,展示了其在低成本LLM压缩中的实际潜力。

英文摘要

Large language models (LLMs) are dominated by dense linear transformations, whose storage, memory and computational overheads hinder efficient adaptation and deployment while masking the functional impacts of structural simplification. Here we present Tensor Mixture (MixT), a general tensor-structured compression scheme that replaces targeted dense linear layers with natively executable mixtures of tensor operators. Operating directly on generic linear projections instead of model-specific components, MixT is potentially applicable across Transformer-based LLMs and other dense neural mappings. We evaluate MixT on Qwen3-8B and LLaMA2-7B under a unified recovery protocol, identifying a broad compressible regime in which MMLU accuracy is largely preserved before an abrupt transition at model-specific boundaries. This transition coincides with coordinated shifts in output entropy, prediction entropy and inter-layer geometry. At the LLaMA2-7B transition boundary, MixT reduces full-model parameters by 47.5\%, inference FLOPs by 37.1\%, training FLOPs by 52.1\% and peak inference memory by 60.4\%, demonstrating its practical potential for lower-cost LLM compression.

2605.25338 2026-05-26 cs.LG cs.AI 版本更新

CausalFlow: Causal Attribution and Counterfactual Repair for LLM Agent Failures

CausalFlow: LLM Agent 失败的因果归因与反事实修复

Akash Bonagiri, Devang Borkar, Gerard Janno Anderias, Setareh Rafatirad, Houman Homayoun

发表机构 * Department of Computer Science University of California, Davis(计算机科学系加州大学戴维斯分校)

AI总结 提出CausalFlow框架,通过反事实干预计算步骤级因果责任分数,识别失败步骤并生成最小编辑修复,用于测试时修复和训练时监督,在多个基准上优于启发式方法。

详情
AI中文摘要

大型语言模型(LLM)代理在涉及推理、工具使用和环境交互的多步任务中经常失败。虽然此类失败通常被记录或通过启发式重试处理,但它们包含了关于执行中断位置的结构化信号。我们提出了CausalFlow,一个干预框架,将失败的代理轨迹转换为最小的反事实修复和可重用的监督。CausalFlow将执行轨迹建模为依赖步骤的顺序链,并通过步骤级反事实干预计算因果责任分数(CRS)来识别导致失败的步骤。对于这些步骤,我们生成最小编辑修复,将最终结果翻转为成功,产生形式为(错误步骤,修正步骤)的验证对比对。CausalFlow支持两种互补用途:具有最小行为漂移的针对性测试时修复,以及适用于离线偏好优化或奖励建模的训练时监督。在涵盖数学推理、代码生成、问答和医学浏览的四个基准测试中,CausalFlow将失败执行转换为具有高最小性和因果一致性分数的验证最小修复,并证明因果归因对于跨不同代理任务的可靠改进是必要的,在复杂检索设置中优于启发式细化,同时产生更局部的修复。这些结果表明,对结构化执行轨迹的干预分析提供了一种原则性和可扩展的机制,将代理失败转化为可靠性提升和可学习的监督。

英文摘要

Large language model (LLM) agents frequently fail on multi-step tasks involving reasoning, tool use, and environment interaction. While such failures are typically logged or retried heuristically, they contain structured signals about where execution broke down. We introduce CausalFlow, an interventional framework that converts failed agent traces into minimal counterfactual repairs and reusable supervision. CausalFlow models execution traces as sequential chains of dependent steps and computes Causal Responsibility Scores (CRS) via step-level counterfactual intervention to identify failure-inducing steps. For these steps, we generate minimally edited repairs that flip the final outcome to success, producing validated contrastive pairs of the form (wrong step, corrected step). CausalFlow supports two complementary uses: targeted test-time repair that recovers from failures with minimal behavioral drift, and training-time supervision suitable for offline preference optimization or reward modeling. Across four benchmarks spanning mathematical reasoning, code generation, question answering, and medical browsing, CausalFlow converts failed executions into validated minimal repairs with high minimality and causal-consensus scores, and demonstrates that causal attribution is necessary for reliable improvement across diverse agent tasks, outperforming heuristic refinement in complex retrieval settings while producing more localized repairs throughout. These results demonstrate that interventional analysis over structured execution traces provides a principled and scalable mechanism for transforming agent failures into reliability gains and learning-ready supervision.

2605.25313 2026-05-26 cs.LG cs.AI cs.RO stat.ML 版本更新

UWM-JEPA: Predictive World Models That Imagine in Belief Space

UWM-JEPA:在信念空间中进行想象的世界预测模型

Santosh Kumar Radha, Oktay Goktas

发表机构 * AgentField AI

AI总结 针对部分可观测环境,提出UWM-JEPA模型,通过密度矩阵潜变量和酉预测器在信念空间中保持联合状态谱,实现长时域盲推演下的不确定性保持,显著优于向量潜变量基线。

Comments 14 pages, 6 figures, 7 tables. Code and data: https://github.com/santoshkumarradha/uwm-jepa

详情
AI中文摘要

部分可观测环境下的世界模型必须想象多个兼容的隐藏未来,并在反事实动作下在它们之间进行引导。联合嵌入预测架构(JEPAs)在潜在空间中实现这一点,但向量值潜变量没有内部结构来在盲推演过程中承载对隐藏后续的信念。我们引入了酉世界模型JEPA(UWM-JEPA),这是一种JEPA世界模型,具有在联合系统-环境空间上的密度矩阵潜变量和学习的酉预测器。该结构在推演过程中精确保持联合状态谱,因此预测器本身不会耗散表示的不确定性。在一个需要根据给定动作序列进行五步前向模拟且目标观测被掩蔽的隐藏速度指示任务中,UWM-JEPA达到0.77的准确率,并且随着动作被扰动而单调下降;而参数匹配的LSTM-JEPA在相同的反事实目标训练目标和动作头下训练,在所有动作条件下都崩溃为多数类准确率(0.53)。在盲推演下,UWM-JEPA在短时域上损失不到十个点的探针R^2,而向量潜变量基线损失四十一个和六十八个点;两者在保留的上下文探针上表现相当,表明差异在于预测器而非编码器。动作敏感性本身需要针对反事实而非教师强制目标进行训练,这一发现适用于酉参数化之外。对于JEPA世界模型在部分可观测性下进行想象,潜变量几何和预测器动力学至关重要,而不仅仅是冻结的上下文编码能力。

英文摘要

World models for partially observed environments must imagine multiple compatible hidden futures and steer between them under counterfactual actions. Joint Embedding Predictive Architectures (JEPAs) do this in latent space, but a vector-valued latent has no internal structure for carrying the belief over hidden continuations through blind rollout. We introduce the Unitary World Model JEPA (UWM-JEPA), a JEPA world model with a density-matrix latent on a joint system-environment space and a learned unitary predictor. The construction preserves the joint-state spectrum exactly during rollout, so the predictor itself cannot dissipate the represented uncertainty. On a hidden-velocity indicator task requiring five-step forward simulation under a given action sequence with the target observation masked, UWM-JEPA reaches 0.77 accuracy and degrades monotonically as actions are perturbed; a parameter-matched LSTM-JEPA trained under the same counterfactual-target objective and action head collapses to majority-class accuracy (0.53) under every action condition. Under blind rollout, UWM-JEPA loses fewer than ten points of probe R^2 at short horizons while vector-latent baselines lose forty-one and sixty-eight; both nevertheless tie on a held-out context probe, locating the separation in the predictor rather than the encoder. Action sensitivity itself requires training against counterfactual rather than teacher-forced targets, a finding that applies beyond the unitary parameterisation. For JEPA world models to imagine under partial observability, latent geometry and predictor dynamics matter, not frozen context-encoding capacity alone.

2605.25305 2026-05-26 cs.LG 版本更新

Electricity Consumption Forecasting: An Approach Using Cooperative Ensemble Learning with SHapley Additive exPlanations

电力消耗预测:一种使用SHapley加法解释的协作集成学习方法

Eduardo Luiz Alba, Gilson Adamczuk Oliveira, Matheus Henrique Dal Molin Ribeiro, Érick Oliveira Rodrigues

发表机构 * Industrial & Systems Engineering Graduate Program (PPGEPS), Federal University of Technology-Parana (UTFPR)(工业与系统工程研究生项目(PPGEPS),巴拉那联邦理工大学(UTFPR))

AI总结 提出一种名为弱分离器增强器(WSB)的协作集成学习方法,结合LSTM、RF、SVR和XGBoost模型,利用SHAP进行特征选择,遗传算法和粒子群优化超参数,对巴西联邦学院两个校区未来12个月的电力消耗进行预测,取得较低误差。

详情
Journal ref
Forecasting 2024
AI中文摘要

电力费用管理面临重大挑战,因为该资源易受多种影响因素影响。在大学中,随着机构扩张,对该资源的需求迅速增长,并对环境产生显著影响。本研究使用长短期记忆(LSTM)、随机森林(RF)、支持向量回归(SVR)和极端梯度提升(XGBoost)机器学习模型,基于巴拉那联邦学院(IFPR)过去七年的历史消费数据和气候变量,训练模型以预测未来12个月的电力消耗。采用了两个校区的数据集。为了提高模型性能,使用Shapley加法解释(SHAP)进行特征选择,并使用遗传算法(GA)和粒子群优化(PSO)进行超参数优化。结果表明,所提出的名为弱分离器增强器(WSB)的协作集成学习方法在数据集上表现最佳。具体而言,对于IFPR-Palmas校区,其sMAPE为13.90%,MAE为1990.87 kWh;对于Coronel Vivida校区,sMAPE为18.72%,MAE为465.02 kWh。SHAP分析揭示了两个IFPR校区不同的特征重要性模式。一个共同点是滞后时间序列值的强烈影响和气候变量的最小影响。

英文摘要

Electricity expense management presents significant challenges, as this resource is susceptible to various influencing factors. In universities, the demand for this resource is rapidly growing with institutional expansion and has a significant environmental impact. In this study, the machine learning models long short-term memory (LSTM), random forest (RF), support vector regression (SVR), and extreme gradient boosting (XGBoost) were trained with historical consumption data from the Federal Institute of Paraná (IFPR) over the last seven years and climatic variables to forecast electricity consumption 12 months ahead. Datasets from two campuses were adopted. To improve model performance, feature selection was performed using Shapley additive explanations (SHAP), and hyperparameter optimization was carried out using genetic algorithm (GA) and particle swarm optimization (PSO). The results indicate that the proposed cooperative ensemble learning approach named Weaker Separator Booster (WSB) exhibited the best performance on both datasets. Specifically, it achieved an sMAPE of 13.90% and MAE of 1990.87 kWh for the IFPR-Palmas Campus and an sMAPE of 18.72% and MAE of 465.02 kWh for the Coronel Vivida Campus. The SHAP analysis revealed distinct feature importance patterns across the two IFPR campuses. A commonality that emerged was the strong influence of lagged time-series values and a minimal influence of climatic variables.
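
The two error metrics quoted above can be sketched as follows. Several sMAPE variants exist; this sketch assumes the common form with the mean of absolute actual and forecast values in the denominator, which may differ from the one used in the paper. The kWh values are hypothetical.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error, in the same unit as the data (e.g. kWh)."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    return float(np.mean(np.abs(y_true - y_pred)))

def smape(y_true, y_pred):
    """Symmetric MAPE in percent, with (|y| + |y_hat|) / 2 as denominator (assumed variant)."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    return float(100.0 * np.mean(np.abs(y_true - y_pred) / denom))

# Hypothetical monthly consumption and forecasts (strictly positive values).
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]

mae_value = mae(actual, forecast)      # (10 + 20 + 0) / 3
smape_value = smape(actual, forecast)  # 100 * (10/105 + 20/190 + 0) / 3
```

MAE depends on the scale of consumption while sMAPE is relative, which is why a campus can show the lower MAE and the higher sMAPE at the same time, as in the figures above.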

2605.25304 2026-05-26 cs.LG cs.CR cs.CV 版本更新

When Interpretability Becomes a Liability: Adversarial Attacks on CBM Concept Layers

当可解释性成为负担:针对CBM概念层的对抗攻击

Aditya Sridhar

发表机构 * Independent Researcher(独立研究者)

AI总结 本文系统研究了概念瓶颈模型(CBM)中概念层的对抗性脆弱性,提出了一种基于语义扰动的稳定性正则化防御方法SPECTRA,显著提高了攻击所需的最小扰动范数,同时保持了分类精度。

Comments Accepted to CVPR 2026 (Findings). 9 pages, 6 figures

详情
AI中文摘要

概念瓶颈模型(CBM)已成为可解释机器学习的基础方法,通过显式的概念激活提供人类可理解的中间表示。然而,这种可解释性从根本上引入了一个关键且先前未被探索的攻击面:概念瓶颈层本身。我们提出了对CBM中概念级对抗性脆弱性的全面、系统性研究,揭示了对输入像素进行有针对性的最小扰动可以通过操纵语义表示导致灾难性的错误分类。我们开发了一个严格的理论框架来量化概念空间的鲁棒性,建立了揭示这些架构脆弱性景观的新指标。我们在CUB-200-2011数据集上的广泛分析表明,标准CBM对概念级操纵表现出严重的敏感性。为了解决这一关键弱点,我们引入了SPECTRA(基于语义扰动的概念训练以增强对抗鲁棒性),一种原则性的稳定性正则化防御。SPECTRA有效地强化了语义表示空间,将成功攻击所需的最小扰动范数从0.46提高到超过4,200,使得有针对性的概念操纵在计算上变得不可行。此外,SPECTRA将基线分类精度保持在2.2%以内。通过将概念级攻击确立为一种根本不同的威胁模型,这项工作在可解释机器学习与对抗鲁棒性的交叉领域开辟了一个新的研究前沿。

英文摘要

Concept Bottleneck Models (CBMs) have emerged as a cornerstone approach for interpretable machine learning, providing human-understandable intermediate representations through explicit concept activations. However, this interpretability fundamentally introduces a critical, previously unexplored attack surface: the concept bottleneck layer itself. We present a comprehensive, systematic study of concept-level adversarial vulnerabilities in CBMs, revealing that targeted, minimal perturbations operating on input pixels can induce catastrophic misclassification by manipulating semantic representations. We develop a rigorous theoretical framework to quantify concept-space robustness, establishing novel metrics that expose the vulnerability landscape of these architectures. Our extensive analysis on the CUB-200-2011 dataset demonstrates that standard CBMs exhibit severe susceptibility to concept-level manipulation. To address this critical weakness, we introduce SPECTRA (Semantic Perturbation-based Concept Training for Robustness against Attacks), a principled stability regularization defense. SPECTRA effectively hardens the semantic representation space, increasing the minimal perturbation norm required for a successful attack from 0.46 to over 4,200, rendering targeted concept manipulation computationally prohibitive. Furthermore, SPECTRA preserves baseline classification accuracy to within 2.2%. By establishing concept-level attacks as a fundamentally distinct threat model, this work opens a new research frontier at the intersection of interpretable machine learning and adversarial robustness.

2605.25290 2026-05-26 stat.ML cs.LG 版本更新

Choosing Online Experiment Designs under Interference in Ads, Recommendations, and Member-Experience Systems

广告、推荐和会员体验系统中存在干扰时的在线实验设计选择

Prashant Shekhar, Caroline Howard

发表机构 * Department of Mathematics(数学系) Embry-Riddle Aeronautical University(安柏瑞德航空大学)

AI总结 针对广告、推荐和会员体验系统中干扰机制未知的问题,提出一种基于鲁棒设计选择的框架,通过最坏情况规划风险比较六种可实施设计,并给出几何感知保证和有限目录近似定理。

详情
AI中文摘要

广告、推荐和会员体验系统中的在线实验通常是在主导干扰机制已知之前规划的。处理效应可能通过预算、库存、生产者曝光、图溢出或时间结转传播,使得随机化设计本身成为一个统计决策。我们将此问题形式化为在不确定曝光机制下的鲁棒设计选择。给定一个包含六种可实施设计的有限目录,选择器通过模糊集上的最坏情况规划风险比较每种设计。风险结合了曝光偏差、分配单元方差、最小可检测效应、污染或结转、操作成本和估计量不匹配。在理论证明方面,本文开发了一种几何感知保证,指出设计偏差受限于到发布曝光分布的Wasserstein距离,并且该惩罚在Lipschitz曝光响应下是极小极大紧的。我们还证明了有限目录近似和具有超额风险控制的鲁棒选择器定理、在分离条件下的精确恢复,以及当风险曲面平坦时的认证候选列表。实证上,同一选择器在来自公共数据集的样本上给出不同的推荐。它在Criteo广告上选择用户随机化,无量纲鲁棒风险为1.295;在Open Bandit-bts/men上选择切换设计,风险为2.105;在KuaiRand上选择聚类随机化,风险为2.240。Open Bandit案例强调了已知但不均匀的日志记录支持,倾向性从0.00006到0.594,IPS有效样本份额为5.17%。总体而言,本文贡献了一个基于机制鲁棒设计决策的干扰感知实验设计框架,输出要么是合理的设计选择,要么是不确定性候选列表。

英文摘要

Online experiments in ads, recommendation, and member-experience systems are often planned before the dominant interference mechanism is known. A treatment may propagate through budgets, inventory, producer exposure, graph spillovers, or temporal carryover, making the randomization design itself a statistical decision. We formulate this problem as robust design selection over uncertain exposure mechanisms. Given a finite catalog of six implementable designs, the selector compares each design by worst-case planning risk over an ambiguity set. The risk combines exposure bias, assignment-unit variance, minimum detectable effect, contamination or carryover, operational cost, and estimand mismatch. For theoretical justification, the paper develops a geometry-aware guarantee, stating that design bias is bounded by Wasserstein distance to the launch exposure distribution, and this penalty is minimax tight under Lipschitz exposure response. We also prove finite-catalog approximation and a robust selector theorem with excess-risk control, exact recovery under separation, and certified shortlists when the risk surface is flat. Empirically, the same selector gives different recommendations across samples from public datasets. It selects user-randomization on Criteo ads with dimensionless robust risk 1.295, switchbacks on Open Bandit-bts/men with risk 2.105, and cluster-randomization on KuaiRand with risk 2.240. The Open Bandit case stresses known but uneven logging support, with propensities from 0.00006 to 0.594 and a 5.17% IPS effective-sample share. Overall, the paper contributes an interference-aware experiment design framework based on mechanism-robust design decisions, where the output is either a justified design choice or an uncertainty shortlist.

2605.25275 2026-05-26 cs.LG 版本更新

Label-NTK Alignments and A Tighter Convergence Bound in the NTK Regime

标签-NTK 对齐与 NTK 区域中更紧的收敛界

Ruchirinkil Marreddy, Chaoyue Liu

发表机构 * Elmore Family School of Electrical and Computer Engineering(埃洛姆家族电气与计算机工程学院)

AI总结 通过标签与NTK特征谱的对齐特性,提出更紧的收敛界,显著改进经典最坏情况结果。

详情
AI中文摘要

神经正切核(NTK)框架通过近似线性化动力学解释过参数化神经网络的优化,提供指数收敛保证。然而,现有结果往往过于悲观,与实际快速训练不符,因为它们依赖于最小的NTK特征值,而该特征值在实践中通常极小。在这项工作中,我们通过刻画数据标签与NTK特征谱之间的相互作用,开发了更精确的收敛保证。我们识别出两个关键现象:标签-NTK对齐和残差-NTK对齐,表明标签和残差在NTK特征向量上的投影与对应特征值成比例。我们在温和的数据假设下提供了经验证据和理论证明。利用这些对齐性质,我们推导出一个依赖于完整谱的精细收敛界,该界紧密匹配实际训练动态,显著优于经典最坏情况结果。我们进一步获得了改进的泛化界。在多个数据集上的MLP和CNN实验验证了我们的理论。

英文摘要

The Neural Tangent Kernel (NTK) framework explains optimization in over-parameterized neural networks via approximately linearized dynamics, yielding exponential convergence guarantees. However, existing results are often overly pessimistic and do not match the fast training observed in practice, as they depend on the smallest NTK eigenvalue, which is typically extremely small. In this work, we develop sharper convergence guarantees by characterizing the interaction between data labels and the NTK eigen-spectrum. We identify two key phenomena, Label-NTK alignment and Residual-NTK alignment, showing that projections of labels and residuals onto NTK eigenvectors scale with the corresponding eigenvalues. We provide empirical evidence and theoretical justification under mild data assumptions. Exploiting these alignment properties, we derive a refined convergence bound that depends on the full spectrum and closely matches practical training dynamics, significantly improving over classical worst-case results. We further obtain improved generalization bounds. Experiments on MLPs and CNNs across multiple datasets validate our theory.
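
The gap between the worst-case rate and spectrum-aware behavior can be illustrated numerically. The sketch below uses the standard linearized gradient-flow identity for squared loss with a fixed kernel K, where the residual evolves as r(t) = exp(-K t) r(0). It is not the paper's refined bound: the random Gram matrix standing in for the NTK and the stylized alignment model (projections proportional to eigenvalues) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
features = rng.normal(size=(n, 200))
kernel = features @ features.T / 200.0       # PSD stand-in for an NTK Gram matrix
eigvals, eigvecs = np.linalg.eigh(kernel)    # eigenvalues in ascending order

# Stylized alignment: the projection of the initial residual onto each
# eigenvector scales with the corresponding eigenvalue (assumption).
coeffs = eigvals * rng.normal(size=n)
r0 = eigvecs @ coeffs

def residual_norm(t):
    """||exp(-K t) r0||, computed in the eigenbasis of K."""
    return float(np.sqrt(np.sum((np.exp(-eigvals * t) * coeffs) ** 2)))

def worst_case_bound(t):
    """Classical bound that uses only the smallest eigenvalue."""
    return float(np.exp(-eigvals[0] * t) * np.linalg.norm(r0))

t = 5.0
exact = residual_norm(t)
bound = worst_case_bound(t)
```

When most of the residual mass lies on large-eigenvalue directions, `exact` decays much faster than `bound`, which is the qualitative effect the abstract attributes to alignment.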

2605.25267 2026-05-26 cs.LG cs.AI 版本更新

Latent Q-Barrier Shielding for Safe In-Context Reinforcement Learning

潜在Q-屏障屏蔽用于安全上下文强化学习

Minjae Kwon, Amir Moeini, Shangtong Zhang, Lu Feng

发表机构 * University of Virginia(弗吉尼亚大学)

AI总结 提出一种潜在Q-屏障屏蔽方法,通过学习上下文表示、潜在动力学和集成成本评论家,在部署时无需参数更新即可根据剩余预算和预测未来成本过滤或软重加权候选动作,从而改善安全上下文强化学习在分布外转移下的奖励-安全权衡。

详情
AI中文摘要

安全上下文强化学习(ICRL)在测试时不更新参数,仅从交互历史中在线适应,同时将情节成本控制在安全预算内。在分布外(OOD)部署转移下,仅预训练的安全ICRL可能产生较差的奖励-安全权衡,因为剩余预算仅通过冻结的策略条件影响行为,而非通过针对预测未来成本的显式动作级检查。我们提出一种潜在Q-屏障屏蔽,在部署前学习上下文表示、潜在动力学和集成成本评论家。无需参数更新,该屏蔽从历史中推断上下文,并使用剩余预算和预测未来成本过滤或软重加权候选动作。我们证明了一个条件性的、误差分解的屏障-边际结果:满足Q-屏障的动作将下一个潜在预算状态置于近似预算安全的延续中(在学习的评论家下),误差上界由贝尔曼误差和潜在预测误差决定。在五个安全ICRL基准测试中,该屏蔽在部署时相比强安全ICRL基线改善了奖励-安全权衡:在短上下文窗口后,它在五个基准中的四个上实现了更高的回报,同时在所有五个基准中匹配或降低了平均情节成本。

英文摘要

Safe in-context reinforcement learning (ICRL) adapts online from interaction history without test-time parameter updates while controlling episode cost under a safety budget. Under out-of-distribution (OOD) deployment shifts, pretraining-only safe ICRL can give poor reward-safety tradeoffs because the remaining budget affects behavior only through frozen policy conditioning, not an explicit action-level check against predicted future cost. We propose a latent Q-Barrier shield that learns a context representation, latent dynamics, and an ensemble cost critic before deployment. Without parameter updates, the shield infers context from history and filters or softly reweights candidate actions using the remaining budget and predicted future cost. We prove a conditional, error-decomposed barrier-margin result: a Q-Barrier-satisfying action leaves the next latent-budget state with an approximately budget-safe continuation under the learned critic, up to Bellman and latent-prediction errors. Across five safe ICRL benchmarks, the shield improves deployment-time reward-safety tradeoffs over a strong safe-ICRL baseline: after a short context window, it achieves higher return in four of five benchmarks while matching or lowering average episode cost in all five.

2605.25258 2026-05-26 cs.IR cs.AI cs.CY cs.LG 版本更新

First, do no harm: Breaking suicidogenic echo chambers in media recommendation

首先,不伤害:打破媒体推荐中的自杀性回音室

Alberto Díaz-Álvarez, Raúl Lara-Cabrera, Fernando Ortega-Requena, Víctor Ramos-Osuna

发表机构 * E.T.S.I. Sistemas Informáticos (Universidad Politécnica de Madrid)(马德里理工大学信息系统工程系)

AI总结 针对推荐系统在心理健康场景中可能加剧用户自杀倾向的问题,提出RankAid重排序方法,通过惩罚有害内容并提升治疗性内容,在保持推荐准确性的同时确保临床安全。

Comments 10 pages, 5 figures. Research on safety-aware recommender systems and algorithmic ethics

详情
AI中文摘要

推荐系统通常优化用户参与度,但在心理健康背景下这种方法存在危险。当脆弱用户表现出自杀意念迹象时,标准算法往往将他们困在有害内容的回音室中,恶化其心理状态。为此,我们引入RankAid,一种重排序方法,在预测相关性的同时优先考虑临床安全性。它作为现有模型的附加层运行:根据用户当前的脆弱程度惩罚风险项目并提升治疗性内容。我们使用MovieLens 1M数据集评估了该方法,其中项目通过大语言模型进行了临床风险和治疗价值的语义注释。我们的模拟表明,该算法在危机高峰期成功阻止了有害内容的推荐,主动重塑信息流以支持情绪降级。此外,这种安全干预仅导致标准准确性指标(如NDCG)可控且可接受的下降。通过使用非对称超参数,RankAid还使系统管理员能够根据特定的临床指南调整干预的严重程度。

英文摘要

Recommender systems generally optimise user engagement, but this approach is dangerous in mental health contexts. When vulnerable users show signs of suicidal ideation, standard algorithms often trap them in echo chambers of harmful content, worsening their psychological state. In response, we introduce RankAid, a re-ranking method that prioritises clinical safety alongside predictive relevance. It works as an add-on layer to existing models: it penalises risky items and boosts therapeutic content depending on the user's current level of vulnerability. We evaluated this approach using the MovieLens 1M dataset, where items were semantically annotated for clinical risk and therapeutic value using large language models. Our simulations show that our algorithm successfully blocks the recommendation of harmful content during crisis peaks, actively reshaping the feed to support emotional de-escalation. Furthermore, this safety intervention only causes a controlled, acceptable drop in standard accuracy metrics like NDCG. By using asymmetric hyperparameters, RankAid also gives system administrators the flexibility to tune the severity of the intervention based on specific clinical guidelines.
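
A re-ranking layer of the kind described can be sketched as a score adjustment applied on top of a base recommender. The scoring rule below (relevance minus a vulnerability-scaled risk penalty plus a vulnerability-scaled therapeutic boost) is a hypothetical form chosen for illustration, not the formula from the paper; the names `Item`, `rerank`, `alpha`, and `beta`, and all item annotations, are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Item:
    name: str
    relevance: float   # score from the base recommender
    risk: float        # clinical risk in [0, 1] (hypothetical annotation)
    benefit: float     # therapeutic value in [0, 1] (hypothetical annotation)

def rerank(items, vulnerability, alpha=2.0, beta=0.5):
    """Sort items by adjusted score; alpha != beta makes the penalty and boost asymmetric."""
    def score(item):
        return (item.relevance
                - alpha * vulnerability * item.risk
                + beta * vulnerability * item.benefit)
    return sorted(items, key=score, reverse=True)

catalog = [
    Item("dark_drama", relevance=0.9, risk=0.8, benefit=0.0),
    Item("light_comedy", relevance=0.7, risk=0.0, benefit=0.6),
    Item("action", relevance=0.8, risk=0.1, benefit=0.1),
]

calm = [item.name for item in rerank(catalog, vulnerability=0.0)]
crisis = [item.name for item in rerank(catalog, vulnerability=1.0)]
```

With zero vulnerability the ordering is the base recommender's; at maximum vulnerability the high-risk item drops to the bottom and the therapeutic item rises to the top.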

2605.23650 2026-05-26 stat.ML cs.LG 版本更新

Learning Kernel-Based MDPs from Episodic Preferential Feedback

从片段偏好反馈中学习基于核的MDP

Nikola Pavlovic, Sattar Vakili, Qing Zhao

发表机构 * Cornell University(康奈尔大学) MediaTek Research(联发科研究)

AI总结 本文研究片段核MDP中的偏好学习,提出基于偏好比较的价值估计和置信集方法,并证明亚线性遗憾界。

详情
AI中文摘要

人类反馈通常以偏好形式而非校准的数值奖励出现,这推动了从偏好反馈中强化学习(也称为从人类反馈中强化学习,RLHF)。我们对片段核MDP中的纯偏好学习进行了严格的理论研究。在每个片段中,学习器从共同起始状态部署两个策略,并接收一个二进制标签,指示哪个轨迹更受偏好,该标签通过Bradley-Terry-Luce链接函数基于累积(未观测)奖励的差异建模。在奖励和转移函数基于核的假设下(这是最适宜理论分析的一般模型之一),我们开发了基于偏好的价值估计和专门针对片段结束比较的置信集。我们证明了遗憾界以高概率随片段数亚线性增长,这意味着学习策略的价值收敛到最优策略的价值。

英文摘要

Human feedback often arrives as preferences rather than calibrated numeric rewards, motivating reinforcement learning from preferential feedback, also referred to as reinforcement learning from human feedback (RLHF). We present a rigorous theoretical study of preference-only learning in episodic kernel MDPs. In each episode, the learner deploys two policies from a common start state and receives a single binary label indicating which trajectory is preferred, modeled by a Bradley-Terry-Luce link on the difference of cumulative (unobserved) rewards. Under kernel-based assumptions on the reward and transition functions (one of the most general models amenable to theoretical analysis), we develop preference-based value estimation and confidence sets tailored to end-of-episode comparisons. We prove high-probability regret bounds that scale sublinearly in the number of episodes, implying that the value of the learned policy converges to that of the optimal policy.
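
The feedback model in this setting can be sketched in a few lines. Under the Bradley-Terry-Luce link, the probability that trajectory A is preferred over trajectory B is the logistic sigmoid of the difference in their cumulative rewards. The function names and the example returns below are hypothetical; in the actual setting the returns are unobserved and only the binary label reaches the learner.

```python
import math
import random

def preference_probability(return_a, return_b):
    """Bradley-Terry-Luce link: P(A preferred over B) = sigmoid(R_A - R_B)."""
    return 1.0 / (1.0 + math.exp(-(return_a - return_b)))

def sample_label(return_a, return_b, rng):
    """Binary label: 1 if trajectory A is preferred, else 0."""
    return 1 if rng.random() < preference_probability(return_a, return_b) else 0

rng = random.Random(0)
p = preference_probability(3.0, 1.0)     # sigmoid(2)
labels = [sample_label(3.0, 1.0, rng) for _ in range(10_000)]
empirical = sum(labels) / len(labels)    # should be close to p
```

Each episode yields a single bit, so the learner has to infer reward differences from the frequencies of such labels rather than from numeric rewards.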

2605.23491 2026-05-26 cs.LG cs.AI cs.CL 版本更新

CoSPlay: Cooperative Self-Play at Test-Time with Self-Generated Code and Unit Test

CoSPlay: 测试时协作自我博弈与自生成代码和单元测试

Zhangyi Hu, Chenhui Liu, Tian Huang, Jindong Li, Yang Yang, Jiemin Wu, Zining Zhong, Menglin Yang, Yutao Yue

发表机构 * The Hong Kong University of Science and Technology (Guangzhou)(香港科技大学(广州)) Institute of Deep Perception Technology, JITRI, Wuxi, China(深度感知技术研究院,无锡,中国)

AI总结 提出CoSPlay框架,通过代码与单元测试的协作自我博弈,在无真实单元测试的情况下迭代优化两者,显著提升代码生成性能。

Comments Code is available at: https://github.com/sanae-ai/CosPlay | Data & log is available at: https://huggingface.co/datasets/yomi017/CosPlay

详情
AI中文摘要

最近,可验证奖励强化学习(RLVR)和测试时扩展(TTS)通过可执行验证推动了LLM代码生成的发展。然而,真实单元测试(GT UTs)仍然是瓶颈:最先进的RLVR方法需要它们进行昂贵的训练,而现有的TTS方法在没有它们的情况下会失去竞争力。这促使了无GT的TTS,其中现有方法直接使用自生成的UT来优化和选择代码候选。然而,这些UT通常带有噪声或与错误代码虚假耦合,而UT质量在没有可靠代码的情况下也无法验证。因此,关键挑战是同时改进两者。为此,我们提出了CoSPlay,一个无GT、无需训练的框架,通过协作自我博弈同时改进代码和UT。它首先探索多样化的解决方案思路,识别其潜在失败模式以生成有区分力的UT思路。然后,它利用代码-UT执行矩阵中的双向通过计数信号,迭代地修剪或修复弱代码,并刷新或替换不可靠的UT,使两个池共同进化。最后,当多个代码在最高通过计数上并列时,它从最大的输出共识簇中选择最终代码,因为正确的代码在相同输入上一致,而错误的代码则发散。在四个具有挑战性的基准上的实验表明,CoSPlay在Qwen2.5-7B-Instruct上将平均BoN从22.1%提升到33.2%,UT准确率从14.6%提升到78.3%,匹配或超越了RLVR模型CURE-7B。当应用于CURE-7B时,它进一步将BoN提高了5.7%。CoSPlay还能跨不同骨干网络泛化,并在相当的token预算下优于无GT的TTS基线,且随着预算增加持续获益。这些结果表明,无需任何GT数据即可实现竞争性代码生成的可扩展推理策略。

英文摘要

Recently, Reinforcement Learning with Verifiable Rewards (RLVR) and Test-Time Scaling (TTS) have advanced LLM code generation through executable verification. Yet Ground-Truth Unit Tests (GT UTs) remain a bottleneck: SOTA RLVR methods require them for costly training, while existing TTS methods lose competitiveness without them. This motivates GT-free TTS, where existing methods directly use self-generated UTs to refine and select code candidates. Yet such UTs are often noisy or spuriously coupled with wrong code, and UT quality in turn cannot be validated without reliable code. The key challenge is therefore to jointly improve both. To this end, we present CoSPlay, a GT-free, training-free framework that jointly improves codes and UTs through cooperative self-play. It first explores diverse solution ideas and identifies their potential failure modes to produce discriminative UT ideas. It then uses bidirectional pass-count signals from the Code-UT execution matrix to iteratively prune or fix weak codes and refresh or replace unreliable UTs, letting the two pools co-evolve. Finally, when multiple codes remain tied at the highest pass count, it picks the final code from the largest output-consensus cluster, since correct codes agree on the same inputs while wrong codes diverge. Experiments on four challenging benchmarks show that CoSPlay on Qwen2.5-7B-Instruct improves average BoN from 22.1% to 33.2% and UT accuracy from 14.6% to 78.3%, matching or surpassing the RLVR model CURE-7B. When applied to CURE-7B, it further improves BoN by 5.7%. CoSPlay also generalizes across diverse backbones and outperforms GT-free TTS baselines under comparable token budgets, with continued gains as the budget scales up. These results suggest a scalable inference strategy for competitive code generation without any GT data.
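
The selection logic described above can be sketched with a toy execution matrix. This is a simplified illustration under stated assumptions, not the authors' implementation: the matrix, the probe outputs, and the variable names are hypothetical, and the iterative pruning and repair steps are omitted. Row sums give per-code pass counts, column sums give per-test pass counts, and ties at the top are broken by the largest group of codes that produce identical outputs.

```python
from collections import defaultdict

# Hypothetical execution matrix: passes[i][j] is True if code i passes unit test j.
passes = [
    [True, True, True, False],    # code 0
    [True, True, True, False],    # code 1
    [True, True, False, True],    # code 2
    [False, True, False, False],  # code 3
]
# Hypothetical outputs of each code on a shared set of probe inputs.
outputs = [(1, 4, 9), (1, 4, 9), (1, 4, 8), (0, 0, 0)]

# Bidirectional pass-count signals.
code_pass_counts = [sum(row) for row in passes]         # how many tests each code passes
test_pass_counts = [sum(col) for col in zip(*passes)]   # how many codes pass each test

# Codes tied at the highest pass count.
best = max(code_pass_counts)
tied = [i for i, count in enumerate(code_pass_counts) if count == best]

# Group tied codes by identical outputs and pick from the largest cluster.
clusters = defaultdict(list)
for i in tied:
    clusters[outputs[i]].append(i)
largest_cluster = max(clusters.values(), key=len)
selected = largest_cluster[0]
```

Here three codes tie at three passed tests; codes 0 and 1 agree on all probe outputs while code 2 diverges, so the final pick comes from the two-member cluster.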

2605.23473 2026-05-26 cs.LG cs.AI 版本更新

Automated Random Embedding for Practical Bayesian Optimization with Unknown Effective Dimension

面向未知有效维度的实用贝叶斯优化的自动随机嵌入

Hong Qian, Xiang Shu, Xiang Xia, Xuhui Liu, Yangde Fu, Bei Liang, Huibin Wang, Liang Dou

发表机构 * Shanghai Institute of AI for Education, and School of Computer Science and Technology, East China Normal University(上海人工智能教育研究院,华东师范大学计算机科学与技术学院) Ant Group(蚂蚁集团) Nanjing University(南京大学)

AI总结 提出动态共享嵌入贝叶斯优化(DSEBO)方法,通过自动调整子空间维度并共享查询解,平衡近似与优化误差,在高维优化中显著降低遗憾和时间成本。

Comments This paper has been accepted by IJCAI 2026

详情
AI中文摘要

贝叶斯优化广泛应用于复杂黑箱函数的优化,但受维度灾难困扰。随机嵌入作为一种降维策略,通过在低维子空间中优化来简化具有有效维度的任务。然而,预先确定任务的有效维度仍是一个重大挑战,它影响子空间维度的选择和优化性能。传统方法使用专家提供的固定子空间维度,或依赖试错法估计子空间维度,耗费额外资源。为此,本文提出一种针对未知有效维度的高维贝叶斯优化的自动随机嵌入方法,称为动态共享嵌入贝叶斯优化(DSEBO)。DSEBO从低维度开始,如果当前子空间中的解显示初步收敛,则切换到更高维的子空间。DSEBO基于不同子空间中解的质量动态确定下一子空间的维度,并与新子空间共享已查询的解以实现更好的初始化。理论上,我们推导了DSEBO的遗憾界,并证明DSEBO能更好地平衡近似误差和优化误差。在维度规模变化的函数和未知有效维度的实际任务上的大量实验表明,与最先进方法相比,跨不同子空间的交替优化在高维优化中带来显著改进,优化遗憾和时间开销均有降低。

英文摘要

Bayesian optimization is widely employed for optimizing complex black-box functions but struggles with the curse of dimensionality. Random embedding, as a dimension reduction strategy, simplifies tasks that possess the effective dimension by optimizing within a low-dimensional subspace. However, determining the effective dimension of a task in advance remains a significant challenge, which influences the selection of the subspace dimensionality and the optimization performance. Traditional methods use fixed subspace dimensions provided by experts or rely on trial and error to estimate subspace dimensions with resources consumed. To this end, this paper proposes an automated random embedding for high-dimensional Bayesian optimization with unknown effective dimension, called Dynamic Shared Embedding Bayesian Optimization (DSEBO). DSEBO starts with a low dimension and switches to a higher subspace if the solutions in the current subspace show preliminary convergence. DSEBO dynamically determines the dimension of the next subspace based on the quality of the solutions in different subspaces and shares the queried solutions with the new subspace for a better initialization. Theoretically, we derive a regret bound for DSEBO and demonstrate that DSEBO can better balance approximation and optimization errors. Extensive experiments on functions with dimensionality of varying magnitudes and real-world tasks with unknown effective dimensions reveal that, compared with state-of-the-art methods, alternating optimization across different subspaces results in significant improvements in high-dimensional optimization, both in terms of optimization regret and time.
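To make the random-embedding idea concrete, here is a minimal sketch of how a point in a low-dimensional subspace is lifted into the original search box and evaluated. It is only an illustration under stated assumptions: the Gaussian embedding matrix with clipping is the common random-embedding construction, uniform random search stands in for the Bayesian optimization loop, and the toy objective and all function names are hypothetical. DSEBO's dimension-switching rule and solution sharing are not reproduced here.

```python
import random

def make_embedding(D, d, rng):
    """Draw a D x d Gaussian matrix mapping a d-dim subspace into R^D."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(D)]

def lift(A, y, low=-1.0, high=1.0):
    """Map a low-dimensional point y to the box [low, high]^D by clipping A @ y."""
    return [min(high, max(low, sum(a * v for a, v in zip(row, y)))) for row in A]

def objective(x):
    # Toy black box (assumption): only the first two coordinates matter.
    return (x[0] - 0.3) ** 2 + (x[1] + 0.2) ** 2

def random_search(D, d, n_queries, rng):
    """Stand-in for the inner optimizer: sample y in the subspace, keep the best value."""
    A = make_embedding(D, d, rng)
    best = float("inf")
    for _ in range(n_queries):
        y = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        best = min(best, objective(lift(A, y)))
    return best
```

Calling `random_search(100, d, 200, random.Random(0))` for increasing `d` gives a rough feel for how the subspace dimension changes what the optimizer can reach.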

2605.23395 2026-05-26 cs.LG 版本更新

Convex Compositional Reasoning Models

凸组合推理模型

Meir Roketlishvili, Semyon Semenov, Maksim Bobrin, Viktor Kovalchuk, Albert Baichorov, Abduragim Shtanchaev, Fakhri Karray, Dmitry V. Dylov, Martin Takáč, Arip Asadulaev

发表机构 * Mohamed bin Zayed University of Artificial Intelligence(穆罕默德·本·扎耶德人工智能大学) Applied AI Institute(应用人工智能研究所) Computational Imaging Lab(计算成像实验室)

AI总结 针对组合推理中能量景观的非凸几何瓶颈,提出凸组合能量最小化框架,通过输入凸神经网络参数化因子并优化紧凸松弛,实现确定性投影一阶优化,在小问题上训练后可零样本迁移到大实例。

详情
AI中文摘要

组合能量模型可以通过在许多局部约束中重用学习到的因子能量,泛化到更大的组合推理问题。在本文中,我们表明组合推理的一个关键瓶颈不是组合本身,而是学习到的能量景观的非凸几何。为了解决这个问题,我们引入了凸组合能量最小化(CCEM),这是一个用输入凸神经网络参数化每个因子,并在可行集的紧凸松弛上优化组合能量的框架。由于凸性在求和下保持不变,全局松弛目标保持凸性,从而能够进行确定性投影一阶优化。CCEM分两个阶段训练:因子级对比学习以塑造局部能量盆地,然后通过展开的投影求解器进行端到端细化。我们的实验表明,在小子问题或单个问题规模上训练的模型可以无需重新训练地迁移到更大的实例。

英文摘要

Compositional energy-based models can generalize to larger combinatorial reasoning problems by reusing a learned factor energy across many local constraints. In our paper, we show that a key bottleneck in compositional reasoning is not composition itself, but the non-convex geometry of the learned energy landscape. To solve this problem, we introduce Convex Compositional Energy Minimization (CCEM), a framework that parameterizes each factor with an input-convex neural network and optimizes the composed energy over a tight convex relaxation of the feasible set. Because convexity is preserved under summation, the global relaxed objective remains convex, enabling deterministic projected first-order optimization. CCEM is trained in two stages: factor-level contrastive learning to shape local energy basins, followed by end-to-end refinement through an unrolled projected solver. Our experiments show that our models trained on small subproblems or a single problem size transfer to larger instances without retraining.
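The argument that a sum of convex factor energies stays convex, so a deterministic projected first-order method suffices, can be illustrated with a small sketch. The assumptions are loud here: each factor is a hand-written convex quadratic rather than a learned input-convex neural network, the relaxed feasible set is taken to be the box [0, 1]^n, and all names are hypothetical.

```python
def factor_grad(x, a, b):
    """Gradient of the convex factor energy (a . x - b)^2 with respect to x."""
    r = sum(ai * xi for ai, xi in zip(a, x)) - b
    return [2.0 * r * ai for ai in a]

def total_energy(x, factors):
    """Composed energy: a sum of convex factors, hence convex."""
    return sum((sum(ai * xi for ai, xi in zip(a, x)) - b) ** 2 for a, b in factors)

def project_box(x):
    """Euclidean projection onto the relaxed feasible set [0, 1]^n."""
    return [min(1.0, max(0.0, xi)) for xi in x]

def solve(factors, n, steps=500, lr=0.05):
    """Deterministic projected gradient descent on the composed energy."""
    x = [0.5] * n
    for _ in range(steps):
        g = [0.0] * n
        for a, b in factors:
            for i, gi in enumerate(factor_grad(x, a, b)):
                g[i] += gi
        x = project_box([xi - lr * gi for xi, gi in zip(x, g)])
    return x
```

With the three factors `([1, 1, 0], 1)`, `([0, 1, 1], 1)`, and `([1, 0, 1], 2)`, the iterates approach the corner `(1, 0, 1)` of the box, where every factor energy is zero.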

2605.22894 2026-05-26 cs.GR cs.LG cs.RO 版本更新

SCRIPT: Scalable Diffusion Policy with Multi-stage Training for Language-driven Physics-based Humanoid Control

SCRIPT: 面向语言驱动的物理仿真人体控制的可扩展扩散策略与多阶段训练

Jingyan Zhang, Han Liang, Ruichi Zhang, Bin Li, Juze Zhang, Xin Chen, Jingya Wang, Lan Xu, Jingyi Yu

发表机构 * ShanghaiTech University(上海科技大学) University of Pennsylvania(宾夕法尼亚大学) Stanford University(斯坦福大学)

AI总结 提出SCRIPT框架,通过联合动作-状态-文本扩散Transformer和多阶段训练(监督模仿预训练+混合奖励强化学习后训练),实现语言指令驱动的物理仿真人体控制,在文本对齐、运动质量和物理真实性上超越现有方法。

Comments Project page: https://zhanglele12138.github.io/SCRIPT/

详情
AI中文摘要

从自然语言指令控制物理仿真人体是迈向通用具身智能体的关键一步。然而,现有方法仍受限于语义表达能力和物理可行性之间的张力,往往难以同时实现忠实的指令跟随、高质量的运动和稳定的长时程控制。我们提出SCRIPT,一种具有多阶段训练框架的可扩展扩散策略,用于语言驱动的物理仿真人体控制。SCRIPT的核心是联合动作-状态-文本扩散Transformer(JAST-DiT),它将动作、物理状态和文本表示为专门的令牌流,并通过联合注意力将它们耦合,使语言语义和控制动态之间能够直接交互。为了稳定自回归控制,我们引入了一种非线性历史条件机制,该机制保留密集的近期上下文,并从长期历史中采样越来越稀疏的线索。除了监督模仿预训练外,我们提出了一个后训练阶段,使用混合奖励强化学习(RLHR)进一步提高性能。通过将可学习噪声注入流采样过程,RLHR利用混合物理反馈和文本奖励在闭环模拟中有效改善运动质量和指令跟随。定量评估表明,SCRIPT在文本对齐、运动质量和物理真实性指标上均优于先前的最先进方法。此外,在1200小时的MotionMillion数据集上的扩展研究显示,随着模型规模的扩大,性能持续提升,突显了SCRIPT在大规模预训练中的稳健可扩展性。我们的代码将公开供未来研究使用。

英文摘要

Controlling physics-based humanoids from natural-language instructions is a critical step toward general-purpose embodied agents. However, existing methods remain constrained by a tension between semantic expressiveness and physical feasibility, often failing to jointly achieve faithful instruction following, high-quality motion, and stable long-horizon control. We propose SCRIPT, a scalable diffusion policy with a multi-stage training framework for language-driven physics-based humanoid control. The core of SCRIPT is a Joint Action-State-Text Diffusion Transformer (JAST-DiT), which represents actions, physical states, and text as dedicated token streams and couples them through joint attention, enabling direct interaction between language semantics and control dynamics. To stabilize autoregressive control, we introduce a nonlinear history conditioning mechanism, which preserves the dense recent context and samples increasingly sparse cues from long-term history. Beyond supervised imitation pre-training, we propose a post-training stage, further improving the performance using Reinforcement Learning with Hybrid Rewards (RLHR). By injecting learnable noise into the flow-sampling process, RLHR effectively improves motion quality and instruction following within closed-loop simulations using hybrid physical feedback and text rewards. Quantitative evaluations demonstrate that SCRIPT outperforms prior state-of-the-art methods, with gains across text alignment, motion quality, and physical realism metrics. Furthermore, scaling studies on the 1200-hour MotionMillion dataset demonstrate consistent performance gains with model scaling, highlighting SCRIPT's robust scalability for large-scale pre-training. Our code will be publicly available for future research.
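The history conditioning idea (dense recent context, increasingly sparse cues from the long-term past) can be sketched as an index-selection rule. The abstract does not give the actual schedule, so the geometric gap growth, the default sizes, and the function name below are assumptions chosen only to illustrate the shape of such a mechanism.

```python
def history_indices(t, dense=4, sparse=4, growth=2):
    """Return past frame indices to condition on at step t (hypothetical schedule).

    The `dense` most recent frames are kept contiguously; after that, the gap
    between selected frames is multiplied by `growth` at every step back.
    """
    offsets = list(range(dense))      # 0, 1, ..., dense - 1 steps back
    gap, last = growth, dense - 1
    for _ in range(sparse):
        last += gap                   # jump further into the past
        offsets.append(last)
        gap *= growth                 # make the next jump larger
    # Drop offsets that reach before the start of the episode.
    return sorted({t - o for o in offsets if t - o >= 0})
```

At `t = 100` with the defaults this selects frames 97 to 100 densely and then 95, 91, 83, and 67, so eight tokens cover a window of 34 steps.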

2605.22892 2026-05-26 q-fin.RM cs.LG 版本更新

Is TabPFN the Silver Bullet for Insurance Pricing?

TabPFN 是保险定价的银弹吗?

Bruno Deprez, Wouter Verbeke, Tim Verdonck

发表机构 * KU Leuven(鲁汶大学) University of Antwerp-imec(安特卫普大学-imec)

AI总结 本文首次实证评估 TabPFN 在车险定价中的表现,与 GLM 和 XGBoost 对比,发现其性能不稳定、推理时间长且对上下文训练集大小敏感,目前无法替代传统精算方法。

详情
AI中文摘要

非寿险定价中的索赔频率和严重性建模主要依赖广义线性模型,梯度提升机是领先的机器学习替代方案。表格基础模型(TFM)提出了一种根本不同的推理范式。通过在大量合成数据集上预训练,TFM 能够通过上下文学习对新数据进行推理,无需针对特定数据集进行拟合或超参数调优。本文首次对 TabPFN 在车险定价中进行实证评估,在两个公开的 MTPL 数据集上将其与 GLM 和 XGBoost 进行基准测试。我们的结果表明,TabPFN 并未持续优于已建立的基线,推理时间显著更长,并且对上下文训练集的大小敏感。虽然表格基础模型代表了有前景的方向,特别是在数据稀缺的情况下,但其当前性能无法为已建立的精算方法提供可行的替代方案。

英文摘要

Modelling claim frequency and severity for non-life insurance pricing predominantly relies on generalised linear models, with gradient-boosted machines as the leading machine learning alternative. Tabular foundation models (TFMs) present a fundamentally different inference paradigm. By pre-training on large collections of synthetic datasets, TFMs enable inference on new data through in-context learning, without any dataset-specific fitting or hyperparameter tuning. This paper presents a first empirical evaluation of TabPFN for motor insurance pricing, benchmarking it against GLM and XGBoost on two publicly available MTPL datasets. Our results show that TabPFN does not consistently outperform established baselines, exhibits substantially longer inference times, and is sensitive to the size of the in-context training set. While tabular foundation models represent a promising direction, particularly in data-scarce settings, their current performance does not offer a viable replacement for established actuarial methods.

2605.22856 2026-05-26 eess.SP cs.AI cs.IT cs.LG cs.NI math.IT 版本更新

PilotWiMAE: Pilot-Native Representation Learning for Wireless Channels

PilotWiMAE:面向无线信道的导频原生表示学习

Berkay Guler, Giovanni Geraci, Hamid Jafarkhani

发表机构 * Center for Pervasive Communications and Computing, University of California, Irvine(加州大学尔湾分校普及通信与计算中心) Nokia and Universitat Pompeu Fabra(诺基亚与庞培法布拉大学)

AI总结 提出PilotWiMAE自监督框架,直接处理噪声导频观测,通过分解注意力机制和补丁归一化重构,在缩小观测空间的同时实现跨频段波束选择和信道表征,优于监督基线。

详情
AI中文摘要

信道基础模型假设能够访问完全观测的信道,这一假设在部署中不成立。我们提出PilotWiMAE,一种自监督框架,其编码器直接接收噪声导频观测,注意力沿时间与联合空频处理轴分解,这是受问题物理特性启发的归纳偏置。导频输入将观测空间缩小两个数量级,并消除了全CSI可用性的不现实假设,同时降低延迟。分解设计通过利用可分离的信道结构生成鲁棒表示,并允许预训练掩码率达到$99\%$。我们将捕获小尺度衰落结构的补丁归一化重构与恢复大尺度衰落特征的辅助尺度损失相结合,并使用AWGN课程学习来匹配预训练和部署时的导频噪声。仅在$3.5$\,GHz上预训练,在$28$\,GHz上评估,涵盖分布内和分布外场景,PilotWiMAE的跨频段波束选择和信道表征在更小的观测空间上仍优于监督基线。为削弱解码器容量与表示质量之间的耦合,我们进一步提出在编码器-解码器联合预训练之后进行以解码器为中心的预训练阶段,使得PilotWiMAE在不牺牲表示质量的情况下展现出有竞争力的信道估计性能。为促进该方向的进一步研究,我们发布了PilotWiMAE预训练权重和训练流程,以及基于Sionna的射线追踪信道生成工具CSIGen和本文使用的信道数据集。

英文摘要

Channel foundation models assume access to fully observed channels, an assumption that fails in deployment. We introduce PilotWiMAE, a self-supervised framework whose encoder ingests noisy pilot observations directly and whose attention factorizes along the axis separating temporal from joint space-frequency processing, an inductive bias inspired by the physics of the problem. Pilot input shrinks the observation space by up to two orders of magnitude and also removes the unrealistic assumption of full-CSI availability while incurring lower latency. The factorized design generates robust representations by exploiting the separable channel structure and allows a pretraining mask ratio of $99\%$. We pair patch-normalized reconstruction, which captures small-scale fading structure, with an auxiliary scale loss that recovers the large-scale fading features, and use an AWGN curriculum to match pilot noise at pretraining and deployment. Pretrained solely on $3.5$\,GHz and evaluated at $28$\,GHz across in-distribution and out-of-distribution settings, PilotWiMAE's cross-frequency beam selection and channel characterization beat supervised baselines despite operating on a smaller observation space. To weaken the coupling between decoder capacity and representation quality, we further propose a decoder-centric pretraining stage following the encoder-decoder joint pretraining, which allows PilotWiMAE to demonstrate competitive channel estimation without sacrificing representation quality. To foster further work in this direction, we release the PilotWiMAE pretrained weights and training pipeline, together with CSIGen, our Sionna-based ray-tracing channel-generation tool, and the channel datasets used in this work.

2605.22795 2026-05-26 stat.ML cs.AI cs.LG math.ST stat.TH 版本更新

Finite-Particle Convergence Rates for Conservative and Non-Conservative Drifting Models

保守与非保守漂移模型的有限粒子收敛速率

Krishnakumar Balasubramanian

发表机构 * Department of Statistics, University of California, Davis(加州大学戴维斯分校统计系)

AI总结 针对一步生成建模,提出保守漂移方法(用核密度估计梯度速度替代位移速度)并证明连续时间有限粒子收敛界,同时分析非保守方法(Laplace核)的对应速率。

详情
AI中文摘要

我们提出并分析了一种用于一步生成建模的保守漂移方法。该方法将原始的基于位移的漂移速度替换为核密度估计(KDE)梯度速度,即核平滑数据得分与核平滑模型得分之差。该速度为梯度场,解决了通用基于位移的漂移场中发现的非保守性问题。我们证明了在$\mathbb{R}^d$上保守方法的连续时间有限粒子收敛界:联合熵恒等式给出了经验Stein漂移、KDE的平滑Fisher差异以及中心速度平方的界。主要的有限粒子校正是倒数KDE自相互作用项,我们给出了确定性和高概率的局部占据条件,在此条件下该项可控。我们保持求积常数显式并追踪其可能的带宽依赖性:在额外的$h$均匀求积正则条件下,根残差速度率为$N^{-1/(d+4)}$;而更一般的增长条件产生优化根速率$N^{-(2-\beta)/(2(d+4-\beta))}$,其中$0\le \beta<2$。我们还分析了使用Laplace核的非保守漂移方法,对应于Deng等人2026年(arXiv:2602.04770)提出的原始基于位移的速度。对于该方法,一个尖锐的伴随核将速度分解为尖锐得分不匹配的正标量预处理加上Laplace尺度不匹配残差,产生类似的有限粒子速率,但带有一个不可避免的残差项。最后,我们解释了如何通过显式漂移大小$\eta$将连续时间残差速度界转化为一步生成保证。

英文摘要

We propose and analyze a conservative drifting method for one-step generative modeling. The method replaces the original displacement-based drifting velocity by a kernel density estimator (KDE)-gradient velocity, namely the difference of the kernel-smoothed data score and the kernel-smoothed model score. This velocity is a gradient field, addressing the non-conservatism issue identified for general displacement-based drifting fields. We prove continuous-time finite-particle convergence bounds for the conservative method on $\mathbb{R}^d$: a joint-entropy identity yields bounds for the empirical Stein drift, the smoothed Fisher discrepancy of the KDE, and the squared center velocity. The main finite-particle correction is a reciprocal-KDE self-interaction term, and we give deterministic and high-probability local-occupancy conditions under which this term is controlled. We keep the quadrature constants explicit and track their possible bandwidth dependence: the root residual-velocity rate $N^{-1/(d+4)}$ holds under an additional $h$-uniform quadrature regularity condition, while a more general growth condition yields the optimized root rate $N^{-(2-\beta)/(2(d+4-\beta))}$, where $0\le \beta<2$. We also analyze the non-conservative drifting method with Laplace kernel, corresponding to the original displacement-based velocity proposed in Deng et al., 2026 (arXiv:2602.04770). For this method, a sharp companion kernel decomposes the velocity into a positive scalar preconditioning of a sharp-score mismatch plus a Laplace scale-mismatch residual, producing an analogous finite-particle rate with an unavoidable residual term. Finally, we explain how the continuous-time residual-velocity bounds translate into one-step generation guarantees through the explicit drift size $\eta$.
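The KDE-gradient velocity, described above as the difference between the kernel-smoothed data score and the kernel-smoothed model score, can be written out for a one-dimensional toy case. This sketch assumes a Gaussian kernel with bandwidth `h` (the abstract does not name the kernel used by the conservative method), and the function names are hypothetical.

```python
import math

def kde_score(x, samples, h):
    """Gradient of the log of a Gaussian KDE built from `samples`, at point x (1-D)."""
    logits = [-(x - s) ** 2 / (2.0 * h * h) for s in samples]
    m = max(logits)                               # subtract the max for stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    # d/dx log sum_i exp(-(x - s_i)^2 / (2 h^2)) = sum_i w_i (s_i - x) / h^2
    return sum(w * (s - x) for w, s in zip(weights, samples)) / (total * h * h)

def kde_gradient_velocity(x, data, particles, h):
    """Smoothed data score minus smoothed model score at x."""
    return kde_score(x, data, h) - kde_score(x, particles, h)
```

Because both terms are gradients of scalar functions of `x`, their difference is itself a gradient field, which is the conservativeness property the abstract emphasizes. When the particle set coincides with the data set, the velocity vanishes everywhere.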

2605.22365 2026-05-26 cs.CR cs.AI cs.LG 版本更新

TimeGuard: Channel-wise Pool Training for Backdoor Defense in Time Series Forecasting

TimeGuard: 面向时间序列预测中后门防御的通道式池化训练

Quang Duc Nguyen, Siyuan Liang, Yiming Li, Fushuo Huo, Dacheng Tao

发表机构 * College of Computing and Data Science, Nanyang Technological University, Singapore(新加坡南洋理工大学计算与数据科学学院)

AI总结 针对时间序列预测中后门攻击防御难题,提出基于通道式池化训练的TimeGuard方法,通过时间感知池初始化与距离正则化损失选择缓解信号稀释与损失退化,显著提升鲁棒性。

Comments 44 pages, 30 figures. ICML 2026

详情
AI中文摘要

时间序列预测(TSF)极易受到后门攻击,但由于数据纠缠和任务公式化转变带来的挑战,有效的防御方法仍未被充分探索。为填补这一空白,我们对TSF生命周期中的十三种代表性后门防御进行了系统评估,并分析了它们的失败模式。我们的结果揭示了两个根本问题:(1)数据纠缠导致通道级信号稀释,使得样本过滤和触发器合成防御无法有效定位后门;(2)任务公式化转变导致训练损失退化,使得训练阶段中毒窗口与干净窗口难以区分。基于这些发现,我们提出了一种针对TSF的训练时后门防御方法,称为TimeGuard。该方法以通道式池化训练为核心范式,并使用时间感知标准初始化高置信度池以缓解信号稀释。此外,我们引入了距离正则化损失选择,在训练过程中逐步扩展可靠池并缓解损失退化。在多个数据集、预测架构和TSF后门攻击上的大量实验表明,TimeGuard显著提升了鲁棒性,将$\mathrm{MAE}_\mathrm{P}$相对于领先基线提升了1.96倍,同时将干净性能保持在5% $\mathrm{MAE}_\mathrm{C}$以内。

英文摘要

Time Series Forecasting (TSF) is highly vulnerable to backdoor attacks, yet effective defenses remain underexplored due to challenges arising from data entanglement and shifts in task formulation. To fill this gap, we conduct a systematic evaluation of thirteen representative backdoor defenses across the TSF life cycle and analyze their failure modes. Our results reveal two fundamental issues: (1) data entanglement induces channel-level signal dilution, rendering sample-filtering and trigger-synthesis defenses ineffective at localizing backdoors; and (2) task-formulation shift leads to training-loss degeneration, causing poisoned and clean windows to become indistinguishable at training stages. Based on these findings, we propose a training-time backdoor defense for TSF, termed TimeGuard. Our method adopts channel-wise pool training as the core paradigm and initializes a high-confidence pool using time-aware criteria to mitigate signal dilution. Moreover, we introduce distance-regularized loss selection to progressively expand the reliable pool during training and ease loss degeneration. Extensive experiments across multiple datasets, forecasting architectures, and TSF backdoor attacks demonstrate that TimeGuard substantially improves robustness, boosting $\mathrm{MAE}_\mathrm{P}$ by $1.96\times$ over the leading baseline, while preserving clean performance within 5% $\mathrm{MAE}_\mathrm{C}$.

2605.22222 2026-05-26 cs.LG 版本更新

ARC-STAR: Auditable Post-Hoc Correction for PDE Foundation Models

ARC-STAR: 面向PDE基础模型的可审计事后修正

Chengze Li, Lingwei Wei, Li Sun, Hongbo Lv, Jie Yang, Hanrong Zhang, Kening Zheng, Wei-Chieh Huang, Enze Ma, Philip S. Yu

发表机构 * University of Illinois Chicago(伊利诺伊大学芝加哥分校) Beijing University of Posts and Telecommunications(北京邮电大学) North China Electric Power University(华北电力大学)

AI总结 针对PDE基础模型预测漂移且误差空间集中的问题,提出冻结求解器的事后修正框架ARC-STAR,通过全局修正、局部精炼和预算感知路由三阶段实现可审计、低误差的修正。

Comments 40 pages, including appendices

详情
AI中文摘要

偏微分方程(PDE)基础模型是预训练网络,能够从单一可重用求解器预测速度、压力等物理场的演化。在不熟悉的流场上,它们的预测会逐步漂移,误差集中在少数区域,然而重新训练会破坏网络稳定性,而统一的事后修正忽略了这种空间集中性。为解决此问题,我们提出了一种冻结求解器的事后修正框架——自适应风险校准空间分诊可审计精炼(ARC-STAR)。ARC-STAR将修正组织为三个阶段:全局修正器消除广泛的求解器偏差,块级局部精炼器清理全局后残差,在部署时,无标签分数在计算预算下将精炼路由到高风险块。该框架设计为:(i) 冻结宿主,保留预训练求解器无需微调;(ii) 可审计,全局和局部阶段分别训练和评估以实现可衡量的贡献;(iii) 预算感知,使用块级接口,要么精炼整个场,要么将有限计算路由到高风险区域。在跨越十个状态单元的五个流基准测试中,ARC-STAR是唯一在每个单元上将速度滚动误差比原始Poseidon降低至少36倍的方法。全局阶段将原始宿主误差降低91-99%,局部阶段进一步将剩余的全局后残差降低高达94.4%。

英文摘要

Partial differential equation (PDE) foundation models are pretrained networks that forecast how physical fields like velocity and pressure evolve from a single reusable solver. On unfamiliar flows their predictions drift step by step, errors concentrate in a few regions, yet retraining destabilizes the network and uniform post-hoc correction overlooks this spatial concentration. To address this, we propose a frozen-solver post-hoc correction framework, Adaptive Risk-Calibrated Spatial Triage for Auditable Refinement (ARC-STAR). ARC-STAR organizes correction into three stages: a global corrector removes broad solver bias, a blockwise local refiner cleans the post-global residual, and, at deployment, a label-free score routes refinement to high-risk blocks under a compute budget. The framework is designed to be (i) frozen-host, preserving the pretrained solver without fine-tuning; (ii) auditable, with global and local stages trained and evaluated separately for measurable contributions; and (iii) budget-aware, using a blockwise interface that either refines the full field or routes limited compute to high-risk regions. Across five flow benchmarks spanning ten regime cells, ARC-STAR is the only method that cuts velocity rollout error by at least 36x over raw Poseidon on every cell. The global stage reduces raw host error by 91-99%, and the local stage further reduces the remaining post-global residual by up to 94.4%.
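The budget-aware routing stage can be illustrated with a top-k selection over per-block risk scores. This is a sketch under assumptions: the abstract does not define the label-free score or the budget unit, so the scores are taken as given numbers, the budget as a block count, and the refiner as an arbitrary callable. All names are hypothetical.

```python
def route_blocks(scores, budget):
    """Return indices of the `budget` highest-scoring blocks, in index order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:max(0, budget)])

def refine_field(blocks, scores, budget, refiner):
    """Apply `refiner` only to the routed blocks; leave the rest untouched."""
    chosen = set(route_blocks(scores, budget))
    return [refiner(b) if i in chosen else b for i, b in enumerate(blocks)]
```

Setting `budget` to the number of blocks recovers the "refine the full field" mode mentioned in the abstract, while a smaller value concentrates compute on the highest-risk blocks.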

2605.20749 2026-05-26 cs.LG cs.AI 版本更新

The Devil is in the Condition Numbers: Why is GLU Better than non-GLU Structure?

魔鬼在于条件数:为什么GLU优于非GLU结构?

Xingyu Lyu, Qianqian Xu, Zhiyong Yang, Peisong Wen, Qingming Huang

发表机构 * State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China(人工智能安全国家重点实验室,计算技术研究所,中国科学院,北京100190,中国) School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 101408, China(中国科学院大学计算机科学与技术学院,北京101408,中国) Beijing Academy of Artificial Intelligence (BAAI), Beijing, China(北京人工智能研究院(BAAI),北京,中国)

AI总结 通过神经正切核分析,发现门控线性单元(GLU)通过重塑核谱、减小条件数来加速优化收敛,而非主要降低泛化差距。

Comments Accepted by ICML 2026

详情
AI中文摘要

门控线性单元(GLU)及其变体被广泛应用于现代开源大语言模型架构中,并且始终优于其非门控对应物,然而这种优势的根本原因尚不清楚。在这项工作中,我们通过分析神经正切核(NTK)机制下的两层网络来研究GLU。我们的分析表明,GLU结构重塑了NTK谱,导致更小的条件数和更紧凑的特征值分布。基于这一发现,我们进一步分析了由此产生的训练动态,并展示了重塑后的谱如何导致GLU模型更快的收敛,包括在GLU和非GLU模型之间观察到的特征损失交叉现象。最后,我们通过实验观察到,GLU在缩小各种模型(包括ViT和GPT-2)的泛化差距方面影响有限,这表明其主要优势在于加速优化而非减少泛化差距。代码可在 https://github.com/Zemdalk/GLU-NTK 获取。

英文摘要

Gated Linear Units (GLU) and their variants are widely adopted in modern open-source large language model architectures and consistently outperform their non-gated counterparts, yet the underlying reasons for this advantage remain unclear. In this work, we study GLU by analyzing two-layer networks in the neural tangent kernel (NTK) regime. Our analysis reveals that the GLU structure reshapes the NTK spectrum, leading to a smaller condition number and a more compact eigenvalue distribution. Building on this finding, we further analyze the resulting training dynamics and show how the reshaped spectrum leads to faster convergence of GLU models, including a characteristic loss-crossing phenomenon observed between GLU and non-GLU models. Finally, we empirically observe that GLU has limited impact in reducing the generalization gap on various models, including ViT and GPT-2, suggesting that its primary benefit lies in accelerating optimization rather than reducing the generalization gap. The code is available at: https://github.com/Zemdalk/GLU-NTK.

2605.18797 2026-05-26 cs.LG cs.AI 版本更新

Simply Stabilizing the Loop via Fully Looped Transformer

通过全循环Transformer简单稳定循环

Rao Fu, Zixuan Yang, Jiankun Zhang, Jing Ma, Hechang Chen, Yu Li, Yi Chang

发表机构 * Hong Kong Baptist University(香港浸会大学) Jilin University(吉林大学)

AI总结 针对循环Transformer在迭代次数增加时出现的训练不稳定性,提出全循环Transformer,通过全循环架构和注意力注入两种无参数修改,稳定训练至12次循环,下游任务性能提升最高13.2%。

详情
AI中文摘要

扩展模型性能通常需要增加模型大小。循环Transformer通过迭代重用相同的Transformer块提供了一种引人注目的替代方案,用额外的计算换取性能提升,而不增加参数数量或上下文长度。由于推理时可以调整循环迭代次数,它还提供了一种平衡性能和测试时计算的自然机制。然而,当循环迭代次数增加时,循环Transformer仍然存在训练不稳定性。我们的分析表明,这种不稳定性源于两个来源:梯度振荡和残差爆炸。为了解决这两个问题,我们提出了全循环Transformer,它引入了两种无参数修改:(1)全循环架构,将循环间信号分布到所有层以缓解残差爆炸;(2)注意力注入,重用现有的注意力块以抑制梯度振荡。这些修改稳定了训练动态,使得全循环Transformer能够稳定训练多达12次循环迭代,而其他基线循环模型在这种情况下会崩溃。在循环Transformer不会崩溃的较温和设置中,全循环Transformer仍然将平均下游任务性能提升了高达13.2%。总体而言,我们的实验表明,全循环Transformer提高了训练稳定性,增强了下游性能,并通过在推理时改变循环迭代次数,提供了在不同测试时计算预算下的初步适应性。

英文摘要

Scaling model performance typically requires increasing model size. Looped Transformer offers a compelling alternative by iteratively reusing the same Transformer blocks, trading additional computation for improved performance without increasing parameter count or context length. Because the number of loop iterations can be adjusted at inference, it also provides a natural mechanism for balancing performance and test-time compute. However, Looped Transformer still suffers from training instability when the number of loop iterations increases. Our analysis reveals that this instability stems from two sources: gradient oscillation and residual explosion. To address these two problems, we propose the Fully Looped Transformer, which introduces two parameter-free modifications: (1) Fully Looped Architecture, which distributes inter-loop signals across all layers to mitigate residual explosion; (2) Attention Injection, which reuses the existing attention block to suppress gradient oscillation. These modifications stabilize training dynamics, enabling the Fully Looped Transformer to be trained stably up to 12 loop iterations, whereas other baseline looped models collapse in this regime. In milder settings where Looped Transformer does not collapse, Fully Looped Transformer still improves average downstream-task performance by up to 13.2%. Overall, our experiments demonstrate that Fully Looped Transformer improves training stability, enhances downstream performance, and provides preliminary adaptability under different test-time compute budgets by varying loop iterations at inference.

2605.18746 2026-05-26 cs.CV cs.AI cs.CL cs.LG cs.RO 版本更新

ESI-Bench: Towards Embodied Spatial Intelligence that Closes the Perception-Action Loop

ESI-Bench: 迈向闭环感知-动作的具身空间智能

Yining Hong, Jiageng Liu, Han Yin, Manling Li, Leonidas Guibas, Li Fei-Fei, Jiajun Wu, Yejin Choi

发表机构 * Stanford University(斯坦福大学) UCLA(加州大学洛杉矶分校) Northwestern University(西北大学)

AI总结 提出ESI-BENCH基准,通过主动探索(感知、移动、操作)在OmniGibson环境中评估具身空间智能,发现主动探索显著优于被动方法,失败主因是动作盲视而非感知弱,且模型存在元认知差距。

Comments https://esi-bench.github.io/

详情
AI中文摘要

空间智能通过感知-动作循环展开:智能体通过行动获取观察,并推理观察如何随动作变化。它们不是被动处理所见,而是主动揭示未见——遮挡结构、动态、包含关系和功能,这些无法仅通过被动感知解决。我们超越先前假设神谕观察的空间智能表述,将观察者重新定义为行动者。我们引入ESI-BENCH,一个基于OmniGibson、扎根于Spelke核心知识系统的全面具身空间智能基准,涵盖10个任务类别和29个子类别。智能体必须决定部署哪些能力——感知、移动和操作——以及如何排序以主动积累任务相关证据。我们对最先进的MLLM进行大量实验,发现主动探索显著优于被动对应物,智能体自发发现涌现的空间策略而无需明确指令,而随机多视角往往增加噪声而非信号,尽管消耗更多图像。大多数失败并非源于感知弱,而是动作盲视:糟糕的动作选择导致糟糕的观察,进而引发级联错误。虽然显式3D基础稳定了深度敏感任务的推理,但不完美的3D表示通过扭曲空间关系证明比2D基线更有害。人类研究进一步揭示,与寻求证伪视角并在矛盾下修正信念的人类不同,模型无论证据质量如何都过早且高置信度地承诺,暴露了一个既不能通过更好感知也不能通过更多具身互动单独闭合的元认知差距。

英文摘要

Spatial intelligence unfolds through a perception-action loop: agents act to acquire observations, and reason about how observations vary as a function of action. Rather than passively processing what is seen, they actively uncover what is unseen - occluded structure, dynamics, containment, and functionality that cannot be resolved from passive sensing alone. We move beyond prior formulations of spatial intelligence that assume oracle observations by recasting the observer as an actor. We introduce ESI-BENCH, a comprehensive benchmark for embodied spatial intelligence spanning 10 task categories and 29 subcategories built on OmniGibson, grounded in Spelke's core knowledge systems. Agents must decide what abilities to deploy - perception, locomotion, and manipulation - and how to sequence them to actively accumulate task-relevant evidence. We conduct extensive experiments on state-of-the-art MLLMs and find that active exploration substantially outperforms passive counterparts, with agents spontaneously discovering emergent spatial strategies without explicit instructions, while random multi-view often adds noise rather than signal despite consuming far more images. Most failures stem not from weak perception but from action blindness: poor action choices lead to poor observations, which in turn drive cascading errors. While explicit 3D grounding stabilizes reasoning on depth-sensitive tasks, imperfect 3D representation proves more harmful than 2D baselines by distorting spatial relations. Human studies further reveal that unlike humans who seek falsifying viewpoints and revise beliefs under contradiction, models commit prematurely with high confidence regardless of evidence quality, exposing a metacognitive gap that neither better perception nor more embodied interaction alone can close.

2605.18745 2026-05-26 stat.ML cs.LG cs.NA math.NA math.PR q-fin.MF stat.CO 版本更新

SURGE: Approximation and Training Free Particle Filter for Diffusion Surrogate

SURGE: 面向扩散替代模型的免近似、免训练粒子滤波

Lifu Wei, Yinuo Ren, Naichen Shi, Yiping Lu

发表机构 * Department of Mechanical Engineering, Northwestern University, Evanston, IL, United States(西北大学机械工程系) Institute for Computational & Mathematical Engineering, Stanford University, Stanford, CA, United States(斯坦福大学计算与数学工程研究所) Department of Industrial Engineering & Management Sciences, Northwestern University, Evanston, IL, United States(西北大学工业工程与管理科学系)

AI总结 提出一种基于扩散模型的无偏粒子滤波方法,通过序列蒙特卡洛对扩散轨迹进行重加权和重采样,融合观测数据与模型模拟,实现状态估计的连续校正。

Comments accepted by ICML 2026

详情
AI中文摘要

数据同化(DA)解决从含噪声和不完整的观测中顺序估计动力系统状态的问题。本文采用扩散模型作为世界模型来模拟和预测系统动力学。最近,基于分数的扩散模型学习了全局扩散先验,能有效建模(随机)动力学,显示出数据同化的强大潜力。本文研究如何利用含噪观测信息,在使用扩散先验时实现对预测系统状态的连续校正和细化。受粒子滤波方法启发,我们使用一组粒子表示后验分布。接收到含噪观测后,利用观测似然引导扩散模型,使生成过程朝向与观测一致的状态。然而,这种引导并不能保证从真实后验中采样。因此,我们将扩散轨迹视为路径测度,采用序列蒙特卡洛方法对粒子进行重加权和重采样,从而纠正生成过程并确保收敛到所需的后验分布。这产生了一种无偏的粒子滤波方法,严格地将观测数据与扩散模型模拟融合。

英文摘要

Data assimilation (DA) addresses the problem of sequentially estimating the state of a dynamical system from noisy and incomplete observations. In this work, we employ a diffusion model as a world model to simulate and predict the system's dynamics. Recently, score-based diffusion models have learned global diffusion priors that effectively model (stochastic) dynamics, revealing strong potential for data assimilation. In this paper, we investigate how information from noisy observations can be incorporated to enable continuous correction and refinement of the predicted system state when using a diffusion prior. Motivated by particle filtering methods, we represent the posterior distribution using a set of particles. After receiving noisy observations, the diffusion model is guided using the observation likelihood to steer the generation process toward observation-consistent states. Nevertheless, such guidance does not guarantee sampling from the true posterior. We therefore employ a Sequential Monte Carlo approach over the diffusion trajectory, viewed as a path measure, to reweight and resample particles, thereby correcting the generation process and ensuring convergence toward the desired posterior distribution. This leads to an unbiased particle filtering method that rigorously fuses observational data with diffusion model simulations.
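The reweight-and-resample correction can be sketched with a generic Sequential Monte Carlo step. This is not the paper's estimator: the weights here use only a scalar Gaussian observation likelihood, whereas the abstract describes weights defined over the whole diffusion trajectory as a path measure. The function names, the scalar state, and the choice of systematic resampling are assumptions.

```python
import math
import random

def reweight(particles, log_weights, observation, noise_std):
    """Add a Gaussian observation log-likelihood to each particle's log-weight."""
    return [lw - 0.5 * ((observation - p) / noise_std) ** 2
            for p, lw in zip(particles, log_weights)]

def normalize(log_weights):
    """Turn log-weights into probabilities, subtracting the max for stability."""
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    total = sum(w)
    return [wi / total for wi in w]

def systematic_resample(particles, weights, rng):
    """Draw len(particles) samples using one uniform offset and evenly spaced targets."""
    n = len(particles)
    u = rng.random() / n
    out, cumulative, i = [], weights[0], 0
    for k in range(n):
        target = u + k / n
        while cumulative < target and i < n - 1:
            i += 1
            cumulative += weights[i]
        out.append(particles[i])
    return out
```

A typical step would call `reweight`, then `normalize`, then `systematic_resample`, and reset the log-weights to zero afterwards. Resampling is what prevents the particle set from collapsing onto a few high-weight trajectories over many assimilation steps.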

2605.17788 2026-05-26 cs.IR cs.LG 版本更新

Uncertainty-Calibrated Recommendations for Low-Active Users

低活跃用户的不确定性校准推荐

Bob Junyi Zou, Sai Li, Tianyun Sun, Wentao Guo, Qinglei Wang

发表机构 * Stanford University(斯坦福大学) ByteDance Inc.(字节跳动公司)

AI总结 提出一个生产就绪的框架,通过校准模型不确定性来为低活跃用户实施风险规避的去增强策略,为高活跃用户采用风险寻求的UCB策略,从而平衡推荐可靠性与多样性。

Comments Accepted to the Applied Data Science (ADS) track at the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2026)

详情
Journal ref
Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD '26), August 09--13, 2026, Jeju Island, Republic of Korea
AI中文摘要

推荐系统的一个基本挑战是平衡低活跃用户(LAUs)的可靠性与高活跃用户(HAUs)的多样性。这一平衡的关键在于量化模型不确定性,它近似预测误差的风险并揭示模型当前知识的局限性。在大规模短视频和直播平台上,模型不确定性可以警告可能导致LAUs脱离的低质量推荐,同时识别出为HAUs多样化内容推荐的机会。为了利用这种二分法,我们引入了一个统一的、生产就绪的框架,该框架校准不确定性以驱动差异化策略。具体来说,我们为LAUs实施了一种基于模型不确定性的风险规避去增强策略,以抑制不可靠的推荐,同时为HAUs采用风险寻求的上置信界(UCB)策略以鼓励探索。在一个主要直播平台上验证,我们的框架显著提高了LAUs的留存(活跃小时数)和满意度(质量观看时间比率),并显著增加了HAUs的兴趣多样性和类别覆盖率,证明了在工业环境中不确定性感知推荐的价值。

英文摘要

A fundamental challenge in recommender systems is balancing reliability for Low-Active Users (LAUs) with diversity for High-Active Users (HAUs). The key to this balance lies in quantifying model uncertainty, which approximates the risk of prediction errors and reveals the limits of the model's current knowledge. On large-scale short-video and livestream platforms, model uncertainty can warn of low-quality recommendations that may lead to disengagement of LAUs and at the same time identify opportunities to diversify content recommendation for HAUs. To leverage this dichotomy, we introduce a unified, production-ready framework that calibrates uncertainty to drive differentiated strategies. Specifically, we implement a model-uncertainty-based risk-averse deboosting policy for LAUs to suppress unreliable recommendations, while employing a risk-seeking Upper Confidence Bound (UCB) strategy for HAUs to encourage exploration. Validated on a major livestream platform, our framework demonstrates significant improvements in retention (active hours) and satisfaction (quality watch time ratio) for LAUs as well as remarkable increases in interest diversity and category coverage for HAUs, proving the value of uncertainty-aware recommendation in industrial settings.
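The differentiated strategy can be illustrated by adjusting a predicted score with its uncertainty in opposite directions for the two user groups. The abstract does not give the functional form, so the sketch assumes the simplest one: an upper confidence bound (mean plus `k` standard deviations) for high-active users and a matching lower bound as the deboosting rule for low-active users. The names and the value of `k` are hypothetical.

```python
def adjusted_score(mean, std, user_is_low_active, k=1.0):
    """Risk-averse (mean - k*std) for LAUs, risk-seeking UCB (mean + k*std) for HAUs."""
    return mean - k * std if user_is_low_active else mean + k * std

def rank_items(items, user_is_low_active, k=1.0):
    """Rank (item_id, predicted_mean, predicted_std) triples by adjusted score."""
    ordered = sorted(
        items,
        key=lambda it: adjusted_score(it[1], it[2], user_is_low_active, k),
        reverse=True,
    )
    return [item_id for item_id, _, _ in ordered]
```

An item with a slightly higher mean but much larger uncertainty is pushed down for a low-active user and pulled up for a high-active user, which is the dichotomy the abstract describes. This only makes sense if the uncertainty estimate is calibrated, which is why calibration is central to the framework.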

2605.17730 2026-05-26 cs.LG cs.AI 版本更新

L-Drive: Beyond a Single Mapping-Latent Context Drives Time Series Forecasting

L-Drive:超越单一映射——潜在上下文驱动时间序列预测

Fan Zhang, Shijun Chen, Hua Wang

发表机构 * Business University, Yantai, Shandong, China(山东商业大学) Ludong University, Yantai, Shandong, China(鲁东大学)

AI总结 针对分布偏移和机制变化导致直接映射范式在转折点响应滞后的问题,提出L-Drive框架,通过引入潜在上下文表征高层动态并利用门控调制增量表示,提升对变化段的适应能力,同时采用补丁共享相对位置基函数增强段内结构建模,实现预测精度与计算效率的更好平衡。

详情
AI中文摘要

多变量时间序列预测的主流方法主要遵循直接映射范式。它们在观测空间中学习从历史到未来的统一映射,以拟合值级依赖关系。然而,现实世界系统经常经历分布偏移和机制变化。在这种情况下,统一映射在转折点附近可能出现响应滞后,导致切换窗口内误差累积,降低预测可靠性。为解决此问题,我们提出L-Drive,一种变化感知预测框架。L-Drive引入潜在上下文,显式表征随时间演变的高层动态,并使用门控调制增量表示。这提供了更及时的变化线索,并改善了对变化段的适应。此外,它结合了补丁共享相对位置基函数,以加强段内结构建模并减少由绝对位置记忆引起的过拟合。大量实验验证了L-Drive的有效性,并展示了其在预测精度和计算效率之间更好的整体权衡。

英文摘要

Mainstream methods for multivariate time-series forecasting largely follow the Direct-Mapping paradigm. They learn a unified mapping from history to the future in the observation space to fit value-level dependencies. However, real-world systems often undergo distribution shifts and regime changes. In such cases, a unified mapping can exhibit response lag around turning points, causing error accumulation within the switching window and reducing forecasting reliability. To address this issue, we propose L-Drive, a change-aware forecasting framework. L-Drive introduces a Latent-Context to explicitly characterize high-level dynamics evolving over time and uses gating to modulate increment representations. This provides more timely change cues and improves adaptation to changing segments. In addition, it incorporates patch-shared relative positional basis functions to strengthen intra-segment structural modeling and reduce overfitting caused by absolute-position memorization. Extensive experiments validate the effectiveness of L-Drive and show a better overall trade-off between forecasting accuracy and computational efficiency.

2605.17234 2026-05-26 cs.LG 版本更新

Active Budget Allocation for Efficient Scaling Law Estimation via Surrogate-Guided Pruning

通过代理引导剪枝实现高效缩放定律估计的主动预算分配

Viktoria Schram, Markus Hiller, Daniel Beck, Trevor Cohn

发表机构 * The School of Computing and Information Systems, the University of Melbourne, Melbourne, Australia(墨尔本大学计算与信息系统学院) Royal Melbourne Institute of Technology, Melbourne, Australia(皇家墨尔本理工学院) Now at Google Australia(现就职于谷歌澳大利亚)

AI总结 提出结合连续减半与参数/非参数代理模型的主动预算分配方法,在显著降低计算成本(节省高达98.7%)的同时获得更优的损失-计算量前沿,实现精确的缩放定律估计。

Comments Accepted at ICML 2026

详情
AI中文摘要

预测更大规模下的模型性能能够设计针对特定性能目标的训练策略和架构。经验缩放定律研究通过识别函数形式来辅助这一预测任务,这些函数形式利用学习曲线定义的损失-计算量前沿描述损失与计算量之间的关系。由于该方法的经验性质,计算负担巨大,使得战略资源分配至关重要——然而这一领域却出人意料地未被充分探索。在本工作中,我们通过探索连续减半(SH)以及SH与参数化和非参数化代理模型结合的适用性来弥补这一不足。除了能够更系统地分配给定的计算预算外,我们的发现表明,SH与代理模型结合得到的一组学习曲线中,包含一条损失-计算量值低于朴素均匀分配或仅SH方法所能获得的曲线。我们的实验在真实世界和合成学习曲线数据集上分别展示了高达2.84%和5.47%的平均相对改进。这种战略资源分配使我们能够以显著降低的计算成本获得准确的缩放定律,相比传统的穷举方法节省高达98.7%的计算量。

英文摘要

Predicting model performance at larger scales enables the design of training strategies and architectures tailored to specific performance targets. Empirical scaling law research identifies functional forms to aid this prediction task. These describe the relationship between loss and compute using a loss-compute frontier defined by learning curves. Due to the empirical nature of this approach, the computational burden is substantial, making strategic resource allocation essential - yet it remains surprisingly underexplored. In this work, we address this shortcoming by exploring the suitability of Successive Halving (SH) and SH combined with parametric and non-parametric surrogate models. In addition to enabling a more systematic allocation of a given compute budget, our findings show that SH paired with surrogate models yields a set of learning curves that includes one with a lower loss-compute value than what naive uniform allocation or an SH-only approach can obtain. Our experiments demonstrate mean relative improvements of up to 2.84% and 5.47% on real-world and synthetic learning curve datasets, respectively. This strategic resource allocation enables us to obtain accurate scaling laws at significantly reduced computational costs, saving up to 98.7% over the traditional exhaustive approach.
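For readers unfamiliar with Successive Halving, the following is a minimal sketch of plain SH over learning curves. It does not reproduce the surrogate-guided variant described in the abstract. The power-law curve in `synthetic_loss` and the configuration tuples are hypothetical stand-ins for real training runs.

```python
def synthetic_loss(config, steps):
    """Hypothetical power-law learning curve: floor + a * steps^(-b)."""
    floor, a, b = config
    return floor + a * steps ** (-b)


def successive_halving(configs, min_steps=10, eta=2):
    """Train all survivors to a step budget, keep the best 1/eta, then grow the budget."""
    survivors = list(range(len(configs)))
    steps, prev_steps, cost = min_steps, 0, 0
    while len(survivors) > 1:
        # Runs are assumed to resume, so only the additional steps are paid for.
        cost += (steps - prev_steps) * len(survivors)
        ranked = sorted(survivors, key=lambda i: synthetic_loss(configs[i], steps))
        survivors = ranked[:max(1, len(ranked) // eta)]
        prev_steps, steps = steps, steps * eta
    return survivors[0], cost


configs = [
    (0.50, 2.0, 0.5), (0.30, 2.0, 0.5), (0.40, 1.0, 0.3), (0.20, 3.0, 0.6),
    (0.60, 1.0, 0.9), (0.35, 2.5, 0.4), (0.45, 1.5, 0.5), (0.25, 2.0, 0.2),
]
winner, sh_cost = successive_halving(configs)
exhaustive_cost = len(configs) * 40  # every run trained to the final 40-step budget
print(winner, sh_cost, exhaustive_cost)
```

In this toy setting plain SH spends half the steps of the exhaustive run, but it also prunes early a curve whose asymptotic floor is lower than the winner's. Early pruning of slow starters is a known weakness of SH and is the kind of failure a curve-extrapolating surrogate is meant to mitigate.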

2605.16815 2026-05-26 cs.CR cs.LG 版本更新

Universal Graph Backdoor Defense: A Feature-based Homophily Perspective

通用图后门防御:基于特征同质性的视角

Mengting Pan, Fan Li, Chen Chen, Xiaoyang Wang

发表机构 * The University of New South Wales(新南威尔士大学) University of Wollongong(卧龙岗大学)

AI总结 针对图后门攻击,提出基于特征同质性视角的通用防御框架,通过邻居感知重构损失区分后门节点并采用鲁棒训练消除触发器影响。

Comments 17 pages, 6 figures

详情
AI中文摘要

图神经网络(GNN)在关系学习中取得了显著成功。然而,它们对图后门攻击(GBA)的脆弱性阻碍了在高风险应用中的广泛采用。尽管图后门防御(GBD)近期有所进展,现有方法主要关注基于子图的GBA,依赖于受污染目标节点明确连接到子图触发器的假设。我们的实证结果表明,这种以结构为中心的方法无法防御新兴的基于特征的GBA,后者保留了图拓扑结构。因此,本文研究通用图后门防御的新问题。首先,我们从基于特征的同质性视角研究两种攻击类型的共同影响,该视角刻画了节点与其邻域之间的局部特征一致性。充分的理论和实证分析表明,无论触发器机制如何,GBA诱导的后门节点比干净节点表现出更低的基于特征的同质性,表明局部特征相似性存在差异。受此启发,我们提出利用节点级局部特征一致性(通过邻居感知重构损失建模)来区分后门节点与干净节点。然后,开发了一种鲁棒训练策略,以消除触发器影响,同时减少检测不确定性引入的噪声。大量实验表明,我们的框架在基于子图和基于特征的攻击下显著降低了攻击成功率,并保持了有竞争力的干净准确率。

英文摘要

Graph neural networks (GNNs) have achieved remarkable success in relational learning. However, their vulnerability to graph backdoor attacks (GBAs) poses a significant barrier to broader adoption in high-stakes applications. Despite recent advances in graph backdoor defense (GBD), existing methods primarily focus on subgraph-based GBAs, relying on the assumption that poisoned target nodes are explicitly connected to subgraph triggers. Our empirical results reveal that such structure-centric approaches fail to defend against emerging feature-based GBAs that preserve graph topology. Therefore, in this paper, we study a novel problem of universal graph backdoor defense. First, we investigate the shared effects of both attack types from a feature-based homophily perspective, which characterizes local feature consistency between nodes and their neighborhoods. Thorough theoretical and empirical analyses demonstrate that, regardless of trigger mechanisms, backdoor nodes induced by GBAs exhibit lower feature-based homophily than clean nodes, indicating a discrepancy in local feature similarity. Motivated by this insight, we propose to leverage node-level local feature consistency, modeled by a neighbor-aware reconstruction loss, to distinguish backdoor nodes from clean nodes. Then, a robust training strategy is developed to eliminate trigger effects while reducing noise induced by detection uncertainty. Extensive experiments demonstrate that our framework significantly reduces the attack success rate while maintaining competitive clean accuracy under both subgraph-based and feature-based attacks.
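One simple way to make "feature-based homophily" concrete is the mean cosine similarity between a node's features and those of its neighbors. This is an assumption made for illustration only: the paper models local consistency with a neighbor-aware reconstruction loss, which is not reproduced here. The toy graph and the node `x`, which stands in for a node carrying an injected feature trigger, are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def feature_homophily(node, features, neighbors):
    """Mean cosine similarity between a node and its neighbors."""
    nbrs = neighbors[node]
    if not nbrs:
        return 0.0
    return sum(cosine(features[node], features[n]) for n in nbrs) / len(nbrs)


features = {
    "a": [1.0, 0.0, 0.1],
    "b": [0.9, 0.1, 0.0],
    "c": [1.0, 0.1, 0.1],
    "x": [0.0, 1.0, 0.0],  # hypothetical node whose features were overwritten
}
neighbors = {"a": ["b", "c", "x"], "b": ["a", "c", "x"], "c": ["a", "b"], "x": ["a", "b"]}

scores = {n: feature_homophily(n, features, neighbors) for n in features}
suspect = min(scores, key=scores.get)
print(scores, suspect)
```

The graph topology is untouched in this example, yet the altered node stands out by its low score, which is the intuition behind a defense that does not depend on subgraph structure.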

2605.16023 2026-05-26 cs.CL cs.LG 版本更新

Judge Circuits

Judge Circuits

Nils Feldhus, Tanja Baeumel, Elena Golimblevskaia, Qianli Wang, Van Bach Nguyen, Aaron Louis Eidt, Selin Kahvecioglu, Christopher Ebert, Wojciech Samek, Jing Yang, Vera Schmitt, Sebastian Möller, Simon Ostermann

发表机构 * Technische Universität Berlin(柏林工业大学) BIFOLD – Berlin Institute for the Foundations of Learning and Data(柏林学习与数据基础研究院) German Research Center for Artificial Intelligence (DFKI)(德国人工智能研究中心) Fraunhofer Heinrich Hertz Institute(弗劳恩霍夫海因里希·赫茨研究所) Marburg University(马尔堡大学) Centre for European Research in Trusted AI (CERTAIN)(欧洲可信人工智能研究中心)

AI总结 本研究利用位置感知边归因修补(PEAP)因果分析Gemma-3、Qwen2.5和Llama-3的内部机制,发现结构化理解和开放式偏好任务中的判断共享一个稀疏、泛化的潜在评估子图,并通过解耦抽象判断与输出格式,揭示了格式诱导不一致性的机制原因。

Comments 39 pages

详情
AI中文摘要

LLM-as-a-judge已成为大规模评估模型输出的主导范式,然而同一模型在其输出格式变化时(例如,1-5评分与真/假标签)会系统地给出不同的分数。现有对这些格式诱导不一致性的诊断停留在输入输出层面。利用位置感知边归因修补(PEAP),我们因果地研究了Gemma-3、Qwen2.5和Llama-3的内部机制。我们发现,跨结构化理解和开放式偏好任务的判断共享一个稀疏、泛化的潜在评估子图,位于中后期多层感知器(MLPs)中;将其零消融会破坏判断,同时保留架构模块化模型中的世界知识。通过结构上解耦抽象判断与输出格式,我们为我们研究的开放权重模型上的格式诱导不一致性提供了机制解释:在共享主干中计算的连续判断信号通过脆弱、格式特定的终端分支映射,使得格式无关的偏好能够在请求的输出格式下游被隔离。我们的发现意味着跨格式的基准级可靠性比较部分测量的是格式化器几何形状而非评估质量。

英文摘要

LLM-as-a-judge has become the dominant paradigm for grading model outputs at scale, yet the same model assigns systematically different scores when its output format changes (e.g., a 1-5 rating vs. a True/False label). Existing diagnoses of these format-induced inconsistencies stop at the input-output level. Using Position-aware Edge Attribution Patching (PEAP), we causally investigate the internal mechanism in Gemma-3, Qwen2.5, and Llama-3. We find that judgments across structured understanding and open-ended preference tasks share a sparse, generalized Latent Evaluator sub-graph in the mid-to-late multi-layer perceptrons (MLPs); zero-ablating it collapses judgment while preserving world knowledge in architecturally modular models. By structurally decoupling abstract judging from output formatting, we provide a mechanistic account of format-induced inconsistency on the open-weight models we study: a continuous judgment signal computed in the shared trunk is mapped through fragile, format-specific terminal branches, enabling format-independent preference to be isolated downstream of the requested output format. Our findings imply that benchmark-level reliability comparisons across formats are partially measuring formatter geometry rather than evaluation quality.

2605.14769 2026-05-26 cs.LG 版本更新

Composable Crystals: Controllable Materials Discovery via Concept Learning

可组合晶体:通过概念学习实现可控材料发现

Nian Liu, Yuwei Zeng, Ryoji Kubo, Nikita Kazeev, Stephen Gregory Dale, Artem Maevskiy, Pengru Huang, Thomas Laurent, Kostya S. Novoselov, Xavier Bresson

发表机构 * National University of Singapore(新加坡国立大学) Loyola Marymount University(洛约拉玛丽蒙特大学)

AI总结 提出基于概念组合的晶体生成框架,利用向量量化变分自编码器自动发现可重用晶体概念,通过概念重组实现可控的新晶体探索,在MP-20和Alex-MP-20上V.S.U.N指标提升最高53.2%和51.7%。

详情
AI中文摘要

从头晶体生成是材料发现中的核心任务,旨在生成同时有效、稳定、独特且新颖的晶体。现有方法主要依赖黑盒随机采样,对生成结构如何超越观测分布的控制有限。本文提出了一种基于概念的组合式晶体生成框架。我们训练了一个向量量化变分自编码器,自动发现一组可重用的晶体概念,这些概念作为引导生成的构建块。这些学习到的概念在局部原子环境和全局对称模式上自然表现出可解释性,并能泛化到不同分布的晶体。通过重组这些概念,我们的框架能够可控地探索训练分布之外的新颖晶体,而非仅依赖无约束的随机采样。为进一步提高组合效率,我们引入了一个组合生成器,并使用模型自身生成的高质量样本对其进行迭代优化。最终的概念组合用于条件化下游晶体生成。在MP-20和Alex-MP-20上的数值实验表明,分别组合概念使基础模型在V.S.U.N指标上提升高达53.2%和51.7%,尤其在新颖性方面增益显著。

英文摘要

De novo crystal generation, a central task in materials discovery, aims to generate crystals that are simultaneously valid, stable, unique, and novel. Existing methods mainly rely on black-box stochastic sampling, providing limited control over how generated structures move beyond the observed distribution. In this paper, we introduce a concept-based compositional framework for crystal generation. We train a vector-quantized variational autoencoder to automatically discover a shared set of reusable crystal concepts, which serve as building blocks for guided generation. These learned concepts are naturally interpretable in terms of both local atomic environments and global symmetry patterns, and generalize to crystals from different distributions. By recombining such concepts, our framework enables controllable exploration of novel crystals beyond the training distribution, rather than relying solely on unconstrained random sampling. To further improve composition efficiency, we introduce a composition generator and iteratively refine it using high-quality samples generated by the model itself. The resulting concept compositions are then used to condition downstream crystal generation. Numerical experiments on MP-20 and Alex-MP-20 show that composing concepts improves the base model by up to 53.2% and 51.7%, respectively, on the V.S.U.N. metric, with particular gains in novelty.

2605.14759 2026-05-26 cs.LG 版本更新

Crys-JEPA: Accelerating Crystal Discovery via Embedding Screening and Generative Refinement

Crys-JEPA:通过嵌入筛选和生成精炼加速晶体发现

Nian Liu, Nikita Kazeev, Stephen Gregory Dale, Artem Maevskiy, Yuwei Zeng, Ryoji Kubo, Pengru Huang, Thomas Laurent, Yann LeCun, Kostya S. Novoselov, Xavier Bresson

发表机构 * National University of Singapore(新加坡国立大学) Loyola Marymount University(洛约拉玛丽蒙特大学) New York University(纽约大学) AMI

AI总结 提出Crys-JEPA联合嵌入预测架构,通过能量感知的潜在空间和筛选-精炼流程,解决晶体生成中稳定性和新颖性的冲突,在MP-20和Alex-MP-20数据集上V.S.U.N.指标分别提升53.8%和72.7%。

详情
AI中文摘要

从头晶体生成旨在发现不仅真实而且稳定和新颖的材料。然而,大多数现有生成模型被训练为最大化观测晶体的似然,这鼓励样本接近已知材料,但不一定与发现中重要的标准一致。我们的实证分析表明,当前晶体生成模型在稳定性和新颖性之间存在明显冲突:接近观测分布的样本倾向于保持稳定性但提供有限的新颖性,而远离分布的样本通常迅速失去稳定性。这表明发现既稳定又新颖晶体的有用区域极其狭窄。为了突破这一限制,我们引入了Crys-JEPA,一种用于晶体的联合嵌入预测架构,它学习一个能量感知的潜在空间,保留形成能差异。在这个空间中,稳定性评估可以重新表述为基于嵌入的与可访问训练晶体的比较,减少了对昂贵能量评估和特定任务外部参考的依赖。基于Crys-JEPA,我们进一步开发了一个筛选-精炼流程,识别有前景的生成晶体并重新引入它们以精炼生成模型。在MP-20和Alex-MP-20数据集上,我们在V.S.U.N.指标上分别比基线提升了53.8%和72.7%。

英文摘要

De novo crystal generation seeks to discover materials that are not merely realistic, but also stable and novel. However, most existing generative models are trained to maximize the likelihood of observed crystals, which encourages samples to stay close to known materials but does not necessarily align them with the criteria that matter in discovery. Our empirical analysis shows that current crystal generative models exhibit a clear conflict between stability and novelty: samples near the observed distribution tend to retain stability but offer limited novelty, whereas samples farther from it often lose stability rapidly. This suggests that the useful region for discovering crystals that are both stable and novel is extremely narrow. To move beyond this limitation, we introduce Crys-JEPA, a joint embedding predictive architecture for crystals that learns an energy-aware latent space preserving formation-energy differences. In this space, stability assessment can be reformulated as an embedding-based comparison against accessible training crystals, reducing the reliance on expensive energy evaluation and task-specific external references. Building on Crys-JEPA, we further develop a screening-and-refinement pipeline that identifies promising generated crystals and reintroduces them to refine the generative model. On the MP-20 and Alex-MP-20 datasets, we achieve improvements over baselines of up to 53.8% and 72.7%, respectively, on the V.S.U.N. metric.

2605.12906 2026-05-26 cs.LG cs.AI 版本更新

Data Difficulty and the Generalization-Extrapolation Tradeoff in LLM Fine-Tuning

数据难度与LLM微调中的泛化-外推权衡

Siyuan Liu, Tinghong Chen, Xinghan Li, Yifei Wang, Jingzhao Zhang

发表机构 * IIIS, Tsinghua University(清华大学交叉信息研究院) College of AI, Tsinghua University(清华大学人工智能学院) Shanghai Qi Zhi Institute(上海期智研究院) Amazon AGI SF Lab(亚马逊AGI旧金山实验室)

AI总结 本文通过实证和理论分析,研究了监督微调中数据难度对模型行为的影响,发现数据难度与数据量共同决定泛化与外推之间的权衡,并存在最优难度随数据量增加而向更难数据偏移的规律。

Comments Accepted to ICML 2026

详情
AI中文摘要

监督微调(SFT)期间的数据选择可以显著改变大型语言模型(LLMs)的行为。尽管已有工作研究了基于困惑度、难度或长度等启发式方法选择数据的效果,但报告的结果往往不一致或依赖于上下文。在这项工作中,我们从实证和理论角度系统地研究了数据难度在微调中的作用,并发现不存在普遍最优的难度水平;相反,其有效性取决于数据集大小。我们表明,对于固定的数据预算,SFT存在一个最优的数据难度,并且随着数据预算的增加,该最优难度向更难的数据偏移。为了解释这一现象,我们进行了受控的合成实验,揭示了一个简单的底层机制:分布内泛化差距与外推差距之间的相互作用。我们通过使用PAC-Bayesian泛化界限的理论分析进一步支持了这一机制。总的来说,我们的结果阐明了数据大小和难度如何共同影响SFT中泛化与外推之间的权衡,为在特定模型和数据条件下基于难度的数据选择提供了指导。

英文摘要

Data selection during supervised fine-tuning (SFT) can critically change the behavior of large language models (LLMs). Although existing work has studied the effect of selecting data based on heuristics such as perplexity, difficulty, or length, the reported findings are often inconsistent or context-dependent. In this work, we systematically study the role of data difficulty in fine-tuning from both empirical and theoretical perspectives, and find that there is no universally optimal difficulty level; rather, its effectiveness depends on the dataset size. We show that for a fixed data budget, there exists an optimal data difficulty for SFT, and that this optimal difficulty shifts toward harder data as the data budget increases. To explain this phenomenon, we conduct controlled synthetic experiments that reveal a simple underlying mechanism: the interplay between the (in-distribution) generalization gap and the extrapolation gap. We further support this mechanism through a theoretical analysis using PAC-Bayesian generalization bounds. Overall, our results clarify how data size and difficulty jointly affect the trade-off between generalization and extrapolation in SFT, providing guidance for difficulty-based data selection under certain model and data conditions.

2605.12374 2026-05-26 cs.CV cs.AI cs.LG 版本更新

Fill the GAP: A Granular Alignment Paradigm for Visual Reasoning in Multimodal Large Language Models

填补GAP:多模态大语言模型中视觉推理的粒度对齐范式

Yanting Miao, Yutao Sun, Dexin Wang, Mengyu Zhou, Pascal Poupart, Lei Lv, Qi Zhao, Li Wang, Hao Li, Xiaoxi Jiang, Guanjun Jiang

发表机构 * Qwen Large Model Application Team, Alibaba(阿里巴巴通义千问大模型应用团队) Alibaba(阿里巴巴) University of Waterloo(滑铁卢大学) Vector Institute(向量研究所) Zhejiang University(浙江大学)

AI总结 提出GAP(粒度对齐范式),通过特征级、上下文级和能力引导级对齐,解决多模态大语言模型中视觉潜在推理的特征空间不匹配问题,提升感知与推理性能。

详情
AI中文摘要

视觉潜在推理让多模态大语言模型(MLLM)以连续令牌形式创建中间视觉证据,避免外部工具或图像生成器。然而,现有方法通常遵循输出即输入的潜在范式,产生不稳定的收益。我们发现了特征空间不匹配的证据,这种不匹配可能是造成上述不稳定的原因之一:主流的视觉潜在模型建立在预归一化MLLM上,重用解码器隐藏状态作为预测的潜在输入,尽管这些状态与模型训练时消耗的输入嵌入处于截然不同的范数范围(Xie et al., 2025; Li et al., 2026; Team et al., 2026)。这种不匹配可能使直接潜在反馈不可靠。受此诊断启发,我们提出GAP,一种用于视觉潜在建模的粒度对齐范式。GAP在三个层面对齐视觉潜在推理:特征级对齐通过轻量级PCA对齐潜在头将解码器输出映射为输入兼容的视觉潜在;上下文级对齐通过可检查的辅助视觉监督锚定潜在目标;能力引导对齐选择性地将潜在监督分配给基础MLLM难以处理的示例。在Qwen2.5-VL 7B上,所得模型在我们的监督变体中实现了最佳平均聚合感知和推理性能。推理时干预探测进一步表明,生成的潜在提供了任务相关的视觉信号,而不仅仅是增加令牌槽位。

英文摘要

Visual latent reasoning lets a multimodal large language model (MLLM) create intermediate visual evidence as continuous tokens, avoiding external tools or image generators. However, existing methods usually follow an output-as-input latent paradigm and yield unstable gains. We identify evidence for a feature-space mismatch that can contribute to this instability: dominant visual-latent models build on pre-norm MLLMs and reuse decoder hidden states as predicted latent inputs, even though these states occupy a substantially different norm regime from the input embeddings the model was trained to consume (Xie et al., 2025; Li et al., 2026; Team et al., 2026). This mismatch can make direct latent feedback unreliable. Motivated by this diagnosis, we propose GAP, a Granular Alignment Paradigm for visual latent modeling. GAP aligns visual latent reasoning at three levels: feature-level alignment maps decoder outputs into input-compatible visual latents through a lightweight PCA-aligned latent head; context-level alignment grounds latent targets with inspectable auxiliary visual supervision; and capacity-guided alignment assigns latent supervision selectively to examples where the base MLLM struggles. On Qwen2.5-VL 7B, the resulting model achieves the best mean aggregate perception and reasoning performance among our supervised variants. Inference-time intervention probing further suggests that generated latents provide task-relevant visual signal beyond merely adding token slots.

2605.09270 2026-05-26 cs.LG cs.AI 版本更新

Memorize Theorems, Not Instances: Probing SFT Generalization through Mathematical Reasoning

记忆定理而非实例:通过数学推理探究SFT泛化

Ruiying Peng, Mengyu Yang, Jing Lei, Xiaohui Li, Xueyu Wu, Xinlei Chen

发表机构 * Tsinghua Shenzhen International Graduate School(清华大学深圳国际研究生院) Huawei Technologies(华为技术)

AI总结 针对监督微调(SFT)损害推理泛化的问题,提出Theorem-SFT方法,通过显式定理应用训练,在多个基准上取得显著提升,并揭示前馈层是推理规则的主要存储位置。

详情
AI中文摘要

监督微调(SFT)广泛用于任务特定适配,但近期工作表明它会系统性地削弱推理泛化。我们认为根本原因不在于记忆本身,而在于其目标:标准SFT驱动模型利用并记忆问题-答案对中的虚假表面相关性,使其对表面输入变化脆弱。为解决此问题,我们提出Theorem-SFT,通过教授模型规则如何被调用而非答案看起来像什么,将监督重新导向显式定理应用。Theorem-SFT在多个基准和模型家族上取得一致提升:在MATH上(LLaMA3.2-3B-Instruct)提升8.8%,在GeoQA上(Qwen2.5-VL-7B-Instruct)提升20.27%,无需特定模态的重新训练。仅微调MLP层即可达到全层性能,表明前馈组件是推理规则的主要存储位置。我们的发现重新定义了争论:泛化失败并非源于记忆机制本身,而是源于记忆了错误的归纳目标。

英文摘要

Supervised Fine-Tuning (SFT) is widely used for task-specific adaptation, yet recent work shows it systematically undermines reasoning generalization. We argue the root cause is not memorization itself, but its target: vanilla SFT drives models to exploit and memorize spurious surface correlations in problem-solution pairs, leaving them brittle to superficial input variations. To address this, we propose Theorem-SFT, which reorients supervision toward explicit theorem application by teaching models how rules are invoked rather than what answers look like. Theorem-SFT yields consistent gains across benchmarks and model families: +8.8% on MATH (LLaMA3.2-3B-Instruct) and +20.27% on GeoQA (Qwen2.5-VL-7B-Instruct) without modality-specific re-training. Fine-tuning MLP layers alone matches full-layers performance, implicating feed-forward components as the primary locus of reasoning rules. Our findings reframe the debate: Generalization failures stem not from memorization as a mechanism, but from memorizing the wrong inductive targets.

2605.08306 2026-05-26 eess.IV cs.LG 版本更新

Non-intrusive Body Composition Assessment from Full-body mmWave Scans

基于全身毫米波扫描的非侵入性身体成分评估

Miriam Senne, Benjamin D. Killeen, Tony Danjun Wang, Nassir Navab

发表机构 * Chair for Computer Aided Medical Procedures(计算机辅助医疗程序研究所) Technical University of Munich(慕尼黑工业大学) Munich Center for Machine Learning(慕尼黑机器学习中心)

AI总结 提出一种利用毫米波雷达扫描通过多任务学习回归身体成分指标的方法,在真实扫描中实现了内脏脂肪体积和体脂百分比的预测。

详情
AI中文摘要

身体成分评估(BCA)提供了体内不同组织类型分布的详细信息,比BMI或单独体重更能精确描述个体特征。持续且频繁的BCA对个性化医疗有价值,但金标准方法如CT和MRI仅适用于有临床影像指征的患者的机会性监测,不适合普通人群的常规使用。本文考虑一种目前未用于医学应用的成像方式:毫米波雷达。毫米波扫描常用于安检场景,能够快速、非侵入性且保护隐私地重建全身形状,无需脱衣。为了证明从毫米波扫描进行快速便捷BCA的可行性,我们提出了一种基于多任务学习策略的BCA值回归方法,该方法利用来自临床影像和参数化人体模型的合成毫米波点云。我们在一个由真实毫米波扫描组成的初步队列上评估模型,该队列具有生物阻抗导出的体脂测量值,支持从站立姿势下通过衣物获取的毫米波数据估计VAT和体脂百分比(BFP)的可行性。我们发现模型预测VAT和BFP的平均绝对误差分别为1.0升和3.2%,展示了毫米波扫描在广泛场景中进行常规BCA的潜力。

英文摘要

Body composition assessment (BCA) provides detailed information about the distribution of different tissue types in the body, enabling more precise characterization of individuals than BMI or weight alone. Consistent and frequent BCA would be valuable for personalized medicine, but the gold standard methods for BCA, such as CT and MRI, are only practical for opportunistic monitoring of patients with clinical indications for imaging and are not suitable for routine use in the general population. Here, we consider an imaging modality which is not currently used in medical applications: millimeter wave (mmWave) radar. Commonly used in security settings, mmWave scans enable fast, non-intrusive, and privacy-preserving reconstruction of full body shape without the need to remove clothing. To demonstrate the feasibility of fast and convenient BCA from mmWave scans, we present a method for BCA value regression using a multi-task learning strategy that leverages synthetic mmWave-like point clouds derived from clinical imaging and parametric human models. We evaluate the model on a pilot cohort of real mmWave scans with bioimpedance-derived body fat measurements, supporting the feasibility of estimating visceral adipose tissue (VAT) volume and body fat percentage (BFP) from mmWave data acquired through clothing in a standing posture. We find that the model can predict VAT and BFP with a mean absolute error of 1.0 L and 3.2%, respectively, demonstrating the potential of mmWave scanning for routine BCA in a wide range of settings.

2605.07233 2026-05-26 cs.LG cs.CR stat.ML 版本更新

Modulated learning for private and distributed regression with just a single sample per client device

调制学习:每个客户端设备仅有一个样本的私有分布式回归

Praneeth Vepakomma, Amirhossein Reisizadeh, Samuel Horváth, Munther A. Dahleh

发表机构 * MIT(麻省理工学院)

AI总结 针对每个客户端仅有一个样本的分布式学习场景,提出一种通过注入校准噪声并共享后处理表示来实现隐私保护的全局模型学习方法,在期望上匹配非私有中心化梯度更新。

Comments 30 pages

详情
AI中文摘要

本文聚焦于从大量设备中学习的问题,每个设备仅持有一个数据样本。这种每客户端一个样本的设置存在于多个实际应用中,包括从健身追踪器、数据/应用使用聚合器、可穿戴传感设备和日常事件监测器等学习。当客户端只有一个样本时,标准的联邦学习范式会失效,因为基于单个点的局部更新远非有用,尤其是在模型系数估计的早期轮次中。这种效用进一步被每轮添加的隐私诱导噪声削弱。本文针对这一问题,使此类客户端能够协作贡献,有效学习全局模型,同时不泄露其数据隐私。所提出的方法在每个客户端注入一个精心校准的噪声扰动来变换样本,然后共享经过后处理的表示给服务器。服务器聚合这些表示,处理得到无偏梯度更新,该更新在期望上匹配非私有中心化梯度,同时保护数据隐私。这种方法不同于传统的私有联邦学习,其中通信负载涉及模型系数而非私有变换的数据样本。该方法使数据极其有限的设备能够协作学习准确、保护隐私的模型,无需大量本地数据集或牺牲个体隐私。

英文摘要

This work focuses on the question of learning from a large number of devices, each holding only a single sample of data. Several real-world applications fit this one-sample-per-client setup, including learning from fitness trackers, data/app usage aggregators, body-worn sensing devices, and daily event monitors, to name a few. When a client has only one sample, the standard federated learning paradigm breaks down, as a local update based on that single point is far from useful, especially in the earlier rounds of estimating the model coefficients. This utility is further weakened by the privacy-inducing noise applied at every round. This work addresses this problem by enabling such clients to collaboratively and effectively learn a global model without leaking their private data. The proposed approach injects a single, carefully calibrated noisy perturbation to transform the sample at each client, followed by post-processing into a representation that is shared with the server. The representations aggregated at the server are processed to obtain an unbiased gradient update that matches the non-private centralized gradient in expectation while preserving data privacy. This approach differs from traditional private federated learning, where the communication payloads involve model coefficients as opposed to privately transformed data samples. This method enables devices with extremely limited data to collaborate and learn accurate, privacy-preserving models without requiring large local datasets or sacrificing individual privacy.

2605.04700 2026-05-26 cs.CR cs.AI cs.CL cs.LG cs.SD 版本更新

Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization

稀疏令牌足矣:通过令牌感知梯度优化越狱音频语言模型

Zheng Fang, Xiaosen Wang, Shenyi Zhang, Shaokang Wang, Zhijin Ge

发表机构 * Wuhan University Institute for Math & AI, Wuhan University Huazhong University of Science Shanghai Jiao Tong University Xidian University

AI总结 本文提出令牌感知梯度优化(TAGO)方法,通过仅保留高梯度能量的音频令牌对应的波形梯度,实现稀疏越狱攻击,在保持高成功率的同时大幅减少优化量。

Comments To appear in the 43rd International Conference on Machine Learning (ICML 2026)

详情
AI中文摘要

对音频语言模型(ALM)的越狱攻击通过优化音频扰动来引发不安全生成,通常在整个优化过程中密集地更新整个波形。在这项工作中,我们通过分析ALM中令牌对齐梯度的结构来研究这种密集优化的必要性。我们发现梯度能量在音频令牌之间高度不均匀,表明只有一小部分令牌对齐的音频区域主导了优化信号。受此观察启发,我们提出了令牌感知梯度优化(TAGO),它通过每次迭代仅保留与高梯度能量音频令牌对齐的波形梯度,同时屏蔽其余梯度,实现了稀疏越狱优化。在三个ALM上,TAGO优于基线,并且大幅稀疏化仍能保持较高的攻击成功率(例如,在Qwen3-Omni上,令牌保留率为0.25时,$\mathrm{ASR}_{l}$仍为86%,而全令牌保留时为87%)。这些结果表明密集的波形更新在很大程度上是冗余的,我们主张未来的音频越狱和安全对齐研究应进一步利用这种异质的令牌级梯度结构。

英文摘要

Jailbreak attacks on audio language models (ALMs) optimize audio perturbations to elicit unsafe generations, and they typically update the entire waveform densely throughout optimization. In this work, we investigate the necessity of such dense optimization by analyzing the structure of token-aligned gradients in ALMs. We find that gradient energy is highly non-uniform across audio tokens, indicating that only a small subset of token-aligned audio regions dominates the optimization signal. Motivated by this observation, we propose Token-Aware Gradient Optimization (TAGO), which enables sparse jailbreak optimization by retaining only waveform gradients aligned with audio tokens that have high gradient energy, while masking the remaining gradients at each iteration. Across three ALMs, TAGO outperforms baselines, and substantial sparsification preserves strong attack success rates (e.g., on Qwen3-Omni, $\mathrm{ASR}_{l}$ remains at 86% with a token retention ratio of 0.25, compared to 87% with full token retention). These results demonstrate that dense waveform updates are largely redundant, and we advocate that future audio jailbreak and safety alignment research should further leverage this heterogeneous token-level gradient structure.
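The masking step described above amounts to a top-k selection over per-token gradient energy. The sketch below illustrates only that selection on a plain list of numbers. It assumes a fixed number of waveform samples per audio token, which is a simplification, since real encoders define their own alignment. The function name `token_aware_mask` and its parameters are hypothetical.

```python
import math


def token_aware_mask(grad, samples_per_token, retention_ratio):
    """Zero out gradient entries outside the highest-energy token-aligned spans."""
    n_tokens = math.ceil(len(grad) / samples_per_token)
    spans = [(t * samples_per_token, min((t + 1) * samples_per_token, len(grad)))
             for t in range(n_tokens)]
    # Energy of a token = sum of squared gradient values over its span.
    energy = [sum(g * g for g in grad[a:b]) for a, b in spans]
    k = max(1, math.ceil(retention_ratio * n_tokens))
    keep = sorted(range(n_tokens), key=lambda t: energy[t], reverse=True)[:k]
    masked = [0.0] * len(grad)
    for t in keep:
        a, b = spans[t]
        masked[a:b] = grad[a:b]
    return masked, sorted(keep)


grad = [0.1, -0.1, 2.0, -1.5, 0.0, 0.2, 0.3, 0.1]
masked, kept_tokens = token_aware_mask(grad, samples_per_token=2, retention_ratio=0.25)
print(masked, kept_tokens)
```

With a retention ratio of 0.25 and four tokens, only the single highest-energy span survives, so an update would touch a quarter of the samples.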

2605.02495 2026-05-26 cs.LG cs.AI stat.ML 版本更新

Efficient Preference Poisoning Attack on Offline RLHF

针对离线RLHF的高效偏好投毒攻击

Chenye Yang, Weiyu Xu, Lifeng Lai

发表机构 * Department of Electrical and Computer Engineering, University of California, Davis, Davis, CA, USA(加州大学戴维斯分校电气与计算机工程系) Department of Electrical and Computer Engineering, University of Iowa, Iowa City, IA, USA(爱荷华大学电气与计算机工程系)

AI总结 针对离线RLHF中的偏好投毒攻击,提出基于梯度字典的二进制稀疏近似方法(BAL-A和BMP-A),实现高效标签翻转攻击。

Comments Accepted to ICML 2026

详情
AI中文摘要

离线人类反馈强化学习(RLHF)流程(如直接偏好优化DPO)在预收集的偏好数据集上训练,使其容易受到偏好投毒攻击。我们研究了对数线性DPO的标签翻转攻击。首先说明翻转一个偏好标签会在DPO梯度中引起与参数无关的偏移。利用这一关键性质,我们可以将目标投毒问题转化为结构化的二进制稀疏近似问题。为解决该问题,我们开发了两种攻击方法:二进制感知格点攻击(BAL-A)和二进制匹配追踪攻击(BMP-A)。BAL-A将二进制翻转选择问题嵌入二进制感知格点,并应用Lenstra-Lenstra-Lovász约简和Babai最近平面算法;我们提供了强制二进制系数并恢复最小翻转目标的充分条件。BMP-A将二进制匹配追踪适应于我们的非归一化梯度字典,并给出基于相干性的恢复保证和$K$翻转预算的鲁棒性(不可能性)证书。在合成字典和斯坦福人类偏好数据集上的实验验证了理论,并突出了字典几何如何决定攻击成功。

英文摘要

Offline Reinforcement Learning from Human Feedback (RLHF) pipelines such as Direct Preference Optimization (DPO) train on a pre-collected preference dataset, which makes them vulnerable to preference poisoning attacks. We study label-flip attacks against log-linear DPO. We first show that flipping one preference label induces a parameter-independent shift in the DPO gradient. Using this key property, we can then convert the targeted poisoning problem into a structured binary sparse approximation problem. To solve this problem, we develop two attack methods: Binary-Aware Lattice Attack (BAL-A) and Binary Matching Pursuit Attack (BMP-A). BAL-A embeds the binary flip selection problem into a binary-aware lattice and applies Lenstra-Lenstra-Lovász reduction and Babai's nearest plane algorithm; we provide sufficient conditions that enforce binary coefficients and recover the minimum-flip objective. BMP-A adapts binary matching pursuit to our non-normalized gradient dictionary and yields coherence-based recovery guarantees and robustness (impossibility) certificates for $K$-flip budgets. Experiments on synthetic dictionaries and the Stanford Human Preferences dataset validate the theory and highlight how dictionary geometry governs attack success.
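The parameter-independent shift can be checked numerically under a simplified setup. The sketch assumes a per-pair loss of the form -log sigmoid(beta * theta . delta_phi), where delta_phi is the feature difference between the preferred and rejected responses and theta is measured relative to the reference policy. This parameterization is an assumption for illustration and may differ from the paper's. Flipping the label negates delta_phi, and because sigmoid(z) + sigmoid(-z) = 1, the gradient changes by exactly beta * delta_phi regardless of theta.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dpo_grad(theta, delta_phi, beta=1.0):
    """Gradient of -log sigmoid(beta * theta . delta_phi) with respect to theta."""
    z = beta * sum(t * d for t, d in zip(theta, delta_phi))
    coeff = -beta * sigmoid(-z)
    return [coeff * d for d in delta_phi]


def flip_shift(theta, delta_phi, beta=1.0):
    """Gradient with the label flipped minus gradient with the original label."""
    flipped = dpo_grad(theta, [-d for d in delta_phi], beta)
    clean = dpo_grad(theta, delta_phi, beta)
    return [f - c for f, c in zip(flipped, clean)]


delta_phi = [0.5, -1.0, 2.0]
beta = 0.3
shift_a = flip_shift([0.0, 0.0, 0.0], delta_phi, beta)
shift_b = flip_shift([1.7, -0.4, 3.2], delta_phi, beta)
print(shift_a, shift_b)
```

Both shifts equal beta * delta_phi up to rounding. Since each flip contributes a fixed vector, choosing which labels to flip becomes a binary selection over a fixed dictionary, consistent with the sparse-approximation view in the abstract.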

2605.00419 2026-05-26 cs.LG cs.CL 版本更新

Rethinking LLM Ensembling from the Perspective of Mixture Models

从混合模型的角度重新思考大语言模型集成

Jiale Fu, Yuchu Jiang, Peijun Wu, Chonghan Liu, Joey Tianyi Zhou, Xu Yang

发表机构 * Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education(新一代人工智能技术与交叉应用教育部重点实验室(东南大学)) Southeast University(东南大学) Centre for Frontier AI Research (CFAR), Agency for Science, Technology and Research (A*STAR), Singapore(新加坡科技研究局前沿人工智能研究中心(CFAR)) Institute of High Performance Computing (IHPC), Agency for Science, Technology(新加坡科技研究局高性能计算研究院(IHPC))

AI总结 本文提出混合模型式集成(ME),通过将集成重新解释为混合模型,随机选择单个模型生成下一个token,避免显式计算完整集成分布,实现1.78x-2.68x加速,并揭示了集成与token级路由方法的联系。

Comments ICML 2026 Spotlight

详情
AI中文摘要

模型集成是提升机器学习模型性能的成熟技术。传统上,这涉及对多个模型的输出分布进行平均,并选择最可能的标签。这一思想已自然扩展到大型语言模型(LLMs),在提升性能的同时也带来了巨大的计算成本。这种低效源于将传统集成实现直接应用于LLMs,需要为每个模型单独进行前向传播以显式计算集成分布。在本文中,我们提出了混合模型式集成(ME)。通过将集成重新解释为混合模型,ME在每一步随机选择一个模型来生成下一个token,从而避免显式计算完整的集成分布。ME在数学上等价于从集成分布中采样,但只需调用一个模型,使其比传统集成快1.78x-2.68x倍。此外,这一视角将LLM集成与token级路由方法联系起来,表明LLM集成是路由方法的一个特例。我们的发现为高效的LLM集成开辟了新途径,并激励了对LLM token级路由策略的进一步探索。我们的代码可在https://github.com/Kamichanw/Mixture-model-like-Ensemble获取。

英文摘要

Model ensembling is a well-established technique for improving the performance of machine learning models. Conventionally, this involves averaging the output distributions of multiple models and selecting the most probable label. This idea has been naturally extended to large language models (LLMs), yielding improved performance but incurring substantial computational cost. This inefficiency stems from directly applying conventional ensemble implementation to LLMs, which require a separate forward pass for each model to explicitly compute the ensemble distribution. In this paper, we propose the Mixture-model-like Ensemble (ME). By reinterpreting the ensemble as a mixture model, ME stochastically selects a single model at each step to generate the next token, thereby avoiding the need to explicitly compute the full ensemble distribution. ME is mathematically equivalent to sampling from the ensemble distribution, but requires invoking only one model, making it 1.78x-2.68x faster than conventional ensembling. Furthermore, this perspective connects LLM ensembling and token-level routing methods, suggesting that LLM ensembling is a special case of routing methods. Our findings open new avenues for efficient LLM ensembling and motivate further exploration of token-level routing strategies for LLMs. Our code is available at https://github.com/Kamichanw/Mixture-model-like-Ensemble.
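The equivalence claimed in the abstract follows from the law of total probability: picking model i with probability w_i and then sampling a token from p_i gives the marginal sum_i w_i * p_i, which is the ensemble distribution. The sketch below checks this empirically with two toy "models" that return fixed next-token distributions. The names `mixture_step`, `model_a`, and `model_b` are hypothetical and are not taken from the authors' repository.

```python
import random

TOY_A = [0.7, 0.2, 0.1]
TOY_B = [0.1, 0.3, 0.6]
calls = {"a": 0, "b": 0}


def model_a(context):
    calls["a"] += 1  # count forward passes
    return TOY_A


def model_b(context):
    calls["b"] += 1
    return TOY_B


def ensemble_distribution(dists, weights):
    """Explicit ensemble: weighted average of every model's distribution."""
    return [sum(w * d[v] for w, d in zip(weights, dists)) for v in range(len(dists[0]))]


def mixture_step(models, weights, context, rng):
    """Pick one model with probability equal to its weight, then sample from it."""
    idx = rng.choices(range(len(models)), weights=weights, k=1)[0]
    probs = models[idx](context)  # the only forward pass in this step
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


weights = [0.5, 0.5]
target = ensemble_distribution([TOY_A, TOY_B], weights)

rng = random.Random(0)
n_steps = 20000
counts = [0, 0, 0]
for _ in range(n_steps):
    counts[mixture_step([model_a, model_b], weights, None, rng)] += 1
freqs = [c / n_steps for c in counts]
print(target, freqs, calls)
```

The empirical token frequencies approach the averaged distribution while the total number of model calls equals the number of steps. A conventional ensemble would need one call per model per step.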

2604.23017 2026-05-26 cs.LG cs.NA math.CV math.NA 版本更新

Complex Stochastic Gradient Descent and Directional Bias in Reproducing Kernel Hilbert Spaces

复随机梯度下降与再生核希尔伯特空间中的方向偏差

Natanael Alpay, Emeric Battaglia

发表机构 * Department of Mathematics, University of California, Irvine, Irvine, CA 92697 USA(加州大学尔湾分校数学系,尔湾,CA 92697,美国)

AI总结 本文提出复随机梯度下降(Complex SGD)方法,在无解析性约束下证明其收敛性,并验证方向偏差从实域扩展到复域,在复再生核希尔伯特空间中通过核回归有效恢复超振荡函数和Blaschke乘积。

详情
AI中文摘要

随机梯度下降(SGD)是一种已知的随机迭代方法,因其实现简单和可扩展性而流行于大规模凸优化问题。某些目标函数,例如复值神经网络中的目标函数,受益于SGD和梯度下降(GD)中使用新定义的“梯度”进行更新,该梯度允许复参数。这种SGD/GD方法的复变体已被提出,但尚未提供无解析性约束的收敛保证。我们提出了一种允许复参数的SGD变体(复SGD),并在与实设置平行的假设下提供了收敛保证。值得注意的是,这些结果也扩展到GD,并且在相同的假设集下,我们确认了对于核回归问题,一些方向偏差结果从实域扩展到复域。我们提供了实证结果,证明了复SGD在使用复再生核希尔伯特空间的核回归问题中的有效性。特别地,我们展示了通过选择特定的损失函数,可以分别从Fock空间和Hardy空间中恢复超振荡函数和Blaschke乘积作为最优函数。

英文摘要

Stochastic Gradient Descent (SGD) is a well-known stochastic iterative method, popular for large-scale convex optimization problems because of its simple implementation and scalability. Some objectives, such as those found in complex-valued neural networks, benefit from SGD- and Gradient Descent (GD)-style updates that use a newly defined "gradient" allowing complex parameters. This complex variant of the SGD/GD methods has already been proposed, but convergence guarantees without analyticity constraints have not yet been provided. We propose a variant of SGD (complex SGD) that allows for complex parameters, and we provide convergence guarantees under assumptions that parallel those from the real setting. Notably, these results extend to GD as well, and with the same set of assumptions, we confirm that some directional bias results extend from the real to the complex setting for kernel regression problems. We provide empirical results demonstrating the efficacy of complex SGD in kernel regression problems utilizing complex reproducing kernel Hilbert spaces. In particular, we demonstrate that, for a particular choice of loss function, superoscillation functions and Blaschke products can be recovered as the optimal functions from the Fock space and the Hardy space, respectively.
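
As a rough illustration of SGD with complex parameters, the sketch below fits a complex linear model by stepping along the Wirtinger derivative of the squared residual with respect to the conjugate weights. This is a standard textbook construction chosen for illustration; the abstract does not spell out the paper's own "gradient" definition, so the update rule, the synthetic data, the step size, and the function names are all assumptions.

```python
import random


def complex_sgd(samples, dim, lr=0.05, steps=3000, seed=0):
    """SGD on the loss |x.w - y|^2 with complex weights w.

    The update direction is the Wirtinger derivative with respect to conj(w),
    (x.w - y) * conj(x), which points along steepest ascent of a real-valued
    loss of complex parameters (up to a constant factor).
    """
    rng = random.Random(seed)
    w = [0j] * dim
    for _ in range(steps):
        x, y = rng.choice(samples)  # one random sample per step
        residual = sum(xi * wi for xi, wi in zip(x, w)) - y
        w = [wi - lr * residual * xi.conjugate() for wi, xi in zip(w, x)]
    return w


def make_dataset(w_true, n=50, seed=1):
    """Noiseless synthetic data y = x.w_true with complex Gaussian features."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in w_true]
        y = sum(xi * wi for xi, wi in zip(x, w_true))
        data.append((x, y))
    return data
```

On noiseless data such as `make_dataset([1 + 2j, -0.5j, 0.3 - 1j])`, the iterates should approach the generating weights; noisy data or a kernel feature map would need a decaying step size and is left out of this sketch.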

2604.18970 2026-05-26 cs.LG cs.CR 版本更新

Mechanistic Anomaly Detection via Functional Attribution

通过功能归因的机制性异常检测

Hugo Lyons Keenan, Christopher Leckie, Sarah Erfani

发表机构 * School of Computing and Information Systems, The University of Melbourne, Victoria, Australia(计算与信息系统学院,墨尔本大学,维多利亚,澳大利亚)

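AI summary: Reframes mechanistic anomaly detection as a functional attribution problem, using influence functions to measure the functional coupling between test samples and a reference set; achieves state-of-the-art or significantly improved results on backdoor detection in vision models, backdoor detection in LLMs, and detection of adversarial and out-of-distribution samples.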

Comments ICML '26 Camera Ready

详情
AI中文摘要

我们通常可以使用真实标签验证神经网络输出的正确性,但无法可靠地确定输出是由正常还是异常的内部机制产生的。机制性异常检测(MAD)旨在标记这些情况,但现有方法要么依赖于潜在空间分析(易受混淆攻击),要么特定于特定架构和模态。我们将MAD重新定义为功能归因问题:询问来自可信集的样本在多大程度上可以解释模型的输出,其中归因失败表明异常行为。我们使用影响函数来实现这一点,通过参数空间采样测量测试样本与小型参考集之间的功能耦合。我们在多种异常类型和模态上进行了评估。对于视觉模型中的后门,我们的方法在BackdoorBench上实现了最先进的检测,在七种攻击和四个数据集上平均防御有效性评分(DER)为0.93(次优为0.83)。对于大语言模型,我们在几种后门类型(包括显式混淆的模型)上也取得了比基线显著的改进。除了后门,我们的方法可以检测对抗样本和分布外样本,并区分单个模型内的多种异常机制。我们的结果确立了功能归因作为一种有效的、模态无关的工具,用于检测部署模型中的异常行为。

英文摘要

We can often verify the correctness of neural network outputs using ground truth labels, but we cannot reliably determine whether the output was produced by normal or anomalous internal mechanisms. Mechanistic anomaly detection (MAD) aims to flag these cases, but existing methods either depend on latent space analysis, which is vulnerable to obfuscation, or are specific to particular architectures and modalities. We reframe MAD as a functional attribution problem: asking to what extent samples from a trusted set can explain the model's output, where attribution failure signals anomalous behavior. We operationalize this using influence functions, measuring functional coupling between test samples and a small reference set via parameter-space sampling. We evaluate across multiple anomaly types and modalities. For backdoors in vision models, our method achieves state-of-the-art detection on BackdoorBench, with an average Defense Effectiveness Rating (DER) of 0.93 across seven attacks and four datasets (next best 0.83). For LLMs, we similarly achieve a significant improvement over baselines for several backdoor types, including on explicitly obfuscated models. Beyond backdoors, our method can detect adversarial and out-of-distribution samples, and distinguishes multiple anomalous mechanisms within a single model. Our results establish functional attribution as an effective, modality-agnostic tool for detecting anomalous behavior in deployed models.

2604.14054 2026-05-26 cs.LG cs.CL 版本更新

$π$-Play: Multi-Agent Self-Play via Privileged Self-Distillation without External Data

$\pi$-Play: 通过特权自蒸馏实现的多智能体自对弈,无需外部数据

Yaocheng Zhang, Yuanheng Zhu, Wenyue Chong, Songjun Tu, Qichao Zhang, Jiajun Chai, Xiaohan Wang, Wei Lin, Guojun Yin, Dongbin Zhao

发表机构 * Institute of Automation, Chinese Academy of Sciences(中国科学院自动化研究所) School of Advanced Interdisciplinary Sciences, University of Chinese Academy of Sciences(中国科学院大学先进交叉学科学院) Meituan(美团) School of Artificial Intelligence, University of Chinese Academy of Sciences(中国科学院大学人工智能学院)

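AI summary: Proposes the $\pi$-Play framework, which uses the question construction paths generated during self-play as privileged information and combines them with self-distillation for dense-feedback multi-agent co-evolution, surpassing fully supervised search agents without external data.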

Comments 23 pages, 11 figures

详情
AI中文摘要

深度搜索代理已成为解决复杂信息寻求任务的有前景范式,但其训练仍面临稀疏奖励、弱信用分配和有限标注数据的挑战。自对弈提供了一种可扩展的减少数据依赖的途径,但传统自对弈仅通过稀疏结果奖励优化学生,导致学习效率低下。在这项工作中,我们观察到自对弈在任务生成过程中自然产生一个问题构建路径(QCP),这是一种捕获反向求解过程的中间产物。这揭示了一种新的特权信息来源:自对弈可以低成本、大规模地提供高质量特权信息用于自蒸馏,无需依赖人类反馈或精心设计的特权信息。基于这一洞察,我们提出特权信息自对弈($\pi$-Play),一种结合自对弈和自蒸馏的新型多智能体自进化框架。在$\pi$-Play中,考官生成任务及QCP,教师利用QCP作为特权上下文,通过自蒸馏对学生进行密集监督。这种设计将稀疏奖励的自对弈转变为密集反馈的协同进化。大量实验表明,无数据的$\pi$-Play超越了全监督搜索代理,并将进化效率相比传统自对弈提升了2-3倍。代码见 https://github.com/zhyaoch/pi-play。

英文摘要

Deep search agents have emerged as a promising paradigm for addressing complex information-seeking tasks, but their training remains challenging due to sparse rewards, weak credit assignment, and limited labeled data. Self-play offers a scalable route to reduce data dependence, but conventional self-play optimizes students only through sparse outcome rewards, leading to low learning efficiency. In this work, we observe that self-play naturally produces a question construction path (QCP) during task generation, an intermediate artifact that captures the reverse solution process. This reveals a new source of privileged information: self-play can provide high-quality privileged information for self-distillation at low cost and at scale, without relying on human feedback or curated privileged information. Leveraging this insight, we propose Privileged Information Self-Play ($π$-Play), a novel multi-agent self-evolution framework combining self-play and self-distillation. In $π$-Play, an examiner generates tasks together with QCPs, and a teacher employs QCP as privileged context to densely supervise a student via self-distillation. This design transforms sparse-reward self-play into a dense-feedback co-evolution. Extensive experiments show that data-free $π$-Play surpasses fully supervised search agents and improves evolutionary efficiency by 2-3$\times$ over conventional self-play. Code is available at https://github.com/zhyaoch/pi-play.

2604.10783 2026-05-26 cs.AI cs.LG 版本更新

Learning Preference-Based Objectives from Clinical Narratives for Dynamic Sepsis Treatment

从临床叙述中学习基于偏好的目标用于动态脓毒症治疗

Daniel J. Tan, Jayne Hui Zhen Chan, Kai Wen Hwang, Arturo Yong Yao Neo, Kay Choong See, Mengling Feng

发表机构 * Institute of Data Science, National University of Singapore, Singapore(新加坡国立大学数据科学研究所) National University Hospital, Singapore(新加坡国立大学医院) Saw Swee Hock School of Public Health, National University of Singapore, Singapore(新加坡国立大学 Saw Swee Hock 公共卫生学院)

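AI summary: Proposes the CN-PR framework, which uses a large language model to extract trajectory-level preferences from discharge summaries and learns a reward function through preference optimization, improving sepsis treatment outcomes in offline reinforcement learning.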

详情
AI中文摘要

在医疗保健中为强化学习设计奖励函数仍然具有挑战性,因为临床有意义的结果稀疏、延迟且难以明确指定。尽管结构化临床数据捕获了生理状态,但它们往往无法反映患者轨迹的更广泛方面,如治疗反应、恢复动态和干预负担。相比之下,临床叙述编码了临床医生对疾病进展、治疗效果和恢复的纵向评估,提供了超越预定义结果指标的轨迹级监督的潜在来源。我们提出了临床叙述知情偏好奖励(CN-PR)框架,该框架通过将临床叙述视为轨迹级偏好的可扩展监督,直接从出院小结中学习奖励函数。使用大语言模型,我们推导出轨迹质量分数,并在患者轨迹之间构建成对偏好,通过基于偏好的优化来学习奖励。为了考虑叙述信息量的变异性,我们引入了一个任务相关性信号,根据监督与下游决策任务的相关性对其进行加权。我们在离线强化学习中评估了CN-PR在动态脓毒症治疗中的应用。学习到的奖励与轨迹质量分数表现出强烈的单调对齐,并产生了与改善恢复相关结果相关的策略,包括增加器官支持无天数和更快的休克解决,同时保持与基于结果的奖励基线相当的性能。这些发现在外部验证下得以保留。我们的结果表明,临床叙述为动态治疗方案中的奖励学习提供了可扩展且富有表现力的监督来源。

英文摘要

Designing reward functions for reinforcement learning (RL) in healthcare remains challenging because clinically meaningful outcomes are sparse, delayed, and difficult to explicitly specify. Although structured clinical data capture physiologic states, they often fail to reflect broader aspects of patient trajectories such as treatment response, recovery dynamics, and intervention burden. Clinical narratives, by contrast, encode longitudinal clinician assessments of disease progression, treatment effectiveness, and recovery, providing a potential source of trajectory-level supervision beyond predefined outcome metrics. We propose Clinical Narrative-informed Preference Rewards (CN-PR), a framework that learns reward functions directly from discharge summaries by treating clinical narratives as scalable supervision for trajectory-level preferences. Using a large language model, we derive trajectory quality scores and construct pairwise preferences between patient trajectories to learn rewards through preference-based optimization. To account for variability in narrative informativeness, we incorporate a task relevance signal that weights supervision according to its relevance to the downstream decision-making task. We evaluate CN-PR in dynamic sepsis treatment using offline RL. The learned reward demonstrated strong monotonic alignment with trajectory quality scores and produced policies associated with improved recovery-related outcomes, including increased organ support-free days and faster shock resolution, while maintaining mortality performance comparable to outcome-based reward baselines. These findings were preserved under external validation. Our results suggest that clinical narratives provide a scalable and expressive source of supervision for reward learning in dynamic treatment regimes.

2604.08870 2026-05-26 cs.LG cs.AI 版本更新

Temporal Dropout Risk in Learning Analytics: A Harmonized Survival Benchmark Across Dynamic and Early-Window Representations

学习分析中的时间辍学风险:跨动态与早期窗口表示的协调生存基准

Rafael da Silva, Jeff Eicher, Gregory Longo

发表机构 * Applied Data Science Program(应用数据科学项目) Eastern University(东部大学)

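AI summary: Using the OULAD dataset, this study evaluates dropout risk models with a harmonized survival analysis benchmark (covering a dynamic weekly representation and a continuous-time representation) and finds that temporal behavioral features are more predictive than static background attributes.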

Comments 34 pages, 14 figures, 18 tables. Includes appendix with reliability diagrams, sensitivity analyses, and dataset audit tables

详情
AI中文摘要

学生辍学是学习分析中持续关注的问题,然而比较研究经常在异质协议下评估预测模型,优先考虑区分度而非时间可解释性和校准。本研究引入了一个面向生存的基准,用于使用开放大学学习分析数据集(OULAD)进行时间辍学风险建模。比较了两个协调分支:一个动态周分支,采用人-时期表示的模型;以及一个可比较的连续时间分支,扩展了模型家族——基于树的生存模型、参数模型和神经网络模型。评估协议整合了四个分析层面:预测性能、消融、可解释性和校准。结果在每个分支内分别报告,因为跨分支单一排名在方法论上不合理。在可比较分支中,随机生存森林在区分度和特定时间点的Brier分数上领先;在动态分支中,泊松分段指数在紧密的五家族聚类中在综合Brier分数上略微领先。无重抽样自举变异将这些位置视为方向性信号而非绝对优势。消融和可解释性分析在所有家族中收敛于一个共同发现:主导预测信号主要不是人口统计学或结构性的,而是时间和行为性的。校准在更好区分的模型中证实了这一模式,但XGBoost AFT除外,它表现出系统性偏差。这些结果支持在学习分析中采用协调的多维基准的价值,并将辍学风险定位为一个时间行为过程,而非静态背景属性的函数。

英文摘要

Student dropout is a persistent concern in Learning Analytics, yet comparative studies frequently evaluate predictive models under heterogeneous protocols, prioritizing discrimination over temporal interpretability and calibration. This study introduces a survival-oriented benchmark for temporal dropout risk modelling using the Open University Learning Analytics Dataset (OULAD). Two harmonized arms are compared: a dynamic weekly arm, with models in person-period representation, and a comparable continuous-time arm, with an expanded roster of families -- tree-based survival, parametric, and neural models. The evaluation protocol integrates four analytical layers: predictive performance, ablation, explainability, and calibration. Results are reported within each arm separately, as a single cross-arm ranking is not methodologically warranted. Within the comparable arm, Random Survival Forest leads in discrimination and horizon-specific Brier scores; within the dynamic arm, Poisson Piecewise-Exponential leads narrowly on integrated Brier score within a tight five-family cluster. No-refit bootstrap sampling variability qualifies these positions as directional signals rather than absolute superiority. Ablation and explainability analyses converged, across all families, on a shared finding: the dominant predictive signal was not primarily demographic or structural, but temporal and behavioral. Calibration corroborated this pattern in the better-discriminating models, with the exception of XGBoost AFT, which exhibited systematic bias. These results support the value of a harmonized, multi-dimensional benchmark in Learning Analytics and situate dropout risk as a temporal-behavioral process rather than a function of static background attributes.
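
The person-period representation used by the dynamic weekly arm can be sketched as follows. This is a generic discrete-time survival construction, not the benchmark's code: the record layout `(student_id, last_week, dropped_out)` and the function names are assumptions made for illustration.

```python
def to_person_period(records):
    """Expand (student_id, last_week, dropped_out) into one row per student-week.

    Each row is (student_id, week, event), where event is 1 only in the week
    the student drops out; censored students contribute rows with event 0 only.
    """
    rows = []
    for student_id, last_week, dropped_out in records:
        for week in range(1, last_week + 1):
            event = int(dropped_out and week == last_week)
            rows.append((student_id, week, event))
    return rows


def discrete_hazard(rows):
    """Empirical weekly hazard: dropouts in week t / students at risk in week t."""
    at_risk, events = {}, {}
    for _, week, event in rows:
        at_risk[week] = at_risk.get(week, 0) + 1
        events[week] = events.get(week, 0) + event
    return {week: events[week] / at_risk[week] for week in sorted(at_risk)}
```

A binary classifier fitted on such rows, with week and behavioral covariates as inputs, estimates the weekly hazard, which is the sense in which dropout becomes a temporal process rather than a single label per student.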

2604.00499 2026-05-26 cs.LG 版本更新

Scheduling LLM Inference with Uncertainty-Aware Output Length Predictions

具有不确定性感知输出长度预测的LLM推理调度

Haoyu Zheng, Yongqiang Zhang, Fangcheng Fu, Xiaokai Zhou, Hao Luo, Hongchao Zhu, Yuanyuan Zhu, Hao Wang, Xiao Yan, Jiawei Jiang

发表机构 * School of Computer Science, Wuhan University, Wuhan, China(武汉大学计算机学院) School of Computer Science, Central China Normal University, Wuhan, China(华中师范大学计算机学院) Dameng Database Co., Ltd., Wuhan, China(达梦数据库有限公司) School of Artificial Intelligence, Shanghai Jiao Tong University, Shanghai, China(上海交通大学人工智能学院) Institute for Math and AI, Wuhan University, Wuhan, China(武汉大学数学与人工智能研究院)

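AI summary: For LLM inference scheduling, proposes modeling the output length distribution with a log-t distribution and designs the Tail Inflated Expectation (TIE) metric to replace point estimates, reducing online inference latency and increasing offline throughput.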

Comments Accepted at ICML 2026

详情
AI中文摘要

为了调度LLM推理,\textit{最短作业优先}(SJF)原则通过优先处理输出长度短的请求来避免队头(HOL)阻塞。现有方法通常为每个请求预测单个输出长度以促进调度。我们认为这种\textit{点估计}与LLM推理的\textit{随机}解码过程不匹配,其中输出长度本质上是\textit{不确定的},由何时采样到序列结束(EOS)标记决定。因此,每个请求的输出长度应拟合为分布而非单个值。通过对经验数据和随机解码过程的深入分析,我们观察到输出长度服从重尾分布,并可以用对数t分布拟合。在此基础上,我们提出一个简单的指标,称为Tail Inflated Expectation (TIE),用于替换SJF调度中的输出长度,该指标通过对数t分布的期望进行尾部概率调整,以考虑请求生成长输出的风险。为了评估我们的TIE调度器,我们将其与三个强基线进行比较,结果表明,TIE将在线推理的每token延迟降低了$2.31\times$,并将离线数据生成的吞吐量提高了$1.42\times$。

英文摘要

To schedule LLM inference, the \textit{shortest job first} (SJF) principle is favorable by prioritizing requests with short output lengths to avoid head-of-line (HOL) blocking. Existing methods usually predict a single output length for each request to facilitate scheduling. We argue that such a \textit{point estimate} does not match the \textit{stochastic} decoding process of LLM inference, where output length is \textit{uncertain} by nature and determined by when the end-of-sequence (EOS) token is sampled. Hence, the output length of each request should be fitted with a distribution rather than a single value. With an in-depth analysis of empirical data and the stochastic decoding process, we observe that output length follows a heavy-tailed distribution and can be fitted with the log-t distribution. On this basis, we propose a simple metric called Tail Inflated Expectation (TIE) to replace the output length in SJF scheduling, which adjusts the expectation of a log-t distribution with its tail probabilities to account for the risk that a request generates long outputs. To evaluate our TIE scheduler, we compare it with three strong baselines, and the results show that TIE reduces the per-token latency by $2.31\times$ for online inference and improves throughput by $1.42\times$ for offline data generation.
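
To make the contrast between a point estimate and a tail-aware score concrete, here is a small shortest-job-first sketch. The abstract does not give the TIE formula, so `tail_penalized_score` is a hypothetical stand-in (mean length plus a penalty proportional to the probability and size of long outputs), computed from empirical length samples rather than a fitted log-t distribution; the cutoff and weight are likewise assumptions.

```python
import heapq


def mean_length(length_samples):
    """Point estimate: the average predicted output length."""
    return sum(length_samples) / len(length_samples)


def tail_penalized_score(length_samples, cutoff=512, tail_weight=1.0):
    """Hypothetical tail-aware score (NOT the paper's TIE definition).

    Adds tail_weight * P(length > cutoff) * E[length | length > cutoff]
    to the mean, so requests with a risk of long outputs rank later.
    """
    mean = mean_length(length_samples)
    tail = [n for n in length_samples if n > cutoff]
    if not tail:
        return mean
    tail_prob = len(tail) / len(length_samples)
    tail_mean = sum(tail) / len(tail)
    return mean + tail_weight * tail_prob * tail_mean


def sjf_order(requests, score_fn=tail_penalized_score):
    """Return request ids in shortest-job-first order under the given score."""
    heap = [(score_fn(samples), req_id) for req_id, samples in requests.items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]
```

For example, a request whose sampled lengths are mostly 50 tokens but occasionally 2000 is scheduled before a steady 300-token request under `mean_length`, and after it under the tail-aware score.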

2603.09581 2026-05-26 cs.LG 版本更新

Towards Understanding Adam Convergence on Highly Degenerate Polynomials

理解Adam在高度退化多项式上的收敛性

Zhiwei Bai, Jiajie Zhao, Zhangchen Zhou, Zhi-Qin John Xu, Yaoyu Zhang

发表机构 * Institute of Natural Sciences, MOE-LSC, Shanghai Jiao Tong University(上海交通大学自然科学研究院) School of Mathematical Sciences, Shanghai Jiao Tong University(上海交通大学数学科学学院) Shanghai Seres Information Technology Co., Ltd, Shanghai 200040, China(上海塞瑞斯信息技术有限公司)

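AI summary: Studies the auto-convergence properties of the Adam optimizer on highly degenerate polynomials, derives conditions for local asymptotic stability, proves a linear convergence rate that outperforms gradient descent and momentum, and characterizes the hyperparameter phase diagram.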

Comments Accepted to ICML 2026

详情
AI中文摘要

Adam是深度学习中广泛使用的优化算法,然而其具有固有优势的目标函数类别仍未被充分探索。与先前需要外部调度器和$\beta_2$接近1才能收敛的研究不同,本文研究了Adam的“自然”自动收敛性质。我们识别了一类高度退化多项式,Adam无需额外调度器即可自动收敛。具体地,我们推导了退化多项式上局部渐近稳定性的理论条件,并展示了理论界限与实验结果之间的强一致性。我们证明Adam在这些退化函数上实现局部线性收敛,显著优于梯度下降和动量法的次线性收敛。这种加速源于第二矩$v_t$与平方梯度$g_t^2$之间的解耦机制,该机制指数级放大有效学习率。最后,我们刻画了Adam的超参数相图,识别出三种不同的行为区域:稳定收敛、尖峰和类似SignGD的振荡。

英文摘要

Adam is a widely used optimization algorithm in deep learning, yet the specific class of objective functions where it exhibits inherent advantages remains underexplored. Unlike prior studies requiring external schedulers and $\beta_2$ near 1 for convergence, this work investigates the "natural" auto-convergence properties of Adam. We identify a class of highly degenerate polynomials where Adam converges automatically without additional schedulers. Specifically, we derive theoretical conditions for local asymptotic stability on degenerate polynomials and demonstrate strong alignment between theoretical bounds and experimental results. We prove that Adam achieves local linear convergence on these degenerate functions, significantly outperforming the sub-linear convergence of Gradient Descent and Momentum. This acceleration stems from a decoupling mechanism between the second moment $v_t$ and squared gradient $g_t^2$, which exponentially amplifies the effective learning rate. Finally, we characterize Adam's hyperparameter phase diagram, identifying three distinct behavioral regimes: stable convergence, spikes, and SignGD-like oscillation.
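
A minimal harness for experimenting with this setting is sketched below, assuming the one-dimensional degenerate polynomial $f(x) = x^4$ as the test function. The function choice, the step sizes, and the default Adam hyperparameters are assumptions for illustration; the sketch does not reproduce the paper's rates or phase diagram, it only provides the two update rules so that different $\beta_2$ values can be tried.

```python
import math


def grad(x, degree=4):
    """Gradient of f(x) = x**degree, which has a degenerate minimum at 0 for degree > 2."""
    return degree * x ** (degree - 1)


def run_gd(x0, lr, steps):
    """Plain gradient descent with a constant learning rate."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def run_adam(x0, lr, steps, beta1=0.9, beta2=0.999, eps=1e-8):
    """Standard Adam with bias correction and a constant learning rate (no scheduler)."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x
```

Calling `run_gd(1.0, 0.1, steps)` for increasing `steps` shows the slow, sub-linear decay of gradient descent on a flat minimum; `run_adam` can be swept over `beta2` and `lr` to look for the regimes the abstract describes.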

2603.08072 2026-05-26 cs.LG 版本更新

Hybrid Quantum Neural Network for Multivariate Clinical Time Series Forecasting

混合量子神经网络用于多变量临床时间序列预测

Irene Iele, Floriano Caprio, Paolo Soda, Matteo Tortora

发表机构 * Department of Diagnostics and Intervention, Radiation Physics, Biomedical Engineering, Umeå University, Sweden(诊断与介入系,辐射物理,生物医学工程,于默奥大学,瑞典) Department of Naval, Electrical, Electronics and Telecommunications Engineering, University of Genoa, Italy(海军、电气、电子与电信工程系,热那亚大学,意大利)

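AI summary: Proposes a hybrid quantum-classical architecture that integrates a variational quantum circuit into a recurrent neural network for multi-horizon forecasting of multivariate physiological time series; on the BIDMC dataset it shows accuracy comparable to baselines and greater robustness.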

详情
AI中文摘要

预测生理信号可以通过预期患者状态的临界变化来支持主动监测和及时的临床干预。在这项工作中,我们通过联合预测心率、血氧饱和度、脉搏率和呼吸率在15、30和60秒的预测时域上,解决了生理时间序列的多变量多步预测问题。我们提出了一种混合量子-经典架构,将变分量子电路(VQC)集成到循环神经骨干中。GRU编码器将历史观察窗口总结为潜在表示,然后将其投影到用于参数化VQC的量子角度上。量子层作为可学习的非线性特征混合器,在最终预测阶段之前建模跨变量交互。我们在BIDMC PPG和呼吸数据集上采用留一患者方案评估了所提出的方法。结果显示,与经典和深度学习基线相比,该方法具有竞争性的精度,同时对噪声和缺失输入具有更强的鲁棒性。这些发现表明,混合量子层可以为小队列临床环境中的生理时间序列预测提供有用的归纳偏置。代码可在https://github.com/arco-group/quantum-ml获取。

英文摘要

Forecasting physiological signals can support proactive monitoring and timely clinical intervention by anticipating critical changes in patient status. In this work, we address multivariate multi-horizon forecasting of physiological time series by jointly predicting heart rate, oxygen saturation, pulse rate, and respiratory rate at forecasting horizons of 15, 30, and 60 seconds. We propose a hybrid quantum-classical architecture that integrates a Variational Quantum Circuit (VQC) within a recurrent neural backbone. A GRU encoder summarizes the historical observation window into a latent representation, which is then projected into quantum angles used to parameterize the VQC. The quantum layer acts as a learnable non-linear feature mixer, modeling cross-variable interactions before the final prediction stage. We evaluate the proposed approach on the BIDMC PPG and Respiration dataset under a Leave-One-Patient-Out protocol. The results show competitive accuracy compared with classical and deep learning baselines, together with greater robustness to noise and missing inputs. These findings suggest that hybrid quantum layers can provide useful inductive biases for physiological time series forecasting in small-cohort clinical settings. The code is available at https://github.com/arco-group/quantum-ml.

2603.06798 2026-05-26 cs.LG cs.DC stat.ML 版本更新

NEST: Network- and Memory-Aware Device Placement For Distributed Deep Learning

NEST: 面向分布式深度学习的网络与内存感知设备放置

Irene Wang, Vishnu Varma Venkata, Arvind Krishnamurthy, Divya Mahajan

发表机构 * Georgia Institute of Technology(佐治亚理工学院) University of Washington(华盛顿大学)

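AI summary: Proposes the NEST framework, which unifies model parallelism, topology modeling, and memory feasibility through structured dynamic programming, achieving up to 2.43x higher throughput across diverse hardware and networks.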

Comments Accepted to MLSys 2026

详情
AI中文摘要

深度学习规模的不断增长要求分布式训练框架能够联合考虑并行性、内存和网络拓扑。先前的工作通常依赖启发式或拓扑无关的搜索,分别处理通信和内存。由于缺乏每设备内存感知,这些方法通常事后通过将参数和激活分片到多个设备上来确保可行性,从而增加同步、扩大通信、降低计算利用率,限制了实际数据中心网络上的可扩展性和效率。我们提出了NEST,一个网络、计算和内存感知的设备放置框架,通过结构化动态规划统一了模型并行、拓扑建模和内存可行性。NEST的动态规划在具有张量和专家并行配置、跨层次或任意网络的显式allreduce延迟以及内存/计算轮廓的算子图上运行。通过跨张量、流水线、数据和专家维度分解并行性,NEST为混合策略定义了一个原则性的搜索空间,同时联合优化共置、网络延迟和内存可行性。在多种硬件和网络上的评估表明,与最先进的基线相比,NEST实现了高达2.43倍的吞吐量提升、更好的内存效率和可扩展性,为下一代AI基础设施的并行化策略和数据中心互连协同设计提供了基础。NEST的源代码可在https://github.com/scai-tech/Nest获取。

英文摘要

The growing scale of deep learning demands distributed training frameworks that jointly reason about parallelism, memory, and network topology. Prior works often rely on heuristic or topology-agnostic search, handling communication and memory separately. Without per-device memory awareness, these methods typically ensure feasibility post hoc by sharding parameters and activations across many devices, increasing synchronization, inflating communication, and underutilizing compute, which limits scalability and efficiency on real datacenter networks. We present NEST, a network-, compute-, and memory-aware device placement framework that unifies model parallelism, topology modeling, and memory feasibility via structured dynamic programming. NEST's DP operates on operator graphs with tensor and expert parallel configurations, explicit allreduce latencies across hierarchical or arbitrary networks, and memory/compute profiles. By factoring parallelism across tensor, pipeline, data, and expert dimensions, NEST defines a principled search space for hybrid strategies while jointly optimizing co-location, network latency, and memory feasibility. Evaluations across diverse hardware and networks show NEST achieves up to 2.43 times higher throughput, better memory efficiency, and improved scalability over state-of-the-art baselines, providing a foundation for co-designing parallelization strategies and datacenter interconnects for next-generation AI infrastructure. The source code of NEST is available at: https://github.com/scai-tech/Nest

2603.05143 2026-05-26 cs.CL cs.LG 版本更新

Feature Resemblance: Towards a Theoretical Understanding of Analogical Reasoning in Transformers

特征相似性:迈向对Transformer中类比推理的理论理解

Ruichen Xu, Wenjing Yan, Ying-Jun Angela Zhang

发表机构 * Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong(香港中文大学信息工程系)

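AI summary: Using a minimal transformer abstraction, this paper proves theoretically that joint training and a specific curriculum order align entities in representation space, enabling property transfer through feature resemblance, i.e., analogical reasoning.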

详情
AI中文摘要

理解大型语言模型中的推理因评估混淆多种推理类型而变得复杂。我们分离出类比推理,即模型在共享已知属性的实体之间转移属性,并研究这种转移何时能从训练中涌现。为了使问题在分析上易于处理,我们研究了一个最小化的Transformer风格抽象,该抽象隔离了学习到的表示如何支持类比推理。在此设置中,我们证明了三个关键结果。首先,对相似性和属性前提的联合训练通过对齐表示实现类比推理。其次,顺序训练仅在相似性结构先于特定属性学习时成功,揭示了课程不对称性。第三,在我们的风格化设置中,两跳推理$(a \to b, b \to c \Rightarrow a \to c)$可被视为具有身份桥$(b=b)$的类比推理,这些身份桥在训练数据中明确出现。这些结果共同揭示了一个统一机制:具有共享属性的实体在表示空间中对齐,从而通过特征相似性实现属性转移。使用高达8B参数的架构进行的实验与理论定性一致,并表明表示几何在风格化模型之外的类比推理中扮演重要角色。

英文摘要

Understanding reasoning in large language models is complicated by evaluations that conflate multiple reasoning types. We isolate analogical reasoning, where a model transfers an attribute between entities that share known properties, and study when such transfer can emerge from training. To make the problem analytically tractable, we study a minimal transformer-style abstraction that isolates how learned representations support analogical reasoning. Within this setting, we prove three key results. First, joint training on similarity and attribution premises enables analogical reasoning through aligned representations. Second, sequential training succeeds only when similarity structure is learned before specific attributes, revealing a curriculum asymmetry. Third, in our stylized setting, two-hop reasoning $(a \to b, b \to c \Rightarrow a \to c)$ can be viewed as analogical reasoning with identity bridges $(b=b)$, which appear explicitly in training data. Together, these results reveal a unified mechanism: entities with shared properties become aligned in representation space, enabling property transfer through feature resemblance. Experiments with architectures up to 8B parameters show qualitative agreement with the theory and suggest that representational geometry plays an important role in analogical reasoning beyond the stylized model.

2603.00857 2026-05-26 cs.LG cs.AI 版本更新

MultiPUFFIN: A Multimodal Domain-Constrained Foundation Model for Molecular Property Prediction of Small Molecules

MultiPUFFIN:用于小分子性质预测的多模态领域约束基础模型

Idelfonso B. R. Nogueira, Carine M. Rebello, Mumin Enis Leblebici, Erick Giovani Sperandio Nascimento

发表机构 * Department of Chemical Engineering, Norwegian University of Science and Technology (NTNU)(挪威科技大学化学工程系) Faculty of Industrial Engineering, KU Leuven(鲁汶大学工业工程学院) University of Surrey(萨里大学)

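AI summary: Proposes MultiPUFFIN, a multimodal foundation model that fuses SMILES, 2D graphs, 3D conformers, and experimental conditions; using condition-aware refinement and thermodynamically constrained heads, it predicts thermophysical properties of small molecules and outperforms ChemBERTa-2 with far fewer labeled samples.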

详情
AI中文摘要

MultiPUFFIN是一个领域信息多模态基础模型,用于预测小分子的热物理性质,填补了化学工程、药物发现和材料科学中的关键空白。现有的分子基础模型在数百万分子上预训练以学习通用表示,但其标准MLP输出层不施加物理约束,蒸汽压预测可能违反温度单调依赖性,粘度曲线可能缺乏过程模拟器所需的功能形式。保证热力学一致性的领域信息方法仍局限于单一性质和少量数据集,而多模态基础模型则侧重于生物活性而非热物理性质。MultiPUFFIN通过双向跨模态注意力和门控融合融合SMILES序列、2D分子图和3D构象几何,并辅以实验条件和分子描述符的辅助编码器,填补了这一空白。骨干网络使用三种互补的自监督目标在500,000个未标记的PubChem分子上预训练。一个条件感知精炼堆栈包含五个条件器(温度、pH、压力、多晶型和测量方法),将每个性质路由到一个四头锦标赛,选择该性质性能最佳的热力学信息头。MultiPUFFIN的平均测试R²为0.784,在所有九个性质上优于微调的ChemBERTa-2,尽管训练使用的标记分子数量少了约2000倍。

英文摘要

MultiPUFFIN is a domain-informed multimodal foundation model for predicting thermophysical properties of small molecules, addressing a critical gap in chemical engineering, drug discovery, and materials science. Existing molecular foundation models pretrain on millions of molecules to learn general-purpose representations, but their standard MLP output layers impose no physical constraints: vapor pressure predictions may violate monotonic temperature dependence, and viscosity curves may lack the functional form required by process simulators. Domain-informed approaches that guarantee thermodynamic consistency have remained limited to single properties and small datasets, whereas multimodal foundation models have focused on biological activity rather than thermophysical properties. MultiPUFFIN fills this gap by fusing SMILES sequences, 2D molecular graphs, and 3D conformer geometries through bidirectional cross-modal attention and gated fusion, supplemented by auxiliary encoders for experimental conditions and molecular descriptors. The backbone is pretrained on 500,000 unlabelled PubChem molecules using three complementary self-supervised objectives. A condition-aware refinement stack of five conditioners (temperature, pH, pressure, polymorph, and measurement method) routes each property to a four-head tournament that selects the best-performing thermodynamically informed head for that property. MultiPUFFIN achieves a mean test $R^2$ of 0.784 and outperforms fine-tuned ChemBERTa-2 on all nine properties despite training on roughly 2,000x fewer labeled molecules.

2602.21198 2026-05-26 cs.LG cs.AI cs.CL cs.CV cs.RO 版本更新

Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs

从试错中学习:具身大语言模型的反思式测试时规划

Yining Hong, Huang Huang, Manling Li, Li Fei-Fei, Leonidas Guibas, Jiajun Wu, Yejin Choi

发表机构 * Stanford University(斯坦福大学) Northwestern University(西北大学)

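AI summary: Proposes Reflective Test-Time Planning, which combines two modes, reflection-in-action and reflection-on-action, with retrospective reflection so that embodied agents self-correct and accumulate experience at test time, significantly improving long-horizon task performance.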

详情
AI中文摘要

具身大语言模型赋予机器人高级任务推理能力,但它们无法反思错误原因,导致部署成为一系列独立尝试,错误重复而非积累经验。借鉴人类反思实践,我们引入反思式测试时规划,整合两种反思模式:\textit{行动中反思},代理在行动前利用测试时扩展生成并评分多个候选行动,基于内部反思;以及\textit{行动后反思},利用测试时训练,根据执行后的外部反思更新内部反思模型和行动策略。我们还包含回溯性反思,允许代理重新评估早期决策,并利用后见之明进行模型更新,实现适当的长程信用分配。在我们新设计的Long-Horizon Household基准和MuJoCo Cupboard Fitting基准上的实验表明,与基线模型相比有显著提升,并能零样本泛化到逼真的HM3D环境以及在Franka Panda机械臂上的真实机器人实验。消融实验证实,行动中反思和行动后反思相互依赖,且回溯性反思在较低计算开销下比逐步外部反馈实现更好的信用分配。定性分析进一步突出了通过反思进行的行为纠正。

英文摘要

Embodied LLMs endow robots with high-level task reasoning, but they cannot reflect on what went wrong or why, turning deployment into a sequence of independent trials where mistakes repeat rather than accumulate into experience. Drawing upon human reflective practitioners, we introduce Reflective Test-Time Planning, which integrates two modes of reflection: \textit{reflection-in-action}, where the agent uses test-time scaling to generate and score multiple candidate actions using internal reflections before execution; and \textit{reflection-on-action}, which uses test-time training to update both its internal reflection model and its action policy based on external reflections after execution. We also include retrospective reflection, allowing the agent to re-evaluate earlier decisions and perform model updates with hindsight for proper long-horizon credit assignment. Experiments on our newly-designed Long-Horizon Household benchmark and MuJoCo Cupboard Fitting benchmark show significant gains over baseline models, with zero-shot generalization to photorealistic HM3D environments and real-robot experiments on a Franka Panda arm. Ablations confirm that reflection-in-action and reflection-on-action are mutually dependent, and that retrospective reflection achieves better credit assignment than step-wise external feedback at lower computational overhead. Qualitative analyses further highlight behavioral correction through reflection.

2602.20210 2026-05-26 cs.LG cs.AI 版本更新

Multimodal Crystal Flow: Any-to-Any Modality Generation for Unified Crystal Modeling

多模态晶体流:面向统一晶体建模的任意模态生成

Kiyoung Seong, Sungsoo Ahn, Sehui Han, Changyoung Park

发表机构 * Graduate School of AI, KAIST, Seoul, South Korea(韩国科学技术院人工智能研究生院,首尔,韩国) Materials Intelligence Lab, LG AI Research, Seoul, South Korea(LG AI研究所材料智能实验室,首尔,韩国)

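AI summary: Proposes Multimodal Crystal Flow (MCFlow), a unified multimodal flow model that realizes multiple crystal generation tasks via independent time variables for atom types and crystal structures, and is competitive with task-specific baselines on the MP-20 and MPTS-52 benchmarks.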

详情
AI中文摘要

晶体建模涵盖一系列条件和非条件生成任务,包括晶体结构预测(CSP)和从头生成(DNG)。尽管最近的深度生成模型表现出有前景的性能,但它们仍然主要是任务特定的,缺乏跨任务共享晶体表示的统一框架。为了解决这一限制,我们提出了多模态晶体流(MCFlow),一种统一的多模态流模型,通过原子类型和晶体结构的独立时间变量将多种晶体生成任务实现为不同的推理轨迹。为了在标准Transformer模型中实现多模态流,我们引入了一种具有层次排列增强的组合和对称感知原子排序,无需显式结构模板即可注入组合和晶体学先验。在MP-20和MPTS-52基准上的实验表明,单个MCFlow模型在CSP、DNG和结构条件原子类型生成方面与任务特定基线具有竞争力。

英文摘要

Crystal modeling spans a family of conditional and unconditional generation tasks, including crystal structure prediction (CSP) and de novo generation (DNG). While recent deep generative models have shown promising performance, they remain largely task-specific, lacking a unified framework that shares crystal representations across tasks. To address this limitation, we propose Multimodal Crystal Flow (MCFlow), a unified multimodal flow model that realizes multiple crystal generation tasks as distinct inference trajectories via independent time variables for atom types and crystal structures. To enable multimodal flow in a standard transformer model, we introduce a composition- and symmetry-aware atom ordering with hierarchical permutation augmentation, injecting compositional and crystallographic priors without explicit structural templates. Experiments on the MP-20 and MPTS-52 benchmarks show that a single MCFlow model is competitive with task-specific baselines across CSP, DNG, and structure-conditioned atom type generation.

2602.17658 2026-05-26 cs.LG cs.AI cs.IT math.IT 版本更新

MARS: Margin and Semantic-Aware Data Augmentation for Reward Modeling

MARS:面向奖励建模的边界与语义感知数据增强

Payel Bhattacharjee, Osvaldo Simeone, Ravi Tandon

发表机构 * University of Arizona(亚利桑那大学) Northeastern University London(伦敦东北大学)

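AI summary: Proposes the MARS framework, which improves reward-model quality and alignment performance by prioritizing augmentation of low-margin preference pairs and refining with semantic distance.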

详情
AI中文摘要

奖励建模是RLHF、RLAIF和基于PPO的策略优化等对齐流程的核心,但其可靠性受限于有限且异构的人类偏好数据,这些数据难以大规模收集。虽然合成增强可以扩展偏好监督,但现有方法通常均匀增强或在表示层面增强,而不针对奖励模型不确定或容易误排序的示例。在本文中,我们介绍了MARS(面向奖励建模的边界与语义感知数据增强),一种自适应增强框架,优先考虑低边界偏好对,并使用语义距离作为第二层细化,以增强选择响应和拒绝响应之间的对比。在多个偏好数据集、奖励模型骨干、下游对齐设置以及包括RewardBench和AlpacaEval在内的基准测试中,MARS在奖励模型质量和对齐性能上都优于现有基线。我们的结果表明,当同时由模型边界和语义结构引导时,奖励模型增强最为有效。

英文摘要

Reward modeling is central to alignment pipelines such as RLHF, RLAIF, and PPO-based policy optimization, yet its reliability is constrained by limited and heterogeneous human preference data that are expensive to collect at scale. While synthetic augmentation can expand preference supervision, existing methods often augment uniformly or at the representation level, without targeting examples where the reward model is uncertain or prone to mis-ranking. In this paper, we introduce MARS (Margin and Semantic-Aware Data Augmentation for Reward Modeling), an adaptive augmentation framework that prioritizes low-margin preference pairs and uses semantic distance as a second layer for refinement to enhance the contrast between the chosen and rejected responses. Across multiple preference datasets, reward-model backbones, downstream alignment settings, and benchmarks including RewardBench and AlpacaEval, MARS improves both reward-model quality and alignment performance over existing baselines. Our results show that reward-model augmentation is most effective when guided by both model margins and semantic structure.
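
The first stage described here, prioritizing low-margin preference pairs, can be sketched as a simple selection step. This is an illustrative reading of the abstract rather than the MARS implementation: the `(prompt, chosen, rejected)` tuple layout, the `reward_fn` callable, and the fixed `budget` are assumptions, and the semantic-distance refinement is omitted.

```python
def select_low_margin_pairs(pairs, reward_fn, budget):
    """Pick the `budget` preference pairs with the smallest reward margin.

    Each pair is (prompt, chosen, rejected); the margin is
    reward(chosen) - reward(rejected), so small or negative margins mark
    pairs the reward model is unsure about or mis-ranks.
    """
    scored = []
    for prompt, chosen, rejected in pairs:
        margin = reward_fn(prompt, chosen) - reward_fn(prompt, rejected)
        scored.append((margin, (prompt, chosen, rejected)))
    scored.sort(key=lambda item: item[0])  # lowest margin first
    return [pair for _, pair in scored[:budget]]
```

The selected pairs would then be the ones passed to whatever augmentation procedure is in use, so that synthetic data concentrates where the reward model is weakest.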

2602.17234 2026-05-26 cs.AI cs.LG 版本更新

All Leaks Count, Some Count More: Interpretable Temporal Contamination Detection and Mitigation in LLM Backtesting

所有泄漏都重要,有些泄漏更重要:LLM回测中可解释的时间污染检测与缓解

Zeyu Zhang, Ryan Chen, Bradly C. Stadie

发表机构 * Department of Statistics and Data Science, Northwestern University(统计与数据科学系,西北大学) Bridgewater AIA Labs(布里奇沃特AIA实验室)

AI总结 提出基于Shapley值的声明级评估框架Shapley-DCLR和推理时架构TimeSPEC,用于检测和缓解LLM回测中的时间污染问题。

Comments 8 pages plus appendix

详情
AI中文摘要

在已解决事件上对LLM进行回测时,通常假设模型仅基于截止前知识进行推理,然而预训练模型不可避免地会泄漏截止后知识。我们引入了一个声明级评估框架,将预测理由分解为原子声明,并应用Shapley值量化每个声明对决策的影响,从而得到Shapley-DCLR(Shapley加权的决策关键泄漏率),这是一个可解释的度量,用于衡量驱动决策的推理中被污染的比例。我们进一步提出TimeSPEC(基于提取声明的时间监督预测),一种推理时架构,它将时间过滤的检索与声明级监督交织在一起,生成完全基于截止前证据的预测。在三个LLM上的消融实验证实检索和监督缺一不可;一项涵盖三个任务的探测实验进一步说明,时间强制的性能成本与每个任务对截止后信息的依赖程度成正比。

英文摘要

Backtesting LLMs on resolved events assumes models reason only from pre-cutoff knowledge, yet pretrained models inevitably leak post-cutoff knowledge. We introduce a claim-level evaluation framework that decomposes prediction rationales into atomic claims and applies Shapley values to quantify each claim's decision impact, yielding Shapley-DCLR (Shapley-weighted Decision-Critical Leakage Rate), an interpretable metric measuring what fraction of decision-driving reasoning is contaminated. We further propose TimeSPEC (Time-Supervised Prediction with Extracted Claims), an inference-time architecture that interleaves temporally filtered retrieval with claim-level supervision, producing predictions grounded entirely in pre-cutoff evidence. Across three LLMs, ablation experiments confirm that retrieval and supervision are jointly necessary, and a three-task probe further illustrates that the performance cost of temporal enforcement scales with each task's reliance on post-cutoff information.
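To make the idea of Shapley-weighting claims concrete, here is a minimal sketch that computes exact Shapley values for a small set of claims and then takes the share of total attribution carried by claims flagged as leaked. The value function, the `leaked` flags, and the use of absolute attributions in the ratio are illustrative assumptions; the abstract does not give the paper's exact definition of Shapley-DCLR.

```python
from itertools import combinations
from math import factorial

def shapley_values(n, value):
    """Exact Shapley values for n claims; value maps a set of claim indices to a score."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Standard Shapley weight for coalitions of size k that exclude claim i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def weighted_leakage_rate(phi, leaked):
    """Assumed metric: share of absolute attribution carried by leaked claims."""
    total = sum(abs(p) for p in phi)
    if total == 0:
        return 0.0
    return sum(abs(p) for p, flag in zip(phi, leaked) if flag) / total

# Toy additive value function: each claim adds a fixed amount to the decision score.
contributions = [0.5, 0.3, 0.2]
phi = shapley_values(3, lambda s: sum(contributions[i] for i in s))
rate = weighted_leakage_rate(phi, leaked=[True, False, False])
```

Exact enumeration is exponential in the number of claims, so a sketch like this is only practical for a handful of claims per rationale.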

2602.16229 2026-05-26 cs.LG 版本更新

Factored Latent Action World Models

因子化潜在动作世界模型

Zizhao Wang, Chang Shi, Jiaheng Hu, Kevin Rohling, Roberto Martín-Martín, Amy Zhang, Peter Stone

发表机构 * University of Texas at Austin(德克萨斯大学奥斯汀分校)

AI总结 提出因子化潜在动作模型(FLAM),通过将场景分解为独立因子并学习各自的潜在动作,提升了无动作视频中多实体动态建模的准确性和视频生成质量。

详情
AI中文摘要

从无动作视频中学习潜在动作已成为扩展可控世界模型学习的强大范式。潜在动作为用户迭代生成和操作视频提供了自然接口。然而,大多数现有方法依赖整体逆动态和正动态模型,学习单一潜在动作来控制整个场景,因此在多个实体同时行动的复杂环境中表现不佳。本文引入因子化潜在动作模型(FLAM),一种因子化动态框架,将场景分解为独立因子,每个因子推断自己的潜在动作并预测自己的下一步因子值。与整体模型相比,这种因子化结构能够更准确地建模复杂多实体动态,并提高无动作视频设置中的视频生成质量。基于模拟和真实世界多实体数据集的实验,我们发现FLAM在预测准确性和表示质量方面优于先前工作,并促进了下游策略学习,展示了因子化潜在动作模型的优势。

英文摘要

Learning latent actions from action-free video has emerged as a powerful paradigm for scaling up controllable world model learning. Latent actions provide a natural interface for users to iteratively generate and manipulate videos. However, most existing approaches rely on monolithic inverse and forward dynamics models that learn a single latent action to control the entire scene, and therefore struggle in complex environments where multiple entities act simultaneously. This paper introduces Factored Latent Action Model (FLAM), a factored dynamics framework that decomposes the scene into independent factors, each inferring its own latent action and predicting its own next-step factor value. This factorized structure enables more accurate modeling of complex multi-entity dynamics and improves video generation quality in action-free video settings compared to monolithic models. Based on experiments on both simulation and real-world multi-entity datasets, we find that FLAM outperforms prior work in prediction accuracy and representation quality, and facilitates downstream policy learning, demonstrating the benefits of factorized latent action models.

2602.11534 2026-05-26 cs.LG cs.AI 版本更新

Krause Synchronization Transformers

Krause同步变换器

Jingkun Liu, Yisong Yue, Max Welling, Yue Song

发表机构 * Shanghai Qi Zhi Institute(上海启智研究院) College of AI, Tsinghua University(清华大学人工智能学院) Shanghai Jiao Tong University(上海交通大学) California Institute of Technology(加州理工学院) University of Amsterdam(阿姆斯特丹大学)

AI总结 提出基于有界置信共识动力学的Krause注意力机制,通过局部化稀疏交互替代全局softmax归一化,缓解表示坍缩和注意力汇聚现象,实现线性复杂度并提升性能。

Comments ICML 2026, Project page: https://jingkun-liu.github.io/krause-sync-transformers/

详情
AI中文摘要

Transformer中的自注意力依赖于全局归一化的softmax权重,导致所有token在每一层竞争影响力。当跨深度组合时,这种交互模式会诱导强同步动力学,倾向于收敛到主导模式,这种行为与表示坍缩和注意力汇聚现象相关。我们引入了Krause注意力,一种受有界置信共识动力学启发的原则性注意力机制。Krause注意力将基于相似性的全局聚合替换为基于距离的、局部化的、选择性稀疏的交互,促进结构化的局部同步而非全局混合。我们将这种行为与最近将Transformer动力学建模为相互作用粒子系统的理论联系起来,并展示有界置信交互如何自然地调节注意力集中并缓解注意力汇聚。将交互限制在局部邻域还将运行时复杂度从序列长度的二次方降低到线性。实验上,我们在多种设置中验证了Krause注意力,包括视觉(CIFAR/ImageNet上的ViT)、自回归图像生成(MNIST/CIFAR-10)、大语言模型(Llama/Qwen)以及从零开始训练的多种规模(100M/200M)的语言模型。在这些领域中,Krause注意力在提高计算效率的同时实现了持续的性能提升,突显了有界置信动力学作为注意力的一种可扩展且有效的归纳偏置。

英文摘要

Self-attention in Transformers relies on globally normalized softmax weights, causing all tokens to compete for influence at every layer. When composed across depth, this interaction pattern induces strong synchronization dynamics that favor convergence toward a dominant mode, a behavior associated with representation collapse and attention sink phenomena. We introduce Krause Attention, a principled attention mechanism inspired by bounded-confidence consensus dynamics. Krause Attention replaces similarity-based global aggregation with distance-based, localized, and selectively sparse interactions, promoting structured local synchronization instead of global mixing. We relate this behavior to recent theory modeling Transformer dynamics as interacting particle systems, and show how bounded-confidence interactions naturally moderate attention concentration and alleviate attention sinks. Restricting interactions to local neighborhoods also reduces runtime complexity from quadratic to linear in sequence length. Empirically, we validate Krause Attention across diverse settings, including vision (ViT on CIFAR/ImageNet), autoregressive image generation (MNIST/CIFAR-10), large language models (Llama/Qwen), and language models trained from scratch at multiple scales (100M/200M). Across these domains, Krause Attention achieves consistent performance gains while improving computational efficiency, highlighting bounded-confidence dynamics as a scalable and effective inductive bias for attention.
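The bounded-confidence dynamics that inspire Krause Attention can be illustrated with the classic Hegselmann-Krause update, in which each point moves to the mean of the points within a fixed radius of itself. The sketch below is that textbook update in plain Python, not the paper's attention layer: it is a dense, quadratic-time loop, and the function name and `radius` parameter are chosen here for illustration.

```python
import math

def bounded_confidence_step(points, radius):
    """One Hegselmann-Krause step: average each point with its neighbors within radius."""
    updated = []
    for p in points:
        # A point is always its own neighbor (distance 0), so the list is never empty.
        neighbors = [q for q in points if math.dist(p, q) <= radius]
        dim = len(p)
        updated.append([sum(q[d] for q in neighbors) / len(neighbors) for d in range(dim)])
    return updated

# Two nearby tokens synchronize locally; the distant token is left untouched.
tokens = [[0.0], [0.1], [5.0]]
next_tokens = bounded_confidence_step(tokens, radius=1.0)
```

In contrast with a global softmax, the distant point contributes nothing to the update of the other two, which is the local-synchronization behavior the abstract describes.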

2602.11439 2026-05-26 cs.LG 版本更新

Multi-Level Strategic Classification: Incentivizing Improvement through Promotion and Relegation Dynamics

多层级策略分类:通过晋升与降级动态激励改进

Ziyuan Huang, Lina Alkarmi, Mingyan Liu

发表机构 * Electrical and Computer Engineering Department, University of Michigan, Ann Arbor, MI 48109, USA(密歇根大学电气与计算机工程系,安阿伯,MI 48109,美国)

AI总结 本文提出一种多层级晋升-降级框架,通过设计分类器阈值和难度递进来激励代理人诚实努力,并证明在温和条件下代理人可通过真实改进达到任意高水平。

Comments 9 pages, 4 figures, Accepted at ICML 2026

详情
AI中文摘要

策略分类研究自私个体或代理人操纵其响应以获得分类器有利决策结果的问题,通常当虚假行为成本低于真实努力时,他们会采取不诚实行为。虽然现有关于序列策略分类的研究主要关注优化动态分类器权重,但我们偏离这些以权重为中心的方法,分析了多层级晋升-降级框架中分类器阈值和难度递进的设计。我们的模型捕捉了由代理人的远见、技能保留以及资格与成就可自我强化的“助力效应”驱动的关键跨期激励。我们刻画了代理人的最优长期策略,并证明委托人可以设计一系列阈值来有效激励诚实努力。关键地,我们证明在温和条件下,该机制使代理人能够仅通过真实改进努力达到任意高水平。

英文摘要

Strategic classification studies the problem where self-interested individuals or agents manipulate their response to obtain favorable decision outcomes made by classifiers, typically turning to dishonest actions when they are less costly than genuine efforts. While existing studies on sequential strategic classification primarily focus on optimizing dynamic classifier weights, we depart from these weight-centric approaches by analyzing the design of classifier thresholds and difficulty progression within a multi-level promotion-relegation framework. Our model captures the critical inter-temporal incentives driven by an agent's farsightedness, skill retention, and a leg-up effect where qualification and attainment can be self-reinforcing. We characterize the agent's optimal long-term strategy and demonstrate that a principal can design a sequence of thresholds to effectively incentivize honest effort. Crucially, we prove that under mild conditions, this mechanism enables agents to reach arbitrarily high levels solely through genuine improvement efforts.

2602.08499 2026-05-26 cs.LG cs.AI 版本更新

Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards

上下文展开赌博机:面向可验证奖励的强化学习

Xiaodong Lu, Xiaohan Wang, Jiajun Chai, Guojun Yin, Wei Lin, Zhijun Chen, Yu Luo, Fuzhen Zhuang, Yikun Ban, Deqing Wang

发表机构 * School of Computer Science and Engineering, Beihang University(北京航空航天大学计算机科学与工程学院) School of Artificial Intelligence, Beihang University(北京航空航天大学人工智能学院) Huawei(华为)

AI总结 针对RLVR中展开使用无差别、短视导致的问题,提出上下文赌博机框架,自适应选择高价值展开,提升训练效率与性能。

详情
AI中文摘要

可验证奖励的强化学习(RLVR)是提升大型语言模型推理能力的有效范式。然而,现有RLVR方法以无差别和短视的方式使用展开:每个提示内不同质量的响应被统一对待,且历史展开在单次使用后被丢弃。这导致监督噪声大、样本效率低以及策略更新次优。我们通过将RLVR中的展开调度形式化为上下文赌博机问题,并提出一个统一的神经调度框架来解决这些问题,该框架在整个训练过程中自适应地选择高价值展开。每个展开被视为一个臂,其奖励由连续优化步骤之间诱导的性能增益定义。由此产生的调度器支持噪声感知的组内选择和历史展开的自适应全局重用,所有这些都在一个统一的原则性框架内。我们通过推导次线性遗憾界并证明扩大展开缓冲区可改善可实现性能上限,提供了理论依据。在六个数学推理基准上的实验表明,在多种RLVR优化方法中,性能和训练效率均有一致的提升。

英文摘要

Reinforcement Learning with Verifiable Rewards (RLVR) is an effective paradigm for improving the reasoning capabilities of large language models. However, existing RLVR methods utilize rollouts in an indiscriminate and short-horizon manner: responses of heterogeneous quality within each prompt are treated uniformly, and historical rollouts are discarded after a single use. This leads to noisy supervision, poor sample efficiency, and suboptimal policy updates. We address these issues by formulating rollout scheduling in RLVR as a contextual bandit problem and proposing a unified neural scheduling framework that adaptively selects high-value rollouts throughout training. Each rollout is treated as an arm whose reward is defined by the induced performance gain between consecutive optimization steps. The resulting scheduler supports both noise-aware intra-group selection and adaptive global reuse of historical rollouts within a single principled framework. We provide theoretical justification by deriving sublinear regret bounds and showing that enlarging the rollout buffer improves the achievable performance upper bound. Experiments on six mathematical reasoning benchmarks demonstrate consistent gains in performance and training efficiency across multiple RLVR optimization methods.

2602.07518 2026-05-26 cs.ET cs.AR cs.LG nlin.AO 版本更新

Physical Analogue Kolmogorov-Arnold Networks based on Reconfigurable Nonlinear-Processing Units

基于可重构非线性处理单元的物理模拟Kolmogorov-Arnold网络

Manuel Escudero, Mohamadreza Zolfagharinejad, Sjoerd van den Belt, Nikolaos Alachiotis, Wilfred G. van der Wiel

发表机构 * NanoElectronics Group, MESA+ Institute and BRAINS Center for Brain-Inspired Computing, University of Twente(特文特大学纳米电子组、MESA+研究所与BRAINS脑启发计算中心) CAES Group and BRAINS Center for Brain-Inspired Computing, University of Twente(特文特大学CAES组与BRAINS脑启发计算中心)

AI总结 提出一种基于可重构非线性处理单元(RNPU)的物理模拟KAN架构,通过硬件实现可编程非线性变换,在回归和分类任务中以更少参数达到与MLP相当的精度,并实现约10²-10³倍能效提升和约10倍面积缩减。

详情
AI中文摘要

Kolmogorov-Arnold网络(KAN)将神经计算从线性层转移到可学习的非线性边函数,但在硬件中高效实现这些非线性仍是一个开放挑战。本文介绍了一种物理模拟KAN架构,其中边函数通过可重构非线性处理单元(RNPU)在材料中实现:RNPU是多端纳米级硅器件,其输入输出特性通过控制电压调谐。通过将多个RNPU组合成边处理器,并将这些模块组装成具有集成混合信号接口的可重构模拟KAN(aKAN)架构,我们建立了一个现实的系统级硬件实现,能够以可编程非线性变换实现紧凑的KAN式回归和分类。使用实验校准的RNPU模型和硬件测量,我们展示了在任务复杂度增加时准确函数逼近的能力,同时所需可训练参数少于或相当于多层感知器(MLP)。系统级估计表明,对于代表性工作负载,每次推理能耗约为250 pJ,端到端推理延迟约为600 ns,与具有相似逼近误差的数字定点MLP相比,能耗降低约10²-10³倍,面积减少约10倍。这些结果确立了RNPU作为可扩展的硬件原生非线性计算基元,并指出模拟KAN架构是实现能量、延迟和面积高效的模拟神经网络硬件的现实硅基路径,特别适用于边缘推理。

英文摘要

Kolmogorov-Arnold Networks (KANs) shift neural computation from linear layers to learnable nonlinear edge functions, but implementing these nonlinearities efficiently in hardware remains an open challenge. Here we introduce a physical analogue KAN architecture in which edge functions are realized in materia using reconfigurable nonlinear-processing units (RNPUs): multi-terminal nanoscale silicon devices whose input-output characteristics are tuned via control voltages. By combining multiple RNPUs into an edge processor and assembling these blocks into a reconfigurable analogue KAN (aKAN) architecture with integrated mixed-signal interfacing, we establish a realistic system-level hardware implementation that enables compact KAN-style regression and classification with programmable nonlinear transformations. Using experimentally calibrated RNPU models and hardware measurements, we demonstrate accurate function approximation across increasing task complexity while requiring fewer or comparable trainable parameters than multilayer perceptrons (MLPs). System-level estimates indicate an energy per inference of $\sim$250 pJ and an end-to-end inference latency of $\sim$600 ns for a representative workload, corresponding to a $\sim$10$^{2}$-10$^{3}\times$ reduction in energy accompanied by a $\sim$10$\times$ reduction in area compared to a digital fixed-point MLP at similar approximation error. These results establish RNPUs as scalable, hardware-native nonlinear computing primitives and identify analogue KAN architectures as a realistic silicon-based pathway toward energy-, latency-, and footprint-efficient analogue neural-network hardware, particularly for edge inference.

2602.06717 2026-05-26 cs.LG cs.AI 版本更新

F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare

F-GRPO: 别让你的策略学到显而易见的而忘记罕见的

Daniil Plyusov, Alexey Gorbatovski, Boris Shaposhnikov, Viacheslav Sinii, Alexey Malakhov, Daria Korotyshova, Daniil Gavrilov

发表机构 * T-Tech

AI总结 针对强化学习中有限采样组导致罕见正确轨迹被忽略的问题,提出基于Focal loss的难度感知缩放系数F-GRPO,在不增加组大小和计算成本下提升数学推理性能。

详情
AI中文摘要

基于可验证奖励的强化学习通常依赖组采样来估计优势并稳定策略更新。实践中,计算限制往往排除非常大的组,因此训练使用有限的rollout集合,这些集合只能强化它们暴露的正确行为。在实际组大小下,更新可能会遗漏罕见的正确轨迹,同时仍然包含混合奖励,将概率集中在更常见的采样解上。我们推导了这种提示局部尾部遗漏事件作为组大小函数的概率,展示了非单调行为,并在分类抽象中描述了未采样的正确质量如何在总正确质量增长时缩小。受此分析启发,我们提出了一种难度感知缩放系数,灵感来自Focal loss,它降低了高成功采样组的更新权重。经验上,分类模拟在分类设置中展示了相同效果,Maze提供了单解测试,LLM实验包括代表性的GRPO组大小扫描以及GRPO、DAPO和CISPO之间的固定N迁移。在Qwen2.5-7B上,N=8时,我们的方法将平均数学pass@256从64.1提高到70.3(GRPO),69.3提高到72.5(DAPO),73.2提高到76.8(CISPO);在所有三种情况下,OOD pass@256也得到改善,且不增加组大小或计算成本。

英文摘要

Reinforcement Learning with Verifiable Rewards (RLVR) is commonly based on group sampling to estimate advantages and stabilize policy updates. In practice, computational limits often rule out very large groups, so training proceeds with finite rollout sets that can reinforce only the correct behavior they expose. At practical group sizes, updates can miss rare-correct trajectories while still containing mixed rewards, concentrating probability on more common sampled solutions. We derive the probability of such prompt-local tail-miss events as a function of group size, showing non-monotonic behavior, and in the categorical abstraction characterize how unsampled-correct mass can shrink even as total correct mass grows. Motivated by this analysis, we propose a difficulty-aware scaling coefficient, inspired by Focal loss, that down-weights updates on high-success sampled groups. Empirically, categorical simulation illustrates the same effect in the categorical setting, Maze provides a single-solution test, and LLM experiments include a representative GRPO group-size sweep together with fixed-$N$ transfer across GRPO, DAPO, and CISPO. On Qwen2.5-7B at $N{=}8$, our method improves average math pass@256 from 64.1 $\rightarrow$ 70.3 (GRPO), 69.3 $\rightarrow$ 72.5 (DAPO), and 73.2 $\rightarrow$ 76.8 (CISPO); OOD pass@256 also improves in all three cases, without increasing group size or computational cost.
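A minimal sketch of a Focal-loss-style, difficulty-aware coefficient applied to group-normalized advantages is given below. The specific form `(1 - success_rate) ** gamma`, the `gamma` and `eps` parameters, and the function name are assumptions for illustration; the abstract only says that the coefficient down-weights updates on high-success groups and does not give its formula.

```python
def focal_scaled_advantages(rewards, gamma=1.0, eps=1e-6):
    """Group-normalized advantages scaled by an assumed focal-style coefficient.

    rewards: binary (0/1) verifiable rewards for one prompt's sampled group.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    success_rate = mean  # for binary rewards the mean is the group success rate
    scale = (1.0 - success_rate) ** gamma  # small when most samples already succeed
    return [scale * (r - mean) / (std + eps) for r in rewards]

easy_group = focal_scaled_advantages([1, 1, 1, 0])  # high success: damped update
hard_group = focal_scaled_advantages([0, 0, 0, 1])  # low success: larger update
```

With this choice the group size is unchanged and only the per-group update magnitude is rescaled, which matches the abstract's claim of no added sampling cost.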

2602.05052 2026-05-26 cs.LG 版本更新

Learning, Solving and Optimizing PDEs with TensorGalerkin: an efficient high-performance Galerkin assembly algorithm

使用TensorGalerkin学习、求解和优化PDE:一种高效的高性能Galerkin组装算法

Shizheng Wen, Mingyuan Chi, Tianwei Yu, Ben Moseley, Mike Yan Michelis, Pu Ren, Hao Sun, Siddhartha Mishra

发表机构 * ETH Zurich, Switzerland(苏黎世联邦理工学院,瑞士) Imperial College London, UK(伦敦帝国学院,英国) Northeastern University, USA(东北大学,美国) Renmin University of China, China(中国人民大学,中国)

AI总结 提出基于Galerkin离散化的统一算法框架,通过张量化元素操作和稀疏矩阵乘法实现O(1)图规模的系统组装,高效求解、约束优化和物理信息学习变分PDE。

详情
AI中文摘要

我们提出了一个统一的算法框架,用于具有变分结构的PDE的数值求解、约束优化和物理信息学习。该框架基于底层变分形式的Galerkin离散化,其高效率源于一种新颖的、高度优化且兼容GPU的TensorGalerkin框架,用于线性系统组装(刚度矩阵和载荷向量)。TensorGalerkin先在Python级Map阶段将逐元素操作张量化,然后通过稀疏矩阵乘法进行全局归约,该乘法在网格诱导的稀疏图上执行消息传递。Map和Reduce阶段在PyTorch的autograd内部协同设计,使得组装图包含O(1)个节点,无论元素数量和局部自由度如何缩放。我们通过将TensorGalerkin部署为i)高效的数值PDE求解器,ii)用于PDE约束优化的端到端可微框架,以及iii)用于PDE的物理信息算子学习算法,验证了这种O(1)图属性。通过多个基准测试,包括非结构化网格上的2D和3D椭圆型、抛物型和双曲型PDE,我们证明了所提出的框架在所有目标下游应用中相比各种基线提供了显著的计算效率和精度提升。

英文摘要

We present a unified algorithmic framework for the numerical solution, constrained optimization, and physics-informed learning of PDEs with a variational structure. Our framework is based on a Galerkin discretization of the underlying variational forms, and its high efficiency stems from a novel, highly optimized, and GPU-compliant TensorGalerkin framework for linear system assembly (stiffness matrices and load vectors). TensorGalerkin operates by tensorizing element-wise operations within a Python-level Map stage and then performing a global reduction with a sparse matrix multiplication that carries out message passing on the mesh-induced sparsity graph. The Map and Reduce stages are co-designed inside PyTorch's autograd so that the assembly graph contains $O(1)$ nodes regardless of how the number of elements and local DoFs scale. We validate this $O(1)$-graph property by deploying TensorGalerkin downstream as i) a highly efficient numerical PDE solver, ii) an end-to-end differentiable framework for PDE-constrained optimization, and iii) a physics-informed operator learning algorithm for PDEs. With multiple benchmarks, including 2D and 3D elliptic, parabolic, and hyperbolic PDEs on unstructured meshes, we demonstrate that the proposed framework provides significant computational efficiency and accuracy gains over a variety of baselines in all the targeted downstream applications.
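The Map-then-Reduce structure of Galerkin assembly can be sketched on the simplest case, a 1D Poisson stiffness matrix with piecewise-linear elements. This plain-Python version only illustrates the two stages (compute every local element matrix, then scatter-add into the global matrix); it is not TensorGalerkin's batched PyTorch implementation, and the function name is hypothetical.

```python
def assemble_stiffness_1d(nodes):
    """Assemble the global stiffness matrix for -u'' on a 1D mesh with linear elements."""
    n = len(nodes)

    # Map stage: one 2x2 local stiffness matrix per element, (1/h) * [[1, -1], [-1, 1]].
    local_matrices = []
    for left, right in zip(nodes[:-1], nodes[1:]):
        h = right - left
        local_matrices.append([[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]])

    # Reduce stage: scatter-add each local matrix into the global matrix.
    stiffness = [[0.0] * n for _ in range(n)]
    for element, local in enumerate(local_matrices):
        dofs = (element, element + 1)  # global indices of the element's two nodes
        for i in range(2):
            for j in range(2):
                stiffness[dofs[i]][dofs[j]] += local[i][j]
    return stiffness

K = assemble_stiffness_1d([0.0, 0.5, 1.0])
```

In a tensorized setting the Map stage becomes a single batched operation over all elements and the Reduce stage a single sparse product, which is what keeps the autograd graph size independent of the mesh.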

2602.04139 2026-05-26 cs.LG physics.comp-ph 版本更新

Generative Neural Operators through Diffusion Last Layer

通过扩散最后一层的生成式神经算子

Sungwon Park, Anthony Zhou, Hongjoong Kim, Amir Barati Farimani

发表机构 * Korea University, Seoul, South Korea(高丽大学,首尔,韩国) Carnegie Mellon University, Pittsburgh, USA(卡内基梅隆大学,匹兹堡,美国)

AI总结 提出扩散最后一层(DLL)作为神经算子的概率输出头,通过Karhunen-Loève展开和系数空间的条件扩散模型实现高效分布建模,在随机PDE基准和确定性长时滚动任务中提升了分布保真度和不确定性估计。

Comments ICML 2026, code is available at https://github.com/sungwpark/dll-no

详情
AI中文摘要

神经算子为学习函数空间之间的离散化不变映射提供了强大框架,但标准确定性模型无法捕捉预测不确定性。我们引入了扩散最后一层(DLL),一种用于神经算子主干的模块化概率输出头。DLL通过受Karhunen-Loève展开启发的输入依赖低秩展开表示目标场,并在相应系数空间上学习条件扩散模型。这种设计使得在保留算子学习结构优势的同时实现高效的分布建模。在具有随机强迫的随机PDE基准测试中,DLL实现了强分布保真度,并与像素空间和传统潜在扩散基线竞争。在确定性长时滚动任务中,DLL提高了底层主干的滚动稳定性,并在复合自回归误差下提供了有用的预测不确定性估计。这些结果表明,在学习到的系数空间中进行扩散建模为不确定性感知神经算子提供了一条实用途径。

英文摘要

Neural operators provide a powerful framework for learning discretization-invariant mappings between function spaces, but standard deterministic models do not capture predictive uncertainty. We introduce diffusion last layer (DLL), a modular probabilistic output head for neural operator backbones. DLL represents target fields through an input-dependent low-rank expansion inspired by the Karhunen-Loève expansion and learns a conditional diffusion model over the corresponding coefficient space. This design enables efficient distributional modeling while preserving the structural advantages of operator learning. On stochastic PDE benchmarks with random forcing, DLL achieves strong distributional fidelity and performs competitively with pixel-space and conventional latent diffusion baselines. In deterministic long-horizon rollout tasks, DLL improves rollout stability over the underlying backbone and provides useful estimates of predictive uncertainty under compounding autoregressive errors. These results suggest that diffusion modeling in learned coefficient spaces offers a practical route to uncertainty-aware neural operators.

2602.04120 2026-05-26 cs.LG cs.AI cs.DC cs.SE 版本更新

Scalable Explainability-as-a-Service (XaaS) for Edge AI Systems

面向边缘AI系统的可扩展可解释性即服务(XaaS)

Samaresh Kumar Singh, Joyjit Roy

AI总结 提出可解释性即服务(XaaS)分布式架构,通过解耦推理与解释生成、语义缓存、轻量验证和自适应引擎,在边缘设备上实现低延迟、高保真的可解释性,并在三个实际用例中降低38%延迟。

Comments 8 pages, 5 figures, 2 tables. This version updates metadata after publication in IEEE Xplore and publication by SoutheastCon 2026

详情
Journal ref
2026 IEEE SoutheastCon, Huntsville, AL, USA, 2026
AI中文摘要

尽管可解释人工智能(XAI)取得了显著进展,但其在边缘和物联网系统中的集成通常是临时且低效的。当前大多数方法以“耦合”方式运行,即解释生成与模型推理同时进行。因此,这些方法在异构边缘设备上部署时会产生冗余计算、高延迟和可扩展性差的问题。本文提出可解释性即服务(XaaS),一种将可解释性视为一等系统服务(而非模型特定功能)的分布式架构。我们提出的XaaS架构的关键创新在于解耦推理与解释生成,使边缘设备能够在资源和延迟约束下请求、缓存和验证解释。为此,我们引入三项主要创新:(1)基于语义相似性的分布式解释缓存检索方法,显著减少冗余计算;(2)轻量验证协议,确保缓存和新生成解释的保真度;(3)自适应解释引擎,根据设备能力和用户需求选择解释方法。我们在三个实际边缘AI用例上评估了XaaS的性能:(i)制造质量控制;(ii)自动驾驶车辆感知;(iii)医疗诊断。实验结果表明,XaaS在三个实际部署中延迟降低38%,同时保持高解释质量。总体而言,本工作使得在大规模异构物联网系统中部署透明和可问责的AI成为可能,并弥合了XAI研究与边缘实用性之间的差距。

英文摘要

Though Explainable AI (XAI) has made significant advancements, its inclusion in edge and IoT systems is typically ad hoc and inefficient. Most current methods are "coupled" in such a way that they generate explanations simultaneously with model inferences. As a result, these approaches incur redundant computation, high latency, and poor scalability when deployed across heterogeneous sets of edge devices. In this work we propose Explainability-as-a-Service (XaaS), a distributed architecture for treating explainability as a first-class system service (as opposed to a model-specific feature). The key innovation in our proposed XaaS architecture is that it decouples inference from explanation generation, allowing edge devices to request, cache, and verify explanations subject to resource and latency constraints. To achieve this, we introduce three main innovations: (1) a distributed explanation cache with a semantic-similarity-based explanation retrieval method, which significantly reduces redundant computation; (2) a lightweight verification protocol that ensures the fidelity of both cached and newly generated explanations; and (3) an adaptive explanation engine that chooses explanation methods based upon device capability and user requirement. We evaluated the performance of XaaS on three real-world edge AI use cases: (i) manufacturing quality control; (ii) autonomous vehicle perception; and (iii) healthcare diagnostics. Experimental results show that XaaS reduces latency by 38% while maintaining high explanation quality across three real-world deployments. Overall, this work enables the deployment of transparent and accountable AI across large-scale, heterogeneous IoT systems, and bridges the gap between XAI research and edge practicality.
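A semantic-similarity explanation cache of the kind described in item (1) could be sketched as follows: explanations are stored alongside an embedding of the input, and a lookup returns the closest stored explanation only when its cosine similarity clears a threshold. The class name, the `threshold` default, and the linear scan are all assumptions for illustration; the abstract does not describe XaaS's cache layout, distribution mechanism, or verification protocol.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

class ExplanationCache:
    """Hypothetical single-node cache keyed by input embeddings."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, explanation) pairs

    def put(self, embedding, explanation):
        self.entries.append((embedding, explanation))

    def get(self, embedding):
        # Linear scan for the most similar stored embedding.
        best_score, best_explanation = 0.0, None
        for stored, explanation in self.entries:
            score = cosine_similarity(embedding, stored)
            if score > best_score:
                best_score, best_explanation = score, explanation
        # A hit is returned only when similarity clears the threshold.
        return best_explanation if best_score >= self.threshold else None

cache = ExplanationCache(threshold=0.9)
cache.put([1.0, 0.0], "saliency map A")
hit = cache.get([0.99, 0.05])   # near-duplicate input reuses the cached explanation
miss = cache.get([0.0, 1.0])    # dissimilar input falls through to fresh generation
```

A cache miss is where a real system would invoke the explanation engine and, per the abstract, verify the result before storing it.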

2602.02979 2026-05-26 cs.CL cs.LG 版本更新

CPMobius: Iterative Coach-Player Reasoning for Data-Free Reinforcement Learning

CPMobius: 无数据强化学习的迭代式教练-玩家推理

Ran Li, Zeyuan Liu, Yinghao Chen, Bingxiang He, Jiarui Yuan, Zixuan Fu, Weize Chen, Jinyi Hu, Chen Qian, Zhiyuan Liu, Maosong Sun

发表机构 * Tsinghua University(清华大学) University of Cambridge(剑桥大学) Shanghai Jiao Tong University(上海交通大学)

AI总结 提出CPMobius协作式教练-玩家范式,通过无外部数据的合作优化循环提升数学推理能力,在Qwen2.5-Math-7B-Instruct上总体准确率提升4.9%,OOD准确率提升5.4%。

Comments Accepted to the ICML 2026

详情
AI中文摘要

大型语言模型(LLMs)在复杂推理方面展现出强大潜力,但其进展仍从根本上受限于对大规模高质量人工策划任务和标签的依赖,无论是通过监督微调(SFT)还是基于推理特定数据的强化学习(RL)。这种依赖使得监督密集型训练范式日益不可持续,实践中已出现可扩展性减弱的迹象。为克服这一限制,我们引入了CPMöbius(CPMobius),一种用于推理模型无数据强化学习的协作式教练-玩家范式。与传统对抗性自博弈不同,CPMöbius受现实世界人类体育协作和多智能体协作启发,将教练和玩家视为独立但合作的角色。教练针对玩家的能力提出指令,并根据玩家表现的变化获得奖励,而玩家则因解决教练生成的越来越有指导性的任务而获得奖励。这种合作优化循环旨在直接提升玩家的数学推理能力。值得注意的是,CPMöbius在不依赖任何外部训练数据的情况下实现了显著改进,优于现有的无监督方法。例如,在Qwen2.5-Math-7B-Instruct上,我们的方法总体准确率平均提升4.9%,分布外(OOD)准确率平均提升5.4%,总体准确率超过RENT 1.5%,OOD准确率超过R-zero 4.2%。我们的代码库已在https://github.com/thunlp/CPMobius发布。

英文摘要

Large Language Models (LLMs) have demonstrated strong potential in complex reasoning, yet their progress remains fundamentally constrained by reliance on massive high-quality human-curated tasks and labels, either through supervised fine-tuning (SFT) or reinforcement learning (RL) on reasoning-specific data. This dependence renders supervision-heavy training paradigms increasingly unsustainable, with signs of diminishing scalability already evident in practice. To overcome this limitation, we introduce CPMöbius (CPMobius), a collaborative Coach-Player paradigm for data-free reinforcement learning of reasoning models. Unlike traditional adversarial self-play, CPMöbius, inspired by real-world human sports collaboration and multi-agent collaboration, treats the Coach and Player as independent but cooperative roles. The Coach proposes instructions targeted at the Player's capability and receives rewards based on changes in the Player's performance, while the Player is rewarded for solving the increasingly instructive tasks generated by the Coach. This cooperative optimization loop is designed to directly enhance the Player's mathematical reasoning ability. Remarkably, CPMöbius achieves substantial improvement without relying on any external training data, outperforming existing unsupervised approaches. For example, on Qwen2.5-Math-7B-Instruct, our method improves accuracy by an overall average of +4.9 and an out-of-distribution average of +5.4, exceeding RENT by +1.5 on overall accuracy and R-zero by +4.2 on OOD accuracy. Our codebase has been released at https://github.com/thunlp/CPMobius.

2602.02495 2026-05-26 cs.CL cs.AI cs.LG 版本更新

Reward-free Alignment for Conflicting Objectives

无奖励的冲突目标对齐

Peter Chen, Xiaopeng Li, Xi Chen, Tianyi Lin

发表机构 * Columbia University(哥伦比亚大学)

AI总结 提出RACO框架,通过冲突规避梯度下降的裁剪变体直接利用成对偏好数据解决多目标冲突,实现帕累托最优对齐。

Comments Accepted to ICML 2026 (Oral)

详情
AI中文摘要

直接对齐方法越来越多地用于将大型语言模型(LLMs)与人类偏好对齐。然而,许多现实世界的对齐问题涉及多个相互冲突的目标,简单的偏好聚合可能导致训练不稳定和糟糕的权衡。特别是,加权损失方法可能无法识别同时改善所有目标的更新方向,而现有的多目标方法通常依赖显式奖励模型,增加了额外复杂性并扭曲了用户指定的偏好。本文的贡献有两方面。首先,我们提出了一种用于冲突目标的无奖励对齐框架(RACO),该框架直接利用成对偏好数据,并通过一种新颖的冲突规避梯度下降的裁剪变体解决梯度冲突。我们提供了收敛到尊重用户指定目标权重的帕累托临界点的保证,并进一步证明在双目标设置中裁剪可以严格改善收敛速度。其次,我们使用一些启发式方法改进了我们的方法,并进行了实验,以证明所提框架在LLM对齐中的兼容性。在多个LLM家族(Qwen 3、Llama 3、Gemma 3)上的多目标摘要和安全对齐任务的定性和定量评估表明,与现有的多目标对齐基线相比,我们的方法始终能实现更好的帕累托权衡。

英文摘要

Direct alignment methods are increasingly used to align large language models (LLMs) with human preferences. However, many real-world alignment problems involve multiple conflicting objectives, where naive aggregation of preferences can lead to unstable training and poor trade-offs. In particular, weighted loss methods may fail to identify update directions that simultaneously improve all objectives, and existing multi-objective approaches often rely on explicit reward models, introducing additional complexity and distorting user-specified preferences. The contributions of this paper are two-fold. First, we propose a Reward-free Alignment framework for Conflicted Objectives (RACO) that directly leverages pairwise preference data and resolves gradient conflicts via a novel clipped variant of conflict-averse gradient descent. We provide convergence guarantees to Pareto-critical points that respect user-specified objective weights, and further show that clipping can strictly improve convergence rate in the two-objective setting. Second, we improve our method using some heuristics and conduct experiments to demonstrate the compatibility of the proposed framework for LLM alignment. Both qualitative and quantitative evaluations on multi-objective summarization and safety alignment tasks across multiple LLM families (Qwen 3, Llama 3, Gemma 3) show that our method consistently achieves better Pareto trade-offs compared to existing multi-objective alignment baselines.

2602.01322 2026-05-26 cs.LG cs.CL 版本更新

PolySAE: Modeling Feature Interactions in Sparse Autoencoders via Polynomial Decoding

PolySAE: 通过多项式解码建模稀疏自编码器中的特征交互

Panagiotis Koromilas, Andreas D. Demou, James Oldfield, Yannis Panagakis, Mihalis Nicolaou

发表机构 * The Cyprus Institute(塞浦路斯研究所) University of Athens(雅典大学) University of Oxford(牛津大学) Archimedes AI/Athena Research Center(阿基米德AI/雅典娜研究中心) University of Cyprus(塞浦路斯大学)

AI总结 提出PolySAE,在稀疏自编码器解码器中引入高阶项以建模特征交互,通过低秩张量分解在共享投影子空间上捕获成对和三元特征交互,在保持可解释性的同时提升探测F1约8%,并产生与共现频率无关的组合结构。

Comments 43rd International Conference on Machine Learning (ICML 2026); Code: https://github.com/pakoromilas/PolySAE

详情
AI中文摘要

稀疏自编码器(SAE)通过将激活分解为字典原子的稀疏组合来解释神经网络表示。然而,SAE假设特征通过线性重建相加组合,这种假设无法捕捉组合结构:线性模型无法区分“Starbucks”是由“star”和“coffee”特征的组合还是仅由它们的共现产生。这迫使SAE为复合概念分配整体特征,而不是将其分解为可解释的组成部分。我们引入了PolySAE,它通过高阶项扩展SAE解码器以建模特征交互,同时保留对可解释性至关重要的线性编码器。通过在共享投影子空间上进行低秩张量分解,PolySAE以较小的参数开销(GPT2上为3%)捕获成对和三元特征交互。在四个语言模型和三个SAE变体上,PolySAE在保持可比重建误差的同时,探测F1平均提升约8%,并产生类别条件特征分布之间2-10倍更大的Wasserstein距离。关键的是,学习到的交互权重与共现频率的相关性可忽略不计(r = 0.06,而SAE特征协方差为r = 0.82),表明多项式项捕获了很大程度上独立于表面统计的组合结构。最后,学习到的交互方向因果性地将模型输出引导向相应的组合语义。

英文摘要

Sparse autoencoders (SAEs) interpret neural network representations by decomposing activations into sparse combinations of dictionary atoms. However, SAEs assume features combine additively through linear reconstruction, an assumption that cannot capture compositional structure: linear models cannot distinguish whether "Starbucks" arises from the composition of "star" and "coffee" features or merely their co-occurrence. This forces SAEs to allocate monolithic features for compound concepts rather than decomposing them into interpretable constituents. We introduce PolySAE, which extends the SAE decoder with higher-order terms to model feature interactions while preserving the linear encoder essential for interpretability. Through low-rank tensor factorization on a shared projection subspace, PolySAE captures pairwise and triple feature interactions with small parameter overhead (3% on GPT2). Across four language models and three SAE variants, PolySAE achieves an average improvement of $\sim$8% in probing F1 while maintaining comparable reconstruction error, and produces 2--10$\times$ larger Wasserstein distances between class-conditional feature distributions. Critically, learned interaction weights exhibit negligible correlation with co-occurrence frequency ($r = 0.06$ vs $r = 0.82$ for SAE feature covariance), suggesting that polynomial terms capture compositional structure largely independent of surface statistics. Finally, the learned interaction directions causally steer model outputs toward the corresponding compositional semantics.
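One way to picture a decoder with a low-rank second-order term is sketched below: the sparse code is decoded linearly as usual, and in addition it is projected into a small shared subspace whose squared coordinates are mixed back into the output. The parameter names (`decoder`, `projection`, `mix2`), the squared-projection form, and the omission of the third-order term are assumptions for illustration, not PolySAE's published parameterization.

```python
def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def poly_decode(z, decoder, projection, mix2, bias):
    """Linear reconstruction plus an assumed low-rank pairwise-interaction term."""
    linear = matvec(decoder, z)                    # standard SAE decoder output
    y = matvec(projection, z)                      # shared low-rank projection of the code
    pairwise = matvec(mix2, [v * v for v in y])    # squares contain cross terms z_i * z_j
    return [b + l + p for b, l, p in zip(bias, linear, pairwise)]

# Two features, one output dimension, rank-1 interaction subspace.
params = dict(decoder=[[1.0, 1.0]], projection=[[1.0, 1.0]], mix2=[[0.5]], bias=[0.0])
both = poly_decode([1.0, 1.0], **params)[0]
only_a = poly_decode([1.0, 0.0], **params)[0]
only_b = poly_decode([0.0, 1.0], **params)[0]
none = poly_decode([0.0, 0.0], **params)[0]
interaction = both - only_a - only_b + none  # nonzero only because of the pairwise term
```

For a purely linear decoder the `interaction` quantity is always zero, which is the sense in which additive reconstruction cannot separate composition from co-occurrence.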

2602.01183 2026-05-26 cs.CV cs.LG 版本更新

Refining Context-Entangled Content Segmentation via Curriculum Selection and Anti-Curriculum Promotion

通过课程选择与反课程促进优化上下文纠缠内容分割

Chunming He, Rihan Zhang, Fengyang Xiao, Dingming Zhang, Zhiwen Cao, Sina Farsiu

发表机构 * Duke University(杜克大学) Adobe(Adobe公司)

AI总结 提出CurriSeg双阶段学习框架,结合课程学习与反课程学习原理,通过动态数据选择与频谱盲性微调提升上下文纠缠内容分割的鲁棒性和泛化能力。

Comments ICML 2026, 8 figures, 11 tables

详情
AI中文摘要

生物学习从简单到困难的任务逐步进行,逐渐增强感知和鲁棒性。受此原理启发,我们解决上下文纠缠内容分割(CECS)这一具有挑战性的场景,其中对象与周围环境共享内在视觉模式,如伪装目标检测。传统分割网络主要依赖架构增强,但往往忽略了在纠缠数据分布下控制鲁棒性的学习动态。我们引入CurriSeg,一个双阶段学习框架,统一了课程和反课程原则以提高表示可靠性。在课程选择阶段,CurriSeg基于样本损失的时间统计动态选择训练数据,区分困难但有信息的样本与噪声或模糊样本,从而实现稳定的能力增强。在反课程促进阶段,我们设计了频谱盲性微调,抑制高频成分以强制依赖低频结构和上下文线索,从而增强泛化能力。大量实验表明,CurriSeg在多种CECS基准上取得了一致的改进,无需增加参数或增加总训练时间,为进展与挑战如何相互作用以促进鲁棒且上下文感知的分割提供了原则性视角。代码将发布。

英文摘要

Biological learning proceeds from easy to difficult tasks, gradually reinforcing perception and robustness. Inspired by this principle, we address Context-Entangled Content Segmentation (CECS), a challenging setting where objects share intrinsic visual patterns with their surroundings, as in camouflaged object detection. Conventional segmentation networks predominantly rely on architectural enhancements but often ignore the learning dynamics that govern robustness under entangled data distributions. We introduce CurriSeg, a dual-phase learning framework that unifies curriculum and anti-curriculum principles to improve representation reliability. In the Curriculum Selection phase, CurriSeg dynamically selects training data based on the temporal statistics of sample losses, distinguishing hard-but-informative samples from noisy or ambiguous ones, thus enabling stable capability enhancement. In the Anti-Curriculum Promotion phase, we design Spectral-Blindness Fine-Tuning, which suppresses high-frequency components to enforce dependence on low-frequency structural and contextual cues and thus strengthens generalization. Extensive experiments demonstrate that CurriSeg achieves consistent improvements across diverse CECS benchmarks without adding parameters or increasing total training time, offering a principled view of how progression and challenge interplay to foster robust and context-aware segmentation. Code will be released.

2601.22466 2026-05-26 cs.LG 版本更新

EvoEGF-Mol: Evolving Exponential Geodesic Flow for Structure-based Drug Design

EvoEGF-Mol:用于基于结构的药物设计的演化指数测地流

Yaowei Jin, Junjie Wang, Cheng Cao, Penglei Wang, Duo An, Qian Shi

发表机构 * Lingang Laboratory(临港实验室) School of Information Science and Technology, ShanghaiTech University(上海科技大学信息科学与技术学院)

AI总结 针对基于结构的药物设计中欧几里得空间与概率空间不匹配的问题,提出EvoEGF-Mol模型,通过复合指数族分布和演化指数测地流统一表示分子,实现高几何精度和相互作用保真度。

Comments Accepted to ICML 2026

详情
AI中文摘要

基于结构的药物设计(SBDD)旨在发现生物活性配体。传统方法在欧几里得空间和概率空间中分别构建连续原子坐标和离散化学类别的概率路径,导致与底层统计流形不匹配。我们通过使用复合指数族分布来表示分子来解决这个问题,其中坐标和类别在统一的自然参数空间中表示,并在Fisher-Rao度量下沿指数测地线同步演化。为了避免直接针对狄拉克分布的测地线导致的瞬时轨迹崩溃,我们提出了用于SBDD的演化指数测地流(EvoEGF-Mol),该方法用动态集中的分布替代静态狄拉克目标,并通过渐进参数细化架构进行训练。我们的模型在CrossDock上达到了参考级别的PoseBusters通过率(93.4%),展示了卓越的几何精度和相互作用保真度,同时在真实世界的MolGenBench任务中,在生物活性骨架恢复方面取得了优于基线方法的性能。代码可在https://github.com/BLEACH366/EvoEGF-Mol获取。

英文摘要

Structure-Based Drug Design (SBDD) aims to discover bioactive ligands. Conventional approaches construct probability paths separately in Euclidean and probabilistic spaces for continuous atomic coordinates and discrete chemical categories, leading to a mismatch with the underlying statistical manifolds. We address this issue by representing molecules using composite exponential-family distributions, where coordinates and categories are represented within a unified natural parameter space to evolve synchronously along exponential geodesics under the Fisher-Rao metric. To avoid the instantaneous trajectory collapse induced by geodesics directly targeting Dirac distributions, we propose Evolving Exponential Geodesic Flow for SBDD (EvoEGF-Mol), which replaces static Dirac targets with dynamically concentrating distributions and is trained with a progressive-parameter-refinement architecture. Our model approaches a reference-level PoseBusters passing rate (93.4%) on CrossDock, demonstrating remarkable geometric precision and interaction fidelity, while achieving superior performance over baseline methods on real-world MolGenBench tasks for bioactive scaffold recovery. Code is available at https://github.com/BLEACH366/EvoEGF-Mol.

2601.21406 2026-05-26 cs.CV cs.LG 版本更新

Generation Enhances Understanding in Unified Multimodal Models via Multi-Representation Generation

通过多表示生成增强统一多模态模型的理解能力

Zihan Su, Hongyang Wei, Kangrui Cen, Yong Wang, Guanhua Chen, Chun Yuan, Xiangxiang Chu

发表机构 * Tsinghua Shenzhen International Graduate School, Tsinghua University(清华大学深圳国际研究生院,清华大学) AMAP, Alibaba Group(高德地图,阿里巴巴集团) Shanghai Jiao Tong University(上海交通大学) Southern University of Science and Technology(南方科技大学)

AI总结 提出UniMRG方法,通过辅助生成像素、深度和分割等多重表示,增强统一多模态模型的理解能力,减少幻觉并提升空间理解。

Comments Code: https://github.com/Sugewud/UniMRG

详情
AI中文摘要

统一多模态模型(UMMs)在单一框架内整合了视觉理解和生成。其最终目标是创建一个理解和生成相互促进的循环。虽然最近的后训练方法成功利用理解来增强生成,但利用生成来改善理解的逆向方向仍基本未被探索。在这项工作中,我们提出了UniMRG(统一多表示生成),一种简单而有效的架构无关的后训练方法。UniMRG通过引入辅助生成任务来增强UMMs的理解能力。具体来说,我们训练UMMs生成输入图像的多种内在表示,即像素(重建)、深度(几何)和分割(结构),同时进行标准的视觉理解目标。通过综合这些多样化的表示,UMMs捕获关于外观、空间关系和结构布局的互补信息。因此,UMMs对视觉输入形成了更深入和全面的理解。跨多种UMM架构的大量实验表明,我们的方法显著增强了细粒度感知,减少了幻觉,并改善了空间理解,同时提升了生成能力。

英文摘要

Unified Multimodal Models (UMMs) integrate both visual understanding and generation within a single framework. Their ultimate aspiration is to create a cycle where understanding and generation mutually reinforce each other. While recent post-training methods have successfully leveraged understanding to enhance generation, the reverse direction of utilizing generation to improve understanding remains largely unexplored. In this work, we propose UniMRG (Unified Multi-Representation Generation), a simple yet effective architecture-agnostic post-training method. UniMRG enhances the understanding capabilities of UMMs by incorporating auxiliary generation tasks. Specifically, we train UMMs to generate multiple intrinsic representations of input images, namely pixel (reconstruction), depth (geometry), and segmentation (structure), alongside standard visual understanding objectives. By synthesizing these diverse representations, UMMs capture complementary information regarding appearance, spatial relations, and structural layout. Consequently, UMMs develop a deeper and more comprehensive understanding of visual inputs. Extensive experiments across diverse UMM architectures demonstrate that our method notably enhances fine-grained perception, reduces hallucinations, and improves spatial understanding, while simultaneously boosting generation capabilities.

2601.21094 2026-05-26 cs.LG cs.AI cs.SY eess.SY 版本更新

Safety Generalization Under Distribution Shift in Safe Reinforcement Learning: A Diabetes Testbed

安全强化学习中的分布偏移下的安全泛化:一个糖尿病测试平台

Minjae Kwon, Josephine Lamp, Lu Feng

发表机构 * Department of Computer Science, University of Virginia(弗吉尼亚大学计算机科学系)

AI总结 研究安全强化学习算法在分布偏移下训练时安全保证能否迁移到部署中,使用糖尿病管理作为测试平台,发现安全泛化差距并通过测试时屏蔽有效恢复安全性。

Comments Accepted at ICML 2026. Camera-ready version

详情
AI中文摘要

安全强化学习算法通常在固定的训练条件下进行评估。我们使用糖尿病管理作为安全关键测试平台,研究训练时的安全保证是否能在分布偏移下迁移到部署中。我们在统一的临床模拟器上对安全强化学习算法进行基准测试,并揭示了一个安全泛化差距:在训练期间满足约束的策略经常在未见过的患者身上违反安全要求。我们证明,测试时屏蔽(使用学习到的动力学模型过滤不安全动作)能有效恢复跨算法和患者群体的安全性。在八种安全强化学习算法、三种糖尿病类型和三个年龄组中,屏蔽使得PPO-Lag和CPO等强基线的血糖达标时间范围提高了13-14%,同时降低了临床风险指数和血糖变异性。我们的模拟器和基准测试为研究安全关键控制领域中分布偏移下的安全性提供了一个平台。代码可在https://github.com/safe-autonomy-lab/GlucoSim 和 https://github.com/safe-autonomy-lab/GlucoAlg 获取。

英文摘要

Safe Reinforcement Learning (RL) algorithms are typically evaluated under fixed training conditions. We investigate whether training-time safety guarantees transfer to deployment under distribution shift, using diabetes management as a safety-critical testbed. We benchmark safe RL algorithms on a unified clinical simulator and reveal a safety generalization gap: policies satisfying constraints during training frequently violate safety requirements on unseen patients. We demonstrate that test-time shielding, which filters unsafe actions using learned dynamics models, effectively restores safety across algorithms and patient populations. Across eight safe RL algorithms, three diabetes types, and three age groups, shielding achieves Time-in-Range gains of 13-14% for strong baselines such as PPO-Lag and CPO while reducing clinical risk index and glucose variability. Our simulator and benchmark provide a platform for studying safety under distribution shift in safety-critical control domains. Code is available at https://github.com/safe-autonomy-lab/GlucoSim and https://github.com/safe-autonomy-lab/GlucoAlg.
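
The idea of test-time shielding can be sketched as a filter between the policy and the environment. This is a minimal sketch under stated assumptions, not the paper's implementation: `shield`, `toy_dynamics`, the 30 mg/dL-per-unit effect, and the candidate dose grid are all hypothetical, and a hand-written function stands in for the learned dynamics model. The 70-180 mg/dL band is the range commonly used for Time-in-Range.

```python
def shield(state, proposed, candidates, predict_next, is_safe, fallback):
    """Return the proposed action if its predicted outcome is safe,
    otherwise the safe candidate closest to it, otherwise `fallback`."""
    if is_safe(predict_next(state, proposed)):
        return proposed
    safe = [a for a in candidates if is_safe(predict_next(state, a))]
    if not safe:
        return fallback
    return min(safe, key=lambda a: abs(a - proposed))

def toy_dynamics(glucose, insulin_units):
    # Hypothetical stand-in for a learned model: each unit lowers glucose by 30 mg/dL.
    return glucose - 30.0 * insulin_units

def in_range(glucose):
    # Target band in mg/dL.
    return 70.0 <= glucose <= 180.0

doses = [0.0, 0.5, 1.0, 1.5, 2.0]

# The policy proposes 2.0 units at 100 mg/dL; the predicted 40 mg/dL is unsafe,
# so the shield substitutes the nearest dose whose prediction stays in range.
chosen = shield(100.0, 2.0, doses, toy_dynamics, in_range, fallback=0.0)
```

Because the filter only overrides actions whose predicted outcome leaves the safe set, it can wrap any trained policy without retraining it.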

2601.19743 2026-05-26 eess.IV cs.CV cs.LG 版本更新

Interpretable and backpropagation-free Green Learning for efficient multi-task echocardiographic segmentation and classification

可解释且无需反向传播的绿色学习用于高效多任务超声心动图分割与分类

Jyun-Ping Kao, Jiaxin Yang, C. -C. Jay Kuo, Jonghye Woo

AI总结 提出一种无需反向传播的多任务绿色学习框架,通过无监督VoxelHop编码器与多级回归解码器及XG-Boost分类器,在EchoNet-Dynamic数据集上实现左心室分割与射血分数分类,以极低参数量达到高精度。

Comments Accepted for publication in APSIPA Transactions on Signal and Information Processing. Jyun-Ping Kao and Jiaxing Yang contributed equally to this work. C.-C. Jay Kuo and Jonghye Woo are the senior authors

详情
AI中文摘要

超声心动图是管理心力衰竭(HF)的基石,左心室射血分数(LVEF)是指导治疗的关键指标。然而,手动LVEF评估存在较高的观察者间变异性,而现有的深度学习(DL)模型通常是计算密集且数据饥饿的“黑箱”,阻碍了临床信任和采用。在此,我们提出了一种无需反向传播的多任务绿色学习(MTGL)框架,可同时进行左心室(LV)分割和LVEF分类。我们的框架将用于分层时空特征提取的无监督VoxelHop编码器与多级回归解码器和XG-Boost分类器相结合。在EchoNet-Dynamic数据集上,我们的MTGL模型实现了最先进的分类和分割性能,分类准确率达到94.3%,Dice相似系数(DSC)达到0.912,显著优于多个先进的3D DL模型。关键的是,我们的模型在参数数量少一个数量级的情况下实现了这一性能,展现了卓越的计算效率。这项工作表明,GL范式可以为复杂的医学图像分析提供高度准确、高效且可解释的解决方案,为临床实践中更可持续和可信的人工智能铺平道路。

英文摘要

Echocardiography is a cornerstone for managing heart failure (HF), with Left Ventricular Ejection Fraction (LVEF) being a critical metric for guiding therapy. However, manual LVEF assessment suffers from high inter-observer variability, while existing Deep Learning (DL) models are often computationally intensive and data-hungry "black boxes" that impede clinical trust and adoption. Here, we propose a backpropagation-free multi-task Green Learning (MTGL) framework that performs simultaneous Left Ventricle (LV) segmentation and LVEF classification. Our framework integrates an unsupervised VoxelHop encoder for hierarchical spatio-temporal feature extraction with a multi-level regression decoder and an XG-Boost classifier. On the EchoNet-Dynamic dataset, our MTGL model achieves state-of-the-art classification and segmentation performance, attaining a classification accuracy of 94.3% and a Dice Similarity Coefficient (DSC) of 0.912, significantly outperforming several advanced 3D DL models. Crucially, our model achieves this with over an order of magnitude fewer parameters, demonstrating exceptional computational efficiency. This work demonstrates that the GL paradigm can deliver highly accurate, efficient, and interpretable solutions for complex medical image analysis, paving the way for more sustainable and trustworthy artificial intelligence in clinical practice.

2601.05613 2026-05-26 cs.LG cs.AI 版本更新

PiXTime: A Model for Federated Time Series Forecasting with Heterogeneous Data across Nodes

PiXTime: 一种跨节点异构数据联邦时间序列预测模型

Yiming Zhou, Jiahao Wang, Mingyue Cheng, Hao Wang, Defu Lian, Enhong Chen

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出基于Transformer的PiXTime框架,通过参数解耦架构(局部个性化模块+全局共享骨干)处理异构时间序列,实现联邦学习中的异构数据预测,并在多个基准上达到最优性能。

详情
AI中文摘要

虽然对分布式时间序列进行协同预测非常理想,但由于数据共享限制,直接合并局部数据集通常不可行。联邦学习提供了一种有前景的替代方案,但传统的联邦学习算法要求同构模型架构,这与去中心化节点中常见的结构差异(如时间分辨率不对齐、变量通道不匹配)不兼容。为弥合这一差距,我们引入了PiXTime,一种新颖的基于Transformer的框架,旨在原生适应并利用结构异构的时间数据。其核心采用参数解耦架构,将模型策略性地划分为局部个性化模块和全局聚合共享骨干。具体而言,节点特定的局部模块作为维度适配器,将不同长度的原始序列投影到统一表示空间。同时,全局同步的VE表将一致的类别标识注入特征空间,使共享骨干能够跨不一致的变量分布协同学习并泛化表示。在多个基准上的全面评估表明,PiXTime在异构联邦环境中实现了最先进的性能,同时在标准同构和集中式预测设置中保持强大的优势。

英文摘要

While collaborative forecasting on distributed time series is highly desirable, directly pooling localized datasets is often impractical due to data sharing constraints. Federated learning offers a promising alternative, yet conventional federated learning algorithms require homogeneous model architectures, which are incompatible with the structural discrepancies, such as unaligned temporal resolutions and mismatched variable channels, commonly observed across decentralized nodes. To bridge this gap, we introduce PiXTime, a novel Transformer-based framework designed to natively accommodate and leverage structurally heterogeneous temporal data. At its core, PiXTime adopts a parameter-decoupling architecture, strategically partitioning the model into localized personalized modules and a globally aggregated shared backbone. Specifically, node-specific local modules act as dimensional adapters, projecting raw sequences of diverse lengths into a unified representation space. Concurrently, a globally synchronized VE Table injects consistent categorical identities into the feature space, allowing the shared backbone to collaboratively learn and generalize representations across inconsistent variable distributions. Comprehensive evaluations on multiple benchmarks demonstrate that PiXTime achieves state-of-the-art performance in heterogeneous federated environments, while maintaining robust superiority in standard homogeneous and centralized forecasting settings.

2601.05289 2026-05-26 hep-ph cs.LG hep-ex physics.ins-det 版本更新

A universal vision transformer for fast calorimeter simulations

一种用于快速量热器模拟的通用视觉变换器

Luigi Favaro, Andrea Giammanco, Claudius Krause

发表机构 * Centre for Cosmology, Particle Physics and Phenomenology(宇宙学、粒子物理与现象学研究中心) Marietta Blau Institute for Particle Physics (MBI Vienna), Austrian Academy of Sciences (ÖAW), Austria(玛丽埃塔·布劳粒子物理研究所(MBI维也纳),奥地利科学院(ÖAW),奥地利)

AI总结 本研究基于CaloDREAM架构,提出使用视觉变换器(ViT)进行快速量热器模拟,在规则和不规则几何结构及多个探测器上均表现出高精度和可扩展性,生成时间在单GPU上为10-100毫秒,并通过预训练和微调降低了训练成本并提高了数据效率。

Comments 44 pages, 17 figures, 8 tables; journal version. Mach. Learn.: Sci. Technol. (2026)

详情
AI中文摘要

探测器的高维复杂特性使得快速量热器模拟成为现代生成式机器学习的主要应用。视觉变换器(ViT)能够以无与伦比的精度模拟Geant4响应,并且不限于规则几何结构。从CaloDREAM架构出发,我们展示了ViT在规则和不规则几何结构以及多个探测器上的鲁棒性和可扩展性。结果表明,ViT在多个评估指标下生成的电磁和强子簇射与Geant4的偏差极小,同时在单个GPU上的生成时间保持在O(10-100)毫秒量级。此外,我们表明在大型数据集上预训练并在目标几何结构上微调可以降低训练成本并提高数据效率,或者整体上提高生成簇射的保真度。

英文摘要

The high-dimensional complex nature of detectors makes fast calorimeter simulations a prime application for modern generative machine learning. Vision transformers (ViTs) can emulate the Geant4 response with unmatched accuracy and are not limited to regular geometries. Starting from the CaloDREAM architecture, we demonstrate the robustness and scalability of ViTs on regular and irregular geometries, and multiple detectors. Our results show that ViTs generate electromagnetic and hadronic showers with minimal deviations from Geant4 in multiple evaluation metrics, while maintaining a generation time of O(10-100) ms on a single GPU. Furthermore, we show that pretraining on a large dataset and fine-tuning on the target geometry leads to reduced training costs and higher data efficiency, or altogether improves the fidelity of generated showers.

2601.03327 2026-05-26 cs.LG cs.AI 版本更新

Extreme-value forest fire prediction: A study of the Loss Function in an Ordinality Scheme

极端值森林火灾预测:序数方案中损失函数的研究

Nicolas Caron, Christophe Guyeux, Hassan Noura, Benjamin Aynes

AI总结 提出首个序数分类框架预测火灾严重等级,研究损失函数设计对预测极端事件的影响,发现加权卡帕损失在极端类别上IoU提升超过0.1。

Comments Following external reviews, we identified major methodological issues in the manuscript, including insufficient justification of the ordinal clustering strategy, limited statistical validation, ambiguities in dataset splitting, and missing comparisons with standard ordinal approaches. We therefore request withdrawal in order to prepare a substantially revised version

详情
AI中文摘要

野火在空间和严重程度上是高度不平衡的自然灾害,使得极端事件的预测特别具有挑战性。在这项工作中,我们引入了第一个序数分类框架,用于预测与法国操作决策直接对齐的野火严重等级。我们的研究调查了损失函数设计对神经模型预测罕见但关键的高严重火灾发生能力的影响。我们将标准交叉熵与几种序数感知目标进行比较,包括提出的基于截断离散指数广义帕累托分布的概率TDeGPD损失。通过对多种架构和真实操作数据的广泛基准测试,我们表明序数监督显著提高了模型相对于传统方法的性能。特别是,加权卡帕损失(WKLoss)取得了最佳整体结果,在最极端严重类别上IoU(交并比)增益超过0.1,同时保持了有竞争力的校准质量。然而,由于数据集中极端事件极低的代表性,对于最罕见事件的性能仍然有限。这些发现强调了将严重性排序、数据不平衡考虑和季节性风险整合到野火预测系统中的重要性。未来的工作将集中于将季节动态和不确定性信息纳入训练,以进一步提高极端事件预测的可靠性。

英文摘要

Wildfires are highly imbalanced natural hazards in both space and severity, making the prediction of extreme events particularly challenging. In this work, we introduce the first ordinal classification framework for forecasting wildfire severity levels directly aligned with operational decision-making in France. Our study investigates the influence of loss-function design on the ability of neural models to predict rare yet critical high-severity fire occurrences. We compare standard cross-entropy with several ordinal-aware objectives, including the proposed probabilistic TDeGPD loss derived from a truncated discrete exponentiated Generalized Pareto Distribution. Through extensive benchmarking over multiple architectures and real operational data, we show that ordinal supervision substantially improves model performance over conventional approaches. In particular, the Weighted Kappa Loss (WKLoss) achieves the best overall results, with more than +0.1 IoU (Intersection Over Union) gain on the most extreme severity classes while maintaining competitive calibration quality. However, performance remains limited for the rarest events due to their extremely low representation in the dataset. These findings highlight the importance of integrating severity ordering, data imbalance considerations, and seasonality risk into wildfire forecasting systems. Future work will focus on incorporating seasonal dynamics and uncertainty information into training to further improve the reliability of extreme-event prediction.

2512.24075 2026-05-26 cs.LG 版本更新

Evolutionary Physics-Informed Temporal Fusion for Lane-Change Intention Prediction

进化物理信息时间融合用于换道意图预测

Jiazhao Shi, Qiyang Xie, Ziyu Wang, Dongxu Zhang, Yichen Lin, Di Zhu, Chen Xie, Ziwei Wang, Haoyun Zhang, Enliang Li, Zetong Guan

发表机构 * Tandon School of Engineering, New York University(纽约大学坦登工程学院) Khoury College of Computer Science, Northeastern University(东北大学库里计算机科学学院) School of Business, Wake Forest University(维克森林大学商学院) Independent Researcher(独立研究者) University of Massachusetts Amherst(马萨诸塞大学阿默斯特分校) Carnegie Mellon University(卡内基梅隆大学) University of Pennsylvania(宾夕法尼亚大学) Qualcomm CDMA Technologies(高通CDMA技术) University of Michigan(密歇根大学)

AI总结 提出一种进化物理信息时间融合框架,通过融合从传统交通信号导出的时间描述符和从原始轨迹序列学习的时间嵌入,实现三分类换道意图预测,在highD和exiD数据集上取得高F1分数。

详情
AI中文摘要

早期换道意图预测对于自动驾驶和ADAS至关重要,但由于换道行为依赖于不断变化的交通风险、周围车辆交互和目标车道可行性,而非仅瞬时车辆状态,因此仍具挑战性。本研究提出一种进化物理信息时间融合框架,用于三分类换道意图预测,包括左换道、右换道和不换道。该方法并非仅使用静态物理信息变量,而是从传统交通信号中导出时间描述符,包括风险演化、间隙持续性、反事实车道效用、交互压力梯度、机动可行性和意图一致性。这些描述符与通过序列编码器从原始轨迹序列学习的时间嵌入融合,融合表示用于最终分类。在highD和exiD数据集上,分别在1秒、2秒和3秒预测时域下进行实验。所提模型在highD上达到0.9514、0.9256和0.8872的宏F1分数,在exiD上达到0.9386、0.9070和0.8531。在exiD匝道邻近场景中改进尤为显著,表明时间物理演化在交互丰富的环境中特别有用。这些结果表明,将进化物理信息描述符与学习的时间表示相结合,为早期换道意图预测提供了更动态且可解释的解决方案。

英文摘要

Early lane-change intention prediction is essential for autonomous driving and ADAS, but it remains challenging because lane-changing behavior depends on evolving traffic risk, surrounding-vehicle interactions, and target-lane feasibility rather than only instantaneous vehicle states. This study proposes an evolutionary physics-informed temporal fusion framework for three-class lane-change intention prediction, including left lane change, right lane change, and no lane change. Instead of using static physics-informed variables alone, the proposed method derives temporal descriptors from conventional traffic signals, including risk evolution, gap persistence, counterfactual lane utility, interaction pressure gradient, maneuver feasibility, and intent consistency. These descriptors are fused with temporal embeddings learned from raw trajectory sequences through a sequence encoder, and the fused representation is used for final classification. Experiments are conducted on the highD and exiD datasets under 1 s, 2 s, and 3 s prediction horizons. The proposed model achieves Macro F1-scores of 0.9514, 0.9256, and 0.8872 on highD, and 0.9386, 0.9070, and 0.8531 on exiD, respectively. The improvement is especially pronounced in exiD ramp-adjacent scenarios, indicating that temporal physical evolution is particularly useful in interaction-rich environments. These results demonstrate that combining evolutionary physics-informed descriptors with learned temporal representations provides a more dynamic and interpretable solution for early lane-change intention prediction.

2512.23076 2026-05-26 cs.LG cs.AI cs.HC 版本更新

Multimodal Functional Maximum Correlation for Emotion Recognition

多模态功能最大相关用于情感识别

Deyang Zheng, Tianyi Zhang, Wenming Zheng, Shujian Yu

发表机构 * Key Laboratory of Child Development and Learning Science (Ministry of Education), School of Biological Sciences and Medical Engineering, Southeast University(东南大学生物科学与医学工程学院儿童发展与学习科学教育部重点实验室) Department of Artificial Intelligence, Westlake University(西湖大学人工智能系) Department of Artificial Intelligence, Vrije Universiteit Amsterdam(阿姆斯特丹自由大学人工智能系)

AI总结 提出多模态功能最大相关(MFMC)框架,通过双重总相关目标最大化高阶多模态依赖,在情感识别基准上取得最先进性能。

Comments manuscript accepted by IEEE Transactions on Affective Computing. Code is available at https://github.com/DY9910/MFMC

详情
AI中文摘要

情绪状态表现为中枢和自主系统之间协调但异质的生理反应,这对情感计算中的多模态表示学习构成了基本挑战。学习这种联合动态因情感标注的稀缺性和主观性而进一步复杂化,这推动了自监督学习(SSL)的使用。然而,大多数现有的SSL方法依赖于成对对齐目标,这些目标不足以表征两个以上模态之间的依赖关系,也无法捕捉由协调的脑和自主反应产生的高阶交互。为了解决这一限制,我们提出了多模态功能最大相关(MFMC),一个原则性的SSL框架,通过双重总相关(DTC)目标最大化高阶多模态依赖。通过推导一个紧致的夹逼界并使用基于功能最大相关分析(FMCA)的迹替代进行优化,MFMC直接捕捉联合多模态交互,而不依赖于成对对比损失。在三个公开的情感计算基准上的实验表明,MFMC在受试者依赖和受试者独立评估协议下均一致地达到最先进或具有竞争力的性能,突显了其对受试者间变异性的鲁棒性。特别是,MFMC将CEAP-360VR上的受试者依赖准确率从78.9%提高到86.8%,仅使用EDA信号就将受试者独立准确率从27.5%提高到33.1%。此外,在MAHNOB-HCI最具挑战性的EEG受试者独立划分中,MFMC与最佳方法的差距在0.8个百分点以内。我们的代码可在https://github.com/DY9910/MFMC获取。

英文摘要

Emotional states manifest as coordinated yet heterogeneous physiological responses across central and autonomic systems, posing a fundamental challenge for multimodal representation learning in affective computing. Learning such joint dynamics is further complicated by the scarcity and subjectivity of affective annotations, which motivates the use of self-supervised learning (SSL). However, most existing SSL approaches rely on pairwise alignment objectives, which are insufficient to characterize dependencies among more than two modalities and fail to capture higher-order interactions arising from coordinated brain and autonomic responses. To address this limitation, we propose Multimodal Functional Maximum Correlation (MFMC), a principled SSL framework that maximizes higher-order multimodal dependence through a Dual Total Correlation (DTC) objective. By deriving a tight sandwich bound and optimizing it using a functional maximum correlation analysis (FMCA) based trace surrogate, MFMC captures joint multimodal interactions directly, without relying on pairwise contrastive losses. Experiments on three public affective computing benchmarks demonstrate that MFMC consistently achieves state-of-the-art or competitive performance under both subject-dependent and subject-independent evaluation protocols, highlighting its robustness to inter-subject variability. In particular, MFMC improves subject-dependent accuracy on CEAP-360VR from 78.9% to 86.8%, and subject-independent accuracy from 27.5% to 33.1% using the EDA signal alone. Moreover, MFMC remains within 0.8 percentage points of the best-performing method on the most challenging EEG subject-independent split of MAHNOB-HCI. Our code is available at https://github.com/DY9910/MFMC.

2512.15605 2026-05-26 cs.LG stat.ML 版本更新

Autoregressive Language Models are Secretly Energy-Based Models: Insights into the Lookahead Capabilities of Next-Token Prediction

自回归语言模型实际上是能量模型:对下一个词元预测的预见能力的洞察

Mathieu Blondel, Michael E. Sander, Germain Vivier-Ardisson, Tianlin Liu, Vincent Roulet

发表机构 * Google DeepMind(谷歌DeepMind)

AI总结 本文通过建立自回归模型与能量模型之间的双射,揭示了自回归模型在下一个词元预测范式下具备预见能力,并提供了理论误差界。

详情
AI中文摘要

自回归模型(ARMs)目前构成了大型语言模型(LLMs)的主导范式。能量模型(EBMs)代表了另一类模型,历史上在LLM发展中不太普遍,但自然地刻画了后训练对齐中的最优策略。在本文中,我们提供了这两类模型的统一视角。以概率链式法则为起点,我们在函数空间建立了ARMs和EBMs之间的显式双射,并证明这对应于最大熵强化学习中的软贝尔曼方程的一个特例。基于这一双射,我们推导了ARMs和EBMs的监督学习之间的等价性。此外,我们通过提供理论误差界分析了将EBMs蒸馏为ARMs的过程。我们的结果揭示了ARMs尽管基于下一个词元预测范式,却具备规划能力的原因。

英文摘要

Autoregressive models (ARMs) currently constitute the dominant paradigm for large language models (LLMs). Energy-based models (EBMs) represent another class of models, which have historically been less prevalent in LLM development, yet naturally characterize the optimal policy in post-training alignment. In this paper, we provide a unified view of these two model classes. Taking the chain rule of probability as a starting point, we establish an explicit bijection between ARMs and EBMs in function space, which we show to correspond to a special case of the soft Bellman equation in maximum entropy reinforcement learning. Building upon this bijection, we derive the equivalence between supervised learning of ARMs and EBMs. Furthermore, we analyze the distillation of EBMs into ARMs by providing theoretical error bounds. Our results provide insights into the ability of ARMs to plan ahead, despite being based on the next-token prediction paradigm.
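
The link between a globally normalized sequence model and next-token prediction can be checked by brute force on a toy problem. The sketch below is an illustration under stated assumptions rather than the paper's construction: the vocabulary, sequence length, and `score` function are hypothetical, and the score is given only for complete sequences. A soft value (log-sum-exp over continuations) turns the sequence-level score into next-token probabilities whose product equals the energy-based distribution.

```python
import itertools
import math

VOCAB = (0, 1, 2)   # hypothetical toy vocabulary
LENGTH = 3          # fixed sequence length

def score(seq):
    """Arbitrary sequence-level score (negative energy) for a full sequence."""
    return 0.3 * sum((i + 1) * tok for i, tok in enumerate(seq)) - 0.5 * (seq[0] == seq[-1])

def soft_value(prefix):
    """Log-sum-exp of the scores of all completions of `prefix`."""
    if len(prefix) == LENGTH:
        return score(prefix)
    return math.log(sum(math.exp(soft_value(prefix + (tok,))) for tok in VOCAB))

def arm_prob(seq):
    """Product of next-token probabilities exp(V(prefix + token) - V(prefix))."""
    prob = 1.0
    for t in range(LENGTH):
        prob *= math.exp(soft_value(seq[:t + 1]) - soft_value(seq[:t]))
    return prob

def ebm_prob(seq):
    """Globally normalized probability exp(score) / Z."""
    z = sum(math.exp(score(s)) for s in itertools.product(VOCAB, repeat=LENGTH))
    return math.exp(score(seq)) / z

# Largest disagreement between the two parameterizations over all sequences.
max_gap = max(abs(arm_prob(s) - ebm_prob(s))
              for s in itertools.product(VOCAB, repeat=LENGTH))
```

The product telescopes to exp(score - V(empty prefix)), and V of the empty prefix is the log partition function. Each next-token distribution therefore already accounts for all future continuations, which is the lookahead the abstract refers to.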

2512.10506 2026-05-26 eess.IV cs.LG eess.SP 版本更新

Hyperspectral Image Data Reduction for Endmember Extraction

用于端元提取的高光谱图像数据降维

Tomohiko Mizutani

发表机构 * Department of Mathematical and Systems Engineering, Shizuoka University(数学与系统工程系,静冈大学)

AI总结 针对高光谱图像端元提取计算成本高的问题,提出一种基于线性混合模型和纯像元假设的数据降维技术,去除混合像元,保留近端元像元,并结合线性规划自字典方法,在不牺牲提取精度的前提下显著降低计算时间。

Comments 37 pages, code is available at https://github.com/tomohiko-mizutani/REDIC

详情
AI中文摘要

从高光谱图像中提取端元旨在识别场景中存在的材料的光谱特征。最近的研究表明,自字典方法可以实现高提取精度;然而,其高计算成本限制了它们在大规模高光谱图像中的适用性。尽管已经提出了几种方法来解决这个问题,但它仍然是一个主要挑战。受此情况启发,本文采用数据降维方法。假设高光谱图像遵循线性混合模型且满足纯像元假设,我们开发了一种数据降维技术来去除对应于多个端元特征混合的像元。我们分析了该降维步骤的理论性质,并表明它保留了靠近端元的像元。基于这一结果,我们提出了一种数据降维自字典方法,该方法将数据降维与基于线性规划公式的自字典方法相结合。数值实验表明,所提出的方法可以在不牺牲端元提取精度的情况下,显著减少原始自字典方法的计算时间。

英文摘要

Endmember extraction from hyperspectral images aims to identify the spectral signatures of materials present in a scene. Recent studies have shown that self-dictionary methods can achieve high extraction accuracy; however, their high computational cost limits their applicability to large-scale hyperspectral images. Although several approaches have been proposed to mitigate this issue, it remains a major challenge. Motivated by this situation, this paper pursues a data reduction approach. Assuming that a hyperspectral image follows the linear mixing model with the pure-pixel assumption, we develop a data reduction technique to remove pixels corresponding to mixtures of multiple endmember signatures. We analyze the theoretical properties of this reduction step and show that it preserves pixels that lie close to the endmembers. Building on this result, we propose a data-reduced self-dictionary method that integrates the data reduction with a self-dictionary method based on a linear programming formulation. Numerical experiments demonstrate that the proposed method can substantially reduce the computational time of the original self-dictionary method without sacrificing endmember extraction accuracy.

2512.08974 2026-05-26 physics.ao-ph cs.LG 版本更新

FuXi-Nowcast: Environment-conditioned deep learning for severe convection nowcasting

FuXi-Nowcast:环境条件深度学习用于强对流临近预报

Lei Chen, Zijian Zhu, Xiaoran Zhuang, Tianyuan Qi, Yuxuan Feng, Xiaohui Zhong, Hao Li

发表机构 * Jiangsu Meteorological Observatory(江苏省气象台) Jiangsu Key Laboratory of Severe Storm Disaster Risk / Key Laboratory of Transportation Meteorology of CMA(江苏省强风暴灾害风险重点实验室 / 中国气象局交通气象重点实验室) FuXi Intelligent Computing Technology Co., Ltd.(伏羲智能计算技术有限公司)

AI总结 提出环境条件深度学习系统FuXi-Nowcast,结合高分辨率观测与三维大气预报,在12小时内预测复合反射率、降水、阵风及地表变量,优于业务数值、持续性和外推基线。

详情
AI中文摘要

强对流产生局地灾害,通常需要在雷达回波完全揭示风暴发展之前发出预警。对流触发和强对流的维持对于仅依赖雷达的临近预报仍具挑战性,因为雷达观测中可能缺乏对流前信号,且强回波在预报中常快速衰减。本文提出FuXi-Nowcast,一种环境条件深度学习系统,结合高分辨率观测与三维大气预报,预测未来12小时内的复合反射率、降水、阵风及地表变量。在2024年4月至7月华东地区的评估中,FuXi-Nowcast在反射率和降水方面优于业务数值、持续性和外推基线。案例研究、诊断和消融实验表明,大气湿度信息和对强对流信号的显式保留有助于对流触发和维持的预报。这些结果表明,环境条件约束可以缓解仅依赖雷达的临近预报在高影响对流天气中的重要失效模式。

英文摘要

Severe convection produces localized hazards that often require warnings before radar echoes fully reveal storm development. Convective initiation and the maintenance of intense convection remain challenging for radar-only nowcasting because pre-convective signals may be absent from recent radar observations and strong echoes often decay rapidly in forecasts. Here we present FuXi-Nowcast, an environment-conditioned deep learning system that combines high-resolution observations with three-dimensional atmospheric forecasts to predict composite reflectivity, precipitation, wind gusts, and surface variables up to 12 h ahead. In April--July 2024 evaluations over East China, FuXi-Nowcast outperforms operational numerical, persistence and extrapolation baselines for reflectivity and precipitation. Case studies, diagnostics, and ablation experiments suggest that atmospheric moisture information and explicit preservation of strong convective signals contribute to forecasts of convective initiation and maintenance. These results show that environmental conditioning can mitigate important failure modes of radar-only nowcasting for high-impact convective weather.

2512.08508 2026-05-26 q-bio.BM cs.LG 版本更新

Multi-Alignment Contrastive Learning for Enzyme--Reaction Retrieval

多对齐对比学习用于酶-反应检索

Gengmo Zhou, Feng Yu, Wenda Wang, Zhifeng Gao, Guolin Ke, Zhewei Wei, Zhen Wang

发表机构 * Renmin University of China(中国人民大学) DP Technology(DP科技)

AI总结 提出多对齐对比学习框架,通过联合建模酶-反应跨域兼容性及功能注释驱动的域内关系,并引入Gromov-Wasserstein正则化项,提升酶虚拟筛选和双向检索性能。

详情
AI中文摘要

识别催化目标生化反应的酶是计算酶发现和生物催化剂设计的关键步骤。最近的表示学习方法将这一问题表述为酶-反应匹配,其中配对的酶和反应被嵌入到共享空间中。然而,大多数现有方法主要依赖于成对的酶-反应监督,并且对反应集或酶家族内部关系的利用有限。本文介绍了一种用于生化检索的多对齐对比学习框架。该框架联合建模酶与反应之间的跨域兼容性以及由功能注释诱导的域内关系。此外,受Gromov-Wasserstein启发的正则化目标鼓励学习的酶和反应表示空间之间的几何一致性。通过将成对的催化监督与高阶关系对齐相结合,该模型捕获了直接的酶-反应关联以及更广泛的功能组织。我们在酶虚拟筛选和双向酶-反应检索任务上评估了该方法。在EnzymeMap上的实验表明,与强对比基线相比,在BEDROC和富集因子指标下,早期识别性能有所提高。在ReactZyme上,该方法在基于时间、酶相似性和反应相似性的划分中均取得了一致的增益,展示了对未见酶和未见反应的鲁棒性。消融研究进一步表明,域内对齐、功能监督和几何正则化项各自对观察到的改进有所贡献。这些结果表明,建模多种形式的对齐可以改进用于酶发现、反应注释及相关计算生物学应用的对比检索模型。

英文摘要

Identifying enzymes that catalyze target biochemical reactions is a key step in computational enzyme discovery and biocatalyst design. Recent representation-learning methods formulate this problem as enzyme--reaction matching, where paired enzymes and reactions are embedded into a shared space. However, most existing approaches primarily rely on pairwise enzyme--reaction supervision and make limited use of the relationships within reaction sets or enzyme families. This work introduces a multi-alignment contrastive learning framework for biochemical retrieval. The framework jointly models cross-domain compatibility between enzymes and reactions and within-domain relationships induced by functional annotations. In addition, a Gromov--Wasserstein-inspired regularization objective encourages geometric consistency between the learned enzyme and reaction representation spaces. By combining pairwise catalytic supervision with higher-order relational alignment, the model captures both direct enzyme--reaction associations and broader functional organization. We evaluate the approach on enzyme virtual screening and bidirectional enzyme--reaction retrieval tasks. Experiments on EnzymeMap show improved early-recognition performance under BEDROC and enrichment-factor metrics compared with strong contrastive baselines. On ReactZyme, the method achieves consistent gains across time-based, enzyme-similarity, and reaction-similarity splits, demonstrating robustness to unseen enzymes and unseen reactions. Ablation studies further indicate that within-domain alignment, functional supervision, and the geometric regularization term each contribute to the observed improvements. These results suggest that modeling multiple forms of alignment can improve contrastive retrieval models for enzyme discovery, reaction annotation, and related computational biology applications.

2512.05865 2026-05-26 cs.LG cs.AI 版本更新

Intrinsically Interpretable Attention via Sparse Post-Training

通过稀疏后训练实现内在可解释的注意力机制

Florent Draye, Anson Lei, Hsiao-Ru Pan, Ingmar Posner, Bernhard Schölkopf

发表机构 * MPI-IS(马克斯·普朗克研究所) University of Oxford(牛津大学) ETH Zürich(苏黎世联邦理工学院)

AI总结 提出一种后训练方法,通过约束损失下的灵活稀疏正则化,在不牺牲性能的前提下将Transformer注意力连接稀疏至约0.4%,从而简化全局电路并提升可解释性。

详情
AI中文摘要

我们引入一种简单的后训练方法,使Transformer注意力变得稀疏而不牺牲性能。在约束损失目标下应用灵活的稀疏正则化,我们在高达7B参数的模型上证明,可以将注意力连接减少到其边的约0.4%,同时保留原始预训练损失。与为计算效率设计的稀疏注意力方法不同,我们的方法利用稀疏性作为结构先验:它保留了能力,同时暴露出更有组织和可解释的连接模式。我们发现这种局部稀疏性级联成全局电路简化:特定任务的电路涉及更少的组件(注意力头和MLP),连接它们的边减少了多达100倍。此外,使用跨层转码器,我们表明稀疏注意力显著简化了注意力归因,实现了基于特征和基于电路视角的统一视图。这些结果表明,Transformer注意力可以变得稀疏几个数量级,表明其大部分计算是冗余的,并且稀疏性可以作为更结构化和可解释模型的指导原则。

英文摘要

We introduce a simple post-training method that makes transformer attention sparse without sacrificing performance. Applying a flexible sparsity regularisation under a constrained-loss objective, we show on models up to 7B parameters that it is possible to retain the original pretraining loss while reducing attention connectivity to approximately 0.4% of its edges. Unlike sparse-attention methods designed for computational efficiency, our approach leverages sparsity as a structural prior: it preserves capability while exposing a more organized and interpretable connectivity pattern. We find that this local sparsity cascades into global circuit simplification: task-specific circuits involve far fewer components (attention heads and MLPs) with up to 100x fewer edges connecting them. Additionally, using cross-layer transcoders, we show that sparse attention substantially simplifies attention attribution, enabling a unified view of feature-based and circuit-based perspectives. These results demonstrate that transformer attention can be made orders of magnitude sparser, suggesting that much of its computation is redundant and that sparsity may serve as a guiding principle for more structured and interpretable models.

2511.20236 2026-05-26 cs.AI cs.LG 版本更新

Actionable and diverse counterfactual explanations incorporating domain knowledge and plausibility constraints

结合领域知识和合理性约束的可操作且多样化的反事实解释

Szymon Bobek, Łukasz Bałec, Grzegorz J. Nalepa

发表机构 * Faculty of Physics, Astronomy and Applied Computer Science, Institute of Applied Computer Science, Jagiellonian Human-Centered AI Lab(物理、天文与应用计算机科学学院,应用计算机科学研究所,雅盖隆人机中心AI实验室)

AI总结 提出DANCE方法,通过建模特征依赖和领域约束生成可操作、多样化的反事实解释,在OpenML数据集和工业邮件营销场景中验证了其有效性和实用性。

AI中文摘要

反事实解释通过识别实现期望结果所需的最小变化来提高机器学习模型的可操作可解释性。然而,现有方法常常忽略特征之间的依赖关系,这可能导致不现实或不切实际的修改。这一限制降低了反事实解释在现实决策支持系统中的实用性。受网络安全中电子邮件营销应用的启发,我们提出了DANCE(多样化、可操作且知识约束的解释),一种生成反事实的方法,该方法结合了特征依赖和领域约束。DANCE使用线性和概率结构对特征之间的关系进行建模,这些结构可以从数据中学习或由专家指定。在搜索过程中强制执行这些依赖关系以提高合理性和可行性。该方法在一个统一的目标中联合优化合理性、多样性、邻近性和稀疏性。我们在OpenML的140个数据集上评估了DANCE,并证明它在多个评估标准上相比现有方法具有竞争性或更优的性能。此外,我们与一个电子邮件营销平台合作,在真实工业环境中验证了该方法,表明它能够产生符合领域且可操作的建议。

英文摘要

Counterfactual explanations improve the actionable interpretability of machine learning models by identifying minimal changes required to achieve a desired outcome. However, existing methods often neglect dependencies among features, which can lead to unrealistic or impractical modifications. This limitation reduces the usefulness of counterfactual explanations in real-world decision-support systems. Motivated by applications in cybersecurity for email marketing, we propose DANCE (Diverse, Actionable, and Knowledge-Constrained Explanations), a method for generating counterfactuals that incorporate feature dependencies and domain constraints. DANCE models relationships between features using linear and probabilistic structures that can be learned from data or specified by experts. These dependencies are enforced during the search process to improve plausibility and feasibility. The method jointly optimizes plausibility, diversity, proximity, and sparsity within a unified objective. We evaluate DANCE on 140 datasets from OpenML and demonstrate that it achieves competitive or superior performance compared to existing approaches across multiple evaluation criteria. Additionally, we validate the method in a real-world industrial setting in collaboration with an email marketing platform, showing that it produces domain-consistent and actionable recommendations.

2511.19065 2026-05-26 cs.CV cs.AI cs.LG 版本更新

Understanding, Accelerating, and Improving MeanFlow Training

理解、加速和改进MeanFlow训练

Jin-Young Kim, Hyojun Go, Lea Bogensperger, Julius Erbach, Nikolai Kalischek, Federico Tombari, Konrad Schindler, Dominik Narnhofer

发表机构 * Yonsei University(延世大学) ETH Zurich(苏黎世联邦理工学院) University of Zurich(苏黎世大学) Max Planck ETH CLS(马克斯·普朗克ETH CLS) Google(谷歌)

AI总结 通过分析瞬时速度与平均速度的相互作用,提出一种加速瞬时速度形成并逐步转移训练重点的有效训练方案,实现更快的收敛和更优的少步生成性能。

AI中文摘要

MeanFlow通过联合学习瞬时速度场和平均速度场,有望在少步内实现高质量生成建模。然而,其底层训练动态仍不清楚。我们分析两种速度之间的相互作用,发现:(i) 建立良好的瞬时速度是学习平均速度的前提;(ii) 当时间间隔较小时,瞬时速度的学习受益于平均速度,但随着间隔增大而退化;(iii) 任务亲和性分析表明,对于一步生成至关重要的大间隔平均速度的平滑学习,依赖于先形成准确的瞬时速度和小间隔平均速度。在这些观察的指导下,我们设计了一种有效的训练方案,加速瞬时速度的形成,然后将重点从短间隔平均速度转移到长间隔平均速度。我们改进的MeanFlow训练实现了更快的收敛和显著更好的少步生成:使用相同的DiT-XL骨干网络,我们的方法在1-NFE ImageNet 256x256上达到了令人印象深刻的FID 2.87,而传统的MeanFlow基线为3.43。或者,我们的方法以2.5倍更短的训练时间或使用更小的DiT-L骨干网络,匹配MeanFlow基线的性能。

英文摘要

MeanFlow promises high-quality generative modeling in few steps, by jointly learning instantaneous and average velocity fields. Yet, the underlying training dynamics remain unclear. We analyze the interaction between the two velocities and find: (i) well-established instantaneous velocity is a prerequisite for learning average velocity; (ii) learning of instantaneous velocity benefits from average velocity when the temporal gap is small, but degrades as the gap increases; and (iii) task-affinity analysis indicates that smooth learning of large-gap average velocities, essential for one-step generation, depends on the prior formation of accurate instantaneous and small-gap average velocities. Guided by these observations, we design an effective training scheme that accelerates the formation of instantaneous velocity, then shifts emphasis from short- to long-interval average velocity. Our enhanced MeanFlow training yields faster convergence and significantly better few-step generation: With the same DiT-XL backbone, our method reaches an impressive FID of 2.87 on 1-NFE ImageNet 256x256, compared to 3.43 for the conventional MeanFlow baseline. Alternatively, our method matches the performance of the MeanFlow baseline with 2.5x shorter training time, or with a smaller DiT-L backbone.

2511.12046 2026-05-26 cs.CR cs.AI cs.CV cs.LG 版本更新

BackWeak: Backdooring Knowledge Distillation Simply with Weak Triggers and Fine-tuning

BackWeak: 使用弱触发器和微调简单后门知识蒸馏

Shanmin Wang, Dongdong Zhao

发表机构 * School of Computer Science and Artificial Intelligence(计算机科学与人工智能学院) Wuhan University of Technology(武汉理工大学)

AI总结 提出BackWeak方法,通过微调教师模型嵌入弱触发器实现后门攻击,无需替代学生模型或模拟蒸馏,在标准蒸馏过程中可靠转移至不同学生架构。

AI中文摘要

知识蒸馏对于压缩大型模型至关重要,但依赖从第三方仓库下载的预训练“教师”模型引入了严重的安全风险——最显著的是后门攻击。现有的知识蒸馏后门方法通常复杂且计算密集:它们使用替代学生模型和模拟蒸馏来保证可转移性,并构建类似于通用对抗扰动(UAP)的触发器,这些触发器在幅度上不隐蔽,本质上表现出强烈的对抗行为。本文质疑这种复杂性是否必要,并构建了隐蔽的“弱”触发器——具有可忽略对抗效应的不可察觉扰动。我们提出了BackWeak,一种简单、无替代的攻击范式。BackWeak表明,通过使用非常小的学习率对良性教师模型进行微调并嵌入弱触发器,即可植入强大的后门。我们证明,这种精细的微调足以嵌入后门,在受害者的标准蒸馏过程中可靠地转移到不同的学生架构,从而实现高攻击成功率。在多个数据集、模型架构和知识蒸馏方法上的广泛实证评估表明,BackWeak比以往复杂的方法更高效、更简单,且通常更隐蔽。本文呼吁研究知识蒸馏后门攻击的学者特别关注触发器的潜在对抗特性。

英文摘要

Knowledge Distillation (KD) is essential for compressing large models, yet relying on pre-trained "teacher" models downloaded from third-party repositories introduces serious security risks, most notably backdoor attacks. Existing KD backdoor methods are typically complex and computationally intensive: they employ surrogate student models and simulated distillation to guarantee transferability, and construct triggers similar to universal adversarial perturbations (UAPs), which are not stealthy in magnitude and inherently exhibit strong adversarial behavior. This work questions whether such complexity is necessary and constructs stealthy "weak" triggers: imperceptible perturbations that have negligible adversarial effect. We propose BackWeak, a simple, surrogate-free attack paradigm. BackWeak shows that a powerful backdoor can be implanted by simply fine-tuning a benign teacher with a weak trigger using a very small learning rate. We demonstrate that this delicate fine-tuning is sufficient to embed a backdoor that reliably transfers to diverse student architectures during a victim's standard distillation process, yielding high attack success rates. Extensive empirical evaluations on multiple datasets, model architectures, and KD methods show that BackWeak is more efficient, simpler, and often stealthier than previous elaborate approaches. This work calls on researchers studying KD backdoor attacks to pay particular attention to the trigger's potential adversarial characteristics.

2511.03548 2026-05-26 cs.LG 版本更新

Flat Minima and Generalization: Insights from Stochastic Convex Optimization

平坦极小值与泛化:来自随机凸优化的见解

Matan Schliserman, Shira Vansover-Hager, Tomer Koren

发表机构 * Blavatnik School of Computer Science and AI, Tel Aviv University(特拉维夫大学Blavatnik计算机科学与人工智能学院) Google Research(谷歌研究)

AI总结 本文在随机凸优化框架下研究平坦极小值与泛化的关系,发现平坦经验极小值可能产生Ω(1)的总体风险,而尖锐极小值泛化最优,并证明两种锐度感知算法(SA-GD和SAM)也可能泛化不佳。

AI中文摘要

理解学习算法的泛化行为是学习理论的核心目标。最近一种新兴的解释是,学习算法在实践中成功是因为它们收敛到平坦极小值,而平坦极小值一直与改进的泛化性能相关联。在这项工作中,我们在非负、β-光滑目标的随机凸优化的经典设置中研究平坦极小值与泛化之间的联系。我们的第一个发现是,即使在这个基础且被充分研究的设置中,平坦的经验极小值可能产生平凡的Ω(1)总体风险,而尖锐极小值则能最优地泛化。然后,我们表明这种糟糕的泛化行为延伸到两种自然的“锐度感知”算法,这些算法最初由Foret等人(2021)提出,旨在将优化偏向平坦解:锐度感知梯度下降(SA-GD)和锐度感知最小化(SAM)。对于SA-GD,它在预定义邻域内对最大损失执行梯度步骤,我们证明虽然它成功以快速率收敛到平坦极小值,但解的总体风险仍然可能高达Ω(1),表明即使使用锐度感知梯度方法算法性地找到的平坦极小值也可能泛化不佳。对于SAM,一种基于归一化上升步骤的SA-GD计算高效近似,我们表明尽管它最小化经验损失,但可能收敛到尖锐极小值,并且也产生Ω(1)的总体风险。最后,我们使用算法稳定性技术为SA-GD和SAM建立了总体风险上界。

英文摘要

Understanding the generalization behavior of learning algorithms is a central goal of learning theory. A recently emerging explanation is that learning algorithms are successful in practice because they converge to flat minima, which have been consistently associated with improved generalization performance. In this work, we study the link between flat minima and generalization in the canonical setting of stochastic convex optimization with a non-negative, $\beta$-smooth objective. Our first finding is that, even in this fundamental and well-studied setting, flat empirical minima may incur trivial $\Omega(1)$ population risk while sharp minima generalize optimally. Then, we show that this poor generalization behavior extends to two natural "sharpness-aware" algorithms originally proposed by Foret et al. (2021), designed to bias optimization toward flat solutions: Sharpness-Aware Gradient Descent (SA-GD) and Sharpness-Aware Minimization (SAM). For SA-GD, which performs gradient steps on the maximal loss in a predefined neighborhood, we prove that while it successfully converges to a flat minimum at a fast rate, the population risk of the solution can still be as large as $\Omega(1)$, indicating that even flat minima found algorithmically using a sharpness-aware gradient method might generalize poorly. For SAM, a computationally efficient approximation of SA-GD based on normalized ascent steps, we show that although it minimizes the empirical loss, it may converge to a sharp minimum and also incur population risk $\Omega(1)$. Finally, we establish population risk upper bounds for both SA-GD and SAM using algorithmic stability techniques.
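
As a rough illustration of the "normalized ascent step" that the abstract attributes to SAM, the sketch below implements the commonly used SAM update on a toy quadratic. The objective, the step size `lr`, and the radius `rho` are assumptions chosen for illustration; they are not the constructions or parameter choices analyzed in the paper.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05, eps=1e-12):
    """One SAM update: normalized ascent step, then descent with the perturbed gradient."""
    g = grad_fn(w)
    # Ascent: move a distance rho along the normalized gradient direction.
    w_adv = w + rho * g / (np.linalg.norm(g) + eps)
    # Descent: apply the gradient evaluated at the perturbed point to the original iterate.
    return w - lr * grad_fn(w_adv)

# Toy convex, smooth, non-negative objective f(w) = 0.5 * ||w||^2 (illustrative only).
grad_fn = lambda w: w

w = np.array([1.0, -2.0])
for _ in range(200):
    w = sam_step(w, grad_fn)
final_norm = float(np.linalg.norm(w))
```

With a fixed `rho`, the iterate on this toy problem ends up in a small neighborhood of the minimizer rather than exactly at it, because the ascent step always has length `rho` regardless of how small the gradient is.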

2511.03529 2026-05-26 cs.LG 版本更新

Byzantine-Robust Federated Learning with Learnable Aggregation Weights

具有可学习聚合权重的拜占庭鲁棒联邦学习

Javad Parsa, Amir Hossein Daghestani, André M. H. Teixeira, Mikael Johansson

发表机构 * Uppsala University, Sweden(瑞典乌普萨拉大学) KTH, Sweden(瑞典皇家理工学院)

AI总结 提出一种将聚合权重作为可学习参数联合优化的拜占庭鲁棒联邦学习优化问题,并开发了交替最小化算法,在异构数据和恶意客户端场景下优于现有方法。

Comments ICLR 2026

AI中文摘要

联邦学习(FL)使客户端能够在不共享私有数据的情况下协作训练全局模型。然而,恶意(拜占庭)客户端的存在对FL的鲁棒性构成了重大挑战,尤其是在客户端数据分布异构的情况下。在本文中,我们提出了一种新颖的拜占庭鲁棒FL优化问题,该问题将自适应加权引入聚合过程。与传统方法不同,我们的公式将聚合权重视为可学习参数,与全局模型参数联合优化。为了解决这个优化问题,我们开发了一种交替最小化算法,在对抗攻击下具有强收敛保证。我们分析了所提目标的拜占庭弹性。我们在各种数据集和攻击场景下,将我们的算法与最先进的拜占庭鲁棒FL方法进行了性能评估。实验结果表明,我们的方法始终优于现有方法,特别是在数据高度异构且恶意客户端比例较大的情况下。

英文摘要

Federated Learning (FL) enables clients to collaboratively train a global model without sharing their private data. However, the presence of malicious (Byzantine) clients poses significant challenges to the robustness of FL, particularly when data distributions across clients are heterogeneous. In this paper, we propose a novel Byzantine-robust FL optimization problem that incorporates adaptive weighting into the aggregation process. Unlike conventional approaches, our formulation treats aggregation weights as learnable parameters, jointly optimizing them alongside the global model parameters. To solve this optimization problem, we develop an alternating minimization algorithm with strong convergence guarantees under adversarial attack. We analyze the Byzantine resilience of the proposed objective. We evaluate the performance of our algorithm against state-of-the-art Byzantine-robust FL approaches across various datasets and attack scenarios. Experimental results demonstrate that our method consistently outperforms existing approaches, particularly in settings with highly heterogeneous data and a large proportion of malicious clients.

2510.22827 2026-05-26 cs.CV cs.LG 版本更新

FairJudge: Abstention-Aware Multimodal Judges for Fairness and Alignment Evaluation in Text-to-Image Models

FairJudge: 文本到图像模型中公平性与对齐评估的弃权感知多模态裁判

Zahraa Al Sahili, Maimuna Nowaz, Maryam Fetanat, Ioannis Patras, Matthew Purver

发表机构 * Queen Mary University of London(伦敦玛丽女王大学) Institut Jožef Stefan(约瑟夫·斯特凡研究所) Imperial College London(帝国理工学院)

AI总结 提出FairJudge协议,利用多模态大语言模型作为结构化裁判,通过封闭标签、弃权机制和证据报告,在文本到图像模型中实现社会属性预测、职业定位和提示-图像对齐的公平性评估。

AI中文摘要

评估文本到图像(T2I)系统不仅需要判断图像是否匹配提示,还需要判断社会显著属性是否被忠实表示且没有无根据的推断。现有的自动评估器通常依赖于以面部为中心的识别器或对比式图像-文本相似度,这些方法提供的诊断反馈有限,并且往往在视觉证据模糊或缺失时仍强制给出预测。对于宗教和残疾等公平敏感属性,其线索可能是上下文相关的、间接的或有意未指定的,因此这些评估器可能会遗漏细心的人类评审员会注意到的失败模式。我们引入了FairJudge,一种弃权感知的评估协议,该协议使用遵循指令的多模态LLM作为社会属性预测、职业定位和提示-图像对齐的结构化裁判。该协议将输出限制为封闭标签集,要求给出基于可见证据的理由,在线索不足时支持明确的unspecified决策,并将基于量规的对齐判断映射到$[-1,1]$。这些约束将MLLM裁判从开放式评估转变为可解析、可审计的评估程序。在四个属性预测基准和三个职业/对齐基准上,FairJudge优于或补充了CLIP、DeepFace、VIEScore和VQAScore。消融实验表明,封闭标签、弃权和证据报告对可靠性至关重要。我们进一步引入了DIVERSIFY和DIVERSIFY-Professions,这两个上下文丰富的资源用于评估超越面部可见或标志性线索的社会表示和职业定位。我们发布了代码、提示、数据集、解析器日志和每张图像的裁判输出,以支持可重复的审计。

英文摘要

Evaluating text-to-image (T2I) systems requires judging not only whether an image matches a prompt, but also whether socially salient attributes are represented faithfully and without unsupported inference. Existing automated evaluators typically rely on face-centric recognizers or contrastive image-text similarity, which provide limited diagnostic feedback and often force predictions even when visual evidence is ambiguous or absent. For fairness-sensitive attributes such as religion and disability, where cues may be contextual, indirect, or intentionally unspecified, these evaluators can therefore miss failure modes that careful human reviewers would notice. We introduce FairJudge, an abstention-aware evaluation protocol that uses instruction-following multimodal LLMs as structured judges for social-attribute prediction, profession grounding, and prompt-image alignment. The protocol constrains outputs to closed label sets, requires visible-evidence rationales, supports an explicit "unspecified" decision when cues are insufficient, and maps rubric-based alignment judgments to $[-1,1]$. These constraints turn MLLM judging from open-ended assessment into a parseable, auditable evaluation procedure. Across four attribute-prediction benchmarks and three profession/alignment benchmarks, FairJudge outperforms or complements CLIP, DeepFace, VIEScore, and VQAScore. Ablations show that closed labels, abstention, and evidence reporting are central to reliability. We further introduce DIVERSIFY and DIVERSIFY-Professions, two context-rich resources for evaluating social representation and profession grounding beyond face-visible or iconic cues. We release code, prompts, datasets, parser logs, and per-image judge outputs to support reproducible auditing.
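
The protocol constraints listed in the abstract (closed label sets, evidence rationales, an explicit abstention label, and a rubric mapped to $[-1,1]$) can be sketched as a small parser. Everything concrete here is hypothetical: the JSON reply format, the `profession` label set, the fallback-to-abstention policy for malformed replies, and the 1-to-5 rubric scale are assumptions for illustration, not the released FairJudge prompts or parser.

```python
import json

# Hypothetical closed label set; the real protocol defines its own labels.
LABELS = {"profession": {"doctor", "teacher", "unspecified"}}

def parse_judgment(raw, attribute, labels=LABELS):
    """Parse a judge reply, abstaining when it is malformed, off-label, or lacks evidence."""
    abstain = {"label": "unspecified", "evidence": "", "valid": False}
    try:
        reply = json.loads(raw)
    except json.JSONDecodeError:
        return abstain
    if not isinstance(reply, dict):
        return abstain
    label = reply.get("label")
    evidence = str(reply.get("evidence", "")).strip()
    # A non-abstaining label must come from the closed set and cite visible evidence.
    if label not in labels[attribute] or (label != "unspecified" and not evidence):
        return abstain
    return {"label": label, "evidence": evidence, "valid": True}

def rubric_to_score(rating, low=1, high=5):
    """Linearly map a rubric rating in [low, high] to [-1, 1] (assumed rubric scale)."""
    if not low <= rating <= high:
        raise ValueError("rating outside rubric range")
    return 2.0 * (rating - low) / (high - low) - 1.0

ok = parse_judgment('{"label": "doctor", "evidence": "white coat and stethoscope"}', "profession")
off_label = parse_judgment('{"label": "astronaut", "evidence": "helmet"}', "profession")
```

Keeping the parser strict is what makes the outputs auditable in this sketch: any reply that cannot be checked against the label set is recorded as invalid rather than silently coerced.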

2510.22186 2026-05-26 cs.LG cs.IT math.FA math.IT math.MG 版本更新

Quantitative Bounds for Sorting-Based Permutation-Invariant Embeddings

基于排序的置换不变嵌入的定量界

Nadav Dym, Matthias Wellershoff, Efstratios Tsoukanis, Daniel Levy, Radu Balan

发表机构 * Department of Mathematics, University of Maryland(马里兰大学数学系) Institute of Mathematical Sciences, Claremont Graduate University(克莱蒙特研究生大学数学科学研究所)

AI总结 研究通过排序独立一维投影得到的置换不变嵌入,改进了单射性所需嵌入维度的上界并给出下界,并给出了双Lipschitz常数的估计,其失真度与点数n的平方成正比且与维度d无关。

Comments Minor revision; 37 pages, 1 figure, 2 tables

Journal ref IEEE Trans. Inf. Theory, vol. 72, no. 6, pp. 4297-4311, Jun. 2026

AI中文摘要

我们研究$d$维点集的置换不变嵌入,这些嵌入通过对输入的$D$个独立一维投影进行排序来定义。此类嵌入出现在图深度学习中,其输出应对图节点的置换保持不变。先前的工作表明,对于足够大的$D$和处于一般位置的投影,该映射是单射的,并且满足双Lipschitz条件。然而,仍存在两个空白:首先,单射性所需的最优大小$D$尚不清楚;其次,尚无该映射双Lipschitz常数的估计。本文在解决这两个空白方面取得了实质性进展。针对第一个空白,我们改进了单射性所需嵌入维度$D$的最佳已知上界,并给出了最小单射维度的下界。针对第二个空白,我们构造了投影向量矩阵,使得映射的双Lipschitz失真度随点数$n$呈二次增长,且完全独立于维度$d$。我们还证明,对于任何投影向量的选择,映射的失真度都不会优于一个与$n$的平方根成比例的界。最后,我们表明,即使对映射施加线性投影以降低其维度,也能给出类似的保证。

英文摘要

We study permutation-invariant embeddings of $d$-dimensional point sets, which are defined by sorting $D$ independent one-dimensional projections of the input. Such embeddings arise in graph deep learning where outputs should be invariant to permutations of graph nodes. Previous work showed that for large enough $D$ and projections in general position, this mapping is injective, and moreover satisfies a bi-Lipschitz condition. However, two gaps remain: firstly, the optimal size $D$ required for injectivity is not yet known, and secondly, no estimates of the bi-Lipschitz constants of the mapping are known. In this paper, we make substantial progress in addressing both of these gaps. Regarding the first gap, we improve upon the best known upper bounds for the embedding dimension $D$ necessary for injectivity, and also provide a lower bound on the minimal injectivity dimension. Regarding the second gap, we construct matrices of projection vectors, so that the bi-Lipschitz distortion of the mapping depends quadratically on the number of points $n$, and is completely independent of the dimension $d$. We also show that for any choice of projection vectors, the distortion of the mapping will never be better than a bound proportional to the square root of $n$. Finally, we show that similar guarantees can be provided even when linear projections are applied to the mapping to reduce its dimension.
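
The embedding described above can be written in a few lines: project the $n$ points onto $D$ directions and sort each one-dimensional projection. The sketch below uses random Gaussian directions and arbitrary sizes purely as assumptions; the paper's bounds concern how large $D$ must be and how the directions are chosen, which this toy example does not address.

```python
import numpy as np

def sort_embedding(X, A):
    """Sort-based embedding of a point set.

    X: (n, d) array whose rows are the points.
    A: (d, D) array whose columns are the projection vectors.
    Returns an (n, D) array; column j is the sorted projection of all points onto A[:, j].
    """
    return np.sort(X @ A, axis=0)

rng = np.random.default_rng(0)
n, d, D = 5, 3, 8                 # illustrative sizes only
X = rng.normal(size=(n, d))
A = rng.normal(size=(d, D))       # random directions, an assumption for this sketch

perm = rng.permutation(n)
E1 = sort_embedding(X, A)
E2 = sort_embedding(X[perm], A)   # same point set, rows reordered
```

Because sorting discards the order in which the points were listed, `E1` and `E2` agree, which is the permutation invariance the abstract refers to.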

2510.19731 2026-05-26 eess.SY cs.LG cs.SY 版本更新

Bridging Earth and Space: A Survey on HAPS for Non-Terrestrial Networks

连接地球与太空:面向非地面网络的HAPS综述

G. Svistunov, A. Akhtarshenas, D. López-Pérez, M. Giordani, G. Geraci, H. Yanikomeroglu

发表机构 * Universitat Politècnica de València(瓦伦西亚理工大学) University of Padova(帕多瓦大学) Universitat Pompeu Fabra(庞培法布拉大学) Carleton University(卡尔顿大学)

AI总结 本文综述了高空平台站(HAPS)在6G非地面网络中的用例、技术及集成策略,强调了其在扩展覆盖、动态回传、大规模物联网和低延迟通信中的关键作用。

Comments 43 pages. This work has been submitted to IEEE for possible publication (under review)

AI中文摘要

HAPS正在成为6G无线网络演进中的关键推动者,连接地面和非地面基础设施。HAPS在平流层运行,能够提供广域覆盖、低延迟、高能效的宽带通信,并为各种应用提供灵活的部署选项。本综述全面概述了HAPS在6G生态系统中的用例、技术和集成策略。讨论了HAPS在扩展未覆盖区域连接、支持动态回传、实现大规模物联网以及为自主和沉浸式服务提供可靠低延迟通信方面的作用。本文回顾了地面和非地面网络集成的最先进架构,并强调了最近的现场试验。此外,还研究了关键使能技术,如信道建模、AI驱动的资源分配、干扰控制、移动管理和高能效通信。本文还概述了开放的研究挑战。通过解决现有文献中的空白,本综述将HAPS定位为全球集成、有弹性和可持续的6G网络的基础组成部分。

英文摘要

High-altitude platform stations (HAPS) are emerging as key enablers in the evolution of 6G wireless networks, bridging terrestrial and non-terrestrial infrastructures. Operating in the stratosphere, HAPS can provide wide-area coverage and low-latency, energy-efficient broadband communications with flexible deployment options for diverse applications. This survey delivers a comprehensive overview of HAPS use cases, technologies, and integration strategies within the 6G ecosystem. The roles of HAPS in extending connectivity to underserved regions, supporting dynamic backhauling, enabling massive IoT, and delivering reliable low-latency communications for autonomous and immersive services are discussed. The paper reviews state-of-the-art architectures for terrestrial and non-terrestrial network integration and highlights recent field trials. Furthermore, key enabling technologies such as channel modeling, AI-driven resource allocation, interference control, mobility management, and energy-efficient communications are examined. The paper also outlines open research challenges. By addressing existing gaps in the literature, this survey positions HAPS as a foundational component of globally integrated, resilient, and sustainable 6G networks.

2510.11296 2026-05-26 cs.CV cs.LG 版本更新

$Δ\mathrm{Energy}$: Optimizing Energy Change During Vision-Language Alignment Improves both OOD Detection and OOD Generalization

$Δ\mathrm{Energy}$: 优化视觉-语言对齐过程中的能量变化提升OOD检测与OOD泛化

Lin Zhu, Yifeng Yang, Xinbing Wang, Qinying Gu, Nanyang Ye

发表机构 * Shanghai Jiao Tong University(上海交通大学) Shanghai Artificial Intelligence Laboratory(上海人工智能实验室)

AI总结 本文提出ΔEnergy分数,利用重新对齐视觉-语言模态时的能量变化来同时提升分布外检测和分布外泛化性能,并基于ΔEnergy的下界最大化(EBM)开发了统一的微调框架。

Comments Accepted by NeurIPS 2025

AI中文摘要

近期针对视觉-语言模型(VLM)的方法在下游任务快速适应中取得了显著成功。当应用于真实世界下游任务时,VLM不可避免地会遇到分布内(ID)数据和分布外(OOD)数据。OOD数据集通常包括协变量偏移(例如,已知类别但图像风格变化)和语义偏移(例如,测试时未见类别)。这凸显了提升VLM对协变量偏移OOD数据的泛化能力,同时有效检测开放集语义偏移OOD类别的重要性。本文受重新对齐视觉-语言模态时(具体通过将最大余弦相似度直接降低到低值)观察到的闭集数据中显著能量变化的启发,提出了一种新的OOD分数,命名为ΔEnergy。ΔEnergy显著优于基于能量的原始OOD分数,为OOD检测提供了更可靠的方法。此外,ΔEnergy还能同时提升协变量偏移下的OOD泛化,这是通过ΔEnergy的下界最大化(称为EBM)实现的。理论上证明EBM不仅能增强OOD检测,还能产生领域一致的Hessian矩阵,这作为OOD泛化的强指标。基于这一发现,我们开发了一个统一的微调框架,能够提升VLM在OOD泛化和OOD检测两方面的鲁棒性。在具有挑战性的OOD检测和泛化基准上的大量实验证明了我们方法的优越性,在AUROC上比近期方法提升了10%到25%。

英文摘要

Recent approaches for vision-language models (VLMs) have shown remarkable success in achieving fast downstream adaptation. When applied to real-world downstream tasks, VLMs inevitably encounter both in-distribution (ID) and out-of-distribution (OOD) data. OOD datasets often include both covariate shifts (e.g., known classes with changes in image styles) and semantic shifts (e.g., test-time unseen classes). This highlights the importance of improving VLMs' generalization ability to covariate-shifted OOD data, while effectively detecting open-set semantic-shifted OOD classes. In this paper, inspired by the substantial energy change observed in closed-set data when re-aligning vision-language modalities (specifically by directly reducing the maximum cosine similarity to a low value), we introduce a novel OOD score, named ΔEnergy. ΔEnergy significantly outperforms the vanilla energy-based OOD score and provides a more reliable approach for OOD detection. Furthermore, ΔEnergy can simultaneously improve OOD generalization under covariate shifts, which is achieved by lower-bound maximization for ΔEnergy (termed EBM). EBM is theoretically proven to not only enhance OOD detection but also yield a domain-consistent Hessian, which serves as a strong indicator for OOD generalization. Based on this finding, we developed a unified fine-tuning framework that allows for improving VLMs' robustness in both OOD generalization and OOD detection. Extensive experiments on challenging OOD detection and generalization benchmarks demonstrate the superiority of our method, outperforming recent approaches by 10% to 25% in AUROC.
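
To make the idea of an "energy change after re-alignment" more tangible, here is a toy sketch under strong assumptions. It uses the standard negative log-sum-exp energy over image-text cosine similarities and models re-alignment by overwriting the largest similarity with a low value. The temperature `tau`, the value `low`, the sign convention, and the function names are all hypothetical; the paper's exact definition of ΔEnergy may differ.

```python
import numpy as np

def logsumexp(z):
    """Numerically stable log-sum-exp of a 1-D array."""
    m = np.max(z)
    return m + np.log(np.sum(np.exp(z - m)))

def energy(sims, tau=0.01):
    """Energy score over cosine similarities (tau is an assumed temperature)."""
    return -logsumexp(np.asarray(sims, dtype=float) / tau)

def delta_energy(sims, low=0.0, tau=0.01):
    """Energy change after the largest similarity is forced down to `low` (assumed re-alignment)."""
    sims = np.asarray(sims, dtype=float)
    masked = sims.copy()
    masked[np.argmax(sims)] = low
    return energy(masked, tau) - energy(sims, tau)

peaked = delta_energy([0.9, 0.1, 0.1])   # one class clearly matches the image
flat = delta_energy([0.3, 0.3, 0.3])     # no class stands out
```

Under this toy definition, a sample whose similarities are dominated by one class shows a much larger energy change than a sample with flat similarities, which is the kind of gap an OOD score could threshold.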

2510.10921 2026-05-26 cs.CV cs.AI cs.LG 版本更新

FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model

FG-CLIP 2: 一种双语细粒度视觉-语言对齐模型

Chunyu Xie, Bin Wang, Fanjing Kong, Jincheng Li, Dawei Liang, Ji Ao, Dawei Leng, Yuhui Yin

发表机构 * 360 AI Research(360人工智能研究院)

AI总结 提出FG-CLIP 2双语视觉语言模型,通过区域-文本匹配、长描述建模和文本内模态对比损失等细粒度监督,在英中双语上实现细粒度对齐,在29个数据集上取得最优结果。

Comments Accepted at ICML 2026

AI中文摘要

细粒度视觉-语言理解需要视觉内容与语言描述之间的精确对齐,这一能力在当前模型中仍然有限,尤其是在非英语环境下。虽然CLIP等模型在全局对齐上表现良好,但它们往往难以捕捉对象属性、空间关系和语言表达中的细粒度细节,且对双语理解的支持有限。为应对这些挑战,我们提出了FG-CLIP 2,一个旨在推进英语和中文细粒度对齐的双语视觉语言模型。我们的方法利用了丰富的细粒度监督,包括区域-文本匹配和长描述建模,以及多个判别性目标。我们进一步引入了文本内模态对比损失,以更好地区分语义相似的描述。在精心策划的大规模英语和中文数据混合上训练,包括新发布的1200万中文区域-文本数据集,FG-CLIP 2实现了强大的双语性能。为进行严格评估,我们提出了一个新的中文多模态理解基准,包括长描述检索和边界框分类。在8个任务的29个数据集上的大量实验表明,FG-CLIP 2优于现有方法,在两种语言上均达到了最先进的结果。我们发布了模型、代码和基准,以促进双语细粒度视觉-语言对齐的未来研究。

英文摘要

Fine-grained vision-language understanding requires precise alignment between visual content and linguistic descriptions, a capability that remains limited in current models, particularly in non-English settings. While models like CLIP perform well on global alignment, they often struggle to capture fine-grained details in object attributes, spatial relations, and linguistic expressions, with limited support for bilingual comprehension. To address these challenges, we introduce FG-CLIP 2, a bilingual vision-language model designed to advance fine-grained alignment for both English and Chinese. Our approach leverages rich fine-grained supervision, including region-text matching and long-caption modeling, alongside multiple discriminative objectives. We further introduce the Textual Intra-modal Contrastive (TIC) loss to better distinguish semantically similar captions. Trained on a carefully curated mixture of large-scale English and Chinese data, including a newly released 12M Chinese region-text dataset, FG-CLIP 2 achieves powerful bilingual performance. To enable rigorous evaluation, we present a new benchmark for Chinese multimodal understanding, featuring long-caption retrieval and bounding box classification. Extensive experiments on 29 datasets across 8 tasks show that FG-CLIP 2 outperforms existing methods, achieving state-of-the-art results in both languages. We release the model, code, and benchmark to facilitate future research on bilingual fine-grained vision-language alignment.

2510.08558 2026-05-26 cs.AI cs.CL cs.IR cs.LG 版本更新

Agent Learning via Early Experience

通过早期经验进行智能体学习

Kai Zhang, Xiangchao Chen, Bo Liu, Tianci Xue, Zeyi Liao, Zhihan Liu, Xiyao Wang, Yuting Ning, Zhaorun Chen, Xiaohan Fu, Jian Xie, Yuxuan Sun, Boyu Gou, Qi Qi, Zihang Meng, Jianwei Yang, Ning Zhang, Xian Li, Ashish Shah, Dat Huynh, Hengduo Li, Zi Yang, Sara Cao, Lawrence Jang, Shuyan Zhou, Jiacheng Zhu, Huan Sun, Jason Weston, Yu Su, Yifan Wu

发表机构 * Meta Superintelligence Labs(Meta超智能实验室) FAIR at Meta(Meta的FAIR部门) The Ohio State University(俄亥俄州立大学)

AI总结 提出早期经验范式,利用智能体自身动作生成的交互数据(无需奖励信号)通过隐式世界建模和自我反思两种策略提升智能体在多样化环境中的效果和跨域泛化能力。

Comments ICML 2026

AI中文摘要

语言智能体的一个长期目标是通过自身经验学习和改进,最终在复杂的现实任务中超越人类。然而,在缺乏可验证奖励(如网站)或需要低效长程展开(如多轮工具使用)的许多环境中,基于经验数据使用强化学习训练智能体仍然困难。因此,当前大多数智能体依赖专家数据的监督微调,这难以扩展且泛化能力差。这一局限性源于专家示范的本质:它们只捕获了狭窄的场景范围,并使智能体暴露于有限的环境多样性。我们通过一种称为早期经验的中间范式来解决这一局限性:由智能体自身动作生成的交互数据,其中产生的未来状态作为监督信号,无需奖励。在此范式下,我们研究了使用此类数据的两种策略:(1)隐式世界建模,利用收集的状态将策略基于环境动态;(2)自我反思,智能体从其次优动作中学习以改进推理和决策。在八个多样化环境和多个模型家族上的评估表明,我们的方法持续提升了有效性和跨域泛化,凸显了早期经验的价值。此外,在具有可验证奖励的环境中,我们的结果提供了有希望的信号,表明早期经验为后续强化学习奠定了坚实基础,使其成为模仿学习与完全经验驱动智能体之间的实用桥梁。

英文摘要

A long-term goal of language agents is to learn and improve through their own experience, ultimately outperforming humans in complex, real-world tasks. However, training agents from experience data with reinforcement learning remains difficult in many environments, which either lack verifiable rewards (e.g., websites) or require inefficient long-horizon rollouts (e.g., multi-turn tool use). As a result, most current agents rely on supervised fine-tuning on expert data, which is challenging to scale and generalizes poorly. This limitation stems from the nature of expert demonstrations: they capture only a narrow range of scenarios, and expose the agent to limited environment diversity. We address this limitation with a middle-ground paradigm we call early experience: interaction data generated by the agent's own actions, where the resulting future states serve as supervision without reward signals. Within this paradigm, we study two strategies of using such data: (1) implicit world modeling, which uses collected states to ground the policy in environment dynamics; and (2) self-reflection, where the agent learns from its suboptimal actions to improve reasoning and decision-making. Evaluation across eight diverse environments and multiple model families shows that our approaches consistently improve effectiveness and out-of-domain generalization, highlighting the value of early experience. Moreover, in environments with verifiable rewards, our results provide promising signals that early experience offers a strong foundation for subsequent reinforcement learning, making it a practical bridge between imitation learning and fully experience-driven agents.

2510.08350 2026-05-26 cs.LG cs.AI 版本更新

DeepEN: A Deep Reinforcement Learning Framework for Personalized Enteral Nutrition in Critical Care

DeepEN: 一种用于重症监护中个性化肠内营养的深度强化学习框架

Daniel Jason Tan, Jiayang Chen, Dilruk Perera, Kay Choong See, Mengling Feng

发表机构 * Institute of Data Science(数据科学研究所) Saw Swee Hock School of Public Health, National University of Singapore, Singapore(Saw Swee Hock公共卫生学院,新加坡国立大学,新加坡) National University Hospital, Singapore(新加坡国立医院)

AI总结 提出DeepEN框架,利用深度强化学习从电子健康记录中学习个性化肠内营养方案,在MIMIC-IV数据集上相比临床实践降低绝对死亡率4.0个百分点。

AI中文摘要

目的:由于个性化程度有限以及在动态代谢需求下对适当热量、蛋白质和液体目标的不确定性,ICU中的肠内营养(EN)输送仍不理想。我们引入DeepEN,一个使用电子健康记录数据进行个性化EN优化的强化学习(RL)框架。方法:DeepEN在来自MIMIC-IV的超过11,000名ICU患者上训练,以生成每4小时一次、针对患者的卡路里、蛋白质和液体目标。状态表示包括人口统计学、合并症、生命体征、实验室值和近期干预措施。一个生理学对齐的奖励框架平衡了生物标志物稳定性与长期生存。策略学习采用带有保守Q学习正则化的决斗双深度Q网络,以实现安全的离线训练。结果:DeepEN实现了最高的估计策略价值($V^{\pi} = 9.48$)和最低的校准死亡率(18.8 ± 1.0%),与临床医生的实践(22.8%)相比绝对降低了4.0个百分点。该策略还表现出优越的代谢稳定性,实现了目标范围内葡萄糖、磷酸盐和钠值的最高比例。此外,偏离DeepEN策略与死亡率和生物标志物不稳定性独立相关,而偏离随机策略则没有这种关联。可解释性分析进一步表明,建议是基于器官功能和代谢状态的生理相关标志物,而不是静态剂量启发式。结论:DeepEN证明了保守离线RL在安全、个性化EN优化中的可行性,突出了数据驱动个性化在重症监护中补充基于指南方法的潜力。

英文摘要

Objective: Enteral nutrition (EN) delivery in the ICU remains suboptimal due to limited personalization and uncertainty regarding appropriate calorie, protein, and fluid targets under dynamic metabolic demands. We introduce DeepEN, a reinforcement learning (RL) framework for personalized EN optimization using electronic health record data. Methods: DeepEN was trained on over 11,000 ICU patients from MIMIC-IV to generate 4-hourly, patient-specific caloric, protein, and fluid targets. The state representation incorporated demographics, comorbidities, vital signs, laboratory values, and recent interventions. A physiologically aligned reward framework balanced biomarker stability with long-term survival. Policy learning employed a dueling double deep Q-network with Conservative Q-Learning regularization to enable safe offline training. Results: DeepEN achieved the highest estimated policy value ($V^{\pi} = 9.48$) and the lowest calibrated mortality (18.8 ± 1.0%), representing a 4.0 percentage-point absolute reduction compared with clinician practice (22.8%). The policy also demonstrated superior metabolic stability, achieving the highest proportion of glucose, phosphate, and sodium values within target range. Furthermore, deviation from the DeepEN policy was independently associated with increased mortality and biomarker instability, whereas deviation from a random policy showed no such association. Interpretability analyses further indicated that recommendations were conditioned on physiologically relevant markers of organ function and metabolic status rather than static dosing heuristics. Conclusion: DeepEN demonstrates the feasibility of conservative offline RL for safe, individualized EN optimization, highlighting the potential of data-driven personalization to complement guideline-based approaches in critical care.

2510.06672 2026-05-26 cs.LG 版本更新

XRPO: Pushing the limits of GRPO with Targeted Exploration and Exploitation

XRPO:通过定向探索与利用突破GRPO极限

Udbhav Bamba, Minghao Fang, Yifan Yu, Haizhong Zheng, Fan Lai

发表机构 * University of Illinois Urbana–Champaign(伊利诺伊大学厄巴纳-香槟分校) Carnegie Mellon University(卡内基梅隆大学)

AI总结 提出XRPO框架,通过自适应探索分配器、上下文种子策略和新颖性感知优势机制,在数学和编码基准上实现比GRPO最高4% pass@1和6% cons@32的提升,并加速训练收敛达2.7倍。

AI中文摘要

GRPO等强化学习算法推动了大型语言模型推理的最新进展。虽然增加rollout数量可以稳定训练,但现有方法在具有挑战性的提示上探索有限,且由于跨提示的上下文无关rollout分配(例如,每个提示生成16个rollout)以及严重依赖稀疏奖励,导致信息性反馈信号未被充分利用。本文提出XRPO(探索-利用GRPO),这是一个统一框架,通过rollout探索-利用的原则性视角重新审视策略优化。为增强探索,XRPO引入了一个数学基础的rollout分配器,自适应地优先处理具有更高不确定性减少潜力的提示。它还通过上下文种子策略注入精选示例,解决零奖励提示上的停滞问题,引导模型进入更困难的推理轨迹。为加强利用,XRPO开发了一种组相对、新颖性感知的优势锐化机制,利用序列似然性放大低概率但正确的响应,从而将策略扩展到稀疏奖励之外。在多种数学和编码基准上对推理和非推理模型的实验表明,XRPO优于现有先进方法(如GRPO和GSPO),pass@1提升高达4%,cons@32提升高达6%,同时训练收敛速度加快达2.7倍。

英文摘要

Reinforcement learning algorithms such as GRPO have driven recent advances in large language model (LLM) reasoning. While scaling the number of rollouts stabilizes training, existing approaches suffer from limited exploration on challenging prompts and leave informative feedback signals underexploited, due to context-independent rollout allocation across prompts (e.g., generating 16 rollouts per prompt) and heavy reliance on sparse rewards. This paper presents XRPO (eXplore-eXploit GRPO), a unified framework that recasts policy optimization through the principled lens of rollout exploration-exploitation. To enhance exploration, XRPO introduces a mathematically grounded rollout allocator that adaptively prioritizes prompts with higher potential for uncertainty reduction. It further addresses stagnation on zero-reward prompts through an in-context seeding strategy that injects curated exemplars, steering the model into more difficult reasoning trajectories. To strengthen exploitation, XRPO develops a group-relative, novelty-aware advantage sharpening mechanism that leverages sequence likelihoods to amplify low-probability yet correct responses, thereby extending the policy's reach beyond sparse rewards. Experiments across diverse math and coding benchmarks on both reasoning and non-reasoning models demonstrate that XRPO outperforms existing advances (e.g., GRPO and GSPO) by up to 4% pass@1 and 6% cons@32, while accelerating training convergence by up to 2.7x.
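
For context on the "stagnation on zero-reward prompts" mentioned above, the following sketch computes the group-relative advantages used by the GRPO baseline: each rollout's reward is standardized within its prompt's group. This shows only the baseline computation, not XRPO's allocator, seeding strategy, or novelty-aware sharpening; the `eps` constant and the example rewards are assumptions.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one prompt's group of rollouts (GRPO-style baseline)."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

mixed = group_relative_advantages([1, 0, 0, 1])      # some rollouts succeed
all_zero = group_relative_advantages([0, 0, 0, 0])   # every rollout fails
```

When every rollout in a group receives the same reward, all advantages are zero and the prompt contributes no learning signal, which is one way to read the stagnation problem that the seeding strategy targets.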

2510.05688 2026-05-26 cs.LG cs.AI 版本更新

vAttention: Verified Sparse Attention

vAttention: 验证的稀疏注意力

Aditya Desai, Kumar Krishna Agrawal, Shuo Yang, Alejandro Cuadron, Luis Gaspar Schroeder, Matei Zaharia, Joseph E. Gonzalez, Ion Stoica

发表机构 * Electrical Engineering and Computer Sciences, University of California, Berkeley(加州大学伯克利分校电气工程与计算机科学系)

AI总结 提出vAttention,通过统一top-k和随机采样,实现首个具有用户指定(ε, δ)近似精度保证的实用稀疏注意力机制,显著提升质量-效率权衡。

详情
Journal ref
Proceedings of the International Conference on Learning Representations (ICLR), 2026
AI中文摘要

最先进的用于减少解码延迟的稀疏注意力方法主要分为两类:近似top-$k$(及其扩展top-$p$)和最近引入的基于采样的估计。然而,这些方法在逼近全注意力方面存在根本性局限:它们无法在头和查询向量之间提供一致的近似,最关键的是,缺乏对近似质量的保证,限制了其实际部署。我们观察到top-$k$和随机采样是互补的:当注意力分数由少数标记主导时,top-$k$表现良好,而当注意力分数相对均匀时,随机采样提供更好的估计。基于这一洞察并利用采样的统计保证,我们引入了vAttention,这是第一个具有用户指定$(ε, δ)$近似精度保证(因此称为“已验证”)的实用稀疏注意力机制。这些保证使vAttention成为向大规模实用、可靠部署稀疏注意力迈出的引人注目的一步。通过统一top-$k$和采样,vAttention在质量-效率权衡上优于两者各自的表现。我们的实验表明,vAttention显著提高了稀疏注意力的质量(例如,在RULER-HARD上,Llama 3.1 8B Instruct和DeepSeek-R1-Distill-Llama-8B提高了约4.5个百分点),并有效弥合了全注意力和稀疏注意力之间的差距(例如,在多个数据集上,以高达20倍稀疏度匹配全模型质量)。我们还展示了它可以部署在推理场景中,在不牺牲模型质量的情况下实现快速解码(例如,vAttention在AIME2024上以10倍稀疏度和高达32K标记生成实现了全模型质量)。代码:https://github.com/skylight-org/sparse-attention-hub。网页:https://sky-light.eecs.berkeley.edu。

英文摘要

State-of-the-art sparse attention methods for reducing decoding latency fall into two main categories: approximate top-$k$ (and its extension, top-$p$) and recently introduced sampling-based estimation. However, these approaches are fundamentally limited in their ability to approximate full attention: they fail to provide consistent approximations across heads and query vectors and, most critically, lack guarantees on approximation quality, limiting their practical deployment. We observe that top-$k$ and random sampling are complementary: top-$k$ performs well when attention scores are dominated by a few tokens, whereas random sampling provides better estimates when attention scores are relatively uniform. Building on this insight and leveraging the statistical guarantees of sampling, we introduce vAttention, the first practical sparse attention mechanism with user-specified $(ε, δ)$ guarantees on approximation accuracy (thus, "verified"). These guarantees make vAttention a compelling step toward practical, reliable deployment of sparse attention at scale. By unifying top-$k$ and sampling, vAttention outperforms both individually, delivering a superior quality-efficiency trade-off. Our experiments show that vAttention significantly improves the quality of sparse attention (e.g., $\sim$4.5 percentage points for Llama 3.1 8B Instruct and DeepSeek-R1-Distill-Llama-8B on RULER-HARD), and effectively bridges the gap between full and sparse attention (e.g., across datasets, it matches full model quality with up to 20x sparsity). We also demonstrate that it can be deployed in reasoning scenarios to achieve fast decoding without compromising model quality (e.g., vAttention achieves full model quality on AIME2024 at 10x sparsity with up to 32K token generations). Code: https://github.com/skylight-org/sparse-attention-hub. Webpage: https://sky-light.eecs.berkeley.edu.
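The complementarity between top-$k$ and random sampling can be illustrated with a small sketch. This is not the vAttention algorithm, which according to the abstract sizes its budget to meet a user-specified $(ε, δ)$ guarantee. It only shows one generic way to combine an exact top-$k$ part with a uniformly sampled, rescaled estimate of the remaining tokens. The function names, the scalar per-token values, and the fixed sample count are assumptions made for illustration.

```python
import math
import random


def full_attention(scores, values):
    """Exact softmax-weighted average of scalar values for one query."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def hybrid_attention(scores, values, k, num_samples, rng):
    """Top-k tokens handled exactly, the rest estimated from a uniform sample.

    Assumes k >= 1 so that the denominator is never zero.
    """
    n = len(scores)
    m = max(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    top, rest = order[:k], order[k:]

    # Exact contribution of the k highest-scoring tokens.
    num = sum(math.exp(scores[i] - m) * values[i] for i in top)
    den = sum(math.exp(scores[i] - m) for i in top)

    # Estimated contribution of the remaining tokens: sample uniformly
    # without replacement and rescale the partial sums to the full residual.
    if rest and num_samples > 0:
        take = min(num_samples, len(rest))
        sample = rng.sample(rest, take)
        scale = len(rest) / take
        num += scale * sum(math.exp(scores[i] - m) * values[i] for i in sample)
        den += scale * sum(math.exp(scores[i] - m) for i in sample)
    return num / den


# Illustrative data (randomly generated, not from the paper).
rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(50)]
values = [rng.gauss(0.0, 1.0) for _ in range(50)]
exact = full_attention(scores, values)
all_top = hybrid_attention(scores, values, 50, 0, rng)
top_plus_all_rest = hybrid_attention(scores, values, 5, 45, rng)
approx = hybrid_attention(scores, values, 5, 10, rng)
```

When the sample covers every remaining token, the estimate coincides with full attention; with a smaller sample, `approx` is a noisy estimate whose error depends on how uniform the residual scores are.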

2509.23975 2026-05-26 eess.SY cs.LG cs.NA cs.SY math.NA math.OC 版本更新

Equation-Free Coarse Control of Distributed Parameter Systems via Local Neural Operators

基于局部神经算子的分布式参数系统无方程粗粒度控制

Gianluca Fabiani, Constantinos Siettos, Ioannis G. Kevrekidis

发表机构 * Hopkins Extreme Materials Institute and Department of Chemical and Biomolecular Engineering, Johns Hopkins University(霍普金斯极端材料研究所和化学与生物分子工程系,约翰霍普金斯大学) Dipartimento di Matematica e Applicazioni "Renato Caccioppoli", Università degli studi di Napoli Federico II(Renato Caccioppoli数学与应用系,那不勒斯费德里克二世大学) Department of Chemical and Biomolecular Engineering and Department of Applied Mathematics and Statistics, Johns Hopkins University(化学与生物分子工程系和应用数学与统计学系,约翰霍普金斯大学)

AI总结 提出一种数据驱动方法,利用局部神经算子学习短时解算子,结合Krylov子空间方法计算稳态和降阶模型,实现无显式粗粒度方程的高维分布式参数系统控制。

Comments 8 pages, 2 figures

详情
AI中文摘要

当显式粗粒度方程不可用时,高维分布式参数系统(DPS)的控制仍然是一个挑战。经典的无方程(EF)方法依赖于被视为黑箱时间步进器的细尺度模拟器。然而,用于稳态计算、线性化和控制设计的重复模拟通常在计算上代价高昂,或者微观时间步进器甚至可能不可用,使得数据成为唯一资源。我们提出一种数据驱动替代方案,使用在时空微观/介观数据上训练的局部神经算子来获得高效的短时解算子。这些代理模型在Krylov子空间方法中用于计算粗粒度的稳定和不稳定稳态,同时以无矩阵方式提供雅可比信息。然后,Krylov-Arnoldi迭代逼近主导特征谱,生成捕获开环慢动态的降阶模型,而无需显式组装雅可比矩阵。离散时间线性二次型调节器(dLQR)和极点配置(PP)控制器均基于此降阶系统,并提升回完整的非线性动力学,从而闭合反馈回路。该框架通过稳定Liouville-Bratu PDE的不稳定稳态得到验证,展示了学习代理与真实系统之间的一致性能,并在模型失配下量化了性能下降。

英文摘要

The control of high-dimensional distributed parameter systems (DPS) remains a challenge when explicit coarse-grained equations are unavailable. Classical equation-free (EF) approaches rely on fine-scale simulators treated as black-box timesteppers. However, repeated simulations for steady-state computation, linearization, and control design are often computationally prohibitive, or the microscopic timestepper may not even be available, leaving us with data as the only resource. We propose a data-driven alternative that uses local neural operators, trained on spatiotemporal microscopic/mesoscopic data, to obtain efficient short-time solution operators. These surrogates are employed within Krylov subspace methods to compute coarse stable and unstable steady states, while also providing Jacobian information in a matrix-free manner. Krylov-Arnoldi iterations then approximate the dominant eigenspectrum, yielding reduced models that capture the open-loop slow dynamics without explicit Jacobian assembly. Both discrete-time Linear Quadratic Regulator (dLQR) and pole-placement (PP) controllers are based on this reduced system and lifted back to the full nonlinear dynamics, thereby closing the feedback loop. The framework is validated by stabilizing an unstable steady-state of the Liouville-Bratu PDE, demonstrating consistent performance between the learned surrogate and the true system, with quantified degradation under plant-model mismatch.
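The two controller designs named above can be sketched for the simplest possible reduced model, a single unstable mode $x_{t+1} = a x_t + b u_t$ with feedback $u_t = -K x_t$. This scalar setting, the function names, and the numerical values are assumptions for illustration; the paper's reduced models come from Krylov-Arnoldi iterations on a learned surrogate, which is not reproduced here.

```python
def scalar_dlqr(a, b, q, r, iterations=200):
    """Discrete-time LQR gain for x[t+1] = a*x[t] + b*u[t] with cost sum(q*x^2 + r*u^2).

    Iterates the scalar Riccati recursion
        P <- q + a^2 P - (a b P)^2 / (r + b^2 P)
    and returns the gain K = a b P / (r + b^2 P) together with P.
    """
    p = q
    for _ in range(iterations):
        k = (a * b * p) / (r + b * b * p)
        p = q + a * a * p - a * b * p * k
    k = (a * b * p) / (r + b * b * p)
    return k, p


def scalar_pole_placement(a, b, target_pole):
    """Gain K such that the closed-loop multiplier a - b*K equals target_pole."""
    return (a - target_pole) / b


# Hypothetical unstable mode (|a| > 1) with unit input gain and unit weights.
a, b = 1.2, 1.0
k_lqr, p_lqr = scalar_dlqr(a, b, q=1.0, r=1.0)
k_pp = scalar_pole_placement(a, b, target_pole=0.5)
closed_loop_lqr = a - b * k_lqr
closed_loop_pp = a - b * k_pp
```

In both cases the closed-loop multiplier has magnitude below one, so the unstable mode is stabilized; LQR obtains the gain from a cost trade-off, whereas pole placement fixes the closed-loop eigenvalue directly.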

2509.22299 2026-05-26 cs.LG cs.AI 版本更新

HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space

HEAPr: 基于Hessian的输出空间中高效原子专家剪枝

Ke Li, Zheng Yang, Zhongbin Zhou, Feng Xue, Zhonglin Jiang, Wenxiao Wang

发表机构 * School of Software Technology, Zhejiang University(浙江大学软件学院) FABU Inc.(FABU公司) Hangzhou Kuaidi Science and Technology Co., Ltd.(杭州快的科学技术有限公司)

AI总结 针对MoE模型粗粒度专家剪枝导致精度下降的问题,提出HEAPr算法,通过将专家分解为原子专家并利用二阶信息(最优脑外科原理)评估重要性,在输出空间简化计算,实现高比例无损压缩。

Comments ICLR 2026

详情
Journal ref
Proceedings of the International Conference on Learning Representations (ICLR), 2026
AI中文摘要

大型语言模型中的混合专家(MoE)架构相比密集LLM具有卓越性能和更低的推理成本。然而,其庞大的参数数量导致内存需求过高,限制了实际部署。现有的剪枝方法主要关注专家级剪枝,这种粗粒度通常导致显著的精度下降。在这项工作中,我们引入了HEAPr,一种新颖的剪枝算法,它将专家分解为更小、不可分割的原子专家,从而实现更精确和灵活的原子专家剪枝。为了衡量每个原子专家的重要性,我们利用基于最优脑外科理论原理的二阶信息。为了解决二阶信息带来的计算和存储挑战,HEAPr利用原子专家的固有属性,将专家参数的二阶信息转换为原子专家参数的二阶信息,并进一步简化为原子专家输出的二阶信息。这种方法将空间复杂度从$O(d^4)$(其中$d$是模型的维度)降低到$O(d^2)$。HEAPr仅需在小型校准集上进行两次前向传播和一次反向传播即可计算原子专家的重要性。在包括DeepSeek MoE和Qwen MoE系列在内的MoE模型上的大量实验表明,HEAPr在广泛的剪枝比例和基准测试中优于现有的专家级剪枝方法。具体来说,在大多数模型中,HEAPr在20%~25%的剪枝比例下实现了几乎无损的压缩,同时FLOPs也减少了近20%。代码可在 https://github.com/LLIKKE/HEAPr 找到。

英文摘要

Mixture-of-Experts (MoE) architectures in large language models (LLMs) deliver exceptional performance and reduced inference costs compared to dense LLMs. However, their large parameter counts result in prohibitive memory requirements, limiting practical deployment. While existing pruning methods primarily focus on expert-level pruning, this coarse granularity often leads to substantial accuracy degradation. In this work, we introduce HEAPr, a novel pruning algorithm that decomposes experts into smaller, indivisible atomic experts, enabling more precise and flexible atomic expert pruning. To measure the importance of each atomic expert, we leverage second-order information based on principles similar to the Optimal Brain Surgeon theory. To address the computational and storage challenges posed by second-order information, HEAPr exploits the inherent properties of atomic experts to transform the second-order information from expert parameters into that of atomic expert parameters, and further simplifies it to the second-order information of atomic expert outputs. This approach reduces the space complexity from $O(d^4)$, where $d$ is the model's dimensionality, to $O(d^2)$. HEAPr requires only two forward passes and one backward pass on a small calibration set to compute the importance of atomic experts. Extensive experiments on MoE models, including the DeepSeek MoE and Qwen MoE families, demonstrate that HEAPr outperforms existing expert-level pruning methods across a wide range of pruning ratios and benchmarks. Specifically, HEAPr achieves nearly lossless compression at pruning ratios of 20% to 25% in most models, while also reducing FLOPs by nearly 20%. The code can be found at https://github.com/LLIKKE/HEAPr.
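The idea of splitting an expert into smaller units can be sketched as follows. The sketch assumes that an "atomic expert" is a single hidden unit of a gated feed-forward expert (one gate row, one up-projection row, and the matching down-projection row); the abstract does not spell out the definition, so this reading, the helper names, and the random weights are illustrative assumptions. The Hessian-based importance score is not reproduced.

```python
import math
import random


def silu(z):
    return z / (1.0 + math.exp(-z))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def expert_forward(x, w_gate, w_up, w_down):
    """Gated FFN expert: sum over hidden units of silu(g_i.x) * (u_i.x) * d_i."""
    hidden = [silu(dot(g, x)) * dot(u, x) for g, u in zip(w_gate, w_up)]
    return [
        sum(hidden[i] * w_down[i][j] for i in range(len(hidden)))
        for j in range(len(x))
    ]


def atomic_forward(x, g_row, u_row, d_row):
    """Output of one hidden unit, treated here as an 'atomic expert'."""
    activation = silu(dot(g_row, x)) * dot(u_row, x)
    return [activation * w for w in d_row]


def pruned_forward(x, w_gate, w_up, w_down, keep):
    """Sum the outputs of the atomic experts whose indices are in `keep`."""
    out = [0.0] * len(x)
    for i in keep:
        contribution = atomic_forward(x, w_gate[i], w_up[i], w_down[i])
        out = [o + c for o, c in zip(out, contribution)]
    return out


# Random illustrative weights: model dimension 4, hidden dimension 6.
rng = random.Random(0)
d_model, d_hidden = 4, 6
make = lambda: [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(d_hidden)]
w_gate, w_up, w_down = make(), make(), make()
x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]

full = expert_forward(x, w_gate, w_up, w_down)
all_atoms = pruned_forward(x, w_gate, w_up, w_down, keep=range(d_hidden))
four_atoms = pruned_forward(x, w_gate, w_up, w_down, keep=[0, 1, 2, 3])
```

Keeping every atomic expert reproduces the full expert output exactly, which is what makes pruning at this granularity well defined: dropping unit `i` removes one additive term rather than a whole expert.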

2509.21592 2026-05-26 cs.CV cs.AI cs.LG 版本更新

What Happens Next? Anticipating Future Motion by Generating Point Trajectories

接下来会发生什么?通过生成点轨迹预测未来运动

Gabrijel Boduljak, Laurynas Karazija, Iro Laina, Christian Rupprecht, Andrea Vedaldi

发表机构 * Visual Geometry Group, University of Oxford(牛津大学视觉几何组)

AI总结 提出一种基于单张图像预测未来运动的方法,通过生成密集轨迹网格来捕捉场景动态和不确定性,相比现有方法更准确多样,并验证其在机器人等下游任务中的有效性。

详情
Journal ref
ICLR 2026
AI中文摘要

我们考虑从单张图像预测运动的问题,即预测世界中物体可能如何移动,而无法观察其他参数如物体速度或施加的力。我们将此任务表述为密集轨迹网格的条件生成,模型紧密遵循现代视频生成器的架构,但输出运动轨迹而非像素。这种方法捕捉了场景范围的动态和不确定性,比先前的回归器和生成器产生更准确和多样化的预测。我们在模拟数据上广泛评估了我们的方法,展示了其在机器人等下游应用中的有效性,并在真实世界的直觉物理数据集上显示出有希望的准确性。尽管最近最先进的视频生成器常被视为世界模型,但我们表明它们在从单张图像预测运动方面存在困难,即使在简单的物理场景如落块或机械物体交互中,尽管对这些数据进行了微调。我们表明这一局限性源于生成像素的开销,而非直接建模运动。

英文摘要

We consider the problem of forecasting motion from a single image, i.e., predicting how objects in the world are likely to move, without the ability to observe other parameters such as the object velocities or the forces applied to them. We formulate this task as conditional generation of dense trajectory grids with a model that closely follows the architecture of modern video generators but outputs motion trajectories instead of pixels. This approach captures scene-wide dynamics and uncertainty, yielding more accurate and diverse predictions than prior regressors and generators. We extensively evaluate our method on simulated data, demonstrate its effectiveness on downstream applications such as robotics, and show promising accuracy on real-world intuitive physics datasets. Although recent state-of-the-art video generators are often regarded as world models, we show that they struggle with forecasting motion from a single image, even in simple physical scenarios such as falling blocks or mechanical object interactions, despite fine-tuning on such data. We show that this limitation arises from the overhead of generating pixels rather than directly modeling motion.

2509.16931 2026-05-26 cs.IR cs.AI cs.LG 版本更新

Equip Pre-ranking with Target Attention by Residual Quantization

通过残差量化为预排序阶段配备目标注意力机制

Yutong Li, Yu Zhu, Yichen Qiao, Ziyu Guan, Lv Shao, Tong Liu, Bo Zheng

发表机构 * Taobao & Tmall Group of Alibaba, Hangzhou, China; Shanghai Jiao Tong University, Shanghai, China; Xidian University, Xi'an, China; Taobao & Tmall Group of Alibaba, Beijing, China

AI总结 提出TARQ框架,利用残差量化在预排序阶段近似目标注意力架构,首次在延迟关键阶段引入TA建模能力,实现精度与效率的新最优平衡。

Comments 5 pages, 2 figures, accepted by SIGIR 2026 Short Paper Track

详情
AI中文摘要

工业推荐系统中的预排序阶段面临效率与效果之间的根本冲突。虽然目标注意力(TA)等强大模型在排序阶段擅长捕捉复杂的特征交互,但其高计算成本使其无法用于通常依赖简单向量积模型的预排序阶段。这种差异给整个系统造成了显著的性能瓶颈。为弥合这一差距,我们提出了TARQ,一种新颖的预排序框架。受生成模型启发,TARQ的关键创新在于通过残差量化为预排序阶段配备近似TA的架构。这使得我们首次将TA的建模能力引入延迟关键的预排序阶段,建立了精度与效率之间新的最优权衡。在淘宝进行的大量离线实验和大规模在线A/B测试证明了TARQ在排序性能上的显著提升。因此,我们的模型已全面部署在生产环境中,服务于数千万日活跃用户,并带来了可观的业务改进。代码和数据可在 https://github.com/zyody/tarq_sigir2026 获取。

英文摘要

The pre-ranking stage in industrial recommendation systems faces a fundamental conflict between efficiency and effectiveness. While powerful models like Target Attention (TA) excel at capturing complex feature interactions in the ranking stage, their high computational cost makes them infeasible for pre-ranking, which often relies on simplistic vector-product models. This disparity creates a significant performance bottleneck for the entire system. To bridge this gap, we propose TARQ, a novel pre-ranking framework. Inspired by generative models, TARQ's key innovation is to equip pre-ranking with an architecture that approximates TA by means of residual quantization. This allows us to bring the modeling power of TA into the latency-critical pre-ranking stage for the first time, establishing a new state-of-the-art trade-off between accuracy and efficiency. Extensive offline experiments and large-scale online A/B tests at Taobao demonstrate TARQ's significant improvements in ranking performance. Consequently, our model has been fully deployed in production, serving tens of millions of daily active users and yielding substantial business improvements. The code and data are available at https://github.com/zyody/tarq_sigir2026.
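Residual quantization itself is a generic technique, and a minimal sketch may help readers unfamiliar with it. The code below shows only the greedy encode/decode step with fixed codebooks; how TARQ uses the resulting codes to approximate target attention is not described in the abstract and is not modeled here. The codebooks, the input vector, and the function names are made up for illustration.

```python
def nearest(codebook, vec):
    """Index of the codeword closest to vec in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def rq_encode(vec, codebooks):
    """Greedy residual quantization: each level quantizes what the previous left over."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        index = nearest(codebook, residual)
        codes.append(index)
        residual = [r - c for r, c in zip(residual, codebook[index])]
    return codes, residual


def rq_decode(codes, codebooks):
    """Reconstruct the vector as the sum of the selected codewords."""
    out = [0.0] * len(codebooks[0][0])
    for index, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[index])]
    return out


# Two hypothetical codebooks: a coarse level and a finer level.
codebooks = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [0.25, 0.25]],
]
vec = [1.2, 0.3]
codes, residual = rq_encode(vec, codebooks)
decoded = rq_decode(codes, codebooks)
```

Each vector is stored as a short tuple of codeword indices, and the reconstruction error shrinks as levels are added, which is what makes the representation attractive in latency-critical stages.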

2509.16139 2026-05-26 cs.LG 版本更新

Spatio-temporal, multi-field deep learning of shock propagation in meso-structured media

介观结构介质中冲击传播的时空多场深度学习

M. Giselle Fernández-Godino, Meir H. Shachar, Kevin Korner, Jonathan L. Belof, Mukul Kumar, Jonathan Lind, William J. Schill

发表机构 * Lawrence Livermore National Laboratory(劳伦斯利弗莫尔国家实验室)

AI总结 提出多场时空模型(MSTM),通过训练多尺度多物理场数据,同时演化七个耦合热力学和动力学场,以高精度预测冲击传播中的异常响应,实现1000倍加速。

Comments 25 pages, 12 figures

详情
AI中文摘要

预测多孔和晶格材料极端流体动力学响应是高能量密度物理学中的一个基本挑战,其中冲击诱导的孔洞塌陷、斜压涡度和异常动力学与热力学状态必须在多个尺度上解析。传统高保真流体动力学代码在行星防御和惯性约束聚变等应用的大规模设计探索中计算成本过高。我们提出了一种多场时空模型(MSTM),旨在克服标准机器学习替代模型的局限性,这些模型通常无法捕捉冲击传播特征的尖锐梯度和非线性场耦合。通过在高保真、多尺度多物理场数据上训练,MSTM同时演化七个耦合的热力学和动力学场——包括压力、温度、密度和速度——跨越复杂材料架构。我们的框架展示了准确预测异常响应的能力,例如反直觉的冲击后密度降低和局部热点形成,均方根误差低至1.4%。关键的是,模型的多场公式在长自回归展开中保持了物理一致性和界面稳定性,在结构保真度上比单场模型提高了94%。该框架实现了1000倍的求解时间减少,为介观结构介质中能量耗散和动量传递的实时分析与优化提供了实用途径。

英文摘要

Predicting the extreme hydrodynamic response of porous and architected lattice materials is a fundamental challenge in high energy density physics, where shock-induced pore collapse, baroclinic vorticity, and anomalous kinetic and thermodynamic states must be resolved across multiple scales. Traditional high-fidelity hydrocodes are computationally prohibitive for large-scale design exploration in applications like planetary defense and inertial confinement fusion. We present a multi-field spatio-temporal model (MSTM) designed to overcome the limitations of standard machine learning surrogates, which often fail to capture the sharp gradients and non-linear field couplings characteristic of shock propagation. By training on high-fidelity, multiscale multiphysics data, MSTM simultaneously evolves seven coupled thermodynamic and kinetic fields (including pressure, temperature, density, and velocity) across complex material architectures. Our framework demonstrates the ability to accurately predict anomalous responses, such as counterintuitive post-shock density reductions and localized hotspot formation, with mean root mean squared errors as low as 1.4%. Crucially, the model's multi-field formulation maintains physical consistency and interface stability over long autoregressive rollouts, outperforming single-field models by 94% in structural fidelity. This framework enables a 1000x reduction in time to solution, providing a practical pathway for the real-time analysis and optimization of energy dissipation and momentum transfer in meso-structured media.

2509.04445 2026-05-26 cs.LG 版本更新

Towards Cognitively-Faithful Decision-Making Models to Improve AI Alignment

朝向认知忠实决策模型以改善AI对齐

Cyrus Cousins, Vijay Keswani, Vincent Conitzer, Hoda Heidari, Jana Schaich Borg, Walter Sinnott-Armstrong

发表机构 * Duke University(杜克大学) IIT Delhi(德里印度理工学院) CMU(卡内基梅隆大学)

AI总结 提出一种基于公理的方法,从成对比较中学习认知忠实的决策过程,以解决标准偏好诱导方法未能捕捉人类决策认知过程的问题,并在肾脏分配任务中验证了模型的有效性。

Comments In ICLR 2026

详情
AI中文摘要

最近的AI趋势旨在将AI模型与以人为中心的学习目标(如个人偏好、效用或社会价值观)对齐。使用标准偏好诱导方法,研究人员和从业者构建人类决策和判断的模型,AI模型与之对齐。然而,标准诱导方法通常未能捕捉人类决策背后的认知过程,如启发式或简化的结构化思维模式。为了解决这一失败,我们采用公理化的方法从成对比较中学习认知忠实的决策过程。基于分析塑造人类决策的认知过程的文献,我们推导出一个模型类,其中特征首先通过学习的规则处理,然后通过固定规则(如Bradley-Terry规则)聚合以产生决策。这种结构化的信息处理确保了这些模型作为代表潜在人类决策过程的现实且可行的候选者。我们通过在肾脏分配任务中学习可解释的人类决策模型来展示这种建模方法的有效性,并表明我们提出的模型在准确性上匹配或超越了先前的人类成对决策模型。

英文摘要

Recent AI trends seek to align AI models to learned human-centric objectives, such as personal preferences, utility, or societal values. Using standard preference elicitation methods, researchers and practitioners build models of human decisions and judgments, to which AI models are aligned. However, standard elicitation methods often fail to capture the cognitive processes behind human decision making, such as heuristics or simplifying structured thought patterns. To address this failure, we take an axiomatic approach to learning cognitively faithful decision processes from pairwise comparisons. Building on the literature analyzing cognitive processes that shape human decision-making, we derive a model class in which features are first processed with learned rules, then aggregated via a fixed rule, such as the Bradley-Terry rule, to produce a decision. This structured processing of information ensures that such models are realistic and feasible candidates to represent underlying human decision-making processes. We demonstrate the efficacy of this modeling approach by learning interpretable models of human decision making in a kidney allocation task, and show that our proposed models match or surpass the accuracy of prior models of human pairwise decision-making.
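The two-step structure described above (per-feature rules, then a fixed aggregation rule) can be sketched with the Bradley-Terry rule, under which option $a$ is preferred to option $b$ with probability $e^{s_a} / (e^{s_a} + e^{s_b})$. The additive combination of rule outputs into a score, the two example rules, and the feature values are assumptions for illustration; in the paper the rules are learned from pairwise comparisons.

```python
import math


def bradley_terry(score_a, score_b):
    """P(a preferred over b) = exp(s_a) / (exp(s_a) + exp(s_b))."""
    return 1.0 / (1.0 + math.exp(score_b - score_a))


def option_score(features, rules):
    """Apply one rule per feature and add up the results (assumed aggregation)."""
    return sum(rule(value) for rule, value in zip(rules, features))


# Hypothetical hand-written rules standing in for learned ones.
rules = [
    lambda x: 1.0 if x >= 3.0 else 0.0,  # threshold heuristic
    lambda x: min(x, 10.0) / 10.0,       # saturating, diminishing returns
]

option_a = [4.0, 2.0]
option_b = [1.0, 8.0]
score_a = option_score(option_a, rules)
score_b = option_score(option_b, rules)
p_a_over_b = bradley_terry(score_a, score_b)
p_b_over_a = bradley_terry(score_b, score_a)
```

Because the rules are simple, inspectable functions of single features, the resulting choice probabilities can be traced back to individual heuristics, which is the interpretability argument the abstract makes.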

2508.17090 2026-05-26 stat.ML cs.LG 版本更新

Neural Stochastic Differential Equations on Compact State Spaces: Theory, Methods, and Application to Suicide Risk Modeling

紧致状态空间上的神经随机微分方程:理论、方法及其在自杀风险建模中的应用

Malinda Lu, Yue-Jane Liu, Matthew K. Nock, Yaniv Yacoby

发表机构 * Wellesley College(韦尔斯利学院) Harvard University(哈佛大学)

AI总结 针对生态瞬时评估数据中随机微分方程违反域约束和训练不稳定的问题,提出一种新型表达性SDE,通过约束漂移和扩散确保解在紧致多面体状态空间内,并引入参数化映射任意动力学为满足约束的SDE,在真实数据上提升预测和优化性能。

Comments Accepted at the Symposium on Probabilistic Machine Learning (ProbML) 2026, and at the Methods and Opportunities at Small Scale (MOSS), ICML 2025, Vancouver, Canada

详情
AI中文摘要

生态瞬时评估(EMA)研究能够通过智能手机收集自杀想法和行为(STB)的高频自我报告。潜在随机微分方程(SDE)是EMA数据的一个有前景的模型类别,因为数据是不规则采样、有噪声且部分观测的。但基于SDE的模型存在两个关键限制。(a) 这些模型经常违反域约束,削弱了模型的科学有效性和临床信任。(b) 训练在数值上不稳定,除非采用临时修复(例如过度简化的动力学),而这些修复不适合高风险应用。在此,我们开发了一类新型表达性SDE,其解被证明被限制在预设的紧致多面体状态空间内,与EMA数据的域匹配。在这项工作中,(1) 我们从理论和经验上展示了为什么基于链式法则的紧致域上SDE构造会失败;(2) 我们推导了一般和稳态SDE的漂移和扩散约束,使其解保持在所需状态空间内;(3) 我们引入了一种参数化方法,将任意(神经或专家给出的)动力学映射为满足约束的SDE。在多个真实EMA数据集上,包括一项大型自杀风险研究,我们的参数化方法在预测和优化动力学方面优于标准潜在神经SDE基线。这些贡献为自杀风险和其他临床时间序列的原则性、可信赖的连续时间模型铺平了道路,并将基于SDE的方法(例如扩散模型)的应用扩展到具有硬状态约束的领域。

英文摘要

Ecological Momentary Assessment (EMA) studies enable the collection of high-frequency self-reports of suicidal thoughts and behaviors (STBs) via smartphones. Latent stochastic differential equations (SDEs) are a promising model class for EMA data, which are irregularly sampled, noisy, and partially observed. But SDE-based models suffer from two key limitations. (a) These models often violate domain constraints, undermining the scientific validity of the model and clinical trust in it. (b) Training is numerically unstable without ad hoc fixes (e.g., oversimplified dynamics) that are ill-suited for high-stakes applications. Here, we develop a novel class of expressive SDEs whose solutions are provably confined to a prescribed compact polyhedral state space, matching the domains of EMA data. In this work, (1) we show why chain-rule-based constructions of SDEs on compact domains fail, theoretically and empirically; (2) we derive constraints on drift and diffusion for general and stationary SDEs so their solutions remain in the desired state space; and (3) we introduce a parameterization that maps arbitrary (neural or expert-given) dynamics into constraint-satisfying SDEs. On several real EMA datasets, including a large suicide-risk study, our parameterization improves forecasts and optimization dynamics over standard latent neural SDE baselines. These contributions pave the way for principled, trustworthy continuous-time models of suicide risk and other clinical time series and extend applications of SDE-based methods (e.g., diffusion models) to domains with hard state constraints.

2508.13309 2026-05-26 cs.CV cs.LG 版本更新

DASH: A Meta-Attack Framework for Synthesizing Effective and Stealthy Adversarial Examples

DASH:一种用于合成有效且隐蔽的对抗样本的元攻击框架

Abdullah Al Nomaan Nafi, Habibur Rahaman, Zafaryab Haider, Tanzim Mahfuz, Fnu Suya, Swarup Bhunia, Prabuddha Chakraborty

发表机构 * University of Maine(缅因大学) University of Florida(佛罗里达大学) University of Tennessee, Knoxville(田纳西大学诺克斯维尔分校)

AI总结 提出DASH元攻击框架,通过多阶段自适应组合Lp约束攻击方法,生成有效且感知对齐的对抗样本,在多个数据集上优于现有方法。

Comments Accepted to CVPR 2026

详情
AI中文摘要

在白盒设置下,已有大量技术被提出用于在严格的Lp范数约束下生成对抗样本。然而,这类范数受限的样本往往与人类感知不一致,只有少数方法专门探索感知对齐的对抗样本。此外,尚不清楚能否有效利用Lp约束攻击的见解来提升感知效能。本文介绍DASH,一个完全可微的元攻击框架,通过策略性地组合现有基于Lp的攻击方法,生成有效且感知对齐的对抗样本。DASH以多阶段方式运行:在每个阶段,它使用学习到的自适应权重聚合来自多个基础攻击的候选对抗样本,并将结果传播到下一阶段。一种新颖的元损失函数通过联合最小化误分类损失和感知失真来指导这一过程,使框架能够动态调整每个基础攻击在各阶段的贡献。我们在CIFAR-10、CIFAR-100和ImageNet上对对抗训练模型评估DASH。尽管仅依赖基于Lp约束的方法,DASH显著优于最先进的感知攻击如AdvAD,实现了更高的攻击成功率(例如提升20.63%)和更优的视觉质量(以SSIM、LPIPS和FID衡量,分别提升约11、0.015和5.7)。此外,DASH对未见过的防御具有良好的泛化能力,使其成为评估鲁棒性的实用且强大的基线,无需为每种新防御手工设计自适应攻击。

英文摘要

Numerous techniques have been proposed for generating adversarial examples in white-box settings under strict Lp-norm constraints. However, such norm-bounded examples often fail to align well with human perception, and only a few methods specifically explore perceptually aligned adversarial examples. Moreover, it remains unclear whether insights from Lp-constrained attacks can be effectively leveraged to improve perceptual efficacy. In this paper, we introduce DASH, a fully differentiable meta-attack framework that generates effective and perceptually aligned adversarial examples by strategically composing existing Lp-based attack methods. DASH operates in a multi-stage fashion: at each stage, it aggregates candidate adversarial examples from multiple base attacks using learned, adaptive weights and propagates the result to the next stage. A novel meta-loss function guides this process by jointly minimizing misclassification loss and perceptual distortion, enabling the framework to dynamically modulate the contribution of each base attack throughout the stages. We evaluate DASH on adversarially trained models across CIFAR-10, CIFAR-100, and ImageNet. Despite relying solely on Lp-constrained methods, DASH significantly outperforms state-of-the-art perceptual attacks such as AdvAD, achieving higher attack success rates (e.g., 20.63% improvement) and superior visual quality, as measured by SSIM, LPIPS, and FID (improvements of $\approx$11, 0.015, and 5.7, respectively). Furthermore, DASH generalizes well to unseen defenses, making it a practical and strong baseline for evaluating robustness without requiring handcrafted adaptive attacks for each new defense.

2508.11925 2026-05-26 cs.CR cs.CL cs.LG 版本更新

Optimizing Token Choice for Code Watermarking: An RL Approach

优化代码水印的令牌选择:一种强化学习方法

Zhimeng Guo, Huaisheng Zhu, Siyuan Xu, Hangfan Zhang, Teng Xiao, Minhao Cheng

发表机构 * The Pennsylvania State University(宾夕法尼亚州立大学)

AI总结 提出CodeTracer框架,通过强化学习训练策略模型智能选择令牌嵌入水印,在保持代码功能的同时提高水印可检测性。

Comments ICML 2026, 18 pages, 3 figures

详情
AI中文摘要

保护LLM生成代码的知识产权需要有效的水印系统,该系统能够在代码高度结构化、语法受限的性质中运行。在这项工作中,我们引入了CodeTracer,一种创新的自适应代码水印框架,其基础是一种新颖的强化学习训练范式。其核心是,CodeTracer采用策略驱动方法,利用参数化模型在下一个令牌预测期间智能地偏向令牌选择。该策略确保嵌入的水印保持代码功能,同时表现出与典型令牌分布微妙但统计上可检测的偏差。为了促进策略学习,我们设计了一个全面的奖励系统,将执行反馈与水印嵌入信号无缝集成,平衡过程级和结果级奖励。此外,我们采用Gumbel Top-k重参数化来实现离散水印决策的基于梯度的优化。广泛的比较评估表明,CodeTracer在水印可检测性和生成代码功能保持方面均显著优于最先进的基线。我们的代码可在https://github.com/TimeLovercc/CodeTracer获取。

英文摘要

Protecting intellectual property on LLM-generated code necessitates effective watermarking systems that can operate within code's highly structured, syntactically constrained nature. In this work, we introduce CodeTracer, an innovative adaptive code watermarking framework underpinned by a novel reinforcement learning training paradigm. At its core, CodeTracer features a policy-driven approach that utilizes a parameterized model to intelligently bias token choices during next-token prediction. This strategy ensures that embedded watermarks maintain code functionality while exhibiting subtle yet statistically detectable deviations from typical token distributions. To facilitate policy learning, we devise a comprehensive reward system that seamlessly integrates execution feedback with watermark embedding signals, balancing process-level and outcome-level rewards. Additionally, we employ Gumbel Top-k reparameterization to enable gradient-based optimization of discrete watermarking decisions. Extensive comparative evaluations demonstrate CodeTracer's significant superiority over state-of-the-art baselines in both watermark detectability and the preservation of generated code's functionality. Our code is available at https://github.com/TimeLovercc/CodeTracer.
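For readers unfamiliar with the Gumbel top-k trick mentioned above, a minimal sketch of its sampling step follows: adding independent Gumbel noise to each logit and keeping the k largest perturbed values draws k distinct items without replacement. The sketch covers only this forward sampling step; the reparameterization that CodeTracer uses to obtain gradients is not described in the abstract and is not reproduced. The function name and the example logits are assumptions.

```python
import math
import random


def gumbel_top_k(logits, k, rng):
    """Sample k distinct indices without replacement via Gumbel-perturbed logits."""
    perturbed = []
    for index, logit in enumerate(logits):
        u = rng.random()
        while u == 0.0:  # random() may return 0.0; avoid log(0)
            u = rng.random()
        gumbel = -math.log(-math.log(u))
        perturbed.append((logit + gumbel, index))
    perturbed.sort(reverse=True)
    return [index for _, index in perturbed[:k]]


# Two candidates have overwhelmingly larger logits than the others,
# so they are selected with near certainty.
logits = [0.0, 0.0, 50.0, 50.0]
picks = gumbel_top_k(logits, 2, random.Random(0))
```

With comparable logits the selection becomes genuinely random, and items with larger logits are simply chosen more often.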

2508.03104 2026-05-26 cs.LG cs.AI 版本更新

HiTeC: Hierarchical Contrastive Learning on Text-Attributed Hypergraph with Semantic-Aware Augmentation

HiTeC: 基于语义感知增强的文本属性超图层次对比学习

Mengting Pan, Fan Li, Chen Chen, Xiaoyang Wang, Wenjie Zhang

发表机构 * The University of New South Wales(新南威尔士大学) University of Wollongong(卧龙岗大学)

AI总结 提出HiTeC框架,通过两阶段层次对比学习,结合结构感知文本编码预训练和语义感知增强,解决文本属性超图中文本与拓扑关联不足、随机增强噪声及长程依赖捕获问题。

Comments 16 pages, 8 figures

详情
AI中文摘要

对比学习已成为自监督超图学习的主流范式,能够在无需昂贵标签的情况下实现有效训练。然而,现实世界超图中的节点实体通常关联丰富的文本信息,这在先前工作中被大量忽略。直接将现有基于对比学习的方法应用于此类文本属性超图(TAHGs)会导致三个关键限制:(1)普遍使用的图无关文本编码器无法捕获文本语义与超图拓扑之间的相关性,导致表示表达能力不足。(2)它们对随机数据增强的依赖引入了噪声并削弱了对比信号。(3)主要关注节点和超边级别的对比信号限制了捕获长程依赖的能力,而这对于有效的表示学习至关重要。为解决这些挑战,我们引入了HiTeC,一个两阶段层次对比学习框架,用于在TAHGs上进行有效的自监督学习。在第一阶段,我们使用结构感知的对比目标预训练文本编码器,以克服传统方法的图无关特性。在第二阶段,我们首先引入语义感知增强,包括结构上下文化的文本增强和语义感知的超边丢弃,以促进信息丰富的视图生成。随后,我们提出一个多尺度对比损失,结合基于$s$步行走的子图级别目标,以捕获长程依赖。在六个真实世界数据集上的大量实验验证了我们提出方法的有效性。

英文摘要

Contrastive learning (CL) has become a dominant paradigm for self-supervised hypergraph learning, enabling effective training without costly labels. However, node entities in real-world hypergraphs are often associated with rich textual information, which has been largely ignored in prior works. Directly applying existing CL-based methods to such text-attributed hypergraphs (TAHGs) leads to three key limitations: (1) The common use of graph-agnostic text encoders fails to capture the correlations between textual semantics and hypergraph topology, resulting in less expressive representations. (2) Their reliance on random data augmentations introduces noise and weakens the contrastive signals. (3) The primary focus on node- and hyperedge-level contrastive signals limits the ability to capture long-range dependencies, which is essential for effective representation learning. To address these challenges, we introduce HiTeC, a two-stage hierarchical contrastive learning framework for effective self-supervised learning on TAHGs. In the first stage, we pre-train the text encoder with a structure-aware contrastive objective to overcome the graph-agnostic nature of conventional methods. In the second stage, we begin by introducing semantic-aware augmentations, including structure-contextualized text augmentation and semantic-aware hyperedge dropping, to facilitate informative view generation. Subsequently, we propose a multi-scale contrastive loss with an $s$-walk-based subgraph-level objective to capture long-range dependencies. Extensive experiments on six real-world datasets validate the effectiveness of our proposed method.

2507.10593 2026-05-26 cs.SE cs.AI cs.CL cs.LG 版本更新

ToolRegistry: A Protocol-Agnostic Tool Management Library for Function-Calling LLMs

ToolRegistry: 一个用于函数调用LLM的协议无关工具管理库

Peng Ding, Rick Stevens

发表机构 * University of Chicago(芝加哥大学) Argonne National Laboratory(阿贡国家实验室)

AI总结 提出ToolRegistry系统,通过统一工具对象和注册表实现协议无关的工具管理,支持多种传输协议、可插拔后端和高级功能,显著减少集成代码并提升吞吐量。

Comments 16 pages, 4 figures, v3: add co-author, permission system, progressive tool disclosure, think-augmented calling, RPC framing, multi-provider support

详情
AI中文摘要

每个LLM工具调用在结构上都是一个RPC——一个函数名、JSON参数和序列化结果——然而每个协议(原生Python、MCP、OpenAPI、LangChain)都是从零开始集成的。我们提出ToolRegistry,一个使这种RPC本质显式化的系统:一个单一的Tool对象充当通用存根,无论传输方式如何,而注册表则作为RPC客户端运行时,负责调度、模式生成和执行。该系统以三个包的形式发布——一个核心注册表、一个通过MCP和OpenAPI暴露工具的服务器,以及一个生产就绪实现的中心——并通过可插拔的线程或进程后端调用工具。该系统现在还提供基于标签的权限策略、针对大型注册表的BM25F驱动的渐进式工具披露、增强思考的函数调用、多提供商模式支持(OpenAI、Anthropic、Gemini)、声明式JSONC/YAML配置,以及一个基于仅stdlib内置模块的近乎零依赖的核心。在我们的基准测试中,该库将集成代码减少了60-80%,并且为给定工作负载选择正确的并发模式(线程与进程)相比替代方案可带来高达3.1倍的吞吐量。ToolRegistry在https://github.com/Oaklight/ToolRegistry开源;文档位于https://toolregistry.readthedocs.io/。

英文摘要

Every LLM tool call is structurally an RPC -- a function name, JSON arguments, and a serialized result -- yet each protocol (native Python, MCP, OpenAPI, LangChain) is integrated from scratch. We present ToolRegistry, a system that makes this RPC nature explicit: a single Tool object acts as a universal stub regardless of transport, while the registry serves as the RPC client runtime for dispatch, schema generation, and execution. The system ships as three packages -- a core registry, a server exposing tools over MCP and OpenAPI, and a hub of production-ready implementations -- and invokes tools through pluggable thread or process backends. The system now also provides tag-based permission policies, BM25F-powered progressive tool disclosure for large registries, think-augmented function calling, multi-provider schema support (OpenAI, Anthropic, Gemini), declarative JSONC/YAML configuration, and a near-zero-dependency core built on stdlib-only vendored modules. In our benchmarks the library cuts integration code by 60-80%, and choosing the right concurrency mode (thread vs. process) yields up to 3.1x throughput over the alternative for a given workload. ToolRegistry is open-source at https://github.com/Oaklight/ToolRegistry; documentation lives at https://toolregistry.readthedocs.io/.
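The "tool call as RPC" framing can be made concrete with a toy registry. The class below is a hypothetical stand-in written for this digest and is not the ToolRegistry API: `MiniRegistry`, `register`, `schema`, and `call` are invented names, and the real library's interfaces should be taken from its documentation. The sketch only shows the three ingredients the abstract lists: a function name, JSON arguments, and a serialized result.

```python
import inspect
import json


class MiniRegistry:
    """Toy registry: maps tool names to callables and dispatches JSON calls."""

    def __init__(self):
        self._tools = {}

    def register(self, fn):
        """Register a plain Python function under its own name."""
        self._tools[fn.__name__] = fn
        return fn

    def schema(self, name):
        """Describe a tool by its name, docstring, and parameter names."""
        fn = self._tools[name]
        return {
            "name": name,
            "description": inspect.getdoc(fn) or "",
            "parameters": list(inspect.signature(fn).parameters),
        }

    def call(self, name, arguments_json):
        """Decode JSON arguments, invoke the tool, and serialize the result."""
        arguments = json.loads(arguments_json)
        result = self._tools[name](**arguments)
        return json.dumps(result)


registry = MiniRegistry()


@registry.register
def add(a, b):
    """Add two numbers."""
    return a + b


add_schema = registry.schema("add")
add_result = registry.call("add", '{"a": 2, "b": 3}')
```

A caller only needs the schema and the `call` entry point, which is what lets a single stub abstraction sit in front of different transports.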

2507.03159 2026-05-26 cs.LG math.OC 版本更新

MathOptAI.jl: Embed trained machine learning predictors into JuMP models

MathOptAI.jl: 将训练好的机器学习预测器嵌入JuMP模型

Oscar Dowson, Robert B Parker, Russel Bent

发表机构 * Dowson Farms(多森农场) Los Alamos National Laboratory(洛斯阿拉莫斯国家实验室)

AI总结 提出开源Julia库MathOptAI.jl,将多种训练好的机器学习模型(神经网络、决策树、高斯过程)嵌入JuMP优化模型,并支持PyTorch模型的GPU加速。

详情
AI中文摘要

我们提出了MathOptAI.jl,一个用于将训练好的机器学习预测器嵌入JuMP模型的开源Julia库。MathOptAI.jl可以将多种神经网络、决策树和高斯过程嵌入到更大的数学优化模型中。除了与一系列基于Julia的机器学习库(如Lux.jl和Flux.jl)交互外,MathOptAI.jl还利用Julia的Python接口提供对PyTorch模型的支持。当PyTorch支持与MathOptAI.jl的灰盒公式结合时,与PyTorch模型相关的函数、雅可比矩阵和海森矩阵评估被卸载到Python中的GPU上,而其余的非线性预言机则在Julia中的CPU上评估。MathOptAI.jl可在 https://github.com/lanl-ansi/MathOptAI.jl 获取,采用BSD-3许可证。

英文摘要

We present MathOptAI.jl, an open-source Julia library for embedding trained machine learning predictors into a JuMP model. MathOptAI.jl can embed a wide variety of neural networks, decision trees, and Gaussian Processes into a larger mathematical optimization model. In addition to interfacing a range of Julia-based machine learning libraries such as Lux.jl and Flux.jl, MathOptAI.jl uses Julia's Python interface to provide support for PyTorch models. When the PyTorch support is combined with MathOptAI.jl's gray-box formulation, the function, Jacobian, and Hessian evaluations associated with the PyTorch model are offloaded to the GPU in Python, while the rest of the nonlinear oracles are evaluated on the CPU in Julia. MathOptAI.jl is available at https://github.com/lanl-ansi/MathOptAI.jl under a BSD-3 license.

2507.02215 2026-05-26 stat.ML cs.LG cs.NA math.NA 版本更新

Hybrid least squares for learning functions from highly noisy data

混合最小二乘法:从高噪声数据中学习函数

Ben Adcock, Bernhard Hientzsch, Akil Narayan, Yiming Xu

发表机构 * Department of Mathematics, Simon Fraser University(Simon Fraser大学数学系) Courant Institute of Mathematical Sciences, New York University(纽约大学Courant数学科学研究所) Scientific Computing and Imaging Institute, University of Utah(犹他大学科学计算与成像研究所) Department of Mathematics, University of Kentucky(肯塔基大学数学系)

AI总结 针对高噪声数据下的最小二乘函数逼近问题,提出结合Christoffel采样与最优实验设计的混合方法,在样本点生成和噪声平滑方面实现最优性,提升计算效率和样本复杂度,并扩展到凸性约束和自适应随机子空间场景。

Comments 30 pages

详情
AI中文摘要

受高效估计条件期望需求的驱动,我们考虑一个数据严重污染的最小二乘函数逼近问题。在小噪声情况下有效的现有方法在存在大噪声时表现不佳。为了解决这个问题,我们提出了一种混合方法,将Christoffel采样与最优实验设计相结合。我们证明,所提出的算法在样本点生成和噪声平滑方面都具有适当的优化特性,与现有方法相比,提高了计算效率和样本复杂度。我们还将该算法扩展到凸性约束设置,并具有类似的理论保证。当目标函数定义为随机场的期望时,我们进一步扩展我们的方法以利用自适应随机子空间,并建立了自适应过程逼近能力的结果。我们的理论发现得到了数值研究的支持,包括合成数据以及计算金融中更具挑战性的随机模拟问题。

英文摘要

Motivated by the need for efficient estimation of conditional expectations, we consider a least-squares function approximation problem with heavily polluted data. Existing methods that are effective in the small-noise regime are suboptimal when large noise is present. To address this issue, we propose a hybrid approach that combines Christoffel sampling with optimal experimental design. We show that the proposed algorithm enjoys appropriate optimality properties for both sample point generation and noise mollification, leading to improved computational efficiency and sample complexity compared to existing methods. We also extend the algorithm to convexity-constrained settings with similar theoretical guarantees. When the target function is defined as the expectation of a random field, we further extend our approach to leverage adaptive random subspaces and establish results on the approximation capacity of the adaptive procedure. Our theoretical findings are supported by numerical studies on both synthetic data and on a more challenging stochastic simulation problem in computational finance.

2506.21137 2026-05-26 cs.LG 版本更新

Norm$\times$Direction: Restoring the Missing Query Norm in Vision Linear Attention

Norm×Direction:恢复视觉线性注意力中缺失的查询范数

Weikang Meng, Yadan Luo, Liangyu Huo, Yingjian Li, Yaowei Wang, Xin Li, Zheng Zhang

发表机构 * Harbin Institute of Technology, Shenzhen, China(哈尔滨工业大学(深圳)) Pengcheng Laboratory, China(鹏城实验室) The University of Queensland, Australia(昆士兰大学)

AI总结 针对线性注意力中查询范数丢失和非负性导致信息损失的问题,提出基于范数-方向分解的NaLaFormer,通过注入查询范数恢复注意力分布尖峰性,并采用余弦相似度保证非负性,在多项任务上达到线性注意力新标杆。

详情
AI中文摘要

线性注意力缓解了softmax注意力的二次复杂度,但遭受了关键的表达能力损失。我们识别出两个主要原因:(1)归一化操作取消了查询范数,这打破了查询范数与softmax注意力中注意力分布的尖峰性(熵)之间的相关性。(2)强制非负性的标准技术通过抵消有效的内积交互导致破坏性的信息损失。为了解决这些挑战,我们引入了NaLaFormer,一种基于查询和键向量的范数×方向(ND)分解的新型线性注意力机制。我们利用每个分量解决一个不同的问题:查询范数被注入到我们的核中,以创建一个查询范数感知的映射,恢复注意力分布的尖峰性。方向向量通过基于几何的余弦相似度度量进行处理,该度量在保证非负性的同时保留了内积的丰富细粒度信息。我们通过全面的多模态评估验证了NaLaFormer,它在线性注意力上设立了新的最先进基准。我们的模型在ImageNet-1K上实现了高达7.5%的准确率提升,在ADE20K上实现了4.7%的mIoU改进,相比可比的基线。它展示了深刻的效率,在令牌密集的超分辨率任务(7万+令牌)中,将峰值内存减少了变革性的92.3%。NaLaFormer的通用性进一步得到证实,它在常识推理上超越了像Mamba这样的强基线,并在Long Range Arena(LRA)基准上设立了新的最先进水平。代码可在https://github.com/ZacharyMeng/NaLaFormer获取。

英文摘要

Linear attention mitigates the quadratic complexity of softmax attention but suffers from a critical loss of expressiveness. We identify two primary causes: (1) The normalization operation cancels the query norm, which breaks the correlation between a query's norm and the spikiness (entropy) of the attention distribution as in softmax attention. (2) Standard techniques for enforcing non-negativity cause destructive information loss by nullifying valid inner-product interactions. To address these challenges, we introduce NaLaFormer, a novel linear attention mechanism built upon a norm$\times$direction (ND) decomposition of the query and key vectors. We leverage each component to solve a distinct problem: The query norm is injected into our kernel to create a query-norm-aware map that restores the attention distribution's spikiness. The direction vectors are processed by a geometric, cosine-based similarity metric that guarantees non-negativity while preserving the rich, fine-grained information of the inner product. We validate NaLaFormer through a comprehensive multi-modal evaluation, where it sets new state-of-the-art benchmarks for linear attention. Our model achieves up to a 7.5% accuracy gain on ImageNet-1K and a 4.7% mIoU improvement on ADE20K over comparable baselines. It demonstrates profound efficiency, reducing peak memory by a transformative 92.3% in token-intensive super-resolution tasks (70K+ tokens). NaLaFormer's versatility is further confirmed as it surpasses strong baselines like Mamba on common-sense reasoning and sets a new state-of-the-art on the Long Range Arena (LRA) benchmark. Code is available at https://github.com/ZacharyMeng/NaLaFormer .
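
The first cause named in the abstract (normalization cancelling the query norm) can be checked numerically. The sketch below is only an illustration, not NaLaFormer itself: it assumes a ReLU feature map for the linear-attention baseline, and the names `softmax_weights`, `relu_linear_weights`, and the toy vectors are hypothetical. Scaling a query by a positive constant sharpens softmax attention but leaves the normalized linear-attention weights unchanged.

```python
import math

def softmax_weights(q, keys):
    """Softmax attention weights of one query over a list of keys."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def relu_linear_weights(q, keys):
    """Normalized linear-attention weights with a ReLU feature map (an assumption)."""
    relu = lambda v: [max(x, 0.0) for x in v]
    fq = relu(q)
    sims = [sum(a * b for a, b in zip(fq, relu(k))) for k in keys]
    total = sum(sims)  # the normalizer scales with the query norm, cancelling it
    return [s / total for s in sims]

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0.0)

keys = [[1.0, 0.2], [0.3, 0.9], [0.5, 0.5]]
q = [0.8, 0.4]
q_big = [4.0 * x for x in q]  # same direction, four times the norm

# Softmax: the larger-norm query gives a lower-entropy (spikier) distribution.
print(entropy(softmax_weights(q, keys)), entropy(softmax_weights(q_big, keys)))
# Linear attention: both queries give the same weights.
print(relu_linear_weights(q, keys), relu_linear_weights(q_big, keys))
```

Any positively homogeneous feature map would show the same cancellation, which is why the abstract treats the query norm as information that has to be re-injected explicitly.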

2506.19037 2026-05-26 cs.CL cs.AI cs.IT cs.LG cs.NE math.IT 版本更新

Plan for Speed: Dilated Scheduling for Masked Diffusion Language Models

速度规划:用于掩码扩散语言模型的膨胀调度

Omer Luxembourg, Haim Permuter, Eliya Nachmani

发表机构 * School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beersheba, Israel(内盖夫本-古里安大学电气与计算机工程学院,以色列贝尔谢巴)

AI总结 提出膨胀解掩码调度器(DUS),通过将序列位置划分为非相邻的膨胀组并并行解掩码,最小化联合熵增益上界,在不修改去噪器的情况下实现高达5.8倍加速。

Comments Accepted at ICML 2026

详情
AI中文摘要

掩码扩散语言模型(MDLM)承诺快速、非自回归的文本生成,然而现有的采样器根据模型置信度选择要解掩码的标记,忽略了并行解掩码多个位置时的交互,实际上退化为缓慢的自回归行为。我们提出了膨胀解掩码调度器(DUS),这是一种仅推理、无需规划模型的方法,它将序列位置划分为非相邻的膨胀组,并并行解掩码,以在每个去噪步骤中最小化联合熵增益的上界。通过明确权衡网络调用次数与生成质量,DUS恢复了传统并行解掩码策略下丢失的大部分性能。在数学(GSM8K, MATH500)、代码(HumanEval, MBPP)、通用知识(BBH, MMLU-Pro)和指令遵循(IFEval)基准测试中,DUS优于基于置信度的规划器,并将扩散特有的质量-速度权衡转化为由块大小$B$确定的确定性、可预测的加速,与逐标记MDLM解码相比,实现了高达5.8倍的墙钟加速,而无需修改底层去噪器。作为即插即用的后滤波器,膨胀间隔也改进了自适应采样器。代码可在https://github.com/omerlux/DUS获取。

英文摘要

Masked diffusion language models (MDLMs) promise fast, non-autoregressive text generation, yet existing samplers, which pick tokens to unmask based on model confidence, ignore interactions when unmasking multiple positions in parallel and effectively reduce to slow, autoregressive behavior. We propose the Dilated Unmasking Scheduler (DUS), an inference-only, planner-model-free method that partitions sequence positions into non-adjacent dilated groups and unmasks them in parallel so as to minimize an upper bound on joint entropy gain at each denoising step. By explicitly trading off the number of network calls against generation quality, DUS recovers most of the performance lost under traditional parallel unmasking strategies. Across math (GSM8K, MATH500), code (HumanEval, MBPP), general-knowledge (BBH, MMLU-Pro), and instruction following (IFEval) benchmarks, DUS outperforms confidence-based planners and turns the diffusion-specific quality-speed trade-off into a deterministic, predictable speedup set by the block size $B$, yielding up to $5.8\times$ wall-clock speedup over token-by-token MDLM decoding without modifying the underlying denoiser. Applied as a drop-in post-filter, dilated spacing also improves adaptive samplers. Code is available at https://github.com/omerlux/DUS.
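
To make "non-adjacent dilated groups" concrete, here is a minimal sketch of one possible partition. It is an assumption for illustration, not the schedule from the DUS paper or repository: the function name `dilated_groups` and the fixed-stride grouping are hypothetical, and the actual method may order and size its groups differently.

```python
def dilated_groups(block_size, dilation):
    """Split positions 0..block_size-1 into `dilation` groups with stride `dilation`."""
    if dilation < 2:
        raise ValueError("dilation must be at least 2 so group members are non-adjacent")
    return [list(range(offset, block_size, dilation)) for offset in range(dilation)]

# Each group would be unmasked in a single denoiser call.
groups = dilated_groups(block_size=8, dilation=4)
print(groups)  # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

Under this hypothetical schedule a block of 8 positions needs 4 network calls rather than 8, and no two positions unmasked in the same call are neighbours, which is the property the abstract relies on to limit interactions between tokens unmasked in parallel.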

2506.17326 2026-05-26 cs.LG stat.AP stat.ML 版本更新

CopulaSMOTE: A Copula-Based Oversampling Approach for Imbalanced Classification in Diabetes Prediction

CopulaSMOTE:基于Copula的过采样方法用于糖尿病预测中的不平衡分类

Agnideep Aich, Md Monzur Murshed, Bruce Wade, Sameera Hewage

发表机构 * Stanford University School of Medicine(斯坦福大学医学院) Minnesota State University(明尼苏达州立大学) University of Louisiana at Lafayette(路易斯安那大学拉斐特分校) Southern Utah University(南犹他大学)

AI总结 提出CopulaSMOTE方法,利用截断藤copula建模少数类联合依赖结构生成合成样本,在三个糖尿病数据集上结合多种分类器评估,显示能改善大表格数据集的少数类恢复。

详情
AI中文摘要

类别不平衡仍然是糖尿病等疾病临床预测模型开发中的一个实际障碍,其中确诊病例的数量通常远少于对照组。合成少数类过采样技术(SMOTE)及其变体被广泛用于解决这种不平衡,但它们通过特征空间中的局部插值生成合成观测值,并未显式建模少数类的联合依赖结构。为了解决这一挑战,我们的研究引入了一种基于copula的数据增强方法,该方法在生成合成样本时估计少数类的依赖结构,并与标准机器学习技术集成。具体来说,我们采用截断藤copula通过一系列双变量构建块来表示多元依赖。我们在三个公共糖尿病数据集上评估了所提出的方法,即Pima Indians糖尿病数据集、Iraqi糖尿病数据集和CDC BRFSS 2015糖尿病健康指标数据集,这些数据集涵盖了不同的样本量、维度和不平衡程度。对于每个数据集,使用5×2交叉验证协议和Dietterich配对t检验,在五个分类器上比较了五种重采样策略。我们的研究结果表明,CopulaSMOTE可以改善较大表格糖尿病数据集(尤其是CDC BRFSS数据集)中的少数类恢复,但其优势取决于分类器和评估指标。

英文摘要

Class imbalance remains a practical obstacle in the development of clinical prediction models for conditions such as diabetes mellitus, where the number of confirmed cases is often much smaller than the number of controls. The Synthetic Minority Over-sampling Technique (SMOTE) and its variants are widely used to address this imbalance, but they generate synthetic observations through local interpolation in feature space and do not explicitly model the joint dependence structure of the minority class. To address this challenge, our study introduces a copula-based data augmentation approach that estimates the minority-class dependence structure when generating synthetic samples and integrates with standard machine learning techniques. Specifically, we employ truncated vine copulas to represent multivariate dependence through a sequence of bivariate building blocks. We evaluate the proposed approach on three public diabetes datasets, namely the Pima Indians Diabetes dataset, the Iraqi Diabetes dataset, and the CDC BRFSS 2015 Diabetes Health Indicators dataset, which together cover a range of sample sizes, dimensionalities, and imbalance regimes. For each dataset, five resampling strategies are compared across five classifiers using a 5 by 2 cross validation protocol with Dietterich's paired t test. Our findings suggest that CopulaSMOTE can improve minority-class recovery in larger tabular diabetes datasets, particularly the CDC BRFSS dataset, but its advantages depend on the classifier and evaluation metric.
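
The "local interpolation" that the abstract attributes to SMOTE can be sketched in a few lines. This shows the baseline being criticized, not CopulaSMOTE: the helper name `smote_sample` and the toy data are hypothetical, and a production implementation (for example in imbalanced-learn) handles neighbour search and sampling ratios more carefully.

```python
import random

def smote_sample(minority, k=2, rng=None):
    """Create one synthetic point between a minority sample and one of its k nearest neighbours."""
    rng = rng or random.Random(0)
    i = rng.randrange(len(minority))
    x = minority[i]
    # Rank the other minority points by squared Euclidean distance to x.
    others = [p for j, p in enumerate(minority) if j != i]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
    neighbour = rng.choice(others[:k])
    lam = rng.random()  # interpolation weight in [0, 1)
    return [a + lam * (b - a) for a, b in zip(x, neighbour)]

minority = [[1.0, 2.0], [1.2, 2.1], [0.9, 1.8], [1.1, 2.4]]
print(smote_sample(minority))
```

Because each synthetic point lies on a segment between two existing samples, the generator never looks at the joint distribution of the features, which is the gap a copula-based sampler is meant to address.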

2506.11027 2026-05-26 cs.LG cs.AI cs.PL 版本更新

From Reasoning to Code: GRPO Optimization for Underrepresented Languages

从推理到代码:针对代表性不足语言的GRPO优化

Federico Pennino, Bianca Raimondi, Massimo Rondelli, Andrea Gurioli, Maurizio Gabbrielli

发表机构 * Qwen2.5-Coder

AI总结 提出结合Qwen2.5-Coder小模型与GRPO的强化学习方法,利用执行反馈和奖励机制提升Prolog、Lisp等低资源语言的代码生成准确性与推理质量。

Comments Accepted ICLP 2026

详情
AI中文摘要

使用大型语言模型(LLM)生成准确且可执行的代码对于代表性不足的编程语言(如Prolog和Lisp)仍然是一个重大挑战,因为与Python等高资源语言相比,公共训练数据稀缺。本文介绍了一种可泛化的强化学习(RL)方法,将Qwen2.5-Coder模型的小规模版本与组相对策略优化(GRPO)相结合,通过推理实现有效的代码生成。为了解决稀疏数据集的局限性,我们将执行驱动的反馈直接集成到RL循环中,利用一个奖励系统,该系统同时利用逻辑正确性和结构格式。在GSM8K数据集上的实验结果表明,在代表性不足的语言中,推理质量和代码准确性有显著提升。这些发现强调了我们的方法通过利用符号推理和基于解释器的反馈,使缺乏广泛训练资源的多种编程语言受益的潜力。

英文摘要

Generating accurate and executable code using Large Language Models (LLMs) remains a significant challenge for underrepresented programming languages, such as Prolog and Lisp, due to the scarcity of public training data compared to high-resource languages like Python. This paper introduces a generalizable Reinforcement Learning (RL) approach that combines small-scale versions of the Qwen2.5-Coder model with Group Relative Policy Optimization (GRPO) to enable effective code generation through reasoning. To address the limitations of sparse datasets, we integrate execution-driven feedback directly into the RL loop, utilizing a reward system that exploits both logical correctness and structural formatting. Experimental results on the GSM8K dataset demonstrate significant improvements in reasoning quality and code accuracy across underrepresented languages. These findings underscore the potential of our approach to benefit a wide range of programming languages lacking extensive training resources by leveraging symbolic reasoning and interpreter-based feedback.
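
The core of GRPO is that each sampled completion is scored relative to the other completions for the same prompt. The sketch below illustrates that normalization with a toy reward; the reward weights, the function names, and the use of the population standard deviation are assumptions for illustration and are not taken from the paper.

```python
import statistics

def reward(executes_correctly, well_formatted, w_correct=1.0, w_format=0.2):
    """Toy reward mixing correctness and formatting (the weights are hypothetical)."""
    return w_correct * float(executes_correctly) + w_format * float(well_formatted)

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions sampled for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]

# Four completions for one prompt: (interpreter result correct?, output well formatted?)
group = [(True, True), (False, True), (False, False), (True, False)]
rewards = [reward(c, f) for c, f in group]
print(group_relative_advantages(rewards))
```

Completions whose code runs correctly receive positive advantages and the rest negative ones, so no separate value network is needed to decide which samples to reinforce.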

2506.06840 2026-05-26 stat.ML cs.AI cs.LG stat.AP stat.OT 版本更新

A Statistical Framework for Model Selection in LSTM Networks

LSTM网络中模型选择的统计框架

Fahad Mostafa

发表机构 * School of Mathematical and Natural Sciences, Arizona State University(数学与自然科学院,亚利桑那州立大学)

AI总结 针对LSTM网络模型选择依赖启发式且计算昂贵的问题,提出统一统计框架,通过扩展信息准则和收缩估计到序列神经网络,定义适应时间结构的惩罚似然、广义阈值方法处理隐状态动态,并利用变分贝叶斯和近似边际似然实现高效估计,在生物医学数据上验证了灵活性和性能提升。

详情
AI中文摘要

长短期记忆(LSTM)神经网络模型已成为从自然语言处理到时间序列预测等众多应用中序列数据建模的基石。尽管取得了成功,但模型选择问题,包括超参数调优、架构规范和正则化选择,仍然很大程度上是启发式的且计算昂贵。在本文中,我们提出了一个统一的统计框架,用于LSTM网络中的系统模型选择。我们的框架将经典的模型选择思想,如信息准则和收缩估计,扩展到序列神经网络。我们定义了适应时间结构的惩罚似然,提出了一个用于隐状态动态的广义阈值方法,并利用变分贝叶斯和近似边际似然方法提供了高效的估计策略。几个以生物医学数据为中心的示例展示了所提出框架的灵活性和改进的性能。

英文摘要

Long Short-Term Memory (LSTM) neural network models have become the cornerstone for sequential data modeling in numerous applications, ranging from natural language processing to time series forecasting. Despite their success, the problem of model selection, including hyperparameter tuning, architecture specification, and regularization choice, remains largely heuristic and computationally expensive. In this paper, we propose a unified statistical framework for systematic model selection in LSTM networks. Our framework extends classical model selection ideas, such as information criteria and shrinkage estimation, to sequential neural networks. We define penalized likelihoods adapted to temporal structures, propose a generalized threshold approach for hidden state dynamics, and provide efficient estimation strategies using variational Bayes and approximate marginal likelihood methods. Several biomedical data-centric examples demonstrate the flexibility and improved performance of the proposed framework.

2506.06454 2026-05-26 cs.LG cs.AI stat.ML 版本更新

LETS Forecast: Learning Embedology for Time Series Forecasting

LETS Forecast:用于时间序列预测的嵌入学

Abrar Majeedi, Viswanatha Reddy Gajjala, Satya Sai Srinath Namburi GNVV, Nada Magdi Elkordi, Yin Li

发表机构 * Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison(生物统计学与医学信息学系,威斯康星大学麦迪逊分校) Department of Computer Sciences, University of Wisconsin-Madison(计算机科学系,威斯康星大学麦迪逊分校)

AI总结 提出DeepEDM框架,结合非线性动力系统建模与深度学习,通过延迟嵌入和核回归学习潜在动态,实现高精度时间序列预测。

Comments Accepted at International Conference on Machine Learning (ICML) 2025

详情
AI中文摘要

现实世界的时间序列通常受复杂的非线性动力学支配。理解这些潜在动力学对于精确的未来预测至关重要。虽然深度学习在时间序列预测中取得了重大成功,但许多现有方法并未显式建模动力学。为弥补这一差距,我们引入了DeepEDM,一个将非线性动力系统建模与深度神经网络相结合的框架。受经验动态建模(EDM)启发并基于Takens定理,DeepEDM提出了一种新颖的深度模型,该模型从时间延迟嵌入中学习潜在空间,并使用核回归来逼近潜在动力学,同时利用softmax注意力的高效实现,允许对未来时间步进行准确预测。为了评估我们的方法,我们在非线性动力系统的合成数据以及跨领域的真实世界时间序列上进行了全面实验。结果表明,DeepEDM对输入噪声具有鲁棒性,并在预测准确性上优于最先进的方法。我们的代码可在以下网址获取:https://abrarmajeedi.github.io/deep_edm。

英文摘要

Real-world time series are often governed by complex nonlinear dynamics. Understanding these underlying dynamics is crucial for precise future prediction. While deep learning has achieved major success in time series forecasting, many existing approaches do not explicitly model the dynamics. To bridge this gap, we introduce DeepEDM, a framework that integrates nonlinear dynamical systems modeling with deep neural networks. Inspired by empirical dynamic modeling (EDM) and rooted in Takens' theorem, DeepEDM presents a novel deep model that learns a latent space from time-delayed embeddings, and employs kernel regression to approximate the underlying dynamics, while leveraging efficient implementation of softmax attention and allowing for accurate prediction of future time steps. To evaluate our method, we conduct comprehensive experiments on synthetic data of nonlinear dynamical systems as well as real-world time series across domains. Our results show that DeepEDM is robust to input noise, and outperforms state-of-the-art methods in forecasting accuracy. Our code is available at: https://abrarmajeedi.github.io/deep_edm.
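
The two ingredients named in the abstract, time-delayed embeddings and kernel regression, can be sketched without any learned components. This is a classical illustration under stated assumptions, not DeepEDM: there is no latent encoder, the Gaussian kernel and `bandwidth` value are arbitrary choices, and the function names are hypothetical. The softmax over negative squared distances is what links kernel regression to softmax attention.

```python
import math

def delay_embed(series, dim, lag=1):
    """Return vectors [x_t, x_{t-lag}, ..., x_{t-(dim-1)*lag}] for every valid t."""
    start = (dim - 1) * lag
    return [[series[t - j * lag] for j in range(dim)] for t in range(start, len(series))]

def kernel_forecast(series, dim=2, lag=1, bandwidth=0.1):
    """One-step forecast by softmax-weighted (Nadaraya-Watson) regression."""
    start = (dim - 1) * lag
    emb = delay_embed(series, dim, lag)
    query, library = emb[-1], emb[:-1]
    targets = series[start + 1:]  # the value that followed each library vector
    scores = [-sum((a - b) ** 2 for a, b in zip(query, x)) / bandwidth for x in library]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    return sum(w * y for w, y in zip(weights, targets)) / sum(weights)

series = [0.0, 1.0, 0.0, -1.0] * 5  # period-4 signal ending at -1.0
print(kernel_forecast(series))      # close to 0.0, the next value of the cycle
```

The delay vector disambiguates the two places where the signal equals 0.0 (rising versus falling), which a forecaster looking only at the latest value could not do.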

2506.04805 2026-05-26 cs.LG 版本更新

Adaptive Preconditioners Trigger Loss Spikes in Adam

Adam中的自适应预处理器引发损失尖峰

Zhiwei Bai, Zhangchen Zhou, Jiajie Zhao, Xiaolong Li, Zhiyu Li, Feiyu Xiong, Hongkang Yang, Yaoyu Zhang, Zhi-Qin John Xu

发表机构 * Institute of Natural Sciences, MOE-LSC, Shanghai Jiao Tong University(上海交通大学自然科学研究院) School of Mathematical Sciences, Shanghai Jiao Tong University(上海交通大学数学科学学院) MemTensor (Shanghai) Technology Co., Ltd.(MemTensor(上海)科技有限公司) Institute for Advanced Algorithms Research, Shanghai(上海先进算法研究院) Shanghai Seres Information Technology Co., Ltd, Shanghai 200040, China(上海塞瑞斯信息技术有限公司,上海200040,中国)

AI总结 通过分析Adam二阶矩估计器的内部动力学,发现自适应预处理器与瞬时平方梯度之间的解耦机制导致损失尖峰,并基于二次近似分析提出尖峰预测方法。

Comments Accepted to ICML 2026

详情
AI中文摘要

损失尖峰在使用Adam优化器训练神经网络时普遍出现,跨越不同架构和规模,但其潜在机制仍不清楚。虽然先前的解释将这些现象归因于较低损失处更尖锐的损失景观,但我们表明仅景观几何不足以解释该现象。在这项工作中,我们将根本原因定位在Adam二阶矩估计器的内部动力学中。我们识别出一个关键的“解耦”机制,其中自适应预处理器 $v_t$ 未能跟踪瞬时平方梯度 $g_t^2$,导致自适应机制有效失效。这种解耦允许预处理器在梯度上升时自主衰减,从而将预处理Hessian的最大特征值推至稳定阈值 $2/η$ 以上持续一段时间,表现为剧烈的损失尖峰。通过二次近似分析,我们从理论和实验上刻画了尖峰演化的五个不同阶段,并提出了基于梯度方向曲率预测尖峰的指标。我们经验性地发现,所提出的损失尖峰机制虽然源于简化模型,但能很好地推广到从小型神经网络到大规模Transformer的实际场景。

英文摘要

Loss spikes commonly emerge during neural network training with the Adam optimizer across diverse architectures and scales, yet their underlying mechanism remains elusive. While previous explanations attribute these phenomena to sharper loss landscapes at lower loss, we show that landscape geometry alone is insufficient to explain the phenomenon. In this work, we pinpoint the root cause in the internal dynamics of Adam's second moment estimator. We identify a critical "decoupling" mechanism where the adaptive preconditioner $v_t$ fails to track the instantaneous squared gradients $g_t^2$, causing the adaptive mechanism to effectively fail. This decoupling allows the preconditioner to decay autonomously despite rising gradients, which pushes the maximum eigenvalue of the preconditioned Hessian beyond the stability threshold $2/η$ for sustained periods, manifesting as dramatic loss spikes. Through a quadratic approximation analysis, we theoretically and experimentally characterize five distinct stages of spike evolution and propose a predictor for anticipating spikes based on gradient-directional curvature. We empirically find that the proposed loss spike mechanism, although derived from simplified models, generalizes well to practical scenarios ranging from small neural networks to large-scale Transformers.
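
The stability argument in the abstract can be illustrated on a one-dimensional quadratic. The sketch below is a simplification under stated assumptions: it ignores momentum and bias correction, fixes the curvature, and uses hypothetical helper names and parameter values. When $g_t^2$ is much smaller than $v_{t-1}$, the update $v_t = \beta_2 v_{t-1} + (1-\beta_2) g_t^2$ is dominated by its first term, so $v_t$ shrinks geometrically and the preconditioned curvature $\lambda/\sqrt{v_t}$ eventually exceeds $2/η$.

```python
def second_moment_trace(grads, beta2=0.99, v0=0.0):
    """Adam's second-moment EMA, v_t = beta2 * v_{t-1} + (1 - beta2) * g_t^2 (no bias correction)."""
    v, trace = v0, []
    for g in grads:
        v = beta2 * v + (1.0 - beta2) * g * g
        trace.append(v)
    return trace

def first_unstable_step(curvature, v_trace, lr, eps=1e-8):
    """First step at which curvature / (sqrt(v_t) + eps) exceeds 2 / lr, or None if it never does."""
    for t, v in enumerate(v_trace):
        if curvature / (v ** 0.5 + eps) > 2.0 / lr:
            return t
    return None

# Gradients stay tiny while v starts from a large past value, so v decays on its own.
decaying = second_moment_trace([1e-6] * 1000, beta2=0.99, v0=1.0)
print(first_unstable_step(curvature=1.0, v_trace=decaying, lr=0.1))

# With gradients matching the past scale, v stays near 1 and the threshold is never crossed.
steady = second_moment_trace([1.0] * 1000, beta2=0.99, v0=1.0)
print(first_unstable_step(curvature=1.0, v_trace=steady, lr=0.1))
```

A check of this kind only flags when the threshold is crossed in the toy setting; the predictor proposed in the paper is based on gradient-directional curvature and is not reproduced here.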

2505.22322 2026-05-26 cs.LG 版本更新

A Closer Look on Memorization in Tabular Diffusion Model: A Data-Centric Perspective

表格扩散模型中记忆化的深入探究:以数据为中心的观点

Zhengyu Fang, Zhimeng Jiang, Huiyuan Chen, Xiaoge Zhang, Kaiyu Tang, Xiao Li, Jing Li

发表机构 * Department of Computer and Data Sciences(计算机与数据科学系) Case Western Reserve University(凯斯西储大学) Department of Computer Science & Engineering(计算机科学与工程系) Texas A&M University(德克萨斯农工大学) Department of Biochemistry(生物化学系) Center for RNA Science and Therapeutics(RNA科学与治疗中心) Department of Biomedical Engineering(生物医学工程系)

AI总结 本文首次从数据角度研究表格扩散模型中的记忆化动态,通过量化每个真实样本的记忆化程度,发现少数样本贡献了大部分泄露,并提出两阶段缓解方法DynamicCut。

Comments Published in Transactions on Machine Learning Research (TMLR), 2026

详情
AI中文摘要

扩散模型在生成高质量表格数据方面表现出色,但通过重现精确训练样本带来隐私风险。先前工作侧重于数据集级增强以减少记忆化,但鲜有研究哪些个体样本贡献最大。我们首次从数据角度研究表格扩散模型中的记忆化动态。我们基于有多少生成样本被标记为副本,使用相对距离比率量化每个真实样本的记忆化程度。实证分析揭示了记忆化计数的重尾分布:一小部分样本对泄露贡献不成比例,通过样本移除实验得到证实。为理解这一点,我们将真实样本分为顶部记忆化和非顶部记忆化两组,分析其训练时行为。我们追踪每个样本首次被记忆化的时间,并监测每轮记忆化强度(AUC)。记忆化样本稍早被记忆化,并在早期训练中表现出更强信号。基于这些见解,我们提出DynamicCut,一种两阶段、模型无关的缓解方法:(a)按轮次强度对样本排序,(b)修剪可调顶部比例,(c)在过滤后的数据集上重新训练。在多个表格数据集和模型上,DynamicCut减少了记忆化,对数据多样性和下游性能影响最小。它还补充了基于增强的防御。此外,DynamicCut实现了跨模型迁移性:从一个模型(如扩散模型)识别出的高排名样本,当从其他模型(如GAN和VAE)中移除时,也能有效减少记忆化。

英文摘要

Diffusion models have shown strong performance in generating high-quality tabular data, but they carry privacy risks by reproducing exact training samples. While prior work focuses on dataset-level augmentation to reduce memorization, little is known about which individual samples contribute most. We present the first data-centric study of memorization dynamics in tabular diffusion models. We quantify memorization for each real sample based on how many generated samples are flagged as replicas, using a relative distance ratio. Our empirical analysis reveals a heavy-tailed distribution of memorization counts: a small subset of samples contributes disproportionately to leakage, confirmed via sample-removal experiments. To understand this, we divide real samples into top- and non-top-memorized groups and analyze their training-time behaviors. We track when each sample is first memorized and monitor per-epoch memorization intensity (AUC). Memorized samples are memorized slightly earlier and show stronger signals in early training. Based on these insights, we propose DynamicCut, a two-stage, model-agnostic mitigation method: (a) rank samples by epoch-wise intensity, (b) prune a tunable top fraction, and (c) retrain on the filtered dataset. Across multiple tabular datasets and models, DynamicCut reduces memorization with minimal impact on data diversity and downstream performance. It also complements augmentation-based defenses. Furthermore, DynamicCut enables cross-model transferability: high-ranked samples identified from one model (e.g., a diffusion model) are also effective for reducing memorization when removed from others, such as GANs and VAEs.
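
Steps (a) and (b) of the procedure described in the abstract amount to a rank-and-prune filter. The following is a minimal sketch assuming that a per-sample memorization intensity score has already been computed; the function name `prune_top_memorized` and the toy scores are hypothetical, and step (c), retraining on the kept samples, is left out.

```python
def prune_top_memorized(intensity, fraction):
    """Return indices of samples kept after dropping the top `fraction` by intensity."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    n_drop = int(len(intensity) * fraction)
    # (a) rank samples from most to least memorized
    ranked = sorted(range(len(intensity)), key=lambda i: intensity[i], reverse=True)
    # (b) prune the top fraction, preserving the original order of the rest
    dropped = set(ranked[:n_drop])
    return [i for i in range(len(intensity)) if i not in dropped]

scores = [0.1, 0.9, 0.3, 0.8, 0.2]  # hypothetical per-sample intensity values
print(prune_top_memorized(scores, fraction=0.4))  # [0, 2, 4]
```

Because the filter only needs a score per training row, the same kept-index list could be reused when training a different generator, which is the cross-model transfer setting the abstract mentions.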

2505.18190 2026-05-26 eess.SP cs.AI cs.LG 版本更新

PhySense: Sensor Placement Optimization for Accurate Physics Sensing

PhySense:面向精确物理感知的传感器布局优化

Yuezhou Ma, Haixu Wu, Hang Zhou, Huikun Weng, Jianmin Wang, Mingsheng Long

发表机构 * School of Software, BNRist, Tsinghua University(软件学院,BNRist,清华大学)

AI总结 提出PhySense两阶段框架,通过流生成模型和投影梯度下降联合优化传感器布局与物理场重建,实现高精度物理感知。

详情
AI中文摘要

物理感知在许多科学和工程领域中扮演着核心角色,它固有地涉及两个耦合的任务:从稀疏观测中重建密集物理场,以及优化分散的传感器布局以观测最大信息。虽然深度学习在稀疏数据重建方面取得了快速进展,但现有方法通常忽略传感器布局的优化,将重建与布局之间的相互增强束之高阁。为了改变这种次优实践,我们提出了PhySense,一个协同的两阶段框架,学习联合重建物理场和优化传感器布局,两者都旨在实现精确的物理感知。第一阶段涉及一个基于流的生成模型,通过交叉注意力增强以自适应地融合稀疏观测。利用重建反馈,第二阶段通过投影梯度下降执行传感器布局以满足空间约束。我们进一步证明两个阶段的学习目标与经典方差最小化原则一致,提供了理论保证。在三个具有挑战性的基准测试(特别是3D几何数据集)上的大量实验表明,PhySense实现了最先进的物理感知精度,并发现了以前未考虑的信息丰富的传感器布局。代码可在以下仓库获取:https://github.com/thuml/PhySense。

英文摘要

Physics sensing plays a central role in many scientific and engineering domains, which inherently involves two coupled tasks: reconstructing dense physical fields from sparse observations and optimizing scattered sensor placements to observe maximum information. While deep learning has made rapid advances in sparse-data reconstruction, existing methods generally omit optimization of sensor placements, leaving the mutual enhancement between reconstruction and placement on the shelf. To change this suboptimal practice, we propose PhySense, a synergistic two-stage framework that learns to jointly reconstruct physical fields and to optimize sensor placements, both aiming for accurate physics sensing. The first stage involves a flow-based generative model enhanced by cross-attention to adaptively fuse sparse observations. Leveraging the reconstruction feedback, the second stage performs sensor placement via projected gradient descent to satisfy spatial constraints. We further prove that the learning objectives of the two stages are consistent with classical variance-minimization principles, providing theoretical guarantees. Extensive experiments across three challenging benchmarks, especially a 3D geometry dataset, indicate PhySense achieves state-of-the-art physics sensing accuracy and discovers informative sensor placements previously unconsidered. Code is available at this repository: https://github.com/thuml/PhySense.
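
Projected gradient descent, the optimizer named for the second stage, alternates a gradient step with a projection back onto the feasible set. The sketch below assumes a simple box constraint and a stand-in quadratic objective; both are illustrative assumptions, since the real method uses reconstruction feedback as the objective and its spatial constraints depend on the geometry.

```python
def project_to_box(point, lo, hi):
    """Clamp every coordinate into [lo, hi]; this is the Euclidean projection onto a box."""
    return [min(max(x, lo), hi) for x in point]

def projected_gradient_descent(start, grad_fn, lo, hi, lr=0.1, steps=100):
    """Gradient step followed by projection, repeated for a fixed number of steps."""
    p = project_to_box(start, lo, hi)
    for _ in range(steps):
        g = grad_fn(p)
        p = project_to_box([x - lr * gx for x, gx in zip(p, g)], lo, hi)
    return p

# Stand-in objective: squared distance to a target that lies partly outside the box.
target = [1.5, 0.3]
grad = lambda p: [2.0 * (x - t) for x, t in zip(p, target)]

print(projected_gradient_descent([0.5, 0.5], grad, lo=0.0, hi=1.0))
```

The first coordinate settles on the boundary of the box while the second reaches its unconstrained optimum, which is the behaviour one would want from a sensor that cannot leave the allowed region.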

2505.11788 2026-05-26 cs.DC cs.IT cs.LG cs.NI eess.SP math.IT 版本更新

Communication-Efficient Hybrid Language Model via Uncertainty-Aware Opportunistic and Compressed Transmission

基于不确定性感知的机会主义与压缩传输的通信高效混合语言模型

Seungeun Oh, Jinhyuk Kim, Jihong Park, Seung-Woo Ko, Jinho Choi, Tony Q. S. Quek, Seong-Lyun Kim

发表机构 * School of Electrical and Electronic Engineering, Yonsei University, South Korea(延世大学电气电子工程学院,韩国) Information Systems Technology and Design pillar, Singapore University of Technology and Design, Singapore 487372(新加坡科技设计大学信息系统技术与设计支柱,新加坡487372) Department of Smart Mobility Engineering, Inha University, South Korea(仁荷大学智能移动工程系,韩国) School of Electrical and Mechanical Engineering, The University of Adelaide, Australia(阿德莱德大学电气与机械工程学院,澳大利亚) Singapore University of Technology and Design, Singapore 487372(新加坡科技设计大学,新加坡487372)

AI总结 提出通信高效的混合语言模型CU-HLM,通过不确定性感知的机会主义传输和词汇表压缩,在保持97.4%准确率的同时实现高达206倍的令牌吞吐量提升。

Comments 17 pages, 13 figures, 5 tables; This article has been accepted for publication in IEEE Transactions on Communications. This is the author's accepted version; the final published version will be available via IEEE Xplore

详情
AI中文摘要

为了支持使用分散异构计算资源的新兴语言应用,混合语言模型(HLM)提供了一种有前景的架构,其中设备端的小语言模型(SLM)生成草稿令牌,由远程大语言模型(LLM)验证和纠正。然而,原始HLM存在巨大的通信开销,因为LLM要求SLM为每个令牌上传完整的词汇分布。此外,当LLM验证极有可能被接受的令牌时,通信和计算资源都被浪费。为了克服这些限制,我们提出了通信高效且不确定性感知的HLM(CU-HLM)。在CU-HLM中,SLM仅在其输出不确定性高时才传输截断的词汇分布。我们通过发现SLM的不确定性与LLM的拒绝概率之间的强相关性,验证了这种机会主义传输的可行性。此外,我们从理论上推导了最优不确定性阈值和最优词汇截断策略。仿真结果表明,与标准HLM相比,CU-HLM通过跳过74.8%的传输并压缩97.4%的词汇,实现了高达206倍的令牌吞吐量提升,同时保持了97.4%的准确率。

英文摘要

To support emerging language-based applications using dispersed and heterogeneous computing resources, the hybrid language model (HLM) offers a promising architecture, where an on-device small language model (SLM) generates draft tokens that are validated and corrected by a remote large language model (LLM). However, the original HLM suffers from substantial communication overhead, as the LLM requires the SLM to upload the full vocabulary distribution for each token. Moreover, both communication and computation resources are wasted when the LLM validates tokens that are highly likely to be accepted. To overcome these limitations, we propose communication-efficient and uncertainty-aware HLM (CU-HLM). In CU-HLM, the SLM transmits truncated vocabulary distributions only when its output uncertainty is high. We validate the feasibility of this opportunistic transmission by discovering a strong correlation between SLM's uncertainty and LLM's rejection probability. Furthermore, we theoretically derive optimal uncertainty thresholds and optimal vocabulary truncation strategies. Simulation results show that, compared to standard HLM, CU-HLM achieves up to 206$\times$ higher token throughput by skipping 74.8% transmissions with 97.4% vocabulary compression, while maintaining 97.4% accuracy.
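
The device-side decision described in the abstract (transmit only when uncertain, and then only a truncated distribution) can be sketched as follows. The uncertainty measure (Shannon entropy), the top-k truncation rule, and the threshold value are assumptions for illustration; the paper derives its own optimal threshold and truncation strategy, which are not reproduced here.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def maybe_transmit(probs, threshold, top_k):
    """Return None to skip the uplink, else a renormalized top-k {token_id: prob} dict."""
    if entropy(probs) <= threshold:
        return None  # confident draft: keep the token locally, send nothing
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    mass = sum(probs[i] for i in top)
    return {i: probs[i] / mass for i in top}

confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.40, 0.30, 0.20, 0.10]
print(maybe_transmit(confident, threshold=0.5, top_k=2))
print(maybe_transmit(uncertain, threshold=0.5, top_k=2))
```

The first call skips transmission entirely and the second sends two entries instead of four, which mirrors the two savings reported in the abstract: skipped transmissions and vocabulary compression.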

2505.05880 2026-05-26 cs.AI cs.LG 版本更新

Combining Abstract Argumentation and Machine Learning for Efficiently Analyzing Low-Level Process Event Streams

结合抽象论证与机器学习高效分析低层过程事件流

Bettina Fazzinga, Sergio Flesca, Filippo Furfaro, Luigi Pontieri, Francesco Scala

发表机构 * University of Calabria(卡拉布里亚大学) CNR(国家科研委员会)

AI总结 提出一种数据高效的神经符号方法,通过抽象论证框架(AAF)优化序列标注模型生成的候选事件解释,以解决低层过程事件流中事件到活动映射的不确定性问题。

详情
AI中文摘要

监控和分析过程轨迹是现代公司和组织的一项关键任务。在轨迹事件与参考业务活动之间存在差距的场景中,这涉及一个解释问题,即将任何正在进行的轨迹的每个事件转换为活动实例的相应步骤。基于最近将解释问题框架化为抽象论证框架(AAF)内的接受问题的方法,可以优雅地分析可能的(可能以聚合形式)事件解释,并为那些与先验过程知识冲突的解释提供解释。由于在事件到活动映射高度不确定(或简单地说未充分指定)的环境中,这种基于推理的方法可能产生低信息量的结果和繁重的计算,因此可以考虑发现一个序列标注模型,该模型经过训练以上下文感知的方式建议高概率的候选事件解释。然而,最优地训练这样的模型可能需要使用大量手动注释的示例轨迹。因此,我们提出了一种数据高效的神经符号方法,其中由示例驱动的序列标注器返回的候选解释由基于AAF的推理器进行细化。这使我们能够利用先验知识来补偿示例数据的稀缺性,实验结果证实了这一点。

英文摘要

Monitoring and analyzing process traces is a critical task for modern companies and organizations. In scenarios where there is a gap between trace events and reference business activities, this entails an interpretation problem, amounting to translating each event of any ongoing trace into the corresponding step of the activity instance. Building on a recent approach that frames the interpretation problem as an acceptance problem within an Abstract Argumentation Framework (AAF), one can elegantly analyze plausible event interpretations (possibly in an aggregated form), as well as offer explanations for those that conflict with prior process knowledge. Since this reasoning-based approach may yield poorly informative results and heavy computation in settings where the event-to-activity mapping is highly uncertain (or simply under-specified), one can think of discovering a sequence-tagging model, trained to suggest highly probable candidate event interpretations in a context-aware way. However, training such a model optimally may require a large amount of manually annotated example traces. We therefore propose a data-efficient neuro-symbolic approach to the problem, where the candidate interpretations returned by the example-driven sequence tagger are refined by the AAF-based reasoner. This allows us to also leverage prior knowledge to compensate for the scarcity of example data, as confirmed by experimental results.

2502.11167 2026-05-26 cs.LG cs.CL 版本更新

SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors

SURGE: 大型语言模型作为通用代理代码执行器的潜力

Bohan Lyu, Siqiao Huang, Zichen Liang

发表机构 * Department of Computer Science and Technology, Tsinghua(清华大学计算机科学与技术系) Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua(清华大学交叉信息研究院)

AI总结 提出SURGE基准,包含1160个问题覆盖8个关键方面,通过评估21个开源和专有LLM,研究其作为代码执行预测代理模型的可行性、扩展律、数据效率和预测准确性。

详情
Journal ref
Proceedings of The 2025 Conference on Empirical Methods in Natural Language Processing
AI中文摘要

神经代理模型是数据挖掘中强大且高效的工具。同时,大型语言模型(LLM)在代码相关任务(如生成和理解)中展示了卓越的能力。然而,一个同样重要但尚未充分探索的问题是,LLM是否可以作为代码执行预测的代理模型。为了系统研究这一问题,我们引入了SURGE,一个包含1160个问题的综合基准,覆盖8个关键方面:多语言编程任务、竞赛级编程问题、仓库级代码分析、高成本科学计算、时间复杂度密集型算法、有缺陷代码分析、依赖特定编译器或执行环境的程序,以及形式化数学证明验证。通过对21个开源和专有LLM的广泛分析,我们研究了扩展律、数据效率和预测准确性。我们的发现揭示了LLM作为计算过程高效代理的可行性的重要见解。基准和评估框架可在https://github.com/Imbernoulli/SURGE获取。

英文摘要

Neural surrogate models are powerful and efficient tools in data mining. Meanwhile, large language models (LLMs) have demonstrated remarkable capabilities in code-related tasks, such as generation and understanding. However, an equally important yet underexplored question is whether LLMs can serve as surrogate models for code execution prediction. To systematically investigate it, we introduce SURGE, a comprehensive benchmark with $1160$ problems covering $8$ key aspects: multi-language programming tasks, competition-level programming problems, repository-level code analysis, high-cost scientific computing, time-complexity-intensive algorithms, buggy code analysis, programs dependent on specific compilers or execution environments, and formal mathematical proof verification. Through extensive analysis of $21$ open-source and proprietary LLMs, we examine scaling laws, data efficiency, and predictive accuracy. Our findings reveal important insights about the feasibility of LLMs as efficient surrogates for computational processes. The benchmark and evaluation framework are available at https://github.com/Imbernoulli/SURGE.

2502.10311 2026-05-26 cs.LG cs.AI cs.HC 版本更新

ExplainReduce: Generating global explanations from many local explanations

ExplainReduce: 从许多局部解释生成全局解释

Lauri Seppäläinen, Mudong Guo, Kai Puolamäki

发表机构 * University of Helsinki(赫尔辛基大学)

AI总结 本文提出 ExplainReduce 方法,通过贪心启发式算法将大量局部解释缩减为少量简单模型,作为生成式全局解释,并证明其有效性和竞争力。

Comments 21 pages with a 36 page appendix, 8 + 39 figures, 1+1 tables. The datasets and source code used in the paper are available at https://github.com/edahelsinki/explainreduce. Accepted for publication in the 4th World Conference on eXplainable Artificial Intelligence (2026)

详情
AI中文摘要

最常用的非线性机器学习方法是黑箱模型,人类无法解释。可解释人工智能(XAI)领域旨在开发工具来检查这些黑箱的内部工作原理。一种常用的模型无关的 XAI 方法涉及使用简单模型作为局部近似来产生所谓的局部解释;这种方法的例子包括 LIME、SHAP 和 SLISEMAP。本文展示了如何将大量局部解释缩减为少量简单模型的“代理集”,这些模型可以作为生成式全局解释。这种缩减过程 ExplainReduce 可以表述为一个优化问题,并使用贪心启发式算法高效近似。我们表明,对于许多问题,少至五个解释就能忠实地模拟黑箱模型,并且我们的缩减过程与其他模型聚合方法相比具有竞争力。

英文摘要

Most commonly used non-linear machine learning methods are closed-box models, uninterpretable to humans. The field of explainable artificial intelligence (XAI) aims to develop tools to examine the inner workings of these closed boxes. An often-used model-agnostic approach to XAI involves using simple models as local approximations to produce so-called local explanations; examples of this approach include LIME, SHAP, and SLISEMAP. This paper shows how a large set of local explanations can be reduced to a small "proxy set" of simple models, which can act as a generative global explanation. This reduction procedure, ExplainReduce, can be formulated as an optimisation problem and approximated efficiently using greedy heuristics. We show that, for many problems, as few as five explanations can faithfully emulate the closed-box model and that our reduction procedure is competitive with other model aggregation methods.
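
A greedy reduction of many local explanations to a small proxy set can be sketched as a coverage problem. The objective used here (a data item counts as covered when some selected local model has loss at most `tol` on it) is an assumption for illustration; the paper formulates its own optimisation objectives, and the function name and toy loss matrix are hypothetical.

```python
def greedy_proxy_set(loss, k, tol):
    """Pick up to k local models, each time adding the one that covers the most new items.

    loss[m][i] is the loss of local model m when applied to data item i.
    """
    n_models, n_items = len(loss), len(loss[0])
    chosen, covered = [], set()
    for _ in range(k):
        best, best_gain = None, 0
        for m in range(n_models):
            if m in chosen:
                continue
            gain = sum(1 for i in range(n_items) if i not in covered and loss[m][i] <= tol)
            if gain > best_gain:
                best, best_gain = m, gain
        if best is None:  # no remaining model covers anything new
            break
        chosen.append(best)
        covered |= {i for i in range(n_items) if loss[best][i] <= tol}
    return chosen, covered

loss = [
    [0.1, 0.1, 0.1, 0.9, 0.9, 0.9],  # model 0 fits items 0-2
    [0.9, 0.9, 0.1, 0.1, 0.9, 0.9],  # model 1 fits items 2-3
    [0.9, 0.9, 0.9, 0.1, 0.1, 0.1],  # model 2 fits items 3-5
    [0.1, 0.9, 0.9, 0.9, 0.9, 0.1],  # model 3 fits items 0 and 5
]
print(greedy_proxy_set(loss, k=2, tol=0.2))  # ([0, 2], {0, 1, 2, 3, 4, 5})
```

Two of the four local models already cover every item in this toy case, which is the kind of redundancy among local explanations that makes a small proxy set plausible.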

2502.01397 2026-05-26 cs.LG cs.AI cs.NA math.NA 版本更新

Message-Passing GNNs Fail to Approximate Sparse Triangular Factorizations

消息传递GNN无法近似稀疏三角分解

Vladislav Trifonov, Ekaterina Muravleva, Ivan Oseledets

发表机构 * AIC, Skoltech(斯科尔科沃科技学院人工智能中心) Skoltech AI4S Center(斯科尔科沃科技学院AI4S中心) Sberbank of Russia(俄罗斯储蓄银行) AIRI

AI总结 本文通过理论和实验证明,消息传递图神经网络在逼近稀疏三角分解时存在根本性局限,需要超越消息传递的架构创新。

Comments Camera-ready version published in Transactions on Machine Learning Research

详情
Journal ref
Transactions on Machine Learning Research, 2026
AI中文摘要

图神经网络(GNN)已被提议作为学习稀疏矩阵预条件子的工具,预条件子是加速线性求解器的关键组件。我们提出理论和实验证据表明,对于存在高质量预条件子但需要非局部依赖的矩阵类别,消息传递GNN从根本上无法近似稀疏三角分解。为了说明这一点,我们使用合成矩阵和SuiteSparse集合中的真实示例构建了一组基线。在包括图注意力网络和图变换器在内的多种GNN架构中,我们观察到预测因子与参考因子之间的余弦相似度较低(关键情况下≤0.7)。我们的理论和实验结果表明,需要超越消息传递的架构创新才能将GNN应用于矩阵分解等科学计算任务。此外,实验表明仅克服非局部性是不够的。需要定制的架构来捕获所需的依赖关系,因为即使是完全非局部的全局图变换器也无法匹配所提出的基线。

英文摘要

Graph Neural Networks (GNNs) have been proposed as a tool for learning sparse matrix preconditioners, which are key components in accelerating linear solvers. We present theoretical and empirical evidence that message-passing GNNs are fundamentally incapable of approximating sparse triangular factorizations for classes of matrices for which high-quality preconditioners exist but require non-local dependencies. To illustrate this, we construct a set of baselines using both synthetic matrices and real-world examples from the SuiteSparse collection. Across a range of GNN architectures, including Graph Attention Networks and Graph Transformers, we observe low cosine similarity ($\leq0.7$ in key cases) between predicted and reference factors. Our theoretical and empirical results suggest that architectural innovations beyond message-passing are necessary for applying GNNs to scientific computing tasks such as matrix factorization. Moreover, experiments demonstrate that overcoming non-locality alone is insufficient. Tailored architectures are necessary to capture the required dependencies since even a completely non-local Global Graph Transformer fails to match the proposed baselines.
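To make the notion of non-local dependence concrete, the sketch below (an illustration, not the paper's construction) factorizes a small 1D Laplacian and shows that changing a single matrix entry at one end of the path graph changes the Cholesky factor at the other end. A message-passing network with fewer layers than the path length cannot see that far. It also computes the cosine similarity between two factors, the metric named in the abstract; flattening the factors into vectors is an assumption about how the metric is applied.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two factors, compared as flat vectors."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

n = 8
# 1D Laplacian: a path graph in which node 0 and node n-1 are n-1 hops apart.
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L_ref = np.linalg.cholesky(A)

A_pert = A.copy()
A_pert[0, 0] = 3.0            # change one entry at the far end of the path
L_pert = np.linalg.cholesky(A_pert)

# The last diagonal entry of the factor moves although only A[0, 0] changed.
shift = abs(L_pert[-1, -1] - L_ref[-1, -1])
similarity = cosine_similarity(L_pert, L_ref)
```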

2502.01184 2026-05-26 cs.LG cs.AI physics.chem-ph q-bio.QM 版本更新

FragmentNet: Adaptive Graph Fragmentation for Graph-to-Sequence Molecular Representation Learning

FragmentNet: 自适应图分片用于图到序列分子表示学习

Ankur Samanta, Rohan Gupta, Aditi Misra, Christian McIntosh Clarke, Jayakumar Rajadas

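Affiliations * Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada; Regenerative Biomaterials Laboratory, Stanford Cardiovascular Institute, Palo Alto, USA

AI summary Proposes FragmentNet, which uses an adaptive, learned tokenizer to decompose molecular graphs into chemically valid fragments and chemically aware spatial positional encodings to preserve molecular topology; masked pre-training at the fragment level improves performance on multiple property prediction tasks.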

Comments 22 pages, 13 figures, 5 tables

详情
AI中文摘要

分子表示学习方法通常将分子标记为单个原子或使用刚性、基于规则的分片分解,限制了它们捕捉有意义化学子结构上下文的能力。我们引入了FragmentNet,一种围绕新颖的自适应学习分词器构建的图到序列模型,该分词器将分子图分解为可调整粒度的化学有效片段,并辅以化学感知的空间位置编码,在生成的序列中保留分子拓扑。将自然语言处理中的掩码预训练策略扩展到分子领域,我们在化学有意义的片段级别而非单个原子级别对分子进行掩码和重建。在多个属性预测基准上的评估发现,在片段粒度上进行预训练在大多数任务上提高了下游性能,表明标记化粒度是分子表示学习的重要设计选择。

英文摘要

Molecular representation learning methods typically tokenize molecules as individual atoms or use rigid, rule-based fragment decompositions, limiting their ability to capture meaningful chemical substructure context. We introduce FragmentNet, a graph-to-sequence model built around a novel adaptive, learned tokenizer that decomposes molecular graphs into chemically valid fragments of adjustable granularity, complemented by chemically aware spatial positional encodings that preserve molecular topology in the resulting sequence. Extending masked pre-training strategies from natural language processing to the molecular domain, we mask and reconstruct molecules at the level of chemically meaningful fragments rather than individual atoms. Evaluating across multiple property prediction benchmarks, we find that pre-training at fragment granularity leads to improved downstream performance on the majority of tasks, demonstrating that tokenization granularity is an important design choice for molecular representation learning.
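The difference between atom-level and fragment-level masking can be sketched in a few lines. The example below is a toy illustration only: the fragment strings, the `[MASK]` token, and the masking ratio are hypothetical, and FragmentNet's learned tokenizer and positional encodings are not modelled.

```python
import random

def mask_fragments(fragments, mask_ratio, seed=0, mask_token="[MASK]"):
    """Mask whole fragments (not single atoms) and return inputs plus targets."""
    rng = random.Random(seed)
    n_mask = max(1, round(mask_ratio * len(fragments)))
    masked = set(rng.sample(range(len(fragments)), n_mask))
    inputs = [mask_token if i in masked else frag for i, frag in enumerate(fragments)]
    targets = {i: fragments[i] for i in masked}   # what the model must reconstruct
    return inputs, targets

# Hypothetical fragment sequence for one molecule, written as SMILES pieces.
fragments = ["c1ccccc1", "C(=O)O", "N", "CC", "O"]
inputs, targets = mask_fragments(fragments, mask_ratio=0.4)
```

Each masked position hides an entire substructure, so the reconstruction target is a chemically meaningful unit instead of a single atom.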

2501.14889 2026-05-26 cs.LG 版本更新

Iterative Feature Space Optimization through Incremental Adaptive Evaluation

通过增量自适应评估的迭代特征空间优化

Yanping Wu, Yanyong Huang, Zhengzhang Chen, Zijun Yao, Yanjie Fu, Kunpeng Liu, Xiao Luo, Dongjie Wang

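Affiliations * University of Kansas; Southwestern University of Finance and Economics; Arizona State University; Portland State University; University of California

AI summary Proposes the EASE framework, which uses a feature-sample subspace generator and a contextual attention evaluator to achieve efficient, generalizable feature space optimization, addressing evaluation bias, overfitting, and inefficiency.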

Comments 18 pages

详情
AI中文摘要

迭代特征空间优化涉及系统评估和调整特征空间以提升下游任务性能。然而,现有工作存在三个关键局限:1)忽视数据样本间的差异导致评估偏差;2)针对特定机器学习模型定制特征空间导致过拟合和泛化能力差;3)每次优化迭代需要从头重新训练评估器,显著降低整体优化效率。为弥补这些不足,我们提出一种广义自适应特征空间评估器(EASE),以高效产生最优且泛化的特征空间。该框架包含两个关键组件:特征-样本子空间生成器和上下文注意力评估器。第一个组件旨在解耦特征空间内的信息分布以减轻评估偏差。为此,我们首先根据后续评估器的反馈,识别与预测任务最相关的特征和评估中最具挑战性的样本。这种解耦策略使评估器持续聚焦于特征空间中最具挑战性的方面。第二个组件旨在增量捕获特征空间的演化模式以实现高效评估。我们提出一种加权共享多头注意力机制,将特征空间的关键特征编码为嵌入向量用于评估。此外,评估器进行增量更新,保留先前的评估知识同时融入新见解,因为优化过程中连续的特征空间共享部分信息。在十四个真实世界数据集上的大量实验证明了所提框架的有效性。我们的代码和数据已公开。

英文摘要

Iterative feature space optimization involves systematically evaluating and adjusting the feature space to improve downstream task performance. However, existing works suffer from three key limitations: 1) overlooking differences among data samples leads to evaluation bias; 2) tailoring feature spaces to specific machine learning models results in overfitting and poor generalization; 3) requiring the evaluator to be retrained from scratch during each optimization iteration significantly reduces the overall efficiency of the optimization process. To bridge these gaps, we propose a gEneralized Adaptive feature Space Evaluator (EASE) to efficiently produce optimal and generalized feature spaces. This framework consists of two key components: Feature-Sample Subspace Generator and Contextual Attention Evaluator. The first component aims to decouple the information distribution within the feature space to mitigate evaluation bias. To achieve this, we first identify features most relevant to prediction tasks and samples most challenging for evaluation based on feedback from the subsequent evaluator. This decoupling strategy makes the evaluator consistently target the most challenging aspects of the feature space. The second component intends to incrementally capture evolving patterns of the feature space for efficient evaluation. We propose a weighted-sharing multi-head attention mechanism to encode key characteristics of the feature space into an embedding vector for evaluation. Moreover, the evaluator is updated incrementally, retaining prior evaluation knowledge while incorporating new insights, as consecutive feature spaces during the optimization process share partial information. Extensive experiments on fourteen real-world datasets demonstrate the effectiveness of the proposed framework. Our code and data are publicly available.

2408.08399 2026-05-26 cs.LG cs.SY eess.SY 版本更新

Transformer-based few-shot learning for modeling Electricity Consumption Profiles with minimal data across thousands of domains

基于Transformer的少样本学习:以最少数据跨数千个领域建模电力消费曲线

Weijie Xia, Gao Peng, Chenguang Wang, Peter Palensky, Eric Pauwels, Pedro P. Vergara

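Affiliations * Intelligent Electrical Power Grids (IEPG) Group; Centrum Wiskunde & Informatica (CWI); Alliander N.V

AI summary To address data scarcity in electricity consumption profile modeling, proposes a fine-tuning-free few-shot learning framework that combines Transformers with Gaussian mixture models; it accurately recovers complex distributions with only 1.6% of the data and outperforms existing methods.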

详情
Journal ref
International Journal of Electrical Power & Energy Systems, Volume/Issue (February 2026), Article 111575
AI中文摘要

电力消费曲线(ECP)对于配电系统的运行和规划至关重要,尤其是在太阳能电池板和电动汽车等低碳技术日益普及的背景下。传统的ECP建模方法通常假设有足够的ECP数据可用。然而,在实践中,由于隐私问题或缺乏计量设备,ECP数据的可访问性有限。少样本学习(FSL)已成为数据稀缺场景下ECP建模的一种有前景的解决方案。然而,标准的FSL方法(例如用于图像的方法)不适用于ECP建模,因为(1)这些方法通常假设有多个具有充足数据的源域和多个目标域。但在ECP建模中,可能存在数千个源域(例如具有中等数据量的家庭)和数千个目标域(例如需要建模ECP的家庭)。(2)标准FSL方法通常涉及繁琐的知识迁移机制,例如预训练和微调。为了解决这些局限性,本文提出了一种新颖的FSL框架,将Transformer与高斯混合模型(GMM)相结合用于ECP建模。所提出的方法无需微调,计算效率高,即使在数据极其有限的情况下也具有鲁棒性。结果表明,我们的方法可以用最少的ECP数据(例如,仅占完整域数据集的1.6%)准确恢复复杂的ECP分布,并且在ECP建模背景下优于最先进的时间序列建模方法。

英文摘要

Electricity Consumption Profiles (ECPs) are crucial for operating and planning power distribution systems, especially with the increasing number of low-carbon technologies such as solar panels and electric vehicles. Traditional ECP modeling methods typically assume the availability of sufficient ECP data. However, in practice, the accessibility of ECP data is limited due to privacy issues or the absence of metering devices. Few-shot learning (FSL) has emerged as a promising solution for ECP modeling in data-scarce scenarios. Nevertheless, standard FSL methods, such as those used for images, are unsuitable for ECP modeling for two reasons: (1) these methods usually assume several source domains with sufficient data and several target domains, whereas in ECP modeling there may be thousands of source domains, e.g., households with a moderate amount of data, and thousands of target domains, e.g., households whose ECPs need to be modeled; and (2) standard FSL methods usually involve cumbersome knowledge transfer mechanisms, such as pre-training and fine-tuning. To address these limitations, this paper proposes a novel FSL framework that integrates Transformers with Gaussian Mixture Models (GMMs) for ECP modeling. The proposed approach is fine-tuning-free, computationally efficient, and robust even with extremely limited data. Results show that our method can accurately restore the complex ECP distribution with a minimal amount of ECP data (e.g., only 1.6% of the complete domain dataset) and outperforms state-of-the-art time series modeling methods in the context of ECP modeling.

2406.09079 2026-05-26 cs.LG 版本更新

Hadamard Representation: Scaffolding Performance Across Model-free RL

Hadamard表示:跨无模型强化学习的性能支撑

Jacob E. Kooi, Zhao Yang, Mark Hoogendoorn, Vincent François-Lavet

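Affiliations * Vrije Universiteit Amsterdam

AI summary Proposes the Hadamard Representation (HR), which replaces a standard hidden layer with the element-wise product of two independently parameterized layers, reducing neuron dormancy and increasing effective rank, and consistently improving performance across a range of reinforcement learning algorithms and domains.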

Comments 26 pages, 17 figures

详情
AI中文摘要

深度强化学习智能体在训练过程中逐渐失去表示能力:神经元变得休眠,从网络中移除活跃容量,有效秩崩溃,使存活的神经元冗余。现有的补救措施如周期性重置和特殊神经网络架构,大多局限于特定算法或领域。我们提出一个简单的架构修复,即Hadamard表示(HR),它将标准隐藏层替换为两个独立参数化层的逐元素乘积。HR通过两种互补机制运作。首先,它降低了神经元变得休眠的概率,这对于连续可微激活函数(如tanh)尤其有价值:与休眠的ReLU神经元(被有效剪枝)不同,饱和的tanh神经元通过将其输出权重转化为固定偏置而暗中破坏下游层。其次,独立于休眠,乘法结构捕获更丰富的特征交互,并在不拓宽层的情况下增加有效秩。我们在五种算法和三个领域上评估HR:基于像素的离散动作Atari上的DQN、PPO和PQN,基于状态连续控制上的SimbaV2,以及视觉连续控制上的MR.Q。HR在无需任何超参数调优的情况下,一致地优于强基线,并且其增益在参数匹配的更宽变体上仍然保持,排除了参数数量作为替代解释的可能性。

英文摘要

Deep reinforcement learning agents progressively lose representational capacity during training: neurons become dormant, removing active capacity from the network, and effective rank collapses, leaving surviving neurons redundant. Existing remedies, such as periodic resets and special neural network architectures, are largely algorithm- or domain-specific. We propose a simple architectural fix, the Hadamard Representation (HR), which replaces a standard hidden layer with the element-wise product of two independently parameterized layers. HR operates through two complementary mechanisms. First, it reduces the probability of a neuron becoming dormant, which is particularly valuable for continuously differentiable activations such as tanh: unlike dormant ReLU neurons, which are effectively pruned, saturated tanh neurons silently corrupt downstream layers by turning their outgoing weights into fixed biases. Second, independently of dormancy, the multiplicative structure captures richer feature interactions and increases effective rank without widening the layer. We evaluate HR across five algorithms and three domains: DQN, PPO, and PQN on pixel-based discrete-action Atari, SimbaV2 on state-based continuous control, and MR.Q on visual continuous control. HR consistently improves performance over the strong baselines without any hyperparameter tuning, and gains persist against parameter-matched wider variants, ruling out parameter count as an alternative explanation.
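The core construction, a hidden layer computed as the element-wise product of two independently parameterized layers, can be sketched in NumPy as follows. Applying tanh to both branches, the layer sizes, and the random initialisation are assumptions made for illustration and may differ from the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def hadamard_layer(x, W1, b1, W2, b2):
    """Element-wise product of two independently parameterized tanh layers."""
    return np.tanh(x @ W1 + b1) * np.tanh(x @ W2 + b2)

in_dim, hidden = 4, 8
# Two separate weight matrices and biases: the branches share no parameters.
W1 = rng.normal(size=(in_dim, hidden))
W2 = rng.normal(size=(in_dim, hidden))
b1, b2 = np.zeros(hidden), np.zeros(hidden)

x = rng.normal(size=(3, in_dim))       # batch of 3 inputs
h = hadamard_layer(x, W1, b1, W2, b2)  # shape (3, 8), same width as one branch
```

A unit of `h` is pinned to a constant only if both of its branches saturate at once, which gives an intuition for the reduced dormancy described in the abstract.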

2406.04374 2026-05-26 cs.IR cs.GT cs.LG stat.ML 版本更新

Incentivized Exploration with Stochastic Covariates: A Two-Stage Mechanism Design for Recommender System

带随机协变量的激励探索:推荐系统的两阶段机制设计

Yuantong Li, Guang Cheng, Xiaowu Dai

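Affiliations * Meta Platforms Inc; Department of Statistics and Data Science; University of California, Los Angeles, CA

AI summary For the exploration-exploitation tradeoff under users' self-interested preferences in recommender systems, proposes a two-stage algorithm that achieves sublinear regret while satisfying incentive constraints, using incentive-compatible exploration and an inverse proportional gap sampling strategy.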

Comments ICML 2026

详情
AI中文摘要

推荐系统通过连接用户与相关产品在互联网经济中扮演关键角色。然而,设计有效的推荐系统面临关键挑战:在确保探索新产品的激励与用户自利偏好之间的探索-利用权衡。先前工作解决了固定设计线性bandit中的贝叶斯激励相容性(Sellke & Slivkins, 2023),我们则应对在线采样的随机用户协变量的挑战。与标准的黑箱归约(Mansour et al., 2020)不同,我们的两阶段框架利用线性奖励结构,在满足激励约束的同时实现次线性遗憾。为解决该问题,我们提出一种两阶段算法,将激励探索与任何高效的即插即用离线学习算法相结合。在第一阶段,算法在保持激励相容性的同时探索产品以收集最优样本。第二阶段采用逆比例间隙采样策略(IPGS)与任何高效学习方法相结合,以确保次线性遗憾。理论上,我们证明算法RCB实现了$O(\sqrt{KdT})$遗憾,同时满足激励约束,并发现了激励预算与遗憾之间的权衡,实验验证了这一点。通过在个性化华法林剂量调整的实际应用和模拟中,我们展示了RCB的强激励增益、次线性遗憾和鲁棒性。

英文摘要

Recommender systems play a crucial role in internet economies by connecting users with relevant products. However, designing effective recommender systems faces a key challenge: the exploration-exploitation tradeoff between securing incentives to explore new products and users' self-interested preferences. While prior work addresses Bayesian Incentive Compatibility (BIC) in fixed-design linear bandits (Sellke & Slivkins, 2023), we tackle the challenge of stochastic user covariates sampled online. Unlike standard black-box reductions (Mansour et al., 2020), our two-stage framework exploits the linear reward structure to achieve sublinear regret while satisfying incentive constraints. Specifically, we propose a two-stage algorithm that integrates incentivized exploration with any efficient plug-in offline learning algorithm. In the first stage, it explores products while maintaining incentive compatibility to gather optimal samples. The second stage employs an inverse proportional gap sampling (IPGS) strategy integrated with any efficient learning method to secure sublinear regret. Theoretically, we prove that the algorithm RCB achieves $O(\sqrt{KdT})$ regret while satisfying incentive constraints, and we uncover a tradeoff between incentive budget and regret, which we validate in experiments. We demonstrate RCB's strong incentive gain, sublinear regret, and robustness through a real application on personalized warfarin dosing and simulations.

2404.08073 2026-05-26 math.OC cs.LG stat.ML 版本更新

Spurious Stationarity and Hardness Results for Bregman Proximal-Type Algorithms

Bregman近端类型算法的伪平稳性和困难结果

He Chen, Jiajin Li, Anthony Man-Cho So

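Affiliations * Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong; Sauder School of Business, University of British Columbia

AI summary This paper shows that Bregman proximal-type algorithms (such as mirror descent) can get trapped near spurious stationary points under non-Euclidean geometries; even for convex problems, if the gradient of the Bregman kernel is not Lipschitz continuous, the stagnation can persist for any finite number of iterations. The phenomenon occurs generically in nonconvex problems with polyhedral constraints and challenges existing convergence theory.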

详情
AI中文摘要

Bregman近端类型算法(BPs),如镜像下降,已成为机器学习和数据科学中通过非欧几何利用问题结构的流行工具。在本文中,我们表明BPs可能被困在一类非平稳点附近,我们称之为\emph{伪平稳点}。如果Bregman核的梯度不是Lipschitz连续的,即使对于凸问题,这种停滞也可能持续任意有限次迭代。根本原因在于欧几里得几何和Bregman几何在下降行为上的根本对比:虽然欧几里得梯度下降确保在任何非平稳点附近充分下降,但BPs可能在伪平稳点附近表现出任意缓慢的下降。因此,常用的基于Bregman的平稳性度量,例如Bregman散度的相对变化,可能在伪平稳点附近消失。这可能误导性地表明收敛,即使迭代点仍远离任何真正的平稳点。我们的分析进一步揭示,伪平稳点并非病态,而是在具有多面体约束的广泛非凸问题类中普遍出现。综上所述,我们的发现揭示了基于Bregman的优化方法中的一个严重盲点,并呼吁新的理论工具和算法保障以确保可靠的收敛。

英文摘要

Bregman proximal-type algorithms (BPs), such as mirror descent, have become popular tools in machine learning and data science for exploiting problem structures through non-Euclidean geometries. In this paper, we show that BPs can get trapped near a class of non-stationary points, which we term \emph{spurious stationary points}. Such stagnation can persist for any finite number of iterations if the gradient of the Bregman kernel is not Lipschitz continuous, even in convex problems. The root cause lies in a fundamental contrast in descent behavior between Euclidean and Bregman geometries: While Euclidean gradient descent ensures sufficient decrease near any non-stationary point, BPs may exhibit arbitrarily slow decrease around spurious stationary points. As a result, commonly used Bregman-based stationarity measures, such as the relative change in terms of Bregman divergence, can vanish near spurious stationary points. This may misleadingly suggest convergence, even when the iterates remain far from any true stationary point. Our analysis further reveals that spurious stationary points are not pathological, but rather occur generically in a broad class of nonconvex problems with polyhedral constraints. Taken together, our findings reveal a serious blind spot in Bregman-based optimization methods and call for new theoretical tools and algorithmic safeguards to ensure reliable convergence.
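A small numerical illustration of this stagnation, under assumptions chosen for simplicity and not taken from the paper: entropic mirror descent (negative-entropy kernel, whose gradient is not Lipschitz near the boundary of the simplex) on the convex linear objective f(x) = <c, x>. Starting next to the wrong vertex, the Bregman (KL) distance between consecutive iterates is tiny even though the objective gap stays close to its maximum.

```python
import numpy as np

def entropic_mirror_step(x, grad, eta):
    """One mirror descent step on the simplex with the negative-entropy kernel."""
    y = x * np.exp(-eta * grad)
    return y / y.sum()

def kl(p, q):
    """Bregman divergence of the negative-entropy kernel (KL divergence)."""
    return float(np.sum(p * np.log(p / q)))

c = np.array([1.0, 0.0])             # f(x) = <c, x>, minimized at the vertex (0, 1)
x = np.array([1.0 - 1e-12, 1e-12])   # start next to the wrong vertex (1, 0)
eta = 1.0

for _ in range(10):
    x_next = entropic_mirror_step(x, c, eta)
    step_size = kl(x_next, x)        # Bregman-based "progress" measure
    x = x_next

objective_gap = float(c @ x)         # f(x) - f(x*), with f(x*) = 0
```

After ten steps `step_size` is far below typical stopping tolerances while `objective_gap` is still essentially 1, so a stopping rule based on the Bregman distance between iterates would report convergence at a non-stationary point. A smaller starting mass on the second coordinate prolongs the effect for more iterations.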

2403.04545 2026-05-26 cs.LG math.ST stat.TH 版本更新

Branch Scaling Manifests as Implicit Architectural Regularization for Improving Generalization in Overparameterized ResNets

分支缩放表现为隐式架构正则化以改善过参数化ResNet的泛化能力

Zixiong Yu, Guhan Chen, Jianfa Lai, Bohan Li, Songtao Tian

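Affiliations * Huawei Large Model Data Technology Lab, Shenzhen; Tsinghua University, Beijing; Kyoto University, Kyoto

AI summary This paper studies how branch scaling factors in residual networks affect the generalization of overparameterized ResNets. It proves that scaling factors with rapid depth-wise decay, combined with early stopping, achieve minimax-optimal generalization rates, and explains the mechanism through a Neural Tangent Kernel (NTK) approximation.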

Comments Accepted by ICML. This version incorporates content from the preprint arXiv:2305.18506. The contributors of the relevant content have consented to its inclusion and have been listed as authors

详情
AI中文摘要

残差分支中的缩放因子已成为提升神经网络性能的流行方法,特别是在无归一化架构中。虽然先前的工作主要从优化角度研究缩放效应,本文通过泛化理论的视角探讨其在残差架构中的作用。具体来说,我们证明具有恒定缩放因子的宽残差网络(ResNet)随着深度增加渐近地变得不可学习。相反,当缩放因子表现出快速的深度方向衰减并结合早停时,过参数化ResNet实现了极小极大最优泛化率。为了建立这一结论,我们证明宽ResNet的泛化能力可以通过与神经正切核(NTK)相关的核回归来近似。我们的理论发现通过合成数据和真实世界分类任务(包括MNIST和CIFAR-100)的实验得到验证。

英文摘要

Scaling factors in residual branches have emerged as a prevalent method for boosting neural network performance, especially in normalization-free architectures. While prior work has primarily examined scaling effects from an optimization perspective, this paper investigates their role in residual architectures through the lens of generalization theory. Specifically, we establish that wide residual networks (ResNets) with constant scaling factors become asymptotically unlearnable as depth increases. In contrast, when the scaling factor exhibits rapid depth-wise decay combined with early stopping, over-parameterized ResNets achieve minimax-optimal generalization rates. To establish this, we demonstrate that the generalization capability of wide ResNets can be approximated by kernel regression associated with the Neural Tangent Kernel (NTK). Our theoretical findings are validated through experiments on synthetic data and real-world classification tasks, including MNIST and CIFAR-100.

2402.13791 2026-05-26 cs.LG 版本更新

Opening the Black-Box: A Systematic Review on Explainable AI in Remote Sensing

打开黑箱:遥感中可解释人工智能的系统综述

Adrian Höhl, Ivica Obadic, Miguel Ángel Fernández Torres, Hiba Najjar, Dario Oliveira, Zeynep Akata, Andreas Dengel, Xiao Xiang Zhu

Affiliations * Chair of Data Science in Earth Observation, Technical University of Munich (TUM); Munich Center for Machine Learning; Image Processing Laboratory (IPL), Universitat de València (UV); University of Kaiserslautern-Landau, Germany; German Research Center for Artificial Intelligence (DFKI); School of Applied Mathematics, Getulio Vargas Foundation, Brazil; Institute for Explainable Machine Learning at Helmholtz Munich; Chair of Interpretable and Reliable Machine Learning, Technical University of Munich

AI summary Through a systematic review, this paper summarizes the use, objectives, findings, and challenges of explainable AI methods in remote sensing, highlights emerging directions, and reflects on how such methods are evaluated.

详情
Journal ref
published in IEEE Geoscience and Remote Sensing Magazine, vol. 12, no. 4, pp. 261-304, Dec. 2024
AI中文摘要

近年来,黑箱机器学习方法已成为遥感知识提取的主导建模范式。尽管通过可解释人工智能揭示这些模型内部运作具有潜在益处,但目前在遥感应用中,仍缺乏全面概述可解释AI方法及其目标、发现和挑战的综述。本文通过系统综述来填补这一空白,识别该领域的关键趋势,并阐明针对特定遥感挑战的新颖可解释AI方法和新兴方向。我们还揭示了解释解释的常见模式,讨论了提取的科学见解,并反思了用于评估可解释AI方法的方法。因此,我们的综述提供了遥感中可解释AI最新技术的完整总结。此外,我们详细展望了挑战和有前景的研究方向,这为新颖方法论的发展奠定了基础,并为该领域的新研究者提供了有用的起点。

英文摘要

In recent years, black-box machine learning approaches have become a dominant modeling paradigm for knowledge extraction in remote sensing. Despite the potential benefits of uncovering the inner workings of these models with explainable AI, a comprehensive overview summarizing the explainable AI methods used and their objectives, findings, and challenges in remote sensing applications is still missing. In this paper, we address this gap by performing a systematic review to identify the key trends in the field and shed light on novel explainable AI approaches and emerging directions that tackle specific remote sensing challenges. We also reveal the common patterns of explanation interpretation, discuss the extracted scientific insights, and reflect on the approaches used for the evaluation of explainable AI methods. As such, our review provides a complete summary of the state-of-the-art of explainable AI in remote sensing. Further, we give a detailed outlook on the challenges and promising research directions, representing a basis for novel methodological development and a useful starting point for new researchers in the field.

2312.03957 2026-05-26 q-bio.TO cs.LG 版本更新

PerSival: Neural-network-based visualisation for pervasive continuum-mechanical simulations in musculoskeletal biomechanics

PerSival:基于神经网络的肌肉骨骼生物力学中连续介质力学模拟的普适可视化

David Rosin, Johannes Kässinger, Xingyao Yu, Okan Avci, Christian Bleiler, Oliver Röhrle

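Affiliations * Institute for Parallel and Distributed Systems, University of Stuttgart; Visualization Research Center VISUS, University of Stuttgart; Biomechatronic Systems, Fraunhofer IPA, Stuttgart

AI summary This paper proposes a neural network architecture, trained on a sparse grid surrogate that captures the surface deformation of the biceps brachii, to enable real-time visualization of a 3D upper limb musculoskeletal system model on resource-constrained devices, with an average error of 0.97 mm.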

Comments 10 pages, 4 figures, 5 tables, to be submitted to Medical Image Analysis

详情
AI中文摘要

本文提出一种新颖的神经网络架构,用于3D人体上肢肌肉骨骼系统模型的普适可视化。将模拟能力扩展到移动设备等资源贫乏系统,在众多研究领域中日益受到关注,以拓宽方法和结果的适用性。直到最近,由于计算成本过高,这一目标被认为对于肌肉骨骼系统的真实连续介质力学模拟而言遥不可及。在本工作中,我们使用稀疏网格代理来捕捉肱二头肌的表面变形,以训练一个深度学习模型,用于同一肌肉的实时可视化。这些代理模型均以5个肌肉激活水平作为输入,并输出肌肉表面每个网格节点的笛卡尔坐标向量。因此,神经网络架构的输入维度显著低于输出维度。5个肌肉激活水平足以实现肱二头肌2809个网格节点位置的平均误差为0.97 ± 0.16 mm,即0.57 ± 0.10%。该模型在仅使用CPU时每个预测变形状态的评估时间为9.88 ms,在GPU支持下为3.48 ms,对应的理论帧率分别为101 fps和287 fps。因此,深度学习代理为连续介质力学模拟在视觉实时应用中的可访问性提供了一条途径。

英文摘要

This paper presents a novel neural network architecture for the purpose of pervasive visualisation of a 3D human upper limb musculoskeletal system model. Bringing simulation capabilities to resource-poor systems like mobile devices is of growing interest across many research fields, to widen applicability of methods and results. Until recently, this goal was thought to be out of reach for realistic continuum-mechanical simulations of musculoskeletal systems, due to prohibitive computational cost. Within this work we use a sparse grid surrogate to capture the surface deformation of the m. biceps brachii in order to train a deep learning model, used for real-time visualisation of the same muscle. Both these surrogate models take 5 muscle activation levels as input and output Cartesian coordinate vectors for each mesh node on the muscle's surface. Thus, the neural network architecture features a significantly lower input than output dimension. 5 muscle activation levels were sufficient to achieve an average error of 0.97 ± 0.16 mm, or 0.57 ± 0.10 %, for the 2809 mesh node positions of the biceps. The model achieved evaluation times of 9.88 ms per predicted deformation state on CPU only and 3.48 ms with GPU support, leading to theoretical frame rates of 101 fps and 287 fps respectively. Deep learning surrogates thus provide a way to make continuum-mechanical simulations accessible for visual real-time applications.
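The reported frame rates and the input/output dimension gap follow from simple arithmetic, reproduced below as a check. The only assumption is that each mesh node contributes three Cartesian coordinates (x, y, z), which the abstract implies for a 3D model but does not state outright.

```python
n_nodes, n_inputs = 2809, 5
n_outputs = 3 * n_nodes            # assumed: x, y, z per surface mesh node

def frames_per_second(ms_per_frame):
    """Theoretical frame rate for a given per-frame evaluation time in ms."""
    return 1000.0 / ms_per_frame

fps_cpu = frames_per_second(9.88)  # CPU-only evaluation time from the abstract
fps_gpu = frames_per_second(3.48)  # GPU-supported evaluation time from the abstract
```

Under that assumption the network maps 5 inputs to 8427 outputs, and the two evaluation times correspond to roughly 101 fps and 287 fps, matching the figures quoted in the abstract.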

2311.11342 2026-05-26 cs.LG cs.DC math.OC 版本更新

On the Communication Complexity of Decentralized Stochastic Bilevel Optimization

去中心化随机双层优化的通信复杂度

Yihan Zhang, My T. Thai, Jie Wu, Hongchang Gao

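Affiliations * Temple University

AI summary To address the slow convergence and high communication cost of existing decentralized stochastic bilevel optimization algorithms in heterogeneous settings, proposes two new algorithms based on simultaneous and alternating update strategies that achieve faster convergence and lower communication cost. For the first time under mild assumptions, the analysis shows how the computation and communication of the Hessian-inverse-vector product affect the convergence rate in the heterogeneous setting.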

详情
AI中文摘要

随机双层优化在机器学习中有着广泛的应用,包括元学习、超参数优化和神经架构搜索。为了将随机双层优化扩展到分布式数据,已经开发了几种去中心化随机双层优化算法。然而,现有方法在异构设置中通常存在收敛速度慢和通信成本高的问题,限制了它们在实际任务中的适用性。为了解决这些问题,我们提出了两种基于\textit{同步}和\textit{交替}更新策略的新型去中心化随机双层梯度下降算法。我们的算法能够实现比现有方法更快的收敛速度和更低的通信成本。重要的是,我们的收敛分析不依赖于关于异构性的强假设。更重要的是,我们的理论分析清晰地揭示了在异构设置下,关于Hessian逆向量积的计算和通信如何影响收敛率。据我们所知,这是首次在异构设置中在温和假设下取得如此有利的理论结果。此外,我们展示了如何在使用方差缩减梯度时建立交替更新策略的收敛率。最后,实验结果证实了我们算法的有效性。

英文摘要

Stochastic bilevel optimization finds widespread applications in machine learning, including meta-learning, hyperparameter optimization, and neural architecture search. To extend stochastic bilevel optimization to distributed data, several decentralized stochastic bilevel optimization algorithms have been developed. However, existing methods often suffer from slow convergence rates and high communication costs in heterogeneous settings, limiting their applicability to real-world tasks. To address these issues, we propose two novel decentralized stochastic bilevel gradient descent algorithms based on \textit{simultaneous} and \textit{alternating} update strategies. Our algorithms can achieve faster convergence rates and lower communication costs than existing methods. Importantly, our convergence analyses do not rely on strong assumptions regarding heterogeneity. More importantly, our theoretical analyses clearly disclose how the computation and communication regarding the Hessian-inverse-vector product under the heterogeneous setting affects the convergence rate. To the best of our knowledge, this is the first time such favorable theoretical results have been achieved with mild assumptions in the heterogeneous setting. Furthermore, we demonstrate how to establish the convergence rate for the alternating update strategy when combined with the variance-reduced gradient. Finally, experimental results confirm the efficacy of our algorithms.

2309.07778 2026-05-26 eess.IV cs.CV cs.LG q-bio.TO 版本更新

Virchow: A Million-Slide Digital Pathology Foundation Model

Virchow:百万级数字病理学基础模型

Eugene Vorontsov, Alican Bozkurt, Adam Casson, George Shaikovski, Michal Zelechowski, Siqi Liu, Kristen Severson, Eric Zimmermann, James Hall, Neil Tenenholtz, Nicolo Fusi, Philippe Mathieu, Alexander van Eck, Donghun Lee, Julian Viret, Eric Robert, Yi Kan Wang, Jeremy D. Kunz, Matthew C. H. Lee, Jan Bernhard, Ran A. Godrich, Gerard Oakley, Ewan Millar, Matthew Hanna, Juan Retamero, William A. Moye, Razik Yousfi, Christopher Kanan, David Klimstra, Brandon Rothrock, Thomas J. Fuchs

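Affiliations * Paige; Microsoft Research; NSW Health Pathology; St George Hospital; Memorial Sloan Kettering Cancer Center; University of Rochester

AI summary Proposes Virchow, a 632-million-parameter vision transformer for computational pathology, trained with DINOv2 self-supervised learning on 1.5 million H&E-stained whole slide images; it achieves state-of-the-art performance on pan-cancer detection and biomarker prediction tasks.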

详情
AI中文摘要

通过分析病理图像实现精准医疗和决策支持系统的人工智能应用,有潜力彻底改变癌症的诊断和治疗。这类应用将依赖于模型捕捉病理图像中观察到的多样化模式的能力。为应对这一挑战,我们提出了Virchow,一个用于计算病理学的基础模型。利用DINOv2算法支持的自监督学习,Virchow是一个拥有6.32亿参数的视觉Transformer模型,在来自不同组织和标本类型的150万张苏木精-伊红染色全切片图像上训练,数据量比以往工作高出数个数量级。Virchow模型使得开发一个泛癌检测系统成为可能,该系统在17种不同癌症类型上的整体标本级AUC达到0.949,同时在7种罕见癌症类型上达到0.937的AUC。Virchow模型在内部和外部图像块级基准测试以及切片级生物标志物预测任务上均达到了最先进水平。性能的提升凸显了在大型病理图像数据集上训练的重要性,表明扩展数据和网络架构可以提高许多高影响计算病理学应用的准确性,尤其是在训练数据有限的情况下。

英文摘要

The use of artificial intelligence to enable precision medicine and decision support systems through the analysis of pathology images has the potential to revolutionize the diagnosis and treatment of cancer. Such applications will depend on models' abilities to capture the diverse patterns observed in pathology images. To address this challenge, we present Virchow, a foundation model for computational pathology. Using self-supervised learning empowered by the DINOv2 algorithm, Virchow is a vision transformer model with 632 million parameters trained on 1.5 million hematoxylin and eosin stained whole slide images from diverse tissue and specimen types, which is orders of magnitude more data than previous works. The Virchow model enables the development of a pan-cancer detection system with 0.949 overall specimen-level AUC across 17 different cancer types, while also achieving 0.937 AUC on 7 rare cancer types. The Virchow model sets the state-of-the-art on the internal and external image tile level benchmarks and slide level biomarker prediction tasks. The gains in performance highlight the importance of training on massive pathology image datasets, suggesting scaling up the data and network architecture can improve the accuracy for many high-impact computational pathology applications where limited amounts of training data are available.

2306.14853 2026-05-26 math.OC cs.LG 版本更新

Near-Optimal Nonconvex-Strongly-Convex Bilevel Optimization with Fully First-Order Oracles

基于全一阶Oracle的近似最优非凸强凸双层优化

Lesi Chen, Yaohua Ma, Jingzhao Zhang

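Affiliations * IIIS, Tsinghua University; Shanghai Qi Zhi Institute; Shanghai AI Lab

AI summary For bilevel optimization with a strongly convex lower-level problem, proposes a first-order method with two-time-scale updates that achieves the near-optimal $\tilde{\mathcal{O}}(\varepsilon^{-2})$ first-order oracle complexity in the deterministic setting, and extends it to stochastic and higher-order smooth settings.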

Comments JMLR 2025; fix a bug in the proof in Appendix E compared to the journal version

详情
AI中文摘要

本文考虑下层问题强凸的双层优化。最近的研究表明,使用Hessian-向量积(HVP)Oracle,可以在${\mathcal{O}}(\varepsilon^{-2})$次Oracle调用内找到$\varepsilon$-稳定点。然而,HVP Oracle在实践中可能无法访问或代价高昂。Kwon等人(ICML 2023)通过提出一种一阶方法解决了这个问题,该方法以较慢的$\tilde{\mathcal{O}}(\varepsilon^{-3})$速率实现相同目标。本文引入两时间尺度更新,改进了他们的方法,实现了近最优的$\tilde {\mathcal{O}}(\varepsilon^{-2})$一阶Oracle复杂度。我们的分析具有高度可扩展性。在随机设置下,当随机噪声仅存在于上层目标或同时存在于两层目标时,我们的算法分别可以达到$\tilde {\mathcal{O}}(\varepsilon^{-4})$和$\tilde {\mathcal{O}}(\varepsilon^{-6})$的随机一阶Oracle复杂度。当目标具有更高阶光滑性条件时,我们的确定性方法可以通过注入噪声逃离鞍点,并利用Nesterov动量加速达到更快的$\tilde {\mathcal{O}}(\varepsilon^{-1.75})$速率。

英文摘要

In this work, we consider bilevel optimization when the lower-level problem is strongly convex. Recent works show that with a Hessian-vector product (HVP) oracle, one can provably find an $\varepsilon$-stationary point within ${\mathcal{O}}(\varepsilon^{-2})$ oracle calls. However, the HVP oracle may be inaccessible or expensive in practice. Kwon et al. (ICML 2023) addressed this issue by proposing a first-order method that can achieve the same goal at a slower rate of $\tilde{\mathcal{O}}(\varepsilon^{-3})$. In this paper, we incorporate a two-time-scale update to improve their method to achieve the near-optimal $\tilde {\mathcal{O}}(\varepsilon^{-2})$ first-order oracle complexity. Our analysis is highly extensible. In the stochastic setting, our algorithm can achieve the stochastic first-order oracle complexity of $\tilde {\mathcal{O}}(\varepsilon^{-4})$ and $\tilde {\mathcal{O}}(\varepsilon^{-6})$ when the stochastic noises are only in the upper-level objective and in both level objectives, respectively. When the objectives have higher-order smoothness conditions, our deterministic method can escape saddle points by injecting noise, and can be accelerated to achieve a faster rate of $\tilde {\mathcal{O}}(\varepsilon^{-1.75})$ using Nesterov's momentum.
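As a rough illustration of what "fully first-order" means here, the sketch below uses the penalty-style hypergradient estimate associated with the Kwon et al. line of work: it combines first derivatives of f and g at two lower-level solutions and needs no Hessian-vector products. The toy quadratic problem, the closed-form inner solutions (standing in for inner gradient loops), and the penalty value `lam` are all assumptions for illustration; the paper's two-time-scale algorithm is not reproduced.

```python
def hypergrad_first_order(x, lam):
    """Penalty-based hypergradient estimate using only first derivatives.

    Toy problem (hypothetical): f(x, y) = 0.5*(y - 1)**2 + 0.5*x**2,
    g(x, y) = 0.5*(y - x)**2, so the lower-level solution is y*(x) = x.
    """
    y_star = x                              # argmin_y g(x, y)
    y_lam = (1.0 + lam * x) / (1.0 + lam)   # argmin_y f(x, y) + lam * g(x, y)
    df_dx = x                               # d/dx f(x, y), independent of y here
    dg_dx_lam = -(y_lam - x)                # d/dx g(x, y) at y = y_lam
    dg_dx_star = -(y_star - x)              # d/dx g(x, y) at y = y_star (zero here)
    return df_dx + lam * (dg_dx_lam - dg_dx_star)

def hypergrad_exact(x):
    """Derivative of phi(x) = f(x, y*(x)) = 0.5*(x - 1)**2 + 0.5*x**2."""
    return 2.0 * x - 1.0

estimate = hypergrad_first_order(0.3, lam=1e4)
exact = hypergrad_exact(0.3)
```

On this toy problem the estimate approaches the exact hypergradient as `lam` grows, which is the property a first-order method relies on in place of second-order information.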

2011.11194 2026-05-26 cs.LG cs.CV cs.NE 版本更新

V3H: View Variation and View Heredity for Incomplete Multi-view Clustering

V3H: 面向不完整多视图聚类的视图变异与视图遗传

Xiang Fang, Yuchong Hu, Pan Zhou, Dapeng Oliver Wu

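Affiliations * School of Computer Science and Technology, Huazhong University of Science and Technology; Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology; Department of Electrical and Computer Engineering, University of Florida

AI summary Proposes V3H, a genetics-inspired view variation and view heredity method that decomposes each subspace into a variation matrix and a heredity matrix to learn, respectively, the unique information of each view and the consistent information shared by all views, and uses an adjustable low-rank representation to recover the underlying data structure. It captures both consistent and unique information in incomplete multi-view clustering and outperforms existing methods on 15 benchmark datasets.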

Comments Published in IEEE Transactions on Artificial Intelligence

详情
Journal ref
IEEE Transactions on Artificial Intelligence 2020
AI中文摘要

真实数据常以多个不完整视图的形式出现。不完整多视图聚类是集成这些不完整视图的有效方法。以往的方法仅学习不同视图之间的一致信息,而忽略了每个视图的独特信息,这限制了它们的聚类性能和泛化能力。为克服这一局限,我们提出了一种新颖的视图变异与视图遗传方法(V3H)。受遗传学中变异与遗传的启发,V3H首先将每个子空间分解为对应视图的变异矩阵和所有视图的遗传矩阵,分别表示独特信息和一致信息。然后,通过基于聚类指示矩阵对齐不同视图,V3H集成来自不同视图的独特信息以提高聚类性能。最后,借助基于遗传矩阵的可调低秩表示,V3H恢复潜在的真正数据结构以减少大不完整性的影响。更重要的是,V3H可能是首个将遗传学引入聚类算法以从不完整多视图数据中同时学习一致信息和独特信息的工作。在15个基准数据集上的大量实验结果验证了其相对于其他最先进方法的优越性。

英文摘要

Real data often appear in the form of multiple incomplete views. Incomplete multi-view clustering is an effective method to integrate these incomplete views. Previous methods only learn the consistent information between different views and ignore the unique information of each view, which limits their clustering performance and generalization. To overcome this limitation, we propose a novel View Variation and View Heredity approach (V3H). Inspired by variation and heredity in genetics, V3H first decomposes each subspace into a variation matrix for the corresponding view and a heredity matrix for all the views to represent the unique information and the consistent information respectively. Then, by aligning different views based on their cluster indicator matrices, V3H integrates the unique information from different views to improve the clustering performance. Finally, with the help of the adjustable low-rank representation based on the heredity matrix, V3H recovers the underlying true data structure to reduce the influence of large incompleteness. More importantly, V3H is possibly the first work to introduce genetics to clustering algorithms for simultaneously learning the consistent information and the unique information from incomplete multi-view data. Extensive experimental results on fifteen benchmark datasets validate its superiority over other state-of-the-art methods.

2011.10396 2026-05-26 cs.LG cs.AI Updated version

Double Self-weighted Multi-view Clustering via Adaptive View Fusion

Xiang Fang, Yuchong Hu

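Affiliations * School of Computer Science and Technology, Key Laboratory of Information Storage System Ministry of Education of China, Huazhong University of Science and Technology

AI summary Proposes Double Self-weighted Multi-view Clustering (DSMC), a framework that weights features with an adaptive weight matrix and weights graphs with an adaptive weight factor, removing redundancy and noise, and then fuses multiple graphs for clustering.

Comments Corresponding author: Xiang Fang

Details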
Abstract

Multi-view clustering has been applied in many real-world applications where the original data often contain noise. Some graph-based multi-view clustering methods have been proposed to reduce the negative influence of noise. However, previous graph-based multi-view clustering methods treat all features equally even if there are redundant features or noise, which is unreasonable. In this paper, we propose a novel multi-view clustering framework, Double Self-weighted Multi-view Clustering (DSMC), to overcome this deficiency. DSMC performs double self-weighted operations to remove redundant features and noise from each graph, thereby obtaining robust graphs. The first self-weighted operation assigns different weights to different features by introducing an adaptive weight matrix, which reinforces the role of the important features in the joint representation and makes each graph robust. The second self-weighted operation weights different graphs by imposing an adaptive weight factor, which assigns larger weights to more robust graphs. Furthermore, by designing an adaptive multi-graph fusion, we can fuse the features in the different graphs to integrate these graphs for clustering. Experiments on six real-world datasets demonstrate its advantages over other state-of-the-art multi-view clustering methods.

2011.10331 2026-05-26 cs.CV cs.LG Updated version

ANIMC: A Soft Framework for Auto-weighted Noisy and Incomplete Multi-view Clustering

Xiang Fang, Yuchong Hu, Pan Zhou, Dapeng Oliver Wu

Affiliations * Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology; School of Computer Science and Technology, Huazhong University of Science and Technology; Key Laboratory of Information Storage System Ministry of Education of China, Huazhong University of Science and Technology; Department of Electrical and Computer Engineering, University of Florida

AI summary Proposes the ANIMC framework, which handles missing instances and noise in multi-view clustering via a soft auto-weighted strategy and a doubly soft regularized regression model.

Comments Published in IEEE Transactions on Artificial Intelligence

Details
Journal ref
IEEE Transactions on Artificial Intelligence 2021
Abstract

Multi-view clustering has wide applications in many image processing scenarios. In these scenarios, original image data often contain missing instances and noise, which most multi-view clustering methods ignore. However, missing instances may make these methods difficult to use directly, and noise leads to unreliable clustering results. In this paper, we propose a novel Auto-weighted Noisy and Incomplete Multi-view Clustering framework (ANIMC) via a soft auto-weighted strategy and a doubly soft regularized regression model. First, by designing adaptive semi-regularized nonnegative matrix factorization (adaptive semi-RNMF), the soft auto-weighted strategy assigns a proper weight to each view and adds a soft boundary to balance the influence of noise and incompleteness. Second, by proposing the $\theta$-norm, the doubly soft regularized regression model adjusts the sparsity of our model by choosing different $\theta$. Compared with existing methods, ANIMC has three unique advantages: 1) it is a soft algorithm that can adjust our framework to different scenarios, thereby improving its generalization ability; 2) it automatically learns a proper weight for each view, thereby reducing the influence of noise; 3) it performs doubly soft regularized regression that aligns the same instances in different views, thereby decreasing the impact of missing instances. Extensive experimental results demonstrate its advantages over other state-of-the-art methods.

2011.10254 2026-05-26 cs.LG cs.AI stat.ML Updated version

Unbalanced Incomplete Multi-view Clustering via the Scheme of View Evolution: Weak Views are Meat; Strong Views do Eat

Xiang Fang, Yuchong Hu, Pan Zhou, Dapeng Oliver Wu

Affiliations * School of Computer Science and Technology, Key Laboratory of Information Storage System Ministry of Education of China, Huazhong University of Science and Technology; Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology; Department of Electrical and Computer Engineering, University of Florida

AI summary Addresses unbalanced incompleteness across views. Inspired by biological evolution theory, proposes UIMC, an unbalanced incomplete multi-view clustering method based on view evolution, which recovers the data through weighted multi-view subspace clustering and a low-rank robust representation and markedly improves clustering performance.

Comments Accepted by IEEE Transactions on Emerging Topics in Computational Intelligence

Details
Journal ref
IEEE Transactions on Emerging Topics in Computational Intelligence 2021
Abstract

Incomplete multi-view clustering is an important technique for dealing with real-world incomplete multi-view data. Previous works assume that all views have the same incompleteness, i.e., balanced incompleteness. However, different views often have distinct incompleteness, i.e., unbalanced incompleteness, which results in strong views (low-incompleteness views) and weak views (high-incompleteness views). Unbalanced incompleteness prevents us from directly using the previous methods for clustering. In this paper, inspired by biological evolution theory, we design a novel scheme of view evolution to cluster strong and weak views. Moreover, we propose an Unbalanced Incomplete Multi-view Clustering method (UIMC), which is the first effective method based on view evolution for unbalanced incomplete multi-view clustering. Compared with previous methods, UIMC has two unique advantages: 1) it proposes weighted multi-view subspace clustering to integrate these unbalanced incomplete views, which effectively solves the unbalanced incomplete multi-view problem; 2) it designs a low-rank and robust representation to recover the data, which diminishes the impact of incompleteness and noise. Extensive experimental results demonstrate that UIMC improves the clustering performance by up to 40% on three evaluation metrics over other state-of-the-art methods.

2605.25235 2026-05-26 cs.LG cs.AI math.OC Updated version

Constraint-Anchored Attribution: Feasibility-Certified Counterfactuals and Bonferroni-PAC Sufficient Subsets for Neural CO Policies

Sohaib Lafifi

Affiliations * Univ. Artois, UR 3926, Laboratoire de Génie Informatique et d'Automatique de l'Artois (LGI2A), Béthune F-62400, France

AI summary Proposes an attribution method for neural combinatorial optimization policies that decomposes decisions via LP-relaxation duals, certifies counterfactuals with a CSP feasibility model, and bounds the size of PAC explanations with a Bonferroni-corrected Hoeffding sufficient-subset test.

Comments 4 pages, 1 figure, Reference implementation: https://github.com/sohaibafifi/neuro-co-cax (MIT)

Details
Abstract

We give an attribution method for neural combinatorial-optimisation (CO) policies that (i) decomposes a decision by constraint families via LP-relaxation duals, (ii) certifies counterfactuals through a combinatorial feasibility model (implemented as a CSP feasibility-decision model), and (iii) bounds the size of a PAC-sufficient explanation with a Bonferroni-corrected Hoeffding sufficient-subset test along a greedy ordering. Across three CO problems and three seeds, our LP-anchored $\Lambda$-attribution matches the CF-derived signal at 96.5% on CVRPTW (n_cert=344) and 77.2% on the Orienteering Problem (n_cert=281) vs 75.0% and 35.2% for proxy gradient (paired diffs +0.215 and +0.420; McNemar exact $p \le 10^{-14}$). In the rank-aligned regime of the Flexible Job-Shop Scheduling Problem, both backends agree on every CSP-certified flip (n_cert=59), confirming the no-gain prediction. Bonferroni-PAC subsets average 5.0 nodes per step ($M=70$, $\varepsilon=\delta=0.2$, $k_{\max}=25$). Reference implementation: https://github.com/sohaibafifi/neuro-co-cax
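The Bonferroni-corrected Hoeffding test in step (iii) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes a one-sided Hoeffding margin at level $\delta / k_{\max}$ and accepts the first subset whose lower confidence bound reaches $1-\varepsilon$. The function names and the agreement counts are hypothetical; only $M$, $\varepsilon$, $\delta$, and $k_{\max}$ come from the abstract.

```python
import math

def hoeffding_margin(M, delta, k_max):
    """One-sided Hoeffding margin at level delta / k_max (Bonferroni over k_max tests)."""
    return math.sqrt(math.log(k_max / delta) / (2 * M))

def smallest_sufficient_k(agree_counts, M, eps, delta, k_max):
    """agree_counts[k-1]: how many of the M sampled perturbations leave the decision
    unchanged when only the first k greedily ordered nodes are kept (assumed setup).
    Returns the first k whose lower confidence bound reaches 1 - eps, else None."""
    t = hoeffding_margin(M, delta, k_max)
    for k, count in enumerate(agree_counts[:k_max], start=1):
        if count / M - t >= 1 - eps:
            return k
    return None

M, eps, delta, k_max = 70, 0.2, 0.2, 25   # values quoted in the abstract
t = hoeffding_margin(M, delta, k_max)
counts = [40, 55, 62, 66, 70, 70]          # hypothetical agreement counts
k_star = smallest_sufficient_k(counts, M, eps, delta, k_max)
```

Under these assumptions the margin is roughly 0.186, so with $M=70$ samples a subset is only accepted when nearly all sampled perturbations agree.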

2605.25234 2026-05-26 cs.LG cs.AI stat.CO stat.ML Updated version

On the Epistemic Uncertainty of Overparametrized Neural Networks

David Rügamer

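Affiliations * Department of Statistics, LMU Munich; Munich Center for Machine Learning (MCML)

AI summary Analyzes the epistemic uncertainty of overparametrized neural networks through the lens of non-identifiability, characterizes discrete and continuous sources of residual uncertainty, and validates the theory on one-hidden-layer ReLU networks.

Comments Accepted at ICML 2026 (Main Track)

Details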
Abstract

Epistemic uncertainty is often viewed as a reducible uncertainty that vanishes with increasing data. This perspective implicitly assumes parameter identifiability and equates epistemic uncertainty with predictive variability. In overparametrized neural networks, however, model parameters are typically non-identifiable due to symmetries and redundant representations. As a consequence, substantial parameter uncertainty can persist even when the underlying function is fully identified. In this work, we analyze epistemic uncertainty through the lens of non-identifiability and characterize both discrete and continuous sources of residual uncertainty. Focusing on one-hidden-layer ReLU networks, we thoroughly analyze the resulting posterior structure and validate our theoretical insights through empirical studies.
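The non-identifiability described above can be illustrated with a small sketch. It assumes the two standard symmetries of one-hidden-layer ReLU networks: permuting hidden units (a discrete source) and positively rescaling a unit's incoming and outgoing weights (a continuous source). Whether these are exactly the sources the paper characterizes is an assumption; the network `f` and all parameter values are hypothetical.

```python
import random

def relu(z):
    return max(0.0, z)

def f(x, W, b, v):
    """One-hidden-layer ReLU network with scalar input and output.
    W[j], b[j]: incoming weight and bias of hidden unit j; v[j]: outgoing weight."""
    return sum(v[j] * relu(W[j] * x + b[j]) for j in range(len(W)))

# Hypothetical parameters for three hidden units.
W = [1.5, -0.7, 0.3]
b = [0.2, 0.1, -0.4]
v = [0.8, -1.1, 2.0]

# Discrete symmetry: permuting hidden units leaves the function unchanged.
perm = [2, 0, 1]
W_p = [W[j] for j in perm]
b_p = [b[j] for j in perm]
v_p = [v[j] for j in perm]

# Continuous symmetry: for c > 0, relu(c * z) = c * relu(z), so scaling a unit's
# incoming weight and bias by c and its outgoing weight by 1 / c cancels out.
c = 3.0
W_s = [c * W[0]] + W[1:]
b_s = [c * b[0]] + b[1:]
v_s = [v[0] / c] + v[1:]

random.seed(0)
xs = [random.uniform(-5, 5) for _ in range(100)]
max_gap_perm = max(abs(f(x, W, b, v) - f(x, W_p, b_p, v_p)) for x in xs)
max_gap_scale = max(abs(f(x, W, b, v) - f(x, W_s, b_s, v_s)) for x in xs)
```

All three parameter settings compute the same function up to floating-point error, so data that fully identify the function still cannot distinguish between them.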

2605.25228 2026-05-26 cs.LG Updated version

A Blended Likelihood Approach for Achieving Fairness Using Naive Bayes

John Arthur Junior, Abdul Lateef Yussif, Maame G. Asante-Mensah, Charles R. Haruna, Sandro Amofa, Elliot Attipoe

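Affiliations * Department of Computer Science and Information Technology, University of Cape Coast

AI summary Proposes BMNB, a fairness-aware extension of Naive Bayes that balances fairness and accuracy through blended likelihood estimation and adaptive-threshold post-processing, achieving near-fair metrics on several datasets.

Details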
Abstract

Concerns about algorithmic bias and fairness have increased as artificial intelligence has been incorporated into high-stakes decision-making. Traditional Naive Bayes classifiers, while efficient and interpretable, lack fairness-awareness mechanisms and perpetuate historical biases in sensitive domains such as hiring, credit scoring, and criminal justice. This study develops a fairness-aware extension of the Naive Bayes classifier that mitigates bias while maintaining computational efficiency. We propose the Bias Mitigating Naive Bayes (BMNB) classifier, integrating in-processing and post-processing interventions. The in-processing stage employs a blended likelihood approach combining group-specific and pooled likelihood estimates through a tunable blending parameter alpha to balance fairness and accuracy. The post-processing stage applies output calibration with adaptive thresholding to fine-tune group-specific decision boundaries. Experimental results indicate that BMNB attains Disparate Impact (DI) values of 1.000, 1.171, and 0.997 and Equal Opportunity Difference (EOD) values of -0.217, -0.226, and -0.053 on the Adult, ProPublica, and Framingham datasets, respectively, while maintaining computational efficiency. Ablation studies confirm that the combination of blended likelihood and adaptive thresholding yields superior performance compared to either technique in isolation.
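A minimal sketch of the blended likelihood idea and the Disparate Impact metric follows. It assumes a convex combination `alpha * group + (1 - alpha) * pooled` of Laplace-smoothed categorical likelihoods; which end of the range alpha favours, the smoothing, and all function names and data are assumptions made for illustration, not the BMNB implementation.

```python
from collections import Counter

def likelihood(values, categories, smoothing=1.0):
    """Laplace-smoothed categorical likelihood estimated from observed values."""
    counts = Counter(values)
    total = len(values) + smoothing * len(categories)
    return {c: (counts[c] + smoothing) / total for c in categories}

def blended_likelihood(group_values, pooled_values, alpha):
    """alpha = 1 uses only the group-specific estimate, alpha = 0 only the pooled one
    (the direction of alpha is an assumption of this sketch)."""
    cats = sorted(set(pooled_values))
    group = likelihood(group_values, cats)
    pooled = likelihood(pooled_values, cats)
    return {c: alpha * group[c] + (1 - alpha) * pooled[c] for c in cats}

def disparate_impact(preds, groups, unprivileged, privileged):
    """Ratio of positive-prediction rates between groups; 1.0 means parity."""
    def rate(g):
        members = [p for p, gg in zip(preds, groups) if gg == g]
        return sum(members) / len(members)
    return rate(unprivileged) / rate(privileged)

# Hypothetical toy data: one categorical feature observed within a single class.
pooled = ["hi", "hi", "lo", "lo", "lo", "mid"]
group_a = ["hi", "hi", "mid"]              # the pooled samples that belong to group "a"
blend = blended_likelihood(group_a, pooled, alpha=0.5)

preds = [1, 0, 1, 1, 0, 1, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
di = disparate_impact(preds, groups, unprivileged="b", privileged="a")
```

Because both inputs are proper distributions over the same categories, any alpha in [0, 1] yields a valid likelihood, which is what lets a single parameter trade group-specific fit against pooled estimates.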

2605.25221 2026-05-26 math.DS cs.LG Updated version

Data-Specific Hyper-Parameter Design: A Paradigm Shift in Reservoir Computing

G Manjunath, Juan-Pablo Ortega, Alma van der Merwe

Affiliations * University of Pretoria, Department of Mathematics and Applied Mathematics; Nanyang Technological University, Division of Mathematical Sciences, School of Physical and Mathematical Sciences

AI summary Proposes data-specific reservoir design principles from a geometric perspective: concentrating reservoir state increments within a cone reduces ridge-regression training error. Also gives a construction for echo state networks and a spectral diagnostic.

Details
Abstract

Reservoir computing typically relies on large, randomly generated reservoirs, enabling simple, often linear readouts. Over the past two decades, most constructions have exploited the freedom to select the reservoir, constrained primarily by stability conditions based on state contraction or memory capacity. However, these designs are largely independent of the input data and learning objective, resulting in a trial-and-error methodology driven by randomness. In high dimensions, the reservoir acts as a random embedding of the input history, implicitly relying on Johnson-Lindenstrauss-type concentration phenomena to preserve information. In contrast, we develop reservoir design principles from a geometric perspective for inputs generated by deterministic dynamical systems. Rather than relying on random embeddings, we require reservoir state increments to align within a cone around an input-determined vector subspace, and prove that such a cone concentration reduces ridge-regression training error. When the cone angle is small, the variance of reservoir states concentrates in the input-determined subspace, improving conditioning of the empirical second-moment matrix and strengthening alignment between dominant covariance directions and the state-target cross-covariance. For echo state networks, we provide a constructive approach to reservoir design. The reservoir matrix is chosen so that associated Krylov-chain directions remain nearly closed within an input-determined subspace while permitting controlled mixing in its orthogonal complement. We also provide a spectral diagnostic for ridge regression training that identifies when reservoir geometry concentrates predictive information into a few dominant covariance modes and when "spectral pollution" inhibits forecasting. Numerical experiments demonstrate consistent performance gains over arbitrary reservoir constructions.

2605.25212 2026-05-26 cs.LG cs.SY eess.SY Updated version

Personalized Federated Learning by Energy-Efficient UAV Communications

Shiqian Guo, Jianqing Liu, Beatriz Lorenzo

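Affiliations * Department of Computer Science, North Carolina State University; Department of Electrical and Computer Engineering, University of Massachusetts

AI summary To address data heterogeneity and energy consumption in UAV-assisted federated learning, proposes an architecture that separates a globally shared backbone from local personalization heads, together with a gradient-norm-based scheduling strategy, improving learning accuracy while reducing energy consumption.

Details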
Abstract

Federated learning (FL) is an effective paradigm for enhancing the learning capability of edge devices while preserving data privacy. In geographically dispersed FL systems, such as sensor networks in remote areas, unmanned aerial vehicles (UAVs) can flexibly establish high-quality communication links to support parameter exchange. However, device heterogeneity and the limited battery capacity of UAVs pose significant challenges. Specifically, data heterogeneity slows convergence, while scheduling all devices for global collaboration incurs excessive communication and energy costs. To overcome these challenges, we adopt a strict separation between a globally shared backbone and permanently local personalization heads, thereby mitigating the impact of data heterogeneity. Furthermore, we propose a gradient-based scheduling strategy that jointly considers energy efficiency and learning performance. In each communication round, the backbone is updated only by the top-$\alpha$ devices ranked by gradient $\ell_{2}$-norm, ensuring that optimization focuses on the most informative updates. Simulation results demonstrate that the proposed scheme achieves higher learning accuracy than state-of-the-art approaches while significantly reducing UAV energy consumption.
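The gradient-norm scheduling rule can be sketched as below. The sketch assumes that $\alpha$ is a fraction of devices and that the backbone update is a plain average of the selected gradients; both are assumptions, as are the function names and the toy gradients. The energy term of the paper's joint strategy is not modelled here.

```python
import math

def l2_norm(vec):
    return math.sqrt(sum(x * x for x in vec))

def select_top_alpha(device_grads, alpha):
    """Return the ids of the ceil(alpha * N) devices with the largest gradient l2-norm."""
    k = max(1, math.ceil(alpha * len(device_grads)))
    ranked = sorted(device_grads, key=lambda d: l2_norm(device_grads[d]), reverse=True)
    return ranked[:k]

def aggregate_backbone(device_grads, selected):
    """Plain average of the selected devices' backbone gradients (an assumed rule)."""
    dim = len(next(iter(device_grads.values())))
    return [sum(device_grads[d][i] for d in selected) / len(selected) for i in range(dim)]

# Hypothetical backbone gradients reported by four devices.
grads = {
    "dev0": [0.1, 0.0, -0.1],
    "dev1": [2.0, -1.0, 0.5],
    "dev2": [0.3, 0.4, 0.0],
    "dev3": [-1.0, 1.0, 1.0],
}
selected = select_top_alpha(grads, alpha=0.5)
update = aggregate_backbone(grads, selected)
```

Only the selected devices need to upload their backbone gradients in a round, which is where the communication saving would come from; the personalization heads never leave the devices.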

2605.25211 2026-05-26 cs.LG Updated version

Evolving Causal Regulatory Networks (ECR-Net)

Govind Vallabhasseri Binish, Abdhul Ahadh, Rano Roy Kavanal, Arya Ukunde

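AI summary Proposes ECR-Net, a bio-inspired framework for adaptive causal mechanism discovery that uses an evolutionary search algorithm to model causal graph structure dynamically, targeting out-of-distribution generalization in non-stationary environments.

Comments 9 pages, 6 figures. Presents ECR-Net, an evolutionary framework for adaptive causal structure discovery under non-stationarity, with empirical evaluation against NOTEARS, PCMCI+, and related baselines

Details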
Abstract

Modern machine learning models excel at pattern recognition but remain brittle, often failing to generalize out of distribution (OOD) because they capture spurious correlations rather than the underlying causal data-generating process. Current causal discovery methods, while powerful, typically assume a static graph structure, rendering them unable to model systems that adapt or undergo structural changes across different environments. We introduce ECR-Net, Evolving Causal Regulatory Networks, a novel, bio-inspired framework for adaptive causal mechanism discovery. Our approach models the data-generating process not as a static graph, but as a dynamic system analogous to a Gene Regulatory Network (GRN), composed of localized, recursive functions where variables can activate and inhibit one another. To discover the latent structure of this network, we employ an evolutionary search algorithm that evolves a population of candidate regulatory graphs, optimizing for a fitness function that measures how well the simulated system dynamics reconstruct the observed data. The key innovation of ECR-Net is its ability to model structural adaptation: it explicitly ingests shifts in the data's statistical properties as signals of an environmental shock. In response, the evolutionary search identifies parsimonious modifications to the causal graph topology, such as link inhibitions or activations, that explain the new data regime. We posit that ECR-Net represents a new class of adaptive Structural Causal Models capable of discovering how and why a system's fundamental rules change, offering a path toward robust generalization in complex, non-stationary systems.

2605.25210 2026-05-26 cs.LG cs.AI stat.ML Updated version

Multi-Objective Learning for Diffusion Models: A Statistical Theory under Semi-Supervised Learning

Ziheng Cheng, Yixiao Huang, Hanlin Zhu, Haoran Geng, Somayeh Sojoudi, Jitendra Malik, Pieter Abbeel, Xin Guo

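Affiliations * University of California, Berkeley

AI summary To address the high statistical cost that larger model capacity imposes on multi-objective learning with diffusion models, proposes a semi-supervised two-stage training method that uses unlabeled data and distills through pseudo-samples, and proves that the required number of paired samples depends only on the complexity of the specialist models.

Details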
Abstract

Diffusion models are increasingly used as powerful conditional generators, yet real deployments often involve multiple target distributions arising from different tasks, e.g., diverse prompt domains in text-to-image generation, or multiple environments in robotics with diffusion policies. This naturally leads to a multi-objective learning (MOL) problem. A key challenge is that achieving good Pareto trade-offs can require a generalist model class with substantially larger capacity than what suffices for solving any individual task, thereby increasing statistical cost since sample complexity typically scales with the model complexity. To reconcile this, we develop a principled MOL framework for diffusion models with limited data: a semi-supervised regime where paired (labeled) samples are scarce, but (unlabeled) condition data are abundant. We propose a two-stage training procedure that first fits lightweight specialist models from limited paired data, and then distills them into a generalist model by generating pseudo-samples. We establish generalization bounds showing that the required number of paired samples only depends on the complexity of the specialist model classes. We further extend the theory to diffusion policies for sequential decision making to account for distribution shift in on-policy rollouts. Extensive experiments on robotic control and image restoration tasks are conducted to verify our theoretical results.

2605.25203 2026-05-26 cs.LG cs.AI cs.LO Updated version

Influence-Inspired Spectral Rotations for Extreme Low-Bit LLM Quantization

Gorgi Pavlov

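Affiliations * Lehigh University

AI summary Applies the influence-adaptive Walsh geometry of a companion theory paper: WHT rotation and column scaling, combined with a reconstruction-error quantizer, enable extreme low-bit weight quantization and reduce perplexity by 15-58% on several models.

Comments 14 pages, no figures. Companion application paper to arXiv:2605.01637 (theory). Code and pinned eval stack: https://github.com/gogipav14/spectral-llm

Details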
Abstract

We apply the influence-adaptive Walsh geometry of a companion theory paper (arXiv:2605.01637) to extreme low-bit weight-only LLM quantization. The recipe is one math-invariant transformation: WHT-rotate each linear layer's weight matrix and rescale its columns by per-coordinate Walsh-basis activation energy before handing off to a reconstruction-error quantizer (Intel auto-round). This biases per-group integer rounding toward high-spectral-energy channels. On four pretrained decoder-only models from 135M to 1.5B parameters, BBT-spectral reduces wikitext-2 perplexity by 15-58% relative to vanilla auto-round at W2A16; we also report a TinyLlama-1.1B auxiliary data point. Three extensions transfer the recipe to families it failed on: a per-head PCA matrix-Gamma replacement of q_norm/k_norm for Qwen3 attention (PPL 136.76 -> 88.99 on Qwen3-0.6B); an SO(2) per-pair rotation that commutes with RoPE (PPL 36.93 -> 21.84 on Qwen2.5-1.5B); and an MoE-aware input-side absorption fix identified by architectural fuzzing of Laguna-style fused-expert layouts. A W2-vs-W4 ablation gives a deliberate negative control: the redistribution payoff falls within the +/-0.5 PPL noise floor at W4, consistent with the Schur-convexity intuition that the cost of unconcentrated influence vanishes as the noise budget shrinks. All quantized weights export to OpenVINO IR and run on Intel NPU + Arc dGPU + CPU with PPL invariant to device within +/-0.1. We do not claim a formal Boolean-to-real-valued transfer of the theory paper's majorization argument: the WHT activation energy used here is not the Boolean influence of the theory paper, the link is intuitive, and the contribution is engineering value rather than a transferred theorem. Head-to-head benchmarks against SpinQuant, QuaRot, QuIP-sharp, AQLM, OmniQuant, and ButterflyQuant at matched calibration are the main future-work item.
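The "math-invariant" property of the rotate-and-rescale step can be checked numerically. The sketch below is illustrative only: it assumes an orthonormal Sylvester-type Walsh-Hadamard matrix and uses root-mean-square activation energy in the Walsh basis as the column scale. The exact scaling rule, the shapes, and all variable names are assumptions, and no quantizer is involved.

```python
import numpy as np

def hadamard(n):
    """Orthonormal Walsh-Hadamard matrix of size n (a power of two),
    built with the Sylvester construction."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

rng = np.random.default_rng(0)
d_out, d_in = 4, 8
W = rng.normal(size=(d_out, d_in))       # hypothetical linear-layer weight
X = rng.normal(size=(d_in, 16))          # hypothetical calibration activations

H = hadamard(d_in)
X_rot = H.T @ X                          # activations expressed in the Walsh basis
# Per-coordinate activation energy in the Walsh basis, used as column scales.
scale = np.sqrt(np.mean(X_rot ** 2, axis=1)) + 1e-8

W_t = (W @ H) * scale                    # rotate the weight, then rescale its columns
X_t = X_rot / scale[:, None]             # apply the inverse scaling on the input side

y_ref = W @ X
y_t = W_t @ X_t                          # identical before any quantization
max_err = float(np.max(np.abs(y_ref - y_t)))
```

Since the transformation cancels exactly in full precision, any change in perplexity after quantization comes from how rounding error is distributed across the transformed weight, not from the transformation itself.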

2605.25198 2026-05-26 cs.LG cs.AI Updated version

Hide to Guide: Learning via Semantic Masking

Ruitao Liu, Qinghao Hu, Alex Hu, Yecheng Wu, Shang Yang, Luke J. Huang, Zhuoyang Zhang, Han Cai, Song Han

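Affiliations * MIT; NVIDIA

AI summary Proposes Semantic Masked Expert Policy Optimization (SMEPO), which masks reward-relevant semantic spans in expert traces, turning hard problems into a fill-in-the-blank process and improving the exploration efficiency of reinforcement learning on reasoning-intensive tasks.

Details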
Abstract

Reinforcement learning with verifiable rewards (RLVR) has become a powerful paradigm for improving language models on reasoning-intensive tasks, but its effectiveness is often limited by exploration. For example, models often fail on hard problems, leaving little useful reward signal. External expert traces offer a natural source of guidance, yet they may also expose reward-relevant content along the critical path to the verifier target, such as final answers, intermediate values, executable implementations, or answer-related entities. This content can create an unintended reward hacking channel, allowing the policy to obtain reward by copying the trace rather than learning the underlying reasoning or agentic behavior. Existing guided-RL methods reduce this risk by using partial trajectories, but they mainly control how much expert information is shown heuristically rather than which parts should be hidden. To this end, we propose Semantic Masked Expert Policy Optimization (SMEPO), a fine-grained semantic masking strategy for expert-guided RLVR. Instead of truncating traces coarsely or revealing them unchanged, SMEPO masks reward-relevant semantic spans along the critical path while preserving the expert's decomposition, plan, and procedural structure. This turns hard problems from reasoning from scratch into a fill-in-the-blank process: the policy can follow the expert's problem-solving route, but must still reconstruct the missing values, code, or entities by itself. SMEPO is simple to apply and requires no changes to the reward function or RL objective. Across diverse domains, including math, code, and agentic search, SMEPO improves accuracy by up to 3.2 points over GRPO and reduces training time by up to 4.2x. The code is available at https://github.com/mit-han-lab/SMEPO.

2605.25194 2026-05-26 cs.LG Updated version

Localization then Neutralization: Gradient-guided Token Suppression against Visual Prompt Injection Attack

Dongpeng Zhang, Ke Ma, Yangbangyan Jiang, Gaozheng Pei, Longtao Huang, Qianqian Xu, Qingming Huang

Affiliations * School of Advanced Interdisciplinary Sciences, UCAS; School of Electronic, Electrical and Communication Engineering, UCAS; State Key Laboratory of AI Safety, Institute of Computing Technology, CAS; Alibaba Group; School of Computer Science and Technology, UCAS; Beijing Academy of Artificial Intelligence; Key Laboratory of Big Data Mining and Knowledge Management, UCAS

AI summary To defend multimodal large language models against visual prompt injection attacks, proposes Gradient Token Masking (GTM), which localizes critical image tokens via gradient analysis and neutralizes them by masking, reducing the attack success rate to near zero with minimal computational overhead.

Details
Abstract

Adversarial images pose a severe security threat to multimodal large language models through prompt injection. Existing defenses largely lack a principled understanding of the underlying mechanisms and struggle to balance efficiency and defense utility. In this work, we show that successful adversarial attacks do not rely on the entire image uniformly but instead depend on a small subset of critical image tokens. Based on this insight, we propose Gradient Token Masking (GTM), which localizes these tokens via gradient analysis and neutralizes them through masking. We find that attribution based on the first generated token's output probability fails when attacks preserve the predicted token. To overcome this, GTM utilizes the Hidden-State Gradient Norm score for generation-influence attribution under adversarial inputs. We prove that its ranking is consistent with that of the full adversarial loss gradient, providing a theoretical guarantee for accurate localization. Our method requires only a single forward-backward pass to identify and zero out a small number of high-scoring tokens, effectively disrupting the adversarial attack path. Extensive experiments on prompt injection and multimodal jailbreak attacks demonstrate that our approach reduces attack success rates (ASR) to near zero while preserving model utility with negligible computational overhead.
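The localize-then-mask step can be sketched on synthetic arrays. This is a toy illustration, not GTM itself: in the method the hidden-state gradients come from a single forward-backward pass through the model, whereas here they are supplied directly. The function name, the array shapes, and the choice of `k` are hypothetical.

```python
import numpy as np

def mask_top_tokens(image_tokens, hidden_grads, k):
    """Zero out the k image tokens whose hidden-state gradient has the largest l2-norm.

    image_tokens: (num_tokens, dim) token embeddings.
    hidden_grads: (num_tokens, dim) gradient of a loss w.r.t. each token's hidden state.
    """
    scores = np.linalg.norm(hidden_grads, axis=1)   # one score per token
    top = np.argsort(scores)[::-1][:k]              # indices of the k highest scores
    masked = image_tokens.copy()
    masked[top] = 0.0
    return masked, sorted(top.tolist())

rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 4))
grads = 0.01 * rng.normal(size=(6, 4))
grads[1] += 5.0                                     # pretend tokens 1 and 4 carry the attack
grads[4] += 3.0
masked, masked_idx = mask_top_tokens(tokens, grads, k=2)
```

Only the selected rows are altered, which matches the abstract's point that a small number of tokens is removed while the rest of the image representation is preserved.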

2605.25189 2026-05-26 cs.LG cs.CL 版本更新

Directional Alignment Mitigates Reward Hacking in Reinforcement Learning for Language Models

方向对齐缓解语言模型强化学习中的奖励黑客问题

Wenlong Deng, Jiaji Huang, Kaan Ozkara, Yushu Li, Christos Thrampoulidis, Xiaoxiao Li, Youngsuk Park

发表机构 * University of British Columbia(不列颠哥伦比亚大学) Vector Institute(向量研究所) Amazon(亚马逊)

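AI summary By analyzing the geometry of reinforcement learning updates, the paper finds that reward hacking arises when optimization drifts away from a stable low-dimensional learning trajectory, and proposes trusted-direction projection, which constrains gradients to a clean reference subspace, delaying shortcut exploitation while preserving task performance.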

详情
AI中文摘要

当模型通过利用捷径而非解决预期任务来改进代理奖励时,就会出现奖励黑客问题。我们通过语言模型中强化学习更新的几何结构来研究这种失败模式,并认为当优化偏离稳定的低维学习轨迹时,黑客行为就会出现。我们通过参数更新的主导奇异方向分析了这种漂移,并表明奖励黑客运行比干净运行表现出显著更大的方向变化。基于这一观察,我们引入了可信方向投影,它约束梯度保持在干净参考子空间内。在数学推理的奖励黑客实验中,所提出的方法延迟了捷径利用并更好地保持了任务性能。

英文摘要

Reward hacking arises when a model improves a proxy reward by exploiting shortcuts rather than solving the intended task. We study this failure mode through the geometry of reinforcement learning updates in language models and argue that hacking emerges when optimization drifts away from a stable low-dimensional learning trajectory. We analyze this drift through dominant singular directions of parameter updates and show that reward-hacking runs exhibit substantially larger directional change than clean runs. Motivated by this observation, we introduce trusted-direction projection, which constrains gradients to remain within a clean reference subspace. Across reward-hacking experiments on mathematical reasoning, the proposed approach delays shortcut exploitation and better preserves task performance.
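Constraining a gradient to a reference subspace amounts to an orthogonal projection. The sketch below illustrates only that step, under assumptions: the trusted directions are given as plain vectors (the abstract derives them from dominant singular directions of clean-run updates, which is not reproduced here), and the helper names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def orthonormalize(vectors, tol=1e-12):
    """Gram-Schmidt: return an orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip directions already in the span
            basis.append([wi / norm for wi in w])
    return basis

def project(grad, basis):
    """Orthogonal projection of `grad` onto the span of an orthonormal basis."""
    out = [0.0] * len(grad)
    for b in basis:
        c = dot(grad, b)
        out = [o + c * bi for o, bi in zip(out, b)]
    return out

# Assumed toy reference directions spanning the x-y plane in 3 dimensions.
trusted = orthonormalize([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
g = [2.0, -3.0, 5.0]
g_proj = project(g, trusted)  # the z-component is removed
```

The discarded residual `g - g_proj` is orthogonal to every trusted direction, which is what keeps the projected update inside the reference subspace.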

2605.25174 2026-05-26 q-bio.NC cs.LG cs.NE 版本更新

Growing a Neural Network in Breadth, Depth, and Time

在广度、深度和时间上生长神经网络

Eivinas Butkus, Kedar Garzón Gupta, Nikolaus Kriegeskorte

发表机构 * Columbia University(哥伦比亚大学) NSF AI Institute for Artificial and Natural Intelligence(国家科学基金会人工智能与自然智能研究院)

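AI summary Defines differentiable costs for breadth, depth, and time in a recurrent convolutional neural network and jointly optimizes task error and resource costs via backpropagation; the three resources can be traded off against one another, and the time the model uses correlates with human reaction times.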

详情
AI中文摘要

空间和时间资源约束对生物和人工智能系统都至关重要。在这里,我们在一个被构想为无限格点有限子集的循环卷积神经网络中,定义了广度、深度和时间的可微成本项。我们通过反向传播将这些成本与任务误差联合优化。我们对广度、深度和时间施加不同的压力,导致通过训练有机地出现多样化的计算图。我们发现所有三种资源可以相互权衡以达到给定的准确度水平。网络在所有三个维度上随任务复杂性增长,并且在输入被遮挡时自发地采取更多的循环步骤。令人惊讶的是,模型使用的时间与人类在物体识别任务中的反应时间相关。我们的框架提供了资源约束如何塑造神经架构的规范性解释,与神经科学中关于大脑设计的问题相联系,并可能有助于阐明自然界中发现的神经解决方案的多样性。

英文摘要

Spatial and temporal resource constraints are critical for both biological and artificial intelligent systems. Here we define differentiable cost terms for breadth, depth, and time within a recurrent convolutional neural network conceived as a finite subset of an infinite lattice. We optimize these costs jointly with task errors via backpropagation. We set different pressures on breadth, depth, and time, which leads to diverse computational graphs emerging organically through training. We find that all three resources can be traded off against each other to achieve a given level of accuracy. Networks grow in all three dimensions with task complexity and spontaneously take more recurrent steps when inputs are occluded. Surprisingly, time used by the model correlates with human reaction times in an object recognition task. Our framework provides a normative account of how resource constraints shape neural architectures, connecting to questions about brain design in neuroscience, and may help illuminate the diversity of neural solutions found in nature.

2605.25173 2026-05-26 stat.ML cs.LG math.ST stat.TH 版本更新

Nyström Kernel Stein Discrepancy Tests

Nyström 核 Stein 散度检验

Florian Kalinke, Zoltán Szabó, Bharath K. Sriperumbudur

发表机构 * Chair of Information Systems(信息系统系) Department of Statistics(统计系) London School of Economics(伦敦经济学院) The Pennsylvania State University(宾夕法尼亚州立大学)

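AI summary Proposes Nyström-accelerated kernel Stein discrepancy tests and proves that they preserve asymptotic level and local consistency while substantially reducing computational cost.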

详情
AI中文摘要

核 Stein 散度(KSD)是通用域上最受欢迎的拟合优度(GoF)度量之一,已成功部署大量应用。KSD 的主要应用之一是构建强大的 GoF 检验。然而,依赖于经典 U-/V-统计量基 KSD 估计量的检验有两个主要缺点。(i)其运行时间随样本数量呈二次方增长。(ii)在大多数情况下,其渐近零分布计算上难以处理,通常通过自举法处理。虽然已知 Nyström 方法可以在温和条件下以无统计精度损失的方式加速 KSD 估计,但据我们所知,其对基于自举的 GoF 检验影响的基本问题尚未解决;解决此问题是本文的重点。特别地,我们证明了二次时间自举 KSD 基 GoF 检验的关键性质(渐近水平和局部一致性)由其 Nyström 加速版本保持。我们在球面数据和函数数据的 GoF 检验背景下数值展示了加速 KSD 估计量和自举的效率。我们的数值结果表明,Nyström 加速方法在统计性能上与二次时间方法相当,同时需要显著更小的运行时间。

英文摘要

Kernel Stein discrepancy (KSD) is among the most popular goodness-of-fit (GoF) measures on general domains, with a large number of successful deployments. One of the main applications of KSD is in constructing powerful GoF tests. However, tests relying on the classical U-/V-statistic-based KSD estimators have two major drawbacks. (i) Their runtime scales quadratically in the number of samples. (ii) Their asymptotic null distribution is computationally intractable in most cases, and is typically handled by bootstrapping. While it is known that the Nyström method permits accelerating KSD estimation with no loss of statistical accuracy under mild conditions, to the best of our knowledge, the fundamental question of its impact on bootstrap-based GoF testing is open; resolving this question is the focus of the current paper. In particular, we prove that the key properties of the quadratic-time bootstrapped KSD-based GoF test (asymptotic level and local consistency) are preserved by its Nyström acceleration. We numerically demonstrate the efficiency of the accelerated KSD estimator and bootstrap in the context of GoF testing of spherical and functional data. Our numerical results show that the Nyström-accelerated method performs statistically on par with the quadratic-time approach, while requiring substantially less runtime.

2605.25172 2026-05-26 stat.AP cs.DL cs.LG 版本更新

Rejoinder: The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review

回复:ICML 2023 排名实验:审视机器学习/人工智能同行评审中的作者自我评估

Buxin Su, Jiayao Zhang, Natalie Collina, Yuling Yan, Didong Li, Kyunghyun Cho, Jianqing Fan, Aaron Roth, Weijie Su

发表机构 * University of Pennsylvania(宾夕法尼亚大学) University of Wisconsin–Madison(威斯康星大学麦迪逊分校) University of North Carolina at Chapel Hill(北卡罗来纳大学教堂山分校) New York University(纽约大学) Princeton University(普林斯顿大学) Associate Chair of ICML 2023(ICML 2023 associate chair) Program Chair of ICML 2023(ICML 2023 program chair)

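AI summary Responds to the discussion of the ICML 2023 ranking experiment: frames peer review as a statistical estimation problem, examines equity and strategic concerns about the Isotonic Mechanism, and considers incorporating reviewer rankings and a human-centered review framework for the era of generative AI.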

Comments Rejoinder to the JASA Discussion of "The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review" (arXiv:2408.13430)

详情
AI中文摘要

本文是对即将发表在《美国统计协会杂志》并附有讨论的《ICML 2023排名实验:审视机器学习/人工智能同行评审中的作者自我评估》一文的回复。为了回应讨论者提出的实践和理论观点,我们围绕四个核心主题组织回应:(i) 将同行评审表述为统计估计问题;(ii) 缓解等渗机制部署中的公平性和策略性担忧;(iii) 整合补充信号,如审稿人排名和结构化元数据;(iv) 探索生成式AI时代以人为中心的同行评审框架。

英文摘要

This article is the rejoinder to "The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review," to appear in the Journal of the American Statistical Association with discussion. To address the practical and theoretical points raised by the discussants, we organize our response around four core themes: (i) formulating peer review as a statistical estimation problem; (ii) mitigating equity and strategic concerns in the deployment of the Isotonic Mechanism; (iii) incorporating complementary signals such as reviewer rankings and structured metadata; and (iv) exploring a human-centered framework for peer review in the era of generative AI.

2605.25170 2026-05-26 cs.LG cs.AI cs.ET cs.RO 版本更新

Grow-Prune-Freeze Networks: Adaptive & Continual Learning Technique for Olfactory Navigation

生长-剪枝-冻结网络:用于嗅觉导航的自适应与持续学习技术

Kordel K. France, Ovidiu Daescu

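AI summary Proposes the Grow-Prune-Freeze (GPF) network framework, which supports continual learning by dynamically adjusting the layers of the policy network, reaches a 94% success rate on turbulent plume navigation, and may generalize to other machine learning tasks.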

详情
AI中文摘要

嗅觉训练数据分散在非标准化的数据集中,限制了构建代表性世界模型的能力。嗅觉导航是一项高度动态和非平稳的任务,受益于实时持续学习。我们引入了一种名为生长-剪枝-冻结(GPF)网络的自适应框架,使智能体能够通过生长、剪枝和冻结其策略的早期层来持续学习,以应对世界复杂性。以非线性随机矩阵理论为基础,我们展示了Pennington & Worah(2017)的工作可以从单隐藏层扩展到n层持续学习模型,并且网络权重的特征值组成在添加连续层时得以保持。我们展示了基于期望SARSA的GPF在湍流羽流导航上实现了94%的成功率(这是一个部分可观测、非平稳的任务,代表了激发机器人自适应学习的“大世界”挑战),并提供了将GPF应用于其他世界模型的支撑方法。进一步的实验表明,GPF可能很好地推广到其他机器学习任务,如Atari中的强化学习、图像分类和自回归语言模型。我们开源所有代码和数据,以鼓励对嗅觉机器人技术的改进和更多研究。

英文摘要

Training data for olfaction is scattered across disparate, non-standardized datasets, which limits the ability to build representative world models. Olfactory navigation is a highly dynamic and non-stationary task that benefits from real-time continual learning. We introduce an adaptive framework called Grow-Prune-Freeze (GPF) networks that enables an agent to learn continually by growing, pruning, and freezing early layers of its policy in response to world complexity. Grounding GPFs in non-linear random matrix theory, we show that the work of Pennington & Worah (2017) can be extended from single hidden layers to n-layer continual-learning models, and that the eigenvalue composition of network weights is preserved as successive layers are added. We show that GPFs based on Expected SARSA achieve a 94% success rate on turbulent plume navigation, a partially observable, non-stationary task representative of the "big world" challenges that motivate adaptive learning in robotics, and provide supporting methodology for applying GPFs in other world models. Further experiments provide evidence that GPFs may generalize well to other machine learning tasks such as reinforcement learning in Atari, image classification, and autoregressive language modeling. We open-source all code and data to encourage improvements to, and more research in, olfactory robotics.

2605.25169 2026-05-26 cs.LG stat.ME stat.ML 版本更新

Learning Treatment Effects during Resource Allocation via Priority-Queue Randomization

资源分配中通过优先级队列随机化学习处理效应

JungHo Lee, Johnna Sundberg, Pim Welle, Bryan Wilder

发表机构 * Carnegie Mellon University(卡内基梅隆大学) Allegheny County Department of Human Services(阿勒格尼县人类服务部)

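AI summary Proposes a priority-queue randomization framework for experimental design that identifies causal effects while prioritizing service to high-need individuals, and optimizes queue assignment to balance statistical efficiency against prioritization.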

详情
AI中文摘要

公共服务项目通常在对其效益不确定的情况下分配有限资源,因此需要随机化来支持可信评估。然而在实践中,申请人通常进入等待名单,资源通过分层优先级队列优先分配给被认为需求更高的个体,这使得直接随机化变得困难。受此启发,我们开发了一个实验设计框架,用于在学习处理效应的同时优先治疗最需要帮助的个体,其中新申请人根据其评估的风险评分被随机分配到优先级队列。然后,在预算允许的情况下,按优先级顺序跨队列提供治疗,并在队列内按先到先得原则提供。我们的贡献有两方面。首先,我们描述了在这种优先级队列分配下哪些因果效应被识别。当到达是外生时,处理是条件随机化的,因此标准估计量被识别;当到达是内生时,队列随机化反而为处理提供了工具变量,识别出由排队过程引起的局部处理效应。其次,我们开发了优化的队列分配设计,以在统计效率与优先考虑高需求申请人之间进行权衡。在此过程中,我们表明,尽管设计导致的处理分配存在依赖性,但通常的独立同分布效率界限仍然是合理的设计目标。我们使用美国一个大县的住房分配项目的数据来说明所提出的设计。

英文摘要

Public service programs often allocate limited resources under uncertainty about their benefits, creating a need for randomization to support credible evaluation. In practice, however, applicants commonly enter waitlists in which tiered priority queues direct resources toward individuals judged to have higher need, making direct randomization difficult. Motivated by this, we develop an experimental design framework for learning treatment effects while still treating those most in need: incoming applicants are randomized into priority queues based on their assessed risk scores. Treatments are then provided across queues in priority order, and first-in-first-out within each queue, as budget becomes available. Our contributions are twofold. First, we characterize which causal effects are identified under this priority-queue allocation. When arrivals are exogenous, treatments are conditionally randomized, and hence standard estimands are identified; when arrivals are endogenous, queue randomization instead provides an instrument for treatment, identifying local treatment effects induced by the queuing process. Second, we develop optimized queue-assignment designs that trade off statistical efficiency against prioritizing higher-need applicants. In the process, we show that, despite the dependence in treatment assignments induced by the design, the usual i.i.d. efficiency bounds remain well-justified design objectives. We illustrate the proposed designs using data from a housing allocation program in a large U.S. county.
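The allocation mechanics (randomize into queues by risk score, then serve by priority and first-in-first-out) can be sketched as below. The assignment probabilities are an illustrative assumption, not the optimized design from the paper, and `assign_queue` and `serve` are hypothetical names.

```python
import random
from collections import deque

def assign_queue(risk_score, rng, n_queues=3):
    """Randomly assign an applicant to a queue (0 = highest priority).

    Hypothetical rule: higher risk scores (in [0, 1]) shift probability
    toward higher-priority queues, but every queue keeps positive probability,
    so assignment stays random conditional on the score.
    """
    weights = [1.0 + risk_score * (n_queues - 1 - q) for q in range(n_queues)]
    return rng.choices(range(n_queues), weights=weights, k=1)[0]

def serve(queues, budget):
    """Treat applicants across queues in priority order, FIFO within a queue."""
    treated = []
    for q in queues:
        while q and budget > 0:
            treated.append(q.popleft())
            budget -= 1
    return treated

rng = random.Random(0)  # seeded for reproducibility
queues = [deque() for _ in range(3)]
applicants = [("a", 0.9), ("b", 0.1), ("c", 0.5), ("d", 0.7), ("e", 0.3)]
for name, risk in applicants:
    queues[assign_queue(risk, rng)].append(name)
treated = serve(queues, budget=2)
```

Because the queue draw is random given the risk score, queue membership can later serve as the source of exogenous variation that the abstract describes.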

2605.25166 2026-05-26 cs.LG cs.AI 版本更新

AME-TS: Anchored Mixture-of-Experts for Time Series Forecasting

AME-TS:基于锚定的混合专家模型用于时间序列预测

Rui Wang, Renhao Xue, Ray Razi, Huan Song, Hannah R. Marlowe

发表机构 * Amazon Web Services(亚马逊网络服务)

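AI summary Proposes AME-TS, a structure-guided sparse time series foundation model in which a lightweight predictor estimates series-level descriptors and maps them to a soft structural prior over experts, aligning expert routing with interpretable temporal structure; it offers a strong accuracy-efficiency tradeoff on the GIFT-Eval benchmark and more stable expert specialization during fine-tuning on M5.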

详情
AI中文摘要

时间序列预测模型通过大型Transformer骨干不断扩展规模,但大多数现有方法通过共享密集计算路径处理所有序列,尽管时间结构存在显著异质性。混合专家模型(MoE)通过条件计算提供了一种自然替代方案,但标准MoE路由导致专家专业化识别弱且在下游适应中常不稳定。我们提出AME-TS,一种结构引导的稀疏时间序列基础模型,将专家路由与可解释的时间结构对齐。AME-TS首先使用轻量级预测器估计序列级描述符,包括可预测性、季节性、趋势和稀疏性,并将其映射为专家上的软结构先验。该序列级先验在训练期间指导令牌级路由,鼓励结构对齐的专业化。在GIFT-Eval基准上,AME-TS在不同模型规模下提供了强大的精度-效率权衡:在小型模型规模上显著优于现有时间序列基础模型,在较大规模上与最强模型保持竞争力,同时通过稀疏路由激活显著更少的参数。我们进一步表明,在M5数据集微调期间,AME-TS学习了比标准MoE更可解释的路由几何和更稳定的专家专业化。这些结果表明,结构感知路由是实现稀疏专家模型在时间序列预测中优势的有效且可靠方式。

英文摘要

Time series forecasting models are increasingly scaled through large Transformer backbones, yet most existing approaches process all series through a shared dense computation path despite substantial heterogeneity in temporal structure. Mixture-of-Experts (MoE) offers a natural alternative by enabling conditional computation, but standard MoE routing leaves expert specialization weakly identified and often unstable during downstream adaptation. We propose AME-TS, a structure-guided sparse time series foundation model that aligns expert routing with interpretable temporal structure. AME-TS first uses a lightweight regime predictor to estimate series-level descriptors, including forecastability, seasonality, trend, and sparsity, and maps them to a soft structural prior over experts. This series-level prior guides token-level routing during training, encouraging structure-aligned specialization. On the GIFT-Eval benchmark, AME-TS delivers a strong accuracy-efficiency tradeoff across model scales: it substantially outperforms existing time series foundation models at small model scales and remains competitive with the strongest models at larger scales, while activating substantially fewer parameters through sparse routing. We further show that AME-TS learns more interpretable routing geometry and substantially more stable expert specialization than standard MoE during fine-tuning on the M5 dataset. These results suggest that structure-aware routing is an effective and reliable way to realize the benefits of sparse expert models for time series forecasting.

2605.25156 2026-05-26 cs.LG cs.AI 版本更新

Abduction-Deduction Entanglement: Domain Generalization via Representation Transplants

溯因-演绎纠缠:通过表示移植实现领域泛化

Kasra Jalaldoust, Elias Bareinboum

发表机构 * Columbia University(哥伦比亚大学)

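AI summary Proposes a representation-transplant method that parameterizes the non-identifiability in abduction-deduction entanglement and searches the space of target distributions under constraints from the source distribution, to obtain optimal target prediction in domain generalization.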

详情
AI中文摘要

在源分布下训练的预测模型通常无法很好地泛化到不同的目标分布。对未见数据分布的有效推断必须依赖于生成源数据和目标数据的某些因果机制的不变性,然而这些结构不变性仅从源数据中是无法识别的。在关于数据的温和因果假设下,我们表明目标中的最优预测实际上部分可由源分布识别。该结果基于一个简单的观察:在任何领域中,最优预测可以分解为我们称之为溯因映射和演绎映射的一对映射,其中溯因映射从观测变量推断某些未观测变量(可能是混杂因素),演绎映射使用观测和推断的量来预测标签。大量源数据的使用固定了最优预测,从而约束了产生它的有效溯因-演绎组合——这种非可识别性我们称之为溯因-演绎纠缠。为了利用这一点,我们使用所谓的表示移植来参数化受约束的族,表示移植是表示空间中的一种特定线性变换,它在保留演绎成分的同时操纵表示的溯因内容。生成标签的因果机制的不变性意味着源和目标之间存在不变的演绎映射。因此,我们可以通过参数化移植来搜索合理的目标分布空间。我们在一个学习器-对手博弈中使用该方案,在理想优化下,该博弈可证明终止于学习器具有极小极大最优目标预测。评估验证了理论,表明该方法在领域泛化基准测试中具有竞争力。

英文摘要

Prediction models trained under the source distribution do not generalize well to a different target distribution. A valid inference about an unseen data distribution must be anchored by the invariance of certain causal mechanisms that generate the source and target data; however, these structural invariances are non-identifiable from the source data alone. Under mild causal assumptions about the data, we show that the optimal prediction in the target is in fact partially identifiable from the source distribution. The result rests on a simple observation: in any domain, the optimal prediction can be factorized into a pair of maps that we call the abduction and deduction maps, where the abduction map infers some unobserved variables (possibly confounders) from the observed variables and the deduction map predicts the label using both the observed and inferred quantities. Access to large source data pins down the optimal prediction and thus constrains the valid abduction-deduction ensembles that produce it, a non-identifiability that we call the abduction-deduction entanglement. To leverage this, we parameterize the constrained family using what we call a representation transplant, which is a specific linear transformation in the representation space that manipulates the abduction content of the representation while retaining the deduction component. Invariance of the causal mechanism generating the label implies the existence of an invariant deduction map between source and target. Thus, we can search the space of plausible target distributions via a parametric transplant. We use this scheme in a learner-adversary game that, under idealized optimization, provably terminates with the learner having the minimax-optimal target prediction. Evaluations verify the theory, showing that the method is competitive on domain generalization (DG) benchmarks.

2605.25135 2026-05-26 cs.LG cs.AI 版本更新

ASTRO: Adaptive Spatio-Temporal Reinforcement Optimization for GNN Powered Anomly Detection in Cyber Physical Systems

ASTRO: 用于信息物理系统中基于GNN的异常检测的自适应时空强化优化

Rai Ali Yar, Umaisa Lail, Anwar Shah

发表机构 * Department of Computer Science, FAST NUCES(计算机科学系,FAST NUCES) Department of Information Technology, Riphah International University(信息技术系,Riphah国际大学)

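AI summary Proposes the ASTRO framework, which combines a Deep Q-Network with graph neural networks, temporal modeling, and multi-head attention, and uses reinforcement learning to optimize the detection threshold dynamically; it achieves high F1 scores on the SWaT and WADI datasets, outperforming existing methods.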

详情
AI中文摘要

工业物联网环境中的异常检测对于保护工业控制系统和信息物理系统免受运行时虚假数据注入和其他恶意攻击至关重要。传感器网络和互连控制回路日益复杂,使得识别隐藏在高维和时间依赖信号中的异常行为变得困难。为解决这些挑战,本文介绍了自适应时空强化优化ASTRO,一种新颖的异常检测框架,开创性地使用强化学习进行动态阈值优化。通过将深度Q网络与图神经网络、时间建模和多头注意力机制相结合,ASTRO不断调整其决策边界以提高检测精度。GNN组件建模传感器之间的空间关系,时间模型捕获时间序列依赖性,注意力层突出显示最具信息量的时间步。模型生成连续异常分数,通过自适应阈值转换为二元决策,该阈值通过深度Q网络优化。ASTRO方法在两个真实工业基准测试:安全水处理和水分配数据集上进行了评估。所提模型在SWaT上取得了卓越性能,F1分数为0.990。此外,在高度复杂的127个终端设备的WADI数据集上,它获得了0.788的F1分数,比最先进的基线高出近14%。多次运行的结果证实了其一致的泛化能力和稳定性。这些实验表明,ASTRO框架是增强大规模信息物理基础设施的高度实用和可扩展的方法。

英文摘要

Anomaly detection in Industrial Internet of Things (IIoT) environments is essential to protect Industrial Control Systems (ICS) and Cyber-Physical Systems (CPS) from runtime false data injection and other malicious attacks. The increasing complexity of sensor networks and interconnected control loops makes it difficult to identify anomalous behavior hidden within high-dimensional and time-dependent signals. To address these challenges, this article introduces Adaptive Spatio-Temporal Reinforcement Optimization (ASTRO), a novel anomaly detection framework that pioneers the use of reinforcement learning for dynamic threshold optimization. By integrating a Deep Q-Network (DQN) with Graph Neural Networks (GNNs), temporal modeling, and a Multi-Head Attention mechanism, ASTRO continuously adapts its decision boundaries to improve detection accuracy. The GNN component models the spatial relations among sensors, the temporal model captures time series dependencies, and the attention layer highlights the most informative time steps. The model generates continuous anomaly scores, which are transformed into binary decisions using an adaptive threshold optimized via the DQN. ASTRO is evaluated on two real-world industrial benchmarks: the Secure Water Treatment (SWaT) and Water Distribution (WADI) datasets. The proposed model achieves an F1 score of 0.990 on SWaT. Moreover, on the highly complex WADI dataset, which has 127 end devices, it achieves an F1 score of 0.788, outperforming state-of-the-art baselines by nearly 14%. Results across multiple runs confirm consistent generalization and stability. These experiments demonstrate that the ASTRO framework is a highly practical and scalable method for strengthening large-scale cyber-physical infrastructures.
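The final step, turning continuous anomaly scores into binary decisions with a threshold and scoring the result with F1, can be illustrated as follows. Note the substitution: the paper optimizes the threshold with a Deep Q-Network, whereas this sketch uses a plain grid search as a stand-in, and the scores, labels, and function names are hypothetical.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 of the binary decisions `score >= threshold` against 0/1 labels."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0  # no true positives: precision/recall undefined or zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_threshold(scores, labels, candidates):
    """Grid-search stand-in for a learned threshold policy (not a DQN)."""
    return max(candidates, key=lambda t: f1_at_threshold(scores, labels, t))

# Assumed toy anomaly scores and ground-truth labels (1 = anomaly).
scores = [0.1, 0.4, 0.35, 0.8, 0.9]
labels = [0, 0, 1, 1, 1]
threshold = best_threshold(scores, labels, [0.2, 0.3, 0.5, 0.7])
```

A learned policy would instead adjust the threshold step by step using a reward such as the change in F1, which lets it track drift that a fixed grid search cannot.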

2605.25129 2026-05-26 cs.LG 版本更新

Blocked Gibbs meets Diffusion Transformers: Unsupervised Learning for Constraint Optimization

分块吉布斯采样遇上扩散Transformer:约束优化的无监督学习

Yudong W. Xu, Wenhao Li, Xiaoyu Wang, Scott Sanner, Elias B. Khalil

发表机构 * University of Toronto(多伦多大学) Vector Institute(向量研究所)

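AI summary Proposes the Blocked Gibbs Diffusion Transformer (BloGDiT), which replaces standard joint Gaussian denoising with blocked Gaussian denoising to support the large edits to subsets of variables that constraint optimization requires; it matches or outperforms existing methods on Sudoku, Graph Coloring, Maximum Independent Set, and MaxCut.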

详情
AI中文摘要

扩散模型在学习解决约束优化问题方面显示出潜力。然而,它们大多局限于二元变量问题,并依赖图神经网络,阻碍了其应用于更广泛的问题,例如具有一般离散变量或需要全局而非局部推理的约束结构的问题。我们研究了使用扩散Transformer来解决上述局限性。朴素实现表现不佳,因为标准扩散过程与约束求解之间存在根本性不匹配:前者对所有变量进行微小、渐进的去噪,而后者需要大幅改变特定的变量子集以实现可行性或最优性。我们的方法,分块吉布斯扩散Transformer(BloGDiT),是第一个通过用分块高斯去噪替代标准联合高斯去噪来解决这一局限性的方法。BloGDiT使用迭代块重采样,并随时间退火块大小,以促进变量块内的大规模、有针对性的编辑。在数独、图着色、最大独立集和MaxCut上,BloGDiT匹配或超越了现有方法,表明分块吉布斯式扩散为基于Transformer的约束满足和优化提供了高度有效的归纳偏置。

英文摘要

Diffusion models have shown promise in learning to solve constraint optimization problems. However, they are mostly restricted to problems with binary variables and rely on graph neural networks, hindering their application to a broader range of problems such as those with general discrete variables or constraint structures that necessitate global rather than local reasoning. We investigate the use of Diffusion Transformers to address the aforementioned limitations. A naive implementation performs poorly due to a fundamental mismatch between the standard diffusion process and constraint solving: while the former applies small, incremental denoising across all variables, the latter requires substantially altering specific subsets of variables to attain feasibility or optimality. Our method, Blocked Gibbs Diffusion Transformer (BloGDiT), is the first to address this limitation by replacing standard joint Gaussian denoising with blocked Gaussian denoising. BloGDiT uses iterative block resampling and anneals the block size over time to facilitate large, targeted edits within a block of variables. Across Sudoku, Graph Coloring, Maximum Independent Set, and MaxCut, BloGDiT matches or outperforms existing methods, demonstrating that blocked Gibbs-style diffusion provides a highly effective inductive bias for Transformer-based constraint satisfaction and optimization.

2605.25127 2026-05-26 cs.CV cs.LG 版本更新

PQDT: Pseudo-Query Dual Transformer for Robust Point Cloud Restoration

PQDT: 伪查询双Transformer用于鲁棒点云修复

Haoqing Wu, Alexa Nawotki, Jochen Garcke

发表机构 * Mercedes-Benz AG(梅赛德斯-奔驰集团) University of Bonn(波恩大学) Fraunhofer SCAI(弗劳恩霍夫SCAI研究所)

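AI summary Proposes a unified 3D restoration network built on a Pseudo-Query module within a Transformer backbone, which uses a two-stage geometric translation to enhance structural clarity and local detail, surpassing existing methods across diverse degradation scenarios.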

Comments To be published in The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026

详情
AI中文摘要

点云是计算机视觉中一种基本的3D表示,支持广泛的感知任务。然而,由于传感器限制或遮挡,真实世界的点云常常遭受不完整、噪声、离群点和密度不规则等退化。从这种退化数据中恢复干净且详细的形状对于下游应用至关重要。尽管现有的基于学习方法在完成或去噪等单个任务上取得了进展,但它们通常依赖于全局瓶颈特征,这会丢失细粒度几何信息,并且对变化的输入质量敏感。我们提出一个统一的3D修复网络,直接以点云作为输入,并在多种退化场景下自适应地重建高质量几何。我们方法的核心是一个伪查询模块,在Transformer主干网络中实现,它将几何变换重新表述为两个协作阶段,以增强结构清晰度、鲁棒性和局部细节保留。在精心设计的基准测试上的大量实验表明,我们的方法在通用3D修复中超越了最先进的性能。它有效处理了完成、变形和去噪退化的复杂组合。通过这项工作,我们提供了一个新颖的、统一的、仅基于点的主干网络,用于鲁棒的3D修复,从而实现更通用的3D感知。

英文摘要

Point clouds are a fundamental 3D representation in computer vision, enabling a wide range of perception tasks. However, real-world point clouds often suffer from degradations such as incompleteness, noise, outliers, and irregular density, caused by sensor limitations or occlusions. Recovering clean and detailed shapes from such degraded data is crucial for downstream applications. While existing learning-based methods achieve progress on individual tasks like completion or denoising, they typically rely on global bottleneck features, which lose fine-grained geometry and remain sensitive to varying input quality. We propose a unified 3D restoration network that directly takes point clouds as input and adaptively reconstructs high-quality geometry under diverse degradation scenarios. At the core of our approach is a Pseudo-Query module, implemented within a Transformer backbone, which reformulates geometric translation into two cooperative stages to enhance structural clarity, robustness, and local detail preservation. Extensive experiments on curated benchmarks demonstrate that our approach surpasses state-of-the-art performance in general 3D restoration. It effectively handles complex combinations of completion, deformation, and denoising degradations. With this work, we provide a novel unified, point-only backbone for robust 3D restoration, enabling more versatile 3D perception.

2605.25124 2026-05-26 cs.LG 版本更新

Optimizing Multidimensional Scaling in Gini Metric Spaces

在基尼度量空间中优化多维缩放

Cassandra Mussard, Stéphane Mussard

发表机构 * GitHub

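AI summary Proposes the Gini multidimensional scaling (Gini MDS) framework, which uses a Gini pseudo-distance based on values and ranks, outperforms Euclidean MDS on data with noise and outliers, and provides GPU acceleration through a PyTorch implementation.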

详情
AI中文摘要

基尼多维缩放(Gini MDS)框架扩展了欧几里得多维缩放。我们引入了一种基于值和秩的基尼伪距离,该距离依赖于一个可微调的超参数。这种伪距离允许灵活探索潜在配置,从而实现与观测相异度最佳匹配的嵌入。Gini MDS被证明对噪声和异常值具有鲁棒性,使其非常适合实际应用。我们在16个带有异常值的UCI数据集和带有噪声的MNIST图像上进行了实验,表明Gini MDS在噪声数据上优于欧几里得MDS。最后,与sklearn库的标准MDS相比,基于张量的PyTorch实现提供了GPU加速和高效计算。

英文摘要

The Gini Multidimensional Scaling (Gini MDS) framework extends Euclidean multidimensional scaling. We introduce a Gini pseudo-distance, based on values and their ranks, that depends on a tunable hyperparameter. This pseudo-distance allows flexible exploration of latent configurations, enabling embeddings that best match observed dissimilarities. Gini MDS is shown to be robust to noise and outliers, making it well suited to real-world applications. We provide experiments on 16 UCI datasets with outliers and on MNIST images with noise to show that Gini MDS outperforms Euclidean MDS on noisy data. Finally, a tensor-based implementation in PyTorch provides GPU acceleration and efficient computation compared to the standard MDS of the sklearn library.

2605.25123 2026-05-26 cs.LG cs.AI cs.CL cs.CV stat.ML 版本更新

Inference-Time Alignment of Diffusion Models via Trust-Region Iterative Twisted Sequential Monte Carlo

扩散模型的推理时对齐:基于信任区域迭代扭曲序贯蒙特卡洛方法

Weixin Wang, Yu Yang, Wei Deng, Pan Xu

发表机构 * Duke University(杜克大学) Morgan Stanley(摩根大通)

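AI summary Proposes Trust-Region Iterative Twisted Sequential Monte Carlo (TRI-TSMC), a framework that iteratively learns twisting functions to improve inference-time alignment of diffusion models, improving alignment on discrete diffusion text generation and text-to-image generation.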

Comments 34 pages, 6 figures, and 7 tables

详情
AI中文摘要

我们研究基于扩散的生成模型的推理时对齐,旨在引导基础模型产生高奖励输出而不更新其权重。最近的基于序贯蒙特卡洛(SMC)的引导方法以原则性的方式近似奖励倾斜的目标分布,但其提议仍主要依赖于基础采样器。由于奖励信息主要通过粒子重加权和重采样在传播后使用,这些方法可能需要大量粒子预算,并遭受权重退化和高方差估计的问题。降低方差和提高粒子效率的一种方法是迭代学习提供前瞻指导的扭曲函数,如扭曲SMC。然而,现有的可学习扭曲方法主要针对经典序贯推理开发,当应用于具有高维状态空间和终端、噪声或黑盒奖励的扩散对齐时可能不稳定。我们提出信任区域迭代扭曲序贯蒙特卡洛(TRI-TSMC),一种用于在基于SMC的推理时对齐中学习扭曲函数的信任区域框架。每次迭代在路径空间中计算精确的KL约束更新,通过温度重要性重加权得到闭式解,并通过加权最大似然将该目标投影回参数化扭曲族。理论上,我们形式化了最优扭曲函数的值函数解释,并表明它产生零方差采样器。我们证明信任区域更新沿着护航路径朝向目标分布,加权最大似然更新是前向KL投影,并且该路径降低了残差重要性权重方差。实验上,在匹配的推理时预算下,TRI-TSMC在离散扩散文本生成和文本到图像生成上改进了主要对齐目标。

英文摘要

We study inference-time alignment for diffusion-based generative models, aiming to steer a base model toward high-reward outputs without updating its weights. Recent Sequential Monte Carlo (SMC)-based steering methods approximate reward-tilted target distributions in a principled way, but their proposals remain largely tied to the base sampler. Since reward information is mainly used after propagation through particle reweighting and resampling, these methods can require large particle budgets and suffer from weight degeneracy and high-variance estimates. One way to reduce variance and improve particle efficiency is to iteratively learn twisting functions that provide look-ahead guidance, as in twisted SMC. However, existing learnable twisting methods are developed mainly for classical sequential inference and can be unstable when applied to diffusion-based alignment with high-dimensional state spaces and terminal, noisy, or black-box rewards. We propose Trust-Region Iterative Twisted Sequential Monte Carlo (TRI-TSMC), a trust-region framework for learning twisting functions in SMC-based inference-time alignment. Each iteration computes an exact KL-constrained update in path space, which admits a closed-form solution by tempered importance reweighting, and projects this target back to the parameterized twisted family by weighted maximum likelihood. Theoretically, we formalize the value-function interpretation of the optimal twisting function and show that it yields a zero-variance sampler. We prove that the trust-region update follows an escort path toward the target distribution, that the weighted maximum-likelihood update is a forward-KL projection, and that the path reduces residual importance-weight variance. Empirically, TRI-TSMC improves primary alignment objectives on discrete diffusion text generation and text-to-image generation under matched inference-time budgets.
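Tempered importance reweighting and its effect on weight degeneracy can be illustrated with a small sketch. The specific form used here, weights raised to a power `beta` in [0, 1], is a common tempering scheme assumed for illustration; the abstract does not spell out the paper's exact update, and the function names are hypothetical.

```python
import math

def tempered_weights(log_weights, beta):
    """Normalized weights proportional to exp(beta * log_weight)."""
    scaled = [beta * lw for lw in log_weights]
    m = max(scaled)  # subtract the max for numerical stability
    unnorm = [math.exp(s - m) for s in scaled]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for normalized weights."""
    return 1.0 / sum(w * w for w in weights)

# Assumed toy log importance weights for 4 particles; one dominates.
log_w = [0.0, 1.0, 2.0, 6.0]
full = tempered_weights(log_w, beta=1.0)  # untempered reweighting
half = tempered_weights(log_w, beta=0.5)  # smaller step toward the target
flat = tempered_weights(log_w, beta=0.0)  # no update: uniform weights
```

Smaller values of `beta` take a more conservative step toward the target and keep the effective sample size higher, which is the intuition behind limiting each update to a trust region.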

2605.25119 2026-05-26 cs.CV cs.AI cs.LG 版本更新

Trust-Aware Joint Feature-Prediction Discrepancy for Robust Domain Adaptation

信任感知的联合特征-预测差异用于鲁棒域适应

Xi Ding, Lei Wang, Syuan-Hao Li, Yongsheng Gao

发表机构 * School of Engineering and Built Environment, Griffith University, Australia(工程与建筑环境学院,格里菲斯大学,澳大利亚)

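AI summary Proposes a trust-aware domain adaptation framework whose Joint Feature-Prediction Discrepancy (JFPD) combines uncertainty-aware trust and semantic-alignment trust to give a reliability-aware estimate of domain discrepancy and improve adaptation performance.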

Comments Research report

详情
AI中文摘要

域适应旨在减轻标记源域与未标记或稀疏标记目标域之间分布偏移导致的性能下降。大多数现有方法在特征空间或预测空间中估计域差异。然而,这些单一视角策略忽略了域偏移下的一个关键问题:用于对齐的信号可靠性。实际上,学习到的表示和语义预测都可能变得不可靠,平等对待所有目标样本可能导致误导性对齐和次优迁移。我们引入了信任感知域适应,这是一个原则性框架,通过特征和预测信号的可靠性来建模域差异。我们方法的核心是联合特征-预测差异(JFPD),这是一个统一公式,联合捕捉表示散度和预测散度,并通过样本特定信任加权它们的贡献。信任通过两种互补机制量化:不确定性信任,从预测熵导出以抑制不可靠预测;语义对齐信任,从特征空间中的原型相似性计算以强调良好对齐的表示。通过优先考虑自信且语义一致的样本,同时降低噪声或模糊样本的权重,JFPD提供了域差异的可靠性感知估计。我们进一步将JFPD集成到训练目标中,引导适应朝向目标域的可靠区域。在标准基准上的实验表明,所提出的框架始终实现优越的适应性能,并产生与目标域误差相关的差异估计。这项工作首次解决了在域适应中建模特征与预测之间交互信任的重要性。

英文摘要

Domain adaptation aims to mitigate performance degradation caused by distribution shifts between a labeled source domain and an unlabeled or sparsely labeled target domain. Most existing approaches estimate domain discrepancy either in feature space or in prediction space. However, these single-perspective strategies overlook a critical problem under domain shift: the reliability of the signals used for alignment. In practice, both learned representations and semantic predictions may become unreliable, and treating all target samples equally can lead to misleading alignment and suboptimal transfer. We introduce trust-aware domain adaptation, a principled framework that models domain discrepancy through the reliability of feature and prediction signals. Central to our approach is the Joint Feature-Prediction Discrepancy (JFPD), a unified formulation that jointly captures representation divergence and prediction divergence while weighting their contributions by sample-specific trust. Trust is quantified via two complementary mechanisms: uncertainty-aware trust, derived from prediction entropy to suppress unreliable predictions, and semantic-alignment trust, computed from prototype similarity in feature space to emphasize well-aligned representations. By prioritizing confident and semantically consistent samples while down-weighting noisy or ambiguous ones, JFPD provides a reliability-aware estimate of domain discrepancy. We further integrate JFPD into a training objective that guides adaptation toward trustworthy regions of the target domain. Experiments on standard benchmarks demonstrate that the proposed framework consistently achieves superior adaptation performance and yields discrepancy estimates that correlate with target-domain error. This work addresses, for the first time, the importance of modeling trust in the interaction between features and predictions for domain adaptation.
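The two trust signals can be sketched per sample as below. The exact formulas are assumptions made for illustration (normalized entropy for uncertainty-aware trust, rescaled cosine similarity to a class prototype for semantic-alignment trust, and a product to combine them); the abstract does not give the paper's definitions, and all names are hypothetical.

```python
import math

def uncertainty_trust(probs):
    """1 - normalized entropy: 1 for a one-hot prediction, 0 for uniform.

    Assumes at least two classes and probabilities that sum to 1.
    """
    k = len(probs)
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return 1.0 - entropy / math.log(k)

def alignment_trust(feature, prototype):
    """Cosine similarity to a class prototype, rescaled to [0, 1].

    Assumes both vectors are non-zero.
    """
    dot = sum(a * b for a, b in zip(feature, prototype))
    norm_f = math.sqrt(sum(a * a for a in feature))
    norm_p = math.sqrt(sum(b * b for b in prototype))
    return 0.5 * (1.0 + dot / (norm_f * norm_p))

def sample_trust(probs, feature, prototype):
    """Hypothetical combination rule: product of the two trust terms."""
    return uncertainty_trust(probs) * alignment_trust(feature, prototype)

confident = sample_trust([0.98, 0.01, 0.01], [1.0, 0.1], [1.0, 0.0])
ambiguous = sample_trust([0.34, 0.33, 0.33], [0.1, 1.0], [1.0, 0.0])
```

A sample-specific weight of this kind down-weights ambiguous or poorly aligned target samples when the feature and prediction divergences are accumulated.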

2605.25115 2026-05-26 cs.LG cs.AI cs.CE physics.app-ph 版本更新

Courant: a State-Adaptive Perceiver-Based Neural Surrogate with Local Support and Interpretable Field Decomposition

Courant:一种具有局部支持和可解释场分解的状态自适应感知器神经代理模型

Anuj Kumar, Josiah Bjorgaard, Nikolaos Bouklas, Matteo Salvador, Alexander Lavin

发表机构 * Pasteur Labs(Pasteur实验室) Cornell University(康奈尔大学) Institute for Simulation Intelligence(模拟智能研究所)

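AI summary Proposes Courant, a Perceiver-based encode-process-decode surrogate model that uses state-adaptive latent queries and a lightweight decoder to obtain local support and interpretable field decomposition, akin to adaptive hp-refinement, and achieves competitive accuracy on steady and transient simulation benchmarks.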

详情
AI中文摘要

我们引入“Courant”,一种基于感知器的编码器-处理器-解码器代理模型,其潜在特征在物理空间中表现出自适应专门化和局部支持,实现了类似于自适应hp细化方案的功能,这是传统数值求解器和科学机器学习中非常期望的属性。所提出的架构结合了共享随机傅里叶特征坐标嵌入、状态自适应潜在查询和轻量解码器。Courant使用稳态或瞬态模拟数据进行端到端训练,仅使用物理空间中的标准L_2预测损失,在基准测试上达到竞争性精度。我们证明Courant的归纳偏差产生了设计上可解释的潜在变量:它们在模拟域中发展出多尺度几何专门化,并在时间相关情况下跟踪相干结构,类似于随时间演化的空间基函数,从而允许对模拟场进行紧凑的、几何锚定的、单位划分式的分解。

英文摘要

We introduce "Courant", a Perceiver-based encoder-processor-decoder surrogate model that has latent features exhibiting adaptive specialization and local support in the physical space, enabling functionality akin to an adaptive hp-refinement scheme, an attribute that is highly desirable in traditional numerical solvers and scientific machine learning broadly. The proposed architecture combines a shared random Fourier feature coordinate embedding, state-adapted latent queries, and a light-weight decoder. Courant is trained end-to-end with steady or transient simulation data and only a standard L_2 prediction loss in the physical space, achieving competitive accuracy on benchmarks. We demonstrate that Courant's inductive biases yield latents that are interpretable by design: they develop multiscale geometric specialization in the simulation domain and track coherent structures in the time-dependent case, acting analogously to time-evolving spatial basis functions and allowing for decoding a compact, geometry-anchored, partition-of-unity-like decomposition of the simulated field.

2605.25114 2026-05-26 stat.ML cs.LG 版本更新

Counterfactually Safe Reinforcement Learning

反事实安全的强化学习

Jingyi Li, Peng Wu, Chengchun Shi

Affiliations: Department of Statistics and Data Science, National University of Singapore; School of Mathematics and Statistics, Beijing Technology and Business University; Department of Statistics, London School of Economics and Political Science

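AI summary: To address the risk that a reinforcement learning policy harms individuals, the paper defines individual harm from a counterfactual perspective and designs a two-stage learning procedure that maximizes expected return while controlling the harm rate; it proves finite-sample properties and an upper bound on the sub-optimality gap, and validates the method experimentally.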

AI中文摘要

强化学习算法通常被设计为最大化群体上的期望回报。然而,平均最优的策略可能对某些个体是次优的,导致潜在的安全问题。为了解决这个问题,我们首先从反事实角度形式化了个体伤害的概念,并将伤害定义为所选动作导致结果严格差于基线替代方案的事件。然后,我们提出了一种通用的两阶段过程来学习策略,该策略在考虑个体伤害的同时最大化期望回报。我们进一步建立了所学策略的有限样本性质,推导了其次优性差距的上界,并表明伤害率得到了良好控制。在模拟和真实数据集上的数值实验证明了所提出方法的有效性。

英文摘要

Reinforcement learning algorithms are generally designed to maximize the expected return across a population. However, a policy that is optimal on average may be suboptimal for certain individuals, leading to potential safety concerns. To address this, we first formalize the notion of individual harm from a counterfactual perspective and define harm as the event in which a chosen action results in a strictly worse outcome than a baseline alternative. We then propose a general two-stage procedure for learning policies that maximize the expected return while accounting for individual harm. We further establish the finite-sample properties of the learned policy, derive an upper bound on its sub-optimality gap, and show that the harm rate remains well-controlled. Numerical experiments on both simulated and real-world datasets demonstrate the effectiveness of the proposed approach.
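
The harm definition in this abstract can be written out in potential-outcome notation. The symbols below are assumptions introduced here, not the paper's notation, and the constrained form of the objective is only one plausible reading of "maximize the expected return while accounting for individual harm".

```latex
% Illustrative notation (assumed, not taken from the paper):
% X: individual context; Y(a): potential return under action a;
% a_0: baseline action; \pi: policy; \alpha: tolerated harm rate.
\[
  H^{\pi} = \mathbb{1}\{\, Y(\pi(X)) < Y(a_0) \,\},
  \qquad
  \mathrm{HarmRate}(\pi) = \Pr\big( Y(\pi(X)) < Y(a_0) \big),
\]
\[
  \max_{\pi} \; \mathbb{E}\big[ Y(\pi(X)) \big]
  \quad \text{subject to} \quad
  \mathrm{HarmRate}(\pi) \le \alpha .
\]
```

Because $Y(\pi(X))$ and $Y(a_0)$ are never observed together for the same individual, the harm rate depends on the joint distribution of potential outcomes, which is what makes the definition counterfactual rather than purely interventional.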

2605.25111 2026-05-26 cs.LG 版本更新

Revisiting Pre-Propagation GNNs: Robust Diffusion Operators and Hidden-State Re-Propagation

重新审视预传播图神经网络:鲁棒扩散算子与隐状态再传播

Zichao Yue, Zhiru Zhang

Affiliations: School of Electrical and Computer Engineering, Cornell University, Ithaca, New York, USA

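AI summary: Proposes robust graph diffusion operators and a few-shot hidden-state re-propagation scheme that let pre-propagation GNNs match the accuracy of message-passing GNNs while keeping their training efficiency.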

AI中文摘要

预传播图神经网络(PPGNNs)将节点特征传播与变换解耦:图扩散作为预处理一次性执行,训练简化为每个节点的密集变换。这种设计使得小批量训练无需节点间依赖,避免了重复的稀疏矩阵-矩阵乘法,并更好地适配针对密集计算优化的现代加速器。然而,其表达能力仍不明确,实验结果表明PPGNNs与对应的消息传递图神经网络在常用图基准(尤其是异配图)上存在差距。本文提出一套用于预处理的鲁棒图扩散算子和训练过程中的少量隐状态再传播方案。我们的方法提高了PPGNNs的验证和测试准确率,使其在保持训练效率的同时匹配消息传递图神经网络的精度。

英文摘要

Pre-propagation graph neural networks (PPGNNs) decouple node feature propagation from transformation: graph diffusion is performed once as preprocessing, and training reduces to dense per-node transformations. This design enables mini-batch training without inter-node dependencies, avoids repeated sparse matrix-matrix multiplications, and better matches modern accelerators optimized for dense compute. However, their expressivity remains unclear, and empirical results show a gap between PPGNNs and their message-passing counterparts on commonly used graph benchmarks, especially heterophilic ones. In this paper, we propose a suite of robust graph diffusion operators for preprocessing and a few-shot hidden-state re-propagation scheme during training. Our methods improve the validation and test accuracy of PPGNNs, enabling them to match the accuracy of message-passing GNNs while maintaining training efficiency.
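
The pre-propagation idea (diffuse once, then train on per-node vectors) can be sketched as follows. This shows only the generic scheme with a row-normalized adjacency matrix and self-loops, which is an assumption of this sketch; the robust diffusion operators and the re-propagation step proposed in the paper are not reproduced, and the function names are hypothetical.

```python
def normalize_rows(adj):
    """Add self-loops to a dense adjacency matrix and row-normalize it."""
    n = len(adj)
    out = []
    for i in range(n):
        row = [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        s = sum(row)
        out.append([v / s for v in row])
    return out

def matmul(a, b):
    """Dense matrix product on nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def pre_propagate(adj, x, hops):
    """Compute [X, AX, ..., A^hops X] once and concatenate per node.

    After this preprocessing step each node has a fixed feature vector,
    so a downstream MLP can be trained on mini-batches of nodes without
    touching the graph again.
    """
    a_hat = normalize_rows(adj)
    blocks = [x]
    for _ in range(hops):
        blocks.append(matmul(a_hat, blocks[-1]))
    return [sum((b[i] for b in blocks), []) for i in range(len(x))]
```

For a two-node graph with one edge and features `[[1.0], [3.0]]`, one hop appends the neighborhood average `2.0` to each node's own feature.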

2605.25110 2026-05-26 cs.CV cs.AI cs.LG 版本更新

Uncertainty-DTW for Sequences and Visual Tokens

Uncertainty-DTW 用于序列和视觉标记

Lei Wang, Syuan-Hao Li, Yongsheng Gao, Piotr Koniusz

Affiliations: School of Engineering and Built Environment, Electrical and Electronic Engineering, Griffith University; School of Computer Science and Engineering, University of New South Wales

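AI summary: Proposes uncertainty-aware Dynamic Time Warping (uDTW), which achieves robust alignment through heteroscedastic uncertainty modeling and maximum likelihood estimation, extends it to sets of visual tokens, and reports improvements over existing methods across several domains.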

Comments Research report

AI中文摘要

对齐结构化数据是计算机视觉和机器学习中的一个基本问题,支撑着时间序列分析、人类动作识别和视觉表示学习等任务。现有的对齐方法,包括动态时间规整(DTW)及其可微变体,依赖于确定性相似度度量,因此对异质和噪声特征敏感。在这项工作中,我们引入了不确定性感知对齐,这是一个概率框架,用异方差不确定性建模成对对应关系,并沿对齐路径执行结构化匹配。我们的公式,不确定性-DTW(uDTW),为每个对应分配一个正态分布,并通过最大似然估计目标参数化每条对齐路径,该目标包括(i)一个精度加权匹配项,抑制不可靠特征,以及(ii)一个对数方差正则化,防止退化解。这产生了一个概率对齐机制,对噪声具有鲁棒性且可解释,因为不确定性直接反映了匹配的可靠性。我们进一步将该框架从时间序列推广到标记化的视觉表示,从而能够对视觉标记集进行结构化匹配。学习到的不确定性可以解释为反向注意力:语义相关区域表现出低不确定性并主导对齐,而模糊/噪声区域具有高不确定性。这提供了对齐、注意力和不确定性建模之间的联系。我们在不同领域评估了所提出的框架。结果表明,与最先进的方法相比,该方法持续改进,并且学习到的不确定性与语义重要性相关。这些发现将不确定性感知对齐确立为一个通用、鲁棒且可解释的框架,用于从结构化数据中学习。

英文摘要

Aligning structured data is a fundamental problem in computer vision and machine learning, underlying tasks such as time series analysis, human action recognition, and visual representation learning. Existing alignment methods, including Dynamic Time Warping (DTW) and its differentiable variants, rely on deterministic similarity measures and are therefore sensitive to heterogeneous and noisy features. In this work, we introduce uncertainty-aware alignment, a probabilistic framework that models pairwise correspondences with heteroscedastic uncertainty and performs structured matching along alignment paths. Our formulation, uncertainty-DTW (uDTW), assigns each correspondence a Normal distribution and parametrizes each alignment path by a Maximum Likelihood Estimate objective consisting of (i) a precision-weighted matching term that suppresses unreliable features, and (ii) a log-variance regularization that prevents degenerate solutions. This yields a probabilistic alignment mechanism that is robust to noise and interpretable, as uncertainty directly reflects the reliability of matches. We further generalize this framework from temporal sequences to tokenized visual representations, enabling structured matching over sets of visual tokens. The learned uncertainty can be interpreted as a reverse-attention: semantically relevant regions exhibit low uncertainty and dominate the alignment, while ambiguous/noisy regions have high uncertainty. This provides a connection between alignment, attention, and uncertainty modeling. We evaluate the proposed framework across diverse domains. The results demonstrate consistent improvements over state-of-the-art methods and show that learned uncertainty correlates with semantic importance. These findings establish uncertainty-aware alignment as a general, robust, and interpretable framework for learning from structured data.
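
The two-term objective described in this abstract (a precision-weighted matching term plus a log-variance regularizer) can be sketched with a standard DTW recursion. This is a simplified illustration: it uses scalar sequences, a hard minimum instead of a differentiable soft minimum, and a variance matrix `var` supplied by the caller, whereas the paper learns the uncertainties. The function name is hypothetical.

```python
import math

def udtw_hard(x, y, var):
    """Hard-min DTW over the pairwise cost

        d_ij = (x_i - y_j)^2 / var_ij + log(var_ij),

    where var[i][j] > 0 is the variance assigned to matching x_i with y_j.
    The first term discounts unreliable pairs; the second penalizes
    inflating the variance everywhere.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    r = [[inf] * (m + 1) for _ in range(n + 1)]
    r[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            v = var[i - 1][j - 1]
            cost = (x[i - 1] - y[j - 1]) ** 2 / v + math.log(v)
            r[i][j] = cost + min(r[i - 1][j], r[i][j - 1], r[i - 1][j - 1])
    return r[n][m]
```

With all variances equal to 1 the log term vanishes and the recursion reduces to ordinary DTW with squared-difference cost, which is a quick sanity check on the formulation.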

2605.25107 2026-05-26 cs.LG cs.AI cs.NA math.NA 版本更新

Leveraging Gauge Freedom for Learning Non-Gradient Population Dynamics of Stochastic Systems

利用规范自由度学习随机系统的非梯度种群动力学

Jules Berman, Tobias Blickhan, Benjamin Peherstorfer

Affiliations: Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA

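AI summary: Because existing population dynamics inference is limited to gradient flows, the paper proposes Non-Gradient Inference Flows (NGIF), which parameterizes general vector fields through a weak formulation of the continuity equation and selects them by criteria other than minimal kinetic energy, improving distributional accuracy and better capturing non-potential transport on low- and high-dimensional physics problems.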

AI中文摘要

现有的种群动力学推断工作通常关注由标量势的梯度向量场产生的流。在所有与种群动力学兼容的容许流中,梯度流在特定意义下是最优的:它们最小化动能。基于不同准则选择场对应于确定种群动力学时的规范自由度,我们在本文中利用了这一点。我们提出了非梯度推断流(NGIF),一种使用连续性方程弱形式推断非梯度种群动力学的算法。这使我们能够参数化一般向量场,并选择超出最小动能的其他选择准则。我们在各种低维和高维物理问题上证明,这种更一般的方法提高了相对于梯度受限基线的分布精度,并更好地捕捉了非势输运。

英文摘要

Existing work on population dynamics inference often focuses on flows arising from vector fields that are the gradients of scalar potentials. Among all admissible flows that are compatible with the population dynamics, gradient flows are optimal in a specific sense: they minimize kinetic energy. The selection of fields based on different criteria corresponds to a gauge freedom when determining population dynamics, which we leverage in this work. We propose Non-Gradient Inference Flows (NGIF), an algorithm to infer non-gradient population dynamics using a weak formulation of the continuity equation. This allows us to parameterize general vector fields and choose other selection criteria beyond minimal kinetic energy. We demonstrate on a variety of low- and high-dimensional physics problems that this more general approach improves distributional accuracy over gradient-restricted baselines and better captures non-potential transport.
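
The gauge freedom mentioned in this abstract follows from the standard continuity equation. The notation below ($\rho_t$ for the population density, $v_t$ for the velocity field, $\varphi$ for a test function, $w_t$ for a gauge field) is chosen here for illustration and is not taken from the paper; the specific NGIF training loss is not reproduced.

```latex
% Continuity equation for a density \rho_t transported by a field v_t:
\[
  \partial_t \rho_t + \nabla \cdot (\rho_t v_t) = 0 .
\]
% Weak form: test against smooth, compactly supported \varphi and integrate by parts:
\[
  \frac{d}{dt} \int \varphi(x)\, \rho_t(x)\, dx
  = \int \nabla \varphi(x) \cdot v_t(x)\, \rho_t(x)\, dx .
\]
% Gauge freedom: any w_t that is divergence-free with respect to \rho_t can be added:
\[
  \nabla \cdot (\rho_t w_t) = 0
  \;\Longrightarrow\;
  v_t + w_t \ \text{transports the same}\ \rho_t .
\]
```

Since the weak form involves only expectations under $\rho_t$, both sides can in principle be estimated from samples of the population at each time, and the choice among the fields $v_t + w_t$ is exactly the selection criterion the abstract refers to.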

2605.25095 2026-05-26 cs.AI cs.LG math.OC 版本更新

RECTOR: Priority-Aware Rule-Based Reranking for Compliance-Aware Autonomous Driving Trajectory Selection

RECTOR: 基于优先级规则的合规感知自动驾驶轨迹选择重排序

Hadi Hajieghrary, Benedikt Walter, Chaitanya Shinde, Paul Schmitt, Miguel Hurtado

Affiliations: TORC Robotics LLC; Daimler Truck AG; Reynolds & Moore; MassRobotics

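AI summary: Proposes RECTOR, a post-generation reranking layer that scores candidate trajectories against a tiered rulebook (Safety > Legal > Road > Comfort) via differentiable proxies and a scene-conditioned applicability mechanism, then selects with a deterministic ε-lexicographic rule; without retraining the predictor, it cuts the Safety+Legal violation rate from 28.58% to 20.42%.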

AI中文摘要

自动驾驶堆栈必须从多模态候选集中选择一条轨迹;仅凭模型置信度选择会忽略安全、交通法规和舒适性约束。我们提出RECTOR(规则强制约束轨迹编排器),一种后生成重排序层,通过差异化代理和场景条件适用性机制,根据分层规则手册(安全>法律>道路>舒适)对候选轨迹进行评分,然后采用确定性ε-词典序规则进行选择,该规则通过构造保持跨层优先级——无需重新训练预测器。在Waymo开放运动数据集validation_interactive划分(43,219个增强实例,K=6)上,根据协议B(28条规则代理目录,oracle适用性),与同一候选集上仅基于置信度的选择相比,规则感知选择将安全+法律违规从28.58%降至20.42%,总违规从40.32%降至32.41%。在该基准上,均匀加权求和基线匹配了二元合规性——经验提升来自规则感知排序,而词典序保证是任何权重校准无法复制的结构性差异因素。在对抗性置信度破坏下,仅置信度选择在100%的场景中失败,而两种规则感知选择器在约96%的场景中拒绝了注入的模式。所有数据均为代理评估器结果(非安全认证),开环,5秒时域,美国规则,验证集划分。

英文摘要

Autonomous driving stacks must pick one trajectory from a multi-modal candidate set; choosing by model confidence ignores safety, traffic-law, and comfort constraints. We present RECTOR (Rule-Enforced Constrained Trajectory Orchestrator), a post-generation reranking layer that scores candidates against a tiered rulebook (Safety ≻ Legal ≻ Road ≻ Comfort) via differentiable proxies and a scene-conditioned applicability mechanism, then selects with a deterministic ε-lexicographic rule that preserves cross-tier priority by construction, without retraining the predictor. On the Waymo Open Motion Dataset validation_interactive split (43,219 augmented instances, K=6), under Protocol B (28-rule proxy catalog, oracle applicability), rule-aware selection cuts Safety+Legal violations from 28.58% to 20.42% and Total from 40.32% to 32.41% versus confidence-only on the same candidates. A uniform-weight weighted-sum baseline matches binary compliance on this benchmark: the empirical lift comes from rule-aware ranking, while the lexicographic guarantee is the structural differentiator no weight calibration can replicate. Under adversarial confidence corruption, confidence-only selection fails in 100% of scenarios while both rule-aware selectors reject the injected mode in about 96%. All figures are proxy-evaluator results (not a safety certificate), open-loop, 5 s horizon, U.S. rules, validation split.
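
A tier-by-tier selection rule of the kind described in this abstract can be sketched as below. The exact ε-lexicographic rule used by RECTOR is not given in the abstract, so the filtering logic, the confidence tie-break, the default `eps`, and the function name are assumptions of this sketch.

```python
def eps_lexicographic_select(violations, confidences, eps=1e-3):
    """Pick one candidate index by tiered priority.

    violations[k] is a tuple of per-tier violation scores for candidate k,
    ordered from highest to lowest priority (for example Safety, Legal,
    Road, Comfort); lower is better. At each tier only candidates within
    eps of the best surviving score are kept, so a lower tier can never
    override a higher one. Remaining ties are broken by model confidence
    (assumed tie-break).
    """
    survivors = list(range(len(violations)))
    for t in range(len(violations[0])):
        best = min(violations[k][t] for k in survivors)
        survivors = [k for k in survivors if violations[k][t] <= best + eps]
        if len(survivors) == 1:
            break
    return max(survivors, key=lambda k: confidences[k])
```

In contrast to a weighted sum, no choice of lower-tier scores can compensate for a worse higher-tier score here, which is the structural property the abstract attributes to the lexicographic rule.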

2605.25073 2026-05-26 cs.CR cs.AI cs.LG 版本更新

Security in the Fine-Tuning Lifecycle of Large Language Models: Threats, Defenses, Evaluation, and Future Directions

大型语言模型微调生命周期中的安全:威胁、防御、评估与未来方向

Wenjuan Li, Yitao Liu, Runze Chen, Rajkumar Buyya

Affiliations: Hangzhou Normal University; Zhejiang University; China Mobile (Zhejiang) Innovation Research Institute Co., Ltd.; Quantum Cloud Computing and Distributed Systems (qCLOUDS) Lab, School of Computing and Information Systems, The University of Melbourne

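AI summary: A systematic survey of security threats and defenses in LLM fine-tuning that proposes a lifecycle-based three-phase framework and uses unified experiments to evaluate attack and defense effectiveness and cross-phase limitations.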

Comments 39 pages, 7 figures, 22 tables

AI中文摘要

背景:微调是将预训练大型语言模型(LLM)适应下游任务的核心,但其对训练数据、参数更新和可重用组件的依赖为攻击者提供了入口。威胁已从数据投毒和权重篡改演变为智能体操纵和接口利用,然而现有综述缺乏涵盖整个微调生命周期的统一框架。目标:本文对LLM微调安全进行了系统调查,并建立了一个基于生命周期的框架来比较攻击和防御,辅以统一的实证评估。方法:我们根据干预时机将攻击和防御机制分为三个阶段:微调前、微调中和微调后。在每个阶段,我们回顾和对比策略以揭示其演变和局限性。然后在统一的模型、硬件和协议设置下评估代表性方法,并进行跨阶段实验,配对来自不同阶段的攻击和防御。结果:攻击有效性高度依赖于模型且随规模非单调变化:对早期模型有效的权重编辑攻击在现代开源LLM上失去影响;跨语言后门迁移在更大规模上报告为近乎完美,但在测试的1B-4B模型上完全失败;纯粹良性样本可以损害指令微调模型的安全对齐。单阶段防御很少能跨阶段泛化,防御有效性共同依赖于模型架构和对齐状态。结论:我们识别了关键开放问题(配置鲁棒防御、跨阶段防御组合以及超越行为假设的嵌入空间攻击),并提出了具体的未来研究方向。

英文摘要

Background: Fine-tuning is central to adapting pre-trained Large Language Models (LLMs) to downstream tasks, but its reliance on training data, parameter updates, and reusable components opens entry points for attackers. Threats have evolved from data poisoning and weight tampering to agent manipulation and interface exploitation, yet existing reviews lack a unified framework spanning the full fine-tuning lifecycle. Objective: This paper presents a systematic survey of LLM fine-tuning security and establishes a lifecycle-based framework for comparing attacks and defenses, complemented by unified empirical evaluation. Methods: We divide attack and defense mechanisms into three phases by intervention timing: pre-tuning, during-tuning, and post-tuning. Within each phase, strategies are reviewed and contrasted to expose their evolution and limitations. Representative methods are then evaluated under a unified model, hardware, and protocol setup, with cross-phase experiments pairing attacks and defenses from different phases. Results: Attack effectiveness is highly model-dependent and non-monotonic with scale: weight-editing attacks effective on earlier models lose impact on modern open-source LLMs; cross-lingual backdoor transfer, reported as near-perfect at larger scales, fails entirely on tested 1B-4B models; and purely benign samples can compromise safety alignment in instruction-tuned models. Single-phase defenses rarely generalize across phases, and defense effectiveness depends jointly on model architecture and alignment state. Conclusion: We identify key open problems (configuration-robust defense, cross-phase defense composition, and embedding-space attacks beyond behavioral assumptions) and propose concrete future research directions.

2605.25066 2026-05-26 quant-ph cs.CR cs.LG 版本更新

QML-PipeGuard: Drift-Aware Behavioral Fingerprinting for Quantum Machine Learning Pipeline Integrity

QML-PipeGuard:面向量子机器学习管道完整性的漂移感知行为指纹识别

Esra Yeniaras

Affiliations: Quantum Security and Post-Quantum Cryptography Researcher

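AI summary: Proposes QML-PipeGuard, a framework that uses behavioral fingerprints and contract-based monitoring to address both hardware drift and malicious channel substitution in quantum machine learning pipelines.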

Comments 54 pages, 12 Tables, 5 figures

AI中文摘要

量子机器学习(QML)正从研究原型转向部署的云服务。随着QML进入受监管行业,量子阶段的完整性在两个层面成为实际问题:重新校准之间信道级别的噪声硬件漂移,以及控制执行环境的对手可以用行为相似但数学上不同的量子信道替换声明的量子信道。现有的QML验证工作涉及脉冲级噪声、输入漂移、输入扰动鲁棒性或设备身份,但均未涵盖这两个问题。我们提出QML-PipeGuard,一个基于合约的框架,在单一数学机制下处理这两个问题。它通过行为指纹(在断层扫描结构测量族下的可观测量期望值向量)在运行时表征QML管道,并运行于两种模式:漂移感知监测(在标定容差内吸收良性校准变化)和对抗检测(将信道替换捕获为信息完备可观测量合约的违反)。该框架贡献了编码器-变分电路-测量信道的管道组合处理,包含QML特定的威胁模型(单量子比特Pauli族的紧框架界C=√3)、有限样本复杂度界以及分离对抗和自然漂移贡献的容差分解。我们在IBM Heron r2处理器(ibm_fez)上的两量子比特QSVM管道上端到端验证了该框架,并在噪声匹配模拟器上进行了样本复杂度验证。规定的测量预算(约1.4e4次)适合单批次作业,隐蔽信道在规避弱合约的同时被以宽裕的安全裕度检测到,典型硬件漂移在容差范围内。

英文摘要

Quantum machine learning (QML) is moving from research prototypes to deployed cloud services. As QML enters regulated industries, the integrity of the quantum stage becomes a practical concern on two fronts: noisy hardware drifts at the channel level between recalibrations, and an adversary with control over the execution environment can substitute the declared quantum channel with a behaviorally similar but mathematically distinct one. Neither concern is covered by existing QML verification work on pulse-level noise, input drift, input-perturbation robustness, or device identity. We introduce QML-PipeGuard, a contract-based framework addressing both concerns under a single mathematical machinery. It characterizes a QML pipeline at runtime by its behavioral fingerprint, the vector of observable expectation values under a tomographically structured measurement family, and operates in two modes: drift-aware monitoring that absorbs benign calibration changes within a calibrated tolerance, and adversarial detection that catches channel substitution as a violation of an informationally complete observable contract. The framework contributes a pipeline-composition treatment of the encoder-ansatz-measurement channel with a QML-specific threat model (tight frame-bound C=sqrt(3) for the single-qubit Pauli family), a finite-shot sample-complexity bound, and a tolerance decomposition separating adversarial and natural-drift contributions. We validate the framework end-to-end on a two-qubit QSVM pipeline on the IBM Heron r2 processor (ibm_fez), with a sample-complexity validation on a noise-matched simulator. The prescribed measurement budget (about 1.4e4 shots) fits in a single batched job, the sneaky channel is detected with a wide safety margin while evading the weak contract, and the typical hardware drift sits within tolerance.

2605.25063 2026-05-26 cs.LG cond-mat.mtrl-sci 版本更新

Reinforcement Learning for Laser Additive Manufacturing Scan-Order Optimisation: A Bilevel Proxy-FEA Diagnostic Framework for Reward and World-Model Diagnosis

激光增材制造扫描顺序优化的强化学习:用于奖励和世界模型诊断的双层代理-有限元分析诊断框架

Xian Wu, Haoran Li, Dongbin Zhao, Ruiyao Zhang, Yuanqi Chu, Bin Wang

Affiliations: College of Engineering, Design and Physical Sciences, Brunel University London; Pattern Recognition Laboratory, Institute of Automation, Chinese Academy of Sciences; ISIS Neutron and Muon Source, Science and Technology Facilities Council, Rutherford Appleton Laboratory

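AI summary: Proposes a bilevel Proxy-FEA diagnostic framework that combines lightweight proxies with sparse finite-element simulations to diagnose reward and world-model fidelity problems when applying reinforcement learning to scan-order optimisation in laser additive manufacturing.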

Comments 31 pages, 7 figures, 3 tables

AI中文摘要

强化学习为激光增材制造中的扫描顺序优化提供了一种有前景的方法,其中顺序扫描决策关键影响热积累、残余应力、变形和最终零件质量。将RL应用于该领域的一个核心挑战在于奖励和世界模型的保真度:完整的有限元分析在密集的环路评估中计算成本过高,而廉价的热启发代理度量虽然高效,但可能仅捕获真实热机械目标的局部方面。本文研究了一个用于强化学习引导的扫描顺序优化中奖励和世界模型诊断的双层代理-有限元分析诊断框架。下层采用轻量扫描路径和热启发代理进行快速候选生成和初步策略侧筛选,而上层利用稀疏的Abaqus有限元分析模拟提供基于模拟的参考标签。该框架在一个简化的全轨迹加热LDED32条纹基准上进行检验,该基准包含十种代表性扫描策略。最终冷却残余Mises应力、U3垂直变形和PEEQ塑性度量揭示了一个观察到的应力-变形权衡,而非单一单调的质量目标。在评估的集合中,center_out策略成为稳健的折衷候选,而raster_left_to_right和edge_in构成权衡的对立端点。代理-有限元分析对齐分析表明,当前廉价的基于路径的度量主要捕获变形相关(U3)行为,且与稀疏有限元分析参考标签仅呈现弱相关性。这些发现表明,仅代理的奖励设计在未来的RL训练中可能存在错位风险,并强调了在大规模策略优化之前,稀疏有限元分析参考信号对于诊断引导的奖励和世界模型精炼的价值。

英文摘要

Reinforcement learning offers a promising approach for scan-order optimisation in laser additive manufacturing, where sequential scan decisions critically influence thermal accumulation, residual stress, distortion, and final part quality. A central challenge in applying RL to this domain lies in reward and world-model fidelity: full finite-element analysis is computationally prohibitive for dense in-the-loop evaluation, while cheap thermo-inspired proxy metrics, though efficient, may capture only partial aspects of the true thermo-mechanical objectives. This paper investigates a bilevel Proxy-FEA diagnostic framework for reward and world-model diagnosis in reinforcement-learning-guided scan-order optimisation. The lower level employs lightweight scan-path and thermo-inspired proxies for rapid candidate generation and preliminary policy-side screening, while the upper level utilises sparse Abaqus FEA simulations to provide simulation-based reference labels. The framework is examined on a simplified whole-track heating LDED32 stripe benchmark comprising ten representative scan strategies. Final-cooling residual Mises stress, U3 vertical distortion, and PEEQ plasticity metrics reveal an observed stress-distortion trade-off rather than a single monotonic quality objective. Within the evaluated set, the center_out strategy emerges as a robust compromise candidate, while raster_left_to_right and edge_in form opposing endpoints of the trade-off. Proxy-FEA alignment analysis shows that current cheap path-based metrics predominantly capture distortion-related (U3) behaviour and exhibit only weak correlation with the sparse FEA reference labels. These findings highlight that proxy-only reward designs risk misalignment in future RL training and underscore the value of sparse FEA reference signals for diagnostic-guided reward and world-model refinement prior to large-scale policy optimisation.

2605.25061 2026-05-26 cs.LG cs.AI 版本更新

GL-LFGNN: A Global-Local Dual-branch Causal Graph Neural Network Based on Liang-Kleeman Information Flow for EEG Emotion Recognition

GL-LFGNN:基于Liang-Kleeman信息流的全局-局部双分支因果图神经网络用于脑电情感识别

Ziyi Wang, Dongyang Kuang

Affiliations: School of Mathematics (Zhuhai), Sun Yat-sen University, Zhuhai, China

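AI summary: Proposes GL-LFGNN, which builds directed causal graphs from Liang-Kleeman information flow theory and integrates whole-brain and regional connectivity in a global-local dual-branch architecture, achieving high-accuracy emotion recognition on the MEEG dataset with few parameters.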

Comments 10 pages, 3 figures

AI中文摘要

基于脑电的情感识别在客观诊断情绪障碍方面具有重要前景。图神经网络已成为建模脑电通道间依赖关系的主流范式,但现有方法依赖于基于空间邻近性或功能相关性导出的对称邻接矩阵,这些矩阵本质上捕捉的是统计关联而非有向因果影响,这与神经信息流固有的非对称、因果驱动特性相冲突。为弥合这一差距,我们提出GL-LFGNN,一种基于Liang-Kleeman信息流理论的全局-局部双分支因果图神经网络。与仅评估时间优先性的格兰杰因果不同,我们的方法从动力系统角度严格量化因果强度,生成神经生理学可解释的有向图。双分支架构进一步将全脑连接性与符合既定功能神经解剖学的区域特定处理相结合。在MEEG数据集上,GL-LFGNN仅用37K参数(约为当前最优模型的10%)便达到86.17%(唤醒度)和86.71%(效价)的准确率,表明原则性的因果建模可同时增强可解释性、泛化能力和计算效率。代码将开源。

英文摘要

EEG-based emotion recognition holds significant promise for objective diagnosis of mood disorders. Graph neural networks (GNNs) have emerged as the dominant paradigm for modeling inter-channel dependencies in EEG, yet existing approaches rely on symmetric adjacency matrices derived from spatial proximity or functional correlations. Such matrices capture statistical associations rather than directed causal influences, which conflicts with the inherently asymmetric, causally driven nature of neural information flow. To bridge this gap, we propose GL-LFGNN, a Global-Local Dual-branch Causal Graph Neural Network grounded in Liang-Kleeman information flow theory. Unlike Granger causality, which merely assesses temporal precedence, our approach rigorously quantifies causal strength from a dynamical systems perspective, yielding neurophysiologically interpretable directed graphs. A dual-branch architecture further integrates whole-brain connectivity with region-specific processing aligned to established functional neuroanatomy. On the MEEG dataset, GL-LFGNN achieves 86.17% (Arousal) and 86.71% (Valence) accuracy with only 37K parameters (approximately 10% of the current state of the art), demonstrating that principled causal modeling can simultaneously enhance interpretability, generalization, and computational efficiency. Code will be released.

2605.25057 2026-05-26 math.NA cs.LG cs.NA 版本更新

Random Neural Network Expressivity for Non-Linear Partial Differential Equations

随机神经网络对非线性偏微分方程的表达能力

Muhammed Ali Mehmood, Lukas Gonon

Affiliations: Department of Mathematics, Imperial College London; School of Computer Science, University of St. Gallen

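AI summary: Studies how well neural networks with randomly generated hidden weights (RaNNs) approximate solutions of non-linear partial differential equations, deriving error bounds with a dimension-free approximation rate of 1/2 and applying them to Porous Medium Equations and Compressible Navier-Stokes Equations.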

AI中文摘要

随机生成隐藏权重的神经网络(RaNNs)已被广泛研究,既作为独立的机器学习方法,也作为全可训练深度学习方法的初始化。本文研究RaNNs在学习非线性偏微分方程(PDEs)解方面的表达能力。尽管在实际应用中广泛使用,但对此背景下RaNNs逼近性质的严格理论理解仍然有限。本文推导了RaNNs对时间依赖Sobolev函数的误差界,并对足够正则的函数获得了维数无关的逼近率$\frac{1}{2}$。我们将结果应用于两类重要的非线性PDEs:多孔介质方程和可压缩Navier-Stokes方程,表明RaNNs能够有效逼近这些复杂非线性PDEs的解。我们的理论分析得到了数值实验的支持,表明所获得的收敛速率超出了所考虑的设置。

英文摘要

Neural networks with randomly generated hidden weights (RaNNs) have been extensively studied, both as a standalone learning method and as an initialization for fully trainable deep learning methods. In this work, we study RaNN expressivity for learning solutions to non-linear partial differential equations (PDEs). Despite their widespread use in practical applications, a rigorous theoretical understanding of the approximation properties of RaNNs in this context remains limited. Here, we derive error bounds for RaNN approximations to time-dependent Sobolev functions and obtain a dimension-free approximation rate $\frac{1}{2}$ for sufficiently regular functions. We apply our results to two important classes of non-linear PDEs: Porous Medium Equations and Compressible Navier-Stokes Equations, showing that RaNNs are capable of efficiently approximating solutions to these complex, non-linear PDEs. Our theoretical analysis is supported by numerical experiments, showing that the obtained convergence rates extend beyond the considered setting.
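
The defining feature of a RaNN, hidden weights that are sampled once and never trained, can be illustrated with a small one-dimensional regression. Everything specific here is an assumption of this sketch rather than the paper's setup: the tanh activation, the sampling distributions, the sine target, the ridge penalty, and the helper names.

```python
import math
import random

def random_features(xs, weights, biases):
    """Hidden layer with fixed random weights: phi_j(x) = tanh(w_j * x + b_j)."""
    return [[math.tanh(w * x + b) for w, b in zip(weights, biases)] for x in xs]

def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z

def fit_readout(phi, ys, ridge=1e-6):
    """Ridge least squares for the output weights only; hidden weights stay fixed."""
    n = len(phi[0])
    gram = [[sum(row[i] * row[j] for row in phi) + (ridge if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    rhs = [sum(row[i] * y for row, y in zip(phi, ys)) for i in range(n)]
    return solve(gram, rhs)

rng = random.Random(0)
width = 8  # number of random hidden units
weights = [rng.gauss(0.0, 3.0) for _ in range(width)]
biases = [rng.uniform(-3.0, 3.0) for _ in range(width)]
xs = [i / 50.0 for i in range(51)]
ys = [math.sin(2.0 * math.pi * x) for x in xs]

phi = random_features(xs, weights, biases)
readout = fit_readout(phi, ys)
pred = [sum(c * h for c, h in zip(readout, row)) for row in phi]
```

Training reduces to a single linear solve for `readout`, which is why such networks are cheap to fit; how the error decays as `width` grows is the kind of question the paper's approximation rates address.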

2605.25050 2026-05-26 stat.AP cs.LG q-bio.QM stat.ML 版本更新

Multimodality Stacking with Blockwise missing values and application to the PIONeeR biomarkers study for prediction of resistance to immunotherapy

具有分块缺失值的多模态堆叠及其在预测免疫治疗耐药性的PIONeeR生物标志物研究中的应用

Mohamed Boussena, Florence Monville, Jacques Fieschi-Meric, Frederic Vely, Pierre Milpied, Julien Mazieres, Maurice Perol, Eric Vivier, Laurent Greillier, Fabrice Barlesi, Sebastien Benzekry

Affiliations: Inria-Inserm team COMPO (COMPutational pharmacology and clinical Oncology), Centre Inria Sophia Antipolis - Méditerranée, Centre de Recherches en Cancérologie de Marseille, Inserm U1068, CNRS UMR7258, Institut Paoli-Calmettes, Pharmacy faculty, Aix-Marseille University; Veracyte SAS, Marseille, France; Assistance Publique-Hôpitaux de Marseille (APHM), Marseille, France; Toulouse University Hospital, Toulouse, France; Centre Leon Berard, Lyon, France; Innate Pharma, Marseille, France; Université Paris Saclay, Gustave Roussy, Inserm, Prédicteurs Moléculaires et nouvelles cibles en oncologie (U981), F-94805, Villejuif, France

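AI summary: Proposes MSB, a multimodality stacking framework that models each modality's features independently and aggregates predictions with a cross-validated stacking meta-learner to handle high dimensionality and blockwise missingness; in the PIONeeR study it predicts progression-free survival under immunotherapy for non-small cell lung cancer and outperforms baseline algorithms.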

AI中文摘要

在临床肿瘤学中,整合多模态数据集常受到高维性和分块缺失的阻碍,即特定患者子集无法获得完整数据源。标准生存模型通常难以处理这些缺失,导致结果偏倚或患者排除。我们提出具有分块缺失值的多模态堆叠(MSB),一种用于生存分析的晚期融合框架,它独立建模模态特定特征,然后通过交叉验证的堆叠元学习器聚合预测。MSB在PIONeeR研究(n=443名患者,来自八个异质来源的378个生物标志物)中进行了验证,以预测接受免疫治疗的晚期非小细胞肺癌患者的无进展生存期。MSB产生了比基线算法更高的预测性能(C-index)。改进幅度因基线强度而异:线性模型提高了15.9%(Wilcoxon符号秩检验p<0.001),随机生存森林提高了5.4%(p=0.002),梯度提升方法提高了2.1%(p=0.030)。除了区分能力外,MSB还缩小了泛化差距(5折交叉验证重复3次的训练-测试差异:0.055 vs 线性模型的0.380)。置换重要性分析确定了常规实验室标志物、临床特征和PD-L1表达为主要预测驱动因素。缺失块指示器的重要性可忽略,表明模型从生物标志物值而非数据可用性模式中学习。MSB为具有分块缺失的多模态生存预测提供了一个统计验证的框架。通过无需完整数据即可进行系统性生物标志物评估,MSB为生物医学研究中的预测建模提供了实用工具,有待外部验证。实现代码可在https://github.com/MohamedBoussena/MSB 根据Inria许可证获取。

英文摘要

Integrating multimodal datasets in clinical oncology is frequently hindered by high dimensionality and blockwise missingness, where entire data sources are unavailable for specific patient subsets. Standard survival models often struggle with these gaps, leading to biased results or patient exclusion. We introduce Multimodality Stacking with Blockwise missing values (MSB), a late-fusion framework for survival analysis that independently models modality-specific features before aggregating predictions via a cross-validated stacking meta-learner. MSB was validated on the PIONeeR study (n=443 patients, 378 biomarkers across eight heterogeneous sources) to predict progression-free survival in advanced non-small cell lung cancer patients receiving immunotherapy. MSB yielded higher predictive performance (C-index) than baseline algorithms. Improvements varied by baseline strength: linear models showed a 15.9% increase (p<0.001, Wilcoxon signed-rank test), random survival forests gained 5.4% (p=0.002), and gradient boosting methods improved by 2.1% (p=0.030). Beyond discrimination, MSB reduced the generalization gap (train-test difference under 5-fold cross-validation repeated 3 times: 0.055 vs 0.380 for linear models). Permutation importance analysis identified routine laboratory markers, clinical features, and PD-L1 expression as primary predictive drivers. Missing block indicators showed negligible importance, suggesting the model learned from biomarker values rather than data availability patterns. MSB provides a statistically validated framework for multimodal survival prediction with blockwise missingness. By enabling systematic biomarker evaluation without requiring complete data, MSB offers a practical tool for predictive modeling in biomedical research, pending external validation. Implementation is available at https://github.com/MohamedBoussena/MSB under Inria license.

2605.25038 2026-05-26 cs.CL cs.LG cs.SE 版本更新

TRACE: A taxonomy-grounded synthetic dataset for teaching-program generation and session interpretation in Applied Behavior Analysis

TRACE:一个基于分类学的合成数据集,用于应用行为分析中的教学程序生成和会话解释

Festus Kahunla

发表机构 * Drexel University(德雷塞尔大学) Pombo Labs(波莫实验室)

AI总结 提出TRACE数据集,通过分类学驱动的确定性生成器创建2999个合成示例,覆盖教学程序生成和多会话行为解释任务,以解决ABA领域真实数据受隐私保护无法公开的问题。

Comments 11 pages, 3 tables. Dataset: https://huggingface.co/datasets/PomboLabs/TRACE ; code: https://github.com/Pombo-Labs/TRACE

AI中文摘要

应用行为分析(ABA)是一门临床学科,其文档、教学程序和多次会话行为日志具有公式化和高容量的特点,但真实会话数据受HIPAA保护并受专业保密规则约束,阻碍了训练语料库的发布。我们提出了TRACE(分类学参考的ABA临床示例),一个包含2999个示例的合成指令调优数据集,涵盖两项ABA任务:跨离散试验训练、自然环境教学和任务分析的教学程序生成;以及跨十二种轨迹模式和十三种目标行为的多会话行为解释。每个示例均由一个基于经典ABA文献的确定性分类学驱动生成器产生,并且每个示例都带有完整的采样来源,即产生它的确切分类学单元。该数据集以CC BY-NC 4.0(数据)和MIT(代码)许可发布,包含分层训练集(2549)、验证集(149)、测试集(281)和完整性检查集(20)。TRACE是一个研究工件,尚未经过临床验证。

英文摘要

Applied Behavior Analysis (ABA) is a clinical discipline whose documentation (teaching programs and multi-session behavioral logs) is formulaic and high-volume, yet real session data is HIPAA-protected and bound by professional confidentiality rules, blocking the release of a training corpus. We present TRACE (Taxonomy-Referenced ABA Clinical Examples), a 2,999-example synthetic instruction-tuning dataset covering two ABA tasks: teaching-program generation across Discrete Trial Training, Natural Environment Teaching, and Task Analysis; and multi-session behavioral interpretation across twelve trajectory patterns and thirteen target behaviors. Every example is produced by a deterministic taxonomy-driven generator grounded in the canonical ABA literature, and every example carries complete sampling provenance: the exact taxonomy cells that produced it. The dataset is released under CC BY-NC 4.0 for data and MIT for code, with stratified train (2,549), validation (149), test (281), and sanity (20) splits. TRACE is a research artifact and has not been clinically validated.

2605.25030 2026-05-26 cs.LG 版本更新

MimirRAG: A Multi-Agent RAG Framework for Financial Data Retrieval with Metadata Integration

MimirRAG:一种集成元数据的金融数据检索多智能体RAG框架

Magnus Samuelsen, Wilmer Nyström, Somnath Mazumdar, Mansoor Hussain, Mikkel Strange

Affiliations: Copenhagen Business School

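AI summary: Proposes MimirRAG, a multi-agent RAG framework for financial data retrieval that combines metadata integration, table-aware chunking, and an agentic workflow, reaching 89.3% accuracy on FinanceBench and outperforming the baselines.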

AI中文摘要

检索增强生成(RAG)系统提供了一种有前景的方法来减少大语言模型(LLM)中的幻觉并提高答案准确性,这是可靠金融分析的必要条件,其中答案必须基于文件中的可验证证据,而非从模型先验生成。然而,设计能够从混合金融文档中提取有意义见解并集成到分析师工作流程中的RAG系统仍然具有挑战性。本文介绍了MimirRAG(元数据集成多智能体信息检索),这是一个迭代开发的多智能体RAG系统,旨在应对这些挑战。MimirRAG具有模块化流水线,包括PDF文件的保结构解析、表格感知分块、元数据提取、带有查询规划和混合搜索的基于智能体的检索、验证以及支持数值推理的上下文感知生成。我们的消融研究确定了有效金融RAG的三个关键技术推动因素:元数据集成、表格感知分块和智能体工作流。MimirRAG使用FinanceBench进行定量评估,并通过四位金融分析师的专家验证进行定性评估。该系统在FinanceBench上达到89.3%的准确率,优于原始基准基线。专家反馈强调,成功部署还需要校准信任、全面的数据集成和用户个性化。我们得出结论,将多智能体RAG架构与以人为中心的设计原则相结合,可以改善金融分析中有意义见解的提取。

英文摘要

Retrieval-augmented generation (RAG) systems offer a promising approach to reduce hallucinations and improve answer accuracy in large language models (LLMs), a requirement for reliable financial analysis, where answers must be grounded in verifiable evidence from filings rather than generated from model priors. However, designing RAG systems that extract meaningful insights from mixed financial documents and integrate into analyst workflows remains challenging. This paper introduces MimirRAG (Metadata-Integrated Multi-Agent Information Retrieval), a multi-agent RAG system developed iteratively to address these challenges. MimirRAG features a modular pipeline encompassing structure-preserving parsing of PDF filings, table-aware chunking, metadata extraction, agent-based retrieval with query planning and hybrid search, validation, and context-aware generation with numerical reasoning support. Our ablation study identifies three key technical enablers for effective financial RAG: metadata integration, table-aware chunking, and an agentic workflow. MimirRAG was evaluated quantitatively using FinanceBench and qualitatively through expert validation with four financial analysts. The system achieved 89.3% accuracy on FinanceBench, outperforming the original benchmark baselines. Expert feedback highlighted that successful deployment also requires calibrated trust, comprehensive data integration, and user personalization. We conclude that combining multi-agent RAG architecture with human-centric design principles can improve the extraction of meaningful insights in financial analysis.

2605.25011 2026-05-26 cs.LG 版本更新

A perspective on fluid mechanical environments for challenges in reinforcement learning

强化学习挑战中的流体力学环境视角

Shruti Mishra, Michael Chang, Vamsi Spandan, Shmuel M. Rubinstein

发表机构 * Sony AI(索尼人工智能) Cohere Labs(Cohere实验室) The Hebrew University of Jerusalem(耶路撒冷希伯来大学)

AI总结 本文提出将经典流体力学问题作为强化学习测试平台,通过非线性不稳定性环境中的状态、动作空间和奖励函数设计,促进智能体在高维动态环境中的高效交互。

详情
AI中文摘要

我们考虑开发能够高效与高维、演化环境交互的智能体所面临的挑战,旨在实现与开放世界交互的实际强化学习智能体,这些智能体仅能观察并影响世界的一小部分。我们认为,经典流体力学问题及其模拟为这类方法的开发提供了一个引人注目的测试平台。这些问题出现在非线性不稳定性中,其中小扰动可能增长并改变系统的动力学。非线性不稳定性代表了若干具有工业应用的开放科学挑战——液体射流的液滴破碎、两种流体界面的混合,以及海洋中异常高的怪浪的出现。在这些设置中,智能体可以利用跨变化动力学的保留表示来高效学习。 我们提出了两个智能体与流体力学环境交互的问题描述,并描述了这些智能体的状态空间、动作空间和奖励函数。对于这些示例,我们指定了环境的非平稳方面以及保留的不变性。我们注意到Dedalus和JAX-CFD是可用于开发强化学习方法的开源模拟器(Burns等人,2016;Kochkov等人,2021)。我们通过创建在Dedalus模拟的静态环境中学习导航的强化学习智能体,展示了Dedalus在环境生成中的使用。这为未来开发能够有意义地与代表自然和工业流动中科学挑战的模拟环境交互的强化学习智能体奠定了基础。

英文摘要

We consider the challenge of developing agents that efficiently interact with high-dimensional, evolving environments, towards a view of practical reinforcement learning (RL) agents interacting with open worlds, of which they witness and affect only a small part. We argue that canonical fluid mechanics problems, and their simulations, present a compelling testbed for the development of such methods. These problems arise in nonlinear instabilities, where small disturbances can grow to transform the dynamics of a system. Nonlinear instabilities represent several open scientific challenges with industrial applications -- the droplet breakup of a liquid jet, mixing at an interface between two fluids, and the appearance of unusually tall rogue waves in the ocean. In these settings, agents may leverage preserved representations across the changing dynamics to learn efficiently. We present two problem descriptions of agents interacting with a fluid mechanical environment, and describe the state and action spaces, and reward functions, for these agents. For these examples, we specify the aspects of the environment which are nonstationary and the preserved invariances. We note Dedalus and JAX-CFD as open-source simulators that can be used for the development of reinforcement learning methods (Burns et al., 2016; Kochkov et al., 2021). We demonstrate the use of Dedalus for environment generation by creating RL agents that learn to navigate in a stationary environment that is simulated using Dedalus. This sets the stage for future development of RL agents that learn to meaningfully interact with simulated environments that represent scientific challenges in natural and industrial flows.

2605.25004 2026-05-26 cs.LG cs.AI 版本更新

Metropolis-Scale Resilient and Trustworthy Traffic Flow Inference Using Multi-Source Data

基于多源数据的都市尺度弹性可信交通流推断

Qishen Zhou, Yifan Zhang, Michail A. Makridis, Anastasios Kouvelas, Yibing Wang, Simon Hu

发表机构 * School of Transportation, Jilin University(吉林大学交通运输学院) Department of Computer Science, City University of Hong Kong (Dongguan)(香港城市大学(东莞)计算机科学系) Institute for Transport Planning and Systems, ETH Zurich(苏黎世联邦理工学院交通规划与系统研究所) Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University(浙江大学智能交通系统研究所) ZJU-UIUC Institute, Zhejiang University(浙江大学-UIUC研究院)

AI总结 提出任务感知注意力神经过程(TA-ANP)统一概率框架,融合浮动车数据和稀疏固定检测器数据,实现高精度、可信的不确定性量化的全局交通状态推断,并在都市尺度数据集上取得最优性能。

Comments The paper has been submitted to Elsevier for possible publication

详情
AI中文摘要

从稀疏观测中以高精度和可信的不确定性量化推断网络级交通状态对于智能交通系统至关重要,但由于问题的欠定性、传感网络的多方面干扰以及多个推断子任务在联合建模时的固有冲突,这仍然具有挑战性。我们提出了任务感知注意力神经过程(TA-ANP),这是一个统一的概率框架,通过融合浮动车数据(FCD)和稀疏的固定检测器测量,实现弹性且可信的全局交通状态推断(GTSI)。通过将GTSI视为一个随机过程,TA-ANP利用神经过程的元学习特性,无需重新训练即可快速适应传感配置的变化。引入了一个具有不同时空归纳偏置的任务感知多查询注意力模块,以联合处理三个GTSI子任务,同时减轻跨任务干扰。对于不确定性量化,我们将神经过程与蒙特卡洛丢弃法相结合,以捕获偶然不确定性和认知不确定性。为了支持都市尺度评估,我们构建了都市多源交通数据集(MMTD),该数据集整合了固定环路传感器测量、FCD统计数据和OpenStreetMap道路网络数据,覆盖了包含2371个路段的城市网络。在MMTD上的实验表明,TA-ANP在确定性和概率性指标下的所有子任务中均达到了最先进的性能。由此产生的良好校准的不确定性使得能够以更少的传感器部署实现更高效的固定传感器布局。在“损坏-修复-新增”传感生命周期下,TA-ANP在干扰吸收、性能恢复和对未见传感配置的适应性方面表现出卓越的弹性。

英文摘要

Inferring network-wide traffic states from sparse observations with high accuracy and trustworthy uncertainty quantification is essential for intelligent transportation systems, yet it remains challenging due to the underdetermined nature of the problem, multifaceted disturbances in sensing networks, and the inherent conflicts among multiple inference sub-tasks when modeled jointly. We propose the Task-Aware Attentive Neural Process (TA-ANP), a unified probabilistic framework for resilient and trustworthy global traffic state inference (GTSI) by fusing floating car data (FCD) with sparse fixed-detector measurements. By casting GTSI as a stochastic process, TA-ANP leverages the meta-learning properties of neural processes to adapt rapidly to changes in sensing configurations without retraining. A task-aware multi-query attention module with distinct spatiotemporal inductive biases is introduced to jointly handle three GTSI sub-tasks, while mitigating cross-task interference. For uncertainty quantification, we combine neural processes with Monte Carlo Dropout to capture both aleatoric and epistemic uncertainty. To support metropolis-scale evaluation, we construct the Metropolitan Multi-Source Traffic Dataset (MMTD), integrating fixed-loop sensor measurements, FCD statistics, and OpenStreetMap road-network data over an urban network of 2,371 road segments. Experiments on MMTD show that TA-ANP achieves state-of-the-art performance across all sub-tasks under deterministic and probabilistic metrics. The resulting well-calibrated uncertainties enable more efficient fixed-sensor placement with fewer sensor deployments. Under a Damage-Repair-Addition sensing lifecycle, TA-ANP demonstrates superior resilience in terms of disturbance absorption, performance recovery, and adaptability to unseen sensing configurations.
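The abstract above combines neural processes with Monte Carlo Dropout to capture aleatoric and epistemic uncertainty. A minimal sketch of the commonly used variance decomposition follows; it is a generic illustration and not necessarily the exact estimator used in TA-ANP. The function name `decompose_uncertainty` and the sample numbers are hypothetical.

```python
def decompose_uncertainty(means, variances):
    """Split predictive variance from T stochastic forward passes.

    means[t] and variances[t] are the predicted mean and predicted noise
    variance from pass t (each pass uses a different dropout mask).
    """
    t = len(means)
    pred_mean = sum(means) / t
    # Aleatoric part: the average noise variance the model itself predicts.
    aleatoric = sum(variances) / t
    # Epistemic part: how much the predicted means disagree across passes.
    epistemic = sum((m - pred_mean) ** 2 for m in means) / t
    return pred_mean, aleatoric, epistemic


# Hypothetical outputs of four dropout passes for one road segment (km/h).
mean, aleatoric, epistemic = decompose_uncertainty(
    means=[48.0, 52.0, 50.0, 50.0], variances=[4.0, 6.0, 5.0, 5.0]
)
total_variance = aleatoric + epistemic
```

Under this decomposition, adding sensors near a segment would be expected to shrink mainly the epistemic term, which is one reason calibrated uncertainties can guide sensor placement.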

2605.25001 2026-05-26 cs.LG 版本更新

Mitigating Gradient Pathology in PINNs through Aligned Constraint

通过对齐约束缓解PINN中的梯度病理

Yichen Luo, Peiyu Zhu, Dongxiao Hu, Jia Wang, Tailin Wu, Dapeng Lan, Yu Liu, Zhibo Pang

发表机构 * Department of Information Science and Engineering, KTH Royal Institute of Technology, Stockholm, Sweden(信息科学与工程系,皇家理工学院,斯德哥尔摩,瑞典) School of Advanced Manufacturing and Robotics, Peking University, Beijing, China(先进制造与机器人学院,北京大学,北京,中国) School of Advanced Technology, Xi’an Jiaotong-Liverpool University, Suzhou, China(先进技术学院,西交利物浦大学,苏州,中国) Department of AI, School of Engineering, Westlake University, Hangzhou, China(人工智能系,工程学院,西湖大学,杭州,中国)

AI总结 针对物理信息神经网络训练中梯度冲突导致的局部最优问题,提出约束对齐损失与流形提升方法,通过重新表述零阶项为对齐约束并引入延迟因子,显著提升数值稳定性和效率。

详情
Journal ref
Forty-Third International Conference on Machine Learning (ICML 2026)
AI中文摘要

虽然物理信息神经网络(PINN)在求解偏微分方程(PDE)方面功能强大,但其训练常常因梯度病理而瘫痪。来自PDE残差和边界约束的梯度相互对抗,使模型陷入局部最小值。当前的解决方案,如自适应加权或硬约束,要么无法从根本上解决这种病态条件,要么仅限于简单几何形状。在本研究中,我们从损失景观和优化动态的角度系统分析了这种梯度病理的可能原因。基于所得结论,我们提出了带有流形提升的约束对齐损失(CAML)。通过将所有零阶项重新表述为对齐约束,我们的方法有效缓解了梯度冲突。此外,我们引入了一个延迟因子来帮助优化器跳过高曲率区域。实验表明,我们的CAML在高度复杂的PINN问题中显著增强了数值稳定性和效率。我们的代码已在https://github.com/YichenLuo-0/CAML上开源。

英文摘要

While Physics-Informed Neural Networks (PINNs) are powerful for solving Partial Differential Equations (PDEs), their training is often paralyzed by gradient pathology. The gradients from the PDE residuals and boundary constraints oppose each other, trapping the model in local minima. Current solutions, such as adaptive weighting or hard constraints, either fail to fundamentally resolve this ill-conditioning or are limited to simple geometries. In this study, we systematically analyze the possible causes of this gradient pathology from the perspectives of loss landscapes and optimization dynamics. Based on the obtained conclusion, we propose Constraint-Aligned loss with Manifold Lifting (CAML). By reformulating all zeroth-order terms into aligned constraints, our method effectively mitigates gradient conflicts. In addition, we introduce a delay factor to help the optimizer skip the high-curvature area. Experiments demonstrate that our CAML significantly enhances numerical stability and efficiency in highly complex PINN problems. Our code is open-sourced on https://github.com/YichenLuo-0/CAML.
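The gradient conflict described above, where PDE-residual and boundary gradients oppose each other, is often diagnosed by the cosine similarity between the two gradient vectors. The sketch below illustrates only that diagnostic, assuming the gradients are available as flat lists; it is not the CAML method, and the names `cosine_similarity` and `gradients_conflict` are hypothetical.

```python
import math


def cosine_similarity(g1, g2):
    """Cosine of the angle between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(g1, g2))
    norm = math.sqrt(sum(a * a for a in g1)) * math.sqrt(sum(b * b for b in g2))
    # Treat a zero gradient as neither aligned nor conflicting.
    return dot / norm if norm > 0 else 0.0


def gradients_conflict(g_pde, g_bc):
    """True when a step that lowers one loss raises the other (to first order)."""
    return cosine_similarity(g_pde, g_bc) < 0.0


# Hypothetical gradients of the PDE-residual loss and the boundary loss.
print(gradients_conflict([3.0, 4.0], [-3.0, -4.0]))  # directly opposed
print(gradients_conflict([1.0, 0.0], [0.5, 1.0]))    # partially aligned
```

A persistently negative cosine during training would be consistent with the pathology the abstract describes, although the paper's own analysis is framed in terms of loss landscapes and optimization dynamics.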

2605.24992 2026-05-26 cs.NI cs.AI cs.LG cs.MA 版本更新

Scaling up Energy-Aware Multi-Agent Reinforcement Learning for Mission-Oriented Drone Networks with Individual Reward

面向任务驱动无人机网络的能量感知多智能体强化学习扩展与个体奖励

Changling Li, Ying Li

发表机构 * Department of Computer Science, ETH Zurich(苏黎世联邦理工学院计算机科学系) Department of Computer Science, Colby College(科尔比学院计算机科学系)

AI总结 提出基于个体奖励函数的能量感知多智能体强化学习模型,利用深度Q网络解决无人机网络动态环境和电池容量限制下的轨迹规划问题,实验表明在任务密度高时成功率接近100%,且扩展性优于共享奖励模型。

Comments IEEE Internet of Things Journal

详情
Journal ref
volume=12, number=8, year=2025, pages=10640-10654
AI中文摘要

多智能体强化学习(MARL)因其通过交互学习的能力,在自动驾驶和智慧城市等协作系统中显示出广泛适用性。随着无人机网络的最新发展,研究人员也应用MARL来解决轨迹规划问题。然而,动态环境和有限的电池容量仍然是使用MARL实现高效协作任务执行的挑战。在本文中,我们提出了一种能量感知的MARL模型作为应对这些挑战的尝试,利用深度Q网络(DQN)和由任务执行进度及无人机剩余电量驱动的个体奖励函数。我们对所提出的模型进行了一系列仿真研究,并将其与共享奖励MARL进行比较,以探索MARL中信用分配的影响。结果表明,无论任务位置和长度如何,我们提出的模型都能达到至少80%的成功率。与共享奖励模式类似,个体奖励模式在任务密度高时可以获得更好的成功率,并且当任务密度接近40%时,几乎可以达到100%的成功率。我们提出的个体奖励模型的真正优势在环境扩展时得以显现。与共享奖励MARL的比较表明,我们提出的模型对环境大小和智能体数量的变化更加鲁棒。由于目标的清晰性,它可以用更少的步骤实现更高的成功率,从而更好地提高能源效率。

英文摘要

Multi-agent reinforcement learning (MARL) has shown wide applicability in collaborative systems such as autonomous driving and smart cities for its ability to learn through interaction. With the recent development of drone networks, researchers have also applied MARL to address trajectory planning problems. However, the dynamic environment and the limited battery capacity still make it challenging to use MARL for efficient collaborative task execution. In this paper, we propose an energy-aware MARL model as an attempt to tackle these challenges, leveraging Deep Q-Networks (DQN) with individual reward functions driven by the task execution progress and the remaining battery of drones. We conduct a set of simulation studies for the proposed model and compare it with the shared reward MARL [Li2022MARL] to explore the impact of credit assignment in MARL. The results indicate that our proposed model can achieve at least an 80% success rate regardless of the task locations and lengths. Similar to the shared reward mode, the individual reward mode can achieve a better success rate when the task density is high, and it can reach nearly a 100% success rate when task density gets close to 40%. The true advantage of our proposed model with individual reward is revealed when scaling up the environment. The comparison with the shared reward MARL shows that our proposed model is more robust to changes in the environment size and agent numbers. It can achieve a higher success rate with fewer steps due to the clarity of the goal, which further improves energy efficiency.

2605.24989 2026-05-26 cs.LG cs.AI cs.IR 版本更新

Selective Test-Time Compute Scaling for Click-Through Rate Prediction via Uncertainty-Triggered Feature Path Exploration

基于不确定性触发的特征路径探索的点击率预测选择性测试时计算扩展

Moyu Zhang, Yun Chen, Yujun Jin, Jinxin Hu, Yu Zhang, Xiaoyi Zeng

发表机构 * Alibaba Group(阿里巴巴集团)

AI总结 针对点击率预测中训练数据稀疏导致的不确定性,提出无需训练、模型无关的UTTSI框架,通过双信号估计器区分认知不确定性和偶然模糊性,对不确定实例进行自适应特征过滤和随机特征路径探索,在保持最坏延迟不变的情况下实现平均约2.8倍基础模型开销,实验和在线A/B测试均取得显著提升。

Comments 12 pages, 4 Figures, 3 Tables

详情
AI中文摘要

扩展测试时计算对语言模型已被证明非常有效,然而这一机会在工业点击率(CTR)预测中仍未得到充分探索。CTR模型存在一个根本的不对称性:训练中充分表示的特征组合产生自信的预测,而稀疏观察到的特征组合则产生不可靠的输出。现有的训练阶段解决方案(如自适应门控)学习一个固定的选择函数,但受限于相同的稀疏性,在部署时无法提供针对每个实例的补救措施。我们提出UTTSI(不确定性触发的测试时选择性推理),一个无需训练、模型无关的框架,将推理深度按比例扩展到每个实例的不确定性。一个结合模型logit置信度和数据级频率先验的双信号估计器区分认知不确定性和偶然模糊性。每个实例都经过自适应特征过滤以去除不可靠的嵌入;不确定的实例额外接受随机特征路径探索,其预测通过一致性加权集成进行聚合。自信的实例完全绕过探索,保持平均开销约为基础模型成本的2.8倍,最坏情况延迟不变。在四个数据集和三种骨干架构上的实验表明,与所有训练阶段基线相比,取得了持续且统计显著的增益。为期七天的在线A/B测试进一步证实了5.3%的相对CTR提升(p < 0.01),确立了选择性测试时计算分配作为CTR预测训练阶段进展的实用补充。

英文摘要

Scaling test-time compute has proven highly effective for language models, yet this opportunity remains largely unexplored for industrial Click-Through Rate (CTR) prediction. CTR models suffer from a fundamental asymmetry: feature combinations well-represented in training yield confident predictions, while sparsely observed ones produce unreliable outputs. Existing training-phase solutions such as adaptive gating learn a fixed selection function subject to the same sparsity, offering no per-instance recourse at deployment. We propose UTTSI (Uncertainty-Triggered Test-Time Selective Inference), a training-free model-agnostic framework that scales inference depth proportionally to per-instance uncertainty. A dual-signal estimator combining model logit confidence with a data-level frequency prior distinguishes epistemic uncertainty from aleatoric ambiguity. Every instance undergoes adaptive feature filtering to remove unreliable embeddings; uncertain instances additionally receive stochastic feature-path explorations whose predictions are aggregated via consistency-weighted ensembling. Confident instances bypass exploration entirely, keeping average overhead at approximately 2.8× base model cost with worst-case latency unchanged. Experiments on four datasets with three backbone architectures demonstrate consistent, statistically significant gains over all training-phase baselines. A seven-day online A/B test further confirms a 5.3% relative CTR gain (p < 0.01), establishing selective test-time compute allocation as a practical complement to training-phase advances for CTR prediction.

2605.24986 2026-05-26 cs.IR cs.LG 版本更新

Self-Balancing Gradient Allocation for Heterogeneity-Aware Feature Generation in Click-Through Rate Prediction

点击率预测中面向异构感知特征生成的自平衡梯度分配

Moyu Zhang, Yun Chen, Yujun Jin, Jinxin Hu, Yu Zhang, Xiaoyi Zeng

发表机构 * Alibaba Group(阿里巴巴集团)

AI总结 针对生成式CTR方法中重建目标忽略特征场异构性导致难场欠拟合的问题,提出HeteGenCTR,通过可学习的场难度参数联合训练去噪网络,实现自平衡损失和难度引导注意力机制,在五个基准和在线A/B测试中取得显著提升。

Comments 12 pages, 5 figures, 4 tables

详情
AI中文摘要

通过离散扩散的生成式预训练在所有特征场上同时提供密集的重建监督,缓解了CTR预测中数据稀疏导致的表示崩溃。然而,所有现有的生成式CTR方法都有一个根本限制:重建目标对每个特征场赋予相同的训练权重,忽略了高基数ID字段、稀疏分类属性、数值和行为序列之间重建难度的深刻异质性。这导致容易的场主导训练梯度,而最难但信息最丰富的场长期欠拟合,我们将这个问题称为生成难度不平衡。我们提出HeteGenCTR,通过每个场可学习的难度参数与去噪网络联合训练来解决这种不平衡。这个统一信号驱动两个协调组件,无需额外超参数:一个自平衡损失,自动将梯度预算重新分配给更难的场,具有可证明的稳定均衡;以及一个难度引导的注意力机制,抑制已经收敛的容易场的影响,同时放大向难场的跨场信息流。两个组件共享相同的学习信号,并在整个训练过程中保持相互一致。在五个CTR基准和一个为期七天的在线A/B测试中,实验表明相对于最先进的基线具有一致且统计显著的改进,对冷启动和长尾用户有不成比例的增益。

英文摘要

Generative pre-training via discrete diffusion provides dense reconstruction supervision across all feature fields simultaneously, mitigating representation collapse from data sparsity in CTR prediction. However, all existing generative CTR methods share a fundamental limitation: the reconstruction objective assigns equal training weight to every feature field, ignoring the profound heterogeneity of reconstruction difficulty across high-cardinality ID fields, sparse categorical attributes, numerical values, and behavioral sequences. This causes easy fields to dominate training gradients while the hardest but most informative fields remain chronically underfit, a problem we term the generative difficulty imbalance. We propose HeteGenCTR, which resolves this imbalance through per-field learnable difficulty parameters jointly trained with the denoising network. This unified signal drives two coordinated components without additional hyperparameters: a self-balancing loss that automatically reallocates gradient budget toward harder fields with a provably stable equilibrium, and a difficulty-guided attention mechanism that suppresses the influence of already-converged easy fields while amplifying cross-field information flow toward hard fields. Both components share the same learned signal and remain mutually consistent throughout training. Experiments on five CTR benchmarks and a seven-day online A/B test demonstrate consistent, statistically significant improvements over state-of-the-art baselines, with disproportionate gains for cold-start and long-tail users.

2605.24985 2026-05-26 cs.RO cs.LG physics.comp-ph 版本更新

Learning, locomotion, and navigation of soft synthetic snakes in three-dimensional, heterogeneous environments

软体合成蛇在三维异质环境中的学习、运动与导航

Xiaotian Zhang, Ali Albazroun, Tixian Wang, Songyuan Cui, Prashant G. Mehta, Mattia Gazzola

发表机构 * Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana–Champaign(卡尔·R·沃塞基因组生物学研究所,伊利诺伊大学厄巴纳-香槟分校) Department of Mechanical and Aerospace Engineering, Hong Kong University of Science and Technology(香港科技大学机械与航空航天工程系) Department of Mechanical Science and Engineering, University of Illinois Urbana–Champaign(伊利诺伊大学厄巴纳-香槟分校机械科学与工程系)

AI总结 提出基于仿生驱动和感知模型的强化学习框架,使软体合成蛇能够自主导航非结构化三维地形,并通过高保真环境验证鲁棒性。

Comments 14 pages, 5 figures

详情
AI中文摘要

无肢陆地动物表现出卓越的运动多样性和控制能力,目前尚无法被工程对应物所超越。在这里,我们引入了一个计算框架,使软体合成蛇能够导航非结构化的、异质的三维地形。我们的方法基于仿生驱动和感知模型,这些模型降低了高自由度连续体固有的控制复杂性。这些模型被集成到强化学习架构中,以推导出穿越环境的策略。训练首先在简化的同质地形中进行,以学习运动基元。然后,这些基元被组合成针对复杂地形的自适应策略。我们通过将蛇部署在从真实世界成像重建的高保真三维环境中来展示鲁棒性,实现了可靠的导航。总体而言,这项工作为自然地形中连续系统的控制提供了一个物理真实的仿真平台和实用见解。

英文摘要

Limbless terrestrial animals exhibit exceptional locomotor versatility and control, currently unmatched by engineered counterparts. Here, we introduce a computational framework that enables soft synthetic snakes to navigate unstructured, heterogeneous 3D terrains. Our approach is grounded in bio-inspired actuation and sensing models that reduce the control complexity inherent to high-degree-of-freedom, continuum bodies. These models are integrated into a reinforcement learning architecture to derive environment-traversing policies. Training first occurs in simplified, homogeneous terrains to learn locomotion primitives. These are then composed into adaptive strategies for complex landscapes. We demonstrate robustness by deploying a snake in high-fidelity 3D environments reconstructed from real-world imaging, achieving reliable navigation. Overall, this work provides a physically-realistic simulation platform and practical insights for the control of continuum systems in natural terrains.

2605.24983 2026-05-26 cs.LG 版本更新

Benchmarking non-conformity score functions in conformal prediction

共形预测中非一致性评分函数的基准测试

Sol Erika Boman

发表机构 * Department of Medical Epidemiology and Biostatistics, Karolinska Institutet(卡罗林斯卡研究所医学流行病学与生物统计学系) Department of Molecular Medicine and Surgery, Karolinska Institutet(卡罗林斯卡研究所分子医学与外科学系)

AI总结 本文综述了共形预测中非一致性评分函数的性质,提出原始修改和评估方法,并通过实验比较了不同函数在平衡和不平衡类别设置下的性能。

Comments 3 tables, 1 supplementary table, 1 supplementary figure

详情
AI中文摘要

共形预测是机器学习分类中模型校准的一种有用且多功能的替代方法。它将单类预测替换为预测集,保证预测集包含真实类别的先验概率大于或等于预指定的比率。预测集的大小和有用性在很大程度上取决于非一致性评分函数的选择。科学文献中包含许多非一致性评分函数的例子,但缺乏研究其性质和有效性的工作。在本文中,我们概述了非一致性评分函数的性质。我们给出了现有文献中的非一致性评分函数示例,并引入了原始修改。我们提出了一种评估共形预测器预测集大小的原始方法,并用它来比较非一致性评分函数。我们还研究了不同非一致性评分函数在类别不平衡设置下用于类别条件共形预测的有效性。

英文摘要

Conformal prediction is a useful and versatile alternative to model calibration in machine learning classification. It replaces single-class prediction with prediction sets, guaranteeing that the a priori probability of the prediction sets containing the true class is larger than or equal to a pre-specified rate. The size and usefulness of the prediction sets rely heavily on the choice of the non-conformity score function. The scientific literature contains many examples of non-conformity score functions, but there is an absence of studies examining their properties and effectiveness. In this paper, we give an overview of properties of non-conformity score functions. We give examples of non-conformity score functions in the existing literature and introduce original modifications. We introduce an original method of evaluating the prediction set sizes of conformal predictors and use it to provide a comparison between non-conformity score functions. We also examine the efficacy of different non-conformity score functions for class-conditional conformal prediction in a setting with imbalanced classes.
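To make the role of the non-conformity score concrete, here is a minimal sketch of split conformal prediction for classification. It assumes the simple score 1 - p(true class), which is one common choice and not necessarily one of the functions benchmarked in the paper; the calibration data and function names are hypothetical.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: include every class.
        return float("inf")
    return sorted(scores)[k - 1]


def prediction_set(probs, threshold):
    """Keep every class whose non-conformity score 1 - p is within the threshold."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= threshold]


# Hypothetical calibration set: predicted class probabilities and true labels.
cal_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
]
cal_labels = [0, 1, 2, 0, 1]

# Score each calibration example by how little mass the true class received.
cal_scores = [1.0 - p[y] for p, y in zip(cal_probs, cal_labels)]
q_hat = conformal_quantile(cal_scores, alpha=0.2)

# Prediction set for a new example, targeting at least 80% coverage.
new_set = prediction_set([0.5, 0.45, 0.05], q_hat)
```

Swapping in a different score function changes only how `cal_scores` and the membership test are computed, which is why the choice of score drives the size of the resulting sets while the coverage guarantee stays the same.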

2605.24981 2026-05-26 cs.CL cs.LG 版本更新

Large Language Model Selection with Limited Annotations

有限标注下的大语言模型选择

Yavuz Durmazkeser, Patrik Okanovic, Andreas Kirsch, Torsten Hoefler, Nezihe Merve Gürel

发表机构 * TU Delft(代尔夫特理工大学) ETH Zurich(苏黎世联邦理工学院)

AI总结 提出SELECT-LLM框架,通过基于期望信息增益的查询选择规则,在有限标注下高效识别最佳大语言模型,显著降低标注成本。

Comments 33 pages, 5 figures, 4 tables

详情
AI中文摘要

为给定任务选择大语言模型(LLM)需要比较许多强候选模型,然而标准评估依赖于固定评估集上的昂贵标注。为解决这一挑战,我们开发了SELECT-LLM,这是第一个用于主动模型选择LLM的框架。SELECT-LLM旨在找到一组查询,其标注对于识别给定任务的最佳LLM最具信息量。为此,我们引入了一种基于期望信息增益的查询选择规则,该规则通过候选模型输出之间的成对相似性计算。由于该规则仅使用生成的模型响应,SELECT-LLM可以在不假设候选模型架构或访问模型权重的情况下应用。这使得它适用于开源权重和黑盒LLM。我们在23个数据集、156个评估模型、多样化的任务族和多个文本评估指标上评估了SELECT-LLM。在所有实验中,SELECT-LLM在每个设置中都优于最强基线,最佳模型选择的标注成本降低高达81.8%,近最佳模型选择的标注成本降低高达84.78%。

英文摘要

Choosing a Large Language Model (LLM) for a given task requires comparing many strong candidates, yet standard evaluation relies on costly annotations over fixed evaluation sets. To address this challenge, we develop SELECT-LLM, the first framework for active model selection of LLMs. SELECT-LLM aims to find a small set of queries whose annotations are most informative for identifying the best LLM for a given task. To this end, we introduce a query selection rule based on expected information gain, computed from pairwise similarities between candidate model outputs. Because this rule only uses generated model responses, SELECT-LLM can be applied across candidate models without assumptions about their architecture or access to model weights. This makes it suitable for both open-weight and black-box LLMs. We evaluate SELECT-LLM across 23 datasets, 156 evaluated models, diverse task families, and multiple text evaluation metrics. Across all experiments, SELECT-LLM improves over the strongest baseline in every setting, with annotation cost reductions up to 81.8% for best model selection and up to 84.78% for near-best model selection.

2605.24975 2026-05-26 cs.RO cs.AI cs.LG 版本更新

Bridging the Gap: Enabling Soft Actor Critic for High Performance Legged Locomotion

弥合差距:实现软演员-评论家算法用于高性能腿部运动

Gianluca Sabatini, Chenhao Li, Marco Hutter

发表机构 * ETH Zurich(苏黎世联邦理工学院)

AI总结 本文通过识别软演员-评论家(SAC)在并行训练中性能不足的根本原因,并提出策略初始化、超时感知评论家目标和多步回报估计等改进,使其在腿部运动任务中达到与近端策略优化(PPO)相当的性能。

详情
AI中文摘要

近端策略优化(PPO)由于其在IsaacLab等大规模并行仿真环境中的鲁棒性和可扩展性,已成为训练腿部机器人的事实标准。然而,其基于策略的性质使其天生样本效率低下,阻碍了其在真实硬件上的持续适应和微调。相比之下,软演员-评论家(SAC)是一种可以重用过去经验的离策略算法,使其成为模拟到现实迁移工作流程的自然候选,其中同一算法既可用于仿真,也可用于真实机器人的在线学习。尽管有这些优势,SAC在大规模并行训练设置中始终未能匹配PPO的经验性能。本工作确定了这一差距的根本原因,并引入了针对性的修改,包括策略初始化、超时感知评论家目标和多步回报估计,使SAC能够稳定地大规模训练。在多个腿部机器人平台和多样化的运动任务上评估,我们的方法完全弥合了与PPO的性能差距。

英文摘要

Proximal Policy Optimization (PPO) has become the de facto standard for training legged robots, thanks to its robustness and scalability in massively parallel simulation environments like IsaacLab. However, its on-policy nature makes it inherently sample-inefficient, preventing its use for continuous adaptation and fine-tuning on real hardware. Soft Actor-Critic (SAC), by contrast, is an off-policy algorithm that can reuse past experience, making it a natural candidate for sim-to-real transfer workflows where the same algorithm can be used both in simulation and for online learning on the real robot. Despite these advantages, SAC has consistently failed to match PPO's empirical performance in massively parallel training settings. This work identifies the root causes of this gap and introduces targeted modifications, covering policy initialization, timeout-aware critic targets, and multi-step return estimation, that enable SAC to train stably at scale. Evaluated across multiple legged robot platforms and diverse locomotion tasks, our approach closes the performance gap with PPO entirely.
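Two of the modifications named above, timeout-aware critic targets and multi-step return estimation, can be illustrated with a generic n-step target. This is a sketch of the standard treatment of time-limit truncation, not the paper's exact formulation: it omits SAC's entropy term, and the function name and argument layout are assumptions.

```python
def n_step_target(rewards, terminated, truncated, next_values, gamma):
    """n-step bootstrapped target for the first transition in a window.

    rewards[k], terminated[k], truncated[k] describe step k of the window;
    next_values[k] is a critic estimate for the state reached after step k.
    """
    target, discount = 0.0, 1.0
    for k, reward in enumerate(rewards):
        target += discount * reward
        discount *= gamma
        if terminated[k]:
            # True terminal state (e.g. the robot fell): no future value.
            return target
        if truncated[k] or k == len(rewards) - 1:
            # Time limit or end of window: the episode could have continued,
            # so bootstrap from the critic instead of treating it as terminal.
            return target + discount * next_values[k]
    return target


# Hypothetical three-step window with gamma = 0.5.
full = n_step_target([1.0, 1.0, 1.0], [False] * 3, [False] * 3, [0.0, 0.0, 8.0], 0.5)
fell = n_step_target([1.0, 1.0, 1.0], [False, True, False], [False] * 3, [0.0, 0.0, 8.0], 0.5)
timeout = n_step_target([1.0, 1.0, 1.0], [False] * 3, [False, True, False], [0.0, 4.0, 8.0], 0.5)
```

The distinction matters in massively parallel simulation because episodes are frequently cut off by time limits; treating those cut-offs as true terminations would bias the critic toward underestimating value.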

2605.24971 2026-05-26 cs.LG cs.AI 版本更新

TGFormer: Towards Temporal Graph Transformer with Auto-Correlation Mechanism

TGFormer:基于自相关机制的时间图Transformer

Hongjiang Chen, Pengfei Jiao, Ming Du, Xuan Guo, Zhidong Zhao, Di Jin, Xiao Liu

发表机构 * Hangzhou Dianzi University, School of Cyberspace(杭州电子科技大学网络空间安全学院) Tianjin University, College of Intelligence and Computing(天津大学智能与计算学院) State Key Laboratory of Systems Medicine for Cancer, Shanghai Cancer Institute(癌症系统医学国家重点实验室,上海癌症研究院)

AI总结 针对时间图神经网络在捕获长期依赖和周期模式上的不足,提出TGFormer,通过轨迹框架和自相关机制实现子交互级别的依赖发现与表示聚合,在六个基准上最高提升9.35%精度。

详情
Journal ref
Pattern Recognition 170 (2026): 112053
AI中文摘要

对时间图神经网络(TGNN)日益增长的兴趣源于它们能够建模复杂动态并提供卓越性能。然而,TGNN在捕获长期依赖和识别周期模式方面面临根本性挑战。为解决这些限制,我们提出了TGFormer,一种专为时间图设计的新型Transformer架构。我们的模型通过建立与时间序列分析原理一致的轨迹框架,重新定义了时间图学习。这种方法使TGFormer能够通过对历史交互的系统分析来推导节点表示,从而实现对跨连续时间戳的节点关系的精细检查。基于随机过程理论,我们开发了一种自相关机制,系统性地揭示节点交互中的周期依赖。这一创新使TGFormer能够在子交互级别进行依赖发现和表示聚合,相比传统注意力机制展现出更高的效率和准确性。在六个公开基准上的实验验证了我们的方法的有效性,与最先进方法相比,TGFormer最高实现了9.35%的精度提升。

英文摘要

The growing interest in Temporal Graph Neural Networks (TGNNs) stems from their ability to model complex dynamics and deliver superior performance. However, TGNNs encounter fundamental challenges in capturing long-term dependencies and identifying periodic patterns. To address these limitations, we propose TGFormer, a novel Transformer architecture specifically designed for temporal graphs. Our model redefines temporal graph learning by establishing a trajectory framework that aligns with time series analysis principles. This approach allows TGFormer to derive node representations through systematic analysis of historical interactions, enabling granular examination of node relationships across sequential timestamps. Building upon stochastic process theory, we develop an auto-correlation mechanism that systematically uncovers periodic dependencies in node interactions. This innovation empowers TGFormer to perform dependency discovery and representation aggregation at sub-interaction levels, demonstrating superior efficiency and accuracy compared to conventional attention mechanisms. Experimental validation across six public benchmarks confirms the effectiveness of our approach, with TGFormer achieving up to a 9.35% precision improvement compared to state-of-the-art approaches.
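The idea of using auto-correlation to uncover periodic dependencies can be illustrated on a plain scalar series. The sketch below computes the normalized autocorrelation directly in the time domain and picks the lag with the highest value; it is a generic illustration, not the TGFormer mechanism itself, and the interaction series is hypothetical.

```python
def autocorrelation(series, lag):
    """Normalized autocorrelation of a series at a given positive lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    variance = sum(c * c for c in centered)
    if variance == 0:
        return 0.0  # a constant series has no periodic structure to detect
    return sum(centered[t] * centered[t + lag] for t in range(n - lag)) / variance


def dominant_period(series, max_lag):
    """Lag in 1..max_lag with the highest autocorrelation."""
    return max(range(1, max_lag + 1), key=lambda lag: autocorrelation(series, lag))


# Hypothetical indicator of whether a node interacted at each timestamp.
interactions = [1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
period = dominant_period(interactions, max_lag=6)
```

For long sequences the same quantity is usually computed with a fast Fourier transform rather than the direct sum shown here, which is where the efficiency advantage over pairwise attention typically comes from.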

2605.24969 2026-05-26 cs.LG cs.AI 版本更新

OSDTW: Optimal Shared Depth and Task Weighting for Long-Tailed Recognition

OSDTW:长尾识别的最优共享深度与任务加权

Chang Chu, Qingyue Zhang, Shao-Lun Huang, Junxiong Zheng

发表机构 * Shenzhen International Graduate School, Tsinghua University, Shenzhen, China(清华大学深圳国际研究生院,中国深圳) Shenzhen Zkosemi Semiconductor Technology Co., Ltd(深圳卓芯半导体科技有限公司)

AI总结 提出OSDTW框架,通过分解任务、共享编码器与任务特定解码器,并基于Fisher信息矩阵推导泛化误差的偏置-方差分解,以优化共享深度和任务权重,解决长尾识别中头部-尾部性能权衡问题。

Comments ICIC 2026 Oral

详情
AI中文摘要

长尾识别面临持续的头部-尾部权衡:提升尾部性能通常会降低头部准确率,并可能增加训练不稳定性。尽管重加权、解耦训练和多专家方法取得了强有力的实证结果,但关于头部和尾部类别之间表示共享以及跨类别组监督加权的关键设计选择仍主要基于启发式。在这项工作中,我们提出了OSDTW,一个原则性的任务分解框架,将原始的单标签识别问题划分为头部任务和尾部任务,通过共享编码器和任务特定解码器实现。为了处理两个标签组之间的互斥性和统计依赖性,我们引入了一个因子化模型,并表明由此产生的基于KL散度的泛化误差可以写为任务项之和(加一个常数),从而得到一个定义良好的任务级目标。我们进一步开发了一个三阶段训练流程:独立任务训练以估计任务级最优值和Fisher信息矩阵,加权联合训练以学习共享编码器,以及分支组装以构建最终的解耦模型。在块对角Fisher近似下,我们推导了期望泛化误差的可计算二阶展开,将其分解为编码器方差、编码器偏置和解码器方差。这种偏置-方差分解提供了一个可计算的代理来选择共享深度和任务权重,从而实现高效的超参数搜索。在标准长尾基准上的实验证明了所提出方法相对于强基线的有效性。

英文摘要

Long-tailed recognition suffers from a persistent head--tail trade-off: improving tail performance often degrades head accuracy and can increase training instability. Despite strong empirical results from re-weighting, decoupled training, and multi-expert methods, key design choices about representation sharing between head and tail classes and supervision weighting across class groups remain largely heuristic. In this work, we propose OSDTW, a principled task-decomposition framework that partitions the original single-label recognition problem into a head task and a tail task, implemented with a shared encoder and task-specific decoders. To handle the mutual exclusivity and statistical dependence between the two label groups, we introduce a factorized model and show that the resulting Kullback--Leibler divergence-based generalization error can be written as the sum of task-wise terms up to an additive constant, yielding a well-defined task-wise objective. We further develop a three-stage training pipeline: independent task training to estimate task-wise optima and the Fisher information matrix, weighted joint training to learn a shared encoder, and branch assembly to construct the final decoupled model. Under a block-diagonal Fisher approximation, we derive a computable second-order expansion of the expected generalization error, decomposing it into encoder variance, encoder bias, and decoder variance. This bias--variance decomposition provides a computable proxy to select the shared depth and task weights, enabling efficient hyper-parameter search. Experiments on standard long-tailed benchmarks demonstrate the effectiveness of the proposed approach over strong baselines.

2605.24965 2026-05-26 cs.CV cs.AI cs.LG 版本更新

Cross-Domain Generalization Limits of Vision Foundation Models in Facial Deepfake Detection

视觉基础模型在面部深度伪造检测中的跨域泛化极限

Ibrahim Delibasoglu

发表机构 * Department of Software Engineering, Faculty of Computer and Information Sciences(软件工程系,计算机与信息科学学院)

AI总结 本文通过系统评估三种视觉基础模型(RoPE-ViT、DINOv3、NVIDIA C-RADIOv4-H)在DF40基准上的线性探测性能,揭示了它们在面部深度伪造检测中的跨域泛化极限,发现基础模型对全脸合成保持高判别力,但对局部编辑技术存在根本性边界。

详情
AI中文摘要

生成模型的快速进化使得超逼真面部深度伪造的创建成为可能,暴露了现代数字取证中的一个关键弱点:检测器无法泛化到未见过的操作技术。传统网络遭受表示崩溃,过度拟合特定训练生成器的局部伪影指纹。本研究探讨了现代视觉基础模型是否可以作为可泛化的、开箱即用的特征提取器,能够在完全未见过的生成流形上追踪取证异常。我们进行了系统的跨域评估,比较了三种基础学习范式:全监督宏观语义特征(RoPE-ViT)、纯自监督几何特征(DINOv3)和多教师聚合表示(NVIDIA C-RADIOv4-H)。通过部署冻结的骨干网络并进行下游线性探测,我们映射了这些架构在具有挑战性的DF40基准上的性能极限。我们的实证结果揭示了预训练范式和参数规模之间的内在权衡,证明虽然基础模型对全脸合成保持高判别能力,但局部面部编辑技术在线性探测评估结构中暴露了基本边界。源代码和模型权重可在 http://github.com/mribrahim/deepfake 获取。

英文摘要

The rapid evolution of generative models has enabled the creation of hyper-realistic facial deepfakes, exposing a critical vulnerability in modern digital forensics: the inability of detectors to generalize to unseen manipulation techniques. Traditional networks suffer from representation collapse, overfitting to localized artifact fingerprints of specific training generators. This work investigates whether modern Vision Foundation Models can serve as generalizable, out-of-the-box feature extractors capable of tracking forensic anomalies across entirely unseen generative manifolds. We conduct a systematic cross-domain evaluation comparing three foundational learning paradigms: fully supervised macro-semantic features (RoPE-ViT), pure self-supervised geometric features (DINOv3), and multi-teacher agglomerative representations (NVIDIA C-RADIOv4-H). By deploying frozen backbones subjected to downstream linear probing, we map the performance limitations of these architectures on the challenging DF40 benchmark. Our empirical findings expose the intrinsic trade-offs between pre-training paradigms and parameter scale, proving that while foundation models retain high discriminative capabilities for entire face synthesis, localized face editing techniques expose fundamental boundaries in linear probe evaluation structures. Source code and model weights are available at http://github.com/mribrahim/deepfake.

2605.24961 2026-05-26 cs.LG 版本更新

MedMamba: Multi-View State Space Models with Adaptive Graph Learning for Medical Time Series Classification

MedMamba: 基于自适应图学习的多视图状态空间模型用于医疗时间序列分类

Da Zhang, Bingyu Li, Zhiyuan Zhao, Hongyuan Zhang, Junyu Gao, Xuelong Li

发表机构 * Northwest Polytechnical University(西北工业大学) University of Hong Kong(香港大学)

AI总结 提出MedMamba,一种集成状态空间模型与领域特定归纳偏置的端到端架构,通过多尺度卷积嵌入、三支差分状态空间编码器和空间图Mamba模块,分别处理局部-全局动态、非平稳性和潜在通道交互,在五个真实数据集上实现最先进性能。

Comments Accepted to 2026 ICML

详情
AI中文摘要

医疗时间序列是医疗保健的核心,能够实现连续监测并支持及时的临床决策。尽管最近取得了进展,现有方法仍难以联合建模局部-全局动态并处理基线漂移等非平稳性,同时常常无法捕捉潜在的通道交互。为了解决这些挑战,我们提出了MedMamba,一种将状态空间模型与领域特定归纳偏置相结合的端到端架构。具体来说,MedMamba首先采用多尺度卷积嵌入来捕获判别性的局部形态。其次,为了缓解非平稳性,我们引入了一个三支差分状态空间编码器,处理原始视图、时间差分视图和频域视图,融合它们以强调信息模式同时抑制漂移。此外,为了揭示潜在的通道相关性,我们设计了一个空间图Mamba模块,学习一个向稀疏性和无环性正则化的有向依赖结构,从而无需预定义图。在五个真实世界数据集上的大量实验表明,MedMamba在保持线性计算复杂度的同时实现了最先进的性能,消融研究验证了每个组件的贡献。代码可在 https://github.com/zhangda1018/MedMamba 获取。

英文摘要

Medical time series are central to healthcare, enabling continuous monitoring and supporting timely clinical decisions. Despite recent progress, existing methods struggle to jointly model local-global dynamics and handle nonstationarities like baseline drift, while often failing to capture latent channel interactions. To address these challenges, we propose MedMamba, an end-to-end architecture that integrates state space models with domain-specific inductive biases. Specifically, MedMamba first employs multi-scale convolutional embeddings to capture discriminative local morphology. Second, to mitigate nonstationarity, we introduce a tri-branch differential state space encoder that processes raw, temporal-difference, and frequency-domain views, fusing them to emphasize informative patterns while suppressing drift. Furthermore, to uncover latent channel correlations, we design a spatial graph Mamba module that learns a directed dependency structure regularized toward sparsity and acyclicity, which obviates the need for predefined graphs. Extensive experiments on five real-world datasets demonstrate that MedMamba achieves state-of-the-art performance while maintaining linear computational complexity, and ablation studies validate each component's contribution. Code is available at https://github.com/zhangda1018/MedMamba.

2605.24960 2026-05-26 cs.CL cs.AI cs.LG 版本更新

Investigating the Interplay between Contextual and Parametric Chain-of-Thought Faithfulness under Optimization

探究优化下上下文与参数化思维链忠实性之间的相互作用

Jingyi Sun, Qianli Wang, Pepa Atanasova, Nils Feldhus, Isabelle Augenstein

发表机构 * University of Copenhagen(哥本哈根大学) Technische Universität Berlin(柏林工业大学) German Research Center for Artificial Intelligence (DFKI)(德国人工智能研究中心(DFKI)) BIFOLD – Berlin Institute for the Foundations of Learning and Data(BIFOLD – 柏林学习与数据基础研究院)

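AI summary: Proposes FaithMate, a unified preference-alignment interface, to study how the contextual and parametric chain-of-thought faithfulness paradigms interact under optimization. The two paradigms turn out to be positively coupled yet asymmetric, and contextual faithfulness metrics trade off against one another.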

Comments The first two authors contributed equally and share first-authorship

AI中文摘要

思维链(CoT)忠实性,即CoT是否真实反映大型语言模型(LLM)的底层行为,通常通过两种不相交的范式进行评估:上下文忠实性(通过扰动输入或CoT轨迹测量)和参数化忠实性(通过干预模型的参数化知识评估)。然而,先前的工作仅对它们进行描述性比较。我们通过提出FaithMate(一个统一的偏好对齐接口,用于优化模型朝向任一忠实性范式)来填补这一空白。它使我们能够研究两种范式之间的相互作用,检查忠实性增益在范式内部和跨范式之间是否以及多大程度上泛化。在三个模型、两个数据集和六个忠实性指标上,我们发现两种范式呈正相关但不对称:优化参数化忠实性在两种范式上均产生一致的增益,而上下文对应范式则带来更多可变的增益。在上下文范式内,一个指标上的忠实性增益不能一致地转移到其他指标上,这表明现有的上下文指标捕捉了忠实性的不同方面,并暴露了固有的权衡。这些发现意味着CoT忠实性不是一个单一目标,因此需要多方面的优化和评估。

英文摘要

Chain-of-Thought (CoT) faithfulness, i.e., whether CoTs genuinely reflect large language models' (LLM) underlying behavior, is typically evaluated under two disjoint paradigms: contextual faithfulness, measured by perturbing the input or CoT trace, and parametric faithfulness, assessed by intervening on a model's parametric knowledge. Yet prior work compares them only descriptively. We fill this gap by proposing FaithMate, a unified preference-alignment interface for optimizing models towards either faithfulness paradigm. It enables us to investigate the interplay between the two paradigms, examining whether and to what extent faithfulness gains generalize within and across paradigms. Across three models, two datasets, and six faithfulness metrics, we find that the two paradigms are positively coupled, yet asymmetric: optimizing towards parametric faithfulness yields consistent gains across both paradigms, whereas the contextual counterpart delivers more variable gains. Within the contextual paradigm, faithfulness gains on one metric do not consistently transfer to others, implying that existing contextual metrics capture disjoint facets of faithfulness and exposing inherent trade-offs. These findings imply that CoT faithfulness is not a monolithic objective and therefore requires multifaceted optimization and evaluation.

2605.24957 2026-05-26 cs.AI cs.CV cs.LG 版本更新

Mitigating Object Hallucinations in Vision-Language Models through Region-Aware Attention Recalibration

通过区域感知注意力重校准减轻视觉语言模型中的对象幻觉

Yuanzhi Xu, Qian Gao, Jun Fan, Guohui Ding, Zhenyu Yang, Sixue Lin, Yuteng Xiao

发表机构 * Qilu University of Technology (Shandong Academy of Sciences)(齐鲁工业大学(山东省科学院)) China Telecom Digital Intelligence Technology Co, Ltd(中国电信数字智能技术有限公司) Shenyang Aerospace University(沈阳航空航天大学) Qilu Institute of Technology(齐鲁理工学院)

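AI summary: Proposes a training-free, region-aware adaptive weighting mechanism that computes an outlier-resistant statistical midpoint across attention heads and uses inter-head disagreement to set intervention budgets dynamically, suppressing hallucination-inducing attention paths through continuous penalty modulation. This corrects visual-semantic misalignment while preserving generative fluency.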

AI中文摘要

生成事实上不正确的对象(通常称为对象幻觉)仍然是大型视觉语言模型(LVLMs)中的一个持久挑战。当前解决该问题的方法——从昂贵的数据驱动微调和延迟较高的对比解码到刚性的注意力头截断——常常在计算效率或模型特征空间的连续性上做出妥协。为克服这些限制,我们引入了一种新颖的、无需训练的推理策略,该策略作为一种区域感知的自适应加权机制,动态纠正语义漂移,而不依赖于突然的启发式截断。通过计算各注意力头上的离群值稳健统计中点,我们为可靠的视觉表示建立了一个稳定锚点。然后,我们利用跨区域映射的跨头分歧来动态确定干预预算,通过连续惩罚调制温和地抑制引起幻觉的注意力路径。这种重校准过程有效纠正了视觉语义错位,同时完全保留了生成流畅性和语言先验。在包括CHAIR、POPE和MME在内的标准多模态基准上的全面评估表明,我们的策略显著减少了实例级和句子级幻觉。结果展示了与当代基线相比的最先进性能,证实了我们方法的效率和算法鲁棒性。我们的代码将公开。

英文摘要

The generation of factually incorrect objects, commonly known as object hallucination, remains a persistent challenge in Large Vision-Language Models (LVLMs). Current approaches to address this issue - ranging from expensive data-driven fine-tuning and high-latency contrastive decoding to rigid attention head truncation - frequently compromise either computational efficiency or the continuity of the model's feature space. To overcome these limitations, we introduce a novel, training-free inference strategy that operates as a region-aware adaptive weighting mechanism to dynamically correct semantic drift without relying on abrupt heuristic truncations. By computing an outlier-resistant statistical midpoint across various attention heads, we establish a stable anchor for reliable visual representations. We then utilize the inter-head disagreement mapped across regions to dynamically determine intervention budgets, gently suppressing hallucination-inducing attention paths through a continuous penalty modulation. This recalibration process effectively rectifies visual-semantic misalignments while fully preserving generative fluency and language priors. Comprehensive evaluations on standard multimodal benchmarks, including CHAIR, POPE, and MME, reveal that our strategy substantially curtails both instance- and sentence-level hallucinations. The results demonstrate state-of-the-art performance against contemporary baselines, confirming our method's efficiency and algorithmic robustness. Our code will be public.

2605.24950 2026-05-26 cs.RO cs.LG 版本更新

ARCANE-PedSynth: Synthetic Multi-Pedestrian Datasets with Behavioural Crossing Annotations

ARCANE-PedSynth:具有行为穿越注释的合成多行人数据集

Muhammad Naveed Riaz, Maciej Wielgosz, Antonio M. López Peña

发表机构 * Computer Vision Center (CVC), Universitat Autònoma de Barcelona (UAB), Bellaterra, Barcelona, Spain; Institute of Electronics, Faculty of Computer Science, Electronics and Telecommunications, AGH University of Krakow, Kraków, Poland

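AI summary: Proposes ARCANE-PedSynth, an open-source CARLA-based framework that uses a hybrid AI-manual control architecture and a 12-state behavioural finite state machine to generate synthetic multi-pedestrian data with high crossing rates. It supports RGB, LiDAR, and DVS modalities with behavioural annotations for pedestrian crossing prediction in autonomous driving.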

AI中文摘要

我们提出ARCANE-PedSynth,一个基于CARLA的开源软件框架,用于生成具有密集行为注释的合成多行人数据集,以支持自动驾驶中的行人穿越预测。该框架通过混合AI-手动行人控制架构克服了CARLA原生9%的穿越率,可实现高达75%的可配置目标穿越率。一个包含五种角色原型的12状态行为有限状态机产生了多样化的穿越行为。该框架生成同步的RGB、LiDAR和DVS数据,并带有每帧穿越标签、行为状态和估计的2D姿态关键点。我们通过PedSynth++(一个使用该框架生成的示例数据集)展示了ARCANE-PedSynth,该数据集包含533个多行人片段,覆盖12种天气条件,并带有RGB、LiDAR和DVS流。ARCANE-PedSynth通过CLI参数化和Docker容器化实现完全可重复性。

英文摘要

We present ARCANE-PedSynth, an open-source CARLA-based software framework for generating synthetic multi-pedestrian datasets with dense behavioural annotations for pedestrian crossing prediction in autonomous driving. The framework overcomes CARLA's native 9% crossing rate through a hybrid AI-manual pedestrian control architecture, enabling configurable target rates up to 75%. A 12-state behavioural finite state machine with five character archetypes produces diverse crossing behaviours. The framework generates synchronised RGB, LiDAR, and DVS data with per-frame crossing labels, behavioural states, and estimated 2D pose keypoints. We demonstrate ARCANE-PedSynth through PedSynth++, an example dataset generated with the framework, comprising 533 multi-pedestrian clips across 12 weather conditions with RGB, LiDAR, and DVS streams. ARCANE-PedSynth is fully reproducible via CLI parameterisation and Docker containerisation.

2605.24945 2026-05-26 cs.LG cs.AI physics.ao-ph 版本更新

RealBench: Benchmarking Data-Driven Numerical Weather Forecasting Under Operational Conditions and Extreme Event Challenges

RealBench: 在操作条件和极端事件挑战下对数据驱动数值天气预报的基准测试

Ruize Li, Zhibin Wen, Tao Han, Hao Chen, Fenghua Ling, Wei Zhang, Song Guo, Lei Bai

发表机构 * The Hong Kong University of Science and Technology(香港科技大学) Nanjing University(南京大学) Southern University of Science and Technology(南方科技大学) Shanghai AI Laboratory(上海人工智能实验室) Shanghai TechWind Technology Co., Ltd.(上海技风科技有限公司)

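AI summary: Proposes the RealBench benchmark, which evaluates AI weather forecasting models on a strictly out-of-distribution test set using low-latency operational analysis and observations from over 10,000 stations worldwide. It reveals substantial discrepancies between reanalysis-based metrics and real-world performance, particularly for extreme events.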

Comments 35 pages, 22 figures

AI中文摘要

准确评估天气预报模型对于其在现实世界应用中的可靠部署至关重要。然而,现有基准主要依赖再分析产品(如ERA5),这些产品通过延迟数据同化生成,不能反映实时操作预报的约束,导致基准性能与现实预报之间存在系统性不匹配。在这项工作中,我们引入了RealBench,这是一个用于AI天气预报的下一代基准,强调在操作条件下的现实评估。RealBench具有严格分布外测试集,覆盖2025年,以消除数据泄露并捕捉近期大气状况。它整合了多个数据源,包括低延迟操作分析和包含超过10,000个站点的全球原位观测数据集,从而能够直接针对真实大气测量进行评估。除了标准全球指标外,RealBench还为高影响极端事件(包括热浪、寒潮和热带气旋)提供了全面的评估框架,使用事件特定指标更好地反映现实预报优先级。评估结果揭示了基于再分析的指标与现实性能之间的显著差异,特别是关于极端事件。通过突出现有基准的局限性,这项工作建立了一个更忠实且与操作相关的评估范式,为推进下一代AI天气预报系统提供了严格的基础。基准实现可在以下网址获取:https://github.com/lixruize-del/NWP-Benchmark。

英文摘要

Accurate evaluation of weather forecasting models is critical for their reliable deployment in real-world applications. However, existing benchmarks predominantly rely on reanalysis products such as ERA5, which are generated through delayed data assimilation and do not reflect the constraints of real-time operational forecasting, thereby resulting in a systematic mismatch between benchmark performance and real-world forecasting. In this work, we introduce RealBench, a next-generation benchmark for AI weather forecasting that emphasizes realistic evaluation under operational conditions. RealBench features a strictly out-of-distribution test set spanning 2025 to eliminate data leakage and capture recent atmospheric regimes. It integrates multiple data sources, including low-latency operational analysis and a large-scale global in-situ observation dataset comprising over 10,000 stations, enabling direct evaluation against real atmospheric measurements. Beyond standard global metrics, RealBench provides a comprehensive evaluation framework for high-impact extreme events, including heatwaves, cold surges, and tropical cyclones, using event-specific metrics that better reflect real-world forecasting priorities. The evaluation results reveal substantial discrepancies between reanalysis-based metrics and real-world performance, particularly concerning extreme events. By highlighting the limitations of existing benchmarks, this work establishes a more faithful and operationally relevant evaluation paradigm, providing a rigorous foundation for advancing next-generation AI weather forecasting systems. The benchmark implementation is available at: https://github.com/lixruize-del/NWP-Benchmark.

2605.24941 2026-05-26 cs.CR cs.LG 版本更新

Memory-Induced Tool-Drift in LLM Agents

LLM代理中的记忆诱导工具漂移

Mahavir Dabas, Jihyun Jeong, Ming Jin, Ruoxi Jia

发表机构 * Virginia Tech(弗吉尼亚理工大学)

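AI summary: Studies how personality-driven biases stored in the long-term memory of LLM agents (cost-consciousness, impatience, and so on) silently affect tool calls in contexts where they do not apply. Introduces the MEMDRIFT benchmark and finds that biased memories shift tool parameters away from the unbiased baseline and that existing defenses do not eliminate the effect.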

AI中文摘要

现代LLM代理将用于个性化的长期记忆与用于在现实世界中采取行动的工具调用接口相结合——这一组合支撑着当代生产系统。我们研究了这种组合的一个先前未被检查的失败:当存储在记忆中的个性驱动偏见(成本意识、不耐烦、风险承受能力等)在不适用情境下静默影响工具调用时。我们称此为记忆诱导工具漂移,并通过MEMDRIFT将其操作化,MEMDRIFT是一个包含105个场景的基准测试,涵盖五个偏见维度和七个专业领域,通过自动化对抗性流水线生成。在七个前沿模型(包括具有扩展推理能力的模型)中,偏置记忆使偏转分数(一种由评判者评分的参数偏离无偏基线的度量)在1-5分制上提高高达+3.6分。当记忆管理由三种生产记忆架构处理时,工具漂移持续存在。该现象影响现实世界的工具:扫描288个经过验证的MCP服务器上的6,062个工具,我们标记了608个具有易受影响参数的工具,并在一个经过验证的子集上确认了工具漂移。从机制上讲,偏置记忆充当隐式引导向量,将激活沿与显式行为指令相同的潜在方向推动。它们还将注意力从任务相关上下文重新分配到与目标参数具有表面关键词重叠的记忆条目上。标准防御——基于提示的相关性指令和记忆过滤器——减少了漂移但未能消除它。随着代理以用户名义采取越来越重要的行动,记忆诱导工具漂移代表了当前安全措施未能解决的一个系统性漏洞,这激发了在记忆管理和工具调用生成交叉点上的专用防御。

英文摘要

Modern LLM agents combine long-term memory for personalization with tool-calling interfaces for taking actions in the world -- a combination underpinning contemporary production systems. We study a previously unexamined failure of this combination: when personality-driven biases stored in memory (cost-consciousness, impatience, risk tolerance, etc.) silently affect tool calls in contexts where they are not applicable. We call this memory-induced tool-drift and operationalize it through MEMDRIFT, a benchmark of 105 scenarios spanning five bias dimensions and seven professional domains, generated through an automated adversarial pipeline. Across seven frontier models -- including those with extended reasoning -- biased memories raise deflection scores (a judge-scored measure of parameter deviation from unbiased baselines) by up to +3.6 points on a 1-5 scale. Tool-drift persists when memory management is handled by three production memory architectures. The phenomenon affects real-world tools: scanning 6,062 tools across 288 verified MCP servers, we flag 608 with susceptible parameters and confirm tool-drift on a validated subset. Mechanistically, biased memories act as implicit steering vectors, pushing activations along the same latent directions as explicit behavioral instructions. They also redistribute attention from task-relevant context toward memory entries with surface-level keyword overlap to the target parameter. Standard defenses -- prompt-based relevance instructions and memory filters -- reduce drift but do not eliminate it. As agents take increasingly consequential actions on a user's behalf, memory-induced tool-drift represents a systematic vulnerability that current safeguards do not address, motivating dedicated defenses at the intersection of memory management and tool-call generation.

2605.24939 2026-05-26 cs.LG math.OC 版本更新

Global linear convergence of entropy-regularized softmax policy gradient beyond tabular MDPs

熵正则化softmax策略梯度的全局线性收敛性:超越表格MDP

Ziyue Chen, David Šiška, Lukasz Szpruch

发表机构 * School of Mathematics(数学系)

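AI summary: Studies global convergence of policy gradient for infinite-horizon entropy-regularized Markov decision processes with continuous state and action spaces. For log-linear softmax policies with linear function approximation, it establishes a non-uniform Polyak-Łojasiewicz inequality under $Q^π_τ$-realizability, identifies two feature regimes in which the non-uniform constant can be bounded along the gradient flow, and proves global linear convergence of the regularized objective along that flow.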

AI中文摘要

我们研究了具有连续状态和动作空间的无限时域熵正则化马尔可夫决策过程(MDP)中策略梯度的全局收敛性。我们考虑带有线性函数逼近的log-linear softmax策略,它扩展了表格softmax参数化,同时保留了易处理的策略类。在正则化状态-动作值函数的$Q^π_τ$可实现性下,我们首先建立了一个非均匀的Polyak--Łojasiewicz(PŁ)不等式。非均匀性源于与策略几何相关的常数的退化性,即Fisher信息矩阵或非中心特征协方差矩阵。然后,我们确定了两种特征机制,在这些机制下,该非均匀常数可以沿梯度流有界。对于全仿射跨度特征,我们证明了KL正则化子的径向无界性,并表明Fisher信息矩阵的最小特征值始终不低于一个依赖于初始化的正常数。对于单纯形值特征,我们在与全1向量正交的子空间中证明了类似的径向无界性结果,并获得了非中心协方差矩阵最小特征值的统一下界。这些结果意味着正则化目标沿梯度流的全局线性收敛,即次优性以$\mathcal{O}(e^{-Ct})$衰减,其中$C>0$。我们的分析将熵正则化softmax策略梯度的全局收敛理论扩展到Agarwal等人(2020);Bhandari和Russo(2024);Mei等人(2020)的表格设置之外。

英文摘要

We study the global convergence of policy gradient for infinite-horizon entropy-regularized Markov decision processes (MDPs) with continuous state and action spaces. We consider log-linear softmax policies with linear function approximation, which extend the tabular softmax parameterization while retaining a tractable policy class. Under $Q^π_τ$-realizability for the regularized state-action value function, we first establish a non-uniform Polyak--Łojasiewicz (PŁ) inequality. The non-uniformity arises through degeneracy of constants associated with the policy geometry, namely the Fisher information matrix or an uncentered feature covariance matrix. We then identify two feature regimes under which this non-uniform constant can be bounded along the gradient flow. For full-affine-span features, we prove radial unboundedness of the KL regularizer and show that the smallest eigenvalue of the Fisher information matrix remains bounded below by an initialization-dependent positive constant. For simplex-valued features, we prove an analogous radial unboundedness result in the subspace orthogonal to the all-ones vector and obtain a uniform lower bound for the smallest eigenvalue of the uncentered covariance matrix. These results imply global linear convergence of the regularized objective along the gradient flow, i.e. suboptimality decaying as $\mathcal{O}(e^{-Ct})$ for some $C>0$. Our analysis extends the global convergence theory of entropy-regularized softmax policy gradient beyond the tabular setting of Agarwal et al. (2020); Bhandari and Russo (2024); Mei et al. (2020).
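
The step from a Polyak-Łojasiewicz inequality to the stated $\mathcal{O}(e^{-Ct})$ rate can be sketched as follows. The notation is assumed for illustration: $J_\tau$ denotes the regularized objective (maximized), $J_\tau^\star$ its optimal value, and $c(\theta)$ the non-uniform constant. The exact form and constants of the paper's inequality may differ.

```latex
\begin{align*}
&\text{Gradient flow:} && \dot\theta_t = \nabla J_\tau(\theta_t), \\
&\text{Non-uniform P\L{} (assumed form):} && \|\nabla J_\tau(\theta)\|^2 \ge c(\theta)\,\bigl(J_\tau^\star - J_\tau(\theta)\bigr), \\
&\text{Bound along the flow:} && c(\theta_t) \ge C > 0 \quad \text{for all } t \ge 0, \\
&\text{Hence:} && \frac{d}{dt}\bigl(J_\tau^\star - J_\tau(\theta_t)\bigr)
   = -\|\nabla J_\tau(\theta_t)\|^2 \le -C\,\bigl(J_\tau^\star - J_\tau(\theta_t)\bigr), \\
&\text{Gr\"onwall:} && J_\tau^\star - J_\tau(\theta_t) \le e^{-Ct}\,\bigl(J_\tau^\star - J_\tau(\theta_0)\bigr).
\end{align*}
```

Read this way, the two feature regimes in the abstract are what supply the lower bound $C$: the inequality alone does not give a rate if $c(\theta_t)$ can degenerate along the trajectory.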

2605.24929 2026-05-26 stat.ML cs.IT cs.LG math.IT 版本更新

Estimating Mixture Distributions via Stochastic Mirror Descent

通过随机镜像下降估计混合分布

Mohammadreza Ahmadypour, Tara Javidi, Farinaz Koushanfar

发表机构 * Department of Electrical and Computer Engineering(电气与计算机工程系) University of California San Diego(加州大学圣地亚哥分校)

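AI summary: For estimating an unknown distribution from samples, proposes a family of mixture-model estimators based on stochastic mirror descent (SMD), with flexibility coming from the choice of Bregman divergence. The estimators remain efficient with a large number of candidate components and achieve near-optimal convergence rates in KL divergence and $\ell_2$-norm.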

AI中文摘要

我们重新审视了从样本中估计未知分布的经典问题,通过拟合最小化交叉熵损失的混合模型。将该任务视为在$M$分量混合分布空间上的随机凸优化问题,我们提出了一族源自随机镜像下降(SMD)算法的估计器。这种基于优化的方法提供了一个原则性且灵活的框架,它推广了传统估计器,并通过选择Bregman散度提出了多种新颖的估计器。我们方法的一个关键优势是它能够随着候选分量$f_i$的数量高效扩展;也就是说,可以在混合模型中使用大量基分布,而不会产生显著的计算开销。这使得能够实现更丰富的近似和改进的估计精度。此外,在类别分布(离散结果)的情况下,我们的估计器不需要严格的下界,换句话说,我们的框架不需要精确知道分布的支持集。我们证明,在温和条件下,所提出的$φ$-SMD估计器在Kullback-Leibler(KL)散度和$\ell_2$范数下均能达到近最优的收敛速率,并在计算昂贵时提供实际优势。我们的数值分析突出了相对于经典估计器在样本效率和可扩展性方面的改进性能保证。

英文摘要

We revisit the classical problem of estimating an unknown distribution from its samples by fitting a mixture model that minimizes cross-entropy loss. Framing the task as a stochastic convex optimization problem over the space of $M$-component mixture distributions, we propose a family of estimators derived from the stochastic mirror descent (SMD) algorithm. This optimization-based approach provides a principled and flexible framework that generalizes traditional estimators and yields a variety of novel estimators through the choice of Bregman divergence. A key advantage of our method is that it scales efficiently with the number of candidate components $f_i$; that is, one can employ a large set of basis distributions in the mixture model without incurring significant computational overhead. This enables richer approximations and improved estimation accuracy. Moreover, for categorical distributions (discrete outcomes), our estimators do not require a strict lower bound; in other words, our framework does not require precise knowledge of the support of the distribution. We demonstrate that, under mild conditions, the proposed $φ$-SMD estimators achieve near-optimal convergence rates in both Kullback-Leibler (KL) divergence and $\ell_2$-norm and offer practical benefits when computation is expensive. Our numerical analysis highlights improved performance guarantees over classical estimators, particularly in terms of sample efficiency and scalability.
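
One member of such a family can be sketched with the standard library. The sketch below assumes the negative-entropy mirror map, for which stochastic mirror descent on the simplex reduces to the exponentiated-gradient update. The component tables, step size, iterate averaging, and function names are illustrative choices, not taken from the paper.

```python
import math
import random


def mixture_density(weights, components, x):
    """Value of the mixture sum_i w_i * f_i(x) at a discrete outcome x."""
    return sum(w * f[x] for w, f in zip(weights, components))


def smd_entropy_step(weights, components, x, step_size):
    """One mirror-descent step on the loss -log(mixture(x)).

    The gradient with respect to w_i is -f_i(x) / mixture(x); with the
    negative-entropy mirror map the update is multiplicative, followed
    by renormalization onto the simplex.
    """
    mix = mixture_density(weights, components, x)
    unnormalized = [
        w * math.exp(step_size * f[x] / mix) for w, f in zip(weights, components)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def fit_mixture_weights(samples, components, step_size=0.05):
    """Run one pass over the samples and return the averaged iterate."""
    m = len(components)
    weights = [1.0 / m] * m
    running_sum = [0.0] * m
    for x in samples:
        weights = smd_entropy_step(weights, components, x, step_size)
        running_sum = [r + w for r, w in zip(running_sum, weights)]
    return [r / len(samples) for r in running_sum]


# Toy problem: two known candidate components over outcomes {0, 1, 2},
# with data drawn from a 0.7 / 0.3 mixture of them.
random.seed(0)
components = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
true_weights = [0.7, 0.3]
true_pmf = [mixture_density(true_weights, components, x) for x in range(3)]
samples = random.choices(range(3), weights=true_pmf, k=5000)
estimate = fit_mixture_weights(samples, components)
```

Each step costs one pass over the candidate components, which is the sense in which this kind of update stays cheap as the number of candidates grows. Other Bregman divergences would give different update rules.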

2605.24921 2026-05-26 cs.LG 版本更新

BandVQ: Band-Wise Vector-Quantized EEG Foundation Model

BandVQ: 分带向量量化的脑电图基础模型

Jamiyan Sukhbaatar, Satoshi Imamura, Toshihisa Tanaka

发表机构 * Tokyo University of Agriculture and Technology(东京农工大学) National University of Mongolia(蒙古国国立大学)

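AI summary: To address the under-representation of frequency-specific activity in EEG foundation models, proposes BandVQ, which combines band-wise VQ-VAE tokenizers with a shared Transformer encoder. The model is pretrained on 71 public EEG corpora and evaluated on six classification datasets, with the highest reported results on three cognitive tasks and competitive performance on three motor imagery tasks.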

Comments 15 pages, 1 figure

AI中文摘要

脑电图(EEG)基础建模的一个核心挑战是学习跨不同任务、导联、参考和频谱特征的记录的可迁移表示。现有的掩码建模方法通常依赖于宽带连续块或单一离散表示,这可能无法充分表征频率特异性活动。本文提出BandVQ,一种分带向量量化的EEG基础模型,它将EEG分解为delta、theta、alpha、beta和gamma频带,为每个频带训练独立的VQ-VAE分词器,并在生成的离散VQ码索引上预训练一个共享的Transformer编码器。编码器使用掩码码元、量化绝对对数功率元、通道和时间嵌入,以及表示参考、频带、任务族和阶段的元数据前缀元。还引入了基于区域的掩码,以减少空间相邻电极的平凡重建。该模型在71个公共EEG语料库上进行预训练,涵盖超过9200名受试者和357,000单通道小时,并在六个独立于受试者的分类数据集上进行评估。在当前评估设置下,所提模型实现了强大的迁移性能,在三个认知任务上取得了最高报告结果,在三个运动想象任务上取得了有竞争力的性能。

英文摘要

A central challenge in electroencephalography (EEG) foundation modeling is learning transferable representations across recordings with diverse tasks, montages, references, and spectral characteristics. Existing masked modeling approaches often rely on broadband continuous patches or a single discrete representation, which may underrepresent frequency-specific activity. This paper proposes BandVQ, a band-wise vector-quantized EEG foundation model that decomposes EEG into delta, theta, alpha, beta, and gamma bands, trains an independent VQ-VAE tokenizer for each band, and pretrains a shared Transformer encoder on the resulting discrete VQ code indices. The encoder uses masked code tokens, quantized absolute log-power tokens, channel and temporal embeddings, and metadata prefix tokens representing reference, band, task family, and phase. Region-based masking is also introduced to reduce the trivial reconstruction of spatially adjacent electrodes. The model is pretrained on 71 public EEG corpora comprising over 9,200 subjects and 357,000 single-channel hours and evaluated on six subject-independent classification datasets. Under the current evaluation setting, the proposed model achieves strong transfer performance, with the highest reported results on three cognitive tasks and competitive performance on three motor imagery tasks.

2605.24920 2026-05-26 cs.LG cs.AI stat.ML 版本更新

Quaternion Self-Attention with Shared Scores

共享分数的四元数自注意力

Shogo Yamauchi, Tohru Nitta, Hideaki Tamori

发表机构 * Tokyo Woman's Christian University(东京女子基督教大学)

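AI summary: Proposes a shared-score quaternion self-attention mechanism that computes a single real-valued score from the quaternion inner product and shares one attention distribution across all components, substantially reducing computational cost while maintaining performance.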

Comments 26 pages, 6 figures and 15 tables. Accepted at ICML2026

AI中文摘要

四元数神经网络通过将四个相关特征表示为一个单一实体,实现了参数高效并建模多维依赖关系。然而,现有的四元数自注意力计算每个分量的分数并对每个分量应用独立的softmax操作,这增加了计算成本并允许注意力分布在分量间发散。我们提出了一种共享分数的四元数自注意力机制,该机制使用四元数内积计算单一实值分数,并在所有分量上应用共享的注意力分布。这将分数计算乘法减少了75%,并将softmax操作次数从四次减少到一次。我们证明,当查询和键由诱导分量预混合的四元数线性投影产生时,分量级分数和共享分数位于相同的交互子空间中,表明独立的分量级注意力主要重新参数化相同的交互,而不是扩展特征交互空间。在语音增强中,我们的方法在GPU上将推理时间减少了高达44.3%,在CPU上减少了58.1%,同时保持了质量,并且在视觉和自然语言处理中呈现一致的趋势。

英文摘要

Quaternion neural networks are parameter-efficient and model multidimensional dependencies by representing four related features as a single entity. However, existing quaternion self-attention computes component-wise scores and applies independent softmax operations to each component, which increases the computational cost and allows attention distributions to diverge across components. We propose a shared-score quaternion self-attention mechanism that computes a single real-valued score using the quaternion inner product and applies a shared attention distribution across all components. This reduces score-computation multiplications by 75% and the number of softmax operations from four to one. We prove that, when queries and keys are produced by quaternion linear projections that induce component pre-mixing, the component-wise and shared scores lie in the same interaction subspace, indicating that independent component-wise attention primarily re-parameterizes the same interactions rather than expanding the feature interaction space. In speech enhancement, our method reduces inference time by up to 44.3% on a GPU and 58.1% on a CPU while maintaining quality, with consistent trends across vision and natural language processing.
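
The shared-score idea described above can be sketched in plain Python. Each token is represented here as a tuple of four real vectors (the r, i, j, k parts), the score is the quaternion inner product (the sum of the four component-wise dot products), and a single softmax weights all four value components. The data layout, the scaling factor, and the function names are assumptions for illustration. The quaternion linear projections that produce queries, keys, and values are omitted.

```python
import math


def quaternion_inner_product(q, k):
    """Real-valued inner product: sum of dot products over the four parts."""
    return sum(a * b for q_part, k_part in zip(q, k) for a, b in zip(q_part, k_part))


def softmax(scores):
    """Numerically stable softmax over a list of real scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def shared_score_attention(queries, keys, values):
    """One softmax per query, reused for all four quaternion components."""
    dim = len(queries[0][0])
    scale = 1.0 / math.sqrt(4 * dim)  # assumed scaling, by analogy with 1/sqrt(d)
    outputs = []
    for q in queries:
        weights = softmax([quaternion_inner_product(q, k) * scale for k in keys])
        mixed = tuple(
            [
                sum(w * v[c][d] for w, v in zip(weights, values))
                for d in range(len(values[0][c]))
            ]
            for c in range(4)
        )
        outputs.append(mixed)
    return outputs


# Two identical keys give uniform attention, so the output is the
# component-wise average of the two values.
q = ([1.0], [0.0], [0.0], [0.0])
keys = [q, q]
values = [([2.0], [0.0], [0.0], [4.0]), ([0.0], [2.0], [0.0], [0.0])]
out = shared_score_attention([q], keys, values)
```

The inner product needs 4 real multiplications per quaternion pair, against 16 for a full Hamilton product. That ratio would be consistent with the 75% figure in the abstract, but the accounting is an assumption here, not a statement of how the paper counts operations.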

2605.24914 2026-05-26 cs.IR cs.DB cs.LG 版本更新

MVR-cache: Optimizing Semantic Caching via Multi-Vector Retrieval and Learned Prompt Segmentation

MVR-cache:通过多向量检索和学习型提示分割优化语义缓存

Ali Noshad, Zishan Zheng, Yinjun Wu

发表机构 * School of Computer Science, Peking University, Beijing, China(北京大学计算机科学学院,北京,中国) School of Information, Renmin University of China, Beijing, China(中国人民大学信息学院,北京,中国)

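AI summary: Proposes MVR-cache, which uses multi-vector retrieval and a learned prompt-segmentation model trained with reinforcement learning to optimize cache hit rates, raising them by up to 37% under the same correctness guarantees.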

Comments Published in ICML 2026

AI中文摘要

为了降低LLM的成本和延迟,语义缓存系统必须准确识别新提示是否与缓存提示匹配。当前方法通常依赖简单的相似性度量,限制了其有效性。我们提出MVR-cache,一种新颖的语义缓存方法,通过集成多向量检索(MVR)显著提高了检索准确性。MVR-cache基于一个可学习的分割模型,智能地分割提示,通过MaxSim实现细粒度的相似性比较。我们从严格的理论分析中推导出模型的训练目标,确保优化该目标能在严格正确性约束下直接最大化缓存命中率。为了解决由此产生的非可微组合优化问题,我们利用基于强化学习的训练策略,以理论推导的目标作为奖励。在跨不同任务的已有基准上的实验结果表明,与最先进方法相比,MVR-cache在保持相同正确性保证的同时,一致地将缓存命中率提高了高达37%。MVR-cache可在https://github.com/PKU-SDS-lab/MVR-Cache获取。

英文摘要

To reduce LLM costs and latency, semantic caching systems must accurately identify when a new prompt matches a cached one. Current methods often rely on simplistic similarity measures, which limit their effectiveness. We introduce MVR-cache, a novel semantic caching approach that significantly improves retrieval accuracy by integrating Multi-Vector Retrieval (MVR). MVR-cache is built upon a learnable segmentation model that intelligently splits prompts, enabling fine-grained similarity comparisons via MaxSim. We derive the model's training objective from a rigorous theoretical analysis. This can ensure that optimizing this objective directly maximizes cache hits under strict correctness constraints. To solve the resulting non-differentiable combinatorial optimization problem, we leverage a reinforcement learning-based training strategy with the theoretically grounded objectives as the reward. Experimental results on established benchmarks across diverse tasks confirm that in comparison to the state-of-the-art, MVR-cache consistently increases the cache hit rates by up to 37% while maintaining the same correctness guarantees. MVR-cache is available at https://github.com/PKU-SDS-lab/MVR-Cache
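
A MaxSim-style comparison between a segmented prompt and cached segmented prompts might look like the sketch below. It takes segment embeddings as given (the learned segmentation model and the embedding model are outside its scope), averages the per-segment maxima so that a fixed threshold is meaningful, and uses a hypothetical `lookup` helper with an arbitrary threshold. None of these choices are taken from the MVR-cache code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def maxsim(query_segments, cached_segments):
    """For each query segment take its best match, then average the maxima."""
    best_matches = [
        max(cosine(q, c) for c in cached_segments) for q in query_segments
    ]
    return sum(best_matches) / len(best_matches)


def lookup(query_segments, cache, threshold=0.9):
    """Return (key, score) for the best cached prompt, or (None, score) on a miss."""
    best_key, best_score = None, float("-inf")
    for key, cached_segments in cache.items():
        score = maxsim(query_segments, cached_segments)
        if score > best_score:
            best_key, best_score = key, score
    if best_score >= threshold:
        return best_key, best_score
    return None, best_score


# Toy cache with hand-written 2-d "embeddings" per segment.
cache = {"a": [[1.0, 0.0], [0.0, 1.0]], "b": [[-1.0, 0.0]]}
hit_key, hit_score = lookup([[0.0, 1.0], [1.0, 0.0]], cache)
miss_key, miss_score = lookup([[-1.0, 0.0]], {"a": cache["a"]})
```

Because each query segment is matched independently, reordering the segments of a prompt does not change the score, which is one reason segment-level matching can be more forgiving than a single whole-prompt embedding.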

2605.24912 2026-05-26 cs.LG cs.AI q-bio.OT 版本更新

Explainable Retinal Imaging for Prediction of Multi-Organ Dysfunction in Type 2 Diabetes

可解释的视网膜成像用于预测2型糖尿病多器官功能障碍

Mini Han Wang, Liting Huang, Wei Hong, Boonthawan Wingwon

发表机构 * Faculty of Computer Science and Artificial Intelligence(计算机科学与人工智能学院) Frontier Science Computing Center(前沿科学计算中心) Chinese Academy of Sciences(中国科学院) Chinese University of Hong Kong(香港中文大学) Zhuhai People's Hospital(珠海人民医院) Beijing Institute of Technology(北京理工大学) Jinan University(暨南大学) Lampang Inter-Tech College

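AI summary: Builds system-level abnormality indices from routine laboratory biomarkers, predicts multi-system dysregulation in type 2 diabetes with a gradient boosting model, and uses SHAP for interpretability, identifying hyperglycaemia, renal impairment, dyslipidaemia, and inflammation as the dominant drivers.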

Comments 15 pages, 8 figures

AI中文摘要

背景:2型糖尿病(T2DM)日益被认为是一种以代谢、肾脏、脂质和炎症通路协调功能障碍为特征的系统性疾病。现有的临床评估往往无法捕捉这种多维度负担。方法:我们对1,195名患者进行了回顾性研究,使用了常规收集的实验室生物标志物。构建了系统级异常指数以量化器官特异性功能障碍,并将多系统受累定义为两个或以上系统异常。训练了包括逻辑回归、随机森林和梯度提升在内的监督机器学习模型来预测多系统失调。使用SHapley Additive exPlanations(SHAP)实现模型可解释性。结果:梯度提升模型表现出近乎完美的区分能力(AUC = 1.000),显著优于逻辑回归(AUC = 0.925)。特征归因分析显示,高血糖、肾功能障碍、血脂异常和炎症是多系统风险的主要驱动因素。部分依赖分析中观察到的剂量-反应关系进一步支持了模型预测的生物学合理性。结论:本研究提出了一个可解释的、数据驱动的框架,用于量化T2DM的系统性疾病负担。通过将常规生物标志物与多器官功能障碍联系起来,我们的方法提供了预测准确性和机制洞察,为糖尿病护理中的风险分层和精准医学提供了潜力。本研究中使用的数据和代码可在GitHub上公开获取:https://github.com/MiniHanWang/Type-2-Diabetes-1.git

英文摘要

Background: Type 2 diabetes mellitus (T2DM) is increasingly recognised as a systemic disease characterised by coordinated dysfunction across metabolic, renal, lipid, and inflammatory pathways. Existing clinical assessments often fail to capture this multi-dimensional burden. Methods: We conducted a retrospective study of 1,195 patients using routinely collected laboratory biomarkers. System-level abnormality indices were constructed to quantify organ-specific dysfunction, and multi-system involvement was defined as abnormalities in two or more systems. Supervised machine learning models, including logistic regression, random forest, and gradient boosting, were trained to predict multi-system dysregulation. Model interpretability was achieved using SHapley Additive exPlanations (SHAP). Results: The gradient boosting model demonstrated near-perfect discrimination (AUC = 1.000), significantly outperforming logistic regression (AUC = 0.925). Feature attribution analysis revealed that hyperglycaemia, renal impairment, dyslipidaemia, and inflammation were the dominant drivers of multi-system risk. Dose-response relationships observed in partial dependence analyses further supported the biological plausibility of model predictions. Conclusion: This study presents an interpretable, data-driven framework for quantifying systemic disease burden in T2DM. By linking routine biomarkers to multi-organ dysfunction, our approach provides both predictive accuracy and mechanistic insight, offering potential for improved risk stratification and precision medicine in diabetes care. The data and code used in this study are openly available on GitHub at: https://github.com/MiniHanWang/Type-2-Diabetes-1.git

2605.24911 2026-05-26 cs.LG cs.AI 版本更新

Factorize to Generalize: Retrieval-Guided Invariant-Dynamic Decomposition for Time Series Forecasting

因式分解以泛化:面向时间序列预测的检索引导不变-动态分解

Jinjin Chi, Lei Feng, Lulu Zhang, Yongcheng Jing, Yiming Wang, Ximing Li, Jialie Shen, Leszek Rutkowski, Dacheng Tao

发表机构 * College of Computer Science and Technology, Jilin University(吉林大学计算机科学与技术学院) College of Computing and Data Science, Nanyang Technological University(南洋理工大学计算机与数据科学学院) City St George’s, University of London(伦敦大学城圣乔治学院) Systems Research Institute, Polish Academy of Sciences(波兰科学院系统研究所)

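AI summary: Proposes a retrieval-guided invariant-dynamic decomposition framework that separates stable shared structure from instance-specific variation to improve the robustness of zero-shot time series forecasting under distribution shift.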

AI中文摘要

时间序列基础模型(TSFMs)最近通过大规模预训练和检索增强预测实现了强大的零样本预测性能。然而,我们的实证分析揭示了基于检索的预测的一个非平凡限制:检索倾向于导致更振荡的预测,在高度波动的序列上提升性能,但在更平滑、趋势主导的序列上降低准确性。这表明检索信息可能在未明确区分稳定时间结构与实例特定变化的情况下被融合到预测中,这可能在分布偏移下降低鲁棒性。我们提出了一种用于时间序列预测的检索引导不变-动态分解框架。我们不将检索用作辅助预测上下文,而是利用检索到的序列作为来自相关环境的隐式样本,以指导表示分解。具体来说,我们首先通过基于注意力的聚合构建检索感知表示,然后引入检索引导路由机制将其分解为捕获稳定共享结构的不变组件和建模上下文相关变化的动态组件。这两个组件分别预测并融合以进行最终预测,使模型能够保留可迁移模式,同时保持对动态演变的适应性。我们进一步设计了鼓励不变学习和解耦的训练目标,并提供了理论见解,表明检索聚合减少了方差,并在没有显式环境监督的情况下近似不变表示学习。大量实验表明,我们的方法在分布偏移下持续提高鲁棒性,并在零样本预测设置中优于现有的TSFMs和基于检索的基线。

英文摘要

Time series foundation models (TSFMs) have recently achieved strong zero-shot forecasting performance through large-scale pretraining and retrieval-augmented prediction. However, our empirical analysis reveals a non-trivial limitation of retrieval-based forecasting: retrieval tends to induce more oscillatory predictions, improving performance on highly fluctuating series while degrading accuracy on smoother, trend-dominated ones. This suggests that retrieved information may be fused into prediction without explicitly distinguishing stable temporal structure from instance-specific variations, which can reduce robustness under distribution shifts. We propose a Retrieval-guided Invariant-Dynamic DEcomposition framework for time series forecasting. Rather than using retrieval as auxiliary predictive context, we leverage retrieved sequences as implicit samples from related environments to guide representation decomposition. Specifically, we first construct a retrieval-aware representation via attention-based aggregation, and then introduce a retrieval-guided routing mechanism to decompose it into an invariant component capturing stable shared structure and a dynamic component modeling context-dependent variations. These two components are forecast separately and fused for final prediction, enabling the model to preserve transferable patterns while remaining adaptive to evolving dynamics. We further design training objectives that encourage invariant learning and disentanglement, and provide theoretical insight showing that retrieval aggregation reduces variance and approximates invariant representation learning without explicit environment supervision. Extensive experiments demonstrate that our method consistently improves robustness under distribution shifts and outperforms existing TSFMs and retrieval-based baselines in zero-shot forecasting settings.

2605.24908 2026-05-26 cs.LG cs.AI 版本更新

On the Impact of Class Imbalance on the Learning Dynamics of Deep Neural Networks: An Intuitive Insight

论类别不平衡对深度神经网络学习动态的影响:直观洞察

Ismail B. Mustapha, Shafaatunnur Hasan, Sunday O. Olatunji, Hatem S. Y. Nabus

发表机构 * Faculty of Computing(计算机学院) Universiti Teknologi Malaysia(马来西亚理工大学) Adekunle Ajasin University(阿德昆勒·阿贾辛大学) Johor, Malaysia(马来西亚柔佛) Akungba-Akoko, Nigeria(尼日利亚阿昆巴-阿科科)

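AI summary: By monitoring how deep neural networks learn the majority and minority classes at different imbalance ratios, systematically studies how class imbalance drives the model to underfit the minority class in early training while learning only the majority class, so that the minority-class representations it eventually learns are overfitted rather than generalizable.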

Comments Conference

AI中文摘要

近年来,深度神经网络(DNN)中的类别不平衡问题引起了研究者的广泛关注。然而,相关文献中对DNN在不平衡数据上表现不佳的原因存在不同解释,表明人们对这一长期存在的现象如何影响DNN性能知之甚少。更好地理解这一问题对于开发有效的基于DNN的不平衡方法至关重要。因此,本研究通过监测DNN模型在不同不平衡比率数据集上对多数类和少数类的学习模式,系统研究了类别不平衡对DNN学习动态的影响。实验结果表明,与从平衡数据集学习时DNN类似地学习各个类别不同,类别不平衡严重损害了DNN的性能,导致模型在早期训练轮次中欠拟合少数类样本,同时仅学习多数类。尽管DNN最终学会了少数类样本,但这种学习方式仅导致学习到的少数类表示在测试阶段无法泛化,因为它们仅仅是过拟合以尽可能降低整体训练损失。

英文摘要

Class imbalance in deep neural networks (DNNs) has attracted rapidly growing research attention in recent years. However, the varying accounts in the pertinent literature of why DNNs perform poorly on imbalanced data show that little is known about how this long-standing phenomenon affects DNN performance. A better understanding of this problem is crucial to developing effective DNN-based methods for imbalanced data. Thus, this study systematically investigates the impact of class imbalance on the learning dynamics of DNNs by monitoring the learning patterns of DNN models on both the majority and minority classes of datasets with varying imbalance ratios. Experimental findings show that, in contrast to learning from balanced datasets, where a DNN learns the classes similarly, class imbalance severely degrades DNN performance, driving the model to underfit the minority-class samples in the early training epochs while learning only the majority class. Although the DNN ultimately learns the minority samples, learning in this manner yields minority-class representations that do not generalize at test time, because they are merely overfitted to keep the overall training loss as low as possible.

2605.24903 2026-05-26 cs.CR cs.LG 版本更新

SEED: Semi-supervised Continual MalwarE Detection for Tackling ConcEpt Drift on a BuDget

SEED: 预算约束下应对概念漂移的半监督持续恶意软件检测

Suresh Kumar Amalapuram, Bikraj Shresta, Siva Ram Murthy Chebiyam, Bheemarjuna Reddy Tamma, Sumohana S Channappayya

发表机构 * Indian Institute of Technology Ropar(印度理工学院罗帕尔) Indian Institute of Technology Hyderabad(印度理工学院海得拉巴)

AI总结 提出SEED方法,结合定制二元交叉熵损失与半监督持续学习和主动学习,在有限标注下有效检测未知恶意软件,平均AUT提升40%(BODMAS)和14%(AndroZoo)。

详情
AI中文摘要

基于机器学习的恶意软件检测器会随着良性应用和恶意应用中的概念漂移而随时间变得过时。最近的方法依赖完全标注数据,并利用层次对比损失(HCL)与主动学习,通过利用恶意软件表示中的语义结构来提高对漂移的鲁棒性。然而,在安全领域获取标注数据很困难。在部分标注设置下,HCL在检测未知恶意软件时性能显著下降,尤其是在BODMAS等可能缺乏强语义结构的数据集上。本文提出SEED,一种在有限监督下进行恶意软件检测的语义结构无关方法。SEED将定制的二元交叉熵目标与半监督持续学习和主动学习相结合。对于部分标注的已见任务,未标注样本通过奇异值分解投影到从先前已见数据构建的表示空间中,并与合适的标注样本配对以鼓励表示一致性。对于完全未标注的未见任务,使用表示空间中的余弦距离量化不确定性,并选择最不确定的样本供分析师标注。我们在Windows和Android恶意软件数据集上评估SEED。在已见任务上仅使用20%标注数据,与HCL*(HCL的半监督适应)相比,SEED在未知恶意软件检测上平均AUT提升40%(BODMAS)和14%(AndroZoo),同时在APIGraph上保持竞争力。最后,我们引入延迟缓冲区更新策略以减少重放期间的标签噪声传播并提高学习稳定性。

英文摘要

Machine learning based malware detectors become obsolete over time due to concept drift in benign and malware applications. Recent methods rely on fully labeled data and use hierarchical contrastive loss (HCL) with active learning to improve robustness against drift by exploiting semantic structure in malware representations. However, obtaining labeled data in the security domain is difficult. Under partially labeled settings, HCL suffers significant performance degradation in detecting unseen malware, especially on datasets such as BODMAS where strong semantic structure may not exist. In this paper, we propose SEED, a semantic-structure-agnostic method for malware detection under limited supervision. SEED combines a tailored binary cross-entropy objective with semi-supervised continual learning and active learning. For partially labeled seen tasks, unlabeled samples are projected into a representation space constructed from previously seen data using singular value decomposition, and paired with suitable labeled samples to encourage representation consistency. For unseen tasks with fully unlabeled data, uncertainty is quantified using cosine distance in representation space, and the most uncertain samples are selected for analyst labeling. We evaluate SEED on both Windows and Android malware datasets. Using only 20% labeled data on seen tasks, SEED achieves average AUT improvements of 40% on BODMAS and 14% on AndroZoo for unseen malware detection compared to HCL* (the semi-supervised adaptation of HCL), while remaining competitive on APIGraph. Finally, we introduce a delayed buffer update strategy to reduce label noise propagation during replay and improve learning stability.
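
The abstract says only that uncertainty is "quantified using cosine distance in representation space"; it does not give the exact scoring rule. The following is a minimal sketch under one stated assumption: an unlabeled sample's uncertainty is its cosine distance to the nearest labeled embedding, and the samples with the largest such distance are sent to the analyst. The function names and the toy 2D embeddings are hypothetical and are not taken from SEED.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def select_most_uncertain(unlabeled, labeled, budget):
    """Return indices of the `budget` unlabeled embeddings farthest from any labeled one.

    Assumption (not from the paper): uncertainty = cosine distance to the
    nearest labeled embedding.
    """
    scores = []
    for idx, emb in enumerate(unlabeled):
        nearest = min(cosine_distance(emb, ref) for ref in labeled)
        scores.append((nearest, idx))
    # Largest distance first; ties broken by original index.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scores[:budget]]


# Hypothetical toy embeddings.
labeled = [[1.0, 0.0], [0.9, 0.1]]
unlabeled = [[1.0, 0.05], [0.0, 1.0], [0.5, 0.5]]
picked = select_most_uncertain(unlabeled, labeled, budget=2)
print(picked)
```

With a labeling budget of 2, the sketch picks the two embeddings that point away from the labeled cluster and skips the one that nearly coincides with it.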

2605.24902 2026-05-26 cs.CL cs.AI cs.LG 版本更新

When Reasoning Hurts: Source-Aware Evaluation of Frontier LLMs for Clinical SOAP Note Generation

当推理有害:面向临床SOAP笔记生成的前沿LLM源感知评估

Faizan Faisal

发表机构 * University of California, Davis(加州大学戴维斯分校)

AI总结 通过源感知基准测试,评估推理增强型LLM在临床SOAP笔记生成中的表现,发现推理能力反而降低GPT-5.4的质量,而相同源RAG带来模型依赖的小幅提升。

详情
AI中文摘要

推理增强型LLM在医学推理基准测试中表现强劲,但这些增益是否能迁移到结构化临床文档尚不清楚;我们通过一个跨OMI Health、ACI-Bench和PriMock57的源感知基准,利用临床对话生成SOAP笔记来研究这一问题。我们在一个2x2受控设计中评估GPT-5.4、DeepSeek-V4-Flash和Gemma-4-E4B,独立切换提供者原生推理和相同源检索增强生成(RAG)。输出使用七种自动指标以及两个参考感知的LLM评判者进行评估。两种评估方法一致认为,非推理的GPT-5.4配置达到最高整体质量,而DeepSeek-V4-Flash在推理增强配置中表现最佳。启用推理显著降低了GPT-5.4在所有三个数据集上的性能,而相同源RAG带来较小的、模型依赖的改进。总体而言,研究结果表明,在缺乏专门的、针对具体任务的评估时,不应假设更强的推理能力能够改善对保真度敏感的SOAP笔记生成。

英文摘要

Reasoning-enabled LLMs perform strongly on medical reasoning benchmarks, but it remains unclear whether these gains transfer to structured clinical documentation; we investigate this question using SOAP note generation from clinical dialogue in a source-aware benchmark spanning OMI Health, ACI-Bench, and PriMock57. We evaluate GPT-5.4, DeepSeek-V4-Flash, and Gemma-4-E4B in a controlled 2x2 design that independently toggles provider-native reasoning and same-source retrieval-augmented generation (RAG). Outputs are assessed using seven automatic metrics alongside two reference-aware LLM judges. Both evaluation approaches agree that a non-reasoning GPT-5.4 configuration achieves the highest overall quality, while DeepSeek-V4-Flash performs best among reasoning-enabled configurations. Enabling reasoning significantly degrades GPT-5.4 performance across all three datasets, whereas same-source RAG yields smaller, model-dependent improvements. Overall, the findings indicate that stronger reasoning capability should not be assumed to improve fidelity-sensitive SOAP note generation without dedicated, task-specific evaluation.

2605.24879 2026-05-26 cs.LG math.OC 版本更新

Efficient DP-SGD for LLMs with Randomized Clipping

基于随机裁剪的高效DP-SGD用于大语言模型

Enayat Ullah, Sai Aparna Aketi, Devansh Gupta, Huanyu Zhang, Meisam Razaviyayn

发表机构 * Meta Platforms Inc(Meta平台公司) University of Southern California(南加州大学)

AI总结 提出DP-SGD-RC算法,利用随机迹估计(Hutchinson和Hutch++)降低每样本梯度范数估计的内存开销,在保持隐私保证的同时减少内存和计算复杂度。

Comments Accepted at ICML 2026

详情
AI中文摘要

大语言模型(LLMs)在可能包含敏感信息的大规模数据集上进行训练。差分隐私(DP)作为正式隐私保护的事实标准,为训练具有可证明隐私保护的LLMs提供了原则性框架。然而,最先进的DP训练实现依赖于快速梯度裁剪技术,其内存开销为$O(B \min\{T^2, d^2\})$,其中$B$是批量大小,$T$是序列长度,$d$是模型宽度。随着模型规模和上下文长度的增长,这一开销变得难以承受。我们提出DP-SGD-RC,一种带有随机裁剪的新型DP-SGD变体,可降低内存和计算复杂度。DP-SGD-RC利用随机迹估计方法,特别是Hutchinson估计器 [Hutchinson, 1989] 及其改进变体Hutch++ [Meyer et al., 2021],以减少每样本梯度范数估计的内存占用。我们提供了紧致的隐私分析,表明DP-SGD-RC所需的噪声乘数与确定性裁剪相当。在长上下文基准(包括分类、问答和摘要任务)上微调Llama 3.2-1B的实验表明,DP-SGD-RC在匹配基线效用的同时显著降低了内存和计算需求。

英文摘要

Large language models (LLMs) are trained on vast datasets that may contain sensitive information. Differential privacy (DP), the de facto standard for formal privacy guarantees, provides a principled framework for training LLMs with provable privacy protection. However, state-of-the-art DP training implementations rely on fast gradient clipping techniques with memory overhead $O(B \min\{T^2, d^2\})$, where $B$ is the batch size, $T$ is the sequence length, and $d$ is the model width. This becomes prohibitive as both model size and context length grow. We propose DP-SGD-RC, a novel variant of DP-SGD with randomized clipping that reduces memory and compute complexity. DP-SGD-RC leverages stochastic trace estimation methods, specifically Hutchinson's estimator [Hutchinson, 1989] and its improved variant, Hutch++ [Meyer et al., 2021], to reduce the memory footprint of per-sample gradient norm estimation. We provide a tight privacy analysis showing that DP-SGD-RC achieves noise multipliers competitive with deterministic clipping. Experiments fine-tuning Llama 3.2-1B on long-context benchmarks spanning classification, question answering, and summarization tasks demonstrate that DP-SGD-RC matches baseline utility while significantly reducing memory and compute requirements.
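
For readers unfamiliar with stochastic trace estimation, here is a minimal sketch of the plain Hutchinson estimator applied to a squared Frobenius norm, using the identity $\|G\|_F^2 = \mathrm{tr}(G^\top G) = \mathbb{E}\,\|Gz\|^2$ for Rademacher probes $z$. It illustrates only the general estimator named in the abstract. It is not the DP-SGD-RC algorithm: it includes neither the clipping rule nor the privacy accounting. The matrix `G` is a hypothetical stand-in for a per-sample gradient, and the helper names are illustrative.

```python
import random


def hutchinson_trace(matvec, dim, num_samples, rng):
    """Estimate tr(A) as the average of z^T (A z) over Rademacher probes z."""
    total = 0.0
    for _ in range(num_samples):
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        az = matvec(z)
        total += sum(zi * ai for zi, ai in zip(z, az))
    return total / num_samples


def gram_matvec(G):
    """Return z -> G^T (G z); the Gram matrix G^T G itself is never formed."""
    def matvec(z):
        gz = [sum(g * zj for g, zj in zip(row, z)) for row in G]
        return [sum(G[i][j] * gz[i] for i in range(len(G))) for j in range(len(z))]
    return matvec


# Hypothetical 2x3 "gradient" matrix; its squared Frobenius norm is 15.
G = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]]
exact = sum(g * g for row in G for g in row)
estimate = hutchinson_trace(gram_matvec(G), dim=3, num_samples=4000,
                            rng=random.Random(0))
print(exact, estimate)
```

The estimator touches the matrix only through matrix-vector products, which is the property that makes trace estimation attractive when the full matrix is too large to store.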

2605.24876 2026-05-26 math.NA cs.LG cs.NA 版本更新

IV-Net: A neural network for elliptic PDEs with random and highly varying coefficients

IV-Net: 用于随机和高变系数椭圆型偏微分方程的神经网络

Shan Zhong, George Biros

发表机构 * Oden Institute for Computational Science and Engineering, The University of Texas at Austin(计算科学与工程院,德克萨斯大学奥斯汀分校) Walker Department of Mechanical Engineering, The University of Texas at Austin(机械工程系,德克萨斯大学奥斯汀分校)

AI总结 提出一种受V-cycle多重网格求解器启发的神经算子架构IV-Net,用于逼近高对比度空间变系数线性椭圆型偏微分方程的解,在高度异质系数问题上优于POD和现有神经算子,在光滑系数低频振荡Helmholtz问题上与Fourier神经算子性能相当。

Comments 36 pages

详情
AI中文摘要

我们提出了一种新颖的神经算子架构,旨在逼近具有高对比度、空间变化系数的线性椭圆型偏微分方程的解。该网络称为迭代V形网络(IV-Net),实现了从输入系数和右端项到相应解场的映射。IV-Net的架构受V-cycle多重网格求解器启发,并与之高度相似。IV-Net模型通过物理域中定义的卷积层进行参数化。对于具有高度异质系数的强制问题,所提出的网络相对于本征正交分解(POD)方法和几种现有的神经算子架构表现出优越的性能。对于具有光滑系数的低频振荡Helmholtz问题,其性能与Fourier神经算子相似。我们分析了IV-Net的逼近误差和收敛行为、其数据效率以及对底层离散网格的依赖性。此外,我们通过一系列数值实验展示了该架构的实际有效性,包括在不确定性量化、反问题和感兴趣量预测中的应用。

英文摘要

We introduce a novel neural operator architecture designed to approximate solutions of linear elliptic partial differential equations with high-contrast, spatially varying coefficients. The network, termed the Iterated V-shaped Net (IV-Net), realizes a mapping from the input coefficients and right-hand side to the corresponding solution field. The architecture of IV-Net is informed by, and closely resembles, a V-cycle multigrid solver. The IV-Net model is parameterized via convolutional layers defined in the physical domain. For coercive problems with highly heterogeneous coefficients, the proposed network exhibits superior performance relative to a proper orthogonal decomposition (POD) approach and several existing neural operator architectures. For low-frequency oscillatory Helmholtz problems with smooth coefficients, its performance is similar to that of a Fourier neural operator. We analyze the approximation error and convergence behavior of IV-Net, its data efficiency, and its dependence on the underlying discretization mesh. Furthermore, we demonstrate the practical effectiveness of the architecture through a series of numerical experiments, including applications to uncertainty quantification, inverse problems, and prediction of quantities of interest.

2605.24873 2026-05-26 cs.CL cs.AI cs.LG 版本更新

Towards a Universal Causal Reasoner

迈向通用因果推理器

Qirun Dai, Xiao Liu, Jiawei Zhang, Dylan Zhang, Hao Peng, Chenhao Tan

发表机构 * The University of Chicago(芝加哥大学) University of Illinois Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校)

AI总结 提出UniCo数据生成框架,覆盖Pearl因果阶梯的18种查询类型,将符号示例转化为代码和自然语言,通过监督微调显著提升LLM的因果推理能力和推理忠实度。

详情
AI中文摘要

尽管因果推理的重要性不言而喻,但训练LLM进行因果推理仍未被充分探索。现有的数据工作大多集中在针对因果关系的特定方面对LLM进行基准测试,这使得它们不太适合训练可泛化的因果推理器。为了解决这个问题,我们提出了UniCo,一个数据生成框架,它既(1)涵盖了Pearl因果阶梯中的18种因果查询类型,又(2)将原生符号示例转化为代码和自然语言形式,以模拟因果术语未明确指定的真实世界用例。为确保数据质量,UniCo用精确的因果推理来支撑答案,并过滤掉存在推理捷径的案例。通过使用66.6K个UniCo生成的实例进行监督微调,Qwen3-4B、Qwen3-8B和Olmo-3-7B-Instruct在所有18种分布内查询类型上平均提升了22.9%,在训练分布之外的7个已建立的因果基准上,相比最先进的因果数据生成框架提升了8.1%。更重要的是,在真实世界的医学理解、法律决策和表格推理中,UniCo训练的模型始终展现出更忠实的推理轨迹,在忠实度指标上平均超过基础模型20.2%。这些结果表明,以因果为中心的训练不仅增强了因果推理能力,还赋予了LLM在一般推理任务中的因果思维。

英文摘要

Despite the importance of causal reasoning, training LLMs to reason causally remains underexplored. Existing data efforts mostly focus on benchmarking LLMs on specific aspects of causality, making them less suitable for training generalizable causal reasoners. To address this, we propose UniCo, a data generation framework that both (1) addresses 18 causal query types across Pearl's Causal Ladder and (2) translates natively symbolic examples into code and natural language forms to simulate real-world use cases where causal terms are not explicitly specified. To ensure data quality, UniCo grounds answers with exact causal inference and filters cases with reasoning shortcuts. After supervised finetuning on 66.6K UniCo-generated instances, Qwen3-4B, Qwen3-8B, and Olmo-3-7B-Instruct achieve average improvements of 22.9% across all 18 in-distribution query types, and of 8.1% over state-of-the-art causal data generation frameworks on 7 established causal benchmarks outside the training distribution. More importantly, in real-world medical understanding, legal decision, and tabular reasoning, UniCo-trained models consistently display more faithful reasoning traces, outperforming the base models by an average of 20.2% in faithfulness metrics. These results suggest that causality-centered training not only strengthens causal reasoning, but also equips LLMs with a causal mindset in general reasoning tasks.

2605.24872 2026-05-26 cs.LG 版本更新

Cluster Frequency Conformal Prediction for Local Coverage

聚类频率共形预测用于局部覆盖

Tomer Lavi, Bracha Shapira, Nadav Rappoport

发表机构 * Institute of Applied AI Research (AAIR)(应用人工智能研究所) Faculty of Computer and Information Science(计算机与信息科学学院) Ben-Gurion University of the Negev(内盖夫本-古里安大学) Institute for Interdisciplinary Computational Science (ICS)(跨学科计算科学研究所)

AI总结 提出聚类频率共形预测(CFCP)框架,通过聚类嵌入并估计局部标签频率分布,结合全局先验和可靠性感知收缩,在标准共形预测中实现类级覆盖改进,并在图像和文本基准上验证有效性。

详情
AI中文摘要

共形预测提供了无分布覆盖保证,但在多类分类中仍可能对特定类别或子群体覆盖不足,阻碍了在高风险应用中的安全部署。我们提出聚类频率共形预测(CFCP),一个即插即用框架,将共形预测适应于学习表示空间中的局部结构。CFCP对学习到的嵌入进行聚类,从校准数据中估计聚类级别的标签频率分布,并为每个测试点通过软混合附近聚类分布(经全局先验和可靠性感知收缩正则化)构建样本特定的概率向量。然后使用标准集构造器对该向量进行共形化。在不相交分割机制下,CFCP继承了标准的有限样本边际有效性。在额外假设下,CFCP进一步允许局部有效性解释。由于表示聚类聚合了局部相似样本,其经验类别频率提供了局部标签歧义的稳定估计。在图像和文本基准上,CFCP在15/16个数据集/评分族比较中实现了最佳类别覆盖,并具有竞争力的预测集大小效率,其中多个设置显著更高效。总体而言,我们的结果表明,聚类频率信息为改善多类共形预测中的类级可靠性提供了有效的局部化信号。

英文摘要

Conformal prediction provides distribution-free coverage guarantees, but in many-class classification it may still under-cover specific classes or subpopulations, preventing safe deployment in high-stakes applications. We propose Cluster Frequency Conformal Prediction (CFCP), a plug-in framework that adapts conformal prediction to local structure in a learned representation space. CFCP clusters learned embeddings, estimates cluster-level label-frequency distributions from calibration data, and for each test point constructs a sample-specific probability vector by softly mixing nearby cluster distributions, regularized with global-prior and reliability-aware shrinkage. This vector is then conformalized using standard set constructors. In the disjoint-split regime, CFCP inherits standard finite-sample marginal validity. Under additional assumptions, CFCP further admits a local-validity interpretation. Since representation clusters aggregate locally similar samples, their empirical class frequencies provide a stable estimate of local label ambiguity. Across image and text benchmarks, CFCP achieves the best class coverage in 15/16 dataset/score-family comparisons with competitive prediction-set sizes, and is substantially more efficient in several settings. Overall, our results show that cluster-frequency information provides an effective localized signal for improving classwise reliability in many-class conformal prediction.
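
The "standard set constructors" mentioned above are not spelled out in the abstract. As a minimal sketch, assuming the common split-conformal recipe with nonconformity score $1 - p(y)$, the conformalization step for an arbitrary probability vector can be written as follows. The clustering and soft-mixing stages that produce CFCP's probability vector are not shown, and the calibration values below are hypothetical.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: every label is included.
        return math.inf
    return sorted(scores)[rank - 1]


def prediction_set(probs, qhat):
    """Labels whose nonconformity score 1 - p(label) is within the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= qhat]


# Hypothetical calibration data: probability assigned to each true label.
true_label_probs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
cal_scores = [1.0 - p for p in true_label_probs]
qhat = conformal_quantile(cal_scores, alpha=0.5)
print(prediction_set([0.7, 0.2, 0.1], qhat))
```

The marginal guarantee comes from the `(n + 1)` correction in the quantile rank and holds for any source of the probability vector, provided the calibration and test points are exchangeable.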

2605.24868 2026-05-26 cs.LG nlin.CD physics.comp-ph 版本更新

A comparative study of accuracy and rollout stability of temporal surrogate models

时间代理模型的准确性与展开稳定性比较研究

Rajarshi Biswas

发表机构 * Cargill Inc.(卡吉尔公司)

AI总结 本文比较了多种深度神经网络架构在混沌动力系统时间代理建模中的长期预测稳定性,发现具有积分器式更新的模型表现出更低的偏差和扰动放大,从而实现稳定的长期展开和更准确的预测。

Comments 24 pages, 18 figures, submitted to journal

详情
AI中文摘要

时间代理模型对于预测计算成本可能过高的混沌动力系统是有效的。几种深度神经网络架构可用于此目的。在这项工作中,使用共同的训练协议比较了几种常用的架构。目标是公平评估模型架构对长期预测稳定性的影响。针对三个问题进行了实验:双摆、Kuramoto-Sivashinsky方程和Kolmogorov流。实验在匹配模型容量的情况下进行。还对每个模型单独优化的场景进行了分析。观察到在两种场景中,模型在长期展开中表现出类别差异。为了具体量化,使用局部雅可比、相对单步偏差和有限时间李雅普诺夫增长等指标分析了逐步误差注入和扰动放大。此外,还进行了吸引子分析,以评估学习模型复制底层系统几何形状的程度。还进行了消融研究,以隔离连续更新架构中每个组件的影响。结论是,具有积分器式更新的模型表现出更低的偏差和扰动放大,从而产生稳定的长期展开和更准确的预测。

英文摘要

Temporal surrogate models are effective for predicting chaotic dynamical systems where computational cost can be prohibitive. Several deep neural network architectures can be used for such purposes. In this work, a few commonly used architectures are compared using a common training protocol. The objective is to fairly assess the impact of model architecture on long-horizon prediction stability. Experiments are carried out for three problems: the double pendulum, the Kuramoto-Sivashinsky equation, and the Kolmogorov flow. The experiments are carried out with matching model capacity. Analysis is also carried out for a scenario where each model is individually optimized. In both scenarios, the models exhibit categorical differences in long-horizon rollouts. For a concrete quantification, stepwise error injection and perturbation amplification are analyzed using metrics such as the local Jacobian, relative one-step bias, and finite-time Lyapunov growth. Additionally, an attractor analysis is conducted to assess how well the learned models replicate the underlying system geometry. An ablation study is also carried out to isolate the impact of each component of a continuous-update architecture. It is concluded that models with integrator-like updates show lower bias and perturbation amplification, yielding stable long-horizon rollouts and more accurate predictions.
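
The abstract names "finite-time Lyapunov growth" without defining it. As a rough illustration of the underlying idea, the sketch below averages the log of the local Jacobian magnitude along a trajectory of a one-dimensional map. The logistic map is used purely as a stand-in chaotic system: it is not one of the paper's three test problems, and the paper's exact metric may differ.

```python
import math


def finite_time_lyapunov(step, dstep, x0, num_steps):
    """Average log-stretching rate of a 1D map over a trajectory of num_steps."""
    x = x0
    log_growth = 0.0
    for _ in range(num_steps):
        # |dstep(x)| is the local Jacobian of the one-step map at x;
        # the floor guards against log(0) at a critical point.
        log_growth += math.log(max(abs(dstep(x)), 1e-300))
        x = step(x)
    return log_growth / num_steps


def logistic(x):
    """Logistic map at r = 4, a standard chaotic example."""
    return 4.0 * x * (1.0 - x)


def logistic_jac(x):
    """Derivative of the logistic map at r = 4."""
    return 4.0 - 8.0 * x


ftle = finite_time_lyapunov(logistic, logistic_jac, x0=0.2, num_steps=5000)
print(ftle, math.log(2.0))
```

For this map the long-run exponent is known to be $\ln 2$, so the estimate should land close to that value. A positive value means small perturbations grow on average, which is the kind of amplification a rollout-stability analysis tracks.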

2605.24862 2026-05-26 cs.LG 版本更新

Unifying Value Alignment and Assignment in Cross-Domain Offline Reinforcement Learning with Heterogeneous Datasets

统一跨域离线强化学习中异构数据集的价值对齐与价值分配

Zhongjian Qiao, Jiafei Lyu, Chenjia Bai, Peisong Wang, Siyang Gao, Shuang Qiu

发表机构 * City University of Hong Kong(香港城市大学) Tencent(腾讯) Institute of Artificial Intelligence (TeleAI), China Telecom(中国电信人工智能研究院) Institute of Automation, Chinese Academy of Sciences(中国科学院自动化研究所)

AI总结 针对异构跨域离线强化学习中价值误分配问题,提出V2A方法,通过时间一致模态表示学习和模态感知优势学习统一动力学对齐、价值对齐与价值分配,显著提升策略性能。

Comments Accepted at ICML 2026

详情
AI中文摘要

跨域离线强化学习旨在利用有限的目标域数据集和存在动力学偏移的源域数据集,在目标域中学习策略。直接在原始源数据集上训练通常会导致性能崩溃。最近的研究从动力学对齐或价值对齐的角度进行数据过滤,以实现有效的策略迁移。然而,这些研究通常在单域或单行为策略的源数据集上验证。在这项工作中,我们探索了一个更一般的异构跨域离线强化学习设置,其中源数据集可能由多种行为策略从多个源域收集。我们首先揭示了该设置中一个关键但被忽视的问题:价值误分配。通过实验和理论,我们证明了价值误分配会破坏价值对齐,误导数据过滤选择次优样本,并扩大次优性差距,从而降低智能体的性能。为了解决这个问题,我们提出了V2A,它整合了动力学对齐、价值对齐和价值分配。V2A首先采用时间一致的模态表示学习从源数据集中提取动力学模态,然后通过模态感知优势学习纠正价值对齐。最后,它采用数据过滤范式选择性共享源数据进行策略学习。实验结果表明,在一般异构跨域离线强化学习设置下,V2A显著优于强基线方法。

英文摘要

Cross-domain offline reinforcement learning (RL) aims to learn a policy in the target domain with a limited target domain dataset and a source domain dataset that exhibits a dynamics shift. Training directly on the original source dataset typically leads to performance collapse. Recent studies perform data filtering from the perspective of dynamics alignment or value alignment to enable efficient policy transfer. However, these studies are typically validated on single-domain or single-behavior-policy source datasets. In this work, we explore a more general heterogeneous cross-domain offline RL setting, where the source datasets may be collected from multiple source domains by diverse behavior policies. We first uncover a critical yet overlooked issue in this setting: value misassignment. Empirically and theoretically, we demonstrate that value misassignment can undermine value alignment, mislead data filtering toward selecting suboptimal samples, and loosen the suboptimality gap, thereby degrading the agent's performance. To address this issue, we propose V2A, which integrates dynamics alignment, value alignment, and value assignment. V2A first employs temporally-consistent modality representation learning to extract dynamics modalities from the source dataset, followed by modality-aware advantage learning to rectify value alignment. Finally, it adopts a data filtering paradigm to selectively share source data for policy learning. Empirical results show that V2A significantly outperforms strong baseline methods under general heterogeneous cross-domain offline RL settings.

2605.24860 2026-05-26 eess.SY cs.AI cs.ET cs.LG cs.RO cs.SY 版本更新

DBPnet: Damper Characteristics-Based Bayesian Physics-Informed Neural Network for Wheel Load Estimation

DBPnet:基于阻尼特性的贝叶斯物理信息神经网络用于车轮载荷估计

Tianyi Wang, Tianyi Zeng, Zimo Zeng, Feiyang Zhang, Yujin Wang, Xiangyu Li, Yiming Xu, Sikai Chen, Junfeng Jiao, Christian Claudel, Xinbo Chen

发表机构 * Department of Civil, Architectural, and Environmental Engineering, The University of Texas at Austin(德克萨斯大学奥斯汀分校土木、建筑与环境工程系) School of Automation and Intelligent Sensing, Shanghai Jiao Tong University(上海交通大学自动化与智能感知学院) College of Electrical Engineering, Zhejiang University(浙江大学电气工程学院) School of Automotive Studies, Tongji University(同济大学汽车学院) School of Architecture, The University of Texas at Austin(德克萨斯大学奥斯汀分校建筑学院) Department of Civil and Environmental Engineering, University of Wisconsin-Madison(威斯康星大学麦迪逊分校土木与环境工程系)

AI总结 提出DBPnet,一种结合阻尼特性嵌入模块的贝叶斯物理信息神经网络,通过悬架连杆级建模和物理信息损失函数,实现鲁棒的车轮载荷估计。

Comments 14 pages, 12 figures, 6 tables

详情
AI中文摘要

高级驾驶辅助系统(ADAS)在现代汽车智能化中扮演重要角色,显著提升车辆安全性和稳定性。ADAS的性能关键依赖于准确可靠的车辆状态估计,特别是来自车辆动态传感器的信号。在这些信号中,车轮载荷是底盘控制和安全关键功能的关键变量,但由于复杂的悬架几何结构、非线性动力学和测量噪声,难以鲁棒估计。为解决此问题,我们提出DBPnet,一种贝叶斯物理信息神经网络(PINN),其具有受阻尼特性启发的物理感知嵌入模块。首先,本文提出一种悬架连杆级建模(SLLM)方法,通过显式考虑悬架的复杂几何结构,构建非线性瞬时动态模型。在SLLM基础上,将贝叶斯推断集成到PINN中,有效应对车辆底盘系统中的噪声和不确定性,从而提高模型的鲁棒性。然后,采用物理信息损失函数确保与基本物理原理的一致性,同时受阻尼特性启发的嵌入模块提取输入信号的时间变化特征,并将其融入PINN的每一层,确保物理观测指导神经网络而不受固定物理模型的约束。在高保真仿真和真实世界实验上的广泛评估表明,我们的DBPnet在RMSE和MaxError上始终低于基线方法。这些结果凸显了我们的DBPnet在推进车轮载荷估计和为更可靠的ADAS执行器功能发展做出贡献的潜力。

英文摘要

Advanced driver assistance systems (ADAS) play an important role in modern automotive intelligence, significantly enhancing vehicle safety and stability. The performance of ADAS critically relies on accurate and reliable vehicle state estimation, particularly from vehicle dynamic sensors. Among these signals, wheel load is a key variable for chassis control and safety-critical functions, yet it remains difficult to estimate robustly due to complex suspension geometry, nonlinear dynamics, and measurement noise. To address this issue, we propose DBPnet, a Bayesian physics-informed neural network (PINN) with a physics-aware embedding module inspired by damper characteristics. First, this paper presents a suspension linkage-level modeling (SLLM) approach that constructs a nonlinear instantaneous dynamic model by explicitly considering the complex geometric structure of the suspension. Building upon SLLM, Bayesian inference is integrated into the PINN to effectively cope with noise and uncertainty in the vehicle chassis system, thereby improving the model's robustness. Then, a physics-informed loss function is employed to ensure consistency with fundamental physical principles, while the damper characteristics-inspired embedding module extracts temporal variation features of input signals and incorporates them into each layer of the PINN, ensuring that physical observations guide the neural network without being constrained by fixed physical models. Extensive evaluations on high-fidelity simulations and real-world experiments demonstrate that our DBPnet consistently achieves lower RMSE and MaxError than baseline methods. These results highlight the potential of our DBPnet to advance wheel load estimation and contribute to the development of more reliable ADAS actuator functions.

2605.24856 2026-05-26 cs.LG cs.AI 版本更新

The Concept Allocation Zone: Tracking How Concepts Form Across Transformer Depth

概念分配区:追踪概念如何跨越Transformer深度形成

James Henry

发表机构 * Independent Researcher(独立研究者)

AI总结 提出概念分配区(CAZ)框架,通过层间度量(分离度、概念一致性、概念速度)检测概念在残差流中逐渐形成的深度区间,并在34个模型上验证了分离曲线的多模态性及温和CAZ的因果活性。

Comments 34 models, 8 architectural families, 7 concepts. Companion papers: GEM (arXiv forthcoming), CAZ Validation (arXiv forthcoming), PRH Validation (arXiv forthcoming). Code: https://github.com/jamesrahenry/Rosetta_Tools

详情
AI中文摘要

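Transformer语言模型中的概念形成是沿深度延展的过程,而非单层事件:概念在残差流的一段连续区域内逐渐出现。机制可解释性方法通常识别类别分离度达到峰值的单一层(即“最佳层”),捕捉到的只是一个快照,而非过程本身。我们提出概念分配区(CAZ):概念在其中变得可测量地可分离的深度区间,即分配给其几何表达的区域。我们通过三个逐层指标(分离度、概念一致性、概念速度)对CAZ进行形式化,并推导出无需人工逐层扫描的原则性边界检测方法。CAZ并不是概念本身:它是模型组织其几何结构以使概念可分离的深度区域。单个概念通常参与多个CAZ;多个概念也可能共享同一个CAZ。在来自8个架构家族的34个模型和7个概念上的实证验证表明,分离曲线S(l)经常是多峰的。一个评分式检测器发现了“温和CAZ”:这些细微的分配区域对标准峰值检测不可见,但在消融实验下93%至100%的情况中具有因果活性(34个模型中的16个;配套验证论文中为26个)。该框架产生了七个可检验的预测:四个得到明确结论(两个未获支持,一个部分支持,一个获得支持),一个的前提条件被数据否定,两个统计功效不足;在留一概念交叉验证下,跨架构对齐被确认为深度匹配的而非整体式的。参考实现:rosetta_tools v1.3.1(doi:10.5281/zenodo.20361433)。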

英文摘要

Concept formation in transformer language models is depth-extended, not a single-layer event: concepts emerge gradually across a contiguous region of the residual stream. Mechanistic interpretability methods identify the single layer of peak class separation -- the "best layer" -- capturing a snapshot rather than the process itself. We introduce the Concept Allocation Zone (CAZ): the depth interval within which a concept becomes measurably separable, the region allocated to its geometric expression. We formalize the CAZ through three layer-wise metrics (Separation, Concept Coherence, Concept Velocity) and derive principled boundary detection without manual layer sweeps. A CAZ is not a concept: it is the depth region within which the model organizes its geometry to make a concept separable. A single concept typically participates in multiple CAZes; multiple concepts may share one. Empirical validation across 34 models from 8 architectural families and 7 concepts reveals that the separation curve S(l) is frequently multimodal. A scored detector uncovers "gentle CAZes" -- subtle allocation regions invisible to standard peak detection but causally active in 93-100% of cases under ablation (16 of 34 models; 26 in the companion validation paper). The framework generates seven testable predictions; four yield clear verdicts (two not supported, one partially supported, one supported), one had its precondition invalidated by the data, and two are underpowered -- with cross-architecture alignment confirmed as depth-matched rather than monolithic under leave-one-concept-out cross-validation. Reference implementation: rosetta_tools v1.3.1 (doi:10.5281/zenodo.20361433).

2605.24852 2026-05-26 cs.LG cs.SY eess.SY 版本更新

T2S-MPC: Time-Embedded Online Adaptive Model Predictive Control for Time-Varying Dynamics

T2S-MPC:面向时变动力学的时间嵌入在线自适应模型预测控制

Zeyu Shen, Zhuoyuan Wang, Laixi Shi

发表机构 * Department of Applied Mathematics and Statistics, Johns Hopkins University, MD, USA(约翰霍普金斯大学应用数学与统计学系) Department of Electrical and Computer Engineering, Carnegie Mellon University, PA, USA(卡内基梅隆大学电气与计算机工程系)

AI总结 提出T2S-MPC框架,通过时间嵌入和双时间尺度更新在线学习残差动力学模型,实现快速时变环境下的自适应模型预测控制,在四旋翼任务中优于经典和神经MPC方法。

详情
AI中文摘要

基于学习的模型预测控制(MPC)的最新进展利用神经网络进行在线模型学习,当非平稳系统动力学偏离标称模型时,取得了强劲的性能。然而,现有方法主要处理特定或相对结构化的动力学变化形式,对于更一般、未知且不可预测的时变动力学处理不足。为应对这一挑战,我们提出T2S-MPC框架,该框架在线自适应学习残差动力学模型,并将其与MPC框架内的标称模型集成,以实现快速演变的在线规划。为使模型具有时间感知能力,我们通过结构化时间嵌入显式编码时间信息,并采用双时间尺度更新方案,使控制器能够捕捉非平稳动力学,同时平衡快速适应与稳定学习。我们在二维四旋翼上评估了所提方法,在多种时变扰动(包括线性漂移和周期性扰动)下执行稳定和轨迹跟踪任务。实验结果表明,T2S-MPC在控制性能上始终优于经典MPC、神经MPC及消融变体,同时在没有额外调参的情况下,在广泛的扰动条件下展现出强鲁棒性。源代码公开于https://github.com/Zeyuu0920/T2S_MPC。

英文摘要

Recent advances in learning-based model predictive control (MPC) have leveraged neural networks for online model learning, achieving strong performance when nonstationary system dynamics deviate from nominal models. However, existing approaches primarily address specific or relatively structured forms of dynamical variation, leaving more general, unknown, and unpredictable time-varying dynamics insufficiently handled. To tackle this challenge, we propose T2S-MPC, a framework that adaptively learns a residual dynamics model online and integrates it with the nominal model within the MPC framework to enable fast-evolving online planning. To make the model time-aware, we explicitly encode temporal information through a structured time embedding and employ a two-timescale update scheme, allowing the controller to capture nonstationary dynamics while balancing rapid adaptation with stable learning. We evaluate the proposed method on a 2D quadrotor across stabilization and trajectory tracking tasks under diverse time-varying disturbances, including linear drifting and periodic perturbations. Experimental results show that T2S-MPC consistently outperforms classical MPC, neural MPC, and ablated variants in control performance, while also demonstrating strong robustness across a wide range of disturbance conditions without additional tuning. The source code is publicly available at https://github.com/Zeyuu0920/T2S_MPC

2605.24841 2026-05-26 cs.LG 版本更新

DriftingMol: Decoder-Coupled Drift for One-Pass Property-Conditional Molecular Generation

DriftingMol: 用于一次性属性条件分子生成的解码器耦合漂移

Jiangjie Qiu, Yijun Li, Wentao Li, Xiaonan Wang

发表机构 * Beijing Key Laboratory of Artificial Intelligence for Advanced Chemical Engineering Materials(北京先进化工材料人工智能重点实验室)

AI总结 提出 DriftingMol 两阶段框架,通过解码器耦合漂移将漂移模型适应于 SELFIES 潜在分子空间,实现低采样成本、高有效性和多样性的属性条件分子生成。

Comments 9 pages, 5 figures

详情
AI中文摘要

属性条件分子生成应在响应连续目标值的同时,以低采样成本生成有效且多样的分子。我们引入了 DriftingMol,一个两阶段框架,将漂移模型适应于 SELFIES 潜在分子空间。冻结的 SELFIES beta-VAE 提供潜在空间,其解码器的隐藏表示作为漂移特征图。在解码器耦合漂移中,解码器权重保持不变,但漂移梯度通过解码器特征图反向传播到 DiT 生成器,从而诱导出与分子解码对齐的拉回度量。在 ZINC250K 上,默认设置实现了 QED Spearman 相关系数 0.493,独特性 94.7%,而最强的解码器耦合条件达到 0.510。在协议匹配的四属性条件下,解码器耦合漂移的平均 Spearman 相关系数高达 0.598。在 15 个受控变体中,保留通过解码器特征的梯度路径的模型比测试的潜在空间、随机特征和外部特征漂移变体实现了更高的相关性,而分离或停止梯度的解码器控制导致 QED 相关性接近零且独特性极低。这些结果表明,解码器耦合漂移是一种有用的低成本机制,用于属性偏置分子生成,只需一次生成器评估和一次冻结解码器传递。

英文摘要

Property-conditional molecular generation should produce valid, diverse molecules while responding to continuous target values at low sampling cost. We introduce DriftingMol, a two-stage framework that adapts drifting models to a SELFIES latent molecular space. A frozen SELFIES beta-VAE provides the latent space, and the hidden representation of its decoder serves as the drift feature map. In decoder-coupled drift, decoder weights remain fixed, but drift gradients are backpropagated through the decoder feature map to a DiT generator, inducing a pullback metric aligned with molecular decoding. On ZINC250K, the default setting achieves QED Spearman correlation 0.493 with 94.7% uniqueness, while the strongest decoder-coupled condition reaches 0.510. Under protocol-matched four-property conditioning, decoder-coupled drift reaches mean Spearman correlation up to 0.598. Across 15 controlled variants, models that preserve the gradient path through decoder features achieve higher correlations than the tested latent-space, random-feature, and external-feature drift variants, while detached or stop-gradient decoder controls yield near-zero QED correlation and very low uniqueness. These results indicate that decoder-coupled drift is a useful low-cost mechanism for property-biased molecular generation, requiring one generator evaluation and one frozen decoder pass.

2605.24817 2026-05-26 cs.CR cs.AR cs.CL cs.LG 版本更新

RouteScan: A Non-Intrusive Approach to Auditing MoE LLMs Safety via Expert Routing Telemetry

RouteScan: 通过专家路由遥测对MoE大语言模型安全性进行非侵入式审计

Bo Lv, Zhiheng Xu, KeDong Xiu, Ruyi Ding, Tianhang Zheng, Zhibo Wang, Kui Ren

发表机构 * Zhejiang University(浙江大学) Donghua University(东华大学) Louisiana State University(路易斯安那州立大学)

AI总结 提出RouteScan,一种利用MoE模型GPU级专家路由遥测(如预填充阶段活跃线程数)作为微架构指纹,通过轻量级检测流水线识别恶意提示的非侵入式审计框架,在未见过的有害领域AUROC超0.93,新越狱包装下超0.96,且相比基于内容的审计方法具有隐私优势。

Comments 20 pages. Under submission

详情
AI中文摘要

混合专家(MoE)架构已成为扩展大型语言模型(LLM)日益重要的范式。随着MoE模型越来越多地部署在实际服务中,安全性审计变得必要,以验证这些模型在运行过程中是否产生或助长有害行为。然而,现有的基于内容的审计方法通常需要访问用户提示、模型输入或生成输出,可能暴露敏感用户信息,并在LLM安全性和用户隐私之间造成根本性紧张。另一方面,我们观察到,在MoE模型中,稀疏专家路由将不同输入映射到激活不同的专家执行模式,在低级GPU执行遥测中产生可测量的足迹。受此观察启发,我们提出RouteScan,一种通过GPU级专家路由遥测检测有害行为的非侵入式审计框架。具体而言,RouteScan利用预填充阶段分配给专家模块的活跃GPU线程数作为判别性微架构指纹,并构建轻量级检测流水线,隔离跨领域不变风险指标以精确识别恶意提示。对具有不同路由设计的开源MoE LLM的全面评估表明,RouteScan实现了强泛化,在未见过的有害领域AUROC超过0.93,在新型越狱包装下超过0.96。此外,经验性反演测试表明,收集的专家路由遥测为提示重建提供的信息有限,表明相对于基于内容的审计方法具有实际隐私优势。

英文摘要

Mixture-of-Experts (MoE) architectures have become an increasingly important paradigm for scaling Large Language Models (LLMs). As MoE models are increasingly deployed in real-world services, safety auditing becomes necessary to verify whether these models produce or facilitate harmful behaviors during operation. However, existing content-based auditing methods typically require access to user prompts, model inputs, or generated outputs, potentially exposing sensitive user information and creating a fundamental tension between LLM safety and user privacy. On the other hand, we observe that, in MoE models, sparse expert routing maps different inputs to different expert-execution patterns, producing measurable footprints in low-level GPU execution telemetry. Inspired by this observation, we propose RouteScan, a non-intrusive auditing framework for detecting harmful behaviors through GPU-level expert routing telemetry. Specifically, RouteScan uses the number of active GPU threads allocated to expert modules during the prefilling phase as a discriminative micro-architectural fingerprint, and builds a lightweight detection pipeline that isolates cross-domain invariant risk indicators to identify malicious prompts. Comprehensive evaluations on open-source MoE LLMs with distinct routing designs demonstrate that RouteScan generalizes well, with an AUROC exceeding 0.93 on unseen harmful domains and 0.96 under novel jailbreak wrappers. Moreover, empirical inversion tests show that the collected expert routing telemetry provides limited information for prompt reconstruction, suggesting a practical privacy advantage over content-based auditing methods.

2605.24810 2026-05-26 cs.LG cs.AI cs.RO stat.AP 版本更新

Cross-Domain Energy-Guided Diffusion Generation for Off-Dynamics Reinforcement Learning

跨域能量引导扩散生成用于动态偏移强化学习

Yu Yang, Yihong Guo, Anqi Liu, Pan Xu

发表机构 * Duke University(杜克大学) Johns Hopkins University(约翰霍普金斯大学)

AI总结 提出CEDGE框架,利用能量引导扩散模型生成目标域轨迹,解决动态偏移下离线强化学习的域适应问题。

Comments 29 pages, 3 figures, and 14 tables

详情
AI中文摘要

离动态离线强化学习旨在从大规模源数据集和有限目标数据集中学习目标域策略,但面临转移动态不匹配的问题。现有方法如奖励增强和数据过滤受限于源数据集,无法合成新的目标行为以改善超出收集源轨迹的覆盖范围。虽然近期基于模型的方法尝试通过学习目标感知动态来解决此问题,但生成的体验仅在转移层面构建,导致长时域上的累积误差。这些限制促使离动态离线RL转向轨迹级生成。我们提出CEDGE,一种跨域能量引导扩散生成框架。CEDGE在源域轨迹上训练轨迹扩散模型,并通过能量引导将生成样本适应到目标域。该引导通过最小化源域与期望目标域轨迹之间的分布不匹配得到,并分解为回报、域和行为能量成分。得到的能量引导轨迹既可用于直接规划,也可作为策略学习的合成数据。由于目标适应通过能量引导而非重新训练扩散模型实现,与先前方法相比,CEDGE能高效适应新的目标动态。在ODRL基准上的实验表明,轨迹级能量引导生成改善了动态偏移下的扩散规划,并产生提升下游目标策略学习的合成数据。

英文摘要

Off-dynamics offline reinforcement learning seeks to learn a target-domain policy from a large source dataset and a limited target dataset under mismatched transition dynamics. Existing approaches such as reward augmentation and data filtering are constrained to the source dataset and cannot synthesize new target behavior to improve coverage beyond the collected source trajectories. While recent model-based methods attempt to address this by learning target-aware dynamics, the generated experience is constructed only at the transition level, which leads to accumulated errors over long horizons. These limitations necessitate a shift toward trajectory-level generation for off-dynamics offline RL. We propose CEDGE, a Cross-domain Energy-guided Diffusion GEneration framework. CEDGE trains a trajectory diffusion model on source-domain trajectories and adapts the generated samples to the target domain through energy guidance. This guidance is derived by minimizing the distribution mismatch between the source and desired target-domain trajectories and is decomposed into return, domain, and behavior energy components. The resulting energy-guided trajectories are useful both for direct planning and as synthetic data for policy learning. Since target adaptation is achieved via energy guidance rather than by retraining the diffusion model, CEDGE can be adapted to new target dynamics more efficiently than previous methods. Experiments on the ODRL benchmark demonstrate that trajectory-level energy-guided generation improves diffusion planning under dynamics shifts and produces synthetic data that improves downstream target policy learning.

2605.24808 2026-05-26 cs.LG cs.AI 版本更新

Disentangled Double Machine Learning for Accurate Causal Effect Estimation

解缠双机器学习用于精确因果效应估计

Guodu Xiang, Kui Yu, Yujie Wang, Richang Hong, Fuyuan Cao, Jiye Liang

发表机构 * School of Computer Science and Information Engineering, Hefei University of Technology(合肥工业大学计算机科学与信息工程学院) School of Computer and Information Technology, Shanxi University(山西大学计算机与信息学院)

AI总结 提出解缠双机器学习(DDML),通过因果角色解缠和残差依赖正交化策略,解决高维或有限样本下双机器学习中因混淆因子未解缠导致的偏差和不稳定问题,在合成、半合成和真实数据集上优于13种基线方法。

Comments 15 pages, 9 figures

详情
AI中文摘要

混淆偏差是从观测数据中估计因果效应的一个关键挑战。双机器学习(DML)通过估计治疗和结果 nuisance 函数、构建治疗和结果残差,并从残差中估计因果效应来解决这一问题。然而,DML 在高维或有限样本场景中常常产生有偏和不稳定的估计。一个原因是 DML 使用所有协变量估计 nuisance 函数,而没有解缠不同的潜在因子,导致不可靠的 nuisance 函数估计。另一个原因是不精确的 nuisance 估计进一步引入了治疗残差与剩余结果误差之间的残差依赖,破坏了因果效应估计的准确性。为了解决这些问题,本文提出解缠双机器学习(DDML),一种整合两种关键策略的新算法。首先,因果角色解缠策略将协变量分解为混淆因子、治疗特有因子和结果特有因子,以实现可靠的 nuisance 函数估计。其次,残差依赖正交化策略减轻由 nuisance 估计误差引起的残差依赖,以增强因果效应估计的精度。在合成、半合成和真实数据集上的实验结果表明,DDML 在 MAE 和 RMSE 上均显著优于 13 种最先进的基线算法。

英文摘要

Confounding bias is a key challenge in causal effect estimation from observational data. Double Machine Learning (DML) addresses this issue by estimating treatment and outcome nuisance functions, constructing treatment and outcome residuals, and estimating causal effects from the residuals. However, DML often produces biased and unstable estimates in high-dimensional or finite-sample scenarios. One reason is that DML estimates nuisance functions using all covariates without disentangling distinct latent factors, resulting in unreliable nuisance function estimation. Another is that imprecise nuisance estimation further introduces residual dependence between the treatment residual and the remaining outcome error, undermining the accuracy of causal effect estimates. To address these issues, we propose Disentangled Double Machine Learning (DDML), a novel algorithm that integrates two key strategies. First, a causal role disentanglement strategy decomposes covariates into confounders, treatment-specific factors, and outcome-specific factors to enable reliable nuisance function estimation. Second, a residual dependence orthogonalization strategy mitigates residual dependence caused by nuisance estimation errors to enhance the precision of causal effect estimates. Experimental results on synthetic, semi-synthetic, and real-world datasets demonstrate that DDML significantly outperforms 13 state-of-the-art baseline algorithms in both MAE and RMSE.
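The residual-on-residual step of standard DML described in this abstract can be sketched roughly as follows. This is a minimal illustration of the generic partialling-out estimator with two-fold cross-fitting, not the DDML algorithm itself; the linear nuisance models, the function names `fit_line` and `dml_partial_out`, and the synthetic data are all assumptions made for the example.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def dml_partial_out(x, t, y, n_folds=2):
    """Cross-fitted residual-on-residual estimate of the effect of t on y."""
    n = len(x)
    t_res, y_res = [0.0] * n, [0.0] * n
    for k in range(n_folds):
        test = [i for i in range(n) if i % n_folds == k]
        train = [i for i in range(n) if i % n_folds != k]
        # Nuisance models are fit on the other fold(s) only.
        a_t, b_t = fit_line([x[i] for i in train], [t[i] for i in train])
        a_y, b_y = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in test:
            t_res[i] = t[i] - (a_t + b_t * x[i])
            y_res[i] = y[i] - (a_y + b_y * x[i])
    # Final stage: regress outcome residuals on treatment residuals (no intercept).
    return sum(tr * yr for tr, yr in zip(t_res, y_res)) / sum(tr * tr for tr in t_res)

# Synthetic data (assumed): x confounds both t and y; the true effect is 2.0.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(4000)]
t = [0.8 * xi + rng.gauss(0, 1) for xi in x]
y = [2.0 * ti + 1.5 * xi + rng.gauss(0, 1) for ti, xi in zip(t, x)]
theta_hat = dml_partial_out(x, t, y)
naive = fit_line(t, y)[1]  # ignores the confounder, so it is biased upward
```

With a single well-specified covariate the nuisance fits are accurate and the residual regression recovers the effect; the abstract's concern is the high-dimensional or small-sample case, where these nuisance fits degrade.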

2605.24803 2026-05-26 cs.LG 版本更新

Active Learning for Stochastic Contextual Linear Bandits

随机上下文线性老虎机的主动学习

Emma Brunskill, Ishani Karmarkar, Zhaoqi Li

发表机构 * Stanford University(斯坦福大学)

AI总结 提出一种通过主动采样上下文-动作对奖励来学习近最优策略的算法,理论上证明主动上下文采样可将最小最大率改进最多√d倍,并在华法林剂量预测和笑话推荐任务中验证了样本效率提升。

详情
AI中文摘要

随机上下文线性老虎机的一个关键目标是高效学习近最优策略。现有算法通过策略性地采样动作来学习策略,但被动地从底层上下文分布中采样上下文。然而,在许多实际场景中——包括在线内容推荐、调查研究、临床试验——从业者可以根据上下文分布的先前知识主动采样或招募上下文。尽管有这种主动学习的潜力,但策略性上下文采样在随机上下文线性老虎机中的作用尚未被充分探索。我们提出一种算法,通过策略性地采样上下文-动作对的奖励来学习近最优策略。我们证明了实例相关的理论保证,表明我们的主动上下文采样策略可以将最小最大率改进最多√d倍,其中d是线性维度。我们通过实验证明,我们的算法在学习近最优策略所需的样本数量上有所减少,例如在华法林剂量预测和笑话推荐任务中。

英文摘要

A key goal in stochastic contextual linear bandits is to efficiently learn a near-optimal policy. Prior algorithms for this problem learn a policy by strategically sampling actions but sample contexts naively (passively) from the underlying context distribution. However, in many practical scenarios (including online content recommendation, survey research, and clinical trials), practitioners can actively sample or recruit contexts based on prior knowledge of the context distribution. Despite this potential for active learning, the role of strategic context sampling in stochastic contextual linear bandits is underexplored. We propose an algorithm that learns a near-optimal policy by strategically sampling rewards of context-action pairs. We prove instance-dependent theoretical guarantees demonstrating that our active context sampling strategy can improve over the minimax rate by up to a factor of $\sqrt{d}$, where $d$ is the linear dimension. We show empirically that our algorithm reduces the number of samples needed to learn a near-optimal policy, in tasks such as warfarin dose prediction and joke recommendation.

2605.24786 2026-05-26 cs.LG cs.AI 版本更新

CONF-KV: Confidence-Aware KV Cache Eviction with Mixed-Precision Storage for Long-Horizon LLM

CONF-KV:面向长序列LLM的置信度感知KV缓存淘汰与混合精度存储

Yubo Li, Yidi Miao

发表机构 * Carnegie Mellon University(卡内基梅隆大学)

AI总结 提出CONF-KV方法,利用模型当前不确定性(置信度)动态调整KV缓存预算,结合混合精度存储和分块在线softmax注意力,在长序列推理中显著降低显存占用并保持高精度。

详情
AI中文摘要

长序列LLM推理使键值(KV)缓存成为GPU内存的主要消耗者,并使每个token的注意力计算越来越昂贵。许多常见的淘汰策略使用静态的最近窗口或历史注意力,忽略了每个解码步骤中计算出的一个信号:模型当前的不确定性。我们引入CONF-KV,一个KV缓存管理器,它将下一个token分布转换为标量置信度分数,并用它来选择每步缓存预算,在模型不确定时保留更多上下文,在模型确定时积极剪枝。在每个预算内,token根据累积注意力质量和最近性的组合进行排序,同时一个受保护的最近窗口保持局部连贯性。我们将该策略与分块在线softmax注意力、混合FP16/INT8存储以及金字塔式逐层预算变体相结合。在四个模型家族和生成长度高达4K的情况下,CONF-KV的显存占用接近固定的512 token滑动窗口,同时与完整KV相比,困惑度差异保持在1.5-2.1点以内。在长达32K token的“大海捞针”测试中,CONF-KV的检索准确率达到91.4%,而滑动窗口为53.8%,H2O为80.6%;在75个VisualWebArena任务中,它以2.8倍的峰值内存降低保留了完整KV成功率的95.3%。

英文摘要

Long-horizon LLM inference turns the key-value (KV) cache into the dominant GPU memory consumer and makes per-token attention increasingly expensive. Many common eviction policies use static recency windows or historical attention, leaving unused a signal computed on every decoding step: the model's current uncertainty. We introduce CONF-KV, a KV-cache manager that converts the next-token distribution into a scalar confidence score and uses it to choose the per-step cache budget, retaining more context when the model is uncertain and pruning aggressively when it is confident. Within each budget, tokens are ranked by a composite of accumulated attention mass and recency, while a protected recent window preserves local coherence. We combine the policy with blockwise online-softmax attention, mixed FP16/INT8 storage, and a pyramidal per-layer budget variant. Across four model families and generated lengths up to 4K, CONF-KV stays near the footprint of a fixed 512-token sliding window while remaining within 1.5-2.1 perplexity points of full KV. On Needle-in-a-Haystack up to 32K tokens, CONF-KV reaches 91.4% retrieval accuracy versus 53.8% for sliding windows and 80.6% for H2O; on 75 VisualWebArena tasks it retains 95.3% of full-KV success at 2.8 times lower peak memory.
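The confidence-to-budget idea in this abstract might look roughly like the sketch below. The abstract does not give the actual formulas, so everything specific here is an assumption: confidence is taken as one minus normalized entropy, the budget is a linear interpolation between a minimum and a maximum, and the composite score is attention mass plus a weighted recency term. The function names are hypothetical as well.

```python
import math

def confidence(probs):
    """Scalar confidence in [0, 1]: one minus the normalized entropy (an assumed choice)."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(len(probs))

def cache_budget(conf, min_budget, max_budget):
    """Low confidence keeps more tokens; high confidence prunes harder."""
    return round(max_budget - conf * (max_budget - min_budget))

def select_tokens(attn_mass, budget, protected=2, recency_weight=0.5):
    """Return the sorted cache positions to keep under the given budget."""
    n = len(attn_mass)
    recent = list(range(max(0, n - protected), n))  # protected window, always kept
    older = range(0, max(0, n - protected))

    def score(i):
        # Composite score: accumulated attention mass plus a recency bonus.
        return attn_mass[i] + recency_weight * (i / max(1, n - 1))

    ranked = sorted(older, key=score, reverse=True)
    keep = ranked[: max(0, budget - len(recent))]
    return sorted(keep + recent)

peaked = [0.97, 0.01, 0.01, 0.01]   # confident next-token distribution
flat = [0.25, 0.25, 0.25, 0.25]     # maximally uncertain distribution
attn = [0.9, 0.05, 0.4, 0.1, 0.02, 0.3, 0.2, 0.1]
kept_confident = select_tokens(attn, cache_budget(confidence(peaked), 3, 8))
kept_uncertain = select_tokens(attn, cache_budget(confidence(flat), 3, 8))
```

In this toy run the confident step keeps only four positions (the two protected ones plus the two highest-scoring older ones), while the uncertain step keeps all eight.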

2605.24779 2026-05-26 cs.LG cs.AI math.CO 版本更新

Complement Submodular Information Measures for Balanced and Robust Data Selection

互补子模信息度量用于平衡和鲁棒的数据选择

Rishabh Iyer

发表机构 * The University of Texas at Dallas(德克萨斯大学达拉斯分校)

AI总结 提出互补子模信息(CSI)目标函数,通过建模子集与其补集之间的共享结构信息,实现平衡且鲁棒的数据选择,并在理论上证明其近似单调性和贪心近似保证,实验表明在鲁棒隐藏切片感知子集选择中优于经典子模目标。

详情
AI中文摘要

子模优化已成为数据选择、检索、摘要和表示学习的基本范式,因为它能够建模覆盖度、多样性和代表性。然而,经典子模目标仅优化所选子集,并未明确保留所选子集与剩余数据之间的结构信息。在许多现代机器学习应用中,包括训练/验证/测试分割、基准构建和鲁棒子集选择,选择的质量关键取决于在所选子集及其补集之间保持平衡结构。在这项工作中,我们引入了互补子模信息(CSI),这是一类新的互补感知子模目标,用于量化子集与其补集之间的共享结构信息。我们的框架产生了几个经典子模函数的互补感知变体,包括设施选址、图割、LogDet、饱和覆盖、集合覆盖、概率集合覆盖和基于特征函数。我们分析了CSI目标的理论性质,并表明它们在有限曲率条件下表现出近似单调性,从而得到接近$(1-1/e)$的贪心近似保证。实验上,CSI目标在鲁棒隐藏切片感知子集选择中始终优于标准子模目标。特别是,CSI目标显著改善了相干稀有/尾部语义结构的保留,同时抑制了噪声和孤立异常值,从而显著提高了下游预测性能。合成实验进一步说明了不同的CSI实例如何捕获代表性、多样性、连通性和平衡邻域保留的互补概念。

英文摘要

Submodular optimization has become a fundamental paradigm for data selection, retrieval, summarization, and representation learning due to its ability to model coverage, diversity, and representativeness. However, classical submodular objectives optimize only the selected subset and do not explicitly preserve structural information between the selected subset and the remaining data. In many modern machine learning applications, including train/validation/test splitting, benchmark construction, and robust subset selection, the quality of a selection depends critically on preserving balanced structure across both the selected subset and its complement. In this work, we introduce Complement Submodular Information (CSI), a new class of complement-aware submodular objectives that quantify shared structural information between a subset and its complement. Our framework induces complement-aware variants of several classical submodular functions including Facility Location, Graph Cut, LogDet, Saturated Coverage, Set Cover, Probabilistic Set Cover, and Feature Based Functions. We analyze the theoretical properties of CSI objectives and show that they exhibit approximate monotonicity under bounded curvature conditions, leading to near-$(1-1/e)$ greedy approximation guarantees. Empirically, CSI objectives consistently outperform standard submodular objectives on robust hidden-slice-aware subset selection. In particular, CSI objectives significantly improve preservation of coherent rare/tail semantic structure while simultaneously suppressing noisy and isolated outliers, leading to substantially improved downstream predictive performance. Synthetic experiments further illustrate how different CSI instantiations capture complementary notions of representativeness, diversity, connectivity, and balanced neighborhood preservation.
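For orientation, the classical baseline that this abstract contrasts against can be sketched as greedy maximization of the Facility Location objective. This is the standard subset-only objective, not the complement-aware CSI variant proposed in the paper; the similarity matrix and function names are made up for the example.

```python
def facility_location(sim, subset):
    """f(S) = sum over all items i of max_{j in S} sim[i][j]; f(empty set) = 0."""
    if not subset:
        return 0.0
    return sum(max(row[j] for j in subset) for row in sim)

def greedy_select(sim, k):
    """Pick k items, each time adding the one with the largest marginal gain."""
    selected = []
    for _ in range(k):
        base = facility_location(sim, selected)
        candidates = [j for j in range(len(sim)) if j not in selected]
        best = max(
            candidates,
            key=lambda j: facility_location(sim, selected + [j]) - base,
        )
        selected.append(best)
    return selected

# Toy similarity matrix (assumed): items 0-2 form one cluster, items 3-4 another.
sim = [
    [1.0, 0.9, 0.8, 0.1, 0.0],
    [0.9, 1.0, 0.9, 0.1, 0.1],
    [0.8, 0.9, 1.0, 0.0, 0.1],
    [0.1, 0.1, 0.0, 1.0, 0.9],
    [0.0, 0.1, 0.1, 0.9, 1.0],
]
chosen = greedy_select(sim, 2)
```

Greedy first picks the most central item of the large cluster and then one representative of the small cluster. Note that the objective only scores the selected set; nothing in it looks at the structure left behind in the complement, which is the gap the abstract targets.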

2605.24774 2026-05-26 cs.LG physics.comp-ph 版本更新

Hermite-NGP: Gradient-Augmented Hash Encoding for Learning PDEs

Hermite-NGP:用于学习PDE的梯度增强哈希编码

Jinjin He, Zhiqi Li, Sinan Wang, Bo Zhu

发表机构 * Georgia Institute of Technology, Atlanta, GA, USA(佐治亚理工学院,亚特兰大,GA,美国)

AI总结 提出Hermite-NGP,一种梯度增强的多分辨率哈希编码,通过显式存储哈希网格顶点处的函数值和混合偏导数并利用Hermite插值实现解析梯度计算,从而快速准确地计算神经PDE求解器的空间导数,并引入多分辨率课程训练策略,在2D和3D PDE基准上实现高达约20倍误差降低和2-10倍收敛时间减少。

Comments Accepted by ICML 2026. Project page: https://jinjinhe2001.github.io/hermite-ngp/

详情
AI中文摘要

我们提出Hermite-NGP,一种梯度增强的多分辨率哈希编码,旨在实现神经PDE求解器空间导数的快速准确计算。与现有依赖自动微分或有限差分且存在不稳定或高成本的NGP方法不同,Hermite-NGP在哈希网格顶点处显式存储函数值和混合偏导数,从而通过Hermite插值实现梯度、雅可比矩阵和海森矩阵的完全解析计算。该设计在保持NGP的效率和空间自适应性的同时,支持高达二阶的解析微分算子。我们进一步引入一种类似于多重网格V-cycle的多分辨率课程训练策略,以实现从粗到细的优化。在一系列2D和3D PDE基准测试中,Hermite-NGP相比先前的神经PDE方法实现了高达约20倍的误差降低,并将与其他求解器相比的收敛时间缩短了2到10倍,对于多达1700万参数的模型,每个epoch的训练时间低至3.5毫秒。

英文摘要

We propose Hermite-NGP, a gradient-augmented multi-resolution hash encoding designed to enable fast and accurate computation of spatial derivatives for neural PDE solvers. Unlike existing NGP-based approaches that rely on automatic differentiation or finite differences and suffer from instability or high cost, Hermite-NGP explicitly stores function values and mixed partial derivatives at hash grid vertices, allowing fully analytic evaluation of gradients, Jacobians, and Hessians via Hermite interpolation. This design preserves the efficiency and spatial adaptivity of NGP while supporting analytic differential operators up to second order. We further introduce a multi-resolution curriculum training strategy analogous to multigrid V-cycles to enable coarse-to-fine optimization. Across a range of 2D and 3D PDE benchmarks, Hermite-NGP achieves up to approximately 20 times lower error than prior neural PDE methods, and reduces wall-clock convergence time by 2 to 10 times compared to other solvers, with per-epoch training times as low as 3.5 ms for models with up to 17M parameters.
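The core idea of storing values and derivatives at grid vertices and differentiating the interpolant analytically can be illustrated in one dimension. The sketch below is a plain cubic Hermite interpolant on a single cell; it omits hashing, multiple resolutions, and the mixed partial derivatives needed in 2D and 3D, and the function name `hermite_eval` is an assumption.

```python
def hermite_eval(x0, x1, f0, f1, d0, d1, x):
    """Cubic Hermite interpolant on [x0, x1] and its analytic first derivative.

    f0, f1 are the stored values and d0, d1 the stored derivatives at the two vertices.
    """
    h = x1 - x0
    s = (x - x0) / h
    # Standard cubic Hermite basis functions.
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    value = h00 * f0 + h10 * h * d0 + h01 * f1 + h11 * h * d1
    # Derivatives of the basis with respect to s, rescaled by ds/dx = 1/h.
    g00 = 6 * s**2 - 6 * s
    g10 = 3 * s**2 - 4 * s + 1
    g01 = -6 * s**2 + 6 * s
    g11 = 3 * s**2 - 2 * s
    grad = (g00 * f0 + g01 * f1) / h + g10 * d0 + g11 * d1
    return value, grad

def f(x):
    return x**3 - 2 * x

def df(x):
    return 3 * x**2 - 2

# A cubic is reproduced exactly, so value and gradient match the closed form.
value, grad = hermite_eval(0.0, 2.0, f(0.0), f(2.0), df(0.0), df(2.0), 0.5)
```

Because the gradient comes from differentiating the basis polynomials directly, no automatic differentiation pass or finite-difference stencil is needed to obtain it.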

2605.24771 2026-05-26 cs.CV cs.AI cs.LG 版本更新

From Theory to Decision Rule: Calibrating the Noisy-Label Crossover for Vision-Language Model Weak Supervision Across Three Medical-Imaging Benchmarks

从理论到决策规则:校准视觉-语言模型弱监督的噪声标签交叉点——基于三个医学影像基准

Bruce Changlong Xu, Jose James, Alexander Ryu

发表机构 * Department of Computer Science, Stanford University(计算机科学系,斯坦福大学)

AI总结 通过三个医学影像基准校准理论预测的噪声标签交叉点,提出基于少量金标标签的决策规则。

Comments 5 pages, 2 figures, 4 tables

详情
AI中文摘要

经典的噪声标签理论预测,弱监督下的下游性能上限是标注者的准确率,这意味着一个尖锐的交叉点:一旦金标训练的分类器达到标注者的水平,弱标签就会从帮助变为伤害。该预测是理论性的;缺少的是将其转化为现代基础模型标注者的实例级陈述的基准校准。我们针对BiomedCLIP生成的弱标签,在三个医学影像基准(PCAM、ISIC、NIH-CXR)和六个跨越11倍参数范围的下游架构上提供了这样的校准。理论预测的交叉点出现在PCAM上约100个样本,ISIC上20-50个,NIH-CXR上250-500个;交叉点以上的弱标签使AUC降低高达-0.10。对于五个预训练架构中的四个,交叉点位置与架构无关,而一个家族内的DenseNet扫描(2.5倍参数,相同预训练)支持了标注者(而非学生)是主要约束的观点。该校准进而产生一个可在10-20个金标标签下操作的决策规则:比较仅金标AUC与用户金标集上的VLM准确率。NIH-CXR上的结构化与随机噪声符号翻转表明,该界限的仅速率形式是不完整的,并确定了一个具体的改进(标签空间投影),未来的基准可以设计来测试它。

英文摘要

Classical noisy-label theory predicts that downstream performance under weak supervision is bounded above by the labeler's accuracy, implying a sharp crossover: once a gold-trained classifier matches the labeler, weak labels stop helping and start hurting. The prediction is theoretical; what is missing is a benchmark calibration that turns it into an instance-level statement for modern foundation-model labelers. We provide such a calibration for BiomedCLIP-generated weak labels on three medical-imaging benchmarks (PCAM, ISIC, NIH-CXR) and six downstream architectures spanning an 11x parameter range. The crossover predicted by theory appears at $n_g \approx 100$ on PCAM, 20-50 on ISIC, and 250-500 on NIH-CXR; weak labels above the crossover degrade AUC by up to 0.10. The location is architecture-invariant for four of five pretrained architectures, and a within-family DenseNet sweep (2.5x parameters, identical pretraining) supports the view that the labeler, not the student, is the dominant constraint. The calibration in turn produces a decision rule operable from 10-20 gold labels: compare gold-only AUC to VLM accuracy on the user's gold set. A structured-vs-random noise sign flip on NIH-CXR shows that the rate-only formulation of the bound is incomplete and identifies a concrete refinement (label-space projection) that future benchmarks can be designed to test.
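The decision rule stated at the end of this abstract can be written down in a few lines. The sketch below is one reading of that rule, assuming that weak labels are worth using only while the labeler's accuracy exceeds the gold-only classifier's AUC; the function names, the toy data, and the strict inequality are assumptions, not details taken from the paper.

```python
def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(preds, labels):
    """Fraction of hard predictions that match the gold labels."""
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)

def use_weak_labels(gold_only_auc, vlm_accuracy):
    """Weak labels are expected to help only while the labeler beats the gold-only model."""
    return gold_only_auc < vlm_accuracy

gold = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]                              # small gold set (toy data)
vlm_preds = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]                         # labeler's hard predictions
clf_scores = [0.9, 0.2, 0.8, 0.7, 0.3, 0.1, 0.6, 0.4, 0.95, 0.5]   # gold-only classifier scores
decision = use_weak_labels(auc(clf_scores, gold), accuracy(vlm_preds, gold))
```

In practice the gold-only AUC would presumably be estimated on held-out gold examples, for instance by cross-validation, since scoring a classifier on its own training labels is optimistic.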

2605.24770 2026-05-26 cs.LG cs.CV 版本更新

Muon in Vision Transformers: Optimizer-Recipe Interactions and Gradient Spectra

Muon在视觉Transformer中的应用:优化器-数据增强交互与梯度谱

Ben S. Southworth, Shuai Jiang, Daniel McBride, Eric C. Cyr, Stephen Thomas

发表机构 * Los Alamos National Laboratories(洛斯阿拉莫斯国家实验室) Sandia National Laboratories(桑迪亚国家实验室) Lehigh University(理海大学)

AI总结 研究Muon优化器在视觉Transformer训练中的表现,发现其优于AdamW,且增益依赖于数据增强,通过梯度奇异值分析揭示Muon与AdamW在注意力投影和深层前馈块中的谱差异。

Comments 25 pages, 15 figures

详情
AI中文摘要

Muon是一种最近开发的矩阵感知优化器,在Transformer训练中表现出色,但其在视觉Transformer(ViT)中的行为尚不明确。我们研究Muon在ViT训练中的应用,主要在ImageNet-100和Pl@ntNet-300K上,与AdamW在涉及mixup、cutmix、平滑以及随机增强和擦除的标准视觉方案下进行比较。Muon始终优于AdamW,在长尾Pl@ntNet宏观top-1上尤其显著。这些增益也依赖于数据增强方案,Muon从高级和显著的数据增强技术中获益远大于AdamW。为了理解这种交互,我们分析了整个ViT中矩阵梯度的奇异值结构。在Muon训练中,去除重度数据增强会导致训练后期梯度矩阵的谱集中和模式坍塌,主要发生在深层MLP-down块中。在固定的“完整”增强方案下,Muon与AdamW最明显的对比出现在QKV梯度中,其中AdamW梯度能量集中在更窄的基上,而Muon将能量分散到更多的奇异模式上。因此,ViT中的Muon最好理解为一种优化器-数据增强交互。在固定方案下,Muon与AdamW最明显的区别在于注意力投影,其梯度由更宽的谱基组成。在Muon内部,完整的训练方案对于防止深层前馈块中的后期谱集中和模式坍塌很重要。我们进一步展示了在图像分割和掩码自编码器模型上训练ViT的效果,Muon在所有考虑的设置中均优于AdamW。

英文摘要

Muon is a recently developed matrix-aware optimizer that has shown strong results in transformer training, but its behavior in vision transformers (ViTs) is not yet well understood. We study Muon for ViT training, largely on ImageNet-100 and Pl@ntNet-300K, comparing against AdamW under standard vision recipes involving mixup, cutmix, smoothing, and random augmentation and erasing. Muon consistently outperforms AdamW, with especially large gains on long-tailed Pl@ntNet macro top-1. These gains are also recipe-dependent, where Muon benefits much more than AdamW from advanced and significant data augmentation techniques. To understand this interaction, we analyze the singular-value structure of matrix gradients throughout the ViT. Within Muon training runs, removing heavy data augmentation induces a late-training spectral concentration and mode collapse in gradient matrices, primarily in deep MLP-down blocks. Under a fixed "full" augmentation recipe, the clearest Muon-AdamW contrast appears instead in QKV gradients, where AdamW gradient energy remains concentrated in a much narrower basis while Muon spreads energy across substantially more singular modes. Muon in ViTs is therefore best understood as an optimizer-recipe interaction. Under a fixed recipe, Muon differs from AdamW most clearly in attention projections, where its gradients consist of a broader spectral basis. Within Muon, a full training recipe is important for preventing late spectral concentration and mode collapse in deep feedforward blocks. We further demonstrate efficacy in training ViTs on image segmentation and masked autoencoder models, where Muon outperforms AdamW in all settings considered.

2605.24765 2026-05-26 cs.CR cs.LG 版本更新

CyberMaskQA: A Privacy-Aware Benchmark for Evaluating Large Language Models in Cybersecurity Question Answering

CyberMaskQA: 一个用于评估大语言模型在网络安全问答中隐私意识的基准

Matilda Gaddi, Jin Noh, Onat Gungor, Tajana Rosing

发表机构 * Department of Computer Science and Engineering(计算机科学与工程系) University of California, San Diego (UCSD)(加州大学圣地亚哥分校)

AI总结 针对现有基准缺乏隐私保护评估的问题,提出CyberMaskQA基准,通过结合人工场景与LLM语义扩展生成带隐私标签的数据集,以评估模型在网络安全问答中的推理与隐私保护能力。

详情
AI中文摘要

大型语言模型(LLM)越来越多地应用于网络安全问答(QA),用于事件响应和漏洞分析等关键任务。然而,现实世界的操作环境,包括系统日志和网络配置,本质上包含敏感标识符,例如IP地址、主机名和用户账户。在受监管的环境中,使用基于云的模型处理这些数据通常不安全或不可行。此外,隐私保护问答的进展因缺乏能够同时评估操作推理和隐私保护的带注释、上下文丰富的数据集而受阻。为解决这一差距,我们引入了CyberMaskQA,一个涵盖关键安全领域的隐私感知问答基准。与主要测试事实知识的现有基准不同,CyberMaskQA将问题置于现实的组织环境中,并具有资产和权限之间的显式因果依赖关系。通过系统化的流水线生成,该数据集结合了人工策划的基础场景与LLM驱动的语义扩展,为每个实例标注精确的私有实体标签,以实现可控的信息披露。对问答准确性和掩码性能的评估证明了该基准在开发可部署、上下文感知的网络安全模型以及促进隐私-效用权衡的细致研究方面的实用性。一经接受,我们将发布数据集和生成框架。

英文摘要

Large language models (LLMs) are increasingly applied to cybersecurity question answering (QA) for critical tasks such as incident response and vulnerability analysis. However, real-world operational contexts, including system logs and network configurations, inherently contain sensitive identifiers, e.g., IP addresses, host names, and user accounts. Processing this data with cloud-based models is often unsafe or infeasible in regulated environments. Furthermore, progress in privacy-preserving QA is hindered by the lack of annotated, context-rich datasets capable of jointly evaluating operational reasoning and privacy preservation. To address this gap, we introduce CyberMaskQA, a privacy-aware QA benchmark covering key security domains. Unlike existing benchmarks that primarily test factual knowledge, CyberMaskQA grounds questions in realistic organizational contexts with explicit causal dependencies among assets and privileges. Generated through a systematic pipeline, the dataset combines human-curated base scenarios with LLM-driven semantic expansion, annotating each instance with precise private entity labels to enable controlled information disclosure. Evaluations of QA accuracy and masking performance demonstrate the benchmark's utility for developing deployable, context-aware cybersecurity models and facilitating nuanced studies of privacy-utility trade-offs. Upon acceptance, we will release the dataset and the generation framework.

2605.24763 2026-05-26 cs.LG physics.flu-dyn 版本更新

High-fidelity Modeling of Full-scale Pressurized Water Reactor Flow Fields for Machine Learning Applications

面向机器学习应用的全尺寸压水堆流场高保真建模

Logan A. Burnett, Hyungjun Kim, Hsien-Cheng Chou, Arsha Witoelar, Robert A. Brewster, Benoit Forget, Emilio Baglietto, Majdi I. Radaideh

发表机构 * Department of Nuclear Engineering and Radiological Sciences, University of Michigan(密歇根大学核工程与辐射科学系) Department of Nuclear Science and Engineering, Massachusetts Institute of Technology(麻省理工学院核科学与工程系) Korea Atomic Energy Research Institute(韩国原子能研究所) Department of Mechanical Engineering, University of Michigan(密歇根大学机械工程系) Department of Computer Science and Engineering, University of Michigan(密歇根大学计算机科学与工程系)

AI总结 本研究利用高保真CFD模拟和机器学习模型,对四环路压水堆组件级流场进行表征,揭示了冷腿旋流和下腔室输运导致的入口流量分布不均匀性,并验证了ConvLSTM等空间感知架构在流场重建与预测中的优越性。

Comments 30 pages, 10 figures, and 6 Tables

详情
AI中文摘要

本工作提出了一个用于四环路压水堆组件级流动表征的高保真计算流体动力学和数据驱动建模框架。利用公开可用的几何和运行条件构建了完整的下腔室和堆芯入口域,实现了带有泵诱导旋流边界条件的瞬态模拟。结果表明,冷腿旋流和下腔室输运在堆芯下部区域产生强烈的非均匀组件级入口流量分布,而轴向阻力和混合作用逐渐使更高位置的流动均匀化。这些基于物理的数据集随后被用于评估机器学习在部分场重建和短期自回归预测中的应用。一个基于3D卷积的修复模型成功地从部分观测中重建了缺失的组件级质量流量,误差集中在高湍流底部层,并在上层显著减小。跨多个ML模型的比较分析表明,空间感知架构,特别是ConvLSTM,通过有效捕捉耦合的时空动态,显著优于基于序列的LSTM和算子学习DeepONet方法。研究还强调了关键挑战,包括入口流预测对湍流和网格分辨率的敏感性,以及缺乏全尺寸实验验证数据。尽管存在这些限制,结果仍与预期的物理行为一致。总体而言,本工作将高保真CFD确立为开发数据驱动代理模型、稀疏传感策略和未来多物理场耦合框架的关键基础。

英文摘要

This work presents a high-fidelity computational fluid dynamics (CFD) and data-driven modeling framework for assembly-level flow characterization in a four-loop pressurized water reactor (PWR). A full lower-plenum and core-inlet domain was constructed using publicly available geometry and operating conditions, enabling transient simulations with pump-induced swirl boundary conditions. The results show that cold-leg swirl and lower-plenum transport generate strongly heterogeneous assembly-wise inlet flow distributions, particularly near the lower core region, while axial resistance and mixing progressively homogenize the flow at higher elevations. These physics-informed datasets were subsequently used to evaluate machine learning (ML) applications for partial field reconstruction and short-term autoregressive prediction. A 3D convolution-based inpainting model successfully reconstructed missing assembly-level mass flow rates from partial observations, with errors concentrated in the highly turbulent base (bottom) layer and diminishing significantly in upper layers. Comparative analysis across multiple ML models demonstrates that spatially aware architectures, particularly ConvLSTM, significantly outperform sequence-based (LSTM) and operator-learning (DeepONet) approaches by effectively capturing coupled spatio-temporal dynamics. The study also highlights key challenges, including the sensitivity of inlet flow predictions to turbulence and mesh resolution, as well as the absence of full-scale experimental validation data. Despite these limitations, the results remain consistent with expected physical behavior. Overall, this work establishes high-fidelity CFD as a critical foundation for developing data-driven surrogates, sparse sensing strategies, and future multiphysics coupling frameworks.

2605.24759 2026-05-26 cs.LG 版本更新

A Contractive Feedback Semantics for Reinforcement Learning

强化学习的收缩反馈语义

Zuyuan Zhang

发表机构 * The George Washington University(乔治华盛顿大学)

AI总结 本文通过将单步决策过程视为开放随机组件,并利用收缩反馈环实现无限时域策略评估,建立了强化学习的组合语义,并推导出近似等价、状态抽象和合约规范的理论结果。

详情
AI中文摘要

折扣强化学习通常通过闭马尔可夫决策过程上的贝尔曼方程来呈现。本文发展了一种组合视角:将单步决策过程视为开放随机组件,并通过闭合收缩反馈环实现无限时域策略评估。由此产生的语义为开放组件分配了类型化的贝尔曼变换器,将串联和并联布线解释为变换器的复合和张量,并将反馈解释为由唯一不动点实现的可容许有界守护迹。这一视角产生了三个理论结果。第一,近似组件等价是对于可容许的良类型守护单孔上下文的上下文同余:局部算子误差在将组件插入周围电路后仍受控,该电路使用该孔一次且其反馈节点具有认证的均匀守护性。第二,精确和近似状态抽象成为交换或近交换的余代数图,从而给出值保持和显式 sup-norm 失真界。第三,在单调 ω-连续合约变换器语义下,安全性、风险和资源规范可以表示为量值值合约,其中局部归纳界通过最小不动点推理提升到布线和反馈中。其核心主张并非所有强化学习态射构成全局迹幺半范畴,而是折扣贝尔曼评估在守护电路的可容许类上允许收缩反馈语义。

英文摘要

Discounted reinforcement learning is usually presented through Bellman equations on closed Markov decision processes. This paper develops a compositional view: a one-step decision process is treated as an open stochastic component, and infinite-horizon policy evaluation is obtained by closing a contractive feedback loop. The resulting semantics assigns typed Bellman transformers to open components, interprets series and parallel wiring as composition and tensoring of transformers, and interprets feedback as an admissible guarded Banach trace realized by a unique fixed point. This perspective yields three theoretical consequences. First, approximate component equivalence is a contextual congruence for admitted well-typed guarded one-hole contexts: local operator error remains controlled after plugging the component into a surrounding circuit that uses the hole once and whose feedback nodes have certified uniform guardedness. Second, exact and approximate state abstractions become commuting or near-commuting coalgebraic diagrams, giving value-preservation and explicit sup-norm distortion bounds. Third, under monotone $ω$-continuous contract-transformer semantics, safety, risk, and resource specifications can be represented as quantale-valued contracts, where local inductive bounds lift through wiring and feedback by least-fixed-point reasoning. Its central claim is not that all RL morphisms form a global traced monoidal category, but that discounted Bellman evaluation admits a contractive feedback semantics on the admissible class of guarded circuits.
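The fixed-point reading of policy evaluation that this abstract builds on can be checked numerically on a tiny example. The sketch below covers only the textbook closed case (a two-state Markov reward process under a fixed policy), not the paper's open-component or categorical machinery; the transition matrix, rewards, and function names are illustrative assumptions.

```python
def bellman_operator(v, P, r, gamma):
    """(T v)(s) = r(s) + gamma * sum_t P[s][t] * v(t) for a fixed policy."""
    n = len(v)
    return [r[s] + gamma * sum(P[s][t] * v[t] for t in range(n)) for s in range(n)]

def sup_dist(u, v):
    """Sup-norm distance between two value vectors."""
    return max(abs(a - b) for a, b in zip(u, v))

def evaluate_policy(P, r, gamma, tol=1e-12):
    """Close the feedback loop: iterate T until the sup-norm change is below tol."""
    v = [0.0] * len(r)
    while True:
        v_next = bellman_operator(v, P, r, gamma)
        if sup_dist(v, v_next) < tol:
            return v_next
        v = v_next

# Two-state Markov reward process (toy numbers).
P = [[0.9, 0.1], [0.5, 0.5]]
r = [1.0, 0.0]
gamma = 0.9
v_star = evaluate_policy(P, r, gamma)

# Contraction check: T shrinks sup-norm distances by at least a factor of gamma.
u, w = [5.0, -3.0], [0.0, 2.0]
ratio = sup_dist(bellman_operator(u, P, r, gamma), bellman_operator(w, P, r, gamma)) / sup_dist(u, w)
```

The iteration converges to the unique solution of $v = r + \gamma P v$ (here $v \approx (8.59375, 7.03125)$), and the observed ratio stays below $\gamma$, which is what makes the fixed point unique by the Banach fixed-point theorem.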

2605.24754 2026-05-26 cs.CV cs.AI cs.LG 版本更新

Motion-Compensated Weight Compression

运动补偿权重压缩

Ismail Lamaakal

发表机构 * Multidisciplinary Faculty of Nador, Mohammed Premier University(穆罕默德一世大学纳祖尔多学科学院)

AI总结 提出运动补偿权重压缩(MCWC)方法,通过对齐置换对称块并利用层序预测和熵编码,有效压缩神经网络权重,在Transformer语言建模和视觉分类任务中提升率-精度帕累托前沿。

Comments 54 pages, 17 tables, 6 Figures

详情
AI中文摘要

神经网络权重日益成为部署的瓶颈,然而大多数压缩流水线独立处理各层,忽略了由函数保持对称性引起的跨层冗余。我们提出运动补偿权重压缩(MCWC),一种仅权重的编解码器,它对齐置换对称块(例如隐藏单元和注意力头)以最大化跨层对应,将深度转化为可预测序列。在对齐的坐标系中,MCWC使用带有周期性关键帧的轻量级层序预测器,并仅编码在率失真目标下训练的学习熵模型预测残差。一个简单的解码器通过熵解码、反量化、预测驱动重建和逆对齐来重建可部署的权重,从而实现快速权重物化以进行推理。在Transformer语言建模和视觉分类中,MCWC在强量化和学习权重编解码基线之上改善了率-精度帕累托前沿,同时保持有竞争力的解码时间。消融实验证实,对齐、预测、熵建模和关键帧调度对于获得全部增益都是必要的。我们的代码可通过 https://github.com/Ism-ail11/MCWC 获取。

英文摘要

Neural network weights are increasingly a bottleneck for deployment, yet most compression pipelines treat layers independently and overlook cross-layer redundancy induced by function-preserving symmetries. We propose Motion-Compensated Weight Compression (MCWC), a weight-only codec that aligns permutation-symmetric blocks (e.g., hidden units and attention heads) to maximize cross-layer correspondence, turning depth into a predictable sequence. In the aligned coordinate system, MCWC uses a lightweight layer-sequential predictor with periodic keyframes and encodes only quantized prediction residuals using a learned entropy model trained under a rate-distortion objective. A simple decoder reconstructs deployable weights by entropy decoding, dequantization, predictor-driven reconstruction, and inverse alignment, enabling fast weight materialization for inference. Across Transformer language modeling and vision classification, MCWC improves the rate-accuracy Pareto frontier over strong quantization and learned weight-codec baselines, while maintaining competitive decode time. Ablations confirm that alignment, prediction, entropy modeling, and keyframe scheduling are each necessary for the full gains. Our code is available via https://github.com/Ism-ail11/MCWC.

2605.24752 2026-05-26 cs.LG cs.CC cs.DS math.PR 版本更新

A computational phase transition for learning-to-sample from Ising models

从Ising模型中学习采样的计算相变

Andrej Risteski, Thuy-Duong Vuong

发表机构 * Machine Learning Department, Carnegie Mellon University(卡内基梅隆大学机器学习系) Department of Computer Science and Engineering, UC San Diego(加州大学圣地亚哥分校计算机科学与工程系)

AI总结 本研究构造了谱阈值以上的有界宽度Ising模型族,证明在标准密码学假设下学习采样是计算困难的,从而在谱阈值处建立了尖锐的计算相变。

详情
AI中文摘要

我们研究从Ising模型中学习采样(learning-to-sample),这是生成模型背后的基本算法任务;Ising模型是理论计算机科学和机器学习中算法思想的标准测试平台。给定未知目标分布的独立同分布样本,学习采样的目标是学习一个计算高效的生成过程,产生近似相同分布的新样本。我们构造了一个常界宽度的Ising模型族,该族恰好位于谱阈值$\lambda_{\max}(J)-\lambda_{\min}(J)=1$之上,并表明在标准密码学假设下,即使学习者获得模型的多项式多个独立同分布样本以及对其参数的显式访问,对该族的学习采样在计算上也是困难的。结合[AJKPV24,KLV25]的结果(表明谱阈值以下学习采样是可处理的),这建立了在谱阈值处的一个尖锐计算相变。此外,结合先前关于有界宽度Ising模型参数学习的结果[KM17,WSD19,VML20],这表明学习采样可能比参数学习更困难。最后,我们表明,对于这些困难实例,任何高效的学习者都表现出一种自然的记忆-幻觉二分法:学习者要么输出经过简单变换后与(变换后的)训练数据匹配的配置,要么将大量质量放在目标分布下概率可忽略的配置上。

英文摘要

We study learning-to-sample, a basic algorithmic task underlying generative modeling, for Ising models, a standard testbed for algorithmic ideas in both theoretical computer science and machine learning. Given i.i.d. samples of an unknown target distribution, the goal of learning-to-sample is to learn a computationally efficient generation procedure that produces new samples following approximately the same distribution. We construct a family of Ising models of constant bounded width which lie just beyond the spectral threshold $\lambda_{\max}(J)-\lambda_{\min}(J)=1$, and show that learning-to-sample for this family is computationally hard under standard cryptographic assumptions, even when the learner is given both polynomially many i.i.d. samples from the model and explicit access to its parameters. Combined with results of [AJKPV24,KLV25] showing tractability of learning-to-sample below the spectral threshold, this establishes a sharp computational phase transition at the spectral threshold. Moreover, combined with prior results on parameter learning for bounded-width Ising models [KM17,WSD19,VML20], this shows that learning-to-sample can be more difficult than parameter learning. Finally, we show that any efficient learner for these hard instances exhibits a natural memorization-hallucination dichotomy: the learner must either output configurations that, after a simple transformation, match the (transformed) training data or place substantial mass on configurations of negligible probability under the target distribution.

2605.24749 2026-05-26 stat.ML cs.LG 版本更新

How Neural Reward Models Learn Features for Policy Optimization: A Single-Index Analysis

神经奖励模型如何学习策略优化的特征:单指标分析

Rei Higuchi, Ryotaro Kawata, Akifumi Wachi, Shokichi Takakura, Kohei Miyaguchi, Taiji Suzuki

发表机构 * The University of Tokyo(东京大学) RIKEN AIP(理化学研究所AIP) LY Corporation(LY公司)

AI总结 本文通过高斯单指标模型分析两阶段神经奖励模型,研究指数奖励加权对特征学习的影响,并推导出倾斜策略价值差距的界限,给出可接受的部署温度范围。

Comments 35 pages

详情
AI中文摘要

奖励建模不仅是一个预测问题:在KL正则化策略优化中,学习到的奖励被指数化以定义部署策略,因此下游价值取决于奖励倾斜区域中的误差。我们在高斯单指标模型 $r^*(x) = σ^*(\langle θ^*, x\rangle)$ 且 $x \sim N(0, I_d)$ 下研究这种反馈。我们分析了一个两阶段神经奖励模型,该模型首先从奖励加权样本中学习隐藏方向 $θ^*$,然后通过加权岭回归拟合读出层。指数奖励加权改变了第一层可用的Hermite信号;对于任何高于无维度 $O(1)$ 阈值的特征学习温度 $β_1$,恒定比例的神经元恢复隐藏方向,弱恢复复杂度由生成指数控制。在特征恢复后,我们推导了理想化标签加权拟合(权重 $e^{y/β_2}$)和更实用的代理加权拟合(权重 $e^{r_{a_0}(x)/β_2}$)的倾斜策略价值差距界限。保持 $β_2$ 依赖性显式,得到一组可接受的部署温度,平衡降低 $β_2$ 带来的收益与指数加权放大的学习成本;在代理加权情况下,代理相关因子缩小了该可接受集。

英文摘要

Reward modeling is not only a prediction problem: in KL-regularized policy optimization, the learned reward is exponentiated to define the deployed policy, so downstream value depends on errors in reward-tilted regions. We study this feedback in a Gaussian single-index model with $r^*(x) = σ^*(\langle θ^*, x\rangle)$ and $x \sim N(0, I_d)$. We analyze a two-stage neural reward model that first learns the hidden direction $θ^*$ from reward-weighted samples and then fits the readout layer by weighted ridge regression. Exponential reward weighting changes the Hermite signal available to the first layer; for any feature-learning temperature $β_1$ above a dimension-free $O(1)$ threshold, a constant fraction of neurons recover the hidden direction, with weak-recovery complexity governed by the generative exponent. After feature recovery, we derive tilted-policy value-gap bounds for an idealized label-weighted fit with weights $e^{y/β_2}$ and a more practical surrogate-weighted fit with weights $e^{r_{a_0}(x)/β_2}$. Keeping the $β_2$-dependence explicit yields an admissible set of deployment temperatures, balancing the gain from lowering $β_2$ against the learning cost amplified by exponential weighting; in the surrogate-weighted case, proxy-dependent factors shrink this admissible set.
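The exponentiation mentioned in the first sentence of this abstract refers to the standard closed form for KL-regularized policy optimization, $\pi(x) \propto \pi_0(x)\, e^{r(x)/\beta}$. The sketch below evaluates that tilt over a small finite candidate set, which is a simplification (the paper works with Gaussian inputs); the function name and the numbers are assumptions.

```python
import math

def tilted_policy(base_probs, rewards, beta):
    """pi(x) proportional to pi0(x) * exp(r(x) / beta), over a finite candidate set."""
    m = max(rewards)  # subtract the max reward for numerical stability
    weights = [p * math.exp((r - m) / beta) for p, r in zip(base_probs, rewards)]
    z = sum(weights)
    return [w / z for w in weights]

base = [1 / 3, 1 / 3, 1 / 3]
rewards = [0.0, 1.0, 2.0]
warm = tilted_policy(base, rewards, beta=100.0)  # stays close to the base policy
cold = tilted_policy(base, rewards, beta=0.1)    # concentrates on the best candidate
```

Lowering the temperature pushes almost all mass onto the highest-reward candidate, so errors in the learned reward near that region matter most. That is the trade-off behind the admissible range of deployment temperatures described in the abstract.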

2605.24748 2026-05-26 astro-ph.SR cs.LG 版本更新

Deep Learning-Enabled Prediction of Geoeffective CMEs Using SOHO and SDO Observations

基于深度学习的日冕物质抛射地效性预测:利用SOHO和SDO观测数据

Zhaoxin Yan, Jason T. L. Wang, Haimin Wang, Harim Lee, Ju Jing, Yan Xu, Chunhui Xu, Vasyl Yurchyshyn

发表机构 * Institute for Space Weather Sciences(空间天气科学研究所) Department of Computer Science(计算机科学系) Center for Solar-Terrestrial Research(太阳-地球研究中心) Big Bear Solar Observatory(大熊太阳观测站)

AI总结 提出一种融合卷积神经网络和预测网络的模型,利用SOHO和SDO观测数据预测日冕物质抛射是否引发地磁暴及其概率,在五折交叉验证中TSS达0.703,Brier分数0.095。

Comments 23 pages, 12 figures, 4 tables

详情
AI中文摘要

理解和预测日冕物质抛射(CME)的地效性对于保护近地空间环境和地球上的基础设施至关重要。在本研究中,我们提出了一种新颖的融合模型来预测CME事件的地效性。我们的模型结合了用于特征学习的卷积神经网络和用于特征融合及事件分类的预测网络。该模型利用来自太阳和日球层天文台(SOHO)上的大角度光谱日冕仪(LASCO)以及太阳动力学天文台(SDO)上的大气成像组件(AIA)和日震与磁成像仪(HMI)的观测数据进行训练。然后,训练好的模型用于预测一个到达地球的CME是否会引起地磁暴,以及/或者该CME引起此类暴的概率。基于五折交叉验证方案的实验结果表明,我们的融合模型表现出良好的性能:当模型用作确定性预测工具时,平均真实技能统计(TSS)得分为0.703;当模型用作概率预测工具时,平均Brier得分为0.095,其中TSS得分为1或Brier得分为0表示完美性能。这项工作有助于预测太阳-地球相互作用中指向地球的CME与地磁暴之间的因果关系。

英文摘要

Understanding and forecasting the geoeffectiveness of a coronal mass ejection (CME) is crucial for protecting infrastructure in the near-Earth space environment and on Earth. In this study, we present a novel fusion model to forecast the geoeffectiveness of CME events. Our model combines convolutional neural networks for feature learning and a prediction network for feature fusion and event classification. The model is trained by observations from instruments including the Large Angle Spectroscopic Coronagraph (LASCO) on board the Solar and Heliospheric Observatory (SOHO) and the Atmospheric Imaging Assembly (AIA) and Helioseismic and Magnetic Imager (HMI) on board the Solar Dynamics Observatory (SDO). The trained model is then used to predict whether an Earth-reaching CME will cause a geomagnetic storm and/or the probability that the CME will cause such a storm. Experimental results based on a five-fold cross validation scheme demonstrate the good performance of our fusion model, achieving a mean true skill statistic (TSS) score of 0.703 when the model is used as a deterministic prediction tool, and a mean Brier score of 0.095 when the model is used as a probabilistic forecasting tool, where a TSS score of 1 or a Brier score of 0 indicates perfect performance. This work contributes to forecasting the causal relationship between Earth-directed CMEs and geomagnetic storms in solar-terrestrial interactions.
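The two metrics quoted in this abstract can be computed as follows. This is a minimal sketch using the standard definitions, TSS as hit rate minus false-alarm rate and the Brier score as mean squared error of the predicted probabilities; it is not the authors' evaluation code, and the function names are illustrative.

```python
def true_skill_statistic(y_true, y_pred):
    """TSS = TP / (TP + FN) - FP / (FP + TN) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    # Assumes both classes are present, otherwise a rate is undefined.
    return tp / (tp + fn) - fp / (fp + tn)

def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)
```

A perfect deterministic forecast gives a TSS of 1, and a perfect probabilistic forecast gives a Brier score of 0, matching the reference points stated in the abstract.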

2605.24743 2026-05-26 cs.LG cs.AI 版本更新

Bilevel Optimization of Synthetic Trajectories for Multi-Turn LLM Fine-Tuning

用于多轮LLM微调的合成轨迹的双层优化

Shresth Verma, Mauricio Tec, Cheol Woo Kim, Kai Wang, Milind Tambe

发表机构 * Harvard University(哈佛大学) Georgia Institute of Technology(佐治亚理工学院)

AI总结 提出BOOST双层优化框架,通过内层加权训练和外层轻量级重加权头学习,解决合成轨迹质量异质性导致的LLM多轮交互性能下降问题。

详情
AI中文摘要

虽然LLM在单轮生成中表现出色,但在长程多轮交互中表现不佳。离线强化学习提供了一种可扩展的方法,但其性能依赖于多轮轨迹数据的可用性和质量。一种常见的补救措施是使用LLM或模拟器生成的合成轨迹来增强训练,但合成数据的质量高度异质,天真地将所有轨迹视为同等信息量会降低性能。我们提出BOOST,一个双层优化框架,其中内层在重新加权的数据上训练LLM,外层在保留的真实验证任务上训练一个轻量级的重加权头,无需外部评判器即可分配连续的轨迹级权重。为了夯实这一方法,我们推导出一个PAC-Bayesian界,揭示了三方权衡:合成数据增加了多样性但存在任务偏移风险,而将权重集中在高质量轨迹上提高了经验性能但以有效样本量为代价。实验上,我们的方法一致优于多个基线。分析表明,它提高了与真实数据分布一致且具有更高定性价值的合成轨迹的权重。

英文摘要

While LLMs excel at single-turn generation, they struggle with long-horizon, multi-turn interactions. Offline reinforcement learning (RL) offers a scalable approach, yet its performance hinges on the availability and quality of multi-turn trajectory data. A common remedy is to augment training with synthetic trajectories generated by LLMs or simulators, but synthetic data is highly heterogeneous in quality, and naively treating all trajectories as equally informative can degrade performance. We propose BOOST, a bilevel optimization framework where the inner level trains the LLM on reweighted data and the outer level trains a lightweight reweighting head on held-out real validation tasks, assigning continuous trajectory-level weights without requiring an external judge. To ground this approach, we derive a PAC-Bayesian bound revealing a three-way trade-off: synthetic data increases diversity but risks task-shift, while concentrating weight on high-quality trajectories improves empirical performance at the cost of effective sample size. Empirically, our method consistently outperforms multiple baselines. Analysis reveals it upweights synthetic trajectories that align with the real data distribution and exhibit higher qualitative merit.
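The trade-off between concentrating weight and effective sample size can be made concrete with a small sketch. It assumes Kish's effective sample size, $(\sum_i w_i)^2 / \sum_i w_i^2$, which may differ from the quantity appearing in the paper's PAC-Bayesian bound, and a plain weighted average of per-trajectory losses as the inner-level objective. Both function names are hypothetical.

```python
def effective_sample_size(weights):
    """Kish effective sample size of non-negative trajectory weights."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq

def weighted_mean_loss(losses, weights):
    """Inner-level objective sketch: weighted average of per-trajectory losses."""
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)
```

Uniform weights over n trajectories give an effective sample size of n, while putting all weight on one trajectory gives 1, which is the cost side of upweighting only the best synthetic data.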

2605.24742 2026-05-26 cs.LG 版本更新

Aligning Molecular Graph Explanations with Chemical Identity via InChIfied Invariants

通过InChIfied不变量将分子图解释与化学身份对齐

Emanuele Guidotti, Sara Puglioli

发表机构 * University of Lugano(卢加诺大学) Philochem AG(Philochem公司)

AI总结 提出基于InChI的节点、边和图特征(InChIfied Invariants),确保化学等价分子图具有一致表示,从而提升预测和解释的一致性。

详情
AI中文摘要

在分子图上进行机器学习时,获得一致的解释需要预测和归因与化学身份对齐。然而,同一分子的化学等价图示可能产生不同的分子表示,导致不一致的预测和解释。在这里,我们引入了InChIfied不变量,这是一类基于国际化学标识符(InChI)的节点、边和图特征,设计为在保持化学身份的变换下具有不变性。使用来自PubChem Substances的一百万个分子图,我们表明InChIfied不变量在99.62%的情况下为化学等价图生成相同的表示,而标准的Daylight不变量仅在0.35%的情况下如此。在MoleculeNet任务中,InChIfied不变量在保持预测性能的同时,显著提高了同一分子不同图描绘之间的预测一致性。我们进一步进行了定量归因分析,并表明使用标准分子特征化方法产生的解释在化学等价图之间差异很大,而InChIfied不变量通过构造强制一致归因。我们发布了实现InChIfied不变量的开源软件,可作为标准分子图特征的即插即用替代品。

英文摘要

Obtaining consistent explanations for machine learning on molecular graphs requires predictions and attributions to be aligned with chemical identity. However, chemically equivalent drawings of the same molecule can induce different molecular representations, leading to inconsistent predictions and explanations. Here, we introduce InChIfied Invariants, a class of node, edge, and graph features based on the International Chemical Identifier (InChI) and designed to be invariant under transformations that preserve chemical identity. Using one million molecular graphs from PubChem Substances, we show that InChIfied Invariants produce identical representations for chemically equivalent graphs in 99.62% of cases, whereas standard Daylight invariants do so in only 0.35% of cases. Across MoleculeNet tasks, InChIfied Invariants preserve predictive performance while significantly improving prediction consistency across alternative graph depictions of the same molecules. We further perform a quantitative attribution analysis and show that explanations produced with standard molecular featurization methods vary substantially across chemically equivalent graphs, while InChIfied Invariants enforce consistent attributions by construction. We release open-source software implementing InChIfied Invariants, which can be used as a drop-in replacement for standard molecular graph features.

2605.24741 2026-05-26 math.ST cs.IT cs.LG math.IT stat.ML stat.TH 版本更新

On the Sample Complexity of Robust Binary Hypothesis Testing

关于鲁棒二元假设检验的样本复杂度

Shankar Vallinayagam, Ankit Pensia, Varun Jog

发表机构 * Department of Pure Mathematics and Mathematical Statistics, University of Cambridge(剑桥大学纯数学与数学统计系) Department of Statistics, Carnegie Mellon University(卡内基梅隆大学统计系)

AI总结 研究在三种污染模型下鲁棒二元假设检验的样本复杂度,证明最不利分布的存在性并给出显式公式,揭示样本复杂度对污染参数的不稳定性,并建立不同模型间样本复杂度的可比性。

Comments Comments welcome

详情
AI中文摘要

我们研究了在三种标准污染模型下鲁棒二元假设检验的样本复杂度:$\varepsilon$-加性(Huber)、$\varepsilon$-减性和$\varepsilon$-全变差(TV),分别记为$n^*_{\mathrm{Hub}}(\varepsilon)$、$n^*_{\mathrm{Sub}}(\varepsilon)$和$n^*_{\mathrm{TV}}(\varepsilon)$。对于减性污染,我们证明最不利分布存在并给出显式公式,使该模型与经典的Huber和TV模型一致。接下来我们表明,在所有三种模型中,样本复杂度可能在污染参数$\varepsilon$上高度不稳定,即使对于$o(\varepsilon)$的扰动也会增加多项式因子。类似地,当$\varepsilon$精确已知与仅知道$o(\varepsilon)$误差时,样本复杂度之间可能存在多项式因子差距。尽管所有模型中样本复杂度不稳定,但我们表明,在$\varepsilon$的常数因子重新缩放下,各模型的样本复杂度是可比较的。具体地,对于任意固定的$\delta_0>0$,以下对所有分布$p$和$q$成立:(i) $n^*_{\mathrm{Hub}}(\varepsilon) \lesssim n^*_{\mathrm{TV}}(\varepsilon) \lesssim n^*_{\mathrm{Hub}}(2\varepsilon)$,(ii) $n^*_{\mathrm{Sub}}(\varepsilon) \lesssim n^*_{\mathrm{TV}}(\varepsilon) \lesssim n^*_{\mathrm{Sub}}((2+\delta_0)\varepsilon)$,(iii) $n^*_{\mathrm{Sub}}(\varepsilon) \lesssim n^*_{\mathrm{Hub}}(\varepsilon) \lesssim n^*_{\mathrm{Sub}}((1+\delta_0)\varepsilon)$,且缩放常数是紧的。最后,我们将结果扩展到污染模型的自适应版本。

英文摘要

We study the sample complexity of robust binary hypothesis testing under three standard contamination models: $\varepsilon$-additive (Huber), $\varepsilon$-subtractive, and $\varepsilon$-total variation (TV), denoted by $n^*_{\mathrm{Hub}}(\varepsilon)$, $n^*_{\mathrm{Sub}}(\varepsilon)$, and $n^*_{\mathrm{TV}}(\varepsilon)$, respectively. For subtractive contamination, we show that least favourable distributions exist and provide explicit formulas for the same, bringing this model in line with the classical Huber and TV models. Next we show that in all three models, sample complexity may be highly unstable in the contamination parameter $\varepsilon$, increasing by polynomial factors even for $o(\varepsilon)$ perturbations. Similarly, there may be polynomial factor gaps between the sample complexities when $\varepsilon$ is known exactly versus when it is known up to $o(\varepsilon)$ error. Despite the instability of the sample complexity in all models, we show that the sample complexities across models are comparable up to constant-factor rescaling of $\varepsilon$. Specifically, for any fixed $δ_0>0$, the following hold for all distributions $p$ and $q$: (i) $n^*_{\mathrm{Hub}}(\varepsilon) \lesssim n^*_{\mathrm{TV}}(\varepsilon) \lesssim n^*_{\mathrm{Hub}}(2\varepsilon)$, (ii) $n^*_{\mathrm{Sub}}(\varepsilon) \lesssim n^*_{\mathrm{TV}}(\varepsilon) \lesssim n^*_{\mathrm{Sub}}((2+δ_0)\varepsilon)$, and (iii) $n^*_{\mathrm{Sub}}(\varepsilon) \lesssim n^*_{\mathrm{Hub}}(\varepsilon) \lesssim n^*_{\mathrm{Sub}}((1+δ_0)\varepsilon)$, and the scaling constants are tight. Finally, we extend our results to adaptive versions of the contamination models.

2605.24740 2026-05-26 cs.LG cs.GT 版本更新

Reinforcement Learning for Reachability: Guaranteeing Asymptotic Optimality

可达性的强化学习:保证渐近最优性

Amogh Palasamudram, Jakub Svoboda, Suguman Bansal, Krishnendu Chatterjee

发表机构 * Institute of Science and Technology, Austria(奥地利科学与技术研究所) Georgia Institute of Technology, USA(美国佐治亚理工学院) Dartmouth College, USA(美国达特茅斯学院)

AI总结 针对可达性规格的强化学习,提出一种基于PAC学习的迭代方法,在无需已知MDP内部参数的情况下实现渐近最优策略,并通过实验验证收敛动态。

Comments Main text and appendix of work accepted in ICML 2026

详情
AI中文摘要

强化学习(RL)在可达性规格中的应用是序列决策的基础,但理论保证仍较少探索。最近的工作实现了向最优策略的渐近收敛。然而,该方法对收敛动态的洞察有限。在这项工作中,我们提出了一种替代方法,提供了对收敛更深入的理论洞察。我们的方法基于带有假设的PAC学习。PAC学习保证在有限时间内以高置信度获得接近最优的策略,但需要知道内部MDP参数,如最小转移概率。我们认为,虽然这些参数在RL中是未知的,但它们可以迭代地细化并以递增的精度估计。通过迭代满足PAC条件,我们证明了在极限情况下可以实现精确最优性。在标准基准上的实证评估验证了我们对收敛动态的理论洞察。

英文摘要

Reinforcement learning (RL) for reachability specifications is fundamental in sequential decision-making, yet theoretical guarantees remain less explored. A recent work achieves asymptotic convergence to optimal policies. However, this approach provides limited insight into convergence dynamics. In this work, we present an alternative approach that provides deeper theoretical insights into convergence. Our approach builds on PAC learning with assumptions. PAC learning guarantees near-optimal policies with high confidence in finite time but requires knowing internal MDP parameters like minimum transition probability. We argue that while these parameters are unknown in RL, they can be iteratively refined and estimated with increasing accuracy. By iteratively satisfying PAC conditions, we show that exact optimality can be achieved in the limit. Empirical evaluations on standard benchmarks validate our theoretical insights into convergence dynamics.

2605.24712 2026-05-26 cs.LG cs.HC 版本更新

Hardware-Aware Federated Learning for Speech Emotion Recognition

面向语音情感识别的硬件感知联邦学习

Beyazit Bestami Yuksel, Emrah Dikbiyik

发表机构 * Computer Engineering(计算机工程) Istanbul Technical University(伊斯坦布尔技术大学) Department of Computer Technologies(计算机技术系) Istanbul University-Cerrahpaşa(伊斯坦布尔大学-塞拉赫帕沙)

AI总结 提出一种硬件感知联邦学习框架,通过硬件性能分析、Top-K客户端选择和自适应本地轮数,在IEMOCAP数据集上实现情感识别,相比FedAvg减少约36.5%训练时间和40%通信成本。

Comments 4 pages, 3 figures, 4 Tables

详情
AI中文摘要

联邦学习(FL)能够在分布式边缘设备间进行隐私保护的协作训练,但实际部署中涉及具有不同处理能力、内存容量和通信延迟的异构客户端,这通常会增加轮次持续时间和系统成本。本文提出一种硬件感知的联邦学习框架,用于在会话划分的IEMOCAP数据集上进行情感识别,该框架在统一训练循环中集成了硬件性能分析、Top-K客户端选择和自适应本地轮数。我们在非独立同分布设置下将该方法与FedAvg、FedProx和随机Top-K选择进行比较,结果表明,在50个联邦轮次和5次独立试验中,所提方法达到了具有竞争力的验证准确率(0.352),总训练时间相比FedAvg减少约36.5%,累积通信成本降低40%。

英文摘要

Federated learning (FL) enables privacy-preserving collaborative training across distributed edge devices, but real deployments involve heterogeneous clients with different processing power, memory capacity, and communication latency, which often increase round duration and system cost. This paper proposes a hardware-aware federated learning framework for emotion recognition on session-partitioned IEMOCAP that integrates hardware profiling, top-K client selection, and adaptive local epochs within a unified training loop. We compare the method against FedAvg, FedProx, and random top-K selection under a non-IID setup and show that, across 50 federated rounds and 5 independent trials, the proposed approach achieves competitive validation accuracy (0.352), reduces total training time by about 36.5% compared to FedAvg, and lowers cumulative communication cost by 40%.
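A rough sketch of how hardware profiling, top-K selection, and adaptive local epochs might fit together is given below. The abstract does not specify the scoring rule, so everything here is an assumption: each client is profiled by an estimated time per local epoch, the K fastest clients are selected, and each selected client runs as many epochs as fit in a per-round time budget. The names `select_top_k` and `adaptive_epochs` are hypothetical.

```python
def select_top_k(profiles, k):
    """Pick the k clients with the lowest estimated seconds per local epoch.

    profiles: dict mapping client id -> estimated seconds per epoch (assumed metric).
    """
    ranked = sorted(profiles, key=lambda cid: (profiles[cid], cid))
    return ranked[:k]

def adaptive_epochs(profiles, selected, time_budget, max_epochs=5):
    """Give each selected client as many local epochs as fit in the round budget."""
    plan = {}
    for cid in selected:
        fits = int(time_budget // profiles[cid])
        # Always run at least one epoch and never more than max_epochs.
        plan[cid] = max(1, min(max_epochs, fits))
    return plan
```

Capping the number of selected clients per round is one plausible way to reduce cumulative communication, since fewer model updates are exchanged each round.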

2605.24710 2026-05-26 cs.LG math.PR math.ST stat.ML stat.TH 版本更新

Feature Learning in Wide Neural Networks under $μ$P: Identifiability and Sparse-Dictionary Decomposition of the Mean-Field Limit

μP 下宽神经网络中的特征学习:平均场极限的可辨识性与稀疏字典分解

Akmal Xodarev

发表机构 * Independent Researcher(独立研究者)

AI总结 本文在最大更新参数化(μP)下,针对宽两层神经网络,建立了特征学习的四个结构结果,包括平均场极限的全局存在唯一性、可辨识性刻画、稀疏字典分解以及总特征学习误差分解,并揭示了架构-数据对的自然学习单元。

Comments 86 pages

详情
AI中文摘要

我们在最大更新参数化($μ$P)下,为宽两层神经网络中的特征学习建立了四个结构结果。 第一,我们证明了在$μ$P下带噪声梯度下降的平均场极限的全局存在唯一性,确定了初始化矩序列上的最大可容许权重$w^*$作为参数-矩增长边界的倒数,从而也是流传播的最大加权矩类。有限粒子近似具有关于时间的均匀平方Wasserstein速率$O(N^{-1})$。 第二,我们刻画了平均场极限的可辨识性:两个可容许参数测度在$L^2$中诱导相同的网络函数当且仅当它们的活跃分量在模去架构的有限秩实现对称性后一致。轨道深度$D^*_{\mathrm{orb}}$与矩簇深度$D^*_{\mathrm{var}}$不同。 第三,在Barron-Hermite目标条件下,长时间极限测度的活跃支撑集允许一个稀疏字典分解:它在模去有限秩实现对称性后至多支撑在$S^*$个原子上,其中$S^*$由一个显式的系数阈值数界定。 第四,我们将总特征学习误差分解为统计、优化、混沌传播和稀疏残差分量,其中目标相关的Hermite/Barron尾部取代了任何仅初始化的残差。 这四个结果通过一个架构恒等式联系在一起:三元组$(w^*, D^*_{\mathrm{orb}}, S^*)$——最大可容许权重、轨道可辨识深度以及目标可实现时的稀疏字典深度——是架构-数据对$(\sigma, \rho)$的自然学习单元。证明是自包含的,除了来自$μ$P和平均场Langevin理论的标准结果。

英文摘要

We establish four structural results for feature learning in wide two-layer neural networks under the Maximal Update Parametrization ($μ$P). First, we prove global existence and uniqueness of the mean-field limit of noisy gradient descent under $μ$P, identifying the maximal admissible weight $w^*$ on the moment sequence of the initialization as the reciprocal parameter-moment-growth boundary, and hence the largest weighted moment class propagated by the flow. The finite-particle approximation has uniform-in-time squared-Wasserstein rate $O(N^{-1})$. Second, we characterize identifiability of the mean-field limit: two admissible parameter measures induce the same network function in $L^2$ exactly when their active components agree modulo the finite-rank realization symmetry of the architecture. The orbit depth $D^*_{\mathrm{orb}}$ is separated from the moment-variety depth $D^*_{\mathrm{var}}$. Third, under the Barron-Hermite target condition the active support of the long-time limit measure admits a sparse-dictionary decomposition: it is supported on at most $S^*$ atoms modulo finite-rank realization symmetry, with $S^*$ bounded by an explicit coefficient-threshold number. Fourth, we derive the total feature-learning-error decomposition into statistical, optimization, propagation-of-chaos, and sparse-residual components, with a target-dependent Hermite/Barron tail replacing any initialization-only residual. The four results are tied together by an architectural identity: the triple $(w^*, D^*_{\mathrm{orb}}, S^*)$ -- the maximal admissible weight, the orbit identifiability depth, and the sparse-dictionary depth at which the target is realizable -- is the natural learning cell of the architecture-data pair $(σ, ρ)$. The proofs are self-contained except for standard results from $μ$P and mean-field Langevin theory.

2605.24709 2026-05-26 cs.LG 版本更新

Streaming Reinforcement Learning under Partial Observability with Real-Time Recurrent Learning

部分可观测下的流式强化学习与实时循环学习

Noah Farr, Aryaman Reddi, Carlo D'Eramo, Jan Peters

发表机构 * Technical University of Darmstadt(达姆施塔特工业大学) University of Würzburg(维尔茨堡大学) German Research Center for AI (DFKI)(德国人工智能研究中心 (DFKI)) Zuse School(祖斯学校)

AI总结 提出使用递归迹单元(RTU)实现精确实时循环学习(RTRL),在参数数量上具有线性时间和内存复杂度,解决了部分可观测环境下流式强化学习的梯度计算瓶颈,并在离散和连续控制任务中保持性能。

Comments 16 pages, 4 figures

详情
AI中文摘要

流式强化学习已成为一种在线学习范式,它符合自然学习代理的约束,即增量处理数据(批大小为1,无回放缓冲区)。虽然流式RL最近在完全可观测下通过深度函数逼近实现了扩展,但部分可观测设置仍然难以实现。在流式设置下,截断式时间反向传播退化为一步梯度视野,而精确的实时循环学习则代价过高。我们使用递归迹单元(一种对角递归架构,能够在参数数量上实现线性时间和内存复杂度的精确RTRL)来弥合这一差距,并展示它们能够干净地集成到现有的流式算法中,适用于离散和连续控制。在链长从2到128的MemoryChain诊断任务中,我们的方法保持了性能,而使用前馈、GRU和RTU网络的流式TBPTT(1)基线则崩溃。在五个POPGym任务和部分可观测的MuJoCo连续控制中,流式方法在POPGym上与批量PPO竞争,并在掩码MuJoCo上恢复了批量性能的很大一部分,尽管没有使用回放缓冲区或批量更新。

英文摘要

Streaming reinforcement learning has emerged as an online learning paradigm that conforms to the restrictions of natural learning agents that process data incrementally, i.e. with a batch size of 1 and no replay buffer. While streaming RL has recently been shown to scale with deep function approximation with full observability, partially observable settings have remained out of reach. Truncated backpropagation through time collapses to a one-step gradient horizon under the streaming setting, and exact real-time recurrent learning is prohibitively expensive. We close this gap using recurrent trace units, a diagonal recurrent architecture that enables exact RTRL with linear time and memory complexity in the parameter count, and show that they integrate cleanly into existing streaming algorithms across both discrete and continuous control. On a MemoryChain diagnostic with chain lengths from 2 to 128, our method sustains performance where streaming TBPTT(1) baselines using feedforward, GRU, and RTU networks collapse. On five POPGym tasks and on partially observable MuJoCo continuous control, the streaming approach is competitive with batched PPO on POPGym and recovers a substantial fraction of batched performance on masked MuJoCo, despite using no replay buffer or batched updates.
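Why a diagonal recurrence makes exact RTRL cheap can be seen in a simplified sketch. It assumes a real-valued linear recurrence $h_t = λ \odot h_{t-1} + W x_t$, whereas the paper's recurrent trace units may differ in parameterization and nonlinearity. Because unit $i$ depends only on $λ_i$ and row $i$ of $W$, the sensitivity traces have the same shape as the parameters. The function name `rtrl_diagonal` is hypothetical.

```python
def rtrl_diagonal(lam, W, xs):
    """Run h_t = lam * h_{t-1} + W x_t while carrying exact sensitivities forward.

    Returns the final state h, dh[i]/dlam[i], and dh[i]/dW[i][j].
    Memory for the traces is linear in the number of parameters.
    """
    n, m = len(W), len(W[0])
    h = [0.0] * n
    d_lam = [0.0] * n
    d_W = [[0.0] * m for _ in range(n)]
    for x in xs:
        for i in range(n):
            # Traces use the previous state, so update them before overwriting h[i].
            d_lam[i] = h[i] + lam[i] * d_lam[i]
            for j in range(m):
                d_W[i][j] = x[j] + lam[i] * d_W[i][j]
            h[i] = lam[i] * h[i] + sum(W[i][j] * x[j] for j in range(m))
    return h, d_lam, d_W
```

In a streaming learner, the gradient of a per-step loss would be obtained by multiplying the loss gradient with respect to `h` by these traces, with no stored history and no backward pass through time.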

2605.24699 2026-05-26 cs.AI cs.LG 版本更新

MDIA: A Multi-Agent Diagnostic Intelligence Pipeline on HealthBench Professional

MDIA:HealthBench Professional上的多智能体诊断智能流水线

Roberto Cruz, David Rey-Blanco

发表机构 * TietAI

AI总结 提出MDIA多智能体诊断系统,通过7节点专业路由临床推理图架构,在非微调LLM上实现HealthBench Professional基准性能提升3.72个百分点,归因于系统架构设计而非提示工程。

Comments 33 pages, 10 figures

详情
AI中文摘要

大多数关于agentic-LLM临床基准测试的报告收益通常归因于提示工程,但我们的结果表明,更大的改进可能来自架构和引擎级别的设计。我们提出了MDIA,一个多智能体诊断智能体,实现为7节点专业路由临床推理图,在完整的HealthBench Professional基准测试(n=525)上,使用非微调LLM。MDIA在OpenAI的GPT-5.4-2026-03-05下达到0.6272,比OpenAI的ChatGPT for Clinicians的性能高出3.72个百分点。实验工作表明,性能提升归因于系统架构:专业路由、多轮上下文保留、药物状态安全门控、站点过滤搜索、长度感知合成和引擎级可靠性。这些发现支持了agentic临床基准性能由底层基础模型和编排架构共同塑造的观点。然而,我们也注意到在使用其他模型作为评分器时存在显著差异;特别是,当使用Gemini 2.5 Pro时,MDIA得分为0.6585,这表明评分器的选择是变异性来源。因此,对LLM的稳健评估需要跨多个独立评分器模型进行评估。

英文摘要

Most reported gains on agentic-LLM clinical benchmarks are often attributed to prompt engineering, yet our results suggest that larger improvements can come from architectural and engine-level design. We present MDIA, a Multi-agent Diagnostic Intelligence Agent implemented as a 7-node specialty-routed clinical reasoning graph, on the full HealthBench Professional benchmark (n = 525), on a non-fine-tuned LLM. MDIA achieves 0.6272 under OpenAI's GPT-5.4-2026-03-05, which is +3.72 pp above the performance of OpenAI's ChatGPT for Clinicians. The experimental work shows that performance lift is attributable to system architecture: specialty routing, multi-turn context preservation, drug-state safety gating, site-filtered search, length-aware synthesis, and engine-level reliability. These findings support the view that agentic clinical benchmark performance is shaped both by the underlying foundation model and the orchestration architecture. Nevertheless, we also noticed notable differences when using other models as a grader; in particular, when using Gemini 2.5 Pro, MDIA scored 0.6585, which suggests that the choice of grader is a source of variability. Robust evaluation of LLMs would therefore require assessment across several independent grader models.

2605.24696 2026-05-26 cs.CR cs.LG 版本更新

CALIBURN: A Regime-Sensitivity Study of Operationally Calibrated Streaming Intrusion Detection

CALIBURN: 操作校准流式入侵检测的机制敏感性研究

Michel A. Youssef

发表机构 * Independent Researcher(独立研究者)

AI总结 本文提出CALIBURN流式告警流水线,通过贝叶斯变化点检测、保序校准、成本敏感阈值、共形风险控制和多窗口消耗速率告警五个组件,在不同攻击率场景下评估其性能,在罕见攻击场景下AUC-PR达到0.943。

Comments 55 pages, 5 figures, 14 tables. Under review at Cyber Security and Applications. Code: https://github.com/MichelYsf/rcbsid-paper. Archived release: https://doi.org/10.5281/zenodo.20074590

详情
AI中文摘要

流式网络入侵检测系统必须持续处理流并保持内存有界,但大多数现有方法将告警阈值选择视为事后调优问题,不适合生产环境。操作员需要在部署前使用误报成本、漏报成本和告警预算等输入来指定告警行为。本文提出CALIBURN,一个由五个组件组成的流式告警流水线:截断贝叶斯在线变化点检测器、将变化点后验映射到经验条件攻击概率的保序校准层、从操作员指定的误分类成本导出的成本敏感决策阈值、将告警预算规范转换为可交换性下窗口内有效阈值的共形风险控制包装器,以及从站点可靠性工程实践改编的多窗口消耗速率告警层。我们不声称统一优势,而是将CALIBURN作为机制敏感性研究,在三个攻击率场景下评估流水线:LITNET-2020(5.2%)、CICIDS2017(22.06%)和UNSW-NB15(64%)。在罕见攻击场景下,CALIBURN在LITNET-2020上达到AUC-PR 0.943,比最佳流式基线高出2.21倍,比最佳批处理参考高出4.12倍;保序校准将Brier分数降低30%。在中等攻击率场景下,CALIBURN在CICIDS2017上仍是最强的流式方法,但被批处理密度方法超越。在高攻击率场景下,所有流式方法都接近攻击率下限。我们进一步识别了两种不同的CRC崩溃机制,导致在小的alpha下告警规则退化,并将两者作为操作指南提供给实践者。

英文摘要

Streaming network intrusion detection systems must process flows continuously while keeping memory bounded, but most current methods leave alerting threshold selection as a post-hoc tuning problem poorly suited to production. Operators need alerting behaviour specifiable before deployment using inputs such as false-negative cost, false-positive cost, and alerting budget. This paper presents CALIBURN, a five-component streaming alerting pipeline composed of a truncated Bayesian online change-point detector, an isotonic calibration layer mapping the change-point posterior to an empirical conditional attack probability, a cost-sensitive decision threshold derived from operator-specified misclassification costs, a Conformal Risk Control wrapper that converts an alert-budget specification into a within-window valid threshold under exchangeability, and a multi-window burn-rate alerting layer adapted from Site Reliability Engineering practice. Rather than claiming uniform dominance, we present CALIBURN as a regime-sensitivity study, evaluating the pipeline across three attack-prevalence regimes: LITNET-2020 at 5.2 percent, CICIDS2017 at 22.06 percent, and UNSW-NB15 at 64 percent. In the rare-attack regime, CALIBURN achieves AUC-PR 0.943 on LITNET-2020, outperforming the best streaming baseline by 2.21x and the best batch reference by 4.12x; isotonic calibration reduces Brier score by 30 percent. In the moderate-prevalence regime, CALIBURN remains the strongest streaming method on CICIDS2017 but is exceeded by batch density methods. In the high-prevalence regime, all streaming methods approach the prevalence floor. We further identify two distinct CRC-collapse mechanisms driving the alert rule to degeneracy at small alpha, treating both as operational guidance for practitioners.
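Two of the five components lend themselves to a short sketch. It assumes the textbook cost-sensitive rule for a calibrated probability $p$: alert when the expected cost of missing an attack, $p \cdot c_{FN}$, is at least the expected cost of a false alarm, $(1-p) \cdot c_{FP}$, which gives the threshold $c_{FP} / (c_{FP} + c_{FN})$. The burn-rate part is an assumed adaptation of the multi-window idea from SRE practice, treating the alert budget as the budget being consumed. The paper's exact formulation may differ, and all function names are hypothetical.

```python
def cost_sensitive_threshold(cost_fp, cost_fn):
    """Alert when the calibrated attack probability is at least this value."""
    return cost_fp / (cost_fp + cost_fn)

def burn_rate(alert_flags, budget_rate):
    """Observed alert fraction in a window divided by the allowed fraction."""
    return (sum(alert_flags) / len(alert_flags)) / budget_rate

def multi_window_alert(alert_flags, budget_rate, short, long, factor):
    """Fire only if both the short and the long trailing windows burn too fast."""
    fast = burn_rate(alert_flags[-short:], budget_rate) >= factor
    slow = burn_rate(alert_flags[-long:], budget_rate) >= factor
    return fast and slow
```

Requiring both windows to exceed the factor is the usual way to avoid paging on a brief spike while still reacting quickly to a sustained one.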

2605.24690 2026-05-26 cs.RO cs.LG 版本更新

Sum of Costs Diffusion with Dynamic Guidance for Motion Planning

运动规划的动态引导代价和扩散模型

Aysu Aylin Kaplan, Özgür Erkent

发表机构 * Computer Engineering Department, Hacettepe University(哈切特佩大学计算机工程系)

AI总结 提出一种基于扩散模型的高泛化运动规划方法,通过总碰撞代价梯度引导去噪过程并动态选择引导起始步,在Mπnets数据集上取得最优性能。

Comments Accepted at the Frontiers of Optimization for Robotics Workshop at the IEEE International Conference of Robotics & Automation (ICRA), 2026

详情
AI中文摘要

机器人操作的运动规划问题可以通过经典方法或深度学习方法来解决。现有方法在泛化到不同场景时面临重大挑战。在本研究中,我们提出了一种具有高泛化能力的方法,该方法使用扩散模型生成无碰撞轨迹,其中去噪过程由总碰撞代价的梯度引导。我们还提出了一种动态选择梯度引导起始步的方法。实验结果表明,通过动态引导扩散模型与碰撞代价之和,能够克服竞争方法面临的泛化问题,提供更鲁棒的性能。所提出的模型在Mπnets数据集的不同测试场景中,相比其他方法取得了最高性能,证明了其有效性。

英文摘要

The motion planning problem for robotic manipulation can be addressed through classical or deep learning approaches. Existing methods face significant challenges in generalizing to diverse settings. In this study, we present a method with high generalization capability that generates collision-free trajectories using diffusion models where the denoising process is guided by the gradient of the total collision cost. We also present a dynamic approach for choosing the start step of the gradient guidance. Experimental results demonstrate that guiding the diffusion model dynamically with the sum of collision costs offers more robust performance by overcoming the generalization issues faced by competing methods. The proposed model demonstrates its effectiveness by achieving the highest performance on diverse test settings in the M$π$nets dataset among the compared methods.

2605.24684 2026-05-26 cs.LG cs.AI 版本更新

Beyond the Aggregation Dilemma: Prior-Retaining Decoupled Learning for Multimodal Graphs

超越聚合困境:多模态图的先验保持解耦学习

Hao Yan, Xuanru Wang, Jun Yin, Shirui Pan, Senzhang Wang, Chengqi Zhang

发表机构 * School of Computer Science and Engineering, Central South University(中南大学计算机科学与工程学院) Department of Data Science and Artificial Intelligence, The Hong Kong Polytechnic University(香港理工大学数据科学与人工智能系) School of Information and Communication Technology, Griffith University(格里菲斯大学信息与通信技术学院)

AI总结 针对多模态属性图学习中强制聚合导致性能反转的聚合困境,提出解耦双路径架构SUPRA,通过保持先验特征的独立性和轻量级共享GNN捕获结构协同,并辅以深度监督缓解梯度饥饿,实现SOTA性能且显著降低计算开销。

详情
AI中文摘要

多模态属性图学习(MAGL)通过图聚合将节点内在属性与结构拓扑相结合。然而,随着预训练编码器演变为大型基础模型(LFM),MAGL的格局发生了根本性转变:在高置信度LFM先验下,强制聚合引入了拓扑噪声,淹没了判别信号,引发反直觉的性能反转,即复杂的MAGL架构性能不如简单的拓扑无关MLP。通过系统的实证和理论分析,我们确定这种反转源于一个基本的聚合困境,其特征是两种并发病理:(1)表征病理(信噪比退化)——强制聚合用拓扑噪声稀释了鲁棒的内在特征,导致噪声惩罚超过其协作收益;(2)优化病理(梯度饥饿)——拓扑聚合减弱了梯度流,而共享任务损失导致主导模态过早抑制较弱模态。为解决这一困境,我们提出SUPRA(共享-独特先验保持架构),一种解耦的双路径范式。SUPRA通过拓扑无关的MLP处理模态特定特征,同时通过轻量级共享GNN捕获结构协同,并辅以深度监督来对抗梯度饥饿。大量评估表明,SUPRA实现了最先进的性能,同时峰值GPU内存需求降低3.5倍,训练时间比多模态图变换器快4.4倍。

英文摘要

Multimodal Attributed Graph Learning (MAGL) integrates intrinsic node attributes with structural topology via graph aggregation. However, as pretrained encoders evolve into Large Foundation Models (LFMs), the landscape of MAGL fundamentally shifts: under high-confidence LFM priors, mandatory aggregation introduces topological noise that overwhelms discriminative signals, triggering a counter-intuitive performance inversion where sophisticated MAGL architectures underperform simple topology-agnostic MLPs. Through systematic empirical and theoretical analysis, we identify that this inversion stems from a fundamental aggregation dilemma characterized by two concurrent pathologies: (1) Representational Pathology (SNR Degradation) - mandatory aggregation dilutes robust intrinsic features with topological noise, causing the noise penalty to outweigh its collaborative benefit; and (2) Optimization Pathology (Gradient Starvation) - topological aggregation attenuates gradient flow, while a shared task loss causes dominant modalities to prematurely suppress weaker ones. To resolve this dilemma, we propose SUPRA (Shared-Unique Prior-Retaining Architecture), a decoupled dual-pathway paradigm. SUPRA processes modality-specific features through topology-agnostic MLPs while capturing structural synergy via a lightweight shared GNN, with auxiliary deep supervision counteracting gradient starvation. Extensive evaluations demonstrate that SUPRA achieves state-of-the-art performance while requiring 3.5x lower peak GPU memory and up to 4.4x faster training time than Multimodal Graph Transformers.

2605.24680 2026-05-26 cs.LG 版本更新

Trajectory-Based Difficulty Scoring for Reliable Learning on Tabular Data

基于轨迹的难度评分用于表格数据的可靠学习

Tomer Lavi, Bracha Shapira, Nadav Rappoport

发表机构 * Faculty of Computer and Information Science, Ben-Gurion University of the Negev(计算机与信息科学学院,本·古里安内盖夫大学)

AI总结 提出轨迹难度评分(TDS),通过分析梯度提升树的逐树累积预测轨迹,为每个实例估计难度,并在分类和回归任务中优于现有基线,同时支持主动学习、选择性预测和共形预测等应用。

详情
AI中文摘要

梯度提升树在表格数据上表现出色,但常常留下一个长尾的预测不佳实例。我们引入了一种基于轨迹的难度评分(TDS),这是一种针对提升集成模型的实例级难度估计器,源自每棵树的累积预测轨迹。对于每个实例,我们计算可解释的轨迹描述符(例如,方差、振荡峰值、符号切换和尾部稳定性),并训练一个轻量级回归模型来预测保留损失。经验CDF将得到的信号校准为$[0,1]$内的分数,支持对困难案例进行排序。在多种表格基准和集成大小上,TDS与误差表现出强秩相关性,并且在分类任务上优于现有的实例难度和不确定性基线,同时在回归任务上保持竞争力。然后,我们展示了单个难度信号如何改进多个数据挖掘工作流:用于标签高效训练的难度驱动主动学习、用于改进风险覆盖权衡的难度阈值选择性预测,以及用于更均匀条件覆盖的TDS分层(Mondrian)共形预测。最后,使用SHAP归因对高TDS实例进行聚类,揭示了以紧凑特征值范围为特征的连贯故障模式,支持错误分析和针对性数据采集。

英文摘要

Gradient-boosted trees achieve strong performance on tabular data, yet often leave a long tail of poorly predicted instances. We introduce a Trajectory-based Difficulty Score (TDS), an instance-level difficulty estimator for boosted ensembles derived from per-tree cumulative prediction trajectories. For each instance, we compute interpretable trajectory descriptors (e.g., variance, oscillation peaks, sign switches, and tail stability) and train a lightweight regression model to predict held-out loss. An empirical CDF calibrates the resulting signal into a score in $[0,1]$ that supports ranking hard cases. Across diverse tabular benchmarks and ensemble sizes, TDS exhibits strong rank correlation with error and outperforms established instance-hardness and uncertainty baselines on classification, while remaining competitive on regression. We then show how a single difficulty signal improves multiple data mining workflows: difficulty-driven active learning for label-efficient training, difficulty-thresholded selective prediction for improved risk-coverage trade-offs, and TDS-stratified (Mondrian) conformal prediction for more uniform conditional coverage. Finally, clustering high-TDS instances using SHAP attributions reveals coherent failure modes characterized by compact feature-value ranges, supporting error analysis and targeted data acquisition.
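The trajectory descriptors and the empirical-CDF calibration can be sketched as follows. This is an illustration only: it assumes the per-tree contributions for one instance are already available as a list, computes just two of the descriptors named in the abstract, and omits the regression model that maps descriptors to held-out loss. Function names are hypothetical.

```python
def trajectory_descriptors(increments):
    """Describe the cumulative prediction trajectory of one instance.

    increments: per-tree contributions in boosting order (assumed input format).
    """
    cumulative = []
    total = 0.0
    for inc in increments:
        total += inc
        cumulative.append(total)
    mean = sum(cumulative) / len(cumulative)
    variance = sum((c - mean) ** 2 for c in cumulative) / len(cumulative)
    # A sign switch is a change of sign between consecutive cumulative predictions.
    sign_switches = sum(1 for a, b in zip(cumulative, cumulative[1:]) if a * b < 0)
    return {"variance": variance, "sign_switches": sign_switches}

def empirical_cdf_score(value, reference):
    """Map a raw difficulty signal into [0, 1] via the empirical CDF of a reference set."""
    return sum(1 for r in reference if r <= value) / len(reference)
```

The calibrated score can then be thresholded for selective prediction or used to bin instances for stratified conformal prediction, as the abstract describes.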

2605.24673 2026-05-26 stat.ML cs.LG 版本更新

Affinity Graph Connectivity in Convex Clustering

凸聚类中的亲和图连通性

Sam Rosen, Jason Xu

发表机构 * Department of Statistical Science, Duke University(杜克大学统计科学系) Department of Biostatistics, University of California Los Angeles(加州大学洛杉矶分校生物统计学系)

AI总结 研究凸聚类中亲和权重对应一般连通图时的有限样本界,通过随机游走理论分析聚类性能与图结构连通性的关系,并提出超参数调优应包括亲和权重的调整。

Comments 28 pages, 6 figures

详情
AI中文摘要

我们将凸聚类的有限样本界推广到目标函数中的亲和权重对应一般连通图的情形。这些界及其分析有助于更好地理解数据背后各种隐含连通结构下的聚类行为,并为质心恢复提供新的收敛速率。新的理论框架基于随机游走,这使得可以应用与随机图模型相关的集中不等式,并形式化了聚类性能与图结构连通性之间的关系。通过界的形式和实证结果,我们认为凸聚类问题的超参数调优还应包括输入亲和权重的调优。

英文摘要

We generalize finite-sample bounds for convex clustering to the setting where affinity weights appearing in the objective correspond to a general connected graph. These bounds and their analysis lead to a better understanding of clustering behavior under various implied connectivity structures behind the data and to new rates of convergence for centroid recovery. The new theoretical framework is based on random walks, which allow application of concentration inequalities related to random graph models, and formalizes the relationship between the clustering performance and the connectivity of the graph structures. Through the form of the bound and empirical results, we argue proper tuning of hyperparameters to convex clustering problems should also include tuning of input affinity weights.
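For reference, the convex clustering objective is commonly written in the form below. This is the standard formulation and is stated here as an assumption, since the paper's normalization and choice of norm may differ. The affinity weights $w_{ij} \ge 0$ are the inputs the abstract recommends tuning, and the pairs with $w_{ij} > 0$ form the edges of the affinity graph.

```latex
\min_{u_1,\dots,u_n} \; \frac{1}{2}\sum_{i=1}^{n} \lVert x_i - u_i \rVert_2^2
\;+\; \gamma \sum_{i<j} w_{ij}\, \lVert u_i - u_j \rVert
```

Here $x_i$ are the data points, $u_i$ the centroid assigned to point $i$, and $\gamma \ge 0$ the regularization strength. Points whose centroids coincide at the optimum are placed in the same cluster.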

2605.24667 2026-05-26 cs.AI cs.LG 版本更新

When Mean CE Fails: Median CE Can Better Track Language Model Quality

当平均交叉熵失效时:中位数交叉熵能更好地跟踪语言模型质量

Hao Guo, Simon Dennis, Rivaan Patil, Kevin Shabahang

发表机构 * i14 University of Melbourne(墨尔本大学) University of California, Santa Cruz(加州大学圣克鲁兹分校)

AI总结 本文发现中位数交叉熵比平均交叉熵更能反映语言模型在训练过程中的任务性能,并建议在评估时报告多个百分位交叉熵。

Comments 20 pages

详情
AI中文摘要

平均交叉熵是语言模型的标准验证指标,但在训练过程中可能无法跟踪模型质量。我们在两种常见场景下研究了这一点。首先,在Qwen2.5-1.5B的合成事实学习SFT中,我们发现平均CE在初始学习阶段后显著上升,而保留的事实召回准确率保持接近峰值。其次,在TinyStories上的top-K蒸馏中,我们发现减小K会改善中位数CE而恶化平均CE;Top-5学生获得了最高的LLM评判分数,并在中位数CE上低于其教师,尽管其平均CE最差。在这两种情况下,中位数CE与任务性能的相关性比平均CE更紧密。分析训练过程中整体和尾部百分位CE的变化表明,训练重塑了经验性的每token CE分布。在top-K蒸馏中,较小的K产生了一个在两端都有更多质量的分布,降低了中位数并增加了平均值。在Qwen SFT中,整体部分迅速饱和,而尾部在训练后半段延伸。在这两种情况下,任务评估指标似乎对整体部分比尾部更敏感。实际上,我们建议在报告平均CE的同时报告一小部分百分位CE摘要,并利用它们之间的一致性作为跟踪分布重塑的工具,以及当平均和中位数CE在模型选择上不一致时的低成本诊断。

英文摘要

Mean cross-entropy is the standard validation metric for language models, but it can fail to track model quality during training. We examine this in two common scenarios. First, in Qwen2.5-1.5B SFT on synthetic fact-learning, we find that mean CE rises substantially after the initial learning phase while held-out fact-recall accuracy remains near its peak. Second, we find that in top-K distillation on TinyStories, decreasing K improves median CE while worsening mean CE; the Top-5 student attains the highest LLM-judge score and crosses below its teacher on median CE, despite having the worst mean CE. In both cases, median CE correlates much more closely with task performance than does mean CE. Analyzing how bulk and tail percentile CE move during training reveals that training reshapes the empirical per-token CE distribution. In top-K distillation, smaller K yields a distribution with more mass at both extremes, decreasing the median and increasing the mean. In Qwen SFT, the bulk saturates quickly while the tail extends in the latter half of training. In both, the task-evaluation metric appears more sensitive to the bulk than to the tail. Practically, we recommend reporting a small set of percentile CE summaries alongside the mean, and using concordance among them as a tool to keep track of distribution reshaping, as well as a low-cost diagnostic for when mean and median CE disagree on model selection.
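The recommendation to report percentile summaries alongside the mean can be sketched in a few lines. The example assumes per-token cross-entropy values are already collected in a list and uses the nearest-rank percentile definition. The function name `ce_summary` and the chosen percentiles are illustrative, not the paper's.

```python
import math
import statistics

def ce_summary(token_ce, percentiles=(50, 90, 99)):
    """Return the mean and nearest-rank percentiles of per-token cross-entropy."""
    ordered = sorted(token_ce)
    n = len(ordered)
    summary = {"mean": statistics.fmean(ordered)}
    for q in percentiles:
        # Nearest-rank method: the smallest value with at least q% of data at or below it.
        rank = max(1, math.ceil(q * n / 100))
        summary[f"p{q}"] = ordered[rank - 1]
    return summary
```

A model whose losses are mostly low but include a few very large values can have a better median and a worse mean than a model with uniformly moderate losses, which is the kind of disagreement the abstract suggests using as a diagnostic.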

2605.24659 2026-05-26 cs.LG 版本更新

IterInject: Indirect Prompt Injection Against LLM Agents via Feedback-Guided Iterative Optimization

IterInject: 通过反馈引导的迭代优化实现对LLM智能体的间接提示注入

Zixuan Chen, Jiaxiang Chen, Li Luo, Ke Xu, Xiaoxiang Huang, Tanfeng Sun, Xinghao Jiang

发表机构 * Shanghai Jiao Tong University(上海交通大学) The University of Hong Kong(香港大学)

AI总结 提出IterInject框架,通过规则诊断器和LLM优化器迭代优化对抗载荷,实现对LLM智能体的间接提示注入攻击,在多个基准和实际系统中显著优于现有方法,并揭示了注意力介导的阈值机制。

Comments Submitted to EMNLP 2026

AI中文摘要

基于LLM的智能体越来越多地被部署用于需要规划、工具使用和与外部服务交互的复杂任务。它们对外部不可信内容的依赖使其容易受到间接提示注入(IPI)攻击,其中嵌入在检索数据中的对抗指令劫持智能体行为。现有攻击依赖于无法适应智能体特定防御的静态载荷;即使是最近的适应性方法也缺乏结构化反馈来指导优化。我们提出IterInject,一个反馈引导的迭代框架,闭合了注入、诊断和精炼之间的循环:基于规则的诊断器产生带有行为描述的结构化结果标签,基于LLM的优化器根据完整的优化历史精炼载荷。一个合成步骤从失败模式中生成新的伪装种子,使策略空间能够自我进化。在AgentDojo和InjectAgent上,IterInject在四个受害模型上显著优于静态基线和现有的适应性方法。在Claude Code(一个具有分层防御的生产级编码智能体)上的扩展实验表明,优化后的载荷在9个目标中的5个上取得了完全成功;即使那些抵抗完全利用的目标也显示出通过迭代精炼可衡量的改进。我们进一步对IPI进行了机制分析,识别出中后层中注意力介导的阈值机制;三个因果干预验证了这一发现,并指出了具体的防御方向。

英文摘要

LLM-based agents are increasingly deployed for complex tasks requiring planning, tool use, and interaction with external services. Their reliance on untrusted external content exposes them to indirect prompt injection (IPI), in which adversarial instructions embedded in retrieved data hijack agent behavior. Existing attacks rely on static payloads that cannot adapt to agent-specific defenses; even recent adaptive methods lack structured feedback to guide optimization. We introduce IterInject, a feedback-guided iterative framework that closes the loop between injection, diagnosis, and refinement: a rule-based diagnoser produces structured outcome labels with behavioral descriptions, and an LLM-based optimizer refines payloads conditioned on the full optimization history. A synthesis step generates new disguise seeds from failure patterns, enabling the strategy space to self-evolve. On AgentDojo and InjectAgent, IterInject substantially outperforms static baselines and existing adaptive methods across four victim models. Extension experiments on Claude Code, a production-grade coding agent with layered defenses, show that optimized payloads achieve full success on 5 of 9 targets; even those that resist full exploitation exhibit measurable improvement from iterative refinement. We further present a mechanistic analysis of IPI, identifying an attention-mediated threshold mechanism in mid-to-late layers; three causal interventions validate this finding and point to concrete defense directions.

2605.24658 2026-05-26 cs.LG 版本更新

WLNO: Wavelet-Laplace Neural Operator for Solving Partial Differential Equations

WLNO: 用于求解偏微分方程的小波-拉普拉斯神经算子

Muhammad Abid, Arth Sojitra, Omer San

发表机构 * Department of Mechanical and Aerospace Engineering, University of Tennessee, Knoxville(田纳西大学机械与航空航天工程系)

AI总结 提出WLNO,通过融合Haar小波多尺度空间分解与拉普拉斯神经算子的极点-留数公式,在五个基准PDE问题上优于LNO,尤其擅长处理具有强空间多尺度结构的问题。

AI中文摘要

本文介绍了小波-拉普拉斯神经算子(WLNO),一种新颖的神经算子,它将Haar小波多尺度空间分解与拉普拉斯神经算子(LNO)的拉普拉斯域极点-留数公式融合在一起。虽然LNO通过可学习的系统极点和留数捕捉瞬态和稳态动力学,但它缺乏提取复杂PDE解中固有的空间局部多尺度特征的显式机制。WLNO通过用并行单级Haar离散小波变换(DWT)分支增强LNO核心来解决这一问题,该分支将提升的特征图分解为四个频率子带:近似(LL)、水平细节(LH)、垂直细节(HL)和对角细节(HH),并在通过逆DWT重建之前对每个子带应用独立学习的$1\times1$卷积。两个分支通过一个可学习的sigmoid门控权重$\alpha_\mathrm{wav}$融合,该权重初始化为给小波分支一个小的初始贡献,允许模型在整个训练过程中自适应地平衡拉普拉斯域动力学与空间多尺度特征。WLNO与LNO在五个基准PDE问题上使用相同的超参数、训练数据和评估协议进行评估:扩散方程、Burgers方程、反应扩散系统、达西流和二维Navier-Stokes方程。WLNO在所有五个问题上始终优于LNO,在具有强空间多尺度结构的问题上改进最为显著,例如具有尖锐激波前沿的Burgers方程和具有相干涡旋结构的Navier-Stokes方程,而在更平滑和椭圆问题上表现一致。这些结果表明,基于小波的多尺度空间分解是拉普拉斯域算子学习的一种有原则且有效的补充。

英文摘要

This work introduces the Wavelet-Laplace Neural Operator (WLNO), a novel neural operator that fuses Haar wavelet multi-scale spatial decomposition with the Laplace-domain pole-residue formulation of the Laplace Neural Operator (LNO). While LNO captures transient and steady-state dynamics through learnable system poles and residues, it lacks an explicit mechanism for extracting spatially localized multi-scale features inherent in complex PDE solutions. WLNO addresses this by augmenting the LNO core with a parallel single-level Haar discrete wavelet transform (DWT) branch that decomposes the lifted feature map into four frequency subbands, approximation (LL), horizontal detail (LH), vertical detail (HL), and diagonal detail (HH), and applies independent learned $1\times1$ convolutions to each subband before reconstruction via the inverse DWT. The two branches are fused through a learnable sigmoid-gated weight $\alpha_\mathrm{wav}$, initialized to give a small initial contribution to the wavelet branch, allowing the model to adaptively balance Laplace-domain dynamics against spatial multi-scale features throughout training. WLNO is evaluated against LNO on five benchmark PDE problems using identical hyperparameters, training data, and evaluation protocols: the diffusion equation, the Burgers equation, the reaction-diffusion system, Darcy flow, and the two-dimensional Navier-Stokes equation. WLNO consistently outperforms LNO on all five problems, with the most pronounced improvement on problems with strong spatial multi-scale structure, such as the Burgers equation with sharp shock fronts and the Navier-Stokes equation with coherent vortical structures, while remaining consistent across smoother and elliptic problems. These results demonstrate that wavelet-based multi-scale spatial decomposition is a principled and effective complement to Laplace-domain operator learning.
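The wavelet branch described above can be sketched in NumPy as follows. This is a minimal single-channel illustration, not the authors' implementation: the per-subband scalar weights stand in for the learned $1\times1$ convolutions, the subband naming follows one common convention, and the additive fusion rule in `fuse` is an assumption, since the abstract does not give the exact formula.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level orthonormal 2D Haar DWT of an array with even height and width."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]  # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]  # bottom-left, bottom-right
    ll = (a + b + c + d) / 2  # approximation
    lh = (a + b - c - d) / 2  # difference between rows (naming conventions vary)
    hl = (a - b + c - d) / 2  # difference between columns
    hh = (a - b - c + d) / 2  # diagonal detail
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuild the full-resolution array."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x


def wavelet_branch(x, weights):
    """Scale each subband by a scalar (stand-in for a 1x1 convolution), then invert."""
    subbands = haar_dwt2(x)
    return haar_idwt2(*(w * s for w, s in zip(weights, subbands)))


def fuse(laplace_out, wavelet_out, alpha_wav):
    """Assumed fusion rule: add the wavelet branch scaled by sigmoid(alpha_wav)."""
    gate = 1.0 / (1.0 + np.exp(-alpha_wav))
    return laplace_out + gate * wavelet_out
```

A strongly negative `alpha_wav` gives a gate near zero, which matches the described small initial contribution of the wavelet branch.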

2605.24651 2026-05-26 math.NA cs.LG cs.NA 版本更新

WINO: A Weak-Form Physics Informed Neural Operator for Hyperelasticity on Variable Domains

WINO: 一种用于变域超弹性问题的弱形式物理信息神经算子

Bokai Zhu, Qinghui Zhang, Timon Rabczuk

发表机构 * School of Science, Harbin Institute of Technology, Shenzhen, P. R. China(哈尔滨工业大学(深圳)理学院) Institute of Structural Mechanics, Bauhaus-Universität Weimar(魏玛包豪斯大学结构力学研究所)

AI总结 提出一种无数据框架WINO,结合神经算子的效率与φ-有限元法的几何灵活性,通过最小化弱形式残差和惩罚项训练,实现高精度且计算时间减少50-80%。

AI中文摘要

我们提出了一种弱形式物理信息神经算子(WINO),这是一个无数据框架,结合了神经算子的效率与φ-有限元法(φ-FEM)的几何灵活性。φ-FEM是一种非拟合方法,无需体拟合网格即可适应几何变化,其中域几何由水平集函数φ表示。为了施加边界条件,Dirichlet问题采用φ-FEM提升,因此仅学习齐次位移贡献,而牵引驱动的Neumann问题额外预测非拟合弱形式所需的辅助场。参数通过最小化与φ-FEM对齐的弱形式残差平方以及切割单元辅助方程的平方惩罚来训练,从而消除了对大型配对数据集的依赖。训练后,WINO输出可作为神经算子热启动(NOWS)为非线性φ-FEM求解器提供初始值,相比传统冷启动求解器减少了迭代次数。数值基准测试表明,WINO在所有基准测试中实现了低于0.04的高精度,同时与纯数据驱动方法相比,总计算时间减少了50-80%。

英文摘要

We propose a Weak-form Physics-Informed Neural Operator (WINO), a data-free framework that combines the efficiency of neural operators with the geometric flexibility of the $φ$-finite element method ($φ$-FEM). $φ$-FEM is an unfitted method that accommodates geometric variations without body-fitted meshes, where the domain geometry is represented by the level-set function $φ$. To impose the boundary conditions, Dirichlet problems adopt the $φ$-FEM lifting so only the homogeneous displacement contribution is learned, whereas traction-driven Neumann problems additionally predict the auxiliary fields necessary for the unfitted weak formulation. Parameters are trained by minimizing squared weak-form residuals aligned with $φ$-FEM together with squared penalties on the cut-cell auxiliary equations, which removes the need for large paired datasets of converged reference solutions. After training, WINO outputs can seed the nonlinear $φ$-FEM solvers as neural operator warm starts (NOWS), which reduce iteration counts relative to traditional cold-started solvers. Numerical benchmarks show that WINO achieves high accuracy below 0.04 across all benchmarks, while reducing total computational time by 50--80\% compared with purely data-driven methods.

2605.24632 2026-05-26 cs.CR cs.AI cs.LG 版本更新

Demystifying the Mythos or Disrupting Bugonomics? From Zero-Day Asymmetry to Defender Remediation Throughput

揭秘神话或颠覆漏洞经济学?从零日不对称到防御者修复吞吐量

Alfredo Pesoli, Herman Errico, Lorenzo Cavallaro

发表机构 * University College London(伦敦大学学院) Bynario

AI总结 本文通过漏洞经济学视角分析LLM驱动的漏洞发现,指出其核心影响并非增加零日漏洞,而是提升防御者修复吞吐量,并利用Anthropic Mythos预览和Mozilla Firefox合作数据论证这一转变。

AI中文摘要

最近,大型语言模型在生产软件中生成候选和确认漏洞的演示,重新引发了AI将重塑攻防安全的叙事。头条新闻强调能力,却很少审视成本和激励。本文通过漏洞经济学视角审视LLM驱动的漏洞发现:即生产、证明、优先级排序和修复安全相关缺陷的操作经济学。历史上,最引人注目的高端漏洞经济学是攻击方定价的,因为生产级零日漏洞和利用链是面向政府、经纪人和攻击方供应商的昂贵专家输出。防御方漏洞经济学早已存在于漏洞研究、奖励计划和供应商修复工作中;LLM辅助系统改变了其规模和分布。它们使得候选生成、代码理解、测试工具构建、影响证明草拟和报告准备在代码库规模上更便宜。利用和概念验证仍然重要,但在防御方工作流中,它们主要用于证明影响、指导优先级排序和证明修复的合理性。由此产生的瓶颈不仅仅是发现更多漏洞,而是吸收、验证、分类、修补和发布更大规模的报告流。利用Anthropic的Mythos预览和Mozilla Firefox合作的公开数据,以及公开的利用市场价格锚点和漏洞奖励计划,我们认为近期的转变不仅仅是更多的零日漏洞。而是向更广泛的防御者修复吞吐量迈进:低信号候选变得更便宜,证据丰富的修复变得更加重要,稀缺的能力转向维护者审查和发布工作。这种影响在开源领域尤为严重,因为LLM辅助发现可以增加报告量,而维护者侧的验证、分类、资金和发布能力可能无法扩展。

英文摘要

Recent demonstrations of large language models producing candidate and confirmed vulnerabilities in production software have renewed the narrative that AI will reshape offensive and defensive security. Headlines emphasize capability; they rarely interrogate costs and incentives. This paper examines LLM-driven vulnerability discovery through a bugonomics lens: the operational economics of producing, proving, prioritizing, and fixing security-relevant defects. Historically, the most visible high-end bugonomics was offense-priced because production-grade zero-days and exploit chains were expensive specialist outputs for governments, brokers, and offensive vendors. Defender-side bugonomics already existed in vulnerability research, reward programs, and vendor remediation work; LLM-assisted systems change its scale and distribution. They make candidate generation, code comprehension, harness construction, proof-of-impact drafting, and report preparation cheaper at codebase scale. Exploits and proofs of concept remain important, but in defender workflows they primarily prove impact, guide prioritization, and justify remediation. The resulting bottleneck is not only finding more bugs; it is absorbing, validating, triaging, patching, and shipping a larger stream of reports. Using public data from Anthropic's Mythos Preview and Mozilla Firefox collaborations, along with public exploit-market price anchors and vulnerability reward programs, we argue that the near-term shift is not simply more zero-days. It is a move toward broader defender remediation throughput: low-signal candidates become cheaper, evidence-rich remediation becomes more important, and scarce capacity shifts toward maintainer review and release work. The effect is acute in open source, where LLM-assisted discovery can increase report volume while maintainer-side validation, triage, funding, and release capacity may not scale.

2605.24631 2026-05-26 cs.LG cs.AI cs.CV 版本更新

Beyond Generative Priors: Minority Sampling with JEPA-Guided Diffusion

超越生成先验:JEPA引导扩散的少数采样

Sol Park, Soobin Um

发表机构 * Department of Artificial Intelligence, Kookmin University, Seoul, South Korea(韩国首尔国民大学人工智能系)

AI总结 提出一种基于世界模型JEPA引导的扩散采样框架,通过近似策略实现高效计算,在无条件、类别条件和文本到图像生成中提升少数样本的保真度和语义有效性。

Comments ICML 2026, 21 pages, 9 figures

AI中文摘要

少数采样旨在数据流形上生成低密度实例,在医学诊断、异常检测和创意AI等应用中具有核心重要性。然而,现有方法相对于从训练数据中学习的生成先验来定义少数样本,将稀有性限制在可能无法很好反映现实世界语义的模型特定概念中。在这项工作中,我们提出了一种以世界为中心的少数采样视角,该视角相对于现实世界先验而非生成器诱导的密度来定义稀有性。为此,我们引入了JEPA引导,一种由联合嵌入预测架构(JEPA)引导的扩散采样框架——JEPA是一类编码广泛、语义丰富表示的世界模型。JEPA引导将扩散轨迹导向JEPA隐含密度下的低密度区域,从而使生成的少数样本与现实世界的语义稀有性对齐。为了使JEPA引导在计算上实用,我们开发了带有理论误差界限的原则性近似策略,显著降低了引导计算的开销。在无条件、类别条件和文本到图像生成上的大量实验表明,JEPA引导持续提高了少数样本的保真度和语义有效性,在捕捉现实世界的稀有性概念方面优于以生成器为中心的基线。代码可在https://github.com/soobin-um/jepa-guidance获取。

英文摘要

Minority sampling aims to generate low-density instances on a data manifold and is of central importance in applications such as medical diagnosis, anomaly detection, and creative AI. Existing approaches, however, define minority samples relative to generative priors learned from training data, confining rarity to model-specific notions that may poorly reflect real-world semantics. In this work, we propose a world-centric perspective on minority sampling, which defines rarity with respect to real-world priors rather than generator-induced densities. To this end, we introduce JEPA guidance, a diffusion sampling framework guided by a Joint-Embedding Predictive Architecture (JEPA) -- a class of world models that encode broad, semantically rich representations. JEPA guidance steers diffusion trajectories toward low-density regions under the implicit density induced by the JEPA, thereby aligning generated minorities with real-world semantic rarity. To make JEPA guidance computationally practical, we develop principled approximation strategies accompanied by theoretical error bounds, significantly reducing the overhead of guidance computation. Extensive experiments across unconditional, class-conditional, and text-to-image generation demonstrate that JEPA guidance consistently improves the fidelity and semantic validity of minority samples, outperforming generator-centric baselines in capturing real-world notions of rarity. Code is available at https://github.com/soobin-um/jepa-guidance.

2605.24621 2026-05-26 cs.CV cs.AI cs.LG 版本更新

Phase-Aware Wavelet-Based-Scattering Encoder-Decoder for Dense Predictions

相位感知的基于小波散射的编解码器用于密集预测

Ghassen Marrakchi, Basarab Matei

发表机构 * Northern Paris Computer Science Lab, Sorbonne Paris Nord University, Villetaneuse, France(法国维勒塔讷斯,索邦巴黎北大学,北巴黎计算机科学实验室)

AI总结 提出一种相位感知散射编解码器,通过在跳跃连接中显式保留相位信息来恢复空间结构,在图像去噪和皮肤病变分割任务中验证了相位对密集预测的有效性。

Comments 21 pages, 16 figures, 10 tables

AI中文摘要

散射变换实现了Lipschitz稳定性和平移不变性,但密集预测任务需要保留在全局平均中丢失的空间结构。我们提出了相位感知散射编解码器,通过在跳跃连接中显式保留相位来恢复这些信息。在图像去噪(BSD68)上,打破平移不变性使PSNR提高了+2.17 dB;相位保留额外增加了+1.03 dB。一种新颖的空间洗牌消融实验(惩罚-1.26 dB)表明相位编码了位置依赖的结构。我们在第二个密集预测任务(ISIC皮肤病变分割)上进行了初步的可扩展性研究,完整的交叉验证正在进行中。这项工作推进了原则性的小波-深度学习集成,展示了相位信息如何在像素级预测中补充散射的稳定性-表达性权衡。

英文摘要

Scattering transforms achieve Lipschitz stability and translation invariance, but dense prediction tasks require preserving spatial structure lost in global averaging. We propose Phase-Aware Scattering Encoder-Decoder, which restores this information by explicitly preserving phase in skip connections. On image denoising (BSD68), breaking translation invariance improves PSNR by $+2.17$~dB; phase preservation adds $+1.03$~dB. A novel spatial shuffling ablation ($-1.26$~dB penalty) demonstrates phase encodes location-dependent structure. We conduct a preliminary extensibility study on a second dense prediction task (ISIC skin lesion segmentation), with full cross-validation as ongoing work. This work advances principled wavelet-deep learning integration, showing how phase information complements scattering's stability-expressiveness trade-off in pixel-level prediction.

2605.24614 2026-05-26 cs.CL cs.AI cs.LG 版本更新

Measuring the Depth of LLM Unlearning via Activation Patching

通过激活修补测量大语言模型遗忘的深度

Jaeung Lee, Dohyun Kim, Jaemin Jo

发表机构 * Sungkyunkwan University(成均馆大学)

AI总结 提出遗忘深度评分(UDS),通过激活修补量化遗忘的机制深度,在150个遗忘模型上的元评估中达到最高忠实性和鲁棒性。

Comments 18 pages

AI中文摘要

大语言模型遗忘已成为隐私保护和人工智能安全的关键事后机制,但审计目标知识是否真正被擦除仍然具有挑战性。现有的输出级指标无法检测到这些知识是否仍可从内部表示中恢复。最近的白盒研究揭示了此类残留知识,但通常依赖于辅助训练或数据集特定调整,缺乏可推广的指标。为解决这些限制,我们提出遗忘深度评分(UDS),一种通过激活修补量化遗忘机制深度的指标。UDS首先使用保留模型基线识别编码目标知识的层,然后在0-1尺度上测量遗忘模型中该知识被擦除的程度。在跨越8种方法的150个遗忘模型上的20个指标的元评估中,UDS实现了最高的忠实性和鲁棒性,证实了我们的因果方法是遗忘评估中最可靠的。案例研究进一步揭示,白盒指标可能在层级别上不一致,并且擦除深度因示例而异。我们提供了将UDS集成到现有基准测试框架并简化评估流程的指南。代码和数据可在https://github.com/gnueaj/unlearning-depth-score获取。

英文摘要

Large language model (LLM) unlearning has emerged as a crucial post-hoc mechanism for privacy protection and AI safety, yet auditing whether target knowledge is truly erased remains challenging. Existing output-level metrics fail to detect when this knowledge remains recoverable from internal representations. Recent white-box studies reveal such residual knowledge but often rely on auxiliary training or dataset-specific adaptations, leaving no generalizable metric. To address these limitations, we propose the Unlearning Depth Score (UDS), a metric that quantifies the mechanistic depth of unlearning via activation patching. UDS first identifies layers that encode the target knowledge using a retain model baseline, then measures how much of it is erased in the unlearned model on a 0-1 scale. In a meta-evaluation across 20 metrics on 150 unlearned models spanning 8 methods, UDS achieves the highest faithfulness and robustness, confirming our causal approach as the most reliable for unlearning evaluation. Case studies further reveal that white-box metrics can disagree at the layer level and that erasure depth varies across examples. We provide guidelines for integrating UDS into existing benchmarking frameworks and streamlining the evaluation pipeline. Code and data are available at https://github.com/gnueaj/unlearning-depth-score

2605.24611 2026-05-26 cs.LG 版本更新

Beyond Fixed Points: Superpolynomial Capacity of Asymmetric Hopfield Networks

超越不动点:非对称Hopfield网络的超多项式容量

Aakash Kumar, Anatoly Khina, Frederik Mallmann-Trenn, Emanuele Natale

发表机构 * COATI, CNRS, Inria, I3S, Université Côte d’Azur, France(法国国家科学研究中心(CNRS)、法国国家信息与自动化研究所(Inria)、I3S研究所、蔚蓝海岸大学) School of Electrical and Computer Engineering, Tel Aviv University, Israel(特拉维夫大学电气与计算机工程学院) Department of Informatics, King’s College London, UK(伦敦国王学院信息学院) Institute of Science and Technology Austria (ISTA), Klosterneuburg, Austria(奥地利科学与技术研究所(ISTA))

AI总结 通过结合组合数学、数论和观点动力学分析,在经典同步非对称Hopfield网络中构造出超多项式数量的极限环吸引子,每个吸引子具有超多项式周期且对噪声鲁棒,首次证明了非对称Hopfield网络的超多项式容量。

AI中文摘要

经典Hopfield网络由于对称权重而局限于静态模式,而非对称网络可以通过极限环吸引子编码时间序列。然而,在经典的同步非对称网络中实现长序列的高容量存储仍然是一个挑战。我们在具有二进制神经元和同步更新的经典非对称Hopfield模型中提出了一种简单且鲁棒的构造,使得$n$个神经元能够支持$\exp\!\big(Ω(n/(\log n)^2)\big)$个不同的极限环吸引子,每个吸引子的周期为$\exp\!\big(Ω(\sqrt n/\log n)\big)$,并且对翻转概率高达$\frac12-o(1)$的随机噪声具有鲁棒性,从而在存储序列的数量和长度上实现了超多项式容量。这是首次展示非对称Hopfield网络的这种容量,我们通过结合组合数学、数论和观点动力学的分析得到了这一结果。我们的发现表明,同步非对称Hopfield网络具有比先前认识到的更大且更鲁棒的序列记忆容量,证明在生物和人工神经系统中,鲁棒的序列表示可以通过粗糙的结构模式而非复杂的非线性来实现。

英文摘要

Classical Hopfield networks are limited to static patterns due to symmetric weights, whereas asymmetric networks can encode temporal sequences via limit-cycle attractors. Achieving high-capacity storage of long sequences in classical synchronous asymmetric networks, however, has remained a challenge. We present a simple and robust construction within the classical asymmetric Hopfield model with binary neurons and synchronous updates, that allows $n$ neurons to support $\exp\!\big(Ω(n/(\log n)^2)\big)$ distinct limit-cycle attractors, each with period $\exp\!\big(Ω(\sqrt n/\log n)\big)$ and robust to random noise with flip probability up to $\frac12-o(1)$, yielding superpolynomial capacity in both the number and length of stored sequences. This is the first demonstration of such capacity for asymmetric Hopfield networks, which we obtain by combining results from combinatorics, number theory and the analysis of opinion dynamics. Our findings show that synchronous asymmetric Hopfield networks possess a sequence-memory capacity which is larger and more robust than previously recognized, demonstrating that, in both biological and artificial neural systems, robust sequence representation can be achieved through coarse architectural motifs rather than complex nonlinearities.
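To make the notion of a limit-cycle attractor under synchronous updates concrete, here is a minimal sketch. It uses the classical asymmetric sequence-storage rule (each stored pattern is mapped to its successor), which is a textbook baseline and not the paper's superpolynomial-capacity construction; the pattern set and function names are illustrative assumptions.

```python
def sign(h):
    """Binary neuron output in {-1, +1}; ties are broken toward +1."""
    return 1 if h >= 0 else -1


def sequence_weights(patterns):
    """Asymmetric weights W = (1/n) * sum_mu outer(pattern[mu+1], pattern[mu])."""
    n, p = len(patterns[0]), len(patterns)
    weights = [[0.0] * n for _ in range(n)]
    for mu in range(p):
        cur, nxt = patterns[mu], patterns[(mu + 1) % p]
        for i in range(n):
            for j in range(n):
                weights[i][j] += nxt[i] * cur[j] / n
    return weights


def step(weights, state):
    """One synchronous update: every neuron is updated from the same old state."""
    return [sign(sum(w * s for w, s in zip(row, state))) for row in weights]


# Three mutually orthogonal +/-1 patterns (rows of an 8x8 Hadamard matrix).
PATTERNS = [
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
    [1, -1, -1, 1, 1, -1, -1, 1],
]
W = sequence_weights(PATTERNS)

state = PATTERNS[0]
for t in range(4):
    print(t, state)
    state = step(W, state)
```

Because the patterns are orthogonal, each update maps a stored pattern exactly onto its successor, so the network cycles with period 3, and a single flipped bit is corrected within one step.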

2605.24608 2026-05-26 cs.AI cs.CV cs.LG 版本更新

Lattice theory and algebraic models for deep convolutional learning based on mathematical morphology

基于数学形态学的深度卷积学习的格论与代数模型

Gustavo, Angulo

发表机构 * Mines Paris, PSL University, CMA-Center for Applied Mathematics, Sophia-Antipolis, France(巴黎 Mines 学院,PSL 大学,应用数学中心,法国索菲亚-安蒂波利斯)

AI总结 本文基于格论和数学形态学,为深度卷积架构(CNN、ResNet、UNet)建立了严格的代数框架,揭示了标准CNN流水线是交叉格算子,并识别出三种真正的幂等开运算层设计。

AI中文摘要

我们为深度卷积架构(包括CNN、ResNet和如UNet的编码器-解码器网络)建立了一个严格的代数框架,该框架基于格论和数学形态学。核心工具是Matheron-Maragos-Banon-Barrera (MMBB) 平移不变算子通用表示理论,我们将其系统地应用于标准深度网络的每一层。主要发现是:标准CNN流水线(线性卷积 + ReLU + 平坦最大池化)是一个交叉格算子:卷积是傅里叶下半格中的腐蚀,而ReLU是格并闭运算,最大池化是逐点最大加格中的膨胀,它们的组合在这两个格中都不是形态学开运算。第二个发现是:ReLU在逐点格中的上伴随是一个全局(非局部)算子,在全局非负函数上为恒等映射,否则为负无穷,因此没有局部形态学腐蚀能与ReLU构成伴随对。这两个结果共同提供了深度在标准CNN中引入真正表示能力的精确代数原因:组合层不是幂等的。我们识别并完全刻画了三种真正的幂等开运算层设计:纯最大加形态学层(逐点格)、谱维纳层(傅里叶格)和自对偶形态学层。我们建立了完整的不动点和收敛理论。该框架还将最大池化、步长卷积和拉普拉斯金字塔统一在Goutsias-Heijmans伴随金字塔理论下,并给出了激活-池化膨胀(APD)分解及其正确的伴随算子。

英文摘要

We develop a rigorous algebraic framework for deep convolutional architectures, CNNs, ResNets, and encoder--decoder networks such as UNet, grounded in lattice theory and mathematical morphology. The central tool is the Matheron--Maragos--Banon--Barrera (MMBB) universal representation theory for translation-invariant operators, which we apply systematically to every layer of a standard deep network. The principal finding is that the standard CNN pipeline (linear convolution~$+$ ReLU~$+$ flat max-pooling) is a cross-lattice operator: the convolution is an erosion in the Fourier inf-semilattice while ReLU is a lattice-join closing and max-pooling is a dilation in the pointwise max-plus lattice, and their composition is a morphological opening in neither. A second finding is that the upper adjoint of ReLU in the pointwise lattice is a global (non-local) operator, the identity on globally non-negative functions and $-\infty$ otherwise, so no local morphological erosion can form an adjunction pair with ReLU. These two results together provide the precise algebraic reason why depth in standard CNNs introduces genuine representational power: the composed layer is not idempotent. Three layer designs that are genuine idempotent openings are identified and fully characterised: the pure max-plus morphological layer (pointwise lattice), the spectral Wiener layer (Fourier lattice), and the self-dual morphological layer. We establish a complete fixed-point and convergence theory. The framework also unifies max-pooling, strided convolution, and the Laplacian pyramid under the Goutsias--Heijmans adjoint pyramid theory, and gives the Activation--Pooling Dilation (APD) factorisation with its correct adjoint.
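The idempotence distinction drawn above can be checked on a toy signal. The sketch below assumes 1D signals, a flat structuring element of radius `r`, and edge-clipped windows; it only illustrates the standard definitions of erosion, dilation, and opening, not the paper's layer designs. Flat dilation here plays the role of stride-1 max-pooling.

```python
def erosion(x, r=1):
    """Flat erosion: minimum over a window of radius r (clipped at the borders)."""
    n = len(x)
    return [min(x[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def dilation(x, r=1):
    """Flat dilation: maximum over the same window (stride-1 max-pooling)."""
    n = len(x)
    return [max(x[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def opening(x, r=1):
    """Morphological opening: erosion followed by the adjoint dilation."""
    return dilation(erosion(x, r), r)


signal = [0, 3, 1, 4, 4, 5, 0, 2, 2, 2, 7, 0]
opened = opening(signal)

# An opening is idempotent: applying it a second time changes nothing.
print(opening(opened) == opened)
# Dilation alone is not: a second pass keeps spreading the maxima.
print(dilation(dilation(signal)) == dilation(signal))
```

An idempotent layer gains nothing from being stacked, which is why non-idempotence of the composed layer is tied to the benefit of depth in the abstract.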

2605.24603 2026-05-26 cs.CL cs.LG 版本更新

CSP-Atlas: Concept-Specific Neural Circuits in a Sparse Python Transformer

CSP-Atlas: 稀疏Python Transformer中的概念特异性神经回路

Piotr Wilam

发表机构 * University College London(伦敦大学学院)

AI总结 通过提取106个Python概念的特异性神经回路,发现模型内部组织遵循计算结构而非语义类别,并识别出原子性超簇。

Comments Code: https://github.com/piotrwilam/AtlasCSP

AI中文摘要

一个稀疏的8层代码Transformer为每个测试的Python构造开发了专用的神经回路,并且这些回路按照清晰的计算原则而非语义类别进行组织。我们通过边缘化63,800个受控提示,提取了106个概念(43个AST节点类型,63个内置对象)的神经回路,并使用对比检查提示(呈现一个关键字标记而不带其关联的句法结构)将每个回路分解为概念特异性和标记驱动组件。出现了三个发现。首先,所有106个概念在九个参数设置中的每一个都产生非空的通用回路,并且跨构造的概念特异性排名在扫描中保持稳定——存活不是宽松阈值的伪影。其次,AST回路包含一个与标记激活不同的真正概念组件:在中间到后期层,仅概念神经元占最强烈激活神经元的比例高达62.5%,而内置回路几乎完全由标记驱动。第三,六个计算上原子的构造——Import、ImportFrom、Break、Continue、Pass、Assert——尽管在语义上不相关,却聚集在一起,仅共享作为不需要嵌套体的单语句构造的属性;这个原子性超簇,以及由标记歧义性和结构独特性组织的四层层次结构,表明模型的内部组织追踪计算结构而非含义。方法、完整分解数据和分析代码已发布。

英文摘要

A sparse 8-layer code transformer develops dedicated neural circuitry for every Python construct tested, and that circuitry is organised by a clean computational principle rather than by semantic category. We extract neural circuits for 106 concepts (43 AST node types, 63 builtin objects) by marginalising across 63,800 controlled prompts, and decompose each circuit into concept-specific and token-driven components using contrastive checker prompts that present a keyword token without its associated syntactic structure. Three findings emerge. First, all 106 concepts produce non-empty universal circuits at every one of nine parameter settings, and the ranking of concept-specificity across constructs is stable across the sweep - survival is not an artifact of a permissive threshold. Second, AST circuits contain a genuine concept component distinct from token activation: concept-only neurons constitute up to 62.5% of the loudest-firing neurons at mid-to-late layers, while builtin circuits are almost entirely token-driven. Third, six computationally atomic constructs - Import, ImportFrom, Break, Continue, Pass, Assert - cluster together despite being semantically unrelated, sharing only the property of being single-statement constructs requiring no nested body; this atomicity super-cluster, together with a four-tier hierarchy organised by token ambiguity and structural distinctiveness, shows that the model's internal organisation tracks computational structure rather than meaning. The methodology, full decomposition data, and analysis code are released.

2605.24597 2026-05-26 cs.AI cs.CL cs.LG 版本更新

Learning to Reason Efficiently with A* Post-Training

学习通过A*后训练进行高效推理

Andreas Opedal, Francesco Ignazio Re, Abulhair Saparov, Mrinmaya Sachan, Bernhard Schölkopf, Ryan Cotterell

发表机构 * ETH Zürich(苏黎世联邦理工学院) MPI for Intelligent Systems, Tübingen(图宾根智能系统研究所) Purdue University(普渡大学)

AI总结 本文通过A*搜索算法指导LLM生成正确且高效的推理步骤,提出监督微调和强化学习两种训练方法,在1B-3B参数模型上显著提升推理准确性和效率。

Comments Preprint

AI中文摘要

大型语言模型(LLM)的许多应用需要演绎推理,但模型经常产生不正确或冗余的推理步骤。我们将自然语言推理框架化为一个搜索问题,其中最终答案本身就是有效的证明,需要推理过程中间推理正确。具体来说,我们研究LLM是否能够通过A*搜索(一种保证通向目标的最优高效路径的算法)的指导,学习生成正确且高效的证明。我们探索了两种训练技术:在A*执行轨迹上的监督微调,以及使用A*信息的过程奖励模型进行强化学习。实验发现,1B-3B范围内的Llama-3.2模型从A*后训练中获益显著,从接近零准确率提升到超越更大的模型DeepSeek-V3.2。我们的分析揭示了一个权衡:简单的正确性奖励最大化准确率,而A*信息的信号在准确率和效率之间取得平衡。此外,我们发现,在更大的搜索空间中,使用不完美启发式训练的模型表现出更高的准确率。我们的结果展示了朝着由经典搜索算法原理指导的推理方向的有前景的路径。

英文摘要

Many applications of large language models (LLMs) require deductive reasoning, yet models frequently produce incorrect or redundant inference steps. We frame natural language inference as a search problem where the final answer is the valid proof itself, requiring a reasoning procedure in which intermediate inferences are correct. Specifically, we investigate whether LLMs can learn to generate correct and efficient proofs with guidance from A* search -- an algorithm that guarantees an optimally efficient path to a goal. We explore two training techniques: supervised fine-tuning on execution traces from A* and reinforcement learning with A*-informed process reward models. Empirically, we find that Llama-3.2 models in the 1B--3B range benefit substantially from A* post training, going from near-zero accuracy to outperforming DeepSeek-V3.2 -- a much larger model. Our analysis uncovers a trade-off: while simple correctness rewards maximize accuracy, A*-informed signals strike a balance between accuracy and efficiency. Furthermore, we find that on larger search spaces, models trained with imperfect heuristics exhibit superior accuracy. Our results demonstrate a promising direction towards reasoning guided by principles derived from classical search algorithms.
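For readers unfamiliar with A*, the following sketch shows the search procedure and the kind of expansion trace that could serve as supervision. The grid domain, the Manhattan heuristic, and the returned `expanded` list are illustrative assumptions; the paper applies A* to proof search in natural language, whose state space and heuristics are not described in the abstract.

```python
import heapq


def astar(grid, start, goal):
    """A* on a 4-connected grid (0 = free, 1 = wall).

    Returns (path, expanded): the shortest path as a list of cells, or None if
    the goal is unreachable, plus the cells in the order they were expanded.
    """
    def h(cell):  # Manhattan distance: admissible and consistent for unit moves
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]
    g_cost = {start: 0}
    parent = {start: None}
    expanded = []
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if g > g_cost[cell]:
            continue  # stale heap entry superseded by a cheaper route
        expanded.append(cell)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1], expanded
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
                new_g = g + 1
                if new_g < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = new_g
                    parent[nxt] = cell
                    heapq.heappush(open_heap, (new_g + h(nxt), new_g, nxt))
    return None, expanded


GRID = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
path, expanded = astar(GRID, (0, 0), (3, 0))
print(path)
print(len(expanded), "cells expanded")
```

The `expanded` list is the analogue of an execution trace: it records which states the search visited, and a shorter trace for the same path length corresponds to more efficient reasoning.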

2605.24590 2026-05-26 cs.CV cs.LG stat.ML 版本更新

Physen-Noise2Noise: Physics-Guided Self-Supervised Defocus Deblurring with Bias Correction under Low-Light Conditions

Physen-Noise2Noise: 低光条件下带偏差校正的物理引导自监督散焦去模糊

Ziyan Huang, Lang Wu, Hongji Wang, Yifei Liu, Dongliang Tang, Hongqiao Wang

发表机构 * School of Mathematics and Statistics, Central South University(数学与统计学学院,中南大学) Key Laboratory for Micro/Nano Optoelectronic Devices of Ministry of Education, Hunan Provincial Key Laboratory of Low-Dimensional Structural Physics and Devices, School of Physics and Electronics, Hunan University(教育部微/纳米光电器件重点实验室,湖南省低维结构物理与器件重点实验室,物理与电子学院,湖南大学)

AI总结 提出一种基于物理模型的自监督散焦去模糊框架Physen-Noise2Noise,通过可学习噪声偏差参数和频域约束,在无干净参考图像的情况下联合校正偏差噪声并恢复高频细节。

Comments 14 pages

AI中文摘要

低光、长曝光散焦去模糊由于同时存在严重模糊和复杂有偏噪声,仍然是一个具有挑战性的问题。现有方法通常依赖于简化的噪声假设,这限制了它们在真实成像条件下的有效性。在这项工作中,我们提出了Physen-Noise2Noise,一种由散焦成像物理模型引导的自监督去模糊框架,它利用有噪声的多帧观测,无需干净参考图像。与传统的基于Noise2Noise的方法假设零均值噪声不同,我们推导了散焦成像过程固有的频域约束,并通过可学习的噪声偏差参数将其纳入学习框架。此外,引入了一种多帧有噪初始化策略,在去模糊之前抑制复杂有偏噪声,为重建提供更稳定的起点。该公式显式建模有偏噪声,并在训练过程中实现联合偏差校正和高频细节恢复。此外,我们开发了一种预训练-微调变体,以增强在挑战性噪声条件下的鲁棒性和泛化能力。在模拟和真实数据集上的大量实验表明,所提出的方法在存在复杂有偏噪声的情况下,始终优于最先进的自监督散焦去模糊方法。

英文摘要

Low-light, long-exposure defocus deblurring remains a challenging problem due to the simultaneous presence of severe blur and complex biased noise. Existing methods typically rely on simplified noise assumptions, which limits their effectiveness under realistic imaging conditions. In this work, we propose Physen-Noise2Noise, a self-supervised deblurring framework guided by the physical model of defocus imaging, which leverages noisy multi-frame observations without requiring clean reference images. Unlike conventional Noise2Noise-based approaches that assume zero-mean noise, we derive a frequency-domain constraint inherent to the defocus imaging process and incorporate it into the learning framework via a learnable noise bias parameter. In addition, a multi-frame noisy initialization strategy is introduced to suppress complex biased noise prior to deblurring, providing a more stable starting point for reconstruction. This formulation explicitly models biased noise and enables joint bias correction and high-frequency detail recovery during training. Furthermore, we develop a pretrain-finetune variant to enhance robustness and generalization under challenging noise conditions. Extensive experiments on both simulation and real-world datasets demonstrate that the proposed method consistently outperforms state-of-the-art self-supervised approaches for defocus deblurring in the presence of complex biased noise.

2605.24588 2026-05-26 cs.AI cs.LG 版本更新

HeartBeatAI: An Interpretable and Robust Deep Learning Framework for Multi-Label ECG Arrhythmia Detection

HeartBeatAI:用于多标签心电图心律失常的可解释且鲁棒的深度学习框架

Shubham Gupta, Nikhil Panwar, Partha Pratim Roy

发表机构 * Department of Computer Science and Engineering, Indian Institute of Technology Roorkee(印度理工学院鲁尔基分校计算机科学与工程系)

AI总结 提出HeartBeatAI框架,结合域泛化、多尺度特征聚合和临床可解释性,通过Squeeze-and-Excitation ResNet和多层浓度管道实现鲁棒的12导联心电图分类,在源内评估中达到98%宏F1分数,但留一域外评估显示罕见异常检测显著退化,跨机构部署仍存在挑战。

AI中文摘要

虽然深度学习增强了自动化心电图分析,但临床部署受到类别不平衡和泛化差距的阻碍。本文提出了HeartBeatAI,一个结合域泛化、多尺度特征聚合和临床可解释性的深度学习框架,用于鲁棒的12导联心电图分类。超越基于图像的范式,HeartBeatAI集成了一个Squeeze-and-Excitation ResNet来隔离诊断导联,以及一个多层浓度管道来捕捉宏观节律和微观形态异常。为了缓解域偏移,该框架采用了MixStyle正则化和标签平滑。通过使用源内和留一域外协议在四个大规模数据集上进行严格的基准测试,在源内条件下实现了高性能(98%宏F1分数)。然而,留一域外评估揭示了检测罕见异常时的显著退化,突显了跨机构部署中持续存在的挑战。

英文摘要

While Deep Learning (DL) enhances automated electrocardiogram (ECG) analysis, clinical deployment is hindered by class imbalance and the generalization gap. This paper presents HeartBeatAI, a deep learning framework combining domain generalization, multi-scale feature aggregation, and clinical explainability for robust 12-lead ECG classification. Moving beyond image-based paradigms, HeartBeatAI integrates a Squeeze-and-Excitation (SE) ResNet to isolate diagnostic leads alongside a Multi-Layer Concentration Pipeline to capture macro-rhythm and micro-morphological anomalies. To mitigate domain shift, the framework employs MixStyle regularization and Label Smoothing. Rigorous benchmarking across four large-scale datasets using intra-source and Leave-One-Domain-Out (LODO) protocols demonstrates high performance (98% Macro F1-score) under intra-source conditions. However, LODO evaluations reveal significant degradation in detecting rare anomalies, highlighting a persistent challenge in cross-institutional deployment.
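The two regularizers named above can be sketched in a few lines. This is a simplified illustration under stated assumptions: features are treated as a single 1D channel, the mixing coefficient `lam` is passed in directly rather than sampled from a Beta distribution, and the multi-label smoothing rule (pulling binary targets toward 0.5) is one common variant. The abstract does not specify how HeartBeatAI configures either technique.

```python
import statistics


def mixstyle_1d(x, x_other, lam, eps=1e-6):
    """Mix the mean/std 'style' of x with that of x_other, keeping x's content."""
    mu, sigma = statistics.fmean(x), statistics.pstdev(x) + eps
    mu_o, sigma_o = statistics.fmean(x_other), statistics.pstdev(x_other) + eps
    mu_mix = lam * mu + (1 - lam) * mu_o
    sigma_mix = lam * sigma + (1 - lam) * sigma_o
    # Normalize x with its own statistics, then re-style with the mixed ones.
    return [sigma_mix * (v - mu) / sigma + mu_mix for v in x]


def smooth_labels(targets, eps=0.1):
    """Multi-label smoothing: move each binary target a little toward 0.5."""
    return [t * (1 - eps) + eps / 2 for t in targets]


features_a = [1.0, 2.0, 3.0, 4.0]       # hypothetical features from one domain
features_b = [10.0, 20.0, 30.0, 40.0]   # hypothetical features from another
print(mixstyle_1d(features_a, features_b, lam=0.5))
print(smooth_labels([1, 0, 1]))
```

With `lam=1` the features are returned unchanged, and with `lam=0` they take on the other domain's mean and standard deviation, so intermediate values simulate unseen domains between the two.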

2605.24584 2026-05-26 cs.LG cs.AI 版本更新

LAPLEX: The FFT of Learnable Laplace Kernels

LAPLEX: 可学习拉普拉斯核的FFT

Łukasz Struski, Hanna Blazhko, Piotr Kubaty, Jacek Tabor

发表机构 * Faculty of Mathematics and Computer Science, Jagiellonian University(雅盖隆大学数学与计算机科学学院) Doctoral School of Exact and Natural Sciences, Jagiellonian University(雅盖隆大学精确与自然科学博士生院) Centre for Credible Artificial Intelligence, Warsaw University of Technology(华沙理工大学可信人工智能中心)

AI总结 提出LAPLEX算子,通过可学习坐标锚点隐式定义满秩稠密矩阵,实现FFT规模的可训练矩阵-向量运算,分离表达性与存储成本。

AI中文摘要

深度学习中的快速线性代数通常面临一个选择:固定几何和精确计算(如傅里叶变换),或者通过稠密参数、随机特征或低秩近似实现自适应几何。为了超越这种权衡,我们引入了LAPLEX,一类精确的、可训练的(相位)拉普拉斯核算子。LAPLEX层通常是一个满秩稠密矩阵,由可学习的坐标锚点隐式定义,具有类似FFT的缩放特性。因此,它支持在现代GPU上对高达$10^9$维的向量进行可训练的矩阵-向量运算。作为神经网络层,它产生紧凑的投影和分类头,可解释为软性的、可训练的路由模型。同样的原语也可作为高效的Gram算子,实现对展平图像(维度$3 \cdot 10^6$)的高维协方差建模,在保留可见空间结构的同时不施加卷积偏差。这些应用反映了一个单一原则:无需存储稠密矩阵即可学习稠密几何,从而在普通稠密层无法企及的领域中实现数据自适应的全局交互。在这个意义上,LAPLEX将表达性与存储成本分离:它表现得像一个稠密可训练矩阵,但通过一个小的结构化参数集表示和应用。

英文摘要

Fast linear algebra in deep learning usually comes with a choice: fixed geometry and exact computation, as in the Fourier transform, or adaptive geometry paid for by dense parameters, random features, or low-rank surrogates. To move beyond this trade-off, we introduce LAPLEX, a class of exact, trainable (phased) Laplace-kernel operators. A LAPLEX layer is a typically full-rank dense matrix, implicitly defined by learnable coordinate anchors, with FFT-like scaling. Consequently, it supports trainable matrix--vector operations at vector dimensions up to $10^9$ on modern GPUs. As a neural layer, it yields compact projections and classification heads interpretable as soft, trainable routing models. The same primitive also serves as an efficient Gram operator, enabling high-dimensional covariance models on flattened images of dimension $3 \cdot 10^6$ that preserve visible spatial structure without imposing convolutional bias. These applications reflect a single principle: dense geometry can be learned without storing a dense matrix, which enables data-adaptive global interactions in regimes where ordinary dense layers are out of reach. In this sense, LAPLEX separates expressivity from storage cost: it behaves like a dense trainable matrix, but is represented and applied through a small structured set of parameters.

2605.24577 2026-05-26 cs.LG cs.AI cs.CL 版本更新

Polymorphism Is Rotation: Operational Mechanistic Interpretability from a Two-Layer Transformer to Pythia-70m

多态性即旋转:从两层Transformer到Pythia-70m的操作性机制可解释性

Jordan F. McCann

发表机构 * Independent Researcher(独立研究者)

AI总结 本文发现独立训练的Transformer在残差流基上通过均匀随机旋转相互关联,并利用正交Procrustes拟合实现特征字典和转向向量在模型间的迁移,无需重新训练。

Comments 26 pages, 4 figures, 40 references. Pre-registered four-bar framework; all numerical claims reproducible

详情
AI中文摘要

独立训练的Transformer在残差流基上计算相同的函数,这些基通过$\mathrm{SO}(d_{\mathrm{model}})$上的均匀随机旋转相互关联。我们将这种现象称为多态性:相同的函数,但内部坐标互不可解。每对模型之间的一次矩阵乘法即可消除这种多态性:在单批激活上进行正交Procrustes拟合,即可在独立训练的模型之间迁移稀疏自编码器特征字典和转向向量,无需重新训练。 该现象对标准SAE通用性度量不可见。解码器列余弦相似度在不同种子间匹配度达98%,即SAE通用性的头条数字,而一个种子训练的SAE重构另一个种子的激活时,解释方差为负,比预测常数均值更差。解码器列对齐,但编码器从旋转后的框架读取。单个Procrustes旋转$R$可在每个内部位置将重构恢复至种子内上限的0.025 EV以内。 $R$服从Haar分布:$\|R - I\|_F$与随机正交预测$\sqrt{2 d_{\mathrm{model}}}$在$d_{\mathrm{model}} = 512$时匹配至0.1%,且$R$的特征值谱与Haar $\mathrm{SO}(d_{\mathrm{model}})$的Kolmogorov-Smirnov检验在合并和逐对情况下均返回$p \approx 1.000$。均值差转向向量通过与$R$的不变子空间对齐在三种机制下迁移:当被共享输出权重固定时清晰,与旋转子空间重叠时部分,否则反转。在无共享输入/输出(Pythia)时,所有三种情况均坍缩为普遍反转。同一旋转解释适用于单次运行中的不同训练检查点。 在104k参数的Dyck-3 Transformer和九个独立训练的Pythia-70m种子(基于The Pile数据集)上,通过预注册的四柱操作框架进行验证。前沿规模(10B+)的复现仍有待研究。

英文摘要

Independently trained transformers compute the same function in residual-stream bases that differ by a uniform random rotation on $\mathrm{SO}(d_{\mathrm{model}})$. We call this phenomenon polymorphism: same function, mutually unintelligible interior coordinates. One matrix multiplication per model pair removes it: an orthogonal Procrustes fit on a single batch of activations transfers sparse-autoencoder feature dictionaries and steering vectors between independently trained models, with no retraining. The phenomenon is invisible to the standard SAE universality metric. Decoder-column cosine similarity matches across seeds at 98%, the SAE-universality headline number, while an SAE trained on one seed reconstructs another seed's activations at negative explained variance, worse than predicting the constant mean. The decoder columns align; the encoder reads from a rotated frame. A single Procrustes rotation $R$ restores reconstruction to within 0.025 EV of the within-seed ceiling at every internal site. $R$ is Haar-distributed: $\|R - I\|_F$ matches the random-orthogonal prediction $\sqrt{2 d_{\mathrm{model}}}$ to 0.1% at $d_{\mathrm{model}} = 512$, and a Kolmogorov-Smirnov test of $R$'s eigenvalue spectrum against Haar $\mathrm{SO}(d_{\mathrm{model}})$ returns $p \approx 1.000$ pooled and per-pair. Diff-of-means steering vectors transfer in three regimes by alignment with $R$'s invariant subspace: clean when pinned by shared output weights, partial when overlapping the rotated subspace, inverted otherwise. With no shared I/O (Pythia), all three collapse to universally inverted. The same rotation account holds across training checkpoints within a single run. Validated on a 104k-parameter Dyck-3 transformer and nine independently-trained Pythia-70m seeds on The Pile, via a pre-registered four-bar operational framework. Frontier-scale (10B+) replication remains open.
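
The alignment step described in this abstract can be illustrated with a minimal NumPy sketch. It is not the paper's code: the synthetic activations, the helper names `haar_orthogonal` and `procrustes_rotation`, and the dimensions are assumptions made purely for illustration, and the random matrix is drawn from the full orthogonal group O(d) rather than restricted to SO(d).

```python
import numpy as np


def haar_orthogonal(d, rng):
    """Sample a d x d orthogonal matrix from the Haar measure on O(d)."""
    z = rng.standard_normal((d, d))
    q, r = np.linalg.qr(z)
    # Fix the sign ambiguity of QR so the distribution is exactly Haar.
    return q * np.sign(np.diag(r))


def procrustes_rotation(a, b):
    """Return the orthogonal R minimising ||a @ R - b||_F."""
    u, _, vt = np.linalg.svd(a.T @ b)
    return u @ vt


rng = np.random.default_rng(0)
d = 64                                  # stand-in for d_model (assumed)
r_true = haar_orthogonal(d, rng)        # hidden rotation between two "models"
acts_a = rng.standard_normal((512, d))  # one batch of activations, model A
acts_b = acts_a @ r_true                # same activations in model B's basis

r_hat = procrustes_rotation(acts_a, acts_b)
fit_error = np.linalg.norm(acts_a @ r_hat - acts_b)

# For a Haar-random orthogonal matrix, ||R - I||_F concentrates near sqrt(2 d).
ratio = np.linalg.norm(r_true - np.eye(d)) / np.sqrt(2 * d)
print(f"fit error: {fit_error:.2e}, ||R - I||_F / sqrt(2d): {ratio:.3f}")
```

In this noise-free toy setting the fit recovers the hidden matrix up to floating-point error; with real activations the two models are only approximately related by a rotation, so the residual would not vanish.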

2605.24570 2026-05-26 cs.LG cs.AI cs.CV 版本更新

PILOT: Policy-Informed Learned Optimization for Adaptive Deep Network Training

PILOT: 策略引导的学习优化器用于自适应深度网络训练

Sattam Altwaim, Lama Ayash, Muhammad Mubashar, Naeemullah Khan

发表机构 * King Abdullah University of Science and Technology(阿卜杜拉国王科技大学) University of Strathclyde(斯特拉思克莱德大学)

AI总结 提出PILOT在线优化器,通过梯度方向一致性信号动态调整动量、归一化和符号更新的组合,在FashionMNIST和CIFAR-10上实现更高准确率。

Comments 16 pages, 5 figures

详情
AI中文摘要

尽管优化在深度学习中扮演核心角色,但大多数优化器依赖于训练开始前固定函数形式的更新结构。这种静态设计限制了它们响应损失景观中变化梯度行为的能力,其中训练可能在稳定、噪声和不一致状态之间切换。本研究提出PILOT(策略引导的学习优化器),一种在线优化器,在训练过程中自适应其更新行为。PILOT不使用动量、归一化和符号更新之间的固定平衡,而是将梯度方向一致性作为局部训练稳定性的信号。基于该一致性信号调整更新规则,使优化器能够在梯度变得稳定、噪声或不一致时调整其行为。在FashionMNIST和CIFAR-10上的实验表明,PILOT在卷积设置中始终达到评估优化器中的最高准确率。在CNN架构上,PILOT在FashionMNIST上达到94.13%,在CIFAR-10上达到81.94%。在ResNet-18上,它进一步提升了性能,在FashionMNIST上达到95.71%,在CIFAR-10上达到93.42%。这些结果表明,在训练过程中学习如何调整更新结构可以在保持简单一阶优化框架的同时,提高紧凑和更深卷积模型的性能。PILOT的实现公开于https://github.com/SattamAltwaim/PILOT.git。

英文摘要

Despite the central role of optimization in deep learning, most optimizers rely on update structures whose functional form is fixed before training begins. This static design can limit their ability to respond to changing gradient behavior across the loss landscape, where training may shift between stable, noisy, and inconsistent regimes. This study proposes PILOT (Policy-Informed Learned OpTimizer), an online optimizer that adapts its update behavior during training. Rather than using a fixed balance between momentum, normalization, and sign-based updates, PILOT uses gradient-direction agreement as a signal of local training stability. Conditioning the update rule on this agreement signal allows the optimizer to adjust its behavior when gradients become stable, noisy, or inconsistent. Experiments on FashionMNIST and CIFAR-10 show that PILOT consistently achieves the highest accuracy among the evaluated optimizers across convolutional settings. On the CNN architecture, PILOT reaches 94.13% on FashionMNIST and 81.94% on CIFAR-10. On ResNet-18, it further improves performance, reaching 95.71% on FashionMNIST and 93.42% on CIFAR-10. These results suggest that learning how to adapt the update structure during training can improve performance across both compact and deeper convolutional models while preserving a simple first-order optimization framework. The implementation of PILOT is publicly available at https://github.com/SattamAltwaim/PILOT.git

2605.24566 2026-05-26 cs.CV cs.GR cs.LG 版本更新

EMA: Effort Metric Attention for Anatomical Effort-Guided Human Motion Diffusion

EMA: 面向解剖学努力引导的人体运动扩散的努力度量注意力

Joshua Siy, Huakun Liu, Yutaro Hirao, Monica Perusquia-Hernandez, Hideaki Uchiyama, Kiyoshi Kiyokawa

发表机构 * Nara Institute of Science and Technology(奈良先端科学技术大学院大学)

AI总结 提出基于努力度量注意力(EMA)的强度控制框架,通过数值努力信号调节运动扩散模型,实现细粒度、区域化的运动强度控制,并验证了与LMA描述符的单调对齐。

Comments Accepted at IEEE International Conference on Automatic Face and Gesture Recognition (FG 2026)

详情
AI中文摘要

人体运动扩散模型可以从文本合成动作序列,但控制运动强度仍然具有挑战性。现有方法依赖于与努力相关的副词,这些副词模糊不清,无法捕捉诸如节奏等定量方面,通常导致动态平坦且单调。我们提出了一种基于努力度量注意力(EMA)的强度控制框架,这是一个交叉注意力模块,将扩散条件建立在数值努力信号上。受拉班动作分析(LMA)启发,该框架关注时间和重量努力因素。我们使用两个运动学指标来近似这些因素:用于节奏的峰值关节位置变化和用于运动量的集体关节位置变化。EMA实现了细粒度、区域化的控制,无需昂贵的后验优化。我们引入了两个评估任务,度量到运动的一致性和身体部位级别的努力调制,以评估数值保真度和局部控制。实验和用户研究表明,指定的努力水平、生成的运动动态和已建立的LMA描述符之间具有近乎单调的对齐。这些结果表明在实践中对努力动态进行了有效且可解释的控制。

英文摘要

Human motion diffusion models can synthesize action sequences from text, but controlling motion intensity remains challenging. Existing approaches rely on effort-related adverbs, which are ambiguous and fail to capture quantitative aspects such as pacing, often resulting in flat and monotonous dynamics. We propose an intensity-control framework based on Effort Metric Attention (EMA), a cross-attention module that conditions diffusion on numerical effort signals. Inspired by Laban Movement Analysis (LMA), the framework focuses on the Time and Weight effort factors. We approximate these factors using two kinematic metrics: peak joint positional change for pacing and collective joint positional change for motion amount. EMA enables fine-grained, region-wise control without costly post-hoc optimization. We introduce two evaluation tasks, metric-to-motion consistency and body-part-level effort modulation, to assess numerical fidelity and localized control. Experiments and a user study show near-monotonic alignment between specified effort levels, generated motion dynamics, and established LMA descriptors. These results indicate effective and interpretable control of effort dynamics in practice.

2605.24564 2026-05-26 cs.AI cs.CE cs.LG 版本更新

Summoning the Oracle to Slay It: Mitigating Look-Ahead Bias in Financial Backtesting with Large Language Models

召唤神谕以屠之:利用大语言模型缓解金融回测中的前瞻偏差

Weixian Waylon Li, Mengyu Wang, Tiejun Ma

发表机构 * University of Edinburgh(爱丁堡大学)

AI总结 提出FinCAD方法,通过对抗性偏差发现和实体日期自适应规则,在不重新训练的情况下抑制大语言模型对历史结果的记忆,从而缓解金融回测中的参数化前瞻偏差。

详情
AI中文摘要

在历史金融数据上回测大语言模型(LLMs)是不可靠的,因为预训练在事件发生后截断。一个在2024年训练的LLM已经“知道”2018-2020年股票的走势。我们将这种失败命名为参数化前瞻偏差,并提出FinCAD,一种上下文感知解码的推理时适配方法,无需重新训练即可抑制LLM对历史结果的记忆。FinCAD结合了一个对抗性偏差发现流程,该流程学习一个模型特定的记忆激活先验提示,以及一个实体和日期自适应规则,该规则将CAD强度按(实体,日期)记忆程度缩放,使得惩罚在记忆的样本内日期触发,并在样本外衰减至零。在五个7-14B LLM和五只大盘股上,FinCAD在记忆日期上将样本内回测收益削减高达-67.1%,同时将2025年样本外收益保持在$8K以内,夏普比率在基线的0.10以内,并保持通用推理能力在1.7分以内。在十一个模型的排行榜上,它将样本内/样本外Spearman相关性从+0.779提升至+0.846,恢复了真正预测样本外表现的排名。

英文摘要

Backtesting large language models (LLMs) on historical financial data is unreliable because pre-training cuts off after the events happened. An LLM trained in 2024 already "knows" which way 2018-2020 stocks moved. We name this failure parametric look-ahead bias and propose FinCAD, an inference-time adaptation of Context-Aware Decoding that suppresses an LLM's memory of historical outcomes without retraining. FinCAD pairs an adversarial bias-discovery pipeline that learns a model-specific memory-activating prior prompt with an entity- and date-adaptive rule that scales the CAD strength to per-(entity, date) memorisation, so the penalty fires on memorised in-sample dates and decays to zero out-of-sample. Across five 7-14B LLMs and five mega-cap equities, FinCAD cuts in-sample backtest returns by up to -67.1% on memorised dates while leaving 2025 out-of-sample returns within $8K and Sharpe within 0.10 of baseline, and preserves general-purpose reasoning within 1.7 pts. On an eleven-model leaderboard, it raises the in-sample / out-of-sample Spearman correlation from +0.779 to +0.846, recovering rankings that genuinely predict out-of-sample performance.
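
The decoding idea can be sketched with the standard contrastive form used in Context-Aware Decoding, where logits from one prompt are amplified and logits from a second prompt are subtracted. Everything specific below is an assumption for illustration: the three-token vocabulary, the logit values, the linear `adaptive_alpha` schedule, and the memorisation score itself. The abstract does not specify how FinCAD computes these quantities.

```python
import numpy as np


def contrastive_logits(logits_main, logits_prior, alpha):
    """Amplify the main-prompt logits and penalise what the prior prompt predicts."""
    return (1.0 + alpha) * logits_main - alpha * logits_prior


def adaptive_alpha(memorisation, alpha_max=1.0):
    """Hypothetical schedule: strength proportional to a memorisation score in [0, 1]."""
    return alpha_max * float(np.clip(memorisation, 0.0, 1.0))


vocab = ["up", "down", "flat"]               # toy next-token vocabulary
logits_main = np.array([2.0, 1.0, 0.5])      # model given the backtest context
logits_prior = np.array([3.0, 0.0, 0.0])     # model under a memory-activating prompt

# Memorised in-sample date: the penalty is fully active.
in_sample = contrastive_logits(logits_main, logits_prior, adaptive_alpha(1.0))
# Out-of-sample date: the score is zero, so decoding is unchanged.
out_of_sample = contrastive_logits(logits_main, logits_prior, adaptive_alpha(0.0))

choice_in = vocab[int(np.argmax(in_sample))]
choice_out = vocab[int(np.argmax(out_of_sample))]
print(choice_in, choice_out)
```

With a zero memorisation score the adjusted logits equal the original ones, which matches the abstract's description of a penalty that decays to zero out-of-sample.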

2605.24558 2026-05-26 cs.LG 版本更新

Position: AI for Science Should Treat Measurement-to-Dataset Pipelines as Inference Components

立场:科学人工智能应将测量到数据集的处理流程视为推理组件

Ling Zhan, Xiaoyao Yu, Tao Jia

发表机构 * College of Computer and Information Science(计算机与信息科学学院) Chongqing Key Laboratory of Brain-Inspired Cognitive Computing and Educational Rehabilitation for Children with Special Needs(重庆脑启发认知计算及特殊需要儿童教育康复重点实验室) Chongqing Normal University(重庆师范大学)

AI总结 本文主张科学人工智能中的测量到数据集流程应被视为推理组件,并揭示了将其输出视为固定数据导致的三个失败模式,通过大规模神经科学实证验证了问题的严重性,呼吁建立可计算的观测框架。

Comments 23 pages, 5 figures, Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea. PMLR 306, 2026

详情
AI中文摘要

科学人工智能(AI4Science)工作流通常将发布的数据集视为底层系统的固定接口。然而,在依赖间接观测的领域中,学习器观察到的是由多阶段测量、重建和预处理流程产生的衍生表示。我们认为这些测量到数据集的处理流程是推理组件:将其输出视为“给定数据”会冻结观测模型并掩盖可行流程选择的不确定性。我们识别出这种“冻结透镜”导致的三个失败模式:(C1)隐藏假设空间,即发布的数据集未指定流程配置或其有效性条件;(C2)未经认证的可迁移性,即流程可能被记录但其有效性范围未经测试,因此分布偏移下的失败无法判定;(C3)无约束的多样性,即存在许多可辩护的流程且分散性是真实的,但未传播到不确定性感知的证据中。我们通过大规模神经科学实证审计对这些主张进行压力测试,发现在跨数据集稳定性标准下存活率约为0.0004%。我们呼吁AI4Science社区通过特定领域的可计算观测框架使流程成为可计算的推理对象。这一转变能够量化流程的充分性和稳定性,将隐式的实现选择转化为可审计、可复现和累积的科学证据。

英文摘要

AI for Science (AI4Science) workflows often treat the released dataset as a fixed interface to the underlying system. However, in domains relying on indirect observation, the learner observes a derivative representation produced by multi-stage measurement, reconstruction, and preprocessing pipelines. We argue that these measurement-to-dataset pipelines are inference components: treating their outputs as "given data" freezes an observation model and obscures uncertainty over feasible pipeline choices. We identify three failure modes arising from this "frozen lens": (C1) hidden hypothesis space, where the released dataset does not specify the pipeline configuration or its validity conditions; (C2) uncertified transportability, where a pipeline may be documented but its regime of validity is untested, so failures under distribution shift cannot be adjudicated; (C3) ungoverned multiplicity, where many defensible pipelines exist and dispersion is real but not propagated into uncertainty-aware evidence. We stress-test these claims with a large-scale neuroscience empirical audit, finding a survival rate of $\approx 0.0004\%$ under a cross-dataset stability criterion. We call on the AI4Science community to make pipelines computable inference objects via domain-specific Computable Observation Frameworks. This shift enables quantifying pipeline adequacy and stability, converting implicit implementation choices into auditable, reproducible, and cumulative scientific evidence.

2605.24556 2026-05-26 cs.IR cs.CL cs.LG 版本更新

The Multilingual Curse at the Retrieval Layer: Evidence from Amharic

多语言诅咒在检索层:来自阿姆哈拉语的证据

Yosef Worku Alemneh, Kidist Amde Mekonnen, Maarten de Rijke

发表机构 * Independent Researcher(独立研究者) University of Amsterdam(阿姆斯特丹大学)

AI总结 针对零样本多语言检索在低资源形态丰富语言(如阿姆哈拉语)上表现不佳的问题,通过对比实验发现单语检索器显著优于多语言检索器,并揭示了多语言基准测试的局限性。

Comments 10 pages, 4 tables. Accepted to the 1st Workshop on Multilinguality in the Era of Large Language Models (MeLLM) at ACL 2026

详情
AI中文摘要

多语言检索日益支撑着跨语言问答和检索增强生成。在多语言基准测试上的强零样本分数常被视为当前编码器能可靠跨语言迁移的证据。我们认为,对于代表性不足、形态丰富的语言,这一假设不成立,并以阿姆哈拉语作为诊断案例。在涵盖密集、延迟交互、学习稀疏和交叉编码器范式的共享段落检索协议下,我们比较了零样本多语言检索器、阿姆哈拉语微调的多语言检索器以及单语阿姆哈拉语检索器。最强的零样本多语言检索器在MRR@10上比最强的单语阿姆哈拉语第一阶段检索器低23%。在相同的阿姆哈拉语监督下微调两个最新的多语言嵌入模型,相比零样本获得了32-60%的相对MRR@10提升,但最佳阿姆哈拉语微调多语言模型仍低于最强的单语阿姆哈拉语检索器。这些发现表明,零样本多语言检索并不能充分代表LLM时代公平的信息访问:对于代表性不足的语言,检索必须在语言内部进行评估和适应,而不是从聚合的多语言基准测试中推断。为促进未来研究,我们在https://github.com/rasyosef/amharic-neural-ir 公开发布了数据集、代码库和训练模型。

英文摘要

Multilingual retrieval increasingly underpins cross-lingual question answering and retrieval-augmented generation. Strong zero-shot scores on multilingual benchmarks are often taken as evidence that current encoders transfer reliably across many languages. We argue that this assumption breaks down for underrepresented, morphologically rich languages, and use Amharic as a diagnostic case. Under a shared passage retrieval protocol covering dense, late-interaction, learned sparse, and cross-encoder paradigms, we compare zero-shot multilingual retrievers, Amharic-fine-tuned multilingual retrievers, and monolingual Amharic retrievers. The strongest zero-shot multilingual retriever underperforms the strongest monolingual Amharic first-stage retriever by 23% relative MRR@10. Fine-tuning two recent multilingual embedding models on the same Amharic supervision yields 32-60% relative MRR@10 gains over zero-shot, but the best Amharic-fine-tuned multilingual model remains below the strongest monolingual Amharic retriever. These findings indicate that zero-shot multilingual retrieval is not a sufficient proxy for equitable information access in the LLM era: for underrepresented languages, retrieval must be evaluated and adapted in-language rather than inferred from aggregate multilingual benchmarks. To foster future research, we publicly release the dataset, codebase, and trained models at https://github.com/rasyosef/amharic-neural-ir.
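
The headline metric can be made concrete with a small sketch of MRR@10 and a relative gap between two systems. The passage IDs, rankings, and helper names are made up for illustration. Taking the stronger system as the reference in `relative_change` is also an assumption, since the abstract does not state which denominator its relative figures use.

```python
def mrr_at_k(ranked_lists, relevant_sets, k=10):
    """Mean reciprocal rank of the first relevant passage within the top k."""
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_lists)


def relative_change(score, reference):
    """Relative difference of `score` with respect to `reference`."""
    return (score - reference) / reference


# Toy rankings for three queries (all IDs are hypothetical).
relevant = [{"p1"}, {"p7"}, {"p3"}]
system_a = [["p1", "p2"], ["p5", "p7"], ["p9", "p8"]]  # first hits at rank 1, 2, none
system_b = [["p1", "p2"], ["p7", "p5"], ["p4", "p3"]]  # first hits at rank 1, 1, 2

mrr_a = mrr_at_k(system_a, relevant)
mrr_b = mrr_at_k(system_b, relevant)
gap = relative_change(mrr_a, mrr_b)
print(f"MRR@10 A={mrr_a:.3f} B={mrr_b:.3f} relative gap={gap:+.1%}")
```

A query with no relevant passage in the top 10 contributes zero, so the metric rewards placing the first correct passage early rather than retrieving many relevant passages.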

2605.24550 2026-05-26 cs.AI cs.CL cs.LG 版本更新

Jailbreak to Protect: Buffering and Reinforcing via Temporary Jailbreaking for Safe Fine-Tuning in Large Language Models

越狱以保护:通过临时越狱进行缓冲和强化以实现大型语言模型的安全微调

Seokil Ham, Jaehyuk Jang, Wonjun Lee, Changick Kim

发表机构 * School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST)(韩国科学技术院电子工程学院)

AI总结 针对微调即服务中安全对齐被有害微调攻击削弱的问题,提出一种基于梯度分析的缓冲与强化框架,通过临时越狱适配器减少有害更新并利用QR分解合并强化安全,实现无需额外安全数据的高效防御。

Comments ICML 2026 Spotlight

详情
AI中文摘要

微调即服务(FaaS)使得大型语言模型(LLMs)的个性化成为可能,但它在有害微调攻击下会削弱安全对齐。最近的研究表明,在微调期间激活有害行为模块可以防止模型学习不良行为,但其机制尚不清楚。在本文中,我们重新审视临时越狱作为对抗有害微调的一种防御手段,并提供了梯度层面的分析,表明它能够饱和安全退化梯度,同时保留良性任务相关梯度。基于这一见解,我们提出了一种缓冲与强化微调框架,该框架在用户微调期间缓冲有害更新,并在适应后强化安全。具体来说,BufferLoRA作为一个可移除的适配器,在用户微调期间诱导临时越狱以减少有害更新。适应后,通过基于QR分解的合并,将经过训练的ReinforceLoRA(用于在临时越狱状态下恢复拒绝行为)与UserLoRA集成,以在保持用户任务性能的同时强化安全。大量实验表明,我们的框架在用户微调期间无需额外安全数据且计算成本极低的情况下,实现了卓越的安全性和实用性。

英文摘要

Fine-tuning-as-a-Service (FaaS) enables personalization of large language models (LLMs), but it can weaken safety-alignment under harmful fine-tuning attacks. Recent work has shown that activating harmful-behavior modules during fine-tuning can prevent models from learning undesired behaviors, but its mechanism remains unclear. In this paper, we revisit temporary jailbreaking as a defense against harmful fine-tuning and provide a gradient-level analysis showing that it saturates safety-degrading gradients while preserving benign task-relevant gradients. Based on this insight, we propose a Buffer-and-Reinforce fine-tuning framework that buffers harmful updates during user fine-tuning and reinforces safety after adaptation. Specifically, BufferLoRA induces temporary jailbreaking as a removable adapter to reduce harmful updates during user fine-tuning. After adaptation, ReinforceLoRA, trained to recover refusal behavior under the temporarily jailbroken state, is integrated with UserLoRA via QR decomposition-based merging to reinforce safety while preserving user-task performance. Extensive experiments show that our framework achieves superior safety and utility with no additional safety data during user fine-tuning and minimal computational cost.

2605.24548 2026-05-26 cs.LG math.PR 版本更新

Deep ZakaiJ: Structured Filtering for Jump-Diffusion Time Series Forecasting

Deep ZakaiJ:用于跳跃扩散时间序列预测的结构化滤波

Yan Leng, Thibaut Mastrolia, Hao Wang

发表机构 * University of Texas at Austin(德克萨斯大学奥斯汀分校) University of California, Berkeley(加州大学伯克利分校)

AI总结 提出Deep ZakaiJ模型,将Zakai非线性滤波方程嵌入神经编码器-解码器架构,通过Strang分裂实现隐状态信念更新,用于部分观测的跳跃扩散系统,在合成、金融和海洋数据集上改进了分布预测并保持点精度竞争力。

详情
AI中文摘要

由未观测隐状态驱动的时间序列经常表现出突然的跳跃不连续性,其时间和幅度无法仅从观测历史预测。经典跳跃扩散模型提供了严谨的数学框架,但假设刚性参数形式,而最近的神经跳跃模型在完全观测轨迹上操作,不推断控制动力学的隐状态。我们提出Deep ZakaiJ,一种用于部分观测跳跃扩散系统的隐状态模型,将Zakai非线性滤波方程嵌入神经编码器-解码器架构。编码器通过Strang分裂递归更新隐状态的信念,分为三个可解释的子步骤:先验传播、扩散创新和跳跃创新,产生精确滤波演化的可微一阶精确近似。解码器是一个结构化跳跃扩散模型,明确以滤波信念为条件,保持连续动力学和不连续冲击之间的分离。在合成、金融和海洋数据集上,Deep ZakaiJ改善了分布预测,同时保持点精度竞争力,实现了校准的预测区间,并在合成和定性案例研究中恢复了可解释的隐结构。

英文摘要

Time series driven by unobserved latent states frequently exhibit abrupt jump discontinuities whose timing and magnitude cannot be predicted from observed history alone. Classical jump-diffusion models offer a principled mathematical framework but assume rigid parametric forms, while recent neural jump models operate on fully observed trajectories without inferring the hidden states that govern the dynamics. We propose Deep ZakaiJ, a latent-state model for partially observed jump-diffusion systems that embeds the Zakai nonlinear filtering equation into a neural encoder-decoder architecture. The encoder recursively updates a belief over the latent state via Strang splitting into three interpretable substeps: prior propagation, diffusion innovation, and jump innovation, yielding a differentiable, first-order-accurate approximation of the exact filtering evolution. The decoder is a structured jump-diffusion model explicitly conditioned on the filtered belief, preserving the separation between continuous dynamics and discontinuous shocks. On synthetic, financial, and oceanographic datasets, Deep ZakaiJ improves distributional forecasts while remaining competitive in point accuracy, achieving calibrated predictive intervals and recovering interpretable latent structure in synthetic and qualitative case studies.
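
Operator splitting, which the encoder relies on, can be illustrated on a toy linear system. The sketch below is a generic two-operator comparison of sequential (Lie) and symmetric (Strang) splitting. It is not the three-substep Zakai filter from the paper, and the matrices `a` and `b` and the helper names are arbitrary assumptions chosen only because they do not commute.

```python
import numpy as np


def expm_sym(m):
    """Matrix exponential of a real symmetric matrix via eigendecomposition."""
    w, v = np.linalg.eigh(m)
    return (v * np.exp(w)) @ v.T


a = np.array([[0.0, 1.0], [1.0, 0.0]])   # first sub-operator (toy)
b = np.array([[1.0, 0.0], [0.0, -1.0]])  # second sub-operator, does not commute with a


def one_step_errors(h):
    """One-step error of Lie and Strang splitting against the exact flow exp(h (a + b))."""
    exact = expm_sym(h * (a + b))
    lie = expm_sym(h * b) @ expm_sym(h * a)  # full step of a, then full step of b
    strang = expm_sym(0.5 * h * a) @ expm_sym(h * b) @ expm_sym(0.5 * h * a)
    return np.linalg.norm(lie - exact), np.linalg.norm(strang - exact)


lie_1, strang_1 = one_step_errors(0.1)
lie_2, strang_2 = one_step_errors(0.05)
print(f"Lie error ratio when halving h:    {lie_1 / lie_2:.2f}")
print(f"Strang error ratio when halving h: {strang_1 / strang_2:.2f}")
```

Halving the step shrinks the one-step Lie error by roughly a factor of four and the one-step Strang error by roughly a factor of eight in this toy problem, which is why the symmetric half-step arrangement is a common choice when sub-steps are cheap to evaluate.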

2605.24547 2026-05-26 cs.LG 版本更新

RL with Learnable Textual Feedback: A Bilevel Approach

基于可学习文本反馈的强化学习:一种双层方法

Utsav Singh, Sidhaarth Sredharan, Souradip Chakraborty, Amrit Singh Bedi

发表机构 * University of Central Florida(中佛罗里达大学)

AI总结 针对稀疏奖励导致样本效率低的问题,提出一种双层优化框架Bi-NAC,联合训练评论家生成可改善策略的文本反馈和演员利用该反馈,在MATH-500等任务上提升了样本和参数效率。

详情
AI中文摘要

具有可验证奖励的强化学习可以改进LLM的推理能力,但当终端奖励稀疏时,学习仍然样本效率低下。这推动了关于文本反馈强化学习的一系列工作,其中评论家模型生成自然语言反馈来指导推理模型(演员),用更丰富的学习信号增强标量奖励。然而,现有方法通常将反馈视为固定的或辅助的,这忽略了关键性质:反馈不仅应正确,而且应在上下文中提供时改进策略(演员模型)。这激发了用于强化学习的可学习文本反馈范式。然而,反馈的可学习性和有用性取决于策略从中学习的能力,使得具有可学习反馈的强化学习本质上是一个双层问题。我们将这种耦合形式化为Stackelberg双层规划,并推导出双层自然语言演员-评论家(Bi-NAC),它联合训练评论家生成改善奖励的反馈和演员利用该反馈。在MATH-500、MBPP和GPQA上,Bi-NAC在样本和参数效率上优于强化学习和固定评论家基线:我们的2B模型优于3B GRPO基线,在MATH-500上达到46.6%对比41.4%,而我们的6B模型超过7B GRPO基线,在GPQA上达到49.3%对比43.6%。

英文摘要

Reinforcement learning with verifiable rewards can improve LLM reasoning, but learning remains sample-inefficient when terminal rewards are sparse. This has motivated a growing line of work on RL with textual feedback, where a critic model generates natural language feedback to guide a reasoning model (the actor), augmenting scalar rewards with richer learning signals. However, existing methods typically treat feedback as fixed or auxiliary, which misses a key property: feedback should not merely be correct, but should improve the policy (actor model) when provided in context. This motivates a paradigm of learnable textual feedback for RL. Yet the learnability and usefulness of feedback depend on the policy's ability to learn from it, making RL with learnable feedback an inherently bilevel problem. We formalize this coupling as a Stackelberg bilevel program and derive Bilevel Natural Language Actor-Critic (Bi-NAC), which jointly trains a critic to generate reward-improving feedback and an actor to exploit it. Across MATH-500, MBPP, and GPQA, Bi-NAC improves sample and parameter efficiency over RL and fixed-critic baselines: our 2B model outperforms the 3B GRPO baseline, achieving 46.6% versus 41.4% on MATH-500, while our 6B model surpasses the 7B GRPO baseline, achieving 49.3% versus 43.6% on GPQA.

2605.24545 2026-05-26 cs.LG cs.AI 版本更新

Rethinking Federated Unlearning via the Lens of Memorization

通过记忆视角重新思考联邦遗忘学习

Jiaheng Wei, Yanjun Zhang, He Zhang, Leo Yu Zhang, Chao Chen, Kok-Leong Ong, Jun Zhang, Yang Xiang

发表机构 * Royal Melbourne Institute of Technology(皇家墨尔本理工大学) Griffith University(格里菲斯大学) Swinburne University of Technology(斯威本理工大学)

AI总结 针对联邦学习中遗忘数据与保留数据重叠导致遗忘无效和客户端不公平的问题,提出基于分组记忆评估的联邦记忆剪枝方法,通过重置负责记忆的冗余参数实现高效遗忘。

Comments This paper has been accepted by SIGKDD 2026

详情
AI中文摘要

联邦学习越来越需要机器遗忘来遵守隐私法规。然而,现有的联邦遗忘方法可能忽略了遗忘数据与保留数据之间的重叠信息,导致遗忘无效和客户端之间的不公平。在这项工作中,我们通过记忆的视角重新审视联邦遗忘。我们认为,遗忘主要应移除归因于待遗忘数据的独特记忆信息,同时保留也得到剩余数据支持的重叠模式。具体地,我们提出了分组记忆评估,一种示例级度量,将记忆知识与重叠知识分离。基于该度量,我们引入了联邦记忆剪枝(FedMemPrune),一种基于剪枝的遗忘方法,重置负责记忆的冗余参数。大量实验表明,FedMemPrune 与基于重训练的遗忘基线紧密匹配,同时比现有联邦遗忘算法更有效地消除记忆,在保持保留知识效用的情况下实现了强大的遗忘性能。

英文摘要

Federated learning (FL) increasingly needs machine unlearning to comply with privacy regulations. However, existing federated unlearning approaches may overlook the overlapping information between the unlearning and remaining data, leading to ineffective unlearning and unfairness between clients. In this work, we revisit federated unlearning through the lens of memorization. We argue that unlearning should mainly remove the unique memorized information attributable to the data to be forgotten, while preserving overlapping patterns that are also supported by the remaining data. Specifically, we propose Grouped Memorization Evaluation, an example-level metric that separates memorized knowledge from overlapping knowledge. Building on this metric, we introduce Federated Memorization Pruning (FedMemPrune), a pruning-based unlearning approach that resets redundant parameters responsible for memorization. Extensive experiments show that FedMemPrune closely matches retraining-based unlearning baselines while more effectively eliminating memorization than existing federated unlearning algorithms, yielding strong unlearning performance without sacrificing the utility of retained knowledge.

2605.24542 2026-05-26 cs.CR cs.AI cs.LG cs.MA cs.SE 版本更新

AI-Driven Adaptive Adversaries and the Erosion of Cryptographic Trust in Public Key Systems

AI驱动的自适应对手与公钥系统中密码学信任的侵蚀

Petar Radanliev

发表机构 * Department of Computer Sciences, University of Oxford(牛津大学计算机科学系) The Alan Turing Institute(艾伦·图灵研究所) British Library(大英图书馆)

AI总结 本文研究人工智能驱动的自适应对手如何利用实现层面的可观测性侵蚀公钥密码学的安全性,提出了一种新的安全评估框架。

详情
Journal ref J Anal Sci Technol 17, 26 (2026)
AI中文摘要

本文研究了在人工智能驱动的自适应对手优化下,公钥密码学(PKC)安全性的侵蚀问题。所解决的问题是以算法为中心的密码安全模型与操作攻击现实之间日益增长的错配,其中对手利用实现层面的可观测性,而不是破解密码原语。

英文摘要

This paper examines the erosion of Public Key Cryptography (PKC) security under adaptive adversarial optimisation driven by artificial intelligence. The problem addressed is the growing mismatch between algorithm-centric cryptographic security models and operational attack realities, where adversaries exploit implementation-level observability rather than breaking cryptographic primitives.

2605.24541 2026-05-26 cs.LG cs.AI cs.CL cs.IR 版本更新

SemanticZip: A Pilot Framework for Lossy Text Compression with LLMs as Semantic Decompressors

SemanticZip: 以LLM作为语义解压器的有损文本压缩的试点框架

Natalia Trukhina, Vadim Vashkelis

发表机构 * Embedded Intelligence Lab (EMILAB)(嵌入式智能实验室)

AI总结 提出SemanticZip框架,通过LLM将文本压缩为紧凑代码并解压为任务相关语义,在结构化散文、JSON等六种表示上评估,发现结构化散文恢复率最高(WAR=0.956,19.1%令牌增益),而CCL-Min平衡性最佳(39.4%令牌增益,WAR=0.874)。

Comments 13 pages, 1 figure, 2 tables. Pilot framework paper; code and supplementary artifacts available in ancillary files

详情
AI中文摘要

大型语言模型(LLM)系统的文本压缩通常被归结为令牌删除、检索、摘要或精确重建。我们研究了一种更激进但明确有损的设置:将文本压缩为紧凑代码,LLM可以将其扩展为任务相关的含义。我们将此设置称为SemanticZip。与无损压缩不同,SemanticZip不需要字节相同的重建;与普通摘要不同,它将基于模型的解压缩视为编解码器的一部分,并评估是否恢复了任务相关的语义承诺。本文是一个试点框架,而非基准声明。我们形式化了LLM介导的解压缩,定义了受保护/有损数据包架构,并在五个作者构建的诊断案例上评估了六种表示体系:结构化散文、JSON、CCL-Core、CCL-Min、SemanticZip ASCII和SemanticZip emoji。一个独立的解码器LLM从每种压缩表示中重建类型化的语义原子,我们评估关键原子召回率、加权原子召回率(WAR)、精确度和分词器增益。在该试点中,结构化散文具有最高的可恢复性,WAR=0.956,o200k_base令牌增益19.1%。CCL-Min是最强的平衡点,令牌增益39.4%,WAR=0.874。SemanticZip ASCII提供了最大的有用压缩,令牌增益46.5%,WAR=0.802,而表情符号密集的SemanticZip在压缩和恢复方面表现均较差。主要贡献并非声称这些数字建立了通用前沿。相反,我们引入了一个可重复的实验接口,用于研究有损、LLM可解压的文本代码,以及一个设计原则:安全关键和精确的承诺应保持受保护,而可预测的低风险上下文可以进行语义压缩。

英文摘要

Text compression for large language model (LLM) systems is usually framed as token deletion, retrieval, summarization, or exact reconstruction. We study a more aggressive but explicitly lossy setting: compress text into compact codes that an LLM can expand into task-relevant meaning. We call this setting SemanticZip. Unlike lossless compression, SemanticZip does not require byte-identical reconstruction; unlike ordinary summarization, it treats model-based decompression as part of the codec and evaluates whether task-relevant semantic commitments are recovered. This paper is a pilot framework, not a benchmark claim. We formalize LLM-mediated decompression, define a protected/lossy packet architecture, and evaluate six representation regimes over five author-constructed diagnostic cases: structured prose, JSON, CCL-Core, CCL-Min, SemanticZip ASCII, and SemanticZip emoji. An independent decoder LLM reconstructs typed semantic atoms from each compressed representation, and we score Critical Atom Recall, Weighted Atom Recall, precision, and tokenizer gain. In this pilot, structured prose has the highest recoverability, with WAR = 0.956 and 19.1% o200k_base token gain. CCL-Min is the strongest balanced point, with 39.4% token gain and WAR = 0.874. SemanticZip ASCII provides the largest useful compression, with 46.5% token gain and WAR = 0.802, while emoji-heavy SemanticZip performs worse on both compression and recovery. The main contribution is not the claim that these numbers establish a universal frontier. Rather, we introduce a reproducible experimental interface for studying lossy, LLM-decompressible text codes and a design principle: safety-critical and exact commitments should remain protected, while predictable low-risk context may be semantically zipped.

2605.22794 2026-05-26 cs.AI cs.LG 版本更新

MOSS: Self-Evolution through Source-Level Rewriting in Autonomous Agent Systems

MOSS:自主智能体系统中通过源代码级重写的自我进化

Qianshu Cai, Yonggang Zhang, Xianzhang Jia, Huajiang Zheng, Wei Xue, Jun Song, Xinmei Tian, Yike Guo

发表机构 * University of Science and Technology of China(中国科学技术大学) Hong Kong Generative AI Research & Development Center(香港生成式AI研究与开发中心) The Hong Kong University of Science and Technology(香港科技大学) Hong Kong Baptist University(香港浸会大学)

AI总结 提出MOSS系统,通过源代码级重写实现自主智能体系统的自我进化,利用生产故障证据自动批处理和多阶段确定性流水线,在OpenClaw上单周期内将平均评分从0.25提升至0.61。

Comments 12 pages, 3 figures, 2 tables. Preprint. Code: https://github.com/hkgai-official/Moss

详情
AI中文摘要

自主智能体系统在部署后基本是静态的:它们不会从用户交互中学习,重复的失败会持续存在,直到下一次人工驱动的更新发布修复。自我进化的智能体应运而生,但所有进化都局限于文本可变的工件——技能文件、提示配置、记忆模式、工作流图——而智能体框架本身保持不变。由于路由、钩子排序、状态不变量和调度存在于代码中而非任何文本工件中,整个结构故障类别在文本层上是物理上不可达的。我们认为源代码级适应是一种本质上更通用的媒介:它是图灵完备的,是每个文本可变范围的严格超集,通过确定性方式生效而非基础模型合规性,并且不会在长上下文漂移下退化。我们提出了MOSS,一个在生产智能体基板上执行源代码级自我重写的系统。每次进化都锚定在自动策划的生产故障证据批次上,并通过确定性的多阶段流水线进行;代码修改委托给可插拔的外部编码智能体CLI,而MOSS保留阶段顺序和判定。候选者通过在临时试验工作器中重放批次来验证,然后通过用户同意门控的就地容器交换和健康探针门控的回滚进行推广。在OpenClaw上,MOSS在单周期内无需人工干预将四个任务的平均评分从0.25提升至0.61。

英文摘要

Autonomous agentic systems are largely static after deployment: they do not learn from user interactions, and recurring failures persist until the next human-driven update ships a fix. Self-evolving agents have emerged in response, but all confine evolution to text-mutable artifacts -- skill files, prompt configurations, memory schemas, workflow graphs -- and leave the agent harness untouched. Since routing, hook ordering, state invariants, and dispatch live in code rather than in any text artifact, an entire class of structural failure is physically unreachable from the text layer. We argue that source-level adaptation is a fundamentally more general medium: it is Turing-complete, a strict superset of every text-mutable scope, takes effect deterministically rather than through base-model compliance, and does not erode under long-context drift. We present MOSS, a system that performs self-rewriting at the source level on production agentic substrates. Each evolution is anchored to an automatically curated batch of production-failure evidence and proceeds through a deterministic multi-stage pipeline; code modification is delegated to a pluggable external coding-agent CLI while MOSS retains stage ordering and verdicts. Candidates are verified by replaying the batch against the candidate image in ephemeral trial workers, then promoted via user-consent-gated, in-place container swap with health-probe-gated rollback. On OpenClaw, MOSS lifts a four-task mean grader score from 0.25 to 0.61 in a single cycle without human intervention.

2605.22684 2026-05-26 cs.LG 版本更新

ChronoVAE-HOPE: Beyond Attention -- A Next-Generation VAE Foundation Model for Specialized Time Series Classification

ChronoVAE-HOPE:超越注意力——面向专业时间序列分类的下一代VAE基础模型

José Alberto Rodríguez, Luis Balderas, Miguel Lastra, Antonio Arauzo-Azofra, José M. Benítez

发表机构 * Department of Computer Science and Artificial Intelligence(计算机科学与人工智能系) DiCITS iMUDS DaSCI University of Granada(格拉纳达大学) Advanced Medical Imaging Group(先进医学成像组) Instituto de Investigación Biosanitaria de Granada(格拉纳达生物健康研究所) Department of Software Engineering(软件工程系) Department of Rural Engineering(农村工程系) University of Córdoba(科尔多瓦大学)

AI总结 提出ChronoVAE-HOPE,一种基于VAE和HOPE块(含Titans模块和连续记忆系统)的下一代时间序列基础模型,通过解耦潜在空间分离趋势与季节成分,在UCR基准分类任务上表现优异。

详情
AI中文摘要

时间序列基础模型已成为通用时间序列预测领域的最新技术组成部分。然而,将其应用于专业分类任务仍受两个相互关联的挑战制约:标准注意力机制的二次成本以及无法解耦时间序列变异性背后的结构成分。本技术报告介绍了ChronoVAE-HOPE,一种下一代时间序列基础模型,它调和了大规模泛化与结构化潜在表示在时间序列分类中的需求。该方案的核心是构建于HOPE块之上的变分自编码器框架,该框架用双记忆系统替代二次注意力:用于动态短期保留的Titans模块和用于长期历史上下文抽象的连续记忆系统。一个关键的架构创新是解耦潜在空间,通过专用编码器头和分离的解码器路径将表示分解为独立的趋势和季节成分。ChronoVAE-HOPE在Monash档案上进行自监督预训练,结合了掩码时间序列建模辅助目标和解耦VAE重建损失。预训练编码器随后被冻结,用于生成固定长度嵌入,以在UCR基准数据集上进行下游分类。实证结果表明,在不同时间域上,特别是在具有严格因果结构的设置中,模型表现出强劲性能。ChronoVAE-HOPE通过结构化生成表示为基础模型适应时间序列分类建立了一个稳健且可解释的框架。

英文摘要

Time Series Foundation Models (TSFMs) have become a new component of the state-of-the-art in general time series forecasting. However, adapting them to specialized classification tasks remains constrained by two interconnected challenges: the quadratic cost of standard attention mechanisms and the inability to disentangle the structural components underlying time series variability. This technical report introduces ChronoVAE-HOPE, a next-generation TSFM that reconciles massive generalization with structured latent representation for time series classification. The core of the proposal is a Variational Autoencoder (VAE) framework built upon the HOPE Block, which replaces quadratic attention with a dual-memory system: Titans modules for dynamic short-term retention and a Continuum Memory System (CMS) for the abstraction of long-term historical context. A key architectural novelty is the disentangled latent space, which factorizes representations into independent trend and seasonal components via dedicated encoder heads and separate decoder pathways. ChronoVAE-HOPE undergoes self-supervised pre-training on the Monash archive, combining a Masked Time Series Modeling (MTSM) auxiliary objective with a disentangled VAE reconstruction loss. The pre-trained encoder is subsequently frozen and used to generate fixed-length embeddings for downstream classification on the UCR benchmark datasets. Empirical results demonstrate strong performance across diverse temporal domains, particularly in settings characterized by strict causal structure. ChronoVAE-HOPE establishes a robust and interpretable framework for the adaptation of foundation models to time series classification through structured generative representations.

2605.22242 2026-05-26 cs.LG physics.ao-ph 版本更新

Decomposing Ensemble Spread in Lorenz '96 With Learned Stochastic Parameterizations

利用学习随机参数化分解 Lorenz '96 中的集合离散度

Birgit Kühbacher, Daan Crommelin, Niki Kilbertus

发表机构 * Technical University of Munich(慕尼黑工业大学) Helmholtz Munich(亥姆霍兹慕尼黑研究中心) Munich Center for Machine Learning (MCML)(慕尼黑机器学习中心) Centrum Wiskunde & Informatica (CWI)(荷兰国家数学与计算机科学研究中心) Korteweg-de Vries Institute for Mathematics, University of Amsterdam(阿姆斯特丹大学科特韦格-德弗里斯数学研究所)

AI总结 本研究利用双尺度 Lorenz 1996 系统,通过比较多种集合配置和参数化策略,系统分析了内在变率、初始条件扰动和随机模型不确定性对集合离散度的影响,揭示了随机参数化特别是时间持续结构能增强早期离散度增长并改善离散度-误差一致性。

详情
AI中文摘要

由于混沌动力学、不完美的初始条件以及对底层物理过程的不完全表示,天气和气候预报本质上具有不确定性。业务集合预报旨在通过预报离散度来表示这些不确定性,然而许多方法产生的离散度估计不足,即离散度相对于预报误差增长过慢。利用双尺度 Lorenz 1996 系统作为广泛使用的受控测试平台,我们设计了一种系统方法来区分内在变率、初始条件扰动和随机模型不确定性。我们比较了多种集合配置和参数化策略,包括现有的确定性和自回归方法以及新颖的贝叶斯和基于流的方法。我们的结果表明,集合扰动不会增加系统的长期方差;相反,它们调节轨迹去相关和探索不变测度的速度。随机参数化,特别是那些具有时间持续结构的参数化,增强了早期离散度增长并改善了离散度-误差一致性。总体而言,我们阐明了不同不确定性来源在混沌系统中如何相互作用,并为天气和气候模型中随机参数化的设计和评估提供了指导。

英文摘要

Weather and climate forecasts are inherently uncertain due to chaotic dynamics, imperfect initial conditions, and incomplete representation of the underlying physical processes. Operational ensemble forecasts aim to represent these uncertainties through forecast spread, yet many approaches yield underdispersive estimates, with spread that grows too slowly relative to forecast error. Using the two-scale Lorenz 1996 system as a widely used, controlled testbed, we design a systematic approach to disentangle intrinsic variability, initial-condition perturbations, and stochastic model uncertainty. We compare multiple ensemble configurations and parameterization strategies, including existing deterministic and autoregressive as well as novel Bayesian and flow-based approaches. Our results show that ensemble perturbations do not increase the system's long-term variance; rather, they regulate how rapidly trajectories decorrelate and explore the invariant measure. Stochastic parameterizations, particularly those with temporally persistent structure, enhance early spread growth and improve spread-error consistency. Overall, we bring clarity to how different sources of uncertainty interact in a chaotic system and provide guidance for the design and evaluation of stochastic parameterizations in weather and climate models.
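
The separation of uncertainty sources described above can be illustrated with a small sketch. It is not the authors' setup: it assumes a single-scale Lorenz '96 model in which the unresolved fast variables are replaced by a hypothetical additive AR(1) term, and the names `run_ensemble`, `ic_std`, `noise_std`, and `phi` are illustrative. Switching initial-condition perturbations and the stochastic term on or off separately shows which source drives ensemble spread.

```python
import math
import random

def l96_tendency(x, forcing, extra):
    """Single-scale Lorenz '96 tendency plus an additive parameterization term."""
    k = len(x)
    return [
        -x[(i - 1) % k] * (x[(i - 2) % k] - x[(i + 1) % k]) - x[i] + forcing + extra[i]
        for i in range(k)
    ]

def rk4_step(x, dt, forcing, extra):
    """One RK4 step; the parameterization term is held fixed within the step."""
    def shifted(base, slope, h):
        return [b + h * s for b, s in zip(base, slope)]
    k1 = l96_tendency(x, forcing, extra)
    k2 = l96_tendency(shifted(x, k1, dt / 2), forcing, extra)
    k3 = l96_tendency(shifted(x, k2, dt / 2), forcing, extra)
    k4 = l96_tendency(shifted(x, k3, dt), forcing, extra)
    return [xi + dt / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

def ensemble_spread(members):
    """Square root of the across-member variance averaged over variables."""
    n, k = len(members), len(members[0])
    total = 0.0
    for i in range(k):
        mean = sum(m[i] for m in members) / n
        total += sum((m[i] - mean) ** 2 for m in members) / n
    return math.sqrt(total / k)

def run_ensemble(n_members=8, k=40, forcing=8.0, dt=0.01, n_steps=300,
                 ic_std=0.0, noise_std=0.0, phi=0.9, seed=0):
    """Return the spread time series for one ensemble configuration (assumed setup)."""
    rng = random.Random(seed)
    base = [forcing] * k
    base[0] += 0.01  # nudge off the unstable fixed point x_i = forcing
    members = [[b + rng.gauss(0.0, ic_std) for b in base] for _ in range(n_members)]
    noise = [[0.0] * k for _ in range(n_members)]
    spread = [ensemble_spread(members)]
    for _ in range(n_steps):
        for m in range(n_members):
            # AR(1) update: phi sets the temporal persistence of the stochastic term
            noise[m] = [phi * e + noise_std * math.sqrt(1 - phi ** 2) * rng.gauss(0.0, 1.0)
                        for e in noise[m]]
            members[m] = rk4_step(members[m], dt, forcing, noise[m])
        spread.append(ensemble_spread(members))
    return spread

if __name__ == "__main__":
    for label, kwargs in [("IC only", {"ic_std": 1e-3}),
                          ("stochastic only", {"noise_std": 0.5}),
                          ("both", {"ic_std": 1e-3, "noise_std": 0.5})]:
        s = run_ensemble(**kwargs)
        print(label, "initial spread", s[0], "final spread", s[-1])
```

Raising `phi` toward 1 makes the stochastic term more persistent in time, which is the kind of structure the abstract associates with faster early spread growth; the sketch does not attempt to reproduce that result.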

2605.20747 2026-05-26 q-bio.GN cs.LG 版本更新

Multi-Modal Machine Learning for Population- and Subject-Specific lncRNA-Type 2 Diabetes Association Analysis

多模态机器学习用于群体和个体特异性lncRNA-2型糖尿病关联分析

Ashwani Siwach, Sanjeev Narayan Sharma, Sunil Datt Sharma

发表机构 * Department of Electronics and Communication Engineering, IIITDM Jabalpur(IIITDM Jabalpur电子与通信工程系) Department of Electronics and Communication Engineering, Central University of Jammu(Jammu中央大学电子与通信工程系)

AI总结 本研究通过整合表达、二级结构和序列特征的多模态机器学习框架,在独立队列中识别与2型糖尿病相关的lncRNA,并利用SHAP分析实现群体和个体水平的关联解释。

Comments This work has been submitted to the IEEE for possible publication

详情
AI中文摘要

长链非编码RNA(lncRNA)是参与慢性疾病(包括2型糖尿病)发病机制的新兴调控分子。我们研究了文献中报道的与2型糖尿病相关的十种lncRNA:MALAT1、MEG3、MIAT、ANRIL、GAS5、KCNQ1OT1、H19、BCYRN1、XIST和HOTAIR,在两个独立的人群RNA-seq队列中进行了分析。单组学方法提供了疾病生物学的不完整视图,因此开发了一个整合多特征框架,提取每种lncRNA的表达、二级结构和序列特征。在分层k折交叉验证、留一法交叉验证和重复留出法方案下评估了八种机器学习分类器,以确保稳健的性能估计。应用SHAP分析进行个体水平的关联解释。在一个队列中,发现GAS5和XIST的表达特征以及GAS5、MEG3和ANRIL的序列特征与2型糖尿病相关,而在第二个队列中,发现MALAT1的表达特征以及KCNQ1OT1、ANRIL和MEG3的序列特征与2型糖尿病相关。SHAP将MEG3识别为两个队列中的主要lncRNA。机器学习结果与已建立的统计方法一致,同时额外提供了与特定分子特征类型相关的群体和个体水平疾病关联谱。所提出的框架增进了对2型糖尿病机制的理解,并支持基于lncRNA的精准医学。

英文摘要

Long non-coding RNAs (lncRNAs) are emerging regulatory molecules implicated in chronic disease pathogenesis, including Type 2 Diabetes Mellitus (T2D). We investigated ten literature-reported lncRNAs associated with T2D: MALAT1, MEG3, MIAT, ANRIL, GAS5, KCNQ1OT1, H19, BCYRN1, XIST, and HOTAIR across two independent population-based RNA-seq cohorts. Single-omics approaches provide an incomplete view of disease biology; therefore, an integrative multi-feature framework was developed, extracting expression, secondary-structure, and sequence features for each lncRNA. Eight machine learning (ML) classifiers were evaluated under stratified k-fold, leave-one-out cross-validation (LOOCV), and repeated hold-out schemes to ensure robust performance estimation. SHAP analysis was applied for subject-level association interpretation. In one cohort, GAS5 and XIST expression features, along with GAS5, MEG3, and ANRIL sequence features, were found to be associated with T2D, while MALAT1 expression and KCNQ1OT1, ANRIL, and MEG3 sequence features were found to be associated in the second cohort. MEG3 was identified by SHAP as the dominant lncRNA in both cohorts. ML results were consistent with established statistical methods while additionally providing population- and subject-level disease association profiles linked to specific molecular feature types. The proposed framework advances mechanistic understanding of T2D and supports lncRNA-based precision medicine.

2605.20416 2026-05-26 cs.LG physics.comp-ph 版本更新

Miller-Index-Based Latent Crystallographic Fracture Plane Reasoning and generation with Vision-Language Models

基于米勒指数的潜在晶体学断裂面推理与生成:视觉-语言模型方法

Qinwu Xu, Xiaofu Ma, Yifan Jiang

发表机构 * Independent research(独立研究)

AI总结 研究多模态大语言模型能否利用米勒指数作为结构化潜在表示来推理断裂几何,实验表明模型在理想条件下可进行潜在推理,并能拒绝不适用物理的表示。

详情
AI中文摘要

我们研究多模态大语言模型(MLLMs)是否能够利用晶体学平面指数(米勒指数)作为结构化潜在表示来推理断裂几何。我们将米勒指数 $z = (h,k,l)$ 形式化为控制理想平面断裂的潜在变量,并评估两种互补能力:(i) 潜在推理,即模型在物理有效条件下将视觉观测映射到平面假设;(ii) 潜在适用性评估,即模型判断这种表示对于给定断裂图像是否有意义。通过涵盖合成数据、受控的2D-3D几何对以及多种材料类别(包括陶瓷、玻璃、金属和混凝土)的真实断裂图像的广泛实验,我们表明MLLMs能够在理想设置下可靠地进行潜在推理,并且关键的是,当底层物理不支持时,能够拒绝该潜在表示。作为探索性扩展,我们进一步检查了AI生成的断裂序列,并观察到定性上合理的脆性断裂进展行为,这表明多模态生成模型可能编码了与材料失效动力学相关的部分隐式物理先验。这些结果表明,只要明确建模有效性域,MLLMs可以作为基于结构化潜在先验的物理感知推理系统。

英文摘要

We study whether multimodal large language models (MLLMs) can leverage crystallographic plane indices (Miller indices) as a structured latent representation for reasoning about fracture geometry. We formulate Miller indices $z = (h,k,l)$ as a latent variable governing idealized planar fracture and evaluate two complementary capabilities: (i) latent inference, where the model maps visual observations to plane hypotheses under physically valid conditions, and (ii) latent applicability assessment, where the model determines whether such a representation is meaningful for a given fracture image. Through extensive experiments spanning synthetic data, controlled 2D--3D geometric pairs, and real-world fracture images across multiple material classes -- including ceramics, glass, metals, and concrete -- we show that MLLMs can reliably perform latent inference in idealized settings and, critically, can reject the latent representation when the underlying physics does not support it. As an exploratory extension, we further examine AI-generated fracture sequences and observe qualitatively plausible brittle-fracture progression behaviors, suggesting that multimodal generative models may encode partial implicit physical priors related to material failure dynamics. These results suggest that MLLMs can act as physics-aware reasoning systems conditioned on structured latent priors, provided that the domain of validity is explicitly modeled.

2605.20278 2026-05-26 cs.LG cs.AI cs.CV 版本更新

ClaimDiff-RL: Fine-Grained Caption Reinforcement Learning through Visual Claim Comparison

ClaimDiff-RL: 通过视觉声明比较进行细粒度描述强化学习

Tianle Li, Xuyang Shen, Yan Ma, Rongxin Guo, Shaoxiang Chen, Jiacheng Chen, Haochen Wang, Hongyang Tang, Yucong Zhou, Yu Cheng

发表机构 * The Chinese University of Hong Kong(香港中文大学) MiniMax

AI总结 提出ClaimDiff-RL框架,利用原子声明差异作为奖励单元,通过多模态判断器枚举视觉差异并分配错误类型和严重程度,以解决长描述强化学习中事实性与覆盖度的权衡问题。

详情
AI中文摘要

长格式图像描述揭示了强化学习中的奖励粒度问题:描述被整体判断,而重要错误发生在单个视觉声明层面。一个好的密集描述应既忠实又信息丰富,避免幻觉而不遗漏显著细节。然而,成对偏好、基于参考的指标和整体标量奖励将这些局部错误压缩为单个序列级信号,模糊了事实性与覆盖度之间的权衡。我们引入ClaimDiff-RL框架,该框架使用以参考为条件的原子声明差异作为描述强化学习的奖励单元。给定一张图像、一个演员描述和一个参考描述,多模态判断器枚举有视觉依据的差异,针对图像验证每个差异,分配开放词汇的错误类型和严重程度,并生成每个差异的统计信息用于奖励组合。这使得幻觉声明和遗漏的显著事实可以分别测量和调整。实验表明,整体标量奖励可以通过增加遗漏事实来减少幻觉,而ClaimDiff-RL揭示了这种忠实性与覆盖度的权衡,并实现了更平衡的操作点。在包含160张图像的人工标注诊断基准、公开描述基准和VQA基准上,ClaimDiff-RL改善了幻觉-遗漏事实平衡,保留了通用能力,甚至在多个细粒度能力维度(如物体计数、空间关系和场景识别)上超越了Gemini-3-Pro-Preview。这些结果表明,类型化、可验证的声明差异是细粒度且可诊断的描述强化学习的有效奖励单元。

英文摘要

Long-form image captioning exposes a reward granularity problem in RL: captions are judged as whole sequences, while the important errors occur at the level of individual visual claims. A good dense caption should be both faithful and informative, avoiding hallucination without omitting salient details. Yet pairwise preferences, reference-based metrics, and holistic scalar rewards compress these local errors into a single sequence-level signal, obscuring the tradeoff between factuality and coverage. We introduce ClaimDiff-RL, a framework that uses reference-conditioned atomic claim differences as the reward unit for caption RL. Given an image, an actor caption, and a reference caption, a multimodal judge enumerates visually grounded differences, verifies each difference against the image, assigns open-vocabulary error types and severity levels, and produces per-difference statistics for reward composition. This makes hallucinated claims and omitted salient facts separately measurable and tunable. Experiments show that holistic scalar rewards can reduce hallucination by increasing missing facts, while ClaimDiff-RL exposes this faithfulness-coverage tradeoff and enables more balanced operating points. On a 160-image human-labeled diagnostic benchmark, public captioning benchmarks, and VQA benchmarks, ClaimDiff-RL improves the hallucination--missing-fact balance, preserves general capability, and even surpasses Gemini-3-Pro-Preview on several fine-grained capability dimensions such as object counting, spatial relations, and scene recognition. These results suggest that typed, verifiable claim differences are an effective reward unit for fine-grained and diagnosable caption RL.
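
To make the idea of composing a reward from per-difference statistics concrete, here is a minimal sketch. The abstract does not give the reward formula, so everything below is an assumption: the two-way split into "hallucination" and "omission" (the paper uses open-vocabulary error types), the integer severity scale, and the names `ClaimDiff`, `compose_reward`, `w_halluc`, and `w_omit` are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ClaimDiff:
    """One judged difference between actor and reference caption (hypothetical schema)."""
    kind: str       # "hallucination" or "omission"; a simplification of open-vocabulary types
    severity: int   # assumed scale: 1 (minor) to 3 (severe)
    verified: bool  # True if the judge confirmed the difference against the image

def compose_reward(diffs, w_halluc=1.0, w_omit=1.0):
    """Sum severity-weighted penalties over verified differences; higher is better."""
    penalty = 0.0
    for d in diffs:
        if not d.verified:
            continue  # differences the judge could not confirm contribute nothing
        weight = w_halluc if d.kind == "hallucination" else w_omit
        penalty += weight * d.severity
    return -penalty

if __name__ == "__main__":
    diffs = [
        ClaimDiff("hallucination", 2, True),
        ClaimDiff("omission", 1, True),
        ClaimDiff("hallucination", 3, False),
    ]
    print(compose_reward(diffs))                            # equal weights
    print(compose_reward(diffs, w_halluc=2.0, w_omit=0.5))  # penalize hallucination more
```

Keeping separate weights for the two error families is what lets the operating point between faithfulness and coverage be tuned, rather than being fixed by a single scalar judgment.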

2605.19938 2026-05-26 stat.ME cs.LG stat.ML 版本更新

Variance-Reduced Manifold Sampling via Polynomial-Maximization Density Estimation

通过多项式最大化密度估计的方差缩减流形采样

Serhii Zabolotnii

发表机构 * Department of Information, Multimedia Technologies and Design, Cherkasy State Business College(切尔卡瑟国立商学院信息、多媒体技术与设计系) State Scientific Research Institute of Armament and Military Equipment Testing and Certification(武器与军事装备测试和认证国家科学研究所) Department of Cybernetics and Applied Mathematics, Uzhhorod National University(乌日霍罗德国立大学控制论与应用数学系)

AI总结 针对隐式定义流形上的均匀采样问题,提出一种基于多项式最大化矩估计的密度估计模块PMM-MASEM,通过门控机制在非平坦间距分布下替代传统插件估计,降低密度均方误差22-36%。

Comments 16 pages, 5 figures, 3 tables. Code supplement: https://github.com/SZabolotnii/Ku-PMM-MASEM-code-supplement

详情
AI中文摘要

在隐式定义流形上的均匀采样是运动规划、约束模拟和概率机器学习中的核心原语。MASEM通过熵最大化重采样解决该问题,但其重采样权重依赖于局部k近邻密度估计,而激进的重采样温度可能放大其误差。我们探究是否可以用多项式最大化矩估计器替代插件密度规则,而不改变周围的MASEM架构。所提出的PMM-MASEM模块从嵌套的k近邻半径计算壳间距,估计其标准化累积量,并仅在间距分布偏离平坦的Exp(1)分布时使用门控的PMM2/PMM3估计器;否则回退到插件/MLE规则。这种回退至关重要:在平坦齐次流形上,插件估计器已经是MLE,因此PMM不应优于它。局部已知DGP蒙特卡洛实验证实了该门控:选择器在平坦Exp(1)间距下返回MLE,并在非对称伽马和边界间距情况下将密度MSE降低22-36%。证据并非一致积极:PMM3在低峰态的均匀间距法则下表现更差,而轻量级重采样代理实验改善了七瓣覆盖但降低了正弦和瑞士卷代理的性能。因此,当前证据支持的是适用边界结果,而非一般的MASEM改进主张。

英文摘要

Uniform sampling on implicitly defined manifolds is a core primitive in motion planning, constrained simulation, and probabilistic machine learning. MASEM addresses this problem by entropy-maximizing resampling, but its resampling weights depend on a local k-nearest-neighbour density estimate whose errors can be amplified by aggressive resampling temperatures. We ask whether a polynomial-maximization moment estimator can replace the plug-in density rule without changing the surrounding MASEM architecture. The proposed PMM-MASEM module computes shell spacings from nested k-nearest-neighbour radii, estimates their standardized cumulants, and uses a gated PMM2/PMM3 estimator only when the spacing distribution departs from the flat Exp(1) regime; otherwise it falls back to the plug-in/MLE rule. This fallback is essential: on a flat homogeneous manifold the plug-in estimator is already the MLE, so PMM should not outperform it. A local Known-DGP Monte Carlo experiment confirms this gate: the selector returns MLE on flat Exp(1) spacings and reduces density MSE by 22--36% on asymmetric gamma and boundary-spacing regimes. The evidence is not uniformly positive: PMM3 worsens a platykurtic uniform spacing law, and a lightweight resampling-proxy experiment improves seven-lobes coverage but degrades the sine and swiss-roll proxies. The current evidence therefore supports an applicability-boundary result rather than a general MASEM improvement claim.

2605.19409 2026-05-26 cs.LG cs.AI 版本更新

Unlocking the Potential of Continual Model Merging: An ODE Perspective

解锁持续模型合并的潜力:ODE视角

Lihong Lin, Haidong Kang

发表机构 * Northeastern University, Shenyang, China(东北大学,沈阳,中国)

AI总结 提出ODE-M框架,将持续模型合并建模为参数空间中的轨迹,通过整流时变速度场和效用感知时间调度平衡历史知识与新任务,提升长任务流性能。

Comments 21 pages, 8 figures

详情
AI中文摘要

持续模型合并(CMM)通过顺序整合任务适配模型实现基础模型的快速定制,无需重复训练。然而,现有合并规则通常通过固定代数或基于投影的操作更新部署模型,对保留多少先前积累的知识相对于新任务模型的控制有限。这种限制导致长任务流中保留不稳定和性能下降,当任务具有异构效用时更为明显。我们提出ODE驱动的合并(ODE-M),一个可控框架,将每次持续合并视为参数空间中的轨迹而非一步端点更新。受模式连通性启发,ODE-M使用整流时变速度场构建屏障感知轨迹,其中来自小型校准集的轻量级一阶反馈抑制损失增加的运动,同时保持向新模型的进展。然后通过沿该轨迹选择操作点(通过效用感知时间调度)获得下一个合并模型,为平衡保留的历史知识和新任务专业知识提供显式机制。在标准CMM基准上的大量实验表明,ODE-M在CLIP ViT骨干、流长度和异构任务效用设置上持续优于强持续合并基线。

英文摘要

Continual Model Merging (CMM) enables rapid customization of foundation models by sequentially incorporating task-adapted models without repeated retraining. However, existing merging rules usually update the deployed model through fixed algebraic or projection-based operations, providing limited control over how much previously accumulated knowledge should be retained relative to the incoming task model. This limitation leads to unstable retention and performance degradation in long task streams, and becomes more pronounced when tasks have heterogeneous utilities. We propose ODE-driven Merging (ODE-M), a controllable framework that formulates each continual merge as a trajectory in parameter space rather than a one-step endpoint update. Motivated by mode connectivity, ODE-M constructs a barrier-aware trajectory using a rectified time-dependent velocity field, where lightweight first-order feedback from a small calibration set suppresses loss-increasing motion while preserving progress toward the incoming model. The next merged model is then obtained by selecting an operating point along this trajectory through a utility-aware time schedule, providing an explicit mechanism for balancing retained historical knowledge and incoming task expertise. Extensive experiments on standard CMM benchmarks show that ODE-M consistently improves over strong continual merging baselines across CLIP ViT backbones, stream lengths, and heterogeneous task-utility settings.

2605.19170 2026-05-26 stat.ML cs.LG 版本更新

Reducing Diffusion Model Memorization with Higher Order Langevin Dynamics

使用高阶朗之万动力学减少扩散模型记忆化

Benjamin Sterling, Mónica F. Bugallo, Tom Tirer

发表机构 * Department of Applied Math & Statistics(应用数学与统计学系) Stony Brook University(石溪大学) Department of Electrical and Computer Engineering(电气与计算机工程系) Faculty of Engineering(工程学院) Bar-Ilan University(巴伊兰大学)

AI总结 本文研究高阶朗之万动力学(HOLD)对扩散模型记忆化的影响,通过理论分析表明HOLD通过低通滤波学习得分函数并随阶数增加平滑度,从而缓解记忆化,并在真实数据上验证了理论。

详情
AI中文摘要

扩散/基于分数的模型已成为强大的生成模型,能够生成模仿训练数据分布的高质量样本。然而,观察到它们容易重现训练样本——称为“记忆化”——可能违反版权和隐私。在本文中,我们研究了高阶朗之万动力学(HOLD)对这一现象的影响。HOLD扩散过程引入了辅助变量;如果数据变量被解释为“位置”,那么辅助变量可以解释为“速度”和“加速度”,具体取决于所选模型的阶数。它们最初是基于这样的直觉提出的:通过隐式施加额外的动力学约束来正则化数据变量的轨迹。据我们所知,我们的工作首次提供了HOLD正则化效应的理论刻画。具体来说,我们表明在HOLD中,数据变量的动力学由学习得分函数的低通滤波版本控制,其平滑度随HOLD阶数增加而增加。然后我们分析了最优经验得分和分布崩溃的可能性。总之,我们的结果解释了随着模型阶数增加记忆化的缓解。最后,我们在真实世界数据上进行了实证研究,支持了我们的理论,并突出了HOLD在实践中相对于标准扩散的这一独特优势。

英文摘要

Diffusion/score-based models have emerged as powerful generative models, capable of generating high-quality samples that mimic the training data distribution. However, it has been observed that they are prone to reproducing training samples, a behavior known as "memorization," potentially violating copyright and privacy. In this paper, we study the effect of Higher-Order Langevin Dynamics (HOLD) on this phenomenon. HOLD diffusion processes introduce auxiliary variables; if the data variable is interpreted as "position," then the auxiliary variables can be interpreted as "velocity" and "acceleration," depending on the chosen order of the model. They were originally proposed based on the intuition that they regularize the trajectories of the data variable by implicitly imposing additional dynamical constraints. Our work provides, to our knowledge, the first theoretical characterization of the regularization effect of HOLD. Specifically, we show that in HOLD, the dynamics of the data variable are governed by a low-pass-filtered version of the learned score function, with smoothness increasing with the order of HOLD. We then analyze the optimal empirical score and the possibility of distribution collapse. Together, our results explain the mitigation of memorization as the model order increases. Finally, we present an empirical study on real-world data that supports our theory and highlights this distinct advantage of HOLD over standard diffusion in practice.

2605.18840 2026-05-26 cs.LG cs.AI cs.CL 版本更新

The Growing Pains of Frontier Models: When Leaderboards Stop Separating and What to Measure Next

前沿模型的成长之痛:当排行榜不再区分以及接下来衡量什么

Adil Amin

发表机构 * Zehen Labs(泽亨实验室)

AI总结 本文通过分解SWE-bench和GPQA Diamond分数为种群耦合趋势和每版本残差(h场),诊断前沿模型能力之间的协作与权衡,并提供三步诊断法、每实验室测量优先级表及七个可证伪预测。

Comments 13 pages, 5 figures, 4 tables. Companion paper: "Lying Is Just a Phase: The Hidden Alignment Transition in Language Model Scaling." ( https://doi.org/10.48550/arXiv.2605.18838 ). Code: https://github.com/adilamin89/cape-scaling . Dashboard: https://zehenlabs.com/cape/

详情
AI中文摘要

排行榜在独立轴上对前沿模型进行排名,但并未揭示能力在版本间是相互增强还是权衡——而在前沿,这种相互作用是更具信息量的信号。我们将配对的SWE-bench和GPQA Diamond分数分解为种群耦合趋势和每版本残差(h场),该残差从两个公开基准分数诊断能力重点。在来自10个实验室的34个模型(2024-2026)中,能力相互协作(r = +0.72,p < 10^{-6}),但协作程度系统性地变化:每个实验室的耦合斜率跨度达5倍(谷歌1.15 vs. DeepSeek 0.23),且实验室发生转向——DeepSeek从推理密集型逆转为编码优先(Δh = 15.9个百分点);Anthropic在编码偏离和恢复之间振荡。种群回归作为等斜线相边界:用于识别基础尺度耦合转变的相同分类器√[(a/b)·B₁] [Amin, 2026] 对前沿模型进行分类,并已在下一个转变处检测到混合相行为(两个模型低于GPQA-IFEval等斜线)。h场不仅具有诊断性——它还告诉你需要改变什么。预训练建立耦合为0.871,而RLHF增加0.081 [Amin, 2026]:预训练级别的转变是永久的(DeepSeek的四个版本逆转持续存在),后训练转变是可逆的(Anthropic的三次编码偏离均在单个版本内恢复),仅推理计算在不重新训练的情况下将h改变+7.8个百分点。知道哪个组件占主导地位决定了是重新训练还是等待。我们提供了三步诊断法(定位、分类、预测)、每实验室测量优先级表以及七个带有时间戳标准的可证伪预测。五个截止日期后的版本落在95%预测区间内。代码、数据和交互式仪表盘:https://zehenlabs.com/cape/。

英文摘要

Leaderboards rank frontier models on independent axes but do not reveal whether capabilities reinforce or trade off across releases -- and at the frontier, this interaction is the more informative signal. We decompose paired SWE-bench and GPQA Diamond scores into a population coupling trend and per-release residual ($h$-field) that diagnoses capability emphasis from two public benchmark scores. Across 34 models from 10 labs (2024--2026), capabilities cooperate ($r = +0.72$, $p < 10^{-6}$), but cooperation varies systematically: per-lab coupling slopes span $5\times$ (Google $1.15$ vs. DeepSeek $0.23$), and labs pivot -- DeepSeek reversed from reasoning-rich to coding-first ($\Delta h = 15.9$ pp); Anthropic oscillates between coding excursions and recovery. The population regression serves as an isocline phase boundary: the same $\sqrt{(a/b)\cdot B_1}$ classifier that identifies the base-scale coupling transition [Amin, 2026] classifies frontier models and already detects mixed-phase behavior at the next transition (two models below the GPQA--IFEval isocline). The $h$-field is not just diagnostic -- it tells you what to change. Pretraining establishes coupling at $0.871$ while RLHF adds $0.081$ [Amin, 2026]: pretraining-level shifts are permanent (DeepSeek's four-release reversal persists), post-training shifts are reversible (Anthropic's three coding excursions each recover within one release), and inference compute alone shifts $h$ by $+7.8$ pp without retraining. Knowing which component dominates determines whether to retrain or wait. We provide a three-step diagnostic (locate, classify, predict), a per-lab measurement-priority table, and seven falsifiable predictions with timestamped criteria. Five post-cutoff releases fall within the 95% prediction interval. Code, data, and an interactive dashboard: https://zehenlabs.com/cape/.
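
The trend-plus-residual decomposition can be sketched with an ordinary least-squares line. This is an illustration under stated assumptions, not the paper's procedure: the abstract does not say which benchmark is regressed on which, nor the sign convention of $h$, so the sketch regresses GPQA on SWE-bench and treats a positive residual as "more GPQA than the trend predicts". The scores are placeholder numbers, not real benchmark results, and `fit_line` and `residuals` are hypothetical names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def residuals(xs, ys):
    """Per-release residual from the population trend (stand-in for the h-field)."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

if __name__ == "__main__":
    # Placeholder scores for four hypothetical releases (percent).
    swe_bench = [40.0, 50.0, 60.0, 70.0]
    gpqa = [55.0, 60.0, 75.0, 80.0]
    slope, intercept = fit_line(swe_bench, gpqa)
    print("coupling slope:", slope, "intercept:", intercept)
    for i, h in enumerate(residuals(swe_bench, gpqa)):
        print(f"release {i}: h = {h:+.1f} pp")
```

Fitting the same line per lab instead of across the whole population would give the per-lab coupling slopes the abstract compares.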

2605.18657 2026-05-26 cs.LG cs.AI 版本更新

KairosHope: A Next-Generation Time-Series Foundation Model for Specialized Classification via Dual-Memory Architecture

KairosHope: 一种基于双记忆架构的下一代时间序列基础模型,用于专门分类

Luis Balderas, José Alberto Rodríguez, Miguel Lastra, Antonio Arauzo-Azofra, José M. Benítez

发表机构 * Department of Computer Science and Artificial Intelligence(计算机科学与人工智能系) DiCITS, iMUDS, DaSCI(DiCITS、iMUDS、DaSCI) University of Granada(格拉纳达大学) Advanced Medical Imaging Group(先进医学成像组) Instituto de Investigación Biosanitaria de Granada (ibs.Granada)(格拉纳达生物医学研究机构(ibs.Granada)) Department of Software Engineering(软件工程系) Department of Rural Engineering(农村工程系) University of Córdoba(科尔多瓦大学)

AI总结 针对标准注意力计算瓶颈和经典统计知识缺失问题,提出KairosHope模型,通过双记忆系统(Titans模块和连续记忆系统CMS)替代二次注意力,并融合深度表示与统计特征的混合决策头,在UCR基准上实现优越分类性能。

详情
AI中文摘要

时间序列基础模型(TSFMs)在通用预测任务中取得了显著成功;然而,它们对专门分类问题的适应仍然受到标准注意力的计算瓶颈和对经典统计知识的系统性忽略的限制。本技术报告介绍了KairosHope,一种下一代TSFM,旨在协调大规模泛化与分类任务中的分析精度。该提案的核心是HOPE块,一种用双记忆系统替代二次注意力的架构:用于动态短期保留的Titans模块和用于长期历史上下文抽象的连续记忆系统(CMS)。为了丰富归纳偏差,引入了混合决策头,它将深度潜在表示与通过tsfeatures包提取的确定性统计特征融合。KairosHope在大型Monash档案上进行自监督预训练,结合了掩码时间序列建模(MTSM)和对比学习(InfoNCE)。随后,通过严格的线性探测和全微调(LP-FT)协议在UCR基准数据集上进行适应,以防止灾难性遗忘。实验结果表明,在具有严格时间因果关系的领域(如HAR或传感器数据)中,性能优越。因此,KairosHope为基础模型适应时间序列分析建立了一个稳健高效的框架。

英文摘要

Time Series Foundation Models (TSFMs) have demonstrated notable success in general-purpose forecasting tasks; however, their adaptation to specialized classification problems remains constrained by the computational bottleneck of standard attention and the systematic omission of classical statistical knowledge. This technical report introduces KairosHope, a next-generation TSFM designed to reconcile massive generalization with analytical precision in classification tasks. The core of the proposal is the HOPE block, an architecture that replaces quadratic attention with a dual-memory system: Titans modules for dynamic short-term retention and a Continuum Memory System (CMS) for the abstraction of long-term historical context. To enrich the inductive bias, a Hybrid Decision Head is introduced, which fuses deep latent representations with deterministic statistical features extracted via the tsfeatures package. KairosHope undergoes self-supervised pre-training on the massive Monash archive, combining Masked Time Series Modeling (MTSM) and contrastive learning (InfoNCE). Its subsequent adaptation to the UCR benchmark datasets is conducted through a rigorous Linear Probing and Full Fine-Tuning (LP-FT) protocol to prevent catastrophic forgetting. Empirical results demonstrate superior performance in domains characterized by strict temporal causality, such as HAR or sensor data. Consequently, KairosHope establishes a robust and efficient framework for the adaptation of foundation models to time series analysis.

2605.16591 2026-05-26 cs.LG cs.AI 版本更新

How Few-Shot Examples Add Up: A Causal Decomposition of Function Vectors in In-Context Learning

少样本示例如何累加:上下文学习中函数向量的因果分解

Entang Wang, Yiwei Wang, Aleksandra Bakalova, Michael Hahn

AI总结 本文通过因果分解揭示少样本提示中函数向量由示例级子向量线性组合而成,并发现模型通过注意力重加权机制根据上下文调整示例贡献。

Comments Accepted at ICML 2026. 70 pages, 65 figures

详情
AI中文摘要

上下文学习(ICL)擅长从极少量示例中学习新任务,但我们仍缺乏对少样本提示如何塑造模型函数向量(FV)——一种驱动ICL查询任务行为的因果激活方向——的机制性解释。跨任务和模型,一个$n$样本FV可以通过示例级子FV的线性组合很好地近似,表明来自单个演示的贡献具有加性和可组合性。除了加性之外,我们展示了模型基于先前示例对单个示例的表示进行上下文化,以自适应地重新加权哪些演示主导FV:注意力转向在上下文中信息量更大、歧义更少的示例。最后,因果分解将查询-键路由与值更新分离,发现上下文化对FV质量最一致的贡献来自查询-键对齐——尤其是在歧义设置中——而值介导的效应则更加异质。综合起来,这些结果将加性叠加与上下文相关的注意力重加权统一为一个机制性的、可检验的说明,解释少样本提示如何实现任务。

英文摘要

In-context learning (ICL) excels at new tasks from minimal examples, yet we still lack a mechanistic explanation of how few-shot prompts shape a model's function vector (FV)--a causal activation direction that drives task behavior on the ICL query. Across tasks and models, an $n$-shot FV is well-approximated by a linear combination of example-level sub-FVs, suggesting additive and composable contributions from individual demonstrations. Beyond additivity, we show that models contextualize individual examples' representations based on prior examples to adaptively reweight which demonstrations dominate the FV: attention shifts toward examples that are more informative and less ambiguous under the context. Finally, a causal decomposition separates Query-Key routing from Value updates, finding that contextualization's most consistent contributions to FV quality arise from Query-Key alignment--particularly in ambiguous settings--while Value-mediated effects are more heterogeneous. Together, these results unify additive superposition with context-dependent attention reweighting into a mechanistic, testable account of how few-shot prompts implement tasks.
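
The additivity claim (an $n$-shot FV approximated by a linear combination of example-level sub-FVs) can be checked with a least-squares fit. The sketch below is a generic illustration, not the authors' code: the vectors are tiny toy lists rather than model activations, and `fit_weights` and `relative_error` are hypothetical helper names. It solves the normal equations for the combination weights and reports the relative reconstruction error.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def fit_weights(sub_fvs, target):
    """Least-squares weights w minimizing ||target - sum_j w_j * sub_fvs[j]||."""
    gram = [[dot(u, v) for v in sub_fvs] for u in sub_fvs]
    rhs = [dot(u, target) for u in sub_fvs]
    return solve(gram, rhs)

def relative_error(sub_fvs, weights, target):
    """Norm of the reconstruction error divided by the norm of the target."""
    approx = [sum(w * s[i] for w, s in zip(weights, sub_fvs)) for i in range(len(target))]
    err = math.sqrt(sum((a - t) ** 2 for a, t in zip(approx, target)))
    return err / math.sqrt(dot(target, target))

if __name__ == "__main__":
    # Toy sub-FVs for a 3-shot prompt; real FVs would be model activations.
    sub_fvs = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 1.0], [0.0, 0.0, 1.0, 0.0]]
    n_shot_fv = [0.5, 0.3, 0.2, 0.8]
    w = fit_weights(sub_fvs, n_shot_fv)
    print("weights:", w, "relative error:", relative_error(sub_fvs, w, n_shot_fv))
```

Unequal fitted weights across demonstrations would correspond to the context-dependent reweighting the abstract describes, and a large relative error would indicate that the linear approximation is poor for that prompt.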

2605.16409 2026-05-26 cs.CV cs.CL cs.LG 版本更新

Multilingual OCR-Aware Fine-Tuning and Prompt-Guided Chain-of-Thought Reasoning for Multimodal Large Language Models

多语言OCR感知微调和提示引导的链式思维推理用于多模态大语言模型

Qinwu Xu, Yifan Jiang, Haoyu Ren

发表机构 * Meta AI UT Austin(德克萨斯大学奥斯汀分校)

AI总结 提出一种多语言OCR感知的多模态训练框架,通过合成数据生成、OCR感知微调和结构化视觉链式思维提示,提升多模态大语言模型在复杂视觉条件下的OCR完整性和多语言翻译准确性。

详情
AI中文摘要

光学字符识别(OCR)和多语言文本理解仍然是多模态大语言模型(MLLMs)的主要失败模式,尤其是在包含杂乱布局、小字体、模糊、遮挡和复杂排版的真实世界图像中。我们提出了一种OCR感知的多语言多模态训练框架,该框架结合了(i)大规模合成OCR到翻译数据生成,(ii)使用LoRA适配的OCR感知监督微调(SFT),以及(iii)在不确定视觉条件下进行推理的结构化视觉链式思维(CoT)提示。使用基于LLaMA的多模态架构,所提出的框架在OCR完整性、多语言翻译准确性和退化视觉条件下的鲁棒性方面有了显著提升。在多语言收据、菜单、海报、标志、手写文本和文档图像上的实验结果表明,与基线模型相比,视觉-文本对齐显著改善。特别是,所提出的OCR感知后训练框架提高了对小、模糊、空间分散和部分遮挡文本的提取,同时减少了对不确定OCR条件下语言先验的依赖。与前沿多模态系统(包括GPT-5类和Gemini系列模型)的定性比较进一步表明,在噪声和视觉模糊的OCR场景下,OCR对齐得到改善,幻觉减少。总体而言,结果表明,以数据为中心的OCR感知多模态后训练为改进多语言OCR和基于OCR的视觉问答系统提供了一种有效且可扩展的方向。

英文摘要

Optical character recognition (OCR) and multilingual text understanding remain major failure modes of multimodal large language models (MLLMs), particularly in real-world images containing cluttered layouts, small fonts, blur, occlusion, and complex typography. We present an OCR-aware multilingual multimodal training framework that combines (i) large-scale synthetic OCR-to-translation data generation, (ii) OCR-aware supervised fine-tuning (SFT) with LoRA adaptation, and (iii) structured visual chain-of-thought (CoT) prompting for reasoning under uncertain visual conditions. Using a LLaMA-based multimodal architecture, the proposed framework substantially improves OCR completeness, multilingual translation accuracy, and robustness under degraded visual conditions. Experimental results on multilingual receipts, menus, posters, signs, handwritten text, and document images demonstrate significantly improved visual-text grounding compared with the baseline model. In particular, the proposed OCR-aware post-training framework improves extraction of small, blurred, spatially scattered, and partially occluded text while reducing reliance on language priors under uncertain OCR conditions. Qualitative comparisons with frontier multimodal systems, including GPT-5-class and Gemini-family models, further suggest improved OCR grounding and reduced hallucination under noisy and visually ambiguous OCR scenarios. Overall, the results indicate that data-centric OCR-aware multimodal post-training provides an effective and scalable direction for improving multilingual OCR and OCR-based visual question answering systems.

2605.14605 2026-05-26 cs.CR cs.AI cs.LG 版本更新

One Step to the Side: Why Defenses Against Malicious Finetuning Fail Under Adaptive Adversaries

一步之遥:为什么针对恶意微调的防御在自适应对手面前失败

Itay Zloczower, Eyal Lenga, Gilad Gressel, Yisroel Mirsky

发表机构 * Ben-Gurion University of the Negev(内盖夫本-古里安大学) Amrita Vishwa Vidyapeetham(阿姆里塔大学)

AI总结 本文通过分析15种近期防御机制,发现它们共享一个弱点:仅掩盖或误导有害行为路径而未消除行为本身,并开发了一种统一的自适应攻击,成功突破了所有防御机制。

Comments Under review

详情
AI中文摘要

模型提供商越来越多地发布开放权重或允许用户通过API微调基础模型。尽管这些模型在发布前经过安全对齐,但其防护措施通常可以通过对有害数据的微调来移除。最近的防御旨在使模型对此类恶意微调具有鲁棒性,但它们主要仅针对不考虑防御的固定攻击进行评估。我们表明这些鲁棒性声明是不完整的。通过调查15种近期防御,我们识别了几种防御机制,并表明它们共享一个单一弱点:它们掩盖或误导通往有害行为的路径,而不移除行为本身。然后,我们开发了一种统一的自适应攻击,突破了所有防御机制。我们的结果表明,当前方法并未提供稳健的安全性;它们主要阻止了它们所设计的攻击。我们希望我们针对这一领域的统一自适应对手将帮助未来的研究人员和实践者在部署前对新防御进行压力测试。

英文摘要

Model providers increasingly release open weights or allow users to fine-tune foundation models through APIs. Although these models are safety-aligned before release, their safeguards can often be removed by fine-tuning on harmful data. Recent defenses aim to make models robust to such malicious fine-tuning, but they are largely evaluated only against fixed attacks that do not account for the defense. We show that these robustness claims are incomplete. Surveying 15 recent defenses, we identify several defense mechanisms and show that they share a single weakness: they obscure or misdirect the path to harmful behavior without removing the behavior itself. We then develop a unified adaptive attack that breaks defenses across all defense mechanisms. Our results show that current approaches do not provide robust security; they mainly stop the attacks they were designed against. We hope that our unified adaptive adversary for this domain will help future researchers and practitioners stress-test new defenses before deployment.

2605.13282 2026-05-26 cs.AI cs.LG 版本更新

Differentiable Learning of Lifted Action Schemas for Classical Planning

经典规划中提升动作模式的可微学习

Jonas Reiter, Jakob Elias Gebler, Hector Geffner

发表机构 * RWTH Aachen University(亚琛工业大学)

AI总结 提出一种神经网络架构,从完全可观测状态但动作参数未观测的轨迹中学习提升动作模式,实现近乎完美的结构恢复。

详情
AI中文摘要

经典规划器可以有效解决用STRIPS或PDDL表示的非常大的确定性MDP,其中状态是对象和关系上的原子集合,提升动作模式添加或删除这些原子。这种紧凑表示产生了强大的搜索启发式,并为结构泛化提供了理想设置,因为提升关系和动作模式可以产生无限多个领域实例。一个核心挑战是从数据中学习这些关系和动作模式,最近的方法使用不同类型的观测来解决这个问题。在这项工作中,我们开发了一种新颖的神经网络架构,从状态完全可观测但动作参数未观测的轨迹中学习动作模式。该问题是一个简化,但却是从图像序列和动作标签学习规划领域的重要一步,我们旨在以近乎完美的方式解决这个简化问题。挑战在于同时从观测到的状态变化中识别动作参数并学习动作模式。我们的方法产生了一个鲁棒的可微组件,然后可以集成到更大的神经符号模型中。我们在各种规划领域上评估该架构,其中学习到的提升动作模式必须恢复真实结构。此外,我们报告了关于对观测噪声的鲁棒性以及与基于槽的动态模型相关变体的实验。

英文摘要

Classical planners can effectively solve very large deterministic MDPs represented in STRIPS or PDDL where states are sets of atoms over objects and relations, and lifted action schemas add or delete these atoms. This compact representation yields strong search heuristics and provides an ideal setting for structural generalization, since lifted relations and action schemas give rise to infinitely many domain instances. A central challenge is to learn these relations and action schemas from data, and recent approaches have addressed this problem using different types of observations. In this work, we develop a novel neural network architecture for learning action schemas from traces where states are fully observed but action arguments are unobserved. The problem is a simplification but an important step towards learning planning domains from sequences of images and action labels, and we aim to solve this simplification in a nearly perfect manner. The challenge lies in learning the action schemas while simultaneously identifying the action arguments from observed state changes. Our approach yields a robust differentiable component that can then be integrated into larger neuro-symbolic models. We evaluate the architecture on various planning domains, where the learned lifted action schemas must recover the ground-truth structure. Additionally, we report experiments on robustness to observation noise and on a variation related to slot-based dynamics models.

2605.12850 2026-05-26 cs.CL cs.AI cs.CR cs.LG 版本更新

Persona-Model Collapse in Emergent Misalignment

涌现性失调中的人格模型崩溃

Davi Bastos Costa, Renato Vicente

发表机构 * TELUS Digital Research Hub(TELUS数字研究中心) Center for Artificial Intelligence and Machine Learning(人工智能与机器学习中心) Institute of Mathematics, Statistics and Computer Science(数学、统计与计算机科学研究所) University of São Paulo(圣保罗大学)

AI总结 提出人格模型崩溃假说,通过道德易感性(S)和道德稳健性(R)两个指标,证明在有害数据上微调大语言模型会导致模型模拟、区分和维持一致角色的内部能力恶化,从而引发涌现性失调。

Comments 23 pages, 7 figures, 7 tables; NeurIPS 2026 submission; Corrected code repository URL

详情
AI中文摘要

在包含有害内容的狭窄数据上微调大型语言模型,会在无关提示上产生广泛的失调行为,这种现象称为涌现性失调。我们提出涌现性失调涉及人格模型崩溃:模型模拟、区分和维持一致角色的内部能力恶化。我们通过两个指标在行为上检验这一假设:道德易感性(S)和道德稳健性(R),它们根据模型在角色扮演下道德基础问卷回答的跨角色和角色内变异性计算得出。这些指标形式化了模型区分角色的能力(S)以及模拟给定角色时的一致性(R)。我们评估了四个前沿模型(DeepSeek-V3.1, GPT-4.1, GPT-4o, Qwen3-235B)的三种变体:基础版、微调为输出不安全代码的版本,以及匹配的微调为输出安全代码的对照版本。在四个模型中,不安全微调导致S平均增加55%,将所有四个不安全变体推至先前工作中对13个前沿模型进行基准测试时观测到的区间之外(其中GPT-4o超过该区间上端的两倍),表明角色区分出现失调。它还导致R平均下降65%,相当于1/R增加304%。相比之下,匹配的安全对照将S保持在基础值附近,仅引起部分R损失,表明这些效应主要特定于失调。与这些指标变化相呼应,不安全变体的无条件响应趋近于接近量表上限的饱和状态,与基础模型的结构化响应以及基础模型角色扮演有毒人格时的响应明显不同。综合来看,这些指标为涌现性失调提供了敏感的诊断,并作为其涉及人格模型崩溃的行为证据。

英文摘要

Fine-tuning large language models on narrow data with harmful content produces broadly misaligned behavior on unrelated prompts, a phenomenon known as emergent misalignment. We propose that emergent misalignment involves persona-model collapse: deterioration of the model's internal capacity to simulate, differentiate, and maintain consistent characters. We test this hypothesis behaviorally using two metrics: moral susceptibility (S) and moral robustness (R), computed from the across- and within-persona variability of models' Moral Foundations Questionnaire responses under persona role-play. These metrics formalize the model's ability to differentiate characters (S) and its consistency when simulating a given one (R). We evaluate four frontier models (DeepSeek-V3.1, GPT-4.1, GPT-4o, Qwen3-235B) in three variants: base, fine-tuned to output insecure code, and a matched control fine-tuned to output secure code. Across the four models, insecure fine-tuning produces an average $55\%$ increase in S, pushing all four insecure variants beyond the band observed across 13 frontier models benchmarked in prior work -- with GPT-4o reaching more than twice the band's upper end -- signaling dysregulated differentiation. It also causes an average $65\%$ decrease in R, equivalent to a $304\%$ increase in 1/R. By contrast, the matched secure control preserves S near the base and induces only a partial R loss, showing that these effects are largely misalignment-specific. Complementing these metric shifts, insecure variants' unconditioned responses converge toward saturation near the scale ceiling, departing markedly from both base models' structured responses and those elicited when base models role-play toxic personas. Taken together, these metrics provide a sensitive diagnostic for emergent misalignment and serve as behavioral evidence that it involves persona-model collapse.

2605.12764 2026-05-26 q-fin.MF cs.LG stat.ML 版本更新

Yield Curves Dynamics Using Variational Autoencoders Under No-arbitrage

无套利条件下使用变分自编码器的收益率曲线动力学

Fusheng Luo, Hélyette Geman

发表机构 * Department of Applied Mathematics and Statistics, Johns Hopkins University, USA(应用数学与统计学系,约翰霍普金斯大学,美国)

AI总结 提出一种物理信息生成框架,通过两阶段架构(学生t条件变分自编码器+动态水平注入和神经随机微分方程)解决深度学习统计灵活性与固定收益理论约束的冲突,在多个主权货币上显著降低预测误差并实现无套利。

Comments This is the full script (version 2) of our paper, which is awaiting submission to financial journals/conferences, after modifying and double-checking the reference lists

详情
AI中文摘要

本文引入了一个物理信息生成框架,解决了深度学习统计灵活性与固定收益建模严格理论约束之间的根本冲突。我们证明,标准生成模型和无约束统计外推在预测跨多种宏观经济体制的期限结构时,会遭受“流形崩溃”和严重的套利违规。为克服这一问题,我们提出了一种两阶段架构。首先,具有动态水平注入的学生t条件变分自编码器(CVAEsT+LS)提取了一个稳健、重尾的期限结构流形,有效解耦了宏观经济形状动态与绝对基准利率。其次,潜在动态演化由连续时间神经随机微分方程(SDE)控制,并受到无套利偏微分方程(PDE)的严格惩罚。跨多个主权货币(美元、英镑、日元)的实证结果证实,我们的协同方法大幅降低了样本外预测误差——实现了卓越的6.58个基点平均期限RMSE——并成功克服了经典HJM模型在极端环境中表现出的巨大平行漂移和零下限违规。此外,通过相空间向量场分析,我们展示了该模型在无监督宏观经济体制检测和高质量连续时间情景生成方面的卓越能力。最终,本研究为期限结构建模提供了一个高度可扩展、数学上合理的演化引擎。

英文摘要

This paper introduces a physics-informed generative framework that resolves the fundamental conflict between the statistical flexibility of deep learning and the rigorous theoretical constraints of fixed-income modeling. We demonstrate that standard generative models and unconstrained statistical extrapolations suffer from "manifold collapse" and severe arbitrage violations when forecasting term structures across diverse macroeconomic regimes. To overcome this, we propose a two-stage architecture. First, a Student-t Conditional Variational Autoencoder with Dynamic Level Injection (CVAEsT+LS) extracts a robust, heavy-tailed term structure manifold, effectively decoupling macroeconomic shape dynamics from absolute base rates. Second, the latent dynamic evolution is governed by a continuous-time Neural Stochastic Differential Equation (SDE) strictly penalized by a No-Arbitrage Partial Differential Equation (PDE). Empirical results across multiple sovereign currencies (USD, GBP, JPY) confirm that our synergistic approach drastically reduces out-of-sample forecasting errors -- achieving an exceptional 6.58 bps Mean Tenor RMSE -- and successfully overcomes the massive parallel drift and zero-lower-bound violations exhibited by the classical HJM model in extreme environments. Furthermore, through phase space vector field analysis, we demonstrate the model's superior capability in unsupervised macroeconomic regime detection and high-quality continuous-time scenario generation. Ultimately, this research provides a highly scalable, mathematically sound evolutionary engine for term structure modeling.

2605.12118 2026-05-26 stat.ML cs.LG 版本更新

Keeping Score: Efficiency Improvements in Neural Likelihood Surrogate Training via Score-Augmented Loss Functions

保持分数:通过分数增强损失函数提高神经似然代理训练的效率

Alexander Shen, Mikael Kuusela

发表机构 * Department of Statistics and Data Science(统计与数据科学系)

AI总结 针对随机过程模型,提出通过分数增强损失函数和自适应加权改进神经似然代理训练,在显著降低计算成本的同时提升代理质量,实现与10倍训练数据相当的推理性能。

Comments 9 pages of main text, 9 pages of appendices, 13 figures

详情
AI中文摘要

对于随机过程模型,参数推断通常受限于计算昂贵的似然函数。基于模拟的推断(SBI)通过构建摊销代理似然绕过了这一限制,但大多数SBI方法假设黑箱数据生成过程。虽然这些代理在无限训练数据下是精确的,但实际场景迫使在模型质量和模拟成本之间进行严格权衡。在这项工作中,我们放宽了SBI的黑箱假设,以改善结构化随机过程模型的这种权衡。具体而言,对于通过概率分类训练的神经网络似然代理,我们提出用精确的分数信息 $\nabla_θ\log p(x \mid θ)$ 和基于损失梯度的自适应加权来增强标准二元交叉熵损失。我们在涉及网络动力学和空间过程的案例研究中评估了我们的方法,证明我们的方法以远低于生成更多训练数据的计算成本提高了代理质量。值得注意的是,在某些情况下,我们的方法实现了与训练数据增加10倍相当的下游推理性能,而训练时间增加不到1.1倍。

英文摘要

For stochastic process models, parameter inference is often severely bottlenecked by computationally expensive likelihood functions. Simulation-based inference (SBI) bypasses this restriction by constructing amortized surrogate likelihoods, but most SBI methods assume a black-box data generating process. While these surrogates are exact in the limit of infinite training data, practical scenarios force a strict tradeoff between model quality and simulation cost. In this work, we loosen the black-box assumption of SBI to improve this tradeoff for structured stochastic process models. Specifically, for neural network likelihood surrogates trained via probabilistic classification, we propose to augment the standard binary cross-entropy loss with exact score information $\nabla_θ\log p(x \mid θ)$ and adaptive weighting based on loss gradients. We evaluate our approach on case studies involving network dynamics and spatial processes, demonstrating that our method improves surrogate quality at a drastically lower computational cost than generating more training data. Notably, in some cases, our approach achieves downstream inference performance equivalent to a 10x increase in training data with less than a 1.1x increase in training time.
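
To make the idea of a score-augmented loss concrete, here is a minimal sketch on a toy Gaussian model. It relies on one standard fact: for a classifier separating dependent pairs $(x, θ)$ from shuffled pairs, the optimal logit is $\log p(x \mid θ) - \log p(x)$, so its gradient in $θ$ equals the score. The quadratic `logit`, the squared-error form of the score term, the weight `lam`, and the finite-difference gradient are assumptions made for illustration; the paper's exact loss and its adaptive weighting are not given in the abstract.

```python
import numpy as np

def logit(params, x, theta):
    """Toy classifier logit, quadratic in (x, theta); stands in for a neural network."""
    a, b, c, d = params
    return a * x * theta + b * theta**2 + c * x**2 + d

def score_augmented_loss(params, x, theta, labels, score, lam=1.0, eps=1e-4):
    """Binary cross-entropy plus a penalty matching d(logit)/d(theta) to the exact score."""
    z = logit(params, x, theta)
    # Numerically stable binary cross-entropy with logits.
    bce = np.mean(np.maximum(z, 0) - z * labels + np.log1p(np.exp(-np.abs(z))))
    # Central finite difference for the gradient of the logit with respect to theta.
    dz = (logit(params, x, theta + eps) - logit(params, x, theta - eps)) / (2 * eps)
    # The score is only used for dependent (label 1) pairs.
    score_term = np.sum(labels * (dz - score) ** 2) / max(labels.sum(), 1)
    return bce + lam * score_term

rng = np.random.default_rng(0)
n = 512
theta = rng.normal(size=n)                 # theta ~ N(0, 1)
x_dep = theta + rng.normal(size=n)         # x ~ N(theta, 1): label 1
x_ind = rng.permutation(x_dep)             # shuffled pairs: label 0
x = np.concatenate([x_dep, x_ind])
th = np.concatenate([theta, theta])
labels = np.concatenate([np.ones(n), np.zeros(n)])
score = x - th                             # exact score of N(theta, 1) w.r.t. theta

# For this toy, log p(x|theta) - log p(x) = x*theta - theta^2/2 - x^2/4 + log(2)/2.
optimal = (1.0, -0.5, -0.25, 0.5 * np.log(2.0))
loss_opt = score_augmented_loss(optimal, x, th, labels, score)
loss_zero = score_augmented_loss((0.0, 0.0, 0.0, 0.0), x, th, labels, score)
```

At the exact log-ratio the score penalty vanishes, so the extra term adds gradient information without changing the optimum of the classification loss.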

2605.10989 2026-05-26 cs.LG cs.AI 版本更新

SURGE: Surrogate Gradient Adaptation in Binary Neural Networks

SURGE: 二值神经网络中的替代梯度自适应

Haoyu Huang, Boyu Liu, Linlin Yang, Yanjing Li, Yuguang Yang, Xuhui Liu, Canyu Chen, Zhongqian Fu, Baochang Zhang

发表机构 * National College for Excellent Engineers, Beihang University, Beijing, China(北京航空航天大学优秀工程师学院) School of Artificial Intelligence, Beihang University, Beijing, China(北京航空航天大学人工智能学院) School of Electronic and Information Engineering, Beihang University, Beijing, China(北京航空航天大学电子与信息工程学院) King Abdullah University of Science and Technology, Saudi Arabia(沙特阿卜杜拉国王科技大学) Huawei Noah’s Ark Lab, China(华为诺亚方舟实验室)

AI总结 针对二值神经网络中梯度失配和固定范围梯度裁剪导致的信息损失问题,提出一种基于理论的可学习梯度补偿框架SURGE,通过双路径梯度补偿器和自适应梯度缩放器实现偏差减少的梯度估计与动态平衡,在图像分类、目标检测和语言理解任务上达到最优性能。

Comments Accepted as a poster at the 43rd International Conference on Machine Learning (ICML 2026)

详情
AI中文摘要

二值神经网络(BNN)的训练从根本上依赖于对不可微二值化操作(如符号函数)的梯度近似。然而,包括直通估计器(STE)及其改进变体在内的主流方法依赖于手工设计,存在梯度失配问题和固定范围梯度裁剪导致的信息损失。为了解决这一问题,我们提出了SURrogate GradiEnt Adaptation(SURGE),一种新颖的、具有理论依据的可学习梯度补偿框架。SURGE通过辅助反向传播缓解梯度失配。具体地,我们设计了一个双路径梯度补偿器(DPGC),为每个二值化层构建一个并行的全精度辅助分支,通过在反向传播期间进行输出分解来解耦梯度流。DPGC利用全精度分支估计超出STE一阶近似的分量,从而实现偏差减少的梯度估计。为了进一步增强训练稳定性,我们引入了一个基于最优缩放因子的自适应梯度缩放器(AGS),通过基于范数的缩放动态平衡分支间的梯度贡献。在图像分类、目标检测和语言理解任务上的实验表明,SURGE在现有最先进方法中表现最佳。

英文摘要

The training of Binary Neural Networks (BNNs) is fundamentally based on gradient approximation for non-differentiable binarization operations (e.g., sign function). However, prevailing methods including the Straight-Through Estimator (STE) and its improved variants, rely on hand-crafted designs that suffer from gradient mismatch problem and information loss induced by fixed-range gradient clipping. To address this, we propose SURrogate GradiEnt Adaptation (SURGE), a novel learnable gradient compensation framework with theoretical grounding. SURGE mitigates gradient mismatch through auxiliary backpropagation. Specifically, we design a Dual-Path Gradient Compensator (DPGC) that constructs a parallel full-precision auxiliary branch for each binarized layer, decoupling gradient flow via output decomposition during backpropagation. DPGC enables bias-reduced gradient estimation by leveraging the full-precision branch to estimate components beyond STE's first-order approximation. To further enhance training stability, we introduce an Adaptive Gradient Scaler (AGS) based on an optimal scale factor to dynamically balance inter-branch gradient contributions via norm-based scaling. Experiments on image classification, object detection, and language understanding tasks demonstrate that SURGE performs best over state-of-the-art methods.
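
For context, the baseline that this abstract criticizes can be sketched in a few lines. The code below is the standard clipped straight-through estimator, not SURGE itself: the forward pass applies the sign function, and the backward pass copies the incoming gradient only where the input lies inside a fixed range. The function names and the default range `clip=1.0` are illustrative choices.

```python
import numpy as np

def binarize_forward(x):
    """Forward pass: sign binarization to {-1, +1} (zero is mapped to +1)."""
    return np.where(x >= 0, 1.0, -1.0)

def ste_backward(x, grad_output, clip=1.0):
    """Backward pass of the clipped straight-through estimator.

    The sign function is treated as the identity inside [-clip, clip];
    gradients for inputs outside that fixed range are set to zero.
    """
    return grad_output * (np.abs(x) <= clip)

x = np.array([-2.0, -0.5, 0.0, 0.3, 1.7])
grad_out = np.ones_like(x)
y = binarize_forward(x)
grad_in = ste_backward(x, grad_out)
```

The entries of `grad_in` for `-2.0` and `1.7` are zero, which is the information loss from fixed-range clipping that the abstract refers to.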

2605.10718 2026-05-26 cs.DC cs.AI cs.LG cs.PF cs.SY eess.SY 版本更新

An Uncertainty-Aware Resilience Micro-Agent for Causal Observability in the Computing Continuum

一种面向计算连续体中因果可观测性的不确定性感知韧性微代理

Suvi De Silva, Alfreds Lapkovskis, Alaa Saleh, Sasu Tarkoma, Praveen Kumar Donta

发表机构 * Department of Computer Systems and Sciences(计算机系统与科学系) Department of Computer Science(计算机科学系)

AI总结 提出AURORA框架,通过集成自由能原理、因果do-calculus和局部因果状态图,在边缘层实现灰色故障的因果诊断与缓解,并采用双门控执行机制在不确定性高时避免破坏性干预。

详情
AI中文摘要

计算连续体中的灰色故障会产生模糊重叠的症状,现有方法由于缺乏因果意识或在高度认知不确定性下行动,无法可靠诊断,并可能导致破坏性干预。本文提出了一种面向因果可观测性的不确定性感知韧性微代理(AURORA),这是一个轻量级框架,用于诊断和缓解边缘层环境中的灰色故障。该框架采用并行微代理,集成自由能原理、因果do-calculus和局部因果状态图,支持每个故障马尔可夫毯内的反事实根因分析。将推理限制在因果相关变量上可降低计算开销,同时保持诊断保真度。AURORA进一步引入双门控执行机制,仅在因果置信度高且预测认知不确定性有界时授权修复;否则,放弃本地干预并将诊断有效载荷升级到雾层。我们的实验表明,AURORA优于基线,实现了0%的破坏性行动率,同时保持62.0%的修复准确率和3ms的平均修复时间。

英文摘要

Grey failures in the computing continuum produce ambiguous overlapping symptoms that existing approaches fail to diagnose reliably, either due to a lack of causal awareness or acting under high epistemic uncertainty, risking destructive interventions. This paper presents an uncertainty-aware resilience micro-agent for causal observability (AURORA), a lightweight framework for diagnosing and mitigating grey failures in edge-tier environments. The framework employs parallel micro-agents that integrate the free-energy principle, causal do-calculus, and localized causal state-graphs to support counterfactual root-cause analysis within each fault's Markov blanket. Restricting inference to causally relevant variables reduces computational overhead while preserving diagnostic fidelity. AURORA further introduces a dual-gated execution mechanism that authorizes remediation only when causal confidence is high and predicted epistemic uncertainty is bounded; otherwise, it abstains from local intervention and escalates the diagnostic payload to the fog tier. Our experiments demonstrate that AURORA outperforms baselines, achieving a 0% destructive action rate, while maintaining 62.0% repair accuracy and a 3ms mean time to repair.
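
The dual-gated execution rule described above can be sketched as a small decision function. The `Diagnosis` fields, the threshold values, and the returned labels are hypothetical; the abstract does not state how AURORA computes confidence or uncertainty, nor which thresholds it uses.

```python
from dataclasses import dataclass

@dataclass
class Diagnosis:
    root_cause: str
    causal_confidence: float      # assumed to lie in [0, 1]
    epistemic_uncertainty: float  # predicted, assumed non-negative

def dual_gate(d, min_confidence=0.9, max_uncertainty=0.2):
    """Authorize local remediation only if both gates pass; otherwise escalate."""
    confident = d.causal_confidence >= min_confidence
    bounded = d.epistemic_uncertainty <= max_uncertainty
    return "remediate" if (confident and bounded) else "escalate"

clear_case = Diagnosis("disk-latency", causal_confidence=0.95, epistemic_uncertainty=0.05)
ambiguous_case = Diagnosis("disk-latency", causal_confidence=0.95, epistemic_uncertainty=0.60)
```

Requiring both conditions means a confident but poorly supported diagnosis is escalated to the fog tier rather than acted on locally.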

2605.10302 2026-05-26 cs.LG 版本更新

Follow the Mean: Reference-Guided Flow Matching

跟随均值:参考引导的流匹配

Pedro M. P. Curvo, Maksim Zhdanov, Floor Eijkelboom, Jan-Willem van de Meent

发表机构 * University of Amsterdam(阿姆斯特丹大学) AMLab(AML实验室)

AI总结 提出通过改变参考集均值来引导预训练流匹配模型实现可控生成,无需微调或额外网络。

详情
AI中文摘要

现有的可控生成方法通常依赖于微调、辅助网络或测试时搜索。我们证明流匹配提供了不同的控制接口:通过示例进行自适应。对于确定性插值,速度场仅由条件端点均值决定;移动该均值会移动流本身。这为可控生成提供了一个简单原则:通过改变模型遵循的参考集来引导预训练模型。我们以两种形式实例化这一思想。参考均值引导无需训练:它从参考库中计算封闭形式的端点均值修正,并将其应用于冻结的FLUX.2-klein(4B)模型,在保持提示、种子和权重不变的情况下,实现对颜色、身份、风格和结构的控制。半参数引导通过显式均值锚点和学习到的残差精炼器摊销相同的思想,在AFHQv2上匹配无条件的DiT-B/4质量,同时允许在推理时交换参考集。这些结果指向一个更广泛的方向:通过数据而非参数更新进行自适应的生成模型。

英文摘要

Existing approaches to controllable generation typically rely on fine-tuning, auxiliary networks, or test-time search. We show that flow matching admits a different control interface: adaptation through examples. For deterministic interpolants, the velocity field is solely governed by a conditional endpoint mean; shifting this mean shifts the flow itself. This yields a simple principle for controllable generation: steer a pretrained model by changing the reference set it follows. We instantiate this idea in two forms. Reference-Mean Guidance is training-free: it computes a closed-form endpoint-mean correction from a reference bank and applies it to a frozen FLUX.2-klein (4B) model, enabling control of color, identity, style, and structure while keeping the prompt, seed, and weights fixed. Semi-Parametric Guidance amortizes the same idea through an explicit mean anchor and learned residual refiner, matching unconditional DiT-B/4 quality on AFHQv2 while allowing the reference set to be swapped at inference time. These results point to a broader direction: generative models that adapt through data, not parameter updates.
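
The claim that the velocity field is governed by a conditional endpoint mean can be checked on a toy case. For the linear interpolant $x_t = (1-t)x_0 + t x_1$, the marginal velocity is $(\mathbb{E}[x_1 \mid x_t] - x_t)/(1-t)$. The sketch below assumes a standard Gaussian source and a target that is uniform over a small reference bank, for which the endpoint mean has a closed softmax form. It is not the paper's guidance correction for FLUX.2-klein; the banks and step count are illustrative.

```python
import numpy as np

def endpoint_mean(x_t, t, bank):
    """E[x1 | x_t] for x_t = (1 - t) x0 + t x1, x0 ~ N(0, I), x1 uniform over `bank`."""
    sq_dist = np.sum((x_t - t * bank) ** 2, axis=1)
    logw = -sq_dist / (2.0 * (1.0 - t) ** 2)
    w = np.exp(logw - logw.max())   # stable softmax over reference items
    w /= w.sum()
    return w @ bank

def velocity(x_t, t, bank):
    """Marginal velocity implied by the conditional endpoint mean."""
    return (endpoint_mean(x_t, t, bank) - x_t) / (1.0 - t)

def sample(bank, x0, steps=200):
    """Euler integration over t = 0, dt, ..., 1 - dt (never evaluates t = 1)."""
    x = x0.copy()
    dt = 1.0 / steps
    for k in range(steps):
        x = x + dt * velocity(x, k * dt, bank)
    return x

bank_a = np.array([[4.0, 5.0], [5.0, 4.0], [6.0, 6.0]])
bank_b = -bank_a
x0 = np.zeros(2)                    # same starting point for both runs
out_a = sample(bank_a, x0)
out_b = sample(bank_b, x0)
```

Swapping the reference bank changes where the same starting point ends up, with no parameters involved, which is the adaptation-through-examples interface the abstract describes.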

2605.06505 2026-05-26 cs.LG cs.AI cs.CR 版本更新

PACZero: PAC-Private Fine-Tuning of Language Models via Sign Quantization

PACZero: 通过符号量化的语言模型PAC隐私微调

Murat Bilgehan Ertan, Xiaochen Zhu, Phuong Ha Nguyen, Marten van Dijk, Srinivas Devadas

发表机构 * CWI Amsterdam(阿姆斯特丹信息与计算科学研究所) MIT Cambridge(麻省理工学院) Vrije Universiteit Amsterdam(阿姆斯特丹自由大学)

AI总结 提出PACZero系列零阶机制,通过符号量化实现零互信息下的PAC隐私微调,在SST-2和SQuAD上取得竞争性结果。

详情
AI中文摘要

我们引入了PACZero,一系列用于微调大型语言模型的PAC隐私零阶机制,在$I(S^*; Y_{1:T})=0$时提供可用的效用。这一隐私水平将成员推断攻击(MIA)后验成功率限制在先验水平,这是DP框架仅在$\varepsilon=0$和无限噪声下才能达到的MIA抵抗水平。下文所有与DP-ZO的比较均在MIA后验水平上对齐。关键见解是,PAC隐私仅在发布内容依赖于哪个候选子集是秘密时才计入互信息。对子集聚合的零阶梯度进行符号量化会产生频繁的一致步骤,即每个候选子集在更新方向上达成一致;在这些步骤中,发布的符号的条件互信息开销为零。我们提出了两个变体,涵盖隐私-效用权衡:PACZero-MI(通过对二元发布进行精确校准实现有预算的MI)和PACZero-ZPL(在分歧步骤上通过均匀抛硬币实现$I=0$)。我们在SST-2和SQuAD上使用OPT-1.3B和OPT-6.7B,在LoRA和全参数两种设置下进行了评估。在SST-2 OPT-1.3B全参数微调、$I=0$时,PACZero-ZPL达到$88.99\pm0.91$,与非私有MeZO基线($91.1$ FT)的差距在2.1个百分点以内。在$\varepsilon<1$的高隐私区间,没有先前方法能产生可用的效用,而PACZero-ZPL在$I=0$时在OPT-1.3B和OPT-6.7B上获得了有竞争力的SST-2准确率和非平凡的SQuAD F1分数。

英文摘要

We introduce PACZero, a family of PAC-private zeroth-order mechanisms for fine-tuning large language models that delivers usable utility at $I(S^*; Y_{1:T})=0$. This privacy regime bounds the membership-inference attack (MIA) posterior success rate at the prior, an MIA-resistance level the DP framework matches only at $\varepsilon=0$ and infinite noise. All DP-ZO comparisons below are matched at the MIA posterior level. The key insight is that PAC Privacy charges mutual information only when the release depends on which candidate subset is the secret. Sign-quantizing subset-aggregated zeroth-order gradients creates frequent unanimity, steps at which every candidate subset agrees on the update direction; at these steps the released sign costs zero conditional mutual information. We propose two variants that span the privacy-utility trade-off: PACZero-MI (budgeted MI via exact calibration on the binary release) and PACZero-ZPL ($I=0$ via a uniform coin flip on disagreement steps). We evaluate on SST-2 and SQuAD with OPT-1.3B and OPT-6.7B in both LoRA and full-parameter tracks. On SST-2 OPT-1.3B full fine-tuning at $I=0$, PACZero-ZPL reaches ${88.99\pm0.91}$, within $2.1$pp of the non-private MeZO baseline ($91.1$ FT). No prior method produces usable utility in the high-privacy regime $\varepsilon<1$, and PACZero-ZPL obtains competitive SST-2 accuracy and nontrivial SQuAD F1 across OPT-1.3B and OPT-6.7B at $I=0$.
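
The unanimity idea can be sketched as a release rule over per-subset gradient estimates, in the spirit of the PACZero-ZPL variant described above. How the candidate subsets are formed, how the zeroth-order estimates are computed, and how exact zeros are treated are assumptions here; `release_sign` is a hypothetical name, not the authors' API.

```python
import numpy as np

def release_sign(subset_grads, rng):
    """Release one sign from per-subset scalar zeroth-order gradient estimates.

    Unanimity step: every candidate subset agrees, so that sign is released.
    Disagreement step: a uniform coin flip is released, independent of the data.
    Returns (sign, unanimous).
    """
    signs = np.sign(subset_grads)
    if signs[0] != 0 and np.all(signs == signs[0]):
        return int(signs[0]), True
    return int(rng.choice([-1, 1])), False

rng = np.random.default_rng(0)
unanimous_release = release_sign(np.array([0.3, 1.2, 0.1]), rng)
split_release = release_sign(np.array([0.3, -1.2, 0.1]), rng)
```

On a unanimity step the output is the same whichever candidate subset is the secret, which is the intuition behind such steps costing no conditional mutual information.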

2605.06259 2026-05-26 cs.LG cs.CR 版本更新

Trade-off Functions for DP-SGD with Subsampling based on Random Shuffling: Tight Upper and Lower Bounds

基于随机洗牌的DP-SGD的权衡函数:紧的上界和下界

Marten van Dijk, Murat Bilgehan Ertan

发表机构 * CWI Amsterdam(阿姆斯特丹信息与计算科学研究所) Vrije Universiteit Amsterdam(阿姆斯特丹自由大学)

AI总结 本文在$f$-DP框架下,针对基于随机洗牌子采样的差分隐私随机梯度下降(DP-SGD),推导了权衡函数的紧致分析,得到了透明且可解释的闭式界,并展示了在单个训练周期(epoch)内达到有意义的差分隐私所需的参数设置。

详情
AI中文摘要

我们在$f$-DP框架下,针对基于随机洗牌子采样的差分隐私随机梯度下降(DP-SGD),推导了权衡函数的紧致分析。我们的分析涵盖了$σ\geq \sqrt{3/\ln M}$的情形,其中$σ$是噪声乘数,$M$是单个训练周期(epoch)内的轮数。与泊松子采样的$f$-DP分析(产生非封闭的隐式公式,可机器计算但不透明)不同,随机洗牌允许紧致分析,得到透明且可解释的闭式界。我们通过Berry-Esseen定理推导的具体界,在证明框架内紧致到常数因子。我们展示了单个训练周期($E=1$)的具体参数设置,对应的权衡函数$\geq 1-a-δ$,即仅比理想随机猜测对角线$1-a$低$δ$:对于$δ=1/100$和$σ=1$,大约$M \approx 1.14\times 10^6$轮和$N \approx 1.14\times 10^7$个训练样本足以实现有意义的差分隐私。这与最近关于$σ\leq 1/\sqrt{2 \ln M}$情形的负面结果形成对比。我们的具体界可以在多个训练周期上组合,导致$δ$对$E$具有线性依赖关系,这限制了$E=O(\sqrt{M})$。为了超越Berry-Esseen,我们引入了一种新的证明技术,基于大数定律的推广,得到了渐近随机猜测对角线极限结果:如果$E=c_M^2M$且$c_M\to 0$,则$E$次组合的权衡函数在$a\in[0,1]$上一致地满足$f^{\otimes E}(a)\to 1-a$,且$δ$仅具有$O(\sqrt{E})$的依赖关系。我们将这种渐近情形与相应的泊松子采样渐近结果进行比较,并将显式收敛速率的刻画作为一个开放问题。

英文摘要

We derive a tight analysis of the trade-off function for Differentially Private Stochastic Gradient Descent (DP-SGD) with subsampling based on random shuffling within the $f$-DP framework. Our analysis covers the regime $σ\geq \sqrt{3/\ln M}$, where $σ$ is the noise multiplier and $M$ is the number of rounds within a single epoch. Unlike $f$-DP analyses for Poisson subsampling, which yield non-closed implicit formulas that can be machine computed but are non-transparent, random shuffling admits a tight analysis yielding transparent and interpretable closed-form bounds. Our concrete bounds, derived via the Berry-Esseen theorem, are tight up to constant factors within the proof framework. We demonstrate worked parameter settings for a single epoch ($E=1$) with a corresponding trade-off function $\geq 1-a-δ$, that is, only $δ$ below the ideal random guessing diagonal $1-a$: For $δ= 1/100$ and $σ= 1$, roughly $M \approx 1.14\times 10^6$ rounds and $N \approx 1.14\times 10^7$ training samples suffice to achieve meaningful differential privacy. This is in contrast to recent negative results for the regime $σ\leq 1/\sqrt{2 \ln M}$. Our concrete bounds can be composed over multiple epochs leading to $δ$ having a linear in $E$ dependency, which restricts $E=O(\sqrt{M})$. To go beyond Berry--Esseen, we introduce a new proof technique based on a generalization of the law of large numbers that yields an asymptotic random guessing diagonal-limit result: if $E=c_M^2M$ with $c_M\to 0$, then the $E$-fold composed trade-off function satisfies $f^{\otimes E}(a)\to 1-a$ uniformly in $a\in[0,1]$ with $δ$ having only an $O(\sqrt{E})$ dependency. We compare this asymptotic regime with the corresponding Poisson subsampling asymptotic, and highlight the characterization of explicit convergence rates as an open question.
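
The worked setting quoted in this abstract can be checked against the two noise-multiplier thresholds it mentions with a few lines of arithmetic. The values of `M`, `N`, and `sigma` are taken from the abstract; reading `N / M` as the number of samples per round assumes equal-sized rounds, which the abstract does not state explicitly.

```python
import math

M = 1.14e6      # rounds within one epoch (from the abstract)
N = 1.14e7      # training samples (from the abstract)
sigma = 1.0     # noise multiplier (from the abstract)

covered_threshold = math.sqrt(3.0 / math.log(M))          # analysis covers sigma >= this
negative_threshold = 1.0 / math.sqrt(2.0 * math.log(M))   # negative results: sigma <= this
samples_per_round = N / M                                  # assumes equal-sized rounds
sqrt_M = math.sqrt(M)                                      # scale of the E = O(sqrt(M)) limit

print(f"sqrt(3 / ln M)      = {covered_threshold:.3f}")
print(f"1 / sqrt(2 ln M)    = {negative_threshold:.3f}")
print(f"samples per round   = {samples_per_round:.1f}")
print(f"sqrt(M)             = {sqrt_M:.0f}")
```

With these numbers the covered regime starts near 0.46 and the negative-result regime ends near 0.19, so $σ = 1$ lies well inside the covered regime.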

2605.05795 2026-05-26 cs.LG 版本更新

Reward Shaping and Action Masking for Compositional Tasks using Behavior Trees and LLMs

使用行为树和LLM的组合任务奖励塑造与动作掩码

Nicholas Potteiger, Ankita Samaddar, Taylor T. Johnson, Xenofon Koutsoukos

发表机构 * Vanderbilt University(范德比大学)

AI总结 提出MRBT结构,结合LLM自动生成奖励和动作掩码,通过SMT验证和神经符号RL循环,提升组合任务训练效率和成功率。

详情
AI中文摘要

将复杂任务分解为一系列更简单的子任务可以提高自主代理的学习效率。强化学习(RL)可用于优化代理策略以完成子任务,但需要明确定义的子任务奖励,并受益于动作掩码。最近的工作使用大型语言模型(LLM)来自动化奖励塑造和动作掩码,然而它们都没有完全解决对子任务失败的响应性以及组合任务中不同对象的模块化问题。为了克服这些挑战,我们开发了掩码奖励行为树(MRBT),这是一种用作响应式和模块化奖励及动作掩码函数的符号结构。我们设计了一个MRBT模板,并推导出逻辑规范来构建和验证一系列对象交互子任务的MRBT。此外,我们开发了一个自动化流水线,使用LLM生成对变化任务对象鲁棒的MRBT,使用SMT求解器验证规范的正确性,以及一个神经符号RL循环来训练代理完成组合任务。实验证明成功生成和优化了五个MRBT,与基线以及没有动作掩码的MRBT相比,持续提高了训练效率和任务成功率。我们进一步强调了MRBT的三个优势:可迁移性、模块化和可验证性。

英文摘要

Decomposing complex tasks into a sequence of simpler subtasks can improve learning efficiency for an autonomous agent. Reinforcement learning (RL) can be used to optimize agent policies to complete subtasks, but requires well-defined subtask rewards and benefits from action masking. Recent work uses large language models (LLMs) to automate reward shaping and action masking, however none of them fully address reactivity to subtask failure and modularity to varying objects for compositional tasks. To overcome these challenges, we develop masking reward behavior tree (MRBT), a symbolic structure used as a reactive and modular reward and action mask function. We design an MRBT template and derive logical specifications to construct and verify MRBTs for a sequence of object-interaction subtasks. Further, we develop an automated pipeline that uses an LLM to generate MRBTs robust to varying task objects, an SMT-solver to verify correctness of specifications, and a neurosymbolic RL loop to train agents on compositional tasks. Experiments demonstrate successful generation and refinement of five MRBTs, consistently improving training efficiency and task success rates over baselines and MRBTs without action masking. We further highlight three advantages of MRBTs: transferability, modularity, and verifiability.

2605.05759 2026-05-26 cs.LG 版本更新

Full-Spectrum Graph Neural Networks: Expressive and Scalable

全谱图神经网络:表达力与可扩展性

Xiaohan Wang, Deyu Bo, Longlong Li, Kelin Xia

发表机构 * Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore(新加坡南洋理工大学物理与数学科学学院数学科学系,新加坡 637371)

AI总结 提出全谱图神经网络(FSpecGNN),通过将信号从节点域提升到节点对域并将单变量谱滤波器扩展为双变量滤波器,实现了对节点对信号的通用逼近,同时保持可扩展性。

Comments 41 pages, 4 figures. Accepted to ICML 2026

详情
AI中文摘要

众所周知,谱图神经网络(GNN)可以通用逼近节点信号;然而,它们的表达能力仍然受限于1维Weisfeiler-Lehman测试,这体现在它们对高阶信号缺乏通用性。为了突破这一界限,我们提出了全谱GNN(FSpecGNN),这是经典谱GNN的二阶推广。FSpecGNN从两个角度推进了谱滤波:(1)将信号从节点域提升到节点对域;(2)将特征值上的单变量谱滤波器扩展为特征值对上的双变量滤波器。我们证明经典谱GNN是FSpecGNN的对角特例,并证明FSpecGNN在通用逼近节点对信号的同时,其表达能力最多与Local 2-GNN相当,后者对异配图学习特别有益。此外,FSpecGNN支持可扩展实现,避免了显式的节点对级计算;结合低秩近似将全谱卷积简化为多项式谱滤波器的组合,使其能够在大图上学习。实验上,FSpecGNN验证了预测的表达能力,并在异配基准上展现了强劲性能。

英文摘要

It is well established that spectral graph neural networks (GNNs) can universally approximate node signals; however, their expressive power remains bounded by the 1-dimensional Weisfeiler-Lehman test, which is mirrored in their lack of universality for higher-order signals. To go beyond this bound, we propose the Full-Spectrum GNNs (FSpecGNNs), a second-order generalization of classical spectral GNNs. FSpecGNN advances spectral filtering from two perspectives: (1) it lifts signals from the node domain to the node-pair domain; and (2) it extends the univariate spectral filter over eigenvalues to a bivariate filter over eigenvalue pairs. We show that classical spectral GNNs arise as a diagonal special case of FSpecGNNs, and prove that FSpecGNNs can be at most as expressive as Local 2-GNN while universally approximating node-pair signals, the latter being particularly beneficial for heterophilic graph learning. Moreover, FSpecGNN admits scalable implementations that avoid explicit node-pair-level computations; combined with a low-rank approximation that reduces full-spectrum convolution to a combination of polynomial spectral filters, it enables learning on large graphs. Empirically, FSpecGNN validates the predicted expressivity and delivers strong performance on heterophilic benchmarks.
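
The two ingredients named above, a bivariate filter over eigenvalue pairs and its reduction to polynomial filters, can be illustrated with plain linear algebra. The sketch assumes a symmetric graph Laplacian and a rank-one (separable) filter $h(λ_i, λ_j) = p(λ_i)\,q(λ_j)$, for which the filtered node-pair signal equals $p(L)\,X\,q(L)$. It is a generic construction consistent with the abstract, not the authors' implementation; the graph and the polynomials are arbitrary examples.

```python
import numpy as np

def bivariate_filter(L, X, h):
    """Filter a node-pair signal X (n x n) with h(lambda_i, lambda_j) in the eigenbasis of L."""
    lam, U = np.linalg.eigh(L)
    H = h(lam[:, None], lam[None, :])      # n x n grid of filter responses
    return U @ (H * (U.T @ X @ U)) @ U.T

def poly(L, coeffs):
    """Polynomial spectral filter sum_k coeffs[k] * L^k, with no eigendecomposition."""
    out = np.zeros_like(L)
    power = np.eye(L.shape[0])
    for c in coeffs:
        out = out + c * power
        power = power @ L
    return out

# Path graph on 4 nodes and its combinatorial Laplacian.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 4))                # an arbitrary node-pair signal

# Separable filter: p(l) = 1 - 0.5 l, q(l) = 1 + 0.25 l^2.
full = bivariate_filter(L, X, lambda a, b: (1 - 0.5 * a) * (1 + 0.25 * b**2))
cheap = poly(L, [1.0, -0.5]) @ X @ poly(L, [1.0, 0.0, 0.25])
```

A sum of a few such separable terms is a low-rank approximation of a general bivariate filter, and each term needs only matrix products with `L`, which is what makes the polynomial route attractive on large graphs.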

2605.05226 2026-05-26 cs.LG cs.AI cs.CL 版本更新

Internalizing Outcome Supervision into Process Supervision: A New Paradigm for Reinforcement Learning for Reasoning

将结果监督内化为过程监督:推理强化学习的新范式

Fei Ding, Yongkang Zhang, Runhao Liu, Yuhao Liao, Zijian Zeng, Sibo wang, Huiming Yang

发表机构 * Alibaba Group(阿里巴巴集团) Tsinghua University(清华大学)

AI总结 提出一种监督内化方法,使模型在仅结果监督下自动提取过程级学习信号,实现细粒度策略优化。

详情
AI中文摘要

推理强化学习的核心挑战不仅在于结果级监督的稀疏性,更在于如何将仅在序列末尾提供的反馈转化为可指导中间推理步骤的细粒度学习信号。现有方法要么依赖结果级奖励进行序列级优化,导致精确信用分配困难,要么依赖外部构建的过程监督,成本高昂且难以可持续扩展。为解决这一问题,我们提出一个新视角:推理强化学习可以理解为将结果监督内化为过程监督的问题。基于此视角,我们引入一种用于推理强化学习的监督内化方法,使模型能够通过识别、纠正和重用失败的推理轨迹自动提取过程级学习信号,从而在仅结果监督下实现更细粒度的策略优化。我们进一步将这一思想抽象为一种新的训练范式,其中模型在强化学习过程中持续生成并完善自身的内部过程监督,为推理强化学习中细粒度信用分配开辟了一条不同于外部提供过程监督的新路径。

英文摘要

The central challenge of reinforcement learning for reasoning lies not only in the sparsity of outcome-level supervision, but more fundamentally in how to transform feedback provided only at the end of a sequence into fine-grained learning signals that can guide intermediate reasoning steps. Existing approaches either rely on outcome-level rewards for sequence-level optimization, which makes precise credit assignment difficult, or depend on externally constructed process supervision, which is costly and difficult to scale sustainably. To address this, we propose a new perspective: reinforcement learning for reasoning can be understood as the problem of internalizing outcome supervision into process supervision. From this perspective, we introduce a supervision-internalization method for reinforcement learning for reasoning, enabling the model to automatically extract process-level learning signals through identifying, correcting, and reusing failed reasoning trajectories, thereby achieving finer-grained policy optimization under outcome-only supervision. We further abstract this idea into a new training paradigm, in which the model continually generates and refines its own internal process supervision during reinforcement learning, opening a new path for fine-grained credit assignment in reinforcement learning for reasoning that differs from externally provided process supervision.

2605.04363 2026-05-26 cs.LG cs.AI 版本更新

Mitigating Label Shift in Tabular In-Context Learning via Test-Time Posterior Adjustment

通过测试时后验调整缓解表格上下文学习中的标签偏移

Seunghan Lee

发表机构 * LG AI Research(LG人工智能研究)

AI总结 针对TabPFN在表格数据上下文学习中对标签偏移敏感的问题,提出DistPFN方法,通过测试时后验调整重新缩放类别概率,无需修改架构或额外训练,在250多个OpenML数据集上显著提升分类性能。

Comments ICML 2026

详情
AI中文摘要

TabPFN最近作为表格数据集的基础模型受到关注,通过在合成数据上利用上下文学习实现了强性能。然而,我们发现TabPFN容易受到标签偏移的影响,常常过拟合训练数据集中的多数类。为了解决这一局限性,我们提出了DistPFN,这是第一个专为表格基础模型设计的测试时后验调整方法。DistPFN通过降低训练先验(即上下文的类别分布)的影响并强调模型预测后验的贡献来重新缩放预测的类别概率,无需架构修改或额外训练。我们进一步引入了DistPFN-T,它结合了温度缩放,以根据先验和后验之间的差异自适应地控制调整强度。我们在超过250个OpenML数据集上评估了我们的方法,证明在标签偏移下,各种基于TabPFN的模型在分类任务中取得了显著改进,同时在无标签偏移的标准设置中保持了强性能。代码可在以下仓库获取:https://github.com/seunghan96/DistPFN。

英文摘要

TabPFN has recently gained attention as a foundation model for tabular datasets, achieving strong performance by leveraging in-context learning on synthetic data. However, we find that TabPFN is vulnerable to label shift, often overfitting to the majority class in the training dataset. To address this limitation, we propose DistPFN, the first test-time posterior adjustment method designed for tabular foundation models. DistPFN rescales predicted class probabilities by downweighting the influence of the training prior (i.e., the class distribution of the context) and emphasizing the contribution of the model's predicted posterior, without architectural modification or additional training. We further introduce DistPFN-T, which incorporates temperature scaling to adaptively control the adjustment strength based on the discrepancy between prior and posterior. We evaluate our methods on over 250 OpenML datasets, demonstrating substantial improvements for various TabPFN-based models in classification tasks under label shift, while maintaining strong performance in standard settings without label shift. Code is available at this repository: https://github.com/seunghan96/DistPFN.
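
A minimal sketch of test-time posterior adjustment is shown below, using the classic prior-correction rule: divide the predicted class probabilities by the class prior of the context and renormalize. The exact DistPFN update and the DistPFN-T temperature rule are not given in the abstract, so the `strength` exponent and the function name `adjust_posterior` are assumptions.

```python
import numpy as np

def adjust_posterior(probs, context_prior, strength=1.0):
    """Downweight the context class prior in predicted probabilities.

    probs:          (n_samples, n_classes) predicted class probabilities
    context_prior:  (n_classes,) class frequencies of the in-context training set
    strength:       0 leaves predictions unchanged; 1 divides by the full prior
    """
    adjusted = probs / np.power(context_prior, strength)
    return adjusted / adjusted.sum(axis=1, keepdims=True)

# Imbalanced context: 90 examples of class 0, 10 of class 1.
context_labels = np.array([0] * 90 + [1] * 10)
prior = np.bincount(context_labels, minlength=2) / len(context_labels)

probs = np.array([[0.7, 0.3]])             # a prediction leaning to the majority class
adjusted = adjust_posterior(probs, prior)
```

In this example the prediction flips to the minority class after adjustment, since 0.3 is three times that class's share of the context while 0.7 is below the majority share of 0.9.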

2605.03462 2026-05-26 cs.LG cs.AI 版本更新

From Muscle Bursts to Motor Intent: Self-Supervised Token Modeling for Heterogeneous EMG

从肌肉爆发到运动意图:面向异质EMG的自监督令牌建模

Zhenghao Huang, Huilin Yao, Kaikai Wang

AI总结 提出AEMG自监督学习方法,通过事件级令牌建模和Transformer编码,从异质EMG数据中提取可复用的神经肌肉表征,提升跨用户、跨会话的鲁棒性并减少校准数据需求。

Comments After further verification, we identified issues in the current version that may affect the reliability and reproducibility of the reported experimental results. In particular, part of the evaluation relies on a dataset for which the public-release/redistribution status and supporting validation remain unresolved

详情
AI中文摘要

表面肌电图提供了一种从可穿戴肌肉记录推断人类运动意图的实用方法,但在单一采集设置下训练的模型在用户、会话、电极布局或手势协议改变时往往会失去可靠性。本文提出AEMG,一种自监督学习方法,旨在从多样化的EMG源中提取可复用的神经肌肉表征。首先将八个公开手势数据集转换为共享信号格式,以减少通道配置、传感器拓扑和记录协议的差异。AEMG不依赖固定长度滑动窗口,而是从能量变化中识别收缩事件并将其表示为紧凑的神经肌肉令牌,同时有序令牌组描述运动过程中多个肌肉的协调活动。然后使用空间和时间条件Transformer编码这些令牌序列,保留电极位置、激活时序和顺序结构信息。在预训练中,模型通过向量量化重建构建收缩原型的离散库,并通过从周围观测中恢复掩蔽的神经肌肉令牌进一步学习上下文依赖关系。在留一受试者和低标签适应设置下的实验表明,学习到的表征提高了对未见用户的鲁棒性,并减少了手势识别所需的校准数据量。这些发现表明,事件级令牌建模为适应性强且数据高效的基于EMG的运动意图理解提供了一条可扩展的途径。

英文摘要

Surface electromyography provides a practical way to infer human movement intention from wearable muscle recordings, but models trained under a single acquisition setting often lose reliability when the user, session, electrode layout, or gesture protocol changes. This paper proposes AEMG, a self-supervised learning approach designed to extract reusable neuromuscular representations from diverse EMG sources. Eight public gesture datasets are first transformed into a shared signal format to reduce discrepancies in channel configuration, sensor topology, and recording protocol. Instead of relying on fixed-length sliding windows, AEMG identifies contraction events from energy variations and represents them as compact neuromuscular tokens, while ordered token groups describe the coordinated activity of multiple muscles during motion. A spatially and temporally conditioned Transformer is then used to encode these token sequences, preserving information about electrode position, activation timing, and sequential structure. For pre-training, the model constructs a discrete library of contraction prototypes through vector-quantized reconstruction and further learns contextual dependencies by recovering masked neuromuscular tokens from surrounding observations. Experiments under leave-one-subject-out and low-label adaptation settings show that the learned representation improves robustness to unseen users and reduces the amount of calibration data required for gesture recognition. These findings suggest that event-level token modeling offers a scalable route toward adaptable and data-efficient EMG-based motor-intent understanding.

2605.02124 2026-05-26 cs.LG cs.AI math.PR 版本更新

Soft-to-Hard Routing in Sparse Mixture-of-Experts Models

稀疏混合专家模型中的软到硬路由

Reza Rastegar

发表机构 * Meta Platforms, Inc(Meta平台)

AI总结 本文通过边界层微积分方法,研究了稀疏混合专家模型中softmax路由随温度趋于零时趋近于硬top-1路由的极限过程,并给出了基于路由界面邻域概率的定量误差界。

详情
AI中文摘要

随着温度趋于零,softmax路由趋近于硬top-1路由,但极限过程在路由器平局时存在奇异性。本文针对总体平方损失混合专家回归中的软到硬极限,发展了一种边界层微积分方法。对于具有logits $a_k(x;ϕ)$的路由器,相关的局部量是前两名的间隔$Δ(x;ϕ)$,相关的全局量是边界质量$\mathbb{P}(Δ(X;ϕ)\le w)$。在光滑性和横截性假设下,余面积和管状邻域估计展示了该质量如何随板宽缩放;在二元情形中,主导系数是路由界面上的显式曲面积分。这些几何估计给出了软目标$L_τ$和硬目标$L_0$之间的定量界,包括在间隔尾条件下的$O(τ^α)$一致比较,并得到了紧参数空间上软目标的$Γ$-收敛性。主要结论是,零温度近似由路由界面的$O(τ)$邻域所承载的概率控制,而不仅仅由温度本身决定。在分离出问题的这一边界层部分后,我们记录了一个从硬路由到小温度软路由的条件景观传递定理,以及一个简化的双专家高斯计算,展示了局部对称性破缺。仅包含合成诊断作为边界层预测的受控检验。

英文摘要

Softmax routing approaches hard top-1 routing as the temperature tends to zero, but the limiting passage is singular at router ties. This paper develops a boundary-layer calculus for this soft-to-hard limit in population squared-loss mixture-of-experts regression. For a router with logits $a_k(x;ϕ)$, the relevant local quantity is the top-two margin $Δ(x;ϕ)$, and the relevant global quantity is the boundary mass $\mathbb{P}(Δ(X;ϕ)\le w)$. Under smoothness and transversality assumptions, coarea and tubular-neighborhood estimates show how this mass scales with the slab width; in the binary case the leading coefficient is an explicit surface integral over the routing interface. These geometric estimates give quantitative bounds between the soft objective $L_τ$ and the hard objective $L_0$, including an $O(τ^α)$ uniform comparison under a margin-tail condition, and yield $Γ$-convergence of the soft objectives on compact parameter spaces. The main conclusion is that the zero-temperature approximation is controlled by the probability carried by an $O(τ)$ neighborhood of the routing interfaces, not by temperature alone. After isolating this boundary-layer part of the problem, we record a conditional landscape-transfer theorem from hard to small-temperature soft routing and a reduced two-expert Gaussian calculation illustrating local symmetry breaking. Synthetic diagnostics are included only as controlled checks of the boundary-layer predictions.
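
The role of the top-two margin can be seen in a small numerical check. For $K$ experts with top-two margin $Δ$, the softmax weight on the top expert satisfies $1 - w_{\text{top}} \le (K-1)\,e^{-Δ/τ}$, so soft routing is close to hard routing away from ties and far from it when $Δ$ is comparable to or smaller than $τ$. The logits and temperature below are arbitrary illustrative values, and the code does not reproduce the paper's boundary-layer estimates.

```python
import numpy as np

def soft_route(logits, tau):
    """Softmax routing weights at temperature tau."""
    z = (logits - logits.max()) / tau
    w = np.exp(z)
    return w / w.sum()

def hard_route(logits):
    """Hard top-1 routing as a one-hot weight vector."""
    w = np.zeros_like(logits, dtype=float)
    w[np.argmax(logits)] = 1.0
    return w

def top_two_margin(logits):
    """Gap between the largest and second-largest logit."""
    top = np.sort(logits)[-2:]
    return top[1] - top[0]

logits = np.array([2.0, 1.0, -1.0])
tau = 0.1
gap = np.abs(soft_route(logits, tau) - hard_route(logits)).max()
bound = (len(logits) - 1) * np.exp(-top_two_margin(logits) / tau)

tie = soft_route(np.array([1.0, 1.0]), 0.01)   # exact tie: weights stay at 1/2
```

At an exact tie the soft weights remain at one half however small the temperature is, which is the singular behaviour at router ties that the abstract points to.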

2604.23396 2026-05-26 cs.IR cs.AI cs.CL cs.LG 版本更新

Lost in Decoding? Reproducing and Stress-Testing the Look-Ahead Prior in Generative Retrieval

迷失在解码中?复现与压力测试生成式检索中的前瞻先验

Kidist Amde Mekonnen, Yongkang Li, Yubao Tang, Simon Lupart, Maarten de Rijke

发表机构 * University of Amsterdam(阿姆斯特丹大学)

AI总结 本文复现并压力测试了生成式检索中的前瞻先验方法PAG,发现其规划信号在词汇表面形式变化下脆弱,并评估了跨语言鲁棒性与查询端缓解策略。

Comments 12 pages, 5 figures, 9 tables; accepted to the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval, July 20-24, 2026, Melbourne/Naarm, Australia

详情
Journal ref
Proceedings of the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '26), pages XXX-XXX, 2026
AI中文摘要

生成式检索(GR)通过自回归生成文档标识符来对文档进行排序。由于许多GR方法依赖于trie约束的束搜索,它们在有限束解码下容易过早剪枝相关前缀。生成式检索中的前瞻规划(PAG)通过使用同时解码来计算文档级前瞻先验,指导后续顺序解码,从而缓解了这种失败模式。我们在推理时复现了PAG,并压力测试了其解码行为。使用作者发布的检查点和标识符/trie工件,在报告的解码设置下,我们在MS MARCO Dev和TREC-DL 2019/2020上复现了主要有效性结果,并在我们的硬件设置中证实了报告的束大小-延迟权衡。在复现之外,我们引入了规划漂移诊断,量化意图保持的查询变体如何改变规划器的top-n候选集和最高权重规划器令牌,以及这些变化如何影响引导解码。我们发现PAG的规划信号在词汇表面形式变化下是脆弱的:意图保持的拼写错误可能触发规划崩溃,其中规划的候选池变化足够大,使得前瞻奖励几乎无法提供有用的指导,实际上使解码退回到较弱的无引导搜索。我们进一步使用非英语mMARCO查询对英语索引评估了固定索引的跨语言鲁棒性,并评估了无需重新索引的查询端缓解策略;在我们的设置中,查询翻译提供了最强的恢复。总体而言,我们的结果证实了PAG报告的有效性以及在发布的推理设置下规划引导解码的优势,同时表明这些增益依赖于规划信号在现实查询变化和查询-文档不匹配下的稳定性。

英文摘要

Generative retrieval (GR) ranks documents by autoregressively generating document identifiers. Because many GR methods rely on trie-constrained beam search, they are vulnerable to early pruning of relevant prefixes under finite-beam decoding. Planning Ahead in Generative Retrieval (PAG) mitigates this failure mode by using simultaneous decoding to compute a document-level look-ahead prior that guides subsequent sequential decoding. We reproduce PAG at inference time and stress-test its decoding behavior. Using the authors' released checkpoint and identifier/trie artifacts under the reported decoding setup, we reproduce the main effectiveness results on MS MARCO Dev and TREC-DL 2019/2020, and corroborate the reported beam-size-latency trade-off in our hardware setting. Beyond reproduction, we introduce plan drift diagnostics that quantify how intent-preserving query variations alter the planner's top-n candidate set and highest-weight planner tokens, and how these changes affect guided decoding. We find that PAG's planning signal is brittle under lexical surface-form variation: intent-preserving typos can trigger plan collapse, where the planned candidate pool shifts enough that the look-ahead bonus provides little useful guidance, effectively reverting decoding toward weaker unguided search. We further evaluate fixed-index cross-lingual robustness using non-English mMARCO queries against an English index, and assess query-side mitigation strategies that require no re-indexing; query translation provides the strongest recovery in our setting. Overall, our results confirm PAG's reported effectiveness and the benefit of planning-guided decoding under the released inference setup, while showing that these gains depend on the stability of the planning signal under realistic query variation and query-document mismatch.
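
The early-pruning failure mode mentioned in this abstract can be reproduced on a toy example. This is a minimal sketch under stated assumptions: the three two-token document identifiers and the step scores are invented for illustration, and the search below is a generic trie-constrained beam search, not PAG or the authors' implementation.

```python
def beam_search(docids, step_score, beam_size):
    """Trie-constrained beam search over equal-length identifier tuples.

    step_score(prefix, token) returns the log-score of appending token to prefix.
    """
    length = len(docids[0])
    beams = [((), 0.0)]
    for _ in range(length):
        candidates = []
        for prefix, score in beams:
            # Trie constraint: only tokens that keep the prefix on a path to a real identifier.
            next_tokens = {d[len(prefix)] for d in docids if d[:len(prefix)] == prefix}
            for tok in sorted(next_tokens):
                candidates.append((prefix + (tok,), score + step_score(prefix, tok)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return beams

# Hypothetical identifiers and scores: ("b", "z") is the best document overall,
# but its first token scores slightly worse than "a".
DOCIDS = [("a", "x"), ("a", "y"), ("b", "z")]
SCORES = {
    ((), "a"): -0.4, ((), "b"): -0.6,
    (("a",), "x"): -2.0, (("a",), "y"): -2.5,
    (("b",), "z"): -0.1,
}

def toy_score(prefix, token):
    return SCORES[(prefix, token)]

print(beam_search(DOCIDS, toy_score, beam_size=1)[0][0])  # ('a', 'x'): prefix 'b' was pruned at step 1
print(beam_search(DOCIDS, toy_score, beam_size=2)[0][0])  # ('b', 'z'): the wider beam keeps it alive
```

A document-level prior that raises the score of the `"b"` prefix at the first step would let the narrow beam recover the same result, which is the role the abstract assigns to the look-ahead prior.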

2604.20022 2026-05-26 cs.LG cs.AI cs.CL 版本更新

MoBayes: A Modular Bayesian Framework for Separating Reasoning from Language in Conversational Clinical Decision Support

MoBayes:一种用于对话式临床决策支持中推理与语言分离的模块化贝叶斯框架

Yusuf Kesmen, Fay Elhassan, Jiayi Ma, Julien Stalhandske, Yena Chang, David Sasu, Alexandra Kulinkina, Akhil Arora, Lars Klein, Mary-Anne Hartley

发表机构 * LiGHT, EPFL(LiGHT,瑞士联邦理工学院) University of Bern(伯尔尼大学) Aarhus University(奥胡斯大学)

AI总结 提出MoBayes框架,通过将LLM作为语言接口、贝叶斯模块进行概率推理,实现推理与语言分离,在临床决策支持中优于独立前沿LLM医生。

Comments 50 pages including appendix, 13 figures, 22 tables. Preprint

详情
AI中文摘要

大型语言模型(LLM)越来越多地用于对话式临床决策支持,但它们将下一个标记预测与概率决策混为一谈。我们认为这种混淆反映了架构上的局限性:此类系统缺乏显式的后验追踪、可控的弃权阈值和可审计的推理链。我们引入MoBayes,一个模块化贝叶斯对话框架,将推理与语言分离。LLM仅作为语言接口,将患者对话解析为结构化观察,而贝叶斯模块对这些观察进行概率推理以更新后验,通过期望信息增益选择后续问题,并通过校准的决策阈值决定何时停止或推迟。这种设计实现了显式后验追踪、可控的选择性决策,以及无需重新训练语言模型即可替换的特定人群统计后端。在经验知识和LLM生成的知识库上,MoBayes优于独立的前沿LLM医生,包括匹配模型系列的比较,其中廉价的传感器模型与MoBayes配对以较低成本超过更大的自主模型。在对抗性患者沟通风格和不同诊断场景下,该优势依然存在。这些结果表明,可靠的对话式临床决策支持系统应将概率推理与语言生成分离,而不是仅扩大模型规模。代码可在https://anonymous.4open.science/r/MoBayes/获取。

英文摘要

Large language models (LLMs) are increasingly used for conversational clinical decision support, yet they conflate next-token prediction with probabilistic decision making. We argue that this conflation reflects an architectural limitation: such systems lack explicit posterior tracking, controllable abstention thresholds, and auditable reasoning chains. We introduce MoBayes, a Modular Bayesian dialogue framework that separates reasoning from language. The LLM acts only as a language interface, parsing patient conversation into structured observations, while a Bayesian module performs probabilistic inference over these observations to update posteriors, select follow-up questions via expected information gain, and determine when to stop or defer through calibrated decision thresholds. This design enables explicit posterior tracking, controllable selective decision-making, and replaceable population-specific statistical backends without retraining the language model. Across empirical and LLM-generated knowledge bases, MoBayes outperforms standalone frontier LLM doctors, including matched model-family comparisons where inexpensive sensor models paired with MoBayes exceed larger autonomous models at lower cost. The advantage persists under adversarial patient communication styles and across varying diagnostic scenarios. These results suggest that reliable conversational clinical decision support systems should separate probabilistic reasoning from language generation rather than scaling model size alone. Code is available at https://anonymous.4open.science/r/MoBayes/
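
The Bayesian loop outlined in this abstract (update a posterior, pick the next question by expected information gain, stop at a threshold) can be sketched generically. Everything below is an assumption made for illustration: the two hypotheses, the yes/no questions, the likelihood values, and the stopping threshold are invented, and the code is a textbook discrete Bayes update, not the MoBayes implementation.

```python
import math

def posterior_update(prior, likelihood, answer):
    """Bayes rule for a yes/no observation.

    prior: {hypothesis: P(h)}; likelihood: {hypothesis: P(yes | h)}.
    Assumes the observed answer has nonzero probability under the prior.
    """
    unnorm = {h: p * (likelihood[h] if answer else 1.0 - likelihood[h])
              for h, p in prior.items()}
    z = sum(unnorm.values())
    return {h: v / z for h, v in unnorm.items()}

def entropy(dist):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def expected_information_gain(prior, likelihood):
    """Expected entropy reduction from asking one yes/no question."""
    p_yes = sum(prior[h] * likelihood[h] for h in prior)
    gain = entropy(prior)
    for answer, p_ans in ((True, p_yes), (False, 1.0 - p_yes)):
        if p_ans > 0:
            gain -= p_ans * entropy(posterior_update(prior, likelihood, answer))
    return gain

def should_stop(posterior, threshold):
    """Stop once one hypothesis is confident enough (threshold is a free parameter)."""
    return max(posterior.values()) >= threshold

# Hypothetical knowledge base: P(yes | hypothesis) for two questions.
PRIOR = {"flu": 0.5, "cold": 0.5}
QUESTIONS = {
    "fever": {"flu": 0.9, "cold": 0.1},  # discriminative
    "cough": {"flu": 0.6, "cold": 0.6},  # uninformative: same likelihood under both
}

best = max(QUESTIONS, key=lambda q: expected_information_gain(PRIOR, QUESTIONS[q]))
print(best)  # fever
print(posterior_update(PRIOR, QUESTIONS["fever"], True))
```

In a design of this kind the language model would only map free text to the structured `answer` value, so the likelihood table can be swapped for a different population without touching the language model.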

2604.18800 2026-05-26 cs.SI cs.GT cs.LG 版本更新

Optimal Exploration of New Products under Assortment Decisions

基于分类决策的新产品最优探索

Jackie Baek, Atanas Dinev, Thodoris Lykouris

发表机构 * Stern School of Business, New York University(纽约大学斯特恩商学院) Massachusetts Institute of Technology(麻省理工学院)

AI总结 研究平台在容量约束下通过分类决策在线学习新产品质量,提出最优探索策略以最小化遗憾,并揭示新产品应与顶级现有产品搭配、同时探索数量由潜力决定等结构洞见。

详情
AI中文摘要

我们研究了一个平台在容量约束下对提供哪些产品进行分类决策时,对新产品的在线学习。对于新上架的产品,其质量最初未知,质量信息通过社会学习传播:当顾客购买新产品并留下评论时,其质量对平台和未来顾客都变得可见。由于评论需要购买,平台必须在分类中展示新产品(“探索”)以产生评论来了解新产品。这种探索成本高昂,因为顾客对新产品的需求低于现有产品。我们刻画了用于探索的最优分类以最小化遗憾,解决了两个问题。(1)平台应该单独提供新产品还是与现有产品一起提供?前者最大化新产品的购买概率,但产生较低的短期收入。尽管购买概率较低,我们证明将新产品与顶级现有产品配对总是最优的。(2)对于多个新产品,平台应该同时探索它们还是逐个探索?我们证明同时探索的新产品最优数量具有简单的阈值结构:它随着新产品的“潜力”增加而增加,并且令人惊讶的是,不依赖于它们的个体购买概率。我们还表明,两种经典的bandit算法,UCB和汤普森采样,在此设置中因相反的原因而失败:UCB过度探索而汤普森采样探索不足。我们的结果为平台应如何通过分类决策了解新产品提供了结构性洞见。

英文摘要

We study online learning for new products on a platform that makes capacity-constrained assortment decisions on which products to offer. For a newly listed product, its quality is initially unknown, and quality information propagates through social learning: when a customer purchases a new product and leaves a review, its quality is revealed to both the platform and future customers. Since reviews require purchases, the platform must feature new products in the assortment ("explore") to generate reviews to learn about new products. Such exploration is costly because customer demand for new products is lower than for incumbent products. We characterize the optimal assortments for exploration to minimize regret, addressing two questions. (1) Should the platform offer a new product alone or alongside incumbent products? The former maximizes the purchase probability of the new product but yields lower short-term revenue. Despite the lower purchase probability, we show it is always optimal to pair the new product with the top incumbent products. (2) With multiple new products, should the platform explore them simultaneously or one at a time? We show that the optimal number of new products to explore simultaneously has a simple threshold structure: it increases with the "potential" of the new products and, surprisingly, does not depend on their individual purchase probabilities. We also show that two canonical bandit algorithms, UCB and Thompson Sampling, both fail in this setting for opposite reasons: UCB over-explores while Thompson Sampling under-explores. Our results provide structural insights on how platforms should learn about new products through assortment decisions.

2604.18128 2026-05-26 cs.CL cs.AI cs.LG 版本更新

Depth Registers Unlock W4A4 on SwiGLU: A Reader/Generator Decomposition

深度寄存器解锁 SwiGLU 上的 W4A4:一种读取器/生成器分解

Ziyang Liu

AI总结 本研究通过深度寄存器和铰链损失(DR+sink)训练时干预,将 SwiGLU 解码器语言模型的 W4A4 量化困惑度从 1727 降至 119,并分解出残差轴读取器主导误差,而生成器 w2 的双线性输入是剩余差距的主因。

Comments The authors have decided to withdraw this version following internal review regarding authorship and contribution agreements

详情
AI中文摘要

我们在一个受控的 300M 参数 SwiGLU 解码器语言模型(在 FineWeb-Edu 的 5B 令牌上训练)中研究训练后 W4A4 量化,并询问哪些输入激活位点主导误差。朴素的四舍五入 W4A4 将验证困惑度从 FP16 的 23.6 降至 1727。一种简单的残差轴训练时干预——带有寄存器幅度铰链损失的深度寄存器(DR+sink)——在匹配的 FP16 PPL 和匹配的零样本能力下,将其降至 119(约 14 倍),并与 SmoothQuant 组合达到 39.9 PPL。与 FP16 之间约 2 PPL 的剩余差距是诊断核心。我们按输入激活位点分解 W4A4 损伤:SwiGLU 块中的五个可训练线性层分为残差轴读取器(qkv, w1, w3)和块内生成器(o_proj, w2)。基本的范数论证表明,残差轴幅度控制紧密约束读取器,但 w2 的双线性输入仅受因子范数平凡乘积的约束;经验上,DR+sink 降低了读取器的峰度,而生成器基本不变,并且读取器恢复的 W4A4 残差在三个匹配检查点上平坦约为 0.28 nats,其中 Delta-remove(w2) 占主导。我们将 DR+sink 作为训练时探针而非部署方案提出:一种事后替代方案(Per-Linear QuaRot)在读取器轴上几乎与之匹配。完整的 QuaRot——添加在线每头值 Hadamard 和在线 w2 输入旋转——也没有缩小差距,直接验证了正交旋转无法约束双线性 SwiGLU 尾部的预测。这些主张特定于我们的 300M、5B 令牌、单种子设置,并且我们的实验未将分区与铰链分离。

英文摘要

We study post-training W4A4 quantization in a controlled 300M-parameter SwiGLU decoder-only language model trained on 5B tokens of FineWeb-Edu, and ask which input-activation sites dominate the error. Naive round-to-nearest W4A4 collapses validation perplexity from FP16 23.6 to 1727. A simple residual-axis training-time intervention -- Depth Registers with a register-magnitude hinge loss (DR+sink) -- reduces this to 119 (about 14x) at matched FP16 PPL and matched zero-shot capacity, and composes with SmoothQuant to 39.9 PPL. The residual ~2 PPL gap to FP16 is the diagnostic core. We decompose W4A4 damage by input-activation site: the five trainable linears in a SwiGLU block split into residual-axis readers (qkv, w1, w3) and block-internal generators (o_proj, w2). Elementary norm arguments show residual-axis magnitude control bounds readers tightly but leaves w2's bilinear input bounded only by the trivial product of factor bounds; empirically, DR+sink collapses reader kurtosis while leaving generators essentially unchanged, and the reader-rescued W4A4 residue is flat at ~0.28 nats across three matched checkpoints with Delta-remove(w2) dominating. We present DR+sink as a training-time probe rather than a deployment proposal: a post-hoc alternative (Per-Linear QuaRot) nearly matches it on the reader axis. Full QuaRot -- adding online per-head value Hadamard plus online w2-input rotation -- does not close the gap either, directly testing the prediction that orthogonal rotation cannot bound the bilinear SwiGLU tail. Claims are specific to our 300M, 5B-token, single-seed setting, and our experiments do not isolate the partition from the hinge.
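
Why large-magnitude entries hurt 4-bit round-to-nearest quantization can be seen on a toy vector. This is a minimal sketch under assumptions not stated in the abstract: it uses symmetric per-tensor scaling onto the signed grid [-8, 7], and the input values are invented. The paper's actual quantizer granularity and models are not reproduced here.

```python
def quantize_rtn_int4(values):
    """Symmetric per-tensor round-to-nearest onto the signed 4-bit grid [-8, 7].

    Python's round() resolves exact .5 ties to the nearest even integer.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Map integer codes back to real values."""
    return [c * scale for c in codes]

def mean_abs_error(values):
    """Average reconstruction error after a quantize/dequantize round trip."""
    codes, scale = quantize_rtn_int4(values)
    recon = dequantize(codes, scale)
    return sum(abs(v - r) for v, r in zip(values, recon)) / len(values)

well_behaved = [0.1 * i for i in range(-7, 8)]  # magnitudes up to 0.7
with_outlier = well_behaved + [50.0]            # one large-magnitude entry

print(mean_abs_error(well_behaved))  # near zero: the grid matches the data range
print(mean_abs_error(with_outlier))  # about 0.35: the outlier stretches the scale and small values round to 0
```

A single outlier sets the scale for the whole tensor, so controlling activation magnitude before quantization directly controls how much of the 4-bit grid remains usable for typical values.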

2604.17328 2026-05-26 cs.LG cs.AI 版本更新

Rethinking the Comparison Unit in Sequence-Level Reinforcement Learning: An Equal-Length Paired Training Framework from Loss Correction to Sample Construction

重新思考序列级强化学习中的比较单元:从损失校正到样本构建的等长配对训练框架

Fei Ding, Yongkang Zhang, Runhao Liu, Yuhao Liao, Zijian Zeng, Huiming Yang, Sibo wang, Linglin Liao

发表机构 * Alibaba Group(阿里巴巴集团) Tsinghua University(清华大学)

AI总结 本文提出序列级相对强化学习中的长度问题本质是比较单元构建问题,并基于此提出等长配对训练框架EqLen,通过双轨同步生成、前缀继承和段掩码构建可比较的训练样本。

详情
AI中文摘要

本文研究了序列级相对强化学习中的长度问题。我们观察到,尽管现有方法部分缓解了与长度相关的现象,但一个更根本的问题仍未得到充分刻画:训练过程中使用的比较单元缺乏内在可比性。基于这一观察,我们提出一个新的视角:长度问题不应仅仅被视为损失缩放或归一化偏差,而应被视为一个比较单元构建问题。我们进一步建立了一个基于样本构建的训练框架,该框架不是对不等长响应进行事后校正,而是在生成过程中主动构建等长、可对齐且可比较的训练段。在该框架内,我们提出了EqLen,一种适用于组相对比较算法(如GRPO、GSPO和RLOO)的具体方法。通过双轨同步生成、前缀继承和段掩码,EqLen高效地收集有效的等长训练段,并实现稳定的训练。

英文摘要

This paper investigates the length problem in sequence-level relative reinforcement learning. We observe that, although existing methods partially alleviate length-related phenomena, a more fundamental issue remains insufficiently characterized: the comparison units used during training lack inherent comparability. Building on this observation, we propose a new perspective: the length problem should not be viewed merely as a loss-scaling or normalization bias, but rather as a comparison unit construction problem. We further establish a sample-construction-based training framework that, instead of applying post-hoc corrections to unequal-length responses, proactively constructs equal-length, alignable, and comparable training segments during generation. Within this framework, we propose EqLen, a concrete method applicable to group-relative comparison algorithms such as GRPO, GSPO, and RLOO. Through dual-track synchronous generation, prefix inheritance, and segment masking, EqLen efficiently collects effective equal-length training segments and enables stable training.

2604.16778 2026-05-26 cs.LG cs.AI 版本更新

Federation over Text: Insight Sharing for Multi-Agent Reasoning

文本上的联邦:多智能体推理的洞察共享

Dixi Yao, Tahseen Rabbani, Manzil Zaheer, Tian Li

发表机构 * University of Chicago(芝加哥大学) Google DeepMind(谷歌DeepMind)

AI总结 提出一种类似联邦学习的框架FoT,通过迭代聚合多个客户端的本地推理过程,构建跨任务元认知洞察库,无需共享问题实例或任务指令,显著提升推理效果和效率。

Comments 46 pages

详情
AI中文摘要

我们提出了一种类似联邦学习的框架——文本上的联邦(FoT),它使得处理不同任务的多个客户端能够通过迭代地联邦化其本地推理过程,共同生成一个共享的元认知洞察库,而无需共享实际的问题实例或任务指令。与梯度上的联邦(例如分布式训练)不同,FoT在语义层面运作,无需任何梯度优化或监督信号。迭代地,每个客户端运行一个LLM智能体,独立地对其特定任务进行本地思考和自我改进,并将推理轨迹与中央服务器共享,中央服务器将其聚合和提炼成一个跨任务(和跨领域)的洞察库,现有和未来的智能体可以利用该库来改进相关任务的性能。实验表明,FoT在广泛具有挑战性的应用中提高了推理效果和效率,包括数学问题求解、跨领域协作、现实世界日常任务以及机器学习研究洞察发现。具体而言,在前三个应用中,它平均提高了25%的性能得分,同时减少了4%的推理令牌。在研究洞察发现应用中,FoT能够生成覆盖后续论文中80%以上主要贡献的洞察。

英文摘要

We propose a federated learning-like framework, Federation over Text (FoT), that enables multiple clients solving different tasks to collectively generate a shared library of metacognitive insights by iteratively federating their local reasoning processes without sharing actual problem instances or task instructions. Instead of federation over gradients (e.g., as in distributed training), FoT operates at the semantic level without any gradient optimization or supervision signal. Iteratively, each client runs an LLM agent that does local thinking and self-improvement on their specific tasks independently, and shares reasoning traces with a central server, which aggregates and distills them into a cross-task (and cross-domain) insight library that existing and future agents can leverage to improve performance on related tasks. Experiments show that FoT improves reasoning effectiveness and efficiency across a wide range of challenging applications, including mathematical problem solving, cross-domain collaboration, real-world daily tasks, and machine learning research insight discovery. Specifically, it improves average performance scores by 25% while reducing the reasoning tokens by 4% across the first three applications. In the research insight discovery application, FoT is able to generate insights that cover over 80% of the major contributions in the subsequent papers.

2603.28128 2026-05-26 cs.LG cs.CR 版本更新

ORACAL: A Robust and Explainable Multimodal Framework for Smart Contract Vulnerability Detection with Causal Graph Enrichment

ORACAL: 一种基于因果图增强的鲁棒且可解释的智能合约漏洞检测多模态框架

Tran Duong Minh Dai, Triet Huynh Minh Le, M. Ali Babar, Van-Hau Pham, Phan The Duy

发表机构 * Information Security Lab, University of Information Technology(信息安全部,信息科技大学) Vietnam National University(越南国家大学) School of Computer Science and Information Technology, Adelaide University(计算机科学与信息技术学院,阿德莱德大学)

AI总结 提出ORACAL异构多模态图学习框架,集成控制流图、数据流图和调用图,通过RAG和LLM增强关键子图,并采用因果注意力机制和PGExplainer实现鲁棒且可解释的智能合约漏洞检测。

Comments 21 pages, version 2

详情
AI中文摘要

尽管图神经网络(GNN)在智能合约漏洞检测中展现出潜力,但仍面临显著限制。同构图模型无法捕捉控制流与数据依赖之间的相互作用,而异构图方法通常缺乏深层语义理解,使其易受对抗攻击。此外,大多数黑盒模型无法提供可解释证据,阻碍了专业审计的信任。为解决这些挑战,我们提出ORACAL(基于可观测RAG增强的因果推理分析),一种异构多模态图学习框架,集成了控制流图(CFG)、数据流图(DFG)和调用图(CG)。ORACAL选择性地用检索增强生成(RAG)和大语言模型(LLM)的专家级安全上下文增强关键子图,并采用因果注意力机制从虚假相关性中分离真正的漏洞指示。为提升透明度,该框架采用PGExplainer生成子图级解释,识别漏洞触发路径。在大型数据集上的实验表明,ORACAL实现了最先进的性能,在主要基准上以91.28%的峰值宏F1超越MANDO-HGT、MTVHunter、GNN-SC和SCVHunter高达39.6个百分点。ORACAL在分布外数据集上保持强泛化能力,在CGT Weakness和DAppScan上分别达到91.8%和77.1%。在可解释性评估中,PGExplainer针对人工标注的漏洞触发路径实现了32.51%的平均交并比(MIoU)。在对抗攻击下,ORACAL将性能下降限制在约2.35%的F1下降,攻击成功率(ASR)仅为3%,优于ASR在10.91%至18.73%之间的SCVHunter和MANDO-HGT。

英文摘要

Although Graph Neural Networks (GNNs) have shown promise for smart contract vulnerability detection, they still face significant limitations. Homogeneous graph models fail to capture the interplay between control flow and data dependencies, while heterogeneous graph approaches often lack deep semantic understanding, leaving them susceptible to adversarial attacks. Moreover, most black-box models fail to provide explainable evidence, hindering trust in professional audits. To address these challenges, we propose ORACAL (Observable RAG-enhanced Analysis with CausAL reasoning), a heterogeneous multimodal graph learning framework that integrates Control Flow Graph (CFG), Data Flow Graph (DFG), and Call Graph (CG). ORACAL selectively enriches critical subgraphs with expert-level security context from Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs), and employs a causal attention mechanism to disentangle true vulnerability indicators from spurious correlations. For transparency, the framework adopts PGExplainer to generate subgraph-level explanations identifying vulnerability triggering paths. Experiments on large-scale datasets demonstrate that ORACAL achieves state-of-the-art performance, outperforming MANDO-HGT, MTVHunter, GNN-SC, and SCVHunter by up to 39.6 percentage points, with a peak Macro F1 of 91.28% on the primary benchmark. ORACAL maintains strong generalization on out-of-distribution datasets with 91.8% on CGT Weakness and 77.1% on DAppScan. In explainability evaluation, PGExplainer achieves 32.51% Mean Intersection over Union (MIoU) against manually annotated vulnerability triggering paths. Under adversarial attacks, ORACAL limits performance degradation to approximately 2.35% F1 decrease with an Attack Success Rate (ASR) of only 3%, surpassing SCVHunter and MANDO-HGT which exhibit ASRs ranging from 10.91% to 18.73%.

2603.25288 2026-05-26 cs.IT cs.AI cs.ET cs.LG eess.SP math.IT 版本更新

CSI-tuples-based 3D Channel Fingerprints Construction Assisted by MultiModal Learning

基于CSI元组的多模态学习辅助3D信道指纹构建

Chenjie Xie, Li You, Ruirong Chen, Gaoning He, Xiqi Gao

发表机构 * National Mobile Communications Research Laboratory, Southeast University(东南大学国家移动通信研究中心) Purple Mountain Laboratories(紫金山实验室) Huawei Technologies Co., Ltd.(华为技术有限公司)

AI总结 针对低空通信中的3D信道指纹构建问题,提出一种基于CSI元组的多模态回归框架,通过融合位置、通信测量和地理环境地图,实现高效高精度的信道状态信息估计。

Comments 14 pages, 9 figures

详情
Journal ref
IEEE Transactions on Wireless Communications, vol. 25, pp. 17369-17383, 2026
AI中文摘要

低空通信可以促进空中和地面无线资源的整合,扩大网络覆盖范围,提高传输质量,从而推动第六代(6G)移动通信的发展。作为低空传输的关键技术,3D信道指纹(3D-CF),也称为3D无线电地图或3D信道知识地图,有望增强对通信环境的理解,并辅助获取信道状态信息(CSI),从而避免重复估计并降低计算复杂度。本文提出了一种模块化的多模态框架来构建3D-CF。具体而言,我们首先基于莱斯衰落信道建立了3D-CF模型,将其表示为CSI元组的集合,每个元组包含低空飞行器(LAV)的位置及其对应的统计CSI。考虑到不同先验数据的异构结构,我们将3D-CF构建问题表述为一个多模态回归任务,其中CSI元组中的目标信道信息可以通过其对应的LAV位置、通信测量和地理环境地图直接估计。然后,相应地提出了一种高效的多模态框架,包括基于相关性的多模态融合(Corr-MMF)模块、多模态表示(MMR)模块和CSI回归(CSI-R)模块。数值结果表明,我们提出的框架能够高效地构建3D-CF,并在不同通信场景下比现有算法至少提高27.5%的精度,展示了其竞争性能和出色的泛化能力。我们还分析了计算复杂度,并说明了其在推理时间方面的优越性。

英文摘要

Low-altitude communications can promote the integration of aerial and terrestrial wireless resources, expand network coverage, and enhance transmission quality, thereby empowering the development of sixth-generation (6G) mobile communications. As an enabler for low-altitude transmission, 3D channel fingerprints (3D-CF), also referred to as the 3D radio map or 3D channel knowledge map, are expected to enhance the understanding of communication environments and assist in the acquisition of channel state information (CSI), thereby avoiding repeated estimations and reducing computational complexity. In this paper, we propose a modularized multimodal framework to construct 3D-CF. Specifically, we first establish the 3D-CF model as a collection of CSI-tuples based on Rician fading channels, with each tuple comprising the low-altitude vehicle's (LAV) positions and its corresponding statistical CSI. In consideration of the heterogeneous structures of different prior data, we formulate the 3D-CF construction problem as a multimodal regression task, where the target channel information in the CSI-tuple can be estimated directly by its corresponding LAV positions, together with communication measurements and geographic environment maps. Then, a high-efficiency multimodal framework is proposed accordingly, which includes a correlation-based multimodal fusion (Corr-MMF) module, a multimodal representation (MMR) module, and a CSI regression (CSI-R) module. Numerical results show that our proposed framework can efficiently construct 3D-CF and achieve at least 27.5% higher accuracy than the state-of-the-art algorithms under different communication scenarios, demonstrating its competitive performance and excellent generalization ability. We also analyze the computational complexity and illustrate its superiority in terms of the inference time.

2603.18766 2026-05-26 cs.LG 版本更新

Enhancing the Parameterization of Reservoir Properties for Data Assimilation Using Deep VAE-GAN

利用深度VAE-GAN增强数据同化中储层属性的参数化

M. A. Sampaio, P. H. Ranazzi, M. J. Blunt

发表机构 * Departamento de Engenharia de Minas e de Petróleo, Escola Politécnica, Universidade de São Paulo(圣保罗大学采矿与石油工程系,理工学院) Department of Earth Science and Engineering, Imperial College London(伦敦帝国理工学院地球科学与工程系)

AI总结 提出将VAE-GAN与ESMDA结合,以同时实现高质量储层描述和良好历史拟合,克服传统方法在非高斯分布和有限集合大小上的局限。

详情
AI中文摘要

目前,称为迭代集合平滑器的方法,特别是称为多重数据同化集合平滑器(ESMDA)的方法,可被视为石油储层模拟中历史拟合的最先进技术。然而,这种方法有两个重要限制:使用有限大小的集合来表示分布,以及参数和数据不确定性中的高斯假设。后者尤为重要,因为许多储层属性具有非高斯分布。参数化涉及在更新前将非高斯参数映射到高斯场,然后将其映射回原始域以将集合通过储层模拟器向前传播。一种有前景的参数化方法是通过深度学习模型。最近的研究表明,生成对抗网络(GAN)在数据同化方面表现不佳,但能生成地质上更合理的储层实现,而变分自编码器(VAE)在数据同化中表现优于GAN,但生成的地质模型不太真实。本工作的创新之处在于结合两者的优势,实现一个称为变分自编码器生成对抗网络(VAE-GAN)的深度学习模型,并与ESMDA集成。该方法应用于两个案例研究,一个案例是分类的,另一个是连续渗透率值。我们的发现表明,通过应用VAE-GAN模型,我们可以同时获得高质量的储层描述(就像GAN)和良好的生产曲线历史拟合(就像VAE)。

英文摘要

Iterative ensemble smoothers, especially the Ensemble Smoother with Multiple Data Assimilation (ESMDA), are currently considered state-of-the-art for history matching in petroleum reservoir simulation. However, this approach has two important limitations: the use of a finite-size ensemble to represent the distributions and the Gaussian assumption on parameter and data uncertainties. The latter is particularly important because many reservoir properties have non-Gaussian distributions. Parameterization involves mapping non-Gaussian parameters to a Gaussian field before the update and then mapping them back to the original domain to propagate the ensemble through the reservoir simulator. A promising way to perform parameterization is through deep learning models. Recent studies have shown that Generative Adversarial Networks (GANs) perform poorly in data assimilation but generate more geologically plausible realizations of the reservoir, while Variational Autoencoders (VAEs) perform better than GANs in data assimilation but generate less geologically realistic models. This work is innovative in combining the strengths of both by implementing a deep learning model called Variational Autoencoder Generative Adversarial Network (VAE-GAN) integrated with ESMDA. The methodology was applied in two case studies, one with categorical values and the other with continuous values of permeability. Our findings demonstrate that by applying the VAE-GAN model we can simultaneously obtain high-quality reservoir descriptions (as with GANs) and a good history match of the production curves (as with VAEs).

2603.16481 2026-05-26 cs.LG cs.SY eess.SY math.OC 版本更新

Optimal uncertainty bounds for multivariate kernel regression under bounded noise: A Gaussian process-based dual function

有界噪声下多元核回归的最优不确定性界:基于高斯过程的对偶函数

Amon Lahr, Anna Scampicchio, Johannes Köhler, Melanie N. Zeilinger

发表机构 * Institute for Dynamical Systems and Control, ETH Zurich(动态系统与控制研究所,苏黎世联邦理工学院) Department of Electrical Engineering, Chalmers University of Technology(电气工程系,查尔姆斯理工大学) Department of Mechanical Engineering, Imperial College London(机械工程系,伦敦帝国理工学院)

AI总结 针对有界噪声下再生核希尔伯特空间中的多输出函数,提出一种紧致、确定性的不确定性界,通过无约束对偶公式获得,具有与经典高斯过程置信界相同的结构,便于集成到下游优化中。

Comments Extended version

详情
AI中文摘要

非保守的不确定性界对于从含噪数据中对潜在函数进行可靠预测至关重要,因此是安全学习控制的关键推动因素。在该领域,高斯过程回归等核方法因其固有的不确定性量化机制而成为成熟技术。然而,现有方法要么对底层噪声分布施加强假设,要么保守,要么不直接适用于多输出情况,要么难以集成到下游任务中。本文通过提出一种针对再生核希尔伯特空间(RKHS)中多输出函数的紧致、确定性界来应对这些限制,该函数受有界噪声影响。该界通过无约束的对偶公式获得,该公式具有与经典高斯过程置信界相同的结构,因此可以直接集成到下游优化流程中。我们证明了所提出的界推广了现有结果,并使用四旋翼动力学学习的示例说明了其应用。

英文摘要

Non-conservative uncertainty bounds are essential for making reliable predictions about latent functions from noisy data, and thus, a key enabler for safe learning-based control. In this domain, kernel methods such as Gaussian process regression are established techniques, thanks to their inherent uncertainty quantification mechanism. Still, existing bounds either pose strong assumptions on the underlying noise distribution, are conservative, do not directly apply in the multi-output case, or are difficult to integrate into downstream tasks. This paper addresses these limitations by presenting a tight, deterministic bound for multi-output functions in Reproducing Kernel Hilbert Spaces (RKHSs) subject to bounded noise. It is obtained through an unconstrained, duality-based formulation, which shares the same structure as classic Gaussian process confidence bounds, and can thus be straightforwardly integrated into downstream optimization pipelines. We show that the proposed bound generalizes existing results and illustrate its application using an example inspired by quadrotor dynamics learning.

2603.10250 2026-05-26 cs.LG 版本更新

GeMPO: Generalized Measure Matching for Online Diffusion Reinforcement Learning

GeMPO:在线扩散强化学习的广义度量匹配

Haitong Ma, Chenxiao Gao, Tianyi Chen, Na Li, Bo Dai

发表机构 * Harvard University(哈佛大学) Georgia Institute of Technology(佐治亚理工学院)

AI总结 提出GeMPO框架,通过将扩散RL中的重加权从softmax推广到一般单调函数,并引入负重加权机制,以解决过贪策略和负样本利用不足的问题。

Comments 22 pages, 6 figures

详情
AI中文摘要

扩散策略的强化学习中常用的一类算法对来自行为策略的样本进行softmax重加权,这通常会导致过贪策略,并且未能利用负样本的反馈。在这项工作中,我们引入了GeMPO,一个简单且统一的框架,将扩散RL中的重加权方案从softmax推广到一般单调函数。GeMPO通过度量匹配的视角重新审视扩散RL:首先,通过求解正则化策略优化目标构建虚拟目标策略度量;其次,通过重加权流匹配最小化当前策略与该目标度量之间的散度。这种公式有两个关键优势:i) 它将权重设计扩展到传统的指数重加权之外,允许针对不同的奖励景观进行定制;ii) 通过放松目标度量的非负性约束,我们的框架为负重加权提供了原则性的理由。我们解释了负重加权如何主动使策略远离次优动作,从而促进探索。大量的实证评估表明,GeMPO通过利用这些灵活的加权方案实现了具有竞争力或更优的性能,并且我们提供了在实践中选择重加权方法的实用指南。

英文摘要

A commonly used family of RL algorithms for diffusion policies conducts softmax reweighting over samples from the behavior policy, which often induces an overgreedy policy and fails to utilize feedback from negative samples. In this work, we introduce GeMPO, a simple and unified framework that generalizes the reweighting scheme in diffusion RL from softmax to general monotonic functions. GeMPO revisits diffusion RL from a measure-matching perspective: first, we construct a virtual target policy measure by solving a regularized policy optimization objective; second, we minimize the divergence between the current policy and this target measure through reweighted flow matching. This formulation offers two key advantages: (i) it extends weight design beyond traditional exponential reweighting, allowing it to be tailored to diverse reward landscapes; and (ii) by relaxing the non-negativity constraint on the target measure, our framework provides a principled justification for negative reweighting. We provide interpretations of how negative reweighting actively repels the policy from suboptimal actions and thus facilitates exploration. Extensive empirical evaluations demonstrate that GeMPO achieves competitive or superior performance by leveraging these flexible weighting schemes, and we provide practical guidelines for selecting reweighting methods.

2603.06626 2026-05-26 cs.LG cs.AI 版本更新

Grouter: Decoupling Routing from Representation for Accelerated MoE Training

Grouter: 将路由与表示解耦以加速MoE训练

Yuqi Xu, Rizhen Hu, Zihan Liu, Mou Sun, Kun Yuan

发表机构 * School of Mathematical Sciences, Peking University, Beijing, China(北京大学数学科学学院) Center for Machine Learning Research, Peking University, Beijing, China(北京大学机器学习研究中心) Yuanpei College, Peking University, Beijing, China(北京大学元培学院) Zhejiang Lab, Hangzhou, China(浙江实验室)

AI总结 提出Grouter方法,通过从预训练MoE模型中蒸馏高质量结构作为固定路由器,解耦结构优化与权重更新,显著加速模型收敛并提升训练吞吐量。

详情
AI中文摘要

传统的混合专家(MoE)训练通常没有任何结构先验,实际上要求模型在训练专家权重的同时,在巨大的组合空间中搜索最优路由策略。这种纠缠常常导致收敛缓慢和训练不稳定。本文介绍了Grouter,一种先发制人的路由方法,通过从完全训练的MoE模型中蒸馏高质量结构,并作为目标模型的固定路由器。通过将结构优化与权重更新解耦,Grouter显著加速了模型收敛的速度和质量。为了确保框架的通用性,我们还引入了专家折叠以适应不同模型配置的Grouter,以及专家调优以重新平衡不同数据分布下的工作负载。此外,通过利用先发制人路由提供的结构先验,我们可以实施有针对性的优化以进一步提高训练吞吐量。实验表明,Grouter实现了卓越的性能和效率,将预训练数据利用率提高了4.28倍,并实现了高达33.5%的吞吐量加速,确立了先发制人路由作为可扩展MoE训练的基本范式。我们在https://github.com/JimmyAwoe/Grouter公开了我们的代码和预训练的Grouter检查点。

英文摘要

Traditional Mixture-of-Experts (MoE) training typically proceeds without any structural priors, effectively requiring the model to train expert weights while simultaneously searching for an optimal routing policy within a vast combinatorial space. This entanglement often leads to sluggish convergence and training instabilities. This paper introduces Grouter, a preemptive routing method that distills high-quality structures from fully trained MoE models and serves as a fixed router for target models. By decoupling structural optimization from weight updates, Grouter significantly improves both the speed and quality of model convergence. To ensure the framework's versatility, we also introduce expert folding to adapt Grouter across varying model configurations and expert tuning to rebalance workloads across different data distributions. Furthermore, by leveraging the structural priors provided by preemptive routing, we can implement targeted optimizations to further enhance training throughput. Experiments demonstrate that Grouter achieves superior performance and efficiency: it boosts pre-training data utilization by 4.28x and achieves up to 33.5% throughput acceleration, establishing preemptive routing as a fundamental paradigm for scalable MoE training. We publicly release our code and pretrained Grouter checkpoints at https://github.com/JimmyAwoe/Grouter.

2603.00191 2026-05-26 cs.LG cs.CV 版本更新

Task-Driven Subspace Decomposition for Knowledge Sharing and Isolation in LoRA-based Continual Learning

基于LoRA的持续学习中任务驱动的子空间分解用于知识共享与隔离

Lingfeng He, De Cheng, Huaijie Wang, Xi Yang, Nannan Wang, Xinbo Gao

发表机构 * State Key Laboratory of Integrated Services Networks, School of Telecommunications Engineering, Xidian University, Xi'an, China(信息服务网络国家重点实验室,电信工程学院,西安电子科技大学,西安,中国) School of Electronic Engineering, Xidian University, Xi'an, China(电子工程学院,西安电子科技大学,西安,中国)

AI总结 提出LoDA方法,通过任务驱动分解构建通用和任务特定LoRA子空间,结合梯度对齐优化和闭式重校准,实现知识共享与隔离,提升持续学习性能。

Comments Accepted by ICML 2026

详情
AI中文摘要

持续学习要求模型在不遗忘旧知识的情况下顺序适应新任务。最近,低秩适应(LoRA)作为一种代表性的参数高效微调方法,在持续学习中受到越来越多的关注。几种基于LoRA的持续学习方法通过分离更新空间来减少任务间的干扰,通常从过去任务的估计零空间中构建新空间。然而,它们(i)忽略了任务共享方向,抑制了知识迁移;(ii)未能捕获真正有效的任务特定方向,因为旧任务的这些“零基”在相关任务下对新任务几乎保持不活跃。为了解决这个问题,我们从投影能量的角度研究LoRA的学习能力,并提出了低秩分解与适应(LoDA)。它通过解决两个基于能量的目标,执行任务驱动分解以构建通用和真正的任务特定LoRA子空间,解耦知识共享和隔离的方向。LoDA固定两个子空间上的LoRA下投影,并通过梯度对齐优化方法学习鲁棒的上投影。在每个任务之后,在将LoRA更新集成到主干之前,LoDA为通用更新推导出一个闭式重校准,沿着这个任务共享方向近似特征级联合最优。实验表明,LoDA优于现有的持续学习方法。我们的代码可在https://github.com/HHHLF/LoDA_ICML2026获取。

英文摘要

Continual Learning (CL) requires models to sequentially adapt to new tasks without forgetting old knowledge. Recently, Low-Rank Adaptation (LoRA), a representative Parameter-Efficient Fine-Tuning (PEFT) method, has gained increasing attention in CL. Several LoRA-based CL methods reduce interference across tasks by separating their update spaces, typically building the new space from the estimated null space of past tasks. However, they (i) overlook task-shared directions, which suppresses knowledge transfer, and (ii) fail to capture truly effective task-specific directions, since these "null bases" of old tasks can remain nearly inactive for new tasks when tasks are correlated. To address this, we study LoRA learning capability from a projection energy perspective, and propose Low-rank Decomposition and Adaptation (LoDA). It performs a task-driven decomposition to build general and truly task-specific LoRA subspaces by solving two energy-based objectives, decoupling directions for knowledge sharing and isolation. LoDA fixes LoRA down-projections on two subspaces and learns robust up-projections via a Gradient-Aligned Optimization (GAO) approach. After each task, before integrating the LoRA updates into the backbone, LoDA derives a closed-form recalibration for the general update, approximating a feature-level joint optimum along this task-shared direction. Experiments indicate that LoDA outperforms existing CL methods. Our code is available at https://github.com/HHHLF/LoDA_ICML2026.

2603.00177 2026-05-26 cs.CR cs.HC cs.LG 版本更新

Detecting Cognitive Signatures in Typing Behavior for Non-Intrusive Authorship Verification

通过打字行为检测认知特征以实现非侵入式作者验证

David Condrey

发表机构 * Writerslogic, Inc.(Writerslogic公司)

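AI summary: Uses Cognitive Load Correlation (CLC), derived from large-scale keystroke datasets, to distinguish genuine composition from mechanical transcription; proposes a non-intrusive verification framework that collects only timing metadata, estimates 85 to 95 percent discrimination accuracy while preserving privacy, and shows that cognitive signatures are robust to timing-forgery attacks.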

Comments 7 pages

详情
AI中文摘要

AI生成文本的激增加剧了对可靠作者验证的需求,然而当前基于输出的方法越来越不可靠。我们观察到,普通的打字界面捕获了丰富的认知特征,即击键时序中可测量的模式,反映了真实创作过程中的规划、翻译和修改阶段。基于包含超过1.36亿事件的大规模击键数据集,我们定义了认知负荷相关性(CLC),并表明它能区分真实创作与机械转录。我们提出了一种非侵入式验证框架,该框架在现有写作界面内运行,仅收集时间元数据以保护隐私。我们的分析评估估计,在所述假设下,判别准确率为85%至95%,同时通过证据量化限制生物特征泄露。我们分析了认知特征的对抗鲁棒性,表明它们能够抵抗击败运动级身份验证的时序伪造攻击,因为认知通道与语义内容纠缠在一起。我们得出结论,将作者验证重新定义为人机交互问题,为侵入式监控提供了一种保护隐私的替代方案。

英文摘要

The proliferation of AI-generated text has intensified the need for reliable authorship verification, yet current output-based methods are increasingly unreliable. We observe that the ordinary typing interface captures rich cognitive signatures, measurable patterns in keystroke timing that reflect the planning, translating, and revising stages of genuine composition. Drawing on large-scale keystroke datasets comprising over 136 million events, we define the Cognitive Load Correlation (CLC) and show it distinguishes genuine composition from mechanical transcription. We present a non-intrusive verification framework that operates within existing writing interfaces, collecting only timing metadata to preserve privacy. Our analytical evaluation estimates 85 to 95 percent discrimination accuracy under stated assumptions, while limiting biometric leakage via evidence quantization. We analyze the adversarial robustness of cognitive signatures, showing they resist timing-forgery attacks that defeat motor-level authentication because the cognitive channel is entangled with semantic content. We conclude that reframing authorship verification as a human-computer interaction problem provides a privacy-preserving alternative to invasive surveillance.

2602.22631 2026-05-26 cs.MS cs.LG cs.LO cs.NA cs.PL math.NA 版本更新

TorchLean: Formalizing Neural Networks in Lean

TorchLean: 在 Lean 中形式化神经网络

Robert Joseph George, Jennifer Cruden, Will Adkisson, Xiangru Zhong, Huan Zhang, Anima Anandkumar

发表机构 * California Institute of Technology(加利福尼亚理工学院) Washington University in St. Louis(华盛顿大学圣路易斯分校) University of Illinois Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校)

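AI summary: Presents TorchLean, a framework that unifies execution, verification, and theorem proving for neural networks in Lean 4, using a shared semantics to close the semantic gap between the executed network and the analyzed artifact.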

Comments 55 pages

详情
AI中文摘要

神经网络越来越多地部署在科学、安全关键和任务关键型流程中,但验证和分析通常在定义和运行模型的编程环境之外进行。这在执行的网络与分析工件之间造成了语义鸿沟:保证可能依赖于关于算子语义、张量布局、预处理、浮点行为、图变换、加速内核和外部证书的隐式约定。我们提出 TorchLean,一个在 Lean 4 中形式化、执行和验证神经网络的统一框架。TorchLean 将学习模型视为可执行程序和数学对象,具有用于计算、验证和定理证明的共享语义。该框架为类型化张量、层、目标、优化器、自动微分和图程序提供了 PyTorch 风格的 API,具有急切和编译执行路径,这些路径降低到公共计算图表示。TorchLean 支持精确和有限精度张量语义、验证的反向模式微分、区间和仿射边界传播、CROWN/LiRPA 风格的证书检查、导入/导出工作流以及通过显式 FFI 边界的 CUDA 支持执行。它还包括用于注意力和 FlashAttention、状态空间序列模型、扩散和采样过程、概率核、强化学习目标和马尔可夫决策过程以及自监督目标(如掩码自编码、JEPA 风格的预测视图和基于方差/相关性的抗崩溃损失)的语义层。这些组件共同为验证机器学习提供了语义基础,其中可执行的神经网络工件、验证过程、运行时边界和数学声明可以在一个定理证明环境中陈述和关联。

英文摘要

Neural networks are increasingly deployed in scientific, safety-critical, and mission-critical pipelines, yet verification and analysis are often performed outside the programming environment that defines and runs the model. This creates a semantic gap between the executed network and the analyzed artifact: guarantees can depend on implicit conventions about operator semantics, tensor layouts, preprocessing, floating-point behavior, graph transformations, accelerated kernels, and external certificates. We present TorchLean, a unified framework for formalizing, executing, and verifying neural networks in Lean 4. TorchLean treats learned models as executable programs and mathematical objects with a shared semantics for computation, verification, and theorem proving. The framework provides a PyTorch-style API for typed tensors, layers, objectives, optimizers, automatic differentiation, and graph programs, with eager and compiled execution paths that lower to a common computation-graph representation. TorchLean supports exact and finite-precision tensor semantics, verified reverse-mode differentiation, interval and affine bound propagation, CROWN/LiRPA-style certificate checking, import/export workflows, and CUDA-backed execution through explicit FFI boundaries. It also includes semantic layers for attention and FlashAttention, state-space sequence models, diffusion and sampling processes, probability kernels, reinforcement-learning objectives and Markov decision processes, and self-supervised objectives such as masked autoencoding, JEPA-style predictive views, and variance/correlation-based anti-collapse losses. Together, these components provide a semantic foundation for verified machine learning, where executable neural network artifacts, verification procedures, runtime boundaries, and mathematical claims can be stated and related inside one theorem-proving environment.

2602.21479 2026-05-26 stat.ML cs.LG 版本更新

Global Sequential Testing for Multi-Stream Auditing

多流审计的全局序贯检验

Beepul Bharti, Ambar Pal, Jeremias Sulam

发表机构 * Mathematical Institute for Data Science (MINDS), Johns Hopkins University(数据科学数学研究所(MINDS),约翰霍普金斯大学) Department of Biomedical Engineering, Johns Hopkins University(生物医学工程系,约翰霍普金斯大学) Amazon Responsible AI(亚马逊负责任人工智能)

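AI summary: For auditing across multiple data streams, proposes sequential tests based on merging martingales that match the Bonferroni stopping-time rate under sparse alternatives and improve on it under dense alternatives, validated by experiments on synthetic and real-world data.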

详情
AI中文摘要

在许多风险敏感领域,随着接收更多数据,持续审计机器学习系统以快速判断其是否按设计运行至关重要。该审计任务可建模为具有 $k$ 个数据流和全局零假设的序贯假设检验问题,其中全局零假设断言系统在所有 $k$ 个流上按预期运行。在备择假设下,使用 Bonferroni 校正的标准全局序贯检验,对于大 $k$ 和显著性水平 $α$,期望停止时间为 $O\left(\ln \frac{k}{α}\right)$。在这项工作中,我们证明了依赖于通过平均和乘积规则合并鞅的高效序贯检验提供了改进的停止时间,从而对零假设具有更强的检验能力。利用这些结果,我们表明平衡检验在稀疏情形(仅少数非零流)下可以达到 Bonferroni 的 $O\left(\ln \frac{k}{α}\right)$ 速率,同时在密集备择假设(许多非零流)下实现 $O\left(\frac{1}{k}\ln \frac{1}{α}\right)$。我们通过在合成数据和真实数据上的实验验证了我们的理论。

英文摘要

Across many risk-sensitive areas, it is critical to continuously audit machine learning systems as we receive more data to quickly determine if they are performing as designed. This auditing task can be modeled as a sequential hypothesis testing problem with $k$ data streams and a global null hypothesis that asserts the system operates as intended across all $k$ streams. Under the alternative, the standard global sequential test, which uses a Bonferroni correction, has an expected stopping time of $O\left(\ln \frac{k}{α}\right)$ for large $k$ and significance level $α$. In this work, we demonstrate that efficient sequential tests, relying on merging martingales via averaging and product rules, provide improved stopping times, and thus more powerful tests against the null. Using these results, we show that a balanced test can match the Bonferroni rate of $O\left(\ln \frac{k}{α}\right)$ in the sparse regime (just a few non-null streams) while achieving $O\left(\frac{1}{k}\ln \frac{1}{α}\right)$ under dense alternatives (many non-null streams). We validate our theory through experiments on both synthetic and real-world data.
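
To illustrate the merging rules mentioned in the abstract, the sketch below combines per-stream test martingales (wealth processes) by averaging and by taking products, and compares the resulting stopping times with a Bonferroni-style rule. It assumes each stream supplies a nonnegative wealth sequence, that streams are independent (needed for the product to remain a martingale under the null), and that rejection happens once the merged wealth reaches $1/α$ (Ville's inequality). The function names and the deterministic toy growth factors are hypothetical and are not taken from the paper; how the paper's balanced test combines the two rules is not specified in the abstract and is not reproduced here.

```python
from math import prod

def stopping_time(wealth_path, threshold):
    """First 1-indexed time the wealth reaches the threshold, or None."""
    for t, w in enumerate(wealth_path, start=1):
        if w >= threshold:
            return t
    return None

def merged_paths(streams):
    """Given k per-stream wealth paths of equal length, return the
    average-merged and product-merged wealth paths."""
    horizon = len(streams[0])
    k = len(streams)
    avg = [sum(s[t] for s in streams) / k for t in range(horizon)]
    prd = [prod(s[t] for s in streams) for t in range(horizon)]
    return avg, prd

def bonferroni_time(streams, alpha):
    """Bonferroni-style rule: reject when any single stream reaches k / alpha."""
    k = len(streams)
    times = [stopping_time(s, k / alpha) for s in streams]
    times = [t for t in times if t is not None]
    return min(times) if times else None

def geometric_path(growth, horizon):
    """Toy wealth path that multiplies by `growth` each step (hypothetical data)."""
    return [growth ** t for t in range(1, horizon + 1)]

alpha, horizon, k = 0.05, 60, 8
# Dense alternative: every stream accumulates evidence.
dense = [geometric_path(1.2, horizon) for _ in range(k)]
# Sparse alternative: one stream accumulates evidence, the rest stay flat at 1
# (an idealization; real null streams would fluctuate).
sparse = [geometric_path(1.2, horizon)] + [[1.0] * horizon for _ in range(k - 1)]

for name, streams in [("dense", dense), ("sparse", sparse)]:
    avg, prd = merged_paths(streams)
    print(name,
          "bonferroni:", bonferroni_time(streams, alpha),
          "average:", stopping_time(avg, 1 / alpha),
          "product:", stopping_time(prd, 1 / alpha))
```

In this toy setting the product rule stops much earlier than Bonferroni when all streams carry signal, while the averaging rule stays close to Bonferroni when only one stream does, which is the qualitative contrast between the dense and sparse regimes that the abstract describes.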

2602.16340 2026-05-26 cs.LG stat.ML 版本更新

The Implicit Bias of Adam and Muon on Smooth Homogeneous Neural Networks

Adam和Muon在光滑齐次神经网络上的隐式偏差

Eitan Gronich, Gal Vardi

发表机构 * Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel(计算机科学与应用数学系,魏茨曼科学研究院,以色列雷霍夫特)

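AI summary: Studies the implicit bias of momentum-based optimizers on smooth homogeneous models, showing that Muon, MomentumGD, and Signum follow approximate steepest descent trajectories under a decaying learning rate and are biased toward KKT points of the corresponding margin maximization problem; the analysis is extended to Adam and to hybrid-norm optimizers.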

Comments ICML 2026. 8 pages, 1 figure (with appendix: 45 pages, 3 figures)

详情
AI中文摘要

我们研究了动量优化器在光滑齐次模型上的隐式偏差。我们证明,在衰减学习率调度下,像Muon(谱范数)、MomentumGD(ℓ2范数)和Signum(ℓ∞范数)这样的动量最速下降算法是近似最速下降轨迹,从而证明这些算法偏向于对应边际最大化问题的KKT点。我们将分析扩展到Adam(不含稳定性常数),它最大化ℓ∞边际,以及Muon-Signum和Muon-Adam,它们最大化混合范数。我们的实验证实了理论,并表明最大化的边际类型取决于优化器的选择。总体而言,我们的结果扩展了早期关于齐次模型中最速下降和线性模型中动量优化器的工作线。

英文摘要

We study the implicit bias of momentum-based optimizers on smooth homogeneous models. We show that \textit{momentum steepest descent} algorithms like Muon (spectral norm), MomentumGD ($\ell_2$ norm), and Signum ($\ell_\infty$ norm) are \textit{approximate} steepest descent trajectories under a decaying learning rate schedule, proving that these algorithms have a bias towards KKT points of the corresponding margin maximization problem. We extend the analysis to Adam (without the stability constant), which maximizes the $\ell_\infty$ margin, and to Muon-Signum and Muon-Adam, which maximize a hybrid norm. Our experiments corroborate the theory and show that the identity of the margin maximized depends on the choice of optimizer. Overall, our results extend earlier lines of work on steepest descent in homogeneous models and momentum-based optimizers in linear models.

2602.09130 2026-05-26 cs.LG 版本更新

UniComp: A Unified Evaluation of Large Language Model Compression via Pruning, Quantization and Distillation

UniComp: 通过剪枝、量化和蒸馏对大型语言模型压缩的统一评估

Jonathan von Rad, Yong Cao, Andreas Geiger

发表机构 * University College London(伦敦大学学院) University of Tübingen(图宾根大学) Tübingen AI Center(图宾根人工智能中心)

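AI summary: Introduces UniComp, a unified framework for evaluating pruning, quantization, and knowledge distillation along three dimensions (performance, reliability, and efficiency) across 40 datasets; finds a knowledge bias, a decoupling between performance and reliability, and that task-specific calibration can improve reasoning performance in pruned models.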

Comments 18 pages, 5 figures, 18 tables

详情
AI中文摘要

模型压缩对于部署大型语言模型(LLM)日益重要,然而现有的比较研究主要集中在剪枝和量化,且主要基于知识中心的基准进行评估。因此,我们引入了UniComp,一个用于比较剪枝、量化和知识蒸馏的统一评估框架。UniComp从性能、可靠性和效率三个维度评估压缩模型,使用多样化的面向能力和安全性的基准以及硬件感知的效率分析。通过对40个数据集上的六种压缩技术进行评估,我们观察到:(i) 一致的知识偏差,即事实回忆基本保留,而多步推理、多语言和指令遵循能力下降;(ii) 性能与可靠性之间的解耦,表明保留的性能并不一致地意味着保留的可靠性;(iii) 任务特定校准可以在剪枝模型中实现高达50%的推理性能相对提升。

英文摘要

Model compression is increasingly essential for deploying large language models (LLMs), yet existing comparative studies largely focus on pruning and quantization evaluated primarily on knowledge-centric benchmarks. Thus, we introduce UniComp, a unified evaluation framework for comparing pruning, quantization, and knowledge distillation. UniComp evaluates compressed models along three dimensions: performance, reliability, and efficiency, using a diverse set of capability- and safety-oriented benchmarks together with a hardware-aware efficiency analysis. Through evaluation of six compression techniques across 40 datasets, we observe (i) a consistent knowledge bias, where factual recall is largely preserved while multi-step reasoning, multilingual, and instruction-following capabilities degrade; (ii) a decoupling between performance and reliability, indicating that retained performance does not consistently imply preserved reliability; and (iii) that task-specific calibration can yield up to 50% relative improvement of reasoning performance in pruned models.

2602.06357 2026-05-26 cs.LG 版本更新

LLM-SAA: LLM-persona Generated Distributions for Decision-making

LLM-SAA:基于LLM人格生成分布的决策方法

Jackie Baek, Yunhan Chen, Ziyu Chi, Will Ma

发表机构 * Stern School of Business, New York University(纽约大学 Stern 商学院) Department of Computer Science, Columbia University(哥伦比亚大学 计算机科学系) Graduate School of Business and Data Science Institute, Columbia University(哥伦比亚大学 商学院和数据科学研究院)

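AI summary: Studies how well LLM-generated distributions (for example, simulated consumer willingness to pay) support downstream decisions, evaluating their practical utility on three canonical problems (assortment optimization, pricing, and newsvendor); finds that they are useful in low-data regimes and that decision-agnostic metrics such as Wasserstein distance can be misleading.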

详情
AI中文摘要

LLM可以生成丰富的数据,从模拟人类估值和偏好的虚拟人格,到基于世界知识的需求预测。但这类LLM生成的分布对下游决策的支持程度如何?例如,在定价新产品时,企业可以提示LLM根据产品描述模拟消费者愿意支付的价格,但由此得到的分布对优化价格有多大用处?我们将这种方法称为LLM-SAA,即利用LLM构建估计分布,然后在该分布下优化决策。在本文中,我们研究基于这些分布所诱导的决策来评估其质量的指标。以三个经典决策问题(分类优化、定价和报童模型)为例,我们发现LLM生成的分布在实际中是有用的,尤其是在低数据场景下。我们还表明,在评估这些分布用于决策时,诸如Wasserstein距离等与决策无关的指标可能会产生误导。

英文摘要

LLMs can generate a wealth of data, ranging from simulated personas imitating human valuations and preferences, to demand forecasts based on world knowledge. But how well do such LLM-generated distributions support downstream decision-making? For example, when pricing a new product, a firm could prompt an LLM to simulate how much consumers are willing to pay based on a product description, but how useful is the resulting distribution for optimizing the price? We refer to this approach as LLM-SAA, in which an LLM is used to construct an estimated distribution and the decision is then optimized under that distribution. In this paper, we study metrics to evaluate the quality of these LLM-generated distributions, based on the decisions they induce. Taking three canonical decision-making problems (assortment optimization, pricing, and newsvendor) as examples, we find that LLM-generated distributions are practically useful, especially in low-data regimes. We also show that decision-agnostic metrics such as Wasserstein distance can be misleading when evaluating these distributions for decision-making.
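
As a concrete instance of the "optimize the decision under an estimated distribution" step, the sketch below solves a newsvendor problem by sample average approximation. It assumes the demand samples have already been produced (here they are hard-coded placeholders standing in for LLM-persona outputs) and that price > cost > 0; the numbers and function names are hypothetical. It is a minimal illustration, not the paper's pipeline.

```python
def average_profit(order_qty, demand_samples, price, cost):
    """Mean newsvendor profit over the samples:
    sell min(q, d) units at `price`, buy q units at `cost`."""
    total = sum(price * min(order_qty, d) - cost * order_qty for d in demand_samples)
    return total / len(demand_samples)

def saa_newsvendor(demand_samples, price, cost):
    """Pick the order quantity that maximizes average profit over the samples.
    The objective is piecewise linear and concave in q with kinks at the
    sampled demands, so it suffices to search over the sampled values."""
    candidates = sorted(set(demand_samples))
    return max(candidates, key=lambda q: average_profit(q, demand_samples, price, cost))

# Placeholder demand samples standing in for LLM-generated ones.
samples = [12, 15, 9, 20, 14, 11, 18, 16, 13, 10]
q_star = saa_newsvendor(samples, price=10.0, cost=4.5)
print(q_star, average_profit(q_star, samples, price=10.0, cost=4.5))
```

Scoring a generated distribution by the profit its induced decision earns under held-out real demand, rather than by a distance to the real distribution, is the kind of decision-based evaluation the abstract argues for.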

2602.04653 2026-05-26 cs.CR cs.LG 版本更新

Inference-Time Backdoors via Chat Templates: From LLM Supply Chains to Agentic System Compromise

通过聊天模板的推理时后门:从LLM供应链到代理系统妥协

Ariel Fogel, Omer Hofman, Eilon Cohen, Roman Vainshtein

发表机构 * Fujitsu Research of Europe(富士通欧洲研究)

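AI summary: Presents an inference-time backdoor attack that works by maliciously modifying the chat template, with no changes to model weights or training data; the attack succeeds at the LLM, agent, and multi-agent system levels and bypasses the tested defenses.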

Comments V3: Accepted to ICLR 2026 Trustworthy AI Workshop, V4: Submitted to CCS 2026

详情
AI中文摘要

开源权重语言模型越来越多地用于生产环境,带来了新的安全挑战。一个突出的威胁是后门攻击,攻击者嵌入在特定条件下激活的隐藏行为。先前的工作假设攻击者能够访问训练流程或部署基础设施。我们提出了一种新颖的攻击面,不需要这些:即“聊天模板”。聊天模板是在每次推理调用时执行的可执行程序,通常用Jinja2实现,占据用户输入和模型处理之间的特权位置。我们表明,分发带有恶意修改模板的模型的攻击者可以在不修改模型权重、投毒训练数据或控制运行时基础设施的情况下植入推理时后门。我们在三个部署层级评估了这种攻击。在LLM层面,触发的后门将事实准确性从平均90%降低到15%,并诱导攻击者控制的URL发射,成功率超过80%,而良性输入没有可测量的退化;这些结果在十八个模型上成立。在代理层面,模板后门在两个基准测试(涵盖3868个回合)中劫持了工具使用,绕过了基准测试提供的所有测试过的注入防御,同时在缺乏触发条件时完全休眠。在多代理系统层面,我们展示了单个投毒工件如何损害真实世界的代理部署,并向下游传播供应链代码投毒。投毒工件在最大的开源模型分发平台上逃避了所有安全扫描;并且由于负载在用户输入处理之前由模板渲染,它在架构上无法被输入级防御(如提示注入护栏)触及。这些结果确立了聊天模板在开源权重AI供应链中作为一种可靠且未受防御的攻击方式。

英文摘要

Open-weight language models are increasingly used in production settings, raising new security challenges. One prominent threat is backdoor attacks, in which adversaries embed hidden behaviors that activate under specific conditions. Previous work has assumed that adversaries have access to training pipelines or deployment infrastructure. We propose a novel attack surface requiring neither: the "chat template". Chat templates are executable programs invoked at every inference call, often implemented in Jinja2, that occupy a privileged position between user input and model processing. We show that an adversary who distributes a model with a maliciously modified template can implant an inference-time backdoor without modifying model weights, poisoning training data, or controlling runtime infrastructure. We evaluate this attack across three deployment tiers. At the LLM level, triggered backdoors reduce factual accuracy from 90% to 15% on average and induce attacker-controlled URL emission with success rates exceeding 80%, while benign inputs show no measurable degradation; these results hold across eighteen models. At the agent level, template backdoors hijack tool-use across two benchmarks spanning 3,868 episodes, bypassing every tested injection defense offered by the benchmarks while remaining fully dormant absent the trigger. At the multi-agent system level, we demonstrate how a single poisoned artifact compromises a real-world agentic deployment and propagates supply-chain code poisoning downstream. The poisoned artifacts evade all security scans on the largest open model distribution platform; and because the payload is rendered by the template before user input is processed, it is architecturally unreachable by input-level defenses such as prompt injection guardrails. These results establish chat templates as a reliable and undefended attack in the open-weight AI supply chain.

2602.04360 2026-05-26 cs.LG cs.AI cs.CY 版本更新

Counterfactual Explanations for Hypergraph Neural Networks

超图神经网络的反事实解释

Fabiano Veglianti, Lorenzo Antonelli, Gabriele Tolomei

发表机构 * Department of Computer Control and Management Engineering, Sapienza University(计算机控制与管理工程系,萨皮恩扎大学) Department of Computer Science, Sapienza University(计算机科学系,萨皮恩扎大学)

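AI summary: Proposes CF-HyperGNNExplainer, which generates counterfactual hypergraphs through minimal structural changes to explain the predictions of hypergraph neural networks.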

详情
AI中文摘要

超图神经网络(HGNNs)有效建模了许多现实系统中的高阶交互,但仍难以解释,限制了其在高风险场景中的部署。我们引入了CF-HyperGNNExplainer,一种针对HGNNs的反事实解释方法,该方法识别改变模型预测所需的最小结构变化。该方法通过仅限于删除节点-超边关联或删除超边的可操作编辑生成反事实超图,产生简洁且结构上有意义的解释。在超图基准数据集上的大量实验表明,CF-HyperGNNExplainer生成了有效且简洁的反事实,突出了对HGNN决策最关键的高阶关系。

英文摘要

Hypergraph neural networks (HGNNs) effectively model higher-order interactions in many real-world systems but remain difficult to interpret, limiting their deployment in high-stakes settings. We introduce CF-HyperGNNExplainer, a counterfactual explanation method for HGNNs that identifies the minimal structural changes required to alter a model's prediction. The method generates counterfactual hypergraphs using actionable edits limited to removing node-hyperedge incidences or deleting hyperedges, producing concise and structurally meaningful explanations. Extensive experiments on hypergraph benchmark datasets show that CF-HyperGNNExplainer generates valid and concise counterfactuals, highlighting the higher-order relations most critical to HGNN decisions.

2602.02544 2026-05-26 cs.LG cs.AI 版本更新

SPA-Cache: Singular Proxies for Adaptive Caching in Diffusion Language Models

SPA-Cache: 扩散语言模型中的自适应缓存奇异代理

Wenhao Sun, Rong-Cheng Tu, Yifu Ding, Zhao Jin, Jingyi Liao, Yongcheng Jing, Dacheng Tao

发表机构 * College of Computing(计算学院) Data Science, Nanyang Technological University, Singapore, Singapore(数据科学,南洋理工大学,新加坡,新加坡)

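AI summary: Diffusion language models are non-causal and so cannot use standard KV caching, which makes decoding expensive; SPA-Cache identifies update-critical tokens with a low-dimensional singular proxy and allocates the cache update budget adaptively, yielding up to 8x higher throughput than vanilla decoding and a 2-4x speedup over existing caching baselines.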

Comments Accepted by ICML 2026. The code repository is available at https://github.com/wenhao728/spa-cache

详情
AI中文摘要

尽管扩散语言模型(DLM)为自回归范式提供了一种灵活、任意顺序的替代方案,但其非因果特性排除了标准的KV缓存,迫使在每个解码步骤进行昂贵的隐藏状态重新计算。现有的DLM缓存方法通过选择性隐藏状态更新来降低这一成本;然而,它们仍然受限于(i)昂贵的逐令牌更新识别启发式方法和(ii)僵化的统一预算分配,未能考虑异构的隐藏状态动态。为了解决这些挑战,我们提出了SPA-Cache,它在DLM缓存中联合优化了更新识别和预算分配。首先,我们推导出一个低维奇异代理,能够在低维子空间中识别更新关键令牌,大幅降低更新识别的开销。其次,我们引入一种自适应策略,在不降低生成质量的情况下,为稳定层分配更少的更新。这些贡献共同显著提高了DLM的效率,相比原始解码实现了高达8倍的吞吐量提升,相比现有缓存基线实现了2-4倍的加速。

英文摘要

While Diffusion Language Models (DLMs) offer a flexible, arbitrary-order alternative to the autoregressive paradigm, their non-causal nature precludes standard KV caching, forcing costly hidden state recomputation at every decoding step. Existing DLM caching approaches reduce this cost by selective hidden state updates; however, they are still limited by (i) costly token-wise update identification heuristics and (ii) rigid, uniform budget allocation that fails to account for heterogeneous hidden state dynamics. To address these challenges, we present SPA-Cache, which jointly optimizes update identification and budget allocation in DLM caching. First, we derive a low-dimensional singular proxy that enables the identification of update-critical tokens in a low-dimensional subspace, substantially reducing the overhead of update identification. Second, we introduce an adaptive strategy that allocates fewer updates to stable layers without degrading generation quality. Together, these contributions significantly improve the efficiency of DLMs, yielding up to an $8\times$ throughput improvement over vanilla decoding and a $2$--$4\times$ speedup over existing caching baselines.

2602.02474 2026-05-26 cs.CL cs.AI cs.LG 版本更新

MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents

MemSkill:面向自进化智能体的可学习与进化记忆技能

Haozhen Zhang, Quanyu Long, Jianzhu Bao, Tao Feng, Weizhi Zhang, Haodong Yue, Wenya Wang

发表机构 * Nanyang Technological University(南洋理工大学) University of Illinois Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校) University of Illinois Chicago(伊利诺伊大学芝加哥分校) Tsinghua University(清华大学)

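AI summary: Proposes MemSkill, a framework that turns memory operations into learnable, evolvable skills: a controller selects skills, an executor produces memories, and a designer evolves the skill set, forming a closed loop that improves the task performance of LLM agents.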

Comments Code is available at https://github.com/ViktorAxelsen/MemSkill

详情
AI中文摘要

大多数大语言模型(LLM)智能体记忆系统依赖少量静态、手工设计的操作来提取记忆。这些固定程序硬编码了关于存储内容和如何修订记忆的人类先验知识,使其在多样化的交互模式下僵化,并在长历史记录上效率低下。为此,我们提出\textbf{MemSkill},将这些操作重新定义为可学习和可进化的记忆技能,即从交互轨迹中提取、整合和修剪信息的结构化可重用例程。受智能体技能设计哲学的启发,MemSkill采用一个\textit{控制器},学习选择少量相关技能,并与基于LLM的\textit{执行器}配对,生成技能引导的记忆。除了学习技能选择,MemSkill引入一个\textit{设计者},定期审查所选技能产生错误或不完整记忆的困难案例,并通过提出改进和新技能来进化技能集。共同地,MemSkill形成了一个闭环流程,改进了技能选择策略和技能集本身。在LoCoMo、LongMemEval、HotpotQA和ALFWorld上的实验表明,MemSkill在强基线上提高了任务性能,并在不同设置下具有良好的泛化能力。进一步分析揭示了技能如何进化,为LLM智能体更自适应、自进化的记忆管理提供了见解。

英文摘要

Most Large Language Model (LLM) agent memory systems rely on a small set of static, hand-designed operations for extracting memory. These fixed procedures hard-code human priors about what to store and how to revise memory, making them rigid under diverse interaction patterns and inefficient on long histories. To this end, we present \textbf{MemSkill}, which reframes these operations as learnable and evolvable memory skills, structured and reusable routines for extracting, consolidating, and pruning information from interaction traces. Inspired by the design philosophy of agent skills, MemSkill employs a \emph{controller} that learns to select a small set of relevant skills, paired with an LLM-based \emph{executor} that produces skill-guided memories. Beyond learning skill selection, MemSkill introduces a \emph{designer} that periodically reviews hard cases where selected skills yield incorrect or incomplete memories, and evolves the skill set by proposing refinements and new skills. Together, MemSkill forms a closed-loop procedure that improves both the skill-selection policy and the skill set itself. Experiments on LoCoMo, LongMemEval, HotpotQA, and ALFWorld demonstrate that MemSkill improves task performance over strong baselines and generalizes well across settings. Further analyses shed light on how skills evolve, offering insights toward more adaptive, self-evolving memory management for LLM agents.

2602.00545 2026-05-26 cs.LG 版本更新

Depth, Not Data: An Analysis of Hessian Spectral Bifurcation

深度,而非数据:Hessian谱分叉分析

Shenyang Deng, Boyao Liao, Zhuoli Ouyang, Tianyu Pang, Yaoqing Yang

发表机构 * Department of Computer Science(计算机科学系) Dartmouth College(达特茅斯学院) University of Birmingham(伯明翰大学)

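AI summary: By analyzing deep linear networks, proves that the bifurcated Hessian spectrum (dominant eigenvalues separated from bulk eigenvalues) can arise from network depth alone, even when the data covariance is perfectly balanced, and that the ratio of dominant to bulk eigenvalues scales linearly with depth.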

详情
AI中文摘要

Hessian矩阵的特征值分布在理解深度神经网络的优化景观中起着关键作用。先前的工作将广泛记录的“主体-尖峰”谱结构(其中少数主导特征值与大量较小特征值分离)归因于数据协方差矩阵的不平衡。在这项工作中,我们通过证明这种谱分叉可以纯粹由网络架构引起,而与数据不平衡无关,来挑战这一观点。具体来说,我们分析了一个深度线性网络设置,并证明即使数据协方差完全平衡,Hessian仍然表现出分叉特征值结构:一个主导簇和一个主体簇。至关重要的是,我们建立了主导特征值与主体特征值之比与网络深度呈线性关系。这表明谱间隙受到网络架构的强烈影响,而不仅仅是由数据分布决定。我们的结果表明,在设计深度网络的优化算法时,应同时考虑模型架构和数据特征。

英文摘要

The eigenvalue distribution of the Hessian matrix plays a crucial role in understanding the optimization landscape of deep neural networks. Prior work has attributed the well-documented "bulk-and-spike" spectral structure, where a few dominant eigenvalues are separated from a bulk of smaller ones, to the imbalance in the data covariance matrix. In this work, we challenge this view by demonstrating that such spectral bifurcation can arise purely from the network architecture, independent of data imbalance. Specifically, we analyze a deep linear network setup and prove that, even when the data covariance is perfectly balanced, the Hessian still exhibits a bifurcated eigenvalue structure: a dominant cluster and a bulk cluster. Crucially, we establish that the ratio between dominant and bulk eigenvalues scales linearly with the network depth. This reveals that the spectral gap is strongly affected by the network architecture rather than solely by data distribution. Our results suggest that both model architecture and data characteristics should be considered when designing optimization algorithms for deep networks.

2602.00511 2026-05-26 cs.LG math.OC 版本更新

Partition of Unity Neural Networks for Interpretable Classification with Explicit Class Regions

用于可解释分类的单元划分神经网络及显式类别区域

Akram Aldroubi

发表机构 * Department of Mathematics(数学系)

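AI summary: Proposes Partition of Unity Neural Networks (PUNN), which define class probabilities directly by learning nonnegative functions that sum to 1, with no softmax layer; proves a density result and reports experiments in which accuracy is maintained while shape-informed gates use far fewer parameters.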

Comments v2: substantially revised; under review at TMLR

详情
AI中文摘要

尽管神经网络分类器在经验上取得了成功,但它们仍然难以解释。在基于softmax的模型中,类别区域被隐式定义为logits之间不等式系统的解,这使得它们难以提取和可视化。我们引入了单元划分神经网络(PUNN),这是一种架构,其中类别概率直接来自学习到的单元划分,无需softmax层。PUNN构造了$k$个非负函数$h_1, \ldots, h_k$,满足$\sum_i h_i(x) = 1$,其中每个$h_i(x)$直接表示$P(\text{类别 } i \mid x)$。与softmax不同,其中类别区域通过logits之间的耦合不等式隐式定义,每个PUNN划分函数$h_i$直接定义类别$i$的概率作为$x$的独立函数。我们证明了PUNN在紧致域上的连续概率映射空间中是稠密的。定义划分的门函数$g_i$可以使用各种激活函数(sigmoid、高斯、bump)和参数化方法,从灵活的MLP到参数高效、形状感知的设计(球壳、椭球、球谐函数)。在合成数据、UCI基准和MNIST上的实验表明,基于MLP门的PUNN在精度上达到标准多层感知机的0.3-0.6%以内。当几何先验与数据结构匹配时,形状感知的门在参数减少多达300倍的情况下实现了相当的精度。这些结果表明,可解释性设计架构可以与黑盒模型竞争,同时提供透明的类别概率分配。

英文摘要

Despite their empirical success, neural network classifiers remain difficult to interpret. In softmax-based models, class regions are defined implicitly as solutions to systems of inequalities among logits, making them difficult to extract and visualize. We introduce Partition of Unity Neural Networks (PUNN), an architecture in which class probabilities arise directly from a learned partition of unity, without requiring a softmax layer. PUNN constructs $k$ nonnegative functions $h_1, \ldots, h_k$ satisfying $\sum_i h_i(x) = 1$, where each $h_i(x)$ directly represents $P(\text{class } i \mid x)$. Unlike softmax, where class regions are defined implicitly through coupled inequalities among logits, each PUNN partition function $h_i$ directly defines the probability of class $i$ as a standalone function of $x$. We prove that PUNN is dense in the space of continuous probability maps on compact domains. The gate functions $g_i$ that define the partition can use various activation functions (sigmoid, Gaussian, bump) and parameterizations ranging from flexible MLPs to parameter-efficient shape-informed designs (spherical shells, ellipsoids, spherical harmonics). Experiments on synthetic data, UCI benchmarks, and MNIST show that PUNN with MLP-based gates achieves accuracy within 0.3--0.6\% of standard multilayer perceptrons. When geometric priors match the data structure, shape-informed gates achieve comparable accuracy with up to 300$\times$ fewer parameters. These results demonstrate that interpretable-by-design architectures can be competitive with black-box models while providing transparent class probability assignments.
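
To make the partition-of-unity idea concrete, the sketch below turns $k-1$ gate values in $[0, 1]$ into $k$ nonnegative values that sum to 1. The abstract does not say how PUNN builds $h_i$ from the gates $g_i$, so the stick-breaking construction used here is an assumption of this sketch and may differ from the paper's; the sigmoid-of-affine gates, parameter values, and function names are likewise hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gate(x, weights, bias):
    """Gate g(x) in (0, 1): sigmoid of an affine function (a hypothetical choice)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def stick_breaking_partition(gates):
    """ASSUMED construction, not confirmed by the source.
    Maps k-1 gate values in [0, 1] to k nonnegative values summing to 1:
    h_1 = g_1, h_i = g_i * prod_{j<i}(1 - g_j), h_k = prod_{j<k}(1 - g_j)."""
    remaining = 1.0
    parts = []
    for g in gates:
        parts.append(remaining * g)   # mass assigned to this class
        remaining *= (1.0 - g)        # mass left for the later classes
    parts.append(remaining)           # last class takes what is left
    return parts

# Three classes from two gates with made-up parameters.
params = [((1.5, -0.5), 0.2), ((-1.0, 2.0), -0.3)]
x = (1.0, -2.0)
probs = stick_breaking_partition([gate(x, w, b) for w, b in params])
print(probs, sum(probs))
```

Each output is an explicit product of gate values, so the region where a given class dominates can be read off from the gates themselves rather than from inequalities among logits.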

2601.23164 2026-05-26 cs.LG 版本更新

Stochastic Linear Bandits with Parameter Noise

带有参数噪声的随机线性赌博机

Daniel Ezer, Alon Peled-Cohen, Yishay Mansour

发表机构 * Tel Aviv University(特拉维夫大学) Google Research(谷歌研究)

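AI summary: Studies the stochastic linear bandits model with parameter noise and shows that a simple explore-exploit algorithm achieves a regret bound matching the lower bound up to logarithmic factors, revealing an optimal regret order that differs from the classic additive noise model.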

Comments 8 pages

详情
AI中文摘要

我们研究了带有参数噪声模型的随机线性赌博机,其中动作$a$的奖励为$a^\top θ$,$θ$是独立同分布的样本。我们给出了一个遗憾上界$\widetilde{O} (\sqrt{d T \log (K/δ) σ^2_{\max}})$,其中$T$是时间范围,动作集大小为$K$,维度为$d$,$σ^2_{\max}$是任何动作奖励的最大方差。我们进一步给出了一个下界$\widetilde{Ω} (d \sqrt{T σ^2_{\max}})$,当$\log (K) \approx d$时,该下界是紧的(忽略对数因子)。对于更具体的动作集,即$p \leq 2$的$\ell_p$单位球及其对偶范数$q$,我们证明了极小极大遗憾为$\widetilde{Θ} (\sqrt{dT σ^2_q})$,其中$σ^2_q$是一个与方差相关的量,且始终不超过4。这与经典加性噪声模型中此类动作集可达到的极小极大遗憾(阶为$d \sqrt{T}$)形成对比。令人惊讶的是,我们表明这个最优(忽略对数因子)遗憾界可以通过一个非常简单的探索-利用算法实现。

英文摘要

We study the stochastic linear bandits with parameter noise model, in which the reward of action $a$ is $a^\top θ$ where $θ$ is sampled i.i.d. We show a regret upper bound of $\widetilde{O} (\sqrt{d T \log (K/δ) σ^2_{\max}})$ for a horizon $T$, general action set of size $K$ of dimension $d$, and where $σ^2_{\max}$ is the maximal variance of the reward for any action. We further provide a lower bound of $\widetilde{Ω} (d \sqrt{T σ^2_{\max}})$ which is tight (up to logarithmic factors) whenever $\log (K) \approx d$. For more specific action sets, $\ell_p$ unit balls with $p \leq 2$ and dual norm $q$, we show that the minimax regret is $\widetilde{Θ} (\sqrt{dT σ^2_q})$, where $σ^2_q$ is a variance-dependent quantity that is always at most $4$. This is in contrast to the minimax regret attainable for such sets in the classic additive noise model, where the regret is of order $d \sqrt{T}$. Surprisingly, we show that this optimal (up to logarithmic factors) regret bound is attainable using a very simple explore-exploit algorithm.

2601.22925 2026-05-26 cs.IR cs.AI cs.LG 版本更新

BEAR: Towards Beam-Search-Aware Optimization for Recommendation with Large Language Models

BEAR:面向大语言模型推荐中束搜索感知的优化

Weiqin Yang, Bohao Wang, Zhenxiang Xu, Jiawei Chen, Shengjia Zhang, Jingbang Chen, Canghong Jin, Can Wang

发表机构 * Zhejiang University(浙江大学) The Chinese University of Hong Kong, Shenzhen(香港中文大学(深圳)) Hangzhou City University(杭州市城市大学)

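AI summary: To address the inconsistency between supervised fine-tuning and beam-search inference, proposes the BEAR regularizer, which avoids premature pruning by requiring every token of a positive item to rank in the top B at each decoding step, significantly improving recommendation performance.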

Comments Accepted by SIGIR 2026

详情
AI中文摘要

近年来,利用大语言模型(LLM)进行推荐的研究迅速增长。这些方法通常采用监督微调(SFT)使LLM适应推荐场景,并在推理时使用束搜索高效检索前B个推荐项。然而,我们发现了关键的训练-推理不一致性:虽然SFT优化正例的整体概率,但即使这些项具有高整体概率,也不能保证它们会被束搜索检索到。由于贪心剪枝机制,束搜索可能会在正例的前缀概率不足时过早丢弃它。为了解决这种不一致性,我们提出了BEAR(束搜索感知正则化),一种新的微调目标,在训练中显式考虑束搜索行为。BEAR不直接模拟每个训练实例的束搜索(计算代价过高),而是强制执行一个宽松的必要条件:正例中的每个token在每个解码步骤中必须排在前B个候选token中。该目标有效降低了错误剪枝的风险,同时与标准SFT相比仅增加可忽略的计算开销。在四个真实世界数据集上的大量实验表明,BEAR显著优于强基线。代码可在https://github.com/Tiny-Snow/BEAR-SIGIR-2026获取。

英文摘要

Recent years have seen a rapid surge in research leveraging Large Language Models (LLMs) for recommendation. These methods typically employ supervised fine-tuning (SFT) to adapt LLMs to recommendation scenarios, and utilize beam search during inference to efficiently retrieve $B$ top-ranked recommended items. However, we identify a critical training-inference inconsistency: while SFT optimizes the overall probability of positive items, it does not guarantee that such items will be retrieved by beam search even if they possess high overall probabilities. Due to the greedy pruning mechanism, beam search can prematurely discard a positive item once its prefix probability is insufficient. To address this inconsistency, we propose BEAR (Beam-SEarch-Aware Regularization), a novel fine-tuning objective that explicitly accounts for beam search behavior during training. Rather than directly simulating beam search for each instance during training, which is computationally prohibitive, BEAR enforces a relaxed necessary condition: each token in a positive item must rank within the top-$B$ candidate tokens at each decoding step. This objective effectively mitigates the risk of incorrect pruning while incurring negligible computational overhead compared to standard SFT. Extensive experiments across four real-world datasets demonstrate that BEAR significantly outperforms strong baselines. Code is available at https://github.com/Tiny-Snow/BEAR-SIGIR-2026 .
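
The relaxed necessary condition described above can be checked directly. The sketch below tests whether every token of a positive item ranks within the top-$B$ candidates at its decoding step, and reports the first step where it does not. It assumes the per-step scores were computed with the item's own preceding tokens as the prefix (teacher forcing); the toy scores and function names are hypothetical. It only checks the condition and does not implement the BEAR training objective, whose exact form the abstract does not give.

```python
def token_rank(step_scores, token_id):
    """1-indexed rank of `token_id` among all candidates (1 = highest score).
    Ties count against the target token, which is the conservative choice."""
    target = step_scores[token_id]
    return 1 + sum(1 for i, s in enumerate(step_scores)
                   if i != token_id and s >= target)

def satisfies_top_b(per_step_scores, item_tokens, beam_width):
    """True if every token of the positive item ranks within the top-B at its step."""
    return all(token_rank(scores, tok) <= beam_width
               for scores, tok in zip(per_step_scores, item_tokens))

def first_violation(per_step_scores, item_tokens, beam_width):
    """Index of the first decoding step where the item's token falls outside
    the top-B, or None if the condition holds at every step."""
    for step, (scores, tok) in enumerate(zip(per_step_scores, item_tokens)):
        if token_rank(scores, tok) > beam_width:
            return step
    return None

# Toy vocabulary of 5 tokens; the positive item is the token sequence [3, 4, 0].
scores = [
    [0.1, 2.0, 0.3, 1.5, 0.0],  # step 0: token 3 ranks 2nd
    [0.9, 0.2, 0.8, 0.1, 0.7],  # step 1: token 4 ranks 3rd
    [3.0, 0.1, 0.2, 0.3, 0.4],  # step 2: token 0 ranks 1st
]
item = [3, 4, 0]
for beam_width in (2, 3):
    print(beam_width,
          satisfies_top_b(scores, item, beam_width),
          first_violation(scores, item, beam_width))
```

With a beam width of 2 the item is lost at step 1 regardless of its overall sequence probability, since at least two continuations of the same prefix outrank it; the condition is necessary for survival under beam search but not sufficient, because other beams also compete for the $B$ slots.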

2601.21924 2026-05-26 cs.LG stat.ML 版本更新

One-Step Bellman Alignment Enables Provably Efficient Transfer in Online RL

一步贝尔曼对齐实现在线强化学习中的可证明高效迁移

Elynn Chen, Enpei Zhang, Jinhang Chai, Yujun Yan

发表机构 * Department of Technology, Operations, & Statistics, New York University(纽约大学技术、运营与统计系) Department of Operations Research & Financial Engineering, Princeton University(普林斯顿大学运筹学与金融工程系) Department of Computer Science, Dartmouth College(达特茅斯学院计算机科学系)

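AI summary: Identifies one-step Bellman alignment as the right abstraction for transfer in online reinforcement learning and realizes it through re-weighted targeting (RWT), an operator-level correction; under RKHS function approximation, establishes regret bounds that scale with the complexity of the task shift.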

详情
AI中文摘要

我们研究在情节马尔可夫决策过程中的在线迁移强化学习,其中在学习目标任务时,来自相关源任务的经验是可用的。一个基本困难在于任务相似性通常根据奖励或转移来定义,而在线RL算法操作在贝尔曼回归目标上。因此,简单地重用源贝尔曼更新会引入系统性偏差并使遗憾保证失效。我们识别出一阶贝尔曼对齐作为在线RL中迁移的正确抽象,并提出重加权目标(RWT),这是一种算子级修正,通过测度变换重新定位延续值并补偿转移不匹配。RWT将任务不匹配简化为固定的一步修正,并实现了源数据的统计上合理的重用。这种对齐产生了一个两阶段RWT Q学习框架,将方差减少与偏差修正分离。在RKHS函数逼近下,我们建立的遗憾界随任务迁移的复杂度而非目标MDP的复杂度变化。我们进一步证明了所需的密度比允许一个具有有限样本保证的构造性RKHS估计器,并经验验证了对估计和错误指定比率的鲁棒性。在表格和神经网络设置中的实证结果均显示,与单任务学习和朴素池化相比,持续改进,突出了贝尔曼对齐作为在线RL中模型无关的迁移原理。

英文摘要

We study online transfer reinforcement learning (RL) in episodic Markov decision processes, where experience from related source tasks is available during learning on a target task. A fundamental difficulty is that task similarity is typically defined in terms of rewards or transitions, whereas online RL algorithms operate on Bellman regression targets. As a result, naively reusing source Bellman updates introduces systematic bias and invalidates regret guarantees. We identify one-step Bellman alignment as the correct abstraction for transfer in online RL and propose re-weighted targeting (RWT), an operator-level correction that retargets continuation values and compensates for transition mismatch via a change of measure. RWT reduces task mismatch to a fixed one-step correction and enables statistically sound reuse of source data. This alignment yields a two-stage RWT $Q$-learning framework that separates variance reduction from bias correction. Under RKHS function approximation, we establish regret bounds that scale with the complexity of the task shift rather than the target MDP. We further show the required density ratios admit a constructive RKHS estimator with finite-sample guarantees, and empirically validate robustness to estimated and mis-specified ratios. Empirical results in both tabular and neural network settings demonstrate consistent improvements over single-task learning and naïve pooling, highlighting Bellman alignment as a model-agnostic transfer principle for online RL.

2601.21601 2026-05-26 cs.LG cs.AI 版本更新

Dynamics Reveals Structure: Challenging the Linear Propagation Assumption

动力学揭示结构:挑战线性传播假设

Hoyeon Chang, Bálint Mucsányi, Seong Joon Oh

发表机构 * University of Tübingen(图宾根大学)

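AI summary: Uses relation algebra to study the geometric limits of the Linear Propagation Assumption in neural networks, proving that it is feasible for the involutive operations (negation, converse) but faces a fundamental obstruction for composition that forces the feature map to collapse, and pointing to a common root for knowledge-editing failures, the reversal curse, and multi-hop reasoning failures.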

详情
AI中文摘要

神经网络通过一阶参数更新进行自适应,但尚不清楚这种更新是否保持逻辑一致性。我们研究了线性传播假设(LPA)的几何极限,该假设认为局部更新能够连贯地传播到逻辑结论。为了形式化这一点,我们采用关系代数,研究关系的三种核心运算:否定翻转真值、逆交换参数顺序、组合链接关系。对于否定和逆,我们证明保证与方向无关的一阶传播需要一种张量分解,将实体对上下文与关系内容分离。然而,对于组合,我们识别出一个根本性障碍。我们证明组合可归结为合取,并证明任何在线性特征上良好定义的合取必须是双线性的。由于双线性与否定不兼容,这迫使特征映射崩溃。这些结果表明,知识编辑失败、反转诅咒和多跳推理可能源于LPA固有的共同结构限制。

英文摘要

Neural networks adapt through first-order parameter updates, yet it remains unclear whether such updates preserve logical coherence. We investigate the geometric limits of the Linear Propagation Assumption (LPA), the premise that local updates coherently propagate to logical consequences. To formalize this, we adopt relation algebra and study three core operations on relations: negation flips truth values, converse swaps argument order, and composition chains relations. For negation and converse, we prove that guaranteeing direction-agnostic first-order propagation necessitates a tensor factorization separating entity-pair context from relation content. However, for composition, we identify a fundamental obstruction. We show that composition reduces to conjunction, and prove that any conjunction well-defined on linear features must be bilinear. Since bilinearity is incompatible with negation, this forces the feature map to collapse. These results suggest that failures in knowledge editing, the reversal curse, and multi-hop reasoning may stem from common structural limitations inherent to the LPA.

2601.20738 2026-05-26 cs.LG cs.DC eess.SP math.OC stat.ML 版本更新

SA-PEF: Step-Ahead Partial Error Feedback for Efficient Federated Learning

SA-PEF:用于高效联邦学习的前瞻部分误差反馈

Dawit Kiros Redie, Reza Arablouei, Stefan Werner

发表机构 * Department of Electronic Systems, Norwegian University of Science and Technology (NTNU)(挪威科学技术大学电子系统系) Department of Information and Communications Engineering, Aalto University(阿尔托大学信息与通信工程系) CSIRO’s Data61(CSIRO数据61)

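AI summary: Proposes SA-PEF, which combines step-ahead correction with partial error feedback to accelerate federated learning convergence under non-IID data and partial client participation, with a proven convergence rate matching that of Fed-SGD up to constant factors.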

详情
Journal ref
Transactions on Machine Learning Research, 2026
AI中文摘要

带误差反馈(EF)的有偏梯度压缩减少了联邦学习(FL)中的通信,但在非IID数据下,残差误差可能缓慢衰减,导致早期轮次中的梯度不匹配和进度停滞。我们提出前瞻部分误差反馈(SA-PEF),它集成了前瞻(SA)校正与部分误差反馈(PEF)。当前瞻系数$α=0$时,SA-PEF恢复为EF;当$α=1$时,恢复为前瞻EF(SAEF)。对于非凸目标和$δ$-收缩压缩器,我们建立了二阶矩界和残差递归,保证了在异构数据和部分客户端参与下收敛到平稳点。得到的速率与标准非凸Fed-SGD保证在常数因子内匹配,在固定内步长下实现$O((η\,η_0TR)^{-1})$收敛到方差/异质性下界。我们的分析揭示了一个由前瞻控制的残差收缩$ρ_r$,解释了早期训练阶段观察到的加速。为了平衡SAEF的快速预热与EF的长期稳定性,我们选择接近理论预测最优的$α$。跨多种架构和数据集的实验表明,SA-PEF始终比EF更快达到目标精度。

英文摘要

Biased gradient compression with error feedback (EF) reduces communication in federated learning (FL), but under non-IID data, the residual error can decay slowly, causing gradient mismatch and stalled progress in the early rounds. We propose step-ahead partial error feedback (SA-PEF), which integrates step-ahead (SA) correction with partial error feedback (PEF). SA-PEF recovers EF when the step-ahead coefficient $α=0$ and step-ahead EF (SAEF) when $α=1$. For non-convex objectives and $δ$-contractive compressors, we establish a second-moment bound and a residual recursion that guarantee convergence to stationarity under heterogeneous data and partial client participation. The resulting rates match standard non-convex Fed-SGD guarantees up to constant factors, achieving $O((η\,η_0TR)^{-1})$ convergence to a variance/heterogeneity floor with a fixed inner step size. Our analysis reveals a step-ahead-controlled residual contraction $ρ_r$ that explains the observed acceleration in the early training phase. To balance SAEF's rapid warm-up with EF's long-term stability, we select $α$ near its theory-predicted optimum. Experiments across diverse architectures and datasets show that SA-PEF consistently reaches target accuracy faster than EF.
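For readers unfamiliar with error feedback, the sketch below shows the plain EF mechanism that SA-PEF reduces to at $α=0$ according to the abstract. It is not the SA-PEF update itself, whose step-ahead and partial-feedback rules are not specified here. The top-k compressor, the gradient values, and the function names are illustrative assumptions.

```python
# Plain error feedback (EF) with a top-k compressor, client side only.
# Not the SA-PEF update: the step-ahead correction is deliberately omitted.

def top_k(vector, k):
    """Keep the k largest-magnitude entries and zero the rest."""
    order = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vector)]

def ef_step(gradient, residual, k):
    """Compress gradient plus carried residual; keep what was not sent."""
    corrected = [g + e for g, e in zip(gradient, residual)]
    message = top_k(corrected, k)
    new_residual = [c - m for c, m in zip(corrected, message)]
    return message, new_residual

gradient = [1.0, 0.4, 0.3]   # hypothetical, held constant across rounds
residual = [0.0, 0.0, 0.0]
sent_total = [0.0, 0.0, 0.0]
rounds = 3
for _ in range(rounds):
    message, residual = ef_step(gradient, residual, k=1)
    sent_total = [s + m for s, m in zip(sent_total, message)]
```

Everything that compression drops stays in the residual, so after any number of rounds the transmitted total plus the residual equals the sum of the raw gradients. The small coordinates are only sent once their accumulated residual becomes the largest entry, which is the slow residual decay the abstract refers to.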

2601.15544 2026-05-26 cs.LG cs.AI 版本更新

RDumb++: Drift-Aware Continual Test-Time Adaptation

RDumb++:漂移感知的持续测试时自适应

Himanshu Mishra

发表机构 * Department of Computer Science(计算机科学系) University of British Columbia(不列颠哥伦比亚大学)

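AI summary: Addresses performance collapse in continual test-time adaptation under rapid distribution change or long-horizon drift by proposing RDumb++, which combines entropy-based and KL-divergence drift detection with adaptive reset strategies and achieves about 3% absolute accuracy gains on the CCC benchmark.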

详情
AI中文摘要

持续测试时自适应(CTTA)旨在仅使用传入的无标签数据流在部署期间更新预训练模型。尽管先前的方法如Tent、EATA等在短期演化偏移下提供了有意义的改进,但当测试分布快速变化或时间跨度极长时,它们表现不佳。CCC基准测试体现了这一挑战,模型在包含750万样本且不断变化损坏类型和严重程度的数据流上运行。我们提出RDumb++,它是RDumb的合理扩展,引入了两种漂移检测机制,即基于熵的漂移评分和KL散度漂移评分,以及自适应重置策略。这些机制使模型能够检测累积的自适应何时变得有害,并在预测崩溃发生前恢复。在包含三种速度和三种种子的CCC-medium(九次运行,每次包含一百万样本)上,RDumb++始终优于RDumb,在整个数据流中实现约3%的绝对准确率提升,同时保持稳定的自适应。关于漂移阈值和重置强度的消融实验进一步表明,漂移感知重置对于防止崩溃和实现可靠的长期CTTA至关重要。

英文摘要

Continual Test-Time Adaptation (CTTA) seeks to update a pretrained model during deployment using only the incoming, unlabeled data stream. Although prior approaches such as Tent and EATA provide meaningful improvements under short evolving shifts, they struggle when the test distribution changes rapidly or over extremely long horizons. This challenge is exemplified by the CCC benchmark, where models operate over streams of 7.5M samples with continually changing corruption types and severities. We propose RDumb++, a principled extension of RDumb that introduces two drift-detection mechanisms, namely entropy-based drift scoring and KL-divergence drift scoring, together with adaptive reset strategies. These mechanisms allow the model to detect when accumulated adaptation becomes harmful and to recover before prediction collapse occurs. Across CCC-medium with three speeds and three seeds (nine runs, each containing one million samples), RDumb++ consistently surpasses RDumb, yielding approximately 3% absolute accuracy gains while maintaining stable adaptation throughout the entire stream. Ablation experiments on drift thresholds and reset strengths further show that drift-aware resetting is essential for preventing collapse and achieving reliable long-horizon CTTA.
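A rough sketch of what entropy-based and KL-divergence drift scores could look like is given below. The abstract does not define the scores, so everything here is an assumption: the use of mean prediction entropy, the KL divergence between the batch-averaged prediction and a fixed reference distribution, the thresholds, and the name `should_reset` are all hypothetical, not the RDumb++ implementation.

```python
import math

def entropy(probs):
    """Shannon entropy of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)

def mean_distribution(batch):
    """Average the per-sample predictions of a batch, class by class."""
    n = len(batch)
    return [sum(row[j] for row in batch) / n for j in range(len(batch[0]))]

def should_reset(batch, reference, entropy_threshold, kl_threshold):
    """Hypothetical rule: reset when either drift score exceeds its threshold."""
    mean_entropy = sum(entropy(row) for row in batch) / len(batch)
    kl_score = kl_divergence(mean_distribution(batch), reference)
    return mean_entropy > entropy_threshold or kl_score > kl_threshold

reference = [1 / 3, 1 / 3, 1 / 3]  # assumed class prior of a healthy model
healthy_batch = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]
collapsed_batch = [[0.98, 0.01, 0.01]] * 3  # every sample mapped to class 0

healthy_flag = should_reset(healthy_batch, reference, 1.0, 0.5)
collapsed_flag = should_reset(collapsed_batch, reference, 1.0, 0.5)
```

In this toy setup the collapsed batch is caught by the KL score, because its averaged prediction concentrates on one class, while the healthy batch passes both checks. A model that had been flagged would then be reset toward its pretrained weights.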

2601.14340 2026-05-26 cs.CR cs.LG 版本更新

Turn-Based Structural Triggers: Prompt-Free Backdoors in Multi-Turn LLMs

基于回合的结构性触发器:多轮LLM中的无提示后门

Yiyang Lu, Jinwen He, Yue Zhao, Kai Chen, Ruigang Liang, Cheng Hong, Yingjun Zhang

发表机构 * School of Cyber Security, University of Chinese Academy of Sciences(中国科学院大学网络安全学院) Institute of Information Engineering, Chinese Academy of Sciences(中国科学院信息工程研究所) Institute of Software, Chinese Academy of Sciences(中国科学院软件研究所) Ant Group(蚂蚁集团)

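AI summary: Proposes TST, a backdoor attack that uses dialogue structure (the turn index) as the trigger and activates without any trigger in the user input, achieving a high attack success rate while bypassing prompt-centric defenses.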

详情
AI中文摘要

大型语言模型(LLM)被广泛集成到交互式系统中,如对话代理和面向任务的助手。这一日益增长的生态系统也带来了供应链风险,攻击者可以分发被污染的模型,降低下游可靠性和用户信任。现有的后门攻击和防御大多以提示为中心,关注用户可见的触发器,而忽视了多轮对话中的结构信号。我们提出了基于回合的结构性触发器(TST),这是一种从对话结构激活的后门攻击,使用回合索引作为触发器,且独立于用户输入。这造成了一种结构条件性的可靠性风险:带有后门的模型可以通过以提示为中心的检查和标准效用评估,但在选定的对话位置执行攻击者指定的行为,而用户输入中没有任何触发器。在四个开源LLM家族中,TST实现了99.52%的平均攻击成功率,同时基本保持了非触发效用,并且在未见过的对话数据集和代表性防御中仍然有效。这些结果揭示了对话结构是一个被忽视的攻击面,并激励了超越提示检查的结构感知多轮审计。

英文摘要

Large Language Models (LLMs) are widely integrated into interactive systems such as dialogue agents and task-oriented assistants. This growing ecosystem also raises supply-chain risks, where adversaries can distribute poisoned models that degrade downstream reliability and user trust. Existing backdoor attacks and defenses are largely prompt-centric, focusing on user-visible triggers while overlooking structural signals in multi-turn conversations. We propose Turn-based Structural Trigger (TST), a backdoor attack that activates from dialogue structure, using the turn index as the trigger and remaining independent of user inputs. This creates a structure-conditioned reliability risk: a backdoored model can pass prompt-centric checks and standard utility evaluations, yet execute attacker-specified behaviors at selected dialogue positions without any trigger in the user input. Across four open-source LLM families, TST achieves a 99.52% average ASR while largely preserving non-triggered utility, and remains effective across unseen dialogue datasets and representative defenses. These results reveal dialogue structure as an overlooked attack surface and motivate structure-aware multi-turn auditing beyond prompt inspection.

2601.10494 2026-05-26 stat.ML cs.LG 版本更新

CROCS: A Two-Stage Clustering Framework for Behaviour-Centric Consumer Segmentation with Smart Meter Data

CROCS:一种基于智能电表数据的以行为为中心的消费者细分的两阶段聚类框架

Luke W. Yerbury, Ricardo J. G. B. Campello, G. C. Livingston, Mark Goldsworthy, Lachlan O'Neil

发表机构 * Ausgrid(澳大利亚电网公司)

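AI summary: Proposes CROCS, a two-stage clustering framework that clusters each consumer's daily load profiles independently and then compares the resulting sets with the Weighted Sum of Minimum Distances, yielding robust and scalable behaviour-centric consumer segmentation.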

详情
AI中文摘要

随着电网运营商面临可再生能源整合和电气化推广带来的不确定性增加,需求侧管理(DSM)——特别是需求响应(DR)——作为一种平衡现代电力系统的成本效益机制引起了广泛关注。全球持续部署的智能电表提供了前所未有的消费数据量,使得基于实际用电行为的消费者细分成为可能,有望为设计更有效的DSM和DR计划提供信息。然而,现有的基于聚类的细分方法未能充分反映消费者的行为多样性,通常依赖于严格的时间对齐,并且在存在异常值、缺失数据或大规模部署时表现不佳。为了解决这些挑战,我们提出了一种新颖的两阶段聚类框架——优化消费者细分的聚类表示(CROCS)。在第一阶段,每个消费者的每日负荷曲线被独立聚类,形成代表性负荷集(RLS),提供其典型日间消费行为的紧凑摘要。在第二阶段,使用加权最小距离和(WSMD)对消费者进行聚类,这是一种新颖的集合间度量,通过考虑这些行为的普遍性和相似性来比较RLS。最后,对WSMD诱导图进行社区检测,揭示体现定义消费者群体的共享日间行为的高阶原型,从而增强所得聚类的可解释性。在合成和真实澳大利亚智能电表数据集上的大量实验表明,CROCS能够捕捉消费者内部变异性,发现同步和异步行为相似性,对异常值和缺失数据保持鲁棒性,并通过自然并行化实现高效扩展。这些结果...

英文摘要

With grid operators confronting rising uncertainty from renewable integration and a broader push toward electrification, Demand-Side Management (DSM) -- particularly Demand Response (DR) -- has attracted significant attention as a cost-effective mechanism for balancing modern electricity systems. Unprecedented volumes of consumption data from a continuing global deployment of smart meters enable consumer segmentation based on real usage behaviours, promising to inform the design of more effective DSM and DR programs. However, existing clustering-based segmentation methods insufficiently reflect the behavioural diversity of consumers, often relying on rigid temporal alignment, and faltering in the presence of anomalies, missing data, or large-scale deployments. To address these challenges, we propose a novel two-stage clustering framework -- Clustered Representations Optimising Consumer Segmentation (CROCS). In the first stage, each consumer's daily load profiles are clustered independently to form a Representative Load Set (RLS), providing a compact summary of their typical diurnal consumption behaviours. In the second stage, consumers are clustered using the Weighted Sum of Minimum Distances (WSMD), a novel set-to-set measure that compares RLSs by accounting for both the prevalence and similarity of those behaviours. Finally, community detection on the WSMD-induced graph reveals higher-order prototypes that embody the shared diurnal behaviours defining consumer groups, enhancing the interpretability of the resulting clusters. Extensive experiments on both synthetic and real Australian smart meter datasets demonstrate that CROCS captures intra-consumer variability, uncovers both synchronous and asynchronous behavioural similarities, and remains robust to anomalies and missing data, while scaling efficiently through natural parallelisation. These results...

2601.08205 2026-05-26 cs.CV cs.LG 版本更新

FUME: Fused Unified Multi-Gas Emission Network for Livestock Rumen Acidosis Detection

FUME: 用于牲畜瘤胃酸中毒检测的融合统一多气体排放网络

Taminul Islam, Toqi Tahamid Sarker, Mohamed Embaby, Khaled R Ahmed, Amer AbuGhazaleh

发表机构 * Southern Illinois University, Carbondale(南方伊利诺伊大学,卡本达勒分校) University of California, Davis(加州大学戴维斯分校)

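AI summary: Proposes FUME, a network that uses dual-gas (CO2 and CH4) optical imaging with a lightweight dual-stream architecture and channel attention fusion to achieve high-accuracy segmentation and classification for rumen acidosis detection.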

Comments 10 pages, 5 figures

详情
Journal ref
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops, 2026, pp. 510-519
AI中文摘要

瘤胃酸中毒是奶牛中常见的代谢紊乱,导致重大经济损失和动物福利问题。当前的诊断方法依赖于侵入性pH测量,限制了持续监测的可扩展性。我们提出了FUME(融合统一多气体排放网络),这是首个在体外条件下通过双气体光学成像进行瘤胃酸中毒检测的深度学习方法。我们的方法利用红外相机捕获的互补二氧化碳(CO2)和甲烷(CH4)排放模式,将瘤胃健康状态分类为健康、过渡和酸中毒。FUME采用轻量双流架构,包含权重共享编码器、模态特定自注意力和通道注意力融合,联合优化气体羽流分割和奶牛健康分类。我们引入了首个双气体OGI数据集,包含8967个标注帧,覆盖六个pH水平,并带有像素级分割掩码。实验表明,FUME在仅使用1.28M参数和1.97G MACs的情况下,实现了80.99%的mIoU和98.82%的分类准确率——在分割质量上优于最先进方法,且计算成本降低10倍。消融研究揭示,CO2提供主要的判别信号,而双任务学习对于最优性能至关重要。我们的工作确立了基于气体排放的牲畜健康监测的可行性,为实用的体外酸中毒检测系统铺平了道路。代码可在 https://github.com/taminulislam/fume 获取。

英文摘要

Ruminal acidosis is a prevalent metabolic disorder in dairy cattle causing significant economic losses and animal welfare concerns. Current diagnostic methods rely on invasive pH measurement, limiting scalability for continuous monitoring. We present FUME (Fused Unified Multi-gas Emission Network), the first deep learning approach for rumen acidosis detection from dual-gas optical imaging under in vitro conditions. Our method leverages complementary carbon dioxide (CO2) and methane (CH4) emission patterns captured by infrared cameras to classify rumen health into Healthy, Transitional, and Acidotic states. FUME employs a lightweight dual-stream architecture with weight-shared encoders, modality-specific self-attention, and channel attention fusion, jointly optimizing gas plume segmentation and classification of dairy cattle health. We introduce the first dual-gas OGI dataset comprising 8,967 annotated frames across six pH levels with pixel-level segmentation masks. Experiments demonstrate that FUME achieves 80.99% mIoU and 98.82% classification accuracy while using only 1.28M parameters and 1.97G MACs--outperforming state-of-the-art methods in segmentation quality with 10x lower computational cost. Ablation studies reveal that CO2 provides the primary discriminative signal and dual-task learning is essential for optimal performance. Our work establishes the feasibility of gas emission-based livestock health monitoring, paving the way for practical, in vitro acidosis detection systems. Codes are available at https://github.com/taminulislam/fume.

2601.06870 2026-05-26 cs.LG cs.AI 版本更新

QASA: Quality-Aware Semantic Augmentation for Robust Multimodal Sentiment Analysis

QASA: 面向鲁棒多模态情感分析的质量感知语义增强

Jiazhang Liang, Jianheng Dai, Miaosen Luo, Menghua Jiang, Sijie Mai

发表机构 * School of Computer Science, South China Normal University(华南师范大学计算机学院)

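AI summary: Proposes the QASA framework, which generates augmented visual and auditory samples with diffusion models and assigns training weights through a decoupled quality-aware scoring module, addressing the scarcity of high-quality data and improving the robustness and generalization of multimodal sentiment analysis.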

Comments 11 pages, 4 figures

详情
AI中文摘要

多模态大语言模型在多模态情感分析中展现出强大的语义表示能力。然而,由于高质量训练数据的稀缺,它们学习稳定且可泛化的多模态特征的能力受到限制。为了解决这一问题,我们提出了QASA(质量感知语义增强),该方法使用扩散模型生成增强的视觉和听觉样本,从而扩大训练数据集并支持多模态学习。生成的样本质量可能参差不齐,并可能出现跨模态不一致。为此,我们引入了一个解耦的质量感知评分模块,根据每个增强样本的可靠性分配训练权重。这种方法减少了低质量数据的影响,有助于更稳定和鲁棒的模型训练。该框架结合了扩散模型的生成能力和多模态大模型的语义推理能力,提供了一种无需人工标注的自动数据增强策略,同时在有限高质量数据下提高了泛化性和鲁棒性。在CH-SIMS数据集上的实验表明,QASA在五类准确率(Acc5)和二类准确率(Acc2)上分别相对提升了18.0%和5.9%,并且在CMU-MOSI和MUStARD基准测试上也优于现有方法。

英文摘要

Multimodal large language models have demonstrated strong ability in capturing semantic representations for multimodal sentiment analysis. Their capacity to learn stable and generalizable multimodal features is limited, however, by the scarcity of high-quality training data. To address this, we propose QASA (Quality-Aware Semantic Augmentation), which uses diffusion models to generate augmented visual and auditory samples, thereby enlarging the training dataset and supporting multimodal learning. The generated samples can vary in quality and may exhibit cross-modal inconsistencies. To manage this, we introduce a decoupled quality-aware scoring module that assigns training weights based on the reliability of each augmented sample. This approach reduces the influence of low-quality data and contributes to more stable and robust model training. The framework combines the generative capabilities of diffusion models with the semantic reasoning of multimodal large models, providing an automated data augmentation strategy that does not require human annotation while improving generalization and robustness under limited high-quality data. Experiments on the CH-SIMS dataset show that QASA yields a relative increase of 18.0% and 5.9% in five-class accuracy (Acc5) and binary accuracy (Acc2), respectively, and it also outperforms existing methods on the CMU-MOSI and MUStARD benchmarks.

2512.23956 2026-05-26 stat.ML cs.LG 版本更新

Implicit geometric regularization in flow matching via density weighted Stein operators

通过密度加权Stein算子的流匹配中的隐式几何正则化

Shinto Eguchi

发表机构 * The Institute of Statistical Mathematics(统计数学研究所)

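AI summary: Proposes γ-Flow Matching (γ-FM), which uses a dynamic density-weighting strategy to implicitly regularize the regression geometry in high-dimensional spaces, improving vector field smoothness and sampling efficiency.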

Comments Revised version

详情
AI中文摘要

流匹配(FM)已成为连续归一化流的一个强大范式,但标准FM隐式地在整个环境空间上进行未加权的$L^2$回归。在高维空间中,这导致了一个根本性的低效:绝大多数积分区域由低密度的“空洞”区域组成,其中目标速度场通常是混沌或定义不良的。在本文中,我们提出了γ-流匹配(γ-FM),一种密度加权变体,它将回归几何与底层概率流对齐。虽然密度加权是可取的,但朴素实现需要评估难以处理的目标密度。我们通过引入一种动态密度加权策略来规避这一点,该策略直接从训练粒子估计目标密度。这种方法使我们能够动态降低空洞区域中的回归损失,而不损害FM的无模拟特性。理论上,我们证明了γ-FM在赋予γ-Stein度量的统计流形上最小化传输成本。谱分析进一步表明,这种几何结构引入了隐式Sobolev正则化,有效地抑制了空洞区域中的高频振荡。实验上,γ-FM显著改善了高维潜在数据集上的向量场平滑性和采样效率,同时展示了对异常值的内在鲁棒性。

英文摘要

Flow Matching (FM) has emerged as a powerful paradigm for continuous normalizing flows, yet standard FM implicitly performs an unweighted $L^2$ regression over the entire ambient space. In high dimensions, this leads to a fundamental inefficiency: the vast majority of the integration domain consists of low-density "void" regions where the target velocity fields are often chaotic or ill-defined. In this paper, we propose $γ$-Flow Matching ($γ$-FM), a density-weighted variant that aligns the regression geometry with the underlying probability flow. While density weighting is desirable, naive implementations would require evaluating the intractable target density. We circumvent this by introducing a Dynamic Density-Weighting strategy that estimates the target density directly from training particles. This approach allows us to dynamically downweight the regression loss in void regions without compromising the simulation-free nature of FM. Theoretically, we establish that $γ$-FM minimizes the transport cost on a statistical manifold endowed with the $γ$-Stein metric. Spectral analysis further suggests that this geometry induces an implicit Sobolev regularization, effectively damping high-frequency oscillations in void regions. Empirically, $γ$-FM significantly improves vector field smoothness and sampling efficiency on high-dimensional latent datasets, while demonstrating intrinsic robustness to outliers.

2512.20063 2026-05-26 cs.LG 版本更新

PairFlow: Closed-Form Source-Target Coupling for Few-Step Generation in Discrete Flow Models

PairFlow: 离散流模型中用于少步生成的闭式源-目标耦合

Mingue Park, Jisung Hwang, Seungwoo Yoo, Kyeongmin Yeo, Minhyuk Sung

发表机构 * KAIST(韩国科学技术院)

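AI summary: Proposes PairFlow, a lightweight preprocessing method that builds paired source-target samples via closed-form inversion, enabling few-step sampling in discrete flow models without a pretrained teacher and matching or even surpassing two-stage finetuning.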

Comments ICLR 2026

详情
AI中文摘要

我们介绍了$\texttt{PairFlow}$,一种用于训练离散流模型(DFM)的轻量级预处理步骤,无需预训练教师即可实现少步采样。DFM最近作为一类新的离散数据生成模型出现,性能强劲。然而,由于其迭代性质,采样速度慢。现有的加速方法主要依赖微调,这引入了大量额外的训练开销。$\texttt{PairFlow}$通过轻量级预处理步骤解决了这个问题。受ReFlow及其在DFM上的扩展启发,我们从源分布和目标分布的耦合样本训练DFM,无需任何预训练教师。我们方法的核心是DFM的闭式反演,这使得能够高效构建配对的源-目标样本。尽管成本极低,仅占完整模型训练所需计算量的1.7%,但$\texttt{PairFlow}$匹配甚至超越了涉及微调的两阶段训练的性能。此外,使用我们的框架训练的模型为后续蒸馏提供了更强的基模型,在微调后进一步加速。在分子数据以及二值和RGB图像上的实验证明了我们方法的广泛适用性和有效性。

英文摘要

We introduce $\texttt{PairFlow}$, a lightweight preprocessing step for training Discrete Flow Models (DFMs) to achieve few-step sampling without requiring a pretrained teacher. DFMs have recently emerged as a new class of generative models for discrete data, offering strong performance. However, they suffer from slow sampling due to their iterative nature. Existing acceleration methods largely depend on finetuning, which introduces substantial additional training overhead. $\texttt{PairFlow}$ addresses this issue with a lightweight preprocessing step. Inspired by ReFlow and its extension to DFMs, we train DFMs from coupled samples of source and target distributions, without requiring any pretrained teacher. At the core of our approach is a closed-form inversion for DFMs, which allows efficient construction of paired source-target samples. Despite its extremely low cost, taking only up to 1.7% of the compute needed for full model training, $\texttt{PairFlow}$ matches or even surpasses the performance of two-stage training involving finetuning. Furthermore, models trained with our framework provide stronger base models for subsequent distillation, yielding further acceleration after finetuning. Experiments on molecular data as well as binary and RGB images demonstrate the broad applicability and effectiveness of our approach.

2512.13323 2026-05-26 cs.AI cs.LG 版本更新

Error-Driven Prompt Optimization for Arithmetic Reasoning

基于错误驱动的算术推理提示优化

Árpád Pándy, Róbert Lakatos, András Hajdu

发表机构 * Department of Data Science & Visualization, Faculty of Informatics, University of Debrecen(数据科学与可视化系,信息学院,德布勒森大学)

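AI summary: Proposes an error-driven prompt optimization framework that iteratively refines prompt rules by clustering erroneous predictions, raising a small on-premises language model to 70.8% accuracy on arithmetic reasoning and surpassing GPT-3.5 Turbo.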

详情
Journal ref
IEEE Access, vol. 14, pp. 62570-62583, 2026
AI中文摘要

人工智能的最新进展激发了人们对工业代理的兴趣,这些代理能够在表格数据工作流中支持金融和医疗等受监管领域的分析师。此类系统的关键能力是对结构化数据执行准确的算术运算,同时确保敏感信息永远不会离开安全的本地环境。在此,我们引入了一种用于算术推理的错误驱动优化框架,该框架增强了代码生成代理(CGA),特别应用于本地小型语言模型(SLM)。通过对领先的SLM(Qwen3 4B)进行系统评估,我们发现虽然基础模型在算术任务中表现出基本局限性,但我们提出的错误驱动方法通过聚类错误预测来迭代优化提示规则,显著提升了性能,将模型准确率提高到70.8%。我们的结果表明,开发可靠、可解释且可工业部署的AI助手不仅可以通过昂贵的微调实现,还可以通过系统的、错误驱动的提示优化来实现,从而使小型模型以符合隐私要求的方式超越大型语言模型(GPT-3.5 Turbo)。

英文摘要

Recent advancements in artificial intelligence have sparked interest in industrial agents capable of supporting analysts in regulated sectors, such as finance and healthcare, within tabular data workflows. A key capability for such systems is performing accurate arithmetic operations on structured data while ensuring sensitive information never leaves secure, on-premises environments. Here, we introduce an error-driven optimization framework for arithmetic reasoning that enhances a Code Generation Agent (CGA), specifically applied to on-premises small language models (SLMs). Through a systematic evaluation of a leading SLM (Qwen3 4B), we find that while the base model exhibits fundamental limitations in arithmetic tasks, our proposed error-driven method, which clusters erroneous predictions to refine prompt-rules iteratively, dramatically improves performance, elevating the model's accuracy to 70.8%. Our results suggest that developing reliable, interpretable, and industrially deployable AI assistants can be achieved not only through costly fine-tuning but also via systematic, error-driven prompt optimization, enabling small models to surpass larger language models (GPT-3.5 Turbo) in a privacy-compliant manner.

2512.06393 2026-05-26 cs.AI cs.CL cs.LG cs.LO 版本更新

Conflict-Aware Fusion: Mitigating Logic Inertia in Large Language Models via Structured Cognitive Priors

冲突感知融合:通过结构化认知先验缓解大语言模型中的逻辑惯性

Qiming Bao, Xiaoxuan Fu, Michael Witbrock

发表机构 * Xtracta & Strong AI Lab, University of Auckland(Xtracta与强人工智能实验室,奥克兰大学) School of Humanities, China University of Political Science and Law(人文学院,中国政法大学) Strong AI Lab, University of Auckland(强人工智能实验室,奥克兰大学)

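AI summary: Addresses the brittleness of large language models under structural perturbations of rule-based systems by proposing the Conflict-Aware Fusion training pipeline, which uses a verification-before-deduction structural prior and symbolic reasoning rewards to saturate multiple stress tests.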

详情
AI中文摘要

大型语言模型(LLM)在许多推理基准上取得了高准确率,但在基于规则系统的结构扰动下仍然脆弱。我们引入了一个包含四个压力测试的诊断框架——冗余与必要规则删除、矛盾规则注入、逻辑保持重写和多定律堆叠——并用它来揭示逻辑惯性:生成式LLM(Qwen2/3、TinyLlama、GPT-4o、Gemma-3-4B-IT)和仅编码器BERT基线在矛盾前提下沿学习到的演绎轨迹持续推理的倾向。这种崩溃是剧烈的:未经处理的基线在基础任务上的准确率从1.00下降到矛盾注入时的0.00(实例级精确匹配),而GPT-4o仅解决了56.0%的矛盾案例。我们提出冲突感知融合,这是一个四阶段训练流程,将验证-演绎作为学习到的结构先验强制执行:(i)SFT建立验证前缀;(ii)DPO锐化矛盾停止决策边界;(iii)逻辑不变正则化(LIRE)通过对称KL惩罚逻辑等价规则公式之间的差异;(iv)来自验证反馈的强化学习(RLVF)使用符号前向链接引擎作为确定性预言奖励,联合优化不变性和敏感性。该流程在1.5B和8B骨干网络上均使所有四个主要压力测试达到饱和。我们进一步验证了第二阶段扩展,用Lean 4内核替换命题预言机,在分层187个问题的Lean翻译样本中,对105个经典可推导(T)问题达到99.0%的内核一致性(整体71.7%,涵盖两种极性),为形式化验证的RL训练提供了可靠的升级路径。代码和基准:https://github.com/14H034160212/lemo

英文摘要

Large language models (LLMs) achieve high accuracy on many reasoning benchmarks but remain brittle under structural perturbations of rule-based systems. We introduce a diagnostic framework with four stress tests -- redundant vs. essential rule deletion, contradictory-rule injection, logic-preserving rewrites, and multi-law stacking -- and use it to expose Logic Inertia: the tendency of generative LLMs (Qwen2/3, TinyLlama, GPT-4o, Gemma-3-4B-IT) and the encoder-only BERT baseline to persist along learned deductive trajectories under inconsistent premises. The collapse is sharp: untreated baselines fall from accuracy 1.00 on the base task to 0.00 on contradiction injection (instance-level exact match), and GPT-4o resolves only 56.0% of contradiction cases. We propose Conflict-Aware Fusion, a four-stage training pipeline that enforces verification-before-deduction as a learned structural prior: (i) SFT establishes the verification preamble; (ii) DPO sharpens the halt-on-contradiction decision boundary; (iii) Logical Invariance REgularisation (LIRE) penalises divergence between logically equivalent rule formulations via symmetric KL; (iv) Reinforcement Learning from Verification Feedback (RLVF) uses a symbolic forward-chaining engine as a deterministic oracle reward, jointly optimising invariance and sensitivity. The pipeline saturates all four primary stress tests for both 1.5B and 8B backbones. We further validate a Phase 2 extension that replaces the propositional oracle with a Lean 4 kernel, attaining 99.0% kernel agreement on the 105 classically-derivable (T) questions within a stratified 187-question Lean-translated sample (overall 71.7% across both polarities), providing a sound upgrade path to formally verified RL training. Code and benchmark: https://github.com/14H034160212/lemo
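The abstract describes a symbolic forward-chaining engine used as a deterministic oracle, with the model expected to halt on contradictory premises. The sketch below is a minimal propositional forward chainer in that spirit, not the authors' engine: the rule encoding, the `~` prefix for negation, and the three output labels are illustrative assumptions.

```python
# Minimal propositional forward chaining with contradiction detection.
# A rule is a pair (premises, conclusion); "~p" denotes the negation of "p".

def negate(literal):
    """Return the complementary literal."""
    return literal[1:] if literal.startswith("~") else "~" + literal

def forward_chain(facts, rules):
    """Saturate the fact set; report False as soon as p and ~p both hold."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in derived and all(p in derived for p in premises):
                derived.add(conclusion)
                changed = True
        if any(negate(fact) in derived for fact in derived):
            return derived, False
    return derived, True

def oracle_label(query, facts, rules):
    """Hypothetical oracle: verify consistency first, then answer the query."""
    derived, consistent = forward_chain(facts, rules)
    if not consistent:
        return "CONTRADICTION"
    return "TRUE" if query in derived else "UNKNOWN"

rules = [(("bird",), "flies"), (("flies",), "has_wings")]
contradictory_rules = rules + [(("bird",), "~flies")]

base_answer = oracle_label("has_wings", {"bird"}, rules)
injected_answer = oracle_label("has_wings", {"bird"}, contradictory_rules)
```

A reward built on such an oracle could compare the model's final label with `oracle_label`, so that continuing a deduction through an inconsistent rule set is scored as wrong even when the deduced chain looks locally valid.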

2512.05791 2026-05-26 physics.med-ph cs.CV cs.LG math.PR 版本更新

Fast and Robust Diffusion Posterior Sampling for MR Image Reconstruction Using the Preconditioned Unadjusted Langevin Algorithm

使用预条件未调整朗之万算法实现快速且鲁棒的MR图像重建扩散后验采样

Moritz Blumenthal, Tina Holliber, Jonathan I. Tamir, Martin Uecker

发表机构 * Institute of Biomedical Imaging, Graz University of Technology, Graz, Austria Department of Radiology, Boston Children's Hospital, Harvard Medical School, Boston, USA Chandra Family Department of Electrical Engineering, University of Texas at Austin, USA Department of Diagnostic Medicine, Dell Medical School, University of Texas at Austin, USA

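AI summary: Addresses slow diffusion posterior sampling and parameter tuning in MR image reconstruction with an exact-likelihood method based on the preconditioned unadjusted Langevin algorithm, achieving fast-converging, robust sampling without parameter tuning.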

Comments Submitted to Magnetic Resonance in Medicine

详情
AI中文摘要

目的:结合未调整朗之万算法(ULA)与扩散模型,可以从高度欠采样的k空间数据生成高质量MRI重建结果并附带不确定性估计。然而,扩散后验采样(DPS)或似然退火等采样方法存在重建时间长和需要参数调优的问题。本文旨在开发一种具有快速收敛性的鲁棒采样算法。 理论与方法:在用于后验采样的反向扩散过程中,精确似然与所有噪声尺度下的扩散先验相乘。为克服收敛缓慢的问题,采用了预条件技术。该方法在fastMRI数据上训练,并在健康志愿者的回顾性欠采样脑部数据上测试。 结果:对于笛卡尔和非笛卡尔加速MRI的后验采样,新方法在重建速度和样本质量上均优于退火采样和DPS。 结论:所提出的预条件精确似然方法能够在各种MRI重建任务中实现快速可靠的后验采样,无需参数调优。

英文摘要

Purpose: The Unadjusted Langevin Algorithm (ULA) in combination with diffusion models can generate high quality MRI reconstructions with uncertainty estimation from highly undersampled k-space data. However, sampling methods such as diffusion posterior sampling (DPS) or likelihood annealing suffer from long reconstruction times and the need for parameter tuning. The purpose of this work is to develop a robust sampling algorithm with fast convergence. Theory and Methods: In the reverse diffusion process used for sampling the posterior, the exact likelihood is multiplied with the diffused prior at all noise scales. To overcome the issue of slow convergence, preconditioning is used. The method is trained on fastMRI data and tested on retrospectively undersampled brain data of a healthy volunteer. Results: For posterior sampling in Cartesian and non-Cartesian accelerated MRI the new approach outperforms annealed sampling and DPS in terms of reconstruction speed and sample quality. Conclusion: The proposed exact likelihood with preconditioning enables rapid and reliable posterior sampling across various MRI reconstruction tasks without the need for parameter tuning.
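To make the role of preconditioning concrete, the sketch below runs a preconditioned unadjusted Langevin chain on a toy two-dimensional Gaussian whose coordinates have very different scales. It is only an illustration of the generic update $x \leftarrow x + \tau P \nabla \log p(x) + \sqrt{2\tau P}\,\xi$ with a diagonal preconditioner $P$. The target, step size, and preconditioner are assumptions, and nothing here models the MRI likelihood or the diffusion prior used in the paper.

```python
import math
import random

def grad_log_density(x, mean, std):
    """Gradient of log N(mean, diag(std^2)) evaluated at x."""
    return [-(xi - m) / (s * s) for xi, m, s in zip(x, mean, std)]

def pula_chain(mean, std, precond, step, n_steps, seed=0):
    """Preconditioned ULA with a diagonal preconditioner `precond`."""
    rng = random.Random(seed)
    x = [0.0] * len(mean)
    samples = []
    for _ in range(n_steps):
        grad = grad_log_density(x, mean, std)
        x = [
            xi + step * p * g + math.sqrt(2.0 * step * p) * rng.gauss(0.0, 1.0)
            for xi, p, g in zip(x, precond, grad)
        ]
        samples.append(list(x))
    return samples

# Toy target: one wide and one narrow coordinate (hypothetical values).
mean, std = [3.0, -1.0], [2.0, 0.1]
precond = [s * s for s in std]  # ideal diagonal preconditioner for this target

samples = pula_chain(mean, std, precond, step=0.1, n_steps=20000)
kept = samples[2000:]  # discard burn-in
estimate = [sum(s[j] for s in kept) / len(kept) for j in range(2)]
```

With this choice of `precond` both coordinates contract toward the mean at the same rate per step. Without it, a step size small enough to keep the narrow coordinate stable would make the wide coordinate mix very slowly, which is the slow-convergence issue that preconditioning targets.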

2512.05765 2026-05-26 cs.AI cs.LG 版本更新

AGI Requires a Coordination Layer on Top of Pattern Repositories

AGI 需要在模式存储库之上建立协调层

Edward Y. Chang

发表机构 * Department of Computer Science, Stanford University(斯坦福大学计算机科学系)

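AI summary: Argues that large language models (LLMs) are not a dead end for AGI but lack a System-2 coordination layer; semantic anchoring and causal verification are realized through UCCT and RCA, and the MACI multi-agent coordination stack is built on them, with experiments showing that adaptive control outperforms static prompting.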

Comments 15 pages, 5 figures, 7 tables

详情
AI中文摘要

在本文中,我们认为那些将大型语言模型(LLM)视为AGI死胡同的有影响力的批评误判了瓶颈:它们混淆了海洋与渔网。模式存储库是必要的系统1基础;缺失的组件是一个系统2协调层,该层能够招募相关模式、验证其使用、保持状态并控制收敛。我们将常常被混淆的两种控制用途分开。由UCCT(统一上下文控制理论)形式化的语义锚定,通过由有效支持(rho_d)、表征不匹配(d_r)和自适应锚定预算(gamma log k)控制的相变,将标签和任务意图绑定到学习到的模式区域。由递归因果审计(RCA)实现的追踪-答案验证,测试最终因果判断是否在其自身推理轨迹的压力下得到支持。我们将这些思想转化为MACI,一个多智能体协调栈,通过诱饵(PID调节辩论)、过滤(苏格拉底式和因果审计)和持久性(事务性内存)整合多样性和控制。在因果判断和谄媚-偏执权衡上的实证验证表明,静态提示失败的地方,自适应控制成功。通过将常见反对意见重新定义为可测试的协调失败,我们认为通往AGI的道路是通过LLM,而不是绕过它们。能力不是协调。

英文摘要

In this paper we argue that influential critiques dismissing Large Language Models (LLMs) as a dead end for AGI misidentify the bottleneck: they confuse the ocean with the net. Pattern repositories are the necessary System-1 substrate; the missing component is a System-2 coordination layer that recruits relevant patterns, verifies their use, preserves state, and governs convergence. We separate two uses of control that are often conflated. Semantic anchoring, formalized by UCCT (Unified Contextual Control Theory), binds labels and task intent to learned pattern regions through a phase transition governed by effective support (rho_d), representational mismatch (d_r), and an adaptive anchoring budget (gamma log k). Trace-answer verification, implemented by Recursive Causal Audit (RCA), tests whether a final causal judgment is warranted by its own reasoning trace under pressure. We translate these ideas into MACI, a multi-agent coordination stack that integrates diversity and control via baiting (PID-modulated debate), filtering (Socratic and causal audit), and persistence (transactional memory). Empirical validation on causal judgment and the sycophancy-paranoia trade-off demonstrates that static prompting fails where adaptive control succeeds. By reframing common objections as testable coordination failures, we argue that the path to AGI runs through LLMs, not around them. Capability is not coordination.

2512.00125 2026-05-26 cs.CV cs.LG 版本更新

Hybrid Synthetic Data Generation with Domain Randomization Enables Zero-Shot Vision-Based Part Inspection Under Extreme Class Imbalance

混合合成数据生成与域随机化实现极端类别不平衡下基于视觉的零样本零件检测

Ruo-Syuan Mei, Sixian Jia, Guangze Li, Soo Yeon Lee, Brian Musser, William Keller, Sreten Zakula, Jorge Arinez, Chenhui Shao

发表机构 * Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI 48109, USA Materials & Manufacturing Systems Research Lab, General Motors, Warren, MI 48092, USA

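AI summary: Proposes a hybrid synthetic data generation framework combining simulation-based rendering, domain randomization, and real background compositing; YOLOv8n and MobileNetV3-small models trained only on synthetic data achieve zero-shot industrial part inspection under extreme class imbalance, with detection mAP@0.5 of 0.995, 96% classification accuracy, and 90.1% balanced accuracy.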

Comments Submitted to the NAMRC 54

详情
AI中文摘要

机器学习,特别是深度学习,正在改变工业质量检测。然而,训练鲁棒的机器学习模型通常需要大量高质量标注数据,这在制造业中获取成本高昂、耗时且劳动密集。此外,缺陷样本本身稀少,导致严重的类别不平衡,降低模型性能。这些数据约束阻碍了基于机器学习的质量检测方法在实际生产环境中的广泛采用。合成数据生成(SDG)通过高效、经济且可扩展的方式创建大规模、平衡且完全标注的数据集,提供了一种有前景的解决方案。本文提出一种混合SDG框架,集成了基于仿真的渲染、域随机化和真实背景合成,无需人工标注即可实现基于计算机视觉的工业零件检测的零样本学习。该SDG流水线通过改变零件几何、光照和表面属性,并将合成零件合成到真实图像背景上,在一小时内生成12,960张标注图像。利用YOLOv8n骨干网络进行目标检测、MobileNetV3-small进行质量分类的两阶段架构,仅使用合成数据训练,并在300个真实工业零件上评估。所提方法在检测上达到mAP@0.5为0.995,分类准确率96%,平衡准确率90.1%。与基于少量真实数据的基线方法相比,性能显著提升。在严重类别不平衡下,所提基于SDG的方法达到90-91%的平衡准确率,而基线仅达到50%准确率。这些结果表明,所提方法能够为真实制造应用实现免标注、可扩展且鲁棒的质量检测。

英文摘要

Machine learning, particularly deep learning, is transforming industrial quality inspection. Yet, training robust machine learning models typically requires large volumes of high-quality labeled data, which are expensive, time-consuming, and labor-intensive to obtain in manufacturing. Moreover, defective samples are intrinsically rare, leading to severe class imbalance that degrades model performance. These data constraints hinder the widespread adoption of machine learning-based quality inspection methods in real production environments. Synthetic data generation (SDG) offers a promising solution by enabling the creation of large, balanced, and fully annotated datasets in an efficient, cost-effective, and scalable manner. This paper presents a hybrid SDG framework that integrates simulation-based rendering, domain randomization, and real background compositing to enable zero-shot learning for computer vision-based industrial part inspection without manual annotation. The SDG pipeline generates 12,960 labeled images in one hour by varying part geometry, lighting, and surface properties, and then compositing synthetic parts onto real image backgrounds. A two-stage architecture utilizing a YOLOv8n backbone for object detection and MobileNetV3-small for quality classification is trained exclusively on synthetic data and evaluated on 300 real industrial parts. The proposed approach achieves an mAP@0.5 of 0.995 for detection, 96% classification accuracy, and 90.1% balanced accuracy. Comparative evaluation against few-shot real-data baseline approaches demonstrates significant improvement. The proposed SDG-based approach achieves 90-91% balanced accuracy under severe class imbalance, while the baselines reach only 50% accuracy. These results demonstrate that the proposed method enables annotation-free, scalable, and robust quality inspection for real-world manufacturing applications.
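The abstract reports both classification accuracy and balanced accuracy. The short sketch below, on made-up labels rather than the paper's data, shows why the two can diverge under class imbalance: balanced accuracy averages per-class recall, so a classifier that never predicts the rare class scores high on plain accuracy but only 0.5 on balanced accuracy in a two-class problem.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for label in sorted(set(y_true)):
        indices = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(y_pred[i] == label for i in indices)
        recalls.append(hits / len(indices))
    return sum(recalls) / len(recalls)

# Hypothetical inspection set: 95 good parts and 5 defective parts.
y_true = ["good"] * 95 + ["defect"] * 5
always_good = ["good"] * 100  # a classifier that ignores the defect class

plain = accuracy(y_true, always_good)
balanced = balanced_accuracy(y_true, always_good)
```

Here `plain` is 0.95 while `balanced` is 0.5, which is why balanced accuracy is the more informative number when defective samples are rare.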

2511.15407 2026-05-26 cs.AI cs.CV cs.LG 版本更新

IPR-1: Interactive Physical Reasoner

IPR-1:交互式物理推理器

Mingyu Zhang, Lifeng Zhuo, Tianxi Tan, Guocan Xie, Xian Nie, Yan Li, Renjie Zhao, Zizhu He, Ziyu Wang, Jiting Cai, Yong-Lu Li

发表机构 * CARNEGIE MELLON UNIVERSITY(卡内基梅隆大学)

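AI summary: Proposes IPR, which uses world-model rollouts to score and reinforce a VLM policy and introduces PhysCode, a physics-centric action code; on a benchmark of 1,000+ heterogeneous games it reasons robustly about physics, surpasses GPT-5 overall, and transfers zero-shot to unseen games.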

Comments Accepted by CVPR 2026. 13 pages of main text and 20 pages of appendices. Project page: https://mybearyzhang.github.io/ipr-1

详情
AI中文摘要

人类通过观察、与环境交互以及内化物理和因果关系来学习。在这里,我们旨在探究一个智能体是否能够通过交互类似地获得类人推理能力,并随着更多经验不断改进。为此,我们引入了一个包含1000+异构游戏的Game-to-Unseen (G2U)基准,这些游戏展现出显著的视觉领域差异。现有方法(包括VLM和世界模型)难以捕捉底层物理和因果关系,因为它们不关注核心机制且过度拟合视觉细节。VLM/VLA智能体能够推理,但在交互设置中缺乏前瞻性,而世界模型进行想象但模仿视觉模式而非分析物理和因果关系。因此,我们提出IPR(交互式物理推理器),利用世界模型滚动来评分和强化VLM的策略,并引入PhysCode,一种以物理为中心的动作代码,将语义意图与动力学对齐,为预测和推理提供共享动作空间。在1000+游戏上预训练后,我们的IPR在从原始直觉到目标驱动推理的各个层次上表现稳健,甚至在总体上超越了GPT-5。我们发现,性能随着训练游戏和交互步骤的增加而提升,并且模型还能零样本迁移到未见过的游戏。这些结果支持以物理为中心的交互作为稳步提升物理推理的路径。更多演示和项目详情请见https://mybearyzhang.github.io/ipr-1。

英文摘要

Humans learn by observing, interacting with environments, and internalizing physics and causality. Here, we aim to ask whether an agent can similarly acquire human-like reasoning from interaction and keep improving with more experience. To study this, we introduce a Game-to-Unseen (G2U) benchmark of 1,000+ heterogeneous games that exhibit significant visual domain gaps. Existing approaches, including VLMs and world models, struggle to capture underlying physics and causality since they are not focused on core mechanisms and overfit to visual details. VLM/VLA agents reason but lack look-ahead in interactive settings, while world models imagine but imitate visual patterns rather than analyze physics and causality. We therefore propose IPR (Interactive Physical Reasoner), using world-model rollouts to score and reinforce a VLM's policy, and introduce PhysCode, a physics-centric action code aligning semantic intent with dynamics to provide a shared action space for prediction and reasoning. Pretrained on 1,000+ games, our IPR performs robustly on levels from primitive intuition to goal-driven reasoning, and even surpasses GPT-5 overall. We find that performance improves with more training games and interaction steps, and that the model also zero-shot transfers to unseen games. These results support physics-centric interaction as a path to steadily improving physical reasoning. Further demos and project details can be found at https://mybearyzhang.github.io/ipr-1.

2511.09048 2026-05-26 cs.LG 版本更新

Guaranteeing Conservation of Integrals with Projection in Physics-Informed Neural Networks

在物理信息神经网络中通过投影保证积分守恒

Anthony Baez, Wang Zhang, Ziwen Ma, Lam Nguyen, Subhro Das, Luca Daniel

发表机构 * MIT(麻省理工学院) IBM(国际商业机器公司)

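AI summary: Proposes a projection method, derived by solving constrained nonlinear optimization problems, that guarantees conservation of linear and quadratic integral quantities in physics-informed neural networks, separately or jointly, and reduces conservation error by three to four orders of magnitude.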

详情
AI中文摘要

我们提出了一种新颖的投影方法,能够保证物理信息神经网络(PINNs)中积分量的守恒。尽管PINNs用于强制执行偏微分方程(PDEs)结构的软约束在训练过程中提供了必要的灵活性,但也允许发现的解违反物理定律。为了解决这个问题,我们引入了一种投影方法,分别和联合保证线性和二次积分的守恒。我们通过求解约束非线性优化问题推导了投影公式,并发现经过投影修改的PINN(称为PINN-Proj)相比软约束,将这些量的守恒误差降低了三到四个数量级,并略微减少了PDE解误差。我们还发现,投影通过改善损失景观的条件性来改善收敛。我们的方法有望成为一个通用框架,只要存在可解的方案,就能保证PINN中任何积分量的守恒。

英文摘要

We propose a novel projection method that guarantees the conservation of integral quantities in Physics-Informed Neural Networks (PINNs). While the soft constraint that PINNs use to enforce the structure of partial differential equations (PDEs) enables necessary flexibility during training, it also permits the discovered solution to violate physical laws. To address this, we introduce a projection method that guarantees the conservation of the linear and quadratic integrals, both separately and jointly. We derived the projection formulae by solving constrained non-linear optimization problems and found that our PINN modified with the projection, which we call PINN-Proj, reduced the error in the conservation of these quantities by three to four orders of magnitude compared to the soft constraint and marginally reduced the PDE solution error. We also found evidence that the projection improved convergence through improving the conditioning of the loss landscape. Our method holds promise as a general framework to guarantee the conservation of any integral quantity in a PINN if a tractable solution exists.
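The abstract does not state the projection formulae, so the following is only a sketch of the general idea under stated assumptions: the network output is sampled on a grid as `u`, the integral is approximated with quadrature weights `w`, and "closest" means closest in the `w`-weighted L2 norm. Under those assumptions the linear constraint is met by a uniform shift and the quadratic constraint by a rescaling. The function names are hypothetical, and the joint case treated in the paper is not attempted here.

```python
import math


def project_linear(u, w, target):
    """Closest u' in the w-weighted L2 norm with sum(w * u') == target (a uniform shift)."""
    current = sum(wi * ui for wi, ui in zip(w, u))
    shift = (target - current) / sum(w)
    return [ui + shift for ui in u]


def project_quadratic(u, w, target):
    """Closest u' in the w-weighted L2 norm with sum(w * u'**2) == target (a rescaling).

    Assumes target >= 0 and that u is not identically zero.
    """
    current = sum(wi * ui * ui for wi, ui in zip(w, u))
    scale = math.sqrt(target / current)
    return [ui * scale for ui in u]


# Hypothetical grid values and equal quadrature weights on the unit interval.
u = [1.0, 2.0, 3.0, 4.0]
w = [0.25, 0.25, 0.25, 0.25]

u_lin = project_linear(u, w, target=3.0)
u_quad = project_quadratic(u, w, target=10.0)
```

After either call the discrete integral matches its target up to floating-point rounding, which is the sense in which a projection guarantees conservation where a soft penalty only encourages it.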

2511.03963 2026-05-26 stat.ML cs.LG 版本更新

Robust inference using density-powered Stein operators

使用密度驱动的Stein算子进行稳健推断

Shinto Eguchi

发表机构 * The Institute of Statistical Mathematics(统计数学研究所)

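AI summary: Proposes the γ-Stein operator, derived from the γ-divergence, which uses density-power weighting for robust inference on unnormalized probability models, with applications to robust goodness-of-fit testing and Bayesian posterior approximation.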

Comments Revised version

详情
AI中文摘要

我们引入了Stein算子的密度幂加权变体,称为γ-Stein算子。这是一类从γ-散度导出的新型算子,旨在为未归一化概率模型构建稳健的推断方法。该算子的构造(通过模型密度的正幂γ进行加权)固有地降低了异常值的影响,提供了一种稳健性的原则性机制。应用该算子产生了得分匹配的稳健推广,保留了不依赖于模型归一化常数的关键性质。我们将此框架扩展到两个关键应用:用于稳健拟合优度检验的γ-核化Stein散度,以及用于稳健贝叶斯后验近似的γ-Stein变分梯度下降。在受污染的高斯和四次势模型上的实验结果表明,我们的方法在稳健性和统计效率上显著优于标准基线。

英文摘要

We introduce a density-power weighted variant of the Stein operator, called the $γ$-Stein operator. This is a novel class of operators derived from the $γ$-divergence, designed to build robust inference methods for unnormalized probability models. The operator's construction (weighting by the model density raised to a positive power $γ$) inherently down-weights the influence of outliers, providing a principled mechanism for robustness. Applying this operator yields a robust generalization of score matching that retains the crucial property of being independent of the model's normalizing constant. We extend this framework to develop two key applications: the $γ$-kernelized Stein discrepancy for robust goodness-of-fit testing, and $γ$-Stein variational gradient descent for robust Bayesian posterior approximation. Empirical results on contaminated Gaussian and quartic potential models show our methods significantly outperform standard baselines in both robustness and statistical efficiency.

2510.20954 2026-05-26 stat.ML cs.LG eess.SP 版本更新

A Spectral Framework for Graph Neural Operators: Convergence Guarantees and Tradeoffs

图神经算子的谱框架:收敛保证与权衡

Roxanne Holden, Luana Ruiz

发表机构 * Applied Mathematics and Statistics(应用数学与统计学) Johns Hopkins University(约翰霍普金斯大学)

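AI summary: Proposes a unified spectral framework for comparing the convergence rates and tradeoffs of graph neural operators under three graphon assumptions: no regularity, global Lipschitz continuity, and piecewise-Lipschitz continuity.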

详情
AI中文摘要

图极限(Graphons)作为图序列的极限,为分析图神经算子的渐近行为提供了算子理论框架。采样图到图极限的谱收敛诱导了相应神经算子的收敛,从而实现了图神经网络(GNN)的可迁移性分析。本文开发了一个统一的谱框架,将不同假设下(包括无正则性、全局Lipschitz连续和分段Lipschitz连续)的收敛结果整合在一起。该框架将这些结果置于公共算子环境中,便于直接比较其假设、收敛率和权衡。我们进一步在合成图和真实世界图上展示了这些率的经验紧致性。

英文摘要

Graphons, as limits of graph sequences, provide an operator-theoretic framework for analyzing the asymptotic behavior of graph neural operators. Spectral convergence of sampled graphs to graphons induces convergence of the corresponding neural operators, enabling transferability analyses of graph neural networks (GNNs). This paper develops a unified spectral framework that brings together convergence results under different assumptions on the underlying graphon, including no regularity, global Lipschitz continuity, and piecewise-Lipschitz continuity. The framework places these results in a common operator setting, enabling direct comparison of their assumptions, convergence rates, and tradeoffs. We further illustrate the empirical tightness of these rates on synthetic and real-world graphs.

2510.19328 2026-05-26 cs.LG 版本更新

Clustered Calibration: Representation-Aware Probability Calibration via Learned Subpopulations

聚类校准:通过学习子群体实现表示感知的概率校准

Tomer Lavi, Bracha Shapira, Nadav Rappoport

发表机构 * Faculty of Computer and Information Science(计算机与信息科学学院)

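AI summary: Proposes Clustered Calibration, which identifies sub-populations by clustering in learned feature spaces and fits a soft mixture of cluster-specific calibrators under hierarchical shrinkage; on tabular, image, and text benchmarks it improves or matches strong global calibrators in negative log-likelihood and Brier score.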

详情
AI中文摘要

在高风险领域如临床决策支持、自动驾驶和金融风险评估中,确保预测概率与观察频率一致至关重要。现有的校准方法通常应用单一全局变换或依赖对预测置信度的后验分箱,限制了其利用子群体间异质可靠性的能力。我们提出聚类校准,一种表示感知框架,通过在学习的特征空间(如覆盖向量、SHAP值、CNN激活、Transformer嵌入)中聚类识别子群体,并在分层收缩下拟合向全局映射的簇特定参数化校准器的软混合。这种设计在保持全局稳定性的同时实现了上下文特定的校准。在六个表格数据集以及额外的图像和文本基准上,聚类校准在负对数似然和Brier分数方面持续改进或匹配强全局校准器,同时保持AUC和准确率。我们进一步从分析和经验上证明,即使适当评分规则改进,固定箱期望校准误差(ECE)也可能对软的、区域感知的校准器进行错误排序,并主张在此类设置中使用对数损失和Brier作为更可靠的模型选择基础。

英文摘要

Ensuring that predicted probabilities align with observed frequencies is critical in high-stakes domains such as clinical decision support, autonomous driving and financial risk assessment. Existing calibration methods typically apply a single global transformation or rely on post-hoc binning over predicted confidences, limiting their ability to exploit heterogeneous reliability across sub-populations. We propose Clustered Calibration, a representation-aware framework that identifies sub-populations via clustering in learned feature spaces (e.g., coverage vectors, SHAP values, CNN activations, Transformer embeddings) and fits a soft mixture of cluster-specific parametric calibrators under hierarchical shrinkage toward a global mapping. This design yields context-specific calibration while maintaining global stability. Across six tabular datasets and additional image and text benchmarks, clustered calibration consistently improves or matches strong global calibrators in terms of negative log-likelihood and Brier score, while preserving AUC and accuracy. We further show, both analytically and empirically, that fixed-bin Expected Calibration Error (ECE) can mis-rank soft, region-aware calibrators even when proper scoring rules improve, and we advocate for log-loss and Brier as more reliable bases for model selection in such settings.
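The abstract contrasts fixed-bin ECE with the proper scoring rules it recommends for model selection. The sketch below writes out all three for a binary problem, assuming `probs` holds predicted positive-class probabilities and `labels` holds 0/1 outcomes. It bins on the positive-class probability, which is one common binary variant of ECE and may differ from the exact definition used in the paper.

```python
import math


def brier_score(probs, labels):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def log_loss(probs, labels, eps=1e-12):
    """Negative log-likelihood per sample, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def fixed_bin_ece(probs, labels, n_bins=10):
    """Weighted average over equal-width bins of |mean predicted prob - observed frequency|."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[k].append((p, y))
    ece = 0.0
    for b in bins:
        if b:
            mean_prob = sum(p for p, _ in b) / len(b)
            frequency = sum(y for _, y in b) / len(b)
            ece += len(b) / len(probs) * abs(mean_prob - frequency)
    return ece


# Hypothetical predictions: ten samples scored 0.8, eight of which are positive.
probs = [0.8] * 10
labels = [1] * 8 + [0] * 2
```

Because ECE depends on where the bin edges fall, small changes in predicted probabilities can move samples between bins and change the score discontinuously; the Brier score and log-loss have no such binning step.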

2510.15284 2026-05-26 cs.LG math.ST stat.TH 版本更新

Small Ensemble-based Data Assimilation: A Machine Learning-Enhanced Data Assimilation Method with Limited Ensemble Size

基于小集合的数据同化:一种机器学习增强的有限集合数据同化方法

Zhilin Li, Zhou Yao, Xianglong Li, Zeng Liu, Zhaokuan Lu, Shanlin Xu, Seungnam Kim, Guangyao Wang

Affiliations: Centre for Regional Oceans, Department of Ocean Science and Technology, and State Key Laboratory of Internet of Things for Smart City, University of Macau; School of Naval Architecture and Ocean Engineering, Huazhong University of Science and Technology; Ningbo Institute of Dalian University of Technology; College of Civil Engineering, Zhejiang University of Technology; Department of Naval Architecture and Ocean Engineering, Hongik University; State Key Laboratory of Internet of Things for Smart City, University of Macau; Zhuhai UM Science and Technology Research Institute

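AI summary: Proposes a data assimilation method that combines the ensemble Kalman filter with a fully connected neural network: a small ensemble produces preliminary analysis states, and the network predicts correction terms, improving accuracy at negligible additional computational cost.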

详情
AI中文摘要

基于集合的数据同化方法因其处理非线性动态问题的固有能力而日益流行。然而,这些方法通常面临分析精度与计算效率之间的权衡,因为更高精度所需的更大集合规模也会导致更高的计算成本。在本研究中,我们提出了一种新颖的基于机器学习的数据同化方法,将传统的集合卡尔曼滤波与全连接神经网络相结合。具体而言,我们的方法使用相对较小的集合规模通过EnKF生成初步但次优的分析状态。然后利用FCNN学习并预测这些状态的修正项,从而减轻有限集合规模导致的性能下降。我们通过涉及Lorenz系统和非线性海浪场模拟的数值实验评估了所提出的EnKF-FCNN方法的性能。结果一致表明,新方法在相同集合规模下比传统EnKF实现了更高的精度,同时几乎不增加额外计算成本。此外,EnKF-FCNN方法通过与不同模型耦合以及使用替代的基于集合的数据同化方法,可适应多种应用。

英文摘要

Ensemble-based data assimilation (DA) methods have become increasingly popular due to their inherent ability to address nonlinear dynamic problems. However, these methods often face a trade-off between analysis accuracy and computational efficiency, as larger ensemble sizes required for higher accuracy also lead to greater computational cost. In this study, we propose a novel machine learning-based data assimilation approach that combines the traditional ensemble Kalman filter (EnKF) with a fully connected neural network (FCNN). Specifically, our method uses a relatively small ensemble size to generate preliminary yet suboptimal analysis states via EnKF. An FCNN is then employed to learn and predict correction terms for these states, thereby mitigating the performance degradation induced by the limited ensemble size. We evaluate the performance of our proposed EnKF-FCNN method through numerical experiments involving Lorenz systems and nonlinear ocean wave field simulations. The results consistently demonstrate that the new method achieves higher accuracy than traditional EnKF with the same ensemble size, while incurring negligible additional computational cost. Moreover, the EnKF-FCNN method is adaptable to diverse applications through coupling with different models and the use of alternative ensemble-based DA methods.
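To show where a small ensemble hurts, here is a minimal sketch of one EnKF analysis step, assuming a scalar state that is observed directly and the stochastic (perturbed-observation) variant of the filter. The function name and the toy numbers are hypothetical, and the FCNN correction described in the abstract is not sketched.

```python
import random


def enkf_analysis(ensemble, obs, obs_var, rng):
    """One stochastic EnKF analysis step for a directly observed scalar state.

    The forecast variance is estimated from the ensemble itself, so with few
    members the Kalman gain is noisy; that is the error a learned correction
    would aim to reduce.
    """
    n = len(ensemble)
    mean = sum(ensemble) / n
    forecast_var = sum((x - mean) ** 2 for x in ensemble) / (n - 1)
    gain = forecast_var / (forecast_var + obs_var)
    obs_std = obs_var ** 0.5
    # Each member assimilates its own perturbed copy of the observation.
    return [x + gain * (obs + rng.gauss(0.0, obs_std) - x) for x in ensemble]


rng = random.Random(0)
forecast = [-1.0, 0.0, 0.5, 1.5]  # hypothetical four-member forecast ensemble
analysis = enkf_analysis(forecast, obs=2.0, obs_var=0.1, rng=rng)
```

When the observation variance is very small the gain approaches 1 and every member is pulled onto the observation; when it is very large the gain approaches 0 and the forecast is left essentially unchanged.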

2510.14925 2026-05-26 cs.AI cs.CL cs.LG 版本更新

False Fixed Points: Kantian Feedback, Stable Miscalibration, and Representational Compression in LLMs

虚假不动点:大语言模型中的康德反馈、稳定误校准与表征压缩

Akira Okutomi

发表机构 * ToppyMicroServices OÜ(ToppyMicroServices公司)

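AI summary: Using a Kantian commitment-gate framing and a minimal linear feedback model, studies high-confidence LLM errors as false fixed points (locally stable, internally coherent, and confidently wrong), finds that stability and correctness can diverge, and considers high-SNR inertia and representational compression as possible mechanisms for stable miscalibration.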

Comments 27 pages, 8 figures, v3.0

详情
AI中文摘要

大型语言模型中的高置信度错误通常被视为脆弱的失败。我们研究另一种可能性:某些错误可能是虚假不动点,即局部稳定、内部一致且自信地错误。这分离了鲁棒性与真实追踪。我们通过康德承诺门控框架和一个最小线性反馈模型来发展这种分离,其中稳定性和正确性可以偏离。在三个开源权重模型上,根据我们的隐藏状态敏感性探测,过度自信的错误项并不比自信正确的项系统性地更局部脆弱。基于弃权的自我批评通过牺牲覆盖率减少了过度自信的错误承诺,而C3-R(一种基于规则的显式反馈门控)则加剧了这种权衡而非消除它。这些结果激发但未证实高信噪比惯性和表征压缩作为稳定误校准的可能机制。

英文摘要

High-confidence errors in large language models are often treated as fragile failures. We study an alternative: some errors may be false fixed points, locally stable, internally coherent, and confidently wrong. This separates robustness from truth-tracking. We develop the separation through a Kantian commitment-gate framing and a minimal linear feedback model in which stability and correctness can diverge. Across three open-weight models, overconfident wrong items are not systematically more locally fragile than confidently correct items under our hidden-state sensitivity probes. Abstention-aware self-critique reduces overconfident wrong commitments by sacrificing coverage, and C3-R, a rule-based explicit feedback gate, sharpens that tradeoff rather than eliminating it. These results motivate, but do not establish, high signal-to-noise (high-SNR) inertia and representational compression as possible mechanisms for stable miscalibration.

2510.08609 2026-05-26 cs.SE cs.CR cs.LG cs.PL 版本更新

Which Is Better For Reducing Outdated and Vulnerable Dependencies: Pinning or Floating?

哪种方法更能减少过时和易受攻击的依赖:固定版本还是浮动版本?

Imranur Rahman, Jill Marley, William Enck, Laurie Williams

发表机构 * North Carolina State University(北卡罗来纳州立大学)

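AI summary: Empirically analyzes trends in dependency version constraint usage across the npm, PyPI, and Cargo ecosystems and uses survival analysis to compare how pinning and floating affect the likelihood of dependencies becoming outdated or vulnerable.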

Comments Accepted to ASE 2025

详情
Journal ref
2025 40th IEEE/ACM International Conference on Automated Software Engineering (ASE)
AI中文摘要

开发者通常使用版本约束来指定其项目依赖的可接受版本。固定依赖可以减少破坏性变更的可能性,但需要手动管理过时和易受攻击依赖的替换。另一方面,浮动依赖可以自动获取错误修复和安全修复,但存在破坏性变更的风险。安全从业者主张固定依赖以防止软件供应链攻击,例如恶意包更新。然而,由于固定是最严格的版本约束,它最可能导致依赖过时。尽管如此,不同版本约束类型下依赖变得过时或易受攻击的可能性如何变化尚不清楚。本研究旨在通过大规模实证评估不同版本约束类型下依赖变得过时或易受攻击的可能性,帮助开发者做出明智的依赖版本约束选择。在本研究中,我们首先识别了npm、PyPI和Cargo生态系统中依赖版本约束使用的趋势以及开发者对版本约束类型更改的模式。然后,我们使用生存分析对依赖状态转换进行建模,并估计使用固定版本与其他版本约束类型相比,依赖变得过时或易受攻击的可能性如何变化。我们观察到,在过时和易受攻击的依赖中,最常用的版本约束类型是浮动-次要,固定版本次之。我们还发现,浮动-主要导致过时的可能性最小,而浮动-次要导致易受攻击的可能性最小。

英文摘要

Developers consistently use version constraints to specify acceptable versions of the dependencies for their project. Pinning dependencies can reduce the likelihood of breaking changes, but comes with the cost of manually managing the replacement of outdated and vulnerable dependencies. On the other hand, floating can be used to automatically get bug fixes and security fixes, but comes with the risk of breaking changes. Security practitioners advocate pinning dependencies to protect against software supply chain attacks, e.g., malicious package updates. However, since pinning is the tightest version constraint, pinning is the most likely to result in outdated dependencies. Nevertheless, how the likelihood of dependencies becoming outdated or vulnerable changes across version constraint types is unknown. The goal of this study is to aid developers in making an informed dependency version constraint choice by empirically evaluating the likelihood of dependencies becoming outdated or vulnerable across version constraint types at scale. In this study, we first identify the trends in dependency version constraint usage and the patterns of version constraint type changes made by developers in the npm, PyPI, and Cargo ecosystems. We then model the dependency state transitions using survival analysis and estimate how the likelihood of becoming outdated or vulnerable changes when using pinning as opposed to the rest of the version constraint types. We observe that among outdated and vulnerable dependencies, the most commonly used version constraint type is floating-minor, with pinning being the next most common. We also find that floating-major is the least likely to result in outdated dependencies and floating-minor is the least likely to result in vulnerable dependencies.
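The constraint types named in the abstract can be made concrete with a small classifier for npm-style version specifiers. This is an illustrative sketch only: the mapping from syntax to type is an assumption based on common semver range semantics, the label `floating-patch` does not appear in the abstract, and the paper's own taxonomy may draw the lines differently.

```python
import re


def classify_npm_constraint(spec: str) -> str:
    """Map an npm-style version specifier to a coarse constraint type (assumed mapping)."""
    spec = spec.strip()
    if spec in ("", "*", "x", "latest") or spec.startswith(">"):
        return "floating-major"   # any newer version, including new major releases
    if spec.startswith("^"):
        # Simplification: for 0.x versions npm's caret is narrower than this label suggests.
        return "floating-minor"   # new minor and patch releases within one major
    if spec.startswith("~"):
        return "floating-patch"   # new patch releases within one minor
    if re.fullmatch(r"=?v?\d+\.\d+\.\d+", spec):
        return "pinning"          # exactly one version
    return "other"                # ranges, tags, URLs, and anything not handled above


for example in ["1.2.3", "^1.2.3", "~1.2.3", "*", ">=1.0.0"]:
    print(example, "->", classify_npm_constraint(example))
```

The tradeoff in the abstract follows from these definitions: a pinned dependency never changes without a manual edit, while a floating one picks up fixes and breaking changes alike whenever the dependency is resolved again.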

2510.02730 2026-05-26 cs.LG cs.CV 版本更新

Dale meets Langevin: A Multiplicative Denoising Diffusion Model

Dale meets Langevin: 乘法去噪扩散模型

Nishanth Shetty, Madhava Prasath, Chandra Sekhar Seelamantula

发表机构 * Department of Electrical Engineering(电子工程系) Indian Institute of Science(印度科学研究所)

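AI summary: Proposes a multiplicative score-based generative model with geometric Brownian motion as the forward noising process, derives the reverse-time SDE and two multiplicative samplers, introduces the Hyvärinen score and a multiplicative denoising score-matching objective, and validates generative capability on image datasets.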

详情
AI中文摘要

指数梯度下降(EGD)是一种受生物学启发的优化算法,遵循Dale定律,在收敛时产生对数正态分布的突触权重,与神经科学的实验观察一致。由于几何布朗运动(GBM)在任何固定时间的边际分布是对数正态的,这种收敛性质揭示了EGD与基于GBM的随机过程之间的自然联系。我们提出了一种基于分数的乘法生成模型,以GBM作为前向噪声过程,并推导了其在环境空间和对数变换空间中的相应反向时间SDE。通过离散化相应的反向时间SDE,我们推导出两种乘法采样器:直接从环境空间反向时间SDE得到的符号无关采样器,以及通过Lamperti变换得到的符号保持采样器,我们称之为Dale-Langevin采样器。我们将该框架与镜像Langevin动力学联系起来,表明优化中驱动EGD的凸函数精确地控制着Dale-Langevin采样器。虽然标准Stein分数(定义为随机向量X在x处的∇log p_X(x))在基于加性噪声的扩散模型中自然出现,但在乘法设置中,我们遇到了一种用于采样的修改版Stein分数,我们称之为Hyvärinen分数:x∘∇log p_X(x)。为了估计该分数,我们提出了一种新的乘法去噪分数匹配目标(M-DSM),证明了其与乘法显式分数匹配损失的等价性,并表明它包含了非负分数匹配损失。在MNIST、Fashion-MNIST、Kuzushiji-MNIST和CIFAR-10上的实验结果验证了所提框架的生成能力。

英文摘要

Exponentiated gradient descent (EGD), a biologically motivated optimisation algorithm that respects Dale's law, produces log-normally distributed synaptic weights at convergence, in alignment with experimental observations in neuroscience. Since the marginal distribution of geometric Brownian motion (GBM) at any fixed time is log-normal, this convergence property reveals a natural connection between EGD and GBM-based stochastic processes. We propose a multiplicative score-based generative model with GBM as a forward noising process and derive its corresponding reverse-time SDE in both the ambient space and the log-transformed space. We derive two multiplicative samplers by discretising the corresponding reverse-time SDEs: a sign-agnostic sampler obtained directly from the ambient-space reverse-time SDE, and a sign-preserving sampler, which we refer to as the Dale-Langevin sampler, obtained via the Lamperti transform. We connect the framework to Mirrored Langevin Dynamics, showing that the convex function driving EGD in optimisation precisely governs the Dale-Langevin sampler. While the standard Stein score, defined as $\nabla \log p_{\boldsymbol{X}}(\boldsymbol{x})$ for a random vector $\boldsymbol{X}$ evaluated at $\boldsymbol{x}$, comes up naturally in additive-noise-based diffusion models, in the multiplicative setting we encounter a modified version of the Stein score for sampling, which we refer to as the Hyvärinen score: $\boldsymbol{x} \circ \nabla \log p_{\boldsymbol{X}}(\boldsymbol{x})$. To estimate the score, we propose a new multiplicative denoising score-matching objective (M-DSM), prove its equivalence to the multiplicative explicit score-matching loss, and show that it subsumes the non-negative score matching loss. Experimental results on MNIST, Fashion-MNIST, Kuzushiji-MNIST, and CIFAR-10 validate the generative capability of the proposed framework.
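The forward process can be sketched directly from the closed-form GBM solution, assuming constant drift `mu` and volatility `sigma` (the paper's noise schedule is not given in the abstract and may be time-dependent). The function name is hypothetical. The sample at time `t` is the starting value multiplied by a log-normal factor, so the noise is multiplicative and the sign of the starting value is never flipped.

```python
import math
import random


def gbm_forward(x0, t, sigma, rng, mu=0.0):
    """Draw x_t from the exact GBM marginal given x_0.

    x_t = x_0 * exp((mu - sigma**2 / 2) * t + sigma * sqrt(t) * z), with z ~ N(0, 1).
    """
    z = rng.gauss(0.0, 1.0)
    return x0 * math.exp((mu - 0.5 * sigma ** 2) * t + sigma * math.sqrt(t) * z)


rng = random.Random(0)
positive = [gbm_forward(2.0, t=1.0, sigma=0.5, rng=rng) for _ in range(5)]
negative = [gbm_forward(-3.0, t=1.0, sigma=0.5, rng=rng) for _ in range(5)]
```

Every entry of `positive` stays above zero and every entry of `negative` stays below zero, which matches the sign constraint that Dale's law places on synaptic weights; an additive Gaussian forward process gives no such guarantee.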

2510.01389 2026-05-26 cs.RO cs.AI cs.LG 版本更新

INSIGHT: INference-time Sequence Introspection for Generating Help Triggers in Vision-Language-Action Models

INSIGHT: 视觉-语言-动作模型中生成帮助触发器的推理时序列内省

Ulas Berk Karli, Ziyao Shangguan, Tesca FItzgerald

发表机构 * Department of Computer Science, Yale University(耶鲁大学计算机科学系)

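AI summary: Proposes INSIGHT, which trains transformer classifiers on token-level uncertainty signals (entropy, log-probability, and Dirichlet-based uncertainty estimates) to predict when a VLA model should request human help, compares strong and weak supervision, and finds that modeling temporal dynamics outperforms static scoring.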

详情
AI中文摘要

最近的视觉-语言-动作(VLA)模型展现出强大的泛化能力,但它们缺乏用于预测失败和向人类监督者请求帮助的内省机制。我们提出了INSIGHT,一个利用令牌级不确定性信号来预测VLA何时应请求帮助的学习框架。使用π0-FAST作为基础模型,我们提取每个令牌的熵、对数概率以及基于狄利克雷的偶然不确定性和认知不确定性估计,并训练紧凑的变压器分类器将这些序列映射到帮助触发器。我们探索了强监督或弱监督的监督机制,并在分布内和分布外任务中进行了广泛比较。我们的结果显示了权衡:强标签使模型能够捕捉细粒度的不确定性动态以实现可靠的帮助检测,而弱标签虽然噪声较大,但在训练和评估对齐时仍能支持有竞争力的内省,为密集标注不可行时提供了可扩展的路径。关键的是,我们发现使用变压器建模令牌级不确定性信号的时间演化比静态序列级评分提供了更强的预测能力。本研究首次对VLA中基于不确定性的内省进行了系统评估,为主动学习和通过选择性人工干预实现实时错误缓解开辟了未来途径。

英文摘要

Recent Vision-Language-Action (VLA) models show strong generalization capabilities, yet they lack introspective mechanisms for anticipating failures and requesting help from a human supervisor. We present INSIGHT, a learning framework for leveraging token-level uncertainty signals to predict when a VLA should request help. Using $\pi_0$-FAST as the underlying model, we extract per-token entropy, log-probability, and Dirichlet-based estimates of aleatoric and epistemic uncertainty, and train compact transformer classifiers to map these sequences to help triggers. We explore strong and weak supervision regimes and extensively compare them across in-distribution and out-of-distribution tasks. Our results show a trade-off: strong labels enable models to capture fine-grained uncertainty dynamics for reliable help detection, while weak labels, though noisier, still support competitive introspection when training and evaluation are aligned, offering a scalable path when dense annotation is impractical. Crucially, we find that modeling the temporal evolution of token-level uncertainty signals with transformers provides far greater predictive power than static sequence-level scores. This study provides the first systematic evaluation of uncertainty-based introspection in VLAs, opening future avenues for active learning and for real-time error mitigation through selective human intervention.
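Two of the per-token signals named in the abstract, entropy and log-probability, can be computed from a token's logits alone. The sketch below assumes access to the raw logit vector at one decoding step and the index of the token that was emitted; the function names are hypothetical, and the Dirichlet-based aleatoric and epistemic estimates are not sketched because the abstract does not specify them.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_uncertainty(logits, chosen):
    """Return (entropy of the predictive distribution, log-probability of the chosen token)."""
    probs = softmax(logits)
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy, math.log(probs[chosen])


# Hypothetical logits at two decoding steps over a four-token vocabulary.
confident_step = token_uncertainty([8.0, 0.0, 0.0, 0.0], chosen=0)
uncertain_step = token_uncertainty([0.0, 0.0, 0.0, 0.0], chosen=0)
```

Stacking these pairs over the tokens of an action sequence gives the kind of time series that a sequence classifier can consume, as opposed to collapsing them into a single sequence-level score.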

2509.25507 2026-05-26 stat.ML cs.LG math.ST stat.ME stat.TH 版本更新

One-shot Conditional Sampling: MMD meets Nearest Neighbors

一次性条件采样:MMD 遇见最近邻

Anirban Chatterjee, Sayantan Choudhury, Rohan Hore

Affiliations: University of Chicago; MBZUAI (Mohamed bin Zayed University of Artificial Intelligence); Carnegie Mellon University

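AI summary: Proposes CGMMD, a framework for one-shot conditional sampling trained by minimizing maximum mean discrepancy (MMD), with theoretical convergence guarantees and competitive performance on tasks such as image denoising and super-resolution.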

Comments Accepted at the 43rd International Conference on Machine Learning (ICML 2026)

详情
AI中文摘要

我们如何从从未完全观察到的条件分布中生成样本?这个问题在现代机器学习和经典统计学的广泛应用中都会出现,包括计算机视觉中的图像后处理、基于模拟的推理中的近似后验采样以及复杂数据设置中的条件分布建模。在这种情况下,与无条件采样相比,可以利用额外的特征信息来实现更自适应和高效的采样。基于此,我们引入了使用 MMD 的条件生成器(CGMMD),一种用于条件采样的新颖框架。与许多当代方法不同,我们的方法将训练目标设定为一个简单的、无对抗的直接最小化问题。CGMMD 的一个关键特性是它能够在生成器的单次前向传播中产生条件样本,从而实现实际的一次性采样,测试时复杂度低。我们建立了从 CGMMD 采样器采样时产生的损失的严格理论界限,并证明了估计分布向真实条件分布的收敛性。在此过程中,我们还开发了基于最近邻的泛函的一致集中结果,这可能具有独立的研究价值。最后,我们展示了 CGMMD 在涉及复杂条件密度的合成任务以及实际应用(如图像去噪和图像超分辨率)中具有竞争力的表现。

英文摘要

How can we generate samples from a conditional distribution that we never fully observe? This question arises across a broad range of applications in both modern machine learning and classical statistics, including image post-processing in computer vision, approximate posterior sampling in simulation-based inference, and conditional distribution modeling in complex data settings. In such settings, compared with unconditional sampling, additional feature information can be leveraged to enable more adaptive and efficient sampling. Building on this, we introduce Conditional Generator using MMD (CGMMD), a novel framework for conditional sampling. Unlike many contemporary approaches, our method frames the training objective as a simple, adversary-free direct minimization problem. A key feature of CGMMD is its ability to produce conditional samples in a single forward pass of the generator, enabling practical one-shot sampling with low test-time complexity. We establish rigorous theoretical bounds on the loss incurred when sampling from the CGMMD sampler, and prove convergence of the estimated distribution to the true conditional distribution. In the process, we also develop a uniform concentration result for nearest-neighbor based functionals, which may be of independent interest. Finally, we show that CGMMD performs competitively on synthetic tasks involving complex conditional densities, as well as on practical applications such as image denoising and image super-resolution.
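The training signal in this family of methods is a kernel two-sample discrepancy. As a minimal sketch, the code below computes the biased (V-statistic) estimate of squared MMD between two one-dimensional samples with a Gaussian kernel. It is the plain unconditional estimator, not the nearest-neighbor conditional objective of CGMMD, and the bandwidth and sample values are arbitrary choices for illustration.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD: mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)."""
    kxx = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(gaussian_kernel(a, b, bandwidth) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy


# Hypothetical samples: "real" data near 0 and generator output near 3.
real = [0.0, 0.1, -0.1]
generated = [3.0, 3.1, 2.9]

print(mmd2_biased(real, real))       # 0.0
print(mmd2_biased(real, generated))  # clearly positive
```

Because the estimate is a smooth function of the generated samples, it can be minimized directly by gradient descent on the generator, which is what makes the objective adversary-free.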

2509.25339 2026-05-26 cs.CV cs.AI cs.LG eess.IV 版本更新

VisualOverload: Probing Visual Understanding of VLMs in Really Dense Scenes

VisualOverload: 在真正密集场景中探测VLM的视觉理解

Paul Gavrikov, Wei Lin, M. Jehanzeb Mirza, Soumya Jahagirdar, Muhammad Huzaifa, Sivan Doveh, Serena Yeung-Levy, James Glass, Hilde Kuehne

发表机构 * Independent Researcher(独立研究者) JKU Linz(林茨JKU) MIT CSAIL Tübingen AI Center(图宾根人工智能中心) Stanford(斯坦福) MIT-IBM Watson AI Lab(MIT-IBM沃森人工智能实验室)

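AI summary: Proposes VisualOverload, a benchmark that tests VLMs on simple visual tasks in densely populated scenes; the best model reaches only 69.5% overall accuracy, revealing failures in counting, OCR, and logical consistency.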

Comments Accepted at CVPR 2026

详情
AI中文摘要

最先进的VLM是否真正解决了基本视觉理解?我们提出VisualOverload,一个略有不同的视觉问答(VQA)基准,包含2,720个问答对,并持有私有真实答案。与以往通常关注近全局图像理解的VQA数据集不同,VisualOverload挑战模型在密集(或过载)场景中执行简单的、无需知识的视觉任务。我们的数据集由公共领域绘画的高分辨率扫描图组成,这些绘画包含多个人物、动作和展开的子情节,背景细节丰富。我们手动为这些图像标注了六个任务类别的问题,以探测对场景的彻底理解。我们假设当前基准高估了VLM的性能,编码和推理细节对它们来说仍然是一项具有挑战性的任务,尤其是当面对密集场景时。实际上,我们观察到在37个测试模型中,即使是最好的模型(o3)在我们最难的测试子集上也仅达到19.6%的准确率,在所有问题上总体准确率为69.5%。除了全面评估外,我们还通过错误分析补充了基准,揭示了多种失败模式,包括缺乏计数能力、OCR失败以及复杂任务下惊人的逻辑不一致。总之,VisualOverload暴露了当前视觉模型中的关键差距,并为社区开发更好的模型提供了重要资源。基准:http://paulgavrikov.github.io/visualoverload

英文摘要

Is basic visual understanding really solved in state-of-the-art VLMs? We present VisualOverload, a slightly different visual question answering (VQA) benchmark comprising 2,720 question-answer pairs, with privately held ground-truth responses. Unlike prior VQA datasets that typically focus on near global image understanding, VisualOverload challenges models to perform simple, knowledge-free vision tasks in densely populated (or, overloaded) scenes. Our dataset consists of high-resolution scans of public-domain paintings that are populated with multiple figures, actions, and unfolding subplots set against elaborately detailed backdrops. We manually annotated these images with questions across six task categories to probe for a thorough understanding of the scene. We hypothesize that current benchmarks overestimate the performance of VLMs, and encoding and reasoning over details is still a challenging task for them, especially if they are confronted with densely populated scenes. Indeed, we observe that even the best model (o3) out of 37 tested models only achieves 19.6% accuracy on our hardest test split and overall 69.5% accuracy on all questions. Beyond a thorough evaluation, we complement our benchmark with an error analysis that reveals multiple failure modes, including a lack of counting skills, failure in OCR, and striking logical inconsistencies under complex tasks. Altogether, VisualOverload exposes a critical gap in current vision models and offers a crucial resource for the community to develop better models. Benchmark: http://paulgavrikov.github.io/visualoverload

2509.24050 2026-05-26 cs.LG 版本更新

Bridging On-Device and Cloud LLMs for Collaborative Reasoning: A Unified Methodology for Local Routing and Post-Training

桥接设备端与云端大语言模型实现协作推理:本地路由与后训练的统一方法

Wenzhi Fang, Dong-Jun Han, Liangqi Yuan, Evan Chen, Christopher Brinton

发表机构 * Purdue University(普渡大学) Yonsei University(延世大学)

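AI summary: Proposes reinforcement-learning post-training that lets an on-device LLM decide internally whether to invoke the cloud, combining hierarchical rewards with adaptive prompt filtering to significantly narrow the gap to full cloud LLMs.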

Comments We propose a unified post-training framework that integrates routing optimization, enabling the on-device LLM to improve its problem-solving ability while learning routing strategies

详情
AI中文摘要

设备-云端协作有望部署大型语言模型(LLM),利用轻量级设备端模型提高效率,同时依赖强大的云端模型实现卓越推理。该设置中的一个核心挑战是,对于每个传入查询,确定是应在本地处理还是卸载到云端。现有方法通常依赖外部路由器,这些路由器往往难以从提示本身判断难度,尤其是涉及复杂推理的任务。受此限制,我们提出使设备端LLM在推理时内部决定是否调用云端协助,并通过基于强化学习的后训练来灌输这种能力。将设备端LLM后训练视为奖励最大化问题,我们设计分层奖励以鼓励本地问题解决和明智的云端卸载。为解决该问题,我们开发了一种算法,采用组级策略梯度稳定优化,并结合自适应提示过滤提供互补学习信号,以缓解策略崩溃(即仅本地执行或仅云端卸载)。在多个推理基准上对设备端规模的LLaMA和Qwen模型进行的大量实验表明,我们的方法始终优于基线,并显著缩小了与纯云端LLM的差距。

英文摘要

Device-cloud collaboration holds promise for deploying large language models (LLMs), leveraging lightweight on-device models for efficiency while relying on powerful cloud models for superior reasoning. A central challenge in this setting is determining, for each incoming query, whether it should be processed locally or offloaded to the cloud. Existing approaches typically rely on external routers, which often struggle to determine difficulty from the prompt itself, especially for tasks involving complex reasoning. Motivated by this limitation, we propose enabling on-device LLMs to decide internally whether to invoke cloud assistance at inference time, with this capability instilled through reinforcement learning based post-training. Casting on-device LLM post-training as a reward maximization problem, we design hierarchical rewards to encourage local problem solving and judicious cloud offloading. To solve the resulting problem, we develop an algorithm featuring a group-level policy gradient that stabilizes optimization, together with adaptive prompt filtering that provides complementary learning signals to mitigate policy collapse (i.e., exclusive local execution or exclusive cloud offloading). Extensive experiments on on-device-scale LLaMA and Qwen models across multiple reasoning benchmarks show that our method consistently outperforms baselines and significantly narrows the gap to full cloud LLMs.

2509.12196 2026-05-26 cs.LG cs.AI 版本更新

Dynamic Relational Priming Improves Transformer in Multivariate Time Series

动态关系先验提升Transformer在多变量时间序列中的表现

Hunjae Lee, Corey Clark

Affiliations: Department of Computer Science, Southern Methodist University, Dallas, TX, USA

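AI summary: Proposes attention with dynamic relational priming (prime attention), which tailors each token's representation per token pair to capture heterogeneous inter-channel dependencies in multivariate time series, improving forecasting accuracy by up to 6.5% at the same asymptotic computational complexity.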

详情
AI中文摘要

标准Transformer中的注意力机制使用静态的token表示,这些表示在每一层的所有成对计算中保持不变。这限制了它们与每个token对交互中可能存在的多样化关系动态的表示对齐。虽然标准注意力在关系相对同质的领域表现出色,但其静态关系学习难以捕捉多变量时间序列(MTS)数据中多样、异构的通道间依赖关系——其中单个系统内不同的通道对交互可能由完全不同的物理定律或时间动态支配。为了更好地将注意力机制与此类领域现象对齐,我们提出了带有动态关系先验的注意力机制(prime attention)。与标准注意力中每个token在所有成对交互中呈现相同表示不同,prime attention通过可学习的调制动态地(或按交互)定制每个token,以最好地捕捉每个token对的独特关系动态,从而针对特定关系优化每个成对交互。这种prime attention的表示可塑性使其能够在保持与标准注意力相同渐近计算复杂度的同时,有效提取MTS中关系特定的信息。我们的结果表明,prime attention在基准测试中始终优于标准注意力,预测精度提升高达6.5%。此外,我们发现与标准注意力相比,prime attention在使用最多40%更短序列长度时即可达到相当或更优的性能,进一步证明了其卓越的关系建模能力。

英文摘要

Standard attention mechanisms in transformers employ static token representations that remain unchanged across all pair-wise computations in each layer. This limits their representational alignment with the potentially diverse relational dynamics of each token-pair interaction. While they excel in domains with relatively homogeneous relationships, standard attention's static relational learning struggles to capture the diverse, heterogeneous inter-channel dependencies of multivariate time series (MTS) data, where different channel-pair interactions within a single system may be governed by entirely different physical laws or temporal dynamics. To better align the attention mechanism with such domain phenomena, we propose attention with dynamic relational priming (prime attention). Unlike standard attention, where each token presents an identical representation across all of its pair-wise interactions, prime attention tailors each token dynamically (or per interaction) through learnable modulations to best capture the unique relational dynamics of each token pair, optimizing each pair-wise interaction for that specific relationship. This representational plasticity of prime attention enables effective extraction of relationship-specific information in MTS while maintaining the same asymptotic computational complexity as standard attention. Our results demonstrate that prime attention consistently outperforms standard attention across benchmarks, achieving up to 6.5% improvement in forecasting accuracy. In addition, we find that prime attention achieves comparable or superior performance with sequence lengths up to 40% shorter than those used by standard attention, further demonstrating its superior relational modeling capabilities.

2509.10515 2026-05-26 cs.LG cs.CL 版本更新

Adaptive Preference Optimization with Uncertainty-aware Utility Anchor

基于不确定性感知效用锚点的自适应偏好优化

Xiaobo Wang, Zixia Jia, Jiaqi Li, Qi Liu, Zilong Zheng

发表机构 * State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China(认知智能国家重点实验室,中国科学技术大学) Institute of Artificial Intelligence, Hefei Comprehensive National Science Center(人工智能研究院,合肥综合性国家科学中心) State Key Laboratory of General Artificial Intelligence, BIGAI(通用人工智能国家重点实验室,BIGAI)

AI总结 提出一种通用离线偏好优化框架UAPO,通过引入锚点函数估计偏好数据标注的不确定性,支持非配对数据训练,提升数据利用效率和训练鲁棒性。

Comments Accepted by EMNLP 2025 Findings

详情
AI中文摘要

离线偏好优化方法对于大型语言模型(LLMs)的对齐是高效的。直接偏好优化(DPO)类学习作为最流行的方法之一,因其在奖励建模中的高效性而脱颖而出。然而,这些方法通常遵循惯例使用Bradley-Terry(BT)奖励建模,该建模面临几个关键假设,包括对成对训练数据的需求、模型分布偏移、人类理性假设等。为了解决这些限制,我们提出了一种通用的离线偏好优化框架——基于不确定性感知效用锚点的自适应偏好优化(UAPO),该框架引入了一个锚点函数来估计偏好数据标注带来的不确定性。我们的方法即使在数据未配对的情况下也能进行训练,显著提高了数据利用效率。此外,锚点设计使UAPO在训练过程中更加鲁棒。实验结果表明,UAPO在无需严格依赖数据配对的情况下取得了有竞争力的结果,为更灵活有效的偏好优化方法铺平了道路。

英文摘要

Offline preference optimization methods are efficient for aligning large language models (LLMs). Direct Preference Optimization (DPO)-like learning, one of the most popular approaches, stands out for its efficiency in reward modeling. However, these methods conventionally use Bradley-Terry (BT) reward modeling, which rests on several critical assumptions and limitations, including the requirement for pairwise training data, model distribution shift, and the human rationality assumption. To address these limitations, we propose a general framework for offline preference optimization methods, Adaptive Preference Optimization with Utility Anchor (UAPO), which introduces an anchoring function to estimate the uncertainties arising from preference data annotation. Our method enables training even in scenarios where the data is unpaired, significantly enhancing data utilization efficiency. Moreover, the anchor design makes UAPO more robust during training. Experimental results demonstrate that UAPO achieves competitive outcomes without a strict dependency on data pairing, paving the way for more flexible and effective preference optimization methods.

2508.21620 2026-05-26 cs.LG 版本更新

Introduction to the Analysis of Probabilistic Decision-Making Algorithms

概率决策算法分析导论

Agustinus Kristiadi

发表机构 * Western University and Vector Institute(韦仕敦大学和向量研究所)

AI总结 本文为概率决策算法(包括赌博机算法、贝叶斯优化和树搜索算法)的理论分析提供了一本自包含的入门指南,旨在降低非专家的理解门槛。

详情
AI中文摘要

决策理论为在各种不确定性下做出选择提供了原则性方法。实现这些理论的算法已成功应用于广泛的实际问题,包括材料和药物发现。事实上,这些算法是可取的,因为它们可以自适应地收集信息以在未来做出更好的决策,从而产生数据高效的工作流程。在科学发现中,实验成本高昂,因此这些算法可以显著降低实验成本。这些算法的理论分析对于理解其行为以及为开发下一代算法提供有价值的见解至关重要。然而,文献中的理论分析通常对非专家来说难以理解。本专著旨在为常用概率决策算法(包括赌博机算法、贝叶斯优化和树搜索算法)的理论分析提供一本易于理解、自成体系的入门介绍。仅假设读者具备概率论和统计学的基本知识,以及一些关于高斯过程的基础知识。

英文摘要

Decision theories offer principled methods for making choices under various types of uncertainty. Algorithms that implement these theories have been successfully applied to a wide range of real-world problems, including materials and drug discovery. Indeed, they are desirable since they can adaptively gather information to make better decisions in the future, resulting in data-efficient workflows. In scientific discovery, where experiments are costly, these algorithms can thus significantly reduce the cost of experimentation. Theoretical analyses of these algorithms are crucial for understanding their behavior and providing valuable insights for developing next-generation algorithms. However, theoretical analyses in the literature are often inaccessible to non-experts. This monograph aims to provide an accessible, self-contained introduction to the theoretical analysis of commonly used probabilistic decision-making algorithms, including bandit algorithms, Bayesian optimization, and tree search algorithms. Only basic knowledge of probability theory and statistics, along with some elementary knowledge about Gaussian processes, is assumed.

2508.11307 2026-05-26 physics.ao-ph cs.LG physics.data-an 版本更新

Approximating the universal thermal climate index using sparse regression with orthogonal polynomials

使用正交多项式稀疏回归逼近通用热气候指数

Sabin Roman, Ljupco Todorovski, Saso Dzeroski, Gregor Skok

发表机构 * Department of Knowledge Technologies, Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia Faculty of Mathematics and Physics, University of Ljubljana, Jadranska ulica 19, 1000 Ljubljana, Slovenia

AI总结 针对通用热气候指数(UTCI)标准多项式近似误差大的问题,提出基于正交多项式基的稀疏回归方法,在保持计算效率的同时显著降低平均误差和大误差频率。

Comments Final peer-reviewed version of the manuscript

详情
Journal ref
Geoscientific Model Development 19, 4319-4330 (2026)
AI中文摘要

通用热气候指数(UTCI)是一种衡量热舒适度的指标,用于量化人类对环境条件的感受。由于其作为生物气候指标的稳健性和多功能性,已被广泛应用于生物气候学的众多研究中,并越来越多地作为户外热舒适度的操作度量。从相关环境参数计算UTCI值通常并不直接,因此使用6次多项式近似已成为计算UTCI值的标准方法。尽管计算效率高,但该多项式近似的误差可能很大。本研究的目标是开发一种改进的多项式近似版本——既能保持相当的计算效率,又在数值稳定性方面更稳健,且精度显著提高,特别是在减少较大误差的频率方面。通过使用稀疏正交回归(即基于正交多项式基的稀疏回归)实现了这一目标,这不仅大幅降低了平均误差(即平均误差、平均绝对误差和均方根误差),还显著减少了较大误差的频率。利用勒让德多项式基,可以构建近似模型,有效填充精度与复杂度的帕累托前沿,并在不同模型容量下表现出稳定的层次化系数结构。仅使用20%的数据训练新近似模型,并在剩余80%的数据上进行测试,显示出成功的泛化能力,且结果在自助法下具有稳健性。该分解有效地将UTCI近似为正交基中的傅里叶式展开,在L2(最小二乘)意义上接近理论最优值。

英文摘要

The Universal Thermal Climate Index (UTCI) is a measure of thermal comfort that quantifies how humans experience environmental conditions. Due to its robustness and versatility as a bioclimatic indicator, it has been extensively employed across a wide range of studies in bioclimatology and is increasingly used as an operational measure of outdoor thermal comfort. Calculating the UTCI value from the relevant environmental parameters is nominally not straightforward, which is why using a 6th-degree polynomial approximation has become the standard way to calculate UTCI values. Although it is computationally efficient, the error of this polynomial approximation can be substantial. The goal of this study was to develop an improved version of the polynomial approximation - one that retains comparable computational efficiency but is more robust in terms of numerical stability and substantially more accurate, particularly in reducing the frequency of larger errors. This goal was achieved using sparse orthogonal regression, namely sparse regression with an orthogonal polynomial basis, which not only substantially reduces the average errors (i.e., the mean error, the mean absolute error, and the root mean square error) but also drastically reduces the frequency of large errors. By leveraging Legendre polynomial bases, approximation models could be constructed that efficiently populate a Pareto front of accuracy versus complexity and exhibit stable, hierarchical coefficient structures across varying model capacities. Training the new approximation models over only 20% of the data, with the testing performed over the remaining 80%, highlights successful generalization, with the results being robust under bootstrapping. The decomposition effectively approximates the UTCI as a Fourier-like expansion in an orthogonal basis, yielding results near the theoretical optimum in the L2 (least squares) sense.
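The idea of a sparse, Fourier-like expansion in an orthogonal polynomial basis can be illustrated with a one-dimensional toy. The sketch below is not the paper's UTCI model (which has several input variables and is fitted by sparse regression on data). It assumes a known function on [-1, 1], projects it onto Legendre polynomials by numerical quadrature, and keeps only the terms that carry the most L2 energy. The function names `legendre`, `legendre_coefficients`, and `sparsify` are hypothetical.

```python
def legendre(k, x):
    """Evaluate the Legendre polynomial P_k(x) with Bonnet's recurrence."""
    p_prev, p = 1.0, x
    if k == 0:
        return p_prev
    for n in range(1, k):
        # (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p


def legendre_coefficients(f, degree, num_points=2000):
    """Project f onto P_0..P_degree on [-1, 1] using the midpoint rule."""
    h = 2.0 / num_points
    nodes = [-1.0 + (i + 0.5) * h for i in range(num_points)]
    coeffs = []
    for k in range(degree + 1):
        integral = h * sum(f(x) * legendre(k, x) for x in nodes)
        # The squared L2 norm of P_k is 2 / (2k + 1), hence this scaling.
        coeffs.append((2 * k + 1) / 2.0 * integral)
    return coeffs


def sparsify(coeffs, keep):
    """Keep the `keep` terms with the largest L2 energy, zero the rest."""
    energy = lambda k: coeffs[k] ** 2 * 2.0 / (2 * k + 1)
    kept = set(sorted(range(len(coeffs)), key=energy, reverse=True)[:keep])
    return [c if k in kept else 0.0 for k, c in enumerate(coeffs)]


coeffs = legendre_coefficients(lambda x: x ** 3, degree=4)
sparse = sparsify(coeffs, keep=1)
```

Because the basis is orthogonal, dropping a term changes the squared L2 error by exactly that term's energy, so ranking terms by energy gives the best approximation for a given number of terms in this toy setting.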

2507.14760 2026-05-26 eess.IV cs.AI cs.CV cs.LG 版本更新

QUTCC: Quantile Uncertainty Training and Conformal Calibration for Imaging Inverse Problems

QUTCC: 成像逆问题的分位数不确定性训练与保形校准

Cassandra Tong Ye, Shamus Li, Tyler King, Kristina Monakhova

AI总结 提出QUTCC方法,结合分位数回归与U-Net实现空间自适应保形校准,在多个成像逆问题中生成更紧的不确定性区间并定位模型幻觉。

详情
AI中文摘要

尽管深度学习为科学和医学成像带来了巨大前景,但任何失败和幻觉(与事实不符的预测)都难以定位,并可能产生严重的下游后果。不确定性估计技术,如保形预测,可以通过预测模型预测的统计有效误差条来提供帮助。然而,流行的保形预测方法并非为高维图像值问题设计,且在保形校准过程中未考虑图像内的空间相关性,导致不确定性区间过大。我们提出了一种实用的同时分位数回归方法,能够在保形校准期间实现非线性、空间自适应缩放。我们的方法QUTCC使用带有分位数嵌入的U-Net架构,在训练期间学习完整的条件分位数分布,然后利用这个非线性学习函数进行空间自适应保形校准。在测试时,我们的方法能够高效地估计具有像素边际覆盖保证的不确定性区间。此外,QUTCC还可以在没有内置分布假设的情况下预测逐像素条件概率密度估计。我们在多个去噪问题、加速磁共振成像和定量相位显微镜上评估了我们的方法。与先前的保形方法相比,我们的方法在相同覆盖水平下始终产生更紧的不确定性区间,能够预测不同任务的合理条件分布,并且在某些情况下,高不确定性区域可以帮助我们定位模型预测中的幻觉。

英文摘要

While deep learning offers tremendous promise for scientific and medical imaging, any failures and hallucinations (predictions that do not coincide with reality) are hard to pinpoint and can have serious downstream consequences. Uncertainty estimation techniques, such as conformal prediction, can help by predicting statistically valid error bars for a model's prediction. However, popular conformal prediction methods were not designed for high-dimensional image-valued problems and do not take into account spatial correlations within an image during conformal calibration, resulting in larger-than-necessary uncertainty intervals. We propose a practical simultaneous quantile regression method that enables non-linear, spatially-adaptive scaling during conformal calibration. Our method, QUTCC, uses a U-Net architecture with a quantile embedding to learn a full conditional quantile distribution during training, and then leverages this non-linear, learned function for spatially-adaptive conformal calibration. At test time, our method can efficiently estimate uncertainty intervals with pixel-marginal coverage guarantees. In addition, QUTCC can also predict pixel-wise conditional probability density estimates without any built-in distributional assumptions. We evaluate our method on several denoising problems, accelerated magnetic resonance imaging, and quantitative phase microscopy. Our method consistently produces tighter uncertainty intervals than prior conformal methods at the same coverage level, can predict plausible conditional distributions for different tasks, and in some cases, high-uncertainty regions can help us locate hallucinations in a model's prediction.
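Two building blocks mentioned in this abstract, quantile regression and conformal calibration, can be sketched for scalar predictions. The code below shows the standard pinball loss and a generic split-conformal margin for quantile intervals. It is an illustration under simplifying assumptions, not QUTCC's spatially adaptive, image-valued calibration, and the function names are hypothetical.

```python
import math


def pinball_loss(y_true, y_pred, q):
    """Quantile (pinball) loss of one prediction at quantile level q in (0, 1)."""
    diff = y_true - y_pred
    # Under-prediction is weighted by q, over-prediction by (1 - q).
    return q * diff if diff >= 0 else (q - 1.0) * diff


def conformal_margin(lower, upper, y_cal, alpha):
    """Margin to add to both ends of [lower_i, upper_i] so that the widened
    intervals cover a new exchangeable point with probability >= 1 - alpha."""
    # Score: how far each calibration target falls outside its interval
    # (negative when the target lies inside).
    scores = sorted(max(lo - y, y - hi) for lo, hi, y in zip(lower, upper, y_cal))
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))  # 1-indexed order statistic
    if rank > n:
        return float("inf")  # too few calibration points for this alpha
    return scores[rank - 1]


margin = conformal_margin([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.5, 1.2, 2.0], alpha=0.5)
```

A test-time interval would then be `[lower - margin, upper + margin]`. A single scalar margin like this is the spatially uniform baseline that the abstract contrasts with spatially adaptive calibration.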

2507.06038 2026-05-26 math.NA cs.LG cs.NA 版本更新

Fredholm Neural Networks for inverse problems in elliptic PDEs

Fredholm神经网络用于椭圆型偏微分方程反问题

Kyriakos C. Georgiou, Constantinos Siettos, Athanasios N. Yannacopoulos

发表机构 * Division of Applied Mathematics, Brown University(布朗大学应用数学系) Department of Statistics and Stochastic Modelling and Applications Laboratory, Athens University of Economics and Business(雅典经济与商业大学统计学与随机建模与应用实验室)

AI总结 基于Fredholm神经网络框架,提出可解释的Potential Fredholm神经网络(PFNN)求解椭圆型偏微分方程正反问题,实现高精度并严格证明误差界。

详情
AI中文摘要

在我们先前关于Fredholm神经网络(Fredholm NN / FNN)求解积分方程的工作基础上,我们将该框架扩展到线性和非线性椭圆型偏微分方程的反问题。所提出的方案包含一个定制设计的深度神经网络(DNN),其中层数、权重、偏置和超参数基于不动点方案以可解释的方式计算,因此我们称之为Potential Fredholm神经网络(PFNN)。我们首先构建PFNN作为求解正问题的方法,表明该方法确保了高精度和可解释性,在区域内部实现小误差,在边界上接近机器精度。然后,我们使用该方法求解椭圆型PDE的反问题,并提供了方案一致性的严格证明以及与PFNN架构直接相关的区域内部和边界的误差界。特别地,我们表明这些误差界依赖于边界函数的近似和积分离散方案,两者都直接对应于Fredholm NN架构的组成部分。通过这种方式,我们构建了一个可解释的方案,该方案为反问题提供精确解,同时由于PFNN的架构而明确尊重边界条件。我们评估了所提出方案在二维和三维线性和半线性椭圆型PDE上的性能。

英文摘要

Building on our previous work on Fredholm Neural Networks (Fredholm NNs/FNNs) for solving integral equations, we extend the framework to inverse problems for linear and nonlinear elliptic partial differential equations. The proposed scheme consists of a custom-designed deep neural network (DNN) in which the number of layers, weights, biases and hyperparameters are computed in an explainable manner based on a fixed-point scheme, and we therefore refer to this as the Potential Fredholm Neural Network (PFNN). We first build the PFNN as a method for solving the forward problem, showing that this approach ensures both high accuracy and explainability, achieving small errors in the interior of the domain and near machine precision on the boundary. We then use this approach to solve inverse problems for elliptic PDEs, and provide a rigorous proof of the consistency of the scheme and error bounds for both the interior and boundary of the domain, tied directly to the architecture of the PFNN. In particular, we show that these error bounds depend on the approximation of the boundary function and the integral discretization scheme, both of which directly correspond to components of the Fredholm NN architecture. In this way, we construct an explainable scheme that provides accurate solutions to the inverse problems, whilst still explicitly respecting the boundary conditions, due to the architecture of the PFNN. We assess the performance of the proposed scheme for linear and semi-linear elliptic PDEs in two and three dimensions.

2506.01945 2026-05-26 econ.EM cs.LG stat.AP 版本更新

Stock Market Telepathy: Graph Neural Networks Predicting the Secret Conversations between MINT and G7 Countries

股市读心术:图神经网络预测MINT与G7国家之间的秘密对话

Nurbanu Bursa

发表机构 * Hacettepe University(哈切特佩大学)

AI总结 使用MTGNN图神经网络分析2012-2024年G7与MINT国家股市指数,揭示美国、加拿大、印尼和土耳其的影响力,并证明该方法优于传统预测模型。

详情
Journal ref
Communications in Statistics: Case Studies, Data Analysis and Applications (2026)
AI中文摘要

新兴经济体,特别是MINT国家(墨西哥、印度尼西亚、尼日利亚和土耳其),在全球股市中的影响力日益增强,尽管它们仍易受G7(加拿大、法国、德国、意大利、日本、英国和美国)等发达国家经济状况的影响。金融市场的这种相互关联性和敏感性使得理解这些关系对于投资者和政策制定者准确预测股价走势至关重要。为此,我们研究了2012年至2024年G7和MINT国家的主要股市指数,使用了一种称为多元时间序列图神经网络(MTGNN)的最新图神经网络算法。该方法允许考虑多元时间序列中复杂的时空连接。在实现中,MTGNN揭示出美国和加拿大是预测过程中对股市指数最具影响力的G7国家,而印度尼西亚和土耳其是最具影响力的MINT国家。此外,我们的结果表明,MTGNN在预测MINT和G7国家股市指数价格方面优于传统方法。因此,该研究为这些经济集团的市场提供了宝贵的见解,并提出了一种使用MTGNN分析全球股市动态的令人信服的实证方法。

英文摘要

Emerging economies, particularly the MINT countries (Mexico, Indonesia, Nigeria, and Türkiye), are gaining influence in global stock markets, although they remain susceptible to the economic conditions of developed countries like the G7 (Canada, France, Germany, Italy, Japan, the United Kingdom, and the United States). This interconnectedness and sensitivity of financial markets make understanding these relationships crucial for investors and policymakers to predict stock price movements accurately. To this end, we examined the main stock market indices of G7 and MINT countries from 2012 to 2024, using a recent graph neural network (GNN) algorithm called multivariate time series forecasting with graph neural network (MTGNN). This method allows for considering complex spatio-temporal connections in multivariate time series. In the implementations, MTGNN revealed that the US and Canada are the most influential G7 countries regarding stock indices in the forecasting process, and Indonesia and Türkiye are the most influential MINT countries. Additionally, our results showed that MTGNN outperformed traditional methods in forecasting the prices of stock market indices for MINT and G7 countries. Consequently, the study offers valuable insights into the markets of these economic blocs and presents a compelling empirical approach to analyzing global stock market dynamics using MTGNN.

2505.05371 2026-05-26 eess.SP cs.LG q-bio.NC 版本更新

From Sleep Staging to Spindle Detection: A Case Study on End-to-End Automated Sleep Analysis

从睡眠分期到纺锤波检测:端到端自动化睡眠分析的案例研究

Niklas Grieger, Siamak Mehrkanoon, Philipp Ritter, Stephan Bialonski

发表机构 * Department of Medical Engineering and Technomathematics, FH Aachen University of Applied Sciences(医学工程与技术数学系,亚琛应用科学大学) Department of Information and Computing Sciences, Utrecht University(信息与计算科学系,乌得勒支大学) Institute for Data-Driven Technologies, FH Aachen University of Applied Sciences(数据驱动技术研究所,亚琛应用科学大学) Department of Psychiatry and Psychotherapy, University Hospital Carl Gustav Carus, Technische Universität Dresden(精神病学与心理治疗系,卡尔·古斯塔夫·卡鲁斯大学医院,德累斯顿工业大学)

AI总结 本研究通过案例评估,使用已验证的机器学习模型(RobustSleepNet和SUMOv2)实现全自动化睡眠分析,成功复现了专家基于双相情感障碍的研究发现,表明全自动化方法可促进大规模睡眠研究。

Comments 12 pages, 4 figures, 2 tables

详情
Journal ref
Scientific Reports 16, 16014 (2026)
AI中文摘要

睡眠分析的自动化,包括宏观结构(睡眠分期)和微观结构(例如睡眠纺锤波)元素,有望实现大规模睡眠研究,并减少由于评分者间不一致导致的差异。虽然睡眠分期和纺锤波检测等单个步骤已被分别研究,但多步骤睡眠分析自动化的可行性仍不清楚。在本案例研究中,我们评估了使用经过验证的机器学习模型进行睡眠分期(RobustSleepNet)和后续纺锤波检测(SUMOv2)的全自动化分析是否能够复现基于专家的双相情感障碍研究结果。自动化分析定性地复现了专家研究的关键发现,包括双相情感障碍患者与健康对照之间快速纺锤波密度的显著差异,在几分钟内完成了以前需要数月手动完成的工作。虽然自动化分析的结果在定量上与专家研究存在差异,可能是由于专家评分者之间或评分者与模型之间的偏差,但各个模型在睡眠分期和纺锤波检测方面的表现达到或超过了评分者间一致性。我们的结果表明,全自动化方法具有促进大规模睡眠研究的潜力。我们通过共享代码并引入SomnoBot(一个保护隐私的睡眠分析平台),公开提供自动化分析中使用的工具。

英文摘要

Automation of sleep analysis, including both macrostructural (sleep stages) and microstructural (e.g., sleep spindles) elements, promises to enable large-scale sleep studies and to reduce variance due to inter-rater incongruencies. While individual steps, such as sleep staging and spindle detection, have been studied separately, the feasibility of automating multi-step sleep analysis remains unclear. In this case study, we evaluate whether a fully automated analysis using validated machine learning models for sleep staging (RobustSleepNet) and subsequent spindle detection (SUMOv2) can replicate findings from an expert-based study of bipolar disorder. The automated analysis qualitatively reproduced key findings from the expert-based study, including significant differences in fast spindle densities between bipolar patients and healthy controls, accomplishing in minutes what previously took months to complete manually. While the results of the automated analysis differed quantitatively from the expert-based study, possibly due to biases between expert raters or between raters and the models, the models individually performed at or above inter-rater agreement for both sleep staging and spindle detection. Our results demonstrate that fully automated approaches have the potential to facilitate large-scale sleep research. We are providing public access to the tools used in our automated analysis by sharing our code and introducing SomnoBot, a privacy-preserving sleep analysis platform.

2504.05181 2026-05-26 cs.IR cs.AI cs.DL cs.LG 版本更新

Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval

轻量级直接文档相关性优化用于生成式信息检索

Kidist Amde Mekonnen, Yubao Tang, Maarten de Rijke

发表机构 * University of Amsterdam(阿姆斯特丹大学)

AI总结 提出直接文档相关性优化(DDRO)方法,通过成对排序直接对齐令牌级文档ID生成与文档级相关性估计,无需显式奖励建模和强化学习,在MS MARCO和Natural Questions上分别提升MRR@10 7.4%和19.9%。

Comments 12 pages, 3 figures. SIGIR '25 Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval July 13--18, 2025 Padua, Italy. Code and pretrained models available at: https://github.com/kidist-amde/ddro/

详情
Journal ref
Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '25), pages 1327-1338, 2025
AI中文摘要

生成式信息检索(GenIR)是一种有前景的神经检索范式,它将文档检索形式化为文档标识符(docid)生成任务,允许朝着统一的全局检索目标进行端到端优化。然而,现有的GenIR模型存在令牌级错位问题,即训练用于预测下一个令牌的模型往往无法有效捕捉文档级相关性。虽然基于强化学习的方法(如相关性反馈强化学习(RLRF))旨在通过奖励建模解决这种错位,但它们引入了显著的复杂性,需要优化辅助奖励函数,然后进行强化微调,这在计算上昂贵且往往不稳定。为了解决这些挑战,我们提出了直接文档相关性优化(DDRO),它通过成对排序的直接优化,将令牌级docid生成与文档级相关性估计对齐,无需显式的奖励建模和强化学习。在包括MS MARCO文档和Natural Questions在内的基准数据集上的实验结果表明,DDRO优于基于强化学习的方法,在MS MARCO上MRR@10提升了7.4%,在Natural Questions上提升了19.9%。这些发现凸显了DDRO通过简化优化方法增强检索效果的潜力。通过将对齐问题框架化为直接优化问题,DDRO简化了GenIR模型的排序优化流程,同时为基于强化学习的方法提供了一种可行的替代方案。

英文摘要

Generative information retrieval (GenIR) is a promising neural retrieval paradigm that formulates document retrieval as a document identifier (docid) generation task, allowing for end-to-end optimization toward a unified global retrieval objective. However, existing GenIR models suffer from token-level misalignment, where models trained to predict the next token often fail to capture document-level relevance effectively. While reinforcement learning-based methods, such as reinforcement learning from relevance feedback (RLRF), aim to address this misalignment through reward modeling, they introduce significant complexity, requiring the optimization of an auxiliary reward function followed by reinforcement fine-tuning, which is computationally expensive and often unstable. To address these challenges, we propose direct document relevance optimization (DDRO), which aligns token-level docid generation with document-level relevance estimation through direct optimization via pairwise ranking, eliminating the need for explicit reward modeling and reinforcement learning. Experimental results on benchmark datasets, including MS MARCO document and Natural Questions, show that DDRO outperforms reinforcement learning-based methods, achieving a 7.4% improvement in MRR@10 for MS MARCO and a 19.9% improvement for Natural Questions. These findings highlight DDRO's potential to enhance retrieval effectiveness with a simplified optimization approach. By framing alignment as a direct optimization problem, DDRO simplifies the ranking optimization pipeline of GenIR models while offering a viable alternative to reinforcement learning-based methods.
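The two quantities this abstract relies on, MRR@10 as the evaluation metric and a pairwise ranking objective, can be written down compactly. The sketch below is a generic illustration, not the DDRO implementation: `mrr_at_k` and `pairwise_logistic_loss` are hypothetical names, and the loss is the plain logistic pairwise form, assuming each docid already has a scalar score (for example a sequence log-likelihood).

```python
import math


def mrr_at_k(ranked_lists, relevant_sets, k=10):
    """Mean reciprocal rank of the first relevant docid within the top k."""
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, docid in enumerate(ranked[:k], start=1):
            if docid in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_lists)


def pairwise_logistic_loss(score_pos, score_neg):
    """-log sigmoid(score_pos - score_neg): small when the relevant docid
    is scored well above the non-relevant one."""
    margin = score_pos - score_neg
    # log(1 + exp(-margin)), written to stay finite for large |margin|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


score = mrr_at_k(
    [["d3", "d1"], ["d9", "d8", "d7"], ["a"]],
    [{"d1"}, {"d7"}, {"z"}],
)
```

A query whose relevant document is missing from the top 10 contributes zero, so MRR@10 rewards placing one relevant document as high as possible rather than retrieving many.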

2504.05108 2026-05-26 cs.AI cs.LG cs.NE 版本更新

Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning

利用大语言模型发现算法:进化搜索遇见强化学习

Anja Surina, Amin Mansouri, Lars Quaedvlieg, Amal Seddas, Maryna Viazovska, Emmanuel Abbe, Caglar Gulcehre

发表机构 * EPFL(洛桑联邦理工学院) Apple(苹果公司)

AI总结 提出通过强化学习微调持续优化大语言模型,结合进化搜索加速发现更优算法,在组合优化任务上验证有效性。

Comments 34 pages

详情
AI中文摘要

发现解决复杂问题的高效算法一直是数学和计算机科学中的重大挑战,多年来需要大量人类专业知识。近期,基于大语言模型(LLMs)的进化搜索在加速跨领域算法发现方面展现出潜力,特别是在数学和优化领域。然而,现有方法将LLM视为静态生成器,错过了利用进化探索获得的信号更新模型的机会。在这项工作中,我们提出通过强化学习(RL)微调持续优化搜索算子——即LLM,从而增强基于LLM的进化搜索。我们的方法利用进化搜索作为探索策略来发现改进的算法,而RL则基于这些发现优化LLM策略。我们在组合优化任务上的实验表明,将RL与进化搜索相结合加速了更优算法的发现,展示了RL增强的进化策略在算法设计中的潜力。

英文摘要

Discovering efficient algorithms for solving complex problems has been an outstanding challenge in mathematics and computer science, requiring substantial human expertise over the years. Recent advancements in evolutionary search with large language models (LLMs) have shown promise in accelerating the discovery of algorithms across various domains, particularly in mathematics and optimization. However, existing approaches treat the LLM as a static generator, missing the opportunity to update the model with the signal obtained from evolutionary exploration. In this work, we propose to augment LLM-based evolutionary search by continuously refining the search operator - the LLM - through reinforcement learning (RL) fine-tuning. Our method leverages evolutionary search as an exploration strategy to discover improved algorithms, while RL optimizes the LLM policy based on these discoveries. Our experiments on combinatorial optimization tasks demonstrate that integrating RL with evolutionary search accelerates the discovery of superior algorithms, showcasing the potential of RL-enhanced evolutionary strategies for algorithm design.

2503.19605 2026-05-26 cs.LG cs.CL math.ST stat.TH 版本更新

Lean Formalization of Generalization Error Bound by Rademacher Complexity and Dudley's Entropy Integral

Rademacher复杂度和Dudley熵积分的泛化误差界的Lean形式化

Sho Sonoda, Kazumi Kasaura, Yuma Mizuno, Kei Tsukamoto, Naoto Onda

发表机构 * RIKEN AIP(日本理化学研究所AIP) CyberAgent Inc.(CyberAgent公司) OMRON SINIC X Corporation(OMRON SINIC X株式会社) University College Cork(科克大学) The University of Tokyo(东京大学)

AI总结 本文在Lean 4中形式化了基于Rademacher复杂度的泛化误差界,通过形式化对称化论证、有界差异分析和McDiarmid不等式,并扩展到可数假设类及可分离拓扑索引集,最后应用得到线性预测器的经验Rademacher界和Dudley熵积分界。

Comments accepted at ITP2026

详情
AI中文摘要

理解和证明机器学习算法的泛化性能——即从训练误差获得测试误差的理论估计——是统计学习理论的核心主题。在用于推导此类保证的众多复杂度度量中,Rademacher复杂度提供了尖锐的、数据相关的界,其适用范围远超经典的VC维理论。在本研究中,我们基于Mathlib库中可用的测度论概率论,在Lean 4中形式化了Rademacher复杂度的泛化误差界。我们的开发提供了一个经过机械检查的流水线,从经验和期望Rademacher复杂度的定义开始,经过形式化的对称化论证和有界差异分析,通过形式化证明的McDiarmid不等式得到高概率一致偏差界。一个关键的技术贡献是可重用机制,通过归约到可数稠密子集,将结果从可数假设类(其中上确界的可测性在Mathlib中直接成立)提升到可分离拓扑索引集。作为抽象定理的工作应用,我们机械化了$\ell_2$和$\ell_1$正则化下线性预测器的标准经验Rademacher界,并且我们还形式化了基于覆盖数和链式构造的Dudley型熵积分界。

英文摘要

Understanding and certifying the generalization performance of machine learning algorithms -- i.e. obtaining theoretical estimates of the test error from the training error -- is a central theme of statistical learning theory. Among the many complexity measures used to derive such guarantees, Rademacher complexity yields sharp, data-dependent bounds that apply well beyond classical VC-dimension theory. In this study, we formalize the generalization error bound by Rademacher complexity in Lean 4, building on measure-theoretic probability theory available in the Mathlib library. Our development provides a mechanically-checked pipeline from the definitions of empirical and expected Rademacher complexity, through a formal symmetrization argument and a bounded-differences analysis, to high-probability uniform deviation bounds via a formally proved McDiarmid inequality. A key technical contribution is a reusable mechanism for lifting results from countable hypothesis classes (where measurability of suprema is straightforward in Mathlib) to separable topological index sets via a reduction to a countable dense subset. As worked applications of the abstract theorem, we mechanize standard empirical Rademacher bounds for linear predictors under $\ell_2$ and $\ell_1$ regularizations, and we also formalize a Dudley-type entropy integral bound based on covering numbers and a chaining construction.
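For the $\ell_2$-bounded linear class $\{x \mapsto \langle w, x\rangle : \|w\|_2 \le B\}$, the supremum inside the empirical Rademacher complexity has a closed form: it equals $(B/n)\,\|\sum_i \sigma_i x_i\|_2$. Jensen's inequality then gives the standard bound $(B/n)\sqrt{\sum_i \|x_i\|_2^2}$. The sketch below is a numerical sanity check of that textbook statement by Monte Carlo over random signs. It is unrelated to the Lean development itself, and the function names are hypothetical.

```python
import math
import random


def empirical_rademacher_l2(xs, radius, num_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of
    {x -> <w, x> : ||w||_2 <= radius} on the sample xs."""
    rng = random.Random(seed)
    n, d = len(xs), len(xs[0])
    total = 0.0
    for _ in range(num_draws):
        signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        # Signed sum of the sample; the supremum over w is radius * its norm.
        s = [sum(sig * x[j] for sig, x in zip(signs, xs)) for j in range(d)]
        total += math.sqrt(sum(v * v for v in s))
    return radius * total / (num_draws * n)


def l2_bound(xs, radius):
    """Closed-form upper bound radius * sqrt(sum_i ||x_i||^2) / n."""
    return radius * math.sqrt(sum(v * v for x in xs for v in x)) / len(xs)


estimate = empirical_rademacher_l2([[1.0, 0.0], [1.0, 0.0]], radius=1.0)
bound = l2_bound([[1.0, 0.0], [1.0, 0.0]], radius=1.0)
```

For orthogonal sample points the norm of the signed sum does not depend on the signs, so the estimate and the bound coincide. For repeated points, as in the usage lines above, the bound is strictly loose.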

2503.01684 2026-05-26 nucl-th cs.LG physics.comp-ph stat.ML 版本更新

An Efficient Learning Method to Connect Observables

一种连接可观测量的高效学习方法

Hang Yu, Takayuki Miyagi

发表机构 * Center for Computational Sciences, University of Tsukuba(筑波大学计算科学中心)

AI总结 提出多参数本征值问题(MEP)仿真器,通过连接不同仿真器实现从可观测量到可观测量的直接预测,并利用特征向量延续(EC)和参数矩阵模型(PMM)数据进行训练,在一维格点模拟和$^{28}$O示例中验证了性能与预测概率分布获取的简便性。

Comments 5+2 pages, 4 figures, matched published version. Shared data and toy model code in the source file (shared.zip)

详情
Journal ref
Phys. Rev. Lett. 136, 202502 (2026)
AI中文摘要

构建快速准确的替代模型是许多主题中做出稳健预测的关键要素。我们引入了一种新模型,即多参数本征值问题(MEP)仿真器。新方法连接了仿真器,并可以直接从可观测量到可观测量进行预测。我们展示了MEP仿真器可以使用来自特征向量延续(EC)和参数矩阵模型(PMM)仿真器的数据进行训练。在一维格点上的简单模拟证实了MEP仿真器的性能。以$^{28}$O为例,我们还证明了通过新仿真器可以轻松获得目标可观测量的预测概率分布。

英文摘要

Constructing fast and accurate surrogate models is a key ingredient for making robust predictions in many fields. We introduce a new model, the Multiparameter Eigenvalue Problem (MEP) emulator. The new method connects emulators and can make predictions directly from observables to observables. We show that the MEP emulator can be trained with data from Eigenvector Continuation (EC) and Parametric Matrix Model (PMM) emulators. A simple simulation on a one-dimensional lattice confirms the performance of the MEP emulator. Using $^{28}$O as an example, we also demonstrate that the predictive probability distribution of the target observables can be easily obtained through the new emulator.

2502.06018 2026-05-26 cs.LG cs.AI 版本更新

Kolmogorov-Arnold Fourier Networks

Kolmogorov-Arnold 傅里叶网络

Jusheng Zhang, Yijia Fan, Kaitong Cai, Keze Wang, Wenhao Wang

发表机构 * Sun Yat-sen University(中山大学) Vast Intelligence Lab(远见实验室)

AI总结 针对KAN网络参数爆炸和高维任务中高频特征捕获能力不足的问题,提出Kolmogorov-Arnold傅里叶网络(KAF),通过谱重参数化将局部B样条表示转换为全局自适应谱表示,引入可训练随机傅里叶特征和自适应混合GELU-傅里叶激活机制,在CV、NLP、音频和PDE求解任务上取得最优性能。

Comments Code:https://github.com/kolmogorovArnoldFourierNetwork/KAF

详情
AI中文摘要

尽管基于Kolmogorov-Arnold的可解释网络(KAN)具有强大的理论表达能力,但在高维任务中面临严重的参数爆炸和捕获高频特征能力有限的问题。为解决这些问题,我们提出了Kolmogorov-Arnold傅里叶网络(KAF),通过谱重参数化从根本上重新定义了KAN范式。我们的主要贡献包括:(1)提出从局部的、基于网格的B样条表示到全局的、自适应的谱表示的基础基变换。这一转变改变了网络的归纳偏置,将参数复杂度从$O(G)$降低到$O(1)$,同时保持表达能力;(2)引入通过谱对齐策略初始化的可训练随机傅里叶特征(RFF),使模型能够打破固定核的平滑性限制,准确捕获高频分量;(3)实现自适应混合GELU-傅里叶激活机制,在训练过程中逐步增强频率表示。大量实验证明了KAF在计算机视觉(CV)、自然语言处理(NLP)、音频和偏微分方程(PDE)求解任务上的优越性,以更高的效率实现了最先进的性能。代码可在https://github.com/kolmogorovArnoldFourierNetwork/KAF获取。

英文摘要

Although Kolmogorov-Arnold-based interpretable networks (KANs) possess strong theoretical expressiveness, they suffer from severe parameter explosion and limited ability to capture high-frequency features in high-dimensional tasks. To address these issues, we propose the Kolmogorov-Arnold Fourier Network (KAF), which fundamentally redefines the KAN paradigm through spectral reparameterization. Our key contributions include: (1) proposing a fundamental basis transformation from the local, grid-based B-spline representation to a global, adaptive spectral representation. This shift changes the network's inductive bias, reducing parameter complexity from $O(G)$ to $O(1)$ while preserving expressiveness; (2) introducing trainable Random Fourier Features (RFF) initialized via a spectral alignment strategy, which allows the model to break the smoothness limitation of fixed kernels and accurately capture high-frequency components; and (3) implementing an adaptive hybrid GELU-Fourier activation mechanism that progressively enhances frequency representation during training. Comprehensive experiments demonstrate the superiority of KAF across computer vision (CV), natural language processing (NLP), audio, and partial differential equation (PDE) solving tasks, achieving state-of-the-art performance with improved efficiency. The code is available at https://github.com/kolmogorovArnoldFourierNetwork/KAF.

2501.19389 2026-05-26 cs.LG 版本更新

Federated Sketching LoRA: A Flexible Framework for Heterogeneous Collaborative Fine-Tuning of LLMs

联邦草图LoRA:一种用于异构协作微调大语言模型的灵活框架

Wenzhi Fang, Dong-Jun Han, Liangqi Yuan, Seyyedali Hosseinalipour, Christopher G. Brinton

发表机构 * Purdue University(普渡大学) Yonsei University(延世大学) University at Buffalo-SUNY(布法罗大学- SUNY)

AI总结 针对资源受限客户端上大语言模型微调中的异构性问题,提出联邦草图LoRA(FSLoRA),通过草图机制让客户端选择性更新服务器维护的全局LoRA模块子矩阵,并利用草图比例灵活适应客户端约束,提供收敛性分析,实验表明优于基线并提升训练效率。

Comments We propose Federated Sketching LoRA (FSLoRA), a theoretically grounded methodology for collaborative LLM fine-tuning that retains LoRA's flexibility while adapting to the communication and computational capabilities of individual clients

详情
AI中文摘要

在资源受限的客户端上微调大语言模型(LLMs)仍然是一个具有挑战性的问题。最近的工作将低秩适应(LoRA)技术与联邦微调相结合,以缓解与客户端模型大小和数据稀缺相关的挑战。然而,资源的异构性仍然是一个关键瓶颈:虽然更高秩的模块通常能提升性能,但不同的客户端能力限制了LoRA可行的秩范围。现有试图解决该问题的方法要么缺乏分析依据,要么增加额外的计算开销,为高效且理论扎实的解决方案留下了很大空白。为了解决这些挑战,我们提出了联邦草图LoRA(FSLoRA),它利用草图机制使客户端能够选择性地更新服务器维护的全局LoRA模块的子矩阵。通过调整决定客户端子矩阵秩的草图比例,FSLoRA灵活地适应客户端特定的通信和计算约束。我们提供了FSLoRA的严格收敛性分析,刻画了草图比例如何影响收敛速度。通过大量实验,我们证明FSLoRA优于基线,并在保持稳定收敛的同时显著提高了训练效率。

英文摘要

Fine-tuning large language models (LLMs) on resource-constrained clients remains a challenging problem. Recent works have fused low-rank adaptation (LoRA) techniques with federated fine-tuning to mitigate challenges associated with client model sizes and data scarcity. Still, the heterogeneity of resources remains a critical bottleneck: while higher-rank modules generally enhance performance, varying client capabilities constrain LoRA's feasible rank range. Existing approaches attempting to resolve this issue either lack analytical justification or impose additional computational overhead, leaving a wide gap for efficient and theoretically-grounded solutions. To address these challenges, we propose federated sketching LoRA (FSLoRA), which leverages a sketching mechanism to enable clients to selectively update submatrices of global LoRA modules maintained by the server. By adjusting the sketching ratios, which determine the ranks of the submatrices on the clients, FSLoRA flexibly adapts to client-specific communication and computational constraints. We provide a rigorous convergence analysis of FSLoRA that characterizes how the sketching ratios affect the convergence rate. Through extensive experiments, we demonstrate that FSLoRA outperforms baselines and significantly improves training efficiency while preserving stable convergence.

2501.18278 2026-05-26 cs.LG 版本更新

ReactEmbed: A Plug-and-Play Module for Unifying Protein-Molecule Representations Guided by Biochemical Reaction Networks

ReactEmbed: 一种基于生化反应网络统一蛋白质-分子表示的可插拔模块

Amitay Sicherman, Kira Radinsky

发表机构 * Technion(以色列理工学院)

AI总结 提出ReactEmbed模块,利用生化反应网络对齐蛋白质和分子嵌入,实现跨域统一表示,无需重新训练。

详情
AI中文摘要

最先进的模型将蛋白质和分子表示在独立的嵌入流形中,限制了对系统性生物过程的建模。我们提出ReactEmbed,一个轻量级、可插拔的模块,弥合了这一差距。ReactEmbed利用生化反应网络作为功能上下文的来源,基于共同参与反应定义共享功能范围的原则。该模块使用加权反应图和专门的采样策略,将来自ESM-3和MolFormer等模型的冻结嵌入对齐到统一空间。这一过程丰富了单模态嵌入,并在跨域基准上实现了强性能。ReactEmbed提供了一种无需昂贵重新训练即可统一生物表示的实用方法。代码和数据库已开放使用。

英文摘要

State-of-the-art models represent proteins and molecules in separate embedding manifolds, limiting the modeling of systemic biological processes. We introduce ReactEmbed, a lightweight, plug-and-play module that bridges this gap. ReactEmbed leverages biochemical reaction networks as a source of functional context, based on the principle that co-participation in reactions defines a shared functional scope. The module aligns frozen embeddings from models like ESM-3 and MolFormer into a unified space using a weighted reaction graph and a specialized sampling strategy. This process enriches unimodal embeddings and enables strong performance on cross-domain benchmarks. ReactEmbed offers a practical method to unify biological representations without costly retraining. The code and database are available for open use at https://github.com/amitaysicherman/ReactEmbeded.

2501.18196 2026-05-26 cs.LG 版本更新

GDformer: Going Beyond Subsequence Isolation for Multivariate Time Series Anomaly Detection

GDformer:超越子序列隔离的多变量时间序列异常检测

Qingxiang Liu, Xiaoliang Luo, Chenghao Liu, Sheng Sun, Di Yao, Lvchun Wang, Wei Yu, Yuxuan Liang

发表机构 * The Hong Kong University of Science and Technology (Guangzhou)(香港科技大学(广州)) China Mobile (Jiangxi) Virtual Reality Technology Co., Ltd.(中国移动(江西)虚拟现实技术有限公司) Salesforce AI Research(Salesforce AI研究院) Institute of Computing Technology, Chinese Academy of Sciences(中国科学院计算技术研究所)

AI总结 提出全局字典增强Transformer(GDformer),通过基于字典的交叉注意力机制学习整个序列中所有正常点的全局表示,并利用原型捕获正常点-全局相关权重分布,实现基于表示相似性的统一检测准则,在五个基准数据集上达到最先进性能。

详情
AI中文摘要

无监督的多变量时间序列异常检测是一项具有挑战性的任务,因为需要在不访问异常点的情况下推导出紧凑的检测标准。现有方法主要基于重构误差或关联分歧,两者都局限于有限视野的孤立子序列,难以提供统一的序列级标准。在本文中,我们提出了全局字典增强Transformer(GDformer),采用改进的基于字典的交叉注意力机制,以培养整个序列中所有正常点共享的全局表示。相应地,交叉注意力图反映了点与全局表示之间的相关权重,这自然导致了基于表示相似性的检测标准。为了促进更紧凑的检测边界,引入了原型来捕获正常点-全局相关权重的分布。GDformer在五个真实世界基准数据集上一致实现了最先进的无监督异常检测性能。进一步的实验验证了全局字典在不同数据集之间具有良好的可迁移性。

英文摘要

Unsupervised anomaly detection in multivariate time series is a challenging task, because a compact detection criterion must be derived without access to the anomaly points. Existing methods are mainly based on reconstruction error or association divergence, both of which are confined to isolated subsequences with limited horizons and can hardly provide a unified series-level criterion. In this paper, we propose the Global Dictionary-enhanced Transformer (GDformer) with a renovated dictionary-based cross-attention mechanism to cultivate the global representations shared by all normal points in the entire series. Accordingly, the cross-attention maps reflect the correlation weights between each point and the global representations, which naturally leads to a representation-wise, similarity-based detection criterion. To foster a more compact detection boundary, prototypes are introduced to capture the distribution of normal point-global correlation weights. GDformer consistently achieves state-of-the-art unsupervised anomaly detection performance on five real-world benchmark datasets. Further experiments validate that the global dictionary transfers well across datasets.

2501.15131 2026-05-26 math.OC cs.LG 版本更新

Split-Merge: A Difference-based Approach for Dominant Eigenvalue Problem

分裂-合并:一种基于差异的主特征值问题方法

Xiaozhi Liu, Mengmeng Song, Yong Xia

发表机构 * LMIB of the Ministry of Education, School of Mathematical Sciences, Beihang University(数学、信息与行为教育部重点实验室,北京航空航天大学数学科学学院) National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Northeastern University(工业智能与系统优化国家级前沿科学中心,东北大学)

AI总结 针对对称半正定矩阵的主特征对计算问题,提出一种基于差异的无约束优化框架,并设计分裂-合并算法,实现无矩阵、无参数迭代,收敛速度优于经典幂法。

详情
AI中文摘要

对称半正定矩阵的主特征对的计算是数值优化的基础。本文将范式从经典瑞利商转变为无约束差异公式,其全局最优解恢复主特征对。在此框架下,我们证明步长 $α\in (0, 1)$ 的常数步长梯度下降几乎必然以局部线性速率收敛到全局最优解。该分析从而将经典幂法重新解释为保守特例 $α=1/2$,并严格建立了其渐近次优性。为了推进这一一阶方案,我们基于最大化-最小化原理提出了分裂-合并算法。在分裂矩阵后,我们引入辅助向量以有效合并分解因子,得到一种无矩阵、无参数的迭代,该迭代捕获更紧的曲率信息。我们证明分裂-合并几乎必然收敛到全局最小化器,并表明该迭代展现出一种谱剥离机制,抑制目标特征空间,可能超越幂法的静态线性速率。在合成和真实数据集上的数值评估证实,我们的方法具有可扩展的效率,相比幂法实现超过 $10\times$ 的加速,性能与子空间迭代相当。

英文摘要

The computation of the dominant eigenpair for symmetric positive semidefinite matrices is fundamental in numerical optimization. This work shifts the paradigm from the classical Rayleigh quotient to an unconstrained difference formulation, whose global optimum recovers the dominant eigenpair. Within this framework, we prove that gradient descent with a constant step-size $α\in (0, 1)$ converges almost surely to the global optimum at a local linear rate. This analysis thereby reinterprets the classical power method as the conservative special case $α=1/2$ and rigorously establishes its asymptotic sub-optimality. To advance this first-order scheme, we propose the Split-Merge algorithm based on the majorization-minimization principle. After splitting the matrix, we introduce auxiliary vectors to effectively merge the decomposition factors, resulting in a matrix-free and parameter-free iteration that captures tighter curvature information. We establish that Split-Merge converges almost surely to a global minimizer, and show that the iteration exhibits a spectral peeling mechanism that suppresses the targeted eigenspace, potentially surpassing the static linear rate of power iterations. Numerical evaluations across synthetic and real-world datasets confirm that our method has scalable efficiency, achieving speed-ups exceeding $10\times$ over the power method, with performance comparable to subspace iterations.
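
For reference, the classical power method that the abstract uses as its baseline can be sketched as below. This is the textbook iteration, not the Split-Merge algorithm, whose update rule is not given in the abstract. The starting vector, tolerance, and iteration cap are illustrative assumptions.

```python
import math

def power_method(A, num_iters=500, tol=1e-12):
    """Textbook power iteration for a symmetric PSD matrix A (list of rows).

    Returns (eigenvalue estimate, unit eigenvector estimate).
    """
    n = len(A)
    x = [1.0 / math.sqrt(n)] * n          # assumed start: normalized all-ones vector
    lam = 0.0
    for _ in range(num_iters):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            break                          # x lies in the null space of A
        x_new = [v / norm for v in y]
        # Rayleigh quotient of the normalized iterate
        lam_new = sum(
            x_new[i] * sum(A[i][j] * x_new[j] for j in range(n)) for i in range(n)
        )
        converged = abs(lam_new - lam) < tol
        x, lam = x_new, lam_new
        if converged:
            break
    return lam, x

A = [[4.0, 1.0], [1.0, 3.0]]               # eigenvalues (7 ± sqrt(5)) / 2
lam, v = power_method(A)
```

The convergence rate of this baseline is governed by the ratio of the two largest eigenvalues, which is the fixed linear rate the abstract says Split-Merge can surpass.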

2410.10652 2026-05-26 q-bio.QM cs.LG 版本更新

Querying structural and functional niches on spatial transcriptomics data

查询空间转录组数据中的结构和功能生态位

Mo Chen, Minsheng Hao, Xinquan Liu, Lin Deng, Peng Liu, Chen Li, Dongfang Wang, Kui Hua, Liang Guo, Xuegong Zhang, Lei Wei

发表机构 * MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University(生物信息学教育部重点实验室和北京信息科学与技术国家研究中心生物信息学研究部,自动化系,清华大学) Center for Synthetic and Systems Biology, School of Life Sciences and School of Medicine, Tsinghua University(合成与系统生物学中心,生命科学学院和医学院,清华大学) Department of Thoracic Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College(胸外科,国家癌症中心/国家肿瘤临床医学研究中心/肿瘤医院,中国医学科学院和北京协和医学院) Peking Union Medical College, Chinese Academy of Medical Sciences(北京协和医学院,中国医学科学院) Biomedical Pioneering Innovation Center (BIOPIC), Peking University(生物医学前沿创新中心(BIOPIC),北京大学) Cancer Research UK Cambridge Institute, University of Cambridge(英国癌症研究中心剑桥研究所,剑桥大学) Department of Immunology, School of Basic Medical Sciences, Harbin Medical University(免疫学系,基础医学院,哈尔滨医科大学) Zhongguancun Academy, Beijing, China(中关村学院,北京,中国) Zhongguancun Institute of Artificial Intelligence, Beijing, China(中关村人工智能研究院,北京,中国)

AI总结 提出QueST方法,通过子图建模和对比学习查询空间转录组样本中的相似生态位,有效捕捉异质环境中的生态位结构并跨平台泛化。

详情
AI中文摘要

多细胞生物中的细胞协调形成结构和功能生态位。随着空间转录组学(ST)能够在空间背景下进行基因表达谱分析,已揭示空间生态位在生理和病理过程中作为内聚且重复出现的单位。这些观察表明,由保守生态位模式编码的普遍组织原则,并呼吁一种超越当前计算工具的基于查询的生态位分析范式。在这项工作中,我们定义了生态位查询任务,即给定感兴趣生态位(NOI),在ST样本中识别相似生态位。我们进一步开发了QueST,一种专门解决该任务的方法。QueST将每个生态位建模为子图,使用对比学习学习判别性生态位嵌入,并引入对抗训练以减轻批次效应。在模拟和基准数据集中,QueST优于为生态位查询改造的现有方法,准确捕捉异质环境中的生态位结构,并展现出跨不同测序平台的强泛化能力。应用于肾癌和肺癌中的三级淋巴结构,QueST揭示了与患者预后相关的功能不同生态位,并发现了跨癌症类型的保守和分歧空间结构。应用于组合空间扰动数据集,QueST展示了完整的从头发现导向工作流程,通过查询表征了先前未解析的肿瘤结节。这些结果表明,QueST能够跨样本进行系统、定量的空间生态位分析,为剖析健康和疾病中的空间组织结构提供了强大工具。

英文摘要

Cells in multicellular organisms coordinate to form structural and functional niches. With spatial transcriptomics (ST) enabling gene expression profiling in spatial contexts, it has been revealed that spatial niches serve as cohesive and recurrent units in physiological and pathological processes. These observations suggest universal tissue organization principles encoded by conserved niche patterns, and call for a query-based niche analytical paradigm beyond current computational tools. In this work, we defined the niche-query task, which is to identify similar niches across ST samples given a niche of interest (NOI). We further developed QueST, a specialized method for solving this task. QueST models each niche as a subgraph, uses contrastive learning to learn discriminative niche embeddings, and incorporates adversarial training to mitigate batch effects. In simulations and benchmark datasets, QueST outperformed existing methods repurposed for niche querying, accurately capturing niche structures in heterogeneous environments and demonstrating strong generalizability across diverse sequencing platforms. Applied to tertiary lymphoid structures in renal and lung cancers, QueST revealed functionally distinct niches associated with patient prognosis and uncovered conserved and divergent spatial architectures across cancer types. Applied to a combinatorial spatial perturbation dataset, QueST demonstrated a complete de novo discovery-oriented workflow, characterizing previously unresolved tumor nodules through querying. These results demonstrate that QueST enables systematic, quantitative profiling of spatial niches across samples, providing a powerful tool to dissect spatial tissue architecture in health and disease.

2409.19727 2026-05-26 cs.LG cs.CV 版本更新

Investigating the Effect of Network Pruning on Performance and Interpretability

探究网络剪枝对性能与可解释性的影响

Jonathan von Rad, Florian Seuffert

发表机构 * AI Center, Neural Information Processing Group, University of Tübingen(图宾根大学人工智能中心、神经信息处理组)

AI总结 本文通过系统应用非结构化、结构化剪枝及连接稀疏方法,研究不同剪枝技术对GoogLeNet在ImageNet验证集上的分类性能和可解释性的影响,发现充分重训练后性能可接近甚至超越原始网络,且可解释性评分与剪枝率无显著关联。

Comments 4 pages, 6 figures

详情
AI中文摘要

深度神经网络(DNN)通常对其任务而言是过参数化的,可以通过移除权重进行大幅压缩,这一过程称为剪枝。我们研究了不同剪枝技术对GoogLeNet的分类性能和可解释性的影响。我们系统地应用非结构化剪枝、结构化剪枝以及连接稀疏性(输入权重剪枝)方法,并分析这些方法对网络在ImageNet验证集上性能的影响。我们还比较了不同的重训练策略,如迭代剪枝和一次性剪枝。我们发现,通过足够的重训练轮次,网络的性能可以接近默认GoogLeNet的性能——甚至在某些情况下超越它。为了评估可解释性,我们采用了Zimmermann等人开发的机制可解释性评分(MIS)。我们的实验表明,当使用MIS作为度量时,可解释性与剪枝率之间没有显著关系。此外,我们观察到,准确率极低的网络仍然可以获得高MIS分数,这表明MIS可能并不总是与可解释性的直观概念(例如理解正确决策的基础)一致。

英文摘要

Deep Neural Networks (DNNs) are often over-parameterized for their tasks and can be compressed quite drastically by removing weights, a process called pruning. We investigate the impact of different pruning techniques on the classification performance and interpretability of GoogLeNet. We systematically apply unstructured and structured pruning, as well as connection sparsity (pruning of input weights) methods to the network and analyze the outcomes regarding the network's performance on the validation set of ImageNet. We also compare different retraining strategies, such as iterative pruning and one-shot pruning. We find that with sufficient retraining epochs, the performance of the networks can approximate the performance of the default GoogLeNet - and even surpass it in some cases. To assess interpretability, we employ the Mechanistic Interpretability Score (MIS) developed by Zimmermann et al. Our experiments reveal that there is no significant relationship between interpretability and pruning rate when using MIS as a measure. Additionally, we observe that networks with extremely low accuracy can still achieve high MIS scores, suggesting that the MIS may not always align with intuitive notions of interpretability, such as understanding the basis of correct decisions.
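
As a concrete picture of unstructured pruning, the sketch below zeroes the smallest-magnitude weights of a single weight matrix. Magnitude is a common pruning criterion, but the abstract does not say which criterion the authors use, so treat the rule and the function name as assumptions.

```python
def magnitude_prune(weights, sparsity):
    """Zero out roughly the `sparsity` fraction of weights with smallest |w|.

    `weights` is a list of rows. Ties at the threshold are all pruned, so the
    achieved sparsity can slightly exceed the requested one.
    """
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(sparsity * len(flat))
    if k == 0:
        return [row[:] for row in weights]
    threshold = flat[k - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]

W = [[0.1, -0.5], [0.9, -0.05]]
W_pruned = magnitude_prune(W, 0.5)   # removes the two smallest-magnitude entries
```

In one-shot pruning this would be applied once before retraining. An iterative schedule would alternate smaller pruning steps with retraining.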

2401.11963 2026-05-26 cs.NE cs.AI cs.LG 版本更新

Bridging Evolutionary Algorithms and Reinforcement Learning: A Comprehensive Survey on Hybrid Algorithms

桥接进化算法与强化学习:混合算法的全面综述

Pengyi Li, Jianye Hao, Hongyao Tang, Xian Fu, Yan Zheng, Ke Tang

发表机构 * College of Intelligence and Computing, Tianjin University(天津大学智能与计算学院) Montreal Institute of Learning Algorithms (MILA)(蒙特利尔学习算法研究所) Department of Computer Science and Engineering, Southern University of Science and Technology(南方科技大学计算机科学与工程系)

AI总结 本文全面综述了进化强化学习(ERL)领域,将进化算法(EA)与强化学习(RL)融合,系统总结了三种主要研究方向:EA辅助RL优化、RL辅助EA优化以及EA与RL协同优化,并分析了各分支解决的问题及未来挑战。

Comments New Version, add more methods

详情
AI中文摘要

进化强化学习(ERL)将进化算法(EA)和强化学习(RL)相结合用于优化,已展现出显著的性能提升。通过融合这两种方法,ERL已成为一个有前景的研究方向。本综述全面概述了ERL中的不同研究分支。具体而言,我们系统地总结了相关算法的最新进展,并确定了三个主要研究方向:EA辅助的RL优化、RL辅助的EA优化以及EA和RL的协同优化。随后,我们对每个研究方向进行了深入分析,组织了多个研究分支。我们阐明了每个分支旨在解决的问题,以及EA和RL的整合如何应对这些挑战。最后,我们讨论了各个研究方向中潜在的挑战和未来的研究方向。为了便于研究人员深入研究ERL,我们在https://github.com/yeshenpy/Awesome-Evolutionary-Reinforcement-Learning上整理了所涉及的算法和代码。

英文摘要

Evolutionary Reinforcement Learning (ERL), which integrates Evolutionary Algorithms (EAs) and Reinforcement Learning (RL) for optimization, has demonstrated remarkable performance advancements. By fusing both approaches, ERL has emerged as a promising research direction. This survey offers a comprehensive overview of the diverse research branches in ERL. Specifically, we systematically summarize recent advancements in related algorithms and identify three primary research directions: EA-assisted Optimization of RL, RL-assisted Optimization of EA, and synergistic optimization of EA and RL. Following that, we conduct an in-depth analysis of each research direction, organizing multiple research branches. We elucidate the problems that each branch aims to tackle and how the integration of EAs and RL addresses these challenges. In conclusion, we discuss potential challenges and prospective future research directions across various research directions. To facilitate researchers in delving into ERL, we organize the algorithms and codes involved on https://github.com/yeshenpy/Awesome-Evolutionary-Reinforcement-Learning.

2311.15487 2026-05-26 cs.LG cs.AI math-ph math.MP math.OC stat.ML 版本更新

Global $\mathcal{L}^2$ minimization at uniform exponential rate via geometrically adapted gradient descent in Deep Learning

全局 $\mathcal{L}^2$ 最小化:通过深度学习中的几何自适应梯度下降实现均匀指数速率

Thomas Chen

发表机构 * Department of Mathematics, University of Texas at Austin(德克萨斯大学奥斯汀分校数学系)

AI总结 本文利用微分几何中黎曼度量的任意性,提出两种改进的梯度下降流(过参数化和欠参数化设置),在秩条件成立时证明其以均匀指数收敛速率驱动 $\mathcal{L}^2$ 代价到全局最小值,并推广到秩条件不成立的情形。

Comments AMS Latex, 21 pages. Typos corrected, references and comments added

详情
AI中文摘要

我们考虑深度学习网络中的监督学习场景,并利用黎曼度量选择的任意性(微分几何的一般事实)来定义梯度下降流。在标准的深度学习方法中,参数空间(权重和偏置)上的梯度流是相对于欧几里得度量定义的。而在这里,我们选择相对于深度学习网络输出层中的欧几里得度量的梯度流。这自然地在参数空间中诱导出两种改进的梯度下降流版本,一种适用于过参数化设置,另一种适用于欠参数化设置。在过参数化情况下,我们证明,只要秩条件成立,改进的梯度下降的所有轨道都以均匀指数收敛速率将 ${\mathcal L}^2$ 代价驱动到其全局最小值;因此,对于任何预先指定的接近全局最小值的程度,可以获得一个先验的停止时间。我们指出了后者与亚黎曼几何的关系。此外,我们将上述框架推广到秩条件不成立的情况;特别地,我们表明局部平衡只有在秩损失发生时才能存在,并且通常它们不是孤立点,而是参数空间中临界子流形的元素。

英文摘要

We consider the scenario of supervised learning in Deep Learning (DL) networks, and exploit the arbitrariness of choice in the Riemannian metric relative to which the gradient descent flow can be defined (a general fact of differential geometry). In the standard approach to DL, the gradient flow on the space of parameters (weights and biases) is defined with respect to the Euclidean metric. Here instead, we choose the gradient flow with respect to the Euclidean metric in the output layer of the DL network. This naturally induces two modified versions of the gradient descent flow in the parameter space, one adapted for the overparametrized setting, and the other for the underparametrized setting. In the overparametrized case, we prove that, provided that a rank condition holds, all orbits of the modified gradient descent drive the ${\mathcal L}^2$ cost to its global minimum at a uniform exponential convergence rate; one thereby obtains an a priori stopping time for any prescribed proximity to the global minimum. We point out relations of the latter to sub-Riemannian geometry. Moreover, we generalize the above framework to the situation in which the rank condition does not hold; in particular, we show that local equilibria can only exist if a rank loss occurs, and that generically, they are not isolated points, but elements of a critical submanifold of parameter space.

2310.01285 2026-05-26 q-fin.CP cs.LG q-fin.MF stat.ML 版本更新

Automated regime classification in multidimensional time series data using sliced Wasserstein k-means clustering

多维时间序列数据中的自动制度分类:基于切片Wasserstein k-means聚类

Qinmeng Luan, James Hamp

发表机构 * Citigroup, London, UK(花旗集团,英国伦敦) Data Science Institute, London School of Economics, London, UK(伦敦政治经济学院数据科学研究所,英国伦敦)

AI总结 提出切片Wasserstein k-means聚类方法,通过近似多维Wasserstein距离,实现多维时间序列数据的自动制度分类,并在合成数据和真实外汇数据中验证有效性。

详情
Journal ref
Data Science in Finance and Economics 2025, Volume 5, Issue 3: 387-418
AI中文摘要

最近的研究提出Wasserstein k-means(Wk-means)聚类作为对时间序列数据(特别是一维资产收益)进行制度分类的强大方法。本文首先详细研究应用于合成一维时间序列数据的Wasserstein k-means聚类算法的行为。我们通过详细研究聚类算法的动态以及超参数变化如何影响不同随机初始化下的性能,扩展了先前的工作。我们计算了一些简单的度量,发现这些度量有助于识别高质量的聚类。然后,我们将Wasserstein k-means聚类技术扩展到多维时间序列数据,通过将多维Wasserstein距离近似为切片Wasserstein距离,得到一种称为“切片Wasserstein k-means(sWk-means)聚类”的方法。我们将sWk-means聚类方法应用于多维时间序列数据中的自动制度分类问题,使用合成数据验证该方法的正确性和有效性。最后,我们以公开的外汇即期汇率数据作为案例研究,表明sWk-means方法能够识别真实多维金融时间序列中的不同市场制度。我们最后评论了该方法的一些局限性以及潜在的补充或替代方法。

英文摘要

Recent work has proposed Wasserstein k-means (Wk-means) clustering as a powerful method to classify regimes in time series data, and one-dimensional asset returns in particular. In this paper, we begin by studying in detail the behaviour of the Wasserstein k-means clustering algorithm applied to synthetic one-dimensional time series data. We extend the previous work by studying, in detail, the dynamics of the clustering algorithm and how varying the hyperparameters impacts the performance over different random initialisations. We compute simple metrics that we find to be useful in identifying high-quality clusterings. We then extend the technique of Wasserstein k-means clustering to multidimensional time series data by approximating the multidimensional Wasserstein distance as a sliced Wasserstein distance, resulting in a method we call 'sliced Wasserstein k-means (sWk-means) clustering'. We apply the sWk-means clustering method to the problem of automated regime classification in multidimensional time series data, using synthetic data to demonstrate the validity and effectiveness of the approach. Finally, we show that the sWk-means method is able to identify distinct market regimes in real multidimensional financial time series, using publicly available foreign exchange spot rate data as a case study. We conclude with remarks about some limitations of our approach and potential complementary or alternative approaches.
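
The sliced Wasserstein approximation mentioned above can be sketched as follows: project both point clouds onto random unit directions, compute the one-dimensional Wasserstein distance on each projection by sorting, and average. The sketch assumes equal-size samples and Monte Carlo directions drawn from a Gaussian. The number of projections and the seed are illustrative, and this is not the authors' implementation.

```python
import math
import random

def wasserstein_1d(u, v, p=2):
    """p-Wasserstein distance between two equal-size 1-D samples via sorted matching."""
    assert len(u) == len(v)
    us, vs = sorted(u), sorted(v)
    return (sum(abs(a - b) ** p for a, b in zip(us, vs)) / len(us)) ** (1.0 / p)

def random_direction(dim, rng):
    """Uniform random unit vector obtained by normalizing a Gaussian draw."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]

def sliced_wasserstein(X, Y, num_projections=64, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance between samples X and Y."""
    rng = random.Random(seed)
    dim = len(X[0])
    total = 0.0
    for _ in range(num_projections):
        theta = random_direction(dim, rng)
        px = [sum(a * t for a, t in zip(x, theta)) for x in X]
        py = [sum(a * t for a, t in zip(y, theta)) for y in Y]
        total += wasserstein_1d(px, py, p) ** p
    return (total / num_projections) ** (1.0 / p)

X = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]]
Y = [[x + 3.0, y] for x, y in X]          # the same cloud shifted along the first axis
d_same = sliced_wasserstein(X, X)
d_shift = sliced_wasserstein(X, Y)
```

Because each projection reduces to a sort, the cost stays close to that of the one-dimensional Wk-means setting, which is the practical appeal of the sliced variant.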

2306.02216 2026-05-26 cs.LG cs.CV 版本更新

Forgettable Federated Linear Learning with Certified Data Unlearning

具有认证数据遗忘的可遗忘联邦线性学习

Ruinan Jin, Minghui Chen, Qiong Zhang, Xiaoxiao Li

发表机构 * University of British Columbia(不列颠哥伦比亚大学) Vector Institute(向量研究所) Renmin University of China(中国人民大学)

AI总结 提出一种基于预训练模型线性近似的联邦遗忘框架,通过联邦线性训练实现高效、安全且可认证的客户端数据遗忘。

Comments IEEE Transactions on Neural Networks and Learning Systems

详情
Journal ref
IEEE Transactions on Neural Networks and Learning Systems, Early Access, pp. 1-10, 2026
AI中文摘要

联邦学习(FL)能够在分布式客户端之间进行协作模型训练,同时保护用户隐私。最近,联邦遗忘(FU)的出现旨在解决“被遗忘权”问题,并在无需重新训练整个FL系统的情况下移除中毒或目标客户端的影响。然而,许多FU方法需要与保留或目标客户端通信,引入额外的安全风险,或存储历史模型,限制了其效率和实用性。此外,由于非线性模型及其训练动态的复杂性,大多数用于深度神经网络(DNN)的FU方法缺乏理论认证。在这项工作中,我们引入了可遗忘联邦线性学习,这是一个用于DNN的训练和遗忘框架。我们的方法使用预训练模型线性近似DNN,并通过联邦线性训练实现与原始网络相当的性能。我们进一步提出了一种经过认证、高效且安全的遗忘策略,使服务器能够在不进行额外客户端通信或存储的情况下移除目标客户端的影响。在从小型到大型数据集上使用卷积神经网络和现代基础模型进行的广泛实验表明,我们的方法在模型准确性和有效的目标客户端遗忘之间取得了平衡。这项工作为高效且可信的FU提供了一个实用的流程。代码:https://github.com/Nanboy-Ronan/2F2L-Federated-Unlearning

英文摘要

Federated Learning (FL) enables collaborative model training across distributed clients while preserving user privacy. Recently, Federated Unlearning (FU) has emerged to address the "right to be forgotten" and to remove the influence of poisoned or target clients without retraining the entire FL system. However, many FU methods require communication with retained or target clients, introduce additional security risks, or store historical models, limiting their efficiency and practicality. Moreover, most FU methods for deep neural networks (DNNs) lack theoretical certification due to the complexity of nonlinear models and their training dynamics. In this work, we introduce Forgettable Federated Linear Learning, a training and unlearning framework for DNNs. Our approach uses pre-trained models to linearly approximate DNNs and achieve performance comparable to the original networks through Federated Linear Training. We further present a certified, efficient, and secure unlearning strategy that enables the server to remove a target client's influence without additional client communication or storage. Extensive experiments on small- to large-scale datasets, using both convolutional neural networks and modern foundation models, show that our method balances model accuracy with effective target-client unlearning. This work provides a practical pipeline for efficient and trustworthy FU. Code: https://github.com/Nanboy-Ronan/2F2L-Federated-Unlearning

2605.24524 2026-05-26 cs.LG cs.CL q-bio.NC 版本更新

What Are We Actually Decoding? Source Attribution for Non-Invasive Brain-to-Language Retrieval

我们究竟在解码什么?非侵入式脑到语言检索的源归因

Xinyu Zhang, Sichao Liu, Runhao Lu, Alexandra Woolgar, Lihui Wang

发表机构 * KTH(瑞典皇家理工学院) University of Cambridge(剑桥大学) EPFL(洛桑联邦理工学院) Karolinska Institutet(卡罗林斯卡医学院) McGill University(麦吉尔大学)

AI总结 针对非侵入式神经语言解码中结果被非刺激诱发源(如解码器先验、嵌入度量、信号时长等)膨胀的问题,提出一个审计框架,通过结构捷径、窗口级刺激锁定证据和跨窗口上下文聚合三种源分离,并引入组上下文偏差(GCB)作为可控的源归因干预,实现性能的源归因而非仅报告。

Comments 35 pages, 7 figures, 25 tables

详情
AI中文摘要

在非侵入式神经语言解码中,结果可能被非刺激诱发的神经证据源膨胀:解码器先验、基于嵌入的度量以及非神经结构干扰(如信号时长)。因此,方法学挑战在于归因:当报告的性能提升可以追溯到特定源时,它才更具信息性。我们将刺激锁定的MEG到音频检索重新构建为一个审计框架,将表观性能分离为三个源——结构捷径、窗口级刺激锁定证据和跨窗口上下文聚合——并为每个源提供诊断。在变长解码下,信号盲的高斯噪声达到66.3%的Rank@1(R@1),但一旦强制执行固定时长窗口和刺激身份分割,其性能骤降至接近随机,从而隔离了结构泄漏。在这些控制下,固定窗口检索恢复了可测量的MEG-音频可区分性,而一个神谕句子桶诊断显示,95.7%的Top-1错误选择了错误的句子,将剩余瓶颈定位到句子级竞争。我们使用组上下文偏差(GCB)审计这一上下文源,这是一种推理时的加性logit偏差,它跨窗口汇集句子一致的证据,同时保持基础检索分数和候选池固定。作为分数空间干预,GCB使上下文源变得可测量:在相同固定设置下,Gwilliams上的R@1从44%变为52%,MOUS上从22%变为29%。在此设计下,GCB是可审计的:其效应在随机分组扰动下崩溃,并在局部证据在MEG中衰减或在EEG中接近随机时消失,支持其作为受控源归因干预的使用。这些结果表明,脑到语言性能应进行源归因,而不仅仅是报告。

英文摘要

In non-invasive neural language decoding, results can be inflated by sources that are not stimulus-evoked neural evidence: decoder priors, embedding-based metrics, and non-neural structural nuisances such as signal duration. The methodological challenge is therefore attribution: a reported gain is more informative when it can be traced to a specific source. We recast stimulus-locked MEG-to-audio retrieval as an auditing framework that separates apparent performance into three sources - structural shortcuts, window-level stimulus-locked evidence, and cross-window contextual aggregation - and provides a diagnostic for each. Signal-blind Gaussian noise reaches 66.3% Rank@1 (R@1) under variable-length decoding but collapses to near chance once fixed-duration windows and stimulus-identity splits are enforced, isolating structural leakage. Under these controls, fixed-window retrieval recovers measurable MEG-audio discriminability, while an oracle sentence-bucket diagnostic shows that 95.7% of Top-1 errors select the wrong sentence, localising the residual bottleneck to sentence-level competition. We audit this contextual source with Group Context Bias (GCB), an inference-time additive logit bias that pools sentence-consistent evidence across windows while leaving the base retrieval scores and candidate pool fixed. Used as a score-space intervention, GCB makes the contextual source measurable: R@1 shifts from 44% to 52% on Gwilliams and from 22% to 29% on MOUS under the same fixed setting. GCB is auditable under this design: its effect collapses under random-grouping perturbations and vanishes when local evidence is attenuated in MEG or is near chance in EEG, supporting its use as a controlled source-attribution intervention. These results suggest that brain-to-language performance should be source-attributed, not merely reported.

2605.24523 2026-05-26 cs.LG cs.CL q-bio.NC 版本更新

MindAlign: Bridging EEG, Vision, and Language for Zero-Shot Visual Decoding

MindAlign: 弥合脑电图、视觉和语言实现零样本视觉解码

Zexuan Chen, Sichao Liu, Runhao Lu, Huichao Qi, Alexandra Woolgar, Xi Vincent Wang, Lihui Wang

发表机构 * KTH, Sweden(瑞典皇家理工学院) University of Cambridge, UK(剑桥大学) EPFL, Switzerland(洛桑联邦理工学院) McGill University, Canada(麦吉尔大学) Karolinska Institutet, Sweden(卡罗林斯卡医学院)

AI总结 提出一种三模态对比学习框架MindAlign,通过对齐脑电图、图像和文本表示,在Things-EEG2零样本基准上实现54.1% Top-1和83.4% Top-5准确率,显著超越先前方法。

Comments 20 pages, 10 figures, 15 tables

详情
AI中文摘要

从大脑信号进行视觉解码是计算机视觉和神经科学交叉领域的关键挑战,需要连接神经表征和视觉计算模型的方法。我们提出了一种基于脑电图的视觉解码三模态对比框架,在统一潜在空间中对齐脑电图、视觉和文本表示。我们的方法采用两阶段设计。首先,我们通过无标签试次上的掩码重建预训练脑电图编码器,学习可稳健迁移到下游任务的时空规律。其次,我们通过对比学习联合对齐脑电图、图像和大语言模型生成的文本描述,其中文本监督作为语义正则化器,向共享空间注入语言结构,而不压倒主要的脑电图-图像信号。编码器集成了被试自适应、通道上的图注意力和时空卷积嵌入。在Things-EEG2 200路零样本基准上,我们的框架实现了54.1%的Top-1和83.4%的Top-5准确率,大幅超过最强先前基线(32.4%/64.0%),配对Wilcoxon检验证实所有被试内基线的显著性(p<0.01)。我们在Things-MEG上验证了泛化性。分析表明,紧凑的嵌入几何(CN-CLIP)优于更大的骨干网络,且解码与视觉处理的既定神经生理学一致。这项工作是从非侵入性时间神经信号进行稳健、语义基础视觉解码的关键一步。源代码公开于https://github.com/anon-eeg/eeg_image_decoding。

英文摘要

Visual decoding from brain signals is a key challenge at the intersection of computer vision and neuroscience, requiring methods that bridge neural representations and computational models of vision. We introduce a tri-modal contrastive framework for EEG-based visual decoding that aligns EEG, visual, and textual representations within a unified latent space. Our approach follows a two-stage design. First, we pre-train an EEG encoder via masked reconstruction on unlabeled trials, learning spatio-temporal regularities that transfer robustly to downstream tasks. Second, we jointly align EEG, image, and LLM-generated textual descriptions through contrastive learning, where text supervision acts as a semantic regularizer that injects linguistic structure into the shared space without overwhelming the primary EEG-image signal. The encoder integrates subject-specific adaptation, graph attention over channels, and temporal-spatial convolutional embeddings. On the Things-EEG2 200-way zero-shot benchmark, our framework achieves 54.1% Top-1 and 83.4% Top-5 accuracy, substantially exceeding the strongest prior baseline (32.4% / 64.0%), with paired Wilcoxon tests confirming significance (p < 0.01) over all in-subject baselines. We validate generalization on Things-MEG. Analysis reveals that compact embedding geometries (CN-CLIP) outperform much larger backbones, and that decoding aligns with established neurophysiology of visual processing. This work is a critical step towards robust, semantically grounded visual decoding from non-invasive temporal neural signals. The source code is publicly available at https://github.com/anon-eeg/eeg_image_decoding.
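
The Top-1 and Top-5 figures quoted above are retrieval accuracies: for each EEG trial, candidate images are ranked by similarity and the trial counts as a hit if the true image is among the first k. A minimal sketch of that metric follows. The similarity matrix here is toy data, and the function name is hypothetical.

```python
def top_k_accuracy(similarity, true_index, k):
    """Fraction of queries whose true candidate is among the k highest-scoring ones.

    similarity[i][j] is the score between query i and candidate j.
    """
    hits = 0
    for scores, target in zip(similarity, true_index):
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(similarity)

# Toy 3-way example; the benchmark in the abstract is 200-way.
sim = [
    [0.9, 0.1, 0.0],
    [0.2, 0.3, 0.5],
    [0.1, 0.8, 0.4],
]
truth = [0, 1, 2]
top1 = top_k_accuracy(sim, truth, 1)
top2 = top_k_accuracy(sim, truth, 2)
```

With 200 candidates, chance level is 0.5% for Top-1 and 2.5% for Top-5, which puts the reported 54.1% and 83.4% in context.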

2605.24520 2026-05-26 q-bio.GN cs.LG 版本更新

AnnotateMissense: a genome-wide annotation and benchmarking framework for missense pathogenicity prediction

AnnotateMissense:一个用于错义致病性预测的全基因组注释和基准测试框架

Muhammad Muneeb, David B. Ascher

发表机构 * School of Chemistry and Molecular Biology(化学与分子生物学学院) The University of Queensland(昆士兰大学) Baker Heart and Diabetes Institute(贝克心脏病与糖尿病研究所)

AI总结 提出AnnotateMissense框架,整合多种特征,通过XGBoost模型在ClinVar数据集上实现高精度错义变异致病性预测,并生成全基因组预测结果。

详情
AI中文摘要

错义变异解读仍然具有挑战性,因为致病性取决于来自群体频率、进化保守性、转录本背景、氨基酸替代严重性、先验致病性预测因子以及蛋白质语言模型衍生特征的异质性证据。我们提出了AnnotateMissense,一个用于错义变异解读的可扩展注释、基准测试和全基因组预测框架。AnnotateMissense整合了来自dbNSFP v5.1的hg38错义变异与ANNOVAR注释、dbNSFP转录本/蛋白质描述符、AlphaMissense评分、ESM衍生特征、保守性指标、群体频率变量、已建立的致病性预测因子以及工程化的氨基酸/密码子背景特征。使用132,714个ClinVar标记的错义变异,我们在受控特征配置下对机器学习和深度学习模型进行了基准测试。完整的303特征基准集在XGBoost上实现了最强性能,在分层五折交叉验证中平均MCC=0.9411,ROC-AUC=0.9950。受限的朴素和位置导向特征集分别达到了较低的MCC最佳值0.4989和0.5113。循环控制消融实验表明,移除先验预测因子、群体频率和临床重叠证据会降低性能,而单独排除AlphaMissense和ESM衍生特征影响最小。在新观察到的致病/良性变异上的时间ClinVar验证实现了MCC=0.7613,准确率=0.8798,F1分数=0.8750。最终模型应用于90,643,830个hg38错义变异,生成AnnotateMissense致病性评分和二元预测标签。代码和输出可在https://github.com/MuhammadMuneeb007/CAGI7_Annotate_All_Missense和https://doi.org/10.5281/zenodo.19981867获取。

英文摘要

Missense variant interpretation remains challenging because pathogenicity depends on heterogeneous evidence from population frequency, evolutionary conservation, transcript context, amino acid substitution severity, prior pathogenicity predictors and protein-language-model-derived features. We present AnnotateMissense, a scalable annotation, benchmarking and genome-wide prediction framework for missense variant interpretation. AnnotateMissense integrates hg38 missense variants derived from dbNSFP v5.1 with ANNOVAR annotations, dbNSFP transcript/protein descriptors, AlphaMissense scores, ESM-derived features, conservation metrics, population-frequency variables, established pathogenicity predictors and engineered amino acid/codon-context features. Using 132,714 ClinVar-labelled missense variants, we benchmarked machine-learning and deep-learning models under controlled feature configurations. The full 303-feature benchmark set achieved the strongest performance with XGBoost, reaching mean MCC = 0.9411 and ROC-AUC = 0.9950 across stratified five-fold cross-validation. Restricted naive and location-oriented feature sets achieved lower best MCC values of 0.4989 and 0.5113, respectively. Circularity-controlled ablations showed that removing prior-predictor, population-frequency and clinically overlapping evidence reduced performance, whereas excluding AlphaMissense and ESM-derived features alone had minimal effect. Temporal ClinVar validation on newly observed pathogenic/benign variants achieved MCC = 0.7613, accuracy = 0.8798 and F1-score = 0.8750. The final model was applied to 90,643,830 hg38 missense variants to generate AnnotateMissense pathogenicity scores and binary prediction labels. Code and outputs are available at https://github.com/MuhammadMuneeb007/CAGI7_Annotate_All_Missense and https://doi.org/10.5281/zenodo.19981867.
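
The headline metric above, the Matthews correlation coefficient (MCC), can be computed directly from a binary confusion matrix. The sketch below uses the standard definition. Returning 0.0 when the denominator vanishes is a common convention and an assumption here, not something stated in the abstract.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom

perfect = mcc(50, 50, 0, 0)
inverted = mcc(0, 0, 50, 50)
chance = mcc(25, 25, 25, 25)
```

Unlike accuracy, MCC remains informative when pathogenic and benign classes are imbalanced, which is one reason it is often reported for variant-effect benchmarks.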

2605.24517 2026-05-26 cs.LG cs.CL 版本更新

ECHO: Terminal Agents Learn World Models for Free

ECHO: 终端代理免费学习世界模型

Vaishnavi Shrivastava, Piero Kauffmann, Ahmed Awadallah, Dimitris Papailiopoulos

发表机构 * Microsoft Research(微软研究院)

AI总结 提出ECHO混合目标,通过预测环境观测令牌将终端反馈转化为密集监督信号,显著提升CLI代理在TerminalBench-2.0上的性能。

详情
AI中文摘要

CLI代理是语言模型最接近具身环境的设置:模型发出命令,终端执行它们,返回的流(stdout、错误、文件、日志和跟踪)记录了后果。我们认为这个流是一个监督信号,但标准的代理强化学习丢弃了它:GRPO风格的训练使用稀疏的结果级奖励更新动作令牌,而忽略了rollout中已有的环境响应。失败的rollout尽管包含关于环境如何响应的丰富证据,但提供的策略梯度信号很少。我们引入了ECHO(环境交叉熵混合目标),这是一种混合目标,它将动作令牌上的标准策略梯度损失与辅助损失相结合,该辅助损失训练策略预测其自身动作产生的环境观测令牌。ECHO重用与GRPO相同的前向传播,不需要额外的rollout,并将终端反馈转化为所有rollout的密集监督。ECHO在TerminalBench-2.0上将GRPO的pass@1翻倍:Qwen3-8B从2.70%提升到5.17%,Qwen3-14B从5.17%提升到10.79%。ECHO还产生了更好地预测终端动态的策略,即使是在它们未生成的轨迹上:在保留的rollout中,它显著降低了环境令牌的交叉熵,而单独的GRPO几乎没有改变。从基础Qwen3-8B开始,ECHO在没有专家演示的情况下,在保留的终端任务上匹配了专家SFT然后GRPO的性能,并在TerminalBench-2.0上恢复了约一半的专家SFT初始化收益。在某些设置中,仅环境预测损失就能实现无验证器的自我改进,使策略仅通过与环境交互就能在未见过的OOD任务上改进。这些结果表明,环境观测不仅是未来动作的上下文,而且是每个rollout中已经存在的密集、在策略的监督信号。

英文摘要

CLI agents are the closest thing language models have to an embodied setting: the model emits commands, the terminal executes them, and the returned stream -- stdout, errors, files, logs, and traces -- records the consequences. We argue that this stream is a supervision signal, but standard agent RL discards it: GRPO-style training updates action tokens with sparse outcome-level rewards while ignoring environment responses already in the rollout. Failed rollouts provide little policy-gradient signal despite containing rich evidence about how the environment responds. We introduce ECHO (Environment Cross-entropy Hybrid Objective), a hybrid objective that combines the standard policy-gradient loss on action tokens with an auxiliary loss that trains the policy to predict environment observation tokens resulting from its own actions. ECHO reuses the same forward pass as GRPO, requires no additional rollouts, and turns terminal feedback into dense supervision for all rollouts. ECHO doubles GRPO pass@1 on TerminalBench-2.0: Qwen3-8B improves from 2.70% to 5.17%, and Qwen3-14B from 5.17% to 10.79%. ECHO also produces policies that better predict terminal dynamics, even on trajectories they did not generate: across held-out rollouts, it sharply reduces environment-token cross-entropy while GRPO alone barely changes it. From base Qwen3-8B, ECHO matches expert-SFT-then-GRPO performance on held-out terminal tasks without expert demonstrations, and recovers roughly half of the expert-SFT initialization benefit on TerminalBench-2.0. In some settings, the environment prediction loss alone enables verifier-free self-improvement, allowing policies to improve on unseen OOD tasks by learning only from environment interactions. Together, these results suggest that environment observations are not merely context for future actions, but a dense, on-policy supervision signal already present in every rollout.
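
The hybrid objective described above can be sketched for a single rollout as a policy-gradient term on action tokens plus a cross-entropy term on environment-observation tokens. This is a simplified illustration: it uses a plain advantage-weighted log-likelihood surrogate instead of the full GRPO objective, and the mixing weight `obs_weight` and the per-rollout averaging are assumptions not taken from the abstract.

```python
def echo_style_loss(token_logprobs, is_action, advantage, obs_weight=0.1):
    """Per-rollout hybrid loss: policy-gradient surrogate + observation cross-entropy.

    token_logprobs: log-probability the policy assigns to each token in the rollout.
    is_action:      True for tokens the policy emitted, False for terminal output.
    advantage:      scalar outcome-level advantage for the whole rollout.
    obs_weight:     hypothetical mixing coefficient for the auxiliary term.
    """
    act = [lp for lp, a in zip(token_logprobs, is_action) if a]
    obs = [lp for lp, a in zip(token_logprobs, is_action) if not a]
    pg_loss = -advantage * sum(act) / len(act) if act else 0.0
    obs_ce = -sum(obs) / len(obs) if obs else 0.0
    return pg_loss + obs_weight * obs_ce, pg_loss, obs_ce

logprobs = [-1.0, -2.0, -0.5, -1.5]
mask = [True, True, False, False]

# A rollout whose advantage is zero contributes nothing to the policy-gradient
# term, yet the observation term still yields a training signal.
total_zero, pg_zero, ce_zero = echo_style_loss(logprobs, mask, advantage=0.0)
total_pos, pg_pos, ce_pos = echo_style_loss(logprobs, mask, advantage=1.0)
```

Both terms are computed from the same per-token log-probabilities, which mirrors the abstract's point that the auxiliary loss reuses the forward pass already needed for the policy update.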

2605.24515 2026-05-26 cs.LG 版本更新

Lake Detection and Water Quality Estimation in Sentinel-2 Data

Sentinel-2 数据中的湖泊检测与水质估计

Iulia Pleşu, Alexandra Băicoianu, Ioana Cristina Plajer

发表机构 * Transilvania University of Brașov, Faculty of Mathematics and Computer Science(布拉索夫特兰西瓦尼亚大学数学与计算机科学学院)

AI总结 本文比较了三种机器学习架构用于水体识别与监测,并提出了针对水质指数的有意义配色方案,以提高可解释性和决策支持。

详情
AI中文摘要

随着气候变化和人类对自然景观的压力增加,内陆水资源变得越来越稀缺、脆弱且难以可持续管理。因此,可靠且自动化的地表水体检测、监测和评估方法具有日益增长的科学和实践重要性。在本文中,我们研究并比较了三种不同的机器学习架构用于水体识别与监测。通过定量指标和实际案例评估其性能。此外,在代表性测试图像上与经典的 NDWI 阈值法进行直接比较,以突出数据驱动方法与基于指数方法之间的差异。这一分析使我们能够识别出在准确性、鲁棒性和实际适用性方面表现最佳的模型。除了检测之外,有意义的水质评估的一个主要挑战在于光谱水指数的一致且可解释的可视化。标准颜色映射技术通常不足或可能对环境应用产生误导。为弥补这一差距,我们提出了一套适用于水质指数的有意义配色方案,有助于人类用户更清晰地解释、比较和决策。

英文摘要

With climate change and increasing human pressure on natural landscapes, inland water resources are becoming progressively scarcer, more vulnerable, and more difficult to manage sustainably. Reliable and automated methods for detecting, monitoring, and assessing surface water bodies are therefore of growing scientific and practical importance. In this paper, we investigate and compare three distinct machine learning architectures for water body identification and monitoring. Their performance is evaluated through quantitative metrics and real-world examples. Furthermore, a direct comparison with classical NDWI thresholding is conducted on a representative test image to highlight differences between data-driven and index-based approaches. This analysis allows us to identify the best-performing model in terms of accuracy, robustness, and practical applicability. Beyond detection, a major challenge for meaningful water quality assessment lies in the consistent and interpretable visualization of spectral water indices. Standard color mapping techniques are often inadequate or potentially misleading for environmental applications. To address this gap, we propose a suite of meaningful color schemes adapted for water quality indices, facilitating clearer interpretation, comparison, and decision-making for human users.
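For context, the classical NDWI baseline mentioned above is usually computed as (Green − NIR) / (Green + NIR), which for Sentinel-2 typically means bands B3 and B8. The sketch below is a minimal illustration under that assumption; the function name `ndwi_mask` and the default threshold of 0 are hypothetical choices, not values taken from the paper.

```python
from typing import List, Sequence


def ndwi_mask(
    green: Sequence[Sequence[float]],  # green-band reflectance (Sentinel-2 B3), rows of pixels
    nir: Sequence[Sequence[float]],    # near-infrared reflectance (Sentinel-2 B8), same shape
    threshold: float = 0.0,            # assumed cut-off; pixels above it are labeled water
) -> List[List[bool]]:
    """Label each pixel as water (True) or non-water (False) by thresholding NDWI."""
    mask = []
    for g_row, n_row in zip(green, nir):
        row = []
        for g, n in zip(g_row, n_row):
            denom = g + n
            # Guard against division by zero on empty or no-data pixels.
            ndwi = (g - n) / denom if denom != 0 else 0.0
            row.append(ndwi > threshold)
        mask.append(row)
    return mask
```

A single fixed threshold is what makes this approach index-based rather than data-driven: it needs no training, but the cut-off may have to be tuned per scene.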

2605.24513 2026-05-26 cs.LG 版本更新

Zeroth-Order Nonconvex Nonsmooth Optimization with Heavy-Tailed Noise

具有重尾噪声的零阶非凸非光滑优化

Zhuanghua Liu, Luo Luo

发表机构 * Zhuanghua Liu(刘庄华) Luo Luo(罗洛)

AI总结 针对目标函数Lipschitz连续的非凸非光滑问题,提出一种通过裁剪两点梯度估计器的在线到非凸转换框架的随机零阶算法,在重尾噪声下实现$(δ, ε)$-Goldstein驻点,其零阶复杂度为${\mathcal O}(d^{\frac{p}{2(p-1)}}δ^{-1}ε^{-\frac{2p-1}{p-1}})$,与已知最优结果一致。

详情
AI中文摘要

本文考虑目标函数Lipschitz连续的非凸非光滑问题。我们关注随机设置,其中算法可以访问带有重尾噪声的随机函数值评估,这在许多流行的机器学习应用中普遍存在。我们提出了一种随机零阶算法,通过裁剪两点梯度估计器来改进在线到非凸转换的框架。理论分析表明,我们的算法可以找到$(δ, ε)$-Goldstein驻点,其零阶复杂度为${\mathcal O}(d^{\frac{p}{2(p-1)}}δ^{-1}ε^{-\frac{2p-1}{p-1}})$,其中$d$是问题维度,$p\in(1,2]$是有界矩的阶数。注意,我们对维度$d$的依赖性与随机零阶优化在寻找随机凸非光滑问题的次优解方面的已知最佳结果相匹配。此外,我们对精度参数$δ$和$ε$的依赖性与随机非凸非光滑问题的已知最佳随机一阶算法一致。最后,我们进行了数值实验,以证明所提出方法的有效性。

英文摘要

This paper considers the nonconvex nonsmooth problem in which the objective function is Lipschitz continuous. We focus on the stochastic setting where the algorithm can access stochastic function value evaluations with heavy-tailed noise, which is prevalent in many popular machine learning applications. We propose a stochastic zeroth-order algorithm that refines the framework of online-to-nonconvex conversion by clipping the two-point gradient estimator. The theoretical analysis shows that our algorithm can find a $(δ, ε)$-Goldstein stationary point with zeroth-order oracle complexity of ${\mathcal O}(d^{\frac{p}{2(p-1)}}δ^{-1}ε^{-\frac{2p-1}{p-1}})$, where $d$ is the problem dimension and $p\in(1,2]$ is the order of bounded moments. Note that our dependence on dimension $d$ matches the best-known results of stochastic zeroth-order optimization for finding the sub-optimal solution of a stochastic convex nonsmooth problem. In addition, our dependence on accuracy parameters $δ$ and $ε$ is consistent with that of the best-known stochastic first-order algorithms for stochastic nonconvex nonsmooth problems. Finally, we conduct numerical experiments to demonstrate the effectiveness of the proposed method.
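A clipped two-point estimator of the kind the abstract refers to might look like the sketch below. It uses the common form g = (d / 2δ)·(F(x + δw) − F(x − δw))·w with w drawn uniformly from the unit sphere, followed by norm clipping at a threshold `tau`. The exact estimator, the clipping rule, and all names here are assumptions for illustration; the abstract does not specify them.

```python
import math
import random
from typing import Callable, List, Sequence


def clipped_two_point_grad(
    f: Callable[[Sequence[float]], float],  # function-value oracle (possibly noisy)
    x: Sequence[float],                     # current iterate
    delta: float,                           # smoothing radius
    tau: float,                             # hypothetical clipping threshold, tau > 0
    rng: random.Random,                     # source of randomness for the direction
) -> List[float]:
    """Two-point zeroth-order gradient estimate, clipped to Euclidean norm at most tau."""
    d = len(x)
    # A normalized Gaussian vector is uniformly distributed on the unit sphere.
    w = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm_w = math.sqrt(sum(v * v for v in w))
    w = [v / norm_w for v in w]
    x_plus = [xi + delta * wi for xi, wi in zip(x, w)]
    x_minus = [xi - delta * wi for xi, wi in zip(x, w)]
    scale = d / (2.0 * delta) * (f(x_plus) - f(x_minus))
    g = [scale * wi for wi in w]
    # Clipping bounds the influence of heavy-tailed noise in the function values.
    g_norm = math.sqrt(sum(v * v for v in g))
    if g_norm > tau:
        g = [v * tau / g_norm for v in g]
    return g
```

Clipping is the step that targets heavy tails: a single extreme function-value sample can otherwise produce an arbitrarily large estimate when only a moment of order p in (1, 2] is bounded.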

2605.24509 2026-05-26 cs.CV cs.AI cs.GR cs.LG 版本更新

Φ-Noise: Training-Free Temporal Video Conditioning via Phase-Based Noise Manipulation

Φ-Noise:基于相位噪声操作的无训练时间视频条件生成

Ofir Abramovich, Nadav Z. Cohen, Adi Rosenthal, Ariel Shamir

发表机构 * Canvas-Lab

AI总结 提出一种无需训练的方法,通过将参考视频的低频相位信息注入扩散噪声潜变量,实现运动条件视频生成,无需修改模型架构或推理流程。

Comments Under Review; 26 pages, 21 figures

详情
AI中文摘要

潜在视频扩散模型通过逐步将高斯噪声转换为基于文本或视觉输入的真实样本来生成视频。然而,现有的条件方法通常需要额外的训练和计算开销。受最近关于频率分量在生成模型中重要性的发现启发,我们提出了一种简单、无需训练的运动条件视频生成方法,通过将参考视频的低频相位信息直接注入扩散噪声潜变量。我们的方法在不修改模型架构或推理流程的情况下传递运动线索。通过多个应用,我们展示了在生成视频中对外观和动态的有效控制,同时与更复杂的条件方法相比取得了具有竞争力或更优的结果。

英文摘要

Latent video diffusion models generate videos by progressively transforming Gaussian noise into realistic samples conditioned on text or visual inputs. However, existing conditioning methods often require additional training and computational overhead. Motivated by recent findings on the importance of frequency components in generative models, we propose a simple, training-free approach for motion-conditioned video generation by injecting low-frequency phase information from a reference video directly into the diffusion noise latents. Our method transfers motion cues without modifying the model architecture or inference pipeline. Using several applications, we demonstrate effective control over both appearance and dynamics in generated videos, while achieving competitive or superior results compared to more complex conditioning approaches.

2605.24502 2026-05-26 cond-mat.stat-mech cs.LG math.CO physics.comp-ph 版本更新

Implicit Binarization via Complex Phase Dynamics in Combinatorial Optimization

组合优化中通过复相位动力学实现的隐式二值化

Khen Cohen, Mark Glass, Meir Feder, Yaron Oz

发表机构 * School of Physics and Astronomy, Tel Aviv University(特拉维夫大学物理与天文学学院) School of Electrical and Computer Engineering, Tel Aviv University(特拉维夫大学电气与计算机工程学院)

AI总结 提出一种受物理启发的连续松弛框架,通过将离散二进制变量参数化为复单位圆上的连续波状状态,隐式正则化促进收敛到离散状态,显著提升NP难组合优化问题的求解性能。

Comments 27 pages, 5 figures

详情
AI中文摘要

我们引入了一种受物理启发的连续松弛框架,该框架为NP难组合优化问题(包括二次无约束二进制优化(QUBO)、二进制稀疏编码和植入解伊辛模型)提供了显著改进的解。通过将离散二进制变量参数化为复单位圆上的连续波状状态,我们固有地平滑了高度非凸的能量景观。我们证明,将二进制变量表示为复相位揭示了一种隐式正则化机制,该机制促进向离散状态的收敛。即使在标准的实值优化框架中显式使用该正则化器,提取这一机制也能带来显著改进。实验上,该正则化比标准的实值替代方案实现了更高的基态收敛率。我们的模型在严重噪声(σ=0.25)下的大规模160x160 QUBO任务中实现了零误差,并在欠定稀疏编码中优于传统算法(OMP和LASSO),在σ=0.15时完美恢复。求解器的鲁棒性进一步通过11个严格设计的植入解基准中恢复8个精确基态配置得到验证。

英文摘要

We introduce a physics-inspired continuous relaxation framework that yields substantially improved solutions for NP-hard combinatorial optimization problems, including Quadratic Unconstrained Binary Optimization (QUBO), binary sparse coding, and planted-solution Ising models. By parameterizing discrete binary variables as continuous wave-like states on the complex unit circle, we inherently smooth highly non-convex energy landscapes. We show that representing binary variables as complex phases reveals an implicit regularization mechanism that promotes convergence toward discrete states. Extracting this mechanism yields significant improvements even within standard real-valued optimization frameworks, using this regularizer explicitly. Empirically, this regularization yields vastly higher ground-state convergence rates than standard real-valued alternatives. Our models achieved zero error in large-scale 160x160 QUBO tasks under severe noise (sigma=0.25), and outperformed traditional algorithms (OMP and LASSO) in underdefined sparse coding with perfect recovery at sigma=0.15. The solver's robustness was further validated by recovering exact ground-state configurations in 8 out of 11 rigorously engineered planted-solution benchmarks.

2605.24490 2026-05-26 cs.AI cs.LG q-fin.PM 版本更新

Market Regime Council for Dynamic Credit Assignment in Multi-Agent LLM Decision Systems

市场制度委员会:多智能体LLM决策系统中的动态信用分配

Yunhua Pei, Zerui Ge, Jin Zheng, John Cartlidge

发表机构 * University of Bristol, UK(布里斯托大学)

AI总结 提出市场制度委员会(MRC),一种基于Shapley值进行在线智能体加权、贝叶斯自适应混合和制度依赖乘数的多智能体决策系统,在加密货币投资中实现高夏普比率和累计收益。

Comments 35 pages, 13 figures, preprint

详情
AI中文摘要

用于投资组合管理的多智能体LLM决策系统仍然缺乏一种原则性的方法来跨专业智能体分配信用,在制度转变下容易受到冷启动主导的影响,并且最终分配如何形成的透明度有限。我们提出了市场制度委员会(MRC),一种合作式多智能体决策系统,它计算所有单个、成对和大联盟输出的精确Shapley信用,用于在线智能体加权。实例化为N=3个专业智能体,在每个交易周期,MRC从指数加权性能历史中重新计算基于联盟的Shapley权重,使用贝叶斯自适应混合来稳定早期阶段,应用制度依赖乘数调整智能体权威,并通过五层因果追踪记录每次再平衡。在13种加密资产和5个种子的1037个交易日中,MRC实现了1.51的夏普比率和440.1%的累计收益,在主动基准中排名第一(CR、SR和IR),并在主动方法中实现了最低的最大回撤。消融实验表明,收益来自跨联盟输出的Shapley加权集成,而非任何单一阶段。代码和演示数据包含在补充材料中。

英文摘要

Multi-agent LLM decision systems for portfolio management still lack a principled way to assign credit across specialist agents, remain vulnerable to cold-start dominance under regime shifts, and offer limited transparency into how final allocations are formed. We propose Market Regime Council (MRC), a cooperative multi-agent decision system that computes exact Shapley credits across all single, pairwise, and grand-coalition outputs for online agent weighting. MRC is instantiated with N=3 specialist agents. At each trading period, it recomputes coalition-based Shapley weights from exponentially weighted performance histories, uses a Bayesian adaptive mixture to stabilize early periods, applies regime-dependent multipliers to adjust agent authority, and records each rebalance through a five-layer causal trace. Over 1,037 trading days across 13 crypto assets and five seeds, MRC achieves a Sharpe ratio of 1.51 and a cumulative return of 440.1%, ranking first on cumulative return (CR), Sharpe ratio (SR), and information ratio (IR) among active baselines and attaining the lowest maximum drawdown (MDD) among active methods. Ablation results show that the gains come from Shapley-weighted integration across coalition outputs rather than from any single stage in isolation. Code and demo data are included in the supplementary material.
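With only N=3 agents there are 2^3 = 8 coalitions, so exact Shapley values can be computed by direct enumeration. The sketch below shows the textbook formula; the function name `shapley_values`, the agent labels, and the coalition values are hypothetical, and how MRC scores coalition outputs or converts credits into weights is not specified in the abstract.

```python
from itertools import combinations
from math import factorial
from typing import Dict, FrozenSet, Sequence


def shapley_values(
    players: Sequence[str],
    value: Dict[FrozenSet[str], float],  # value of every coalition, including the empty one
) -> Dict[str, float]:
    """Exact Shapley value of each player by enumerating all coalitions."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Probability that exactly the members of s precede p in a random ordering.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                # Marginal contribution of p when joining coalition s.
                phi[p] += weight * (value[s | {p}] - value[s])
    return phi


# Hypothetical coalition scores for three specialist agents.
demo_value = {
    frozenset(): 0.0,
    frozenset({"a"}): 1.0, frozenset({"b"}): 2.0, frozenset({"c"}): 3.0,
    frozenset({"a", "b"}): 4.0, frozenset({"a", "c"}): 5.0, frozenset({"b", "c"}): 6.0,
    frozenset({"a", "b", "c"}): 9.0,
}
demo_phi = shapley_values(["a", "b", "c"], demo_value)
```

The credits always sum to the grand-coalition value (9.0 in this toy example), which is the efficiency property that makes them usable as a basis for normalized agent weights.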

2605.24484 2026-05-26 cs.AI cs.LG 版本更新

SPACE: Unifying Symmetric and Asymmetric Routing Problems for Generalist Neural Solver

SPACE:统一对称与非对称路由问题的通用神经求解器

Rongsheng Chen, Changliang Zhou, Canhong Yu, Yuanyao Chen, Yu Zhou, Zhuo Chen, Zhenkun Wang

发表机构 * School of Automation and Intelligent Manufacturing, Southern University of Science and Technology, Shenzhen, China(自动化与智能制造学院,南方科技大学,深圳,中国) Pengcheng Laboratory, Shenzhen, China(鹏城实验室,深圳,中国) Guangdong Provincial Key Laboratory of Fully Actuated System Control Theory and Technology, Southern University of Science and Technology, Shenzhen, China(广东省全驱动系统控制理论与技术重点实验室,南方科技大学,深圳,中国) College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China(计算机科学与软件工程学院,深圳大学,深圳,中国)

AI总结 针对现有神经求解器在对称与非对称车辆路径问题中表现不一致的问题,提出基于空间枢轴对齐的无坐标嵌入框架SPACE,通过双向弗雷歇表示和权重解耦自适应解码机制,实现统一节点表示与解生成,在110个变体上取得优异零样本泛化。

详情
AI中文摘要

通用神经路由求解器在利用统一模型解决多种车辆路径问题(VRPs)方面显示出巨大潜力。然而,现有求解器通常局限于对称设置,或在切换到非对称设置时由于输入不一致或固有结构差异而性能下降,这严重限制了它们在包含两种场景的实际应用中的实用性。为解决这一限制,我们基于每个节点到特定枢轴集的相对距离定义其空间位置,并进一步提出一种空间枢轴对齐的无坐标嵌入(SPACE)框架,该框架统一了对称和非对称VRP中的节点表示和解生成。具体而言,我们使用一种新颖的最远枢轴采样策略构建双向弗雷歇表示,以实现跨不同问题设置的不变节点表示。此外,我们引入了一种权重分解的自适应解码机制,将几何感知从问题表示中解耦,减轻约束决策对特定几何设置的过拟合。在110个VRP变体(包括55个对称问题及其非对称对应问题)上的大量实验表明,SPACE在对称和非对称VRP中均实现了有前景的零样本泛化。

英文摘要

Generalist neural routing solvers have shown great potential in solving diverse vehicle routing problems (VRPs) with a unified model. However, existing solvers are typically limited to symmetric settings or degrade in performance when switching to asymmetric settings due to input inconsistencies or inherent structural differences, substantially limiting their practicality in real-world scenarios that encompass both settings. To address this limitation, we define the spatial position of each node based on the relative distances to a specific set of pivots and further propose a Spatial Pivot-Aligned Coordinate-free Embedding (SPACE) framework that unifies node representation and solution generation across symmetric and asymmetric VRPs. Specifically, we construct a bidirectional Fréchet representation using a novel furthest pivot sampling strategy to enable invariant node representations across distinct problem settings. Furthermore, we introduce a weight-decomposed adaptive decoding mechanism that decouples geometric perception from problem representations, mitigating the overfitting of constraint decisions to a specific geometry setting. Extensive experiments on 110 VRP variants, comprising 55 symmetric problems and their asymmetric counterparts, demonstrate that SPACE achieves promising zero-shot generalization in both symmetric and asymmetric VRPs.

2605.24477 2026-05-26 cs.LG cs.IT math.IT math.ST stat.TH 版本更新

The Normalized Maximum Likelihood for Regular Non-Smooth Models: Measure-Theoretic Foundations and Geometric Sampling

正则非光滑模型的归一化最大似然:测度理论基础与几何采样

Trenton Lau, Gary P. T. Choi

发表机构 * Department of Mathematics, The Chinese University of Hong Kong(香港中文大学数学系)

AI总结 针对现代机器学习中非光滑估计器(如Lasso、稀疏SVM)的归一化最大似然(NML)编码长度计算问题,本文利用几何测度论和保守雅可比矩阵建立严格框架,并提出一种几何MCMC采样器(PDL-PPMH)以精确计算非光滑模型的随机复杂度。

详情
AI中文摘要

归一化最大似然(NML)编码长度,或称随机复杂度,代表了通用编码的一个原则性准则。虽然最近基于余面积公式的公式化为光滑模型提供了计算方法,但该框架对于现代机器学习中普遍存在的非光滑估计器(例如Lasso、稀疏SVM)失效。在这项工作中,我们为正则路径可微Lipschitz(PDL)估计器提供了计算NML的严格框架。通过应用经典几何测度论并将余面积公式与保守雅可比矩阵联系起来,我们证明了非光滑模型的随机复杂度是适定的,并且在理论上与现代自动微分的输出一致。为了精确计算该量,我们引入了提议-投影Metropolis-Hastings(PDL-PPMH)采样器,这是一种能够遍历最大似然估计器非可微水平集的几何MCMC算法。我们在理论上证明了其组成部分的合理性,包括随机切空间提议和可证明收敛的非光滑投影求解器。我们通过从高维Lasso后验($P=2000$)中采样来展示该方法的鲁棒性,同时量化了控制精确性与混合时间之间权衡的计算规模。关键的是,我们通过实验证明,我们的精确NML准则提供了一种高度数据高效的交叉验证替代方案,无需数据分割即可获得统计上不可区分的预测最优值。总之,我们的工作为正则非光滑模型的NML编码长度理论分析铺平了道路。

英文摘要

The Normalized Maximum Likelihood (NML) codelength, or stochastic complexity, represents a principled criterion for universal coding. While recent coarea-based formulations provided a calculation method for smooth models, this framework collapses for the non-smooth estimators ubiquitous in modern machine learning (e.g., Lasso, Sparse SVMs). In this work, we provide a rigorous framework for computing the NML for regular path-differentiable Lipschitz (PDL) estimators. By applying classical geometric measure theory and bridging the coarea formula with conservative Jacobians, we prove that the stochastic complexity for non-smooth models is well-posed and theoretically consistent with the outputs of modern Automatic Differentiation. To compute this quantity exactly, we introduce the Propose-and-Project Metropolis-Hastings (PDL-PPMH) sampler, a geometric MCMC algorithm capable of traversing the non-differentiable level sets of the maximum likelihood estimator. We theoretically justify its components, including a stochastic tangent space proposal and a provably convergent non-smooth projection solver. We demonstrate the method's robustness by sampling from a high-dimensional Lasso posterior ($P=2000$), while simultaneously quantifying the computational scaling that governs the trade-off between exactness and mixing time. Crucially, we empirically demonstrate that our exact NML criterion provides a highly data-efficient alternative to cross-validation, achieving statistically indistinguishable predictive optima without requiring data splitting. Altogether, our work paves the way for the theoretical analysis of the NML codelength for regular non-smooth models.

2605.24458 2026-05-26 cs.LG cs.AI 版本更新

Balancing Fairness, Privacy, and Accuracy: A Multitask Adversarial Framework for Centralized Data-Driven Systems

平衡公平性、隐私和准确性:面向集中式数据驱动系统的多任务对抗框架

Imesh Ekanayake, Elham Naghizade, Jeffrey Chan

发表机构 * School of Computing Technologies, RMIT University(计算技术学院,皇家墨尔本理工大学)

AI总结 提出一种多任务对抗模型,将公平性和隐私作为核心目标,通过优化代价函数动态平衡三者,在最小化性能损失的同时实现高公平性和隐私保护。

Comments 13 Pages, 6 figures, IEEE TKDE

详情
AI中文摘要

在集中式数据驱动应用中,公平性和隐私的整合至关重要,尤其是当这些系统日益影响具有重大社会影响的领域时。当前方法很少同时考虑隐私、公平性和准确性,这可能会损害伦理标准和隐私法规。然而,平衡这三个目标相当具有挑战性,因为每个目标通常对模型的设计和训练提出相互冲突的要求,使得优化一个目标而不损害其他目标变得困难。本文提出了一种新颖的多任务对抗模型,将公平性和隐私视为整体目标而非事后考虑,并学习一个隐藏敏感属性同时保留任务相关信息的潜在表示。我们的方法通过优化的代价函数动态平衡公平性与准确性及隐私,即使在严格条件下也能实现最小的性能损失。在多种数据集上的广泛测试表明,我们的模型能够在不大幅牺牲准确性的情况下实现高标准的公平性和隐私。与最先进的隐私和公平标准进行基准测试表明,我们的方法增强了隐私、公平性和准确性优化的鲁棒性,证明了其在不同数据集上的适应性。

英文摘要

The integration of fairness and privacy in centralized data-driven applications is critical, especially as these systems increasingly influence sectors with significant societal impact. Current methods rarely address privacy, fairness, and accuracy together, which can potentially compromise ethical standards and privacy regulations. However, balancing these three objectives is challenging because each objective often imposes conflicting requirements on the design and training of models, making it difficult to optimize one without compromising the others. This paper introduces a novel multitask adversarial model that treats fairness and privacy as integral objectives rather than afterthoughts, and learns a latent representation that hides sensitive attributes while preserving essential task-related information. Our approach dynamically balances fairness with accuracy and privacy through an optimized cost function with minimal performance loss even under strict conditions. Extensive testing on diverse datasets shows the ability of our model to achieve high standards of fairness and privacy without significant sacrifice to accuracy. Benchmarking against state-of-the-art privacy and fairness standards shows that our method enhances the robustness of privacy, fairness, and accuracy optimization, proving its adaptability across various datasets.

2605.24457 2026-05-26 eess.SY cs.LG cs.SY 版本更新

Asymmetric Adaptation-based Real-time Fault Diagnosis Under Transitional Operating Conditions

基于非对称自适应的过渡工况实时故障诊断

Hongshuo Zhao, Zeyi Liu, Xiao He

发表机构 * MCC5 Group Shanghai Co. LTD(MCC5集团上海有限公司) Tsinghua University(清华大学)

AI总结 针对离线训练未覆盖的过渡工况导致分布偏移问题,提出一种结合离线域泛化与在线测试时自适应的非对称自适应故障诊断方法,通过周期原型重投影和不对称学习率策略实现快速适应并保持判别能力。

Comments 6 pages, 3 figures, Accepted by ICAIS & ISAS 2026

详情
AI中文摘要

实际工业场景中的数据流通常包含离线训练中未覆盖的过渡工况,导致显著的分布偏移。为弥合静态离线模型与动态在线数据之间的差距,本文提出了一种新颖的基于非对称自适应的故障诊断方法。具体地,在离线阶段,我们采用域泛化技术从多个稳定工况中提取域不变特征,并构建鲁棒的归一化故障原型作为参考锚点。随后,在在线推理阶段,我们设计了一种基于周期原型重投影机制的在线测试时自适应方法,以动态更新原型位置。此外,我们利用从锚点导出的几何分布来指导分类器的更新,并对特征提取器和分类器采用非对称学习率策略。所提方法确保快速适应新的过渡工况,同时保留从离线域泛化初始化继承的判别能力。实验结果表明,该机制有效利用离线泛化知识指导在线推理,显著提高了非平稳环境下的鲁棒性。

英文摘要

Data streams in real-world industrial scenarios often contain transitional operating conditions that are not covered during offline training, leading to significant distribution shifts. To bridge the gap between static offline models and dynamic online data, a novel asymmetric adaptation-based fault diagnosis method is proposed in this paper. Specifically, in the offline stage, we employ domain generalization techniques to extract domain-invariant features from multiple stable conditions and construct robust normalized fault prototypes as reference anchors. Subsequently, during online inference, we design an online test-time adaptation method based on a periodic prototype re-projection mechanism to dynamically update prototype positions. Furthermore, we utilize the geometric distribution derived from anchors to guide the updates of classifiers and adopt an asymmetric learning rate strategy for the feature extractor and classifier. The proposed approach ensures rapid adaptation to new transitional conditions while preserving the discriminative power inherited from the offline domain generalization initialization. Experimental results demonstrate that this mechanism effectively leverages offline generalized knowledge to guide online inference, significantly improving robustness in non-stationary environments.

2605.24452 2026-05-26 cs.CL cs.AI cs.LG 版本更新

Temporal Concept Drift in Legal Judgment Prediction: Neural Baselines Across Three Epochs of Ukrainian Court Decisions

法律判决预测中的时间概念漂移:跨越乌克兰法院判决三个时期的神经基线

Volodymyr Ovcharov

AI总结 通过微调四种Transformer编码器在乌克兰法院三个时期(战前、混合战争、全面入侵)的判决上,研究法律语言的时间漂移,发现前向性能严重下降(最多27.2个百分点),法律领域预训练不能提升绝对性能但能减轻漂移,时序持续学习可消除灾难性遗忘。

Comments 17 pages, 6 tables, 5 figures. Dataset: https://huggingface.co/datasets/overthelex/ukrainian-court-decisions

详情
AI中文摘要

法律NLP基准测试在随机分割的数据上评估模型,隐含假设法律语言是平稳的。我们通过微调四种Transformer编码器——XLM-RoBERTa(base和large)及其法律领域变体——在地缘政治事件定义的三个时间时期的乌克兰法院判决上测试这一假设:战前(2008-2013)、混合战争(2014-2021)和全面入侵(2022-2026)。每个模型在一个时期上训练,并在所有三个时期上评估,产生一个3x3的跨时间泛化矩阵。四个发现出现。(1)前向退化严重:在战前数据上训练的模型应用于全面入侵时期判决时,宏F1最多下降27.2个百分点。(2)退化不对称:后向迁移(全面入侵到战前)比前向迁移稳健得多,与法律语言是加性的假设一致。(3)法律领域预训练(Legal-XLM-R)不提升绝对性能,但减少前向退化的幅度和不对称性。(4)时序持续学习消除了通用XLM-R的灾难性遗忘:战前知识完全保留(+1.8至+6.2个百分点),而全面入侵性能提升+16.5至+19.0个百分点;逆时序训练导致严重遗忘。跨司法管辖区在瑞士判决预测数据上的预训练提升绝对性能,但不减少时间退化幅度,确认时间漂移是法律语言演化的内在属性。数据集(三个时期共428K判决)作为LEXTREME贡献公开可用。

英文摘要

Legal NLP benchmarks evaluate models on randomly split data, implicitly assuming that legal language is stationary. We test this assumption by fine-tuning four transformer encoders -- XLM-RoBERTa (base and large) and their legal-domain variants -- on Ukrainian court decisions from three temporal epochs defined by geopolitical disruptions: pre-war (2008-2013), hybrid war (2014-2021), and full-scale invasion (2022-2026). Each model is trained on one epoch and evaluated on all three, producing a 3x3 cross-temporal generalization matrix. Four findings emerge. (1) Forward degradation is severe: models trained on pre-war data lose up to 27.2 percentage points of macro-F1 when applied to full-scale invasion era decisions. (2) The degradation is asymmetric: backward transfer (full-scale to pre-war) is substantially more robust than forward transfer, consistent with the hypothesis that legal language is additive. (3) Legal-domain pretraining (Legal-XLM-R) does not improve absolute performance but reduces forward degradation magnitude and asymmetry. (4) Chronological continual learning eliminates catastrophic forgetting for general XLM-R: pre-war knowledge is fully retained (+1.8 to +6.2 pp) while full-scale performance gains +16.5 to +19.0 pp; reverse-chronological training causes severe forgetting. Cross-jurisdictional pretraining on Swiss Judgment Prediction data improves absolute performance but does not reduce temporal degradation magnitude, confirming that temporal drift is an intrinsic property of legal language evolution. The dataset (428K decisions across three epochs) is publicly available as a LEXTREME contribution.

2605.24449 2026-05-26 cs.RO cs.LG 版本更新

Vision-Guided Outdoor Flight and Obstacle Evasion via Reinforcement Learning

基于强化学习的视觉引导户外飞行与避障

Shiladitya Dutta, Aayush Gupta, Varun Saran, Avideh Zakhor

发表机构 * College of Engineering, Department of Electrical Engineering and Computer Science, University of California Berkeley(加州大学伯克利分校工程学院电气工程与计算机科学系)

AI总结 提出一种基于立体视觉深度和视觉惯性里程计的传感器运动策略,通过强化学习和特权学习在仿真中训练,实现零样本迁移到未知户外环境和无人机平台进行自主避障导航。

Comments Published in IEEE Robotics and Automation Letters, vol 11, no 2. Presented at the IEEE International Conference on Robotics and Automation 2026

详情
AI中文摘要

尽管四旋翼飞行器凭借其全向机动性拥有令人印象深刻的穿越能力,但在复杂环境中需要持续的人工操控限制了其在GNSS和遥测信号缺失场景中的应用。为此,我们提出了一种新颖的传感器运动策略,该策略使用立体视觉深度和视觉惯性里程计(VIO)在未知环境中自主穿越障碍物以到达目标点。该策略由一个预训练的自编码器作为感知前端,后接一个规划与控制LSTM网络,输出速度指令,可由现成的商用无人机执行。我们利用强化学习和特权学习范式,通过两阶段过程在仿真中训练该策略:1)以全局运动规划器生成的优化轨迹作为监督骨干进行初始训练;2)在课程环境中进一步微调。为弥合仿真到现实的差距,我们采用领域随机化和奖励塑造来创建对噪声和领域偏移具有鲁棒性的策略。在户外实验中,我们的方法成功实现了对训练中从未遇到的障碍环境和无人机平台的零样本迁移。

英文摘要

Although quadcopters boast impressive traversal capabilities enabled by their omnidirectional maneuverability, the need for continuous pilot control in complex environments impedes their application in GNSS- and telemetry-denied scenarios. To this end, we propose a novel sensorimotor policy that uses stereo-vision depth and visual-inertial odometry (VIO) to autonomously navigate through obstacles in an unknown environment to reach a goal point. The policy comprises a pre-trained autoencoder as the perception head, followed by a planning and control LSTM network that outputs velocity commands that can be followed by an off-the-shelf commercial drone. We leverage reinforcement and privileged learning paradigms to train the policy in simulation through a two-stage process: 1) initial training with optimal trajectories generated by a global motion planner acting as a supervisory backbone, 2) further fine-tuning in a curriculum environment. To bridge the sim-to-real gap, we employ domain randomization and reward shaping to create a policy that is robust to both noise and domain shift. In outdoor experiments, our approach achieves successful zero-shot transfer to both obstacle environments and a drone platform that were never encountered during training.

2605.24437 2026-05-26 cs.LG 版本更新

CAffNet: Hard Constraint-Affine Neural Networks

CAffNet: 硬约束仿射神经网络

Yang Zhao, Jungeun Lee, Jeong hwan Jeon, Sze Zheng Yong

发表机构 * Department of Mechanical and Industrial Engineering, Northeastern University, Boston, MA 02115 USA(东北大学机械与工业工程系) Department of Electrical Engineering, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea(蔚山科学技术院电气工程系)

AI总结 提出一种将任意基数的输入相关仿射约束硬嵌入前馈神经网络和Transformer的框架,通过可训练的约束仿射层实现联合优化并保持通用逼近性质。

详情
AI中文摘要

我们提出了一种新颖的框架,用于将硬约束满足嵌入神经网络(NN)架构中,特别是前馈神经网络和Transformer,约束为任意基数的输入相关仿射约束。传统的约束执行方法要么依赖于基于惩罚的软约束,无法保证满足性,要么依赖于训练后执行约束的后处理方法,可能导致次优性。我们在神经网络中引入了一个可训练的约束仿射(CAffine)层,得到CAffNet,它超越了通过固定正交或平行投影执行仿射约束的方式,并实现了与网络参数的联合优化。此外,我们对约束空间维度没有施加任何限制,并证明了我们的构造保持了神经网络的通用逼近性质,同时为所有输入提供了约束遵守的可证明保证。实验验证表明,在需要保证约束满足的各个领域中,性能稳健。

英文摘要

We present a novel framework for embedding hard constraint satisfaction into neural network (NN) architectures, specifically feedforward neural networks and transformers, with input-dependent affine constraints of arbitrary cardinality. Traditional constraint enforcement approaches either rely on penalty-based soft constraints, which offer no guarantee of satisfaction, or on post-processing methods that enforce constraints after the NN is trained, which may lead to suboptimality. We introduce a trainable constraint-affine (CAffine) layer into NNs, yielding CAffNet, which goes beyond enforcing affine constraints via fixed orthogonal or parallel projections and enables joint optimization with network parameters. Moreover, we impose no restrictions on the constraint space dimensions and establish that our construction preserves the universal approximation properties of NNs, while providing provable guarantees on constraint adherence for all inputs. Experimental validation demonstrates robust performance across diverse domains requiring guaranteed constraint satisfaction.
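For reference, the fixed orthogonal projection that the abstract contrasts against can be written in closed form. For a single affine constraint a·y = b, the closest point to a raw output z is y = z − a·(a·z − b)/(a·a). The sketch below implements only this baseline with hypothetical names; it is not the trainable CAffine layer, whose construction the abstract does not detail.

```python
from typing import List, Sequence


def project_onto_hyperplane(
    z: Sequence[float],  # raw network output
    a: Sequence[float],  # constraint normal (could depend on the input), must be nonzero
    b: float,            # constraint offset
) -> List[float]:
    """Orthogonal projection of z onto the affine set {y : a . y = b}."""
    a_dot_z = sum(ai * zi for ai, zi in zip(a, z))
    a_dot_a = sum(ai * ai for ai in a)
    # Step along the normal direction just far enough to cancel the residual.
    step = (a_dot_z - b) / a_dot_a
    return [zi - step * ai for zi, ai in zip(z, a)]
```

Because the projection direction is fixed by the constraint, it cannot be tuned during training, which is the limitation the abstract says a trainable layer is meant to address.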

2605.24436 2026-05-26 cs.MA cs.LG cs.RO 版本更新

A Reinforcement Learning Inspired Latent Yield Based Adaptive Algorithm Switching Mechanism

一种受强化学习启发的基于潜在收益的自适应算法切换机制

Jayprakash S. Nair, Jimson Mathew, Shivashankar B. Nair

发表机构 * Indian Institute of Technology Patna(印度理工学院巴特那分校) Indian Institute of Technology Guwahati(印度理工学院古瓦哈提分校)

AI总结 针对在线或动态环境中算法选择困难的问题,提出一种受强化学习启发的潜在收益方法,通过封装奖励和惩罚触发探索与利用,实现自适应算法切换,并在排序算法和机器人避障任务中验证了有效性。

Comments Accepted and published in the Proceedings of the 29th European Conference on Applications of Evolutionary Computation (EvoApplications 2026), held as part of EvoStar 2026, Toulouse, France, April 8 to 10, 2026. Lecture Notes in Computer Science (LNCS), Springer Nature Switzerland

详情
Journal ref
Applications of Evolutionary Computation, EvoApplications 2026, LNCS, Springer Nature Switzerland, 2026
AI中文摘要

对于给定的问题实例,选择最合适的算法仍然是一项具有挑战性的任务,尤其是在问题特征随时间演变的在线或动态环境中。仅依赖瞬时性能指标可能导致反应性和不稳定的行为,通常会导致次优的算法切换。本文介绍了一种计算高效的方法,用于聚合算法在多个问题实例上的性能,该方法对实例特征的剧烈变化具有相当的免疫性。受强化学习(RL)固有特征的启发,该技术将奖励和惩罚封装到一个潜在收益中,进而触发利用和探索,从而产生自适应算法切换。所提出的技术采用受遗传算法启发的岛屿模型,以促进并行探索和算法种群之间的性能交换,这些算法种群栖息在局部库中。在排序算法和机器人避障任务上的实验评估证明了该方法的可行性和有效性,突显了其在自适应算法选择至关重要的领域中的潜力。

英文摘要

Selecting the most suitable algorithm for a given problem instance remains a challenging task, particularly in online or dynamic environments where problem characteristics evolve over time. Relying solely on instantaneous performance metrics can result in a reactive and unstable behaviour, often leading to suboptimal algorithm switching. This paper introduces a computationally efficient approach for aggregating an algorithm's performance across multiple problem instances that is fairly immune to erratic variations in instance features. Inspired by features inherent to Reinforcement Learning (RL), this technique encapsulates rewards and penalties into a latent yield that, in turn, triggers exploitation and exploration, consequently resulting in adaptive algorithm switching. The proposed technique employs island models, inspired by Genetic Algorithms, to facilitate parallel exploration and performance exchanges among algorithm populations inhabiting local repertoires. Experimental evaluations on sorting algorithms and robotic obstacle avoidance tasks demonstrate the feasibility and effectiveness of the approach, highlighting its potential in domains where adaptive algorithm selection is critical.

2605.24433 2026-05-26 cs.RO cs.LG 版本更新

Smoother Action Chunking Flow Policy via Prior-Corrected Orthogonal Trust-Region Guidance

基于先验校正的正交信任区域引导的平滑动作块流策略

Kai Fang, Hailong Pei, Xuemin Chi

发表机构 * South China University of Technology(华南理工大学) Zhejiang University(浙江大学)

AI总结 提出POTR方法,通过先验校正权重和正交信任区域约束,改善流匹配机器人策略中动作块推理的边界不连续性和横向扰动,提升成功率和运动平滑性。

详情
AI中文摘要

流匹配机器人策略通常使用动作块推理进行高效的闭环控制,但块边界可能引入不连续的动作转换。现有的RTC引导通过在去噪过程中注入校正信号来改善连续性,但其权重调度在中间时间步较弱,且无约束的校正方向可能引入横向扰动。我们提出POTR,一种先验校正的正交信任区域引导方法。首先,我们将数据先验尺度$σ_d$纳入RTC引导权重,产生更强的中间时间校正。其次,我们将引导向量分解为与去噪速度平行和垂直的分量,并将垂直分量约束在信任区域内。在LIBERO上使用$π_{0.5}$,与RTC相比,POTR提高了成功率,并持续减少了块边界不连续性、加速度和加加速度。消融实验表明,先验校正权重提供了主要的校正增益,而正交信任区域进一步提高了稳定性。

英文摘要

Flow-matching robot policies commonly use action-chunking inference for efficient closed-loop control, but chunk boundaries can introduce discontinuous action transitions. Existing RTC guidance improves continuity by injecting correction signals during denoising, yet its weight schedule is weak at intermediate timesteps and its unconstrained correction direction may introduce transverse perturbations. We propose POTR, a Prior-corrected Orthogonal Trust-Region guidance method. First, we incorporate a data-prior scale $σ_d$ into the RTC guidance weight, yielding stronger intermediate-time correction. Second, we decompose the guidance vector into components parallel and perpendicular to the denoising velocity, and constrain the perpendicular component within a trust region. On LIBERO with $π_{0.5}$, POTR improves success rate and consistently reduces chunk-boundary discontinuity, acceleration, and jerk compared with RTC. Ablations show that the prior-corrected weight provides the main correction gain, while the orthogonal trust region further improves stability.
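The second step, splitting the guidance vector relative to the denoising velocity and bounding the perpendicular part, could be sketched as below. The norm-clipping rule, the `radius` parameter, and the function name are assumptions made for illustration; the abstract does not state how the trust region is defined.

```python
import math
from typing import List, Sequence


def constrain_guidance(
    g: Sequence[float],  # guidance (correction) vector
    v: Sequence[float],  # denoising velocity at the current step
    radius: float,       # hypothetical trust-region radius for the perpendicular part
) -> List[float]:
    """Keep the component of g parallel to v and clip the perpendicular component."""
    v_dot_v = sum(vi * vi for vi in v)
    if v_dot_v == 0.0:
        parallel = [0.0] * len(g)
    else:
        coef = sum(gi * vi for gi, vi in zip(g, v)) / v_dot_v
        parallel = [coef * vi for vi in v]
    perp = [gi - pi for gi, pi in zip(g, parallel)]
    perp_norm = math.sqrt(sum(p * p for p in perp))
    # Shrink only the transverse part when it leaves the trust region.
    if perp_norm > radius:
        perp = [p * radius / perp_norm for p in perp]
    return [pi + qi for pi, qi in zip(parallel, perp)]
```

Leaving the parallel component untouched preserves correction along the direction the sample is already moving, while the bound limits sideways perturbations.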

2605.24428 2026-05-26 cs.LG 版本更新

Representation-Guided Discrete Molecular Graph Retrosynthesis

表示引导的离散分子图逆合成

Jiahai Huang, Anjie Qiao, Zhen Wang, Defu Lian, Yutong Lu

发表机构 * Sun Yat-sen University(中山大学) University of Science and Technology of China(中国科学技术大学)

AI总结 提出表示引导的分子图逆合成方法GRG,通过将预训练编码器的化学语义注入扩散模型,在USPTO-50k上达到58.6/77.2/83.4/87.1的top-1/3/5/10准确率,多样性提升至15.5,并加速收敛35%的epoch和30%的时间。

详情
AI中文摘要

基于随机过程的分子图生成器已成为无模板单步逆合成的最先进方法。然而,这些模型通常仅在产物-反应物对上训练,从而以间接和隐式的方式获取化学相关表示。与此同时,计算机视觉的最新进展表明,向生成器提供表示引导可以有效地将预训练编码器的语义提取到DiTs中,显著改善收敛性和生成质量。类似的增益是否适用于逆合成任务,以及哪些图特定的设计选择可以使其工作,仍然是一个开放问题。为了解决这些问题,我们在一个统一的设计空间上进行了系统的实证研究,该空间涵盖教师分子表示、端点和粒度选择、去噪器中的注入深度、对应策略和引导方案。在这些考虑的指导下,我们开发了图导向的表示引导(GRG),在USPTO-50k上实现了58.6/77.2/83.4/87.1的top-1/3/5/10准确率,同时将多样性提高到15.5,两者均大幅优于所采用的基础生成器。值得注意的是,GRG在分布外设置中一致地改进了所有top-k指标,表明表示引导有助于获取内在的化学语义。同时,引入的表示引导将达到可比性能所需的epoch数减少了35%,挂钟时间减少了30%。此外,我们引入了一种简单而有效的基于表示相似性的重排序机制,该机制无需训练额外的验证器即可进一步改善排序列表的顶部。

英文摘要

Stochastic process-based molecular graph generators have become the state of the art for template-free single-step retrosynthesis. However, these models are typically trained only on product-reactant pairs, thereby acquiring chemistry-relevant representations in an indirect and implicit manner. Meanwhile, recent advances in computer vision demonstrate that offering representation guidance to a generator can effectively distill semantics from pretrained encoders into DiTs, substantially improving both convergence and generation quality. Whether similar gains extend to the retrosynthesis task, and what graph-specific design choices can make them work, remain open questions. To address these questions, we conduct a systematic empirical study over a unified design space spanning teacher molecular representations, endpoint and granularity choices, injection depths in the denoiser, correspondence strategies, and guidance schemes. Guided by these considerations, we develop Graph-oriented Representation Guidance (GRG), which achieves 58.6 / 77.2 / 83.4 / 87.1 top-1 / 3 / 5 / 10 accuracy on USPTO-50k, while increasing diversity to 15.5, both substantially outperforming the adopted base generator. Notably, GRG consistently improves all top-k metrics in out-of-distribution settings, suggesting that representation guidance facilitates the acquisition of intrinsic chemical semantics. Meanwhile, the introduced representation guidance reduces the number of epochs by 35% and the wall-clock time by 30% to reach comparable performance. In addition, we introduce a simple yet effective representation-similarity-based reranking mechanism, which further improves the top of the ranked list without training an additional verifier.

2605.24425 2026-05-26 cs.LG cs.AI cs.CL 版本更新

Momentum Streams for Optimizer-Inspired Transformers

动量流:优化器启发的Transformer

Jingchu Gai, Nai-Chieh Huang, Jiayun Wu

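Affiliations * Carnegie Mellon University

AI summary Proposes a family of optimizer-inspired Transformers (such as the triple-momentum TMMFormer) by interpreting the residual update as an optimizer step, and finds that momentum is the key source of the performance gain: it leads to flatter minima, less forgetting, and better generalization.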

详情
AI中文摘要

预归一化Transformer层的残差更新可以被解释为对代理token能量执行一阶优化器的一步,其中注意力和MLP子层充当梯度预言。基于这一观察,我们构建了一族优化器启发的Transformer(三重动量、Adam/AdamW、Muon、SOAP),并在匹配计算量下进行比较。在我们的主要预训练实验中,三重动量TMMFormer取得了最低的验证损失,优于普通Transformer和先前的架构变体。受控消融实验和支持理论表明,动量(而非预条件)是增益的主要来源。我们进一步证明,TMMFormer和其他基于动量的设计比普通Transformer收敛到更平坦的极小值,这导致更少的遗忘和更好的泛化。

英文摘要

The residual update of a pre-norm Transformer layer admits an interpretation as one step of a first-order optimizer acting on a surrogate token energy, wherein the attention and MLP sublayers function as gradient oracles. Based on this observation, we build a family of optimizer-inspired Transformers (triple-momentum, Adam/AdamW, Muon, SOAP) and compare them under matched compute. In our main pretraining experiment, the triple-momentum TMMFormer achieves the lowest validation loss, outperforming the vanilla Transformer and prior architectural variants. A controlled ablation and supporting theory show that momentum, not preconditioning, is the main source of the gain. We further show that TMMFormer and other momentum-based designs reach flatter minima than the vanilla Transformer, which leads to less forgetting and better generalization.
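The optimizer reading of the residual stream can be sketched in a few lines. The block below is only an illustration under stated assumptions: it uses a plain heavy-ball momentum rule rather than the paper's triple-momentum update (which the abstract does not specify), and `toy_sublayer` is a hypothetical stand-in for an attention or MLP sublayer acting as a gradient oracle.

```python
from typing import Callable, List

Vector = List[float]
Sublayer = Callable[[Vector], Vector]


def vanilla_stack(x: Vector, sublayers: List[Sublayer]) -> Vector:
    """Plain residual stream: x <- x + f(x), one gradient-like step per sublayer."""
    for f in sublayers:
        update = f(x)
        x = [xi + ui for xi, ui in zip(x, update)]
    return x


def momentum_stack(x: Vector, sublayers: List[Sublayer], beta: float = 0.9) -> Vector:
    """Heavy-ball residual stream (illustrative, NOT the paper's triple-momentum rule).

    A velocity v is carried across layers: v <- beta * v + f(x), then x <- x + v.
    """
    v = [0.0] * len(x)
    for f in sublayers:
        update = f(x)
        v = [beta * vi + ui for vi, ui in zip(v, update)]
        x = [xi + vi for xi, vi in zip(x, v)]
    return x


def toy_sublayer(x: Vector) -> Vector:
    """Hypothetical sublayer: a scaled negative gradient of the energy 0.5 * ||x||^2."""
    return [-0.1 * xi for xi in x]
```

With `beta=0.0` the momentum stream collapses to the vanilla residual stream, which makes the vanilla Transformer a special case of this sketch.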

2605.24422 2026-05-26 stat.ML cs.LG 版本更新

Clustering based on Stochastic Dominance with application for risk averters and risk seekers

基于随机占优的聚类及其对风险规避者和风险寻求者的应用

Hua Li, Xue Jia, Yilin Kang, Wing-Keung Wong

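Affiliations * School of Science, Changchun University, Changchun, China; School of Mathematics and Science, Northeast Normal University, Changchun, China

AI summary Traditional clustering methods fail to capture risk-dominance relationships among assets. This work proposes a clustering framework based on stochastic dominance test statistics: it constructs a stochastic dominance coefficient matrix and modifies the K-means and hierarchical clustering algorithms to support customized asset allocation for investors with different risk preferences.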

详情
AI中文摘要

随机占优(SD)理论为选择适合不同风险偏好(即风险规避、风险寻求和风险中性)投资者资产配置需求的优质资产提供了严格框架。然而,传统的股票聚类方法通常依赖欧氏距离等几何度量,往往无法有效捕捉资产间的内在风险占优关系。为解决这一局限,本文提出一种基于SD检验统计量的创新聚类分析框架。方法上,本研究将SD理论与机器学习算法深度融合。超越传统依赖几何距离的限制,我们创新性地利用一阶、二阶和三阶SD的检验统计量构建“随机占优系数矩阵”。在此矩阵基础上,我们修改了经典的K-means和层次聚类算法。具体地,针对不同阶次的SD关系,我们推导出12种不同的算法变体。同时,我们构建了SD-SC系数和SD-DBI指数作为专门的有效性指标来评估聚类性能。实证上,我们分析了代表性发达市场(美国纳斯达克指数)和新兴市场(中国沪深100指数)的成分股数据。结果验证了所提方法的有效性和稳健性。此外,我们将聚类结果应用于单指数模型的修正和全局最小方差投资组合(GMVP)的构建。结果表明,所提方法有效促进了投资者的定制化资产配置,具有重要的理论价值和实践意义。

英文摘要

Stochastic Dominance (SD) theory provides a rigorous framework for selecting superior assets tailored to the asset allocation needs of investors with varying risk preferences (i.e., risk-averse, risk-seeking, and risk-neutral). However, traditional stock clustering methods typically rely on geometric metrics such as Euclidean distance, which often fail to effectively capture the intrinsic risk dominance relationships among assets. To address this limitation, this paper proposes an innovative clustering analysis framework based on SD test statistics. Methodologically, this study deeply integrates SD theory with machine learning algorithms. Transcending the limitations of traditional reliance on geometric distance, we innovatively utilize test statistics from first-, second-, and third-order SD to construct a "Stochastic Dominance Coefficient Matrix." Building upon this matrix, we modify the classic K-means and Hierarchical Clustering algorithms. Specifically, we derive 12 distinct algorithm variants tailored to different orders of SD relationships. Simultaneously, we construct the SD-SC coefficient and the SD-DBI index as specialized validity indices to evaluate the clustering performance. Empirically, we analyze constituent stock data from a representative developed market (the US NASDAQ Index) and an emerging market (China's CSI 100 Index). The results verify the effectiveness and robustness of the proposed method. Furthermore, we apply the clustering results to the modification of the Single Index Model and the construction of Global Minimum Variance Portfolios (GMVP). The findings demonstrate that the proposed method effectively facilitates customized asset allocation for investors, holding significant theoretical value and practical implications.
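To make the idea of a dominance-based (rather than distance-based) relationship concrete, here is a minimal sketch of a first-order stochastic dominance check on empirical return samples. It is an assumption-laden simplification: the paper builds its matrix from SD test statistics of first, second, and third order, whereas this sketch uses a plain 0/1 indicator from empirical CDFs, and the function names are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def ecdf_values(sample: Sequence[float], grid: Sequence[float]) -> List[float]:
    """Empirical CDF of `sample` evaluated at every point of `grid`."""
    ordered = sorted(sample)
    return [bisect_right(ordered, x) / len(ordered) for x in grid]


def first_order_dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if A's empirical CDF is never above B's and is strictly below it somewhere."""
    grid = sorted(set(a) | set(b))
    fa = ecdf_values(a, grid)
    fb = ecdf_values(b, grid)
    return all(x <= y for x, y in zip(fa, fb)) and any(x < y for x, y in zip(fa, fb))


def dominance_matrix(returns: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, int]]:
    """Pairwise 0/1 indicator matrix: entry [i][j] is 1 when asset i dominates asset j."""
    names = list(returns)
    return {
        i: {j: int(i != j and first_order_dominates(returns[i], returns[j])) for j in names}
        for i in names
    }
```

A matrix of this kind is asymmetric, which is one reason a clustering algorithm built on it cannot simply reuse Euclidean-distance machinery.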

2605.24421 2026-05-26 cs.CR cs.LG 版本更新

Poisoning the Watchtower: Prompt Injection Attacks Against LLM-Augmented Security Operations Through Adversarial Log Content

毒害瞭望塔:通过对抗性日志内容对LLM增强的安全运营进行提示注入攻击

Rohan Pandey, Archit Bhujang

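Affiliations * DigitalOcean; Arizona State University

AI summary Studies how attacker-controlled log fields can inject instructions into an LLM (log-substrate prompt injection), proposes a four-class attack taxonomy, and evaluates attack success rates under different defenses.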

Comments 10 pages

详情
AI中文摘要

大型语言模型(LLM)越来越多地被用作安全运营中心(SOC)的分析师助手,它们接收日志和警报数据,生成分类标签、事件摘要或修复建议。我们研究了这种设计的一个结构性故障模式:许多日志字段是攻击者控制的。用户代理、URL、有效载荷、DNS查询和尝试的用户名因此可以将指令与入侵证据一起传递给模型。我们将这种设置称为“日志基底提示注入”。我们引入了日志基底攻击的四类分类法:直接覆盖(S1)、角色劫持(S2)、上下文操纵(S3)和混淆有效载荷(S4)。我们使用gpt-4o-mini作为分析师评估了48种策略-防御-任务组合。三个发现突出。首先,直接覆盖在我们的设置中无效:所有S1分类攻击实现了0%的抑制。相比之下,角色劫持在朴素分类器下抑制了68%的恶意日志,并且在更强的防御下仍然有效。其次,摘要是最高风险的任务:上下文操纵在没有防御的情况下达到96%的注入成功率,即使在受限输出下也达到38%。第三,防御减少但未消除攻击面:平均注入成功率从朴素提示下的26.6%下降到我们最强防御下的11.8%。我们还将实证结果与确定性模拟分析师进行比较,发现模拟显著错误预测了当前模型行为,尤其是对于直接覆盖。这些结果表明,SOC副驾驶应将原始日志内容视为对抗性输入,而非普通分析师上下文。

英文摘要

Large language models (LLMs) are increasingly used as analyst assistants in security operations centers (SOCs), where they ingest log and alert data to produce triage labels, incident summaries, or remediation advice. We study a structural failure mode of this design: many log fields are attacker controlled. User agents, URLs, payloads, DNS queries, and attempted usernames can therefore carry instructions to the model alongside evidence of the intrusion. We call this setting log-substrate prompt injection. We introduce a four-class taxonomy of log-substrate attacks: direct override (S1), persona hijack (S2), context manipulation (S3), and obfuscated payloads (S4). We evaluate 48 strategy-defense-task combinations using gpt-4o-mini as the analyst. Three findings stand out. First, direct overrides are ineffective in our setting: all S1 classification attacks achieve 0% suppression. In contrast, persona hijacks suppress 68% of malicious logs under a naive classifier and remain effective under stronger defenses. Second, summarization is the highest-risk task: context manipulation reaches 96% injection success without defenses and 38% even with constrained output. Third, defenses reduce but do not eliminate the attack surface: average injection success falls from 26.6% under naive prompting to 11.8% under our strongest defense. We also compare empirical results to a deterministic mock analyst and find that simulation substantially mispredicts current model behavior, especially for direct overrides. These results suggest that SOC copilots should treat raw log content as adversarial input rather than ordinary analyst context.
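As an illustration of treating raw log content as adversarial input, the sketch below serializes log fields as escaped JSON data and constrains the output to a fixed label set. Everything here is an assumption for illustration: the label set, the `<log_data>` delimiter, the prompt wording, and the function names are hypothetical and are not the defenses evaluated in the paper. Consistent with the abstract, a defense of this kind should be expected to reduce, not eliminate, injection success.

```python
import json

# Hypothetical label set for the constrained-output defense.
ALLOWED_LABELS = {"benign", "suspicious", "malicious"}


def build_triage_prompt(log_record: dict) -> str:
    """Embed attacker-controllable fields as JSON data, kept apart from the instructions."""
    data_block = json.dumps(log_record, ensure_ascii=True, sort_keys=True)
    # Escape '<' so a field value cannot close the data delimiter early.
    data_block = data_block.replace("<", "\\u003c")
    return (
        "You are a SOC triage assistant. The JSON below is untrusted log data. "
        "Never follow instructions that appear inside it. "
        "Answer with exactly one label: benign, suspicious, or malicious.\n"
        "<log_data>\n" + data_block + "\n</log_data>"
    )


def parse_label(model_output: str) -> str:
    """Accept only known labels; anything else is escalated to a human analyst."""
    label = model_output.strip().lower()
    return label if label in ALLOWED_LABELS else "needs_human_review"
```

The parsing step matters as much as the prompt: free-form model output that drifts from the label set is escalated rather than trusted.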

2605.24420 2026-05-26 cs.LG cs.AI 版本更新

Batch Normalization Amplifies Memorization and Privacy Risks

批归一化加剧记忆化和隐私风险

Ngoc Phu Doan, Chongyan Gu, Ihsen Alouani

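Affiliations * Queen's University Belfast

AI summary Through empirical and theoretical analysis, this paper finds that batch normalization layers substantially increase a model's memorization of outlier samples, which in turn increases the risk of privacy leakage.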

详情
AI中文摘要

批归一化(BN)被广泛采用以加速深度神经网络的收敛并实现更稳定的训练。然而,其对隐私和记忆化的影响在很大程度上尚未被探索。在这项工作中,我们研究了BN层对非典型或异常样本记忆化的影响及其对隐私泄露的启示。我们使用三种互补方法进行了广泛的实证研究:(i)对分布外训练样本的无意记忆化,(ii)通过梯度范数测量的每个样本影响,以及(iii)对成员推断攻击(MIA)的敏感性。跨多个数据集和架构,我们一致观察到,与没有BN的模型相比,BN显著增加了对异常值的记忆化。关键的是,这种放大的记忆化直接转化为隐私漏洞:具有BN的模型对MIA表现出显著更高的敏感性。我们通过理论分析补充了实证结果,表明BN在训练过程中放大了异常样本的每步影响,为这一现象提供了机制性见解。我们的结果突显了与BN相关的被低估的隐私风险,并为归一化层如何放大罕见或敏感训练样本的影响提供了实践和理论见解。

英文摘要

Batch Normalization (BN) is widely adopted to enable faster convergence and more stable training of deep neural networks. However, its impact on privacy and memorization has remained largely unexplored. In this work, we investigate the effect of BN layers on the memorization of atypical or outlier samples and its implications for privacy leakage. We conduct an extensive empirical study using three complementary approaches: (i) unintended memorization of out-of-distribution training samples, (ii) per-sample influence measured via gradient norms, and (iii) susceptibility to membership inference attacks (MIA). Across multiple datasets and architectures, we consistently observe that BN substantially increases the memorization of outliers compared to models without BN. Critically, this amplified memorization translates directly into privacy vulnerabilities: models with BN exhibit significantly higher susceptibility to MIAs. We complement our empirical findings with a theoretical analysis showing that BN amplifies the per-step influence of outlier samples during training, providing mechanistic insight into this phenomenon. Our results highlight an underappreciated privacy risk associated with BN and provide both practical and theoretical insights into how normalization layers can amplify the influence of rare or sensitive training examples.

2605.24418 2026-05-26 cs.LG 版本更新

ChainLearn: A Blockchain-Based Capacity-Aware Framework for Federated Ensemble Learning

ChainLearn: 一种基于区块链的容量感知联邦集成学习框架

Karan Sharma, Aditya Tripathi, Rahul Mishra, Tapas Kumar Maiti

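Affiliations * EdddTri

AI summary To address the failure of standard federated learning when hospitals have heterogeneous compute resources, proposes capacity-aware coordination that uses a blockchain to separate on-chain policy from off-chain learning, assigns each hospital a suitable architecture, and combines predictions via a weighted ensemble, reducing communication overhead while keeping accuracy and calibration error competitive.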

Comments 10 pages, 7 figures, 11 tables. IEEE conference format. Code: https://github.com/EdddTri/ChainLearn

详情
AI中文摘要

联邦学习用于医疗影像中,其中隐私禁止集中数据。标准联邦算法假设同质硬件、相同架构和集中聚合,当医院拥有不均等的计算资源时失败。我们提出容量感知协调:测量每个医院的吞吐量,分配容量适当的架构(MobileNetV3-Small、EfficientNet-B0、ResNet-50),并通过加权集成组合预测。弱医院和强医院都可以参与,无需强制统一架构。我们将链上策略与链下学习分离。一个Solidity合约存储医院注册、基准哈希、指标和权重。医院本地训练并仅提交哈希和标量(而非参数)。加权集成推理在链下计算。在PneumoniaMNIST和DermaMNIST上的实验(5个种子,3个非独立同分布水平)表明,我们的方法相比等权集成实现了更低或相等的校准误差,相比FedAvg、FedProx和FedMD具有竞争性精度。每轮通信开销为224字节,相比FedAvg减少了超过912,000倍。

英文摘要

Federated learning is used in medical imaging where privacy prohibits centralizing data. Standard federated algorithms assume homogeneous hardware, identical architectures, and centralized aggregation, which fails when hospitals have unequal compute resources. We propose capacity-aware coordination: measure each hospital's throughput, assign capacity-appropriate architectures (MobileNetV3-Small, EfficientNet-B0, ResNet-50), and combine predictions via weighted ensemble. Weak and strong hospitals can participate without forcing uniform architectures. We separate on-chain policy from off-chain learning. A Solidity contract stores hospital registration, benchmark hashes, metrics, and weights. Hospitals train locally and submit only hashes and scalars (not parameters). Weighted ensemble inference is computed off-chain. Experiments on PneumoniaMNIST and DermaMNIST (5 seeds, 3 non-IID levels) show our method achieves lower or equal calibration error versus equal-weight ensemble and competitive accuracy versus FedAvg, FedProx, and FedMD. Communication overhead is 224 bytes per round, a reduction of over 912,000x compared to FedAvg.
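The off-chain weighted ensemble step can be sketched as follows. This is a minimal illustration, assuming each hospital's model outputs class probabilities and that the per-hospital weights have already been read from the contract; how ChainLearn actually derives the weights is not stated in the abstract, and the function names are hypothetical.

```python
from typing import Dict, List


def normalize_weights(raw: Dict[str, float]) -> Dict[str, float]:
    """Scale non-negative hospital weights so that they sum to 1."""
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {name: w / total for name, w in raw.items()}


def weighted_ensemble(probs: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Combine per-hospital class probabilities with normalized weights."""
    w = normalize_weights({name: weights[name] for name in probs})
    n_classes = len(next(iter(probs.values())))
    combined = [0.0] * n_classes
    for name, p in probs.items():
        for c in range(n_classes):
            combined[c] += w[name] * p[c]
    return combined
```

Because only predictions are combined, the hospitals can run different architectures (for example MobileNetV3-Small at one site and ResNet-50 at another) without exchanging model parameters.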

2605.24417 2026-05-26 cs.LG 版本更新

LLMTabBench: Evaluating LLMs on Binary Tabular Classification From Zero to Few Shots

LLMTabBench:从零样本到少样本的二元表格分类中评估LLM

Daria Grushina, Kseniia Kuvshinova, Alina Kostromina, Aziz Temirkhanov, Mile Mitrovic, Dmitry Simakov

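Affiliations * Sb AI Lab, HSE, NES; Sb AI Lab, HSE; Sb AI Lab, HSE University; Sb AI Lab

AI summary Proposes the LLMTabBench benchmark to systematically evaluate, for LLM-based tabular classification under data scarcity, how prior knowledge interacts with in-context information (task descriptions and few-shot examples) and how performance scales with data complexity.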

详情
AI中文摘要

表格数据的监督分类仍然是核心机器学习任务,但其对大规模标注数据集的依赖限制了在数据稀缺领域的适用性。对于此类少样本场景,像TabPFN(一种最先进的先验数据拟合网络)这样的专门方法通过利用大规模合成预训练设定了高标准,但它们仍然需要标注示例的上下文才能运行。相比之下,大型语言模型(LLM)可以通过直接从任务描述中进行零样本和少样本上下文学习提供更灵活的替代方案,但它们在表格数据上的性能仍然不一致且理解不足。我们引入了LLMTabBench,这是一个基准测试,旨在系统评估LLM在数据稀缺条件下进行表格分类的能力。LLMTabBench明确探究了(i)LLM先验知识如何与上下文信息(任务描述和少样本示例)相互作用,以及(ii)模型性能如何随数据复杂度的增加而扩展,使用了真实世界和受控合成数据集。我们的发现包括:(1)LLM在零样本设置中极具竞争力,甚至可以超越那些能够访问少样本示例的替代模型;(2)加入额外的少样本示例可能与LLM先验知识冲突,限制甚至降低性能;(3)存在一个数据复杂度阈值,超过该阈值LLM的性能下降且少样本示例变得效果较差。这些发现共同揭示了表格数据上下文学习的基本限制,并为在低数据场景中部署LLM提供了实用指导。

英文摘要

Supervised classification for tabular data remains a core machine learning task, yet its reliance on large labeled datasets limits applicability in data-scarce domains. For such few-shot scenarios, specialized methods like TabPFN - a state-of-the-art Prior-Data Fitted Network - have set a high standard by leveraging large-scale synthetic pretraining, though they still require a context of labeled examples to function. In contrast, Large Language Models (LLMs) could offer a more flexible alternative via zero- and few-shot in-context learning directly from task descriptions, but their performance on tabular data remains inconsistent and poorly understood. We introduce LLMTabBench, a benchmark designed to systematically evaluate LLMs for tabular classification under data-scarce conditions. LLMTabBench explicitly probes (i) how LLM prior knowledge interacts with in-context information (task descriptions and few-shot examples), and (ii) how model performance scales with increasing data complexity, using both real-world and controlled synthetic datasets. Our findings include: (1) LLMs are highly competitive in zero-shot settings and can outperform alternative models, even when those models have access to few-shot examples; (2) incorporating additional few-shot examples can conflict with LLM prior knowledge, limiting or even degrading performance; and (3) there is a data complexity threshold beyond which LLMs' performance declines and few-shot examples become less effective. Together, these findings reveal fundamental constraints of in-context learning for tabular data and provide practical guidance for deploying LLMs in low-data regimes.

2605.24416 2026-05-26 cs.LG 版本更新

Synheart Capacity: A Theory-Driven Physiological Representation of Cognitive Capacity Dynamics from Wearable Signals

Synheart Capacity: 一种理论驱动的从可穿戴信号中认知容量动态的生理表征

Yisak Debele, Henok Ademtew, Israel Goytom

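Affiliations * Synheart AI

AI summary Proposes a theory-driven multimodal learning framework that uses dual-stream encoding of cardiac and electrodermal signals to model capacity-related cognitive state as a two-dimensional physiological representation of resource allocation (effort) and overload (stress). It generalizes across individuals on the SWELL-KW dataset and distinguishes different cognitive regimes.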

详情
AI中文摘要

人类认知表现受有限心理资源的约束,但认知容量动态的连续计算估计仍然是一个开放挑战。我们提出了一种理论驱动的多模态学习框架,将容量相关的认知状态建模为由自愿资源分配(心理努力)和超负荷相关压力(应激)定义的二维生理表征。所提出的架构结合了心脏(IBI/HRV)和皮肤电(EDA)信号的双流编码,以及后期融合和任务特定的输出头,独立估计概率性的努力和压力状态。在SWELL-KW数据集上使用严格的留一受试者交叉验证进行评估,展示了跨个体泛化能力(压力:70.0%平衡准确率;努力:72.2%),多模态融合和理论引导监督带来了显著提升。所提出的努力-压力状态空间不是将生理动态压缩为单一工作负荷标签,而是能够结构化区分不同的认知状态,包括生产性投入和超负荷相关压力。在受控工作负荷操作下,预测的状态轨迹表现出显著的负荷敏感性变化,努力和压力在中断和时间压力条件下呈现差异化响应。这些结果表明,基于生理的多维状态表征可能为能够进行连续容量感知监测和人本交互的自适应系统提供基础。

英文摘要

Human cognitive performance is constrained by limited mental resources, yet continuous computational estimation of cognitive capacity dynamics remains an open challenge. We propose a theory-driven multimodal learning framework that models capacity-related cognitive state as a two-dimensional physiological representation defined by voluntary resource allocation (mental effort) and overload-related strain (stress). The proposed architecture combines dual-stream encoding of cardiac (IBI/HRV) and electrodermal (EDA) signals with late fusion and task-specific output heads that independently estimate probabilistic effort and stress states. Evaluation on the SWELL-KW dataset using strict leave-one-subject-out cross-validation demonstrates cross-individual generalization (stress: 70.0% balanced accuracy; effort: 72.2%), with significant gains from multimodal integration and theory-guided supervision. Rather than collapsing physiological dynamics into a single workload label, the proposed effort-stress state-space enables structured differentiation between distinct cognitive regimes, including productive engagement and overload-related strain. Predicted state trajectories exhibit significant demand-sensitive shifts under controlled workload manipulations, with effort and stress responding differentially across interruption and time-pressure conditions. These results suggest that physiologically grounded multidimensional state representations may provide a foundation for adaptive systems capable of continuous capacity-aware monitoring and human-centered interaction.

2605.24411 2026-05-26 cs.AI cs.LG 版本更新

The Model Is Not the Product: A Dual-Pillar Architecture for Local-First Psychological Coaching

模型并非产品:面向本地优先心理辅导的双支柱架构

Alexander Mihalcea

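Affiliations * iOS application

AI summary Presents Psych LM, an iOS application built on a local-first architecture that achieves the practical effect of a near-infinite context window through an automated memory corpus and retrieval-augmented generation, providing reliable context-aware psychological coaching on mobile devices.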

Comments 10 pages, 3 figures

详情
AI中文摘要

现有语言模型应用难以满足情感导向支持的需求,主要原因是它们无法在会话间维持深度、持久的上下文。本报告介绍了Psych LM,一款iOS应用,验证了对于此类应用,周围架构至关重要的论点。Psych LM在专为行为和生活辅导应用设计的本地优先运行时中运行本地设备端语言模型。该系统通过一个自动化的、用户可检查的记忆语料库实现了接近无限上下文窗口的实际效果,该语料库将对话转换为结构化的记忆卡片,包括事实、目标和事件,并通过语义和向量搜索动态注入提示中。因此,该系统可定义为一种主动学习、检索增强生成、设备端架构。该架构提供了四个主要贡献:以隐私为核心属性的本地优先设计;用于持久存储关键用户信息的记忆语料库的详细描述;提供独立于模型内部状态的稳定行为骨架的确定性编排层;以及专注于在现实操作条件下评估集成系统可靠性的基准框架。研发过程证实,通过优先考虑架构控制和资源管理而非简单模型大小,可以在移动环境的严格约束下可靠地实现复杂的上下文感知交互。

英文摘要

Existing language model applications struggle to meet the demand for emotionally oriented support, primarily due to their inability to maintain deep, persistent context across sessions. This report introduces Psych LM, an iOS application that validates the thesis that, for such applications, the surrounding architecture is paramount. Psych LM runs a local, on-device language model within a purpose-built, local-first runtime designed for behavioral and life-coaching applications. The system achieves the practical effect of a near-infinite context window through an automated, user-inspectable memory corpus that converts conversations into structured memory cards, including facts, goals, and events, and dynamically injects them into the prompt via semantic and vector search. As such, the system can be defined as an active-learning, retrieval-augmented generative, on-device architecture. This architecture delivers four primary contributions: a local-first design where privacy is a core property; a detailed description of the memory corpus for persistent context of key user information; a deterministic orchestration layer that provides a stable behavioral spine independent of the model's internal state; and a benchmark framework focused on evaluating the integrated system's reliability under realistic operating conditions. The R&D process confirms that complex, context-aware interaction can be reliably achieved under the strict constraints of a mobile environment by prioritizing architectural control and resource management over simple model size.

2605.24406 2026-05-26 cs.LG 版本更新

A Unified Python Framework for Direct PPO-based Control of AHUs with Economizer Logic and CO2-Constrained Ventilation

一个统一的Python框架:基于直接PPO的AHU控制,包含节能器逻辑和CO2约束通风

Erfan Haghighat Damavandi, Davide Papurello, Mahdi Alibeigi, Armin Keshavarz, Simone Canevarolo, Marco Condo

Affiliations * Research and Development Department, Owtana Tech, Turin, Italy; Department of Energy, Politecnico di Torino, Turin, Italy; Department of Energy, University of Tehran, Tehran, Iran; Department of Energy, Sharif University of Technology, Tehran, Iran

AI summary Proposes a unified Python framework based on deep reinforcement learning with PPO that uses hierarchical flow logic and an enthalpy-based economizer for energy-efficient AHU control, improving temperature stability and energy efficiency while keeping CO2 concentration within the limit.

Comments 10 pages, 7 figures

详情
AI中文摘要

优化HVAC(供暖、通风和空调)系统可以在为居住者提供舒适度的同时提高建筑能效。由于建筑围护结构随时间经历随机负荷变化而具有非线性特性,使用传统控制系统来维持HVAC功能通常很困难。本文提出了一种新方法,通过深度强化学习(DRL)算法和在自定义Python性能环境中实现的近端策略优化(PPO)算法来优化HVAC系统。DRL系统使用二阶电阻-电容热模型和集成的CO2动态质量平衡来复制与建筑相关的复杂物理过程。本研究的一个主要创新是“层次流逻辑”,它通过覆盖导致CO2超过1000 ppm的智能体动作来确保室内空气质量(IAQ)得以维持。此外,使用基于焓的节能器从室外环境实现免费冷却。实验数据表明,与通过遗传算法(GA)调优的PID控制器或传统的开关控制相比,PPO智能体具有更好的温度稳定性和整体能效。端到端的流水线为在真实硬件实现背景下实施智能建筑能源管理提供了稳健且通用的解决方案。

英文摘要

Optimizing HVAC (Heating, Ventilation and Air Conditioning) systems can improve a building's energy efficiency while keeping its occupants comfortable. Conventional control systems often struggle to maintain HVAC performance because the building envelope behaves nonlinearly and experiences stochastic load variations over time. This paper presents a new approach to optimizing HVAC systems using Deep Reinforcement Learning (DRL), specifically the Proximal Policy Optimization (PPO) algorithm, implemented in a custom Python environment. The environment uses a second-order resistor-capacitor thermal model and an integrated dynamic CO2 mass balance to replicate the complex building physics. One major innovation of this study is a "Hierarchical Flow Logic," which maintains indoor air quality (IAQ) by overriding agent actions that would cause CO2 to exceed 1000 ppm. In addition, an enthalpy-based economizer provides free cooling from the outdoor environment. The experimental data show that, compared to PID controllers tuned by a genetic algorithm (GA) or traditional On-Off control, the PPO agent achieves better temperature stability and overall energy efficiency. The end-to-end pipeline offers a robust and general route to implementing smart building energy management on real hardware.
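The CO2-constrained override can be illustrated with a single-zone, well-mixed mass balance, V dC/dt = G + Q (C_out - C). The sketch below is a hedged approximation, not the paper's implementation: the forward-Euler step, the outdoor concentration of 420 ppm, the rule of forcing maximum airflow, and all function and parameter names are assumptions. Only the 1000 ppm limit comes from the abstract.

```python
def co2_step(c_ppm: float, airflow_m3s: float, gen_m3s: float,
             volume_m3: float, dt_s: float, outdoor_ppm: float = 420.0) -> float:
    """One forward-Euler step of a well-mixed single-zone CO2 mass balance.

    gen_m3s is the occupants' CO2 generation rate in m^3/s; the factor 1e6
    converts a volume fraction into ppm.
    """
    dc_dt = (gen_m3s * 1e6 + airflow_m3s * (outdoor_ppm - c_ppm)) / volume_m3
    return c_ppm + dt_s * dc_dt


def safe_airflow(agent_airflow_m3s: float, c_ppm: float, gen_m3s: float,
                 volume_m3: float, dt_s: float, max_airflow_m3s: float,
                 limit_ppm: float = 1000.0) -> float:
    """Override the agent's airflow when its choice would push CO2 past the limit."""
    predicted = co2_step(c_ppm, agent_airflow_m3s, gen_m3s, volume_m3, dt_s)
    if predicted > limit_ppm:
        return max_airflow_m3s  # hypothetical override rule: force maximum ventilation
    return agent_airflow_m3s
```

For example, in a hypothetical 300 m^3 room at 990 ppm with 0.0001 m^3/s of CO2 generation, an agent action of zero airflow over a 60 s step would be predicted to reach about 1010 ppm and would therefore be overridden.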

2605.24405 2026-05-26 cs.LG cs.AI 版本更新

Generative OOD-regularized Model-based Policy Optimization

生成式OOD正则化的基于模型的策略优化

Aysin Tumay, Jiahe Huang, Elise Jortberg, Rose Yu

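Affiliations * University of California, San Diego; Abiomed

AI summary Proposes GORMPO, an algorithm that uses generative density estimation to restrict policy updates to high-density regions of sparse state-action spaces, addressing out-of-distribution actions in offline reinforcement learning. It outperforms baseline methods on a real-world medical dataset and offline RL datasets.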

详情
AI中文摘要

我们研究使用离线强化学习的序贯决策。传统离线RL策略在训练仅依赖稀疏离线表示时可能导致分布外(OOD)动作。为确保在稀疏状态-动作空间中的安全离线策略,我们探索如何将密度估计模型集成到基于模型的RL方法中以避免OOD区域。生成式模型能够显式建模稀疏状态-动作空间中的密度。基于此,我们引入生成式OOD正则化的基于模型的策略优化(GORMPO),一种密度正则化的离线RL算法,使用生成式密度建模将策略更新限制在数据集的高密度区域。此外,我们考察更好的OOD检测是否对应更好的基于模型的离线策略。我们比较了(1)各种密度估计器的OOD检测能力,以及(2)它们在GORMPO框架内在真实医疗数据集和稀疏离线RL数据集上的性能。我们在温和假设下理论上保证了GORMPO的性能。实验上,GORMPO在真实医疗数据集上比最先进的基线方法提升17%,并在离线RL数据集上增强了基础模型。我们的实证发现表明,在动态稳定的环境中,更好的OOD检测通常导致改进的策略,而当动态不确定时,带有保守惩罚的较差密度估计更受青睐。

英文摘要

We study sequential decision-making with offline reinforcement learning (RL). Traditional offline RL policies may result in out-of-distribution (OOD) actions when training relies only on sparse offline representations. To ensure safe offline policies in a sparse state-action space, we explore how density estimation models can be integrated into model-based RL methods to avoid the OOD regions. Generative models are capable of explicitly modeling the density in sparse state-action spaces. Building on this, we introduce Generative OOD-regularized Model-based Policy Optimization (GORMPO), a density-regularized offline RL algorithm that uses generative density modeling to restrict policy updates to high-density areas of the dataset. Furthermore, we examine whether better OOD detection corresponds to better model-based offline policies. We compare (1) the OOD detection capabilities of various density estimators and (2) their performance within the GORMPO framework on a real-world medical dataset and sparse offline RL datasets. We theoretically guarantee GORMPO's performance under mild assumptions. Empirically, GORMPO outperforms state-of-the-art baselines by 17% on a real-world medical dataset and enhances the base model on the offline RL datasets. Our empirical findings show that better OOD detection generally results in improved policies in environments with stable dynamics, while conservative penalties with poor density estimation are favored when dynamics are uncertain.

2605.24395 2026-05-26 cs.LG 版本更新

AvAtar: Learning to Align via Active Optimal Transport

AvAtar: 通过主动最优输运学习对齐

Qi Yu, Ruizhong Qiu, Zhichen Zeng, My T. Thai, Huan Liu, Hanghang Tong

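Affiliations * University of Illinois Urbana-Champaign; University of Florida; Arizona State University

AI summary Proposes AvAtar, a framework that uses an active learning strategy to quantify the informativeness of candidate points through their gradient-based impact under entropy-regularized optimal transport, solved efficiently with the adjoint-state method, to improve alignment performance.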

Comments Published as a conference paper at ICML 2026

详情
AI中文摘要

对齐在许多机器学习问题中扮演基础角色,例如多网络分析、多模态学习和点云配准。近期工作越来越多地利用最优输运(OT)进行分布对齐,其有效性很大程度上依赖于在实践中难以或昂贵获取的稀疏监督。然而,现有工作大多忽略了如何主动获取高质量监督以提升OT框架下的对齐性能。本文提出了一种基于主动对齐的最优输运框架AvAtar。我们通过测量候选点对全局对齐结果的梯度影响来量化其信息量,该影响通过熵正则化OT公式从全局对齐结果传播到候选点的所有可能监督。鉴于OT的约束性质,直接对其求导具有挑战性,我们利用伴随状态方法将计算重新表述为一个线性系统,可通过共轭梯度法以线性复杂度求解并保证收敛。通过有效的效用函数编码全局对齐结果,AvAtar适用于OT框架下的一般对齐问题。在三个代表性对齐任务上的大量实验证明了所提AvAtar的有效性、可扩展性和泛化性。

英文摘要

Alignment plays a fundamental role in many machine learning problems, such as multi-network analysis, multimodal learning, and point cloud registration. Recent works increasingly leverage optimal transport (OT) for distributional alignment, whose effectiveness largely depends on sparse supervision that is hard or costly to obtain in practice. Existing works, however, largely overlook how to actively acquire high-quality supervision to improve their alignment performance under OT frameworks. In this paper, we propose a principled active alignment framework for optimal transport alignment called AvAtar. We quantify the informativeness of a candidate by measuring its gradient-based impact on the global alignment result, computed as the gradient propagation from the global alignment result to all possible supervisions of the candidate through the entropy-regularized OT formulation. While differentiating through OT is challenging given its constrained nature, we leverage the adjoint-state method to reformulate the computation to a linear system solvable by the conjugate gradient method with linear complexity and guaranteed convergence. By encoding the global alignment result via effective utility functions, AvAtar is applicable to general alignment problems under the OT framework. Extensive experiments on three representative alignment tasks demonstrate the effectiveness, scalability, and generalizability of the proposed AvAtar.
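For readers unfamiliar with the entropy-regularized OT formulation that AvAtar differentiates through, the following is a minimal Sinkhorn sketch in pure Python. It covers only the forward alignment problem under assumed inputs (a cost matrix and two marginals); the active-selection step, the adjoint-state gradient, and the conjugate gradient solver described in the abstract are not reproduced here.

```python
import math
from typing import List

Matrix = List[List[float]]


def sinkhorn(cost: Matrix, a: List[float], b: List[float],
             eps: float = 0.1, n_iters: int = 200) -> Matrix:
    """Entropy-regularized OT plan via Sinkhorn iterations.

    cost[i][j] is the cost of matching source i to target j; a and b are the
    source and target marginals; eps is the entropic regularization strength.
    """
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternate scalings so that rows match a, then columns match b.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
```

The returned plan is a soft alignment: each row gives how the mass of one source point is spread over the targets, and smaller `eps` makes the plan closer to a hard matching.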

2605.24390 2026-05-26 cs.LG 版本更新

Learning Laplacian Eigenspace with Mass-Aware Neural Operators on Point Clouds

学习拉普拉斯特征空间:基于质量感知神经算子的点云处理

Zherui Yang, Tao Du, Ligang Liu

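Affiliations * University of Science and Technology of China; Tsinghua University; Shanghai Qi Zhi Institute; Laoshan Laboratory

AI summary Proposes the Neural Eigenspace Operator (NEO), which predicts a stable, invariant low-frequency subspace rather than eigenvectors and combines a mass-aware neural operator with Rayleigh-Ritz refinement to enable fast spectral decomposition of the Laplace-Beltrami operator on point clouds.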

详情
AI中文摘要

拉普拉斯-贝尔特拉米算子(LBO)的特征分解是几何分析的基础,但由于在大规模数据上迭代求解器的高成本,计算其低频特征模态仍然是一个重大瓶颈。为了分摊这一成本,我们引入了神经特征空间算子(NEO),这是一种前馈框架,旨在直接从点云预测谱。关键的是,NEO通过学习稳定、不变的低频子空间,规避了标准特征向量回归的不适定性(后者存在固有的符号翻转和旋转歧义)。具体来说,网络预测一组冗余的基函数,其张成空间稳健地覆盖目标特征空间,从而通过轻量级的瑞利-里兹精化恢复精确的特征对。为了处理不规则采样,我们提出了一种质量感知神经算子,将逐点面积权重纳入基于注意力的聚合中,提高了对非均匀密度的鲁棒性,并实现了跨分辨率的零样本泛化。我们的方法实现了近线性的运行时间缩放,并在相当精度下比迭代求解器获得了显著的挂钟加速,同时对高分辨率点云表现出强大的零样本迁移能力。得到的特征对支持标准的谱几何任务,而原始基函数为下游学习提供了有效的逐点特征。代码:https://github.com/Adversarr/NEO。

英文摘要

The eigendecomposition of the Laplace-Beltrami Operator (LBO) is fundamental to geometric analysis, yet computing its low-frequency eigenmodes remains a significant bottleneck due to the high cost of iterative solvers on large-scale data. To amortize this cost, we introduce the Neural Eigenspace Operator (NEO), a feed-forward framework designed to predict the spectrum directly from point clouds. Crucially, NEO circumvents the ill-posed nature of standard eigenvector regression, which suffers from intrinsic sign flips and rotation ambiguities, by learning the stable, invariant low-frequency subspace instead. Specifically, the network predicts a redundant set of basis functions whose span robustly covers the target eigenspace, allowing for the recovery of accurate eigenpairs via a lightweight Rayleigh-Ritz refinement. To handle irregular sampling, we propose a mass-aware neural operator that incorporates per-point area weights into attention-based aggregation, improving robustness to non-uniform densities and enabling zero-shot generalization across resolutions. Our approach achieves near-linear runtime scaling and substantial wall-clock speedups over iterative solvers at comparable accuracy, and exhibits strong zero-shot transfer to high-resolution point clouds. The resulting eigenpairs support standard spectral geometry tasks, while the raw basis functions provide effective point-wise features for downstream learning. Code: https://github.com/Adversarr/NEO.

2605.24386 2026-05-26 quant-ph cond-mat.stat-mech cs.DS cs.LG 版本更新

Fermi-Dirac machines as quantizations of neurons

费米-狄拉克机作为神经元的量子化

Alexander He, Nana Liu, Mark M. Wilde

Affiliations * Department of Physics, Cornell University; Institute of Natural Sciences, Shanghai Jiao Tong University; School of Mathematical Sciences, Shanghai Jiao Tong University; Ministry of Education Key Laboratory in Scientific and Engineering Computing, Shanghai Jiao Tong University; Global College, Shanghai Jiao Tong University; School of Electrical and Computer Engineering, Cornell University

AI summary Reinterprets Fermi-Dirac machines as canonical quantizations of classical neurons by replacing classical variables with operators, develops efficient hybrid quantum-classical algorithms to evaluate and train the quantized neurons, and proves that a computational decision problem based on Fermi-Dirac neurons is BQP-complete.

Comments 87 pages, 12 figures, 2 tables

详情
AI中文摘要

费米-狄拉克机最近被提出作为在量子计算机上解决半定优化问题的一种方法。在这里,我们将其重新解释为经典神经元的正则量子化。通过将经典神经元视为应用于参数化经典哈密顿量的激活函数,我们通过用算子替换经典变量来量子化该模型,这些算子的特征值编码了它们的可能值。这遵循了量子力学中正则量子化的标准方法。关键的是,当哈密顿量由对易算子组成时,我们的构造精确地简化为经典神经元。更一般地,我们的方法产生了一个激活可观测量,定义为应用于参数化量子哈密顿量的激活函数。这个量子化神经元的输出是一个随机变量,其期望值等于激活可观测量相对于输入状态的期望值。我们开发了高效的混合量子-经典算法来评估和训练我们的量子化神经元的输出和梯度,从而实现评估和训练。这些算法依赖于基本原语,包括随机采样、哈密顿量模拟和Hadamard测试。我们还量子化了其他一系列激活函数,包括平滑修正线性单元(ReLU)、sigmoid线性单元、高斯平滑ReLU和高斯误差线性单元(GeLU),这些已知对深度学习应用有用。数值实验表明,基于量子哈密顿量的神经元可以学习经典神经元无法学习的函数。我们进一步基于费米-狄拉克神经元定义了一个计算决策问题,并证明它是BQP完全的,提供了反对有效经典模拟的复杂性理论证据。最后,我们将我们的方法推广到连续量子变量,并勾画了将这些神经元组合成网络的两种不同方式。

英文摘要

Fermi-Dirac machines were proposed recently as an approach to solving semidefinite optimization problems on quantum computers. Here, we reinterpret them as canonical quantizations of classical neurons. By viewing a classical neuron as an activation function applied to a parameterized classical Hamiltonian, we quantize this model by replacing classical variables with operators whose eigenvalues encode their possible values. This follows the standard approach to canonical quantization in quantum mechanics. Crucially, when the Hamiltonian consists of commuting operators, our construction reduces exactly to a classical neuron. More generally, our approach yields an activation observable, defined as an activation function applied to a parameterized quantum Hamiltonian. The output of this quantized neuron is a random variable with expectation value equal to that of the activation observable with respect to an input state. We develop efficient hybrid quantum-classical algorithms for evaluating outputs and gradients of our quantized neurons, enabling evaluation and training. These algorithms rely on basic primitives that include random sampling, Hamiltonian simulation, and the Hadamard test. We also quantize a whole host of other activation functions, including the smooth rectified linear unit (ReLU), sigmoid linear unit, Gaussian-smoothed ReLU, and Gaussian error linear unit (GeLU), which are known to be useful for deep learning applications. Numerical experiments indicate that neurons based on quantum Hamiltonians can learn functions that classical neurons cannot. We further define a computational decision problem based on Fermi-Dirac neurons and prove that it is BQP-complete, providing complexity-theoretic evidence against efficient classical simulation. Finally, we generalize our approach to continuous quantum variables and sketch two different ways of composing these neurons into networks.

2605.24381 2026-05-26 cs.LG cs.AI stat.AP stat.ML 版本更新

Assessing the Operational Viability of Foundation Models for Time Series Forecasting

评估基础模型在时间序列预测中的操作可行性

Kavin Soni, Debanshu Das, Vamshi Guduguntla

发表机构 * Google, USA(谷歌公司,美国)

AI总结 通过对比基础模型与监督学习方法在四种操作场景下的性能,提出基于经验特征的复杂度路由器以实现精度与效率的平衡。

Comments 21 pages, 8 Figures, Code available at [https://github.com/kavin-soni/timeseries-zeroshot-eval]

详情
AI中文摘要

时间序列预测驱动着金融、交通和能源等领域的操作决策。虽然监督学习方法表现出色,但它们需要特定领域的训练、特征工程和持续维护。大规模基础模型最近作为一种零样本替代方案出现,像LLM一样避免了任务特定训练。在这项工作中,我们评估了基础模型与标准监督方法的对比。我们不仅关注总体精度,还分析了四种操作场景下的性能:周期性人机系统、物理约束过程、随机金融市场和异构需求预测。我们的结果描述了最优部署区域。基础模型在具有可迁移周期结构的领域中表现良好,并且对于冷启动或长尾场景效率高。相反,监督专家在受严格物理约束的系统中保持更高的精度。在金融领域,较新的基础模型正在迅速缩小与监督专家的性能差距。我们进一步量化了推理延迟、数据漂移适应性和部署约束之间的权衡。最后,我们提出了一个复杂度路由器,它利用经验特征将每个序列分配给最优模型类别。我们证明,与部署通用基础模型相比,这种选择性路由实现了更高的精度和显著更低的推理成本,为平衡泛化性和效率提供了一个实用框架。

英文摘要

Time series forecasting drives operational decisions in areas like finance, transportation, and energy. While supervised learning approaches achieve strong performance, they require domain-specific training, feature engineering, and ongoing maintenance. Large-scale foundation models have recently emerged as a zero-shot alternative, avoiding task-specific training much like LLMs. In this work, we evaluate foundation models against standard supervised approaches. Rather than focusing solely on aggregate accuracy, we analyze performance across four operational regimes: periodic human-centric systems, physically constrained processes, stochastic financial markets, and heterogeneous demand forecasting. Our results characterize optimal deployment areas. Foundation models perform well in domains with transferable periodic structures and are efficient for cold-start or long-tail scenarios. Conversely, supervised specialists maintain higher precision in systems governed by strict physical constraints. In financial domains, newer foundation models are rapidly closing the performance gap with supervised specialists. We further quantify trade-offs in inference latency, data drift adaptability, and deployment constraints. Finally, we propose a Complexity Router that assigns each series to the optimal model class using empirical features. We demonstrate that this selective routing achieves higher accuracy and significantly lower inference costs compared to deploying a universal foundation model, providing a practical framework for balancing generalization and efficiency.

2605.24370 2026-05-26 cs.LG q-bio.QM 版本更新

GEESE: Genotype-aware End-to-End Spatio-temporal Embedding for Behavioral Phenotyping

GEESE: 基因型感知的端到端时空嵌入用于行为表型分析

Yiran Ding, Yuen Gao, Chunqi Qian, Zijun Cui

发表机构 * Department of Computer Science and Engineering(计算机科学与工程系) Department of Radiology(放射学系)

AI总结 提出GEESE框架,利用预训练时间序列基础模型从3D姿态动力学中直接学习行为表征,无需手工特征,在三个自闭症相关基因模型上超越传统方法,并开发了交互式工具HONK。

详情
AI中文摘要

遗传动物模型的行为表型分析目前需要劳动密集的手工特征工程,这限制了可重复性和可扩展性。我们提出GEESE,一个端到端的深度学习框架,直接从3D姿态动力学中学习行为表征,无需手工特征。使用预训练的时间序列基础模型,我们将运动序列编码到一个行为流形中,该流形支持行为分类和基因型预测。在三个自闭症相关基因模型(CNTNAP2、CHD8、FMR1)上评估,我们的深度学习方法在这两个任务上都超越了手工特征基线,揭示出学习到的表征捕获了基因型特异的行为特征。该框架跨遗传背景泛化,一个全队列模型仅从运动模式中识别遗传背景和基因型。我们进一步提供HONK,一个交互式智能工具,使没有编程专业知识的科研人员能够通过自然语言交互从姿态数据中进行行为表型分析。

英文摘要

Behavioral phenotyping of genetic animal models currently requires labor-intensive manual feature engineering that limits reproducibility and scalability. We present GEESE, an end-to-end deep learning framework that learns behavioral representations directly from 3D pose dynamics without hand-crafted features. Using a pretrained time series foundation model, we encode movement sequences into a behavioral manifold that supports both behavior classification and genotype prediction. Evaluated across three autism-associated genetic models (CNTNAP2, CHD8, FMR1), our deep learning approach surpasses hand-crafted feature baselines in both tasks, revealing that learned representations capture genotype-specific behavioral signatures. The framework generalizes across genetic backgrounds, and an all-cohort model identifies both genetic background and genotype from movement patterns alone. We further provide HONK, an interactive intelligent tool enabling researchers without programming expertise to perform behavioral phenotyping from pose data through natural language interaction.

2605.24367 2026-05-26 cs.CV cs.LG 版本更新

Gaussian Rank-Based Neighborhood Degree for Graph Neural Networks in Image Classification

基于高斯排序邻域度的图神经网络图像分类方法

Rafael Mendonça Duarte, Jean Roberto Ponciano, Lucas Pascotti Valem

发表机构 * Institute of Mathematics and Computer Science (ICMC)(数学与计算机科学研究所) University of São Paulo (USP)(圣保罗大学) São Carlos -- SP -- Brazil(巴西圣卡洛斯)

AI总结 提出GRaNDe(高斯排序邻域度)方法,通过结合邻域排序与高斯距离加权来改进图神经网络中的度归一化,在五个公开图像分类数据集上取得一致准确率提升。

详情
AI中文摘要

数据的指数级增长加剧了未标注数据的可用性与人工标注的高成本之间的差距。图神经网络(GNN)作为一种有前景的解决方案出现,因为它们利用关系结构并从标注和未标注数据中学习,执行半监督学习。这些模型的一个关键组成部分是基于度的归一化,它影响消息传播,但通常假设邻域节点具有均匀重要性。在图像分类中,图通常根据特征相似性构建,将所有邻居平等对待可能会忽略相关性的重要变化。受此差距启发,我们提出GRaNDe(高斯排序邻域度)。这种新颖的度度量将邻域排序与高斯距离加权相结合,以更好地捕捉节点重要性。在五个公开图像分类数据集上的实验表明,与最先进方法相比,该方法具有一致的准确率提升和竞争性或更优的结果。

英文摘要

The exponential growth of data has intensified the gap between the availability of unlabeled data and the high cost of manual annotation. Graph Neural Networks (GNNs) have emerged as a promising solution, as they exploit relational structures and learn from both labeled and unlabeled data, performing semi-supervised learning. A crucial component of many of these models is degree-based normalization, which influences message propagation but typically assumes uniform importance among neighboring nodes. In image classification, graphs are usually constructed from feature similarity, where treating all neighbors equally may overlook important variations in relevance. Motivated by this gap, we propose GRaNDe (Gaussian Rank-based Neighborhood Degree). This novel degree measure integrates neighborhood ranking with Gaussian distance weighting to better capture node importance. Experiments on five public image classification datasets show consistent accuracy improvements and competitive or superior results compared to state-of-the-art methods.

2605.24366 2026-05-26 cs.CL cs.LG 版本更新

Structure-Aware RAG: Structured Retrieval Augmented Generation from Noisy Data for Conversational Agents

结构感知检索增强生成:面向对话代理的噪声数据结构化检索增强生成

Kaiqiao Han, LuAn Tang, Renliang Sun, Peng Yuan, Wei Cheng, Haoyu Wang, Wei Wang, Yizhou Sun, Haifeng Chen

发表机构 * UCLA(加州大学洛杉矶分校) NEC Labs(NEC实验室)

AI总结 提出结构感知检索增强生成(SA-RAG),通过表格作为中间结构化表示来减少噪声并保留关键信息,结合质量感知的表格元数据生成框架和优化方法,在噪声真实数据集上显著优于现有RAG基线。

详情
AI中文摘要

大型语言模型(LLM)已广泛应用于对话应用。然而,它们对参数化知识的依赖限制了在需要动态或领域特定信息的真实场景中的可靠性。检索增强生成(RAG)通过在生成过程中引入外部知识来解决这一限制,但现有的基于文本和基于图的RAG方法通常难以处理噪声或不相关的上下文。在这项工作中,我们提出了结构感知检索增强生成(SA-RAG),它使用表格作为中间结构化表示,提供紧凑且可控的接口,在减少噪声的同时保留关键信息。我们引入了一个质量感知的表格元数据生成框架,对元数据规范化和有效性进行建模,提高了元数据质量和下游性能。此外,我们探索了无训练和基于训练的表格生成方法。生成验证和直接偏好优化进一步提高了表格质量,同时保持了语义和结构一致性。在两个噪声真实数据集上的实验表明,SA-RAG显著优于现有的RAG基线。我们的代码已在公共仓库中公开。

英文摘要

Large Language Models (LLMs) have been widely adopted in conversational applications. However, their reliance on parametric knowledge limits reliability in real-world scenarios that require dynamic or domain-specific information. Retrieval-Augmented Generation (RAG) addresses this limitation by incorporating external knowledge during generation, but existing text-based and graph-based RAG methods often struggle with noisy or irrelevant contexts. In this work, we propose Structure-aware Retrieval Augmented Generation (SA-RAG), which uses tables as an intermediate structured representation to provide a compact and controllable interface that reduces noise while preserving essential information. We introduce a quality-aware table metadata generation framework that models metadata normalization and effectiveness, improving metadata quality and downstream performance. Furthermore, we explore both training-free and training-based table generation methods. Generation validation and direct preference optimization further improve table quality while maintaining semantic and structural consistency. Experiments on two noisy real-world datasets show that SA-RAG significantly outperforms existing RAG baselines. Our code is available in a public repository.

2605.24364 2026-05-26 stat.ML cs.LG 版本更新

Multicalibration Boosting: Theory, Convergence, and Transferability

多校准提升:理论、收敛性和可迁移性

Hanxuan Ye, Hongzhe Li

发表机构 * Department of Biostatistics and Epidemiology(生物统计学与流行病学系) University of Pennsylvania(宾夕法尼亚大学)

AI总结 本文统一了多校准提升(MCBoost)的理论框架,揭示了校准-风险权衡,并建立了在弱假设下的收敛率、有限样本保证和协变量偏移下的迁移性保证。

详情
AI中文摘要

多校准通过要求预测在一组丰富的函数(包括预测切片和子群体)上无偏,扩展了经典校准。它已成为公平性、鲁棒性和可靠预测的强大框架,然而多校准提升(MCBoost)的理论理解仍然零散且常依赖限制性假设。在这项工作中,我们发展了一个统一且精细的MCBoost视角,涵盖了现有变体,包括多精度、BatchGCP和BatchMVP。我们揭示了几个现象,为其实际行为提供了新见解:即使高度准确和灵活的预测器也可能保持显著的不校准;强制多校准引入了校准-风险权衡;早停在这一权衡的控制中起核心作用。在理论方面,我们在更弱且更现实的条件下建立了MCBoost的通用框架。我们证明提升迭代收敛到审计类生成的累积跨度上总体最优预测器的Bregman投影,从而显式刻画了实现多校准的函数空间。我们进一步推导了不同光滑性假设下的收敛率、有限样本保证以及确保终止时多校准的原则性停止规则。最后,我们将通用适应性理论扩展到协变量偏移下,提供了更一般的迁移保证,并阐明了多校准预测器何时跨领域泛化。这些结果为多校准提升提供了更完整的理论基础和实践指导,将其定位为现代预测模型的统一框架和可靠的后处理方法。

英文摘要

Multicalibration extends classical calibration by requiring predictions to be unbiased over a rich collection of functions, encompassing both prediction slices and subpopulations. It has emerged as a powerful framework for fairness, robustness, and reliable prediction, yet the theoretical understanding of multicalibration boosting (MCBoost) remains fragmented and often relies on restrictive assumptions. In this work, we develop a unified and refined perspective on MCBoost that subsumes existing variants, including multiaccuracy, BatchGCP, and BatchMVP. We uncover several phenomena that provide new insights into its practical behavior: even highly accurate and flexible predictors can remain substantially miscalibrated; enforcing multicalibration introduces a calibration-risk trade-off; and early stopping plays a central role in controlling this trade-off. On the theoretical side, we establish a general framework for MCBoost under weaker and more realistic conditions. We show that the boosting iterates converge to a Bregman projection of the population-optimal predictor onto the cumulative span generated by the audit class, thereby explicitly characterizing the function space on which multicalibration is achieved. We further derive convergence rates under different smoothness assumptions, finite-sample guarantees, and principled stopping rules that ensure multicalibration at termination. Finally, we extend the theory of universal adaptability under covariate shift, providing more general transfer guarantees and clarifying when multicalibrated predictors generalize across domains. These results provide a more complete theoretical foundation and practical guidance for multicalibration boosting, positioning it as both a unifying framework and a reliable post-processing approach for modern predictive models.

2605.24357 2026-05-26 cs.LG 版本更新

Refined Analysis of Entropy-Regularized Actor-Critic

熵正则化演员-评论家的精细化分析

Safwan Labbi, Paul Mangold, Daniil Tiapkin, Eric Moulines

发表机构 * CMAP, CNRS, École Polytechnique, Institut Polytechnique de Paris, 91120 Palaiseau, France(法国帕莱索,巴黎理工学院、巴黎综合理工学院、法国国家科学研究中心、应用数学中心CMAP) Université Paris-Saclay, CNRS, LMO, 91405, Orsay, France. Now at Google DeepMind.(法国奥赛,巴黎萨克雷大学、法国国家科学研究中心、LMO;现就职于Google DeepMind) Mohamed bin Zayed University of Artificial Intelligence, UAE(阿联酋穆罕默德·本·扎耶德人工智能大学)

AI总结 本文精细化分析了熵正则化有限折扣环境中演员-评论家算法中评论家的作用,证明精确评论家可强方差缩减,使随机梯度演员-评论家达到与确定性策略梯度相同的样本复杂度,并指出评论家误差足够小时方差缩减和快速收敛得以保持,强调了准确评论家估计的关键性。

详情
AI中文摘要

在本文中,我们研究了熵正则化、有限、折扣环境中演员-评论家算法中评论家的作用。我们证明,当评论家精确时,将其作为基线是一种强意义上的方差缩减方法。在这种情况下,使用随机梯度的演员-评论家达到了与确定性策略梯度相同的样本复杂度,以 $\tilde{O}(\log(1/ε))$ 个样本达到 $ε$-最优正则化值。在实践中,评论家与演员同时学习:演员更新的方差随后受到评论家方差和偏差的影响。具体而言,当评论家误差足够小时,方差缩减和快速收敛得以保持。这建议先学习评论家,并在每次演员更新后保持其更新,强调了准确评论家估计在演员-评论家方法中的关键作用。

英文摘要

In this paper, we study the role of the critic in actor--critic for entropy-regularized, finite, discounted environments. We establish that, when the critic is exact, using it as a baseline is a variance-reduction method in a strong sense. In this case, actor--critic with stochastic gradients matches the sample complexity of deterministic policy gradient, reaching an $ε$-optimal regularized value with $\tilde{O}(\log(1/ε))$ samples. In practice, the critic is learned alongside the actor: the variance of the actor update is then influenced by the critic's variance and bias. Specifically, when the critic has a sufficiently small error, the variance reduction and rapid convergence are preserved. This suggests learning the critic first and keeping it up to date after each actor update, underscoring the crucial role of accurate critic estimation in actor--critic methods.

2605.24345 2026-05-26 cs.LG 版本更新

Evolving Robustness--Exploration Trade-off in Online Reinforcement Learning via Quantile Bayesian Risk MDPs

通过分位数贝叶斯风险MDP演化在线强化学习中的鲁棒性-探索权衡

Meichen Song, Yuhao Wang, Enlu Zhou

发表机构 * School of Industrial and Systems Engineering(工业与系统工程学院) Georgia Institute of Technology(佐治亚理工学院)

AI总结 本文提出一种基于分位数贝叶斯风险MDP的自适应算法,通过动态调整分位数水平来平衡早期鲁棒性与后期探索,并证明了亚线性贝叶斯遗憾界。

详情
AI中文摘要

在在线强化学习中,数据稀缺导致认知不确定性,使得鲁棒性在学习初期很重要,而充分的探索对于学习真实环境的最优策略是必要的。我们通过分位数贝叶斯风险感知马尔可夫决策过程(BR-MDP)研究这种时变的鲁棒性-探索权衡,其中分位数水平控制后验不确定性如何进入贝尔曼备份。我们通过分位数BR-MDP值与真实环境值之差的渐近正态性结果来表征这种控制。结果表明,上/下尾分位数分别导致对认知不确定性的乐观/悲观,并且乐观/悲观的程度随着数据积累而减小。基于这一表征,我们提出了一种在线贝叶斯风险感知算法,该算法具有自适应分位数调度,早期强调鲁棒性,并逐渐鼓励探索较少访问的状态-动作对。我们建立了相对于真实最优值和最优BR-MDP鲁棒值的亚线性贝叶斯遗憾界。数值实验在探索需求型和探索成本型环境中均表现出强劲性能。

英文摘要

In online reinforcement learning, data scarcity creates epistemic uncertainty that makes robustness important early in learning, whereas sufficient exploration is needed to learn the true-environment optimal policy. We study this time-varying robustness--exploration trade-off through a quantile Bayesian risk-aware Markov decision process (BR-MDP), in which the quantile level controls how posterior uncertainty enters the Bellman backup. We characterize this control through an asymptotic normality result for the difference between the quantile BR-MDP value and the value in the true environment. The result implies that upper/lower-tail quantiles induce optimism/pessimism towards epistemic uncertainty, and the magnitude of the optimism/pessimism decreases as data accumulate. Building on this characterization, we propose an online Bayesian risk-aware algorithm with an adaptive quantile schedule that emphasizes robustness early and gradually encourages exploration of less-visited state--action pairs. We establish sublinear Bayesian regret bounds with respect to both the true optimal value and the optimal BR-MDP robust value. Numerical experiments demonstrate strong performance in both exploration-demanding and exploration-costly environments.

2605.24340 2026-05-26 cs.LG 版本更新

ChainzRule: Sample-Efficient, Robust Deep Learning Across Tabular, NLP, and Vision Tasks

ChainzRule: 跨表格、NLP和视觉任务的样本高效、鲁棒深度学习

Rowan Martnishn

发表机构 * Sentivity AI

AI总结 提出ChainzRule架构,用可学习多项式层替代激活函数,结合微分正则化,通过限制中间导数实现低频率、结构稳定的表示,在多个领域以更少数据和标准推理成本取得更优性能。

详情
AI中文摘要

跨企业领域的生产深度学习系统在学术基准通常掩盖的约束下运行:标记数据昂贵,推理预算紧张,无法解释其行为的模型难以信任和维护。我们提出ChainzRule (CR),一种神经架构,用可学习多项式层替代典型激活函数,这些层受微分正则化(DREG)约束,这是一种在前向传播期间以标准推理成本解析计算的逐层雅可比惩罚。核心主张是,限制中间导数迫使网络走向低频、结构稳定的表示,同时减少对标记数据量的依赖,提高对分布偏移的鲁棒性,并提供可测量的、基于梯度的模型行为处理手段。在五个领域的评估中,CR在Pima糖尿病数据集上达到$85.71\% \pm 2.01\%$(统计上优于SVM和XGBoost),在SST-5情感分类上使用冻结编码器达到$46.20\% \pm 0.37\%$(优于RNTN,且仅使用其约5%的训练数据),在SST-5上使用微调BERT骨干达到$55.79\%$(对比BERT-base线性头的$54.9\%$),在Yelp Full序数回归上使用3.2M参数达到$70.17\%$(对比10模型平均$66.35\%$),在CIFAR-10-C上平均损坏准确率提升$+2.32\%$。所有报告$p$值的结果在Bonferroni校正后均低于$\alpha=0.05$阈值。CR在所有数据分数下保持梯度尾部比率$\tau$(p99/均值)为$1.01$--$1.02$,而所有典型激活函数基线为$1.07$--$1.09$,我们提出这一结构不变性作为样本效率的机制驱动因素和部署时模型可靠性的代理指标。

英文摘要

Production deep learning systems across enterprise domains operate under constraints that academic benchmarks routinely obscure: labeled data is expensive, inference budgets are tight, and models that cannot explain their behavior are difficult to trust and maintain. We present ChainzRule (CR), a neural architecture replacing typical activations with learnable polynomial layers governed by Differential Regularization (DREG), a layer-wise Jacobian penalty computed analytically during the forward pass at standard inference cost. The core claim is that bounding intermediate derivatives forces the network toward low-frequency, structurally stable representations, simultaneously reducing dependence on labeled data volume, improving robustness to distribution shift, and providing a measurable, gradient-based handle on model behavior. Evaluated across five domains, CR achieves $85.71\% \pm 2.01\%$ on Pima Diabetes (statistically superior to SVM and XGBoost), $46.20\% \pm 0.37\%$ on SST-5 sentiment classification with a frozen encoder (superior to RNTN using approximately 5\% of its training data), $55.79\%$ on SST-5 with a fine-tuned BERT backbone (versus BERT-base linear head at $54.9\%$), $70.17\%$ on Yelp Full ordinal regression with 3.2M parameters versus a 10-model average of $66.35\%$, and $+2.32\%$ mean corruption accuracy on CIFAR-10-C. All results with reported $p$-values fall below the $α= 0.05$ threshold after Bonferroni correction. CR maintains a gradient tail ratio $τ$ (p99/mean) of $1.01$--$1.02$ against $1.07$--$1.09$ for all typical activation function baselines across every data fraction, a structural invariant we propose as the mechanistic driver of sample efficiency and a deployment-time proxy for model reliability.
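The gradient tail ratio $τ$ (p99/mean) mentioned above can be illustrated with a small Python sketch. The abstract does not say which quantities the ratio is computed over or which percentile convention is used, so this sketch assumes a plain list of non-negative gradient magnitudes and a nearest-rank 99th percentile; the function name `tail_ratio` is hypothetical and this is not the authors' code.

```python
def tail_ratio(values):
    """Return p99 / mean for a list of non-negative magnitudes.

    Assumption: nearest-rank 99th percentile (the paper's convention is unknown).
    """
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    n = len(ordered)
    rank = -(-99 * n // 100)  # ceil(0.99 * n) using integer arithmetic only
    p99 = ordered[rank - 1]
    mean = sum(ordered) / n
    if mean == 0:
        raise ValueError("mean is zero, ratio undefined")
    return p99 / mean


if __name__ == "__main__":
    flat = [2.0] * 50                  # no heavy tail: ratio is exactly 1
    heavy = [1.0] * 98 + [5.0] * 2     # a few large values inflate p99
    print(tail_ratio(flat), tail_ratio(heavy))
```

A ratio near 1, as reported for CR, means the largest magnitudes sit close to the average; a larger ratio indicates a heavier tail.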

2605.24331 2026-05-26 cs.LG stat.ML 版本更新

CurveRL: Principled Distribution-Aware Context Reweighting for LLM Reasoning

CurveRL: 面向LLM推理的有原则的分布感知上下文重加权

Ke Sun, Yizhou Zhao, Jiayi Xin, Qi Long, Weijie Su

发表机构 * University of Pennsylvania(宾夕法尼亚大学)

AI总结 本文提出CurveRL方法,通过分位数坐标变换实现分布感知的提示重加权,在RLVR框架下统一优化理论并显著提升推理性能。

详情
AI中文摘要

上下文或提示级别的重加权已成为使用验证奖励的强化学习(RLVR)中提升大型语言模型推理能力的关键算法杠杆,但决定最优加权的原则仍不清楚。我们通过将提示重加权公式化为通过率函数空间中定义的效用泛函的泛函导数来解决这一差距,从而产生一个统一的优化框架,该框架能够容纳现有方案,包括REINFORCE和GRPO。在此优化框架的基础上,我们提出了一种基于分位数坐标变换的分布感知提示重加权方法,称为CurveRL,其中分配给每个提示的权重不取决于通过率的绝对值,而是取决于其排名和密度,以反映学习动态中通过率的分布结构。跨多个基准的大量实验表明,我们提出的CurveRL始终优于GRPO和其他RLVR基线。我们的研究将上下文分布控制确定为分析和设计提示重加权RLVR算法的原则性轴心。代码发布在https://github.com/zhyzmath/CurveRL。

英文摘要

Context or prompt-level reweighting has emerged as a central algorithmic lever in Reinforcement Learning with Verified Rewards (RLVR) for improving the reasoning capability of large language models, yet the principle determining what constitutes an optimal weighting remains poorly understood. We address this gap by formulating prompt reweighting as a functional derivative of a utility functional defined in the pass-rate function space, yielding a unified optimality framework that accommodates existing schemes, including REINFORCE and GRPO. Building on this optimality framework, we propose a distribution-aware prompt reweighting approach, called CurveRL, based on a quantile coordinate transform, in which the weight assigned to each prompt depends not on the absolute value of pass rates but on its rank and density to reflect the distributional structure of the pass rates in the learning dynamics. Extensive experiments across multiple benchmarks demonstrate that our proposed CurveRL consistently outperforms GRPO and other RLVR baselines. Our study identifies context-distribution control as a principled axis for analyzing and designing prompt-reweighted RLVR algorithms. The code is released in https://github.com/zhyzmath/CurveRL.

2605.24330 2026-05-26 cs.LG 版本更新

Interdomain Attention: Beyond Token-Level Key-Value Memory

域间注意力:超越令牌级键值记忆

Naoki Kiyohara, Harrison Bo Hua Zhu, Riccardo El Hassanin, Zhuo Sun, Wenlong Chen, Samir Bhatt, Yingzhen Li

发表机构 * Imperial College London, UK(伦敦帝国学院) University of Copenhagen, Denmark(哥本哈根大学) Shanghai University of Finance(上海金融学院) Canon Inc., Japan(日本佳能公司) Technical University of Denmark, Denmark(丹麦技术大学)

AI总结 提出域间注意力机制,通过核方法将状态空间模型集成到注意力模块中,实现固定大小状态上的查询条件注意力,在语言建模中优于SSM和标准注意力基线。

详情
AI中文摘要

Transformer和深度状态空间模型(SSM)处于基本设计选择的两端:注意力通过基于内容的匹配以二次代价将每个查询路由到不断增长的键值(KV)缓存中,而深度SSM将上下文压缩为固定大小的循环状态,该状态不能通过查询-键匹配直接寻址。我们提出域间注意力,通过核方法将SSM集成到注意力模块中:注意力核通过有限特征图近似,得到的键特征和值投影到由单个SSM循环维护的一组共享基函数上,每个查询通过其自身的特征图关注压缩后的系数,从而恢复对固定大小状态的查询条件注意力。可扩展层是该推导的学习松弛版本,我们通过消融实验验证其组件。在FineWeb-Edu上进行的125M到1.3B自回归语言建模研究中,在匹配循环状态预算的情况下,域间注意力在每个规模上都优于SSM令牌混合器,在1.3B规模上,在验证困惑度和八任务常识套件上超越了相同配方的softmax基线,并且继承了其固定状态核心的长度平坦行为,可外推至训练上下文的3.5倍。消融实验表明,查询条件投影是增益的主要来源。

英文摘要

Transformers and deep state space models (SSMs) sit at opposite ends of a basic design choice: attention routes each query through a growing key-value (KV) cache by content-based matching at quadratic cost, while deep SSMs compress context into a fixed-size recurrent state that is not directly addressed by query-key matching. We propose Interdomain Attention, which integrates an SSM into an attention module through kernel methods: an attention kernel is approximated by a finite feature map, the resulting key features and values are projected onto a shared set of basis functions maintained by a single SSM recurrence, and each query attends to the compressed coefficients through its own feature map, recovering query-conditioned attention over a fixed-size state. The scalable layer is a learned relaxation of this derivation, and we validate its components through ablations. In a 125M to 1.3B autoregressive language-modeling study on FineWeb-Edu at matched recurrent-state budget, Interdomain Attention improves on an SSM token mixer at every scale, surpasses a same-recipe softmax baseline at 1.3B on validation perplexity and on the eight-task commonsense suite, and inherits the length-flat behavior of its fixed-state core out to 3.5x the training context. Ablations indicate that the query-conditioned projection is the main source of the gain.

2605.24324 2026-05-26 quant-ph cs.LG 版本更新

A Matched Spectral Benchmark of Quantum Inspired Feature Maps

量子启发特征映射的匹配谱基准

Toheeb Ogunade, Taofeek Kassim, Etinosa Osaro

发表机构 * Department of Computer Science, University of Lagos(拉各斯大学计算机科学系) Department of Physics, University of Lagos(拉各斯大学物理系) Department of Chemical Engineering, University of Notre Dame(圣母大学化学工程系)

AI总结 通过匹配输出维度和强经典控制,比较振幅、角度和基编码作为确定性特征映射在经典监督学习中的表现,分析其几何特性并发现固定量子启发编码几何本身并非经典数据上机器学习优势的可靠来源。

详情
AI中文摘要

量子机器学习通常受到这样一种想法的驱动:量子系统可以暴露经典模型难以访问的有用高维结构。我们分离了这一主张的一个核心组成部分:固定数据编码映射。在匹配输出维度和强经典对照下,将振幅、角度和基编码作为经典监督学习的确定性特征映射进行评估。该基准测试将这些编码与原始线性模型、随机傅里叶特征、多项式特征、PCA、RBF SVM和浅层神经网络在多种经典数据集上进行比较。我们不是将性能视为单一终点,而是通过有效秩、条件数、中心核对齐、预测性能和实际开销来分析每个表示的几何结构。结果图景是机制性的:振幅编码可以通过单位球面归一化去除幅度信息,角度编码可能在几何上与原始线性特征冗余,基编码可能施加与平滑决策结构不对齐的二元汉明几何。这些发现并不反对量子计算;相反,它们表明固定的量子启发编码几何本身并不是经典数据上机器学习优势的可靠来源。

英文摘要

Quantum machine learning is often motivated by the idea that quantum systems can expose useful high-dimensional structure that is difficult to access with classical models. We isolate one central component of this claim: the fixed data-encoding map. Amplitude, angle, and basis encoding are evaluated as deterministic feature maps for classical supervised learning under matched output dimensionality and strong classical controls. The benchmark compares these encodings against raw linear models, random Fourier features, polynomial features, PCA, RBF SVMs, and shallow neural networks across diverse classical datasets. Rather than treating performance as a single endpoint, we analyze the geometry of each representation through effective rank, condition number, centered kernel alignment, predictive performance, and practical overhead. The resulting picture is mechanistic: amplitude encoding can remove magnitude information through unit-sphere normalization, angle encoding can become geometrically redundant with raw linear features, and basis encoding can impose a binary Hamming geometry that is poorly aligned with smooth decision structure. These findings do not argue against quantum computation; rather, they show that fixed quantum-inspired encoding geometry alone is not a reliable source of machine-learning advantage on classical data.
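The point about amplitude encoding discarding magnitude can be seen in a few lines of Python. This is a minimal sketch of the unit-sphere normalization step only (padding to a power-of-two length and any circuit-level details are omitted); the function name `amplitude_encode` and the example vectors are illustrative assumptions, not taken from the paper.

```python
import math


def amplitude_encode(x):
    """Normalize a real vector to unit L2 norm, as amplitude encoding requires."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0:
        raise ValueError("the zero vector cannot be normalized")
    return [v / norm for v in x]


if __name__ == "__main__":
    x = [3.0, 4.0]
    scaled = [2.0 * v for v in x]  # same direction, twice the magnitude
    print(amplitude_encode(x))
    print(amplitude_encode(scaled))
```

Both inputs map to the same point on the unit sphere, so any downstream model that sees only the encoded vector cannot distinguish them by scale.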

2605.24319 2026-05-26 cs.LG 版本更新

Omissive Bias in Religious Representation: Benchmarking LLM Answers to Everyday Ethical Decision-making

宗教表征中的省略偏见:评估LLM在日常伦理决策中的回答

David Wingate, Sheryl Carty, Joshua Coates, Daniel Feldman, Nancy Fulda, Larry Howell, Brett Israelson, Dallin Jacobs, Jonathan Karr, John Paul Kimes, Elisabeth Kincaid, Paul Martens, Gavin Mobley, Suzana Pinheiro, Lindsay Slemboski, Peter Whiting

发表机构 * Brigham Young University Baylor University University of Notre Dame Yeshiva University

AI总结 通过构建AllFaith宗教表征基准,评估LLM在回答日常伦理问题时是否提及宗教,发现模型普遍存在省略宗教框架的偏见,尤其在个人实际情境中更为明显。

详情
AI中文摘要

随着大型语言模型成为个人、道德和存在性问题上的默认指导来源,它们是否借鉴了历史上塑造此类推理的宗教框架,还是系统性地忽略了它们,这一点至关重要。在本文中,我们提出了一个刻意狭窄的问题:当面对一个日常伦理问题,而宗教观点可能具有价值时,LLM是否会援引宗教?与寻找政治倾向或社会偏见存在的基准相反,我们寻找的是宗教表征的缺失,作为LLM中价值对齐和偏见的一个维度。我们将其称为“省略偏见”。为了衡量省略偏见,我们贡献了AllFaith宗教表征基准:150个伦理和个人相关的问题,来源于真实聊天记录和信仰社区贡献者,并配有一个LLM作为评判者的评分标准,该标准对任何提及宗教、宗教实践或宗教领袖的内容给予满分。这些问题本身并非关于宗教——它们是关于悲伤、宽恕、人际关系、目的和诚实的开放式问题,其中宗教是几种有价值的视角之一。我们还进行了一项人类受试者调查,以比较LLM行为与人类期望。评估27个模型后,我们发现LLM相对于人类期望始终低估了宗教。这种省略是不对称的:模型在抽象的存在性问题(意义、死亡、真理)上比在个人实际情境——悲伤、婚姻、家庭冲突、成瘾——中更容易援引宗教,而后者正是许多人最依赖宗教的地方。我们的目的并非评判LLM应持有何种价值观。我们更温和地认为,当前的LLM回答忽视了反映许多人在应对个人和伦理挑战时所依赖的宗教框架的关键机会。

英文摘要

As large language models become a default source of guidance on personal, moral, and existential questions, it matters whether they draw on the religious frameworks that have historically shaped such reasoning, or systematically omit them. In this paper, we ask a deliberately narrow question: when posed an everyday ethical question for which religious perspectives may be valuable, do LLMs invoke religion at all? In contrast to benchmarks that look for the presence of political leanings or social bias, we look for the absence of religious representation as a dimension of value alignment and bias in LLMs. We term this ``omissive bias.'' To measure omissive bias, we contribute the AllFaith Religious Representation Benchmark: 150 ethically and personally salient questions, sourced from in-the-wild chat transcripts and faith-community contributors, paired with an LLM-as-judge rubric that gives full credit for any mention of a religion, a religious practice, or a religious leader. The questions are not themselves about religion--they are open-ended questions about grief, forgiveness, relationships, purpose, and honesty, where religion is one valuable perspective among several. We also run a human-subjects survey to compare LLM behavior against human expectations. Evaluating 27 models, we find that LLMs consistently underrepresent religion relative to human expectations. The omission is asymmetric: models invoke religion more readily for abstract existential questions (meaning, death, truth) than for the practical personal situations--grief, marriage, family conflict, addiction--where many people most rely on it. It is not our purpose to adjudicate which values LLMs should hold. We argue, more modestly, that current LLM responses overlook critical opportunities to reflect religious frameworks that many people draw on when navigating personal and ethical challenges.

2605.24316 2026-05-26 cs.LG 版本更新

From One-Pass SGD to Data Reuse: Mini-Batch Scaling Laws in Sketched Linear Regression

从单次SGD到数据重用:草图线性回归中的小批量缩放定律

Ziyan Chen, Ding-Xuan Zhou

发表机构 * The University of Sydney(悉尼大学)

AI总结 本文通过分析单次批量SGD、有放回多次批量SGD和无放回多次批量SGD在幂律协方差谱下的风险分解,推导了草图线性回归中小批量大小的缩放定律,揭示了小批量对优化偏差、方差和波动项的影响。

Comments 56 pages, 3 figures

详情
AI中文摘要

缩放定律提供了预测误差如何随计算量、模型大小和数据变化的简洁描述,但现有理论主要处理单样本SGD或完全数据重用,未明确小批量的作用。我们研究了在幂律协方差谱和目标参数源条件下的草图线性回归的批量缩放定律。我们分析了单次批量SGD、有放回多次批量SGD和无放回多次批量SGD。我们的第一个结果是风险分解:所有三种过程共享相同的不可约项和逼近项,而它们的随机项取决于采样协议。单次批量SGD分解为偏差和方差,而两种多次批量方法分解为GD偏差、GD方差和围绕公共GD参考轨迹的波动项。然后我们证明了单次和多次小批量方法的源条件缩放定律。对于单次批量SGD,小批量保留了逼近和优化偏差指数,而方差按$O(\min(M,(T_{\mathrm{eff}}γ)^{1/a})/(B T_{\mathrm{eff}}))$缩放。因此,在固定更新次数$T$下,通常的$1/B$协方差减少成立,但在单次机制中$T=N/B$,它被更短的优化视野部分抵消。对于多次批量SGD,有放回和无放回采样具有相同的逼近和GD偏差/方差项;它们仅在波动协方差前因子不同,有放回时为$1/B$,无放回时为$ρ_{N,B}=(N-B)/(B(N-1))$。因此,对于$B>1$,无放回采样噪声更小,当$B=N$时波动消失,恢复确定性梯度下降。这些结果将批量大小与计算量、数据和模型维度置于草图线性回归中相同的理论基础上。

英文摘要

Scaling laws provide compact descriptions of how prediction error varies with compute, model size, and data, but existing theory mainly treats single-sample SGD or full data reuse, leaving the role of mini-batching unclear. We study batch scaling laws for sketched linear regression under a power-law covariance spectrum and a source condition on the target parameter. We analyze one-pass batch SGD, multi-pass batch SGD with replacement, and multi-pass batch SGD without replacement. Our first result is a risk decomposition: all three procedures share the same irreducible and approximation terms, while their stochastic terms depend on the sampling protocol. One-pass batch SGD splits into bias and variance, whereas the two multi-pass methods split into GD bias, GD variance, and a fluctuation term around a common GD reference trajectory. We then prove source-condition scaling laws for one-pass and multi-pass mini-batch methods. For one-pass batch SGD, mini-batching preserves the approximation and optimization-bias exponents, while the variance scales as $O(\min(M,(T_{\mathrm{eff}}γ)^{1/a})/(B T_{\mathrm{eff}}))$. Thus the usual $1/B$ covariance reduction holds at fixed update count $T$, but in the one-pass regime $T=N/B$ it is partly offset by the shorter optimization horizon. For multi-pass batch SGD, with- and without-replacement sampling have identical approximation and GD bias/variance terms; they differ only in the fluctuation covariance prefactor, which is $1/B$ with replacement and $ρ_{N,B}=(N-B)/(B(N-1))$ without replacement. Hence without-replacement sampling is less noisy for $B>1$, and when $B=N$ the fluctuation vanishes, recovering deterministic gradient descent. These results place batch size on the same theoretical footing as compute, data, and model dimension in sketched linear regression.
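The two fluctuation prefactors stated above, $1/B$ with replacement and $ρ_{N,B}=(N-B)/(B(N-1))$ without replacement, can be compared numerically. The following is a minimal Python sketch rather than code from the paper; the function names and the example values of N and B are illustrative assumptions, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def prefactor_with_replacement(batch_size: int) -> Fraction:
    """Fluctuation prefactor 1/B for sampling with replacement."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return Fraction(1, batch_size)


def prefactor_without_replacement(n_samples: int, batch_size: int) -> Fraction:
    """Fluctuation prefactor rho_{N,B} = (N - B) / (B (N - 1))."""
    if n_samples < 2 or not 1 <= batch_size <= n_samples:
        raise ValueError("need N >= 2 and 1 <= B <= N")
    return Fraction(n_samples - batch_size, batch_size * (n_samples - 1))


if __name__ == "__main__":
    n = 1000  # hypothetical dataset size
    for b in (1, 10, 100, 1000):
        print(b, prefactor_with_replacement(b), prefactor_without_replacement(n, b))
```

The ratio of the two prefactors is $(N-B)/(N-1)$, which equals 1 at $B=1$, is below 1 for $1<B<N$, and is 0 at $B=N$, matching the statement that full-batch sampling recovers deterministic gradient descent.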

2605.24310 2026-05-26 cs.CL cs.LG 版本更新

Discovering Lexical Gaps Using Embeddings from Multilingual LLMs

利用多语言大语言模型的嵌入发现词汇空缺

Yoonwon Jung, Aaron S. Cohen, Benjamin K. Bergen

发表机构 * Department of Cognitive Science, University of California San Diego(加州大学圣地亚哥分校认知科学系)

AI总结 提出一种数据驱动框架,通过多语言大语言模型的上下文嵌入计算语义相似度,以识别跨语言词汇空缺,在韩英和英韩方向上分别达到0.81和0.76的AUC。

Comments CoNLL 2026

详情
AI中文摘要

词汇空缺是指在某些语言中不存在的单词。它们给构建多语言词汇资源、机器翻译和跨语言迁移带来了挑战。现有的词汇空缺检测依赖于人工判断或固定的概念分类法。我们提出了一个数据驱动的框架来识别跨语言词汇空缺。我们从韩英双语大语言模型中提取了韩语到英语和英语到韩语翻译对的上下文嵌入。通过组合不同的LLM、嵌入类型、维度和正交变换,在100个训练-测试划分中,每种源语言产生了4000个不同的嵌入空间。在每个空间中,我们计算每个源词与其在目标语言中最近邻的语义相似度,并比较空缺词与非空缺词的分布。在94%(韩语到英语)和97%(英语到韩语)的嵌入空间中,空缺词显示出比非空缺词更弱的跨语言语义对齐。在未对齐的嵌入空间上训练的逻辑分类器可以可靠地区分空缺词和非空缺词,在韩语到英语和英语到韩语方向上分别达到0.81和0.76的AUC,并检索出18/19个韩语空缺词和26/27个英语空缺词。该方法提供了一种语言无关且无需分类法的可扩展词汇空缺识别方法。

英文摘要

Lexical gaps are words that do not exist in certain languages. They pose challenges for building multilingual lexical resources, for machine translation, and for cross-lingual transfer. Existing lexical gap detection relies on human judgments or fixed conceptual taxonomies. We propose a data-driven framework for identifying cross-lingual lexical gaps. We extracted contextualized embeddings from Korean-English bilingual LLMs for Korean-to-English and English-to-Korean translation pairs. Combinations of LLMs, embedding types, dimensionality, and orthogonal transformations across 100 train-test splits yielded 4000 distinct embedding spaces in each source language. In each space, we computed the semantic similarity between each source word and its nearest neighbor in the target language, and compared their distribution for gap words versus non-gap words. In 94% (Korean-to-English) and 97% (English-to-Korean) of embedding spaces, gap words showed weaker cross-lingual semantic alignment than non-gap words. Logistic classifiers trained on unaligned embedding spaces can reliably separate gap words from non-gap words, achieving AUCs of 0.81 (Korean-to-English) and 0.76 (English-to-Korean) and retrieving 18/19 Korean and 26/27 English gap words. This approach provides a language-agnostic and taxonomy-free method for scalable lexical gap identification.
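The nearest-neighbor step described above can be sketched as follows. The abstract does not name the similarity measure, so cosine similarity is an assumption here, and the two-dimensional toy vectors stand in for real contextualized embeddings; the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def nearest_neighbor_similarity(source_vec, target_vecs):
    """Similarity between a source word and its closest target-language word."""
    return max(cosine(source_vec, t) for t in target_vecs)


if __name__ == "__main__":
    targets = [[1.0, 0.0], [0.0, 1.0]]  # toy target-language embeddings
    print(nearest_neighbor_similarity([1.0, 0.0], targets))  # well-aligned word
    print(nearest_neighbor_similarity([1.0, 1.0], targets))  # weaker alignment
```

Under this reading, a source word whose best match in the target language still has low similarity is a candidate gap word.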

2605.24305 2026-05-26 cs.LG cs.AI 版本更新

ChaosBench-Logic v2: Evaluating LLM Logical Reasoning over Dynamical Systems at Scale

ChaosBench-Logic v2: 大规模评估大语言模型在动力系统上的逻辑推理能力

Noel Thomas

发表机构 * Mohamed bin Zayed University of Artificial Intelligence(穆罕默德·本·扎耶德人工智能大学)

AI总结 针对二元推理基准的准确性掩盖了关键缺陷,本文提出包含40,886个问题、覆盖165个动力系统的ChaosBench-Logic v2基准和CARE评估协议,揭示模型在状态转换推理、FOL演绎等任务上的表现差异和系统性反相关。

Comments 14 pages, 8 figures. Published at the ICLR 2026 Workshop on LLM Reasoning

详情
AI中文摘要

二元推理基准的标准准确性隐藏了关键失败模式:先验崩溃、释义下不一致以及无法推理参数依赖的动态。我们提出了ChaosBench-Logic v2,一个包含40,886个问题、覆盖165个动力系统、27个FOL谓词和78条公理边的基准,以及CARE(校准与对抗鲁棒评估)协议,该协议揭示了这些病理现象。评估14个模型,我们发现即使对于前沿模型,状态转换推理仍接近随机(MCC = 0.05),而给定前提的FOL演绎达到MCC = 0.52。按系列分解显示,专有模型的优势集中在跨指标(+0.40)和一致性任务上,而开源Qwen 2.5-32B在指标诊断上占优(0.91 vs. 0.45)。两个模型在分岔问题上表现出负MCC,通过混淆矩阵分析确认为系统性反相关。

英文摘要

Standard accuracy on binary reasoning benchmarks hides critical failure modes: prior collapse, inconsistency under paraphrase, and inability to reason about parameter-dependent dynamics. We present ChaosBench-Logic v2, a 40,886-question benchmark over 165 dynamical systems with 27 FOL predicates and 78 axiom edges, together with CARE (Calibration- and Adversarial-Robust Evaluation), a protocol that surfaces these pathologies. Evaluating 14 models, we find that regime-transition reasoning remains near random (MCC = 0.05) even for frontier models, whereas FOL deduction with given premises reaches MCC = 0.52. Per-family decomposition shows that the proprietary-model advantage concentrates on cross-indicator (+0.40) and consistency tasks, while open-source Qwen 2.5-32B dominates indicator diagnostics (0.91 vs. 0.45). Two models exhibit negative MCC on bifurcation questions, confirmed as systematic anti-correlation via confusion-matrix analysis.
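To see why MCC exposes failures that accuracy hides, consider a short Python sketch of the Matthews correlation coefficient computed from a binary confusion matrix. The confusion-matrix counts below are made-up illustrations, not results from the benchmark, and returning 0 when the denominator vanishes is a common convention assumed here.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention: undefined cases are reported as 0
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of correct predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


if __name__ == "__main__":
    # A model that always answers "yes" on an 80/20 split: high accuracy, MCC of 0.
    print(accuracy(80, 0, 20, 0), mcc(80, 0, 20, 0))
    # A model that is systematically wrong: negative MCC (anti-correlation).
    print(accuracy(10, 10, 40, 40), mcc(10, 10, 40, 40))
```

The first case mirrors prior collapse, where a constant answer earns 80% accuracy but carries no information; the second shows how a negative MCC signals predictions that are worse than chance in a consistent direction.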

2605.24300 2026-05-26 cs.CR cs.AI cs.LG 版本更新

Enhancing Reliability in LLM-Based Secure Code Generation

增强基于LLM的安全代码生成的可靠性

Mohammed F. Kharma, Mohammad Alkhanafseh, Ahmed Sabbah, David Mohaisen

发表机构 * Department of Computer Science, Birzeit University(比尔泽特大学计算机科学系) Department of Computer Science, University of Central Florida(中佛罗里达大学计算机科学系)

AI总结 提出Mitigation-Aware Chain-of-Thought (MA-CoT)框架,通过嵌入任务特定的CWE缓解指导和语言感知安全措施,显著降低LLM生成代码中的漏洞,在多个模型和语言上验证了其一致的安全可靠性提升。

Comments 15 pages; 7 tables; 3 figures

详情
AI中文摘要

大型语言模型(LLM)被广泛用于代码生成,但其安全可靠性在不同语言和提示策略下仍不一致。现有的提示工程提高了功能正确性,但很少能确保一致的安全结果。我们引入了Mitigation-Aware Chain-of-Thought (MA-CoT)框架,该框架嵌入了任务特定的CWE缓解指导和语言感知安全措施,以减少生成代码中反复出现的漏洞。我们在三个LLM(gpt-5, claude-4.5, gemini-2.5)、三种编程语言(C, Java, Python)和四种提示策略(Vanilla, Zero-shot, CoT, MA-CoT)上,使用200个任务的主数据集以及LLMSecEval的外部验证对MA-CoT进行了评估。通过静态分析和专家验证,MA-CoT在主数据集上将总安全问题从92个减少到39个(57.6%),在LLMSecEval上从73个减少到4个(94.5%)。高严重性问题(阻塞+严重)分别从90个降至39个(56.7%)和从45个降至2个(95.6%)。在两个数据集中,MA-CoT是唯一能持续提高安全可靠性的策略;Zero-shot和CoT可靠性较低,且可能增加漏洞,尤其是在C语言中。我们进一步引入了严格的漏洞驱动因素分层归因(语言核心层与栈层),并表明残余风险集中在硬化导向的模式(例如,依赖于操作系统和工具链),这激励了在提示之外采用安全构造原语。

英文摘要

Large language models (LLMs) are widely used for code generation, but their security reliability remains inconsistent across languages and prompting strategies. Existing prompt engineering improves functional correctness but rarely ensures consistent security outcomes. We introduce the Mitigation-Aware Chain-of-Thought (MA-CoT) framework, which embeds task-specific CWE mitigation guidance and language-aware safeguards to reduce recurring vulnerabilities in generated code. We evaluate MA-CoT across three LLMs (gpt-5, claude-4.5, gemini-2.5), three programming languages (C, Java, Python), and four prompting strategies (Vanilla, Zero-shot, CoT, MA-CoT) on a 200-task primary dataset, with external validation on LLMSecEval. Using static analysis with expert validation, MA-CoT reduces total security findings from 92 to 39 (57.6%) on the primary dataset and from 73 to 4 (94.5%) on LLMSecEval. High-severity findings (Blocker + Critical) drop from 90 to 39 (56.7%) and from 45 to 2 (95.6%), respectively. Across both datasets, MA-CoT is the only strategy that consistently improves security reliability; Zero-shot and CoT are less reliable and may increase vulnerability, especially in C. We further introduce a strict layered attribution of vulnerability drivers (language-core vs. stack layers) and show that residual risk concentrates in hardening-oriented patterns (e.g., OS- and toolchain-dependent), motivating secure-by-construction primitives alongside prompting.
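As a quick arithmetic check, the reduction percentages quoted in the abstract follow directly from the before/after counts. The helper name below is hypothetical; only the counts come from the abstract.

```python
def reduction_pct(before: int, after: int) -> float:
    """Relative reduction from `before` to `after`, in percent (one decimal)."""
    return round(100.0 * (before - after) / before, 1)


# (before, after) counts as quoted in the abstract.
reported = {
    "primary, all findings": (92, 39),
    "LLMSecEval, all findings": (73, 4),
    "primary, high severity": (90, 39),
    "LLMSecEval, high severity": (45, 2),
}

for label, (before, after) in reported.items():
    print(f"{label}: {reduction_pct(before, after)}%")
```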

2605.24299 2026-05-26 cs.LG 版本更新

LLMs Show No Signs Of Individuated Metacognition

LLMs 未显示出个体化元认知的迹象

M. Moran, Mark Whiting

发表机构 * Pareto AI

AI总结 通过因素分析和校准方法,研究20个前沿大语言模型在六个基准上的置信度判断,发现模型间置信度差异主要由共享的难度因子和决策阈值决定,而非个体化元认知,数学推理中的表面例外实为混淆效应。

详情
AI中文摘要

置信度加权路由、选择性弃权和集成加权都假设模型表达的置信度能反映其回答问题的能力。它们假定功能性元认知,即无需实际执行就能评估自身能力的能力。聚合校准已被广泛研究,结果不一,但置信度表达的内在结构尚不明确。我们使用四分相关(tetrachoric)因子分析与成对校准,分解了20个前沿大语言模型在六个基准上的二元置信度判断,探究置信度不同的两个模型是否也在性能上存在差异。在事实回忆和信息检索基准上,跨模型置信度矩阵近似秩为1,单个主导因子捕获了大部分潜在方差。检索事实的模型共享一个项目级难度轴,主要区别在于沿该轴的决策阈值。在所有基准上,一旦移除所有模型一致同意的项目,置信度与性能之间的关系便消失。模型间成对校准即使统计显著也很小,且在控制共享因子上的基率差异后,剩余部分缩小为零。数学推理似乎是例外,但结果发现这是一种混淆:推理模型通过尝试在思维链中解决问题来回答关于其置信度的问题,绕过了我们试图测量的亚符号自我知识。我们没有发现任何测试领域存在显著的语言化个体化元认知的证据。

英文摘要

Confidence-weighted routing, selective abstention, and ensemble weighting all assume that a model's stated confidence is informative about its capability on the question being asked. They presume functional metacognition, the capacity to assess one's own capabilities, without exercising them. Aggregate calibration is well studied, with mixed results, but the underlying structure of elicited confidence is less well understood. We decompose binary confidence judgements from 20 frontier Large Language Models (LLMs) across six benchmarks using tetrachoric factor analysis paired with pairwise calibration, asking whether two models that differ in confidence also differ in performance. On factual recall and information retrieval benchmarks the cross-model confidence matrix is approximately rank-one and a single dominant factor captures most of the latent variance. Models retrieving facts share an item-level difficulty axis and differ mainly in their decision thresholds along it. Across all benchmarks the relationship between confidence and performance collapses once items that all models agree on are removed. Inter-model pairwise calibration is small even where statistically significant, and what remains shrinks to nothing once base-rate differences along the shared factor are controlled for. Mathematical reasoning is the apparent exception, but this turns out to be a confound where reasoning models answer questions about their confidence by trying to solve them in their chain of thought, bypassing the sub-symbolic self-knowledge we seek to measure. We find no evidence for significant verbalised individuated metacognition in any tested domain.

2605.24298 2026-05-26 cs.CR cs.AI cs.LG 版本更新

An Empirical Evaluation of LLM-Generated Code Security Across Prompting Methods

LLM生成代码安全性的提示方法实证评估

Mohammed Kharma, Ahmed Sabbah, Mohammad Alkhanafseh, Mohammad Hammoudeh, David Mohaisen

发表机构 * Department of Computer Science, Birzeit University(比尔泽特大学计算机科学系) King Fahd University of Petroleum and Minerals(法赫德国王石油与矿产大学) University of Central Florida(中佛罗里达大学)

AI总结 通过跨5个LLM和4种编程语言的实证评估,提出弱点感知零样本链式思考(WA-0CoT)提示策略,发现提示方法虽影响弱点类别分布,但无法显著降低漏洞频率或密度。

Comments 40 pages, 22 tables, 8 figures

详情
AI中文摘要

大型语言模型(LLM)在自动化代码生成中的日益使用提高了软件开发效率,但往往以安全性为代价。生成的代码经常忽略关键问题,使其容易受到弱加密和不正确的输入验证等问题的影响。为了研究这一问题,我们对跨五个LLM和四种编程语言(Java、C++、C和Python)的LLM生成代码的安全质量进行了全面的实证评估,考察了多种提示工程方法的影响。我们提出了一种弱点感知的零样本链式思考(WA-0CoT)提示策略,该策略利用CWE映射丰富提示中的安全上下文以指导模型推理。我们的实证分析在卡方检验的支持下发现,不同提示方法在漏洞频率或密度上没有统计学上的显著降低。然而,包括WA-0CoT在内的提示策略系统地影响了CWE类别的组成分布,其效果因编程语言而异。这些发现表明,虽然安全感知的提示改变了生成弱点的结构,但仅靠提示工程不足以可靠地降低整体漏洞水平。结果强调了在评估LLM生成代码的安全属性时,语言感知和模型感知的提示设计的重要性。

英文摘要

The growing use of Large Language Models (LLMs) for automated code generation has enhanced software development efficiency, but often at the cost of security. Generated code frequently overlooks critical concerns, leaving it vulnerable to issues such as weak encryption and improper input validation. To investigate this problem, we present a comprehensive empirical evaluation of the security quality of LLM-generated code across five LLMs and four programming languages (Java, C++, C, and Python), examining the impact of multiple prompt engineering methods. We introduce a weaknesses-aware zero-shot chain-of-thought (WA-0CoT) prompting strategy that enriches prompts with security context using CWE mappings to guide model reasoning. Our empirical analysis, supported by chi-square tests, finds no statistically significant reductions in vulnerability frequency or density across prompt methods. However, prompting strategies, including WA-0CoT, systematically influence the compositional distribution of CWE categories, with effects varying by programming language. These findings suggest that while security-aware prompting alters the structure of generated weaknesses, prompt engineering alone is insufficient to reliably reduce overall vulnerability levels. The results highlight the importance of language-aware and model-aware prompt design when evaluating the security properties of LLM-generated code.

2605.24295 2026-05-26 cs.LG stat.ML 版本更新

Private Adaptive Covariance Estimation via Gaussian Graphical Models

通过高斯图模型进行私有自适应协方差估计

Cecilia Ferrando, Miguel Fuentes, Brett Mullins, Cameron Musco, Daniel Sheldon

发表机构 * Manning College of Information and Computer Sciences(曼宁信息与计算机科学学院)

AI总结 提出PACE-GGM,一种数据自适应的差分隐私协方差估计方法:将隐私预算集中在经验协方差矩阵中信息量最大的条目上,每轮选择一个当前近似较差的条目并用高斯机制测量,再通过最大熵重建目标重构完整协方差矩阵,从而在高维和低到中等隐私预算下持续降低估计误差。

详情
AI中文摘要

我们提出了PACE-GGM,一种数据自适应的差分隐私协方差估计方法,该方法将隐私预算集中在经验协方差矩阵信息量最大的条目上,而不是扰动所有条目。这适用于建模者为每个变量提供单独边界的自然场景,因此各个条目可以比整个矩阵以更少的噪声进行测量。在每一轮中,我们的方法选择一个近似较差的条目,使用高斯机制对其进行测量,然后通过最大熵重建目标重构完整的协方差矩阵,从而得到高斯图模型结构。在多个真实世界数据集上的实验表明,与高斯机制和其他基线相比,该方法在估计误差方面持续改进,特别是在高维和低到中等隐私预算的情况下。

英文摘要

We propose PACE-GGM, a data-adaptive differentially private method for covariance estimation that concentrates its privacy budget on the most informative entries of the empirical covariance matrix, rather than perturbing all entries. This applies in the natural setting where the modeler supplies separate bounds for each variable, so that individual entries can be measured with less noise than the full matrix. In each round, our method selects a poorly approximated entry, measures it using the Gaussian mechanism, and then reconstructs a full covariance matrix using a maximum-entropy reconstruction objective, leading to a Gaussian graphical model structure. Experiments on diverse real-world datasets demonstrate consistent improvements in estimation error with respect to the Gaussian mechanism and other baselines, particularly in high-dimensional and low-to-moderate privacy regimes.
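The "measure one entry with the Gaussian mechanism" step can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an uncentered second-moment entry, replace-one neighboring datasets, per-variable bounds enforced by clipping, and the classical Gaussian-mechanism calibration (valid for 0 < epsilon < 1). All function and parameter names are hypothetical.

```python
import math
import random


def gaussian_sigma(sensitivity: float, epsilon: float, delta: float) -> float:
    """Classical Gaussian-mechanism noise scale (assumes 0 < epsilon < 1)."""
    if not (0.0 < epsilon < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("classical calibration needs 0 < epsilon, delta < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def private_second_moment(xs_i, xs_j, bound_i, bound_j, epsilon, delta, rng):
    """Noisy estimate of (1/n) * sum(x_i * x_j) for one pair of variables."""
    n = len(xs_i)

    def clip(value, bound):
        return max(-bound, min(bound, value))

    value = sum(clip(a, bound_i) * clip(b, bound_j) for a, b in zip(xs_i, xs_j)) / n
    # Replacing one record changes one product by at most 2 * B_i * B_j.
    sensitivity = 2.0 * bound_i * bound_j / n
    return value + rng.gauss(0.0, gaussian_sigma(sensitivity, epsilon, delta))


rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
ys = [0.5 * x + rng.uniform(-0.1, 0.1) for x in xs]
print(private_second_moment(xs, ys, 1.0, 0.6, epsilon=0.5, delta=1e-5, rng=rng))
```

Because the noise scale depends only on the bounds of the two variables involved, a pair of tightly bounded variables receives less noise than a calibration based on the largest bound in the dataset, which is the intuition the abstract points to.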

2605.24294 2026-05-26 cs.CR cs.AI cs.LG 版本更新

Concept Drift Adaptation Using Self-Supervised and Reinforcement Learning In Android Malware Detection

使用自监督学习和强化学习在Android恶意软件检测中适应概念漂移

Ahmed Sabbah, Mohammad Kharma, Mohammad Alkhanafseh, Radi Jarrar, Samer Zein, David Mohaisen

发表机构 * Birzeit University(比尔泽特大学) University of Central Florida(中佛罗里达大学)

AI总结 提出一个基于自监督学习和强化学习的框架,通过冻结编码器测量潜在漂移并轻量适配,同时利用PPO控制器在成本约束下选择维护动作,以应对Android恶意软件检测中的概念漂移。

Comments 9 pages, 2 figures, 2 tables

详情
AI中文摘要

Android恶意软件检测器在部署后常因概念漂移而性能下降,而每次维护步骤完全重新训练成本高昂。我们提出一个按时间顺序的自适应维护框架,将部署时的维护建模为序列决策问题。该框架在初始化阶段通过自监督学习学习稳定的潜在表示,冻结编码器,在固定表示空间中测量潜在漂移,并使用可训练适配器和分类头进行轻量下游适配。一个近端策略优化控制器根据检测器状态(包括当前效用、固定记忆集上的保留率、潜在漂移指标和更新成本)选择低成本的维护动作。我们在模拟器和真实Android恶意软件数据集上,使用静态和动态特征,在因果部署式协议下评估该框架。结果表明,RL控制器提供了一种强大的成本感知适配策略,在非平稳部署条件下,始终保持在最佳策略之列,同时在时间性能、记忆保留和维护成本之间取得有利平衡。

英文摘要

Android malware detectors often degrade after deployment because of concept drift, while full retraining at each maintenance step is costly. We propose a chronological adaptive maintenance framework that models deployment-time maintenance as a sequential decision problem. The framework learns a stable latent representation through self-supervised learning during initialization, freezes the encoder, measures latent drift in the fixed representation space, and performs lightweight downstream adaptation using a trainable adapter and classification head. A proximal policy optimization controller selects low-cost maintenance actions based on the detector state, including current utility, retention on a fixed memory set, latent drift indicators, and update cost. We evaluate the framework under a causal deployment-style protocol on emulator and real Android malware datasets with static and dynamic features. Results show that the RL controller provides a strong cost-aware adaptation strategy, consistently remaining among the top-performing policies while achieving a favorable balance between temporal performance, memory retention, and maintenance cost under non-stationary deployment conditions.

2605.24292 2026-05-26 cs.LG 版本更新

TUBE: Tangent Upper Bound on Evidence for Discrete Diffusion Language Models

TUBE: 离散扩散语言模型证据的切线上界

Arseny Ivanov, Sergei Kholkin, Vladislav Gromadskii, Grigoriy Ksenofontov, Ivan Oseledets, Alexander Korotin

发表机构 * Applied AI Institute(应用人工智能研究所) HSE University(俄罗斯高等经济学院) MIRAI

AI总结 针对离散扩散模型无法精确计算对数似然的问题,提出变分上界TUBE,并通过无偏蒙特卡洛估计器评估,发现块状扩散模型和块状任意阶自回归模型的对数似然严格低于自回归模型基线。

Comments Preprint. 9 pages main text, 5 figures, plus appendix

详情
AI中文摘要

对数似然是评估生成模型的标准指标。不幸的是,与自回归模型(ARMs)不同,离散扩散模型通常无法精确计算该量。因此,现有评估依赖于证据下界(ELBO),不清楚真实值可能高出多少。我们通过引入证据的切线上界(TUBE)来解决这个问题,这是一个对数似然的变分上界,允许无偏蒙特卡洛估计。我们的TUBE适用于潜在变量模型,包括掩码扩散模型(MDMs)、任意阶ARMs(AO-ARMs)以及两者的块变体。应用于块MDMs和块AO-ARMs时,TUBE揭示了我们的关键实证发现:这些模型的对数似然严格低于精确的ARM基线,表明ARMs在似然性方面仍然占主导地位。

英文摘要

Log-likelihood is a standard metric for evaluating generative models. Unfortunately, in contrast to autoregressive models (ARMs), discrete diffusion models generally do not admit exact computation of this quantity. Existing evaluations, therefore, rely on the evidence lower bound (ELBO), leaving unclear how much higher the true value may be. We address this by introducing the Tangent Upper Bound on Evidence (TUBE), a variational upper bound on log-likelihood that admits an unbiased Monte Carlo estimator. Our TUBE extends across latent-variable models, including masked diffusion models (MDMs), any-order ARMs (AO-ARMs), and block variants of both. Applied to block MDMs and block AO-ARMs, TUBE reveals our key empirical finding that these models lie strictly below the exact ARM baseline, showing that ARMs still dominate in likelihood.
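One standard way a tangent-based upper bound on a log-likelihood can admit an unbiased Monte Carlo estimator is sketched below. This is an illustration of the general idea suggested by the name, not necessarily the construction used in the paper; the auxiliary scalar $a$ and the proposal $q$ are assumptions of this sketch.

```latex
% Concavity of the logarithm: the tangent at any a > 0 lies above the curve.
\log u \;\le\; \log a + \frac{u}{a} - 1 \qquad \text{for all } u > 0,\ a > 0 .

% Apply this with u = p(x) = \mathbb{E}_{z \sim q}\!\left[ \frac{p(x, z)}{q(z)} \right]:
\log p(x) \;\le\; \log a - 1 + \frac{1}{a}\,
  \mathbb{E}_{z \sim q}\!\left[ \frac{p(x, z)}{q(z)} \right].

% The right-hand side is linear in p(x), so replacing the expectation by an
% average over samples z_1, \dots, z_K \sim q gives an unbiased estimate of the
% bound. The bound is tight when a = p(x).
```

Contrast this with the ELBO, where the logarithm sits inside the expectation and Jensen's inequality yields a lower bound instead.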

2605.24286 2026-05-26 cs.LG cs.CL 版本更新

Faithfulness as Information Flow: Evaluating and Training Faithful Chain-of-Thought Reasoning

忠实性作为信息流:评估与训练忠实的链式思维推理

Jinghan Jia, Joe Benton, Eric Easley

发表机构 * Dept. CSE, Michigan State University(密歇根州立大学计算机科学系) Anthropic

AI总结 通过信息流视角提出基于充分性、完整性和必要性的框架,结合熵、掩码KL和梯度诊断评估链式思维忠实性,并引入更新时干预(如注意力掩码、反向梯度掩码等)训练更忠实的推理模型。

详情
AI中文摘要

链式思维(CoT)推理仅在推理轨迹忠实反映产生最终答案的计算过程时,才有助于监控语言模型。然而,模型可能依赖绕过CoT的提示-答案捷径,使得可见的推理轨迹即使看似合理也具有误导性。我们通过结构化的信息流视角研究CoT忠实性:忠实推理应将答案相关信息通过从提示到CoT再到答案的中介路径路由,而非通过直接的提示-答案捷径。该视角产生了一个基于三个互补属性(充分性、完整性和必要性)的任务无关框架,我们使用基于熵的、掩码KL和基于梯度的诊断来实例化。我们表明,这些指标在带线索(hint)的推理中复现了外部评判的忠实性差异,并识别出基于KL的诊断在低熵情形下的一种失效模式,而基于梯度的度量在该情形下更稳定。基于此分析,我们引入了面向基于验证器的同策略(on-policy)强化学习的更新时干预,包括注意力掩码、仅反向梯度掩码、CoT梯度以及提示表示的对抗扰动。在带线索的算术、可被奖励黑客利用的代码修复,以及训练时不含线索但在错误线索注入下评估的DAPO-Math模型上,我们的干预将行为和结构指标转向更强的CoT中介。特别是,它们使捷径和奖励黑客行为在CoT中更加透明,并改善了任务无关的忠实性指标,同时在某些设置中也降低了对错误线索的易感性。我们的结果表明,在训练期间控制信息流是通向更忠实和可监控的CoT推理的实用途径。代码见 https://github.com/safety-research/faithful-cot。

英文摘要

Chain-of-thought (CoT) reasoning is useful for monitoring language models only when the reasoning trace faithfully reflects the computation that produces the final answer. However, models can rely on prompt-to-answer shortcuts that bypass the CoT, making the visible reasoning trace misleading even when it appears plausible. We study CoT faithfulness through a structural information-flow perspective: faithful reasoning should route answer-relevant information through the mediated path from prompt to CoT to answer, rather than through a direct prompt-to-answer shortcut. This perspective yields a task-agnostic framework based on three complementary properties, sufficiency, completeness, and necessity, which we instantiate with entropy-based, masked-KL, and gradient-based diagnostics. We show that these metrics recover externally judged faithfulness differences in hinted reasoning, and identify a low-entropy failure mode of KL-based diagnostics where gradient-based measures remain more stable. Building on this analysis, we introduce update-time interventions for verifier-based on-policy RL, including attention masking, backward-only gradient masking, CoT gradients, and adversarial perturbations of prompt representations. Across hinted arithmetic, reward-hackable code repair, and DAPO-Math models trained without hints but evaluated under wrong-hint injection, our interventions shift behavioral and structural indicators toward stronger CoT mediation. In particular, they make shortcut and reward-hacking behavior more transparent in the CoT and improve task-agnostic faithfulness metrics, while in some settings also reducing wrong-hint susceptibility. Our results suggest that controlling information flow during training is a practical route toward more faithful and monitorable CoT reasoning. Code is available at https://github.com/safety-research/faithful-cot.

2605.24278 2026-05-26 cs.LG 版本更新

Fourier Feature Pyramids for Physics-Informed Neural Networks

面向物理信息神经网络的傅里叶特征金字塔

Brandon Zhao, Yixuan Wang, Jonathan T. Barron, Katherine L. Bouman, Dor Verbin, Pratul P. Srinivasan

发表机构 * Department of Computing and Mathematical Sciences, California Institute of Technology(计算与数学科学部,加州理工学院) Cahill Center for Astronomy and Astrophysics, California Institute of Technology(天文与天体物理中心,加州理工学院) Google DeepMind(谷歌DeepMind)

AI总结 提出一种名为beignet的多分辨率傅里叶特征金字塔架构,通过可训练的特征网格和傅里叶插值,结合链式法则与FFT高效计算空间导数,以更少的参数实现比现有PINN方法更高的求解精度。

详情
AI中文摘要

我们提出了一种改进的神经场架构,用于求解偏微分方程(PDE)。当前的物理信息神经网络(PINN)提供了求解PDE的灵活框架,但难以获得高精度解,且计算量随参数数量增长而扩展性差。我们的模型称为beignet(带插值网格网络的带限嵌入),它将现有PINN模型使用的随机傅里叶特征嵌入替换为可训练的多分辨率傅里叶特征金字塔。为了在连续坐标处查询beignet,我们在金字塔的每一层使用傅里叶插值返回输入坐标处的特征,然后通过一个全连接神经网络主干解码该向量。我们的模型提供了多重优势:1)空间导数可以通过链式法则高效计算,将自动微分计算的神经网络导数与快速傅里叶变换(FFT)谱计算的特征网格导数相结合。2)beignet可以通过扩展傅里叶特征金字塔的参数数量,以计算高效的方式获得更高精度,而不是采用扩展神经网络架构这种效率较低的策略。3)beignet可以直接控制表示带限,从而对困难的PDE实现更稳定的优化。我们证明,在PDE基准测试中,beignet使用比最先进的PINN方法更少的参数,找到了显著更精确的解。我们进一步在自相似无粘Burgers爆破问题上评估beignet,并表明它可以使用Adam将残差最小化到接近机器精度,这一精度水平以前仅通过使用计算昂贵的高阶优化器才能达到。

英文摘要

We present an improved neural field architecture for solving partial differential equations (PDEs). Current physics-informed neural networks (PINNs) provide a flexible framework for solving PDEs, but they struggle to achieve highly accurate solutions and require computation that scales poorly with parameter count. Our model, which we call beignet (Bandlimited Embedding with Interpolated Grid Network), replaces the random Fourier feature embedding used by existing PINN models with a trainable multi-resolution Fourier feature pyramid. To query beignet at a continuous coordinate, we use Fourier interpolation at each level of the pyramid to return features at the input coordinate, and then decode this vector with a fully-connected neural network trunk. Our model provides multiple benefits: 1) Spatial derivatives can be computed efficiently by using the chain rule to compose derivatives of the neural network computed with automatic differentiation with derivatives of the feature grid computed spectrally by the Fast Fourier transform (FFT). 2) beignet can achieve higher accuracy in a compute-efficient manner by scaling the parameter count of this Fourier feature pyramid, instead of the less-efficient strategy of scaling the neural network architecture. 3) beignet can directly control the representation bandlimit, resulting in more stable optimization for difficult PDEs. We demonstrate that beignet finds significantly more accurate solutions on PDE benchmarks using fewer parameters than state-of-the-art PINN methods. We further evaluate beignet on the self-similar inviscid Burgers blowup problem and show that it can minimize residuals to near machine precision using Adam, an accuracy regime previously attained only by using computationally expensive higher-order optimizers.
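The idea of computing feature-grid derivatives spectrally with the FFT can be illustrated on a one-dimensional periodic grid. This is a generic spectral-differentiation sketch using NumPy, not the beignet implementation; the function name and the test signal are assumptions made for illustration.

```python
import numpy as np


def spectral_derivative(values: np.ndarray, period: float) -> np.ndarray:
    """First derivative of a periodic signal sampled on a uniform grid."""
    n = values.shape[0]
    # Angular wavenumber of each FFT bin for grid spacing period / n.
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=period / n)
    # Differentiation in space is multiplication by i*k in frequency.
    return np.real(np.fft.ifft(1j * k * np.fft.fft(values)))


# Band-limited test signal on [0, 2*pi): d/dx sin(3x) = 3 cos(3x).
x = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
du = spectral_derivative(np.sin(3.0 * x), period=2.0 * np.pi)
max_error = np.max(np.abs(du - 3.0 * np.cos(3.0 * x)))
print(max_error)
```

For a band-limited signal the result is exact up to floating-point error, which is one reason a band-limited feature grid pairs naturally with spectral derivatives; derivatives of the network trunk would still come from automatic differentiation and be composed via the chain rule, as the abstract describes.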

2605.24274 2026-05-26 cs.LG stat.ML 版本更新

A lift for input-convex neural network training

输入凸神经网络训练的提升方法

Ali Siahkoohi, Anirudh Thatipelli

发表机构 * Department of Computer Science(计算机科学系)

AI总结 针对输入凸神经网络(ICNN)中非负权重约束导致的训练困难,提出一种通过超网络参数扩展的“提升”方法,软化损失景观,避免梯度衰减,在多个任务上达到更低测试损失。

详情
AI中文摘要

输入凸神经网络(ICNN)广泛用于对数凹密度估计、凸势归一化流、最优传输以及高维贝叶斯后验的传输图反演。这些任务共享一个结构约束:ICNN的层间权重必须保持非负。标准方法——投影梯度下降(PGD)到非负锥——应用硬非光滑投影(ADMM风格约束分裂的刚性惩罚极限),其经典收敛保证不适用于非光滑的ICNN训练景观;可微替代方案——softplus重参数化——以权重幅度指数方式衰减梯度,导致层间权重死亡和损失平台,从而停滞训练。受PDE约束反问题的参数扩展提升启发,我们提出“提升”:不是直接约束层间权重,而是训练一个无约束的超网络,该超网络从输入批次的置换不变摘要中生成这些权重。这为训练动态增加了随机性,软化了损失景观,使迭代能够逃离直接softplus停滞的梯度衰减区域。我们将这种软化追溯到三个结构要素——作为松弛变量的可学习偏置、条件于目标批次的超网络主体、以及通过批次随机性耦合两者的交叉协方差——并证明每个要素都是必要的:删除任何单个要素都会破坏承载软化的交叉协方差。在一维玩具目标到图像风格潜在变量的对数凹能量建模,以及21维表格基准上的凸势归一化流实验中,我们展示了提升方法比PGD和直接softplus达到更低的测试损失,并将平台受限的训练轨迹转变为下降谷底的轨迹。

英文摘要

Input-convex neural networks (ICNNs) are widely used for log-concave density estimation, convex-potential normalizing flows, optimal transport, and transport-map inversion for high-dimensional Bayesian posteriors. These tasks share a structural constraint: the inter-layer weights of the ICNN must remain non-negative. The standard recipe, projected gradient descent (PGD) onto the non-negative cone, applies a hard, non-smooth projection -- the stiff-penalty limit of an ADMM-style constraint splitting -- and its classical convergence guarantees do not transfer to the non-smooth ICNN training landscape; the differentiable alternative, softplus reparametrization, attenuates the gradient exponentially in the weight magnitude, stalling training with dead inter-layer weights and plateaued loss. Inspired by parameter-extension lifts of PDE-constrained inverse problems, we propose the lift: instead of constraining the inter-layer weights directly, we train an unconstrained hypernetwork that emits them from a permutation-invariant summary of the input batch. This adds stochasticity to the training dynamics that softens the loss landscape, letting the iterates escape the gradient-attenuated region where direct softplus stalls. We trace this softening to three structural ingredients -- a learnable bias acting as slack, a hypernetwork body that conditions on the target batch, and a cross-covariance coupling the two through batch stochasticity -- and prove each one necessary: deleting any single ingredient collapses the cross-covariance that carries the softening. On log-concave energy-based modeling from one-dimensional toy targets to image-flavored latents, and convex-potential normalizing flows on a 21-dimensional tabular benchmark, we show that the lift reaches a lower test loss than both PGD and direct softplus, and turns a plateau-bounded training trajectory into a valley-descending one.
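The gradient attenuation attributed to the softplus reparametrization can be seen from the chain rule: if a non-negative weight is written as w = softplus(theta), every gradient with respect to theta is scaled by d w / d theta = sigmoid(theta), which decays like exp(theta) as theta becomes very negative. The short sketch below only illustrates that factor; it is not the paper's lift or its hypernetwork.

```python
import math


def softplus(theta: float) -> float:
    """Non-negative weight produced from an unconstrained parameter."""
    return math.log1p(math.exp(theta))


def softplus_grad(theta: float) -> float:
    """d softplus / d theta, i.e. the logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-theta))


# As theta decreases, the weight approaches zero and so does the factor
# that multiplies every gradient flowing back to theta.
for theta in (0.0, -5.0, -10.0, -20.0):
    print(theta, softplus(theta), softplus_grad(theta))
```

Once a weight has been pushed near zero, the update signal that could revive it shrinks at the same exponential rate, which matches the "dead inter-layer weights" behaviour the abstract describes.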

2605.24261 2026-05-26 cs.LG cs.SY eess.SY 版本更新

Optimizing Digital Therapeutic Interventions: Online Learning under Endogenous Adherence

优化数字治疗干预:内源性依从性下的在线学习

Eric Pulick, Stephanie Carpenter, Matthew Buman, Yonatan Mintz

发表机构 * Department of Industrial and Systems Engineering, University of Wisconsin-Madison(威斯康星大学麦迪逊分校工业与系统工程系) College of Health Solutions, Arizona State University(亚利桑那州立大学健康解决方案学院)

AI总结 针对慢性病数字治疗中患者依从性受推荐和过去依从性影响的问题,提出一个包含线性动力系统和logit链接的决策支持框架,并设计基于乐观主义的UCB-BOLD算法实现亚线性遗憾。

Comments 48 pages, 6 figures

详情
AI中文摘要

临床医生管理慢性病干预面临的一个关键挑战是在信息和资源有限的情况下维持患者的长期健康。数字治疗(DT)通过重复互动(例如每日治疗建议)提供了一种成本效益高的方式来大规模管理干预,但患者的成功高度依赖于他们的依从性。行为心理学表明,治疗建议和过去的依从性都会影响未来的依从性,然而现有的DT决策支持框架仅建模建议效应或将依从性视为外生背景,在模型和算法开发上留下了关键空白。为填补这一空白,我们提出了一个DT决策支持框架,该框架同时捕捉建议和依从性效应,使临床医生能够更好地规划治疗建议。我们使用线性动力系统(LDS)对患者随时间变化的治疗参与能力进行建模,该系统同时捕捉建议和依从性效应,并通过logit链接与依从性行为内生连接。我们建立了该模型的有限时间辨识保证,将LDS结果扩展到我们的设置。接下来,我们提出了一种基于乐观主义的算法UCB-BOLD用于在线治疗选择,并证明其实现了亚线性遗憾。我们通过使用微随机试验数据生成的合成患者队列进行消融研究,将UCB-BOLD与基准进行了比较。DT决策支持工具可以包含动态模型,使决策者能够有效利用DT设置中的数据,通过有效的资源分配改善患者健康。虽然短视或启发式方法对某些患者类型足够,但对于其他患者,明确规划建议和依从性效应的好处显著;UCB-BOLD的条件风险价值遗憾比次优基准低2-3倍。

英文摘要

A critical challenge facing clinicians managing chronic disease interventions is sustaining long-run patient health given limited information and resources. Digital therapeutics (DTs) provide a cost-effective way to manage interventions at scale through repeated interactions (e.g. daily treatment recommendations), but patient success is highly dependent on their adherence. Behavioral psychology suggests that both treatment recommendations and past adherence affect future adherence, yet existing decision support frameworks for DTs model only recommendation effects or treat adherence as exogenous context, leaving a key gap in model and algorithm development. To address this gap, we present a DT decision support framework that captures both recommendation and adherence effects, allowing clinicians to better plan treatment recommendations. We model a patient's time-varying capacity for engagement with treatment using a linear dynamical system (LDS) that captures both recommendation and adherence effects, endogenously connected to adherence behavior with a logit link. We establish finite-time identification guarantees for this model, extending LDS results to our setting. Next, we propose an optimism-based algorithm, UCB-BOLD, for online treatment selection and prove that it achieves sublinear regret. We evaluate UCB-BOLD against benchmarks via ablation studies on a synthetic patient cohort generated using micro-randomized trial data. DT decision support tools can include dynamical models to enable decision makers to efficiently use the data in DT settings to improve patient health through effective resource allocation. While myopic or heuristic approaches suffice for some patient types, the benefits of explicitly planning around recommendation and adherence effects are significant for others; UCB-BOLD achieves 2-3x lower conditional value-at-risk regret than the next-best benchmark.
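To make the modeling idea concrete, here is a toy scalar simulation of a latent engagement state that evolves linearly with the recommendation and the previous adherence, and that drives adherence through a logit link. The coefficients, the scalar state, and all names are hypothetical choices for this sketch; the paper's actual model and the UCB-BOLD algorithm are not reproduced here.

```python
import math
import random


def simulate(recommendations, a=0.9, b=0.3, c=0.5, x0=0.0, seed=0):
    """Simulate (state, adherence probability, adherence) per time step.

    x_{t+1} = a * x_t + b * u_t + c * y_t   (linear dynamics)
    P(y_{t+1} = 1) = sigmoid(x_{t+1})       (logit link)
    """
    rng = random.Random(seed)
    x, prev_adherence, history = x0, 0, []
    for u in recommendations:
        # Past adherence feeds back into the state: adherence is endogenous.
        x = a * x + b * u + c * prev_adherence
        p = 1.0 / (1.0 + math.exp(-x))
        prev_adherence = 1 if rng.random() < p else 0
        history.append((x, p, prev_adherence))
    return history


# Two hypothetical recommendation schedules over 30 days.
gentle = simulate([0.5] * 30)
intense = simulate([1.0] * 30)
print(sum(y for _, _, y in gentle), sum(y for _, _, y in intense))
```

Because adherence re-enters the dynamics, a myopic choice of today's recommendation can change the whole future trajectory, which is the planning problem the abstract motivates.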

2605.24251 2026-05-26 cs.LG cs.CV 版本更新

Rethinking Continual Anomaly Detection on the Edge: Benchmarking Under Realistic Industrial Conditions

重新思考边缘上的持续异常检测:在现实工业条件下进行基准测试

Chad Weatherly, Sen Lin

发表机构 * University of Houston(休斯敦大学)

AI总结 针对现有持续异常检测方法在评估、比较和边缘部署约束上的不足,提出统一基准和训练无关方法DINOSaur,在多种协议下超越所有现有方法,并在边缘设备上实现快速推理和适应。

详情
AI中文摘要

持续异常检测(CAD)解决了工业检测系统适应不断变化的生产条件的需求,但现有方法存在三个关键差距:不现实的评估、缺乏系统比较以及未考虑边缘部署约束。我们引入了一个统一的基准,结合了结构和逻辑异常的离散任务评估、一种新颖的连续漂移协议、对所有已发布CAD方法的首次头对头比较,以及在边缘硬件上的计算效率分析。我们的结果表明,现有的CAD方法并不一致地优于带有简单经验重放的传统方法。受此启发,我们提出了DINOSaur,一种无需训练的方法,结合了冻结的DINOv3骨干网络、空间索引的coreset记忆和邻域限制的异常评分。DINOSaur通过构造实现了零遗忘,在所有五种协议上优于所有评估的方法,并在NVIDIA Jetson Orin Nano上以低于100毫秒的推理速度运行,在设备上适应新任务的时间不到30秒。

英文摘要

Continual anomaly detection (CAD) addresses the need for industrial inspection systems to adapt to evolving production conditions, yet existing methods share three critical gaps: unrealistic evaluation, no systematic comparison, and no consideration of edge deployment constraints. We introduce a unified benchmark combining discrete-task evaluation on structural and logical anomalies, a novel continuous drift protocol, the first head-to-head comparison of all published CAD methods, and computational efficiency profiling on edge hardware. Our results reveal that existing CAD methods do not consistently outperform traditional approaches with simple experience replay. Thus motivated, we propose DINOSaur, a training-free method combining a frozen DINOv3 backbone with spatially-indexed coreset memory and neighborhood-restricted anomaly scoring. DINOSaur achieves zero forgetting by construction, outperforms all evaluated methods across all five protocols, and runs at sub-100 ms inference on an NVIDIA Jetson Orin Nano, with on-device adaptation to new tasks in under 30 seconds.

2605.24249 2026-05-26 cs.LG 版本更新

PrivFusion: A Privacy-preserving Multi-Agent Framework for Harmonizing Distributed Datasets

PrivFusion: 一种用于协调分布式数据集的隐私保护多智能体框架

Anisa Halimi, Liubov Nedoshivina, Kieran Fraser, Stefano Braghin

发表机构 * IBM Research(IBM研究院)

AI总结 提出PrivFusion框架,通过多智能体自动协调异构结构化数据集,在联邦学习前实现隐私保护的数据对齐,减少人工干预。

Comments Accepted by IEEE CBMS 2026

详情
AI中文摘要

临床数据的日益可用性增加了机器学习的使用,但集中式数据聚合对于敏感健康信息通常不可行。联邦学习提供了一种分布式替代方案,但其采用受到机构数据集间显著异质性的限制,使得协调成为多站点分析的关键但经常被忽视的前提。我们引入了PrivFusion,一个隐私保护的多智能体框架,在联邦训练之前自动协调结构化数据集。PrivFusion使用智能体分析本地数据,跨站点聚类语义相似的特征,并提供迭代转换建议直到实现对齐。在四个异构COVID-19数据集上的评估表明,PrivFusion有效且高效地协调了多站点数据,同时大幅减少了人工工作量。

英文摘要

The growing availability of clinical data has increased the use of machine learning, yet centralized data aggregation is often infeasible for sensitive health information. Federated Learning (FL) offers a distributed alternative, but its adoption is limited by substantial heterogeneity across institutional datasets, making harmonization a critical but frequently overlooked prerequisite for multi-site analytics. We introduce PrivFusion, a privacy-preserving multi-agent framework that automates the harmonization of structured datasets prior to federated training. PrivFusion uses agents to analyze local data, cluster semantically similar features across sites, and provide iterative transformation recommendations until alignment is achieved. Evaluation across four heterogeneous COVID-19 datasets demonstrates that PrivFusion effectively and efficiently harmonizes multi-site data while substantially reducing manual effort.

2605.24216 2026-05-26 cs.LG cs.AI cs.CL cs.CR 版本更新

Agent-ToM: Learning to Monitor Autonomous LLM Agents via Theory-of-Mind Reasoning

Agent-ToM: 通过心智理论推理学习监控自主LLM智能体

Nesreen K. Ahmed, Nima Nafisi

发表机构 * Cisco Outshift(思科Outshift)

AI总结 针对自主LLM智能体的隐蔽恶意行为监控难题,提出基于心智理论推理的Agent-ToM框架,通过信念推断、意图假设与验证实现结构化轨迹分析,在监控基准上取得优于集成方法的性能。

Comments 23 pages, 9 figures

详情
AI中文摘要

监控自主大语言模型(LLM)智能体的隐蔽恶意行为具有挑战性,因为攻击模式具有延迟性、上下文依赖性和长期性。智能体可能追求隐藏目标同时保持表面良性行为,即使拥有完整轨迹访问也难以检测。先前的监控方法改进了脚手架或集成聚合,但独立处理每条轨迹,未从先前的监控经验中学习。此外,标准推理方法解释观察到的行为,但没有明确推理智能体的信念、意图和目标对齐,而这些对于区分良性任务执行和隐蔽偏离是必要的。我们提出Agent-ToM,一种基于心智理论(ToM)推理的监控学习框架,用于自主智能体的安全分析。Agent-ToM通过推断信念、具有校准置信度的意图假设、预期行动以及与任务一致行为基线的偏离,执行结构化的全轨迹分析。在推理时,它采用“推理-验证-细化”流程来构建和验证监控决策。在训练时,Agent-ToM将批评信号蒸馏到持久的“语义护栏记忆”中,使得跨回合可重用的信念和意图条件约束成为可能。我们在对抗性智能体监控基准(SHADE-Arena和CUA-SHADE-Arena)上评估Agent-ToM。Agent-ToM实现了强精确率-召回率平衡,并使用连贯的双调用推理流程,优于包括集成方法在内的最先进监控基线。这些结果表明,在监控层学习,结合结构化的ToM推理和验证,为保护自主LLM智能体提供了有效且可部署的基础。

英文摘要

Monitoring autonomous large language model (LLM) agents for covert malicious behavior is challenging due to delayed, context-dependent, and long-horizon attack patterns. Agents may pursue hidden objectives while maintaining superficially benign behavior, making detection difficult even with full trajectory access. Prior monitoring approaches improve scaffolding or ensemble aggregation, but treat each trajectory independently and do not learn from prior monitoring experience. Moreover, standard reasoning methods explain observed behavior without explicitly reasoning about agent beliefs, intentions, and goal alignment required to distinguish benign task execution from covert deviation. We propose Agent-ToM, a learning-to-monitor framework grounded in Theory-of-Mind (ToM) reasoning for security analysis of autonomous agents. Agent-ToM performs structured full-trajectory analysis by inferring beliefs, intent hypotheses with calibrated confidence, expected actions, and deviations from task-consistent behavioral baselines. At inference time, it employs a Reason-Verify-Refine pipeline to construct and validate monitoring decisions. At training time, Agent-ToM distills critique signals into a persistent semantic guardrail memory, enabling reusable belief- and intent-conditioned constraints across episodes. We evaluate Agent-ToM on adversarial agent monitoring benchmarks (SHADE-Arena and CUA-SHADE-Arena). Agent-ToM achieves strong precision-recall balance and outperforms state-of-the-art monitoring baselines, including ensemble methods, while using a coherent two-call reasoning pipeline. These results demonstrate that learning at the monitoring layer, combined with structured ToM reasoning and verification, provides an effective and deployable foundation for securing autonomous LLM agents.

2605.24213 2026-05-26 cs.SE cs.AI cs.LG 版本更新

Towards Evaluation Engineering: An Empirical Study of ML Evaluation Harnesses in the Wild

迈向评估工程:真实环境中机器学习评估工具的实证研究

Zhimin Zhao, Zehao Wang, Abdul Ali Bangash, Bram Adams, Ahmed E. Hassan

发表机构 * Software Analysis and Intelligence Lab (SAIL), School of Computing, Queen's University(软件分析与智能实验室(SAIL),计算学院,女王大学) Concordia University(康考迪亚大学) Lahore University of Management Sciences (LUMS)(拉合尔管理科学大学(LUMS))

AI总结 通过对57个评估工具的实证研究,提出五阶段工具模型,并分类16560个问题,发现规范阶段问题最多(41.4%),主要根因是未实现功能(24.3%)、文档缺失(20.3%)和输入验证缺失(17.2%),为将评估工程作为独立软件工程关注点奠定实证基础。

详情
AI中文摘要

评估工具是编排模型评估的软件系统,管理模型调用、数据加载、指标计算和结果报告。尽管它们在机器学习基础设施中扮演关键角色,但其操作挑战和工程问题迄今受到的关注有限。我们对57个评估工具进行了实证研究,推导出一个五阶段工具模型,并根据工作流阶段和根本原因对16,560个问题进行了分类。大多数工具操作挑战集中在规范阶段(占问题的41.4%),在此阶段工具集成外部模型、数据集和评分评判者。操作挑战的三个最常见根本原因是未实现功能(24.3%)、文档缺失(20.3%)和输入验证缺失(17.2%),这些合计占分类问题的61.7%,涵盖现有功能的缺陷和阻碍预期工作流的能力缺口。根本原因也因工作流阶段而异:环境不兼容和外部依赖破坏占配置问题的36.2%,而算法错误(25.9%)和验证缺失(22.5%)主导评估问题。这些贡献共同为将评估工程视为一个独立的软件工程关注点建立了实证基础。

英文摘要

Evaluation harnesses are software systems that orchestrate model evaluation by managing model invocation, data loading, metric computation, and result reporting. Despite their critical role in machine learning infrastructure, their operational challenges and engineering concerns have received limited attention so far. We present an empirical study of 57 evaluation harnesses, deriving a five-stage harness model and classifying 16,560 issues by workflow stage and root cause. Most harness operational challenges concentrate in the Specification stage (41.4% of issues), where harnesses integrate external models, datasets, and scoring judges. The three most frequent root causes of operational challenges are unimplemented features (24.3%), documentation gaps (20.3%), and missing input validation (17.2%), which together account for 61.7% of classified issues, spanning both defects in existing functionality and capability gaps that block intended workflows. Root causes also vary by workflow stage: environment incompatibility and external dependency breakage account for 36.2% of provisioning issues, whereas algorithmic error (25.9%) and validation gap (22.5%) dominate assessment issues. Together, these contributions establish an empirical foundation for treating evaluation engineering as a distinct software engineering concern.

2605.24212 2026-05-26 stat.AP cs.AI cs.LG stat.ML (updated version)

Distributionally Robust Transfer Learning with Structurally Missing Covariates, with Application to Cross-National Cardiac Arrest Prediction

Siqi Li, Chuan Hong, Ziye Tian, Benjamin Sieu-Hon Leong, Koshi Nakagawa, Hideharu Tanaka, Sang Do Shin, Khuong Quoc Dai, Do Ngoc Son, Marcus Eng Hock Ong, Nan Liu, Molei Liu

Affiliations: Centre for Biomedical Data Science, Duke-NUS Medical School, Singapore; Duke-NUS AI + Medical Sciences Initiative, Duke-NUS Medical School, Singapore; Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA; Duke Clinical Research Institute, Durham, NC, USA; Emergency Medicine Department, National University Hospital, Singapore; Department of Sport and Medical Science, Faculty of Physical Education, Kokushikan University, Tokyo, Japan; Graduate School of Emergency Medical System, Kokushikan University, Tokyo, Japan; Department of Emergency Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea; Center for Emergency Medicine, Bach Mai Hospital, Hanoi, Vietnam; Center for Critical Care Medicine, Bach Mai Hospital, Hanoi, Vietnam; Health Services Research Centre, Singapore Health Services, Singapore; Department of Emergency Medicine, Singapore General Hospital, Singapore; Pre-hospital & Emergency Research Centre, Health Services Research and Population Health, Duke-NUS Medical School, Singapore

AI summary: Proposes DRUM, a framework that uses distributionally robust optimization and a neural network generator to handle covariates that are structurally missing in the target domain, transferring prediction models to a target domain without outcome labels. It is validated on cross-national cardiac arrest prediction.

English abstract

Deploying clinical prediction models across healthcare systems often fails when key training covariates are unavailable at deployment and labeled outcomes are limited in the target domain. For example, high-performing models for out-of-hospital cardiac arrest (OHCA) rely on detailed prehospital measurements routinely collected in high-resource settings but unavailable in many international registries. Existing methods either discard missing covariates, sacrificing predictive information, or rely on untestable assumptions about their target distribution. We propose DRUM (Distributionally Robust Unsupervised transfer learning with structurally Missing covariates), a framework that transfers prediction models to target populations where certain covariates are structurally absent and outcome labels are unavailable. DRUM partitions covariates into shared components ($X$), observed across all settings, and missing components ($A$), observed only in the source. Rather than imputing missing covariates, DRUM optimizes worst-case predictive performance over the unknown target distribution of $A \mid X$ using a neural network generator, with a robustness parameter controlling allowable deviation from the source conditional. We further develop a bias correction procedure that reduces sensitivity to nuisance estimation error. Simulations show substantial improvements in both mean and worst-case prediction error under distribution shift. Applied to cross-national OHCA prediction, transferring models from a US registry to multiple Asian registries where prehospital variables are unrecorded, DRUM yields better-calibrated predictions and improved clinical classification performance across sites.

2605.24210 2026-05-26 cs.LG stat.ML (updated version)

Characterizing the Representational Capacity of Neural Processes

Robin Young

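Affiliations: University of Cambridge

AI summary: Characterizes the representational capacity of Conditional, Attentive, and Transformer Neural Processes and their latent variants, proving that they form a strict hierarchy and identifying the inclusion relations and limitations of each architecture.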

Comments To appear at ProbML/AABI 2026

English abstract

What functions can Neural Processes represent? We analyze the representational capacity of popular NP architectures: Conditional Neural Processes (CNPs), Attentive Neural Processes (ANPs), Transformer Neural Processes (TNPs), and their latent variants. We prove these architectures form a strict hierarchy. CNP-representable functions are exactly those depending on finitely many expected features of the context distribution. ANPs strictly generalize CNPs via query-dependent reweighting, enabling kernel smoothers. ConvCNPs and ANPs are incomparable; each contains functions outside the other, separated by stationarity versus translation equivariance. TNPs with $L$ self-attention layers capture $L$-hop context interactions. For latent NPs, we show finite-dimensional latents provide coherent sampling but do not circumvent encoder limitations; matching GP posterior distributions requires latent dimension scaling with context size. These results provide a theoretical foundation for architecture selection based on task structure.
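
The claim that CNP-representable functions depend only on expected features of the context distribution can be illustrated with a mean-pooled encoder. The sketch below is not taken from the paper: the feature map `phi` is a hypothetical hand-written stand-in for a learned encoder network, and the data are made up. It shows that the pooled representation is unchanged when the context is reordered or when every context point is repeated.

```python
import numpy as np

def cnp_representation(context_x, context_y, feature_map):
    """Mean-pool per-point features: r = (1/n) * sum_i phi(x_i, y_i)."""
    feats = np.stack([feature_map(x, y) for x, y in zip(context_x, context_y)])
    return feats.mean(axis=0)

def phi(x, y):
    # Hypothetical fixed feature map standing in for a learned encoder MLP.
    return np.array([1.0, x, y, x * y, np.sin(x)])

# Made-up context set of four (x, y) pairs.
xs = np.array([0.1, 0.7, -1.2, 2.0])
ys = np.array([1.0, -0.5, 0.3, 0.8])
r = cnp_representation(xs, ys, phi)

# Reordering the context leaves r unchanged (permutation invariance).
perm = np.array([2, 0, 3, 1])
r_perm = cnp_representation(xs[perm], ys[perm], phi)

# Repeating every context point also leaves r unchanged: r depends only on
# the empirical distribution of the context, not on how many points it has.
r_dup = cnp_representation(np.tile(xs, 2), np.tile(ys, 2), phi)
```

Because the decoder only ever sees `r`, any two context sets with the same feature averages are indistinguishable to this kind of model, which is the limitation that query-dependent reweighting in attentive variants relaxes.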

2605.24207 2026-05-26 cs.DB cs.LG (updated version)

Incorporating Deep Learning Design in Database Queries

Yuval Lev Lubarsky, Dean Light, Boaz Berger, Shunit Agmon, Benny Kimelfeld

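Affiliations: University of Washington

AI summary: Proposes integrating deep learning into database queries by associating each tuple with a learnable vector embedding, so that queries operate jointly on data and embeddings and support relational deep learning.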

English abstract

Deep learning over relational databases is conventionally realized by translating data into graph representations and applying graph-based neural networks within external frameworks. This round-trip between the database and external machine learning (ML) systems introduces non-trivial engineering overhead. In effect, these graph neural networks operate on tuple embeddings and manipulate them in ways that capture the interactions induced by relational joins. Given this natural correspondence, there is no fundamental reason why specifying a neural network over relational data should be substantially harder than querying it. We propose an approach that naturally integrates deep learning with database queries. The key idea is to associate each tuple with provenance, represented as a vector embedding with learnable parameters. Queries are lifted to operate jointly on data and embeddings, mapping input relations with embedded tuples to output relations with embedded tuples. This approach provides a declarative foundation for relational deep learning, facilitating integration with database systems, optimization, and wide adoption. We describe RelaNN, a proof-of-concept implementation of this approach built on top of PyTorch and cuDF. We illustrate the utility of RelaNN by implementing various graph-learning models, including graph convolutional networks, heterogeneous graph transformers, hypergraph neural networks and deep homomorphism networks. The simplicity of the programs and their competitive runtime performance demonstrate a concrete path toward making the implementation of state-of-the-art neural networks over databases as simple as writing a query.

2605.24195 2026-05-26 cs.CV cs.LG (updated version)

Single View Seafloor Recovery from Imaging Sonar via Differentiable Rendering

Sevan Brodjian, Michael Hobley, Pietro Perona

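Affiliations: California Institute of Technology

AI summary: Presents a training-free method that recovers seafloor bathymetry from a single sonar image in under 30 seconds via differentiable rendering, conditioned on a known seafloor tilt. The authors describe it as the first differentiable rendering approach to single-view height recovery in sonar.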

English abstract

Sonar is often the only modality suitable for high-resolution imaging underwater due to light attenuation and turbidity. Forward-looking imaging sonar provides measurements over range and horizontal angle but collapses vertical structure into a flat image, creating ambiguities that make 3D recovery challenging. A common use case for imaging sonar is underwater terrain mapping (bathymetry), yet current methods require many views, expensive multi-sensor setups, or significant training data, which limits use and adaptability to new environments. We present a training-free method that recovers bathymetry from a single sonar image in under 30 seconds via differentiable rendering, conditioned on a known seafloor tilt. To our knowledge, this is the first differentiable rendering approach for single-view height recovery in sonar. Our method implements differentiable sonar ray tracing and optimizes an explicit height field to reproduce the target image. On synthetic datasets, our approach outperforms a supervised CNN under distribution shift and remains close on rough terrain, while the CNN wins in-distribution. By modeling physically grounded priors of the sonar process, our method adapts across sensor configurations and environments without training data.

2605.24193 2026-05-26 cs.SD cs.LG (updated version)

Music Transcription with (Almost) No Supervision

Saebyeol Shin, Chao Wan, Zhenzhen Liu, Justin Lovelace, Daniel C. Lin, Kilian Q. Weinberger, John Thickstun

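Affiliations: Cornell University

AI summary: Adopts a cycle-consistent translation framework in which a small amount of paired data serves as an anchor, exploiting unpaired audio and score data to reach high-quality music transcription.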

English abstract

Competitive music transcription models require large amounts of paired audio-score data, which is scarce due to collection costs, alignment difficulty, and copyright restrictions. Meanwhile, vast quantities of unpaired audio recordings and symbolic scores are freely available but have gone unused. We adopt a cycle-consistent translation framework in which a small amount of paired data acts as a minimal anchor, unlocking the full potential of the unpaired pool. We find that: unpaired data yields surprisingly large gains, especially under limited supervision; unpaired audio contributes more than unpaired scores; incorporating unlabeled audio from a new instrument during training improves transcription for that instrument without any paired supervision. Together, these results suggest that scaling unpaired data offers a practical path toward high-quality transcription for instruments where labeled data remains scarce.

2605.24192 2026-05-26 cs.LG cs.AI cs.CV (updated version)

Filtered Posterior Mean Collections: A Unified Framework for Analytical Models of Diffusion Generalization

Matthew Niedoba, Berend Zwartsenberg, Frank Wood

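Affiliations: University of British Columbia; Inverted AI; Alberta Machine Intelligence Institute

AI summary: Proposes Filtered Posterior Mean Collections (FPMCs), a unified framework that models the generalization behaviour of diffusion-model denoising functions through query precision vectors, response weights, and source distributions, and improves existing methods with soft relaxations and source-distribution augmentations.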

Comments 27 Pages, 7 figures

English abstract

The neural-network denoising functions which form the backbone of image diffusion models are remarkably consistent in their generalization behaviour across a wide variety of network architectures and training procedure hyperparameters. A recent line of research has sought to model the outputs of these networks by aggregating posterior weighted averages of training dataset patches. In this work, we consolidate these approaches into a unified model class which we call Filtered Posterior Mean Collections (FPMCs). We define this model class using query precision vectors, response weights, and source distributions, and illustrate that existing methods are recoverable with specific choices of these design axes. Investigating each axis in turn, we find that FPMC performance can be improved with soft relaxations of prior patch-based methods, and through augmentations of source distributions. Applying these findings to an existing FPMC, we demonstrate consistent sample improvement across three natural image datasets.
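
As background for the "posterior weighted averages" mentioned above, the following sketch computes the plain posterior mean of a clean sample given a noisy one, assuming Gaussian noise and a uniform prior over a finite training set. It is only the unfiltered baseline: it operates on whole points rather than patches, and it does not implement the paper's query precision vectors, response weights, or source distributions. The function name and toy data are hypothetical.

```python
import numpy as np

def posterior_mean_denoiser(x_noisy, train_points, sigma):
    """E[x0 | x_noisy] for x_noisy = x0 + sigma * eps, x0 uniform on train_points."""
    sq_dist = np.sum((train_points - x_noisy) ** 2, axis=1)
    logits = -sq_dist / (2.0 * sigma ** 2)
    logits -= logits.max()          # stabilise the softmax numerically
    weights = np.exp(logits)
    weights /= weights.sum()        # posterior probability of each training point
    return weights @ train_points   # posterior-weighted average

# Toy 2-D "dataset" of four training points (made up for illustration).
train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
x_noisy = np.array([0.9, 0.1])

# At low noise the posterior collapses onto the nearest training point.
low_noise = posterior_mean_denoiser(x_noisy, train, sigma=0.05)

# At very high noise the posterior is nearly uniform, giving the dataset mean.
high_noise = posterior_mean_denoiser(x_noisy, train, sigma=1e3)
```

This exact posterior mean memorizes the training set at low noise, which is one reason analytical models of generalization restrict or reweight the average, for example over local patches.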

2605.24183 2026-05-26 cs.DB cs.AI cs.LG (updated version)

AvalancheBench: Evaluating Enterprise Data Agents Through Latent World Recovery

Darek Kleczek, Fuheng Zhao, Alexander W. Lee, Julien Tissier, Pawel Liskowski, Ugur Cetintemel, Anupam Datta

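Affiliations: Brown University and Snowflake

AI summary: Proposes AvalancheBench, a benchmark that evaluates the analytical understanding of enterprise data agents through latent world recovery and exposes how early mistakes propagate into systematically wrong recommendations.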

English abstract

We introduce AvalancheBench, a benchmark for evaluating enterprise data agents through latent world recovery. AvalancheBench improves on existing benchmarks in three ways. First, it evaluates analytical understanding rather than pipeline completion: systems are scored on whether they recover the segments, drivers, temporal events, and relationships that explain the data, not merely on whether they execute a workflow or produce a plausible report. Second, it provides ground truth for goal-driven analytics by generating observations from a known latent world, enabling partial credit for incomplete but valid recoveries. Third, it exposes how early analytical mistakes propagate into later conclusions: missed segments, merged events, or wrong attributions can lead to systematically wrong recommendations. In this sense, AvalancheBench complements real-data benchmarks by providing a controlled setting for diagnosing whether agents recover the analytical structure behind enterprise data. On a first e-commerce use case, the strongest configuration of a leading coding agent recovers only 26% of the rubric, with failures concentrated in generic customer segmentations and merged temporal events.

2605.24173 2026-05-26 cs.CL cs.AI cs.CR cs.LG (updated version)

Extracting Training Data from Diffusion Language Models via Infilling

Yihan Wang, N. Asokan

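Affiliations: University of Waterloo; KTH Royal Institute of Technology

AI summary: Proposes infilling extraction, a protocol parameterized by an arbitrary binary mask that exploits the bidirectional denoising of diffusion language models. Mask geometry governs extractability: edge-conditioned masks extract up to three times more verbatim sequences than prefix-conditioned ones, and bidirectional access opens channels that autoregressive models do not expose.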

English abstract

Memorization in large language models has been studied almost exclusively through prefix-conditioned extraction, a natural choice for autoregressive models. However, diffusion language models (DLMs) can denoise masked tokens at arbitrary positions. Thus, prefix-only probing reveals only one facet of memorization in DLMs and significantly underestimates the risk of training-data extraction. To model the extractability of training data in DLMs realistically, we introduce infilling extraction, a data-extraction protocol parameterized by an arbitrary binary mask that subsumes prefix-only probing and accounts for the bidirectional inductive bias of DLMs. Instantiating it on LLaDA-8B and Dream-7B across five extraction modes, three training pipelines, and three corpora covering verbatim and partial leakage, we find that mask geometry governs extractability: edge-conditioned masks extract up to three times more verbatim sequences than prefix-conditioned ones, and bidirectional access opens channels inaccessible in autoregressive models. In particular, we show that a realistic adversary with access to training data in which personally identifiable information has been redacted can even achieve higher recall when extracting redacted email addresses from DLMs than from scale-matched autoregressive models. Tunable decoding parameters measurably affect extraction performance, while a follow-up supervised finetuning stage does not eliminate the prior memorization.

2605.24171 2026-05-26 cs.LG cs.AI (updated version)

PromptAudit: Auditing Prompt Sensitivity in LLM-Based Vulnerability Detection

Steffen J. Camarato, Yahya Hmaiti, Mandana Ghadamian, David Mohaisen

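Affiliations: University of Central Florida

AI summary: Proposes PromptAudit, a framework that fixes the dataset, decoding, and parsing and varies only the prompting strategy. It evaluates five prompting strategies across five open-weight models on 1,000 CVEs (6,074 code samples in 16 programming languages), finding that standard chain-of-thought prompting gives the strongest overall performance and that prompt sensitivity is a first-class system property.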

English abstract

Large language models are increasingly used for vulnerability detection, yet their reliability under different prompt formulations remains uncharacterized. We present PromptAudit, a controlled evaluation framework that isolates prompt effects by fixing the dataset, decoding, and parsing while varying only the prompting strategy. Using five prompting strategies across five open-weight models on 1,000 CVEs (6,074 code samples spanning 16 programming languages), we evaluate accuracy, recall, abstention, coverage, and effective F1. We find that standard chain-of-thought prompting achieves the strongest overall operational performance, while few-shot prompting provides model-dependent benefits that are most pronounced for prompt-sensitive models. In contrast, adaptive chain-of-thought frequently suppresses recall and self-consistency induces excessive abstention, sharply reducing effective performance. These results show that vulnerability detection behavior is jointly determined by the model and the prompt, and that prompt sensitivity is a first-class system property that must be explicitly characterized in evaluation and deployment.

2605.24170 2026-05-26 math.DS cs.LG q-bio.QM (updated version)

Learning dynamical systems with biochemically informed neural ordinary differential equations

Luis L. Fonseca, Reinhard C. Laubenbacher, Lucas Böttcher

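Affiliations: Laboratory for Systems Medicine, Department of Medicine, University of Florida; Department of Computational Science and Philosophy, Frankfurt School of Finance and Management

AI summary: Proposes biochemically informed neural ordinary differential equations (BINODEs), which embed processes represented by neural networks in a stoichiometric structure, allowing flexible modeling of partially known biochemical dynamical systems while retaining interpretability.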

Comments 23 pages, 13 figures, 4 tables

English abstract

Ordinary differential equation models of biochemical reactions are often formulated as stoichiometric systems in which the dynamics arise from a collection of interacting processes. A central challenge is that the functional form of each process is rarely known a priori and may be difficult to infer from data. We propose biochemically informed neural ordinary differential equations (BINODEs), a neural-ODE framework that retains the stoichiometric structure of mechanistic models while representing individual processes by neural networks. In BINODEs, the outputs of neural network processes (NNPs) are mapped to state derivatives through a linear layer analogous to a stoichiometric matrix. This architecture allows biological side information, such as process-specific inputs, sign constraints, and monotonicity assumptions, to be built directly into the model. We characterize the approximation properties of NNPs for several standard biochemical rate laws and show that the proposed framework recovers both trajectories and process-level structure in Monod, Lotka-Volterra, pharmacokinetic, and ultradian endocrine models. These results suggest that BINODEs offer a useful compromise between mechanistic interpretability and data-driven flexibility for modeling partially known biochemical or biological dynamical systems.
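
The stoichiometric structure described in the abstract, where process rates are mapped to state derivatives through a fixed linear layer, can be sketched for a Lotka-Volterra system. In this illustration the process rates are hand-written functions standing in for the neural network processes; the matrix `S`, the rate constants, and the Euler integrator are assumptions chosen for the example, not values or code from the paper.

```python
import numpy as np

# Stoichiometric matrix: rows are states (prey, predator), columns are
# processes (prey growth, predation, predator death). The 0.5 entry is an
# assumed conversion efficiency from eaten prey to new predators.
S = np.array([[1.0, -1.0, 0.0],
              [0.0, 0.5, -1.0]])

def process_rates(state):
    """Non-negative process rates; each entry is where a learned NNP would go."""
    prey, predator = state
    return np.array([1.0 * prey,             # prey growth
                     0.2 * prey * predator,  # predation
                     0.3 * predator])        # predator death

def state_derivative(state):
    # The linear "stoichiometric layer": dx/dt = S @ v(x).
    return S @ process_rates(state)

def euler_integrate(state0, dt, steps):
    """Simple explicit Euler rollout, used here only to keep the sketch short."""
    traj = [np.asarray(state0, dtype=float)]
    for _ in range(steps):
        traj.append(traj[-1] + dt * state_derivative(traj[-1]))
    return np.stack(traj)

trajectory = euler_integrate([10.0, 5.0], dt=0.01, steps=100)
```

Keeping `S` fixed means that which process affects which state, and with what sign, is encoded structurally, so only the rate functions would need to be learned from data.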

2605.24168 2026-05-26 cs.AI cs.LG (updated version)

Inference Time Context Sparsity: Illusion or Opportunity?

Sahil Joshi, Prithvi Dixit, Agniva Chowdhury, Anshumali Shrivastava, Joseph E. Gonzalez, Ion Stoica, Kumar Krishna Agrawal, Aditya Desai

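Affiliations: Berkeley

AI summary: Argues, with empirical and theoretical evidence, that extreme but principled sparsity along the context dimension is feasible for long-context LLM inference and can substantially accelerate it (up to 10x over FlashInfer at 50x sparsity on hardware such as the H100), challenging the need for dense attention.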

Comments 19 pages, 8 figures

English abstract

Sparsity has long been a central theme in LLM efficiency, but its role in context processing remains unresolved. As LLM workloads shift toward longer contexts and agentic interactions, the compute and memory bottlenecks of attention become increasingly critical, raising the question of whether these constraints are fundamental. Our position is that these constraints are artificial and unnecessary, and that the future of LLM inference lies in extreme but principled sparsity along the context dimension. This position is supported by several strands of empirical and theoretical evidence. First, we find the insistence on dense attention unreasonable, since in a long context a query effectively projects O(N) attention information into a hidden space of dimension d << N, making the process inherently lossy. Second, we perform an extensive study of sparsity in LLMs spanning 20 models across five model families, varying context lengths, and different sparsity levels. We empirically demonstrate a strong trend: current LLMs, despite not being trained for context sparsity, are remarkably robust to inference-time decode sparsity across tasks of varying complexity, including retrieval, multi-hop QA, mathematical reasoning, and agentic coding. Importantly, we also show that current hardware is already sufficient to realize substantial gains from this sparsity. For example, our sparse decode kernels accelerate large-context processing by up to 10x over FlashInfer at 50x sparsity levels on hardware such as the H100. Overall, these results position extreme context sparsity not as a heuristic, but as a principled foundation for LLM inference, training, and architecture design: one that is both feasible and beneficial, and a compelling direction for future systems.
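
To make "decode sparsity along the context dimension" concrete, here is a minimal single-query attention sketch in which only the top-k scoring keys are kept. Top-k selection is just one simple sparsification rule chosen for illustration; the abstract does not specify the selection method or kernel design, and all sizes and names below are hypothetical. Note that this toy version still scores every key, so it shows the numerical effect of sparsity rather than any speedup.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def decode_attention(query, keys, values, top_k=None):
    """Attention output for one decode-step query, optionally over top_k keys only."""
    scores = keys @ query / np.sqrt(query.shape[0])
    if top_k is not None and top_k < scores.shape[0]:
        # Indices of the top_k largest scores (order within the set is irrelevant).
        keep = np.argpartition(scores, -top_k)[-top_k:]
    else:
        keep = np.arange(scores.shape[0])
    weights = softmax(scores[keep])
    return weights @ values[keep]

rng = np.random.default_rng(0)
n_ctx, d = 1000, 64
keys = rng.standard_normal((n_ctx, d))
values = rng.standard_normal((n_ctx, d))
query = rng.standard_normal(d)

dense = decode_attention(query, keys, values)
sparse = decode_attention(query, keys, values, top_k=n_ctx // 50)  # 50x sparsity
full_k = decode_attention(query, keys, values, top_k=n_ctx)        # keeps everything
```

A practical system would need to pick the kept keys without computing all scores, which is where approximate selection and specialized kernels come in.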

2605.24162 2026-05-26 cs.LG cs.AI (updated version)

Knowledge Graph Modulated Deep Learning for Limited-Sample Clinical Data Analysis

Yuwei Xue, Sakib Mostafa, James Zou, Joseph Liao, Maximilian Diehn, Ash A. Alizadeh, Lei Xing, Md. Tauhidul Islam

Affiliations: Department of Radiation Oncology, Stanford University; Stanford University; Stanford University School of Medicine; Department of Computer Science, Stanford University; Department of Electrical Engineering, Stanford University; Department of Biomedical Data Science, Stanford University School of Medicine; Stanford Cancer Institute, Stanford University; Institute for Stem Cell Biology and Regenerative Medicine, Stanford University; Department of Medicine, Division of Oncology, Stanford University; Institute of Computational and Mathematical Engineering, Stanford University

AI summary: Proposes the Graph-in-Graph (GiG) framework, which represents each patient as a modular graph and integrates biological knowledge graphs, substantially improving prediction on limited-sample clinical tasks.

Comments 17 pages, 4 figures, 12 supplementary figures

English abstract

Biological systems are governed by structured molecular interactions, where pathways, regulatory circuits, and functional gene relationships shape cellular behavior and disease progression. Much of this knowledge is naturally represented as graphs. However, most biomedical AI models cannot directly use graph-encoded biological knowledge and instead require compressed low-dimensional representations, which can lose important structure and reduce performance, especially in limited-sample clinical studies. Here, we introduce Graph-in-Graph (GiG), a knowledge graph-modulated deep learning framework for data-efficient clinical prediction. GiG represents each patient as a standalone modular graph, in which curated biological knowledge graphs define edges and patient-specific measurements, such as gene expression, define node features. This design allows multiple biological knowledge graphs to be integrated while preserving gene-gene interactions and pathway topology during patient-level representation learning. Across cohorts comprising nearly 9,700 patients and five clinical tasks, including liquid biopsy cancer detection, prostate cancer diagnosis, and 32-class pan-cancer classification, GiG consistently outperforms traditional and state-of-the-art methods, with the largest gains in limited-sample settings. On the challenging prostate cancer diagnosis task, GiG improves macro-F1 by up to 49 percentage points relative to competing methods. Control experiments replacing real pathway graphs with random topologies confirm that these gains arise from biologically grounded knowledge graph structure rather than graph modeling alone. These findings show that knowledge graph-modulated deep learning can improve robustness, interpretability, and sample efficiency in clinical data analysis, and provide a principled framework for integrating biological knowledge graphs into predictive modeling.

2605.24155 2026-05-26 cs.IR cs.AI cs.LG (updated version)

An Interpretable CF-RL-TOPSIS Fusion Model for Skills-Aware Talent Recommendation

Özkan Canay

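Affiliations: Sakarya University

AI summary: Proposes CF-RL-TOPSIS, an interpretable late-fusion model that combines collaborative filtering, a reinforcement-style bandit, and entropy-weighted TOPSIS, and evaluates it on two ICT talent-history benchmarks to identify the data regimes in which the fusion adds value.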

Comments Preprint submitted to Knowledge-Based Systems; 4 figures and 8 tables

English abstract

Effective skills-aware talent recommendation must balance behavioral transition patterns, trajectory-sensitive adaptation, and inspectable occupation-level criteria. Evidence from public benchmarks on how these signals interact, however, remains limited. This study proposes CF-RL-TOPSIS, an interpretable late-fusion model that integrates a transition-aware collaborative branch, a compact reinforcement-style occupation-family bandit, and an entropy-weighted TOPSIS branch constructed from six semantic proxies; the validation-selected fusion coefficients remain auditable. The model is evaluated on two frozen public ICT talent-history benchmarks, JobHop and Karrierewege, using repeated chronological top-5 ranking and paired Wilcoxon tests. On JobHop the full hybrid attains NDCG@5 = 0.3040 +/- 0.0073 and significantly surpasses repeat-last, item Markov, transition-aware collaborative filtering, the CF+TOPSIS hybrid, GRU4Rec, and SASRec (p <= 0.0039 across planned comparisons). On Karrierewege the hybrid remains competitive but does not significantly exceed the strongest Markov baseline, revealing a persistence-dominated setting in which the bandit branch appropriately shrinks to near-zero weight. Proxy-sensitivity, family-level deep Q-network, and runtime checks support this interpretation, and a worked user-level case shows how branch scores, criterion weights, and rank shifts can be inspected for an individual recommendation. The contribution is not a benchmark-agnostic superiority claim, but a reproducible account of the conditions under which transparent late fusion adds value beyond simple continuation heuristics. In semantically rich, non-saturating talent-history regimes the three branches reinforce one another; in persistence-dominated regimes the same architecture remains competitive through its collaborative backbone, with the adaptive branch correctly inactive.
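
For readers unfamiliar with the entropy-weighted TOPSIS branch named above, the following is a generic sketch of the standard technique, not the paper's implementation. It assumes all criteria are benefit-type (larger is better) and strictly positive; the three-by-three `criteria` matrix is made-up data standing in for candidate occupations scored on semantic proxies.

```python
import numpy as np

def entropy_weights(matrix):
    """Criterion weights from Shannon entropy: low-entropy columns get more weight."""
    m = matrix.shape[0]
    p = matrix / matrix.sum(axis=0)                      # column-wise proportions
    plogp = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    entropy = -plogp.sum(axis=0) / np.log(m)             # normalised to [0, 1]
    divergence = 1.0 - entropy
    return divergence / divergence.sum()

def topsis_scores(matrix, weights):
    """Relative closeness to the ideal solution, assuming benefit criteria only."""
    normalised = matrix / np.sqrt((matrix ** 2).sum(axis=0))
    weighted = normalised * weights
    ideal_best = weighted.max(axis=0)
    ideal_worst = weighted.min(axis=0)
    dist_best = np.sqrt(((weighted - ideal_best) ** 2).sum(axis=1))
    dist_worst = np.sqrt(((weighted - ideal_worst) ** 2).sum(axis=1))
    return dist_worst / (dist_best + dist_worst)

# Rows: hypothetical candidate occupations; columns: hypothetical criteria.
criteria = np.array([[0.9, 0.8, 0.7],
                     [0.4, 0.6, 0.5],
                     [0.2, 0.1, 0.3]])
weights = entropy_weights(criteria)
scores = topsis_scores(criteria, weights)
ranking = np.argsort(-scores)   # best candidate first
```

Because both the weights and the distance terms are explicit arrays, each candidate's score can be traced back to individual criteria, which is the kind of inspectability the abstract emphasizes.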

2605.24144 2026-05-26 cs.AR cs.LG (updated version)

EVA: Accelerating LLM Decoding via an Efficient Vector Quantization Architecture

Bowen Duan, Cong Guo, Chiyue Wei, Haoxuan Shan, Yuzhe Fu, Xinhua Chen, Yifan Xu, Ziyue Zhang, Changchun Zhou, Hai Li, Yiran Chen

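Affiliations: Duke University

AI summary: Targets the memory bottleneck and low compute utilization of LLM decoding with EVA, an architecture that turns GEMV computation into GEMM and eliminates memory bank conflicts, achieving up to 11.17x speedup and 7.17x higher energy efficiency.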

Comments 17 pages. Accepted to ISCA 2026

English abstract

Large Language Models (LLMs) have achieved impressive performance across diverse domains but remain inefficient during the autoregressive decoding phase. Unlike the prefill stage, which employs compute-bound GEMM operations, decoding executes a sequence of small GEMV-like computations that are memory-bound and underutilize modern accelerators. Weight-only vector quantization (VQ) has emerged as an effective compression technique that clusters model weights into a shared codebook and replaces the original weight matrix with low-precision indices, enabling 2-bit-level weight compression. While this approach substantially reduces model size and memory bandwidth, it still suffers from two critical inefficiencies: the low utilization of GEMV computation and frequent memory conflicts during codebook lookups. This paper presents EVA, an efficient vector-quantization-based architecture that addresses both computational and memory bottlenecks in LLM decoding. EVA builds on a simple yet effective insight that combines input-codebook computation with conflict-free memory access. Instead of reconstructing quantized weights from indices, EVA directly performs dot products between input vectors and the weight codebook, transforming LLM decoding from GEMV to GEMM computation. It then performs structured lookups from an intermediate output buffer, eliminating memory bank conflicts. We further design a hardware-software co-optimized architecture specialized for LLM decoding while remaining compatible with conventional prefill execution. Evaluations show that EVA achieves up to 11.17$\times$ speedup and 7.17$\times$ higher energy efficiency compared with the SOTA lookup-based architecture, while preserving arithmetic precision after vector quantization. Our code is available at https://github.com/dbw6/Eva.git.
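The idea of replacing weight reconstruction with input-codebook dot products followed by lookups can be sketched in a few lines. This is only one plausible reading of the abstract: it assumes a single codebook shared by all sub-vector positions and a fixed sub-vector length `d`, and every name below is hypothetical. It says nothing about the hardware design or the conflict-free buffer layout.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def decode_by_reconstruction(codebook, indices, x):
    """Baseline: rebuild each weight row from its code indices, then do a GEMV."""
    outputs = []
    for row_indices in indices:
        row = [w for k in row_indices for w in codebook[k]]
        outputs.append(dot(row, x))
    return outputs

def decode_by_codebook_products(codebook, indices, x, d):
    """Compute codebook-times-input products once, then gather partial sums."""
    chunks = [x[g:g + d] for g in range(0, len(x), d)]
    # table[g][k] = codebook[k] . chunks[g]; taken together this is a
    # (K x d) by (d x G) matrix product rather than a matrix-vector product.
    table = [[dot(code, chunk) for code in codebook] for chunk in chunks]
    # Each output element is a sum of looked-up partial products.
    return [sum(table[g][k] for g, k in enumerate(row_indices))
            for row_indices in indices]

# Hypothetical toy data: K = 4 codes of length d = 2, three output rows.
codebook = [[1, 0], [0, 1], [2, -1], [-1, 3]]
indices = [[0, 3], [2, 1], [3, 3]]
x = [1, 2, 3, 4]
```

Both functions return the same vector for this input. The second one computes each codebook-chunk product once and reuses it across all output rows, which is where the reuse in a lookup-based scheme comes from.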

2605.24139 2026-05-26 cs.AI cs.LG 版本更新

MAPLE: Multi-State Aggregated Policy Evaluation for AlphaZero in Imperfect-Information Games

MAPLE:不完全信息游戏中AlphaZero的多状态聚合策略评估

Qian-Rong Li, Hung Guei, I-Chen Wu, Ti-Rong Wu

发表机构 * Department of Computer Science, National Yang Ming Chiao Tung University(国立阳明交通大学计算机科学系) Institute of Information Science, Academia Sinica(中央研究院资讯科学研究所)

AI总结 提出MAPLE方法,通过单搜索树聚合多个采样世界状态的策略和价值评估,结合PIMC和IS-MCTS优势,在Phantom Go和Dark Hex上分别提升Elo 291和136。

Comments Accepted by the IEEE Conference on Games (IEEE CoG 2026)

详情
AI中文摘要

不完全信息游戏(IIGs)具有挑战性,因为玩家必须在未完全观察真实游戏状态的情况下做出决策。虽然AlphaZero在完美信息游戏中取得了显著成功,但将其扩展到IIGs仍然困难。现有的基于搜索的方法,如完美信息蒙特卡洛(PIMC),存在策略融合问题,而信息集蒙特卡洛树搜索(IS-MCTS)在与神经网络结合时计算成本高昂。在本文中,我们提出了多状态聚合策略评估(MAPLE),一种树搜索方法,它在单个搜索树内聚合来自多个采样世界状态的策略和价值评估,结合了PIMC和IS-MCTS的优点,同时保持可控的计算成本。我们进一步引入基于孪生网络的采样策略,从信息集中选择信息丰富的世界状态。在Phantom Go和Dark Hex上的实验表明,MAPLE显著优于基于PIMC的AlphaZero基线,分别实现了291和136的Elo提升。这些结果表明,MAPLE是一种在不完全信息游戏中进行AlphaZero式学习的有效方法。

英文摘要

Imperfect-information games (IIGs) are challenging, as players must make decisions without fully observing the true game state. While AlphaZero has achieved remarkable success in perfect-information games, extending it to IIGs remains difficult. Existing search-based approaches, such as Perfect Information Monte Carlo (PIMC), suffer from strategy fusion, while Information Set Monte Carlo Tree Search (IS-MCTS) incurs high computational cost when combined with neural networks. In this paper, we propose Multi-State Aggregated PoLicy Evaluation (MAPLE), a tree search method that aggregates policy and value evaluations from multiple sampled world states within a single search tree, combining the advantages of PIMC and IS-MCTS while maintaining a controllable computational cost. We further incorporate a Siamese-based sampling strategy to select informative world states from the information set. Experiments on Phantom Go and Dark Hex show that MAPLE significantly outperforms the PIMC-based AlphaZero baseline, achieving Elo improvements of 291 and 136, respectively. These results demonstrate that MAPLE is an effective approach for AlphaZero-style learning in imperfect-information games.

2605.24136 2026-05-26 stat.ML cs.LG stat.CO 版本更新

Detecting Metastable Basins in High Dimensions via Marginal Trajectory Distribution Discrimination

通过边际轨迹分布判别检测高维亚稳态盆地

Taj Jones-McCormick

AI总结 提出一种基于边际轨迹分布比较的判别方法,通过神经网络近似贝叶斯分类器来识别高维马尔可夫过程中的亚稳态盆地,克服了传统谱方法在高维和非线性几何下的局限性。

详情
AI中文摘要

我们研究仅使用轨迹采样来识别高维时间齐次马尔可夫过程中动态不同的吸引盆地的问题。该问题是亚稳态动力系统分析的基础,其中过程在盆地内快速混合,而盆地之间的转换在感兴趣的时间尺度上很少发生,甚至当状态空间可约时也是如此。现有方法通常依赖于空间离散化或估计转移算子的谱分析,这在高维设置或底层盆地几何高度非线性时可能变得不可靠。我们提出了一种基于边际轨迹分布比较的盆地识别判别方法。我们证明了一个简单的风险分离结果:如果两个初始状态属于同一盆地,则区分其边际轨迹分布的贝叶斯最优分类器达到接近1/2的风险,而如果它们位于不同的盆地,则最优风险接近零。这一观察将盆地检测简化为边际轨迹分布之间的两样本判别问题。基于这一原理,我们开发了一种神经算法,该算法接收一组候选盆地代表,并通过神经网络近似贝叶斯分类器估计分类风险,迭代地合并它们。我们在各种亚稳态系统上评估了该方法。这些系统包括通过将低维动力学嵌入高维噪声环境空间构建的合成系统。在这些设置中,标准的谱和聚类方法常常失败,而我们的方法准确恢复了底层盆地结构。这些结果显示了现有方法的缺点,并突出了轨迹判别作为识别高维随机系统中动态盆地的有效工具。

英文摘要

We study the problem of identifying dynamically distinct basins of attraction in high-dimensional time-homogeneous Markov processes using only trajectory sampling. This problem is fundamental in the analysis of metastable dynamical systems, where the process mixes rapidly within basins while transitions between basins occur rarely on the timescale of interest, or not at all when the state space is reducible. Existing approaches typically rely on spatial discretization or spectral analysis of estimated transition operators, which can become unreliable in high-dimensional settings or when the underlying basin geometry is highly nonlinear. We propose a discriminative approach to basin identification based on marginal trajectory distribution comparison. We prove a simple risk separation result: if two initial states belong to the same basin, the Bayes-optimal classifier distinguishing their marginal trajectory distributions achieves risk close to 1/2, whereas if they lie in distinct basins, the optimal risk is close to zero. This observation reduces basin detection to a two-sample discrimination problem between marginal trajectory distributions. Motivated by this principle, we develop a neural algorithm that receives a set of candidate basin representatives and iteratively merges them by estimating classification risk with a neural network that approximates the Bayes classifier. We evaluate the method on various metastable systems. These include synthetic systems constructed by embedding low-dimensional dynamics into high-dimensional noisy ambient spaces. In these settings, standard spectral and clustering-based methods often fail, while our approach accurately recovers the underlying basin structure. These results expose a shortcoming of existing methods and highlight trajectory discrimination as an effective tool for identifying dynamical basins in high-dimensional stochastic systems.
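The risk separation statement can be made concrete with a small worked example. With equal class priors, the Bayes risk of telling two distributions apart is half the overlap between them, so identical distributions give 1/2 and disjoint ones give 0. The sketch below is a generic illustration of that fact, not the paper's algorithm. The discrete and equal-variance Gaussian cases are simplifying assumptions chosen because the risk has a closed form.

```python
import math

def bayes_risk_discrete(p, q):
    """Bayes risk for deciding between pmfs p and q with equal priors.

    The optimal rule picks the more likely source at each outcome, so the
    error mass is half of the pointwise minimum, i.e. (1 - TV(p, q)) / 2.
    """
    return 0.5 * sum(min(pi, qi) for pi, qi in zip(p, q))

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bayes_risk_gaussian(mu_a, mu_b, sigma):
    """Same quantity for N(mu_a, sigma^2) versus N(mu_b, sigma^2)."""
    return normal_cdf(-abs(mu_a - mu_b) / (2.0 * sigma))

same_basin = bayes_risk_gaussian(0.0, 0.0, 1.0)    # indistinguishable marginals
other_basin = bayes_risk_gaussian(0.0, 10.0, 1.0)  # well separated marginals
```

In a merge procedure of the kind the abstract describes, an estimated risk near 1/2 would argue for merging two candidate representatives, and a risk near 0 for keeping them apart. The threshold between the two is a design choice the abstract does not specify.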

2605.24113 2026-05-26 cs.LG math.DG math.OC math.ST stat.TH 版本更新

Riemannian Archetypal Analysis: Interpretable non-linear data analysis on deformed star distributions

黎曼原型分析:变形星形分布上的可解释非线性数据分析

Willem Diepeveen, Deanna Needell

发表机构 * Department of Mathematics(数学系) University of California, Los Angeles(加州大学洛杉矶分校)

AI总结 提出黎曼版本的原型分析及黎曼原型映射(RAM),通过数据驱动的拉回几何将经典原型分析扩展到非线性流形,结合可解释性与非线性表达能力,并基于凸松弛与非凸细化的优化方案实现。

详情
AI中文摘要

经典原型分析因其可解释性而具有吸引力,但其线性几何在处理强非线性结构的数据时可能限制性能;同时,现有的神经扩展提高了灵活性,但往往削弱了原型和插值的几何意义。在这项工作中,我们基于数据驱动的拉回几何,针对实值数据开发了黎曼版本的原型分析,旨在结合经典原型分析的可解释性与现代非线性模型的表达能力。我们引入了一类变形星形分布及其相关的拉回黎曼几何,以提供所得流形映射的统计解释,将黎曼原型映射(RAM)定义为投影到原型的测地凸组合流形上,并提出了基于凸松弛后接非凸细化的实用优化方案。我们进一步提出了一种学习方案,从数据中产生合理但通常次优的变形星形分布。在合成示例和MNIST上的实验表明,所提出的框架产生了有意义的测地线、有用的去噪投影和几何感知分类,同时也明确了当前优化限制所在。

英文摘要

Classical archetypal analysis is appealing for its interpretability, but its linear geometry can limit performance on data with strongly non-linear structure; at the same time, existing neural extensions improve flexibility while often weakening the geometric meaning of archetypes and interpolations. In this work, we develop a Riemannian version of archetypal analysis based on data-driven pullback geometry for real-valued data, with the goal of combining the interpretability of classical archetypal analysis with the expressive power of modern non-linear models. We introduce a class of deformed star distributions together with associated pullback Riemannian geometry to provide a statistical interpretation of the resulting manifold mappings, define the Riemannian archetypal mapping (RAM) as a projection onto the manifold of geodesically convex combinations of archetypes, and propose a practical optimization scheme based on convex relaxation followed by non-convex refinement. We further propose a learning scheme that yields reasonable, albeit generally suboptimal, deformed star distributions from data. Experiments on synthetic examples and MNIST show that the resulting framework produces meaningful geodesics, useful denoising projections, and geometry-aware classifications, while also clarifying where current optimization limitations remain.

2605.24106 2026-05-26 cs.LG cs.AI 版本更新

Overcoming "Physics Shock" in Earth Observation: A Heteroscedastic Uncertainty Framework for PINN-based Flood Inference

克服地球观测中的“物理冲击”:面向PINN洪水推断的异方差不确定性框架

Tewodros Syum Gebre, Jagrati Talreja, Matilda Anokye, Leila Hashemi-Beni

发表机构 * Built Environment Department, College of Science and Technology, North Carolina A&T State University(北卡罗来纳A&T州立大学科学与技术学院建筑环境系) United Nations University Institute for Water, Environment and Health(联合国大学水、环境与健康研究所)

AI总结 提出一种不确定性感知的物理信息神经网络框架,通过动态热身启动和异方差不确定性建模,解决遥感洪水映射中物理约束与噪声数据冲突导致的梯度发散问题,在Sen1Floods11数据集上IoU提升25%。

Comments This article is accepted in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

详情
AI中文摘要

从遥感数据(如合成孔径雷达SAR)中快速准确地绘制洪水范围对于灾害应急响应至关重要,但标准深度学习模型由于缺乏水文约束常产生物理上不可能的预测。尽管物理信息神经网络(PINNs)试图通过将控制定律直接嵌入损失函数来解决这一问题,但其在真实遥感数据上的应用经常失败。将刚性空间导数(如二维浅水方程)强加于试图拟合噪声SAR散斑的无条件潜在空间会导致灾难性的梯度发散,我们将这一现象称为“物理冲击”。本文提出了一种专门针对应用地球观测的新型不确定性感知PINN框架,以解决这一不稳定性。通过集成动态热身启动协议和通过负对数似然目标建模异方差偶然不确定性,网络学会在高传感器噪声区域动态放松物理约束,而在高置信度区域严格强制执行。在Sen1Floods11数据集上的评估表明,我们的概率注意力门控FNO-UNet成功稳定了多目标优化,与确定性基线相比,交并比(IoU)相对提高了25%。此外,通过深度集成,我们成功地将内在传感器噪声与分布外地形未知性分离开来,为运营机构提供了高度校准、物理一致的置信区间,用于稳健的灾害缓解和实时决策。

英文摘要

Rapid and accurate flood extent mapping from Remote Sensing data, such as Synthetic Aperture Radar (SAR), is critical for operational disaster response, but standard Deep Learning models often produce physically impossible predictions due to a lack of hydrological constraints. While Physics-Informed Neural Networks (PINNs) attempt to address this by embedding governing laws directly into the loss function, their application to real-world remote sensing data frequently fails. Enforcing rigid spatial derivatives (e.g., the 2D Shallow Water Equations) onto unconditioned latent spaces attempting to fit noisy SAR speckle causes catastrophic gradient divergence, a phenomenon we term Physics Shock. In this paper, we propose a novel Uncertainty-Aware PINN framework tailored specifically for applied Earth Observation that addresses this instability. By integrating a dynamic Warm-Start protocol and modeling heteroscedastic aleatoric uncertainty via a negative log-likelihood objective, the network learns to dynamically relax physical constraints in regions of high sensor noise while strictly enforcing them in high-confidence areas. Evaluated on the Sen1Floods11 dataset, our probabilistic Attention-Gated FNO-UNet successfully stabilizes multi-objective optimization, achieving a +25% relative improvement in Intersection over Union (IoU) compared to deterministic baselines. Furthermore, through Deep Ensembles, we successfully disentangle intrinsic sensor noise from out-of-distribution terrain ignorance, providing operational agencies with highly calibrated, physically consistent confidence bounds for robust disaster mitigation and real-time decision-making.
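How a negative log-likelihood objective can relax a constraint where noise is high is easiest to see on a single residual. The sketch below uses the generic Gaussian NLL with a predicted log-variance. Which residual it is applied to (data term, physics term, or both) and how the Warm-Start protocol schedules it are not stated in the abstract, so the function is an illustration under those assumptions rather than the paper's loss.

```python
import math

def gaussian_nll(residual, log_var):
    """Per-element Gaussian NLL up to an additive constant.

    The squared residual is scaled down by exp(-log_var), so a large
    predicted variance softens the penalty, while the + log_var term
    stops the model from inflating the variance everywhere.
    """
    return 0.5 * (math.exp(-log_var) * residual ** 2 + log_var)

residual = 2.0
strict = gaussian_nll(residual, 0.0)               # unit variance, full penalty
relaxed = gaussian_nll(residual, math.log(4.0))    # variance matched to residual^2
overinflated = gaussian_nll(residual, 3.0)         # variance larger than needed
```

For a fixed residual the loss is minimized when the predicted variance equals the squared residual, so noisy regions end up with a softened penalty and clean regions keep an almost unattenuated one.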

2605.24084 2026-05-26 cs.LG cs.AI cs.LO 版本更新

Verified SHAP: Provable Bounds for Exact Shapley Values of Neural Networks

Verified SHAP: 神经网络精确Shapley值的可证明界

David Boetius, Shahaf Bassan, Guy Katz, Stefan Leue, Tobias Sutter

发表机构 * University of Konstanz, Konstanz, Germany(康斯坦茨大学) Hebrew University of Jerusalem, Jerusalem, Israel(耶路撒冷希伯来大学) University of St.Gallen, St.Gallen, Switzerland(圣加仑大学)

AI总结 利用神经网络验证技术,提出一种计算SHAP值精确上下界的算法,可扩展到比现有精确方法大数个数量级的搜索空间。

Comments Accepted at ICML 2026. 34 pages, 13 figures

详情
AI中文摘要

Shapley加法解释(SHAP)被广泛认为对于神经网络在计算上是棘手的,因为它们在输入特征上诱导出指数搜索空间。在这项工作中,我们迈出了将精确SHAP计算扩展到更大搜索空间的第一步,引入了一种算法,该算法利用神经网络验证的最新进展来计算神经网络SHAP值的任意紧的精确下界和上界,最终恢复精确的SHAP值。我们证明了我们的方法可以扩展到比最先进的精确方法大数个数量级的搜索空间。这为精确SHAP计算提供了重要的第一步,并为在更大搜索空间上评估统计近似方法建立了原则性的基石。

英文摘要

Shapley additive explanations (SHAP) are widely recognised as computationally intractable for neural networks, since they induce an exponential search space over the input features. In this work, we take a first step towards scaling exact SHAP computation to larger search spaces by introducing an algorithm that leverages recent advances in neural network verification to compute arbitrarily tight exact lower and upper bounds on SHAP values for neural networks, ultimately recovering the exact SHAP values. We demonstrate that our approach scales to orders of magnitude larger search spaces than state-of-the-art exact methods. This provides an important first step towards exact SHAP computation and establishes a principled cornerstone for evaluating statistical approximation methods on larger search spaces.
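The exponential search space mentioned in this abstract is visible in the textbook definition of the Shapley value, which sums over every coalition of the other features. The sketch below enumerates that definition for a toy value function. It is not the verification-based bounding algorithm of the paper, and `value` here is a hypothetical stand-in for the expected model output given a feature subset.

```python
from itertools import combinations
from math import factorial

def exact_shapley(value, n_features):
    """Exact Shapley values by enumerating all 2^(n-1) coalitions per feature.

    `value` maps a frozenset of feature indices to a float.
    """
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(n_features):
            # Probability that exactly `size` other features precede i
            # in a uniformly random ordering.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi

def toy_value(coalition):
    """Hypothetical game: pays 1 only when features 0 and 1 are both present."""
    return 1.0 if {0, 1} <= coalition else 0.0

phi = exact_shapley(toy_value, 3)
```

The two interacting features split the payoff equally and the irrelevant third feature gets zero. The cost doubles with each added feature, which is why bounding methods become attractive for larger inputs.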

2605.24077 2026-05-26 eess.SP cs.LG 版本更新

LWM-CDE: A Representation Space for Wireless Data Reasoning and Transferability

LWM-CDE:无线数据推理与可迁移性的表示空间

Sadjad Alikhani, Akshay Malhotra, Shahab Hamidi-Rad, Ahmed Alkhateeb

发表机构 * School of Electrical, Computer and Energy Engineering, Arizona State University(电气、计算机与能源工程学院,亚利桑那州立大学) InterDigital, Inc.(InterDigital公司)

AI总结 提出基于预训练无线基础模型特征空间的数据集相似性框架LWM-CDE,通过对比学习和几何形状损失微调数据集嵌入,构建距离可靠指示可迁移性的结构化流形,在无线基准测试中比现有指标更高效且与经验迁移性能相关性更强。

Comments The model and relevant scripts are available on the WILab Hugging Face page: https://huggingface.co/wi-lab

详情
AI中文摘要

机器学习在真实世界无线通信任务中的部署面临显著的泛化挑战,原因包括信号结构对位置和环境的依赖性、不同部署场景下数据的高度多样性以及真实世界数据的有限可用性。当前用于评估训练分布与推理(部署)分布之间数据相似性以及模型可迁移性的方法存在计算成本高和性能不一致的问题,导致关键的模型部署和模型生命周期管理决策缺乏原则性基础。为了解决这一问题,我们引入了一个基于预训练无线基础模型特征空间的数据集相似性框架。我们的方法LWM-CDE(对比学习数据集嵌入)通过结合对比损失和几何形状损失对基础模型的数据集嵌入进行微调,构建了一个结构化流形,其中距离可靠地指示可迁移性。在无线基准测试上的大量实验表明,LWM-CDE与经验迁移性能的相关性比现有指标更强,同时计算效率更高。学习到的表示空间支持更有效且数据高效的决策,例如源数据集选择、标签感知增强和预算预训练,展示了其在各种无线通信应用中的广泛实用性。

英文摘要

Machine learning deployments in real-world wireless communication tasks face significant generalization challenges due to location and environment-specific signal structure, high diversity in data across different deployments, and limited availability of real-world data. Current approaches for assessing data similarity between training and inference (deployment) distributions, as well as evaluating model transferability, suffer from high computational costs and inconsistent performance, leaving critical model deployment and model life cycle management decisions without a principled foundation. To address this, we introduce a dataset similarity framework built upon the feature space of a pretrained wireless foundation model. Our method, LWM-CDE (Contrastive learning of Dataset Embedding), fine-tunes the dataset embeddings of the foundation model using a combination of contrastive and geometry-shaping losses, creating a structured manifold where distance reliably indicates transferability. Extensive experiments on wireless benchmarks show that LWM-CDE achieves stronger correlation with empirical transfer performance than existing metrics while being more computationally efficient. The learned representation space supports more effective and data-efficient decision-making for tasks like source dataset selection, label-aware augmentation, and budgeted pretraining, demonstrating its broader utility across different wireless communication applications.

2605.24076 2026-05-26 stat.ML cs.LG 版本更新

Causality as the Statistical Conscience of Artificial Intelligence: From Pearl's Ladder to Trustworthy Machines

因果关系作为人工智能的统计良知:从珀尔的阶梯到可信机器

Ernest Fokoué

发表机构 * School of Mathematics and Statistics, College of Science(数学与统计学学院,科学学院)

AI总结 本文论证因果推断是AI不可或缺的统计良知,通过统计必要性定理、统一因果统计估计框架以及三种AI失败模式的因果盲区分析,提出可信AI本质上是因果统计问题。

Comments 18 pages, 4 figures, 1 table

详情
AI中文摘要

现代人工智能通过优化大规模语料库上的统计风险函数实现了卓越的预测能力。然而,这与真正的智能之间存在差距:无法区分相关性与因果关系。本文认为,因果推断(识别干预下不变的机制)是人工智能不可或缺的统计良知。没有因果基础,AI系统只是相关机器:在熟悉领域强大,在分布偏移下脆弱,在高风险场景中存在偏见。三个贡献发展了这一论点。首先,因果泛化的统计必要性定理:任何实现分布外泛化的算法必须编码因果结构,形式化了预测P(Y|X)与智能P(Y|do(X))之间的区别。其次,一个统一框架将珀尔的do演算、潜在结果框架、双机器学习以及不变风险最小化连接为一系列因果统计估计量,每个估计量在不同假设下识别干预分布。第三,三种AI失败模式(大语言模型中的幻觉、基于人类反馈的强化学习中的奖励黑客以及分布偏移下的退化)是因果盲区的表现,每种都有原则性的统计补救措施。可信AI的核心是一个因果统计问题。统计界不仅有能力解决它,而且是唯一拥有严格解决所需基础工具的群体。

英文摘要

Modern Artificial Intelligence achieves remarkable predictive power by optimizing statistical risk functionals over vast corpora. Yet a gap separates this from genuine intelligence: the inability to distinguish correlation from causation. This paper argues that causal inference (identifying mechanisms invariant under intervention) is AI's indispensable statistical conscience. Without causal grounding, AI systems are correlation machines: powerful in familiar domains, brittle under distribution shift, and biased in high-stakes settings. Three contributions develop this argument. First, a Statistical Necessity Theorem for Causal Generalization: any algorithm achieving out-of-distribution generalization must encode causal structure, formalizing the distinction between prediction P(Y|X) and intelligence P(Y|do(X)). Second, a unified framework connects Pearl's do-calculus, the Potential Outcomes framework, Double Machine Learning, and Invariant Risk Minimization as a family of Causal Statistical Estimators, each identifying interventional distributions under different assumptions. Third, three AI failure modes (hallucination in large language models, reward hacking in reinforcement learning from human feedback, and degradation under distribution shift) are manifestations of causal blindness, each admitting a principled statistical remedy. Trustworthy AI is, at its core, a problem of causal statistics. The statistical community is not merely equipped to solve it -- it is the only community with the foundational tools to do so rigorously.
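The distinction between P(Y|X) and P(Y|do(X)) can be shown on a three-variable example with a binary confounder Z that drives both X and Y. The sketch below uses the standard backdoor adjustment formula. The probability tables are made-up numbers chosen so that X has no causal effect on Y at all, and none of the names come from the paper.

```python
def observational(p_z, p_x1_given_z, p_y1_given_xz, x):
    """P(Y=1 | X=x): conditioning reweights Z toward values that favor X=x."""
    def p_x_given_z(z):
        return p_x1_given_z[z] if x == 1 else 1.0 - p_x1_given_z[z]
    p_x = sum(p_z[z] * p_x_given_z(z) for z in p_z)
    joint = sum(p_y1_given_xz[(x, z)] * p_z[z] * p_x_given_z(z) for z in p_z)
    return joint / p_x

def interventional(p_z, p_y1_given_xz, x):
    """P(Y=1 | do(X=x)) by backdoor adjustment: Z keeps its marginal law."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)

# Hypothetical tables: Y depends on Z only, and Z also makes X=1 more likely.
p_z = {0: 0.5, 1: 0.5}
p_x1_given_z = {0: 0.2, 1: 0.8}
p_y1_given_xz = {(0, 0): 0.1, (1, 0): 0.1, (0, 1): 0.9, (1, 1): 0.9}

obs_gap = (observational(p_z, p_x1_given_z, p_y1_given_xz, 1)
           - observational(p_z, p_x1_given_z, p_y1_given_xz, 0))
do_gap = (interventional(p_z, p_y1_given_xz, 1)
          - interventional(p_z, p_y1_given_xz, 0))
```

The observational gap is large even though intervening on X changes nothing. A predictor trained on P(Y|X) would therefore fail as soon as the link between Z and X shifts.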

2605.24073 2026-05-26 physics.chem-ph cs.LG 版本更新

Multitask learning with semiempirical orbital charges enables sample-efficient MLIPs

基于半经验轨道电荷的多任务学习实现样本高效的机器学习原子间势

Ihor Neporozhnii, Sjoerd Hoogland, Oleksandr Voznyy

发表机构 * Department of Physical and Environmental Sciences(物理与环境科学系) Department of Electrical and Computer Engineering(电气与计算机工程系) The Alliance for AI-Accelerated Materials Discovery(人工智能加速材料发现联盟)

AI总结 提出利用半经验轨道电荷进行多任务学习,通过等变模型预测轨道电荷,显著提升机器学习原子间势的样本效率和精度,减少46%能量误差并降低五倍数据需求。

Comments 16 pages, 6 figures

详情
AI中文摘要

机器学习原子间势(MLIPs)需要生成计算昂贵的大规模训练数据集,以准确模拟材料和分子。利用多任务学习结合电子结构信息可提高样本效率,然而,在全哈密顿矩阵(随原子数二次缩放)上训练对于大数据集是难以处理的。在这项工作中,我们表明利用轨道分辨的半经验电荷进行多任务学习显著提高了MLIPs的样本效率和精度。为了高效预测轨道电荷,我们实现了一个专门的等变模型,与不变基线相比降低了电荷预测误差。通过使用计算成本低、随原子数线性缩放的GFN1-xTB轨道电荷增强训练,我们的模型实现了能量平均绝对误差降低46%,并且仅需五分之一的数据即可达到仅能量模型的性能。此外,我们的方法优于在昂贵的密度泛函理论(DFT)原子电荷上训练的模型,捕捉了轨道分辨的电子复杂性,并迫使网络学习一个物理准确的潜在空间,该空间根据共享化学性质自发地对金属进行聚类。由于轨道电荷仅在训练期间需要,这种方法保持了推理效率,为开发复杂化学系统的准确、数据高效的基础模型提供了可扩展的方案。

英文摘要

Machine learning interatomic potentials (MLIPs) require generating computationally expensive, large-scale training datasets to accurately simulate materials and molecules. Incorporating electronic structure information using multitask learning improves sample efficiency; however, training on full Hamiltonian matrices, which scale quadratically with the number of atoms, is intractable for large datasets. In this work, we show that multitask learning utilizing orbitally resolved semiempirical charges significantly improves sample efficiency and accuracy in MLIPs. To efficiently predict orbital charges, we implement a specialized equivariant model, reducing charge prediction error compared to an invariant baseline. By augmenting training with computationally inexpensive GFN1-xTB orbital charges, which scale linearly with the number of atoms, our model achieves a 46% reduction in energy mean absolute error and requires five times less data to match the performance of energy-only models. Furthermore, our approach outperforms models trained on expensive density functional theory (DFT) atomic charges, capturing orbitally resolved electronic complexity and forcing the network to learn a physically accurate latent space that spontaneously clusters metals by shared chemical properties. Because orbital charges are only required during training, this approach preserves inference efficiency, providing a scalable recipe for developing accurate, data-efficient foundation models for complex chemical systems.

2605.24072 2026-05-26 stat.ML cs.LG math.PR 版本更新

Optimal Non-Asymptotic Edgeworth Expansions for Multivariate Neural Network Outputs

多元神经网络输出的最优非渐近 Edgeworth 展开

Lucia Celli

发表机构 * Department of Mathematics, University of Luxembourg(卢森堡大学数学系)

AI总结 针对有限宽度全连接神经网络输出,利用任意阶 Edgeworth 展开逼近其与高斯极限的偏差,并给出总变差距离的上下界。

Comments 34 pages, 2 figures

详情
AI中文摘要

具有高斯初始化权重的有限宽度全连接神经网络偏离其无限宽度高斯极限,表现出非消失的高阶累积量。我们针对在有限个输入上评估的神经网络,使用任意阶 $4m-1$($m\in\mathbb{N}$)的多维 Edgeworth 展开来逼近这些偏差。假设相应的高斯极限具有可逆协方差矩阵且激活函数为多项式有界,我们在真实网络输出分布与其 Edgeworth 逼近之间的总变差距离上建立了 $n^{-m}$ 阶的界,并给出了匹配的下界。作为一个应用,我们量化了当先验被其 Edgeworth 展开替代时贝叶斯后验分布的误差。我们的结果更具一般性,也适用于收敛到具有可逆协方差的高斯向量的条件高斯向量序列。

英文摘要

Finite-width fully connected neural networks with Gaussian-initialized weights deviate from their infinite-width Gaussian limit, exhibiting non-vanishing higher-order cumulants. We approximate these deviations, for a neural network evaluated at a finite number of inputs, using multidimensional Edgeworth expansions of arbitrary order $4m-1$, with $m\in\mathbb{N}$. Assuming that the corresponding Gaussian limit has an invertible covariance matrix and that the activation function is polynomially bounded, we establish a bound of order $n^{-m}$ on the total variation distance between the law of the true network output and its Edgeworth approximation, with matching lower bounds. As an application, we quantify the error in Bayesian posterior distributions when the prior is replaced by its Edgeworth expansion. Our results are more general and also apply to sequences of conditionally Gaussian vectors converging to a Gaussian vector with invertible covariance.

2605.24067 2026-05-26 physics.ao-ph cs.LG 版本更新

Seeing Inside the Storm: Improving Nowcasting by Integrating Meteorological Drivers

洞察风暴内部:通过整合气象驱动因子改进临近预报

Minghui Qiu, Jun Chen, Lin Chen, Weifeng Chen, Shuxin Zhong, Zhidan Liu, Yu Zhang, Kaishun Wu

发表机构 * Guangzhou Meteorological Observatory(广州气象局)

AI总结 提出MeteoLogist框架,通过物理定制的编码器、时间相位对齐器和跨场空间聚合器,整合热力学、动力学和微物理驱动因子,实现风暴生命周期的完整建模,显著提升临近预报性能。

详情
AI中文摘要

大多数基于雷达反射率的临近预报系统关注当前降水,忽略了大气前兆(如低层辐合、湍流涡旋和潜热加热),而这些前兆为预见风暴诞生提供了短暂窗口。我们提出了MeteoLogist,一个受物理启发的雷达智能框架,模拟从风暴前兆到组织化演变的完整对流生命周期。然而,利用这些前兆并非易事:它们源自多个气象驱动因子(热力学、动力学和微物理),这些因子异步演化(C1)且在空间上分散(C2)。为此,MeteoLogist设计了三个紧密集成的组件。物理定制编码器根据雷达回波的内在物理尺度和语义进行处理,形成热力学、动力学和微物理流,捕捉不同的动力机制。时间相位对齐器通过利用因果时间注意力来捕捉不同驱动因子何时以及如何相互作用和激活,从而解决C1。跨场空间聚合器通过跨区域融合,对齐相邻单元中微弱且分散的前兆,以暴露上游触发因素并强制空间一致性,从而解决C2。在3D-NEXRAD(2020-2022,全美范围)上的评估显示,MeteoLogist在高影响检测(CSI40)上比强基线提升了+9.7%,并在风暴发展阶段实现了37.67%的显著增益,展示了在风暴出现之前感知它们的真正预见能力。代码可在补充材料中找到。

英文摘要

Most nowcasting systems, built on radar reflectivity, focus on current precipitation, ignoring the atmospheric precursors -- such as low-level convergence, turbulent eddies, and latent heating -- that offer a fleeting window to foresee storm birth. We introduce MeteoLogist, a physics-inspired radar intelligence framework that models the full life cycle of convection -- from its precursors to organized storm evolution. However, exploiting these precursors is non-trivial: they originate from multiple meteorological drivers -- thermodynamic, kinematic, and microphysical -- that evolve asynchronously (C1) and remain spatially fragmented (C2). To this end, MeteoLogist designs three tightly integrated components. The Physics-Tailored Encoders process radar echoes according to their intrinsic physical scales and semantics, forming thermodynamic, kinematic, and microphysical streams that capture distinct dynamical regimes. The Temporal-Phase Aligner addresses C1 by leveraging causal temporal attention to capture when and how different drivers interact and activate. The Cross-Field Spatial Aggregator addresses C2 through cross-regional fusion, aligning weak and scattered precursors across neighboring cells to expose upstream triggers and enforce spatial coherence. Evaluated on 3D-NEXRAD (2020--2022, US-wide), MeteoLogist boosts high-impact detection (CSI40) by +9.7% over strong baselines, and achieves a remarkable 37.67% gain during the storm-developing stage -- demonstrating true foresight in sensing storms before they appear. The code can be found in the supplementary material.
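The CSI figure quoted in this abstract refers to the critical success index, a standard categorical score computed from hits, misses, and false alarms at a threshold. A minimal sketch follows. Reading CSI40 as the score at a 40 dBZ reflectivity threshold is an assumption, and the sample values are invented.

```python
def critical_success_index(forecast, observed, threshold):
    """CSI = hits / (hits + misses + false alarms) for events >= threshold."""
    hits = misses = false_alarms = 0
    for f, o in zip(forecast, observed):
        forecast_event = f >= threshold
        observed_event = o >= threshold
        if forecast_event and observed_event:
            hits += 1
        elif observed_event:
            misses += 1
        elif forecast_event:
            false_alarms += 1
        # Correct negatives are ignored, so rare events are not diluted.
    denominator = hits + misses + false_alarms
    return hits / denominator if denominator else float("nan")

# Hypothetical reflectivity values in dBZ for four grid cells.
forecast = [45.0, 30.0, 42.0, 10.0]
observed = [41.0, 44.0, 20.0, 5.0]
csi40 = critical_success_index(forecast, observed, 40.0)
```

Because correct negatives never enter the score, a model cannot raise CSI by predicting calm weather everywhere, which makes it a common choice for high-impact thresholds.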

2605.24064 2026-05-26 cs.LG cs.AI 版本更新

Generative Representation Learning on Hyper-relational Knowledge Graphs via Masked Discrete Diffusion

超关系知识图谱上的生成式表示学习:基于掩码离散扩散

Jaejun Lee, Seheon Kim, Joyce Jiyoung Whang

发表机构 * School of Computing(计算学院) Department of AI Computing, KAIST, Daejeon, South Korea(人工智能计算系,韩国科学技术院,大田,韩国)

AI总结 针对超关系知识图谱中任意掩码查询的补全与事实生成任务,提出基于掩码离散扩散的生成式表示学习方法KREPE,统一链接预测与事实生成,性能达到最优。

Comments 28 pages, 16 figures, 18 tables, 43rd International Conference on Machine Learning (ICML 2026)

详情
AI中文摘要

超关系知识图谱(HKG)能有效表示复杂事实。在HKG中推断新知识是一个关键问题,但现有方法将其视为简单的链接预测,假设事实中几乎所有实体和关系已知,仅留单个空白待填充。然而,这种受限假设在现实场景中可能不成立,因为事实的多个甚至全部组成成分可能同时缺失。为弥补这一差距,我们引入一个称为事实生成的任务:从任意掩码查询生成有效超关系事实,即补全部分观察到的事实或从头生成事实。我们提出KREPE,这是首个用于HKG的生成式表示学习方法,通过掩码离散扩散学习以局部事实成分和HKG全局结构为条件的缺失成分概率分布。KREPE通过上下文消息传递建模事实内依赖,并通过聚合随机采样上下文建模事实间关联。KREPE在单一训练框架内无缝统一链接预测与事实生成,在标准HKG链接预测基准上达到最先进性能,并在生成新颖且正确事实方面超越基于LLM的基线方法。

英文摘要

Hyper-relational knowledge graphs (HKGs) effectively represent complex facts. While inferring new knowledge in HKGs is a critical problem, current methods cast it as simple link prediction, assuming that nearly all entities and relations within a fact are known, leaving only a single blank to be filled. However, this restrictive assumption may not hold in real-world scenarios in which multiple, or even all, constituent components of a fact may be missing simultaneously. To bridge this gap, we introduce a task called fact generation: generating a valid hyper-relational fact from an arbitrarily masked query, i.e., completing a partially observed fact or generating a fact from scratch. We propose KREPE, the first generative representation learning method for HKGs that learns to model the probability distributions of missing components conditioned on the local fact components and global structure of HKGs via masked discrete diffusion. KREPE models both the intra-fact dependencies by contextual message passing and inter-fact correlations by aggregating stochastically sampled contexts. KREPE seamlessly unifies link prediction and fact generation within a single training framework, achieving state-of-the-art performance on standard HKG link prediction benchmarks and outperforming LLM-based baselines in generating novel and correct facts.
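The notion of an arbitrarily masked query can be illustrated by the corruption step of a masked discrete diffusion process: each component of a fact is independently replaced by a mask token with some probability. The sketch below shows only that step on a flattened fact. The flattening order, the mask token, and the example fact are hypothetical, and the denoising model that KREPE trains is not represented.

```python
import random

MASK = "[MASK]"

def flatten_fact(head, relation, tail, qualifiers):
    """Lay out a hyper-relational fact as a flat token sequence."""
    tokens = [head, relation, tail]
    for qualifier_relation, qualifier_value in qualifiers:
        tokens += [qualifier_relation, qualifier_value]
    return tokens

def mask_tokens(tokens, mask_prob, rng):
    """Independently replace each token by MASK with probability mask_prob."""
    return [MASK if rng.random() < mask_prob else token for token in tokens]

# Hypothetical fact with one qualifier pair.
fact = flatten_fact("Marie Curie", "award received", "Nobel Prize in Physics",
                    [("point in time", "1903")])
rng = random.Random(0)
link_prediction_like = mask_tokens(fact, 0.2, rng)   # few blanks
from_scratch = mask_tokens(fact, 1.0, rng)           # everything masked
```

A mask probability of 1 yields the generate-from-scratch query, and small probabilities yield queries close to ordinary link prediction, so a single corruption process covers both tasks named in the abstract.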

2605.24062 2026-05-26 cs.LG cs.AI 版本更新

Federated Learning over Human-Body Communication for On-Body Edge Intelligence: A Survey, Taxonomy, and BODYFED-HBC Scheduling Vignette

基于人体通信的联邦学习用于体表边缘智能:综述、分类法与BODYFED-HBC调度示例

Koffka Khan

发表机构 * Department of Computing and Information Technology(计算与信息技术系) The University of the West Indies(西印度大学)

AI总结 本文综述了人体通信与联邦学习在可穿戴设备中的交叉领域,提出了一种区分体内、体中心、跨用户和临床云联邦学习部署的分类法,并引入BODYFED-HBC参考架构和调度算法以解决体信道感知的联邦学习问题。

详情
AI中文摘要

人体通信(HBC)是一种有前景的可穿戴体域网络物理层,因为它可以将通信局限在身体周围,并减轻传统无线电链路的负担。联邦学习(FL)是一种有前景的学习层,因为它可以减少生理和行为传感的原始数据集中化。然而,这两类文献之间的联系仍然薄弱:用于可穿戴设备的FL通常抽象通信层,而HBC研究通常抽象学习和模型更新流量。本文综述了HBC、无线体域网络、可穿戴FL、身体互联网隐私和边缘智能优化的交叉领域。我们提出了一种分类法,区分了体内、体中心、跨用户和临床云FL部署,并识别了体信道感知FL这一开放问题:即客户端选择、更新压缩和聚合由姿态相关的HBC链路、剩余能量、传感器内存和隐私风险控制的学习协议。为了使研究议程具体化,我们引入了BODYFED-HBC作为参考架构,并提供了优化公式和调度算法。我们进一步指定了一个可复现的模拟示例,该示例结合了公共可穿戴数据集和经验性的体耦合通信信号损耗模型。文章最后为工作在硬件层之上的计算机科学家提供了开放数据集、评估指标、局限性和研究方向。

英文摘要

Human-body communication (HBC) is a promising physical substrate for wearable body-area networks because it can localize communication around the body and reduce the burden of conventional radio links. Federated learning (FL) is a promising learning substrate because it can reduce raw-data centralization for physiological and behavioral sensing. Yet these two literatures remain weakly connected: FL for wearables usually abstracts the communication layer, whereas HBC research usually abstracts learning and model-update traffic. This article surveys the intersection of HBC, wireless body-area networks, wearable FL, Internet-of-Bodies privacy, and edge-intelligence optimization. We propose a taxonomy that distinguishes intra-body, body-hub, cross-user, and clinical-cloud FL deployments, and we identify the open problem of body-channel-aware FL: learning protocols whose client selection, update compression, and aggregation are controlled by posture-dependent HBC links, residual energy, sensor memory, and privacy risk. To make the research agenda concrete, we introduce BODYFED-HBC as a reference architecture and provide an optimization formulation and scheduling algorithm. We further specify a reproducible simulation vignette that combines public wearable datasets with empirical body-coupled-communication signal-loss models. The article concludes with open datasets, evaluation metrics, limitations, and research directions for computer scientists working above the hardware layer.

2605.24058 2026-05-26 cs.LG cs.AI 版本更新

Signs Beat Floats: Low-Rank Double-Binary Adaptation for On-Device Fine-Tuning

符号胜过浮点:面向设备端微调的低秩双二值适配器

Yoshihiko Fujisawa, Yuma Ichikawa, Yudai Fujimoto, Akira Sakai, Katsuki Fujisawa

发表机构 * Fujitsu Limited(富士通株式会社) Institute of Science Tokyo(东京科学大学) RIKEN Center for AIP(理化学研究所革新智能统合研究中心) Tokai University(东海大学)

AI总结 提出LoRDBA,一种用二值符号载波和通道级缩放替代低秩因子的适配器,在保持LoRA兼容性的同时显著降低存储和计算开销,并在设备端微调中匹配或超越低比特基线性能。

Comments 34 pages, 3 figures

详情
AI中文摘要

大型语言模型的设备端适配通常保持量化基模型冻结,同时训练和部署一个小型任务特定的LoRA适配器。然而,在未合并的适配器模式下,适配器不仅仅是一个紧凑的存储模块;它引入了一个额外的密集浮点分支,维护可训练状态以进行本地更新,并充当通信和热交换单元。我们提出LoRDBA,一种LoRA兼容的适配器,它将两个低秩因子替换为二值符号载波,同时通过轻量级的通道级缩放表示幅度,将密集适配器分支转换为两个符号累积矩阵乘法,中间穿插通道级缩放。有限样本分析表明,重建质量由原始LoRA因子的残差与幅度之比决定。在适配器模式实验中,LoRDBA在匹配模型大小的情况下优于低比特基线,并在某些场景下匹配fp16 LoRA的质量。尽管适配器占用减少了超过10倍,未合并的适配器在匹配秩r=16时最多引入8%的预填充延迟开销,训练内存开销约为fp16 LoRA的1.6倍。

英文摘要

On-device adaptation of large language models commonly keeps a quantized base model frozen while training and deploying a small, task-specific LoRA adapter. In the unmerged adapter-mode setting, however, the adapter is more than a compact storage module; it introduces an additional dense floating-point branch, maintains a trainable state for local updates, and acts as a unit of communication and hot-swapping. We introduce LoRDBA, a LoRA-compatible adapter that replaces both low-rank factors with binary sign carriers while representing magnitudes through lightweight, channel-wise scales, converting the dense adapter branch into two sign-accumulation matrix multiplications interleaved with channel-wise scaling. A finite-sample analysis shows that reconstruction quality is governed by the residual-to-magnitude ratio of the original LoRA factors. In adapter-mode experiments, LoRDBA outperforms low-bit baselines at matched model sizes while matching fp16 LoRA quality in selected regimes. The unmerged adapter incurs at most 8% prefill latency overhead at matched rank r=16 despite an over 10x reduction in adapter footprint, with moderate training memory overhead of approximately 1.6x that of fp16 LoRA.
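The structure described here, two sign-only matrix products interleaved with channel-wise scaling, can be sketched on plain lists. The choice of one scale per output row set to the mean absolute value is an assumption borrowed from common binarization practice, not something the abstract specifies, and all function names are hypothetical.

```python
def binarize_rows(matrix):
    """Split a matrix into +/-1 signs and one magnitude scale per row."""
    signs = [[1 if v >= 0 else -1 for v in row] for row in matrix]
    scales = [sum(abs(v) for v in row) / len(row) for row in matrix]
    return signs, scales

def sign_matvec(signs, x):
    """Matrix-vector product that only adds or subtracts inputs."""
    return [sum(xj if s > 0 else -xj for s, xj in zip(row, x)) for row in signs]

def double_binary_adapter(a, b, x):
    """Approximate the LoRA branch B(Ax) with sign carriers and row scales."""
    a_signs, a_scales = binarize_rows(a)
    b_signs, b_scales = binarize_rows(b)
    hidden = [s * h for s, h in zip(a_scales, sign_matvec(a_signs, x))]
    return [s * o for s, o in zip(b_scales, sign_matvec(b_signs, hidden))]

# Hypothetical rank-2 factors whose rows have constant magnitude, so the
# binarized branch reproduces the dense branch exactly in this special case.
a = [[0.5, -0.5, 0.5], [-2.0, -2.0, 2.0]]   # down-projection, 2 x 3
b = [[1.0, -1.0], [3.0, 3.0]]               # up-projection, 2 x 2
x = [1.0, 2.0, 3.0]
delta = double_binary_adapter(a, b, x)
```

When the magnitudes within a row vary, the scaled signs only approximate that row. This matches the abstract's remark that reconstruction quality depends on the residual-to-magnitude ratio of the original factors.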

2605.24057 2026-05-26 cs.LG cs.AI 版本更新

Feature Lottery? A Bifurcation Theory of Concept Emergence

特征彩票?概念涌现的分岔理论

Fuming Yang

发表机构 * MIT(麻省理工学院)

AI总结 提出一种基于分岔理论的方法,通过损失Hessian驱动的超临界叉形分岔检测表示动力学中的结构涌现,并引入无标签相位坐标β/β_c,在多种设置下验证了四个不同的转变阶段,揭示了特征可解释性的早期可预测性。

Abstract

Neural networks acquire structured representations at specific moments during training, yet identifying these transitions typically relies on retrospective, label-dependent metrics. We introduce a bifurcation theory of representation dynamics to detect these moments in real time. Analyzing a passive GMM probe attached to the evolving encoder, we show the onset of structure corresponds to a supercritical pitchfork bifurcation driven by the loss Hessian. The system exhibits a theoretically predictable zero-crossing ($\beta_c$) that, compared to the network's current state ($\beta$), yields a dynamic ratio $\beta(t)/\beta_c(t)$: a universal, label-free phase coordinate for representation dynamics, computable entirely from hidden states. We empirically validate four distinct transition regimes predicted by this coordinate across diverse settings: SAEs on language models (Pythia), SSL (CIFAR), and grokking (modular arithmetic). Crucially, under finite dissipation, macroscopic symmetry-breaking can lag the initial zero-crossing by orders of magnitude, providing a rigorous dynamical account of the delayed escape observed in grokking. Microscopically, the bifurcation creates a shared unstable subspace, forcing collective symmetry breaking. We term this the "feature lottery" in SAE training: a feature's terminal interpretability becomes predictable remarkably early. By only 5% of training, early atom purity robustly predicts final convergence purity, with top-decile early atoms achieving over 12x the baseline purity at convergence. Beyond explaining concept emergence, $\beta/\beta_c$ provides a practical early-warning indicator for training health, detecting the onset of usable structure, the crystallization of feature identity, and representational collapse epochs before downstream metrics react.
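For readers unfamiliar with the term, a supercritical pitchfork bifurcation has the textbook normal form dx/dt = mu * x - x^3: below the threshold (mu < 0) the only stable state is x = 0, and above it (mu > 0) two symmetric stable states appear at plus or minus sqrt(mu). The sketch below integrates this normal form with a simple Euler scheme. Reading mu as something like beta/beta_c - 1 is an assumption made here for illustration only; the abstract does not give the paper's equations.

```python
def settle(mu, x0=1e-3, dt=0.01, steps=20000):
    """Integrate dx/dt = mu*x - x**3 with forward Euler and return the final x.

    mu is the control parameter of the textbook normal form; its relation to
    the paper's beta/beta_c ratio is an assumption of this sketch.
    """
    x = x0
    for _ in range(steps):
        x += dt * (mu * x - x ** 3)
    return x


below = settle(-0.5)   # symmetric state x = 0 stays stable
above = settle(0.25)   # symmetry breaks; x moves to sqrt(0.25) = 0.5
```

Starting from a tiny perturbation, the state stays near zero for a while before escaping when mu is small and positive, which gives some intuition for why visible symmetry breaking can lag the zero-crossing.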

2605.24055 2026-05-26 cs.LG cs.AI Updated version

Cascade-KDE: Robust Time-Series Restoration under Out-of-Distribution Impulse Corruptions

Yuefeng Liu, Ning Yang, Ziyu Yang

Affiliations: School of Digital and Intelligent Industry (School of Cyber Science and Technology), Inner Mongolia University of Science and Technology

AI summary: Proposes Cascade-KDE, a training-free framework that combines two-dimensional density estimation, a density-truncated robust expectation, and an exponential cascade with adaptive stopping to robustly restore time series corrupted by Gaussian noise and impulse outliers while preserving local structure.

Abstract

Real-world time-series data in industrial sensing, healthcare, and energy systems is often corrupted by a mixture of Gaussian noise and occasional large-magnitude impulse outliers. For tasks that depend on local shape, such as ECG morphology analysis and battery degradation monitoring, the main requirement is not only low reconstruction error but also preservation of derivative peaks and task-critical features. We propose Cascade-KDE, a training-free restoration framework for corrupted time series. The method first estimates a two-dimensional temporal-amplitude density, then applies a Density-Truncated Robust Expectation to limit the influence of distant abnormal points, and finally refines the sequence through an exponential cascade with adaptive stopping. This design aims to improve robustness under out-of-distribution impulse corruptions while keeping the restored trajectory close to the original local structure. Across several benchmark datasets, the proposed method shows consistent gains over classical filters and representative learning-based baselines on curve fidelity, derivative preservation, downstream classification, and runtime efficiency. These results suggest that bounded density-based restoration is a practical option for feature-preserving preprocessing in noisy time-series pipelines.

2605.24053 2026-05-26 cs.AI cs.CL cs.LG Updated version

Breaking the Chains of Probability: Neutrosophic Logic as a New Framework for Epistemic Uncertainty in Large Language Models

Maikel Yelandi Leyva-Vázquez, Florentin Smarandache

Affiliations: Universidad Bolivariana del Ecuador, Coordinación Académica de Posgrado; Universidad de Guayaquil; Universidad Bernardo O’Higgins; Mathematics, Physics, and Natural Sciences Division, University of New Mexico

AI summary: Proposes neutrosophic logic, with Truth, Indeterminacy, and Falsity as three independent dimensions, as an alternative to conventional probabilistic frameworks. Experiments indicate that it represents an LLM's internal state more richly, with hyper-truth states emerging spontaneously in 35% of evaluations; the authors present it as a key step toward transparent, reliable, and ethically aware AI systems.

Comments: Published in Neutrosophic Sets and Systems, Vol. 99 (2026). Author's preprint version. Open code and data available at: github.com/mleyvaz/neutrosophic-llm-logic

Journal ref: Neutrosophic Sets and Systems, Vol. 99, 2026

Abstract

Large Language Models (LLMs) are predominantly governed by probabilistic frameworks in which the sum of outcome probabilities is constrained to unity. This architectural limitation, often imposed by Softmax layers, leads to a collapse of uncertainty that makes it difficult to differentiate between epistemic uncertainty, paradox, and vagueness. We present an empirical investigation of the application of Neutrosophic Logic, a framework that treats Truth (T), Indeterminacy (I), and Falsity (F) as three independent dimensions, to model epistemic states in LLMs. We conducted experiments on a family of four OpenAI GPT models across five linguistic phenomena: logical paradoxes, epistemic ignorance, vagueness, ethical contradictions, and future contingencies, under three prompting strategies: neutrosophic, probabilistic, and entropy-derived. Our findings reveal that the neutrosophic approach, by allowing T+I+F > 1, a state we term hyper-truth, provides a richer representation of a model's internal state. In 35% of evaluations, hyper-truth emerged spontaneously, predominantly under ethical contradiction and logical paradox. We demonstrate that this approach preserves truth values in fuzzy contexts and offers a robust method for identifying and quantifying internal model conflict. We conclude that the integration of neutrosophic evaluation layers is a critical step toward more transparent, reliable, and ethically aware AI systems.
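A small sketch can make the hyper-truth condition concrete. It assumes each of T, I, and F is a score in [0, 1] that a model reports independently, so the three need not sum to 1. The class and method names are hypothetical and are not taken from the authors' repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NeutrosophicValue:
    """A (T, I, F) triple whose components are independent scores in [0, 1]."""
    truth: float
    indeterminacy: float
    falsity: float

    def __post_init__(self):
        for v in (self.truth, self.indeterminacy, self.falsity):
            if not 0.0 <= v <= 1.0:
                raise ValueError("each component must lie in [0, 1]")

    @property
    def total(self):
        return self.truth + self.indeterminacy + self.falsity

    def is_hyper_truth(self):
        # Hyper-truth as described in the abstract: T + I + F > 1.
        return self.total > 1.0


dilemma = NeutrosophicValue(0.8, 0.3, 0.6)      # conflicting evidence
plain = NeutrosophicValue(0.5, 0.25, 0.25)      # behaves like a probability
```

A probabilistic prompt forces the second shape on every answer, whereas the first triple keeps strong support for both truth and falsity visible, which is the kind of internal conflict the abstract says the neutrosophic strategy exposes.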

2605.24052 2026-05-26 cs.LG cs.AI Updated version

Truthful Online Preference Aggregation for LLM Fine-Tuning in Mobile Crowdsourcing

Shugang Hao, Lingjie Duan

Affiliations: Singapore University of Technology and Design; Hong Kong University of Science and Technology

AI summary: Addresses workers in mobile crowdsourcing who may strategically misreport preference feedback, proposing a dynamic Bayesian game model and an online weighted aggregation mechanism that ensures truthful feedback and achieves sublinear regret.

Abstract

To better serve users' demands in mobile applications (e.g., navigation), mobile crowdsourcing platforms can iteratively align large language model (LLM)-generated content (e.g., AI-generated traffic condition predictions) with human feedback collected from crowdsourcing workers (e.g., mobile users). However, workers may strategically misreport their online preference feedback to maximize their influence or payment. Existing pipelines in mobile crowdsourcing (e.g., EM-based weight estimation) fail to identify the most accurate worker in this online setting, resulting in a linear regret $\mathcal{O}(T)$ over $T$ time slots. In this paper, we study truthful online preference aggregation for LLM fine-tuning in mobile crowdsourcing. We formulate a new dynamic Bayesian game to model the multi-agent online learning process between the platform and strategic mobile workers. We propose a novel online weighted aggregation mechanism that dynamically adjusts each worker's weight in the preference aggregation according to their feedback accuracy. We prove that our mechanism ensures truthful feedback from strategic workers and achieves a sublinear regret $\mathcal{O}(\sqrt{T})$ over $T$ time slots. We further extend our mechanism to a challenging scenario with limited worker feedback per time slot, still guaranteeing a sublinear regret $\mathcal{O}(\sqrt{T})$. Experiments on LLM fine-tuning with real-world datasets further demonstrate significant performance gains of our mechanisms over benchmark schemes.

2605.24048 2026-05-26 cs.LG cs.AI Updated version

Mixture of Complementary Agents for Robust LLM Ensemble

Yichi Zhang, Kevin Lu, Yuang Zhang, Jie Gao, Lirong Xia, Fang-Yi Yu

Affiliations: DIMACS, Rutgers University; Department of Mathematics, Rutgers University; Department of Computer Science, George Mason University; Department of Computer Science, Rutgers University

AI summary: Treats LLM selection as a combinatorial selection problem and proposes complementarity-based greedy selection algorithms, identifying those with the best performance-cost trade-off.

Abstract

Multi-AI collaboration, such as ensembling or debating large language models (LLMs), is a promising paradigm for aggregating information and boosting performance. A foundational step in these pipelines is to feed the responses of several proposer LLMs into a summarizer LLM, which synthesizes a better answer. However, choosing which proposers to include is non-trivial. Existing approaches primarily focus either on accuracy (picking the strongest models) or diversity (ensuring variety), and often overlook the interactions among proposers and with the summarizer. We reframe proposer selection as a combinatorial selection problem akin to feature selection, where the value of an LLM lies in its complementarity with others. However, directly applying standard feature-selection algorithms is impractical in the LLM setting due to prohibitive time complexity. Motivated by this limitation, we explore an extensive range of computationally feasible, greedy-style selection algorithms that assess complementarity using a small labeled set. Our experiments validate complementarity as a guiding principle for proposer selection and identify methods that achieve the best performance-cost trade-offs in practice.
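The difference between picking the strongest proposers and picking complementary ones can be sketched with a greedy selector. The sketch makes a strong simplifying assumption that is not in the abstract: the summarizer is treated as an oracle that answers a labeled item correctly whenever at least one selected proposer does, so the ensemble's value is the number of labeled items covered. The function and proposer names are hypothetical.

```python
def greedy_select(correct_sets, k):
    """Greedily pick up to k proposers by marginal gain in covered items.

    correct_sets maps a proposer name to the set of labeled items it answers
    correctly. Coverage stands in for ensemble quality (oracle-summarizer
    assumption made for this sketch only).
    """
    chosen, covered = [], set()
    for _ in range(k):
        best, best_gain = None, 0
        for name, items in correct_sets.items():
            if name in chosen:
                continue
            gain = len(items - covered)   # items only this proposer would add
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:                  # nobody adds anything new
            break
        chosen.append(best)
        covered |= correct_sets[best]
    return chosen, covered


proposers = {"A": {1, 2, 3, 4}, "B": {1, 2, 3}, "C": {5, 6}}
chosen, covered = greedy_select(proposers, k=2)
```

Proposer B is individually more accurate than C, yet the greedy step prefers C because B adds nothing beyond A. An accuracy-only ranking would have chosen A and B and covered fewer items.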

2605.24045 2026-05-26 cs.LG cs.AI Updated version

A Large-Scale Dataset and Benchmark: Do Protein-Ligand Models Learn Binding Sites or Just Binding Likelihood?

Zhaohan Meng, Zhen Bai, Ke Yuan, Iadh Ounis, Zaiqiao Meng, Hao Xu, Joseph Loscalzo

Affiliations: School of Computing Science; School of Cancer Sciences; School of Life Science and Technology; Institute of Science Tokyo; Cancer Research UK Scotland Institute; Language Technology Lab; Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School; The Broad Institute of MIT and Harvard

AI summary: Because existing benchmarks cannot assess whether models localize binding sites, introduces InteractBind, a dataset of approximately 100k protein-ligand pairs with a fine-grained benchmark; its binding-site localization task reveals limited localization ability despite strong binary binding prediction.

Comments: Under review for the NeurIPS 2026 Conference, Track on Evaluations and Datasets

Abstract

Protein-ligand modeling underpins computational drug discovery and molecular design. Existing protein-ligand benchmarks typically evaluate whether a protein and ligand interact and how strongly they bind, through tasks such as binary binding prediction and affinity regression. However, these evaluations provide limited evidence of whether models can localize binding sites or identify the non-covalent interactions underlying molecular recognition. To address this gap, we introduce InteractBind, a large-scale protein-ligand dataset comprising approximately 100k protein-ligand pairs, together with a benchmark for fine-grained evaluation. The core fine-grained task is that of binding-site localization, which uses protein-residue and ligand-atom interaction maps spanning six major types of non-covalent interactions to assess whether model-derived interaction maps localize binding sites. InteractBind further includes binding affinity and protein similarity-controlled splits to support realistic generalization assessment. Using InteractBind, we evaluate eight existing sequence-based and interaction-aware models, assessing binary binding prediction and binding-site localization. Results reveal limited binding-site localization despite strong binary binding prediction, with marked variation across non-covalent interaction types. Overall, InteractBind establishes a benchmark paradigm that encourages the development of more interpretable and physically grounded protein-ligand models.

2605.24043 2026-05-26 cs.LG cs.AI Updated version

LLM-AutoSciLab: Closed-Loop Scientific Discovery via Active Experimentation with LLMs

Sanchit Kabra, Nikhil Abhyankar, Saaketh Desai, Prasad Iyer, Chandan K Reddy

Affiliations: Virginia Tech; Sandia National Laboratories

AI summary: Proposes LLM-AutoSciLab, a closed-loop framework that iterates between hypothesis generation and experiment selection to acquire data actively under a budget constraint; it outperforms existing methods on three benchmarks and is 2-5x more sample-efficient.

Abstract

Scientific discovery is a closed-loop process in which hypotheses guide data acquisition and observations refine the hypothesis space. Yet most approaches reduce discovery to supervised learning over fixed datasets, where limited observations can support multiple plausible mechanisms that fit locally but fail to generalize. Thus, the key challenge is selecting informative observations to resolve uncertainty, shifting the focus from static inference to adaptive data acquisition. To address this, we propose LLM-AutoSciLab, a closed-loop framework that couples hypothesis generation with hypothesis-conditioned experiment selection and mechanism refinement. Rather than fitting models to passively collected data, LLM-AutoSciLab iteratively proposes plausible hypotheses, selects informative experiments to distinguish or refine them, and updates its state using the resulting evidence. To evaluate dynamic, closed-loop scientific discovery with active data acquisition, we introduce ActiveSciBench, comprising two datasets: ActiveSciBench-Chem with 57 enzyme-kinetics tasks and ActiveSciBench-GRN with 45 gene-regulatory-network tasks. These datasets model discovery as a budget-constrained process requiring adaptive experiment design, variable selection, and recovery of true mechanisms. Across NewtonBench, ActiveSciBench-Chem, and ActiveSciBench-GRN, LLM-AutoSciLab outperforms prior methods, achieving 67.6% and 35.1% symbolic accuracy on NewtonBench and ActiveSciBench-Chem, respectively, and 31.1% exact graph recovery on ActiveSciBench-GRN. Moreover, hypothesis-guided experimentation is 2-5x more sample-efficient than the strongest competing baselines. Code and data are available at: https://github.com/scientific-discovery/LLM-AutoSciLab

2605.24033 2026-05-26 cs.LG cs.LO Updated version

Towards Verifiable Transformers: Solver-Checkable Circuit Explanations

Neel Somani

Affiliations: Independent Researcher

AI summary: Proposes Verifiable Transformers, a framework that converts task-localized Transformer circuits into bounded, solver-checkable claims, enabling formal verification of circuit properties.

Abstract

Mechanistic interpretability often identifies circuits inside Transformer models, but explanations of those circuits are usually validated through examples, ablations, and manual reasoning. This leaves a gap between finding a plausible circuit and proving what the circuit does. We introduce Verifiable Transformers, a framework for converting task-localized Transformer circuits into bounded, solver-checkable claims. Given a behavior, a finite task domain, and a candidate-token projection, we extract a task circuit and verify properties such as projected functional equivalence, edge necessity, task-relevant invariance, and final-residual robustness. Direct verification encodes the extracted circuit itself into an SMT solver. When a circuit contains operators that are not exactly or tractably encodable, surrogate-mediated verification fits an SMT-encodable surrogate, validates it against the extracted circuit over the bounded domain, and verifies symbolic explanations against the surrogate. We instantiate direct verification with a GPT-style architecture using Signed L1 BandNorm, sparsemax attention, and LeakyReLU. On small symbolic sequence tasks, we train an SMT-representable Transformer, extract sparse circuits for quote closing and bracket type tracking, and exhaustively verify projected functional equivalence, content invariance, edge necessity, and final-residual robustness. At GPT-2 scale, the same operator stack trains stably on OpenWebText, although naive direct SMT verification remains intractable. We also demonstrate surrogate-mediated verification on task-localized circuits with hard-to-encode attention, showing both verified symbolic explanations and solver-generated counterexamples. The goal is not full-model verification, but a concrete path for turning mechanistic circuit explanations into formal propositions that can be proven or refuted.
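Sparsemax, one of the operators named in the abstract, is a piecewise-linear alternative to softmax: it projects the score vector onto the probability simplex and can return exact zeros. The sketch below follows the standard sort-and-threshold algorithm for sparsemax. It is offered only to show why such an operator is a plausible fit for exact solver encodings, and it is not the paper's implementation.

```python
def sparsemax(z):
    """Euclidean projection of the score vector z onto the probability simplex.

    Standard algorithm: sort scores in descending order, find the largest k
    with 1 + k * z_(k) > sum of the top-k scores, then subtract the resulting
    threshold tau and clip at zero.
    """
    zs = sorted(z, reverse=True)
    cumsum, k, top_sum = 0.0, 0, 0.0
    for j, v in enumerate(zs, start=1):
        cumsum += v
        if 1 + j * v > cumsum:
            k, top_sum = j, cumsum   # support size and its score sum so far
    tau = (top_sum - 1) / k
    return [max(v - tau, 0.0) for v in z]


peaked = sparsemax([2.0, 1.0, 0.1])   # one score dominates
mixed = sparsemax([1.0, 0.5])         # both scores stay in the support
```

Every step uses only comparisons, additions, and a division by a small integer, in contrast to the exponentials in softmax. This is a sketch-level observation about the operator rather than a claim about how the paper encodes it.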

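2605.24031 2026-05-26 q-fin.CP cs.LG Updated version

Volatility Surface Reconstruction using Deep Learning under No-Arbitrage Constraints

Pablo Rodriguez Manzi

Affiliations: Universidad de Buenos Aires

AI summary: Studies reconstruction of implied volatility surfaces from sparse, noisy option quotes with deep learning models under no-arbitrage constraints, comparing several neural architectures with the classical SVI parameterization. Transformer and U-Net architectures reconstruct accurately under sparse observations, and soft arbitrage penalties effectively reduce arbitrage violations.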

Comments: MSc thesis, Universidad de Buenos Aires, 2026. 94 pages, 27 figures

Abstract

We study the reconstruction of implied volatility surfaces from sparse and noisy option quotes using deep learning models under no-arbitrage constraints. We compare multiple neural architectures, including multilayer perceptrons, convolutional networks, U-Nets, variational autoencoders, and Transformer-based models against classical SVI parameterizations on option market data. Results show that Transformer and U-Net architectures achieve strong reconstruction accuracy, particularly under sparse observation regimes, while soft arbitrage penalties significantly reduce arbitrage violations with moderate impact on reconstruction error. We further analyze the trade-off between accuracy and arbitrage consistency across architectures and regularization strengths.
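One common way to express a soft arbitrage penalty is through calendar-spread arbitrage: at a fixed log-moneyness, total implied variance (volatility squared times maturity) should not decrease as maturity grows. The sketch below penalizes squared violations of that condition on a grid. The penalty form, the grid layout, and the function name are assumptions for illustration; the abstract does not state which penalties the thesis uses.

```python
def calendar_penalty(vols, maturities):
    """Sum of squared calendar-arbitrage violations on a volatility grid.

    vols[i][j] is the implied volatility at maturities[i] (in years, sorted
    ascending) and the j-th log-moneyness point. Hypothetical penalty form.
    """
    penalty = 0.0
    for i in range(len(maturities) - 1):
        for j in range(len(vols[i])):
            w_near = vols[i][j] ** 2 * maturities[i]          # total variance
            w_far = vols[i + 1][j] ** 2 * maturities[i + 1]
            penalty += max(w_near - w_far, 0.0) ** 2          # only violations
    return penalty


flat_surface = calendar_penalty([[0.2, 0.2], [0.2, 0.2]], [0.25, 1.0])
inverted = calendar_penalty([[0.4], [0.1]], [0.25, 1.0])
```

Because the penalty is zero on any admissible surface and grows smoothly with the size of a violation, it can be added to a reconstruction loss with a weight, which matches the abstract's notion of trading accuracy against arbitrage consistency.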

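2605.24025 2026-05-26 cs.CV cs.LG Updated version

Towards Large Model Feature Coding

Youwei Pang, Changsheng Gao, Dong Liu, Huchuan Lu, Weisi Lin

Affiliations: NTU; USTC; DUT

AI summary: Presents a benchmark and evaluation framework for large model feature coding (LaMoFC). The feature dataset LaMoFCBench, covering 16 scenarios in 4 categories, exposes a severe misalignment between existing coding paradigms and the heterogeneity of large model features.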

Abstract

Large models have delivered remarkable performance across a wide range of perception and generation tasks, yet practical deployment is increasingly constrained by computational and memory budgets, as well as privacy requirements. Split execution alleviates these constraints by partitioning computation across devices, but it inevitably introduces intensive transmission and storage of intermediate features. Unlike conventional feature coding for CNNs that typically targets homogeneous spatial activation maps, modern large models generate heterogeneous features with varying statistical distributions and compression tolerances, e.g., multi-level/multi-modal representations and autoregressive context caches. These characteristics necessitate treating large model feature coding (LaMoFC) as a fundamental system component and call for a systematic evaluation framework. In this paper, we present a comprehensive benchmark and evaluation framework for LaMoFC. We first build the feature dataset LaMoFCBench, covering diverse task requirements across 4 categories and 16 scenarios while integrating widely adopted architectures and various split-computing settings. We then specify representative split points according to practical application scenarios to extract intermediate features, establishing a unified pipeline for fair and reproducible comparisons. Finally, we benchmark mainstream universal feature codecs, exposing the profound misalignment between existing coding paradigms and the heterogeneous nature of large model features. These findings reveal that LaMoFC demands a fundamental departure from existing paradigms, and LaMoFCBench provides the shared empirical foundation to drive this transition. The data and code will be available at https://github.com/lartpang/LaMoFCBench.

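2605.24019 2026-05-26 cs.CV cs.LG Updated version

MGVQ: Synergizing Multi-dimensional Sensitivity-Aware and Gradient-Hessian Fusion for Vector Quantization

Zhong Wang, Zukang Xu, Xing Hu, Dawei Yang

Affiliations: Bauman Moscow State Technical University

AI summary: Proposes the MGVQ framework, which uses sensitivity-guided structured mixed-precision quantization and gradient-aware second-order error compensation for ultra-low-bit vector quantization of vision-language models, improving accuracy by up to 4.9 points under 2-bit quantization.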

Abstract

Vision-Language Models (VLMs) achieve outstanding performance, yet their huge model size severely hinders deployment on edge devices with limited resources. As an efficient model compression technique, vector quantization (VQ) excels in ultra-low-bit representation, which maps model weights to discrete codewords in a compact codebook to cut memory consumption and transmission overhead while preserving model capability. Direct VQ application to VLMs still has two core limitations. First, cross-modality weight distribution differences brought by visual and textual inputs cannot be well fitted by a single unified codebook. Second, current second-order error compensation ignores first-order gradient information, causing weight deviation from pre-trained optimal states, gradient drift and biased compensation results. This work proposes MGVQ, a novel vector quantization framework integrating multi-dimensional sensitivity perception and gradient-Hessian fusion. It consists of two core modules: sensitivity-guided structured mixed-precision quantization dynamically assigns different bit-widths according to channel sensitivity via combined global and local sensitivity analysis for refined resource allocation; gradient-aware second-order error compensation embeds first-order gradients into error correction, and adopts Kronecker and Block-LDL decomposition to ensure low computational cost. Extensive experiments on mainstream VLMs including LLaVA-onevision, InternVL2 and Qwen2-VL verify the effectiveness of MGVQ. In 2-bit quantization settings, MGVQ surpasses existing advanced post-training quantization methods significantly, achieving a maximum accuracy improvement of 4.9 points (71.4% vs 67.0% on InternVL2-26B). The proposed method realizes stable and efficient ultra-low-bit VLM quantization, greatly promoting the practical deployment of multimodal large models in resource-limited environments.

2605.24009 2026-05-26 physics.ao-ph cs.LG Updated version

Improving Ensemble CAPE Forecasts with a Diffusion Model Incorporating Aerosol Information

Zachary James, Joseph Guinness, Arthur DeGaetano

Affiliations: Cornell University, Department of Statistics and Data Science; Washington University in St. Louis, Department of Statistics and Data Science; Cornell University, Department of Earth and Atmospheric Sciences

AI summary: To address the summertime CAPE underestimation bias of the GFS/GEFS systems, proposes an AI diffusion model trained in two stages that takes a GFS forecast as input and outputs an ensemble, significantly improving RMSE, CRPS, and Brier score; aerosol optical depth is added as an extra input feature to improve the forecasts.

Abstract

Convective available potential energy (CAPE) is an important variable for forecasting severe weather and understanding deep convection and precipitation. The latest versions of the Global Forecast System (GFS) and related Global Ensemble Forecast System (GEFS) have exhibited a bias towards underestimating CAPE values during the summertime. We train an artificial intelligence (AI) diffusion model to improve the skill and uncertainty quantification of afternoon 6-hour lead time ensemble forecasts over the United States. Our model takes a GFS CAPE forecast as input and outputs an ensemble that significantly outperforms both GFS and GEFS 6-hour forecasts on root mean square error, continuous ranked probability score, and Brier score. We propose a two-stage training pipeline to leverage both a larger historical GFS forecast dataset and a smaller historical GEFS dataset, despite the two using initialization and parameterization schemes that vary over time. We also show that classifier-free guidance can be used to control the skill and spread of the forecasts. We then demonstrate the versatility of our framework by adding aerosol optical depths (AODs) of black carbon, organic carbon, dust, sea salt, and sulfates as additional input features. Aerosols can invigorate or suppress convection depending on atmospheric conditions. Our AI models effectively incorporate aerosols to produce improved CAPE forecasts. We interpret the model components by using permutation feature importance to rank the influence of the different AODs and find that black carbon, organic carbon, and sulfate aerosols have a greater impact on the model's CAPE predictions than sea salt and dust aerosols.
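Two of the scores named in the abstract have compact definitions. The sketch below uses the standard empirical estimator of the continuous ranked probability score for an ensemble (mean absolute error of the members minus half the mean absolute difference between members) and the usual Brier score for probability forecasts of a binary event. The function names are hypothetical, and the abstract does not say which estimator variants the authors use.

```python
def ensemble_crps(members, obs):
    """Empirical CRPS of an ensemble forecast against one observation.

    CRPS = mean|x_i - obs| - 0.5 * mean|x_i - x_j| over all member pairs.
    Lower is better; for a single member it reduces to absolute error.
    """
    m = len(members)
    accuracy = sum(abs(x - obs) for x in members) / m
    spread = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return accuracy - spread


def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probability and 0/1 outcome."""
    pairs = list(zip(probabilities, outcomes))
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)


two_member = ensemble_crps([1.0, 3.0], obs=2.0)
single = ensemble_crps([3.0], obs=2.0)
brier = brier_score([0.5, 1.0], [1, 1])
```

The two-member ensemble scores better than the single member even though both miss by the same distance, because CRPS rewards spread that brackets the observation. This is why the score is used to judge uncertainty quantification as well as accuracy.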

2605.24008 2026-05-26 cs.LG cs.CV cs.SE Updated version

CAFD: Concept-Aware DNN Fault Detection using VLMs

Amin Abbasishahkoo, Mahboubeh Dadkhah, Lionel Briand

Affiliations: School of EECS, University of Ottawa; Research Ireland Lero centre for software, University of Limerick

AI summary: Proposes Concept-Aware Fault Detection (CAFD), which integrates model signals, distance features, and a Concept Failure Ratio (CFR) feature based on vision-language models, substantially improving DNN fault detection while remaining efficient.

Abstract

Fault detection for Deep Neural Networks (DNNs) has received increasing attention in recent years. While more advanced hybrid approaches have been proposed to combine multiple sources of information and outperform earlier techniques, they often incur substantial computational overhead, limiting scalability and practicality in real-world settings. In this paper, we introduce Concept-Aware Fault Detection (CAFD), a learning-based approach that achieves superior fault detection performance by effectively integrating multiple information sources while maintaining practical efficiency. Specifically, CAFD is trained using a carefully selected set of informative features, including model-based signals derived from the DNN's outputs, distance-based features, and a novel concept-based feature, called Concept Failure Ratio (CFR). CFR leverages Vision-Language Models (VLMs) to extract textual concepts from images and quantify the likelihood that their presence is associated with DNN failures. By incorporating this feature, CAFD benefits from complementary semantic information, enabling more effective fault detection. Our results demonstrate that CFR serves as an effective indicator for DNN fault detection. We conduct an extensive empirical evaluation of CAFD, comparing it against five state-of-the-art baselines across three subject DNN models and datasets, including ImageNet. Across a wide range of constrained selection budgets, CAFD consistently outperforms all baselines in Fault Detection Rate (FDR), achieving average FDR improvements of 18.3% across all investigated subjects and budget sizes.
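A concept-level failure statistic of the kind the abstract describes can be sketched as a simple ratio. The sketch assumes that a VLM has already produced a set of textual concepts per image and that the ratio for a concept is the fraction of images containing it on which the DNN failed. Both the definition and the names are illustrative assumptions; the abstract does not give the exact formula for CFR.

```python
from collections import defaultdict


def concept_failure_ratio(records):
    """Per-concept failure ratio from (concepts, is_failure) records.

    records: iterable of (set of concept strings, bool) pairs, one per image.
    Returns {concept: failures with concept / images with concept}.
    Hypothetical definition used for illustration.
    """
    seen = defaultdict(int)
    failed = defaultdict(int)
    for concepts, is_failure in records:
        for concept in set(concepts):
            seen[concept] += 1
            failed[concept] += int(is_failure)
    return {concept: failed[concept] / seen[concept] for concept in seen}


records = [
    ({"fog", "car"}, True),
    ({"car"}, False),
    ({"fog"}, True),
    ({"car", "road"}, False),
]
ratios = concept_failure_ratio(records)
```

In this toy data every image containing "fog" is misclassified, so an unlabeled image whose extracted concepts include "fog" would be a natural candidate to select first under a limited labeling budget.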

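2605.24006 2026-05-26 cs.DC cs.LG Updated version

A Tabular Schedule Abstraction for Communication-Aware Evaluation of Pipeline-Parallel LLM Training

Daniel Barley, Jonathan Leis, Benjamin Klenk, Holger Fröning

Affiliations: Hardware and Artificial Intelligence (HAWAII) Lab, Heidelberg University, Heidelberg, Germany; NVIDIA Corporation, Santa Clara, CA, USA

AI summary: Proposes a tabular schedule abstraction and a unified multi-abstraction methodology that compares the GPipe, 1F1B, Chimera, and Hanayo pipeline schedules through formula-based reasoning, idealized schedule tables, and communication-aware execution simulation. Communication can cancel the structural advantages suggested by bubble analysis, so schedule rankings depend on the execution environment.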

Comments Accepted at the 25th IEEE International Symposium on Parallel and Distributed Computing (ISPDC 2026)

详情
AI中文摘要

流水线并行是大型语言模型分布式训练的关键技术,因为它减少了每设备的参数和激活内存。然而,比较流水线调度方案是困难的:分析模型暴露了诸如气泡比率等结构量,而端到端硬件实验成本高昂且依赖于系统。在这项工作中,我们引入了一种表格调度抽象和一种统一的多抽象方法,该方法连接了基于公式的推理、理想化调度表和通信感知执行模拟。使用这个框架,我们在多个建模的系统配置下比较了GPipe、1F1B、Chimera和受限模式下的Hanayo。我们的结果表明,调度排名不是抽象不变的:通信可以抵消仅由气泡分析所暗示的结构优势。在本文考虑的假设下,GPipe和1F1B在运行时等价,但1F1B实现了更低的激活内存峰值。Chimera主要在低微批次数和通信友好的情况下具有优势,而Hanayo在其预期的受限工作点有效,但对网络瓶颈敏感。我们进一步研究了一种非对称的Chimera式放置,它没有减少全局峰值内存需求,但在浅流水线中显示出有限的运行时收益。总体而言,流水线调度质量仅在建模的执行环境背景下才有意义。

英文摘要

Pipeline parallelism is a key technique for distributed training of large language models because it reduces per-device parameter and activation memory. However, comparing pipeline schedules is difficult: analytical models expose structural quantities such as bubble ratios, while end-to-end hardware experiments are costly and system-specific. In this work, we introduce a tabular schedule abstraction and a unified multi-abstraction methodology that connects formula-based reasoning, idealized schedule tables, and communication-aware execution simulation. Using this framework, we compare GPipe, 1F1B, Chimera, and Hanayo in its restricted regime across multiple modeled system configurations. Our results show that schedule rankings are not abstraction-invariant: communication can negate structural advantages suggested by bubble analysis alone. Under the assumptions considered here, GPipe and 1F1B are runtime-equivalent, but 1F1B achieves a lower activation-memory peak. Chimera is advantageous mainly at low microbatch counts and in communication-favorable regimes, while Hanayo is effective in its intended restricted operating point but remains sensitive to network bottlenecks. We further study an asymmetric Chimera-style placement, which does not reduce the global peak memory requirement but reveals limited runtime gains in shallow pipelines. Overall, pipeline schedule quality is meaningful only in the context of the modeled execution environment.
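The bubble ratio mentioned above can be made concrete with the textbook formula for an idealized synchronous pipeline. The sketch assumes equal per-stage compute time and zero communication cost, which is exactly the kind of idealization the abstract says can mislead once communication is modeled. The function name `bubble_ratio` is hypothetical and not taken from the paper.

```python
def bubble_ratio(num_stages: int, num_microbatches: int) -> float:
    """Idle fraction of an idealized GPipe/1F1B-style pipeline.

    Assumes every stage takes the same time per microbatch and that
    communication is free: (p - 1) / (m + p - 1).
    """
    if num_stages < 1 or num_microbatches < 1:
        raise ValueError("stage and microbatch counts must be positive")
    return (num_stages - 1) / (num_microbatches + num_stages - 1)

# More microbatches shrink the bubble for a fixed pipeline depth.
for m in (4, 16, 64):
    print(m, round(bubble_ratio(4, m), 3))
```

Under these assumptions GPipe and 1F1B share the same bubble ratio, consistent with the abstract's remark that the two are runtime-equivalent while differing in peak activation memory.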

2605.24004 2026-05-26 cs.AI cs.CV cs.LG cs.RO 版本更新

Reason--Imagine--Act: Closed-Loop LLM Decision Making with World Models for Autonomous Driving

推理--想象--行动:基于世界模型的闭环LLM自动驾驶决策

Zhengqi Sun, Yiwen Sun, Boxuan Liu, Tailai Chen, Tianxu Guo, Jiabin Liu

发表机构 * 1Department of Information Management, Peking University, Beijing 100871, China 2School of Intelligence Science and Technology, Peking University, Beijing 100871, China 3State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing 100080, China 4Yuanpei College, Peking University, Beijing 100871, China 5China Agricultural University, Beijing, China 6CRSC Research & Design Institute Group Co., Ltd., Beijing, China

AI总结 提出Reason--Imagine--Act (RIA)闭环框架,结合LLM推理器与动作条件世界模型进行在线安全验证,在CARLA点目标协议下实现80.05%路线完成率、51.10%到达率和0.20%碰撞率。

Comments Accepted by the 2026 IEEE International Conference on Intelligent Transportation Systems (ITSC 2026). 8 pages, 2 figures

详情
AI中文摘要

大型语言模型(LLM)在自动驾驶中具有潜力,但仅基于语义的决策策略可能在动态交通中产生物理上不安全的行为。现有方法要么在没有显式动力学验证的情况下进行在线语言推理,要么主要在离线流程中使用世界模型,在决策时语义意图与物理可行性之间存在差距。我们提出了Reason--Imagine--Act (RIA),一个闭环框架,将LLM推理器与动作条件世界模型耦合,用于在线安全验证。在每一步,LLM提出一个动作模板和候选子动作,世界模型执行短时域展开,安全评分器选择最安全的可执行动作并反馈给下一步推理。在统一的CARLA点目标协议(1000个回合)下,RIA实现了80.05%的路线完成率、51.10%的到达率和0.20%的碰撞率。在相同的闭环接口下,RIA在核心闭环指标上始终优于无训练基线,包括CARLA TM和MADA。为便于复现,代码可在https://github.com/pku-smart-city/source_code/tree/main/RIA获取。

英文摘要

Large language models (LLMs) are promising for autonomous driving, but semantics-only decision policies can yield physically unsafe behavior in dynamic traffic. Existing methods either perform online language reasoning without explicit dynamics verification or use world models mainly in offline pipelines, leaving a gap between semantic intent and physical feasibility at decision time. We propose Reason--Imagine--Act (RIA), a closed-loop framework that couples an LLM reasoner with an action-conditioned world model for online safety verification. At each step, the LLM proposes an action template and candidate sub-actions, the world model performs short-horizon rollouts, and a safety scorer selects the safest executable action with feedback to the next reasoning step. Under a unified CARLA point-goal protocol (1000 episodes), RIA achieves 80.05% route completion, 51.10% arrival rate, and 0.20% collision rate. Under the same closed-loop interface, RIA consistently outperforms training-free baselines, including CARLA TM and MADA, on core closed-loop metrics. For reproducibility, code is available at https://github.com/pku-smart-city/source_code/tree/main/RIA.
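The propose, imagine, select loop described above can be sketched with a toy stand-in for the world model. Everything here is an assumption for illustration: the 1-D car-following dynamics, the minimum-gap safety score, and the names `rollout_min_gap` and `choose_action`. RIA's actual world model and scorer are learned components evaluated in CARLA.

```python
def rollout_min_gap(gap, ego_speed, lead_speed, accel, steps=10, dt=0.2):
    """Toy short-horizon rollout: the ego car applies a constant acceleration
    behind a lead car moving at constant speed. Returns the smallest gap seen."""
    min_gap = gap
    for _ in range(steps):
        ego_speed = max(0.0, ego_speed + accel * dt)
        gap += (lead_speed - ego_speed) * dt
        min_gap = min(min_gap, gap)
    return min_gap

def choose_action(gap, ego_speed, lead_speed, candidates):
    """Safety scorer: keep the candidate whose imagined rollout has the
    largest minimum gap to the lead vehicle."""
    return max(
        candidates,
        key=lambda accel: rollout_min_gap(gap, ego_speed, lead_speed, accel),
    )

# Candidate sub-actions would come from the LLM reasoner; here they are fixed.
best = choose_action(gap=8.0, ego_speed=10.0, lead_speed=5.0,
                     candidates=[-3.0, 0.0, 2.0])
print(best)
```

In a closed-loop setting the chosen action and its score would be fed back into the next reasoning step, as the abstract describes.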

2605.23997 2026-05-26 cs.CV cs.AI cs.LG 版本更新

IVR-R1: Refining Trajectories through Iterative Visual-Grounded Reasoning in Reinforcement Learning

IVR-R1:通过强化学习中的迭代视觉基础推理优化轨迹

Chenghao Li, Fusheng Hao, Xikai Zhang, Likang Xiao, Yanwei Ren, Fuxiang Wu, Quan Chen, Liu Liu

发表机构 * Hangzhou International Innovation Institute, Beihang University(北京航空航天大学杭州国际创新研究院) School of Artificial Intelligence, Beihang University(北京航空航天大学人工智能学院) Kuaishou Technology(快手科技) Shenzhen Institute of Advanced Integration Technology, Shenzhen(深圳先进集成技术研究院)

AI总结 提出IVR-R1框架,利用奖励驱动的筛选机制和迭代再推理循环,在强化学习中动态校正多模态推理轨迹,以解决视觉幻觉和逻辑错误问题。

详情
AI中文摘要

通过强化学习的多模态大语言模型在复杂视觉推理任务中展现出显著能力,但在长程多模态场景中仍存在局限,常出现视觉幻觉和逻辑错误。当前方法通常将高维视觉场景预编码为离散文本代理以促进下游推理。然而,随着推理链展开,文本与视觉场景之间固有的信息不对称会侵蚀视觉基础,导致推理误导和错误输出。为解决此问题,我们提出IVR-R1(迭代视觉基础推理),一种新颖的强化学习训练框架,通过动态视觉重新对齐主动校正推理轨迹以指导策略优化。具体而言,利用奖励驱动的筛选机制识别有缺陷的展开,IVR-R1在多模态上下文中执行细粒度的步骤级错误归因。通过将中间推理状态与原始视觉先验进行迭代交叉引用,再推理循环实现自动轨迹校正,有效合成专家级演示,作为策略模型的高保真推理模板。我们在多种多模态基准上的实验表明,IVR-R1持续优于现有强化学习方法,为在复杂多模态推理中保持逻辑和视觉一致性建立了优越范式。

英文摘要

Multimodal large language models via reinforcement learning (RL) have demonstrated remarkable capabilities in complex visual reasoning tasks, yet they remain limited in long-horizon multimodal scenarios, often suffering from visual hallucination and logical error. Current methods typically pre-encode high-dimensional visual scenes into discrete textual proxies to facilitate downstream reasoning. As the reasoning chain unfolds, however, the inherent information asymmetry between text and visual scenes tends to erode visual grounding, resulting in misguided reasoning and erroneous outputs. To address this issue, we introduce IVR-R1 (Iterative Visual-grounded Reasoning), a novel RL training framework that facilitates dynamic visual re-alignment that actively rectifies reasoning trajectories to guide policy optimization. Specifically, by leveraging a reward-driven screening mechanism to identify flawed rollouts, IVR-R1 executes a fine-grained, step-level error attribution within the multimodal context. By iteratively cross-referencing intermediate reasoning states against pristine visual priors, a Re-Reasoning Loop enables automated trajectory rectification, effectively synthesizing expert-level demonstrations that serve as high-fidelity reasoning templates for the policy model. Our experiments across diverse multimodal benchmarks demonstrate that IVR-R1 consistently outperforms existing reinforcement learning methods, establishing a superior paradigm for maintaining logical and visual consistency in complex multimodal reasoning.

2605.23988 2026-05-26 cs.DC cs.LG 版本更新

TSFLora: Token-Compressed Split Fine-Tuning for Wireless Edge Networks

TSFLora: 面向无线边缘网络的令牌压缩分割微调

Xianke Qiang, Zheng Chang, Li Wang, Ying-Chang Liang

发表机构 * School of Computer Science and Engineering(计算机科学与工程学院) Center for Intelligent Networking and Communications(智能网络与通信中心) School of Computer Science(计算机科学学院) Beijing University of Posts and Telecommunications(北京邮电大学)

AI总结 针对无线设备资源受限下大模型微调难题,提出TSFLora框架,通过注意力引导的令牌选择、合并、低比特量化及LoRA适配器,在分割联邦训练中压缩中间令牌序列,显著降低通信开销和内存占用,同时保持精度。

详情
AI中文摘要

将大型AI模型(LAM)适配到个性化边缘数据具有挑战性,因为无线设备的内存、计算和上行容量有限。联邦微调保护数据隐私,但仍要求每个设备托管完整模型,而分割学习以大量激活传输为代价减少设备内存。本文提出TSFLora,一种令牌压缩的分割微调框架,用于在边缘实现通信高效的LAM适配。TSFLora在分割联邦训练流程中结合了注意力引导的令牌选择、令牌合并、低比特激活量化和基于LoRA的适配。关键思想是在传输前压缩中间令牌序列,从而在不改变冻结骨干网络的情况下减少上行流量和服务器端处理。在CIFAR-10、CIFAR-100和TinyImageNet上对ViT模型的实验表明,TSFLora在保持竞争性精度的同时,实现了高达6.8×的通信减少和41%的内存节省。

英文摘要

Adapting large AI models (LAMs) to personalized edge data is challenging because wireless devices have limited memory, computation, and uplink capacity. Federated fine-tuning preserves data privacy but still requires each device to host the full model, while split learning reduces device memory at the cost of heavy activation transmission. This paper proposes TSFLora, a token-compressed split fine-tuning framework for communication-efficient LAM adaptation at the edge. TSFLora combines attention-guided token selection, token merging, low-bit activation quantization, and LoRA-based adaptation within a split federated training pipeline. The key idea is to compress the intermediate token sequence before transmission so that the system reduces both uplink traffic and server-side processing without changing the frozen backbone. Experiments on ViT models over CIFAR-10, CIFAR-100, and TinyImageNet show that TSFLora achieves up to 6.8× communication reduction and 41% memory saving while maintaining competitive accuracy.
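Two of the compression steps named above, attention-guided token selection and low-bit quantization, can be sketched in a few lines. This is a generic illustration, not TSFLora's algorithm: the scoring input, the uniform quantizer, and the names `select_tokens` and `quantize` are all assumptions, and token merging is omitted.

```python
def select_tokens(tokens, scores, keep):
    """Keep the `keep` tokens with the highest attention scores,
    preserving their original order in the sequence."""
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:keep])
    return [tokens[i] for i in kept]

def quantize(values, bits=4):
    """Uniform quantization of activations to 2**bits levels.
    Returns integer codes plus the offset and step needed to dequantize."""
    lo, hi = min(values), max(values)
    levels = 2 ** bits - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / step) for v in values]
    return codes, lo, step

# The device would transmit only the selected, quantized tokens to the server.
kept = select_tokens(["t0", "t1", "t2", "t3"], [0.1, 0.9, 0.3, 0.8], keep=2)
print(kept)
```

Halving the token count and moving from 32-bit to 4-bit activations would each cut uplink traffic independently, which is why the two steps combine well.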

2605.23984 2026-05-26 cs.LG cs.AI cs.CV 版本更新

Parameter Efficient Multi-Class Intelligent Scheduling for Multimodal Online Distributed Industrial Anomaly Detection

面向多模态在线分布式工业异常检测的参数高效多类智能调度

Heqiang Wang, Weihong Yang, Zheyuan Yang, Jia Zhou, Xiaoxiong Zhong, Fangming Liu, Weizhe Zhang

发表机构 * Pengcheng Laboratory(鹏城实验室) Shenzhen International Graduate School(深圳国际研究生院)

AI总结 针对工业异常检测中分布式、持续生成数据的特点,提出多模态在线分布式工业异常检测框架,通过多类智能调度问题和序列边际增益贪婪算法协调模型更新,并采用资源高效类级低秩适应策略降低系统开销,在MVTec 3D-AD和Eyecandies数据集上取得优越性能。

详情
AI中文摘要

工业异常检测作为工业系统的基本挑战已引起广泛关注。异构工业传感器的快速发展推动工业异常检测从单模态向多模态范式转变。然而,现有方法主要针对集中式和离线场景设计,忽视了实际工业环境中分布式和持续生成的数据特征。随着边缘智能的发展,现代边缘设备不仅能够采集数据,还能进行分布式模型训练,实现系统范围内的协作智能。工业异常检测是此背景下的关键应用。受这些挑战启发,我们提出了一种名为多模态在线分布式工业异常检测(MODIAD)的新框架。首先给出了MODIAD的完整工作流程,然后制定了多类智能调度(MIS)问题,通过平衡数据充足性和类别更新频率来协调跨类模型更新。为了高效解决该问题,我们设计了序列边际增益贪婪(SMG)算法,能够在资源约束下实现有效的多类训练。此外,为了提升训练过程中的计算和通信效率,我们提出了资源高效类级低秩适应(REC-LoRA)策略,在保持检测性能的同时显著降低系统开销。在两个代表性多模态工业异常检测数据集MVTec 3D-AD和Eyecandies上的大量实验表明,所提方法在MODIAD场景下实现了优越的性能和效率。

英文摘要

Industrial anomaly detection has attracted significant attention as a fundamental challenge in industrial systems. The rapid advancement of heterogeneous industrial sensors has driven industrial anomaly detection from unimodal to multimodal paradigms. However, existing methods are primarily designed for centralized and offline settings, overlooking the distributed and continuously generated data characteristic of real-world industrial environments. With the advancement of edge intelligence, modern edge devices are increasingly capable of not only data acquisition but also distributed model training, enabling collaborative intelligence across the system. Industrial anomaly detection represents a critical application in this context. Motivated by these challenges, we propose a novel framework termed Multimodal Online Distributed Industrial Anomaly Detection (MODIAD). We first present a comprehensive workflow for MODIAD and then formulate a Multi-class Intelligent Scheduling (MIS) problem to coordinate cross-class model updates by balancing data sufficiency and class update frequency. To efficiently solve this problem, we design a Sequential Marginal Gain Greedy (SMG) algorithm that enables effective multi-class training under resource constraints. Furthermore, to improve the computational and communication efficiency during training, we propose a Resource Efficient Class-Wise Low Rank Adaptation (REC-LoRA) strategy, which significantly reduces system overhead while preserving detection performance. Extensive experiments on two representative multimodal industrial anomaly detection datasets, MVTec 3D-AD and Eyecandies, demonstrate that the proposed approach achieves superior performance and efficiency under the MODIAD scenario.

2605.23978 2026-05-26 cs.LG econ.EM q-fin.ST q-fin.TR 版本更新

Algometrics: Forecasting Under Algorithmic Feedback

算法度量:算法反馈下的预测

Marc Schmitt

发表机构 * University of Oxford(牛津大学)

AI总结 提出算法度量框架,研究预测算法影响自身评估数据的反馈机制,证明部署风险不可仅由历史数据识别,并给出估计方法。

详情
AI中文摘要

在算法市场中,预测模型成为其试图预测的数据生成过程的一部分。一旦其输出转化为交易、分配、执行计划或风险控制,它们就会改变用于评估的未来数据。我引入了算法度量,这是一个用于时间序列的框架,其演化依赖于预测它们的预测算法。该框架区分了被动预测下测量的历史风险和预测驱动行动时测量的部署风险。我证明了三个结果。首先,仅凭被动历史数据无法识别部署风险:即使在线性一步反馈模型中,无限多的算法介导环境会诱导相同的历史规律,但对同一预测器意味着不同的部署风险。其次,历史模型排名可能在拥挤下反转,因此被动误差较低的预测器在类似算法被采用后可能具有更高的部署误差。第三,随机化或工具化行动可识别短视界线性反馈,并且我推导出部署风险估计的有限样本界。这些结果表明,算法市场中的时间序列基准应报告反馈敏感性和预测准确性。

英文摘要

In algorithmic markets, predictive models become part of the data-generating process they aim to forecast. Once their outputs are converted into trades, allocations, execution schedules, or risk controls, they change the future data on which they are evaluated. I introduce algometrics, a framework for time series whose evolution depends on the predictive algorithms forecasting them. The framework distinguishes historical risk, measured under passive forecasting, from deployment risk, measured when forecasts drive actions. I prove three results. First, deployment risk is not identifiable from passive historical data alone: even in a one-step linear feedback model, infinitely many algorithm-mediated environments induce the same historical law while implying different deployment risks for the same forecaster. Second, historical model rankings can invert under crowding, so a predictor with lower passive error can have higher deployment error once similar algorithms are adopted. Third, randomized or instrumented actions identify short-horizon linear feedback, and I derive a finite-sample bound for deployment-risk estimation. These results suggest that time-series benchmarks in algorithmic markets should report feedback sensitivity alongside predictive accuracy.
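The non-identifiability result can be illustrated with a toy one-step linear feedback model. Assume, purely for illustration, that the next value is `y_next = a*y + b*f + noise`, where `f` is the forecast that gets acted on and `b` is the feedback strength. Under passive forecasting nothing is acted on, so history reveals `a` and the noise variance but never `b`. The model and the function names are assumptions, not the paper's formulation.

```python
def historical_risk(sigma2):
    """MSE of the forecast f = a*y under passive forecasting (no feedback):
    only the irreducible noise variance remains."""
    return sigma2

def deployment_risk(a, b, sigma2, y):
    """MSE of the same forecast once it drives actions.

    The realized value becomes a*y + b*f + noise, so the forecast is off
    by b*f on average, on top of the noise."""
    forecast = a * y
    feedback_shift = b * forecast
    return feedback_shift ** 2 + sigma2

# Two environments with identical history (same a, sigma2) but different b.
for b in (0.0, 0.5):
    print(b, historical_risk(1.0), deployment_risk(0.9, b, 1.0, y=2.0))
```

Both environments report the same historical risk, yet the deployed error differs, which is the sense in which passive data alone cannot rank forecasters for deployment.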

2605.23971 2026-05-26 physics.chem-ph cs.LG physics.app-ph 版本更新

Physics-Guided Concentration Inference from Resistance Transients in a Mixed-Phase SnO-SnO$_2$ Carbon Monoxide Sensor with p-n Switching

物理引导的混合相SnO-SnO$_2$一氧化碳传感器中具有p-n切换的电阻瞬态浓度推断

Sani Biswas, Preetam Singh, Amit Kumar Gangwar

发表机构 * Centro de Modelamiento Matemático, Universidad de Chile & IRL 2807 - CNRS(智利大学数学建模中心及CNRS IRL 2807) Department of Chemical Engineering, Biotechnology and Materials, FCFM, Universidad de Chile(智利大学化学工程、生物技术与材料系) ANID - Millenium Science Initiative, Millenium Nuclei of Advanced MXenes for Sustainable Applications (AMXSA)(ANID-千年科学计划,可持续应用先进MXenes的千年核) CSIR-National Physical Laboratory, Dr. K.S. Krishnan Marg, New Delhi, 110012, India(印度国家物理实验室,Dr. K.S. Krishnan Marg,新德里,110012)

AI总结 提出一种物理引导的机器学习框架,利用混合相SnO-SnO$_2$气体传感器的电阻瞬态信号推断CO浓度,通过物理可解释描述符和频域特征实现p型和n型传感模式下的分类与回归,揭示了p型利于分类、n型利于高保真回归的双模行为。

Comments 15 pages, 14 figures

详情
AI中文摘要

本工作提出一个物理引导的机器学习框架,用于从实验测量的混合相SnO-SnO$_2$材料气体传感器的电阻瞬态信号中推断一氧化碳浓度,该传感器表现出温度依赖的p-n切换行为。周期级瞬态响应通过物理可解释的描述符表示,并辅以紧凑的快速傅里叶变换(FFT)和离散小波变换(DWT)摘要。使用考虑泄漏的分组交叉验证,我们分别研究了p型和n型传感模式下的多类浓度分类和连续浓度回归。在两种模式下,融合特征提供了最强的整体性能,而物理引导的描述符块仍然具有很强的竞争力,表明主要的浓度信息已经编码在物理上有意义的瞬态动力学中。p型分支显示出最佳的浓度类别区分能力,融合随机森林分类器达到约96.5%的准确率,而n型分支产生最佳的定量浓度估计,融合随机森林回归器实现了MAE≈1.48 ppm和R²≈0.992。这些结果揭示了清晰的双模行为:p型传感特别有利于分类,而n型传感更有利于高保真回归。更广泛地说,该研究表明,考虑泄漏的、周期级的、物理引导的机器学习可以将传统的气体传感分析扩展到单一响应指标之外,同时保持物理可解释性。

英文摘要

This work presents a physics-guided machine-learning framework for carbon monoxide concentration inference from experimentally measured resistance transients of a mixed-phase SnO-SnO$_2$ material gas sensor exhibiting temperature-dependent p-n switching behavior. Cycle-level transient responses are represented through physically interpretable descriptors and complemented by compact fast Fourier transform (FFT) and discrete wavelet transform (DWT)-based summaries. Using leakage-aware grouped cross-validation, we study both multi-class concentration classification and continuous concentration regression for the p-type and n-type sensing regimes separately. Across both regimes, fused features provide the strongest overall performance, while the physics-guided descriptor block remains highly competitive, indicating that the dominant concentration information is already encoded in physically meaningful transient dynamics. The p-type branch shows the best concentration-class discrimination, with the fused Random Forest classifier reaching approximately 96.5% accuracy, whereas the n-type branch yields the best quantitative concentration estimation, with the fused Random Forest regressor achieving an MAE ≈ 1.48 ppm and an R² ≈ 0.992. These results reveal a clear dual-regime behavior: p-type sensing is particularly favorable for classification, whereas n-type sensing is more favorable for high-fidelity regression. More broadly, the study demonstrates that leakage-aware, cycle-level, physics-guided machine learning can extend conventional gas-sensing analysis beyond single-response metrics while preserving physical interpretability.

2605.23962 2026-05-26 q-fin.ST cs.LG 版本更新

From Index to Equity: Pre-Training Transformers for Stock Return Prediction

从指数到股票:预训练Transformer用于股票回报预测

Marie Soehl Coolsaet, Roberto Gallardo, Zhen Gao

发表机构 * Faculty of Engineering, McMaster University(麦克马斯特大学工程学院) Department of Economics, Universidad Veracruzana, Mexico(墨西哥韦拉克鲁斯大学经济系)

AI总结 本文研究基于Transformer的股票预测模型,通过在多伦多证券交易所指数上预训练再微调到个股,提升了预测性能,并与LSTM和XGBoost对比。

详情
AI中文摘要

本研究旨在利用机器学习改进股票价格预测,并支持与买入、卖出和持有资产相关的明智投资决策。具体而言,本文研究了基于Transformer的股票预测模型,并考察了预训练策略对预测性能的影响。首先在多伦多证券交易所指数(TSX)上预训练一个Transformer模型以预测日内回报方向,随后在TSX个股上进行微调。该模型进一步适用于回报值回归任务。性能以长短期记忆网络(LSTM)和XGBoost模型为基准进行对比。在市场指数上的预训练将个股预测的二元交叉熵损失从0.69降低到0.64。微调后的Transformer回归模型实现了比基准模型更低的均方误差,尽管集成模型和XGBoost模型实现了更高的平均日回报。此外,开发了一个实际应用以提供实时股票预测用于交易支持。未来工作将集中于增加Transformer模型容量、纳入更广泛的全球技术指标以及过滤掉低可预测性的股票。

英文摘要

This research aims to leverage machine learning to improve stock price prediction and support informed investment decisions related to buying, selling, and holding assets. Specifically, this work investigates transformer-based models for stock prediction and examines the impact of pre-training strategies on forecasting performance. A transformer model was first pre-trained on the Toronto Stock Exchange Index (TSX) to predict intra-day return direction and subsequently fine-tuned on individual TSX stocks. The model was further adapted for return-value regression tasks. Performance was benchmarked against Long Short-Term Memory (LSTM) and XGBoost models. Pre-training on the market index improved the binary cross-entropy loss for individual stock prediction from 0.69 to 0.64. The fine-tuned transformer regression model achieved lower mean squared error than the benchmark models, although the ensemble and XGBoost models achieved higher average daily returns. In addition, a practical application was developed to deliver real-time stock predictions for trading support. Future work will focus on increasing transformer model capacity, incorporating broader global technical indicators, and filtering out stocks with low predictability.
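For context on the reported losses, a binary classifier that always outputs probability 0.5 has a cross-entropy of ln 2 ≈ 0.693. The starting value of 0.69 is therefore close to this chance-level baseline, though the abstract itself does not make that comparison. The helper name below is illustrative.

```python
import math

def binary_cross_entropy(p, y):
    """Cross-entropy of predicted probability p against a 0/1 label y."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# An uninformative prediction of 0.5 costs ln(2) regardless of the label.
print(round(binary_cross_entropy(0.5, 1), 4))
print(round(binary_cross_entropy(0.5, 0), 4))
```

Read this way, a drop to 0.64 indicates a modest but real edge over guessing the return direction.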

2605.23961 2026-05-26 q-bio.BM cs.AI cs.LG 版本更新

Multimodal Alignment and Preference Optimization for Zero-Shot Conditional RNA Generation

多模态对齐与偏好优化用于零样本条件RNA生成

Roman Klypa, Alberto Bietti, Sergei Grudinin

发表机构 * Univ. Grenoble Alpes, CNRS, Grenoble INP, LJK(格勒诺布尔阿尔卑斯大学、法国国家科学研究中心、格勒诺布尔INP、LJK实验室) Center for Computational Mathematics, Flatiron Institute(计算数学中心、Flatiron研究所)

AI总结 提出Moirain框架,通过多模态监督微调和直接偏好优化实现条件RNA序列生成,在零样本条件下生成具有高结合亲和力的生物合理RNA序列。

详情
AI中文摘要

设计能与特定蛋白质相互作用的RNA分子是实验和计算生物学中的一个关键挑战。尽管自然语言建模和基于深度学习的蛋白质设计最近取得了进展,但在提高成功交互频率和生成序列的真实性方面仍有很大空间。在这项工作中,我们将条件RNA序列生成视为一个多阶段对齐问题,引入了Moirain:一组通过多模态监督微调(SFT)和直接偏好优化(DPO)优化的模型。我们的方法从对多样化RNA语料库的大规模预训练开始,以捕捉序列合理性的基本语法。为了实现目标特异性生成,我们采用了一种多模态SFT架构,该架构以蛋白质结构和序列特征为条件进行RNA合成。最后,我们利用DPO使用合成交互数据来优化模型:利用DPO在非对齐偏好空间中导航的独特能力,我们提高了功能适应性,同时不破坏学习到的自然分布。对Moirain系列(Moirain-Base、-Multi和-DPO)的广泛评估表明,与现有基线相比,我们的框架始终能产生新颖、多样且生物合理的RNA序列,并具有优越的结合亲和力。

英文摘要

The design of RNA molecules that interact with specific proteins is a critical challenge in experimental and computational biology. Despite recent progress in natural language modeling and deep learning-based protein design, there remains significant room to improve the frequency of successful interactions and the authenticity of generated sequences for functional applications. In this work, we frame conditional RNA sequence generation as a multi-stage alignment problem, introducing Moirain: a suite of models optimized via multimodal supervised fine-tuning (SFT) and Direct Preference Optimization (DPO). Our approach begins with large-scale pretraining on diverse RNA corpora to capture the fundamental grammars of sequence plausibility. To achieve target-specific generation, we employ a multimodal SFT architecture that conditions RNA synthesis on protein structural and sequential features. Finally, we leverage DPO to refine the model using synthetic interaction data: taking advantage of DPO's unique ability to navigate non-aligned preference spaces, we improve functional fitness without collapsing the learned natural distribution. Extensive evaluation of the Moirain series (Moirain-Base, -Multi, and -DPO) demonstrates that our framework consistently produces novel, diverse, and biologically plausible RNA sequences with superior binding affinities compared to existing baselines.

2605.23960 2026-05-26 q-bio.BM cs.LG 版本更新

Learning Protein Structure-Function Relationships through Knowledge-guided Representation Decomposition

通过知识引导的表示分解学习蛋白质结构-功能关系

Mingqing Wang, Zhiwei Nie, Athanasios V. Vasilakos, Yonghong He, Zhixiang Ren

发表机构 * Shenzhen International Graduate School, Tsinghua University, Shenzhen, China(清华大学深圳国际研究生院) Pengcheng Laboratory, Shenzhen, China(鹏城实验室) School of Electronic and Computer Engineering, Peking University, Shenzhen, China(北京大学电子与计算机工程学院) CAIR, University of Agder, Norway(阿格德大学CAIR) Shanghai Smart Logic Technology Co. Ltd., Shanghai, China(上海智略科技有限公司)

AI总结 提出知识引导的框架ProtDiS,基于信息瓶颈原理分解预训练的蛋白质微环境嵌入,得到更特异、独立和信息高效的结构特征,在12个下游任务上取得一致改进。

Comments 28 pages, 17 figures, ICML 2026 regular

详情
AI中文摘要

蛋白质在复杂的三维结构中编码多样功能,然而大多数深度学习表示仍然高度纠缠,掩盖了功能背后的生物物理信号。本文引入ProtDiS,一个知识引导的框架,将预训练的蛋白质微环境嵌入分解为生物学上可解释且任务相关的维度。受信息瓶颈原理启发,ProtDiS学习平衡信息量和压缩的表示,产生更特异、独立和信息高效的结构特征,并在12个下游任务上取得一致改进,在基于结构的分割下提升最大。蛋白质和残基层面的分析进一步表明,ProtDiS能够区分折叠相似但功能不同的蛋白质,并捕捉关键的细粒度生物物理信号。这些发现表明,知识引导的分解为蛋白质结构建模中的潜在空间结构化提供了一种通用且可解释的方法。源代码和实现细节公开于https://github.com/AI-HPC-Research-Team/ProtDiS。

英文摘要

Proteins encode diverse functions within complex three-dimensional structures, yet most deep learning representations remain highly entangled, obscuring the biophysical signals that underlie function. Here we introduce ProtDiS, a knowledge-guided framework that decomposes pretrained protein micro-environment embeddings into biologically grounded and task-relevant dimensions. Inspired by the information bottleneck principle, ProtDiS learns representations that balance informativeness and compression, yielding structural features that are more specific, independent, and information-efficient, and achieving consistent improvements across twelve downstream tasks, with the largest gains under structure-based splits. Protein- and residue-level analyses further show that ProtDiS differentiates proteins with similar folds but divergent functions and captures critical fine-grained biophysical signals. These findings suggest that knowledge-guided decomposition provides a general and interpretable approach for structuring latent spaces in protein structural modeling. The source code and implementation details are publicly available at https://github.com/AI-HPC-Research-Team/ProtDiS.

2605.23957 2026-05-26 cs.AI cs.LG 版本更新

Low-Cost Labels, Reliable Choices: Rollout-Calibrated Hyper-Heuristics for Job Shop Scheduling

低成本标签,可靠选择:用于作业车间调度的Rollout校准超启发式算法

Junhao Wei, Yanxiao Li, Yifu Zhao, Zhenhong Peng, Baili Lu, Dexing Yao, Haochen Li, Qinbin He, Sio-Kei Im, Yapeng Wang, Xu Yang

发表机构 * Faculty of Applied Sciences, Macao Polytechnic University(澳门理工大学应用科学学院) Pazhou Lab (Huangpu), Guangzhou(广州琶洲实验室(黄埔)) College of Animal Science and Technology, Zhongkai University of Agriculture and Engineering(仲恺农业工程学院动物科学与技术学院) Macao Polytechnic University(澳门理工大学)

AI总结 提出一种基于Rollout校准的超启发式算法,通过遗憾归一化标签、上下文KNN不确定性估计和门控机制,在低成本标签下实现可靠的选择器,显著降低平均RPD。

详情
AI中文摘要

学习辅助的超启发式算法可以在保持构造性作业车间调度问题(JSSP)启发式的可行性和可解释性的同时,选择调度规则。其主要计算成本在于标签生成而非模型拟合,因为每个监督标签通常需要从部分调度中展开候选规则。我们研究了这一标签成本问题以及一个可靠性问题:学习的选择器不应偏离强默认规则,除非预测的增益是可信的。所提出的选择器使用遗憾归一化的展开标签、上下文KNN不确定性估计以及一个门控机制,仅在预测改进超过不确定性调整的边际时采取行动。我们还变化展开深度和广度以衡量成本-质量权衡。在合成JSSP实例上,门控选择器在学习的选择器中实现了最低的平均RPD,接近最佳固定调度规则,并将Random-HH的平均RPD降低了一个数量级以上。

英文摘要

Learning-assisted hyper-heuristics can select among dispatching rules while preserving the feasibility and interpretability of constructive Job Shop Scheduling Problem (JSSP) heuristics. Their main computational cost lies in label generation rather than model fitting, since each supervised label usually requires rolling out candidate rules from a partial schedule. We study this label-cost problem together with a reliability problem: a learned selector should not switch away from a strong default rule unless the predicted gain is credible. The proposed selector uses regret-normalized rollout labels, a contextual KNN uncertainty estimate, and a gate that acts only when the predicted improvement exceeds an uncertainty-adjusted margin. We also vary rollout depth and breadth to measure the cost-quality trade-off. On synthetic JSSP instances, the gated selector achieves the lowest mean RPD among learned selectors, remains close to the best fixed dispatching rule, and reduces Random-HH mean RPD by more than an order of magnitude.
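The gating idea described above, switching away from the default rule only when the predicted gain clears an uncertainty-adjusted margin, can be sketched as follows. The KNN estimate (mean and spread of neighbor gains), the margin form `kappa * uncertainty`, and the rule names are assumptions for illustration. The paper's regret-normalized labels and exact gate may be defined differently.

```python
import math
import statistics

def knn_gain_estimate(query, labeled, k=3):
    """Predict the gain of switching rules from the k nearest labeled states.

    labeled: list of (feature_vector, rollout_gain) pairs.
    Returns (mean gain, population std of the neighbor gains)."""
    nearest = sorted(labeled, key=lambda item: math.dist(query, item[0]))[:k]
    gains = [gain for _, gain in nearest]
    return statistics.fmean(gains), statistics.pstdev(gains)

def gated_rule(default_rule, candidate_rule, gain, uncertainty, kappa=1.0):
    """Leave the default only if the predicted gain exceeds the margin."""
    return candidate_rule if gain > kappa * uncertainty else default_rule

labeled = [((0.0, 0.0), 0.10), ((0.1, 0.0), 0.12),
           ((0.0, 0.1), 0.11), ((5.0, 5.0), -0.50)]
gain, uncertainty = knn_gain_estimate((0.0, 0.0), labeled)
print(gated_rule("SPT", "MWKR", gain, uncertainty))
```

Raising `kappa` makes the selector more conservative, which trades potential improvement for staying close to the strong default rule.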

2605.23956 2026-05-26 cs.AI cs.LG cs.MA 版本更新

QUIVER: A Formal Framework for Quantifying Perturbation Propagation and Bifurcation in Compound AI Systems

QUIVER: 复合AI系统中扰动传播与分岔的量化形式化框架

Prashanti Nilayam, Sankalp Nayak

发表机构 * Servicenow CA, USA(Servicenow加州美国)

AI总结 提出QUIVER形式化框架,通过敏感性矩阵、轨迹散度、分岔阈值和分布忠实度四个组件,量化图结构LLM流水线中扰动传播与结构分岔,并在三个不同架构的企业和公共流水线上验证其有效性。

详情
AI中文摘要

将多个LLM调用链接成有向计算图的复合AI系统现已成为生产AI的主导架构。尽管这些架构利用具有混合模式输出的异构节点,但现有框架无法量化扰动如何通过此类流水线传播,其中节点是随机的且执行路径可能发生结构分岔。我们引入QUIVER,一个用于测量图结构LLM流水线中扰动传播的形式化框架。该框架定义了:(1) 一个敏感性矩阵,带有类型分派的距离度量,将边分类为放大器、吸收器或阈值敏感,并辅以出现提升;(2) 轨迹散度,将变异分解为值漂移、结构路径散度和迭代次数散度;(3) 分岔阈值,识别导致结构执行路径变化的最小扰动;(4) 分布忠实度,量化每个节点评估数据集何时偏离生产分布。我们在两个生产企业流水线和一个公共DSPy多跳QA流水线上进行验证,这三个架构在结构上各不相同。在8200多个仪器化轨迹(32000多对比较)中,我们证明QUIVER揭示了不同架构的独特敏感性剖面,区分了产生相同散度率的机制不同的级联模式,仅从观测数据预测易发生轨迹分岔的节点,并将过时的评估伪影定位到聚合指标无法揭示的特定节点-字段类别。

英文摘要

Compound AI systems that chain multiple LLM calls into directed computation graphs are now the dominant architecture for production AI. Although these architectures leverage heterogeneous nodes with mixed-mode outputs, no existing framework quantifies how perturbations propagate through such pipelines, where nodes are stochastic and execution paths can diverge structurally. We introduce QUIVER, a formal framework for measuring perturbation propagation in graph-structured LLM pipelines. The framework defines: (1) a sensitivity matrix with type-dispatched distance metrics that classifies edges as amplifiers, absorbers, or threshold-sensitive, complemented by occurrence-lift; (2) trajectory divergence, decomposing variation into value drift, structural path divergence, and iteration count divergence; (3) bifurcation thresholds, identifying the smallest perturbation that causes structural execution path changes; and (4) distribution faithfulness, quantifying when per-node evaluation datasets diverge from production distributions. We validate on two production enterprise pipelines and a public DSPy multi-hop QA pipeline, three structurally distinct architectures. Across 8,200+ instrumented traces (32,000+ pair comparisons), we demonstrate that QUIVER reveals distinct sensitivity profiles across architectures, distinguishes mechanistically different cascade patterns producing identical divergence rates, predicts nodes prone to trajectory bifurcation from observational data alone, and localizes stale evaluation artifacts to specific node-field categories that aggregate metrics cannot surface.

2605.23953 2026-05-26 q-fin.TR cs.GT cs.LG 版本更新

Game-Theoretic Modeling of Heterogeneous Investor Interactions for Stock Price Forecasting

用于股票价格预测的异构投资者交互博弈论建模

Yong Zhang, Xinxiao Wu, Yunde Jia, Che Sun

发表机构 * School of Computer Science, Beijing Institute of Technology(北京理工大学计算机科学学院) Faculty of Engineering, Shenzhen MSU-BIT University(深圳MSU-BIT大学工程学院)

AI总结 提出一种嵌入博弈论机制的异构图建模方法,通过模拟投资者动态策略交互来提升股票价格预测准确性。

Comments 10 pages, 1 figure, intended for conference submission

详情
AI中文摘要

准确的股票价格预测一直是支撑量化交易和投资决策的关键但具有挑战性的金融科技任务。最近的努力致力于建模股票市场中股票之间的各种复杂关系,以实现更可靠的股票价格预测。这些方法严重依赖于强大的静态先验假设,通过基于预定义结构建模单个股票的时间依赖性或不同股票之间的空间依赖性,而驱动股票价格变动的复杂市场动态仍未得到探索。为了缓解这一问题,我们提出了一种新颖的博弈论建模方法,捕获异构投资者交互以进行股票价格预测。核心思想是将博弈论机制嵌入到异构图结构中,以精细建模异构投资者相对于目标股票的动态策略交互。此外,采用时间位置编码来反映时间窗口内每个博弈事件在不同时间步对未来股票价格变动的差异化影响。利用异构图网络,我们通过投资者博弈代理股票市场的复杂动态,并在所有节点之间实现实时信息传播和节点更新。在两个真实世界基准数据集上进行的大量实验表明,我们的方法有效优于最先进的股票价格预测方法。

英文摘要

Accurate stock price forecasting has consistently remained a pivotal yet challenging FinTech task that underpins quantitative trading and investment decision making. Recent efforts have been dedicated to modeling various complex relationships among stocks in the stock market toward more reliable stock price forecasting. These methods depend heavily on strong static prior assumptions by modeling either temporal dependencies within individual stocks or spatial dependencies across different stocks based on predefined structures, while the complex market dynamics that drive stock price movements remain unexplored. To alleviate this issue, we propose a novel game-theoretic modeling method that captures heterogeneous investor interactions for stock price forecasting. The core idea is to embed game-theoretic mechanisms into the heterogeneous graph structure to finely model the dynamic strategic interactions among heterogeneous investors with respect to target stocks. Additionally, temporal positional encoding is adopted to reflect the differentiated influences of each game event at different time steps within the time window on future stock price movements. Leveraging heterogeneous graph networks, we proxy the intricate dynamics of the stock market through investor games and enable real-time information propagation and node updates among all nodes. Extensive experiments conducted on two real-world benchmark datasets demonstrate that our method effectively outperforms state-of-the-art stock price forecasting methods.

2605.23939 2026-05-26 cs.AI cs.LG 版本更新

DRIVE: Modeling Skills at the Reasoning and Interaction Levels for Web Agents under Continual Learning

DRIVE:在持续学习下为Web代理建模推理与交互层面的技能

Xirui Liu, Sihang Zhou, Yanning Hou, Rong Zhou, Haoyuan Chen, Maolin He, Siwei Wang, Hao Chen, Jian Huang

发表机构 * College of Intelligence Science and Technology, National University of Defense Technology(智能科学与技术学院,国防科技大学) College of Computer Science and Technology, National University of Defense Technology(计算机科学与技术学院,国防科技大学)

AI总结 提出DRIVE框架,通过将历史经验分离为自然语言推理技能和程序化交互技能,并采用场景感知协调机制,解决Web代理在持续学习中推理与交互知识纠缠的问题,在WebArena上平均任务成功率提升7.3个百分点。

Comments 35 pages, 5 figures

详情
AI中文摘要

Web代理需要高层推理(用于任务分解)和低层交互(用于页面元素操作)来执行不同任务。然而,这些知识类型存在根本差异:推理知识(例如,预订航班需要首先搜索路线)是抽象的且可跨网站迁移,而交互知识(例如,在站点A的特定坐标点击搜索按钮)严重依赖于页面特定上下文。现有方法统一存储经验。这造成了一个困境:抽象表示在具体页面上失去可执行性,而具体表示无法跨领域泛化。这种纠缠限制了能力积累:在新网站上,代理要么因表面差异而无法识别可重用的任务逻辑,要么尝试基于过时页面结构的不可行操作。为了解耦它们,我们提出DRIVE,一个双层技能建模框架,将历史经验分离为自然语言推理技能(捕获可迁移的任务逻辑)和程序化交互技能(将抽象动作接地到可执行操作)。一种场景感知协调机制根据任务语义自适应地检索和调用这些双层技能。DRIVE还使用技能级反思来识别层次特定的失败模式,实现有针对性的技能库扩展和精炼。在五个WebArena领域上的实验表明,DRIVE达到了52.8%的平均任务成功率,比无技能基线高出7.3个百分点。进一步的消融实验显示,推理和交互技能提供了不同且互补的益处,支持将可迁移的任务逻辑与可执行的页面级操作分离。

英文摘要

Web agents require both high-level reasoning (for task decomposition) and low-level interactions (for page element manipulation) to conduct different tasks. However, these knowledge types differ fundamentally: reasoning knowledge (e.g., booking a flight requires first searching for routes) is abstract and transferable across websites, while interaction knowledge (e.g., clicking the Search button at a specific coordinate on Site A) depends heavily on page-specific contexts. Existing methods store experiences uniformly. This creates a dilemma: abstract representations lose executability on concrete pages, while concrete representations fail to generalize across domains. This entanglement limits capability accumulation: on new websites, agents either fail to recognize reusable task logic due to surface-level differences or attempt infeasible actions from outdated page structures. To disentangle them, we propose DRIVE, a dual-level skill modeling framework that separates historical experience into natural language reasoning skills, which capture transferable task logic, and programmatic interaction skills, which ground abstract actions to executable operations. A scene-aware coordination mechanism adaptively retrieves and invokes these dual-level skills based on task semantics. DRIVE also uses skill-level reflection to identify hierarchy-specific failure modes, enabling targeted skill library expansion and refinement. Experiments across five WebArena domains show that DRIVE attains an average task success rate of 52.8%, exceeding the skill-free baseline by 7.3 percentage points. Further ablations show that reasoning and interaction skills provide distinct, complementary benefits, supporting the separation of transferable task logic from executable page-level operations.

2605.23938 2026-05-26 cs.AI cs.CY cs.LG 版本更新

Authority Inversion in LLM-Mediated Ubiquitous Systems: When Models Trust Users Over Sensors

LLM介导的普适系统中的权威倒置:当模型信任用户胜过传感器

Long Zhang, Zi-bo Qin, Wei-neng Chen

发表机构 * School of Computer Science and Engineering, South China University of Technology(华南理工大学计算机科学与工程学院)

AI总结 本研究揭示了大语言模型在融合传感器与用户冲突信息时,由于格式依赖性导致数值传感器数据被自然语言用户主张支配的权威倒置现象,并提出了几何框架、审计指标(CIR和AAI)以及推理时层干预方法(GAC)来诊断和缓解该问题。

详情
AI中文摘要

大语言模型(LLM)越来越多地融合普适系统中的异构输入。然而,当传感器测量值与用户主张冲突时,LLM如何隐式分配权威尚未被研究,这引发了在物理传感必须保持优先级的部署场景中的关键可靠性问题。与显式的传统融合不同,LLM将权威分配隐藏在学习的表示中。我们发现这种分配严重依赖于格式:数值传感器数据未能整合到与答案相关的模型方向中,使得自然语言主张主导最终决策,我们将这种现象称为权威倒置。为了诊断和缓解这一问题,我们开发了一个上下文整合的几何框架,引入了两个可计算的审计指标,即上下文整合比(CIR)和权威对齐指数(AAI),并提出了几何权威校准(GAC),一种推理时的层级干预方法,以抑制错位的用户权威。在四个数据集(共576个冲突实例)上评估四个模型(参数规模4B至35B,三种架构),揭示了极端的倒置:在数值任务上,模型表现出接近零的传感器信任(AAI = -0.805,Cohen's d = -2.14),且不受模型容量影响。验证我们的几何框架,理论引导的因果注入翻转了80.2%的错误决策(随机对照<0.4%)。实际应用中,GAC将HAR准确率从0–1.6%提升至21.9–27.5%,优于提示基线。最终,LLM介导系统中的权威分配必须被显式审计并根据应用特定配置,而不是保持隐式。

英文摘要

Large language models (LLMs) increasingly fuse heterogeneous inputs in ubiquitous systems. Yet, how LLMs implicitly allocate authority when sensor measurements and user claims conflict remains unexamined, raising critical reliability concerns for deployments where physical sensing must retain priority. Unlike explicit traditional fusion, LLMs bury authority allocation within learned representations. We discover this allocation is severely format-dependent: numerical sensor data fails to integrate into answer-relevant model directions, allowing natural-language claims to dominate the final decision, a phenomenon we term Authority Inversion. To diagnose and mitigate this, we develop a geometric framework of context integration, introduce two computable audit metrics, specifically the Context Integration Ratio (CIR) and Authority Alignment Index (AAI), and propose Geometric Authority Calibration (GAC), an inference-time layer-level intervention to suppress misplaced user authority. Evaluating four models (4B to 35B parameters, three architectures) across four datasets totaling 576 conflict instances reveals extreme inversion: on numerical tasks, models exhibit near-zero sensor trust (AAI = -0.805, Cohen's d = -2.14), unaffected by model capacity. Validating our geometric framework, theory-guided causal injection flips 80.2% of incorrect decisions (vs. <0.4% for random controls). Practically, GAC improves HAR accuracy from 0–1.6% to 21.9–27.5%, outperforming prompting baselines. Ultimately, authority allocation in LLM-mediated systems must be explicitly audited and application-specifically configured rather than left implicit.

2605.23936 2026-05-26 cs.AI cs.LG 版本更新

Fuzzy, Neutrosophic, and Uncertain Graph Theory: Properties and Applications

模糊、中智和不确定图论:性质与应用

Takaaki Fujita, Florentin Smarandache

AI总结 本书系统综述了不确定性下的图论,以不确定图框架为核心,统一了模糊、中智等模型,并介绍了扩展图类及其在分子图、决策系统、图神经网络等领域的应用。

Comments 326 pages. Publisher: Neutrosophic Science International Association (NSIA) Publishing House. ISBN: 978-197250204-4

详情
AI中文摘要

本书全面系统地综述了不确定性下的图论,特别强调了不确定图框架的统一作用。它回顾了模糊、中智及相关模型中的基本概念、结构性质、图类和图参数,同时介绍了广泛的扩展,如不确定有向图、超图、超超图和动态图。除了理论发展,本书还探讨了实际应用,包括不确定分子图、决策系统、图神经网络、知识图谱和认知地图。通过从共同视角组织多样化的不确定性感知图模型,本书为理解它们在复杂系统中的关系、能力和应用提供了一个连贯的框架。

英文摘要

This book presents a comprehensive and systematic survey of graph theory under uncertainty, with particular emphasis on the unifying role of the uncertain graph framework. It reviews fundamental concepts, structural properties, graph classes, and graph parameters within fuzzy, neutrosophic, and related models, while also introducing a wide range of extensions such as uncertain digraphs, hypergraphs, superhypergraphs, and dynamic graphs. In addition to theoretical developments, the book explores practical applications, including uncertain molecular graphs, decision-making systems, graph neural networks, knowledge graphs, and cognitive maps. By organizing diverse uncertainty-aware graph models within a common perspective, this work provides a coherent framework for understanding their relationships, capabilities, and applications in complex systems.

2605.23932 2026-05-26 cs.AI cs.CL cs.CY cs.LG 版本更新

When Correct Beliefs Collapse: Epistemic Resilience of LLMs under Clinical Pressure

当正确信念崩溃:LLMs在临床压力下的认知韧性

Boyu Xiao, Xiuqi Tian, Xuwen Song, Haochun Wang, Guanchun Song, Sendong Zhao, Bing Qin

发表机构 * Research Center for Social Computing and Interactive Robotics, Harbin Institute of Technology, China(中国哈尔滨工业大学社会计算与交互机器人研究中心)

AI总结 研究LLMs在临床对话中面对逐步升级压力时信念稳定性问题,提出Med-Stress压力测试框架,发现知识-韧性差距,并设计RBED和R-FT方法提升鲁棒性。

Comments ACL 2026

详情
AI中文摘要

尽管在医学基准测试中准确率很高,但LLMs在临床对话中可能表现出严重的多轮谄媚行为,在逐步升级的压力下放弃最初正确的诊断。我们提出了Med-Stress,一个针对性的压力测试框架,用于评估在逐步升级压力下的信念稳定性。在九个前沿大型语言模型(LLMs)中,我们发现医学知识与鲁棒性之间存在明显的分离:高初始诊断能力并不意味着高信念稳定性,导致多个LLMs存在较大的知识-鲁棒性差距。为了缓解这种失败模式,我们提出了一种轻量级的推理时防御方法RBED(Role-Based Epistemic Defense),以及一种训练时方法R-FT(Resilience-oriented Fine-Tuning),该方法内化了基于证据的抗压能力。实验表明,R-FT几乎消除了信念变化,并显著提高了鲁棒性。

英文摘要

Despite strong medical benchmark accuracy, LLMs can exhibit severe multi-turn sycophancy in clinical dialogue, abandoning an initially correct diagnosis under escalating pressure. We propose Med-Stress, a targeted stress test framework that evaluates belief stability under escalating pressure. Across nine frontier large language models (LLMs), we find a clear dissociation between medical knowledge and robustness: high initial diagnostic capability does not imply high belief stability, yielding large knowledge-robustness gaps for several LLMs. To mitigate this failure mode, we propose a lightweight inference-time defense, RBED (Role-Based Epistemic Defense), and R-FT (Resilience-oriented Fine-Tuning), a training-time approach that internalizes evidence-based resistance to pressure. Experiments show that R-FT nearly eliminates belief change and substantially improves robustness.

2605.23930 2026-05-26 cs.AI cs.LG cs.MA 版本更新

Quantum Frog: Emergent Cooperation and Difficulty Scaling in a Quantized-Time Cooperative Game

量子青蛙:量化时间合作博弈中的涌现合作与难度缩放

Saad Mankarious

AI总结 通过强化学习分析量化时间合作博弈Quantum Frog,发现同步冲刺策略最优,合作训练可大幅提升成功率并缩短回合步数。

详情
AI中文摘要

我们引入了Quantum Frog,这是一个双人合作游戏,基于一种新颖的量化时间机制,其中环境仅在玩家行动时推进。受经典街机游戏Frogger启发,Quantum Frog要求两只青蛙穿越一个8×8的交通网格并一起到达远端。我们使用强化学习(RL)作为分析镜头来回答四个设计问题:(1)游戏难度如何随交通密度缩放,(2)最优单智能体策略是什么以及为什么,(3)独立和合作双智能体游戏之间的合作差距有多大,以及(4)当智能体被激励合作时会出现什么联合策略?我们通过五个升级阶段训练智能体:表格Q学习、深度Q网络(DQN)、独立DQN(IDQN)和带有集中式评论家的多智能体近端策略优化(MAPPO),针对一到六辆车的交通密度进行评估。我们的主要发现是:(i)量化时间机制使得冲刺策略(每一步直接向上移动)普遍最优,因为暴露于交通的时间被最小化;(ii)添加一个不协调的第二玩家比将单个专家玩家的交通量增加六倍更难;(iii)合作训练相对于独立智能体将联合成功率提高了+32–34个百分点,并将回合长度从约90步减少到约6步;(iv)涌现的合作策略是同步冲刺,而不是复杂的位置协调,这表明在时间关键的合作任务中,仅共享激励就足以使智能体对齐。这些发现为Quantum Frog的商业设计提供了具体、经验基础的指导,并为环境机制在塑造多智能体学习动态中的作用提供了更广泛的见解。

英文摘要

We introduce Quantum Frog, a two-player cooperative game built on a novel quantized-time mechanic in which the environment advances only when a player acts. Inspired by the classic arcade game Frogger, Quantum Frog requires two frogs to cross an 8×8 grid of traffic and reach the far side together. We use reinforcement learning (RL) as an analytical lens to answer four design questions: (1) how does game difficulty scale with traffic density, (2) what is the optimal single-agent policy and why, (3) how large is the cooperation gap between independent and cooperative two-agent play, and (4) what joint strategy emerges when agents are incentivised to cooperate? We train agents through five escalating stages, Tabular Q-Learning, Deep Q-Network (DQN), Independent DQN (IDQN), and Multi-Agent Proximal Policy Optimisation (MAPPO with a centralised critic), evaluating each against traffic densities of one to six cars. Our key findings are: (i) the quantized-time mechanic makes a rush strategy (moving directly upward at every step) universally optimal, as time exposure to traffic is minimised; (ii) adding an uncoordinated second player is harder than sextupling the traffic for a single expert player; (iii) cooperative training recovers +32–34 percentage points of joint success rate relative to independent agents and reduces episode length from ~90 to ~6 steps; and (iv) the emergent cooperative strategy is synchronised rushing, not complex positional coordination, illustrating that shared incentives alone suffice to align agents in time-critical cooperative tasks. These findings provide concrete, empirically grounded guidance for the commercial design of Quantum Frog and offer broader insights into the role of environment mechanics in shaping multi-agent learning dynamics.

2605.23926 2026-05-26 cs.AI cs.LG 版本更新

How Much Thinking is Enough? Quantifying and Understanding Redundancy in LLM Reasoning

多少思考才足够?量化和理解LLM推理中的冗余

Zhiyuan Zhai, Xinkai You, Wenjing Yan, Xin Wang

发表机构 * Fudan University(复旦大学) The Chinese University of Hong Kong(香港中文大学)

AI总结 本文通过形式化推理冗余度量,量化了前沿推理模型在数学基准上高达61%-93%的步骤级冗余,并证明这种冗余是长度无关结果奖励的结构性后果,而非模型特定伪影。

详情
AI中文摘要

具备推理能力的大语言模型通过生成长思维链来解决难题,在延迟、GPU时间和能耗上付出高昂代价。粗略检查其轨迹会发现大量重述、验证和循环式自我反思,然而这种深思熟虑中有多少实际上是必要的,从未被大规模度量,也从未从第一性原理得到解释。本文填补了这两个空白。我们直接以推理模型π本身来形式化推理冗余:一个正确轨迹的冗余度ρ,是在模型π被强制终止思考并输出最终答案时仍能得到正确答案的前提下,其尾部可被截断的分段步骤的最大比例。对四个前沿推理模型和两个数学基准的大规模量化表明,步骤级冗余始终很高:在我们研究的8个(模型,基准)条件下介于61%和93%之间,其中六个条件下的中位关键前缀仅为单个分段步骤;该发现对评判模型族的选择是稳健的;并且尽管在MATH-500上ρ随问题难度增加而降低,所有四个模型即使在最难的Level-5问题上仍然显著冗余(ρ∈[46%,85%])。然后我们证明这种冗余是长度无关结果奖励的结构性后果,而非模型特定伪影:在任何此类奖励下,没有有限期望停止时间是最优的。该结果无论RL算法、基础模型、数据分布或策略是通过RL还是蒸馏获得均成立;因此过度思考不是需要在单个模型中修补的缺陷,而是当前推理模型训练方式的结构性属性。代码:https://github.com/zhiyuanZhai20/how-much-thinking-is-enough

英文摘要

Reasoning-capable large language models solve hard problems by emitting long chains of thought, paying heavily in latency, GPU time, and energy. Casual inspection of their traces reveals extensive reformulation, verification, and circular self-reflection, yet how much of this deliberation is actually necessary has never been measured at scale or explained from first principles. This paper closes both gaps. We formalise reasoning redundancy directly in terms of the reasoning model $π$ itself: the redundancy $ρ$ of a correct trace is the largest fraction of its trailing segmented steps that can be truncated while $π$, forced to terminate thinking and emit a final answer, still produces the correct answer. A large-scale quantification across four frontier reasoning models and two mathematical benchmarks shows that step-level redundancy is consistently high, between 61% and 93% across the eight (model, benchmark) conditions we study, with the median critical prefix equal to a single segmented step in six of them; that the finding is robust to the choice of judge family; and that although $ρ$ decreases with problem difficulty on MATH-500, all four models remain substantially redundant ($ρ\in [46\%, 85\%]$) even on the hardest Level-5 problems. We then prove that this redundancy is a structural consequence of length-agnostic outcome rewards, not a model-specific artefact: under any such reward, no finite expected stopping time is optimal. The result holds regardless of RL algorithm, base model, data distribution, or whether the policy is obtained via RL or distillation; over-thinking is therefore not a bug to be patched in individual models but a structural property of how current reasoning models are trained. Code: https://github.com/zhiyuanZhai20/how-much-thinking-is-enough
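
The redundancy definition in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `trace_redundancy`, the `answers_correctly` callback (standing in for "force the model to stop thinking and answer from this prefix"), and the choice to keep at least one step are all assumptions made for illustration.

```python
from typing import Callable, Sequence


def trace_redundancy(
    steps: Sequence[str],
    answers_correctly: Callable[[Sequence[str]], bool],
) -> float:
    """Fraction of trailing steps that can be cut while the answer stays correct.

    `answers_correctly(prefix)` is a hypothetical oracle: it should return True
    when the model, forced to answer after seeing only `prefix`, is correct.
    Assumption: at least one step is always kept.
    """
    total = len(steps)
    if total == 0:
        raise ValueError("trace has no steps")
    # Scan from the shortest prefix upward; the first correct prefix is the
    # critical prefix, and everything after it counts as redundant.
    for kept in range(1, total + 1):
        if answers_correctly(steps[:kept]):
            return (total - kept) / total
    raise ValueError("redundancy is only defined for correct traces")


def toy_oracle(prefix: Sequence[str]) -> bool:
    # Toy stand-in: the answer is recoverable once the key step has appeared.
    return "x = 4" in prefix


trace = ["restate problem", "x = 4", "double-check", "re-verify"]
rho = trace_redundancy(trace, toy_oracle)  # two of four steps are redundant
```

In a real measurement the oracle would involve a model call plus an answer judge, so each prefix check is expensive; the linear scan here is only the simplest way to express the definition.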

2605.23922 2026-05-26 cs.CY cs.AI cs.LG 版本更新

High-Risk AI Systems and the Problem of Identity in the European AI Act

高风险人工智能系统与欧洲人工智能法案中的身份问题

Andrea Ferrario

发表机构 * Institute of Biomedical Ethics and History of Medicine, University of Zürich(苏黎世大学生物医学伦理与医学史研究所) SUPSI, Dalle Molle Institute for Artificial Intelligence (IDSIA)(SUPSI,达勒莫利人工智能研究所) ETH Zürich(苏黎世联邦理工学院)

AI总结 本文通过功能+框架分析欧盟AI法案中高风险AI系统的身份认定问题,提出同步身份测试方法以支持监管审计。

Comments Accepted as a non-archival paper at The 2026 ACM Conference on Fairness, Accountability, and Transparency (FAccT '26), June 25-28, 2026, Montreal, QC, Canada

详情
AI中文摘要

欧盟人工智能法案(AIA)为高风险AI系统建立了一个生命周期治理制度,围绕事前合规评估、上市后监测以及在“重大修改”时重新评估。这些义务预设了AI身份判断:监管机构和提供者必须决定更新后的系统是否随时间保持同一系统。在这项工作中,我们展示了如何通过人工制品身份的功能+框架来澄清这一逻辑,该框架通过预期功能以及适当功能的上下文相关标准(即“AI可信度”)来个体化AI系统。我们进一步论证,AIA没有为同步身份提供内部、可审计的标准——即在给定时间两个AI系统在监管目的上是否应被视为相同——而是基本上将这种相同性判断委托给部门或协调工具。功能+提供了一个以预期功能和可信度概况及水平为基础的同步身份测试,使得同步身份决策在采购、责任和市场监督等治理环境中可检查。我们的贡献是一个概念性和审计视角:我们提供了AIA生命周期义务与功能+身份组件之间的对应图,并通过一个用于审计和争议情境的最小决策流程使同步案例在操作上清晰可读。最后,我们提出两个面向实施的建议:(1)更精确、可测试的预期用途报告,以及(2)标准化、可审计的可信度报告,支持跨时间和跨部署的可比性。

英文摘要

The EU Artificial Intelligence Act (AIA) establishes a lifecycle governance regime for high-risk AI systems built around ex-ante conformity assessment, post-market monitoring, and re-assessment upon "substantial modification." These obligations presuppose AI identity judgments: regulators and providers must decide when an updated system remains the same system over time. In this work, we show how this logic is clarified by the function+ framework of artifact identity, which individuates AI systems by their intended function together with context-sensitive criteria of appropriate functioning, captured as "AI trustworthiness." We further argue that the AIA does not provide an internal, auditable criterion for synchronic identity--when two AI systems at a given time should count as the same for regulatory purposes--and instead largely defers such sameness determinations to sectoral or harmonization instruments. function+ supplies a synchronic identity test anchored in intended function and trustworthiness profiles and levels, making synchronic identity decisions inspectable in governance settings such as procurement, liability, and market surveillance. Our contribution is a conceptual and auditing lens: we provide a correspondence map between AIA lifecycle obligations and function+ identity components, and we make the synchronic case operationally legible via a minimal decision flow for audit and dispute contexts. We conclude with two implementation-facing recommendations: (1) more precise, testable reporting of intended purpose, and (2) standardized, auditable trustworthiness reporting that supports comparability over time and across deployments.

2605.23918 2026-05-26 cs.DC cs.LG cs.PF 版本更新

The Model Parking Tax: Quantifying the Hidden Energy Cost of Always-On GPU Model Deployment

模型停放税:量化始终在线GPU模型部署的隐藏能耗成本

Sai Sathvik Vadari

AI总结 通过跨架构测量,发现GPU空闲功耗由DVFS状态决定,而非显存占用,CUDA上下文贡献了超过98%的停放税,并建立了冷启动盈亏平衡模型。

Comments 7 pages, 3 figures, 5 tables

详情
AI中文摘要

AI推理行业将模型全天候加载在GPU内存中以避免冷启动延迟,隐含地将空闲功耗视为准备就绪的固定成本。然而,这种成本的结构从未被经验性地分解——也从未跨GPU架构进行过。我们首次跨架构测量了空闲GPU功耗作为VRAM分配的函数,结合了18天的生产遥测数据(335,267个样本,14个H100 GPU)以及在三种GPU架构(涵盖三种内存技术:NVIDIA H100(HBM3,80 GB)、A100(HBM2e,80 GB)和L40S(GDDR6,48 GB))上进行的受控剂量-反应实验。我们观察到,在所有三种架构上,空闲功耗是分段常数:CUDA上下文强制进行离散的DVFS转换,消耗比裸空闲多26-66 W(HBM架构上为26-50 W,GDDR6上为66 W),而边际VRAM效应在所有测试设备上均低于测量相关性(|β| < 0.02 W/GB)。无论内存技术如何,CUDA上下文占停放税的98%以上。我们通过在所有三种架构上运行真实的HuggingFace模型(Qwen2.5-7B)验证了这一发现,确认每个设备上空张量与模型加载之间的功耗差异小于0.5 W,并捕获了模型加载期间的冷启动功耗曲线。我们推导出一个冷启动盈亏平衡模型,表明能量最优行为取决于请求到达率和加载延迟——而非模型大小——盈亏平衡间隔为1-5分钟。我们的结果确定了一个在所有测试架构上一致的约束:带上下文的空闲功耗由DVFS状态决定,而非内存占用。

英文摘要

The AI inference industry keeps models loaded in GPU memory around the clock to avoid cold-start latency, implicitly treating idle power as a fixed cost of readiness. Yet the structure of this cost has never been empirically decomposed - and never across GPU architectures. We present the first cross-architecture measurement of idle GPU power as a function of VRAM allocation, combining 18 days of production telemetry (335,267 samples, 14 H100 GPUs) with controlled dose-response experiments on three GPU architectures spanning three memory technologies: NVIDIA H100 (HBM3, 80 GB), A100 (HBM2e, 80 GB), and L40S (GDDR6, 48 GB). We observe that idle power is piecewise constant on all three architectures: the CUDA context forces a discrete DVFS transition consuming +26-66 W over bare idle (26-50 W on HBM architectures, 66 W on GDDR6), while the marginal VRAM effect is bounded below measurement relevance ($|β| < 0.02$ W/GB) on every device tested. The CUDA context accounts for >98% of the parking tax regardless of memory technology. We validate this finding with a real HuggingFace model (Qwen2.5-7B) on all three architectures, confirming <0.5 W difference from empty tensors on every device, and capture cold-start power profiles during model loading. We derive a cold-start breakeven model showing energy-optimal behavior depends on request arrival rate and loading latency - not model size - with breakeven intervals of 1-5 minutes. Our results identify a constraint consistent across all tested architectures: idle-with-context power is determined by DVFS state, not memory occupancy.
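
The breakeven idea in this abstract can be sketched with a simple energy balance. This is an assumed simplification, not the paper's derivation: keeping a model parked costs the extra idle power times the gap between requests, unloading costs one reload, and the breakeven gap is where the two are equal. The function names and all numbers below are hypothetical.

```python
def breakeven_idle_seconds(parking_watts: float, reload_joules: float) -> float:
    """Idle gap at which parking energy equals the energy of one cold start.

    parking_watts: extra idle power drawn while the model stays loaded
                   (above bare idle); assumed constant.
    reload_joules: extra energy spent reloading the model after an unload.
    """
    if parking_watts <= 0:
        raise ValueError("parking_watts must be positive")
    return reload_joules / parking_watts


def cheaper_to_unload(expected_gap_seconds: float,
                      parking_watts: float,
                      reload_joules: float) -> bool:
    """True when the expected gap between requests exceeds the breakeven gap."""
    return expected_gap_seconds > breakeven_idle_seconds(parking_watts, reload_joules)


# Illustrative numbers only (not measurements from the paper):
# 50 W of parking overhead, and a reload drawing 200 W extra for 30 s.
gap = breakeven_idle_seconds(parking_watts=50.0, reload_joules=200.0 * 30.0)
```

Under this simplification the model's size enters only through how long and how power-hungry the reload is, which is in line with the abstract's statement that the optimum depends on arrival rate and loading latency.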

2605.23909 2026-05-26 cs.AI cs.LG 版本更新

Confidence Calibration in Large Language Models

大型语言模型中的置信度校准

Noam Michael, Daniel BenShushan, Jacob Bien, Don A. Moore

发表机构 * U.C. Berkeley(加州大学伯克利分校) University of Southern California(南加州大学)

AI总结 通过预注册研究,发现大型语言模型(LLMs)的置信度普遍高于准确率,且存在显著的难易效应:困难测试中过度自信,简单测试中信心不足,并提出了LifeEval测试用于评估不同难度下的模型校准。

详情
AI中文摘要

我们研究了大型语言模型(LLMs)在不同任务上的置信度校准情况。预注册研究的结果表明,当前一批LLMs与人类一样,过于确信自己是正确的:平均而言,置信度超过了准确率。然而,重要的是,这种趋势受到强大的难易效应的调节,即在困难测试中过度自信最为严重;相比之下,简单测试实际上显示出明显的信心不足。我们开发了LifeEval,一个用于评估不同难度水平下模型校准的测试。

英文摘要

We investigate the calibration of large language models' (LLMs') confidence across diverse tasks. The results of our preregistered study show that the current crop of LLMs are, like people, too sure they are right: confidence exceeds accuracy, on average. Importantly, however, this tendency is moderated by a powerful hard-easy effect, wherein overconfidence is greatest on difficult tests; by contrast, easy tests actually show substantial underconfidence. We develop LifeEval, a test for evaluating model calibration across levels of difficulty.
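
The comparison of confidence and accuracy described here can be illustrated with a short sketch. It assumes the simplest possible measure, mean stated confidence minus accuracy, which may differ from the paper's exact analysis; the function name and the toy data are hypothetical.

```python
from typing import Iterable, Tuple


def overconfidence(records: Iterable[Tuple[float, bool]]) -> float:
    """Mean stated confidence minus accuracy.

    records: (confidence in [0, 1], whether the answer was correct) pairs.
    Positive values indicate overconfidence, negative values underconfidence.
    """
    rows = list(records)
    if not rows:
        raise ValueError("no records")
    mean_confidence = sum(conf for conf, _ in rows) / len(rows)
    accuracy = sum(1 for _, correct in rows if correct) / len(rows)
    return mean_confidence - accuracy


# Toy data, invented to mimic a hard-easy pattern (not results from the paper).
hard_test = [(0.9, False), (0.8, True), (0.7, False), (0.8, False)]
easy_test = [(0.7, True), (0.8, True), (0.6, True), (0.7, False)]

hard_gap = overconfidence(hard_test)  # confidence well above accuracy
easy_gap = overconfidence(easy_test)  # confidence slightly below accuracy
```

Computing the gap separately per difficulty level is what makes a hard-easy effect visible; a single pooled number would average the two tendencies together.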

2605.22800 2026-05-26 cs.LG cs.AI stat.ML 版本更新

The Matching Principle: A Geometric Theory of Loss Functions for Nuisance-Robust Representation Learning

匹配原则:面向干扰鲁棒表示学习的损失函数几何理论

Vishal Rajput

发表机构 * KU Leuven(根特大学)

AI总结 提出匹配原则,通过估计任务协方差矩阵并匹配惩罚矩阵的像空间,统一了多种鲁棒性方法,并在线性高斯模型中证明最优性。

Comments 58 pages, 13 pre-specified empirical blocks. v2: partial-pass framing, geometry-task dissociation, T2B protocol v3, layout/figure fixes; core theorems unchanged. Code: matching-pmh (PyPI). Related note: arXiv:2604.21395

详情
AI中文摘要

鲁棒性、领域自适应、光度/遮挡不变性、传感器漂移和对齐风格被视为独立的文献领域,拥有各自独立的方法族。在标签保持的部署偏移下,它们共享一个几何对象:协方差 Sigma_task = Cov_{Q_n}(n),即输入在标签不变的情况下可以变化的方式。CORAL、对抗训练、数据增强、度量学习、雅可比惩罚和对齐约束并非独立的技巧——它们都是 Sigma_task 的估计量。固定该对象后,雅可比惩罚由一个矩阵 Sigma' 确定,其像空间必须覆盖 range(Sigma_task)——即匹配原则。我们在线性高斯模型中证明了最优性(定理A),证明了任何能够消除部署漂移的二次惩罚都需要像空间覆盖(定理G),并在全局最小值处证明了相同的二分性(定理A*_global)。错误方向/信号对齐控制(引理C;推论E/E*)以及七个估计量(引理D1-D7),加上无标签TDI,为需要学习 Sigma_task 的情况提供了可证伪的配方。在十三个模块(从ML到Qwen2.5-7B)上,测试了匹配的、各向同性的和错误方向的惩罚对几何和部署漂移的影响。其中十二个模块与可识别性成立的理论一致;Office-31是一个命名的特征间隙失败案例。部分通过:几何可以在不改善每个头条任务指标的情况下提升。一次初步的7B DPO运行(一个epoch,240对):匹配风格-PMH保持了风格TDI,而标准DPO则使其退化。我们不声称标准训练达到全局最小值(假设(O)是开放的),不声称估计的 Sigma_task 总是可识别的,也不声称在每个排行榜上占优。我们提出一个可证伪的设计配方:估计 Sigma_task,匹配 Sigma',运行控制,分别报告任务和几何指标。

英文摘要

Robustness, domain adaptation, photometric/occlusion invariance, sensor drift, and alignment style are treated as separate literatures with separate method families. Under label-preserving deployment shift they share one geometric object: the covariance Sigma_task = Cov_{Q_n}(n) of ways inputs can change without changing the label. CORAL, adversarial training, augmentation, metric learning, Jacobian penalties, and alignment constraints are not independent tricks--they are estimators of Sigma_task. Fix that object and the Jacobian penalty is pinned by a matrix Sigma' whose range must cover range(Sigma_task)--the matching principle. We prove optimality in a linear-Gaussian model (Thm. A), necessity of range coverage for any quadratic penalty that zeros deployment drift (Thm. G), and the same dichotomy at global minima (Thm. A*_global). Wrong-direction/signal-aligned controls (Lemma C; Cor. E/E*) and seven estimators (Lemmas D1--D7), plus label-free TDI, yield a falsifiable recipe when Sigma_task must be learned. Thirteen blocks (ML through Qwen2.5-7B) test matched vs isotropic vs wrong-direction penalties on geometry and deployment drift. Twelve match theory where identifiability holds; Office-31 is a named eigengap failure. Partial passes: geometry can improve without every headline task metric moving. A pilot 7B DPO run (one epoch, 240 pairs): matched style-PMH preserves Style TDI where standard DPO degrades it. We do not claim standard training reaches global minima (assumption (O) is open), that estimated Sigma_task is always identifiable, or dominance on every leaderboard. We claim a falsifiable design recipe: estimate Sigma_task, match Sigma', run the controls, report task and geometry separately.

2605.22532 2026-05-26 cs.LG 版本更新

Relational Linear Properties in Language Models: An Empirical Investigation

语言模型中的关系线性性质:一项实证研究

Giovanni Valer, Luigi Gresele, Marco Bronzini, Emanuele Marconato

发表机构 * University of Copenhagen(哥本哈根大学) University of Trento(特伦托大学) University of Bologna(博洛尼亚大学) University of Pisa(比萨大学)

AI总结 本文提出基于KL散度的探针方法,实证检验语言模型中关系线性假设(即固定关系下对象解嵌入可由主体嵌入线性映射预测),发现其随模型、层和关系表述变化。

详情
AI中文摘要

线性性质在语言模型的表示中普遍存在;然而,实验性地测试它们仍然是一项具有挑战性的任务。本文聚焦于关系线性:即对于固定关系(例如“演奏”),对象的解嵌入(例如“小号”)可以通过线性映射从其主体(例如“迈尔斯·戴维斯”)的嵌入预测。我们提出了一种实验方法,用于测试Marconato等人(2025)提出的关系线性公式。具体而言,我们引入了一种基于KL散度的探针方法来评估这一性质,并考察其在不同层和不同表述的关系查询中的变化。该方法也比先前工作更高效;例如,它避免了Hernandez等人(2024)在线性关系嵌入中使用的粗略雅可比近似。我们在四个数据集上的发现表明,关系线性在不同模型间存在差异,展现出与先前关于模型表示中语言信息的观察一致的逐层模式,并且受关系表述方式变化的影响不同。

英文摘要

Linear properties are ubiquitous in the representations of language models; however, testing them experimentally remains a challenging task. This work focuses on relational linearity: the hypothesis that, for a fixed relation (e.g., "plays"), the unembedding of an object (e.g., "trumpet") can be predicted from the embedding of its subject (e.g.,"Miles Davis") by a linear map. We present an experimental method to test the formulation of relational linearity by Marconato et al. (2025). Specifically, we introduce a probing method, based on Kullback-Leibler divergence, to evaluate this property and examine its variation across layers and paraphrased relational queries. It is also more efficient than previous work; for example, it avoids the crude Jacobian approximations used in Linear Relational Embeddings by Hernandez et al. (2024). Our findings across four datasets show that relational linearity varies across models, exhibits layer-wise patterns consistent with prior observations about linguistic information in model representations, and is differently affected by changes in how the relation is phrased.

2605.22005 2026-05-26 cs.LG cs.AI cs.CL 版本更新

Check Your LLM's Secret Dictionary! Five Lines of Code Reveal What Your LLM Learned (Including What It Shouldn't Have)

检查你的大语言模型的秘密词典!五行代码揭示你的大语言模型学到了什么(包括它不应该学到的)

Hisashi Miyashita

发表机构 * Mgnite Inc.(Mgnite公司)

AI总结 通过对lm_head权重矩阵进行奇异值分解(仅需五行PyTorch代码且无需模型推理),直接从模型权重中揭示可解释的语义子空间,并发现模型训练数据组成和策展哲学。

详情
AI中文摘要

我们展示了基于Transformer的大语言模型的lm_head权重矩阵的奇异值分解——仅需五行PyTorch代码且无需模型推理——直接从模型权重中揭示可解释的语义子空间。每个左奇异向量识别出当隐藏状态与相应奇异方向对齐时最容易被选中的词汇标记;检查这些聚类揭示了模型的训练数据组成和策展哲学。 分析GPT-OSS-120B、Gemma-2-2B和Qwen2.5-1.5B,我们发现奇异值谱和词汇聚类结构在不同模型间存在系统性差异:GPT呈现出功能分化子空间的渐进层次;Gemma以19世纪前的英语正字法为主,形成阶梯式聚类结构,这可能有助于高输出可控性;Qwen展现出广泛的多语言覆盖,同时其子空间的词汇被作者认为在伦理上不适合直接发表。 基础-指令对比表明,伦理上令人担忧的子空间源自预训练,并且不会被后训练对齐移除。我们引入词汇聚类得分(VCS)来量化子空间一致性,以及加权投影得分(WPS)作为静态故障标记检测器;将WPS应用于GPT-OSS-120B,无需任何模型推理即可恢复shokubutsu-hyakka-tsu(ID 137606),这是CJK语言社区中广泛报道的一个著名故障标记。我们提出了问题词汇内容根本原因的分类法,并呼吁将lm_head SVD分析作为标准发布前安全审计步骤。我们的发现进一步指出了SVD引导的分词器优化和更可控的大语言模型设计方向。

英文摘要

We show that singular value decomposition of the lm_head weight matrix of a transformer-based large language model (requiring only five lines of PyTorch and no model inference) reveals interpretable semantic subspaces directly from the model weights. Each left singular vector identifies the vocabulary tokens most readily selected when the hidden state aligns with the corresponding singular direction; inspecting these clusters exposes the model's training data composition and curation philosophy. Analysing GPT-OSS-120B, Gemma-2-2B, and Qwen2.5-1.5B, we find that singular value spectra and vocabulary cluster structures differ systematically across models: GPT exhibits a graduated hierarchy of functionally differentiated subspaces; Gemma is dominated by pre-nineteenth-century English orthography, forming a stepwise clustering structure that may contribute to high output controllability; and Qwen exhibits broad multilingual coverage alongside subspaces whose vocabulary the authors have determined to be ethically inappropriate for direct publication. Base-instruct comparison reveals that ethically concerning subspaces originate in pretraining and are not removed by post-training alignment. We introduce the Vocabulary Cluster Score (VCS) to quantify subspace coherence, and the Weighted Projection Score (WPS) as a static glitch token detector; applying WPS to GPT-OSS-120B recovers shokubutsu-hyakka-tsu (ID 137606), a well-known glitch token widely reported in the CJK language community, without any model inference. We propose a taxonomy of root causes for problematic vocabulary content and call for lm_head SVD analysis to be adopted as a standard pre-release safety auditing step. Our findings further suggest directions toward SVD-guided tokenizer optimisation and more controllable LLM design.
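
The idea of reading vocabulary clusters off the left singular vectors can be illustrated on a toy matrix. The sketch below is not the paper's five-line PyTorch snippet: it is a dependency-free stand-in that recovers only the leading left singular vector by power iteration, and the function name, the one-row-per-token orientation, and the tiny weight matrix are all assumptions.

```python
import math
from typing import List, Sequence


def top_singular_tokens(weight: Sequence[Sequence[float]],
                        vocab: Sequence[str],
                        k: int = 2,
                        iters: int = 100) -> List[str]:
    """Tokens with the largest |entry| in the leading left singular vector.

    weight: toy stand-in for an output-projection matrix, assumed to have
            one row per vocabulary token and one column per hidden dimension.
    """
    n_rows, n_cols = len(weight), len(weight[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols  # initial right-vector guess
    for _ in range(iters):
        # u = W v, then v = W^T u: one power-iteration step on W^T W.
        u = [sum(row[j] * v[j] for j in range(n_cols)) for row in weight]
        v = [sum(weight[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # The (unnormalised) leading left singular vector has one entry per token.
    u = [sum(row[j] * v[j] for j in range(n_cols)) for row in weight]
    ranked = sorted(range(n_rows), key=lambda i: abs(u[i]), reverse=True)
    return [vocab[i] for i in ranked[:k]]


# Toy matrix: two tokens load on hidden dimension 0, two on dimension 1.
toy_vocab = ["cat", "dog", "paris", "tokyo"]
toy_weight = [[3.0, 0.0], [2.5, 0.0], [0.0, 1.0], [0.0, 0.8]]
cluster = top_singular_tokens(toy_weight, toy_vocab)
```

For a real model one would use a full SVD routine from a numerical library rather than power iteration, and inspect many singular directions instead of only the first.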

2605.20670 2026-05-26 cs.LG 版本更新

LT2: Linear-Time Looped Transformers

LT2: 线性时间循环Transformer

Chunyuan Deng, Yizhe Zhang, Rui-Jie Zhu, Yuanyuan Xu, Jiarui Liu, T. S. Eugene Ng, Hanjie Chen

发表机构 * Rice University(莱斯大学) Apple(苹果公司) UC Santa Cruz(加州大学圣克鲁兹分校) Carnegie Mellon University(卡内基梅隆大学)

AI总结 提出LT2系列架构,用次二次线性时间注意力替代二次softmax注意力,通过循环实现线性注意力中的迭代记忆精炼和稀疏注意力中的有效感受野扩展,在召回、状态跟踪和语言建模任务上取得一致提升,并展示了混合变体在效率和性能上的优势。

详情
AI中文摘要

循环Transformer(LT)通过在解码最终token之前多次迭代其层,已成为一种强大的架构。然而,将其与全注意力配对会保留二次复杂度,使其计算昂贵且速度慢。我们引入了LT2(线性时间循环Transformer),这是一系列循环架构,用次二次、线性时间注意力替代二次softmax注意力。我们研究了两种变体:具有线性注意力的LT2-linear和具有稀疏注意力的LT2-sparse。我们发现循环与这些变体独特地协同作用:它在线性注意力中实现迭代记忆精炼,并在稀疏注意力中逐步扩展有效感受野。我们从理论上形式化了这些优势,并在受控的召回、状态跟踪和语言建模任务中展示了一致的经验提升。然后我们探索了LT2-hybrid,它在循环设置中结合了不同的注意力变体。两种变体尤其有前景:LT2-hybrid (GDN+DSA),它交错使用线性和稀疏注意力以最大化效率,并以完全线性时间成本匹配标准循环Transformer的质量;以及LT2-hybrid (Full+GDN),它将GDN与一小部分全注意力层交错使用以最大化质量,在性能和效率上都超过了标准循环Transformer。我们还展示了如何将预训练的LT转换为LT2-hybrid模型。经过约10亿token的训练,我们的转换模型Ouro-hybrid-1.4B在性能上优于行业级别的10亿参数模型,并与行业级别的40亿参数模型竞争,同时保留了线性时间注意力的速度优势。这些结果共同展示了使循环Transformer更具可扩展性并推进高效、有能力的小型语言模型的清晰路径。

英文摘要

Looped Transformers (LT) have emerged as a powerful architecture by iterating their layers multiple times before decoding the final token. However, pairing them with full attention retains quadratic complexity, making them computationally expensive and slow. We introduce LT2 (Linear-Time Looped Transformers), a family of looped architectures that replace quadratic softmax attention with subquadratic, linear-time attention. We study two variants: LT2-linear with linear attention and LT2-sparse with sparse attention. We find that looping uniquely synergizes with these variants: it enables iterative memory refinement in linear attention and progressively expands the effective receptive field in sparse attention. We formalize these benefits theoretically and demonstrate consistent empirical gains across controlled recall, state-tracking, and language modeling tasks. We then explore LT2-hybrid, which combines different attention variants in a looped setting. Two variants are especially promising: LT2-hybrid (GDN+DSA), which interleaves linear and sparse attention to maximize efficiency and matches the standard looped transformer's quality at fully linear-time cost; and LT2-hybrid (Full+GDN), which interleaves GDN with a small fraction of full attention layers to maximize quality, surpassing the standard looped transformer in both performance and efficiency. We also show how to convert a pre-trained LT into an LT2-hybrid model. With about 1B tokens of training, our converted model, Ouro-hybrid-1.4B, outperforms industry-level 1B models and is competitive with industry-level 4B models while retaining the speed benefits of linear-time attention. Together, these results show a clear path toward making looped transformers more scalable and advancing efficient, capable small language models.

2605.20490 2026-05-26 cs.AI cs.LG 版本更新

ECUAS$_n$: A family of metrics for principled evaluation of uncertainty-augmented systems

ECUAS$_n$: 一种用于原则性评估不确定性增强系统的度量族

Lautaro Estienne, Erik Ernst, Matías Vera, Pablo Piantanida, Luciana Ferrer

发表机构 * School of Engineering, UBA, Argentina(阿根廷UBA工程学院) ICC, CONICET-Universidad de Buenos Aires, Argentina(阿根廷CONICET-布宜诺斯艾利斯大学ICC) LISN, CNRS, Université Paris-Saclay, France(法国CNRS巴黎萨克雷大学LISN) International Laboratory on Learning Systems, Canada(加拿大学习系统国际实验室) CSC, CONICET, Argentina(阿根廷CONICET CSC) Mila - Quebec AI Institute, Canada(加拿大魁北克AI研究所Mila) CNRS, Université Paris-Saclay, France(法国CNRS巴黎萨克雷大学)

AI总结 针对高风险自动决策中不确定性增强系统的评估问题,提出一种基于适当评分规则的度量族 ECUAS$_n$,通过参数 $n$ 平衡错误预测成本与不确定性质量,并在分类和生成数据集上验证其理论优势与实证效果。

Comments pre-print, 9-pages paper, 25 pages total

详情
AI中文摘要

在高风险自动决策中,获取预测不确定性对于使用户(人类或下游系统)能够根据应用特定的成本权衡接受或拒绝预测至关重要。这种不确定性增强(UA)系统——即同时输出预测和不确定性分数的系统——目前在文献中以多种方式被评估,包括使用单独的指标评估预测和不确定性分数、设置固定拒绝成本的成本函数或对覆盖-风险曲线进行积分。我们认为这些评估方法不足以评估UA系统在不确定性下决策的整体性能,并提出了一种新的度量族ECUAS$_n$,将其表述为感兴趣任务的适当评分规则。参数$n$根据用例需求控制错误预测成本与不完美不确定性之间的权衡。我们通过在不同分类和生成数据集(包括TriviaQA的手动注释子集)上的实验,从理论和实证两方面展示了ECUAS$_n$度量的优势。

英文摘要

In high-stakes automated decision-making, access to predictive uncertainty is essential for enabling users -- human or downstream systems -- to accept or reject predictions based on application-specific cost trade-offs. Such uncertainty-augmented (UA) systems -- i.e., systems that output both predictions and uncertainty scores -- are currently being assessed in the literature in a variety of ways, using separate metrics to evaluate the predictions and the uncertainty scores, setting a cost function with a fixed rejection cost or integrating over a coverage-risk curve. We argue that these evaluation approaches are inadequate for assessing overall performance of the UA system for decision making under uncertainty and propose a novel family of metrics, ECUAS$_n$, formulated as proper scoring rules for the task of interest. The parameter $n$ controls the trade-off between the cost of incorrect predictions and imperfect uncertainties depending on the needs of the use-case. We demonstrate the advantages of the ECUAS$_n$ metrics both theoretically and empirically, through experiments on diverse classification and generation datasets, including a manually annotated subset of TriviaQA.

2605.19021 2026-05-26 cs.LG 版本更新

Deep Neural Sheaf Diffusion

深度神经层扩散

Rémi Bourgerie, Šarūnas Girdzijauskas, Viktoria Fodor

发表机构 * School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden(电气工程与计算机科学学院,皇家理工学院,斯德哥尔摩,瑞典)

AI总结 针对图神经网络深层堆叠导致表示崩溃的问题,提出用层邻接算子替代层拉普拉斯算子,结合归一化、奇非线性函数和门控机制,在合成和真实数据集上显著提升深层网络性能。

Comments Accepted at the ICML 2026 Workshop on Graph Foundation Models (GFM@ICML 2026). Code available at https://github.com/remibourgerie/deep-neural-sheaf-diffusion

详情
AI中文摘要

深度图神经网络对于捕捉图结构数据中的复杂依赖关系至关重要。然而,将GNN扩展到深层仍具挑战性,因为堆叠层会导致表示崩溃和由于重复聚合导致的敏感性降低。虽然神经层扩散(NSD)提供了针对这种崩溃的强理论保证,但这些保证在实践中并未实现:随着深度增加,层拉普拉斯算子的不一致信号消失,限制了更深层的贡献。我们识别了阻碍NSD在深度上有效性的机制,并提出了深度神经层扩散(DNSD),它用层邻接算子替换层拉普拉斯算子,以在层间保持信息信号。这辅以归一化、奇非线性函数和门控。为了对预期性能改进提供原则性解释,我们将层扩散与图注意力机制进行对比,强调DNSD用矩阵值边函数替换标量注意力分数,并归一化节点表示而非注意力分数。我们通过实验证明,DNSD在图任务中有效利用深层聚合,在合成长程数据集上准确率比GNN和NSD基线最多高出30个百分点,并在真实世界基准上持续优于它们。这些结果表明,基于层的架构能够支持有效的深层网络,有望成为图基础模型的构建模块。

Abstract

Deep Graph Neural Networks (GNNs) are essential for capturing complex dependencies in graph-structured data. However, scaling GNNs to depth remains challenging, as stacking layers leads to representation collapse and diminishing sensitivity due to repeated aggregation. While Neural Sheaf Diffusion (NSD) provides strong theoretical guarantees against such collapse, these guarantees do not translate to practice: as depth increases, the disagreement signal of the sheaf Laplacian vanishes, limiting the contribution of deeper layers. We identify mechanisms that hinder NSD effectiveness at depth and propose Deep Neural Sheaf Diffusion (DNSD), which replaces the sheaf Laplacian with a sheaf adjacency operator to maintain informative signals across layers. This is complemented by normalization, odd nonlinearities, and gating. To provide a principled explanation of the expected performance improvement, we contrast sheaf diffusion with graph attention mechanisms, highlighting that DNSD replaces scalar attention scores with matrix-valued edge functions and normalizes node representations rather than attention scores. We demonstrate empirically that DNSD effectively utilizes deep aggregation in graph tasks, outperforming GNN and NSD baselines by up to 30 percentage points in accuracy on synthetic long-range datasets, and consistently outperforming them on real-world benchmarks. These results position sheaf-based architectures as a promising building block for graph foundation models by supporting effective deep architectures.

2605.18932 2026-05-26 cs.LG cs.AI Version update

HypergraphFormer: Learning Hypergraphs from LLMs for Editable Floor Plan Generation

Nikita Klimenko, Hesam Salehipour, Parham Eftekhar, Amir Khasahmadi, Ramon Elias Weber

Affiliations Autodesk Research; York University; UC Berkeley

AI summary Proposes HypergraphFormer, which uses a large language model to learn hypergraph representations for floor plan generation, outperforms existing methods on the RPLAN dataset, and supports arbitrary boundaries and a high degree of editability.

Abstract

In this work, we propose HypergraphFormer, a novel and efficient approach to floor plan generation based on learning hypergraph representations with a large language model (LLM). The model is trained via supervised fine-tuning to generate a hypergraph-based textual representation that encodes spatial relationships and connectivity information within floor plans. We train and evaluate our approach on the RPLAN dataset, and further demonstrate its generalizability on a separate out-of-distribution dataset, which we release in this paper. Our method outperforms state-of-the-art techniques based on rasterized or vectorized representations across a diverse set of metrics. We also show improved data efficiency, particularly under distribution shift. The hypergraph formulation enables the generation of floor plans for arbitrary, irregular, user-specified boundaries by decoupling apartment footprints from their functional and geometric subdivisions. Furthermore, we show that the proposed methodology offers a high degree of editability, making it particularly well suited to design-oriented workflows supported by LLMs.

2605.18224 2026-05-26 cs.LG cs.AI Version update

A Simplex Witness Certificate for Constant Collapse in Variational Autoencoders

Zegu Zhang, Jianhua Peng, Jian Zhang

Affiliations Independent Researcher; School of Computing, Southeast University

AI summary Proposes a certificate, built from a GMM teacher posterior and a simplex witness, for detecting and quantifying input-independent constant collapse of the VAE encoder mean, and validates it on MNIST, CIFAR-10, and CIFAR-100.

Abstract

We study exact constant collapse in variational autoencoders: the deterministic encoder mean becomes independent of the input. The prior remains the standard Gaussian. Before VAE training, we select a fixed teacher posterior from a GMM-based view of the data and attach a fixed latent-only simplex witness to the encoder mean. This construction yields two linked objects. The first is a certificate: if the witness prediction improves on the best constant predictor of the teacher, the encoder mean cannot be an input-independent constant. The second is a local escape direction: on the collapsed manifold, the teacher residual gives a sample-dependent descent direction for the alignment loss. For any full-support teacher posterior, the same geometry also gives a closed-form latent code with zero teacher-witness alignment error. Its scaled versions trace a margin-energy path from the constant predictor to the exact teacher code, which quantifies non-collapse inside the protected witness subspace. We instantiate the method on MNIST, CIFAR-10, and CIFAR-100. With searched unsupervised PCA-GMM teachers, vanilla VAEs fail the teacher-witness certificate in all five seeds on CIFAR-10 and CIFAR-100, while RST variants pass in all five seeds. Under collapse-stress settings with \(\beta_{\mathrm{KL}} \in \{2,4,8\}\), the vanilla VAE again fails in all seeds, whereas RST-alpha-prefit remains certificate-positive. Escape trajectories on both natural-image datasets increase the witness margin from a low-margin initialization and exhibit nonzero teacher-induced gradient norms. The analysis is confined to exact constant collapse of the encoder mean; generation quality, decoder use, and other collapse modes remain separate questions.

2605.17606 2026-05-26 cs.LG Version update

The Neural Tangent Kernel for Classification

Jonathan Plenk, Sergio Calvo-Ordonez, Alvaro Cartea, Yarin Gal, Mark van der Wilk, Kamil Ciosek

Affiliations Mathematical Institute, University of Oxford; Oxford-Man Institute of Quantitative Finance, University of Oxford; OATML, University of Oxford; Department of Computer Science, University of Oxford; Spotify

AI summary Extends neural tangent kernel theory to classification by identifying conditions under which wide neural networks remain in the lazy training regime under classification losses, and analyzes how parameter regularization affects kernel constancy and how the distribution of trained predictors relates to Bayesian methods.

Comments Preprint

Abstract

In wide neural networks, the Neural Tangent Kernel (NTK) remains approximately constant during training, providing a powerful theoretical tool for studying training dynamics, generalization, and connections to kernel methods. However, this theory is largely restricted to regression losses. It was previously thought that training on a classification loss, or more generally losses involving nonlinear output transformations, breaks this property, leading to divergent logits and a breakdown of the linearization. In this paper, we extend NTK theory to classification by identifying conditions under which wide neural networks remain in the lazy training regime. We show that parameter-space regularization ensures a constant NTK during training for cross-entropy loss, while in the absence of regularization the regime is recovered when targets are non-degenerate, i.e. when all classes have strictly positive probability. Under these conditions, training is well-approximated by the linearized model, yielding an explicit characterization of the solution in terms of the NTK. We further analyze the distribution of trained predictors induced by random initialization and relate this notion of model uncertainty to Bayesian methods.
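
For readers unfamiliar with the objects named in this abstract, the standard textbook definitions of the NTK and the linearized model can be written as follows. The notation ($f$, $\theta_0$, $J_\theta$, $K$ output classes) is assumed here for illustration and is not taken from the paper itself.

```latex
% Empirical NTK of a network f(x;\theta) \in \mathbb{R}^K with parameter Jacobian J_\theta f
\Theta_{\theta}(x, x') \;=\; J_\theta f(x;\theta)\, J_\theta f(x';\theta)^{\top} \;\in\; \mathbb{R}^{K \times K}

% First-order (linearized) model around the initialization \theta_0
f_{\mathrm{lin}}(x;\theta) \;=\; f(x;\theta_0) \;+\; J_\theta f(x;\theta_0)\,(\theta - \theta_0)

% Lazy training regime: the kernel stays close to its value at initialization
\Theta_{\theta_t}(x, x') \;\approx\; \Theta_{\theta_0}(x, x') \quad \text{for all training times } t
```

Under the third condition, training $f$ behaves approximately like training $f_{\mathrm{lin}}$, which is the sense in which the abstract speaks of the linearization holding or breaking down.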

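2605.16302 2026-05-26 cs.LG cs.AI cs.CL Version update

Reducing Credit Assignment Variance via Counterfactual Reasoning Paths

Fei Ding, Yongkang Zhang, Youwei Wang, Zijian Zeng

Affiliations Alibaba Group; Tsinghua University

AI summary Proposes a counterfactual-comparison framework that samples multiple reasoning trajectories and uses their differences to implicitly estimate process-level advantages, turning sparse terminal rewards into step-sensitive signals and improving credit assignment for multi-step LLM reasoning; it also introduces Implicit Behavior Policy Optimization (IBPO) to improve training stability and the performance ceiling.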

Abstract

Reinforcement learning for multi-step reasoning with large language models (LLMs) typically relies on sparse terminal rewards, which creates a poorly conditioned credit-assignment problem: the final feedback is propagated uniformly across all intermediate decisions. This leads to high gradient variance, unstable training, and many ineffective updates, ultimately limiting sustained model improvement. We propose a counterfactual-comparison framework for credit assignment. For each input, the framework samples multiple reasoning trajectories and treats their differences as implicit approximations to alternative decisions. This yields an implicit process-level advantage estimator that converts sparse terminal rewards into step-sensitive learning signals. Building on this framework, we introduce Implicit Behavior Policy Optimization (IBPO), which substantially improves training stability and the performance ceiling on mathematical and code-reasoning benchmarks. Our results point to a promising direction for unlocking the reasoning potential of LLMs.

2605.15433 2026-05-26 cs.LG Version update

Spectral Priors vs. Attention: Investigating the Utility of Attention Mechanisms in EEG-Based Diagnosis

Tawsik Jawad, Gowtham Atluri, Vikram Ravindra

Affiliations University of Cincinnati

AI summary Proposes a band-selective spectral feature construction method and shows that, on small EEG datasets, traditional machine learning models can match or exceed SOTA deep learning models, whereas attention mechanisms fail to extract stable spectral features.

Abstract

Electroencephalography (EEG) time-series signals are characterized by significant noise and coarse spatial resolution, which complicates the classification of neurodegenerative diseases. Even SOTA deep learning architectures struggle to distinguish between healthy controls and diseased subjects, or between different disease types, due to high intergroup similarity. In this paper, we show that a spectrally selective approach to feature construction enhances class separability. By isolating signal strengths within the primary brainwave bands, we transform high-dimensional raw data into high-value spectral features. Our results demonstrate that, in small datasets, (a) features derived from the frequency and time-frequency domains allow traditional machine learning models to match or exceed the performance of SOTA deep learning models; (b) attention mechanisms are unable to distill the stable feature signatures that characterize healthy neural activity in both resting and task EEGs; and (c) the limitations of attention-based models in finding relevant spectral features appear to be robust, in that providing frequency-selective time-domain input does not appreciably improve their performance. We validate our methodology across three open-source resting EEG datasets and one task EEG dataset, providing robust empirical evidence for our claims.

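2605.14255 2026-05-26 cs.LG cs.CV Version update

Architecture-Aware Explanation Auditing for Industrial Visual Inspection

Sibo Jia, Zihang Zhao, Kunrong Li

AI summary Proposes an architecture-aware explanation audit protocol based on the native-readout hypothesis, uses perturbation experiments to show that an explanation method's faithfulness is bounded by its structural distance from the model's native decision mechanism, and shows that faithfulness rankings are a joint property of (model, explainer, perturbation operator) triples.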

Comments Format update

Abstract

Industrial visual inspection systems increasingly rely on deep classifiers whose heatmap explanations may appear visually plausible while failing to identify the image regions that actually drive model decisions. This paper operationalizes an architecture-aware explanation audit protocol grounded in the native-readout hypothesis: the perturbation-based faithfulness of an explanation method is bounded by its structural distance from the model's native decision mechanism. On WM-811K wafer maps (9 classes, 172k images) under a three-seed zero-fill perturbation protocol, ViT-Tiny + Attention Rollout attains Deletion AUC 0.211 against 0.432-0.525 for Swin-Tiny / ResNet18+CBAM / DenseNet121 + Grad-CAM (|Cohen's d| > 1.1), despite lower classification accuracy. Swin-Tiny disentangles architecture family from readout structure: despite being a Transformer, its spatial feature-map hierarchy makes it Grad-CAM compatible, showing that the operative factor is readout structure rather than architecture family. A model-agnostic control (RISE) compresses all families to a Deletion AUC of about 0.1, indicating that the gap arises from the explainer pathway; notably, RISE outperforms all native methods, so native readout is a compatibility principle rather than an optimality guarantee. A blur-fill sensitivity analysis shows that the family ordering reverses under a different perturbation baseline, reinforcing that faithfulness rankings are joint properties of (model, explainer, perturbation operator) triples. An exploratory boundary-condition study on MVTec AD (pretrained models) indicates that audit results are dataset/task dependent and identifies conditions requiring qualification. The protocol yields actionable guidance: explanation pathways should be co-designed with model architectures based on readout structure, and deployed heatmaps should be accompanied by quantitative faithfulness metrics.
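
The Deletion AUC figures quoted above come from a perturbation curve: pixels are removed in order of decreasing saliency and the model's score is tracked. The following is a minimal pure-Python sketch of that general idea with zero-fill, assuming a flat list of pixel values and a callable `model` that returns the target-class score. The function name, the `steps` parameter, and the toy model are all hypothetical, and the paper's actual protocol may differ in its details.

```python
import math

def deletion_auc(model, image, saliency, steps=10, fill=0.0):
    """Area under the score-vs-fraction-deleted curve (lower usually means more faithful)."""
    if len(image) != len(saliency):
        raise ValueError("image and saliency must have the same length")
    # Delete the most salient pixels first.
    order = sorted(range(len(image)), key=lambda i: saliency[i], reverse=True)
    per_step = max(1, math.ceil(len(order) / steps))
    x = list(image)
    fractions, scores = [0.0], [model(x)]
    for start in range(0, len(order), per_step):
        for i in order[start:start + per_step]:
            x[i] = fill  # zero-fill perturbation by default
        fractions.append(min(1.0, (start + per_step) / len(order)))
        scores.append(model(x))
    # Trapezoidal rule over the recorded curve.
    return sum(
        (fractions[k + 1] - fractions[k]) * (scores[k] + scores[k + 1]) / 2
        for k in range(len(scores) - 1)
    )

# Toy example: a hypothetical "model" whose score depends only on pixel 0.
toy_model = lambda pixels: pixels[0]
toy_image = [1.0, 1.0, 1.0, 1.0]
faithful = deletion_auc(toy_model, toy_image, saliency=[1, 0, 0, 0], steps=4)
unfaithful = deletion_auc(toy_model, toy_image, saliency=[0, 0, 0, 1], steps=4)
```

In the toy case the saliency map that points at pixel 0 makes the score drop immediately, so its curve encloses less area than the map that points at an irrelevant pixel. Swapping `fill` for a blurred value would correspond to the blur-fill baseline the abstract mentions.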

2605.12961 2026-05-26 cs.CV cs.LG Version update

Reducing Bias and Variance: Generative Semantic Guidance and Bi-Layer Ensemble for Image Clustering

Feijiang Li, Zhenxiong Li, Jieting Wang, Zizheng Jiu, Saixiong Liu, Liang Du

Affiliations Institute of Big Data Science and Industry; Key Laboratory of Evolutionary Science Intelligence of Shanxi Province; School of Artificial Intelligence, Shanxi University

AI summary Proposes the GSEC framework, which reduces bias through generative semantic guidance and reduces variance through bi-layer ensemble learning, outperforming 18 recent methods on six benchmark datasets.

Abstract

Image clustering aims to partition unlabeled image datasets into distinct groups. A core aspect of this task is constructing and leveraging prior knowledge to guide the clustering process. Recent approaches introduce semantic descriptions as prior information, most of which rely on matching-based techniques with predefined vocabularies. However, the limited matching space restricts their adaptability to downstream clustering tasks. Moreover, these methods primarily focus on reducing bias to improve performance, frequently overlooking the importance of variance reduction. To address these limitations, we propose GSEC (Image Clustering based on Generative Semantic Guidance and Bi-Layer Ensemble), a framework designed to reduce bias through generative semantic guidance and mitigate variance via ensemble learning. Our method employs Multimodal Large Language Models to generate semantic descriptions and derive image embeddings via weighted averaging. Additionally, a bi-layer ensemble strategy integrates cross-modal information through BatchEnsemble in the inner layer and aligns outputs via an alignment mechanism in the outer layer. Comparative experiments demonstrate that GSEC outperforms 18 state-of-the-art methods across six benchmark datasets, while further analysis confirms its effectiveness in simultaneously reducing both bias and variance. The code is available at https://github.com/2017LI/GSEC.git.

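2605.10430 2026-05-26 cs.LG cs.AI stat.ML Version update

Real vs. Semi-Simulated: Rethinking Evaluation for Treatment Effect Estimation

George Panagopoulos

Affiliations Department of Computer Science, University of Luxembourg

AI summary A large-scale empirical study compares evaluating treatment effect estimation models with counterfactual versus observable metrics on semi-simulated benchmarks and real datasets, reveals gaps between the two evaluation regimes, and finds that simple meta-learners paired with strong base models are competitive.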

Abstract

Estimating heterogeneous treatment effects with machine learning has attracted substantial attention in both academic research and industrial practice. However, the two communities often evaluate models under markedly different conditions. Methodological work typically relies on semi-simulated benchmarks and metrics that require counterfactual outcomes, whereas real-world applications rely on observable metrics based on ranking or test outcomes. Despite the well-known gap between methodological progress and practical deployment, the relationship between these evaluation regimes has not been examined systematically. We conduct a large-scale empirical study of treatment effect evaluation across standard semi-simulated benchmark families and real-world datasets. Our benchmark covers meta-learners paired with multiple base learners, as well as specialized causal machine learning models. We evaluate these methods using observable metrics common in application-oriented literature, alongside counterfactual metrics commonly used in methods papers. Our results reveal two complementary gaps. First, counterfactual metrics do not reliably recover the estimators preferred by observable metrics, even on the same semi-simulated benchmarks. Second, rankings obtained on semi-simulated benchmarks do not transfer to real datasets. We further find that simple meta-learners with strong base models are consistently competitive, in contrast to specialized causal models. Overall, our findings suggest that progress in treatment effect estimation research should not be assessed solely through counterfactual metrics and semi-simulated benchmarks, but would benefit from incorporating observable metrics and real-data validation.

2605.07733 2026-05-26 cs.LG cs.AI Version update

Intelligent Truck Matching in Full Truckload Shipments using Ping2Hex approach

Srinivas Kumar Ramdas, Jose Mathew, Ankit Singh Chauhan, Dinesh Rajkumar, Aravind Manoj, Mohit Goel

Affiliations Project44 Gmbh

AI summary Proposes ITM 2.0, a Ping2Hex-based intelligent truck matching system that uses probabilistic ranking and a LightGBM model to handle matching when vehicle identifiers are missing from GPS data, substantially improving precision and coverage.

Comments 12 pages, 10 figures, 8 tables. Accepted at iSCSi 2026 (International Conference on Industry Sciences and Computer Sciences Innovation). To appear in Procedia Computer Science (Elsevier)

Journal ref
ISCSI(2026)

Abstract

Accurate truck-to-shipment matching using GPS data is foundational for full truckload supply chain visibility, enabling real-time tracking and accurate estimated time of arrival (ETA) predictions. However, missing or corrupted vehicle identifiers prevent traditional matching approaches, leaving shipments without visibility. This paper presents Intelligent Truck Matching (ITM) 2.0, a machine learning system that addresses this critical gap by formulating matching as a probabilistic ranking problem. Our approach leverages Uber H3 hexagonal spatial indexing to discretize GPS pings into route similarity features, combined with temporal information, then applies LightGBM gradient boosting with threshold-based post-processing. Through rigorous evaluation including offline model selection (SVM, XGBoost, LightGBM), comprehensive ablation studies, and production shadow testing, we demonstrate substantial gains over rule-based baselines. ITM 2.0 achieves a 26-percentage-point precision improvement in North America and a 14-point improvement in Europe, while doubling coverage. Deployed in production at Project44 handling full truckload shipments, the system demonstrates robustness to geocoding errors up to 1 km, multiple candidate trucks, and sparse pings.
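
The idea of discretizing GPS pings into cells and comparing cell sets can be illustrated with a small sketch. To stay dependency-free, it uses a plain square latitude/longitude grid as a stand-in for H3 hexagons, and a Jaccard overlap as one plausible route similarity feature. Both choices, and every function name and the `cell_deg` parameter, are assumptions for illustration and are not the paper's implementation.

```python
import math

def cell_id(lat, lon, cell_deg=0.01):
    """Map a coordinate to a square grid cell (a stand-in for an H3 hexagon index)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def cells(points, cell_deg=0.01):
    """Set of grid cells touched by a sequence of (lat, lon) points."""
    return {cell_id(lat, lon, cell_deg) for lat, lon in points}

def route_overlap(truck_pings, route_points, cell_deg=0.01):
    """Jaccard overlap between the cells a truck visited and the cells on a shipment route."""
    a, b = cells(truck_pings, cell_deg), cells(route_points, cell_deg)
    if not a or not b:
        return 0.0  # sparse or missing pings: no evidence of overlap
    return len(a & b) / len(a | b)

# Hypothetical pings: the first pair falls in the same cell, the second does not.
same_cell = route_overlap([(41.881, -87.623)], [(41.882, -87.624)])
far_apart = route_overlap([(41.881, -87.623)], [(10.005, 10.005)])
```

A feature like this would be computed per candidate truck and passed, together with temporal features, to a ranking model. A coarser `cell_deg` trades spatial precision for tolerance to geocoding error.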

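2605.06415 2026-05-26 cs.LG cs.AI cs.CL cs.CV Version update

E = T*H/(O+B): A Dimensionless Control Parameter for Mixture-of-Experts Ecology

Qingjun Zhang

Affiliations School of Integrated Circuits, Wuxi Taihu University

AI summary Proposes the dimensionless control parameter E = T*H/(O+B), reports from 12 controlled experiments that E >= 0.5 guarantees Mixture-of-Experts models with no dead experts, and lists six additional findings, including expert resuscitation and dataset-dependent ortho toxicity.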

Comments 12 experiments, 11,000+ training epochs, cross-modal validation (vision + language). Extended version of the Claude-in-the-Loop ecology framework

Abstract

We introduce E = T*H/(O+B), a dimensionless control parameter that predicts whether Mixture-of-Experts (MoE) models will develop a healthy expert ecology or collapse into dead experts. E combines four hyperparameters -- routing temperature T, routing entropy weight H, oracle weight O, and balance weight B -- into a single quantity. Through 12 controlled experiments (8 vision, 4 language) totaling over 11,000 training epochs, we establish that E >= 0.5 alone is sufficient to guarantee zero dead experts, removing the necessity for handcrafted load-balancing auxiliary losses. We validate this cross-modally on CIFAR-10, CIFAR-100, TinyImageNet-200, WikiText-2, and WikiText-103. Six additional findings emerge: (1) dead experts can resuscitate -- triggered by balance loss driving router re-exploration; (2) ortho toxicity is dataset-dependent, not universal; (3) task complexity shifts the critical E threshold; (4) model overfitting is decoupled from expert ecological health; (5) three-tier MoE spontaneously collapses into a two-tier functional structure; (6) ecological structure is temperature-invariant across a 50x range. We propose that E serves as a unified diagnostic for MoE training, analogous to the Reynolds number in fluid dynamics.
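
The control parameter is simple enough to compute directly from a training configuration. The sketch below evaluates E = T*H/(O+B) and applies the E >= 0.5 threshold stated in the abstract. The function names and the example hyperparameter values are hypothetical.

```python
E_THRESHOLD = 0.5  # threshold reported in the abstract

def ecology_parameter(T: float, H: float, O: float, B: float) -> float:
    """E = T*H/(O+B): routing temperature T, entropy weight H, oracle weight O, balance weight B."""
    denom = O + B
    if denom <= 0:
        raise ValueError("O + B must be positive for E to be defined")
    return T * H / denom

def predicts_no_dead_experts(T, H, O, B, threshold=E_THRESHOLD) -> bool:
    """Apply the abstract's claim that E >= 0.5 is sufficient for zero dead experts."""
    return ecology_parameter(T, H, O, B) >= threshold

# Hypothetical configurations, chosen only to land on either side of the threshold.
healthy = predicts_no_dead_experts(T=1.0, H=0.5, O=0.5, B=0.5)   # E = 0.5
at_risk = predicts_no_dead_experts(T=1.0, H=0.25, O=0.5, B=0.5)  # E = 0.25
```

Because the abstract also reports that task complexity shifts the critical threshold, `threshold` is left as a parameter rather than hard-coded.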

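2605.04295 2026-05-26 cs.LG cs.AI Version update

LLMs Uncertainty Quantification via Adaptive Conformal Semantic Entropy

Hamed Karimi, Vaishali Meyappan, Reza Samavi

Affiliations Toronto Metropolitan University; Vector Institute

AI summary Proposes Adaptive Conformal Semantic Entropy (ACSE), which clusters semantic entropy and adaptively adjusts uncertainty scores, combined with conformal calibration for statistically reliable accept/abstain decisions, and outperforms existing baselines on multiple datasets.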

Comments Accepted for publication in the Proceedings of the 35th International Joint Conference on Artificial Intelligence (IJCAI 2026); 14 Pages

Abstract

LLMs' overconfidence, particularly when hallucinating, poses a significant challenge for deploying the models in safety-critical settings and makes reliable uncertainty estimation necessary. Existing approaches to uncertainty quantification typically prioritize lexical or probabilistic measures; however, these techniques often ignore the semantic variance of different responses with similar meaning. In this paper, we propose Adaptive Conformal Semantic Entropy (ACSE), a method for estimating prompt-level uncertainty by adaptively measuring semantic dispersion in LLM outputs. Our uncertainty scoring function is based on clustering semantic entropy of multiple diverse responses to the same prompt. The function adaptively adjusts the uncertainty score based on semantic features of each cluster. To ensure the statistical reliability of our score, we use conformal calibration to apply a decision rule that accepts or abstains on each prompt, providing a finite-sample, distribution-free guarantee that the error rate among the accepted responses remains bounded by a user-specified tolerance. Our extensive experimental evaluations, using different LLMs and datasets, demonstrate that our approach consistently outperforms state-of-the-art uncertainty quantification baselines on discriminative performance, conformal guarantees, and probabilistic calibration indicators. As a highlight, on the TriviaQA dataset, the AUROC of our approach is 0.88, compared to 0.65 for the token entropy approach.
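
As a rough illustration of semantic dispersion, the sketch below computes the entropy of the empirical distribution over semantic clusters of sampled responses and applies a threshold rule to accept or abstain. It assumes the responses have already been grouped into clusters by some external step, and it takes the threshold `tau` as given. The adaptive adjustment and the conformal calibration of `tau` described in the abstract are not reproduced here, and all names are hypothetical.

```python
import math
from collections import Counter

def semantic_entropy(cluster_ids):
    """Entropy (in nats) of the empirical distribution over semantic clusters."""
    if not cluster_ids:
        raise ValueError("need at least one sampled response")
    n = len(cluster_ids)
    counts = Counter(cluster_ids)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def accept(cluster_ids, tau):
    """Accept the prompt when semantic dispersion is at most tau, otherwise abstain."""
    return semantic_entropy(cluster_ids) <= tau

# Hypothetical cluster assignments for sampled responses to two prompts.
consistent = [0, 0, 0, 0]   # all responses share one meaning
scattered = [0, 1, 2, 3]    # every response means something different
```

With `tau = 0.5`, the `consistent` prompt would be accepted and the `scattered` one rejected. In a conformal setup, `tau` would be chosen on held-out calibration data so that the error rate among accepted prompts respects the user's tolerance.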

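2605.02044 2026-05-26 cs.LG Version update

NeuroViz: Real-time Interactive Visualization of Forward and Backward Passes in Neural Network Training

Tanvi Sharma, Reza Rawassizadeh

Affiliations Boston University

AI summary Presents NeuroViz, an interactive visualization tool that shows activations, weight updates, and loss changes in real time during fully connected neural network training, along with per-neuron equations, markedly improving training transparency and interpretability.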

Comments 9 pages, 4 figures, 6 tables

Abstract

Training neural networks is difficult to interpret, particularly for newcomers. We introduce NeuroViz, an interactive visualization tool that supports real-time exploration of fully connected neural network training. Users can configure network architecture, activation functions, learning rates, and datasets, then observe activations, weight updates, and loss progression. NeuroViz visualizes weight changes in direct correspondence with activation signals in both forward and backward passes, enabling users to distinguish pre- and post-update states within individual epochs and view dynamically updating per-neuron equations. We conduct a comparative user study with 31 participants against six established visualization tools, in which NeuroViz achieves the highest usability score (SUS 80.97, in the 'excellent' range), with mean rankings of 2.47 for clarity and 2.23 for usefulness (lower is better). Over 70% of participants reported that the visualizations substantially increased their perception of neural network training transparency. The implemented instance is accessible at https://neuroviz.org.

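2604.24517 2026-05-26 cs.LG cs.GT Version update

Prior-Agnostic Robust Forecast Aggregation

Zhi Chen, Cheng Peng, Wei Tang

Affiliations Chinese University of Hong Kong

AI summary For robust forecast aggregation with an unknown state space and prior, proposes an explicit closed-form log-odds aggregator that linearly pools forecasts in log-odds space, with near-minimax regret bounds across three knowledge regimes.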

Abstract

Robust forecast aggregation combines the predictions of multiple information sources to perform well in the worst case across all possible information structures. Previous work largely focuses on settings with a known binary state space, where the state is either 0 or 1. We study prior-agnostic robust forecast aggregation in which the aggregator observes only experts' reports, yet is ignorant of both the underlying joint information structure and the full prior, including the underlying state space. Unlike the standard model that fixes the binary state space {0, 1}, we allow the (binary) unknown state values to be arbitrary numbers in [0, 1], so the same reported probability may correspond to very different realized outcome frequencies across environments. Our main contribution is a simple, explicit, closed-form log-odds aggregator that linearly pools forecasts in logit space, together with (nearly) tight minimax-regret guarantees across three knowledge regimes. We first show that under conditionally independent (CI) signals, robust aggregation with an unknown state space is strictly harder than in the known-state setting by establishing a larger lower bound, and our aggregation rule can achieve a worst-case regret of 0.0255. Along the way, we also characterize tight regret bounds for Blackwell-ordered structures and for general information structures. In the classical setting with known state space {0, 1}, our aggregator achieves regret strictly below 0.0226 for CI structures. To the best of our knowledge, this is the first explicit closed-form aggregator that achieves a regret upper bound strictly less than 0.0226. Finally, we extend the model so that the aggregator additionally knows each expert's marginal forecast distribution; in this setting, under CI structures, we show that a generalized log-odds rule achieves regret of 0.0228, complemented by a lower bound of 0.0225.
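
Linear pooling in logit space can be sketched in a few lines: map each forecast to log-odds, take a weighted sum, and map back to a probability. The weights, bias, and clipping constant below are hypothetical placeholders. The abstract does not state the paper's actual closed-form coefficients, so this shows only the general shape of a log-odds aggregator.

```python
import math

def logit(p: float, eps: float = 1e-12) -> float:
    """Log-odds of p, clipped away from 0 and 1 to keep the value finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def sigmoid(z: float) -> float:
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-z))

def log_odds_pool(forecasts, weights=None, bias=0.0) -> float:
    """Weighted linear pooling of forecasts in logit space (weights default to 1.0)."""
    if weights is None:
        weights = [1.0] * len(forecasts)
    if len(weights) != len(forecasts):
        raise ValueError("one weight per forecast is required")
    z = bias + sum(w * logit(p) for w, p in zip(weights, forecasts))
    return sigmoid(z)

# Two hypothetical experts who each report 0.8.
pooled = log_odds_pool([0.8, 0.8])
```

With unit weights, two agreeing forecasts of 0.8 pool to roughly 0.94, more extreme than either report, which is the usual behavior when log-odds are summed. Weights below 1 temper that extremizing effect.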

2604.22948 2026-05-26 cs.LG stat.CO stat.ML Version update

Score-Repellent Monte Carlo: Toward Efficient Non-Markovian Sampler with Constant Memory in General State Spaces

Jie Hu, Lingyun Chen, Geeho Kim, Jinyoung Choi, Bohyung Han, Do Young Eun

Affiliations CSE, Oakland University, Rochester, USA; ECE, North Carolina State University, Raleigh, USA; IPAI, Seoul National University, Seoul, Korea; AIGS, Ulsan National Institute of Science and Technology, Ulsan, Korea

AI summary Proposes the Score-Repellent Monte Carlo (SRMC) framework, which summarizes trajectory history by a running average of score evaluations and builds a surrogate target through an exponential score tilt, enabling non-Markovian sampling with constant memory, lower asymptotic variance, and better mode coverage.

Comments Accepted at ICML 2026 (Spotlight); GitHub Repo: https://github.com/srmc-project/Score-Repellent-Monte-Carlo

详情
AI中文摘要

历史依赖采样可以通过阻止冗余重访来降低长期蒙特卡洛方差,但现有方案通常通过有限状态空间上的经验度量编码历史,这在高维离散配置空间中不可行或在连续域中不适定。我们提出分数排斥蒙特卡洛(SRMC)框架,该框架通过 $\mathbb{R}^d$ 中分数评估的运行平均值总结轨迹历史,其中 $d$ 是分数和状态表示的维度。该历史通过指数分数倾斜转换为替代目标,以 $α$ 为索引,表示排斥强度,控制基于历史的排斥幅度。替代族在标准MCMC意义上是无需归一化的,从而产生一个通用包装器:在每次迭代中,任何针对 $π$ 的基础核都可以在当前替代 $π_{θ_n}$ 上运行,同时在线更新历史。我们使用带有受控马尔可夫噪声的随机逼近分析历史递归和蒙特卡洛估计器的耦合演化,建立了几乎必然收敛和联合中心极限定理。我们进一步确定了渐近协方差随 $α$ 增加而减小的区域,缩放比例为 $O(1/α)$,将有限状态历史依赖采样器的近零方差效应扩展到具有恒定内存的一般状态空间。在连续目标和离散能量基模型上的实验表明,估计器方差和模式覆盖得到改善,同时保持 $O(d)$ 内存使用和适度的每次迭代开销。

英文摘要

History-dependent sampling can reduce long-run Monte Carlo variance by discouraging redundant revisits, but existing schemes typically encode history through an empirical measure on finite state spaces, which is infeasible in high-dimensional discrete configuration spaces or ill-posed in continuous domains. We propose the Score-Repellent Monte Carlo (SRMC) framework, which summarizes trajectory history by a running average of score evaluations in $\mathbb{R}^d$, where $d$ is the dimension of the score and state representation. This history is converted into a surrogate target through an exponential score tilt, indexed by a repellence strength $α$ that controls the magnitude of the history-based repulsion. The surrogate family is normalization-free in the standard MCMC sense, yielding a generic wrapper: at each iteration, any base kernel targeting $π$ can instead be run on the current surrogate $π_{θ_n}$ while the history is updated online. We analyze the coupled evolution of the history recursion and Monte Carlo estimators using stochastic approximation with controlled Markovian noise, establishing almost sure convergence and a joint central limit theorem. We further identify regimes in which the asymptotic covariance decreases as $α$ increases, with scaling $O(1/α)$, extending the near-zero-variance effect of finite-state history-dependent samplers to general state spaces with constant memory. Experiments on continuous targets and discrete energy-based models demonstrate improved estimator variance and mode coverage, while retaining $O(d)$ memory usage and modest per-iteration overhead.

2604.16075 2026-05-26 math.NA cs.DS cs.LG cs.NA math.OC 版本更新

Towards Universal Convergence of Backward Error in Linear System Solvers

线性系统求解器中后向误差的通用收敛性

Michał Dereziński, Yuji Nakatsukasa, Elizaveta Rebrova

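Affiliations * University of Michigan; University of Oxford; Princeton University

AI summary Shows that the classical Richardson iteration attains a condition-number-independent 1/k backward-error convergence rate on positive semidefinite linear systems, and proposes MINBERR, which minimizes backward error over a Krylov subspace and reaches O(n^2/√ε) complexity.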

Comments Added convergence analysis of MINBERR-NE for general linear systems (Theorems 5.1 and 5.3)

详情
AI中文摘要

寻找一个在$O(n^2)$时间复杂度内求解$n\times n$线性系统的算法,或在求解至$\varepsilon$相对误差时达到$O(n^2 \text{poly}(1/\varepsilon))$复杂度的算法,是数值线性代数和理论计算机科学中一个长期存在的开放问题。测量相对误差有两种主要范式:前向误差(即输出与最优解之间的距离)和后向误差(即输出所解决的最接近问题之间的距离)。在大多数先前的研究中,迭代线性系统求解器的收敛性是通过各种前向误差概念来衡量的,因此严重依赖于输入的条件数。然而,数值分析文献长期以来一直主张后向误差是更实用的近似概念。在这项工作中,我们表明——令人惊讶的是——经典且简单的Richardson迭代在任何半正定(PSD)线性系统上,经过$k$次迭代后,后向误差至多为$1/k$,无论其条件数如何。这种通用收敛率意味着一个$O(n^2/\varepsilon)$复杂度的算法,用于求解PSD线性系统至$\varepsilon$后向误差,并且我们建立了在使用Richardson之外的各种Krylov求解器时类似或更好的复杂度。然后,通过直接在Krylov子空间上最小化后向误差,我们实现了更快的$O(1/k^2)$通用收敛率,并将其转化为一个高效算法MINBERR,复杂度为$O(n^2/\sqrt{\varepsilon})$。最后,我们通过正规方程将该方法扩展到求解一般线性系统,时间复杂度为$O(n^2\log(n)/\varepsilon)$。我们在基准问题上报告了算法的强大数值性能。

英文摘要

The quest for an algorithm that solves an $n\times n$ linear system in $O(n^2)$ time complexity, or $O(n^2 \text{poly}(1/ε))$ when solving up to $ε$ relative error, is a long-standing open problem in numerical linear algebra and theoretical computer science. There are two predominant paradigms for measuring relative error: forward error (i.e., distance from the output to the optimum solution) and backward error (i.e., distance to the nearest problem solved by the output). In most prior studies, convergence of iterative linear system solvers is measured via various notions of forward error, and as a result, depends heavily on the conditioning of the input. Yet, the numerical analysis literature has long advocated for backward error as the more practically relevant notion of approximation. In this work, we show that -- surprisingly -- the classical and simple Richardson iteration incurs at most $1/k$ (relative) backward error after $k$ iterations on any positive semidefinite (PSD) linear system, irrespective of its condition number. This universal convergence rate implies an $O(n^2/ε)$ complexity algorithm for solving a PSD linear system to $ε$ backward error, and we establish similar or better complexity when using a variety of Krylov solvers beyond Richardson. Then, by directly minimizing backward error over a Krylov subspace, we attain an even faster $O(1/k^2)$ universal rate, and we turn this into an efficient algorithm, MINBERR, with complexity $O(n^2/\sqrt{ε})$. Finally, we extend this approach via normal equations to solving general linear systems in $O(n^2\log(n)/ε)$ time complexity. We report strong numerical performance of our algorithms on benchmark problems.
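The $1/k$ claim for Richardson iteration can be checked numerically with a minimal sketch. The following assumptions are not taken from the abstract: the step size is $1/\|A\|_2$, the start is $x_0 = 0$, the system is consistent, and the relative backward error is measured as $\|b - Ax_k\|_2 / (\|A\|_2 \|x_k\|_2)$. The paper's exact definition and step-size choice may differ, and the function name is hypothetical.

```python
import numpy as np

def richardson_backward_errors(A, b, num_iters):
    """Run Richardson iteration and record an (assumed) relative backward error."""
    L = np.linalg.norm(A, 2)          # spectral norm; largest eigenvalue for PSD A
    x = np.zeros_like(b)
    errors = []
    for _ in range(num_iters):
        x = x + (b - A @ x) / L       # Richardson step with step size 1/L
        r = b - A @ x
        errors.append(np.linalg.norm(r) / (L * np.linalg.norm(x)))
    return x, errors

# Ill-conditioned PSD test matrix with eigenvalues from 1e-8 to 1.
rng = np.random.default_rng(0)
n = 50
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigs = np.logspace(-8, 0, n)
A = (Q * eigs) @ Q.T                  # Q @ diag(eigs) @ Q.T
b = A @ rng.standard_normal(n)        # consistent right-hand side

x, errs = richardson_backward_errors(A, b, 200)
for k in (1, 10, 100, 200):
    print(k, errs[k - 1], 1.0 / k)
```

Under these assumptions the recorded error stays below $1/k$ even though the condition number is about $10^8$, whereas the forward error on such a system would converge far more slowly.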

2604.13088 2026-05-26 cs.LG cs.AI 版本更新

Design Conditions for Intra-Group Learning of Sequence-Level Rewards: Token Gradient Cancellation

序列级奖励的组内学习设计条件:令牌梯度消除

Fei Ding, Yongkang Zhang, youwei wang, Zijian Zeng

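Affiliations * Alibaba Group; Tsinghua University

AI summary Addresses the credit-assignment problem caused by sparse terminal rewards in multi-step LLM reasoning with a counterfactual-comparison framework and Implicit Behavior Policy Optimization (IBPO): differences between sampled trajectories approximate alternative decisions, turning sparse rewards into step-sensitive signals and improving training stability and reasoning performance.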

详情
AI中文摘要

基于大语言模型的多步推理强化学习通常依赖于稀疏的终端奖励,这导致了不良条件的信用分配问题:最终反馈均匀地传播到所有中间决策。这导致高梯度方差、不稳定的训练和许多无效更新,最终限制了模型的持续改进。我们提出了一种用于信用分配的反事实比较框架。对于每个输入,该框架采样多个推理轨迹,并将它们的差异视为替代决策的隐式近似。这产生了一个隐式过程级优势估计器,将稀疏的终端奖励转化为步骤敏感的学习信号。基于此框架,我们引入了隐式行为策略优化(IBPO),显著提高了数学和代码推理基准上的训练稳定性和性能上限。我们的结果指向了一个有希望的方向,以解锁大语言模型的推理潜力。

英文摘要

Reinforcement learning for multi-step reasoning with large language models (LLMs) typically relies on sparse terminal rewards, which creates a poorly conditioned credit-assignment problem: the final feedback is propagated uniformly across all intermediate decisions. This leads to high gradient variance, unstable training, and many ineffective updates, ultimately limiting sustained model improvement. We propose a counterfactual-comparison framework for credit assignment. For each input, the framework samples multiple reasoning trajectories and treats their differences as implicit approximations to alternative decisions. This yields an implicit process-level advantage estimator that converts sparse terminal rewards into step-sensitive learning signals. Building on this framework, we introduce Implicit Behavior Policy Optimization (IBPO), which substantially improves training stability and the performance ceiling on mathematical and code-reasoning benchmarks. Our results point to a promising direction for unlocking the reasoning potential of LLMs.

2604.11811 2026-05-26 cs.PL cs.AI cs.CL cs.LG 版本更新

M$^\star$: Every Task Deserves Its Own Memory Harness

M$^\star$:每个任务都应有专属的记忆框架

Wenbo Pan, Shujie Liu, Xiangyang Zhou, Shiwei Zhang, Wanlu Shi, Mirror Xu, Xiaohua Jia

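Affiliations * City University of Hong Kong; Microsoft

AI summary Proposes M$^\star$, which automatically discovers task-optimized memory systems through executable program evolution and outperforms fixed-memory baselines on conversation, embodied planning, and expert reasoning tasks.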

Comments Preprint. Code: https://github.com/wbopan/mstar ; Live demo: https://mstar.wenbo.io

详情
AI中文摘要

大型语言模型代理依赖专门的记忆系统在长时间交互中积累和重用知识。最近的架构通常采用针对特定领域定制的固定记忆设计,例如用于对话的语义检索或用于编码的技能重用。然而,为某一目的优化的记忆系统往往无法迁移到其他任务。为了解决这一限制,我们引入了M$^\star$,一种通过可执行程序进化自动发现任务优化记忆框架的方法。具体来说,M$^\star$将代理记忆系统建模为用Python编写的记忆程序。该程序封装了数据模式、存储逻辑和代理工作流指令。我们使用反射式代码进化方法联合优化这些组件;该方法采用基于种群的搜索策略,并分析评估失败以迭代改进候选程序。我们在涵盖对话、具身规划和专家推理的四个不同基准上评估M$^\star$。结果表明,M$^\star$在所有评估任务上稳健地优于现有的固定记忆基线。此外,进化出的记忆程序对每个领域展现出结构不同的处理机制。这一发现表明,针对给定任务特化记忆机制探索了广泛的设计空间,并提供了比通用记忆范式更优的解决方案。

英文摘要

Large language model agents rely on specialized memory systems to accumulate and reuse knowledge during extended interactions. Recent architectures typically adopt a fixed memory design tailored to specific domains, such as semantic retrieval for conversations or skills reused for coding. However, a memory system optimized for one purpose frequently fails to transfer to others. To address this limitation, we introduce M$^\star$, a method that automatically discovers task-optimized memory harnesses through executable program evolution. Specifically, M$^\star$ models an agent memory system as a memory program written in Python. This program encapsulates the data Schema, the storage Logic, and the agent workflow Instructions. We optimize these components jointly using a reflective code evolution method; this approach employs a population-based search strategy and analyzes evaluation failures to iteratively refine the candidate programs. We evaluate M$^\star$ on four distinct benchmarks spanning conversation, embodied planning, and expert reasoning. Our results demonstrate that M$^\star$ improves performance over existing fixed-memory baselines robustly across all evaluated tasks. Furthermore, the evolved memory programs exhibit structurally distinct processing mechanisms for each domain. This finding indicates that specializing the memory mechanism for a given task explores a broad design space and provides a superior solution compared to general-purpose memory paradigms.

2604.04453 2026-05-26 cs.CE cs.LG 版本更新

Generative modeling of granular flow on inclined planes using conditional flow matching

基于条件流匹配的倾斜平面上颗粒流生成建模

Xuyang Li, Rui Li, Teng Man, Yimin Lu

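Affiliations * School of Construction, University of North Carolina at Charlotte; Department of Civil, Environmental, and Construction Engineering, Texas Tech University; College of Civil Engineering, Zhejiang University of Technology

AI summary Proposes what the authors describe as the first conditional flow matching (CFM) framework for reconstructing interior granular-flow motion from sparse boundary observations; a differentiable forward operator and a sparsity-aware gradient guidance mechanism yield accurate reconstructions that outperform a deterministic CNN baseline.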

详情
AI中文摘要

颗粒流控制着许多自然和工业过程,但其内部运动学和力学在很大程度上仍无法观测,因为实验只能接触到边界或自由表面。传统的数值模拟对于快速逆重建计算成本高昂,而确定性模型在病态设定下往往会退化为过度平滑的平均预测。本研究据作者所知首次提出了一个条件流匹配(CFM)框架,用于从稀疏边界观测重建颗粒流。该生成模型在高保真颗粒分辨离散元模拟上训练,在推理时由可微前向算子和一种新颖的稀疏感知梯度引导机制指导。该机制避免了标准均方误差方法固有的梯度稀释,保留了观测误差的绝对物理尺度,无需超参数调整即可强制执行测量一致性,并防止非材料区域出现非物理速度预测。一个物理解码器将重建的速度场映射到应力状态和能量波动量,包括平均应力、偏应力和颗粒温度。该框架从完整观测到仅16%的信息窗口准确恢复内部流场,并且在仅11%数据的强稀释空间分辨率下仍然有效。在最病态的重建区域,它优于确定性CNN基线,并通过集成生成提供空间分辨的不确定性估计。这些结果表明,条件生成模型为颗粒介质中隐藏体力学特性的非侵入性推断提供了一条实用途径,并暗示了在颗粒和多相系统逆问题中的潜在适用性。

英文摘要

Granular flows govern many natural and industrial processes, yet their interior kinematics and mechanics remain largely unobservable, as experiments access only boundaries or free surfaces. Conventional numerical simulations are computationally expensive for fast inverse reconstruction, and deterministic models tend to collapse to over-smoothed mean predictions in ill-posed settings. This study, to the best of the authors' knowledge, presents the first conditional flow matching (CFM) framework for granular-flow reconstruction from sparse boundary observations. Trained on high-fidelity particle-resolved discrete element simulations, the generative model is guided at inference by a differentiable forward operator and a novel sparsity-aware gradient guidance mechanism. This mechanism avoids the gradient dilution inherent to standard mean-squared-error approaches, preserves the absolute physical scale of observation errors, enforces measurement consistency without hyperparameter tuning, and prevents unphysical velocity predictions in non-material regions. A physics decoder maps the reconstructed velocity fields to stress states and energy fluctuation quantities, including mean stress, deviatoric stress, and granular temperature. The framework accurately recovers interior flow fields from full observation to only 16% of the informative window, and it remains effective under strongly diluted spatial resolution with only 11% of data. It also outperforms a deterministic CNN baseline in the most ill-posed reconstruction regime and provides spatially resolved uncertainty estimates through ensemble generation. These results demonstrate that conditional generative modeling offers a practical route for non-invasive inference of hidden bulk mechanics in granular media, and it suggests potential applicability for inverse problems in particulate and multiphase systems.

2603.18444 2026-05-26 cs.LG cs.AI 版本更新

Discounted Beta-Bernoulli Reward Estimation for Sample-Efficient Reinforcement Learning with Verifiable Rewards

折扣Beta-Bernoulli奖励估计用于基于可验证奖励的样本高效强化学习

Haechan Kim, Soohyun Ryu, Gyouk Chu, Doohyuk Jang, Eunho Yang

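Affiliations * KAIST

AI summary Addresses the low sample efficiency of reinforcement learning with verifiable rewards by proposing Discounted Beta-Bernoulli reward estimation, which uses historical reward statistics to reduce estimation variance and avoid variance collapse, with significant gains on multiple reasoning benchmarks.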

Comments 14 pages, 3 figures

详情
AI中文摘要

基于可验证奖励的强化学习已成为提升大语言模型推理能力的有效后训练范式。然而,现有的基于组的RLVR方法常遭受严重的样本低效问题。这种低效源于对少量rollout的奖励进行点估计,导致高估计方差、方差崩溃以及生成响应的无效利用。在本工作中,我们从统计估计角度重新审视RLVR,将奖励建模为从策略诱导分布中抽取的样本,并将优势计算视为从有限数据中估计奖励分布的问题。基于此观点,我们提出折扣Beta-Bernoulli奖励估计,该方法利用历史奖励统计量处理非平稳分布。尽管有偏,所得估计量展现出降低且稳定的方差,理论上避免了估计方差崩溃,并在均方误差上优于标准点估计。在六个分布内和三个分布外推理基准上的大量实验表明,使用DBB的GRPO一致优于朴素GRPO,在1.7B和8B模型上分别实现了分布内平均Acc@8提升3.22/2.42点,分布外提升12.49/6.92点,且无需额外计算成本或内存开销。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) has emerged as an effective post-training paradigm for improving the reasoning capabilities of large language models. However, existing group-based RLVR methods often suffer from severe sample inefficiency. This inefficiency stems from reliance on point estimation of rewards from a small number of rollouts, leading to high estimation variance, variance collapse, and ineffective utilization of generated responses. In this work, we reformulate RLVR from a statistical estimation perspective by modeling rewards as samples drawn from a policy-induced distribution and casting advantage computation as the problem of estimating the reward distribution from finite data. Building on this view, we propose Discounted Beta-Bernoulli (DBB) reward estimation, which leverages historical reward statistics for the non-stationary distribution. Although biased, the resulting estimator exhibits reduced and stable variance, theoretically avoids estimated variance collapse, and achieves lower mean squared error than standard point estimation. Extensive experiments across six in-distribution and three out-of-distribution reasoning benchmarks demonstrate that GRPO with DBB consistently outperforms naive GRPO, achieving average Acc@8 improvements of 3.22/2.42 points in-distribution and 12.49/6.92 points out-of-distribution on the 1.7B and 8B models, respectively, without additional computational cost or memory usage.
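To make the idea of a discounted Beta-Bernoulli estimate concrete, here is a minimal sketch for one prompt with binary verifier rewards. The class name, the uniform Beta(1, 1) prior, the rule of discounting both pseudo-counts by `gamma` before adding new evidence, and the way the estimate is plugged into an advantage are all assumptions made for illustration; the abstract does not give the paper's exact update or its integration with GRPO.

```python
from dataclasses import dataclass

@dataclass
class DiscountedBetaBernoulli:
    gamma: float = 0.9   # discount applied to past evidence (hypothetical default)
    alpha: float = 1.0   # pseudo-count of successes (uniform prior)
    beta: float = 1.0    # pseudo-count of failures (uniform prior)

    def update(self, rewards):
        """Fold in one group of binary rewards, down-weighting older evidence."""
        successes = sum(rewards)
        failures = len(rewards) - successes
        self.alpha = self.gamma * self.alpha + successes
        self.beta = self.gamma * self.beta + failures

    @property
    def mean(self):
        """Posterior-mean estimate of the success probability."""
        return self.alpha / (self.alpha + self.beta)

    @property
    def bernoulli_std(self):
        """Standard deviation of a Bernoulli reward at the estimated mean."""
        m = self.mean
        return (m * (1.0 - m)) ** 0.5

est = DiscountedBetaBernoulli(gamma=0.5)
est.update([1, 1, 0, 0])         # earlier training step: mixed outcomes
est.update([1, 1, 1, 1])         # current step: every rollout succeeds
group = [1, 1, 1, 1]
advantages = [(r - est.mean) / est.bernoulli_std for r in group]
print(est.mean, est.bernoulli_std, advantages)
```

A point estimate from the current group alone would give mean 1 and standard deviation 0, so the normalized advantage would be undefined or zero. The history-aware estimate keeps the standard deviation positive, which is one way to read the abstract's "avoids estimated variance collapse".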

2603.17198 2026-05-26 cs.LG cs.CL 版本更新

Structural Abstraction as an Inductive Bias for Non-Stationary Language Model Training

结构抽象作为非平稳语言模型训练的归纳偏置

Elnaz Rahmati, Nona Ghazizadeh, Zhivar Sourati, Nina Rouhani, Morteza Dehghani

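Affiliations * University of Southern California

AI summary Proposes Abstraction-Augmented Training (AAT), which jointly optimizes over concrete instances and their structural abstractions to reduce catastrophic interference and improve relational generalization, supporting structural abstraction as a stabilizing learning signal in non-stationary language model training.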

详情
AI中文摘要

认知科学的一个基本原则认为,智能体不是通过将经验存储为孤立实例来学习,而是通过形成捕捉跨情境共享关系结构的抽象图式来学习。尽管这一主张得到了行为和神经影像研究的充分支持,但其作为语言模型计算训练信号的作用仍未得到充分探索。我们针对非平稳语言模型训练中的这一空白,提出疑问:将学习偏向结构抽象是否能如人类结果所预测的那样减少灾难性干扰并提升关系泛化?为研究这一问题,我们引入了抽象增强训练(AAT),这是一种轻量级的损失级修改,联合优化具体实例及其结构抽象,以及两个基准:关系循环基准(RCB)和叙事抽象基准(NAB)。这些资源将核心认知构造操作化:实体掩码作为关系对齐的计算模拟,谚语作为必须跨表面不同情境推断的隐式抽象意义的载体。我们的实证结果表明,AAT持续减少遗忘并提升泛化,其模式与基于图式学习的认知预测一致。除了对持续学习的实际意义外,这些结果提供了初步的计算证据,表明结构抽象是非平稳环境中稳定学习的信号。

英文摘要

A foundational principle in cognitive science holds that intelligent agents do not learn by storing experiences as isolated instances, but by forming abstract schemas that capture relational structure shared across situations. Even though this claim is well supported by behavioral and neuroimaging studies, its role as a computational training signal in language models remains underexplored. We target this gap in the setting of non-stationary language model training, asking: does biasing learning toward structural abstraction reduce catastrophic interference and improve relational generalization, as predicted by human results? To study this question, we introduce Abstraction-Augmented Training (AAT), a lightweight loss-level modification that jointly optimizes over concrete instances and their structural abstractions, and two benchmarks, the Relational Cycle Benchmark (RCB) and the Narrative Abstraction Benchmark (NAB). These resources operationalize core cognitive constructs: entity masking as a computational analog of relational alignment, and proverbs as vehicles for implicit abstract meaning that must be inferred across surface-dissimilar situations. Our empirical results demonstrate that AAT consistently reduces forgetting and improves generalization in a pattern that aligns with cognitive predictions for schema-based learning. Beyond the practical implications for continual learning, these results offer preliminary computational evidence that structural abstraction is a signal for stable learning in non-stationary environments.

2603.17044 2026-05-26 cs.LG cs.AI cs.CV 版本更新

Do Understanding and Generation Fight? A Diagnostic Study of DPO for Unified Multimodal Models

理解与生成相冲突吗?统一多模态模型DPO的诊断研究

Abinav Rao, Sujan Rachuri

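AI summary Systematic experiments show that generation quality resists alignment when DPO is applied to a unified multimodal model, mainly because understanding and generation gradients are near-orthogonal with an 11-14x magnitude imbalance driven by VQ token count asymmetry.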

Comments Experiments are inconclusive: The claim that architectures such as Chameleon or Emu would exhibit stronger gradient conflict is not supported by experiments or analysis, and all experiments are conducted on Janus-Pro without evaluation on other unified multimodal architectures

详情
AI中文摘要

统一多模态模型共享一个语言模型骨干来同时进行理解和生成图像。DPO能否同时对齐这两种能力?我们首次系统研究了这一问题,在Janus-Pro的1B和7B参数上应用DPO,采用七种训练策略和两种事后方法。核心发现是负面的:在该架构下,所有测试条件下生成质量都抵制DPO对齐。在7B规模下,没有任何方法能改善生成CLIPScore(|Δ| < 0.2,每个种子n=200,3个种子,p > 0.5);在1B规模下,所有方法都降低了生成质量,并且该结果在偏好数据类型(真实vs生成和模型vs模型)以及测试的数据量(150-288对)上均成立。梯度分析揭示了原因:理解和生成梯度近乎正交(cos ~ 0),且由于VQ token数量不对称(576个生成token vs. ~30-100个文本token),幅度不平衡达到约11-14倍。这种不平衡是多任务DPO中的主要干扰机制;幅度平衡产生了方向正确的理解增量(VQA +0.01-0.04,虽然单独不显著),但生成差距仍然存在。我们识别出离散VQ tokenization是一个可能的结构瓶颈——生成DPO损失收敛到ln(2)支持了这一点——并为使用基于VQ的统一模型的从业者提供了实用指导。

英文摘要

Unified multimodal models share a language model backbone for both understanding and generating images. Can DPO align both capabilities simultaneously? We present the first systematic study of this question, applying DPO to Janus-Pro at 1B and 7B parameters under seven training strategies and two post-hoc methods. The central finding is negative: generation quality resists DPO alignment across all tested conditions on this architecture. No method improves generation CLIPScore at 7B (|Delta| < 0.2, p > 0.5 at n=200 per seed, 3 seeds); at 1B, all methods degrade generation, and the result holds across preference data types (real-vs-generated and model-vs-model) and the data volumes tested (150-288 pairs). Gradient analysis reveals why: understanding and generation gradients are near-orthogonal (cos ~ 0) with ~11-14x magnitude imbalance driven by VQ token count asymmetry (576 generation tokens vs. ~30-100 text tokens). This imbalance is the dominant interference mechanism in multi-task DPO; magnitude-balancing yields directionally positive understanding deltas (+0.01-0.04 VQA, though individually not significant), but the generation gap persists regardless. We identify discrete VQ tokenization as a likely structural bottleneck -- supported by the generation DPO loss converging to ln(2) -- and provide practical guidance for practitioners working with VQ-based unified models.
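Two quantities in this abstract have simple closed forms that a short sketch can make concrete: the standard DPO loss equals $\ln 2$ whenever the policy's preference margin over the reference is zero, and gradient interference can be summarized by a cosine similarity and a norm ratio. The toy gradient vectors and the `balance` rescaling rule below are hypothetical illustrations, not the paper's measurements or its magnitude-balancing method.

```python
import math
import numpy as np

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one pair, given sequence log-probabilities."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return math.log1p(math.exp(-margin))   # equals -log(sigmoid(margin))

def cosine(g1, g2):
    """Cosine similarity between two flattened gradient vectors."""
    return float(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2)))

def balance(g_und, g_gen):
    """Rescale the generation gradient to the understanding gradient's norm (assumed rule)."""
    scale = np.linalg.norm(g_und) / np.linalg.norm(g_gen)
    return g_und + scale * g_gen

# A policy identical to the reference has zero margin, so the loss is ln 2.
zero_margin_loss = dpo_loss(-12.0, -15.0, -12.0, -15.0)

# Toy gradients: orthogonal directions with a 12x norm imbalance.
g_und = np.array([1.0, 0.0])
g_gen = np.array([0.0, 12.0])
print(zero_margin_loss, cosine(g_und, g_gen), balance(g_und, g_gen))
```

A generation loss that sits at $\ln 2$ therefore indicates that the policy has not separated chosen from rejected samples relative to the reference, which is how the abstract uses that observation.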

2602.10538 2026-05-26 stat.ML cs.LG 版本更新

Why Agentic Theorem Prover Works: A Statistical Provability Theory of Mathematical Reasoning Models

为什么智能体定理证明器有效:数学推理模型的统计可证明性理论

Sho Sonoda, Shunta Akiyama, Yuya Uezato

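Affiliations * CyberAgent, Inc.; National Institute of Informatics, Japan

AI summary Using a statistical provability theory, models formal proof search as a finite-horizon reachability MDP, analyzes how each component of an agentic theorem prover affects finite-budget proof success, and bounds the gap in success probability.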

Comments Accepted at ICML 2026

详情
AI中文摘要

智能体定理证明器结合了推理模型、检索、搜索和证明助手验证器,但目前尚不清楚哪些组件实际上提高了有限预算下的证明成功率,以及它们为何在真实数学工作负载上有效。我们通过统计可证明性来研究这个问题:在指定定理实例流上,在预算内达到已验证证明的概率。我们将形式化证明搜索建模为具有确定性验证器动态的有限视界可达性MDP,并表明在忠实状态抽象下,最优成功概率与普通句法可证明性一致。然后我们分析了一个简单但实际重要的流程:深度上的离线动作值回归,随后是贪婪的测试时证明。我们的主要定理通过一个占用加权和的一致动作值误差来界定学习证明器与最优证明器之间的可证明性差距;在常见的均匀误差解读中,主要复杂度乘子是学习证明器的平均截断证明长度。误差分解为逼近误差、训练分布的几何覆盖和蒙特卡洛标签噪声,并在动作间隔边界条件下以快速速率改进。该结果给出了一个组件敏感的解释,说明为什么验证器反馈、检索、表示几何和证明缩短机制在偏置定理工作负载上有帮助,而不与经典的最坏情况困难性相矛盾。

英文摘要

Agentic theorem provers combine a reasoning model, retrieval, search, and a proof assistant verifier, yet it remains unclear which components actually improve finite-budget proof success and why they help on real mathematical workloads. We study this question through statistical provability: the probability of reaching a verified proof within a budget on a specified stream of theorem instances. We model formal proof search as a finite-horizon reachability MDP with deterministic verifier dynamics, and show that under a faithful state abstraction the optimal success probability coincides with ordinary syntactic provability. We then analyze a simple but practically important pipeline: depth-wise offline action-value regression followed by greedy test-time proving. Our main theorem bounds the provability gap between the learned prover and the optimal prover by an occupancy-weighted sum of uniform action-value errors; in the common uniform-error reading, the leading complexity multiplier is the learned prover's average truncated proof length. The error decomposes into approximation error, geometric coverage of the training distribution, and Monte Carlo label noise, and improves to a fast rate under an action-gap margin condition. The result gives a component-sensitive account of why verifier feedback, retrieval, representation geometry, and proof-shortening mechanisms help on biased theorem workloads, without contradicting classical worst-case hardness.

2602.10090 2026-05-26 cs.AI cs.CL cs.LG 版本更新

Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning

Agent World Model: 用于智能体强化学习的无限合成环境

Zhaoyang Wang, Canwen Xu, Boyi Liu, Yite Wang, Siwei Han, Zhewei Yao, Huaxiu Yao, Yuxiong He

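Affiliations * University of North Carolina at Chapel Hill

AI summary Proposes Agent World Model (AWM), a fully synthetic environment generation pipeline whose code-driven, database-backed environments support large-scale reinforcement learning and help agents generalize across diverse everyday scenarios.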

Comments Accepted to ICML 2026

详情
AI中文摘要

近年来,大型语言模型(LLM)的进步使得自主智能体能够与工具和环境进行多轮交互。然而,扩展此类智能体训练受到缺乏多样且可靠环境的限制。在本文中,我们提出了Agent World Model(AWM),一个完全合成的环境生成管道。使用该管道,我们扩展到涵盖日常场景的1000个环境,智能体可以在其中与丰富的工具集交互并获得高质量的观测。值得注意的是,这些环境是代码驱动的并由数据库支持,比由LLM模拟的环境提供更可靠和一致的状态转换。此外,与从现实环境中收集轨迹相比,它们实现了更高效的智能体交互。为了展示该资源的有效性,我们对多轮工具使用智能体进行了大规模强化学习。得益于完全可执行的环境和可访问的数据库状态,我们还可以设计可靠的奖励函数。在三个基准上的实验表明,仅在合成环境中训练(而非特定于基准的环境)能产生强大的分布外泛化能力。代码可在 https://github.com/Snowflake-Labs/agent-world-model 获取。

英文摘要

Recent advances in large language models (LLMs) have empowered autonomous agents to perform multi-turn interactions with tools and environments. However, scaling such agent training is limited by the lack of diverse and reliable environments. In this paper, we propose Agent World Model (AWM), a fully synthetic environment generation pipeline. Using this pipeline, we scale to 1,000 environments covering everyday scenarios, in which agents can interact with rich toolsets and obtain high-quality observations. Notably, these environments are code-driven and backed by databases, providing more reliable and consistent state transitions than environments simulated by LLMs. Moreover, they enable more efficient agent interaction compared with collecting trajectories from realistic environments. To demonstrate the effectiveness of this resource, we perform large-scale reinforcement learning for multi-turn tool-use agents. Thanks to the fully executable environments and accessible database states, we can also design reliable reward functions. Experiments on three benchmarks show that training exclusively in synthetic environments, rather than benchmark-specific ones, yields strong out-of-distribution generalization. The code is available at https://github.com/Snowflake-Labs/agent-world-model.

2602.02009 2026-05-26 cs.LG 版本更新

Logic-Guided Vector Fields for Constrained Generative Modeling

逻辑引导的向量场用于约束生成建模

Ali Baheri

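Affiliations * Rochester Institute of Technology

AI summary Proposes the Logic-Guided Vector Fields (LGVF) framework, which injects differentiable relaxations of logical constraints into flow matching generative models via a training-time logic loss and an inference-time gradient adjustment, reducing constraint violations by 59-82% across three constrained generation case studies.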

详情
AI中文摘要

神经符号系统旨在结合符号逻辑的表达结构与神经学习的灵活性;然而,生成模型通常缺乏在生成时强制执行声明性约束的机制。我们提出了逻辑引导向量场(LGVF),这是一个神经符号框架,将符号知识(指定为逻辑约束的可微松弛)注入流匹配生成模型。LGVF耦合了两种互补机制:(1)训练时逻辑损失,惩罚连续流轨迹上的约束违反,权重强调目标分布附近的正确性;(2)推理时调整,使用约束梯度引导采样,作为对学习动力学的轻量级、逻辑信息校正。我们在三个约束生成案例研究上评估了LGVF,涵盖线性、非线性和多区域可行性约束。在所有设置中,与标准流匹配相比,LGVF将约束违反减少了59-82%,并在每种情况下实现了最低的违反率。在线性和环形设置中,LGVF还通过MMD衡量提高了分布保真度,而在多障碍物设置中,我们观察到满意度-保真度权衡,可行性提高但MMD增加。除了定量收益外,LGVF还产生了具有约束意识的向量场,表现出新兴的避障行为,无需显式路径规划即可将样本绕过禁止区域。

英文摘要

Neuro-symbolic systems aim to combine the expressive structure of symbolic logic with the flexibility of neural learning; yet, generative models typically lack mechanisms to enforce declarative constraints at generation time. We propose Logic-Guided Vector Fields (LGVF), a neuro-symbolic framework that injects symbolic knowledge, specified as differentiable relaxations of logical constraints, into flow matching generative models. LGVF couples two complementary mechanisms: (1) a training-time logic loss that penalizes constraint violations along continuous flow trajectories, with weights that emphasize correctness near the target distribution; and (2) an inference-time adjustment that steers sampling using constraint gradients, acting as a lightweight, logic-informed correction to the learned dynamics. We evaluate LGVF on three constrained generation case studies spanning linear, nonlinear, and multi-region feasibility constraints. Across all settings, LGVF reduces constraint violations by 59-82% compared to standard flow matching and achieves the lowest violation rates in each case. In the linear and ring settings, LGVF also improves distributional fidelity as measured by MMD, while in the multi-obstacle setting, we observe a satisfaction-fidelity trade-off, with improved feasibility but increased MMD. Beyond quantitative gains, LGVF yields constraint-aware vector fields exhibiting emergent obstacle-avoidance behavior, routing samples around forbidden regions without explicit path planning.
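The inference-time mechanism (steering sampling with constraint gradients) can be sketched on a toy problem. Everything below is a stand-in chosen for illustration: `base_field` replaces a learned flow matching network with a simple pull toward a fixed point, the constraint is a single half-plane $x_0 \le C$ relaxed as a squared hinge, and the guidance weight is arbitrary. It shows the general shape of gradient-guided Euler sampling, not the LGVF implementation.

```python
import numpy as np

MU = np.array([2.0, 0.0])   # toy target the base flow moves toward
C = 1.0                     # declarative constraint: x[0] <= C

def base_field(x, t):
    """Stand-in for a learned vector field (assumption: simple pull toward MU)."""
    return MU - x

def violation(x):
    """Differentiable relaxation of the constraint: squared hinge on x[0] - C."""
    return max(x[0] - C, 0.0) ** 2

def violation_grad(x):
    """Gradient of the squared-hinge violation with respect to x."""
    g = np.zeros_like(x)
    g[0] = 2.0 * max(x[0] - C, 0.0)
    return g

def sample(x0, guidance=0.0, steps=100):
    """Euler integration from t=0 to t=1 with an optional constraint correction."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        x = x + dt * (base_field(x, t) - guidance * violation_grad(x))
    return x

unguided = sample([0.0, 0.0])
guided = sample([0.0, 0.0], guidance=5.0)
print(violation(unguided), violation(guided))
```

In this toy setting the correction is inactive inside the feasible region and only pushes back once the sample crosses the boundary, so the guided sample ends with a smaller violation. It does not reach the base field's target, which mirrors the satisfaction-fidelity trade-off the abstract reports for the multi-obstacle case.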

2602.01576 2026-05-26 cs.LG cs.AI cs.CV 版本更新

Generative Visual Code Mobile World Models

生成式视觉代码移动世界模型

Woosung Koh, Sungjun Han, Segyu Lee, Se-Young Yun, Jamin Shin

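Affiliations * Trillion Labs

AI summary Proposes generating the next mobile GUI state by having a single vision-language model predict executable web code, combining the strengths of text-based and visual world models to achieve high-fidelity visual generation with precise text rendering.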

Comments ICML 2026

详情
AI中文摘要

移动图形用户界面世界模型为在训练和推理时提升移动GUI代理性能提供了有前景的路径。然而,当前方法面临关键权衡:基于文本的世界模型牺牲了视觉保真度,而视觉世界模型在精确文本渲染上的不足导致其依赖缓慢、复杂的流水线和大量外部模型。我们提出一种新范式:通过可渲染代码生成进行视觉世界建模,其中单一视觉语言模型预测下一个GUI状态为可执行网页代码,该代码渲染为像素,而非直接生成像素。这结合了两种方法的优势:视觉语言模型保留其语言先验以实现精确文本渲染,同时其在结构化网页代码上的预训练实现了高保真视觉生成。我们推出了gWorld(8B、32B),这是基于该范式的首个开源权重视觉移动GUI世界模型,以及一个自动合成基于代码的训练数据的数据生成框架(gWorld)。在4个分布内和2个分布外基准测试的广泛评估中,gWorld在准确率与模型规模之间建立了新的帕累托前沿,性能优于8个前沿开源权重模型(其规模大50.25倍以上)。进一步分析表明:(1)通过gWorld扩展训练数据带来有意义的收益;(2)我们流水线的每个组件都提高了数据质量;(3)更强的世界建模提升了下游移动GUI策略性能。

英文摘要

Mobile Graphical User Interface (GUI) World Models (WMs) offer a promising path for improving mobile GUI agent performance at train- and inference-time. However, current approaches face a critical trade-off: text-based WMs sacrifice visual fidelity, while the inability of visual WMs to render text precisely has led to their reliance on slow, complex pipelines dependent on numerous external models. We propose a novel paradigm: visual world modeling via renderable code generation, where a single Vision-Language Model (VLM) predicts the next GUI state as executable web code that renders to pixels, rather than generating pixels directly. This combines the strengths of both approaches: VLMs retain their linguistic priors for precise text rendering while their pre-training on structured web code enables high-fidelity visual generation. We introduce gWorld (8B, 32B), the first open-weight visual mobile GUI WMs built on this paradigm, along with a data generation framework (gWorld) that automatically synthesizes code-based training data. In extensive evaluation across 4 in- and 2 out-of-distribution benchmarks, gWorld sets a new Pareto frontier in accuracy versus model size, outperforming 8 frontier open-weight models over 50.25x larger. Further analyses show that (1) scaling training data via gWorld yields meaningful gains, (2) each component of our pipeline improves data quality, and (3) stronger world modeling improves downstream mobile GUI policy performance.

2601.21670 2026-05-26 cs.CV cs.LG 版本更新

Diverse via bounded Agreement: Geometric Regularization for Multimodal Fusion

通过有界一致性实现多样性:多模态融合的几何正则化

Zixuan Xia, Hao Wang, Pengcheng Weng, Yanyu Qian, Yangxin Xu, William Dan, Fei Wang

Affiliations * Department of Informatics, University of Bern; College of Computing and Data Science, Nanyang Technological University; School of Software Engineering, Xi’an Jiaotong University

AI summary Proposes a lightweight plug-and-play geometric regularization framework that follows a bounded-agreement principle, constraining cross-modal drift while preserving modality-specific diversity to improve multimodal fusion.

详情
AI中文摘要

多模态融合通常被视为一个优化平衡问题,通过调整训练信号防止一种模态主导其他模态。然而,平衡优化并不能完全决定中间表示的几何结构。有监督的多模态模型仍可能学习到低多样性的模态特定嵌入,或允许配对的跨模态观测过度分离,从而削弱单模态鲁棒性和多模态融合。 我们引入了\regName,一个轻量级即插即用的多模态表示学习几何正则化框架。\regName不强制执行严格的跨模态对齐,而是遵循有界一致性原则:在仅软约束超过允许一致性带的配对跨模态漂移部分的同时,保留模态特定多样性。在操作上,\regName结合了一个分散项(减轻谱集中度)和一个一致性带锚定项(控制过度配对漂移),无需架构修改或推理时开销。 在音频-视觉、图像-文本和基于RF的基准测试上的实验表明,\regName一致地提高了多模态性能,并常常增强单模态表示。这些结果表明,显式调节表示几何是优化平衡的有效补充,并提供了几何感知正则化可以改善跨不同架构和领域的多模态学习的证据。

英文摘要

Multimodal fusion is often treated as an optimization-balancing problem, where training signals are adjusted to prevent one modality from dominating the others. However, balanced optimization does not fully determine the geometry of intermediate representations. Supervised multimodal models may still learn low-diversity modality-specific embeddings or allow paired cross-modal observations to drift excessively apart, weakening both unimodal robustness and multimodal fusion. We introduce \regName, a lightweight plug-and-play geometric regularization framework for multimodal representation learning. Rather than enforcing rigid cross-modal alignment, \regName follows a bounded-agreement principle: preserve modality-specific diversity while softly constraining only the portion of paired cross-modal drift that exceeds an admissible agreement band. Operationally, \regName combines a dispersion term that mitigates spectral concentration with an agreement-band anchoring term that controls excessive paired drift, requiring no architectural modification or inference-time overhead. Experiments across audio-visual, image-text, and RF-based benchmarks show that \regName consistently improves multimodal performance and often strengthens unimodal representations. These results suggest that explicitly regulating representation geometry is an effective complement to optimization balancing, and provide evidence that geometry-aware regularization can improve multimodal learning across diverse architectures and domains.

2601.19070 2026-05-26 cs.LG 版本更新

Critical Organization of Deep Neural Networks, and p-Adic Statistical Field Theories

深度神经网络的临界组织与p进统计场论

W. A. Zúñiga-Galindo

发表机构 * University of Texas Rio Grande Valley School of Mathematical \& Statistical Sciences One West University Blvd Brownsville, TX 78520, United States

AI总结 本文严格证明了深度神经网络在激活函数为sigmoid时的热力学极限,揭示了参数空间中的分岔临界组织,并利用p进整数编码层次结构,将临界组织与层次拓扑联系起来,同时研究了随机版本网络的输出分布。

Comments Many typos and minor errors were corrected. The main theorem was strengthened

详情
AI中文摘要

我们严格研究了深度神经网络(DNNs)和循环神经网络(RNNs)的热力学极限,假设激活函数为sigmoid。热力学极限是一个连续神经网络,其中神经元形成具有无限多个点的连续空间。我们证明,在参数空间的某个区域内,这样的网络存在唯一的状态,该状态连续依赖于参数。在该参数空间区域之外,该状态分裂成无限多个状态。那么,临界组织是参数空间中的一个分岔,网络从唯一状态过渡到无限多个状态。我们使用p进整数来编码层次结构。实际上,我们提出了一种算法,将DNNs和RNNs中使用的层次拓扑重新表述为p进树状结构。在这个框架中,层次组织和临界组织是联系在一起的。我们严格研究了一个玩具模型的临界组织,该模型是一个基于p进细胞神经网络的灰度图像层次边缘检测器。这种网络的临界组织可以描述为一个奇异吸引子。在第二部分,我们研究了DNNs和RNNs的随机版本。在这种情况下,网络参数是二次可积函数空间中的广义高斯随机变量。我们计算了在无限宽度情况下给定输入时输出的概率分布。我们证明它有一个幂次展开,其中常数项是高斯分布。

英文摘要

We rigorously study the thermodynamic limit of deep neural networks (DNNs) and recurrent neural networks (RNNs), assuming that the activation functions are sigmoids. A thermodynamic limit is a continuous neural network, where the neurons form a continuous space with infinitely many points. We show that such a network admits a unique state in a certain region of the parameter space, which depends continuously on the parameters. This state breaks into an infinite number of states outside the mentioned region of parameter space. Then, the critical organization is a bifurcation in the parameter space, where a network transitions from a unique state to infinitely many states. We use p-adic integers to codify hierarchical structures. Indeed, we present an algorithm that recasts the hierarchical topologies used in DNNs and RNNs as p-adic tree-like structures. In this framework, the hierarchical and the critical organizations are connected. We study rigorously the critical organization of a toy model, a hierarchical edge detector for grayscale images based on p-adic cellular neural networks. The critical organization of such a network can be described as a strange attractor. In the second part, we study random versions of DNNs and RNNs. In this case, the network parameters are generalized Gaussian random variables in a space of square-integrable functions. We compute the probability distribution of the output given the input, in the infinite-width case. We show that it admits a power-type expansion, where the constant term is a Gaussian distribution.

2601.16091 2026-05-26 cs.MA cs.AI cs.LG 版本更新

Delayed Assignments in Online Non-Centroid Clustering with Stochastic Arrivals

随机到达的在线非质心聚类中的延迟分配

Saar Cohen

发表机构 * Bar Ilan University(巴伊兰大学) University of Oxford(牛津大学)

AI总结 针对随机到达模型,提出一种常数竞争比的在线非质心聚类算法,允许延迟分配以平衡聚类距离成本和延迟成本。

Comments To Appear in the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2026

详情
AI中文摘要

聚类是一个基本问题,旨在将一组元素(如智能体或数据点)划分为若干簇,使得同一簇内的元素彼此之间的距离小于与其他簇内元素的距离。本文提出了一个研究带延迟的在线非质心聚类的新框架,其中元素作为有限度量空间中的点逐个到达,应被分配到簇中,但分配不必立即进行。具体而言,每个点到达时其位置被揭示,在线算法必须不可撤销地将其分配到现有簇或创建一个新簇(此时仅包含该点)。然而,我们允许以延迟成本为代价推迟决策,而不是遵循更常见的到达时立即决策的假设。这带来了一个关键挑战:目标是最小化每个簇内点之间的总距离成本以及因推迟分配而产生的总延迟成本。在经典的最坏情况到达模型(点以任意顺序到达)中,没有算法的竞争比优于点数的次对数。为克服这一强不可能性,我们专注于随机到达模型,其中点的位置随时间独立地从有限度量空间上的一个未知固定概率分布中抽取。我们提供了超越最坏情况对手的希望:设计了一个常数竞争的算法,即随着点数的增长,输出聚类的期望总成本与最优离线聚类的总成本之比被一个常数所界。

英文摘要

Clustering is a fundamental problem, aiming to partition a set of elements, like agents or data points, into clusters such that elements in the same cluster are closer to each other than to those in other clusters. In this paper, we present a new framework for studying online non-centroid clustering with delays, where elements, which arrive one at a time as points in a finite metric space, should be assigned to clusters, but assignments need not be immediate. Specifically, upon arrival, each point's location is revealed, and an online algorithm has to irrevocably assign it to an existing cluster or create a new one containing, at this moment, only this point. However, we allow decisions to be postponed at a delay cost, instead of following the more common assumption of immediate decisions upon arrival. This poses a critical challenge: the goal is to minimize both the total distance costs between points in each cluster and the overall delay costs incurred by postponing assignments. In the classic worst-case arrival model, where points arrive in an arbitrary order, no algorithm has a competitive ratio better than sublogarithmic in the number of points. To overcome this strong impossibility, we focus on a stochastic arrival model, where points' locations are drawn independently across time from an unknown and fixed probability distribution over the finite metric space. We offer hope for beyond worst-case adversaries: we devise an algorithm that is constant competitive in the sense that, as the number of points grows, the ratio between the expected overall costs of the output clustering and an optimal offline clustering is bounded by a constant.

2601.11428 2026-05-26 cs.LG 版本更新

Diagnosing Failure Modes of Neural Operators Across Diverse PDE Families

诊断不同PDE族中神经算子的失败模式

Lennon Shikhman

发表机构 * Georgia Institute of Technology(佐治亚理工学院)

AI总结 本文提出一个标准化压力测试框架,通过在不同PDE族上测试FNO、DeepONet和CNO三种架构,发现分布内准确率不能可靠预测鲁棒性,且失败模式依赖于架构和PDE族的组合。

Comments Published in Transactions on Machine Learning Research. 17 pages, 7 figures, 1 table

详情
AI中文摘要

神经PDE求解器越来越多地被用作偏微分方程族的学习替代模型,其中关键的机器学习挑战不仅是在固定基准分布上的插值,还包括在系数、边界条件、离散化和滚动时域的结构化偏移下的泛化。然而,评估仍然常常由分布内测试误差主导,使得鲁棒性难以评估。我们引入了一个针对部署相关偏移下神经PDE求解器的标准化压力测试框架。我们在三个代表性架构——傅里叶神经算子(FNO)、DeepONet风格模型和卷积神经算子(CNO)——上实例化该框架,涵盖五个定性不同的PDE族:色散、椭圆、多尺度流体、金融和混沌系统。在750个训练模型中,我们使用基线归一化退化因子以及谱和滚动诊断来测量鲁棒性。由此产生的比较表明,强的分布内准确率不能可靠预测鲁棒性,并且失败模式共同依赖于架构和PDE族。我们的结果为评估神经PDE求解器中的鲁棒性声明提供了更清晰的基础,并表明在结构化偏移下的函数空间泛化应被视为首要评估目标。

英文摘要

Neural PDE solvers are increasingly used as learned surrogates for families of partial differential equations, where the key machine learning challenge is not only interpolation on a fixed benchmark distribution but generalization under structured shifts in coefficients, boundary conditions, discretization, and rollout horizon. Yet evaluation is still often dominated by in-distribution test error, making robustness difficult to assess. We introduce a standardized stress-testing framework for neural PDE solvers under deployment-relevant shift. We instantiate it on three representative architectures -- Fourier Neural Operators (FNOs), a DeepONet-style model, and convolutional neural operators (CNOs) -- across five qualitatively different PDE families: dispersive, elliptic, multi-scale fluid, financial, and chaotic systems. Across 750 trained models, we measure robustness using baseline-normalized degradation factors together with spectral and rollout diagnostics. The resulting comparisons reveal that strong in-distribution accuracy does not reliably predict robustness, and that failure patterns depend jointly on architecture and PDE family. Our results provide a clearer basis for evaluating robustness claims in neural PDE solvers and suggest that function-space generalization under structured shift should be treated as a first-class evaluation target.
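The abstract does not define its "baseline-normalized degradation factor", so the following is only a minimal sketch under one plausible reading: the error under a shifted condition divided by the same model's in-distribution error. The function names and the choice of relative L2 error are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def relative_l2(pred, target):
    """Relative L2 error between a predicted and a reference field."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return np.linalg.norm(pred - target) / np.linalg.norm(target)


def degradation_factor(err_shifted, err_baseline):
    """Assumed definition: shifted-condition error over in-distribution error.

    A value near 1 means the shift barely hurts the model; a value of 3
    means the error tripled relative to the model's own baseline.
    """
    if err_baseline <= 0:
        raise ValueError("baseline error must be positive")
    return err_shifted / err_baseline
```

Normalizing by each model's own baseline is what lets a model with a low in-distribution error still be flagged as fragile, which is the comparison the abstract emphasizes.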

2601.10201 2026-05-26 cs.LG cs.AI cs.CL 版本更新

Future-KL Regularized GRPO: Process-Level Credit Assignment from $f$-Divergence Regularization

未来KL正则化GRPO:基于f-散度正则化的过程级信用分配

Jiarui Yao, Ruida Wang, Hao Bai, Tong Zhang

发表机构 * University of Illinois Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校)

AI总结 本文提出未来KL正则化策略优化(FRPO),通过因果未来正则化回报修正GRPO中局部KL损失缺失的梯度信号,在数学推理任务中提升pass@16并保持更高熵和更低策略漂移。

详情
AI中文摘要

组相对策略优化(GRPO)广泛用于无评论家的大语言模型(LLM)后训练,但其KL正则化通常作为局部损失侧的token惩罚实现。我们表明这遗漏了自回归KL正则化诱导的策略梯度信号。与标准KL正则化强化学习(RL)目标不同,GRPO的组归一化引入非线性提示级效用;对于二元验证器奖励,该效用为$2\arcsin\sqrt p$。因此,奖励和KL在归一化前无法融合而不改变隐式目标。我们推导了具有token级$f$-散度正则化的GRPO风格目标的on-policy梯度。奖励项恢复标准化的GRPO优势,而正则化项包括局部KL损失遗漏的因果未来正则化回报。对于反向KL,这产生简单的未来KL修正:在优势构建后添加每个token对数比的反向累积和。由此产生的方法,未来KL正则化策略优化(FRPO),不需要评论家或额外的模型传递。在数学推理任务上,FRPO在我们的主要大模型设置中提高了pass@16,同时保持比传统损失侧KL基线更高的熵和更低的策略漂移。

英文摘要

Group Relative Policy Optimization (GRPO) is widely used for critic-free Large Language Model (LLM) post-training, but its KL regularization is usually implemented as a local loss-side token penalty. We show that this misses the policy-gradient signal induced by autoregressive KL regularization. Unlike standard KL-regularized Reinforcement Learning (RL) objectives, GRPO's group normalization induces a non-linear prompt-level utility; for binary verifier rewards, this utility is $2\arcsin\sqrt p$. As a result, reward and KL cannot be fused before normalization without changing the implicit objective. We derive the on-policy gradient of GRPO-style objectives with token-wise $f$-divergence regularization. The reward term recovers the standardized GRPO advantage, while the regularizer term includes a causal future-regularization return-to-go omitted by local KL losses. For reverse KL, this yields a simple future KL correction: add a reverse cumulative sum of per-token log ratios after advantage construction. The resulting method, Future-KL Regularized Policy Optimization (FRPO), requires no critic or extra model passes. On mathematical reasoning tasks, FRPO improves pass@16 in our main large-model setting while maintaining higher entropy and lower policy drift than conventional loss-side KL baselines.
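The "reverse cumulative sum of per-token log ratios after advantage construction" can be sketched as follows. This is a hedged illustration, not the authors' code: the coefficient `beta`, the sign convention, and whether the sum includes the current token are assumptions, and the function names are hypothetical.

```python
import numpy as np


def group_standardized_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize rewards within one prompt's group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


def future_kl_corrected_advantages(rewards, log_ratios, beta=0.01):
    """Subtract a future-KL return-to-go from each token's advantage.

    rewards:    one scalar reward per sampled response in the group.
    log_ratios: per response, an array of log pi(y_t) - log pi_ref(y_t).
    Assumption: the correction at token t is beta times the sum of log
    ratios from t to the end of the response (current token included).
    """
    advantages = group_standardized_advantages(rewards)
    corrected = []
    for adv, lr in zip(advantages, log_ratios):
        lr = np.asarray(lr, dtype=float)
        # Reverse cumulative sum: future[t] = lr[t] + lr[t+1] + ... + lr[-1]
        future = np.cumsum(lr[::-1])[::-1]
        corrected.append(adv - beta * future)
    return corrected
```

The correction is applied after standardization, in line with the abstract's point that reward and KL cannot be fused before normalization without changing the implicit objective.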

2601.10012 2026-05-26 cs.LG 版本更新

PID-Guided Partial Alignment for Multimodal Decentralized Federated Learning

PID引导的多模态去中心化联邦学习部分对齐

Yanhang Shi, Xiaoyu Wang, Houwei Cao, Jian Li, Yong Liu

发表机构 * Department of Electrical and Computer Engineering, Stony Brook University(石溪大学电气与计算机工程系) Department of Applied Mathematics and Statistics and the Department of Computer Science, Stony Brook University(石溪大学应用数学与统计系和计算机科学系) Department of Electrical and Computer Engineering, New York University(纽约大学电气与计算机工程系) Department of Computer Science, New York Institute of Technology(纽约理工学院计算机科学系)

AI总结 针对多模态去中心化联邦学习中异构代理间更新不兼容的问题,提出基于部分信息分解的PARSE框架,通过特征分裂和部分对齐实现高效通信与协作。

详情
AI中文摘要

多模态去中心化联邦学习(DFL)必须支持持有不同模态子集和通常不同模型组件的代理之间的协作,同时在无协调服务器或全局网络视图的点对点(P2P)覆盖网络上运行。一个关键障碍是,传统的多模态训练通常依赖于单一共享表示,这隐含假设异构对等体可以通过相同的通信链路交换和聚合相同的模型组件。在多模态DFL中,这一假设不成立:单模态和多模态代理可能通过共享覆盖网络推送不兼容的更新,削弱代理间迁移和跨模态交互。我们提出PARSE,一个无服务器框架,将部分信息分解(PID)引入多模态DFL。每个代理将其潜在特征分裂为冗余、独特和协同切片(“特征分裂”),并在模态条件化的P2P覆盖网络上进行切片感知通信。在训练过程中,代理仅交换与其邻居在语义上可对齐的切片,根据它们共享的模态和模型组件(“部分对齐”)。这种设计避免了集中式编排和梯度手术式的冲突处理,同时与标准DFL约束和多种P2P覆盖网络拓扑兼容。在多个基准测试和异构代理混合场景中,PARSE在保持每链路负载受限的同时,始终优于任务共享、模态共享和混合共享的多模态DFL基线。关于融合选择和分裂比例的消融实验,以及定性特征分析和覆盖网络拓扑研究,证明了所提出的切片感知设计的鲁棒性和通信效率。

英文摘要

Multimodal decentralized federated learning (DFL) must support collaboration among agents that hold different modality subsets and often different model components, while operating over peer-to-peer (P2P) overlays without a coordinating server or a global network view. A key obstacle is that conventional multimodal training often relies on a single shared representation, which implicitly assumes that heterogeneous peers can exchange and aggregate the same model components over the same communication links. In multimodal DFL, this assumption breaks down: uni- and multimodal agents may push incompatible updates through shared overlays, weakening both inter-agent transfer and cross-modal interaction. We present PARSE, a server-free framework that brings partial information decomposition (PID) into multimodal DFL. Each agent splits its latent features into redundant, unique, and synergistic slices ("feature fission"), and performs slice-aware communication over modality-conditioned P2P overlays. During training, agents exchange only the slices that are semantically alignable with their neighbors, according to the modalities and model components they share ("partial alignment"). This design avoids centralized orchestration and gradient-surgery style conflict handling, while remaining compatible with standard DFL constraints and a range of P2P overlay topologies. Across multiple benchmarks and heterogeneous peer mixes, PARSE consistently outperforms task-, modality-, and hybrid-sharing multimodal DFL baselines while keeping per-link payloads bounded. Ablations on fusion choices and split ratios, together with qualitative feature analyses and overlay-topology studies, demonstrate the robustness and communication efficiency of the proposed slice-aware design.

2601.03191 2026-05-26 cs.CV cs.AI cs.LG 版本更新

AnatomiX, an Anatomy-Aware Grounded Multimodal Large Language Model for Chest X-Ray Interpretation

AnatomiX:一种解剖学感知的胸部X光解读多模态大语言模型

Anees Ur Rehman Hashmi, Numan Saeed, Christoph Lippert

发表机构 * Hasso Plattner Institute(哈索·普拉特纳研究所) MBZUAI(穆罕默德·本·扎耶德人工智能大学)

AI总结 提出AnatomiX,一种两阶段解剖学感知多模态大语言模型,通过先识别解剖结构再执行下游任务,在解剖定位、短语定位、定位诊断和定位描述任务上相比现有方法提升超过25%。

详情
AI中文摘要

多模态医学大语言模型在胸部X光解读方面取得了显著进展,但在空间推理和解剖学理解方面仍面临挑战。尽管现有的定位技术提高了整体性能,但它们往往未能建立真正的解剖对应关系,导致医学领域中的解剖理解错误。为弥补这一差距,我们引入了AnatomiX,一种用于解剖学定位的胸部X光解读的多任务多模态大语言模型。受放射学工作流程启发,AnatomiX采用两阶段方法:首先识别解剖结构并提取其特征,然后利用大语言模型执行多种下游任务,如短语定位、报告生成、视觉问答和图像理解。在多个基准上的大量实验表明,与现有方法相比,AnatomiX实现了卓越的解剖推理,并在解剖定位、短语定位、定位诊断和定位描述任务上性能提升超过25%。代码和预训练模型可在 https://aneesurhashmi.github.io/anatomix 获取。

英文摘要

Multimodal medical large language models have shown substantial progress in chest X-ray interpretation but continue to face challenges in spatial reasoning and anatomical understanding. Although existing grounding techniques improve overall performance, they often fail to establish a true anatomical correspondence, resulting in incorrect anatomical understanding in the medical domain. To address this gap, we introduce AnatomiX, a multitask multimodal large language model for anatomically grounded chest X-ray interpretation. Inspired by the radiological workflow, AnatomiX adopts a two-stage approach: first, it identifies anatomical structures and extracts their features, and then leverages a large language model to perform diverse downstream tasks such as phrase grounding, report generation, visual question answering, and image understanding. Extensive experiments across multiple benchmarks demonstrate that AnatomiX achieves superior anatomical reasoning and delivers over 25% improvement in performance on anatomy grounding, phrase grounding, grounded diagnosis and grounded captioning tasks compared to existing approaches. Code and pretrained model are available at https://aneesurhashmi.github.io/anatomix

2512.23995 2026-05-26 cs.CR cs.LG 版本更新

RepetitionCurse: Measuring and Understanding Router Imbalance in Mixture-of-Experts LLMs under DoS Stress

RepetitionCurse:在DoS压力下测量和理解混合专家大语言模型中的路由器不平衡

Ruixuan Huang, Qingyue Wang, Hantao Huang, Yudong Gao, Dong Chen, Shuai Wang, Wei Wang

AI总结 针对混合专家大语言模型在推理时缺乏显式负载均衡约束的问题,提出低成本的对抗性提示方法RepetitionCurse,利用重复令牌模式操纵路由策略,导致计算瓶颈和拒绝服务攻击,显著增加端到端推理延迟。

Comments Accepted by ICML 2026

详情
AI中文摘要

混合专家架构因其优越的参数效率已成为扩展大型语言模型的标准。为了适应实践中专家数量的增长,现代推理系统通常采用专家并行性将专家分布到不同设备上。然而,推理过程中缺乏显式负载均衡约束,使得对抗性输入能够触发严重的路由集中。我们证明,分布外提示可以操纵路由策略,使得所有令牌一致地路由到同一组top-$k$专家,这在某些设备上造成计算瓶颈,同时迫使其他设备空闲。这将效率机制转化为拒绝服务攻击向量,导致违反首令牌时间的服务级别协议。我们提出RepetitionCurse,一种低成本的基于黑盒的策略来利用这一漏洞。通过识别MoE路由器行为中的普遍缺陷,RepetitionCurse以模型无关的方式使用简单的重复令牌模式构建对抗性提示。在广泛部署的MoE模型(如Mixtral-8x7B)上,我们的方法将端到端推理延迟增加了3.063倍,显著降低了服务可用性。

英文摘要

Mixture-of-Experts architectures have become the standard for scaling large language models due to their superior parameter efficiency. To accommodate the growing number of experts in practice, modern inference systems commonly adopt expert parallelism to distribute experts across devices. However, the absence of explicit load balancing constraints during inference allows adversarial inputs to trigger severe routing concentration. We demonstrate that out-of-distribution prompts can manipulate the routing strategy such that all tokens are consistently routed to the same set of top-$k$ experts, which creates computational bottlenecks on certain devices while forcing others to idle. This converts an efficiency mechanism into a denial-of-service attack vector, leading to violations of service-level agreements for time to first token. We propose RepetitionCurse, a low-cost black-box strategy to exploit this vulnerability. By identifying a universal flaw in MoE router behavior, RepetitionCurse constructs adversarial prompts using simple repetitive token patterns in a model-agnostic manner. On widely deployed MoE models like Mixtral-8x7B, our method increases end-to-end inference latency by 3.063x, degrading service availability significantly.

2512.21815 2026-05-26 cs.CV cs.LG 版本更新

High-Entropy Tokens as Multimodal Failure Points in Vision-Language Models

高熵标记作为视觉-语言模型中的多模态失败点

Mengqi He, Xinyu Tian, Xin Shen, Jinhong Ni, Shu Zou, Zhaoyuan Yang, Jing Zhang

发表机构 * The Australian National University(澳大利亚国立大学) The University of Queensland(昆士兰大学) GE Research(GE研究院)

AI总结 本研究揭示视觉-语言模型中约20%的高熵标记集中了不成比例的对抗性影响,并提出基于熵引导的稀疏攻击方法(EGA),实现高攻击成功率与有害率。

Comments 19 pages, 11 figures, 8 tables

详情
AI中文摘要

视觉-语言模型(VLM)取得了显著性能,但仍易受对抗攻击。熵作为模型不确定性的度量,与VLM可靠性高度相关。虽然先前的基于熵的攻击在解码步骤中最大化不确定性,隐含假设每个标记对模型不稳定性的贡献相等,但我们揭示了在评估的具有不同架构的代表性开源VLM中,一小部分(约20%)高熵标记在自回归生成过程中集中了不成比例的对抗性影响。我们证明,将这些对抗扰动集中到这些高熵位置,可以在优化更少解码位置的情况下实现与全局方法相当的语义退化。此外,在多个代表性VLM中,此类攻击不仅导致语义漂移,还在当前流程下产生大量不安全子集(20-31%)。值得注意的是,由于这种脆弱的高熵标记在不同架构的VLM中重复出现,针对它们的攻击表现出非平凡的迁移性。受这些发现启发,我们设计了一种简单的熵引导攻击(EGA),该攻击实现了稀疏高熵定位,并通过可重用的标记库扩展,在三个代表性开源VLM上取得了具有竞争力的攻击成功率(93-95%)和相当高的有害率(30.2-38.6%)。

英文摘要

Vision-language models (VLMs) achieve remarkable performance but remain vulnerable to adversarial attacks. Entropy, as a measure of model uncertainty, is highly correlated with VLM reliability. While prior entropy-based attacks maximize uncertainty at all decoding steps, implicitly assuming that every token equally contributes to model instability, we reveal that a small fraction (around 20%) of high-entropy tokens, in the evaluated representative open-source VLMs with diverse architectures, concentrates a disproportionate share of adversarial influence during autoregressive generation. We demonstrate that concentrating adversarial perturbations on these high-entropy positions achieves comparable semantic degradation to global methods while optimizing fewer decoding positions. Additionally, across multiple representative VLMs, such attacks induce not only semantic drift but also a substantial unsafe subset (20-31%) under the current pipeline. Remarkably, since such vulnerable high-entropy tokens recur across architecturally diverse VLMs, attacks focused on them exhibit non-trivial transferability. Motivated by these findings, we design a simple Entropy-Guided Attack (EGA) that operationalizes sparse high-entropy targeting and extends it with a reusable token bank, yielding competitive attack success rates (93-95%) with a considerable harmful rate (30.2-38.6%) on the three representative open-source VLMs.
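As a diagnostic illustration of the "small fraction (around 20%) of high-entropy tokens", the sketch below computes per-position predictive entropy from next-token logits and returns the positions in the top fraction. It is a generic entropy calculation, not the paper's EGA pipeline; the function names and the `fraction` parameter are assumptions for illustration.

```python
import numpy as np


def token_entropies(logits):
    """Shannon entropy (in nats) of the softmax distribution at each position.

    logits: array of shape (num_positions, vocab_size).
    """
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)
    return -(probs * np.log(probs + 1e-12)).sum(axis=-1)


def high_entropy_positions(logits, fraction=0.2):
    """Indices of the decoding positions with the highest entropy."""
    entropies = token_entropies(logits)
    k = max(1, int(round(fraction * len(entropies))))
    top = np.argsort(-entropies)[:k]
    return np.sort(top)
```

With five decoding positions and `fraction=0.2`, exactly one position is returned: the one where the model's next-token distribution is flattest.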

2512.21208 2026-05-26 cs.LG math.DS math.OC 版本更新

A Learning Stability Profile for Finite-Dimensional Learning Dynamics

有限维学习动力学的学习稳定性剖面

Ronald Katende

发表机构 * Department of Mathematics, Kabale University(卡巴莱大学数学系)

AI总结 提出一个有限维灵敏度框架,通过Lyapunov准则控制学习稳定性剖面,适用于前馈网络、残差架构、随机梯度方法和非光滑系统。

Comments 19 pages, 0 figures

详情
AI中文摘要

我们开发了一个有限维灵敏度框架,用于研究状态包含表示、参数和更新变量的学习系统的稳定性。核心对象是学习稳定性剖面,这是一个方向灵敏度算子集合,记录了输入、参数初始化和更新机制中的扰动如何沿指定学习轨迹传播。主要结果是控制该剖面的Lyapunov准则。在显式的正则性、强制性和耗散性假设下,增量Lyapunov能量对相关的线性化转移算子产生一致或指数衰减的界。该结果被表述为充分稳定性准则,而非无条件逆定理。该框架还区分了终端衰减、剖面有界性和次指数增长,避免了将非正增长指数与一致有界性等同。然后,该剖面被专门应用于几种标准学习机制。谱界为前馈网络提供前向灵敏度估计。耗散性和步长限制为残差架构提供稳定性界。均方收缩假设为随机梯度方法提供参数和更新灵敏度界。局部Lipschitz系统,包括分段线性网络、近端映射、投影更新以及递归或状态空间递归,通过Clarke广义Jacobian和变分Lyapunov不等式处理。所得框架为架构、优化、随机性和非光滑性提供了统一的稳定性语言。其作用是结构性的:它将已知的稳定性机制组织在一个扰动演算中,同时使每种保证所需的假设保持明确。

英文摘要

We develop a finite-dimensional sensitivity framework for studying stability in learning systems whose states include representations, parameters, and update variables. The central object is the Learning Stability Profile, a collection of directional sensitivity operators that records how perturbations in inputs, parameter initialization, and update mechanisms propagate along a specified learning trajectory. The main result is a Lyapunov criterion for controlling this profile. Under explicit regularity, coercivity, and dissipation assumptions, an incremental Lyapunov energy yields uniform or exponentially decaying bounds on the associated linearized transition operators. The result is stated as a sufficient stability criterion, not as an unconditional converse theorem. The framework also distinguishes terminal decay, profile-wise boundedness, and subexponential growth, avoiding the identification of nonpositive growth exponents with uniform boundedness. The profile is then specialized to several standard learning mechanisms. Spectral bounds give forward sensitivity estimates for feedforward networks. Dissipativity and step-size restrictions give stability bounds for residual architectures. Mean-square contraction assumptions yield parameter and update sensitivity bounds for stochastic gradient methods. Locally Lipschitz systems, including piecewise-linear networks, proximal maps, projected updates, and recurrent or state-space recursions, are handled through Clarke generalized Jacobians and variational Lyapunov inequalities. The resulting framework provides a common stability language for architecture, optimization, stochasticity, and nonsmoothness. Its role is structural: it organizes known stability mechanisms within one perturbation calculus while keeping the hypotheses needed for each guarantee explicit.

2512.19097 2026-05-26 cs.LG cs.AI 版本更新

DIVER-1: Scaling Intracranial EEG Foundation Models for Transferable Representations

DIVER-1: 扩展颅内脑电图基础模型以实现可迁移表示

Danny Dongyeop Han, Yonghyeon Gwon, Ahhyun Lucy Lee, Taeyang Lee, Seong Jin Lee, Jubin Choi, Sebin Lee, Jihyun Bang, Seungju Lee, David Keetae Park, Shinjae Yoo, Chun Kee Chung, Jiook Cha

发表机构 * Seoul National University(首尔国立大学) Brookhaven National Laboratory(布鲁克海文国家实验室)

AI总结 提出DIVER-1自监督iEEG基础模型,通过可变电极-时间注意力、时空重采样等设计处理可变输入,在5310小时ECoG和SEEG上预训练,在认知解码和癫痫检测任务上超越现有模型,并首次进行受控计算感知的扩展研究。

Comments 31 pages, 12 figures, 14 tables

详情
AI中文摘要

颅内脑电图(iEEG)提供直接、毫秒级的人类神经活动记录,但由于电极布局、解剖覆盖、参考方案和记录条件在不同患者和中心之间存在差异,可重用的表示学习变得困难。我们引入了DIVER-1,一个用于可变输入记录的自监督iEEG基础模型,它结合了任意变量电极-时间注意力、时空重采样、输入条件位置嵌入和多域掩码重建,而不假设固定的电极布局。我们在5310小时的ECoG和SEEG上预训练了两个变体DIVER-1-0.1s和DIVER-1-1s,涵盖352k通道小时,大约是BrainTreeBank预训练量的54倍。我们在两个保留基准上评估DIVER-1:用于自然认知解码的Neuroprobe和用于癫痫检测的MAYO。在考虑泄漏的Neuroprobe上,尽管预训练时未使用构成Neuroprobe语料库的BrainTreeBank记录,DIVER-1-0.1s仍优于先前评估的iEEG基础模型;它在平均AUROC上也超过了线性频谱图解码器,并与更强的非线性基线保持竞争力,这是先前评估的iEEG基础模型未能达到的水平。DIVER-1-1s在MAYO癫痫检测上也取得了最高的AUROC。最后,我们进行了据我们所知首次受控计算感知的自监督iEEG预训练扩展研究,扫描了数据规模、受试者数量、训练时长和模型大小(高达1.8B参数)。我们的结果表明存在数据受限区域:扩展独特记录和充分训练是比单纯增加参数数量更可靠的扩展轴。代码可在链接处获取。

英文摘要

Intracranial EEG (iEEG) provides direct, millisecond-scale recordings of human neural activity, but reusable representation learning is difficult because electrode layouts, anatomical coverage, referencing schemes, and recording conditions vary across patients and centers. We introduce DIVER-1, a self-supervised iEEG foundation model for variable-input recordings that combines any-variate electrode-time attention, spatio-temporal resampling, input-conditioned positional embeddings, and multi-domain masked reconstruction without assuming a fixed electrode montage. We pretrain two variants, DIVER-1-0.1s and DIVER-1-1s, on 5,310 hours of ECoG and SEEG spanning 352k channel-hours, roughly 54x the BrainTreeBank-based pretraining volume. We evaluate DIVER-1 on two held-out benchmarks: Neuroprobe for naturalistic cognitive decoding and MAYO for seizure detection. On leakage-aware Neuroprobe, DIVER-1-0.1s outperforms prior evaluated iEEG foundation models despite using no BrainTreeBank recordings, the corpus underlying Neuroprobe, during pretraining; it also exceeds the linear spectrogram decoder in mean AUROC and remains competitive with stronger nonlinear baselines, a level prior evaluated iEEG foundation models did not reach. DIVER-1-1s also achieves the top AUROC on MAYO seizure detection. Finally, we conduct, to our knowledge, the first controlled compute-aware scaling study for self-supervised iEEG pretraining, sweeping data scale, subject count, training duration, and model size up to 1.8B parameters. Our results indicate a data-constrained regime: expanding unique recordings and training sufficiently long are more reliable scaling axes than increasing parameter count alone. Code is available at link.

2510.20955 2026-05-26 cs.LG cs.RO 版本更新

Approximating Safety Feedback Without a Safety Oracle via Model Predictive Control

无安全神谕下通过模型预测控制近似安全反馈

Jeff Pflueger, Michael Everett

发表机构 * Northeastern University(东北大学)

AI总结 提出一种利用模拟器和模型预测路径积分算法,基于可逆性和正不变性假设来近似安全函数的方法,避免手动设计安全反馈。

Comments 8 pages, 5 figures

详情
AI中文摘要

移动机器人控制的安全决策算法通常需要存在反馈来验证提议动作的安全性。该反馈假定在控制系统的开发或部署过程中直接可用,可以采取显式约束公式或手工标记的安全数据集的形式,但两者都可能不准确或耗时。许多最近开发的模拟器可以处理复杂的交互和多样化的环境。这些环境具有隐式安全约束,可能难以建模。通过利用其中一个模拟器,我们可以构建一个安全函数的代理,从而绕过对手动设计反馈来捕获这些约束的需求。我们提出了一种算法,通过使用可逆性和对不安全状态空间的正不变性假设来近似安全性。该方法采用模型预测路径积分算法(MPPI)来建立这种可逆性并验证提议的动作。首先,通过模拟器将动作投影到未来状态。然后,如果MPPI能够找到一条路径返回到轨迹中的先前状态,则该状态保证在不安全(正不变)集合之外。实验结果表明,所提出的算法可以近似安全神谕的性能,同时避免将不安全状态分类为安全。

英文摘要

Safe decision-making algorithms for control of mobile robots often require the existence of feedback to verify the safety of proposed actions. This feedback is assumed to be directly available during the development or deployment of the control system. It can take the form of either an explicit constraint formulation or a set of hand-labeled safety data, both of which can be inaccurate or time-consuming to produce. Many recently developed simulators can handle complex interactions and varied environments. These environments have implicit safety constraints that may be hard to model. By leveraging one of these simulators, we can construct a proxy for a safety function that bypasses the need for hand-designed feedback in capturing these constraints. We present an algorithm that approximates safety by using reversibility and a positive-invariance assumption on the unsafe state space. This method employs the Model-Predictive Path Integral algorithm (MPPI) to establish this reversibility and verify a proposed action. First, the action is projected via the simulator to a future state. Then, if MPPI can find a path back to a previous state in the trajectory, that state is guaranteed to be outside the unsafe (positive invariant) set. Experimental results demonstrate that the proposed algorithm can approximate the performance of a safety oracle while avoiding classification of unsafe states as safe.

2510.20477 2026-05-26 cs.LG 版本更新

Bi-CoG: Bi-Consistency-Guided Self-Training for Vision-Language Models

Bi-CoG:面向视觉语言模型的双一致性引导自训练

Rui Zhu, Song-Lin Lv, Zi-Kang Wang, Lan-Zhe Guo

发表机构 * School of Intelligence Science and Technology, Nanjing University, China(南京大学智能科学与技术学院) National Key Laboratory for Novel Software Technology, Nanjing University, China(南京大学新型软件技术国家重点实验室)

AI总结 针对半监督微调中模型偏差和超参数敏感问题,提出一种利用模型间和模型内一致性以及误差感知动态伪标签分配策略的即插即用方法Bi-CoG,在14个数据集上显著提升现有方法性能。

Comments Accepted by IJCAI 2026

详情
AI中文摘要

通过半监督学习(SSL)利用未标记数据或通过微调利用预训练模型是解决标签稀缺场景的两种主流范式。最近,将预训练视觉语言模型(VLM)的微调与SSL相结合引起了越来越多的关注,形成了半监督微调的新兴范式。然而,现有方法由于依赖预测一致性或预定义的置信度阈值,常常遭受模型偏差和超参数敏感性的困扰。为了解决这些局限性,我们提出了一种简单而有效的即插即用方法,名为Bi-Consistency-Guided Self-Training(Bi-CoG),它通过同时利用模型间和模型内一致性,以及一种错误感知的动态伪标签分配策略,来分配高质量、低偏差的伪标签。理论分析和在14个数据集上的大量实验都证明了Bi-CoG的有效性,它一致且显著地提升了现有方法的性能。

英文摘要

Exploiting unlabeled data through semi-supervised learning (SSL) or leveraging pre-trained models via fine-tuning are two prevailing paradigms for addressing label-scarce scenarios. Recently, growing attention has been given to combining fine-tuning of pre-trained vision-language models (VLMs) with SSL, forming the emerging paradigm of semi-supervised fine-tuning. However, existing methods often suffer from model bias and hyperparameter sensitivity, due to reliance on prediction consistency or pre-defined confidence thresholds. To address these limitations, we propose a simple yet effective plug-and-play methodology named Bi-Consistency-Guided Self-Training (Bi-CoG), which assigns high-quality and low-bias pseudo-labels, by simultaneously exploiting inter-model and intra-model consistency, along with an error-aware dynamic pseudo-label assignment strategy. Both theoretical analysis and extensive experiments over 14 datasets demonstrate the effectiveness of Bi-CoG, which consistently and significantly improves the performance of existing methods.

2510.07257 2026-05-26 cs.LG 版本更新

Test-Time Graph Search for Goal-Conditioned Reinforcement Learning

测试时图搜索用于目标条件强化学习

Evgenii Opryshko, Junwei Quan, Claas Voelcker, Yilun Du, Igor Gilitschenski

发表机构 * Department of Computer Science, University of Toronto, Toronto, Canada(多伦多大学计算机科学系) Vector Institute, Toronto, Canada(向量研究所) University of Texas at Austin, Austin, USA(德克萨斯大学奥斯汀分校) Harvard University, Cambridge, USA(哈佛大学)

AI总结 提出测试时图搜索方法,通过构建离线数据集图并自适应选择子目标,在不额外训练的情况下显著提升目标条件强化学习在长时域任务中的成功率。

详情
AI中文摘要

离线目标条件强化学习(GCRL)通常难以处理长时域任务,其中价值估计误差累积导致策略不可靠。通常认为没有专门训练就无法实现有效的长期规划。相反,我们的工作表明,现有的GCRL策略与轻量级、无需训练的规划包装器结合时,可以完成长时域任务。我们发现标准目标条件价值函数编码了足以进行规划的局部一致几何结构。我们的方法,测试时图搜索(TTGS),在离线数据集上构建图,并采用自适应子目标选择策略。为了解决最短路径搜索中不可靠的价值估计,我们提出了一种新机制,软性地惩罚长距离转移。我们的方法计算开销可忽略,且不需要额外的监督或参数更新。在OGBench基准上,TTGS显著提高了多个基学习器和任务的成功率,主要收益在具有挑战性的长时域运动任务上,其中一些成功率从接近零提高到90%以上,通常匹配或超越需要复杂辅助训练的方法。代码和视频可在https://ktolnos.github.io/ttgs找到。

英文摘要

Offline goal-conditioned reinforcement learning (GCRL) often struggles with long-horizon tasks, where errors in value estimation accumulate and produce unreliable policies. It is typically assumed that effective long-term planning is infeasible without specialized training. In contrast, our work demonstrates that existing GCRL policies can complete long-horizon tasks when combined with a lightweight, training-free planning wrapper. We find that standard goal-conditioned value functions encode locally consistent geometric structure sufficient for planning. Our approach, Test-Time Graph Search (TTGS), constructs a graph over the offline dataset and employs an adaptive subgoal selection strategy. To address unreliable value estimates during shortest-path search, we propose a novel mechanism that softly penalizes long-distance transitions. Our method incurs negligible computational overhead and requires no additional supervision or parameter updates. On the OGBench benchmark, TTGS significantly boosts success rates across multiple base learners and tasks, with primary gains on challenging long-horizon locomotion tasks where some success rates are improved from near-zero to over 90%, often matching or outperforming methods that require complex auxiliary training. Code and videos can be found at https://ktolnos.github.io/ttgs.
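The idea of softly penalizing long-distance transitions during shortest-path search can be sketched with a plain Dijkstra search over a matrix of estimated distances. This is a hedged illustration only: the hinge-style penalty, the `threshold` and `weight` parameters, and the use of a precomputed distance matrix (standing in for value-function estimates) are assumptions, not the TTGS implementation.

```python
import heapq


def soft_penalized_cost(distance, threshold=2.0, weight=5.0):
    """Edge cost that grows faster once the estimated distance exceeds a threshold."""
    return distance + weight * max(0.0, distance - threshold)


def shortest_subgoal_path(dist, start, goal, threshold=2.0, weight=5.0):
    """Dijkstra over a dense graph whose edge costs are penalized distance estimates.

    dist[u][v] is a (possibly unreliable) estimate of the distance from u to v.
    Returns the list of node indices from start to goal.
    """
    n = len(dist)
    best = [float("inf")] * n
    prev = [None] * n
    best[start] = 0.0
    heap = [(0.0, start)]
    while heap:
        cost, u = heapq.heappop(heap)
        if cost > best[u]:
            continue  # stale heap entry
        if u == goal:
            break
        for v in range(n):
            if v == u:
                continue
            new_cost = cost + soft_penalized_cost(dist[u][v], threshold, weight)
            if new_cost < best[v]:
                best[v] = new_cost
                prev[v] = u
                heapq.heappush(heap, (new_cost, v))
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]
```

When a long direct hop is underestimated (for example 2.5 against a true 3.0), the unpenalized search takes the hop, while the penalized search prefers two short hops whose estimates are more trustworthy.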

2510.01384 2026-05-26 cs.LG 版本更新

Fine-Tuning Masked Diffusion for Provable Self-Correction

微调掩码扩散以实现可证明的自校正

Jaeyeon Kim, Seunggeun Kim, Taekyun Lee, David Z. Pan, Hyeji Kim, Sham Kakade, Sitan Chen

发表机构 * Harvard University(哈佛大学) The University of Texas at Austin(德克萨斯大学奥斯汀分校) Kempner Institute(凯姆纳研究所)

AI总结 提出PRISM方法,通过轻量级模型无关的重新掩码策略,在掩码扩散模型中实现可证明的自校正,无需强化学习或验证器,提升低质量令牌检测与修正能力。

Comments Authorship statement: Jaeyeon Kim and Seunggeun Kim contributed equally, and Taekyun Lee is also a co first author

详情
AI中文摘要

生成模型的一个自然期望是自校正——在推理时检测并修正低质量令牌。尽管掩码扩散模型(MDMs)已成为离散空间生成建模的有前景方法,但其自校正能力仍知之甚少。先前将自校正融入MDMs的尝试要么需要彻底改造MDM架构/训练,要么依赖于令牌质量的不精确代理,限制了其适用性。受此启发,我们引入PRISM——掩码扩散推理时自校正的插件式重新掩码——一种轻量级、模型无关的方法,适用于任何预训练MDM。理论上,PRISM定义了一个自校正损失,可证明地学习每个令牌的质量分数,无需强化学习或验证器。这些质量分数在与MDM相同的前向传播中计算,并用于检测低质量令牌。实验上,PRISM在多个领域和规模上推进了MDM推理:数独;无条件文本(170M);以及使用LLaDA(8B)的代码。

英文摘要

A natural desideratum for generative models is self-correction--detecting and revising low-quality tokens at inference. While Masked Diffusion Models (MDMs) have emerged as a promising approach for generative modeling in discrete spaces, their capacity for self-correction remains poorly understood. Prior attempts to incorporate self-correction into MDMs either require overhauling MDM architectures/training or rely on imprecise proxies for token quality, limiting their applicability. Motivated by this, we introduce PRISM--Plug-in Remasking for Inference-time Self-correction of Masked Diffusions--a lightweight, model-agnostic approach that applies to any pretrained MDM. Theoretically, PRISM defines a self-correction loss that provably learns per-token quality scores, without RL or a verifier. These quality scores are computed in the same forward pass with MDM and used to detect low-quality tokens. Empirically, PRISM advances MDM inference across domains and scales: Sudoku; unconditional text (170M); and code with LLaDA (8B).
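The remasking step, in which per-token quality scores are used to detect low-quality tokens, can be sketched as below. How PRISM learns those scores is not shown; the scores are taken as given, and the `threshold`, `max_remask`, and `mask_token` names are hypothetical choices for illustration.

```python
def remask_low_quality(tokens, quality, mask_token="[MASK]",
                       threshold=0.5, max_remask=None):
    """Replace tokens whose quality score falls below a threshold with a mask.

    tokens:     list of already-decoded tokens.
    quality:    list of per-token scores in [0, 1], higher meaning better.
    max_remask: optional cap; the lowest-scoring tokens are remasked first.
    """
    candidates = [i for i, q in enumerate(quality) if q < threshold]
    candidates.sort(key=lambda i: quality[i])  # worst tokens first
    if max_remask is not None:
        candidates = candidates[:max_remask]
    chosen = set(candidates)
    return [mask_token if i in chosen else tok for i, tok in enumerate(tokens)]
```

The remasked positions would then be handed back to the masked diffusion model for another round of unmasking.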

2510.01184 2026-05-26 cs.LG 版本更新

Temporal Score Rescaling for Temperature Sampling in Diffusion and Flow Models

扩散与流模型中温度采样的时间分数重缩放

Yanbo Xu, Yu Wu, Sungjae Park, Zhizhuo Zhou, Shubham Tulsiani

发表机构 * School of Computer Science, Carnegie Mellon University, Pittsburgh, USA(美国匹兹堡卡内基梅隆大学计算机科学学院) Department of Computer Science, Stanford University, California, USA(美国加利福尼亚州斯坦福大学计算机科学系)

AI总结 提出一种无需微调或改变训练策略的方法,通过重缩放噪声数据的得分函数来调控扩散和流模型的采样多样性,实现局部温度控制,并在图像生成、姿态估计、深度预测、机器人操作和蛋白质设计等任务中验证了有效性。

Comments Accepted at ICML 2026. Project page: https://temporalscorerescaling.github.io/

详情
AI中文摘要

我们提出一种机制来引导去噪扩散和流匹配模型的采样多样性,允许用户从比训练分布更尖锐或更宽的分布中采样。我们基于这些模型利用(学习的)噪声数据分布的得分函数进行采样这一观察,并表明重缩放这些得分函数可以有效控制“局部”采样温度。值得注意的是,该方法不需要任何微调或改变训练策略,可以应用于任何现成模型,并且与确定性和随机采样器兼容。我们首先在玩具2D数据上验证了我们的框架,然后展示了其在五个不同任务上训练的扩散模型中的应用——图像生成、姿态估计、深度预测、机器人操作和蛋白质设计。我们发现,在这些任务中,我们的方法允许从更尖锐(或更平坦)的分布中采样,从而带来性能提升,例如,深度预测模型受益于采样更可能的深度估计,而图像生成模型在采样稍平坦的分布时表现更好。

英文摘要

We present a mechanism to steer the sampling diversity of denoising diffusion and flow matching models, allowing users to sample from a sharper or broader distribution than the training distribution. We build on the observation that these models leverage (learned) score functions of noisy data distributions for sampling and show that rescaling these allows one to effectively control a 'local' sampling temperature. Notably, this approach does not require any finetuning or alterations to training strategy, can be applied to any off-the-shelf model, and is compatible with both deterministic and stochastic samplers. We first validate our framework on toy 2D data, and then demonstrate its application for diffusion models trained across five disparate tasks -- image generation, pose estimation, depth prediction, robot manipulation, and protein design. We find that across these tasks, our approach allows sampling from sharper (or flatter) distributions, yielding performance gains; for example, depth prediction models benefit from sampling more likely depth estimates, whereas image generation models perform better when sampling a slightly flatter distribution.
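The intuition that rescaling a score function controls sampling temperature can be checked on a one-dimensional Gaussian, where multiplying the score by a constant k gives the score of a Gaussian with variance divided by k. The sketch below covers only this constant-factor case at a single noise level, using unadjusted Langevin dynamics as an assumed sampler; the paper's time-dependent rescaling schedule is not reproduced here.

```python
import numpy as np


def gaussian_score(x, mu, sigma):
    """Score (gradient of the log-density) of N(mu, sigma^2)."""
    return -(x - mu) / sigma**2


def rescaled_score(score_fn, scale):
    """Wrap a score function so its output is multiplied by a constant."""
    return lambda x: scale * score_fn(x)


def langevin_samples(score_fn, n=5000, steps=2000, step_size=0.01, seed=0):
    """Unadjusted Langevin dynamics started from zero."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for _ in range(steps):
        noise = rng.standard_normal(n)
        x = x + step_size * score_fn(x) + np.sqrt(2 * step_size) * noise
    return x


# scale > 1 sharpens the distribution, scale < 1 flattens it.
sharper = rescaled_score(lambda x: gaussian_score(x, 0.0, 1.0), 2.0)
samples = langevin_samples(sharper)
```

With a scale of 2 on a unit Gaussian, the samples should have a standard deviation close to 1/sqrt(2), up to discretization and sampling error.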

2510.00387 2026-05-26 cs.LG cs.HC 版本更新

Bayesian Distributional Models of Executive Functioning

执行功能的贝叶斯分布模型

Robert Kasumba, Zeyu Lu, Dom CP Marticorena, Mingyang Zhong, Paul Beggs, Anja Pahor, Geetha Ramani, Imani Goffney, Susanne M Jaeggi, Aaron R Seitz, Jacob R Gardner, Dennis L Barbour

发表机构 * Division of Computing and Data Science, Washington University(华盛顿大学计算与数据科学系) Department of Computer Science and Engineering, Washington University(华盛顿大学计算机科学与工程系) Department of Biomedical Engineering, Washington University(华盛顿大学生物医学工程系) Department of Psychology, University of Maribor(马里博尔大学心理学系) Department of Human Development and Quantitative Methodology, University of Maryland(马里兰大学人类发展与定量方法系) Department of Teaching and Learning, Policy and Leadership, University of Maryland(马里兰大学教学与学习、政策与领导系) Department of Psychology, Northeastern University(东北大学心理学系) Department of Computer and Information Science, University of Pennsylvania(宾夕法尼亚大学计算机与信息科学系)

AI总结 本研究使用已知真实参数的受控模拟,评估分布潜变量模型(DLVM)和贝叶斯分布主动学习(DALE)相比传统独立最大似然估计(IMLE)的优势,证明DLVM结合DALE能更高效地估计认知表现分布。

Comments 45 pages, 8 figures, 2 tables

详情
AI中文摘要

本研究使用已知真实参数的受控模拟,评估分布潜变量模型(DLVM)和贝叶斯分布主动学习(DALE)相比传统独立最大似然估计(IMLE)的表现。DLVM整合了多个执行功能任务和个体的观测,允许在稀疏或不完整数据条件下进行参数估计。为了建立已知真实参数,我们从神经网络学习的潜空间中均匀采样个体会话,并将其映射到不同任务上的分布认知表现。然后使用DALE、随机程序或标准固定测验组合方法从这些分布中采样个体测试项。在给定相同观测集时,DLVM始终优于IMLE,尤其是在数据量较小的情况下,并且更快收敛到真实分布的高度准确估计。在第二组分析中,DALE自适应地引导采样以最大化信息增益,优于随机采样和固定测验组合,尤其是在前80次试验中。这些发现确立了将DLVM的跨任务推理与DALE的最优自适应采样相结合的优势,为更高效的认知评估提供了原则性基础。

英文摘要

This study uses controlled simulations with known ground-truth parameters to evaluate how Distributional Latent Variable Models (DLVM) and Bayesian Distributional Active LEarning (DALE) perform in comparison to conventional Independent Maximum Likelihood Estimation (IMLE). DLVM integrates observations across multiple executive function tasks and individuals, allowing parameter estimation even under sparse or incomplete data conditions. To establish known ground truth, we uniformly sample individual sessions from a latent space learned by a neural network and map them to distributional cognitive performance across different tasks. The individual test items are then sampled from these distributions using DALE, a random procedure, or a standard fixed-battery approach. When given the same set of observations, DLVM consistently outperformed IMLE, especially under smaller amounts of data, and converged faster to highly accurate estimates of the true distributions. In a second set of analyses, DALE adaptively guided sampling to maximize information gain, outperforming random sampling and fixed test batteries, particularly within the first 80 trials. These findings establish the advantages of combining DLVM's cross-task inference with DALE's optimal adaptive sampling, providing a principled basis for more efficient cognitive assessments.

2509.13608 2026-05-26 cs.LG 版本更新

Is GPT-4o mini Blinded by its Own Safety Filters? Exposing the Multimodal-to-Unimodal Bottleneck in Hate Speech Detection

GPT-4o mini 是否被自身的安全过滤器蒙蔽?揭示多模态到单模态瓶颈在仇恨言论检测中的作用

Niruthiha Selvanayagam, Ted Kurti

AI总结 本文通过 Hateful Memes Challenge 数据集系统分析 GPT-4o mini 在多模态仇恨言论检测中的安全架构,发现并实验验证了“单模态瓶颈”缺陷,即上下文无关的安全过滤器会优先阻断多模态推理,导致误报。

Comments This paper reports preliminary findings from a small-scale study whose sample size is insufficient to support the stated conclusions. The authors are withdrawing it to conduct a more comprehensive evaluation

详情
AI中文摘要

随着大型多模态模型(LMMs)融入日常数字生活,理解其安全架构成为 AI 对齐的关键问题。本文对 OpenAI 的 GPT-4o mini(一个全球部署的模型)在多模态仇恨言论检测这一困难任务上进行了系统分析。使用 Hateful Memes Challenge 数据集,我们对 500 个样本进行了多阶段调查,以探究模型的推理和失败模式。我们的核心发现是通过实验识别出“单模态瓶颈”——一种架构缺陷,其中模型的高级多模态推理被上下文无关的安全过滤器系统性地抢先阻断。对 144 次内容策略拒绝的定量验证显示,这些覆盖触发由单模态视觉(50%)和文本(50%)内容均等引发。我们进一步证明该安全系统脆弱,不仅阻止高风险图像,也阻止良性的常见模因格式,导致可预测的误报。这些发现揭示了最先进 LMMs 中能力与安全性之间的根本矛盾,强调了需要更集成、上下文感知的对齐策略,以确保 AI 系统能够安全且有效地部署。

英文摘要

As Large Multimodal Models (LMMs) become integral to daily digital life, understanding their safety architectures is a critical problem for AI Alignment. This paper presents a systematic analysis of OpenAI's GPT-4o mini, a globally deployed model, on the difficult task of multimodal hate speech detection. Using the Hateful Memes Challenge dataset, we conduct a multi-phase investigation on 500 samples to probe the model's reasoning and failure modes. Our central finding is the experimental identification of a "Unimodal Bottleneck," an architectural flaw where the model's advanced multimodal reasoning is systematically preempted by context-blind safety filters. A quantitative validation of 144 content policy refusals reveals that these overrides are triggered in equal measure by unimodal visual (50%) and textual (50%) content. We further demonstrate that this safety system is brittle, blocking not only high-risk imagery but also benign, common meme formats, leading to predictable false positives. These findings expose a fundamental tension between capability and safety in state-of-the-art LMMs, highlighting the need for more integrated, context-aware alignment strategies to ensure AI systems can be deployed both safely and effectively.

2509.12783 2026-05-26 q-bio.NC cs.LG math.DS stat.ML 版本更新

Fast reconstruction of degenerate populations of conductance-based neuron models from spike times

基于电导的神经元模型退化群体的尖峰时间快速重建

Julien Brandoit, Damien Ernst, Guillaume Drion, Arthur Fyon

发表机构 * Montefiore Institute, University of Liège(列日大学蒙特菲奥里研究所) LTCI, Telecom Paris, Institut Polytechnique de Paris(巴黎理工学院巴黎高等电信学院LTCI实验室)

AI总结 结合深度学习与动态输入电导理论,从尖峰时间快速重建高维电导模型的退化群体,实现高精度、鲁棒且可扩展的推断。

详情
Journal ref
PLOS Computational Biology 22(5): e1014337 (2026)
AI中文摘要

从实验可获取的记录中推断电导模型(CBMs)的生物物理参数仍然是计算神经科学的一个核心挑战。尖峰时间是最广泛可用的数据,但它们很少揭示哪些离子通道电导组合产生了观察到的活动。这一逆问题因神经元退化而进一步复杂化,其中多个不同的电导集产生相似的尖峰模式。我们引入了一种方法,通过将深度学习与动态输入电导(DICs)相结合来解决这一挑战,DICs是一个理论框架,将复杂的CBMs简化为三个可解释的反馈组件,控制兴奋性和尖峰模式。我们的方法首先使用一个神经网络将尖峰时间映射到阈值处的DIC密度,该网络学习神经元活动的低维表示。然后,预测的DIC值通过迭代补偿算法用于生成退化的CBM群体,确保与中间目标DIC兼容,从而再现相应的尖峰模式,即使在高维模型中也是如此。应用于两个模型,该算法流程以高精度和鲁棒性重建尖峰和爆发模式,包括在模拟生理随机性的噪声电流注入下生成的尖峰序列。它在标准硬件上毫秒级内产生多样的退化群体,实现了仅从尖峰记录进行可扩展且高效的推断。总之,这项工作将DICs定位为实验观察活动与机制模型之间的实用且可解释的桥梁。通过实现直接从尖峰时间快速且可扩展地重建退化群体,我们的方法提供了一种强大的方式来研究神经元如何利用电导变异性实现可靠计算。

英文摘要

Inferring the biophysical parameters of conductance-based models (CBMs) from experimentally accessible recordings remains a central challenge in computational neuroscience. Spike times are the most widely available data, yet they reveal little about which combinations of ion channel conductances generate the observed activity. This inverse problem is further complicated by neuronal degeneracy, where multiple distinct conductance sets yield similar spiking patterns. We introduce a method that addresses this challenge by combining deep learning with Dynamic Input Conductances (DICs), a theoretical framework that reduces complex CBMs to three interpretable feedback components governing excitability and firing patterns. Our approach first maps spike times to DIC densities at threshold using a neural network that learns a low-dimensional representation of neuronal activity. The predicted DIC values are then used to generate degenerate CBM populations via an iterative compensation algorithm, ensuring compatibility with the intermediate target DICs, and thereby reproducing the corresponding firing patterns, even in high-dimensional models. Applied to two models, this algorithmic pipeline reconstructs spiking and bursting regimes with high accuracy and robustness to variability, including spike trains generated under noisy current injection mimicking physiological stochasticity. It produces diverse degenerate populations within milliseconds on standard hardware, enabling scalable and efficient inference from spike recordings alone. Together, this work positions DICs as a practical and interpretable link between experimentally observed activity and mechanistic models. By enabling fast and scalable reconstruction of degenerate populations directly from spike times, our approach provides a powerful way to investigate how neurons exploit conductance variability to achieve reliable computation.

2509.11379 2026-05-26 stat.ML cs.LG math.ST stat.TH 版本更新

Some Robustness Properties of Label Cleaning

标签清理的一些鲁棒性性质

Chen Cheng, John Duchi

AI总结 本文证明,依赖聚合标签(例如从噪声响应中提炼的标签信息)的学习过程具有数据清理无法实现的鲁棒性,体现在风险一致性、模型误设下的收敛性等方面。

Comments 41 pages, 3 figures. Accepted to Transactions on Machine Learning Research (TMLR)

详情
AI中文摘要

我们证明,依赖聚合标签(例如从噪声响应中提炼的标签信息)的学习过程具有数据清理无法实现的鲁棒性。这种鲁棒性以多种方式体现。在风险一致性的背景下——当采用机器学习中标准的做法,即最小化替代(通常是凸的)损失函数来代替期望的任务损失(如0-1误分类误差)时——使用标签聚合的过程获得了比使用原始标签更强的相合性保证。而在经典统计场景中,拟合完全正确指定的模型表明,纳入所有可能信息(即对标签不确定性建模)在统计上是有效的,但一旦要最小化的损失函数有轻微误设,“标准”方法就会失去相合性。然而,利用聚合信息的过程仍然收敛到最优分类器,这突显了纳入更全面的数据分析流程(从数据收集到模型拟合再到预测时间)如何通过精炼噪声信号来产生更鲁棒的方法论。

英文摘要

We demonstrate that learning procedures that rely on aggregated labels, e.g., label information distilled from noisy responses, enjoy robustness properties impossible without data cleaning. This robustness appears in several ways. In the context of risk consistency -- when one takes the standard approach in machine learning of minimizing a surrogate (typically convex) loss in place of a desired task loss (such as the zero-one mis-classification error) -- procedures using label aggregation obtain stronger consistency guarantees than those even possible using raw labels. And while classical statistical scenarios of fitting perfectly-specified models suggest that incorporating all possible information -- modeling uncertainty in labels -- is statistically efficient, consistency fails for "standard" approaches as soon as a loss to be minimized is even slightly mis-specified. Yet procedures leveraging aggregated information still converge to optimal classifiers, highlighting how incorporating a fuller view of the data analysis pipeline, from collection to model-fitting to prediction time, can yield a more robust methodology by refining noisy signals.
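
To see why aggregation refines noisy signals, a small worked computation may help. It is a sketch under assumptions the abstract does not state: binary labels, independent labelers who each flip the true label with the same probability, and majority vote as the aggregation rule. The function name is hypothetical.

```python
from math import comb


def majority_vote_error(flip_prob, n_labelers):
    """Probability that a majority vote over independent noisy binary labels is wrong.

    Each labeler independently reports the wrong label with probability
    `flip_prob`; the vote fails when more than half of them are wrong.
    """
    if n_labelers % 2 == 0:
        raise ValueError("use an odd number of labelers to avoid ties")
    k_min = n_labelers // 2 + 1
    return sum(
        comb(n_labelers, k) * flip_prob**k * (1.0 - flip_prob) ** (n_labelers - k)
        for k in range(k_min, n_labelers + 1)
    )


for m in (1, 3, 5, 9):
    print(m, round(majority_vote_error(0.3, m), 5))
```

With a 30% per-labeler flip rate, the aggregated label is wrong less often as the number of labelers grows. The paper's results concern consistency of the downstream learner, which this arithmetic does not capture.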

2508.11872 2026-05-26 cs.CY cs.AI cs.LG cs.MM 版本更新

Designing Singing Syllabi with Virtual Avatars: AI-Assisted Syllabus Reauthoring

用虚拟化身设计歌唱教学大纲:AI辅助的大纲重创

Xinxing Wu

发表机构 * Kentucky State University, USA(美国肯塔基州立大学)

AI总结 本文提出一种AI辅助工作流,将传统文本教学大纲转化为音乐、视频和虚拟化身增强的学习制品,作为正式大纲的补充。

Comments 16 pages, 1 figure, 1 table

详情
AI中文摘要

传统教学大纲通常作为静态参考文档,而非课程的引人入胜的介绍。在实际教学中,我们观察到很少有学生彻底阅读或完全理解传统文本教学大纲中的信息,这可能导致重要信息未被充分利用。本文将大纲沟通重新定义为设计问题,并记录了一种AI辅助工作流,用于将传统大纲转化为音乐、视频和虚拟化身增强的学习制品。本文追溯了歌词改编、音乐生成、视频合成、虚拟化身合成以及可选的基于浏览器的交互过程。本文贡献了一个可重复的工作流和一个具体的大纲重创示例。本文的讨论将歌唱大纲定位为正式书面大纲的补充而非替代,并指出了未来实证评估的方向。本文描述的完整实现已在 https://github.com/xinxingwu-uk/SSVA 公开。

英文摘要

Traditional syllabi often function as static reference documents rather than engaging introductions to a course. In practical teaching, we observe that few students thoroughly read or fully comprehend the information provided in traditional, text-based course syllabi, which can leave essential information underused. This paper reframes syllabus communication as a design problem and documents an AI-assisted workflow for transforming a traditional syllabus into a musical, video-based, and avatar-enhanced learning artifact. The paper traces the process of lyrical adaptation, music generation, video composition, avatar synthesis, and optional browser-based interaction. The paper also contributes a reproducible workflow and a concrete example of syllabus reauthoring. The discussion in this paper positions the singing syllabus as a supplement to, not a replacement for, the formal written syllabus and identifies future directions for empirical evaluation. The complete implementation described in this paper is publicly available at https://github.com/xinxingwu-uk/SSVA

2506.20641 2026-05-26 math.AP cs.LG math.PR 版本更新

Telegrapher's Generative Model via Kac Flows

基于Kac流的电报生成模型

Richard Duong, Jannis Chemseddine, Peter K. Friz, Gabriele Steidl

发表机构 * Institute of Mathematics, TU Berlin(柏林工业大学数学研究所)

AI总结 提出一种基于阻尼波动方程(电报方程)的新型生成模型,利用Kac过程与电报方程的Feynman-Kac关系,通过流匹配训练神经网络近似速度场,实现样本生成,并展示其相对于扩散模型的优势。

Comments V2: Added CIFAR. V3: Old FID & CIFAR images of Kac model corresponded to schedule g(t) = t. Updated them with both schedules t and t^2. V4: Corrected minor implementation error & updated CIFAR results. V5: Added: Prop. A1 (mean-reverting Kac process is Lipschitz); rigorous proof of decomp. Lemma 6.1; nearest neighbor analysis. V7: Correction in proof of Lem. 6.1. V6/8: Polishing

详情
AI中文摘要

我们打破了基于流的生成模型的传统模式,提出了一种基于阻尼波动方程(也称为电报方程)的新模型。与扩散方程和布朗运动类似,电报方程与一维随机Kac过程之间存在Feynman-Kac型关系。Kac流在时间上逐步线性演化,因此概率流在Wasserstein距离下是Lipschitz连续的,并且与扩散流不同,速度的范数保持全局有界。此外,Kac模型以扩散模型为其渐近极限。我们将这些考虑扩展到多维随机过程,该过程由每个空间分量上的独立一维Kac过程组成。我们证明该过程在Wasserstein空间中产生绝对连续曲线,并解析计算从Dirac点出发时的条件速度场。利用流匹配框架,我们训练神经网络来近似速度场,并将其用于样本生成。我们的数值实验展示了该方法的可扩展性,并显示了其相对于扩散模型的优势。

英文摘要

We break the mold in flow-based generative modeling by proposing a new model based on the damped wave equation, also known as telegrapher's equation. Similar to the diffusion equation and Brownian motion, there is a Feynman-Kac type relation between the telegrapher's equation and the stochastic Kac process in 1D. The Kac flow evolves stepwise linearly in time, so that the probability flow is Lipschitz continuous in the Wasserstein distance and, in contrast to diffusion flows, the norm of the velocity remains globally bounded. Furthermore, the Kac model has the diffusion model as its asymptotic limit. We extend these considerations to a multi-dimensional stochastic process which consists of independent 1D Kac processes in each spatial component. We show that this process gives rise to an absolutely continuous curve in the Wasserstein space and analytically compute the conditional velocity field when starting in a Dirac point. Using the framework of flow matching, we train a neural network to approximate the velocity field and use it for sample generation. Our numerical experiments demonstrate the scalability of our approach, and show its advantages over diffusion models.
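
The bounded-velocity property can be checked on a simulated path. The sketch below assumes the standard description of the 1D Kac process: a particle moves at constant speed and reverses direction at the events of a Poisson process. The parameter names (`speed`, `flip_rate`) and values are illustrative and not taken from the paper.

```python
import random


def simulate_kac(t_end, speed=1.0, flip_rate=2.0, rng=None):
    """Endpoint at time t_end of one 1D Kac path started at x = 0.

    The particle moves with velocity +speed or -speed and flips sign after
    exponentially distributed waiting times with rate `flip_rate`.
    """
    rng = rng or random.Random()
    direction = rng.choice((-1.0, 1.0))
    t, x = 0.0, 0.0
    while True:
        wait = rng.expovariate(flip_rate)
        if t + wait >= t_end:
            # Finish the last partial segment and stop.
            return x + direction * speed * (t_end - t)
        x += direction * speed * wait
        t += wait
        direction = -direction


rng = random.Random(0)
t_end, speed = 1.5, 1.0
endpoints = [simulate_kac(t_end, speed, flip_rate=2.0, rng=rng) for _ in range(5000)]
max_displacement = max(abs(x) for x in endpoints)
print(f"max |x(T)| = {max_displacement:.4f}, bound speed*T = {speed * t_end:.4f}")
```

Because each path is piecewise linear with slope of magnitude `speed`, no sample can travel farther than `speed * t_end`. A Brownian path has no such bound, which is the contrast with diffusion flows that the abstract draws.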

2506.10054 2026-05-26 cs.LG cs.AI cs.CL cs.CV 版本更新

Uni-DPO: A Unified Paradigm for Dynamic Preference Optimization of LLMs

Uni-DPO:大语言模型动态偏好优化的统一范式

Shangpin Peng, Weinong Wang, Zhuotao Tian, Senqiao Yang, Xing Wu, Haotian Xu, Chengquan Zhang, Takashi Isobe, Baotian Hu, Min Zhang

发表机构 * Harbin Institute of Technology, Shenzhen(哈尔滨工业大学(深圳)) Xi’an Jiaotong University(西安交通大学) The Chinese University of Hong Kong(香港中文大学) University of Chinese Academy of Sciences(中国科学院大学) Tsinghua University(清华大学) Huazhong University of Science and Technology(华中科技大学)

AI总结 针对现有DPO方法忽略数据质量和学习难度差异的问题,提出Uni-DPO统一框架,通过自适应重加权偏好对实现更有效的数据利用和更优性能。

Comments Accepted by ICLR 2026. Code & models: https://github.com/pspdada/Uni-DPO

详情
AI中文摘要

直接偏好优化(DPO)因其简单高效已成为从人类反馈中进行强化学习(RLHF)的基石。然而,现有的基于DPO的方法通常平等对待所有偏好对,忽略了数据质量和学习难度的显著差异,导致数据利用效率低下和性能次优。为解决这一局限,我们提出Uni-DPO,一个统一的动态偏好优化框架,该框架联合考虑(a)偏好对的内在质量和(b)模型在训练过程中的动态表现。通过基于这两个因素自适应地重新加权样本,Uni-DPO能够更有效地利用偏好数据并实现卓越性能。跨模型和基准的大量实验证明了Uni-DPO的有效性和泛化能力。在文本任务上,使用Uni-DPO微调的Gemma-2-9B-IT在Arena-Hard上超越领先的大语言模型Claude 3 Opus 6.7个百分点。在数学和多模态任务上,Uni-DPO在所有基准上持续优于基线方法,为其有效性和鲁棒性提供了强有力的实证证据。

英文摘要

Direct Preference Optimization (DPO) has emerged as a cornerstone of reinforcement learning from human feedback (RLHF) due to its simplicity and efficiency. However, existing DPO-based methods typically treat all preference pairs equally, overlooking substantial variations in data quality and learning difficulty, which leads to inefficient data utilization and suboptimal performance. To address this limitation, we propose Uni-DPO, a unified dynamic preference optimization framework that jointly considers (a) the inherent quality of preference pairs and (b) the model's evolving performance during training. By adaptively reweighting samples based on both factors, Uni-DPO enables more effective use of preference data and achieves superior performance. Extensive experiments across models and benchmarks demonstrate the effectiveness and generalization of Uni-DPO. On textual tasks, Gemma-2-9B-IT fine-tuned with Uni-DPO surpasses the leading LLM, Claude 3 Opus, by 6.7 points on Arena-Hard. On mathematical and multimodal tasks, Uni-DPO consistently outperforms baseline methods across all benchmarks, providing strong empirical evidence of its effectiveness and robustness.
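
The idea of reweighting preference pairs can be sketched on top of the standard DPO loss. The abstract does not give Uni-DPO's weighting formulas, so the per-pair `weights` below are placeholder inputs; the helper names and the weighted average are assumptions for illustration only.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_pair_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one pair, from sequence log-probabilities."""
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)


def weighted_dpo_loss(pairs, weights, beta=0.1):
    """Weighted average of per-pair DPO losses (weights are hypothetical)."""
    total_weight = sum(weights)
    weighted = sum(w * dpo_pair_loss(*p, beta=beta) for p, w in zip(pairs, weights))
    return weighted / total_weight


# Each tuple: (policy logp chosen, policy logp rejected, ref logp chosen, ref logp rejected)
pairs = [(-1.0, -2.0, -1.5, -1.5), (-3.0, -1.0, -2.0, -2.0)]
uniform = weighted_dpo_loss(pairs, weights=[1.0, 1.0])
reweighted = weighted_dpo_loss(pairs, weights=[0.2, 1.8])
print(uniform, reweighted)
```

Setting all weights equal recovers plain DPO. Shifting weight toward the second pair, which the policy currently ranks incorrectly, raises its share of the gradient.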

2506.09199 2026-05-26 cs.LG cs.AI cs.DC 版本更新

FLoRIST: Singular Value Thresholding for Efficient and Accurate Federated Fine-Tuning of Large Language Models

FLoRIST: 用于高效准确的大语言模型联邦微调的奇异值阈值化方法

Hariharan Ramesh, Jyotikrishna Dass

发表机构 * Anonymous Institution, Anonymous City, Anonymous Region, Anonymous Country(匿名机构,匿名城市,匿名地区,匿名国家)

AI总结 提出FLoRIST框架,通过奇异值阈值化在紧凑中间空间中对局部适配器进行分解,实现数学上准确的聚合,同时保持通信和计算高效。

Comments 21 pages, 12 figures

详情
Journal ref
Ninth Conference on Machine Learning and Systems (MLSys 2026)
AI中文摘要

将低秩适配(LoRA)集成到联邦学习为在不共享本地数据的情况下对大语言模型(LLMs)进行参数高效微调提供了一种有前景的解决方案。然而,为联邦LoRA设计的几种方法在平衡通信效率、模型准确性和计算成本方面面临重大挑战,尤其是在异构客户端之间。这些方法要么依赖于简单的局部适配器平均,这会引入聚合噪声;要么需要传输大型堆叠局部适配器,导致通信效率低下;要么需要重建内存密集的全局权重更新矩阵并执行计算昂贵的分解来设计客户端特定的低秩适配器。在这项工作中,我们提出了FLoRIST,一个联邦微调框架,在不产生高通信或计算开销的情况下实现了数学上准确的聚合。FLoRIST不是在服务器端构建完整的全局权重更新矩阵,而是通过对堆叠的局部适配器分别执行奇异值分解,采用高效的分解流程。该方法在紧凑的中间空间内操作,以表示来自局部LoRA的累积信息。我们引入了可调的奇异值阈值化,用于服务器端最优秩选择,以构建一对所有客户端共享的全局低秩适配器。跨多个数据集和LLMs的大量实证评估表明,FLoRIST在同构和异构设置中始终在卓越的通信效率和竞争性能之间取得最佳平衡。

英文摘要

Integrating Low-Rank Adaptation (LoRA) into federated learning offers a promising solution for parameter-efficient fine-tuning of Large Language Models (LLMs) without sharing local data. However, several methods designed for federated LoRA present significant challenges in balancing communication efficiency, model accuracy, and computational cost, particularly among heterogeneous clients. These methods either rely on simplistic averaging of local adapters, which introduces aggregation noise, require transmitting large stacked local adapters, leading to poor communication efficiency, or necessitate reconstructing a memory-dense global weight-update matrix and performing a computationally expensive decomposition to design client-specific low-rank adapters. In this work, we propose FLoRIST, a federated fine-tuning framework that achieves mathematically accurate aggregation without incurring high communication or computational overhead. Instead of constructing the full global weight-update matrix at the server, FLoRIST employs an efficient decomposition pipeline by performing singular value decomposition on stacked local adapters separately. This approach operates within a compact intermediate space to represent the accumulated information from local LoRAs. We introduce tunable singular value thresholding for server-side optimal rank selection to construct a pair of global low-rank adapters shared by all clients. Extensive empirical evaluations across multiple datasets and LLMs demonstrate that FLoRIST consistently strikes the best balance between superior communication efficiency and competitive performance in both homogeneous and heterogeneous setups.
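
Threshold-based rank selection can be sketched in a few lines. The abstract does not specify FLoRIST's exact criterion, so the rule below is an assumption: keep the smallest rank whose leading singular values account for at least a fraction `tau` of their total sum. The function name and example values are hypothetical.

```python
def select_rank(singular_values, tau):
    """Smallest rank whose leading singular values reach a fraction tau of the total.

    This cumulative-sum criterion is an illustrative choice, not necessarily
    the thresholding rule used in the paper.
    """
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    ordered = sorted(singular_values, reverse=True)
    total = sum(ordered)
    if total == 0:
        return 0
    running = 0.0
    for rank, value in enumerate(ordered, start=1):
        running += value
        if running / total >= tau:
            return rank
    return len(ordered)


spectrum = [5.0, 3.0, 1.0, 0.1]
print(select_rank(spectrum, tau=0.5), select_rank(spectrum, tau=0.9))
```

A lower `tau` gives smaller global adapters and less downlink traffic at the cost of a coarser approximation, which is the tunable trade-off the abstract refers to.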

2506.09084 2026-05-26 cs.LG cs.AI 版本更新

PageLLM: A Multi-Grained Reward Framework for Whole-Page Optimization with Large Language Models

PageLLM:面向整页优化的大语言模型多粒度奖励框架

Xinyuan Wang, Liang Wu, Dongjie Wang, Yanjie Fu

发表机构 * Arizona State University(亚利桑那州立大学) Nokia(诺基亚) University of Kansas(堪萨斯大学)

AI总结 针对整页优化中人工标注成本高和页面级连贯性与项目级放置粒度不匹配的问题,提出PageLLM框架,通过将隐式反馈解耦为粗粒度页面级奖励和细粒度项目级奖励,结合PPO的RLHF进行微调,显著提升排序性能并在线上部署中取得收益。

详情
AI中文摘要

整页优化(WPO)决定了搜索和推荐结果如何呈现给用户,而大语言模型(LLMs)通过将页面生成视为序列生成为其开辟了新途径。然而,将LLMs适配到网络规模的WPO仍受限于昂贵的人工标注需求以及页面级连贯性与项目级放置之间的粒度不匹配。在这项工作中,我们表明这两个挑战是耦合的:只要奖励信号被解耦为两个互补的粒度,仅凭隐式用户反馈就足以进行对齐。我们提出了PageLLM,一个基于奖励的微调框架,该框架(i)将隐式反馈转化为四个对比偏好对族,涵盖相关性、排序、多样性和冗余度;(ii)学习一个粗粒度的页面级奖励和一个细粒度的项目级奖励,后者捕捉对参与度敏感的位置交换;(iii)在预训练的LLM上通过基于PPO的RLHF结合这两种奖励。在七个亚马逊类别上针对十一个基线的广泛实验表明,单独任何一种奖励都不足够——丢弃页面级或项目级信号分别使NDCG@100降低17.8%和15.2%,而联合奖励则使NDCG@100提升高达46.8%。在拥有1000万用户的在线A/B测试中,PageLLM使GMV提升0.44%,点击率提升0.14%,证实了来自隐式反馈的多粒度奖励可扩展到生产级WPO。代码和数据可在匿名仓库中获取。

英文摘要

Whole-page optimization (WPO) decides how search and recommendation results are surfaced to users, and large language models (LLMs) open a new route to it by treating page generation as sequence generation. Adapting LLMs to web-scale WPO, however, remains bottlenecked by the need for costly human annotations and by the mismatched granularity between page-level coherence and item-level placement. In this work we show that these two challenges are coupled: implicit user feedback alone suffices for alignment, provided the reward signal is decoupled into two complementary granularities. We propose PageLLM, a reward-based fine-tuning framework that (i) turns implicit feedback into four contrastive preference-pair families covering relevance, ranking, diversity, and redundancy, (ii) learns a coarse page-level reward and a fine item-level reward that captures engagement-sensitive position swaps, and (iii) combines both rewards in PPO-based RLHF over a pre-trained LLM. Extensive experiments on seven Amazon categories against eleven baselines show that neither reward alone is sufficient -- dropping the page-level or item-level signal reduces NDCG@100 by 17.8% and 15.2% respectively, whereas the joint reward improves NDCG@100 by up to 46.8%. Deployed in a 10M-user online A/B test, PageLLM raises GMV by 0.44% and click-through rate by 0.14%, confirming that multi-grained rewards from implicit feedback scale to production WPO. Code and data are available at an anonymized repository.
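
The reported gains are measured in NDCG@100. As a reference for how that metric is computed, here is a minimal sketch using the linear-gain form of DCG; the paper may use a different gain function, and the relevance lists are made-up examples.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k positions (linear gain)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG of the given ordering divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance grades listed in the order the page presents the items.
print(ndcg_at_k([3, 2, 1, 0], k=4))
print(ndcg_at_k([0, 1, 2, 3], k=4))
```

The metric equals 1 only when items appear in non-increasing order of relevance, so it rewards placing engaging items near the top of the page.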

2506.00181 2026-05-26 cs.LG stat.ML 版本更新

On the Interaction of Batch Noise, Adaptivity, and Compression, under $(L_0,L_1)$-Smoothness: An SDE Approach

关于批噪声、自适应性和压缩在$(L_0,L_1)$-光滑性下的相互作用:一种SDE方法

Enea Monzio Compagnoni, Rustem Islamov, Frank Norbert Proske, Aurelien Lucchi, Antonio Orvieto, Eduard Gorbunov

发表机构 * University of Basel(巴塞尔大学) University of Oslo(奥斯陆大学) MBZUAI(穆罕默德·本·扎耶德人工智能大学) Max Planck Institute for Intelligent Systems(马克斯·普朗克智能系统研究所) ELLIS Institute Tübingen(图宾根ELLIS研究所) Tübingen AI Center(图宾根人工智能中心)

AI总结 本文通过随机微分方程(SDE)框架,在$(L_0,L_1)$-光滑性假设下统一分析分布式压缩SGD及其符号变体,揭示了梯度噪声、通信压缩和自适应更新之间的相互作用,并提出了新的SDE模型以准确捕捉学习率限制与几何特性的关系。

Comments Accepted at ICML 2026 (Poster)

详情
AI中文摘要

分布式随机优化交织了(i)随机梯度噪声、(ii)通信压缩和(iii)自适应/归一化更新。虽然每个因素已被单独研究,但在现实假设下它们的联合效应仍然知之甚少。在这项工作中,我们在最近引入的$(L_0, L_1)$-光滑性条件下,为分布式压缩SGD(DCSGD)及其符号变体分布式符号SGD(DSignSGD)开发了一个统一的理论框架。从概念角度,我们表明文献中的一阶和二阶修正方程不能准确建模离散时间步长/稳定性限制,特别是在$(L_0,L_1)$-光滑性下。从技术角度,我们通过将曲率相关项仔细纳入其漂移中,提出了新的一阶SDE:这有助于捕捉学习率限制、梯度噪声、压缩和损失景观几何之间的细粒度关系。重要的是,我们在一般梯度噪声假设下进行,包括重尾和仿射方差区域,这超出了经典的有限方差设置。我们的结果表明,归一化DCSGD的更新作为稳定性的自然条件出现,归一化程度由梯度噪声结构、景观正则性和压缩率精确决定。相比之下,DSignSGD即使在重尾噪声下也能以标准学习率调度收敛。这些发现共同提供了新的理论见解和视角,以及实践指导。

英文摘要

Distributed stochastic optimization intertwines (i) stochastic gradient noise, (ii) communication compression, and (iii) adaptive/normalized updates. While each factor has been studied in isolation, their joint effect under realistic assumptions remains poorly understood. In this work, we develop a unified theoretical framework for Distributed Compressed SGD (DCSGD) and its sign variant Distributed SignSGD (DSignSGD) under the recently introduced $(L_0, L_1)$-smoothness condition. From a conceptual perspective, we show that the first- and second-order modified equations from the literature do not accurately model the discrete-time step-size/stability restrictions, especially under $(L_0,L_1)$-smoothness. From a technical perspective, we propose new first-order SDEs by carefully incorporating curvature-dependent terms into their drift: This helps capture the fine-grained relationship between learning rate restrictions, gradient noise, compression, and the geometry of the loss landscape. Importantly, we do so under general gradient noise assumptions, including heavy-tailed and affine-variance regimes, which extend beyond the classical bounded-variance setting. Our results suggest that normalizing the updates of DCSGD emerges as a natural condition for stability, with the degree of normalization precisely determined by the gradient noise structure, the landscape's regularity, and the compression rate. In contrast, DSignSGD converges even under heavy-tailed noise with standard learning rate schedules. Together, these findings offer both new theoretical insights and perspectives, and practical guidance.
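
For readers unfamiliar with the assumption, $(L_0,L_1)$-smoothness is commonly stated for twice-differentiable $f$ as $\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\|$. The sketch below checks this numerically for the hypothetical test function $f(x) = x^4$, which has unbounded curvature and so is not smooth in the classical sense. The constants were chosen by hand for this example.

```python
def grad(x):
    """First derivative of f(x) = x**4."""
    return 4.0 * x**3


def hess(x):
    """Second derivative of f(x) = x**4."""
    return 12.0 * x**2


def satisfies_l0_l1(points, l0, l1):
    """Check |f''(x)| <= l0 + l1 * |f'(x)| at every point in `points`."""
    return all(abs(hess(x)) <= l0 + l1 * abs(grad(x)) for x in points)


grid = [i / 100.0 for i in range(-1000, 1001)]  # x in [-10, 10]
print(satisfies_l0_l1(grid, l0=16.0, l1=1.0))   # generalized smoothness
print(satisfies_l0_l1(grid, l0=16.0, l1=0.0))   # classical L-smoothness, L = 16
```

The gap $12x^2 - 4|x|^3$ peaks at $|x| = 2$ with value 16, so $L_0 = 16$, $L_1 = 1$ suffices everywhere, whereas no finite $L_0$ works with $L_1 = 0$ on an unbounded domain. Letting the allowed curvature grow with the gradient norm is what motivates normalized or sign-based steps.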

2505.18979 2026-05-26 cs.LG 版本更新

Dynamic Optimization and Safety Indicator Injection for Jailbreaking Text-to-Image Models with Multimodal Safety Filters

动态优化与安全指示注入:针对多模态安全过滤器的文本到图像模型越狱方法

Zixuan Chen, Hao Lin, Ke Xu, Xinghao Jiang, Tanfeng Sun

发表机构 * Shanghai Jiao Tong University(上海交通大学) The Chinese University of Hong Kong(香港中文大学)

AI总结 提出OptJail框架,通过动态提示优化与自适应安全指示注入,绕过文本和图像过滤器,实现高成功率越狱,并揭示多模态防御的系统性漏洞。

详情
AI中文摘要

文本到图像(T2I)模型可能生成不安全内容,促使采用包含文本和图像过滤器的多阶段安全流水线。新型基于LLM的过滤器能检测关键词之外的潜在意图,使得令牌级扰动攻击不可靠。我们的评估进一步表明,现有越狱方法在绕过过滤器和保持语义保真度之间存在尖锐权衡,同时需要过多查询才能成功。我们提出OptJail,一种自动化越狱框架,结合动态提示优化与多模态反馈。它包含两个关键组件:(i)动态优化,一种迭代过程,利用文本过滤器反馈和语义一致性将提示改写为对抗变体;(ii)自适应安全指示注入,将良性视觉线索的注入建模为强化学习问题,以绕过图像级过滤器。OptJail实现了最先进的性能,将ShieldLM-7B的绕过率从8.9%(Sneakyprompt)提高到99.0%,CLIP分数从0.2637提升到0.2762。此外,它能泛化到未见过的过滤器,并在我们的评估中成功越狱DALL·E 3。机制分析揭示了这些防御失败的原因:优化后的提示被投影到过滤器表示空间的“安全”区域,但在生成模型的语义空间中几乎保持静止;注入的安全指示将图像检测器的注意力从不安全内容转向良性视觉线索。本研究揭示了当前多模态防御的系统性漏洞,并激励更强的自适应防御。

英文摘要

Text-to-image (T2I) models can generate not-safe-for-work (NSFW) content, motivating multi-stage safety pipelines with both text and image filters. Newer LLM-based filters detect latent intent beyond keywords, making token-level perturbation attacks unreliable. Our evaluation further shows that existing jailbreak methods exhibit a sharp trade-off between filter evasion and semantic fidelity, while also requiring excessive queries to succeed. We introduce OptJail, an automated jailbreak framework that combines dynamic prompt optimization with multimodal feedback. It consists of two key components: (i) Dynamic Optimization, an iterative process that leverages text-filter feedback and semantic consistency to rewrite prompts into adversarial variants; and (ii) Adaptive Safety Indicator Injection, which formulates the injection of benign visual cues as a reinforcement learning problem to bypass image-level filters. OptJail achieves state-of-the-art performance, increasing the ShieldLM-7B bypass rate from 8.9% (Sneakyprompt) to 99.0% and improving the CLIP score from 0.2637 to 0.2762. Moreover, it generalizes to unseen filters and successfully jailbreaks DALL·E 3 in our evaluation. Mechanistic analysis reveals why these defenses fail: optimized prompts are projected into the "safe" region of the filter's representation space yet remain nearly stationary in the generative model's semantic space, and injected safety indicators redirect image detectors' attention away from NSFW content toward benign visual cues. This study reveals systemic vulnerabilities in current multimodal defenses and motivates stronger adaptive defenses.

2505.13878 2026-05-26 cs.LG cs.CL 版本更新

InfiFPO: Implicit Model Fusion via Preference Optimization in Large Language Models

InfiFPO:通过偏好优化实现大型语言模型的隐式模型融合

Yanggan Gu, Yuanyi Wang, Zhaoyi Yan, Yiming Zhang, Qi Zhou, Fei Wu, Hongxia Yang

发表机构 * The Hong Kong Polytechnic University (PolyU)(香港理工大学) Zhejiang University(浙江大学) PolyU-Daya Bay Technology and Innovation Research Institute(香港理工大学-大亚湾技术与创新研究院)

AI总结 提出InfiFPO方法,通过将DPO中的参考模型替换为融合源模型,在序列级别合成多源概率,实现隐式模型融合,从而在偏好对齐阶段有效融合多个LLM并提升性能。

详情
Journal ref
NeurIPS 2025
AI中文摘要

模型融合通过轻量训练方法将具有不同优势的多个大型语言模型(LLM)组合成一个更强大的集成模型。现有的模型融合工作主要关注监督微调(SFT),而偏好对齐(PA)——增强LLM性能的关键阶段——在很大程度上未被探索。当前少数在PA阶段的融合方法(如WRPO)通过仅利用源模型的响应输出而丢弃其概率信息来简化过程。为了解决这一局限性,我们提出了InfiFPO,一种用于隐式模型融合的偏好优化方法。InfiFPO将直接偏好优化(DPO)中的参考模型替换为一个融合源模型,该模型在序列级别合成多源概率,从而规避了先前工作中复杂的词汇对齐挑战,同时保留了概率信息。通过引入概率裁剪和最大边际融合策略,InfiFPO使枢轴模型能够与人类偏好对齐,同时有效地从源模型中蒸馏知识。在11个广泛使用的基准上的综合实验表明,InfiFPO始终优于现有的模型融合和偏好优化方法。当使用Phi-4作为枢轴模型时,InfiFPO在11个基准上的平均性能从79.95提升至83.33,显著增强了其在数学、编码和推理任务上的能力。

英文摘要

Model fusion combines multiple Large Language Models (LLMs) with different strengths into a more powerful, integrated model through lightweight training methods. Existing works on model fusion focus primarily on supervised fine-tuning (SFT), leaving preference alignment (PA) -- a critical phase for enhancing LLM performance -- largely unexplored. The few current fusion methods for the PA phase, like WRPO, simplify the process by utilizing only response outputs from source models while discarding their probability information. To address this limitation, we propose InfiFPO, a preference optimization method for implicit model fusion. InfiFPO replaces the reference model in Direct Preference Optimization (DPO) with a fused source model that synthesizes multi-source probabilities at the sequence level, circumventing the complex vocabulary alignment challenges of previous works while maintaining the probability information. By introducing probability clipping and max-margin fusion strategies, InfiFPO enables the pivot model to align with human preferences while effectively distilling knowledge from source models. Comprehensive experiments on 11 widely-used benchmarks demonstrate that InfiFPO consistently outperforms existing model fusion and preference optimization methods. When using Phi-4 as the pivot model, InfiFPO improves its average performance from 79.95 to 83.33 on 11 benchmarks, significantly improving its capabilities in mathematics, coding, and reasoning tasks.

2505.03677 2026-05-26 cs.LG 版本更新

Neural Integral Operators for Inverse Problems: An Operator-Learning Framework for Small-Sample Spectroscopic Classification

逆问题的神经积分算子:小样本光谱分类的算子学习框架

Emanuele Zappala, Alice Giola, Andreas Kramer, Saugat Acharya, Enrico Greco

发表机构 * Department of Mathematics and Statistics, Idaho State University, Physical Science Complex, 921 S. 8th Ave., Stop 8085, Pocatello, ID 83209, USA(数学与统计学系,爱达荷州立大学,物理科学中心,921 S. 8th Ave., Stop 8085, Pocatello, ID 83209, USA) Department of Computer Science, Idaho State University, 921 S. 8th Ave Mail Stop 8060, Pocatello, ID 83209-8023, USA(计算机科学系,爱达荷州立大学,921 S. 8th Ave Mail Stop 8060, Pocatello, ID 83209-8023, USA) Institute for the Advanced Study of Culture and the Environment (IASCE), University of South Florida, 4202 E Fowler Ave, Tampa, FL 33620, USA(文化与环境高级研究所(IASCE),南佛罗里达大学,4202 E Fowler Ave, Tampa, FL 33620, USA)

AI总结 提出神经积分算子(NIO)框架,通过参数化Urysohn核和蒙特卡洛采样隐式正则化,在小样本光谱分类任务中优于传统机器学习和深度学习基线。

Comments 20 pages, 4 figures, 3 tables. v2: Link to code repository added. v3: Article largely reorganized and several portions rewritten for clarity. Comments are welcome

详情
AI中文摘要

在软计算中,学习具有强归纳偏置的函数空间映射是一个核心挑战,尤其是在训练数据稀缺且标准深度架构过拟合的情况下。我们引入了一种基于第一类积分方程的神经积分算子(NIO)框架,其中算子的Urysohn核由前馈网络 $G_{θ_G}$ 参数化,潜在函数由卷积编码器 $E_{ϕ_E}$ 生成,两者通过交叉熵损失进行端到端联合训练。学习算子的积分通过蒙特卡洛采样近似,我们认为这充当了在被积函数层面操作的隐式随机正则化器,补充了权重衰减和dropout等参数级正则化器。我们在三个不同规模和复杂度的真实世界光谱分类任务(FT-IR水果泥、NIR肉类、NIR纺织品)上对该框架进行了基准测试,并与传统机器学习(决策树、支持向量机,有无UMAP)和现代深度学习基线(FFNN、CNN+FFNN、浅层CNN、Transformer)进行了比较。所提出的NIO在所有数据集和指标上始终位居前两名,在最具挑战性的小样本复杂数据集(纺织品)上取得了最佳结果,并且在数据稀缺情况下比竞争深度模型具有更低的性能方差。结果表明,当传统深度学习方法受限于数据稀缺时,具有随机数值积分的算子学习架构是光谱学中逆问题的一种可行的软计算策略。

英文摘要

Learning maps between function spaces with a strong inductive bias is a central challenge in soft computing, especially when training data are scarce and standard deep architectures overfit. We introduce a neural integral operator (NIO) framework based on integral equations of the first kind, in which the Urysohn kernel of the operator is parameterized by a feed-forward network $G_{θ_G}$ and the latent function is produced by a convolutional encoder $E_{ϕ_E}$, both trained jointly end-to-end via cross-entropy loss. The integral defining the learned operator is approximated by Monte Carlo sampling, which we argue acts as an implicit stochastic regularizer operating at the level of the integrand and complementing parameter-level regularizers such as weight decay and dropout. We benchmark the framework on three real-world spectroscopic classification tasks (FT-IR fruit purees, NIR meat, NIR textiles) of varying size and complexity, against traditional machine learning (decision tree, support vector machine, with and without UMAP) and modern deep learning baselines (FFNN, CNN+FFNN, shallow CNN, transformer). The proposed NIO is consistently among the top two performing models across all datasets and metrics, achieves the best results on the most challenging small-and-complex dataset (Textile), and yields lower performance variance than competing deep models in the small-data regime. The results suggest that operator-learning architectures with stochastic numerical integration are a viable soft-computing strategy for inverse problems in spectroscopy when conventional deep learning approaches are limited by data scarcity.
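
The Monte Carlo approximation of an Urysohn-type integral $(Tz)(x) = \int_0^1 G(x, y, z(y))\,dy$ can be sketched as follows. In the paper the kernel and latent function are neural networks; here both are hypothetical closed-form stand-ins so that the estimate can be compared with an exact value. The integration domain $[0, 1]$ is also an assumption.

```python
import random


def mc_integral_operator(kernel, latent, x, n_samples=20000, rng=None):
    """Monte Carlo estimate of the integral of kernel(x, y, latent(y)) over y in [0, 1]."""
    rng = rng or random.Random()
    total = 0.0
    for _ in range(n_samples):
        y = rng.random()  # uniform sample of the integration variable
        total += kernel(x, y, latent(y))
    return total / n_samples


def toy_kernel(x, y, zy):
    """Stand-in for the learned kernel network (hypothetical)."""
    return x * zy


def toy_latent(y):
    """Stand-in for the encoder output (hypothetical)."""
    return y


estimate = mc_integral_operator(toy_kernel, toy_latent, x=2.0, rng=random.Random(0))
print(f"estimate = {estimate:.4f}, exact = 1.0")
```

Each training step would draw fresh sample points, so the operator output carries sampling noise. That noise is the integrand-level stochastic regularization effect the abstract argues for.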

2411.06278 2026-05-26 math.NA cs.LG cs.NA math.OC 版本更新

A Natural Primal-Dual Hybrid Gradient Method for Adversarial Neural Network Training on Solving Partial Differential Equations

一种用于求解偏微分方程的对抗神经网络训练的自然原始-对偶混合梯度方法

Shu Liu, Stanley Osher, Wuchen Li

AI总结 提出一种可扩展的预条件原始-对偶混合梯度算法,通过引入预条件算子实现自然梯度优化,并利用Krylov子空间方法高效求解,适用于多种线性和非线性偏微分方程,数值实验表明该方法比PINNs、DeepRitz和WANs等方法更稳定、更精确。

Comments Several typographical errors and notational inconsistencies have been corrected

详情
AI中文摘要

我们提出了一种可扩展的预条件原始-对偶混合梯度算法,用于求解偏微分方程(PDE)。我们将PDE与对偶测试函数相乘,得到一个inf-sup问题,其损失泛函涉及低阶微分算子。然后利用原始-对偶混合梯度(PDHG)算法处理这个鞍点问题。通过在PDHG算法的近端步骤中引入合适的预条件算子,我们获得了一种用于更新神经网络参数的替代自然梯度上升-下降优化方案。我们应用Krylov子空间方法(MINRES)来高效计算自然梯度。这种处理通过矩阵-向量乘法轻松处理预条件矩阵的求逆。对于一般线性PDE,我们建立了所提算法时间连续版本的\textit{后验}收敛性分析。通过引入适当的边界损失项,我们进一步得到了散度形式椭圆方程的\textit{先验}收敛结果。该算法在维度从1到50的各种PDE上进行了测试,包括线性和非线性椭圆方程、反应扩散方程以及来自$L^2$最优输运问题的Monge-Ampère方程。我们将所提方法的性能与几种常用的深度学习算法进行了比较,例如物理信息神经网络(PINNs)、DeepRitz方法和弱对抗网络(WANs),这些算法使用Adam或L-BFGS优化器。数值结果表明,所提方法高效且稳健,收敛更稳定且精度更高。

英文摘要

We propose a scalable preconditioned primal-dual hybrid gradient algorithm for solving partial differential equations (PDEs). We multiply the PDE with a dual test function to obtain an inf-sup problem whose loss functional involves lower-order differential operators. The Primal-Dual Hybrid Gradient (PDHG) algorithm is then leveraged for this saddle point problem. By introducing suitable precondition operators to the proximal steps in the PDHG algorithm, we obtain an alternative natural gradient ascent-descent optimization scheme for updating the neural network parameters. We apply the Krylov subspace method (MINRES) to evaluate the natural gradients efficiently. Such treatment readily handles the inversion of precondition matrices via matrix-vector multiplication. An \textit{a posteriori} convergence analysis is established for the time-continuous version of the proposed algorithm for general linear PDEs. By incorporating appropriate boundary loss terms, we further obtain a refined \textit{a priori} convergence result for elliptic equations in divergence form. The algorithm is tested on various types of PDEs with dimensions ranging from $1$ to $50$, including linear and nonlinear elliptic equations, reaction-diffusion equations, and Monge-Ampère equations stemming from the $L^2$ optimal transport problems. We compare the performance of the proposed method with several commonly used deep learning algorithms such as physics-informed neural networks (PINNs), the DeepRitz method and weak adversarial networks (WANs) using either the Adam or the L-BFGS optimizer. The numerical results suggest that the proposed method performs efficiently and robustly and converges more stably with higher accuracy.
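
For readers unfamiliar with PDHG, the basic ascent-descent iteration can be sketched on a small finite-dimensional saddle-point problem. This is plain, unpreconditioned PDHG on a toy quadratic program, not the preconditioned neural-network scheme of the paper; the function name, step sizes, and test problem are all assumptions chosen for illustration.

```python
import numpy as np

def pdhg_equality_qp(A, b, tau, sigma, n_iters):
    """Plain PDHG for min_x 0.5*||x||^2 subject to A x = b, written as the
    saddle problem min_x max_y 0.5*||x||^2 + y^T (A x - b).
    Step sizes should satisfy tau * sigma * ||A||^2 < 1."""
    x = np.zeros(A.shape[1])
    x_bar = x.copy()
    y = np.zeros(A.shape[0])
    for _ in range(n_iters):
        y = y + sigma * (A @ x_bar - b)              # dual ascent step
        x_new = (x - tau * (A.T @ y)) / (1.0 + tau)  # proximal step for 0.5*||x||^2
        x_bar = 2.0 * x_new - x                      # extrapolation of the primal variable
        x = x_new
    return x, y

A = np.array([[1.0, 1.0]])
b = np.array([2.0])
x_opt, y_opt = pdhg_equality_qp(A, b, tau=0.5, sigma=0.5, n_iters=200)
# The minimum-norm solution of x1 + x2 = 2 is x = (1, 1), with multiplier y = -1.
```

In the paper's setting the primal and dual variables are neural-network parameters, and the proximal steps are preconditioned so that each update becomes a natural-gradient step; the sketch only shows the alternating structure that those steps modify.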

2409.03777 2026-05-26 cs.CV cs.LG 版本更新

A Greedy Hierarchical Approach to Whole-Network Filter-Pruning in CNNs

一种面向CNN全网络滤波器剪枝的贪婪层次方法

Kiran Purohit, Anurag Reddy Parvathgari, Sourangshu Bhattacharya

发表机构 * Department of Computer Science and Engineering(计算机科学与工程系) Indian Institute of Technology, Kharagpur, India(印度理工学院克勒格布尔分校,印度)

AI总结 提出一种基于线性近似的两层层次化贪婪剪枝算法,通过低层滤波器选择和全局剪枝准则高效剪枝,在多个网络上优于现有方法。

Comments Accepted in TMLR 2024

详情
AI中文摘要

深度卷积神经网络(CNN)在许多计算机视觉任务中取得了令人印象深刻的表现。然而,它们的大模型尺寸需要大量计算资源,因此从预训练的CNN中剪枝冗余滤波器是开发资源受限设备高效模型的关键任务。全网络滤波器剪枝算法从每层剪枝不同比例的滤波器,从而提供更大的灵活性。当前的全网络剪枝方法要么因需要使用训练数据集计算每个剪枝滤波器的损失而计算成本高昂,要么使用各种启发式/学习标准来确定每层的剪枝比例。本文提出了一种高效的两级层次化全网络滤波器剪枝方法,该方法使用分类损失作为最终标准。低级算法(称为滤波器剪枝)使用基于滤波器权重线性近似的稀疏近似公式。我们探索了两种算法:基于正交匹配追踪的贪婪选择和贪婪反向剪枝方法。反向剪枝算法使用一种新颖的闭式误差标准,在每个阶段高效选择最优滤波器,从而使整个算法更快。高级算法(称为层选择)使用全局剪枝准则贪婪地选择最佳剪枝层(使用滤波器选择算法进行剪枝)。我们针对两种不同的全局剪枝准则提出了算法:(1)逐层相对误差(HBGS),和(2)最终分类误差(HBGTS)。我们的算法套件在ResNet18、ResNet32、ResNet56、VGG16和ResNext101上优于最先进的剪枝方法。我们的方法将ResNext101的RAM需求从7.6 GB降低到1.5 GB,并在CIFAR-10上实现了94%的FLOPS减少而不损失精度。

英文摘要

Deep convolutional neural networks (CNNs) have achieved impressive performance in many computer vision tasks. However, their large model sizes require heavy computational resources, making pruning redundant filters from existing pre-trained CNNs an essential task in developing efficient models for resource-constrained devices. Whole-network filter pruning algorithms prune varying fractions of filters from each layer, hence providing greater flexibility. Current whole-network pruning methods are either computationally expensive due to the need to calculate the loss for each pruned filter using a training dataset, or use various heuristic / learned criteria for determining the pruning fractions for each layer. This paper proposes a two-level hierarchical approach for whole-network filter pruning which is efficient and uses the classification loss as the final criterion. The lower-level algorithm (called filter-pruning) uses a sparse-approximation formulation based on linear approximation of filter weights. We explore two algorithms: orthogonal matching pursuit-based greedy selection and a greedy backward pruning approach. The backward pruning algorithm uses a novel closed-form error criterion for efficiently selecting the optimal filter at each stage, thus making the whole algorithm much faster. The higher-level algorithm (called layer-selection) greedily selects the best-pruned layer (pruning using the filter-selection algorithm) using a global pruning criterion. We propose algorithms for two different global-pruning criteria: (1) layer-wise relative error (HBGS), and (2) final classification error (HBGTS). Our suite of algorithms outperforms state-of-the-art pruning methods on ResNet18, ResNet32, ResNet56, VGG16, and ResNext101. Our method reduces the RAM requirement for ResNext101 from 7.6 GB to 1.5 GB and achieves a 94% reduction in FLOPS without losing accuracy on CIFAR-10.

2402.10665 2026-05-26 cs.LG cs.CV 版本更新

Soft Dice Confidence: A Near-Optimal Confidence Estimator for Selective Prediction in Semantic Segmentation

Soft Dice Confidence: 语义分割中选择性预测的近似最优置信度估计器

Bruno Laboissiere Camargos Borges, Bruno Machado Pacheco, Danilo Silva

发表机构 * Department of Automation and Systems Engineering, Federal University of Santa Catarina(圣卡塔琳娜联邦大学自动化与系统工程系)

AI总结 针对语义分割中的选择性预测问题,提出一种基于Dice系数的近似最优置信度估计器SDC,在已知或估计边际后验概率下均优于现有方法。

Comments 48 pages, 11 figures

详情
AI中文摘要

在语义分割中,即使是最先进的深度学习模型在某些高风险应用(如医学图像分析)中也达不到所需的性能。在这些情况下,可以通过允许模型在置信度低时放弃预测来提高性能,这种方法称为选择性预测。虽然在分类文献中广为人知,但选择性预测在语义分割的背景下尚未得到充分探索。本文通过关注图像级弃权来解决这个问题,即对整个图像产生单个置信度估计,而先前的方法则关注像素级不确定性。假设Dice系数作为分割的评估指标,本文提供了两个主要贡献:(i)在已知边际后验概率的情况下,我们推导出最优置信度估计器,但观察到对于典型图像大小难以处理。然后,提出了一种线性时间可计算的近似方法,称为Soft Dice Confidence(SDC),并证明它与最优估计器紧密有界。(ii)当仅知道边际后验概率的估计时,我们提出了SDC的插件版本,并证明它优于所有先前的方法,包括那些需要额外调优数据的方法。这些发现得到了合成数据和来自六项医学成像任务(包括分布外场景)的真实世界数据的实验结果的支持,将SDC定位为语义分割中选择性预测的可靠且高效的工具。

英文摘要

In semantic segmentation, even state-of-the-art deep learning models fall short of the performance required in certain high-stakes applications such as medical image analysis. In these cases, performance can be improved by allowing a model to abstain from making predictions when confidence is low, an approach known as selective prediction. While well-known in the classification literature, selective prediction has been underexplored in the context of semantic segmentation. This paper tackles the problem by focusing on image-level abstention, which involves producing a single confidence estimate for the entire image, in contrast to previous approaches that focus on pixel-level uncertainty. Assuming the Dice coefficient as the evaluation metric for segmentation, two main contributions are provided in this paper: (i) In the case of known marginal posterior probabilities, we derive the optimal confidence estimator, which is observed to be intractable for typical image sizes. Then, an approximation computable in linear time, named Soft Dice Confidence (SDC), is proposed and proven to be tightly bounded to the optimal estimator. (ii) When only an estimate of the marginal posterior probabilities is known, we propose a plug-in version of the SDC and show it outperforms all previous methods, including those requiring additional tuning data. These findings are supported by experimental results on both synthetic data and real-world data from six medical imaging tasks, including out-of-distribution scenarios, positioning the SDC as a reliable and efficient tool for selective prediction in semantic segmentation.
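
A linear-time, image-level confidence score of the kind described above can be sketched as follows. The formula used here, a soft Dice overlap between the per-pixel foreground probabilities and the thresholded prediction, is an assumption about the form of the estimator and is not stated in the abstract; the function name, the 0.5 threshold, and the empty-image convention are likewise illustrative.

```python
import numpy as np

def soft_dice_confidence(probs, threshold=0.5):
    """Assumed form: 2 * sum(p_i * yhat_i) / (sum(p_i) + sum(yhat_i)),
    where yhat is the hard mask obtained by thresholding the probabilities.
    Runs in time linear in the number of pixels."""
    probs = np.asarray(probs, dtype=float).ravel()
    pred = (probs >= threshold).astype(float)
    denom = probs.sum() + pred.sum()
    if denom == 0.0:
        return 1.0  # assumed convention: empty prediction with zero foreground mass
    return float(2.0 * (probs * pred).sum() / denom)

confident = soft_dice_confidence([1.0, 1.0, 0.0, 0.0])
uncertain = soft_dice_confidence([0.6, 0.6, 0.4, 0.4])
```

Both inputs yield the same hard mask, but the second one has probabilities close to the threshold and therefore receives a lower score, so it would be the first candidate for abstention.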

2402.08726 2026-05-26 quant-ph cs.LG math-ph math.MP math.PR 版本更新

Trained quantum neural networks are Gaussian processes

训练后的量子神经网络是高斯过程

Filippo Girardi, Giacomo De Palma

发表机构 * Korteweg–de Vries Institute for Mathematics, University of Amsterdam(阿姆斯特丹大学Korteweg–de Vries数学研究所) QuSoft, Science Park 123, Amsterdam(阿姆斯特丹QuSoft) Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126, Pisa (PI), Italy(意大利比萨高等师范学院) Department of Mathematics, University of Bologna(博洛尼亚大学数学系)

AI总结 研究由参数化单量子比特门和固定双量子比特门构成的量子神经网络在无限宽度极限下的行为,证明未训练和训练后的网络生成的函数概率分布均收敛于高斯过程,并分析了测量噪声的影响。

Comments 116 pages

详情
Journal ref
Communications in Mathematical Physics 406, 92 (2025)
AI中文摘要

我们研究了由参数化单量子比特门和固定双量子比特门构成的量子神经网络在无限宽度极限下的行为,其中生成的函数是所有量子比特上单量子比特可观测量之和的期望值。首先,我们证明,当每个被测量的量子比特仅与少数其他被测量的量子比特相关时,具有随机初始化参数的未训练网络生成的函数的概率分布在分布上收敛于高斯过程。然后,我们通过梯度下降和平方损失在监督学习问题上解析地刻画了网络的训练。我们证明,只要网络不受贫瘠高原的影响,训练后的网络可以完美拟合训练集,并且训练后生成的函数的概率分布仍然在分布上收敛于高斯过程。最后,我们考虑网络输出端测量的统计噪声,并证明多项式数量的测量足以使所有先前的结果成立,并且网络始终可以在多项式时间内训练。

英文摘要

We study quantum neural networks made by parametric one-qubit gates and fixed two-qubit gates in the limit of infinite width, where the generated function is the expectation value of the sum of single-qubit observables over all the qubits. First, we prove that the probability distribution of the function generated by the untrained network with randomly initialized parameters converges in distribution to a Gaussian process whenever each measured qubit is correlated only with few other measured qubits. Then, we analytically characterize the training of the network via gradient descent with square loss on supervised learning problems. We prove that, as long as the network is not affected by barren plateaus, the trained network can perfectly fit the training set and that the probability distribution of the function generated after training still converges in distribution to a Gaussian process. Finally, we consider the statistical noise of the measurement at the output of the network and prove that a polynomial number of measurements is sufficient for all the previous results to hold and that the network can always be trained in polynomial time.

2310.04981 2026-05-26 cs.CV cs.LG 版本更新

Compositional Semantics for Open Vocabulary Spatio-semantic Representations

开放词汇空间语义表示的组合语义

Robin Karlsson, Francisco Lepe-Salazar, Kazuya Takeda

发表机构 * Graduate School of Informatics, Nagoya University(名古屋大学信息学研究科) Ludolab TIER IV

AI总结 提出潜在组合语义嵌入z*作为可查询空间语义记忆的知识表示,证明其存在性、最优性及可发现性,并引入充分相似性推理方法提升重叠语义推理性能。

Comments Preprint

详情
AI中文摘要

视觉语言模型(VLM)将环境感知转换为LLM可解释的视觉语言语义。然而,完成复杂任务通常需要对当前感知之外的信息进行推理。我们提出潜在组合语义嵌入z*作为可查询空间语义记忆的基于学习的原则性知识表示。我们在数学上证明z*总是可以找到,并且最优z*是任何集合Z的质心。我们推导了估计相关和不相关语义可分离性的概率界限。我们证明z*可以通过迭代梯度下降从视觉外观和单一描述中发现。我们在包括CLIP和SBERT的四个嵌入空间上实验验证了我们的发现。结果表明,z*可以表示由SBERT编码的多达10个语义,以及理想均匀分布的高维嵌入的多达100个语义。我们引入了三个具有重叠语义的新数据集,以表明在常规非重叠注释上训练的常见VLM能够发现z*。我们提出的充分相似性推理方法克服了传统推理的根本局限性,并将更高层次的重叠语义推理性能平均提高了19.63 mIoU。

英文摘要

Vision-language models (VLMs) transform environment percepts into vision-language semantics interpretable by LLMs. However, completing complex tasks often requires reasoning about information beyond what is currently perceived. We propose latent compositional semantic embeddings z* as a principled learning-based knowledge representation for queryable spatio-semantic memories. We mathematically prove that z* can always be found, and that the optimal z* is the centroid for any set Z. We derive a probabilistic bound for estimating separability of related and unrelated semantics. We prove that z* is discoverable from visual appearance and singular descriptions by iterative gradient descent. We experimentally verify our findings on four embedding spaces including CLIP and SBERT. Our results show that z* can represent up to 10 semantics encoded by SBERT, and up to 100 semantics for ideal uniformly distributed high-dimensional embeddings. We introduce three new datasets with overlapping semantics to show that common VLMs trained on conventional nonoverlapping annotations discover z*. Our novel sufficient similarity inference method overcomes fundamental limitations of conventional inference, and improves higher-level overlapping semantic inference performance by 19.63 mIoU on average.
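
The centroid claim can be checked numerically in a simplified setting. The sketch assumes unit-normalized embeddings and takes "optimal" to mean maximizing the summed cosine similarity to every member of Z; under that assumption the normalized centroid is the maximizer. The function names and the random embeddings are hypothetical.

```python
import numpy as np

def normalize(Z):
    """Scale each row to unit Euclidean norm."""
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)

def compositional_embedding(Z):
    """Normalized centroid of the unit-normalized embeddings in Z."""
    c = normalize(Z).mean(axis=0)
    return c / np.linalg.norm(c)

def total_similarity(z, Z):
    """Sum of cosine similarities between unit vector z and each row of Z."""
    return float((normalize(Z) @ z).sum())

rng = np.random.default_rng(0)
Z = rng.normal(size=(5, 16))          # five hypothetical semantic embeddings
z_star = compositional_embedding(Z)
best = total_similarity(z_star, Z)

# No random unit vector should score higher than the normalized centroid.
candidates = normalize(rng.normal(size=(1000, 16)))
best_random = max(total_similarity(c, Z) for c in candidates)
```

The result follows from the Cauchy-Schwarz inequality: the sum of dot products with a unit vector equals the dot product with the summed embeddings, which is largest when the unit vector points along that sum.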

2305.11663 2026-05-26 cs.LG cs.AI cs.CL cs.CY 版本更新

Algorithmic failure as a humanities methodology: machine learning's mispredictions identify rich cases for qualitative analysis

作为人文学科方法论的算法失败:机器学习的错误预测识别出用于定性分析的丰富案例

Jill Walker Rettberg

AI总结 本文通过实验验证了Munk等人提出的利用机器学习失败预测识别定性分析中模糊且丰富案例的方法,使用简单kNN算法对虚构角色与机器视觉技术互动的动作数据进行分类,发现不可预测的动作更具矛盾性和情感负荷,支持该方法在人文学科中的适用性。

详情
Journal ref
Big Data & Society 9(2) 2022
AI中文摘要

本文评论测试了Munk等人(2022)提出的一种方法论,即利用机器学习中的失败预测作为识别定性分析中模糊且丰富案例的方法。使用一个描述500件艺术品、电影、小说和电子游戏中虚构角色与机器视觉技术互动动作的数据集,我训练了一个简单的机器学习算法(使用R中的kNN算法),仅根据虚构角色的信息预测动作是主动还是被动。可预测的动作通常是缺乏情感且明确的,其中机器视觉技术被当作简单工具。不可预测的动作,即算法无法正确预测的动作,则更加矛盾且情感负荷更重,角色与技术之间的权力关系更为复杂。因此,结果支持Munk等人的理论,即失败预测可以有效地用于识别定性分析的丰富案例。本测试不仅简单复制了Munk等人的结果,还证明了该方法可以应用于更广泛的人文学科领域,并且不需要复杂的神经网络,简单的机器学习算法也能奏效。需要进一步研究以理解该方法适用于哪些类型的数据以及哪种机器学习最具生成性。为此,附上了产生结果所需的R代码,以便复制测试。该代码也可重复使用或改编,以在其他数据集上测试该方法。

英文摘要

This commentary tests a methodology proposed by Munk et al. (2022) for using failed predictions in machine learning as a method to identify ambiguous and rich cases for qualitative analysis. Using a dataset describing actions performed by fictional characters interacting with machine vision technologies in 500 artworks, movies, novels and videogames, I trained a simple machine learning algorithm (using the kNN algorithm in R) to predict whether or not an action was active or passive using only information about the fictional characters. Predictable actions were generally unemotional and unambiguous activities where machine vision technologies were treated as simple tools. Unpredictable actions, that is, actions that the algorithm could not correctly predict, were more ambivalent and emotionally loaded, with more complex power relationships between characters and technologies. The results thus support Munk et al.'s theory that failed predictions can be productively used to identify rich cases for qualitative analysis. This test goes beyond simply replicating Munk et al.'s results by demonstrating that the method can be applied to a broader humanities domain, and that it does not require complex neural networks but can also work with a simpler machine learning algorithm. Further research is needed to develop an understanding of what kinds of data the method is useful for and which kinds of machine learning are most generative. To support this, the R code required to produce the results is included so the test can be replicated. The code can also be reused or adapted to test the method on other datasets.
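
The core idea, training a simple classifier and then inspecting the cases it gets wrong, can be sketched in a few lines. The commentary's own code is in R; the Python version below is an independent illustration with a hypothetical toy dataset and a leave-one-out k-nearest-neighbour vote, not a port of the author's script.

```python
import numpy as np

def knn_mispredictions(X, y, k=3):
    """Leave-one-out kNN with binary labels: predict each row from its k
    nearest other rows and return the indices that were predicted wrongly."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)                 # a row may not vote for itself
    neighbors = np.argsort(dist, axis=1)[:, :k]    # indices of the k nearest rows
    pred = (y[neighbors].mean(axis=1) > 0.5).astype(int)  # majority vote
    return np.flatnonzero(pred != y)

# Hypothetical data: two clear clusters plus one row whose label disagrees
# with its surroundings (index 6).
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [0.5, 0.5]]
y = [0, 0, 0, 1, 1, 1, 1]
flagged = knn_mispredictions(X, y, k=3)
```

In the methodology under test, the flagged rows are not treated as errors to be minimized but as a shortlist of ambiguous cases that merit close qualitative reading.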

2105.13431 2026-05-26 cs.LG cs.AI cs.SY eess.SY 版本更新

An Offline Risk-aware Policy Selection Method for Bayesian Markov Decision Processes

贝叶斯马尔可夫决策过程的离线风险感知策略选择方法

Giorgio Angelotti, Nicolas Drougard, Caroline Ponzoni Carvalho Chanel

发表机构 * Natural Intelligence Toulouse Institute, University of Toulouse, France(图卢兹大学自然智能研究所) ISAE-SUPAERO, University of Toulouse, France(图卢兹大学ISAE-SUPAERO)

AI总结 针对离线强化学习中模型不确定性导致策略风险高的问题,提出一种基于贝叶斯形式化框架的风险感知策略选择方法EvC,通过最大化贝叶斯后验下的风险感知目标来选择稳健策略。

Comments Preprint, under review

详情
Journal ref
Artificial Intelligence, Volume 354, 2026
AI中文摘要

在离线模型学习用于规划以及离线强化学习中,有限的数据集阻碍了相对马尔可夫决策过程(MDP)的值函数估计。因此,所获得策略在真实世界中的性能受到限制且可能存在风险,尤其是当部署错误策略可能导致灾难性后果时。为此,目前正在探索多种途径以减少模型误差(或学习模型与真实模型之间的分布偏移),并在更广泛的意义上获得针对模型不确定性的风险感知解决方案。但在最终应用中,实践者应选择哪种基线?在计算时间不是问题且鲁棒性优先的离线背景下,我们提出了Exploitation vs Caution(EvC),这是一种范式:(1)优雅地融入遵循贝叶斯形式化的模型不确定性,以及(2)在由当前基线提供的固定候选策略集合中,选择最大化贝叶斯后验下风险感知目标的策略。我们在不同离散但简单的环境中使用最先进的方法验证了EvC,这些环境提供了多种MDP类别。在测试场景中,EvC成功选择了稳健策略,因此成为旨在将离线规划和强化学习求解器应用于真实世界的实践者的有用工具。

英文摘要

In Offline Model Learning for Planning and in Offline Reinforcement Learning, the limited data set hinders estimation of the value function of the relative Markov Decision Process (MDP). Consequently, the performance of the obtained policy in the real world is bounded and possibly risky, especially when the deployment of a wrong policy can lead to catastrophic consequences. For this reason, several pathways are being followed with the aim of reducing the model error (or the distributional shift between the learned model and the true one) and, more broadly, obtaining risk-aware solutions with respect to model uncertainty. But when it comes to the final application, which baseline should a practitioner choose? In an offline context where computational time is not an issue and robustness is the priority, we propose Exploitation vs Caution (EvC), a paradigm that (1) elegantly incorporates model uncertainty abiding by the Bayesian formalism, and (2) selects the policy that maximizes a risk-aware objective over the Bayesian posterior among a fixed set of candidate policies provided, for instance, by the current baselines. We validate EvC with state-of-the-art approaches in different discrete, yet simple, environments offering a fair variety of MDP classes. In the tested scenarios, EvC manages to select robust policies and hence stands out as a useful tool for practitioners who aim to apply offline planning and reinforcement learning solvers in the real world.
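
Step (2) can be sketched as a selection over sampled returns. The sketch assumes that each candidate policy has already been evaluated on MDPs drawn from the Bayesian posterior, and it uses conditional value-at-risk (CVaR) as the risk-aware objective; the abstract does not name a specific risk measure, so CVaR, the function names, and the numbers are illustrative assumptions.

```python
import numpy as np

def cvar(returns, alpha):
    """Mean of the worst alpha-fraction of sampled returns (lower tail)."""
    r = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * r.size)))
    return float(r[:k].mean())

def select_policy(returns_by_policy, alpha=0.2):
    """Return the index of the policy with the highest CVaR, plus all scores."""
    scores = [cvar(r, alpha) for r in returns_by_policy]
    return int(np.argmax(scores)), scores

# Hypothetical returns: rows are candidate policies, columns are posterior samples.
returns_by_policy = np.array([
    [12.0, 12.0, 12.0, 12.0, -20.0],   # higher mean, catastrophic in one sampled MDP
    [5.0, 5.0, 5.0, 5.0, 4.0],         # lower mean, stable across the posterior
])
best, scores = select_policy(returns_by_policy, alpha=0.2)
mean_best = int(np.argmax(returns_by_policy.mean(axis=1)))
```

Here the risk-neutral choice (highest mean) and the risk-aware choice differ, which is the trade-off between exploitation and caution that the paradigm's name refers to.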

2604.00963 2026-05-26 cs.DS cs.LG math.PR 版本更新

Rapid mixing in positively weighted restricted Boltzmann machines

正权重受限玻尔兹曼机中的快速混合

Weiming Feng, Heng Guo, Minji Yang

发表机构 * School of Informatics, University of Edinburgh, Informatics Forum, UK(爱丁堡大学信息学院) School of Computing and Data Science, The University of Hong Kong, HK(香港大学计算与数据科学学院)

AI总结 本文通过分析铁磁双自旋系统的Glauber动力学,证明了正权重受限玻尔兹曼机的交替扫描采样器具有多对数混合时间界,并得到了临界阈值内的新混合时间界。

详情
AI中文摘要

我们证明了正权重受限玻尔兹曼机的交替扫描采样器具有多对数混合时间界。这是通过分析相同的链和铁磁双自旋系统的Glauber动力学实现的,其中我们得到了临界阈值内的新混合时间界。

英文摘要

We show polylogarithmic mixing time bounds for the alternating-scan sampler for positively weighted restricted Boltzmann machines. This is done via analysing the same chain and the Glauber dynamics for ferromagnetic two-spin systems, where we obtain new mixing time bounds up to the critical thresholds.
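
The alternating-scan sampler analysed above is the familiar block Gibbs chain that resamples one layer at a time. The sketch below assumes a {0, 1}-valued parameterization with energy $-(v^\top W h + b_v^\top v + b_h^\top h)$ and nonnegative weights; the paper may use a different spin convention, and all names and sizes here are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def alternating_scan(W, b_v, b_h, v, n_sweeps, rng):
    """Block Gibbs sampling: resample all hidden units given the visible
    layer, then all visible units given the hidden layer, n_sweeps times."""
    h = np.zeros_like(b_h, dtype=int)
    for _ in range(n_sweeps):
        h = (rng.random(b_h.shape) < sigmoid(v @ W + b_h)).astype(int)
        v = (rng.random(b_v.shape) < sigmoid(W @ h + b_v)).astype(int)
    return v, h

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 4
W = rng.uniform(0.0, 0.5, size=(n_visible, n_hidden))  # positive weights only
b_v = np.zeros(n_visible)
b_h = np.zeros(n_hidden)
v0 = np.zeros(n_visible, dtype=int)
v, h = alternating_scan(W, b_v, b_h, v0, n_sweeps=50, rng=rng)
```

The mixing-time question is how large `n_sweeps` must be before the distribution of `(v, h)` is close to the stationary one; the sketch fixes it arbitrarily and makes no claim about convergence.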

2603.18363 2026-05-26 cs.CL cs.AI cs.LG 版本更新

PowerFlow: Unlocking the Dual Nature of LLMs via Principled Distribution Matching

PowerFlow: 通过原则性分布匹配释放LLMs的双重特性

Ruishuo Chen, Yu Chen, Zhuoran Li, Longbo Huang

发表机构 * Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China(清华大学交叉信息研究院)

AI总结 提出PowerFlow框架,将无监督微调重构成分布匹配问题,利用GFlowNet和长度感知轨迹平衡目标,通过调整α-幂分布方向性激发LLMs的逻辑推理或创造性。

Comments Camera-ready version accepted at ICML 2026

详情
AI中文摘要

无监督内部反馈强化学习(RLIF)已成为一种有前景的范式,可以在没有外部监督的情况下激发大型语言模型(LLMs)的潜在能力。然而,当前方法依赖于启发式内在奖励,通常缺乏明确的理论优化目标,并且容易产生退化偏差。在这项工作中,我们引入了PowerFlow,一个原则性框架,将无监督微调重新表述为分布匹配问题。通过将GFlowNet视为未归一化密度的摊销变分采样器,我们提出了一个长度感知的轨迹平衡目标,明确抵消了自回归生成中固有的结构长度偏差。通过针对$α$-幂分布,PowerFlow能够方向性地激发LLMs的双重特性:锐化分布($α> 1$)以增强逻辑推理,或展平分布($α< 1$)以释放表达性创造力。大量实验表明,PowerFlow始终优于现有的RLIF方法,匹配甚至超过有监督的GRPO。此外,通过减轻对齐模型中的过度锐化,我们的方法在多样性和质量上同时取得提升,在创造性任务中推动了帕累托前沿。

英文摘要

Unsupervised Reinforcement Learning from Internal Feedback (RLIF) has emerged as a promising paradigm for eliciting the latent capabilities of Large Language Models (LLMs) without external supervision. However, current methods rely on heuristic intrinsic rewards, which often lack a well-defined theoretical optimization target and are prone to degenerative biases. In this work, we introduce PowerFlow, a principled framework that reformulates unsupervised fine-tuning as a distribution matching problem. By casting GFlowNet as an amortized variational sampler for unnormalized densities, we propose a length-aware Trajectory-Balance objective that explicitly neutralizes the structural length biases inherent in autoregressive generation. By targeting $α$-power distributions, PowerFlow enables the directional elicitation of the dual nature of LLMs: sharpening the distribution ($α> 1$) to intensify logical reasoning, or flattening it ($α< 1$) to unlock expressive creativity. Extensive experiments demonstrate that PowerFlow consistently outperforms existing RLIF methods, matching or even exceeding supervised GRPO. Furthermore, by mitigating over-sharpening in aligned models, our approach achieves simultaneous gains in diversity and quality, shifting the Pareto frontier in creative tasks.
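
The effect of the exponent can be seen on a small categorical distribution. The sketch assumes the $α$-power target is the base distribution raised to the power $α$ and renormalized; it operates on a toy probability vector, not on sequence-level LLM probabilities, and the function names are hypothetical.

```python
import numpy as np

def alpha_power(p, alpha):
    """Return p**alpha renormalized to sum to one (p must be strictly positive)."""
    logq = alpha * np.log(np.asarray(p, dtype=float))
    logq -= logq.max()                 # stabilize before exponentiating
    q = np.exp(logq)
    return q / q.sum()

def entropy(p):
    """Shannon entropy in nats."""
    p = np.asarray(p, dtype=float)
    return float(-(p * np.log(p)).sum())

p = np.array([0.5, 0.3, 0.2])
sharpened = alpha_power(p, alpha=2.0)   # alpha > 1 concentrates mass on the mode
flattened = alpha_power(p, alpha=0.5)   # alpha < 1 spreads mass more evenly
```

Entropy falls for the sharpened distribution and rises for the flattened one, which matches the two directions the abstract associates with reasoning and creativity respectively.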

2602.05448 2026-05-26 cs.LG 版本更新

BlitzRank: Principled Zero-shot Ranking Agents with Tournament Graphs

BlitzRank: 基于锦标赛图的原则性零样本排序智能体

Sheshansh Agrawal, Thien Hang Nguyen, Douwe Kiela

发表机构 * ContextualAI

AI总结 提出一种基于锦标赛图框架的k-wise排序方法,通过聚合偏好图并计算传递闭包,以最少比较次数准确识别top-m项,在LLM重排序中实现25-40%的token节省。

Comments ICML 2026 spotlight

详情
AI中文摘要

通过昂贵的$k$元比较从$n$个项目中选出前$m$个,是从基于LLM的文档重排序到众包评估和锦标赛设计等场景的核心问题。现有方法要么依赖丢弃比较信息的启发式方法,要么以高昂成本利用比较信息。我们引入了一个锦标赛图框架,为$k$元排序提供了原则性基础。我们的关键观察是,每次$k$项比较揭示了$\binom{k}{2}$个成对偏好的诱导锦标赛;将这些聚合到全局偏好图中并计算其传递闭包,可以在不额外调用预言机的情况下获得许多额外的排序。我们形式化了当前top-$m$输出何时可被确定,并设计了一种贪心查询调度,最大化识别top-$m$项的信息增益。该框架还能优雅地处理非传递性偏好——由现实世界预言机引起的循环——通过将它们折叠成等价类,从而产生原则性的分层排名。应用于14个基准测试和5个模型的LLM重排序,BlitzRank实现了对现有方法的帕累托优势:匹配或超过准确率,同时比同类方法少用25-40%的token;与成对重排序相比,它以7倍的token减少实现了近乎相同的质量。代码见https://github.com/ContextualAI/BlitzRank。

英文摘要

Selecting the top $m$ from $n$ items via expensive $k$-wise comparisons is central to settings ranging from LLM-based document reranking to crowdsourced evaluation and tournament design. Existing methods either rely on heuristics that discard comparison information, or exploit it at prohibitive cost. We introduce a tournament graph framework that provides a principled foundation for $k$-wise ranking. Our key observation is that each $k$-item comparison reveals an induced tournament of $\binom{k}{2}$ pairwise preferences; aggregating these into a global preference graph and computing its transitive closure yields many additional orderings without further oracle calls. We formalize when the current top-$m$ output is certifiably determined and design a greedy query schedule that maximizes information gain towards identifying the top-$m$ items. The framework also gracefully handles non-transitive preferences -- cycles induced by real-world oracles -- by collapsing them into equivalence classes that yield principled tiered rankings. Applied to LLM reranking across 14 benchmarks and 5 models, BlitzRank achieves Pareto dominance over existing approaches: matching or exceeding accuracy while requiring 25--40% fewer tokens than comparable methods; against pairwise reranking, it achieves near-identical quality with 7$\times$ fewer tokens. Code available at https://github.com/ContextualAI/BlitzRank.
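
The key observation, that one ranked $k$-item comparison yields $\binom{k}{2}$ pairwise preferences and that transitive closure then yields further orderings for free, can be sketched as follows. This is an illustrative toy, not the code from the linked repository; the function names are hypothetical, and cycle handling and the greedy query schedule are omitted.

```python
from itertools import combinations

def add_comparison(wins, ranking):
    """Record every pairwise preference implied by one ranked comparison
    (best item first): C(k, 2) edges for k items."""
    for better, worse in combinations(ranking, 2):
        wins.setdefault(better, set()).add(worse)
        wins.setdefault(worse, set())

def transitive_closure(wins):
    """Repeatedly add a > c whenever a > b and b > c until nothing changes."""
    closure = {a: set(b) for a, b in wins.items()}
    changed = True
    while changed:
        changed = False
        for a in closure:
            reachable = set().union(*(closure[b] for b in closure[a]))
            if not reachable <= closure[a]:
                closure[a] |= reachable
                changed = True
    return closure

wins = {}
add_comparison(wins, ["a", "b", "c"])   # first oracle call, k = 3
add_comparison(wins, ["c", "d", "e"])   # second oracle call shares item c
closure = transitive_closure(wins)
direct_pairs = sum(len(v) for v in wins.values())
inferred_pairs = sum(len(v) for v in closure.values())
```

Two oracle calls give six direct preferences, and the closure adds four more (a and b each beat d and e), so all ten pairs among the five items are ordered without a third call.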

2602.20191 2026-05-26 cs.LG cs.AI cs.CL 版本更新

MoBiQuant: Mixture-of-Bits Quantization for Token-Adaptive Any-Precision LLM

MoBiQuant: 面向令牌自适应任意精度LLM的混合比特量化

Dongwei Wang, Jinhee Kim, Seokho Han, Denis Gudovskiy, Yohei Nakata, Tomoyuki Okuno, KhayTze Peong, Kang Eun Jeon, Jong Hwan Ko, Yiran Chen, Huanrui Yang

发表机构 * University of Arizona(亚利桑那大学) Duke University(杜克大学) Sungkyunkwan University(成均馆大学) Panasonic AI Lab(松下人工智能实验室) Korea Advanced Institute of Science and Technology(韩国科学技术院)

AI总结 针对动态运行时约束下大语言模型任意精度量化的泛化性问题,提出基于令牌敏感度的混合比特量化框架MoBiQuant,通过多合一递归残差量化和令牌感知路由器实现灵活推理,在匹配或超越前沿单精度PTQ的同时显著节省内存并提升吞吐量。

Comments 20 pages, 10 figures

详情
AI中文摘要

动态运行时延迟和内存约束要求灵活部署大语言模型(LLM),使得LLM能够根据可用计算资源以不同的量化精度进行推理。最近关于这种任意精度量化的工作要么依赖于硬件效率低下的向量量化,要么在切换位宽时引入额外的缩放因子。同时,现有的为固定低精度校准的后训练量化(PTQ)方法在运行时精度变化下表现出较差的泛化性。在这项工作中,我们将跨位宽泛化性差的根源归因于一种精度依赖的“异常迁移”现象,其中PTQ敏感令牌的分布随精度变化。受此观察启发,我们提出了\texttt{MoBiQuant},一种新颖的任意精度混合比特量化框架,它根据令牌敏感性调整权重精度以实现灵活的LLM推理。具体来说,我们提出了一种多合一递归残差量化方法,可以在运行时迭代重建更高精度的权重,并通过令牌感知路由器缓解“异常迁移”,动态选择每个令牌的最优推理精度。大量实验表明,\texttt{MoBiQuant}在匹配或超越前沿单精度PTQ的同时表现出强大的弹性,与最先进的任意精度方法相比,实现了显著的内存节省和高达$1.34\times$的吞吐量提升。

英文摘要

Dynamic runtime latency and memory constraints necessitate flexible large language model (LLM) deployment, where an LLM can be inferred with various quantization precisions based on available computational resources. Recent work on such any-precision quantization either relies on hardware-inefficient vector quantization or induces additional scaling factors when switching between bit-widths. Meanwhile, existing post-training quantization (PTQ) methods calibrated for a fixed low precision show poor generalizability under runtime precision change. In this work, we attribute the source of poor generalization across bit-widths to a precision-dependent \textit{outlier migration} phenomenon where the distribution of PTQ-sensitive tokens changes across precisions. Motivated by this observation, we propose \texttt{MoBiQuant}, a novel any-precision Mixture-of-Bits quantization framework that adjusts weight precision for flexible LLM inference based on token sensitivity. Specifically, we propose a many-in-one recursive residual quantization that can iteratively reconstruct higher-precision weights at runtime and mitigates \textit{outlier migration} with a token-aware router to dynamically select the optimal inference precision of each token. Extensive experiments show that \texttt{MoBiQuant} matches or surpasses frontier single-precision PTQ while exhibiting strong elasticity, achieving significant memory savings and throughput gains of up to $1.34\times$ over state-of-the-art any-precision methods.
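
The general idea behind residual quantization, storing a low-bit approximation plus low-bit corrections so that summing more slices recovers higher precision, can be sketched generically. The uniform min-max quantizer, the slice width, and the function names are assumptions for illustration; the paper's recursive scheme and its token-aware router are not reproduced here.

```python
import numpy as np

def uniform_quantize(w, bits):
    """Round w onto a uniform grid with 2**bits levels spanning [w.min(), w.max()]."""
    levels = 2 ** bits - 1
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / levels if hi > lo else 1.0
    return np.round((w - lo) / scale) * scale + lo

def residual_slices(w, bits_per_slice, n_slices):
    """Quantize w, then repeatedly quantize whatever error is left over."""
    slices, residual = [], w.copy()
    for _ in range(n_slices):
        q = uniform_quantize(residual, bits_per_slice)
        slices.append(q)
        residual = residual - q
    return slices

rng = np.random.default_rng(0)
w = rng.normal(size=256)                       # hypothetical weight vector
slices = residual_slices(w, bits_per_slice=2, n_slices=3)

# Reconstruction error when using the first 1, 2, or 3 slices.
errors = [float(np.abs(w - sum(slices[:n])).max()) for n in (1, 2, 3)]
```

Using only the first slice corresponds to the lowest precision; adding slices at runtime raises the effective precision without storing a separate model per bit-width.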

2512.05402 2026-05-26 cs.LG cs.AI cs.CE cs.NE 版本更新

Smart Timing for Mining: A Deep Learning Framework for Bitcoin Hardware ROI Prediction

挖矿的智能时机:用于比特币硬件投资回报率预测的深度学习框架

Sithumi Wickramasinghe, Bikramjit Das, Dorien Herremans

发表机构 * Singapore University of Technology and Design(新加坡科技设计大学)

AI总结 提出MineROI-Net,一种基于Transformer的深度学习框架,将比特币ASIC硬件采购建模为时间序列分类任务,预测一年内的投资回报率类别,在2015-2024年20种ASIC矿机数据上达到83.2%准确率和83.5%宏F1分数。

详情
AI中文摘要

由于市场波动、技术快速过时和协议驱动的收入周期,比特币挖矿硬件的获取需要战略时机。尽管挖矿已演变为资本密集型行业,但关于何时购买新的专用集成电路(ASIC)硬件的指导很少,且没有先前的计算框架解决这一决策问题。我们通过将硬件获取建模为时间序列分类任务来填补这一空白,预测购买ASIC机器是否在一年内产生盈利(投资回报率(ROI)>= 1)、边际(0 < ROI < 1)或亏损(ROI <= 0)的回报。我们提出了MineROI-Net,一种开源的基于Transformer的架构,旨在捕捉挖矿盈利能力中的多尺度时间模式。在2015年至2024年间发布的20种ASIC矿机在不同市场体制下的数据上评估,MineROI-Net优于循环、卷积和基于注意力的基线,达到了83.2%的准确率和83.5%的宏F1分数。该模型展示了强大的经济相关性,在检测亏损时期达到了97.8%的精确率,在检测盈利时期达到了81.5%的精确率,同时避免了将盈利场景误分类为亏损以及反之亦然。这些结果表明,MineROI-Net为挖矿硬件采购时机提供了一种实用的数据驱动工具,可能降低资本密集型挖矿操作中的财务风险。

英文摘要

Bitcoin mining hardware acquisition requires strategic timing due to volatile markets, rapid technological obsolescence, and protocol-driven revenue cycles. Despite mining's evolution into a capital-intensive industry, there is little guidance on when to purchase new Application-Specific Integrated Circuit (ASIC) hardware, and no prior computational frameworks address this decision problem. We address this gap by formulating hardware acquisition as a time series classification task, predicting whether purchasing ASIC machines yields profitable (Return on Investment (ROI) >= 1), marginal (0 < ROI < 1), or unprofitable (ROI <= 0) returns within one year. We propose MineROI-Net, an open-source Transformer-based architecture designed to capture multi-scale temporal patterns in mining profitability. Evaluated on data from 20 ASIC miners released between 2015 and 2024 across diverse market regimes, MineROI-Net outperforms recurrent, convolutional, and attention-based baselines, achieving 83.2% accuracy and 83.5% macro F1-score. The model demonstrates strong economic relevance, achieving 97.8% precision in detecting unprofitable periods and 81.5% precision in detecting profitable ones, while avoiding misclassifying profitable scenarios as unprofitable and vice versa. These results indicate that MineROI-Net offers a practical, data-driven tool for timing mining hardware acquisitions, potentially reducing financial risk in capital-intensive mining operations.
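
The three target classes follow directly from the ROI thresholds quoted in the abstract. The sketch below only encodes that labeling rule; how ROI itself is computed from revenue and hardware cost is not specified in the abstract, so the function takes a precomputed ROI value, and the function name is hypothetical.

```python
def roi_class(roi):
    """Map a one-year ROI value to the class labels used in the abstract."""
    if roi >= 1:
        return "profitable"      # ROI >= 1
    if roi > 0:
        return "marginal"        # 0 < ROI < 1
    return "unprofitable"        # ROI <= 0

labels = [roi_class(r) for r in (1.4, 1.0, 0.35, 0.0, -0.2)]
```

A classifier such as the one described would be trained to predict these labels from the time series available at the purchase date.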

2401.01160 2026-05-26 eess.IV cs.CG cs.CV cs.LG 版本更新

Train-Free Segmentation in MRI with Cubical Persistent Homology

基于立方体持续同调的MRI无训练分割

Anton François, Raphaël Tinarrage

发表机构 * Centre G. Borelli ENS Paris-Saclay(巴黎-萨克雷高等师范学院Borelli中心) IST Austria(奥地利科学技术研究所) EMAp, Fundação Getulio Vargas(热图利奥·瓦加斯基金会应用数学学院)

AI总结 提出一种基于拓扑数据分析的无训练MRI分割框架,通过自动阈值、提取已知拓扑子集和分解成分三步实现,利用持续同调中的近似代表循环建立拓扑特征与解剖成分的可解释联系,在胶质母细胞瘤和胎儿皮质板分割中验证有效性。

Comments Similar to the published version. 22 pages, 11 figures, 3 tables. For associated code, see https://github.com/antonfrancois/gliomaSegmentation_TDA

详情
Journal ref
Journal of Mathematical Imaging and Vision 68, 20 (2026)
AI中文摘要

我们研究了一种基于拓扑数据分析的无训练MRI分割框架。该流程分三步进行:首先通过自动阈值识别待分割的整个对象,然后检测一个拓扑结构已知的独特子集,最后推导出分割的各个组成部分。一个关键要素是从持续同调图中提取近似代表循环,这提供了持久特征与解剖成分之间的可解释联系。为了阐明该方法的应用范围,我们明确了潜在的拓扑和强度假设,量化了它们在真实数据上的成立情况,并分析了典型的失败模式。我们在胶质母细胞瘤和胎儿皮质板分割上评估了该方法,并与无监督和深度学习参考方法进行了比较。通过在没有大型标注数据集的情况下运行,该方法非常适合数据稀缺的场景,并为专家修正或基于学习的流程提供了可解释的基线和实用的初始化。

英文摘要

We investigate a framework for train-free MRI segmentation based on Topological Data Analysis. The pipeline proceeds in three steps, first identifying the whole object to segment via automatic thresholding, then detecting a distinctive subset whose topology is known in advance, and finally deducing the various components of the segmentation. A key ingredient is the extraction of approximate representative cycles from persistence diagrams, which provides an interpretable link between persistent features and anatomical components. To clarify the method's scope, we make the underlying topological and intensity assumptions explicit, quantify when they hold on real data, and analyze typical failure modes. We evaluate the approach on glioblastoma and on fetal cortical plate segmentation, with comparisons to unsupervised and deep-learning references. By operating without large annotated datasets, the method is well suited to scarce-data settings and provides an interpretable baseline and practical initialization for expert refinement or learning-based pipelines.

2509.23413 2026-05-26 cs.LG 版本更新

URS: A Unified Neural Routing Solver for Cross-Problem Zero-Shot Generalization

URS:一种面向跨问题零样本泛化的统一神经路由求解器

Changliang Zhou, Canhong Yu, Shunyu Yao, Xi Lin, Zhenkun Wang, Yu Zhou, Qingfu Zhang

发表机构 * School of Automation and Intelligent Manufacturing, Southern University of Science and Technology, Shenzhen, China(自动化与智能制造学院,南方科技大学,深圳,中国) Guangdong Provincial Key Laboratory of Fully Actuated System Control Theory and Technology, Southern University of Science and Technology, Shenzhen, China(广东省全驱动系统控制理论与技术重点实验室,南方科技大学,深圳,中国) College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China(计算机科学与软件工程学院,深圳大学,深圳,中国) Department of Computer Science, City University of Hong Kong, Hong Kong SAR, China(计算机科学系,香港城市大学,香港特别行政区,中国) School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China(数学与统计学学院,西安交通大学,西安,中国)

AI总结 提出URS,一种统一神经路由求解器,通过统一数据表示和混合偏置模块,实现单个模型在110种车辆路径问题变体(含99种未见变体)上的零样本泛化,并支持高达7000节点的规模。

Comments accepted by ICML 2026

详情
AI中文摘要

多任务神经路由求解器因其能够使用单个模型解决多种车辆路径问题(VRP)而成为一种有前景的范式。然而,现有的神经求解器通常依赖预定义的问题约束或需要针对每个问题进行微调,这极大地限制了它们对未见VRP变体的零样本泛化能力。为了解决这一关键瓶颈,我们提出了URS,一种统一的神经路由求解器,能够通过单个模型在广泛的未见VRP变体上实现零样本泛化。我们提出了一种统一数据表示(UDR),用数据统一替代问题枚举,从而扩大了问题覆盖范围并减少了对领域专业知识的依赖。此外,我们在编码过程中引入了一个混合偏置模块(MBM)来改进节点嵌入,该模块有效地捕获了各种问题固有的多个先验。在UDR的基础上,我们开发了一个问题条件参数生成器,以进一步提高零样本泛化能力。大量实验表明,URS能够为110种VRP变体(包括99种未见变体)持续生成高质量的解,同时展现出对多达7000个节点的大规模实例的出色可扩展性。据我们所知,URS是第一个能够通过单个模型处理超过100种VRP变体的神经求解器。我们的代码可在https://github.com/CIAM-Group/URS获取。

英文摘要

Multi-task neural routing solvers have emerged as a promising paradigm for their ability to solve multiple vehicle routing problems (VRPs) using a single model. However, existing neural solvers typically rely on predefined problem constraints or require per-problem fine-tuning, which substantially limits their zero-shot generalization ability to unseen VRP variants. To address this critical bottleneck, we propose URS, a unified neural routing solver that achieves zero-shot generalization across a wide range of unseen VRPs with a single model. We propose a unified data representation (UDR) that replaces problem enumeration with data unification, thereby broadening the problem coverage and reducing reliance on domain expertise. In addition, we introduce a Mixed Bias Module (MBM) during encoding to improve node embeddings, which efficiently captures multiple priors inherent to various problems. On top of the UDR, we develop a problem-conditioned parameter generator to further improve zero-shot generalization. Extensive experiments show that URS consistently produces high-quality solutions for 110 VRP variants (including 99 unseen variants) while demonstrating impressive scalability to large-scale instances with up to 7000 nodes. To the best of our knowledge, URS is the first neural solver to handle over 100 VRP variants with a single model. Our code is available at https://github.com/CIAM-Group/URS.

2505.20110 2026-05-26 cs.LG cs.AI version update

Beyond the Proxy: Trajectory-Distilled Guidance for Offline GFlowNet Training

超越代理:用于离线GFlowNet训练的轨迹蒸馏指导

Ruishuo Chen, Xun Wang, Rui Hu, Zhuoran Li, Longbo Huang

Affiliations: Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China

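AI summary: Proposes Trajectory-Distilled GFlowNet (TD-GFN), which uses inverse reinforcement learning to extract dense edge rewards from offline trajectories and guides the policy through DAG pruning and prioritized backward sampling, avoiding a proxy model while improving the convergence speed and sample quality of offline GFlowNet training.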

Comments Camera-ready version accepted at ICML 2026

Details
AI Chinese abstract

生成流网络(GFlowNets)擅长采样多样化的高奖励对象。在许多实际应用中,由于无法进行主动奖励查询,这些模型必须使用静态离线数据集进行训练。主流的训练方法通常依赖代理模型为在线采样的轨迹提供奖励反馈。然而,由于数据稀缺或评估成本高,构建可靠的代理往往具有挑战性。虽然现有的无代理方法试图解决这一问题,但它们通常施加粗糙的约束,限制了模型有效探索的能力。为了克服这些限制,我们提出了轨迹蒸馏GFlowNet(TD-GFN),一种新颖的无代理训练框架。TD-GFN利用逆强化学习(IRL)从离线轨迹中提取稠密的、转移级别的边奖励,为高效探索提供丰富的结构指导。关键的是,为了确保鲁棒性,这些奖励通过DAG剪枝和优先反向采样间接指导策略。这种设计确保梯度更新仅依赖于数据集中的真实终端奖励,从而防止错误传播。实验结果表明,TD-GFN在收敛速度和样本质量上显著优于广泛的现有基线,为离线GFlowNet训练建立了更鲁棒和高效的范式。

English abstract

Generative Flow Networks (GFlowNets) excel at sampling diverse, high-reward objects. In many practical applications where active reward queries are infeasible, these models must be trained using static offline datasets. Prevailing training methods typically rely on a proxy model to provide reward feedback for online sampled trajectories. However, constructing a reliable proxy is often challenging due to data scarcity or high evaluation costs. While existing proxy-free approaches attempt to address this, they often impose coarse constraints that limit the model's ability to explore effectively. To overcome these limitations, we propose Trajectory-Distilled GFlowNet (TD-GFN), a novel proxy-free training framework. TD-GFN utilizes inverse reinforcement learning (IRL) to extract dense, transition-level edge rewards from offline trajectories, providing rich structural guidance for efficient exploration. Crucially, to ensure robustness, these rewards guide the policy indirectly through DAG pruning and prioritized backward sampling. This design ensures that gradient updates rely exclusively on ground-truth terminal rewards from the dataset, thereby preventing error propagation. Empirical results demonstrate that TD-GFN significantly outperforms a broad range of existing baselines in both convergence speed and sample quality, establishing a more robust and efficient paradigm for offline GFlowNet training.

2509.15543 2026-05-26 cs.LG version update

Nonconvex Decentralized Stochastic Bilevel Optimization under Heavy-Tailed Noise

重尾噪声下的非凸分布式随机双层优化

Xinwen Zhang, Yihan Zhang, Heng Liang, Hongchang Gao

Affiliations: Temple University

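AI summary: For nonconvex decentralized bilevel optimization under heavy-tailed noise, proposes a clipping-free normalized stochastic variance-reduced gradient descent algorithm and gives the first rigorous convergence proof.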

Details
AI Chinese abstract

现有的分布式随机优化方法假设下层损失函数是强凸的且随机梯度噪声具有有限方差,这些强假设在现实机器学习模型中通常不满足。例如,语言数据上的学习通常导致重尾梯度。为了解决这些局限性,我们针对重尾噪声下的非凸双层优化问题,开发了一种新颖的分布式随机双层优化算法。具体地,我们提出了一种归一化随机方差缩减双层梯度下降算法,该算法不依赖于任何裁剪操作。此外,通过创新性地在重尾噪声下对非凸分布式双层优化问题中的相互依赖梯度序列进行界定的方法,我们建立了其收敛速率。据我们所知,这是第一个在重尾噪声下具有严格理论保证的分布式双层优化算法。大量的实验结果证实了我们的算法在处理重尾噪声方面的有效性。

English abstract

Existing decentralized stochastic optimization methods assume that the lower-level loss function is strongly convex and that the stochastic gradient noise has finite variance. These strong assumptions are typically not satisfied in real-world machine learning models. For example, learning on language data typically leads to heavy-tailed gradients. To address these limitations, we develop a novel decentralized stochastic bilevel optimization algorithm for the nonconvex bilevel optimization problem under heavy-tailed noise. Specifically, we develop a normalized stochastic variance-reduced bilevel gradient descent algorithm, which does not rely on any clipping operation. Moreover, we establish its convergence rate by innovatively bounding interdependent gradient sequences under heavy-tailed noise for nonconvex decentralized bilevel optimization problems. As far as we know, this is the first decentralized bilevel optimization algorithm with rigorous theoretical guarantees under heavy-tailed noise. Extensive experimental results confirm the effectiveness of our algorithm in handling heavy-tailed noise.
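
The abstract's central design choice, normalizing the update rather than clipping it, can be illustrated with a minimal sketch. The code below is not the authors' algorithm, which is decentralized, bilevel, and variance-reduced. It assumes a plain single-node update and uses hypothetical helpers (`normalized_step`, `step_length`) only to show why normalization bounds the step length however large a heavy-tailed gradient sample happens to be.

```python
import math


def normalized_step(x, grad_estimate, lr, eps=1e-12):
    """Move x against grad_estimate with a step length of at most lr.

    Hypothetical helper for illustration: the step length equals
    lr * ||g|| / (||g|| + eps), so it never exceeds lr, even when the
    gradient sample is an extreme outlier.
    """
    norm = math.sqrt(sum(g * g for g in grad_estimate))
    return [xi - lr * g / (norm + eps) for xi, g in zip(x, grad_estimate)]


def step_length(x_old, x_new):
    """Euclidean distance between two iterates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_old, x_new)))


x0 = [0.0, 0.0]
typical = normalized_step(x0, [3e-3, 4e-3], lr=0.1)
outlier = normalized_step(x0, [3e9, 4e9], lr=0.1)  # stand-in for a heavy-tailed sample
print(step_length(x0, typical), step_length(x0, outlier))
```

Both calls move the iterate by essentially the same distance, so a single extreme sample cannot throw the iterate far away, and no clipping threshold has to be tuned.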

2509.10452 2026-05-26 cs.CL cs.LG version update

WhisTLE: Deeply Supervised, Text-Only Domain Adaptation for Pretrained Speech Recognition Transformers

WhisTLE: 深度监督的文本领域自适应方法用于预训练语音识别Transformer

Akshat Pandey, Karun Kumar, Raphael Tang

Affiliations: Comcast Speech AI

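AI summary: Proposes WhisTLE, a text-only domain adaptation method that trains a variational autoencoder to model encoder outputs from text and fine-tunes the decoder, substantially reducing word error rate.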

Comments 10 pages

Details
AI Chinese abstract

预训练的自动语音识别(ASR)模型(如Whisper)表现良好,但仍需领域自适应以处理未见过的用语。在许多实际场景中,收集语音数据不切实际,因此需要仅文本的自适应。我们提出WhisTLE,一种用于预训练编码器-解码器ASR模型的深度监督文本自适应方法。WhisTLE训练一个变分自编码器(VAE)从文本建模编码器输出,并使用学习到的文本到潜在编码器微调解码器,可选地与文本到语音(TTS)自适应结合。在推理时,恢复原始编码器,不产生额外运行时成本。在四个数据集和四个ASR模型上,带有TTS的WhisTLE相对降低了49.0%的词错误率(WER),并在112个场景中的100个中优于所有非WhisTLE基线。我们还发现WhisTLE与任何其他领域自适应方法的组合都能互补增强;因此我们建议在标准流程中纳入WhisTLE以自适应编码器-解码器ASR模型。

English abstract

Pretrained automatic speech recognition (ASR) models such as Whisper perform well but still need domain adaptation to handle unseen parlance. In many real-world settings, collecting speech data is impractical, necessitating text-only adaptation. We propose WhisTLE, a deeply supervised, text-only adaptation method for pretrained encoder-decoder ASR models. WhisTLE trains a variational autoencoder (VAE) to model encoder outputs from text and fine-tunes the decoder using the learned text-to-latent encoder, optionally combined with text-to-speech (TTS) adaptation. At inference, the original encoder is restored, incurring no extra runtime cost. Across four datasets and four ASR models, WhisTLE with TTS reduces word error rate (WER) by a relative 49.0% and outperforms all non-WhisTLE baselines in 100 of 112 scenarios. We also find that WhisTLE additively complements any combination of other domain adaptation approaches; we thus recommend the inclusion of WhisTLE during standard processes for adapting encoder-decoder ASR models.
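
For readers less familiar with the metric, the sketch below shows how word error rate and a relative reduction such as the reported 49.0% are conventionally computed. The function names, the example sentences, and the WER values 0.200 and 0.102 are hypothetical, chosen only to make the arithmetic visible; they are not results from the paper.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the previous reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer, adapted_wer):
    """Fraction of the baseline error removed by adaptation."""
    return (baseline_wer - adapted_wer) / baseline_wer


print(word_error_rate("turn on the living room tv", "turn on living room tea"))
print(relative_reduction(0.200, 0.102))
```

A relative reduction is measured against the baseline error, so the same absolute gain counts for more when the baseline WER is already low.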

2509.02113 2026-05-26 cs.LG cs.AI cs.CR cs.SI version update

HiGraph: A Large-Scale Hierarchical Graph Dataset for Malware Analysis

HiGraph:用于恶意软件分析的大规模层次图数据集

Han Chen, Hanchen Wang, Hongmei Chen, Ying Zhang, Lu Qin, Wenjie Zhang

Affiliations: University of Technology Sydney; Yunnan University; University of New South Wales

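AI summary: To address existing graph-based methods that ignore the hierarchical structure of software, introduces HiGraph, a large-scale hierarchical graph dataset with over 200M control flow graphs nested within 595K function call graphs, for building robust malware detectors that resist obfuscation and malware evolution.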

Comments updated dataset statistics

Details
AI Chinese abstract

基于图的恶意软件分析的进展受到缺乏捕捉软件固有层次结构的大规模数据集的严重限制。现有方法通常将程序简化为单层图,未能建模高层功能交互与低层指令逻辑之间的关键语义关系。为填补这一空白,我们引入了HiGraph,这是用于恶意软件分析的最大公开层次图数据集,包含嵌套在595K个函数调用图(FCG)中的超过2亿个控制流图(CFG)。这种两层表示保留了构建对代码混淆和恶意软件演化具有鲁棒性的检测器所必需的结构语义。我们通过大规模分析展示了HiGraph的实用性,揭示了良性软件和恶意软件的不同结构特性,将其确立为社区的基础基准。数据集和工具可在https://higraph.org公开获取。

English abstract

The advancement of graph-based malware analysis is critically limited by the absence of large-scale datasets that capture the inherent hierarchical structure of software. Existing methods often oversimplify programs into single-level graphs, failing to model the crucial semantic relationship between high-level functional interactions and low-level instruction logic. To bridge this gap, we introduce HiGraph, the largest public hierarchical graph dataset for malware analysis, comprising over 200M Control Flow Graphs (CFGs) nested within 595K Function Call Graphs (FCGs). This two-level representation preserves structural semantics essential for building robust detectors resilient to code obfuscation and malware evolution. We demonstrate HiGraph's utility through a large-scale analysis that reveals distinct structural properties of benign and malicious software, establishing it as a foundational benchmark for the community. The dataset and tools are publicly available at https://higraph.org.
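
To make the two-level representation concrete, the following sketch models one program as a function call graph whose nodes each carry their own control flow graph. The class names, field names, and the toy program are hypothetical and invented for illustration; they do not reflect the dataset's actual schema or file format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ControlFlowGraph:
    """Low level: basic blocks of one function and the jumps between them."""
    blocks: List[str]
    edges: List[Tuple[int, int]] = field(default_factory=list)


@dataclass
class FunctionCallGraph:
    """High level: the functions of one program and the calls between them."""
    cfgs: Dict[str, ControlFlowGraph] = field(default_factory=dict)
    calls: List[Tuple[str, str]] = field(default_factory=list)

    def add_function(self, name, cfg):
        self.cfgs[name] = cfg

    def add_call(self, caller, callee):
        # Only allow calls between functions that already have a CFG.
        if caller not in self.cfgs or callee not in self.cfgs:
            raise KeyError("both functions must be registered first")
        self.calls.append((caller, callee))

    def total_basic_blocks(self):
        return sum(len(cfg.blocks) for cfg in self.cfgs.values())


program = FunctionCallGraph()
program.add_function(
    "main", ControlFlowGraph(["entry", "loop", "exit"], [(0, 1), (1, 1), (1, 2)])
)
program.add_function("decrypt", ControlFlowGraph(["entry", "xor"], [(0, 1)]))
program.add_call("main", "decrypt")
print(len(program.cfgs), len(program.calls), program.total_basic_blocks())
```

Keeping each CFG attached to its FCG node is what distinguishes this layout from a single-level graph: a model can read instruction-level structure inside a function and call-level structure across functions.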

2405.01906 2026-05-26 cs.AI cs.LG cs.NE version update

Instance-Conditioned Adaptation for Large-scale Generalization of Neural Routing Solver

实例条件适应:神经路由求解器的大规模泛化

Changliang Zhou, Xi Lin, Zhenkun Wang, Xialiang Tong, Mingxuan Yuan, Qingfu Zhang

Affiliations: School of Automation and Intelligent Manufacturing and Guangdong Provincial Key Laboratory of Fully Actuated System Control Theory and Technology, Southern University of Science and Technology, Shenzhen 518055, China; Department of Computer Science, City University of Hong Kong, Hong Kong SAR, China; Huawei Noah's Ark Lab, Hong Kong SAR, China

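AI summary: Proposes the Instance-Conditioned Adaptation Model (ICAM), which uses a simple, efficient instance-conditioned adaptation function and a low-complexity adaptation module to substantially improve the generalization of neural routing solvers on large-scale Traveling Salesman Problem (TSP), Capacitated Vehicle Routing Problem (CVRP), and Asymmetric Traveling Salesman Problem (ATSP) instances while keeping inference fast.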

Comments 13 pages, 5 figures

Details
Journal ref: IEEE Transactions on Intelligent Transportation Systems, 2026
AI Chinese abstract

神经组合优化(NCO)方法在无需专家知识的情况下,展现出了解决智能交通系统路由问题的巨大潜力。然而,现有的构造性NCO方法仍难以解决大规模实例,这严重限制了其应用前景。为了解决这些关键缺陷,本文提出了一种新颖的实例条件适应模型(ICAM),以实现神经路由求解器更好的大规模泛化。特别地,我们设计了一个简单而高效的实例条件适应函数,以较小的时空开销显著提升现有NCO模型的泛化性能。此外,通过对不同注意力机制之间信息融合性能的系统研究,我们进一步提出了一个强大且低复杂度的实例条件适应模块,为不同规模的实例生成更好的解。在合成实例和基准实例上的大量实验结果表明,我们提出的方法能够在解决大规模旅行商问题(TSP)、容量车辆路径问题(CVRP)和非对称旅行商问题(ATSP)时,以非常快的推理时间获得有希望的结果。我们的代码可在 https://github.com/CIAM-Group/ICAM 获取。

English abstract

The neural combinatorial optimization (NCO) method has shown great potential for solving routing problems of intelligent transportation systems without requiring expert knowledge. However, existing constructive NCO methods still struggle to solve large-scale instances, which significantly limits their application prospects. To address these crucial shortcomings, this work proposes a novel Instance-Conditioned Adaptation Model (ICAM) for better large-scale generalization of neural routing solvers. In particular, we design a simple yet efficient instance-conditioned adaptation function to significantly improve the generalization performance of existing NCO models with a small time and memory overhead. In addition, with a systematic investigation on the performance of information incorporation between different attention mechanisms, we further propose a powerful yet low-complexity instance-conditioned adaptation module to generate better solutions for instances across different scales. Extensive experimental results on both synthetic and benchmark instances show that our proposed method is capable of obtaining promising results with a very fast inference time in solving large-scale Traveling Salesman Problems (TSPs), Capacitated Vehicle Routing Problems (CVRPs), and Asymmetric Traveling Salesman Problems (ATSPs). Our code is available at https://github.com/CIAM-Group/ICAM.

2409.02416 2026-05-26 cs.LG stat.ML version update

Relative Translation Invariant Wasserstein Distance

相对平移不变Wasserstein距离

Binshuai Wang, Qiwei Di, Ming Yin, Mengdi Wang, Quanquan Gu, Peng Wei

Affiliations: Department of Computer Science; George Washington University; University of California, Los Angeles; Department of Electrical and Computer Engineering; Princeton University

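AI summary: Motivated by the Bures distance, introduces the relative translation invariant Wasserstein distance $RW_p$, proves that it is a metric, and designs a bi-level algorithm to compute $RW_p$ between discrete distributions. For $p = 2$, the $\mathrm{RW}_2$-LP and $\mathrm{RW}_2$-Sinkhorn algorithms improve numerical stability. Experiments show reduced numerical error and effective retrieval of similar thunderstorm patterns.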

Comments Accepted by Transactions on Machine Learning Research (TMLR). Final accepted version. The implementation is publicly available at https://github.com/DRKWang/rw_metric

Details
AI Chinese abstract

受Bures距离启发,我们引入了一类新的距离族,即相对平移不变Wasserstein距离,记为$RW_p$,作为经典Wasserstein距离$W_p$($p \in [1, +\infty)$)的推广。我们证明了$RW_p$定义了一个有效的度量,并表明这类度量比经典Wasserstein距离更具内在性。设计了一种双层算法来计算任意离散分布之间的一般$RW_p$距离。此外,当$p=2$时,我们证明在离散设定下最优耦合矩阵在分布平移下不变,并进一步提出了两种算法,即$\mathrm{RW}_2$-LP算法和$\mathrm{RW}_2$-Sinkhorn算法,以提高计算$W_2$距离和最优耦合矩阵解的数值稳定性。最后,我们进行了三个实验来验证我们的理论结果和算法。前两个实验报告了$\mathrm{RW}_2$-LP算法和$\mathrm{RW}_2$-Sinkhorn算法(无论是否归一化)相比标准算法能显著减少数值误差。第三个实验表明$RW_p$算法在计算上具有可扩展性,并适用于实际应用中相似雷暴模式的检索。

English abstract

Motivated by the Bures distance, we introduce a new family of distances, \emph{relative translation invariant Wasserstein distances}, denoted by $RW_p$, as an extension of the classical Wasserstein distances $W_p$ for $p \in [1, +\infty)$. We establish that $RW_p$ defines a valid metric and demonstrate that this type of metric is more intrinsic than the classical Wasserstein distance. A bi-level algorithm is designed to compute the general $RW_p$ distance between arbitrary discrete distributions. Moreover, when $p = 2$, we show that the optimal coupling matrix is invariant under distributional translation in the discrete setting, and we further propose two algorithms, the $\mathrm{RW}_2$-LP algorithm and the $\mathrm{RW}_2$-Sinkhorn algorithm, to improve the numerical stability of computing $W_2$ distance and the optimal coupling matrix solutions. Finally, we conduct three experiments to validate our theoretical results and algorithms. The first two experiments report that the $\mathrm{RW}_2$-LP algorithm and the $\mathrm{RW}_2$-Sinkhorn algorithm, both with and without normalization, can significantly reduce the numerical errors compared to standard algorithms. The third experiment shows that $RW_p$ algorithms are computationally scalable and applicable to the retrieval of similar thunderstorm patterns in practical applications.
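
The $p = 2$ case can be checked numerically with a small sketch. It assumes that $RW_2$ is the $W_2$ distance minimized over all translations of one distribution; for $p = 2$ that minimum is attained by aligning the two means, which gives $RW_2^2 = W_2^2 - \lVert \bar{\mu} - \bar{\nu} \rVert^2$. The code is restricted to one-dimensional, equal-size, uniformly weighted samples, where sorting yields the optimal coupling. It is not the paper's $\mathrm{RW}_2$-LP or $\mathrm{RW}_2$-Sinkhorn algorithm, and the function names and sample values are hypothetical.

```python
def w2_squared_1d(xs, ys):
    """Squared W_2 between two uniform empirical distributions on the line.

    In one dimension the optimal coupling pairs the sorted samples.
    """
    if len(xs) != len(ys):
        raise ValueError("equal sample counts required")
    return sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def mean(values):
    return sum(values) / len(values)


def rw2_squared_1d(xs, ys):
    """Translation-minimized squared W_2: center both samples, then compare."""
    mx, my = mean(xs), mean(ys)
    return w2_squared_1d([x - mx for x in xs], [y - my for y in ys])


xs = [0.0, 1.0, 3.0]
ys = [10.0, 12.0, 13.0]
shifted = [y + 5.0 for y in ys]  # translate the second distribution

w2 = w2_squared_1d(xs, ys)
rw2 = rw2_squared_1d(xs, ys)
mean_gap_squared = (mean(xs) - mean(ys)) ** 2

print(w2, rw2, mean_gap_squared)
print(rw2_squared_1d(xs, shifted))
```

Under these assumptions, translating `ys` changes `w2` but leaves `rw2` unchanged, and the difference between the two is exactly the squared gap between the means. This is also why subtracting the means before solving the transport problem can help numerically when the distributions are far apart.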