2605.28809 2026-05-28 cs.CV cs.LG (updated version)

AREA: Attribute Extraction and Aggregation for CLIP-Based Class-Incremental Learning

Zhen-Hao Xie, Yu-Cheng Shi, Da-Wei Zhou

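Affiliations: State Key Laboratory of Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China

AI summary: Proposes AREA, which stabilizes attribute extraction via principal geodesic analysis, stabilizes attribute aggregation with lightweight task experts and variational information bottleneck regularization, and uses optimal transport at inference, to address catastrophic forgetting in CLIP-based class-incremental learning.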

Comments Accepted to ICML 2026. Code is available at https://github.com/LAMDA-CL/ICML2026-AREA

Abstract

Class-Incremental Learning (CIL) is important in building real-world learning systems. In CLIP-based CIL, the model performs classification by comparing the similarity between visual and textual embeddings obtained from template prompts, e.g., "a photo of a [CLASS]". This seemingly monolithic matching process can be decomposed into two conceptually distinct stages: attribute extraction and attribute aggregation. For example, a model may recognize a cat using attributes such as fur texture and whiskers. When learning a new class such as car, the model must extract additional attributes such as wheels and adjust how they are aggregated in the shared representation space. However, since only data from the current task is available, incremental updates can bias both attribute extraction and aggregation toward new classes, leading to catastrophic forgetting. Therefore, we propose AREA for attribute extraction and aggregation in CLIP-based CIL. To stabilize extraction, we anchor class-level visual and textual attributes on the hyperspherical embedding space via principal geodesic analysis. To stabilize aggregation, we learn lightweight task-specific experts with scoring and residual refinement, regularized by a variational information bottleneck objective. During inference, we perform routing over task attribute manifolds via optimal transport for more concise prediction. Experiments show that AREA consistently outperforms SOTA methods. Code is available at https://github.com/LAMDA-CL/ICML2026-AREA.
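As a rough illustration of the matching step the abstract starts from (classification by comparing an image embedding with per-class text embeddings), the sketch below picks the class whose text embedding has the highest cosine similarity. The function names and the toy two-dimensional vectors are hypothetical; this is not AREA itself, and real CLIP embeddings would come from the model's image and text encoders.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify(image_embedding, text_embeddings):
    """Return the class name whose text embedding best matches the image.

    text_embeddings maps a class name to the embedding of its prompt,
    e.g. the embedding of "a photo of a cat" (hypothetical inputs).
    """
    return max(
        text_embeddings,
        key=lambda name: cosine_similarity(image_embedding, text_embeddings[name]),
    )
```

In a class-incremental setting, new entries would be added to `text_embeddings` as new tasks arrive, which is where the bias toward new classes described in the abstract can enter.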

2605.28805 2026-05-28 cs.CL cs.AI cs.CV cs.LG (updated version)

OmniVerifier-M1: Multimodal Meta-Verifier with Explicit Structured Recalibration

Xinchen Zhang, Bowei Liu, Jiale Liu, Chufan Shi, Yizhen Zhang, Junhong Liu, Youliang Zhang, Zhiheng Li, Yujiu Yang, Ling Yang

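Affiliations: Tsinghua University; Pennsylvania State University; University of Southern California; Microcyto; Princeton University

AI summary: Proposes OmniVerifier-M1, which uses symbolic meta-verification (e.g., bounding boxes) and decoupled reinforcement learning to provide reliable, fine-grained verification and dynamic region-level self-correction for multimodal large language models.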

Comments ICML 2026. Project: https://github.com/Cominclip/OmniVerifier

Abstract

Visual outcomes are increasingly central to multimodal large language models, making reliable and fine-grained verification essential for scaling generalist foundation models. In this work, we investigate multimodal meta-verification, which leverages verifier-generated rationales rather than decision-only signals, and explore how to effectively incorporate meta-verification feedback into multimodal verifier training. We identify two key findings. First, symbolic verifier outputs (e.g., bounding boxes) outperform textual explanations as meta-verification rationales, enabling efficient rule-based reinforcement learning rewards while avoiding reliance on model-based rewards from auxiliary judge models. Second, decoupling reinforcement learning objectives for binary judgment and meta-verification substantially outperforms joint reward optimization, due to intrinsic differences in output structure and learning dynamics. Based on these insights, we train OmniVerifier-M1, a generalist visual verifier leveraging symbolic meta-verification and decoupled reinforcement learning. OmniVerifier-M1 provides robust verification and fine-grained error localization, and further enables M1-TTS, a verifier-driven agentic generation system achieving dynamic region-level self-correction. This approach paves the way for more reliable, interpretable, and fine-grained multimodal verification, supporting safer and more controllable foundation model deployment.

2605.28803 2026-05-28 cs.CV cs.LG (updated version)

Learning from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents

Suji Kim, Kangsan Kim, Sung Ju Hwang

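Affiliations: KAIST; Samsung Electronics

AI summary: Proposes LearnWeak, a framework that uses a stronger reference agent to identify a student agent's weaknesses in the target domain, automatically synthesizes targeted tasks and supervision signals, and introduces an error-aware specialization objective, substantially improving small computer-use agents across multiple domains.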

Abstract

Computer-use agents (CUAs) have recently made substantial progress, but deploying a separate large expert for each software domain remains expensive. Small open computer-use agents are more practical specialization targets, but they remain substantially weaker and exhibit uneven domain-specific failures. A straightforward remedy is to synthesize large-scale training data for the target domain, yet we find that this naive approach yields only marginal improvements. Building on this observation, we introduce LearnWeak, an annotation-free specialization framework for small computer-use agents that uses a stronger reference agent to identify the student's weaknesses in the target domain, synthesize targeted tasks, and construct supervision automatically. LearnWeak further introduces an error-aware specialization objective that disentangles planning and execution errors, enabling more behaviorally precise updates than broad uniform supervision. On OSWorld, LearnWeak achieves average gains of 11.6 and 11.1 percentage points over EvoCUA-8B and OpenCUA-7B, respectively, across eight domains. We also validate that our student-aware dataset generation and training approaches outperform existing autonomous trajectory generation and training baselines. Our work highlights the importance of student awareness in both data synthesis and agent training, pointing toward a more principled and efficient path for specializing small computer-use agents in diverse domains.

2605.28773 2026-05-28 cs.CL cs.AI cs.LG cs.MA cs.MM (updated version)

Rethinking Memory as Continuously Evolving Connectivity

Jizhan Fang, Buqiang Xu, Zhixian Wang, Haoliang Cao, Xinle Deng, Baohua Dong, Hangcheng Zhu, Ruohui Huang, Gang Yu, Ying Wei, Guozhou Zheng, Feiyu Xiong, Haofen Wang, Huajun Chen, Ningyu Zhang

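Affiliations: Zhejiang University; Alibaba Group; MemTensor; Tongji University

AI summary: Proposes FluxMem, a framework that models memory as a heterogeneous graph and evolves its topology through three stages (initial connection formation, feedback-driven refinement, and long-term consolidation), to address the brittleness of existing memory-augmented LLM agents in dynamic environments.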

Comments Ongoing work

Abstract

Existing memory-augmented LLM agents often treat memory as a static repository with pre-defined representations and fixed retrieval pipelines, which is brittle in dynamic agentic environments where feedback, task variation, and heterogeneous signals continuously reshape what should be remembered and how it should be connected. To address this, we propose FluxMem, a connectivity-evolving memory framework that models memory as a heterogeneous graph and progressively refines its topology through three stages: initial connection formation, feedback-driven refinement, and long-term consolidation. During execution, FluxMem repairs missing links, prunes interference, aligns abstraction granularity, and distills recurrent successful trajectories into reusable procedural circuits, guided by one metric for memory generalizability and evolutionary maturity. Across three fundamentally distinct benchmarks including LoCoMo, Mind2Web, and GAIA, FluxMem achieves consistent state-of-the-art performance, demonstrating strong adaptation and generalization in complex agentic environments. The code will be open-sourced in https://github.com/zjunlp/LightMem.

2605.28769 2026-05-28 cs.LG (updated version)

Multi-Mixer Models: Flexible Sequence Modeling with Shared Representations

Kevin Y. Li, Asher Trockman, Ananda Theertha Suresh, Ziteng Sun

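Affiliations: Carnegie Mellon University; Google Research

AI summary: Proposes Oryx, a hybrid model that switches flexibly between attention and linear recurrence along the sequence axis for efficient long-context processing, improving on its baselines by at least 0.7 percentage points on averaged language modeling tasks at the 1.4B scale.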

Abstract

Softmax attention is the cornerstone of modern large language models, but its memory scales linearly and compute quadratically with sequence length. Linear recurrent models, such as linear attention and state space models, have become widely studied as alternatives to attention due to their linear compute and constant memory. While these sub-quadratic token mixing methods, or mixers, achieve promising efficiency gains and competitive results on a wide range of benchmarks, current linear recurrent models still lag behind on tasks that require long-context retrieval or in-context learning. A growing body of work studies hybrid architectures that attempt to mitigate these trade-offs by statically interleaving or merging attention and recurrent blocks. In this work, we explore a new axis of developing hybrid models: across the token sequence. We propose Oryx, a hybrid model that can, throughout a sequence, flexibly switch between different mixers, for example quadratic attention for rich context utilization and linear recurrences for efficient generation. Oryx ties at least 90% of its parameters across mixers, enabling attention and recurrent modes to operate over shared internal representations. We validate our design with Mamba-2 and Gated DeltaNet variants, up to 1.4B models. Under fixed token budgets and a mixed-training strategy, Oryx achieves comparable or better performance than its single-mixer baselines. At the 1.4B scale, all instances of Oryx outperform their respective baselines by at least 0.7 percentage points on averaged language modeling tasks. On retrieval tasks, Oryx achieves performance comparable to the Transformer baseline even when processing only a tiny fraction (<10%) of the tokens in attention mode. These results suggest that attention and linear recurrent models can share internal representations, and motivate sequence-axis hybridization as a promising direction.

2605.28767 2026-05-28 cs.LG stat.ML (updated version)

Principled Algorithms for Optimizing Generalized Metrics in Multi-Label Learning

Mehryar Mohri, Yutao Zhong

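Affiliations: Google Research; CIMS (Courant Institute of Mathematical Sciences)

AI summary: Building on H-consistency theory, this paper designs decomposable surrogate loss functions and proposes the MMO family of algorithms for optimizing generalized linear-fractional metrics in multi-label learning, validating its scalability and strong performance on large-scale datasets.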

Abstract

Many real-world classification tasks require predicting multiple labels per instance, necessitating the optimization of complex evaluation metrics such as the $F$-measure and Jaccard index. While the Empirical Utility Maximization (EUM) framework is natural for these population-level metrics, existing theoretical results are largely limited to asymptotic Bayes-consistency. In this paper, we develop principled learning algorithms for optimizing a broad class of generalized metrics within the EUM framework, grounded in the stronger notion of $H$-consistency. Our key contribution is the design of novel surrogate loss functions for multi-label learning that admit provable $H$-consistency bounds, enabling optimization with non-asymptotic guarantees tailored to the hypothesis class and finite samples. Crucially, we prove these combinatorially formulated surrogates decompose exactly, operating in strictly $O(l)$ time without approximations. Building on this foundation, we introduce MMO (Multi-Label Metric Optimization), a new family of algorithms for optimizing generalized linear-fractional metrics. We validate our approach through extensive experiments, demonstrating robust scalability and superior performance over state-of-the-art continuous baselines on large-scale datasets (MS-COCO, Reuters-21578) in high-sparsity, deep learning regimes. Our results offer both theoretical rigor and practical effectiveness for general multi-label metric optimization.
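To make the term "linear-fractional metric" concrete, the sketch below writes the two metrics named in the abstract, the F-measure and the Jaccard index, as ratios of linear functions of the pooled true-positive, false-positive, and false-negative counts. It only illustrates the metrics, not the MMO algorithms or their surrogate losses; the helper names and the choice to return 0.0 for an empty denominator are assumptions made here.

```python
def confusion_counts(y_true, y_pred):
    """Pool (tp, fp, fn) over all instances and labels of 0/1 label matrices."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def f_beta(tp, fp, fn, beta=1.0):
    """F_beta: numerator and denominator are both linear in (tp, fp, fn)."""
    b2 = beta * beta
    numerator = (1.0 + b2) * tp
    denominator = (1.0 + b2) * tp + b2 * fn + fp
    return numerator / denominator if denominator > 0 else 0.0  # convention


def jaccard(tp, fp, fn):
    """Jaccard index: tp / (tp + fp + fn), also linear-fractional."""
    denominator = tp + fp + fn
    return tp / denominator if denominator > 0 else 0.0  # convention
```

Because such metrics are ratios of sums over the whole sample rather than averages of per-example losses, they do not decompose over instances, which is why they call for dedicated optimization methods.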

2605.28760 2026-05-28 cs.LG (updated version)

Beyond Lipschitz: Data-Driven Robustness via the Discrete Modulus of Continuity

Jürgen Dölz, Michael Multerer, Michele Palma

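Affiliations: Institute for Numerical Simulation, University of Bonn, Germany; Dalle Molle Institute for Artificial Intelligence, USI Lugano, Switzerland

AI summary: Proposes a data-driven robustness framework based on the discrete modulus of continuity (DMOC), a nonlinear generalization of Lipschitz continuity, and introduces a scalable minibatch algorithm, enabling fine-grained robustness assessment relative to the data distribution.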

Abstract

Robustness of neural networks is commonly quantified via local or global Lipschitz constants. However, Lipschitz continuity can be overly coarse or overly restrictive as a global robustness measure, failing to capture nuanced, data-dependent behavior. We propose a data-driven, architecture-agnostic framework based on the discrete modulus of continuity (DMOC), a nonlinear generalization of Lipschitz continuity that provides a finer notion of robustness. Unlike many existing approaches, DMOC does not require access to model internals and instead evaluates regularity relative to the data distribution. This shifts the focus from the model to the data, which provide a data-driven baseline of regularity against which the network's robustness is assessed. We establish convergence results for DMOC-induced seminorms with explicit data-driven rates in terms of the separation distance, and introduce a scalable minibatch algorithm that reduces the quadratic cost of exact computation, enabling application to large-scale data sets such as ImageNet. Empirically, DMOC serves as an architecture-independent diagnostic: it distinguishes trained from untrained networks, reveals underfitting and overfitting regimes, and yields, as a special case, tight Lipschitz estimates comparable to state-of-the-art methods such as ECLipsE and ECLipsE-fast.
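A minimal sketch of what a discrete modulus of continuity could look like, assuming the textbook definition restricted to a finite sample: the largest output change over all pairs of data points whose inputs lie within distance delta. The paper's exact definition, norms, and minibatch algorithm are not given in the abstract, so the function below is an assumption for illustration only. It uses nothing but model inputs and outputs, in line with the black-box setting described, and it loops over all pairs, which is the quadratic cost the abstract mentions.

```python
import math


def discrete_modulus(inputs, outputs, delta):
    """Largest output distance over sample pairs with input distance <= delta.

    inputs and outputs are equal-length lists of coordinate tuples; outputs[i]
    is the model's output on inputs[i]. Exact computation visits all O(n^2)
    pairs.
    """
    worst = 0.0
    n = len(inputs)
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(inputs[i], inputs[j]) <= delta:
                worst = max(worst, math.dist(outputs[i], outputs[j]))
    return worst
```

For a function with Lipschitz constant L, this quantity is at most L times delta, so a Lipschitz bound appears as the special case of a linear envelope on the curve delta -> discrete_modulus(inputs, outputs, delta).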

2605.28726 2026-05-28 cs.RO cs.LG (updated version)

How VLAs Fail Differently: Black-Box Action Monitoring Reveals Architecture-Specific Failure Signatures

Krishnam Gupta

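Affiliations: Independent Research

AI summary: Using black-box action monitoring, this paper finds that vision-language-action (VLA) architectures fail in fundamentally different yet predictable ways at the motor-command level, and shows that architecture-matched monitor selection is essential.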

Comments Accepted at IEEE ICRA 2026 Workshop "From Data to Decisions: VLA Pipelines for Real Robots", Vienna, June 2026. Non-archival workshop. 5 pages, 2 figures, 22 references

Abstract

We discover that VLA architectures fail in fundamentally different, predictable ways at the motor-command level. Running VQ-BeT, Diffusion Policy, and ACT on identical evaluation protocols (n=450 episodes across PushT and ALOHA 14-DOF bimanual manipulation), we find: (1) direction reversal rate is a universal failure predictor across all three architectures (AUROC=0.93, 0.79, 0.91; p<0.001); (2) jerk monitoring is predictive only for discrete-token architectures, following a discrete-to-continuous gradient (0.88, 0.69, 0.41); (3) velocity violations alone are non-predictive everywhere (AUROC 0.41-0.69), yet velocity checking is the most common safety mechanism in VLA deployment code; and (4) for continuous-family VLAs, velocity monitoring provides effectively zero predictive signal (AUROC=0.52 on ACT, 0.41 on Diffusion), proving that architecture-matched monitor selection is essential. These results quantify a monitoring consequence of the well-known discrete/continuous VLA distinction: the two families produce qualitatively different failure signatures that require different monitors. No single monitor works universally; architecture-matched selection is required. This finding was enabled by SafeContract, a training-free, black-box action monitoring toolkit with conformal calibration. Code: https://github.com/krishnam94/vla-edge
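The abstract does not define "direction reversal rate", so the sketch below is one plausible reading rather than the SafeContract implementation: for each action dimension, take the differences between consecutive commanded actions and count how often the sign of that difference flips from one step to the next. The function name, the handling of zero differences, and the pooling across dimensions are all assumptions.

```python
def direction_reversal_rate(actions):
    """Fraction of consecutive action deltas whose sign flips.

    actions is a list of per-step action vectors (lists of floats) from one
    episode. Zero deltas are skipped, and all dimensions are pooled.
    """
    if len(actions) < 3:
        return 0.0  # fewer than two deltas, so no reversal can be observed
    flips = 0
    comparisons = 0
    for dim in range(len(actions[0])):
        deltas = [
            actions[t + 1][dim] - actions[t][dim]
            for t in range(len(actions) - 1)
        ]
        for previous, current in zip(deltas, deltas[1:]):
            if previous == 0 or current == 0:
                continue
            comparisons += 1
            if (previous > 0) != (current > 0):
                flips += 1
    return flips / comparisons if comparisons else 0.0
```

A monitor of this kind needs only the emitted actions, which matches the black-box, training-free setting the abstract describes; turning the rate into a failure alarm would additionally require a threshold, which the abstract says is calibrated conformally.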

2605.28707 2026-05-28 cs.AI cs.LG (updated version)

Beyond Binary Moral Judgment: Modeling Ethical Pluralism in AI

Aisha Aijaz, Rahul Goel, Arnav Batra, Raghava Mutharaju

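Affiliations: Department of Computer Science and Engineering, IIIT Delhi; Mehta Family School of Data Science and AI, IIT Palakkad

AI summary: Proposes a framework that models moral reasoning as a distribution over normative ethical theories (ethical pluralism), implemented with a two-stream normative-semantic architecture and stacked ensemble learning, reaching 88.89% accuracy on 450 cases.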

Abstract

Critical decision-making in socially consequential spaces increasingly involves AI systems in varying capacities. Yet, despite the ubiquity of autonomous systems, most approaches to handling autonomous moral decision-making resort to scalar or binary judgments. These methods are insufficient for acceptable moral reasoning, as they provide little explanation, leaving out essential contextual and theoretical information that must be included to support accountability. To this end, we propose a framework to model moral reasoning as a distribution over normative ethical theories, that is, as ethical pluralism. We introduce a normative ethics simplex that integrates these theories. A benchmark of 450 cases across 15 fine-grained subtheories was also prepared for the purposes of stacked ensemble learning. These cases describe ethical dilemmas in natural language and have associated extracted contextual features. The simplex is implemented via a two-stream normative-semantic architecture, followed by the fusion of normative information and a sequential stacking ensemble that learns the best fit over the three broad theories (consequentialism, virtue ethics, and deontology) and the 15 subcategories. Our experiments demonstrate that integrating contextual and normative priors with the semantic embeddings significantly improves classification performance, reaching an accuracy of 88.89%. We conducted ablation studies to show that structured ethical representations contribute beyond analogical reasoning, and that the chosen stacking architecture gives the best results due to the gradual learning of granularity. Ethical pluralism is also analyzed through entropy, confidence, and visualization. Thus, modeling ethical pluralism as a probabilistic normative distribution supports human-like moral reasoning, ethical disagreement analysis, and future alignment in AI systems.

2605.28705 2026-05-28 cs.LG (updated version)

Understanding Generalization and Forgetting in In-Context Continual Learning

Guangyu Li, Meng Ding, Lijie Hu

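Affiliations: Mohamed bin Zayed University of Artificial Intelligence (MBZUAI); University of Buffalo

AI summary: Proposes the first theoretical framework for in-context continual learning, analyzing the generalization and forgetting behavior of a pretrained Transformer that processes multiple sequential tasks within a single prompt, and revealing the interference and bias induced by attention mechanisms.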

Comments accepted by ICML 2026

Abstract

In-context learning (ICL) derives its power from enabling Large Language Models to adapt to new tasks via prompt-based reasoning alone, entirely bypassing the need for parameter updates. Existing theories primarily study ICL in single-task settings, while real-world prompts often contain sequences of heterogeneous tasks, leaving a gap in understanding whether Large Language Models implicitly perform continual learning during inference. To bridge this gap, we propose the first theoretical framework for in-context continual learning, modeling how a pretrained Transformer processes multiple sequential tasks within a single prompt through shared attention mechanisms. Focusing on linear and masked linear self-attention, we derive error expressions for model predictions under sequential task prompts and analyze their generalization and forgetting behavior. Our results reveal that standard attention mechanisms inevitably induce intertask interference by uniformly or causally aggregating historical contexts, leading to systematic bias. We further provide a bias-variance-interference decomposition of prediction error, characterizing when historical in-context information yields positive transfer or provable negative transfer. This analysis exposes fundamental limits of attention-based continual inference and offers theoretical explanations for order sensitivity and performance degradation in long prompts.

2605.28704 2026-05-28 cs.LG (updated version)

Expressive Power of Floating-Point Neural Networks with Arbitrary Reduction Orders and Inexact Activation Implementations

Yeachan Park, Geonho Hwang, Wonyeol Lee, Sejun Park

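Affiliations: Department of Mathematics and Statistics, Sejong University; Department of Mathematical Sciences, GIST; Department of Computer Science, POSTECH; Department of Artificial Intelligence, Korea University

AI summary: Studies whether floating-point neural networks can exactly represent arbitrary functions between floating-point domains under generalized floating-point execution semantics, including arbitrary reduction orders and inexact activation implementations with bounded ulp errors. The paper introduces a general distinguishability framework, proves that the first layer's ability to distinguish every pair of distinct inputs is necessary for universal representability, proves that a suitable form of distinguishability is also sufficient under mild conditions, and thereby establishes universal representability results for practical activation functions such as Sigmoid, tanh, and ReLU.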

Abstract

Most existing expressivity theories for neural networks assume exact real arithmetic, whereas practical neural networks are executed under finite-precision floating-point arithmetic with implementation-dependent execution semantics. Recent works have begun studying the expressive power of floating-point neural networks, but existing results are limited to highly restricted activation functions and idealized assumptions such as fixed left-to-right reduction orders and correctly rounded activation implementations. In this work, we study the expressive power of floating-point neural networks under generalized floating-point execution semantics, including arbitrary reduction orders and inexact activation implementations with bounded ulp errors. We investigate when floating-point neural networks can represent arbitrary functions between floating-point domains exactly. To this end, we introduce a general distinguishability framework and show that the ability to distinguish every pair of distinct inputs in the first layer is necessary for universal representability. This characterization yields broad classes of activation implementations that are not universal representators, extending previous isolated counterexamples such as the correctly rounded cosine activation. We further prove that a suitable form of distinguishability is also sufficient for universal representability under mild conditions on the activation implementation. Using this framework, we establish universal representability results for a broad class of practical activation functions, including implementations of $\mathrm{Sigmoid}$, $\tanh$, $\mathrm{ReLU}$, $\mathrm{ELU}$, $\mathrm{SeLU}$, $\mathrm{GeLU}$, $\mathrm{Swish}$, $\mathrm{Mish}$, and $\sin$, under significantly more realistic floating-point execution models than previously known.

2605.28684 2026-05-28 cs.LG cs.CE cs.NA math.NA physics.comp-ph (updated version)

History-aware adaptive reduced-order models via incremental singular value decomposition

Amirpasha Hedayat, Ali Mohaghegh, Laura Balzano, Cheng Huang, Karthik Duraisamy

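Affiliations: Department of Aerospace Engineering; University of Michigan; University of Kansas; Department of Electrical Engineering and Computer Science

AI summary: To address the loss of accuracy when the online dynamics of a reduced-order model drift away from the offline training regime, proposes a projection-based adaptive reduced-order framework built on incremental singular value decomposition (iSVD), in which occasional full-order operator evaluations supply corrective snapshots for updating the basis online. The framework is validated on three nonlinear problems, where it outperforms existing methods.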

Comments 50 pages, 27 figures, Preprint submitted to Elsevier

Single-Rollout Hidden-State Dynamics for Training-Free RLVR Data Selection

Jianghao Wu, Jianfei Cai, Weiqiang Wang, Jin Ye, Daniel F. Schmidt, Yasmeen George

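Affiliations: Faculty of Information Technology, Monash University, Melbourne, Australia

AI summary: Proposes SHIFT, which uses the hidden-state change from a single reasoning rollout (RIRS) as a proxy for instance utility, combined with quality-weighted CoreSet coverage, to perform training-free, label-free RLVR data selection, outperforming baselines on mathematical reasoning and medical QA.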

Comments 14 pages, 2 figures. Accepted by ICML 2026

Abstract

Reinforcement learning with verifiable rewards (RLVR) can yield large reasoning gains from very few training instances, yet its strong sensitivity to which instances are used makes data selection a central bottleneck. Most existing selection pipelines rely on training-time optimization signals and/or require access to verifiable rewards or ground-truth answers over large candidate pools, which is costly and often infeasible in specialized domains. We study RLVR data selection in a setting where selection must be performed before any RL training and without labels or reward evaluation on the full pool. We propose SHIFT, a one-shot, training-free selector based solely on inference-time hidden-state dynamics. For each candidate instance, SHIFT runs a single deterministic reasoning rollout and computes a reasoning-induced representation shift (RIRS) as the start-to-end hidden-state delta. SHIFT uses the RIRS magnitude as a lightweight proxy for instance utility and enforces coverage via a quality-weighted farthest-first CoreSet procedure in an RIRS-augmented feature space, producing compact subsets that scale to large unlabeled pools. Across mathematical reasoning and medical QA benchmarks under ultra-low budgets, SHIFT consistently outperforms training-free diversity and difficulty/uncertainty baselines, improving both in-domain accuracy and transfer to harder evaluation settings. Ablations show that RIRS-based coverage and quality-weighting contribute complementary gains, and analyses indicate that RIRS is not explained by simple input/output length statistics. Code is available at github.com/JianghaoWu/SHIFT.
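The selection step described above can be sketched as a greedy farthest-first traversal in which each candidate's distance to the already selected set is scaled by a quality score. This is an illustrative reading only: the abstract does not say how SHIFT combines quality and distance, so the product used here, the choice of the first point, and the function names are assumptions, and the hidden states would in practice come from a model rollout rather than toy vectors.

```python
import math


def rirs_magnitude(start_state, end_state):
    """Norm of the start-to-end hidden-state delta (assumed to be Euclidean)."""
    return math.dist(start_state, end_state)


def quality_weighted_farthest_first(features, quality, budget):
    """Greedily pick `budget` indices, trading off quality against coverage.

    features[i] is the feature vector of candidate i and quality[i] its
    utility proxy (e.g. an RIRS magnitude). Each step picks the candidate
    maximizing quality * distance-to-nearest-selected (an assumed rule).
    """
    n = len(features)
    if n == 0 or budget <= 0:
        return []
    first = max(range(n), key=lambda i: quality[i])
    selected = [first]
    nearest = [math.dist(features[i], features[first]) for i in range(n)]
    while len(selected) < min(budget, n):
        candidates = [i for i in range(n) if i not in selected]
        best = max(candidates, key=lambda i: quality[i] * nearest[i])
        selected.append(best)
        for i in range(n):
            nearest[i] = min(nearest[i], math.dist(features[i], features[best]))
    return selected
```

With two tight clusters of candidates, the second pick jumps to the other cluster even if a slightly higher-quality point sits next to the first pick, which is the coverage behavior a CoreSet-style rule is meant to enforce.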

2605.28626 2026-05-28 cs.LG (updated version)

When Interpretability Is Unequally Distributed: Fairness in Hybrid Interpretable Models

Ziba Jabbar Zare, Ulrich Aïvodji, Julien Ferry, Thibaut Vidal

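Affiliations: Department of Mechanical, Industrial and Aerospace Engineering, Concordia University

AI summary: To address the problem that hybrid interpretable models allocate different groups unevenly between the interpretable component and the black-box component, proposes the Interpretability Coverage Disparity (ICD) measure and mitigates the unfairness through constraints.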

Abstract

Hybrid interpretable models combine a transparent component with a black-box model by assigning some examples to the former and deferring the rest to the latter. While this design enables flexible tradeoffs between accuracy and interpretability, it also raises a distinct procedural fairness concern: some demographic groups may systematically receive interpretable decisions, while others are disproportionately routed to a black box. We formalize this issue as Interpretability Coverage Disparity (ICD), a demographic-parity-style measure applied to the routing decision of hybrid interpretable models. Using tools from predictive multiplicity, we study ICD across four hybrid interpretable learning methods, three standard fairness benchmark datasets, and multiple sensitive attributes. Our experiments reveal substantial ICD in intermediate transparency regimes, where both the interpretable and black-box components are actively used. We further show that simple coverage-disparity constraints can significantly reduce ICD in exact hybrid learning methods, with marginal impact on accuracy and sparsity. In several settings, ICD mitigation also improves standard algorithmic fairness metrics. These results show that hybrid interpretable models should be audited not only for predictive fairness, but also for how they allocate interpretability across individuals and groups.
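Since ICD is described as a demographic-parity-style measure on the routing decision, a minimal sketch is to compute, for each group, the fraction of its members routed to the interpretable component and report the largest gap between groups. The exact aggregation used in the paper is not stated in the abstract; the max-minus-min gap and the function names below are assumptions.

```python
from collections import defaultdict


def coverage_by_group(routed_to_interpretable, groups):
    """Per-group fraction of examples handled by the interpretable component.

    routed_to_interpretable[i] is truthy if example i received an
    interpretable decision; groups[i] is its sensitive-attribute value.
    """
    totals = defaultdict(int)
    covered = defaultdict(int)
    for routed, group in zip(routed_to_interpretable, groups):
        totals[group] += 1
        if routed:
            covered[group] += 1
    return {group: covered[group] / totals[group] for group in totals}


def coverage_disparity(routed_to_interpretable, groups):
    """Largest gap in interpretable coverage between any two groups."""
    rates = coverage_by_group(routed_to_interpretable, groups)
    return max(rates.values()) - min(rates.values()) if rates else 0.0
```

A value of 0 means every group receives interpretable decisions at the same rate; a constraint of the kind the abstract mentions would bound this gap during training.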

2605.28625 2026-05-28 cs.LG 版本更新

Random Process Flow Matching: Generative Implicit Representations of Multivariate Random Fields

随机过程流匹配：多元随机场的生成隐式表示

Julien Lalanne, David Picard, Lionel Boillot, Lina-María Guayacán-Carrillo, Leon Barens, Jean-Michel Pereira

发表机构 * Navier, CNRS, Univ Gustave Eiffel, ENPC, IP Paris, France； LIGM, CNRS, Univ Gustave Eiffel, ENPC, IP Paris, France

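AI summary: Proposes Random Process (RP) Flow, a flow-matching-based framework that uses Random Fourier Features to learn an implicit signal representation and encodes uncertainty through ensemble sampling, generating high-quality samples with calibrated uncertainty estimates from sparse observations.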

Journal ref: 43rd International Conference on Machine Learning, 2026

AI中文摘要

生成建模为学习数据分布提供了强大框架。这些模型最初依赖于高斯过程等概率方法进行不确定性感知预测，并转向更大的可训练模型以学习更复杂的分布。在这项工作中，我们引入了随机过程流（RP Flow），一种基于流匹配的框架，将向量场表示为神经隐式函数。与现代生成方法不同，我们的设置涉及单个观测场，仅能获得稀疏测量。RP Flow使用随机傅里叶特征学习隐式信号表示，可以从有限的观测集查询任意位置，同时通过集成采样编码不确定性。我们提出通过源空间中的高斯过程回归构建贝叶斯后验以生成高质量样本。实验结果表明，即使在高频、高稀疏或高维等挑战性条件下，该框架也能生成逼真样本并提供校准的不确定性估计。这些发现使RP Flow成为数据稀缺且不确定性需可追踪的重建任务中生成模型的里程碑。

英文摘要

Generative modeling provides a powerful framework for learning data distributions. These models initially relied on probabilistic methods such as Gaussian Processes (GP) for uncertainty-aware predictions and shifted towards larger trainable models to learn more complex distributions. In this work, we introduce Random Process (RP) Flow, a Flow Matching-based framework that represents the vector field as a neural implicit function. Unlike modern generative methods, our setting involves a single observed field, from which only sparse measurements are available. RP Flow uses Random Fourier Features to learn an implicit signal representation that can be queried at any arbitrary location from a limited set of observations, while encoding uncertainty through ensemble sampling. We propose constructing a Bayesian posterior by GP regression in the source space to generate high-quality samples. Our empirical results demonstrate that this framework generates realistic samples along with calibrated uncertainty estimates, even under challenging conditions such as high frequency, high sparsity, or high dimensionality. These findings position RP Flow as a milestone towards generative models for reconstruction tasks where data is scarce and uncertainty must remain traceable.

2605.28613 2026-05-28 math.OC cs.LG stat.ML 版本更新

Implicit Regularization in Perturbed Deep Matrix Factorization: Spectral Conditions and Stability

扰动深度矩阵分解中的隐式正则化：谱条件与稳定性

Jingzhe Wang, Hung-Hsu Chou

发表机构 * Department of Informatics and Networked Systems, University of Pittsburgh（信息学与网络系统系，匹兹堡大学）； Department of Mathematics, University of Pittsburgh（数学系，匹兹堡大学）

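AI summary: Studies the stability of low-rank implicit regularization in perturbed deep matrix factorization, deriving spectral conditions to analyze the low-rank phase in the noiseless case and proving that gradient descent converges and the low-rank phase is preserved under perturbation.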

2605.28612 2026-05-28 cs.LG 版本更新

Learning High-Dimensional Parity Functions with Product Networks using Gradient Descent

使用乘积网络通过梯度下降学习高维奇偶函数

Guillaume Larue, Louis-Adrien Dufrène, Quentin Lampin, Hadi Ghauch, Ghaya Rekaya

发表机构 * Orange Research（Orange研究院）

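AI summary: Combines compact product-based neural architectures with stochastic data sparsity (Bernoulli inputs, $p_e \leq 1/N$) and appropriate hyperparameter choices to learn high-dimensional parity functions efficiently, with theoretical convergence guarantees.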

AI中文摘要

奇偶函数是基本的布尔运算，在机器学习、密码学和纠错码中具有关键应用。然而，学习高维奇偶函数面临重大挑战：在一般情况下，标准神经网络架构通常需要指数级的样本复杂度，使得基于梯度的优化对于大输入数量$N$变得不可行。我们证明，紧凑的乘积神经网络架构结合随机数据稀疏性（伯努利输入，$p_e \leq 1/N$）和适当的超参数选择，能够实现高效的奇偶学习，并具有收敛的理论保证。实验验证了我们的理论在高达$N = 100{,}000$维度上的有效性，经验证据显示了$p_e$和学习率$\alpha$的最优超参数选择，以及多项式复杂度的标度律。这项工作建立了架构归纳偏差与数据稀疏性之间的基本联系，为神经算术、结构化推理、二值神经网络以及机器学习在自动协议发现中的应用开辟了新的可能性。

英文摘要

Parity functions are fundamental Boolean operations with critical applications across machine learning, cryptography, and error correction. Yet, learning high-dimensional parity functions poses significant challenges: in a general setting, standard neural network architectures typically require exponential sample complexity, making gradient-based optimization intractable for a large number of inputs $N$. We demonstrate that compact product-based neural architectures combined with stochastic data sparsity (Bernoulli inputs with $p_e \leq 1/N$) and appropriate hyperparameter choice enable efficient parity learning, with theoretical guarantees of convergence. Experiments validate our theory across dimensions up to $N = 100{,}000$, with empirical evidence showing optimal hyperparameter choices for $p_e$ and learning rate $\alpha$, as well as polynomial complexity scaling laws. This work establishes fundamental connections between architectural inductive bias and data sparsity, opening new possibilities for neural arithmetic, structured reasoning, binary neural networks, and machine learning applied to automated protocol discovery.
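One reason product-based units are a natural fit for parity is a standard identity: with bits encoded as signs (0 as +1, 1 as -1), the parity of the bits is determined by a single product. The sketch below only checks that identity; it is not the architecture from the paper, and the function names are hypothetical.

```python
from itertools import product
from math import prod


def parity(bits):
    """XOR of a sequence of 0/1 bits."""
    return sum(bits) % 2


def parity_via_product(bits):
    """Same function, written as a single product over +/-1 encoded inputs."""
    signs = [1 - 2 * b for b in bits]  # 0 -> +1, 1 -> -1
    return (1 - prod(signs)) // 2


# Exhaustive check for N = 6 inputs.
assert all(parity(b) == parity_via_product(b) for b in product((0, 1), repeat=6))
```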

2605.28603 2026-05-28 cs.LG cs.AI 版本更新

Online Irregular Multivariate Time Series Forecasting via Uncertainty-Driven Dual-Expert Calibration

在线不规则多变量时间序列预测：基于不确定性驱动的双专家校准

Haonan Wen, Hanyang Chen, Songhe Feng

发表机构 * Key Laboratory of Big Data & Artificial Intelligence in Transportation (Beijing Jiaotong University), Ministry of Education； School of Computer Science & Technology, Beijing Jiaotong University, Beijing, China； Tangshan Research Institute, Beijing Jiaotong University, Tangshan, China

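AI summary: To counter the performance degradation caused by dynamic distribution shifts in online irregular multivariate time series forecasting, proposes Under-Cali, an uncertainty-driven dual-expert calibration framework that achieves stable and efficient online learning through an uncertainty estimator, a dual-expert calibration module, and an adaptive routing module.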

Comments Accepted by KDD 2026

AI中文摘要

不规则多变量时间序列预测在许多实际应用中至关重要，其中时间序列是不规则采样的，并表现出动态演变的缺失模式。尽管现有方法在离线设置中表现良好，但在在线部署时，由于数据分布的动态变化，它们常常遭受显著的性能下降。在这种动态场景中保持预测能力通常需要在线自适应技术。由于不规则采样从根本上破坏了时间连续性和周期性，我们无法利用来自规则MTS的这些广泛研究的特性进行在线学习。为此，我们研究了在线IMTS预测问题，并提出了Under-Cali，一个不确定性驱动的双专家校准框架，包含三个核心组件：不确定性估计器、双专家校准模块和自适应路由模块。我们设计了一个不确定性估计器，作为核心控制信号来联合管理推理和自适应过程。在我们的框架中，不确定性估计器首先评估每个传入批次的不确定性。然后，自适应路由模块将高不确定性的样本引导至不可靠专家进行校准，而低不确定性样本则保留给可靠专家。随后，系统使用校准良好的可靠样本更新可靠专家和不确定性估计器，并使用具有挑战性的样本更新不可靠专家，从而实现稳定高效的在线学习。Under-Cali保持源预测模型冻结，仅通过轻量级、模型无关的校准模块进行自适应，从而实现高效自适应。在IMTS基准上的大量实验表明，在低计算成本下取得了持续的改进。我们的代码可在https://github.com/HaonanWen/Under-Cali获取。

英文摘要

Irregular multivariate time series forecasting is critical in many real-world applications, where time series are irregularly sampled and exhibit dynamically evolving missingness patterns. Although existing methods perform well in offline settings, they often suffer from significant performance degradation when deployed online due to dynamic shifts in data distribution. Maintaining forecasting capability in such dynamic scenarios typically necessitates online adaptation techniques. Since irregular sampling fundamentally undermines temporal continuity and periodicity, we cannot leverage these widely studied characteristics from regular MTS for online learning. To this end, we study the problem of online IMTS forecasting and propose Under-Cali, an uncertainty-driven dual-expert calibration framework consisting of three core components: an uncertainty estimator, a dual-expert calibration module, and an adaptive routing module. We design an uncertainty estimator that serves as the core control signal to jointly manage inference and adaptation processes. In our framework, the uncertainty estimator first assesses uncertainty for each incoming batch. The adaptive routing module then directs samples with high uncertainty to the unreliable expert for calibration, while low uncertainty samples remain with the reliable expert. Subsequently, the system updates the reliable expert and the uncertainty estimator using well-calibrated reliable samples, and updates the unreliable expert with challenging samples, enabling stable and efficient online learning. Under-Cali keeps the source forecasting model frozen and performs adaptation only through a lightweight, model-agnostic calibration module, enabling efficient adaptation. Extensive experiments on IMTS benchmarks demonstrate consistent improvements with low computational cost. Our code is available at https://github.com/HaonanWen/Under-Cali.

2605.28600 2026-05-28 cs.LG 版本更新

PLS在自注意力镜中的映射

Jiangsheng You

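AI summary: Views partial least squares (PLS) as a linearized self-attention mechanism, so that PLS can be studied within a neural network framework; conversely, the dimension reduction and predictor selection of PLS suggest that self-attention incorporates a degree of dimension normalization that improves learning.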

2605.28589 2026-05-28 cs.LG 版本更新

Thinned Mean Field Langevin Dynamics

稀疏化平均场朗之万动力学

Zonghao Chen, Heishiro Kanagawa, François-Xavier Briol, Chris J. Oates, Lester Mackey

发表机构 * University College London（伦敦大学学院）； Newcastle University（纽卡斯尔大学）； The Alan Turing Institute（艾伦·图灵研究所）； Microsoft Research New England（微软研究院新英格兰分部）

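AI summary: Proposes KT-MFLD, which uses kernel thinning to reduce the particle-interaction complexity from $O(N^2)$ to $O(N^{3/2})$ while retaining the same convergence guarantees as MFLD (up to logarithmic factors).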

Journal ref: International Conference on Machine Learning, 2026

AI中文摘要

几个重要的学习任务可以表述为在适当的概率分布空间上最小化熵正则化目标。平均场朗之万动力学（MFLD）促进了这一一般上下文中的计算，将最小化器视为McKean-Vlasov过程的不变分布，该过程可以使用$N$个粒子进行数值离散化并模拟。然而，模拟这个相互作用粒子系统的计算复杂度为$O(N^2)$。受最近关于\emph{核稀疏化}研究的启发，我们提出了\texttt{KT-MFLD}，其中每个粒子仅与大小为$\mathcal{O}(N^{\frac{1}{2}})$的稀疏粒子核心集相互作用。因此，\texttt{KT-MFLD}将计算复杂度降低到$O(N^{\frac{3}{2}})$，同时在温和的正则条件下，实现与MFLD相同的收敛保证（最多对数因子）。我们的理论分析在包括学生-教师神经网络训练、最大均值差异量化以及后贝叶斯框架中面向预测的后验计算等任务上得到了实证确认。

英文摘要

Several important learning tasks can be formulated as minimizing an entropy-regularized objective over an appropriate space of probability distributions. Mean-field Langevin dynamics (MFLD) facilitate computation in this general context, casting the minimizer as the invariant distribution of a McKean--Vlasov process, which can be numerically discretized using $N$ particles and thus simulated. However, simulating this interacting particle system has computational complexity of order $N^2$. Motivated by recent research into \emph{kernel thinning}, we propose \texttt{KT-MFLD}, in which each particle interacts only with a thinned particle coreset of size $\mathcal{O}(N^{\frac{1}{2}})$. \texttt{KT-MFLD} thus reduces the computational complexity to order $N^{\frac{3}{2}}$ while, under mild regularity conditions, achieving the same convergence guarantees (up to logarithmic factors) as MFLD. Our theoretical analysis is empirically confirmed on tasks including the training of student-teacher neural networks, quantization with maximum mean discrepancy, and computation of predictively-oriented posteriors in a post-Bayesian framework.

2605.28585 2026-05-28 cs.LG 版本更新

Outer-Momentum Restarting in High-Dimensional Two-Phase Optimization

高维两阶段优化中的外动量重启

Kristi Topollai, Allan Ma, Tolga Dimlioglu, Sui Jiet Tay, Anna Choromanska

发表机构 * New York University（纽约大学）

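AI summary: Studies periodic restarting of the outer momentum in distributed optimization as a way to control outer memory; theoretical analysis, toy experiments, and language-model pretraining show that it widens the stable range of outer learning rates and momentum values.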

AI中文摘要

通信高效的分布式优化器（如DiLoCo）通过让工作节点在聚合进度之前执行多次本地更新来减少同步成本，并使用外动量优化器进行聚合。近期理论表明，外优化器作用于由内优化循环诱导的有效谱，而外动量的选择控制着本地更新的进度如何在通信轮次间累积。我们研究外动量的周期性重启，作为控制这种外存的一种简单互补机制。在线性化平方损失模型中，预测空间残差在经验NTK下演化，我们推导出模态重启收缩，表明重置通过丢弃陈旧动量同时保留内循环进度来利用相位抵消。玩具实验验证了预测的收缩行为，语言模型预训练实验表明，周期性重启扩大了外学习率和动量值在通信周期内的稳定范围。

英文摘要

Communication-efficient distributed optimizers such as DiLoCo reduce synchronization costs by letting workers perform many local updates before aggregating their progress with an outer momentum optimizer. Recent theory suggests that the outer optimizer acts on an effective spectrum induced by the inner optimization loop, and that the choice of outer momentum controls how progress from local updates is accumulated across communication rounds. We study periodic restarting of the outer momentum as a simple complementary mechanism for controlling this outer memory. In a linearized squared-loss model where prediction-space residuals evolve under the empirical NTK, we derive a mode-wise restart contraction showing that resets exploit phase cancellation by discarding stale momentum while preserving inner-loop progress. Toy experiments verify the predicted contraction behavior, and language-model pretraining experiments show that periodic restarts widen the stable range of outer learning rates and momentum values across communication periods.
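The restart mechanism described in this abstract can be illustrated with a toy two-level loop. This is a minimal sketch under strong assumptions: a single worker, a 1-D quadratic loss, plain gradient descent in the inner loop, and heavy-ball outer momentum. Real DiLoCo-style setups average several workers and typically use other inner and outer optimizers; every name and default value here is hypothetical.

```python
def run_outer_loop(theta, rounds, inner_steps=5, inner_lr=0.1,
                   outer_lr=1.0, momentum=0.5, restart_every=None):
    """Toy 1-D two-level optimizer on f(theta) = 0.5 * theta**2."""
    buffer = 0.0  # outer momentum buffer
    for r in range(rounds):
        # Periodic restart: discard the accumulated outer momentum.
        if restart_every and r > 0 and r % restart_every == 0:
            buffer = 0.0
        # Inner loop: plain gradient descent starting from the shared parameters.
        local = theta
        for _ in range(inner_steps):
            local -= inner_lr * local  # gradient of 0.5 * x**2 is x
        # Outer "gradient" is the displacement produced by the inner loop.
        outer_grad = theta - local
        buffer = momentum * buffer + outer_grad
        theta -= outer_lr * buffer
    return theta


print(run_outer_loop(1.0, rounds=20))
print(run_outer_loop(1.0, rounds=20, restart_every=4))
```

The restart only clears the momentum buffer; the parameters, and therefore the progress made by the inner loops, are kept.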

2605.28583 2026-05-28 cs.RO cs.AI cs.LG cs.SY eess.SY 版本更新

SARAD: LLM-Based Safety-Aware Hybrid Reinforcement Learning with Collision Prediction for Autonomous Driving

SARAD：基于LLM的安全感知混合强化学习与碰撞预测在自动驾驶中的应用

Kangyu Wu, Peng Cui, Guoxi Chen, Ya Zhang

发表机构 * National Natural Science Foundation (NNSF) of China（中国国家自然科学基金委员会）； National Science and Major Project（国家科学技术重大专项）

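AI summary: Proposes SARAD, a framework that combines large language models with deep reinforcement learning and uses retrieval-augmented generation and a collision predictor module to improve the safety and efficiency of autonomous driving.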

Comments 7 pages, 4 figures, accepted by IJCNN 2026

AI中文摘要

确保自动驾驶系统决策的安全性和效率仍然是一个基本挑战。传统的深度强化学习（DRL）存在不安全的随机探索和收敛缓慢的问题，而大语言模型（LLM）在实时推理操作中表现出固有的延迟。为了解决这些限制，本文提出了SARAD，一种新颖的安全感知混合框架，协同LLM和DRL用于自动驾驶。SARAD用来自动态专家知识库的、经检索增强生成（RAG）增强的LLM引导决策替代了DRL的随机探索。提出了一个注意力判别器，将LLM的先验知识整合到DRL策略优化中。进一步设计了一个碰撞预测模块，使用历史碰撞数据进行微调，以提高车辆安全性。大量实验表明，SARAD在Highway-Env模拟器中实现了显著的性能提升，验证了所提模型在自动驾驶中的有效性。

英文摘要

Ensuring both safety and efficiency in decision-making for autonomous driving systems remains a fundamental challenge. Traditional Deep Reinforcement Learning (DRL) suffers from unsafe random exploration and slow convergence, while Large Language Models (LLMs) demonstrate inherent latency in real-time inference operations. To address these limitations, this paper proposes SARAD, a novel safety-aware hybrid framework that synergizes LLMs and DRL for autonomous driving. SARAD substitutes the random exploration of DRL with Retrieval-Augmented Generation (RAG)-enhanced, LLM-guided decisions sourced from a dynamic expert knowledge repository. An attention discriminator is proposed to integrate the prior knowledge of LLMs into DRL policy optimization. A collision predictor module, fine-tuned with historical collision data, is further designed to improve vehicle safety. Extensive experiments show that SARAD achieves significant performance improvements in the Highway-Env simulator, validating the effectiveness of the proposed model in autonomous driving.

2605.28578 2026-05-28 cs.LG 版本更新

A Generalized Tikhonov Layer for Interpretable-by-design Graph Neural Networks

一种用于可解释设计的图神经网络的广义Tikhonov层

Nicolas Tremblay, Benjamin Ricaud, Filippo Maria Bianchi

发表机构 * UiT the Arctic University of Norway（挪威北极大学）； CNRS（法国国家科学研究中心）； NORCE Norwegian Research Centre（NORCE挪威研究中心）

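AI summary: Proposes the Tikhonov layer, which makes graph neural networks interpretable through learnable node-importance scores and a learnable polynomial; its output is the exact solution of a generalized graph Tikhonov problem.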

AI中文摘要

我们提出了Tikhonov层，一种设计上可解释的图神经网络层：一旦训练完成，其学习到的参数直接揭示了哪些节点特征和图拓扑的哪些方面被用于预测。实际上，该层的传播矩阵采用闭式$R = (p(L)+Q)^{-1} Q$，其中$L$是归一化图拉普拉斯矩阵，$Q = diag(q_1,...,q_n)$是一个可学习的正节点重要性分数对角矩阵，$p(\cdot)$是一个可学习多项式。对于任意输入特征$x$，层输出$Rx$是广义图Tikhonov问题的精确最小化器，该问题在节点级数据保真度和拓扑驱动正则化惩罚之间进行权衡。学习到的对$\{\{q_i\},p\}$构成了内置的解释：大的$q_i$表明节点$i$自身的特征驱动预测，而小的$q_i$则表明依赖于局部图拓扑；$p$的形状揭示了是同质性、异质性还是带通响应被利用。通过将复杂性路由到一个专用的、任意深的Q网络来产生重要性分数，从而保持了表达能力，而Tikhonov层本身保持透明。我们证明了不同的节点重要性矩阵产生不同的传播算子，在结构上将解释与计算耦合。此外，Tikhonov层在单层中提供了全局感受野，缓解了过平滑和过挤压问题。在标准图分类基准上的实验证实，该模型匹配（有时甚至超越）不透明的基线，同时产生可解释且忠实的解释。

英文摘要

We propose the Tikhonov layer, a graph neural network layer that is interpretable by design: once trained, its learned parameters directly reveal which node features and which aspects of the graph topology were leveraged for prediction. In practice, the layer's propagation matrix takes the closed form $R = (p(L)+Q)^{-1} Q$, where $L$ is the normalized graph Laplacian, $Q = \mathrm{diag}(q_1,\dots,q_n)$ is a learnable diagonal matrix of positive node-importance scores, and $p(\cdot)$ is a learnable polynomial. For any input feature $x$, the layer output $Rx$ is the exact minimizer of a generalized graph Tikhonov problem that trades off node-level data fidelity against a topology-driven regularization penalty. The learned pair $\{\{q_i\},p\}$ constitutes a built-in explanation: large $q_i$ indicates that node $i$'s own features drive the prediction, while small $q_i$ signals reliance on the local graph topology; the shape of $p$ reveals whether homophily, heterophily, or a band-pass response is exploited. Expressivity is preserved by routing complexity through a dedicated, arbitrarily deep Q-network that produces the importance scores, while the Tikhonov layer itself remains transparent. We prove that distinct node-importance matrices yield distinct propagation operators, structurally coupling the explanation to the computation. Additionally, the Tikhonov layer provides, in a single layer, a global receptive field, mitigating both oversmoothing and oversquashing. Experiments on standard graph classification benchmarks confirm that the model matches (and sometimes outperforms) opaque baselines while producing interpretable and faithful explanations.
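A short derivation shows how a propagation matrix of this form can arise as a minimizer. The objective $J$ below is an assumption: the abstract does not state it, and it is simply the most direct quadratic objective consistent with $R = (p(L)+Q)^{-1}Q$. The sketch also assumes $p(L)$ is symmetric positive semidefinite, so that $p(L)+Q$ is positive definite and $J$ is strictly convex.

```latex
\begin{align*}
J(y) &= (y-x)^\top Q\,(y-x) + y^\top p(L)\,y, \\
\nabla J(y) &= 2\,Q\,(y-x) + 2\,p(L)\,y = 0 \\
\Longrightarrow\quad \bigl(p(L)+Q\bigr)\,y &= Q\,x \\
\Longrightarrow\quad y^\star &= \bigl(p(L)+Q\bigr)^{-1} Q\,x = R\,x .
\end{align*}
```

Under this reading, the first term is the node-level data fidelity weighted by the scores $q_i$ and the second is the topology-driven penalty. When the scores $q_i$ are large relative to $p(L)$, $R$ approaches the identity and each node keeps its own features, which matches the interpretation of large $q_i$ given in the abstract.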

2605.28577 2026-05-28 cs.AI cs.LG 版本更新

Continual Model Routing in Evolving Model Hubs

演化模型库中的持续模型路由

Jack Bell, Giacomo Carfì, Gerlando Gramaglia, Vincenzo Lomonaco

发表机构 * Department of Computer Science, University of Pisa, Pisa, Italy（意大利比萨大学计算机科学系）； LUISS University, Rome, Italy（意大利罗马LUISS大学）

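AI summary: To address the model-selection and routing-update challenges raised by rapidly expanding model hubs, formalizes the Continual Model Routing (CMR) problem, builds the large-scale benchmark CMRBench, and designs CARvE, a contrastive embedding method that routes efficiently via checkpoint-based anchoring and structured replay and significantly outperforms several baselines.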

Comments 42 pages, 24 tables, 6 figures, to be published at ICML 2026

AI中文摘要

AI模型库提供了对快速增长的大量预训练模型的访问，使得具有不同路由策略的现成混合专家系统成为可能。然而，这种快速增长带来了两个基本挑战：跨数千个专家进行模型选择的扩展，以及随着新模型和任务的引入持续更新路由机制。在本文中，我们将这一设置形式化为持续模型路由（CMR），并提出了CMRBench，这是一个新的大规模基准，模拟现实的模型库扩展，包括超过2000个候选模型。最后，我们介绍了CARvE，一种对比嵌入方法，通过基于检查点的锚定和结构化重放实现高效的持续模型路由。大量的实验结果和消融研究表明，CARvE在模型、家族和领域级别的准确性上显著优于零样本检索、微调和适配器合并基线。

英文摘要

AI model hubs provide access to a rapidly growing collection of powerful pre-trained models, enabling off-the-shelf mixture-of-experts systems with different routing strategies. However, this rapid growth poses two fundamental challenges: scaling model selection across thousands of experts and continually updating routing mechanisms as new models and tasks are introduced. In this paper, we formalise this setting as Continual Model Routing (CMR) and propose CMRBench, a new large-scale benchmark simulating realistic hub expansion and including over 2,000 candidate models. Finally, we introduce CARvE, a contrastive embedding approach for efficient continual model routing via checkpoint-based anchoring and structured replay. Extensive empirical results and ablations show that CARvE significantly outperforms zero-shot retrieval, fine-tuning, and adapter-merging baselines in model, family, and domain-level accuracy.

2605.28573 2026-05-28 cs.LG cs.AI 版本更新

Efficient Pre-Training of LLMs through Truncated SVD Layers

通过截断SVD层实现LLM的高效预训练

Kaivan Kamali, Kajetan Schweighofer, Hormoz Shahrzad, Olivier Francon, Babak Hodjat, Risto Miikkulainen

发表机构 * Cognizant AI Lab（Cognizant AI实验室）； UT Austin（得克萨斯大学奥斯汀分校）

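AI summary: Proposes TSVD, a framework that maintains low rank and strict orthonormality using a spectral-energy heuristic for adaptive rank selection and a caching mechanism, matching or exceeding full-parameter baselines while reducing compute.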

AI中文摘要

大规模语言模型（LLM）的规模扩展使得预训练成本日益高昂。虽然低秩表示和正交权重矩阵原则上可以减少参数数量和计算开销，但现有方法大多依赖静态秩选择，且由于高计算成本而不强制权重正交性。本文引入TSVD框架，在整个训练过程中保持低秩和严格正交性。它利用基于谱能量的启发式方法进行自适应秩选择，并采用缓存机制来维持正交性。理论分析证明了该方法在预训练动态中的优势，跨多种模型规模的实验表明其在经验上有效。TSVD在显著降低计算需求的同时，匹配或超越了全参数基线的性能。因此，该方法为高效高性能LLM预训练提供了一条有充分依据、实用且可扩展的路径。

英文摘要

The massive scaling of Large Language Models (LLMs) has made pretraining increasingly cost-prohibitive. While low-rank representation and orthonormal weight matrices could in principle reduce parameter counts and computational overhead, most existing methods rely on static rank selection and do not enforce weight orthonormality due to high computational cost. This paper introduces TSVD, a framework that maintains low rank and strict orthonormality throughout the training process. It utilizes a spectral energy-based heuristic for adaptive rank selection, and a caching mechanism to maintain orthonormality. Theoretical analysis justifies the advantage of the approach in pretraining dynamics and experiments across various model scales demonstrate that it is effective empirically. TSVD matches or exceeds the performance of full-parameter baselines while significantly reducing compute requirements. The approach thus offers a well-founded, practical, and scalable path toward efficient high-performance LLM pretraining.
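A spectral-energy rule for choosing a rank can be sketched as follows. This is a common energy-threshold heuristic and only an assumption about what TSVD's heuristic might look like: the abstract does not give the rule, and the function name and the 0.95 default are hypothetical.

```python
def rank_by_spectral_energy(singular_values, energy=0.95):
    """Smallest rank whose leading singular values keep `energy` of the squared spectrum."""
    squared = sorted((s * s for s in singular_values), reverse=True)
    total = sum(squared)
    if total == 0:
        return 0
    kept = 0.0
    for rank, value in enumerate(squared, start=1):
        kept += value
        if kept >= energy * total:
            return rank
    return len(squared)


# Hypothetical spectrum of a weight matrix.
print(rank_by_spectral_energy([3.0, 2.0, 1.0, 0.5], energy=0.9))
```

Because the threshold is relative to the total energy, the selected rank adapts to each layer's spectrum instead of being fixed in advance.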

2605.28567 2026-05-28 cs.LG cs.AI 版本更新

Semantic Optimal Transport for Sparse Autoencoder Feature Matching and Circuit Compression

稀疏自编码器特征匹配与电路压缩的语义最优传输

Tue M. Cao, Nguyen Do, My T. Thai

发表机构 * University of Florida（佛罗里达大学）

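AI summary: Proposes an optimal-transport-based distributional framework that uses activation-weighted distributions and the Wasserstein distance to address cross-layer feature matching and circuit compression in a unified way.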

Comments preprint

AI中文摘要

稀疏自编码器（SAE）已成为解释语言模型的核心工具。然而，两个关键的SAE分析仍然难以规模化：（1）跨层匹配语义相似的特征，（2）将大型特征电路压缩为可解释的超节点。尽管这些问题被视为独立问题，但我们表明它们都是更基础挑战的实例，我们将其框架化为估计位于不同激活流形上的SAE特征之间的语义距离。我们为此问题引入了一个分布框架，其中每个特征不是像文献中那样由单个解码器向量表示，而是由表达它的隐藏状态上的激活加权分布表示。通过将这些分布投影到共享参考空间并使用Wasserstein距离进行比较，我们的方法为跨层特征比较提供了统一的语义度量。我们证明了我们的表示对激活缩放具有不变性，在扰动下稳定，并在有限样本边际条件下恢复真实匹配。实验上，我们的方法优于解码器向量和基于LLM的基线，并捕捉相关特征之间的细微功能差异。值得注意的是，我们的方法自动将大型特征电路压缩为可解释的超节点。

英文摘要

Sparse autoencoders (SAEs) have become a central tool for interpreting language models. However, two key SAE analyses that remain difficult to scale are (1) matching semantically similar features across multiple layers and (2) compressing large feature circuits into interpretable supernodes. Although these have been treated as separate problems, we show that both are instances of a more fundamental challenge, which we frame as the estimation of semantic distances between SAE features that lie on different activation manifolds. We introduce a distributional framework for this problem, in which each feature is represented not by a single decoder vector, as in the literature, but by an activation-weighted distribution over the hidden states that express it. By projecting these distributions into a shared reference space and comparing them with Wasserstein distance, our method provides a unified semantic metric for cross-layer feature comparison. We prove that our representation is invariant to activation rescaling, stable under perturbations, and recovers true matches under finite-sample margin conditions. Empirically, our method outperforms decoder-vector and LLM-based baselines and captures subtle functional distinctions between related features. Notably, our method compresses large feature circuits into interpretable supernodes automatically.

2605.28566 2026-05-28 cs.AI cs.LG 版本更新

Tree of Thoughts as a Classical Heuristic Search Problem: Formal Foundations and Design Patterns

思维树作为经典启发式搜索问题：形式化基础与设计模式

Guni Sharon

发表机构 * Guni Sharon

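AI summary: Builds a unified taxonomy in classical heuristic search terminology, maps LLM-based reasoning onto search components, and identifies two design patterns: systematic search and lookahead-heavy strategies.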

Comments Extended version of the SoCS 2026 paper. Includes appendices omitted from the proceedings version

Journal ref: Proceedings of the Nineteenth International Symposium on Combinatorial Search (SoCS 2026), AAAI Press, 2026

AI中文摘要

大型语言模型（LLM）展示了卓越的推理能力，但其标准生成过程——自回归令牌预测——本质上是短视的，容易产生级联错误。为了解决这个问题，思维树（ToT）框架在中间推理步骤上创建了一个搜索空间，允许搜索模型进行探索、前瞻和回溯。然而，当前的ToT研究在自然语言处理和自动规划社区之间仍然分散，常常使用不一致的术语和临时实现。因此，我们通过基于经典启发式搜索术语的统一分类法综合了ToT领域。我们将基于LLM的推理映射到经典搜索组件：状态表示（思维粒度）、后继生成（提示操作符）和启发式评估（进展自我评估）。我们在分类法的背景下分析现有工作，并识别出新兴的设计模式：针对浅层确定性任务的系统搜索（最佳优先搜索）和针对深层多步推理的前瞻性策略（DFS、MCTS）。最后，我们指出了启发式搜索与LLM推理交叉领域中的开放算法挑战，并呼吁启发式搜索社区参与这一新兴领域。

英文摘要

Large Language Models (LLMs) have demonstrated remarkable reasoning capabilities, yet their standard generation process -- auto-regressive token prediction -- is inherently myopic and prone to cascading errors. To address this, the Tree-of-Thoughts (ToT) framework creates a search space over intermediate reasoning steps, allowing search models to explore, look ahead, and backtrack. However, current ToT research remains fragmented across Natural Language Processing and Automated Planning communities, often using inconsistent terminology and ad-hoc implementations. Consequently, we synthesize the ToT landscape through a unified taxonomy based on classical heuristic search terminology. We map LLM-based reasoning to classical search components: state representation (granularity of thoughts), successor generation (prompting operators), and heuristic evaluation (self-assessment of progress). We analyze existing work within the context of our taxonomy and identify emerging design patterns: systematic search (Best-First Search) for shallow, deterministic tasks and lookahead-heavy strategies (DFS, MCTS) for deep multi-step reasoning. We conclude by identifying open algorithmic challenges at the intersection of heuristic search and LLM reasoning, and call on the heuristic search community to engage with this emerging domain.
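The mapping onto classical search components can be made concrete with a generic greedy best-first search. This is a minimal sketch, not an implementation from the paper: in a ToT system, `successors` would be a prompting operator that proposes next thoughts and `heuristic` would be an LLM self-assessment, whereas here both are arbitrary callables and all names are hypothetical.

```python
import heapq


def best_first_search(start, successors, heuristic, is_goal, max_expansions=1000):
    """Greedy best-first search; returns the path to a goal state or None."""
    frontier = [(heuristic(start), 0, start, [start])]
    seen = {start}
    counter = 1  # tie-breaker so heapq never compares states or paths
    expansions = 0
    while frontier and expansions < max_expansions:
        _, _, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        expansions += 1
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (heuristic(nxt), counter, nxt, path + [nxt]))
                counter += 1
    return None


# Toy stand-in for a reasoning task: reach 10 from 1 using "+1" or "*2" steps.
path = best_first_search(
    start=1,
    successors=lambda n: [n + 1, n * 2],
    heuristic=lambda n: abs(10 - n),
    is_goal=lambda n: n == 10,
)
print(path)
```

The `max_expansions` cap stands in for the call budget that matters when every expansion and evaluation costs an LLM query.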

2605.28563 2026-05-28 cs.LG cs.AI 版本更新

A Multi-dimensional Framework for Evaluating Generalization in EEG Foundation Models

评估脑电图基础模型泛化能力的多维框架

Aditya Kommineni, Emily Zhou, Kleanthis Avramidis, Tiantian Feng, Shrikanth Narayanan

发表机构 * Signal Analysis and Interpretation Laboratory（信号分析与解释实验室）

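AI summary: Proposes a multi-dimensional framework for systematically evaluating the generalization of EEG foundation models (such as LaBraM, CSBrain, and CBraMod) under low-resource conditions; the models excel on long-context tasks, are only comparable to supervised models on short-window BCI tasks, and show limited robustness to channel constraints.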

Comments 24 pages, 5 Figures

AI中文摘要

在适当的适应设置下评估基础模型对于理解所学表示的质量和可迁移性至关重要。最近的脑电图基础模型在跨任务和数据集上展示了有前景的迁移能力，推动了它们在神经技术和临床应用中日益增长的使用。然而，这些模型通常是在精心整理的下游数据集上进行全微调评估，这种设置并未反映生物医学领域的约束，如有限的标记数据、减少的传感器覆盖或参数高效的适应。在这项工作中，我们提出了一个多维评估框架，用于在现实低资源条件下评估脑电图模型。在提出的多维评估框架下，对包括LaBraM、CSBrain和CBraMod在内的监督脑电图模型和最近的脑电图基础模型在6个不同数据集上进行了实证分析。我们发现，脑电图基础模型在长上下文任务（如睡眠阶段预测和心理健康状态分类）上持续提供性能提升。相比之下，对于短窗口的脑机接口风格任务，监督模型尽管参数少得多，却取得了相当的性能。额外的分析表明，当前的基础模型对短窗口任务和通道受限设置提供的鲁棒性有限。总之，这些发现激励使用多维评估协议，以表征模型在现实使用约束下的行为。

英文摘要

Evaluating foundation models under appropriate adaptation settings is essential for understanding the quality and transferability of the learned representations. Recent EEG foundation models have demonstrated promising transfer capabilities across tasks and datasets, motivating their growing use in neurotechnology and clinical applications. However, these models are typically evaluated under full fine-tuning on well-curated downstream datasets, a setting that does not reflect biomedical domain constraints such as limited labeled data, reduced sensor coverage, or parameter-efficient adaptation. In this work, we propose a multi-dimensional evaluation framework for assessing EEG models under realistic low-resource conditions. We empirically analyze both supervised EEG models and recent EEG foundation models, including LaBraM, CSBrain, and CBraMod, across 6 different datasets under the proposed framework. We find that EEG foundation models consistently provide performance gains on long-context tasks such as sleep stage prediction and mental health state classification. In contrast, for short-window Brain-Computer Interface-style tasks, supervised models achieve comparable performance despite having substantially fewer parameters. Additional analyses demonstrate that current foundation models provide limited robustness to short-window tasks and channel-constrained settings. Together, these findings motivate the use of multi-dimensional evaluation protocols that characterize model behavior under realistic use constraints.

2605.28561 2026-05-28 cs.CL cs.LG 版本更新

Soft-SVeRL: Self-Verified Reinforcement Learning with Soft Rewards

Soft-SVeRL: 基于软奖励的自验证强化学习

Saurabh Dash, Pierre Clavier, John Dang, Matthias Galle, Marzieh Fadaee, Ahmet Üstün, Beyza Ermis

发表机构 * Cohere Labs（Cohere实验室）

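AI summary: For partially verifiable tasks, proposes Soft-RLVR, a soft-reward framework based on checklist decomposition, and its self-verifying variant Soft-SVeRL; denser partial-credit signals improve reinforcement learning, and the work addresses reward inflation in self-verification.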

AI中文摘要

可验证奖励的强化学习（RLVR）在数学和代码等领域改进了语言模型，这些领域中正确性可以自动检查。然而，许多重要任务仅部分可验证：提示包含多个要求，响应可能满足其中一些但非全部，或者可能不存在单一的参考答案。我们引入Soft-RLVR，一个从分解的、学习的验证信号中进行强化学习的框架。Soft-RLVR将每个提示转换为原子要求的检查表，使用LLM验证器逐项评分候选响应，并在生成的软奖励上进行训练。基于检查表的奖励将稀疏的通过/失败监督转化为更密集的部分信用信号，但它们也引入了一个权衡：平均逐项判断可以减少验证器噪声，而部分信用可能奖励不完整的响应。我们形式化了这一权衡，并确定了基于检查表的验证比整体验证提供更可靠RL训练信号的条件。我们进一步引入Soft-SVeRL，这是Soft-RLVR的一个自验证变体，其中策略也充当验证器。我们表明，自验证容易因过于宽松的自我判断而导致奖励膨胀，并且需要显式稳定化以防止这种崩溃。在基于规则的ground-truth评估的受控指令遵循设置中，基于检查表的Soft-RLVR仅使用学习的验证器奖励就将IFEval提升了最多11.1分。我们的实验进一步表明，验证器质量和检查表质量都影响下游RL结果，并且显式稳定化对于有效的自验证至关重要。

英文摘要

Reinforcement Learning from Verifiable Rewards (RLVR) has improved language models in domains such as mathematics and code, where correctness can be checked automatically. However, many important tasks are only partially verifiable: prompts contain multiple requirements, responses may satisfy some but not all of them, or no single reference answer might exist. We introduce Soft-RLVR, a framework for reinforcement learning from decomposed, learned verification signals. Soft-RLVR converts each prompt into a checklist of atomic requirements, scores candidate responses item by item with an LLM verifier, and trains on the resulting soft reward. Checklist-based rewards turn sparse pass/fail supervision into a denser partial-credit signal, but they also introduce a tradeoff: averaging item-level judgments can reduce verifier noise, while partial credit can reward incomplete responses. We formalize this tradeoff and identify conditions under which checklist-based verification gives a more reliable RL training signal than holistic verification. We further introduce Soft-SVeRL, a self-verifying variant of Soft-RLVR in which the policy also acts as the verifier. We show that self-verification is prone to reward inflation from overly permissive self-judgments, and that explicit stabilization is needed to prevent this collapse. In a controlled instruction-following setting with rule-based ground-truth evaluation, checklist-based Soft-RLVR improves IFEval by up to 11.1 points using only learned verifier rewards. Our experiments further show that verifier quality and checklist quality both affect downstream RL outcomes, and that explicit stabilization is essential for effective self-verification.
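The contrast between a checklist-based soft reward and sparse pass/fail supervision can be sketched in a few lines. This is an illustration only: the per-item scores would come from an LLM verifier, the all-items-pass rule is a stand-in for a holistic verdict, and the function names, the 0.5 threshold, and the toy scores are hypothetical.

```python
def soft_reward(item_scores):
    """Partial-credit reward: mean of per-item verifier scores in [0, 1]."""
    return sum(item_scores) / len(item_scores)


def strict_pass_fail_reward(item_scores, threshold=0.5):
    """Sparse reward: 1 only if every checklist item passes."""
    return 1.0 if all(s >= threshold for s in item_scores) else 0.0


# Hypothetical verifier scores for a prompt with four atomic requirements.
scores = [1.0, 1.0, 0.0, 1.0]
print(soft_reward(scores), strict_pass_fail_reward(scores))
```

The example shows both sides of the tradeoff named in the abstract: the soft reward gives a denser signal, but it also pays out for a response that misses one requirement.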

2605.28554 2026-05-28 cs.LG 版本更新

High Performance, Low Reliability: Uncertainty Benchmarking for Tabular Foundation Models

高性能，低可靠性：表格基础模型的不确定性基准测试

José Lucas De Melo Costa, Fabrice Popineau, Arpad Rimmel, Bich-Liên Doan

发表机构 * CentraleSupélec（中央理工-高等电力学院）； ENS Paris-Saclay（巴黎-萨克雷高等师范学校）； Université Paris-Saclay（巴黎-萨克雷大学）

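AI summary: On the TALENT benchmark, finds that tabular foundation models outperform gradient-boosted decision trees in predictive performance but are worse at uncertainty calibration, revealing a performance-uncertainty trade-off.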

Comments 6 pages, 2 figures, 2 tables. Accepted at ESANN 2026 (European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning), 22-24 April 2026, Bruges (Belgium)

DOI: 10.14428/esann/2026.ES2026-261
Journal ref: ESANN 2026 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges (Belgium) and online event, 22-24 April 2026, pp. 115-120, i6doc.com publ., ISBN 9782875870964

AI中文摘要

最近的表格基础模型（TFMs）展示了最先进的预测性能，通常超越梯度提升决策树（GBDTs）。然而，这些模型的可信度，特别是其不确定性量化，在很大程度上被忽视了。我们通过在TALENT基准测试的112个数据集上进行广泛研究，比较TFMs、GBDTs和经典基线，调查了这一差距。我们的结果揭示了性能-不确定性权衡：尽管TFMs在AUC测量下达到了最高的预测性能，但在共形预测下，它们表现出较低的条件覆盖率（由SSCS测量），相比GBDTs。在合成数据集上的补充实验进一步刻画了这种效应加剧的情景。我们得出结论，尽管TFMs推进了预测前沿，但实现良好校准的不确定性仍然是其可靠采用的主要开放挑战。代码可在：https://github.com/jose-melo/high-performance-low-reliability 获取。

英文摘要

Recent Tabular Foundation Models (TFMs) have demonstrated state-of-the-art predictive performance, often surpassing Gradient-Boosted Decision Trees (GBDTs). However, the trustworthiness of these models, particularly their uncertainty quantification, has been largely overlooked. We investigate this gap through an extensive study comparing TFMs, GBDTs, and classical baselines on the 112 datasets of the TALENT benchmark. Our results reveal a performance-uncertainty trade-off: although TFMs achieve the highest predictive performance, measured by AUC, they exhibit lower conditional coverage under conformal prediction, measured by SSCS, compared to GBDTs. Complementary experiments on synthetic datasets further characterize the regimes in which this effect intensifies. We conclude that while TFMs advance predictive frontiers, achieving well-calibrated uncertainty remains a major open challenge for their reliable adoption. Code is available at: https://github.com/jose-melo/high-performance-low-reliability
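For readers unfamiliar with conformal prediction, the standard split-conformal recipe for classification can be sketched as below. It is a generic illustration, not the study's evaluation code: it targets marginal coverage only, it does not compute the SSCS conditional-coverage metric, and the function names are hypothetical.

```python
import math


def conformal_threshold(calibration_scores, alpha):
    """Split-conformal quantile of nonconformity scores at miscoverage level alpha."""
    n = len(calibration_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if k > n:
        return math.inf  # too few calibration points: include every label
    return sorted(calibration_scores)[k - 1]


def prediction_set(class_probs, threshold):
    """Labels whose nonconformity score (1 - probability) is within the threshold."""
    return [label for label, p in class_probs.items() if 1 - p <= threshold]


# Hypothetical calibration scores, each 1 - p(true label) on held-out data.
calibration = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
threshold = conformal_threshold(calibration, alpha=0.25)
print(prediction_set({"a": 0.7, "b": 0.25, "c": 0.05}, threshold))
```

A model can rank well by AUC and still yield prediction sets whose coverage varies across subgroups, which is the kind of gap the abstract reports.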

2605.28549 2026-05-28 cs.RO cs.LG 版本更新

SPRINT: Efficient Spectral Priors for Humanoid Athletic Sprints

SPRINT: 用于人形运动短跑的高效频谱先验

Yantong Wei, Kaihong Huang, Hainan Pan, Jiawei Luo, Jiawei Zhou, Ziyan Mai, Zhiwen Zeng, Yaonan Wang, Huimin Lu

Affiliations: College of Intelligence Science and Technology, National University of Defense Technology; School of Artificial Intelligence and Robotics, Hunan University

AI summary: Proposes SPRINT, a framework that uses frequency-adaptive spectral priors to generate kinematically feasible joint trajectories and achieves zero-shot sim-to-real transfer, reaching a peak speed of 6 m/s on the Unitree G1 platform.

AI中文摘要

人形运动短跑的追求受到缺乏人形可行的运动学参考数据以及现有框架在短跑过程中无法保持稳定性的阻碍。为了克服这些限制，我们引入了SPRINT，一种由高效、频率自适应频谱先验驱动的新框架。通过使用五个离散运动序列的参考库在频域中表征人类运动的基本周期性，这些先验在广泛的速度范围内生成运动学可行的关节轨迹，成功外推至超过参考分布的速度。在这些预训练先验的指导下，SPRINT策略在Unitree G1平台上的现场实验中实现了零样本仿真到现实迁移，达到了6 m/s的峰值短跑速度，并在保持仿生自然性的同时展示了无缝步态转换。最终，这项工作确立了频率自适应频谱先验作为人形运动短跑的高数据效率基础。项目页面见 https://anonymous.4open.science/w/SPRINT-138A/。

英文摘要

The pursuit of humanoid athletic sprints is hindered by a scarcity of humanoid-viable kinematic reference data and the inability of existing frameworks to maintain stability during sprints. To overcome these limitations, we introduce SPRINT, a novel framework driven by efficient, frequency-adaptive spectral priors. By characterizing the fundamental periodicity of human locomotion in the frequency domain using a reference library of five discrete motion sequences, these priors generate kinematically feasible joint trajectories across a broad velocity spectrum, successfully extrapolating to speeds that exceed the reference distribution. Guided by these pretrained priors, the SPRINT policy achieves zero-shot sim-to-real transfer in field experiments on the Unitree G1 platform, reaching a peak sprinting velocity of 6 m/s and demonstrating seamless gait transitions while preserving biomimetic naturalness. Ultimately, this work establishes frequency-adaptive spectral priors as a highly data-efficient foundation for humanoid athletic sprints. The project page is available at https://anonymous.4open.science/w/SPRINT-138A/.


2605.28543 2026-05-28 cs.AI cs.CL cs.LG 版本更新

Cultural Binding Heads in Language Models

语言模型中的文化绑定头

Avrile Floro, Luca Benedetto

发表机构 * Mistral-7B ； Mistral-Nemo-12B ； Llama-3.1-8B ； Gemma-2-9B

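AI summary: Using mechanistic interpretability and a factorial design, identifies 2-3 mid-layer attention heads per model, across eight language models, that contribute causally to cultural binding; the binding appears to form mainly during pre-training, and knowledge probing indicates that the models know considerably more than they act upon.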

AI中文摘要

大型语言模型通常默认对不同文化群体一视同仁，即使上下文需要区分：这缺乏差异意识。利用机制可解释性和Wang等人(2025)的N4文化挪用基准上的析因设计，我们在八个模型（四种架构，基础版和指令版）中识别出每个模型有2-3个中间层注意力头对文化绑定有因果贡献。文化绑定是将文化项目与适当身份关联的过程。敲除这些头上的身份到项目边会使绑定强度降低9-23%。识别出的头从指令模型转移到基础模型，表明文化绑定是在预训练阶段创建的。α缩放显示分级剂量反应，生成时适度放大引导（α=2-3）可将文化区分准确性提高1-3个百分点，同时基本保持中性推理不变。知识探测任务表明，模型知道的知识比其行为表现多3-5倍，表明瓶颈在于路由而非知识。

英文摘要

LLMs often default to equal treatment across cultural groups, even though context warrants differentiation: this is a lack of difference awareness. Using mechanistic interpretability and a factorial design on the N4 cultural appropriation benchmark from Wang et al. (2025), we identify 2-3 mid-layer attention heads per model that contribute causally to cultural binding across eight models (four architectures, base and instruct). Cultural binding is the process of associating cultural items with the appropriate identity. Knockout of the identity-to-item edges on these heads lowers the binding strength by 9-23%. The identified heads transfer from instruct to base models, suggesting that cultural binding is created at pre-training. An $α$-scaling shows a graded dose-response, and moderate amplification steering at generation ($α = 2-3$) increases cultural differentiation accuracy by 1-3 pp while leaving neutral reasoning mostly intact. A knowledge probing task shows that models know 3-5 times more than they act upon, indicating that the bottleneck lies in routing and not knowledge.


2605.28533 2026-05-28 cs.LG 版本更新

Semi-Supervised Hypothesis Testing by Betting on Predictions

基于预测投注的半监督假设检验

Yaniv Tenzer, Elad Tolochinsky, Yaniv Romano

Affiliations: Department of Computer Science, Technion – Israel Institute of Technology; Department of Electrical and Computer Engineering, Technion – Israel Institute of Technology

AI summary: Proposes a testing-by-betting framework that uses predictions on unlabeled data to increase the power of sequential hypothesis tests; it introduces an e-statistic that yields an anytime-valid test under label shift or concept shift.

AI中文摘要

我们引入了一个基于预测投注的框架，利用无标签数据上的预测来增强序贯假设检验的效力。给定来自$(X,Y)$联合分布的有限样本，以及来自$X$边际分布的额外无标签样本，我们探究如何利用无标签数据对$Y$的分布以及$Y\mid X$的条件分布进行假设。我们引入了一个e统计量，并用它构建了一个序贯检验。在标准分布假设——标签偏移或概念偏移下，我们证明了该检验是任意有效的。此外，我们表明对于二元数据，该e统计量具有非平凡的检验功效。关键在于，即使底层预测不准确，我们的方法仍能保持这些性质。通过模拟实验和在大语言模型评估中的应用，我们展示了该方法相对于基线方法（包括预测驱动推断）的效力提升。即使在无标签数据相对有限，且由于$X$和$Y$之间弱相关导致预测精度较低的情况下，这些提升仍然存在。

英文摘要

We introduce a testing-by-betting framework that leverages predictions on unlabeled data to enhance the power of sequential hypothesis testing. Given limited samples from the joint distribution of $(X,Y)$, and additional unlabeled samples from the marginal of $X$, we ask how unlabeled data can be used to hypothesize about the distribution of $Y$, and the conditional distribution of $Y\mid X$. We introduce an e-statistic and use it to construct a sequential test. Under standard distributional assumptions -- label shift or concept shift -- we establish that the test is anytime valid. Furthermore, we show that for binary data, the e-statistic has non-trivial power. Crucially, our approach retains these properties even when the underlying predictions are inaccurate. Through simulations and applications to large language models evaluation, we demonstrate power gains over baseline approaches, including prediction-powered inference. These gains persist even with relatively limited unlabeled data and when predictions have low accuracy due to weak correlation between $X$ and $Y$.
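The general testing-by-betting idea behind this abstract can be sketched for the simplest case, a binary outcome with a fixed bet. This is a generic illustration, not the paper's e-statistic (which additionally exploits predictions on unlabeled data); the function names, the fixed `bet`, and the toy data are assumptions.

```python
def betting_wealth(observations, p0, bet):
    """Wealth path for testing H0: E[Y] = p0 with binary Y in {0, 1}.

    Each round multiplies the wealth by 1 + bet * (y - p0). Under H0 this
    factor has expectation 1, so the wealth is a nonnegative martingale
    provided -1 / (1 - p0) < bet < 1 / p0.
    """
    assert -1.0 / (1.0 - p0) < bet < 1.0 / p0, "bet would allow negative wealth"
    wealth, path = 1.0, []
    for y in observations:
        wealth *= 1.0 + bet * (y - p0)
        path.append(wealth)
    return path

def first_rejection(path, alpha):
    """First round (1-indexed) at which wealth reaches 1 / alpha, else None."""
    for t, w in enumerate(path, start=1):
        if w >= 1.0 / alpha:
            return t
    return None

# Hypothetical data that are mostly ones, tested against H0: E[Y] = 0.5.
ys = [1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]
path = betting_wealth(ys, p0=0.5, bet=1.0)
stop = first_rejection(path, alpha=0.05)
print(stop, path[-1])
```

Because the wealth is a nonnegative martingale under the null, Ville's inequality bounds the probability that it ever reaches `1 / alpha` by `alpha`, which is why the test may be checked after every observation without inflating the error rate.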


2605.28531 2026-05-28 cs.LG 版本更新

Stabilizing distribution-free probabilistic forecasts

稳定化无分布概率预测

Jente Van Belle, Honglin Wen, Wouter Verbeke, Pierre Pinson

Affiliations: Faculty of Economics and Business; Department of Electrical Engineering, Shanghai Jiao Tong University; Dyson School of Design Engineering, Imperial College London; Department of Technology, Management and Economics, Technical University of Denmark; CoRE, Aarhus University

AI summary: Proposes a method based on neural-network-parameterized regression splines that jointly optimizes the quality and stability of distribution-free probabilistic time-series forecasts, controlling the variability caused by forecast updates, and validates it on two datasets.

AI中文摘要

多步预测通常会在新观测值可用时进行更新，因为较短的预测期限通常会提高预测质量。然而，这种改进是以预测不稳定性为代价的，即同一目标时期的预测值存在变异性。这种不稳定性可能引发基于预测制定的计划发生代价高昂的变更，并可能削弱对预测系统的信任。在这项工作中，我们将预测稳定性与预测质量一起纳入无分布概率时间序列预测模型的训练中，从而能够控制这种权衡。我们提出了一种使用神经网络参数化的回归样条生成稳定化预测条件分位数函数的方法。这种方法能够联合优化质量和稳定性，因为它允许我们直接惩罚由预测更新引起的差异。此外，它允许对稳定预测分布的不同部分（例如，中心部分与尾部）赋予不同的重要性，以专注于对预期下游应用最相关的部分（例如，库存管理的上尾）。我们在两个具有不同统计特性的数据集上对所提出的方法进行了实证评估，结果表明，它可以在不显著损失预测质量的情况下有效降低预测不稳定性，并且可以将稳定化努力针对预测分布的特定部分。

英文摘要

Multi-step-ahead forecasts are often updated as new observations become available, since shorter forecast horizons typically improve forecast quality. However, such improvements come at the cost of forecast instability, i.e., variability in forecasts for the same target period. This instability can trigger costly changes to plans formulated based on the forecasts and may erode trust in the forecasting system. In this work, we integrate forecast stability alongside forecast quality into the training of distribution-free probabilistic time-series forecasting models, allowing us to control this trade-off. We propose a method for generating stabilized forecasted conditional quantile functions using regression splines parameterized by a neural network. This approach enables joint optimization of quality and stability, as it allows us to directly penalize dissimilarities arising from forecast updates. Furthermore, it allows assigning varying importance to stabilizing different parts of the forecast distributions (e.g., central parts vs. tails) to focus on the parts most relevant for the intended downstream use (e.g., the upper tail for inventory management). We empirically evaluate the proposed method on two datasets with different statistical properties and show that it can effectively reduce forecast instability without a substantial loss in forecast quality, and that it can target stabilization effort toward specific parts of the forecast distributions.
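The trade-off described in the abstract can be sketched as a loss with a quality term and a stability term. The sketch below is an assumption-laden simplification: it evaluates quantiles at three fixed levels and penalizes absolute changes between successive forecasts, whereas the paper works with spline-based quantile functions and its own dissimilarity measure. All names, weights, and numbers are hypothetical.

```python
def pinball_loss(y, q_pred, tau):
    """Quantile (pinball) loss of a predicted tau-quantile q_pred for outcome y."""
    diff = y - q_pred
    return max(tau * diff, (tau - 1.0) * diff)

def stabilized_loss(y, new_q, old_q, taus, weights, lam):
    """Quality term plus a weighted penalty on changes between forecast updates.

    new_q / old_q hold the quantile forecasts for the same target period made
    at the current and the previous forecast origin.
    """
    quality = sum(pinball_loss(y, q, t) for q, t in zip(new_q, taus)) / len(taus)
    instability = sum(w * abs(qn - qo) for w, qn, qo in zip(weights, new_q, old_q))
    return quality + lam * instability / sum(weights)

taus = [0.1, 0.5, 0.9]
old_q = [8.0, 10.0, 13.0]     # forecast issued at the previous origin
new_q = [9.0, 10.0, 15.0]     # updated forecast for the same target period
upper_tail = [0.0, 0.0, 1.0]  # only penalize movement of the 0.9-quantile
loss = stabilized_loss(y=11.0, new_q=new_q, old_q=old_q, taus=taus,
                       weights=upper_tail, lam=0.5)
print(round(loss, 4))
```

Setting `lam` to zero recovers a pure quality objective, and the `weights` vector plays the role of focusing stabilization on one part of the distribution, here the upper tail.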


2605.28517 2026-05-28 cs.LG cs.AI 版本更新

Stochastic Gradient Descent with Momentum is Algorithmically Stable

带动量的随机梯度下降具有算法稳定性

Yunwen Lei, Zimeng Wang, Xiaoming Yuan

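Affiliations: Department of Mathematics, The University of Hong Kong; Department of Mathematics and Mathematical Statistics, Umeå University

AI summary: Uses an algorithmic stability analysis to show that stochastic gradient descent with momentum (SGDM) enjoys generalization guarantees on smooth convex problems, and establishes optimal excess population risk bounds.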

ProvMind：基于来源的材料合成推理

Yiming Zhang, Ryo Tamura, Koji Tsuda

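Affiliations: Center for Basic Research on Materials, National Institute for Materials Science; RIKEN Center for Advanced Intelligence Project

AI summary: Proposes the MatProcBench benchmark and the ProvMind framework, which reasons over provenance graphs to optimize routes, conditions, and causal dependencies in materials synthesis, reaching 52.84% accuracy on a dual-OOD split.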

2605.28467 2026-05-28 cs.LG 版本更新

Mitigating Adaptive Attacks against Reasoning Models with Activation Consistency Training

通过激活一致性训练缓解针对推理模型的自适应攻击

Avidan Shah, Jannik Brinkmann, Rico Angell

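Affiliations: New York University; Technical University Clausthal

AI summary: Studies Activation Consistency Training (ACT), which supervises internal representations to defend reasoning models against adversarial jailbreaks and prompt injection; experiments indicate that ACT remains robust under adaptive attacks.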

AI中文摘要

随着LLMs获得更强的推理能力，其扩展的思维链为防御对抗性越狱和提示注入引入了新的复杂性。我们研究了一致性训练，这是一系列微调目标，强制在干净提示和对抗性重写上行为一致，并评估了其两个主要变体：输出级（BCT）和激活级（ACT），在五个推理模型上。我们将这两种方法表述为提示注入防御，并发现ACT与其他基于训练的防御相比具有竞争力，同时仅需要干净和包装提示的自监督对。我们的实验还将这两种技术推广到越狱设置中，证明ACT对自适应攻击保持更强的鲁棒性。我们还提供了机制证据，表明ACT对越狱的防御被编码为在助手回合边界处激活空间中的大致线性偏移。经过ACT训练后，我们可以恢复一个单一的引导方向，该方向控制推理模型上的拒绝，而对良性输入影响最小。我们发现，即使模型的思维链被替换为来自未防御基础模型的顺从轨迹，ACT仍然保持鲁棒性，转而拒绝预填充的越狱。这些结果共同表明，监督内部表示是推理模型中各种形式安全训练的一种出乎意料有效且可解释的方法。

英文摘要

As LLMs gain stronger reasoning capabilities, their extended chain-of-thought introduces new degrees of complexity for defending against adversarial jailbreaks and prompt injection. We study consistency training, a family of fine-tuning objectives that enforce identical behavior on clean prompts and adversarial rewrites, and evaluate its two main variants, output-level (BCT) and activation-level (ACT), across five reasoning models. We formulate both methods as a prompt injection defense and find ACT to be competitive with other training-based defenses while requiring only self-supervised pairs of clean and wrapped prompts. Our experiments also generalize both techniques within the jailbreak setting, demonstrating that ACT remains more robust to adaptive attacks. We also provide mechanistic evidence that ACT's defense against jailbreaks is encoded as a roughly linear shift in activation space at the assistant-turn boundary. After ACT training, we can recover a single steering direction that controls refusal on reasoning models with minimal effect on benign inputs. We find that ACT remains robust even when the model's chain-of-thought is replaced with a compliant trace from the undefended base model, pivoting to refuse prefilled jailbreaks. Together, these results suggest that supervising internal representations is a surprisingly effective and interpretable approach to various forms of safety training in reasoning models.


2605.28444 2026-05-28 cs.LG 版本更新

Bilinear Coordinate Alignment for Training-Free Task-Vector Transfer

双线性坐标对齐用于免训练任务向量迁移

Jungyong Son, Jinwook Jung, Minhee Park, Sungyong Baik

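Affiliations: Dept. of Artificial Intelligence; Dept. of Data Science

AI summary: To address the fact that fine-tuned knowledge cannot be reused directly after a pre-trained model is updated, proposes BiCo, a training-free framework based on bilinear coordinate alignment that estimates orthogonal Procrustes mappings from a single forward-backward pass on a small calibration set and uses them to transfer task vectors across models.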

AI中文摘要

微调大规模预训练模型是近期将通用表示适配到专门任务的流行范式。然而，当预训练模型的新版本可用时，通过微调获得的专业知识无法直接重用，因为它与原始模型的参数化绑定，需要另一次昂贵的微调。为解决这一低效问题，近期工作使用任务向量（定义为微调模型与其基础模型之间的参数差异）在模型间迁移专业知识。现有方法通过匹配激活或梯度来桥接不同模型，但与直接微调相比仍存在显著性能差距，表明这些部分对应关系不足。在本工作中，我们不将任务向量仅视为参数偏移，而是重新审视任务向量的形成，并表明它们可以推导为输入侧激活与输出侧梯度之间的累积双线性交互。受此观察启发，我们将任务向量迁移形式化为双空间对齐问题，并提出BiCo，一种通过双线性坐标对齐进行任务向量迁移的免训练框架。BiCo使用少量校准集上的单次前向-反向传播估计两个空间中的正交Procrustes映射，无需任何参数更新。在广泛的计算机视觉和自然语言处理基准测试中，BiCo在宽度、深度和预训练配置不同的模型间始终优于现有迁移方法。

英文摘要

Fine-tuning large-scale pre-trained models is a recent prevalent paradigm for adapting general representations to specialized tasks. However, when a new version of a pre-trained model becomes available, expertise acquired through fine-tuning cannot be directly reused because it is tied to the parameterization of the original model, requiring another costly fine-tuning. To address this inefficiency, recent work uses task vectors, defined as the parameter difference between a fine-tuned model and its base model, to transfer expertise across models. While existing methods bridge disparate models by matching activations or gradients, a significant performance gap remains relative to direct fine-tuning, suggesting that these partial correspondences are insufficient. In this work, instead of viewing a task vector merely as a parameter offset, we revisit the formation of task vectors and show that they can be derived as accumulated bilinear interactions between input-side activations and output-side gradients. Motivated by this observation, we formulate task-vector transfer as a dual-space alignment problem and propose BiCo, a training-free framework for transferring task vectors through Bilinear Coordinate alignment. BiCo estimates orthogonal Procrustes mappings in both spaces using a single forward-backward pass on a small calibration set, without any parameter update. Across extensive computer vision and natural language processing benchmarks, BiCo consistently outperforms existing transfer methods across models that differ in width, depth, and pre-training configuration.
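The role of an orthogonal Procrustes mapping can be illustrated in the smallest possible setting: a rotation-only alignment in two dimensions, which has a closed-form solution. This is a toy special case and not BiCo itself, which aligns activation and gradient spaces of real networks; the features, the hypothetical task vector, and the function names are assumptions.

```python
import math

def fit_rotation(source, target):
    """Angle of the 2-D rotation R minimizing sum ||R a_i - b_i||^2."""
    dot = sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(source, target))
    cross = sum(ax * by - ay * bx for (ax, ay), (bx, by) in zip(source, target))
    return math.atan2(cross, dot)

def rotate(vec, theta):
    """Apply the counter-clockwise rotation by theta to a 2-D vector."""
    x, y = vec
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)

# Calibration features of the same inputs in the old and new coordinate systems.
old_feats = [(1.0, 0.0), (0.0, 2.0), (1.0, 1.0)]
true_theta = math.pi / 6
new_feats = [rotate(v, true_theta) for v in old_feats]

theta = fit_rotation(old_feats, new_feats)
task_vector_old = (0.5, -0.25)  # hypothetical update expressed in old coordinates
task_vector_new = rotate(task_vector_old, theta)
print(theta, task_vector_new)
```

The point of the sketch is that the mapping is estimated from paired calibration features alone, with no gradient steps, and is then reused to re-express an update in the new model's coordinates. In higher dimensions the same least-squares problem is usually solved with a singular value decomposition.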


2605.28440 2026-05-28 cs.CL cs.LG 版本更新

AdaDPO: Self-Adaptive Direct Preference Optimization with Balanced Gradient Updates

AdaDPO：具有平衡梯度更新的自适应直接偏好优化

Shaolong Chen, Madalina Ciobanu, Qingqing Mao, Ritankar Das

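Affiliations: Incept Labs

AI summary: To address the gradient asymmetry in DPO that pushes the model toward avoiding bad responses rather than generating good ones, proposes AdaDPO, which introduces adaptive coefficients based on the policy model's generation probabilities to balance the gradients of preferred and dispreferred responses; it outperforms DPO on AlpacaEval 2 and mitigates length bias.

Comments 5 figures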

AI中文摘要

DPO已成为替代RLHF用于将LLM与人类偏好对齐的广泛采用方法，无需单独的奖励模型或RL循环。最近的理论分析揭示了DPO中不对称的梯度行为：损失抑制不偏好响应的速度远快于促进偏好响应，导致模型学习避免生成坏答案而非生成好答案。我们提出AdaDPO，一种DPO算法的自适应变体，它引入了基于策略模型生成概率的每偏好对、基于停止梯度的系数，并以参考模型的概率作为可选组件。AdaDPO旨在强制偏好和不偏好概率的梯度幅度相等；实际实现平衡了每token梯度并应用数值裁剪边界以保持稳定性，同时保留DPO的原始超参数结构。在SimPO类似设置下使用UltraFeedback训练的Llama-3-8B-Instruct上，AdaDPO在AlpacaEval 2上持续优于DPO：它在81%的超参数组合中实现了更高的长度控制胜率（LC），达到了全局最佳LC（48.3%）和原始胜率（46.1%），并在88%的组合中扩大了LC与WR的差距，表明有效缓解了长度偏差。对KL散度、奖励边际和奖励准确率的额外分析证实，AdaDPO纠正了梯度不平衡并产生了更高效的优化。由于它纯粹在损失层面操作，AdaDPO可以无缝集成到现有的基于偏好的对齐流程中，无需改变数据收集或模型架构。该方法仅需几行代码，并且相同的自适应原理可推广到广泛的成对对比偏好损失族，包括SimPO、R-DPO、IPO、CPO和ORPO。

英文摘要

DPO has become a widely adopted alternative to RLHF for aligning LLMs with human preferences, eliminating the need for a separate reward model or RL loop. Recent theoretical analysis uncovers an asymmetric gradient behavior in DPO: the loss suppresses dispreferred responses substantially faster than it promotes preferred ones, causing the model to learn to avoid bad answers rather than to generate good ones. We propose AdaDPO, a Self-Adaptive variant of the DPO algorithm that introduces per-preference-pair, stop-gradient-based coefficients derived directly from the policy model's generation probabilities, with the reference model's probabilities as an optional component. AdaDPO is constructed to enforce equality of gradient magnitudes between preferred and dispreferred probabilities; the practical implementation balances per-token gradients and applies a numerical clipping bound for stability, while retaining DPO's original hyperparameter structure. On Llama-3-8B-Instruct trained on UltraFeedback under a SimPO-like setup, AdaDPO consistently outperforms DPO on AlpacaEval 2: it achieves higher length-controlled win rates (LC) in 81% of hyperparameter combinations, attains the global best LC (48.3%) and raw win rate (46.1%), and enlarges the LC-over-WR margin in 88% of combinations, indicating effective mitigation of length bias. Additional analyses on KL divergence, reward margin, and reward accuracy confirm that AdaDPO rectifies the gradient imbalance and yields more efficient optimization. Because it operates purely at the loss level, AdaDPO can be dropped into existing preference-based alignment pipelines without changing data collection or model architectures. The method requires only a few lines of code, and the same self-adaptive principle generalizes to a broad family of pairwise contrastive preference losses including SimPO, R-DPO, IPO, CPO, and ORPO.
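One way to see the gradient asymmetry mentioned in the abstract is to differentiate the standard DPO loss with respect to the two sequence probabilities. The sketch below covers standard DPO only and does not implement AdaDPO's coefficients, which the abstract does not spell out; the probabilities and `beta` are hypothetical, and the analysis the abstract cites may be framed differently.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta):
    """Standard DPO loss for one (preferred, dispreferred) pair of sequence log-probs."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(sigmoid(margin))

def dpo_prob_gradients(logp_w, logp_l, ref_logp_w, ref_logp_l, beta):
    """Partial derivatives of the DPO loss w.r.t. the probabilities p_w and p_l."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    scale = beta * sigmoid(-margin)     # shared factor beta * sigma(-margin)
    grad_w = -scale / math.exp(logp_w)  # d loss / d p_w (negative: raise p_w)
    grad_l = scale / math.exp(logp_l)   # d loss / d p_l (positive: lower p_l)
    return grad_w, grad_l

# Hypothetical pair: the policy starts at the reference, with p_w = 0.2, p_l = 0.05.
logp_w, logp_l = math.log(0.2), math.log(0.05)
loss = dpo_loss(logp_w, logp_l, logp_w, logp_l, beta=0.1)
g_w, g_l = dpo_prob_gradients(logp_w, logp_l, logp_w, logp_l, beta=0.1)
ratio = abs(g_l) / abs(g_w)  # equals p_w / p_l
print(loss, ratio)
```

In log-probability space the two gradients have equal magnitude, but in probability space their ratio is `p_w / p_l`, so the less likely response receives the larger push. Rescaling the two terms per pair is the kind of correction the abstract describes.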


2605.28427 2026-05-28 cs.LG stat.ML 版本更新

Latent Diffusion for Missing Data

缺失数据的潜在扩散模型

Alberte Heering Estad, Ignacio Peis, Jes Frellsen

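Affiliations: Technical University of Denmark; Pioneer Centre for Artificial Intelligence

AI summary: Proposes a two-stage framework that first learns latent representations from incomplete data with a robust VAE and then trains a diffusion model in that latent space; sample quality stays high at MCAR missing rates of up to 50%, outperforming pixel-space diffusion.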

AI中文摘要

扩散模型已成为缺失数据插补的强大生成方法，但大多数现有方法直接在数据空间中操作，当训练数据严重不完整时会退化。我们研究将扩散转移到学习到的潜在表示是否能在完全随机缺失（MCAR）损坏下提高鲁棒性。为此，我们提出一个两阶段框架：一个基于VAE的鲁棒插补器首先从不完整观测中学习紧凑的语义特征，然后在得到的潜在空间中训练扩散模型。在不同的训练缺失率下，我们在相同的不完整数据设置下与像素空间扩散模型进行受控比较。潜在扩散模型保持高样本质量，并在缺失率高达50%时保持稳定，而像素空间扩散随着缺失率增加逐渐退化。对于下游插补，潜在扩散也始终比像素空间扩散表现更好。这些发现表明，潜在空间建模减轻了零插补输入带来的伪影放大，并为不完整数据学习提供了更鲁棒的生成先验。总体而言，我们的结果支持潜在扩散作为缺失数据问题中像素空间扩散的一个强大且实用的替代方案。

英文摘要

Diffusion models have emerged as powerful generative approaches for missing-data imputation, yet most existing methods operate directly in data space and degrade when training data are heavily incomplete. We investigate whether shifting diffusion to a learned latent representation improves robustness under missing-completely-at-random (MCAR) corruption. To this end, we propose a two-stage framework: a robust VAE-based imputer first learns compact semantic features from incomplete observations, and a diffusion model is then trained in the resulting latent space. Across training missing rates, we perform a controlled comparison against pixel-space diffusion models under the same incomplete-data setting. The latent diffusion model maintains high sample quality and remains stable up to 50% missingness, while pixel-space diffusion degrades progressively as missingness increases. For downstream imputation, latent diffusion also achieves consistently better performance than pixel-space diffusion. These findings indicate that latent-space modeling mitigates artifact amplification from zero-imputed inputs and provides a more robust generative prior for incomplete-data learning. Overall, our results support latent diffusion as a strong and practically useful alternative to pixel-space diffusion for missing-data problems.
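The corruption setting named in the abstract, MCAR with zero-imputed inputs, can be sketched in a few lines. The data, the seed, and the function names are hypothetical, and the sketch covers only the corruption step, not the VAE or the diffusion model.

```python
import random

def mcar_mask(n_rows, n_cols, missing_rate, seed=0):
    """Mask with True = observed; each entry is dropped independently of the data."""
    rng = random.Random(seed)
    return [[rng.random() >= missing_rate for _ in range(n_cols)]
            for _ in range(n_rows)]

def zero_impute(data, mask):
    """Replace unobserved entries by 0.0, the naive input the abstract refers to."""
    return [[x if m else 0.0 for x, m in zip(row, mrow)]
            for row, mrow in zip(data, mask)]

# Hypothetical dataset whose true values are all strictly positive.
data = [[float(i + j + 1) for j in range(20)] for i in range(500)]
mask = mcar_mask(500, 20, missing_rate=0.5)
filled = zero_impute(data, mask)
observed_fraction = sum(map(sum, mask)) / (500 * 20)
print(round(observed_fraction, 3))
```

Because the mask is drawn without looking at the values, missingness carries no information about the data, which is what distinguishes MCAR from other missingness mechanisms. A model that consumes `filled` without the mask cannot tell a true zero from a missing entry, which is one plausible source of the artifacts the abstract mentions.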


2605.28412 2026-05-28 cs.RO cs.LG 版本更新

Tactile-Proprioceptive Sensor Fusion for Contact Wrench Estimation in Whole-Body Physical Human-Robot Interaction

触觉-本体感觉传感器融合用于全身物理人机交互中的接触力估计

Junha Min, Junghyeon Ma, Jiwung Kwon, Sunggyu Bae, Joohyung Kim, Kyungseo Park

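Affiliations: Department of Robotics and Mechatronics Engineering, DGIST (Daegu Gyeongbuk Institute of Science and Technology); Kinetic Intelligent Machine Lab (KIMLAB), University of Illinois Urbana-Champaign

AI summary: Proposes a tactile-proprioceptive fusion framework that uses tactile cues from pneumatic skin pads as contact indicators, combines them with motor-current-based proprioception, and applies a temporal convolutional network to mitigate friction hysteresis, reconstructing multi-axis contact forces and improving the sensitivity and responsiveness of physical human-robot interaction.

Comments 8 pages, 6 figures. Accepted to IEEE International Conference on Robotics and Automation (ICRA) 2026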

AI中文摘要

直接物理引导是一种自然的教学和与机器人交互的方式，机器人皮肤通过实现灵敏的接触感知和定位做出关键贡献。本文提出了一种用于自然物理人机交互的触觉-本体感觉传感器融合框架。来自气动皮肤垫的触觉线索作为接触指示器，绕过了摩擦残余和施加外力之间的模糊性，实现了无需明确摩擦识别的高灵敏度接触检测。我们将这些线索与基于电机电流的本体感觉融合，以重建机器人表面的多轴接触力。为了在运动过程中保持精度，我们采用时间卷积网络（TCN）来减轻粘滑过渡期间的摩擦滞后，减少接触起始时的不确定性，并产生平滑、响应灵敏的引导。我们在集成皮肤的机器人臂上验证了该方法：（i）在静止接触中重建多轴力，以及（ii）同时进行力估计和动觉教学。结果表明，与仅触觉和仅本体感觉的基线相比，在不同接触条件下灵敏度和响应性均有提高，支持触觉-本体感觉融合作为安全、直观的物理人机交互的可靠途径。

英文摘要

Direct physical guidance is a natural means of teaching and interacting with robots, and robotic skins make a key contribution by enabling sensitive contact sensing and localization. This paper presents a tactile-proprioceptive sensor fusion framework for natural physical human-robot interaction. Tactile cues from pneumatic skin pads serve as contact indicators that bypass the ambiguity between frictional residues and applied external forces, enabling highly sensitive contact detection without explicit friction identification. We fuse these cues with motor-current-based proprioception to reconstruct multi-axis contact forces on the robot surface. To maintain accuracy during motion, we employ a temporal convolutional network (TCN) to mitigate friction hysteresis during stick-slip transitions, reducing uncertainty at contact onset and yielding smooth, responsive guidance. We validate the approach on a skin-integrated robot arm: (i) multi-axis forces are reconstructed in stationary contacts, and (ii) simultaneous force estimation and kinesthetic teaching are demonstrated. Results indicate improved sensitivity and responsiveness across diverse contact conditions compared with tactile-only and proprioceptive-only baselines, supporting tactile-proprioceptive fusion as a reliable pathway to safe, intuitive physical human-robot interaction.


2605.28396 2026-05-28 cs.LG cs.AI 版本更新

ADWIN: Adaptive Windows for Horizon-Aware On-Policy Distillation

ADWIN: 用于视野感知在线策略蒸馏的自适应窗口

Kun Liang, Chenming Tang, Clive Bai, Weijie Liu, Saiyong Yang, Yunfang Wu

Affiliations: School of Computer Science, Peking University; National Key Laboratory for Multimedia Information Processing, Peking University; LLM Department, Tencent

AI summary: Proposes ADWIN, a framework that adaptively adjusts the rollout length used in on-policy distillation, reducing training cost by up to 4.1 times while matching or improving accuracy.

AI中文摘要

在线策略蒸馏（OPD）通过沿着学生生成的轨迹训练学生模型，并利用教师反馈来迁移推理行为，但标准的全轨迹训练将每次更新与昂贵的完整轨迹绑定，并且可能过度分配监督到对当前学生边际价值较低的后半部分。我们通过有用监督视野重新审视这一假设：学生引起的轨迹可能偏离教师偏好的延续，而对齐的前缀可能已经保留了长视野OPD更新方向。我们提出ADWIN，一种用于OPD的自适应窗口框架，将轨迹长度视为在线可接受性决策，在短的教师锚定前缀上训练，同时使用延迟的全轨迹探测来审计前缀与全轨迹的对齐情况，并通过陈旧性控制自适应调整下一视野。在数学和代码推理基准测试中，包括单任务、多任务和强到弱设置，ADWIN在全轨迹OPD和基于前缀的基线方法上改善了准确率与计算成本的权衡，将端到端训练成本降低最多4.1倍，同时达到相当或更好的准确率。

英文摘要

On-policy distillation (OPD) transfers reasoning behavior by training a student on teacher feedback along student-generated trajectories, but standard full-rollout training ties every update to a costly completion and can over-allocate supervision to late positions with low marginal value for the current student. We revisit this assumption through the useful supervision horizon: student-induced rollouts can drift from teacher-preferred continuations, while aligned prefixes may already preserve the long-horizon OPD update direction. We propose ADWIN, an adaptive-window framework for OPD that treats rollout length as an online admissibility decision, training on short teacher-anchored prefixes while using delayed full-rollout probes to audit prefix--full alignment and adapt the next horizon with staleness control. Across math and code reasoning benchmarks in single-task, multi-task, and strong-to-weak settings, ADWIN improves the accuracy--compute trade-off over full-rollout OPD and prefix-based baselines, reducing end-to-end training cost by up to 4.1 times while achieving comparable or better accuracy.


2605.28387 2026-05-28 cs.LG cs.AI cs.NE 版本更新

CLANE: Continual Learning of Actions on Neuromorphic Hardware from Event Cameras

CLANE: 基于事件相机在神经形态硬件上的动作持续学习

Elvin Hajizada, Michael Neumeier, Edward Paxon Frady, Yulia Sandamirskaya, Axel von Arnim, Bing Li, Eyke Hüllermeier

Affiliations: Institute of Informatics, University of Munich (LMU); fortiss GmbH, Neuromorphic Computing; Technical University of Munich, TUM School of CIT; Intel Labs, Intel Corporation; Institute of Computational Life Sciences (ICLS), Zurich University of Applied Sciences (ZHAW); Technische Universität Ilmenau, Resource-Efficient Artificial Intelligence Group; Munich Center for Machine Learning (MCML); German Research Centre for Artificial Intelligence (DFKI)

AI summary: Proposes CLANE, an end-to-end continual learning system on the Intel Loihi 2 neuromorphic chip for event-camera action recognition, which combines a spiking CNN with novel Loihi 2 modules to achieve high energy efficiency and low latency.

AI中文摘要

识别并持续学习新的人类动作而不遗忘先前类别，是新兴AR/VR和机器人应用的需求。对于这些应用，设备上的处理和学习对于隐私和低延迟适应至关重要。事件相机通过稀疏、异步的输出解决了视觉传感的效率问题，该输出天然兼容神经形态处理。然而，此前没有系统部署过使用神经形态硬件进行基于事件的持续设备上学习流水线。我们提出了CLANE（基于事件相机在神经形态硬件上的动作持续学习），端到端部署在Intel Loihi 2上。CLANE将用于时空特征提取的脉冲2D CNN与作为片上学习头的CLP-SNN相结合，并通过时间聚合层和定点归一化层（两者均为新型Loihi 2模块）扩展到动作片段。在真实条件下捕获的50类数据集THU E-ACT-50上，CLANE在持续学习任务中达到70.4%的准确率，同时相比顺序CNN+GRU+CLP边缘GPU基线实现了超过100倍的能耗降低和16倍的延迟降低，通过三个评估级别的跨平台等算法基准测试得到验证。

英文摘要

Recognizing and continuously learning novel human actions without forgetting prior classes is a requirement for emerging AR/VR and robotics applications. For these applications, both on-device processing and learning are essential for privacy and low-latency adaptation. Event cameras address the efficiency of visual sensing with sparse, asynchronous output that is naturally compatible with neuromorphic processing. Yet no prior system has deployed a continual on-device learning pipeline for event-based action recognition using neuromorphic hardware. We present CLANE, Continual Learning of Actions on Neuromorphic Hardware from Event Cameras, deployed end-to-end on Intel Loihi 2. CLANE combines a spiking 2D CNN for spatiotemporal feature extraction with CLP-SNN as its on-chip learning head, extended to action clips via a Temporal Aggregation Layer and a fixed-point Normalization Layer, both novel Loihi 2 modules. On THU E-ACT-50, a 50-class dataset captured under real-world conditions, CLANE achieves 70.4% accuracy in a continual learning task while delivering more than 100x energy reduction and 16x lower latency over a sequential CNN+GRU+CLP edge GPU baseline, validated through iso-algorithm cross-platform benchmarking across three evaluation levels.


2605.28384 2026-05-28 cs.LG 版本更新

Meta-Attention: Bayesian Per-Token Routing for Efficient Transformer Inference

Meta-Attention: 用于高效Transformer推理的贝叶斯逐Token路由

Alan Ferrari

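Affiliations: Knowledge Lab AG Zürich

AI summary: Proposes Meta-Attention, a framework in which a Bayesian Meta-Controller dynamically selects an attention strategy for each token (full softmax, linear, or sliding-window local attention), aiming for a better compute-performance trade-off at negligible overhead.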

AI中文摘要

标准Transformer架构对所有token和序列位置统一应用单一注意力机制，而不考虑局部上下文或计算预算。我们提出Meta-Attention，一个通过贝叶斯元控制器动态将每个token路由到最合适的注意力策略（全softmax注意力、线性（核）注意力或滑动窗口局部注意力）的框架。与使用确定性或无先验学习路由的先前路由方法不同，元控制器将逐token机制选择视为在计算感知的Dirichlet先验下的后验推理：路由权重是通过证据下界（ELBO）目标训练的摊销变分后验q(alpha | x_t; phi)的输出，该目标联合编码任务性能和注意力机制成本。这种设计产生原则性的路由不确定性估计，控制软到硬的路由转换，无需临时负载平衡损失即可缓解路由崩溃，并在几乎无开销的情况下比确定性或无先验学习路由产生更好的计算-性能权衡。在Tiny LM基准上的第一阶段实证结果证实了核心预测：贝叶斯控制器的学习路由分布在硬路由下意味着归一化FLOP成本为25.1%，而无先验基线为59.3%（-34.2个百分点），并将路由熵从55.8%降低到43.3%（-12.5个百分点），表明Dirichlet先验防止了路由崩溃，而非贝叶斯模型默认使用全注意力。我们展示了贝叶斯架构、ELBO训练目标以及验证前向传播正确性、后验多样性和与无先验基线进行受控消融的第一阶段PyTorch原型。代码见：https://github.com/KFEAL/meta-attention

英文摘要

Standard transformer architectures apply a single attention mechanism uniformly across all tokens and sequence positions, irrespective of local context or computational budget. We propose Meta-Attention, a framework that dynamically routes each token to the most appropriate attention strategy -- full softmax attention, linear (kernel) attention, or sliding-window local attention -- via a Bayesian Meta-Controller. Unlike prior routing approaches that use deterministic or prior-free learned routing, the Meta-Controller treats per-token mechanism selection as posterior inference under a compute-aware Dirichlet prior: routing weights are the output of an amortised variational posterior q(alpha | x_t; phi) trained with an Evidence Lower Bound (ELBO) objective that jointly encodes task performance and attention-mechanism cost. This design produces principled routing uncertainty estimates that govern the soft-to-hard routing transition, mitigates routing collapse without ad hoc load-balancing losses, and yields better compute-performance trade-offs than deterministic or prior-free learned routing at negligible overhead. Phase 1 empirical results on a Tiny LM benchmark confirm core predictions: the Bayesian controller's learned routing distribution implies a projected normalised FLOP cost of 25.1% under hard routing, vs. 59.3% for the prior-free baseline (-34.2 pp), and reduces routing entropy from 55.8% to 43.3% (-12.5 pp), demonstrating that the Dirichlet prior prevents routing collapse while the non-Bayesian model defaults to full attention. We present the Bayesian architecture, ELBO training objective, and a Phase 1 PyTorch prototype validating forward-pass correctness, posterior diversity, and a controlled ablation against a prior-free baseline. Code available at: https://github.com/KFEAL/meta-attention
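The two routing statistics quoted in the abstract, a normalised FLOP cost under hard routing and a routing entropy, can be sketched as follows. The per-mechanism costs, the routing weights, and the exact definitions are assumptions made for illustration; the paper's own cost model and entropy normalisation may differ.

```python
import math

# Hypothetical relative FLOP costs, normalized so that full attention = 1.0.
COSTS = {"full": 1.0, "linear": 0.25, "local": 0.125}

def hard_routing_cost(routing_weights):
    """Mean normalized cost when each token uses its highest-weight mechanism."""
    total = 0.0
    for weights in routing_weights:
        choice = max(weights, key=weights.get)
        total += COSTS[choice]
    return total / len(routing_weights)

def normalized_entropy(weights):
    """Shannon entropy of one token's routing weights, divided by log(K)."""
    h = -sum(p * math.log(p) for p in weights.values() if p > 0.0)
    return h / math.log(len(weights))

# Hypothetical per-token routing weights produced by a controller.
tokens = [
    {"full": 0.7, "linear": 0.2, "local": 0.1},
    {"full": 0.1, "linear": 0.3, "local": 0.6},
    {"full": 0.2, "linear": 0.5, "local": 0.3},
    {"full": 0.05, "linear": 0.05, "local": 0.9},
]
cost = hard_routing_cost(tokens)
mean_entropy = sum(normalized_entropy(t) for t in tokens) / len(tokens)
print(cost, round(mean_entropy, 3))
```

Under these definitions a controller that always picks full attention has cost 1.0, so a lower value indicates that tokens are being sent to cheaper mechanisms, and lower entropy indicates more decisive per-token choices.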


2605.28372 2026-05-28 cs.LG cs.RO 版本更新

Teacher-Student Representational Alignment for Reinforcement Learning-Driven Imitation Learning

教师-学生表征对齐用于强化学习驱动的模仿学习

Meraj Mammadov, Pedro Zuidberg Dos Martires, Johannes Andreas Stork

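Affiliations: Department of Computer Science; Örebro University

AI summary: Proposes learning a shared embedding space through self-supervised contrastive learning to reduce the imitation gap between teacher and student policies, improving student policy performance.

Comments 6 pages, 5 figures. Accepted as an oral presentation at the RL4IL Workshop at ICRA 2026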

AI中文摘要

从基于状态的强化学习策略进行模仿学习是克服机器人学中复杂高维观测空间维度灾难的常用方法。本文解决了当教师和学生策略孤立学习时出现的不可模仿差距，即教师策略可以依赖学生无法从其观测中推断的特权状态信息。我们提出了一种新算法，不是通过在模仿学习后进行强化学习微调（通常需要全新的训练设置）来改善学生性能，而是学习一个共享嵌入空间，该空间隐藏了特定于智能体的观测，从而通过构造训练出可模仿的教师策略。我们通过自监督对比学习与教师策略并行训练共享嵌入空间，并通过限制其梯度更新编码器网络来防止其提取私有信息。我们在多个示例领域进行了评估，并与最先进的基线方法比较，结果表明我们的算法能够实现更高的学生性能，并显著减小模仿差距。

英文摘要

Imitation learning (IL) from a state-based reinforcement learning (RL) policy is a common approach to overcome the curse of dimensionality in complex and high-dimensional observation spaces prevalent in robotics. This paper addresses the irreducible imitation gap that emerges when teacher and student are learned in isolation, and the teacher policy has the liberty to rely on privileged state information that the student cannot infer from its observations. Instead of improving poor student performance with RL finetuning after IL, which often requires a whole new training setup, we propose a novel algorithm which learns a shared embedding space that hides agent-specific observations and thus trains imitable teacher policies by construction. We train the shared embedding space with self-supervised contrastive learning in parallel to the teacher policy and prevent it from extracting private information by limiting its gradients from updating the encoder networks. We perform evaluations on several example domains and compare to state-of-the-art baselines showing that our algorithm enables higher student performance with substantially reduced imitation gap.


2605.28371 2026-05-28 cs.AI cs.LG cs.SE 版本更新

From paper to benchmark: agentic, framework-based reproduction of under-specified methods in machine health intelligence

从论文到基准测试：基于智能体和框架的机器健康智能中欠规范方法复现

Raffael Theiler, Ludovico Comito, David Leko, Leandro Von Krannichfeldt, Lev Telyatnikov, Olga Fink

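Affiliations: EPFL

AI summary: Proposes an agentic, shared-framework approach that translates papers into executable, comparable benchmark implementations through a slot-binding interface, addressing the difficulty of reproducing methods in industrial Prognostics and Health Management.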

AI中文摘要

工业预测与健康管理（PHM）为应用机器学习中的更广泛挑战提供了一个代表性案例研究：将已发表的论文转化为可执行、可基准测试的实现。由于工业数据集的访问受限、预处理和评估协议的报告不完整以及隐含的设计选择（例如，窗口化、目标构建、数据分割）对性能有重要影响，复现PHM中的欠规范方法尤为困难。现有的论文到代码系统为单篇论文生成实现，但由于假设和评估设置的不一致性，这些产物通常无法直接比较。我们引入了基于智能体和框架的PHM论文复现方法，其中智能体通过槽绑定接口将论文转化为共享的PHM基准测试框架。该接口将方程和协议描述映射为结构化组件（任务定义、数据集适配器、窗口化、目标、模型和评估器），同时明确记录未解决的假设。最终实现通过标准化任务契约和评估钩子进行验证，从而实现一致且可比较的基准测试。我们在16篇PHM论文上评估了该方法，比较了框架增强型、基于技能和基于提示的智能体复现与最近的无框架论文复现智能体。我们评估了复现成功率、基于模型的代码评估、论文假设的框架绑定以及标准化协议下的跨论文基准可比性。结果表明，将智能体生成与共享框架相结合，将论文复现从孤立的代码合成转变为可执行、假设感知且系统可比较的基准测试实现。

英文摘要

Industrial Prognostics and Health Management (PHM) provides a representative case study for a broader challenge in applied machine learning: translating published papers into executable, benchmark-ready implementations. Reproducing under-specified methods in PHM is particularly difficult due to restricted access to industrial datasets, incomplete reporting of preprocessing and evaluation protocols, and implicit design choices (e.g., windowing, target construction, data splits) that critically affect performance. Existing paper-to-code systems generate implementations for individual papers, but these artifacts are often not directly comparable due to inconsistencies in assumptions and evaluation settings. We introduce agentic, framework-based PHM paper reproduction, where an agent translates a paper into a shared PHM benchmark framework via a slot-binding interface. This interface maps equations and protocol descriptions into structured components (task definitions, dataset adapters, windowing, targets, models, and evaluators), while explicitly recording unresolved assumptions. The resulting implementations are validated against standardized task contracts and evaluation hooks, enabling consistent and comparable benchmarking. We evaluate this approach on 16 PHM papers, comparing framework-enhanced, skill-based and prompt-based agentic reproduction against a recent framework-free paper-reproduction agent. We assess reproduction success, model-based code evaluation, framework binding of paper assumptions, and cross-paper benchmark comparability under standardized protocols. Our results show that coupling agentic generation with a shared framework transforms paper reproduction from isolated code synthesis into executable, assumption-aware, and systematically comparable benchmark implementations.


2605.28364 2026-05-28 stat.ML cs.LG 版本更新

Variance-Adaptive Optimal Algorithm for Reinforcement Learning with Multinomial Logit Function Approximation

基于多项逻辑函数逼近的强化学习的方差自适应最优算法

Wonyoung Kim, Min-Hwan Oh, Garud Iyengar, Assaf Zeevi

发表机构 * Chung-Ang University（韩国中央大学）； Seoul National University（首尔国立大学）； Columbia University（哥伦比亚大学）

AI总结针对多项逻辑函数逼近的强化学习，提出一种计算高效的方差自适应算法，实现了实例级最优遗憾界，并通过实验验证其优于传统方法。

2605.28358 2026-05-28 cs.LG cs.AI cs.IT math.IT 版本更新

Score Based Error Correcting Code Decoder

基于分数的纠错码译码器

Alon Helvits, Eliya Nachmani

发表机构 * School of Electrical and Computer Engineering (ECE)（电气与计算机工程学院）

AI总结提出SB-ECC，一种将译码视为连续时间去噪的基于分数的译码器，通过神经去噪器定义概率流常微分方程，在奇偶校验约束下迭代更新噪声信道观测值，无需SNR估计即可推理，并在42个码/SNR设置中39/42达到最佳误码率。

Comments Accepted to ICML 2026

详情

AI中文摘要

纠错码能够实现可靠通信，然而在实际软译码中，跨码族和码长仍然具有挑战性。我们提出SB-ECC，一种基于分数的译码器，将译码视为连续时间去噪。神经去噪器定义了一个概率流常微分方程（ODE），该方程在奇偶校验约束的引导下，迭代地将噪声信道观测值更新为有效的码字。该模型在不同噪声水平下训练，无需时间/SNR条件，从而无需SNR估计即可进行推理，并支持由ODE求解器预算控制的直接延迟-精度权衡。我们使用原始带符号的信道观测值作为输入来学习连续去噪场。在42个码/SNR设置中，SB-ECC在39/42个条目中实现了最佳误码率，平均SNR增益为0.17dB，最大增益为0.46dB，优于最强竞争基线。我们表明，将求解器从Euler切换为DPM可保持-ln(BER)，同时将端到端译码时间平均减少8.86%（最高达12.82%）。

英文摘要

Error-correcting codes enable reliable communication, yet practical soft decoding remains challenging across code families and block lengths. We propose SB-ECC, a score-based decoder that casts decoding as continuous-time denoising. A neural denoiser defines a probability-flow ordinary differential equation (ODE) that iteratively updates the noisy channel observation toward a valid codeword, guided by parity constraints. The model is trained across noise levels without time/SNR conditioning, enabling inference without SNR estimation and supporting a direct latency-accuracy trade-off controlled by the ODE solver budget. We use the raw signed channel observation as input for learning a continuous denoising field. Across 42 code/SNR settings, SB-ECC achieves the best BER in 39/42 entries, with an average SNR gain of 0.17 dB and a maximum gain of 0.46 dB over the strongest competing baseline. We also show that swapping the solver from Euler to DPM preserves -ln(BER) while reducing end-to-end decoding time by 8.86% on average (up to 12.82%).


2605.28355 2026-05-28 cs.LG 版本更新

Detecting Diffusion-Generated Time Series Under Generator Shift

检测生成器偏移下的扩散生成时间序列

Zhi Wen Soi, Aditya Shankar, Gert Lek, Abele Mălan, Daniel Neider, Jian-Jia Chen, Lydia Chen

发表机构 * TU Dortmund University, Dortmund, Germany ； Delft University of Technology, Delft, Netherlands ； University of Neuchâtel, Neuchâtel, Switzerland ； RWTH Aachen University, Aachen, Germany

AI总结针对生成器未知的扩散生成时间序列检测问题，比较了白盒与黑盒方法，发现简单分类器作为黑盒检测器显著优于白盒方法，并指出该问题不能直接迁移图像领域经验。

详情

AI中文摘要

真实时间序列与扩散生成时间序列之间的界限变得越来越难以划分，然而该领域的检测仍未被充分探索，尤其是在生成器未知的情况下。我们比较了需要访问生成器的白盒检测与仅基于原始信号的黑盒检测。白盒方法是一种从图像领域改编的基于重构的检测器，在分布内表现良好，但在生成器偏移下失效：图像中基于重构的检测之所以成功，是因为大型通用生成器提供了近乎通用的重构先验，而时间序列不存在类似的生成器。相比之下，一个简单的现成分类器作为黑盒检测器表现非常出色，平均F1达到79.2，相对白盒方法提升22.1%，在1%假阳性率下的真正例率为57.2。因此，扩散生成时间序列的检测并非图像领域问题的直接迁移。本工作首次系统探索了扩散生成时间序列的白盒和黑盒检测。最后，我们指出了几个开放且有前景的方向。

英文摘要

The boundary between real and diffusion-generated time series is becoming increasingly difficult to draw, yet detection in this domain remains underexplored, especially when the generator is unknown. We compare white-box detection, which requires access to the generator, against black-box detection, which operates on the raw signal alone. The white-box approach, a reconstruction-based detector adapted from the image domain, works well in-distribution but breaks down under generator shift: reconstruction-based detection in images succeeds because large generic generators provide a near-universal reconstruction prior, and no analogous generator exists for time series. In contrast, a simple off-the-shelf classifier used as a black-box detector performs remarkably well, achieving an average F1 of 79.2, a 22.1% relative improvement over the white-box approach, and a TPR@1%FPR of 57.2. Diffusion-generated time series detection is therefore not a direct transfer of the image-domain problem. This work provides the first systematic exploration of white-box and black-box detection for diffusion-generated time series. We close by identifying several open and promising directions.


2605.28345 2026-05-28 cs.AI cs.LG eess.SP 版本更新

Picid: A Modular Evaluation Infrastructure for Reproducible PHM Across Tasks and Domains

Picid: 一种跨任务和领域的可复现PHM模块化评估基础设施

Lev Telyatnikov, Raffael Theiler, Leandro Von Krannichfeldt, Olga Fink

发表机构 * EPFL（洛桑联邦理工学院）

AI总结提出模块化评估基础设施Picid，通过标准化数据契约和评估边界，实现跨任务、跨数据集的故障检测、诊断和预测的可复现与公平比较。

详情

AI中文摘要

预测与健康管理（PHM）领域的进展受到跨任务、数据集和应用领域缺乏标准化和可复用评估实践的阻碍。报告的结果往往难以复现和比较，因为关键协议选择（如数据划分、预处理、标签对齐、时间窗口和指标）通常是隐式的或临时实现的。我们引入了Picid，一个模块化评估基础设施，将PHM评估流程形式化为显式、可执行和可复现的协议。通过定义良好的抽象，Picid在保持对不同PHM设置的灵活性的同时，强制执行确定性、无泄漏的数据集构建。该框架通过统一接口支持故障检测、诊断和预测，并且可以扩展到新的数据集和模型类别，而不违反协议不变性。通过标准化数据契约和评估边界，Picid还实现了跨诊断（分类）和预测（回归）的公平任务比较，允许相同的模型系列在不同设置中一致地进行评估。我们通过对跨越电池、轴承、涡轮风扇发动机、液压系统、过滤系统和建筑的十二个数据集上的十三个模型进行实证评估来展示Picid。这项工作为PHM中标准化、公平和可复现的评估建立了可复用的基础。

英文摘要

Progress in Prognostics and Health Management (PHM) is hindered by the lack of standardized and reusable evaluation practices across tasks, datasets, and application domains. Reported results are often difficult to reproduce and compare, as key protocol choices, such as data splits, preprocessing, label alignment, temporal windowing, and metrics, are often implicit or implemented ad hoc. We introduce Picid, a modular evaluation infrastructure that formalizes the PHM evaluation pipeline as an explicit, executable, and reproducible protocol. Through well-defined abstractions, Picid enforces deterministic, leakage-safe dataset construction while remaining flexible across diverse PHM settings. The framework supports fault detection, diagnostics, and prognostics through a unified interface and can be extended to new datasets and model classes without violating protocol invariants. By standardizing data contracts and evaluation boundaries, Picid also enables fair cross-task comparisons across diagnostics (classification) and prognostics (regression), allowing identical model families to be evaluated consistently across heterogeneous settings. We demonstrate Picid through an empirical evaluation of thirteen models on twelve datasets spanning batteries, bearings, turbofan engines, hydraulics, filtration systems, and buildings. This work establishes a reusable foundation for standardized, fair, and reproducible evaluation in PHM.


2605.28340 2026-05-28 stat.ML cs.LG 版本更新

Decision-focused learning for optimal PV-Battery scheduling

面向决策的光伏-电池调度优化学习

Joris Depoortere, Hussain Kazmi, Johan Driesen

发表机构 * ESAT-Electa, KU Leuven（鲁汶大学 ESAT-Electa）

AI总结提出一种决策聚焦学习框架，通过训练LSTM光伏发电预测器以最小化电池调度成本，相比传统两阶段方法降低平均电费3.6%，验证了预测与优化目标对齐的重要性。

详情

DOI: 10.1016/j.est.2026.121152
Journal ref: Journal of Energy Storage Volume 154, Part A, 10 April 2026, 121152

自回归模型中通过Logit组合实现组合泛化

Aakash Kumar, Maria Sofia Bucarelli, Emanuele Natale

发表机构 * COATI, CNRS, Inria, I3S, Université Côte d'Azur, France（COATI团队、法国国家科学研究中心（CNRS）、Inria、I3S、法国蔚蓝海岸大学）

AI总结本文受扩散模型组合方法的启发，提出一种新的自回归系统组合策略，在因子化条件假设下实现投影组合，并证明该组合在输出空间平滑重参数化下保持长度泛化行为。

详情

AI中文摘要

组合自回归模型仍然是理解大型语言模型如何结合跨任务学习的行为或技能的核心挑战。受扩散模型组合方法的启发，我们为自回归系统引入了一种新的、有原则的组合策略。在因子化条件假设下，我们证明所得组合是投影的：每个组件模型保持对其输出分布指定子空间的控制，避免模型间干扰。该性质在输出空间的平滑重参数化下进一步保持，产生特征空间定理。最后，我们证明当因子化假设和组件保证在目标长度上一致成立时，组合保持长度泛化行为。这些结果为理解自回归系统中模型组合和合并何时成功提供了原则性理解，并确定了其交互保持稳定的条件。

英文摘要

Composing autoregressive models remains a core challenge in understanding how large language models can combine behaviors or skills learned across tasks. We introduce a new and principled composition strategy for autoregressive systems, inspired by composition methods developed for diffusion models. Under a factorized-conditionals assumption, we show that the resulting composition is projective: each component model preserves control over its own designated subspace of the output distribution, avoiding interference between models. This property is further preserved under smooth reparameterizations of the output space, yielding a feature-space theorem. Finally, we show that composition preserves length-generalizing behavior when the factorization assumptions and component guarantees hold uniformly at the target length. These results provide a principled understanding of when model composition and merging succeed in autoregressive systems and identify conditions under which their interactions remain stable.


2605.28302 2026-05-28 cs.LG cs.AI cs.DC 版本更新

How Far Can Disaggregation Go? A Design-Space Exploration of Attention-FFN Disaggregation for Efficient MoE LLM Serving

解聚能走多远？面向高效 MoE LLM 服务的 Attention-FFN 解聚设计空间探索

Hanjiang Wu, Abhimanyu Rajeshkumar Bambhaniya, Sarbartha Banerjee, Tuhin Khare, Sudarshan Srinivasan, Suvinay Subramanian, Souvik Kundu, Madhu Kumar, Midhilesh Elavazhagan, William Won, Amir Yazdanbakhsh, Tushar Krishna

发表机构 * Georgia Institute of Technology（佐治亚理工学院）； Intel（英特尔）； Google（谷歌）； Google DeepMind（谷歌DeepMind）； Infravana

AI总结本文系统探索了从分块预填充、预填充-解码解聚到算子级 Attention-FFN 解聚 (AFD) 的不同解聚层次在 MoE 模型推理中的收益与局限，通过融合设备内核测量与高保真网络仿真的框架，在严格 TTFT/TPOT SLO 下 AFD 可在 DeepSeek-V3.2 上维持约 4k tokens/s 的系统吞吐量，并给出了联合优化吞吐与交互性的具体设计原则。

详情

AI中文摘要

现代大语言模型 (LLM) 推理已逐步解聚以跟上不断增长的模型规模和严格的 TTFT 与 TPOT 服务级别目标：从分块预填充聚合，到预填充-解码 (P/D) 解聚，再到最近出现的算子级 Attention-FFN 解聚 (AFD)。这一趋势对于混合专家 (MoE) 模型尤为重要，其中内存受限的注意力、计算密集的专家 FFN 以及 MoE 分发/组合通信产生了不同的资源需求。AFD 通过将注意力与 MoE-FFN 执行放在不同的 GPU 组上进一步暴露了这种异构性。每个解聚层次都加深了跨工作负载特征、资源分配和互连拓扑的调度设计空间，提出了核心问题：每个层次何时真正产生收益？我们系统地刻画了 MoE 推理中这一权衡，涵盖了输入/输出序列长度、前缀-KV 重用和每用户延迟约束等实际工作负载。以分块预填充和 P/D 解聚为基线，我们通过一个融合设备内核测量与高保真网络仿真的框架，研究了 AFD 在大规模下的收益与局限。在严格的 TTFT/TPOT SLO 下，AFD 在 DeepSeek-V3.2 上针对聊天、编码和代理编码工作负载维持了约 4k tokens/s 的系统吞吐量，而未经 AFD 的部署则不可行。我们提炼出联合优化吞吐与交互性的具体结论，包括如何根据工作负载和模型架构在 GPU 间划分注意力与 FFN，为当前机架级和集群级部署以及未来的解聚 AI 基础设施提供了设计原则。

英文摘要

Modern large language model (LLM) inference has progressively disaggregated to keep pace with growing model sizes and tight TTFT and TPOT service-level objectives: from chunked-prefill aggregation, to prefill-decode (P/D) disaggregation, and most recently to operator-level Attention-FFN Disaggregation (AFD). This trend is especially important for mixture-of-experts (MoE) models, where memory-bound attention, compute-intensive expert FFNs, and MoE dispatch/combine communication create distinct resource demands. AFD further exposes this heterogeneity by placing attention and MoE-FFN execution on separate GPU groups. Each level of disaggregation deepens the scheduling design space across workload characteristics, resource allocation, and interconnect topology, raising the central question: when does each level actually pay off? We systematically characterize this trade-off for MoE inference across realistic workloads spanning input/output sequence lengths, prefix-KV reuse, and per-user latency constraints. Using chunked-prefill and P/D disaggregation as baselines, we study the benefits and limits of AFD at scale through a framework that fuses on-device kernel measurements with high-fidelity network simulation. Under strict TTFT/TPOT SLOs, AFD sustains around 4k tokens/s of system throughput on DeepSeek-V3.2 across chat, coding, and agentic-coding workloads, where non-AFD deployments are infeasible. We distill concrete takeaways for jointly optimizing throughput and interactivity, including how to partition attention and FFN across GPUs as a function of workload and model architecture, providing design principles for current rack- and cluster-scale deployments as well as future disaggregated AI infrastructure.


2605.28300 2026-05-28 cs.LG 版本更新

T-GINEE: A Tensor-Based Multilayer Graph Representation Learning

T-GINEE：基于张量的多层图表示学习

Maolin Wang, Ziting Mai, Xuhui Chen, Zhiqi Li, Tianshuo Wei, Yutian Xiao, Wenlin Zhang, Wanyu Wang, Ruocheng Guo, Haoxuan Li, Zenglin Xu, Xiangyu Zhao

发表机构 * City University of Hong Kong（香港城市大学）； Beihang University（北京航空航天大学）； Independent Researcher（独立研究者）； Peking University（北京大学）； Fudan University（复旦大学）； Shanghai Academy of AI for Science（上海人工智能科学研究院）

AI总结针对现有方法无法捕捉多层网络层间复杂依赖的问题，提出T-GINEE框架，结合张量分解与广义估计方程显式建模跨网络相关性，理论证明一致性与渐近正态性，实验验证有效性。

Comments Accepted by ICML 2026

详情

AI中文摘要

传统网络分析关注单层网络，而现实系统通常形成具有多种关系类型的多层网络。然而，现有方法通常独立处理各层或直接将其聚合，因而难以捕捉复杂的层间依赖。为解决此问题，我们提出T-GINEE（基于张量的广义多层图估计方程），一个统计正则化框架，结合基于张量的广义估计方程与任务特定损失，显式建模跨网络相关性。关键创新包括：（1）CP张量分解通过共享潜在因子捕捉结构依赖；（2）广义估计方程框架通过工作协方差矩阵建模层间相关性；（3）灵活的连接函数适应稀疏性等特征。我们的理论分析在温和条件下建立了一致性和渐近正态性。在合成和真实数据集上的大量实验验证了T-GINEE在多层网络分析中的有效性。

英文摘要

Traditional network analysis focuses on single-layer networks, whereas real-world systems often form multilayer networks with multiple relationship types. However, existing methods typically fail to capture complex inter-layer dependencies because they treat layers independently or simply aggregate them. To address this, we propose T-GINEE (Tensor-Based Generalized Multilayer-graph Estimating Equation), a statistical regularization framework combining tensor-based generalized estimating equations with task-specific loss to model cross-network correlations explicitly. Key innovations include: (1) CP tensor decomposition capturing structural dependencies via shared latent factors; (2) a generalized estimating equation framework modeling inter-layer correlations through working covariance matrices; and (3) a flexible link function accommodating characteristics like sparsity. Our theoretical analysis establishes consistency and asymptotic normality under mild conditions. Extensive experiments on synthetic and real-world datasets validate T-GINEE's effectiveness for multilayer network analysis.


2605.28296 2026-05-28 cs.LG nucl-ex physics.ins-det 版本更新

Machine Learning methods for event classification and vertex reconstruction of the 12C + 12C reaction with the MATE-TPC

基于MATE-TPC的12C + 12C反应事件分类和顶点重建的机器学习方法

Minghui Zhang, Xiaobin Li, Jie Chen, Ningtao Zhang, Fenhua Lu, Junrui Ma, Jiazhen Yan, Wanqin Tu, Xiaodong Tang, Bingshui Gao, Chengui Lu, Zhichao Zhang, Jinlong Zhang, Weiping Liu

发表机构 * College of Science, Southern University of Science and Technology（南方科技大学科学学院）； Institute of Modern Physics, Chinese Academy of Sciences（中国科学院现代物理研究所）； School of Nuclear Science and Technology, University of Chinese Academy of Sciences（中国科学院大学核科学与技术学院）； School of Physics, Peking University（北京大学物理学院）

AI总结采用ResNet和VGG等深度学习模型对12C + 12C反应事件进行分类，准确率达97%（模拟）和90%（实验），并利用CNN重建反应顶点。

详情

AI中文摘要

在现代核物理实验中，使用活性靶时间投影室（TPC）进行核反应研究时，识别感兴趣的事件具有挑战性。本工作采用机器学习技术分析来自名为MATE（用于核实验的多用途活性靶时间投影室）的TPC的12C + 12C聚变反应的复杂数据。具体来说，我们成功应用了残差神经网络（ResNet-50、ResNet-34和ResNet-18）和视觉几何组（VGG-19）对12C + 12C反应中的弹性散射和聚变反应事件进行分类。四个模型的分类结果几乎相同，模拟数据的准确率约为97%，实验数据的准确率约为90%。此外，这些方法成功识别了一些被传统方法误分类的事件。这些模型还应用于对不同聚变反应通道的事件进行分类，模拟数据的分类准确率约为95%。此外，开发了一个卷积神经网络（CNN）模型来重建反应顶点，为顶点重建提供了另一种策略。这些结果表明，机器学习技术可以有效分类不同通道的反应事件并重建反应顶点，从而为未来复杂核反应数据的分析铺平道路。

英文摘要

In modern nuclear physics experiments, identifying events of interest is challenging for nuclear reaction studies with the active target Time Projection Chamber (TPC). In this work, machine learning techniques are employed to analyze the complex data of the 12C + 12C fusion reaction from a TPC named MATE (multi-purpose active-target time projection chamber for nuclear experiments). Specifically, we successfully applied Residual Neural Network (ResNet-50, ResNet-34 and ResNet-18) and Visual Geometry Group (VGG-19) to classify elastic scattering and fusion reaction events from the 12C + 12C reaction. The classification results of the four models are nearly identical, with accuracies of approximately 97% for the simulated data and 90% for the experimental data. Moreover, these approaches successfully identify some events that are misclassified by traditional methods. These models are also applied to classify events from different fusion reaction channels, with classification accuracies of approximately 95% on simulated data. In addition, a Convolutional Neural Network (CNN) model is developed to reconstruct the reaction vertex, providing an alternative strategy for vertex reconstruction. These results indicate that machine learning techniques can effectively classify reaction events from different channels and reconstruct the reaction vertex, thereby paving the way for future analyses of complex nuclear reaction data.


2605.28295 2026-05-28 cs.AI cs.CL cs.LG 版本更新

Where Rollouts Begin: Low-Load, High-Leverage First-Token Diversification for RLVR

Rollouts 的起点：面向 RLVR 的低负载、高杠杆的首 token 多样化

Soeun Kim, Albert No

发表机构 * Department of Artificial Intelligence, Yonsei University（延世大学人工智能系）

AI总结本文提出 REFT 方法，通过在推理标记后的第一个 token 处进行均匀采样多样化，以低开销显著提升 RLVR 中 rollout 的多样性，从而改善推理模型的 Pass@k 性能。

详情

AI中文摘要

基于可验证奖励的强化学习（RLVR）无需标注轨迹即可训练推理模型，它依赖分组 rollout 将策略暴露于替代推理路径，并由验证器进行评分。Rollout 多样性因此成为 RLVR 的核心瓶颈，现有方法大多通过温度、前缀或 rollout 选择调整来拓宽探索。我们发现了一个结构上独特但被忽视的拓宽多样性的位置：推理标记后的第一个 token。策略的首 token 分布表现出尖锐峰值但正确性解耦的现象，且该首 token 位置可以拓宽 rollout 组覆盖的区域而不改变正确性信号。我们引入 REFT（基于首 token 多样化的 Rollout 探索），这是对 RLVR 流程的一个轻量级补充，它从策略自身的 top-$N$ 候选集中均匀采样首 token，并均匀分配 rollout，其他组件保持不变。在由此产生的多样化 rollout 上训练后，REFT 在四个基础模型（0.5B-7B）和三个难度级别上，相较于 DAPO 和 GRPO 基线，提升了聚合的 Pass@1、Pass@8 和 Pass@64。

英文摘要

Reinforcement Learning with Verifiable Rewards (RLVR) trains reasoning models without labeled trajectories, relying on grouped rollouts to expose the policy to alternative reasoning paths and a verifier to score them. Rollout diversity has accordingly emerged as a central bottleneck in RLVR, with most existing methods broadening exploration through temperature, prefix, or rollout-selection adjustments. We identify a structurally distinguished but overlooked position for broadening this diversity: the first token after the reasoning marker. The policy's first-token distribution exhibits a sharply peaked yet correctness-decoupled phenomenon, and this first token position can broaden the regions a rollout group covers without altering the correctness signal. We introduce REFT (Rollout Exploration with First-Token Diversification), a light addition to the RLVR pipeline that samples first tokens uniformly from the policy's own top-$N$ candidates and allocates rollouts evenly, leaving every other component unchanged. Trained on the resulting diversified rollouts, REFT improves aggregate Pass@1, Pass@8, and Pass@64 over DAPO and GRPO baselines across four base models (0.5B-7B) and three difficulty regimes.
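
The first-token allocation described above can be illustrated with a small sketch. It is not the authors' implementation: the function name `allocate_first_tokens`, the dictionary input, and the round-robin reading of "allocates rollouts evenly" are all assumptions made for illustration.

```python
def allocate_first_tokens(first_token_probs, top_n, num_rollouts):
    """Assign a forced first token to each rollout in a group.

    first_token_probs: hypothetical mapping from token to policy probability
    at the first position after the reasoning marker.
    The top-N tokens are used uniformly, ignoring how peaked the policy is.
    """
    # Rank tokens by policy probability and keep the top-N candidates.
    ranked = sorted(first_token_probs, key=first_token_probs.get, reverse=True)
    candidates = ranked[:top_n]
    # Round-robin assignment spreads the rollouts evenly over the candidates
    # (an assumed interpretation of "allocates rollouts evenly").
    return [candidates[i % len(candidates)] for i in range(num_rollouts)]


probs = {"The": 0.90, "Let": 0.06, "We": 0.03, "First": 0.01}
starts = allocate_first_tokens(probs, top_n=2, num_rollouts=4)
```

In this toy group, half of the rollouts would start from "Let" even though the policy assigns it only 6% probability, which is the kind of broadened coverage the abstract describes; the rest of each rollout would be sampled from the policy as usual.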


2605.28290 2026-05-28 cs.LG cs.GT stat.ML 版本更新

Adaptive Bandit Algorithms for Contextual Matching Markets

上下文匹配市场的自适应Bandit算法

Shiyun Lin, Simon Mauras, Vianney Perchet, Nadav Merlis

发表机构 * Center for Statistical Science, School of Mathematical Sciences, Peking University（北京大学数学科学学院统计科学中心）； INRIA, FairPlay Joint Team（INRIA，FairPlay联合团队）； Criteo AI lab, FairPlay Joint Team（Criteo AI实验室，FairPlay联合团队）； Technion - Israel Institute of Technology（以色列理工学院）

AI总结针对上下文匹配市场中的bandit学习问题，提出自适应算法，在随机和对抗性上下文下分别实现实例相关的多对数遗憾上界和次线性遗憾界。

Comments Accepted to ICML 2026

详情

AI中文摘要

我们研究匹配市场中的bandit学习，其中玩家和臂构成市场的两侧，玩家的效用与臂上下文呈线性关系。每一轮，新臂带着可观测的上下文到达。然后，算法将它们与玩家匹配，旨在最小化每个玩家相对于稳定匹配基准的遗憾。这种上下文结构带来了显著的复杂性：微妙的上下文偏移可能轻微改变一个玩家的效用，同时完全重构底层基准，导致其他玩家出现大的遗憾峰值。我们在两种设置下解决这个问题：随机上下文（从潜在分布中抽取）和对抗性上下文（可能是任意的）。对于随机情况，我们引入了一个新颖的最小偏好差距来捕捉学习难度，并提供了一种完全自适应的算法，具有实例相关的多对数遗憾上界。我们还在温和的分布假设下建立了匹配的实例无关遗憾上界和下界。对于对抗性设置，我们提出了一种在任意上下文下仍然有效的可处理遗憾概念，并通过自适应算法实现了实例无关的次线性遗憾界。

英文摘要

We study bandit learning in matching markets, where players and arms constitute the two market sides, and the players' utilities are linear in the arm contexts. In each round, new arms arrive with observable contexts. Then, the algorithm matches them to players, aiming to minimize each player's regret against a stable matching benchmark. This contextual structure creates significant complexity: subtle context shifts can slightly alter one player's utility while completely reconfiguring the underlying benchmark, causing large regret spikes for others. We address this in two settings: stochastic contexts, drawn from a latent distribution, and adversarial contexts, which may be arbitrary. For the stochastic case, we introduce a novel minimum preference gap to capture learning difficulty and provide a fully adaptive algorithm with an instance-dependent poly-logarithmic regret upper bound. We also establish matching instance-independent regret upper and lower bounds under a mild distributional assumption. For the adversarial setting, we propose a tractable regret notion that remains valid under arbitrary contexts and achieve an instance-independent sublinear regret bound via an adaptive algorithm.


2605.28287 2026-05-28 cs.LG cond-mat.mtrl-sci 版本更新

AtomComposer: Discovering Chemical Space from First Principles with Reinforcement Learning

AtomComposer: 基于强化学习从第一性原理发现化学空间

Bjarke Hastrup, Francois Cornet, Tejs Vegge, Arghya Bhowmik

发表机构 * Dept. of Energy Conversion and Storage, Technical University of Denmark（丹麦技术大学能源转换与存储系）； Dept. of Applied Mathematics and Computer Science, Technical University of Denmark（丹麦技术大学应用数学与计算机科学系）； Pioneer Center for Accelerating P2X Materials Discovery (CAPeX), Kgs. Lyngby, Denmark（加速P2X材料发现的先锋中心（CAPeX），Lyngby，丹麦）

AI总结提出AtomComposer，一种无需预训练数据、通过在线强化学习自主构建有效3D异构体的智能体，在未见化学式上发现的异构体数量比现有方法多一个数量级。

详情

AI中文摘要

在没有训练数据的情况下发现新型稳定分子仍然是一个重大的科学挑战。当前的分子生成模型是在大型预筛选数据集上训练的，这引入了偏差并限制了对新型化学的探索。相比之下，我们提出了一种新范式：能够无需任何预训练而映射广阔未知化学空间的自主、通用智能体。我们首次提出了AtomComposer，一个在化学计量约束下自主构建有效3D异构体，并仅通过在线强化学习进行训练的自我引导智能体。与通常过拟合特定化学式的现有方法不同，我们建立了一种多组分训练方案，使得在能量和有效性奖励的引导下，能够跨不同化学领域进行广泛泛化。我们的智能体在未见过的测试化学式上，能够发现比现有单组分强化学习基线（使用每步能量奖励训练）多一个数量级的有效异构体。这些结果实现了在线强化学习作为一种可扩展、从头探索化学构型空间的强大范式的承诺。

英文摘要

Discovering novel stable molecules without training data remains a grand scientific challenge. Current molecular generative models are trained on large, pre-curated datasets, which introduce biases and limit exploration of novel chemistry. In contrast, we propose a new paradigm: autonomous, generalized agents capable of mapping vast, unknown chemical spaces without any pretraining. For the first time, we present AtomComposer, a self-guided agent that autonomously constructs valid 3D isomers under stoichiometric constraints and is trained exclusively online using reinforcement learning. Unlike existing approaches that generally overfit to a specific chemical formula, we establish a multi-composition training scheme that enables a broad generalization across diverse chemistry, guided by energy- and validity-based rewards. Our agent can discover up to an order of magnitude more valid isomers on unseen test formulas than existing single-composition reinforcement-learning baselines trained with per-step energy rewards. These results fulfill the promise of online reinforcement learning as a powerful paradigm for scalable, from-scratch exploration of chemical configuration space.


2605.28276 2026-05-28 cs.LG 版本更新

Commit to the Bit: Reactive Reinforcement Learning Done Right

Onno Eberhard, Claire Vernade, Michael Muehlebach

发表机构 * Max Planck Institute for Intelligent Systems（马克斯·普朗克智能系统研究所）； University of Tübingen（图宾根大学）； University of Technology Nuremberg（纽伦堡技术大学）

AI总结针对确定性观测的有限环境，提出Committed Q-learning算法，在弱于$q_\star$-可实现性的rewire-鲁棒性假设下，证明其几乎必然收敛到最优反应策略。

详情

AI中文摘要

强化学习算法通常在马尔可夫假设下进行分析（和设计）。这是不现实的，因为实践中遇到的大多数环境要么是部分可观测的，要么需要函数近似，从而限制了智能体只能访问非马尔可夫状态特征。我们考虑在具有确定性观测（或等价地，硬状态聚合）的有限环境中学习最优反应策略的问题。我们引入了一种新算法，Committed Q-learning，并在一个称为rewire-鲁棒性的直观假设下证明了其几乎必然收敛到最优反应策略。该假设严格弱于先前工作中使用的$q_\star$-可实现性条件。我们的算法是经典Q-learning的一个变体，其中行为策略在进入一个特征时承诺执行单一动作，并且仅在观测到的特征变化时重新采样动作。我们分析的一个关键部分是引入了准马尔可夫环境。

英文摘要

Reinforcement learning algorithms are commonly analyzed (and designed) under the Markov assumption. This is unrealistic, as most environments encountered in practice are either partially observable, or require function approximation that restricts the agent to access non-Markovian state features. We consider the problem of learning an optimal reactive policy in a finite environment with deterministic observations (or equivalently, hard state aggregation). We introduce a new algorithm, Committed Q-learning, and prove almost-sure convergence to the optimal reactive policy under an intuitive assumption we call rewire-robustness. This assumption is strictly weaker than the $q_\star$-realizability condition used in prior work. Our algorithm is a variant of classical Q-learning in which the behavior policy commits to a single action upon entering a feature, and only resamples actions when the observed feature changes. A crucial part of our analysis is the introduction of quasi-Markov environments.
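
The committed behavior policy described above can be sketched as follows. This is a minimal illustration, assuming a tabular Q-function indexed by observed feature, epsilon-greedy resampling, and the standard Q-learning update; the abstract does not specify these details, and the names `select_action` and `q_update` are hypothetical.

```python
import random


def select_action(feature, prev_feature, prev_action, q_values, epsilon, rng):
    """Keep the committed action while the observed feature is unchanged."""
    if prev_action is not None and feature == prev_feature:
        return prev_action  # still inside the same feature: stay committed
    # The feature changed (or the episode just started): resample an action.
    row = q_values[feature]
    if rng.random() < epsilon:
        return rng.randrange(len(row))
    return max(range(len(row)), key=lambda a: row[a])


def q_update(q_values, feature, action, reward, next_feature, alpha, gamma):
    """Classical tabular Q-learning update (assumed here, not confirmed by the source)."""
    target = reward + gamma * max(q_values[next_feature])
    q_values[feature][action] += alpha * (target - q_values[feature][action])


rng = random.Random(0)
q = {0: [0.0, 1.0], 1: [2.0, 0.0]}
first = select_action(0, None, None, q, 0.0, rng)   # greedy choice in feature 0
held = select_action(0, 0, 0, q, 0.0, rng)          # committed to action 0
switched = select_action(1, 0, 1, q, 0.0, rng)      # feature changed: resample
q_update(q, 0, 1, reward=1.0, next_feature=1, alpha=0.5, gamma=0.9)
```

The second call returns the committed action 0 even though action 1 has the higher Q-value in feature 0, which is the only difference from an ordinary epsilon-greedy behavior policy in this sketch.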


2605.28269 2026-05-28 cs.LG stat.ME 版本更新

Dynamic Topic Modeling with a Higher-Order Hypergraphical Representation

基于高阶超图表示的动态主题建模

Hanjia Gao, Hanwen Ye, Qing Nie, Annie Qu

发表机构 * Department of Statistics, University of California, Irvine（加州大学欧文分校统计学系）； Department of Mathematics and Department of Developmental & Cell Biology, University of California, Irvine（加州大学欧文分校数学系及发育与细胞生物学系）； Department of Statistics and Applied Probability, University of California, Santa Barbara（加州大学圣巴巴拉分校统计学与应用概率系）

AI总结针对传统主题模型忽略词间高阶交互和动态语料中语义重叠的问题，提出超图表示文本并构建动态主题建模框架，通过结构化低秩分解和时间正则化实现，理论保证收敛性和误差界，实验优于现有模型。

Comments 34 pages, 4 figures

详情

AI中文摘要

动态主题建模被广泛用于分析科学文献、医疗记录和社交媒体中的演变趋势。传统主题模型通过多项单纯形上的单个概率向量表示每个主题，并将词的出现和重复隐式耦合在一个概率机制中。然而，这种表述限制了词之间的依赖结构，并忽略了信息丰富的高阶交互，特别是在具有重叠语义的动态语料中。为了解决这些局限性，我们引入文本的超图表示，其中每个文档被建模为一个连接所有共现词的超边，重复强度编码为节点权重。这种表示自然地将词的出现与重复分开，并引入了一种新颖的基于超图的多项分布，其非线性归一化取决于每个文档的观测词集。基于此似然，我们通过结构化低秩分解和主题-词轮廓上的显式时间正则化，开发了一个动态主题建模框架。此外，尽管双线性分解和文档特定的非线性归一化导致了内在的非凸性，我们仍建立了局部收敛保证并推导了非渐近误差界。在合成数据上的数值实验以及在国际学习表征会议（ICLR）语料库上的应用表明，该方法比现有的基于多项式的主题模型具有一致的改进。

英文摘要

Dynamic topic modeling is widely used to analyze evolving trends in scientific literature, medical records, and social media. Traditional topic models represent each topic through a single probability vector on the multinomial simplex and implicitly couple word occurrence and repetition within one probabilistic mechanism. However, this formulation restricts the dependence structure among words and overlooks informative higher-order interactions, particularly in dynamic corpora with overlapping semantics. To address these limitations, we introduce a hypergraph representation of text where each document is modeled as a hyperedge connecting all co-occurring words, with repetition intensities encoded as node weights. This representation naturally separates word occurrence from repetition and induces a novel hypergraph-based multinomial distribution with a nonlinear normalization depending on the observed word set of each document. Building on this likelihood, we develop a dynamic topic modeling framework via structured low-rank factorizations with explicit temporal regularization on topic-word profiles. Moreover, we establish local convergence guarantees and derive non-asymptotic error bounds despite the intrinsic nonconvexity induced by bilinear factorization and document-specific nonlinear normalization. Numerical experiments on synthetic data and an application to the International Conference on Learning Representations (ICLR) corpus demonstrate consistent improvements over existing multinomial-based topic models.


2605.28267 2026-05-28 cs.LG stat.ML 版本更新

Parameter-Efficient Generative Modeling with Controlled Vector Fields

基于受控向量场的参数高效生成建模

Peyman Morteza

发表机构 * Department of Computer Sciences, University of Wisconsin, Madison, WI, 53706（威斯康星大学麦迪逊分校计算机科学系）

AI总结提出一种基于Chow-Rashevskii定理的连续时间生成建模框架，通过少量固定向量场和学习的标量控制构建表达流，实现参数高效的分布变换。

详情

AI中文摘要

受Chow-Rashevskii定理启发，我们引入了一个连续时间生成建模框架，该框架从一小组固定向量场和学习的标量控制中构建表达流。我们的框架不是学习无约束的高维向量场，而是通过学习标量控制函数来调制固定向量场，从而构造速度。当固定场是括号生成时，它们的李代数张成整个空间，提供了一种仅用少量学习控制通道即可实现表达性传输的机制，并为标准向量场参数化提供了一种参数高效的几何替代方案。这种解耦公式产生了一个结构化和可解释的生成模型，其中学习的标量输出通道的数量可以独立于环境维度选择。我们制定了一个表达性原则，表明在适当的可控性和适定性假设下，这种受控流可以将源分布传输到目标分布。我们使用连续归一化流似然目标训练所得模型，并在合成分布上进行了概念验证实验。

英文摘要

We introduce a continuous-time generative modeling framework, motivated by the Chow-Rashevskii theorem, that builds expressive flows from a small set of fixed vector fields and learned scalar controls. Instead of learning an unconstrained high-dimensional vector field, our framework constructs the velocity by modulating fixed vector fields with learned scalar control functions. When the fixed fields are bracket-generating, their Lie algebra spans the ambient space, providing a mechanism for expressive transport with only a small number of learned control channels and offering a parameter-efficient geometric alternative to standard vector-field parameterizations. This decoupled formulation yields a structured and interpretable generative model in which the number of learned scalar output channels can be chosen independently of the ambient dimension. We formulate an expressivity principle showing that, under suitable controllability and well-posedness assumptions, such controlled flows can transport a source distribution to a target distribution. We train the resulting model using a continuous-normalizing-flow likelihood objective and present proof-of-concept experiments on synthetic distributions.
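
To make the bracket-generating idea concrete, the sketch below uses two fixed vector fields on R^3 (the classical Heisenberg example, chosen here for illustration and not taken from the paper) whose Lie bracket is (0, 0, 1). The velocity is built as u1 * f1 + u2 * f2. The controls are hand-picked piecewise constants standing in for the learned scalar controls, and the integrator is plain Euler.

```python
def f1(x, y, z):
    """First fixed field: moves along x and couples into z."""
    return (1.0, 0.0, -y / 2.0)


def f2(x, y, z):
    """Second fixed field: moves along y and couples into z."""
    return (0.0, 1.0, x / 2.0)


def integrate(state, phases, duration, steps=100):
    """Euler-integrate dx/dt = u1 * f1(x) + u2 * f2(x) over each control phase."""
    dt = duration / steps
    x, y, z = state
    for u1, u2 in phases:
        for _ in range(steps):
            a = f1(x, y, z)
            b = f2(x, y, z)
            x, y, z = (
                x + dt * (u1 * a[0] + u2 * b[0]),
                y + dt * (u1 * a[1] + u2 * b[1]),
                z + dt * (u1 * a[2] + u2 * b[2]),
            )
    return (x, y, z)


# A square loop in the two control channels: +f1, +f2, -f1, -f2.
loop = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
end = integrate((0.0, 0.0, 0.0), loop, duration=1.0)
```

The loop returns x and y to the origin but leaves a net displacement along z, a direction that neither field points in at the origin. Two scalar control channels therefore reach all three dimensions, which is the mechanism the abstract attributes to bracket-generating fields.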


2605.28251 2026-05-28 stat.ML cs.CY cs.LG 版本更新

Counterfactually Fair Regression via Optimal Transport

通过最优传输实现反事实公平回归

M. Generali Lince, S. Gaucher, J-J. Vie, P. Loiseau

发表机构 * Inria, Soda team（Inria，Soda团队）； Inria, FairPlay joint team（Inria，FairPlay联合团队）； CREST, ENSAE, IP Paris（CREST，ENSAE，巴黎理工学院）； CMAP, CNRS, Ecole Polytechnique, Institut Polytechnique de Paris（CMAP，CNRS，巴黎综合理工学院，巴黎理工学院）

AI总结本文采用因果不确定性视角，通过重采样噪声定义反事实公平性，提出基于最优传输的后处理估计器，并证明其有限样本公平性保证和风险界。

详情

AI中文摘要

我们考虑学习一个反事实公平回归器的问题。我们采用因果不确定性视角，其中反事实公平性通过重采样噪声定义。我们专注于为一种新的后处理估计器获得理论公平性保证。我们首先证明反事实公平性等价于满足以潜在变量为条件的群体均等。这使我们能够通过重心分位数映射提供最优公平回归器的闭式表达式。为了处理连续潜在变量，我们提出了一种离散化的后处理方法。然后，在温和的正则性假设下，我们证明了我们的估计器具有高概率的有限样本公平性保证，不公平性衰减率为 $\tilde O(n^{-1/3})$，并建立了匹配的风险界 $\tilde O(n^{-1/3})$。我们给出了几乎公平预测的过剩风险的匹配下界。最后，我们将结果扩展到宽松反事实公平性的设置。我们在真实世界和合成数据上验证了我们的方法。

英文摘要

We consider the problem of learning a counterfactually fair regressor. We adopt a causal uncertainty view in which counterfactual fairness is defined with resampled noise. We focus on obtaining theoretical fairness guarantees for a new post-processing estimator. We begin by showing that counterfactual fairness is equivalent to satisfying demographic parity conditional on the latent variable. This allows us to provide a closed-form expression of the optimal fair regressor via a barycentric quantile map. In order to handle continuous latent variables, we propose a discretized post-processing method. Then, under mild regularity assumptions, we prove high-probability finite-sample fairness guarantees for our estimator, providing an unfairness decay at rate $\tilde O(n^{-1/3})$, and establishing a matching risk bound of order $\tilde O(n^{-1/3})$. We provide a matching lower bound on the excess risk of almost fair predictions. Finally, we extend our results to the setting of relaxed counterfactual fairness. We validate our approach on real-world and synthetic data.
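
A barycentric quantile map can be sketched on empirical samples as follows. This is only an illustration under simplifying assumptions: it shows the map within a single bin of the latent variable (the paper conditions on a discretized latent variable, which is omitted here), it uses plain empirical CDFs and quantiles, and the function names are hypothetical.

```python
import math
from bisect import bisect_right


def empirical_cdf(sorted_scores, value):
    """Fraction of samples less than or equal to `value`."""
    return bisect_right(sorted_scores, value) / len(sorted_scores)


def empirical_quantile(sorted_scores, level):
    """Smallest sample whose empirical CDF is at least `level`."""
    n = len(sorted_scores)
    index = min(n - 1, max(0, math.ceil(level * n) - 1))
    return sorted_scores[index]


def fair_score(score, group, scores_by_group):
    """Map a raw score to the weighted average of group quantiles at its rank."""
    sorted_by_group = {g: sorted(v) for g, v in scores_by_group.items()}
    total = sum(len(v) for v in sorted_by_group.values())
    # Rank of the score within its own group's score distribution.
    level = empirical_cdf(sorted_by_group[group], score)
    # Barycenter: average the group quantile functions, weighted by group size.
    return sum(
        len(v) / total * empirical_quantile(v, level)
        for v in sorted_by_group.values()
    )


scores = {"a": [1.0, 2.0, 3.0, 4.0], "b": [11.0, 12.0, 13.0, 14.0]}
fair_a = fair_score(2.0, "a", scores)
fair_b = fair_score(12.0, "b", scores)
```

Both individuals sit at the median of their own group, so they receive the same post-processed score of 7.0. Equal ranks mapping to equal outputs is what makes the output distribution identical across groups.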


2605.28247 2026-05-28 cs.LG cs.AI 版本更新

IRDS: Interpretable RLVR Data Selection via Verifier-Coupled Sparse Autoencoder Coverage

IRDS: 通过验证器耦合的稀疏自编码器覆盖实现可解释的RLVR数据选择

Yuhan Li, Mingxu Zhang, Dazhong Shen, Ying Sun

发表机构 * The Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））； Nanjing University of Aeronautics and Astronautics（南京航空航天大学）； The 63rd Research Institute, National University of Defense Technology, Nanjing（国防科技大学第六十三研究所，南京）

AI总结提出IRDS方法，基于稀疏自编码器簇和验证器耦合的覆盖目标，选择模型失败但可学习的RLVR训练实例，提升数学推理准确率并降低计算成本。

Comments 24 pages,3 figures,18 tables

详情

AI中文摘要

基于可验证奖励的强化学习（RLVR）已成为增强LLM推理能力的关键技术，但其数据效率低下仍是一个主要瓶颈。现有方法仅部分解决此问题，各自至少缺少子集级覆盖、验证器信号使用或可解释性中的一项。为弥补这一空白，我们提出了IRDS（可解释的RLVR数据选择），该方法在稀疏自编码器（SAE）簇的基础上选择RLVR训练实例，使得选择本身在可识别的问题模式上是可审计的。为了选择模型既失败又能从中学习的实例，我们在SAE基础上引入了一个验证器耦合的覆盖目标，并通过贪心对数行列式最大化来求解。在三个指令微调模型和六个数学推理基准上的实验表明，IRDS实现了最高的整体准确率，在Qwen两个模型上超过最强基线+3.9/+4.0个百分点，在Llama-3.1-8B上超过+0.5个百分点，同时运行成本比基于轨迹的基线低一个数量级。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) has become a key technique for enhancing LLM reasoning, yet its data inefficiency remains a major bottleneck. Existing methods address this problem only partially, each missing at least one of subset-level coverage, verifier signal use, or interpretability. To address this gap, we present IRDS (Interpretable RLVR Data Selection), which selects RLVR training instances on a sparse autoencoder (SAE) cluster basis so the selection itself is auditable on recognizable problem motifs. To select instances the model both fails on and can still learn from, we introduce a verifier-coupled coverage objective on the SAE basis and solve it by greedy log-determinant maximization. Experiments on three instruction-tuned models and six math reasoning benchmarks show that IRDS achieves the highest overall accuracy, exceeding the strongest baseline by +3.9/+4.0 pp on the two Qwen models and by +0.5 pp on Llama-3.1-8B, while running an order of magnitude cheaper than the trajectory-based baseline.
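Greedy log-determinant maximization, as named in the abstract, can be sketched generically. The version below is not the IRDS objective: it assumes each instance has a feature vector (standing in for SAE cluster activations) and a scalar weight (standing in for a verifier-derived signal), and it greedily grows the subset whose weighted Gram matrix has the largest log-determinant. The names `features`, `scores`, and `ridge` are hypothetical.

```python
import math


def log_det(matrix):
    """Log-determinant of a small symmetric positive-definite matrix via Cholesky."""
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return float("-inf")  # not positive definite
                chol[i][i] = math.sqrt(s)
                total += 2.0 * math.log(chol[i][i])
            else:
                chol[i][j] = s / chol[j][j]
    return total


def greedy_logdet_select(features, scores, budget, ridge=1e-6):
    """Greedily pick items that maximize log det of the score-weighted Gram matrix."""

    def kernel(i, j):
        dot = sum(a * b for a, b in zip(features[i], features[j]))
        return scores[i] * dot * scores[j] + (ridge if i == j else 0.0)

    selected = []
    for _ in range(min(budget, len(features))):
        best, best_val = None, float("-inf")
        for cand in range(len(features)):
            if cand in selected:
                continue
            idx = selected + [cand]
            val = log_det([[kernel(a, b) for b in idx] for a in idx])
            if val > best_val:
                best, best_val = cand, val
        selected.append(best)
    return selected


# Items 0 and 1 are duplicates, so a diverse pair must include item 2.
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
picked = greedy_logdet_select(features, [1.0, 1.0, 1.0], budget=2)
```

The log-determinant rewards subsets whose weighted feature vectors span a large volume, so near-duplicate instances add little and low-weight instances are deprioritized.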


2605.28233 2026-05-28 stat.ML cs.CY cs.LG 版本更新

Geometry of Relaxed Fair Regression: A Unified Framework for Aware and Unaware Settings

松弛公平回归的几何：统一感知与无感知设置框架

M. Generali Lince, V. Divol, R. Flamary, S. Gaucher, P. Loiseau

发表机构 * Inria, Soda team（Inria，Soda团队）； CREST, ENSAE, IP Paris（CREST，ENSAE，IP巴黎）； Inria, Fairplay joint team（Inria，Fairplay联合团队）； CMAP, CNRS, Ecole Polytechnique, Institut Polytechnique de Paris（CMAP，CNRS，Polytechnique学院，巴黎理工学院）

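AI summary: This paper uses optimal transport theory to unify fair regression in the aware and unaware settings, and proposes an algorithm based on Wasserstein-2 and Total Variation penalties that yields accurate predictions under relaxed fairness constraints.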

详情

AI中文摘要

公平-准确权衡是部署公平感知机器学习方法的核心问题。当敏感属性在推理时不可用——即所谓的无感知设置时，在松弛公平约束下获得准确预测的原则性方法基本缺失。在这项工作中，我们通过将人口统计平价惩罚下的回归问题表述为最优传输问题来填补这一空白。我们的框架统一了感知和无感知设置，并通过最优传输映射刻画了在平方Wasserstein-2和全变差惩罚下的最优预测函数。这些结果表明，惩罚的选择反映了根本不同的公平哲学：Wasserstein惩罚诱导出平滑的、群体范围内的妥协，而全变差惩罚则对个体子集强制执行精确的平价。基于这些理论刻画，我们提出了一种易于实现、计算高效且在实际基准测试中始终匹配或超越最先进基线的算法。

英文摘要

Fairness-accuracy trade-offs are a central concern in the deployment of fairness-aware machine learning methods. When sensitive attributes are unavailable at inference time (the so-called unawareness setting), principled methods for obtaining accurate predictions under relaxed fairness constraints are largely missing. In this work, we address this gap by formulating regression under a demographic parity penalty as an optimal transport problem. Our framework unifies the aware and unaware settings and characterizes optimal prediction functions via optimal transport maps, under both squared Wasserstein-2 and Total Variation penalties. These results reveal that the choice of penalty reflects fundamentally different fairness philosophies: the Wasserstein penalty induces a smooth, population-wide compromise, while Total Variation enforces exact parity for a subset of individuals. Building on these theoretical characterizations, we propose an algorithm that is simple to implement, computationally efficient, and consistently matches or outperforms state-of-the-art baselines on real-world benchmarks.


2605.28231 2026-05-28 cs.RO cs.LG 版本更新

ProgVLA: Progress-Aware Robot Manipulation Skill Learning

ProgVLA：进度感知的机器人操作技能学习

Seungsu Kim, Jinyoung Choi, Seungmin Baek, Jean-Michel Renders

发表机构 * NAVER LABS（NAVER实验室）； NAVER LABS Europe（NAVER实验室欧洲）

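AI summary: Proposes ProgVLA, a compact vision-language-action model that combines an explicit representation of task progress with a two-stage Perceiver resampling scheme to process long multi-modal sequences under limited compute and memory, matching or exceeding much larger models on multi-task manipulation benchmarks.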

详情

AI中文摘要

我们提出了ProgVLA，一种紧凑的视觉-语言-动作（VLA）模型，专为在严格的计算和内存预算下进行可靠的机器人操作而设计。该模型特别关注通过维护任务进度的显式表示来高效处理长多模态序列。为此，ProgVLA集成了两个关键组件。首先，一个带有两阶段Perceiver重采样方案的多模态编码器将可变长度的视觉、语言和本体感受流压缩为一组固定的控制就绪上下文令牌，在保持跨模态基础的同时大幅减少序列长度。其次，一组辅助的进度头通过离线强化学习（RL）目标进行训练，以联合学习针对归一化剩余水平目标的批评者。这为策略提供了任务进度的内部估计，并实现了优势加权和成功加权的流匹配模仿学习。在两个成熟的多任务机器人操作基准上，一个0.1B参数的ProgVLA模型达到了与显著更大的预训练基线相当的成功率，并且在长时域和更困难的任务层级上超过了它们。消融实验表明，学习到的上下文重采样器和任务自适应视觉微调是最大的单一贡献者，而进度感知训练提供了集中在长时域和多对象任务上的一致额外增益。我们还在真实世界的玩具厨房环境中进一步验证了该方法。

英文摘要

We present ProgVLA, a compact vision-language-action (VLA) model designed for reliable robot manipulation under tight compute and memory budgets. The model specifically focuses on efficiently processing long multi-modal sequences by maintaining an explicit representation of task progress over extended horizons. To this end, ProgVLA integrates two key components. First, a multi-modal encoder with a two-stage Perceiver resampling scheme compresses variable-length visual, language, and proprioceptive streams into a fixed set of control-ready context tokens, substantially reducing sequence length while preserving cross-modal grounding. Second, an auxiliary set of progress heads is trained with offline reinforcement learning (RL) objectives to jointly learn critics over normalized remaining-horizon targets. This provides the policy with an internal estimate of task progress and enables advantage- and success-weighted flow-matching imitation learning. On two well-established multi-task robot manipulation benchmarks, a 0.1B-parameter ProgVLA model reaches success rates that are competitive with, and on long-horizon and harder task tiers exceed, substantially larger pretrained baselines. Ablations indicate that the learned context resampler and task-adaptive visual fine-tuning are the largest single contributors, while progress-aware training provides a consistent additional gain that is concentrated on long-horizon and multi-object tasks. We further validate the approach in real-world toy-kitchen environments.


2605.28226 2026-05-28 cs.LG 版本更新

PhAME: Phenotype-Aware Molecular Editing via Latent Diffusion

PhAME: 基于表型感知的潜在扩散分子编辑

Łukasz Janisiów, Sebastian Musiał, Bartosz Zieliński, Dawid Rymarczyk, Tomasz Danel

发表机构 * Faculty of Mathematics and Computer Science, Jagiellonian University（雅盖隆大学数学与计算机科学学院）； Doctoral School of Exact and Natural Sciences, Jagiellonian University（雅盖隆大学精确与自然科学博士学院）； Jagiellonian Center for Artificial Intelligence, Jagiellonian University（雅盖隆大学雅盖隆人工智能中心）； Faculty of Chemistry, Jagiellonian University（雅盖隆大学化学系）； Ardigen SA（Ardigen公司）

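AI summary: Proposes PhAME, a framework that uses a latent diffusion model to edit molecules in the latent space of a pretrained graph VAE; a compositional classifier-free guidance mechanism jointly steers phenotype conditioning and structural similarity, giving multi-objective molecular optimization with high chemical validity and novelty.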

详情

AI中文摘要

小分子药物发现需要同时优化候选分子的众多属性。这些属性可以通过分析高维生物特征（如细胞形态和转录组扰动）来研究，这些特征提供了对潜在生物机制的丰富视角。然而，现有的使用这些特征进行优化的生成方法未能满足两个关键要求：在保持与已知先导物结构接近的同时，提供朝向期望表型特征的精确引导。我们引入了PhAME（表型感知分子编辑），这是一种潜在扩散框架，通过将分子优化重新定义为预训练图基VAE潜在空间中的编辑来克服这一挑战。我们的核心贡献是一种具有两个独立尺度的组合无分类器引导方案，一个用于表型条件，另一个用于与种子结构的相似性，允许从业者控制这两个目标之间的权衡。在包括对接分数优化和多模态表型生成在内的多个基准测试中的实证评估表明，PhAME在保持高化学有效性和新颖性的同时实现了最先进的结果。

英文摘要

Small-molecule drug discovery requires simultaneous optimization of numerous properties of candidate molecules. These properties can be investigated through the analysis of high-dimensional biological signatures, such as cell morphology and transcriptomic perturbations, which provide a rich perspective on the underlying biological mechanisms. However, existing generative methods, which use those signatures for optimization, fail to meet two key requirements: providing precise guidance toward desired phenotypic signatures while maintaining structural proximity to a known hit. We introduce PhAME (Phenotype-Aware Molecular Editing), a latent diffusion framework that overcomes this challenge by recasting molecular optimization as editing in the latent space of a pretrained graph-based VAE. Our central contribution is a compositional classifier-free guidance scheme with two independent scales, one for the phenotype-conditioning and one for similarity to the seed structure, allowing practitioners to control the tradeoff between these two objectives. Empirical evaluations across diverse benchmarks, including docking score optimization and multimodal phenotypic generation, demonstrate that PhAME achieves state-of-the-art results while maintaining high chemical validity and novelty.
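A guidance scheme with two independent scales can be pictured with the usual compositional form of classifier-free guidance, in which each condition contributes its own scaled offset from the unconditional prediction. The sketch below assumes that form; the abstract does not give the exact formula PhAME uses, and the names `eps_uncond`, `eps_pheno`, `eps_seed`, `w_pheno`, and `w_seed` are hypothetical.

```python
def compose_guidance(eps_uncond, eps_pheno, eps_seed, w_pheno, w_seed):
    """Combine denoiser outputs with one guidance scale per condition.

    eps_uncond: prediction with no conditioning
    eps_pheno:  prediction conditioned on the target phenotype only
    eps_seed:   prediction conditioned on the seed structure only
    """
    return [
        u + w_pheno * (p - u) + w_seed * (s - u)
        for u, p, s in zip(eps_uncond, eps_pheno, eps_seed)
    ]


# Toy two-dimensional predictions (illustrative values only).
eps_uncond = [0.0, 1.0]
eps_pheno = [1.0, 3.0]
eps_seed = [2.0, 0.0]
guided = compose_guidance(eps_uncond, eps_pheno, eps_seed, w_pheno=2.0, w_seed=0.5)
```

Raising `w_pheno` pushes samples toward the phenotype condition, while raising `w_seed` keeps them closer to the seed structure; setting both to zero recovers the unconditional prediction.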


2605.28222 2026-05-28 cs.CL cs.IR cs.LG 版本更新

Analyzing Quality-Latency-Resource Trade-offs in a Technical Documentation RAG Assistant Using LoRA Adaptation

使用LoRA适配分析技术文档RAG助手中的质量-延迟-资源权衡

Evgenii Palnikov, Elizaveta Gavrilova

发表机构 * HSE University（俄罗斯高等经济大学）

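AI summary: This study analyzes the trade-offs among quality, latency, and resources for LoRA adapters in a RAG system, and finds that configurations adapting only the q and v attention projections dominate the Pareto front.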

Comments 13-page main body plus extended appendix; 6 figures; benchmark, LoRA adapters, and code at https://github.com/EugPal/rag-lora-tradeoffs

详情

AI中文摘要

我们研究了在基于文档的检索增强生成（RAG）系统中使用生成器的低秩适配（LoRA）时的质量-延迟-资源权衡。我们构建了一个包含5,144个问答对的手动验证基准测试，这些问答对基于官方Kubernetes文档，并将其与固定的混合检索流水线（BGE-M3密集、BGE-M3原生稀疏、互惠排名融合、交叉编码器重排序）结合。在此基准测试上，我们在Llama-3.2-3B-Instruct和Llama-3.1-8B-Instruct上对20种LoRA配置进行了消融实验，涉及秩和目标模块的选择，并评估了每个配置的token级F1、LLM判断的接地性和正确性（pass@4）、推理延迟、推理内存和训练成本，所有结果均附有bootstrap 95%置信区间。帕累托分析表明，仅作用于q和v注意力投影的LoRA适配器始终主导前沿，而3B/8B的选择主要定义了操作区间。参数匹配的控制比较进一步表明，q/v优势是结构性的而非纯粹参数性的。基准测试、选定的适配器和代码可在https://github.com/EugPal/rag-lora-tradeoffs获取。

英文摘要

We study quality-latency-resource trade-offs in a documentation-grounded retrieval-augmented generation (RAG) system that uses Low-Rank Adaptation (LoRA) of the generator. We build a manually verified benchmark of 5,144 question-answer pairs over the official Kubernetes documentation and combine it with a fixed hybrid-retrieval pipeline (BGE-M3 dense, BGE-M3 native sparse, Reciprocal Rank Fusion, cross-encoder reranking). Over this benchmark we ablate 20 LoRA configurations on Llama-3.2-3B-Instruct and Llama-3.1-8B-Instruct across rank and target-module choices, and evaluate each on token-level F1, LLM-judged groundedness and correctness (pass@4), inference latency, inference memory, and training cost, all reported with bootstrap 95% confidence intervals. Pareto analysis shows that LoRA adapters acting only on the q and v attention projections consistently dominate the front, while the 3B/8B choice mainly defines operating regime. A param-matched control comparison further indicates that the q/v advantage is structural rather than purely parametric. The benchmark, selected adapters, and code are available at https://github.com/EugPal/rag-lora-tradeoffs.
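Reciprocal Rank Fusion, the step that merges the dense and sparse result lists in the retrieval pipeline, scores each document by summing 1 / (k + rank) over the lists in which it appears. The sketch below is a generic implementation; the smoothing constant `k = 60` is a commonly used default and an assumption here, since the abstract does not state the value used, and the document IDs are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    rankings: list of lists, each ordered from most to least relevant
    k:        smoothing constant that damps the influence of top ranks
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending fused score; break ties by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a dense and a sparse retriever.
dense_hits = ["a", "b", "c"]
sparse_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Because only ranks are used, the fusion needs no calibration between dense similarity scores and sparse lexical scores; the fused list would then be passed to the cross-encoder reranker.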


2605.28219 2026-05-28 cs.HC cs.AI cs.LG 版本更新

SmartIterator: Visual Analytics Workflows for Supervising Unsupervised Data Grouping

SmartIterator: 监督无监督数据分组的可视化分析工作流

Gennady Andrienko, Natalia Andrienko

发表机构 * Fraunhofer Institute IAIS（弗劳恩霍夫研究所IAIS）； Lamarr Institute for Machine Learning and Artificial Intelligence（拉马尔机器学习与人工智能研究所）； City St George’s, University of London（伦敦大学城市圣乔治学院）

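AI summary: Proposes SmartIterator, a visual analytics approach whose six-phase workflows and IteraScope coordinated views support systematic exploration of grouping results across a parameter sweep, helping users understand data structure and make informed decisions.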

详情

AI中文摘要

无监督学习方法——主题建模、基于划分和基于密度的聚类——在没有人类指导的情况下产生数据分组，但选择和评估这些分组本身不应是无监督的。我们提出了\emph{SmartIterator}（SI），一种可视化分析方法，将参数扫描中分组结果的完整序列视为一等分析对象。对于每个方法族，SI提供了一个结构化的六阶段工作流，引导分析师系统地探索分组结果——从质量指标概览，经过过渡稳定性评估、成员置信度评估、内容和上下文检查、循环原型验证，到知情决策——在此过程中逐步建立对数据结构的累积理解。这些工作流通过\emph{IteraScope}（IS）实现，这是一个协调的可视化显示，结合了质量指标图表与语义颜色编码、带有桑基式过渡流和成员置信度小提琴图的一维组嵌入、带有HDBSCAN检测的循环原型的二维组嵌入（突出显示捕获所有持久模式的迭代），以及用于上下文解释的特定领域链接视图。我们在以下三个场景中演示了这些工作流：（1）来自VAST Challenge 2011的模拟社交媒体消息（基于密度的聚类，根据真实情况进行验证），（2）约1500个NUTS-3区域的欧盟人口统计数据（基于划分的聚类），以及（3）30年的IEEE VIS论文（NMF主题建模）。这些工作流构成了主要贡献：它们提供了可操作的、针对特定方法的指导，用于导航参数空间、研究数据结构如何随配置变化，以及将分析理解扎根于领域背景——从而产生关于数据的知识，这是任何单个“最佳”结果都无法提供的。

英文摘要

Unsupervised learning methods -- topic modeling, partition-based and density-based clustering -- produce data groupings without human guidance, yet choosing and evaluating those groupings should not itself be unsupervised. We present SmartIterator (SI), a visual analytics approach that treats the full sequence of grouping results across a parameter sweep as a first-class analytical object. For each method family, SI provides a structured six-phase workflow that guides the analyst through systematic exploration of grouping results -- from quality-metric overview through transition-stability assessment, membership-confidence evaluation, content and context inspection, and recurrent-archetype verification to an informed decision -- building cumulative understanding of data structure along the way. The workflows are operationalized through IteraScope (IS), a coordinated visual display combining quality-metric charts with semantic color encoding, a 1D group embedding with Sankey-style transition flows and violin plots of membership confidence, a 2D group embedding with HDBSCAN-detected recurrent archetypes that highlights iterations capturing all persistent patterns, and domain-specific linked views for contextualized interpretation. We demonstrate the three workflows on: (1) simulated social-media messages from the VAST Challenge 2011 (density-based clustering, validated against ground truth), (2) EU population statistics across approximately 1,500 NUTS-3 regions (partition-based clustering), and (3) 30 years of IEEE VIS papers (NMF topic modeling). The workflows constitute the main contribution: they provide actionable, method-specific guidance for navigating parameter spaces, studying how data structure evolves across configurations, and grounding analytical understanding in domain context -- yielding knowledge about the data that no single "best" result can provide.


2605.28215 2026-05-28 cs.AI cs.CL cs.LG cs.LO cs.MA 版本更新

Explaining is Harder Than Predicting Alone: Evaluating Concept-based Explanations of MLLMs as ICL Visual Classifiers

解释比单独预测更难：评估基于概念的MLLM解释作为ICL视觉分类器

Carmen Quiles-Ramírez, Leticia L. Rodríguez, Nicolás Martorell, Natalia Díaz-Rodríguez

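AI summary: This paper systematically evaluates concept-based explainability of multimodal large language models under few-shot in-context learning, using five conditions of increasing formal rigour; it finds that explaining is harder than predicting and that forcing formal explanations lowers predictive accuracy.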

Comments Accepted to the CompLearn Workshop at ICML 2026

详情

AI中文摘要

上下文学习（ICL）使多模态大语言模型（MLLM）能够从少量标记示例中对图像进行分类。然而，这些模型如何使用提供的上下文仍然不透明。虽然思维链提示被广泛使用，但最近的研究认为它可能不反映真实的内部计算。在本文中，我们通过五种形式化程度递增的条件（从基线分类到描述逻辑（DL）公理生成）系统评估了冻结MLLM在少样本ICL下的基于概念的可解释性。通过独立的LLM-as-a-judge流水线评估四个最先进的MLLM，我们证明解释确实比单独预测更难。令人惊讶的是，强制模型生成形式化结构的基于概念的解释会单调降低预测准确性（从93.8%降至90.1%），这与显式推理普遍有助于性能的假设相矛盾。然而，当模型成功表达类别判别性视觉特征时，解释质量与正确预测强相关。我们的发现表明，虽然MLLM在视觉分类方面表现出色，但它们缺乏形式化、机器可验证的可解释性所需的特定指令微调。

英文摘要

In-context learning (ICL) enables multimodal large language models (MLLMs) to classify images from a few labelled examples. Yet, how these models use the provided context remains opaque. While Chain-of-Thought prompting is widely used, recent work argues that it may not reflect true internal computation. In this paper, we systematically evaluate the concept-based explainability of frozen MLLMs under few-shot ICL using five conditions of increasing formal rigour, ranging from baseline classification to Description Logics (DL) axiom generation. Evaluating four state-of-the-art MLLMs via an independent LLM-as-a-judge pipeline, we demonstrate that explaining is genuinely harder than predicting alone. Surprisingly, forcing models to generate formally structured, concept-based explanations degrades predictive accuracy monotonically (from 93.8% to 90.1%), contradicting the assumption that explicit reasoning universally aids performance. However, when models successfully articulate class-discriminative visual features, explanation quality strongly correlates with correct predictions. Our findings suggest that while MLLMs excel at visual classification, they lack the specific instruction-tuning required for formal, machine-verifiable explainability.


2605.28214 2026-05-28 cs.CR cs.LG cs.MA 版本更新

Out of Sight, Not Out of Mind: Unveiling Latent Attack in Latent-based Multi-Agent Systems

眼不见，心未忘：揭示基于潜在表示的多智能体系统中的潜在攻击

Chenxi Wang, Ruiyang Huang, Jiayan Sun, Lei Wei, Yifan Wu

发表机构 * Southeast University（东南大学）； Peking University（北京大学）

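AI summary: Studies whether latent representations can carry attack information and proposes a framework that reactivates attack effects through latent interventions; experiments show that latent attacks substantially degrade task performance in clean executions, especially when applied to inter-agent KV-cache handoffs.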

Comments 27 pages, 7 figures, 3 tables. Preprint

详情

AI中文摘要

基于潜在的多智能体系统用隐藏表示替代部分显式智能体间通信，为高效灵活的智能体协作提供了新方向。然而，将协调移至潜在空间也可能将攻击移至可见文本检查范围之外。本文研究潜在状态能否携带在清洁执行期间仍然有效的攻击相关信息。为探究此问题，我们引入了一个潜在攻击框架，通过潜在干预重新激活攻击诱导的效果，而无需重用对抗性文本。大量实验表明，由此产生的纯潜在攻击在清洁执行中能显著降低任务性能，尤其当应用于智能体间KV缓存传递而非局部隐藏状态时。进一步的控制分析表明，这种性能下降不能归因于任意扰动或无效生成。总体而言，我们的发现表明基于潜在的协作并未消除攻击风险，而是将部分风险转移至较不可见的执行状态，这要求超越可见文本检查的安全防护措施。

英文摘要

Latent-based multi-agent systems replace parts of explicit inter-agent communication with hidden representations, offering a new direction for efficient and flexible agent collaboration. However, moving coordination into latent space may also move attacks beyond the reach of visible-text inspection. In this paper, we study whether latent states can carry attack-associated information that remains effective during clean executions. To examine this question, we introduce a latent attack framework that reactivates attack-induced effects through latent interventions without reusing adversarial text. Extensive experiments show that the resulting latent-only attacks can substantially degrade task performance in clean executions, especially when applied to inter-agent KV-cache handoffs rather than local hidden states. Further control analyses indicate that this degradation cannot be reduced to arbitrary perturbations or invalid generation. Overall, our findings suggest that latent-based collaboration does not remove attack risk. It shifts part of the risk into less observable execution states, calling for safeguards beyond visible-text inspection.


2605.28203 2026-05-28 cs.LG 版本更新

通过最优系数校准在强化学习中联合训练多令牌预测

Zili Wang, Jiajun Chai, Lin Chen, Xiaohan Wang, Shiming Xiang, Guojun Yin

发表机构 * School of Artificial Intelligence, University of Chinese Academy of Sciences（中国科学院大学人工智能学院）； MAIS, Institute of Automation, Chinese Academy of Sciences（中国科学院自动化研究所MAIS）； Meituan（美团）

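AI summary: This paper analyzes, from an optimization perspective, why jointly training multi-token prediction with reinforcement learning fails, and proposes Optimal Coefficient Calibration, which tracks the optimal coefficient online to improve performance.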

详情

AI中文摘要

基于可验证奖励的强化学习已成为提升大语言模型推理能力的标准范式，而多令牌预测是预训练中广泛采用的模块。将两者结合是自然的方法，但当前的强化学习实践会分离多令牌梯度，因为联合训练会降低性能。我们从优化角度重新审视这一失败。我们表明，多令牌对强化学习目标的每步影响可分解为两项：一阶相关性和二阶扰动惩罚。这种分解统一了三种多令牌训练模式：分离、交叉熵损失和策略损失，并解释了每种模式成功或失败的原因。对策略损失的进一步分析揭示，尽管它符合直觉，但性能仍然下降：相关性项衰减而二次惩罚持续存在。在分析指导下，我们提出最优系数校准，一种自适应方案，通过对数概率代理在线跟踪最优系数，且成本可忽略。在六个竞赛级数学推理基准上，最优系数校准一致达到或超过分离基线，实现了改进的联合多令牌-强化学习训练性能。

英文摘要

Reinforcement Learning from Verifiable Rewards (RLVR) has emerged as the standard paradigm for improving the reasoning capability of large language models, while Multi-Token Prediction (MTP) has become a widely adopted module in pretraining. Combining them is a natural approach, yet current RL practices detach MTP gradients because joint training degrades performance. We revisit this failure from an optimization perspective. We show that the per-step effect of MTP on the RL objective can be decomposed into two terms: a first-order correlation and a second-order perturbation penalty. This decomposition unifies three MTP training regimes (Detach, Cross-Entropy loss, and Policy loss) and explains why each succeeds or fails. Further analysis of policy loss reveals that, although it aligns with intuition, performance still degrades: the correlation term decays while the quadratic penalty persists. Guided by the analysis, we propose Optimal Coefficient Calibration (OCC), an adaptive scheme that tracks the optimal coefficient online via a log-probability proxy at negligible cost. Across six competition-level mathematical reasoning benchmarks, OCC consistently matches or exceeds the detach baseline, delivering improved joint MTP-RL training performance.


2605.28165 2026-05-28 cs.LG 版本更新

Unification and Optimization of Robust Supervised Learning

鲁棒监督学习的统一与优化

Jonas Hanselle, Valentin Margraf, Clemens Damke, Eyke Hüllermeier

发表机构 * LMU Munich, MCML（慕尼黑大学，MCML）

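AI summary: Proposes a unified framework that organizes robust learning methods such as distributionally robust optimization, label smoothing, vicinal risk minimization, and Mixup along three design axes, and uses joint hyperparameter optimization to automatically compose a robustness strategy suited to the task.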

详情

AI中文摘要

文献中提出了各种经验风险最小化的鲁棒替代方案，以应对分布偏移、标签噪声和有限样本退化等故障模式。例如分布鲁棒优化、标签平滑、邻域风险最小化和Mixup。然而，这些方法通常是孤立开发的，迫使从业者事先承诺单一故障模式，即使任务的主要模式尚不清楚。为了解决这个问题，我们将现有的一大类方法沿着三个共同的设计轴组织起来，并推导出一个可行的训练程序，将鲁棒学习分解为顺序阶段（参考分布丰富化、输入空间扰动、标签空间扰动和样本级聚合），每个阶段都有立场选择（悲观、中性或乐观）。这产生了一个统一的设计空间，其中联合超参数优化可以组合和配置适合手头任务的鲁棒策略。在表格、图像和奖励建模基准测试中，联合超参数优化与每种设置中最佳单方法基线具有竞争力，为那些事先不知道其任务中哪种故障模式占主导地位的从业者提供了可靠的默认选择。

英文摘要

The literature has proposed various robust alternatives to empirical risk minimisation to address failure modes such as distribution shift, label noise and finite-sample degeneracies. Examples include distributionally robust optimization, label smoothing, vicinal risk minimization, and Mixup. However, such approaches are typically developed in isolation, forcing practitioners to commit a priori to a single failure mode even when the dominant mode for the task is unclear. To address this, we organize a broad class of existing methods along three common design axes and derive a tractable training procedure that decomposes robust learning into sequential stages (reference distribution enrichment, input-space perturbation, label-space perturbation, and sample-level aggregation), each with a choice of stance (pessimistic, neutral, or optimistic). This results in a unified design space in which joint hyperparameter optimization can compose and configure robustness strategies suited to the task at hand. Across tabular, image, and reward modeling benchmarks, joint hyperparameter optimization is competitive with the best single-method baseline in each setting, offering a reliable default for practitioners who do not know a priori which failure mode dominates their task.
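Two of the methods named in the abstract, label smoothing and Mixup, are simple enough to write out as label-space and input-space perturbations. The sketch below shows their standard textbook forms only; it does not reproduce the paper's staged training procedure, and the function names and toy values are hypothetical.

```python
def smooth_labels(one_hot, eps):
    """Label smoothing: move a fraction eps of the mass to a uniform distribution."""
    num_classes = len(one_hot)
    return [(1.0 - eps) * y + eps / num_classes for y in one_hot]


def mixup(x1, y1, x2, y2, lam):
    """Mixup: convex combination of two examples and of their label vectors.

    lam is the mixing weight in [0, 1]; in practice it is usually drawn
    from a Beta distribution, which is left to the caller here.
    """
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


# Toy example with four classes and two-dimensional inputs.
soft = smooth_labels([1.0, 0.0, 0.0, 0.0], eps=0.1)
mixed_x, mixed_y = mixup([0.0, 2.0], [1.0, 0.0], [2.0, 0.0], [0.0, 1.0], lam=0.5)
```

Both operations keep the label vector a valid probability distribution, which is what allows them to be chained with other stages such as sample-level aggregation.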


2605.28155 2026-05-28 cs.LG cs.NI 版本更新

Temporal Hyperbolic Graph Representation Learning for Scale-Free Internet Routing and Delay Prediction

面向无标度互联网路由与延迟预测的时间双曲图表示学习

Yi-Ling Kuo, Hao-Yu Tien, Shih-Yu Tsai

发表机构 * Department of Information Management and Finance, National Yang Ming Chiao Tung University（国立阳明交通大学资讯管理与财务金融学系）

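AI summary: Proposes HERMIT, a framework that combines a hyperbolic manifold-preserving temporal graph neural network with a Random Forest regressor, using hyperbolic geometry to model the hierarchical, scale-free structure of Internet routing graphs for link prediction and RTT prediction; it outperforms baseline models on a large-scale real-world dataset.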

详情

AI中文摘要

预测互联网往返时间（RTT）对于路由优化、服务质量（QoS）保障和流量工程至关重要，但由于长期时间依赖、动态路由演变和重尾延迟分布，仍然具有挑战性。虽然时间图神经网络（TGNN）可以建模不断演变的网络拓扑，但大多数现有方法在欧几里得空间中运行，难以捕捉互联网路由图的层次和无标度结构。双曲几何提供了更合适的表示空间。我们提出HERMIT（通过集成拓扑的双曲边缘感知RTT建模），这是一个混合框架，结合了双曲流形保持的时间GNN与随机森林回归器，用于联合链路预测和RTT预测。HERMIT基于HMPTGN，引入了RTT感知的边缘特征和可学习的边缘编码器，以改进对不断演变的链路状态和路由行为的建模。得到的双曲节点表示与历史RTT统计相结合，用于鲁棒的延迟预测。我们在2015-2024年的大规模真实互联网数据集上评估HERMIT。HERMIT始终优于仅使用历史RTT统计的强随机森林基线，RMSE改进6%，同时减少了重尾样本上的大误差。在链路预测性能上，它也超越了先前的双曲TGNN模型，包括HMPTGN和HTGN。这些结果表明，将双曲时间图学习与基于树的回归相结合，为真实世界互联网拓扑中的RTT预测提供了可扩展的解决方案。

英文摘要

Predicting Internet round-trip time (RTT) is critical for routing optimization, quality-of-service (QoS) provisioning, and traffic engineering, yet remains challenging due to long-term temporal dependencies, evolving routing dynamics, and heavy-tailed latency distributions. While Temporal Graph Neural Networks (TGNNs) can model evolving network topologies, most existing approaches operate in Euclidean space, which poorly captures the hierarchical and scale-free structure of Internet routing graphs. Hyperbolic geometry provides a more suitable representation space. We propose HERMIT (Hyperbolic Edge-aware RTT Modeling via Integrated Topology), a hybrid framework combining a hyperbolic manifold-preserving temporal GNN with a Random Forest regressor for joint link prediction and RTT prediction. Built on HMPTGN, HERMIT introduces RTT-aware edge features and a learnable edge encoder to improve modeling of evolving link states and routing behavior. The resulting hyperbolic node representations are combined with historical RTT statistics for robust latency prediction. We evaluate HERMIT on a large-scale real Internet dataset spanning 2015-2024. HERMIT consistently outperforms a strong Random Forest baseline using only historical RTT statistics, achieving a 6% RMSE improvement while reducing large errors on heavy-tailed samples. It also surpasses prior hyperbolic TGNN models, including HMPTGN and HTGN, in link prediction performance. These results demonstrate that combining hyperbolic temporal graph learning with tree-based regression provides a scalable solution for RTT prediction in real-world Internet topologies.
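The claim that hyperbolic space suits hierarchical, scale-free graphs rests on how distances grow near the boundary of the space. The sketch below computes the geodesic distance in the Poincaré ball with curvature -1, chosen here purely for illustration; the abstract does not say which hyperbolic model HERMIT or HMPTGN uses, and the function name and sample points are hypothetical.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball."""

    def sq_norm(x):
        return sum(c * c for c in x)

    diff = sq_norm([a - b for a, b in zip(u, v)])
    denom = (1.0 - sq_norm(u)) * (1.0 - sq_norm(v))
    return math.acosh(1.0 + 2.0 * diff / denom)


origin = [0.0, 0.0]
d_near = poincare_distance(origin, [0.5, 0.0])  # equals 2 * artanh(0.5) = ln 3
d_far = poincare_distance(origin, [0.9, 0.0])   # equals 2 * artanh(0.9) = ln 19
```

The Euclidean gap between the two points is under a factor of two, yet the hyperbolic distance nearly triples; this fast growth toward the boundary is what leaves room for the many leaf nodes of a tree-like topology.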


2605.28153 2026-05-28 physics.ao-ph cs.LG 版本更新

Skillful high-resolution weather forecasting independent of physical models

独立于物理模型的高分辨率天气预报

Pengcheng Zhao, Siqi Xiang, Weixin Jin, Zekun Ni, Jiang Bian, Zuliang Fang, Hongyu Sun, Bin Zhang, Richard E. Turner, Jonathan Weyn, Haiyu Dong, Kit Thambiratnam, Qi Zhang

发表机构 * Microsoft Corporation（微软公司）； University of Cambridge（剑桥大学）； The Alan Turing Institute（艾伦·图灵研究所）

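AI summary: Proposes ObsCast, a system trained only on observational data, with no numerical weather prediction data, that produces short-term high-resolution regional forecasts and outperforms traditional NWP.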

Comments 26 pages, 10 figures

详情

AI中文摘要

准确及时的天气预报对现代社会的高影响决策至关重要。基于机器学习的天气预报正在成为一种替代方案，用于生成初始条件、预报，甚至在端到端系统中同时生成两者。这些方法比传统数值天气预报（NWP）更快，且通常具有更高的技能。然而，即使是端到端模型也通常依赖NWP生成的再分析数据进行监督，从而继承了这些NWP的偏差和分辨率限制，并限制了在缺乏合适再分析产品、更新频率低或生产成本高的环境中的适应性。在此，我们介绍ObsCast，一个区域系统，它在训练和推理中均不使用任何NWP派生数据，同时生成分析和预报，并在短期高分辨率区域建模中实现了最先进的性能。在美国本土和欧洲，ObsCast在近地面变量方面优于业务NWP，预报时效达18小时，并产生有技巧的降水预报。它提供了一种更简单、更适应的方法，直接从本地观测构建和完善区域预报服务，无需开发复杂且昂贵的传统预报流程。

英文摘要

Accurate and timely weather forecasts are critical for high-impact decisions in modern society. Machine-learning-based weather prediction is emerging as an alternative for producing initial conditions, forecasts, and even both in end-to-end systems. These methods deliver predictions faster and often with higher skill than traditional numerical weather prediction (NWP). However, even end-to-end models typically rely on NWP-generated reanalyses for supervision, thereby inheriting the biases and resolution limitations of those NWPs, and limiting adaptation to settings where suitable reanalysis products are unavailable, infrequently updated, or expensive to produce. Here we introduce ObsCast, a regional system that generates both analysis and predictions, without using any NWP-derived data in either training or inference, while still achieving state-of-the-art performance in short-term high-resolution regional modeling. Over the contiguous United States and Europe, ObsCast outperforms operational NWP for near-surface variables through 18 h and produces skillful precipitation forecasts. It provides a simpler and more adaptable route to build and refine regional forecasting services directly from local observations, without the need to develop complex and costly traditional forecasting pipelines.


2605.28150 2026-05-28 cs.LG 版本更新

Off-Policy Learning to Reason Works Because It Is More Pessimistic Than You Think

离策略学习推理之所以有效是因为它比你想象的更悲观

Otmane Sakhi, Aleksei Arzhantsev, Imad Aouali, Flavian Vasile

发表机构 * Criteo AI Lab（Criteo人工智能实验室）

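AI summary: This paper explains the effectiveness of off-policy reinforcement learning objectives through implicit pessimism and proposes a modification that stabilizes the induced distribution.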

详情

AI中文摘要

大规模强化学习已成为改进大型语言模型推理能力的核心工具。在此规模下，生成往往滞后或异步，因此更新是在旧策略收集的数据上进行的。这使得学习本质上是离策略的。然而，大多数现有方法仍根植于PPO风格的信任区域目标，将训练视为近似在策略，并使用重要性权重来纠正分布不匹配。这些修正可能引入高方差，破坏优化稳定性，并加速熵崩溃。最近的研究提出了一种替代方案：与其纠正不匹配，不如接受离策略数据并移除重要性权重，这通常能产生更强的算法。在本文中，我们提供了一种直观的离策略目标构建方法，包括成功的离策略目标，并表明其有效性可以通过隐式悲观主义来理解：它们优化的目标策略比名义目标所暗示的更保守。这一视角解释了为什么某些特定的实现选择能提高稳定性：它们隐式地控制了有效目标分布。然后，我们提出了一种原则性的改进，以稳定这种诱导分布并改善离策略学习。

英文摘要

Large-scale reinforcement learning has become a central tool for improving reasoning in large language models. At this scale, generation is often lagged or asynchronous, so updates are performed on data collected by older policies. This makes learning inherently off-policy. Most existing approaches nevertheless remain rooted in PPO-style trust-region objectives, treating training as approximately on-policy and using importance weights to correct distribution mismatch. These corrections can introduce high variance, destabilize optimization, and accelerate entropy collapse. Recent work suggests an alternative: rather than correcting the mismatch, one can embrace off-policy data and remove importance weights, often yielding stronger algorithms. In this paper, we provide an intuitive construction of a family of off-policy objectives that includes existing successful ones, and show that their effectiveness can be understood through implicit pessimism: they optimize toward target policies that are more conservative than their nominal objectives suggest. This perspective explains why some particular implementation choices improve stability: they implicitly control the effective target distribution. We then propose a principled modification that stabilizes this induced distribution and improves off-policy learning.


2605.28149 2026-05-28 cs.LG 版本更新

Sign-Aware Gated Sparse Autoencoders: Modeling Anticorrelated Features with Bi-Jump-ReLU Activations

符号感知门控稀疏自编码器：使用Bi-Jump-ReLU激活函数建模反相关特征

Bartosz Wieciech, Zmnako Awrahman, Marcin Czelej, Victor Hugo Jaramillo Velasquez, Wioletta Stobieniecka

发表机构 * Amazon Web Services（亚马逊网络服务）

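AI summary: Proposes the Sign-Aware Gated Sparse Autoencoder (SA-GSAE), which combines two-sided gated sparsity, a signed-magnitude path, and auxiliary reconstruction, and realizes bipolar sharing through a Bi-Jump-ReLU activation; it stays parameter-efficient while outperforming the standard Gated SAE at several LLM activation sites.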

详情

AI中文摘要

稀疏自编码器（SAE）从大型语言模型中提取可解释特征，但标准变体强制非负性，迫使对截然相反的概念（例如“压力过高”与“压力过低”）使用单独的潜在变量，并在特征反相关时浪费字典容量。我们提出了符号感知门控SAE（SA-GSAE）：双面门控稀疏性，带有符号幅度和辅助监督。极性敏感门控在任一符号上选择支持，符号幅度路径避免L1收缩，辅助重构防止门控崩溃。双极性共享——一个潜在变量沿共享方向编码两种符号——通过新的Bi-Jump-ReLU激活实现；参数核算表明，即使反相关对很少，符号感知也能保持参数效率。在Pythia-1B和SmolLM3-3B（6个单元，3个种子）的三个中层钩点上的真实LLM激活上，宽度为H的半宽SA-GSAE在3/6个单元（两个MLP输出钩点和resid-mid/Pythia-1B）上在整个扫描的L0重叠上严格帕累托支配宽度为2H的全宽门控SAE；在其余3个单元上，R²差距在0.025以内（最大差距-0.008），同时死单元分数绝对降低0.35-0.62。扫描几何平均死单元分数降低在MLP输出单元和Pythia-1B resid上约为100倍-500倍，在注意力单元和SmolLM3-3B resid上约为2倍-4倍。消融实验表明，双面门控和辅助损失是承重的（无辅助时LR降至0.27，98%死单元）；绑定r_i^+ = r_i^-与无绑定不可区分（|ΔR²| = 0.0015），我们推荐这种对称变体作为默认。MLP输出的增益来自大多数潜在变量携带两种极性；在注意力上，双极性结构集中在一小部分顶级潜在变量中。全宽SA-GSAE在SmolLM3-3B resid上表现出可复现的重构崩溃，而半宽完全避免了这一点。

英文摘要

Sparse Autoencoders (SAEs) extract interpretable features from Large Language Models, but standard variants enforce non-negativity, forcing separate latents for diametrically opposed concepts (e.g., "pressure too high" vs. "pressure too low") and wasting dictionary capacity when features are anticorrelated. We propose the Sign-Aware Gated SAE (SA-GSAE): two-sided gated sparsity with signed magnitude and auxiliary supervision. A polarity-sensitive gate selects support on either sign, a signed-magnitude path avoids L1 shrinkage, and an auxiliary reconstruction prevents gate collapse. Bipolar sharing - one latent encoding both signs along a shared direction - is realised via a new Bi-Jump-ReLU activation; parameter accounting shows sign-awareness stays parameter-efficient even when anticorrelated pairs are rare. On real LLM activations across three mid-depth hookpoints on Pythia-1B and SmolLM3-3B (6 cells, 3 seeds), a half-width SA-GSAE at width H strictly Pareto-dominates a full-width Gated SAE at 2H over the entire swept L0 overlap on 3 of 6 cells (both MLP-output hookpoints and resid-mid/Pythia-1B); on the remaining 3 it matches R^2 within 0.025 (max gap -0.008) while cutting dead fraction by 0.35-0.62 absolute. Sweep-geomean dead-fraction reductions are ~100x-500x on MLP-output cells and Pythia-1B resid, ~2x-4x on attention cells and SmolLM3-3B resid. Ablations show the two-sided gate and auxiliary loss are load-bearing (no auxiliary collapses LR to 0.27, 98% dead); tying r_i^+ = r_i^- is indistinguishable (|Delta R^2| = 0.0015), and we recommend this symmetric variant as default. MLP-output gains come from most latents carrying both polarities; on attention, bipolar structure concentrates in a small set of top latents. Full-width SA-GSAE exhibits a reproducible reconstruction collapse at SmolLM3-3B resid that the half-width entirely avoids.
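The abstract does not define Bi-Jump-ReLU, so the sketch below is only a guess at its shape. It starts from the standard JumpReLU, which zeroes pre-activations at or below a threshold and passes larger ones unchanged, and extends it symmetrically so that a single latent can fire with either sign. The thresholds `theta_pos` and `theta_neg` and the function names are hypothetical.

```python
def jump_relu(z, theta):
    """Standard JumpReLU: keep z only when it exceeds the threshold theta."""
    return z if z > theta else 0.0


def bi_jump_relu(z, theta_pos, theta_neg):
    """Assumed two-sided variant: keep z when it is above theta_pos or
    below -theta_neg, and output zero inside the dead band between them."""
    if z > theta_pos or z < -theta_neg:
        return z
    return 0.0


# One latent can encode both polarities of a feature along a shared direction.
pre_activations = [0.5, 0.2, -0.2, -0.5]
codes = [bi_jump_relu(z, theta_pos=0.3, theta_neg=0.3) for z in pre_activations]
```

Under this reading, a one-sided SAE would need two latents to represent the positive and negative cases, whereas the two-sided activation covers both with one; using the same value for both thresholds corresponds to a symmetric dead band.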


2605.27102 2026-05-28 cs.CV cs.LG 版本更新

从准确性到可审计性：金融AI系统中的确定性综述

Ruizhe Zhou, Xiaoyang Liu, Gaoyuan Du, Yi Zheng, Shouxi Ren, Deepayan Chakrabarti, Dengdu Jiang

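AI summary: This paper surveys, from a systems perspective, reproducibility failures in financial AI across three modalities (tabular models, graph networks, and LLM-based agentic workflows), quantifies determinism metrics experimentally, and proposes a layered evaluation framework.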

详情

AI中文摘要

在受监管的金融环境中部署机器学习——如信用风险、欺诈检测和反洗钱——暴露了算法可重现性的关键漏洞。虽然早期的金融机器学习解决了统计挑战（如回测过拟合），但深度神经网络和生成式AI引入了根植于硬件和架构的机械非确定性。本综述从系统视角审视了当前金融AI中三种主要模态的可重现性失败：表格模型（事后解释方差）、图网络（随机采样和时间异步）以及基于LLM的智能体工作流（批次依赖的差异和轨迹漂移）。我们通过公开金融数据集上的第一方实验补充了文献分析——量化了信用评分中的解释排名不稳定性、基于GNN的欺诈检测中的预测翻转率以及LLM实体提取中张量并行引起的输出差异。我们提出了一个分层评估框架，将模态特定指标（RBO、D_cos、TDI、PSD）与审计准备度联系起来，并实证验证了logit级和语义级确定性度量的互补性。

英文摘要

Deploying machine learning in regulated financial environments -- credit risk, fraud detection, and anti-money laundering -- exposes critical vulnerabilities in algorithmic reproducibility. While early financial ML addressed statistical challenges such as backtest overfitting, deep neural networks and Generative AI have introduced mechanical nondeterminism rooted in hardware and architecture. This survey provides a systems perspective on reproducibility failures across three modalities now dominant in financial AI: tabular models (post-hoc explanation variance), graph networks (stochastic sampling and temporal asynchrony), and LLM-based agentic workflows (batch-dependent divergence and trajectory drift). We supplement the literature analysis with first-party experiments on public financial datasets -- quantifying explanation rank instability in credit scoring, prediction flip rates in GNN-based fraud detection, and tensor-parallel-induced output divergence in LLM entity extraction. We propose a layered evaluation framework linking modality-specific metrics (RBO, D_cos, TDI, PSD) to audit readiness, and empirically validate the complementarity of logit-level and semantic-level determinism measures.
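
One of the metrics named above, RBO, is assumed here to denote rank-biased overlap, a standard agreement score for two ranked lists. The following is a minimal truncated sketch (no extrapolation term) with hypothetical feature names; it only illustrates how explanation-rank instability between two runs could be scored, and the survey's exact formulation may differ.

```python
def rbo_truncated(ranking_a, ranking_b, p=0.9):
    """Truncated rank-biased overlap: (1 - p) * sum_d p^(d-1) * A_d,
    where A_d is the fraction of shared items in the two top-d prefixes."""
    depth = min(len(ranking_a), len(ranking_b))
    seen_a, seen_b = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        seen_a.add(ranking_a[d - 1])
        seen_b.add(ranking_b[d - 1])
        agreement = len(seen_a & seen_b) / d
        score += (p ** (d - 1)) * agreement
    return (1 - p) * score

# Hypothetical feature-attribution rankings from two runs of the same model.
run_1 = ["income", "debt_ratio", "age", "tenure"]
run_2 = ["income", "age", "debt_ratio", "tenure"]
stability = rbo_truncated(run_1, run_2)
```

Because the sum is truncated, two identical lists of depth n score 1 - p^n rather than 1, so scores are best compared at a fixed depth and a fixed p.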

2605.22297 2026-05-28 cs.LG cs.AI (version update)

One LR Doesn't Fit All: Heavy-Tail Guided Layerwise Learning Rates for LLMs

Di He, Songjun Tu, Keyu Wang, Lu Yin, Shiwei Liu

Affiliations: SIAT (Shenzhen Institute of Advanced Technology); PCL (Peng Cheng Laboratory); UCAS (University of Chinese Academy of Sciences); UOT (University of Tübingen); LIU1 (Max Planck Institute for Intelligent Systems); LIU2 (ELLIS Institute Tübingen); LIU3 (Tübingen AI Center)

AI summary: Proposes Layerwise Learning Rate (LLR), grounded in Heavy-Tailed Self-Regularization theory, which assigns distinct learning rates to individual Transformer layers to accelerate training and improve generalization; its effectiveness is validated across multiple models and optimizers.

Abstract

Learning rate configuration is a fundamental aspect of modern deep learning. The prevailing practice of applying a uniform learning rate across all layers overlooks the structural heterogeneity of Transformers, potentially limiting their effectiveness as the backbone of Large Language Models (LLMs). In this paper, we introduce Layerwise Learning Rate (LLR), an adaptive scheme that assigns distinct learning rates to individual Transformer layers. Our method is grounded in Heavy-Tailed Self-Regularization (HT-SR) theory, which characterizes the empirical spectral density (ESD) of weight correlation matrices to quantify heavy-tailedness. Layers with weaker heavy-tailedness are assigned larger learning rates to accelerate training, while layers with stronger heavy-tailedness receive smaller learning rates. By tailoring learning rates in this manner, LLR promotes more balanced training across layers, leading to faster convergence and improved generalization. Extensive experiments across architectures ranging from LLaMA to GPT-nano, optimizers including AdamW and Muon, and model scales from 60M to 3B parameters with up to 100B training tokens demonstrate the effectiveness of LLR. LLR achieves up to 1.5x training speedup and consistently outperforms uniform-learning-rate baselines. In particular, it improves the average zero-shot accuracy of 1B models from 47.09% to 49.02%, and that of 3B models from 48.58% to 50.61%. A key advantage of LLR is its low tuning overhead: it can transfer nearly optimal learning-rate settings directly from the uniform baseline. Code is available at https://github.com/hed-ucas/Layer-wise-Learning-Rate.
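
The assignment rule described above can be illustrated with a small sketch. It assumes each layer's heavy-tailedness is summarized by a fitted power-law exponent alpha of its ESD (a larger alpha meaning a weaker heavy tail), and it uses a linear rescaling into a hypothetical range of multipliers; the function name, the scale bounds, and the linear mapping are all assumptions, not the paper's actual rule.

```python
def layerwise_learning_rates(alphas, base_lr, scale_min=0.5, scale_max=1.5):
    """Map per-layer heavy-tail exponents to learning rates.

    Assumption: a larger fitted exponent (alpha) means a weaker heavy tail,
    so that layer receives a larger learning rate.
    """
    lo, hi = min(alphas), max(alphas)
    if hi == lo:  # all layers equally heavy-tailed: fall back to a uniform LR
        return [base_lr for _ in alphas]
    rates = []
    for alpha in alphas:
        position = (alpha - lo) / (hi - lo)  # 0 = heaviest tail, 1 = lightest
        rates.append(base_lr * (scale_min + position * (scale_max - scale_min)))
    return rates

# Hypothetical exponents for three Transformer layers.
lrs = layerwise_learning_rates([2.0, 4.0, 6.0], base_lr=1e-3)
```

In a training loop these values would typically be placed in per-layer optimizer parameter groups; how often the exponents are re-estimated is left open here.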

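2605.28145 2026-05-28 cs.AI cs.LG (version update)

Adaptive Reservoir Computing for Multi-Scenario Chaotic System Forecasting

Shadmehr Zaregarizi, Khashayar Yavari

Affiliations: Politecnico di Torino

AI summary: Proposes an adaptive reservoir computing framework that uses four tailored strategies (exact state synchronization, histogram-guided candidate selection, multi-seed search, and sequential multi-sequence training) to score 74.91 across the 12 tasks of the CTF-4-Science Lorenz benchmark, showing that it is efficient and competitive.

Comments: 4 pages, 2 figures

AI abstract (translated from Chinese)

We present an adaptive reservoir computing framework for the CTF-4-Science Lorenz benchmark, which evaluates machine learning models on twelve distinct tasks spanning five qualitatively different scenarios: baseline forecasting, noisy signal reconstruction, forecasting under noise, few-shot learning, and parametric generalization. Rather than adopting a uniform inference strategy, we tailor the training and prediction procedures of echo state networks (ESNs) to the specific requirements of each evaluation scenario. Our main contributions are fourfold: (1) exact reservoir state synchronization, which eliminates warm-up approximation error in short-term forecasting; (2) histogram-guided candidate selection, which directly optimizes the long-time ergodic evaluation metric; (3) multi-seed reservoir search for few-shot scenarios in which training data is severely limited; and (4) sequential multi-sequence training, which resolves state-distribution mismatch in parametric generalization tasks. The proposed framework achieves a score of 74.91 on the public benchmark leaderboard, indicating that carefully tuned reservoir computing is a competitive and computationally efficient approach to diverse chaotic-system modeling challenges.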

Chreode: A Cell World Model for One-Step Temporal Dynamics and Perturbation Prediction

Mufan Qiu, Genhui Zheng, Yinuo Xu, Ruichen Zhang, Ying Ding, Qi Long, Tianlong Chen

Affiliations: University of North Carolina at Chapel Hill; The University of Texas at Austin; University of Pennsylvania

AI summary: Proposes Chreode, a one-step cell world model built on a structured residual transition operator; pretraining followed by fine-tuning unifies developmental-trajectory and perturbation prediction and improves performance on multiple benchmarks.

Comments: 25 pages, 3 figures, 14 tables. Submitted to NeurIPS 2026

Abstract

Predicting how a cell will change its transcriptional state under a developmental signal or a genetic perturbation is the computational core of in-silico biology and the AI Virtual Cell program. Existing approaches either fit static control-to-treated maps that discard time, or solve multi-step ODE / Schrödinger-bridge problems on each dataset independently. We introduce Chreode, a one-step cell world model that predicts action-conditioned cell-state transitions through a structured residual transition operator. It shifts distributional evolution from inference time to training time, enabling single-pass generation while preserving a Waddington-inspired decomposition into downhill landscape flow, rotational in-tangent dynamics, and stochastic spread. The model is pretrained with a shared scVI encoder and a DiT-based dynamics backbone on a 2.4M-cell mouse embryonic atlas spanning 7 datasets. As a fine-tuning initialization, Chreode improves per-target Sinkhorn distance on Weinreb hematopoiesis and Veres islet differentiation over matched scratch models, PI-SDE, and PRESCIENT. As a transferable gene-state embedding for GEARS, the pretrained dynamics representation reduces shared-vocabulary DE20 mean squared error on Norman Perturb-seq from 0.2121 to 0.1858, a 12.4% relative improvement, without changing the GEARS training procedure. We interpret this transfer to perturbation prediction as evidence that pretrained developmental-trajectory dynamics encode differentiation primitives transferable to CRISPR-induced state shifts, since both involve cell-state transitions in a shared latent geometry. The pretrained backbone additionally produces zero-shot clonal fate scores on Weinreb that are competitive with strong dynamic-OT baselines.

2605.28109 2026-05-28 cs.LG (version update)

Long Live The Balance: Information Bottleneck Driven Tree-based Policy Optimization

Hao Jiang, Shurui Li, Tianpeng Bu, Bowen Xu, Xin Liu, Qihua Chen, Hongtao Duan, Lulu Hu, Bin Yang, Minying Zhang

Affiliations: Alibaba Cloud Computing, Alibaba Group

AI summary: To address the exploration-exploitation imbalance in online reinforcement learning, proposes the IB-Score metric and the IB-TPO framework, both grounded in Information Bottleneck theory, and uses a tree sampling strategy to improve optimization stability and performance.

Comments: Accepted to ICML 2026 main conference

Abstract

Recent advances in online reinforcement learning (RL) for large language models (LLMs) have demonstrated promising performance in complex reasoning tasks. However, these methods often exhibit an imbalanced exploration-exploitation trade-off, resulting in unstable optimization and sub-optimal performance. We introduce IB-Score, a novel metric grounded in Information Bottleneck theory that evaluates a policy's exploration-exploitation balance by quantifying the trade-off between step-level reasoning diversity and mutual information shared with the correct answer. Analysis based on IB-Score shows that popular online RL approaches (e.g., GRPO) with common regularizers fail to maintain this balance consistently during training, which leads to suboptimal results. To address this, we propose Information Bottleneck-driven Tree-based Policy Optimization (IB-TPO), a principled framework that formulates IB-Score as a fine-grained optimization objective and utilizes a novel IB-guided tree sampling strategy that not only improves the efficiency of online sampling, generating 50% more trajectories under the same token budget, but also reuses the tree structure for effective Monte Carlo estimation of IB-Score. Extensive experiments across standard benchmarks show that our method significantly outperforms the GRPO baseline by 2.9% to 3.6% and also outperforms other state-of-the-art online RL approaches. Our code is available at https://github.com/alibaba/EfficientRL.

2605.28103 2026-05-28 cs.LG cs.GT (version update)

Benchmarking Inductive Biases for Multivariate Time-Series Anomaly Detection with a Robust Multi-View Channel-Graph Detector

Junhao Wei, Yanxiao Li, Bidong Chen, Yifu Zhao, Haochen Li, Dexing Yao, Baili Lu, Xudong Ye, Jietian Feng, Sio-Kei Im, Yapeng Wang, Xu Yang

Affiliations: Faculty of Applied Sciences, Macao Polytechnic University; Macao Polytechnic University

AI summary: Evaluates ten representative detectors under a unified experimental framework and proposes an adaptive detector that combines a NOTEARS-constrained directed channel graph with optional patch-attention and temporal-association views, achieving the best macro-average VUS-ROC across five datasets.

Abstract

We present a unified experiment, analysis, and benchmark study of multivariate time-series (MTS) anomaly detection. Ten family-representative detectors -- spanning statistical, reconstruction, association, frequency, and generic-transformer families -- are evaluated on five datasets (SMD, MSL, SMAP, PSM, and MSDS) under effectiveness, efficiency, robustness, and cross-dataset generalisation. All methods share the same windowing, scoring, hardware, and metric protocols. Effectiveness, ablation, and robustness use three random seeds; cross-dataset transfer uses seed 0 because each extra seed requires 250 source-target evaluations. The benchmark yields three method-independent findings: no single-bias baseline dominates; absolute perturbation VUS-ROC is more informative than retention ratios; and MSDS behaves as an event-dense deployment workload rather than a sparse point-anomaly benchmark. Under this protocol we also introduce \ours{}, an adaptive detector family combining a NOTEARS-constrained directed channel-graph view with optional patch-attention and temporal-association views. \ours{} achieves the best macro-average VUS-ROC (0.675, +5.1 pt over the second-best LSTM-AE), ranks first overall, and reaches the top-3 on all five datasets. Its wins on MSL and MSDS are narrow, while its average and robustness gains are larger: under the same three-seed robustness protocol for every method, it obtains the strongest absolute VUS-ROC across noise, channel dropout, and time-shift perturbations. We release the MSDS preprocessing protocol, configurations, scripts, and seed-level metric dumps.

2605.28078 2026-05-28 cs.CR cs.AI cs.LG stat.ML (version update)

Mind the Gap: Mixtures of Gaussians in Approximate Differential Privacy

Huikang Liu, Aras Selvi, Wolfram Wiesemann

Affiliations: Shanghai Jiao Tong University; UCL School of Management; Imperial Business School

AI summary: For scalar, real-valued query functions with known sensitivity, designs a class of additive noise mechanisms based on Gaussian mixtures that substantially reduce noise amplitude and variance in moderate and low-privacy regimes and approach optimality.

Comments: ICML 2026 style: 9 main pages followed by acknowledgements, references, appendices

Abstract

We design a class of additive noise mechanisms that satisfy $(\varepsilon, \delta)$-differential privacy (DP) for scalar, real-valued query functions with known sensitivities, with a particular focus on moderate and low-privacy regimes. These mechanisms, which we call mixture mechanisms, are constructed by mixing multiple Gaussian distributions that share the same variance but differ in their means and mixture weights. The resulting distributions can be interpreted as convex combinations of a zero-mean Gaussian (as used in the analytic Gaussian mechanism) and additional Gaussians whose means depend on the sensitivity of the query function. We derive tight conditions on the variances required for $(\varepsilon, \delta)$-DP and provide efficient algorithms to compute them. Compared to the analytic Gaussian mechanism, our mechanisms yield substantially lower expected noise amplitudes ($l_1$-loss) and variances ($l_2$-loss for zero-mean distributions). In the low-privacy regime that motivates our design, our mechanisms approach optimality, mitigating nearly all of the optimality gap of the analytic Gaussian mechanism.
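
The construction described above (components sharing one variance, differing in mean and weight) can be sketched as a sampler. The function names, means, weights, and sigma below are hypothetical placeholders; in particular, sigma here is not calibrated to any $(\varepsilon, \delta)$ guarantee, which is exactly the part the paper's conditions and algorithms would supply.

```python
import random

def sample_mixture_noise(rng, means, weights, sigma):
    """Draw one noise value from a Gaussian mixture whose components share
    the standard deviation `sigma` but differ in mean and weight."""
    if len(means) != len(weights):
        raise ValueError("means and weights must have the same length")
    mean = rng.choices(means, weights=weights, k=1)[0]  # pick a component
    return rng.gauss(mean, sigma)                       # then add Gaussian noise

def privatize(query_value, rng, means, weights, sigma):
    """Additive-noise release: the true answer plus one mixture draw."""
    return query_value + sample_mixture_noise(rng, means, weights, sigma)

rng = random.Random(0)
# Hypothetical parameters: a zero-mean component plus two components shifted
# by +/- the query sensitivity (taken as 1.0). sigma is NOT calibrated for DP.
noisy = privatize(10.0, rng, means=[0.0, -1.0, 1.0], weights=[0.6, 0.2, 0.2], sigma=0.8)
```

Keeping the means symmetric around zero makes the added noise zero-mean, so the released value stays unbiased.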

2605.28075 2026-05-28 cs.LG (version update)

Measure-to-measure Regression with Transformers

Matthew Vandergrift, Martha White, Yury Polyanskiy, Philippe Rigollet, Lazar Atanackovic

Affiliations: University of Alberta; Alberta Machine Intelligence Institute; MIT; The Broad Institute of MIT and Harvard

AI summary: For learning maps between probability measures, exploits the measure-dependent, mean-field structure of transformers to build static and dynamic nonlinear measure-to-measure regression methods, and validates their generalization on synthetic experiments, particle systems, and cancer treatment-response prediction.

Abstract

Many learning problems require predicting how populations evolve under an unknown transformation. A natural representation for such populations is a probability measure, with point clouds as a key example. In this work, we study the measure-to-measure (M2M) regression problem, in which one seeks to learn a map between probability measures from a finite collection of observed input-output pairs. In contrast to classical regression, where individual samples are transformed independently, M2M regression treats entire distributions as the data points. This perspective is vital in certain scientific applications, for example, cellular and molecular biology, where cells are known to evolve not as independent data points but as a collection. However, few existing approaches address the problem of M2M regression with sufficient expressivity and scalability. We present a formalization of nonlinear M2M regression and introduce two easy-to-use, expressive, and scalable approaches to learn such operators: transformers as static M2M maps and transformers as dynamic M2M velocity fields. Our approach leverages the natural measure-dependent and mean-field structure of transformers to learn nonlinear M2M maps on the space of probability distributions. We illustrate the effectiveness of our proposed method to generalize to unseen measures on synthetic experiments, interacting particle systems, and a large-scale patient-derived organoid dataset for predicting treatment response in colorectal cancer.

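2605.28053 2026-05-28 cs.LG (version update)

RW-TTT: Batched Serving for Request-Owned Test-Time Training State

Jian Yang, Zhizhuo Kou, Yao Tian, Hao Zhang, Han Chen, Sirui Han, Yike Guo

Affiliations: HKUST; CUHK; NUS

AI summary: Proposes the RW-TTT framework, which tags each decoding step with its owner, version, and read/write effects to enable efficient batched LLM serving with request-owned test-time training state, reaching 274.61 tok/s aggregate throughput on a single GPU, a 9.31x improvement over sequential serving.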

2605.28042 2026-05-28 cs.CL cs.AI cs.LG (version update)

BPPO: Binary Prefix Policy Optimization for Efficient GRPO-Style Reasoning RL with Concise Responses

Qingfei Zhao, Huan Song, Shuyu Tian, Jiawei Shao, Xuelong Li

Affiliations: TeleAI; Shanghai Jiao Tong University

AI summary: To address GRPO's high update cost and its tendency toward verbose reasoning, proposes BPPO, which uses only the shortest correct and shortest incorrect completions as the update unit and focuses optimization on prefixes, achieving a 6x speedup and 30-50% shorter responses.

Abstract

Group Relative Policy Optimization (GRPO) is widely used for training reasoning models, but updating all sampled completions in each group incurs substantial cost and can reinforce verbose reasoning trajectories. In this paper, we study whether all completions provide equally useful update signals in GRPO-style reasoning RL. Our gradient-similarity analysis shows that, within the same prompt group, same-class completions often induce highly similar update directions, whereas correct-incorrect pairs provide more distinct contrastive signals. Motivated by this observation, we propose Binary Prefix Policy Optimization (BPPO), which uses the shortest correct completion and the shortest incorrect completion as a compact update unit while preserving full-group advantage normalization. BPPO further improves efficiency with adaptive completion scheduling and prefix-focused optimization; by updating only response prefixes, it avoids reinforcing redundant suffixes and encourages more concise responses. Experiments on GSM8K, MATH, and Geo3K show that BPPO achieves up to 6.08x speedup over GRPO while maintaining competitive accuracy, and reduces mean response length by approximately 30-50% without modifying the reward with an explicit length penalty.
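
The selection step described above can be sketched as follows. This is a toy illustration under stated assumptions: completions are token lists, rewards are binary, groups without both classes are skipped, and the function name is hypothetical. Adaptive scheduling and prefix-focused optimization are not modeled.

```python
from statistics import mean, pstdev

def select_update_pair(completions, rewards):
    """Return (index, advantage) for the shortest correct and the shortest
    incorrect completion, with advantages normalized over the full group."""
    mu, sd = mean(rewards), pstdev(rewards)
    if sd == 0:
        return []  # all-correct or all-wrong group: no contrastive pair (assumption)
    advantages = [(r - mu) / sd for r in rewards]
    correct = [i for i, r in enumerate(rewards) if r > 0]
    incorrect = [i for i, r in enumerate(rewards) if r <= 0]
    chosen = [min(correct, key=lambda i: len(completions[i])),
              min(incorrect, key=lambda i: len(completions[i]))]
    return [(i, advantages[i]) for i in chosen]

# Hypothetical group of four sampled completions (token ids) and 0/1 rewards.
group = [[1, 2, 3, 4, 5], [1, 2], [7, 8, 9], [7]]
pair = select_update_pair(group, rewards=[1, 1, 0, 0])
```

Only the two selected completions would enter the policy update, while the mean and standard deviation still reflect every sample in the group.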

2605.28021 2026-05-28 cs.LG (version update)

AOE: Exhaustive Out-of-Distribution Detection via Recalibrating Outlier Labels

Fengqiang Wan, Qing-Yuan Jiang, Yang Yang

Affiliations: School of Computer Science and Engineering; Nanjing University of Science and Technology

AI summary: Proposes Adaptive Confidence Outlier Exposure (AOE), which recalibrates outlier labels via temperature scaling; adaptive soft targets preserve the semantic relations between OOD samples and ID categories, enlarging the separation margin and improving OOD detection.

Abstract

Out-of-distribution (OOD) detection is essential for deploying machine learning models in open-world and safety-critical scenarios, where test inputs may deviate from the training distribution and overconfident predictions on unknown samples can lead to unreliable decisions. Outlier Exposure (OE) has emerged as a promising OOD detection paradigm by introducing auxiliary outliers during training to enlarge the margin between in-distribution (ID) and OOD samples. Existing OE-based methods typically enlarge this margin by employing uniform labels to maximize the entropy of OOD samples over ID categories. However, we theoretically show that uniform labels inevitably disregard the relations between OOD samples and ID categories, termed the over-softening effect, leading to a suboptimal margin bound. Our theoretical analysis further reveals that explicitly exploiting such relations can instead yield improved OOD detection performance. Motivated by this insight, we propose Adaptive Confidence OE (AOE), a simple yet effective method that leverages temperature scaling to recalibrate outlier labels. Specifically, AOE generates adaptive soft targets from temperature-scaled model predictions for OOD samples, where the learnable temperature smooths the prediction distribution without fully erasing class-wise relational information. By supervising OOD samples with these adaptive soft targets, AOE preserves the semantic proximity between OOD samples and ID categories while encouraging the softened targets to approach a high-entropy distribution, thereby suppressing overconfident OOD predictions and enlarging the separation margin. Extensive experiments across diverse benchmarks demonstrate the effectiveness of AOE.
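
The contrast between uniform labels and temperature-scaled soft targets can be illustrated with a short sketch. The logits and the fixed temperature are hypothetical; in the method described above the temperature is learnable, which this toy example does not model.

```python
import math

def soft_targets(logits, temperature):
    """Temperature-scaled softmax: a larger temperature gives flatter targets."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

logits = [3.0, 1.0, 0.0]                        # hypothetical ID-class logits for one outlier
uniform = [1 / len(logits)] * len(logits)       # standard OE target
adaptive = soft_targets(logits, temperature=4.0)
```

The adaptive target has higher entropy than the unscaled prediction yet keeps the class ordering, whereas the uniform target discards that ordering entirely.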

2605.28014 2026-05-28 cs.CL cs.LG (version update)

ROSD: Reflective On-Policy Self-Distillation for Language Model Reasoning across Domains

Ziqi Zhao, Xinyu Ma, Liu Yang, Yujie Feng, Daiting Shi, Jingzhou He, Xin Xin, Zhaochun Ren, Xiao-Ming Wu

Affiliations: The Hong Kong Polytechnic University; Baidu Inc.; Shandong University; Leiden University

AI summary: Proposes the Reflective On-policy Self-Distillation (ROSD) framework, which uses reflection-guided, error-localized distillation to turn reference-solution imitation into targeted reasoning correction, improving in-domain reasoning and out-of-domain generalization.

Comments: Preprint

Abstract

On-policy self-distillation (OPSD) improves the reasoning performance of large language models (LLMs) by providing dense token-level supervision for on-policy rollouts. However, existing OPSD methods often yield limited gains on in-domain reasoning and generalize poorly to out-of-domain problems. We identify two key causes: conditioning the self-teacher on a verified solution encourages imitation of training-domain reference trajectories rather than error-specific correction, and applying distillation to the full response can overwrite valid reasoning prefixes and reinforce overfitting. We propose Reflective On-policy Self-Distillation (ROSD), a framework that turns reference-solution imitation into targeted reasoning correction through reflection-guided, error-localized distillation. For each rollout, ROSD uses a self-reflector to extract a corrective idea and locate the first erroneous span. The corrective idea guides the self-teacher toward targeted supervision, while the localized error span restricts distillation to where correction is needed. This design corrects flawed reasoning while preserving valid prefixes. Experiments on multiple in-domain and out-of-domain reasoning benchmarks show that ROSD yields stronger in-domain reasoning performance overall and substantially better out-of-domain generalization than standard OPSD. Code is available at https://github.com/ZiqiZhao1/ROSD.

2605.28009 2026-05-28 cs.CL cs.AI cs.LG (version update)

MemGuard: Preventing Memory Contamination in Long-Term Memory-Augmented Large Language Models

Hyeonjeong Ha, Jeonghwan Kim, Cheng Qian, Jiayu Liu, William M. Campbell, Yue Wu, Yuji Zhang, Kathleen McKeown, Dilek Hakkani-Tur, Heng Ji

Affiliations: University of Illinois Urbana-Champaign; Columbia University; Capital One

AI summary: Proposes MemGuard, a type-aware memory framework that explicitly assigns functional roles, maintains relations across type-isolated memories, and selectively composes evidence from only the necessary types; it prevents heterogeneous memory contamination, improves memory reliability by up to 28.27%, and retrieves up to 5.8x fewer tokens.

Abstract

Memory-augmented large language models extend reasoning beyond a fixed context window by maintaining long-term memory across interactions. However, existing memory systems often collapse stable user facts, episodic events, and behavioral rules into a shared space, allowing functionally distinct memories to be retrieved and used as interchangeable evidence. We identify this failure mode as heterogeneous memory contamination, where context-specific events become overgeneralized claims, or semantically relevant but functionally incompatible memories mislead generation. To this end, we introduce MemGuard, a type-aware memory framework that preserves functional memory boundaries during memory construction and retrieval. It assigns each memory an explicit functional role at write time, maintains relations across type-isolated memories, and selectively composes evidence only from necessary memory types, reducing contamination from irrelevant or functionally incompatible evidence. Across hallucination and long-horizon conversation benchmarks, MemGuard improves memory reliability by up to 28.27% while retrieving up to 5.8x fewer memory tokens than prior methods. These results suggest that reliable long-term reasoning depends on principled organization and selective use of heterogeneous memory.
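
The idea of type-isolated memory with roles assigned at write time can be pictured with a toy data structure. Everything below is a hypothetical illustration: the class name, the role labels ("fact", "episode", "rule"), and the keyword match that stands in for real retrieval are assumptions, and cross-type relations are not modeled.

```python
from collections import defaultdict

class TypedMemory:
    """Toy type-isolated store: one list of entries per functional role."""

    def __init__(self):
        self.stores = defaultdict(list)

    def write(self, text, role):
        # The role is fixed at write time, as the abstract describes.
        self.stores[role].append(text)

    def retrieve(self, keyword, needed_roles):
        # Only the roles the query needs are consulted; other stores are never read.
        return [(role, text)
                for role in needed_roles
                for text in self.stores[role]
                if keyword.lower() in text.lower()]

memory = TypedMemory()
memory.write("User is vegetarian", role="fact")
memory.write("User ate fish at a wedding last May", role="episode")
memory.write("Always confirm allergies before suggesting food", role="rule")
evidence = memory.retrieve("user", needed_roles=["fact"])
```

Restricting retrieval to the "fact" store keeps the one-off episode from being treated as a standing preference, which is the kind of contamination the abstract describes.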

2605.28008 2026-05-28 cs.AI cs.LG (version update)

Zipping the Thought: When and How Compressed Reasoning Data Works in LLM Post-Training

Kohsei Matsutani, Gouki Minegishi, Takeshi Kojima, Yusuke Iwasawa, Yutaka Matsuo

Affiliations: The University of Tokyo

AI summary: Classifies chain-of-thought into Explicit, Composed, and Implicit CoT and runs experiments on a synthetic compositional reasoning task, finding that coarser CoT needs more SFT data, that Composed and Implicit CoT benefit more from data scaling while Implicit CoT tends toward memorization, that subsequent RLVR decomposes compressed steps, and that unidirectional CoT ordering generalizes better on longer sequential tasks.

Abstract

Large language models (LLMs) can now solve complex problems through long chain-of-thought (CoT) reasoning, but the trade-off between performance and token cost remains a central challenge. To address this issue, supervised fine-tuning (SFT) often uses compressed reasoning data, where CoT traces are shortened into compact forms. However, the effect of such compressed reasoning data on post-training remains poorly understood. In this paper, we propose a taxonomy of CoT consisting of Explicit CoT, which outputs all operations without aggregation; Composed CoT, which combines multiple operations into a single step; and Implicit CoT, which omits intermediate operations. We construct a synthetic compositional reasoning task that allows controlled variation of difficulty, compression granularity, and data size, and conduct a comprehensive set of experiments across different model families and sizes. Notably, we find that (i) coarser CoT requires more SFT data, (ii) compared with Explicit CoT, Composed CoT and Implicit CoT benefit more from data scaling, while Composed CoT benefits from data repetition and Implicit CoT tends to lead to memorization, (iii) unlike SFT, subsequent reinforcement learning (RL) with verifiable rewards (RLVR) decomposes compressed steps learned during SFT, and (iv) unidirectional CoT ordering shows stronger generalization on longer sequential tasks. Our findings provide implications for CoT design under data resource constraints and offer important insights into the mechanisms of SFT and RL in LLM post-training.

2605.28007 2026-05-28 cs.LG cs.AI (version update)

Learning Compositional Latent Structure with Vector Networks

Niclas Pokel, Benjamin F. Grewe

Affiliations: Institute of Neuroinformatics, UZH / ETH Zurich; ETH AI Center Zurich, Switzerland

AI summary: Proposes the Vector Network (VN), a hierarchical recurrent architecture that achieves compositional generalization through a library of reusable rank-1 weight atoms, lowering error by about an order of magnitude on out-of-distribution tasks.

Abstract

英文摘要

Deep networks are powerful function approximators, but they typically store many different computations in shared weight matrices, making it difficult to selectively reuse or adapt parts of them when a familiar structure appears in novel combinations. We introduce the Vector Network (VN), a hierarchical recurrent architecture in which each layer replaces a fixed weight matrix with a library of reusable rank-1 weight atoms. For each input, VN minimizes a layer-local energy to infer a sparse set of active weight atoms and their coefficients, jointly constrained by bottom-up input reconstruction and top-down feedback consistency. These weight atom coefficients then compose an input-specific low-rank weight matrix for that sample. After convergence, slow learning updates only the selected weight atoms through local residual signals scaled by the inferred coefficients. We evaluate VN on four compositional benchmarks spanning 1D signals, 2D spatial decoding, N-body dynamics, and compositional MNIST. VN matches strong baselines in distribution while often achieving out-of-distribution error about an order of magnitude lower when familiar factors must be recombined in novel ways. Vector networks thus make compositional generalization a structural property of the architecture and inference process rather than a brittle byproduct of fitting many behaviors into one shared dense parameter substrate.

2605.27997 2026-05-28 cs.CL cs.AI cs.LG (version update)

Where Does Toxicity Live? Mechanistic Localization and Targeted Suppression in Language Models

Himanshu Beniwal, Mayank Singh

Affiliations: Indian Institute of Technology Gandhinagar

AI summary: Localizes toxicity to specific layers and neurons by analyzing activation differences between toxic and neutral prompts, then suppresses it via inference-time scaling or minimal rank-one weight edits without gradient descent, reducing toxicity while preserving language quality.

Abstract

Large language models frequently generate toxic, hateful, or harmful content, yet existing mitigation methods rely on costly retraining or output-level filtering with no mechanistic insight into where toxicity originates internally. We introduce Meow2X and TRNE, two complementary retraining-free frameworks that localize toxicity to specific layers and neurons by analyzing activation differentials between toxic and neutral prompts, then suppress them via inference-time scaling or minimal rank-one weight edits -- without any gradient descent. Evaluations across five LMs, two benchmarks, and 90 configurations using dual safety evaluators demonstrate consistent toxicity reduction while preserving language modeling quality. Our analysis reveals that toxicity is disproportionately encoded in early MLP layers, varies across architectures, and is systematically underestimated by single-evaluator setups -- underscoring the need for multi-evaluator safety assessment. By bridging mechanistic interpretability with practical detoxification, our framework offers a principled path toward safer, more transparent language models.
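
The localize-then-scale idea in this abstract can be illustrated with a minimal sketch. This is not the authors' Meow2X or TRNE implementation: the function names, the toy activations, and the choice of ranking neurons by the difference of mean activations are all assumptions made for illustration.

```python
def mean_activation(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def top_k_differential(toxic_acts, neutral_acts, k):
    """Indices of the k neurons whose mean activation rises most on toxic prompts."""
    toxic_mean = mean_activation(toxic_acts)
    neutral_mean = mean_activation(neutral_acts)
    diff = [t - n for t, n in zip(toxic_mean, neutral_mean)]
    return sorted(range(len(diff)), key=lambda i: diff[i], reverse=True)[:k]

def scale_neurons(activation, indices, alpha):
    """Inference-time scaling: multiply the selected neurons by alpha, keep the rest."""
    selected = set(indices)
    return [a * alpha if i in selected else a for i, a in enumerate(activation)]

# Toy activations (hypothetical): rows are prompts, columns are neurons in one layer.
toxic = [[0.1, 2.0, 0.3, 1.5], [0.2, 1.8, 0.1, 1.7]]
neutral = [[0.1, 0.2, 0.3, 1.4], [0.3, 0.4, 0.2, 1.6]]

chosen = top_k_differential(toxic, neutral, k=1)
damped = scale_neurons([0.5, 2.0, 0.5, 1.0], chosen, alpha=0.0)
```

No gradient step is involved: the selection uses only forward-pass statistics, and the edit is a fixed multiplicative factor applied at inference time.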

2605.27992 2026-05-28 cs.LG 版本更新

Patched-DeltaNet: Token-Level Event-Driven Memory for Linear-Time Anomaly Detection

Patched-DeltaNet: 用于线性时间异常检测的令牌级事件驱动记忆

Tae-Gyun Lee, Junyoung Park, Kyu Won Han

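Affiliations: ETRI (Electronics and Telecommunications Research Institute)

AI summary: Proposes Patched-DeltaNet, an architecture that combines time-series patching with Gated Delta Networks and uses token-level event-driven memory for linear-time anomaly detection, reaching ROC-AUC 0.957 and PA-F1 0.822 on the SMD benchmark.

Comments 7 pages, 2 tables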

AI中文摘要

时间序列异常检测对于维护关键任务系统的可靠性至关重要。虽然基于Transformer的模型（如PatchTST）表现出色，但其$\mathcal{O}(L^2)$的计算复杂度严重限制了在资源受限环境中的部署。在本文中，我们提出了Patched-DeltaNet，一种结合时间序列分块与门控Delta网络的新型架构。通过整合这些范式，我们假设并证明了令牌级事件驱动记忆的出现，其中分块机制提取局部语义块，而误差驱动的DeltaNet仅在发生显著物理变化（定义为delta）时更新其循环状态。这种协同作用有效滤除背景噪声并捕获突然的异常漂移。我们在服务器机器数据集（SMD）基准上的严格实验证明了Patched-DeltaNet的结构优越性和样本效率。通过在统一评估约束和相同计算预算下严格优于最新架构，我们的模型实现了ROC-AUC 0.957和PA-F1 0.822，同时将计算复杂度大幅降低至理论最小值$\mathcal{O}(L/P)$。

英文摘要

Time series anomaly detection is critical for maintaining the reliability of mission-critical systems. While Transformer-based models like PatchTST have shown remarkable performance, their $\mathcal{O}(L^2)$ computational complexity severely limits deployment in resource-constrained environments. In this paper, we propose Patched-DeltaNet, a novel architecture combining time-series patching with Gated Delta Networks. By integrating these paradigms, we hypothesize and demonstrate the emergence of token-level event-driven memory, whereby the patching mechanism extracts local semantic chunks, while the error-driven DeltaNet updates its recurrent state exclusively when significant physical changes, defined as deltas, occur. This synergy effectively filters out background noise and captures sudden anomalous drifts. Our rigorous experiments on the Server Machine Dataset (SMD) benchmark demonstrate the structural superiority and sample efficiency of Patched-DeltaNet. By strictly outperforming recent architectures under unified evaluation constraints and identical compute budgets, our model yields an ROC-AUC of 0.957 and PA-F1 of 0.822, while drastically reducing computational complexity to the theoretical minimum of $\mathcal{O}(L/P)$.
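
To make the "patch, then update memory only by the prediction error" idea concrete, here is a minimal sketch of a gated delta-rule pass over patched input. It is an assumption-laden toy, not the paper's model: the real architecture presumably uses learned projections and gates, whereas here the key is simply the normalized patch, the value is the raw patch, and `alpha` and `beta` are fixed hypothetical constants.

```python
import math

def make_patches(series, patch_len):
    """Split a series into non-overlapping patches: len(series) // patch_len tokens."""
    usable = len(series) - len(series) % patch_len
    return [series[i:i + patch_len] for i in range(0, usable, patch_len)]

def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def gated_delta_scores(patches, alpha=1.0, beta=1.0):
    """Run one gated delta-rule pass; return the prediction-error norm per patch."""
    dim = len(patches[0])
    state = [[0.0] * dim for _ in range(dim)]  # associative memory: key -> value
    scores = []
    for patch in patches:
        key, value = l2_normalize(patch), patch
        # Gate: decay the old memory, then read what it predicts for this key.
        state = [[alpha * s for s in row] for row in state]
        pred = [sum(s * k for s, k in zip(row, key)) for row in state]
        error = [v - p for v, p in zip(value, pred)]
        scores.append(math.sqrt(sum(e * e for e in error)))
        # Delta rule: write only the part of the value the memory got wrong.
        for r in range(dim):
            for c in range(dim):
                state[r][c] += beta * error[r] * key[c]
    return scores

series = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0]
scores = gated_delta_scores(make_patches(series, patch_len=2))
```

With a series of length L and patch length P, the loop runs L/P times. Repeated patches produce zero error and leave the state unchanged, while the first patch and the final, different patch each produce a nonzero error.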

2605.27990 2026-05-28 cs.LG cs.AI cs.CV 版本更新

Geometry-Correct Diffusion Posterior Sampling with Denoiser-Pullback Curvature Guidance and Manifold-Aligned Damping

几何校正扩散后验采样：基于去噪器回拉曲率引导与流形对齐阻尼

Seunghyeok Shin, Minwoo Kim, Dabin Kim, Hongki Lim

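Affiliations: Department of Electrical and Computer Engineering, Inha University, Incheon, 22212, South Korea

AI summary: Proposes a geometry-correct diffusion posterior sampling method based on denoiser-pullback curvature guidance and manifold-aligned damping, replacing scalar guidance with a per-noise-level damped Gauss-Newton correction for stable and efficient posterior sampling.

Comments Code: https://github.com/Seunghyeok0715/CLAMP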

Journal ref: International Conference on Machine Learning 2026

AI中文摘要

扩散后验采样将扩散先验条件于测量值，但数据一致性更新通常由手动调整的引导权重缩放，并且在刚性、算子依赖的曲率下可能破坏采样稳定性。我们使用在扩散状态坐标中计算的每噪声水平阻尼高斯-牛顿校正替代标量引导。该校正通过去噪器回拉似然梯度，使用避免前向去噪器雅可比矩阵的单侧曲率模型，并应用与去噪器残差对齐的扩散校准秩一阻尼。每个校正通过自动微分的无矩阵GMRES求解，采样通过具有闭式漂移/噪声分离的方差保持朗之万转移进行。在FFHQ和ImageNet上的逆问题中，该方法在PSNR/SSIM/LPIPS上达到竞争性能，同时运行速度显著快于大多数对比基线；在加速MRI重建中，它在对比基线中取得了最佳的PSNR/SSIM。

英文摘要

Diffusion posterior sampling conditions diffusion priors on measurements, but data-consistency updates are typically scaled by hand-tuned guidance weights and can destabilize sampling under stiff, operator-dependent curvature. We replace scalar guidance with a per-noise-level damped Gauss--Newton correction computed in diffusion-state coordinates. The correction pulls likelihood gradients back through the denoiser, uses a one-sided curvature model that avoids forward denoiser Jacobians, and applies diffusion-calibrated rank-one damping aligned with the denoiser residual. Each correction is solved with matrix-free GMRES using automatic differentiation, and sampling proceeds with a variance-preserving Langevin transition with a closed-form drift/noise split. On FFHQ and ImageNet across inverse problems, it achieves competitive PSNR/SSIM/LPIPS while running markedly faster than most of the compared baselines; on accelerated MRI reconstruction, it achieves the best PSNR/SSIM among the compared baselines.

2605.27989 2026-05-28 cs.LG 版本更新

Law of Neural Interaction: Depth-Width Shape, Interaction Efficiency, and Generalization

神经交互定律：深度-宽度形状、交互效率与泛化

Wenjie Sun, Jinning Yang, Shuai Zhang, Mengnan Du

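Affiliations: The Chinese University of Hong Kong, Shenzhen; Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; New Jersey Institute of Technology

AI summary: Defines neural interaction by extending superposition from parameter space to gradient space. Finds that under a fixed budget good generalization accompanies efficient neural interaction, that adjusting the depth-width ratio ($R_{D/W}$) places a model in an efficient interaction interval, and that this interval stays stable as the budget scales, offering insight into model shape initialization and generalization mechanisms.

Comments 30 pages, 4 figures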

AI中文摘要

缩放定律的指导增加了现代大型语言模型（LLMs）的资源需求，但在固定预算下这些模型是否有效利用资源仍存疑问。先前研究证明叠加是损失的关键贡献者。通过利用神经特征假设，我们将叠加从参数空间扩展到梯度空间，并将其定义为神经交互。我们发现，在固定预算下，良好的泛化通常伴随着高效的神经交互，并且可以通过调整模型的深度-宽度比（$R_{D/W}$）将模型置于高效交互区间。此外，随着预算扩大，模型的高效交互区间保持相对稳定。通过比较现有小规模密集LLMs，我们观察到接近该区间的模型在MMLU-Pro基准上表现更好。我们的发现揭示了$R_{D/W}$影响资源利用效率，进而影响泛化，为模型形状初始化和理解模型泛化机制提供了见解。神经交互定律的代码可在：https://anonymous.4open.science/r/Neural_Interaction_Law-D788 获取。

英文摘要

The guidance of scaling laws has increased the resource demands of modern large language models (LLMs), yet it remains questionable whether these models utilize resources effectively under a fixed budget. Previous research has proved superposition as a key contributor to loss. By leveraging the Neural Feature Ansatz, we extend superposition from parameter space to gradient space and define it as neural interaction. We find that under a fixed budget, good generalization is usually accompanied by efficient neural interactions, and the model can be placed in an efficient interaction interval by adjusting its depth-width ratio ($R_{D/W}$). In addition, as the budget scales up, the efficient interaction interval of the model remains relatively stable. By comparing existing small scale dense LLMs, we observe that models operating near this interval tend to perform better on the MMLU-Pro benchmark. Our findings reveal that the $R_{D/W}$ influences resource utilization efficiency and thereby affects generalization, providing insights into model shape initialization and the understanding of model generalization mechanisms. Code for Neural Interaction Law is available at: https://anonymous.4open.science/r/Neural_Interaction_Law-D788

2605.27967 2026-05-28 stat.ME cs.AI cs.LG stat.ML 版本更新

Multi-Teacher Knowledge Distillation via Teacher-Informed Mixture Priors

通过教师引导的混合先验进行多教师知识蒸馏

Luyang Fang, Yongkai Chen, Jiazhang Cai, Ping Ma, Wenxuan Zhong

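Affiliations: Department of Statistics, University of Georgia; Department of Statistics, Harvard University

AI summary: Proposes the Multi-Teacher Bayesian Knowledge Distillation (MT-BKD) framework, which uses Bayesian inference and a teacher-informed prior together with an entropy-based weighting mechanism to fuse knowledge from multiple teachers and quantify uncertainty.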

AI中文摘要

知识蒸馏是一种强大的模型压缩方法，能够高效部署复杂的深度学习模型（教师模型），包括大型语言模型。然而，其潜在的统计机制尚不明确，且不确定性评估常被忽视，特别是在需要多样化教师专业知识的实际场景中。为解决这些挑战，我们引入了\textit{多教师贝叶斯知识蒸馏}（MT-BKD），其中蒸馏学生模型在贝叶斯框架内从多个教师模型学习。我们的方法利用贝叶斯推断来捕捉蒸馏过程中的固有不确定性。我们引入了一种教师引导的先验，整合来自教师模型和特定任务训练数据的外部知识，提供了更好的泛化性、鲁棒性和可扩展性。此外，一种基于熵的加权机制自适应地调整每个教师的影响，使学生能够有效组合多个专业知识来源。MT-BKD增强了学生模型学习过程的可解释性，提高了预测准确性，并提供了不确定性量化。我们在合成任务和真实任务（包括蛋白质亚细胞定位预测和图像分类）上验证了MT-BKD。实验表明，我们的MT-BKD框架在性能提升和稳健的不确定性量化方面表现出色，突显了其优势。

英文摘要

Knowledge distillation is a powerful method for model compression, enabling the efficient deployment of complex deep learning models (teachers), including large language models. However, its underlying statistical mechanisms remain unclear, and uncertainty evaluation is often overlooked, especially in real-world scenarios requiring diverse teacher expertise. To address these challenges, we introduce \textit{Multi-Teacher Bayesian Knowledge Distillation} (MT-BKD), where a distilled student model learns from multiple teachers within the Bayesian framework. Our approach leverages Bayesian inference to capture inherent uncertainty in the distillation process. We introduce a teacher-informed prior, integrating external knowledge from teacher models and task-specific training data, offering better generalization, robustness, and scalability. Additionally, an entropy-based weighting mechanism adaptively adjusts each teacher's influence, allowing the student to combine multiple sources of expertise effectively. MT-BKD enhances the interpretability of the student model's learning process, improves predictive accuracy, and provides uncertainty quantification. We validate MT-BKD on both synthetic and real-world tasks, including protein subcellular location prediction and image classification. Our experiments show improved performance and robust uncertainty quantification, highlighting the strengths of our MT-BKD framework.
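
One plausible reading of "entropy-based weighting" is that teachers with more confident (lower-entropy) predictions receive larger weights. The sketch below illustrates that reading only. The softmax-over-negative-entropy rule, the function names, and the toy probabilities are assumptions, not the MT-BKD formulation.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_weights(teacher_probs):
    """Softmax over negative entropies: low-entropy teachers get more weight."""
    scores = [math.exp(-entropy(p)) for p in teacher_probs]
    total = sum(scores)
    return [s / total for s in scores]

def combine_teachers(teacher_probs):
    """Return the per-teacher weights and the weighted mixture of their predictions."""
    weights = entropy_weights(teacher_probs)
    n_classes = len(teacher_probs[0])
    mixture = [
        sum(w * p[c] for w, p in zip(weights, teacher_probs))
        for c in range(n_classes)
    ]
    return weights, mixture

# Hypothetical predictions for one sample: a confident teacher and an uninformative one.
teachers = [[0.9, 0.05, 0.05], [1 / 3, 1 / 3, 1 / 3]]
weights, mixture = combine_teachers(teachers)
```

Because the weights are computed per sample, a teacher can dominate on inputs where it is confident and recede where it is not.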

2605.27958 2026-05-28 cs.CL cs.AI cs.LG 版本更新

Pressure-Testing Deception Probes in LLMs: Scaling, Robustness, and the Geometry of Deceptive Representations

压力测试LLM中的欺骗探针：扩展性、鲁棒性与欺骗表示的几何结构

Sachin Kumar

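Affiliations: LexisNexis

AI summary: Through systematic pressure testing, diagnoses why linear probes fail under distribution shift, finds that style augmentation restores near-perfect detection, and shows that deception is encoded neither as a single linear direction nor as an entropy proxy but as distributed sub-threshold features.

Comments Accepted at the GEM Workshop @ ACL 2026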

AI中文摘要

基于LLM激活训练的线性探针越来越多地被提议作为欺骗检测指标，但在干净基准上报告AUROC超过0.96，而在分布偏移下崩溃。本文系统地对Gemma 3模型家族（1B-27B参数）的探针指标进行压力测试，诊断其失败原因而不仅仅是记录失败。我们测试了关于欺骗编码的四个假设：（1）单一线性方向，（2）多维子空间，（3）凸锥包，（4）熵代理。我们的设计包括跨域转移矩阵、基于排列零基线的多维探针分析、熵残差化测试以及8种风格偏移下的干扰评估。我们发现：（a）探针在干净数据上达到近乎完美的AUROC（>=0.998），但在风格偏移下崩溃；风格增强的探针在未见风格上恢复近乎完美的检测（平均AUROC 0.979-0.983）；（b）单一方向假设被拒绝（k=1仅捕获0.61-0.80 AUROC），跨域转移失败被确认为几何原因而非层不匹配驱动；（c）熵代理假设被拒绝（最大|rho|=0.454，残差化后最大Delta-AUROC=0.004）；（d）欺骗并未形成显著的线性子空间（每域k*=0），但多维探针（k>=5）通过分布式亚阈值特征恢复信号。探针脆弱性反映了分布狭窄性而非架构限制：风格增强的探针在4B和27B均恢复近乎完美的检测，表明逆缩放模式是训练分布伪影而非真正的规模依赖现象。

英文摘要

Linear probes trained on LLM activations are increasingly proposed as deception-detection metrics, yet report AUROC exceeding 0.96 on clean benchmarks while collapsing under distributional shift. This paper systematically pressure-tests probe-based metrics across the Gemma 3 model family (1B-27B parameters), diagnosing why they fail rather than merely documenting that they fail. We test four hypotheses about deception encoding: (1) single linear direction, (2) multi-dimensional subspace, (3) convex conic hull, (4) entropy proxy. Our design includes cross-domain transfer matrices, multi-dimensional probe analysis with permutation null baselines, entropy-residualization tests, and distractor evaluations across 8 stylistic shifts. We find that: (a) probes achieve near-perfect AUROC (>=0.998) on clean data but collapse under stylistic shifts; style-augmented probes recover near-perfect detection (mean AUROC 0.979-0.983) on unseen styles; (b) the single-direction hypothesis is rejected (k=1 captures only 0.61-0.80 AUROC), with cross-domain transfer failure confirmed as geometric rather than layer-mismatch-driven; (c) the entropy-proxy hypothesis is rejected (max |rho|=0.454, max Delta-AUROC after residualization=0.004); and (d) deception does not form a significant linear subspace (per-domain k*=0), yet multi-dimensional probes (k>=5) recover the signal through distributed sub-threshold features. Probe fragility reflects distributional narrowness rather than an architectural limitation: style-augmented probes recover near-perfect detection at both 4B and 27B, establishing that the inverse scaling pattern is a training-distribution artifact rather than a genuine scale-dependent phenomenon.

2605.27954 2026-05-28 cs.LG 版本更新

Cyclical Entropy Eruption: Entropy Dynamics in Agent Reinforcement Learning

循环熵爆发：智能体强化学习中的熵动力学

Wendi Li, Shawn Im, Sharon Li

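Affiliations: Department of Computer Sciences, University of Wisconsin–Madison

AI summary: Identifies a cyclical entropy eruption phenomenon in agent reinforcement learning training and proposes the SEAL auxiliary loss to stabilize training and improve performance.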

AI中文摘要

当图文推理遇上安全：什么决定了多模态越狱鲁棒性？

Yuan Tian, Bing Hu, Fang Wu, Xiaomin Li, Binghang Lu, Neil Zhenqiang Gong

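Affiliations: Independent Researcher; Stanford University; Harvard University; Purdue University; Duke University

AI summary: Studies how different think-with-image reasoning paradigms in multimodal large language models affect jailbreak robustness, finds that explicit image-tool interaction markedly lowers attack success rates, and explains the mechanism at the representation level with an image-tool safety vector framework.

Comments 17 pages, 6 figures, 7 tables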

AI中文摘要

图文推理正成为大型视觉-语言模型的一种新推理范式，但其安全性影响尚不明确。现有系统已涵盖多种流程设计，包括直接响应生成、纯文本前轮、视觉状态操作以及显式外部图像工具调用。本文探究这些评估范式中哪一种能提升多模态越狱鲁棒性及其原因。在多个视觉-语言模型上，我们的实验表明显式图像工具交互的攻击成功率最低，平均相对降低约30%。这一发现起初令人惊讶：即使返回的图像工具输出被人为覆盖或本身不安全，攻击成功率仍保持较低，但在纯文本前轮控制下又恢复到接近直接回答的水平。这些结果表明，较低的攻击成功率并非由良性返回图像语义或仅文本图像工具轨迹解释。为解释这一模式，我们引入了一个图像工具安全向量框架，将图像工具调用建模为隐藏表示向安全相关方向的残差偏移。表征层面的分析和激活干预支持了这一解释。总体而言，我们的结果表明，显式图像工具交互是提升越狱鲁棒性的一种有前景的设计模式，同时也推动了针对特定流程的安全性评估。

英文摘要

Think-with-image reasoning is emerging as a new inference paradigm for large vision-language models, but its safety implications remain poorly understood. Existing systems already span multiple process designs, including direct response generation, text-only prior turn, visual-state manipulation, and explicit external image-tool invocation. In this paper, we ask which of these evaluated paradigms improves multimodal jailbreak robustness, and why. Across multiple vision-language models, explicit image-tool interaction yields the lowest attack success rates in our experiments, reducing jailbreak success by around 30% relative on average across the evaluated models. This finding is initially surprising: ASR remains low even when the returned image-tool output is manually overridden or itself unsafe-looking, but returns near direct-answering levels under text-only prior turn controls. These results indicate that the lower ASR is not explained by benign returned-image semantics or by the textual image-tool trace alone. To explain the pattern, we introduce an image-tool safety vector framework that models image-tool invocation as a residual shift in hidden representations toward a safety-relevant direction. Representation-level analyses and activation interventions support this account. Overall, our results suggest that explicit image-tool interaction is a promising design pattern for improving jailbreak robustness, while also motivating pipeline-specific safety evaluation.

2605.27929 2026-05-28 q-bio.NC cs.LG 版本更新

Exploratory Experience Shapes the Geometry of Predictive Representations

探索性经验塑造预测表征的几何结构

Kseniia Shilova, Abdelrahman Sharafeldin, Advay Balakrishnan, Hannah Choi

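Affiliations: School of Mathematics, Georgia Institute of Technology; College of Computing, Georgia Institute of Technology

AI summary: Builds an online learning agent in a tree-like maze to study how exploratory and exploitative behavioral strategies affect the geometry of predictive-coding-based internal representations, finding that exploration promotes more spatially organized representations, consistent with mouse experimental data.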

AI中文摘要

主动感知通过行动-感知循环将行为与学习联系起来：行动决定了用于更新内部感知预测模型的观测，而该模型随后指导下一步行动。预测编码框架为建模这一过程提供了自然方式，因为内部表征不断更新以预测未来观测。这里，我们探究探索性和利用性行为策略如何塑造这些内部预测表征。我们在树状迷宫中构建了一个在线学习智能体，其可调参数控制探索与利用模式之间的平衡。智能体根据自身行为产生的经验更新基于预测编码的感知模型。该模型预测未来迷宫状态和奖励概率，使智能体能够通过探索期间的预期信息增益或利用期间的预测奖励来选择行动。结果表明，产生的内部预测表征强烈依赖于智能体的行为模式。探索性智能体发展出更具空间组织性的表征，并更好地在潜在空间中保留迷宫转换的结构。相反，利用性智能体学习到组织性较差的表征。然后，我们用水剥夺小鼠在相同迷宫中导航的自然轨迹训练该预测模型，并将结果表征与智能体轨迹学习到的表征进行比较。更具探索性的小鼠表现出与探索性智能体高度匹配的表征几何结构，而访问模式受限的小鼠则类似于奖励驱动的利用性智能体。总之，这些发现表明，在人工智能体和动物中，探索通过围绕空间位置和转换上下文组织潜在空间，使预测模型能够形成泛化的内部表征。

英文摘要

Active sensing links behavior and learning through an action-perception loop: actions determine the observations used to update internal predictive models of perception, which subsequently guide the next actions. Predictive-coding frameworks provide a natural way to model this process, since internal representations are continuously updated to predict future observations. Here, we ask how exploratory and exploitative behavioral strategies shape these internal predictive representations. We build an online learning agent in a tree-like maze with a controllable parameter regulating the balance between exploratory and exploitative regimes. The agent updates a predictive-coding-based perception model from experience generated by its own behavior. The model predicts both future maze states and reward probability, allowing the agent to select actions either by expected information gain during exploration or by predicted reward during exploitation. We show that the resulting internal predictive representations depend strongly on the agent's behavioral regime. Exploratory agents develop representations that are more spatially organized and better preserve the structure of maze transitions in latent space. In contrast, exploitative agents learn less organized representations. We then train this predictive model on natural trajectories of water-deprived mice navigating the same maze and compare the resulting representations with those learned from agent trajectories. More exploratory mice show representational geometries that closely match those of exploratory agents, whereas mice with more restricted visitation patterns resemble reward-driven, exploitative agents. Together, these findings suggest that exploration enables predictive models to form generalized internal representations by organizing latent space around both spatial location and transition context in artificial agents and animals.

2605.27927 2026-05-28 cs.CV cs.LG 版本更新

Structure-Guided Visual Perturbation Neutralization for LVLMs

结构引导的视觉扰动中和用于大型视觉语言模型

Yuanhe Zhang, Xueting Wang, YanBin Ren, Haoran Gao, Xinhan Zheng, Zhenhong Zhou, Fanyu Meng, Li Sun, Sen Su

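Affiliations: Beijing University of Posts and Telecommunications; University of Science and Technology of China; JIUTIAN Research; Nanyang Technological University; Chongqing University of Posts and Telecommunications

AI summary: Proposes the Structure-Induced Guided Neutralization (SIGN) framework, a lightweight plug-and-play adversarial defense built on Prior Structural Extraction and Dynamic Guided Neutralization, achieving over 87% defense success rate with only 0.5% pixel modification and 0.16 seconds per image.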

AI中文摘要

图像输入使大型视觉语言模型（LVLMs）能够感知细粒度的视觉信息，但也引入了一个像素级攻击面，通过该攻击面，对抗性扰动可以引发不安全的模型行为。然而，大多数现有防御是为传统计算机视觉场景设计的，因此常常忽略LVLMs所需的跨模态对齐，导致性能下降。同时，针对LVLMs的有限防御通常需要大量的图像修改并引入可观的计算开销，从而损害推理质量和效率。为解决这些限制，我们提出了结构诱导引导中和（SIGN），一个轻量级、即插即用的防御框架，通过先验结构提取提高LVLM兼容性，并通过动态引导中和实现高效的扰动抑制。大量实验表明，SIGN在仅0.5%像素修改和每张图像0.16秒的情况下实现了超过87%的防御成功率，同时几乎保留了原始视觉表示和良性任务性能。我们的工作为需要昂贵模型训练的防御提供了一种轻量级替代方案，并突显了利用视觉编码器进行高效对抗性保护的潜力。我们的代码已在 https://anonymous.4open.science/r/SIGN-BCB1 开源。

英文摘要

Image inputs enable Large Vision Language Models (LVLMs) to perceive fine-grained visual information, but also introduce a pixel-level attack surface through which adversarial perturbations can elicit unsafe model behaviors. However, most existing defenses are designed for traditional computer vision settings and thus often overlook the cross-modal alignment required by LVLMs, leading to degraded performance. Meanwhile, the limited defenses tailored to LVLMs often require substantial image modifications and introduce considerable computational overhead, thereby compromising inference quality and efficiency. To address these limitations, we propose Structure-Induced Guided Neutralization (SIGN), a lightweight, plug-and-play defense framework that improves LVLM compatibility via Prior Structural Extraction and achieves efficient perturbation suppression via Dynamic Guided Neutralization. Extensive experiments show that SIGN achieves over 87\% defense success rate with only 0.5\% pixel modification and 0.16 seconds per image, while nearly preserving original visual representations and benign task performance. Our work offers a lightweight alternative to defenses that require costly model training and highlights the potential of exploiting a vision encoder for efficient adversarial protection. Our code is open source on https://anonymous.4open.science/r/SIGN-BCB1.

2605.27923 2026-05-28 cs.CV cs.AI cs.LG quant-ph 版本更新

Do We Really Need Quantum Machine Learning?: A Multidimensional Empirical Study

我们真的需要量子机器学习吗？：一项多维实证研究

Sudip Vhaduri, Ryan Gammon, Sayanton Dibbo

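Affiliations: Department of Computer Science, University of Alabama, AL 35487

AI summary: Benchmarks classical and quantum machine learning models on the MNIST handwritten digit dataset across multiple dimensions, finding that quantum models match or exceed classical models in accuracy and are more parameter- and memory-efficient, at a higher computational cost.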

AI中文摘要

计算机视觉的快速发展和日益复杂的图像识别任务暴露了经典机器学习模型的基本计算限制，推动了量子计算作为一种新兴范式的探索。本文对MNIST手写数字数据集上的经典和量子机器学习模型进行了全面的基准测试，评估了传统模型（经典支持向量机CSVM和量子支持向量机QSVM）以及深度神经网络模型（经典卷积神经网络CCNN和量子卷积神经网络QCNN）在四个性能维度上的表现：分类准确率、计算运行时间、参数数量和内存需求。实验作为特征维度和样本量的函数进行，并在CPU和GPU执行环境下进行，提供了受控的多维比较，以解决先前工作中的空白。对于基于SVM的模型，QSVM在准确率上始终优于CSVM，在1000个样本时达到约0.90对比约0.85，但计算成本更高。10个量子比特的特征数和200-500的样本量成为平衡准确率和运行时间的实际工作点。对于神经网络模型，CCNN和QCNN实现了可比的分类准确率，在64个特征和60000个样本时均超过0.96，但QCNN在参数和内存效率上显著更优，在较高特征数下比CCNN少约94%的参数和约75%的内存，但运行时间更长。在两个模型家族中，随着特征维度或样本量的增加，量子模型在准确率上始终以更大优势超越经典模型。

英文摘要

The rapid growth of computer vision and increasingly complex image recognition tasks has exposed fundamental computational limitations of classical machine learning models, motivating the exploration of quantum computing as an emerging new paradigm. This paper presents a comprehensive benchmarking study of classical and quantum machine learning models for image recognition on the MNIST handwritten digit dataset, evaluating both traditional models, a Classical Support Vector Machine (CSVM) and a Quantum Support Vector Machine (QSVM), and deep neural network models, a Classical Convolutional Neural Network (CCNN) and a Quantum Convolutional Neural Network (QCNN), across four performance dimensions: classification accuracy, computational runtime, parameter count, and memory requirements. Experiments are conducted as functions of both feature dimensionality and sample size, and across CPU and GPU execution environments, providing a controlled, multidimensional comparison to address gaps in prior work. For the SVM-based models, QSVM consistently outperforms CSVM in accuracy, reaching $\sim$ 0.90 versus $\sim$ 0.85 at 1,000 samples, with a higher computational cost. A feature count of 10 qubits and a sample size in the range of 200 -- 500 emerge as practical operating points that balance accuracy and runtime. For the neural network models, CCNN and QCNN achieve comparable classification accuracy, both exceeding 0.96 at 64 features and 60,000 samples, yet QCNN offers substantially superior parameter and memory efficiency, requiring $\sim$ 94\% fewer parameters and $\sim$ 75\% less memory than CCNN at higher feature counts, while incurring higher runtime. Across both model families, quantum models consistently outperform classical models by greater margins in accuracy as feature dimensionality or sample size increases.

2605.27919 2026-05-28 cs.RO cs.LG 版本更新

Frequency-Guided Action Diffusion via Sub-Frequency Manifold Traversal

通过子频率流形遍历的频率引导动作扩散

Junlin Wang

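Affiliations: School of Engineering and Applied Science, University of Pennsylvania

AI summary: Proposes the Frequency Guidance Operator (FGO), which steers diffusion policies step by step through sub-frequency manifolds to generate smooth actions, improving action smoothness and temporal consistency on 15 robotic manipulation tasks.

Comments A preprint version of FGO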

AI中文摘要

通过行为克隆学习视觉运动策略通常涉及模仿人类操作员收集的专家演示。然而，自然的人类演示固有地包含高频噪声，例如间歇性抖动、暂停和动作抖动。训练策略直接模仿这些原始轨迹不可避免地会导致模型继承这些次优行为。这种病理在基于扩散的策略中尤为明显，其中迭代去噪步骤可能无意中放大高频伪影而牺牲有意义的细粒度细节。为了解决这些限制，我们提出了一种新颖的基于频率的算法，该算法能够实现隐式频谱操控和平滑动作生成。我们的方法，频率引导算子（FGO），通过逐步将噪声样本通过具有扩展频谱带的中间子频率流形驱动，来引导扩散策略的生成过程。在来自5个基准测试的15个机器人操作任务上验证，FGO在增强动作平滑性和时间一致性方面取得了优越性能，同时保留了成功执行任务所需的细节。项目网站：https://henrywjl.github.io/frequency-guidance-operator/

英文摘要

Learning visuomotor policies via behavior cloning typically involves mimicking expert demonstrations collected by human operators. However, natural human demonstrations inherently contain high-frequency noise, such as intermittent jerks, pauses, and action jitter. Training policies to directly imitate these raw trajectories inevitably causes the model to inherit these suboptimal behaviors. This pathology is particularly pronounced in diffusion-based policies, where iterative denoising steps can inadvertently amplify high-frequency artifacts at the expense of meaningful fine-grained details. To address these limitations, we present a novel frequency-based algorithm that enables implicit spectral maneuvering and smooth action generation. Our method, Frequency Guidance Operator (FGO), steers the generation process of diffusion policies by progressively driving the noisy samples through intermediate sub-frequency manifolds with expanding spectral bands. Validated on 15 robotic manipulation tasks from 5 benchmarks, FGO achieves superior performance in enhancing action smoothness and temporal consistency while preserving the details necessary for successful task execution. Project website: https://henrywjl.github.io/frequency-guidance-operator/

2605.27913 2026-05-28 cs.LG 版本更新

Where LLM Annotators Fail: Label-Free Learning on Graphs with LLMs

LLM标注者失败之处：基于LLM的图上无标签学习

Safal Thapaliya, Jiatan Huang, Chuxu Zhang

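Affiliations: University of Connecticut

AI summary: Observing that LLM annotation noise in graph node classification depends on feature-space region as well as class, proposes the Cluster-Aware Noise Estimation (CANE) framework, which estimates cluster-conditional reliability to select and correct pseudo-labels and outperforms existing label-free methods on multiple graph benchmarks.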

AI中文摘要

图上的节点分类通常需要标注节点，然而在图规模上获取标签成本高昂。当节点属性包含语义内容（如论文摘要、网页或产品描述）时，大型语言模型（LLM）可以通过标注一小部分节点提供低成本监督。然而，这些LLM生成的标签带有噪声，现有的无标签图学习方法通常将这种噪声视为全局的或类别条件的。我们发现LLM标注错误不仅依赖于类别，还依赖于区域：在同一类别内，可靠性在特征空间聚类之间可能差异显著。鉴于此，我们提出聚类感知噪声估计（CANE），一种无标签学习框架，无需真实标签即可估计聚类条件的LLM可靠性，并利用该估计决定信任哪些伪标签以及校正哪些标签。在各种图基准和GNN骨干网络上，CANE优于最强的无标签基线，在表现出更强聚类条件噪声的数据集上提升最大。

英文摘要

Node classification on graphs often requires labeled nodes, yet obtaining labels at graph scale is expensive. When node attributes contain semantic content, such as paper abstracts, web pages, or product descriptions, large language models (LLMs) can provide low-cost supervision by annotating a small subset of nodes. However, these LLM-generated labels are noisy, and existing label-free graph learning methods usually treat this noise as either global or class-conditional. We find that LLM annotation errors are not only class-dependent but also region-dependent: within the same class, reliability can vary sharply across feature-space clusters. In light of this, we propose Cluster-Aware Noise Estimation (CANE), a label-free learning framework that estimates cluster-conditional LLM reliability without ground truth labels, and uses this estimate to decide which pseudo-labels to trust, and which labels to correct. Across various graph benchmarks and GNN backbones, CANE improves over the strongest label-free baselines, with the largest gains on datasets exhibiting stronger cluster-conditional noise.

2605.27904 2026-05-28 cs.AI cs.LG 版本更新

Dr-CiK: A Testbed for Foresight-Driven Agents

Dr-CiK：面向预见驱动型智能体的测试平台

Yihong Tang, Andrew Robert Williams, Arjun Ashok, Vincent Zhihao Zheng, Lijun Sun, Alexandre Drouin, Issam H. Laradji, Étienne Marcotte, Valentina Zantedeschi

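Affiliations: McGill University; ServiceNow Research; Mila -- Quebec AI Institute; University of British Columbia

AI summary: Noting that existing context-aided forecasting benchmarks assume the context is already provided, proposes the Dr-CiK benchmark to evaluate whether agents can retrieve, filter, and distill forecasting-relevant context from a document corpus and generate forecasts. Experiments show that high-quality context substantially improves forecasting, but existing deep research agents usually recover less than 5% of the supporting evidence and are easily misled by distractors.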

AI中文摘要

现实环境中的时间序列预测通常不仅依赖于历史观测，还依赖于必须从嘈杂、异构的信息源中主动发现的外部上下文。然而，现有的上下文辅助预测基准通常假设支持性上下文已经提供，未考虑智能体是否能自行识别。因此，我们引入Dr-CiK，一个用于评估智能体是否能够从文档语料库中检索预测相关的支持性上下文、过滤干扰项、将检索到的上下文提炼为对预测有用的证据，并生成由该证据支持的预测的基准。通过上下文消融实验以及对最先进的深度研究和预测方法的联合评估，我们表明高质量上下文显著提高了Dr-CiK中的预测性能。然而，大多数现有的深度研究智能体仅能恢复一小部分真实支持证据（通常<5%），经常被干扰项误导（>80%的干扰项引用），并且可能导致预测器在使用检索到的上下文时比不使用上下文时表现更差。我们的结果激励了对预见驱动型智能体的研究，这些智能体能够搜索正确的上下文以预测未来。

英文摘要

Time series forecasting in real-world settings often depends not only on historical observations, but also on external context that must be actively discovered from noisy, heterogeneous information sources. Yet existing context-aided forecasting benchmarks typically assume that the supporting context is already provided, leaving open whether agents can identify it on their own. Therefore, we introduce Dr-CiK, a benchmark for evaluating whether agents can retrieve forecasting-relevant supporting context from a document corpus, filter out distractors, distill the retrieved context into forecast-useful evidence, and generate forecasts supported by that evidence. Through context ablations and evaluations of state-of-the-art deep research and forecasting methods paired together, we show that high-quality context substantially improves forecasting performance in Dr-CiK. However, most existing DR agents recover only a small fraction of the ground-truth supporting evidence (usually <5%), are frequently misled by distractors (>80% distractor citations), and can cause forecasters to perform worse with retrieved context than without context. Our results motivate research on foresight-driven agents that search for the right context to predict the future.

2605.27892 2026-05-28 cs.LG 版本更新

FedEHR-Gen: Federated Synthetic Time-Series EHR Generation via Latent Space Alignment and Distribution-Aware Aggregation

FedEHR-Gen: 通过潜在空间对齐和分布感知聚合的联邦合成时间序列电子健康记录生成

Jun Bai, Ziyang Song, Yue Li

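Affiliations: McGill University; Mila - Quebec AI Institute; Ohio University

AI summary: Proposes FedEHR-Gen, the first federated framework for synthesizing time-series electronic health records across distributed hospitals. Its two-stage learning (a federated autoencoder that aligns latent spaces, then a federated TCVAE with distribution-aware aggregation) addresses high dimensionality, sparsity, and heterogeneity, reaching generation quality comparable to centralized training on eICU and MIMIC-III.

Comments 8 pages main paper with 14 pages supplementary appendix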

AI中文摘要

合成电子健康记录（EHR）生成为隐私受限的医疗环境中的数据增强和跨医院建模提供了一条有前景的途径。然而，大多数现有的EHR生成模型是集中式的，需要汇集各医院的数据，这在现实世界中数据共享受限时往往不可行。虽然联邦EHR生成提供了一种自然的解决方案，但由于EHR数据的高维性、稀疏性和跨医院异质性，直接的联邦建模常常崩溃或发散。在这项工作中，我们提出了FedEHR-Gen，这是首个用于跨分布式医院合成时间序列EHR生成的联邦框架。FedEHR-Gen采用两阶段学习范式。首先，我们引入了一个联邦自编码器，将高维稀疏的EHR特征投影到紧凑的潜在空间。为了确保跨医院的语义一致性，我们开发了一种逐层匹配聚合机制，将局部编码器对齐到统一的全局潜在空间。其次，在这个对齐的潜在空间上，我们训练了一个具有分布感知聚合的联邦时间条件变分自编码器（TCVAE），从而在严重的跨医院异质性下实现稳定的时间生成建模。在eICU和MIMIC-III数据集上的大量实验表明，FedEHR-Gen在生成保真度、下游效用和隐私风险方面与集中训练相当，同时始终优于标准的联邦基线。

英文摘要

Synthetic Electronic Health Record (EHR) generation provides a promising avenue for data augmentation and cross-hospital modeling in privacy-constrained healthcare settings. However, most existing EHR generative models are centralized and require pooling data across hospitals, which is often infeasible when real-world data sharing is restricted. While federated EHR generation offers a natural solution, direct federated modeling often collapses or diverges due to the high dimensionality, sparsity, and cross-hospital heterogeneity of EHR data. In this work, we propose FedEHR-Gen, the first federated framework for synthetic time-series EHR generation across distributed hospitals. FedEHR-Gen uses a two-stage learning paradigm. First, we introduce a federated autoencoder that projects high-dimensional and sparse EHR features onto a compact latent space. To ensure semantic consistency across hospitals, we develop a layer-wise matching aggregation mechanism that aligns local encoders into a unified global latent space. Second, operating on this aligned latent space, we train a federated temporal conditional variational autoencoder (TCVAE) with distribution-aware aggregation, enabling stable temporal generative modeling under severe cross-hospital heterogeneity. Extensive experiments on the eICU and MIMIC-III datasets demonstrate that FedEHR-Gen achieves generation fidelity, downstream utility, and privacy risk comparable to centralized training, while consistently outperforming the standard federated baseline.

2605.27877 2026-05-28 cs.LG cs.AI 版本更新

SPAR: Support-Preserving Action Rectification

SPAR: 支持保持的动作纠正

Jiaxin Zhao, Weihang Pan, Xun Liang, Binbin Lin

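Affiliations: Zhejiang University

AI summary: Proposes the Support-Preserving Action Rectification (SPAR) framework, which recasts global learning as local residual rectification and introduces a Latent Self-Imitation mechanism to resolve the conflict between value maximization and data-distribution fitting in offline policy improvement, achieving state-of-the-art performance on the D4RL benchmark.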

AI中文摘要

离线策略改进面临着最大化价值与拟合数据分布之间的固有冲突。虽然样本内加权回归是稳定的，但它过度保守，抑制了分布尾部的高价值动作；相反，基于梯度的方法通常表现出梯度的拟合-优化冲突，这会将策略推离数据流形。为了解决这个问题，我们提出了支持保持的动作纠正（SPAR），它将全局学习重新定义为锚定在冻结的纯行为克隆策略上的局部残差纠正。该框架在残差空间中进行细粒度拟合和局部策略改进，从而收缩搜索空间。我们进一步引入了潜在自模仿，利用潜在采样加权回归机制来解决残差空间中的拟合-改进梯度冲突。理论上，我们证明了该机制消除了标准价值梯度的流形法向漂移，而广泛的D4RL实验表明，SPAR从次优基线中提取了显著的增益，实现了最先进的性能。

英文摘要

Offline policy improvement faces an inherent conflict between maximizing value and fitting the data distribution. While in-sample weighted regression is stable, it suffers from over-conservatism that suppresses high-value actions in the distribution tail; conversely, gradient-based approaches often exhibit a fitting-optimization conflict of gradients, which drives the policy off the data manifold. To address this, we propose Support-Preserving Action Rectification (SPAR), which reframes global learning as a local residual rectification anchored to a frozen pure behavior cloning policy. This framework performs fine-grained fitting and local policy improvement in the residual space, thereby contracting the search space. We further introduce Latent Self-Imitation, utilizing a latent-sampling weighted-regression mechanism to address fitting-improvement gradient conflict in the residual space. Theoretically, we prove this mechanism eliminates the manifold-normal drift of standard value gradients, while extensive D4RL experiments show SPAR extracts significant gains from suboptimal baselines to achieve state-of-the-art performance.


2605.27861 2026-05-28 cs.LG cs.AI q-bio.QM 版本更新

From Detection to Mechanism: Cross-Attention Graph Neural Networks Enable Drug-Drug Interaction Type Prediction: An Ablation Study with Acetylsalicylic Acid Validation

从检测到机制：跨注意力图神经网络实现药物相互作用类型预测——一项以乙酰水杨酸验证的消融研究

Juergen Dietrich

AI总结本研究通过系统消融实验比较三种图神经网络架构，发现跨注意力机制（CrossAtt）在药物相互作用类型预测（多分类）上比二元检测提升显著，并在乙酰水杨酸验证中实现10/10正确预测。

Comments 12 pages, 1 figure

详情

AI中文摘要

预测两种药物是否相互作用（二元检测）与预测该相互作用的机制类型（多分类）是本质上不同的任务。本研究在包含38,337个正例对（涵盖86种相互作用类型）的公开基准数据集上，对三种图神经网络架构进行了系统的消融实验，用于药物相互作用预测。在相同训练条件下（n=61,339对）比较了三种架构：带有拼接的双消息传递神经网络（Concat）、带有四头跨注意力的双MPNN（CrossAtt）以及引入相互作用图的三元MPNN（Ternary）。CrossAtt在多分类F1-macro上比Concat绝对提升+0.186（+45%），而二元AUC仅提升+0.012（+1.3%），证实原子级分子间通信专门支持机制类型分类。尽管训练数据相同，三元架构表现不佳，其失败与训练不稳定性假设一致。在训练前保留的十个乙酰水杨酸药物对上的验证表明，CrossAtt实现了10/10正确的DDI类型预测，而Ternary为0/10。在所有架构中识别出两个一致的失败案例，与一项配套毒性研究中确立的结构限制相关。

英文摘要

Predicting whether two drugs interact (binary detection) is a substantially different task from predicting the mechanism type of that interaction (multi-class classification). This study presents a systematic ablation of three Graph Neural Network (GNN) architectures for drug-drug interaction (DDI) prediction on a publicly available benchmark dataset comprising 38,337 positive pairs across 86 interaction types. Three architectures are compared under identical training conditions (n = 61,339 pairs): a siamese dual Message Passing Neural Network (MPNN) with concatenation (Concat), a dual MPNN with four-head cross-attention (CrossAtt), and a ternary MPNN incorporating an interaction graph (Ternary). CrossAtt improves multi-class F1-macro by +0.186 absolute (+45%) over Concat, while improving binary AUC by only +0.012 (+1.3%), confirming that atom-level inter-molecular communication specifically enables mechanism-type classification. The ternary architecture underperforms despite equivalent training data, with its failure consistent with a training instability hypothesis. Validation on ten acetylsalicylic acid (ASA) drug pairs, held out prior to training, demonstrates 10/10 correct DDI-type predictions for CrossAtt versus 0/10 for Ternary. Two consistent failure cases are identified across all architectures, linking to structural limits established in a companion toxicity study.
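
The gap between the F1-macro gain and the AUC gain is easier to read with the metric in view. The sketch below is a generic macro-F1 computation, not the study's evaluation code; the toy labels are hypothetical and only show how a class that is never predicted pulls macro-F1 down while plain accuracy stays high.

```python
from collections import Counter

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: the "rare" interaction type is always missed.
y_true = ["common"] * 8 + ["rare"] * 2
y_pred = ["common"] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
```

Because every class counts equally, a model that starts recognising rare interaction types can raise macro-F1 substantially even when a ranking metric for the binary task barely moves.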


2605.27858 2026-05-28 cs.CL cs.AI cs.LG 版本更新

DecomposeRL: Learning to Ask Useful, Informative, and Diverse Questions for Semi-Supervised, Traceable Claim Verification

DecomposeRL: 学习提出有用、信息丰富且多样的问题以进行半监督、可追踪的声明验证

Shubhashis Roy Dipta, Ankur Padia, Francis Ferraro

发表机构 * Department of Computer Science and Electrical Engineering（计算机科学与电气工程系）

AI总结提出DecomposeRL框架，通过GRPO和多面奖励集成将声明分解为可追踪的子问题，在完全监督和半监督设置下实现高精度，且模型规模小4倍仍匹配大模型性能。

详情

AI中文摘要

声明验证分为两类：端到端分类器准确但无法提供可检查的追踪，而基于分解的方法可产生可检查的追踪但在基准数据集上性能滞后。我们提出DecomposeRL，一种能产生可检查追踪的准确声明验证器。DecomposeRL将分解建模为使用GRPO和多面奖励集成训练的RL策略，支持从无标签声明进行完全监督和半监督学习。DecomposeRL通过数据筛选漏斗解决了GRPO高昂的训练成本，将115K事实验证声明提炼为包含密集学习信号的5K声明子集。我们表明，仅在约5K精选声明上使用完全监督训练的DecomposeRL-7B策略，在包含生物医学、政治、科学和通用领域声明的11个声明验证基准上，实现了86.3的域内和69.8的域外平衡准确率。尽管规模小4倍，它匹配了32B基线和GPT-4.1-mini，并且在仅10%标签声明数据的半监督设置中进一步优于基线。代码、数据和模型见https://dipta007.github.io/DecomposeRL。

英文摘要

Claim verification is split between end-to-end classifiers, which are accurate but yield no inspectable traces, and decomposition-based methods, which produce inspectable traces but lag in performance on benchmark datasets. We propose DecomposeRL, an accurate claim verifier that produces inspectable traces. DecomposeRL frames decomposition as an RL policy trained with GRPO and a multi-faceted reward ensemble, enabling both fully supervised and semi-supervised learning from unlabeled claims. DecomposeRL addresses the prohibitive training cost of GRPO with a data-curation funnel that distills 115K fact-verification claims into a compact, learning-signal-dense subset of 5K claims. We show that a DecomposeRL-7B policy trained with full supervision on only ~5K curated claims achieves 86.3 in-domain and 69.8 out-of-domain balanced accuracy across 11 claim-verification benchmarks containing biomedical, political, scientific, and general-domain claims. Despite being 4x smaller, it matches 32B baselines and GPT-4.1-mini, and it further outperforms baselines in a semi-supervised setting with only 10% labeled claims data. Code, data, and models are available at https://dipta007.github.io/DecomposeRL
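
As a rough illustration of how a reward ensemble feeds GRPO, the sketch below combines several reward components into one scalar and then normalises rewards within a sampled group. The component names (`useful`, `informative`, `diverse`) echo the title but the weights, the weighted-sum form, and the toy values are assumptions, not the paper's reward design; the group normalisation follows the commonly described GRPO recipe, with population standard deviation chosen here for simplicity.

```python
from statistics import mean, pstdev

def ensemble_reward(components, weights):
    """Weighted sum of reward components (hypothetical combination rule)."""
    return sum(weights[name] * components[name] for name in weights)

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardise each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical weights and three sampled decompositions of one claim.
weights = {"useful": 0.5, "informative": 0.3, "diverse": 0.2}
group = [
    {"useful": 1.0, "informative": 0.8, "diverse": 0.5},
    {"useful": 0.0, "informative": 0.4, "diverse": 0.9},
    {"useful": 1.0, "informative": 0.2, "diverse": 0.1},
]
rewards = [ensemble_reward(c, weights) for c in group]
advantages = group_relative_advantages(rewards)
```

A group whose samples all receive the same reward yields zero advantages and therefore no gradient signal, which is one reason a curation step that keeps "learning-signal-dense" claims can cut training cost.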


2605.27834 2026-05-28 cs.LG stat.ML 版本更新

Reward Transfer from Inverse Reinforcement Learning: A Coupled Minimax Approach

从逆强化学习中的奖励迁移：一种耦合极小极大方法

Guang-Yuan Hao, Lars van der Laan, Aurélien Bibaut, Nathan Kallus

发表机构 * Cornell Tech, Cornell University（康奈尔科技，康奈尔大学）； Netflix Research（Netflix研究）； Department of Statistics, University of Washington（华盛顿大学统计学系）

AI总结提出一种耦合极小极大方法，通过联合求解源和目标环境的贝尔曼方程组，消除源贝尔曼残差误差的一阶影响，实现逆强化学习奖励从源环境到目标环境的有效迁移。

详情

AI中文摘要

我们研究利用逆强化学习从专家演示中学习到的奖励从一个环境迁移到另一个不同环境的强化学习问题。当演示在受控环境中收集时，这自然发生。我们将问题表述为跨源和目标环境的贝尔曼方程联合系统，并开发了目标软$q$函数的极小极大估计器。顺序求解方法首先估计源奖励，然后将其代入目标控制问题，而耦合方法则联合求解源和目标系统方程。我们表明，与顺序方法相比，耦合方法消除了源贝尔曼残差误差的一阶影响。我们刻画了每种方法的局部行为，建立了有限样本软$q$函数误差界，并证明了所得软控制策略的遗憾保证。使用脓毒症模拟器的实证研究验证了理论比较。

英文摘要

We study the transfer of rewards learned using inverse reinforcement learning from expert demonstrations in one environment to reinforcement learning in a new, different environment. This arises naturally when demonstrations are collected in a controlled environment. We formulate the problem as a joint system of Bellman equations across the source and target environments and develop minimax estimators for the target soft-$q$-function. Whereas a sequential solution approach first estimates the source reward and then plugs it into the target control problem, a coupled approach solves the source and target system of equations jointly. We show that, in contrast to the sequential approach, the coupled approach removes the first-order influence of source Bellman residual error. We characterize the local behavior of each approach, develop finite-sample soft-$q$-function error bounds, and prove regret guarantees for the resulting soft-control policy. An empirical investigation using a sepsis simulator validates the theoretical comparison.
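
The sequential baseline described above can be sketched in a tabular toy setting. This is a minimal sketch under strong assumptions that the abstract does not make: transitions are deterministic and known, the source soft-$q$-function is available exactly, and the soft value is $V(s) = \log \sum_a \exp q(s, a)$. It illustrates the plug-in idea only and is not the coupled minimax estimator; all names and the toy MDPs are hypothetical.

```python
import math

def soft_value(q_row):
    """V(s) = log sum_a exp q(s, a), computed with a max shift for stability."""
    m = max(q_row)
    return m + math.log(sum(math.exp(x - m) for x in q_row))

def soft_q_iteration(reward, next_state, gamma=0.9, iters=500):
    """Iterate q(s, a) = r(s, a) + gamma * V(next_state[s][a])."""
    n_s, n_a = len(reward), len(reward[0])
    q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(iters):
        v = [soft_value(row) for row in q]
        q = [[reward[s][a] + gamma * v[next_state[s][a]] for a in range(n_a)]
             for s in range(n_s)]
    return q

def recover_reward(q, next_state, gamma=0.9):
    """Invert the soft Bellman equation: r(s, a) = q(s, a) - gamma * V(s')."""
    v = [soft_value(row) for row in q]
    return [[q[s][a] - gamma * v[next_state[s][a]] for a in range(len(q[s]))]
            for s in range(len(q))]

# Hypothetical two-state, two-action environments sharing one reward.
true_reward = [[1.0, 0.0], [0.0, 2.0]]
source_next = [[0, 1], [0, 1]]  # source dynamics
target_next = [[1, 0], [1, 0]]  # different target dynamics

q_source = soft_q_iteration(true_reward, source_next)      # stands in for the IRL step
reward_hat = recover_reward(q_source, source_next)         # step 1: estimate source reward
q_target = soft_q_iteration(reward_hat, target_next)       # step 2: plug into target control
```

In this exact setting the plug-in is harmless; the abstract's point is that with an estimated source $q$, any Bellman residual error in step 1 passes to step 2 at first order, which the coupled formulation is designed to remove.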


2605.27831 2026-05-28 cs.LG eess.SP math.OC 版本更新

Decentralized Parameter-Free Online Learning with Compressed Gossip

基于压缩八卦的去中心化无参数在线学习

Tomas Ortega, Hamid Jafarkhani

发表机构 * Center for Pervasive Communications & Computing and EECS Department, University of California, Irvine（普及通信与计算中心和加州大学尔湾分校电子工程与计算机科学系）

AI总结提出DECO-EF算法，结合coin-betting预测与压缩差分八卦，实现去中心化在线凸优化中无参数自适应且压缩通信下的次线性网络遗憾。

详情

AI中文摘要

我们研究当智能体通过图通信且消息可能被压缩时的去中心化在线凸优化。经典的去中心化在线方法通常需要依赖于时间范围、比较器尺度或其他问题参数的学习率选择，而压缩通信引入了必须控制的额外不一致性。我们提出DECO-EF（带误差反馈的去中心化coin-betting），一种去中心化无参数在线学习算法，结合coin-betting预测与基于压缩差分的八卦。每个智能体维护一个干净的累积状态和一个压缩跟踪器，并在八卦步骤中仅通信压缩的状态差分。该方法在在线学习意义上是无参数的：它不调整时间范围、比较器范数或学习率。我们证明了在压缩通信下DECO-EF的期望比较器自适应网络遗憾界。据我们所知，这首次为压缩通信下的无参数去中心化在线学习提供了期望次线性网络遗憾保证。

英文摘要

We study decentralized online convex optimization when agents communicate over a graph and messages may be compressed. Classical decentralized online methods typically require learning-rate choices that depend on the horizon, comparator scale, or other problem parameters, while compressed communication introduces additional disagreement that must be controlled. We propose DECO-EF (DEcentralized COin-betting with Error Feedback), a decentralized parameter-free online learning algorithm that combines coin-betting predictions with compressed difference-based gossip. Each agent maintains a clean accumulated state and a compressed tracker, and communicates only compressed state differences during gossip steps. The method is parameter-free in the online-learning sense: it does not tune to the horizon, the comparator norm, or the learning rate. We prove expected comparator-adaptive network-regret bounds for DECO-EF under compressed communication. To the best of our knowledge, this gives the first expected sublinear network-regret guarantees for parameter-free decentralized online learning under compressed communication.
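
The two ingredients named above can each be sketched for a single agent. The first function is the classical Krichevsky-Trofimov coin-betting learner in one dimension (gradients assumed to lie in [-1, 1]); the second shows a tracker that is updated only by compressed differences, using top-k sparsification as a stand-in compressor. Both are illustrative assumptions: the abstract does not specify DECO-EF's exact betting rule, compressor, or gossip weights, and no network averaging is modelled here.

```python
def kt_coin_betting(gradients, initial_wealth=1.0):
    """1-D KT coin betting: no learning rate, the bet scales with accumulated wealth."""
    wealth = initial_wealth
    coin_sum = 0.0
    predictions = []
    for t, g in enumerate(gradients, start=1):
        bet_fraction = coin_sum / t      # KT estimate from past coin outcomes
        w = bet_fraction * wealth        # prediction = signed fraction of wealth
        predictions.append(w)
        coin = -g                        # negative gradient plays the coin outcome
        wealth += coin * w
        coin_sum += coin
    return predictions, wealth

def top_k(vector, k):
    """Keep the k largest-magnitude coordinates, zero the rest."""
    keep = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)[:k]
    return [v if i in keep else 0.0 for i, v in enumerate(vector)]

def send_compressed_difference(state, tracker, k=1):
    """Transmit only compress(state - tracker); the tracker absorbs what was sent."""
    message = top_k([s - h for s, h in zip(state, tracker)], k)
    new_tracker = [h + m for h, m in zip(tracker, message)]
    return message, new_tracker

predictions, final_wealth = kt_coin_betting([-1.0, -1.0, -1.0])

state, tracker = [3.0, 1.0, 2.0], [0.0, 0.0, 0.0]
for _ in range(3):
    _, tracker = send_compressed_difference(state, tracker)
```

With a fixed state, the tracker catches up one coordinate per round, which shows why sending differences rather than raw states lets a lossy compressor still converge to the uncompressed value.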


2605.27825 2026-05-28 cs.CR cs.LG 版本更新

MRMMIA: Membership Inference Attacks on Memory in Chat Agents

MRMMIA：聊天代理中记忆的成员推断攻击

Kai Chen, Yan Pang, Tianhao Wang

发表机构 * University of Virginia（弗吉尼亚大学）

AI总结针对聊天代理记忆系统，提出一种利用多次召回探针的统一成员推断攻击方法MRMMIA，在黑盒、灰盒和白盒设置下均优于基线，揭示了代理中的隐私风险。

Comments This work investigates the MIA on chat agent memory

详情

AI中文摘要

成员推断攻击（MIAs）测试目标数据记录是否属于系统的私有数据，并已成为衡量机器学习系统隐私泄露的标准工具。先前的工作主要集中在训练语料库或检索数据库上。然而，针对代理记忆的MIAs受到的关注较少，尽管这种记忆可能包含敏感的用户-代理交互、检索到的事实和用户偏好。因此，在这项工作中，我们专注于聊天代理记忆MIAs，其中对手推断候选记忆单元是否属于聊天代理的记忆存储。我们提出了多召回记忆MIA（MRMMIA），这是一种统一的攻击，利用对代理的多次召回探针来提取黑盒、灰盒和白盒设置中的成员信号。我们的实验表明，MRMMIA始终优于基线。我们的结果暴露了代理中的隐私风险，并为聊天代理记忆系统中的成员泄漏提供了初步评估框架。

英文摘要

Membership inference attacks (MIAs) test whether a target data record belongs to a system's private data, and have become a standard tool to measure privacy leakage in machine learning systems. Prior work has primarily focused on training corpora or retrieval databases. However, MIAs against agent memory have received less attention, even though such memory can contain sensitive user-agent interactions, retrieved facts, and user preferences. Therefore, in this work, we focus on chat agent memory MIAs, where an adversary infers whether a candidate memory unit belongs to the chat agent's memory store. We propose Multi-Recall Memory MIA (MRMMIA), a unified attack that utilizes multiple recall probes to the agent to extract the membership signal across black-box, gray-box, and white-box settings. Our experiments demonstrate that MRMMIA consistently outperforms baselines. Our results expose the privacy risk in agents and provide an initial evaluation framework for membership leakage in chat-agent memory systems.


2605.27819 2026-05-28 cs.LG cs.AI 版本更新

ReSAE: Residualized Sparse Autoencoders for Multi-Layer Transformer Interventions

ReSAE: 用于多层Transformer干预的残差化稀疏自编码器

Prathyush Poduval, Calvin Yeung, Neel Desai, Mohsen Imani

发表机构 * Georgia Institute of Technology（佐治亚理工学院）

AI总结针对多层稀疏自编码器（SAE）在Transformer中因层间耦合导致的冗余和交互问题，提出残差化稀疏自编码器（ReSAE），通过拟合层间仿射映射并训练SAE于残差上，减少解码器冗余并提升多层替换下的交叉熵恢复。

详情

AI中文摘要

稀疏自编码器通常逐层训练，尽管Transformer残差流激活在深度上强烈耦合。这对多层干预造成实际问题：不同层的字典可能将容量用于表示相同的向前传递信息，同时替换多层可能产生单层行为无法预测的交互。我们引入残差化稀疏自编码器（ReSAE），它在选定层之间拟合仿射映射，并在未解释的残差上训练后续层的SAE，而非完整激活。重构通过拟合的仿射链映射回原始激活空间，因此ReSAE可以像普通SAE一样使用相同的干预协议进行评估。在Pythia-1.4B和Gemma-2-9B上，残差化减少了解码器冗余，并在大多数测试设置中改进了稀疏探测和定向扰动。尽管重构的原始激活方差较少，ReSAE在多层替换下恢复了更多Transformer交叉熵。这一增益在教师强制和足够的在线稀疏性下最为明显，表明ReSAE保留了与模型下游计算最相关的激活成分。这些结果表明，去除线性可预测的跨层结构是多层SAE干预的有用默认设置。

英文摘要

Sparse autoencoders are usually trained one layer at a time, even though transformer residual stream activations are strongly coupled across depth. This creates a practical problem for multi-layer interventions: different layerwise dictionaries can spend capacity representing the same carried-forward information, and replacing several layers at once can produce interactions that are not predicted by single-layer behavior. We introduce Residualized Sparse Autoencoders (ReSAEs), which fit an affine map between selected layers and train each later-layer SAE on the unexplained residual rather than on the full activation. Reconstructions are mapped back into the original activation space through the fitted affine chain, so ReSAEs can be evaluated with the same intervention protocols as ordinary SAEs. On Pythia-1.4B and Gemma-2-9B, residualization reduces decoder redundancy and improves sparse probing and targeted perturbation in most tested settings. Despite reconstructing less of the raw activation variance, ReSAEs recover more transformer cross entropy under multi-layer replacement. This gain is clearest under teacher-forcing and at sufficient sparsity online, indicating that ReSAEs preserve the components of the activation most relevant to the model's downstream computation. These results suggest that removing linearly predictable cross-layer structure is a useful default for multi-layer SAE interventions.
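
The residualization step can be illustrated with scalar activations. This is a minimal sketch assuming an ordinary least-squares affine fit between one earlier and one later layer; the data are hypothetical, and the SAE itself is omitted (the residual is passed through unchanged, as if the SAE reconstructed it perfectly).

```python
from statistics import mean

def fit_affine(x, y):
    """Least-squares fit y ~ a * x + b for scalar activations."""
    mx, my = mean(x), mean(y)
    var_x = sum((xi - mx) ** 2 for xi in x)
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / var_x
    return a, my - a * mx

def residualize(x, y, a, b):
    """Part of the later-layer activation that the affine map does not explain."""
    return [yi - (a * xi + b) for xi, yi in zip(x, y)]

def reconstruct(x, residual_hat, a, b):
    """Map a (reconstructed) residual back into the original activation space."""
    return [a * xi + b + ri for xi, ri in zip(x, residual_hat)]

# Hypothetical activations of one unit at an earlier and a later layer.
layer_early = [0.0, 1.0, 2.0, 3.0]
layer_late = [1.1, 2.9, 5.2, 6.8]

a, b = fit_affine(layer_early, layer_late)
residual = residualize(layer_early, layer_late, a, b)   # what a later-layer SAE would train on
restored = reconstruct(layer_early, residual, a, b)     # back in activation space
```

The least-squares residual is uncorrelated with the earlier layer by construction, so a dictionary trained on it has no incentive to spend capacity on carried-forward information.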


2605.27817 2026-05-28 cs.RO cs.AI cs.CV cs.LG 版本更新

Turning Video Models into Generalist Robot Policies

将视频模型转化为通用机器人策略

Sizhe Lester Li, Evan Kim, Xingjian Bai, Tong Zhao, Tao Pang, Max Simchowitz, Vincent Sitzmann

发表机构 * MIT（麻省理工学院）； CMU（卡内基梅隆大学）； Amazon FAR（亚马逊公司）

AI总结提出一种解耦的视频到动作策略VERA，利用无动作视频世界模型和基于机器人雅可比矩阵的逆动力学模型，实现跨本体的零样本机器人控制。

Comments project page: https://vera.csail.mit.edu

详情

AI中文摘要

视频生成模型已成为一种有前景的机器人骨干网络，能够生成描绘跨本体和环境完成复杂任务的视频。最近的工作提出了机器人基础模型，通过使用带有动作标签的数据微调视频模型，联合预测未来观测和动作。在本文中，我们测试了一种替代方法的极限：保持视频规划器不变，同时训练一个特定本体的逆动力学模型（IDM）。这种解耦带来了几个自然的好处：视频规划器保持本体无关，不同的视频模型可以轻松互换而无需重新训练IDM，并且IDM可以独立地使用现成的自对弈数据进行训练。我们提出了一种闭环的视频到动作策略，该策略将无动作视频世界模型与基于机器人本体雅可比矩阵的精心设计的IDM相结合。我们证明了我们的IDM设计既数据高效又可扩展到高维动作空间。我们将该策略命名为视频到具身机器人动作模型（VERA），在模拟和真实世界基准测试中取得了强劲的性能，包括零样本的Panda机械臂操作和16自由度Allegro灵巧手立方体重新定向。通过将相同的视频规划器与不同的本体特定IDM配对，可以在多个本体上使用。我们的结果表明，解耦的视频规划加上忠实的视频到动作翻译是实现零样本、跨本体和可泛化机器人控制的可行替代途径。更多结果请访问我们的项目网站：https://vera.csail.mit.edu。

英文摘要

Video generative models have emerged as a promising robotics backbone, capable of generating videos that depict the completion of complex tasks across embodiments and environments. Recent work proposes robot foundation models that jointly predict future observations and actions by finetuning video models with action-labeled data. In this paper, we test the limits of an alternative approach: leave the video planner as-is while training an embodiment-specific inverse dynamics model (IDM). This decoupling offers several natural benefits: the video planner remains embodiment-agnostic, different video models can be interchanged easily without re-training the IDM, and the IDM can be independently trained with readily available self-play data. We present a closed-loop, video-to-action policy that combines an action-free video world model with a carefully-designed IDM based on the robot embodiment Jacobian. We demonstrate that our IDM design is both data-efficient and scalable to high-dimensional action spaces. Our policy, which we coin the Video-to-Embodied Robot Action Model (VERA), achieves strong performance across simulated and real-world benchmarks, including zero-shot Panda arm manipulation and 16-DoF Allegro-hand dexterous cube re-orientation. The same video planner can be used across multiple embodiments by pairing it with different embodiment-specific IDMs. Our results show that decoupled video planning plus faithful video-to-action translation is a viable alternative route towards zero-shot, cross-embodiment, and generalizable robot control. More results are available on our project website: https://vera.csail.mit.edu.


2605.27813 2026-05-28 cs.CV cs.AI cs.LG 版本更新

Residualized Temporal Sparse Autoencoders for Interpreting Diffusion Models

残差化时间稀疏自编码器用于解释扩散模型

Calvin Yeung, Prathyush Poduval, Ali Zakeri, Zhuowen Zou, Mohsen Imani

发表机构 * University of California, Irvine（加州大学尔湾分校）

AI总结提出残差化时间稀疏自编码器，通过去噪时间步间的线性预测残差学习扩散激活轨迹中的可解释特征，并在Stable Diffusion 1.5上验证其有效性。

详情

AI中文摘要

文本到图像扩散模型通过迭代去噪过程生成图像，因此内部神经层产生激活轨迹而非单一静态表示。稀疏自编码器（SAE）最近被用于将扩散激活分解为可解释的特征方向，但大多数方法在单个时间步分析激活或基于时间条件，而非直接从完整激活轨迹中学习。在这项工作中，我们引入了用于扩散激活轨迹的残差化时间SAE。我们收集去噪时间上的激活，拟合相邻时间步之间的线性预测器，并使用初始激活以及这些线性动力学未解释的残差分量来表示每个轨迹。在这种残差化表示上训练SAE鼓励稀疏潜在变量捕捉超出线性可预测范围的结构。残差化解码器方向可以映射回激活空间，使得每个潜在变量可以作为去噪时间上的特征轨迹进行分析。通过在Stable Diffusion 1.5上的重建与消融研究、时空特征分析和定性引导实验，我们表明残差化时间SAE为研究时间结构化的扩散激活提供了一个有用的框架。

英文摘要

Text-to-image diffusion models generate images through an iterative denoising process, so internal neural layers produce trajectories of activations rather than single static representations. Sparse autoencoders (SAEs) have recently been used to decompose diffusion activations into interpretable feature directions, but most approaches analyze activations at individual timesteps or condition on time rather than learning directly from full activation trajectories. In this work, we introduce residualized temporal SAEs for diffusion activation trajectories. We collect activations across denoising time, fit linear predictors between neighboring timesteps, and represent each trajectory using an initial activation together with residual components not explained by these linear dynamics. Training an SAE on this residualized representation encourages sparse latents to capture structure beyond what is linearly predictable. The residualized decoder directions can be mapped back into activation space, allowing each latent to be analyzed as a feature trajectory over denoising time. Through reconstruction and ablation studies, spatiotemporal feature analysis, and qualitative steering experiments on Stable Diffusion 1.5, we show that residualized temporal SAEs provide a useful framework for studying temporally structured diffusion activations.
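
The trajectory representation described above (initial activation plus per-step residuals) can be sketched as an encode/decode pair. The example assumes scalar activations and already-fitted per-step linear predictors `(w, b)`; these values and the function names are hypothetical, and decoding by chaining predictions is one plausible reading of "mapped back into activation space", not necessarily the paper's procedure.

```python
def encode_trajectory(trajectory, predictors):
    """Represent a trajectory as (initial activation, residuals of linear prediction)."""
    residuals = []
    for t, (w, b) in enumerate(predictors):
        predicted = w * trajectory[t] + b          # linear guess for the next timestep
        residuals.append(trajectory[t + 1] - predicted)
    return trajectory[0], residuals

def decode_trajectory(initial, residuals, predictors):
    """Rebuild the trajectory by chaining the linear predictors and adding residuals."""
    trajectory = [initial]
    for (w, b), r in zip(predictors, residuals):
        trajectory.append(w * trajectory[-1] + b + r)
    return trajectory

# Hypothetical activation of one unit across four denoising steps.
trajectory = [2.0, 1.5, 1.25, 1.0]
predictors = [(0.5, 0.5), (0.5, 0.5), (0.5, 0.5)]

initial, residuals = encode_trajectory(trajectory, predictors)
decoded = decode_trajectory(initial, residuals, predictors)
```

Here the first two steps are fully explained by the linear dynamics, so only the last residual is nonzero; that sparse, non-predictable remainder is the kind of signal the SAE is meant to model.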


2605.27796 2026-05-28 eess.IV cs.CV cs.LG eess.SP stat.AP 版本更新

Benchmarking Ultrasound Foundation Models for Fetal Plane Classification

超声基础模型在胎儿平面分类中的基准测试

Leya Barrientos, Yuexi Du, Nicha C. Dvornek

发表机构 * Radiology & Biomedical Imaging, Yale School of Medicine, USA； Department of Biomedical Engineering, Yale University, USA

AI总结本研究对四种超声基础模型（USFM、MOFO、UltraSAM、FetalCLIP）在胎儿平面分类任务上进行基准测试，发现FetalCLIP在线性探测设置中表现最佳，而USFM在全微调设置中表现最佳，且预训练目标显著影响迁移性能。

详情

AI中文摘要

超声因其安全性、可及性和实时成像能力被广泛应用于产科护理。然而，其解读仍依赖操作者，且易受噪声和伪影影响。深度学习模型在解决这些问题上表现出色，但通常需要大量标注数据集，这在临床超声中难以获得。基础模型（FMs）提供了一种替代方案，利用大量超声图像学习可迁移的表征，从而在有限标注数据下实现泛化。本文针对胎儿平面分类任务，对超声专用基础模型进行了全面基准测试。我们评估了四种超声基础模型（USFM、MOFO、UltraSAM、FetalCLIP），并与两个CNN基线（ResNet50、EfficientNet-V2）以及一个在自然图像上预训练的ViT（DINOv3）进行比较。我们在两种互补设置下训练所有模型：全微调和冻结编码器的线性探测。所有模型均使用西班牙胎儿超声数据集进行5折患者级交叉验证训练，并在域内数据和外部非洲队列上测试，以评估跨人群泛化能力。我们发现，FetalCLIP在线性探测设置中取得最佳结果（域内F1=0.9261，域外F1=0.9731），而USFM在全微调设置中表现最佳（域内F1=0.9476，域外F1=0.9515）。MOFO和UltraSAM在两种设置中性能下降最多，在某些情况下甚至不如自然图像预训练模型。这些发现强调了预训练模型的选择对胎儿平面分类性能的显著影响，因为不同的预训练目标导致不同的迁移能力。

英文摘要

Ultrasound is widely used in obstetric care due to its safety, accessibility, and real-time imaging. However, interpretation remains operator-dependent and susceptible to noise and artifacts. Deep learning models have shown strong performance in addressing these problems, but they typically require large annotated datasets that are difficult to obtain in clinical ultrasound. Foundation models (FMs) offer an alternative, using a large number of ultrasound images to learn transferable representations that can generalize with limited labeled data. This work presents a comprehensive benchmark of ultrasound-specific FMs for fetal plane classification. We evaluated four ultrasound FMs (USFM, MOFO, UltraSAM, FetalCLIP) against two CNN baselines (ResNet50, EfficientNet-V2) and a ViT (DINOv3) pretrained on natural images. We trained all models under two complementary settings: full fine-tuning and linear probing with a frozen encoder. All models were trained using 5-fold patient-level cross-validation on a Spanish fetal ultrasound dataset and tested on both in-domain data and an external African cohort to assess cross-population generalization. We found that FetalCLIP achieved the best results in the linear probing setting (F1 = 0.9261 for in-domain, F1 = 0.9731 for out-of-domain), while USFM performed best in the full fine-tuning setting (F1 = 0.9476 for in-domain, F1 = 0.9515 for out-of-domain). MOFO and UltraSAM degraded most in both settings, underperforming natural image pretrained models in some cases. These findings highlight how the choice of pretrained model strongly affects fetal plane classification performance, since different pretraining objectives lead to different levels of transferability.
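
Patient-level cross-validation means that all images from one patient fall into the same fold, so no patient contributes to both training and validation. The sketch below is one simple way to build such folds with a greedy size-balancing rule; the patient identifiers are hypothetical and the benchmark's actual splitting code is not described in the abstract.

```python
from collections import defaultdict

def patient_level_folds(patient_ids, n_folds=5):
    """Assign image indices to folds so that each patient appears in exactly one fold."""
    by_patient = defaultdict(list)
    for index, pid in enumerate(patient_ids):
        by_patient[pid].append(index)
    folds = [[] for _ in range(n_folds)]
    # Place patients with the most images first, each into the currently smallest fold.
    for pid in sorted(by_patient, key=lambda p: (-len(by_patient[p]), p)):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(by_patient[pid])
    return folds

# Hypothetical patient ID for each of ten images.
patient_ids = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5", "p6", "p6"]
folds = patient_level_folds(patient_ids, n_folds=5)
```

Splitting by image instead would let near-duplicate frames from one examination land on both sides of the split and inflate the in-domain scores.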


2605.27794 2026-05-28 stat.ML cs.LG stat.ME 版本更新

Learning to target with network interference

在网络干扰下学习目标定位

Xiaomeng Wang, Hamsa Bastani, Osbert Bastani, Zhimei Ren

发表机构 * Department of Statistics and Data Science, University of Pennsylvania（统计与数据科学系，宾夕法尼亚大学）； Operations, Information and Decisions Department, University of Pennsylvania（运营、信息与决策系，宾夕法尼亚大学）； Department of Computer and Information Science, University of Pennsylvania（计算机与信息科学系，宾夕法尼亚大学）

AI总结研究在bandit设置下网络干扰中的自适应目标定位，通过线性模型和稀疏假设，针对不同干扰结构知识水平提出近最优遗憾算法。

详情

AI中文摘要

本文研究在bandit设置下网络干扰中的自适应目标定位，其中对一个个体的处理可能通过溢出效应影响他人。我们考虑稀疏场景下的线性模型，每个个体的结果最多受少数其他人影响。首先建立遗憾下界，表明忽略网络结构并将问题简化为标准线性bandit必然导致低效学习，尤其是在大规模群体中。为了理解如何利用结构信息，我们分析了干扰结构知识水平不同的场景：(1) 完全支持知识，(2) 列支持大小知识，(3) 无先验知识。对于每种场景，我们建立了表征学习基本极限的遗憾下界，并开发了实现近最优遗憾的算法。总之，我们的结果提供了干扰结构知识如何影响在线学习效率的统一视角，并在每种设置下提供了实用的自适应目标定位算法。在合成和真实数据上的数值实验证明了我们算法的实际优势。

英文摘要

This paper studies adaptive targeting under network interference in a bandit setting, where treatments applied to one individual may affect others through spillover effects. We consider a linear model in a sparse regime, where each individual's outcome can be affected by at most a few others. We first establish a regret lower bound showing that ignoring the network structure and reducing the problem to a standard linear bandit inevitably leads to inefficient learning, particularly in large populations. To understand how structural information can be leveraged, we analyze regimes with varying levels of knowledge of the interference structure: (1) full support knowledge, (2) knowledge of the column support sizes, and (3) no prior knowledge. For each regime, we establish regret lower bounds characterizing the fundamental limits of learning, and develop algorithms that achieve near-optimal regret. Together, our results provide a unified view of how knowledge of the interference structure governs the efficiency of online learning under interference, and offer practical adaptive targeting algorithms in each setting. Numerical experiments on synthetic and real-world data demonstrate the practical benefits of our algorithms.


2605.27790 2026-05-28 cs.LG 版本更新

SYNAPSE: Neuro-Symbolic Visual Thought-to-Text Decoding via Topological Semantic Denoising

SYNAPSE: 通过拓扑语义去噪的神经符号视觉思维到文本解码

Akshaj Murhekar, Abhijit Mishra

发表机构 * School of Information, University of Texas at Austin（德克萨斯大学奥斯汀分校信息学院）

AI总结提出SYNAPSE框架，利用常识图结构和潜在样本进行推理时符号正则化，稳定脑电到文本解码中的语义生成，无需微调大语言模型。

详情

AI中文摘要

大语言模型的最新进展加速了开放词汇的脑电到想象文本解码，其中视觉感知期间记录的非侵入性神经活动被翻译成所观看刺激的连贯自然语言描述。然而，现有系统仍然高度易受生物噪声影响，其中受损的神经投影在冻结语言模型中引发幻觉或语义不稳定的生成。我们引入了SYNAPSE（符号神经对齐用于精确语义提取），一个轻量级神经符号框架，通过推理时符号正则化稳定神经文本生成。通过使用常识图结构和潜在样本来净化脑电衍生的语义候选，SYNAPSE无需端到端微调LLM即可提高语义稳定性。在流行的脑电解码基准和多个冻结LLM后端上的实验表明，与无约束提示基线相比，SYNAPSE持续改进，在对象标签消融下具有鲁棒性，并且性能与资源密集得多的微调系统相当，同时通过将原始脑电处理完全限制在编码器堆栈内来保护生物特征隐私。

英文摘要

Recent advances in large language models have accelerated open-vocabulary EEG-to-imagined-text decoding, where non-invasive neural activity recorded during visual perception is translated into coherent natural language descriptions of viewed stimuli. However, existing systems remain highly vulnerable to biological noise, where corrupted neural projections induce hallucinated or semantically unstable generation in frozen language models. We introduce SYNAPSE (Symbolic Neural Alignment for Precise Semantic Extraction), a lightweight neuro-symbolic framework that stabilizes neural text generation through inference-time symbolic regularization. By purifying EEG-derived semantic candidates using commonsense graph structure and latent exemplars, SYNAPSE improves semantic stability without end-to-end LLM fine-tuning. Experiments across popular EEG decoding benchmarks and multiple frozen LLM backends demonstrate consistent gains over unconstrained prompting baselines, robustness under object-label ablation, and performance commensurate with substantially more resource-intensive fine-tuned systems, while preserving biometric privacy by localizing raw EEG processing entirely within the encoder stack.


2605.27788 2026-05-28 cs.LG cs.CL 版本更新

Knowing When to Ask: Segment-Level Credit Assignment for LLM Tool Use

知道何时求助：面向LLM工具使用的片段级信用分配

Abhijit Kumar, Zoey Wu, Mohit Suley

发表机构 * Microsoft AI Redmond（微软AI，雷德蒙德）

AI总结提出CARL方法，通过强化学习在模型自身轨迹上训练评论家，对每个工具使用片段独立分配信用，使模型学会区分参数知识足够与需要外部帮助的情况，在多个基准上提升准确率并减少不必要的工具调用。

详情

AI中文摘要

人类知道何时需要求助，例如 $347 \times 28$ 需要计算器而 $2+2$ 不需要。语言模型则不然。基于提示的方法可以指导模型何时调用工具，但这种脚手架并不能教会模型识别自身知识的边界。将单一结果奖励分配给整个轨迹的强化学习方法同样效果不佳：轨迹级信用无法隔离成功回合中哪个工具调用真正有帮助，也无法惩罚不必要的调用。我们提出 CARL（Competence-Aware Reinforcement Learning），该方法在模型自身的 rollout 上训练评论家，以学习参数知识何时足够以及何时需要外部帮助。通过在每个 rollout 的自然工具使用边界（例如代码围栏分隔符和上下文块转换）处进行分解，CARL 从单一二元结果中为每个片段分配独立信用，无需外部评判或步骤级标注。因此，错误的工具调用、不正确的提取以及不必要的调用各自获得适当符号的优势。训练好的评论家捕捉了模型的领域能力：在7B规模下，它以AUC 0.93区分参数可解问题与工具依赖问题。在涵盖算术、多跳事实问答和金融表格数值推理的五个基准上，CARL在7B和3B规模下分别比最佳RL基线提高了6.7和9.7个精确匹配准确率点，其中在Musique上增益最大（7B +8.3 EM，3B +9.0 EM）。模型在参数可回答的问题上减少了53%的工具调用，同时在这些问题上仍保持约10个EM点的更高准确率。增益在小规模上最大：3B的改进是7B改进的1.4倍，这表明知道何时求助对参数记忆较小的模型有更大益处。

英文摘要

Humans know when to reach for help: for example, $347 \times 28$ warrants a calculator while $2+2$ does not. Language models do not. Prompt-based approaches can instruct a model when to invoke tools, but this scaffolding does not teach it to recognize the boundary of its own knowledge. RL approaches that assign a single outcome reward to the whole trajectory fare no better: trajectory-level credit cannot isolate which tool call in a successful episode actually helped, nor penalize unnecessary calls. We propose CARL (Competence-Aware Reinforcement Learning), which trains a critic on the model's own rollouts to learn where parametric knowledge suffices and where it needs external help. By decomposing each rollout at natural tool-use boundaries (e.g., code fence delimiters and context block transitions), CARL assigns independent credit to each segment from a single binary outcome, without external judges or step-level annotations. As a result, erroneous tool calls, incorrect extractions, and unnecessary calls each receive appropriately signed advantages. The trained critic captures the model's domain competence: it separates parametrically solvable from tool-dependent questions with AUC 0.93 at 7B. On five benchmarks spanning arithmetic, multi-hop factual QA, and numerical reasoning over financial tables, CARL improves exact-match accuracy by 6.7 points at 7B and 9.7 points at 3B over the best RL baseline, with the largest gain (+8.3 EM at 7B, +9.0 EM at 3B) on Musique. The model issues 53% fewer tool calls on parametrically answerable questions while remaining roughly 10 EM points more accurate on them. Gains are largest at small scale: the 3B improvement is $1.4\times$ the 7B improvement, suggesting that knowing when to ask disproportionately benefits models with smaller parametric memory.
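
One plausible way to turn a single binary outcome into per-segment credit is a temporal-difference rule over critic values at segment boundaries. The sketch below assumes exactly that: the critic outputs an estimated success probability before each segment, and each segment's advantage is the change in that estimate, with the final value replaced by the observed outcome. The abstract does not give CARL's formula, so this rule, the segment names, and the critic values are all hypothetical.

```python
def segment_advantages(boundary_values, outcome):
    """Advantage of segment k = value after the segment minus value before it.

    boundary_values[k] is the critic's estimate before segment k; the value
    after the last segment is the observed binary outcome (0 or 1).
    """
    values = list(boundary_values) + [float(outcome)]
    return [values[k + 1] - values[k] for k in range(len(boundary_values))]

# Hypothetical rollout that ended in a wrong answer (outcome = 0).
segments = ["reason", "tool_call", "extract_answer"]
boundary_values = [0.5, 0.75, 0.25]

advantages = segment_advantages(boundary_values, outcome=0)
credit = dict(zip(segments, advantages))
```

Under this rule the advantages telescope to the outcome minus the initial estimate, so a failed rollout can still give positive credit to a segment that improved the prospects and negative credit to the tool call that hurt them.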


2605.27782 2026-05-28 cs.LG cs.CR 版本更新

Revisiting ML Training under Fully Homomorphic Encryption: Convergence Guarantees, Differential Privacy, and Efficient Algorithms

重新审视全同态加密下的机器学习训练：收敛保证、差分隐私与高效算法

Yvonne Zhou, Mingyu Liang, Ivan Brugere, Danial Dervovic, Yue Guo, Antigoni Polychroniadou, Min Wu, Dana Dachman-Soled

发表机构 * University of Maryland, College Park, MD（马里兰大学帕克分校）； AlgoCRYPT CoE（算法密码学中心）

AI总结本文首次对全同态加密下的机器学习训练进行理论收敛性分析，结合适用于加密计算的差分隐私训练算法，通过多项式近似激活函数和损失函数实现近似梯度下降的收敛，并采用无逐样本梯度裁剪的差分隐私机制提升计算效率。

详情

AI中文摘要

我们首次对全同态加密（FHE）下的机器学习训练进行了理论收敛性分析，并结合了一种针对加密计算量身定制的差分隐私（DP）训练算法。我们的方法在实现可比效用的同时，提高了标准差分隐私梯度下降（DP-GD）的计算效率。具体而言，我们证明了使用激活函数和损失函数的多项式近似（这是FHE兼容性所必需的）的近似梯度下降的收敛性。为了保护下游任务中的隐私，我们集成了差分隐私，而无需依赖昂贵的逐样本梯度裁剪，从而实现了可扩展的加密学习。我们还提供了数据无关的超参数选择和具有理论依据的多项式近似策略，这些策略可能具有独立的价值。这些贡献共同推进了在敏感数据上进行高效、私密且安全的机器学习的可行性。

英文摘要

We present the first theoretical convergence analysis of machine learning training under fully homomorphic encryption (FHE), combined with a differentially private (DP) training algorithm tailored to encrypted computation. Our approach improves computational efficiency over standard differentially private gradient descent (DP-GD) while achieving comparable utility. In particular, we prove convergence of approximate gradient descent using polynomial approximations of activation and loss functions, which are required for FHE compatibility. To preserve privacy in downstream tasks, we integrate differential privacy without relying on costly per-sample gradient clipping, enabling scalable encrypted learning. We also provide data-independent hyperparameter selection and theoretically grounded strategies for polynomial approximation which can be of independent interest. Together, these contributions advance the feasibility of efficient, private, and secure machine learning on sensitive data.
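
Homomorphic schemes evaluate additions and multiplications, which is why activations must be replaced by polynomials. The sketch below uses the degree-3 Taylor expansion of the sigmoid around zero, evaluated with Horner's rule so that only additions and multiplications appear. The Taylor choice is purely illustrative; the abstract mentions theoretically grounded approximation strategies without naming them, and no encryption library is involved here.

```python
import math

# Taylor coefficients of sigmoid(x) around 0 for the terms 1, x, x^2, x^3.
SIGMOID_TAYLOR = [0.5, 0.25, 0.0, -1.0 / 48.0]

def poly_eval(coeffs, x):
    """Horner evaluation: uses only addition and multiplication."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def sigmoid(x):
    """Exact sigmoid, used here only to measure the approximation error."""
    return 1.0 / (1.0 + math.exp(-x))

# Worst-case gap on a grid over [-1, 1].
max_error = max(
    abs(poly_eval(SIGMOID_TAYLOR, x / 10) - sigmoid(x / 10))
    for x in range(-10, 11)
)
```

The error grows quickly outside a small interval, so the input range has to be controlled during training; a convergence analysis of the kind described above needs to account for this approximation gap.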


2605.27774 2026-05-28 cs.LG 版本更新

平滑得分查询与采样的复杂度

Jingbo Liu

发表机构 * Department of Statistics, University of Illinois Urbana–Champaign（伊利诺伊大学厄巴纳-香槟分校统计学系）

AI总结本文研究利用梯度信息从高维高斯分布中采样的查询复杂度，通过引入平滑得分查询（即高斯卷积密度的对数梯度）将条件数依赖从√κ降低到对数级别，并给出近乎匹配的上下界。

详情

AI中文摘要

我们研究利用梯度信息从高维高斯分布中采样的查询复杂度。在标准预言机模型中，精确梯度仅暴露与精度矩阵的矩阵-向量乘积，导致多项式逼近障碍和特征性的条件数 $\sqrt{\kappa}$ 依赖。我们证明，当允许采样器查询平滑得分（即高斯卷积密度的对数梯度）时，这一障碍消失。对于精度矩阵为 $\Lambda$ 的高斯目标，噪声水平 $\tau$ 下的平滑得分查询可访问预解式 $(\Lambda+\tau^{-1}I)^{-1}$。将几何间隔的噪声水平与sinc求积有理逼近相结合，我们得到一个采样器，其总变分误差 $\delta_{\rm TV}$ 所需的平滑得分查询次数为 $q=O\!\left(\bigl(\log\kappa+\log(e\sqrt{d}/\delta_{\rm TV})\bigr)\log(e\sqrt{d}/\delta_{\rm TV})\right)$，将条件数依赖从 $\sqrt{\kappa}$ 改进为对数依赖。我们还研究了有限比特梯度预言机。通过对变换后的平滑得分答案进行坐标量化并添加最终抖动步骤，我们得到一个采样方案，其总通信梯度信息在 $\kappa$ 中为多对数级别；特别地，对于固定维度和精度，比特复杂度为 $O(\log^2\kappa)$。为补充这些上界，我们引入一种信道合成（或反向香农）逆技术用于采样下界。这将总变分模拟保证转化为通信需求，并得到所需梯度信息的 $\Omega(\log\kappa)$ 下界。综合这些结果，我们识别出平滑得分作为采样中可证明信息更丰富的预言机，并为其有限比特复杂度给出了近乎匹配的上下界。

英文摘要

We study the query complexity of sampling from high-dimensional Gaussian distributions using gradient information. In the standard oracle model, exact gradients expose only matrix-vector products with the precision matrix, leading to polynomial approximation barriers and a characteristic $\sqrt{\kappa}$ dependence on the condition number. We show that this barrier disappears when the sampler is allowed to query smoothed scores, namely gradients of the logarithms of the Gaussian-convolved densities. For a Gaussian target with precision matrix $\Lambda$, a smoothed-score query at noise level $\tau$ gives access to the resolvent $(\Lambda+\tau^{-1}I)^{-1}$. Combining geometrically spaced noise levels with sinc-quadrature rational approximation, we obtain a sampler with $q=O\!\left(\bigl(\log\kappa+\log(e\sqrt{d}/\delta_{\rm TV})\bigr)\log(e\sqrt{d}/\delta_{\rm TV})\right)$ smoothed-score queries for total variation error $\delta_{\rm TV}$, improving the condition-number dependence from $\sqrt{\kappa}$ to logarithmic. We also study finite-bit gradient oracles. Using coordinatewise quantization of the transformed smoothed-score answers and a final dithering step, we obtain a sampling scheme whose total communicated gradient information is polylogarithmic in $\kappa$; in particular, for fixed dimension and accuracy, the bit complexity is $O(\log^2\kappa)$. To complement these upper bounds, we introduce a channel-synthesis, or reverse-Shannon, converse technique for sampling lower bounds. This converts total-variation simulation guarantees into communication requirements and yields an $\Omega(\log\kappa)$ lower bound on the required gradient information. Together, these results identify smoothed scores as a provably more informative oracle for sampling and give nearly matching upper and lower bounds for its finite-bit complexity.
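
A short derivation may help explain why a smoothed-score query exposes a resolvent. It assumes the target is $N(\mu, \Lambda^{-1})$ and that "noise level $\tau$" means convolution with $N(0, \tau I)$; the mean symbol $\mu$ and this variance convention are assumptions, since the abstract fixes neither.

```latex
\begin{align*}
p_\tau &= N(\mu, \Lambda^{-1}) * N(0, \tau I) = N\!\left(\mu,\; \Lambda^{-1} + \tau I\right), \\
\nabla \log p_\tau(x) &= -\left(\Lambda^{-1} + \tau I\right)^{-1} (x - \mu), \\
\left(\Lambda^{-1} + \tau I\right)^{-1}
  &= \tau^{-1} \Lambda \left(\Lambda + \tau^{-1} I\right)^{-1}
   = \tau^{-1} I - \tau^{-2} \left(\Lambda + \tau^{-1} I\right)^{-1}.
\end{align*}
```

The last step writes $\Lambda = (\Lambda + \tau^{-1} I) - \tau^{-1} I$. Under these assumptions each smoothed-score query is an affine function of the resolvent $(\Lambda + \tau^{-1} I)^{-1}$ applied to $x - \mu$, whereas the exact score $-\Lambda (x - \mu)$ only supplies products with $\Lambda$. Products with $\Lambda$ generate polynomials in $\Lambda$, while resolvents at several values of $\tau$ generate rational functions, which is consistent with the abstract's move from a $\sqrt{\kappa}$ to a logarithmic dependence.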

2605.27767 2026-05-28 cs.CL cs.AI cs.LG 版本更新

基于几何感知算子学习与内存高效低秩注意力的高保真工业碰撞动力学预测

Deepak Akhare, Mohammad Amin Nabian, Corey Adams, Sudeep Chavare, Sanjay Choudhry

发表机构 * Department of Aerospace and Mechanical Engineering, University of Notre Dame（圣母大学航空航天与机械工程系）； NVIDIA ； General Motors（通用汽车）

AI总结本文提出GeoTransolver框架，通过几何感知算子学习和内存高效低秩注意力机制，实现工业级碰撞动力学的高保真预测，在复杂梁和整车碰撞数据集上验证了其准确性和效率。

AI中文摘要

汽车碰撞安全性优化仍然是一个安全关键挑战，需要通过迭代的高保真模拟来管理大规模非线性结构变形和能量耗散。虽然传统有限元求解器计算成本高昂，新兴的算子学习框架提供了快速的代理预测；然而，将其应用于工业级碰撞分析（其中复杂几何、接触非线性和快速演变的瞬态变形并存）仍然是一个未解决的挑战。在本文中，我们证明GeoTransolver框架为工业规模下准确、高保真的碰撞动力学预测提供了可行的解决方案。在复杂的保险杠梁和整车碰撞数据集上进行的基准测试表明，GeoTransolver能够捕捉多尺度几何上下文，并准确解析塑性变形模式以及关键乘员位置的加速度曲线。除了架构本身，我们提出并系统评估了一系列时间预测策略，包括一次性、时间条件和自回归滚动策略，证明一次性方法在显著降低训练开销和推理延迟的同时实现了最先进的准确性。作为次要贡献，我们引入了一种基于快速低秩注意力路由引擎（FLARE）的修改，应用于GeoTransolver注意力主干，将内存开销减少约2倍，同时进一步提高O(N)长程、高频瞬态的预测准确性，保留了基础框架的几何感知交叉注意力优势。我们的结果突显了几何感知算子学习在复杂、安全关键的汽车动力学高保真代理建模中的实际可行性。

英文摘要

Automotive crashworthiness optimization remains a safety-critical challenge, requiring the management of large-scale nonlinear structural deformations and energy dissipation through iterative, high-fidelity simulations. While traditional finite element solvers are computationally prohibitive, emerging operator learning frameworks provide rapid surrogate predictions; however, applying them to industrial-scale crash analysis, where complex geometry, contact nonlinearities, and rapidly evolving transient deformation coexist, remains an open challenge. In this paper, we demonstrate that the GeoTransolver framework provides a viable solution for accurate, high-fidelity crash dynamics prediction at industrial scale. Benchmarked on complex bumper beam and full-vehicle crash datasets, GeoTransolver captures multi-scale geometric context and accurately resolves plastic deformation patterns as well as acceleration profiles at critical occupant locations. Beyond the architecture itself, we propose and systematically evaluate a suite of temporal prediction recipes, including one-shot, time-conditional, and autoregressive rollout strategies, demonstrating that the one-shot approach achieves state-of-the-art accuracy with significantly reduced training overhead and inference latency. As a secondary contribution, we introduce a Fast Low-rank Attention Routing Engine (FLARE)-based modification to the GeoTransolver attention backbone that reduces memory overhead by approximately 2x while further improving predictive accuracy for O(N) long-range, high-frequency transients, preserving the geometry-aware cross-attention strengths of the base framework. Our results highlight the practical viability of geometry-aware operator learning for high-fidelity surrogate modeling of complex, safety-critical automotive dynamics.

2605.27756 2026-05-28 physics.flu-dyn cs.LG cs.NA math.DS math.NA 版本更新

Sparse POD Mode Selection and Manifold Dimensionality Reduction with Neural Networks

稀疏POD模态选择与神经网络流形降维

Tomoki Koike, Prakash Mohan, Marc T. Henry de Frahan, Elizabeth Qian, Julie Bessac

发表机构 * School of Aerospace Engineering, Georgia Institute of Technology（航空航天工程学院，佐治亚理工学院）； Computational Science Center, National Laboratory of the Rockies（落基山国家实验室计算科学中心）

AI总结提出SparseModesNet框架，通过LassoNet实现线性POD模态的稀疏选择与非线性神经网络解码，在平流主导和混沌流中降低重构误差51-78%。

AI中文摘要

高性能计算能够模拟高维物理系统，但反问题和控制等下游分析仍计算昂贵，这促使模型降阶（MOR）构建高效的低维代理。本征正交分解（POD）是一种广泛采用的数据驱动MOR方法，将动力学投影到由能量最高的模态张成的线性子空间上。然而，POD在处理Kolmogorov n-宽度缓慢衰减的问题（如平流主导和湍流）时表现不佳，需要大量模态才能准确重构。此外，基于能量的选择可能会丢弃捕捉小尺度特征所需的关键低能量模态。最近使用交替或贪婪模态选择的多项式映射的非线性流形方法以更少的模态实现了更好的重构。然而，这些方法先验地固定了非线性映射形式，限制了表达能力。相反，神经网络（NN）流形提供了更强的表达能力，但采用基于能量的选择。我们提出了SparseModesNet，一种降维框架，通过POD模态进行线性编码和NN非线性解码。解码器利用LassoNet——一种通过带线性跳跃层的残差连接强制执行层次稀疏性的方法——同时选择信息丰富的POD模态并学习最小化重构误差的非线性映射。在基准平流主导和混沌流中，SparseModesNet达到或超过了最先进的性能。对于摩擦雷诺数Re_τ=5200的湍流槽道流，与现有的多项式流形方法相比，我们将重构误差降低了51-78%，同时通过物理上有意义的模态选择保持了可解释性。

英文摘要

High-performance computing enables simulation of high-dimensional physical systems, but downstream analyses such as inverse problems and control remain computationally expensive, motivating model order reduction (MOR) to construct efficient low-dimensional surrogates. Proper Orthogonal Decomposition (POD), a widely adopted data-driven MOR method, projects dynamics onto linear subspaces spanned by the most energetic modes. However, POD struggles for problems with slowly decaying Kolmogorov $n$-widths, such as advection-dominated and turbulent flows, requiring many modes for accurate reconstruction. Moreover, energy-based selection can discard crucial low-energy modes needed to capture small-scale features. Recent nonlinear manifold methods using polynomial mappings with alternating or greedy mode selection achieve better reconstruction with fewer modes. However, these methods fix the nonlinear mapping form a priori, limiting expressivity. Conversely, neural network (NN) manifolds offer greater expressivity but employ energy-based selection. We present SparseModesNet, a dimensionality reduction framework that employs linear encoding via POD modes and nonlinear NN decoding. The decoder leverages LassoNet, a method enforcing hierarchical sparsity through residual connections with linear skip layers, to simultaneously select informative POD modes and learn a nonlinear mapping that minimizes reconstruction error. On benchmark advection-dominated and chaotic flows, SparseModesNet matches or exceeds state-of-the-art performance. For turbulent channel flow at friction Reynolds number $Re_τ=5200$, we reduce reconstruction error by 51--78\% compared to existing polynomial manifold methods while maintaining interpretability through physically meaningful mode selection.

2605.27748 2026-05-28 cs.CV cs.AI cs.LG 版本更新

Mahalanobis PatchCore: Covariance-Aware and Streaming-Compatible Industrial Anomaly Detection

马氏距离 PatchCore：协方差感知与流式兼容的工业异常检测

Niccolò Ferrari, Oligert Osmani, Evelina Lamma

发表机构 * Department of Engineering, University of Ferrara（费拉拉大学工程系）

AI总结提出马氏距离 PatchCore，通过协方差估计和流式处理改进 PatchCore，在保持性能的同时降低峰值内存并提升工业检测精度。

Comments 57 pages, 7 figures

AI中文摘要

工业视觉异常检测通常是一类问题：正常图像丰富，而缺陷罕见、异质且常在系统设计时不可用。PatchCore 风格的检索适合此场景，因为它通过正常补丁特征的内存库对测试图像评分，但标准欧几里得几何忽略了特征相关性，且其离线构建在子采样前需实例化整个补丁池。我们引入马氏距离 PatchCore，一种协方差感知、流式兼容的 PatchCore 扩展。其人工智能贡献在于一种检索检测器，它在降维特征空间中估计正则化协方差模型并对嵌入进行白化，使得变换后的欧几里得最近邻搜索实现马氏距离检索。一个有界内存、可重复迭代的训练流程通过增量降维、在线协方差估计和流式聚合，无需一次性存储所有正常补丁即可构建内存库。工程应用是自动化工业检测，其中视觉异常检测必须在实际内存限制下保持准确。我们在一个公开的 15 类工业异常检测基准和三个工业数据集（涵盖吹灌封条带安瓿弯月面检测、琥珀色玻璃安瓿底部检测和冻干饼西林瓶检测）上评估该方法。马氏距离 PatchCore 在公开基准上保留了大部分离线 PatchCore 的图像级性能，同时将峰值内存从 5.41 GB 降至 2.78 GB，并将选定的工业平均图像接收者操作特征曲线下面积从 0.981 提升至 0.986。

英文摘要

Industrial visual anomaly detection is usually one-class: normal images are abundant, while defects are rare, heterogeneous, and often unavailable during system design. PatchCore-style retrieval suits this setting because it scores test images from a memory bank of normal patch features, but the standard Euclidean geometry ignores feature correlations and its offline construction materialises the full patch pool before subsampling. We introduce Mahalanobis PatchCore, a covariance-aware, streaming-compatible extension of PatchCore. Its artificial intelligence contribution is a retrieval detector that estimates a regularised covariance model in reduced feature space and whitens embeddings, so Euclidean nearest-neighbour search after transformation implements Mahalanobis retrieval. A bounded-memory, re-iterable training pipeline builds the memory bank without storing all normal patches at once, using incremental dimensionality reduction, online covariance estimation, and streaming aggregation. The engineering application is automated industrial inspection, where visual anomaly detection must remain accurate under practical memory limits. We evaluate the method on a public 15-category industrial anomaly-detection benchmark and three industrial datasets covering blow-fill-seal strip-ampoule meniscus inspection, amber-glass-ampoule bottom inspection, and lyophilised-cake vial inspection. Mahalanobis PatchCore preserves most offline PatchCore image-level performance on the public benchmark while reducing peak memory from 5.41 to 2.78 GB, and improves the selected industrial mean image area under the receiver operating characteristic curve from 0.981 to 0.986.
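
The whitening step described above rests on a standard identity: if $Σ = LL^\top$ is the Cholesky factorisation of the (regularised) covariance, then Euclidean distance between $L^{-1}u$ and $L^{-1}v$ equals the Mahalanobis distance between $u$ and $v$. The sketch below checks this on a hypothetical 2-D example. It is not the authors' pipeline: the diagonal-loading regulariser, the toy covariance, and all function names are assumptions made for illustration, and mean-centering is omitted because it cancels in distance differences.

```python
import math


def regularise(cov, eps):
    """Diagonal loading: add eps to the diagonal of a 2x2 covariance (assumed form)."""
    (a, b), (c, d) = cov
    return [[a + eps, b], [c, d + eps]]


def cholesky_2x2(cov):
    """Lower-triangular factor (l11, l21, l22) with L @ L.T == cov."""
    (a, b), (_, d) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 * l21)
    return l11, l21, l22


def whiten(x, chol):
    """Solve L z = x by forward substitution, so z = L^{-1} x."""
    l11, l21, l22 = chol
    z1 = x[0] / l11
    z2 = (x[1] - l21 * z1) / l22
    return (z1, z2)


def euclidean_sq(u, v):
    return sum((p - q) ** 2 for p, q in zip(u, v))


def mahalanobis_sq(u, v, cov):
    """Direct squared Mahalanobis distance via the closed-form 2x2 inverse."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = u[0] - v[0], u[1] - v[1]
    return (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det


# Hypothetical 2-D patch features; a real memory bank is far larger.
cov = regularise([[2.0, 1.2], [1.2, 1.0]], eps=0.05)
chol = cholesky_2x2(cov)
bank = [(0.0, 0.0), (1.0, 0.8), (2.0, 0.5)]
query = (1.5, 1.5)

for item in bank:
    whitened = euclidean_sq(whiten(query, chol), whiten(item, chol))
    direct = mahalanobis_sq(query, item, cov)
    print(f"{item}: whitened Euclidean^2 = {whitened:.6f}, Mahalanobis^2 = {direct:.6f}")
```

Because the two distances coincide, an unmodified Euclidean nearest-neighbour index can be reused once the memory bank and the queries are whitened with the same factor.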

2605.27747 2026-05-28 stat.ML cs.LG stat.CO 版本更新

Soft Specialists: $α$-Rényi Ensembles for Uncertainty-Aware LLM Post-Training

软专家：用于不确定性感知的LLM后训练的$\alpha$-Rényi集成

Paula Cordero-Encinar, Georgy Tyukin, Andrew B. Duncan

发表机构 * Department of Mathematics, Imperial College London（伦敦帝国理工学院数学系）； Bessemer AI

AI总结提出一种$\alpha$-Rényi变分框架，通过学习后训练参数的分布来替代深度集成，实现不确定性感知的LLM后训练，并支持软路由和模型专业化。

AI中文摘要

现有的大语言模型训练方法基于大量数据学习单一参数集，这些数据通常异构、冲突且往往直接矛盾。因此，模型被迫将冲突目标和固有不确定性压缩为单一的平均行为模式。我们提出了一种$\alpha$-Rényi变分框架，用于学习后训练参数的分布，为深度集成方法提供了一种不确定性感知的替代方案。得到的变分目标在经典变分贝叶斯和预测导向的后验学习之间插值，平衡全局合理的个体模型与互补专家系统。我们确定了局部稳定性准则，展示了模型误设如何使非退化后验扩散局部有利，将矛盾或冲突数据表现为认知不确定性。我们将该框架应用于LLM后训练，学习附着在共享冻结基模型上的LoRA适配器集成，为监督微调和偏好优化提供了可扩展的训练过程。我们的方法使得训练示例能够被软路由到集成成员之间，促进模型专业化，并为不同任务提供可操作的不确定性估计。

英文摘要

Existing training approaches for large language models learn a single set of parameters from large volumes of data that are typically heterogeneous, conflicting, and often outright contradictory. As a result, the model is forced to compress conflicting goals and inherent uncertainties into a single, averaged pattern of behaviour. We propose an $α$-Rényi variational framework for learning distributions over post-training parameters, offering an uncertainty-aware alternative to deep ensemble approaches. The resulting variational objective interpolates between classical variational Bayes and predictively oriented posterior learning, balancing globally plausible individual models against systems of complementary specialists. We identify local stability criteria, demonstrating how model misspecification can make non-degenerate posterior spread locally favourable, manifesting contradictory or conflicting data as epistemic uncertainty. We apply our framework to LLM post-training, learning an ensemble of LoRA adapters attached to a shared, frozen base model, providing a scalable training procedure for both supervised fine-tuning and preference optimisation. Our approach enables training examples to be softly routed across ensemble members, promoting model specialisation and providing actionable uncertainty estimates across different tasks.

2605.27739 2026-05-28 cs.LG cs.AI 版本更新

Worker Disagreement Reveals Sharp Directions in Local SGD

工作者分歧揭示局部SGD中的尖锐方向

Tolga Dimlioglu, Kristi Topollai, Anna Choromanska

发表机构 * New York University（纽约大学）

AI总结本文通过理论分析和实验证明，局部SGD中的工作者平均间隙协方差能够捕捉Hessian矩阵的尖锐方向，从而提供一种廉价的无Hessian估计方法。

Comments 5 pages main body, 18 pages appendix - Accepted to HiLD 2026, ICML

AI中文摘要

深度神经网络训练通常表现出高度各向异性的损失几何，其中少数尖锐的主导Hessian方向与大量平坦区域共存。梯度往往不成比例地与这些主导方向对齐，尽管稳定的进展通常需要穿过平坦区域的方向。因此，估计主导子空间是有用的，但使用基于Hessian的直接方法成本高昂。我们表明，标准局部SGD通过工作者分歧暴露了这种几何结构。我们从理论上证明，工作者平均间隙协方差由随机梯度噪声和Hessian曲率塑造，导致工作者沿着尖锐的曲率敏感方向产生分歧。因此，工作者平均间隙提供了主导子空间的廉价无Hessian估计。在MLP、CNN和Transformer上的实验表明，由工作者平均间隙形成的子空间捕获了位于主导Hessian特征空间中的梯度分量的很大一部分。

英文摘要

Deep neural network training often exhibits highly anisotropic loss geometry, where a few sharp dominant Hessian directions coexist with a large flatter bulk. Gradients tend to align disproportionately with these dominant directions, although stable progress often requires movement through flatter bulk directions. Estimating the dominant subspace is therefore useful but costly with direct Hessian-based methods. We show that standard Local SGD exposes this geometry through worker disagreement. We theoretically show that the worker-average gap covariance is shaped by stochastic-gradient noise and Hessian curvature, causing workers to disagree along sharp, curvature-sensitive directions. Thus, worker-average gaps provide a cheap Hessian-free estimator of the dominant subspace. Experiments on MLPs, CNNs, and Transformers show that subspaces formed by worker-average gaps capture a substantial fraction of the gradient component lying in the dominant Hessian eigenspace.

2605.27736 2026-05-28 cs.LG cs.CV 版本更新

Explicit Critic Guidance for Aligning Diffusion Models

显式评论家引导的对齐扩散模型

Zhengyang Liang, Qihang Zhang, Ceyuan Yang

发表机构 * University of Toronto（多伦多大学）； Vector Institute（向量研究所）； The Chinese University of Hong Kong（香港中文大学）

AI总结提出一种状态对齐的潜在演员-评论家框架，通过将扩散模型自身作为时间步条件价值函数，实现轨迹级PPO训练和推理时引导，在单/多奖励基准上优于先前方法。

AI中文摘要

在线强化学习对于将扩散模型与不可微目标对齐变得越来越重要。然而，现有方法在沿去噪轨迹分配细粒度信用和实现稳定的基于价值的优化方面仍面临限制。我们提出了一种用于扩散后训练的状态对齐潜在演员-评论家框架，其中扩散模型自身作为时间步条件价值函数，并直接在噪声潜在状态上预测价值。这使得轨迹级PPO训练成为可能，通过简单的条件和价值预训练策略支持稳定的演员-评论家优化，并自然地允许学习到的评论家用于推理时引导。我们进一步将框架扩展到多奖励优化，其中与互补奖励的联合训练有助于减轻奖励破解。在基于UNet和DiT的骨干网络上，我们的方法在单奖励和多奖励基准上始终优于先前的组相对RL和演员-评论家基线，同时测试时引导在生成质量上提供了额外提升。

英文摘要

Online reinforcement learning is becoming increasingly important for aligning diffusion models with non-differentiable objectives. However, existing methods still face limitations in assigning fine-grained credit along denoising trajectories and in realizing stable value-based optimization. We propose a state-aligned latent actor-critic framework for diffusion post-training, in which the diffusion model serves as its own timestep-conditioned value function and predicts values directly on noisy latent states. This enables trajectory-level PPO training, supports stable actor-critic optimization with simple conditioning and value pretraining strategies, and naturally allows the learned critic to be reused for inference-time steering. We further extend the framework to multi-reward optimization, where joint training with complementary rewards helps alleviate reward hacking. Across both UNet- and DiT-based backbones, our method consistently outperforms prior group-relative RL and actor-critic baselines on single-reward and multi-reward benchmarks, while test-time steering provides additional gains in generation quality.

2605.27734 2026-05-28 cs.LG 版本更新

Learn from your own latents and not from tokens: A sample-complexity theory

从自身潜在表示而非token学习：样本复杂度理论

Daniel J. Korchinski, Alessandro Favero, Matthieu Wyart

发表机构 * Institute of Physics（物理研究所）； University of Cambridge（剑桥大学）； Johns Hopkins University（约翰霍普金斯大学）； EPFL（洛桑联邦理工学院）

AI总结本文通过概率上下文无关语法数据，证明潜在预测方法在样本复杂度上相比token级SSL具有指数级优势，并分析了多尺度层次结构的必要性。

Comments 10 pages, 5 figures in main. 28 pages, 14 figures, 1 table in all

详情

AI中文摘要

生成模型，从扩散模型到大型语言模型，取得了显著性能，但代价是训练数据量比生物学习所需的大几个数量级。一种替代范式已经出现，其中网络被训练来预测其自身对相关视图或掩码区域的潜在表示，如data2vec和JEPA——这一思想与皮层的预测编码理论相关。尽管有强大的实证结果，但这些方法的理论理解仍然有限。核心问题包括：潜在预测实际上能提高多少数据效率？将此类方法堆叠成多尺度层次结构是否有益？我们使用一个可处理的概率上下文无关语法作为数据来回答这两个问题，该语法捕捉了自然语言和图像的组合结构。这样的语法通过沿深度为$L$的隐藏符号树递归应用产生规则，生成可见token的字符串。对于这样的数据，监督或token级SSL需要样本数量随$L$指数增长才能恢复潜在树；我们证明潜在预测在$L$上（对数因子内）以常数样本量实现这一点。我们通过(i)层次聚类算法，(ii)端到端神经网络（其预测-聚类器模块通过梯度下降在每个层次预测自身的潜在表示），以及(iii)data2vec的首次样本复杂度分析（我们证明其隐式执行层次潜在预测）来确认这一界限。这表明显式堆叠如H-JEPA在很大程度上是冗余的。

英文摘要

Generative models, from diffusion models to large language models, achieve remarkable performance but at a cost in training data orders of magnitude larger than what biological learners require. An alternative paradigm has emerged in which networks are trained to predict their \emph{own} latent representations of related views or masked regions, as in data2vec and JEPA -- an idea related to predictive-coding accounts of the cortex. Despite strong empirical results, the theoretical understanding of these methods remains limited. Central questions include: by how much does latent prediction actually improve data efficiency? Is there a benefit to stacking such methods into multi-scale hierarchies? We answer both using as data a tractable probabilistic context-free grammar that captures the compositional structure of natural language and images. Such a grammar generates strings of visible tokens by recursively applying production rules along a tree of hidden symbols of depth $L$. For such data, supervised or token-level SSL require a number of samples \emph{exponential} in $L$ to recover the latent tree; we prove that latent prediction achieves this with a number of samples \emph{constant} in $L$, up to logarithmic factors. We confirm this bound with (i) a hierarchical clustering algorithm, (ii) an end-to-end neural network whose predictor-clusterer modules predict their own latents at each level via gradient descent, and (iii) the first sample-complexity analysis of data2vec, which we show implicitly performs hierarchical latent prediction. This suggests that explicit stacking such as H-JEPA is largely redundant.

2605.27733 2026-05-28 cs.LG 版本更新

基于谱梯度重加权的稳健矩估计

Liu Zhang, Amit Singer

发表机构 * Program in Applied and Computational Mathematics, Princeton University（应用与计算数学项目，普林斯顿大学）

AI总结提出SGR-GMM算法，通过谱梯度重加权对观测梯度进行软重加权，实现稳健的广义矩估计，并给出理论保证和实验验证。

AI中文摘要

基于矩的估计是参数推断在理论上具有吸引力的方法，尤其是在基于似然的估计不可用、设定错误或计算不便时。然而，矩方程涉及样本均值，这使得基于矩的估计对异常值敏感。我们提出了SGR-GMM算法，这是一种稳健的广义矩估计（GMM）程序，它使用谱梯度重加权（SGR）原语在矩匹配优化过程中对每个观测的梯度进行软重加权。我们的分析分为三层。首先，对于固定中心，SGR原语被表述为样本权重玩家和密度矩阵玩家之间的熵正则化谱博弈，并使用经典的多重权重和矩阵多重权重遗憾界进行分析。其次，我们建立了SGR原语中固定中心更新的显式收敛半径和有限终止界。第三，我们证明了局部有限样本参数估计误差界，该界显式依赖于污染比例、内点梯度稳定性、局部GMM识别强度和优化精度。我们进一步特化SGR-GMM算法，以获得稳健的对角加权GMM（DGMM）估计量，用于估计在加性高斯噪声和强污染下观测到的异方差低秩高斯混合模型。在数值实验中，SGR原语产生近乎神谕的梯度估计，而稳健的DGMM特化显著优于非稳健的矩基线。代码和数据可在https://github.com/liu-lzhang/sgr-gmm获取。

英文摘要

Moment-based estimation is a theoretically attractive approach to parametric inference, especially when likelihood-based estimation is unavailable, misspecified, or computationally inconvenient. However, the moment equations involve sample averages, which makes moment-based estimation sensitive to outliers. We propose the SGR-GMM algorithm, a robust generalized method of moments (GMM) procedure that uses a spectral gradient reweighting (SGR) primitive to soft-reweight the per-observation gradients during the moment-matching optimization. Our analysis has three layers. First, for a fixed center, the SGR primitive is formulated as an entropy-regularized spectral game between a sample-weight player and a density-matrix player, which is analyzed using classical multiplicative-weights and matrix-multiplicative-weights regret bounds. Second, we establish an explicit convergence radius and a finite termination bound for the fixed-center updates in the SGR primitive. Third, we prove a local finite-sample parameter estimation error bound with explicit dependence on the contamination fraction, inlier gradient stability, local GMM identification strength, and optimization accuracy. We further specialize the SGR-GMM algorithm to obtain a robust diagonally-weighted GMM (DGMM) estimator for estimating heteroscedastic low-rank Gaussian mixtures observed under additive Gaussian noise and strong contamination. In the numerical experiments, the SGR primitive produces near-oracle gradient estimates and the robust DGMM specialization substantially improves over non-robust moment baselines. The code and data are available at https://github.com/liu-lzhang/sgr-gmm.

2605.27697 2026-05-28 cs.RO cs.AI cs.LG 版本更新

Simulation-Informed Diffusion for Decentralized Multi-robot Motion Planning

仿真引导的扩散方法用于去中心化多机器人运动规划

Jinhao Liang, Sven Koenig, Ferdinando Fioretto

发表机构 * University of Virginia（弗吉尼亚大学）； University of California, Irvine（加州大学欧文分校）

AI总结提出一种基于约束感知扩散模型的去中心化框架SID，通过仿真邻居未来轨迹并利用安全约束规划自身轨迹，在密集场景下实现高效协调。

AI中文摘要

去中心化多机器人运动规划要求每个机器人仅根据局部观测生成无碰撞轨迹，无需全局感知或可靠通信。然而，大多数现有规划器（无论是经典方法还是基于学习的方法）都是从局部观测的静态快照生成轨迹，这限制了它们预测相邻机器人未来行为的能力。随着机器人数量增加和环境变得更加拥挤，这一限制变得至关重要。为了克服这一挑战，本文引入了仿真引导的扩散（SID），这是一种基于约束感知扩散模型（CADM）的去中心化框架。SID首先使用CADM从当前观测状态仿真相邻机器人的未来轨迹，然后利用这些仿真提供的安全约束，使用相同的CADM规划每个机器人自身的轨迹。关键的是，对邻居的精确仿真使得一种最小通信方案成为可能，该方案仅在高度拥挤的场景中必要时触发协调。在多种环境中的实验表明，SID在规划有效性和约束满足方面始终优于基线方法，并且可扩展到108个机器人和160个障碍物的场景。

英文摘要

Decentralized multi-robot motion planning requires each robot to generate collision-free trajectories from local observations, without global sensing or reliable communication. However, most existing planners, whether classical or learning-based, generate trajectories from a static snapshot of the local observation, which limits their ability to anticipate the future behavior of neighboring robots. This limitation is critical as the number of robots increases and the environment becomes more cluttered. To overcome this challenge, this paper introduces Simulation-Informed Diffusion (SID), a decentralized framework built on constraint-aware diffusion models (CADM). SID first uses CADM to simulate the future trajectories of neighboring robots from their currently observed states, and then uses the same CADM to plan each robot's own trajectory under safety constraints informed by these simulations. Crucially, the accurate simulation of neighbors enables a minimal communication scheme that triggers coordination only when necessary in highly congested scenarios. Experiments across diverse environments show that SID consistently outperforms baseline methods in terms of planning effectiveness and constraint satisfaction, and scales to scenarios with 108 robots and 160 obstacles.

2605.27690 2026-05-28 cs.CL cs.LG 版本更新

TRACES: Proactive Safety Auditing for Multi-Turn LLM Agents via Trajectory-State Modeling

TRACES: 通过轨迹状态建模实现多轮LLM智能体的主动安全审计

Jiaqian Li, Yanshu Li, Boxuan Zhang, Ruixiang Tang, Kuan-Hao Huang

发表机构 * Brown University（布朗大学）； The University of Texas at Austin（德克萨斯大学奥斯汀分校）； Rutgers University（罗格斯大学）； Texas A&M University（德克萨斯农工大学）

AI总结提出TRACES方法，通过观察LLM的隐藏表示学习前缀级轨迹风险状态，实现多轮工具使用环境下的主动安全审计，提升全轨迹安全预测和主动风险判别能力。

AI中文摘要

LLM智能体越来越多地通过多轮工具使用和环境交互来运作，其中安全风险往往在最终结果显现之前的中间步骤中就已经出现。因此，反应式审计是不够的：事后诊断常常在风险正在展开时错过标记它们的机会。我们提出TRACES，一种基于表示的主动审计器，它从观察者LLM的隐藏表示中学习前缀级轨迹风险状态。TRACES从步骤表示中诱导潜在机制特征，并建模其时间演化，以估计部分轨迹是否正在向不安全行为漂移。为了规避步骤级风险标注的成本和歧义，TRACES在弱轨迹级监督下训练，同时仍能产生密集的前缀级风险估计。在多个智能体安全基准测试中，TRACES改进了全轨迹安全预测和主动风险判别。我们的分析进一步表明，这些风险状态可以帮助训练更安全的智能体，凸显了主动审计在长程智能体安全中的更广泛潜力。

英文摘要

LLM agents increasingly operate through multi-turn tool use and environment interaction, where safety risks often emerge from intermediate steps long before they surface in the final outcome. Reactive auditing is therefore insufficient: post-hoc diagnosis frequently misses the chance to flag risks while they are unfolding. We propose TRACES, a representation-based proactive auditor that learns prefix-level trajectory risk states from the hidden representations of an observer LLM. TRACES induces latent mechanism features from step representations and models their temporal evolution to estimate whether a partial trajectory is drifting toward unsafe behavior. To sidestep the cost and ambiguity of step-level risk annotation, TRACES is trained with weak trajectory-level supervision while still producing dense prefix-level risk estimates. Across multiple agent safety benchmarks, TRACES improves both full-trajectory safety prediction and proactive risk discrimination. Our analyses further suggest that these risk states can help train a safer agent, highlighting the broader potential of proactive auditing for long-horizon agent safety.

2605.27689 2026-05-28 cs.LG cs.CR 版本更新

多模态大语言模型训练的异构并行

Yashaswi Karnati, Kamran Jafari, Akash Mehra, Li Ding, Pranav Prashant Thombre, Ali Roshan Ghias, Shifang Xu, Parth Mannan, Yu Yao, Hao Wu, Eric Harper, Ashwath Aithal, Nima Tajbakhsh

发表机构 * NVIDIA

AI总结针对多模态大语言模型训练中单一LLM中心并行布局导致的吞吐量瓶颈，提出异构并行抽象，允许各模块独立布局和放置，并通过边界通信器实现张量语义保持，实验表明可提升TFLOPS/GPU最高49.3%。

AI中文摘要

基础模型训练正变得多模态，从后训练流程到大规模预训练。随着模态覆盖范围扩大、上下文窗口增长以及编码器LLM规模分化，单一的以LLM为中心的TP/CP/PP/DP/EP布局日益限制吞吐量。这种耦合迫使编码器继承LLM驱动的分片和放置选择，可能增加通信、限制编码器并行性或约束LLM调度；这种不匹配在长上下文中最为明显，此时融合的多模态序列需要LLM上下文并行，但编码器输入仍然受限。我们提出了多模态大语言模型训练的异构并行，这是一种抽象，允许端到端图中的模块使用独立的布局和秩放置，支持共享GPU上的共置执行和不相交秩集上的非共置执行。关键挑战是在独立布局间保持边界张量语义：前向激活必须为目标布局物化，而反向梯度必须路由回源布局。我们通过边界通信器解决这一问题，实现前向和反向布局变换，以及两种放置模式的调度扩展。我们评估了跨多模态工作负载和GPU规模的优化同构、共置异构和非共置异构配置，以刻画何时额外的布局和放置自由度能暴露更优的操作点。在这一扫描中，共置异构将TFLOPS/GPU提升高达49.3%，而非共置异构将总token吞吐量提升高达13.0%，TFLOPS/GPU提升高达9.6%。我们验证了与同构基线相比的损失收敛一致性，并将该系统作为开源Megatron-LM扩展发布。

英文摘要

Foundation model training is becoming multimodal, from post-training pipelines to large-scale pretraining. As modality coverage broadens, context windows grow, and encoder LLM scales diverge, a single LLM-centric TP/CP/PP/DP/EP layout increasingly limits throughput. This coupling forces encoders to inherit LLM-driven sharding and placement choices that can add communication, limit encoder parallelism, or constrain the LLM schedule; the mismatch is most pronounced at long contexts, where LLM context parallelism is needed for the fused multimodal sequence but encoder inputs remain bounded. We present heterogeneous parallelism for multimodal large language model training, an abstraction that lets modules in one end-to-end graph use independent layouts and rank placements, supporting colocated execution on shared GPUs and non-colocated execution on disjoint rank sets. The key challenge is preserving boundary tensor semantics across independent layouts: forward activations must be materialized for the destination layout, while backward gradients must be routed back to the source layout. We address this with boundary communicators that implement forward and backward layout transforms, plus scheduling extensions for both placement modes. We evaluate optimized homogeneous, colocated heterogeneous, and non-colocated heterogeneous configurations across multimodal workloads and GPU scales to characterize when added layout and placement freedom exposes a better operating point. Across this sweep, colocated heterogeneity improves TFLOPS/GPU by up to 49.3%, while non-colocated heterogeneity improves aggregate token throughput by up to 13.0% and TFLOPS/GPU by up to 9.6%. We validate loss convergence parity against homogeneous baselines and release the system as an open-source Megatron-LM extension.

URL PDF HTML ☆

赞 0 踩 0

2605.27676 2026-05-28 stat.ML cs.LG 版本更新

Unsupervised Identification and Removal of Spurious Correlations During Fine-Tuning

微调过程中虚假关联的无监督识别与消除

Ciarán M. Gilligan-Lee, Joseph Egan, Yuchen Zhu, Michael O'Riordan

发表机构 * Spotify ； University College London（伦敦大学学院）

AI总结提出GRASP方法，通过梯度投影在微调时无监督识别并消除与任务纠缠的虚假关联，同时保留预训练知识，在三个任务上优于基线。

Comments 10 + 4 pages, comments welcome

详情

AI中文摘要

在精心策划的数据集上微调预训练语言模型可能会在微调任务与无意中的潜在因素（如不对齐的人物角色或政治倾向）之间产生虚假关联，而这些因素是由策划过程与任务纠缠在一起的。模型可能会依赖这些虚假关联，导致偏差并降低分布外泛化能力。我们证明，在任务复杂性和虚假关联的合理假设下，可以从朴素LoRA微调的权重中无监督地识别这些潜在因素。现有的消除偏差方法（如激活引导）在推理或训练期间从残差流激活中移除已识别的因素。然而，我们认为目标应该是消除虚假关联，而不是潜在因素本身，因为预训练模型可能依赖该因素来获取真实的任务信号。为此，我们提出GRASP（关联虚假模式的梯度投影），该方法防止模型对已识别的潜在因素产生新的依赖，同时保留沿该方向的任何预训练内容。我们在三个微调任务上进行了验证。前两个涉及紧急不对齐，即在狭窄任务（在我们的案例中，编写不安全的代码和给出糟糕的医疗建议）上进行微调会导致在无关话题上产生不对齐的响应。在这里，我们的方法在不安全代码案例中完全消除了不对齐，在糟糕医疗建议案例中减少了约5倍，在不对齐减少与任务保持之间的权衡中击败了所有基线。最后一个是新颖的政治偏见实验，即在右倾的Reddit金融建议数据上进行微调会导致在无关话题上产生政治倾向漂移。在这里，我们的方法将漂移减少了一半以上，同时提高了金融任务性能，击败了所有基线。

英文摘要

Fine-tuning a pretrained language model on a curated dataset can produce spurious correlations between the fine-tuning task and unintended latent factors -- such as misaligned personas or political slant -- that the curation procedure has entangled with the task. The model can latch onto these spurious correlations, leading to bias and reduced out-of-distribution generalisation. We prove that under reasonable assumptions on task complexity and the spurious correlation, such latent factors can be identified, without supervision, from the weights of a naive LoRA fine-tune. Existing approaches to removing bias, such as activation steering, remove identified factors from residual-stream activations, either at inference or during training. We argue, however, that the goal should be to remove the spurious correlation, not the latent factor itself, as the pretrained model may rely on it for genuine task signal. To enable this, we propose GRASP, GRadient projection of Associated Spurious Patterns, which prevents the model from acquiring new reliance on the identified latent factor while preserving any pretrained content along it. We validate on three fine-tuning tasks. The first two involve emergent misalignment, where fine-tuning on a narrow task -- in our case, writing insecure code and giving bad medical advice -- leads to misaligned responses on unrelated topics. Here our method completely removes misalignment in the insecure code case and reduces it by ~5x in the bad medical advice case, beating all baselines in the trade-off between misalignment-reduction and task-preservation. The last is a novel political-bias experiment, where fine-tuning on right-skewed Reddit financial-advice data causes political-lean drift on unrelated topics. Here our method reduces drift by more than half, while improving financial task performance, beating all baselines.
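
The general idea of a gradient projection can be sketched in a few lines. This is a generic illustration, not GRASP itself: the abstract does not say which parameters are projected or how the direction is extracted from the LoRA weights, so the single `direction` vector, the plain SGD step, and all names below are assumptions. The sketch removes the gradient component along a given direction before each update, so the weights keep whatever value they already had along that direction while still moving freely in the orthogonal complement.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_out(grad, direction):
    """Remove from grad its component along direction (need not be unit-norm)."""
    norm_sq = dot(direction, direction)
    if norm_sq == 0.0:
        return list(grad)
    scale = dot(grad, direction) / norm_sq
    return [g - scale * d for g, d in zip(grad, direction)]


def projected_sgd_step(weights, grad, direction, lr=0.1):
    """One SGD step that leaves the weights' component along direction unchanged."""
    safe_grad = project_out(grad, direction)
    return [w - lr * g for w, g in zip(weights, safe_grad)]


# Hypothetical 3-parameter model and a hypothetical spurious direction.
weights = [1.0, 2.0, 3.0]
grad = [0.5, -1.0, 2.0]
direction = [1.0, 1.0, 0.0]

updated = projected_sgd_step(weights, grad, direction)
print("component along direction before:", dot(weights, direction))
print("component along direction after: ", dot(updated, direction))
```

The component along `direction` is the same before and after the step, which mirrors the stated goal of blocking new reliance on the factor without erasing what the pretrained weights already encode along it.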

2605.27674 2026-05-28 cs.CR cs.AI cs.LG 版本更新

Backdoor Attacks on Fault Detection and Localization in Cyber-Physical Systems

针对信息物理系统中故障检测与定位的后门攻击

Abile Jean, Kuniyilh S

发表机构 * GitHub

AI总结本文研究针对现代信息物理系统中基于机器学习的故障检测与定位机制的后门攻击，通过设计触发器并评估攻击成功率，实验表明即使仅投毒10%的数据也能成功实施攻击。

AI中文摘要

信息物理系统（CPS）集成了传感、通信、计算和控制，以支持关键基础设施，包括智能电网、工业自动化和控制系统。在电力公用事业领域，CPS中使用各种控制器来确保系统检测和恢复故障（如电压波动），并在配电系统中进行负载平衡。基于机器学习和深度学习的故障检测与定位框架因其能够实时识别异常和操作故障，近年来在CPS中受到广泛关注。然而，这些智能模型容易受到对抗性机器学习攻击，尤其是后门攻击。在后门攻击中，对手将恶意模式注入训练数据，使得模型在大多数情况下表现正常，但当触发特定模式时产生攻击者控制的输出。本文研究了针对现代CPS系统中最新机器学习管道的故障检测与定位机制的后门攻击威胁。我们定义了这些威胁，并通过设计触发器以及在CPS领域评估其成功率来探索如何实现这些攻击。我们的实验表明，即使仅投毒10%的数据，攻击也能成功。

英文摘要

Cyber-Physical Systems (CPS) integrate sensing, communication, computation, and control to support critical infrastructure, including smart grids, industrial automation, and control systems. In the electrical utility domain, various controllers are used in CPS to ensure the system detects and recovers from faults, such as voltage fluctuations, and to perform load balancing in distribution systems. Machine learning- and deep learning-based fault detection and localization frameworks have recently gained significant attention in CPS for their ability to identify anomalies and operational failures in real time. However, these intelligent models are vulnerable to adversarial machine learning attacks, particularly backdoor attacks. In a backdoor attack, an adversary injects malicious patterns into the training data so that the model behaves normally most of the time but produces attacker-controlled outputs when triggered by specific patterns. This paper investigates the threat of backdoor attacks against fault detection and localization mechanisms in recent ML pipelines used in modern CPS. We define these threats and explore how they can be realized by designing triggers and evaluating their success in the CPS domain. Our experiments show that the attack succeeds even when only 10\% of the training data is poisoned.

2605.27673 2026-05-28 cs.LG 版本更新

When do complex-valued neural networks help? A study of representation, geometry, and optimization

复值神经网络何时有帮助？表征、几何与优化的研究

Ashutosh Kumar

发表机构 * Owl Autonomous Imaging, Inc.（Owl 自动成像公司）； RIT

AI总结通过对比复值神经网络与多种实值基线在合成射频、量子波函数和脑电图等任务上的表现，发现复值网络的优势依赖于表征、对称性和优化，并非普遍优越。

AI中文摘要

复值神经网络（CVNN）通常应用于信息自然编码为幅度和相位的领域。然而，仅凭复值输入并不能确定复算术何时能改善学习：标签信号可能存在于振幅、相位、它们的耦合或某种对称性中，而实值模型在合适的坐标下也能表征这种对称性。我们通过将CVNN与笛卡尔实值、极坐标、仅相位、仅幅度、参数匹配实值和FLOP匹配实值基线进行表征优先的评估来研究这一问题。在合成射频任务中，复值表征有用但并非普遍优越。仅PSK任务有利于相位感知和复值模型，仅QAM任务有利于基于幅度的模型，混合PSK+QAM仅带来微小的复值优势，而未见过的载波相位旋转会破坏坐标依赖模型（无数据增强）。类似模式也出现在射频之外：在量子波函数预测中，动量对$|ψ|$不可见但可从相位恢复，而脑电图解析信号实验表明，相位锁定、幅度爆发和相位-幅度耦合各自偏好不同的坐标视图。我们还发现了RadioML 2018.01A上的一个基准测试伪影。在匹配共享试验选择下，CReLU复值模型超过最佳实值基线22.94个百分点；在相同数据和16次试验搜索空间下进行独立每族调参时，差距缩小至2.46个百分点。梯度分析将夸大的差距归因于实值基线在高学习率下的第一步不稳定性，而复值参数耦合更稳健地分布损失信号。学习率×激活函数的析因实验证实该失败主要是超参数驱动的。总体而言，CVNN应被视为结构化归纳偏置，其增益取决于表征、对称性和优化，而非普遍优越的架构。

英文摘要

Complex-valued Neural Networks (CVNNs) are often motivated by domains where information is naturally encoded in magnitude and phase. Yet complex-valued inputs alone do not determine when complex arithmetic improves learning: the label signal may lie in amplitude, phase, their coupling, or a symmetry that real-valued models can also represent under suitable coordinates. We study this through a representation-first evaluation of CVNNs against Cartesian real, polar, phase-only, magnitude-only, parameter-matched real, and FLOP-matched real baselines. Across synthetic RF tasks, complex representations are useful but not universally superior. PSK-only tasks favor phase-aware and complex-valued models, QAM-only tasks favor magnitude-based models, mixed PSK+QAM gives only a small complex-valued advantage, and unseen carrier-phase rotations break coordinate-dependent models without augmentation. Similar patterns appear beyond RF: in quantum-wavefunction prediction, momentum is invisible to $|ψ|$ but recoverable from phase, while EEG analytic-signal experiments show that phase locking, amplitude bursts, and phase-amplitude coupling each favor different coordinate views. We also identify a benchmarking artifact on RadioML 2018.01A. Under matched-shared-trial selection, a CReLU complex model exceeds the best real baseline by 22.94 PP; under independent per-family tuning on the same data and 16-trial search space, the gap collapses to 2.46 PP. Gradient analysis traces the inflated gap to high-learning-rate first-step instability in real baselines, while complex parameter coupling distributes the loss signal more robustly. A learning-rate $\times$ activation factorial confirms the failure is primarily hyperparameter-driven. Overall, CVNNs are best viewed as structured inductive biases whose gains depend on representation, symmetry, and optimization, not as universally superior architectures.
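
The remark about unseen carrier-phase rotations can be made concrete with a small example. This is a hedged sketch, not the paper's experimental setup: the four QPSK-like symbols, the 0.7 rad offset, and the choice of features are assumptions. It shows that a common phase rotation changes the Cartesian (real, imaginary) features a model would see, while magnitudes and sample-to-sample phase differences are unchanged.

```python
import cmath
import math


def rotate(samples, phi):
    """Apply a common carrier-phase offset phi (radians) to every sample."""
    return [s * cmath.exp(1j * phi) for s in samples]


def cartesian(samples):
    """(real, imag) pairs: a coordinate-dependent view."""
    return [(s.real, s.imag) for s in samples]


def magnitudes(samples):
    """Sample magnitudes: unchanged by any rotation."""
    return [abs(s) for s in samples]


def phase_steps(samples):
    """Phase change between consecutive samples; a common rotation cancels out."""
    return [cmath.phase(b * a.conjugate()) for a, b in zip(samples, samples[1:])]


# Four unit-magnitude QPSK-like symbols, 90 degrees apart.
symbols = [cmath.exp(1j * (math.pi / 4 + k * math.pi / 2)) for k in range(4)]
rotated = rotate(symbols, 0.7)

print("Cartesian before:", cartesian(symbols)[0])
print("Cartesian after: ", cartesian(rotated)[0])
print("phase steps before:", phase_steps(symbols))
print("phase steps after: ", phase_steps(rotated))
```

A model fed raw Cartesian inputs therefore sees a shifted input distribution under rotation unless it is trained with augmentation, whereas magnitude-only or differential-phase features are unaffected by construction.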

2605.27671 2026-05-28 stat.ML cs.LG (version update)

Evolving and Detecting Multi-Turn Deception using Geometric Signatures

Surender Suresh Kumar, Mary L. Cummings

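Affiliations: George Mason University

AI summary: Proposes multi-objective genetic optimization to generate multi-turn deceptive question sets, and detects deception with high recall (0.89) using simple geometric features in embedding space (angular coverage, distance ratio, linearity) combined with a lightweight classifier.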

Abstract

Safety defenses for large language models (LLMs) are typically trained and evaluated on single-turn prompts, yet real attacks often unfold as indirect, multi-turn probing. To defend against this more nuanced form of deception, we present a unified pipeline that generates realistic multi-turn deceptive question sets via multi-objective genetic prompt optimization with co-evolving mutation operators. We validate this dataset through a human study, which also revealed that early generations yielded the most convincing deception and surfaced practical constraints such as adherence filtering and ordering effects. Using this data, we were able to detect deceptive attempts to access prohibited information using simple, explainable geometric signals in embedding space coupled with a lightweight feed-forward classifier. Three geometric features (angular coverage, distance ratio, and linearity), augmented with pairwise similarity statistics, led to a compact predictive model that achieved consistently high recall (0.89) across base, reworded, and truncated (three-turn) scenarios, with test-time F1 ranging from 0.74 to 0.86. The results support the central hypothesis that multi-turn deceptive intent leaves a stable geometric footprint, which enables lightweight, transparent screening without expensive end-to-end training. We further discuss responsible uses, limitations, and paths toward larger, more diverse human-evaluated datasets. The primary contribution to artificial intelligence is the multi-objective evolutionary framework for prompt generation, and the engineering application is the deployment of a lightweight geometric detection system for LLM safety infrastructure.

2605.27668 2026-05-28 cs.LG cs.AI cs.CL (version update)

Aligning LLMs with Human Uncertainty: A Beta-Bernoulli Calibrator for LLM Forecasting

Hui Dai, Ryan Teehan, Parsa Torabian, Mengye Ren

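Affiliations: Agentic Learning AI Lab; New York University; The University of Chicago; Chronologies AI

AI summary: Proposes the Beta-Bernoulli Calibrator (BBC), which combines binary outcomes with human forecast signals to convert an initial point estimate into a distribution over event likelihood, providing both calibration and uncertainty quantification.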

Abstract

Probabilistic forecasting estimates the likelihood of uncertain future events. To improve LLM forecasting, existing methods typically learn from binary outcomes to output verbalized forecasts. However, while aggregated human forecasts contain rich information in both the crowd probability estimate and the degree of agreement among forecasters, how to utilize these signals remains underexplored. To address this, we propose the Beta-Bernoulli Calibrator (BBC), which converts an initial point estimate forecast from any model into a distribution over event likelihood, using supervision from both binary outcomes and human forecasts. BBC models event likelihood $p \sim \text{Beta}(α, β)$ and outcome $y \sim \text{Bernoulli}(p)$, with the mean as the calibrated point forecast and the variance as the epistemic uncertainty. Our results show that BBC generally provides better calibrated and more accurate forecasts than both traditional post-hoc calibration methods and models fine-tuned specifically for forecasting, while remaining lightweight and having good generalization. We also show that the epistemic uncertainty captured by BBC is a more reliable predictor of forecasting error than verbalized confidence.
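
As a worked illustration of the Beta summary statistics mentioned in the abstract, the sketch below computes the mean and variance of a Beta(α, β) distribution in Python. The function name `beta_summary` and the example parameter values are assumptions made for illustration; how BBC maps an input forecast to α and β is not described here and is not modeled.

```python
from typing import Tuple


def beta_summary(alpha: float, beta: float) -> Tuple[float, float]:
    """Return (mean, variance) of a Beta(alpha, beta) distribution.

    The mean plays the role of the calibrated point forecast and the
    variance the role of the epistemic uncertainty.
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    total = alpha + beta
    mean = alpha / total
    variance = (alpha * beta) / (total ** 2 * (total + 1.0))
    return mean, variance


# Two hypothetical forecasts with the same mean (0.8) but different
# concentration: larger alpha + beta means lower variance.
confident = beta_summary(80.0, 20.0)
uncertain = beta_summary(4.0, 1.0)
print("confident:", confident)
print("uncertain:", uncertain)
```

Both parameter settings give the same point forecast, so only the variance separates a well-supported estimate from a weakly supported one.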

2605.27662 2026-05-28 cs.LG cs.AI (version update)

How the Optimizer Shapes Learned Solutions in Equivariant Neural Networks

Teodor-Mihai Stupariu, Andrei Manolache

Affiliations: University of Stuttgart, Germany; International Max Planck Research School for Intelligent Systems, Germany; Tudor Vianu High School of Computer Science, Romania

AI summary: Compares the Muon and Adam optimizers on point-cloud and molecular learning tasks, finds that Muon improves the optimization of equivariant neural networks, and analyzes how it leads to more regular loss surfaces and higher effective rank.

Comments Accepted at ICML 2026 Workshop on Weight-Space Symmetries

Abstract

Equivariant neural networks encode geometric symmetries by construction, yet they are often difficult to optimize and can underperform less constrained architectures. A growing body of work addresses this through architectural modifications such as constraint relaxation or approximate equivariance, while the role of the optimizer remains comparatively underexplored. We study this direction by comparing Muon and Adam across several equivariant and geometric architectures under point-cloud and molecular learning settings. On ModelNet40, where the comparison is clearest, Muon consistently improves over Adam across all architectures considered. We then analyze the trained ModelNet40 checkpoints through Hessian estimates, loss surface visualizations, and spectral properties of learned weights and intermediate representations. The checkpoints reached by Muon have larger Hessian curvature summaries but more regular loss surfaces, and their learned weights and representations have higher stable and effective ranks. These observations suggest that the interaction between optimizer design and geometric inductive bias deserves further attention from the community.
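
The abstract does not define its rank measures, so the sketch below uses two common definitions as an assumption: stable rank as the squared Frobenius norm divided by the squared spectral norm, and effective rank as the exponential of the Shannon entropy of the normalized singular values. Both are computed from a list of singular values; the function names are illustrative, and the paper may use different variants.

```python
import math
from typing import Sequence


def stable_rank(singular_values: Sequence[float]) -> float:
    """Squared Frobenius norm over squared spectral norm."""
    top = max(singular_values)
    if top <= 0:
        raise ValueError("need at least one positive singular value")
    return sum(s * s for s in singular_values) / (top * top)


def effective_rank(singular_values: Sequence[float]) -> float:
    """exp(entropy) of the singular values normalized to sum to one."""
    total = sum(singular_values)
    if total <= 0:
        raise ValueError("need at least one positive singular value")
    probs = [s / total for s in singular_values if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)


# A flat spectrum uses every direction; a peaked one is close to rank one.
flat = [1.0, 1.0, 1.0, 1.0]
peaked = [1.0, 0.1, 0.01, 0.001]
print(stable_rank(flat), effective_rank(flat))
print(stable_rank(peaked), effective_rank(peaked))
```

Under these definitions, higher values mean the weight or representation matrix spreads its energy over more directions.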

2605.27659 2026-05-28 cs.LG cs.AI (version update)

Transferable Reinforcement Learning via Probabilistic Latent Embeddings and Dynamic Policy Adaptation for Sim-to-Real Deployment

Gengyue Han, Yiheng Feng

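Affiliations: Lyles School of Civil and Construction Engineering, Purdue University, West Lafayette, USA; Elmore Family School of Electrical and Computer Engineering, Purdue University, West Lafayette, USA

AI summary: Proposes a reinforcement learning framework based on probabilistic latent embeddings and dynamic policy adaptation, which uses meta-learning to infer a latent representation of the environment and dynamically adjusts the risk level to achieve safe and efficient Sim2Real policy transfer.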

Abstract

Due to limited resources and public safety concerns, deep reinforcement learning (RL) agents for many cyber-physical systems (e.g., autonomous vehicles) are first trained in simulators. However, when deployed in real-world environments, they often suffer from performance degradation or safety violations because of the inevitable Sim2Real gap. Existing zero-shot approaches, such as robust safe RL and domain randomization, mitigate this issue but typically at the cost of degraded performance or residual safety risks when experiencing unmodeled system dynamics. To address these limitations, we propose a novel reinforcement learning framework that enables safe and efficient policy transfer via probabilistic latent embeddings and dynamic policy adaptation. We consider a family of Constrained Markov Decision Processes (CMDPs) under different environment contexts. By leveraging a latent context variable from meta-RL, the proposed framework infers the latent representation of the environment from simulated experiences. Furthermore, it incorporates a distributional RL formulation, which allows the risk level of the deployed policy to be adjusted dynamically, based on the estimation accuracy of the latent context variable. This strategy promotes safety at the early deployment stage and improves efficiency through fast policy adaptation under the Sim2Real gap.

2605.27651 2026-05-28 cs.LG (version update)

Faster Thermal Profiling of a Lunar Rover with Machine Learning Adapted Finite Difference Model

Samuel Weber, Zaki Hasnain, Souma Chowdhury

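Affiliations: University at Buffalo

AI summary: Proposes a physics-informed machine learning framework that uses adaptive coarse meshing and a differentiable finite-difference simulator to balance accuracy and efficiency in lunar rover thermal modeling while preserving physical consistency.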

Abstract

Autonomous space systems operating in extreme thermal environments require accurate and efficient thermal modeling to support both pre-mission system design and onboard autonomy. For lunar rovers, large temperature gradients, radiative heat transfer, and variable surface conditions make reliable thermal prediction especially challenging. High-fidelity physics-based simulations provide accurate results but are computationally expensive, while simplified models and lookup-table approaches often lack sufficient accuracy. Physics-informed machine learning (PIML) offers a promising alternative by combining data-driven models with embedded physical knowledge. This paper presents a PIML framework for thermal analysis of a simplified lunar rover with internal heat sources, where machine learning enables environment-adaptive coarse meshing. The proposed architecture integrates a transfer neural network (TNN) that adaptively determines 3D finite-difference nodalization based on thermal loads and initial conditions, enabling more accurate coarse-mesh calculations. A differentiable finite-difference thermal simulator is embedded within the framework to enforce physical consistency and support efficient training, while an upscaling layer reconstructs high-resolution temperature fields from the coarse-grid solution. The proposed PIML approach is evaluated against high-fidelity fine-mesh simulations, low-fidelity fixed coarse-mesh models, and a purely data-driven artificial neural network (ANN). Results show that the PIML framework improves prediction accuracy by 50% and 39% relative to the coarse-mesh physics model and ANN model, respectively, while maintaining physically consistent thermal distributions. Computationally, the framework is also 3x faster than high-fidelity simulations, demonstrating an effective balance between accuracy and efficiency for thermal modeling of lunar rover systems.
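
For readers unfamiliar with finite-difference thermal solvers, here is a minimal 1D explicit heat-diffusion step in Python. It is a generic textbook scheme under stated assumptions (uniform grid, fixed-temperature end nodes, a source term already expressed in K/s), not the paper's 3D adaptive nodalization or its differentiable simulator; `step_heat_1d` is a hypothetical name.

```python
from typing import List, Optional, Sequence


def step_heat_1d(
    temps: Sequence[float],
    alpha: float,
    dx: float,
    dt: float,
    source: Optional[Sequence[float]] = None,
) -> List[float]:
    """Advance node temperatures by one explicit time step.

    alpha is thermal diffusivity (m^2/s), dx the node spacing (m),
    dt the time step (s). End nodes are held fixed (assumption).
    """
    r = alpha * dt / (dx * dx)
    if r > 0.5:
        # Stability limit of the explicit 1D scheme.
        raise ValueError("time step too large: alpha*dt/dx^2 must be <= 0.5")
    n = len(temps)
    heating = list(source) if source is not None else [0.0] * n
    new = list(temps)
    for i in range(1, n - 1):
        laplacian = temps[i - 1] - 2.0 * temps[i] + temps[i + 1]
        new[i] = temps[i] + r * laplacian + dt * heating[i]
    return new


# A hot interior node relaxes toward its neighbors.
print(step_heat_1d([0.0, 0.0, 100.0, 0.0, 0.0], alpha=1.0, dx=1.0, dt=0.25))
```

A coarser grid allows a larger stable time step but resolves less detail, which is the accuracy and cost trade-off that adaptive nodalization targets.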

2605.27649 2026-05-28 cs.CL cs.LG (version update)

Disentangling Language Roles in Multilingual LLM Task Execution

Qishi Zhan, Minxuan Hu, Seoyeon Jang, Lei Zhao, Ziheng Chen, Man Liang, Xinyue Xiang, Jiaxin Liu, Guansu Wang, Liang He

Affiliations: Marquette; Cornell; UC San Diego; UPenn; UT Austin; Maryland; Michigan; UIUC; Melbourne; Stanford

AI summary: Proposes the MTM-Bench benchmark, which uses a fully crossed design to disentangle three language roles (instruction, content, and response) when evaluating task execution in multilingual LLMs, and finds that the response-language role is the main driver of performance degradation.

Abstract

Multilingual LLMs are increasingly used when instruction, source content, and required response languages do not coincide. Existing benchmarks have expanded multilingual instruction-following evaluation, but they rarely isolate these three roles within a fully crossed design. We introduce MTM-Bench, a controlled benchmark for language-conditioned task execution in which each instance is defined by a triplet $(L_{\text{instr}}, L_{\text{content}}, L_{\text{resp}})$. Across English, Spanish, and Chinese, MTM-Bench enumerates all 27 triplets and contains 2,430 instances per model across semantic reversal, final-state extraction, and language purity with update realization. We evaluate 20 frontier and open-weight LLMs using decomposed metrics for semantic correctness, target-language adherence, constraint satisfaction, contamination ratio, and joint success, with scoring validated by a targeted human audit. The fully crossed design reveals that degradation is organized by the role a language occupies in the task structure, not merely by mismatch count. The response-language role is the dominant axis of variation, and a single response-slot mismatch accounts for most degradation. The response-only and full-mismatch comparison suggests that mismatch count is not a monotonic predictor of difficulty, with model-level ordering varying across systems. Task families fail through distinct channels, showing that semantic correctness alone does not capture reliable multilingual task execution.
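
The fully crossed design can be sketched by enumerating every (instruction, content, response) language triplet. The Python snippet below is an illustration, not the benchmark's code: the language codes, the `mismatch_profile` labels, and the assumption that the 2,430 instances are spread evenly over triplets are all hypothetical.

```python
from collections import Counter
from itertools import product

LANGS = ("en", "es", "zh")  # English, Spanish, Chinese


def mismatch_profile(instr: str, content: str, resp: str) -> str:
    """Label a triplet by which roles disagree (illustrative labels)."""
    if instr == content == resp:
        return "matched"
    if instr == content:
        return "response_only"  # only the response slot differs
    if len({instr, content, resp}) == 3:
        return "full_mismatch"  # all three roles use different languages
    return "other_mismatch"


triplets = list(product(LANGS, repeat=3))
profile_counts = Counter(mismatch_profile(*t) for t in triplets)
# Average instances per triplet if the split were even (assumption).
instances_per_triplet = 2430 // len(triplets)

print(len(triplets), dict(profile_counts), instances_per_triplet)
```

Grouping results by role-based profile rather than by the raw number of mismatched slots is what lets a response-only mismatch be compared directly with a full mismatch.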

2605.27646 2026-05-28 cs.LG cs.AI (version update)

Hurwitz Quaternion Multiplicative Quantization for KV Cache Compression

Kabir Swain, Sijie Han, Daniel Karl I. Weidele, Mauro Martino, David Cox, Antonio Torralba

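Affiliations: Massachusetts Institute of Technology, Cambridge, MA, USA; IBM Research, Cambridge, MA, USA; University of Toronto, Toronto, Canada

AI summary: Proposes a calibration-free Hurwitz quaternion multiplicative quantization method that treats each 4-element chunk of K/V as a quaternion and encodes it as a quantized product, matching fp16 perplexity at about 5 bits and achieving up to 5.05x KV cache compression.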

Abstract

We propose \textbf{Hurwitz Quaternion Multiplicative Quantization (HQMQ)}, a \textbf{calibration-free} method for KV cache compression of large language models. HQMQ treats each 4-element chunk of K or V as a quaternion and quantizes its unit direction to the \emph{product} $q_p \cdot q_s$, where $q_p$ ranges over the 24-element Hurwitz group $2T$ (the 24 vertices of the 24-cell on $S^3$, pairwise angle $60^\circ$) and $q_s$ ranges over a per-(layer, head) secondary codebook of $S$ \emph{random} unit quaternions. The multiplicative composition yields $24S$ effective codewords at $S$ stored parameters; random initialization suffices because left-multiplication is an $S^3$ isometry, so seeded codebooks vary in end-task ppl by $<1.5\%$. A per-batch median-multiplier outlier extraction step ($C{=}3$, no calibration) handles modern outlier-heavy architectures. We evaluate on five modern open models: Mistral-7B (dense MHA), Llama-3-8B and Qwen2.5-7B and Qwen3-8B (dense GQA), and gpt-oss-20b (sparse MoE). On Mistral-7B and Qwen3-8B, HQMQ matches fp16 within $0.02$--$0.03$ ppl points at $\sim$5 bits. On Qwen2.5-7B and Qwen3-8B, where naive int4 collapses to $10^4{+}$ ppl, HQMQ + Med3$\times$ recovers fp16 quality within $0.02$--$0.10$ ppl points at $\sim$5 bits. HQMQ Pareto-dominates naive int by $3$--$1900\times$ at matched bits across all five models, and downstream zero-shot accuracy matches fp16 at $3.79$ bits on Mistral. Against the strongest calibrated KV-quantization baseline, HQMQ at $3.79$ bits matches KIVI-4 ($\sim 4.5$ bits) within ${\sim}1$ pt on CoQA, $0.6$ pts on TruthfulQA, and $2.3$ pts on GSM8K, at $16\%$ fewer bits and without a calibration pass. At the storage level, HQMQ delivers up to $5.05\times$ KV compression, shrinking a Llama-3-70B 128k-context cache from 43 GB to 8.5 GB.
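
The multiplicative codebook can be illustrated with a small pure-Python sketch. It builds the 24 Hurwitz unit quaternions, draws a random secondary codebook, and picks the product $q_p \cdot q_s$ closest in direction to a 4-element chunk. This is a simplified reading of the abstract under assumptions: the function names, the exhaustive search, the codebook size of 8, and the handling of the chunk norm are hypothetical, and the outlier extraction step is omitted.

```python
import math
import random
from itertools import product


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def qmul(p, q):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def hurwitz_units():
    """The 24 units: 8 signed axis units plus 16 half-integer units."""
    units = []
    for axis in range(4):
        for sign in (1.0, -1.0):
            q = [0.0, 0.0, 0.0, 0.0]
            q[axis] = sign
            units.append(tuple(q))
    units.extend(product((0.5, -0.5), repeat=4))
    return units


def random_unit(rng):
    v = [rng.gauss(0.0, 1.0) for _ in range(4)]
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)


def quantize_direction(chunk, primary, secondary):
    """Return ((i, j), norm) with primary[i] * secondary[j] closest in angle.

    Assumes a nonzero chunk; the norm is returned separately.
    """
    norm = math.sqrt(dot(chunk, chunk))
    unit = tuple(x / norm for x in chunk)
    pairs = ((i, j) for i in range(len(primary)) for j in range(len(secondary)))
    best = max(pairs, key=lambda ij: dot(unit, qmul(primary[ij[0]], secondary[ij[1]])))
    return best, norm


units = hurwitz_units()
rng = random.Random(0)
secondary = [random_unit(rng) for _ in range(8)]  # S = 8 (illustrative)
chunk = (0.3, -1.2, 0.5, 2.0)
(i, j), norm = quantize_direction(chunk, units, secondary)
code = qmul(units[i], secondary[j])
print("effective codewords:", len(units) * len(secondary))
print("cosine to chunk direction:", dot(code, chunk) / norm)
```

Only the secondary quaternions need to be stored, since the 24 primary units are fixed, which is how the scheme reaches $24S$ effective codewords from $S$ stored entries.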

2605.27644 2026-05-28 cs.RO cs.AI cs.LG (version update)

Trinity: Unifying Class-Agnostic Terrain and Semantic Segmentation for Unstructured Outdoor Environments by Leveraging Synthetic Data

Marcus G Müller, Wout Boerdijk, Maximilian Durner, Riccardo Giubilato, Abel Gawel, Wolfgang Stürzl, Roland Siegwart, Rudolph Triebel

Affiliations: Institute of Robotics and Mechatronics, German Aerospace Center (DLR); Federal Institute of Technology Zurich (ETH Zurich); Robotics and AI Institute (RAI)

AI summary: Proposes Trinity, a unified transformer-based network that jointly performs class-specific semantic segmentation and class-agnostic terrain segmentation, using the synthetic dataset RUGDSynth and the real-world dataset EXTerra to learn robot-agnostic terrain priors.

Abstract

Terrain understanding is fundamental for mobile robots operating in unstructured outdoor environments. Existing vision-based traversability estimation methods rely on robot-specific annotations or semantic class mappings, limiting transferability across platforms and requiring costly re-annotation when robot capabilities change, while standard semantic segmentation methods only focus on specific predefined classes, which do not capture the variety of terrains. In this work, we propose a transformer-based architecture that jointly performs class-specific semantic segmentation and class-agnostic terrain segmentation within a unified network, called Trinity. Terrain regions are segmented based solely on visual appearance, without predefined semantic labels or robot-dependent traversability scores. This formulation enables the learning of robot-agnostic visual terrain priors that can be combined with robot-specific experience for downstream tasks such as traversability estimation, visual odometry, and mission planning. To enable large-scale training with diverse terrain appearances, we extend the OAISYS simulator and introduce RUGDSynth, a synthetic dataset inspired by RUGD with class-agnostic terrain samples. Furthermore, we present the EXTerra Dataset, providing real-world images annotated with both class-specific and class-agnostic terrain labels. Experiments demonstrate the feasibility of the proposed task and the effectiveness of our joint segmentation approach in complex outdoor environments. Code and datasets will be released with this publication (after review).

2605.27642 2026-05-28 cs.CL cs.LG (version update)

Learning to Translate from Soft to Hard LLM Prompts

Pitipat Kongsomjit, Suryansh Goyal, Jacob Whitehill

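Affiliations: Worcester Polytechnic Institute

AI summary: Trains a dedicated model that translates soft prompts into natural language, which improves translation quality, and shows that soft prompts can be converted into portable text prompts that, on large closed-source models, outperform the original soft prompts and even few-shot learning.

Comments 8 pages, 11 tables, 4 figures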

2605.27631 2026-05-28 cs.CR cs.LG (version update)

Poison with Style: A Practical Poisoning Attack on Code Large Language Models

Khang Tran, Yazan Boshmaf, Issa Khalil, NhatHai Phan, Ting Yu, Md Rizwan Parvez

Affiliations: Department of Data Science, New Jersey Institute of Technology, New Jersey, U.S.A.; Qatar Computing Research Institute, HBKU, Doha, Qatar; Mohamed Bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates

AI summary: Proposes the Poison-with-Style (PwS) attack, which uses developers' code styles as covert triggers and a two-step training strategy to fine-tune code LLMs so that they generate vulnerable code under the trigger style while behaving normally otherwise.

Comments Accepted to the Forty-Third International Conference on Machine Learning 2026 (ICML 2026)

Abstract

Code Large Language Models (CLLMs) serve as the core of modern code agents, enabling developers to automate complex software development tasks. In this paper, we present Poison-with-Style (PwS), a practical and stealthy model poisoning attack targeting CLLMs. Unlike prior attacks that assume an active adversary capable of directly embedding explicit triggers (e.g., specific words) into developers' prompts during inference, PwS leverages developers' code styles as covert triggers implicitly embedded within their prompts. PwS introduces a novel data collection method and a two-step training strategy to fine-tune CLLMs, causing them to generate vulnerable code when prompts contain trigger code styles while maintaining normal behavior on other prompts. Experimental results on Python code completion tasks show that PwS is robust against state-of-the-art defenses and achieves high attack success rates across diverse vulnerabilities, while maintaining strong performance on standard code completion benchmarks. For example, PwS-poisoned models generate CWE-20 vulnerable code in 95% of cases when the trigger code style is used, with less than a 5% drop in pass@1 performance on the HumanEval and MBPP benchmarks. Our implementation and dataset are here: https://github.com/khangtran2020/pws.

2605.27619 2026-05-28 cs.LG cs.AI (version update)

Supervised Distributional Reduction via Optimal Transport and Dependence Maximization

Sai-Aakash Ramesh, Archit Sood, Andrew Corbett, Tim Dodwell

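Affiliations: digiLab, UK; University of Bristol, UK

AI summary: Proposes the Supervised Distributional Reduction (SDR) algorithm, which combines optimal transport with explicit dependence maximization to learn compact representations that preserve both the geometric structure of the data and target-relevant signal.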

Abstract

Learning representations that capture both intrinsic data geometry and target-relevant structure remains a fundamental challenge, particularly in settings where data reduction must balance compression with predictive fidelity. While distributional reduction, which encompasses joint clustering and dimensionality reduction, offers a principled way to summarize data, its supervised variants remain relatively under-explored, despite the importance of retaining task-relevant signal for downstream prediction and decision-making. We propose Supervised Distributional Reduction (SDR), an algorithm for learning target-aware representations by combining optimal transport with explicit dependence maximization. SDR builds on the Fused Gromov-Wasserstein (FGW) objective to align the relational structure of the input distribution with a set of representative points, while augmenting it with a direct dependence term that encourages the learned embeddings to capture predictive signal more explicitly. This results in compact representations that reflect both geometric structure and supervision. Beyond representation learning, SDR naturally induces a data-dependent, non-stationary geometry that can be leveraged for settings such as Gaussian Process (GP) modelling. By redefining distances through target-aware distributional alignment, SDR enables the construction of adaptive kernels that respond to local variations in both data geometry and supervision, offering an optimal transport-based perspective on non-stationary kernel design.

2605.27601 2026-05-28 cs.DC cs.LG cs.PF (version update)

A Methodology to Assess Power Modeling in Energy-Aware Federated Learning on Heterogeneous Mobile Devices

Chaimae Jallouli, Karim Boubouh, Robert Basmadjian

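Affiliations: Mohammed VI Polytechnic University; Khalifa University

AI summary: To address the difficulty of estimating CPU power on heterogeneous ARM devices, proposes a reproducible CPU power estimation methodology combined with a rail-to-cluster mapping technique, which substantially reduces energy estimation error compared with approximate models and improves the energy efficiency of federated learning.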

Comments 19 pages, 3 figures, 7 tables, Accepted for publication in the proceedings of Networked Systems (NETYS 2026), Springer Nature

Journal ref: Networked Systems (NETYS 2026), Springer Nature

Abstract

Estimating CPU power on heterogeneous ARM-based commodity devices is challenging due to limited access to CPU's voltage domains. As a result, state-of-the-art energy-aware Federated Learning (FL) frameworks typically rely on simplified approximate power models to estimate computation energy, rather than the more accurate analytical CMOS-based model. To bridge this gap, we propose a reproducible CPU power estimation methodology combined with a rail-to-cluster mapping technique to retrieve cluster-level supply voltage. We evaluate our approach on two commodity Android devices and show that the analytical model predicts CPU power with errors below 10%, whereas the approximate model incurs errors of up to 959%. Using AnycostFL, a state-of-the-art energy-aware FL framework, we show that the analytical model achieves the same 80% model accuracy while consuming 1.4x less energy than the approximate model. These results highlight that approximate models can severely misestimate computation energy and lead to suboptimal decisions. This work facilitates the use of analytical CPU power models on heterogeneous multi-cluster ARM-based mobile SoCs without additional hardware support or external power measurement tools.
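
An analytical CMOS power model is commonly written as dynamic power P = a · C · V² · f, which makes clear why the cluster-level supply voltage matters: power scales with its square. The Python sketch below uses that textbook form as an assumption (the paper's exact model, including any static-power term, is not given in the abstract), and the capacitance, voltage, and frequency values are invented for illustration.

```python
def dynamic_power_w(
    c_eff_farad: float,
    voltage_v: float,
    freq_hz: float,
    activity: float = 1.0,
) -> float:
    """Textbook CMOS dynamic power: activity * C * V^2 * f, in watts."""
    return activity * c_eff_farad * voltage_v ** 2 * freq_hz


def relative_error_pct(estimate: float, reference: float) -> float:
    """Absolute relative error of an estimate, in percent."""
    return abs(estimate - reference) / reference * 100.0


# Hypothetical cluster: 1 nF effective capacitance at 2 GHz.
# Reference uses the real rail voltage (0.8 V); the naive estimate
# assumes a fixed nominal 1.0 V because the rail is not observable.
reference_power = dynamic_power_w(1e-9, 0.8, 2e9)
naive_power = dynamic_power_w(1e-9, 1.0, 2e9)
print(reference_power, naive_power)
print(relative_error_pct(naive_power, reference_power))
```

In this toy case a 25% voltage overestimate already produces an error above 50% in power, which illustrates how estimates made without the true rail voltage can drift far from measurements.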

2605.27594 2026-05-28 cs.DS cs.LG stat.ML (version update)

Proper Agnostic Learning of Functions of Halfspaces under Gaussian Marginals

Sergei Tikhonov, Arsen Vasilyan

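Affiliations: UT Austin

AI summary: Gives the first efficient proper agnostic learning algorithm for arbitrary Boolean functions of K halfspaces under the Gaussian distribution, with a run-time whose dependence on the dimension d is essentially optimal in the statistical query model.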

Abstract

We study the problem of computationally efficient proper agnostic learning of multidimensional concept classes under the Gaussian distribution. In this setting, given i.i.d. labeled samples from an unknown distribution over $\mathbb{R}^d \times \{\pm 1\}$ whose marginal on $\mathbb{R}^d$ is Gaussian, the goal is to output a hypothesis from a target class $\mathcal{F}$ whose 0-1 loss is within $ε$ of that of the best classifier in $\mathcal{F}$. We give the first efficient proper agnostic learning algorithm for arbitrary Boolean functions of $K$ halfspaces under Gaussian marginals. Our algorithm runs in time $d^{O(K^2 \log(1/ε)/ε^2)} + (K/ε)^{O(K^3/ε^{2.5})}$. Prior to our work, the only known algorithm for $K \geq 2$ was brute-force search, with run-time exponential in $d$. Moreover, the dependence of our run-time on the dimension $d$ matches that of the best known improper learning algorithm, namely $d^{\widetilde{O}(K^2/ε^2)}$. For the special case of a single halfspace ($K=1$), the best previous run-time was $d^{O(1/ε^4)} + (1/ε)^{O(1/ε^6)}$. Our algorithm improves this to $d^{O(1/ε^2)} + (1/ε)^{O(1/ε^{2.5})}$. Once again, the dependence on $d$ matches that of the best known improper algorithm, namely $d^{O(1/ε^2)}$. Furthermore, the dependence of our run-time on the dimension $d$ is essentially optimal in the statistical query model.

2605.27591 2026-05-28 cs.LG (version update)

Gradient Transformer: Learning to Generate Updates for LLMs

Binh-Nguyen Nguyen, Khang Tran, NhatHai Phan, Issa Khalil

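Affiliations: Department of Data Science, New Jersey Institute of Technology, Newark, NJ, USA; Qatar Computing Research Institute, HBKU, Doha, Qatar

AI summary: Proposes a data-free knowledge distillation framework that uses a Gradient Transformer to convert the update vectors of fine-tuned tiny language models into update vectors for large language models, so the large model can be updated without access to private data.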

Comments Accepted at ICML 2026

Abstract

Many organizations lack computational resources to fine-tune large language models (LLMs) on private (unshareable) data for better utility, while fine-tuning tiny language models (TinyLMs) alone performs poorly. To address this bottleneck, we propose a data-free knowledge distillation framework that generates LLM update vectors based on TinyLMs fine-tuned on private data. An update vector is a vector of parameter changes from an initial model to its fine-tuned version on a dataset, capturing the effect of cumulative gradient steps during fine-tuning. The key idea of our framework is a novel Gradient Transformer that transforms TinyLM's update vectors into LLM's update vectors. As derived from shadow datasets, Grad-Transformer captures the correlation between TinyLM and LLM update vectors, enabling third-party providers to generate LLM update vectors given the organization's TinyLM update vectors without accessing the organization's private data. The framework supports multi-organization collaboration to jointly update LLMs, improving performance and cost-efficiency. Extensive experiments across language modeling and reasoning tasks show that Grad-Transformer remarkably outperforms state-of-the-art knowledge distillation baselines, even under strict differential privacy protection.
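
The update vector defined in the abstract is simply the parameter difference between a fine-tuned model and its initial version. The Python sketch below shows that definition on toy parameter dictionaries; the names `update_vector` and `apply_update` and the toy values are assumptions, and the learned mapping from TinyLM updates to LLM updates (the Gradient Transformer itself) is not modeled.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def update_vector(initial: Params, finetuned: Params) -> Params:
    """Per-parameter change from the initial to the fine-tuned model."""
    return {
        name: [f - i for i, f in zip(initial[name], finetuned[name])]
        for name in initial
    }


def apply_update(params: Params, update: Params, scale: float = 1.0) -> Params:
    """Add a (possibly scaled) update vector to a set of parameters."""
    return {
        name: [p + scale * u for p, u in zip(params[name], update[name])]
        for name in params
    }


# Toy "model" with one weight tensor and one bias (illustrative values).
initial = {"w": [0.5, -1.0], "b": [0.0]}
finetuned = {"w": [0.75, -1.5], "b": [0.25]}

delta = update_vector(initial, finetuned)
print(delta)
print(apply_update(initial, delta) == finetuned)
```

Because only the difference is exchanged, a third party can work with update vectors without ever seeing the data that produced them.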

2605.27583 2026-05-28 cs.LG 版本更新

Information-theoretic Multimodal Representation Learning for Electrocardiogram Signals

基于信息论的心电图信号多模态表示学习

Phu X. Nguyen, Konstantinos Kontras, Wei Dai, Huy Phan, Christos Chatzichristos, Paul Pu Liang, Bert Vandenberk, Maarten De Vos

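Affiliations: KU Leuven; University Hospitals Leuven; MIT; DFKI

AI summary: Proposes MERIT, a framework that takes an information-theoretic view and combines masked ECG modeling with ECG-text contrastive alignment to learn ECG representations that preserve signal structure and integrate clinical semantics, with consistent gains on classification, zero-shot, and text-generation tasks.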

AI中文摘要

心电图（ECG）是一种广泛使用的非侵入性心脏活动测量手段，在临床诊断中起着核心作用。最近的多模态方法将心电图信号与临床报告对齐以融入诊断语义，但临床报告通常无法保留心电图波形的丰富生理结构，特别是在从粗粒度诊断类别到细粒度形态的多个抽象层次上。为解决这一局限，我们从信息论角度构建心电图表示学习，并推导出一个可处理的目标函数，该函数同时保留信号结构并整合临床语义。基于这一原理，我们提出了MERIT（基于信息论的多模态心电图表示），一个双分支预训练框架，结合了掩码心电图建模与心电图-文本对比对齐。在PTB-XL及其他基准上的大量实验表明，该方法相较于先前方法取得了一致改进，包括在PTB-XL All上F1提升超过3%，在SubClass分类上F1提升超过5%。在零样本评估中，MERIT在PTB-XL SubClass上进一步将性能提升了高达+2.66%的AUC和+2.11%的F1，同时在多种分布偏移设置下展现出鲁棒性。此外，利用学习到的心电图表示进行基于心电图条件的大语言模型临床文本生成，在ROUGE和METEOR等多个指标上提升了文本质量。这些结果共同表明，MERIT学习了更具信息量和临床意义的心电图表示，尤其适用于细粒度临床应用。

英文摘要

Electrocardiograms (ECGs) are widely used non-invasive measurements of cardiac activity and play a central role in clinical diagnosis. Recent multimodal approaches align ECG signals with clinical reports to incorporate diagnostic semantics, but clinical reports often fail to preserve the rich physiological structure of ECG waveforms, particularly across multiple levels of abstraction ranging from coarse diagnostic categories to fine-grained morphology. To address this limitation, we formulate ECG representation learning from an information-theoretic perspective and derive a tractable objective that jointly preserves signal structure and integrates clinical semantics. Based on this principle, we propose MERIT (Multimodal ECG Representation via Information Theory), a dual-branch pretraining framework combining masked ECG modeling with ECG-text contrastive alignment. Extensive experiments on PTB-XL and additional benchmarks demonstrate consistent improvements over prior methods, including gains exceeding 3% F1 on PTB-XL All and 5% F1 on SubClass classification. In zero-shot evaluation, MERIT further improves performance by up to +2.66% AUC and +2.11% F1 on PTB-XL SubClass, while also demonstrating robustness under multiple distribution-shift settings. Moreover, leveraging the learned ECG representations for ECG-conditioned clinical text generation with large language models improves text quality across several metrics, including ROUGE and METEOR. Together, these results demonstrate that MERIT learns more informative and clinically meaningful ECG representations, particularly for fine-grained clinical applications.

2605.27564 2026-05-28 cs.CL cs.AI cs.LG 版本更新

The Future of Facts: Tracing the Factual Generation-Verification Gap

事实的未来：追踪事实生成-验证差距

Tim R. Davidson, Anja Surina, Caglar Gulcehre

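Affiliations: EPFL

AI summary: By analyzing training phases, the paper finds a generation-verification gap in language models' factual knowledge: verification is learned before generation and is more robust, and factual updates can leave models in a "multi-verse" state.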

Comments Code for this project is available at https://github.com/anjasurina/factgap , blog post at https://www.trdavidson.com/fact-gap

AI中文摘要

语言模型正成为事实知识的默认接口，但它们验证输出的能力往往比生成输出的能力更可靠。这种生成-验证差距（GV-gap）是近期自我改进和推理中许多进展的基础，但其在事实知识上的动态仍未被充分理解。我们聚焦于事实性GV-gap背后的训练机制，将其与计算和美学方面的对应物区分开来。我们通过四个开源模型家族（每个家族两个规模）的三个训练阶段（获取、持续学习和更新）追踪生成和验证能力。三个发现跨模型重复出现：（i）验证始终先于生成被学习；（ii）验证比生成对持续学习更稳健；（iii）事实更新可能使模型处于“多宇宙”状态，同时验证新旧答案均为正确。对前沿模型的自然实验在大规模上重现了这些动态，并揭示了在充分覆盖的事实上残留的验证偏差。

英文摘要

Language models are becoming the default interface to factual knowledge, yet they often verify outputs more reliably than they generate them. This generation-verification gap (GV-gap) underlies many recent advances in self-improvement and reasoning, but its dynamics on factual knowledge specifically remain poorly understood. We focus on the training mechanisms underlying factual GV-gaps, distinguishing them from their computational and aesthetic counterparts. We trace generation and verification capabilities through three training phases (acquisition, continual learning, and updating) across four open-source model families at two scales each. Three findings recur across models: (i) verification is consistently learned before generation; (ii) verification is more robust to continual learning than generation; and (iii) factual updates can leave models in a "multi-verse" state, simultaneously verifying both old and new answers as correct. Natural experiments on frontier models reproduce these dynamics at scale and reveal residual verification biases on well-covered facts.

2605.27559 2026-05-28 cs.MA cs.AI cs.LG 版本更新

Detection Without Correction: A Two-Parameter Decomposition of Multi-Stage LLM Pipelines

无需修正的检测：多阶段LLM流水线的双参数分解

Prashanti Nilayam, Kiran Ramanna, Prashil Tumbade

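Affiliations: ServiceNow, CA, USA

AI summary: Proposes a two-parameter decomposition into detection and conditional generation, showing that in multi-stage LLM pipelines the conditional miscorrection rate dominates (53-94%) while the detection rate varies by more than an order of magnitude, and giving a unified account of four phenomena, including accuracy plateaus and reversals.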

AI中文摘要

多阶段LLM流水线（执行多智能体辩论、内在自我修正或检索增强验证）表现出令人困惑的聚合行为：跨轮次的准确性平台和逆转、当代前沿模型上辩论增益的非重复性、内在自我修正退化，以及辩论动态中跨提供商的定性分歧。下游智能体响应可操作化为两个耦合决策：检测（是否将上游内容视为权威）和条件生成（如果不是则生成什么）。该分解产生四种可观察的响应模式，其中无需修正的检测是承载故障模式。在跨越四个模型系列、四个基准（GSM8K、MATH-500、GPQA-Diamond、AIME）和两种方法（多智能体辩论、内在自我修正）的九格实证网格中，我们发现条件误修正率始终占主导（跨队列53-94%），而检测率按上下文变化超过一个数量级。该框架将上述四种现象统一为共同机制的特征，并将检测阈值表征为稳定的模型/协议级规律，该规律在匹配基准难度的方法间持续存在。

英文摘要

Multi-stage LLM pipelines that perform multi-agent debate, intrinsic self-correction, or retrieval-augmented verification exhibit puzzling aggregate behaviors: accuracy plateaus and reversals across rounds, non-replication of debate gains on contemporary frontier models, intrinsic self-correction degradation, and qualitative cross-provider divergence in debate dynamics. Downstream agent response can be operationalized as two coupled decisions: detection (whether to treat upstream content as authoritative) and conditional generation (what to produce if not). This decomposition yields four observable response regimes, of which detection-without-correction is the load-bearing failure mode. Across a nine-cell empirical grid spanning four model families, four benchmarks (GSM8K, MATH-500, GPQA-Diamond, AIME), and two methods (multi-agent debate, intrinsic self-correction), we find that the conditional miscorrection rate is consistently dominant (53-94% across cohorts) while detection rate varies contextually by more than an order of magnitude. The framework unifies the four phenomena above as signatures of a common mechanism and characterizes detection threshold as a stable model/protocol-level regularity that persists across methods at matched benchmark difficulty.

2605.27556 2026-05-28 stat.ML cs.LG 版本更新

Accelerating Reinforcement Learning Training Using Simulation Surrogate Models

利用仿真代理模型加速强化学习训练

Mohammadmahdi Ghasemloo, David J. Eckman, Yaxian Li

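Affiliations: Department of Industrial and Systems Engineering, Texas A&M University; Intuit AI

AI summary: For environments whose reward structure, model parameters, or system dynamics change over time, proposes using simulation surrogate models to speed up reinforcement learning training and retraining, and validates the approach with discrete-event simulation experiments.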

2605.27541 2026-05-28 cs.LG 版本更新

SparseOpt: Addressing Normalization-induced Gradient Skew in Sparse Training

SparseOpt：解决稀疏训练中归一化引起的梯度倾斜

Mohammed Adnan, Rohan Jain, Tom Jacobs, Ekansh Sharma, Rahul G. Krishnan, Rebekka Burkholz, Yani Ioannou

Affiliations: University of Calgary; University of Toronto; Vector Institute; CISPA Helmholtz Center for Information Security

AI summary: To address the slow convergence of dynamic sparse training, analyzes the adverse effect of batch normalization on sparse training and proposes SparseOpt, a sparsity-aware optimizer that converges faster and generalizes better.

Comments Accepted at the International Conference on Machine Learning (ICML) 2026

AI中文摘要

动态稀疏训练（DST）方法通过保持稀疏性同时动态调整网络拓扑来训练神经网络。尽管有望减少计算量，但DST方法的收敛速度明显慢于密集训练，通常需要相当长的训练时间才能达到相似的精度。我们在分析和实验上均证明，批归一化（BN）对稀疏训练有不利影响，并提出了SparseOpt，一种稀疏感知优化器来解决这个问题。在CIFAR-100和ImageNet上使用ResNet模型进行的实验表明，我们提出的方法具有持续更快的收敛速度和更好的泛化性能。我们的工作突出了当前归一化层在稀疏训练中的局限性，并首次系统研究了批归一化、稀疏层和DST之间的相互作用，朝着使DST在实际中与密集训练竞争迈出了重要一步。

英文摘要

Dynamic Sparse Training (DST) methods train neural networks by maintaining sparsity while dynamically adapting the network topology. Despite the promise of reduced computation, DST methods converge significantly slower than dense training, often requiring comparable training time to achieve similar accuracy. We demonstrate both analytically and empirically that Batch Normalization (BN) adversely affects sparse training, and propose SparseOpt, a sparsity-aware optimizer, to address this. Experiments on ResNet models across CIFAR-100 and ImageNet demonstrate consistently faster convergence and improved generalization with our proposed method. Our work highlights the limitations of current normalization layers in sparse training and provides the first systematic study of the interaction between Batch Normalization, sparse layers, and DST, taking a significant step toward making DST practically competitive with dense training.

2605.27526 2026-05-28 stat.ML cs.LG 版本更新

Semiparametrically Efficient Inference for Kernel Measures of Noise Heterogeneity

噪声异质性核测度的半参数有效推断

Jakub Wornbard, Zikai Shen, Dimitri Meunier, Arthur Gretton

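Affiliations: University College London

AI summary: For kernel measures of noise heterogeneity in additive noise models, proposes semiparametrically efficient inference based on a Hilbert-valued one-step estimator, yielding bootstrap-calibrated tests of residual independence and goodness of fit together with asymptotically efficient confidence intervals.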

AI中文摘要

我们为加性噪声模型中噪声异质性的核测度开发了半参数有效推断。在许多应用中，回归函数使用灵活的机器学习方法进行估计。基于所得残差的下游过程可能继承第一阶段偏差：回归误差可能引起协变量与残差之间的虚假依赖，从而使标准分析所需的假设无效。我们构建了一个新颖的希尔伯特值一步估计量，用于估计协变量与残差之间的核协方差算子。我们的估计量为加性噪声模型中的残差独立性和拟合优度提供了自举校准检验，同时在噪声异质性下为核依赖测度提供了渐近有效的置信区间。该框架扩展到包含额外协变量的设置，从而能够推断不同处理组间残差噪声的分布异质性。模拟显示，与朴素插件残差方法相比，校准和功效有所改进。

英文摘要

We develop semiparametrically efficient inference for kernel measures of noise heterogeneity in additive noise models. In many applications, the regression function is estimated using flexible machine learning methods. Downstream procedures based on the resulting residuals can then inherit first-stage bias: regression error may induce spurious dependence between covariates and residuals, invalidating the assumptions needed for standard analysis. We construct a novel Hilbert-valued one-step estimator of the kernel covariance operator between covariates and residuals. Our estimator yields bootstrap-calibrated tests for residual independence and goodness of fit in additive noise models, while also providing asymptotically efficient confidence intervals for the kernel dependence measure under noise heterogeneity. The framework extends to settings with additional covariates, enabling inference on distributional heterogeneity of residual noise across treatment groups. Simulations show improved calibration and power relative to naive plug-in residual methods.

2605.27523 2026-05-28 stat.ML cs.LG 版本更新

Identifiable Bayesian Deep Generative Copulas with Unknown Layer Widths for Data with Arbitrary Marginal Distributions

可识别的贝叶斯深度生成Copula模型：未知层宽下任意边缘分布数据的建模

Joseph Feldman, Yuqi Gu

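Affiliations: Department of Statistics and Data Science, Washington University in St. Louis; Department of Statistics, Columbia University

AI summary: Proposes the Deep Discrete Encoder (DDE) Copula, which places a hierarchical directed network of binary latent variables inside a copula framework to give identifiable, interpretable generative modeling of data with arbitrary marginal distributions, with estimation and posterior inference based on rank likelihoods.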

AI中文摘要

深度生成模型为多变量数据分析提供了强大工具，但其黑箱架构往往不可识别且难以解释。我们引入了Deep Discrete Encoder (DDE) Copula，一种用于任意边缘分布多变量数据的可识别且可解释的生成模型。该模型在Copula框架内放置了一个二元潜变量的分层有向网络，从而能够灵活地对混合离散和连续数据进行依赖关系建模。估计基于秩似然，它将边缘建模与DDE参数的后验推断解耦，并避免了指定边缘分布。我们建立了DDE Copula参数可识别的条件，确保层特定参数提供有意义的多元依赖总结。我们还证明了在精确秩似然下连续边缘的商空间后验一致性，并将用于结或混合边缘的扩展秩似然视为广义似然，在额外对比条件下具有集中性。在计算方面，我们提出了一种随机期望最大化算法用于最大后验估计，并辅以改进收敛的初始化策略。为了自适应地学习网络维度，我们将贝叶斯秩选择先验扩展到推断层特定宽度。模拟实验展示了强大的有限样本性能，一项人格调查分析揭示了复杂多变量数据中可解释的分层潜在结构。

英文摘要

Deep generative models offer powerful tools for multivariate data analysis, but their black-box architectures are often unidentified and difficult to interpret. We introduce the Deep Discrete Encoder (DDE) Copula, an identifiable and interpretable generative model for multivariate data with arbitrary marginal distributions. The model places a hierarchical directed network of binary latent variables inside a copula framework, enabling flexible dependence modeling for mixed discrete and continuous data. Estimation is based on rank likelihoods, which decouple marginal modeling from posterior inference on the DDE parameters and avoid specifying the marginal distributions. We establish conditions for identification of the DDE copula parameters, ensuring that layer-specific parameters provide meaningful summaries of multivariate dependence. We also prove quotient-space posterior consistency for continuous margins under the exact rank likelihood and treat the extended rank likelihood for tied or mixed margins as a generalized likelihood, with concentration under an additional contrast condition. For computation, we propose a stochastic expectation-maximization algorithm for maximum a posteriori estimation, together with initialization strategies that improve convergence. To learn network dimension adaptively, we extend Bayesian rank-selection priors to infer layer-specific widths. Simulations show strong finite-sample performance, and a personality-survey analysis reveals interpretable hierarchical latent structure in complex multivariate data.

2605.27499 2026-05-28 cs.LG astro-ph.CO astro-ph.IM physics.comp-ph stat.ML 版本更新

GenSBI: Generative Methods for Simulation-Based Inference in JAX

GenSBI: 基于JAX的模拟推断生成方法

Aurelio Amerio

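Affiliations: Instituto de Física Corpuscular (IFIC), Universitat de València & CSIC

AI summary: Presents GenSBI, a library that implements flow matching, score matching, and denoising diffusion in JAX for simulation-based inference, with a unified interface and several transformer architectures, reaching near-ideal C2ST scores on standard benchmarks.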

Comments 48 pages + 1 appendix, 33 figures, 18 tables. For the associated Python code, see https://github.com/aurelio-amerio/GenSBI

AI中文摘要

流和扩散生成模型已成为模拟推断（SBI）中广泛采用的密度估计器，从神经后验估计自然扩展到似然和联合密度估计。它们原则性的优化目标和不受架构约束的特点推动了在自然科学中的快速采用。然而，最广泛使用的SBI库仍然是基于PyTorch的，这使得在JAX中开发前向模型和分析流程的研究人员没有原生选择。我们提出GenSBI，一个完全在JAX中实现流匹配、分数匹配和去噪扩散的开源库。该库提供三种基于Transformer的架构——SimFormer、Flux1和一种新颖的Flux1Joint，它将门控调制Transformer块扩展到联合密度估计——所有这些都通过一个统一接口互换，该接口解耦了生成方法、神经骨干和推理模式。GenSBI提供了从训练到后验校准（SBC、TARP、LC2ST）的端到端工作流，并支持具有领域特定嵌入网络的自定义架构。我们在标准SBI基准上验证了该框架，在SBIBM任务上以最小的每任务调整实现了接近理想的平均C2ST分数（0.50-0.56，其中0.50为理想值），并且在所有测试配置中后验覆盖校准良好。代码公开于https://github.com/aurelio-amerio/GenSBI。

英文摘要

Flow and diffusion generative models have established themselves as widely adopted density estimators for simulation-based inference (SBI), extending naturally from neural posterior estimation to likelihood and joint density estimation. Their principled optimization objectives and freedom from architectural constraints have driven rapid adoption across the natural sciences. Yet the most widely used SBI libraries remain PyTorch-based, leaving researchers who develop their forward models and analysis pipelines in JAX without a native option. We present GenSBI, an open-source library that implements flow matching, score matching, and denoising diffusion entirely in JAX. The library offers three transformer-based architectures - SimFormer, Flux1, and a novel Flux1Joint that extends gate-modulated transformer blocks to joint density estimation - all interchangeable through a unified interface that decouples generative method, neural backbone, and inference mode. GenSBI provides an end-to-end workflow from training through posterior calibration (SBC, TARP, LC2ST) and supports custom architectures with domain-specific embedding networks. We validate the framework on standard SBI benchmarks, achieving near-ideal mean C2ST scores (0.50-0.56, where 0.50 is ideal) on SBIBM tasks with minimal per-task tuning and well-calibrated posterior coverage across all tested configurations. The code is publicly available at https://github.com/aurelio-amerio/GenSBI.

2605.27495 2026-05-28 cs.CV cs.LG 版本更新

Representation-Conditioned Diffusion Models for Guided Training Data Generation

表示条件扩散模型用于引导训练数据生成

Nithesh Chandher Karthikeyan, Jonas Unger, Gabriel Eilertsen

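Affiliations: Linköping University

AI summary: Proposes representation-conditioned diffusion models that generate synthetic images conditioned on DINOv2, DINOv3, and CLIP representations; on ImageNet100, top-1 accuracy is 10.76 percentage points higher than with class-conditioned generation and, with a larger synthetic dataset, 2.0 points higher than a classifier trained on real data.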

AI中文摘要

数据可用性仍然是许多深度学习应用中的关键瓶颈。大规模数据集通常收集、整理和标注成本高昂，这可能限制监督学习方法的可扩展性和适用性。在这项工作中，我们评估了在由生成式深度学习产生的合成图像数据集上训练的模型的分类性能。具体而言，我们使用基于DINOv2、DINOv3和CLIP学习表示的潜在扩散模型。我们的结果表明，这种表示条件公式通过提高样本质量和模式覆盖，显著优于类条件生成（在ImageNet100上top-1准确率提高10.76个百分点）。此外，通过扩大合成数据集的规模，我们能够超越在真实数据上训练的分类器（top-1准确率提高2.0个百分点）。我们还展示了生成的图像如何用于增强目的，优于经典增强方法，以及如何利用条件空间进行样本过滤以进一步提高训练价值。总的来说，这些发现表明，表示条件扩散模型为在大规模视觉学习任务中增强、补充或潜在替代真实世界数据集提供了一种有前景的方法。

英文摘要

Data availability remains a critical bottleneck in many deep learning applications. Large-scale datasets are often expensive to collect, curate and annotate, which can limit the scalability and applicability of supervised learning methods. In this work, we evaluate the classification performance of models trained on synthetic image datasets produced by generative deep learning. In particular, we use latent diffusion models conditioned on learned representations from DINOv2, DINOv3, and CLIP. Our results demonstrate that this representation-conditioned formulation outperforms class-conditioned generation by a large margin (+10.76 p.p. top-1 accuracy on ImageNet100) by improving sample quality and mode coverage. Furthermore, by scaling the size of the synthetic dataset, we are able to outperform a classifier trained on the real data (+2.0 p.p. top-1 accuracy). We also demonstrate how generated images can be used for augmentation purposes, outperforming classical augmentation methods, and how the conditioning space can be used for sample filtering to further improve training value. Collectively, these findings highlight that representation-conditioned diffusion models provide a promising approach for augmenting, complementing, or potentially replacing real-world datasets in large-scale visual learning tasks.

2605.27494 2026-05-28 cs.CR cs.AI cs.CL cs.IR cs.LG 版本更新

Grounded Cache Routing for Retrieval-Augmented Generation: When Is It Safe to Reuse an Answer?

基于证据的缓存路由用于检索增强生成：何时可以安全地重用答案？

Syed Huma Shah

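AI summary: Proposes GroundedCache, a router that checks whether a cached answer is safe to serve through four cheap gates (query similarity, retrieved-evidence overlap, source-version validity, and lexical support), substantially lowering the unsafe-served rate.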

Comments 19 pages, 9 figures, 10 tables. Code: https://github.com/syedhumarahim/grounded-cache-router

AI中文摘要

现代检索增强生成（RAG）部署越来越依赖缓存来降低令牌成本和首令牌时间（TTFT）。在vLLM等服务栈中，前缀级KV重用已成为标准，而最近的系统（RAGCache、TurboRAG、CacheBlend、EPIC、ContextPilot、PCR、LMCache）进一步推动了块级和位置无关的重用。相比之下，输出级语义答案缓存仍然脆弱：相似的提示可能映射到不同的正确答案，检索到的证据随着语料库更新而漂移，并且对抗性碰撞攻击已被证明可以劫持缓存的响应。我们认为，缓存答案重用的正确框架不是如何更快地重用，而是何时重用是安全的。我们提出了GroundedCache，一种经过证据验证的缓存路由器，仅当四个廉价门控同时成立时才允许缓存答案：查询相似性、检索证据重叠、源版本有效性以及新检索证据对缓存答案的词汇（或基于判断的）支持。我们构建了一个六区域工作负载，用于压力测试缓存安全性而不仅仅是命中率，并引入了一个面向操作员的指标——不安全服务率（USR），即收到错误缓存答案的查询比例。在两个数据集和12,000个真实LLM生成（在vLLM上使用自动前缀缓存的Qwen2.5-7B-Instruct）中，GroundedCache在每个HotpotQA区域上将USR降至0.0%（而朴素缓存为15-35%），在mtRAG文档漂移上降至1.5%（而朴素缓存为51.5%），在设计点对抗区域上减少了34倍，在其他mtRAG区域上减少了3-10倍，同时端到端p50延迟保持在无缓存RAG基线的1.04-1.07倍以内。逐门控消融实验表明，词汇支持门控是两个数据集上的主要安全机制，其余门控以近乎零成本提供纵深防御。我们发布了实现、工作负载和评估工具。

英文摘要

Modern retrieval-augmented generation (RAG) deployments increasingly rely on caching to reduce token cost and time-to-first-token (TTFT). Prefix-level KV reuse is now standard in serving stacks such as vLLM, and chunk-level and position-independent reuse have been pushed further by recent systems (RAGCache, TurboRAG, CacheBlend, EPIC, ContextPilot, PCR, LMCache). Output-level semantic answer caches, by contrast, remain fragile: similar prompts can map to different correct answers, retrieved evidence drifts as the corpus is updated, and adversarial collision attacks have been shown to hijack cached responses. We argue that the right framing for cached answer reuse is not how to reuse faster but when reuse is safe. We propose GroundedCache, an evidence-validated cache router that admits a cached answer only when four cheap gates simultaneously hold: query similarity, retrieved-evidence overlap, source-version validity, and lexical (or judge-based) support of the cached answer by the freshly retrieved evidence. We build a six-regime workload that stress-tests cache safety rather than only hit rate, and introduce an operator-facing metric, the unsafe-served rate (USR): the fraction of all queries that received a wrong cached answer. Across two datasets and 12,000 real-LLM generations (Qwen2.5-7B-Instruct on vLLM with Automatic Prefix Caching), GroundedCache drives USR to 0.0% on every HotpotQA regime (vs. 15-35% under naive caching) and to 1.5% on mtRAG document drift (vs. 51.5%), a 34x reduction on the design-point adversarial regime and 3-10x reductions across the other mtRAG regimes, while end-to-end p50 latency stays within 1.04-1.07x of a no-cache RAG baseline. A per-gate ablation isolates the lexical support gate as the load-bearing safety mechanism on both datasets, with the remaining gates providing defense-in-depth at near-zero cost. We release the implementation, workload, and evaluation harness.
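The gating rule and the USR metric described in this abstract can be illustrated with a short sketch. Everything below is an assumption made for illustration: the `GateInputs` fields, the threshold values, and the function names are hypothetical and do not come from the released GroundedCache code; only the "all four gates must hold" rule and the USR definition are taken from the abstract.

```python
from dataclasses import dataclass


@dataclass
class GateInputs:
    query_similarity: float      # similarity of new query to cached query, in [0, 1]
    evidence_overlap: float      # overlap of cached vs. freshly retrieved evidence, in [0, 1]
    source_versions_valid: bool  # True if cited sources are unchanged since caching
    lexical_support: float       # share of cached answer supported by fresh evidence, in [0, 1]


def admit_cached_answer(g, sim_min=0.9, overlap_min=0.5, support_min=0.6):
    """Serve the cached answer only if all four gates hold (thresholds are illustrative)."""
    return (
        g.query_similarity >= sim_min
        and g.evidence_overlap >= overlap_min
        and g.source_versions_valid
        and g.lexical_support >= support_min
    )


def unsafe_served_rate(outcomes):
    """USR: fraction of all queries that received a wrong cached answer.

    `outcomes` is a list of (served_from_cache, answer_correct) pairs, one per query.
    """
    if not outcomes:
        return 0.0
    unsafe = sum(1 for served, correct in outcomes if served and not correct)
    return unsafe / len(outcomes)


fresh = GateInputs(0.95, 0.8, True, 0.7)
stale = GateInputs(0.95, 0.8, False, 0.7)  # source changed since the answer was cached
usr = unsafe_served_rate([(True, True), (True, False), (False, False), (False, True)])
```

Note that the USR denominator counts all queries, not only cache hits, so a wrong answer produced by fresh generation does not count as unsafe-served.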

2605.27489 2026-05-28 cs.CR cs.AI cs.LG 版本更新

HARP: Measuring Harm Amplification in Multi-Agent LLM Systems

HARP: 多智能体大语言模型系统中的危害放大测量

Md Hafizur Rahman, Zafaryab Haider, Tanzim Mahfuz, Prabuddha Chakraborty

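Affiliations: Electrical and Computer Engineering, University of Maine

AI summary: Proposes HARP, which compares clean and perturbed execution traces to quantify how local perturbations propagate into global harm in multi-agent LLM systems, and evaluates several attacks and defenses in a seven-agent finance system.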

Comments 39 pages, 12 figures, 12 tables, and 1 algorithm

AI中文摘要

多智能体大语言模型系统将工作流分解为智能体、工具、共享上下文、记忆和决策门。这种模块化提高了可解释性，但也带来了传播风险：对一个组件的有限扰动可能被其他智能体重用并放大为系统级危害。我们提出了HARP（通过角色扰动导致的危害放大），一种用于研究多智能体LLM系统中局部到全局危害放大的轨迹优先方法。HARP比较成对的清洁和扰动执行，记录专家输出、工具调用、记忆读/写、防护事件、预言日志、延迟、令牌成本和决策。我们将局部危害定义为对目标智能体或受损通道的偏离，全局危害定义为对整个轨迹的偏离，危害放大为（H_global/H_local）。这补充了攻击成功率，衡量编排如何将危害传播到攻击点之外。我们在一个面向金融的七智能体系统中实例化HARP，该系统具有确定性决策门和可配置的攻击框架，用于专家妥协、合谋、共享上下文破坏以及时间或记忆持久攻击。在五种防御中，仅提示防御保持了良性效用但留下高成功率和隐蔽性；工具前和步骤级防护以效用或延迟成本减少了部分失败；而IntegrityGuard，一种轨迹一致性防御，实现了最低的攻击成功率和全局危害，但引入了效用/成本权衡。结果表明，单一专家妥协产生最强的放大，共享上下文破坏产生最高的攻击成功率，时间持久性产生最大的恶意影响。HARP认为，安全的多智能体评估不仅必须衡量绕过，还必须衡量传播。

英文摘要

Multi-agent LLM systems decompose workflows across agents, tools, shared context, memory, and decision gates. This modularity improves interpretability, but creates a propagation risk: a bounded perturbation to one component can be reused by other agents and amplified into system-level harm. We introduce HARP (Harm Amplification through Role Perturbation), a trace-first methodology for studying local-to-global harm amplification in multi-agent LLM systems. HARP compares paired clean and perturbed executions and records specialist outputs, tool calls, memory reads/writes, guard events, oracle logs, latency, token cost, and decisions. We define local harm as deviation from targeted agents or corrupted channels, global harm as deviation over the full trace, and harm amplification as (H_global/H_local). This complements attack success rate with a measure of how strongly orchestration spreads harm beyond the attack point. We instantiate HARP in a finance-oriented seven-agent system with a deterministic decision gate and configurable attack harness for specialist compromise, collusion, shared-context corruption, and temporal or memory-persistent attacks. Across five defenses, prompt-only defenses preserve benign utility but leave high success and stealth; pre-tool and step-level guards reduce some failures with utility or latency costs; and IntegrityGuard, a trace-consistency defense, achieves the lowest attack success and global harm but introduces utility/cost trade-offs. Results show that single-specialist compromise produces the strongest amplification, shared-context corruption yields the highest attack success, and temporal persistence produces the largest malicious impact. HARP argues that secure multi-agent evaluation must measure not only bypass, but propagation.
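The ratio H_global/H_local can be illustrated on a toy pair of traces. The abstract does not say how deviation is measured, so this sketch makes a loud assumption: deviation is the count of aligned trace records whose output differs between the clean and perturbed runs. The trace format and the function name `harm_amplification` are hypothetical.

```python
def harm_amplification(clean_trace, perturbed_trace, targeted_agents):
    """Return H_global / H_local for two aligned traces of (agent, output) records.

    Assumption: deviation is counted as the number of differing records.
    H_local counts differing records from targeted agents only,
    H_global counts differing records over the full trace.
    """
    if len(clean_trace) != len(perturbed_trace):
        raise ValueError("traces must be aligned record by record")
    pairs = list(zip(clean_trace, perturbed_trace))
    h_global = sum(1 for c, p in pairs if c != p)
    h_local = sum(1 for c, p in pairs if c != p and c[0] in targeted_agents)
    if h_local == 0:
        raise ValueError("no local harm observed; the ratio is undefined")
    return h_global / h_local


clean = [
    ("analyst", "signal weak"),
    ("risk", "exposure low"),
    ("compliance", "ok"),
    ("gate", "hold"),
]
perturbed = [
    ("analyst", "signal strong"),   # the perturbed (targeted) agent
    ("risk", "exposure moderate"),  # downstream agent reuses the corrupted output
    ("compliance", "ok"),
    ("gate", "buy"),                # final decision flips
]

amplification = harm_amplification(clean, perturbed, {"analyst"})
```

Under this counting assumption, a value above 1 means the orchestration spread the perturbation beyond the attacked component, which is the propagation effect the abstract argues should be measured alongside attack success rate.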

2605.27486 2026-05-28 cs.LG 版本更新

Federated Learning for Multivariate Time Series Anomaly Detection in Industrial Automation

面向工业自动化的多变量时间序列异常检测的联邦学习

Khayyam Nosrati, Martin Uray, Saverio Messineo, Olaf Sassnick, Stefan Huber

Affiliations: Josef Ressel Centre for Intelligent and Secure Industrial Automation, Salzburg University of Applied Sciences; Department of Artificial Intelligence and Human Interfaces, Paris Lodron University of Salzburg

AI summary: Addresses the dataset challenges of multivariate time series anomaly detection under the federated learning paradigm by introducing a dataset with cyclic dynamics and evaluating several MTSAD methods.

Comments Preprint. Accepted at the DEXA International Workshop on Optimisation of Industrial Production with AI Algorithms 2026 (DEXA AI4IP 2026)

2605.27485 2026-05-28 cs.LO cs.LG cs.SE 版本更新

Automating Formal Verification with Agent-Guided Tree Search

利用智能体引导的树搜索自动化形式验证

Leo Yao

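Affiliations: Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology

AI summary: Evaluates LLM-driven generation of formally verified Lean code; an agentic loop with mathlib search reaches 95.0% on the benchmark, and agent-directed tree search with state-based and context-based orchestrators offers selective gains over that baseline.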

Comments 78 pages, 8 figures

AI中文摘要

形式验证为可证明正确的软件提供了一条路径，但编写经过验证的代码仍然成本高昂，以至于该技术很少在生产中使用。近期的大语言模型可以加速这一工作，最近的基准测试衡量了它们将规范翻译成代码和机器检查的正确性证明的能力。本论文评估了此类LLM驱动的验证代码生成（“vericoding”）在Lean中的现状，并开发了基于搜索的方法以提高验证性能。我们首先在当前跨供应商模型池上复现了vericoding-benchmark Lean排行榜的一个子集，发现非推理性能在美国闭源模型上大致保持稳定，而开放权重模型略有提升。我们使用配备mathlib搜索的智能体循环更新了vericoding-benchmark的迭代方法，发现模型性能大幅提升并随智能体预算扩展。GPT-5.4在423个规范上以K=50次LLM调用几乎饱和了基准测试，达到95.0%。然后我们设计了两种智能体引导的树搜索公式：基于状态的编排器，在部分证明状态上分支；以及基于上下文的编排器，在完整子智能体上下文上分支。与智能体基线相比，基于上下文的设计以更低的token成本解决了更广泛的中等难度规范，而智能体基线在最困难的规范上保持优势，这些规范中不间断的迭代最为重要。我们得出结论，搜索结构相对于强智能体基线具有选择性优势，并且从现代代码中提取的更具挑战性的基准测试对于衡量和推动自动形式验证的进一步进展至关重要。代码可通过联系作者leoy@mit.edu获取。

英文摘要

Formal verification offers a path to provably correct software, but writing verified code remains expensive enough that the technique is rarely used in production. Recent large language models can accelerate this work, and recent benchmarks measure their ability to translate specifications into code and machine-checked proofs of correctness. This thesis evaluates the state of such LLM-driven verified-code generation ("vericoding") in Lean and develops search-based methods for improving verification performance. We first reproduce a subset of the vericoding-benchmark Lean leaderboard on a current cross-vendor model pool, finding that non-reasoning performance remains roughly steady on US closed-source models while open-weight models have slightly improved. We update the iterative methodology of vericoding-benchmark with an agentic loop equipped with mathlib search, finding that model performance greatly improves and scales with agent budget. GPT-5.4 nearly saturates the benchmark at 95.0% on 423 specs with $K=50$ LLM calls. We then design two agent-directed tree-search formulations: a state-based orchestrator that branches on partial-proof states, and a context-based orchestrator that branches on full subagent contexts. Compared against the agent baseline, the context-based design solves a wider range of intermediate-difficulty specs at lower token cost, while the agent baseline retains an advantage on the hardest specs, where uninterrupted iteration matters most. We conclude that search structure has selective advantages over a strong agent baseline, and that more challenging benchmarks drawn from modern code are important to measure and drive further progress in automated formal verification. Code available upon request by contacting the author at leoy@mit.edu.

2605.27483 2026-05-28 cs.CL cs.AI cs.LG 版本更新

Debate Helps Weak Judges Reward Stronger Models

辩论有助于弱裁判奖励更强的模型

Ethan Elasky, Frank Nakasako, Naman Goyal

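Affiliations: Palaestra Research; Berkeley

AI summary: Studies proposer-critic debate in a stronger-debater/weaker-judge setting and finds that debate significantly helps the judge when the critic's classification ability exceeds the judge's and the judge treats critic speeches as claims to verify; a single independent critique recovers most of the benefit at lower cost.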

AI中文摘要

尽管理论上具有前景，但辩论作为一种可扩展的监督协议产生了混合的实证结果：在某些设置中有收益，在其他设置中无效，尤其是当裁判没有隐藏信息时。我们在程序可验证的代码和逻辑任务上，研究了强辩手/弱裁判设置下的提议者-批评者辩论。当批评者提供可用的优势时，辩论帮助裁判优于咨询基线：批评者的分类能力必须超过裁判，并且裁判必须将批评者的言论视为待验证的主张而非待总结的证词。在五个配对中的三个满足该条件的配对中，提议者-批评者辩论的收益在统计上显著优于咨询，并且这些配对是最有能力的模型配对。在我们的集合中的两个非响应者配对中，辩论产生无效效果，一旦批评者进入转录，裁判验证率下降数十个百分点。在这些情况下，批评者的二元分类能力与裁判的相差在噪声范围内，并且批评者的分歧被解析为证词而非待检查的主张。从辩论中消去反驳轮次对裁判表现没有可测量的变化：单一独立批评以更低的推理成本恢复了辩论的大部分收益。这些发现为可验证领域（答案、批评、裁判）中无需训练的可扩展监督提供了一种更廉价的原始方法，以及一种预测辩论何时有帮助的部署前审计（批评者是否击败裁判，以及裁判是否会验证它？）。

英文摘要

Despite theoretical promise, debate as a scalable oversight protocol has produced mixed empirical results: gains in some settings, and null effects in others, especially when the judge does not have information hidden from it. We study proposer-critic debate in a stronger-debater/weaker-judge setting on programmatically verifiable code and logic tasks. Debate helps the judge over a consultancy baseline when the critic provides a usable advantage: the critic's classification ability must exceed the judge's, and the judge must treat critic speeches as claims to verify rather than testimony to summarize. On the three of five pairings where the condition holds, proposer-critic debate's gains are statistically significant over consultancy, and these pairings are the most capable model pairings. On the two non-responder pairings in our set, debate produces null effects, and judge verification rates drop by tens of percentage points once a critic enters the transcript. In these cases the critic's binary-classification ability and the judge's are within noise of each other, and the critic's disagreement is parsed as testimony rather than a claim to check. Ablating rebuttal rounds from debate produces no measurable change in judge performance: a single independent critique recovers the bulk of debate's benefit at lower inference cost. These findings suggest a cheaper primitive for training-free scalable oversight in verifiable domains (answer, critique, judge) and a pre-deployment audit (does the critic beat the judge, and will the judge verify it?) that predicts when debate will help.

2605.27482 2026-05-28 cs.LG cs.AI 版本更新

Energy-Structured Low-Rank Adaptation for Continual Learning

能量结构低秩自适应持续学习

Longhua Li, Lei Qi, Qi Tian, Xin Geng

Affiliations: School of Computer Science and Engineering, Southeast University, Nanjing, China; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China; Huawei Technologies, Shenzhen, China

AI summary: Proposes E²-LoRA, which uses energy-concentrated and energy-ordered low-rank adaptation together with a dynamic rank allocation strategy to address task interference and knowledge compression in continual learning, reporting the best performance.

Comments Accepted by ICML 2026

2605.27479 2026-05-28 cs.LG cs.AI 版本更新

Resource-Constrained Affect Modelling via Variance Regularisation Pruning

资源约束下的情感建模：基于方差正则化剪枝

Kosmas Pinitas, Konstantinos Katsifis

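Affiliations: Mediterranean College, Athens, Greece; University of Derby, Derby, UK

AI summary: Proposes Variance-Regularised Pruning (VR), a framework that accounts for cross-participant stability when pruning and keeps competitive CCC performance at 80% sparsity, making it suitable for resource-constrained affect-aware systems.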

Comments This paper has been accepted at the 2026 PErvasive Technologies Related to Assistive Environments (PETRA)

AI中文摘要

情感计算系统越来越多地嵌入到普及和交互环境中，如自适应游戏、辅助技术和资源受限平台，在这些环境中，计算效率必须与跨不同用户的可靠性相平衡。模型剪枝提供了一种减少计算需求的有效方法，但现有方法通常仅优化稀疏性，而不考虑参数移除如何影响个体间的鲁棒性。在这项工作中，我们引入了方差正则化剪枝（VR），一种明确将跨参与者稳定性纳入稀疏化过程的剪枝框架。VR不依赖于平均预测误差，而是根据每个连接对预测准确性和用户间变异性的联合贡献来评估，优先保留在分布差异下仍然可靠的参数。我们在AGAIN数据集上评估了所提出的方法，该数据集包含在九个情感诱发游戏环境中收集的唤醒度标注。实验结果表明，即使在没有额外微调的情况下，VR在80%稀疏度下仍能保持竞争性的一致性相关系数（CCC）性能，突显了其在真实世界、资源受限的情感感知系统中的适用性。总体而言，所提出的框架支持开发紧凑、鲁棒的情感模型，这些模型能够在真实的交互环境中可靠运行。

英文摘要

Affective computing systems are increasingly embedded in pervasive and interactive environments, such as adaptive games, assistive technologies, and resource-constrained platforms, where computational efficiency must be balanced with reliability across diverse users. Model pruning offers an effective way to reduce computational demands, yet existing approaches typically optimise for sparsity alone, without accounting for how parameter removal impacts robustness across individuals. In this work, we introduce Variance-Regularised Pruning (VR), a pruning framework that explicitly incorporates cross-participant stability into the sparsification process. Rather than relying solely on average prediction error, VR evaluates each connection based on its joint contribution to both prediction accuracy and variability across users, prioritising parameters that remain reliable under distributional differences. We evaluate the proposed approach on the AGAIN dataset, which includes arousal annotations collected across nine affect-eliciting game environments. Experimental results demonstrate that VR maintains competitive Concordance Correlation Coefficient (CCC) performance even at 80% sparsity without additional fine-tuning, highlighting its suitability for deployment in real-world, resource-limited affect-aware systems. Overall, the proposed framework supports the development of compact, robust affective models that can operate reliably in real-world interactive environments.
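The evaluation metric named in this abstract, the Concordance Correlation Coefficient, has a standard closed form (Lin's CCC): twice the covariance divided by the sum of the two variances and the squared difference of the means. The sketch below implements that standard formula with population (1/n) moments; it is not the paper's pruning criterion, and the function name `ccc` and the sample values are illustrative.

```python
def ccc(y_true, y_pred):
    """Lin's concordance correlation coefficient, using population (1/n) moments."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    denom = var_t + var_p + (mean_t - mean_p) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for identical constant sequences")
    return 2 * cov / denom


arousal = [1.0, 2.0, 3.0, 4.0]
perfect = ccc(arousal, arousal)                   # identical traces
shifted = ccc(arousal, [a + 1 for a in arousal])  # same shape, constant offset
reversed_score = ccc(arousal, arousal[::-1])      # perfectly anti-correlated
```

Unlike Pearson correlation, CCC penalises a constant offset between prediction and annotation, which is why the shifted trace scores below 1 even though it is perfectly correlated with the target.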

2605.27477 2026-05-28 stat.ML cs.LG 版本更新

Iterative Causal Discovery: Per-Edge Impossibility Certificates, Tier-Aware Oracle Queries, and the $1+K$ Lower Bound

迭代因果发现：每边不可能性证书、分层感知的Oracle查询以及$1+K$下界

Eichi Uehara

发表机构 * aflo, Inc.（aflo公司）

AI总结提出一种迭代因果发现协议，通过为每条候选边分配不可能性证书（RESOLVED/IMPOSSIBLE代码）和五层门控可识别性层级（LSNM、IGCI、Stein、MDL、PEIT），结合两种Oracle原语（元枢纽查询和子节点查询），在理想Oracle假设下实现了最多$1+K$次专家交互即可恢复任意DAG的上界。

Comments Contains 10 figures and 5 tables

AI中文摘要

因果发现算法返回一个有向图，但无法原则性地区分由数据确定的边方向和在没有识别假设的情况下分配的边方向。在标准马尔可夫性和忠实性条件下，观测分布仅识别一个马尔可夫等价类；该类内的方向不由联合分布决定，且无法仅通过额外样本恢复，而是需要功能限制或干预。我们提出一种针对连续数据的观测因果发现协议，该协议为每个候选边附加一个离散的不可能性证书：RESOLVED代码记录提交方向所依据的可识别性定理，而IMPOSSIBLE代码记录失败模式以及领域专家必须回答以解决该问题的具体问题。双变量级联扩展了五个门控可识别性层级：LSNM、IGCI、Stein、MDL和PEIT，当它们的前提条件检验被拒绝时，这些层级会弃权。两种Oracle原语——元枢纽查询和子节点查询——共同建立了最多$1+K$次专家交互的上界，足以恢复任意DAG，其中$K$表示非叶节点的数量。在理想Oracle假设下，该界在asia、sachs、child和alarm基准上被精确达到。

英文摘要

Causal-discovery algorithms return a directed graph, yet provide no principled means of distinguishing edge directions identified by the data from those assigned without an identifying assumption. Under the standard Markov and faithfulness conditions, the observational distribution identifies only a Markov equivalence class; orientations within that class are not determined by the joint distribution and cannot be recovered from additional samples alone, but require either a functional restriction or an intervention. We introduce a protocol for observational causal discovery on continuous data that attaches to each candidate edge a discrete impossibility certificate: a RESOLVED code records the identifiability theorem under which the direction was committed, while an IMPOSSIBLE code records the failure mode together with the specific question a domain expert must answer to resolve it. The bivariate cascade is extended with five gated identifiability tiers (LSNM, IGCI, Stein, MDL, and PEIT) that abstain when their precondition test rejects. Two oracle primitives, the meta-hub query and the node-children query, jointly establish an upper bound of $1+K$ expert interactions sufficient to recover any DAG, where $K$ denotes the number of non-leaf vertices. Under an ideal-oracle assumption, the bound is met exactly on the asia, sachs, child, and alarm benchmarks.
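
The $1+K$ count can be illustrated with a toy simulation. The abstract does not define the two oracle primitives, so the sketch below assumes that the meta-hub query returns the set of non-leaf vertices in one interaction and that each node-children query returns all children of one vertex. The function name `recover_dag_with_ideal_oracle` and the example graph are hypothetical, and "leaf" is read as "vertex with no children".

```python
def recover_dag_with_ideal_oracle(true_children):
    """Recover a DAG by querying an ideal oracle; returns (edges, n_queries).

    true_children maps every vertex to the set of its children and plays
    the role of the oracle's hidden ground truth.
    """
    queries = 0

    # Assumed meta-hub query: one interaction that names every non-leaf vertex.
    queries += 1
    non_leaves = sorted(v for v, kids in true_children.items() if kids)

    # Assumed node-children query: one interaction per non-leaf vertex.
    edges = set()
    for parent in non_leaves:
        queries += 1
        edges.update((parent, child) for child in true_children[parent])
    return edges, queries


# Hypothetical four-vertex DAG: a -> b, a -> c, b -> d, c -> d.
dag = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}, "d": set()}
edges, n_queries = recover_dag_with_ideal_oracle(dag)
k = sum(1 for kids in dag.values() if kids)  # number of non-leaf vertices
```

Under these assumptions the simulation uses exactly one meta-hub query plus one node-children query per non-leaf vertex, which is how a count of $1+K$ arises.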

2605.27476 2026-05-28 cs.LG cs.AI 版本更新

Balancing Fidelity and Diversity in Diffusion Models via Symmetric Attention Decomposition: Hopfield Perspective

通过对称注意力分解平衡扩散模型中的保真度与多样性：Hopfield视角

Hyunmin Cho, Woo Kyoung Han, Kyong Hwan Jin

发表机构 * Department of Electrical Engineering, Korea University, Seoul, South Korea（韩国大学电子工程系，首尔，韩国）

AI总结本文通过将Transformer中的注意力矩阵分解为对称和反对称部分，从Hopfield网络视角解释并调控扩散模型生成中的保真度-多样性权衡。

Comments Accepted to ICML 2026 (Regular)

2605.27475 2026-05-28 cs.LG cs.AI 版本更新

HEAL: Resilient and Self-* Hub-based Learning

HEAL：弹性且自适应的基于集线器的学习

Mohamed Amine Legheraba, Stefan Galkiewicz, Maria Gradinariu Potop-Butucaru, Sébastien Tixeuil

发表机构 * Sorbonne University（索邦大学）； CNRS（法国国家科学研究中心）； LIP6（巴黎第六大学计算机科学实验室）； Institut Universitaire de France（法国大学研究院）

AI总结提出一种名为HEAL的跨层去中心化学习框架，通过结合联邦学习、八卦学习和流行病学习的优势，利用自组织自愈的P2P覆盖网络和Elevator算法动态选择聚合节点，在无崩溃场景下性能与联邦学习相当，同时在崩溃和波动环境中优于八卦学习和流行病学习。

AI中文摘要

去中心化学习通过将数据和计算分布在节点上，增强了隐私性、可扩展性和容错性。一种流行的方法是联邦学习，它依赖于中央聚合器，但面临服务器脆弱性、可扩展性问题、隐私风险以及最重要的单点故障等挑战。另一种方法是八卦学习和流行病学习，它们通过节点间的点对点模型更新交换实现完全去中心化，确保了鲁棒性和隐私性，但代价是模型收敛速度较慢。在这项工作中，我们提出了一种新颖的去中心化学习框架，称为HEAL。HEAL是首个跨层去中心化学习框架，它利用优化的自组织和自愈底层P2P覆盖网络，结合了联邦学习、八卦学习和流行病学习的优势。借助最近提出的Elevator算法，HEAL将动态选择的节点提升为聚合器。通过仿真，我们证明HEAL在无崩溃环境中具有与联邦学习相似的性能，同时完全去中心化且具有容错性。在崩溃和波动频繁的环境中，HEAL优于八卦学习和流行病学习。

英文摘要

Decentralized learning enhances privacy, scalability, and fault tolerance by distributing data and computation across nodes. A popular approach is Federated Learning, which relies on a central aggregator, yet faces challenges such as server vulnerabilities, scalability issues, privacy risks, and, most importantly, a single point of failure. Alternatively, Gossip Learning and Epidemic Learning offer full decentralization through peer-to-peer exchanges of model updates, ensuring robustness and privacy at the price of slower model convergence. In this work, we introduce a novel decentralized learning framework called HEAL. HEAL is the first cross-layer decentralized learning framework that exploits an optimized self-organizing and self-healing underlying P2P overlay, combining the strengths of Federated Learning, Gossip Learning, and Epidemic Learning. Leveraging the recently proposed Elevator algorithm, HEAL promotes dynamically chosen nodes to act as aggregators. Through simulations, we demonstrate that HEAL performs similarly to Federated Learning in crash-free settings while being fully decentralized and fault-tolerant. In crash- and churn-prone environments, HEAL outperforms Gossip Learning and Epidemic Learning.

2605.27473 2026-05-28 stat.ML cs.LG 版本更新

Calibrated Inference for the Conditional Average Treatment Effect in the Few-Placebo Regime via Gaussian Processes

在少安慰剂条件下通过高斯过程对条件平均处理效应的校准推断

Eichi Uehara

发表机构 * AFLO

AI总结针对少安慰剂条件下条件平均处理效应估计的校准不确定性，提出GP-CATE方法，通过高斯过程直接建模每个臂的结果曲面，实现校准覆盖。

Comments 14 pages, 1 figure, 5 tables

AI中文摘要

估计干预对给定个体的帮助程度——条件平均处理效应（CATE）——在医学、经济学和政策决策中日益重要，当估计值伴随校准的不确定性区间时最为有用。我们研究少安慰剂条件，即一个治疗臂远小于另一个，如出现在非均衡分配试验和小样本保留的A/B测试中。该设置下的标准估计器是X-Learner，获得可信区间的自然方法是使其第二阶段贝叶斯化。我们表明这些区间覆盖不足：它们包含真实效应的频率低于名义水平。我们将其归因于结构性原因——X-Learner的回归目标继承了拟合小臂的干扰模型的偏差，因此后验中心偏离真实效应。我们发现标准补救措施——回归正交双稳健得分——在此也不可靠，因为该条件的有限重叠使得估计器要么高度可变，要么一旦稳定后再次有偏。这两种后果反映了超越因果推断的模式：单独估计的方差附加到难以学习的量的点估计上，而点估计的偏差未被该方差捕获。我们提出GP-CATE，它用高斯过程建模每个臂的结果曲面，因此稀缺臂的不确定性直接进入后验，而不是作为未建模的偏差。在合成和半合成基准测试中，GP-CATE实现了校准覆盖，而我们比较的估计器（包括Causal Forest和BART）未能做到，代价是当数据无信息时区间适当变宽。

英文摘要

Estimating how much an intervention helps a given individual, the conditional average treatment effect (CATE), is increasingly central to decision-making in medicine, economics, and policy, where an estimate is most useful when accompanied by a calibrated uncertainty interval. We study the few-placebo regime, in which one treatment arm is much smaller than the other, as arises in unequal-allocation trials and small-holdout A/B tests. The standard estimator in this setting is the X-Learner, and a natural way to obtain credible intervals is to make its second stage Bayesian. We show that these intervals under-cover: they contain the true effect less often than their nominal level. We trace this to a structural cause: the X-Learner's regression target inherits the bias of a nuisance model fitted to the small arm, so the posterior is centered away from the true effect. We also find that the standard remedy, regressing an orthogonal doubly-robust score, is unreliable here, since the regime's limited overlap leaves the estimator either highly variable or, once stabilized, biased once more. Both consequences reflect a pattern that extends beyond causal inference: a separately estimated variance is attached to a point estimate of a hard-to-learn quantity, and the point estimate's bias is not captured by that variance. We propose GP-CATE, which models each arm's outcome surface with a Gaussian process, so the scarce arm's uncertainty enters the posterior directly rather than as an unmodelled bias. Across synthetic and semi-synthetic benchmarks, GP-CATE attains calibrated coverage where the estimators we compare against, including Causal Forest and BART, do not, at the cost of intervals that are appropriately wide when the data are uninformative.
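
The statement that the scarce arm's uncertainty "enters the posterior directly" can be made concrete with the standard Gaussian process regression formulas. The following is a sketch under stated assumptions, not the paper's derivation: each arm $a \in \{0, 1\}$ gets its own zero-mean GP prior with kernel $k_a$, Gaussian noise variance $\sigma_a^2$, training inputs $X_a$ and outcomes $y_a$, and the two arm posteriors are treated as independent. All symbols are introduced here for illustration.

```latex
% Per-arm GP posterior at a test point x (standard GP regression):
m_a(x)   = k_a(x, X_a)\,\bigl(K_a + \sigma_a^2 I\bigr)^{-1} y_a ,
\qquad
s_a^2(x) = k_a(x, x) - k_a(x, X_a)\,\bigl(K_a + \sigma_a^2 I\bigr)^{-1} k_a(X_a, x),
\qquad K_a = k_a(X_a, X_a).

% CATE as a difference of outcome surfaces, with independent arm posteriors:
\tau(x) = \mu_1(x) - \mu_0(x)
\quad\Longrightarrow\quad
\tau(x) \mid \text{data} \;\sim\; \mathcal{N}\!\bigl(m_1(x) - m_0(x),\; s_1^2(x) + s_0^2(x)\bigr).
```

Under these assumptions, a small arm has few rows in $X_a$, so $s_a^2(x)$ stays close to the prior variance $k_a(x, x)$ away from its data, and that term widens the interval for $\tau(x)$ instead of being absorbed into a point estimate.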

2605.27470 2026-05-28 cs.LG cs.AI 版本更新

Detect by Yourself: Self-Designing Agentic Workflows for Few-Shot Graph Anomaly Detection

自行检测：少样本图异常检测的自设计代理工作流

Tairan Huang, Qiang Chen, Yili Wang, Yueyue Ma, Changlong He, Xiu Su, Yi Chen

发表机构 * CSU； UST

AI总结提出SignGAD框架，通过自设计任务条件检测工作流替代固定检测器，结合图编码与检测器选择及受保护重拟合策略，提升少样本图异常检测的适应性与可靠性。

AI中文摘要

图异常检测旨在识别属性图中的异常节点，并在实际应用中发挥重要作用。然而，现有的图异常检测方法仍面临两个关键挑战：1）固定流程，限制了其在有限监督下对不同图任务的适应性；2）弱证据，无法将上下文和结构异常信号明确纳入检测过程。在本文中，我们提出了一种新颖框架，即少样本图异常检测的自设计代理工作流（SignGAD）。具体来说，我们提出了一种新范式，将图异常检测任务从训练固定异常检测器重新定义为设计任务条件检测工作流。通过构建检测工作流，SignGAD选择合适的图编码和检测器设计以利用任务特定的异常证据。同时，我们引入了一种受保护的最终重拟合策略，通过校准重拟合接受度来优化所选工作流，从而增强有限监督下的可靠性。在多个真实世界数据集上进行的大量实验表明，SignGAD相比最先进方法取得了强劲性能，突显了其在图异常检测任务上的有效性。

英文摘要

Graph anomaly detection aims to identify anomalous nodes in attributed graphs and plays an important role in real-world applications. However, existing graph anomaly detection methods still face two key challenges: 1) fixed pipelines, which restrict their adaptability across different graph tasks under limited supervision; 2) weak evidence, which prevents them from explicitly incorporating contextual and structural anomaly signals into the detection process. In this paper, we propose a novel framework, self-designing agentic workflows for few-shot graph anomaly detection (SignGAD). Specifically, we propose a paradigm that reformulates the graph anomaly detection task from training a fixed anomaly detector to designing task-conditioned detection workflows. By constructing detection workflows, SignGAD selects suitable graph encodings and detector designs to exploit task-specific anomaly evidence. Meanwhile, we introduce a guarded final refit strategy that refines the selected workflow by calibrating refit acceptance, enhancing reliability under limited supervision. Extensive experiments conducted on several real-world datasets demonstrate that SignGAD achieves strong performance against state-of-the-art methods, highlighting its effectiveness on graph anomaly detection tasks.

2605.27469 2026-05-28 cs.LG cs.AI 版本更新

Metric-Aware PCA as a Linear Instance of Geometric Deep Learning

Michael Leznik

发表机构 * May, 2025（2025年5月）

AI总结本文通过将度量感知主成分分析（MAPCA）置于几何深度学习框架中，建立了两者之间在对称性、等变性、不变性等六个轴上的精确对应关系，并证明了MAPCA是几何深度学习的线性实例。

AI中文摘要

几何深度学习围绕数据域的对称性组织神经架构，对称群的选择作为几何先验，决定了可以学习哪些表示。度量感知主成分分析（MAPCA）通过正定度量矩阵参数化主成分分析，其规范子族在标准PCA和输出白化之间插值，对角度量点恢复不变PCA（IPCA）。本文将MAPCA置于几何深度学习框架中。度量被视为几何先验；保持它的正交群是其诱导的对称群；MAPCA解在该群下等变，所得谱不变；MAPCA的定义约束是等变网络中使用的Schur型权重约束的线性类比。在六个轴——域、对称群、等变性、不变性、架构原语和几何先验——上，我们构建了MAPCA与几何深度学习之间的精确字典。技术核心是一个唯一性定理，将IPCA刻画为MAPCA族中唯一的线性数据导出度量，该度量在任意对角缩放下等变，并投影到作用的不动点集上，在归一化下等价于精确形式的方差最大化准则。本文以三座桥梁结束：核PCA作为非线性扩展，谱图方法作为图上的MAPCA，以及深度MAPCA构造将定位扩展到深度等变网络。

英文摘要

Geometric deep learning organises neural architectures around the symmetries of their data domain, with the choice of symmetry group serving as a geometric prior that determines what representations can be learned. Metric-Aware Principal Component Analysis (MAPCA) parameterises principal component analysis by a positive-definite metric matrix, with a canonical subfamily interpolating between standard PCA and output whitening and a diagonal-metric point recovering Invariant PCA (IPCA). This paper positions MAPCA within the geometric deep learning framework. The metric is read as the geometric prior; the orthogonal group preserving it is the symmetry group it induces; MAPCA solutions are equivariant under this group with the resulting spectrum invariant; and MAPCA's defining constraint is the linear analogue of the Schur-type weight constraints used in equivariant networks. Across six axes (domain, symmetry group, equivariance, invariance, architectural primitive, and geometric prior), we construct a precise dictionary between MAPCA and geometric deep learning. The technical anchor is a uniqueness theorem characterising IPCA as the unique linear data-derived metric in the MAPCA family that is equivariant under arbitrary diagonal rescaling and projects onto the fixed-point set of the action, equivalent under normalisation to the variance-maximisation criterion in its precise form. The paper closes with three bridges: kernel PCA as the nonlinear extension, spectral graph methods as MAPCA on graphs, and a deep MAPCA construction extending the positioning into deep equivariant networks.

2605.27450 2026-05-28 cs.IR cs.LG 版本更新

Quantum Machine Learning-Based 6G Edge Networks: Enabling Adaptive Communication and Model Aggregation

Wenjing Xiao, Jiatai Yan, Chenglong Shi, Shixin Chen, Miaojiang Chen, Min Chen, Saif Al-Kuwari, Ahmed Farouk

发表机构 * School of Computer, Electronics and Information, Guangxi University（广西大学计算机电子信息学院）； Guangxi Key Laboratory of Multimedia Communications and Network Technology（广西多媒体通信与网络技术重点实验室）； School of Computer Science and Engineering, South China University of Technology（华南理工大学计算机科学与工程学院）； Pazhou Laboratory（琶洲实验室）； Qatar Center for Quantum Computing, College of Science and Engineering, Hamad Bin Khalifa University（卡塔尔量子计算中心，哈马德·本·哈利法大学）

AI总结针对6G V2X通信中的高维状态空间、异构节点和动态环境挑战，提出量子增强框架，包含信道自适应语义通信、多模态融合、模型迁移和联邦聚合四个模块，利用量子卷积神经网络、量子注意力、量子强化学习和量子张量分解提升效率、泛化能力和隐私保护。

AI中文摘要

随着第六代（6G）移动通信技术的到来，车联网（V2X）通信在通信效率、系统泛化能力和模型协作方面面临前所未有的挑战。传统机器学习在处理V2X系统中的高维状态空间、异构V2X节点下的慢收敛和弱泛化、快速变化的信道以及多模态感知数据时存在困难。为解决这些问题，我们提出了一种量子增强的V2X通信与模型聚合框架，旨在实现6G中高效、鲁棒和智能的交通，该框架包含四个模块：信道自适应语义通信模块、多模态融合模块、模型迁移模块和联邦聚合模块。具体而言，信道自适应语义通信模块利用量子卷积神经网络（CNN）和量子失真度量，实现高效传输和跨不同条件的强泛化能力。多模态融合模块利用量子注意力和纠缠来压缩特征并关联异构数据中的语义。模型迁移模块采用量子强化学习对决策过程进行建模，并提高动态环境中的适应性。联邦聚合模块将量子张量分解与基于反向传播的校正相结合，以低开销提供隐私保护并增强全局模型的鲁棒性。这项工作为未来6G智能交通中的通信与模型协作勾勒了一种新范式。

英文摘要

With the advent of sixth-generation (6G) mobile communication technology, vehicle-to-everything (V2X) communication faces unprecedented challenges in communication efficiency, system generalization capabilities, and model collaboration. Conventional machine learning struggles with high-dimensional state spaces, slow convergence, and poor generalization under heterogeneous V2X nodes, rapidly varying channels, and multimodal sensing data in V2X systems. To address these issues, we propose a quantum-enhanced framework for V2X communication and model aggregation that targets efficient, robust, and intelligent transportation in 6G, which includes four modules: the channel-adaptive semantic communication module, the multimodal fusion module, the model transfer module, and the federated aggregation module. Specifically, the channel-adaptive semantic communication module leverages quantum convolutional neural networks (CNN) and quantum distortion metrics to enable efficient transmission and strong generalization across diverse conditions. The multimodal fusion module exploits quantum attention and entanglement to compress features and associate semantics across heterogeneous data. The model transfer module employs quantum reinforcement learning to model decision-making and improve adaptability in dynamic environments. The federated aggregation module integrates quantum tensor decomposition with backpropagation-based corrections to provide privacy preservation with low overhead and to strengthen global model robustness. This work outlines a new paradigm for communication and model collaboration in future 6G intelligent transportation.

2605.27416 2026-05-28 quant-ph cs.AI cs.DC cs.LG 版本更新

Can Quantum Federated Learning Withstand Circuit-Level Backdoors?

量子联邦学习能否抵御电路级后门攻击？

Aakar Mathur, Mohammed Ruknuddin, Ashish Gupta

发表机构 * BITS Pilani Dubai Campus（比斯汉尼迪拜校区）

AI总结提出电路级后门威胁模型（CULT），通过量子感知机制（Grover、Pauli、Bit-flip、Sign-flip）实现四种隐蔽攻击，理论证明攻击的隐蔽性，实验表明单个恶意客户端即可导致FedAvg精度严重下降，现有防御无法消除最坏情况。

Comments Accepted to IJCAI-ECAI 2026

AI中文摘要

量子联邦学习（QFL）继承了联邦优化对恶意客户端的核心脆弱性，同时也引入了来自变分电路训练和测量驱动梯度的攻击面。本文提出了一种新颖的电路级后门威胁（CULT）模型，该模型通过利用量子感知机制（包括Grover、Pauli、Bit-flip和Sign-flip）形式化了四种隐蔽攻击。通过使恶意客户端在训练中和训练后表面上均可发起攻击，这些攻击能够严重破坏学习过程。我们建立了严格的理论基础，以证明在标准平滑性假设下攻击的隐蔽性。在MNIST和CIFAR-10数据集上进行的实验，采用非独立同分布划分和不同比例的恶意客户端，结果表明，即使只有一个恶意客户端，在FedAvg聚合下也能导致严重的精度下降。虽然流行的防御方法（包括Krum、Multi-Krum、FoolsGold、FLGuardian和Mud-HoG）在许多情况下减少了精度下降，但它们未能消除最坏情况下的失败案例，其中精度下降高达50%。实验分析进一步揭示，在CULT模型下，恶意更新通过保持接近良性范数来有效掩盖其存在，从而帮助攻击者逃避检测。

英文摘要

Quantum Federated Learning (QFL) inherits the core vulnerability of federated optimization to malicious clients, while also introducing an attack surface from variational circuit training and measurement-driven gradients. This work proposes a novel CircUit-Level backdoor Threat (CULT) model that formalizes four stealthy attacks by exploiting quantum-aware mechanisms, including Grover, Pauli, Bit-flip, and Sign-flip. By enabling malicious clients on both in-training and post-training surfaces, these attacks can critically undermine the learning process. We establish a rigorous theoretical foundation to demonstrate attack stealthiness under standard smoothness assumptions. Experiments on the MNIST and CIFAR-10 datasets with non-IID splits and varying fractions of malicious clients show that even a single malicious client can induce severe accuracy degradation under FedAvg aggregation. While popular defenses, including Krum, Multi-Krum, FoolsGold, FLGuardian, and Mud-HoG, reduce degradation in many regimes, they fail to eliminate worst-case failures, where accuracy drops by up to 50%. The experimental analysis further reveals that under the CULT model, malicious updates effectively mask their presence by staying close to benign norms, thereby helping attackers evade detection.

2605.27412 2026-05-28 cs.NE cs.AI cs.LG 版本更新

Advancing Direct Training for Spiking Neural Networks with Circulate-Firing Neurons and Learnable Gradients

利用循环发放神经元和可学习梯度推进脉冲神经网络的直接训练

Feifan Zhou, Xiang Wei, Yang Liu, Qiang Yu

发表机构 * School of Artificial Intelligence, Tianjin University（人工智能学院，天津大学）

AI总结提出一种包含循环发放神经元、逐时间步可学习代理梯度和正负平衡损失函数的直接训练算法，以提升脉冲神经网络的信息表示能力和梯度传播精度，在多个数据集上取得竞争性性能并泛化至Transformer架构。

AI中文摘要

脉冲神经网络（SNN）因其节能特性而备受关注，但与人工神经网络（ANN）相比仍存在显著性能差距。这一差距源于至少两个关键限制：首先，传统脉冲神经元的信息表示能力有限，未能充分利用膜电位的丰富动态；其次，固定代理梯度（SG）函数在时间步上导致梯度传播不精确，阻碍了有效的直接训练。为了解决这两个挑战，我们提出了一种新的直接训练算法，包含三个核心创新：第一，一种循环发放脉冲神经元模型，通过更有效地利用膜电位来增强信息表示能力；第二，一种逐时间步可学习的代理梯度函数，能够在反向传播过程中实现精确的梯度估计；第三，一种正负平衡损失函数，以实现正负膜电位之间的平衡，进一步提升SNN性能。大量实验表明，我们的方法在多个数据集上取得了竞争性性能。我们的方法可以无缝泛化到先进的Transformer架构，始终优于现有方法。我们的工作强调了进一步利用SNN内在膜动力学以提升性能的有效性，从而为推进高性能脉冲神经架构开辟了新途径。

英文摘要

Spiking Neural Networks (SNNs) have attracted attention for their promising energy efficiency, yet a substantial performance gap persists compared to Artificial Neural Networks (ANNs). This gap stems from at least two key limitations: first, conventional spiking neurons offer limited information representation capacity, underutilizing the rich dynamics of membrane potentials; second, using fixed surrogate gradient (SG) functions across time steps leads to imprecise gradient propagation, impeding effective direct training. To address these two challenges, we propose a new direct training algorithm with three core innovations: first, a circulate-firing spiking neuron model that enhances information representation capacity by leveraging membrane potentials more effectively; second, a time-step-wise learnable surrogate gradient function, enabling accurate gradient estimation during backpropagation; third, a positive-negative balanced loss function to achieve equilibrium between positive and negative membrane potentials and further boost SNN performance. Extensive experiments demonstrate that our methods achieve competitive performance across multiple datasets. Our methods also generalize seamlessly to advanced Transformer architectures, consistently outperforming existing methods. Our work highlights the effectiveness of further harnessing the intrinsic membrane dynamics of SNNs for performance improvement, and thus opens a new avenue for advancing high-performance spiking neural architectures.

2605.27411 2026-05-28 cs.NE cs.LG 版本更新

Genetic algorithm vs. gradient descent for training a neural network architecture dedicated to low data regimes in small medical datasets

遗传算法与梯度下降在针对小医学数据集低数据量场景的神经网络架构训练中的比较

Amine Boukhari, Boglarka Ecsedi, Laszlo Papp, Mathieu Hatt

发表机构 * Center for Medical Physics and Biomedical Engineering, Medical University of Vienna（医学物理与生物医学工程中心，维也纳医科大学）； Georgia Institute of Technology（佐治亚理工学院）

AI总结针对DEBI-NN架构，比较遗传算法与梯度下降在分类任务中的性能，发现遗传算法在决策边界和分类准确率上显著优于梯度下降。

AI中文摘要

目的/引言：距离编码生物形态信息神经网络（DEBI-NN）是一种最近提出的架构，其中连接权重由欧几里得空间中神经元之间的距离定义。与直接训练权重的经典神经网络相比，这种方法大幅减少了可训练参数的数量。DEBI-NN的训练过程基于遗传算法（GA），而非深度学习中最常用的优化算法梯度下降（GD）。我们旨在为DEBI-NN设计并实现一个GD学习器，并评估其与GA相比的性能。材料与方法：我们设计了一种针对DEBI-NN的空间反向传播方案，并在分类任务中比较了GD和GA，使用了合成非线性“双月”数据集、两个临床医学影像放射组学数据集和一个胎儿心宫缩图数据集，样本量从n=85到n=2126。每个优化器通过针对每个数据集调整的超参数搜索进行调优。结果：在所有实验中，GA始终产生更优的决策边界和分类性能（合成：100% vs 83%；DLBCL：83% vs 78%；HECKTOR：80% vs 67%；胎儿：81% vs 66%），而GD表现出不稳定性，未能完全捕捉DEBI-NN空间编码固有的非线性模式。神经元相互依赖导致的纠缠梯度限制了经典反向传播的有效性。结论：这些发现凸显了基于梯度的方法在具有高度相互依赖空间参数的架构中的根本局限性，并确认了进化策略在训练DEBI-NN中的适用性。

英文摘要

Aim/Introduction: Distance-encoding biomorphic-informational neural network (DEBI-NN) is a recently proposed architecture in which connection weights are defined by the distances between neurons positioned in a Euclidean space. This approach drastically reduces the number of trainable parameters compared to classical neural networks in which weights are directly trained. The training process for DEBI-NN is based on a genetic algorithm (GA), rather than gradient descent (GD), which remains the prevailing optimization algorithm in deep learning. We aim to design and implement a GD learner for DEBI-NN and assess its performance compared to GA. Materials and Methods: We designed a spatial backpropagation scheme tailored to DEBI-NN and carried out a comparison between GD and GA for classification tasks, using a synthetic non-linear "two-moons" dataset, two clinical medical imaging radiomic datasets and a fetal cardiotocography dataset, with sample sizes ranging from n=85 to n=2126. Each optimizer was tuned through targeted hyperparameter searches adapted to each dataset. Results: Across all experiments, GA consistently produced superior decision boundaries and classification performance (Synthetic: 100% vs 83%; DLBCL: 83% vs 78%; HECKTOR: 80% vs 67%; Fetal: 81% vs 66%), whereas GD exhibited instability and failed to fully capture the non-linear patterns inherent to DEBI-NN's spatial encoding. The entangled gradients resulting from neuron interdependencies limit the effectiveness of classical backpropagation. Conclusion: These findings highlight fundamental limitations of gradient-based methods in architectures with highly interdependent spatial parameters and confirm the suitability of evolutionary strategies for training DEBI-NN.

2605.27409 2026-05-28 cs.NE cs.AI cs.LG 版本更新

STARS: Spike Tail-Aware Relational Synthesis for ANN-to-SNN Data-Free Knowledge Distillation

STARS: 面向ANN到SNN无数据知识蒸馏的尖峰尾部感知关系合成

Shuhan Ye, Yi Yu, Qixin Zhang, Hui Lu, Jiaming He, Qinggang Zhang, Li Shen, Xudong Jiang

发表机构 * Nanyang Technological University（南洋理工大学）； Jilin University（吉林大学）； Wuhan University（武汉大学）； Sun Yat-sen University（中山大学）

AI总结提出STARS方法，通过关系一致性对齐和尾部感知正则化增强BN引导的合成数据，解决SNN学生网络在无数据知识蒸馏中约束不足的问题，在多个数据集上提升性能。

AI中文摘要

SNN有望实现高能效和低延迟推理，但其性能仍落后于ANN。ANN到SNN的知识蒸馏有助于缩小这一差距，但在实际部署中原始训练数据通常不可用。现有的无数据知识蒸馏（DFKD）方法通过匹配教师侧先验（尤其是BN统计量）来合成替代数据，但这些面向ANN的约束主要正则化均值和方差，因此对于响应依赖于阈值穿越动态的SNN学生网络而言，约束不足。本文提出尖峰尾部感知关系合成（STARS），一种用于ANN到SNN DFKD的即插即用方法，通过两个互补目标增强标准BN引导合成：关系一致性对齐（保持教师和学生之间的跨样本关系一致性）和尾部感知正则化（通过软超越教师导出阈值来正则化阈值相关的尾部概率）。这些目标共同生成合成批次，这些批次在保持教师有效性的同时，对SNN学生网络更具信息性。在CIFAR-10、CIFAR-100和Tiny-ImageNet上的多个ANN-SNN对实验表明，我们的方法一致改进了传统DFKD基线，甚至超过了若干KD方法，在CIFAR-10上提升高达4.6%，在CIFAR-100上提升高达6.7%，突显了在面向SNN的DFKD中，用关系约束和尾部感知约束补充BN匹配的重要性。

英文摘要

SNNs promise energy-efficient and low-latency inference, but their performance still trails that of ANNs. ANN-to-SNN knowledge distillation helps narrow this gap, yet the original training data are often unavailable in practical deployment settings. Existing data-free knowledge distillation (DFKD) methods synthesize surrogate data by matching teacher-side priors, especially BN statistics, but these ANN-oriented constraints mainly regularize mean and variance and therefore remain under-constrained for SNN students whose responses depend on threshold-crossing dynamics. In this paper, we propose Spike Tail-Aware Relational Synthesis (STARS), a plug-and-play method for ANN-to-SNN DFKD that augments standard BN-guided synthesis with two complementary objectives: Relational Consistency Alignment, which preserves cross-sample relational consistency between teacher and student, and Tail-Aware Regularization, which regularizes threshold-relevant tail probabilities through soft exceedance over teacher-derived thresholds. Together, these objectives generate synthetic batches that remain teacher-valid while becoming more informative for SNN students. Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet across multiple ANN-SNN pairs show that our method consistently improves conventional DFKD baselines and even surpasses several KD methods, with gains of up to 4.6% on CIFAR-10 and 6.7% on CIFAR-100, highlighting the importance of complementing BN matching with relational and tail-aware constraints in SNN-oriented DFKD.

2605.27408 2026-05-28 quant-ph cs.LG cs.NA math.NA 版本更新

Neural Quantum Spectral Operator Learning for Solving Partial Differential Equations

神经量子谱算子学习求解偏微分方程

Chanyoung Kim, Myeonghwan Seong, Yujin Kim, Daniel K. Park, Youngjoon Hong

发表机构 * Department of Mathematical Sciences, KAIST（韩国科学技术院数学科学系）； Department of Statistics and Data Science, Yonsei University（延世大学统计与数据科学系）； Department of Applied Statistics, Yonsei University（延世大学应用统计学系）； Department of Quantum Information, Yonsei University（延世大学量子信息系）； Department of Mathematical Sciences, Seoul National University（首尔国立大学数学科学系）

AI总结提出首个混合量子-经典无监督算子学习框架NVQLS，利用Legendre-Galerkin弱形式解决VQLS符号歧义并引入神经嵌入编码，在1D/2D参数化PDE上实现高精度求解。

Comments 31 pages (main 9 pages), 17 figures, 8 tables

AI中文摘要

偏微分方程（PDE）是物理和工程系统建模的核心，但重复求解参数化PDE仍然计算成本高昂。算子学习能够实现快速代理推理，但通常需要由昂贵的高保真PDE求解器生成的大规模输入-输出配对数据集。无监督算子学习框架减轻了数据依赖性，但仍受计算瓶颈限制。为解决这一问题，我们提出了神经变分量子线性求解器（NVQLS），这是首个利用Legendre-Galerkin弱形式的混合量子-经典算子学习框架。我们关键性地解决了VQLS能量最小化中的符号歧义，防止了错误的解表示。此外，我们引入了神经嵌入，一种新颖的编码方案，将变化的强迫项和PDE系数映射到参数化量子电路表示中。这些结构创新在高效态制备方案下提供了理论计算复杂度优势，同时相比代表性经典基线实现了更优的精度。在1D和2D参数化PDE上，在不同边界条件下的验证表明，NVQLS能够同时处理变化输入，为量子增强算子学习提供了一种可扩展的无监督方法。

英文摘要

Partial differential equations (PDEs) are central to modeling physical and engineering systems, but repeatedly solving parametric PDEs remains computationally expensive. Operator learning enables fast surrogate inference, yet typically requires large input-output paired datasets generated by costly high-fidelity PDE solvers. Unsupervised operator learning frameworks alleviate data dependency but remain hindered by computational bottlenecks. To address this, we propose Neural Variational Quantum Linear Solver (NVQLS), the first hybrid quantum-classical operator learning framework leveraging the Legendre-Galerkin weak formulation. We critically resolve the sign ambiguity in VQLS energy minimization, preventing erroneous solution representations. Additionally, we introduce a neural embedding, a novel encoding scheme to map varying forcings and PDE coefficients into parameterized quantum circuit representations. These structural innovations provide theoretical computational complexity advantages under efficient state preparation schemes, while achieving superior accuracy compared to a representative classical baseline. Validations on 1D and 2D parametric PDEs under diverse boundary conditions demonstrate NVQLS's capability to simultaneously process varying inputs, offering a scalable unsupervised approach to quantum-enhanced operator learning.

2605.27407 2026-05-28 cs.NE cs.AI cs.LG 版本更新

Benchmarking Fairness in Spiking Neural Networks: Data Bias, Spurious Features, and Hardware Effects

脉冲神经网络中的公平性基准测试：数据偏差、虚假特征和硬件效应

Hudi He, Fukun Wang, Zhe Wang, Xinyi Wang, Shuhan Ye, Jiarui Liu, Qing Qing, Ziqi Xu, Xikun Zhang, Renqiang Luo

发表机构 * Jilin University（吉林大学）； Nanyang Technological University（南洋理工大学）； RMIT University（皇家墨尔本理工大学）

AI总结本文首次提出脉冲神经网络公平性基准，通过引入人口统计覆盖缺口、虚假特征泄漏和部署环境不匹配三个现实维度，系统评估了12种先进SNN在资源约束下的公平性-性能权衡。

AI中文摘要

评估脉冲神经网络（SNN）的公平性需要反映现实世界复杂性的严格基准，然而现有评估仍受限于肤浅的数据集多样性和理想化的硬件假设。本文首次引入SNN的系统性公平性基准，解决三个关键的现实维度：（1）训练数据中的人口统计覆盖缺口，（2）虚假特征泄漏（例如，肤色作为类别标签的代理），以及（3）部署环境不匹配（例如，具有受限脉冲编码的边缘设备）。我们的框架整合了四个跨人口统计数据集（带有受控偏差注入）和三个神经形态硬件模拟器（Loihi 2、SpiNNaker），从而能够在资源约束下隔离分析公平性-性能权衡。对12种最先进SNN的标准化评估揭示了显著差异：在偏差数据上训练的模型对代表性不足群体的假阳性率高出23%，而硬件限制（例如，降低的脉冲精度）在边缘部署中进一步将准确率差距放大至41%。关键的是，为云端SNN开发的偏差缓解策略在资源约束下通常会退化，这凸显了需要联合优化公平性和硬件效率的协同设计原则。通过连接算法公平性研究与神经形态工程，我们的基准为医疗和自主系统等社会关键应用中的可信SNN奠定了基础。我们的代码可在以下网址获取：https://anonymous.4open.science/r/SNN-Benchmarks-8017。

英文摘要

Evaluating fairness in Spiking Neural Networks (SNNs) demands rigorous benchmarks that reflect real-world complexities, yet existing assessments remain limited by superficial dataset diversity and idealized hardware assumptions. This work introduces the first systematic fairness benchmark for SNNs, addressing three critical dimensions of realism: (1) demographic coverage gaps in training data, (2) spurious feature leakage (e.g., skin tone as a proxy for class labels), and (3) deployment-environment mismatches (e.g., edge devices with constrained spike encoding). Our framework integrates four cross-demographic datasets with controlled bias injections and three neuromorphic hardware simulators (Loihi 2, SpiNNaker), enabling isolated analysis of fairness-performance trade-offs under resource constraints. Standardized evaluations of 12 state-of-the-art SNNs reveal stark disparities: models trained on biased data exhibit 23% higher false positive rates for underrepresented groups, while hardware limitations (e.g., reduced spike precision) further amplify accuracy gaps by up to 41% in edge deployments. Critically, bias mitigation strategies developed for cloud-based SNNs often degrade under resource constraints, highlighting the need for co-design principles that jointly optimize fairness and hardware efficiency. By bridging algorithmic fairness research with neuromorphic engineering, our benchmark provides a foundation for trustworthy SNNs in socially critical applications such as healthcare and autonomous systems. Our code is available at: https://anonymous.4open.science/r/SNN-Benchmarks-8017.

2605.27406 2026-05-28 cs.LG 版本更新

A Simple State Space Model Excels at Multivariate Time Series Classification

一个简单的状态空间模型在多变量时间序列分类中表现出色

Hassan Saadatmand, Geoffrey I. Webb, Hamid Rezatofighi, Mahsa Salehi

发表机构 * Monash University（莫纳什大学）

AI总结本文系统研究对角状态空间模型（S4D）和输入相关状态空间模型（Mamba系列）在大规模时间序列分类任务中的表现，发现S4D在准确性和效率上均优于Mamba变体，并提出了轻量级改进MS4和MS4N，在多个基准上达到或超越参数量大2-10倍的深度学习模型。

AI中文摘要

结构化状态空间模型（SSM）最近作为序列建模的有前景基础出现，基于Mamba的架构通过输入相关的状态转换展示了强大的性能，尽管复杂度相当高。然而，它们在时间序列分类（TSC）中的应用主要局限于Mamba风格的架构，更广泛的SSM设计空间尚未充分探索。我们首次在大规模TSC基准上进行了涵盖对角SSM（S4D）和输入相关SSM（Mamba系列）的系统研究，探究这种复杂性是否对顶级性能是必要的。我们的结果揭示了一个令人惊讶的发现：S4D在准确性和效率上始终优于基于Mamba的变体，挑战了增加复杂性会在TSC中带来有意义收益的假设。基于此，我们引入了MS4，通过线性输入投影和通道混合机制对S4D进行轻量级修改，以及MS4N，一种归一化变体，以可忽略的开销稳定状态动态。在MONSTER（多达6000万样本、5万时间步、82个类别）和UEA基准上的59个数据集上，与15个基线相比，MS4和MS4N始终优于基于Mamba的模型，同时保持更高的效率，并且MS4N匹配或超越了参数量大约2倍和10倍的竞争性深度学习模型。这些结果将轻量级结构化SSM定位为在TSC中扩展复杂性的有吸引力替代方案。

英文摘要

Structured state space models (SSMs) have recently emerged as a promising foundation for sequence modeling, with Mamba-based architectures demonstrating strong performance through input-dependent state transitions, albeit at considerable complexity. However, their application to time-series classification (TSC) has been largely limited to Mamba-style architectures, leaving the broader SSM design space underexplored. We present the first systematic study spanning diagonal SSMs (S4D) and input-dependent SSMs (Mamba family) on large-scale TSC benchmarks, asking whether such complexity is necessary for top performance. Our results reveal a surprising finding: S4D consistently outperforms Mamba-based variants in both accuracy and efficiency, challenging the assumption that increased complexity translates to meaningful gains in TSC. Building on this, we introduce MS4, lightweight modifications to S4D via a linear input projection and channel-mixing mechanism, and MS4N, a normalized variant that stabilizes state dynamics with negligible overhead. Evaluated on 59 datasets across MONSTER (up to 60 million samples, 50K timesteps, 82 classes) and the UEA benchmark, against 15 baselines, MS4 and MS4N consistently outperform Mamba-based models while remaining more efficient, and MS4N matches or surpasses competing deep learning models that are roughly 2x and 10x larger in parameters. These results position lightweight structured SSMs as a compelling alternative to scaling complexity for TSC.
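
To make the "diagonal SSM" idea concrete, here is a minimal sketch of a diagonal state space recurrence of the kind S4D builds on. It is not the paper's MS4 or MS4N model and omits the usual learned parameterisation, conjugate-pair handling, and convolutional evaluation. The function name `diagonal_ssm`, the single-input single-output setup, and the zero-order-hold discretisation are assumptions made for illustration.

```python
import cmath


def diagonal_ssm(u, lam, b, c, dt):
    """Run a single-input, single-output diagonal SSM over the sequence u.

    lam holds the (nonzero, complex) diagonal entries of the state matrix,
    b and c are the input and output vectors, and dt is the step size.
    """
    # Zero-order-hold discretisation, applied independently to each mode.
    a_bar = [cmath.exp(dt * l) for l in lam]
    b_bar = [(ab - 1.0) / l * bi for ab, l, bi in zip(a_bar, lam, b)]
    state = [0j] * len(lam)
    outputs = []
    for u_k in u:
        # Elementwise update: a diagonal state matrix needs no dense product.
        state = [ab * s + bb * u_k for ab, s, bb in zip(a_bar, state, b_bar)]
        outputs.append(sum(ci * s for ci, s in zip(c, state)).real)
    return outputs


# One decaying mode driven by a constant input of 1.0.
y = diagonal_ssm([1.0] * 500, lam=[-1.0 + 0j], b=[1.0], c=[1.0], dt=0.1)
```

Because the state matrix is diagonal, each step costs time linear in the state size. With the single mode above, the output rises from `1 - exp(-0.1)` toward the steady-state value `-b*c/lam = 1`.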

2605.27397 2026-05-28 cs.LG 版本更新

IGADA-IoT: IoT Sensor Energy Optimization in Wireless Sensor Networks Driven by Automatic Data Augmentation

IGADA-IoT：自动数据增强驱动的无线传感器网络中物联网传感器能量优化

Mingchun Sun, Rongqiang Zhao, Muhammad Abdul Munnaf, Jie Liu

发表机构 * Faculty of Computing, Harbin Institute of Technology（哈尔滨工业大学计算机学院）

AI总结提出一种信息间隙引导的自动数据增强框架IGADA-IoT，通过分层多生成器协作与调度，联合利用不同生成器能力减小信息间隙，并引入信息间隙-模型性能联合评估与闭环方法，提升增强决策准确性，实验表明平均准确率提升7.27%。

AI中文摘要

在无线传感器网络（WSN）中，数据增强是一种提高采样频率决策性能的新方法，从而实现对物联网（IoT）传感器的能量优化。然而，现有方法依赖单一生成器和经验确定的量，未能建立动态信息间隙与多个生成器之间的映射，并且忽略了生成样本的异质性。此外，缺乏一种联合考虑信息间隙和模型性能的评估与闭环方法。为了解决这些问题，我们提出了一种信息间隙引导的物联网传感器自动数据增强框架（IGADA-IoT），具有分层多生成器协作和多轮调度。联合利用不同生成器的能力来减小信息间隙。在IGADA-IoT中，提出了一种分层多生成器协作与调度策略（HMGCS），以增强生成样本分配的针对性和合理性。提出了一种信息间隙-模型性能联合评估与闭环方法（IGMP-EC），以增强增强决策的准确性，并减轻欠增强和过增强的风险。实验结果表明，IGADA-IoT将多个下游模型的平均准确率提高了7.27%。与先进的数据增强方法相比，平均准确率提高了8.67%。与单个生成器相比，平均准确率提高了7.24%。此外，来自UCR Archive和实际部署的公共物联网传感器数据集证明了所提方法的准确性和泛化能力。

英文摘要

In wireless sensor networks (WSNs), data augmentation is a novel method to improve sampling-frequency decision performance, thereby enabling energy optimization for Internet of Things (IoT) sensors. However, existing methods rely on a single generator and empirically determined quantities, fail to establish a mapping between dynamic information gaps and multiple generators, and overlook the heterogeneity of generated samples. Moreover, an evaluation and closed-loop method that jointly considers the information gap and model performance is lacking. To address these issues, we propose an information gap-guided IoT sensor automatic data augmentation framework (IGADA-IoT) with hierarchical multi-generator collaboration and scheduling over multiple rounds. The capabilities of different generators are jointly utilized to reduce the information gaps. In IGADA-IoT, a hierarchical multi-generator collaboration and scheduling strategy (HMGCS) is proposed to make the allocation of generated samples more targeted and better justified. An information gap-model performance joint evaluation and closed-loop method (IGMP-EC) is proposed to improve the accuracy of augmentation decisions and to mitigate the risks of under-augmentation and over-augmentation. Experimental results show that IGADA-IoT improves the average accuracy of multiple downstream models by 7.27%. Compared with advanced data augmentation methods, the average accuracy is improved by 8.67%. Compared with the individual generators, the average accuracy is improved by 7.24%. Furthermore, experiments on public IoT sensor datasets from the UCR Archive and real-world deployments demonstrate the accuracy and generalizability of the proposed method.

2605.27385 2026-05-28 cs.LG cs.AI 版本更新

Personalized Observation Normalization for Federated Reinforcement Learning in Simulation Environments with Heterogeneity

异构仿真环境中联邦强化学习的个性化观测归一化

Yiran Pang, Zhen Ni, Xiangnan Zhong

发表机构 * Department of Electrical Engineering & Computer Science, Florida Atlantic University, Boca Raton, FL, USA

AI总结针对联邦强化学习在异构环境中状态转移动力学差异导致输入分布不一致和参数更新不平衡的问题，提出个性化观测归一化方法，通过各智能体本地维护运行均值和方差对原始状态输入进行归一化，加速训练并提升性能。

Comments Accepted at the International Joint Conference on Neural Networks (IJCNN) 2025

DOI: 10.1109/IJCNN64981.2025.11229364

AI中文摘要

联邦强化学习（FedRL）使多个智能体能够在不共享原始数据的情况下协同训练全局策略，因此非常适合隐私敏感的应用。然而，FedRL在异构环境中面临挑战，其中不同的状态转移动力学导致聚合过程中输入分布不一致和参数更新不平衡。因此，本文开发了一种个性化观测归一化（PON）方法，允许每个智能体使用持续更新的运行均值和方差对原始状态输入进行局部归一化。这种设计确保了局部特征的一致缩放，而不会在聚合过程中掩盖其他智能体的特征。此外，我们证明了由于不同的局部输入分布，跨智能体共享归一化参数是无效的，这突显了个性化统计的必要性。在异构MuJoCo任务上的实验表明，我们开发的PON加速了训练，并且与基线方法相比取得了更优的性能。

英文摘要

Federated reinforcement learning (FedRL) enables multiple agents to collaboratively train a global policy without sharing raw data, making it ideal for privacy-sensitive applications. However, FedRL faces challenges in heterogeneous environments where differing state-transition dynamics lead to non-identical input distributions and imbalanced parameter updates during aggregation. Therefore, this paper develops a personalized observation normalization (PON) method, allowing each agent to locally normalize raw state inputs using a continuously updated running mean and variance. This design ensures consistent scaling of local features, so that no agent's features overshadow those of other agents during aggregation. Furthermore, we demonstrate that sharing normalization parameters across agents is ineffective due to the diverse local input distributions, which highlights the necessity of personalized statistics. Experiments on heterogeneous MuJoCo tasks show that our developed PON accelerates training and achieves superior performance compared to baseline methods.
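
The per-agent normalization described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class name `RunningNormalizer`, the use of Welford's online algorithm, the `eps` constant, and the toy observations are all assumptions made for illustration. The only detail taken from the abstract is that each agent keeps its own continuously updated running mean and variance.

```python
import math


class RunningNormalizer:
    """Per-agent running mean/variance for observations (Welford's algorithm)."""

    def __init__(self, dim, eps=1e-8):
        self.count = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations from the mean
        self.eps = eps         # hypothetical constant to avoid division by zero

    def update(self, obs):
        """Fold one raw observation into the running statistics."""
        self.count += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def variance(self):
        """Population variance per dimension (1.0 until two observations are seen)."""
        if self.count < 2:
            return [1.0] * len(self.mean)
        return [m2 / self.count for m2 in self.m2]

    def normalize(self, obs):
        """Shift and scale a raw observation with this agent's own statistics."""
        var = self.variance()
        return [(x - m) / math.sqrt(v + self.eps)
                for x, m, v in zip(obs, self.mean, var)]


# Two agents whose environments produce differently scaled observations.
# Each keeps its own statistics; nothing here is shared between agents.
agent_a, agent_b = RunningNormalizer(dim=1), RunningNormalizer(dim=1)
for x in [1.0, 2.0, 3.0]:
    agent_a.update([x])
for x in [100.0, 200.0, 300.0]:
    agent_b.update([x])
```

In this toy case the two agents see the same pattern at scales that differ by a factor of 100, and after local normalization their inputs land on a comparable scale, which is the effect the abstract attributes to personalized statistics.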

2605.27365 2026-05-28 cs.CV cs.AI cs.LG cs.RO 版本更新

LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

LocateAnything: 基于并行框解码的快速高质量视觉定位

Shihao Wang, Shilong Liu, Yuanguo Kuang, Xinyu Wei, Yangzhou Liu, Zhiqi Li, Yunze Man, Guo Chen, Andrew Tao, Guilin Liu, Jan Kautz, Lei Zhang, Zhiding Yu

发表机构 * The Hong Kong Polytechnic University（香港理工大学）； Princeton University（普林斯顿大学）； Nanjing University（南京大学）； University of Illinois Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校）

AI总结提出并行框解码（PBD）方法，将边界框和点作为原子单元单步解码，结合大规模数据集LocateAnything-Data，实现高效统一的目标定位与检测，在保持高精度同时显著提升解码吞吐量。

Comments fix github link

AI中文摘要

视觉语言模型（VLM）通常将视觉定位和检测表述为坐标令牌生成问题，将每个2D框序列化为多个1D令牌，这些令牌在很大程度上独立学习和解码。这种逐令牌解码与框几何的耦合结构不匹配，并且由于严格的顺序生成而造成了实际的推理瓶颈。我们引入了LocateAnything，一个基于并行框解码（PBD）的统一生成式定位和检测框架。通过将边界框和点等几何元素作为原子单元单步解码，LocateAnything保持了框内几何一致性并实现了显著的并行性。我们证明PBD提高了解码吞吐量和定位精度。我们进一步开发了一个可扩展的数据引擎，并策划了LocateAnything-Data，这是一个包含超过1.38亿个训练样本的大规模数据集，大大增加了高精度定位的数据多样性。大量评估表明，LocateAnything推进了速度-精度前沿，在多个基准测试中实现了显著更高的解码吞吐量，同时提高了高IoU定位质量。结果突显了并行框解码和大规模训练数据在实现高效精确的统一视觉定位和检测中的互补优势。

英文摘要

Vision-language models (VLMs) commonly formulate visual grounding and detection as a coordinate-token generation problem, serializing each 2D box into multiple 1D tokens that are learned and decoded largely independently. This token-by-token decoding mismatches the coupled structure of box geometry and creates a practical inference bottleneck due to strictly sequential generation. We introduce LocateAnything, a unified generative grounding and detection framework based on Parallel Box Decoding (PBD). By decoding geometric elements such as bounding boxes and points as atomic units in a single step, LocateAnything preserves intra-box geometric coherence and unlocks substantial parallelism. We show that PBD improves both decoding throughput and localization accuracy. We further develop a scalable data engine and curate LocateAnything-Data, a large-scale dataset with more than 138 million training samples, substantially increasing data diversity for high-precision localization. Extensive evaluations show that LocateAnything advances the speed-accuracy frontier, achieving significantly higher decoding throughput while improving high-IoU localization quality across diverse benchmarks. The results highlight the complementary benefits of Parallel Box Decoding and large-scale training data in enabling efficient and precise unified visual grounding and detection.

2605.26790 2026-05-28 cs.LG physics.space-ph 版本更新

Pretrained Approximators for Low-Thrust Trajectory Cost and Reachability

低推力轨迹成本与可达性的预训练近似器

Zhong Zhang, Giacomo Acciarini, Dario Izzo, Hexi Baoyin, Francesco Topputo

发表机构 * Politecnico di Milano（米兰理工大学）； European Space Agency（欧洲航天局）； Tsinghua University（清华大学）

AI总结提出使用机器学习代理模型精确近似低推力轨迹的燃料消耗和转移可行性，通过同伦射线策略和自相似变换实现跨任务泛化，并开源模型与数据集。

Comments Submitted to the Journal of Guidance, Navigation and Control. Zenodo entry: https://doi.org/10.5281/zenodo.18769170

AI中文摘要

低推力轨迹设计严重依赖于对燃料消耗和转移可行性的重复评估，这需要昂贵的优化控制解。在这项工作中，我们表明这些量可以通过机器学习代理模型准确近似，从而在广泛场景中实现快速且可扩展的评估。通过增加数据集大小和模型容量，我们观察到低推力轨迹优化遵循缩放定律，性能随训练数据和网络参数的对数线性提升，且在探索范围内没有饱和迹象。基于这一观察，我们使用针对任务设计需求提出的同伦射线策略构建了一个大规模数据集。关键是引入自相似变换，允许在半长轴、倾角和中心天体之间泛化，避免重新训练。因此，相同的神经近似器可应用于不同的轨道环境和任务类别。所提出的模型准确预测了单圈和多圈转移的最优燃料消耗和最小转移时间。其性能和泛化能力在公开数据集、全球轨迹优化竞赛的多小行星飞越问题以及小行星交会任务设计中得到验证。模型和数据集作为开源发布，以支持航天社区。

英文摘要

Low-thrust trajectory design relies heavily on repeated evaluations of fuel consumption and transfer feasibility, which require expensive optimal control solutions. In this work, we show these quantities can be accurately approximated by machine learning surrogates, enabling fast and scalable evaluation across a wide range of scenarios. By increasing both dataset size and model capacity, we observe that low-thrust trajectory optimization follows a scaling law, with performance improving linearly with the logarithm of training data and network parameters, and no evidence of saturation within the explored regime. Guided by this observation, we construct a large-scale dataset using the proposed homotopy-ray strategy tailored to mission design requirements. A key ingredient is the introduction of a self-similar transformation, which allows generalization across semi-major axes, inclinations, and central bodies without retraining. As a result, the same neural approximator can be applied to diverse orbital environments and mission classes. The proposed models accurately predict optimal fuel consumption and minimum transfer time for single- and multi-revolution transfers. Their performance and generalization are demonstrated on a public dataset, a multi-asteroid flyby problem from the Global Trajectory Optimization Competition, and an asteroid rendezvous mission design. The models and datasets are released as open source to support the space community.

2605.26552 2026-05-28 cs.LG cs.AI 版本更新

Aligning Few-Step Generative Models by Amortizing Sample-based Variational Inference

通过摊销基于样本的变分推断来对齐少步生成模型

Jaewoo Lee, Hyeongyu Kang, Dohyun Kim, Kyuil Sim, Woocheol Shin, Minsu Kim, Taeyoung Yun, Jeongjae Lee, Sanghyeok Choi, Tabitha Edith Lee, Jong Chul Ye, Jinkyoo Park

发表机构 * KAIST（韩国科学技术院）； MongooseAI ； Mila – Quebec AI Institute（魁北克AI研究院）； University of Edinburgh（爱丁堡大学）； Université de Montréal（蒙特利尔大学）； Omelet

AI总结提出FAV框架，利用Stein变分梯度下降进行基于样本的变分推断，并通过固定点回归将粒子更新摊销到生成器参数中，实现对少步生成模型的对齐，在机器人操作和图像生成任务中优于现有方法。

Comments Under review

AI中文摘要

对齐少步生成模型具有挑战性，因为现有的对齐框架通常依赖于限制性假设：可处理的似然、特定的ODE/SDE求解器或特定的模型族。我们引入了FAV（Few-step Generative Models Alignment via Sample-based Variational Inference），这是一个通用的对齐框架，仅需要对生成器和参考分布的样本访问。我们将对齐视为从锚定于参考分布的奖励倾斜分布中采样。我们利用Stein变分梯度下降作为基于样本的变分推断方案，并通过固定点回归将粒子更新摊销到生成器参数中。我们在两个领域评估了FAV：机器人操作和图像生成器对齐。在机器人操作的生成策略对齐中，FAV在56个离线RL任务和30个离线到在线RL任务中优于现有的策略提取基线。对于图像生成器对齐，FAV微调了多种少步骨干模型，包括GAN、漂移模型、一致性模型和流映射，从ImageNet-256扩展到$1024^2$文本到图像合成。代码可在https://github.com/Jaewoopudding/FAV获取。

英文摘要

Aligning a few-step generative model is challenging, since existing alignment frameworks typically rely on restrictive assumptions: a tractable likelihood, a specific ODE/SDE solver, or a particular model family. We introduce FAV, Few-step Generative Models Alignment via Sample-based Variational Inference, a general alignment framework that requires only sample access to the generator and the reference distribution. We cast alignment as sampling from a reward-tilted distribution anchored to a reference distribution. We leverage Stein Variational Gradient Descent as a sample-based variational inference scheme and amortize its particle updates into the generator parameters via fixed-point regression. We evaluate FAV on two domains: robotic manipulation and image generator alignment. On generative policy alignment for robotic manipulation, FAV outperforms prevailing policy extraction baselines across 56 offline and 30 offline-to-online RL tasks. For image generator alignment, FAV fine-tunes diverse few-step backbones, including GANs, drifting models, consistency models, and flow maps, scaling from ImageNet-256 to $1024^2$ text-to-image synthesis. Code is available at https://github.com/Jaewoopudding/FAV.
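
For readers unfamiliar with the particle scheme named in the abstract, the following is a minimal one-dimensional sketch of the standard Stein Variational Gradient Descent update with an RBF kernel. It does not reproduce FAV itself: the amortization into generator parameters via fixed-point regression is not shown, and the Gaussian target, fixed bandwidth `h`, step size, and initial particles are all assumptions chosen for illustration.

```python
import math


def rbf(x, y, h):
    """RBF kernel k(x, y) = exp(-(x - y)^2 / (2 h^2))."""
    return math.exp(-((x - y) ** 2) / (2.0 * h * h))


def svgd_step(particles, grad_log_p, step=0.1, h=1.0):
    """One SVGD update applied to every particle."""
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = rbf(xj, xi, h)
            attract = k * grad_log_p(xj)       # pulls xi toward high-density regions
            repulse = k * (xi - xj) / (h * h)  # d k(xj, xi) / d xj; keeps particles apart
            phi += attract + repulse
        updated.append(xi + step * phi / n)
    return updated


# Hypothetical target: unit-variance Gaussian centred at 3, so grad log p(x) = -(x - 3).
target_mean = 3.0
particles = [-2.0, -1.0, 0.0, 1.0, 2.0]
for _ in range(500):
    particles = svgd_step(particles, lambda x: -(x - target_mean))
```

The attraction term moves the particle cloud toward the target, while the repulsion term prevents the particles from collapsing onto a single mode. Only samples and a score function are needed, which fits the sample-access setting the abstract describes.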

2605.26189 2026-05-28 cs.LG cs.AI 版本更新

打破块限制：通过单调熵下降与强化学习为扩散大语言模型实现动态大小推理块

Yan Jiang, Ruihong Qiu, Zi Huang

发表机构 * School of Electrical Engineering and Computer Science, The University of Queensland, Brisbane, Queensland, Australia（电气工程与计算机科学学院，昆士兰大学，布里斯班，昆士兰，澳大利亚）

AI总结针对扩散大语言模型中固定大小推理块导致的逻辑连贯性差和效率低问题，提出基于单调熵下降目标与强化学习的后训练框架b1，学习动态大小推理块以提升推理连贯性。

AI中文摘要

最近的扩散大语言模型（dLLMs）通过基于块的半自回归生成范式展示了推理的有效性和效率。尽管取得了进展，固定大小的块生成仍然是有效且连贯推理的关键瓶颈。1. 从全局角度看，不同的推理任务对应不同的最优解码块大小，这使得“一刀切”的假设无效。2. 即使在单个推理任务中，刚性的块划分也会破坏逻辑流并降低推理连贯性。通过经验观察，我们发现对于块级熵，错误推理在块之间表现出波动和不稳定的趋势，而正确生成的任务则遵循一致的下降趋势。因此，本文提出了b1，一种新颖的dLLMs后训练框架，通过强化学习结合单调熵下降目标学习动态大小推理块，以增强推理连贯性。b1作为即插即用模块无缝集成到现有dLLM的后训练算法中。在各种推理基准上的大量实验表明，b1相比现有固定大小块基线具有一致的改进。我们的代码已发布在https://github.com/YanJiangJerry/Block-R1。

英文摘要

Recent diffusion large language models (dLLMs) have demonstrated both effectiveness and efficiency in reasoning via a block-based semi-autoregressive generation paradigm. Despite this progress, fixed-size block generation remains a critical bottleneck for effective and coherent reasoning. 1. From a global perspective, different reasoning tasks correspond to different optimal decoding block sizes, which makes a ``one-size-fits-all'' assumption ineffective. 2. Even within a single reasoning task, rigid block partitioning breaks the logical flow and reduces reasoning coherence. Through empirical observations, we reveal that for block-wise entropy, incorrect reasoning exhibits a fluctuating and unsteady trend between blocks, whereas the correctly generated tasks follow a consistent descending trend. Therefore, this paper proposes b1, a novel post-training framework for dLLMs that learns dynamic-size reasoning blocks via a Monotonic Entropy Descent objective with reinforcement learning to enhance reasoning coherence. b1 integrates seamlessly as a plug-and-play module with existing dLLM post-training algorithms. Extensive experiments across various reasoning benchmarks showcase b1's consistent improvement over existing fixed-size block baselines. Our code has been released at https://github.com/YanJiangJerry/Block-R1.
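
The block-wise entropy observation can be made concrete with a small sketch. The abstract does not say how b1 computes block entropy or turns it into a training objective, so everything below is an assumption: entropy is taken as the mean Shannon entropy of the per-token predictive distributions in a block, and the helper names and toy distributions are hypothetical.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one token's predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def block_entropies(token_dists, block_sizes):
    """Mean token entropy for each consecutive block of the given sizes."""
    if any(s <= 0 for s in block_sizes) or sum(block_sizes) != len(token_dists):
        raise ValueError("block sizes must be positive and cover every token once")
    out, start = [], 0
    for size in block_sizes:
        block = token_dists[start:start + size]
        out.append(sum(token_entropy(d) for d in block) / size)
        start += size
    return out


def is_monotone_descent(values, tol=0.0):
    """True if each block's entropy is no higher than the previous one (within tol)."""
    return all(b <= a + tol for a, b in zip(values, values[1:]))


# Toy distributions over a two-token vocabulary: uncertainty shrinks block by block.
dists = [[0.5, 0.5], [0.6, 0.4], [0.8, 0.2], [0.9, 0.1], [0.99, 0.01]]
entropies = block_entropies(dists, block_sizes=[2, 2, 1])
descending = is_monotone_descent(entropies)
```

Because `block_sizes` is an argument rather than a constant, the same check applies to dynamic-size blocks. A reinforcement-learning reward could, for instance, favor block partitions for which `descending` holds, though whether b1 does this is not stated in the abstract.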

2605.01046 2026-05-28 cs.LG 版本更新

Learning in the Fisher Subspace: A Guided Initialization for LoRA Fine-Tuning

在 Fisher 子空间中学习：LoRA 微调的引导初始化

Zhi-Quan Feng, Ying-Jia Lin, Hung-Yu Kao

发表机构 * Department of Computer Science and Information Engineering, National Cheng Kung University（计算机科学与信息工程系，国立成功大学）； Department of Artificial Intelligence, Chang Gung University（人工智能系，长庚大学）； Department of Computer Science, National Tsing Hua University（计算机科学系，国立清华大学）

AI总结本文提出一种基于 Fisher 信息的引导初始化方法，通过利用下游数据曲率信息选择 LoRA 适应子空间，以提升微调性能。

AI中文摘要

LoRA 通过将更新限制在预训练权重的低秩子空间中来适应大型语言模型（LLMs）。虽然这大幅降低了训练成本，但适应的有效性关键取决于初始化时选择哪个子空间：一个将容量分配给任务无关方向的糟糕初始化会严重阻碍下游性能。现有的初始化策略主要依赖预训练权重的内在属性，隐含地假设仅权重几何就能反映任务相关性。然而，这种标准忽略了模型如何与下游数据分布交互。在这项工作中，我们将 LoRA 初始化表述为在目标数据分布下识别参数空间中方向的影响程度。我们认为，数据感知的敏感性（而非仅权重大小）应指导适应子空间的选择。基于这一观点，我们提出了一个 Fisher 引导的框架，利用下游数据诱导的曲率信息来表征参数扰动如何影响模型预测。这一视角为选择 LoRA 方向提供了一个原则性的、任务相关的标准，使适应更好地与目标对齐。跨不同任务和模态的实验结果表明，数据感知的初始化一致且显著地优于现有方法的下游性能。

EvoMAS：多智能体系统的进化生成

Yuntong Hu, Yuting Zhang, Matthew Trager, Yi Zhang, Shuo Yang, Wei Xia, Stefano Soatto

发表机构 * Department of Computer Science, Emory University, Atlanta, GA, USA（埃默里大学计算机科学系）

AI总结提出EvoMAS方法，将多智能体系统生成转化为结构化配置生成，通过进化算法在配置空间中优化，提升任务性能、可执行性和鲁棒性。

Comments ICML2026

Journal ref: ICML2026

AI中文摘要

基于大语言模型的多智能体系统在复杂推理、规划和工具增强任务中展现出巨大潜力，但设计有效的MAS架构仍然劳动密集、脆弱且难以泛化。现有的自动MAS生成方法要么依赖代码生成，常导致可执行性和鲁棒性失败，要么施加僵化的架构模板，限制了表达性和适应性。我们提出多智能体系统的进化生成（EvoMAS），将MAS生成形式化为结构化配置生成。EvoMAS在配置空间中进行进化生成。具体来说，EvoMAS从池中选择初始配置，应用基于执行轨迹引导的反馈条件变异和交叉，并迭代优化候选池和经验记忆。我们在多个基准测试上评估EvoMAS，包括BBEH、SWE-Bench和WorkBench，涵盖推理、软件工程和工具使用任务。EvoMAS在任务性能上持续优于人工设计的MAS和先前的自动MAS生成方法，同时生成的系统具有更高的可执行性和运行时鲁棒性。EvoMAS在BBEH推理上比智能体进化方法EvoAgent高出10.5个百分点，在WorkBench上高出7.1个百分点。使用Claude-4.5-Sonnet，EvoMAS在SWE-Bench-Verified上达到79.1%，与排行榜顶部持平。代码可在https://github.com/amazon-science/EvoMAS获取。

英文摘要

Large language model (LLM)-based multi-agent systems (MAS) show strong promise for complex reasoning, planning, and tool-augmented tasks, but designing effective MAS architectures remains labor-intensive, brittle, and hard to generalize. Existing automatic MAS generation methods either rely on code generation, which often leads to executability and robustness failures, or impose rigid architectural templates that limit expressiveness and adaptability. We propose Evolutionary Generation of Multi-Agent Systems (EvoMAS), which formulates MAS generation as structured configuration generation. EvoMAS performs evolutionary generation in configuration space. Specifically, EvoMAS selects initial configurations from a pool, applies feedback-conditioned mutation and crossover guided by execution traces, and iteratively refines both the candidate pool and an experience memory. We evaluate EvoMAS on diverse benchmarks, including BBEH, SWE-Bench, and WorkBench, covering reasoning, software engineering, and tool-use tasks. EvoMAS consistently improves task performance over both human-designed MAS and prior automatic MAS generation methods, while producing generated systems with higher executability and runtime robustness. EvoMAS outperforms the agent evolution method EvoAgent by +10.5 points on BBEH reasoning and +7.1 points on WorkBench. With Claude-4.5-Sonnet, EvoMAS also reaches 79.1% on SWE-Bench-Verified, matching the top of the leaderboard. Code is available at https://github.com/amazon-science/EvoMAS

2602.06025 2026-05-28 cs.CL cs.AI cs.LG 版本更新

Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory

学习面向运行时智能体记忆的查询感知预算层级路由

Haozhen Zhang, Haodong Yue, Tao Feng, Quanyu Long, Jianzhu Bao, Bowen Jin, Weizhi Zhang, Xiao Li, Jiaxuan You, Chengwei Qin, Wenya Wang

发表机构 * Nanyang Technological University（南洋理工大学）； University of Illinois Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校）； University of Illinois Chicago（伊利诺伊大学芝加哥分校）； Tsinghua University（清华大学）； Sun Yat-sen University（中山大学）； The Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））

AI总结提出 BudgetMem 框架，通过强化学习训练的轻量级路由器实现查询感知的预算层级路由，以在运行时平衡任务性能与记忆构建成本。

Comments Accepted by ICML 2026. Code is available at https://github.com/ViktorAxelsen/BudgetMem

GFMate：通过测试时提示调优赋能图基础模型

Yan Jiang, Ruihong Qiu, Zi Huang

发表机构 * School of Electrical Engineering and Computer Science, The University of Queensland, Brisbane, Queensland, Australia（电气工程与计算机科学学院，昆士兰大学，布里斯班，昆士兰，澳大利亚）

AI总结提出预训练无关的测试时图提示调优方法GFMate，通过质心提示和层提示避免与源领域和预训练策略的纠缠，并设计测试时互补学习目标利用有标签和无标签目标域数据，在12个基准数据集上实现高达30.63%的性能提升。

AI中文摘要

图提示调优通过在传统单域场景中引入可训练提示来增强模型性能，在图学习中展现出巨大潜力。最近的研究通过少样本调优辅助提示，将图提示扩展至改进图基础模型（GFM）。尽管取得了进展，现有方法大多将源域信息嵌入提示中，这些提示要么作为GFM的输入，要么在模型预训练期间编码。这种提示与特定源域和GFM预训练策略的纠缠限制了其向其他域和不同GFM的泛化能力。此外，现有的GFM提示仅依赖少样本调优进行适应，忽略了未标记目标域测试数据中的丰富信息。受这些洞察启发，本文旨在通过预训练无关的测试时图提示调优赋能GFM，命名为GFMate。GFMate引入在目标域上预训练后应用的质心提示和层提示，避免与特定源域和模型预训练的纠缠。此外，设计了一个测试时互补学习目标，以利用有标签和未标记的目标域数据进行有效的测试时提示调优。在12个基准数据集上的大量实验证明了GFMate的优越性能和效率，实现了高达30.63%的提升。代码可在https://github.com/YanJiangJerry/GFMate获取。

英文摘要

Graph prompt tuning has shown great potential in graph learning by introducing trainable prompts to enhance the model performance in conventional single-domain scenarios. Recent research has extended graph prompts to improve Graph Foundation Models (GFMs) by few-shot tuning auxiliary prompts. Despite their progress, most existing methods embed source-domain information into prompts, which serve either as input to GFMs or encoded during model pre-training. Such prompt entanglement with specific source domains and GFM pre-training strategy restricts their generalisability to other domains and different GFMs. Furthermore, existing GFM prompts merely rely on few-shot tuning for adaptation, neglecting the rich information in unlabelled target domain test data. Motivated by these insights, this paper aims to empower GFMs with pre-training-agnostic test-time graph prompt tuning, named GFMate. GFMate introduces centroid and layer prompts applied after pre-training on target domains, avoiding entanglement with specific source domains and model pre-training. In addition, a test-time complementary learning objective is devised to exploit both labelled and unlabelled target domain data for effective test-time prompt tuning. Extensive experiments on 12 benchmark datasets demonstrate the superior performance and efficiency of GFMate, achieving improvements of up to 30.63%. Code is available at https://github.com/YanJiangJerry/GFMate.

2605.14284 2026-05-28 cs.LG 版本更新

Smooth Multi-Policy Causal Effect Estimation in Longitudinal Settings

纵向设置下的平滑多策略因果效应估计

Wenxin Chen, Weishen Pan, Kyra Gan, Fei Wang

发表机构 * Cornell University（康奈尔大学）

AI总结针对多个动态治疗策略的因果效应估计，提出一种策略感知的迭代条件期望重参数化方法（PEQ-Net），通过共享表示实现联合估计，并利用核均值嵌入训练策略编码器，以降低有限样本方差。

AI中文摘要

多个动态治疗策略的比较评估对于医疗和政策决策至关重要，然而传统的纵向因果推断方法孤立地估计每个策略，阻止了反事实之间的信息共享。我们证明这种单独估计范式会引入结构上不受控制的二阶偏差，即使在经过纵向目标最大似然估计（LTMLE）的标准去偏后，也会膨胀有限样本方差。为了解决这个问题，我们提出了一种策略感知的迭代条件期望（ICE）Q函数重参数化方法，通过共享表示实现联合估计。我们在策略编码Q网络（PEQ-Net）中实现了这种方法，该网络以共享策略编码器为核心。编码器使用核均值嵌入进行训练，确保学习到的表示空间反映总体层面的策略差异。在应用LTMLE校正步骤后，我们证明这种设计对二阶余项施加了结构约束，从而稳定了有限样本方差。在半合成数据集上的实验表明，PEQ-Net始终优于现有的基于ICE的方法，特别是在评估紧密相关的策略时，均方根误差显著降低。

英文摘要

Comparative evaluation of multiple dynamic treatment policies is essential for healthcare and policy decisions, yet conventional longitudinal causal inference methods estimate each policy in isolation, preventing information sharing across counterfactuals. We demonstrate that this separate estimation paradigm induces a structurally uncontrolled second-order bias, inflating finite-sample variance even after standard debiasing with longitudinal targeted maximum likelihood estimation (LTMLE). To address this, we propose a policy-aware reparameterization of Iterative Conditional Expectation (ICE) Q-functions that enables joint estimation through shared representations. We implement this approach in the Policy-Encoded Q Network (PEQ-Net), an architecture centered on a shared policy encoder. The encoder is trained using kernel mean embeddings, ensuring that the learned representation space reflects population-level policy dissimilarities. After applying an LTMLE correction step, we prove this design imposes a structural constraint on the second-order remainder, thereby stabilizing finite-sample variance. Experiments on semi-synthetic datasets demonstrate that PEQ-Net consistently outperforms existing ICE-based methods, achieving substantial reductions in root-mean-square error, particularly when evaluating closely related policies.

2605.13743 2026-05-28 cs.LG 版本更新

GHGbench: A Unified Multi-Entity, Multi-Task Benchmark for Carbon Emission Prediction

GHGbench：一个统一的多实体、多任务碳排放预测基准

Yifan Duan, Siyuan Zheng, Lihuan Li, Chao Xue, Flora Salim

发表机构 * School of Computer Science and Engineering（计算机科学与工程学院）

AI总结提出GHGbench，一个包含公司和建筑层面温室气体排放预测的统一开放数据集与基准，通过多模态数据融合和标准化评估揭示结构化难度与分布外泛化差距。

AI中文摘要

实体级碳排放预测的开放数据集和基准在访问、规模、粒度和评估方面仍然分散。我们引入了GHGbench，一个用于公司和建筑层面温室气体预测的开放数据集和基准。公司轨道包含来自12,000多家公司的32,000多条公司年记录，包含范围1+2和范围3披露以及财务/行业信号；建筑轨道将来自13个开放源的491,591条建筑年记录统一为涵盖26个大都市区域（10个美国、15个澳大利亚、1个新加坡）的单一模式，包含气候协变量和多模态遥感嵌入。GHGbench定义了规范的数据划分，以分布内和跨区域/城市迁移为主要任务，以时间保持和短期预测为补充附录证据；主要基线涵盖梯度提升树、表格基础模型、MLP、FT-Transformer和多模态融合，辅以LLM面板，所有方法均通过多种子配对自助法评估。出现了三个基准级别的发现：（i）建筑排放的结构性难度高于公司排放；（ii）分布内到分布外的差距远远超过两个轨道中任何模型内的差距，并且据我们所知，表格基础模型是第一个在多城市建筑排放任务上通过配对自助法显著优于调优树的基线；（iii）多模态遥感嵌入在表格泛化失效的地方恰好有帮助。GHGbench还揭示了灾难性的城市迁移和部门因子查找上限作为系统性失败模式。代码和重建配方可在GHGbench获取。

英文摘要

Open datasets and benchmarks for entity-level carbon-emission prediction remain fragmented across access, scale, granularity, and evaluation. We introduce GHGbench, an open dataset and benchmark for company- and building-level greenhouse-gas prediction. The company track contains 32,000+ company-year records from 12,000+ firms with Scope 1+2 and Scope 3 disclosures and financial/sectoral signals; the building track harmonises 491,591 building-year records from 13 open sources into a single schema across 26 metropolitan areas (10 U.S., 15 Australian, 1 Singaporean), with climate covariates and multimodal remote-sensing embeddings. GHGbench defines canonical splits with in-distribution and cross-region/city transfer as primary tasks and temporal hold-out plus short-horizon forecasting as supplementary appendix evidence; headline baselines span gradient-boosted trees, a tabular foundation model, MLP, FT-Transformer, and multimodal fusion, with an LLM panel as auxiliary, all evaluated under multi-seed paired-bootstrap tests. Three benchmark-level findings emerge: (i) building emissions are structurally harder than company emissions; (ii) the in-distribution to out-of-distribution gap dwarfs any within-model gap across both the company track and the building track, and a tabular foundation model is, to our knowledge, the first baseline to open a paired-bootstrap-significant gap over tuned trees on a multi-city building-emissions task; (iii) multimodal remote-sensing embeddings help precisely where tabular generalisation breaks. GHGbench also exposes catastrophic city transfer and the sector-factor lookup ceiling as systematic failure modes. Code and reconstruction recipes are available at GHGbench.

2605.13517 2026-05-28 cs.CV cs.AI cs.LG 版本更新

ArcVQ-VAE: A Spherical Vector Quantization Framework with ArcCosine Additive Margin

ArcVQ-VAE：一种带有反余弦加性边界的球面向量量化框架

Jaeyung Kim, YoungJoon Yoo

发表机构 * Department of Artificial Intelligence, Chung-Ang University, Seoul, Republic of Korea（韩国首尔中央大学人工智能系）； SNUAILAB, Seoul, Republic of Korea（韩国首尔 SNUAILAB）

AI总结针对VQ-VAE有限码本容量限制表示能力的问题，提出ArcVQ-VAE框架，通过引入球面角边先验（包括球界范数正则化和反余弦加性边界损失）增强潜在表示的判别性和均匀分散性，提升码本利用率，在图像重建和生成任务上取得竞争性能。

Comments To appear in Proceedings of the 43rd International Conference on Machine Learning (ICML 2026)

AI中文摘要

向量量化变分自编码器（VQ-VAE）已成为图像建模中学习离散表示的基本框架。然而，VQ-VAE模型必须使用有限的码本向量集对整张图像进行分词，这种容量限制限制了其捕获丰富多样表示的能力。在本文中，我们提出反余弦加性边界VQ-VAE（ArcVQ-VAE），一种新颖的向量量化框架，该框架为传统VQ-VAE的码本引入了球面角边先验（SAMP）。所提出的SAMP由球界范数正则化（将所有码本向量约束在时间相关的欧几里得球内）和反余弦加性边界损失（鼓励潜在向量之间更大的角度可分性）组成。这种公式在受限空间内促进了更具判别性和均匀分散的潜在表示，从而提高了有效的潜在空间覆盖范围，并导致码本利用率提升。在标准图像重建和生成任务上的实验结果表明，ArcVQ-VAE在重建精度、表示多样性和样本质量方面与基线模型相比取得了竞争性能。代码可在 https://github.com/goals4292/ArcVQ-VAE 获取。

英文摘要

Vector Quantized Variational Autoencoder (VQ-VAE) has become a fundamental framework for learning discrete representations in image modeling. However, VQ-VAE models must tokenize entire images using a finite set of codebook vectors, and this capacity limitation restricts their ability to capture rich and diverse representations. In this paper, we propose ArcCosine Additive Margin VQ-VAE (ArcVQ-VAE), a novel vector quantization framework that introduces a spherical angular-margin prior (SAMP) for the codebook of a conventional VQ-VAE. The proposed SAMP consists of Ball-Bounded Norm Regularization, which constrains all codebook vectors within a time-dependent Euclidean ball, and ArcCosine Additive Margin Loss, which encourages greater angular separability among latent vectors. This formulation promotes more discriminative and uniformly dispersed latent representations within the constrained space, thereby improving effective latent-space coverage and leading to improved codebook utilization. Experimental results on standard image reconstruction and generation tasks show that ArcVQ-VAE achieves competitive performance against baseline models in terms of reconstruction accuracy, representation diversity, and sample quality. The code is available at: https://github.com/goals4292/ArcVQ-VAE

2506.22726 2026-05-28 cs.CV cs.LG 版本更新

XTransfer: Modality-Agnostic Few-Shot Model Transfer for Human Sensing at the Edge

XTransfer: 面向边缘人体感知的模态无关小样本模型迁移

Yu Zhang, Xi Zhang, Hualin Zhou, Xinyuan Chen, Shang Gao, Hong Jia, Jianfei Yang, Yuankai Qi, Tao Gu

发表机构 * Macquarie University, Sydney, NSW, Australia（麦考瑞大学，悉尼，新南威尔士州，澳大利亚）； Nanyang Technological University, Singapore（南洋理工大学，新加坡）； The University of Auckland, Auckland, New Zealand（奥克兰大学，奥克兰，新西兰）

AI总结提出XTransfer方法，通过模型修复和层重组实现模态无关的小样本模型迁移，降低传感器数据收集、模型训练和边缘部署成本。

Comments Accepted at ICML2026

Journal ref: Proceedings of the 43rd International Conference on Machine Learning (ICML 2026), Seoul, South Korea, 6-11 July 2026

AI中文摘要

边缘系统上用于人体感知的深度学习具有巨大的智能应用潜力。然而，其训练和开发受到传感器数据有限和边缘系统资源约束的限制。虽然将预训练模型迁移到不同的感知应用很有前景，但现有方法通常需要大量的传感器数据和计算资源，导致成本高且可迁移性有限。在本文中，我们提出了XTransfer，这是一种首创的方法，实现了模态无关、小样本模型迁移，并具有资源高效的设计。XTransfer通过以下方式灵活地使用预训练模型并在不同模态间迁移知识：(i) 模型修复，通过仅使用少量传感器数据适配预训练层来安全地缓解模态偏移；(ii) 层重组，以逐层方式高效地搜索和重组源模型中的感兴趣层以重构模型。我们在跨不同模态的多种人体感知数据集上对各种基线进行了基准测试。结果表明，XTransfer实现了最先进的性能，同时显著降低了传感器数据收集、模型训练和边缘部署的成本。

英文摘要

Deep learning for human sensing on edge systems presents significant potential for smart applications. However, its training and development are hindered by the limited availability of sensor data and the resource constraints of edge systems. While transferring pre-trained models to different sensing applications is promising, existing methods often require extensive sensor data and computational resources, resulting in high costs and limited transferability. In this paper, we propose XTransfer, a first-of-its-kind method enabling modality-agnostic, few-shot model transfer with a resource-efficient design. XTransfer flexibly uses pre-trained models and transfers knowledge across different modalities by (i) model repairing, which safely mitigates modality shift by adapting pre-trained layers with only a few sensor data samples, and (ii) layer recombining, which efficiently searches for and recombines layers of interest from source models in a layer-wise manner to restructure models. We benchmark various baselines across diverse human sensing datasets spanning different modalities. The results show that XTransfer achieves state-of-the-art performance while significantly reducing the costs of sensor data collection, model training, and edge deployment.

2605.13278 2026-05-28 math.OC cs.LG 版本更新

Proximal-Based Generative Modeling for Bayesian Inverse Problems

基于近端算子的贝叶斯逆问题生成建模

Boyang Zhang, Zhiguo Wang, Ya-Feng Liu

发表机构 * School of Advanced Interdisciplinary Sciences, University of Chinese Academy of Sciences（中国科学院大学先进交叉科学学院）； School of Mathematics, Sichuan University（四川大学数学学院）； School of Mathematical Sciences, Beijing University of Posts and Telecommunications（北京邮电大学数学科学学院）； Academy of Mathematics and Systems Science, Chinese Academy of Sciences（中国科学院数学与系统科学研究院）； Ministry of Education Key Laboratory of Mathematics and Information Networks（教育部数学与信息网络重点实验室）

AI总结针对扩散模型在逆问题中似然分数难以计算的问题，提出基于近端算子的生成建模框架，利用Moreau-Yosida正则化与高斯卷积的理论等价性，通过Moreau分数匹配学习近端算子，实现无需显式似然评估的采样，理论上去除早期停止偏差并达到非渐近收敛，实验在重建质量和采样时间上超越现有方法。

AI中文摘要

基于分数的扩散模型在生成任务中表现出卓越性能，但由于时间相关似然分数的解析难解性，在逆问题中遇到根本性瓶颈。为弥补这一差距，我们提出一种新颖的基于近端算子的生成建模（PGM）框架，严格规避了显式似然评估。我们的框架建立在扩散过程中的高斯卷积与非光滑优化中的Moreau-Yosida正则化之间的理论等价性之上。这使得一种由所提出的Moreau分数驱动的新采样机制成为可能，该分数通过近端算子具有闭式表达式。此外，我们引入Moreau分数匹配来学习仅依赖于从先验分布中抽取的样本的近端算子。理论上，PGM消除了基于分数的扩散模型固有的早期停止偏差，并实现了非渐近收敛。实验表明，PGM在重建质量和采样时间上显著超越了最先进的方法。

英文摘要

Score-based diffusion models demonstrate superior performance in generative tasks but encounter fundamental bottlenecks in inverse problems due to the analytical intractability of the time-dependent likelihood score. To bridge this gap, we propose a novel proximal-based generative modeling (PGM) framework that rigorously circumvents explicit likelihood evaluation. Our framework is built upon a theoretical equivalence between Gaussian convolution in diffusion processes and Moreau-Yosida regularization in nonsmooth optimization. This enables a new sampling mechanism driven by the proposed Moreau score, which admits a closed-form expression via proximal operators. Moreover, we introduce Moreau score matching to learn the proximal operators that rely solely on samples drawn from the prior distribution. Theoretically, PGM eliminates the early-stopping bias inherent in the score-based diffusion model and achieves non-asymptotic convergence. Experiments demonstrate that PGM significantly surpasses state-of-the-art methods in reconstruction quality and sampling time.
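
The link between proximal operators and smoothing that the abstract relies on can be illustrated with a textbook case. For a convex function f, the Moreau-Yosida envelope has gradient (x - prox(x)) / lam, so a proximal operator alone yields a closed-form gradient. The sketch below uses f(u) = |u|, whose proximal operator is soft-thresholding. How the paper builds its Moreau score from such quantities is not stated in the abstract, so this is only the standard identity and not the authors' construction. The function names are hypothetical.

```python
def prox_l1(x, lam):
    """Proximal operator of f(u) = |u| with parameter lam (soft-thresholding)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def moreau_envelope_l1(x, lam):
    """Moreau-Yosida envelope of |.|: min over u of |u| + (u - x)^2 / (2 lam)."""
    u = prox_l1(x, lam)
    return abs(u) + (u - x) ** 2 / (2.0 * lam)


def moreau_envelope_grad_l1(x, lam):
    """Gradient of the envelope via the identity (x - prox(x)) / lam."""
    return (x - prox_l1(x, lam)) / lam


# The envelope of |x| is the Huber function: quadratic near zero, linear in the tails.
inside = moreau_envelope_grad_l1(0.5, lam=1.0)   # quadratic region
outside = moreau_envelope_grad_l1(3.0, lam=1.0)  # linear region
```

The non-smooth kink of |x| at zero is replaced by a differentiable function whose gradient needs no derivative of f, only its proximal operator.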

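2512.21075 2026-05-28 cs.LG cs.AI math.PR stat.ML version update

Feature Learning Dynamics in Infinite-Depth Neural Networks

Zihan Yao, Ruoyu Wu, Tianxiang Gao

Affiliations: School of Computing; Department of Mathematics; DePaul University; Iowa State University

AI summary: This paper studies the forward-backward coupling caused by weight reuse in one-layer ResNets under depth-μP scaling. It proves that the coupling vanishes with width at initialization but produces a nontrivial correlation term during training, and it derives the Neural Feature Dynamics (NFD) SDE system in the infinite-depth limit.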

Abstract

Deep neural networks have achieved remarkable success in practice, yet a mechanistic understanding of how features evolve during training remains incomplete, especially in the large-depth limit. For ResNets under depth-$\mu$P scaling, prior work treats the layer index $\ell$ as a continuous time $t_\ell = \ell/L$, yielding SDE descriptions of the training dynamics. A key unresolved issue is that backpropagation reuses each forward weight matrix $W_\ell$ through its transpose $W_\ell^\top$, creating correlations between forward features and backward gradients whose behavior and role in feature learning remain unclear. We study this reused-weight forward--backward coupling in one-layer ResNets under depth-$\mu$P. Using conditional Gaussian representations, we explicitly separate the coupling terms induced by weight reuse from decoupled Gaussian fluctuations before taking any network limit. At initialization, we prove that the coupling is a finite-width effect and vanishes at rate $O(n^{-1})$, uniformly over depth. During training, however, SGD induces a nontrivial forward--backward correlation term that survives the infinite-width limit. The key depth effect is that, under depth-$\mu$P scaling, this surviving term is higher order in depth and its accumulated contribution over layers becomes negligible as $L\to\infty$. This depth-induced suppression motivates Neural Feature Dynamics (NFD), a forward--backward SDE system with decoupled backward weights that retains the feature-gradient covariance structure generated during training. Under nondegeneracy assumptions, we prove that the finite-network training dynamics converge to their NFD limit with an $O(L^{-1})$ depth-discretization error, while the reused-weight coupling term has a faster $O(L^{-2})$ decay. These results provide a rigorous infinite-depth limit for the feature-learning dynamics of one-layer ResNets under depth-$\mu$P.

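2605.12015 2026-05-28 cs.CR cs.AI cs.CL cs.LG cs.MA version update

Rethinking Layer Redundancy: Calibration Matters More Than Search in LLM Depth Pruning

Minkyu Kim, Vincent-Daniel Yun, Youngrae Kim, Suin Cho, Woosang Lim, Sunwoo Lee

Affiliations: Neural Superintelligence Lab, MODULABS; University of Southern California; Boston University; Seoul National University; Inha University

AI summary: Through experiments, this paper finds that in depth pruning of large language models the calibration configuration shapes pruning patterns and calibration perplexity far more than the choice of search algorithm does.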

Comments Preprint

Abstract

Depth pruning improves the inference efficiency of large language models by removing Transformer blocks. Prior work typically treats layer redundancy as an inherent structural property of pretrained networks, emphasizing importance criteria and search algorithms to identify removable layers. In this study, we empirically investigate depth pruning from a functional perspective. Evaluating representative LLM families across diverse calibration configurations and multiple search algorithms, we show that different configurations produce different pruning patterns. Furthermore, under a fixed calibration configuration, complex search algorithms yield marginal performance improvements over simple one-shot methods, converging to similar pruned subsets. Overall, our results suggest that the calibration configuration plays a substantially larger role than the choice of search algorithm in shaping pruning patterns and calibration perplexity, while contributing comparably to variance in downstream reasoning accuracy. This indicates that future pruning efforts may benefit from prioritizing the calibration configuration over search complexity.

2605.06915 2026-05-28 cs.LG version update

LLMs are not (consistently) Bayesian: Quantifying internal (in)consistencies of LLMs' probabilistic beliefs

Chacha Chen, Matthew Jörke, Adam Goliński, Masha Fedzechkina, Guillermo Sapiro, Sinead Williamson, Nicholas Foti

Affiliations: Apple; Stanford University; Princeton University

AI summary: This paper introduces the information processing gap to study internal inconsistencies in how LLMs update their probabilistic beliefs. It finds that non-Bayesian heuristic updates often outperform exact Bayesian computation on downstream tasks, which indicates that the LLMs' probabilistic models of the world are misspecified.

Abstract

Modern AI systems are being deployed in complex domains such as medicine, science, and law, where it is important that they not only produce correct answers, but also represent and update uncertain beliefs about the world as new evidence arrives. We introduce the novel technique of studying LLMs as information processing rules and utilize the information processing gap to study the internal (in)consistencies of how LLMs update their probabilistic beliefs from evidence. Our extensive experiments evaluate multiple approaches in which LLMs can incorporate evidence into their beliefs. Some of these approaches produce (nearly) Bayesian updates; others seem to use a learned heuristic. Surprisingly, the non-Bayesian heuristic updates often outperform exact Bayesian computation in terms of downstream task performance -- indicating the LLMs' probabilistic models of the world are misspecified. Lastly, we show how our measure can provide diagnostics to identify issues with LLM-powered inferential systems.

2510.25781 2026-05-28 cs.LG cs.AI cs.NA cs.NE math.NA version update

A Practitioner's Guide to Kolmogorov-Arnold Networks

Amir Noorizadegan, Sifan Wang, Leevan Ling, Juan P. Dominguez-Morales

Affiliations: Department of Mathematics, Hong Kong Baptist University; Institution for Foundations of Data Science, Yale University; Robotics and Technology of Computers Lab., Universidad de Sevilla

AI summary: This paper systematically surveys Kolmogorov-Arnold networks (KANs), which are inspired by the Kolmogorov superposition theorem. It covers theoretical foundations, design axes (basis functions), and recent advances, and it offers practical selection guidance and future directions.

2605.00025 2026-05-28 q-bio.NC cs.CL cs.HC cs.LG eess.AS version update

MoDAl: Self-Supervised Neural Modality Discovery via Decorrelation for Speech Neuroprosthesis

Yuanhao Chen, Peter Chin

Affiliations: Dartmouth College

AI summary: Proposes the MoDAl framework, which discovers complementary neural modalities across multiple brain regions through the interplay of a contrastive alignment loss and a decorrelation loss, reducing word error rate on the Brain-to-Text Benchmark '24 from 26.3% to 21.6%.

Abstract

Speech neuroprosthesis systems decode intended speech from neural activity in the absence of audible output, offering a path to restoring communication for individuals with speech-impairing conditions. Current approaches decode predominantly from motor cortical areas, discarding others -- such as area 44, part of Broca's area -- that may encode complementary linguistic information. We introduce MoDAl (Modality Decorrelation and Alignment), a framework that discovers complementary neural modalities through the interplay of two objectives in a shared projection space. A contrastive loss aligns each of several parallel brain encoders with the text embeddings of a pretrained large language model (LLM), while a decorrelation loss prevents the encoders from coalescing to duplicative representations. We prove that these objectives are in productive tension: Contrastive alignment induces transitive modality coalescence, which decorrelation must counteract for the framework to discover diverse neurolinguistic modalities. On the Brain-to-Text Benchmark '24, MoDAl reduces word error rate (WER) from 26.3% to 21.6% compared to the previous best end-to-end method, with the gain from incorporating previously discarded area 44 signals arising entirely from the decorrelation mechanism. Analysis of the discovered modalities reveals functional specialization: Encoders receiving area 44 input capture structural and syntactic properties (sentence length, grammatical voice, wh-words), consistent with the neurolinguistic understanding of Broca's area.

2603.09117 2026-05-28 cs.LG cs.AI cs.CL version update

Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards

Zhengzhao Ma, Xueru Wen, Boxi Cao, Yaojie Lu, Hongyu Lin, Jinglin Yang, Min He, Xianpei Han, Le Sun

Affiliations: Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China; Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China; National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing, China

AI summary: To address the degradation of model calibration in RLVR, this paper proposes the DCPO framework, which decouples the reasoning and calibration objectives. It substantially improves calibration and mitigates overconfidence while preserving accuracy.

Comments Accepted at the 43rd International Conference on Machine Learning (ICML 2026)

2410.04096 2026-05-28 cs.LG cs.AI cs.NA cs.NE math.NA physics.comp-ph version update

Sinc Kolmogorov-Arnold network and its application for solving PDEs with singularities

Tianchi Yu, Jingwei Qiu, Jiang Yang, Ivan Oseledets

Affiliations: Skolkovo Institute of Science and Technology; Southern University of Science and Technology; International Center for Mathematics; National Center for Applied Mathematics Shenzhen (NCAMS)

AI summary: Proposes using Sinc interpolation as the learnable activation function in Kolmogorov-Arnold networks so that both smooth functions and functions with singularities are approximated effectively, and reports better results when solving PDEs with physics-informed neural networks.

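2510.24941 2026-05-28 cs.LG version update

Can Aha Moments Be Fake? Towards Quantifying Decorative and True Thinking in Chain-of-Thought

Jiachen Zhao, Yiyou Sun, Weiyan Shi, Dawn Song

Affiliations: Northeastern University; University of California, Berkeley

AI summary: Proposes the True Thinking Score (TTS) to quantify the causal contribution of each chain-of-thought step to the final answer. It finds that models often mix true thinking with decorative thinking, uses TTS for effective pruning and self-training, and shows that frontier models often verbalize reasoning steps that are not causally used.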

Abstract

Large language models can generate long chain-of-thought (CoT) reasoning, yet prior work using explicitly designed settings suggests that CoT can be post-hoc rationalization rather than a faithful reflection of the computation. In this work, we go further and propose a True Thinking Score (TTS) to quantify the causal contribution of each step in CoT to the model's final prediction in realistic reasoning problems. Across eleven models ranging from 1.5B to 1.1T parameters on common reasoning benchmarks, we find that CoTs often interleave true-thinking steps, which causally affect the final answer, with decorative-thinking steps, which appear useful but have little causal influence. Such decorative steps remain prevalent even for frontier models: over 30% of steps in Kimi-K2.6 on MATH are decorative (TTS <= 0.005). Furthermore, TTS enables effective CoT pruning: removing the 50% of CoT steps with the lowest TTS largely maintains performance. Self-training on these pruned CoTs reduces reasoning length by 66% while preserving performance on Nemotron3-Nano-30B. Finally, we provide a mechanistic analysis showing that LLMs can be steered in the latent space to engage or disengage with reasoning steps. Overall, our results reveal that frontier LLMs often verbalize reasoning steps that are not causally used, challenging both the efficiency and the trustworthiness of CoT.
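
The pruning rule mentioned in this abstract (drop the half of the steps with the lowest score) can be sketched as follows. How TTS itself is computed is not described in the abstract, so the scores here are treated as given inputs; the function names, the example steps, and the example scores are hypothetical. Only the 0.005 threshold and the 50% fraction come from the text.

```python
def is_decorative(tts, threshold=0.005):
    """Label a step decorative when its score is at or below the threshold."""
    return tts <= threshold


def prune_lowest_tts(steps, scores, drop_fraction=0.5):
    """Drop the lowest-scoring fraction of steps, keeping the rest in original order."""
    if len(steps) != len(scores):
        raise ValueError("steps and scores must have the same length")
    n_drop = int(len(steps) * drop_fraction)
    # Indices sorted from lowest to highest score; the first n_drop are removed.
    lowest = set(sorted(range(len(steps)), key=lambda i: scores[i])[:n_drop])
    return [step for i, step in enumerate(steps) if i not in lowest]


# Made-up steps and scores, for illustration only.
steps = ["restate problem", "set up equation", "wait, let me double-check", "solve for x"]
scores = [0.001, 0.40, 0.002, 0.30]
kept = prune_lowest_tts(steps, scores)
```

With these placeholder scores, the two steps that fall under the decorative threshold are the ones removed, and the surviving steps keep their original order.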

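2604.23184 2026-05-28 cs.IT cs.LG math.IT version update

HardNet++: Nonlinear Constraint Enforcement in Neural Networks

Andrea Goertzen, Kaveh Alim, Youngjae Min, Navid Azizan

Affiliations: Massachusetts Institute of Technology

AI summary: Proposes a method that enforces linear and nonlinear equality and inequality constraints by iteratively adjusting the network output via damped local linearizations. It proves that arbitrary tolerance is reachable under regularity conditions and applies the method to a nonlinear model predictive control problem, achieving tight constraint satisfaction without loss of optimality.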

Abstract

Enforcing constraint satisfaction in neural network outputs is critical for safety, reliability, and physical fidelity in many control and decision-making applications. While soft-constrained methods penalize constraint violations during training, they do not guarantee constraint adherence during inference. Other approaches guarantee constraint satisfaction via a projection layer, but often rely on the existence of a tractable projection onto the feasible set, limiting their utility in more general problem settings. Many real-world problems of interest are nonlinear and lack the special structure admitting a tractable projection, motivating the development of methods that can enforce general nonlinear constraints. To this end, we introduce HardNet++, a constraint-satisfaction method that enforces linear and nonlinear equality and inequality constraints. Our approach iteratively adjusts the network output via damped local linearizations of the constraints. Each iteration is differentiable, admitting an end-to-end training framework, where the constraint satisfaction layer is active during training. We show that under certain regularity conditions, this procedure enforces nonlinear constraint satisfaction to arbitrary tolerance. Finally, we demonstrate tight constraint adherence without loss of optimality in a learning-for-optimization context, where we apply this method to a nonlinear model predictive control problem.
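
To make "damped local linearizations of the constraints" concrete, here is a minimal sketch for a single scalar equality constraint. It is an assumption-laden illustration, not HardNet++ itself: the abstract does not give the update rule, so the damped minimum-norm step below, the unit-circle constraint, the damping value, and all function names are hypothetical choices. Inequality constraints and differentiation through the iterations are not covered.

```python
def constraint(y):
    """Example equality constraint h(y) = y0^2 + y1^2 - 1 = 0 (unit circle)."""
    return y[0] ** 2 + y[1] ** 2 - 1.0


def constraint_grad(y):
    """Gradient of h at y."""
    return [2.0 * y[0], 2.0 * y[1]]


def damped_correction(y, damping=0.5, tol=1e-10, max_iters=200):
    """Repeatedly take a damped minimum-norm step that solves the linearized constraint."""
    y = list(y)
    for _ in range(max_iters):
        h = constraint(y)
        if abs(h) <= tol:
            break
        g = constraint_grad(y)
        g_sq = sum(gi * gi for gi in g)
        if g_sq == 0.0:
            raise ZeroDivisionError("gradient vanished; linearization is degenerate")
        # Linearization: h(y) + g . d = 0, with minimum-norm solution d = -h * g / (g . g).
        scale = damping * h / g_sq
        y = [yi - scale * gi for yi, gi in zip(y, g)]
    return y


corrected = damped_correction([2.0, 1.0])
```

Starting from a raw output of (2, 1), the iterate moves radially toward the unit circle and stops once the constraint residual falls below the tolerance. The damping factor trades speed for robustness when the linearization is a poor local model.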

2604.19355 2026-05-28 cs.LG cs.AI cs.CE version update

LASER: Learning Active Sensing for Continuum Field Reconstruction

Huayu Deng, Jinghui Zhong, Xiangming Zhu, Yunbo Wang, Xiaokang Yang

Affiliations: MoE Key Lab of Artificial Intelligence, AI Institute, School of Computer Science, Shanghai Jiao Tong University

AI summary: Proposes the LASER framework, which formulates active sensing as a partially observable Markov decision process and uses a continuum field latent world model together with a reinforcement learning policy to simulate sensing scenarios in a latent imagination space, achieving high-fidelity reconstruction under sparsity constraints.

Comments Accepted by ICML 2026 (Oral)

Abstract

High-fidelity measurements of continuum physical fields are essential for scientific discovery and engineering design but remain challenging under sparse and constrained sensing. Conventional reconstruction methods typically rely on fixed sensor layouts, which cannot adapt to evolving physical states. We propose LASER, a unified, closed-loop framework that formulates active sensing as a Partially Observable Markov Decision Process (POMDP). At its core, LASER employs a continuum field latent world model that captures the underlying physical dynamics and provides intrinsic reward feedback. This enables a reinforcement learning policy to simulate "what-if" sensing scenarios within a latent imagination space. By conditioning sensor movements on predicted latent states, LASER navigates toward potentially high-information regions beyond current observations. Our experiments demonstrate that LASER consistently outperforms static and offline-optimized strategies, achieving high-fidelity reconstruction under sparsity across diverse continuum fields.

2604.18227 2026-05-28 cs.LG version update

FSEVAL: Feature Selection Evaluation Toolbox and Dashboard

Muhammad Rajabinasab, Arthur Zimek

Affiliations: Department of Mathematics and Computer Science, University of Southern Denmark

AI summary: Proposes the FSEVAL toolbox and visualization dashboard for standardized, unified evaluation and visualization of feature selection algorithms.

2604.16565 2026-05-28 cs.LG cs.AI version update

Reasoning on the Manifold: Bidirectional Consistency for Self-Verification in Diffusion Language Models

Jiaoyang Ruan, Xin Gao, Yinda Chen, Hengyu Zeng, Liang Du, Guanghao Li, Jie Fu, Jian Pu

Affiliations: Institute of Science and Technology for Brain-Inspired Intelligence; Fudan University; Shanghai Artificial Intelligence Laboratory; University of Science and Technology of China; IEG, Tencent Inc.

AI summary: Proposes Bidirectional Manifold Consistency (BMC), a training-free, unsupervised metric that quantifies the stability of generated sequences through a forward-masking and backward-reconstruction cycle, for use in diagnosis, inference, and alignment of diffusion language models.

Comments 31 pages, 7 figures. Accepted to the 43rd International Conference on Machine Learning (ICML 2026). Camera-ready version

Journal ref: Proceedings of the 43rd International Conference on Machine Learning, PMLR 306, 2026

Abstract

While Diffusion Large Language Models (dLLMs) offer structural advantages for global planning, efficiently verifying that they arrive at correct answers via valid reasoning traces remains a critical challenge. In this work, we propose a geometric perspective: Reasoning on the Manifold. We hypothesize that valid generation trajectories reside as stable attractors on the high-density manifold of the learned distribution, whereas invalid paths exhibit off-manifold drift. To operationalize this, we introduce Bidirectional Manifold Consistency (BMC), a training-free, unsupervised metric that quantifies the stability of the generated sequence through a forward-masking and backward-reconstruction cycle. Empirically, we demonstrate BMC's versatility across the full reasoning lifecycle: (1) in Diagnosis, it serves as a robust discriminator of solution validity without ground-truth answers; (2) in Inference, it enables rejection resampling to effectively concentrate computational resources on complex reasoning tasks; and (3) in Alignment, it functions as a dense geometric reward that transforms sparse outcome supervision into fine-grained guidance, empowering models to self-evolve beyond standard baselines. Our results establish intrinsic geometric stability as a robust indicator of correctness for dLLMs.

2604.16358 2026-05-28 cs.LG cs.CL version update

SaFeR-Steer: Evolving Multi-Turn MLLMs via Synthetic Bootstrapping and Feedback Dynamics

Haolong Hu, Hanyu Li, Tiancheng He, Huahui Yi, An Zhang, Qiankun Li, Kun Wang, Yang Liu, Zhigang Zeng

Affiliations: Huazhong University of Science and Technology; Beijing University of Posts and Telecommunications; West China Biomedical Big Data Center, Sichuan University; School of Public Policy and Administration, Chongqing University; Nanyang Technological University

AI summary: Proposes the SaFeR-Steer framework, which trains a single student model through staged synthetic bootstrapping and tutor-in-the-loop GRPO and introduces the Trajectory-Consistent Summative Reward (TCSR) to counter long-context safety decay in multi-turn safety alignment, substantially improving multi-turn safety and helpfulness.

Abstract

Multimodal large language models (MLLMs) are increasingly deployed in multi-turn settings, where attackers can escalate unsafe intent through the evolving visual-text history and exploit long-context safety decay. Yet safety alignment is still dominated by single-turn data and fixed-template dialogues, leaving a mismatch between training and deployment. To bridge this gap, we propose SaFeR-Steer, a progressive multi-turn alignment framework that combines staged synthetic bootstrapping with tutor-in-the-loop GRPO to train a single student under adaptive, on-policy attacks. We also introduce Trajectory-Consistent Summative Reward (TCSR), which aggregates the historical minimum and average of turn rewards so that any low-quality turn affects the trajectory-level return. I. Dataset. We release STEER, a multi-turn multimodal safety dataset with STEER-SFT (12,934), STEER-RL (2,000), and STEER-Bench (3,227) dialogues spanning 2-10 turns. II. Experiment. Starting from Qwen2.5-VL-3B/7B, SaFeR-Steer substantially improves Safety/Helpfulness on both single-turn (48.30/45.86 $\rightarrow$ 81.84/70.77 for 3B; 56.21/60.32 $\rightarrow$ 87.89/77.40 for 7B) and multi-turn benchmarks (12.55/27.13 $\rightarrow$ 55.58/70.27 for 3B; 24.66/46.48 $\rightarrow$ 64.89/72.35 for 7B), shifting failures to later turns and yielding robustness beyond scaling alone. Code is available at https://anonymous.4open.science/r/SaFeR-Steer
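
The idea of aggregating the minimum and the average of turn rewards can be illustrated with a small sketch. The abstract does not state the actual TCSR formula, so the convex blend below, the `min_weight` parameter, and the function name are assumptions; the point is only to show why a single bad turn pulls the trajectory-level return down more than a plain average would.

```python
def tcsr_like_return(turn_rewards, min_weight=0.5):
    """Blend the worst turn reward with the mean turn reward (hypothetical weighting)."""
    if not turn_rewards:
        raise ValueError("need at least one turn reward")
    if not 0.0 <= min_weight <= 1.0:
        raise ValueError("min_weight must lie in [0, 1]")
    worst = min(turn_rewards)
    mean = sum(turn_rewards) / len(turn_rewards)
    return min_weight * worst + (1.0 - min_weight) * mean


# Made-up rewards: three good turns and one unsafe turn.
rewards = [0.9, 0.9, 0.1, 0.9]
plain_mean = sum(rewards) / len(rewards)
blended = tcsr_like_return(rewards)
```

Under this assumed weighting, the blended return sits well below the plain mean for the same trajectory, which is the qualitative behaviour the abstract describes.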

2506.01247 2026-05-28 cs.CV cs.AI cs.LG version update

Beyond Interpretability: When, Why, and How Sparse Autoencoders Enable Label-Free Visual Steering

Gerasimos Chatzoudis, Zhuowei Li, Gemma E. Moran, Hao Wang, Dimitris N. Metaxas

Affiliations: Department of Computer Science, Rutgers University; Department of Statistics, Rutgers University

AI summary: Proposes Visual Sparse Steering (VS2), a label-free method that trains a sparse autoencoder and uses its reconstruction error together with sparse-feature amplification to steer a frozen vision-language model, improving zero-shot accuracy on nine image-classification datasets.

Abstract

Sparse Autoencoders (SAEs) are increasingly used to interpret foundation models, but their role as an actionable intervention space remains less understood, especially in vision. We study whether sparse visual features can be used not only for post-hoc analysis, but also to steer frozen vision-language models. We introduce Visual Sparse Steering (VS2), a label-free method that trains a top-$k$ SAE on unlabeled activations from a frozen CLIP image encoder and, at test time, constructs an interpretable steering vector by amplifying the input's active sparse features and decoding the induced change. We show that this procedure admits a closed-form decomposition as centroid-deviation steering: each input is moved along its deviation from the SAE-learned centroid. The residual term is controlled exactly by the SAE's per-sample reconstruction error, measured by FVU, yielding an FVU-based residual bound and motivating a reliability gate that falls back to zero-shot CLIP when SAE reconstruction is unreliable. With target-domain SAEs trained on unlabeled CLIP image-encoder activations, VS2 improves zero-shot accuracy across nine image-classification datasets, achieving gains up to $+4.12\%$ with less than $0.1\%$ additional inference compute. Finally, a controlled upper-bound study, VS2++, shows that selective amplification of sparse features can yield gains up to $+21.44\%$, exposing a reconstruction-vs-task saliency gap: features salient for reconstruction need not align with features useful for downstream prediction.
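
The test-time procedure described here (encode with a top-k SAE, amplify the active features, decode the induced change) can be sketched on toy data. This is a rough illustration under stated assumptions, not the VS2 implementation: the ReLU before top-k, the linear decoder with a bias, the amplification factor `gamma`, the tiny hand-written weights, and all function names are hypothetical. The centroid-deviation decomposition and the FVU-based reliability gate from the abstract are not reproduced.

```python
def top_k(values, k):
    """Keep the k largest entries of `values` and zero out the rest."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]


def encode(x, w_enc, b_enc, k):
    """Sparse code: affine map, ReLU, then top-k selection."""
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(w_enc, b_enc)]
    return top_k([max(p, 0.0) for p in pre], k)


def decode(z, w_dec, b_dec):
    """Linear decoder; w_dec has one row per latent feature."""
    return [b_dec[j] + sum(z[i] * w_dec[i][j] for i in range(len(z)))
            for j in range(len(b_dec))]


def steering_vector(x, w_enc, b_enc, w_dec, b_dec, k, gamma):
    """Change in the decoded output when active sparse features are scaled by gamma."""
    z = encode(x, w_enc, b_enc, k)
    base = decode(z, w_dec, b_dec)
    boosted = decode([gamma * zi for zi in z], w_dec, b_dec)
    return [b - a for a, b in zip(base, boosted)]


# Toy weights: 2-dimensional input, 3 latent features, top-1 sparsity.
w_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b_enc = [0.0, 0.0, 0.0]
w_dec = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
b_dec = [0.1, 0.1]
x = [1.0, 2.0]
code = encode(x, w_enc, b_enc, k=1)
steer = steering_vector(x, w_enc, b_enc, w_dec, b_dec, k=1, gamma=2.0)
```

Because the decoder in this sketch is affine, the decoder bias cancels and the steering vector reduces to (gamma - 1) times the decoded contribution of the active features.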

2507.07067 2026-05-28 eess.SP cs.LG version update

How to Bridge the Sim-to-Real Gap in Digital Twin-Aided Telecommunication Networks

Clement Ruah, Houssem Sifaou, Osvaldo Simeone, Bashir M. Al-Hashimi

Affiliations: Institute for Intelligent Networked Systems; Department of Engineering

AI summary: This paper reviews two complementary approaches to bridging the discrepancy between synthetic and real data: digital twin calibration and sim-to-real gap-aware training strategies.

Comments This work has been accepted for publication in IEEE Communications Magazine

Abstract

Training effective artificial intelligence models for telecommunications is challenging due to the scarcity of deployment-specific data. Real data collection is expensive, and available datasets often fail to capture the unique operational conditions and contextual variability of the network environment. Digital twinning provides a potential solution to this problem, as simulators tailored to the current network deployment can generate site-specific data to augment the available training datasets. However, there is a need to develop solutions to bridge the inherent simulation-to-reality (sim-to-real) gap between synthetic and real-world data. This paper reviews recent advances on two complementary strategies: 1) the calibration of digital twins (DTs) through real-world measurements, and 2) the use of sim-to-real gap-aware training strategies to robustly handle residual discrepancies between digital twin-generated and real data. For the latter, we evaluate two conceptually distinct methods that model the sim-to-real gap either at the level of the environment via Bayesian learning or at the level of the training loss via prediction-powered inference.

2604.09258 2026-05-28 cs.LG version update

Nexus: Same Pretraining Loss, Better Downstream Generalization via Common Minima

Huanran Chen, Huaqing Zhang, Xiao Li, Yinpeng Dong, Ke Shen, Jun Zhu

Affiliations: Tsinghua University

AI summary: Proposes the Nexus optimizer, which maximizes gradient similarity to bring the loss minima of different data sources closer together, significantly improving downstream generalization at the same pretraining loss.

Abstract

The foundational capabilities of large language models are acquired during pretraining on internet-scale, highly heterogeneous data mixtures. In this work, we investigate an interesting geometric question regarding the converged state of pretraining: Does the model converge to a common minimizer across all data sources, or merely a minimizer of the summed loss? We hypothesize that the geometric "closeness" of task-specific minima is intrinsically linked to downstream generalization. We reveal that standard optimizers (e.g., AdamW) often converge to points where task-specific minima are distant from each other. To address this, we propose the Nexus optimizer, which encourages the closeness of these minima by maximizing gradient similarity during optimization. Experiments across models ranging from 130M to 3B parameters, with various data mixtures and hyperparameter schedules, show that Nexus significantly boosts downstream performance despite achieving the same pretraining loss. Notably, on the 3B model, Nexus reduces the out-of-distribution loss by 0.012 and yields up to a 15.0% accuracy improvement on complex reasoning tasks (e.g., GSM8k). This finding challenges the reliance on pretraining loss as the sole proxy for model evaluation and demonstrates the importance of implicit biases in unlocking downstream generalization.
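
The quantity this abstract says Nexus maximizes, gradient similarity across data sources, can be measured with a short sketch. The abstract does not specify the similarity measure or the update rule, so cosine similarity, the pairwise averaging, and the function names below are assumptions; this shows only how one might monitor the quantity, not how the optimizer uses it.

```python
import math


def cosine_similarity(g1, g2):
    """Cosine of the angle between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(g1, g2))
    n1 = math.sqrt(sum(a * a for a in g1))
    n2 = math.sqrt(sum(b * b for b in g2))
    if n1 == 0.0 or n2 == 0.0:
        return 0.0  # treat a zero gradient as uninformative
    return dot / (n1 * n2)


def mean_pairwise_similarity(grads):
    """Average cosine similarity over all unordered pairs of per-source gradients."""
    pairs = [(i, j) for i in range(len(grads)) for j in range(i + 1, len(grads))]
    if not pairs:
        raise ValueError("need at least two gradients")
    return sum(cosine_similarity(grads[i], grads[j]) for i, j in pairs) / len(pairs)
```

A value near 1 means the per-source gradients point the same way, so a step that lowers one source's loss tends to lower the others as well; values near 0 or below indicate that the sources pull the parameters in unrelated or conflicting directions.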

2604.04074 2026-05-28 cs.AI cs.LG version update

FactReview: Evidence-Grounded Peer Review with Execution-Based Claim Verification

Ling Yue, Chaoqian Ouyang, Hang Xu, Ruijun Huang, Yuchen Liu, Libin Zheng, Wei Liu, Shaowu Pan, Shimin Di, Min-Ling Zhang

Affiliations: Rensselaer Polytechnic Institute; Sun Yat-sen University; Southeast University; The Hong Kong University of Science and Technology

AI summary: Proposes the FactReview system, which extracts review-relevant claims, grounds them in related work, and, when code is available, executes released artifacts under a fixed repair budget to audit empirical claims. It covers 84% of claims, reaches an overall review quality score of 4.86/5, and reduces review time by 58%.

Abstract

LLM-based reviewing systems typically take only the manuscript as input, leaving literature and code-based claims hard to verify. We present FactReview, a system that extracts review-relevant claims, grounds them in related work, and, when code is available, executes released artifacts under a fixed repair budget to audit empirical claims. Across 35 ML papers and 463 benchmark major claims, FactReview covers 84% of claims. Under an evidence-aware rubric, its reviews score 4.86/5 in overall quality, 0.7 above DeepReview-v2 and 1.5 above matched OpenReview comments. Removing execution evidence changes 17% of claim statuses, more than any other single evidence source. In a reviewer-assistance study, FactReview reduces mean review time by 58% while raising benchmark claim coverage from 87% to 99%. We argue that LLM reviewers should audit empirical claims, not make accept-reject decisions. The code is public at: https://github.com/DEFENSE-SEU/FactReview.

2512.11524 2026-05-28 cs.CV cs.LG 版本更新

Super-Resolved Canopy Height Mapping from Sentinel-2 Time Series Using Airborne LiDAR HD Reference Data across Metropolitan France

利用法国本土机载LiDAR HD参考数据从Sentinel-2时间序列进行超分辨率冠层高度制图

Ekaterina Kalinicheva, Florian Helen, Stéphane Mermoz, Florian Mouret, Milena Planells

发表机构 * CESBIO ； GlobEO

AI总结提出THREASURE-Net端到端框架，利用Sentinel-2时间序列和LiDAR HD数据生成2.5m、5m和10m分辨率的年度冠层高度图，无需预训练模型或高分辨率光学图像，在法国本土的精度优于现有基于Sentinel数据的方法。

AI中文摘要

精细尺度的森林监测对于理解冠层结构及其动态至关重要，这些是碳储量、生物多样性和森林健康的关键指标。深度学习特别有效，因为它整合了共同反映冠层结构的光谱、时间和空间信号。为满足这一需求，我们提出了THREASURE-Net，一种新颖的端到端树高回归与超分辨率框架。该模型使用覆盖法国本土、多个空间分辨率的LiDAR HD数据导出的参考高度指标，在Sentinel-2时间序列上训练，以生成年度高度图。我们评估了三种模型变体，分别产生2.5米、5米和10米分辨率的树高预测。THREASURE-Net不依赖任何预训练模型或参考甚高分辨率光学图像来训练其超分辨率模块；相反，它仅从LiDAR导出的高度信息中学习。我们的方法优于现有基于Sentinel数据的最先进方法，并与基于甚高分辨率图像的方法具有竞争力。它可以部署生成高精度年度冠层高度图，在2.5米、5米和10米分辨率下分别实现2.63米、2.70米和2.88米的平均绝对误差。这些结果凸显了THREASURE-Net仅使用免费卫星数据对温带森林进行可扩展且经济高效的结构监测的潜力。THREASURE-Net的源代码可在以下网址获取：https://github.com/Global-Earth-Observation/threasure-net。

英文摘要

Fine-scale forest monitoring is essential for understanding canopy structure and its dynamics, which are key indicators of carbon stocks, biodiversity, and forest health. Deep learning is particularly effective for this task, as it integrates spectral, temporal, and spatial signals that jointly reflect the canopy structure. To address this need, we introduce THREASURE-Net, a novel end-to-end framework for Tree Height Regression And Super-Resolution. The model is trained on Sentinel-2 time series using reference height metrics derived from LiDAR HD data at multiple spatial resolutions over Metropolitan France to produce annual height maps. We evaluate three model variants, producing tree-height predictions at 2.5 m, 5 m, and 10 m resolution. THREASURE-Net does not rely on any pretrained model nor on reference very high resolution optical imagery to train its super-resolution module; instead, it learns solely from LiDAR-derived height information. Our approach outperforms existing state-of-the-art methods based on Sentinel data and is competitive with methods based on very high resolution imagery. It can be deployed to generate high-precision annual canopy-height maps, achieving mean absolute errors of 2.63 m, 2.70 m, and 2.88 m at 2.5 m, 5 m, and 10 m resolution, respectively. These results highlight the potential of THREASURE-Net for scalable and cost-effective structural monitoring of temperate forests using only freely available satellite data. The source code for THREASURE-Net is available at: https://github.com/Global-Earth-Observation/threasure-net.

2505.13820 2026-05-28 cs.LG cs.AI cs.CL 版本更新

Structured Agent Distillation for Large Language Model

大型语言模型的结构化智能体蒸馏

Jun Liu, Zhenglun Kong, Peiyan Dong, Changdi Yang, Tianqi Li, Hao Tang, Geng Yuan, Wei Niu, Wenbin Zhang, Pu Zhao, Xue Lin, Dong Huang, Yanzhi Wang

发表机构 * Carnegie Mellon University（卡内基梅隆大学）； Harvard University（哈佛大学）； MIT（麻省理工学院）； Northeastern University（东北大学）； Adobe Research（Adobe研究）； National University of Singapore（新加坡国立大学）； University of Georgia（佐治亚大学）； Florida International University（佛罗里达国际大学）

AI总结提出结构化智能体蒸馏框架，通过分段对齐推理和动作跨度，将大型语言模型智能体压缩为小型学生模型，在保持决策性能的同时降低推理成本。

Journal ref: The 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026)

AI中文摘要

大型语言模型（LLMs）通过交错推理和动作（如ReAct风格框架）展现出作为决策智能体的强大能力。然而，它们的实际部署受到高推理成本和大模型规模的限制。我们提出结构化智能体蒸馏，一种将基于大型LLM的智能体压缩为更小的学生模型的框架，同时保持推理保真度和动作一致性。与标准的token级蒸馏不同，我们的方法将轨迹分割为[REASON]和[ACT]跨度，应用分段特定损失来使每个组件与教师行为对齐。这种结构感知的监督使紧凑的智能体能够更好地复制教师的决策过程。在ALFWorld、HotPotQA-ReAct和WebShop上的实验表明，我们的方法始终优于token级和模仿学习基线，在性能下降最小的情况下实现了显著的压缩。缩放和消融结果进一步强调了跨度级对齐对于高效可部署智能体的重要性。

英文摘要

Large language models (LLMs) exhibit strong capabilities as decision-making agents by interleaving reasoning and actions, as seen in ReAct-style frameworks. Yet, their practical deployment is constrained by high inference costs and large model sizes. We propose Structured Agent Distillation, a framework that compresses large LLM-based agents into smaller student models while preserving both reasoning fidelity and action consistency. Unlike standard token-level distillation, our method segments trajectories into [REASON] and [ACT] spans, applying segment-specific losses to align each component with the teacher's behavior. This structure-aware supervision enables compact agents to better replicate the teacher's decision process. Experiments on ALFWorld, HotPotQA-ReAct, and WebShop show that our approach consistently outperforms token-level and imitation learning baselines, achieving significant compression with minimal performance drop. Scaling and ablation results further highlight the importance of span-level alignment for efficient and deployable agents.
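
The span segmentation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes trajectories are plain strings in which `[REASON]` and `[ACT]` markers open each span, uses whitespace tokenization, and treats the function names and the weights `w_reason` and `w_act` as hypothetical.

```python
import re

SPAN_TAG = re.compile(r"\[(REASON|ACT)\]")

def segment_trajectory(text):
    """Split a trajectory into (kind, tokens) spans at [REASON]/[ACT] markers."""
    spans = []
    matches = list(SPAN_TAG.finditer(text))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        tokens = text[m.end():end].split()  # whitespace tokens, illustration only
        spans.append((m.group(1), tokens))
    return spans

def span_masks(spans):
    """Return per-token 0/1 masks selecting reasoning tokens and action tokens."""
    reason_mask, act_mask = [], []
    for kind, tokens in spans:
        reason_mask += [1 if kind == "REASON" else 0] * len(tokens)
        act_mask += [1 if kind == "ACT" else 0] * len(tokens)
    return reason_mask, act_mask

def segment_loss(token_losses, reason_mask, act_mask, w_reason=1.0, w_act=1.0):
    """Weighted sum of the mean per-token loss inside each span type."""
    def masked_mean(mask):
        n = sum(mask)
        return sum(l * m for l, m in zip(token_losses, mask)) / n if n else 0.0
    return w_reason * masked_mean(reason_mask) + w_act * masked_mean(act_mask)

trajectory = ("[REASON] the key is in the drawer [ACT] open drawer "
              "[REASON] now take it [ACT] take key")
spans = segment_trajectory(trajectory)
reason_mask, act_mask = span_masks(spans)
```

In a real training loop, `token_losses` would be the per-token distillation loss between student and teacher; the masks only decide which span type each token contributes to.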

2603.21465 2026-05-28 cs.CL cs.LG 版本更新

DRTriton: Large-Scale Synthetic Data Driven Reinforcement Learning for Triton Kernel Generation

DRTriton：大规模合成数据驱动的强化学习用于Triton内核生成

Siqi Guo, Ming Lin, Tianbao Yang

发表机构 * Texas A&M University（德克萨斯农工大学）

AI总结提出DRTriton框架，通过合成数据生成、课程强化学习和测试时搜索，训练LLM将PyTorch程序转换为优化的Triton内核，在KernelBench Level 2任务中超越GPT-5.2和Claude-Sonnet-4.5。

AI中文摘要

在生成式AI行业中，开发高效的CUDA内核是一项基础但具有挑战性的任务。最近的研究利用大型语言模型（LLMs）自动将PyTorch参考实现转换为CUDA内核，显著减少了工程工作量。最先进的LLMs，如GPT-5.2和Claude-Sonnet-4.5，仍然难以胜任此任务。为应对这一挑战，我们提出了DRTriton，一个可扩展的学习框架，用于训练LLM将PyTorch程序转换为高度优化的Triton内核，然后在运行时编译为CUDA内核。DRTriton包含三个关键组件：（i）数据合成算法CSP-DAG，保证在算子空间上的完全覆盖和具有可控难度的无偏均匀采样；（ii）具有解耦奖励的课程RL框架，联合优化转换成功率和执行速度；（iii）测试时搜索算法，进一步提高生成的Triton内核的执行速度。通过在使用现有LLM整理的有限PyTorch-Triton对上进行SFT预热阶段，DRTriton在合成PyTorch程序上通过RL训练，有效泛化到即使对人类专家也具挑战性的真实世界CUDA内核。实验结果表明，DRTriton-7B在92%的KernelBench Level 2任务上实现了相对于PyTorch的加速，而GPT-5.2为23%，Claude-Sonnet-4.5为19%。

英文摘要

Developing efficient CUDA kernels is a fundamental yet challenging task in the generative AI industry. Recent research leverages Large Language Models (LLMs) to automatically convert PyTorch reference implementations to CUDA kernels, significantly reducing engineering effort. State-of-the-art LLMs, such as GPT-5.2 and Claude-Sonnet-4.5, still struggle with this task. To address this challenge, we propose DRTriton, a scalable learning framework for training LLMs to convert PyTorch programs into highly optimized Triton kernels, which are then compiled to CUDA kernels at runtime. DRTriton consists of three key components: (i) a data synthetic algorithm CSP-DAG that guarantees full coverage and unbiased uniform sampling over the operator space with controlled difficulty; (ii) a curriculum RL framework with decoupled rewards that jointly optimizes conversion success rate and execution speed; and (iii) a test-time search algorithm that further improves the execution speed of the generated Triton kernels. With a warmup stage of SFT on limited PyTorch-Triton pairs curated using existing LLMs, DRTriton trained by RL on synthesized PyTorch programs generalizes effectively to real-world CUDA kernels that are challenging even for human experts. Experimental results show that DRTriton-7B achieves speedup over PyTorch on 92% of KernelBench Level 2 tasks, compared to 23% for GPT-5.2 and 19% for Claude-Sonnet-4.5.

2601.21207 2026-05-28 cs.LG cs.AI math.AT 版本更新

激发Pfaffians：跨结构和状态的广义神经波函数

Nicholas Gao, Till Grutschus, Frank Noé, Stephan Günnemann

发表机构 * Department of Computer Science & Munich Data Science Institute, Technical University of Munich ； Free University of Berlin ； Rice University ； Microsoft Research AI4Science

AI总结提出多态重要性采样（MSIS）和激发Pfaffians架构，以近恒定样本量高效计算多态重叠，并在单个神经网络中表示多个激发态，实现更快训练和更多状态建模。

AI中文摘要

变分蒙特卡洛（VMC）中的神经网络波函数在精确表示基态和激发态方面取得了巨大成功。然而，在状态重叠中实现足够的数值精度需要增加蒙特卡洛样本数量，从而增加计算成本，且随状态数增加。我们提出了一种近乎恒定样本量的方法——多态重要性采样（MSIS），利用来自所有状态的样本来估计成对重叠。为了高效评估所有样本的所有状态，我们引入了激发Pfaffians。受Hartree-Fock启发，该架构在单个神经网络内表示多个状态。激发Pfaffians还作为广义波函数，允许单个模型表示多态势能面。在碳二聚体上，我们匹配了$O(N_s^4)$标度的自然激发态，同时训练速度提高了$>200$倍，并建模了多50%的状态。我们有利的标度使我们能够首次使用神经网络找到铍原子的所有不同能级。最后，我们证明了单个波函数可以表示不同分子中的激发态。

英文摘要

Neural-network wave functions in Variational Monte Carlo (VMC) have achieved great success in accurately representing both ground and excited states. However, achieving sufficient numerical accuracy in state overlaps requires increasing the number of Monte Carlo samples, and consequently the computational cost, with the number of states. We present a nearly constant sample-size approach, Multi-State Importance Sampling (MSIS), that leverages samples from all states to estimate pairwise overlap. To efficiently evaluate all states for all samples, we introduce Excited Pfaffians. Inspired by Hartree-Fock, this architecture represents many states within a single neural network. Excited Pfaffians also serve as generalized wave functions, allowing a single model to represent multi-state potential energy surfaces. On the carbon dimer, we match the $O(N_s^4)$-scaling natural excited states while training $>200\times$ faster and modeling 50% more states. Our favorable scaling enables us to be the first to use neural networks to find all distinct energy levels of the beryllium atom. Finally, we demonstrate that a single wave function can represent excited states across various molecules.

2602.18982 2026-05-28 cs.LG q-bio.PE 版本更新

Conditionally Site-Independent Neural Evolution of Antibody Sequences

抗体序列的条件性位点无关神经进化

Stephen Zhewen Lu, Aakarsh Vermani, Kohei Sanno, Jiarui Lu, Frederick A Matsen, Milind Jagota, Yun S. Song

发表机构 * University of California, Berkeley ； Columbia University ； Mila - Québec AI Institute ； Fred Hutchinson Cancer Research Center ； University of Washington ； Howard Hughes Medical Institute

AI总结提出CoSiNE模型，用深度神经网络参数化的连续时间马尔可夫链桥接系统发育模型与深度学习，实现抗体序列进化建模，在零样本变异效应预测中优于现有语言模型，并引入引导吉莱斯皮采样优化抗体亲和力。

Comments 28 pages, 15 figures. Accepted as a poster at ICML 2026

AI中文摘要

常见的抗体工程深度学习方法侧重于建模序列的边缘分布。然而，这些方法将序列视为独立样本，忽略了亲和力成熟作为抗体探索潜在适应度景观的进化过程中丰富且很大程度上未开发的信息来源。相比之下，经典的系统发育模型明确表示进化动力学，但缺乏捕捉复杂上位相互作用的表达能力。我们通过CoSiNE（一种由深度神经网络参数化的连续时间马尔可夫链）弥合了这一差距。数学上，我们证明CoSiNE提供了难以处理的顺序点突变过程的一阶近似，以分支长度二次方的误差界捕捉上位效应。实验上，CoSiNE通过明确区分选择与上下文依赖的体细胞超突变，在零样本变异效应预测中优于最先进的语言模型。最后，我们引入了引导吉莱斯皮（Guided Gillespie），一种在推理时引导CoSiNE的分类器引导采样方案，从而实现对特定抗原的抗体结合亲和力的高效优化。

英文摘要

Common deep learning approaches for antibody engineering focus on modeling the marginal distribution of sequences. By treating sequences as independent samples, however, these methods overlook affinity maturation as a rich and largely untapped source of information about the evolutionary process by which antibodies explore the underlying fitness landscape. In contrast, classical phylogenetic models explicitly represent evolutionary dynamics but lack the expressivity to capture complex epistatic interactions. We bridge this gap with CoSiNE, a continuous-time Markov chain parameterized by a deep neural network. Mathematically, we prove that CoSiNE provides a first-order approximation to the intractable sequential point mutation process, capturing epistatic effects with an error bound that is quadratic in branch length. Empirically, CoSiNE outperforms state-of-the-art language models in zero-shot variant effect prediction by explicitly disentangling selection from context-dependent somatic hypermutation. Finally, we introduce Guided Gillespie, a classifier-guided sampling scheme that steers CoSiNE at inference time, enabling efficient optimization of antibody binding affinity toward specific antigens.
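
For readers unfamiliar with the sequential point-mutation process mentioned in the abstract, the sketch below simulates one with the standard (unguided) Gillespie algorithm. It is an illustration under stated assumptions: `ALPHABET`, `toy_rates`, and `gillespie` are hypothetical names, the hand-written rate function stands in for a learned rate model, and the classifier guidance of Guided Gillespie is not implemented.

```python
import random

ALPHABET = "ACDE"  # toy alphabet, not the 20 amino acids

def toy_rates(seq):
    """Hypothetical stand-in for a learned rate model.

    Returns the rate of every (site, new_letter) point mutation given the
    whole current sequence, so rates are context dependent.
    """
    rates = {}
    for i, current in enumerate(seq):
        # sites at or next to an 'A' mutate twice as fast
        near_a = "A" in seq[max(0, i - 1):i + 2]
        for letter in ALPHABET:
            if letter != current:
                rates[(i, letter)] = 2.0 if near_a else 1.0
    return rates

def gillespie(seq, branch_length, rate_fn, rng):
    """Simulate point mutations one at a time along a branch."""
    t, seq = 0.0, list(seq)
    while True:
        rates = rate_fn("".join(seq))
        total = sum(rates.values())
        t += rng.expovariate(total)  # waiting time to the next mutation
        if t > branch_length:
            return "".join(seq)
        events = list(rates)
        weights = [rates[e] for e in events]
        site, letter = rng.choices(events, weights=weights)[0]
        seq[site] = letter  # apply the mutation, then recompute rates

child = gillespie("ACDE", branch_length=0.3, rate_fn=toy_rates,
                  rng=random.Random(0))
```

Because the rates are recomputed after every mutation, each event can change the rates at all other sites, which is the coupling that a site-independent approximation over a short branch ignores.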

2603.13283 2026-05-28 cs.NE cs.LG 版本更新

Bullet Trains: Parallelizing Training of Temporally Precise Spiking Neural Networks

子弹列车：并行训练时间精确的脉冲神经网络

Todd Morrill, Christian Pehle, Anthony Zador

发表机构 * Columbia University（哥伦比亚大学）； Cold Spring Harbor Laboratory（冷泉港实验室）

AI总结提出使用并行关联扫描和可微脉冲时间求解器，实现精确硬重置动力学下的脉冲神经网络高效训练，在GPU上获得高达44倍加速。

Comments Published as a conference paper at ICML 2026

AI中文摘要

连续时间、事件原生的脉冲神经网络（SNN）严格在脉冲事件上运行，将脉冲时间和顺序视为表示，而非时间离散化的产物。这种观点与生物计算、事件传感器和神经形态处理器的原生分辨率一致，同时使计算和内存随事件数量扩展。然而，两个挑战阻碍了实用的、端到端可训练的事件型SNN系统：1）精确的充电-发放-重置动力学强制对输入脉冲进行顺序处理，2）必须在不使用时间箱的情况下求解精确的脉冲时间。我们解决了这两个问题。首先，我们使用并行关联扫描一次消耗多个输入脉冲，在保留精确硬重置动力学的同时，相比顺序模拟实现了高达44倍的加速。其次，我们实现了可微脉冲时间求解器，无需离散时间近似或限制性解析假设即可计算机器精度的脉冲时间。我们在GPU上使用四个基于事件的数据集，展示了用我们的解决方案训练SNN的可行性。

英文摘要

Continuous-time, event-native spiking neural networks (SNNs) operate strictly on spike events, treating spike timing and ordering as the representation rather than an artifact of time discretization. This viewpoint aligns with biological computation and with the native resolution of event sensors and neuromorphic processors, while enabling compute and memory that scale with the number of events. However, two challenges hinder practical, end-to-end trainable event-based SNN systems: 1) exact charge--fire--reset dynamics impose inherently sequential processing of input spikes, and 2) precise spike times must be solved without time bins. We address both. First, we use parallel associative scans to consume multiple input spikes at once, yielding up to 44x speedups over sequential simulation while retaining exact hard-reset dynamics. Second, we implement differentiable spike time solvers that compute spike times to machine-precision without discrete-time approximations or restrictive analytic assumptions. We demonstrate the viability of training SNNs using our solutions on four event-based datasets on GPUs.
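
The idea of consuming several input spikes at once with an associative scan can be illustrated on the subthreshold part of the dynamics alone. The sketch below assumes each input spike applies an affine update `v -> a*v + b` (decay since the previous spike, then a synaptic jump); it does not model threshold crossings or the hard reset that the paper handles, and all names and values are hypothetical.

```python
def combine(left, right):
    """Compose two affine updates v -> a*v + b, applying `left` first."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)

def sequential_scan(elems):
    """Reference: fold the updates one input spike at a time."""
    out, acc = [], (1.0, 0.0)  # identity update
    for e in elems:
        acc = combine(acc, e)
        out.append(acc)
    return out

def tree_scan(elems):
    """Inclusive scan by recursive halving; the two halves are independent,
    so they could be evaluated in parallel."""
    if len(elems) <= 1:
        return list(elems)
    mid = len(elems) // 2
    left, right = tree_scan(elems[:mid]), tree_scan(elems[mid:])
    carry = left[-1]
    return left + [combine(carry, r) for r in right]

# One (decay, jump) pair per input spike. The values are dyadic so that
# both groupings give bit-identical floating-point results.
spikes = [(0.5, 1.0), (0.25, 2.0), (0.5, 0.5), (0.125, 4.0)]
v0 = 0.0
potentials = [a * v0 + b for a, b in tree_scan(spikes)]
```

The scan is valid because `combine` is associative: composing affine maps gives the same result under any grouping, which is what lets a parallel scan replace the step-by-step loop.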

2603.12824 2026-05-28 cs.IR cs.CV cs.LG 版本更新

NanoVDR: Distilling a 2B Vision-Language Retriever into a 70M Text-Only Encoder for Visual Document Retrieval

NanoVDR：将20亿参数的视觉语言检索器蒸馏为7000万参数的纯文本编码器用于视觉文档检索

Zhuchenyang Liu, Yao Zhang, Yu Xiao

发表机构 * Aalto University（阿尔托大学）

AI总结利用查询-文档不对称性，通过蒸馏将20亿参数的视觉语言模型教师蒸馏为7000万参数的纯文本学生编码器，采用点态余弦对齐目标，实现视觉文档检索的高效推理。

AI中文摘要

基于视觉语言模型（VLM）的检索器已将视觉文档检索（VDR）提升到令人印象深刻的水平。它们需要相同的数十亿参数编码器用于文档索引和查询编码，即使对于纯文本查询也会导致高延迟和GPU依赖。我们观察到这种设计是不必要对称的：文档在视觉上复杂且需要强大的视觉理解，而查询只是短文本字符串。NanoVDR利用这种查询-文档不对称性，解耦两个编码路径：冻结的20亿VLM教师离线索引文档，而蒸馏后的纯文本学生（小至6900万参数）在推理时编码查询。关键设计选择是蒸馏目标。通过对三个骨干网络和22个ViDoRe基准数据集的六个目标进行系统比较，我们发现查询文本上的点态余弦对齐始终优于基于排序和对比的替代方案，同时在训练期间仅需要预缓存的教师查询嵌入，无需处理文档。此外，我们识别出跨语言迁移是主要性能瓶颈，并通过使用机器翻译的查询增强训练数据廉价地解决它。最终的NanoVDR-S-Multi（DistilBERT，6900万）保留了教师质量的95.1%，在v2和v3上以32倍更少的参数和50倍更低的CPU查询延迟优于DSE-Qwen2（20亿），总训练成本低于13 GPU小时。

英文摘要

Vision-Language Model (VLM) based retrievers have advanced visual document retrieval (VDR) to impressive quality. They require the same multi-billion parameter encoder for both document indexing and query encoding, incurring high latency and GPU dependence even for plain-text queries. We observe that this design is unnecessarily symmetric: documents are visually complex and demand strong visual understanding, whereas queries are just short text strings. NanoVDR exploits this query-document asymmetry by decoupling the two encoding paths: a frozen 2B VLM teacher indexes documents offline, while a distilled text-only student as small as 69M parameters encodes queries at inference. The key design choice is the distillation objective. Through systematic comparison of six objectives across three backbones and 22 ViDoRe benchmark datasets, we find that pointwise cosine alignment on query text consistently outperforms ranking-based and contrastive alternatives, while requiring only pre-cached teacher query embeddings and no document processing during training. Furthermore, we identify cross-lingual transfer as the primary performance bottleneck, and resolve it cheaply by augmenting training data with machine-translated queries. The resulting NanoVDR-S-Multi (DistilBERT, 69M) retains 95.1% of teacher quality and outperforms DSE-Qwen2 (2B) on v2 and v3 with 32$\times$ fewer parameters and 50$\times$ lower CPU query latency, at a total training cost under 13 GPU-hours.
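
A pointwise cosine alignment objective of the kind named in the abstract might look like the sketch below. The abstract does not give the formula, so the `1 - cos` form, the function names, and the toy embeddings are assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm

def cosine_alignment_loss(student_embs, teacher_embs):
    """Mean of (1 - cos) over matched student/teacher query embeddings.

    Each query is compared only with its own teacher embedding: no
    negatives, no ranking over documents.
    """
    pairs = list(zip(student_embs, teacher_embs))
    return sum(1.0 - cosine(s, t) for s, t in pairs) / len(pairs)

teacher = [[1.0, 0.0], [0.0, 2.0]]  # stand-in for pre-cached teacher query embeddings
student = [[3.0, 0.0], [2.0, 0.0]]  # stand-in for student outputs on the same queries
loss = cosine_alignment_loss(student, teacher)
```

Because the loss touches only query embeddings, the teacher side can be computed once and cached, which matches the abstract's point that no document processing is needed during training.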

2603.12344 2026-05-28 cs.LG 版本更新

Can Decision Trees Teach Large Language Models? Distilling Verbalized Knowledge for Molecular Property Prediction

决策树能否教会大语言模型？为分子性质预测提炼语言化知识

Khiem Le, Sreejata Dey, Marcos Martínez Galindo, Vanessa Lopez, Ting Hua, Nitesh V. Chawla, Hoang Thanh Lam

发表机构 * University of Notre Dame（圣母大学）； IBM Research（IBM研究院）

AI总结提出TreeKD方法，通过将基于决策树/随机森林的专业模型知识语言化并融入提示，训练大语言模型，显著提升其在分子性质预测任务上的性能。

AI中文摘要

分子性质预测（MPP）是药物发现中的一个基本问题，近年来受到越来越多的关注。大语言模型（LLMs）以其跨领域的惊人能力而闻名，有望成为MPP的通用模型。然而，它们目前的性能仍低于实际应用所需的阈值。为了弥补这一差距，我们提出了TreeKD，用于将基于树的专业模型的知识提炼到LLMs中，以补充LLMs的内部知识并提高其预测准确性。对于每个性质，我们使用输入分子中4万个官能团衍生的特征训练一个专业决策树。然后，将决策树学习到的预测规则（编码了其知识）语言化，并纳入用于训练LLMs的提示中。此外，通过用随机森林替换单个决策树，我们引入了一种称为规则一致性的测试时缩放技术，该技术聚合了从不同规则构建的不同提示生成的预测。使用两个LLM（Gemma-2-2B和Granite-3.3-2B）在包含22个预测任务的TDC基准上进行的大量评估表明，我们的方法显著提高了LLMs的性能，推动了MPP通用模型的发展。

英文摘要

Molecular Property Prediction (MPP) is a fundamental problem in drug discovery that has recently attracted growing attention. Large Language Models (LLMs), known for their impressive proficiency across domains, show promise as generalist models for MPP. However, their current performance remains below the threshold needed for practical adoption. To bridge this gap, we propose TreeKD for distilling the knowledge of tree-based specialist models into LLMs to complement the internal knowledge of LLMs and improve their predictive accuracy. For each property, we train a specialist decision tree using features derived from 40K functional groups in the input molecules. Then, the predictive rule learned by the decision tree, which encodes its knowledge, is verbalized and incorporated into the prompts for training LLMs. In addition, by replacing a single decision tree with a Random Forest, we introduce a test-time scaling technique called rule-consistency, which aggregates predictions generated from different prompts constructed with different rules. An extensive evaluation with two LLMs, Gemma-2-2B and Granite-3.3-2B, on the TDC benchmark with 22 prediction tasks shows that our method substantially enhances the performance of LLMs, advancing the development of generalist models for MPP.
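
The two steps in the abstract, verbalizing a tree rule and aggregating predictions across rules, can be sketched as below. The abstract does not specify the rule format or the aggregation rule, so this sketch assumes the rule is the decision path taken by one molecule and that rule-consistency is a majority vote; the toy tree, feature names, and function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy tree over functional-group presence features.
TREE = {
    "feature": "group_A",
    "yes": {
        "feature": "group_B",
        "yes": {"label": "positive"},
        "no": {"label": "negative"},
    },
    "no": {"label": "negative"},
}

def verbalize_path(tree, molecule_features):
    """Follow the tree for one molecule and describe the decision path in words."""
    clauses, node = [], tree
    while "label" not in node:
        present = molecule_features[node["feature"]]
        state = "present" if present else "absent"
        clauses.append(f"{node['feature']} is {state}")
        node = node["yes" if present else "no"]
    return f"Because {' and '.join(clauses)}, the tree predicts {node['label']}."

def rule_consistency(predictions):
    """Majority vote over answers produced from prompts built with different rules."""
    return Counter(predictions).most_common(1)[0][0]

molecule = {"group_A": True, "group_B": False}
prompt_rule = verbalize_path(TREE, molecule)
vote = rule_consistency(["negative", "positive", "negative"])
```

With a Random Forest, each tree would yield its own verbalized rule and hence its own prompt; the vote then combines the LLM's answers across those prompts.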

2603.10961 2026-05-28 cs.LG 版本更新

Bio-Inspired Self-Supervised Learning for Wrist-worn Accelerometer Data

生物启发的自监督学习用于腕戴式加速度计数据

Prithviraj Tarale, Kiet Chu, Abhishek Varghese, Kai-Chun Liu, Maxwell A. Xu, Mohit Iyyer, Sunghoon I. Lee

发表机构 * College of Information and Computer Sciences, University of Massachusetts, Amherst, United States（信息与计算机科学学院，马萨诸塞大学阿默斯特分校）； Google Health, Seattle, United States（谷歌健康，西雅图，美国）； Department of Computer Science, University of Maryland, College Park, United States（计算机科学系，马里兰大学学院公园分校）； Stevens Institute of Technology, Hoboken, United States（史蒂文斯理工学院，霍博肯，美国）

AI总结提出基于子运动（submovement）理论的令牌化策略，通过掩码重建预训练Transformer编码器，在六个HAR基准上超越现有自监督方法。

AI中文摘要

可穿戴加速度计能够实现大规模健康监测，但学习鲁棒的人体活动表示受到标记数据稀缺的限制。虽然自监督学习提供了一种解决方案，但现有方法将传感器流视为非结构化时间序列，忽略了人体运动的潜在生物结构，我们认为这一因素对于有效的人类活动识别（HAR）至关重要。我们引入了一种新颖的令牌化策略，该策略基于运动控制的子运动理论，该理论认为连续的手腕运动由称为子运动的基本基函数组成。我们将令牌定义为运动片段，这是一个计算上可处理的运动单元，由有限序列的子运动组成。通过掩码重建这些令牌来预训练Transformer编码器，我们将学习焦点从局部波形形态转移到高层次的结构和时间组织。在NHANES语料库（约28k小时；11k参与者）上预训练后，我们的表示在六个受试者分离的HAR基准上优于强大的可穿戴SSL基线。代码和预训练权重可在https://prithvitarale.github.io/biopm-site/获取。

英文摘要

Wearable accelerometers enable large-scale health monitoring, yet learning robust human-activity representations has been constrained by scarce labeled data. While self-supervised learning offers a remedy, existing methods treat sensor streams as unstructured time series, overlooking the underlying biological structure of human movement, a factor we argue is critical for effective Human Activity Recognition (HAR). We introduce a novel tokenization strategy grounded in the submovement theory of motor control, which posits that continuous wrist motion is composed of elementary basis functions called submovements. We define our token as the movement segment, a computationally tractable unit of motion composed of a finite sequence of submovements. By pretraining a Transformer encoder via masked reconstruction of these tokens, we shift the learning focus from local waveform morphology to high-level structural and temporal organization. Pretrained on the NHANES corpus (approximately 28k hours; 11k participants), our representations outperform strong wearable SSL baselines across six subject-disjoint HAR benchmarks. Code and pretrained weights are available at https://prithvitarale.github.io/biopm-site/.

2603.02702 2026-05-28 cs.AI cs.LG 版本更新

FinTexTS: Financial Text-Paired Time-Series Dataset via Semantic-Based and Multi-Level Pairing

FinTexTS: 基于语义和多层级配对的金融文本-时间序列数据集

Jaehoon Lee, Suhwan Park, Taeyoon Lim, Seunghan Lee, Jun Seo, Dongwan Kang, Hwanil Choi, Minjae Kim, Sungdong Yoo, Soonyoung Lee, Yongjae Lee, Wonbin Ahn

发表机构 * LG AI Research（LG人工智能研究所）； Ulsan National Institute of Science and Technology（蔚山科学技术院）

AI总结提出基于语义和多层级配对的框架，从SEC文件和新闻中提取并匹配多层级文本信息，构建大规模文本配对的股票价格数据集FinTexTS，提升股价预测性能。

Comments 12 pages, KDD 2026, Datasets and Benchmarks Track

AI中文摘要

金融领域涉及多种重要的时间序列问题。近年来，联合利用文本和数值信息的时间序列分析方法越来越受到关注。因此，人们做出了大量努力来构建金融领域中的文本配对时间序列数据集。然而，金融市场具有复杂的相互依赖性，一家公司的股票价格不仅受公司特定事件的影响，还受其他公司事件和更广泛的宏观经济因素的影响。现有的基于简单关键词匹配的文本与金融时间序列数据配对方法往往无法捕捉这种复杂关系。为了解决这一局限性，我们提出了一种基于语义和多层级的配对框架。具体来说，我们从SEC文件中提取目标公司的特定上下文，并应用基于嵌入的匹配机制，根据该上下文检索语义相关的新闻文章。此外，我们使用大语言模型（LLMs）将新闻文章分为四个层级（宏观层级、行业层级、相关公司层级和目标公司层级），实现新闻文章与目标公司的多层级配对。将该框架应用于公开可用的新闻数据集，我们构建了FinTexTS，这是一个新的大规模文本配对的股票价格数据集。在FinTexTS上的实验结果表明，我们的基于语义和多层级的配对策略在股价预测中是有效的。除了FinTexTS所依赖的公开新闻外，我们还表明，将我们的方法应用于专有但精心策划的新闻源，可以产生更高质量的配对数据，并提高股价预测性能。

英文摘要

The financial domain involves a variety of important time-series problems. Recently, time-series analysis methods that jointly leverage textual and numerical information have gained increasing attention. Accordingly, numerous efforts have been made to construct text-paired time-series datasets in the financial domain. However, financial markets are characterized by complex interdependencies, in which a company's stock price is influenced not only by company-specific events but also by events in other companies and broader macroeconomic factors. Existing approaches that pair text with financial time-series data based on simple keyword matching often fail to capture such complex relationships. To address this limitation, we propose a semantic-based and multi-level pairing framework. Specifically, we extract company-specific context for the target company from SEC filings and apply an embedding-based matching mechanism to retrieve semantically relevant news articles based on this context. Furthermore, we classify news articles into four levels (macro-level, sector-level, related company-level, and target company-level) using large language models (LLMs), enabling multi-level pairing of news articles with the target company. Applying this framework to publicly-available news datasets, we construct FinTexTS, a new large-scale text-paired stock price dataset. Experimental results on FinTexTS demonstrate the effectiveness of our semantic-based and multi-level pairing strategy in stock price forecasting. In addition to publicly-available news underlying FinTexTS, we show that applying our method to proprietary yet carefully curated news sources leads to higher-quality paired data and improved stock price forecasting performance.

2603.08761 2026-05-28 stat.ML cs.LG 版本更新

No Certificate for Alignment: Two Independent Impossibilities and the Pareto Frontier of Achievable Safety Guarantees

对齐无证书：两个独立的不可行性与可实现安全保证的帕累托前沿

Ayushi Agarwal

发表机构 * Independent Researcher（独立研究者）

AI总结本文通过两个独立的不可行性定理证明，在标准计算复杂性和学习理论假设下，对开放或无界输入域的AI对齐进行形式化认证是不可能的，并刻画了可实现的安全保证的帕累托前沿。

AI中文摘要

我们论证，在计算复杂性和学习理论的标准假设下，对开放或无界输入域上的AI对齐进行形式化认证是不可能的，并刻画了仍可实现的内容。两个结构独立的不可行性定理支持这一立场。语义障碍（定理1）：判断一个系统是否在整个输入域上满足任何非平凡的对齐性质，对于前馈网络是NP难的，对于图灵完备架构是不可判定的——这是神经网络验证复杂性和Rice定理的直接推论。统计障碍（定理2）：任何既正确又易处理的验证过程无法在整个输入域上满足完备性——这是从有限观测中认证无限域性质的不可能性的直接推论。这两个定理共同蕴含一个三难困境：没有过程能同时满足正确性（没有未对齐系统被认证）、完备性（没有对齐系统被拒绝）和易处理性（多项式运行时间）。每对性质可同时实现，但三者不可兼得。我们将这些结果整合为一个包含两个结构独立障碍的联合框架，证明它们的独立性，并通过构造性的覆盖间隙下界定量刻画可实现的帕累托前沿。

英文摘要

We argue that formal certification of AI alignment over open-ended or unbounded input domains is impossible under standard assumptions in computational complexity and learning theory, and characterise what remains achievable. Two structurally independent impossibility theorems support this position. The semantic barrier (Theorem 1): deciding whether a system satisfies any non-trivial alignment property over the full input domain is NP-hard for feedforward networks and undecidable for Turing-complete architectures -- a direct consequence of neural-network verification complexity and Rice's Theorem. The statistical barrier (Theorem 2): any verification procedure that is both sound and tractable cannot satisfy Completeness over the full input domain -- a direct consequence of the impossibility of certifying infinite-domain properties from finite observations. These two theorems jointly entail a trilemma: no procedure can simultaneously satisfy soundness (no misaligned system is certified), completeness (no aligned system is rejected), and tractability (polynomial runtime). Each pair is simultaneously achievable; all three are not. We combine these results as a joint framework of two structurally independent barriers, prove their independence, and characterise the achievable Pareto frontier quantitatively via a constructive coverage-gap lower bound.

2601.21309 2026-05-28 cs.LG 版本更新

Transferable Graph Condensation from the Causal Perspective

从因果视角的可迁移图压缩

Huaming Du, Yijie Huang, Su Yao, Yiying Wang, Yueyang Zhou, Jingwen Yang, Jinshi Zhang, Han Ji, Yu Zhao, Guisong Liu, Hegui Zhang, Carl Yang, Gang Kou

发表机构 * Southwestern University of Finance and Economics（西南财经大学）； Tsinghua University（清华大学）； Ant Group（蚂蚁集团）； Dongbei University of Finance and Economics（东北财经大学）； Emory University（埃默里大学）； Xiangjiang Laboratory（湘江实验室）

AI总结提出基于因果不变性的可迁移图压缩方法TGCC，通过因果干预提取域不变特征并注入压缩图，实现跨任务和跨域场景下的有效压缩。

AI中文摘要

图数据集的规模日益增大，显著提升了图表示学习方法的性能，但也带来了巨大的训练挑战。图数据集压缩技术应运而生，旨在将大规模数据集压缩为更小但信息丰富的数据集，同时保持相似的测试性能。然而，这些方法严格要求下游应用与原始数据集和任务匹配，在跨任务和跨域场景中往往失效。为解决这些挑战，我们提出了一种新颖的基于因果不变性的可迁移图数据集压缩方法，命名为TGCC，提供有效且可迁移的压缩数据集。具体而言，为保留域不变知识，我们首先通过因果干预从图的空间域提取域因果不变特征。然后，为充分捕捉原始图的结构和特征信息，我们执行增强压缩操作。最后，通过谱域增强对比学习，将因果不变特征注入压缩图，确保压缩图保留原始图的因果信息。在五个公开数据集和我们新构建的FinReport数据集上的实验结果表明，TGCC在跨任务和跨域复杂场景下相比现有方法提升高达13.41%，并在6个数据集中的5个上，在单一数据集和任务场景下达到了最先进性能。

英文摘要

The increasing scale of graph datasets has significantly improved the performance of graph representation learning methods, but it has also introduced substantial training challenges. Graph dataset condensation techniques have emerged to compress large datasets into smaller yet information-rich datasets, while maintaining similar test performance. However, these methods strictly require downstream applications to match the original dataset and task, which often fails in cross-task and cross-domain scenarios. To address these challenges, we propose a novel causal-invariance-based and transferable graph dataset condensation method, named TGCC, providing effective and transferable condensed datasets. Specifically, to preserve domain-invariant knowledge, we first extract domain causal-invariant features from the spatial domain of the graph using causal interventions. Then, to fully capture the structural and feature information of the original graph, we perform enhanced condensation operations. Finally, through spectral-domain enhanced contrastive learning, we inject the causal-invariant features into the condensed graph, ensuring that the compressed graph retains the causal information of the original graph. Experimental results on five public datasets and our novel FinReport dataset demonstrate that TGCC achieves up to a 13.41% improvement in cross-task and cross-domain complex scenarios compared to existing methods, and achieves state-of-the-art performance on 5 out of 6 datasets in the single dataset and task scenario.

URL PDF HTML ☆

赞 0 踩 0

2512.00252 2026-05-28 stat.ML cs.LG physics.ao-ph 版本更新

DAISI: Data Assimilation with Inverse Sampling using Stochastic Interpolants

DAISI：基于随机插值逆采样的数据同化

Martin Andrae, Erik Wikingsson, So Takao, Tomas Landelius, Fredrik Lindsten

发表机构 * STIMA（统计与机器学习系）； California Institute of Technology（加州理工学院）； Swedish Meteorological and Hydrological Institute（瑞典气象与水文研究所）

AI总结提出DAISI算法，利用流式生成模型实现灵活的概率推断，通过逆采样结合预报信息与观测数据，解决传统高斯近似在复杂非线性系统中的局限性。

Comments Accepted at the International Conference on Machine Learning 2026, 44 pages, 28 figures

AI中文摘要

数据同化是科学和工程应用的基石，它将模型预报与稀疏且带噪声的观测相结合，以估计潜在的系统状态。经典的高维数据同化方法，如集合卡尔曼滤波器，依赖于高斯近似，这在复杂动力学或观测算子中会被违反。为了解决这一局限性，我们引入了DAISI，一种基于流式生成模型的可扩展滤波算法，能够利用数据驱动的先验实现灵活的概率推断。核心思想是使用一个固定的、预训练好的生成先验，首先通过一种新颖的逆采样步骤融入预报信息，然后通过基于引导的条件采样同化观测。这使我们能够利用任何预报模型作为数据同化流程的一部分，而无需在每个同化步骤重新训练或微调生成先验。在具有挑战性的非线性系统上的实验表明，DAISI在稀疏、带噪声和非线性观测的情况下实现了准确的滤波结果，而传统方法在这些情况下表现不佳。DAISI的代码可在https://github.com/Erik-Wikingsson/DAISI获取。

英文摘要

Data assimilation (DA) is a cornerstone of scientific and engineering applications, combining model forecasts with sparse and noisy observations to estimate latent system states. Classical high-dimensional DA methods, such as the ensemble Kalman filter, rely on Gaussian approximations that are violated for complex dynamics or observation operators. To address this limitation, we introduce DAISI, a scalable filtering algorithm built on flow-based generative models that enables flexible probabilistic inference using data-driven priors. The core idea is to use a stationary, pre-trained generative prior that first incorporates forecast information through a novel inverse-sampling step, before assimilating observations via guidance-based conditional sampling. This allows us to leverage any forecasting model as part of the DA pipeline without having to retrain or fine-tune the generative prior at each assimilation step. Experiments on challenging nonlinear systems show that DAISI achieves accurate filtering results in regimes with sparse, noisy, and nonlinear observations where traditional methods struggle. The code for DAISI is available at https://github.com/Erik-Wikingsson/DAISI.

URL PDF HTML ☆

赞 0 踩 0

2602.22769 2026-05-28 cs.AI cs.LG 版本更新

噪声调度作为扩散训练中的信息引导分配

Gabriel Raya, Bac Nguyen, Georgios Batzolis, Yuhta Takida, Dejan Stancevic, Naoki Murata, Chieh-Hsin Lai, Yuki Mitsufuji, Luca Ambrogioni

发表机构 * Tilburg University & JADS（蒂尔堡大学及JADS）； Sony AI（索尼人工智能）； University of Cambridge（剑桥大学）； Radboud University（拉德堡德大学）； Sony Group Corporation（索尼集团公司）

AI总结提出InfoNoise，一种在线自适应噪声调度方法，通过估计条件熵率剖面动态调整训练噪声分布，以优化去噪任务中的信息增益，在图像、DNA和语言生成等任务中达到或超越基线，并节省高达3倍训练计算量。

AI中文摘要

我们引入了InfoNoise，一种用于扩散训练的在线自适应噪声调度，它将优化努力重新分配到去噪最具信息量的噪声水平上。与损失加权一起，噪声调度在去噪问题之间诱导出有效的分配，而这种分配通常在知道信息性噪声水平之前就已固定。InfoNoise通过从训练期间的去噪损失中估计条件熵率剖面，使这种分配具有数据自适应性，无需辅助模型或离线搜索。通过I--MMSE，该剖面识别出噪声观测在何处能快速减少关于干净样本的不确定性，并指导训练噪声分布的适应。它只改变这个分布，保持目标、加权和参数化不变。在图像基准测试中，调度已被广泛调整，InfoNoise匹配或略微超过强基线，并且可以用更少的更新达到相同的质量。在表示、序列和模态转换（包括DNA和语言生成）上，InfoNoise优于固定和自适应基线，并且达到目标质量所需的训练计算量最多减少3倍。这些结果确立了条件熵率剖面作为噪声调度设计的数据依赖目标，并使在线自适应成为手动调度搜索的实用替代方案。

DeepC4: 用于城市形态大规模多任务空间分解的深度条件普查约束聚类

Joshua Dimasaka, Christian Geiß, Emily So

发表机构 * Department of Architecture, University of Cambridge（剑桥大学建筑系）； Cambridge University Centre for Risk in the Built Environment（剑桥大学建筑环境风险研究中心）； Earth Observation Center, German Aerospace Center（德国航天中心地球观测中心）； Institute of Geography, University of Bonn（波恩大学地理研究所）

AI总结提出DeepC4，一种结合局部普查统计作为聚类约束并联合学习卫星图像模式的多任务深度学习方法，用于城市形态的粗到细空间分解，在卢旺达数据上优于现有方法。

Comments Major Revised Preprint Submitted to ISPRS Journal of Photogrammetry and Remote Sensing (in review) | Keywords: urban morphology, building exposure, physical vulnerability, spatial disaggregation, deep clustering | Data: https://doi.org/10.5281/zenodo.13119552 | Code: https://github.com/riskaudit/DeepC4

AI中文摘要

为了理解许多发展中经济体在可持续发展和减灾方面的全球进展，最近的两项重大举措——全球地震模型（GEM）基金会的统一非洲暴露数据集和通过地球观测常规（METEOR）项目进行暴露建模——利用来自各种卫星图像及其衍生物、建筑环境地理空间数据集以及地方级普查统计的信息，实施了经典的空间分解技术来生成城市形态的大规模映射。然而，与经过良好验证的普查统计数据的局部差异以及传播的模型不确定性仍然是这种粗到细粒度映射问题中的挑战，特别是受到弱和条件标签监督的约束。因此，我们提出了深度条件普查约束聚类（DeepC4），这是一种新颖的基于深度学习的空间分解方法，它将局部普查统计作为聚类级别的约束，同时在卫星图像模式的联合多任务学习中考虑多个条件标签关系。作为使用卢旺达城市形态的演示，DeepC4在屋顶、墙壁和高度预测上分别实现了0.63、0.78和0.45的宏F1分数，以及0.57、0.71和0.42的宏mIoU，估计的国家住宅和居住者数量与普查记录相比误差在1.13%和1.11%以内，优于GEM（2.03%和3.29%），并且在各省份中占据了比METEOR多32%-49%的500米网格像素。随着世界在2030年接近许多全球框架的结束，我们的工作提供了一种新的基于深度学习的映射技术，该技术明确编码了经过良好验证的普查和专家信念系统，以实现对现有大规模粗粒度派生信息的可解释和可解释审计。

英文摘要

To understand our global progress for sustainable development and disaster risk reduction in many developing economies, two recent major initiatives - the Uniform African Exposure Dataset of the Global Earthquake Model (GEM) Foundation and the Modelling Exposure through Earth Observation Routines (METEOR) Project - implemented classical spatial disaggregation techniques to generate large-scale mapping of urban morphology using the information from various satellite imagery and its derivatives, geospatial datasets of the built environment, and subnational census statistics. However, the local discrepancy with well-validated census statistics and the propagated model uncertainties remain a challenge in such coarse-to-fine-grained mapping problems, specifically constrained by weak and conditional label supervision. Therefore, we present Deep Conditional Census-Constrained Clustering (DeepC4), a novel deep learning-based spatial disaggregation approach that incorporates local census statistics as cluster-level constraints while considering multiple conditional label relationships in a joint multitask learning of the patterns of satellite imagery. As a demonstration using Rwandan urban morphology, DeepC4 achieves macro-F1 scores of 0.63, 0.78, and 0.45 and macro-mIoU of 0.57, 0.71, and 0.42 for roof, wall, and height prediction respectively, estimates national dwelling and occupant counts within 1.13% and 1.11% error compared to census records, outperforming GEM (2.03% and 3.29%), and occupies 32%-49% more 500-meter grid pixels than METEOR across provinces. As the world approaches the conclusion of many global frameworks in 2030, our work offers a new deep learning-based mapping technique that explicitly encodes well-validated census and experts' belief systems to achieve an explainable and interpretable auditing of existing coarse-grained derived information at large scales.

2602.13524 2026-05-28 cs.LG cs.AI (updated version)

Singular Vectors of Attention Heads Align with Features

注意力头的奇异向量与特征对齐

Gabriel Franco, Carson Loughridge, Mark Crovella

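Affiliations: Department of Computer Science, Boston University, Boston, USA; Faculty of Computing & Data Sciences, Boston University, Boston, USA

AI summary: Through theoretical analysis and experiments, this paper explains why and under what conditions the singular vectors of attention heads align with feature representations, and proposes sparse attention decomposition as a testable prediction of that alignment.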

Comments To be published in ICML 2026

AI-generated Chinese abstract

识别语言模型中的特征表示是机械可解释性的核心任务。最近的一些研究观察到，在某些情况下，可以从注意力矩阵的奇异向量中推断出特征表示。然而，这一现象缺乏合理的解释。本文探讨了这个问题：为什么以及何时奇异向量与特征对齐？首先，我们证明在可以直接观察特征的模型中，奇异向量与特征稳健地对齐。然后，我们从理论上表明，这种对齐在多种条件下是预期的。最后，我们提出如何在特征表示不可直接观察的真实模型中操作性地识别对齐。我们将稀疏注意力分解确定为对齐的一个可检验预测，并展示证据表明它在真实模型中以与预测一致的方式出现。这些结果共同表明，奇异向量与特征的对齐可以作为语言模型中特征识别的合理且有理论依据的基础。

English abstract

Identifying feature representations in language models is a central task in mechanistic interpretability. Several recent studies have made the observation that feature representations can be inferred in some cases from singular vectors of attention matrices. However, sound justification for this phenomenon is lacking. In this paper we address that question, asking: why and when do singular vectors align with features? First, we demonstrate that singular vectors robustly align with features in a model where features can be directly observed. We then show theoretically that such alignment is expected under a range of conditions. We close by asking how, operationally, alignment may be recognized in real models where feature representations are not directly observable. We identify sparse attention decomposition as a testable prediction of alignment, and show evidence that it emerges in real models in a manner consistent with predictions. Together these results suggest that alignment of singular vectors with features can be a sound and theoretically justified basis for feature identification in language models.

2602.13075 2026-05-28 cs.LG (updated version)

Unified Multi-Domain Graph Pre-training for Homogeneous and Heterogeneous Graphs via Domain-Specific Expert Encoding

统一多域图预训练：通过域特定专家编码实现同质和异质图

Chundong Liang, Yongqi Huang, Dongxiao He, Peiyuan Li, Yawen Li, Di Jin, Weixiong Zhang

Affiliations: School of Computer Science and Technology, Tianjin University, Tianjin, China; North China University of Science and Technology; School of Economics and Management, Beijing University of Posts and Telecommunications; Departments of Health Technology & Informatics and Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong

AI summary: Proposes GPH², a unified multi-domain graph pre-training method that uses domain-specific expert encoding and a task-oriented expert fusion strategy to address cross-domain distribution shift when homogeneous and heterogeneous graphs are mixed.

Comments 12 pages, 7 figures

AI-generated Chinese abstract

近年来，图预训练取得了显著成功，为下游任务提供了可迁移的表示。然而，大多数现有方法仅针对同质图或异质图设计，阻碍了跨不同类型图的统一建模。这种分离与现实应用相矛盾，因为混合的同质和异质图普遍存在，且上游预训练与下游部署之间的分布偏移很常见。在本文中，我们通过实验证明，同质和异质图预训练的平衡混合有利于下游任务，并提出了一种统一的跨同质和异质图的多域图预训练方法（GPH²）。为了解决缺乏同质和异质图统一编码器的问题，我们提出了一种统一的多视图图构建方法，无需显式的图类型特定设计即可同时编码两者。为了应对混合图带来的跨域分布差异增加，我们引入了域特定专家编码。每个专家在单个图上独立预训练以捕获域特定知识，从而保护预训练编码器免受跨域差异的不利影响。对于下游任务，我们进一步设计了一种任务导向的专家融合策略，根据专家的判别优势自适应地整合多个专家。在混合图上的大量实验表明，GPH²能够实现跨图类型和域的稳定迁移，显著优于现有的图预训练方法。

English abstract

Graph pre-training has achieved remarkable success in recent years, delivering transferable representations for downstream adaptation. However, most existing methods are designed for either homogeneous or heterogeneous graphs, thereby hindering unified graph modeling across diverse graph types. This separation contradicts real-world applications, where mixed homogeneous and heterogeneous graphs are ubiquitous, and distribution shifts between upstream pre-training and downstream deployment are common. In this paper, we empirically demonstrate that a balanced mixture of homogeneous and heterogeneous graph pre-training benefits downstream tasks and propose a unified multi-domain \textbf{G}raph \textbf{P}re-training method across \textbf{H}omogeneous and \textbf{H}eterogeneous graphs ($\mathbf{GPH^{2}}$). To address the lack of a unified encoder for homogeneous and heterogeneous graphs, we propose a Unified Multi-View Graph Construction that simultaneously encodes both without explicit graph-type-specific designs. To cope with the increased cross-domain distribution discrepancies arising from mixed graphs, we introduce domain-specific expert encoding. Each expert is independently pre-trained on a single graph to capture domain-specific knowledge, thereby shielding the pre-training encoder from the adverse effects of cross-domain discrepancies. For downstream tasks, we further design a Task-oriented Expert Fusion Strategy that adaptively integrates multiple experts based on their discriminative strengths. Extensive experiments on mixed graphs demonstrate that $\text{GPH}^{2}$ enables stable transfer across graph types and domains, significantly outperforming existing graph pre-training methods.

2602.12468 2026-05-28 cs.LG cs.FL (updated version)

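Decoupling Variance and Scale-Invariant Updates in Adaptive Gradient Descent for Unified Vector and Matrix Optimization

解耦自适应梯度下降中的方差与尺度不变更新以实现统一向量和矩阵优化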

Zitao Song, Cedar Site Bai, Zhe Zhang, Brian Bullins, David F. Gleich

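Affiliations: Department of Computer Science, Purdue University, West Lafayette, USA; Edwardson School of Industrial Engineering, Purdue University, West Lafayette, USA

AI summary: Proposes the DeVA framework, which decouples the AdaGrad update into a variance adaptation term and a scale-invariant term, unifying vector-based adaptive methods with matrix spectral optimization; it outperforms Muon and SOAP on language modeling and image classification and reduces token usage by about 6.6%.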

AI-generated Chinese abstract

像Adam这样的自适应方法已成为大规模向量和欧几里得优化的$\textit{事实}$标准，因为它们具有二阶性质的坐标适应。最近，基于矩阵的谱优化器如Muon（Jordan等人，2024b）展示了将权重矩阵视为矩阵而非长向量的威力。将这些方法联系起来是困难的，因为许多自然泛化不可行实现，而且我们也不能简单地将Adam适应移到矩阵谱上。为了解决这个问题，我们重新表述了AdaGrad更新，并将其分解为方差适应项和尺度不变项。这种解耦产生了$\textbf{DeVA}$（$\textbf{De}$coupled $\textbf{V}$ariance $\textbf{A}$daptation），一个连接基于向量的方差适应和矩阵谱优化的框架，实现了从Adam到自适应谱下降的无缝过渡。在语言建模和图像分类上的大量实验表明，DeVA持续优于Muon和SOAP（Vyas等人，2024）等最先进方法，减少了约6.6%的token使用。理论上，我们证明方差适应项有效改善了块状平滑性，促进了更快的收敛。我们的实现可在https://github.com/Tsedao/Decoupled-Variance-Adaptation获取。

English abstract

Adaptive methods like Adam have become the $\textit{de facto}$ standard for large-scale vector and Euclidean optimization due to their coordinate-wise adaptation with a second-order nature. More recently, matrix-based spectral optimizers like Muon (Jordan et al., 2024b) show the power of treating weight matrices as matrices rather than long vectors. Linking these is hard because many natural generalizations are not feasible to implement, and we also cannot simply move the Adam adaptation to the matrix spectrum. To address this, we reformulate the AdaGrad update and decompose it into a variance adaptation term and a scale-invariant term. This decoupling produces $\textbf{DeVA}$ ($\textbf{De}$coupled $\textbf{V}$ariance $\textbf{A}$daptation), a framework that bridges between vector-based variance adaptation and matrix spectral optimization, enabling a seamless transition from Adam to adaptive spectral descent. Extensive experiments across language modeling and image classification demonstrate that DeVA consistently outperforms state-of-the-art methods such as Muon and SOAP (Vyas et al., 2024), reducing token usage by around 6.6\%. Theoretically, we show that the variance adaptation term effectively improves the blockwise smoothness, facilitating faster convergence. Our implementation is available at https://github.com/Tsedao/Decoupled-Variance-Adaptation

2505.18647 2026-05-28 cs.LG cs.AI (updated version)

STFlow: Data-Coupled Flow Matching for Geometric Trajectory Simulation

STFlow: 用于几何轨迹模拟的数据耦合流匹配

Kiet Bennema ten Brinke, Koen Minartz, Vlado Menkovski

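Affiliations: Machine Learning for Physical Sciences (ML4Sci/e) Group, Department of Mathematics & Computer Science, Eindhoven University of Technology, The Netherlands

AI summary: Proposes STFlow, a generative model based on graph neural networks and hierarchical convolutions. Using data-dependent couplings within the flow matching framework, it denoises from conditioned random walks rather than Gaussian noise, which reduces transport cost and improves training and inference efficiency; it achieves the lowest prediction errors on N-body systems, molecular dynamics, and human trajectory forecasting.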

Comments Proceedings of the 43rd International Conference on Machine Learning (ICML), Seoul, South Korea. PMLR 306, 2026, 18 pages, 12 figures

AI-generated Chinese abstract

模拟动力系统的轨迹是分子动力学、生物化学和行人动力学等广泛领域中的基本问题。机器学习已成为扩展基于物理的模拟器和直接从实验数据开发模型的宝贵工具。特别是，深度生成建模和几何深度学习的最新进展通过学习复杂的轨迹分布，同时尊重固有的置换和时间平移对称性，实现了概率模拟。然而，N体系统的轨迹通常具有对导致分岔的扰动的高敏感性，以及多尺度的时间和空间相关性。为了应对这些挑战，我们引入了STFlow（时空流），一种基于图神经网络和层次卷积的生成模型。通过在流匹配框架中引入数据依赖的耦合，STFlow从条件随机游走而非高斯噪声开始去噪。这种新颖的信息先验通过降低传输成本简化了学习任务，提高了训练和推理效率。我们在N体系统、分子动力学和人类轨迹预测上验证了我们的方法。在这些基准测试中，STFlow以更少的模拟步骤实现了最低的预测误差，并提高了可扩展性。

English abstract

Simulating trajectories of dynamical systems is a fundamental problem in a wide range of fields such as molecular dynamics, biochemistry, and pedestrian dynamics. Machine learning has become an invaluable tool for scaling physics-based simulators and developing models directly from experimental data. In particular, recent advances in deep generative modeling and geometric deep learning enable probabilistic simulation by learning complex trajectory distributions while respecting intrinsic permutation and time-shift symmetries. However, trajectories of N-body systems are commonly characterized by high sensitivity to perturbations leading to bifurcations, as well as multi-scale temporal and spatial correlations. To address these challenges, we introduce STFlow (Spatio-Temporal Flow), a generative model based on graph neural networks and hierarchical convolutions. By incorporating data-dependent couplings within the Flow Matching framework, STFlow denoises starting from conditioned random-walks instead of Gaussian noise. This novel informed prior simplifies the learning task by reducing transport cost, increasing training and inference efficiency. We validate our approach on N-body systems, molecular dynamics, and human trajectory forecasting. Across these benchmarks, STFlow achieves the lowest prediction errors with fewer simulation steps and improved scalability.
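The data-dependent coupling described in this abstract can be pictured with a minimal sketch of how a flow matching training pair might be built when the source sample is a random walk anchored at the last observed state instead of Gaussian noise. This is an illustration under stated assumptions, not the STFlow implementation: the one-dimensional trajectory, the step scale `step_std`, and the function names `random_walk_prior` and `flow_matching_pair` are all hypothetical, and the linear interpolation path is the common conditional flow matching choice rather than something the abstract specifies.

```python
import random


def random_walk_prior(start, num_steps, step_std, rng):
    """Sample a 1-D random walk that starts from the last observed position."""
    trajectory, position = [], start
    for _ in range(num_steps):
        position = position + rng.gauss(0.0, step_std)
        trajectory.append(position)
    return trajectory


def flow_matching_pair(x0, x1, t):
    """Return the interpolated point x_t and the regression target x1 - x0."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity = [b - a for a, b in zip(x0, x1)]
    return x_t, velocity


rng = random.Random(0)
last_observed = 1.0                    # hypothetical conditioning state
future = [1.1, 1.3, 1.6, 2.0]          # hypothetical ground-truth future trajectory
x0 = random_walk_prior(last_observed, len(future), step_std=0.1, rng=rng)
x_t, v = flow_matching_pair(x0, future, t=0.25)
```

Because the random walk already starts near the data, the displacement `x1 - x0` that the network has to regress is typically smaller than it would be from an unconditioned Gaussian sample, which is one way to read the abstract's claim about reduced transport cost.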

2602.03855 2026-05-28 eess.SP cs.LG (updated version)

Majorization-Minimization Networks for Inverse Problems: An Application to EEG Imaging

用于反问题的Majorization-Minimization网络：在脑电图成像中的应用

Le Minh Triet Tran, Sarah Reynaud, Ronan Fablet, Adrien Merlini, François Rousseau, Mai Quyen Pham

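Affiliations: IMT Atlantique; LaTIM U1101 INSERM; Lab-STICC UMR CNRS 6285

AI summary: Proposes a learned Majorization-Minimization framework based on bilevel optimization. By parameterizing a curvature majorant and constraining it to satisfy the MM conditions, it improves accuracy and stability on inverse problems while retaining convergence guarantees, and it outperforms deep-unrolling and meta-learning methods on EEG source imaging.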

AI-generated Chinese abstract

反问题通常是不适定的，需要具有强稳定性和收敛保证的优化方案。虽然基于学习的方法（如深度展开和元学习）取得了强大的实证性能，但它们通常缺乏对下降和曲率的显式控制，限制了鲁棒性。我们提出了一种在双层优化设置中用于反问题的学习型Majorization-Minimization（MM）框架。我们不学习完整的优化器，而是学习一个结构化的曲率主导量，该主导量控制每个MM步骤，同时保留经典的MM下降保证。该主导量由一个轻量级循环神经网络参数化，并显式约束以满足有效的MM条件。对于余弦相似度损失，我们推导出显式的曲率界限，从而得到对角主导量。当解析界限不可用时，我们依赖基于高效Hessian-向量积的谱估计来自动上界局部曲率，而无需显式形成Hessian矩阵。在脑电图源成像上的实验表明，与深度展开和元学习基线相比，该方法在准确性、稳定性和跨数据集泛化方面均有改进。

English abstract

Inverse problems are often ill-posed and require optimization schemes with strong stability and convergence guarantees. While learning-based approaches such as deep unrolling and meta-learning achieve strong empirical performance, they typically lack explicit control over descent and curvature, limiting robustness. We propose a learned Majorization-Minimization (MM) framework for inverse problems within a bilevel optimization setting. Instead of learning a full optimizer, we learn a structured curvature majorant that governs each MM step while preserving classical MM descent guarantees. The majorant is parameterized by a lightweight recurrent neural network and explicitly constrained to satisfy valid MM conditions. For cosine-similarity losses, we derive explicit curvature bounds yielding diagonal majorants. When analytic bounds are unavailable, we rely on efficient Hessian-vector product-based spectral estimation to automatically upper-bound local curvature without forming the Hessian explicitly. Experiments on EEG source imaging demonstrate improved accuracy, stability, and cross-dataset generalization over deep-unrolled and meta-learning baselines.
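To make the classical MM descent guarantee mentioned in this abstract concrete, the sketch below runs MM steps with a fixed diagonal majorant on a small least-squares problem, f(x) = 0.5 * ||Ax - b||^2. It is only an illustration: the paper learns the majorant with a recurrent network, which is not reproduced here, and the helper names (`diag_majorant`, `mm_step`) and the toy data are hypothetical. The diagonal is built from absolute row sums of A^T A, a standard choice that dominates A^T A and therefore yields a valid quadratic majorant.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def loss(A, b, x):
    residual = [ax - bi for ax, bi in zip(matvec(A, x), b)]
    return 0.5 * sum(r * r for r in residual)


def grad(A, b, x):
    residual = [ax - bi for ax, bi in zip(matvec(A, x), b)]
    return matvec(transpose(A), residual)


def diag_majorant(A):
    """Diagonal d with diag(d) - A^T A positive semidefinite (absolute row sums)."""
    At = transpose(A)
    H = [[sum(a * c for a, c in zip(ri, rj)) for rj in At] for ri in At]
    return [sum(abs(h) for h in row) for row in H]


def mm_step(A, b, x, d):
    """Minimize the quadratic majorant at x: x - diag(d)^{-1} * grad f(x)."""
    g = grad(A, b, x)
    return [xi - gi / di for xi, gi, di in zip(x, g, d)]


A = [[2.0, 1.0], [1.0, 3.0], [0.0, 1.0]]   # toy forward operator
b = [1.0, 2.0, 0.5]                        # toy measurements
d = diag_majorant(A)
x = [0.0, 0.0]
losses = [loss(A, b, x)]
for _ in range(50):
    x = mm_step(A, b, x, d)
    losses.append(loss(A, b, x))
```

Each step minimizes an upper bound that touches the objective at the current iterate, so the recorded losses cannot increase; a learned majorant has to satisfy the same domination condition to keep that property.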

2509.25582 2026-05-28 cs.LG (updated version)

Safe In-Context Reinforcement Learning

安全的上下文强化学习

Amir Moeini, Minjae Kwon, Alper Kamil Bozkurt, Yuichi Motai, Rohan Chandra, Lu Feng, Shangtong Zhang

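Affiliations: University of Virginia; Virginia Commonwealth University

AI summary: Proposes SCARED, which uses an exact-penalty dual method under the constrained Markov decision process framework to keep in-context reinforcement learning safe during parameter-update-free adaptation, maximizing reward subject to a cost constraint.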

Comments ICML 2026

AI-generated Chinese abstract

上下文强化学习（ICRL）是一种新兴的强化学习范式，其中智能体在预训练后，无需任何参数更新，仅依赖不断扩展的交互历史上下文即可适应分布外测试任务。尽管ICRL展现出令人印象深刻的泛化能力，但适应过程中的安全性尚未被探索，这限制了其在测试时行为需安全的实际部署中的应用。本文提出SCARED：基于精确惩罚对偶的安全上下文自适应强化，这是首个在约束马尔可夫决策过程框架下促进ICRL安全适应的方法。在无需参数更新的适应过程中，我们的智能体不仅最大化奖励，还将累积成本控制在用户指定的安全预算内。我们还证明智能体对安全预算有主动反应：安全预算越高，智能体行为越激进；安全预算越低，智能体行为越保守。在具有挑战性的基准测试中，SCARED始终实现安全且鲁棒的上下文适应，优于现有的ICRL和安全元强化学习基线。

English abstract

In-context reinforcement learning (ICRL) is an emerging RL paradigm where an agent, after pretraining, can adapt to out-of-distribution test tasks without any parameter updates, instead relying on an expanding context of interaction history. While ICRL has shown impressive generalization, safety during this adaptation process remains unexplored, limiting its applicability in real-world deployments where test-time behavior is expected to be safe. In this work, we propose SCARED: Safe Contextual Adaptive Reinforcement via Exact-penalty Dual, the first method that promotes safe adaptation of ICRL under the constrained Markov decision process framework. During the parameter-update-free adaptation process, our agent not only maximizes the reward but also keeps the accumulated cost within a user-specified safety budget. We also demonstrate that the agent actively reacts to the safety budget; with a higher safety budget, the agent behaves more aggressively, and with a lower safety budget the agent behaves more conservatively. Across challenging benchmarks, SCARED consistently enables safe and robust in-context adaptation, outperforming existing ICRL and safe meta-RL baselines.

2503.01829 2026-05-28 cs.CL cs.AI cs.LG cs.MA (updated version)

Persuade Me if You Can: A Framework for Evaluating Persuasion Effectiveness and Susceptibility Among Large Language Models

如果你能说服我：评估大型语言模型说服效果与易受影响性的框架

Nimet Beyza Bozdag, Shuhaib Mehri, Gokhan Tur, Dilek Hakkani-Tür

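Affiliations: University of Illinois Urbana-Champaign

AI summary: Proposes the PMIYC framework, which automatically evaluates the persuasive effectiveness and susceptibility to persuasion of LLMs through multi-agent conversations, and finds significant differences across models in both persuasiveness and resistance to persuasion.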

Comments Paper published at the ACM Conference on AI and Agentic Systems 2026

DOI: 10.1145/3786335.3813181

AI-generated Chinese abstract

大型语言模型（LLM）展现出与人类水平相当的说服能力。虽然这些能力可用于社会公益，但也存在被滥用的风险。除了关注LLM如何说服他人外，它们自身对说服的易受影响性也构成了关键的校准挑战，引发了关于鲁棒性、安全性和伦理原则遵守的问题。为了研究这些动态，我们引入了“如果你能说服我”（PMIYC），一个用于评估多智能体交互中说服力和易受影响性的自动化框架。我们的框架提供了一种可扩展的替代方案，替代了通常用于研究LLM说服的昂贵且耗时的人工标注过程。PMIYC自动进行说服者和被说服者智能体之间的多轮对话，同时衡量说服的有效性和易受影响性。我们的综合评估涵盖了多种LLM和说服场景（例如，主观和错误信息场景）。我们通过人工评估验证了框架的有效性，并展示了与先前研究中人工评估的一致性。通过PMIYC，我们发现Llama-3.3-70B和GPT-4o表现出相似的说服效果，比Claude 3 Haiku高出30%。然而，GPT-4o在对抗错误信息方面的抵抗力比Llama-3.3-70B高出50%以上。值得注意的是，o4-mini既是有效的说服者，也是抵抗的被说服者。这些发现为LLM的说服动态提供了实证见解，并有助于开发更安全的AI系统。

English abstract

Large Language Models (LLMs) demonstrate persuasive capabilities that rival human-level persuasion. While these capabilities can be used for social good, they also present risks of potential misuse. Beyond the concern of how LLMs persuade others, their own susceptibility to persuasion poses a critical alignment challenge, raising questions about robustness, safety, and adherence to ethical principles. To study these dynamics, we introduce Persuade Me If You Can (PMIYC), an automated framework for evaluating persuasiveness and susceptibility to persuasion in multi-agent interactions. Our framework offers a scalable alternative to the costly and time-intensive human annotation process typically used to study persuasion in LLMs. PMIYC automatically conducts multi-turn conversations between Persuader and Persuadee agents, measuring both the effectiveness of and susceptibility to persuasion. Our comprehensive evaluation spans a diverse set of LLMs and persuasion settings (e.g., subjective and misinformation scenarios). We validate the efficacy of our framework through human evaluations and demonstrate alignment with human assessments from prior studies. Through PMIYC, we find that Llama-3.3-70B and GPT-4o exhibit similar persuasive effectiveness, outperforming Claude 3 Haiku by 30%. However, GPT-4o demonstrates over 50% greater resistance to persuasion for misinformation compared to Llama-3.3-70B. Notably, o4-mini emerges as both an effective persuader, and a resistant persuadee. These findings provide empirical insights into the persuasive dynamics of LLMs and contribute to the development of safer AI systems.

2602.03515 2026-05-28 cs.LG cs.AI cs.DC (updated version)

Mitigating Staleness in Asynchronous Pipeline Parallelism via Basis Rotation

通过基旋转缓解异步流水线并行中的陈旧性问题

Hyunji Jung, Sungbin Shin, Namhoon Lee

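Affiliations: POSTECH

AI summary: To address gradient staleness in asynchronous pipeline parallelism, where the delay grows linearly with pipeline depth, this paper proposes basis rotation, a framework that aligns the optimizer's coordinate system with the Hessian eigenbasis so that delayed updates remain useful. It proves that basis rotation minimizes basis misalignment and reports 81.7% fewer iterations when training an LLM of up to 3B parameters.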

Comments ICML 2026

AI-generated Chinese abstract

异步流水线并行通过消除同步执行中固有的流水线气泡来最大化硬件利用率，为高效大规模分布式训练提供了一条途径。然而，这种效率提升可能会被梯度陈旧性所削弱，其中使用延迟梯度的即时模型更新会在优化过程中引入噪声。关键的是，我们发现了一个常被忽视的严重问题：这种延迟随流水线深度线性增长，从根本上破坏了该方法原本意图提供的可扩展性。我们将此问题归因于优化景观的一个特定性质：Hessian特征基与标准坐标基之间的失配，这触发了坐标自适应优化器更新轨迹中的振荡。我们识别出这些振荡导致延迟更新偏离其真实对应项，使其无法用于当前迭代。这一见解通过理论分析（包括一个表明基失配放大延迟惩罚的收敛界）和实证评估得到证实。为了解决这个问题，我们提出了基旋转，一个将优化器坐标系旋转以与Hessian特征基对齐的框架，使延迟更新保持有用。我们从理论上证明基旋转最小化基失配，从而抵消放大延迟惩罚的条件。在训练高达3B参数的LLM的实证中，与性能最佳的异步基线相比，基旋转减少了81.7%所需的迭代次数。

English abstract

Asynchronous pipeline parallelism maximizes hardware utilization by eliminating the pipeline bubbles inherent in synchronous execution, offering a path toward efficient large-scale distributed training. However, this efficiency gain can be compromised by gradient staleness, where the immediate model updates with delayed gradients introduce noise into the optimization process. Crucially, we identify a critical, yet often overlooked, pathology: this delay scales linearly with pipeline depth, fundamentally undermining the very scalability that the method originally intends to provide. We trace this pathology to a specific property of the optimization landscape: the misalignment between the Hessian eigenbasis and the standard coordinate basis, which triggers oscillations in the update trajectories of coordinate-wise adaptive optimizers. We identify that these oscillations cause delayed updates to diverge from their true counterparts, invalidating their use for current iterations. This insight is formalized through theoretical analysis, including a convergence bound showing that basis misalignment amplifies the delay penalty, and substantiated with empirical evaluation. To address this, we propose basis rotation, a framework that rotates the optimizer's coordinate system to align with the Hessian eigenbasis, keeping delayed updates useful. We theoretically demonstrate that basis rotation minimizes basis misalignment, thereby counteracting the conditions that amplify delay penalties. Empirically, in training up to a 3B-parameter LLM, basis rotation reduces the required iterations by 81.7\% compared to the best-performing asynchronous baseline.

2602.02855 2026-05-28 cs.LG cond-mat.dis-nn math.ST stat.TH (updated version)

When pre-training hurts LoRA fine-tuning: a dynamical analysis via single-index models

当预训练损害LoRA微调：基于单指标模型的动力学分析

Gibbs Nwemadji, Bruno Loureiro, Jean Barbier

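Affiliations: International School of Advanced Studies; Département d’Informatique, École Normale Supérieure, PSL & CNRS; Abdus Salam International Centre for Theoretical Physics

AI summary: Through a dynamical analysis of single-index models, this paper proves mathematically that excessive pre-training slows the convergence of LoRA fine-tuning, and characterizes how the convergence rate depends on the initial alignment and on the non-linearity of the target task.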

Comments 38 pages, 14 figures

AI-generated Chinese abstract

在源任务上的预训练通常被认为有助于类似下游问题的微调。本文从数学上表明，这种朴素直觉并不总是成立：过度预训练会在计算上减慢微调优化。我们研究了在单次SGD训练的单指标模型上进行低秩适应（LoRA）微调的现象。利用微调动力学的汇总统计描述，我们精确刻画了收敛率如何依赖于初始微调对齐和目标任务的非线性程度。关键结论是，即使预训练和下游任务高度对齐，强预训练也会导致搜索阶段延长并阻碍收敛。因此，我们的理论提供了一个统一图景，说明预训练强度与任务难度如何在非平凡的可处理模型中共同塑造LoRA微调的动力学和局限性。在实践方面，我们通过实验表明，我们的理论发现超越了玩具模型，在真实数据上训练的视觉变换器模型中仍然相关。

English abstract

Pre-training on a source task is usually expected to facilitate fine-tuning on similar downstream problems. In this work, we mathematically show that this naive intuition is not always true: excessive pre-training can computationally slow down fine-tuning optimization. We study this phenomenon for low-rank adaptation (LoRA) fine-tuning on single-index models trained under one-pass SGD. Leveraging a summary statistics description of the fine-tuning dynamics, we precisely characterize how the convergence rate depends on the initial fine-tuning alignment and the degree of non-linearity of the target task. The key takeaway is that even when the pre-training and downstream tasks are well aligned, strong pre-training can induce a prolonged search phase and hinder convergence. Our theory thus provides a unified picture of how pre-training strength and task difficulty jointly shape the dynamics and limitations of LoRA fine-tuning in a nontrivial tractable model. On the practical side, we empirically show that our theoretical findings extend beyond our toy model and remain relevant in the context of a vision-transformer model trained on real data.

2602.01807 2026-05-28 cs.CL cs.LG (updated version)

Sentence Curve Language Models

句子曲线语言模型

DongNyeong Heo, Taehwan Kim, Heeyoul Choi

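Affiliations: Ulsan National Institute of Science and Technology; Handong Global University

AI summary: Proposes the sentence curve representation and extends diffusion language models to predict sentence curves instead of static word embeddings, strengthening global structure modeling; it achieves state-of-the-art performance among DLMs on IWSLT14 and WMT14.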

AI-generated Chinese abstract

语言模型（LM）是现代AI系统的核心组成部分，扩散语言模型（DLM）最近已成为一种有竞争力的替代方案。这两种范式都依赖词嵌入来表示输入句子，以及骨干模型训练预测的目标句子。我们认为，这种目标词的静态嵌入对相邻词不敏感，鼓励局部准确的词预测，而全局句子结构则较少被强调。为了解决这个问题，我们提出了一种连续的句子表示，称为句子曲线，定义为一条样条曲线，其控制点影响句子中的多个词。基于这种表示，我们引入了句子曲线语言模型（SCLM），它将DLM扩展为预测句子曲线而非静态词嵌入。我们从理论上证明，句子曲线预测会引入正则化效应，促进全局结构建模，并刻画了不同句子曲线类型如何影响这种行为。实验上，SCLM在IWSLT14和WMT14上取得了DLM中的最优性能，训练稳定且无需繁重的知识蒸馏，并在LM1B上展现出与离散DLM相比有潜力的前景。

English abstract

Language models (LMs) are a central component of modern AI systems, and diffusion language models (DLMs) have recently emerged as a competitive alternative. Both paradigms rely on word embeddings not only to represent the input sentence, but also to represent the target sentence that backbone models are trained to predict. We argue that such static embedding of the target word is insensitive to neighboring words, encouraging locally accurate word prediction while global sentence structure is less emphasized. To address this, we propose a continuous sentence representation, termed sentence curve, defined as a spline curve whose control points affect multiple words in the sentence. Based on this representation, we introduce sentence curve language model (SCLM), which extends DLMs to predict sentence curves instead of the static word embeddings. We theoretically show that sentence curve prediction induces a regularization effect that promotes global structure modeling, and characterize how different sentence curve types affect this behavior. Empirically, SCLM achieves state-of-the-art performance among DLMs on IWSLT14 and WMT14, shows stable training without burdensome knowledge distillation, and demonstrates promising potential compared to discrete DLMs on LM1B.
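The idea of a spline whose control points each influence several words can be illustrated with a small sketch. It uses a Bézier curve evaluated by de Casteljau's algorithm purely for brevity; the abstract does not say which spline families the paper uses, so this choice, the two-dimensional control points, the uniform word positions, and the names `de_casteljau` and `sentence_curve` are all assumptions made for illustration.

```python
def de_casteljau(control_points, t):
    """Evaluate a Bézier curve at parameter t in [0, 1] by repeated interpolation."""
    points = [list(p) for p in control_points]
    while len(points) > 1:
        points = [
            [(1.0 - t) * a + t * b for a, b in zip(p, q)]
            for p, q in zip(points, points[1:])
        ]
    return points[0]


def sentence_curve(control_points, num_words):
    """Read one curve point per word position, spaced uniformly along the curve."""
    if num_words == 1:
        return [de_casteljau(control_points, 0.0)]
    return [
        de_casteljau(control_points, i / (num_words - 1))
        for i in range(num_words)
    ]


controls = [[0.0, 0.0], [1.0, 2.0], [3.0, 2.0], [4.0, 0.0]]
words = sentence_curve(controls, 5)

# Move a single interior control point and count how many word positions shift.
moved = [[0.0, 0.0], [1.0, 5.0], [3.0, 2.0], [4.0, 0.0]]
words_moved = sentence_curve(moved, 5)
changed = sum(1 for a, b in zip(words, words_moved) if a != b)
```

In this toy setting, perturbing one interior control point moves every interior word position while the two endpoints stay fixed, which is the kind of non-local coupling between neighboring words that a static per-word embedding target lacks.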

2602.02417 2026-05-28 cs.LG (updated version)

Trust Region Continual Learning as an Implicit Meta-Learner

信任区域持续学习作为隐式元学习器

Zekun Wang, Anant Gupta, Christopher J. MacLellan

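Affiliations: Georgia Institute of Technology

AI summary: Proposes trust region continual learning, which combines generative replay with a Fisher-metric trust region constraint to obtain an implicit meta-learning effect, and achieves the best performance on task-incremental diffusion image generation and continual diffusion-policy control.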

Comments 21 pages, 21 tables

AI-generated Chinese abstract

持续学习旨在顺序获取任务而不发生灾难性遗忘，但标准策略面临核心权衡：基于正则化的方法（如EWC）在任务最优值弱重叠时可能过度约束更新，而基于重放的方法可以保持性能但因不完美重放而漂移。我们研究了一种混合视角：\emph{信任区域持续学习}，它将生成重放与Fisher度量信任区域约束相结合。我们证明，在局部近似下，得到的更新具有MAML风格的解释，包含一个隐式内步：重放提供旧任务梯度信号（类似查询），而Fisher加权惩罚提供高效的离线曲率塑造（类似支持）。这产生了持续学习中的涌现元学习特性：模型成为初始化，在每次任务转换后快速\emph{重新收敛}到先前任务最优值，而无需显式优化双层目标。实验上，在任务增量扩散图像生成和持续扩散策略控制中，信任区域持续学习实现了最佳最终性能和保留，并且比EWC、重放和持续元学习基线更快地恢复早期任务性能。

English abstract

Continual learning aims to acquire tasks sequentially without catastrophic forgetting, yet standard strategies face a core tradeoff: regularization-based methods (e.g., EWC) can overconstrain updates when task optima are weakly overlapping, while replay-based methods can retain performance but drift due to imperfect replay. We study a hybrid perspective: \emph{trust region continual learning} that combines generative replay with a Fisher-metric trust region constraint. We show that, under local approximations, the resulting update admits a MAML-style interpretation with a single implicit inner step: replay supplies an old-task gradient signal (query-like), while the Fisher-weighted penalty provides an efficient offline curvature shaping (support-like). This yields an emergent meta-learning property in continual learning: the model becomes an initialization that rapidly \emph{re-converges} to prior task optima after each task transition, without explicitly optimizing a bilevel objective. Empirically, on task-incremental diffusion image generation and continual diffusion-policy control, trust region continual learning achieves the best final performance and retention, and consistently recovers early-task performance faster than EWC, replay, and continual meta-learning baselines.
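The Fisher-weighted penalty mentioned in this abstract can be sketched in its familiar EWC-style quadratic form. This covers only the penalty term, assuming a diagonal Fisher estimate; the generative replay component and the way the paper turns the penalty into a trust region constraint are not reproduced, and the names `fisher_penalty`, `penalized_gradient`, and `strength` are hypothetical.

```python
def fisher_penalty(theta, theta_old, fisher, strength):
    """Quadratic penalty 0.5 * strength * sum_i F_i * (theta_i - theta_old_i)^2."""
    return 0.5 * strength * sum(
        f * (t - o) ** 2 for t, o, f in zip(theta, theta_old, fisher)
    )


def penalized_gradient(task_grad, theta, theta_old, fisher, strength):
    """Gradient of the new-task loss plus the gradient of the Fisher penalty."""
    return [
        g + strength * f * (t - o)
        for g, t, o, f in zip(task_grad, theta, theta_old, fisher)
    ]


theta_old = [1.0, 0.0]        # parameters after the previous task
fisher = [4.0, 1.0]           # hypothetical diagonal Fisher estimate
theta = [2.0, 0.0]            # current parameters while learning the new task
penalty = fisher_penalty(theta, theta_old, fisher, strength=0.5)
total_grad = penalized_gradient([0.0, 0.0], theta, theta_old, fisher, strength=0.5)
```

Directions with large Fisher values are pulled back toward the old parameters more strongly, which is how the penalty shapes updates using curvature information computed offline on earlier tasks.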

2602.02259 2026-05-28 cs.LG cs.CV (updated version)

TABX: A High-Throughput Sandbox Battle Simulator for Multi-Agent Reinforcement Learning

TABX：面向多智能体强化学习的高吞吐沙盒战斗模拟器

Hayeong Lee, JunHyeok Oh, Byung-Jun Lee

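Affiliations: Department of Artificial Intelligence, Korea University, Seoul, Republic of Korea; Gauss Labs Inc., Seoul, Republic of Korea

AI summary: Proposes TABX, a JAX-based high-throughput sandbox simulator whose reconfigurable tasks and hardware acceleration support efficient research on and evaluation of multi-agent reinforcement learning.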

AI-generated Chinese abstract

环境的设计在塑造合作多智能体强化学习（MARL）算法的开发和评估中起着关键作用。虽然现有基准突出了关键挑战，但它们通常缺乏设计自定义评估场景所需的模块化。我们介绍了基于JAX的全加速战斗模拟器（TABX），这是一个专为可重构多智能体任务设计的高吞吐沙盒。TABX提供对环境参数的精细控制，允许系统地研究涌现的智能体行为和跨不同任务复杂度谱系的算法权衡。利用JAX在GPU上进行硬件加速执行，TABX实现了大规模并行化并显著降低了计算开销。通过提供一个快速、可扩展且易于定制的框架，TABX促进了复杂结构化领域中MARL智能体的研究，并作为未来研究的可扩展基础。我们的代码可在https://github.com/ku-dmlab/TABX获取。

English abstract

The design of environments plays a critical role in shaping the development and evaluation of cooperative multi-agent reinforcement learning (MARL) algorithms. While existing benchmarks highlight critical challenges, they often lack the modularity required to design custom evaluation scenarios. We introduce the Totally Accelerated Battle Simulator in JAX (TABX), a high-throughput sandbox designed for reconfigurable multi-agent tasks. TABX provides granular control over environmental parameters, permitting a systematic investigation into emergent agent behaviors and algorithmic trade-offs across a diverse spectrum of task complexities. Leveraging JAX for hardware-accelerated execution on GPUs, TABX enables massive parallelization and significantly reduces computational overhead. By providing a fast, extensible, and easily customized framework, TABX facilitates the study of MARL agents in complex structured domains and serves as a scalable foundation for future research. Our code is available at: https://github.com/ku-dmlab/TABX.

2510.02174 2026-05-28 cs.LG math.OC math.PR stat.ML (updated version)

Flatness-Aware Stochastic Gradient Langevin Dynamics

平坦感知随机梯度Langevin动力学

Stefano Bruno, Youngsik Hwang, Jaehyeon An, Sotirios Sabanis, Dong-Young Lim

Affiliations: UNIST InnoCORE AI-Space Solar Initiative, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea; Artificial Intelligence Graduate School, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea; Department of Industrial Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea; School of Mathematics, University of Edinburgh, Edinburgh, United Kingdom; Department of Mathematics, National Technical University of Athens, Athens, Greece; Archimedes, Athena Research and Innovation Centre, Marousi, Greece

AI summary: Proposes Flatness-Aware Stochastic Gradient Langevin Dynamics (fSGLD), which couples a theoretically prescribed noise scale with the inverse temperature to bias the dynamics toward flat basins while retaining computational efficiency, and supports it with non-asymptotic theoretical analysis and experiments.

Comments Accepted by ICML 2026

Journal ref: ICML 2026

AI-generated Chinese abstract

损失景观的平坦性已被广泛研究，作为理解深度学习算法行为和泛化的重要视角。受此观点启发，我们提出了平坦感知随机梯度Langevin动力学（fSGLD），这是一种一阶优化方法，在保持SGD和SGLD的计算和内存效率的同时，使其动力学偏向平坦盆地。我们提供了非渐近理论分析，表明在理论上规定的噪声尺度$σ$和逆温度$β$之间的耦合下，fSGLD以平坦偏差的吉布斯分布为目标，并给出了显式的过剩风险保证。我们在标准优化器基准、贝叶斯图像分类、不确定性量化和分布外检测上对fSGLD进行了实证评估，展示了持续强劲的性能和可靠的不确定性估计。额外实验证实了理论上规定的$β$-$σ$耦合相对于解耦选择的有效性。

English abstract

Flatness of the loss landscape has been widely studied as an important perspective for understanding the behavior and generalization of deep learning algorithms. Motivated by this view, we propose Flatness-Aware Stochastic Gradient Langevin Dynamics (fSGLD), a first-order optimization method that biases its dynamics toward flat basins while retaining the computational and memory efficiency of SGD and SGLD. We provide a non-asymptotic theoretical analysis showing that fSGLD targets a flatness-biased Gibbs distribution under a theoretically prescribed coupling between the noise scale $σ$ and the inverse temperature $β$, together with explicit excess risk guarantees. We empirically evaluate fSGLD across standard optimizer benchmarks, Bayesian image classification, uncertainty quantification, and out-of-distribution detection, demonstrating consistently strong performance and reliable uncertainty estimates. Additional experiments confirm the effectiveness of the theoretically prescribed $β$-$σ$ coupling compared to decoupled choices.
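For readers who want a concrete picture of the two quantities named in this abstract, the sketch below shows a Langevin-type update in which the gradient is evaluated at a randomly perturbed parameter vector. It is a guess at the general shape of such a method, not the fSGLD algorithm itself: evaluating the gradient at `theta + sigma * noise` is an assumption, the abstract does not state the rule that couples the noise scale and the inverse temperature, so `sigma` and `beta` are left as independent arguments, and the name `perturbed_sgld_step` is hypothetical. The injected noise term `sqrt(2 * lr / beta)` is the standard SGLD scaling.

```python
import math
import random


def perturbed_sgld_step(theta, grad_fn, lr, beta, sigma, rng):
    """One Langevin step using a gradient taken at a Gaussian-perturbed point."""
    perturbed = [t + sigma * rng.gauss(0.0, 1.0) for t in theta]
    g = grad_fn(perturbed)
    noise_scale = math.sqrt(2.0 * lr / beta)   # standard SGLD noise magnitude
    return [
        t - lr * gi + noise_scale * rng.gauss(0.0, 1.0)
        for t, gi in zip(theta, g)
    ]


def quadratic_grad(theta):
    """Gradient of the toy objective 0.5 * ||theta||^2."""
    return list(theta)


rng = random.Random(0)
theta = [1.0, -2.0]
for _ in range(200):
    theta = perturbed_sgld_step(
        theta, quadratic_grad, lr=0.05, beta=1e4, sigma=0.01, rng=rng
    )
```

With `sigma = 0` this reduces to plain SGLD, and with an infinite `beta` as well it reduces to gradient descent, so the two arguments separately control the perturbation used for flatness and the temperature of the sampler.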

2509.23074 2026-05-28 cs.LG cs.AI (updated version)

Beyond Model Ranking: Predictability-Aligned Evaluation for Time Series Forecasting

超越模型排名：时间序列预测的可预测性对齐评估

Wanjin Feng, Yuan Yuan, Jingtao Ding, Yong Li

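Affiliations: Department of Electronic Engineering, Tsinghua University, Beijing, China

AI summary: To address the problem that leaderboard-based evaluation conflates model performance with the intrinsic unpredictability of the data, this paper proposes a predictability-aligned diagnostic framework grounded in spectral coherence, comprising the SCP score and the LUR tool, and reveals predictability drift and an architectural trade-off between models.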

AI-generated Chinese abstract

在时间序列预测的AI模型日益复杂的时代，进展通常通过基准排行榜上的边际改进来衡量。然而，这种方法存在一个根本缺陷：标准评估指标混淆了模型的性能与数据的内在不可预测性。为了解决这一紧迫挑战，我们引入了一个新颖的、基于谱相干的可预测性对齐诊断框架。我们的框架有两个主要贡献：谱相干可预测性（SCP），一个计算高效（$O(N\log N)$）且任务对齐的分数，用于量化给定预测实例的固有难度；以及线性利用率（LUR），一个频率分辨的诊断工具，精确测量模型如何有效利用数据中的线性可预测信息。我们验证了框架的有效性，并利用它揭示了两个核心见解。首先，我们提供了“可预测性漂移”的首个系统性证据，表明任务的预测难度随时间剧烈变化。其次，我们的评估揭示了一个关键的架构权衡：复杂模型在低可预测性数据上表现优越，而线性模型在更可预测的任务上非常有效。我们倡导范式转变，超越简单的聚合分数，转向更具洞察力的、可预测性感知的评估，从而促进更公平的模型比较和更深入的模型行为理解。

English abstract

In the era of increasingly complex AI models for time series forecasting, progress is often measured by marginal improvements on benchmark leaderboards. However, this approach suffers from a fundamental flaw: standard evaluation metrics conflate a model's performance with the data's intrinsic unpredictability. To address this pressing challenge, we introduce a novel, predictability-aligned diagnostic framework grounded in spectral coherence. Our framework makes two primary contributions: the Spectral Coherence Predictability (SCP), a computationally efficient ($O(N\log N)$) and task-aligned score that quantifies the inherent difficulty of a given forecasting instance, and the Linear Utilization Ratio (LUR), a frequency-resolved diagnostic tool that precisely measures how effectively a model exploits the linearly predictable information within the data. We validate our framework's effectiveness and leverage it to reveal two core insights. First, we provide the first systematic evidence of "predictability drift", demonstrating that a task's forecasting difficulty varies sharply over time. Second, our evaluation reveals a key architectural trade-off: complex models are superior for low-predictability data, whereas linear models are highly effective on more predictable tasks. We advocate for a paradigm shift, moving beyond simplistic aggregate scores toward a more insightful, predictability-aware evaluation that fosters fairer model comparisons and a deeper understanding of model behavior.

2602.01203 2026-05-28 cs.CL cs.LG (updated version)

Attention Sink Forges Native MoE in Attention Layers: Sink-Aware Training to Address Head Collapse

注意力汇聚在注意力层中锻造原生MoE：针对头部坍塌的汇聚感知训练

Zizhuo Fu, Wenxuan Zeng, Runsheng Wang, Meng Li

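Affiliations: Institute for Artificial Intelligence, Peking University, Beijing; School of Integrated Circuits, Peking University, Beijing

AI summary: Shows theoretically and empirically that attention sinks naturally form a mixture-of-experts mechanism inside attention layers, and proposes a sink-aware training algorithm that mitigates head collapse and improves model performance.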

Comments 2026 International Conference on Machine Learning (ICML)

AI中文摘要

一种对抗驱动的深度学习射频指纹识别实验研究

Xinyu Cao, Bimal Adhikari, Shangqing Zhao, Jingxian Wu, Yanjun Pan

发表机构 * University of Oklahoma（俄克拉荷马大学）； University of Arkansas（阿肯色大学）

AI总结通过对抗性实验分析，发现深度学习射频指纹识别系统在域偏移下存在一致误分类行为，可被用作后门攻击，且模型在原始信号上训练会纠缠射频指纹与环境特征，产生无法通过置信度阈值等后处理缓解的攻击向量。

DOI: 10.1109/MILCOM64451.2025.11309917
Journal ref: IEEE Military Communications Conference (MILCOM), 2025

AI中文摘要

射频指纹识别通过提取无线设备独特的硬件缺陷，已成为零信任架构和超5G网络中有前景的物理层设备识别机制。特别是，深度学习方法在该领域展示了最先进的性能。然而，现有方法主要侧重于增强系统对无线环境时空变化的鲁棒性，而基于深度学习的方法的安全漏洞常被忽视。在这项工作中，我们通过对抗驱动的实验分析，系统性地研究了基于深度学习的射频指纹识别系统的安全风险。我们观察到深度学习模型在域偏移下存在一致的误分类行为，即一个设备经常被误分类为另一个特定设备。基于广泛真实实验的分析表明，这种行为可以被利用为有效的后门，使外部攻击者能够入侵系统。此外，我们证明在原始接收信号上训练深度学习模型会导致模型将射频指纹与环境及信号模式特征纠缠在一起，产生无法仅通过置信度阈值等后处理安全方法缓解的额外攻击向量。

英文摘要

Radio frequency (RF) fingerprinting, which extracts unique hardware imperfections of radio devices, has emerged as a promising physical-layer device identification mechanism in zero trust architectures and beyond 5G networks. In particular, deep learning (DL) methods have demonstrated state-of-the-art performance in this domain. However, existing approaches have primarily focused on enhancing system robustness against temporal and spatial variations in wireless environments, while the security vulnerabilities of these DL-based approaches have often been overlooked. In this work, we systematically investigate the security risks of DL-based RF fingerprinting systems through an adversarial-driven experimental analysis. We observe a consistent misclassification behavior for DL models under domain shifts, where a device is frequently misclassified as another specific one. Our analysis based on extensive real-world experiments demonstrates that this behavior can be exploited as an effective backdoor to enable external attackers to intrude into the system. Furthermore, we show that training DL models on raw received signals causes the models to entangle RF fingerprints with environmental and signal-pattern features, creating additional attack vectors that cannot be mitigated solely through post-processing security methods such as confidence thresholds.

2404.06106 2026-05-28 cs.LG 版本更新

Unifying Low Dimensional Spectra in Deep Learning

统一深度学习中的低维谱

Connall Garrod, Jonathan P. Keating

发表机构 * Mathematical Institute, University of Oxford（牛津大学数学研究所）

AI总结本文利用无约束特征模型（UFM）证明深度神经坍缩（DNC）是多种深度学习矩阵（如Hessian、梯度和权重）中低维谱结构的统一来源，并给出了特征值和特征向量的解析构造。

Comments revised version; title changed slightly. 45 pages, 20 figures. Accepted at the International Conference on Machine Learning 2026

AI中文摘要

在过参数化分类网络中，深度学习矩阵的特征谱中普遍出现低维结构。尽管理论进展旨在解释这一现象，但通常只能捕捉部分行为或依赖实践中不成立的假设。本文为几种典型的深度学习矩阵（包括Hessian、梯度和权重）的体加离群结构提供了解析解释。我们使用无约束特征模型（UFMs）——一种研究深度神经坍缩（DNC）出现的常用工具——来实现这一点。我们证明DNC是这些低维特征谱的根源，每种情况下，特征值和特征向量都可以从特征均值（DNC的表征对象）构造出来。这为深度学习中的广泛谱现象提供了统一的解析解释，并通过提供特征向量的详细分析，超越了通常仅关注特征值的经验刻画。我们证明结果对线性网络和ReLU网络均成立，并在建模语境和标准数据集上的标准深度网络架构中提供了数值验证。

英文摘要

Low dimensional structures appear ubiquitously in the eigenspectra of deep learning matrices in classification networks trained in the overparameterized regime. While theoretical advances have aimed to explain this phenomenology, they typically succeed only in capturing subsets of the full behavior or rely on assumptions that cannot hold in practice. In this work, we provide an analytic explanation for the bulk plus outlier structure of several canonical deep learning matrices, including the Hessian, gradients, and weights. We achieve this using unconstrained feature models (UFMs), a now-common tool for studying the emergence of deep neural collapse (DNC). We show that DNC is the source of these low dimensional eigenspectra: in each case, the eigenvalues and eigenvectors can be constructed from feature means, the characterizing objects of DNC. This provides a unifying analytic explanation for a wide range of spectral phenomena in deep learning and goes beyond empirical characterizations, which typically focus on eigenvalues, by providing a detailed analysis of eigenvectors. We prove that our results hold for both linear and ReLU networks and provide numerical validation in both the modeling context and standard deep-network architectures on canonical datasets.

2508.14082 2026-05-28 cs.LG 版本更新

Toward Robust Semi-supervised Regression via Dual-stream Knowledge Distillation

通过双流知识蒸馏实现鲁棒半监督回归

Ye Su, Hezhe Qiao, Wei Huang, Lin Chen

发表机构 * Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences（重庆绿色智能技术研究所，中国科学院）； Chongqing School, University of Chinese Academy of Sciences（中国科学院大学重庆学院）； Singapore Management University（新加坡管理大学）； Beijing University of Posts and Telecommunications（北京邮电大学）

AI总结针对半监督回归中未标记数据利用不足和伪标签噪声问题，提出双流知识蒸馏框架（DKD），通过蒸馏连续值知识和分布信息，并结合解耦分布对齐模块，提升回归预测的鲁棒性和样本效率。

Comments 12 pages

AI中文摘要

半监督回归（SSR）旨在预测样本的连续分数，同时减少对大规模标记数据的依赖，近年来在计算机视觉、自然语言处理、音频分析和医学分析等各种应用中引起了广泛关注。现有的SSR方法通常通过引入基于约束的正则化或序数排序来使用稀缺的标记数据训练模型，以减轻过拟合。然而，这些方法往往未能充分利用丰富的未标记样本。尽管一致性驱动的伪标签方法试图纳入未标记数据，但其性能对伪标签质量和噪声预测高度敏感。为了解决这些挑战，我们提出了一个双流知识蒸馏框架（DKD），专门为SSR设计，用于蒸馏连续值知识和分布信息。这种设计更好地保留了回归幅度信息并提高了样本效率。具体来说，在DKD中，教师模型仅使用真实标签进行优化以进行标签分布估计，而学生模型则从真实标签和教师生成的未标记数据伪目标中学习。蒸馏过程实现了有效的监督转移，使学生能够更鲁棒地利用伪标签。此外，我们引入了一个解耦分布对齐（DDA）模块，该模块分别对齐教师和学生之间的目标分布和非目标分布。为了提高非目标知识转移的可靠性，DDA包含一个方差引导的非目标分布对齐策略，该策略自适应地降低不确定的教师预测的权重，从而增强学生减轻伪标签监督中噪声的能力，并学习一个更好校准的回归预测器。

英文摘要

Semi-supervised regression (SSR), which aims to predict continuous scores for samples while reducing the reliance on large-scale labeled data, has recently attracted considerable attention across various applications, including computer vision, natural language processing, audio analysis, and medical analysis. Existing SSR methods typically train models with scarce labeled data by introducing constraint-based regularization or ordinal ranking to mitigate overfitting. However, these approaches often fail to fully exploit the abundance of unlabeled samples. Although consistency-driven pseudo-labeling methods attempt to incorporate unlabeled data, their performance is highly sensitive to pseudo-label quality and noisy predictions. To address these challenges, we propose a Dual-stream Knowledge Distillation framework (DKD), which is specifically designed for SSR to distill both continuous-valued knowledge and distributional information. This design better preserves regression magnitude information and improves sample efficiency. Specifically, in DKD, the teacher is optimized solely with ground-truth labels for label distribution estimation, while the student learns from a mixture of real labels and teacher-generated pseudo targets on unlabeled data. The distillation process enables effective supervision transfer, allowing the student to leverage pseudo labels more robustly. Furthermore, we introduce a Decoupled Distribution Alignment (DDA) module, which separately aligns the target and non-target distributions between the teacher and student. To improve the reliability of non-target knowledge transfer, DDA incorporates a variance-guided non-target distribution alignment strategy that adaptively downweights uncertain teacher predictions, thereby enhancing the student's ability to mitigate noise in pseudo-label supervision and learn a better-calibrated regression predictor.

2601.15015 2026-05-28 cs.LG 版本更新

Plug-and-Play Benchmarking of Reinforcement Learning Algorithms for Large-Scale Flow Control

强化学习算法在大规模流动控制中的即插即用基准测试

Jannis Becktepe, Aleksandra Franz, Nils Thuerey, Sebastian Peitz

发表机构 * TU Dortmund University（多特蒙德工业大学）； Lamarr Institute for Machine Learning（拉马尔机器学习研究所）； Technical University Munich（慕尼黑工业大学）； Munich Center for Machine Learning（慕尼黑机器学习中心）

AI总结提出首个完全基于PyTorch、可微分的强化学习流动控制基准套件FluidGym，通过标准化评估协议实现控制方法的系统比较。

Comments Accepted to ICML 2026. Code available at https://github.com/safe-autonomous-systems/fluidgym

AI中文摘要

强化学习（RL）在主动流动控制（AFC）中显示出有希望的结果，但由于现有研究依赖于异构的观测和驱动方案、数值设置和评估协议，该领域的进展仍然难以评估。当前的AFC基准试图解决这些问题，但严重依赖外部计算流体动力学（CFD）求解器，不是完全可微分的，并且对3D和多智能体的支持有限。为了克服这些限制，我们引入了FluidGym，这是第一个独立的、完全可微分的AFC中RL基准套件。FluidGym完全在PyTorch中构建，基于GPU加速的PICT求解器，在单个Python堆栈中运行，不需要外部CFD软件，并提供标准化的评估协议。我们展示了使用PPO、SAC、DPC和TD-MPC的基线结果，并将所有环境、数据集和训练模型作为公共资源发布。FluidGym能够系统比较控制方法，为基于学习的流动控制的未来研究建立可扩展的基础，并可在github.com/safe-autonomous-systems/fluidgym获取。

英文摘要

Reinforcement learning (RL) has shown promising results in active flow control (AFC), yet progress in the field remains difficult to assess as existing studies rely on heterogeneous observation and actuation schemes, numerical setups, and evaluation protocols. Current AFC benchmarks attempt to address these issues but heavily rely on external computational fluid dynamics (CFD) solvers, are not fully differentiable, and provide limited 3D and multi-agent support. To overcome these limitations, we introduce FluidGym, the first standalone, fully differentiable benchmark suite for RL in AFC. Built entirely in PyTorch on top of the GPU-accelerated PICT solver, FluidGym runs in a single Python stack, requires no external CFD software, and provides standardized evaluation protocols. We present baseline results with PPO, SAC, DPC, and TD-MPC, and release all environments, datasets, and trained models as public resources. FluidGym enables systematic comparison of control methods, establishes a scalable foundation for future research in learning-based flow control, and is available at github.com/safe-autonomous-systems/fluidgym.

2601.10334 2026-05-28 cs.CV cs.LG 版本更新

An analytic theory of convolutional neural network inverse problems solvers

卷积神经网络逆问题求解器的解析理论

Minh Hai Nguyen, Quoc Bao Do, Edouard Pauwels, Pierre Weiss

发表机构 * IRIT & CBI, CNRS & Université Toulouse, France ； Toulouse School of Economics, Université Toulouse Capitole, France

AI总结通过最小均方误差估计器引入平移等变性和有限感受野的归纳偏置，推导出局部等变MMSE的解析公式，并在多种逆问题、数据集和架构上验证其与神经网络输出高度一致。

Journal ref: Forty-Third International Conference on Machine Learning, 2026

AI中文摘要

监督卷积神经网络（CNN）被广泛用于解决成像逆问题，在众多应用中取得了最先进的性能。然而，尽管取得了经验上的成功，这些方法从理论角度仍缺乏理解，常被视为黑箱。为弥合这一差距，我们通过最小均方误差（MMSE）估计器的视角分析训练后的神经网络，并引入捕获CNN两个基本归纳偏置（平移等变性和通过有限感受野的局部性）的功能约束。在经验训练分布下，我们推导出这种约束变体（称为局部等变MMSE，LE-MMSE）的解析、可解释且易于计算的公式。通过在不同逆问题（去噪、修复、去卷积）、数据集（FFHQ、CIFAR-10、FashionMNIST）和架构（U-Net、ResNet、PatchMLP）上的大量数值实验，我们证明了我们的理论与神经网络输出相匹配（PSNR $\gtrsim25$dB）。此外，我们提供了对物理感知和物理无关估计器之间差异、训练（补丁）分布中高密度区域的影响以及其他因素（数据集大小、补丁大小等）影响的见解。

英文摘要

Supervised convolutional neural networks (CNNs) are widely used to solve imaging inverse problems, achieving state-of-the-art performance in numerous applications. However, despite their empirical success, these methods are poorly understood from a theoretical perspective and often treated as black boxes. To bridge this gap, we analyze trained neural networks through the lens of the Minimum Mean Square Error (MMSE) estimator, incorporating functional constraints that capture two fundamental inductive biases of CNNs: translation equivariance and locality via finite receptive fields. Under the empirical training distribution, we derive an analytic, interpretable, and tractable formula for this constrained variant, termed Local-Equivariant MMSE (LE-MMSE). Through extensive numerical experiments across various inverse problems (denoising, inpainting, deconvolution), datasets (FFHQ, CIFAR-10, FashionMNIST), and architectures (U-Net, ResNet, PatchMLP), we demonstrate that our theory matches the neural networks' outputs (PSNR $\gtrsim 25$ dB). Furthermore, we provide insights into the differences between physics-aware and physics-agnostic estimators, the impact of high-density regions in the training (patch) distribution, and the influence of other factors (dataset size, patch size, etc.).
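
To make the "MMSE under the empirical training distribution" idea concrete, here is a minimal sketch of the unconstrained case for Gaussian denoising only. If the clean signal is assumed uniform over a finite training set and the observation is y = x + N(0, sigma^2 I), the posterior mean is a softmax-weighted average of the training points. This is not the paper's LE-MMSE formula, which additionally imposes translation equivariance and locality; the function name `empirical_mmse_denoise` and the toy data are hypothetical.

```python
import math

def empirical_mmse_denoise(y, train, sigma):
    """Posterior mean E[x | y], assuming x is uniform over `train`
    and y = x + Gaussian noise with standard deviation `sigma`."""
    # Log-weights: -||y - x_i||^2 / (2 sigma^2) for each training point x_i.
    logw = [
        -sum((yi - xi) ** 2 for yi, xi in zip(y, x)) / (2.0 * sigma ** 2)
        for x in train
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]
    total = sum(w)
    # Weighted average of the training points, coordinate by coordinate.
    return [
        sum(wi * x[d] for wi, x in zip(w, train)) / total
        for d in range(len(y))
    ]

if __name__ == "__main__":
    train = [[0.0, 0.0], [1.0, 1.0]]  # toy "dataset" (assumption)
    print(empirical_mmse_denoise([0.5, 0.5], train, sigma=0.5))
    print(empirical_mmse_denoise([0.1, -0.1], train, sigma=0.1))
```

With small noise the estimate collapses onto the nearest training point, and with large noise it drifts toward the dataset mean.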

2601.01616 2026-05-28 cs.LG eess.SP 版本更新

Real Time NILM Based Power Monitoring of Identical Induction Motors Representing Cutting Machines in Textile Industry

基于实时非侵入式负荷监测的纺织行业切割机同型号感应电机功率监控

Md Istiauk Hossain Rifat, Moin Khan, Zohara Kamal, Md Borhan Uddin Khan, Mohammad Zunaed

发表机构 * Department of Electrical and Electronic Engineering, BRAC University, Dhaka, Bangladesh（电气与电子工程系，BRAC大学，达卡，孟加拉国）

AI总结针对纺织行业能源监控落后的问题，提出基于实时非侵入式负荷监测（NILM）的框架，使用MATNILM模型对同型号感应电机进行功率分解，验证了实时监控可行性并指出多台相同设备同时运行时的分解困难。

Comments 9 pages, 9 figures

AI中文摘要

孟加拉国的纺织行业是能源密集型行业之一，但其监控实践仍然大多过时，导致电力使用效率低下和运营成本高昂。为了解决这个问题，我们提出了一种基于实时非侵入式负荷监测（NILM）的框架，专为工业应用定制，重点关注代表纺织切割机的相同电机驱动负载。开发了一个包含电压和电流传感器、Arduino Mega和ESP8266的硬件装置，用于捕获总负荷和单个负荷数据，并在云平台上存储和处理。从三个相同的感应电机和辅助负载创建了一个新数据集，总计超过180,000个样本，以在具有挑战性的工业条件下评估最先进的MATNILM模型。结果表明，虽然总能量估计相当准确，但每个电器的分解面临困难，特别是当多台相同机器同时运行时。尽管存在这些挑战，集成系统通过Blynk应用程序展示了具有远程访问功能的实际实时监控。这项工作突出了NILM在工业环境中的潜力和局限性，为未来的改进提供了见解，例如更高频率的数据收集、更大规模的数据集以及用于处理相同负载的先进深度学习方法。

英文摘要

The textile industry in Bangladesh is one of the most energy-intensive sectors, yet its monitoring practices remain largely outdated, resulting in inefficient power usage and high operational costs. To address this, we propose a real-time Non-Intrusive Load Monitoring (NILM)-based framework tailored for industrial applications, with a focus on identical motor-driven loads representing textile cutting machines. A hardware setup comprising voltage and current sensors, Arduino Mega and ESP8266 was developed to capture aggregate and individual load data, which was stored and processed on cloud platforms. A new dataset was created from three identical induction motors and auxiliary loads, totaling over 180,000 samples, to evaluate the state-of-the-art MATNILM model under challenging industrial conditions. Results indicate that while aggregate energy estimation was reasonably accurate, per-appliance disaggregation faced difficulties, particularly when multiple identical machines operated simultaneously. Despite these challenges, the integrated system demonstrated practical real-time monitoring with remote accessibility through the Blynk application. This work highlights both the potential and limitations of NILM in industrial contexts, offering insights into future improvements such as higher-frequency data collection, larger-scale datasets and advanced deep learning approaches for handling identical loads.

2601.01496 2026-05-28 cs.GT cs.AI cs.LG 版本更新

The Optimal Sample Complexity of Linear Contracts

线性合约的最优样本复杂度

Mikael Møller Høgsgaard

发表机构 * Department of Statistics, University of Oxford, United Kingdom（英国牛津大学统计系）； Department of Computer Science, Aarhus University, Denmark（丹麦奥胡斯大学计算机科学系）

AI总结本文通过经验效用最大化算法，证明仅需 O(ln(1/δ)/ε²) 个样本即可实现最优线性合约的 ε-近似，并匹配下界，从而确立最优样本复杂度。

AI中文摘要

在本文中，我们解决了离线环境下从数据中学习最优线性合约的问题，其中代理人类型来自未知分布，委托人的目标是设计一个最大化其期望效用的合约。具体来说，我们的分析表明，简单的经验效用最大化（EUM）算法仅需 $O(\ln(1/\delta) / \varepsilon^2)$ 个样本，就能以至少 $1-\delta$ 的概率得到最优线性合约的 $\varepsilon$-近似。这一结果改进了先前已知的界限，并在常数因子内匹配了 Dütting 等人 2025 年的下界，从而证明了其最优性。此外，我们的结果建立了更强的一致收敛保证：以至少 $1-\delta$ 的概率，每个线性合约的经验效用都是其真实期望的 $\varepsilon$-近似，且使用相同的最优 $O(\ln(1/\delta) / \varepsilon^2)$ 样本复杂度。

英文摘要

In this paper, we settle the problem of learning optimal linear contracts from data in the offline setting, where agent types are drawn from an unknown distribution and the principal's goal is to design a contract that maximizes her expected utility. Specifically, our analysis shows that the simple Empirical Utility Maximization (EUM) algorithm yields an $\varepsilon$-approximation of the optimal linear contract with probability at least $1-\delta$, using just $O(\ln(1/\delta) / \varepsilon^2)$ samples. This result improves upon previously known bounds and matches a lower bound from Dütting et al. 2025 up to constant factors, thereby proving its optimality. Furthermore, our result establishes the stronger guarantee of uniform convergence: the empirical utility of every linear contract is an $\varepsilon$-approximation of its true expectation with probability at least $1-\delta$, using the same optimal $O(\ln(1/\delta) / \varepsilon^2)$ sample complexity.
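
As a rough illustration of the $\ln(1/\delta)/\varepsilon^2$ scaling quoted above, the sketch below evaluates the classical two-sided Hoeffding bound for estimating the mean utility of one fixed contract, with utilities assumed to lie in [0, 1]. It is only a stand-in for intuition: it does not reproduce the paper's EUM analysis, which holds uniformly over all linear contracts, and the function name `hoeffding_samples` is hypothetical.

```python
import math

def hoeffding_samples(eps: float, delta: float) -> int:
    """Smallest n with 2 * exp(-2 * n * eps^2) <= delta,
    i.e. n >= ln(2 / delta) / (2 * eps^2), for utilities in [0, 1]."""
    if not (0.0 < eps < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("eps and delta must lie in (0, 1)")
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))

if __name__ == "__main__":
    # Halving eps roughly quadruples the required sample size,
    # while shrinking delta only adds a logarithmic factor.
    for eps in (0.1, 0.05):
        print(eps, hoeffding_samples(eps, delta=0.05))
```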

2512.23959 2026-05-28 cs.CL cs.AI cs.LG 版本更新

HGMEM: Hypergraph-based Working Memory to Improve Multi-step RAG for Long-Context Complex Relational Modeling

HGMem：基于超图的工作记忆以改进长上下文复杂关系建模的多步RAG

Chulun Zhou, Chunkang Zhang, Guoxin Yu, Fandong Meng, Jie Zhou, Wai Lam, Mo Yu

发表机构 * The Chinese University of Hong Kong（香港中文大学）； Pengcheng Laboratory（鹏城实验室）； WeChat AI, Tencent（微信AI，腾讯）； University of Chinese Academy of Sciences（中国科学院大学）

AI总结提出HGMem超图工作记忆系统，通过超边表示记忆单元并渐进形成高阶交互，增强多步RAG中的全局理解和复杂推理能力。

Comments ICML 2026; Code released at https://github.com/Encyclomen/HGMem

AI中文摘要

多步检索增强生成（RAG）已成为增强大型语言模型（LLMs）在需要全局理解和密集推理任务上的广泛采用策略。尽管许多RAG系统整合了工作记忆来整合信息，但现有设计主要作为孤立事实的被动存储。这种静态特性忽略了原始事实之间的关键高阶相关性，从而限制了模型的多步推理能力，导致在扩展上下文中的碎片化推理和弱全局理解。我们引入了HGMem，一种基于超图的工作记忆系统，将记忆的概念从简单存储扩展到动态、表达性结构，用于复杂推理和全局理解。在我们的方法中，记忆被表示为超图，其中超边对应不同的记忆单元，使得记忆内高阶交互的逐步形成成为可能。该机制连接围绕焦点问题的事实和思考，将记忆演变为一个集成且情境化的知识结构，为更深层次的推理提供强有力的命题。我们在几个具有挑战性的全局理解基准上评估了HGMem。大量实验和深入分析表明，我们的方法持续改进了多步RAG，并在不同数据集上显著优于强基线系统。

英文摘要

Multi-step retrieval-augmented generation (RAG) has become a widely adopted strategy for enhancing large language models (LLMs) on tasks that demand global comprehension and intensive reasoning. Although many RAG systems incorporate a working memory to consolidate information, existing designs primarily function as a passive storage for isolated facts. This static nature overlooks crucial high-order correlations among primitive facts, thereby limiting models' capacity for multi-step reasoning and resulting in fragmented reasoning and weak global sense-making within extended contexts. We introduce HGMem, a hypergraph-based working memory system, extending the concept of memory beyond simple storage into a dynamic, expressive structure for complex reasoning and global understanding. In our approach, memory is represented as a hypergraph where hyperedges correspond to distinct memory units, enabling the progressive formation of high-order interactions within memory. This mechanism connects facts and thoughts around the focal problem, evolving the memory into an integrated and situated knowledge structure that provides strong propositions for deeper reasoning. We evaluate HGMem on several challenging global sense-making benchmarks. Extensive experiments and in-depth analyses demonstrate that our method consistently improves multi-step RAG and substantially outperforms strong baseline systems across diverse datasets.

2512.22777 2026-05-28 cs.LG cs.AI 版本更新

Adapting, Fast and Slow: On Few-Shot Transportability of Compositions

适应，快与慢：关于组合的少样本可迁移性

Kasra Jalaldoust, Elias Bareinboim

发表机构 * Causal Artificial Intelligence Lab（因果人工智能实验室）

AI总结研究在少样本场景下，通过因果传输性理论将源域学习到的因果机制组合成目标域预测器，并区分模块传输性和电路传输性，提出基于梯度松弛的电路搜索方法以实现快速或慢速适应。

AI中文摘要

跨域泛化需要连接源分布和目标分布的稳定结构。基于因果传输性理论，我们研究了一个序列预测设置，其中目标预测器可以表示为从源数据可学习的因果机制组成的电路。我们引入了两类传输性。模块传输性捕获原子情况，其中目标预测器由可从单个源域学习的机制给出。电路传输性将此思想推广到通过组合从源数据学习的多个模块获得的目标预测器，即使没有源机制直接预测目标标签，也能实现零样本预测。我们在逐渐放松的假设下研究这些电路类别。首先，我们提供了条件，在这些条件下，给定关于源域和目标域的因果知识，可以从源数据单独学习相关电路。然后，我们通过允许来自目标域的有限数据来放松这些结构假设。特别地，我们开发了一种监督域适应方案，该方案无需显式因果结构即可学习电路。由此产生的少样本保证将可实现误差与可从源数据学习的模块组成的最小目标电路的大小联系起来。最后，我们提出了符号电路搜索的基于梯度的松弛，并进行了实证评估，表明它定性地跟踪了预测的快速适应机制——有和没有中间位置的过程监督——以及当没有源机制匹配时的慢速适应。

英文摘要

Generalization across domains requires stable structure that links the source and target distributions. Building on causal transportability theory, we study a sequential prediction setting in which the target predictor can be represented as a circuit composed of causal mechanisms that are learnable from source data. We introduce two classes of transportability. Module transportability captures the atomic case, where the target predictor is given by a mechanism learnable from a single source domain. Circuit transportability generalizes this idea to target predictors obtained by composing several modules learned from source data, enabling zero-shot prediction even when no source mechanism directly predicts the target label. We study these classes of circuits under increasingly relaxed assumptions. First, we provide conditions under which the relevant circuits can be learned from source data alone, given causal knowledge about the source and target domains. We then relax these structural assumptions by allowing limited data from the target domain. In particular, we develop a supervised domain adaptation scheme that learns circuits without requiring explicit causal structure. The resulting few-shot guarantees tie the achievable error to the size of the smallest target circuit composable from modules learned from source data. Finally, we propose a gradient-based relaxation of the symbolic circuit search and evaluate it empirically, showing that it qualitatively tracks the predicted regimes of fast adaptation -- with and without process supervision over intermediate positions -- and slow adaptation when no source mechanism matches.

2501.09934 2026-05-28 cs.LG cs.AI 版本更新

HEART: Achieving Timely Multi-Model Training for Vehicle-Edge-Cloud-Integrated Hierarchical Federated Learning

HEART：实现车辆-边缘-云集成分层联邦学习的多模型及时训练

Xiaohong Yang, Minghui Liwang, Xianbin Wang, Zhipeng Cheng, Seyyedali Hosseinalipour, Huaiyu Dai, Zhenzhen Jiao

发表机构 * School of Informatics, Xiamen University（厦门大学信息学院）； Department of Control Science and Engineering, Shanghai Institute of Intelligent Science and Technology（上海智能科学与技术研究院控制科学与工程系）； State Key Laboratory of Autonomous Intelligent Unmanned Systems（自主智能无人系统国家重点实验室）； Shanghai Key Laboratory of Intelligent Autonomous Systems（上海智能自主系统重点实验室）； Frontiers Science Center for Intelligent Autonomous Systems, Ministry of Education, Tongji University（教育部智能自主系统前沿科学中心，同济大学）； Department of Electrical and Computer Engineering, Western University（韦仕敦大学电气与计算机工程系）； School of Future Science and Engineering, Soochow University（苏州大学未来科学与工程学院）

AI总结针对车辆-边缘-云分层联邦学习中多模型训练面临的模型过时、数据利用低效和资源分配不平衡问题，提出HEART框架，通过混合同步-异步聚合规则和两阶段优化算法（改进PSO+GA与贪心算法）最小化全局训练延迟并实现任务平衡。

Comments Accepted by IEEE Transactions on Cloud Computing (22 pages, 7 figures)

AI中文摘要

人工智能赋能的车联网（IoV）的快速发展需要高效的机器学习（ML）解决方案，以处理高车辆移动性和分散数据。这推动了车辆-边缘-云架构上的分层联邦学习（VEC-HFL）的出现。然而，VEC-HFL文献中尚未充分探讨的一个方面是，车辆通常需要同时执行多个ML任务，这种多模型训练环境带来了关键挑战。首先，不恰当的聚合规则可能导致模型过时和训练时间延长。其次，车辆移动性可能阻止车辆将模型返回网络边缘，导致数据利用效率低下。第三，跨不同任务实现平衡的资源分配变得至关重要，因为它极大地影响协作训练的有效性。我们率先提出一个针对动态VEC-HFL中多模型训练的框架，目标是最小化全局训练延迟，同时确保跨各种任务的平衡训练，该问题被证明是NP难的。为了促进及时模型训练，我们引入了一种混合同步-异步聚合规则。在此基础上，我们提出了一种称为混合进化与贪婪分配（HEART）的新方法。该框架分两个阶段运行：首先，通过结合改进的粒子群优化（PSO）和遗传算法（GA）的混合启发式方法实现平衡的任务调度；其次，采用低复杂度的贪心算法确定车辆上分配任务的训练优先级。在真实数据集上的实验证明了HEART相对于现有方法的优越性。

英文摘要

The rapid growth of AI-enabled Internet of Vehicles (IoV) calls for efficient Machine Learning (ML) solutions that can handle high vehicular mobility and decentralized data. This has motivated the emergence of Hierarchical Federated Learning over vehicle-edge-cloud architectures (VEC-HFL). Nevertheless, the VEC-HFL literature has underexplored the fact that vehicles often need to execute multiple ML tasks simultaneously, and this multi-model training environment introduces crucial challenges. First, improper aggregation rules can lead to model obsolescence and prolonged training times. Second, vehicular mobility may result in inefficient data utilization by preventing the vehicles from returning their models to the network edge. Third, achieving a balanced resource allocation across diverse tasks becomes paramount, as it strongly affects the effectiveness of collaborative training. We take one of the first steps towards addressing these challenges by proposing a framework for multi-model training in dynamic VEC-HFL with the goal of minimizing global training latency while ensuring balanced training across various tasks, a problem that turns out to be NP-hard. To facilitate timely model training, we introduce a hybrid synchronous-asynchronous aggregation rule. Building on this, we present a novel method called Hybrid Evolutionary And gReedy allocaTion (HEART). The framework operates in two stages: first, it achieves balanced task scheduling through a hybrid heuristic approach that combines improved Particle Swarm Optimization (PSO) and Genetic Algorithms (GA); second, it employs a low-complexity greedy algorithm to determine the training priority of assigned tasks on vehicles. Experiments on real-world datasets demonstrate the superiority of HEART over existing methods.

2512.20657 2026-05-28 cs.SI cs.LG 版本更新

Graph Neural Networks for Source Detection: A Review and Benchmark Study

图神经网络用于源检测：综述与基准研究

Martin Sterchi, Nathan Brack, Lorenz Hilfiker

发表机构 * University of Applied Sciences and Arts Northwestern Switzerland FHNW（瑞士西北应用科学与艺术大学 FHNW）； University of Zürich Switzerland（瑞士苏黎世大学）

AI总结本文系统综述了基于图神经网络的源检测方法，并在受控条件下复现和基准测试了四种代表性GNN架构，实验表明GNN在多种网络拓扑上显著优于传统方法和MLP基线。

AI中文摘要

当流行病过程在接触网络上展开时，源检测问题出现，目标是识别其起源点，即源节点。该问题的研究始于Shah和Zaman在2010年的开创性工作，他们正式定义了该问题并引入了谣言中心性的概念。随着图神经网络（GNN）的出现，多项研究提出了基于GNN的源检测方法。然而，这些工作在方法论的清晰度和可重复性方面仍有改进空间。因此，目前尚不清楚GNN在可比设置下是否真正优于更传统的源检测方法。在本文中，我们首先系统回顾了现有的基于GNN的源检测方法，清晰概述了每种方法所处理的具体设置及其采用的架构。然后，我们在受控的可比条件下，复现并基准测试了四种代表性GNN架构与多种传统和基于MLP的基线方法。我们还研究了围绕该问题的关键问题，包括可检测性如何随时间演变、性能如何随训练集大小扩展，以及方法对观测时间和流行病参数不确定性的敏感程度。我们的实验表明，GNN在多种网络拓扑上显著优于我们测试的所有其他方法。尽管我们最初旨在挑战GNN作为源检测解决方案的观点，但我们的结果反而证明了它们在此任务上的显著有效性。为确保完全可重复性，我们在GitHub上发布了所有代码和数据。最后，我们认为流行病源检测构成了评估GNN架构的一个自然且有吸引力的基准任务。

英文摘要

The source detection problem arises when an epidemic process unfolds over a contact network, and the objective is to identify its point of origin, i.e., the source node. Research on this problem began with the seminal work of Shah and Zaman in 2010, who formally defined it and introduced the notion of rumor centrality. With the emergence of Graph Neural Networks (GNNs), several studies have proposed GNN-based approaches to source detection. However, there is room to strengthen methodological clarity and reproducibility across these works. As a result, it remains unclear whether GNNs truly outperform more traditional source detection methods across comparable settings. In this paper, we first systematically review existing GNN-based methods for source detection, clearly outlining the specific settings each addresses and the architectures they employ. We then reproduce and benchmark four representative GNN architectures against a diverse set of traditional and MLP-based baselines under controlled, comparable conditions. We also investigate key questions surrounding this problem, including how detectability evolves over time, how performance scales with training set size, and how sensitive methods are to uncertainty in observation timing and epidemic parameters. Our experiments show that GNNs substantially outperform all other methods we test across a variety of network topologies. Although we initially set out to challenge the notion of GNNs as a solution to source detection, our results instead demonstrate their remarkable effectiveness for this task. To ensure full reproducibility, we release all code and data on GitHub. Finally, we argue that epidemic source detection constitutes a natural and attractive benchmark task for evaluating GNN architectures.
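
For readers unfamiliar with the rumor centrality baseline mentioned above, the following sketch computes it on a tree, where Shah and Zaman define it as n! divided by the product of all subtree sizes when the tree is rooted at the candidate source. It counts the infection orderings that could start from that node. The function name `rumor_centrality` and the toy graphs are hypothetical, and the code assumes the input edges form a connected tree; it is not taken from the benchmark's code.

```python
import math
from collections import defaultdict

def rumor_centrality(edges, source):
    """Rumor centrality of `source` in a tree given as undirected edges:
    n! / (product of subtree sizes with the tree rooted at `source`)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    n = len(adj)

    # Iterative DFS from the source; every node appears after its parent.
    parent = {source: None}
    order = []
    stack = [source]
    while stack:
        node = stack.pop()
        order.append(node)
        for nb in adj[node]:
            if nb not in parent:
                parent[nb] = node
                stack.append(nb)

    # Accumulate subtree sizes from the leaves up to the root.
    size = {node: 1 for node in order}
    for node in reversed(order):
        if parent[node] is not None:
            size[parent[node]] += size[node]

    product = 1
    for s in size.values():
        product *= s
    return math.factorial(n) // product

if __name__ == "__main__":
    path = [(0, 1), (1, 2)]  # toy path graph 0 - 1 - 2
    scores = {v: rumor_centrality(path, v) for v in (0, 1, 2)}
    print(scores, "most likely source:", max(scores, key=scores.get))
```

On the three-node path the middle node scores highest, since the infection can spread from it in two different orders but from an end node in only one.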

2212.04382 2026-05-28 stat.ML cs.LG 版本更新

Structure of Classifier Boundaries: Case Study for a Naive Bayes Classifier

分类器边界的结构：朴素贝叶斯分类器的案例研究

Alan F. Karr, Zac Bowen, Adam A. Porter, Regina Ruane

发表机构 * Temple University, Department of Statistics, Operations, and Data Science（天普大学统计学、运营与数据科学系）； Fraunhofer USA Center Mid-Atlantic（弗劳恩霍夫美国中大西洋中心）； University of Maryland, Department of Computer Science（马里兰大学计算机科学系）； University of Pennsylvania, Computational Social Science Laboratory（宾夕法尼亚大学计算社会科学实验室）

AI总结研究贝叶斯分类器在输入空间为图时的边界结构，通过邻域相似性度量分类不确定性，并应用于DNA读段分配问题。

2512.18566 2026-05-28 cs.LG cs.SY eess.SY q-bio.NC 版本更新

Comparing Dynamical Models Through Diffeomorphic Vector Field Alignment

通过微分同胚向量场对齐比较动力学模型

Ruiqi Chen, Giacomo Vedovati, Todd Braver, ShiNung Ching

发表机构 * Washington University in St. Louis（华盛顿大学圣路易斯分校）

AI总结提出DFORM框架，通过非线性坐标变换对齐两个动力系统的轨迹，评估拓扑等价性并定位高维模型中的低维动力学模式。

Comments 57 pages, 18 figures. For associated code, see https://github.com/rq-Chen/DFORM_stable

DOI: 10.1162/NECO.a.1526
Journal ref: Neural Computation (2026) 38 (6): 1006-1061

AI中文摘要

诸如递归神经网络（RNN）等动力系统模型在理论神经科学中越来越受欢迎，用于假设生成和数据分析。评估这些模型中的动力学是理解其学习到的生成机制的关键。然而，这种评估受到两个主要挑战的阻碍：首先，由于没有强制要求坐标系统等价，跨模型比较学习到的动力学很困难。其次，在高维非线性模型（如RNN）中，识别机制上重要的低维模式（例如极限集）是难以处理的。在这里，我们提出了一个全面的框架来解决这两个问题，称为学习模型的微分同胚向量场对齐（DFORM）。DFORM学习两个动力系统状态空间之间的非线性坐标变换，以最大程度地一对一地对齐它们的轨迹。通过这样做，DFORM能够评估两个模型是否表现出拓扑等价性，即尽管坐标系统不同但机制相似。该方法的一个副产品是一种在高维系统中嵌入的低维流形上定位动力学模式的方法。我们使用典型的拓扑等价系统、RNN和通过非线性流相关的系统验证了DFORM识别线性和非线性坐标变换的能力。DFORM还被证明可以提供拓扑不同系统之间的相似性量化。然后，我们证明了DFORM可以在高维模型中定位重要的动力学模式，包括不变流形和鞍极限集。最后，使用一组在人类功能性磁共振成像（fMRI）记录上训练的RNN模型，我们展示了DFORM可以从高维数据驱动模型中识别极限环，这与先前的数值分析结果一致。

英文摘要

Dynamical systems models such as recurrent neural networks (RNNs) are increasingly popular in theoretical neuroscience for hypothesis-generation and data analysis. Evaluating the dynamics in such models is key to understanding their learned generative mechanisms. However, such evaluation is impeded by two major challenges: First, comparison of learned dynamics across models is difficult because there is no enforced equivalence of their coordinate systems. Second, identification of mechanistically important low-dimensional motifs (e.g., limit sets) is intractable in high-dimensional nonlinear models such as RNNs. Here, we propose a comprehensive framework to address these two issues, termed Diffeomorphic vector field alignment FOR learned Models (DFORM). DFORM learns a nonlinear coordinate transformation between the state spaces of two dynamical systems, which aligns their trajectories in a maximally one-to-one manner. In so doing, DFORM enables an assessment of whether two models exhibit topological equivalence, i.e., similar mechanisms despite differences in coordinate systems. A byproduct of this method is a means to locate dynamical motifs on low-dimensional manifolds embedded within higher-dimensional systems. We verified DFORM's ability to identify linear and nonlinear coordinate transformations using canonical topologically equivalent systems, RNNs, and systems related by nonlinear flows. DFORM was also shown to provide a quantification of similarity between topologically distinct systems. We then demonstrated that DFORM can locate important dynamical motifs including invariant manifolds and saddle limit sets within high-dimensional models. Finally, using a set of RNN models trained on human functional MRI (fMRI) recordings, we illustrated that DFORM can identify limit cycles from high-dimensional data-driven models, which agreed well with prior numerical analysis.

2512.17375 2026-05-28 cs.LG cs.CL cs.CR 版本更新

AdvJudge-Zero: Binary Decision Flips in LLM-as-a-Judge via Adversarial Control Tokens

AdvJudge-Zero：通过对抗控制令牌在LLM作为评判者中实现二元决策翻转

Tung-Ling Li, Yuhao Wu, Hongliang Liu

发表机构 * Palo Alto Networks（帕洛阿尔托网络公司）

AI总结本文提出AdvJudge-Zero方法，通过从评判模型自身分布中采样低困惑度令牌，无需梯度优化即可将LLM评判者的二元判决从“否”翻转为“是”，并基于发现的令牌池提出防御策略以增强评判鲁棒性。

AI中文摘要

LLM作为评判者系统在现代RLHF和RLVR流程中提供奖励信号，但其二元判决简化为一个隐藏状态上的单一线性读出F_gap。我们证明该读出足够浅，以至于短且低困惑度的令牌可以将判决从“否”翻转为“是”。这些令牌是从评判者自身在响应位置的下一个令牌分布中采样的，无需手动设置种子或基于梯度的优化。我们的方法AdvJudge-Zero在六个Qwen、Llama和Gemma评判者上的24个（模型，数据集）单元中，有22个实现了>90%的集成假阳性率，而先前策划的10令牌基准为54-72%，并且发现的表面跨格式转移到70B标量奖励模型。相同的发现池使得防御成为可能：基于9类机制分类法分层的LoRA微调，在相同池上的朴素采样失败的跨族泛化中增强了鲁棒性，其中机制广度而非池大小带来了增益。在GRPO训练下，硬化后的评判者消除了未硬化基线在MATH和GSM8K上每个条件十个种子时观察到的奖励崩溃失败（假阳性峰值和长度崩溃）。发现的池、机制分类法和每个提示的翻转记录将在负责任的披露下发布。

英文摘要

LLM-as-a-Judge systems supply the reward signal in modern RLHF and RLVR pipelines, but their binary verdict reduces to a single linear readout F_gap on one hidden state. We show this readout is shallow enough that short, low-perplexity tokens flip the verdict from "No" to "Yes". These tokens are sampled from the judge's own next-token distribution at the response position, with no manual seed set and no gradient-based optimization. Our procedure, AdvJudge-Zero, reaches >90% ensemble false-positive rate on 22 of 24 (model, dataset) cells across six Qwen, Llama, and Gemma judges, versus 54-72% for the prior curated 10-token benchmark, and the discovered surface transfers cross-format to a 70B scalar reward model. The same discovered pool enables a defense: a LoRA fine-tune stratified by a 9-class mechanism taxonomy hardens cross-family generalization where naive sampling on the same pool fails, with mechanism breadth rather than pool size carrying the gain. Under GRPO training, the hardened judge eliminates the reward-collapse failures (false-positive spikes and length collapse) we observe in the unhardened baseline on both MATH and GSM8K at ten seeds per condition. The discovered pool, the mechanism taxonomy, and per-prompt flip records will be released under responsible disclosure.

2505.17720 2026-05-28 cs.LG physics.ao-ph version update

PEAR: Equal Area Weather Forecasting on the Sphere

Hampus Linander, Tage Tykesson, Pietro Rosso, Christoffer Petersson, Daniel Persson, Jan E. Gerken

Affiliations: VERSES AI; Department of Mathematical Sciences; Chalmers University of Technology; University of Gothenburg; Department of Computer Science and Engineering; Department of Physics and Astronomy; Recohere

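AI summary: To address the excessive polar resolution of equiangular grids on the sphere, the paper proposes PEAR, a transformer model on the equal-area HEALPix grid that improves global weather forecasting without any computational overhead.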

Comments Extended version of manuscript published in the AI for Science workshop (NeurIPS 2025), 11 pages, 15 pages supplemental

Abstract

Artificial intelligence is rapidly reshaping the natural sciences, with weather forecasting emerging as a flagship AI4Science application where machine learning models can now rival and even surpass traditional numerical simulations. Following the success of the landmark models Pangu Weather and Graphcast, outperforming traditional numerical methods for global medium-range forecasting, many novel data-driven methods have emerged. A common limitation shared by many of these models is their reliance on an equiangular discretization of the sphere which suffers from a much finer grid at the poles than around the equator. In contrast, in the Hierarchical Equal Area iso-Latitude Pixelization (HEALPix) of the sphere, each pixel covers the same surface area, removing unphysical biases. Motivated by a growing support for this grid in meteorology and climate sciences, we propose to perform weather forecasting with deep learning models which natively operate on the HEALPix grid. To this end, we introduce Pangu Equal ARea (PEAR), a transformer-based weather forecasting model which operates directly on HEALPix-features and outperforms the corresponding model on an equiangular grid, and other baselines, without any computational overhead. Furthermore, we perform numerical experiments on the equivariance properties of our setup and verify the performance of PEAR on climate model emulation.
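
The contrast between the two grids in the abstract above can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the PEAR code: the function names are hypothetical, and it relies on two standard facts, namely that a HEALPix grid with resolution parameter nside has 12 * nside^2 pixels of equal area, and that a latitude/longitude cell on the unit sphere has area dlon * (sin(lat_hi) - sin(lat_lo)).

```python
import math


def healpix_pixel_area(nside: int) -> float:
    """Area in steradians of each HEALPix pixel (all pixels share the same area)."""
    npix = 12 * nside * nside  # total number of pixels at this resolution
    return 4.0 * math.pi / npix


def equiangular_cell_area(lat_deg: float, dlat_deg: float, dlon_deg: float) -> float:
    """Area in steradians of a lat/lon cell centred at lat_deg on the unit sphere."""
    lat_lo = math.radians(lat_deg - dlat_deg / 2.0)
    lat_hi = math.radians(lat_deg + dlat_deg / 2.0)
    return math.radians(dlon_deg) * (math.sin(lat_hi) - math.sin(lat_lo))


if __name__ == "__main__":
    # Hypothetical 1-degree equiangular grid: compare an equatorial and a polar cell.
    equator = equiangular_cell_area(0.5, 1.0, 1.0)
    pole = equiangular_cell_area(89.5, 1.0, 1.0)
    print("equator/pole cell area ratio:", equator / pole)
    print("HEALPix pixel area at nside=64:", healpix_pixel_area(64))
```

On an equiangular grid the cell area shrinks toward the poles, which is the oversampling the abstract refers to; on HEALPix the area is constant by construction.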

2307.06240 2026-05-28 cs.LG cs.AI cs.RO cs.SY eess.SY version update

DSSE: a drone swarm search environment

Manuel Castanares, Luis F. S. Carrete, Enrico F. Damiani, Leonardo D. M. de Abreu, José Fernando B. Brancalion, Fabrício J. Barth

Affiliations: Insper; Embraer

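AI summary: A PettingZoo-based multi-agent reinforcement learning environment in which drones search for targets using dynamic probability inputs.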

Comments 7 pages

2512.09800 2026-05-28 cs.LG cs.DC cs.PF version update

Ariel-ML: Computing Parallelization with Embedded Rust for Neural Networks on Heterogeneous Multi-core Microcontrollers

Zhaolan Huang, Kaspar Schleiser, Gyungmin Myung, Emmanuel Baccelli

Affiliations: KAIST, South Korea; Inria, France

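AI summary: To meet the need for parallelized TinyML inference on multi-core MCUs, the paper proposes Ariel-ML, an embedded Rust toolkit that combines a generic TinyML pipeline with multi-core support, achieving low-latency inference on a range of 32-bit MCUs with a memory footprint comparable to C/C++.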

Abstract

Low-power microcontroller (MCU) hardware is currently evolving from single-core architectures to predominantly multi-core architectures. In parallel, new embedded software building blocks are increasingly written in Rust, while the dominance of C/C++ fades in this domain. Meanwhile, small artificial neural networks (ANNs) of various kinds are increasingly deployed in edge AI use cases and executed directly on low-power MCUs. In this context, both incremental improvements and novel services will have to be continuously retrofitted by executing ANNs in software embedded on sensing/actuating systems already deployed in the field. However, so far there has been no embedded Rust software platform that automates the parallelization of inference computation on multi-core MCUs executing arbitrary TinyML models. This paper fills this gap by introducing Ariel-ML, a novel toolkit that combines a generic TinyML pipeline with an embedded Rust software platform and can take full advantage of the multi-core capabilities of various 32-bit microcontroller families (Arm Cortex-M, RISC-V, ESP-32). We published the full open-source code of its implementation, which we used to benchmark its capabilities using a zoo of various TinyML models. We show that Ariel-ML outperforms prior art in terms of inference latency, as expected, and that, compared to pre-existing toolkits using embedded C/C++, it achieves comparable memory footprints. Ariel-ML thus provides a useful basis for TinyML practitioners and resource-constrained embedded Rust developers.

2512.09786 2026-05-28 cs.LG cs.PF cs.SD eess.AS eess.SP version update

Predicting Future Anatomy: Longitudinal Brain MRI-to-MRI Prediction

Ali Farki, Elaheh Moradi, Deepika Koundal, Jussi Tohka

Affiliations: A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio, Finland

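AI summary: This paper studies predicting future brain MRI from a baseline MRI, using five deep learning architectures (UNet, U2-Net, UNETR, Time-Embedding UNet, and ODE-UNet) on the ADNI and AIBL datasets to achieve high-fidelity voxel-level prediction, and validates cross-cohort generalization.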

DOI: 10.1109/ISBI61048.2026.11515462
Journal ref: 2026 IEEE 23rd International Symposium on Biomedical Imaging (ISBI), Apr. 2026

Abstract

Predicting future brain state from a baseline magnetic resonance image (MRI) is a central challenge in neuroimaging and has important implications for studying neurodegenerative diseases such as Alzheimer's disease (AD). Most existing approaches predict future cognitive scores or clinical outcomes, such as conversion from mild cognitive impairment to dementia. Instead, here we investigate longitudinal MRI image-to-image prediction that forecasts a participant's entire brain MRI several years into the future, intrinsically modeling complex, spatially distributed neurodegenerative patterns. We implement and evaluate five deep learning architectures (UNet, U2-Net, UNETR, Time-Embedding UNet, and ODE-UNet) on two longitudinal cohorts (ADNI and AIBL). Predicted follow-up MRIs are directly compared with the actual follow-up scans using metrics that capture global similarity and local differences. The best performing models achieve high-fidelity predictions, and all models generalize well to an independent external dataset, demonstrating robust cross-cohort performance. Our results indicate that deep learning can reliably predict participant-specific brain MRI at the voxel level, offering new opportunities for individualized prognosis.

2511.09572 2026-05-28 cs.AI cs.LG cs.SE version update

SynthTools: A Framework for Scaling Synthetic Tools for Agent Development

Tommaso Castellani, Naimeng Ye, Daksh Mittal, Thomson Yen, Emmanouil Koukoumidis, William Zeng, Hongseok Namkoong

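AI summary: Proposes SynthTools, an end-to-end LLM-based pipeline that generates large-scale, diverse tool-use environments through environment generation, simulation, validation, and task construction, improving agents' tool-use capabilities.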

Abstract

For agentic systems to use external tools to solve complex, long-horizon tasks, we need a large set of diverse and controllable tool-use environments. We introduce SynthTools, a fully LLM-based pipeline spanning the entire lifecycle: environment generation, simulation, validation and task construction. By operating end-to-end through LLMs, our framework complements other tool-use environments bottlenecked by the complexity of real APIs, and ensures scalability and controllability by design. The framework consists of three components: top-down environment generation, which hierarchically constructs diverse, domain-grounded tool environments; environment simulation and validation, which ensures tools can be reliably emulated and filters out those that cannot; and bottom-up task and trajectory generation, which produces solvable and verifiable tasks together with multi-step trajectories, exposing control over difficulty, length, trajectory composition, and domain focus to guarantee flexibility. As a concrete instantiation, we release the dataset comprising 73,883 validated tools across 6,800 environments and 100 fields, 79,925 verifiable tasks as well as the pipeline to generate trajectories at scale. Training Qwen3 models of various sizes on a corpus of trajectories generated from these tasks yields gains across multiple tool-use benchmarks, including real APIs, indicating tool-use capabilities trained on synthetic data may transfer to some real environments. Together, these results suggest that SynthTools can serve as a useful infrastructure for large-scale training of tool-use agents.

2511.05550 2026-05-28 cs.SD cs.CL cs.LG version update

Assessing Factual Music Comprehension in Large Audio Language Models

Daniel Chenyu Lin, Michael Freeman, John Thickstun

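AI summary: Because the existing MusicQA dataset cannot measure the factual correctness of model responses, the paper proposes an evaluation protocol based on verifiable information that scores models objectively with precision, recall, and F1; it defines six factual retrieval tasks on three datasets and benchmarks nine recent LALMs.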

Comments 16 pages; second submission

Abstract

Large audio language models (LALMs) leverage multimodal representations to generate open-ended answers to natural language queries about audio. In this paper, we (1) provide empirical evidence that assessment of LALMs using the popular MusicQA dataset fails to measure whether a model's responses about music are factually correct, and (2) develop a new protocol for assessing the music comprehension capabilities of LALMs. Specifically, we propose an evaluation protocol that prompts a LALM for factually verifiable information, and parses its open-ended response into a structured format that can be objectively assessed using Precision, Recall, and F1 scores. Using this protocol, we define a benchmark consisting of six factual information retrieval tasks defined on three diverse datasets: MusicNet, the Free Music Archive, and OverClocked ReMix. We benchmark nine recent LALMs, including frontier models like Gemini and the latest open models like Music Flamingo, and release the suite of evaluation scripts at https://github.com/DCL2004/LALM-Eval to facilitate benchmarking of new LALMs.

2511.02398 2026-05-28 cs.LG version update

A Spatially Informed Gaussian Process UCB Method for Decentralized Coverage Control

Gennaro Guidone, Luca Monegaglia, Elia Raimondi, Han Wang, Mattia Bianchi, Florian Dörfler

Affiliations: Automatic Control Laboratory (IfA)

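AI summary: Proposes a decentralized algorithm based on the Gaussian process upper confidence bound (GP-UCB) that combines an expected locational cost with a variance-based exploration term, so that agents autonomously balance exploration and exploitation for coverage control of an unknown space.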

2411.13479 2026-05-28 stat.ML cs.LG stat.AP version update

Conformal Prediction for Hierarchical Data

Guillaume Principato, Gilles Stoltz, Yvenn Amara-Ouali, Yannig Goude, Bachir Hamrouche, Jean-Michel Poggi

Affiliations: EDF R&D; Université Paris-Saclay; CNRS; Inria; Laboratoire de mathématiques d’Orsay; HEC Paris; Université Paris Cité

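AI summary: For hierarchical data, introduces a projection (reconciliation) step into split conformal prediction, yielding smaller prediction regions under both joint and component-wise coverage, with a theoretical proof that the result is globally better.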

Comments 39 pages, 4 figures

2510.22190 2026-05-28 astro-ph.IM astro-ph.CO cs.LG version update

RGC: a radio AGN classifier based on deep learning. I. A semi-supervised multiclass model for VLA images

M. S. Hossain, M. S. H. Shahal, K. M. B. Asad, P. Saikia, A. Khan, F. Akter, A. Ali, M. A. Amin, D. P. Guha, M. O. B. Jihad, A. Momen, S. Sen, A. K. M. M. Rahman

Affiliations: Center for Computational and Data Sciences, Independent University, Bangladesh, Dhaka 1229, Bangladesh; Center for Astronomy, Space Science and Astrophysics, Independent University, Bangladesh, Dhaka 1229, Bangladesh; Department of Physical Sciences, Independent University, Bangladesh, Dhaka 1229, Bangladesh; Department of Computer Science and Engineering, Independent University, Bangladesh, Dhaka 1229, Bangladesh; Department of Astronomy and Physics, Yale University, New Haven, CT 06511, USA; Department of Agricultural and Biosystems Engineering, North Dakota State University, Fargo, ND 58108, USA

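AI summary: Proposes the semi-supervised RGC model, which combines BYOL and E2CNN and uses 2060 labelled and 20,000 unlabelled samples to deliver the first multiclass classification of bent radio AGNs (WAT/NAT) against straight Fanaroff-Riley types (sFRI/sFRII), with performance comparable to supervised models and Grad-CAM attention on morphological structure.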

Comments 12 pages, 8 pages appendix, 7 figures, re-submitted to A&A

Abstract

Bent radio active galactic nuclei (RAGNs) -- wide-angle tails (WATs) and narrow-angle tails (NATs) -- trace dense environments in galaxy groups and clusters, yet no multiclass classifier simultaneously separates them from straight Fanaroff--Riley types (sFRI, sFRII) using visually inspected labels and unlabelled data. We release FIRST-2060, a four-class labelled dataset of 2060 RAGNs (sFRI, sFRII, WAT, NAT) constructed from three publicly available catalogues through multi-tier visual inspection, together with the semi-supervised RGC 1.0 model that leverages 20,000 unlabelled sources. We benchmark RGC against five supervised baselines. FIRST-2060 is provided in two preprocessing variants: $\mathbf{R}_{L1}$, which retains spurious sources, and $\mathbf{R}_{L2}$, from which they are removed. The RGC model integrates the self-supervised framework BYOL (Bootstrap Your Own Latent) with an $E(2)$-equivariant steerable CNN (E2CNN) encoder, pre-trained on the unlabelled data and fine-tuned on the labelled sets. All six models are evaluated with 5-fold cross-validation, Grad-CAM attention analysis, and controlled class-imbalance experiments. ConvNeXT ($M_1$) and RGC ($M_2$) form a top tier at macro-$F_1$ $0.80\pm0.02$ and $0.79\pm0.02$ respectively, a difference within one standard deviation. $M_2$ is the only model whose Grad-CAM contours consistently trace the morphological structure of RAGNs -- lobes, jets, and bends -- rather than defaulting to compact blobs or diffuse patterns. The four-class scheme introduced here enables WAT/NAT-resolved catalogues that can serve as environment probes and progenitor classifications for diffuse cluster radio emission. The complementary strengths of $M_1$ and $M_2$ -- in cross-type and within-type discrimination respectively -- suggest that an ensemble approach may offer a practical framework for survey-scale morphological catalogues.

2510.22016 2026-05-28 cs.LG stat.ML version update

Cost-Sensitive Evaluation for Binary Classifiers

Pierangelo Lombardo, Antonio Casoli, Cristian Cingolani, Shola Oshodi, Michele Zanatta

Affiliations: Eutelsat; Reply

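AI summary: To address the mismatch between classifier evaluation and Total Classification Cost (TCC) minimization, the paper proposes the Weighted Accuracy (WA) metric and a general reweighting framework, proves that maximizing WA is equivalent to minimizing TCC under example-independent costs, and shows that WA remains robust across class-imbalance and cost scenarios.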

Comments 24 pages, 5 figures

Abstract

Selecting an appropriate evaluation metric for classifiers is crucial for model comparison, parameter optimization, and deployment decisions, yet there is no consensus on a broadly accepted evaluation paradigm explicitly aligned with Total Classification Cost (TCC) minimization. At the same time, class imbalance is often treated as a problem to be corrected per se, potentially causing misalignments with TCC minimization. To address these limitations, (i) we define Weighted Accuracy (WA), an evaluation metric for binary classifiers with a straightforward interpretation as a weighted version of accuracy and (ii) we propose a general reweighting framework for handling class imbalance in cost-sensitive scenarios, providing an alternative to resampling techniques. This framework applies to any evaluation metric or loss function that can be expressed as a linear combination of example-dependent quantities; it enables meaningful comparison of evaluation results obtained on different datasets and accounts for discrepancies between the development dataset, used for training, validation, and testing, and the target dataset, where the model will be deployed. Within this framework, we derive the conditions under which standard rebalancing techniques remain coherent with TCC minimization, and when they may instead become misleading. We prove that, under example-independent Unit Classification Costs, maximizing WA is equivalent to minimizing TCC. Finally, we analyze the robustness of WA in realistic example-dependent cost scenarios by studying its correlation with TCC across a broad range of class imbalance and cost regimes. The results show that WA maintains robust alignment with TCC across almost all examined scenarios.

2510.21890 2026-05-28 cs.LG cs.AI cs.GR version update

The Principles of Diffusion Models

Chieh-Hsin Lai, Yang Song, Dongjun Kim, Yuki Mitsufuji, Stefano Ermon

Affiliations: MIT Press; Sony AI; OpenAI; Stanford University; Sony Group Corporation

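AI summary: This book gives a unified account of the mathematical principles of diffusion models from three perspectives (variational, score-based, and flow-based) and discusses extensions such as controllable generation, efficient solvers, and flow-map models.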

Comments Supplementary materials for the book are available at the book website: https://the-principles-of-diffusion-models.github.io/

Abstract

This book presents the core principles that have guided the development of diffusion models, tracing their origins and showing how diverse formulations arise from shared mathematical ideas. Diffusion modeling starts by defining a forward process that gradually corrupts data into noise, linking the data distribution to a simple prior through a continuum of intermediate distributions. The goal is to learn a reverse process that transforms noise back into data while recovering the same intermediates. We describe three complementary views. The variational view, inspired by variational autoencoders, sees diffusion as learning to remove noise step by step. The score-based view, rooted in energy-based modeling, learns the gradient of the evolving data distribution, indicating how to nudge samples toward more likely regions. The flow-based view, related to normalizing flows, treats generation as following a smooth path that moves samples from noise to data under a learned velocity field. These perspectives share a common backbone: a time-dependent velocity field whose flow transports a simple prior to the data. Sampling then amounts to solving a differential equation that evolves noise into data along a continuous trajectory. On this foundation, the book discusses guidance for controllable generation, efficient numerical solvers, and diffusion-motivated flow-map models that learn direct mappings between arbitrary times. It provides a conceptual and mathematically grounded understanding of diffusion models for readers with basic deep-learning knowledge.

2510.18668 2026-05-28 cs.LG cs.CV version update

Prototyping an End-to-End Multi-Modal Tiny-CNN for Cardiovascular Sensor Patches

Mustafa Fuad Rifet Ibrahim, Tunc Alkanat, Felix Manthey, Maurice Meijer, Alexander Schlaefer, Peer Stelldinger

Affiliations: CTO System Innovation, NXP Semiconductors Germany GmbH; Advanced Chip Engineering, NXP Semiconductors; Business Line Secure Connected Edge, NXP Semiconductors; Institute of Medical Technology and Intelligent Systems, Hamburg University of Technology; Department of Informatics, Hamburg University of Applied Sciences

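AI summary: For resource-constrained medical edge devices, proposes a convolutional neural network with early fusion of ECG and PCG data for binary classification; it cuts memory and compute cost by roughly three orders of magnitude relative to the state of the art and demonstrates energy-efficiency benefits on a microcontroller.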

Comments 11 pages, 2 figures. Extended version of our 2024 IEEE PerCom paper, with direct on-device energy measurements, a BLE communication benchmark, architecture comparisons, and an extended evaluation. Submitted to Biomedical Signal Processing and Control; Fixed typos

Abstract

The vast majority of cardiovascular diseases may be preventable if early signs and risk factors are detected. Cardiovascular monitoring with body-worn sensor devices like sensor patches allows for the detection of such signs while preserving the freedom and comfort of patients. However, the analysis of the sensor data must be robust, reliable, efficient, and highly accurate. Deep learning methods can automate data interpretation, reducing the workload of clinicians. In this work, we analyze the feasibility of applying deep learning models to the classification of synchronized electrocardiogram (ECG) and phonocardiogram (PCG) recordings on resource-constrained medical edge devices. We propose a convolutional neural network with early fusion of data to solve a binary classification problem. The model is trained and validated on the synchronized ECG and PCG recordings from the Physionet Challenge 2016 dataset. Our approach reduces memory footprint and compute cost by approximately three orders of magnitude compared with the state-of-the-art while maintaining competitive accuracy. We further demonstrate the applicability of the proposed model on medical edge devices by measuring its energy consumption on a microcontroller equipped with a neural processing unit (NPU) and benchmarking the energy of Bluetooth Low Energy (BLE) communication on a representative BLE evaluation kit across a range of payload sizes. The comparison confirms that on-device inference can be more energy efficient than continuous data streaming.

2510.15839 2026-05-28 cs.LG econ.EM stat.ML version update

Learning Correlated Reward Models: Statistical Barriers and Opportunities

Yeshwanth Cherapanamjeri, Constantinos Daskalakis, Gabriele Farina, Sobhan Mohammadpour

Affiliations: MIT

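AI summary: This paper studies the statistical and computational challenges of correlated probit models, which avoid the IIA assumption, proving that pairwise preference data is insufficient to learn correlations while best-of-three preference data enables near-optimal estimation.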

Comments International Conference on Learning Representations (ICLR) 2026

Abstract

Random Utility Models (RUMs) are a classical framework for modeling user preferences and play a key role in reward modeling for Reinforcement Learning from Human Feedback (RLHF). However, a crucial shortcoming of many of these techniques is the Independence of Irrelevant Alternatives (IIA) assumption, which collapses all human preferences to a universal underlying utility function, yielding a coarse approximation of the range of human preferences. On the other hand, statistical and computational guarantees for models avoiding this assumption are scarce. In this paper, we investigate the statistical and computational challenges of learning a correlated probit model, a fundamental RUM that avoids the IIA assumption. First, we establish that the classical data collection paradigm of pairwise preference data is fundamentally insufficient to learn correlational information, explaining the lack of statistical and computational guarantees in this setting. Next, we demonstrate that best-of-three preference data provably overcomes these shortcomings, and devise a statistically and computationally efficient estimator with near-optimal performance. These results highlight the benefits of higher-order preference data in learning correlated utilities, allowing for more fine-grained modeling of human preferences. Finally, we validate these theoretical guarantees on several real-world datasets, demonstrating improved personalization of human preferences.

2510.15541 2026-05-28 cs.LG cs.CV eess.IV version update

An Empirical Study on Variance-based MC Dropout Uncertainty-Error Correlation in 2D Brain Tumor Segmentation

Saumya B

Affiliations: Project Associate DESE, Indian Institute of Science

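AI summary: Experiments with a U-Net under four augmentation settings find that variance-based MC Dropout uncertainty correlates only weakly with segmentation error, both globally and at boundaries, indicating its limitations.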

Comments v2: Updated title and framing to clarify that findings are specific to variance-based uncertainty estimation via MC Dropout, not MC Dropout broadly. Minor textual improvements throughout. Code and results available at https://github.com/Saumya4321/mcd-error-correlation

Abstract

Accurate brain tumor segmentation from MRI is vital for diagnosis and treatment planning. Although Monte Carlo (MC) Dropout is widely used to estimate model uncertainty, the effectiveness of variance-based uncertainty - computed as pixel-wise variance across stochastic forward passes - in identifying segmentation errors, particularly near tumor boundaries, remains insufficiently studied. This study empirically examines the relationship between variance-based MC Dropout uncertainty and segmentation error in 2D brain tumor MRI segmentation using a U-Net trained under four augmentation settings: none, horizontal flip, rotation, and scaling. Uncertainty was estimated as the pixel-wise variance across 50 stochastic forward passes and correlated with pixel-wise errors using Pearson and Spearman coefficients. Results show weak global correlations (r ~ 0.30-0.38) and negligible boundary correlations (|r| < 0.05). Although differences across augmentations were statistically significant (p < 0.001), they lacked practical relevance. These findings suggest that variance-based MC Dropout uncertainty provides limited cues for global and boundary error localization, and that the choice of uncertainty representation critically affects the utility of MC Dropout in medical image segmentation. Alternative representations such as predictive entropy or mutual information may better capture segmentation errors, particularly at boundaries.
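
The measurement described in the abstract above (pixel-wise variance over stochastic forward passes, correlated with pixel-wise error) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's code: the function names are hypothetical, the stochastic passes are replaced by random numbers rather than a U-Net with dropout, the error map is assumed to come from thresholding the mean prediction at 0.5, and only the Pearson coefficient is shown.

```python
import numpy as np


def variance_uncertainty(prob_maps: np.ndarray) -> np.ndarray:
    """Pixel-wise variance over T passes; prob_maps has shape (T, H, W)."""
    return prob_maps.var(axis=0)


def pixelwise_error(prob_maps: np.ndarray, target: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """1.0 where the thresholded mean prediction disagrees with the mask, else 0.0."""
    mean_pred = prob_maps.mean(axis=0) >= threshold
    return (mean_pred != target.astype(bool)).astype(float)


def pearson(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between two maps, flattened to vectors."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else float("nan")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    passes = rng.random((50, 32, 32))   # stand-in for 50 stochastic forward passes
    mask = rng.random((32, 32)) > 0.5   # stand-in for a ground-truth tumor mask
    unc = variance_uncertainty(passes)
    err = pixelwise_error(passes, mask)
    print("Pearson r between uncertainty and error:", pearson(unc, err))
```

With a real model, `passes` would hold the foreground probabilities from repeated forward passes with dropout left active at inference time.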

2505.11638 2026-05-28 math.NA cs.LG cs.NA version update

Accelerating Natural Gradient Descent for PINNs with Randomized Numerical Linear Algebra

Ivan Bioli, Carlo Marcati, Giancarlo Sangalli

Affiliations: University of Pavia; Department of Mathematics; Department of Civil Engineering and Architecture; Institut Camille Jordan, Lyon 1 Université

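AI summary: Natural gradient descent (NGD) for physics-informed neural networks (PINNs) is costly because the Gramian matrix is ill-conditioned; the paper proposes preconditioning based on randomized numerical linear algebra (RandNLA) to accelerate the conjugate gradient solver, substantially improving optimization efficiency.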

Abstract

Natural Gradient Descent (NGD) has emerged as a promising optimization algorithm for training neural network-based solvers for partial differential equations (PDEs), such as Physics-Informed Neural Networks (PINNs). However, its practical use is often limited by the high computational cost of solving linear systems involving the Gramian matrix. While matrix-free NGD methods based on the conjugate gradient (CG) method avoid explicit matrix inversion, the ill-conditioning of the Gramian significantly slows the convergence of CG. In this work, we extend matrix-free NGD to broader classes of problems than previously considered and propose the use of Randomized Numerical Linear Algebra (RandNLA) techniques for efficient preconditioning of the inner CG solver. The resulting algorithm demonstrates substantial performance improvements over existing NGD-based methods and other state-of-the-art optimizers on a range of PDE problems discretized using neural networks.

2510.11170 2026-05-28 cs.LG cs.AI cs.CL version update

EAGer: Entropy-Aware GEneRation for Adaptive Inference-Time Scaling

EAGer: 基于熵感知的自适应推理时缩放生成方法

Daniel Scalena, Leonidas Zotos, Elisabetta Fersini, Malvina Nissim, Ahmet Üstün

发表机构 * University of Groningen（格罗宁根大学）； University of Milan - Bicocca（米兰-比科卡大学）； Cohere Labs（Cohere实验室）

AI总结提出一种无需训练的生成方法EAGer，利用逐词熵分布动态分配计算资源，在复杂推理任务中提升性能并减少冗余计算。

Abstract

With the rise of reasoning language models and test-time scaling methods as a paradigm for improving model performance, substantial computation is often required to generate multiple candidate sequences from the same prompt. This enables exploration of different reasoning paths toward the correct solution but allocates the same compute budget to every prompt. Grounded in the assumption that different prompts carry different degrees of complexity, and thus have different computation needs, we propose EAGer, a training-free generation method that leverages model uncertainty through the token-wise entropy distribution to reduce redundant computation and concurrently improve overall performance. EAGer allows branching to multiple reasoning paths only in the presence of high-entropy tokens, and reallocates the saved compute budget to instances where exploration of alternative paths is most needed. We validate EAGer across multiple open-source models on complex reasoning benchmarks, with gains specifically demonstrated on AIME 2025. When target labels are accessible, as in RLVR training pipelines, EAGer achieves up to +37% in Pass@k with 59% fewer tokens; in test-time settings it still yields +12% in Pass@k with 64% fewer tokens compared to Full Parallel Sampling.
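The branching rule described above (branch only at high-entropy tokens) can be illustrated with a small sketch. The threshold value and the function names `token_entropy` and `should_branch` are assumptions for illustration; the abstract does not specify how EAGer sets its threshold or manages the compute budget.

```python
import numpy as np

def token_entropy(logits: np.ndarray) -> float:
    """Shannon entropy (in nats) of the softmax distribution over next tokens."""
    z = logits - logits.max()              # stabilize the exponentials
    p = np.exp(z) / np.exp(z).sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def should_branch(logits: np.ndarray, threshold: float) -> bool:
    """Branch into multiple continuations only when the model is uncertain."""
    return token_entropy(logits) > threshold

THRESHOLD = 1.0                            # hypothetical value, in nats
confident = np.array([10.0, 0.0, 0.0, 0.0])
uncertain = np.zeros(4)                    # uniform over four tokens

print(should_branch(confident, THRESHOLD), should_branch(uncertain, THRESHOLD))
```

Prompts whose decoding never crosses the threshold consume a single sequence, which is what frees budget for harder prompts.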

2503.01450 2026-05-28 cs.LG cs.AI cs.RO (version update)

Investigating Memory in Model-Free RL with POPGym Arcade

Zekang Wang, Zhe He, Borong Zhang, Edan Toledo, Steven Morad

Affiliations: Faculty of Science and Technology, University of Macau; Centre for AI, University College London

AI summary: Introduces analysis tools and the POPGym Arcade environment suite to study memory in deep reinforcement learning, finds that value functions assign credit to irrelevant history, and shows how out-of-distribution scenarios can contaminate memory.

Comments Appear at ICML 2026 as a Spotlight paper

2508.01521 2026-05-28 cs.LG (version update)

Prototype Learning to Create Refined Interpretable Digital Phenotypes from ECGs

Sahil Sethi, David Chen, Michael C. Burkhart, Nipun Bhandari, Bashar Ramadan, Brett Beaulieu-Jones

Affiliations: (1) Pritzker School of Medicine, University of Chicago, IL, USA; (2) Center for Computational Medicine & Clinical AI, Section of Biomedical Data Science, Department of Medicine, University of Chicago, IL, USA; (3) Division of Cardiovascular Medicine, Department of Internal Medicine, University of California Davis, CA, USA; (4) Section of Hospital Medicine, Department of Medicine, University of Chicago, IL, USA

AI summary: Uses a prototype-based deep learning model to learn interpretable phenotypes from ECGs and validates their association with diagnosis codes in external clinical data, finding that the prototypes capture clinical meaning beyond the original training objective.

Comments Accepted (oral) to the 31st Pacific Symposium on Biocomputing

DOI: 10.1142/9789819824755_0049

Abstract

Prototype-based neural networks offer interpretable predictions by comparing inputs to learned, representative signal patterns anchored in training data. While such models have shown promise in the classification of physiological data, it remains unclear whether their prototypes capture an underlying structure that aligns with broader clinical phenotypes. We use a prototype-based deep learning model trained for multi-label ECG classification on the PTB-XL dataset and then, without modification, perform inference on the MIMIC-IV clinical database. We assess whether individual prototypes, trained solely for classification, are associated with hospital discharge diagnoses in the form of phecodes in this external population. Individual prototypes demonstrate significantly stronger and more specific associations with clinical outcomes compared to the classifier's class predictions, NLP-extracted concepts, or broader prototype classes across all phecode categories. Prototype classes with mixed significance patterns exhibit significantly greater intra-class distances ($p < 0.0001$), indicating the model learned to differentiate clinically meaningful variations within diagnostic categories. The prototypes achieve strong predictive performance across diverse conditions, with AUCs ranging from 0.89 for atrial fibrillation to 0.91 for heart failure, while also showing substantial signal for non-cardiac conditions such as sepsis and renal disease. These findings suggest that prototype-based models can support interpretable digital phenotyping from physiologic time-series data, providing transferable intermediate phenotypes that capture clinically meaningful physiologic signatures beyond their original training objectives.

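2510.07208 2026-05-28 cs.LG (version update)

A Broader View of Thompson Sampling

Yanlin Qu, Hongseok Namkoong, Assaf Zeevi

Affiliations: Columbia Business School

AI summary: Recasts Thompson Sampling as an online optimization algorithm, revealing the mechanism by which it balances exploration and exploitation, and proposes a policy improvement approach based on regularization by residual uncertainty.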

Abstract

Thompson Sampling is one of the most widely used and studied bandit algorithms, known for its simple structure, low regret performance, and solid theoretical guarantees. Yet, in stark contrast to most other families of bandit algorithms, the exact mechanism through which posterior sampling (as introduced by Thompson) is able to "properly" balance exploration and exploitation remains a mystery. In this paper, we show that the core insight to address this question stems from recasting Thompson Sampling as an online optimization algorithm. To distill this, we introduce a suitable time-invariant notion of regret that leads to a stationarized bandit problem and a stationary Bellman-optimal policy. We then show that Thompson Sampling admits an online optimization form that mimics the structure of the aforementioned Bellman-optimal policy, where "greediness" is regularized by a measure of residual uncertainty. This new lens of online optimization allows both a better understanding of Thompson Sampling dynamics and a principled approach to policy improvement that mimics the Bellman-optimal benchmark.
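For readers unfamiliar with posterior sampling, the standard Beta-Bernoulli form of Thompson Sampling is sketched below. This is the textbook algorithm, not the online-optimization reformulation or the stationarized regret notion developed in the paper; the arm means, horizon, and function name are illustrative assumptions.

```python
import numpy as np

def thompson_bernoulli(true_means, horizon, seed=0):
    """Run Beta-Bernoulli Thompson Sampling; return the pull count per arm."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    successes = np.ones(k)   # Beta(1, 1) prior on each arm
    failures = np.ones(k)
    pulls = np.zeros(k, dtype=int)
    for _ in range(horizon):
        theta = rng.beta(successes, failures)   # one posterior sample per arm
        arm = int(np.argmax(theta))             # act greedily on the sample
        reward = float(rng.random() < true_means[arm])
        successes[arm] += reward
        failures[arm] += 1.0 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_bernoulli([0.2, 0.5, 0.8], horizon=2000)
print(pulls)
```

Exploration comes entirely from the randomness of the posterior sample: arms with wide posteriors are occasionally sampled high and pulled, and that residual uncertainty shrinks as data accumulates.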

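2510.06970 2026-05-28 eess.SY cs.LG cs.SY (version update)

Falsification-driven reinforcement learning for maritime motion planning

Marlon Müller, Florian Finkeldei, Hanna Krasowski, Murat Arcak, Matthias Althoff

Affiliations: Technical University of Munich; University of California, Berkeley; Munich Center for Machine Learning (MCML)

AI summary: Proposes a falsification-driven reinforcement learning approach that improves the rule compliance of autonomous vessels by generating adversarial training scenarios in which maritime traffic rules, specified in signal temporal logic, are violated. Experiments show that the approach provides more relevant training scenarios and achieves more consistent rule compliance.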

Comments 11 pages, 9 figures. Code available at https://fdrl-maritime.github.io

DOI: 10.1016/j.oceaneng.2026.125579
Journal ref: Ocean Engineering 361 (2026) 125579

Abstract

Compliance with maritime traffic rules is essential for the safe operation of autonomous vessels, yet training reinforcement learning (RL) agents to adhere to them is challenging. The behavior of RL agents is shaped by the training scenarios they encounter, but creating scenarios that capture the complexity of maritime navigation is non-trivial, and real-world data alone is insufficient. To address this, we propose a falsification-driven RL approach that generates adversarial training scenarios in which the vessel under test violates maritime traffic rules, which are expressed as signal temporal logic specifications. Our experiments on open-sea navigation with two vessels demonstrate that the proposed approach provides more relevant training scenarios and achieves more consistent rule compliance.

2502.17832 2026-05-28 cs.LG cs.AI cs.CR cs.CV (version update)

MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks

Hyeonjeong Ha, Qiusi Zhan, Jeonghwan Kim, Dimitrios Bralios, Saikrishna Sanniboina, Nanyun Peng, Kai-Wei Chang, Daniel Kang, Heng Ji

Affiliations: University of Illinois Urbana-Champaign; University of California Los Angeles

AI summary: Proposes the MM-PoisonRAG framework, which systematically studies the vulnerability of multimodal retrieval-augmented generation (RAG) to knowledge poisoning through two strategies, Localized Poisoning Attack (LPA) and Globalized Poisoning Attack (GPA). Experiments show attack success rates of up to 56% and that the attacks bypass existing defenses.

Comments Code is available at https://github.com/HyeonjeongHa/MM-PoisonRAG

Abstract

Retrieval-augmented generation (RAG) has become a common practice in multimodal large language models (MLLM) to enhance factual grounding and reduce hallucination. Yet, its reliance on retrieval exposes MLLMs to knowledge poisoning attacks, in which adversaries deliberately inject malicious multimodal content into external knowledge bases to steer models toward generating incorrect or even harmful responses. We present MM-PoisonRAG, a framework to systematically study the vulnerability of multimodal RAG under knowledge poisoning. Specifically, we design two novel attack strategies: Localized Poisoning Attack (LPA), which implants targeted, query-specific multimodal misinformation to manipulate outputs toward attacker-controlled responses, and Globalized Poisoning Attack (GPA), which uses a single, untargeted adversarial injection to broadly corrupt reasoning and collapse generation quality across all queries. Extensive experiments on diverse tasks, multimodal RAG components, and attacker access levels reveal severe vulnerabilities: LPA achieves up to 56% attack success rate even under restricted access, and transfers effectively across four different retrievers without re-optimizing the adversaries. GPA completely disrupts model generation to 0% accuracy with just one poisoned content. Moreover, both LPA and GPA bypass existing defenses, underscoring the fragility of multimodal RAG and establishing MM-PoisonRAG as a foundation for future research on securing RAG frameworks against multimodal knowledge poisoning.

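2509.26442 2026-05-28 cs.LG math.OC (version update)

Extensions of Robbins-Siegmund Theorem with Applications in Reinforcement Learning

Xinyu Liu, Zixuan Xie, Shangtong Zhang

Affiliations: Department of Computer Science; University of Virginia

AI summary: Extends the Robbins-Siegmund theorem to the case where the zero-order term is square-summable but not summable by introducing a mild assumption on the increments of the stochastic process. It proves almost sure convergence to a bounded set, gives almost sure convergence rates, high-probability concentration bounds, and $L^p$ convergence rates, and obtains the first such results for Q-learning with linear function approximation.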

Abstract

The Robbins-Siegmund theorem establishes the convergence of stochastic processes that are almost supermartingales and is one of the most commonly used approaches for analyzing stochastic iterative algorithms in stochastic approximation and reinforcement learning (RL). However, its original form has a significant limitation: it requires the zero-order term to be summable. In many important RL applications, this summability condition cannot be met. This limitation motivates us to extend the Robbins-Siegmund theorem to almost supermartingales where the zero-order term is not summable, but only square-summable. In particular, we introduce a novel and mild assumption on the increments of the stochastic processes. Together with the square-summability condition, this assumption yields almost sure convergence to a bounded set. We further provide almost sure convergence rates, high-probability concentration bounds, and $L^p$ convergence rates. We then apply the new results to stochastic approximation and RL. Notably, we obtain the first almost sure convergence rate, the first high-probability concentration bound, and the first $L^p$ convergence rate for $Q$-learning with linear function approximation.
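For orientation, the classical statement being extended can be written as follows. This is the standard textbook form of the Robbins-Siegmund theorem; identifying $b_n$ with what the abstract calls the zero-order term is an assumption, and the paper's exact formulation, including its additional assumption on the increments, is not reproduced here.

```latex
% Classical Robbins-Siegmund setting:
% Z_n, a_n, b_n, c_n are nonnegative and adapted to the filtration (F_n).
\mathbb{E}\left[ Z_{n+1} \mid \mathcal{F}_n \right] \le (1 + a_n) Z_n + b_n - c_n,
\qquad \sum_{n} a_n < \infty, \quad \sum_{n} b_n < \infty \quad \text{a.s.}
% Conclusion: Z_n converges a.s. to a finite limit, and \sum_n c_n < \infty a.s.
%
% Regime targeted by the extension (b_n read as the zero-order term):
\sum_{n} b_n = \infty \ \text{is allowed}, \qquad \sum_{n} b_n^2 < \infty .
```

In the relaxed regime the conclusion described in the abstract is correspondingly weaker: almost sure convergence to a bounded set rather than to a limit.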

2509.22553 2026-05-28 stat.ML cs.LG (version update)

Linear Causal Representation Learning by Topological Ordering, Pruning, and Disentanglement

Hao Chen, Lin Liu, Yu Guang Wang

Affiliations: School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai, China; Institute of Natural Sciences, MOE-LSC; SJTU-Yale Joint Center for Biostatistics; Data Science, Shanghai Jiao Tong University, Shanghai, China; Bio-X Institutes, Shanghai Jiao Tong University, Shanghai, China

AI summary: Proposes a new algorithm that recovers linear causal representations under weaker assumptions via topological ordering, pruning, and disentanglement, and validates it with synthetic experiments and an interpretability analysis of large language models.

Abstract

Causal representation learning (CRL) has garnered increasing interest from the causal inference and artificial intelligence communities due to its potential to disentangle complex data-generating mechanisms into causally interpretable latent features by leveraging the heterogeneity of modern datasets. In this paper, we further contribute to the CRL literature by focusing on the stylized linear structural causal model over latent features and assuming a linear mixing function that maps latent features to the observed data or measurements. Existing linear CRL methods often rely on stringent assumptions, such as access to single-node interventional data or restrictive distributional constraints on latent features and/or exogenous measurement noise. However, these prerequisites are easily violated in practice. In this work, we propose a novel linear CRL algorithm that, unlike existing methods, operates under weaker assumptions on environment heterogeneity and data-generating distributions while still recovering latent causal features up to an equivalence class. We further validate our new algorithm via synthetic experiments and an interpretability analysis of large language models, demonstrating both its superiority over competing methods in finite samples and its potential for integrating causality into the understanding of artificial intelligence. The source code is available at https://github.com/utulie/code_for_linear_crl_paper_creator.

2508.21495 2026-05-28 cs.LG (version update)

Rethinking Calibration for Early-Exit Neural Networks

Piotr Kubaty, Filip Szatkowski, Grzegorz Choczyński, Eric Nalisnick, Bartosz Wójcik

Affiliations: Jagiellonian University; Warsaw University of Technology; IDEAS Research Institute; Johns Hopkins University; Doctoral School of Exact and Natural Sciences

AI summary: Questions whether calibration is sufficient for the performance of early-exit neural networks, proposes Early-Exit Failure Prediction (EEFP) to account for both prediction correctness and computational cost, and designs a lightweight improvement procedure that achieves better cost-accuracy trade-offs.

Abstract

Early-exit neural networks (EENNs) accelerate inference by allowing intermediate classifiers to stop computation once predictions are confident enough. Most methods rely on confidence thresholds for exiting, and consequently, improving classifier calibration is widely assumed to improve performance. In this work, we challenge this assumption and show that calibration alone is not sufficient for EENNs to exploit adaptive computation. To address this insufficiency, we introduce Early-Exit Failure Prediction (EEFP), which accounts for both prediction correctness and the cost of further computation. We also propose a lightweight, EEFP-motivated procedure to improve the intermediate classifiers, which can directly replace calibration in EENNs. Extensive experiments demonstrate that our approach achieves superior cost-accuracy trade-offs compared to calibration, and EEFP more reliably reflects overall EENN performance. Our code is available at https://github.com/gmum/rethinking-calibration-for-eenns.
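The confidence-threshold exit rule that the abstract takes as its starting point can be sketched as follows. This illustrates only the baseline mechanism, not EEFP or the proposed training procedure; the threshold values, the per-exit logits, and the function names are hypothetical.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def early_exit_predict(exit_logits, threshold):
    """Return (predicted class, index of the exit used).

    Stops at the first intermediate classifier whose top softmax probability
    reaches the threshold; the final classifier always answers.
    """
    last = len(exit_logits) - 1
    for i, logits in enumerate(exit_logits):
        probs = softmax(np.asarray(logits, dtype=float))
        if probs.max() >= threshold or i == last:
            return int(probs.argmax()), i

# Two exits: an unsure shallow head and a confident deeper head.
exits = [[0.1, 0.2, 0.0], [5.0, 0.0, 0.0]]
print(early_exit_predict(exits, threshold=0.9))
print(early_exit_predict(exits, threshold=0.3))
```

With the low threshold the shallow head answers first and disagrees with the deeper head, which shows why the exit decision depends on more than how well each head's confidence is calibrated in isolation.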

2507.09466 2026-05-28 cs.LG q-bio.QM (version update)

La-Proteina: Atomistic Protein Generation via Partially Latent Flow Matching

Tomas Geffner, Kieran Didi, Zhonglin Cao, Danny Reidenbach, Zuobai Zhang, Christian Dallago, Emine Kucukbenli, Karsten Kreis, Arash Vahdat

Affiliations: NVIDIA; University of Oxford; Mila - Québec AI Institute; Université de Montréal

AI summary: Proposes La-Proteina, which uses a partially latent representation and flow matching to jointly generate all-atom protein structures and amino acid sequences, achieving state-of-the-art performance on multiple benchmarks.

Abstract

Recently, many generative models for de novo protein structure design have emerged. Yet only a few tackle the difficult task of directly generating fully atomistic structures jointly with the underlying amino acid sequence. This is challenging, for instance, because the model must reason over side chains that change in length during generation. We introduce La-Proteina for atomistic protein design based on a novel partially latent protein representation: coarse backbone structure is modeled explicitly, while sequence and atomistic details are captured via per-residue latent variables of fixed dimensionality, thereby effectively side-stepping challenges of explicit side-chain representations. Flow matching in this partially latent space then models the joint distribution over sequences and full-atom structures. La-Proteina achieves state-of-the-art performance on multiple generation benchmarks, including all-atom co-designability, diversity, and structural validity, as confirmed through detailed structural analyses and evaluations. Notably, La-Proteina also surpasses previous models in atomistic motif scaffolding performance, unlocking critical atomistic structure-conditioned protein design tasks. Moreover, La-Proteina is able to generate co-designable proteins of up to 800 residues, a regime where most baselines collapse and fail to produce valid samples, demonstrating La-Proteina's scalability and robustness.

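2507.06999 2026-05-28 cs.CV cs.CL cs.LG (version update)

Learning Deliberately, Acting Intuitively: Unlocking Test-Time Reasoning in Multimodal LLMs

Yahan Yu, Yuyang Dong, Masafumi Oyamada

Affiliations: Kyoto University; Initial S; NEC Corporation, Japan

AI summary: Proposes the D2I framework. During training, deliberate reasoning supervised by rule-based format rewards strengthens modality alignment; during inference, the explicit strategies are removed in favor of intuitive reasoning. This improves the reasoning ability of multimodal large language models without extra annotations or complex rewards.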

Comments 22 pages, 24 figures

Abstract

Reasoning is essential for large language models (LLMs), especially in complex tasks such as mathematical problem solving. However, multimodal reasoning still faces challenges in modality alignment and training scalability, as many existing methods rely on additional annotations or complex rule-based rewards. To address these issues, we propose the Deliberate-to-Intuitive reasoning framework (D2I), which improves the understanding and reasoning abilities of multimodal LLMs (MLLMs) without extra annotations or complex rewards. During training, D2I uses deliberate reasoning strategies supervised only by rule-based format rewards to enhance modality alignment. During inference, it shifts to intuitive reasoning by removing these explicit strategies, allowing the model to implicitly apply the acquired abilities in its responses. D2I outperforms baselines on both in-domain and out-of-domain benchmarks, highlighting the effectiveness of format rewards in fostering transferable multimodal reasoning skills and suggesting the benefit of decoupling training-time reasoning depth from test-time response flexibility.

2506.12444 2026-05-28 math.OC cs.LG (version update)

Adjusted Shuffling SARAH: Advancing Complexity Analysis via Dynamic Gradient Weighting

Duc Toan Nguyen, Trang H. Tran, Lam M. Nguyen

Affiliations: Rice University; Lehigh University; Thomas J. Watson Research Center, IBM Research

AI summary: Proposes the Adjusted Shuffling SARAH algorithm, which integrates shuffling strategies into the recursive SARAH framework through a dynamic weighting mechanism, achieves optimal theoretical guarantees in strongly convex and nonconvex settings, and introduces an inexact mode whose total complexity is independent of the dataset size.

2506.08928 2026-05-28 cs.LG stat.ME stat.ML (version update)

Local MDI+: Local Feature Importances for Tree-Based Models

Zhongyuan Liang, Zachary T. Rewolinski, Abhineet Agarwal, Tiffany M. Tang, Bin Yu

Affiliations: Department of Computational Precision Health; UC Berkeley; UCSF; Department of Statistics; Department of Applied and Computational Mathematics and Statistics; University of Notre Dame; Department of Statistics and EECS

AI summary: Proposes Local MDI+ (LMDI+), which extends the MDI+ framework to local feature importance, yielding an average 10% improvement in predictive performance across 12 benchmark datasets along with greater stability and interpretability.

Abstract

Tree-based ensembles such as random forests remain the go-to for tabular data over deep learning models due to their prediction performance and computational efficiency. These advantages have led to their widespread deployment in high-stakes domains, where interpretability is essential for ensuring trustworthy predictions. This has motivated the development of popular local feature importance methods such as LIME and TreeSHAP. However, these approaches rely on approximations that ignore the model's internal structure and instead depend on potentially unstable perturbations. These issues are addressed in the global setting by MDI+, a global feature importance method which combines tree-based and linear feature importances by exploiting an equivalence between decision trees and least squares on a transformed node basis. However, the global MDI+ scores are not able to explain predictions when faced with heterogeneous individual characteristics. To address this gap, we propose Local MDI+ (LMDI+), a novel extension of the MDI+ framework that quantifies feature importances for each particular sample. Across twelve real-world benchmark datasets, LMDI+ outperforms existing baselines at identifying instance-specific predictive features, yielding an average 10% improvement in predictive performance when using only the selected features. It further demonstrates greater stability by consistently producing similar instance-level feature importance rankings across repeated model fits with different random seeds. Ablation experiments show that each component of LMDI+ contributes to these gains, and that the improvements extend beyond random forests to gradient boosting models. Finally, we show that LMDI+ enables local interpretability use cases by identifying closely matched counterfactuals for each classification benchmark and discovering homogeneous subgroups in a housing dataset case study.

2502.05242 2026-05-28 cs.CL cs.AI cs.CV cs.LG (version update)

Beyond External Monitors: Enhancing Transparency of Large Language Models for Easier Monitoring

Guanxu Chen, Jing Shao, Tao Luo, Lijie Hu, Qihao Lin, Dongrui Liu

Affiliations: Shanghai Artificial Intelligence Laboratory; ICISEE, Shanghai Jiao Tong University; School of Mathematical Sciences, Institute of Natural Sciences, MOE-LSC, CMA-Shanghai, Shanghai Jiao Tong University; King Abdullah University of Science and Technology

AI summary: Proposes TELLME, a method that improves the transparency of large language models' internal representations to help monitors identify inappropriate and sensitive behaviors, and validates it on detoxification tasks.

Comments 28 pages,8 figures,15 tables

Abstract

Large language models (LLMs) are becoming increasingly capable, but the mechanisms of their thinking and decision-making processes remain unclear. Chain-of-thoughts (CoTs) have been commonly utilized to externalize LLMs' thinking, but this strategy fails to accurately reflect LLMs' thinking process. Techniques based on LLMs' hidden representations provide an inner perspective to improve the monitorability of their latent thinking. However, previous methods only try to develop external modules instead of making LLMs themselves easier to monitor. In this paper, we propose a novel method, TELLME, improving the transparency of LLMs and helping monitors identify unsuitable and sensitive behaviors. Furthermore, we showcase the effectiveness of TELLME on detoxification tasks, where LLMs achieve consistent improvement among multimodal test sets, distinct architectures, and varying parameter scales. We further analyze TELLME's improvement on LLMs' generalization ability from both optimal transport theory and empirical perspectives.

2505.19342 2026-05-28 cs.LG cs.AI (version update)

ASTRA: Communication-Efficient Acceleration for Multi-Device Transformer Inference

Xiao Liu, Lijun Zhang, Deepak Ganesan, Hui Guan

Affiliations: Manning College of Information and Computer Sciences, University of Massachusetts Amherst, MA, US; Amazon, Seattle, WA, US; Amazon AWS, Washington, D.C., US

AI summary: Proposes the ASTRA framework, which combines sequence parallelism with mixed-precision attention to enable efficient multi-device Transformer inference in low-bandwidth environments, delivering substantial speedups while preserving accuracy.

Abstract

Multi-device inference can reduce Transformer latency by parallelizing computation. However, existing methods require high inter-device bandwidth, making them impractical for bandwidth-constrained environments. We present ASTRA, a communication-efficient framework that integrates sequence parallelism with mixed-precision attention, where non-local token embeddings are transmitted as low-bit vector-quantized codes while local attention remains full precision. To preserve accuracy under aggressive compression, ASTRA introduces Noise-Augmented Quantization and Distributed Class Tokens. Across vision and language models (e.g., ViT and GPT2), ASTRA achieves up to 2.64$\times$ speedup over single-device inference and up to 15.25$\times$ over prior multi-device baselines while operating at bandwidths as low as 10 Mbps. ASTRA remains robust on large models (e.g., Llama-3-8B) even under non-ideal network conditions such as packet loss and dynamic networks.
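The communication saving from sending vector-quantized codes instead of full embeddings can be illustrated with a small sketch. The codebook here is random rather than learned, the sizes are arbitrary, and the functions `vq_encode` and `vq_decode` are hypothetical; ASTRA's Noise-Augmented Quantization and Distributed Class Tokens are not modeled.

```python
import numpy as np

def vq_encode(x: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each row of x to the index of its nearest codebook vector."""
    d2 = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1).astype(np.uint8)   # valid for up to 256 codes

def vq_decode(codes: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Reconstruct approximate embeddings on the receiving device."""
    return codebook[codes]

rng = np.random.default_rng(0)
dim, num_codes = 8, 16
codebook = rng.standard_normal((num_codes, dim))   # shared by all devices
tokens = codebook[[3, 7, 0]] + 1e-3 * rng.standard_normal((3, dim))

codes = vq_encode(tokens, codebook)     # what is transmitted
approx = vq_decode(codes, codebook)     # what remote attention sees

bits_full = dim * 32                    # float32 embedding per token
bits_code = int(np.log2(num_codes))     # index per token
print(codes, f"{bits_full} bits -> {bits_code} bits per token")
```

Local tokens would keep their full-precision embeddings; only tokens that cross a device boundary are replaced by their decoded approximations.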

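2505.17233 2026-05-28 cs.LG cs.SD eess.AS (version update)

Semantic-Aware Interpretable Multimodal Music Auto-Tagging

Andreas Patakis, Vassilis Lyberatos, Spyridon Kantarelis, Edmund Dervakos, Giorgos Stamou

Affiliations: National Technical University of Athens

AI summary: Proposes an interpretable music auto-tagging method that uses groups of multimodal music features and an expectation-maximization algorithm, offering transparency into the decision process while maintaining competitive performance.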

Comments Accepted at Interspeech 2025

Abstract

Music auto-tagging is essential for organizing and discovering music in extensive digital libraries. While foundation models achieve exceptional performance in this domain, their outputs often lack interpretability, limiting trust and usability for researchers and end-users alike. In this work, we present an interpretable framework for music auto-tagging that leverages groups of musically meaningful multimodal features, derived from signal processing, deep learning, ontology engineering, and natural language processing. To enhance interpretability, we cluster features semantically and employ an expectation maximization algorithm, assigning distinct weights to each group based on its contribution to the tagging process. Our method achieves competitive tagging performance while offering a deeper understanding of the decision-making process, paving the way for more transparent and user-centric music tagging systems.

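2505.09861 2026-05-28 cs.LG cs.AI cs.IR stat.ME (version update)

LiDDA: Data Driven Attribution at LinkedIn

John Bencina, Erkut Aykutlug, Yue Chen, Zerui Zhang, Stephanie Sorenson, Shao Tang, Changshuai Wei

Affiliations: LinkedIn Corporation

AI summary: Proposes a unified Transformer-based attribution approach that handles member-level data, aggregate-level data, and external macro factors, and deploys it at scale at LinkedIn, significantly improving marketing effectiveness.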

2503.18893 2026-05-28 cs.CL cs.LG (version update)

xKV: Cross-Layer KV-Cache Compression via Aligned Singular Vector Extraction

Chi-Chih Chang, Wei-Cheng Lin, Chien-Yu Lin, Hung-Yueh Chiang, Yash Akhauri, Xilai Dai, Huiqiang Jiang, Yucheng Li, Luis Ceze, Kai-Chiang Wu, Mohamed S. Abdelfattah

Affiliations: Cornell University; University of Washington; Department of Computer Science, National Yang Ming Chiao Tung University; The University of Texas at Austin; University of Surrey; Microsoft Research Asia

AI summary: Proposes xKV, a post-training method that compresses the KV-Cache by sharing a low-rank subspace across layers, achieving up to 8x compression while preserving accuracy on long-context tasks, and introduces Selective Reconstruction for end-to-end speedup.

Comments ICML 2026

Abstract

Long-context Large Language Models (LLMs) enable powerful applications but incur high memory costs due to the key-value states (KV-Cache). Recent studies attempt to share KV-Cache across layers, but these approaches either require expensive pretraining or rely on per-token cross-layer cosine similarity that is often limited in practice. We show, via Centered Kernel Alignment (CKA), that the dominant singular vectors of KV-Cache are well aligned across layers. Motivated by this observation, we propose xKV, a post-training compression method that jointly factorizes grouped-layer KV-Cache into a shared low-rank subspace, substantially reducing KV-Cache memory. Across widely used LLMs, xKV achieves up to 8x KV-Cache compression while preserving accuracy on long-context tasks and in multi-turn settings. To further improve efficiency, we introduce Selective Reconstruction (SR) at decode time. Combined with SR, xKV achieves up to 4.23x end-to-end speedup over the full attention baseline, and surpasses notable baselines with 30% higher throughput under a similar accuracy level. Overall, xKV provides a plug-and-play approach to reduce both memory and latency for long-context LLM inference. Our code is publicly available at: https://github.com/abdelfattah-lab/xKV.
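The joint factorization described in the abstract can be illustrated with a small NumPy sketch. It assumes one plausible reading: the caches of a layer group are concatenated along the feature axis and a truncated SVD yields a token-side factor shared by the group plus one small matrix per layer. The function names, shapes, and this exact layout are assumptions made here for illustration and are not taken from the xKV code base.

```python
import numpy as np

def joint_low_rank(caches, rank):
    """Factor a group of per-layer caches into one shared factor and per-layer bases.

    caches: list of arrays, each of shape (tokens, dim), one per layer (assumed layout).
    Returns (shared, per_layer), where shared has shape (tokens, rank) and
    per_layer is a list of (rank, dim) arrays.
    """
    stacked = np.concatenate(caches, axis=1)            # (tokens, layers * dim)
    u, s, vt = np.linalg.svd(stacked, full_matrices=False)
    shared = u[:, :rank] * s[:rank]                     # stored once for the whole group
    per_layer = np.split(vt[:rank], len(caches), axis=1)
    return shared, per_layer

def reconstruct(shared, layer_basis):
    """Rebuild one layer's cache from the shared factor and that layer's basis."""
    return shared @ layer_basis
```

When the grouped caches really do share a low-rank structure, truncating at that rank reconstructs every layer exactly; in practice the rank trades memory against approximation error.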

2502.12468 2026-05-28 cs.LG cs.AI Version update

MCTS-Judge: Test-Time Scaling in LLM-as-a-Judge for Code Correctness Evaluation

Yutong Wang, Pengliang Ji, Chaoqun Yang, Kaixin Li, Ming Hu, Jiaoyang Li, Guillaume Sartoretti

Affiliations: Independent Contributor

AI summary: Proposes the MCTS-Judge framework, which uses Monte Carlo Tree Search at test time to perform decomposed, multi-perspective evaluation, significantly improving the accuracy and efficiency of LLM-as-a-Judge for code correctness evaluation.

Abstract

The LLM-as-a-Judge paradigm shows promise for evaluating generative content but lacks reliability in reasoning-intensive scenarios, such as programming. Inspired by recent advances in reasoning models and shifts in scaling laws, we pioneer bringing test-time computation into LLM-as-a-Judge, proposing MCTS-Judge, a resource-efficient, System-2 thinking framework for code correctness evaluation. MCTS-Judge leverages Monte Carlo Tree Search (MCTS) to decompose problems into simpler, multi-perspective evaluations. Through a node-selection strategy that combines self-assessment based on historical actions in the current trajectory and the Upper Confidence Bound for Trees based on prior rollouts, MCTS-Judge balances global optimization and refinement of the current trajectory. We further designed a high-precision, unit-test-level reward mechanism to encourage the Large Language Model (LLM) to perform line-by-line analysis. Extensive experiments on three benchmarks and five LLMs demonstrate the effectiveness of MCTS-Judge, which improves the base model's accuracy from 41% to 80%, surpassing the o1-series models with 3x fewer tokens. Further evaluations validate the superiority of its reasoning trajectory in logic, analytics, thoroughness, and overall quality, while revealing the test-time scaling law of the LLM-as-a-Judge paradigm.
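The abstract names the Upper Confidence Bound for Trees (UCT) as one ingredient of node selection. The sketch below shows only the standard UCT score; it does not reproduce the paper's combined strategy, which also uses self-assessment. The function names, the tuple layout for children, and the exploration constant `c` are assumptions made here.

```python
import math

def uct_score(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """Standard UCT: mean reward plus an exploration bonus that shrinks with visits."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    exploit = total_reward / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

def select_child(children, parent_visits):
    """Pick the child with the highest UCT score.

    children: list of (name, total_reward, visits) tuples (hypothetical layout).
    """
    best = max(children, key=lambda ch: uct_score(ch[1], ch[2], parent_visits))
    return best[0]
```

A larger `c` favors rarely visited branches, while a smaller `c` concentrates rollouts on branches that already look good.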

2502.08695 2026-05-28 stat.ML cs.LG Version update

A Bayesian Nonparametric Perspective on Mahalanobis Distance for Out of Distribution Detection

Randolph W. Linderman, Noah Cowan, Yiran Chen, Scott W. Linderman

Affiliations: Electrical and Computer Engineering Department; Duke University; Statistics Department; Stanford University; Wu Tsai Neurosciences Institute

AI summary: Establishes a formal relationship between Bayesian nonparametric models and the relative Mahalanobis distance score (RMDS), proposes Bayesian nonparametric mixture models with hierarchical priors that generalize RMDS, and shows on the OpenOOD benchmark that they outperform existing methods when training classes have different covariance structures and few data points per class.

Comments 32 pages, 5 figures, code is available at https://github.com/rwl93/bnp4ood

2408.00057 2026-05-28 q-bio.BM cs.LG Version update

GOProteinGNN: Leveraging Protein Knowledge Graphs for Protein Representation Learning

Dan Kalifa, Uriel Singer, Kira Radinsky

Affiliations: Faculty of Computer Science, Technion Israel Institute of Technology, Haifa, Israel; Meta AI, Tel Aviv, Israel; Technion Israel Institute of Technology; Meta AI

AI summary: Proposes the GOProteinGNN architecture, which enhances protein language models with protein knowledge graph information and performs graph learning at both the amino acid and protein levels, achieving the best performance on multiple downstream tasks.

DOI: 10.1145/3746252.3761500
Journal ref: CIKM 2025: Proceedings of the 34th ACM International Conference on Information and Knowledge Management

Abstract

Proteins play a vital role in biological processes and are indispensable for living organisms. Accurate representation of proteins is crucial, especially in drug development. Recently, there has been a notable increase in interest in utilizing machine learning and deep learning techniques for unsupervised learning of protein representations. However, these approaches often focus solely on the amino acid sequence of proteins and lack factual knowledge about proteins and their interactions, thus limiting their performance. In this study, we present GOProteinGNN, a novel architecture that enhances protein language models by integrating protein knowledge graph information during the creation of amino acid level representations. Our approach allows for the integration of information at both the individual amino acid level and the entire protein level, enabling a comprehensive and effective learning process through graph-based learning. By doing so, we can capture complex relationships and dependencies between proteins and their functional annotations, resulting in more robust and contextually enriched protein representations. Unlike previous methods, GOProteinGNN uniquely learns the entire protein knowledge graph during training, which allows it to capture broader relational nuances and dependencies beyond mere triplets as done in previous work. We perform a comprehensive evaluation on several downstream tasks demonstrating that GOProteinGNN consistently outperforms previous methods, showcasing its effectiveness and establishing it as a state-of-the-art solution for protein representation learning.

2412.08052 2026-05-28 cs.LG stat.ML Version update

CANDOR: Counterfactual ANnotated DOubly Robust Off-Policy Evaluation

Aishwarya Mandyam, Shengpu Tang, Jiayu Yao, Jenna Wiens, Barbara E. Engelhardt

Affiliations: Stanford University; Emory University; Columbia University; University of Michigan; The Gladstone Institutes

AI summary: Proposes off-policy evaluation estimators built on the doubly robust framework; incorporating counterfactual annotations only in the reward model component outperforms the other strategies both in theoretical guarantees and in experiments.

Comments 11 pages, published in the conference proceedings of the Conference on Health Inference and Learning (2026)

Abstract

Off-policy evaluation (OPE) is critical for applying contextual bandit algorithms to high-stakes decision-making settings such as healthcare, where new treatment policies must be evaluated prior to deployment. Unfortunately, OPE techniques are inherently limited by the breadth of the available data, which may not be sufficient to evaluate the performance of a new policy. Recent work attempts to improve dataset coverage by adding expert-annotated counterfactual samples. However, such annotations are often imperfect and can lead to worse estimator performance than using no annotations at all. To better leverage imperfect annotations, we propose a family of OPE estimators grounded in the doubly robust (DR) framework, which combines importance sampling (IS) with a reward model (direct method, DM) for better statistical guarantees. We study three ways of incorporating counterfactual annotations. Under mild assumptions, we prove that using annotations within just the DM component yields the most desirable theoretical results. Experiments on multiple healthcare tasks, including real-world electronic health records (EHR) data, show that this strategy is most robust under misspecified reward models and inaccurate annotations. By addressing the challenges posed by imperfect annotations, this work broadens the applicability of OPE methods and facilitates safer deployment of decision-making policies in healthcare.
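To make the doubly robust (DR) framework mentioned in the abstract concrete, here is a minimal sketch of the textbook DR estimator for contextual bandits, which adds an importance-weighted correction to a direct-method (reward model) term. It is not the CANDOR estimator itself: the function signature and the callables passed in are hypothetical, and counterfactual annotations are not modeled. Under the strategy the abstract favors, such annotations would presumably enter only through how `reward_model` is fit.

```python
def doubly_robust_value(logs, target_policy, behavior_policy, reward_model, actions):
    """Textbook DR estimate of the target policy's value.

    logs: iterable of (context, action, reward) triples collected under behavior_policy.
    target_policy(x, a), behavior_policy(x, a): action probabilities.
    reward_model(x, a): predicted reward (the direct-method component).
    """
    logs = list(logs)
    total = 0.0
    for x, a, r in logs:
        # Direct method: expected predicted reward under the target policy.
        dm = sum(target_policy(x, b) * reward_model(x, b) for b in actions)
        # Importance-sampling correction on the logged action's residual.
        weight = target_policy(x, a) / behavior_policy(x, a)
        total += dm + weight * (r - reward_model(x, a))
    return total / len(logs)
```

The estimate stays unbiased if either the reward model or the behavior probabilities are correct, which is the "doubly robust" property.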

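2411.18502 2026-05-28 stat.ML cs.AI cs.IR cs.LG stat.ME Version update

Isometry pursuit

Samson Koelle, Marina Meila

Affiliations: Amazon; Department of Statistics, University of Washington

AI summary: Proposes the isometry pursuit algorithm, which identifies submatrices with orthonormal columns in wide matrices through a novel normalization method and multitask basis pursuit, for discovering isometric embeddings from interpretable dictionaries.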

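2410.12035 2026-05-28 stat.ML cs.LG Version update

Learning with Importance Weighted Variational Inference

Kamélia Daudel, François Roueff

Affiliations: ESSEC Business School

AI summary: Compares, via asymptotic analysis, reparameterized and doubly-reparameterized gradient estimators for the IWAE, VR and VR-IWAE bounds, revealing a bias-variance tradeoff and justifying the superiority of DREP; also analyzes whether the gradient estimators point in a well-founded direction in challenging regimes.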

Abstract

Several variational bounds involving importance weighting ideas generalize the Evidence Lower BOund (ELBO) for marginal likelihood optimization, such as the Importance-weighted Auto-Encoder (IWAE), Variational Rényi (VR) and VR-IWAE bounds. Yet, it remains unclear how the joint choice of bound and gradient estimator impacts the behavior of the resulting variational inference (VI) algorithms. This paper provides a unified theoretical comparison of reparameterized (REP) and doubly-reparameterized (DREP) gradient estimators tied to the IWAE, VR and VR-IWAE bounds. Through asymptotic analyses of the Signal-to-Noise Ratio as the number of Monte Carlo samples $N$ goes to infinity, we identify a bias-variance tradeoff in these gradient estimators and we formally justify the superiority of DREP over REP in importance-weighted VI. An additional asymptotic analysis for challenging regimes, where both $N$ and the Kullback-Leibler divergence between the variational and posterior densities go to infinity, indicates that importance-weighted VI gradient estimators point in a well-founded direction even when the variational approximation deteriorates. Together, these complementary results characterize the optimization trajectory in importance-weighted VI from poor initialization to final convergence. Importantly, our proof techniques establish general theoretical tools for the study of ratios of sample means whose scope extends beyond VI and which constitute an independent contribution to the field of Monte Carlo methods.
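As a small illustration of the bounds the abstract compares, the sketch below computes single Monte Carlo estimates of the ELBO and the IWAE bound from the same $N$ log importance weights, $\log w_i = \log p(x, z_i) - \log q(z_i \mid x)$. It covers only the bound values, not the REP or DREP gradient estimators the paper analyzes, and the function names are assumptions made here.

```python
import math

def log_mean_exp(log_weights):
    """Numerically stable log of the average of exp(log_weights)."""
    m = max(log_weights)
    mean = sum(math.exp(lw - m) for lw in log_weights) / len(log_weights)
    return m + math.log(mean)

def iwae_estimate(log_weights):
    """IWAE bound estimate: log of the mean importance weight."""
    return log_mean_exp(log_weights)

def elbo_estimate(log_weights):
    """ELBO estimate: mean of the log importance weights."""
    return sum(log_weights) / len(log_weights)
```

By Jensen's inequality the IWAE estimate is never below the ELBO estimate computed from the same weights, and the two coincide when $N = 1$.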

2410.10241 2026-05-28 cs.LG cs.AI stat.ML Version update

Revisiting Graph Autoencoders as Implicit Contrastive Learners

Jintang Li, Ruofan Wu, Yuchang Zhu, Huizhe Zhang, Zulun Zhu, Liang Chen

Affiliations: Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University; Coupang, Shanghai, China; Sun Yat-sen University; Nanyang Technological University

AI summary: Revisits graph autoencoders through the lens of contrastive learning, reveals that they are implicit contrastive learners, stresses the key role of contrastive view design, and identifies asymmetric subgraph views as an important design dimension.

Comments KDD 2026 research track. Code available at https://github.com/EdisonLeeeee/lrGAE

Abstract

Graph autoencoders (GAEs) and graph contrastive learning (GCL) are two major paradigms for self-supervised representation learning on graphs, yet they are often studied in isolation and treated as fundamentally different approaches. In this work, we revisit GAEs through the lens of contrastive learning and show that both structure-based and feature-based GAEs can be conceptualized as implicit graph contrastive learners. This perspective reveals that many existing GAEs differ primarily in how contrastive views are constructed, rather than in their learning objectives or architectures. Building on this insight, we introduce a unified formulation that highlights contrastive view design as a central and previously less explored dimension in GAEs. In particular, we identify asymmetric contrastive views, arising from mismatches in subgraph views, as an important yet underexplored design axis in prior GAE research. We formalize this insight within a unified framework and conduct systematic experiments on representative graph learning tasks to examine its impact on performance and efficiency. Our results show that interpreting GAEs as implicit contrastive learners offers a clearer understanding of existing models and provides practical guidance for designing effective and scalable graph autoencoders.

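2410.04498 2026-05-28 cs.LG Version update

AdaMemento: Adaptive Memory-Assisted Policy Optimization for Reinforcement Learning

Renye Yan, Yaozhong Gan, You Wu, Junliang Xing, Ling Liangn, Yeshang Zhu, Yimao Cai

Affiliations: Peking University; Tsinghua University

AI summary: For sparse-reward reinforcement learning, proposes the AdaMemento framework, which exploits positive and negative experiences through a memory-reflection module, guides exploration with fine-grained intrinsic motivation, and adaptively coordinates exploitation and exploration via ensemble learning, significantly improving performance.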

Abstract

In sparse-reward scenarios of reinforcement learning (RL), the memory mechanism provides promising shortcuts to policy optimization by reflecting on past experiences as humans do. However, current memory-based RL methods simply store and reuse high-value policies, lacking deeper refinement and filtering of diverse past experiences and hence limiting the capability of memory. In this paper, we propose AdaMemento, an adaptive memory-enhanced RL framework. Instead of just memorizing positive past experiences, we design a memory-reflection module that exploits both positive and negative experiences by learning to predict known local optimal policies based on real-time states. To effectively gather informative trajectories for the memory, we further introduce a fine-grained intrinsic motivation paradigm, where nuances in similar states can be precisely distinguished to guide exploration. The exploitation of past experiences and exploration of new policies are then adaptively coordinated by ensemble learning to approach the global optimum. Furthermore, we theoretically prove the superiority of our new intrinsic motivation and ensemble mechanism. From 59 quantitative and visualization experiments, we confirm that AdaMemento can distinguish subtle states for better exploration and effectively exploit past experiences in memory, achieving significant improvement over previous methods.

2407.21075 2026-05-28 cs.AI cs.CL cs.LG Version update

Apple Intelligence Foundation Language Models

Tom Gunter, Zirui Wang, Chong Wang, Ruoming Pang, Andy Narayanan, Aonan Zhang, Bowen Zhang, Chen Chen, Chung-Cheng Chiu, David Qiu, Deepak Gopinath, Dian Ang Yap, Dong Yin, Feng Nan, Floris Weers, Guoli Yin, Haoshuo Huang, Jianyu Wang, Jiarui Lu, John Peebles, Ke Ye, Mark Lee, Nan Du, Qibin Chen, Quentin Keunebroek, Sam Wiseman, Syd Evans, Tao Lei, Vivek Rathod, Xiang Kong, Xianzhi Du, Yanghao Li, Yongqiang Wang, Yuan Gao, Zaid Ahmed, Zhaoyang Xu, Zhiyun Lu, Al Rashid, Albin Madappally Jose, Alec Doane, Alfredo Bencomo, Allison Vanderby, Andrew Hansen, Ankur Jain, Anupama Mann Anupama, Areeba Kamal, Bugu Wu, Carolina Brum, Charlie Maalouf, Chinguun Erdenebileg, Chris Dulhanty, Daniel Parilla, Dominik Moritz, Doug Kang, Eduardo Jimenez, Evan Ladd, Fangping Shi, Felix Bai, Frank Chu, Fred Hohman, Hadas Kotek, Hannah Gillis Coleman, Jane Li, Jeffrey Bigham, Jeffery Cao, Jeff Lai, Jessica Cheung, Jiulong Shan, Joe Zhou, John Li, Jun Qin, Karanjeet Singh, Karla Vega, Kelvin Zou, Laura Heckman, Lauren Gardiner, Margit Bowler, Maria Cordell, Meng Cao, Nicole Hay, Nilesh Shahdadpuri, Otto Godwin, Pranay Dighe, Pushyami Rachapudi, Ramsey Tantawi, Roman Frigg, Sam Davarnia, Sanskruti Shah, Saptarshi Guha, Sasha Sirovica, Shen Ma, Shuang Ma, Simon Wang, Sulgi Kim, Suma Jayaram, Vaishaal Shankar, Varsha Paidi, Vivek Kumar, Xin Wang, Xin Zheng, Walker Cheng, Yael Shrager, Yang Ye, Yasu Tanaka, Yihao Guo, Yunsong Meng, Zhao Tang Luo, Zhi Ouyang, Alp Aygar, Alvin Wan, Andrew Walkingshaw, Andy Narayanan, Antonie Lin, Arsalan Farooq, Brent Ramerth, Colorado Reed, Chris Bartels, Chris Chaney, David Riazati, Eric Liang Yang, Erin Feldman, Gabriel Hochstrasser, Guillaume Seguin, Irina Belousova, Joris Pelemans, Karen Yang, Keivan Alizadeh Vahid, Liangliang Cao, Mahyar Najibi, Marco Zuliani, Max Horton, Minsik Cho, Nikhil Bhendawade, Patrick Dong, Piotr Maj, Pulkit Agrawal, Qi Shan, Qichen Fu, Regan Poston, Sam Xu, Shuangning Liu, Sushma Rao, Tashweena Heeramun, Thomas Merth, Uday Rayala, Victor Cui, Vivek Rangarajan Sridhar, Wencong Zhang, Wenqi Zhang, Wentao Wu, Xingyu Zhou, Xinwen Liu, Yang Zhao, Yin Xia, Zhile Ren, Zhongzheng Ren

Affiliations: Apple

AI summary: Introduces the foundation language models developed to power Apple Intelligence features, including a roughly 3-billion-parameter model designed to run efficiently on device and a large server-based model for Private Cloud Compute, and describes their architecture, training data, optimization process, and evaluation results.

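2405.09689 2026-05-28 cs.LG cs.AI cs.SC Version update

Generalized Holographic Reduced Representations

Calvin Yeung, Zhuowen Zou, SungHeon Jeong, Wenjun Huang, Nathaniel D Bastian, Mohsen Imani

Affiliations: University of California, Irvine; United States Military Academy

AI summary: Proposes Generalized Holographic Reduced Representations (GHRR), which improve hyperdimensional computing's ability to encode complex compositional structures through a flexible, non-commutative binding operation, and verifies on a language modeling task that it can replace the attention mechanism and improve performance.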

DOI: 10.1109/TAI.2026.3678232

Abstract

Hyperdimensional Computing (HDC) is a computationally and data-efficient paradigm that acts as a bridge between connectionist and symbolic approaches to artificial intelligence (AI). However, HDC's simplicity poses challenges for encoding complex compositional structures, especially in its binding operation. To address this, we propose Generalized Holographic Reduced Representations (GHRR), an extension of Fourier Holographic Reduced Representations (FHRR), a specific HDC implementation. GHRR introduces a flexible, non-commutative binding operation, enabling improved encoding of complex data structures while preserving HDC's desirable properties of robustness and transparency. In this work, we introduce the GHRR framework, prove its theoretical properties and its adherence to HDC properties, explore its kernel and binding characteristics, and perform empirical experiments showcasing its flexible non-commutativity and enhanced decoding accuracy for compositional structures. We also demonstrate that binding in GHRR is more expressive than that in other HDC variants; in particular, we show that binding in GHRR can implement a kind of attention mechanism. We verify this by replacing the attention mechanism in a transformer with its GHRR-equivalent and testing it on a language modeling task, showing improved performance compared to a vanilla transformer.
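For context on why a non-commutative binding is of interest, the sketch below implements binding in plain FHRR, the baseline that GHRR extends, as commonly described: hypervectors are vectors of unit-modulus complex numbers and binding is element-wise multiplication. This is not GHRR itself, and the helper names and the dimension used are assumptions made here.

```python
import math
import cmath
import random

def random_hv(dim, rng):
    """Random FHRR hypervector: unit-modulus complex entries with uniform phases."""
    return [cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for _ in range(dim)]

def bind(a, b):
    """FHRR binding: element-wise complex multiplication (commutative)."""
    return [x * y for x, y in zip(a, b)]

def unbind(c, a):
    """Recover the other operand by multiplying with the complex conjugate of a."""
    return [x * y.conjugate() for x, y in zip(c, a)]

def similarity(a, b):
    """Normalized real inner product; close to 1 for equal vectors, near 0 for random ones."""
    return sum((x * y.conjugate()).real for x, y in zip(a, b)) / len(a)
```

Because `bind(a, b)` equals `bind(b, a)` here, the bound vector cannot record which operand played which role (for example, role versus filler); that order-blindness is the kind of limitation a non-commutative binding is meant to lift.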

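2403.16825 2026-05-28 cs.LG math.OC math.PR stat.ML Version update

Weak Convergence Analysis of Online Neural Actor-Critic Algorithms

Samuel Chun-Hei Lam, Justin Sirignano, Ziheng Wang

Affiliations: Mathematical Institute, University of Oxford

AI summary: Proves that a single-layer neural network trained with the online actor-critic algorithm converges in distribution to a random ordinary differential equation as the number of hidden units and the number of training steps tend to infinity, and analyzes the limiting behavior using a Poisson equation and weak convergence techniques.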

Abstract

We prove that a single-layer neural network trained with the online actor-critic algorithm converges in distribution to a random ordinary differential equation (ODE) as the number of hidden units and the number of training steps both tend to infinity. In the online actor-critic algorithm, the distribution of the data samples dynamically changes as the model is updated, which is a key challenge for any convergence analysis. We establish the geometric ergodicity of the data samples under a fixed actor policy. Then, using a Poisson equation, we prove that the fluctuations of the model updates around the limit distribution due to the randomly arriving data samples vanish as the number of parameter updates tends to infinity. Using the Poisson equation and weak convergence techniques, we prove that the actor neural network and critic neural network converge to the solutions of a system of ODEs with random initial conditions. Analysis of the limit ODE shows that the limit critic network will converge to the true value function, which will provide the actor with an asymptotically unbiased estimate of the policy gradient. We then prove that the limit actor network will converge to a stationary point.

2308.07772 2026-05-28 cs.LG cs.AI Version update

MOLE: MOdular Learning FramEwork via Mutual Information Maximization

Tianchao Li, Yulong Pei

Affiliations: Department of Mathematics; Computer Science, Eindhoven University of Technology, Eindhoven, the Netherlands

AI summary: Proposes MOLE, an asynchronous local learning framework that modularizes a network by layers and maximizes mutual information to achieve gradient-isolated local optimization; it applies to vector, grid, and graph data, and its generality is validated on graph-level and node-level tasks.

Comments Accepted by ICML LLW

2006.06049 2026-05-28 cs.LG stat.ML Version update

On Mixup Regularization

Luigi Carratino, Moustapha Cissé, Rodolphe Jenatton, Jean-Philippe Vert

Affiliations: MaLGa - University of Genova, Italy; Google Research - Brain team, Accra; Google Research - Brain team, Berlin; Google Research - Brain team, Paris

AI summary: Interprets Mixup as a combination of a data transformation and a random perturbation, revealing its regularization effects; proposes applying the data transformation at test time as an improvement and identifies mechanisms such as label smoothing and a reduced Lipschitz constant.

Journal ref: Journal of Machine Learning Research, 23(325):1-31, 2022

Abstract

Mixup is a data augmentation technique that creates new examples as convex combinations of training points and labels. This simple technique has been shown empirically to improve the accuracy of many state-of-the-art models in different settings and applications, but the reasons behind this empirical success remain poorly understood. In this paper we take a substantial step in explaining the theoretical foundations of Mixup, by clarifying its regularization effects. We show that Mixup can be interpreted as a standard empirical risk minimization estimator subject to a combination of data transformation and random perturbation of the transformed data. We gain two core insights from this new interpretation. First, the data transformation suggests that, at test time, a model trained with Mixup should also be applied to transformed data, a one-line change in code that we show empirically to improve both accuracy and calibration of the prediction. Second, we show how the random perturbation of the new interpretation of Mixup induces multiple known regularization schemes, including label smoothing and reduction of the Lipschitz constant of the estimator. These schemes interact synergistically with each other, resulting in a self-calibrated and effective regularization effect that prevents overfitting and overconfident predictions. We corroborate our theoretical analysis with experiments that support our conclusions.
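The convex-combination step that defines Mixup can be sketched in a few lines. The code covers only the basic augmentation, not the test-time transformation the paper proposes. Drawing the mixing coefficient from a Beta(alpha, alpha) distribution follows common Mixup practice and is an assumption here, as are the function names.

```python
import random

def mixup_pair(x1, y1, x2, y2, lam):
    """Mix two examples: inputs and one-hot labels share the same coefficient lam."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y

def sample_lambda(alpha, rng):
    """Draw the mixing coefficient from Beta(alpha, alpha) (assumed convention)."""
    return rng.betavariate(alpha, alpha)
```

Mixing one-hot labels yields a soft label whose entries still sum to one, which gives a direct view of the label-smoothing effect the abstract mentions.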

2206.15475 2026-05-28 cs.LG stat.ME Version update

Causal Machine Learning: A Survey and Open Problems

Jean Kaddour, Aengus Lynch, Qi Liu, Matt J. Kusner, Ricardo Silva

Affiliations: AI Centre, Department of Computer Science, UCL; Department of Computer Science, University of Hong Kong; AI Centre, Department of Statistical Science, UCL

AI summary: Surveys five main research directions in causal machine learning (CausalML): causal supervised learning, causal generative modeling, causal explanations, causal fairness, and causal reinforcement learning. It systematically compares the methods in each direction, identifies open problems, and discusses applications in computer vision, natural language processing, and graph representation learning.

Comments v03. Work in progress. Feedback and comments are highly appreciated!

2205.14090 2026-05-28 stat.ML cs.LG Version update

Surrogate modeling for Bayesian optimization beyond a single Gaussian process

Qin Lu, Konstantinos D. Polyzos, Bingcong Li, Georgios B. Giannakis

Affiliations: Dept. of Electrical and Computer Engineering, University of Minnesota

AI summary: Proposes an adaptive surrogate model based on an ensemble of Gaussian processes (EGP), combined with Thompson sampling (TS) for Bayesian optimization, to enhance expressiveness and parallelism, and establishes a Bayesian regret analysis.

Comments This version added some minor corrections and clarifications to the proofs

Abstract

Bayesian optimization (BO) has well-documented merits for optimizing black-box functions with an expensive evaluation cost. Such functions emerge in applications as diverse as hyperparameter tuning, drug discovery, and robotics. BO hinges on a Bayesian surrogate model to sequentially select query points so as to balance exploration with exploitation of the search space. Most existing works rely on a single Gaussian process (GP) based surrogate model, where the kernel function form is typically preselected using domain knowledge. To bypass such a design process, this paper leverages an ensemble (E) of GPs to adaptively select the surrogate model fit on-the-fly, yielding a GP mixture posterior with enhanced expressiveness for the sought function. Acquisition of the next evaluation input using this EGP-based function posterior is then enabled by Thompson sampling (TS) that requires no additional design parameters. To endow function sampling with scalability, random feature-based kernel approximation is leveraged per GP model. The novel EGP-TS readily accommodates parallel operation. To further establish convergence of the proposed EGP-TS to the global optimum, analysis is conducted based on the notion of Bayesian regret for both sequential and parallel settings. Tests on synthetic functions and real-world applications showcase the merits of the proposed method.

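2112.09305 2026-05-28 cs.LG stat.ML Version update

Gaussian RBF Centered Kernel Alignment (CKA) in the Large Bandwidth Limit

Sergio A. Alvarez

Affiliations: Boston College

AI summary: Proves that centered kernel alignment (CKA) based on Gaussian RBF kernels converges to linear CKA in the large-bandwidth limit, and finds that the onset of convergence is sensitive to the geometry of the feature representations, with representation eccentricity bounding the range of bandwidths for which Gaussian CKA behaves nonlinearly.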

Comments 11 pages, 3 figures

1901.03808 2026-05-28 cs.LG eess.SP stat.ML Version update

ECGadv: Generating Adversarial Electrocardiogram to Misguide Arrhythmia Classification System

Huangxun Chen, Chenyu Huang, Qianyi Huang, Qian Zhang, Wei Wang

Affiliations: The Hong Kong University of Science and Technology; Southern University of Science and Technology, Peng Cheng Laboratory; Huazhong University of Science and Technology

AI summary: Targets DNN-based ECG diagnosis systems: analyzes the properties of ECGs and designs adversarial attack schemes under two attack models, revealing blind spots in these systems and calling for countermeasures.

Comments Accepted by AAAI 2020

DOI: 10.1609/AAAI.V34I04.5748
Journal ref: Proceedings of the AAAI conference on artificial intelligence 2020

Abstract

Deep neural network (DNN)-powered electrocardiogram (ECG) diagnosis systems have recently achieved promising progress toward taking over tedious examinations from cardiologists. However, their vulnerability to adversarial attacks still lacks comprehensive investigation. Existing attacks from the image domain are not directly applicable because ECGs differ in how they are visualized and in their dynamic properties. Thus, this paper takes a step toward thoroughly exploring adversarial attacks on DNN-powered ECG diagnosis systems. We analyze the properties of ECGs to design effective attack schemes under two attack models. Our results demonstrate the blind spots of DNN-powered diagnosis systems under adversarial attacks, which call for adequate countermeasures.
