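arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.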

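2606.02563 2026-06-02 cs.LG cs.CR cs.DC Version update

IntraShuffler: A Privacy Preserving Framework for Heterogeneous DP Federated Learning

Farhin Farhad Riya, Olivera Kotevska, Jinyuan Stella Sun

Affiliations: University of Tennessee, Knoxville, USA; Oak Ridge National Laboratory, USA

AI summary: Targeting privacy inference attacks in heterogeneous differentially private federated learning, where an honest-but-curious server infers client attributes from gradient structure, the paper proposes IntraShuffler, a middleware framework whose privacy-aware shuffling mechanism disrupts persistent gradient structure while preserving ε-aware aggregation. It reduces gradient recoverability by over 60% and lowers surrogate inference accuracy from 0.78 to 0.33.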

Abstract

Heterogeneous Differential Privacy (HDP) in Federated Learning (FL) allows clients to select individual privacy budgets ($\varepsilon_i$) according to institutional policies and data sensitivity. In practice, many HDP-FL systems employ $\varepsilon$-aware server aggregation to improve model utility by re-weighting client updates according to their declared privacy budgets. However, gradient updates in FL retain structural patterns induced by non-independent and identically-distributed (non-IID) data, and these additional signals exposed by $\varepsilon$-aware aggregation create new opportunities for inference by an honest-but-curious server. In this work, we first show that a server equipped with gradient denoising and surrogate modeling can mount a Privacy Inference Attack that infers distributional attributes of clients and links updates from the same client across training rounds, measured via surrogate inference accuracy and linkage success, under realistic knowledge constraints. The Shuffle-Model has been widely studied as a defense against such inference risks by anonymizing update sources, but it is fundamentally incompatible with HDP-FL $\varepsilon$-aware aggregation. To address this challenge, we propose IntraShuffler, a middleware defense framework designed for HDP-FL systems. IntraShuffler introduces a privacy-aware shuffling mechanism that groups clients into privacy-compatible buckets and performs parameter-level shuffling within each bucket to disrupt persistent gradient structure while preserving $\varepsilon$-aware aggregation. Experiments across four different datasets show that IntraShuffler reduces gradient recoverability by over 60% and decreases surrogate inference accuracy from 0.78 to 0.33 while maintaining comparable model utility across multiple FL aggregation rules.
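The bucket-then-shuffle idea in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the bucketing rule (fixed ε upper bounds in `edges`), the function names, and the choice to permute each coordinate independently are all hypothetical. The point it shows is that shuffling parameters among clients of the same bucket leaves every per-bucket sum unchanged, so an aggregator that weights whole buckets by their privacy level would see the same totals.

```python
import math
import random
from collections import defaultdict


def bucket_by_epsilon(epsilons, edges):
    """Group client indices by privacy budget.

    `edges` are ascending upper bounds (a hypothetical bucketing rule);
    clients above every edge share one final bucket.
    """
    buckets = defaultdict(list)
    for client, eps in enumerate(epsilons):
        bucket_id = next((i for i, edge in enumerate(edges) if eps <= edge), len(edges))
        buckets[bucket_id].append(client)
    return dict(buckets)


def shuffle_within_buckets(updates, buckets, rng):
    """Permute each parameter coordinate independently across a bucket's clients."""
    shuffled = [list(u) for u in updates]  # leave the input untouched
    dim = len(updates[0])
    for members in buckets.values():
        for j in range(dim):
            column = [updates[c][j] for c in members]
            rng.shuffle(column)
            for c, value in zip(members, column):
                shuffled[c][j] = value
    return shuffled


def bucket_sums(updates, buckets):
    """Coordinate-wise sum of the updates in each bucket (order-independent via fsum)."""
    dim = len(updates[0])
    return {
        b: [math.fsum(updates[c][j] for c in members) for j in range(dim)]
        for b, members in buckets.items()
    }
```

After shuffling, a single client's vector mixes coordinates from several bucket members, which is what breaks row-level structure, while `bucket_sums` returns the same values before and after.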


2606.02562 2026-06-02 cs.RO cs.AI cs.LG cs.SY eess.SY Version update

Permissive Safety Through Trusted Inference: Verifiable Belief-Space Neural Safety Filters for Assured Interactive Robotics

Haimin Hu

Affiliations: Department of Computer Science, Johns Hopkins University, USA

AI summary: To address safety problems caused by human-induced uncertainty in interactive robotics, the paper proposes a conformal-prediction-based method for verifying belief-space safety filters. The method guarantees high-probability safety while accounting for inference reliability, and it reduces conservativeness.

Comments: Accepted to the 17th World Symposium on the Algorithmic Foundations of Robotics (WAFR 2026)

Abstract

Autonomous robots that interact with people must make safe and efficient decisions under human-induced uncertainty, such as their preferences, goals, competency, and willingness to cooperate. Safety filters are a popular approach for ensuring safety in interactive robotics, since their modular design separates safety from performance, allowing robots to operate safely around people with minimal impact on task efficiency. While traditional safety filters typically operate only in the physical space, neglecting the robot's ability to learn and adapt online, the recently proposed belief-space safety filter (BeliefSF) reasons about robot safety in closed-loop with runtime inference that actively reduces the robot's uncertainty online, thereby reducing conservativeness in filtering. However, providing formal safety guarantees for robots deploying BeliefSF remains a significant challenge due to errors in runtime inference and neural approximation of safety filters required to handle the high dimensionality of belief spaces. In this paper, we propose an algorithmic approach to certify high-probability safety of BeliefSF using conformal prediction, while explicitly accounting for the reliability of the robot's runtime inference module. Our method leverages the structure of belief-space safety filtering by focusing verification on a region where inference is expected to be reliable. It preserves the simplicity and sample complexity of standard conformal prediction, yet can certify a substantially less conservative safety filter. Through a simulated human-vehicle interaction benchmark, we show that our approach verifies a significantly more permissive belief-space safety filter than a standard conformal prediction baseline.
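For readers unfamiliar with conformal prediction, the standard split-conformal calibration step that the abstract builds on looks roughly like this. It is a generic textbook sketch, not the paper's certification procedure: the function name is hypothetical, and how BeliefSF defines its scores or restricts calibration to the region where inference is reliable is not shown here.

```python
import math


def conformal_threshold(calibration_scores, alpha):
    """Split-conformal threshold at miscoverage level alpha.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest calibration score.
    If the scores are exchangeable with a new test score, the new score falls
    at or below this threshold with probability at least 1 - alpha.
    Returns infinity when n is too small to support the requested level.
    """
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(calibration_scores)[rank - 1]
```

In a safety setting the score would typically be some measure of how close a rollout came to violating a constraint, so a smaller threshold corresponds to a less conservative certificate. That reading is an assumption about usage, not a statement from the paper.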


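2606.02528 2026-06-02 q-fin.GN cs.CY cs.LG Version update

Auditing Asset-Specific Preferences in Financial Large Language Models: Evidence from Bitcoin Representations and Portfolio Allocation

Wenbin Wu

AI summary: Using a three-level audit protocol, this study finds that large language models hold frame-dependent preferences toward Bitcoin and identifies a Bitcoin-selective internal feature that can be causally intervened on and that significantly affects downstream portfolio allocation.

Comments: 28 pages, 5 figures, 18 tables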

Abstract

Large language models now power robo-advisors and trading agents, yet whether they carry built-in biases toward specific assets is largely untested. We ask three questions: do LLMs systematically prefer certain financial instruments; can an internal representation with causal leverage over those preferences be identified; and does that representation affect downstream financial decisions? We develop a three-level audit protocol and apply it to Bitcoin. First, a behavioral audit of eight frontier LLMs shows that Bitcoin's ranking among money-like instruments is frame-dependent: models place it around rank 5 of 8 as "reliable money" but near the top under crisis and autonomous-agent frames, and an attribute-swap experiment confirms rankings track functional properties, not names. Second, we open a model's internals: a search across thousands of sparse-autoencoder features in Gemma 3 identifies a dominant Bitcoin-selective feature. Amplifying it shifts the model toward the asset and suppressing it shifts the model away, even when "Bitcoin" never appears in the prompt. Third, we test financial consequences: amplification raises Bitcoin's portfolio share by 5.2 percentage points while suppression lowers it by 4.6 pp, with amplification reallocating within crypto and suppression cutting total crypto exposure. We characterize this as bounded behavioral leverage (leverage meaning causal influence over outputs, not financial leverage): an identifiable internal feature can be perturbed to move financial choices, but only within measurable limits. The framework links internal representations to external recommendations, validated with random controls and mechanism boundaries. As LLMs become autonomous financial agents, this is a first step toward a behavioral layer for emerging know-your-agent (KYA) standards: knowing what an agent prefers, and how far that preference can be moved.


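2606.02515 2026-06-02 cs.LG Version update

A Biconvex Formulation for Stable Transport of Mixture Models with a Unique Solution

Yeganeh Marghi, Kelly Jin, Uygar Sümbül

AI summary: Proposes the Optimal Mixture Transport (OMT) framework, which achieves stable transport of subpopulation mixtures through a strictly biconvex optimization, with theoretical stability guarantees and a computational complexity that depends only on the number of mixture components.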

Abstract

Optimal transport (OT) provides a principled framework for mapping between probability distributions. Despite extensive progress, applying OT to large-scale data remains computationally demanding, and the resulting pointwise transport plans are often difficult to interpret. We introduce Optimal Mixture Transport (OMT), a scalable framework that shifts the transport paradigm from individual samples to mixtures of subpopulations, reformulating the transport problem as a strictly biconvex optimization with a unique global minimizer. We further establish theoretical guarantees on the stability of the OMT map, showing that bounded perturbations of the underlying distributions lead to bounded changes in the transport plan. By formulating subpopulations as exponential-family distributions, OMT decouples computational complexity from the sample size, scaling solely with the number of mixture components. We demonstrate the effectiveness and practicality of OMT on a wide range of synthetic benchmarks and real-world datasets, including image data and large-scale single-cell RNA sequencing measurements.
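To see why working with mixture components decouples cost from sample size, consider the simplest case of one-dimensional Gaussian components, where the squared 2-Wasserstein distance has a well-known closed form. The sketch below builds a component-by-component cost matrix from (mean, standard deviation) pairs. It is an assumption-laden illustration only: it does not implement the OMT objective or its biconvex solver, and the function names are hypothetical.

```python
def gaussian_w2_squared(mean_a, std_a, mean_b, std_b):
    """Squared 2-Wasserstein distance between two 1-D Gaussians (closed form)."""
    return (mean_a - mean_b) ** 2 + (std_a - std_b) ** 2


def component_cost_matrix(source, target):
    """Cost between every source and target component.

    `source` and `target` are lists of (mean, std) pairs. The matrix has
    len(source) * len(target) entries, regardless of how many samples were
    used to fit the components.
    """
    return [
        [gaussian_w2_squared(ma, sa, mb, sb) for (mb, sb) in target]
        for (ma, sa) in source
    ]
```

With 2 source and 3 target components the matrix has 6 entries whether the underlying datasets hold a thousand or a billion points.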


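2606.02507 2026-06-02 cond-mat.mtrl-sci cs.ET cs.LG physics.app-ph physics.comp-ph Version update

Towards Automated Discovery: A Review of Generative Models, Multimodal Learning and Closed-Loop Workflows in Inverse Materials Design

Anand Babu, Rogério Almeida Gouvêa, Gian-Marco Rignanese

Affiliations: Institute of Condensed Matter and Nanosciences, Université Catholique de Louvain; WEL Research Institute

AI summary: This paper reviews recent advances in generative crystal-structure modeling, multimodal learning, and closed-loop design pipelines for inverse materials design. It focuses on how feasibility constraints and physical priors are imposed, on multimodal fusion strategies, and on several inverse design strategies (conditional generation and latent optimization, Bayesian optimization, reinforcement learning, and active learning), and it identifies common failure modes and evaluation practices based on stage-wise reporting.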


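Spectral Audit of In-Context Operator Networks

Zhiwei Gao, Liu Yang, George Em Karniadakis

Affiliations: Division of Applied Mathematics, Brown University; Department of Mathematics, National University of Singapore

AI summary: Proposes a Jacobian-based spectral audit that analyzes local spectral properties in in-context operator learning (frequency gain, phase structure, cross-mode coupling) to assess whether a model has actually learned the local dynamical mechanisms of the PDE operator rather than only its output predictions.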

Abstract

Existing evaluations of neural operators and in-context operator learning rely primarily on prediction error, but accurate output prediction does not guarantee the correct local dynamical structure. A model may match solutions while exhibiting incorrect sensitivities, distorted frequency response, spurious mode coupling, or unstable tangent behavior. We introduce a Jacobian-based spectral audit for in-context operator learning. For a fixed prompt, we differentiate the network output with respect to the query function and view the resulting Jacobian as a learned tangent operator. Projecting it onto Fourier modes, we obtain a local spectral characterization of the inferred operator, including frequency-dependent gains, phase structure, and cross-mode coupling. The audit complements standard prediction metrics by testing whether the model reproduces local mechanisms of the underlying PDE operator rather than only outputs. Across benchmarks, the audit reveals distinct operator-level phenomena, including phase transport, viscosity-dependent damping, nonlinear mode coupling, and reaction-diffusion stability structure. It also detects failures partially hidden by prediction-error metrics, including high-frequency degradation, incorrect phase recovery, and prompt-operator inconsistencies. Corrupted or internally inconsistent prompts lead to degraded tangent-operator structure even when pointwise predictions remain partially accurate. Our results suggest that prediction accuracy and local operator fidelity are distinct properties of learned neural operators. Our framework also provides a diagnostic for stability, sensitivity, and operator consistency.
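The audit described above (differentiate the output with respect to the query, then project the Jacobian onto Fourier modes) can be illustrated on a toy operator whose spectrum is known exactly. In this sketch the "network" is replaced by a single explicit diffusion step on a periodic grid, the Jacobian is taken by finite differences rather than automatic differentiation, and all names are hypothetical. None of this is the authors' code.

```python
import cmath
import math


def heat_step(u, nu=0.2):
    """Stand-in for a learned operator: one explicit periodic diffusion step."""
    n = len(u)
    return [u[j] + nu * (u[(j + 1) % n] - 2 * u[j] + u[(j - 1) % n]) for j in range(n)]


def jacobian(op, u, h=1e-6):
    """Central finite-difference Jacobian, J[i][j] = d op(u)_i / d u_j."""
    n = len(u)
    columns = []
    for j in range(n):
        up, down = list(u), list(u)
        up[j] += h
        down[j] -= h
        columns.append([(a - b) / (2 * h) for a, b in zip(op(up), op(down))])
    return [[columns[j][i] for j in range(n)] for i in range(n)]


def fourier_response(jac):
    """Matrix R with R[k][l] = <e_k, J e_l> for unit-norm Fourier modes e_k.

    abs(R[k][k]) is the gain of mode k, cmath.phase(R[k][k]) its phase shift,
    and off-diagonal entries measure cross-mode coupling.
    """
    n = len(jac)
    modes = [
        [cmath.exp(2j * math.pi * k * x / n) / math.sqrt(n) for x in range(n)]
        for k in range(n)
    ]
    response = []
    for k in range(n):
        row = []
        for l in range(n):
            j_el = [sum(jac[i][x] * modes[l][x] for x in range(n)) for i in range(n)]
            row.append(sum(modes[k][i].conjugate() * j_el[i] for i in range(n)))
        response.append(row)
    return response
```

For this linear diffusion step the diagonal should match the analytic gain 1 - 2*nu*(1 - cos(2*pi*k/n)) and the off-diagonal entries should vanish. A learned operator that fits solutions but shows a different diagonal, or non-negligible off-diagonal mass, would be the kind of mismatch the abstract says prediction error can hide.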


2606.02424 2026-06-02 cs.CV cs.AI cs.LG Version update

GC-MoE: Genomics-Guided Cell-Type-Specific Mixture of Experts for Histology-Based Single-Cell Spatial Transcriptomics

Kaito Shiku, Ahtisham Fazeel Abbasi, Ryoma Bise, Yuichiro Iwashita, Kazuya Nishimura, Andreas Dengel, Muhammad Nabeel Asim

Affiliations: Kyushu University; German Research Center for Artificial Intelligence (DFKI GmbH); RPTU University Kaiserslautern-Landau; The University of Osaka; IntelligentX GmbH; Osaka Metropolitan University

AI summary: Proposes GC-MoE, which estimates cell-type probabilities with a routing network and softly combines cell-type-specific experts, together with a cell-type-specific co-expression-aware predictor and a cell-to-cell interaction attention module, to predict single-cell gene expression from histology images and cell locations. It outperforms existing methods on public datasets.

Abstract

Histology-based single-cell spatial transcriptomics (ST) estimation aims to predict gene expression for individual cells from histopathological images and cell locations, reducing the need for costly single-cell ST measurements. Unlike existing histology-to-ST methods that mainly predict spot-level profiles for local regions containing multiple cells, this task requires modeling cell-to-cell expression variability, which is strongly structured by cell type. We propose Genomics-Guided Cell-Type-Specific Mixture-of-Experts (GC-MoE), which estimates cell-type probabilities with a routing network and softly combines cell-type-specific experts for gene expression prediction. To further encode cell-type-dependent gene programs, we introduce the Cell-Type-Specific Co-Expression-Aware Predictor (CAP), together with a lightweight Cell-to-Cell Interaction Attention (C2CA) module for neighboring-cell context. Experiments and ablations on public single-cell ST datasets show consistent improvements over existing single-cell and adapted spot-level baselines.
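The soft combination of cell-type-specific experts can be sketched as a softmax-weighted average of per-expert predictions. This is a generic mixture-of-experts illustration with hypothetical names. It omits the CAP and C2CA modules entirely and makes no claim about how GC-MoE parameterizes its router or experts.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_moe_predict(router_logits, expert_outputs):
    """Blend expert predictions by routed cell-type probabilities.

    `router_logits[t]` is the router's score for cell type t and
    `expert_outputs[t]` is that expert's predicted expression vector.
    """
    probs = softmax(router_logits)
    n_genes = len(expert_outputs[0])
    return [
        sum(p * out[g] for p, out in zip(probs, expert_outputs))
        for g in range(n_genes)
    ]
```

When the router is confident about one cell type the prediction approaches that expert's output, and when it is uncertain the prediction interpolates between experts.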


2606.02423 2026-06-02 cs.CL cs.LG Version update

Investigating and Alleviating Harm Amplification in LLM Interactions

Ruohao Guo, Wei Xu, Alan Ritter

Affiliations: Georgia Institute of Technology

AI summary: Proposes the HarmAmp benchmark and the TrajSafe monitor for evaluating and mitigating harm amplification by large language models in multi-turn conversations.

Abstract

Large language models (LLMs) can serve as helpful assistants, yet they can equally function as harm amplifiers that enable malicious users to achieve harmful outcomes beyond their capabilities through extended interactions. This risk manifests along two axes, i.e., democratizing domain expertise that allows novices to produce specialized harmful content, and scaling harmful operations at volumes that manual effort cannot match. Existing works, however, often overlook how LLMs compound harm across multi-turn conversations. We introduce HarmAmp, a new benchmark for multi-turn harm amplification scenarios spanning twelve risk categories. Each scenario is grounded in real-world threats and satisfies rigorous criteria, i.e., substantive amplification, operational specificity, and multi-turn necessity. We further propose TrajSafe, a proactive monitor that anticipates harmful trajectories and intervenes through actions such as probing users' genuine intents and steering the models towards safer completion. Our extensive experiments demonstrate that TrajSafe significantly reduces the harmfulness incurred in multi-turn interactions while preserving a low over-refusal rate and the target model's general capabilities. Our work offers a promising paradigm to alleviate the nuanced safety risks in LLM interactions.


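2606.02398 2026-06-02 cs.LG cs.CL Version update

A Local Perturbation Theory for Cross-Domain Interference and Recovery in Multi-Domain RL

Lei Yang, Siyu Ding, Deyi Xiong

Affiliations: TJUNLP Lab, College of Intelligence and Computing, Tianjin University; Baidu Inc.

AI summary: Addressing the performance degradation that training on one domain causes in other domains during multi-domain RL, the paper proposes a local perturbation theory. It proves that later-domain training harms earlier domains mainly through a second-order damage term concentrated in a low-dimensional shared conflict subspace, and that a short domain refresh enables selective recovery.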

Abstract

Reinforcement learning (RL) post-training improves large language models (LLMs) on individual domains such as mathematical reasoning, code generation, question answering, and creative writing (CW), but training on one domain often degrades performance on others. Existing explanations based on catastrophic forgetting or global gradient conflict are incomplete: substantial interference can occur even when full-model gradients are nearly orthogonal. We show that single-domain RL produces sparse, small-magnitude parameter edits with weak overlap among top-changed neurons, while different domains still share substantial active computation routes on which update directions determine whether they act synergistically or conflict. Guided by this observation, we prove under a local perturbation model of multi-domain RL that later-domain training harms an earlier domain mainly through a second-order damage term, which under the observed sparse route structure concentrates in a low-dimensional shared conflict subspace. Moreover, a short domain refresh contracts the harmful component on this subspace, enabling selective recovery with limited collateral damage. Consistent with the theory, a brief Re-Math refresh after Code $\rightarrow$ Math $\rightarrow$ QA $\rightarrow$ CW recovers Math from 57.66 to 66.04 while largely preserving performance on the other domains, yielding the best average score of 66.39. Beyond refresh, a training-free rollback on a sparse proxy conflict coordinate set for the Math-QA pair partially restores Math, providing direct proxy-level evidence for localized damage. These results provide a localized mechanistic account of interference and recovery in multi-domain RL.
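A toy calculation helps show how damage can be second-order and localized. Assume, purely for illustration, that the earlier domain's loss is a diagonal quadratic around its optimum, so its gradient there is zero and the first-order term of the Taylor expansion vanishes. A later-domain parameter edit `delta` then raises the loss by exactly 0.5 * delta^T H delta, and only coordinates where both the curvature and the edit are nonzero contribute. The model, names, and numbers below are hypothetical and are not taken from the paper.

```python
def quadratic_loss(theta, optimum, curvature):
    """Earlier-domain loss under a toy diagonal-Hessian model."""
    return 0.5 * sum(h * (t - o) ** 2 for t, o, h in zip(theta, optimum, curvature))


def damage_by_coordinate(delta, curvature):
    """Per-coordinate share of the second-order term 0.5 * delta^T H delta."""
    return [0.5 * h * d * d for d, h in zip(delta, curvature)]


def second_order_damage(delta, curvature):
    """Total loss increase caused by the edit when starting from the optimum."""
    return sum(damage_by_coordinate(delta, curvature))
```

For example, with curvature `[4, 0, 0, 1]` and a later-domain edit `[0.5, 2, 0, 0]`, the large edit on the second coordinate costs nothing because the earlier domain does not use it, and all of the damage sits on the first, shared coordinate. Undoing the edit on that one coordinate would remove the damage in this toy model, which mirrors the rollback intuition in the abstract.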


2606.02388 2026-06-02 cs.LG cs.AI Version update

Policy and World Modeling Co-Training for Language Agents

Ning Lu, Baijiong Lin, Shengcai Liu, Jiahao Wu, Haoze Lv, Yanbin Wei, Lingting Zhu, Shengju Qian, Xin Wang, Ying-Cong Chen, Qi Wang, Ke Tang

Affiliations: Southern University of Science and Technology; Hong Kong University of Science and Technology; Hong Kong University of Science and Technology (Guangzhou); Hong Kong Polytechnic University; LIGHTSPEED

AI summary: Proposes PaW, a framework that adds auxiliary world-modeling supervision during reinforcement learning without changing the inference paradigm, improving language-agent performance across multiple tasks.

Comments: 9 pages, 6 figures

Abstract

Reinforcement learning (RL) improves large language model (LLM) agents by teaching them which actions lead to high rewards, but provides little supervision on what those actions do to the environment. World modeling (WM) can fill this gap, yet existing approaches often require separate simulators, extra training stages, or additional inference-time computation. We observe that on-policy RL rollouts already contain the needed signal: each transition pairs an action with its resulting next observation. Based on this observation, we propose PaW, a Policy and World modeling co-training framework that adds auxiliary WM supervision to the same policy during RL, without changing the inference paradigm. To make auxiliary WM supervision informative and stable, PaW introduces three components: action-entropy-based WM data selection, noise-tolerant WM loss, and reward-adaptive loss balancing. Experiments on three agentic task benchmarks show consistent improvements over strong RL baselines across models and RL algorithms. These results suggest that standard RL rollouts are a practical source of WM supervision for language-agent training.


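2606.02385 2026-06-02 q-bio.NC cs.LG Version update

How Optimality Structures Sparse Dictionaries: A Theory for Understanding SAE Representations

William Dorrell

AI summary: By extending local optimality analyses to the nonnegative joint-optimization problem, this paper derives constraints relating optimal sparse autoencoder (SAE) features to the data distribution, explains behaviors such as hierarchical splitting and absorption, residual structure, and dense antipodal features, and constructs a novel large-dictionary convex problem to explore the wide atom-per-datapoint limit.

Comments: 27 pages, 5 figures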

Abstract

Sparse Autoencoders (SAEs) have found success parsing neural representations into interpretable concepts, providing a basis for understanding and control. However, what exactly SAEs extract, and, correspondingly, the scientific conclusions we can draw from them, are not obvious. Empirically, the proof is in the pudding: SAEs learn interpretable features. Theoretically, we lack a clear account of what properties a 'concept' must satisfy for an SAE to extract it. There has been extensive identifiability work studying the conditions under which sparse coding recovers ground-truth features; however, these approaches tend to focus on simple data-generating models (e.g. sparse independent features), which poorly approximate the internet-swallowing language-model representations on which SAEs are trained. Here, avoiding data-generating models, we ask simply what properties any dictionary learning optimum must satisfy. Concretely, we extend local optimality analyses (Gribonval & Schnass, 2010) to the nonnegative joint-optimisation problem that vanilla SAEs approximate, and derive constraints relating optimal SAE features to their distributions. We use these constraints to explain a range of observed SAE behaviours (hierarchical splitting and absorption, the structure of residuals, and dense antipodal features), each reflecting how the L1 penalty and nonnegativity interact with data to structure optimal dictionaries. Finally, we construct a novel large-dictionary convex problem and explore the wide atom-per-datapoint limit. In sum, we hope to tease model assumptions from unexpected observations, letting us learn more from SAEs' successes and provide principles for designing their successors.
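The objective that a vanilla SAE optimizes, reconstruction error plus an L1 penalty on nonnegative (ReLU) codes, can be written out in a few lines. This is a generic sketch with hypothetical names and a sum-of-squares reconstruction term. It is not the paper's analysis, only the problem setting that analysis concerns.

```python
def relu_encode(x, enc_weights, enc_bias):
    """Nonnegative codes: one ReLU unit per dictionary atom."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(enc_weights, enc_bias)
    ]


def decode(codes, dictionary):
    """Linear reconstruction; dictionary[k] is atom k."""
    dim = len(dictionary[0])
    return [sum(c * atom[d] for c, atom in zip(codes, dictionary)) for d in range(dim)]


def sae_loss(x, enc_weights, enc_bias, dictionary, l1_weight):
    """Squared reconstruction error plus L1 penalty on the codes."""
    codes = relu_encode(x, enc_weights, enc_bias)
    recon = decode(codes, dictionary)
    error = sum((a - b) ** 2 for a, b in zip(x, recon))
    # Codes are nonnegative, so their sum equals their L1 norm.
    return error + l1_weight * sum(codes)
```

One consequence of nonnegative codes is easy to check by hand: with atoms `[1, 0]` and `[0, 1]` the input `[2, -1]` cannot be reconstructed on its second axis, whereas adding the opposite-signed atom `[0, -1]` makes exact reconstruction possible at the price of a larger L1 term.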


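2606.02384 2026-06-02 cs.LG Version update

TabPrep: Closing the Feature Engineering Gap in Tabular Benchmarks

Andrej Tschalzev, Nick Erickson, Yuyang Wang, Huzefa Rangwala, Stefan Lüdtke, Heiner Stuckenschmidt, Christian Bartelt

AI summary: This paper proposes TabPrep, a lightweight preprocessing pipeline that performs systematic feature engineering with feature generators designed for three specific data patterns, significantly improving the performance of many models on tabular benchmarks.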

Abstract

Progress in tabular machine learning has largely focused on increasingly sophisticated model architectures. At the same time, feature engineering remains a critical yet underexplored component of real-world modeling pipelines that is entirely absent from modern benchmarks, which creates an unquantified evaluation gap. In this work, we introduce TabPrep, a lightweight preprocessing pipeline composed of feature generators that are carefully designed to target three specific structural data patterns. We show that many widely used model classes exhibit predictable blind spots to these patterns and that systematic feature engineering alone can establish new peak performance. Across the TabArena benchmark, integrating TabPrep into model training and tuning consistently improves performance for tree-based, neural, linear, and foundation models, often surpassing gains achieved by model-centric innovations alone. TabPrep outperforms previous automated feature engineering approaches in performance, efficiency, and applicability across datasets, enabling integration into large-scale benchmarks. By releasing TabPrep (see https://github.com/atschalz/tabprep), we enable researchers to integrate feature engineering into their benchmarking setup, filling a longstanding gap in tabular evaluations.


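2606.02381 2026-06-02 cs.AI cs.LG math.DS Version update

A Mathematical Conflict Framework for Contextual Data Modulation

Hakan Emre Kartal

Affiliations: GitHub

AI summary: Proposes an operator-based mathematical conflict framework that treats conflict as a local, directional, and context-sensitive quantity and integrates weights, scale behavior, and output mapping through a unified abstract operator, defined as a mathematical object independent of the optimization process.

Comments: 15 pages, 3 figures, framework paper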

2606.02365 2026-06-02 cs.LG cs.AI Version update

FOAM: Frequency and Operator Error-Based Adaptive Damping Method for Reducing Staleness-Oriented Error for Shampoo

Kyunghun Nam, Sumyeong Ahn

AI summary: Proposes FOAM, an algorithm that suppresses staleness-induced error by adaptively controlling the damping factor and the eigendecomposition frequency, reducing Shampoo's computation time while maintaining convergence.

Comments: 9 pages, ICML 2026 camera-ready version

Abstract

Shampoo is attracting considerable attention for its superior performance on large-scale optimization benchmarks; yet it faces a significant practical bottleneck: the prohibitive computational overhead of matrix inversion. To mitigate this, practitioners typically rely on stale preconditioner updates, creating a fundamental trade-off between computational efficiency and optimization fidelity. In this work, we provide a theoretical study of staleness through the complementary lenses of convergence and stability. While staleness improves computational efficiency, it inherently degrades performance and introduces numerical instability. Crucially, we identify that damping, acting as a numerical stabilizer, can effectively suppress these negative effects. Guided by this analysis, we propose FOAM, an adaptive algorithm that stabilizes training by dynamically controlling both the damping factor and the eigendecomposition frequency based on an approximation of the staleness-oriented error. Experimental results demonstrate that FOAM reduces wall-clock time compared to standard Shampoo while maintaining robust convergence.
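The stabilizing role of damping can be seen on eigenvalues alone. Shampoo-style preconditioners apply an inverse matrix root of accumulated statistics, and a damped version replaces A with A + damping * I before taking the root. The sketch below assumes the stale and fresh statistics share eigenvectors, so only eigenvalues need comparing, and it uses an inverse fourth root by default. It does not implement FOAM's adaptive rule, and the names and numbers are hypothetical.

```python
def damped_inverse_root(eigenvalues, damping, power=4):
    """Eigenvalues of (A + damping * I)^(-1/power) for PSD A with the given spectrum."""
    return [(lam + damping) ** (-1.0 / power) for lam in eigenvalues]


def staleness_gap(stale_eigs, fresh_eigs, damping, power=4):
    """Largest difference between stale and fresh preconditioner eigenvalues.

    Assumes both matrices share eigenvectors (a simplification for illustration).
    """
    stale = damped_inverse_root(stale_eigs, damping, power)
    fresh = damped_inverse_root(fresh_eigs, damping, power)
    return max(abs(s - f) for s, f in zip(stale, fresh))
```

Two things follow directly. No preconditioner eigenvalue can exceed `damping ** (-1 / power)`, and a small eigenvalue that has drifted since the last decomposition (say from 1e-8 to 1e-4) produces a large gap under negligible damping but a small one under moderate damping. Trading this protection against the bias that damping introduces is the kind of choice an adaptive scheme has to make.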


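2606.02363 2026-06-02 cs.LG stat.ML Version update

Minimax-Optimal Policy Regret in Partially Observable Markov Games

Raman Arora

AI summary: For partially observable Markov games, proposes an epoch-based optimistic maximum-likelihood algorithm that achieves $\tilde{O}(\sqrt{T})$ policy regret with dependence on the aggregate Eluder dimension, and proves a matching lower bound.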

Abstract

We study sequential decision-making in partially observable environments against strategic, adaptive opponents, modeled as partially observable Markov games (POMGs). The central challenge is to learn latent dynamics from partial observations while facing an adversary whose behavior depends on the learner's strategy, making standard regret notions inadequate. We prove that an epoch-based optimistic maximum-likelihood algorithm achieves $\tilde{O}(\sqrt{T})$ policy regret for fixed problem parameters, with explicit dependence on the horizon, adversary memory, confidence radius, and the aggregate Eluder dimension of the observable-operator class. The algorithm selects one policy per geometrically growing epoch using confidence sets built cumulatively from past data, which keeps the cost of comparing adversary responses across policies logarithmic in $T$. We also prove a lower bound matching the $\sqrt{T}$ and aggregate-Eluder-dimension dependence, up to problem-dependent and logarithmic factors. Finally, we extend the framework to horizon-adaptive guarantees and adversaries with geometric fading memory.
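The reason geometrically growing epochs keep the number of policy switches logarithmic in $T$ is simple arithmetic, shown in the sketch below. The doubling schedule, the function name, and the default parameters are assumptions for illustration. The abstract does not specify the growth factor or the initial epoch length.

```python
def geometric_epochs(horizon, first_length=1, growth=2):
    """Split rounds 0..horizon-1 into epochs whose lengths grow geometrically.

    Returns (start, end) pairs with `end` exclusive. One policy would be
    selected at the start of each epoch and held fixed until the epoch ends.
    """
    epochs = []
    start, length = 0, first_length
    while start < horizon:
        end = min(start + length, horizon)  # truncate the final epoch
        epochs.append((start, end))
        start, length = end, length * growth
    return epochs
```

With doubling, 1,000 rounds need 10 epochs and 1,000,000 rounds need 20, so the number of times the learner commits to a new policy grows like log2(T) rather than T.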


2606.02355 2026-06-02 cs.AI cs.LG Version update

SIRI: Self-Internalizing Reinforcement Learning with Intrinsic Skills for LLM Agent Training

Zhongyu He, Yuanfan Li, Fei Huang, Tianyu Chen, Siyuan Chen, Xingyang Li, Meng Hsuan Yu, Xiangrong Liu, Leyi Wei, Lu Pan, Ke Zeng, Xunliang Cai

Affiliations: Xiamen University; Meituan; Macao Polytechnic University

AI summary: Proposes SIRI, a framework that lets LLM agents improve long-horizon task performance through self-skill mining, validation, and internalization, without an external skill generator or an inference-time skill bank. It outperforms baseline methods on ALFWorld and WebShop.

Abstract

Long-horizon LLM agents can benefit from reusable skills, yet existing skill-based methods often rely on external skill generators during training or persistent skill retrieval at inference, increasing engineering complexity, context length, and deployment latency. We propose Self-Internalizing Reinforcement learning with Intrinsic skills (SIRI), a three-phase framework that enables agents to discover, validate, and internalize skills without external skill generators or inference-time skill banks. SIRI first warms up the policy with GiGPO to acquire basic interaction ability and collect successful skill-free trajectories. It then performs self-skill mining, where the current policy summarizes compact skills from its own successful plain rollouts and validates them through paired skill-augmented and skill-free rollouts. Finally, SIRI distills only beneficial skill-guided action tokens into the plain policy using trajectory-level utility and action-level advantage. At inference, the agent runs with the original prompt only. On ALFWorld and WebShop with Qwen2.5-7B-Instruct, SIRI improves GiGPO from 0.908 to 0.930 on ALFWorld and from 0.728 to 0.813 on WebShop, outperforming prompt-based, RL-based, and memory-augmented baselines. Further analysis shows that our self-mining strategy can achieve performance comparable to distillation with a closed-source large model. Our code is available at https://github.com/kirito618/SIRI.

2606.02345 2026-06-02 stat.ML cs.LG 版本更新

Doing well with less! On Sampling Techniques for Empirical Pairwise Loss Estimation/Minimization

少即是多！关于经验成对损失估计/最小化的采样技术

Louise Davy, Stephan Clémençon, Charlotte Laclau

发表机构 * IDS, LTCI Télécom Paris Palaiseau, France（IDS、LTCI，巴黎电信学院，帕莱索，法国）

AI总结本文利用调查采样技术，通过直接对成对样本进行采样而非单个观测，在保留少量信息的情况下实现与全量成对评估相当的估计或优化性能，为精度与计算成本之间提供了理论上有依据的权衡。

AI中文摘要

许多机器学习问题，包括相似性学习、排序和聚类，都依赖于经验成对损失函数，其二次计算成本在大规模下迅速变得难以承受。我们展示了一种节俭的方法，通过利用调查采样技术，仅保留成对信息的一小部分，即可实现与使用所有成对数据相当的估计或优化性能。一个核心发现（理论和实验均支持）是，这种采样方案必须直接针对成对样本而非单个观测。特别地，对于高维向量（如视觉或图学习中的嵌入）之间的成对损失，使用合适的辅助信息为信息量大的成对样本分配更高的包含概率，可以获得接近全量成对评估的性能，从而在精度和计算成本之间提供了一种有原则且理论上有依据的权衡。

英文摘要

Many machine learning problems, including similarity learning, ranking, and clustering, rely on empirical pairwise loss functions whose quadratic computational cost quickly becomes prohibitive at scale. We demonstrate how a frugal approach that retains only a fraction of the available information on pairs can achieve estimation or optimization performance comparable to that obtained by using all pairs, by leveraging survey sampling techniques. A central finding, supported by both theory and experiments, is that such sampling plans must target pairs directly rather than individual observations. In particular, for pairwise losses between high-dimensional vectors such as embeddings in vision or graph learning, assigning higher inclusion probabilities to informative pairs using suitable auxiliary information yields performance close to full pairwise evaluation, providing a principled and theoretically grounded trade-off between accuracy and computational cost.
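
To make the idea of sampling pairs directly more concrete, here is a minimal sketch of a Horvitz-Thompson style estimator under Poisson sampling of pairs. The choice of this particular estimator, the function names, and the toy squared-distance loss are assumptions for illustration; the abstract does not specify which sampling plans or estimators the paper analyzes.

```python
import itertools
import random


def full_pairwise_loss(points, loss):
    """Average loss over all unordered pairs (quadratic number of loss calls)."""
    pairs = list(itertools.combinations(range(len(points)), 2))
    return sum(loss(points[i], points[j]) for i, j in pairs) / len(pairs)


def sampled_pairwise_loss(points, loss, inclusion_prob, rng):
    """Horvitz-Thompson estimate of the full pairwise average.

    Each pair (i, j) is kept independently with probability
    inclusion_prob(i, j); kept pairs are reweighted by the inverse of that
    probability, so the estimate is unbiased. The loss is evaluated only on
    kept pairs, which is where the savings come from when the loss is costly.
    """
    n_pairs = len(points) * (len(points) - 1) // 2
    total = 0.0
    for i, j in itertools.combinations(range(len(points)), 2):
        p = inclusion_prob(i, j)
        if rng.random() < p:
            total += loss(points[i], points[j]) / p
    return total / n_pairs


def squared_distance(a, b):
    return (a - b) ** 2


points = [0.0, 1.0, 3.0, 6.0]
rng = random.Random(0)
print(full_pairwise_loss(points, squared_distance))  # 14.0
print(sampled_pairwise_loss(points, squared_distance, lambda i, j: 0.5, rng))
```

Passing a non-constant `inclusion_prob` that favors pairs expected to be informative corresponds to the unequal inclusion probabilities mentioned in the abstract; how to choose them from auxiliary information is left open here.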

2606.02339 2026-06-02 cs.LG cs.CV 版本更新

Entropy Minimization without Model Collapse: Mitigating Prediction Bias in Medical Imaging

无模型坍塌的熵最小化：减轻医学影像中的预测偏差

Tim Nielen, Sameer Ambekar, Johannes Kiechle, Daniel M. Lang, Julia A. Schnabel

发表机构 * School of Computation, Information and Technology, Technical University of Munich, Germany（慕尼黑工业大学计算、信息与技术学院）； Institute of Machine Learning in Biomedical Imaging, Helmholtz Munich, Germany（亥姆霍兹慕尼黑研究中心生物医学成像机器学习研究所，德国）； School of Biomedical Engineering and Imaging Sciences, King’s College London, UK（伦敦国王学院生物医学工程与成像科学学院）； Munich Center for Machine Learning (MCML)（慕尼黑机器学习中心（MCML））； relAI – Konrad Zuse School of Excellence in Reliable AI（relAI：Konrad Zuse可靠人工智能卓越学院）； TUM University Hospital Rechts der Isar（慕尼黑工业大学附属伊萨尔河右岸医院）

AI总结针对测试时适应中熵最小化导致的模型坍塌问题，提出分布偏移偏差减少（DSBR）方法，通过均衡各预测类对无监督熵最小化损失的贡献来纠正预测偏差，在四个医学影像数据集和ImageNet-C上验证了其稳定性和有效性。

AI中文摘要

熵最小化（EM）是测试时适应的主导目标，但其失败模式——模型坍塌——仍然知之甚少。在这项工作中，我们表明分布偏移会导致模型表示空间中对应不同类别的特征簇合并，而决策边界保持不变。这导致预测类别分布出现系统性偏差，称为预测偏差。预测偏差是指预测类别分布的偏移，其中一些类别被过度代表，而其他类别被抑制。我们表明熵最小化通过收紧现有簇来放大这种预测偏差，强化错误的分类，直到所有预测坍缩为平凡解。接下来，为了证明预测偏差的重要性并减轻它，我们进一步提出了分布偏移偏差减少（DSBR），这是一种偏差纠正目标，通过均衡每个预测类别对无监督熵最小化损失的贡献来专门针对这种失败模式。为了研究这种失败模式，我们使用四个医学影像数据集设计了合适的适应设置，并在ImageNet-C上进行了额外评估。我们发现DSBR一致地稳定了测试时适应，防止了模型坍塌，并且匹配或超越了最先进的方法。此外，DSBR仅在测试时运行。

英文摘要

Entropy minimization (EM) is the dominant objective for test-time adaptation, yet its failure mode, model collapse, remains poorly understood. In this work, we show that distribution shifts can cause feature clusters corresponding to distinct classes in the model's representation space to merge, while the decision boundary remains fixed. This induces a systematic skew in the predicted class distribution, referred to as prediction bias. Prediction bias refers to a shift in the predicted class distribution, with some classes overrepresented and others suppressed. We show that entropy minimization amplifies this prediction bias by tightening the existing clusters, reinforcing the incorrect groupings until all predictions collapse to a trivial solution. Next, to demonstrate the significance of prediction bias and mitigate it, we further propose Distribution Shift Bias Reduction (DSBR), a bias-correcting objective that specifically targets this failure mode by equalizing the contribution of each predicted class to the unsupervised entropy minimization loss. To study this failure mode, we design suitable adaptation settings using four medical-imaging datasets and additionally evaluate on ImageNet-C. We find that DSBR consistently stabilizes test-time adaptation, prevents model collapse, and matches or outperforms state-of-the-art methods. Moreover, DSBR operates solely at test-time.
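
One possible reading of "equalizing the contribution of each predicted class" is sketched below: entropies are averaged within each predicted class first, then across classes, so an over-represented class cannot dominate the loss. This is an assumed formulation for illustration only; the function names are hypothetical and the exact DSBR objective may differ.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy of one predicted distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def plain_entropy_loss(batch):
    """Standard entropy minimization: every sample has equal weight."""
    return sum(entropy(p) for p in batch) / len(batch)


def class_balanced_entropy_loss(batch):
    """Average entropy within each predicted class, then across classes.

    Assumed formulation: each predicted class contributes equally,
    regardless of how many samples were assigned to it.
    """
    per_class = defaultdict(list)
    for probs in batch:
        predicted = max(range(len(probs)), key=probs.__getitem__)
        per_class[predicted].append(entropy(probs))
    class_means = [sum(v) / len(v) for v in per_class.values()]
    return sum(class_means) / len(class_means)


# Skewed batch: three samples predicted as class 0, one as class 1.
batch = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.4, 0.6]]
print(plain_entropy_loss(batch), class_balanced_entropy_loss(batch))
```

In the skewed batch, the single class-1 sample carries half of the balanced loss instead of a quarter of the plain one.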

2606.02331 2026-06-02 cs.CV cs.LG 版本更新

Hallucination-Aware Diffusion Sampling for Inverse Problems via Robust Prior Updates

基于鲁棒先验更新的幻觉感知扩散采样用于逆问题

Pengfei Jin, Yiqi Tian, Kailong Fan, Bingjie Qi, Quanzheng Li

发表机构 * Center for Advanced Medical Computing and Analysis, Massachusetts General Hospital and Harvard Medical School（先进医学计算与分析中心，麻省总医院和哈佛医学院）； Department of Industrial Engineering, University of Pittsburgh（工业工程系，匹兹堡大学）

AI总结提出鲁棒先验更新模块，通过探测扩散先验更新的局部稳定性并重新锚定位移，减少逆问题求解中的测量条件幻觉，提升实例保真度。

AI中文摘要

基于扩散的逆问题求解器可以产生逼真的重建结果，但仅凭逼真度并不能确保恢复的细节得到测量的支持。我们将这种失败研究为测量条件幻觉：视觉上有意义但要么不可信要么与测量实例不一致的内容。我们的分析将基于贝叶斯规则的扩散逆求解器分为先验更新和测量条件步骤，表明在应用测量校正之前，幻觉内容可能通过先验侧提议进入。受此观点启发，我们提出鲁棒先验更新（RPU），一个求解器级模块，探测扩散先验更新的局部稳定性，将产生的位移重新锚定在当前迭代点，并保持测量更新不变。我们在DPS中实例化RPU，并使用自动指标和人类忠实度研究在FFHQ和ImageNet逆问题上进行评估。在FFHQ上，RPU在框内修复、高斯去模糊和运动去模糊中相比DPS提高了PSNR和LPIPS。在人类判断中，RPU在FFHQ框内修复上获得了91.9%的盲选非平局多数偏好和91.1%的借助真实标签的非平局偏好，而ImageNet高斯阅读器研究中平局较多，但在非平局情况下RPU更受青睐。这些结果支持一个有针对性的主张：鲁棒化先验更新可以提高扩散逆求解器中的实例保真度，尤其是在先验塑造弱约束内容时。

英文摘要

Diffusion-based inverse problem solvers can produce realistic reconstructions, but realism alone does not ensure that the recovered details are supported by the measurement. We study this failure as measurement-conditioned hallucination: visually meaningful content that is either implausible or inconsistent with the measured instance. Our analysis separates Bayes-rule-based diffusion inverse solvers into a prior update and a measurement-conditioning step, showing that hallucinated content can enter through the prior-side proposal before the measurement correction is applied. Motivated by this view, we propose Robust Prior Update (RPU), a solver-level module that probes the local stability of the diffusion prior update, re-anchors the resulting displacement at the current iterate, and leaves the measurement update unchanged. We instantiate RPU in DPS and evaluate it on FFHQ and ImageNet inverse problems using automatic metrics and human faithfulness studies. On FFHQ, RPU improves PSNR and LPIPS over DPS across box inpainting, Gaussian deblurring, and motion deblurring. In human judgments, RPU receives 91.9% of blind non-tie majority preferences and 91.1% of ground-truth-assisted non-tie preferences on FFHQ box inpainting, while the ImageNet Gaussian reader study is tie-heavy but favors RPU among non-tie cases. These results support a targeted claim: robustifying the prior update can improve instance faithfulness in diffusion inverse solvers, especially when the prior shapes weakly constrained content.

2606.02328 2026-06-02 cs.LG 版本更新

Riemannian Gradient Descent for Low-Rank Architectures

低秩架构的黎曼梯度下降

Nicholas Knight

AI总结针对低秩矩阵参数，探索黎曼优化技术，并在小语言模型的多头注意力参数上应用，但未显著优于AdamW基线。

2606.02322 2026-06-02 cs.LG cs.AI 版本更新

Repurposing Adversarial Perturbations for Continual Learning: From Defense to Active Alignment

重新利用对抗扰动进行持续学习：从防御到主动对齐

Ran Liu, Min Yu, Mingqi Liu, Jianguo Jiang, Gang Li, Rongsheng Li, Ning Li, Zhen Xu, Weiqing Huang, Ming Liu

发表机构 * Institute of Information Engineering, Chinese Academy of Sciences（中国科学院信息工程研究所）； School of Cyber Security, University of Chinese Academy of Sciences（中国科学院大学网络空间安全学院）； Deakin University（迪肯大学）； Harbin Engineering University（哈尔滨工程大学）

AI总结提出AdvCL框架，通过将对抗扰动重新用作几何控制信号，结合三个即插即用模块（Intra-Smooth、Proto-Clip、Inter-Align），在持续学习中同时提升标准性能、鲁棒性、降低遗忘并增强迁移。

AI中文摘要

在动态环境中，大型语言模型需要不断适应新任务，但持续学习常常遭受遗忘、有限的迁移以及对对抗扰动的脆弱性。为了解决这个问题，我们提出了AdvCL，它将对抗扰动重新用作稳定的持续适应的几何控制信号。AdvCL结合了三个即插即用模块：Intra-Smooth通过小的对抗扰动促进局部平滑性；Proto-Clip使用相似性裁剪以防止过度对齐到当前任务原型；Inter-Align则通过对齐到先前任务原型的方向性对齐来减少表示间隙。实验表明，在标准性能和鲁棒性方面均有一致的提升，同时具有更低的遗忘和更强的迁移。我们进一步通过量化Intra-Smooth对扰动设置的敏感性以及Inter-Align对任务相似性和几何距离的影响，分析了关键机制。总之，这些模块在组合时提供互补增益，每个模块也可以单独集成到各种持续学习范式中，包括回放、正则化和动态架构，从而为持续学习提供了一种几何控制机制。

英文摘要

In dynamic environments, large language models need to keep adapting to new tasks, but continual learning often suffers from forgetting, limited transfer, and vulnerability to adversarial perturbations. To address this, we present AdvCL, which repurposes adversarial perturbations as a geometric control signal for stable continual adaptation. AdvCL combines three plug-in modules: Intra-Smooth promotes local smoothness via small adversarial perturbations; Proto-Clip uses similarity clipping to prevent excessive alignment to the current task prototype; and Inter-Align applies directional alignment toward the previous task prototype to reduce representational gaps. Experiments show consistent gains in both standard performance and robustness, with lower forgetting and stronger transfer. We further analyze key mechanisms by quantifying the sensitivity of Intra-Smooth to perturbation settings and the effect of Inter-Align on task similarity and geometric distance. In summary, the modules provide complementary gains when combined, and each can also be integrated individually into diverse CL paradigms, including replay, regularization, and dynamic architectures, thereby offering a geometric control mechanism for continual learning.

2606.02310 2026-06-02 cs.CV cs.LG 版本更新

Deep Learning for Remote Sensing to Improve Flood Inundation Mapping

深度学习用于遥感以改进洪水淹没制图

Yogesh Bhattarai, Vijay Chaudhary, Wai Lim Kim, Sanjib Sharma

发表机构 * University of Colorado Boulder（科罗拉多大学博尔德分校）

AI总结提出基于去噪扩散概率模型和掩码扩散Transformer的云去除框架，用于洪水影像，以生成无云图像并保持水文一致性，提升洪水监测的可靠性。

Comments This paper has been selected as one of the top 10 student finalists in the IGARSS 2026 paper competition

AI中文摘要

洪水是全球最普遍的自然灾害。及时准确的洪水淹没制图对于告知灾害风险管理至关重要。光学卫星任务提供了高分辨率、多光谱观测，对于洪水检测和淹没制图至关重要。然而，在极端降水事件期间，其操作实用性受到云层的严重限制。基于时间合成或插值的传统云去除技术通常无法捕捉淹没动态。在本研究中，我们引入了一种基于去噪扩散概率模型的洪水影像云去除框架，利用掩码扩散Transformer架构。所提出的方法利用自注意力机制捕获更广泛的空间上下文，并采用掩码令牌建模来显式学习云遮挡区域的重建。在具有真实云模式的多光谱Sentinel-2B洪水场景上训练，该模型生成保持视觉保真度和水文一致性的无云图像实现。使用标准图像质量指标以及洪水特定的水文指标评估重建性能，显示出水体连续性的改善和对水检测指数至关重要的光谱特征的保留。结果表明，基于扩散的生成建模为光学洪水监测中的云去除提供了一种稳健且物理一致的替代方案，从而实现更可靠、连续的观测，以支持灾害风险管理和洪水相关决策。

英文摘要

Flooding is the most pervasive natural disaster worldwide. Timely and accurate flood inundation mapping is essential for informing disaster risk management. Optical satellite missions provide high-resolution, multispectral observations critical for flood detection and inundation mapping. However, their operational utility is severely constrained by cloud cover during extreme precipitation events. Conventional cloud-removal techniques based on temporal compositing or interpolation often fail to capture inundation dynamics. In this study, we introduce a cloud-removal framework for flood imagery based on Denoising Diffusion Probabilistic Models, leveraging the Masked Diffusion Transformer architecture. The proposed approach exploits self-attention mechanisms to capture wider spatial context and employs masked token modeling to explicitly learn the reconstruction of cloud-obscured regions. Trained on multispectral Sentinel-2B flood scenes with realistic cloud patterns, the model generates cloud-free image realizations that preserve both visual fidelity and hydrological consistency. Reconstruction performance is evaluated using standard image quality metrics alongside flood-specific hydrological measures, demonstrating improved continuity of water bodies and preservation of spectral signatures critical for water detection indices. The results indicate that diffusion-based generative modeling offers a robust and physically consistent alternative for cloud removal in optical flood monitoring, enabling more reliable, continuous observations to support disaster risk management and flood-related decision making.

2606.02309 2026-06-02 cs.LG cs.CV 版本更新

Measurement Geometry and Design for Trustworthy Generative Inverse Problems

可信生成式逆问题的测量几何与设计

Pengfei Jin, Na Li, Quanzheng Li

发表机构 * Center for Advanced Medical Computing and Analysis, Massachusetts General Hospital and Harvard Medical School（先进医学计算与分析中心，麻省总医院和哈佛医学院）； School of Engineering and Applied Sciences, Harvard University（工程与应用科学学院，哈佛大学）

AI总结提出局部测量-流形兼容性度量，证明其控制重建误差的稳定部分，并基于体积保持设计固定和自适应测量策略，在多个成像任务中预测失败模式、减少幻觉并指导采样。

AI中文摘要

生成模型越来越多地被用作逆问题的先验，但它们生成逼真图像的能力带来了一个基本的信任问题：一个看似合理的重建可能由测量支持，也可能由先验沿未观测方向填充。这一区别在医学成像中尤为重要，因为采集操作是在扫描时间、剂量和校准约束下设计的。我们从测量几何的角度研究生成式逆问题。核心问题是：固定的测量算子能否区分在生成先验下看似合理的邻近图像，以及这种关系能否指导更好的测量。我们引入了一个局部测量-流形兼容性度量，用于量化算子观测先验相关切线方向的程度。在局部正则性假设下，我们证明该量控制重建误差的稳定部分，而生成先验控制流形外漂移。这一最坏方向证书基于整体局部体积保持，提出了实用的固定和顺序采集规则，包括一种后验云设计，该设计在测试时自适应调整测量，无需训练采样策略。在行采样、断层扫描和MR采集设置中，所提出的分数预测失败模式，解释测量引起的幻觉，并指导更好的采样。在fastMRI笛卡尔采样中，后验云测量设计优于强大的非学习ACS保留基线，包括可变密度和泊松类掩模。

英文摘要

Generative models are increasingly used as priors for inverse problems, but their ability to produce realistic images creates a basic trust problem: a plausible reconstruction may be supported by the measurements, or it may be filled in by the prior along unobserved directions. This distinction is especially important in medical imaging, where acquisition operators are designed under scan-time, dose, and calibration constraints. We study generative inverse problems from a measurement-geometry perspective. The central question is whether a fixed measurement operator can distinguish nearby images that are plausible under the generative prior, and whether this relationship can guide better measurements. We introduce a local measurement-manifold compatibility measure that quantifies how well the operator observes prior-relevant tangent directions. Under local regularity assumptions, we prove that this quantity controls the stable part of the reconstruction error, while the generative prior controls off-manifold drift. This worst-direction certificate motivates practical fixed and sequential acquisition rules based on overall local volume preservation, including a posterior-cloud design that adapts measurements at test time without training a sampling policy. Across row-sampling, tomographic, and MR acquisition settings, the proposed scores predict failure modes, explain measurement-induced hallucinations, and guide better sampling. In fastMRI Cartesian sampling, posterior-cloud measurement design improves over strong non-learned ACS-preserving baselines, including variable-density and Poisson-like masks.

2606.02294 2026-06-02 cs.LG 版本更新

Regularized Large Neighborhood Search

正则化大邻域搜索

Germain Vivier-Ardisson, Laurent Demonet, Axel Parmentier, Mathieu Blondel

发表机构 * Google DeepMind（谷歌DeepMind）； CERMICS Paris, France（CERMICS实验室，法国巴黎）； ENPC, CNRS, IPP Marne-la-Vallée, France（国立路桥学校、法国国家科学研究中心、巴黎理工学院，法国马恩拉瓦莱）； Google Research Paris, France（谷歌研究院，法国巴黎）

AI总结提出正则化大邻域搜索（RLNS），将LNS启发式转化为MCMC采样器，实现无需全局求解器的端到端学习。

AI中文摘要

运筹学从业者通常使用大邻域搜索（LNS）来解决NP难的组合问题，这是一种可扩展的启发式方法，通过局部重新优化其变量的子集来迭代改进当前解。相比之下，大多数现有的将组合优化层集成到神经网络中的方法仍然假设可以访问精确的全局解，这在计算上是难以处理的。我们通过引入正则化大邻域搜索（RLNS）来弥合这一差距。通过正则化或扰动局部子问题，我们将LNS启发式转化为一个高效的MCMC采样器，在可行解的组合集上采样，并关联Fenchel-Young损失。在熵正则化下，我们证明RLNS执行精确的块吉布斯采样。此外，调整RLNS迭代次数使我们能够在伪似然和精确最大似然估计之间插值，从而实现无需全局求解器的端到端学习。我们在$k$-子集选择、广义分配和随机车辆调度问题上展示了我们的方法。

英文摘要

Operations research practitioners typically tackle NP-hard combinatorial problems using large neighborhood search (LNS), a scalable heuristic that iteratively refines a current solution by locally re-optimizing subsets of its variables. In contrast, most existing approaches for integrating combinatorial optimization layers into neural networks still assume access to an exact global solution, which is computationally intractable. We bridge this gap by introducing regularized LNS (RLNS). By regularizing or perturbing local subproblems, we turn the LNS heuristic into an efficient MCMC sampler over the combinatorial set of feasible solutions, with associated Fenchel-Young losses. Under entropic regularization, we prove that RLNS performs exact block Gibbs sampling. Furthermore, adjusting the number of RLNS iterations allows us to interpolate between pseudolikelihood and exact maximum likelihood estimation, for end-to-end learning without global solvers. We demonstrate our approach on $k$-subset selection, generalized assignment, and stochastic vehicle scheduling problems.
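
The link between local re-optimization and block Gibbs sampling can be illustrated on a toy $k$-subset problem. In the sketch below the target distribution is assumed to be $p(S) \propto \exp(\sum_{i \in S} \theta_i)$ over subsets of fixed size, and the "neighborhood" is a random pair of items; both choices, and the function name, are assumptions for illustration rather than the paper's construction.

```python
import math
import random


def block_gibbs_step(selected, scores, rng):
    """Resample a random pair of items from its exact conditional distribution.

    The block is the pair (i, j). With all other items fixed and the subset
    size constrained, the pair has at most two feasible states, so the
    entropy-regularized local problem reduces to a two-way softmax.
    """
    i, j = rng.sample(range(len(scores)), 2)
    if (i in selected) == (j in selected):
        # Both in or both out: the size constraint leaves one feasible state.
        return selected
    # Exactly one of i, j is selected; sample which one it should be.
    prob_i = 1.0 / (1.0 + math.exp(scores[j] - scores[i]))
    keep, drop = (i, j) if rng.random() < prob_i else (j, i)
    return (selected - {drop}) | {keep}


rng = random.Random(0)
scores = [0.0, 0.0, 5.0, 5.0]
subset = frozenset({0, 1})  # start from the two lowest-scoring items
for _ in range(200):
    subset = block_gibbs_step(subset, scores, rng)
print(sorted(subset))
```

Replacing the two-way softmax with a hard argmax recovers a plain local re-optimization step, which is the sense in which regularizing the subproblem turns the heuristic into a sampler in this toy setting.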

2606.02288 2026-06-02 cs.LG 版本更新

Massive Spikes in LLMs are Bias Vectors: Mechanistic Uncovering and Spike-Free Quantization

LLM中的大规模尖峰是偏置向量：机制揭示与无尖峰量化

Yung-Chin Chen, Chung Peng Lee, Ze-Wei Liou, Naveen Verma

发表机构 * Princeton University（普林斯顿大学）； EnCharge AI

AI总结本文通过机制分析发现LLM中的激活尖峰本质上是结构化的向量偏置，并提出无尖峰量化框架INSERTQUANT，实现鲁棒的低比特量化。

AI中文摘要

大型语言模型（LLM）中的大规模激活尖峰通过拉伸动态范围严重降低了量化性能。虽然先前的假设将这些尖峰描述为高级标量偏置，但我们认为它们只是携带尖峰的令牌中刚性、结构化的向量偏置的标量中间产物。我们展示了这些令牌在归一化后收敛到常向量，驱动了注意力沉没和值状态耗尽机制。我们通过分析投影权重的协调性从几何上证实了这一点：$W_K$对比性地放大该向量，$W_Q$将语义令牌对齐到它，$W_V$将其投影到谱零空间。此外，我们揭示了模型通过利用低频带和相干通道对将结构偏置定位在“旋转稳定区域”中，从而主动保护这些结构偏置免受旋转位置编码（RoPE）扰动的影响。利用这一点，我们提出了INSERTQUANT，一种后训练量化（PTQ）框架，通过预计算模板向量来钳制尖峰并恢复其功能。这使得激活严格无尖峰，从而实现高保真度的鲁棒低比特量化。INSERTQUANT在LLM上达到了与最先进的每张量量化方法相当的性能，并且独特地泛化到文本以外的其他模态，如ViT。

英文摘要

Massive activation spikes in Large Language Models (LLMs) severely degrade quantization by stretching dynamic ranges. While prior hypotheses characterize these as high-level scalar biases, we argue that they are merely the scalar intermediates of rigid, structural vector biases in the spike-carrying tokens. We show that these tokens converge to constant vectors after normalization that drive the attention sink and value-state drain mechanisms. We geometrically substantiate this by analyzing the coordination of projection weights: $W_K$ contrastively amplifies the vector, $W_Q$ aligns semantic tokens toward it, and $W_V$ projects it into the spectral null-space. Furthermore, we reveal that the model actively preserves these structural biases against Rotary Positional Embedding (RoPE) perturbations by localizing them in "zones of rotational stability" utilizing low-frequency bands and coherent channel pairs. Leveraging this, we propose INSERTQUANT, a post-training quantization (PTQ) framework that clamps spikes and restores their function via pre-computed template vectors. This renders activations strictly spike-free, enabling robust low-bit quantization with high fidelity. INSERTQUANT achieves parity with state-of-the-art per-tensor quantization methods on LLMs and uniquely generalizes beyond text to other modalities such as ViTs.
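
The claim that spikes degrade quantization by stretching the dynamic range can be checked with a small experiment. The sketch below uses plain symmetric absmax quantization with a single per-tensor scale; the synthetic activations, the spike value of 1000, and the helper names are assumptions for illustration, and the code does not implement INSERTQUANT.

```python
def absmax_quantize(values, bits=8):
    """Symmetric per-tensor quantization followed by dequantization."""
    levels = 2 ** (bits - 1) - 1  # 127 for 8 bits
    scale = max(abs(v) for v in values) / levels
    return [round(v / scale) * scale for v in values]


def mean_abs_error(values, count):
    """Mean absolute round-trip error over the first `count` values."""
    restored = absmax_quantize(values)
    return sum(abs(a - b) for a, b in zip(values[:count], restored)) / count


# 1001 well-behaved activations spread evenly over [-1, 1].
activations = [i / 500 - 1 for i in range(1001)]

clean_error = mean_abs_error(activations, len(activations))
# Append one massive spike; the error is still measured on the original values.
spiked_error = mean_abs_error(activations + [1000.0], len(activations))

print(clean_error, spiked_error)
```

With the spike present, the shared scale becomes so coarse that every ordinary activation rounds to zero, which is why removing or isolating spikes matters for per-tensor schemes.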

2606.02287 2026-06-02 cs.LG cs.AI 版本更新

CityTrajBench: A Unified Benchmark for City-Scale Vehicle Trajectory Generation

CityTrajBench: 城市尺度车辆轨迹生成的统一基准

Shibo Zhu, Xiaodan Shi, Dayin Chen, Yuntian Chen, Haoran Zhang, Tianhao Wu, Jinyue Yan

发表机构 * Department of Building Environment and Energy Engineering, The Hong Kong Polytechnic University, Hong Kong SAR, China（香港理工大学建筑环境与能源工程系）； Eastern Institute for Advanced Study, Eastern Institute of Technology, Ningbo, China（宁波东方理工大学东方理工高等研究院）； International Centre of Urban Energy Nexus, The Hong Kong Polytechnic University, Hong Kong SAR, China（香港理工大学城市能源纽带国际中心）； Department of Computer and Systems Sciences, Stockholm University, Sweden（斯德哥尔摩大学计算机与系统科学系）； Zhejiang Key Laboratory of Industrial Intelligence and Digital Twin, Eastern Institute of Technology, Ningbo, China（宁波东方理工大学浙江省工业智能与数字孪生重点实验室）； Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, China（宁波东方理工大学宁波数字孪生研究院）； LocationMind Inc., Tokyo 101-0042, Japan（LocationMind公司，日本东京）

AI总结为解决轨迹生成方法因数据集、预处理、表示和评估指标不一致导致的比较困难，提出CityTrajBench统一基准框架，标准化数据处理、模型适配与多级评估，并在三个真实数据集上对比统计、VAE、GAN、扩散和流匹配模型，揭示不同模型在全局真实性、轨迹几何保真度等指标上的权衡。

AI中文摘要

城市轨迹生成是交通模拟、城市规划和移动性分析的基础任务。然而，由于现有研究通常依赖不同的数据集、预处理流程、轨迹表示和评估指标，轨迹生成方法之间的系统比较仍然困难。这种碎片化使得报告的性能差异是否源于生成机制本身或实验协议不一致变得不明确。为解决这一问题，我们提出了CityTrajBench，一个用于城市尺度车辆轨迹生成的统一基准框架和协议。CityTrajBench在共同设置下标准化了数据摄入、轨迹归一化、特征构建、模型适配、地图感知后处理、模型选择和多级评估。它支持异构生成器，包括统计基线、基于VAE、GAN、扩散和流匹配的模型，并在三个真实世界城市轨迹数据集上评估它们。该基准衡量全局空间真实性、行程级分布保真度、轨迹级几何相似性、条件移动一致性和效率。实验揭示了模型家族之间的明确权衡：DiffTraj在轨迹级几何保真度上最强，DiffRNTraj在结构敏感的全局真实性上具有竞争力，而TrajFlow在真实性、质量、条件一致性和效率之间提供了强平衡。同时，一个简单的马尔可夫基线在粗粒度行程和局部移动统计上仍具有竞争力。这些发现表明，城市轨迹生成质量本质上是多目标的，没有单一模型在所有标准上同等占优，并且CityTrajBench为未来城市移动性生成研究提供了可复现的基准协议和测试平台。

英文摘要

Urban trajectory generation is a fundamental task for transportation simulation, urban planning, and mobility analytics. However, systematic comparison across trajectory generation methods remains difficult because existing studies often rely on different datasets, preprocessing pipelines, trajectory representations, and evaluation metrics. This fragmentation makes it unclear whether reported performance differences arise from the generation mechanism itself or from inconsistent experimental protocols. To address this issue, we present CityTrajBench, a unified benchmark framework and protocol for city-scale vehicle trajectory generation. CityTrajBench standardizes data ingestion, trajectory normalization, feature construction, model adaptation, map-aware post-processing, model selection, and multi-level evaluation under a common setting. It supports heterogeneous generators, including statistical baselines, VAE-based, GAN-based, diffusion-based, and flow-matching-based models, and evaluates them on three real-world urban trajectory datasets. The benchmark measures global spatial realism, trip-level distribution fidelity, trajectory-level geometric similarity, conditional mobility consistency, and efficiency. Experiments reveal clear trade-offs across model families: DiffTraj is strongest on trajectory-level geometric fidelity, DiffRNTraj is competitive on structure-sensitive global realism, and TrajFlow provides a strong balance across realism, quality, conditional consistency, and efficiency. Meanwhile, a simple Markov baseline remains competitive on coarse-grained trip and local-movement statistics. These findings show that urban trajectory generation quality is inherently multi-objective, that no single model dominates all criteria equally, and that CityTrajBench provides a reproducible benchmark protocol and testbed for future research on urban mobility generation.

2606.02278 2026-06-02 eess.SY cs.LG cs.SY 版本更新

Physics-Guided Recurrent State-Space Neural Networks for Multi-Step Prediction

物理引导的循环状态空间神经网络用于多步预测

Ruiyuan Li, Ajay Seth, Manon Kok

发表机构 * Delft Center for Systems and Control, TU Delft, the Netherlands（代尔夫特系统与控制中心，代尔夫特理工大学，荷兰）； Department of Biomechanical Engineering, TU Delft, the Netherlands（生物力学工程系，代尔夫特理工大学，荷兰）

AI总结提出PG-RSSNN，一种结合物理知识和循环结构的状态空间神经网络，通过缓解梯度消失和数值发散风险，在有限数据和部分物理模型下提升多步预测性能。

Comments 6 pages, 3 figures. Accepted at IFAC World Congress 2026

AI中文摘要

状态空间模型传统上基于物理知识，但由于模型不准确，这些物理模型的多步预测可能较差。黑盒深度学习作为替代方案显示出潜力，但这些方法依赖于大量数据集的可用性，且潜在可用的物理知识被忽略。我们提出PG-RSSNN，一种物理引导的循环状态空间神经网络，它结合循环结构以在多步预测中使用非饱和激活函数。它缓解了梯度消失，并消除了现有结构中因反馈状态估计而导致的训练数值发散风险。在多个具有不同物理模型不完善性的系统上（从带高斯噪声的线性状态空间模型到机械臂和级联水箱系统）的实验结果表明，与黑盒神经网络和纯物理模型相比，所提出的PG-RSSNN即使在训练数据有限且物理模型仅部分已知的情况下，也能保持稳定的训练行为，并改善多步预测。

英文摘要

State-space models are traditionally based on physical knowledge, but multi-step predictions from these physical models can be poor due to model inaccuracy. Black-box deep learning has shown promise as an alternative. However, these methods rely on the availability of large datasets and potentially available physical knowledge is neglected. We propose the PG-RSSNN, a physics-guided recurrent state-space neural network that incorporates recurrent structures to enable the use of non-saturating activation functions in multi-step prediction. It mitigates the vanishing gradients and eliminates the risk of numerical divergence in training seen in existing structures that feed back state estimates. Results across multiple systems with various physical model imperfections, from linear state-space models with Gaussian noise to a robotic arm and a cascaded water tank system, show that the proposed PG-RSSNN maintains stable training behavior, and improves multi-step predictions, as compared with black-box neural networks and physics-only models, even with limited training data and when physical models are only partially known.

2606.02276 2026-06-02 cs.CV cs.AI cs.CL cs.LG 版本更新

ShaplEIG: Bayesian Experimental Design for Shapley Value Estimation

ShaplEIG：用于Shapley值估计的贝叶斯实验设计

David Rundel, Fabian Fumagalli, Maximilian Muschalik, Bernd Bischl, Matthias Feurer

AI总结提出ShaplEIG方法，通过高斯过程代理和期望信息增益自适应选择联盟，以高效估计Shapley值，在低预算场景下显著提升样本效率。

Comments Accepted at the Forty-Third International Conference on Machine Learning (ICML 2026)

AI中文摘要

Shapley值是一种原则性的归因度量，广泛用于可解释机器学习，但其精确计算随玩家数量呈指数增长，促使了基于采样联盟价值函数评估的各种近似方法。这引发了一个问题：能否通过根据先前评估自适应选择联盟来提高近似精度？这在价值函数昂贵且评估次数严重受限的设置中尤为重要，例如基于重训练的特征重要性、数据估值和超参数重要性。为此，我们提出ShaplEIG，一种贝叶斯实验设计方法，该方法使用高斯过程代理近似昂贵的价值函数，并根据联盟对Shapley值的期望信息增益自适应选择联盟。通过Shapley值在价值函数中的线性性质，我们证明了期望信息增益具有封闭形式。此外，我们提出了一种高效计算方案，通过初等对称多项式将复杂度从指数级降低到玩家数量的多项式级。在多种昂贵应用的广泛实验中，我们的方法在低预算场景下始终优于最先进的基线方法，提高了样本效率。

英文摘要

Shapley values are a principled attribution measure widely used in interpretable machine learning, but their exact computation scales exponentially with the number of players, motivating a wide range of approximation methods based on value function evaluations of sampled coalitions. This raises the question of whether approximation accuracy can be improved by adaptively selecting coalitions for evaluation based on previous evaluations. This is particularly relevant in settings where the value function is costly and the number of evaluations is severely limited, such as retraining-based feature importance, data valuation, and hyperparameter importance. For this purpose, we propose ShaplEIG, a Bayesian experimental design approach that approximates the expensive value function using a Gaussian process surrogate and adaptively selects coalitions based on their expected information gain about the Shapley values. By the linearity of the Shapley values in the value function, we show that the expected information gain is available in closed form. Furthermore, we propose an efficient computation scheme that reduces the complexity from exponential to polynomial in the number of players via elementary symmetric polynomials. In extensive experiments across diverse costly applications, our method consistently improves sample efficiency in the low-budget regime over state-of-the-art baselines.
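
For reference, the exponential cost mentioned at the start of the abstract comes from enumerating every coalition. The sketch below computes exact Shapley values from the standard definition; it is the brute-force baseline that approximation methods try to avoid, not the ShaplEIG procedure, and the toy value functions are assumptions for illustration.

```python
import itertools
import math


def exact_shapley(n_players, value):
    """Exact Shapley values by enumerating all coalitions.

    `value` maps a frozenset of player indices to a number. Each player
    requires 2 ** (n_players - 1) marginal contributions, so the number of
    value-function calls grows exponentially with the number of players.
    """
    shapley = [0.0] * n_players
    for i in range(n_players):
        others = [p for p in range(n_players) if p != i]
        for size in range(n_players):
            # Probability that a coalition of this size precedes player i.
            weight = (math.factorial(size)
                      * math.factorial(n_players - size - 1)
                      / math.factorial(n_players))
            for coalition in itertools.combinations(others, size):
                s = frozenset(coalition)
                shapley[i] += weight * (value(s | {i}) - value(s))
    return shapley


weights = [1.0, 2.0, 3.0]
additive = lambda s: sum(weights[p] for p in s)
print(exact_shapley(3, additive))
```

For an additive game each player's Shapley value equals its own weight, which gives a quick sanity check; when each call to `value` means retraining a model, even this three-player example already needs 24 evaluations.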

2606.02242 2026-06-02 cs.CV cs.AI cs.LG 版本更新

Towards Resolving Optimization Conflicts Between Image- and Text-Based Person Re-Identification

解决基于图像和基于文本的行人重识别之间的优化冲突

Karina Kvanchiani, Timur Mamedov

发表机构 * Tevian, Russia（俄罗斯Tevian）； Lomonosov Moscow State University, Russia（俄罗斯罗蒙诺索夫莫斯科国立大学）

AI总结针对图像与文本行人重识别任务因模态差异和目标冲突导致共享表示次优的问题，提出解耦两阶段训练流程，使用单一视觉编码器避免跨任务干扰，实验表明图像预训练和文本监督能提升双任务性能。

AI中文摘要

基于图像（I2I）和基于文本（T2I）的行人重识别（ReID）的联合优化受到模态差异和冲突训练目标的阻碍，导致共享表示次优。虽然I2I ReID关注同一人图像间的身份级不变性，但T2I ReID由与独特视觉特征相关的实例特定文本描述驱动。本文探讨了两个ReID任务及其优化过程之间的根本差异，以实现有效训练。由于I2I和T2I ReID通常分开研究，为一种检索设置优化的损失函数可能对另一种所需的表示质量产生负面影响。基于这些发现，我们提出了一种解耦的两阶段训练流程，用于学习跨图像和文本模态的共享表示。该流程基于单个视觉编码器，支持I2I和T2I检索，同时避免训练期间的跨任务干扰。我们在多种配置下进行了大量实验，改变了域混合程序、学习策略和任务目标。我们观察到I2I ReID预训练对T2I数据的泛化能力有积极影响。此外，我们发现视觉编码器训练阶段引入文本监督能提升I2I和T2I性能。我们相信，我们的见解为统一的ReID系统和跨模态检索整体迈出了有意义的一步。

英文摘要

The joint optimization of image-based (I2I) and text-based (T2I) person re-identification (ReID) is hindered by modality discrepancies and conflicting training objectives, leading to suboptimal shared representations. While I2I ReID focuses on identity-level invariance across images of the same person, T2I ReID is driven by instance-specific textual descriptions tied to unique visual traits. This paper explores the fundamental differences between the two ReID tasks and their optimization processes for effective training. Since I2I and T2I ReID are often studied separately, the loss functions optimized for one retrieval setting may negatively affect the representation quality required by the other. Motivated by these findings, we propose a decoupled two-stage training pipeline for learning a shared representation across image and text modalities. The pipeline is based on a single vision encoder that supports both I2I and T2I retrieval while avoiding cross-task interference during training. We provide extensive experiments across multiple configurations, varying domain mixing procedures, learning strategies, and task objectives. We observe that I2I ReID pre-training positively impacts the generalization ability to T2I data. In addition, we find that incorporating textual supervision during the vision encoder training stage enhances both I2I and T2I performance. We believe our insights provide a meaningful step toward unified ReID systems and cross-modal retrieval overall.

2606.02241 2026-06-02 cs.LG 版本更新

BlockGen: Flexible Blockwise Sequence Modeling with Hybrid Samplers

BlockGen: 灵活的分块序列建模与混合采样器

Justin Deschenaux, Caglar Gulcehre

发表机构 * EPFL Lausanne, Switzerland（瑞士洛桑联邦理工学院）； Microsoft AI（微软人工智能）

AI总结提出BlockGen框架，通过分块序列建模和AR-informed预测-校正采样，比较均匀态扩散与掩码扩散在分块生成中的性能。

CORE-MTL: 通过因果正交表示重新思考梯度平衡

Chengfeng Wu, Tao Zou, Yanru Wu, Jingge Wang

发表机构 * Tsinghua University（清华大学）

AI总结提出CORE-MTL框架，通过因果正交表示将共享表示分解为语义流和残差流，以分离任务相关结构与虚假上下文，从而减少负迁移并提升泛化能力。

Comments Accepted by ICML 2026

AI中文摘要

多任务学习旨在通过跨领域共享共同表示来构建联合模型。为实现这一目标，现有的优化中心方法要么平衡任务梯度，要么修改共享架构。然而，由于这些方法对共享表示的内容不可知，它们无法将任务相关结构与虚假上下文分离，导致负迁移和泛化能力差。为克服这一限制，我们提出了用于多任务学习的因果正交表示（CORE-MTL），这是一个因果驱动的表示中心框架，鼓励对共享表示进行结构化的语义-残差分解，将任务相关结构集中在语义流中，而将干扰变化归入残差流。我们通过利用结构化场景的物理先验和属性的统计约束，在视觉领域实例化了该框架。理论上，我们的方法比优化中心方法具有更紧的分布外泛化界，并且无需显式梯度投影或重新加权即可减少任务梯度干扰。实验上，CORE-MTL在视觉多任务基准测试中，在分布内和分布外设置下均持续优于现有方法。代码公开于 https://github.com/Hope-Rita/CORE-MTL。

英文摘要

Multi-task learning (MTL) aims to construct a joint model for multiple tasks by sharing a common representation across domains. To achieve this goal, existing optimization-centric methods either balance task gradients or modify the shared architecture. However, as these approaches remain agnostic to the content of the shared representation, they fail to disentangle task-relevant structure from spurious context, leading to negative transfer and poor generalization. To overcome this limitation, we propose Causal Orthogonal Representations for Multi-Task Learning (CORE-MTL), a causally motivated representation-centric framework that encourages a structured semantic-residual factorization of the shared representation, concentrating task-relevant structure in the semantic stream while relegating nuisance variation to the residual stream. We instantiate this framework in the visual domain by leveraging physical priors for structured scenes and statistical constraints for attributes. Theoretically, our method enjoys a tighter out-of-distribution generalization bound than optimization-centric methods and reduces task gradient interference without explicit gradient projection or reweighting. Empirically, CORE-MTL consistently outperforms existing methods on visual multi-task benchmarks in both in-distribution and out-of-distribution settings. Code is publicly available at https://github.com/Hope-Rita/CORE-MTL.

2606.02218 2026-06-02 cs.LG cs.AI 版本更新

Faster Synchronous On-Policy RL via Straggler-Aware Group Sizing

通过感知掉队者的组大小调整实现更快的同步在线策略强化学习

Azal Ahmad Khan, Ammar Ahmed, Zeshan Fayyaz, Sheng Di, Mingyi Hong, Ali Anwar

发表机构 * University of Minnesota（明尼苏达大学）； University of Waterloo（滑铁卢大学）； Argonne National Laboratory（阿贡国家实验室）

AI总结提出动态组大小控制器SAGC，通过在线约束优化调整组大小，减少同步在线策略强化学习中的掉队者事件，提升墙钟效率并保持或改善训练奖励和模型质量。

AI中文摘要

同步强化学习方法如组相对策略优化（GRPO）提供稳定且可复现的在线策略训练，但极易受到掉队者的影响——单个异常长的轨迹可能延迟整个组的奖励计算和参数更新。随着组大小增加，这个问题变得更加严重，在更大组的好处与同步停滞的墙钟成本之间产生矛盾。我们提出感知掉队者的组控制（SAGC），一种动态组大小控制器，根据观察到的轨迹行为在线调整训练组。SAGC将组大小选择形式化为一个在线约束优化问题，旨在保留更大组的好处，同时控制掉队者事件的长期发生率。在同步GRPO和DAPO训练中，以及在普通和强工程基线上，SAGC一致地减少了掉队者发生率并提高了墙钟效率，同时实现了有竞争力或更好的训练奖励。我们进一步表明这些收益转化为最终模型质量：在下游推理基准上，SAGC与最强的静态组大小基线相比具有竞争力或更好，并且通常在没有显式长度惩罚的情况下产生更短的输出。这些结果将动态组控制定位为使同步在线策略强化学习更高效和更稳健的实用方法。

英文摘要

Synchronous reinforcement learning methods such as Group Relative Policy Optimization (GRPO) provide stable and reproducible on-policy training, but they are highly vulnerable to stragglers: a single unusually long rollout can delay reward computation and parameter updates for the entire group. This problem becomes more severe as group size increases, creating a tension between the benefits of larger groups and the wall-clock cost of synchronization stalls. We propose Straggler-Aware Group Control (SAGC), a dynamic group-size controller that adapts the training group online based on observed rollout behavior. SAGC formulates group-size selection as an online constrained optimization problem, seeking to retain the benefits of larger groups while controlling the long-term rate of straggler events. Across synchronous GRPO and DAPO training, and on top of both vanilla and strong engineered baselines, SAGC consistently reduces straggler incidence and improves wall-clock efficiency while achieving competitive or better training reward. We further show that these gains transfer to final model quality: SAGC is competitive with or better than the strongest static group-size baseline on downstream reasoning benchmarks, and often produces shorter outputs without any explicit length penalty. These results position dynamic group control as a practical way to make synchronous on-policy RL more efficient and robust.
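
One way to read "online constrained optimization" in this abstract is as a primal-dual controller: a multiplier tracks how far the observed straggler rate sits above a target, and the group size shrinks when that multiplier grows. The sketch below is a toy illustration under stated assumptions, not the SAGC algorithm from the paper. The straggler test (longest rollout exceeds `ratio` times the group median), the candidate sizes, the target rate, and the step size are all hypothetical.

```python
import statistics


def is_straggler_event(lengths, ratio=2.0):
    """Hypothetical straggler test: longest rollout exceeds ratio x median."""
    return max(lengths) > ratio * statistics.median(lengths)


class GroupSizeController:
    """Toy dual-ascent controller; an illustration, not the paper's method."""

    def __init__(self, sizes=(4, 8, 16, 32), target_rate=0.2, step=0.5):
        self.sizes = sorted(sizes)
        self.target_rate = target_rate    # allowed long-run straggler rate
        self.step = step                  # dual step size
        self.dual = 0.0                   # multiplier on the rate constraint
        self.index = len(self.sizes) - 1  # start from the largest group

    def update(self, lengths):
        """Consume one group's rollout lengths and return the next group size."""
        violation = float(is_straggler_event(lengths)) - self.target_rate
        # Dual ascent: the multiplier grows while stragglers exceed the target.
        self.dual = max(0.0, self.dual + self.step * violation)
        if self.dual > 1.0 and self.index > 0:
            # Constraint pressure is high: move to the next smaller group.
            self.index -= 1
            self.dual = 0.0
        elif self.dual == 0.0 and self.index < len(self.sizes) - 1:
            # No pressure left: move back toward larger groups.
            self.index += 1
        return self.sizes[self.index]
```

With these defaults, three consecutive straggler events are needed before the controller steps down from 32 to 16, and a single clean group lets it step back up. A practical controller would likely smooth both directions.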

2606.02198 2026-06-02 cs.LG cs.CY 版本更新

Model Multiplicity and Predictive Arbitrariness in Recidivism Risk Assessment

模型多重性与再犯风险评估中的预测任意性

Ashwin Singh, Carlos Castillo

AI总结针对再犯风险评估中的预测任意性问题，通过理论下界推导和实证分析，发现相似精度的模型间预测一致性通常高于最坏情况理论保证，并提出采用最低风险分配策略来缓解任意性。

Comments 17 pages, 12 figures

AI中文摘要

针对个体未来的预测任务本质上是嘈杂的，通常会产生多个相似精度的模型。当这些模型对同一个人产生不同预测时，会引发决策中的任意性问题。这种任意性在理论和实践中可能有多严重？如何解决以支持高风险风险评估？我们通过对一个已使用超过15年的基于机器学习的再犯风险评估决策支持系统的研究来回答这些问题。通过将复杂的法律规则转化为标记释放后结果（再犯或非再犯）的算法，我们首先构建了一个包含数千名囚犯释放的数据集。利用该数据集，我们学习可解释的模型，这些模型提高了预测性能，减少了群体间的错误率差异，并确保康复进展降低风险评分。接下来，我们研究预测多重性，首先推导出数据集上任何有限模型集的期望预测一致性的紧下界，然后评估该集合内的结构多样性（例如，不同的模型系数）在多大程度上转化为预测多重性（即对同一人的不同预测）。我们的实验表明，存在许多相似精度的模型且具有可比较的错误率差异并不一定意味着严重的预测多重性。经验上，性能相似的模型可以表现出比最坏情况理论保证高得多的预测一致性。我们发现，一种简单的策略——为每个囚犯分配这些模型中的最低风险——对于解决预测任意性是有效的。

英文摘要

Prediction tasks over individual futures, which are inherently noisy, often admit multiple similarly accurate models. When these models produce different predictions for the same individual, they raise concerns of arbitrariness in decision-making. How severe can this arbitrariness be, in theory and in practice? How can it be resolved to support high-stakes risk assessment? We address these questions through a study of a machine learning-based decision support system for recidivism risk assessment that has been in use for over 15 years. By translating complex legal rules into an algorithm for labeling post release outcomes (recidivist or non-recidivist), we first construct a dataset of thousands of inmate releases. Using this dataset, we learn interpretable models that improve predictive performance, reduce error-rate disparities between groups, and ensure that rehabilitative progress lowers risk scores. Next, we study predictive multiplicity, by first deriving a tight lower bound on the expected predictive agreement of any finite set of models over a dataset, and then by evaluating the extent to which structural diversity (e.g., different model coefficients) within this set translates to predictive multiplicity (i.e., different predictions for the same individual). Our experiments indicate that the existence of many similarly accurate models with comparable error-rate disparities does not necessarily translate into severe predictive multiplicity. Empirically, similarly performant models can exhibit substantially higher predictive agreement than worst-case theoretical guarantees suggest. We find that a simple policy that assigns each inmate the lowest risk among these models is effective for addressing predictive arbitrariness.
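
The two quantities discussed here, agreement between similarly accurate models and the lowest-risk policy, are simple to compute once each model's outputs are available. The sketch below is a minimal illustration: the scores are made up, the 0.5 decision threshold is an assumption, and the paper's formal definition of expected predictive agreement may differ from this plain pairwise average.

```python
from itertools import combinations


def pairwise_agreement(predictions):
    """Mean fraction of individuals on which two models give the same label."""
    pairs = list(combinations(predictions, 2))
    total = 0.0
    for a, b in pairs:
        total += sum(x == y for x, y in zip(a, b)) / len(a)
    return total / len(pairs)


def lowest_risk(scores):
    """For each individual, take the minimum risk score across all models."""
    return [min(per_person) for per_person in zip(*scores)]


# Hypothetical risk scores: 3 models (rows) x 4 individuals (columns).
scores = [
    [0.2, 0.7, 0.4, 0.9],
    [0.3, 0.6, 0.6, 0.8],
    [0.1, 0.8, 0.5, 0.9],
]
# Binary labels under an assumed 0.5 threshold.
labels = [[int(s >= 0.5) for s in row] for row in scores]

agreement = pairwise_agreement(labels)
resolved = lowest_risk(scores)
```

In this toy data the models disagree only on the third individual, and the lowest-risk policy resolves that case by taking the most favourable score.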

2606.02194 2026-06-02 cs.LG 版本更新

Coherent Off-Policy Improvement of Large Behavior Models with Learned Rewards

基于学习奖励的大规模行为模型的一致性离策略改进

Christian Scherer, Joe Watson, Theo Gruner, Daniel Palenicek, Ingmar Posner, Jan Peters

发表机构 * Technical University of Darmstadt（达姆施塔特工业大学）； University of Oxford（牛津大学）； Zuse School ELIZA（楚泽学院ELIZA）； hessian.AI（黑森州人工智能中心）； German Research Center for AI (DFKI)（德国人工智能研究中心（DFKI））； Robotics Institute Germany (RIG)（德国机器人研究所）

AI总结提出一种逆强化学习方法，通过从专家演示中学习稠密奖励函数，结合一致性模仿学习理论保证，实现对预训练策略的离策略改进，在稀疏奖励操作任务中优于强化学习基线。

Comments 13 pages, 7 figures

AI中文摘要

使用行为克隆（BC）将专家演示数据蒸馏到大型生成模型中，是一种可扩展的方法，可用于学习性能强的机器人控制策略，特别是对于灵巧操作。强化学习（RL）可以作为一种利用额外经验进一步微调这些策略的手段。一个开放的问题是RL是否比收集更多人类演示更具样本效率。先前的工作将RL应用于一个用于纠正预训练模型的较小残差策略，从而以可扩展的方式微调大型预训练策略。然而，对于典型的稀疏奖励任务，RL算法可能难以以样本高效的方式优化行为。我们探索逆强化学习（IRL），即从专家演示中学习稠密奖励函数，从而可能降低RL微调的难度。我们特别考虑一致性模仿学习，这是一种IRL方法，通过使用具有理论保证的特定奖励公式来促进BC策略的改进。我们展示了我们的IRL方法在所有六个稀疏操作任务上保持或提高了pi-0.5的性能，并在六个复杂操作任务中的五个上实现了≥90%的成功率，优于使用稀疏奖励的基于RL的基线。通过确保我们的初始预训练微调策略对于初始奖励和评论家是最优的，我们的方法避免了RL微调中常见的初始性能下降，并实现了更快的改进。

英文摘要

Distilling expert demonstration data into large generative models using behavioral cloning is a scalable approach to learning capable policies for robotic control, particularly for dexterous manipulation. Reinforcement learning (RL) can be used as a means to finetune these policies further using additional experience. An open question is whether RL is more sample-efficient than collecting more human demonstrations. Prior work has finetuned large pretrained policies in a scalable fashion by applying RL to a smaller residual policy that corrects the pretrained model. However, for the typical sparse reward tasks, RL algorithms can struggle to optimize the behavior in a sample-efficient manner. We explore inverse reinforcement learning, where a dense reward function is learned from expert demonstrations, potentially reducing the challenge of RL finetuning. We specifically consider coherent imitation learning, an IRL method that facilitates improvement of the BC policy through using a specific reward formulation with theoretical guarantees. We show that our IRL method maintains or improves the performance of pi-0.5 on all six sparse manipulation tasks and achieves a $\geq 90\%$ success rate on five out of six complex manipulation tasks, outperforming RL-based baselines using sparse rewards. By ensuring our initial pretrained finetuning policy is optimal for our initial reward and critic, our method circumvents the initial drop commonly seen in RL finetuning and enables faster improvement.

2606.02184 2026-06-02 cs.DL cs.LG 版本更新

The Ghost Couple: Correlated LLM Name Priors and Their Haunting of the Web and Academic Publishing

幽灵搭档：相关的大语言模型姓名先验及其对网络和学术出版的困扰

Michał Brzozowski, Neo Christopher Chung

发表机构 * Samsung AI Center（三星人工智能中心）； University of Warsaw（华沙大学）

AI总结研究发现大语言模型生成虚构专家姓名时会产生相关性强的角色组合，这些组合具有模型家族特异性，并在Zenodo等平台造成大量幽灵作者记录，影响学术出版。

AI中文摘要

这些名字并不存在。Elena Vasquez 和 Marcus Chen 作为火山专家、宇航员、惊悚小说主角、播客主持人和学术合著者，出现在数百个独立生成的AI生成文档中，却从未存在过。我们表明，大语言模型在生成虚构专家时不仅仅默认使用高概率的单个名字：它们会产生相关的角色组合、配对和三人组，其共现频率远超偶然，并且在独立生成中保持一致。这些先验是模型家族特定的（Claude：Elena Vasquez + Marcus Chen + Amara Okafor；Gemini：Aris Thorne + Lena Petrova；GPT：Elara Voss 无固定搭档）、版本特定的，并且在模型发布边界处被主动抑制，在它们生成的内容中留下可定时的行为指纹。我们记录了一个大规模的下游后果。在Zenodo（一个由CERN运营的、生成真实DataCite DOI的存储库）上，我们识别出1,655条幽灵作者记录，声称不存在的期刊并带有捏造的出版日期：服务器端的DataCite时间戳证明了故意的回溯日期，其中991条记录在一个月内注册；这些记录携带在DataCite中注册的真实DOI，因此任何摄取DOI元数据的学术聚合器都可以获取它们。幽灵名字还出现在ResearchGate上，形成由来自多个模型家族的合作者组成的合成研究小组；这些记录上的出版日期为模型部署窗口提供了可靠的时间代理。

英文摘要

These names do not exist. Elena Vasquez and Marcus Chen have appeared as volcano experts, astronauts, thriller protagonists, podcast hosts, and academic co-authors across hundreds of independently produced AI-generated documents, never having lived. We show that large language models do not merely default to high-probability individual names when generating fictional experts: they produce correlated character ensembles, pairs and trios whose co-occurrence rates far exceed chance and are consistent across independent generations. These priors are model-family-specific (Claude: Elena Vasquez + Marcus Chen + Amara Okafor; Gemini: Aris Thorne + Lena Petrova; GPT: Elara Voss with no fixed partner), version-specific, and actively suppressed at model release boundaries, leaving dateable behavioral fingerprints in the content they produced. We document a downstream consequence at scale. On Zenodo, a CERN-operated repository that mints real DataCite DOIs, we identify 1,655 ghost-authored records claiming nonexistent journals with fabricated publication dates: server-side DataCite timestamps prove deliberate backdating, and 991 records were registered in a single month; these carry real DOIs registered in DataCite, making them harvestable by any scholarly aggregator that ingests DOI metadata. Ghost names additionally appear on ResearchGate forming synthetic research groups with collaborators drawn from multiple model families; publication dates on these records provide a reliable temporal proxy for model deployment windows.
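
The claim that co-occurrence rates "far exceed chance" can be made concrete with a lift statistic: the observed joint rate of two names divided by the rate expected if they appeared independently. The sketch below is an illustration only. The abstract does not say which statistic the authors use, and the four-document corpus is invented; only the names come from the abstract.

```python
def cooccurrence_lift(documents, name_a, name_b):
    """Observed joint rate divided by the rate expected under independence."""
    n = len(documents)
    has_a = sum(name_a in doc for doc in documents)
    has_b = sum(name_b in doc for doc in documents)
    both = sum(name_a in doc and name_b in doc for doc in documents)
    if has_a == 0 or has_b == 0:
        return 0.0
    return (both / n) / ((has_a / n) * (has_b / n))


# Toy corpus: each document is reduced to the set of names it mentions.
documents = [
    {"Elena Vasquez", "Marcus Chen"},
    {"Elena Vasquez", "Marcus Chen"},
    {"Aris Thorne"},
    {"Lena Petrova"},
]
lift = cooccurrence_lift(documents, "Elena Vasquez", "Marcus Chen")
```

A lift of 1 means the pair co-occurs as often as independence predicts; values well above 1 indicate a correlated pair.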

2606.02179 2026-06-02 cs.LG cs.AI cs.CE 版本更新

On the Generalization in Topology Optimization via Sensitivity-Conditioned Bernoulli Flow Matching

关于拓扑优化中通过敏感性条件伯努利流匹配的泛化性

Mohammad Rashed, Duarte F. Valoroso Madeira, Babak Gholami, Caglar Guerbuez, Yunjia Yang, Nils Thuerey

发表机构 * University of California, San Diego（加州大学圣地亚哥分校）； Max Planck Institute for Informatics（马克斯·普朗克信息研究所）

AI总结通过信息论分析，提出伪敏感性概念，并利用敏感性条件伯努利流匹配生成器在拓扑优化中实现最优的分布外泛化性能。

Comments ICML Paper

AI中文摘要

拓扑优化（TO）的代理模型在分布偏移（如载荷或边界条件变化）下表现出高度可变的分布外（OOD）泛化能力，但这一变异性的来源尚不清楚。我们假设OOD性能取决于条件信号保留关于驱动经典TO的伴随敏感性（简化梯度）的信息量。将TO流程建模为因果马尔可夫链，数据处理不等式表明，在该抽象下，敏感性场是拓扑预测的信息论最优条件信号。然而，计算精确的伴随敏感性在实践中可能昂贵或不可用；我们观察到某些物理场可以通过单调变换近似敏感性。为形式化这一点，我们引入\textbf{伪敏感性}来区分哪些场能够实现泛化，哪些信息贫乏。然后，我们展示了一个敏感性条件的伯努利流匹配生成器实证地证实了这些预测：以敏感性为条件可获得最先进的OOD性能，而越来越远的物理场性能退化至原始参数条件。结果在载荷偏移下的结构TO基准测试和我们新的CFD-TO数据集（边界条件偏移如多出口配置）中均成立。代码和数据集见https://tum-pbs.github.io/topotransformer/。

英文摘要

Surrogate models for topology optimization (TO) exhibit highly variable out-of-distribution (OOD) generalization under distribution shifts such as changing loads or boundary conditions, yet the source of this variability remains unclear. We hypothesize that OOD performance is governed by how much information the conditioning signal preserves about the adjoint sensitivity (reduced gradient) that drives classical TO. Modeling the TO pipeline as a causal Markov chain, the Data Processing Inequality establishes that, under this abstraction, the sensitivity field is an information-theoretically optimal conditioning signal for topology prediction. However, computing exact adjoint sensitivities can be expensive or unavailable in practice; we observe that certain physical fields can approximate sensitivities through monotone transformations. To formalize this, we introduce \textbf{pseudo-sensitivities} to characterize which fields enable generalization versus those that are information-poor. We then show that a sensitivity-conditioned Bernoulli flow-matching generator empirically confirms these predictions: conditioning on sensitivities yields state-of-the-art OOD performance, while increasingly distant physical fields degrade toward raw parameter conditioning. Results hold across structural TO benchmarks under load shifts and our new CFD-TO dataset under boundary-condition shifts such as multi-outlet configurations. Code and datasets are available at https://tum-pbs.github.io/topotransformer/ .
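
The Data Processing Inequality step can be written out in one line. The notation is assumed for illustration and does not come from the abstract: $c$ is any candidate conditioning signal, $s$ the adjoint sensitivity field, and $\rho$ the predicted topology. The Markov assumption is that $\rho$ depends on $c$ only through $s$.

```latex
% Assumed abstraction: c -> s -> \rho forms a Markov chain,
% i.e. \rho is conditionally independent of c given s.
c \to s \to \rho
\quad\Longrightarrow\quad
I(c;\rho) \;\le\; I(s;\rho)
```

Under that assumption no signal upstream of the sensitivity can carry more information about the topology than the sensitivity itself, which is the sense in which the abstract calls it an optimal conditioning signal.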

2606.02177 2026-06-02 cs.LG 版本更新

Low-Pass Flow Matching

低通流匹配

Francesco M. Ruscio, T. Konstantin Rusch

发表机构 * ELLIS Institute Tübingen（图宾根ELLIS研究所）； Max Planck Institute for Intelligent Systems（马克斯·普朗克智能系统研究所）； Tübingen AI Center（图宾根人工智能中心）； Liquid AI

AI总结针对流匹配中白噪声源与自然数据频谱不匹配的问题，提出基于算子调制插值的低通流匹配方法，引入时变频谱偏差，在保持或提升样本质量的同时显著降低采样成本。

Comments ICLR 2026 Delta Workshop

2606.02172 2026-06-02 cs.LG cs.CV 版本更新

Closing the Alignment-Maturity Gap in Federated Prototype Learning

缩小联邦原型学习中的对齐-成熟度差距

Mario Casado-Diez, Alejandro Dopico-Castro, Verónica Bolón-Canedo, Bertha Guijarro-Berdiñas

发表机构 * CITIC, Universidade da Coruña（CITIC，科鲁纳大学）

AI总结针对联邦学习中原型对齐压力抑制局部判别结构的问题，提出FedSAP框架，通过确定性对齐课程和几何驱动代理分离损失稳定表征学习，在多种异质性条件下提升分类性能。

AI中文摘要

从分布式异质数据中学习判别性视觉表示是联邦学习（FL）中的一个基本挑战。基于原型的方法通过跨客户端共享类级表示来解决统计异质性，但在早期训练轮次中会产生距离依赖的梯度压力，这种压力尤其严重：对从噪声局部表示聚合而来的不成熟全局原型施加的对齐压力会产生大梯度，从而抑制局部判别结构的出现。结果导致嵌入空间组织不良，识别性能下降，尤其是在严重的非独立同分布（non-IID）条件下。我们提出FedSAP，一个通过两种互补机制稳定联邦表示学习的框架：一个确定性对齐课程，将全局对齐延迟到局部表示变得稳定；以及一个几何驱动的代理分离损失，利用现有原型库在单位超球面上强制执行类间结构，而不引入额外参数或通信开销。这些机制共同产生紧凑、分离良好的类簇，而不改变联邦参与者之间的底层通信协议。在三个基准测试和不同程度的异质性下的实验表明，与评估的原型基线相比，性能提升高达4个百分点，在高异质性下改进最为显著。我们框架的表示性质还使其能够直接扩展到半监督设置，其中未标记数据只需最小修改即可纳入，突显了调度对齐作为设计原则的通用性。

英文摘要

Learning discriminative visual representations from distributed, heterogeneous data is a fundamental challenge in Federated Learning (FL). Prototype-based methods address statistical heterogeneity by sharing class-level representations across clients but create a distance-dependent gradient pressure that is particularly severe during early training rounds: alignment pressure applied to immature global prototypes, aggregated from noisy local representations, generates large gradients that suppress the emergence of local discriminative structure. The result is a poorly organized embedding space and degraded recognition performance, particularly under severe non-IID conditions. We propose FedSAP, a framework that stabilises federated representation learning through two complementary mechanisms: a deterministic alignment curriculum that delays global alignment until local representations become stable and a geometry-driven proxy separation loss that enforces inter-class structure on the unit hypersphere using the existing prototype bank without introducing additional parameters or communication overhead. Together, these mechanisms produce compact, well-separated class clusters without altering the underlying communication protocol between federation's participants. Experiments across three benchmarks and varying degrees of heterogeneity show gains of up to 4 percentage points over the prototype-based baselines evaluated, with improvements most pronounced under high heterogeneity. The representational nature of our framework further enables a straightforward extension to semi-supervised settings, where unlabelled data is incorporated with minimal modification, underscoring the generality of scheduled alignment as a design principle.
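
The two mechanisms named above can be sketched in a few lines: a schedule that keeps the alignment weight at zero in early rounds, and a loss that penalises prototypes pointing in similar directions on the unit hypersphere. This is a rough sketch under assumptions, not FedSAP's actual formulation. The fixed warm-up length, the linear ramp, and the hinge-on-cosine loss are hypothetical choices; the abstract only says alignment is delayed until local representations become stable.

```python
import math


def alignment_weight(round_idx, warmup_rounds=20, ramp_rounds=10, max_weight=1.0):
    """Zero during warm-up, then a linear ramp to max_weight (assumed shape)."""
    if round_idx < warmup_rounds:
        return 0.0
    progress = (round_idx - warmup_rounds) / ramp_rounds
    return max_weight * min(1.0, progress)


def normalize(vec):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def separation_loss(prototypes):
    """Mean positive cosine similarity between distinct class prototypes."""
    unit = [normalize(p) for p in prototypes]
    total, count = 0.0, 0
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            cos = sum(a * b for a, b in zip(unit[i], unit[j]))
            total += max(0.0, cos)  # only penalise prototypes that point the same way
            count += 1
    return total / count
```

Because the loss reuses the existing prototype vectors, it adds no parameters, which matches the abstract's claim about overhead at the level of this sketch.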

2606.02168 2026-06-02 cs.CV cs.LG 版本更新

Disentanglement-Based Equivariant Learning for Compositional VQA

基于解耦的等变学习用于组合式VQA

Zhou Du, Zhaoquan Yuan, Xiao Wu, Changsheng Xu

发表机构 * IEEE Publication Technology Group（IEEE出版技术组）； School of Computing and Artificial Intelligence, Southwest Jiaotong University（西南交通大学计算机与人工智能学院）； Engineering Research Center of Sustainable Urban Intelligence Transportation, Ministry of Education, China（教育部可持续城市智能交通工程研究中心）； State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences（中国科学院自动化研究所多模态人工智能系统（MAIS）国家重点实验室）； School of Artificial Intelligence, University of Chinese Academy of Sciences（中国科学院大学人工智能学院）

AI总结提出DEAL框架，通过因果干预解耦视觉和文本概念，并利用等变约束增强组合推理能力，在CLEVR-CoGenT和GQA-SGL上超越现有方法。

Comments Accepted by IEEE Transactions on Multimedia

DOI: 10.1109/TMM.2025.3604897
Journal ref: IEEE Trans. Multimedia, vol. 27, pp. 8160-8173, 2025

AI中文摘要

组合式视觉问答（VQA）是一项具有挑战性但基础的任务，要求模型理解先前学习概念的新组合。当前方法往往忽视潜在概念的解耦，并且在有效捕捉组合变化机制方面受到限制。此外，最先进的技术依赖于额外的线索进行训练，这在现实世界的VQA场景中不可行。为了解决这些问题，本文提出了一种新颖的基于解耦的等变学习（DEAL）框架用于组合式VQA，该框架仅由真实答案指导。在DEAL中，我们采用因果启发的干预措施，在重新编码框架内解耦来自视觉和文本输入的概念。基于等变性原理，我们随后对推理输入进行组合变换，并对输出施加等变约束，以增强模型的组合推理能力。在基准数据集CLEVR-CoGenT和GQA-SGL上进行的全面实验验证了我们提出的DEAL方法在视觉和语言泛化设置下均优于现有的最先进方法。

英文摘要

Compositional visual question answering (VQA) represents a challenging yet fundamental task that requires models to comprehend novel combinations of previously learned concepts. The current methods often overlook the disentanglement of underlying concepts and are restricted in terms of their ability to effectively capture the compositional variation mechanism. Moreover, the state-of-the-art techniques depend on additional clues for training, which is not feasible in real-world VQA scenarios. To address these issues, in this paper, we introduce a novel Disentanglement-based EquivAriant Learning (DEAL) framework for compositional VQA, which is guided exclusively by ground-truth answers. In DEAL, we employ causality-inspired interventions to disentangle concepts derived from visual and textual inputs within a re-encoding framework. Based on the principle of equivariance, we subsequently perform a compositional transformation on the inference input and impose the equivariant constraint on the output to augment the compositional reasoning capacity of the model. Comprehensive experiments conducted on the benchmark CLEVR-CoGenT and GQA-SGL datasets validate the superiority of our proposed DEAL approach over the existing state-of-the-art methods for compositional VQA tasks in both visual and linguistic generalization settings.

2606.02156 2026-06-02 eess.IV cs.AI cs.CV cs.IR cs.LG 版本更新

Predicting the risk of colorectal anastomotic leak based on preoperative mapping of the blood supply of the bowel

基于术前肠道血供映射预测结直肠吻合口漏风险

Zahra Tabatabaei, Jon Sporring, Mark Bremholm Ellebæk, Alaa El-Hussuna

发表机构 * Computer Science Department, Københavns Universitet (KU)（哥本哈根大学计算机科学系）； University of Southern Denmark（南丹麦大学）； Odense University Hospital（欧登塞大学医院）； OpenSourceResearch Collaboration（开源研究协作）

AI总结提出一种基于术前CT影像的AI驱动系统，通过分析血管和组织特征量化吻合口漏风险，并结合内容检索支持临床决策。

详情

AI中文摘要

吻合口漏仍然是结直肠癌手术后最严重的并发症之一，显著影响患者预后、康复轨迹和医疗成本。尽管影像技术有所进步，目前的术前评估仍依赖临床评估，这一过程主观、易出错且高度依赖个人经验。迄今为止，尚无经过验证的基于CT的方法能够在术前预测吻合口漏风险。本方案论文概述了一个全面的框架，用于开发和验证一个AI驱动的系统，该系统利用对比增强前后的CT影像进行术前风险评估。研究描述了数据收集、伦理处理、符合GDPR的患者数据预处理、图像预处理以及旨在生成临床可解释输出的深度学习架构探索等阶段。该工作流程的两个主要成果是：1) 风险评估模块，通过分析CT扫描中的血管和组织特征量化漏液可能性；2) 基于内容的医学图像检索（CBMIR）模块，识别并显示相似历史病例以支持循证手术决策。该方案论文需要医院和大学之间的密切合作；本方案表明，此类系统在现有医疗基础设施内技术上可行且临床可实施。通过遵循所提出的方法论阶段和监管原则，其他机构可以复制此工作流程以开发类似的决策支持工具。最终，这一跨学科框架旨在加强手术规划、减少漏液发生率，并推动向可解释、数据驱动的精准手术的更广泛范式转变。

英文摘要

Anastomotic leak remains one of the most serious complications following colorectal cancer surgery, substantially affecting patient outcomes, recovery trajectories, and healthcare costs. Despite advances in imaging technology, current preoperative assessment relies only on clinical assessment, a process that is subjective, error-prone, and highly dependent on individual expertise. To date, no validated CT-based method exists to predict anastomotic leak risk prior to surgery. This protocol paper outlines a comprehensive framework for developing and validating an AI-driven system for preoperative risk assessment using pre- and post-contrast CT imaging. The study describes the stages of data collection, ethical handling, and preprocessing of patient data in accordance with GDPR, image preprocessing, and the exploration of deep learning architectures designed to generate clinically interpretable outputs. Two integrated tools constitute the main deliverables of this workflow: 1) a risk assessment module, which quantifies the likelihood of leakage by analyzing vascular and tissue features in CT scans, and 2) a Content-Based Medical Image Retrieval (CBMIR) module, which identifies and displays similar historical cases to support evidence-based surgical decision making. The protocol paper requires close collaboration between hospitals and universities; this protocol demonstrates that such a system is technically feasible and clinically implementable within existing healthcare infrastructures. By following the proposed methodological stages and regulatory principles, other institutions can reproduce this workflow to develop analogous decision-support tools. Ultimately, this interdisciplinary framework aims to enhance surgical planning, reduce leak incidence, and contribute to a broader paradigm shift toward explainable, data-driven precision surgery.

2606.02145 2026-06-02 cs.LG 版本更新

Hybrid Neural Ordinary Differential Equations for Data-Efficient Polymerization Modeling with Incomplete Kinetics

用于不完全动力学下数据高效聚合建模的混合神经常微分方程

Marah Almanasreh, Alexander Mitsos, Eike Cramer

发表机构 * RWTH Aachen University, Process Systems Engineering (AVT.SVT)（亚琛工业大学过程系统工程系）； JARA-CSD； Energy Systems Engineering (ICE-1), Forschungszentrum Jülich（于利希研究中心能源系统工程研究所（ICE-1））； Department of Chemical Engineering, Sargent Centre for Process Systems Engineering, University College London（伦敦大学学院化学工程系萨金特过程系统工程中心）

AI总结提出混合神经常微分方程框架，通过仅学习部分表征的有效自由基浓度项，在稀疏数据下实现自由基聚合的准确预测。

Comments 25 pages, 5 figures

AI中文摘要

聚合动力学的准确预测对于过程设计、控制和优化至关重要。然而，纯机理模型需要对部分表征的动力学进行劳动密集型的参数化，而纯数据驱动模型需要大量且多样化的数据集，这些数据集获取成本高昂，尤其是在早期设计阶段。我们提出了一种混合神经常微分方程（NODE）框架，用于自由基聚合的数据高效建模。以甲基丙烯酸甲酯（MMA）的间歇聚合为例，明确保留了机理质量平衡，仅通过神经网络代理从数据中学习部分表征的控制单体消耗的有效自由基浓度，而引发剂分解、链增长和终止等已确立的反应则保持物理建模。在稀疏数据条件下，将混合NODE与离散时间前馈神经网络和纯数据驱动NODE进行比较，模型在规则和不规则采样下仅使用少至十个测量值进行训练。混合NODE始终比两种纯数据驱动基线实现更低的预测误差和更物理一致的外推。在噪声数据和未见操作条件的泛化场景中，混合NODE的RMSE为0.013，而数据驱动NODE为0.31，离散时间模型为0.68，表明在有限数据可用性下，仅学习闭合项而非完整动力学足以实现可靠预测。

英文摘要

Accurate prediction of polymerization dynamics is essential for process design, control, and optimization. Yet, purely mechanistic models require labor-intensive parameterization of partially characterized kinetics, while purely data-driven models demand large, diverse datasets that are costly to obtain, particularly in early-design stages. We propose a hybrid Neural Ordinary Differential Equation (NODE) framework for data-efficient modeling of free-radical polymerization. Using batch polymerization of methyl methacrylate (MMA) as a case study, the mechanistic mass balances are retained explicitly, and only the partially-characterized effective radical concentration governing monomer consumption is learned from data through a neural network surrogate, while established reactions such as initiator decomposition, propagation, and termination remain physically modeled. The hybrid NODE is evaluated against a discrete-time feedforward neural network and a purely data-driven NODE under sparse data conditions, with models trained on as few as ten measurements under both regular and irregular sampling. The hybrid NODE consistently achieves lower prediction errors and more physically consistent extrapolations than both purely data-driven baselines. In a generalization scenario with noisy data and unseen operating conditions, the hybrid NODE achieves an RMSE of 0.013, compared to 0.31 for the data-driven NODE and 0.68 for the discrete-time model, demonstrating that learning only a closure term rather than the full dynamics is sufficient for reliable prediction under limited data availability.
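
The hybrid structure described here keeps the balance equations explicit and replaces a single term with a learned function. The sketch below illustrates that split with a deliberately simplified two-state system (initiator and monomer). It is not the authors' model: the rate constants, initial conditions, network size, and explicit Euler integrator are placeholders, and the surrogate is left untrained. A real hybrid NODE would fit the surrogate by differentiating through an ODE solver.

```python
import math
import random


def make_surrogate(hidden=8, seed=0):
    """Tiny untrained MLP mapping (initiator, monomer) to a positive value."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]

    def radical(initiator, monomer):
        h = [math.tanh(row[0] * initiator + row[1] * monomer) for row in w1]
        z = sum(w * v for w, v in zip(w2, h))
        return math.log1p(math.exp(z))  # softplus keeps the concentration positive

    return radical


def simulate(radical, monomer0=1.0, initiator0=0.05, k_d=0.01, k_p=0.5,
             dt=0.1, steps=100):
    """Explicit Euler integration of the hybrid right-hand side."""
    monomer, initiator = monomer0, initiator0
    trajectory = [(monomer, initiator)]
    for _ in range(steps):
        # Mechanistic part: first-order initiator decomposition.
        d_initiator = -k_d * initiator
        # Hybrid part: propagation law with a learned effective radical term.
        d_monomer = -k_p * monomer * radical(initiator, monomer)
        initiator += dt * d_initiator
        monomer += dt * d_monomer
        trajectory.append((monomer, initiator))
    return trajectory


trajectory = simulate(make_surrogate())
```

Because the surrogate output is positive by construction, monomer consumption stays monotone even before training, which is one way the mechanistic scaffold constrains extrapolation.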

2606.02142 2026-06-02 cs.LG cs.DB 版本更新

TimeBlocks: Foundational and Continual Time-Series Blockbase -- Extended Version

TimeBlocks: 基础与持续时间序列块库——扩展版本

David Campos, Bin Yang, Tung Kieu, Lei Chen, Chenjuan Guo, Christian S. Jensen

AI总结提出TimeBlocks方法，通过可互换的模块化模型块和路由策略，构建轻量级、多任务的时间序列模型，并引入StreamCore实现持续校准，在多个数据集上优于现有基线。

Comments 15 pages. An extended version of "TimeBlocks: Versatile and Continual Time-Series Blockbase" accepted at SIGKDD 2026

AI中文摘要

持续的数字化导致监控各种过程的时间序列数据流激增，从中可以获得有价值的见解。此外，成功的基础语言模型的出现引发了一个问题：是否可能实现具有处理多个任务的基础属性的时间序列模型，同时足够轻量以允许实时数据流处理。现有的基础时间序列模型通常很大，并且仅在离线设置中有效，没有严格的时间和计算约束，且不需要重复的模型校准。然而，当应用于数据流时，这些模型由于规模大且缺乏对持续校准的支持而效率低下，这损害了它们提供准确实时响应的能力、耐久性以及在硬件受限环境中的可部署性。我们提出TimeBlocks，通过促进在可变条件下适用于多个任务的轻量级模型的高效构建，实现多用途的时间序列处理。特别是，该方法维护一个可互换和模块化的模型块池，可用于构建新的时间序列模型。当面对特定的时间序列数据时，路由策略迭代选择最合适的块来为数据构建轻量级且准确的模型。我们为TimeBlocks配备了一种称为StreamCore的方法，以构建数据流的代表性小子集，该子集随时间保持流的保证近似，从而实现持续的模型校准。在多个数据集和多个任务上的实验研究表明，TimeBlocks能够构建优于现有基线的模型。

英文摘要

The ongoing digitization has led to a proliferation of time-series data streams that monitor a variety of processes, from which valuable insights may be obtained. Further, the emergence of successful foundational language models raises the question of whether it is possible to achieve time-series models with the foundational properties of handling multiple tasks, while being sufficiently lightweight to allow real-time data stream processing. Existing foundational time-series models are often large and only effective in offline settings without stringent time and computational constraints, and where repeated model calibration is not needed. However, when applied to data streams, these models are ineffective due to their size and lack of support for continual calibration, which compromise their ability to deliver accurate real-time responses, their durability, and their deployability in hardware-limited settings. We propose TimeBlocks to enable versatile time-series processing by facilitating the efficient building of lightweight models suitable for multiple tasks under variable conditions. In particular, the method maintains a pool of interchangeable and modular model blocks that can be used to construct new time-series models. When presented with specific time-series data, a routing strategy iteratively selects the most suitable blocks to construct a lightweight and accurate model for the data. We equip TimeBlocks with a method called StreamCore to build a representative small subset of the data stream, which preserves a guaranteed approximation of the stream over time, enabling continual model calibration. An experimental study on multiple datasets covering multiple tasks shows that TimeBlocks makes it possible to build models that outperform existing baselines.

2606.02138 2026-06-02 cs.LG cs.AI 版本更新

VLBM: Variational Latent Basis Modeling for OOD Robust Multivariate Time Series Forecasting

VLBM：面向OOD鲁棒多变量时间序列预测的变分潜在基建模

Xudong Zhang, Jierui Lei, Jiacheng Li, Lingdong Shen, Jian Cui, Haina Tang

发表机构 * School of Artificial Intelligence, University of Chinese Academy of Sciences（中国科学院大学人工智能学院）； Center for Machine Learning Research, Peking University（北京大学机器学习研究中心）； Amap, Alibaba Group（阿里巴巴集团高德地图）； School of Advanced Interdisciplinary Sciences, University of Chinese Academy of Sciences（中国科学院大学前沿交叉科学学院）； Environmental Microbiome and Innovative Genomics Laboratory, Peking University（北京大学环境微生物与创新基因组实验室）

AI总结提出VLBM框架，通过变分潜在基分离稳定动态与OOD偏差，实现混合ID/OOD分布下的鲁棒预测，在12个基准任务上平均MAE和MSE分别提升15.08%和7.74%。

AI中文摘要

多变量时间序列预测中的分布外（OOD）事件虽然罕见，但往往主导现实世界风险，使得平均情况预测不足以可靠部署。在混合ID/OOD分布的标准平均风险训练下，来自罕见OOD事件的优化信号可能被频繁的分布内（ID）模式淹没，因此强基准精度可能无法转化为高影响偏移下的可靠性。为解决此问题，我们提出VLBM（变分潜在基模型），一种理论指导的潜在预测框架，将稳定动态与OOD引起的偏差分离。VLBM学习一个共享潜在基，定义稳定ID动态的低秩子空间，将输入显式分解为基子空间分量和正交残差分量，并将未来感知后验与未来盲先验对齐，使得测试时潜在推断仅依赖于历史输入。在涵盖交通、天气、电力系统及其他现实世界领域的12个基准任务上，包括新构建的现实世界OOD交通数据集，VLBM实现了最先进的OOD鲁棒性和ID精度，平均MAE和MSE比最强基线分别提升15.08%和7.74%。在合成模拟数据集上，VLBM也持续实现最佳性能并更好地跟踪OOD脉冲恢复。这些结果支持潜在结构化预测作为混合ID和OOD条件下鲁棒预测的原则性途径。代码可在https://github.com/leijieruilq/VLBM_OOD_forecast获取。

英文摘要

Out-of-distribution (OOD) events in multivariate time series forecasting are rare but often dominate real-world risk, making average-case forecasting insufficient for reliable deployment. Under standard average-risk training on mixed ID/OOD distributions, optimization signals from rare OOD events can be overwhelmed by frequent in-distribution (ID) patterns, so strong benchmark accuracy may not translate into reliability under high-impact shifts. To address this issue, we propose VLBM (Variational Latent Basis Model), a theory-guided latent forecasting framework that separates stable dynamics from OOD-induced deviations. VLBM learns a shared latent basis that defines a low-rank subspace for stable ID dynamics, explicitly decomposes inputs into basis-subspace components and orthogonal residual components, and aligns a future-aware posterior with a future-blind prior so that test-time latent inference depends only on historical input. Across 12 benchmark tasks spanning transportation, weather, power systems, and other real-world domains, including newly constructed real-world OOD traffic datasets, VLBM achieves state-of-the-art OOD robustness and ID accuracy, with average MAE and MSE gains of 15.08% and 7.74% over the strongest baseline. On a synthetic simulation dataset, VLBM also consistently achieves the best performance and better tracks OOD pulse recovery. These results support latent structured forecasting as a principled route to robust prediction under mixed ID and OOD conditions. The code is available at https://github.com/leijieruilq/VLBM_OOD_forecast.
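
The decomposition into a basis-subspace component and an orthogonal residual is ordinary orthogonal projection once a basis is fixed. The sketch below shows only that linear-algebra step. In VLBM the basis is learned and the inference is variational, so the hand-written basis and the Gram-Schmidt routine here are illustrative assumptions and not the released implementation.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(basis):
    """Gram-Schmidt; assumes the basis vectors are linearly independent."""
    ortho = []
    for vec in basis:
        w = list(vec)
        for q in ortho:
            coeff = dot(w, q)
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        ortho.append([wi / norm for wi in w])
    return ortho


def decompose(x, basis):
    """Split x into its projection onto span(basis) and the orthogonal residual."""
    ortho = orthonormalize(basis)
    in_subspace = [0.0] * len(x)
    for q in ortho:
        coeff = dot(x, q)
        in_subspace = [s + coeff * qi for s, qi in zip(in_subspace, q)]
    residual = [xi - si for xi, si in zip(x, in_subspace)]
    return in_subspace, residual


# Hypothetical rank-2 basis in a 3-dimensional input space.
basis = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
stable_part, deviation = decompose([2.0, 3.0, 4.0], basis)
```

Here the residual carries everything the low-rank basis cannot express, which is the role the abstract assigns to OOD-induced deviations.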

2606.02136 2026-06-02 cs.LG 版本更新

Edge-aware Decoding for Neural Asymmetric Routing

面向神经非对称路由的边感知解码

Li Liang, Jinbiao Chen, Zizhen Zhang

发表机构 * Sun Yat-Sen University（中山大学）； Department of Industrial Systems Engineering and Management, National University of Singapore（新加坡国立大学工业系统工程与管理系）

AI总结针对神经非对称路由中表示与决策不匹配的问题，提出边感知解码器，通过显式暴露转移级成本信息提升零样本泛化性能。

AI中文摘要

神经非对称路由模型越来越多地通过矩阵表示和非对称感知注意力来编码方向性。然而，最终路由动作并非孤立节点，而是在当前部分路由下选择的有向转移。这造成了表示与决策的不匹配：成对成本信息可能在上游编码，而最终候选logit仍主要参数化为上下文-节点兼容性。我们提出一种针对神经非对称路由的解码器设计原则：最终得分应显式暴露问题的未来成本（cost-to-go）结构所暗示的转移级量。我们通过一个边感知解码器实例化该原则，该解码器为当前有向边、返回起点的闭合以及静态轻量级前瞻添加候选特定项，同时保持表示骨干网络固定。在受控的SVD/Sinkhorn非对称骨干网络上，该解码器在ATSP-100上训练并在ATSP-100/200/500/1000上零样本评估时，优于RADAR参考，将ATSP-1000的差距从4.13%降至2.73%。在ACVRP上，相同的得分级修改在更丰富的路由状态下显示出相同的定性趋势。ATSP消融实验和有向转移诊断进一步阐明了机制：最强证据涉及对当前有向边的敏感性，而闭合和静态前瞻则作为启发式延续线索。结果支持一项机制研究：神经非对称路由中一个关键的解码器侧信号是在决策时暴露转移级的边信息。

英文摘要

Neural asymmetric routing models increasingly encode directionality through matrix representations and asymmetry-aware attention. The final routing action, however, is not a node in isolation but a directed transition chosen under the current partial route. This creates a representation--decision mismatch: pairwise cost information may be encoded upstream while the final candidate logit is still largely parameterized as context--node compatibility. We propose a decoder-design principle for neural asymmetric routing: the final score should explicitly expose transition-level quantities suggested by the problem's cost-to-go structure. We instantiate this principle with an edge-aware decoder that adds candidate-specific terms for the current directed edge, return-to-start closure, and static lightweight lookahead, while keeping the representation backbone fixed. On a controlled SVD/Sinkhorn asymmetric backbone, the decoder improves over the RADAR reference when trained on ATSP-100 and evaluated zero-shot on ATSP-100/200/500/1000, reducing the ATSP-1000 gap from $4.13\%$ to $2.73\%$. On ACVRP, the same score-level modification shows the same qualitative trend under a richer routing state. ATSP ablations and directed-transition diagnostics sharpen the mechanism: the strongest evidence concerns sensitivity to the current directed edge, while closure and static lookahead act as heuristic continuation cues. The results support a mechanism study: a key decoder-side signal in neural asymmetric routing is decision-time exposure of transition-level edge information.
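
The three candidate-specific terms listed above (current directed edge, return-to-start closure, and a static lookahead) can be illustrated as corrections to a context-node score. The sketch below is a hand-weighted toy, not the paper's decoder. In the paper these quantities feed a learned neural decoder, whereas the fixed weights `w_edge`, `w_close`, and `w_look` and the cheapest-next-hop lookahead here are hypothetical.

```python
def edge_aware_scores(context_scores, cost, current, start, unvisited,
                      w_edge=1.0, w_close=0.5, w_look=0.25):
    """Return {candidate: score}; the weights are hypothetical, not learned."""
    scores = {}
    for j in unvisited:
        edge = cost[current][j]   # directed edge current -> j
        closure = cost[j][start]  # cost of returning from j to the start node
        rest = [k for k in unvisited if k != j]
        # Static lookahead: cheapest outgoing hop from j to another open node.
        lookahead = min((cost[j][k] for k in rest), default=0.0)
        scores[j] = (context_scores[j]
                     - w_edge * edge - w_close * closure - w_look * lookahead)
    return scores


# Asymmetric toy cost matrix: cost[i][j] is the cost of travelling i -> j.
cost = [
    [0.0, 1.0, 9.0],
    [5.0, 0.0, 2.0],
    [1.0, 7.0, 0.0],
]
scores = edge_aware_scores([0.0, 0.0, 0.0], cost, current=0, start=0, unvisited=[1, 2])
best = max(scores, key=scores.get)
```

With equal context scores, the directed-edge term dominates and node 1 is preferred even though returning from it to the start is more expensive than from node 2.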

2606.02134 2026-06-02 cs.LG cs.AI cs.CV 版本更新

Rethinking Evaluation Paradigms in IBP-based Certified Training

重新思考基于IBP的认证训练中的评估范式

Konstantin Kaulen, Hadar Shavit, Holger H. Hoos

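Affiliations: University of Freiburg; ETH Zurich

AI summary: To address the trade-off between natural and certified accuracy in certified training, the paper proposes Pareto-front-based multi-objective hyperparameter optimisation, which enables fair comparison across methods, reveals undertuning in previously reported configurations, and establishes a new state of the art.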

Comments Accepted to ICML 2026

AI中文摘要

深度神经网络在许多监督学习任务上取得了强大性能，但仍易受对抗性扰动的影响。神经网络验证提供了数学上严格的鲁棒性保证，但计算成本高昂。为缓解这一问题，认证训练技术在训练过程中优化可验证的鲁棒性，通常通过方法特定的超参数控制自然精度与认证精度之间的权衡。由于这些指标本质上是冲突的，报告单一配置的常见做法存在问题：它可能误导关于整体性能的结论，并妨碍对最新技术的无偏评估。我们通过基于自然-认证精度权衡的Pareto前沿比较来评估认证训练方法。为了实现公平、方法无关的比较，我们执行高效的自动化多目标超参数优化，为每种方法识别一组Pareto最优配置。这种方法常常揭示先前报告配置中的显著欠调优，从而获得更优性能并建立新的最优水平。利用这些前沿，我们首次对认证训练方法进行了全面的多目标比较，表明先前的进展并不像假设的那样显著，并揭示了先前未报告的性能互补性。

英文摘要

Deep neural networks achieve strong performance on many supervised learning tasks but remain vulnerable to adversarial perturbations. Neural network verification provides mathematically rigorous robustness guarantees, yet at substantial computational cost. To mitigate this, certified training techniques optimise for verifiable robustness during training, typically inducing a trade-off between natural and certified accuracy controlled by method-specific hyperparameters. Because these metrics are inherently conflicting, the common practice of reporting a single configuration is problematic: it can mislead conclusions about overall performance and prevents unbiased assessments of the state of the art. We address this by evaluating certified training methods via Pareto front comparisons over the natural--certified accuracy trade-off. To enable fair, method-agnostic comparisons, we perform efficient automated multi-objective hyperparameter optimisation to identify a set of Pareto-optimal configurations for each method. This approach often uncovers substantial undertuning in previously reported configurations, yielding superior performance and establishing a new state of the art. Leveraging these fronts, we present the first comprehensive multi-objective comparison of certified training approaches, showing that prior advancements are less pronounced than assumed and revealing previously unreported performance complementarities.
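
The Pareto-front comparison described in the abstract can be illustrated with a minimal sketch. The configuration names and accuracy values below are hypothetical and are not results from the paper; the function only shows how non-dominated (natural accuracy, certified accuracy) pairs would be selected, not how the authors run their hyperparameter search.

```python
from typing import List, Tuple

# (name, natural accuracy, certified accuracy); all values are illustrative.
Config = Tuple[str, float, float]


def pareto_front(configs: List[Config]) -> List[Config]:
    """Return the configurations that no other configuration dominates.

    A configuration is dominated if another one is at least as good on both
    metrics and strictly better on at least one of them.
    """
    front = []
    for name, nat, cert in configs:
        dominated = any(
            (n2 >= nat and c2 >= cert) and (n2 > nat or c2 > cert)
            for _, n2, c2 in configs
        )
        if not dominated:
            front.append((name, nat, cert))
    # Sort by natural accuracy so the trade-off curve reads left to right.
    return sorted(front, key=lambda c: c[1])


# Hypothetical runs, for illustration only.
runs = [
    ("cfg_a", 0.80, 0.50),
    ("cfg_b", 0.75, 0.55),
    ("cfg_c", 0.74, 0.52),  # dominated by cfg_b on both metrics
    ("cfg_d", 0.85, 0.40),
]
front = pareto_front(runs)
```

Comparing two methods then amounts to comparing their fronts rather than one hand-picked point per method, which is the evaluation change the abstract argues for.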

2606.02120 2026-06-02 cs.CV cs.AI cs.LG 版本更新

Understanding-Enhanced Model Collaboration for Long-Tailed Egocentric Mistake Detection

理解增强的模型协作用于长尾自我中心错误检测

Boyu Han, Qianqian Xu, Shilong Bao, Zhiyong Yang, Ruochen Cui, Qingming Huang

Affiliations: State Key Laboratory of AI Safety, Institute of Computing Technology, CAS; School of Computer Science and Tech., University of Chinese Academy of Sciences; Beijing Academy of Artificial Intelligence; Institute of Information Engineering, CAS; School of Cyber Security, University of Chinese Academy of Sciences

AI summary: Proposes an Understanding-Enhanced Model Collaboration Method (UE-MCM) that combines coarse-grained video understanding with fine-grained action reasoning, detects mistakes in egocentric video through a two-branch model and an adaptive fusion gate, and optimises for the long-tailed distribution of mistakes.

AI中文摘要

在本报告中，我们解决了从自我中心视频数据中判断用户是否错误执行动作的问题。为此，我们提出了一种理解增强的模型协作方法（UE-MCM），该方法将高效的粗粒度视频理解与准确的细粒度动作推理相结合。具体来说，UE-MCM包含一个小模型分支和一个大模型分支。大模型分支关注细粒度动作本身是否执行错误，而小模型分支联合输入粗粒度视频和细粒度片段，以识别可能局部正确但与整体工作流不一致的动作。小模型分支基于CLIP4CLIP视频编码器构建，该编码器从通过扩散对比重建增强的CLIP模型初始化，大模型分支使用Qwen3-VL嵌入模型从细粒度动作片段中提取高容量表示。然后，通过轻量级协作门自适应融合小分支预测和大分支预测。为了处理错误实例的长尾分布，我们通过互补目标优化分类器，包括重加权交叉熵、AUC导向学习和标签感知调整。所得系统平衡了速度和准确性，使其能够有效检测自我中心教学视频中的细微、罕见和模糊错误。

英文摘要

In this report, we address the problem of determining whether a user performs an action incorrectly from egocentric video data. To this end, we propose an Understanding-Enhanced Model Collaboration Method (UE-MCM) that combines efficient coarse-grained video understanding with accurate fine-grained action reasoning. Specifically, UE-MCM contains a small model branch and a large model branch. The large model branch focuses on whether the fine-grained action itself is executed incorrectly, while the small model branch jointly takes the coarse-grained video and fine-grained segment as input to identify actions that may be locally correct but inconsistent with the overall workflow. The small model branch is built on a CLIP4CLIP video encoder initialized from a CLIP model enhanced by Diffusion Contrastive Reconstruction, and the large model branch uses the Qwen3-VL Embedding model to extract high-capacity representations from fine-grained action segments. The small-branch prediction and the large-branch prediction are then adaptively fused by a lightweight collaboration gate. To handle the long-tailed distribution of mistake instances, we optimize the classifiers with complementary objectives, including reweighted cross-entropy, AUC-oriented learning, and label-aware adjustment. The resulting system balances speed and accuracy, making it effective for detecting subtle, rare, and ambiguous mistakes in egocentric instructional videos.

2606.02119 2026-06-02 cs.LG cs.AI 版本更新

How Hard Can It Be? Hardness-Aware Multi-Objective Unlearning

到底有多难？难度感知的多目标遗忘学习

Jiangwei Chen, Xinyuan Niu, Rachael Hwee Ling Sim, Zhengyuan Liu, Nancy F. Chen, Bryan Kian Hsiang Low

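Affiliations: National University of Singapore; University of California, Berkeley

AI summary: To address the shortcoming that existing unlearning methods cannot guarantee improved forget quality while preserving retain utility, the paper proposes a hardness-aware multi-objective unlearning algorithm (HAMU) based on constrained optimisation. It quantifies the similarity between forget and retain data to guide model updates, guaranteeing an improvement in forget quality while minimising the loss of retain utility.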

Comments ICML 2026

AI中文摘要

机器遗忘旨在由于隐私、版权或偏见问题，移除特定遗忘训练数据的影响，同时保持模型在剩余保留数据上的性能。现有的遗忘算法，例如优化损失的加权组合，试图实现提高遗忘质量和保持保留效用这些目标。然而，它们无法保证对所有遗忘和保留数据都能将目标改进到指定程度。在这项工作中，我们从约束优化的角度，用一种新颖且理论扎实的方法解决了这一限制。首先，我们确定遗忘数据和保留数据之间的相似度可以量化调和两个目标的难度。接下来，我们推导出一种遗忘算法（HAMU），其总体目标是通过根据我们的难度度量更新模型权重，在保证遗忘质量有指定改进的同时，最小化保留效用成本/下降。我们的难度度量还告知用户何时保留效用下降不可避免，即两个目标无法同时改进，应考虑停止。我们的算法适用于非凸模型，并且易于并行化，使其易于在实际场景中部署。我们通过实验使用大型模型在图像和文本数据集上证明了HAMU相对于基线的优越性能。我们的代码可在 https://github.com/aoi3142/HAMU 获取。

英文摘要

Machine unlearning aims to remove the influence of specific forget training data due to privacy, copyright or bias concerns while maintaining the model performance on the remaining retain data. Existing unlearning algorithms, such as optimizing a weighted combination of losses, have tried to achieve these objectives of improving forget quality and maintaining retain utility. However, they do not guarantee that these objectives can be improved by a specified extent for all forget and retain data. In this work, we address this limitation with a novel and theoretically-grounded approach from a constrained optimization perspective. Firstly, we identify that the hardness of reconciling both objectives can be quantified by the similarity between the forget data and the retain data. Next, we derive an unlearning algorithm (HAMU) with the overall goal of guaranteeing a specified improvement in forget quality while minimizing the retain utility cost/degradation by updating the model weights based on our hardness measure. Our hardness measure also informs users when retain utility degradation is unavoidable, i.e., both objectives cannot be improved simultaneously, and stopping should be considered. Our algorithm is applicable to non-convex models and is easily parallelizable, making it readily deployable in real-world scenarios. We empirically demonstrate HAMU's superior performance over baselines on both image and text datasets using large models. Our code is available at https://github.com/aoi3142/HAMU.

2606.02117 2026-06-02 stat.ML cs.LG stat.ME 版本更新

ProbRes: Volatility Learning for Probabilistic Time-Series Forecasting

ProbRes: 概率时间序列预测的波动率学习

Tingting Wang, Yunyi Zhang, Benyou Wang

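AI summary: Proposes ProbRes, a post-hoc probabilistic calibration method that improves probabilistic forecasts by explicitly learning volatility dynamics, handles heteroskedastic data effectively, and is validated both theoretically and experimentally.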

AI中文摘要

概率时间序列预测由于需要量化未来观测中的风险和不确定性，在金融应用中引起了越来越多的关注。我们提出ProbRes，一种事后概率校准方法，它显式地学习并将波动率动态纳入概率预测中，从而能够有效处理异方差数据。在训练过程中，ProbRes采用两个与架构无关的模块分别对条件均值和条件波动率进行建模。在推理阶段，它通过重采样标准化残差生成预测分布。ProbRes适用于单变量和多变量时间序列，并且在广泛的误差分布下保持稳健，包括具有条件异方差的非高斯创新。理论结果证明了ProbRes的有效性，在合成和真实数据集上的实验表明，ProbRes准确捕捉预测分布并产生校准良好的预测区间。

英文摘要

Probabilistic time series forecasting has attracted increasing attention in financial applications due to the need to quantify risk and uncertainty in future observations. We propose ProbRes, a post-hoc probabilistic calibration method that explicitly learns and incorporates volatility dynamics into probabilistic forecasting, enabling effective handling of heteroskedastic data. During training, ProbRes employs two architecture-agnostic modules to separately model the conditional mean and conditional volatility. At the inference stage, it generates predictive distributions by resampling normalized residuals. ProbRes is applicable to both univariate and multivariate time series and remains robust under a wide range of error distributions, including non-Gaussian innovations with conditional heteroskedasticity. Theoretical results demonstrate ProbRes's validity and experiments on both synthetic and real-world datasets show that ProbRes accurately captures predictive distributions and produces well-calibrated prediction intervals.
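
The inference step in the abstract (resampling normalized residuals around a predicted mean and volatility) can be sketched as follows. This is a minimal illustration under stated assumptions: the mean and volatility values are supplied by hand rather than by the paper's two learned modules, residuals are drawn uniformly with replacement, and all function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def standardized_residuals(
    y: Sequence[float], mean: Sequence[float], vol: Sequence[float]
) -> List[float]:
    """Normalize training residuals by the predicted conditional volatility."""
    return [(yi - mi) / si for yi, mi, si in zip(y, mean, vol)]


def resample_forecast(
    mean_next: float,
    vol_next: float,
    residuals: Sequence[float],
    n_samples: int,
    rng: random.Random,
) -> List[float]:
    """Draw predictive samples as mean + volatility * resampled residual."""
    return [mean_next + vol_next * rng.choice(residuals) for _ in range(n_samples)]


def empirical_interval(samples: Sequence[float], level: float) -> Tuple[float, float]:
    """Read a central prediction interval off the sorted samples."""
    ordered = sorted(samples)
    lo_idx = int((1 - level) / 2 * (len(ordered) - 1))
    hi_idx = int((1 + level) / 2 * (len(ordered) - 1))
    return ordered[lo_idx], ordered[hi_idx]


# Toy values standing in for the outputs of the mean and volatility models.
y_train = [1.0, 2.5, 1.5, 3.0]
mean_train = [1.5, 2.0, 2.0, 2.5]
vol_train = [0.5, 0.5, 1.0, 0.5]

residuals = standardized_residuals(y_train, mean_train, vol_train)
rng = random.Random(0)
samples = resample_forecast(2.0, 2.0, residuals, 1000, rng)
lower, upper = empirical_interval(samples, 0.9)
```

Because the residuals are reused as observed, no Gaussian assumption is imposed on the innovations, which matches the robustness to non-Gaussian errors that the abstract emphasizes.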

2606.02115 2026-06-02 stat.ML cs.LG 版本更新

Error Bounds for a Diffusion Model-Based Drift Estimator

基于扩散模型的漂移估计器的误差界

Ioar Casado-Telletxea, Omar Rivasplata

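Affiliations: Basque Center for Applied Mathematics (BCAM); Centre for AI Fundamentals & Department of Computer Science; University of Manchester, UK

AI summary: For drift estimation in stochastic differential equations with a known diffusion parameter, the paper uses diffusion model theory to derive an explicit risk bound on the time-averaged mean-squared error, decomposing the risk into four terms: discretization, score approximation, noise initialization, and sampling variance.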

Comments Preprint

AI中文摘要

随机微分方程中的参数估计是一个经典的统计问题，在许多科学领域具有重要意义。Tapia Costa等人（2026）的最新工作引入了一种新技术，当扩散参数已知时，利用多条轨迹的离散样本估计漂移。他们的方法将漂移估计视为去噪问题，并利用（条件）得分匹配扩散模型的工具。尽管他们的实验在不同漂移类别中显示出有希望的结果，但其估计器的理论保证问题仍未解决。在本笔记中，我们通过利用扩散模型理论的技术来填补这一空白。更具体地说，我们为该漂移估计器的时间平均均方误差推导了一个显式的风险界。我们的界将风险分解为（i）Euler-Maruyama离散化，（ii）得分/去噪器近似，（iii）噪声初始化，以及（iv）采样方差，揭示了估计器中不同超参数和误差源之间的权衡。

英文摘要

Parameter estimation in stochastic differential equations is a classical statistical problem of much importance in many scientific fields. Recent work of Tapia Costa et al. (2026) introduced a novel technique for estimating the drift when the diffusion parameter is known, using discrete samples from multiple trajectories. Their method treats drift estimation as a denoising problem, and leverages tools from (conditional) score-matching diffusion models. Although their experiments showed promising results across different drift classes, the question of theoretical guarantees for their estimator was left unanswered. In this note, we address this gap by exploiting techniques from diffusion model theory. More concretely, we derive an explicit risk bound for the time-averaged mean-squared error of said drift estimator. Our bound decomposes the risk into the (i) Euler-Maruyama discretization, (ii) score/denoiser approximation, (iii) noise initialization, and (iv) sampling variance, revealing the trade-offs between the different hyperparameters and sources of error in the estimator.
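
The first term in the risk decomposition comes from the Euler-Maruyama discretization. As a reminder of what that scheme does, here is a minimal sketch for a scalar SDE dX = b(X) dt + sigma dW. The Ornstein-Uhlenbeck drift b(x) = -x and all parameter values are assumptions chosen for illustration; this is not the drift estimator studied in the paper.

```python
import math
import random
from typing import Callable, List


def euler_maruyama(
    drift: Callable[[float], float],
    sigma: float,
    x0: float,
    dt: float,
    n_steps: int,
    rng: random.Random,
) -> List[float]:
    """Simulate one path of dX = drift(X) dt + sigma dW on a uniform grid."""
    path = [x0]
    x = x0
    for _ in range(n_steps):
        # Brownian increment over a step of length dt has variance dt.
        dw = rng.gauss(0.0, math.sqrt(dt))
        x = x + drift(x) * dt + sigma * dw
        path.append(x)
    return path


rng = random.Random(0)
# Illustrative Ornstein-Uhlenbeck drift with a known diffusion parameter.
noisy_path = euler_maruyama(lambda x: -x, 0.3, 1.0, 0.01, 100, rng)
# With sigma = 0 the scheme reduces to explicit Euler for dx/dt = -x.
noiseless_path = euler_maruyama(lambda x: -x, 0.0, 1.0, 0.01, 100, rng)
```

In the noiseless case the final value is (1 - dt)^100 rather than exp(-1), a small gap that shows the kind of step-size error a discretization term has to account for.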

2606.02107 2026-06-02 cs.RO cs.AI cs.LG 版本更新

Network Distributed Multi-Agent Reinforcement Learning for Consensus Control of Quadcopters

网络分布式多智能体强化学习用于四旋翼无人机一致性控制

Youssef Mahran, Zeyad Gamal, Aamir Ahmad, Ayman El-Badawy

Affiliations: Mechatronics Engineering Department, German University in Cairo (GUC), Egypt; Institute of Flight Mechanics and Control (IFR), Head of Flight Robotics, University of Stuttgart, Germany; Faculty of EMS, Head of Mechatronics Engineering Department, German University in Cairo (GUC), Egypt

AI summary: Proposes a network-distributed multi-agent reinforcement learning framework that uses the communication graph to realise distributed policies, trains a high-level planner with MASAC, and scales zero-shot to 250 agents.

Comments This is the Author Accepted Manuscript version of a paper accepted for publication. The final published version is available via IEEE Xplore

DOI: 10.1109/MELECON64486.2026.11418865
Journal ref: 2026 IEEE 23rd Mediterranean Electrotechnical Conference (MELECON)

AI中文摘要

本文提出了一种用于四旋翼无人机一致性控制的网络分布式多智能体强化学习（ND-MARL）框架。与依赖集中式规划或完全分散式执行的传统多智能体MARL公式相比，ND-MARL将群体通信图纳入决策过程。在2-邻居通信拓扑下，每个智能体仅观察两个邻居的信息，并通过分布式策略输出动作。使用多智能体软演员-评论家（MASAC）训练高层分布式一致性规划器，并将其嵌入层次化堆栈中，以生成由低层四旋翼控制器跟踪的参考目标位置。结果表明，与集中式MARL控制器相比，实现了平滑的一致性轨迹和规划器-跟踪器集成。最值得注意的是，学习到的控制器表现出零样本可扩展性，即在三智能体系统上训练的策略，在相同的2-邻居通信拓扑下，无需重新训练或微调即可部署到多达250个智能体的群体中，实现了随着团队规模增大而稳态散布增加的一致收敛，这是由于稀疏信息传播所致。这些发现突显了ND-MARL作为分布式、通信感知的四旋翼一致性控制的稳定框架。

英文摘要

This paper proposes a Network Distributed Multi-Agent Reinforcement Learning (ND-MARL) framework for quadcopter consensus control. Compared to conventional MARL formulations that rely on centralized planning or fully decentralized execution, ND-MARL incorporates the swarm communication graph into the decision process. Under a 2-Neighbor communication topology, each agent observes information from only two neighbors and outputs an action through a distributed policy. A high-level distributed consensus planner is trained using Multi-Agent Soft Actor-Critic (MASAC) and embedded in a hierarchical stack to generate reference target positions tracked by a low-level quadcopter controller. Results demonstrate smooth consensus trajectories and planner-tracker integration when compared to a centralized MARL controller. Most notably, the learned controller exhibits zero-shot scalability, as policies trained on a three-agent system are deployed to swarms of up to 250 agents under the same 2-Neighbor communication topology without retraining or fine-tuning, achieving consistent convergence with increasing steady-state spread at large team sizes due to sparse information propagation. These findings highlight ND-MARL as a stable framework for distributed, communication-aware quadcopter consensus control.

2606.02106 2026-06-02 cs.LG stat.ML 版本更新

When Tabular Foundation Models Transfer Across Modalities: A Systematic Evaluation Across 95 Datasets, 7 Modalities, and Two Regimes

当表格基础模型跨模态迁移：对95个数据集、7种模态和两种范式的系统评估

Julien Lafrance

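Affiliations: Telecom Paris, Institut Polytechnique de Paris

AI summary: Proposes a classification pipeline that combines Equiangular Tight Frame preprocessing with a tabular foundation model, evaluates it on data from multiple modalities, and shows that it strikes a good balance between speed and quality.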

Comments 24 pages, 5 figures. Code and data available at https://doi.org/10.5281/zenodo.19982636

AI中文摘要

我们提出一个单一的分类流水线，该流水线结合了等角紧框架（ETF）预处理阶段和用于上下文推理的表格基础模型，一旦数据被映射到固定向量表示，该流水线在所有模态上应用相同。我们在涵盖七种信号模态——视觉、音频、语音、文本、分子、时间序列和表格——的95个数据集上对其进行评估。主要的方法论贡献是固定比较对象：在整个论文中，性能与相同冻结特征上最强的轻量级调优基线进行比较，而oracle选择、部署选择和专门微调则分别报告。该流水线在相同冻结特征上与强大的轻量级调优基线广泛竞争。它并不在每个任务上都匹配最好的专门模型或高度调优的流水线，但差距很小，且运行速度更快——通常比完整骨干微调快4到200倍，而质量往往相当。我们描述了如何在实际中部署该流水线：何时应用ETF预处理，如何在无验证集的情况下停止其训练，如何设置上下文分类器，以及如何校准所得概率。校准步骤并非装饰性的：TabICL通过构造产生良好校准的概率，ETF预处理最初会破坏该校准，而后处理重新缩放则恢复它——从而产生每个预测的置信度信号，从业者可以将其用作置信度门控部署的信任阈值。我们还报告了该流水线在哪些情况下不应期望有帮助，以及如何提前识别这些情况。

英文摘要

We present a single classification pipeline that combines an Equiangular Tight Frame (ETF) preprocessing stage with a tabular foundation model for in-context inference, applied identically across modalities once data is mapped to fixed vector representations. We evaluate it on 95 datasets spanning seven signal modalities -- vision, audio, speech, text, molecular, time-series, and tabular. The main methodological contribution is to fix the comparison object: throughout the paper, performance is judged against the strongest lightweight tuned baseline on the same frozen features, while oracle selection, deployed selection, and specialized fine-tuning are reported separately. The pipeline is broadly competitive with strong lightweight tuned baselines on the same frozen features. It does not match the very best specialized models or heavily tuned pipelines on every task, but it stays close, and it runs much faster -- typically 4 to 200 times faster than full backbone fine-tuning, often at comparable quality. We describe how to deploy the pipeline in practice: when to apply ETF preprocessing, how to stop its training without a validation split, how to set up the in-context classifier, and how to calibrate the resulting probabilities. The calibration step is non-cosmetic: TabICL produces well-calibrated probabilities by construction, ETF preprocessing initially disrupts that calibration, and the post-hoc rescaling restores it -- yielding a per-prediction confidence signal that practitioners can use as a trust threshold for confidence-gated deployment. We also report where the pipeline should not be expected to help, and how to identify those cases in advance.

2606.02101 2026-06-02 stat.ML cs.LG stat.AP 版本更新

It does what it says on the tin: safe synthetic data from coarsened margins

名副其实：来自粗化边际的安全合成数据

Gillian M Raab

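Affiliations: University of Edinburgh; Scottish Centre for Administrative Data Research

AI summary: Proposes a method that generates synthetic data by coarsening margins and applying the Iterative Proportional Fitting algorithm, ensuring transparency and freedom from disclosure risk.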

AI中文摘要

本文提出了一种创建合成数据的方法，与当前可用的其他方法相比，该方法对用户有两个重要优势。首先是透明性；与其他方法不同，接收合成数据的人将知道原始数据中哪些变量之间的关系将在合成数据中大致保持。其次是保证合成数据来源于已被判定无披露风险的信息。这是通过首先定义和计算将在合成数据中保持变量关系的边际来实现的。然后，每个边际将根据数据保管者定义的标准进行统计披露控制，例如顶部编码和底部编码、小类别的组合和/或修改小计数。建议通过将表格中的所有计数粗化为披露限制的倍数来进一步调整策展边际。这些调整后的边际用于通过迭代比例拟合算法生成合成数据。使用1901年苏格兰人口普查的数据说明了创建此类合成数据的实际步骤。

英文摘要

This paper proposes a method of creating synthetic data (SD) that will have two important advantages for the user compared to other methods currently available. The first is transparency; unlike other methods, the person in receipt of the SD will know which of the relationships between variables in the original data will be approximately maintained in the SD. The second is a guarantee that the SD is derived from information that has already been judged to be free of disclosure risk. This is achieved by first defining and calculating the margins where relationships between variables will be maintained in the SD. Each margin will then be subject to statistical disclosure control (SDC) to the standards defined by the data custodian, e.g. top-coding and bottom-coding, combination of small categories and/or modifying small counts. Further adjustment of the curated margins is advised by coarsening all counts in the table to multiples of the disclosure limit. These adjusted margins are used to create SD by the Iterative Proportional Fitting (IPF) algorithm. The practical steps involved in creating such SD are illustrated using data from the 1901 Census of Scotland.
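
The two mechanical steps named in the abstract, coarsening counts to multiples of the disclosure limit and fitting with IPF, can be sketched in a few lines. This is a simplified illustration under stated assumptions: it uses two one-way margins of a 2x2 table instead of the multi-way margins the paper describes, it rounds to the nearest multiple (the abstract does not say which rounding rule is used), and the counts are invented.

```python
from typing import List


def coarsen(count: float, limit: int) -> int:
    """Round a non-negative count to the nearest multiple of the limit.

    Nearest-multiple rounding (ties round up) is an assumption of this sketch.
    """
    return limit * int(count / limit + 0.5)


def ipf(
    seed: List[List[float]],
    row_targets: List[float],
    col_targets: List[float],
    n_iter: int = 100,
) -> List[List[float]]:
    """Iterative Proportional Fitting of a 2-D table to row and column margins.

    The two sets of targets must share the same total for IPF to converge.
    """
    table = [row[:] for row in seed]
    for _ in range(n_iter):
        # Scale each row to match its target margin.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [v * target / total for v in table[i]]
        # Scale each column to match its target margin.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
    return table


# Invented margin counts, coarsened with a disclosure limit of 10.
limit = 10
row_margin = [coarsen(c, limit) for c in [27, 72]]
col_margin = [coarsen(c, limit) for c in [41, 58]]

# A uniform seed carries no information beyond the released margins.
fitted = ipf([[1.0, 1.0], [1.0, 1.0]], row_margin, col_margin)
```

Synthetic records would then be drawn from the fitted table. The transparency property in the abstract follows from the construction: only relationships present in the released margins can appear in the output.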

2606.02093 2026-06-02 cs.CL cs.AI cs.LG 版本更新

The Role of Ambiguity in Error Prediction via Uncertainty Quantification

不确定性量化中模糊性在错误预测中的作用

Ieva Raminta Staliūnaitė, James Bishop, Andreas Vlachos

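Affiliations: University of Cambridge; The Alan Turing Institute

AI summary: Improves error prediction for large language models on question answering by disentangling input ambiguity from the uncertainty signal, using gated experts and selective prediction.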

Comments 8 pages not including references and appendices, 3 figures

AI中文摘要

错误预测任务，即预测模型输出是否正确，通常通过不确定性量化（UQ）来解决。然而，虽然不确定性指标捕捉了模型缺乏知识或能力进行预测的情况，但它们也反映了模型输入和上下文中固有的偶然不确定性。本文提出了一种通过将输入模糊性与UQ信号解耦来改进大语言模型（LLM）错误预测的方法。我们在问答（QA）任务上使用六种UQ指标进行实验，结果表明，UQ指标在无歧义实例上的错误预测能力优于具有多个合理答案的问题。我们使用门控专家和选择性预测将真实和预测的模糊性标签纳入错误预测流程。我们发现，模糊性信息提高了跨模型家族、训练和评估范式、数据集（包括据称无歧义的数据集）以及偶然不确定性来源的错误预测分数，在标准数据集上对单个UQ指标的PRR提升超过10个百分点。

英文摘要

The task of Error Prediction, namely predicting whether a model output is correct, is commonly tackled with Uncertainty Quantification (UQ). However, while uncertainty metrics capture when models lack knowledge or capacity to make a prediction, they also reflect aleatoric uncertainty, which is inherent in the model input and context. This paper presents a method for improving error prediction for Large Language Models (LLMs), by disentangling input ambiguity from UQ signal. We conduct experiments on the task of Question Answering (QA) with six UQ metrics and show that UQ metrics are more predictive of errors on unambiguous instances than on questions with multiple plausible answers. We use Gated Experts and Selective Prediction to incorporate gold and predicted ambiguity labels into the error prediction pipeline. We find that ambiguity information improves error prediction scores across model families, training and evaluation paradigms, datasets (including allegedly unambiguous ones), and sources of aleatoric uncertainty, yielding improvements of over 10 points of PRR for individual UQ metrics on standard datasets.

2606.02078 2026-06-02 cs.LG 版本更新

Beyond $\ell_2$-norm and $\ell_\infty$-norm: A Curvature-Inspired $\ell_p$-Norm Scheme for Deep Neural Networks

超越ℓ2范数和ℓ∞范数：一种受曲率启发的深度神经网络ℓp范数方案

Jianhao Xu, Zhuang Yang

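Affiliations: School of Computer Science and Technology, Soochow University

AI summary: To address existing optimizers' poor adaptation to large curvature variation across parameter dimensions, the paper proposes an $\ell_p$-norm scheme with a dynamic $p$ and incorporates it into SGD and SGDM, yielding the LPSGD and LPSGDM optimizers. A large $p$ early in training suppresses high-curvature directions, and $p$ is later reduced via cosine annealing for stable updates. The paper proves an $O(T^{-1/2})$ convergence rate in the nonconvex setting and validates improved generalization on the CIFAR and ImageNet datasets.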

AI中文摘要

现有的深度神经网络（DNN）优化器通常依赖于ℓ2范数或ℓ∞范数，导致优化器不能很好地适应参数维度上曲率的显著变化。通常，DNN的训练过程在早期表现出强烈的曲率各向异性，而在后期，DNN的训练过程趋向于向各向异性较弱的平坦区域移动。特别地，基于ℓ2范数的优化器通常由高曲率方向主导，限制了优化器沿较低曲率方向的更新，从而导致收敛速度较慢。而基于ℓ∞范数的优化器由于坐标方向更新幅度相同，在平坦区域容易产生振荡。为了解决ℓ2和ℓ∞范数产生的这两种极端情况，我们提出了一种具有动态p值的新型ℓp范数方案，并将其融入随机梯度下降（SGD）和带动量的SGD（SGDM）中，从而得到两种具有更好泛化性能的新型优化器：ℓp-SGD（LPSGD）和ℓp-SGDM（LPSGDM）。特别地，所得到的优化器通过使用较大的p（p>2）来抑制早期高曲率方向的支配地位，随后将p逐渐减小至2以实现更稳定和精细的更新，其中后一过程受余弦退火策略启发。我们建立了所得到算法的理论保证，并分析了LPSGD和LPSGDM在非凸情形下均达到O(T^{-1/2})的收敛率。在基准数据集（包括CIFAR-10、CIFAR-100和ImageNet-1K）上，使用多种DNN（如VGG-11、ResNet-18和ResNet-50）进行了大量实验。

英文摘要

Existing optimizers for deep neural networks (DNNs) typically rely on either the $\ell_2$ norm or the $\ell_\infty$ norm, and therefore do not adapt well to substantial changes in curvature across parameter dimensions. DNN training often exhibits strong curvature anisotropy in the early period, whereas in the later period it tends to move toward flatter regions with weaker anisotropy. In particular, optimizers based on the $\ell_2$ norm are usually dominated by high-curvature directions, which restricts updates along lower-curvature directions and thus slows convergence. Optimizers based on the $\ell_\infty$ norm, by contrast, are prone to oscillations in flatter regions because their coordinate-wise updates all have the same magnitude. To address these two extreme cases, we propose a novel $\ell_p$-norm scheme with a dynamic value of $p$ and incorporate it into stochastic gradient descent (SGD) and SGD with momentum (SGDM), leading to two novel optimizers with better generalization performance: $\ell_p$-SGD (LPSGD) and $\ell_p$-SGDM (LPSGDM). The resulting optimizers suppress the dominance of high-curvature directions in the early period by using a large $p$ ($p>2$), then gradually decrease $p$ toward 2 to enable more stable and refined updates; this decay is motivated by the cosine annealing strategy. We establish theoretical guarantees for the resulting algorithms and show that both LPSGD and LPSGDM achieve an $O(T^{-1/2})$ convergence rate in the nonconvex setting. Extensive experiments are conducted on benchmark datasets, including CIFAR-10, CIFAR-100, and ImageNet-1K, with multiple DNNs such as VGG-11, ResNet-18, and ResNet-50.
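
The abstract says that $p$ starts large and decays toward 2 following a cosine-annealing-style schedule, but it does not give the exact formula or the update rule. The sketch below therefore assumes the standard cosine annealing form, with a hypothetical starting value of 4. It illustrates only the schedule and the $\ell_p$ norm itself, not the LPSGD or LPSGDM update.

```python
import math
from typing import Sequence


def p_schedule(step: int, total_steps: int, p_max: float = 4.0, p_min: float = 2.0) -> float:
    """Cosine-annealed p: equals p_max at step 0 and p_min at the final step.

    The functional form and the default p_max are assumptions of this sketch.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return p_min + 0.5 * (p_max - p_min) * (1.0 + math.cos(math.pi * progress))


def lp_norm(vec: Sequence[float], p: float) -> float:
    """Plain l_p norm; approaches the largest |v_i| as p grows."""
    return sum(abs(v) ** p for v in vec) ** (1.0 / p)


# p decays smoothly from 4 to 2 over a hypothetical 100-step run.
schedule = [p_schedule(t, 100) for t in range(101)]
grad = [3.0, 4.0]
early_norm = lp_norm(grad, schedule[0])    # closer to max(|g_i|) = 4
late_norm = lp_norm(grad, schedule[-1])    # the Euclidean norm, 5
```

A larger $p$ moves the norm toward the $\ell_\infty$ end of the range and $p = 2$ recovers the Euclidean norm, so a schedule of this shape interpolates between the two regimes the abstract contrasts.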

2606.02073 2026-06-02 cs.LG 版本更新

Planar Symmetric Pattern Generation

平面对称图案生成

Ning Lin, Luxi Chen, Huaguan Chen, Jiacheng Cen, Chongxuan Li, Wenbing Huang, Hao Sun

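Affiliations: Gaoling School of Artificial Intelligence, Renmin University of China

AI summary: Proposes a symmetrization framework applicable to any planar group that converts an arbitrary 2D continuous representation into a symmetric one while preserving continuity, enabling symmetry control. Its effectiveness is validated on pattern design, paper-cut design, stylized topology design, and material design tasks.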

2606.02061 2026-06-02 cs.LG 版本更新

Ablating Archetypes: The Stability of Archetypal SAEs is an Artifact of Initialization and Metric Design

消除原型：原型SAE的稳定性是初始化和度量设计的人为产物

Michał Brzozowski, Neo Christopher Chung

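Affiliations: Samsung AI Center; University of Warsaw

AI summary: Shows experimentally that the stability claimed for archetypal sparse autoencoders stems from using identical initialization across training runs rather than from the archetypal constraint itself, and stresses that the distinction between stability and stabilization is crucial for interpretability research.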

AI中文摘要

使用稀疏自编码器（SAE）的字典学习从神经网络激活中产生过完备基，这些基通常是可解释的，并减少了多义性。然而，不同随机种子的SAE特征差异很大——这个问题被称为不稳定性。原型SAE（Fel等人，2025）被提出作为一种通用的字典学习干预，用于更可靠的概念提取，并报告在训练结束时字典更稳定。我们证明原型SAE声称的稳定性是在多次运行中设置相同初始化的结果。通过我们的分析，我们试图澄清机械可解释性中可能模糊使用的两个不同概念：稳定性是两个独立训练模型之间的一致性，而稳定化是独立初始化的运行向共同解收敛。这种区分对于自然语言处理（NLP）的机械可解释性至关重要，其中特征稳定性越来越多地被用作SAE特征是可重用分析单元的证据。原型SAE的实验共享一个确定性的k-means解码器初始化，在训练开始前将运行间字典距离设为零。当移除这种初始化时，原型约束在我们的设置中没有提供稳定化优势。我们进一步发现了一个依赖于预处理的余弦几何问题，使端点稳定性指标的解释复杂化。总的来说，我们的研究支持在更大的字典学习传统中研究SAE的价值，同时表明稳定性声明需要轨迹诊断和初始化消融。

英文摘要

Dictionary learning with sparse autoencoders (SAEs) produces overcomplete bases from neural network activations that are often interpretable and reduces polysemanticity. However, features from SAEs vary substantially across random seeds -- a problem known as instability. Archetypal SAEs (Fel et al., 2025) were proposed as a general dictionary-learning intervention for more reliable concept extraction, and report more stable dictionaries at the end of training. We demonstrate that the stability claimed by archetypal SAEs is a result of setting identical initialization across multiple runs. Through our analyses, we attempt to clarify two distinct notions in mechanistic interpretability that may be ambiguously used: stability is agreement between two independently trained models, whereas stabilization is the convergence of independently initialized runs toward a common solution. This distinction is critical for mechanistic interpretability of natural language processing (NLP), where feature stability is increasingly used as evidence that SAE features are reusable units of analysis. Experiments from archetypal SAEs share a deterministic k-means decoder initialization, setting inter-run dictionary distance to zero before training begins. When this initialization is removed, the archetypal constraint provides no stabilization advantage in our setting. We further identify a preprocessing-dependent cosine geometry issue that complicates interpretation of endpoint stability metrics. Overall, our study supports the value of studying SAEs within the larger dictionary-learning tradition while showing that stability claims require trajectory diagnostics and initialization ablations.

2606.02055 2026-06-02 cs.IT cs.LG cs.SI math.IT stat.ML 版本更新

Query-Limited Community Recovery in Stochastic Block Models

随机块模型中的有限查询社区恢复

Sabyasachi Basu, Manuj Mukherjee, Lutz Oettershagen, Suhas Thejaswi

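AI summary: Studies exact community recovery in the two-community stochastic block model under limited and noisy access to network data via adaptive query strategies, and proves that adaptive querying can surpass the information-theoretic limit of the non-adaptive benchmark.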

AI中文摘要

我们研究在 $n$ 个顶点上的两社区随机块模型中，对网络数据的有限且带噪访问下的精确社区恢复。学习器可以查询一个带噪的邻域预言机，该预言机独立地以固定概率揭示被查询顶点的每个真实邻居，且从不返回非邻居，受限于有限的查询预算。我们考虑仅预言机访问以及一个组合模型，其中学习器还观察底层图的单个子采样副本。对于仅预言机访问，平衡均匀查询给出了一个尖锐的非自适应基准：当每个顶点被查询相同整数次数时，观测结果简化为具有衰减边概率的 SBM，并且 Abbe-Bandeira-Hall 精确恢复阈值适用。我们证明该基准并非自适应最优：在平衡均匀查询需要 $m n$ 次查询（对于某个 $m>1$）的机制下，两阶段自适应策略以 $n+o(n)$ 次查询成功。对于额外的子采样图，我们证明了一个亚线性查询的自适应差距：预算为亚线性的平衡数据无关均匀查询不会比单独的子采样图有所改进，而自适应查询可以针对少量不确定顶点并实现精确恢复。因此，自适应数据采集可以严格改善精确恢复的信息论极限。

英文摘要

We study exact community recovery in the two-community stochastic block model on $n$ vertices under limited and noisy access to network data. The learner may query a noisy neighborhood oracle that reveals each true neighbor of a queried vertex independently with fixed probability and never returns non-neighbors, subject to a finite query budget. We consider both oracle-only access and a combined model where the learner also observes a single subsampled copy of the underlying graph. For oracle-only access, balanced uniform querying gives a sharp non-adaptive benchmark: when each vertex is queried the same integer number of times, the observations reduce to an SBM with attenuated edge probabilities and the Abbe-Bandeira-Hall exact-recovery threshold applies. We show that this benchmark is not adaptively optimal: a two-stage adaptive strategy succeeds with $n+o(n)$ queries in a regime where balanced uniform querying requires $m n$ queries for some $m>1$. With an additional subsampled graph, we prove a sublinear-query adaptivity gap: balanced data-independent uniform querying with a sublinear budget does not improve over the subsampled graph alone, whereas adaptive querying can target a small set of uncertain vertices and achieve exact recovery. Thus adaptive data acquisition can strictly improve the information-theoretic limits of exact recovery.
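
The access model in the abstract can be made concrete with a small simulation: a two-community SBM plus an oracle that reveals each true neighbor independently with a fixed probability and never returns a non-neighbor. The graph size, edge probabilities, and function names below are illustrative assumptions, and no recovery algorithm from the paper is implemented.

```python
import random
from typing import Dict, List, Set, Tuple


def sample_sbm(
    n: int, p_in: float, p_out: float, rng: random.Random
) -> Tuple[List[int], Dict[int, Set[int]]]:
    """Sample a two-community SBM with balanced labels (an assumption here)."""
    labels = [i % 2 for i in range(n)]
    adj: Dict[int, Set[int]] = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            prob = p_in if labels[u] == labels[v] else p_out
            if rng.random() < prob:
                adj[u].add(v)
                adj[v].add(u)
    return labels, adj


def noisy_neighborhood_oracle(
    adj: Dict[int, Set[int]], v: int, reveal_prob: float, rng: random.Random
) -> Set[int]:
    """Reveal each true neighbor of v independently with reveal_prob.

    Non-neighbors are never returned, matching the one-sided noise model.
    """
    return {u for u in adj[v] if rng.random() < reveal_prob}


rng = random.Random(0)
labels, adj = sample_sbm(20, 0.6, 0.1, rng)
revealed = noisy_neighborhood_oracle(adj, 0, 0.5, rng)
```

Querying every vertex the same number of times corresponds to the balanced uniform benchmark in the abstract, whereas an adaptive strategy would spend additional queries only on vertices whose community remains uncertain.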

2606.02047 2026-06-02 stat.ML cs.LG math.ST stat.ME stat.TH 版本更新

Convex Distance Operator Transport: A Convex and Geometry-Preserving Formulation

凸距离算子传输：一种凸且保持几何的公式

Junhyoung Chung, Euijong Song, Won Hwa Kim, Gunwoong Park

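Affiliations: KAIST

AI summary: Proposes Convex Distance Operator Transport (CDOT), which aligns heterogeneous distributions by jointly preserving feature correspondence and intrinsic geometric structure through operator regularization, and proves its pseudometric property and its relationship to Gromov-Wasserstein.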

Comments This paper is 41 pages long, contains 6 figures, and has been accepted to ICML 2026

AI中文摘要

我们引入了凸距离算子传输（CDOT），这是第一个凸最优传输框架，通过联合保持特征对应和内在几何结构来对齐异质域中的分布。具体来说，CDOT采用基于算子的正则化，通过引入距离算子和条件期望算子来对齐聚合的距离结构。因此，所提出的正则化提高了对局部几何变化的鲁棒性。我们进一步证明了得到的CDOT差异是带属性的紧度量测度空间上的有效伪度量。此外，我们通过一个新的色散间隙概念刻画了CDOT与Gromov-Wasserstein（GW）之间的关系，正式阐明了GW非凸性相对于CDOT凸性的几何来源。在有限样本情况下，我们推导了一个非渐近风险界，分解为优化误差和统计误差，并在全局收敛的Frank-Wolfe算法下建立了风险一致性。在合成点云、脑连接组和图分类基准上的实验表明，该方法优于现有方法，在实践中表现稳定可靠。

英文摘要

We introduce Convex Distance Operator Transport (CDOT), the first convex optimal transport framework that aligns distributions across heterogeneous domains by jointly preserving feature correspondence and intrinsic geometric structure. Specifically, CDOT employs an operator-based regularization that aligns aggregated distance structures by introducing distance and conditional expectation operators. Consequently, the proposed regularization improves the robustness to local geometric variations. We further prove that the resulting CDOT discrepancy is a valid pseudometric on the space of attributed compact metric-measure spaces. In addition, we characterize the relationship between CDOT and Gromov--Wasserstein (GW) through a new notion of dispersion gap, formally elucidating the geometric source of non-convexity in GW compared to the convexity of CDOT. In the finite-sample regime, we derive a non-asymptotic risk bound decomposed into optimization and statistical errors, establishing risk consistency under a globally convergent Frank--Wolfe algorithm. Experiments on synthetic point clouds, brain connectomes, and graph classification benchmarks demonstrate better performance over existing methods, with stable and reliable behavior in practice.

2606.02038 2026-06-02 physics.app-ph cs.LG 版本更新

Uncertainty-Aware Graph Neural Reconstruction of Urban Temperature Fields from Sparse Sensors under Deployment Constraints

部署约束下基于不确定性感知图神经网络的稀疏传感器城市温度场重建

Reda Snaiki, Abdelatif Merabtine

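AI summary: Proposes an uncertainty-aware graph neural network framework that reconstructs daily maximum temperature fields from sparse sensors, supports distance-constrained sensor placement and probabilistic exceedance mapping, and outperforms traditional methods in a Montreal-area evaluation.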

AI中文摘要

从稀疏观测重建空间连续的每日温度场对于城市气候监测和热风险分析至关重要，但实际部署受限于传感器预算和间距约束。本研究提出一种不确定性感知图神经网络（GNN）框架，用于从稀疏传感器重建每日最高温度场，同时支持距离约束的传感器放置和概率超标映射。该模型使用基于图注意力的均值残差架构，通过高斯负对数似然训练，预测温度场和空间变化的预测不确定性场。传感器放置采用基于QR分解的本征正交分解（POD-QR）策略，并施加4公里最小传感器间距约束，与随机可行放置和最远点采样进行比较。该框架在蒙特利尔区域多边形上使用Daymet v4.1每日温度数据（1公里分辨率）进行评估，采用严格的时间留出协议（训练：2020-2023；测试：2024）。在传感器预算（10-40个传感器）下，所提出的GNN在未观测节点上的RMSE和MAE始终优于反距离加权和普通克里金法。传感器放置效应在低预算时最显著，在高预算时减弱，在施加间距约束下，约30个传感器时出现实际饱和状态。概率评估进一步显示，随着传感器密度增加，不确定性校准得到改善，并且比克里金法具有更好的锐度-校准权衡。这些结果支持所提出的框架作为不确定性感知温度场重建和面向决策的热风险映射的有效工具。

英文摘要

Reconstructing spatially continuous daily temperature fields from sparse observations is important for urban climate monitoring and heat-risk analysis, but practical deployments are limited by sensor budgets and spacing constraints. This study proposes an uncertainty-aware graph neural network (GNN) framework for reconstructing daily maximum temperature fields from sparse sensors while supporting distance-constrained sensor placement and probabilistic exceedance mapping. The model predicts both the temperature field and a spatially varying predictive uncertainty field using a graph-attention-based mean-residual architecture trained with a Gaussian negative log-likelihood. Sensor placement is addressed using a Proper Orthogonal Decomposition with QR factorization (POD-QR) strategy with a 4 km minimum inter-sensor distance constraint and is compared with random feasible placement and farthest-point sampling. The framework is evaluated over a Montreal-area polygon using Daymet v4.1 daily temperature data (1 km resolution) under a strict temporal hold-out protocol (training: 2020-2023; testing: 2024). Across sensor budgets (10-40 sensors), the proposed GNN consistently outperforms inverse distance weighting and ordinary kriging in RMSE and MAE on unobserved nodes. Sensor-placement effects are most pronounced at low budgets and diminish at higher budgets, with a practical saturation regime emerging around 30 sensors under the imposed spacing constraint. Probabilistic evaluation further shows improved uncertainty calibration with increasing sensor density and a better sharpness-calibration trade-off than kriging. These results support the proposed framework as an effective tool for uncertainty-aware temperature field reconstruction and decision-oriented heat-risk mapping.
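
Two quantities in the abstract have simple closed forms once a node's prediction is treated as Gaussian: the negative log-likelihood used for training and the probability of exceeding a temperature threshold. The sketch below evaluates both for a single node. The numeric values and function names are illustrative assumptions, and the graph-attention model that would produce the mean and standard deviation is not shown.

```python
import math


def gaussian_nll(y: float, mu: float, sigma: float) -> float:
    """Negative log-likelihood of y under a Normal(mu, sigma^2) prediction."""
    return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2.0 * sigma ** 2)


def exceedance_probability(mu: float, sigma: float, threshold: float) -> float:
    """P(T > threshold) for T ~ Normal(mu, sigma^2), via the complementary erf."""
    z = (threshold - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Illustrative prediction at one unobserved node (degrees Celsius).
mu_node, sigma_node = 30.0, 2.0
loss = gaussian_nll(31.0, mu_node, sigma_node)
p_hot = exceedance_probability(mu_node, sigma_node, 34.0)
```

Evaluating the second function at every node for a fixed threshold gives a probabilistic exceedance map, and its reliability depends on how well calibrated the predicted standard deviations are.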

2606.02035 2026-06-02 cs.AI cs.LG 版本更新

RL-ACRGNet: Reinforcement Learning-Based Chest Radiology Report Generation Network

RL-ACRGNet：基于强化学习的胸部放射学报告生成网络

Yogesh Kumar Meena, Saurabh Agarwal, K. V. Arya

发表机构 * Human-AI Interaction (HAIx) Lab, Indian Institute of Technology Gandhinagar（印度理工学院甘地讷格尔分校人机交互（HAIx）实验室）； Department of Computer Science and Engineering, Madhav Institute of Technology and Science Deemed University (MITS-DU)（马达夫科学技术学院（认定大学，MITS-DU）计算机科学与工程系）； Multimedia and Information Security Research Group, Department of Computer Science and Engineering, ABV-Indian Institute of Information Technology and Management（ABV-印度信息技术与管理学院计算机科学与工程系多媒体与信息安全研究组）

AI总结提出RL-ACRGNet，一种结合预训练DenseNet编码器与多级LSTM解码器的离策略强化学习框架，通过度量奖励机制优化视觉语义嵌入，在IU-Xray和MIMIC-CXR数据集上超越基线，生成高质量临床报告。

Comments This work has been submitted to the IEEE for possible publication

详情

AI中文摘要

医学影像解读是现代临床诊断的基石，然而手动生成放射学报告既耗时又容易出现解读不一致。在医学AI领域，通过深度学习自动化这些描述有望简化临床工作流程并标准化诊断输出。然而，由于在捕获细粒度视觉特征和确保临床连贯性方面的局限性，准确的疾病检测和精确的报告生成仍然是重大挑战。为了解决这些问题，我们提出了RL-ACRGNet，一种改进的编码器-解码器模型，它将预训练的DenseNet编码器与多级LSTM解码器集成在离策略强化学习框架中。通过使用双网络方法，基于度量奖励机制细化视觉语义嵌入，我们证明RL-ACRGNet在IU-Xray数据集上持续优于最先进的基线，在BLEU-4（0.47%）、METEOR（0.17%）和ROUGE-L（0.518）上取得了定量改进。此外，在大规模MIMIC-CXR数据集上的综合评估证实了该模型的稳健泛化能力及其生成高质量、临床相关报告的能力。

英文摘要

Medical imaging interpretation is a foundational pillar of modern clinical diagnostics, yet the manual generation of radiology reports remains a time-consuming process prone to interpretation inconsistencies. Within the field of medical AI, automating these descriptions through deep learning promises to streamline clinical workflows and standardise diagnostic output. However, accurate disease detection and precise report generation remain significant challenges due to limitations in capturing fine-grained visual features and ensuring clinical coherence. To address these issues, we propose RL-ACRGNet, an improved encoder-decoder model that integrates a pre-trained DenseNet encoder with a multilevel LSTM decoder within an off-policy reinforcement learning framework. Using a dual-network approach to refine visual-semantic embeddings through a metric-based reward mechanism, we demonstrate that RL-ACRGNet consistently outperforms state-of-the-art baselines on the IU-Xray dataset, achieving quantitative improvements in BLEU-4 (0.47%), METEOR (0.17%) and ROUGE-L (0.518). Furthermore, comprehensive evaluations on the large-scale MIMIC-CXR dataset confirm the robust generalisation of the model and its ability to generate high-quality, clinically relevant reports.

2606.02027 2026-06-02 cs.RO cs.LG cs.MA 版本更新

World-Task Factorization for Robot Learning

世界-任务分解用于机器人学习

Eduardo Sebastián, Adrian Pfisterer, Vito Mengers, Oliver Brock, Amanda Prorok

发表机构 * Department of Computer Science and Technology, University of Cambridge, United Kingdom（英国剑桥大学计算机科学与技术系）； Robotics and Biology Laboratory, Technische Universität Berlin（柏林工业大学机器人与生物学实验室）； Science of Intelligence (SCIoI), Cluster of Excellence, Berlin, Germany（智能科学（SCIoI）卓越集群，德国柏林）； Robotics Institute Germany（德国机器人研究所）

AI总结提出将策略分解为世界因子和任务因子，通过可微图模型AICON与紧凑学习策略结合，实现零样本泛化到新配置并迁移到真实硬件。

详情

AI中文摘要

机器人学习必须产生能够泛化到新的约束、队友和环境组合的策略。为此，我们必须对策略进行结构性分解，这种选择决定了哪些部分泛化、哪些需要重新训练、哪些保持纠缠。现有方法跨度很大：有的期望结构从数据规模扩展中自行涌现，有的则通过层次结构、技能库或学习到的专门化来手工设计结构。在本文中，我们研究我们认为机器人学中最基本的分解：将世界与任务分离。我们探讨这种分解在何种条件下是有原则的。世界因子是具身系统和环境的属性；它们独立于意图存在。任务因子由任务在世界所允许的事物上的逻辑定义。我们通过贝叶斯模型证据形式化这种不对称性：它与数据生成过程一致，通过解析的世界模型保持高似然，并减少奥卡姆剃刀对任务参数的惩罚。我们将AICON（一个由递归估计器及其互连构成的可微图，具有组合性，无需任务特定数据即可运行，并将成本梯度传播到执行器）与一个调节梯度路径的紧凑学习策略配对，以实例化这种分解。梯度作为两个因子之间的接口：它们通过图携带世界结构，通过成本携带任务结构，从而在保持结构泛化的同时实现低维学习。我们在三个问题上测试了世界/任务分解，这些问题包含异构机器人、环境、任务逻辑和感觉运动模态。我们的框架在所有设置中优于端到端基线和分析启发式方法，零样本泛化到分布外配置，并无需重新训练即可迁移到真实硬件。

英文摘要

Robot learning must produce policies that generalize to new combinations of constraints, teammates, and environments. To achieve this, we must structurally factor the policy, which is a choice that dictates what generalizes, what requires retraining, and what remains entangled. Existing methods span a wide spectrum, from expecting structure to emerge from data scaling, to hand-designing it via hierarchies, skill libraries or learned specializations. In this paper, we study what we argue is the most fundamental factorization in robotics: separating the world from the task. We investigate the conditions under which this factorization is principled. World factors are properties of the embodied system and the environment; they exist independently of intent. Task factors are defined by the task's logic over what the world admits. We formalize this asymmetry through Bayesian model evidence: it aligns with the data-generating process, maintains high likelihood through an analytical world model, and reduces the Occam razor's penalty on task parameters. We instantiate this factorization by pairing AICON, a differentiable graph of recursive estimators and interconnections that is compositional, operates without task-specific data, and propagates cost gradients to actuators, with a compact, learned policy that modulates gradient paths. Gradients serve as the interface between the two factors: they carry world structure through the graph and task structure through costs, enabling low-dimensional learning while preserving structural generalization. We test the world/task factorization across three problems that encompass heterogeneous robots, environments, task logic and sensorimotor modalities. Our framework outperforms end-to-end baselines and analytical heuristics in all settings, generalizes zero-shot to out-of-distribution configurations, and transfers to real hardware without retraining.

2606.02022 2026-06-02 cs.CV cs.AI cs.LG 版本更新

Ranking vs. Assignment: The Metric Mismatch in Multi-View Object Association

排名 vs. 分配：多视角目标关联中的度量不匹配

Matvei Shelukhan, Timur Mamedov, Aleksandr Chukhrov, Karina Kvanchiani

发表机构 * Tevian Moscow（莫斯科Tevian）； Lomonosov Moscow State University（莫斯科国立罗蒙诺索夫大学）

AI总结本文揭示了多视角目标关联中常用的排名度量（如AP、FPR-95）与分配目标之间的根本性不匹配，并提出了基于Sinkhorn归一化的后处理方法以缓解该问题。

详情

AI中文摘要

多视角目标关联是一个重要的计算机视觉问题，是许多多相机感知任务的基础。虽然该任务自然被表述为受约束的一对一匹配问题，但最近的工作严重依赖成对排名度量（如AP和FPR-95）进行模型评估。我们强调了这些度量与实际分配目标之间的根本性不匹配。理论上，我们表明即使分配已经正确，AP和FPR-95也可能不完美，而基于Sinkhorn的归一化可以使它们完美。相反，最优的成对排名仍然可能导致错误的分配。我们通过使用基于Sinkhorn的归一化作为受控的后处理压力测试，在实践中验证了这种不匹配。我们表明，仅优化几个后处理参数就能显著提升AP和FPR-95，而分配级别的度量（如ACC和IPAA）却没有相应改进。

英文摘要

Multi-view object association is an important computer vision problem that underlies many multi-camera perception tasks. While this task is naturally formulated as a constrained one-to-one matching problem, recent works heavily rely on pairwise ranking metrics like AP and FPR-95 for model evaluation. We highlight a fundamental mismatch between these metrics and the actual assignment objective. Theoretically, we show that AP and FPR-95 can be imperfect even when the assignment is already correct, and that Sinkhorn-based normalization can make them perfect. Conversely, optimal pairwise ranking can still lead to incorrect assignments. We validate this mismatch in practice by using our Sinkhorn-based normalization as a controlled post-processing stress test. We show that optimizing just a few post-processing parameters significantly boosts AP and FPR-95 without corresponding improvements in assignment-level metrics such as ACC and IPAA.
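The mismatch described above can be illustrated with a toy 2x2 score matrix. The sketch below uses plain Sinkhorn normalization (alternating row and column normalization) with no tunable parameters, which is an assumption: the paper's post-processing has a few parameters that the abstract does not specify. The matrix values and function names are hypothetical, and the ground-truth matches are assumed to lie on the diagonal.

```python
def sinkhorn_normalize(scores, n_iters=200):
    """Alternately normalize rows and columns of a positive square matrix."""
    m = [row[:] for row in scores]
    for _ in range(n_iters):
        m = [[v / sum(row) for v in row] for row in m]
        col_sums = [sum(row[j] for row in m) for j in range(len(m[0]))]
        m = [[row[j] / col_sums[j] for j in range(len(row))] for row in m]
    return m


def ranking_is_perfect(scores):
    """True if every true pair (diagonal) scores above every false pair."""
    n = len(scores)
    positives = [scores[i][i] for i in range(n)]
    negatives = [scores[i][j] for i in range(n) for j in range(n) if i != j]
    return min(positives) > max(negatives)


# The one-to-one assignment is already correct here (0.9 + 0.25 > 0.6 + 0.3),
# yet the true pair scored 0.25 ranks below both false pairs.
raw = [[0.9, 0.6],
       [0.3, 0.25]]
normalized = sinkhorn_normalize(raw)
```

In this example the raw scores fail the pairwise ranking check while the normalized scores pass it, even though the underlying assignment never changed, which is the kind of metric gain without assignment gain that the abstract warns about.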

2606.02020 2026-06-02 cs.CL cs.LG 版本更新

Unveiling the Entropy Dynamics of Chain-of-Thought Reasoning

揭示思维链推理的熵动力学

Ting Xu, Xu He, Yupu Lu, Jiankai Sun, Dong Li, Wai Lam, Jianye Hao

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结本文通过熵动力学揭示思维链推理的两阶段结构（不确定性区域和置信区域），并提出基于CUSUM变化点检测的无训练框架实现早期退出和测试时缩放，以提升推理效率与可靠性。

Comments 21 pages, 10 figures, accepted in ICML2026

详情

AI中文摘要

本文研究了思维链（CoT）的熵动力学，揭示了一致的两阶段结构：一个探索性的不确定性区域，然后急剧过渡到收敛的置信区域。我们证明置信区域具有两个关键性质：1）高可靠性——置信区域中的答案变得高度准确和稳定，以及2）高冗余性——模型在得到正确答案之后很久仍在生成不必要的token。这些性质解锁了更高效和可靠的推理策略：1）早期退出利用可靠性和冗余性，在收益递减时安全终止计算，以及2）测试时缩放使用置信区域信号优先考虑收敛轨迹。为了实施这些见解，我们将置信区域检测建模为序贯变化点检测问题，首次将经典变化点方法应用于监控CoT推理。使用累积和（CUSUM）算法（一种统计最优的变化点检测器），我们开发了一个无训练框架用于实时推理控制。实验表明，我们的方法为早期退出建立了优越的帕累托前沿。CUSUM在减少11.1% token的情况下达到63.06%的准确率，在准确率上分别超过DEER和Dynasor 3.28%和4.36%。对于测试时缩放，CUSUM加权投票始终优于自一致性。

英文摘要

This paper investigates the entropy dynamics of Chain-of-Thought (CoT) and uncovers a consistent two-phase structure: an Uncertainty Region of exploration transitioning sharply to a Confidence Region of convergence. We demonstrate that the Confidence Region possesses two critical properties: 1) High Reliability -- answers in the confidence region become highly accurate and stable, and 2) High Redundancy -- models generate unnecessary tokens long after reaching the correct answer. These properties unlock more efficient and reliable inference strategies: 1) Early Exit leverages reliability and redundancy to terminate computation safely when returns diminish, and 2) Test-Time Scaling uses the Confidence Region signal to prioritize converged trajectories. To operationalize these insights, we formulate Confidence Region detection as a sequential change-point detection problem, being the first to apply classical change-point methods to monitor CoT reasoning. Using the Cumulative Sum (CUSUM) algorithm, a statistically optimal change-point detector, we develop a training-free framework for real-time inference control. Experiments show our approach establishes a superior Pareto-frontier for early exit. CUSUM achieves 63.06% accuracy with 11.1% token reduction, outperforming DEER and Dynasor by 3.28% and 4.36% in accuracy respectively. For test-time scaling, CUSUM-weighted voting consistently outperforms self-consistency.
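A one-sided CUSUM detector for a downward shift in per-step entropy might look like the sketch below. It is a textbook CUSUM recursion, not the paper's implementation: the reference level, slack, and threshold values are hypothetical, and the abstract does not say how the entropy signal is computed or calibrated.

```python
def cusum_drop_index(entropies, reference, slack=0.1, threshold=1.0):
    """Return the first index where a sustained entropy drop is detected.

    The statistic accumulates how far each value falls below `reference`
    (minus a small `slack`), resets at zero, and raises an alarm once it
    exceeds `threshold`. Returns None if no alarm is raised.
    """
    s = 0.0
    for t, h in enumerate(entropies):
        s = max(0.0, s + (reference - h - slack))
        if s > threshold:
            return t
    return None


# Hypothetical trace: high entropy during exploration, then a sharp drop.
trace = [2.0, 2.1, 1.9, 2.0, 0.3, 0.2, 0.3, 0.2]
exit_step = cusum_drop_index(trace, reference=2.0)
```

An early-exit controller could stop decoding at `exit_step`; raising `threshold` trades later detection for fewer false alarms.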

2606.02016 2026-06-02 cs.LG 版本更新

Evaluating Real-World Generalizability of Algorithm Selection Models

评估算法选择模型的现实世界泛化能力

Gjorgjina Cenikj, Jakub Kudela, Eva Tuba, Tome Eftimov

发表机构 * Computer Systems Department, Jožef Stefan Institute（约瑟夫·斯特凡研究所计算机系统系）； Brno University of Technology（布尔诺理工大学）

AI总结通过跨基准测试系统评估算法选择模型在合成与现实优化问题上的泛化能力，分析其迁移性能并指出在特定领域应用中的挑战。

Comments 10 pages, 12 figures

详情

DOI: 10.1145/3795101.3805348

AI中文摘要

算法选择（AS）旨在通过利用可测量的问题特征和历史性能数据，自动为给定问题实例识别最合适的优化算法。在本研究中，我们研究了AS模型在合成和现实优化景观上的泛化能力。我们考虑了两个广泛使用的学术基准测试套件（BBOB和CEC）以及两个现实世界问题集（机器人轨迹优化任务和无人机路径规划问题）。通过系统的跨基准测试评估，我们分析了AS模型如何在领域之间迁移，识别了泛化成功或失败的情况，并强调了在现实、特定领域环境中应用AS时出现的挑战。我们的研究结果提供了对当前AS方法鲁棒性的见解，并为开发更可靠、广泛适用的现实世界优化AS系统提供了信息。

英文摘要

Algorithm Selection (AS) aims to automatically identify the most suitable optimization algorithm for a given problem instance by leveraging measurable problem characteristics and historical performance data. In this study, we investigate the generalization ability of AS models across both synthetic and real-world optimization landscapes. We consider two widely used academic benchmark suites (BBOB and CEC) and two real-world problem sets (robotics trajectory optimization tasks and unmanned aerial vehicle path-planning problems). Through a systematic cross-benchmark evaluation, we analyze how AS models transfer between domains, identify where generalization succeeds or breaks down, and highlight the challenges that arise when applying AS in realistic, domain-specific contexts. Our findings provide insights into the robustness of current AS approaches and inform the development of more reliable, broadly applicable AS systems for real-world optimization.

2606.02011 2026-06-02 cs.AI cs.LG 版本更新

Extreme Low-Bit Inference in Reasoning Models: Failure Modes and Targeted Recovery

推理模型中的极端低位推理：失败模式与针对性恢复

Ekaterina Alimaskina, Darya Rudas, Denis Shveykin, Gleb Molodtsov, Pavel Vasiliev, Aleksandr Beznosikov

发表机构 * University of Washington（华盛顿大学）

AI总结针对大型推理模型在2位量化推理中因生成不稳定导致总token数膨胀而无法实现端到端加速的问题，提出轻量级FP16规划和循环救援两种控制方法，显著恢复模型精度并保持实际速度。

详情

AI中文摘要

大型推理模型（LRM）依赖长推理轨迹，导致推理成本高昂。虽然低位量化降低了每token解码成本，但我们表明，激进的2位推理可能无法实现端到端加速，因为生成过程中的不稳定性会膨胀总token数。2位量化不仅降低答案准确性，还常常产生更长的轨迹，包含重复循环、预算耗尽、延迟承诺和未闭合的推理段。我们分析了Qwen3推理模型在数学和常识基准上的完整推理轨迹，并表明准确率下降与这些过程级失败密切相关。为解决这些问题，我们引入了两种轻量级控制：FP16规划，为2位模型提供简短的高精度轮廓；以及循环救援，检测重复轨迹并要么承诺早期答案，要么回退到FP16。在MATH-500上，循环救援将Qwen3-8B准确率从17.2%提升至74.2%，而规划加循环救援将Qwen3-32B准确率从65.0%提升至87.2%。总体而言，我们的结果表明，当极端低位推理的失败被视为可控生成病理时，它变得可行：通过轻量级检测和选择性FP16支持，2位推理可以在恢复准确率的同时保持真实的端到端速度。我们的代码可在 https://github.com/brain-lab-research/quantized-reasoning 获取。

英文摘要

Large Reasoning Models (LRMs) rely on long reasoning traces, making inference expensive. While low-bit quantization reduces per-token decoding cost, we show that aggressive 2-bit inference can fail to deliver end-to-end speedup because instability in the generation process inflates total token count. Instead of merely lowering answer accuracy, 2-bit quantization often produces much longer traces with repetitive loops, budget exhaustion, delayed commitment, and unclosed reasoning segments. We analyze full reasoning traces of Qwen3 reasoning models across mathematical and commonsense benchmarks and show that accuracy degradation is tightly linked to these process-level failures. To address them, we introduce two lightweight controls: FP16 planning, which gives the 2-bit model a short high-precision outline, and loop rescue, which detects repetitive traces and either commits to an earlier answer or falls back to FP16. On MATH-500, loop rescue improves Qwen3-8B accuracy from 17.2% to 74.2%, while planning plus loop rescue improves Qwen3-32B from 65.0% to 87.2%. Overall, our results show that extreme low-bit reasoning becomes practical when its failures are treated as controllable generation pathologies: with lightweight detection and selective FP16 support, 2-bit inference can recover accuracy while preserving real end-to-end speed. Our code is available at: https://github.com/brain-lab-research/quantized-reasoning.
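One simple way to flag the repetitive loops described above is to count repeated n-grams in the generated trace. The sketch below is only an illustration of that idea; the abstract does not describe the paper's actual loop detector, so the function name, the n-gram length, and the repeat limit are all hypothetical.

```python
from collections import Counter


def has_repetitive_loop(tokens, ngram=4, max_repeats=3):
    """Return True once any n-gram has appeared `max_repeats` times."""
    counts = Counter()
    for i in range(len(tokens) - ngram + 1):
        gram = tuple(tokens[i:i + ngram])
        counts[gram] += 1
        if counts[gram] >= max_repeats:
            return True
    return False


# Hypothetical trace in which the same phrase is generated three times.
looping_trace = ["wait", "let", "me", "check"] * 3
```

A controller in the spirit of loop rescue could call such a check during decoding and, on a positive result, either commit to an earlier answer or hand the problem to a higher-precision model.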

2606.02008 2026-06-02 stat.ML cs.LG 版本更新

Provable Data Scaling Law for Meta Learning via Complexity Minimization

通过复杂度最小化实现元学习的可证明数据缩放定律

Kazuto Fukuchi, Ryuichiro Hataya, Kota Matsui

AI总结提出复杂度最小化框架，通过最小化跨源域的最坏情况下游模型复杂度，从理论上证明元学习中的预训练数据规模增大可提升少样本适应性能。

详情

AI中文摘要

预训练已成为现代机器学习的基本范式，其关键经验优势之一是随着预训练数据规模的增加，下游样本复杂度降低。然而，现有的预训练理论框架并未完全解释这一现象。在本文中，我们引入了复杂度最小化，一种新颖的元表示学习框架，旨在实现对此缩放行为的理论分析，该框架通过评估每个领域最适合的下游模型复杂度并最小化跨源域的最坏情况复杂度来学习表示。我们的端到端理论分析，涵盖从预训练到下游回归，表明该框架可证明地捕捉了这种缩放行为；特别地，我们展示了少样本适应的错误率随着元训练数据量的增加而改善。实验上，我们证明将复杂度正则化纳入现有的元学习方法中持续提高下游样本效率。

英文摘要

Pre-training has become a fundamental paradigm in modern machine learning, with one of its key empirical benefits being reduced downstream sample complexity as the scale of pre-training data increases. However, existing theoretical frameworks for pre-training do not fully explain this phenomenon. In this paper, we introduce complexity minimization, a novel meta-representation learning framework designed to enable theoretical analysis of this scaling behavior, which learns representations by evaluating the downstream model complexity best suited to each domain and minimizing the worst-case such complexity across source domains. Our end-to-end theoretical analysis, spanning pre-training through downstream regression, shows that this framework provably captures this scaling behavior; in particular, we show that the error rate of few-shot adaptation improves as the amount of meta-training data grows. Empirically, we demonstrate that incorporating complexity regularization into existing meta-learning methods consistently improves downstream sample efficiency.

URL PDF HTML ☆

赞 0 踩 0

2606.01999 2026-06-02 cs.LG cs.AI 版本更新

Why Do Time Series Models Need Long Context Windows?

为什么时间序列模型需要长上下文窗口？

Luca Butera, Giovanni De Felice, Andrea Cini, Cesare Alippi

发表机构 * Università della Svizzera Italiana（瑞士意大利语区大学）； EPFL（洛桑联邦理工学院）； Politecnico di Milano（米兰理工大学）

AI总结本文从生成过程识别和条件预测两个目标出发，证明长上下文窗口通过降低生成过程的不确定性来提升预测性能，并表明即使对于记忆长度为P的过程，输入窗口必须严格大于P才能达到最小误差。

详情

AI中文摘要

现代用于预测时间序列组的深度学习模型依赖于越来越长的观测窗口。然而，增加窗口大小的好处通常被简单地归因于捕捉长程依赖，而关于全局预测模型如何利用输入观测的更广泛讨论一直有限。在本文中，我们表明预测时间序列组涉及两个目标：(i) 生成过程识别（GPI），即推断生成输入序列的具体过程，以及 (ii) 条件预测（CF），即根据输入观测预测未来值。从这个角度来看，最优预测可以解释为对所有可能数据生成过程的平均，并按输入窗口给定的似然加权。这为长上下文窗口的好处提供了另一种解释：它们降低了运行过程中输入时间序列由哪个具体过程生成的不确定性。我们证明，即使对于记忆长度为 $P$ 的过程，严格大于 $P$ 的输入窗口大小对于达到最小可实现误差是必要的。最后，我们展示了如何将 GPI 和 CF 解耦，以在不牺牲准确性的情况下提高计算可扩展性。在合成和真实数据上的实验验证了我们的见解及其对设计预测架构的相关性。

英文摘要

Modern deep learning models for forecasting groups of time series rely on increasingly longer observation windows. However, the benefit of increasing the window size is often simply attributed to capturing long-range dependencies, and broader discussion on how global forecasting models leverage input observations has been limited. In this paper, we show that forecasting groups of time series involves two objectives: (i) generative process identification (GPI), i.e., inferring the specific process generating the input sequence, and (ii) conditional forecasting (CF), i.e., predicting future values given input observations. From this perspective, optimal predictions can be interpreted as an average over plausible data-generating processes, weighted by their likelihood given the input window. This suggests another explanation for the benefits of long context windows: they reduce the uncertainty about which specific process is generating the input time series during operation. We prove that even for processes with memory length $P$, an input window size strictly larger than $P$ is necessary to achieve the minimum attainable error. Finally, we show how decoupling GPI and CF can improve computational scalability without compromising accuracy. Experiments on synthetic and real-world data validate our insights and their relevance for designing forecasting architectures.
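The view of optimal prediction as a likelihood-weighted average over candidate processes can be sketched with two AR(1) processes. This is a toy illustration under stated assumptions, not the paper's construction: the candidate coefficients, the unit noise level, and the uniform prior are hypothetical choices.

```python
import math


def posterior_weights(window, coeffs, noise_std=1.0):
    """Posterior over AR(1) coefficients given a window, with a uniform prior."""
    log_liks = []
    for a in coeffs:
        ll = sum(-0.5 * ((x1 - a * x0) / noise_std) ** 2
                 for x0, x1 in zip(window, window[1:]))
        log_liks.append(ll)
    top = max(log_liks)  # subtract the maximum for numerical stability
    unnorm = [math.exp(ll - top) for ll in log_liks]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def mixture_forecast(window, coeffs, noise_std=1.0):
    """One-step forecast: average of per-process forecasts, weighted by posterior."""
    weights = posterior_weights(window, coeffs, noise_std)
    return sum(w * a * window[-1] for w, a in zip(weights, coeffs))


coeffs = [0.9, -0.9]                      # two candidate generating processes
series = [1.0, 0.9, 0.81, 0.729]          # noise-free samples from a = 0.9
w_short = posterior_weights(series[:2], coeffs)
w_long = posterior_weights(series, coeffs)
```

Each candidate process has memory length 1, yet the longer window puts more posterior mass on the true coefficient, so extra context helps by identifying the process rather than by adding long-range dependencies.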

URL PDF HTML ☆

赞 0 踩 0

2606.01993 2026-06-02 cs.CL cs.AI cs.LG 版本更新

MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills?

MMG2Skill：智能体能否从野外指南中提炼出自我进化的技能？

Xinyu Che, Junqi Xiong, Yunfei Ge, Xinping Lei, Shihao Li, Hang Yan, Han Li, Yuanxing Zhang, Zhiqi Bai, Jinhua Hao, Ming Sun, Han Li, Jiaheng Liu

发表机构 * Nanjing University（南京大学）； Kuaishou Technology（快手科技）

AI总结提出MMG2Skill框架，将多模态异构的野外指南编译为可编辑技能，通过轨迹级根因反馈持续改进，在GUI控制、开放游戏和策略卡牌任务中显著提升VLM智能体性能。

Comments 35 pages, 12 figures, 13 tables. Code: https://github.com/NJU-LINK/MMG2Skill

详情

AI中文摘要

网络上丰富的程序性知识对于帮助智能体解决长程任务具有巨大潜力。然而，这些知识通常是多模态、异构、有噪声的，并且隐含地假设人类执行者，使得它们难以直接用作智能体所需的技能。为了弥合人类导向指南与智能体可执行技能之间的差距，我们将此问题形式化为指南到技能学习：将野外指南转换为可执行技能，并从智能体可观察的轨迹中持续改进它们。为了评估现有智能体在此任务上的能力，我们引入了MMG2Skill-Bench，这是针对该问题的首个基准测试。我们进一步提出了MMG2Skill，一个闭环框架，它将指南编译为可编辑技能，在执行过程中将固定的视觉语言模型（VLM）智能体条件化于这些技能，并从轨迹级根因反馈中修正技能，而不使用基准测试分数。在GUI控制、开放式游戏和策略卡牌游戏中，使用六个VLM骨干网络，MMG2Skill在每个模型-领域设置中始终优于普通基线智能体，在骨干网络上实现了宏观平均增益+12.8到+25.3个百分点。消融研究表明，直接用原始指南提示智能体会降低性能，而结构化技能构建和轨迹驱动修订对于观察到的改进都是必要的。在成功可推断的任务中，当成功信号适当校准时，基于分析器的提前停止进一步防止了后期性能退化，并节省了25%-53%的尝试次数。

英文摘要

Abundant procedural knowledge on the Web holds great potential for helping agents solve long-horizon tasks. However, such knowledge is often multimodal, heterogeneous, noisy, and implicitly assumes human executors, making it difficult to use directly as the skills required by agents. To bridge the gap between human-oriented guides and agent-executable skills, we formalize this problem as guide-to-skill learning: converting in-the-wild guides into executable skills and continuously improving them from trajectories observable to the agent. To evaluate the capability of existing agents on this task, we introduce MMG2Skill-Bench, the first benchmark designed for this problem. We further propose MMG2Skill, a closed-loop framework that compiles guides into editable skills, conditions a fixed vision-language model (VLM) agent on these skills during execution, and revises the skills from trajectory-level root-cause feedback without using benchmark scores. Across GUI control, open-ended gameplay, and strategic card play with six VLM backbones, MMG2Skill consistently outperforms vanilla baseline agents in every model-domain setting, achieving macro-average gains of +12.8 to +25.3 percentage points across backbones. Ablation studies show that directly prompting agents with raw guides can degrade performance, while both structured skill construction and trajectory-driven revision are necessary for the observed improvements. On success-inferable tasks, analyzer-based early stopping further prevents late-stage performance regressions and saves 25%-53% of attempts when the success signal is properly calibrated.

2606.01992 2026-06-02 cs.CV cs.AI cs.LG 版本更新

A Structured Benchmark for Text-Guided Anomaly Detection: When Language Stops Conditioning the Decision

文本引导异常检测的结构化基准：当语言停止条件化决策时

Stefano Samele, Eugenio Lomurno, Teodora Jovanovic, Sanjay Shivakumar Manohar, Alberto Crivellaro, Matteo Matteucci

发表机构 * Politecnico di Milano, AIRLab（米兰理工学院，AIRLab）； S&H – Software & Hardware（S&H – 软件与硬件）

AI总结提出结构化基准TGAD，通过三个场景逐步增加语言功能角色，评估多模态异常检测系统的文本引导能力，发现当前系统仅表面受语言条件化，标准基准高估了其能力。

详情

AI中文摘要

工业异常检测历来是单模态任务。最近的多模态视觉-语言模型产生了接受文本输入和图像的系统，并被呈现为支持文本引导的零样本和少样本检测。然而，这些方法使用继承自单模态基准的协议进行评估，这些协议保持文本条件不变，因此无法衡量语言是否条件化决策；报告的性能提升是否反映文本引导或强大的预训练视觉特征仍是开放问题。我们引入文本引导异常检测（TGAD），这是一个结构化基准，通过三个场景逐步增加语言的功能角色：MVTec AD上的受控提示敏感性设置；MVTec AD的组件标记扩展，要求模型将其评估限制在指定部件；以及新的组装面板数据集（APD），这是一个需要缺陷类型和组件位置知识的现实工业场景。我们评估每个范式的代表性模型：生成式大视觉-语言、无训练判别式和嵌入自适应判别式。在所有三个模型中，文本接口仅表面条件化决策：除非移除对象名词，否则提示内容被吸收（生成模型的I-AUROC从97.4降至82.6）；一旦指令部件外的缺陷被视为正常，组件级指令不约束决策（从90.3降至66.3）；当两者在APD上结合时，图像级判别崩溃至MVTec水平以下，一种情况低于随机水平（71.2、50.5、31.5）。这些结果表明，标准基准夸大了当前多模态异常检测系统的文本引导能力，并且此类协议是能够通过语言可靠控制以用于工业部署的模型的先决条件。

英文摘要

Industrial anomaly detection has historically been a unimodal task. Recent multimodal vision-language models have produced systems that admit textual input alongside the image and are presented as enabling text-guided zero- and few-shot inspection. Yet these methods are evaluated with protocols inherited from unimodal benchmarks that hold the textual condition constant and therefore cannot measure whether language conditions the decision; whether reported gains reflect text guidance or strong pretrained visual features remains open. We introduce Text-Guided Anomaly Detection (TGAD), a structured benchmark that progressively increases the functional role of language across three scenarios: a controlled prompt-sensitivity setting on MVTec AD; a component-tagged extension of MVTec AD that requires the model to restrict its assessment to an instructed part; and the new Assembled Panel Dataset (APD), a realistic industrial setting that requires both defect-type and component-location knowledge. We evaluate one representative model per paradigm: generative large vision-language, training-free discriminative, and embedding-adaptive discriminative. In all three, the textual interface conditions the decision only superficially: prompt content is absorbed unless the object noun is removed (the generative model's I-AUROC drops from 97.4 to 82.6); component-level instructions do not constrain the decision once defects outside the instructed part are admitted as normal (from 90.3 to 66.3); and when both combine on APD, image-level discrimination collapses below the MVTec level, in one case below chance (71.2, 50.5, 31.5). These results suggest that standard benchmarks overstate the text-guided capabilities of current multimodal anomaly detection systems, and that a protocol of this kind is a prerequisite for models that can be reliably controlled through language for industrial deployment.

2606.01987 2026-06-02 cs.DM cs.LG 版本更新

Graph Edit Distance Formulation for the Vehicle Routing Problem: Theory and Analysis

车辆路径问题的图编辑距离公式：理论与分析

Adel Dabah

发表机构 * Forschungszentrum Jülich（于利希研究中心）

AI总结本文提出将车辆路径问题重新表述为图编辑距离最大化问题，通过边删除成本模型实现总路线成本最小化，并利用该公式进行结构分析和基准测试。

详情

AI中文摘要

我们证明车辆路径问题（VRP）可以重新表述为图编辑距离（GED）最大化问题。在简单的边删除成本模型下，最小化总路线成本等价于最大化从完整实例图中删除的边的总权重。该公式在边级别对VRP进行建模，其中解由选定的边而非路线序列定义，从而能够进行经典公式中难以实现的结构分析：解质量的逐边归因、最优性差距的分解、解稀疏性的刻画以及贪婪构造难以到达的边的识别。理论上，我们建立了一个合并-分解定理，表明Clarke-Wright节省量等于每次合并的GED增量，以及一个近似转移定理，将GED近似比转化为VRP成本界限。利用这一重新表述，我们分析了90个已知最优解的CVRP基准实例。我们发现最优路由图仅使用5.5%的可用边，约3.0%的最优边在重复重启下始终未被Clarke-Wright启发式找到，并且成本差距分解为遗漏的最优边和替代的非最优边，两者总权重相当。边加性目标为未来的图神经网络边预测方法提供了自然的逐边监督信号，暗示了与图神经网络方法的潜在联系，这留待后续工作。

英文摘要

We show that the Vehicle Routing Problem (VRP) can be reformulated as a Graph Edit Distance (GED) maximization problem. Under a simple edge-deletion cost model, minimizing total route cost is equivalent to maximizing the total weight of edges deleted from the complete instance graph. This formulation models VRP at the edge level, where solutions are defined by selected edges rather than route sequences, enabling structural analyses that are difficult in classical formulations: per-edge attribution of solution quality, decomposition of the optimality gap, characterization of solution sparsity, and identification of edges that are hard to reach by greedy construction. Theoretically, we establish a merge-decomposition theorem showing that Clarke-Wright savings equal per-merge GED increments, and an approximation-transfer theorem that turns GED approximation ratios into VRP cost bounds. Using this reformulation, we analyze 90 CVRP benchmark instances with known optimal solutions. We find that optimal routing graphs use only 5.5% of available edges, that approximately 3.0% of optimal edges are consistently not found by Clarke-Wright heuristics under repeated restarts, and that the cost gap decomposes into missed optimal edges and substituted non-optimal edges of comparable total weight. The edge-additive objective provides a natural per-edge supervision signal for future graph neural network approaches to edge prediction, suggesting a potential connection to graph neural network approaches that we leave for follow-up work.
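The edge-level view can be checked on a tiny instance: because the total weight of the complete graph is fixed, the cost of the kept edges plus the weight of the deleted edges is constant, so minimizing one maximizes the other. The sketch below assumes every route visits at least two customers, so no edge is used twice; the weight function and routes are hypothetical, and the savings formula is the standard Clarke-Wright one.

```python
from itertools import combinations


def route_edges(routes):
    """Set of undirected edges used by a list of depot-to-depot routes."""
    return {frozenset((u, v)) for route in routes for u, v in zip(route, route[1:])}


def route_cost(weights, routes):
    """Total weight of the edges traversed by all routes."""
    return sum(weights[frozenset((u, v))] for route in routes for u, v in zip(route, route[1:]))


def deleted_weight(weights, routes):
    """Total weight of complete-graph edges that the solution does not use."""
    kept = route_edges(routes)
    return sum(w for edge, w in weights.items() if edge not in kept)


def clarke_wright_saving(weights, i, j, depot=0):
    """Saving from serving i and j on one route instead of two separate ones."""
    return weights[frozenset((depot, i))] + weights[frozenset((depot, j))] - weights[frozenset((i, j))]


# Hypothetical instance: depot 0 and customers 1..4 with integer edge weights.
weights = {frozenset((u, v)): (u + v) + 2 * abs(u - v) for u, v in combinations(range(5), 2)}
total = sum(weights.values())
solution_a = [[0, 1, 2, 0], [0, 3, 4, 0]]
solution_b = [[0, 1, 3, 0], [0, 2, 4, 0]]
```

For both solutions, `route_cost` and `deleted_weight` add up to `total`, so whichever solution is cheaper also deletes more weight.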

2606.01973 2026-06-02 cs.LG cs.CV 版本更新

A Closer Look at In-Distribution vs. Out-of-Distribution Accuracy for Open-Set Test-time Adaptation

开放集测试时自适应中分布内与分布外准确率的深入分析

Zefeng Li, Evan Shelhamer

发表机构 * University of British Columbia and Vector Institute（不列颠哥伦比亚大学和向量研究所）

AI总结本文通过基准测试和提出新基线，揭示了当前开放集测试时自适应方法在平衡分布内准确率和分布外检测能力上的不足。

Comments TMLR 2026

详情

AI中文摘要

开放集测试时自适应（TTA）在存在输入偏移和未知输出类别的情况下更新模型。尽管近期方法在提高已知类别的分布内（InD）准确率方面取得了进展，但它们准确检测分布外（OOD）未知类别的能力仍未得到充分探索。我们在小规模CIFAR-10-C和大规模ImageNet-C的标准损坏基准上，对鲁棒和开放集TTA方法（SAR、OSTTA、UniEnt和SoTTA）进行了基准测试。对于CIFAR-10-C，我们使用来自SVHN和CIFAR-100的OOD数据，分别对应其损坏形式SVHN-C和CIFAR-100-C。对于ImageNet-C，我们使用来自ImageNet-O和Textures的OOD数据，分别对应其损坏形式ImageNet-O-C和Textures-C。ImageNet-O更接近ImageNet，包含未知但相关的物体类别（如食物类的“蒜香面包”与“热狗”，基础设施类的“高速公路”与“水坝”），而Textures则远离ImageNet，包含非物体图案（如“裂纹”泥土、“多孔”海绵、“纹理”树叶）。我们评估了TTA方法在CIFAR-10-C和ImageNet-C上对InD与OOD识别的准确率和置信度。我们在CIFAR-10-C上验证了每种方法自身OOD检测技术的准确率。我们还在ImageNet-C上进行了评估，并报告了准确率和标准OOD检测指标。我们进一步考察了更现实的设置，其中OOD数据的比例和速率可以变化。为了探索InD识别与OOD拒绝之间的权衡，我们提出了一种新的基线，将softmax/多类输出替换为sigmoid/多标签输出。我们的分析首次表明，当前的开放集TTA方法难以平衡InD和OOD准确率，并且它们仅能不完全地过滤OOD数据以进行自身的自适应更新。

英文摘要

Open-set test-time adaptation (TTA) updates models on new data in the presence of input shifts and unknown output classes. While recent methods have made progress on improving in-distribution (InD) accuracy for known classes, their ability to accurately detect out-of-distribution (OOD) unknown classes remains underexplored. We benchmark robust and open-set TTA methods (SAR, OSTTA, UniEnt, and SoTTA) on the standard corruption benchmarks of CIFAR-10-C at the small scale and ImageNet-C at the large scale. For CIFAR-10-C, we use OOD data from SVHN and CIFAR-100 in their respective corrupted forms of SVHN-C and CIFAR-100-C. For ImageNet-C, we use OOD data from ImageNet-O and Textures in their respective corrupted forms of ImageNet-O-C and Textures-C. ImageNet-O is nearer to ImageNet, as unknown but related object classes (like "garlic bread" vs. "hot dog" for food, or "highway" vs. "dam" for infrastructure), while Textures is farther from ImageNet, as non-object patterns (like "cracked" mud, "porous" sponge, "veined" leaves). We evaluate the accuracy and confidence of TTA methods for InD vs. OOD recognition on CIFAR-10-C and ImageNet-C. We verify the accuracy of each method's own OOD detection technique on CIFAR-10-C. We also evaluate on ImageNet-C and report both accuracy and standard OOD detection metrics. We further examine more realistic settings, in which the proportions and rates of OOD data can vary. To explore the trade-off between InD recognition and OOD rejection, we propose a new baseline that replaces softmax/multi-class output with sigmoid/multi-label output. Our analysis shows for the first time that current open-set TTA methods struggle to balance InD and OOD accuracy and that they only imperfectly filter OOD data for their own adaptation updates.
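The motivation for swapping softmax for sigmoid outputs can be seen on a single logit vector: softmax probabilities must sum to one, so the top class keeps at least 1/C confidence even when every logit is low, whereas independent sigmoids can all be small. The sketch below illustrates that contrast only; the rejection rule and the 0.5 threshold are hypothetical and are not taken from the paper's baseline.

```python
import math


def softmax(logits):
    """Multi-class probabilities that sum to one."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits):
    """Independent per-class probabilities (multi-label output)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def reject_as_ood(logits, threshold=0.5):
    """Reject the input if no class is individually confident."""
    return max(sigmoid(logits)) < threshold


unknown_like = [-4.0, -4.0, -4.0]   # no class fires
known_like = [3.0, -4.0, -4.0]      # one class fires clearly
```

On `unknown_like`, softmax still reports a top probability of one third, while every sigmoid output stays below 0.02, so the sigmoid view leaves room to reject the input.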

2605.02122 2026-06-02 cs.LG cs.AI 版本更新

面向刚性物体的学习动作条件与对象中心高斯溅射世界模型

Jens U. Kreber, Lukas Mack, Joerg Stueckler

发表机构 * Intelligent Perception in Technical Systems Group（技术系统智能感知组）

AI总结提出MRO-GWM模型，通过对象中心高斯表示和时空变换器架构，学习刚性物体在3D中的动作条件动力学，支持多物体场景和部分观测下的未来运动预测。

详情

AI中文摘要

世界模型使智能体能够预测其动作对环境的影响。在本文中，我们提出了多刚性物体高斯世界模型（MRO-GWM），一种学习刚性物体在3D中动作条件动力学的新模型。通过用对象中心高斯表示场景，我们可以表示任意物体形状和多物体场景。我们开发了一种新颖的时空变换器架构，该架构根据物体高斯的历史和未来动作预测未来的刚体运动。物体通过其在规范坐标系中的高斯表示，从而可以将物体运动描述为刚体变换。我们的模型在多视角重建上进行训练，这要求模型处理因遮挡导致的物体部分观测。我们分析了该方法在由典型家庭物体组成的合成数据集上的预测性能，这些数据集包含多物体动力学和机器人末端执行器的交互。我们还在模拟中评估了模型在非抓取操作中的模型预测控制性能。

英文摘要

World models enable intelligent agents to predict the consequences of their actions on the environment. In this paper, we propose Multi Rigid Object Gaussian World Model (MRO-GWM), a novel model that learns action-conditional dynamics of rigid objects in 3D. By representing the scene by object-centric Gaussians, we can represent arbitrary object shapes and multi-object scenes. We develop a novel spatio-temporal transformer architecture that predicts future rigid body motion from a history of object Gaussians and future actions. Objects are represented by their Gaussians in a canonical frame, which allows for describing object motion as rigid body transformation. Our model is trained on reconstructions from multiple viewpoints, which requires the model to handle partial observations of objects due to occlusions. We analyze prediction performance of our approach on synthetic datasets composed of typical household objects with multi-object dynamics and interactions by a robot end effector. We also evaluate our model in model-predictive control for non-prehensile manipulation in simulation.

2606.01934 2026-06-02 cs.LG cs.CL 版本更新

HMPO: Hybrid Median-length Policy Optimization for Chain-of-Thought Compression

HMPO：用于思维链压缩的混合中位数长度策略优化

Minghui Zheng, Hongxu Chen, Huimin Ren, Hongsheng Xin, Xiaoyang Qu, Ze Wang, Shuling Yang, Ziyu Peng, Kaike Zhang, Pan Zhou, Kun Zhan

发表机构 * Li Auto Inc.（理想汽车）

AI总结提出HMPO，一种单阶段强化学习框架，通过自适应中位数预算、余弦衰减令牌奖励和乘法奖励公式，在数学数据上训练后实现19%-46%的令牌压缩且精度损失极小，并泛化至多种任务。

详情

AI中文摘要

大型语言模型通过扩展的思维链推理取得了显著性能，但这一冗长过程带来了大量推理开销。现有的思维链压缩方法面临不灵活的手动长度预算、计算昂贵的多阶段训练流程以及仅适用于小模型的脆弱可扩展性。我们提出HMPO（混合中位数长度策略优化），一种经济高效的单阶段强化学习框架。HMPO通过三个协同组件高效压缩思维链：基于成功轨迹的自适应中位数预算以消除手动调整、用于平滑长度惩罚的余弦衰减令牌奖励，以及通过严格优先考虑答案正确性来大幅减轻琐碎奖励破解的乘法奖励公式。仅在数学数据上训练，HMPO无缝泛化到数学、代码、科学和指令遵循任务。在从9B到122B参数、涵盖密集和混合专家架构的大规模实验中，HMPO实现了19%-46%的令牌压缩，精度下降可忽略，同时与现有的多阶段基线相比大幅降低了训练成本。

英文摘要

Large language models achieve remarkable performance via extended chain-of-thought (CoT) reasoning, yet this lengthy process incurs substantial inference overhead. Existing CoT compression methods struggle with inflexible manual length budgets, computationally expensive multi-stage training pipelines, and fragile scalability restricted to small models. We propose HMPO (Hybrid Median-length Policy Optimization), a cost-effective, single-stage reinforcement learning framework. HMPO efficiently compresses CoT via three synergistic components: an adaptive median-based budget derived from successful rollouts to eliminate manual tuning, a cosine-decay token reward for smooth length penalization, and a multiplicative reward formulation that substantially mitigates trivial reward hacking by strictly prioritizing answer correctness. Trained exclusively on mathematical data, HMPO generalizes seamlessly across math, code, science, and instruction-following tasks. Extensive experiments scaling from 9B to 122B parameters across dense and Mixture-of-Experts (MoE) architectures demonstrate that HMPO achieves 19%-46% token compression with negligible accuracy degradation, all while drastically reducing training costs compared to existing multi-stage baselines.
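The three components named in the abstract could fit together roughly as in the sketch below. The exact formulas are not given in the abstract, so everything here is an assumption made for illustration: the budget is the median length of correct rollouts, the length reward stays at 1 up to the budget and follows a half-cosine down to 0 at a hypothetical `max_length`, and the final reward is the product of correctness and the length reward.

```python
import math
from statistics import median


def median_budget(lengths, correct):
    """Median token length over correct rollouts (None if none are correct)."""
    ok = [n for n, c in zip(lengths, correct) if c]
    return median(ok) if ok else None


def cosine_length_reward(length, budget, max_length):
    """1.0 within budget, then a smooth cosine decay to 0.0 at max_length.

    Assumes max_length > budget.
    """
    if length <= budget:
        return 1.0
    frac = min(1.0, (length - budget) / (max_length - budget))
    return 0.5 * (1.0 + math.cos(math.pi * frac))


def hybrid_reward(is_correct, length, budget, max_length):
    """Multiplicative reward: a wrong answer earns nothing, however short."""
    return (1.0 if is_correct else 0.0) * cosine_length_reward(length, budget, max_length)


# Hypothetical group of rollouts for one prompt.
budget = median_budget([100, 200, 300, 900], [True, True, True, False])
```

The multiplicative form is what blocks the trivial hack of emitting a very short wrong answer: the length term can only scale a reward that correctness has already earned.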

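2606.01923 2026-06-02 cs.CL cs.LG (updated version)

Segment-Driven Structural Induction and Semantic Alignment for Heterogeneous Table Representations（面向异构表格表示的片段驱动结构归纳与语义对齐）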

Woojun Jung, Susik Yoon

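Affiliations * Department of Computer Science & Engineering, Korea University, Seoul, South Korea

AI summary: Proposes NAVI, a segment-centric pretraining framework that learns representations of heterogeneous tables through Masked Segment Modeling and Entropy-driven Segment Alignment, combining segment-level structural induction with semantic alignment.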

Abstract

Real-world domains often contain heterogeneous tables whose headers vary while their underlying attribute semantics are shared, making it difficult to induce domain-specialized semantics from table-local evidence alone. Existing encoders model parts of this problem, but often underuse column-level value distributions and apply uniform objectives across attributes with different semantic roles. We propose NAVI, a segment-centric pretraining framework that treats each header-value pair as the unit for aggregating schema-level structural evidence and column-level distributional evidence. We realize this design through Masked Segment Modeling and Entropy-driven Segment Alignment, which jointly enforce structured header-value coupling and semantic alignment across stable and instance-specific attributes. Experiments on heterogeneous in-domain tables show improved reconstruction, semantic consistency, and downstream utility across evaluation settings overall.

2606.01883 2026-06-02 cs.LG cs.CV (updated version)

Beyond the Simplex: Balanced Prototype Geometry for Scorer-Agnostic Open-Set Recognition

Mayank Sharma, Rohit Kumar Mourya

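Affiliations * Indian Institute of Technology Jodhpur

AI summary: Develops a theory of balanced equal-norm prototype geometry that gives a unified analysis of open-set recognition across embedding dimensions, including d < C-1, and shows empirically that OSR performance depends strongly on the scoring rule rather than on the simplex structure alone.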

Comments 20 pages, 2 figures, 6 tables

Abstract

Open-set recognition (OSR) requires a classifier to reject inputs from unseen classes, which is essential in safety-critical settings such as medical imaging. Simplex-based methods, which fix class prototypes at the vertices of a regular simplex and then reject via a distance-ratio score, perform well empirically but lack theoretical justification, and existing analysis applies only when the embedding dimension d is at least C-1, which is the regime in which a regular simplex exists. We give a theoretical account of simplex-ratio OSR that holds in every embedding dimension, including d < C-1. Our analysis centers on balanced equal-norm codes: prototype configurations with equal lengths and zero sum, which exist for all d >= 2 and include the regular simplex as a special case. For these codes we show that an auxiliary squared ratio score has sublevel sets that are exact unions of Euclidean balls, which in turn bracket the acceptance region of the operational score; and we prove a sharp dichotomy: the prototypes attain one-distance symmetry, behaving like a regular simplex, if and only if d >= C-1, with controlled degradation governed by an explicit defect parameter below that threshold. We further show the false-acceptance rate decays exponentially in d under natural isotropy assumptions, and that the operational score is globally Lipschitz with compact acceptance regions. Empirically, we study balanced prototype geometry as both an analytic tool and a representation-learning prior, rather than as a stand-alone state-of-the-art detector. Across CIFAR and MedMNIST open-set splits, the geometry provides useful structure, but OSR performance remains strongly dependent on the scoring rule: raw ratio scores typically underperform nearest-neighbor and logit-based alternatives.
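
The definition of a balanced equal-norm code (equal lengths, zero sum) and the d >= C-1 dichotomy can be checked on a toy case. The sketch below is only an illustration chosen here, not a construction from the paper: it places C prototypes at the vertices of a regular polygon in d = 2, and the helper names are hypothetical.

```python
import math
from itertools import combinations

def polygon_code(num_classes):
    """C unit-norm prototypes in R^2 at the vertices of a regular polygon."""
    return [(math.cos(2 * math.pi * k / num_classes),
             math.sin(2 * math.pi * k / num_classes))
            for k in range(num_classes)]

def is_balanced_equal_norm(protos, tol=1e-9):
    """True when the prototypes sum to zero and all have the same norm."""
    sums = [sum(coord) for coord in zip(*protos)]
    norms = [math.hypot(*p) for p in protos]
    return all(abs(s) < tol for s in sums) and max(norms) - min(norms) < tol

def distinct_distances(protos, ndigits=9):
    """Set of pairwise Euclidean distances, rounded to absorb float error."""
    return {round(math.dist(p, q), ndigits) for p, q in combinations(protos, 2)}

triangle = polygon_code(3)  # d = 2 = C - 1: a regular simplex
pentagon = polygon_code(5)  # d = 2 < C - 1 = 4: balanced, but not one-distance
```

Both configurations are balanced and equal-norm, but only the triangle has a single pairwise distance; the pentagon has two (side and diagonal), matching the statement that one-distance symmetry requires d >= C-1.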

2606.01873 2026-06-02 cs.LG (updated version)

G2LoRA: Gradient Orthogonal Low-Rank Adaptation Framework for Graph Continual Learning on Text-Attributed Graphs

Yuhan Wang, Yibo Ding, Yutong Ye, Mufan Zhao, Wenbo Zhang, Ruijie Wang, Jianxin Li

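Affiliations * School of Computer Science and Engineering, Beihang University; Department of Statistics, Columbia University; College of Computer Science, Beijing University of Technology

AI summary: Targets catastrophic forgetting when LLM-as-Aligner models are fine-tuned sequentially on text-attributed graphs. G2LoRA unifies tasks under a single graph-text alignment objective and adds category-aware gradient projection and gradient magnitude modulation, promoting positive transfer across tasks while limiting cross-modal drift.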

Comments Accepted by KDD 2026

Abstract

LLM-as-Aligner has emerged as a prevalent pre-training paradigm for Text-Attributed Graphs (TAGs), aligning graph and text modalities into a shared embedding space via CLIP-style contrastive learning. While effective on individual downstream tasks, we observe severe catastrophic forgetting when such models are sequentially fine-tuned on streaming tasks. Although parameter-efficient fine-tuning alleviates forgetting to some extent, it remains insufficient to resolve task interference and ineffective knowledge transfer. In this work, we study graph continual learning for LLM-as-Aligner models on TAGs, with the goal of mitigating interference while promoting positive transfer across tasks. This setting introduces two fundamental challenges: (1) heterogeneous downstream tasks induce shifting optimization objectives, hindering unified fine-tuning; and (2) graph and text encoders exhibit different sensitivities to adaptation, making uncoordinated updates prone to misalignment. To address these challenges, we propose G2LoRA, a continual learning framework for TAGs. G2LoRA unifies node-, link-, and graph-level tasks under a single graph-text alignment objective, and enables consistent optimization across domain/class/task incremental modes. To reduce task interference while encouraging positive transfer, G2LoRA performs category-aware gradient projection in structured subspaces, resolving conflicting updates and enabling conditional backward transfer to balance forward and backward knowledge flow. To further prevent cross-modal drift, G2LoRA introduces gradient magnitude modulation to coordinate update rates between graph and text encoders. Extensive experiments on benchmark datasets demonstrate that G2LoRA consistently outperforms strong baselines across different backbone architectures, achieving superior continual performance and transferability.
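
The idea of projecting away conflicting gradient components can be illustrated with a generic, PCGrad-style rule. This is a swapped-in textbook technique used only to show what "resolving conflicting updates" can mean; it is not G2LoRA's category-aware projection, whose details the abstract does not give, and `project_if_conflicting` is a hypothetical name.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def project_if_conflicting(grad, ref):
    """Remove the component of `grad` along `ref` only when the two conflict.

    `ref` stands for a direction that matters to an earlier task and must be nonzero.
    """
    d = dot(grad, ref)
    if d >= 0:
        # No conflict: keep the update unchanged, which leaves room for positive transfer.
        return list(grad)
    scale = d / dot(ref, ref)
    return [g - scale * r for g, r in zip(grad, ref)]

old_task_grad = [1.0, 0.0]
new_grad = [-1.0, 2.0]  # points against the old task along the first axis
fixed = project_if_conflicting(new_grad, old_task_grad)
```

After projection the update is orthogonal to the old-task direction, so to first order it no longer undoes what the earlier task learned, while the non-conflicting component is untouched.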

2606.01868 2026-06-02 cs.LG (updated version)

Task-Induced Representational Invariances Depend on Learning Objective in Deep RL

Manu Srinath Halvagal, Sebastian Lee, SueYeon Chung

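Affiliations * Department of Physics, Harvard University; Kempner Institute, Harvard University; Center for Computational Neuroscience, Flatiron Institute

AI summary: Analyzes deep RL representations through MDP reduction theory and finds that the value-based method (DQN) learns representations invariant to MDP homomorphism symmetries, whereas the policy-gradient method (PPO) learns representations invariant to action symmetries. These differences affect transfer learning and appear in LLMs in a prompt-dependent manner.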

Abstract

Reinforcement Learning (RL) has long served as a model for goal-directed animal behavior in neuroscience. Modern deep RL has shown remarkable success across many domains, further strengthening this connection. The ability to learn abstract representations of high-dimensional state spaces underlies much of this success. However, theoretical understanding of these learned representations remains limited, hindering direct comparisons between models and animal learning. We address this gap by analyzing deep RL representations through the lens of MDP reduction theory. Investigating canonical RL algorithms in a navigation task, we find that even when performance is comparable, the value-based method (DQN) learns representations that are invariant to MDP homomorphism symmetries, while the policy-gradient method (PPO) learns representations invariant to action symmetries. These differences emerge consistently across domains, have downstream consequences for transfer learning, and appear in LLMs in a prompt-dependent manner. Our findings provide a principled approach to comparing learned representations across RL algorithms, with demonstrated practical implications and possible insights for neural coding in the brain.

2606.01863 2026-06-02 cs.LG math-ph math.MP (updated version)

Continual Learning as a Multiphase Moving-Boundary Problem

Snigdha Chandan Khilar

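Affiliations * Independent Researcher

AI summary: Inspired by the physics of melting, proposes Stefan-CL, which treats consolidated knowledge as a solid phase and unused capacity as a liquid phase, and regulates the movement of the boundary between them by controlling latent heat. The method is reported to learn continually with almost no forgetting and without storing raw data.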

2606.01861 2026-06-02 cs.LG (updated version)

A Theoretical Framework for Self-Play Theorem Proving Algorithms

Thomas Chen, Zhiyuan Li

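AI summary: Presents a theoretical framework for the self-improvement ability of self-play algorithms in theorem proving, modeling the set of theorems as a graph and introducing a conjecturer based on a reversible random walk together with a diversity measure.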

Abstract

Self-play, a type of training algorithm that enables a model to self-improve, has recently shown promising empirical results in the context of formal theorem proving using Large Language Models (LLMs). Dong & Ma (2025) instantiate self-play with two cooperating agents: a prover, which proves theorems, and a conjecturer, which generates new theorems as a curriculum to the prover. In this paper, we provide a theoretical framework for understanding the self-improvement capabilities of self-play algorithms for theorem proving. First, we formalize the set of theorems as a graph, with nodes as theorems and edges between pairs of theorems with similar semantics. We introduce a set of primitive assumptions that characterize the guarantees of a trained prover and how a conjecturer can access the structure of the graph. Second, we show that if the underlying graph of theorems is well-connected, then a prover-conjecturer system, where the conjecturing algorithm is based on a reversible random walk, is sufficient to grow the set of proved theorems exponentially. Third, motivated by an issue encountered empirically by self-play algorithms, where the conjecturer tends to generate artificially complex and non-fundamental theorems, we propose a diversity measure for a training distribution of theorems generated by a conjecturer and an improved conjecturing algorithm that locally maximizes this diversity measure by computing the diffusion similarity between neighboring theorems in the theorem graph. Finally, we describe a method to compute the diffusion similarity by using contrastive learning to embed nodes into Euclidean space and then computing the inner product between embeddings.

2606.01847 2026-06-02 cs.RO cs.LG (updated version)

The Lie We Tell: Correcting the Euclidean Fallacy in Vision Language Action Policies via Score Matching on Tangent Space

Bing-Cheng Chuang, I-Hsuan Chu, Bor-Jiun Lin, YuanFu Yang, Min Sun, Chun-Yi Lee

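Affiliations * National Taiwan University

AI summary: Targets the "Euclidean Fallacy" of diffusion-based vision-language-action policies, which represent SE(3) poses as flat R^12 vectors. The proposed Lie Diffuser Actor (LDA) injects noise through left-invariant SDEs, predicts scores in the tangent space, and retracts samples via the exponential map, which eliminates manifold drift by construction and guarantees coordinate-frame equivariance and geodesic optimality. On CALVIN ABC→D, average task length rises from 3.27 to 3.51.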

Comments ICML 2026 Accepted

Abstract

Diffusion-based Vision-Language-Action policies achieve remarkable success in robotic manipulation, yet commit a fundamental geometric error we term the Euclidean Fallacy: representing SE(3) poses as flat $\mathbb{R}^{12}$ vectors. This approximation induces (1) manifold drift violating SO(3) constraints, (2) broken equivariance under coordinate transformations, and (3) non-geodesic trajectories with excessive kinematic cost. We introduce Lie Diffuser Actor (LDA), a diffusion framework operating intrinsically on SE(3). Our method injects noise through left-invariant SDEs, predicts scores in the tangent space, and retracts samples via the exponential map. This formulation eliminates manifold drift by construction while guaranteeing coordinate-frame equivariance and geodesic optimality. On CALVIN ABC→D, LDA improves average task length from 3.27 to 3.51 (+7.3%). We further validate our method on a real robot, and the results show that it outperforms the baseline on the majority of tasks.
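
The manifold-drift point can be made concrete on the rotation part of a pose. The sketch below is a minimal illustration under stated assumptions, not the LDA implementation: it covers SO(3) only (no translation, no score network), uses a fixed perturbation instead of sampled noise, and all helper names are hypothetical.

```python
import math

def hat(w):
    """Map a 3-vector to its skew-symmetric matrix (an element of so(3))."""
    x, y, z = w
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def so3_exp(w):
    """Rodrigues' formula: exp(K) = I + sin(t)/t * K + (1 - cos(t))/t^2 * K^2, K = hat(w)."""
    t = math.sqrt(sum(c * c for c in w))
    k = hat(w)
    k2 = matmul(k, k)
    a = math.sin(t) / t if t > 1e-12 else 1.0
    b = (1.0 - math.cos(t)) / (t * t) if t > 1e-12 else 0.5
    return [[(1.0 if i == j else 0.0) + a * k[i][j] + b * k2[i][j]
             for j in range(3)] for i in range(3)]

def orthogonality_error(r):
    """Largest absolute entry of R^T R - I; zero for a valid rotation."""
    rt = [[r[j][i] for j in range(3)] for i in range(3)]
    rtr = matmul(rt, r)
    return max(abs(rtr[i][j] - (1.0 if i == j else 0.0))
               for i in range(3) for j in range(3))

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
noise = (0.3, -0.2, 0.1)  # fixed tangent-space perturbation, for reproducibility
k = hat(noise)

# Flat "Euclidean" update: add the perturbation directly to the matrix entries.
euclidean = [[identity[i][j] + k[i][j] for j in range(3)] for i in range(3)]
# Group update: right-multiply by exp(hat(noise)), which stays on SO(3).
on_manifold = matmul(identity, so3_exp(noise))
```

The flat update yields a matrix that is no longer orthogonal, while the exponential-map update remains a rotation up to floating-point error, which is the sense in which retraction removes drift by construction.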

2606.01846 2026-06-02 cs.LG (updated version)

Mos-Gen: A Generative Molecular Framework for Mosquito Insecticide Design

Lina Wang, Yaning Cui

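Affiliations * Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences; DP Technology

AI summary: Proposes Mos-Gen, a framework that couples the pretrained molecular representation model Uni-Mol with a variational autoencoder for de novo generation of disulfide-containing allicin derivatives as mosquito insecticides. In experimental validation, the hit rate among predicted positives reached 78%.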

Abstract

Mosquito-borne infectious diseases cause more than 700,000 deaths worldwide each year. The long-term use of conventional chemical insecticides has induced serious resistance problems, creating an urgent need to develop novel, highly effective, and ecologically sustainable alternatives. While existing artificial intelligence approaches in this domain have focused primarily on activity prediction and classification, they leave a critical gap in the de novo generation of novel molecular scaffolds. In this study, we propose Mos-Gen, a motif-aware generative collaborative framework that couples the pretrained molecular representation model Uni-Mol with a variational autoencoder (VAE), specifically tailored for the design of disulfide-containing allicin derivatives as mosquito insecticides. Among the generated candidates, fourteen compounds (nine predicted positives and five predicted negatives) were selected for chemical synthesis and experimental validation. The hit rate among the predicted positives reached 78%, whereas none of the predicted negatives exhibited mosquitocidal activity. These experimental results fully validated the high-precision screening capability of the Mos-Gen framework.

2606.01839 2026-06-02 cs.DC cs.AR cs.LG (updated version)

Observation, Not Prediction: Conversation-Level Disaggregated Scheduling for Agentic Serving

Jianru Ding, Ryien Hosseini, Pouya Mahdi Gholami, Mingyuan Xiang, Henry Hoffmann

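Affiliations * Anonymous Authors

AI summary: Raises the scheduling unit from the single turn to the whole conversation. A conversation has an observable two-phase structure, a compute-bound first turn followed by a memory-bound tail, which allows disaggregated scheduling without prediction and yields markedly lower latency and better energy efficiency.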

Abstract

LLM-based agents resolve a user task through many turns of dependent inference and tool calls, producing a workload whose total cost is unknown when the task arrives. Existing multi-turn systems keep the turn as the scheduling unit and decide, turn by turn, whether to disaggregate prefill from decode. That decision rests on the turn's decode length, tool behavior, and KV growth, quantities that are not observable when the scheduler must act, forcing the system to predict them. We show this dependence on prediction is imposed by the scheduling unit, not the workload. Raising the scheduling unit from the turn to the conversation converts turn-level irregularity into a stable, two-phase structure: 1) a compute-bound turn-1 prefill followed by 2) a long, memory-bound tail. Thus, with the conversation as the scheduling unit, placement reduces to reading the first-turn input length and per-decoder KV occupancy, both directly observable. We instantiate this principle in ConServe, which routes the first-turn prefill to a high-throughput prefiller, transfers the KV cache exactly once, and pins the conversation to a single decoder for its entire tail, with no learned model of decode-side cost. Against a per-turn prediction baseline, ConServe reduces p95 time-to-first-effective-token (the latency of a conversation's first user-visible output) by 51.08% and improves energy efficiency by 7.51% while preserving last-turn TBT and SLOs; mapping the two phases onto heterogeneous GPU tiers adds a further 22.75% in energy efficiency.
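
The claim that placement reduces to two observable quantities (first-turn input length and per-decoder KV occupancy) can be sketched as a small placement routine. This is an illustrative policy written for this digest, not ConServe's actual scheduler: the `Decoder` class, the lowest-occupancy rule, and the capacity check are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Decoder:
    name: str
    kv_capacity: int  # tokens of KV cache the decoder can hold
    kv_used: int = 0
    conversations: list = field(default_factory=list)

    @property
    def occupancy(self):
        """Fraction of the KV cache currently in use."""
        return self.kv_used / self.kv_capacity

def place_conversation(conv_id, first_turn_tokens, decoders):
    """Pin a conversation to the least-occupied decoder that can fit its first-turn KV."""
    candidates = [d for d in decoders
                  if d.kv_used + first_turn_tokens <= d.kv_capacity]
    if not candidates:
        return None  # hypothetical policy: the caller queues or rejects the conversation
    target = min(candidates, key=lambda d: d.occupancy)
    target.kv_used += first_turn_tokens  # KV cache arrives once, after the first-turn prefill
    target.conversations.append(conv_id)
    return target.name

decoders = [Decoder("dec-a", 10_000, kv_used=6_000),
            Decoder("dec-b", 10_000, kv_used=2_000)]
first = place_conversation("conv-1", 3_000, decoders)
second = place_conversation("conv-2", 3_500, decoders)
```

Both inputs to the decision are read directly at arrival time; nothing about future decode length or tool behavior is predicted, and the conversation stays on the chosen decoder for its remaining turns.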

2606.01838 2026-06-02 cs.CL cs.AI cs.LG (updated version)

LayerRoute: Input-Conditioned Adaptive Layer Skipping via LoRA Fine-Tuning for Agentic Language Models

Prateek Kumar Sikdar

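Affiliations * Accenture

AI summary: Proposes LayerRoute, which adds a lightweight router and LoRA adapters to each transformer block so that layers are skipped adaptively by input type (tool call versus planning/reasoning). With only 0.22% additional trainable parameters, it reaches a 12.91% skip differential while improving quality.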

Comments 10 pages, 3 figures, 4 tables

Abstract

Agentic language model systems alternate between two structurally distinct step types: structured tool calls (short, deterministic, low perplexity) and open-ended planning/reasoning steps (long, complex, high perplexity). Despite this heterogeneity, current inference systems apply identical compute to every step. We introduce LayerRoute, a lightweight adapter that learns to selectively skip transformer blocks on a per-input basis. LayerRoute augments each of the 24 transformer blocks in Qwen2.5-0.5B-Instruct with: (1) a per-layer router (~897 parameters, Linear(896,1)) that outputs a hard binary gate via the straight-through estimator, and (2) LoRA adapters (rank 8, ~1.08M parameters) on the Q/K/V/O attention projections. The backbone weights remain frozen. A single end-to-end training pass on agentic data (Hermes, Glaive, GSM8K, Turing) with a gate regularisation term forces the system to discover which blocks are skippable per input type. After 3,000 steps (6.4 minutes on an A100 40GB), LayerRoute achieves a 12.91% skip differential: tool calls skip 15.25% of FLOPs while planning steps skip only 2.34%, using only 1.10M trainable parameters (0.22% of the 494M backbone). Quality improves over the base model due to LoRA adaptation, with perplexity delta of -1.29 on tool calls and -1.30 on planning.
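
The parameter counts quoted in the abstract can be reproduced with a few lines of arithmetic. The hidden size, block count, and LoRA rank come from the abstract; the key/value projection width of 128 is an assumption about the Qwen2.5-0.5B configuration (2 KV heads of size 64) and should be checked against the model's config file.

```python
HIDDEN = 896      # from the abstract: the router is Linear(896, 1)
NUM_BLOCKS = 24   # from the abstract
RANK = 8          # from the abstract
KV_DIM = 128      # assumption: 2 key/value heads of size 64 in Qwen2.5-0.5B

# A Linear(896, 1) router has 896 weights plus 1 bias.
router_per_block = HIDDEN + 1

# LoRA adds rank * (d_in + d_out) parameters per adapted projection.
lora_per_block = (
    RANK * (HIDDEN + HIDDEN)    # Q projection
    + RANK * (HIDDEN + KV_DIM)  # K projection
    + RANK * (HIDDEN + KV_DIM)  # V projection
    + RANK * (HIDDEN + HIDDEN)  # O projection
)

router_total = NUM_BLOCKS * router_per_block
lora_total = NUM_BLOCKS * lora_per_block
trainable = router_total + lora_total
fraction = trainable / 494e6          # share of the 494M-parameter backbone
skip_differential = 15.25 - 2.34      # tool-call skip rate minus planning skip rate
```

Under these assumptions the LoRA adapters come to about 1.08M parameters and the total to about 1.10M, or roughly 0.22% of the backbone, consistent with the figures in the abstract; the skip differential is simply the gap between the two reported skip rates.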

2606.01833 2026-06-02 cs.LG cs.AI (updated version)

Multilinguality of Large Language Models from a Structural Perspective（从结构视角看大语言模型的多语言性）

Haruki Sakajo, Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe

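Affiliations * Nara Institute of Science and Technology

AI summary: Studies the multilinguality of large language models through an analysis of representation structure. Low-resource languages differ structurally from English more than high- and mid-resource languages do, and language-specific post-training changes the structure while preserving the relationships between languages.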

2606.01799 2026-06-02 cs.LG stat.ML (updated version)

Tree-Guided Identify-Then-Exploit: A Unified Framework of Best Arm Identification and Regret Minimization for Dueling Bandits

Pu Wang, Yao-Xiang Ding

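Affiliations * State Key Lab of CAD&CG

AI summary: For N-armed stochastic dueling bandits under the Condorcet-winner assumption, proposes Tree-Guided Identify-Then-Exploit (TG-ITE), a unified framework in which a shared tree-guided identification step finds a high-confidence incumbent within O(N) comparisons and objective-specific exploitation strategies follow. It achieves O(N) sample complexity for best-arm identification, O(N) weak regret, and O(N log T) strong regret, and removes the O(log N) sub-optimality gap of the existing approach.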

Abstract

We study $N$-armed stochastic dueling bandits under the Condorcet-winner assumption, where three widely adopted objectives are considered: best-arm identification (BAI), weak regret, and strong regret. We propose Tree-Guided Identify-Then-Exploit (TG-ITE), the first unified framework to tackle all these objectives to our knowledge. Without requiring stronger assumptions, we propose a shared tree-guided identification approach to find a high-confidence incumbent within $O(N)$ comparisons. We further propose varied exploitation strategies to utilize this warm-start stage to optimize the specific objectives at hand. This methodology enables our approach to (1) achieve $O(N)$ sample complexity in BAI without commonly adopted stronger assumptions; (2) build the first winner-stays-style algorithm to achieve $O(N)$ weak regret; (3) enjoy the same $O(N \log T)$ guarantee as specialized strong-regret approaches; (4) realize the joint optimization of BAI and weak regret with $O(N)$ guarantees for both, eliminating the sub-optimal gap of $O(\log N)$ in the existing approach. Our results provide evidence that the trade-off between BAI and regret minimization is relatively benign in dueling bandits.
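
To see why a tree-shaped identification phase can cost only O(N) comparisons, consider a plain single-elimination bracket: N arms need exactly N-1 matches. The sketch below is that generic knockout scheme, swapped in for illustration; it is not the TG-ITE procedure and carries none of its guarantees. The preference model and the fixed number of duels per match are assumptions.

```python
import random

def knockout_incumbent(arms, duel, duels_per_match=15):
    """Single-elimination bracket; returns (winner, number_of_matches)."""
    matches = 0
    alive = list(arms)
    while len(alive) > 1:
        nxt = []
        for i in range(0, len(alive) - 1, 2):
            a, b = alive[i], alive[i + 1]
            # duel(a, b) is True when a beats b; an odd duel count rules out ties.
            wins_a = sum(duel(a, b) for _ in range(duels_per_match))
            nxt.append(a if 2 * wins_a > duels_per_match else b)
            matches += 1
        if len(alive) % 2 == 1:
            nxt.append(alive[-1])  # the odd arm out gets a bye
        alive = nxt
    return alive[0], matches

rng = random.Random(0)

def noisy_duel(a, b):
    """Hypothetical preference model: the lower-indexed arm wins with probability 0.8."""
    p_a = 0.8 if a < b else 0.2
    return rng.random() < p_a

winner, matches = knockout_incumbent(range(8), noisy_duel)
```

With a fixed number of duels per match the total comparison count is linear in N; the returned arm would then serve as the incumbent that an exploitation phase starts from.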

2606.01774 2026-06-02 cs.LG cs.AI (updated version)

Shortcuts to Nowhere: Demystifying Deep Spurious Regression（捷径通往虚无：揭秘深度虚假回归）

Guanrong Xu, Jessica Li, Hao Wang, Yuzhe Yang

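Affiliations * University of California, Los Angeles; Rutgers University; Yang AI Lab

AI summary: Addresses spurious correlations in continuous prediction by exploiting the similarity among spurious attributes in both label and feature spaces to calibrate distributions, improving generalization under distribution shift.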

Abstract

Real-world regression often exhibits shortcuts: attributes that are spuriously correlated with continuous targets in training, yet unreliable under deployment shifts; regressing targets using such shortcuts may fail catastrophically at test time. Existing studies on spurious correlations focus primarily on classification, where labels are categorical and groups are naturally defined. However, many real-world tasks require continuous prediction, where hard label boundaries or discrete group-label pairs do not exist. We define Deep Spurious Regression (DSR) as learning from regression data with attribute-label confounding, addressing continuous spurious correlations, and generalizing to all attribute-label combinations at test time. Motivated by the intrinsic difference between classification and regression shortcuts, we propose to exploit the similarity among spurious attributes in both label and feature spaces, thereby accounting for nearby targets and related groups while calibrating both label and learned feature distributions across attributes. Extensive experiments on common real-world DSR datasets that span computer vision, environmental sensing, and large language model (LLM) regression verify the superior performance of our strategies. Our work fills the gap in benchmarks and techniques for studying spurious correlations in continuous prediction.

2606.01722 2026-06-02 cs.LG cs.AI cs.DC (updated version)

Post-Deterministic Distributed Systems: A New Foundation for Trustworthy Autonomous Infrastructure

Jun He, Deying Yu

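Affiliations * OpenKedge Inc.

AI summary: Introduces Post-Deterministic Distributed Systems (PDDS), a model for coordinating heterogeneous environments in which deterministic code, stochastic models, and autonomous agents coexist, and defines five architectural pillars and a new taxonomy of failure classes.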

Comments 8 pages, 1 table

Abstract

For decades, distributed systems have typically assumed that correct participants execute protocol-specified behavior with stable, externally defined, and deterministic semantics. Classical theory has extensively parameterized network timing, communication topologies, and failure domains, but this participant model has remained comparatively fixed. The integration of autonomous reasoning engines, stochastic model-driven agents, and policy-driven actors into cloud control planes, incident response systems, and financial infrastructure challenges the universality of this assumption. These agents often produce divergent reasoning paths, distinct operational traces, and heterogeneous internal representations while achieving semantically equivalent and correct outcomes. In this paper, we introduce Post-Deterministic Distributed Systems (PDDS) as a research and engineering model for coordinating heterogeneous environments where deterministic code, stochastic models, and autonomous agents coexist. We show that classical distributed computing models form a zero-ambiguity special case of this participant-general model. We do not argue that deterministic systems disappear; rather, deterministic execution can no longer serve as the universal participant assumption for autonomous infrastructure. Finally, we outline five architectural pillars of post-deterministic infrastructure: Protocol-Driven Development, Verifiable Agentic Infrastructure, Autonomous State Control Planes, Semantic Quorum Assurance, and Epistemic State Replication. Epistemic State Replication extends persistence and consistency models from data visibility to knowledge visibility, enabling agentic memory, Verifiable Semantic Rollback, and coherence across reasoning participants. We also define a taxonomy of failure classes that arise in this setting.

2606.01720 2026-06-02 cs.LG (updated version)

CANARY: Zero-Label Detection of Fine-Tuning Contamination in Language Models（CANARY: 语言模型中微调污染的无标签检测）

Swapnil Parekh

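Affiliations * Switzerland

AI summary: Proposes CANARY, which analyzes hidden-state differences through a sparse autoencoder to detect fine-tuning data contamination without labels. It reports AUROC = 1.000 at 1% contamination and supports detection, verification, prioritization, and remediation.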

Abstract

Adversaries can implant latent harmful behavior by poisoning as few as 1% of fine-tuning examples. The contamination is invisible to every output-level defense: harmful behavior lies dormant in the model's hidden-state geometry and does not appear in generated text until contamination exceeds 7.5%. We introduce CANARY (Contamination Auditor via Neural Activation Representation Yield), a zero-label checkpoint auditor that detects this hidden shift directly from two forward passes over an unlabeled prompt set. CANARY projects the hidden-state difference through a Sparse Autoencoder, filtering style noise to isolate meaningful semantic drift. It achieves AUROC = 1.000 at 1% contamination (95% CI = [0.997, 1.000]; Cohen's d = 3.28) across four model architectures and two training paradigms, 7.5x below where any output-level method fires, with zero false positives on benign fine-tuning and full robustness to style-matching and gradient-noise adaptive attacks. The same SAE feature basis drives a complete governance pipeline: SAE-filtered amplification surfaces latent harm at a 5x higher rate than standard generation; score-ranked prompts yield 4.2x red-teaming lift; and suppressing a handful of contamination-specific features at inference time reduces harm from 70% to 10% with no perplexity penalty. CANARY is the first zero-label framework to detect, verify, prioritize, and remediate supply-chain contamination from hidden states alone.

2606.01694 2026-06-02 cs.CV cs.AI cs.LG cs.MM (updated version)

Understanding Identity Continuity in Thermal Video through Scene-Level Consistency

Wei-Chieh Sun, Gyungmin Ko, Heejae Kwon, Hsiang-Wei Huang, Jenq-Neng Hwang

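Affiliations * Department of Electrical and Computer Engineering, Information Processing Lab, University of Washington, USA

AI summary: Addresses identity fragmentation in thermal pedestrian multi-object tracking with a lightweight post-processing method that restores identity continuity through online short-gap remapping and offline tracklet relinking, improving IDF1 on the PBVS Thermal Pedestrian MOT benchmark.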

Comments Accepted to CVPR 2026 Workshop on SVC. Published in CVPR Workshops proceedings

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026, pp. 1411-1419

Abstract

Thermal pedestrian MOT remains challenging because weak appearance cues and frequent detection interruptions cause severe trajectory fragmentation. We study whether lightweight post-processing can recover identity continuity without relying on heavy re-identification models or complex online association. Starting from a YOLOv8 and SORT baseline, we add a modular identity-repair backend consisting of online short-gap remapping and offline tracklet relinking based on temporal, spatial, motion, and border cues. Controlled ablations on a fixed validation split and evaluation on the official PBVS Thermal Pedestrian MOT benchmark show that the main identity gains arise from conservative relinking, improving IDF1 from 82.25 to 84.93 while preserving MOTA, whereas many heuristic thresholds remain stable across broad operating ranges. These results suggest that, in low-information thermal imagery, robust identity recovery can be achieved more effectively through high-precision trajectory relinking than through increasing tracker complexity. These results provide a controlled analysis of identity recovery in thermal video, showing that scene-level spatial-temporal consistency plays a dominant role in identity continuity compared to local frame-to-frame association.
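
Offline tracklet relinking from temporal, spatial, and motion cues can be sketched as a conservative gating test followed by a greedy pass. This is a simplified illustration, not the paper's backend: border cues are omitted, the thresholds `max_gap` and `max_dist` are hypothetical values, and the `Tracklet` fields are assumptions about what a tracker would export.

```python
import math
from dataclasses import dataclass

@dataclass
class Tracklet:
    track_id: int
    start_frame: int
    end_frame: int
    start_xy: tuple  # box centre at the first frame
    end_xy: tuple    # box centre at the last frame
    velocity: tuple  # pixels per frame near the end of the tracklet

def can_relink(earlier, later, max_gap=30, max_dist=40.0):
    """Accept only a short temporal gap and a small constant-velocity prediction error."""
    gap = later.start_frame - earlier.end_frame
    if gap <= 0 or gap > max_gap:
        return False
    predicted = (earlier.end_xy[0] + earlier.velocity[0] * gap,
                 earlier.end_xy[1] + earlier.velocity[1] * gap)
    return math.dist(predicted, later.start_xy) <= max_dist

def relink(tracklets, **thresholds):
    """Greedy offline pass that maps each later fragment onto an earlier track's ID."""
    id_map = {t.track_id: t.track_id for t in tracklets}
    ordered = sorted(tracklets, key=lambda t: t.start_frame)
    used = set()  # earlier tracklets that already have a successor
    for later in ordered:
        for earlier in ordered:
            if earlier is later or earlier.track_id in used:
                continue
            if can_relink(earlier, later, **thresholds):
                id_map[later.track_id] = id_map[earlier.track_id]
                used.add(earlier.track_id)
                break
    return id_map

a = Tracklet(1, 0, 50, (0.0, 0.0), (100.0, 50.0), (2.0, 1.0))
b = Tracklet(2, 60, 120, (118.0, 61.0), (200.0, 100.0), (2.0, 1.0))
c = Tracklet(3, 55, 90, (400.0, 300.0), (450.0, 320.0), (1.0, 0.5))
ids = relink([a, b, c])
```

Fragment 2 starts ten frames after track 1 ends, close to where constant-velocity motion would put it, so it inherits ID 1; fragment 3 is far away and keeps its own ID. Tight gates of this kind favor precision over recall, in line with the abstract's emphasis on conservative relinking.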


2606.01691 2026-06-02 cs.CR cs.LG 版本更新

IstGPT: LLM-based Anomaly Detection for Spatial-Temporal Graph in Industrial Systems

IstGPT：基于LLM的工业系统时空图异常检测

Yuchen Zhang, Ning Xi, Pengbin Feng, Shigang Liu, Jianfeng Ma, Yulong Shen, Yanan Sun, Xiaolin Zhou

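Affiliations: School of Cyber Engineering, Xidian University; School of Science, Computing and Engineering Technologies, Swinburne University of Technology; School of Computer Science and Technology, Xidian University

AI summary: Proposes IstGPT, described as the first industrial anomaly detection tool that combines large language models with graph learning. It extracts sensor-actuator dependency graphs from multi-modal knowledge and uses improved graph neural networks for real-time anomaly detection, achieving the best F1 and eTaF1 scores on 9 datasets.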

AI中文摘要

工业互联网系统面临来自复杂工业控制系统（ICS）攻击的日益增长的威胁，导致严重的安全事件。然而，由于传感器和执行器之间的复杂依赖关系，现有工具在实时异常检测方面效果有限。为了解决这个问题，我们提出了IstGPT，这是首个基于大语言模型和图学习的工业异常检测工具，能够针对广泛的ICS攻击提供实时保护。IstGPT实现了对工业信息物理系统中时空依赖关系的细粒度精确建模。它首先利用工业多模态知识，包括操作数据、技术文档和系统图，通过多阶段提示工程提取传感器-执行器依赖图。然后，LLM-Optimation基于节点准确性、边缘一致性和逻辑连贯性迭代优化图。最后，IstGPT将改进的图神经网络与编码器-解码器架构相结合，通过重构误差检测异常。我们在9个数据集上评估了IstGPT与12个最先进基线模型的性能，包括2个公共数据集、6个模拟数据集和一个真实机器人手臂数据集。IstGPT在所有九个数据集上取得了最佳的F1分数和eTaF1（一种较新的时间感知指标）。我们进一步讨论了在真实工业场景中部署IstGPT的可行性。

英文摘要

Industrial Internet systems face increasing threats from sophisticated industrial control system (ICS) attacks, resulting in critical safety incidents. However, existing tools exhibit limited effectiveness in real-time anomaly detection due to the complex dependencies among sensors and actuators. To tackle this, we present IstGPT, the first industrial anomaly detection tool based on LLMs and graph learning to provide real-time protection against a wide range of ICS attacks. IstGPT achieves fine-grained and precise modeling of spatial-temporal dependencies in industrial cyber-physical systems. It first leverages industrial multi-modal knowledge, including operational data, technical documents, and system diagrams, to extract sensor-actuator dependency graphs via multi-stage prompt engineering. Then, LLM-Optimation iteratively refines the graph based on node accuracy, edge consistency, and logical coherence. Finally, IstGPT integrates improved graph neural networks with an encoder-decoder architecture to detect anomalies via reconstruction errors. We evaluate IstGPT against 12 state-of-the-art baselines on 9 datasets, including 2 public, 6 simulated, and 1 real-world robotic arm dataset. IstGPT achieves the best F1-scores and eTaF1 (a newer time-aware metric) across all nine datasets. We further discuss the feasibility of deploying IstGPT in real-world industrial scenarios.


2606.01682 2026-06-02 cs.CL cs.AI cs.LG 版本更新

Off-the-Shelf LLMs as Process Scorers: Training-Free Alternative to PRMs for Mathematical Reasoning

现成的大语言模型作为过程评分器：数学推理中PRM的无训练替代方案

Atoosa Chegini, Soheil Feizi

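Affiliations: Department of Computer Science, University of Maryland

AI summary: Proposes Chunk-Level Guided Generation, which uses an off-the-shelf large language model as a process scorer. With fixed-length chunk scoring and a contrastive selection rule, it matches or exceeds PRM-guided search on mathematical reasoning without any training.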

AI中文摘要

使用更强的评分器从多个小模型样本中选择最佳响应是一种简单的推理时策略，但当小模型已经陷入错误推理路径时，该策略会失败。PRM引导搜索通过在生成过程中对候选延续进行评分来避免这一问题，但需要经过步骤级标签训练的奖励模型。我们提出Chunk-Level Guided Generation，一种无训练的替代方案，使用现成的大语言模型作为过程评分器。在每一步，小模型采样k个固定长度的候选块，而大模型使用似然度对候选块进行评分，无需生成任何文本。选中的块在下一步之前被提交，从而在错误传播之前引导生成。我们用两种选择规则实例化该框架：似然引导选择（LGS），选择具有最高长度归一化大模型对数概率的块；以及对比引导选择（CGS），减去小模型的对数概率，以偏向于大模型偏好与小模型偏好不同的块。我们证明，由于系统性的长度偏差（即使在长度归一化后仍然存在），使用大模型似然度对可变长度推理步骤进行评分是不可靠的，而固定长度块避免了这一混淆。在GSM8K、MATH、Minerva Math、AMC23和AIME24上，使用Qwen2.5-32B引导Qwen2.5-1.5B以及Llama-3.1-70B引导Llama-3.2-1B，CGS在多数投票上最多提升28个百分点，并且在匹配的引导预算下，在大多数基准测试中匹配或超越了Qwen2.5-Math-PRM-72B引导搜索，且无需奖励模型训练。使用Qwen2.5-72B引导Qwen2.5-7B，CGS在k=16时在MATH上达到81.8%，在Minerva Math上达到63.6%，超过多数投票4-6个百分点。最后，Chunk-Level Guided Generation产生的推理轨迹比PRM引导搜索短得多。

英文摘要

Selecting the best response from multiple small-model samples using a stronger scorer is a simple inference-time strategy, but fails when the small model has already committed to incorrect reasoning paths. PRM guided search avoids this by scoring candidate continuations during generation, but requires a reward model trained with step-level labels. We propose Chunk-Level Guided Generation, a training-free alternative that uses an off-the-shelf large language model as a process scorer. At each step, a small model samples k fixed-length candidate chunks, while the larger model scores the candidates using likelihoods without generating any text. The selected chunk is committed before the next step, steering generation before errors can propagate. We instantiate this framework with two selection rules: Likelihood-Guided Selection (LGS), which selects the chunk with the highest length-normalized large-model log-probability, and Contrastive-Guided Selection (CGS), which subtracts the small model's log-probability to favor chunks where the large model's preference diverges from the small model's. We show that scoring variable-length reasoning steps with large-model likelihoods is unreliable due to a systematic length bias that persists even after length normalization, and that fixed-length chunks avoid this confound. On GSM8K, MATH, Minerva Math, AMC23, and AIME24 with Qwen2.5-1.5B guided by Qwen2.5-32B and Llama-3.2-1B guided by Llama-3.1-70B, CGS outperforms majority voting by up to 28 pp and, under matched guidance budgets, matches or outperforms Qwen2.5-Math-PRM-72B guided search on most benchmarks without reward-model training. With Qwen2.5-7B guided by Qwen2.5-72B, CGS reaches 81.8% on MATH and 63.6% on Minerva Math at k=16, surpassing majority voting by 4--6 pp. Finally, Chunk-Level Guided Generation produces substantially shorter reasoning traces than PRM guided search.
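The two selection rules described above can be sketched on precomputed token log-probabilities. This is an illustrative sketch, not the authors' code: the `Candidate` layout and function names are hypothetical, and dividing the contrastive score by chunk length is an assumption (with fixed-length chunks it does not change the ranking).

```python
from typing import List, Sequence, Tuple

# (chunk_text, large_model_token_logprobs, small_model_token_logprobs)
Candidate = Tuple[str, Sequence[float], Sequence[float]]


def lgs_score(large_lp: Sequence[float]) -> float:
    """Likelihood-Guided Selection: length-normalized large-model log-prob."""
    return sum(large_lp) / len(large_lp)


def cgs_score(large_lp: Sequence[float], small_lp: Sequence[float]) -> float:
    """Contrastive-Guided Selection: large-model minus small-model log-prob."""
    return (sum(large_lp) - sum(small_lp)) / len(large_lp)


def select_chunk(candidates: List[Candidate], rule: str = "cgs") -> str:
    """Return the text of the highest-scoring candidate chunk."""
    if rule == "lgs":
        best = max(candidates, key=lambda c: lgs_score(c[1]))
    else:
        best = max(candidates, key=lambda c: cgs_score(c[1], c[2]))
    return best[0]
```

In a full decoding loop, the selected chunk would be appended to the prefix before the small model samples the next k candidates. The two rules can disagree: a chunk that the small model already finds very likely earns little contrastive credit even if the large model also likes it.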


2606.01680 2026-06-02 cs.DC cs.LG cs.NI 版本更新

Don't Let a Few Network Failures Slow the Entire AllReduce

不要让少数网络故障拖慢整个 AllReduce

Peiqing Chen, Jiedong Jiang, Nengneng Yu, Yuefeng Wang, Sixian Xiong, Wei Wang, Zaoxing Liu

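Affiliations: University of Maryland, College Park; Utrecht University; Kyoto University

AI summary: To address AllReduce slowdowns caused by network failures, the paper derives an information-theoretic lower bound and proposes OptCC, a four-stage pipelined algorithm that stays close to fault-free performance even with up to 50% bandwidth loss.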

AI中文摘要

网络故障是大规模 GPU 集群中最常见的硬件故障之一，也是训练任务中断的主要原因。现代集体通信库（如 NCCL）通过将流量重新路由到同一服务器上幸存的 NIC 来缓解网络故障，以降低节点间带宽换取不间断训练。然而，降级后的服务器仍处于标准环形算法的关键路径上，拖慢了整个集体通信。我们首次给出了非对称网络带宽下 AllReduce 完成时间的信息论下界，并表明当落后者保留至少一半原始带宽时，相对于无故障最优值的不可避免开销仅为 O(1/p)（p 为 GPU 数量）。然后，我们设计了 OptCC，一种接近该下界的四阶段流水线 AllReduce 算法。SimAI 上的实验证实，OptCC 缩小了现有容错方案留下的差距：在实际网络故障（带宽损失高达 50%）下，OptCC 的 AllReduce 完成时间在 NCCL 无故障环形性能的 2-6% 以内，而现有最优方案的开销高达 57%。

英文摘要

Network failures are among the most frequent hardware faults in large-scale GPU clusters and a leading cause of training-job interruptions. Modern collective communication libraries such as NCCL mitigate network failures by rerouting traffic through surviving NICs on the same server, trading reduced inter-node bandwidth for uninterrupted training. However, the degraded server remains on the critical path of the standard ring algorithm, slowing the entire collective. We present the first information-theoretic lower bound on AllReduce completion time under asymmetric network bandwidth and show that when the straggler retains at least half of its original bandwidth, the unavoidable overhead relative to the fault-free optimum is only O(1/p) for p GPUs. We then design OptCC, a four-stage pipelined AllReduce algorithm that approaches this lower bound. Experiments on SimAI confirm that OptCC closes the gap left by existing fault-tolerant schemes: under practical network failures with up to 50% bandwidth loss, OptCC completes AllReduce within 2-6% of NCCL's fault-free ring performance, whereas the state-of-the-art incurs up to 57% overhead.
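To see why one degraded server slows the whole collective, consider the standard bandwidth-only cost model for ring AllReduce, in which each of p participants sends and receives 2(p-1)/p times the buffer size and every step waits for the slowest link. The sketch below uses that textbook model only; it ignores latency terms and is neither OptCC nor the paper's lower bound, and the function name is hypothetical.

```python
from typing import Sequence


def ring_allreduce_time(size_bytes: float, bandwidths: Sequence[float]) -> float:
    """Bandwidth-only ring AllReduce time in seconds.

    bandwidths holds each participant's link bandwidth in bytes per second.
    The ring advances in lockstep, so the slowest link sets the pace.
    """
    p = len(bandwidths)
    return 2 * (p - 1) / p * size_bytes / min(bandwidths)


healthy = ring_allreduce_time(1e9, [50e9] * 8)
degraded = ring_allreduce_time(1e9, [50e9] * 7 + [25e9])
slowdown = degraded / healthy  # one link at half bandwidth
```

Under this model, a single participant at half bandwidth doubles the completion time of the plain ring, which is the gap that failure-aware schedules try to close.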


2606.01672 2026-06-02 cs.LG 版本更新

RDA: Reward Design Agent for Reinforcement Learning

RDA：用于强化学习的奖励设计智能体

Hojoon Lee, Ajay Subramanian, Ben Abbatematteo, Vijay Veerabadran, Pedro Matias, Karl Ridgeway, Nitin Kamra

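Affiliations: University of California, Berkeley; DeepMind

AI summary: Proposes RDA, a reward design agent based on a vision-language model that iteratively refines reward functions through task decomposition, visual trajectory evaluation, and failure-mode summarization, yielding policies on manipulation tasks that follow instructions more closely.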

Comments Accepted to RLC'26

AI中文摘要

强化学习已经能够获得令人印象深刻的机器人技能，但通常需要手工设计的奖励函数，这些函数设计缓慢且难以与人类意图对齐。最近的工作，如Eureka，通过使用LLM从任务描述中迭代生成和优化奖励代码来自动化奖励设计。然而，它们依赖于粗糙的反馈信号，如成功率，这些信号对学习到的行为提供的语义洞察很少。因此，它们训练的策略达到了最终目标，但经常与任务指令对齐不良。我们引入了奖励设计智能体（RDA），一个基于VLM的智能体框架，将语义理解注入奖励设计。RDA分解任务，视觉评估轨迹，总结失败模式，并迭代修订奖励代码以更好地与任务指令对齐。在ManiSkill的12个桌面操作任务和HumanoidBench的4个全身操作任务中，RDA产生的策略在指令对齐方面显著优于其他基线，同时实现了相当的任务成功率。视频和生成的奖励代码可在https://nitinkamra1992.github.io/reward-design-agent获取。

英文摘要

Reinforcement learning has enabled the acquisition of impressive robotic skills, but typically requires hand-crafted reward functions that are slow to design and difficult to align with human intentions. Recent work, such as Eureka, automates reward design by using an LLM to iteratively generate and refine reward code from task descriptions. However, they rely on coarse feedback signals such as success rate, which provide little semantic insight into the learned behavior. As a result, their trained policies achieve the final goal but are frequently poorly aligned with task instructions. We introduce the Reward Design Agent (RDA), a VLM-based agentic framework that injects semantic understanding into reward design. RDA decomposes tasks, visually evaluates trajectories, summarizes failure modes, and iteratively revises reward code to better align with task instructions. Across 12 tabletop manipulation tasks from ManiSkill and 4 whole-body manipulation tasks from HumanoidBench, RDA produces policies substantially more instruction-aligned than those of other baselines, while achieving comparable task success rates. Videos and the generated reward code are available on https://nitinkamra1992.github.io/reward-design-agent.


2606.01667 2026-06-02 cs.LG 版本更新

门控滤波器而非消息：预传播图神经网络中的节点-通道混合

Zichao Yue, Zhiru Zhang

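Affiliations: School of Electrical and Computer Engineering, Cornell University

AI summary: To address the weak performance of complex hop aggregators in pre-propagation graph neural networks, the paper proposes FilterMoE, which routes learnable Chebyshev filter experts jointly over nodes and channels with a 3D gating tensor, improving the average test score by 1.53 points across 11 homophilic and heterophilic benchmarks.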

AI中文摘要

预传播图神经网络（PPGNNs）将所有图相关的计算推入预处理步骤，仅对生成的密集跳特征进行训练，这使得它们具有高度可扩展性。该领域的一个难题是，更复杂的跳聚合器并不总是可靠地优于简单的聚合器：在许多基准测试中，基于普通MLP的聚合器与跳注意力变体相当或更优。我们从图滤波器的角度重新审视这一行为。在预计算的扩散基上，现有的PPGNNs主要区别在于滤波器系数如何在节点和特征通道之间共享，而非仅仅在原始聚合器容量上。基于MLP的架构学习通道相关的滤波器，这些滤波器在节点之间大致共享，而基于跳注意力的架构学习节点相关的混合，这些混合在通道之间大致共享。这揭示了标准PPGNN设计中的一个缺失机制：在预传播计算约束下，联合节点和通道自适应滤波。我们提出FilterMoE，一种混合专家PPGNN，其中一小批可学习的切比雪夫滤波器专家通过3D门控张量在节点和通道上联合路由。在11个同质和异质基准测试中，FilterMoE在9个数据集上优于强PPGNN基线，并在所有三个大规模基准测试中排名第一，平均测试分数提高了1.53分。这些结果确立了联合节点-通道滤波器路由作为数据集特定跳聚合器选择的稳健替代方案。

英文摘要

Pre-propagation graph neural networks (PPGNNs) push all graph-dependent computation into a preprocessing step and train only on the resulting dense hop features, which makes them highly scalable. A puzzle in this regime is that more complex hop aggregators do not reliably outperform simpler ones: on many benchmarks, a plain MLP-based aggregator matches or beats hop-attention variants. We revisit this behavior from a graph-filter perspective. Over a precomputed diffusion basis, existing PPGNNs differ mainly in how filter coefficients are shared across nodes and feature channels, rather than simply in raw aggregator capacity. MLP-based architectures learn channel-dependent filters that are largely shared across nodes, while hop-attention-based architectures learn node-dependent mixtures that are largely shared across channels. This reveals a missing regime in standard PPGNN designs: joint node- and channel-adaptive filtering under the pre-propagation computational contract. We propose FilterMoE, a mixture-of-experts PPGNN in which a small bank of learnable Chebyshev filter experts is routed jointly over nodes and channels by a 3D gating tensor. Across eleven homophilic and heterophilic benchmarks, FilterMoE outperforms strong PPGNN baselines on nine datasets and ranks first on all three large-scale benchmarks, improving the average test score by 1.53 points. These results establish joint node-channel filter routing as a robust alternative to dataset-specific hop-aggregator selection.
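The sharing patterns contrasted above can be written compactly. The following is a hedged sketch in notation chosen here rather than taken from the paper: $H^{(k)} = T_k(\tilde{L})X$ is the precomputed $k$-th Chebyshev hop feature, $\theta_{e,k}$ are the coefficients of expert $e$, and the gating tensor $G$ is assumed to be normalized over experts.

```latex
% Assumed notation: nodes n, channels c, experts e = 1..E, hops k = 0..K.
y_{n,c} \;=\; \sum_{e=1}^{E} G_{n,c,e} \sum_{k=0}^{K} \theta_{e,k}\, H^{(k)}_{n,c},
\qquad \sum_{e=1}^{E} G_{n,c,e} = 1 .

% Effective filter coefficients seen by entry (n, c):
\alpha_{n,c,k} \;=\; \sum_{e=1}^{E} G_{n,c,e}\, \theta_{e,k}

% Channel-dependent, node-shared regime (MLP-like):      \alpha_{n,c,k} = \alpha_{c,k}
% Node-dependent, channel-shared regime (hop attention): \alpha_{n,c,k} = \alpha_{n,k}
```

Read this way, a gate indexed by both $n$ and $c$ lets the coefficients vary along both axes, which neither restricted regime allows.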


2606.01655 2026-06-02 math.OC cs.AI cs.LG stat.ML 版本更新

MINTS: Minimalist Thompson Sampling

MINTS: 极简汤普森采样

Kaizheng Wang

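Affiliations: Department of IEOR and Data Science Institute, Columbia University

AI summary: To address the limitations of Bayesian methods under complex structural constraints, the paper proposes a minimalist Bayesian framework that places a prior only on the location of the optimum and eliminates nuisance parameters through profile likelihood. Instantiated as the MINTS algorithm, it attains near-optimal non-asymptotic regret guarantees and an exact almost-sure asymptotic regret characterization for multi-armed bandits with mean constraints.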

Comments 29 pages

2606.01645 2026-06-02 stat.ML cs.LG 版本更新

Self-Regulating Annealing in Heavy-Tailed Diffusion Models

重尾扩散模型中的自调节退火

Keito Wakatsuki, Hideaki Shimazaki

发表机构 * Keito Wakatsuki（凯托·瓦卡苏基）； Hideaki Shimazaki

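AI summary: Proposes a sampler for heavy-tailed diffusion models based on stochastic differential equations, in which a state-dependent diffusion coefficient yields a self-regulating annealing mechanism aimed at improving generation fidelity on heavy-tailed data.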

Comments 6 pages, 3 figures, IJCNN2026

2606.01634 2026-06-02 cs.LG cs.AI 版本更新

E4GEN: Event-level Explainable Extreme-Enhanced Time-series Generation

E4GEN：事件级可解释的极端增强时间序列生成

Lin Jiang, Dahai Yu, Ximiao Li, Guang Wang

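Affiliations: Florida State University

AI summary: Proposes E4GEN, an explainable diffusion framework that enables controllable event-level generation of extreme events through three components (E-Activator, E-Predictor, and E-Control), outperforming existing methods in overall fidelity, extreme-event fidelity, and downstream utility.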

Comments 48 pages,26 figures

AI中文摘要

生成逼真的时间序列对于科学研究和实际应用至关重要。然而，现有方法通常强调整体分布保真度，而未能忠实捕捉极端事件。为了推进现有研究，我们提出了E4GEN，一个用于极端事件感知时间序列生成的可解释扩散框架。E4GEN通过三个关键组件提供了关于何时、什么以及如何控制极端事件生成的系统见解。首先，E-Activator在去噪过程中学习数据集自适应的极端控制信号激活步骤，而不干扰常规时间成分，包括趋势和季节性。其次，E-Predictor通过自驱动语义预测确定要强制执行的控制信号，其中每个样本通过推断生成过程中的潜在极端事件信息来导出其自身的控制信号。它还包括一种新颖的数据条件训练、噪声初始化采样机制，以解决训练标签不可用的问题。第三，E-Control通过可训练的极端控制网络指定如何控制极端事件生成，该网络将语义控制信号转换为逐层信号并将其注入去噪过程。我们在六个数据集上使用17个指标评估了E4GEN，大量实验表明，E4GEN在多个维度上优于最先进的模型，包括整体保真度、极端事件保真度和下游效用。

英文摘要

Generating realistic time series is essential for scientific research and real-world applications. However, existing methods often emphasize overall distributional fidelity while failing to faithfully capture extreme events. To advance existing research, we propose E4GEN, an explainable diffusion framework for extreme event-aware time-series generation. E4GEN provides systematic insights into when, what, and how to control extreme-event generation through three key components. First, E-Activator learns the dataset-adaptive extreme-control signal activation step during the denoising process without interfering with regular temporal components, including trend and seasonality. Second, E-Predictor determines what control signal to enforce through Self-Driven Semantic Prediction, where each sample derives its own control signal by inferring latent extreme-event information during generation. It also includes a novel Data-Conditioned Training, Noise-Initiated Sampling mechanism to address the issue of unavailable training labels. Third, E-Control specifies how to control extreme-event generation through a trainable Extreme Control Network, which transforms the semantic control signal into layer-wise signals and injects it into the denoising process. We evaluate E4GEN on six datasets with 17 metrics, and extensive experiments show that E4GEN outperforms state-of-the-art models across multiple dimensions, including overall fidelity, extreme-event fidelity, and downstream utility.


2606.01626 2026-06-02 cs.LG 版本更新

IMWM: Intuition Models Complement World Models for Latent Planning

IMWM：直觉模型补充世界模型用于潜在规划

Baoqi Gao, Ruize Han, Miao Wang, Song Wang

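Affiliations: Beihang University; Shenzhen University of Advanced Technology

AI summary: To address the search bottleneck in planning with latent world models, the paper proposes the IMWM framework, in which an intuition model collaborates with the world model through three lightweight components, raising success rates on four pixel-based tasks.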

AI中文摘要

使用学习到的潜在世界模型进行规划是从原始像素控制的有前途的途径，但仅靠强大的世界模型是不够的。我们通过实验证明了这一点：即使使用完美的世界模型（通过将学习到的前向预测器替换为真实环境动态的理想化展开来实现），有限预算的基于样本的规划器仍然在某些任务上失败，这表明瓶颈可能在于搜索而非世界模型的准确性。受此差距的启发，我们提出了IMWM（直觉模型+世界模型），它将世界模型与从演示中训练出的直觉模型配对，以识别有希望的动作。这两个模型通过三个轻量组件协作：（i）检索初始化，从检索到的演示中初始化规划器的动作提议；（ii）混合成本，将直觉分数与世界模型展开成本相结合；（iii）可靠性门控，调整规划器在每个设置中信任直觉的程度。在四个基于像素的目标到达任务（Two-Room、Reacher、Push-T和OGBench-Cube）中，IMWM在所有四个任务上的平均成功率均高于仅使用世界模型的规划器，其中在Two-Room（99.2%，+11.5个百分点）和OGBench-Cube（94.7%，+28.5个百分点）上提升最大。

英文摘要

Planning with a learned latent world model is a promising route to control from raw pixels, but a strong world model alone is not enough. We show this experimentally: even with a perfect world model (operationalized by replacing the learned forward predictor with an idealized rollout of the true environment dynamics), a finite-budget sample-based planner still fails on some tasks, indicating that the bottleneck can lie in search rather than in world-model accuracy. Motivated by this gap, we propose IMWM (Intuition Model + World Model), which pairs the world model with an intuition model trained from demonstrations to recognize promising actions. The two models collaborate through three lightweight components: (i) Retrieval Initialization, which initializes the planner's action proposal from a retrieved demonstration; (ii) Hybrid Cost, which combines the intuition score with the world-model rollout cost; and (iii) a Reliability Gate, which adjusts how much the planner trusts intuition in each setting. Across four pixel-based goal-reaching tasks (Two-Room, Reacher, Push-T, and OGBench-Cube), IMWM has higher mean success than the world-model-only planner on all four, with the largest gains on Two-Room (99.2%, +11.5 percentage points) and OGBench-Cube (94.7%, +28.5 percentage points).
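The interaction between the Hybrid Cost and the Reliability Gate can be illustrated with a small scoring function. This is a sketch under assumptions made here: the linear combination, the `weight` parameter, and a scalar gate in [0, 1] are hypothetical choices, since the abstract does not give the exact form.

```python
from typing import List, Tuple

# (plan_name, world_model_rollout_cost, intuition_score); higher intuition is better
Plan = Tuple[str, float, float]


def hybrid_cost(rollout_cost: float, intuition_score: float,
                gate: float, weight: float = 1.0) -> float:
    """Lower is better. gate in [0, 1] scales how much intuition is trusted."""
    return rollout_cost - weight * gate * intuition_score


def pick_plan(plans: List[Plan], gate: float) -> str:
    """Return the candidate action sequence with the lowest hybrid cost."""
    return min(plans, key=lambda p: hybrid_cost(p[1], p[2], gate))[0]
```

With the gate closed (0.0) the planner falls back to pure world-model cost; opening it lets a plan that the intuition model favors win even when its rollout cost is slightly higher.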


2606.01612 2026-06-02 cs.CV cs.LG 版本更新

Self-Improving Small Object Grounding in LVLMs

LVLMs中的自改进小目标定位

Tianze Yang, Yucheng Shi, Ruitong Sun, Ninghao Liu, Jin Sun

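Affiliations: University of Georgia

AI summary: Exploits internal attention patterns in LVLMs, using either a lightweight IoU regressor or a training-free attention-entropy selector to pick the best box among multiple candidates, enabling self-improvement in small-object grounding.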

Comments 29 Pages, 15 Figures

AI中文摘要

大型视觉语言模型（LVLMs）中的内部注意力模式能否在无需微调的情况下识别可靠的小目标框？在这项工作中，我们给出了肯定的答案。LVLMs中的注意力结构编码了定位质量——一个仅基于注意力图训练的轻量级IoU回归器实现了强IoU预测（Pearson r > 0.67）。该回归器驱动了我们基于注意力的候选选择（ACS）框架的回归器变体，称为ACS-Learned，它从多个采样候选中选择最佳框以改进目标定位。通过分析回归器学习的内容，我们揭示了哪些Transformer层和头最为关键，并推导出ACS-Free：一个无需训练的选择器，它根据这些判别性头上的注意力熵对候选进行排序，推理时无需任何学习组件。在COCO和Objects365上的实验表明，小目标定位的自改进高达19%，其中ACS-Free在所有无需训练的方法中排名最佳，表明有用的注意力结构提高了LVLMs中定位的可靠性和可解释性。

英文摘要

Can internal attention patterns in Large Vision Language Models (LVLMs) identify reliable small-object boxes without fine-tuning? In this work, we provide an affirmative answer. Attention structure in LVLMs encodes grounding quality: a lightweight IoU regressor trained solely on attention maps achieves strong IoU prediction (Pearson r > 0.67). This regressor powers the regressor-based variant of our Attention-based Candidate Selection (ACS) framework, called ACS-Learned, which selects the best box from multiple sampled candidates to improve object grounding. By analyzing what the regressor learns, we reveal which transformer layers and heads are most critical and derive ACS-Free: a training-free selector that ranks candidates by attention entropy on these discriminative heads, with no learned component at inference. Experiments on COCO and Objects365 demonstrate up to 19% self-improvement on small object localization, with ACS-Free ranking best among all training-free methods, indicating that useful attention structure improves both localization reliability and interpretability in LVLMs.
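A training-free selector of the kind described above can be sketched as an entropy ranking. The details here are assumptions, not the paper's specification: each candidate box is taken to come with one attention distribution per selected head, and lower mean entropy (more focused attention) is assumed to indicate a better box.

```python
import math
from typing import Dict, List, Sequence


def entropy(weights: Sequence[float]) -> float:
    """Shannon entropy (nats) of non-negative weights after normalization."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probs)


def rank_candidates(head_attention: Dict[str, List[Sequence[float]]]) -> List[str]:
    """Order candidate boxes from lowest to highest mean attention entropy.

    head_attention maps a candidate name to one attention-weight vector
    per selected head (a hypothetical input layout).
    """
    def mean_entropy(heads: List[Sequence[float]]) -> float:
        return sum(entropy(h) for h in heads) / len(heads)

    return sorted(head_attention, key=lambda name: mean_entropy(head_attention[name]))
```

The first element of the returned list would be the selected box. If the opposite ordering turned out to correlate with IoU on a validation set, only the sort direction would need to change.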


2606.01607 2026-06-02 cs.LG cs.AI 版本更新

FedMTFI: Feature Importance Based Optimized Multi Teacher Knowledge Distillation in Heterogeneous Federated Learning Environment

FedMTFI: 异构联邦学习环境中基于特征重要性优化的多教师知识蒸馏

Nazmus Shakib Shadin, Aaron Cummings, Xinyue Zhang, Bobin Deng

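Affiliations: Department of Computer Science, Kennesaw State University, Marietta, GA 30060, USA

AI summary: Proposes the FedMTFI architecture, which combines multi-teacher knowledge distillation with feature importance based on Shapley values to improve model accuracy and interpretability in heterogeneous federated learning.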

Comments Accepted by IJCNN 2026

AI中文摘要

联邦学习（FL）是一种去中心化方法，能够在无需暴露原始数据的情况下实现协作模型训练。它允许设备仅共享模型权重，而将个人数据保留在本地并确保安全，从而避免了敏感数据的传输。然而，在现实环境中，设备持有的数据往往分布不均，且设备在计算能力和内存容量上大多存在差异。这些差异使得FL难以在整个系统中保持一致的性能。为了解决这些问题，我们提出了FedMTFI，一种新颖的架构，它将多教师知识蒸馏（MTKD）与特征重要性相结合，以改善异构环境中的FL过程。在FedMTFI中，客户端根据相似的硬件和模型类型进行聚类。每个聚类在非独立同分布（non-IID）数据上训练特定模型。在聚类内部，每个客户端仅使用自己的本地私有数据更新该模型。然后，服务器使用FedAvg对每个聚类中的本地训练模型进行聚合，形成多个原型模型。接着，这些原型作为教师模型，通过MTKD训练一个全局通用的学生模型。FedMTFI的独特之处在于集成了Shapley值（SHAP），以在蒸馏过程中强调重要特征，从而提高了准确性和可解释性。实验结果表明，FedMTFI比传统FL算法实现了更高的准确性，并且在non-IID数据条件下表现更有效。

英文摘要

Federated learning (FL) is a decentralized approach that enables collaborative model training without exposing raw data. Instead of transferring sensitive data, devices share only model weights, keeping personal data local and secure. However, in real-world settings, the data held by devices is often unevenly distributed, and devices typically differ in computing power and memory capacity. These differences make it harder for FL to maintain consistent performance across the system. To address these issues, we propose FedMTFI, a novel architecture that combines multi-teacher knowledge distillation (MTKD) with feature importance to improve the FL process in heterogeneous environments. In FedMTFI, clients are clustered based on similar hardware and model types. Each cluster trains a specific model on data that is not independently and identically distributed (non-IID). Within a cluster, every client updates that model using only its own local private data. The server then aggregates the locally trained models in each cluster using FedAvg to form multiple prototype models. These prototypes then serve as teacher models to train a global generalized student model using MTKD. What distinguishes FedMTFI is the integration of Shapley values (SHAP) to emphasize important features during distillation, which enhances both accuracy and interpretability. Experimental results show that FedMTFI achieves higher accuracy than traditional FL algorithms and performs more effectively under non-IID data conditions.
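The per-cluster aggregation step can be sketched with standard FedAvg, which averages client parameters weighted by local sample counts. The flat-list weight representation, the cluster dictionary layout, and the function names are assumptions for illustration; the distillation and SHAP stages are not shown.

```python
from typing import Dict, List, Sequence, Tuple


def fedavg(client_weights: List[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Sample-size-weighted average of flattened client parameters."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


def cluster_prototypes(
    clusters: Dict[str, Tuple[List[Sequence[float]], Sequence[int]]]
) -> Dict[str, List[float]]:
    """One prototype (teacher) model per cluster of same-architecture clients."""
    return {name: fedavg(ws, ns) for name, (ws, ns) in clusters.items()}
```

Averaging only within a cluster sidesteps the problem that models of different architectures have no shared parameter space; the prototypes would then be combined through distillation on outputs rather than on weights.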


2606.01596 2026-06-02 math.NA cs.LG cs.NA 版本更新

Learning Chaotic Dynamics through Second-Order Geometric Supervision

通过二阶几何监督学习混沌动力学

Shinhoo Kang, Hai V. Nguyen, Tan Bui-Thanh

Affiliations: Department of Computer Science and Software Engineering, Korea University; Department of Aerospace Engineering and Engineering Mechanics, The Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin

AI summary: Proposes model-constrained randomized Jacobian matching, which implicitly enforces second-order consistency at O(d^2) cost and recovers attractor geometry and invariant statistics in chaotic systems.

Comments 37 pages, 15 figures, 6 tables

AI中文摘要

从数据中学习混沌动力系统需要的不仅仅是短期预测精度：学习模型必须保持吸引子几何及其不变统计量。轨迹（零阶）和雅可比（一阶）匹配监督向量场的值和切结构，但两者都不约束场如何偏离其切平面。因此，模型可以在监督状态下匹配值和切线，但弯曲方式与真实情况不同，在保持局部精度的同时，向虚假吸引子漂移并扭曲长时间统计量。我们证明，强制二阶一致性可以减轻这些失败，但在高维中形成完整的Hessian矩阵是禁止的。我们提出模型约束随机雅可比匹配，该方法在随机扰动的输入处比较真实和学习的向量场的雅可比矩阵。泰勒展开表明，期望的随机雅可比损失分解为名义雅可比失配加上由噪声方差缩放的Hessian失配，从而以O(d^2)代价隐式施加二阶一致性，而无需形成O(d^3)的Hessian张量。仅使用雅可比评估，该方法可扩展到显式Hessian匹配无法实现的高维。数值实验证实二阶方法是稳健的。对于Lorenz~63，一阶方法在最小时间监督下产生灾难性的Lyapunov指数异常值，而二阶方法消除了这些异常值并恢复了正确的吸引子。对于耦合Lorenz~96，分布外强迫扫描区分了这些方法：所有方法在F=16之前一致，但超过F=18后，只有二阶方法保持了不变测度和Lyapunov谱。在两个系统上，随机雅可比匹配以低得多的成本实现了与显式Hessian匹配相当的性能。

英文摘要

Learning chaotic dynamical systems from data requires more than short-term predictive accuracy: the learned model must preserve the attractor geometry and its invariant statistics. Trajectory (zero-order) and Jacobian (first-order) matching supervise the values and tangent structure of the vector field, but neither constrains how the field bends away from its tangent plane. A model can thus match values and tangents at the supervised states yet curve differently from the truth, remaining locally accurate while drifting toward spurious attractors and distorting long-time statistics. We show that enforcing second-order consistency mitigates these failures, but forming the full Hessian is prohibitive in high dimensions. We propose model-constrained randomized Jacobian matching, which compares the Jacobians of the true and learned vector fields at randomly perturbed inputs. A Taylor expansion shows that the expected randomized Jacobian loss decomposes into the nominal Jacobian mismatch plus a Hessian mismatch scaled by the noise variance, implicitly enforcing second-order consistency at $\mathcal{O}(d^2)$ cost without forming the $\mathcal{O}(d^3)$ Hessian tensor. Using only Jacobian evaluations, the method scales to high dimensions where explicit Hessian matching does not. Numerical experiments confirm that second-order methods are robust. For Lorenz~63, first-order methods produce catastrophic Lyapunov-exponent outliers under minimal temporal supervision, which second-order methods eliminate while recovering the correct attractor. For coupled Lorenz~96, an out-of-distribution forcing sweep separates the methods: all agree up to $F=16$, but beyond $F=18$ only second-order methods preserve the invariant measure and Lyapunov spectrum. On both systems, randomized Jacobian matching performs comparably to explicit Hessian matching at much lower cost.
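The decomposition mentioned above can be reproduced in a few lines. This is a sketch in notation assumed here, with $\Delta J(x) = J_f(x) - J_{f_\theta}(x)$ the Jacobian mismatch, $\xi \sim \mathcal{N}(0, I_d)$, and noise scale $\sigma$; the mismatch is linearized around $x$, so terms involving third and higher derivatives are dropped.

```latex
% Linearize the Jacobian mismatch at the perturbed input:
\Delta J(x + \sigma\xi) \;\approx\; \Delta J(x)
  + \sigma \sum_{k=1}^{d} \xi_k \,\partial_{x_k} \Delta J(x)

% Take the expectation of the squared Frobenius norm. The cross term vanishes
% because E[\xi_k] = 0, and E[\xi_k \xi_l] = \delta_{kl} removes mixed products:
\mathbb{E}_{\xi}\,\bigl\| \Delta J(x + \sigma\xi) \bigr\|_F^2
  \;\approx\; \underbrace{\bigl\| \Delta J(x) \bigr\|_F^2}_{\text{Jacobian mismatch}}
  \;+\; \sigma^2 \underbrace{\sum_{k=1}^{d}
        \bigl\| \partial_{x_k} \Delta J(x) \bigr\|_F^2}_{\text{Hessian mismatch}}
```

Each sample needs only a $d \times d$ Jacobian evaluation, while the second term aggregates all $d^3$ entries of the Hessian mismatch in expectation, which is where the cost saving comes from.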


2606.01595 2026-06-02 cs.LG 版本更新

Uncertainty-Calibrated Diffusion for Reliable 3D Molecular Graph Generation

不确定性校准的扩散用于可靠的3D分子图生成

Fang Wan, Jingxiang Qu, Yi Liu

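Affiliations: State University of New York at Stony Brook

AI summary: To address the loss of sampling quality that epistemic uncertainty causes in diffusion models for 3D molecular graph generation, the paper proposes Uncertainty-Calibrated Diffusion (UCD), which calibrates the reverse diffusion process to compensate for epistemic uncertainty and achieves the best performance on several benchmarks.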

AI中文摘要

贝叶斯推理通过将预测视为分布而非确定性值，为神经网络中的认知不确定性建模提供了原则性框架。同时，用于3D分子图生成的扩散模型在受严格化学约束的脆弱几何结构上运行，使得推理对不确定性误校准高度敏感。一个被广泛忽视的问题是，来自学习去噪器的认知不确定性会与反向扩散过程中有意注入的偶然不确定性相互作用，导致系统性的方差膨胀以及真实分布与模拟分布之间的不匹配。这种效应对于高精度分子生成尤其有害，因为即使微小偏差也可能违反化学有效性。在这项工作中，我们对认知不确定性如何通过扩散推理传播并降低采样质量进行了理论和实证分析。基于此研究，我们提出了UCD（不确定性校准扩散），一种简单而有效的方法，通过校准反向扩散过程来考虑认知不确定性。在标准3D分子基准上的大量实验表明，UCD在不同基线方法中一致地提高了采样质量，为3D分子扩散建立了新的最先进性能。代码可在 https://github.com/jiuguaiwf/UCD 获取。

英文摘要

Bayesian inference provides a principled framework for modeling epistemic uncertainty in neural networks by treating predictions as distributions rather than deterministic values. Meanwhile, diffusion-based models for 3D molecular graph generation operate on fragile geometric structures governed by strict chemical constraints, making inference highly sensitive to uncertainty miscalibration. A largely overlooked issue is that epistemic uncertainty arising from the learned denoiser interacts with the aleatoric uncertainty intentionally injected during reverse diffusion, leading to systematic variance inflation and a mismatch between the true distribution and the simulated distribution. This effect is particularly detrimental for high-precision molecular generation, where even small deviations can violate chemical validity. In this work, we provide a theoretical and empirical analysis of how epistemic uncertainty propagates through diffusion inference and degrades sampling quality. Building on this investigation, we propose UCD (Uncertainty-Calibrated Diffusion), a simple yet effective method that calibrates the reverse diffusion process to account for epistemic uncertainty. Extensive experiments on standard 3D molecular benchmarks demonstrate that UCD consistently improves sampling quality across diverse baseline methods, establishing new state-of-the-art performance for 3D molecular diffusion. The code is available at https://github.com/jiuguaiwf/UCD.


2606.01591 2026-06-02 cs.CV cs.LG 版本更新

TLG: Temporal-Logic Grounding for Video Question Answering via Source-Annotation Reconstruction and Category-Targeted Reasoning

TLG: 通过源标注重建和类别目标推理实现视频问答的时间逻辑基础

Ali Alavi

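Affiliations: The Ohio State University

AI summary: Proposes TLG, a three-tier system that reconstructs action timelines, parses questions into temporal-logic programs and executes them deterministically, and combines a strong vision-language model with a frontier reasoning model, raising video question answering accuracy from 46.9% to 71.37%.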

AI中文摘要

TimeLogic挑战评估对视频的形式时间逻辑推理——包括16个算子（之前、之后、直到、自从、总是、共现、排序等），采用布尔和四选一形式。端到端视频语言模型在此任务上接近随机水平，因为它们将视频视为帧的集合，无法定位动作发生的时间。我们提出TLG（时间逻辑基础），一个三阶段系统：（i）从生成基准测试的公共源数据集标注中重建每个视频的动作时间线，将每个问题解析为时间逻辑程序，并确定性执行；（ii）在没有标注的情况下回退到强大的开放视觉语言模型；（iii）仅将视觉语言模型经验上最弱的问题类别路由到前沿推理模型。TLG将测试准确率从46.9%的视觉语言模型基线提升到71.37%，绝对增益+24.5，达到排行榜前三名3分以内。我们报告了广泛的消融实验，包括三种基于模型的时间线重建变体，它们都低于整体视觉语言模型，将时间基础隔离为不可约的瓶颈，并表明真正的标注——而非更大的模型——驱动准确率。

英文摘要

The TimeLogic Challenge evaluates formal temporal-logic reasoning over video, covering 16 operators (before, after, until, since, always, co-occur, ordering, ...) in boolean and 4-way multiple-choice form. End-to-end video-language models (VLMs) hover near chance on this task because they treat video as a bag of frames and cannot localize when actions occur. We present TLG (Temporal-Logic Grounding), a three-tier system that (i) reconstructs each video's action timeline from the public source-dataset annotations the benchmark was generated from, parses every question into a temporal-logic program, and executes it deterministically; (ii) falls back to a strong open VLM where no annotation exists; and (iii) routes only the question categories where the VLM is empirically weakest to a frontier reasoning model. TLG raises test accuracy from a 46.9% VLM baseline to 71.37%, an absolute gain of about 24.5 points, reaching within 3 points of the leaderboard top. We report extensive ablations, including three model-based timeline-reconstruction variants that all underperform a holistic VLM, isolating temporal grounding as the irreducible bottleneck and showing that real annotations, not larger models, drive accuracy.
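Deterministic execution over a reconstructed timeline, as in tier (i) above, can be sketched with a few interval predicates. The `Interval` type and the operator semantics chosen here (for example, "before" meaning some occurrence of one action ends no later than some occurrence of the other begins) are assumptions; the benchmark's formal definitions may differ.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    action: str
    start: float  # seconds
    end: float    # seconds


def occurrences(timeline: List[Interval], action: str) -> List[Interval]:
    """All annotated intervals for the given action label."""
    return [iv for iv in timeline if iv.action == action]


def before(timeline: List[Interval], a: str, b: str) -> bool:
    """True if some occurrence of a ends no later than some occurrence of b starts."""
    return any(x.end <= y.start
               for x in occurrences(timeline, a)
               for y in occurrences(timeline, b))


def after(timeline: List[Interval], a: str, b: str) -> bool:
    """True if some occurrence of a starts no earlier than some b ends."""
    return before(timeline, b, a)


def co_occur(timeline: List[Interval], a: str, b: str) -> bool:
    """True if some occurrence of a overlaps in time with some occurrence of b."""
    return any(x.start < y.end and y.start < x.end
               for x in occurrences(timeline, a)
               for y in occurrences(timeline, b))
```

Once a question is parsed into a call such as `before(timeline, "open door", "pour water")`, the answer follows from the annotations alone, with no model in the loop.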


2606.01566 2026-06-02 cs.LG 版本更新

RobustModelMaker: Coupling Bootstrap Stability Selection with Leakage-Safe Nested Cross-Validation for Scientific Machine Learning

RobustModelMaker: 将Bootstrap稳定性选择与防泄漏嵌套交叉验证相结合的科学机器学习

Amanda S Barnard

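Affiliations: School of Computing, Australian National University

AI summary: For small-to-medium scientific datasets, the paper proposes the RobustModelMaker framework, which couples bootstrap stability selection with strict nested cross-validation to prevent data leakage while delivering a stability-tested feature subset and a performance estimate. It is competitive in predictive score with several alternative selectors and compares favorably on selection stability.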

Comments 19 pages, 2 figure plates, 8 tables

AI中文摘要

小到中等规模的科学数据集使机器学习流程面临两种叠加压力。单次特征选择产生的特征集在训练数据微小扰动下会发生显著变化，而任何使用相同数据进行选择、调参和评估的程序都会产生乐观偏差的性能估计。这两种失效模式通常被视为可分离的，但在科学数据所处的场景中，它们相互影响：不稳定的选择会放大本已乐观的得分的方差，而针对其中一种的标准补救措施很少能解决另一种。RobustModelMaker是一个Python框架，它将bootstrap稳定性选择与严格的嵌套交叉验证相结合，在每个折叠内执行所有预处理和选择，并生成一个经过稳定性测试的特征子集以及一个防泄漏的性能估计。该框架支持二分类、多分类和回归中的九种算法。行为通过确定性测试套件进行验证，该套件涵盖单元测试、性能测试和可重复性检查，在三个真实科学数据集上，与三种替代选择器（ANOVA F检验、带交叉验证的递归特征消除和Boruta）在预测得分和选择稳定性的Jaccard度量上进行比较。RobustModelMaker在每个数据集上的得分与最佳替代选择器相当，并且在所有三种任务类型中，在联合得分-稳定性前沿上占据了一个任何替代方法都无法匹敌的位置。两个示例应用——来自PLCO试验的卵巢癌生物标志物发现和UCI超导数据上的临界温度回归——说明了该框架在实际中的使用方式，以及当稳定性被视为首要交付成果而非涌现属性时，哪些权衡变得可见。

英文摘要

Small-to-medium scientific datasets place machine learning pipelines under two compounding pressures. Single-run feature selection produces feature sets that change substantially under small perturbations of the training data, and any procedure that uses the same data for selection, tuning, and evaluation produces optimistically biased performance estimates. The two failure modes are routinely treated as separable, but in the regimes where scientific data live, they interact: an unstable selection inflates the variance of an already-optimistic score, and standard remedies for one rarely address the other. RobustModelMaker is a Python framework that couples bootstrap stability selection with strict nested cross-validation, performs all preprocessing and selection inside each fold, and produces a stability-tested feature subset together with a leakage-safe performance estimate. The framework supports nine algorithms across binary classification, multiclass classification, and regression. Behaviour is verified by a deterministic test suite spanning unit, performance, and reproducibility checks. On three real scientific datasets, the framework is compared with three alternative selectors (ANOVA F-test, recursive feature elimination with cross-validation, and Boruta) on both predictive score and a Jaccard measure of selection stability. RobustModelMaker is competitive in score with the best alternative selector on each dataset, and occupies a position on the joint score-stability frontier that none of the alternatives match across all three task types. Two example applications, ovarian cancer biomarker discovery from the PLCO Trial and critical-temperature regression on the UCI Superconductivity Data, illustrate how the framework is used in practice and what trade-offs become visible when stability is treated as a first-class deliverable rather than an emergent property.
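The two stability quantities named above can be sketched in plain Python. This is not RobustModelMaker's API: the function names, the selection-frequency `threshold`, and the use of mean pairwise Jaccard similarity are assumptions chosen to illustrate the idea. To stay leakage-safe, the bootstrap selections would have to be computed on the training portion of each outer fold only.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Set


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard similarity of two feature sets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0


def mean_pairwise_jaccard(selections: List[Set[str]]) -> float:
    """Average Jaccard similarity over all pairs of bootstrap selections."""
    pairs = list(combinations(selections, 2))  # requires at least two selections
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def stable_features(selections: List[Set[str]], threshold: float = 0.8) -> List[str]:
    """Features selected in at least a `threshold` fraction of bootstrap runs."""
    counts = Counter(f for s in selections for f in set(s))
    return sorted(f for f, c in counts.items() if c / len(selections) >= threshold)
```

A score near 1.0 means the selector returns nearly the same features on every resample; reporting it alongside the cross-validated score makes the score-stability trade-off visible.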


2606.01563 2026-06-02 cs.LG 版本更新

MomentKV: Closing the Directional Gap in KV Cache Eviction for Long-Context Inference

MomentKV：消除长上下文推理中KV缓存驱逐的方向差距

Yu Li, Binxu Li, Tian Lan

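Affiliations: George Washington University; Princeton University

AI summary: To address output degradation caused by KV cache eviction in long-context inference, MomentKV maintains moment statistics over the evicted token set (count, key mean, value mean, and value-key covariance). During eviction the statistics identify tokens already aligned with the accumulated summary, and at inference time they give a first-order approximation of the evicted attention output, so that selective eviction and accurate correction reinforce each other.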

详情

AI中文摘要

基于Transformer的语言模型中的自回归解码依赖于KV缓存，其内存占用随序列长度线性增长，成为长上下文推理的主要瓶颈。KV缓存驱逐通过保留固定大小的键值对子集并丢弃其余部分来解决这一问题。我们发现输出退化的一个主要来源并非驱逐令牌上的残余注意力质量（现有方法已最小化），而是保留令牌集与驱逐令牌集之间的方向不匹配。具体而言，实际中被驱逐的令牌通常与保留的令牌接近正交。因此，即使少量的驱逐质量也可能对最终的方向分布产生过大影响，并放大为显著的输出误差。这揭示了现有策略的根本局限性。为解决此问题，我们提出MomentKV，它在驱逐令牌集上维护紧凑的小规模矩统计量，包括计数、键均值、值均值和值-键协方差。在驱逐过程中，利用矩统计量识别已经与累积摘要良好对齐并被其捕获的令牌，保持驱逐集的几何规则性。在推理过程中，它们产生驱逐注意力输出的闭式一阶近似，在选择性驱逐与精确校正之间形成相互增强的循环。在LongBench和RULER上使用LLaMA-3.1-8B-Instruct和Qwen3-4B-Instruct进行的实验表明，MomentKV在每个缓存预算下均优于所有基线，在激进压缩下增益最大。

英文摘要

Autoregressive decoding in Transformer-based language models relies on the KV cache, whose memory footprint grows linearly with sequence length and becomes the primary bottleneck for long-context inference. KV cache eviction addresses this by retaining a fixed-size subset of key-value pairs and discarding the rest. We identify that a primary source of output degradation is not the residual attention mass on evicted tokens, which existing methods already minimize, but a directional mismatch between the retained and evicted token sets. Specifically, the evicted tokens in practice are often near-orthogonal to the retained ones. Thus, even a small evicted mass can have an oversized impact on the resulting direction distribution and amplify into substantial output error. This reveals a fundamental limit in existing strategies. To address this, we propose MomentKV, which maintains compact, small-size moment statistics over the evicted token set, including a count, key mean, value mean, and value-key covariance. During eviction, the moment statistics are leveraged to identify tokens already well aligned with and captured by the accumulated summary, keeping the evicted set geometrically regular. During inference, they yield a closed-form first-order approximation of the evicted attention output, forming a mutually reinforcing loop between selective eviction and accurate correction. On LongBench and RULER with LLaMA-3.1-8B-Instruct and Qwen3-4B-Instruct, MomentKV outperforms all baselines at every cache budget, with the largest gains under aggressive compression.
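One way to see how a count, key mean, value mean, and value-key covariance can yield a first-order correction is to expand exp(q·k_i) around the key mean. The sketch below is a derivation under that assumption, not the paper's exact estimator: it omits attention scaling and any per-head details, and all function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def moments(keys, values):
    """Return (n, key mean, value mean, value-key covariance) of an evicted set."""
    n = len(keys)
    dk, dv = len(keys[0]), len(values[0])
    mu_k = [sum(k[j] for k in keys) / n for j in range(dk)]
    mu_v = [sum(v[i] for v in values) / n for i in range(dv)]
    cov = [[sum((v[i] - mu_v[i]) * (k[j] - mu_k[j]) for k, v in zip(keys, values)) / n
            for j in range(dk)] for i in range(dv)]
    return n, mu_k, mu_v, cov


def approx_evicted(q, stats):
    """First-order estimate of sum_i exp(q.k_i) and sum_i exp(q.k_i) * v_i.

    Uses exp(q.k_i) ~ exp(q.mu_k) * (1 + q.(k_i - mu_k)); the linear term
    vanishes in the mass and becomes cov @ q in the weighted value sum.
    """
    n, mu_k, mu_v, cov = stats
    scale = n * math.exp(dot(q, mu_k))
    numer = [scale * (mu_v[i] + dot(cov[i], q)) for i in range(len(mu_v))]
    return scale, numer


def exact_evicted(q, keys, values):
    """Exact unnormalized attention mass and weighted value sum, for comparison."""
    w = [math.exp(dot(q, k)) for k in keys]
    numer = [sum(wi * v[i] for wi, v in zip(w, values)) for i in range(len(values[0]))]
    return sum(w), numer


q = [1.0, -0.5]
keys = [[0.10, 0.20], [0.12, 0.18], [0.08, 0.22]]
values = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
print(approx_evicted(q, moments(keys, values)))
print(exact_evicted(q, keys, values))
```

The approximation is tight when the evicted keys cluster around their mean, which is consistent with the abstract's emphasis on keeping the evicted set geometrically regular.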


2606.01560 2026-06-02 cs.LG cs.AI 版本更新

GJDNet: Robust Graph Neural Networks via Joint Disentangled Learning Against Adversarial Attacks

GJDNet: 通过联合解缠学习实现鲁棒图神经网络对抗攻击

Canyixing Cui, Tao Wu, Xingping Xian, Xiao-Ke Xu, Mao Wang, Weina Niu

Affiliations: School of Computer Science and Technology, Chongqing University of Posts and Telecommunications; School of Cyber Security and Information Law, Chongqing University of Posts and Telecommunications; Computational Communication Research Center, Beijing Normal University; School of Journalism and Communication, Beijing Normal University; School of Computer Science and Engineering, University of Electronic Science and Technology of China

AI summary: Proposes GJDNet, a framework that jointly disentangles node representations and decision spaces and adopts a spherical decision boundary to make graph neural networks more robust across graphs with different assortativity.

详情

AI中文摘要

图神经网络（GNN）易受对抗攻击，这类攻击通过在同配图中引入异配边、在异配图中引入同配边，从根本上反转连接模式。这种结构反转造成结构-特征不匹配，扰乱不同图类型上的邻域聚合。然而，我们发现现有防御措施存在局限性，它们要么在固定的同配性假设下将邻域视为整体，要么依赖无法应对扰动引起的表示偏移的标准softmax分类器。为进一步利用这一观察，我们采用鲁棒性视角，联合解缠节点表示和决策空间，在隔离扰动影响的同时强制实现分离良好的决策区域。基于此原则，我们提出图联合解缠网络（GJDNet），这是一个统一的框架，用于在不同图同配性机制下进行鲁棒节点分类。GJDNet在表示和决策两个层面增强鲁棒性：它采用特征驱动的软结构解缠，结合偏度感知的邻居过滤，抑制扰动引起的结构-特征不匹配；并引入球形决策边界（SDB），促进嵌入空间中的类内紧凑性和类间分离，从而在扰动下稳定决策边界。理论分析揭示了所提出的解缠表示和决策机制的有效性，而大量实验表明，GJDNet在不同连接模式的图上始终展现出强鲁棒性。

英文摘要

Graph Neural Networks (GNNs) are vulnerable to adversarial attacks, which inherently invert connectivity patterns by introducing disassortative edges in assortative graphs and assortative edges in disassortative graphs. This structural inversion creates structure-feature mismatches that disrupt neighborhood aggregation across different graph types. However, we find that existing defenses are limited, as they either treat neighborhoods as monolithic under fixed assortativity assumptions or rely on standard softmax classifiers that fail to account for perturbation-induced representation shifts. To further exploit this observation, we adopt a robustness perspective that jointly disentangles node representations and decision spaces, isolating perturbation effects while enforcing well-separated decision regions. Based on this principle, we propose Graph Joint Disentanglement Network (GJDNet), a unified framework for robust node classification across diverse graph assortativity regimes. GJDNet enhances robustness at both representation and decision levels: it employs feature-driven soft structural disentanglement with skewness-aware neighbor filtering to suppress perturbation-induced structure-feature mismatches, and introduces a Spherical Decision Boundary (SDB) to promote intra-class compactness and inter-class separation in the embedding space, thereby stabilizing decision boundaries under perturbations. Theoretical analysis provides insights into the effectiveness of the proposed disentangled representation and decision mechanisms, while extensive experiments demonstrate that GJDNet consistently achieves strong robustness across graphs with different connectivity regimes.


2606.01557 2026-06-02 cs.LG eess.SP 版本更新

Everywhere Learning: Artificial Intelligence with Pointwise Constraints

处处学习：具有逐点约束的人工智能

Ignacio Boero, Ignacio Hounie, Luiz Chamon, Alejandro Ribeiro

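Affiliations: Department of Electrical and Systems Engineering, University of Pennsylvania; École polytechnique, Institut Polytechnique de Paris

AI summary: Proposes everywhere learning, a new paradigm; analyzes its generalization through an approximate duality theory, controls generalization with a sparse L1 penalty on constraint relaxations, and illustrates its merits on a language model task.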

详情

AI中文摘要

处处学习是一种新范式，其中人工智能系统被训练以满足数据分布上概率为1的损失约束。这与训练人工智能系统最小化平均损失的标准范式形成对比。我们发展了一种近似对偶理论，以支持泛化分析，该分析建立了经验与统计处处学习问题解之间的接近性。我们的结果表明，对偶变量将数据分布重新加权到损失约束更难满足的点，并且泛化由数据分布质量集中与约束更难满足点上的质量集中之间的不匹配控制。我们进一步表明，我们可以通过约束松弛上的稀疏L1惩罚来控制泛化。我们通过语言模型任务中的智能体分类实验说明了处处学习的优点。

英文摘要

Everywhere learning is a new paradigm whereby Artificial Intelligence (AI) systems are trained to satisfy loss constraints with probability one over the data distribution. This is in contrast to the standard paradigm of training AI systems to minimize average losses. We develop an approximate duality theory to substantiate a generalization analysis that establishes the proximity between solutions of empirical and statistical everywhere learning problems. Our results show that dual variables reweigh the data distribution towards points in which loss constraints are more difficult to satisfy and that generalization is controlled by the mismatch between the concentration of mass of the data distribution and the concentration of mass on points where constraints are more difficult to satisfy. We further show that we can control generalization with a sparse L1 penalty on constraint relaxations. We illustrate the merits of everywhere learning with an experiment in agentic classification for language model tasks.
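The statement that dual variables reweight the data towards hard points can be illustrated with a generic projected dual-ascent step on a finite sample. This is a sketch under assumptions, not the paper's algorithm: it uses per-sample constraints loss_i <= c, and the names `dual_ascent_step`, `normalized_weights`, and `cap` are hypothetical. The optional cap reflects standard Lagrangian duality, where an L1 penalty of weight rho on constraint slacks bounds each multiplier by rho.

```python
def dual_ascent_step(lams, losses, c, step, cap=None):
    """One projected dual-ascent step on per-sample multipliers.

    A multiplier grows where its loss exceeds the constraint level c and
    shrinks (down to zero) where the constraint is satisfied.
    """
    updated = []
    for lam, loss in zip(lams, losses):
        lam = max(0.0, lam + step * (loss - c))
        if cap is not None:
            lam = min(lam, cap)  # assumed bound from an L1 slack penalty
        updated.append(lam)
    return updated


def normalized_weights(lams):
    """Sample weights implied by the multipliers; uniform if all are zero."""
    total = sum(lams)
    if total == 0.0:
        return [1.0 / len(lams)] * len(lams)
    return [lam / total for lam in lams]


losses = [0.1, 0.5, 0.9]  # hypothetical per-sample losses
lams = dual_ascent_step([0.0, 0.0, 0.0], losses, c=0.3, step=1.0)
print(lams)
print(normalized_weights(lams))
```

In this toy run the sample that already satisfies the constraint receives zero weight, and the sample with the largest violation receives the most.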


2606.01544 2026-06-02 cs.LG 版本更新

CRePE: Convolution-aware Relative Importance in Post-training Pruning with Efficient Search

CRePE: 后训练剪枝中基于卷积感知的相对重要性及高效搜索

Cheonjun Park

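Affiliations: Hankuk University of Foreign Studies

AI summary: Proposes CRePE, which improves relative importance scoring with 2D local neighborhood context and adaptive coefficients, and pairs it with PHO proxy-based optimization for efficient post-training pruning, outperforming existing methods across models and sparsity settings.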

Comments 10 pages

详情

AI中文摘要

在实际部署大型语言模型（LLM）时，会带来大量的内存和计算成本。后训练剪枝（PTP）是一种通过移除权重来降低这些成本的有效方法，无需额外训练。在现有方法中，RIA引入了通过行和列和归一化的相对重要性分数，实现了最先进的精度。然而，RIA仅考虑一维十字形（行/列）方向信息，并对行和列贡献赋予相同权重。在本文中，我们提出**CRePE**，它将二维局部邻域上下文和自适应系数纳入相对重要性评分。CRePE在各种模型和稀疏度设置下始终优于现有的PTP方法。然而，通过基于困惑度（PPL）的爬山法确定最优自适应系数需要大量PPL评估和约11小时的搜索时间。为了解决这个问题，我们提出**PHO**（基于代理的超参数优化），它消除了重复PPL测量的需要，并将搜索时间减少到约20分钟。此外，PHO在一个模型上找到的最优超参数配置可以很好地迁移到其他模型，展现出强大的泛化能力。最后，我们验证了CRePE可以与现有技术（包括通道置换、非均匀稀疏分配和重新剪枝方法）正交结合。

英文摘要

Deploying Large Language Models (LLMs) in practice incurs substantial memory and computational costs. Post-training pruning (PTP) is an effective approach to reducing these costs by removing weights without additional training. Among existing methods, RIA introduces relative importance scores normalized by row and column sums, achieving state-of-the-art accuracy. However, RIA considers only 1D cross-shaped (row/column) directional information and assigns equal weight to row and column contributions. In this paper, we propose \textbf{CRePE}, which incorporates 2D local neighborhood context and adaptive coefficients into Relative Importance scoring. CRePE consistently outperforms existing PTP methods across diverse models and sparsity settings. However, identifying optimal adaptive coefficients via perplexity (PPL)-based hill climbing requires numerous PPL evaluations and approximately 11 hours of search time. To address this, we propose \textbf{PHO} (Proxy-based Hyperparameter Optimization), which eliminates the need for repeated PPL measurements and reduces the search time to approximately 20 minutes. Furthermore, the optimal hyperparameter configuration found by PHO on one model transfers well to other models, demonstrating strong generalization. Finally, we verify that CRePE can be orthogonally combined with existing techniques including Channel Permutation, non-uniform sparsity allocation, and re-pruning methods.
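The row-and-column normalization that the abstract attributes to RIA can be sketched as follows. This is an illustration only: `alpha` and `beta` are hypothetical stand-ins for the adaptive coefficients, the 2D neighborhood context of CRePE is not modeled, and activation-based terms are omitted.

```python
def relative_importance(W, alpha=1.0, beta=1.0):
    """Score |W[i][j]| relative to its row sum and its column sum.

    alpha = beta = 1 gives the equal row/column weighting that the
    abstract attributes to RIA.
    """
    n_rows, n_cols = len(W), len(W[0])
    row_sums = [sum(abs(w) for w in row) for row in W]
    col_sums = [sum(abs(W[i][j]) for i in range(n_rows)) for j in range(n_cols)]
    scores = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            a = abs(W[i][j])
            r = a / row_sums[i] if row_sums[i] else 0.0
            c = a / col_sums[j] if col_sums[j] else 0.0
            row.append(alpha * r + beta * c)
        scores.append(row)
    return scores


def prune(W, sparsity, alpha=1.0, beta=1.0):
    """Zero out the lowest-scoring fraction `sparsity` of the weights."""
    scores = relative_importance(W, alpha, beta)
    ranked = sorted((scores[i][j], i, j)
                    for i in range(len(W)) for j in range(len(W[0])))
    pruned = [row[:] for row in W]
    for _, i, j in ranked[:int(sparsity * len(ranked))]:
        pruned[i][j] = 0.0
    return pruned


W = [[4.0, 1.0], [1.0, 1.0]]
print(relative_importance(W))
print(prune(W, sparsity=0.5))
```

Plain magnitude pruning cannot distinguish the three weights of size 1.0 in this matrix, whereas the relative score ranks the one that dominates its own row and column above the other two.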


2606.01540 2026-06-02 cs.LG cs.AI 版本更新

TN-SHAP-G: Graph-Structured Tensor Network Surrogates for Shapley Values and Interactions

TN-SHAP-G：用于Shapley值和交互的图结构张量网络代理

Farzaneh Heidari, Guillaume Rabusseau

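Affiliations: University of Washington; CNRS

AI summary: Proposes TN-SHAP-G, a framework that uses tensor network surrogates over graph-structured inputs to compute Shapley values and higher-order interaction indices efficiently.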

详情

AI中文摘要

Shapley值是一种广泛使用的工具，用于归因黑盒模型中输入变量的重要性和交互，但其计算涉及定义在指数级子集空间上的函数。我们提出TN-SHAP-G，一个利用图结构输入中的结构高效计算Shapley值和高阶交互指数的框架。给定一个预测器和一个固定的掩码方案，TN-SHAP-G学习一个紧凑的、与图对齐的多线性代理，该代理近似掩码输入行为，表示为拓扑结构反映输入图的张量网络。一旦从少量oracle查询中训练完成，该代理通过多线性扩展实现一阶和高阶Shapley指数的确定性恢复，无需额外模型查询或蒙特卡洛方差。分子基准实验表明，学习到的分解在小图上紧密匹配精确Shapley值，并能高效扩展到基于采样的方法不可行的更大图。

英文摘要

Shapley values are a widely used tool for attributing importance and interactions among input variables in black-box models, but their computation involves a function defined over an exponentially large space of subsets. We propose TN-SHAP-G, a framework that exploits structure in graph-structured inputs to compute Shapley values and higher-order interaction indices efficiently. Given a predictor and a fixed masking scheme, TN-SHAP-G learns a compact, graph-aligned multilinear surrogate that approximates the masked-input behavior, represented as a tensor network whose topology mirrors the input graph. Once trained from a small number of oracle queries, the surrogate enables deterministic recovery of first- and higher-order Shapley indices via the multilinear extension, without additional model queries or Monte Carlo variance. Experiments on molecular benchmarks show that the learned factorization closely matches exact Shapley values on small graphs and scales efficiently to larger graphs where sampling-based methods become infeasible.
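The exponential cost mentioned in the abstract is easy to see in the textbook definition, which enumerates every subset of the other players. The sketch below is that brute-force baseline, not TN-SHAP-G; the toy value function is an assumption chosen so the result can be checked by hand.

```python
from itertools import combinations
from math import factorial


def shapley_values(n, value):
    """Exact Shapley values for players 0..n-1.

    `value` maps a frozenset of players to a float. The loops visit all
    2**(n-1) subsets per player, so this is only feasible for small n.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_value(s):
    """Additive contributions 1, 2, 3 plus a bonus of 4 when 0 and 1 co-occur."""
    base = sum({0: 1.0, 1: 2.0, 2: 3.0}[p] for p in s)
    return base + (4.0 if {0, 1} <= s else 0.0)


print(shapley_values(3, toy_value))
```

The interaction bonus is split equally between players 0 and 1, and the values sum to the value of the full coalition, as the efficiency axiom requires.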


2606.01539 2026-06-02 stat.ME cs.LG 版本更新

Scalable Counterfactual Risk Estimation for Rare Events in Longitudinal Data

纵向数据中罕见事件的可扩展反事实风险估计

Xiaohui Yin, Avijit Mitra, Ying Zhou, Kun Chen, Hong Yu

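Affiliations: University of Connecticut Storrs; University of Massachusetts Amherst; University of Massachusetts Lowell

AI summary: To address the class imbalance and computational burden caused by rare events in longitudinal survival data, proposes a scalable subsampling and reweighting strategy that applies to causal effect estimators such as ICE, improving stability while preserving consistency.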

Comments Accepted at KDD-2026, 12 pages

详情

AI中文摘要

在大规模观察性研究中，估计时变治疗对生存结果的因果效应在计算上要求很高，尤其是当结果罕见时。虽然基于g公式的方法（如迭代条件期望（ICE）估计器）为纵向因果推断提供了原则性框架，但它们在计算上变得昂贵，特别是当需要基于自助法的方差估计时。此外，每个时间点的结果罕见性会导致严重的类不平衡，从而引发逻辑回归及相关模型的不稳定性和收敛问题。为应对这些挑战，我们提出了一种针对纵向生存数据的原则性子采样与重加权策略，可应用于该场景下的多种现有因果效应估计器，包括ICE估计器。所提方法显著降低了计算负担，同时在罕见结果场景下保持一致性并提高估计稳定性。我们通过模拟评估该方法，并使用一项关于健康社会和行为决定因素（SBDH）与自杀风险的大规模EHR队列研究进行验证，证明了其在纵向数据中建模罕见结果的有效性。

英文摘要

Estimating the causal effect of time-varying treatments on survival outcomes in large observational studies is computationally demanding, particularly when outcomes are rare. While g-formula-based methods such as the iterative conditional expectation (ICE) estimator provide a principled framework for longitudinal causal inference, they become computationally expensive, especially when bootstrap-based variance estimation is required. In addition, outcome rarity at each time point induces severe class imbalance, leading to instability and convergence issues in logistic regression and related models. To address these challenges, we propose a principled subsampling and reweighting strategy for longitudinal survival data that can be applied to a range of existing causal effect estimators in this setting, including the ICE estimator. The proposed method substantially reduces computational burden while preserving consistency and improving estimation stability in rare-outcome settings. We evaluate the method through simulations and validate it using a large-scale EHR cohort study on social and behavioral determinants of health (SBDH) and suicide risk, demonstrating its effectiveness for modeling rare outcomes in longitudinal data.
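A generic form of subsampling with reweighting keeps every rare event, keeps each non-event with a fixed probability, and weights the retained non-events by the inverse of that probability. The abstract does not specify the paper's scheme, so the sketch below is only this standard inverse-probability idea; the row layout and function name are hypothetical.

```python
import random


def subsample_nonevents(rows, keep_prob, seed=0):
    """Keep all event rows; keep each non-event row with probability keep_prob.

    Retained non-events get weight 1 / keep_prob so that weighted totals
    remain unbiased for the totals in the full data.
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    rng = random.Random(seed)
    kept = []
    for row in rows:
        if row["event"] == 1:
            kept.append({**row, "weight": 1.0})
        elif rng.random() < keep_prob:
            kept.append({**row, "weight": 1.0 / keep_prob})
    return kept


# Hypothetical person-time rows: 5 events among 1000 records.
rows = [{"id": i, "event": 1 if i < 5 else 0} for i in range(1000)]
sample = subsample_nonevents(rows, keep_prob=0.1, seed=1)
print(len(sample), sum(r["weight"] for r in sample))
```

A downstream weighted logistic regression would then see a far less imbalanced dataset while targeting the same estimand, which is the kind of computational and stability benefit the abstract describes.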


2606.01533 2026-06-02 cs.MA cs.CL cs.LG 版本更新

Multi-Agent Computer Use

多智能体计算机使用

Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried

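Affiliations: Carnegie Mellon University

AI summary: To address the shortcomings of single-agent computer use agents on complex long-horizon tasks, proposes a multi-agent computer use system that decomposes tasks into a directed acyclic graph and executes them in parallel, improving performance by 3.4-25.5% across several benchmarks and speeding up task completion by about 1.5x.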

详情

AI中文摘要

目前的计算机使用代理（CUA）主要部署为单序列代理。这种设置对于受益于任务分解、并行执行和基于新信息持续重新规划的复杂长时任务来说并不理想。在本文中，我们认为应该转向评估和构建多智能体计算机使用（MACU）系统。这些系统强调规划和并行执行，缓解了单智能体CUA的许多缺点。我们提出了一种通用的多智能体设置，其中管理模型将计算机使用任务分解为有向无环图（DAG），编码子代理的相关依赖关系和目标。在每次迭代中，管理器调度并行的CUA子代理执行DAG就绪前沿上的节点，并根据子代理的新发现持续修订DAG（添加、取消或重写节点）。这种设计将计算机使用的部分可观察环境作为首要挑战：下游代理可能无法重新观察到的信息通过管理器和DAG结构保留并传递。我们证明，MACU在桌面（OSWorld）和网页导航（Online-Mind2Web、WebTailBench、Odysseys）基准测试上始终比强单智能体基线提升3.4-25.5%，表现出更有利的测试时缩放，并解决了单智能体CUA陷入困境的复杂长时任务。在Odysseys（一个长时网页导航基准测试）上，MACU将平均任务完成墙钟时间提高了约1.5倍，证明了其在加速传统缓慢的CUA流程方面的有效性。我们的研究结果强调，多智能体协调是扩展计算机使用代理以更长时间、更有效地工作的一个有前景的方向。我们在https://jykoh.com/multi-agent-computer-use上发布所有代码和交互式可视化。

英文摘要

Computer use agents (CUAs) today are primarily deployed as single serial agents. This setup is suboptimal for complex long-horizon tasks that benefit from task decomposition, parallel execution, and consistent re-planning based on new information. In this paper, we argue that we should instead move towards evaluating and building multi-agent computer use (MACU) systems. These systems, which emphasize planning and parallel execution, alleviate many of the shortcomings of single-agent CUAs. We propose a general multi-agent setup in which a manager model decomposes computer use tasks as a directed acyclic graph (DAG), encoding relevant dependencies and goals for subagents. At each iteration, the manager dispatches parallel CUA subagents to carry out nodes on the ready frontier of the DAG, and continuously revises the DAG (adding, canceling, or rewriting nodes) as new findings arrive from subagents. This design treats the partially observable environment of computer use as a first-class challenge: information that downstream agents may not be able to re-observe is retained and passed forward through the manager and DAG structure. We demonstrate that MACU consistently improves over strong single-agent baselines by $3.4-25.5\%$ on desktop (OSWorld) and web navigation (Online-Mind2Web, WebTailBench, Odysseys) benchmarks, exhibits more favorable test-time scaling, and solves complex long-horizon tasks where single-agent CUAs get stuck. On Odysseys, a long-horizon web navigation benchmark, MACU improves average task completion wall-clock time by ${\sim} 1.5 \times$, demonstrating its efficacy in speeding up traditionally slow CUA pipelines. Our findings highlight that multi-agent coordination is a promising axis for scaling computer use agents to work productively for longer and more effectively. We release all code and interactive visualizations at https://jykoh.com/multi-agent-computer-use.
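The "ready frontier" of a task DAG is the set of nodes whose dependencies have all finished and that are not yet done or running. A minimal sketch of that bookkeeping follows; the node names and the dictionary layout are hypothetical and are not taken from the released MACU code.

```python
def ready_frontier(deps, done, running=frozenset()):
    """Nodes that can be dispatched now.

    `deps` maps each node to the list of nodes it depends on. A node is
    ready when every dependency is done and it is neither done nor running.
    """
    return sorted(
        node for node, parents in deps.items()
        if node not in done
        and node not in running
        and all(p in done for p in parents)
    )


# Hypothetical decomposition of a travel-booking task.
deps = {
    "login": [],
    "search_flights": ["login"],
    "search_hotels": ["login"],
    "book": ["search_flights", "search_hotels"],
}
print(ready_frontier(deps, done=set()))
print(ready_frontier(deps, done={"login"}))

# The manager may revise the DAG as findings arrive, e.g. by adding a node.
deps["apply_coupon"] = ["search_hotels"]
print(ready_frontier(deps, done={"login", "search_hotels"}))
```

Nodes returned together have no unmet dependencies among themselves, so a manager could hand them to parallel subagents in the same iteration.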


2606.01527 2026-06-02 cs.LG cs.CR 版本更新

Near-Optimal Pure Machine Unlearning for Smooth Strongly Convex Losses

平滑强凸损失下的近最优纯机器遗忘

Matthew Regehr, Gautam Kamath, Andrew Lowy

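Affiliations: University of Waterloo; Vector Institute; CISPA Helmholtz Center for Information Security

AI summary: For approximate ε-unlearning in smooth strongly convex stochastic optimization, nearly resolves the fundamental statistical cost of unlearning by proving upper and lower bounds on the excess population risk (tight up to a condition-number factor), and gives an unlearning algorithm that, when ε ≫ d, improves accuracy exponentially over retraining from scratch and differentially private baselines.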

详情

AI中文摘要

机器遗忘受到法律和用户需求（如被遗忘权）的驱动，旨在从训练模型中移除个体数据的影响。先前的工作已经为平滑强凸随机优化中的遗忘开发了算法和误差界，但遗忘的基本统计代价仍不清楚。我们通过证明近似ε-遗忘的超额总体风险的上界和下界，几乎解决了这个问题；我们的界紧至条件数因子。对于单位球上的均值估计，我们的上界和下界匹配。最优速率是通常的统计误差加上一个遗忘惩罚，该惩罚在从头再训练速率和随着ε/d增长而指数级减小的项之间插值，其中d是模型的维度。特别地，当ε≫d时，我们的ε-遗忘算法相比从头再训练模型和差分隐私基线提供了指数级的精度提升。另一方面，当ε≤d时，从头再训练是最优的。

英文摘要

Machine unlearning is motivated by legal and user-facing requirements to remove the influence of individuals' data from trained models, such as the right to be forgotten. Prior work has developed algorithms and error bounds for unlearning in smooth strongly convex stochastic optimization, but the fundamental statistical cost of unlearning has remained unclear. We nearly resolve this problem by proving upper and lower bounds on the excess population risk of approximate $\varepsilon$-unlearning; our bounds are tight up to a condition-number factor. For mean estimation over the unit ball, our upper and lower bounds match. The optimal rate is the usual statistical error plus an unlearning penalty that interpolates between the retraining-from-scratch rate and an exponentially smaller term as $\varepsilon/d$ grows, where $d$ is the dimension of the model. In particular, when $\varepsilon \gg d$, our $\varepsilon$-unlearning algorithm offers an exponential accuracy improvement over retraining the model from scratch and differentially private baselines. On the other hand, when $\varepsilon \le d$, retraining from scratch is optimal.


2606.01525 2026-06-02 cs.LG stat.ML 版本更新

Semi-Supervised Hyperbolic Hierarchical Clustering with Set-Level Structural Priors

基于集合级结构先验的半监督双曲层次聚类

Junjing Zheng, Xinyu Zhang, Xiangfeng Qiu, Chengliang Song, Weidong Jiang

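Affiliations: College of Electronic Science and Technology, National University of Defense Technology

AI summary: Proposes a semi-supervised hyperbolic hierarchical clustering method that introduces sets as basic modeling units and uses set-level structural priors derived from leaf-level supervision to guide learning of the non-leaf hierarchy, improving label consistency and tree quality.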

详情

AI中文摘要

半监督层次聚类旨在学习与数据模式和用户提供的监督一致的树结构。监督通常以叶级关系的形式给出，例如成对的必须连接/不能连接约束或三元组的必须在之前连接约束。尽管这些约束有助于调节局部样本关系，但它们并不直接指示哪些样本应形成连贯的子树。因此，学习到的树的非叶结构可能偏离真实标签所偏好的层次组织。为了解决这一局限性，我们提出了一种具有集合级结构先验的半监督双曲层次聚类方法。主要贡献是引入集合作为层次学习的基本建模单元。每个集合表示预期在子树内凝聚的样本，并从叶级监督以及学习到的约束一致相似性结构中导出。这些集合作为子树级监督的软结构先验，使得监督能够指导超出局部叶级关系的非叶层次形成。具体来说，我们首先学习约束一致的嵌入以获得可靠的集合划分，然后构建约束诱导的集合并估计集合间相似性以形成集合级结构先验。最后，将这些先验纳入双曲层次目标中进行连续树优化。在11个基准数据集上的实验和消融研究表明，所提出的方法在提高代表性层次聚类基线的标签一致性的同时，也增强了基于相似性的树质量。

英文摘要

Semi-supervised hierarchical clustering aims to learn a tree structure consistent with data patterns and user-provided supervision. Supervision is usually given as leaf-level relations, such as pairwise must-link/cannot-link constraints or triplet-wise must-link-before constraints. Although useful for regulating local sample relations, such supervision does not directly indicate which samples should form coherent subtrees. Consequently, the non-leaf structure of the learned tree may deviate from the hierarchical organization preferred by ground-truth labels. To address this limitation, we propose a semi-supervised hyperbolic hierarchical clustering method with set-level structural priors. The main contribution is to introduce sets as basic modeling units for hierarchy learning. Each set denotes samples expected to cohere within a subtree and is induced from leaf-level supervision together with a learned constraint-consistent similarity structure. These sets act as soft structural priors for subtree-level supervision, allowing supervision to guide non-leaf hierarchy formation beyond local leaf-level relations. Specifically, we first learn constraint-consistent embeddings to obtain a reliable set partition, then construct constraint-induced sets and estimate inter-set similarities to form set-level structural priors. Finally, these priors are incorporated into a hyperbolic hierarchy objective for continuous tree optimization. Experiments on eleven benchmark datasets and ablation studies show that the proposed method consistently improves label consistency over representative hierarchical clustering baselines while also enhancing similarity-based tree quality.
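Hyperbolic hierarchy objectives are commonly optimized over embeddings in the Poincaré ball, where distances grow rapidly near the boundary and tree-like structure embeds with low distortion. The abstract does not say which hyperbolic model the method uses, so the sketch below assumes the Poincaré ball and shows only the standard distance formula.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball.

    d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    """
    nu = sum(x * x for x in u)
    nv = sum(x * x for x in v)
    if nu >= 1.0 or nv >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv)))


# Points near the origin behave almost Euclidean; points near the boundary
# are far apart even when their Euclidean gap is comparable.
print(poincare_distance([0.1, 0.0], [-0.1, 0.0]))
print(poincare_distance([0.9, 0.0], [-0.9, 0.0]))
```

In embeddings of this kind, nodes close to the root of a hierarchy tend to sit near the origin and leaves near the boundary.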


2606.01521 2026-06-02 cs.LG stat.ML 版本更新

电子商务产品搜索中的语义检索

Nikhil Kothari, Saksham Samdani, Ritam Mallick, Praveen Gupta, Ankit Vijay, Surender Kumar

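Affiliations: Flipkart, India

AI summary: For short, noisy, colloquial queries and fine-grained attribute distinctions in e-commerce search, proposes a two-stage training method for a Siamese LLM dual-encoder that uses contrastive learning followed by preference optimization to retrieve exact matches and rank results.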

详情

AI中文摘要

电子商务中的语义检索必须处理短、嘈杂和口语化的查询，并在具有细粒度属性区分的大型产品目录上进行。我们提出了一种Siamese LLM双编码器，通过两阶段流水线进行训练：首先使用带有假阴性边缘掩码的对比学习，以防止对近似重复产品的惩罚；然后进行相对赔率对齐检索（ROAR），这是一种偏好优化目标，通过连续赔率比边缘将Bradley-Terry扩展到可变大小的分级相关组。训练语料库反映了这一进展——第一阶段中替代查询-产品对提供粗略的语义监督，第二阶段中分级相关性注释驱动细粒度排序。由此产生的系统能够准确检索精确匹配，同时正确排序替代品和互补产品，在查询频率层和业务垂直领域均得到验证，并通过大规模在线A/B部署验证了统计显著性。

英文摘要

Semantic retrieval in e-commerce must handle short, noisy, and colloquial queries over large product catalogs with fine-grained attribute distinctions. We present a Siamese LLM dual-encoder trained through a two-stage pipeline: contrastive learning with a false-negative margin mask to prevent penalization of near-duplicate products, followed by Relative Odds Alignment for Retrieval (ROAR), a preference optimization objective that extends Bradley-Terry to variable-sized graded relevance groups via consecutive odds-ratio margins. The training corpus mirrors this progression: substitute query-product pairs provide coarse semantic supervision in Stage 1, and graded relevance annotations drive fine-grained ranking in Stage 2. The resulting system accurately retrieves exact matches while correctly ordering substitutes and complementary products, with gains confirmed across query-frequency strata and business verticals, and statistical significance validated through live A/B deployment at scale.


2606.01485 2026-06-02 cs.CV cs.LG 版本更新

Perception First: A Frontier Native-Video Model with Self-Consistency for Implicit Video Question Answering

感知优先：具有自一致性的前沿原生视频模型用于隐式视频问答

Ali Alavi

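Affiliations: The Ohio State University

AI summary: Through systematic experiments, finds that the implicit video question answering benchmark is perception-bound rather than reasoning-bound, and that base-model perceptual capability and lightweight test-time denoising are the only reliable levers.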

详情

AI中文摘要

我们描述了提交至CVPR 2026 VRR挑战赛的方案，该方案基于ImplicitQA / VRR-QA基准：一种多项选择视频问答任务，其中答案有意地不在任何单帧中可观察，必须从创意视频的不连续帧中的空间布局、运动、深度、视角、因果关系和社会背景推断。我们对开源视频大语言模型（Qwen2.5-VL、Qwen3-VL、InternVL3、Gemma-3以及经过强化学习训练的视频推理器Video-R1和VideoChat-R1.5）和一系列推理时策略（思维链、问题分解、描述-推理级联、音频转录、空间状态提示、自一致性、多模型集成和类别路由）进行了系统的、无需训练的研究。我们的核心发现是，该基准是感知受限而非推理受限：推理侧的增强是中性的甚至有害的，而基础模型的感知能力和轻量级测试时去噪是唯一可靠的杠杆。按类别的错误分析将困难定位到低级感知——相对深度、视角和计数是最困难的类别，而因果和社会推理几乎已解决——一个明确注入单目深度线索以攻击最弱类别的提示将测试准确率降低了5.8个百分点，证实了模型需要更好的感知，而非更好的过程。

英文摘要

We describe our submission to the VRR Challenge @ CVPR 2026, built on the \emph{ImplicitQA} / \emph{VRR-QA} benchmark~\cite{implicitqa}: multiple-choice video question answering in which answers are deliberately \emph{not} observable in any single frame and must be inferred from spatial layout, motion, depth, viewpoint, causality, and social context across discontinuous frames of creative video. We conduct a systematic, training-free study spanning open-source Video-LMMs (Qwen2.5-VL~\cite{qwen25vl}, Qwen3-VL~\cite{qwen3vl}, InternVL3, Gemma-3, and the RL-tuned video reasoners Video-R1~\cite{videor1} and VideoChat-R1.5~\cite{videochatr15}) and a battery of inference-time strategies (chain-of-thought, question decomposition, describe-then-reason cascades, audio transcripts, spatial state prompting, self-consistency~\cite{selfconsistency}, multi-model ensembling, and category routing). Our central finding is that this benchmark is \emph{perception-bound rather than reasoning-bound}: reasoning-side augmentations are neutral-to-harmful, whereas base-model perceptual capability and lightweight test-time denoising are the only reliable levers. A per-category error analysis localizes the difficulty to low-level perception -- relative depth, viewpoint, and counting are the hardest categories, while causal and social reasoning are nearly solved -- and a prompt that explicitly injects monocular depth cues to attack the weakest category \emph{lowers} test accuracy by $5.8$ points, confirming that the model needs a better \emph{percept}, not a better \emph{procedure}.
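Self-consistency, one of the inference-time strategies listed above, reduces in the multiple-choice setting to sampling several answers and taking a majority vote. The sketch below shows only that voting step; the sampled answers are made up, and the tie-breaking rule (first answer to reach the top count wins) is an assumption.

```python
from collections import Counter


def self_consistent_answer(samples):
    """Return (majority answer, agreement rate) over sampled answers.

    Ties are broken in favour of the answer that appears first in `samples`.
    """
    if not samples:
        raise ValueError("need at least one sampled answer")
    counts = Counter(samples)
    top = max(counts.values())
    for answer in samples:
        if counts[answer] == top:
            return answer, top / len(samples)


# Hypothetical option letters from five stochastic decodes of one question.
print(self_consistent_answer(["B", "A", "B", "C", "B"]))
```

The agreement rate doubles as a rough confidence signal, which is one reason voting can serve as a lightweight test-time denoiser.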


2606.01483 2026-06-02 cs.LG cs.AI eess.AS 版本更新

MURMUR: An Efficient Inference System for Long-Form ASR

MURMUR：一种高效的长时间语音识别推理系统

Wei-Tzu Lee, Keisuke Kamahori, Baris Kasikci

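Affiliations: University of Washington

AI summary: Proposes MURMUR, an inference system that optimizes at the inter-chunk and intra-chunk levels to significantly cut the latency of long-form speech recognition while maintaining high accuracy.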

详情

AI中文摘要

长时间自动语音识别（ASR）需要高精度和低延迟，但现有系统迫使两者之间进行权衡。基于块的流水线在并行窗口中处理音频以实现低延迟，但丢失了跨块上下文，并且需要脆弱的启发式方法来对齐边界处的说话人和时间戳。长上下文ASR模型通过单次传递解决所有问题以获得更好的准确性，但速度慢一个数量级。我们提出MURMUR，一个通过两级操作克服这种权衡的推理系统。在块间级别，我们重新审视基于块的流水线以适应现代长上下文ASR，将块大小视为可调超参数，并表明中间块大小在准确性和延迟之间取得了良好的平衡。在块内级别，我们通过应用于输出和语音令牌的滑动窗口KV缓存驱逐策略来利用注意力稀疏性。在AMI-IHM上，MURMUR匹配单次传递准确性，同时将延迟降低4.2倍，通过令牌驱逐进一步获得收益，相对tcpWER退化小于1%。MURMUR的代码可在https://github.com/uw-syfi/Murmur获取。

英文摘要

Long-form automatic speech recognition (ASR) requires both high accuracy and low latency, but existing systems force a trade-off between the two. Chunk-based pipelines process audio in parallel windows for low latency, but lose cross-chunk context and need brittle heuristics to align speakers and timestamps at boundaries. Long-context ASR models resolve everything in a single pass for better accuracy, but are an order of magnitude slower. We propose Murmur, an inference system that overcomes this trade-off by operating at two levels. At the inter-chunk level, we revisit the chunk-based pipeline for modern long-context ASR, treating chunk size as a tunable hyperparameter, and show that intermediate chunk sizes strike a good balance of accuracy and latency. At the intra-chunk level, we exploit attention sparsity through a sliding window KV cache eviction policy applied to both output and speech tokens. On AMI-IHM, Murmur matches single-pass accuracy while reducing latency by 4.2x, with further gains from token eviction at less than 1% relative tcpWER degradation. The code of Murmur is available at https://github.com/uw-syfi/Murmur.
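A sliding-window eviction policy keeps only the most recent entries of the KV cache and drops older ones as new tokens arrive. The sketch below shows that behaviour in isolation, assuming a single fixed window; how Murmur applies the policy to output and speech tokens is not described in the abstract, and the class name is hypothetical.

```python
from collections import deque


class SlidingWindowKVCache:
    """Keeps the key-value entries of the most recent `window` positions."""

    def __init__(self, window):
        if window < 1:
            raise ValueError("window must be at least 1")
        # A bounded deque discards the oldest entry when a new one arrives.
        self.entries = deque(maxlen=window)

    def append(self, position, key, value):
        self.entries.append((position, key, value))

    def positions(self):
        return [p for p, _, _ in self.entries]


cache = SlidingWindowKVCache(window=3)
for pos in range(5):
    cache.append(pos, key=[float(pos)], value=[float(pos) * 2])
print(cache.positions())
```

Memory stays constant in sequence length under this policy, at the cost of losing attention to tokens that have left the window.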


2606.01470 2026-06-02 physics.flu-dyn cs.AI cs.LG 版本更新

基于证据的多目标潜在扰动在扩散模型中的基因型条件分子生成

Brenda Nogueira, Gisela A. Gonzalez-Montiel, Nitesh V. Chawla, Nuno Moniz

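Affiliations: Department of Computer Science and Engineering; University of Notre Dame; Department of Chemistry and Biochemistry; Lucy Family Institute for Data & Society

AI summary: Proposes optimizing a learnable perturbation in the latent space of a pretrained genotype-to-drug diffusion model via gradient ascent to maximize a composite reward of drug sensitivity, drug-likeness, and synthetic accessibility, with experimental data and an LLM pipeline used to check biological plausibility and mechanistic consistency.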

详情

AI中文摘要

由于肿瘤异质性和跨癌症亚型缺乏明确的分子靶点，开发有效的抗癌疗法仍然具有挑战性。以癌症基因型为条件的生成模型为个性化药物发现提供了一条有前景的途径，但现有方法缺乏对同时优化敏感性、可合成性和机制结合合理性的明确优化。我们提出了一种针对预训练的基因型到药物扩散模型的潜在空间优化方法，引入一个在分子潜在空间上的可学习扰动，通过梯度上升优化以最大化结合预测药物敏感性（AUC）、类药性（QED）和合成可及性（SAS）的复合奖励。关键的是，通过将奖励设计和评估基于实验衍生的癌细胞系数据和经过验证的药理学信号，将候选生成锚定在真实世界的临床证据中，从而强制执行生物学真实性。机制一致性合理性进一步通过基于扩散模型注意力机制的多智能体LLM管道进行评估。在来自三个保留评估集的15个癌细胞系上的实验表明，在敏感性、类药性、可合成性和化学有效性方面，与竞争基线相比，该方法具有一致且显著的改进。

英文摘要

Developing effective anticancer therapeutics remains challenging due to tumor heterogeneity and the absence of well-defined molecular targets across cancer subtypes. Generative models conditioned on cancer genotypes offer a promising avenue for personalized drug discovery, yet existing approaches lack explicit optimization for simultaneous sensitivity, synthesizability, and mechanistic binding plausibility. We present a latent-space optimization approach for a pretrained genotype-to-drug diffusion model, introducing a learnable perturbation over the molecular latent space optimized via gradient ascent to maximize a composite reward combining predicted drug sensitivity (AUC), drug-likeness (QED), and synthetic accessibility (SAS). Critically, biological realism is enforced by grounding both reward design and evaluation in experimentally derived cancer cell line data and validated pharmacologic signals, anchoring candidate generation in real-world clinical evidence. Mechanistic plausibility is further assessed by a multi-agent LLM pipeline grounded in the diffusion model's attention mechanism. Experiments across 15 cancer cell lines from three held-out evaluation sets demonstrate consistent and noticeable improvements over competing baselines in sensitivity, drug-likeness, synthesizability, and chemical validity.


2606.01457 2026-06-02 cs.AI cs.LG stat.ML 版本更新

Transferring Information Across Interventions in Causal Bayesian Optimization

跨干预因果贝叶斯优化的信息传递

Mohammad Ali Javidian

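Affiliations: Computer Science Department

AI summary: Proposes graph-coupled causal Bayesian optimization, which links different intervention effects through uncertainty over shared causal parameters so that information transfers across interventions, and proves a low-rank kernel property and a regret bound for identifiable linear Gaussian causal models.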

详情

AI中文摘要

贝叶斯优化是一种优化昂贵系统的流行方法，其中每次实验、模拟或干预都会耗费时间或金钱。在其标准形式中，它将我们控制的变量视为黑盒的普通输入，无法区分单纯的相关性与真正的因果关系。因果贝叶斯优化通过使用已知因果图结合观测数据来决定哪些变量值得干预，从而部分弥补了这一差距。然而，现有方法几乎孤立地学习每种可能干预的效果，尽管在因果系统中这些效果通常共享相同的底层机制。我们提出图耦合因果贝叶斯优化，通过我们对一小部分共享因果参数的不确定性，将不同的干预效果联系在一起。结果是一个因果核，使得从一次干预收集的证据能够改进我们对相关干预的估计。对于可识别的线性高斯因果模型，我们证明该核具有低秩，其秩由共享参数的数量而非干预菜单的大小界定。这进而产生一个信息增益界，该界仅随优化范围对数增长，以及一个遗憾界，清晰地将三种误差来源分开：优化、因果估计以及考虑哪些干预集的选择。我们还描述了非线性和自适应扩展。在与理论一致的高斯系统、共享机制压力测试以及标准因果优化基准测试中，该方法保持了因果贝叶斯优化的优势，同时实现了跨相关干预的信息传递，当对目标父节点的直接干预不可用且稀疏的干预数据必须在一大组候选干预中重复使用时，增益最为明显。

英文摘要

Bayesian optimization is a popular way to optimize expensive systems, where every experiment, simulation, or intervention costs time or money. In its standard form, it treats the variables we control as plain inputs to a black box and cannot tell apart mere correlation from a real cause and effect. Causal Bayesian optimization closes part of this gap by using a known causal graph together with observational data to decide which variables are worth intervening on. Existing methods, however, learn the effect of each possible intervention almost in isolation, even though in a causal system these effects usually share the same underlying mechanisms. We propose graph-coupled causal Bayesian optimization, which ties the different intervention effects together through the uncertainty we have about a small set of shared causal parameters. The result is a causal kernel that lets evidence collected from one intervention improve our estimate of related interventions. For identifiable linear Gaussian causal models, we show that this kernel has low rank, bounded by the number of shared parameters rather than by the size of the intervention menu. This in turn yields an information-gain bound that grows only logarithmically in the optimization horizon, and a regret bound that cleanly separates three sources of error: optimization, causal estimation, and the choice of which intervention sets to consider. We also describe nonlinear and adaptive extensions. Across theory-aligned Gaussian systems, shared-mechanism stress tests, and standard causal optimization benchmarks, the method keeps the benefits of causal Bayesian optimization while transferring information across related interventions, with the clearest gains when direct interventions on the target's parents are unavailable and sparse interventional data must be reused across a large family of candidate interventions.


2606.01456 2026-06-02 cs.LG cs.CL cs.GT 版本更新

Truthful AI Advisors: A Pre-Specified Benchmark for Large Language Model Honesty Under Preference Misalignment

诚实的人工智能顾问：偏好错位下大语言模型诚实性的预设基准

Hamidreza Hasani Balyani, Seyed Pouyan Mousavi Davoudi, Alireza Amiri-Margavi, Amin Gholami Davodi, Arshia Gharagozlou

Affiliations: Amazon Lab126, HW Tech Org.; Computational Modeling and Simulation, University of Pittsburgh; Mathematics & Statistics Department, University of Minnesota Duluth

AI summary: Builds a benchmark from the Crawford-Sobel cheap-talk model to evaluate whether large language models stay honest under preference conflict, and finds that models over-reveal information relative to the strategic optimum.

Comments 19 pages. Code and data: https://github.com/iHamidHasani/cheap-talk-llm-benchmark

AI中文摘要

大语言模型越来越多地被部署为顾问，其目标与用户不一致：推荐系统优化参与度，销售助手优化购买，谈判代理优化让步。当诚实与自身收益冲突时，这些顾问是否保持诚实是一个核心的对齐评估问题。我们将经典的Crawford-Sobel廉价谈话模型转化为偏好错位下LLM诚实性的预设基准。廉价谈话理论预测既非完全揭示也非沉默，而是粗糙的单调划分，随着偏好冲突增加，信息区间减少。发送者观察到状态omega在[0,1]中，希望接收者的行动接近omega+b，并向理想行动为omega的接收者发送一条无成本消息。设计使用5个偏差水平、3个提示框架、固定的低温度设置和每个单元200个状态：共12,000次发送者调用。对于正偏差网格b∈{0.01,0.04,0.08,0.12}，最信息丰富的划分大小分别为7、4、3、2，预言机归一化互信息分别为0.5294、0.3268、0.2205、0.1829。在四个指令调优模型（GPT-4o、Claude Sonnet 4.5、Gemini 2.5 Flash-Lite、Llama-3.3-70B）上运行完整设计，我们发现所有四个模型相对于最信息丰富的均衡过度揭示1.8至4.2倍：归一化互信息保持在0.78-0.94，而预言机规定为0.18-0.53。信息量随偏差下降如预测，但从未接近策略最优；模型显示出近乎完全的揭示，并带有跟踪其偏差的恒定正向偏移（线性夸大）。收益最大化与诚实框架的影响可忽略。解码器消融表明，仅当接收者读取发送者陈述的数字时，该发现才可恢复：仅嵌入解码器将相同数据误读为近乎胡言乱语。

英文摘要

Large language models are increasingly deployed as advisors whose objective is not aligned with the user's: recommenders optimize for engagement, sales assistants for purchases, negotiation agents for concessions. Whether such advisors stay truthful when honesty conflicts with their own payoff is a core alignment-evaluation question. We turn the canonical Crawford-Sobel cheap-talk model into a pre-specified benchmark for LLM honesty under preference misalignment. Cheap-talk theory predicts neither full revelation nor silence but coarse monotone partitions, with fewer informative intervals as preference conflict grows. A sender observes a state omega in [0,1], wants the receiver's action near omega+b, and sends one costless message to a receiver whose ideal action is omega. The design uses 5 bias levels, 3 prompt frames, a fixed low-temperature setting, and 200 states per cell: 12,000 sender calls. For the positive-bias grid b in {0.01,0.04,0.08,0.12} the exact most-informative partition sizes are 7,4,3,2, with oracle normalized mutual information 0.5294, 0.3268, 0.2205, 0.1829. Running the full design on four instruction-tuned models (GPT-4o, Claude Sonnet 4.5, Gemini 2.5 Flash-Lite, Llama-3.3-70B), we find all four over-reveal relative to the most-informative equilibrium by 1.8 to 4.2x: normalized mutual information stays at 0.78-0.94 where the oracle prescribes 0.18-0.53. Informativeness declines with bias as predicted but never approaches the strategic optimum; rather than coarse partitions, models show near-full revelation with a constant upward offset tracking their bias (linear exaggeration). Payoff-maximizing versus honesty framing has negligible effect. A decoder ablation shows the finding is recoverable only when the receiver reads the sender's stated number: an embedding-only decoder mis-reads the same data as near-babbling.
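The partition sizes quoted in the abstract can be checked against the textbook uniform-quadratic Crawford-Sobel solution. The sketch below assumes that setting (state uniform on [0,1], quadratic losses), in which an N-interval equilibrium exists when 2N(N-1)b < 1 and the cutpoints are a_i = i/N + 2b·i·(i-N). The function names are illustrative and are not taken from the benchmark's repository.

```python
def max_partition_size(b: float) -> int:
    """Largest N with 2*N*(N-1)*b < 1 (uniform-quadratic cheap talk)."""
    if b <= 0:
        raise ValueError("bias must be positive")
    n = 1
    # Grow N while an equilibrium with N+1 intervals still exists.
    while 2 * (n + 1) * n * b < 1:
        n += 1
    return n


def cutpoints(b: float, n: int) -> list:
    """Interval boundaries a_0..a_n of the n-interval equilibrium."""
    return [i / n + 2 * b * i * (i - n) for i in range(n + 1)]


if __name__ == "__main__":
    for bias in (0.01, 0.04, 0.08, 0.12):
        size = max_partition_size(bias)
        print(bias, size, [round(a, 4) for a in cutpoints(bias, size)])
```

Under these assumptions the intervals widen by 4b from one to the next, which is why a larger bias supports fewer informative intervals.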

2606.01444 2026-06-02 cs.AI cond-mat.mtrl-sci cs.CL cs.LG math.CT 版本更新

Self-Revising Discovery Systems for Science: A Categorical Framework for Agentic Artificial Intelligence

科学中的自我修正发现系统：面向主体人工智能的范畴论框架

Fiona Y. Wang, Markus J. Buehler

发表机构 * Laboratory for Atomistic and Molecular Mechanics（原子分子力学实验室）； Department of Biological Engineering（生物工程系）； Massachusetts Institute of Technology（麻省理工学院）； Department of Civil and Environmental Engineering（土木与环境工程系）； Department of Mechanical Engineering（机械工程系）； Center for Computational Science and Engineering（计算科学与工程中心）； Schwarzman College of Computing（施瓦茨曼计算学院）

AI总结本文提出一个基于范畴论的框架，通过左Kan扩展实现科学发现中的表征体制转换，并应用于材料科学中的蛋白质力学和纤维网络建模。

AI中文摘要

科学发现不仅是生成答案，更是对证据、人工制品、操作和验证者进行类型化的表征体制的修正。我们为材料科学中的主体发现开发了一个范畴论描述。在固定体制b中，模式类别为S_b，系统状态是一个余预层I_t: S_b -> Set，来源是元素范畴∫_{S_b} I_t。固定体制操作是对此类状态的更新，仅当指定并保留了保持来源的细化时才是自函子。发现则是经过验证的体制转换u: S_b -> S_b'：旧人工制品通过左Kan扩展Lan_u I_t保存并传输，并与转换后状态进行比较，以识别超出函子传输的剩余内容。这在不依赖主观新颖性的情况下区分了检索、搜索和发现。我们在两个系统中实例化了该框架。在Builder/Breaker中，蛋白质力学世界模型在最小描述长度门控下进行修正；接受的定律将链内柔性表示为受慢集体模式调节的全模态弹性柔度，即模式调节柔度。在CategoryScienceClaw中，类型化技能、人工制品、开放需求、工作流变异、门控、压力测试和公共话语构成了一个携带证明的知识计算图。一个纤维网络示例记录了候选模型、被拒绝的替代方案、AIC门控、扰动测试以及一个基于各向同性纤维计数描述符的接受取向张量各向异性刚度代理模型。这些案例共同展示了范畴论如何既作为科学发现的数学语言，又作为自我修正AI发现系统的工程规范。

英文摘要

Scientific discovery is not only answer generation but revision of the representational regime in which evidence, artifacts, operations, and verifiers are typed. We develop a category-theoretic account of agentic discovery for materials science. In a fixed regime b with schema category S_b, the system state is a copresheaf I_t: S_b -> Set, and provenance is the category of elements \int_{S_b} I_t. Fixed-regime operation is an update on such states, endofunctorial only when provenance-preserving refinements are specified and preserved. Discovery is instead a verified regime transition u: S_b -> S_b': old artifacts are preserved, transported by the left Kan extension Lan_u I_t, and compared with the post-transition state to identify residual content beyond functorial transport. This separates retrieval, search, and discovery without subjective novelty. We instantiate the framework in two systems. In Builder/Breaker, a protein-mechanics world model is revised under a Minimum Description Length gate; the accepted law expresses within-chain flexibility as all-mode elastic compliance conditioned by slow collective-mode participation, or mode-conditioned compliance. In CategoryScienceClaw, typed skills, artifacts, open needs, workflow mutation, gates, stress tests, and public discourse become a proof-carrying knowledge-computation graph. A fiber-network example records candidate models, rejected alternatives, an AIC gate, perturbation tests, and an accepted orientation-tensor anisotropic stiffness surrogate over an isotropic fiber-count descriptor. Together, the cases show how category theory can be both a mathematical language for discovery and an engineering specification for self-revising AI discovery systems.

2606.01443 2026-06-02 cs.LG cs.AI cs.CV 版本更新

UR-JEPA: Uniform Rectifiability as a Regularizer for Joint-Embedding Predictive Architectures

UR-JEPA：一致可求长性作为联合嵌入预测架构的正则化器

Triet M. Le

发表机构 * Spatiolyx LLC（Spatiolyx公司）

AI总结提出UR-JEPA，通过高斯核平滑的Carleson型平方函数实现一致n-可求长测度正则化，防止表示坍塌，在多个数据集上达到与LeJEPA相当的峰值精度但具有更低的种子方差。

AI中文摘要

训练联合嵌入预测架构（JEPA）的一个核心困难是防止表示坍塌。LeJEPA通过素描各向同性高斯正则化（SIGReg）对嵌入施加各向同性高斯目标来解决这一问题。该目标与流形假设存在张力，流形假设期望嵌入集中在环境空间的低维子集上。我们提出\emph{UR-JEPA}，其目标是在小尺度上具有局部切向维度$n$的一致$n$-可求长测度，通过高斯核平滑的Carleson型平方函数$\mathcal{L}^{\text{CGLT}}$实现，并辅以Jones $β$数公式。在Inet10上，UR-JEPA($\mathcal{L}^{\text{CGLT}}$)达到$0.9141 \pm 0.0014$，相比LeJEPA($\mathcal{L}^{\text{SIGReg}}$)提高了$+0.83$个百分点，种子标准差降低约$30\%$；在匹配配方的Galaxy10~SDSS、单种子ImageNet-$100$运行和3种子EuroSAT遥感运行中，两种方法在收敛时处于相同的峰值精度区间，UR-JEPA保持其较低的种子方差特征。在EuroSAT上，域内训练的两种方法达到$96.0$到$96.1\%$，可与大型遥感基础模型的迁移结果相竞争，而骨干网络小$25$倍。区别在于几何结构：对投影仪输出分布的直接可视化显示，在所有四个数据集上，UR-JEPA($\mathcal{L}^{\text{CGLT}}$)产生的全局PCA谱在索引$\sim 20$到$25$（共$D=32$）处出现$4$到$5$个数量级的下降，而LeJEPA的谱接近平坦（顶部到底部比率最多为$3.6$）。两种方法的每维度边缘分布同时接近高斯分布（平均Shapiro-Wilk $W \in [0.992, 0.996]$），这是Diaconis-Freedman结果的一个推论。因此，在匹配精度下，两种正则化器产生结构上不同的投影表示。

英文摘要

A central difficulty in training Joint-Embedding Predictive Architectures (JEPAs) is preventing representation collapse. LeJEPA addresses this by enforcing an isotropic Gaussian target on the embeddings via Sketched Isotropic Gaussian Regularization (SIGReg). This target is in tension with the manifold hypothesis, which expects embeddings to concentrate on a low-dimensional subset of the ambient space. We propose \emph{UR-JEPA}, which targets a uniformly $n$-rectifiable measure of local tangent dimension $n$ at small scales, realized through a Gaussian-kernel smoothed Carleson-type square function $\mathcal{L}^{\text{CGLT}}$, with a complementary Jones $β$-number formulation. On Inet10, UR-JEPA($\mathcal{L}^{\text{CGLT}}$) attains $0.9141 \pm 0.0014$ for a $+0.83$\,pp gain over LeJEPA($\mathcal{L}^{\text{SIGReg}}$) with $\sim 30\%$ lower seed standard deviation; on matched-recipe Galaxy10~SDSS, a single-seed ImageNet-$100$ run, and a $3$-seed EuroSAT remote-sensing run, the two methods lie in the same peak-accuracy band at convergence, with UR-JEPA retaining its lower-seed-variance signature. On EuroSAT the in-domain pair is competitive at $96.0$ to $96.1\%$ with large remote-sensing foundation-model transfer at a $25\times$ smaller backbone. The distinction is geometric: direct visualization of the projector output distribution shows that on all four datasets UR-JEPA($\mathcal{L}^{\text{CGLT}}$) produces a global PCA spectrum with a $4$ to $5$ order-of-magnitude drop at index $\sim 20$ to $25$ out of $D = 32$, while LeJEPA's spectrum is near-flat (top-to-bottom ratio at most $3.6$). Per-dimension marginals are simultaneously near-Gaussian for both methods (mean Shapiro-Wilk $W \in [0.992, 0.996]$) as a Diaconis-Freedman consequence. At matched accuracy the two regularizers therefore yield structurally distinct projected representations.

2606.01437 2026-06-02 cs.LG cs.AI 版本更新

CEAR: Certified Ensemble Adversarial Robustness in DNNs

CEAR: 深度神经网络中的集成对抗鲁棒性认证

Daniel Sadig, Mohammadreza Maleki, Hamed Karimi, Reza Samavi

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结提出CEAR方法，通过混合经验与认证防御机制，利用高斯噪声和温度混淆梯度与logits，并扩展随机平滑以验证集成分类器的鲁棒性，在多个数据集上取得更优的认证准确率和鲁棒半径。

Comments This is the preprint of the work accepted for publication in the Proceedings of the 39th Canadian Conference on Artificial Intelligence (Canadian AI 2026); 19 Pages

AI中文摘要

深度神经网络（DNN）极易受到对抗性扰动的影响，这促使了对安全关键应用鲁棒性的广泛研究。最先进的实证防御机制通过训练阶段提高DNN的鲁棒性，但仍难以应对自适应白盒攻击。另一方面，认证防御在指定的扰动范围内提供可证明的鲁棒性保证。这些保证无论扰动程度如何都成立，即使攻击者拥有模型的完全知识。在本文中，我们提出了CEAR，一种基于集成的鲁棒方法，它利用了实证和认证防御机制的混合。CEAR使用不同的高斯噪声和温度训练集成中的每个网络，以混淆梯度和logits，使模型对更强的基于梯度的攻击更具抵抗力。然后我们使用带噪声的logits，并提出了两种不同的投票机制来进一步提高鲁棒性。此外，我们扩展了随机平滑以验证基于集成的分类器的鲁棒性。我们在MNIST、CIFAR10和TinyImageNet数据集上的实验评估表明，与基线方法相比，平均认证准确率更高，鲁棒半径更大，可迁移性更低。

英文摘要

Deep Neural Networks (DNNs) are highly susceptible to adversarial perturbations, leading to extensive research on robustness for safety-critical applications. State-of-the-art empirical defense mechanisms improve the robustness of DNNs through the training phase, but still struggle against adaptive white-box attacks. On the other hand, certified defenses offer provable guarantees of robustness within a specified perturbation bound. These guarantees hold regardless of the level of perturbations, even if the attacker is given full knowledge of the model. In this paper, we propose CEAR, an ensemble-based robust method that utilizes a hybrid of empirical and certified defense mechanisms. CEAR trains each network within the ensemble using varying Gaussian noise and temperatures to obfuscate gradients and logits, making the model more resistant to stronger gradient-based attacks. We then use noisy logits and propose two different voting mechanisms to further improve robustness. Furthermore, we extend randomized smoothing to verify the robustness of ensemble-based classifiers. Our experimental evaluations on MNIST, CIFAR10, and TinyImageNet datasets demonstrate superior certified accuracy on average, increased robustness radius, and decreased transferability compared to baseline methods.
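For orientation, the standard single-classifier randomized-smoothing certificate (Cohen et al.) that such work builds on can be sketched as follows. This is not CEAR's ensemble extension, whose details the abstract does not give. It assumes Gaussian noise with standard deviation `sigma` and that `p_top_lower` is an already-computed lower confidence bound on the probability of the top class under noise; the function name is illustrative.

```python
from statistics import NormalDist


def certified_l2_radius(p_top_lower: float, sigma: float):
    """L2 radius sigma * Phi^{-1}(p_top_lower), or None to abstain.

    p_top_lower: lower confidence bound on the top-class probability
                 of the smoothed classifier (assumed given).
    sigma:       standard deviation of the Gaussian smoothing noise.
    """
    if not 0.0 < p_top_lower < 1.0:
        raise ValueError("p_top_lower must lie strictly between 0 and 1")
    if p_top_lower <= 0.5:
        return None  # no certificate when the top class is not a clear majority
    return sigma * NormalDist().inv_cdf(p_top_lower)


if __name__ == "__main__":
    print(certified_l2_radius(0.9, 0.5))
```

The certificate only holds inside the returned radius, which is the sense in which certified defenses are bounded by a specified perturbation size.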

2606.01432 2026-06-02 cs.LG eess.IV eess.SP stat.ML 版本更新

Leaf Spectral Reflectance Prediction Using Multi-Head Attention Neural Networks

使用多头注意力神经网络预测叶片光谱反射率

Parastoo Farajpoor, Alireza Pourreza, Mohammadreza Narimani, Ashraf El-Kereamy, Matthew W. Fidelibus

发表机构 * Digital Agriculture Laboratory, Department of Biological and Agricultural Engineering, University of California, Davis, CA, USA（加州大学戴维斯分校数字农业实验室，生物与农业工程系）； Department of Botany and Plant Sciences, University of California, Riverside, CA, USA（加州大学河滨分校植物学与植物科学系）； Department of Viticulture and Enology, University of California, Davis, CA, USA（加州大学戴维斯分校葡萄学与酿酒学系）

AI总结针对特定作物（如葡萄藤），提出基于多头注意力神经网络的叶片性状-光谱预测模型，在葡萄藤数据集上实现高精度（R²=0.84，NRMSE=1.52%），优于传统辐射传输模型PROSPECT-PRO。

Comments 8 pages, 5 figures. Author-accepted version of the SPIE conference paper

DOI: 10.1117/12.3061298
Journal ref: Proc. SPIE 13475, 134750V (2025)

AI中文摘要

从生理和生化性状准确建模叶片光谱反射率对于推进植物科学和精准农业中的遥感应用至关重要。广泛使用的辐射传输模型（如PROSPECT-PRO）依赖于从多种物种中开发的广义性状-反射率关系，这可能无法完全捕捉特定作物（如葡萄藤）的光谱行为。在本研究中，我们开发了一个基于多头注意力神经网络的性状到光谱预测模型，该模型在包含16个叶片性状（涵盖多个品种、生长阶段和年份）的葡萄藤特定数据集上训练。使用分层5折交叉验证评估模型，平均决定系数（R^2）为0.84，归一化均方根误差（NRMSE）为1.52%，显示出高精度和泛化能力。与正向模式下的PROSPECT-PRO相比，神经网络表现出更低的平均绝对误差（MAE），尤其是在近红外（NIR）和短波红外（SWIR）区域。这些结果强调了物种特异性建模方法的重要性，并表明将生化和结构性状整合到数据驱动架构中可以显著改善光谱预测。所提出的模型为生成准确的叶片级反射率数据提供了稳健框架，在冠层性状反演、葡萄园监测和遥感驱动的作物管理方面具有潜在应用。

英文摘要

Accurate modeling of leaf spectral reflectance from physiological and biochemical traits is essential for advancing remote sensing applications in plant science and precision agriculture. Widely used radiative transfer models, such as PROSPECT-PRO, rely on generalized trait-reflectance relationships developed from a wide range of species, which may not fully capture the spectral behavior of specific crops like grapevines. In this study, we developed a trait-to-spectra prediction model using a multi-head attention neural network trained on a grapevine-specific dataset that includes 16 leaf traits measured across multiple varieties, growth stages, and years. The model was evaluated using stratified 5-fold cross-validation and achieved an average coefficient of determination (R^2) of 0.84 and normalized root mean squared error (NRMSE) of 1.52 percent, demonstrating high accuracy and generalizability. When compared to PROSPECT-PRO in forward mode, the neural network exhibited lower mean absolute error (MAE), especially in the near-infrared (NIR) and shortwave-infrared (SWIR) regions. These results emphasize the importance of species-specific modeling approaches and show that integrating biochemical and structural traits into data-driven architectures can significantly improve spectral prediction. The proposed model provides a robust framework for generating accurate leaf-level reflectance data, with potential applications in canopy trait retrieval, vineyard monitoring, and remote sensing-driven crop management.

2606.01427 2026-06-02 stat.ML cs.LG 版本更新

On the Uncertainty Quantification Ability of Tabular Foundation Models

关于表格基础模型的不确定性量化能力

Tyler R. Johnson, Kian Ben-Jacob, Nima Negarandeh, Oriol Vendrell-Gallart, Ramin Bostanabad

发表机构 * Department of Mechanical and Aerospace Engineering, University of California, Irvine（加州大学欧文分校机械与航空航天工程系）； Department of Civil and Environmental Engineering, University of California, Irvine（加州大学欧文分校土木与环境工程系）

AI总结通过对比TabPFN与高斯过程在回归任务上的实证研究，揭示了显式先验与学习先验之间的权衡：TabPFN在复杂高维问题中表现优异，而高斯过程在数据稀缺时提供更优的预测精度和不确定性量化。

Comments 12 pages, 2 figures, 2 tables

AI中文摘要

基础模型（FMs）在无需特定任务训练或微调的情况下，已在跨任务泛化方面取得了显著成功。然而，力学和计算科学中的许多关键应用不仅需要准确的预测，还需要可靠的不确定性量化（UQ）。本文通过全面的实证研究，比较了表格先验数据拟合网络（TabPFN）与高斯过程（GP）在回归任务中的UQ能力。我们系统地评估了这两种方法在一系列具有不同复杂度、数据集大小和输入维度的回归问题上的表现。我们使用默认设置构建所有GP，并与TabPFN v2.5进行公平比较。我们的发现突显了显式先验与学习先验之间的重要权衡：虽然TabPFN在数据充足的高维复杂问题上具有高度竞争性的性能，但GP在数据稀缺场景下通常提供更优的预测精度和UQ。此外，当所选核函数构成底层函数的良好先验时，GP的性能可能显著超过TabPFN。我们的结果可从https://github.com/kianswarehouse/GPvsPFN复现。

英文摘要

Foundation models (FMs) have achieved substantial success in generalizing across tasks without problem-specific training or fine-tuning. However, many critical applications in mechanics and computational science require not only accurate predictions but also reliable uncertainty quantification (UQ). Herein we investigate the UQ capabilities of tabular FMs in regression tasks through a comprehensive empirical study comparing Tabular Prior-Data Fitted Networks (TabPFN) against Gaussian processes (GPs). We systematically evaluate these two methods across a host of regression problems with varying complexity, dataset sizes, and input dimensionalities. For a fair comparison against TabPFN v2.5, we build all the GPs with a default setting. Our findings highlight an important trade-off between explicit and learned priors: while TabPFN achieves highly competitive performance for complex, high-dimensional problems with sufficient data, GPs often provide superior predictive accuracy and UQ in data-scarce settings. Moreover, when the chosen kernel constitutes a good prior for the underlying function, GP performance can substantially exceed that of TabPFN. Our results can be reproduced from https://github.com/kianswarehouse/GPvsPFN.

2606.01425 2026-06-02 cs.LG 版本更新

Learning-based Directed Graph Abstraction of Combinatorial Spaces for Order-Preserving Search in Mixed-Combinatorial Nonlinear Optimization

基于学习的组合空间有向图抽象用于混合组合非线性优化中的保序搜索

Gishnu Madhu, Feng Liu, Souma Chowdhury

发表机构 * Department of Computer Science and Engineering（计算机科学与工程系）； Department of Mechanical and Aerospace Engineering（机械与航空航天工程系）

AI总结提出一种基于图神经网络的有向图抽象方法，将组合空间映射为有向图，以改进混合组合非线性规划问题的搜索效率。

Comments Accepted for presentation at ASME IDETC 2026

AI中文摘要

混合组合非线性规划（MCNLP）问题出现在许多工程设计和规划应用中，例如由于分类、组件和几何设计选择，以及联合任务和运动规划。组合空间的传统表示方法，如整数或二进制编码，常常引入虚假关系，增加维度，并需要额外的兼容性约束。相反，本文借鉴了机器人规划和车辆/网络路由领域的最新发展，旨在使用图神经网络（GNN）学习组合空间上的搜索启发式。更具体地说，本文通过使用边场图网络（EFGN）学习从无向全连接组合图到指示改进方向的有向图的映射，提出了首个结构化的组合空间抽象。为了展示这种抽象组合空间的新方法在解决MCNLP中的效用，我们采用了一个最近的优化框架，该框架纯粹搜索非组合（例如连续）变量，并通过使用抽象模型（类似于推荐系统）为每个候选设计检索最合适的组合。与原始框架中的推荐系统相比，所提出的方向感知抽象模型提供了可能更具可扩展性和可解释性的组合检索。为了评估，所提出的方法与著名的粒子群优化和遗传算法求解器集成，在三个具有不同组合和变量数量的基准非线性问题上进行测试。与使用索引化组合的基线求解器相比，基于GNN的推荐器在多次运行中始终获得更好的平均最优值和鲁棒性。

英文摘要

Mixed-combinatorial nonlinear programming (MCNLP) problems arise in many engineering design and planning applications, e.g., due to categorical, component, and geometric design choices, as well as joint task and motion planning. Traditional representations of combinatorial spaces, such as integer or binary encoding, often introduce spurious relations, increase dimensionality, and require additional compatibility constraints. Instead, this paper draws on recent developments in robot planning and vehicle/network routing domains that aim to learn search heuristics over combinatorial spaces using graph neural networks (GNNs). More specifically, this paper presents a first-of-its-kind structured abstraction of the combinatorial space by learning a mapping from an undirected fully connected graph of combinations to a directed graph indicating improvement directions using an Edge Field Graph Network (EFGN). To demonstrate the utility of this new way of abstracting the combinatorial space in solving MCNLPs, we adopt a recent optimization framework that purely searches over the non-combinatorial (e.g., continuous) variables and retrieves the best-suited combination for each candidate design by using the abstraction model, akin to a recommender system. The presented direction-aware abstraction model provides a potentially more scalable and interpretable retrieval of combinations compared to the original recommendation system in that framework. For evaluation, the proposed method is integrated with well-known particle swarm optimization and genetic algorithm solvers on three benchmark nonlinear problems with varying numbers of combinations and variables. Compared to baseline solvers using indexified combinations, the GNN-based recommender consistently achieves better mean optimum values and robustness across multiple runs.

2606.01421 2026-06-02 cs.LG 版本更新

Target localization, identification and sensing using latent symmetries

利用潜在对称性进行目标定位、识别与感知

David Dukov, Malte Röntgen, Bryn Davies

发表机构 * Mathematics Institute, University of Warwick（沃里克大学数学研究所）； Eastern Institute for Advanced Study（东方理工高等研究院）； Eastern Institute of Technology, Ningbo, Zhejiang, China（宁波东方理工大学，中国浙江）

AI总结本文利用设计有潜在对称性的散射体阵列作为传感器，通过分析对称性破缺程度，结合贝叶斯推断或人工神经网络实现入侵散射体的半径识别与位置定位。

Comments Submitted to SIAM Journal on Imaging Sciences

2606.01413 2026-06-02 cs.CR cs.IR cs.LG 版本更新

Differentially Private Datastore Generation for Retrieval-Augmented Inference

用于检索增强推理的差分隐私数据存储生成

Abdelrahman Abouelenein, Marwan Torki

发表机构 * Department of Computer and Systems Engineering, Alexandria University（计算机与系统工程系，亚历山大大学）； Microsoft（微软）

AI总结提出基于哈希的概率生成框架，利用局部敏感哈希和差分隐私噪声实现数据存储的隐私保护，在ε=5时准确率仅下降2.6%，并将成员推断攻击准确率降至53.60%。

Comments Accepted at the 28th International Conference on Pattern Recognition (ICPR-2026)

AI中文摘要

对于依赖检索增强推理的现代设备端AI系统来说，在不损害个人隐私的情况下发布和共享数据存储至关重要。这可以通过差分隐私（DP）实现，它提供了形式化的保证，确保即使在对抗性分析下，个体贡献仍然不可区分。在本文中，我们引入了一个基于哈希的概率生成框架，旨在实现差分隐私数据存储的创建和发布。我们的方法采用局部敏感哈希（LSH）将高维数据高效地划分到桶中。然后，我们向每个桶的累积投票中添加校准的DP噪声，生成跨类别的概率分布。我们的方法广泛适用于任何需要安全键值数据存储创建和发布的流水线。我们在七个样本量和类别数（从2到14不等）的数据集上进行了实验。在ε=5时，我们发布的DP数据存储实现了强隐私保护，准确率仅平均下降2.6%。最后，我们评估了DP数据存储对成员推断攻击的抵抗力，将攻击准确率降低到53.60%。

英文摘要

It is crucial for modern on-device AI systems that rely on retrieval-augmented inference to release and share datastores without compromising individual privacy. This can be achieved using Differential Privacy (DP), which provides a formal guarantee that ensures individual contributions remain indistinguishable, even under adversarial analysis. In this paper, we introduce a hashing-based probability generation framework designed to enable the creation and release of differentially private datastores. Our approach employs locality-sensitive hashing (LSH) to efficiently partition high-dimensional data into buckets. We then add calibrated DP noise to the accumulated vote for each bucket, generating a probability distribution across classes. Our method is broadly applicable to any pipeline requiring secure key-value datastore creation and release. We conducted experiments on seven datasets with varying sample sizes and class counts, ranging from 2 to 14. At epsilon=5, our released DP datastore achieves strong privacy protection with only an average 2.6% drop in accuracy. Finally, we benchmark DP datastore resilience to membership inference attacks, reducing attack accuracy to 53.60%.
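The pipeline described above (hash into buckets, add noise to per-bucket class votes, normalize) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the paper's construction: random-hyperplane LSH, Laplace noise with scale 1/epsilon (each record is assumed to contribute exactly one vote), and noise added to every possible bucket so that the set of released keys does not depend on the data. All names are hypothetical.

```python
import itertools
import random


def lsh_bucket(vec, hyperplanes):
    """Sign-pattern hash: one bit per (data-independent) random hyperplane."""
    return tuple(
        int(sum(v * h for v, h in zip(vec, plane)) >= 0.0) for plane in hyperplanes
    )


def laplace(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def build_dp_datastore(points, labels, num_classes, hyperplanes, epsilon, rng):
    """Map every bucket key to a noisy class-probability vector."""
    keys = list(itertools.product((0, 1), repeat=len(hyperplanes)))
    votes = {key: [0.0] * num_classes for key in keys}
    for vec, label in zip(points, labels):
        votes[lsh_bucket(vec, hyperplanes)][label] += 1.0

    store = {}
    for key, counts in votes.items():
        # Assumed sensitivity 1: one record changes one count by one.
        noisy = [max(c + laplace(1.0 / epsilon, rng), 0.0) for c in counts]
        total = sum(noisy)
        if total > 0:
            store[key] = [x / total for x in noisy]
        else:
            store[key] = [1.0 / num_classes] * num_classes  # uninformative fallback
    return store


if __name__ == "__main__":
    rng = random.Random(0)
    planes = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(3)]
    pts = [[1.0, 0.2], [0.9, 0.1], [-1.0, -0.3], [-0.8, -0.2]]
    print(build_dp_datastore(pts, [0, 0, 1, 1], 2, planes, epsilon=5.0, rng=rng))
```

Clipping and normalizing happen after the noise is added, so they count as post-processing and do not weaken the guarantee of the noisy counts under the stated assumptions.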

2606.01412 2026-06-02 cs.LG cs.IT math.IT 版本更新

BRo-JEPA：在潜空间中学习模算术

Divyansh Jha, Yuanfang Xie, Varan Mehra, Brennen Yu

发表机构 * Georgia Institute of Technology（佐治亚理工学院）； NYU Langone Health（纽约大学Langone医疗中心）

AI总结本文提出BRo-JEPA模型，通过在潜空间中施加模10算术的循环结构，实现零样本泛化，解决了标准模型无法外推未见操作的问题。

Comments 10 pages, 14 figures

2606.01363 2026-06-02 cs.LG cs.SY eess.SY 版本更新

All Models are Wrong, Knowing Where is Useful: On Model Uncertainty in Reinforcement Learning

所有模型都是错的，知道哪里有用：强化学习中的模型不确定性

Bernd Frauenknecht, Devdutt Subhasish, Artur Eisele, Friedrich Solowjow, Sebastian Trimpe

发表机构 * German Federal Ministry of Research, Technology and Space (BMFTR)（德国联邦研究、技术和空间部）； Robotics Institute Germany (RIG)（德国机器人研究所）； Institute for Data Science in Mechanical Engineering, RWTH Aachen University（机械工程数据科学研究所，亚琛工业大学）； NHR Center NHR4CES at RWTH Aachen University（亚琛工业大学NHR4CES中心）

AI总结提出通过针对性处理概率模型的不确定性来减轻模型利用的框架，并展示在硬件直接学习和安全探索方面的成功。

2606.01339 2026-06-02 cs.LG cs.AI cs.CL cs.CV cs.ET 版本更新

FreqLite: A Lightweight Frequency-Decomposed Linear Model with Adaptive Reversible Normalization for Robust Long-Term Time-Series Forecasting

FreqLite：一种轻量级频率分解线性模型，具有自适应可逆归一化，用于稳健的长期时间序列预测

Mirza Samad Ahmed Baig, Syeda Anshrah Gillani

发表机构 * Hamdard University（哈姆达德大学）

AI总结提出FreqLite，一种超轻量级、通道独立的频率分解线性预测器，通过可学习的无损谱滤波器进行频带分解和线性预测，并引入自适应可逆实例归一化（A-RevIN）处理非平稳性，在长期预测基准上以更少参数和计算资源超越PatchTST等模型。

Comments 26 pages, 5 figures

AI中文摘要

长期时间序列预测需要既准确又能在商用硬件上高效运行的模型。轻量级线性预测器在此领域表现出色，但仍存在两个问题：可逆实例归一化（RevIN）使用单一回溯统计量对整个预测区间进行去归一化，在非平稳性下不准确；时域趋势/季节分解依赖于固定的非自适应滤波器。我们提出FreqLite，一种超轻量级、通道独立的频率分解线性预测器：一个可学习的、无损的单位划分谱滤波器将输入分割成多个频带，由每个频带的线性头进行预测，与低通截断方法不同，高频带被保留并建模。FreqLite在标准长期预测基准上是最佳的轻量级模型，在长回溯（L=336）时，其平均误差低于PatchTST Transformer（0.3244 vs 0.3587 MSE），同时参数减少4倍，内存减少2.2倍，在单块4 GB笔记本GPU上每轮时间减少2.2倍；尽管幅度不大，但在所有匹配单元上的配对Wilcoxon检验中，其改进具有统计显著性（p < 1e-5）。我们进一步引入自适应可逆实例归一化（A-RevIN），一种自适应可逆归一化，严格推广了RevIN（在其门关闭时完全恢复），在非平稳性下起作用，并在平稳数据上无害地退化为RevIN。我们在一个真实的强非平稳数据集（ILI，MSE降低约5%）和一个受控合成漂移扫描中验证了这一点，其中A-RevIN的收益及其学习门都随注入的非平稳性单调增加。每个组件均可独立消融（Linear和RLinear是FreqLite的特例），所有结果均可在商用硬件上复现。

英文摘要

Long-term time-series forecasting needs models that are accurate yet efficient enough for commodity hardware. Lightweight linear forecasters are remarkably strong in this regime, yet they leave two openings: reversible instance normalization (RevIN) de-normalizes the entire horizon with a single lookback statistic, which is inaccurate under non-stationarity, and time-domain trend/seasonal decomposition relies on a fixed, non-adaptive filter. We present FreqLite, an ultra-lightweight, channel-independent frequency-decomposed linear forecaster: a learnable, lossless, partition-of-unity spectral filter splits the input into bands that are forecast by per-band linear heads and, unlike low-pass-truncation approaches, the high-frequency band is retained and modeled. FreqLite is the best lightweight model on the standard long-term forecasting benchmarks and, at long lookback (L=336), attains a lower average error than a PatchTST Transformer (0.3244 vs. 0.3587 MSE) while using 4x fewer parameters, 2.2x less memory, and 2.2x less time per epoch on a single 4 GB laptop GPU; although modest in magnitude, its improvements are statistically significant under paired Wilcoxon tests across all matched cells (p < 1e-5). We further introduce Adaptive Reversible Instance Normalization (A-RevIN), a regime-adaptive reversible normalization that strictly generalizes RevIN (recovered exactly when its gate is closed), engages under non-stationarity, and reduces to RevIN without harm on stationary data. We validate this on both a real strongly non-stationary dataset (ILI, up to ~5% MSE reduction) and a controlled synthetic drift sweep in which A-RevIN's benefit and its learned gate both rise monotonically with injected non-stationarity. Every component is independently ablatable (Linear and RLinear are special cases of FreqLite), and all results are reproducible on commodity hardware.
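The idea of a lossless, partition-of-unity spectral split can be illustrated with a small sketch: if the band masks sum to one at every frequency, the band signals add back up to the input exactly. The code below is only an illustration under assumptions that are not from the paper: the masks come from a softmax over fixed (not learned) per-frequency logits, a naive DFT is used, and all function names are hypothetical.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft_real(spec):
    n = len(spec)
    return [
        sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def split_bands(x, logits):
    """Split x into len(logits) bands; logits[b][f] scores band b at frequency f.

    f runs over 0..len(x)//2; bin k and its mirror n-k share one mask value,
    so every band is a real-valued signal.
    """
    n = len(x)
    spec = dft(x)
    num_bands = len(logits)
    # masks[f][b] sums to 1 over b: a partition of unity at every frequency.
    masks = [softmax([logits[b][f] for b in range(num_bands)]) for f in range(n // 2 + 1)]
    return [
        idft_real([spec[k] * masks[min(k, n - k)][b] for k in range(n)])
        for b in range(num_bands)
    ]


if __name__ == "__main__":
    signal = [math.sin(0.4 * t) + 0.3 * math.cos(2.5 * t) for t in range(16)]
    band_logits = [[-float(f) for f in range(9)], [float(f) - 4.0 for f in range(9)]]
    low, high = split_bands(signal, band_logits)
    print(max(abs(a + b - s) for a, b, s in zip(low, high, signal)))
```

Because no frequency is discarded, a per-band forecaster sees the high-frequency content as well, in contrast to a low-pass truncation.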

2606.01329 2026-06-02 cs.LG q-bio.BM 版本更新

Conditioned free-energy density of proteins using unbalanced solutions to constraint satisfaction problems

使用约束满足问题的不平衡解的条件化蛋白质自由能密度

Pratik Worah, Subhash Khot, Srinivasa Varadhan

发表机构 * CIMS, NYU（纽约大学柯朗数学科学研究所）

AI总结本文通过将条件化非均匀Curie-Weiss自旋哈密顿量的对数配分函数（自由能）简化为不平衡$2 \to 1$范数计算，并设计多项式时间SDP算法，应用于泛素蛋白以探索自由能景观并识别柔性区域。

2606.01325 2026-06-02 cs.NI cs.LG 版本更新

SEArch: Optimistic Policy Selection Between Scene Noise and Drift for UAV Radar Search

SEArch: 无人机雷达搜索中场景噪声与漂移间的乐观策略选择

Noor Khial, Naram Mhaisen, Loay Ismail, Amr Mohamed

发表机构 * Department of Electrical and Computer Engineering, University of Waterloo（滑铁卢大学电气与计算机工程系）

AI总结针对无人机雷达目标搜索中场景噪声与漂移共存的问题，提出基于随机扩展对手框架的乐观跟随正则化领导者策略选择器SEArch及其窗口变体W-SEArch，实现了亚线性遗憾界，实验显示相比非自适应基线遗憾降低达30%。

AI中文摘要

配备雷达传感器的无人机被部署在多样环境中执行目标搜索任务，目标具有可通过遮挡检测到的特征信号（例如人体搜索中的呼吸微动）。一个基本挑战在于，当无人机在动态且可能非平稳的环境中移动时，雷达统计特性发生变化，使得任何固定的信号处理策略都变得次优；然而感知和适应必须在资源受限的空中节点上实时运行。由于没有单一检测器能在所有条件下表现良好，我们采用多策略范式，将无人机目标搜索形式化为一个在线策略选择问题，基于一组专用检测器库，性能通过遗憾（即相对于每个场景中最优策略的累积损失差距）来衡量。该设置将场景内随机噪声与场景间漂移耦合在一起。先前的方法仅捕捉一种模式，而我们通过随机扩展对手框架同时考虑两者，无需场景动态的先验知识。由于适应必须在无人机上运行，我们通过SEArch实例化SEA，这是一种轻量级的乐观跟随正则化领导者选择器，具有自适应学习率，实现了遗憾界$O(\bar{σ}_T \sqrt{T} + \sqrt{J})$，其中$\bar{σ}_T$捕捉雷达测量噪声，$J$是任务时间范围$T$内的场景转换次数。为了在频繁场景变化下实现快速适应，我们进一步引入了W-SEArch，这是一种窗口变体，每$w$轮重启一次，并在每个窗口内最多一次转换下实现遗憾界$O(\bar{σ}_I \sqrt{w})$。实验表明，在一系列非平稳设置中，与非自适应基线相比，遗憾降低高达30%。

英文摘要

Unmanned Aerial Vehicles (UAVs) equipped with radar sensors are deployed for target search missions in diverse environments, where targets exhibit characteristic signatures (e.g., respiration micro-motion in human search) detectable through occlusions. A fundamental challenge arises from shifts in radar statistics as the UAV moves through a dynamic and potentially non-stationary environment, rendering any fixed signal-processing strategy suboptimal; yet perception and adaptation must run onboard a resource-constrained aerial node in real time. Since no single detector performs well across all conditions, we adopt a multi-policy paradigm and formulate UAV target search as an online policy selection problem over a library of specialized detectors, with performance measured by regret, the cumulative loss gap relative to the best policy in each scene. The setting couples in-scene stochastic noise with inter-scene shifts. Whereas prior methods capture only one regime, we account for both through the Stochastically Extended Adversary (SEA) framework, without requiring oracle knowledge of scene dynamics. Because adaptation must run at the UAV, we instantiate SEA through SEArch, a lightweight optimistic Follow the Regularized Leader (OFTRL) selector with an adaptive learning rate, achieving regret $O(\bar{σ}_T \sqrt{T} + \sqrt{J})$, where $\bar{σ}_T$ captures radar measurement noise and $J$ is the number of scene transitions over the mission horizon $T$. To enable rapid adaptation under frequent scene changes, we further introduce W-SEArch, a windowed variant that restarts every $w$ rounds and achieves regret $O(\bar{σ}_I \sqrt{w})$ under at most one transition per window. Experiments show up to 30% regret reduction compared to non-adaptive baselines across a range of non-stationary settings.
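As a rough illustration of an optimistic Follow the Regularized Leader selector over a finite policy library, the sketch below uses an entropic regularizer, which gives exponential weights over cumulative loss plus a hint for the next round. It is not the SEArch algorithm: the learning rate `eta` is fixed rather than adaptive, the hint is simply the most recent loss vector, and the class and method names are hypothetical.

```python
import math


class OptimisticHedge:
    """OFTRL on the probability simplex with a negative-entropy regularizer."""

    def __init__(self, num_policies: int, eta: float):
        self.eta = eta                        # fixed step size (an assumption)
        self.cum_loss = [0.0] * num_policies  # sum of observed losses per policy
        self.hint = [0.0] * num_policies      # guess for the next loss vector

    def probabilities(self):
        # Minimizer of <cum_loss + hint, p> + (1/eta) * sum(p * log p) over the simplex.
        scores = [-self.eta * (c + h) for c, h in zip(self.cum_loss, self.hint)]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        return [w / total for w in weights]

    def update(self, losses):
        for i, loss in enumerate(losses):
            self.cum_loss[i] += loss
        self.hint = list(losses)  # optimism: expect the next round to resemble this one


if __name__ == "__main__":
    selector = OptimisticHedge(num_policies=3, eta=0.5)
    for _ in range(20):
        selector.update([0.1, 0.9, 0.8])
    print(selector.probabilities())
```

A windowed variant in the spirit of W-SEArch could be approximated by re-creating the selector every `w` rounds, again as an assumption about the general idea rather than the paper's procedure.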

2606.01311 2026-06-02 cs.CL cs.AI cs.LG cs.MA 版本更新

SkillAdaptor: Self-Adapting Skills for LLM Agents from Trajectories

SkillAdaptor：基于轨迹的LLM智能体自适应技能

Zhuoyun Yu, Xin Xie, Wuguannan Yao, Chenxi Wang, Lei Liang, Xiang Qi, Shumin Deng

发表机构 * Zhejiang University（浙江大学）； Ant Digital Technologies, Ant Group（蚂蚁集团数字技术部）； Zhejiang University - Ant Group Joint Laboratory of Knowledge Graph（浙江大学-蚂蚁集团知识图谱联合实验室）

AI总结提出SkillAdaptor，一种无训练的步骤级技能自适应框架，通过显式故障归因和针对性更新，提升LLM智能体在长程交互任务中的表现。

Comments Work in progress

AI中文摘要

大型语言模型（LLM）智能体越来越依赖可重用的外部技能来解决长程交互任务。现有的无训练技能自适应流程通常从完整轨迹或会话级反馈更新技能，这使得故障归因粗糙，往往产生不稳定或过于宽泛的修订。我们提出SkillAdaptor，一种无训练的步骤级技能自适应框架，具有显式故障归因，并可插入OpenClaw类智能体框架。给定一个失败轨迹，SkillAdaptor识别第一个可操作的故障步骤，将责任关联到候选技能，并在显式接受检查下应用针对性更新，同时保持主干冻结。我们在WebShop、PinchBench和Claw-Eval上使用Kimi-K2.5、GLM-5和GPT-5.2进行评估。SkillAdaptor在所有三个套件上均优于无技能和技能自适应基线，最大的单项指标提升为PinchBench平均得分%提升1.5分，Claw-Eval平均得分提升1.8分，WebShop成功率提升1.7分。这些结果表明，步骤级归因支持更稳定且可审计的无训练技能维护。

英文摘要

Large language model (LLM) agents increasingly rely on reusable external skills to solve long-horizon interactive tasks. Existing training-free skill adaptation pipelines usually update skills from full trajectories or session-level feedback, which makes failure attribution coarse and often produces unstable or overly broad revisions. We propose SkillAdaptor, a training-free step-level skill adaptation framework with explicit failure attribution, and it can plug into OpenClaw-class agent harnesses. Given a failed trajectory, SkillAdaptor identifies a first actionable fault step, links responsibility to candidate skills, and applies targeted updates under explicit acceptance checks while keeping the backbone frozen. We evaluate on WebShop, PinchBench, and Claw-Eval with Kimi-K2.5, GLM-5, and GPT-5.2. SkillAdaptor improves over no-skill and skill-adaptation baselines on all three suites, with the largest single-metric improvements of +1.5 points on PinchBench Avg Score%, +1.8 on Claw-Eval Avg Score, and +1.7 on WebShop success rate. These results indicate that step-level attribution supports more stable and auditable training-free skill maintenance. The code will be released at https://github.com/zjunlp/SkillAdaptor.

2606.01306 2026-06-02 cs.LG cs.IR 版本更新

FAiT: Frequency-Aware Inverted Transformer for Multivariate Time Series Forecasting

FAiT：面向多元时间序列预测的频率感知倒置Transformer

Peng He, Yao Liu, Yanglei Gan, Run Lin, Yuxiang Cai, Qiao Liu

发表机构 * University of Electronic Science and Technology of China（电子科技大学）

AI总结提出FAiT，通过倒置注意力机制和动态时频调制，解决Transformer在多元时间序列预测中忽略高频信号和时变频谱特性的问题。

AI中文摘要

什么造就了一个强模型？高维线性回归中知识迁移的统一谱分析

Wendao Wu, Fangqing Zhang, Haihan Zhang, Cong Fang

发表机构 * Department of Computer Science（计算机科学系）； Cranberry-Lemon University（Cranberry-Lemon 大学）； Department of Computational Neuroscience（计算神经科学系）； University of the Witwatersrand（沃特瓦特斯兰大学）

AI总结本文通过高维线性回归中SGD动力学的统一谱分析，揭示了知识蒸馏中的谱视界扩展和弱到强泛化中的谱去噪两种机制，统一解释了不同知识迁移范式的有效性。

AI中文摘要

师生知识迁移在现代机器学习中无处不在，从通过知识蒸馏进行的经典模型压缩到弱到强泛化这一新兴现象。尽管现有研究提供了孤立见解，但缺乏一个统一的理论框架来解释知识迁移在这些不同机制中的有效性。在这项工作中，我们建立了高维线性回归中SGD动力学的统一谱分析，阐明了知识迁移在看似不同的机制中的效率。我们通过两种不同机制来刻画知识迁移效率：知识蒸馏中的谱视界扩展，使得能够捕获统计上不可及的高频信号；以及弱到强泛化中的谱去噪，其中学生充当优化噪声的滤波器。我们的框架统一了这些现象，揭示了迁移的有效性由隐式正则化与谱上异质谱学习速度之间的相互作用所支配。

英文摘要

Teacher-Student Knowledge Transfer (KT) is ubiquitous in modern machine learning, ranging from classical model compression via Knowledge Distillation (KD) to the emergent phenomenon of Weak-to-Strong (W2S) generalization. While existing studies offer isolated insights, a unified theoretical framework explaining the efficacy of KT across these disparate regimes remains lacking. In this work, we establish a unified spectral analysis of SGD dynamics in high-dimensional linear regression, elucidating the efficiency of KT across seemingly disparate regimes. We characterize KT efficiency through two distinct mechanisms: \emph{Spectral Horizon Expansion} in KD, which enables the capture of statistically inaccessible high-frequency signals, and \emph{Spectral Denoising} in W2S, where the student acts as a filter for optimization noise. Our framework unifies these phenomena, revealing that the efficacy of transfer is governed by the interplay between implicit regularization and heterogeneous spectral learning speeds over the spectrum.

2606.01289 2026-06-02 cs.LG 版本更新

Feature to Dynamics: Feature-space to Autoregression strategy for Zero-shot Time Series Forecasting

从特征到动态：面向零样本时间序列预测的特征空间到自回归策略

Yifan Wu, Junjie Wu, Kai Wu, Xiaoyu Zhang, Jian Lou

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结提出FSA框架，通过从可解释特征空间到自回归策略空间的映射，在零样本单变量时间序列预测中引入显式归纳偏置，分离全局趋势、周期成分和局部动态，以更少数据假设实现跨域泛化，在受控实验中优于Transformer架构。

详情

AI中文摘要

零样本时间序列预测旨在预测未见序列的未来值，要求模型泛化超出训练分布的时间动态。虽然近期的基础模型通过大规模预训练实现了强大的域内性能，但其有效性通常依赖于广泛的数据覆盖和隐式模式记忆，当数据稀缺或源域与目标域不重叠时，这可能会限制泛化能力。在这项工作中，我们提出了FSA，一种用于受控零样本单变量预测的特征到策略框架。FSA不直接在观测空间中对原始序列建模，而是学习从可解释特征空间到自回归策略空间的结构化映射。这种设计引入了显式归纳偏置，将全局趋势、周期成分和局部时间动态分离，使模型能够以更少的数据假设捕获可迁移的时间序列结构。实验结果表明，在相同的预训练数据、训练协议和可比较的参数预算下，FSA在我们的受控零样本设置中优于基于Transformer的架构。

英文摘要

Zero-shot time series forecasting aims to predict future values for previously unseen series, requiring models to generalize temporal dynamics beyond the training distribution. While recent foundation models achieve strong in-domain performance through large-scale pretraining, their effectiveness often relies on broad data coverage and implicit pattern memorization, which can limit generalization when data are scarce or source and target domains are disjoint. In this work, we propose FSA, a feature-to-strategy framework for controlled zero-shot univariate forecasting. Instead of directly modeling raw sequences in the observation space, FSA learns a structured mapping from an interpretable feature space to an autoregressive strategy space. This design introduces explicit inductive biases that disentangle global trends, periodic components, and local temporal dynamics, enabling the model to capture transferable time-series structure with fewer data assumptions. Empirical results show that, under identical pretraining data, training protocol, and comparable parameter budgets, FSA outperforms Transformer-based architectures in our controlled zero-shot setting.

2606.01286 2026-06-02 cs.SE cs.AI cs.CL cs.LG 版本更新

BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution

BenchEvolver：通过以解决方案为中心的进化进行前沿任务合成

Yangzhen Wu, Aaron J. Li, Wenjie Ma, Li Cao, Ziheng Zhou, Mert Cemri, Shu Liu, Yuran Xiu, Chenxiao Yan, Haikun Zhao, Bin Yu, Ion Stoica, Dawn Song

发表机构 * University of California, Berkeley（加州大学伯克利分校）； Institute for Interdisciplinary Information Sciences, Tsinghua University（清华大学交叉信息研究院）

AI总结提出BenchEvolver框架，通过进化参考解决方案自动生成更难的编程问题，以解决基准饱和问题，并在LiveCodeBench和SciCode上验证其有效性。

详情

AI中文摘要

前沿大语言模型的快速进步导致了广泛的基准饱和，限制了现有数据集区分模型能力或提供有用训练信号的能力。例如，在LiveCodeBench上，前沿模型在简单子集上达到超过99%的Pass@1，在不同难度级别上平均超过90%的Pass@1。构建新的、具有挑战性的数据集通常需要大量人力，成为进步的瓶颈。我们引入了BenchEvolver，一个以解决方案为中心的进化框架，自动将现有编码问题转化为更难的变体。BenchEvolver不是从头生成问题，而是通过结构化变换进化参考解决方案，并从进化后的解决方案中推导出相应的描述和测试。这种设计使生成过程立足于可执行语义，使得能够可扩展地构建高质量、多样化和困难的任务，并具有可验证的正确性。将BenchEvolver应用于LiveCodeBench和SciCode，我们获得了显著更难的进化任务，同时保持了有效性、参考正确性和多样性。我们进一步整理了LiveCodeBench-Plus，一个包含91个问题的基准，结合了进化后的任务和困难的原始LCB-v6任务，其中前沿模型的Pass@1范围从27.5%到62.6%，恢复了强编码模型之间的清晰区分。重要的是，即使对于生成它们的模型，进化后的任务仍然具有挑战性，从而实现了自我改进。我们进一步表明，在进化后的LCB任务上进行强化学习提高了留出编码性能：对于gpt-oss-20b，种子+进化训练在LCB v6 Hard和LCB-Pro Easy上分别获得了+8.7和+8.3的Pass@1提升，比仅种子训练的提升幅度分别高出70.7%和34.8%。我们的结果表明，BenchEvolver可以将饱和的基准转化为前沿级别的评估套件和可重用的训练信号。

英文摘要

The rapid progress of frontier large language models has led to widespread benchmark saturation, limiting the ability of existing datasets to differentiate model capabilities or provide useful training signal. For instance, on LiveCodeBench, frontier models achieve over 99% Pass@1 on easy splits and exceed 90% Pass@1 on average across difficulty levels. Constructing new, challenging datasets typically requires substantial human effort, creating a bottleneck for progress. We introduce BenchEvolver, a solution-centric evolutionary framework that automatically transforms existing coding problems into harder variants. Rather than generating problems from scratch, BenchEvolver evolves reference solutions through structured transformations and derives corresponding statements and tests from the evolved solutions. This design grounds generation in executable semantics, enabling scalable construction of high-quality, diverse, and difficult tasks with verifiable correctness. Applying BenchEvolver to LiveCodeBench and SciCode, we obtain evolved tasks that are substantially harder while maintaining validity, reference correctness, and diversity. We further curate LiveCodeBench-Plus, a 91-problem benchmark combining evolved and difficult original LCB-v6 tasks, where frontier-model Pass@1 ranges from 27.5% to 62.6%, restoring clear discrimination among strong coding models. Importantly, evolved tasks remain challenging even for the model that generates them, enabling self-improvement. We further show that RL on evolved LCB tasks improves held-out coding performance: for gpt-oss-20b, seed+evolved training achieves +8.7 and +8.3 Pass@1 gains on LCB v6 Hard and LCB-Pro Easy, exceeding seed-only gains by 70.7% and 34.8%, respectively. Our results show that BenchEvolver can convert saturated benchmarks into frontier-level evaluation suites and reusable training signal.

2606.01283 2026-06-02 cs.LG 版本更新

AdaKernel: Learning Adaptive Kernel Parameters for Spatiotemporal Graph Neural Networks

AdaKernel：为时空图神经网络学习自适应核参数

Zhongyue Zhang, Guangyin Jin, Yuxuan Liang, Suwan Yin, Yuankai Wu

发表机构 * Sichuan University（四川大学）； PLA Academy of Military Science（中国人民解放军军事科学院）； The Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））

AI总结针对图神经网络中固定核参数导致模型容量受限的问题，提出AdaKernel方法，通过结构保持策略学习自适应核参数，在数据稀疏场景下优于固定先验和全隐式图结构方法。

Comments 17 pages, 15 figures, including appendix

详情

AI中文摘要

建模空间依赖性是使用图神经网络（GNN）进行时空数据分析的核心。传统方法依赖于具有预定义参数的基于距离的核，这限制了模型容量。尽管通用自适应机制（如图注意力网络）提供了灵活性，但它们通常无法捕捉潜在的几何结构，在数据稀疏场景下表现不如基于距离的模型。针对这一问题，我们重新审视核参数化问题，并从理论上证明，错误指定的核参数会在GNN中引入不可避免的近似误差。为了克服这一困难，我们提出AdaKernel，一种简单而有效的方法，在神经网络内学习自适应核参数。与从头学习图结构的方法不同，AdaKernel采用结构保持策略，优化物理相互作用的尺度而非丢弃它们。在克里金插值、数据填补和预测上的大量实验表明，AdaKernel持续改进各种GNN架构，并优于模型无关的自适应基线，验证了准确学习的核参数优于固定先验和完全隐式图结构。

英文摘要

Modeling spatial dependencies is central to spatiotemporal data analysis using Graph Neural Networks (GNNs). Traditional methods rely on distance-based kernels with predefined parameters, which restricts model capacity. Although generic adaptive mechanisms (e.g., Graph Attention Networks) offer flexibility, they often fail to capture the underlying geometric structure, performing worse than distance-based models in data-sparse scenarios. Addressing this, we revisit the kernel parameterization problem and theoretically prove that misspecified kernel parameters introduce unavoidable approximation errors in GNNs. To overcome this, we propose AdaKernel, a simple yet effective approach that learns adaptive kernel parameters within the neural network. Unlike methods that learn graph structures from scratch, AdaKernel adopts a structure-preserving strategy that optimizes the scale of physical interactions rather than discarding them. Extensive experiments on Kriging, Imputation, and Forecasting demonstrate that AdaKernel consistently improves various GNN architectures and outperforms model-agnostic adaptive baselines, validating that accurately learned kernel parameters are superior to both fixed priors and fully latent graph structures.
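To make the idea of a learnable kernel parameter concrete, here is a minimal sketch assuming a Gaussian distance kernel w = exp(-d^2 / sigma^2), a common choice for distance-based graphs. The abstract does not state the kernel family, and the names `kernel_weight` and `kernel_weight_grad` are hypothetical. The sketch shows the edge weight and its derivative with respect to the bandwidth sigma, which is the quantity a gradient-based learner would need in order to adapt the scale instead of fixing it in advance.

```python
import math

def kernel_weight(dist, sigma):
    """Gaussian distance kernel (assumed form): exp(-dist^2 / sigma^2)."""
    return math.exp(-(dist ** 2) / (sigma ** 2))

def kernel_weight_grad(dist, sigma):
    """Analytic derivative of kernel_weight with respect to sigma."""
    return kernel_weight(dist, sigma) * 2.0 * dist ** 2 / sigma ** 3
```

A larger sigma gives distant nodes more weight, so a misspecified fixed sigma changes which neighbors the GNN effectively sees.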

2606.01282 2026-06-02 cs.CV cs.CY cs.LG 版本更新

KG-FairDiff: Knowledge Graph-Guided Prompt Refinement for Demographically Fair Text-to-Image Generation

KG-FairDiff：知识图谱引导的提示词精炼用于人口统计公平的文本到图像生成

Farbod Davoodi, Seyed Reza Tavakoli Shiyadeh, Pooria Safaei, Sana Harighi, Parsa Gholami, Amirali Amini, Kimia Vanaei, Emad Firoozi, Parham Abed Azad, Babak Khalaj, Siavash Ahmadi, Amir Hossein Payberah, Mohammad Hossein Rohban, Soheil Kolouri, Ali Diba

发表机构 * University of Science and Technology of China（中国科学技术大学）； Sharif University of Technology（谢里夫理工大学）； Iran University of Science and Technology（伊朗科学技术大学）

AI总结提出KG-FairDiff框架，通过知识图谱引导的提示词精炼，在推理时优化公平性损失，减少文本到图像生成中的性别、种族、年龄等人口统计偏差，同时保持语义保真度。

详情

AI中文摘要

文本到图像（TTI）系统现已成为新闻、教育、广告和公共传播的日常基础设施，它们从训练数据中继承的人口统计和文化刻板印象（将女性、有色人种、老年人和非西方文化描绘为代表性不足或漫画化）在部署规模上成为人口层面的危害。现有的缓解措施要么需要昂贵的重新训练，这对于主导消费产品的闭源骨干网络不可行，要么依赖于忽略文化背景的固定人口统计模板。我们提出了KG-FairDiff，一个模型无关、推理时框架，将公平感知的提示词精炼形式化为一个约束优化问题，并将其实现为一个闭环流水线：一个包含约1200个文化和偏见相关三元组的知识图谱检索结构化上下文，一个LLM改写器提出精炼，一个验证器仅接受那些减少基于散度的公平性损失同时保持用户原始意图语义保真度的提示词。我们证明了精炼循环的有限终止界限，贡献了一个数学上一致的评估套件，将Bias-P/Bias-W与目标分布的散度以及ENS与KL散度联系起来，并审计了八个广泛部署的骨干生成器。KG-FairDiff显著减少了性别、种族、年龄和交叉差异，同时保持了提示词语义，为更公平的生成式AI提供了一条实用、可部署的路径。

英文摘要

Text-to-Image (TTI) systems are now everyday infrastructure for journalism, education, advertising, and public communication, and the demographic and cultural stereotypes they inherit from training data (rendering women, people of colour, older adults, and non-Western cultures as under-represented or caricatured) become a population-level harm at deployment scale. Existing mitigations either require costly retraining, infeasible for the closed-source backbones that dominate consumer products, or rely on fixed demographic templates that ignore cultural context. We present KG-FairDiff, a model-agnostic, inference-time framework that formalises fairness-aware prompt refinement as a constrained optimisation problem and operationalises it as a closed-loop pipeline: a knowledge graph of ~1,200 culture- and bias-related triples retrieves structured context, an LLM rewriter proposes refinements, and a validator accepts only prompts that reduce a divergence-based fairness loss while preserving semantic fidelity to the user's original intent. We prove a finite-termination bound for the refinement loop, contribute a mathematically consistent evaluation suite linking Bias-P/Bias-W to divergence from target distributions and ENS to KL divergence, and audit eight widely-deployed backbone generators. KG-FairDiff substantially reduces gender, race, age, and intersectional disparities while preserving prompt semantics, offering a practical, deployment-ready route to more equitable generative AI.
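The validator's acceptance rule can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the fairness loss is taken to be KL(observed || target) over demographic groups, semantic fidelity is a precomputed similarity score in [0, 1], and the threshold `min_similarity` and both function names are hypothetical.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; q must be positive where p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def accept_refinement(old_dist, new_dist, target, similarity, min_similarity=0.8):
    """Accept a rewritten prompt only if it lowers the divergence from the
    target distribution and stays semantically close to the original."""
    improves = kl_divergence(new_dist, target) < kl_divergence(old_dist, target)
    return improves and similarity >= min_similarity
```

Because a refinement is accepted only when the loss strictly decreases, each accepted step makes measurable progress, which is the kind of property a termination argument for the loop can build on.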

2606.01281 2026-06-02 cs.LG cs.AI 版本更新

RLVR without Ineffective Samples: Group Prioritized Off-Policy Optimization for LLM Reasoning

RLVR 无需无效样本：面向 LLM 推理的群体优先级离策略优化

Yixiu Mao, Yun Qu, Qi Wang, Heming Zou, Xiangyang Ji

发表机构 * Department of Automation, Tsinghua University（清华大学自动化系）

AI总结针对强化学习中无效样本导致学习信号不足的问题，提出群体优先级离策略优化（POPO），通过优先级群体重放和解耦重要性采样，在不增加额外采样开销的情况下提升推理性能。

详情

AI中文摘要

基于可验证奖励的强化学习（RLVR）已成为增强大型语言模型（LLMs）推理能力的强大范式。然而，其有效性受到无效训练数据普遍存在的严重阻碍：许多采样提示产生的响应群体要么完全正确，要么完全错误，导致奖励零方差和学习信号有限。最近的先进方法通过大量LLM rollout来过滤无效样本以解决此问题，但代价是相当大的计算开销。替代方法，包括预测性采样和轨迹重放，旨在提高数据效率，但往往仍不充分，并可能引入额外问题，如系统性偏差或次优约束。为解决这些局限性，我们提出了群体优先级离策略优化（POPO），一个简单而有效的框架，无需额外rollout开销即可充分利用有效训练批次。POPO包含两个关键组件：优先级群体重放和解耦离策略优化。前者通过基于近因的重放机制，联合考虑样本质量和离策略程度，用有效的离策略群体替换无效的在策略群体。为进一步缩小离策略差距，POPO采用解耦重要性采样来校正离策略偏差，同时在一致的信任区域约束下保持稳定的策略更新。在包括数学、规划和视觉几何在内的多种推理任务上的实证评估表明，POPO显著加速了RL微调，并在显著减少rollout的情况下实现了强大的推理性能。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) has emerged as a powerful paradigm for enhancing the reasoning capabilities of large language models (LLMs). However, its effectiveness is substantially hindered by the prevalence of ineffective training data: many sampled prompts yield response groups that are either entirely correct or entirely incorrect, resulting in zero-variance rewards and limited learning signals. Recent state-of-the-art methods address this issue through extensive LLM rollouts to filter ineffective samples, but at the cost of considerable computational overhead. Alternative approaches, including predictive sampling and trajectory replay, aim to improve data efficiency but often remain insufficient and may introduce additional issues such as systematic bias or suboptimal constraints. To address these limitations, we propose Group Prioritized Off-Policy Optimization (POPO), a simple yet effective framework that fully exploits effective training batches without additional rollout overhead. POPO comprises two key components: prioritized group replay and decoupled off-policy optimization. The former replaces ineffective on-policy groups with effective off-policy groups via a recency-based replay mechanism that jointly considers sample quality and the degree of off-policiness. To further mitigate the off-policy gap, POPO employs decoupled importance sampling to correct off-policy bias while maintaining stable policy updates under consistent trust-region constraints. Empirical evaluations across diverse reasoning tasks, including mathematics, planning, and visual geometry, demonstrate that POPO substantially accelerates RL finetuning and achieves strong reasoning performance with significantly fewer rollouts.
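The zero-variance problem is easy to see in a small sketch. It assumes a GRPO-style group-normalized advantage, and a deliberately simple replay rule that swaps each ineffective group for the most recently stored one. The abstract describes a priority that also weighs sample quality and off-policiness, which is not modeled here, and all function names are hypothetical.

```python
import math

def group_advantages(rewards, eps=1e-8):
    """Group-normalized advantages: (r - mean) / (std + eps)."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]

def is_effective_group(rewards):
    """A group carries learning signal only if its rewards differ."""
    return max(rewards) != min(rewards)

def fill_batch(on_policy_groups, replay):
    """Replace ineffective groups with replayed ones, newest first.
    Keeps the ineffective group if the replay buffer runs out."""
    replay = list(replay)  # newest entry is last
    batch = []
    for group in on_policy_groups:
        if is_effective_group(group) or not replay:
            batch.append(group)
        else:
            batch.append(replay.pop())
    return batch
```

An all-correct or all-incorrect group yields advantages of exactly zero, so it contributes no gradient; swapping it for a stored group with mixed rewards restores signal without new rollouts.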

2606.01273 2026-06-02 cs.LG 版本更新

GLIDE: Graph-guided Leap Inference for Diffusion Estimation of Spatio-Temporal Point Processes

GLIDE：面向时空点过程扩散估计的图引导跳跃推理

Guanyu Zhou, Yao Liu, Yanglei Gan, Yuxiang Cai, Peng He, Run Lin, Yuxiang Liu, Qiao Liu

发表机构 * University of Electronic Science and Technology of China（电子科技大学）

AI总结提出GLIDE框架，利用多尺度历史图编码和双流架构作为条件，结合先验引导的跳跃推理机制，实现高效且准确的时空点过程下一个事件建模与预测。

详情

AI中文摘要

时空点过程（STPPs）为连续时间和空间中的异步事件建模提供了原则性框架。最近的扩散方法通过建模复杂条件分布，为确定性预测提供了灵活的替代方案，但其在STPPs中的应用仍面临挑战：从纯噪声中反向采样成本高昂，且稀疏空间域中弱结构约束可能导致概率质量定位不佳。我们提出GLIDE（图引导跳跃推理扩散估计），一种用于STPPs中下一个事件建模的条件扩散框架。GLIDE将历史事件组织成多尺度历史图，并通过双流架构编码时间演化和空间拓扑，为双分支扩散去噪器提供结构化条件上下文。它进一步引入先验引导的跳跃推理机制，其中轻量级均值预测器提供确定性锚点，反向过程从中间扩散步骤而非纯高斯噪声开始。在多个真实世界数据集上的实验表明，GLIDE改进了分布拟合和下一个事件预测，其中空间方面的提升最大。结果还表明，先验引导的跳跃推理大幅降低了反向采样成本，同时保留了扩散模型的随机生成能力。

英文摘要

Spatio-temporal point processes (STPPs) provide a principled framework for modeling asynchronous events in continuous time and space. Recent diffusion-based approaches offer a flexible alternative to deterministic prediction by modeling complex conditional distributions, but their application to STPPs remains challenging: reverse sampling from pure noise is costly, and weak structural constraints in sparse spatial domains can lead to poorly localized probability mass. We propose \textbf{GLIDE} (Graph-guided Leap Inference for Diffusion Estimation), a conditional diffusion framework for next-event modeling in STPPs. GLIDE organizes historical events into a multi-scale historical graph and encodes temporal evolution and spatial topology through a dual-stream architecture, yielding a structured conditioning context for a dual-branch diffusion denoiser. It further introduces a prior-guided leap inference mechanism, in which a lightweight mean predictor provides a deterministic anchor and the reverse process starts from an intermediate diffusion step instead of from pure Gaussian noise. Experiments on multiple real-world datasets show that GLIDE improves both distribution fitting and next-event prediction, with the largest gains appearing on the spatial side. The results also indicate that prior-guided leap inference substantially reduces reverse-sampling cost while preserving the stochastic generation capability of diffusion models.
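Starting the reverse process from an intermediate step can be sketched with the standard DDPM forward-noising formula applied to the anchor. This assumes a variance-preserving schedule with cumulative signal level `alpha_bar_t`; the abstract does not give GLIDE's exact initialization, and `leap_start` is a hypothetical name.

```python
import math
import random

def leap_start(anchor, alpha_bar_t, rng):
    """Noise a deterministic anchor to an intermediate diffusion step:
    x_t = sqrt(alpha_bar_t) * anchor + sqrt(1 - alpha_bar_t) * eps."""
    signal = math.sqrt(alpha_bar_t)
    noise = math.sqrt(1.0 - alpha_bar_t)
    return [signal * a + noise * rng.gauss(0.0, 1.0) for a in anchor]
```

With `alpha_bar_t` near 1 the start point stays close to the mean predictor's output and few denoising steps remain; with `alpha_bar_t` near 0 the procedure falls back to starting from pure Gaussian noise.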

2606.01265 2026-06-02 cs.LG cs.AI 版本更新

PALTO: Physics-Informed Active Learning for Tri-Gate FinFET Design Optimization for Vertical Power Delivery

PALTO：面向垂直供电的Tri-Gate FinFET设计优化的物理信息主动学习

Ayoub Sadeghi, Leonid Popryho, Inna Partin-Vaisband

发表机构 * University of Illinois Chicago（伊利诺伊大学芝加哥分校）； Center for Heterogeneous Integration of Micro Electronic Systems（微电子异构集成中心）； Joint University Microelectronics Program (JUMP) 2.0（联合大学微电子计划（JUMP）2.0）； Semiconductor Research Corporation (SRC)（半导体研究公司（SRC））； Defense Advanced Research Project Agency (DARPA)（国防高级研究计划局（DARPA））

AI总结提出物理信息主动学习框架，高效探索GaN tri-gate FinFET的高维设计空间，优化关键结构参数（如GaN-to-AlGaN厚度比），发现两种优化器件，其中D1在300-fin配置下驱动电流和开关效率优于D2。

详情

AI中文摘要

本文展示了机器学习驱动优化在垂直供电系统中设计特定应用的GaN三栅极FinFET的有效性。传统的基于TCAD的方法计算量大，且不足以导航先进GaN器件的高维非线性设计空间。为此，采用物理信息主动学习框架智能引导仿真，在保持精度的同时加速收敛。这种ML引导的方法通过高效探索关键结构参数——尤其是GaN-to-AlGaN厚度比（器件设计中长期争论的焦点）——来发现最优配置。通过系统探索关键结构参数，确定了两种具有激进缩放的栅漏长度的优化器件。单鳍多通道仿真表明，相对于AlGaN势垒具有更薄GaN沟道的器件D2实现了更高的驱动电流。然而，在300鳍配置中，器件D1以0.49欧姆导通电阻提供3.3A电流，性能约为D2的2倍，尽管寄生参数略高。两种器件均工作在常关模式。基于特定应用品质因数，器件D1达到5 pC·欧姆，开关效率比D2高2倍，而两种设计在不同性能指标上均优于工业基准。

英文摘要

This paper demonstrates the effectiveness of machine learning-driven optimization for designing application-specific GaN tri-gate FinFETs in vertical power delivery systems. Conventional TCAD-based approaches are computationally intensive and insufficient for navigating the high-dimensional, nonlinear design space of advanced GaN devices. To address this, a physics-informed active learning framework is used to intelligently guide simulations, accelerating convergence while preserving accuracy. This ML-guided approach enables the discovery of optimal configurations by efficiently exploring key structural parameters -- most notably the GaN-to-AlGaN thickness ratio -- a long-standing focus of debate in device design. By systematically exploring key structural parameters, two optimized devices with aggressively scaled gate-to-drain lengths are identified. Single-fin, multi-channel simulations show that device~D2, with a thinner GaN channel relative to the AlGaN barrier, achieves higher drive current. However, in a 300-fin configuration, device~D1 outperforms device~D2 by delivering 3.3\,A at 0.49~ohm on-resistance -- approximately 2$\times$ better -- despite slightly higher parasitics. Both devices operate in a normally-off mode. Based on an application-specific figure of merit, device~D1 achieves 5\,pC$\cdot$ohm, demonstrating 2$\times$ greater switching efficiency than device~D2, while both designs outperform industrial benchmarks from different performance standpoints.

2606.01258 2026-06-02 cs.LG cs.CL eess.SP 版本更新

Beyond Sinusoids: A Morlet Wavelet Framework for Transformer Positional Encoding

超越正弦波：基于Morlet小波的Transformer位置编码框架

Athanasios Zeris

发表机构 * Independent Researcher（独立研究者）； Athens, Greece（希腊雅典）

AI总结提出Morlet位置编码（MoPE），通过可学习的频率和局部带宽统一了正弦位置编码和旋转位置编码，并在TinyShakespeare上结合能量门控注意力提升了0.119的性能。

Comments 16 pages, 4 figures, 4 tables

详情

AI中文摘要

标准Transformer位置编码——正弦编码和旋转位置编码（RoPE）——将每个位置视为同等局部：它们编码了标记的位置，但未编码其位置影响应延伸多远。我们提出Morlet小波（同时最小化位置和频率的不确定性）是位置编码的自然基础，并引入Morlet位置编码（MoPE）：每个嵌入维度从数据中学习其自身的频率和局部带宽。主要理论结果是统一：当局部性关闭时（sigma_i -> 无穷大），正弦PE和RoPE相关核都作为MoPE的极限情况出现。MoPE的相位精确恢复了RoPE旋转角度；幅度增加了一个标准编码所缺乏的可学习高斯局部核。实验上，MoPE结合能量门控注意力在TinyShakespeare上比标准注意力提升了0.119，优于任一单独组件。对学习参数的分析显示，所有128个频率-带宽对收敛到小波可容许边界——这一经验观察与关于能量门控的伴随结果一致，表明字符级语言信号的一个可重现性质，值得进一步研究。

英文摘要

Standard positional encodings for transformers - sinusoidal and rotary (RoPE) - treat every position as equally local: they encode where a token is, but not how far its positional influence should extend. We propose that the Morlet wavelet, which simultaneously minimises uncertainty in position and frequency, is the natural basis for positional encoding, and introduce Morlet Positional Encoding (MoPE): each embedding dimension learns its own frequency and locality bandwidth from data. The main theoretical result is a unification: sinusoidal PE and the RoPE correlation kernel both emerge as limiting cases of MoPE when locality is switched off (sigma_i -> infinity). The phase of MoPE recovers the RoPE rotation angle exactly; the amplitude adds a learned Gaussian locality kernel that standard encodings lack. Empirically, MoPE combined with Energy-Gated Attention achieves +0.119 improvement over standard attention on TinyShakespeare, outperforming either component alone. Analysis of the learned parameters reveals that all 128 frequency-bandwidth pairs converge to the wavelet admissibility boundary - an empirical observation consistent with a companion result on energy gating, suggesting a reproducible property of character-level language signals that warrants further investigation.
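The limiting-case claim can be checked numerically with a small sketch. It assumes a real-valued Morlet-style feature, a cosine carrier times a Gaussian envelope exp(-p^2 / (2 sigma^2)); this parameterization and the function names are illustrative assumptions, not the paper's implementation.

```python
import math

def morlet_feature(offset, omega, sigma):
    """Cosine carrier at frequency omega, damped by a Gaussian
    locality envelope of bandwidth sigma (assumed parameterization)."""
    envelope = math.exp(-(offset ** 2) / (2.0 * sigma ** 2))
    return math.cos(omega * offset) * envelope

def sinusoidal_feature(offset, omega):
    """Plain sinusoidal feature with the same carrier frequency."""
    return math.cos(omega * offset)
```

As sigma grows the envelope tends to 1 and `morlet_feature` coincides with `sinusoidal_feature`; a small sigma suppresses the response at large offsets, which is the locality that plain sinusoidal encodings cannot express.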

2606.01256 2026-06-02 stat.ML cs.LG stat.ME 版本更新

Distribution-free changepoint localization after sequential change detection

顺序变化检测后的无分布变化点定位

Aytijhya Saha, Aaditya Ramdas

发表机构 * Massachusetts Institute of Technology（麻省理工学院）； Carnegie Mellon University（卡内基梅隆大学）

AI总结本文提出了一种无分布框架，在停止顺序变化检测程序后构建变化点的后检测置信集，无需任何分布假设，并保证了有限样本覆盖率和渐近有界期望大小。

详情

AI中文摘要

本文介绍了一种无分布框架，用于在停止顺序变化检测程序后构建变化点的后检测置信集。众所周知，共形测试鞅可用于顺序检测分布变化，但其本身不提供对声称变化发生时间的推断。以往关于后检测推断的工作需要已知变化前和变化后的分布类别，而本文在没有任何分布假设的情况下实现了变化点的定位。我们建立了有限样本覆盖保证（条件于正确检测）。我们给出了置信集条件期望大小的非渐近界。在合适的渐近机制下，我们证明了置信集的条件期望大小一致有界，并在模拟和真实数据上展示了强大的实证性能。据我们所知，这是第一个具有有效后检测覆盖保证的通用无分布顺序变化点定位框架。

英文摘要

This paper introduces a distribution-free framework for constructing post-detection confidence sets for changepoints after stopping a sequential change detection procedure. It is well known that conformal test martingales can be used to sequentially detect changes in distribution, but by themselves they provide no inference for the time at which a proclaimed change occurred. Past work on post-detection inference requires pre- and post-change classes of distributions to be known, but this paper accomplishes localization of the changepoint without any distributional assumptions. We establish finite-sample coverage guarantees (conditional on correct detection). We provide non-asymptotic bounds on the conditional expected size of the confidence sets. Under suitable asymptotic regimes, we prove that the conditional expected size of the confidence set remains uniformly bounded, and we demonstrate strong empirical performance on simulated and real data. To the best of our knowledge, this is the first general distribution-free framework for sequential changepoint localization with a valid post-detection coverage guarantee.

2606.01244 2026-06-02 stat.ML cs.LG cs.NA math.FA math.NA math.ST stat.TH 版本更新

Efficient Approximation for Encoder--Decoder Neural Operators via Variation Spaces

基于变分空间的编码器-解码器神经算子的高效逼近

Jia-Qi Yang, Lei Shi

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结通过引入变分空间作为非线性算子的无穷维结构类，建立了编码器-解码器双层网络在Bochner L^q范数下的逼近界，误差分解为输入编码误差、输出编码误差和N^{-1/2}阶有限宽逼近项，为超越一般Lipschitz或Fréchet可微算子类的高效神经算子学习提供了理论保证。

Comments 14 pages

详情

AI中文摘要

我们研究使用编码器-解码器神经网络的算子学习。受神经网络函数空间理论的启发，我们引入变分空间作为非线性算子的无穷维结构类。该空间通过直接在输入和输出空间上的向量值测度定义。对于该空间中的算子，我们建立了编码器-解码器双层网络在Bochner $L^q$ 范数下的逼近界。得到的误差界分解为输入编码误差、输出编码误差和一个阶为 $N^{-1/2}$ 的有限宽逼近项，其常数与输入和输出编码维度无关。当输入和输出编码误差在编码维度上呈多项式衰减时，这些估计产生代数逼近和学习速率。结果为超越一般Lipschitz或Fréchet可微算子类的高效神经算子学习提供了理论保证。

英文摘要

We study operator learning using encoder--decoder neural networks. Inspired by the function-space theory of neural networks, we introduce a variation space as an infinite-dimensional structural class for nonlinear operators. This space is defined through vector-valued measures directly on the input and output spaces. For operators in this space, we establish approximation bounds for encoder--decoder two-layer networks in the Bochner $L^q$ norm. The resulting error bound decomposes into the input encoding error, the output encoding error, and a finite-width approximation term of order $N^{-1/2}$, with a constant independent of the input and output encoding dimensions. When the input and output encoding errors decay polynomially in the encoding dimensions, these estimates yield algebraic approximation and learning rates. The results provide theoretical guarantees for efficient neural operator learning beyond general Lipschitz or Fréchet differentiable operator classes.

2606.01243 2026-06-02 cs.CL cs.LG 版本更新

Unlocking the Black Box of Latent Reasoning: An Interpretability-Guided Approach to Intervention

解锁潜在推理的黑箱：一种可解释性引导的干预方法

Shuochen Chang, Tong Bai, Xiaofeng Zhang, Qianli Ma, Qingyang Liu, Zhaohe Liao, Yibo Miao, Li Niu

发表机构 * Shanghai Jiao Tong University（上海交通大学）； Fudan University（复旦大学）

AI总结本文通过结构、因果和几何探针分析潜在推理向量的可解释性，并基于此提出无需训练的解码时干预方法，提升大语言模型推理准确性。

详情

Journal ref: ACL2026 Main

AI中文摘要

潜在推理使大型语言模型（LLMs）能够在连续隐藏状态内执行多步推理，相比显式思维链（CoT）提供了效率提升。然而，这些连续思维向量的不透明性阻碍了其可靠性和可控性。本文弥合了机械可解释性与可操作控制之间的差距。我们首先使用结构、因果和几何探针进行系统分析，揭示潜在向量编码了推理步骤的压缩、忠实表示，其中早期向量作为关键因果枢纽。在此基础上，我们将这些可解释性见解操作化为一套无需训练、解码时干预的方法，通过施加已识别的几何和语义先验来优化潜在推理过程。跨多个模型规模和不同任务领域的广泛实验表明，我们的方法持续提高了推理准确性。我们的可解释性引导干预一致地解锁了潜在能力，并在没有任何参数更新的情况下提高了推理准确性。

英文摘要

Latent reasoning enables Large Language Models (LLMs) to perform multi-step inference within continuous hidden states, offering efficiency gains over explicit Chain-of-Thought (CoT). However, the opacity of these continuous thought vectors hinders their reliability and controllability. This paper bridges the gap between mechanistic interpretability and actionable control. We first present a systematic analysis using structural, causal, and geometric probes, revealing that latent vectors encode compressed, faithful representations of reasoning steps, with early vectors acting as critical causal hubs. Building on this, we operationalize these interpretability insights into a suite of training-free, decode-time interventions that refine the latent reasoning process by imposing the identified geometric and semantic priors. Extensive experiments across multiple model scales and diverse task domains demonstrate that our approaches consistently improve reasoning accuracy. Our interpretability-guided interventions consistently unlock latent capabilities and improve reasoning accuracy without any parameter updates.

2605.04819 2026-06-02 cs.LG 版本更新

Unsat Core Prediction through Polarity-Aware Representation Learning over Clause-Literal Hypergraphs

通过子句-文字超图上的极性感知表示学习进行不可满足核心预测

Zhenchao Sun, Shuai Ma, Ping Lu, Chongyang Tao

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结提出一种极性感知的表示学习框架，将SAT公式建模为子句-文字超图，通过极性感知分解和极性反转一致性正则化，有效预测不可满足核心。

Comments Accepted at ICML 2026

详情

AI中文摘要

图神经网络已广泛用于布尔可满足性（SAT）任务中，以从SAT公式中学习结构信息。这些研究的目标是解决SAT实例或增强SAT求解器，包括不可满足核心预测等任务。然而，大多数现有方法将SAT公式建模为二分图或有向无环图，这些方法在捕捉文字和子句之间的子句级和高阶交互方面不够直接。此外，这些方法在建模SAT固有的极性相关属性（如变量的正负文字之间的互补关系）方面存在局限性。为了解决这些局限性，我们提出了一种基于子句-文字超图的极性感知表示学习框架。我们将SAT公式建模为子句-文字超图，并辅以子句关联图以捕捉高阶结构交互。然后，我们引入一种极性感知分解机制，将变量表示分离为极性不变和等变分量，显式建模正负文字之间的关系，并将生成的文字表示沿超图结构传播。我们进一步引入极性反转一致性正则化，以在训练过程中强化极性一致的表示。在多个SAT数据集上的实验结果表明了该方法的有效性。

英文摘要

Graph neural networks have been widely used in Boolean satisfiability (SAT) tasks to learn structural information from SAT formulas. The goal of these studies is to solve SAT instances or to enhance SAT solvers, including tasks such as unsat-core prediction. However, most existing approaches model a SAT formula as a bipartite graph or a directed acyclic graph, which are less direct in capturing clause-level and higher-order interactions among literals and clauses. Moreover, these approaches are limited in modeling intrinsic polarity-related properties of SAT, such as the complementary relationship between the positive and negative literals of a variable. To address these limitations, we propose a polarity-aware representation learning framework over clause-literal hypergraphs. We model SAT formulas as clause-literal hypergraphs augmented with a clause incidence graph to capture higher-order structural interactions. We then introduce a polarity-aware decomposition mechanism that separates variable representations into polarity invariant and equivariant components, explicitly modeling the relationship between positive and negative literals, with the resulting literal representations propagated along the hypergraph structure. We further incorporate a polarity-inversion consistency regularization to reinforce polarity-consistent representations during training. Experimental results on multiple SAT datasets demonstrate the effectiveness of the proposed approach.

2605.04193 2026-06-02 cs.AI cs.LG cs.LO 版本更新

ANDRE: An Attention-based Neuro-symbolic Differentiable Rule Extractor for Inductive Logic Programming

ANDRE：一种基于注意力的神经符号可微规则提取器，用于归纳逻辑编程

Iman Sharifi, Peng Wei, Saber Fallah

发表机构 * Dept. of Mechanical and Aerospace Engineering, George Washington University, USA（机械与航空航天工程系，乔治华盛顿大学）； Dept. of Mechanical Engineering Sciences, University of Surrey, UK（机械工程科学系，萨里大学）

AI总结提出ANDRE框架，通过注意力驱动的可微逻辑算子优化连续规则空间，实现从概率数据中学习一阶逻辑规则，在噪声环境下保持鲁棒性和可解释性。

Comments 35 pages, 8 figures, 10 tables

详情

AI中文摘要

归纳逻辑编程（ILP）旨在从数据中学习可解释的一阶规则，但现有的符号和神经符号方法难以扩展到噪声和概率设置。经典ILP依赖于离散的组合规则搜索，在不确定性下脆弱，而可微ILP方法通常依赖预定义规则模板或不精确的模糊算子，这些算子在推理概率谓词估值时会遭受梯度消失或逻辑结构近似不佳的问题。本文提出基于注意力的神经符号可微规则提取器（ANDRE），一种新颖的ILP框架，通过基于注意力的逻辑算子优化连续规则空间来学习一阶逻辑程序。ANDRE用完全可微的、注意力驱动的合取和析取算子替代规则模板和逻辑算子，这些算子近似逻辑最小-最大语义，从而实现对概率数据的准确、稳定和可解释推理。通过在每条规则内软选择、否定或排除谓词，ANDRE在保持符号结构的同时支持灵活规则归纳。在经典ILP基准、大规模知识库以及带有概率谓词和噪声监督的合成数据集上的大量实验表明，ANDRE达到了有竞争力或更优的预测性能，同时在不确定性下可靠地恢复正确的符号规则。特别是，ANDRE对中等标签噪声保持鲁棒，在规则提取质量和稳定性上显著优于现有可微ILP方法。

英文摘要

Inductive Logic Programming (ILP) aims to learn interpretable first-order rules from data, but existing symbolic and neuro-symbolic approaches struggle to scale to noisy and probabilistic settings. Classical ILP relies on discrete combinatorial rule search and is brittle under uncertainty, while differentiable ILP methods typically depend on predefined rule templates or inaccurate fuzzy operators that suffer from vanishing gradients or poor approximation of logical structure when reasoning over probabilistic predicate valuations. This paper proposes an Attention-based Neuro-symbolic Differentiable Rule Extractor (ANDRE), a novel ILP framework that learns first-order logic programs by optimizing over a continuous rule space with attention-based logical operators. ANDRE replaces both rule templates and logical operators with fully differentiable, attention-driven conjunction and disjunction operators that approximate logical min-max semantics, enabling accurate, stable, and interpretable reasoning over probabilistic data. By softly selecting, negating, or excluding predicates within each rule, ANDRE supports flexible rule induction while preserving symbolic structure. Extensive experiments on classical ILP benchmarks, large-scale knowledge bases, and synthetic datasets with probabilistic predicates and noisy supervision demonstrate that ANDRE achieves competitive or superior predictive performance while reliably recovering correct symbolic rules under uncertainty. In particular, ANDRE remains robust to moderate label noise, substantially outperforming existing differentiable ILP methods in both rule extraction quality and stability.

2605.03403 2026-06-02 cs.CV cs.LG (updated version)

GRPO-TTA: Test-Time Visual Tuning for Vision-Language Models via GRPO-Driven Reinforcement Learning

Yujun Li, Hongyuan Zhang, Yuan Yuan

Affiliation: School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University

AI summary: Proposes GRPO-TTA, which applies GRPO to test-time adaptation by reformulating class-specific prompt prediction as a group-wise policy optimization problem and designing alignment and dispersion rewards; it outperforms existing methods on a range of benchmarks.

Abstract

Group Relative Policy Optimization (GRPO) has recently shown strong performance in post-training large language models and vision-language models. This raises the question of whether GRPO can also substantially improve test-time adaptation (TTA) of vision-language models. In this paper, we propose Group Relative Policy Optimization for Test-Time Adaptation (GRPO-TTA), which adapts GRPO to the TTA setting by reformulating class-specific prompt prediction as a group-wise policy optimization problem. Specifically, we construct output groups by sampling top-K class candidates from CLIP similarity distributions, enabling probability-driven optimization without access to ground-truth labels. Moreover, we design reward functions tailored to test-time adaptation, including alignment rewards and dispersion rewards, to guide effective visual encoder tuning. Extensive experiments across diverse benchmarks demonstrate that GRPO-TTA consistently outperforms existing test-time adaptation methods, with notably larger performance gains under natural distribution shifts.
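
The group construction and group-relative scoring described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the similarity values, the temperature, and the placeholder reward (each candidate's own probability) are assumptions made for illustration, and the top-K step is taken deterministically here rather than sampled. The only part that follows standard GRPO practice is the standardization of rewards within a group.

```python
import math

def softmax(xs, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_group(similarities, k):
    """Indices of the k classes with the highest similarity."""
    order = sorted(range(len(similarities)),
                   key=lambda i: similarities[i], reverse=True)
    return order[:k]

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: each reward standardized within its group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical image-text similarities for five classes (assumed values).
similarities = [0.31, 0.12, 0.27, 0.05, 0.25]

group = top_k_group(similarities, k=3)
probs = softmax([similarities[i] for i in group], temperature=0.05)

# Placeholder reward (assumption): the candidate's own probability.
# The paper's alignment and dispersion rewards are not reproduced here.
rewards = probs
advantages = group_relative_advantages(rewards)
```

Because the advantages are centered within the group, they sum to approximately zero: candidates scoring above the group mean are reinforced and the others are penalized, without any ground-truth label.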


2606.01238 2026-06-02 cs.RO cs.LG (updated version)

Training-Free Imitation Learning with Closed-Form Diffusion Policies

Raghav Mishra, Ian R. Manchester

Affiliation: Australian Center for Robotics, ARIAM Hub, and School of Aerospace, Mechanical and Mechatronic Engineering, University of Sydney

AI summary: Proposes Closed-Form Diffusion Policies (CFDP), training-free diffusion policies built on the closed-form score of the demonstration dataset. They enable real-time imitation in milliseconds, are competitive with neural baselines that need hours of training, and support inference-time policy editing and demonstration augmentation.

Abstract

While diffusion-based policies have impressive performance and expressivity, their long offline training slows down the data collection and policy deployment loop. We introduce Closed-Form Diffusion Policies (CFDP), a class of training-free diffusion-based policies for imitation learning using the closed-form score derived from the demonstration dataset. We deploy CFDP with real-time inference on a mobile CPU in hardware experiments, showing it can successfully perform imitation directly from the dataset in milliseconds and with faster inference than neural diffusion policies. In experiments on imitation learning benchmarks, we show that CFDP is competitive against neural baselines that require hours of training, providing a favorable tradeoff between training time and performance. Finally, we show how closed-form diffusion policies act as a composable primitive that enables data-driven inference-time editing of pre-trained neural diffusion policies, including policy guidance and novel demonstration augmentation.
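
As a rough illustration of what a "closed-form score derived from the demonstration dataset" can look like, the sketch below computes the exact score of a Gaussian-smoothed empirical distribution over a handful of points. This is a textbook identity, not the paper's policy: conditioning on observations, the noise schedule, and the sampling loop are all omitted, and the function name and data are hypothetical.

```python
import math

def closed_form_score(x, data, sigma):
    """Score (gradient of log density) of the mixture
    p(x) = (1/N) * sum_i Normal(x; data[i], sigma^2 I).

    It equals sum_i w_i * (data[i] - x) / sigma^2, where the weights w_i
    are a softmax over -||x - data[i]||^2 / (2 sigma^2).
    """
    sq_dists = [sum((a - b) ** 2 for a, b in zip(x, point)) for point in data]
    logits = [-d / (2.0 * sigma ** 2) for d in sq_dists]
    m = max(logits)                      # subtract the max for stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [
        sum(w * (point[j] - x[j]) for w, point in zip(weights, data)) / sigma ** 2
        for j in range(len(x))
    ]

# Hypothetical 2-D "demonstration actions" (assumed values).
demos = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
score = closed_form_score([0.1, 0.0], demos, sigma=0.1)
```

No network is trained: the score is evaluated directly from the stored points, which is what makes a training-free policy of this kind possible in principle.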


2606.01234 2026-06-02 econ.GN cs.CE cs.CV cs.GT cs.LG physics.soc-ph q-fin.EC (updated version)

Differing Roles of Leisure and Productivity in GDP - A Machine Learning based comparative analysis of Germany and USA

Achintya Ranjan, Uma Ranjan

AI summary: Uses random forest models to analyze how working hours and total factor productivity affect GDP, and applies Gini importance, SHAP plots, and partial dependence plots to show how structural differences between German and US society are reflected in their contributions to GDP.

Comments: International Conference on Emerging Techniques in Computational Intelligence 2025

2606.01227 2026-06-02 cs.LG q-bio.NC (updated version)

DAGGER: Gradient-Free Construction of Transiently Amplifying Networks under Hard Connectivity Constraints

James C. Ferguson

Affiliation: The African Institute for Mathematical Sciences; Institute of Science and Technology Austria

AI summary: Proposes DAGGER, a gradient-free single-pass algorithm that constructs transiently amplifying networks under hard sign/sparsity/diagonal constraints. A single scalar β controls a Wasserstein-2 budget that smoothly trades multiset preservation against amplification.

Comments: 12 pages, 7 figures

Abstract

Many networks not only support but also rely on transient non-normal amplification, an orders-of-magnitude increase in the activity of an otherwise stable system. Constructing such networks under hard sign/sparsity/diagonal constraints -- the regime relevant for biological connectomes and structured RNN initializations -- has so far required either gradient-based local search with thousands of inner-loop eigendecompositions or Schur-form direct construction in an abstract basis that breaks the constraints under projection. Here we introduce DAGGER (Directed Acyclic Graph Guided Edge Reweighting), a gradient-free single-pass algorithm. Given a stable signed sparse matrix, DAGGER produces an output with the same sign, sparsity, and diagonal. A single scalar $\beta$ controls a Wasserstein-2 budget that smoothly trades exact multiset preservation ($\beta = 0$) for amplification; peak amplification grows essentially without bound with $\beta$, empirically reaching $10^{10}$ before numerical overflow. DAGGER matches or exceeds gradient-based methods at multiset preservation in a single forward pass -- 30-100$\times$ fewer eigendecompositions than a typical gradient inner loop -- and at moderate $\beta$ beats them by orders of magnitude with connectivity exactly preserved. We develop the algorithm, compare it with existing methods both directly and on a downstream signal-detection task, and examine the diagnostics that show why DAGGER is structurally different from other amplifying networks.


2606.01221 2026-06-02 cs.LG cs.AI (updated version)

Hybrid Imbalanced Regression Through Unified Data-Level and Algorithm-Level Balancing

Shermin Shahbazi, Hossein Mohammadi, Mohsen Afsharchi

Affiliation: Zahedan National University

AI summary: Proposes a five-stage hybrid framework that combines adaptive binning, a conditional variational autoencoder, feature-space clustering with oversampling, a latent-density weighted loss, and attention-based gated fusion to address imbalance in regression.

Comments: 52 pages, 20 figures, accepted at Expert Systems with Applications

DOI: 10.1016/j.eswa.2026.131908
Journal ref: Expert Systems with Applications, Date: 1 August 2026, Article: 131908, Volume: 322

Abstract

Imbalanced learning is a critical challenge in machine learning, where underrepresented target values can bias models and degrade prediction performance on rare but important cases. Although extensively studied in classification, imbalanced regression remains relatively underexplored. Existing methods mainly focus on either data-level balancing, which may introduce noise and overfitting, or algorithm-level balancing, which often struggles with highly complex target distributions. To address these limitations, we propose a unified hybrid framework that integrates both data- and algorithm-level balancing strategies into a regressor-agnostic pipeline. The proposed framework consists of five stages: (1) adaptive bin partitioning to dynamically segment the target space based on local linear coherence; (2) target-conditioned representation learning using a Conditional Variational Autoencoder; (3) multistage data-level balancing through feature-space clustering and oversampling of minority clusters; (4) algorithm-level balancing using a novel Latent-Density Weighted Loss (LDWL) to emphasize rare samples in latent and target spaces; and (5) attention-based gated fusion for final regression. Experimental results on benchmark datasets demonstrate that the proposed framework consistently improves predictive performance compared to standalone regressors and existing imbalanced regression approaches.


2606.01220 2026-06-02 cs.LG cs.AI (updated version)

Fine-Tuning Diffusion Models for Molecular Generation via Reinforcement Learning and Fast Sampling

Guang Lin, Shikui Tu, Lei Xu

Affiliation: Department of Computer Science and Engineering, Shanghai Jiao Tong University; Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ)

AI summary: Proposes FTDiff, a framework that combines group relative policy optimization with a fast sampling mechanism to fine-tune diffusion models so that they generate high-quality molecules satisfying multi-objective drug design constraints.

Comments: 13 pages, 7 figures

Abstract

Generating molecules that simultaneously satisfy drug-like properties and conform to the 3D structure of a target protein is a core challenge in structure-based drug design (SBDD). Existing generative approaches, however, often rely on costly post-hoc processing during sampling or require carefully curated datasets during training, yet still achieve modest gains. These limitations are especially pronounced in multi-objective settings, where balancing conflicting criteria remains difficult. To address these challenges, we propose FTDiff, a reinforcement learning fine-tuning framework tailored for diffusion-based molecular generation under structural constraints. To ensure stable and sample-efficient optimization, FTDiff adopts a group relative policy optimization (GRPO) style strategy. Furthermore, FTDiff builds upon a time-free pretrained diffusion model and incorporates a fast sampling mechanism that reduces the number of denoising steps, significantly accelerating both training and inference while maintaining generation quality. By optimizing a fixed threshold-aware reward, FTDiff effectively guides the model to produce valid, diverse, and high-quality molecules that balance multiple drug design objectives. Extensive experiments on benchmark datasets demonstrate that FTDiff consistently outperforms prior methods, without requiring expensive post-hoc optimization or intricate data engineering.


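2606.01217 2026-06-02 cs.CV cs.LG stat.AP (updated version)

Analysis of Ethnic Disparities in Autism Spectrum Disorder among Toddlers

Aadithya Prabha Ramaharsha, Deevna Reddy, Uma Ranjan

Affiliation: Sri Ramachandra Institute of Higher Education and Research

AI summary: Uses logistic regression to study how ethnicity, behavioral scores, sex, and neonatal jaundice affect autism spectrum disorder (ASD) in toddlers. It reports that ASD risk for White toddlers is 81% higher than for Asian toddlers and 79% lower for Middle Eastern toddlers, and confirms neonatal jaundice and male sex as significant risk factors.

Comments: Third International Conference Biomedical Engineering Science and technology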

2606.01216 2026-06-02 cs.LG math.OC (updated version)

Riemannian Optimization for Hadamard Products of Low-Rank Matrices

Pratik Jawanpuria, Ankish Chandresh, Bamdev Mishra

Affiliation: Centre for Machine Intelligence and Data Science, Indian Institute of Technology Bombay, India; Microsoft India

AI summary: Addresses the coupled scaling symmetries among the factors of a Hadamard product of low-rank matrices by proposing a block-diagonal metric on a Riemannian quotient manifold, and develops a gradient descent algorithm with linear complexity.

2606.01207 2026-06-02 cs.CV cs.LG (updated version)

Feature Alignment Determines Fusion Strategy: A Comparative Study of Cross-Attention and Concatenation in Multimodal Learning

Zhiqiang Zhou, Xuezhen Xie

Affiliation: Hunan Chemical Industry Vocational and Technical College

AI summary: Shows through experiments and theoretical analysis that feature alignment quality, rather than data scale, is the key factor determining which multimodal fusion strategy performs better; when features are pre-aligned, concatenation outperforms cross-attention.

Comments: 8 pages, 6 figures, 4 tables

Abstract

The choice between cross-attention and concatenation for multimodal fusion remains governed by practitioner intuition rather than principled understanding. In this paper, we demonstrate that feature alignment quality, not data scale alone, is the primary determinant of which fusion strategy excels. Through controlled experiments on Flickr8k using two feature extraction backbones (ResNet18 and CLIP ViT-B/32), we show that concatenation outperforms cross-attention by 4.1-5.1 percentage points across all tested scales (2048-16384 samples) when features are pre-aligned by a vision-language pretraining objective. We provide a theoretical explanation grounded in sample complexity analysis: concatenation requires O(d_v + d_t) samples to learn its fusion projection, while cross-attention requires O(d_v * d_t) samples to learn bilinear attention weights, over 256 times as many for 512-dimensional CLIP features. When features are already aligned, the approximation error gap between the two methods vanishes, and concatenation's sample efficiency dominates at all practical dataset sizes. An alignment degradation study confirms a monotonic trend: as feature alignment degrades, concatenation's advantage grows from 1.3% to 2.8%. These findings provide a principled decision framework for fusion method selection in multimodal systems, with direct implications for the design of Multimodal Large Language Models.
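
The sample-complexity comparison quoted in this abstract can be checked with a short piece of arithmetic. The calculation below assumes $d_v = d_t = 512$ (the CLIP ViT-B/32 embedding width) and ignores the constants hidden by the $O(\cdot)$ notation, so it only indicates the order of the ratio.

```latex
\[
\frac{d_v \, d_t}{d_v + d_t}
  = \frac{512 \times 512}{512 + 512}
  = \frac{262144}{1024}
  = 256 .
\]
```

Under these assumptions the bilinear attention weights need on the order of 256 times as many samples as the concatenation projection, in line with the figure stated in the abstract.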


2606.01202 2026-06-02 cs.AI cs.CL cs.LG (updated version)

The Shape of Wisdom: Decision Trajectories in Language Models

Shailesh Rana

Affiliation: Independent Researcher

AI summary: Analyzes 9,000 trajectories from three language models on MMLU, describes each trajectory by its answer margin, margin change, and distance from a decision flip, finds that correctness and stability are distinct, and examines how attention and MLP scalars affect the margin.

Comments: 6 pages, 5 figures. Code and derived artifacts: https://github.com/gut-puncture/The-Shape-of-Wisdom

Abstract

Language models do not simply choose an answer at the output layer. In a 9,000-trajectory MMLU study across Qwen2.5-7B-Instruct, Llama-3.1-8B-Instruct, and Mistral-7B-Instruct-v0.3, the score of the answer moves across depth in structured ways. We describe each trajectory with three quantities: the current answer margin, the next-layer change in that margin, and the distance from a decision flip. The main empirical picture is that correctness and stability are different: the largest group is unstable-correct, not stable-correct. A traced subset then asks what moves the margin. In stable-correct cases, the average attention scalar points in the correct direction, while the average MLP scalar does not; span deletion shows that removing answer-supporting text hurts the margin and removing distractor-like text helps it. The result is not a full circuit explanation. It is a reproducible way to see which answers are settled, which remain fragile, and which measured sources move them.
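
The three per-trajectory quantities named in this abstract can be sketched as follows. The exact definitions used in the paper are not given in this listing, so the ones below are assumptions: the margin is the tracked answer's score minus the best competing score, the next-layer change is a first difference across depth, and the distance from a decision flip is taken to be the absolute margin. The scores are made-up numbers.

```python
def answer_margins(layer_scores, answer):
    """Per-layer margin: score of `answer` minus the best competing score."""
    margins = []
    for scores in layer_scores:
        best_other = max(s for i, s in enumerate(scores) if i != answer)
        margins.append(scores[answer] - best_other)
    return margins

def next_layer_changes(margins):
    """Change in margin from each layer to the next."""
    return [later - earlier for earlier, later in zip(margins, margins[1:])]

def flip_distances(margins):
    """Assumed definition: how far the margin is from zero, i.e. how much
    it would have to move for the decision to flip."""
    return [abs(m) for m in margins]

# Hypothetical scores for four answer options at three depths.
layer_scores = [
    [1.0, 3.0, 2.0, 0.0],
    [5.0, 3.0, 1.0, 0.0],
    [9.0, 2.0, 1.0, 0.0],
]
margins = answer_margins(layer_scores, answer=0)   # [-2.0, 2.0, 7.0]
changes = next_layer_changes(margins)              # [4.0, 5.0]
distances = flip_distances(margins)                # [2.0, 2.0, 7.0]
```

In this toy trajectory the tracked answer starts out losing, flips to winning at the second layer, and then widens its lead, which is the kind of pattern the margin and its layer-to-layer change are meant to expose.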


2606.01198 2026-06-02 cs.LG (updated version)

Linear Strategic Classification with Endogenous Improvements

Siddharth Shrivastava, Mahvith Akshintala, B Vamsha Vardhan Reddy, Naresh Manwani, Sujit Gujar, Ganesh Ghalme

Affiliation: Department of Artificial Intelligence; IIT Hyderabad; IIIT Hyderabad

AI summary: Studies strategic classification in which agents' feature modifications in response to a classifier can produce genuine improvements in outcomes. For linear classifiers it characterizes the optimal classifier as a shift of the decision boundary, and provides PAC guarantees and a practical algorithm.

Abstract

Strategic classification studies settings in which agents respond to a deployed classifier by modifying observable features at a cost. Classical models typically treat such responses as cosmetic: features may change, but true labels remain fixed. We study an improvement-aware variant in which strategic responses can induce genuine changes in outcome-relevant features. Agents choose post-deployment feature vectors strategically, and labels are then generated according to a stable conditional outcome law that preserves the relationship between features and outcomes. We formalize this problem for linear classifiers under a single-index qualification model and linear-decomposable costs. We show that the strategic-optimal classifier is obtained by a parallel shift of the Bayes-optimal decision boundary, and that it provides a better surrogate for the improvement-aware objective than the Bayes classifier. Since improvement-aware learning requires post-deployment labels, which are typically unavailable before deployment, we provide PAC-style guarantees under an oracle model, propose a practical plug-in algorithm, establish its generalization bound, and evaluate it on synthetic and real-world datasets.


2606.01179 2026-06-02 cs.LG cs.AI (updated version)

Physics-Informed Deep Learning for Entropy Prediction in Heterogeneous Systems: Thermodynamic and Information-Theoretic Case Studies

Biswajeet Sahoo, Debadutta Patra

Affiliation: Durham University; Department of Chemical Engineering; Veer Surendra Sai University of Technology

AI summary: Proposes a unified physics-informed deep learning framework that uses differential-equation residuals and information-theoretic constraints to predict entropy for both thermodynamic and information-theoretic systems within a single neural network, and validates its data efficiency and physical consistency.

Abstract

Entropy production governs irreversibility and uncertainty in both physical and information-theoretic systems. While Physics-Informed Neural Networks (PINNs) successfully solve differential equations, current architectures remain inherently domain-specific. The extraction of domain-invariant entropy representations across fundamentally different physical laws remains unexplored. This paper introduces a unified Physics-Informed Deep Learning (PIDL) framework that simultaneously enforces differential equation residuals and information-theoretic bounds within a single neural architecture. We demonstrate this framework via two canonical studies: (i) a thermodynamic continuous stirred-tank reactor (CSTR) model solving governing ODEs, where a Softplus constraint strictly enforces the Second Law of Thermodynamics; and (ii) an information-theoretic financial market model solving the inverse Fokker-Planck PDE to infer latent drift and diffusion coefficients, guaranteeing diffusion positivity via a Softplus constraint while naturally inducing Shannon entropy. Three model variants are evaluated: two domain-specific baselines and one shared-encoder architecture. The PIDL framework guarantees absolute thermodynamic admissibility with zero Second-Law violations and exhibits exceptional data efficiency, retaining >90% predictive accuracy using merely 30% of available training data. Furthermore, a post-hoc Ruppeiner Riemannian geometric analysis of the learned entropy surface successfully identifies thermodynamic phase instabilities. This methodology provides a robust, domain-agnostic architecture for physics-constrained entropy modeling, advancing applications in sustainable process design and quantitative financial risk assessment.


2606.01176 2026-06-02 cs.LG (updated version)

Temporal Motif Signatures for Temporal Graph Neural Networks

Dylan Sandfelder, Mihai Cucuringu, Xiaowen Dong

Affiliation: University of Oxford; University of California, Los Angeles

AI summary: To address the difficulty temporal graph neural networks have in capturing short-horizon temporal motif patterns, proposes a compact 13-dimensional motif feature map that embeds linearly into any static or temporal encoder and improves performance across a variety of tasks.

Abstract

Real temporal interaction streams carry predictive structure in short-horizon motif patterns -- repetition, reciprocity, star diversity, triadic flow -- that vanilla temporal graph neural networks (TGNNs) often fail to expose to their edge scorers. We show this concretely on MOOC interaction prediction, where a small four-feature family of past-window star counts already delivers most of the lift over a strong static GNN. Across a wide set of real and synthetic temporal datasets we find that motif activity organizes consistently along three scale-stable axes (dyadic recency/reciprocity, star diversity, triadic flow), and we use this empirical structure to design a compact 13-coordinate, leakage-safe, candidate-local motif feature map h(u, v, t) that linearly embeds into any static or temporal encoder without architectural changes. A temporal Weisfeiler-Leman (WL) analysis places the augmentation relative to the first level of an anchored temporal-WL hierarchy and exhibits a candidate-anchored pair on which motif features distinguish. We demonstrate empirically that the same augmentation consistently lifts performance across heterogeneous tasks: TGB link-property prediction across all five baselines, edge classification on Bitcoin Alpha/OTC and MOOC, and graph-level classification of synthetic temporal generators.
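
A candidate-local, leakage-safe motif feature of the kind described in this abstract could look roughly like the sketch below. It computes four illustrative counts, not the paper's 13 coordinates: the feature definitions, the window handling, and the toy event stream are all assumptions. The one property it is meant to demonstrate is that only events strictly before the query time `t` are used.

```python
def motif_features(events, u, v, t, window):
    """Four past-window counts for a candidate edge (u, v) at time t.

    events: iterable of (source, destination, timestamp).
    Only events with t - window <= timestamp < t are used, so nothing at
    or after the query time can leak into the features.
    """
    past = [(s, d) for s, d, ts in events if t - window <= ts < t]
    repetition = sum(1 for s, d in past if (s, d) == (u, v))   # u -> v before
    reciprocity = sum(1 for s, d in past if (s, d) == (v, u))  # v -> u before
    out_neighbors_u = {d for s, d in past if s == u}
    in_neighbors_v = {s for s, d in past if d == v}
    star_diversity = len(out_neighbors_u)      # distinct targets of u
    # Simplified triadic count: intermediaries w with u -> w and w -> v.
    triadic = len((out_neighbors_u & in_neighbors_v) - {u, v})
    return [repetition, reciprocity, star_diversity, triadic]

# Hypothetical interaction stream (assumed data).
events = [
    ("a", "b", 1), ("b", "a", 2), ("a", "c", 3),
    ("c", "b", 4), ("a", "b", 5), ("a", "b", 9),
]
features = motif_features(events, "a", "b", t=6, window=10)  # [2, 1, 2, 1]
```

The event at time 9 is ignored because it lies after the query time. A vector like this can be concatenated to whatever an encoder already feeds its edge scorer, which is the sense in which such features require no architectural change.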


2606.01159 2026-06-02 cs.LG cs.GT (updated version)

Context-Based Detection of Child-Directed Speech in Long-Form Recordings

Théo Charlot, Tarek Kunze, Kaveri K. Sheth, Alejandrina Cristia, Marvin Lavechin

Affiliation: LSCP, DEC, ENS, EHESS, CNRS, PSL University, France

AI summary: Substantially improves automatic detection of child-directed speech in long-form recordings by fine-tuning self-supervised models, incorporating contextual information, and evaluating the end-to-end pipeline.

Comments: 6 pages, 1 figure

2606.01128 2026-06-02 cs.LG (updated version)

Local MixVR: Breaking the Communication-Sample Dependence in Distributed Learning

Tehila Dahan, Bassel Hamoud, Roie Reshef, Martin Jaggi, Kfir Y. Levy

Affiliation: Technion Haifa, Israel; EPFL Lausanne, Switzerland

AI summary: Proposes the Local MixVR framework, which uses local updates and variance reduction to remove the dependence of communication complexity on the total number of samples N. The resulting communication complexity depends only on the number of workers M and improves on the best existing methods when M < O(N^{1/4}).

2606.01126 2026-06-02 cs.LG cs.AI cs.CV (updated version)

STARFISH: faST Accuracy Recovery in pruned networks From Internal State Healing

Shir Maon, Odelia Melamed, Adi Shamir

Affiliation: Weizmann Institute of Science

AI summary: Proposes STARFISH, which uses a small unlabeled calibration set to optimize a pruned network so that its internal states align with those of the original network, recovering accuracy efficiently and outperforming existing methods on ViT networks.

Abstract

Pruning is a process designed to reduce the number of weights in a large neural network. This can substantially speed up inference but might cause a considerable reduction in the model's accuracy, and thus it is usually followed by a healing process that regains some of the lost accuracy. In this paper, we propose a new healing method, STARFISH, that can recover (most of) the accuracy of any pruned network efficiently. The main idea of STARFISH is to optimize the pruned network to align with the original network's internal state representations using a tiny calibration set of unlabeled examples. For the common case of removing 50% of the weights, STARFISH healing improves the recovered accuracy by up to 22% over the state-of-the-art methods on ViT-based networks. Its advantage is even more pronounced under aggressive pruning. For example, after eliminating 75% of the weights in a DeiT-B network for ImageNet, STARFISH uses only 0.4% of the number of training images as a calibration set and recovers 82% of the original dense accuracy, whereas competing recovery techniques reach only 40% of the dense model accuracy.


2606.01123 2026-06-02 cs.LG (updated version)

From Reward-Free Representations to Preferences: Rethinking Offline Preference-Based Reinforcement Learning

Jun-Jie Yang, Chia-Heng Hsu, Kui-Yuan Chen, Ping-Chun Hsieh

AI summary: Proposes an offline preference-based reinforcement learning framework that combines reward-free representation learning with contrastive search and fine-tuning. It learns latent successor-measure representations from reward-free offline data, then uses preference data for contrastive search and fine-tuning, substantially improving preference efficiency.

Comments: Published in ICML 2026

Abstract

Preference-based reinforcement learning (PbRL) avoids explicit reward engineering by learning from pairwise human preference feedback. Existing offline PbRL methods typically follow a two-stage pipeline, first learning a reward or preference model from labeled preferences and then performing offline RL on unlabeled data. We revisit offline PbRL through the lens of reward-free representation learning (RFRL) from the zero-shot RL literature, and propose a new training framework that first learns latent successor-measure representations from reward-free offline data, followed by contrastive search and fine-tuning using preference data. Through extensive experiments and ablations, we show that our method achieves superior preference efficiency over offline PbRL baselines. This work is the first to connect RFRL with PbRL, highlighting its potential as a feedback-efficient solution. Our code is publicly available at https://github.com/rl-bandits-lab/FB-PbRL.


2606.01122 2026-06-02 cs.LG q-fin.CP (updated version)

A Per-Component Diagnostic Protocol for Neural HJB-PIDE Solvers under Control-Dependent Lévy Jumps

R. Drissi

AI summary: Proposes a five-step diagnostic protocol for detecting miscomputed operators in residual-trained neural HJB-PIDE solvers with control-dependent Lévy jumps, and validates it on a CRRA-Merton-Variance-Gamma benchmark.

Abstract

We propose a five-step diagnostic protocol for residual-trained neural HJB-PIDE solvers with control-dependent Lévy jumps, targeting a general failure mode of neural PDE methods: a learned solution can match headline scalar diagnostics while miscomputing an operator inside its training loss. The protocol pairs each neural solve with at least one from-scratch independent reference, decomposes the Hamiltonian into drift, diffusion, compensator, and nonlocal-integral components across a u-grid, and compares the value function and its low-order derivatives over a (t,x) grid before any argmax comparison. Applied to a standard CRRA-Merton-Variance-Gamma benchmark, it isolates a missing 1/2-mixture factor in the neural method's importance-proposal density that scaled the nonlocal integral by exactly half -- a textbook signature of a constant proposal scale error, invisible to longer training, grid refinement, and truncation sweeps. With the bug corrected, four references -- two finite-difference solvers with disjoint discretizations, the neural solver, and a semi-analytic scalar baseline obtained from CRRA homogeneity -- agree on the optimal control to within ~2%. The constant-coefficient CRRA benchmark collapses by homogeneity to a scalar maximization, so the scalar baseline is the efficient method here; the contribution is the protocol, applicable in principle to non-homogeneous and higher-dimensional settings where neural HJB-PIDE solvers are genuinely needed. The episode is a concrete instance of a broader neural-PDE verification failure: pointwise agreement of a learned value or control can coexist with a systematically wrong nonlocal operator, so per-component and surface-level checks are needed before trusting the argmax policy.
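
The kind of bug this abstract describes, a missing 1/2 in a two-component mixture proposal density, can be reproduced in a toy importance-sampling estimate. The sketch below is not the paper's solver: the integrand, the Gaussian mixture proposal, and all names are assumptions chosen so that the true answer ($\sqrt{\pi}$) is known. It shows why such an error is hard to spot from convergence behaviour alone: the estimate is scaled by exactly one half for every sample size.

```python
import math
import random

def normal_pdf(z, mean, std):
    return math.exp(-0.5 * ((z - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def mixture_pdf(z, include_half=True):
    """Density of an equal-weight mixture of N(-1, 1) and N(1, 1).
    With include_half=False the 1/2 mixture weight is (wrongly) dropped."""
    total = normal_pdf(z, -1.0, 1.0) + normal_pdf(z, 1.0, 1.0)
    return 0.5 * total if include_half else total

def importance_estimate(f, n, include_half, seed=0):
    """Estimate the integral of f over the real line by importance
    sampling from the mixture, dividing by the (possibly wrong) density."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        mean = -1.0 if rng.random() < 0.5 else 1.0   # pick a component
        z = rng.gauss(mean, 1.0)
        acc += f(z) / mixture_pdf(z, include_half)
    return acc / n

def integrand(z):
    return math.exp(-z * z)       # integrates to sqrt(pi)

correct = importance_estimate(integrand, 20000, include_half=True)
buggy = importance_estimate(integrand, 20000, include_half=False)
ratio = buggy / correct           # 0.5 regardless of the sample size
```

More samples shrink the noise around both estimates but leave the factor of one half untouched, which is why a comparison against an independent reference, component by component, is what exposes it.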


2606.01117 2026-06-02 cs.LG cs.AI (updated version)

HASTE: Hardware-Aware Dynamic Sparse Training for Large Output Spaces

Nasib Ullah, Jinbin Zhang, Jean Lucien Randrianantenaina, Erik Schultheis, Rohit Babbar

Affiliation: University of Waterloo

AI summary: Proposes group-shared fixed fan-in sparsity, a semi-structured output-layer design that, combined with a long-tail decomposition, delivers substantial speedups in extreme multi-label classification while preserving accuracy.

Comments: Accepted at ICML 2026 Regular

Abstract

Extreme multi-label classification (XMC) involves learning models over large output spaces with millions of labels, making the output layer a memory-compute bottleneck. While sparsity-based methods reduce arithmetic complexity, they often fail to yield proportional speedups due to irregular memory access, poor hardware utilization, or reliance on auxiliary architectural components in long-tailed regimes. We introduce group-shared fixed fan-in sparsity, a semi-structured output-layer design in which semantically related labels share a sparse input pattern while retaining independent weights. This grouping introduces a task-aligned inductive bias -- encouraging related labels to share feature subsets -- while reducing index memory overhead, increasing feature reuse across labels, and enabling efficient GPU execution via custom CUDA kernels that leverage modern accelerator primitives. As an alternative to auxiliary objectives, we exploit the long-tailed structure of XMC by decomposing the output layer into a small dense head over frequent labels and a group-shared sparse tail over the remainder, providing an informative gradient pathway while preserving the memory benefits of sparsity. Through kernel-level microbenchmarking, we show that group-shared fixed fan-in translates arithmetic reductions into practical wall-clock gains, achieving up to $4.4\times$ speedup in the forward pass and up to $25\times$ speedup in backward passes over standard fixed fan-in sparsity, while operating within a few percent of a FLOPs-matched dense bottleneck. Across large-scale XMC benchmarks, our approach matches or improves precision@k over prior sparse baselines, while narrowing the performance gap to dense.
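
The core idea of group-shared fixed fan-in sparsity, as this abstract states it, is that labels in a group read the same sparse subset of input features while keeping their own weights. The sketch below shows that forward computation in plain Python. It is an illustration only: the grouping, indices, and weights are made up, and the paper's CUDA kernels and dense head are not represented.

```python
def group_shared_logits(x, groups):
    """Forward pass of a group-shared fixed fan-in output layer.

    groups: list of (shared_indices, label_weights), where every label in
    a group has one weight per shared index (same fan-in for all labels).
    """
    logits = []
    for shared_indices, label_weights in groups:
        # The gather happens once per group and is reused by all of its
        # labels; this is where the index-memory and reuse savings come from.
        gathered = [x[i] for i in shared_indices]
        for weights in label_weights:
            logits.append(sum(w * g for w, g in zip(weights, gathered)))
    return logits

# Hypothetical 4-dimensional input and two label groups with fan-in 2.
x = [1.0, 2.0, 3.0, 4.0]
groups = [
    ([0, 2], [[1.0, 1.0], [2.0, 0.0]]),   # labels 0 and 1 share inputs 0, 2
    ([1, 3], [[0.5, 0.5]]),               # label 2 reads inputs 1, 3
]
logits = group_shared_logits(x, groups)    # [4.0, 2.0, 3.0]
```

With one index list per group rather than per label, the index storage shrinks by roughly the group size, and the gathered features form a small dense block that maps more naturally onto accelerator primitives than fully irregular sparsity.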

2606.01107 2026-06-02 cs.LO cs.LG math.LO 版本更新

How (and when) can you fit examples to logic-based hypothesis classes over infinite structures?

如何（以及何时）能在无限结构上拟合样本到基于逻辑的假设类？

Michael Benedikt, Alessio Mansutti

发表机构 * University of Oxford（牛津大学）； IMDEA Software Institute（IMDEA软件研究所）

AI总结研究在无限结构（如实数有序域和Presburger算术）中，对于逻辑定义的假设类，拟合有限样本的计算复杂性和描述复杂性，并关注通过自然查询语言确定样本可拟合性的情况。

2606.01101 2026-06-02 cs.LG cs.AI 版本更新

Soft-NBCE: Entropy-Weighted Chunk Fusion for Long-Context

Soft-NBCE: 基于熵加权分块融合的长上下文处理

Shihao Ji, Mingyu Li, Zihui Song

发表机构 * Beijing Normal University（北京师范大学）； Chunjiang Intelligence（春江智能）

AI总结针对长上下文推理中硬选择策略导致语义碎片化的问题，提出Soft-NBCE，通过熵加权软融合和一致性蒸馏，在保持检索精度的同时提升多跳推理性能。

Comments 7 pages, 3 figures, 2 tables. Preprint

详情

AI中文摘要

自注意力的二次复杂度仍然是大型语言模型（LLMs）处理超长上下文的瓶颈。朴素贝叶斯认知引擎（NBCE）通过将文档分块并在每个解码步骤路由到熵最低的分块，实现了长上下文推理的并行化。这种硬选择策略在跨分块推理时会导致语义碎片化，因为相邻token之间的突然路由变化破坏了模型的上下文基础。我们提出了Soft-NBCE，这是一种轻量级扩展，用软熵加权分块融合替代了离散的分块选择。通过预测熵上的温度缩放Softmax，为所有分块分配连续权重，实现了跨分块条件分布的log空间聚合。为了部分补偿分块引入的条件独立性假设，我们提出了一致性蒸馏，这是一种基于LoRA的自蒸馏方法，通过KL散度将分块logit分布约束为全上下文教师分布。在LongBench多跳基准测试中，带有一致性蒸馏的Soft-NBCE在NBCE风格基线（MuSiQue F1: 0.310 vs. 0.275（Vanilla NBCE）；HotpotQA F1: 0.479 vs. 0.427）上持续改进，同时在O(L^2/n)峰值内存下保持检索精度（NIAH-32K: 0.909）。

英文摘要

The quadratic complexity of self-attention remains a bottleneck for Large Language Models (LLMs) processing ultra-long contexts. The Naive Bayes Cognitive Engine (NBCE) parallelizes long-context inference by chunking documents and routing to the lowest-entropy chunk at each decoding step. This hard-selection strategy causes semantic fragmentation during cross-chunk reasoning, as abrupt routing changes between adjacent tokens disrupt the model's contextual grounding. We present Soft-NBCE, a lightweight extension that replaces discrete chunk selection with soft entropy-weighted chunk fusion. A temperature-scaled softmax over predictive entropies assigns continuous weights to all chunks, enabling log-space aggregation across chunk-conditioned distributions. To partially compensate for the conditional independence assumption introduced by chunking, we propose Consistency Distillation, a LoRA-based self-distillation that constrains the chunked logit distribution toward a full-context teacher via KL divergence. On LongBench multi-hop benchmarks, Soft-NBCE with Consistency Distillation improves consistently over NBCE-style baselines (MuSiQue F1: 0.310 vs. 0.275 for Vanilla NBCE; HotpotQA F1: 0.479 vs. 0.427) while maintaining retrieval accuracy (NIAH-32K: 0.909) at $O(L^2/n)$ peak memory.
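
A minimal sketch of entropy-weighted soft fusion, assuming the weights are a softmax over negated entropies divided by a temperature `tau` and that the fused distribution is a renormalized weighted sum of per-chunk log-probabilities. The abstract does not give the exact formula, so the sign convention, the renormalization step, and the names `soft_fuse` and `tau` are assumptions made for illustration only.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def soft_fuse(chunk_probs, tau=0.5):
    """Fuse per-chunk next-token distributions with entropy-based weights."""
    # Assumed convention: lower entropy means a more confident chunk and a larger weight.
    weights = softmax([-entropy(p) / tau for p in chunk_probs])
    vocab = len(chunk_probs[0])
    eps = 1e-12  # avoids log(0)
    fused_log = [
        sum(w * math.log(p[v] + eps) for w, p in zip(weights, chunk_probs))
        for v in range(vocab)
    ]
    return weights, softmax(fused_log)  # softmax renormalizes the log-space mixture

chunk_probs = [
    [0.90, 0.05, 0.05],  # confident chunk
    [0.34, 0.33, 0.33],  # nearly uniform chunk
]
weights, fused = soft_fuse(chunk_probs)
```

Under this convention, letting `tau` shrink toward zero concentrates all weight on the lowest-entropy chunk, which corresponds to the hard routing the abstract attributes to NBCE.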

2606.01092 2026-06-02 cs.LG cs.AI 版本更新

MViewRouter：通过多视图交替注意力内化组合路由的几何等变性

Shiyan Liu, Bohan Tan, Yaoxin Wu, Yan Jin

发表机构 * Huazhong University of Science and Technology（华中科技大学）； Eindhoven University of Technology（埃因霍温理工大学）

AI总结提出MViewRouter框架，利用多视图交替注意力机制内化几何等变性作为结构归纳偏置，通过集体策略梯度聚合优化，解决组合路由问题中的对称性挑战，在TSP和CVRP上取得竞争性解质量和强零样本泛化。

详情

AI中文摘要

组合路由问题，如旅行商问题（TSP）和带容量约束的车辆路径问题（CVRP），是基础的NP难问题，具有广泛的现实应用。虽然最近的深度强化学习方法显示出有希望的性能，但它们通常仅通过数据增强处理几何对称性，导致决策不一致和泛化能力有限。为了解决这个问题，我们提出了MViewRouter，一个多视图框架，将几何等变性内化为结构归纳偏置，以实现跨路由问题变体的不变决策。我们的方法引入了一种多视图交替注意力（MAA）机制，能够在$D_4$对称群上进行并行处理，在视图内关系建模和视图间特征对齐之间交替进行。此外，我们通过集体策略梯度聚合（CPGA）优化策略，利用来自多个对称视图的共识梯度来稳定训练并加速收敛。在TSP和CVRP基准测试以及真实世界的TSPLIB实例上的实验表明，MViewRouter实现了竞争性的解质量和强大的零样本泛化能力。

英文摘要

Combinatorial routing problems such as the Traveling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP) are fundamental NP-hard problems with broad real-world applications. While recent deep reinforcement learning methods have shown promising performance, they typically handle geometric symmetries only through data augmentation, resulting in inconsistent decisions and limited generalization. To address this issue, we propose MViewRouter, a multi-view framework that internalizes geometric equivariance as a structural inductive bias to achieve invariant decision-making across routing problem variants. Our approach introduces a Multi-view Alternating Attention (MAA) mechanism that enables parallel processing over the $D_4$ symmetry group, alternating between intra-view relational modeling and inter-view feature alignment. Furthermore, we optimize the policy via Collective Policy Gradient Aggregation (CPGA), leveraging consensus gradients from multiple symmetric views to stabilize training and accelerate convergence. Experiments on TSP and CVRP benchmarks, as well as real-world TSPLIB instances, demonstrate that MViewRouter achieves competitive solution quality and strong zero-shot generalization.
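
To make the $D_4$ symmetry concrete, the sketch below generates the eight images of a node set in the unit square (four rotations about the center and four reflections) and checks that a tour has the same length in every view. It only illustrates the symmetry group itself, not the MAA mechanism; `d4_views` and `tour_length` are hypothetical helper names, and coordinates are assumed to be normalized to $[0, 1]^2$.

```python
import math

def d4_views(coords):
    """Return the 8 images of a point set in [0, 1]^2 under the D4 symmetries of the unit square."""
    transforms = [
        lambda x, y: (x, y),          # identity
        lambda x, y: (1 - y, x),      # rotation by 90 degrees about the center
        lambda x, y: (1 - x, 1 - y),  # rotation by 180 degrees
        lambda x, y: (y, 1 - x),      # rotation by 270 degrees
        lambda x, y: (1 - x, y),      # reflection across the vertical midline
        lambda x, y: (x, 1 - y),      # reflection across the horizontal midline
        lambda x, y: (y, x),          # reflection across the main diagonal
        lambda x, y: (1 - y, 1 - x),  # reflection across the anti-diagonal
    ]
    return [[t(x, y) for x, y in coords] for t in transforms]

def tour_length(coords, tour):
    """Length of the closed tour that visits coords in the given order."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))

coords = [(0.1, 0.2), (0.7, 0.3), (0.4, 0.9), (0.8, 0.8)]
tour = [0, 1, 3, 2]
views = d4_views(coords)
lengths = [tour_length(v, tour) for v in views]
```

Because every transform is an isometry, the optimal tour is the same in all eight views, which is the property a symmetry-aware policy can exploit.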

2606.01081 2026-06-02 cs.LG 版本更新

Decision-Focused On-Policy Learning for Contextual Linear Optimization with Partial Feedback

面向决策的在线策略学习用于部分反馈下的上下文线性优化

Wyame Benslimane, Tinghan Ye, Pascal Van Hentenryck, Paul Grigas

发表机构 * Department of Industrial Engineering and Operations Research, University of California, Berkeley（工业工程与运筹学系，加州大学伯克利分校）； H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology（H.米尔顿·斯图尔特工业与系统工程学院，佐治亚理工学院）

AI总结提出一种混合梯度估计方法，用于部分反馈下顺序上下文线性优化的在线策略学习，实现决策质量驱动的预测模型训练，并在多个基准上优于上下文多臂赌博机基线。

详情

AI中文摘要

决策聚焦学习（DFL）通过优化下游决策质量而非单独预测准确性来训练预测模型。对于上下文线性优化，大多数现有DFL方法假设离线数据和目标成本向量的完全观测。我们开发了一种在线策略学习方法，用于部分反馈下的顺序上下文线性优化，推广了标准赌博机反馈设置。我们的方法学习一个随机预测-然后-优化策略，该策略从条件分布中采样成本向量预测，并求解由此产生的下游线性优化问题。为了更新这个分布模型，我们引入了一个双组分混合梯度估计器。第一个组分是得分函数估计器，它提供无偏但可能高方差的策略梯度估计。第二个是决策聚焦的代入式（plug-in）组分，它使用潜在成本向量的辅助冗余参数（nuisance）估计来利用下游优化结构，随着估计的改进而变得更具信息性。我们证明了平均平方策略梯度范数的$\mathcal{O}(T^{-1/2})$界，与标准非凸SGD速率相匹配。在top-$k$选择、最短路径、组合定价和真实数据能源调度基准上的实验表明，混合梯度方法在使用高斯和更丰富的条件生成模型时，在所有基准上实现了比上下文赌博机风格基线更低的累积遗憾。代码可在 https://github.com/Joeyetinghan/on-policy-bandit-dfl 获取。

英文摘要

Decision-focused learning (DFL) trains predictive models by optimizing downstream decision quality rather than standalone prediction accuracy. For contextual linear optimization, most existing DFL methods assume offline data and full observations of the objective cost vector. We develop an on-policy learning method for sequential contextual linear optimization under partial feedback, generalizing the standard bandit feedback setting. Our method learns a stochastic predict-then-optimize policy that samples a cost-vector prediction from a conditional distribution and solves the resulting downstream linear optimization problem. To update this distributional model, we introduce a two-component hybrid gradient estimator. The first component is a score function estimator, which provides an unbiased but potentially high-variance policy gradient estimate. The second is a decision-focused plug-in component that uses an auxiliary nuisance estimate of the latent cost vector to exploit the downstream optimization structure, becoming more informative as the estimate improves. We prove an $\mathcal{O}(T^{-1/2})$ bound on the average squared policy-gradient norm, matching the standard non-convex SGD rate. Experiments on top-$k$ selection, shortest path, combinatorial pricing, and a real-data energy-scheduling benchmark show that the hybrid gradient approach achieves lower cumulative regret than contextual-bandit-style baselines across all benchmarks, using both Gaussian and richer conditional generative models. Code is available at https://github.com/Joeyetinghan/on-policy-bandit-dfl.

2606.01080 2026-06-02 cs.LG cs.AI 版本更新

Leyline：用于智能体推理的 KV 缓存指令

Bole Ma, Jan Eitzinger, Harald Koestler

发表机构 * Erlangen National High Performance Computing Center（埃朗根国家高性能计算中心）

AI总结针对智能体 LLM 中策略驱动的缓存编辑需求，提出 Leyline 服务端原语，通过声明式指令四元组和架构无关接口实现缓存拼接与截断，提升缓存命中率和求解率。

详情

AI中文摘要

现代 KV 缓存管理假设聊天机器人工作负载：提示一次性到达，缓存仅追加增长，因此前缀缓存和仅向前驱逐在构造上是正确的。智能体 LLM 打破了这一假设。它们的对话通过策略驱动的编辑演变：失败的工具调用被重试，过时的输出被丢弃，轨迹被转向。这导致两个不同的缓存问题。首先，相同的内容在轮次之间移动到新位置，使得精确前缀缓存失效，尽管底层 KV 仍然有效；最近针对 MLA 的位置无关缓存工作解决了这个重用问题。其次，也是本文的重点，策略可能需要指示服务系统主动移除或替换一段缓存内容，并继续而不重新预填充之后的所有内容。没有现有的原语提供此功能。生产智能体框架退回到每次编辑时重新预填充，支付完整的前缀重新计算成本；内核级驱逐方法自行决策，无法接受来自内核外部的策略指令。我们引入 Leyline，一个弥补这一差距的服务端原语。一个声明式指令四元组将编辑内容与保持位置正确性分离。策略声明编辑及其模式（原地拼接或前缀修剪的重新预填充以实现语义遗忘）；一个架构无关的接口路由到每个架构的内核，通过闭式 RoPE 旋转校正恢复注意力计算。拼接内核将重放缓存命中率提高 11.2 个百分点，并将延迟降低最多 241 毫秒。通过同一接口路由的十行截断规则将 debug-gym 上的智能体求解率提高 14.3 个百分点。该机制是开放的；它启用的策略空间是未来的议程。

英文摘要

Modern KV cache management assumes the chatbot workload: prompts arrive once and the cache grows append-only, so prefix caching and forward-only eviction are correct by construction. Agentic LLMs break this assumption. Their conversations evolve through policy-driven editing: failed tool calls are retried, stale outputs dropped, trajectories pivoted. Two distinct cache problems result. First, identical content moves to new positions between turns, invalidating exact-prefix caches even though the underlying KV would still be valid; recent work on position-independent caching for MLA addresses this reuse problem. Second, and this paper's focus, a policy may need to direct the serving system to actively remove or replace a span of cached content and continue without re-prefilling everything that came after. No existing primitive offers this. Production agentic harnesses fall back to re-prefill on every edit, paying full prefix-recomputation cost; kernel-level eviction methods make their own decisions and cannot accept policy directives from outside the kernel. We introduce Leyline, a serving-side primitive that closes this gap. A declarative directive 4-tuple separates what to edit from how to preserve position correctness. The policy declares the edit and its mode (in-place splice or prefix-trimmed re-prefill for semantic forgetting); an architecture-agnostic interface routes to a per-architecture kernel that restores attention math via a closed-form RoPE-rotation correction. The splice kernel lifts replay cache-hit by +11.2 pp and cuts latency by up to 241 ms. A ten-line truncation rule routed through the same interface lifts agentic solve rate by +14.3 pp on debug-gym. The mechanism is open; the policy space it enables is the agenda.
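
The closed-form RoPE correction mentioned above rests on a standard property: rotary embeddings compose additively in position, so a key cached at one position can be moved to another by applying a single extra rotation instead of recomputing it. The sketch below demonstrates that property only. It is not Leyline's kernel; `rope` and `shift_cached_key` are hypothetical names, and the interleaved pair layout and base of 10000 are assumptions (some implementations pair dimensions differently).

```python
import math

def rope(vec, pos, base=10000.0):
    """Apply rotary position embedding to an even-length vector (interleaved pairs)."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # the frequency decreases with the pair index
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def shift_cached_key(rotated_key, delta, base=10000.0):
    """Move an already-rotated key by delta positions without access to the raw key."""
    # Rotations about the same axis add up: rope(rope(k, p), delta) == rope(k, p + delta).
    return rope(rotated_key, delta, base)

key = [0.3, -1.2, 0.7, 0.5]
cached = rope(key, 12)                   # key as stored in the cache at position 12
moved = shift_cached_key(cached, 7 - 12) # five preceding tokens were spliced out
direct = rope(key, 7)                    # reference: recompute from the raw key
```

In this toy setting `moved` and `direct` agree up to floating-point error, which is why a splice can fix up positions of the surviving entries without re-running prefill for them.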

2606.01057 2026-06-02 cs.CV cs.AI cs.GR cs.LG 版本更新

3DCodeBench: Benchmarking Agentic Procedural 3D Modeling Via Code

3DCodeBench：通过代码进行智能体程序化3D建模的基准测试

Yipeng Gao, Lei Shu, Genzhi Ye, Xi Xiong, Ameesh Makadia, Meiqi Guo, Laurent Itti, Jindong Chen

发表机构 * Google DeepMind（谷歌DeepMind）； University of Southern California（南加州大学）； Google Research（谷歌研究）

AI总结提出3DCodeBench基准，评估12种视觉语言模型将文本和图像参考转换为程序化3D建模代码的能力，并构建基于人类偏好的3DCodeArena排名平台。

Comments Project Page: https://www.3dcodebench.com/; 11 pages (main), with appendix

详情

AI中文摘要

通过代码进行程序化3D建模正成为一种通用的范式，提供确定性、引擎就绪且可精确编辑的资产，而神经3D生成器天生缺乏这些特性。然而，编写此类程序化内容需要深厚的3D软件API、参数化设计和代码级几何推理专业知识。在本文中，我们提出了3DCodeBench，一个系统性的基准，用于评估3D建模软件中用于程序化3D生成的视觉语言模型（VLM）智能体。具体来说，3DCodeBench评估了12种先进VLM如何有效地充当程序化3D建模器，将文本和图像参考转换为3D建模软件的程序化代码。认识到自动度量可能无法完全捕捉3D形状的感知质量，我们构建了3DCodeArena，一个基于成对人类偏好对生成的3D输出进行排名的平台。通过广泛的评估和结果，我们观察到：（1）失败主要源于API不匹配，而成功渲染的模型仍然存在断开或浮动的3D几何组件。（2）测试时扩展，如更高的思考预算和多轮细化，总体上提高了性能。我们的发现突显了对高质量程序化编码数据以推进商业VLM的迫切需求。此外，有效的程序化3D建模需要一个强大的执行环境，为迭代细化提供高保真反馈。我们发布了3DCodeBench，包括精心策划的大规模多模态（文本/图像）提示数据集、程序化代码、3D对象三元组、评估协议以及公共3DCodeArena平台，作为探索基于VLM的程序化3D建模器的基础工具包。

英文摘要

Procedural 3D modeling through code is emerging as a versatile paradigm, offering deterministic, engine-ready, and precisely editable assets that neural 3D generators inherently lack. Authoring such procedural content, however, demands deep expertise in 3D software APIs, parametric design, and code-level geometric reasoning. In this paper, we propose 3DCodeBench, a systematic benchmark for evaluating vision-language model (VLM) agents for procedural 3D generation in 3D modeling software. Specifically, 3DCodeBench evaluates how effectively 12 advanced VLMs can serve as procedural 3D modelers by translating text and image references into procedural code for 3D modeling software. Recognizing that automated metrics may not fully capture the perceptual quality of 3D shapes, we build 3DCodeArena, a ranking platform based on pairwise human preferences over generated 3D outputs. From extensive evaluations and results, we observe that: (1) Failures mostly arise from API mismatches, while successful renders still suffer from disconnected or floating 3D geometric components. (2) Test-time scaling, such as higher thinking budgets and multi-turn refinement, improves performance overall. Our findings highlight a critical need for high-quality procedural coding data to advance commercial VLMs. Furthermore, effective procedural 3D modeling requires a robust execution environment that provides high-fidelity feedback for iterative refinement. We release 3DCodeBench, including the curated large-scale dataset of multimodal (text/image) prompts, procedural code, 3D object triplets, evaluation protocol, and the public 3DCodeArena platform as a foundational toolkit for exploring VLM-based procedural 3D modelers.

2606.01051 2026-06-02 cs.LG 版本更新

Interaction-Limited Safe Continuous-Time RL for Dynamical Medical Treatment

交互受限的动态医疗安全连续时间强化学习

Xun Shen, Yuepeng Wang, Akifumi Wachi, Yongqi Zhou, Richard Weiss, Yoshihiko Fujisawa, Ken Kawano, Mehrshad Sadria, Ying Chen, Xin Liu, Sebastien Gros, Xiao Hu, Kyoung-Sook Kim, Mengmou Li, Katsuki Fujisawa, Kenji Wakabayashi

发表机构 * Tokyo University of Agriculture and Technology（东京农工大学）； LY Corporation（LY公司）； National University of Singapore（新加坡国立大学）； Institute of Science Tokyo（东京科学大学）； Altos Labs, Inc.（Altos实验室）； National Institute of Advanced Industrial Science and Technology (AIST)（日本产业技术综合研究所）； Norwegian University of Science and Technology（挪威科技大学）； Emory University（埃默里大学）； Hiroshima University（广岛大学）

AI总结提出交互受限的安全连续时间强化学习框架，通过选项式半马尔可夫决策过程联合优化治疗策略与临床交互时机，并引入安全收紧机制保证轨迹级安全。

详情

AI中文摘要

动态医疗需要决定治疗强度和干预时机，而患者状态连续演化，不良事件可能在临床交互之间发生。现有大多数治疗学习方法假设固定时间表或仅在离散决策点强制执行安全性。我们提出了交互受限的安全连续时间强化学习，这是一个在轨迹级安全约束下联合优化治疗管理和临床交互时机的框架。我们的关键思想是将连续时间治疗问题重新表述为基于选项的半马尔可夫决策过程，其中每个选项指定一个连续时间治疗策略及其持续时间。我们开发了一种安全收紧机制，表明在交互时间适当构造的约束能够以高概率保证整个连续时间轨迹的安全性。我们进一步建立了从记录的治疗轨迹中进行策略学习的有限样本保证，并引入了一个实用的数据驱动保守替代。实验表明，所提出的自适应交互时机机制在不同安全策略优化方法上均能提高安全性和治疗效果，优于等距交互方案。

英文摘要

Dynamic medical treatment requires deciding treatment intensity and intervention timing, while patient states evolve continuously and adverse events may occur between clinical interactions. Most existing treatment learning methods assume fixed schedules or enforce safety only at discrete decision points. We propose Interaction-Limited Safe Continuous-Time Reinforcement Learning, a framework that jointly optimizes treatment administration and clinical interaction timing under trajectory-level safety constraints. Our key idea is to reformulate the continuous time treatment problem as an option-based semi-Markov decision process, where each option specifies a continuous-time treatment policy and its duration. We develop a safety-tightening mechanism showing that suitably constructed constraints at interaction times guarantee safety over the full continuous-time trajectory with high probability. We further establish finite-sample guarantees for policy learning from logged treatment trajectories and introduce a practical data-driven conservative surrogate. Experiments show that the proposed adaptive interaction-timing mechanism improves both safety and treatment effectiveness over equidistant interaction schemes across different safe policy optimization methods.

2606.01042 2026-06-02 cs.LG cs.AI 版本更新

Plausibility Is Not Prediction: Contrastive Evidence for LLM-Based Cellular Perturbation Reasoning

似真性不是预测：基于LLM的细胞扰动推理的对比证据

Xinyu Yuan, Xixian Liu, Jianan Zhao, Yashi Zhang, Hongyu Guo, Jian Tang

发表机构 * Mila - Québec AI Institute（魁北克人工智能研究所）； University of Montréal（蒙特利尔大学）； HEC Montréal（蒙特利尔HEC商学院）； University of Ottawa（渥太华大学）； National Research Council of Canada（加拿大国家研究理事会）； CIFAR AI Chair（CIFAR人工智能讲席）

AI总结本文发现基于大语言模型的细胞扰动推理虽能生成生物上合理的解释，但实际预测性能差，并提出CORE方法通过对比证据组织来提升扰动特异性预测。

详情

AI中文摘要

扰动实验对于理解细胞机制至关重要，但成本高昂且稀疏，因此需要预测未观察条件下的基因表达响应。最近一个有前景的方向是利用大语言模型（LLM）作为“虚拟细胞”模拟器，即通过逐步的、基于知识的机制性推理来推断差异表达，这指向一种可解释的、知识驱动的范式，超越了纯粹的数据驱动方法。然而，我们发现似真性不是预测：尽管产生了生物上合理的解释，这些方法未能捕捉扰动特异性效应，表现为系统性地高估差异表达，在聚合评估中通常表现不如简单的基因频率基线，并且在每个基因水平上降至随机水平。这揭示了对内在基因响应倾向的依赖，而非真正的扰动推理。我们将这一失败追溯到证据呈现方式：现有方法孤立地评估扰动-基因对，而不揭示相关扰动对同一基因的影响差异。为解决这一局限性，我们引入了CORE（对比关系证据组织），通过将证据组织成来自相关扰动的正面和负面结果，将预测重新定义为比较任务。使用生物医学知识图谱进行证据检索，CORE在基于LLM和非LLM的设置中均改善了校准并大幅提升了扰动特异性预测：例如，在药物扰动数据上，CORE-Reasoning将Qwen3.5-9B的聚合指标提升了高达28.6%；而在通用扰动数据上，CORE-Voting将四个细胞系上的宏平均逐基因AUROC从随机水平提高到平均0.703。这突显了对比证据组织对于可靠的基于LLM的扰动推理至关重要。

英文摘要

Perturbation experiments are central to understanding cellular mechanisms, but remain costly and sparse, motivating prediction of gene expression responses for unobserved conditions. A promising recent direction leverages large language models (LLMs) as "virtual cell" simulators, using stepwise, knowledge-grounded mechanistic reasoning to infer differential expression, and points toward an interpretable, knowledge-driven paradigm that transcends purely data-driven approaches. However, we find that plausibility is not prediction: despite producing biologically plausible explanations, these methods fail to capture perturbation-specific effects, systematically overestimating differential expression, often underperforming a simple gene-frequency baseline in aggregate evaluations, and collapsing to chance-level performance at the per-gene level. This reveals a reliance on intrinsic gene response tendencies rather than true perturbation reasoning. We trace this failure to how evidence is presented: existing methods evaluate perturbation-gene pairs in isolation, without exposing how related perturbations differ in their effects on the same gene. To address this limitation, we introduce CORE (Contrastive Organization of Relational Evidence), which reframes prediction as a comparison task by organizing evidence into positive and negative outcomes from related perturbations. Using a biomedical knowledge graph for evidence retrieval, CORE improves calibration and substantially boosts perturbation-specific prediction in both LLM-based and non-LLM settings: for example, on drug-perturbation data, CORE-Reasoning improves Qwen3.5-9B aggregate metrics by up to 28.6%, while on generic perturbation data, CORE-Voting raises macro-per-gene AUROC from chance to 0.703 on average across four cell lines. This highlights contrastive evidence organization as essential to reliable LLM-based perturbation reasoning.

2606.01039 2026-06-02 cs.LG cs.AI 版本更新

OPD+: Rethinking the Advantage Design for On-Policy Distillation

OPD+: 重新思考在线策略蒸馏中的优势设计

Hanyang Zhao, Haoxian Chen, Han Lin, Genta Indra Winata, David Yao, Wenpin Tang

发表机构 * Columbia University（哥伦比亚大学）； Amazon（亚马逊）； Meta ； Capital One

AI总结本文提出OPD+，通过修正在线策略蒸馏中因停止梯度操作导致的奖励目标偏差，并支持多种f-散度，在数学推理和工具使用基准上提升了性能。

详情

AI中文摘要

在线策略蒸馏（OPD）是一种广泛使用的技术，用于将能力强的教师语言模型的能力迁移到基础学生模型，并且可以通过使用学生生成的轨迹来制定强化学习风格的目标。然而，尽管散度奖励依赖于学生模型的可能性，现有工作通常采用停止梯度设计主要是为了稳定性，这使得得到的优势估计存在问题。在这项工作中，我们提供了一个基于学生和教师之间f-散度的通用优化框架，并从数学上重新审视这种设计空间是否有效。我们证明，对于一般的散度函数，一般的停止梯度操作会导致奖励目标和相应梯度的有偏估计。我们提出了OPD+，这是OPD的修正版本，在基线KL方法上展示了改进的性能，并且也支持各种f-散度的选择。我们在数学推理和工具使用基准上验证了我们的发现。

英文摘要

On-policy distillation (OPD) is a widely used technique for transferring capabilities from capable teacher language models to base student models, and it can be formulated as a reinforcement-learning-style objective over student-generated rollouts. Yet, although the divergence reward depends on the student model's likelihood, existing works usually adopt a stop-gradient design, primarily for stability, which makes the resulting advantage estimation questionable. In this work, we provide a generic optimization framework based on the f-divergence between the student and teacher, and mathematically revisit whether this design choice is valid. We prove that, for general divergence functions, the stop-gradient operation leads to biased estimates of the reward objective and of the corresponding gradient. We propose OPD+, a corrected version of OPD that demonstrates improved performance over the baseline KL approach and also supports a range of f-divergences. We validate our findings on mathematical reasoning and tool-use benchmarks.

2606.01031 2026-06-02 cs.GR cs.AI cs.CV cs.LG cs.MM 版本更新

Temporally-Aligned Evaluation for Audio-Driven Talking Head Generation

音频驱动说话头生成的时序对齐评估

Zhicheng Zhang, Lei Wang, Yu Zhang, Yongsheng Gao

发表机构 * School of Business, University of New South Wales (UNSW)（新南威尔士大学商学院）； School of Engineering and Built Environment, Griffith University（格里菲斯大学工程与建筑环境学院）； Data61/CSIRO（Data61/澳大利亚联邦科学与工业研究组织）

AI总结针对现有帧级评估指标对时序偏差敏感的问题，提出基于软动态时间规整的序列级对齐评估框架，提升评估鲁棒性并揭示不同建模范式间的系统权衡。

Comments Research report

详情

AI中文摘要

音频驱动的说话头生成技术发展迅速，但现有评估协议主要依赖帧级指标，假设生成视频与参考视频之间存在严格的时间对应关系。这一假设与语音驱动的面部运动不符，后者自然包含轻微的时间偏移、不同的说话速度和风格变化。因此，传统指标可能将无害的时间差异视为质量错误，使得公平比较方法并理解其权衡变得更加困难。在这项工作中，我们认为动态生成模型的评估应被表述为序列对齐问题，而非独立的帧比较。我们引入了一种统一的序列级重新表述，将软动态时间规整集成到已有的评估流程中。通过在对齐特征轨迹的同时保持时间顺序，所提出的框架对有限的时间错位具有鲁棒性，且不改变底层的感知、身份或同步编码器。我们表明，在刚性对齐下，帧级评估可被视为一个特例，而序列级对齐提供了更好的稳定性、对时间差异的更低敏感性以及建模范式之间更清晰的区分。基于这一原则性表述，我们在标准化协议下，对涵盖规范、野外和风格多样场景的七个数据集上的20种方法进行了大规模基准测试。大量实验表明，时序对齐的指标对时间差异更鲁棒，跨数据集提供更一致的结果，并能更好地揭示建模范式之间的系统权衡，例如同步性与真实性、表现力与稳定性之间的权衡。

英文摘要

Audio-driven talking-head generation has advanced rapidly, yet existing evaluation protocols mainly rely on frame-wise metrics that assume strict temporal correspondence between generated and reference videos. This assumption does not match speech-driven facial motion, which naturally includes slight timing shifts, different speaking speeds, and stylistic variations. As a result, conventional metrics may treat harmless timing differences as quality errors, making it harder to fairly compare methods and understand their trade-offs. In this work, we argue that evaluation of dynamic generative models should be formulated as a sequence-alignment problem rather than independent frame comparison. We introduce a unified sequence-level reformulation that integrates Soft Dynamic Time Warping into established evaluation pipelines. By aligning feature trajectories while preserving temporal order, the proposed framework provides robustness to bounded temporal misalignments without altering the underlying perceptual, identity, or synchronization encoders. We show that frame-wise evaluation can be viewed as a special case under rigid alignment, while sequence-level alignment provides improved stability, lower sensitivity to timing differences, and clearer separation between modeling paradigms. Building on this principled formulation, we conduct a large-scale benchmark of 20 methods across seven datasets spanning canonical, in-the-wild, and style-diverse scenarios under standardized protocols. Extensive experiments show that temporally aligned metrics are more robust to timing differences, provide more consistent results across datasets, and better reveal systematic trade-offs between modeling paradigms, such as synchronization versus realism and expressiveness versus stability.
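
The contrast between frame-wise and temporally aligned scoring can be seen on a toy one-dimensional trajectory. The sketch below implements the standard Soft-DTW recursion with a squared-difference cost, assuming scalar features and a smoothing parameter `gamma`; the paper's actual feature encoders, cost function, and settings are not given in the abstract, so everything beyond the recursion itself is illustrative.

```python
import math

def softmin(values, gamma):
    """Smooth minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    m = min(values)
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))

def soft_dtw(a, b, gamma=0.1):
    """Soft-DTW discrepancy between two scalar sequences."""
    n, m = len(a), len(b)
    R = [[math.inf] * (m + 1) for _ in range(n + 1)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Monotone alignment: extend a path from the top, left, or diagonal cell.
            R[i][j] = cost + softmin((R[i - 1][j], R[i][j - 1], R[i - 1][j - 1]), gamma)
    return R[n][m]

def framewise(a, b):
    """Sum of squared differences under rigid frame-to-frame correspondence."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

reference = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
delayed = [0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0]  # same motion, one frame late

rigid_score = framewise(reference, delayed)
aligned_score = soft_dtw(reference, delayed)
```

Here the rigid score penalizes the one-frame delay heavily, while the aligned score stays near zero because a monotone warping matches the two motions. Note that Soft-DTW values can be slightly negative, since the smooth minimum lies below the true minimum.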

2606.01028 2026-06-02 cs.LG 版本更新

MedGym: A Unified Continuous-Time Benchmark for Dynamic Medical Treatment Reinforcement Learning

MedGym：面向动态医疗治疗强化学习的统一连续时间基准

Yuepeng Wang, Ken Kawano, Yongqi Zhou, Yoshihiko Fujisawa, Richard Weiss, Akifumi Wachi, Katsuki Fujisawa, Ying Chen, Mehrshad Sadria, Xin Liu, Kyoung-Sook Kim, Xiao Hu, Sebastien Gros, Xun Shen

发表机构 * Tokyo University of Agriculture and Technology（东京农工大学）； Institute of Science Tokyo（东京科学大学）； National University of Singapore（新加坡国立大学）； LY Corporation（LY公司）； Altos Labs, Inc.（Altos实验室）； National Institute of Advanced Industrial Science and Technology (AIST)（日本产业技术综合研究所）； Emory University（埃默里大学）； Norwegian University of Science and Technology（挪威科技大学）

AI总结提出MedGym基准，通过连续时间框架和物理信息神经网络构建可配置的医疗RL环境，支持离散与连续时间方法在非规则治疗间隔下的比较，并评估个性化、轨迹安全等临床指标。

详情

AI中文摘要

医疗治疗推荐给强化学习（RL）带来了若干挑战：患者生理状态在连续时间内演变，测量和干预以不规则间隔进行，且治疗效果在不同个体间差异显著。然而，现有的RL公式和模拟环境基于离散时间的MDP或POMDP抽象，具有固定或预先指定的决策间隔。因此，评估RL方法能否处理时间间隔依赖的疾病进展、个性化治疗反应以及连续测量点之间的安全性仍然困难。为弥补这一空白，我们引入了MedGym，一个用于动态治疗推荐的基准环境。MedGym在连续时间框架中对纵向患者演变进行建模，并通过使用物理信息神经网络从临床数据构建可配置的医疗RL基准。所得基准支持离线RL和在线RL，并能够在非规则治疗时机和患者特定动态下直接比较离散时间与连续时间方法。此外，MedGym支持从临床重要角度进行评估，包括个性化、轨迹级安全性以及基于模型的离线学习与在线部署之间的性能差距。通过为连续时间动态治疗提供标准化且可配置的基准，MedGym旨在促进对医疗RL方法进行更真实、更具信息量的评估。

英文摘要

Medical treatment recommendation poses several challenges to reinforcement learning (RL): patient physiology evolves in continuous time, measurements and interventions are performed at irregular intervals, and treatment effects vary substantially across individuals. Existing RL formulations and simulated environments, however, are based on discrete-time MDP or POMDP abstractions with fixed or pre-specified decision intervals. Thus, it remains difficult to evaluate whether RL methods can handle time-interval-dependent disease progression, personalized treatment response, and safety between consecutive measurement points. To address this gap, we introduce MedGym, a benchmark environment for dynamic treatment recommendation. MedGym models longitudinal patient evolution in a continuous-time framework and constructs a configurable medical RL benchmark from clinical data by using Physics-Informed Neural Networks. The resulting benchmark supports both offline and online RL, and enables direct comparison between discrete-time and continuous-time methods under irregular treatment timing and patient-specific dynamics. In addition, MedGym supports evaluation from clinically important perspectives, including personalization, trajectory-level safety, and the performance gap between model-based offline learning and online deployment. By providing a standardized and configurable benchmark for continuous-time dynamic treatment, MedGym aims to facilitate more realistic and informative evaluation of medical RL methods.

2606.01020 2026-06-02 cs.AI cs.LG 版本更新

Tackling the Root of Misinformation by Teaching Laypeople about Logical Fallacies via Socratic Questioning and Critical Argumentation

通过苏格拉底式提问和批判性论证教授外行人逻辑谬误，以应对错误信息的根源

Minjing Shi, Junling Wang, Jingwei Ni, Sankalan Pal Chowdhury, Mrinmaya Sachan

发表机构 * ETH Zurich（苏黎世联邦理工学院）； ETH AI Center（苏黎世联邦理工学院人工智能中心）

AI总结提出LFTutor智能辅导系统，利用大语言模型结合苏格拉底式提问和批判性论证原则，帮助外行人学习识别逻辑谬误，显著优于基线模型。

Comments This paper has been accepted to Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Long Paper), Main Conference

详情

Journal ref: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics, 2026

AI中文摘要

识别日常话语中的逻辑谬误对许多人来说具有挑战性。这一挑战在大语言模型（LLMs）时代被放大，恶意行为者可以利用谬误论证大规模传播错误信息。在这项工作中，我们探索了LLMs作为解决方案一部分的潜力。我们介绍了LFTutor，一个智能辅导系统，它使用LLMs辅导外行人，帮助他们学习逻辑谬误。LFTutor整合了意图驱动的苏格拉底式提问和批判性论证原则，以积极引导学习者反思自己的推理。通过自动评估和人工评估，我们证明LFTutor显著优于缺乏这些教学策略的基线LLMs。这项工作突显了将LLMs与教学支架相结合以在人工智能时代培养批判性思维和论证素养的前景。

英文摘要

Identifying logical fallacies in everyday discourse is challenging for many people. This challenge is amplified in the era of Large Language Models (LLMs), where malicious agents can deploy fallacious arguments to disseminate misinformation at scale. In this work, we explore the potential of LLMs as part of the solution. We introduce LFTutor, an intelligent tutoring system which uses LLMs to tutor laypeople and help them learn about logical fallacies. LFTutor integrates intent-driven Socratic questioning and critical argumentation principles to actively engage learners to reflect on their reasoning. Through both automatic and human evaluations, we demonstrate that LFTutor significantly outperforms baseline LLMs lacking these pedagogical strategies. This work highlights the promise of combining LLMs with pedagogical scaffolding to foster critical thinking and argument literacy in the age of AI.

2606.01007 2026-06-02 cs.LG cs.AI 版本更新

Beyond Task-Agnostic: Task-Aware Grouping for Communication-Efficient Multi-Task MoE Inference

超越任务无关：面向通信高效的多任务MoE推理的任务感知分组

Zhiyao Xu, Aoxue Liu, Zhanjie Ding, Dan Zhao, Yong Jiang, Qing Li

发表机构 * Tsinghua Shenzhen International Graduate School（清华大学深圳国际研究生院）； Pengcheng Laboratory（鹏城实验室）

AI总结提出任务感知共激活分组（TACG）框架，通过任务特定的共激活模式优化专家放置，并引入通用专家共享复制（GESR）应对在线负载倾斜，在三个MoE模型上平均降低通信成本31.39%，保持公平性指数0.9975。

详情

AI中文摘要

稀疏激活的混合专家（MoE）模型通过条件计算扩展容量，但分布式推理面临跨GPU专家通信和路由引起的负载不平衡问题。现有的放置方法通过共同定位频繁共激活的专家来降低这一成本；然而，它们从全局聚合的路由轨迹中推导出单一部署方案，从而平均掉了多任务服务中实际驱动通信的异构、任务特定的共激活模式。我们观察到专家共激活强烈依赖于任务：在一个任务族中紧密耦合的专家对在另一个任务族中往往不相关，因此有效的部署应根据任务感知的共激活而非任务无关的平均值来分组专家。基于这一见解，我们提出了任务感知共激活分组（TACG），这是一个部署时框架，利用族特定的调度和共激活轨迹推导每个专家的任务族偏好，重新加权共激活图使得族内局部性主导分组，并在精确容量约束下将每个专家分配到主GPU。为了使静态放置对在线工作负载倾斜保持鲁棒，我们进一步引入了通用专家共享复制（GESR），这是一个轻量级辅助方法，识别具有持续中心共激活特征的通用专家，将它们复制到少量辅助GPU上，并在服务时应用局部性和负载感知的选择。在三个代表性的开源MoE模型上的实验表明，我们的框架相比基线平均降低了31.39%的通信成本，同时保持了平均Jain公平指数0.9975。即使在推理数据出现严重分布偏移的情况下，这一优势依然存在，持续优于强基线。

英文摘要

Sparsely activated Mixture-of-Experts (MoE) models scale capacity via conditional computation, but distributed inference suffers from cross-GPU expert communication and routing-induced load imbalance. Existing placement methods reduce this cost by co-locating frequently co-activated experts; however, they derive a single deployment plan from globally aggregated routing traces, thereby averaging away the heterogeneous, task-specific co-activation patterns that actually drive communication in multi-task serving. We observe that expert co-activation is strongly task-conditioned: pairs tightly coupled in one task family are often uncorrelated in another, so effective deployment should group experts by task-aware co-activation rather than by a task-agnostic average. Based on this insight, we propose Task-Aware Coactivation Grouping (TACG), a deployment-time framework that uses family-specific dispatch and co-activation traces to derive per-expert task-family preferences, reweights the co-activation graph so that intra-family locality dominates grouping, and assigns each expert to a primary GPU under exact capacity constraints. To keep the static placement robust under online workload skew, we further introduce Generic Expert Shared Replication (GESR), a lightweight companion that identifies generic experts with consistently central co-activation profiles, replicates them across a small set of secondary GPUs, and applies locality- and load-aware selection at serving time. Experiments on three representative open-source MoE models demonstrate that our framework reduces the average communication cost by 31.39% over the baseline, while preserving an average Jain fairness index of 0.9975. This advantage persists even under severe distribution shifts in the inference data, consistently outperforming strong baselines.
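
For readers unfamiliar with the fairness metric quoted above, the Jain index of loads $x_1, \dots, x_n$ is $(\sum_i x_i)^2 / (n \sum_i x_i^2)$: it equals 1 for perfectly even loads and $1/n$ when a single device carries everything. The sketch below computes it for per-GPU loads; applying it to per-GPU token counts is an assumption, since the abstract does not say which quantity is measured, and `jain_index` is a hypothetical helper name.

```python
def jain_index(loads):
    """Jain fairness index of a list of non-negative loads."""
    total = sum(loads)
    squares = sum(x * x for x in loads)
    if squares == 0:
        raise ValueError("at least one load must be positive")
    return (total * total) / (len(loads) * squares)

balanced = jain_index([1, 1, 1, 1])    # perfectly even load across 4 GPUs
skewed = jain_index([4, 0, 0, 0])      # one GPU carries all the load
mild = jain_index([100, 98, 103, 99])  # small imbalance
```

A value such as 0.9975 therefore indicates loads that are very close to uniform across devices.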

2606.01002 2026-06-02 stat.ME cs.LG math.ST stat.TH 版本更新

Theoretical Analysis of Engression and Reverse Markov Engression

Engression与反向马尔可夫Engression的理论分析

Jiaqi Huang, Gongjun Xu, Ji Zhu

发表机构 * Department of Statistics, University of Michigan（密歇根大学统计系）

AI总结本文针对Engression及其反向马尔可夫扩展，在深度神经网络参数化下建立了非渐近收敛界，并通过能量距离链式法则分析了误差传播，得到了接近最优的超额风险界。

详情

AI中文摘要

Engression是最近提出的用于条件分布学习的有效框架。其多步反向马尔可夫扩展通过将复杂条件采样分解为顺序反向转移，进一步提高了生成灵活性。尽管这些方法具有强大的实证性能，但其严格的有限样本统计保证仍然缺乏。在本文中，在深度神经网络参数化下，我们通过直接控制学习到的条件分布与目标条件分布之间的能量距离，建立了Engression的非渐近收敛界。对于反向马尔可夫框架，我们进一步开发了基于能量距离的链式法则，从而能够严格分析反向步骤间的误差传播。我们的分析得到了相应的超额风险界，相对于一般Hölder类上的经典极小极大最优速率，该界在对数因子意义下是接近最优的。

线性上下文赌博机中参数稀有更新的实用最优算法

Sanghoon Yu, Min-hwan Oh

发表机构 * Sanghoon Yu； Min-hwan Oh

AI总结针对参数更新次数受限的线性上下文赌博机问题，提出两种仅需O(log log T)次参数更新的算法，在静态调度下达到极小极大最优遗憾，并显著降低计算复杂度。

Comments Accepted at ICML 2026

详情

AI中文摘要

我们研究在参数稀有更新下的线性上下文赌博机：学习器只能在少量更新时刻将奖励反馈纳入其参数估计，同时仍在线观察上下文并顺序选择动作。这一观点澄清了文献中常被模糊的实际区别：许多“严格批处理”方法额外限制了区间内上下文的自适应性，即区间内的动作规则不能依赖于该区间内已实现的上下文/动作序列（除了当前轮次的上下文）。对于线性上下文赌博机，我们提出了两种仅需$O(\log\log T)$次参数更新的实用算法。我们的第一个算法BLCE-G在静态调度下，同时在小$K$和大$K$机制下达到极小极大最优遗憾（至多相差关于$T$的多对数因子）。第二个算法BLCE去除了近G-最优设计步骤（这是先前严格批处理静态网格方法中主要的计算瓶颈），同时保持极小极大最优遗憾，并在最优算法中实现了已知最低的运行时间复杂度。我们进一步将这些稀有更新和计算原则扩展到广义线性上下文赌博机。总体而言，我们的结果在$O(\log\log T)$次参数更新下产生了统计最优且计算高效的算法。

英文摘要

We study linear contextual bandits under rare parameter updates: the learner may incorporate reward feedback into its parameter estimate only at a small number of update times, while still observing contexts online and selecting actions sequentially. This viewpoint clarifies a practical distinction that is often blurred in the literature: many "strictly batched" methods additionally restrict within-interval context adaptivity, meaning that the action rule inside an interval cannot depend on the sequence of realized contexts/actions in that interval (beyond the current round's context). For linear contextual bandits, we propose two practical algorithms with only $O(\log\log T)$ parameter updates. Our first algorithm BLCE-G attains minimax-optimal regret (up to polylogarithmic factors in $T$) simultaneously in both the small-$K$ and large-$K$ regimes under a static schedule. Our second algorithm BLCE removes the near G-optimal design step -- a dominant computational bottleneck in prior strictly batched static-grid methods -- yet preserves minimax-optimal regret and achieves the lowest known runtime complexity among optimal algorithms. We further extend these rare-update and computational principles to generalized linear contextual bandits. Overall, our results yield statistically optimal algorithms under $O(\log\log T)$ parameter updates that are also computationally efficient in practice.
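
To give a feel for how few updates $O(\log\log T)$ is, the sketch below builds a doubly exponential static grid $t_k = \lceil T^{1 - 2^{-k}} \rceil$ of the kind commonly used in the batched-bandit literature. The abstract does not state which schedule BLCE or BLCE-G uses, so this grid, the choice of $\lceil \log_2 \log_2 T \rceil$ update times, and the name `static_update_grid` are all assumptions for illustration.

```python
import math

def static_update_grid(T):
    """Return increasing update times ending at T, with about log2(log2(T)) entries."""
    if T < 2:
        raise ValueError("T must be at least 2")
    M = max(1, math.ceil(math.log2(math.log2(T))))
    grid = []
    for k in range(1, M):
        t = math.ceil(T ** (1.0 - 2.0 ** (-k)))  # t_k = T^(1 - 2^-k), rounded up
        if t < T and (not grid or t > grid[-1]):
            grid.append(t)
    grid.append(T)  # the final update happens at the horizon
    return grid

grid = static_update_grid(10**6)
```

With a horizon of one million rounds this yields only five update times, and between them the learner would keep acting on a frozen parameter estimate while still observing each round's context.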

2606.00979 2026-06-02 cs.LG 版本更新

UME: A Unified Meta-Generalization Framework for Cross-Domain ETA

UME：跨域ETA的统一元泛化框架

Duo Wang, Qiong Wu, Jianguo Wu, Ruiyu Xu, Jinhui Yi, Zhonggen Sun, Zhentao Zhang, Yu Zhang, Ke Xing, Yongjun Yin, Zishuo Li, Jianwen Huang

发表机构 * Peking University（北京大学）； Meituan（美团）

AI总结针对即时物流中跨域ETA预测的零样本泛化、特征缺失和知识迁移问题，提出基于超网络元学习的统一双分支架构UME，通过元模块动态调制特征门控、专家注意力和最终预测，并在美团Keeta平台部署验证。

详情

AI中文摘要

在即时物流中，结账页面的准确预计到达时间（ETA）预测对于提高用户满意度、优化调度和控制运营成本至关重要。在国际按需配送平台上，ETA数据来自具有不同模式的不同国家或地区，多域建模非常重要且已被广泛采用。然而，现有方法在实际部署中仍面临三个关键挑战。首先，当前的多域模型难以泛化到完全未见过的域，无法在初始冷启动阶段实现零样本预测。其次，跨域特征空间通常被假设为一致的，而新域由于缺乏历史数据，常常遭受离线（统计）特征的结构性缺失。第三，这种特征缺失通常迫使工业系统分别对成熟域和冷启动域进行建模，阻碍了知识迁移并增加了维护开销。为了解决这些挑战，我们提出了UME，一个统一的元泛化框架用于ETA。具体来说，UME将统一的双分支架构与一种新颖的元学习机制相结合，该机制采用基于超网络的元学习器。通过利用域级知识和实例级上下文，元学习器赋能三个元模块动态调制特征门控、专家注意力和最终预测，捕获跨域相关性并促进域内适应。进一步引入知识蒸馏策略以提升性能。UME现已部署在美团Keeta配送平台（中国最大的国际食品配送平台）上。大量的离线实验和在线A/B测试表明，UME显著优于现有基线。

英文摘要

Accurate Estimated Time of Arrival (ETA) prediction on the checkout page is crucial in instant logistics for enhancing user satisfaction, optimizing dispatching, and controlling operational costs. In international on-demand delivery platforms, where ETA data originates from diverse countries or regions with different patterns, multi-domain modeling is of great importance and has been widely adopted. However, existing methods still face three critical challenges in real-world deployment. First, current multi-domain models struggle to generalize to completely unseen domains, failing to achieve zero-shot prediction during the initial cold-start phase. Second, cross-domain feature spaces are often assumed to be consistent, whereas new domains commonly suffer from structural missingness of offline (statistical) features due to the lack of historical data. Third, such feature missingness often compels industrial systems to model mature and cold-start domains separately, hindering knowledge transfer and increasing maintenance overhead. To address these challenges, we propose UME, a Unified Meta-generalization framework for ETA. Specifically, UME integrates a unified dual-branch architecture with a novel meta-learning mechanism that employs a hypernetwork-based meta learner. By leveraging domain-level knowledge and instance-level context, the meta learner empowers three meta modules to dynamically modulate feature gating, expert attention, and final prediction, capturing cross-domain correlations and facilitating intra-domain adaptation. A knowledge distillation strategy is further introduced to enhance performance. UME has now been deployed on the Meituan-keeta delivery platform (the largest international food delivery platform in China). Extensive offline experiments and online A/B tests demonstrate that UME significantly outperforms existing baselines.

2606.00970 2026-06-02 cs.AI cs.LG econ.TH 版本更新

Prospect-Theory Behavior from Bellman Optimality in MDPs with Catastrophic States

具有灾难性状态的MDP中贝尔曼最优性产生的前景理论行为

Yujiao Chen

发表机构 * Massachusetts Institute of Technology（麻省理工学院）

AI总结研究具有吸收灾难状态的马尔可夫决策过程中的风险中性控制，发现标准贝尔曼最优性产生前景理论特征：S形值函数、内生损失敏感系数和反射效应策略反转，并推导出渐近损失厌恶平台的闭式表达式。

详情

AI中文摘要

我们研究具有吸收灾难状态的马尔可夫决策过程中的风险中性控制。尽管奖励是线性的且智能体没有效用曲率、概率加权或框架依赖，标准贝尔曼最优性产生了三个前景理论特征：S形值函数轮廓（灾难附近凸，远处凹）、内生损失敏感系数$\lambda^*(S) > 1$以及反射效应策略反转。在495个配置中，最优策略在正漂移（增长）模式下在灾难附近选择安全动作，尽管风险动作的即时期望值更高；在负漂移（衰退）模式下在灾难附近选择风险动作，尽管安全动作的即时期望损失更低。我们推导出渐近损失厌恶平台$\bar{\lambda}$的闭式表达式，该表达式仅依赖于获胜概率$p$、收益不对称性$r = |\Delta_\ell/\Delta_w|$和折扣因子$\beta$，与数值解的拟合$R^2 = 0.999$。该机制不需要不对称收益。在三个不对称水平下对$(p,\beta)$进行扫描，$\bar{\lambda}$大于1的不对称份额中位数为4.6%（$r = 1.25$时），上升到13.9%（$r = 2$时），且在每个测试单元中边界贡献超过不对称贡献。这些现象在表格Q学习（无模型智能体在增长模式下与$V^*$的相关性为0.98，衰退模式下为1.00）以及随机转移（高斯、重尾Student-$t_3$和不对称偏正态噪声，幅度高达步长的50%）中持续存在，其中渐近平台在安全通道噪声下跟踪闭式预测的误差在0.41%以内，在风险通道或双通道噪声下误差在9.6%以内。这些结果将吸收失败状态识别为最优控制下产生前景理论行为的充分结构机制。

COLLIE：在语义连贯的潜在空间中引导技能发现

Yao Luan, Ni Mu, Hanfei Ge, Yiqin Yang, Bo Xu, Qing-Shan Jia

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结提出COLLIE框架，利用密集无监督数据构建语义连贯的潜在空间，通过无需额外训练的引导信号实现稀疏人类反馈下的有效技能发现，避免危险行为并提升下游性能。

Comments ICML 2026

详情

AI中文摘要

无监督技能发现（USD）旨在无需奖励函数的情况下学习多样化的行为，但由于均匀探索，常常导致与任务无关或危险的行为。引导式技能发现（GSD）通过融入人类意图将探索聚焦于有意义的区域来解决这一问题。然而，现有的GSD方法通常需要训练额外的引导模型，并依赖于预定义规则或专家演示，这在稀疏的在线收集的人类反馈下可能效果不佳。为了克服这一点，我们提出了COLLIE，一个利用密集无监督数据构建语义连贯技能潜在空间的GSD框架。该潜在空间结构良好，能够通过稀疏的在线反馈实现可靠的引导。此外，其语义连贯性特性使得引导信号的构建无需训练，从而消除了在技能学习之外额外训练模型的需要。理论分析证明了我们无需训练的引导信号的有效性，而在各种基于状态和基于像素的任务上的实验表明，COLLIE能够学习多样化、与人类对齐的技能，避免危险行为，并在最少的人类反馈下实现优越的下游性能。

英文摘要

Unsupervised skill discovery (USD) aims to learn diverse behaviors without reward functions, but often results in task-irrelevant or hazardous behaviors due to uniform exploration. Guided skill discovery (GSD) addresses this issue by incorporating human intent to focus exploration on meaningful regions. However, existing GSD methods typically require training additional guidance models, and rely on pre-defined rules or expert demonstration, which can be ineffective under sparse, online-collected human feedback. To overcome this, we propose COLLIE, a GSD framework that leverages dense unsupervised data to construct a semantically coherent skill latent space. This latent space is well-structured, enabling reliable guidance with sparse online feedback. Moreover, its semantic coherence property enables training-free construction of guidance signals, eliminating the need for additional model training beyond skill learning. Theoretical analysis justifies the effectiveness of our training-free guidance signal, while experiments across diverse state-based and pixel-based tasks show that COLLIE learns diverse, human-aligned skills, avoids hazardous behaviors, and achieves superior downstream performance with minimal human feedback.

2606.00949 2026-06-02 cs.LG cs.AI physics.flu-dyn 版本更新

Explainable deep reinforcement learning reveals energy-efficient control strategies for turbulent drag reduction

可解释深度强化学习揭示湍流减阻的节能控制策略

Federica Tonti, Ricardo Vinuesa

发表机构 * Department of Aerospace Engineering, University of Michigan（密歇根大学航空航天工程系）

AI总结结合多智能体深度强化学习与可解释深度学习，提出基于SHAP归因的奖励策略，实现高效湍流减阻，净节能达34.01%且输入功率仅0.43%。

详情

AI中文摘要

我们提出了一种结合多智能体深度强化学习（MARL）和可解释深度学习（XDL）的方法，用于减少壁面约束湍流中的阻力。以直接针对壁面剪切应力训练智能体的结果和反向控制（opposition control）的结果作为基线，比较了三种SHAP引导的方法。第一种方法中，奖励根据预测未来速度场的U-net的SHAP归因计算；第二种方法中，奖励根据预测摩擦系数的U-net的SHAP归因计算；第三种方法中，奖励结合了分别预测摩擦系数和壁面压力脉动的两个U-net的SHAP归因。基于摩擦系数和壁面压力脉动的组合SHAP策略实现了最佳整体性能，在仅0.43%归一化输入功率下实现了34.44%的减阻率（DR）和34.01%的净节能率（NES）。相对于反向控制，减阻和净节能分别提高了49.41%和48.52%。与直接壁面剪切应力基线相比，所提出的策略在提高性能的同时，将归一化驱动成本从5.90%降低到0.43%。结果分析表明，节能策略与压力门控驱动一致，主要在壁面压力接近零时激活，并且其时间尺度与近壁湍流结构的寿命相当。

英文摘要

We propose a method combining Multi-Agent Deep Reinforcement Learning (MARL) and eXplainable Deep Learning (XDL) to reduce drag in wall-bounded turbulent flows. Taking as a baseline the results of training agents directly targeting wall-shear stress and opposition control, three SHAP-guided approaches are compared. In the first, the reward is computed from SHAP attributions of a U-net predicting the future velocity field; in the second, from SHAP attributions of a U-net predicting the skin-friction coefficient; in the third, from a combination of SHAP attributions of two U-nets predicting the skin-friction coefficient and the wall pressure fluctuations, respectively. The combined SHAP strategy based on skin-friction coefficient and wall-pressure fluctuations achieves the best overall performance, with a drag reduction (DR) of 34.44% and a net energy saving (NES) of 34.01% at only 0.43% normalized input power. Relative to opposition control, drag reduction and net energy saving increase by 49.41% and 48.52%, respectively. Compared with the direct wall-shear-stress baseline, the proposed strategy improves performance while reducing the normalized actuation cost from 5.90% to 0.43%. Analysis of the results reveals that the energetically efficient policy is consistent with pressure-gated actuation, activating predominantly at near-zero wall pressure, and operates on a timescale comparable to the lifetime of the near-wall turbulent structures.
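
The three headline figures for the combined SHAP strategy are mutually consistent if net energy saving is taken as drag reduction minus the normalized actuation power. The abstract does not state this formula, so the bookkeeping below is an assumption used only to check the arithmetic; `net_energy_saving` is a hypothetical helper name.

```python
def net_energy_saving(drag_reduction_pct: float, input_power_pct: float) -> float:
    """Return NES = DR - normalized actuation power, all in percent.

    Assumption: this is the usual accounting for drag-reduction studies;
    the abstract does not spell out the definition.
    """
    return drag_reduction_pct - input_power_pct


# Figures quoted in the abstract for the combined SHAP strategy.
nes = net_energy_saving(34.44, 0.43)
print(f"NES = {nes:.2f}%")  # prints "NES = 34.01%"
```

The same subtraction makes clear why the direct wall-shear-stress baseline is penalized: a 5.90% actuation cost is removed from whatever drag reduction it achieves.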

2606.00946 2026-06-02 cs.DC cs.AI cs.LG 版本更新

通过潜在嵌入重建的高效合成网络生成

Feifan Jiang, Yinan Bu, Shihao Wu, Gongjun Xu, Ji Zhu

发表机构 * Department of Statistics, University of Michigan, Ann Arbor, Michigan, USA（统计学系，密歇根大学，安娜堡，密歇根州，美国）

AI总结提出SyNGLER框架，基于潜在空间网络模型，通过重建潜在嵌入生成合成网络，兼顾效率与结构保真度。

详情

AI中文摘要

网络数据在社会科学、生物学和信息系统中无处不在。生成逼真的合成网络数据具有从网络模拟到科学发现的广泛应用。然而，许多现有的黑盒网络生成方法倾向于过拟合观测数据，同时忽视特征性的网络结构，并在大规模下产生大量计算开销。这些实际挑战要求合成网络生成方法既高效又能捕捉网络的结构特性。在本文中，我们介绍了通过潜在嵌入重建的合成网络生成（SyNGLER），这是一个基于潜在空间网络模型的通用且高效的合成网络生成框架。给定一个观测网络，SyNGLER首先通过潜在空间网络模型学习低维潜在节点嵌入，然后通过在这些嵌入上构建无分布假设的生成器来重建潜在空间。对于生成，SyNGLER首先从潜在空间中的生成器采样（或重采样）节点嵌入，然后使用潜在空间网络模型生成合成网络。通过潜在空间框架，SyNGLER保留了网络中的独特特征，如稀疏性和节点度异质性，同时允许以比许多现有深度架构更低的计算成本进行高效训练。我们通过建立真实边分布与合成边分布之间距离的一致性结果来提供理论保证。实证研究进一步证明了SyNGLER的有效性，与现有方法相比，它高效地生成了更好地保留关键网络特征（如网络矩和度分布）的网络。代码可在 https://github.com/FeifanJiang/syngler 获取。

英文摘要

Network data are ubiquitous across the social sciences, biology, and information systems. Generating realistic synthetic network data has broad applications from network simulation to scientific discovery. However, many existing black-box approaches for network generation tend to overfit observed data while overlooking characteristic network structure, and incur substantial computational overhead at scale. These practical challenges call for synthetic network generation methods that are both efficient and capable of capturing structural properties of networks. In this paper, we introduce Synthetic Network Generation via Latent Embedding Reconstruction (SyNGLER), a general and efficient framework for synthetic network generation that builds on latent space network models. Given an observed network, SyNGLER first learns low-dimensional latent node embeddings via a latent space network model and then reconstructs the latent space by building a distribution-free generator over these embeddings. For generation, SyNGLER first samples (or resamples) node embeddings from the generator in the latent space and then produces synthetic networks using the latent space network model. Through the latent space framework, SyNGLER preserves unique characteristics in networks such as sparsity and node degree heterogeneity, while allowing for efficient training with lower computational cost than many existing deep architectures. We provide theoretical guarantees by developing consistency results on the distance between the true and synthetic edge distributions. Empirical studies further demonstrate the effectiveness of SyNGLER, which efficiently produces networks that better preserve key network characteristics such as network moments and degree distributions compared with existing approaches. Code is available at https://github.com/FeifanJiang/syngler.
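
The two-step generation procedure (sample embeddings, then draw edges from a latent space model) can be sketched as follows. This is not the SyNGLER implementation: resampling with replacement as the distribution-free generator, the inner-product model with a logistic link, and the sparsity offset `alpha` are all assumptions chosen for illustration.

```python
import math
import random


def generate_synthetic_network(embeddings, n_nodes, alpha=-2.0, seed=0):
    """Resample latent embeddings and draw an undirected graph.

    Assumptions (not taken from the paper): nodes are resampled with
    replacement from the fitted embeddings, and edge (i, j) appears with
    probability sigmoid(alpha + <z_i, z_j>).
    """
    rng = random.Random(seed)
    # Step 1: sample node embeddings from the empirical latent distribution.
    z = [rng.choice(embeddings) for _ in range(n_nodes)]
    # Step 2: draw each edge independently given the embeddings.
    edges = []
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            inner = sum(a * b for a, b in zip(z[i], z[j]))
            prob = 1.0 / (1.0 + math.exp(-(alpha + inner)))
            if rng.random() < prob:
                edges.append((i, j))
    return z, edges


if __name__ == "__main__":
    fitted = [(0.5, 0.1), (-0.3, 0.8), (1.0, 1.0)]  # stand-in for learned embeddings
    z, edges = generate_synthetic_network(fitted, n_nodes=20)
    print(len(edges), "edges among", len(z), "nodes")
```

A more negative `alpha` yields sparser graphs, and nodes whose embeddings have larger norm tend to receive more edges, which is one way a latent space model can express degree heterogeneity.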

2606.00930 2026-06-02 cs.CL cs.AI cs.LG 版本更新

Detection vs. Execution: Single-Bucket Probes Miss Half the Mamba-2 State Sink

检测 vs. 执行：单桶探针遗漏了 Mamba-2 状态汇的一半

Yuhang Jiang

发表机构 * Independent Researcher（独立研究者）

AI总结本文发现 Mamba-2 中的状态汇（state sink）可分解为两类功能头集，单桶探针仅能恢复执行层而遗漏检测层，表明表征相似性不等于功能等价。

Comments 16 pages, 3 figures

详情

AI中文摘要

机制可解释性通常假设识别表征特征的探针也能识别执行相应计算的电路。我们证明这一假设在 Mamba-2 中可能系统性失败。通过研究状态汇（边界 token 上不成比例的 Delta 门控激活，类似于注意力汇），我们发现单桶探针仅能恢复一个小的执行层，而遗漏了具有相同表征特征的更大的检测层。在 Mamba-2 中，状态汇分解为两个功能头集。单桶 BOS 专家头（在 2.7B 模型中约占 5% 的头）在不同模型规模和语料库上因果支持 BOS 上下文和换行符目标预测。双头（占头的 27-35%，通过同一探针的多类聚合恢复）表现出更强的 BOS-换行符表征相似性，但在消融下因果效应显著较弱。表征相似性并不意味着功能等价。这一区别对下游行为至关重要：消融 BOS 专家头使 Mamba-1 2.8B 和 Mamba-2 2.7B 在 1024 上下文长度下的 RULER NIAH 检索准确率从 1.00 降至 0.00，而大小匹配的补集保持基线性能。随机通道分桶对照排除了仅由基质粒度造成的可能，问题指向 Mamba-2 中各头共享的 Delta 投影。探针导出的专长可以识别执行电路；在粗粒度下，同一探针也能恢复检测电路，而区分它们需要类别条件消融而非类别条件余弦。

英文摘要

Mechanistic interpretability often assumes that probes identifying a representational signature also identify the circuit executing the corresponding computation. We show that this assumption can fail systematically in Mamba-2. Studying the state sink (disproportionate Delta-gate activation on boundary tokens, analogous to the attention sink), we find that single-bucket probes recover only a small execution layer while missing a much larger detection layer with the same representational signature. In Mamba-2, the state sink decomposes into two functional head sets. Single-bucket BOS-specialist heads (about 5% of heads at 2.7B) causally support both BOS-context and newline-target predictions across model scales and corpora. Dual heads (27-35% of heads, recovered by multi-class aggregation of the same probe) show stronger BOS-newline representational similarity but substantially weaker causal effects under ablation. Representational similarity does not imply functional equivalence. This distinction matters for downstream behaviour: ablating BOS-specialist heads collapses RULER NIAH retrieval accuracy from 1.00 to 0.00 at 1024 context length in both Mamba-1 2.8B and Mamba-2 2.7B, while size-matched complements preserve baseline performance. A random channel-bucketing control rules out substrate granularity alone, implicating Mamba-2's head-shared Delta projection. Probe-derived specialty can identify execution circuits; at coarse granularity the same probe also recovers detection circuits, and separating them requires class-conditional ablation rather than class-conditional cosine.

2606.00928 2026-06-02 cs.CV cs.LG 版本更新

Single-Channel Tissue Segmentation via Cross-Modal Distillation from Foundation Models

基于基础模型跨模态蒸馏的单通道组织分割

Sakib Mohammad, Jarin Ritu, Md Sakhawat Hossain

发表机构 * Department of Engineering Technology（工程技术系）； Department of Electrical and Computer Engineering（电气与计算机工程系）； Department of Mechanical Engineering（机械工程系）

AI总结提出跨模态知识蒸馏框架，将多通道输入的基础模型教师知识迁移到仅使用核通道的轻量级学生网络，实现单通道组织分割性能大幅提升。

Comments 6 pages, 3 figures

详情

AI中文摘要

多重荧光显微镜通过提供互补通道（包括核（DAPI）和膜（E-cadherin））改善组织分割，这些通道共同编码比单通道成像更丰富的空间上下文。然而，多重模型在推理时需要所有通道，限制了在仅部分通道可用时的部署。本文提出一个跨模态知识蒸馏框架，将处理多重输入的基础模型教师的语义信息迁移到仅使用核通道的轻量级学生网络。蒸馏目标结合了基于MSE的概率匹配、边界感知监督和可学习的不确定性加权。在TissueNet和BBBC038上，评估了SAM ViT-H和CellSAM作为教师，四个U-Net学生：Swin-Tiny（27M）、ResNet18（11M）、EfficientNet-B0（5.3M）和MobileNetV3（1.5M）。在TissueNet上，SAM蒸馏的Swin-Tiny学生达到Dice 78.36（±1.44），比无KD基线（65.31±1.35）提高13.05分，并以23倍参数缩减恢复了教师oracle性能（89.12±1.21）的87.9%。KD一致地使所有四个学生提高约12个Dice点，确认了架构无关的蒸馏。在所有设置中，SAM ViT-H作为教师优于CellSAM。在BBBC038上的跨数据集评估显示，无需教师重新训练即可获得一致增益。

英文摘要

Multiplexed fluorescence microscopy improves tissue segmentation by providing complementary channels, including nuclear (DAPI) and membrane (E-cadherin), that together encode richer spatial context than single-channel imaging alone. However, multiplexed models require all channels at inference, limiting deployment where only a subset is available. This work proposes a cross-modal knowledge distillation framework that transfers semantic information from a frozen foundation model teacher processing multiplexed input to a lightweight student operating on the nuclear channel only. The distillation objective combines MSE-based probability matching, boundary-aware supervision, and learnable uncertainty weighting. SAM ViT-H and CellSAM are evaluated as teachers across four U-Net students: Swin-Tiny (27M), ResNet18 (11M), EfficientNet-B0 (5.3M), and MobileNetV3 (1.5M), on TissueNet and BBBC038. On TissueNet, the SAM-distilled Swin-Tiny student achieves Dice 78.36 (± 1.44), a 13.05-point improvement over the no-KD baseline (65.31 ± 1.35) and 87.9% recovery of teacher oracle performance (89.12 ± 1.21) at a 23x parameter reduction. KD consistently improves all four students by approximately 12 Dice points, confirming architecture-agnostic distillation. SAM ViT-H outperforms CellSAM as teacher across all settings. Cross-dataset evaluation on BBBC038 shows consistent gains without teacher retraining.
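
One plausible reading of "MSE-based probability matching" combined with "learnable uncertainty weighting" is sketched below. The abstract gives no formulas, so the homoscedastic-uncertainty form $\sum_k e^{-s_k} L_k + s_k$ is an assumption borrowed from the multi-task learning literature, and both function names are hypothetical.

```python
import math


def mse_probability_loss(student_probs, teacher_probs):
    """Mean squared error between per-pixel foreground probabilities."""
    if len(student_probs) != len(teacher_probs):
        raise ValueError("probability maps must have the same length")
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / len(student_probs)


def uncertainty_weighted_loss(losses, log_vars):
    """Combine loss terms as sum_k exp(-s_k) * L_k + s_k.

    Assumption: s_k are learnable log-variances, one per loss term; the
    paper may use a different weighting scheme.
    """
    if len(losses) != len(log_vars):
        raise ValueError("need one log-variance per loss term")
    return sum(math.exp(-s) * loss + s for loss, s in zip(losses, log_vars))


if __name__ == "__main__":
    distill = mse_probability_loss([0.9, 0.2, 0.6], [1.0, 0.0, 0.5])
    boundary = 0.3  # placeholder value for a boundary-aware term
    print(uncertainty_weighted_loss([distill, boundary], [0.0, 0.0]))
```

With all log-variances at zero the combination reduces to a plain sum; during training, a larger $s_k$ down-weights a noisy term at the cost of the additive penalty $s_k$.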

2606.00926 2026-06-02 cs.LG cs.CL 版本更新

Task Structure Reverses Layerwise State Encoding in Sequence Models

任务结构逆转序列模型中的层级状态编码

Yuhang Jiang

发表机构 * Independent Researcher（独立研究者）

AI总结本文通过形式模型和预训练模型上的实验，发现序列模型（如Transformer、Mamba、LSTM等）中层级状态编码的分布模式会随任务结构（如Parity、Dyck-k、S3）而逆转，且这种分组由计算结构（前缀更新 vs. 栈）而非代数结构（交换性）决定。

Comments 20 pages, 11 figures, 8 tables

详情

AI中文摘要

序列模型的机制研究通常将层级状态编码视为架构特征：循环模型集中可读状态，注意力模型分散状态。我们发现，当任务改变时，同一架构会逆转这种分布。在Transformer、Mamba、Mamba-2、LSTM和GRU中，Parity在Mamba和循环基线中集中在后期，而Transformer逐步构建；在有界深度Dyck-k上模式翻转。同样的翻转出现在微调的Mamba-130M和Pythia-160M中，且Pythia在Dyck上的瓶颈在410M时仍然存在。文献中混淆了两种解释：代数结构（交换性）与计算结构（前缀更新 vs. 栈）。为了区分它们，我们添加了第三个任务：非交换的S3置换组合。在所有五种架构的层级探测和Mamba特有的Conv1D归因中，S3与Parity而非Dyck归为一组，因此分组追踪的是计算结构而非交换性。因果干预表明，在4层形式模型中，线性可读方向通常是功能上必要的，并且在Parity和Dyck上的分布外长度上可能仍然重要。在预训练规模上，情况出现分化。微调的Pythia在Dyck上存在强中间层瓶颈（在160M时，L6-L7消融使准确率下降约81%；在410M时，L4-L18出现更宽的瓶颈），而在最佳探测层上则弱得多。预训练的Mamba表现出互补的失败模式：其最后一层高度可读，但没有任何单个探测方向能在Parity、Dyck或S3上破坏任务，然而中间位置的激活修补恢复了约97-98%的干净-损坏logit差距。探测定位了状态线性可用的位置，并不总是计算瓶颈所在。机制特征是架构和任务共同的性质。

英文摘要

Mechanistic studies of sequence models often treat layerwise state encodings as architectural traits: recurrent models concentrate readable state, attention-based models distribute it. We find that the same architecture reverses this profile when the task changes. Across Transformers, Mamba, Mamba-2, LSTMs, and GRUs, Parity is concentrated late in Mamba and the recurrent baselines and built gradually by Transformer; on bounded-depth Dyck-k the pattern flips. The same flip appears in fine-tuned Mamba-130M and Pythia-160M, and the Pythia Dyck bottleneck persists at 410M. Two explanations are conflated in the literature: algebraic structure (commutativity) versus computational structure (prefix update vs. stack). To separate them we add a third task: non-commutative S_3 permutation composition. S_3 groups with Parity, not Dyck, on layerwise probing across all five architectures and on Mamba-specific Conv1D attribution, so the grouping tracks computational structure rather than commutativity. Causal interventions show that, in the 4-layer formal models, linearly readable directions are often functionally necessary and can remain important at out-of-distribution lengths on Parity and Dyck. At pretrained scale the picture splits. Fine-tuned Pythia Dyck has a strong middle-layer bottleneck (L6-L7 ablation drops accuracy by roughly 81% at 160M; broader L4-L18 plateau at 410M), far weaker at the best-probe layer. Pretrained Mamba shows the complementary failure mode: its final layer is highly readable, no single probe direction breaks the task on Parity, Dyck, or S_3, yet mid-position activation patching there recovers about 97-98% of the clean-corrupted logit gap. Probing localizes where state is linearly available, not always where the computation is bottlenecked. Mechanistic signatures are properties of architecture and task together.
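
The three formal tasks differ in the kind of state a model has to carry, which the reference labellers below make concrete. These are minimal sketches written for this digest, not the authors' data generators; the token encodings (signed integers for brackets, indices into `S3` for permutations) are assumptions.

```python
from itertools import permutations


def parity_states(bits):
    """Prefix-update task: running XOR of a bit string."""
    state, out = 0, []
    for b in bits:
        state ^= b
        out.append(state)
    return out


def dyck_depths(tokens, max_depth):
    """Stack task: nesting depth after each token, or None if invalid.

    Tokens are ints: +k opens bracket type k, -k closes it.
    """
    stack, out = [], []
    for t in tokens:
        if t > 0:
            if len(stack) == max_depth:
                return None  # exceeds the bounded depth
            stack.append(t)
        else:
            if not stack or stack[-1] != -t:
                return None  # unmatched or mismatched closing bracket
            stack.pop()
        out.append(len(stack))
    return out


S3 = list(permutations(range(3)))  # the six permutations of {0, 1, 2}


def s3_states(perm_indices):
    """Prefix-update task over a non-commutative group: running composition."""
    state, out = (0, 1, 2), []
    for idx in perm_indices:
        p = S3[idx]
        state = tuple(p[state[i]] for i in range(3))  # apply p after state
        out.append(state)
    return out
```

Parity and S_3 both overwrite a fixed-size state at every step, whereas Dyck needs a stack whose content depends on nesting; S_3 additionally gives different results when two inputs are swapped, which is what lets it separate commutativity from computational structure.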

2606.00920 2026-06-02 cs.LG cs.AI cs.SE 版本更新

Accuracy, Stability, and Repeated-Run Reliability of Large Language Models on Deterministic Programming Tasks

大型语言模型在确定性编程任务上的准确性、稳定性和重复运行可靠性

Yongxi Zhou, Lai Yun Choi, Jiaxi Wen, Wenbo Ye

发表机构 * Northeastern University, Massachusetts, USA（东北大学，马萨诸塞州，美国）； University of Southern California, California, USA（南加州大学，加利福尼亚州，美国）

AI总结通过重复运行评估协议，发现运行级通过率高估了无重试覆盖率高达17.8个百分点，且差距在中等性能系统中最大，表明稳定性分析是准确性报告的必要补充。

详情

AI中文摘要

运行级通过率高估了无重试覆盖率高达17.8个百分点——且差距恰恰在中等性能系统中最大。我们研究了大型语言模型（LLM）在确定性文本条件生成评估中的这种准确性-稳定性关系，以编程任务作为具体测试平台。标准代码生成基准强调单次运行准确性或在重复采样下的最终成功，但许多部署场景还需要稳定性：在相同任务描述下重复调用时的一致结果。我们提出了一种重复运行评估协议，包含运行级准确性、无重试覆盖率和每个问题的变异性指标。在一个包含100道LeetCode风格问题的基于近期的基准上，我们评估了来自五个提供者家族的16个模型，使用两种提示模板，每个问题重复运行五次，共产生16,000个评估实例。尽管运行级通过率与完美稳定率强相关（r=0.985），但通过率始终超过无重试覆盖率——这一差距达到17.8个百分点，并且即使在密切匹配的系统之间也会逆转模型排名。提示效应是模型依赖的，而非普遍有益的。这些结果表明，对于确定性文本条件生成任务，重复运行稳定性分析是传统准确性报告的必要补充。

英文摘要

Run-level pass rate overstates retry-free coverage by up to 17.8 percentage points -- and the gap is largest precisely for mid-performing systems. We investigate this accuracy--stability relationship in large language model (LLM) evaluation for deterministic text-conditioned generation, using programming tasks as a concrete testbed. Standard code-generation benchmarks emphasize single-run accuracy or eventual success under repeated sampling, but many deployment settings also require stability: consistent outcomes across repeated invocations under the same task description. We present a repeated-run evaluation protocol with metrics for run-level accuracy, retry-free coverage, and per-problem variability. On a recency-based benchmark of 100 LeetCode-style problems, we evaluate 16 models from five provider families under two prompt templates with five repeated runs per problem, yielding 16,000 evaluation instances. Although run-level pass rate and perfect stability rate are strongly correlated (r=0.985), pass rate consistently exceeds retry-free coverage -- a gap that reaches 17.8 percentage points and reverses model rankings even among closely matched systems. Prompt effects are model-dependent rather than uniformly beneficial. These results suggest that repeated-run stability analysis is a necessary complement to conventional accuracy reporting for deterministic text-conditioned generation tasks.
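
The gap between the two headline metrics is easy to reproduce on toy data. In the sketch below, run-level pass rate is the fraction of all runs that pass, and retry-free coverage is taken to be the fraction of problems solved on every run. The second definition is an assumption, since the abstract does not define the term precisely; `stability_metrics` is a hypothetical helper.

```python
from statistics import mean, pstdev


def stability_metrics(results):
    """Summarise repeated-run outcomes.

    results: dict mapping problem id -> list of booleans, one per run,
    with the same number of runs for every problem.
    Assumption: a problem counts toward retry-free coverage only if
    every one of its runs passes.
    """
    as_floats = {pid: [1.0 if ok else 0.0 for ok in runs] for pid, runs in results.items()}
    run_level_pass_rate = mean(mean(runs) for runs in as_floats.values())
    retry_free_coverage = mean(1.0 if all(runs) else 0.0 for runs in results.values())
    variability = {pid: pstdev(runs) for pid, runs in as_floats.items()}
    return run_level_pass_rate, retry_free_coverage, variability


# Evaluation budget quoted in the abstract:
# models x prompt templates x repeated runs x problems.
total_instances = 16 * 2 * 5 * 100  # 16,000

if __name__ == "__main__":
    toy = {
        "always": [True] * 5,
        "flaky": [True, True, True, False, False],
        "never": [False] * 5,
    }
    pass_rate, coverage, spread = stability_metrics(toy)
    print(f"pass rate {pass_rate:.3f}, retry-free coverage {coverage:.3f}")
```

In the toy data a single flaky problem contributes 0.6 to the pass rate but nothing to coverage, which mirrors why the reported gap is largest for mid-performing systems: they have the most problems that pass on some runs and fail on others.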

2606.00919 2026-06-02 cs.CL cs.LG 版本更新

Towards Lightweight Reliability: Using Soft Prompts for Hallucination Mitigation in Large Language Models

迈向轻量级可靠性：使用软提示缓解大型语言模型中的幻觉

S M Tahmid Siddiqui, Akib Jawad Ononto, Anoop Singhal, Latifur Khan

发表机构 * The University of Texas at Dallas（德克萨斯大学达拉斯分校）； National Institute of Standards and Technology（国家标准与技术研究院）

AI总结提出一种参数高效的软提示方法RCSP，通过对比学习、课程学习和KL正则化平衡事实回忆、幻觉抑制和弃权，在多个QA数据集上优于基线。

Comments 20 pages, 5 tables, 2 figures. Accepted for publication in DBSec 2026. The final publication will be available at Springer

详情

AI中文摘要

大型语言模型（LLMs）已在各个领域得到广泛应用，但其可靠性常因幻觉——听起来合理但事实不正确的回答——而受到损害。在高风险领域，这些错误会降低信任并引入现实风险。为解决这一挑战，我们提出一种参数高效的方法，使用软提示来缓解幻觉内容并促进生成式问答（QA）任务中的负责任弃权。我们的方法称为负责任对比软提示（RCSP），使用复合损失训练软提示，以平衡三个目标：抑制幻觉内容、鼓励在不确定性下弃权、以及保持或改善事实回忆。为实现这些目标，我们在训练机制中融入对比损失、课程学习和KL正则化。我们使用LLM-as-a-Judge框架在五个不同的生成式QA数据集上评估我们的方法。在Gemma 3（12B）和Llama 3.1（8B）骨干上的实验结果表明，RCSP有效平衡了事实回忆与幻觉抑制和弃权，在F分数上通常优于标准推理和基于指令的提示基线。值得注意的是，这些改进仅通过训练其他调优技术所需参数的一小部分实现。我们的结果表明，软提示提供了一条模块化且计算高效的路径，用于提高LLM的可靠性。

英文摘要

Large language models (LLMs) have seen widespread adoption across various domains, yet their reliability is frequently undermined by hallucinations - responses that are plausible-sounding but factually incorrect. In high-stakes domains, these errors can reduce trust and introduce real-world risk. To address this challenge, we present a parameter-efficient approach that uses soft prompts to mitigate hallucinated content and promote responsible abstention in generative question-answering (QA) tasks. Our method, called Responsible Contrastive Soft Prompting (RCSP), uses a composite loss to train soft prompts that balance three goals: suppressing hallucinatory content, encouraging abstention under uncertainty, and preserving or improving factual recall. To achieve these goals, we incorporate contrastive loss, curriculum learning, and KL regularization into our training mechanism. We evaluate our approach on five diverse generative QA datasets using an LLM-as-a-Judge framework. Experimental results on the Gemma 3 (12B) and Llama 3.1 (8B) backbones demonstrate that RCSP effectively balances factual recall with hallucination suppression and abstention, yielding a generally superior F-score over standard reasoning and instruction-based prompting baselines. Notably, these improvements are achieved by training only a fraction of the parameters required by other tuning techniques. Our results demonstrate that soft prompts provide a modular and computationally efficient path toward improving LLM reliability.

2606.00913 2026-06-02 stat.ML cs.LG 版本更新

Bandit Simulation for Average Reward Inference

平均奖励推断的赌博机模拟

Samya Praharaj, Chih-Yu Chang, Koulik Khamaru, Kelly W. Zhang

发表机构 * Rutgers University（罗格斯大学）； Imperial College London（伦敦帝国理工学院）

AI总结提出BSI框架，通过拟合环境模拟器并传播参数不确定性，为自适应赌博机算法构建渐近有效的置信区间。

详情

AI中文摘要

多臂赌博机算法越来越多地用于在线平台、临床试验和社会科学实验，但对其性能的有效统计推断仍然是一个开放挑战。部署赌博机后，一个自然的问题是能否为其平均奖励构建置信区间，并评估其是否可靠地优于基线策略。任何单次赌博机部署中获得的总奖励是随机的，由于奖励的随机性，在同一人群上部署两次赌博机通常会产生不同的奖励轨迹。标准统计推断方法无法使用，因为赌博机算法在收集的数据中引入了复杂的依赖性，违反了许多经典方法所依赖的独立同分布假设。此外，现有的针对自适应收集数据的推断方法仅适用于不依赖于数据收集算法的估计目标（例如固定动作下的平均奖励）。我们提出了用于推断的赌博机模拟（BSI），这是一个框架，它从观测数据（同策略或异策略）中拟合赌博机环境的模拟器，并用于估计任何评估策略（包括自适应黑盒算法）下的平均奖励。BSI将估计的模拟器参数的不确定性正式传播到置信区间构建中。此外，BSI的有效性仅需要对行为策略的弱探索假设，并避免了重要性加权。我们证明BSI产生渐近有效的置信区间，并通过实验证明在标准异策略评估方法失败的情况下，BSI能保持名义覆盖率。

英文摘要

Multi-armed bandit algorithms are increasingly used in online platforms, clinical trials, and social science experiments, but valid statistical inference on their performance remains an open challenge. After deploying a bandit algorithm, a natural question is whether one can construct a confidence interval for its mean reward and assess whether it reliably outperforms a baseline policy. The total reward achieved in any single bandit deployment is random, and deploying a bandit twice on the same population typically yields different reward trajectories due to stochastic rewards. Standard statistical inference methods cannot be used because bandit algorithms introduce complex dependencies in the collected data, which violate the i.i.d. assumption underlying many classical approaches. Moreover, existing inference methods for adaptively collected data only apply to estimands that do not depend on the data-collection algorithm (such as the mean reward under a fixed action). We propose Bandit Simulation for Inference (BSI), a framework that fits a simulator of the bandit environment from observed data--either on-policy or off-policy--and uses it to estimate the mean reward under any evaluation policy, including adaptive blackbox algorithms. BSI formally propagates uncertainty in the estimated simulator parameters into the confidence interval construction. Furthermore, for BSI to be valid, it requires only weak exploration assumptions on the behavior policy and avoids importance weighting. We prove that BSI yields asymptotically valid confidence intervals, and demonstrate empirically that it maintains nominal coverage in settings where standard off-policy evaluation methods fail.

2606.00910 2026-06-02 cs.CV cs.LG 版本更新

Reason, Retrieve, Re-rank: A Zero-Shot Reasoning-Aware Framework for Composed Video Retrieval

推理、检索、重排序：一种用于组合视频检索的零样本推理感知框架

Ali Alavi

发表机构 * The Ohio State University（俄亥俄州立大学）

AI总结提出R3-CoVR零样本管道，通过多模态大模型推理编辑后状态、对比编码检索和约束感知重排序，在CVPR 2026 VidLLMs挑战赛上达到91.9% R@1和98.2% R@10。

详情

AI中文摘要

组合视频检索（CoVR）旨在通过对参考视频应用自由形式的文本修改来寻找目标视频。我们应对CVPR 2026 VidLLMs研讨会上的推理感知CoVR（CoVR-R）挑战，其中检索严格为零样本。我们提出R3-CoVR（推理、检索、重排序），一个完全由冻结基础模型构建的无训练管道。多模态大语言模型（Qwen3-VL-8B）推理编辑所隐含的“后效”——状态转换、动作阶段、场景、镜头和节奏——并生成简洁的编辑后描述；对比视频-文本编码器（SigLIP-2）对该描述和图库进行嵌入以进行第一阶段检索；最后，一个约束感知重排序阶段使用相同的多模态模型作为评判者，对每个候选视频针对预期的编辑结果进行评分。在挑战测试集上，R3-CoVR达到了91.9%的R@1和98.2%的R@10。两个发现推动了这些结果：（i）将描述长度匹配到对比编码器的文本窗口使R@1从67.5提升到72.7；（ii）仅对候选列表进行重排序的约束感知重排序器将R@1从72.7提升到91.9——这是最大的单一增益。我们分析了重排序器的行为、检索/重排序混合以及候选列表深度，并发布了一个干净的三层实现。

英文摘要

Composed Video Retrieval (CoVR) seeks the target video that results from applying a free-form textual modification to a reference video. We address the Reason-Aware CoVR (CoVR-R) challenge at the CVPR 2026 VidLLMs workshop, where retrieval is strictly zero-shot. We present R3-CoVR (Reason, Retrieve, Re-rank), a training-free pipeline built entirely from frozen foundation models. A multimodal large language model (Qwen3-VL-8B) reasons about the after-effects an edit implies -- state transitions, action phases, scene, camera and tempo -- and verbalises a concise post-edit description; a contrastive video--text encoder (SigLIP-2) embeds this description and the gallery for first-stage retrieval; finally a constraint-aware re-ranking stage uses the same multimodal model as a judge that scores each shortlisted candidate against the intended edited result. On the challenge test set, R3-CoVR attains 91.9% R@1 and 98.2% R@10. Two findings drive these results: (i) matching the description length to the contrastive encoder's text window lifts R@1 from 67.5 to 72.7; and (ii) the constraint-aware re-ranker, which reorders only the shortlist, lifts R@1 from 72.7 to 91.9 -- the single largest gain. We analyse the re-ranker's behaviour, the retrieve/re-rank blend, and the shortlist depth, and we release a clean three-layer implementation.
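
The retrieve-then-re-rank control flow described above can be sketched without any model weights. In this illustration the embeddings are plain tuples and the judge is any callable returning a score. Both are stand-ins for the SigLIP-2 encoder and the Qwen3-VL-8B judge, whose real interfaces are not shown in the abstract, and `retrieve_then_rerank` is a hypothetical name.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0.0 else 0.0


def retrieve_then_rerank(query_vec, gallery, judge, shortlist_size=10):
    """Rank a gallery by embedding similarity, then re-rank the shortlist.

    gallery: dict mapping video id -> embedding.
    judge:   callable mapping video id -> score (higher is better); it is
             only ever called on shortlisted candidates.
    """
    # Stage 1: first-stage retrieval with the contrastive embedding.
    ranked = sorted(gallery, key=lambda vid: cosine(query_vec, gallery[vid]), reverse=True)
    # Stage 2: reorder only the shortlist; the tail keeps its original order.
    shortlist = sorted(ranked[:shortlist_size], key=judge, reverse=True)
    return shortlist + ranked[shortlist_size:]


if __name__ == "__main__":
    gallery = {"a": (1.0, 0.0), "b": (0.9, 0.1), "c": (0.0, 1.0), "d": (-1.0, 0.0)}
    judge_scores = {"a": 0.2, "b": 0.9}  # placeholder judge output
    print(retrieve_then_rerank((1.0, 0.0), gallery, judge_scores.__getitem__, shortlist_size=2))
```

Because the judge only reorders the shortlist, it can raise R@1 but cannot recover a target that the first stage left outside the shortlist, which is presumably why the authors also analyse shortlist depth.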

2606.00895 2026-06-02 math.OC cs.LG 版本更新

深入波动：用于跨被试脑电情绪解码的Morlet谱变换器

Jiaxin Qing, Lexin Li

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结针对脑电情绪识别中跨被试变异性问题，提出基于Morlet小波标记化、长上下文基线去除和频带特定空间投影的Morlet谱变换器（MST），无需预训练即可在SEED系列数据集上超越大型预训练模型和频域方法。

详情

AI中文摘要

我们研究基于脑电的跨被试情绪识别，这是脑机接口中一个实际重要但具有挑战性的问题。与具有清晰波形特征的任务不同，情绪相关的脑电信号主要编码在频谱功率中，且微弱、嘈杂，并在被试间高度变化。现有方法要么依赖需要大量数据但仍难以应对跨被试变异的大型预训练脑电基础模型，要么依赖频域编码器（能更好地反映频谱结构但存在表示不匹配、漂移主导的标记化以及缺乏频带特定空间建模）。在本文中，我们提出了Morlet谱变换器（MST），它围绕三个关键组件构建，并与时空变换器主干集成。首先，Morlet小波标记化提供了与脑节律多尺度结构匹配的时频表示，并将经典微分熵特征扩展到适合变换器的形式。其次，长上下文基线去除作为一种简单的时间归一化，消除了被试特定漂移和附近窗口间的冗余。第三，频带特定空间投影为每个频带学习独立的通道混合器，捕获可解释的频带特定模式并减少跨通道混合。我们表明，即使没有预训练，MST在所有SEED系列数据集上始终优于大型预训练脑电基础模型和基于频率的方法。这些结果表明，精心的表示设计可以产生准确、经济且可解释的替代大规模预训练的方法。

英文摘要

We study cross-subject emotion recognition from EEG, a practically important yet challenging problem in brain-computer interfaces. Unlike tasks with clear waveform signatures, emotion-related EEG signals are primarily encoded in spectral power and are weak, noisy, and highly variable across subjects. Existing approaches rely either on large pretrained EEG foundation models, which require massive data yet still struggle with cross-subject variability, or frequency-domain encoders, which better reflect spectral structure but suffer from mismatched representations, drift-dominated tokenization, and lack of band-specific spatial modeling. In this article, we propose the Morlet Spectral Transformer (MST), built around three key components and integrated with a spatiotemporal Transformer backbone. First, Morlet wavelet tokenization provides a time-frequency representation that matches the multi-scale structure of brain rhythms, and extends classical differential entropy features to a form suitable for Transformers. Second, long-context baseline removal acts as a simple temporal normalization that removes subject-specific drift and redundancy across nearby windows. Third, frequency-specific spatial projection learns a separate channel mixer for each frequency band, capturing interpretable band-specific patterns and reducing cross-channel mixing. We show that, even without pretraining, MST consistently outperforms both large pretrained EEG foundation models and frequency-based methods across all SEED-family datasets. These results suggest that careful representation design can yield an accurate, cost-effective, and interpretable alternative to large-scale pretraining.
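
As a rough illustration of the time-frequency front end, the sketch below estimates band power by convolving a signal with a complex Morlet wavelet. It is not the MST tokenizer: the number of cycles, the three-sigma truncation, and the amplitude normalization are assumptions made here for a self-contained example.

```python
import cmath
import math


def morlet_power(signal, fs, freq, n_cycles=6.0):
    """Mean power of `signal` at `freq` (Hz) via a complex Morlet wavelet.

    Assumptions: Gaussian width sigma = n_cycles / (2*pi*freq), kernel
    truncated at +/- 3 sigma, output normalized by the kernel's L1 norm.
    """
    sigma = n_cycles / (2.0 * math.pi * freq)  # temporal width in seconds
    half = int(math.ceil(3.0 * sigma * fs))    # half-length in samples
    kernel = [
        cmath.exp(2j * math.pi * freq * (k / fs)) * math.exp(-((k / fs) ** 2) / (2.0 * sigma ** 2))
        for k in range(-half, half + 1)
    ]
    norm = sum(abs(w) for w in kernel)
    powers = []
    # Only positions where the kernel fits entirely inside the signal.
    for center in range(half, len(signal) - half):
        acc = sum(signal[center + k] * kernel[k + half].conjugate() for k in range(-half, half + 1))
        powers.append(abs(acc / norm) ** 2)
    return sum(powers) / len(powers) if powers else 0.0


if __name__ == "__main__":
    fs = 200.0
    alpha_wave = [math.sin(2.0 * math.pi * 10.0 * n / fs) for n in range(400)]
    print(morlet_power(alpha_wave, fs, 10.0), morlet_power(alpha_wave, fs, 30.0))
```

Taking the logarithm of such band powers gives features closely related to the classical differential entropy features mentioned in the abstract, since the differential entropy of a Gaussian signal is an affine function of the log of its variance.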

2606.00880 2026-06-02 cs.LG cs.AI 版本更新

Task diversity produces systematic transfer but inhibits continual reinforcement learning

任务多样性产生系统性迁移但抑制持续强化学习

Purab Seth, Neil Shah, Kunal Jha, Samuel J. Gershman, Max Kleiman-Weiner, Wilka Carvalho

发表机构 * MIT（麻省理工学院）； University of California, Berkeley（加州大学伯克利分校）； Princeton University（普林斯顿大学）； Harvard University（哈佛大学）

AI总结通过引入GPU加速的持续强化学习领域Banyan，研究任务多样性（地图布局、交互对象、子目标层次结构）对智能体在分布变化下持续学习能力的影响，发现多样性促进局部迁移但导致长期任务性能停滞和遗忘。

详情

AI中文摘要

持续强化学习旨在产生不仅能在当前任务上提高，还能随着任务分布变化而适应的智能体。在众多不同任务上训练智能体可以引发零样本泛化，但先前的工作通常是在训练后（冻结权重）评估这种泛化。任务多样性是否也能提高智能体在分布变化下继续学习的能力仍不清楚。我们引入了Banyan，一个GPU加速的持续强化学习领域，其中任务多样性分解为三个独立可控的轴：智能体必须导航的地图布局、必须与之交互的对象以及子目标依赖的层次结构。在单个分布变化中，增加每个轴上的多样性会导致智能体在新任务上开始训练时，其性能接近先前任务达到的水平，即使变化改变了最优策略的结构。然而，随着变化数量的增加，这种局部迁移本身并不能产生持续的持续学习：更长视野的任务出现平台期，并且较早的任务分布在后续训练后被遗忘。Banyan是一个基准，用于研究受控的任务多样性何时产生可迁移的学习，这种迁移何时持续，以及它在哪些方面未能达到真正的持续学习。

英文摘要

Continual reinforcement learning aims to produce agents that learn not only to improve at their current tasks but also to adapt as task distributions change. Training an agent on many diverse tasks can induce zero-shot generalization, but previous work generally evaluates this generalization after training -- with frozen weights. Whether task diversity also improves an agent's ability to continue learning across distribution shifts remains unclear. We introduce Banyan, a GPU-accelerated continual RL domain in which task diversity factors into three independently controllable axes: the map layouts an agent must navigate, the objects it must interact with, and the hierarchical structures of sub-goal dependencies. Across individual distribution shifts, increasing diversity along each axis causes agents to begin training on the new tasks near the performance attained on the previous one, even when the shift changes the structure of the optimal policy. However, as the number of shifts increases, this local transfer does not by itself yield sustained continual learning: longer-horizon tasks plateau, and earlier task distributions are forgotten after later training. Banyan is a benchmark for studying when controlled task diversity produces transferable learning, when that transfer persists, and where it falls short of proper continual learning.

2606.00869 2026-06-02 cs.LG 版本更新

Enhancing LLM Metacognition via Cognitive Pairwise Training

通过认知成对训练增强LLM元认知

Weitao Li, Hao Zhou, Xuanyu Lei, Fandong Meng, Yuanhang Liu, Jingyi Ren, Ante Wang, Xiaolong Wang, Yuanchi Zhang, Fuwen Luo, Guangwen Yang, Lin Gan, Weizhi Ma, Yang Liu

发表机构 * National Engineering Laboratory for Intelligent Information Processing, Academy of Mathematics and Physics, Chinese Academy of Sciences（智能信息处理国家工程实验室，中国科学院数学物理研究所）； University of Science and Technology of China（中国科学技术大学）

AI总结提出认知成对训练（CPT），通过成对比较推理轨迹来学习区分可靠与不可靠推理，从而提升LLM的推理与元认知权衡。

AI中文摘要

基于可验证奖励的强化学习（RLVR）已成为LLM推理的核心，但其结果级奖励可能使模型在证据或推理不可靠时更愿意给出自信答案。现有的SFT或RL方法主要在响应级别教导LLM拒绝或表达不确定性，这可能导致过度拟合拒绝行为，而非提高推理可靠性。为解决这一局限，我们提出认知成对训练（CPT），这是一种认知中期训练对齐阶段，将推理轨迹上的成对比较转化为可复用的对齐信号。通过学习区分可信与有缺陷的推理，CPT鼓励模型内化推理质量判别边界，而非记忆表面拒绝模式。在五个模型规模和三个模型家族上，CPT改善了推理与元认知的权衡。在14B规模上，CPT+RL相比标准SFT+RL流水线在数学平均分上提升2.2分，在拒绝F1上提升5.2分。进一步分析表明，CPT提高了轨迹质量，并在评估和训练设置中表现出强鲁棒性和可扩展性。代码和模型已发布在https://github.com/Tsinghua-dhy/CPT。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) has become central to LLM reasoning, but its outcome-level rewards can make models more willing to give confident answers when evidence or reasoning is unreliable. Existing SFT or RL methods mainly teach LLMs to refuse or express uncertainty at the response level, which can overfit abstention behavior rather than improve reasoning reliability. To address this limitation, we propose Cognitive Pairwise Training (CPT), a cognitive mid-training alignment stage that turns pairwise comparisons over reasoning traces into a reusable alignment signal. By learning to distinguish trustworthy from flawed reasoning, CPT encourages the model to internalize a reasoning-quality discrimination boundary rather than memorize surface refusal patterns. Across five model scales and three model families, CPT improves the reasoning--metacognition trade-off. At 14B, CPT+RL outperforms the standard SFT+RL pipeline by +2.2 math-average points and +5.2 abstention-F1 points. Further analyses show that CPT improves trace quality and exhibits strong robustness and scalability across evaluation and training settings. Code and models are released at https://github.com/Tsinghua-dhy/CPT.

2606.00867 2026-06-02 stat.ML cs.LG eess.SP 版本更新

Statistical Analysis of using the Shapley Value for Sensor Anomaly Localization with Accurate Classifiers

基于准确分类器使用Shapley值进行传感器异常定位的统计分析

Xubin Fang, Rick S. Blum

发表机构 * Electrical and Computer Engineering Department of Lehigh University（里海大学电气与计算机工程系）

AI总结本文通过数学定义的二元最优分类器分析Shapley值在传感器异常定位中的性能，证明在独立观测下等价于低复杂度测试，而在相关双变量高斯/拉普拉斯场景下两者存在本质差异，并首次提供理论统计结果。

AI中文摘要

最近的出版物建议使用Shapley值进行传感器异常/攻击定位。我们通过在Shapley值计算中使用数学定义的二元最优分类器来研究这种方法的性能。为了判断定位性能，我们研究给定传感器观测的Shapley值确定该观测是否异常的能力。首先，我们证明对于独立传感器观测的情况，使用Shapley值的优化异常测试等价于使用Shapley值计算中单个项的优化低复杂度异常测试，产生完全相同的错误概率。对于涉及两个传感器的一些流行的相关观测情况，包括相关双变量高斯/拉普拉斯概率密度函数和常数/高斯攻击/异常，我们证明这两个测试本质上是不同的，产生不同的决策区域和错误概率。此外，我们证明在某些统计相关的双变量高斯场景中，当相关幅度较大且存在加性攻击/异常时，Shapley值测试有时严格劣于另一个（Shapley计算中的单个项）测试，而在其他情况下则严格优于它，具体取决于相关的符号。在这些情况下，可以结合这两种方法以获得严格更好的方法。这些结果首次提供了基于Shapley定位的理论统计分析，鉴于许多研究人员广泛接受Shapley值，这些结果似乎非常有趣，并应鼓励对该主题的进一步研究。提供了数值结果以说明我们的发现。

英文摘要

Recent publications have suggested using the Shapley value for sensor anomaly/attack localization. We study the performance of such an approach by using mathematically defined optimum binary classifiers in the Shapley value calculation. To judge localization performance, we study the ability of the Shapley value of a given sensor observation to determine if that observation is anomalous. First, we prove that for cases with independent sensor observations, an optimized anomaly test using the Shapley value is equivalent to an optimized lower-complexity anomaly test using a single term in the Shapley value calculation, yielding the exact same probability of error. For some popular dependent observation cases involving two sensors, including correlated bivariate Gaussian/Laplacian probability density functions and constant/Gaussian attacks/anomalies, we prove that these two tests are fundamentally different, yielding different decision regions and error probabilities. Further, we prove that the Shapley value test is sometimes strictly inferior to the other (single term in Shapley calculation) test in certain statistically dependent bivariate Gaussian scenarios with large correlation magnitude and additive attacks/anomalies, while it is strictly superior in others, depending on the sign of the correlation. One can combine these two approaches to obtain a strictly better approach in these cases. These results, which provide the first theoretical statistical analysis of Shapley-based localization, seem very interesting based on the wide acceptance of the Shapley value by many researchers and should encourage further research on this topic. Numerical results are provided which illustrate our findings.
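
To make the comparison between the full Shapley value and "a single term in the Shapley value calculation" concrete, here is a minimal sketch of the standard Shapley formula for a small set of sensors. The value table `toy_value` is invented purely for illustration; in the paper the value function comes from an optimum binary classifier, whose exact form the abstract does not give.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a number."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(len(others) + 1):
            # Weight of every coalition of size k that excludes p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when joining coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical two-sensor value table (assumption, not taken from the paper).
toy_value = {
    frozenset(): 0.0,
    frozenset({1}): 0.6,
    frozenset({2}): 0.2,
    frozenset({1, 2}): 1.0,
}

phi = shapley_values([1, 2], toy_value.__getitem__)
```

With two sensors, each Shapley value is the average of exactly two marginal contributions, for sensor 1 these are `v({1}) - v({})` and `v({1, 2}) - v({2})`. A lower-complexity test in the spirit of the abstract would threshold just one such summand instead of their average; which summand the authors use is not stated in the abstract.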

2606.00862 2026-06-02 cs.NE cs.LG 版本更新

Meta-Black-Box Optimization with Ensemble Surrogate Modeling for Robustness-Accuracy Trade-off within SAEA

基于集成代理建模的元黑箱优化以实现SAEA中的鲁棒性-准确性权衡

Xiao Jin, Yongxiong Wang, Haobo Liu, Yudong Du, Yukun Du

发表机构 * GitHub

AI总结提出AdaE-SAEA，一种将SAEA嵌入MetaBBO框架并联合控制填充准则与集成代理建模的方法，通过强化学习训练元策略，自适应平衡鲁棒性与准确性，在昂贵多目标优化中优于现有方法。

AI中文摘要

代理辅助进化算法（SAEAs）已被广泛用于昂贵的黑箱优化问题。然而，它们对刚性且手动设计组件的依赖限制了其跨任务的灵活性和泛化能力。元黑箱优化（MetaBBO）为自适应配置算法组件提供了一种有前景的范式。尽管如此，现有的MetaBBO方法通常只控制单个组件，很少有研究调查多组件优化器（如SAEAs）的统一控制。此外，代理建模中的鲁棒性-准确性权衡对于早期稳定探索和后期精确开发至关重要，但很少被明确考虑。为了解决这些问题，我们提出了AdaE-SAEA，一种用于昂贵多目标优化的自适应集成代理辅助进化算法。AdaE-SAEA将SAEA作为低层优化器嵌入MetaBBO框架，并联合控制填充准则和基于集成的代理建模。具体来说，bagging和boosting被设计为代理建模模块，以在不同搜索阶段自适应平衡鲁棒性和准确性，而元策略同时选择填充准则以实现自适应采样决策。元策略通过并行采样和集中训练的强化学习进行训练，提高了训练效率和可迁移性。在合成和实际问题上的实验表明，AdaE-SAEA优于最先进的基线和基于MetaBBO的方法。我们进一步验证了TabPFN作为集成学习基础代理模型的有效性。据我们所知，这是第一个统一控制SAEAs中代理建模和填充准则，同时明确解决鲁棒性-准确性权衡的工作。

英文摘要

Surrogate-assisted evolutionary algorithms (SAEAs) have been widely used for expensive black-box optimization problems. However, their reliance on rigid and manually designed components limits their flexibility and generalization across tasks. Meta-black-box optimization (MetaBBO) provides a promising paradigm for adaptively configuring algorithmic components. Nevertheless, existing MetaBBO methods usually control only a single component, and few studies have investigated the unified control of multi-component optimizers such as SAEAs. Moreover, the robustness-accuracy trade-off in surrogate modeling, which is crucial for stable early-stage exploration and accurate late-stage exploitation, has rarely been explicitly considered. To address these issues, we propose AdaE-SAEA, an adaptive ensemble surrogate-assisted evolutionary algorithm for expensive multi-objective optimization. AdaE-SAEA embeds SAEA as the low-level optimizer within the MetaBBO framework and jointly controls the infill criterion and ensemble-based surrogate modeling. Specifically, bagging and boosting are designed as surrogate modeling modules to adaptively balance robustness and accuracy across different search phases, while the meta-policy simultaneously selects the infill criterion to enable adaptive sampling decisions. The meta-policy is trained through reinforcement learning with parallel sampling and centralized training, improving both training efficiency and transferability. Experiments on synthetic and real-world problems demonstrate that AdaE-SAEA outperforms state-of-the-art baselines and MetaBBO-based methods. We further verify the effectiveness of TabPFN as the base surrogate model for ensemble learning. To the best of our knowledge, this is the first work to unify the control of surrogate modeling and infill criteria in SAEAs while explicitly addressing the robustness--accuracy trade-off.

2606.00852 2026-06-02 cs.CV cs.AI cs.LG 版本更新

RefDiffNet: Learning to Expose Subtle PCB Defects Before Detection

RefDiffNet: 在检测前学习暴露细微PCB缺陷

Vinay Edula, Nilesh Badwe, Priyanka Bagade

发表机构 * Department of Computer Science and Engineering, Indian Institute of Technology Kanpur（印度理工学院坎普尔分校计算机科学与工程系）； Department of Materials Science and Engineering, Indian Institute of Technology Kanpur（印度理工学院坎普尔分校材料科学与工程系）

AI总结提出RefDiffNet，一种轻量级即插即用的输入增强模块，通过引入无缺陷参考图像来突出缺陷区域，从而提升下游检测器在PCB缺陷检测中的性能。

AI中文摘要

印刷电路板（PCB）缺陷检测具有挑战性，因为许多缺陷很小且难以与复杂的背景图案区分。大多数基于深度学习的PCB检测方法仅依赖被检测的PCB图像进行缺陷检测，忽略了编码走线、焊盘和其他PCB结构预期布局的无缺陷参考图像。在这项工作中，我们提出了RefDiffNet，一种轻量级即插即用的输入增强模块，放置在检测器主干之前，用于在缺陷检测前增强图像。RefDiffNet将经典检测中的一个成熟思想带入深度学习时代，利用无缺陷参考图像来揭示缺陷。RefDiffNet比较缺陷图像与对齐的参考图像，捕获相对于参考图像的结构变化，并使用轻量级编码器输出缺陷区域被突出的原始图像，从而简化下游检测器的任务。在HRIPCB和DeepPCB上的结果表明，RefDiffNet在各类检测器上一致地提升了性能，包括从YOLOv8到YOLOv26的单阶段检测器、基于Transformer的RT-DETR以及两阶段Faster R-CNN。它实现了高达18%的相对mAP50:95增益，且开销可忽略，仅引入0.004-0.005M额外参数和0.7-0.8 GFLOPs，最多占任何评估检测器参数量的0.25%。结果确立了RefDiffNet作为一种轻量级、即插即用、检测器无关的输入增强模块，以最小的计算成本显著提升PCB缺陷检测性能。

英文摘要

Printed circuit board (PCB) defect detection is challenging because many defects are small and difficult to distinguish from complex background patterns. Most deep learning-based PCB inspection methods rely only on the inspected PCB image for defect detection, ignoring the defect-free reference image that encodes the expected layout of traces, pads, and other PCB structures. In this work, we propose RefDiffNet, a lightweight plug-and-play input enhancement block placed before the detector backbone to enhance the image before defect detection. RefDiffNet brings one proven idea from classical inspection into the deep learning era, using a defect-free reference image to reveal defects. RefDiffNet compares the defective image with the aligned reference, captures structural changes relative to the reference, and uses a lightweight encoder to output the original image with defective regions highlighted, thereby making the downstream detector's task easier. Results on HRIPCB and DeepPCB show that RefDiffNet consistently improves performance across detector families, including one-stage detectors from YOLOv8 to YOLOv26, the transformer-based RT-DETR, and the two-stage Faster R-CNN. It achieves up to 18% relative mAP50:95 gain with negligible overhead, introducing only 0.004 - 0.005M additional parameters and 0.7 - 0.8 GFLOPs, amounting to at most 0.25% of the parameter count of any evaluated detector. Results establish RefDiffNet as a lightweight, plug-and-play, detector-agnostic input enhancement module that substantially improves PCB defect detection with minimal computational cost.
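
The classical idea that RefDiffNet builds on, comparing the inspected image with an aligned defect-free reference, can be sketched in a few lines. This is a fixed-function illustration only: RefDiffNet itself uses a learned lightweight encoder, and the function name `highlight_defects` and the `gain` parameter are assumptions made for this example.

```python
def highlight_defects(image, reference, gain=1.0):
    """Brighten pixels that differ from the aligned reference.

    `image` and `reference` are 2-D lists of floats in [0, 1] with the
    same shape; alignment is assumed to have been done beforehand.
    """
    if len(image) != len(reference) or any(
        len(a) != len(b) for a, b in zip(image, reference)
    ):
        raise ValueError("image and reference must have the same shape")
    enhanced = []
    for row_img, row_ref in zip(image, reference):
        # Add the absolute difference back onto the original pixel, clipped to 1.
        enhanced.append(
            [min(1.0, px + gain * abs(px - ref)) for px, ref in zip(row_img, row_ref)]
        )
    return enhanced


# Tiny example: only the bottom-right pixel deviates from the reference.
reference = [[0.2, 0.2], [0.2, 0.2]]
inspected = [[0.2, 0.2], [0.2, 0.5]]
enhanced = highlight_defects(inspected, reference)
```

Pixels that match the reference pass through unchanged, so a downstream detector still sees the original image, with deviating regions made more salient.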

2606.00851 2026-06-02 cs.SD cs.CL cs.HC cs.LG eess.AS 版本更新

Sympatheia: Emotionally Adaptive Voice Assistant with Continuous Affect Conditioning

Sympatheia: 具有连续情感调节的情感自适应语音助手

Sukru Samet Dindar, Riki Shimizu, Xilin Jiang, Nima Mesgarani

发表机构 * Department of Electrical Engineering, Columbia University（电气工程系，哥伦比亚大学）

AI总结提出Sympatheia语音对话框架，通过从用户语音推断情感并结合连续效价-唤醒度控制信号，实现情感自适应响应，优于基线模型。

AI中文摘要

共情口语对话系统必须推断用户的情感状态以做出适当响应，然而日常语音通常带有微弱、中性或模糊的情感线索。为解决这一问题，我们引入了Sympatheia，一种语音到语音对话框架，其条件基于从用户语音中推断出的情感，并且在可用时，基于多模态感知模块或用户界面提供的连续效价-唤醒度（VA）控制信号中的明确情感规格。为了训练我们的模型，我们构建了Sympatheia-18k，一个包含12个情感锚点的情感条件合成口语对话语料库。该数据集包括用于学习情感语音行为的情感分割，以及一个中性分割，该分割将情感中性查询与多个情感条件响应配对，以在情感模糊情况下隔离明确的情感控制。实验结果表明，Sympatheia在生成语义内容和口语表达均情感适当的响应方面优于语音对话基线。我们进一步表明，相同的VA界面可以整合来自不同感知模块（包括面部表情、生物信号和文本情感描述）的情感估计，从而在语音单独提供有限情感证据时改善响应对齐。这些结果表明，连续情感调节是构建情感自适应语音助手的有效实际步骤。

英文摘要

Empathetic spoken dialogue systems must infer a user's emotional state to respond appropriately, yet everyday speech often carries weak, neutral, or ambiguous affective cues. To address this, we introduce Sympatheia, a speech-to-speech dialogue framework conditioned on affect inferred from the user's speech and, when available, explicit affect specifications provided as a continuous valence--arousal (VA) control signal by a multimodal sensing module or user interface. To train our model, we construct Sympatheia-18k, an emotion-conditioned synthetic spoken dialogue corpus with 12 emotion anchors. This dataset includes an emotional split for learning affective speech behavior, and a neutral split that pairs emotionally neutral queries with multiple emotion-conditioned responses to isolate explicit emotion control in emotionally ambiguous cases. Empirical results show that Sympatheia outperforms speech conversational baselines in generating responses whose semantic content and spoken delivery are both emotionally appropriate. We further show that the same VA interface can integrate emotion estimates from diverse sensing modules, including facial expression, biosignals, and textual affect descriptions, improving response alignment when speech alone provides limited emotional evidence. These results suggest that continuous affect conditioning is an effective practical step for building emotionally adaptive voice assistants.

2606.00846 2026-06-02 cs.LG 版本更新

CUPID in the Model Zoo: Online Matchmaking for Selecting Your Dream LLM

模型动物园中的丘比特：在线匹配以选择你的梦想大语言模型

Son Nguyen, Xinyuan Liu, Ransalu Senanayake

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结提出一种基于决斗老虎机算法的主动学习框架，通过迭代选择大语言模型对并收集用户反馈，高效匹配用户偏好与模型能力。

Comments 38 pages, 11 figures

AI中文摘要

用户越来越面临从快速增长的大语言模型池中为给定任务选择合适的LLM的挑战，每个模型具有独特但通常不透明的潜在属性。加剧这一挑战的是，用户可能缺乏词汇或意识来明确表达他们在LLM的响应或部署中所重视的特征。我们提出了一种交互高效的主动学习框架，其中决斗老虎机算法迭代选择LLM对，收集用户关于其响应的反馈，并更新其对用户潜在偏好的信念。我们引入了一种新颖的信念感知上置信界策略，平衡模型池的探索与推断偏好的利用，从而在用户指定的成本和时间预算下实现用户需求与LLM能力之间的高效对齐。通过在LLM和人类研究上的多样化实验，我们实验验证了我们的模型能够以较低成本高效地将良好对齐的LLM匹配给用户。

英文摘要

Users increasingly face the challenge of selecting an appropriate LLM for a given task from a rapidly growing pool of LLMs, each with distinct but often opaque latent properties. Compounding this challenge, users may lack the vocabulary or awareness to explicitly articulate the characteristics they value in an LLM's responses or deployment. We propose an interaction-efficient active learning framework in which a dueling bandit algorithm iteratively selects pairs of LLMs, collects user feedback about their responses, and updates its belief about the user's latent preferences. We introduce a novel belief-aware upper confidence bound strategy that balances exploration of the model pool with exploitation of inferred preferences, enabling efficient alignment between user needs and LLM capabilities under user-specified cost and time budgets. Through diverse experiments on LLMs and human studies, we experimentally verify that our model can efficiently match well-aligned LLMs to users at a lower cost.

2606.00844 2026-06-02 cs.CV cs.AI cs.LG 版本更新

MoEIoU: Rethinking Bounding-Box Regression as a Mixture of Experts

MoEIoU：将边界框回归重新思考为混合专家模型

Vinay Edula, Priyanka Bagade

发表机构 * Indian Institute of Technology Kanpur（印度理工学院坎普尔分校）

AI总结提出MoEIoU损失函数，通过混合专家模型联合优化重叠、中心对齐和长宽比，并采用课程学习权重调度，在多个数据集和YOLO架构上超越现有IoU损失。

AI中文摘要

边界框回归是目标检测的基本组成部分，在精确目标定位中起着关键作用。现有的基于交并比（IoU）的损失函数通过引入几何惩罚项（如中心距离和长宽比不匹配）来扩展IoU目标，以改进边界框回归。然而，这些惩罚项通常在训练过程中保持不变，没有考虑优化动态：预测框在初始阶段表现出较大的中心距离和形状误差，而后期阶段则侧重于提高与真实框的重叠。为了解决这一局限性，我们引入了MoEIoU，一种基于混合专家的回归损失，它联合建模了重叠、中心对齐和长宽比不匹配。MoEIoU使用log-sum-exp函数聚合这些组件，该函数强调主要的定位误差，同时保持其他项的平滑贡献。此外，采用基于课程的权重调度，在早期训练阶段优先纠正框的位置和形状，在后期阶段提高重叠。我们在PASCAL VOC、HRIPCB和MS COCO上使用多种YOLO架构以及大规模模拟实验评估了所提出的MoEIoU。它始终优于标准和最新的最先进损失，表现出更快的收敛速度和更高的定位精度。我们进一步表明，这种自适应聚合改进了现有的基于IoU的损失，带来了一致的增益，并为目标检测框架中的边界框回归提供了更有效的优化指导。

英文摘要

Bounding-box regression is a fundamental component of object detection, playing a critical role in precise object localization. Existing Intersection-over-Union (IoU)-based loss functions extend the IoU objective by incorporating geometric penalties, such as center-distance and aspect-ratio mismatch, to improve bounding-box regression. However, these penalties typically remain fixed throughout training and do not account for the optimization dynamics in which predicted boxes initially exhibit large center-distance and shape errors, with later stages focusing on improving overlap with the ground truth. To address this limitation, we introduce MoEIoU, a mixture-of-experts based regression loss that jointly models overlap, center alignment, and aspect-ratio mismatch. MoEIoU aggregates these components using a log-sum-exp function, which emphasizes the dominant localization error while maintaining smooth contributions from other terms. Additionally, a curriculum-based weighting schedule is employed to prioritize correcting box position and shape in early training stages and improving overlap in later stages. We evaluated proposed MoEIoU on PASCAL VOC, HRIPCB, and MS COCO using multiple YOLO architectures, along with large-scale simulation experiments. It consistently outperforms standard and recent state-of-the-art losses, demonstrating faster convergence and improved localization accuracy. We further show that this adaptive aggregation improves existing IoU-based losses, yielding consistent gains and providing more effective optimization guidance for bounding-box regression in object detection frameworks.
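
The abstract describes the aggregation only in words, so the following is a rough sketch of how a log-sum-exp over three localization terms with a curriculum schedule might look. Everything specific here is an assumption: the DIoU-style center term, the CIoU-style aspect term, the linear schedule in `curriculum_weights`, and the temperature `tau` are illustrative choices, not the authors' published formulas.

```python
import math


def iou_terms(pred, gt):
    """Return (overlap, center, aspect) loss terms for boxes (x1, y1, x2, y2).

    Both boxes are assumed to have positive width and height.
    """
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    overlap = 1.0 - inter / union
    # Squared center distance over squared enclosing-box diagonal (DIoU-style).
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    dx = (px1 + px2) / 2 - (gx1 + gx2) / 2
    dy = (py1 + py2) / 2 - (gy1 + gy2) / 2
    center = (dx * dx + dy * dy) / (cw * cw + ch * ch)
    # Aspect-ratio mismatch (CIoU-style).
    angle_gap = math.atan((gx2 - gx1) / (gy2 - gy1)) - math.atan((px2 - px1) / (py2 - py1))
    aspect = (4 / math.pi ** 2) * angle_gap ** 2
    return overlap, center, aspect


def curriculum_weights(progress):
    """Hypothetical linear schedule; `progress` runs from 0 (start) to 1 (end)."""
    w_overlap = 0.2 + 0.6 * progress  # overlap gains weight late in training
    w_rest = (1.0 - w_overlap) / 2    # center and aspect dominate early
    return w_overlap, w_rest, w_rest


def lse_loss(terms, weights, tau=0.1):
    """Weighted log-sum-exp, tau * log(sum_i w_i * exp(l_i / tau)), computed stably."""
    m = max(terms)
    return m + tau * math.log(
        sum(w * math.exp((t - m) / tau) for w, t in zip(weights, terms))
    )


terms = iou_terms((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
early_loss = lse_loss(terms, curriculum_weights(0.0))
late_loss = lse_loss(terms, curriculum_weights(1.0))
```

When the weights sum to one, the weighted log-sum-exp lies between the weighted mean of the terms and their maximum. Lowering `tau` pushes it toward the largest term, which matches the abstract's description of emphasizing the dominant localization error while the other terms still contribute smoothly.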

2606.00837 2026-06-02 cs.RO cs.LG 版本更新

Coarse-to-Fine Compositional Diffusion for Long-Horizon Planning

粗到细的组合扩散用于长时域规划

Byoungwoo Park, Utkarsh A. Mishra, Jaemoo Choi, Juho Lee, Yongxin Chen

发表机构 * KAIST（韩国科学技术院）； Georgia Institute of Technology（佐治亚理工学院）

AI总结提出Coarse-to-Fine Compositional Diffusion (CoFi)方法，通过先形成全局骨架再细化局部细节，在长时域机器人规划、全景图像生成和长视频生成中提升全局一致性和局部质量，同时减少2-8倍去噪评估次数。

Comments Project page: https://cofi-diffusion.github.io

AI中文摘要

扩散模型为生成结构化数据提供了强先验，但许多任务需要输出超出这些模型通常训练规模的范围。组合生成通过将来自预训练短时域先验的重叠局部计划组合成长时域输出来解决这一问题。然而，标准组合主要强制相邻局部计划之间的一致性，产生局部一致性而不直接指定完整组合的全局结构。因此，局部兼容的计划仍可能形成不合理的路线、任务序列或时间演化。现有方法通过重复传播局部一致性信号或添加推理时优化来提高全局连贯性，但随着局部计划数量或维度的增加，这些过程变得昂贵。我们提出粗到细组合扩散（CoFi），一种推理时采样器，将全局结构形成与局部细节细化分离。CoFi首先将局部去噪估计围绕共享的粗结构对齐，产生捕获长程任务级排列的全局骨架。然后将该骨架扩散到中间噪声水平，并使用相同的预训练局部先验去噪，在保留骨架诱导的全局连贯性的同时恢复局部精细结构。在长时域机器人规划、全景图像生成和长视频生成中，CoFi不仅比先前的组合基线提高了全局连贯性和局部样本质量，而且需要2-8倍更少的去噪评估次数。

英文摘要

Diffusion models provide strong priors for generating structured data, but many tasks require outputs beyond the scale on which these models are typically trained. Compositional generation addresses this by composing overlapping local plans from a pretrained short-horizon prior into a long-horizon output. However, standard composition primarily enforces agreement between neighboring local plans, yielding local consistency without directly specifying the global structure of the full composition. As a result, locally compatible plans may still form an implausible route, task sequence, or temporal evolution. Existing methods improve global coherence by repeatedly propagating local consistency signals or by adding inference-time optimization, but these procedures become expensive as the number or dimensionality of local plans increases. We propose Coarse-to-Fine Compositional Diffusion (CoFi), an inference-time sampler that separates global structure formation from local detail refinement. CoFi first aligns local denoised estimates around a shared coarse structure, producing a global scaffold that captures the long-range task-level arrangement. It then diffuses this scaffold to an intermediate noise level and denoises it with the same pretrained local prior, restoring local fine structure while preserving the scaffold-induced global coherence. Across long-horizon robotic planning, panoramic image generation, and long video generation, CoFi not only improves both global coherence and local sample quality over prior compositional baselines, but also requires 2-8x fewer denoiser evaluations.

2606.00835 2026-06-02 cs.LG 版本更新

Online Packet Scheduling with Deadlines and Learning

具有截止日期和学习的在线数据包调度

Gianmarco Genalti, Achraf Azize, Vianney Perchet

发表机构 * Politecnico di Milano（米兰理工大学）； FairPlay Joint Team, CREST, ENSAE, IP Paris（FairPlay联合团队，CREST，ENSAE，IP巴黎）

AI总结针对部分反馈下未知权重的在线数据包调度问题，通过连接睡眠强盗问题，提出算法实现α-遗憾最小化，并在不同松弛度下达到最优界。

AI中文摘要

强制执行服务质量（QoS）保证的网络路由器必须在每个时钟周期决定传输哪个即将过期的数据包，即使数据包的值在处理之前是未知的。我们将此问题框架化为部分反馈下的在线数据包调度（OPSD）问题：数据包在每个时钟周期到达，具有不同的截止日期，但权重仅在执行后观察到。在未知权重的随机假设下，我们探索了具有强盗反馈的OPSD问题的不同变体。我们在我们的设置和睡眠强盗问题之间建立了联系，并将学习目标设定为$\alpha$-遗憾最小化。我们提供了在不同松弛度下具有可证明$\alpha$-遗憾保证的算法，区分了允许随机化的系统和不允许的系统。在每种情况下，我们的算法实现了$\widetilde{\mathcal{O}}\left(\sqrt{KT}\right)$的$\alpha$-遗憾上界，与标准强盗设置的下界匹配。在实际相关的2-有界截止日期实例中，其中截止日期最多设置在到达后的一个时钟周期，我们的确定性算法实现了可证明的最紧竞争比。值得注意的是，当不同数据包类型数量$K\ge 2$有限时，有可能打破已建立的$\Phi = \frac{1+\sqrt{5}}{2}$竞争比障碍，并获得范围在$[\sqrt{2}, \Phi)$内的更紧竞争比$\theta_K$。

英文摘要

Network routers that enforce Quality-of-Service (QoS) guarantees must decide, at every clock cycle, which expiring packet of information to transmit, even when the value of the packet is unknown until it is processed. We frame this problem as the Online Packet Scheduling with Deadlines (OPSD) problem under Partial Feedback: packets arrive at every clock cycle, with different deadlines, but the weights are only observed after execution. Under a stochastic assumption on the unknown weights, we explore different variants of the OPSD problem with bandit feedback. We establish a connection between our setting and the sleeping bandits problem, and set our learning goal to $\alpha$-regret minimization. We provide algorithms with provable $\alpha$-regret guarantees under different spans of slackness, distinguishing systems allowing for randomization and systems that do not. In every scenario, our algorithms achieve an $\alpha$-regret upper bound of $\widetilde{\mathcal{O}}\left(\sqrt{KT}\right)$, matching the lower bound for the standard bandit setting. In the practically relevant case of $2$-bounded deadline instances, where the deadline is set at most one clock cycle away from the arrival, our deterministic algorithm achieves the provably tightest possible competitive ratio. Remarkably, when the number of distinct packet types $K\ge 2$ is finite, it is possible to break the well-established $\Phi = \frac{1+\sqrt{5}}{2}$ competitive ratio barrier and attain a tighter competitive ratio $\theta_K$ ranging in $[\sqrt{2}, \Phi)$.
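
For orientation, the constants in the last sentence evaluate as follows. The regret expression in the second display is only one common way of writing regret against a competitive ratio that is at least 1; it is an assumption here, since the abstract does not spell out the paper's definition, and the symbols $w_t^{\mathrm{OPT}}$ and $w_t^{\mathrm{ALG}}$ (weight collected at cycle $t$ by the offline optimum and by the algorithm) are introduced for illustration.

```latex
\Phi = \frac{1+\sqrt{5}}{2} \approx 1.618, \qquad
\sqrt{2} \approx 1.414, \qquad
\theta_K \in [\sqrt{2}, \Phi) \approx [1.414,\ 1.618).

% Assumed convention (not from the abstract): competitive ratio \alpha \ge 1.
R_\alpha(T) = \frac{1}{\alpha}\,
  \mathbb{E}\Big[\sum_{t=1}^{T} w_t^{\mathrm{OPT}}\Big]
  - \mathbb{E}\Big[\sum_{t=1}^{T} w_t^{\mathrm{ALG}}\Big]
```

Under this reading, the improvement from $\Phi$ to $\theta_K$ is worth at most about $0.2$ in the ratio, and an $\widetilde{\mathcal{O}}(\sqrt{KT})$ bound on $R_\alpha(T)$ means the per-cycle shortfall against the $\alpha$-discounted optimum vanishes as $T$ grows.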

2606.00834 2026-06-02 stat.AP cs.AI cs.LG math.PR 版本更新

Hybrid Probabilistic Forecasting of Under-Five Malaria Admissions in Ghana: A Gaussian Process Regression with Holt-Winters Smoothing

加纳五岁以下儿童疟疾住院人数的混合概率预测：高斯过程回归与Holt-Winters平滑

T. Ansah-Narh, Y. Asare Afrane, J. Bremang Tandoh

发表机构 * GAEC, Ghana（加纳原子能委员会）

AI总结针对加纳疟疾预测中季节性和数据不确定性挑战，提出结合高斯过程回归与Holt-Winters指数平滑的混合模型，实现概率性预测并评估其性能。

Comments 24 pages, 8 figures, accepted for publication in Artificial Intelligence in Medicine

AI中文摘要

准确的疟疾预测在撒哈拉以南非洲仍是一个重大挑战，那里强烈的季节性、报告不确定性和非平稳传播动态降低了传统模型的可靠性。在加纳，地区级疟疾监测需要概率上严谨且数据有限时稳健的预测框架。本研究提出了一个混合框架，将高斯过程回归（GPR）与Holt-Winters指数平滑相结合，用于建模每月五岁以下儿童疟疾住院人数。GPR捕捉非线性行为和预测不确定性，而Holt-Winters稳定长期预测并保留季节结构。使用十年（2014-2023年）的地区级数据，通过滚动起点扩展窗口验证评估性能。混合模型实现了$R^2 = 0.9906$，而单独Holt-Winters为$0.8213$，$94.2\%$的残差在$\pm 2\sigma$范围内。2024-2028年的预测显示月平均住院人数约为8,000至12,200例。时空分析揭示了显著的生态异质性：北部高负担地区尽管绝对波动较大，但相对模式稳定。该框架为疟疾流行地区的早期预警和运营规划提供了一种可扩展的概率方法，支持加纳国家疟疾控制战略。

英文摘要

Accurate malaria forecasting remains a major challenge in sub-Saharan Africa, where strong seasonality, reporting uncertainty, and non-stationary transmission dynamics reduce the reliability of conventional models. In Ghana, district-level malaria surveillance requires forecasting frameworks that are probabilistically rigorous and robust under limited data. This study proposes a hybrid framework integrating Gaussian Process Regression (GPR) with Holt-Winters exponential smoothing for modelling monthly under-five malaria admissions. GPR captures non-linear behaviour and predictive uncertainty, while Holt-Winters stabilises long-horizon forecasts and preserves seasonal structure. Using ten years of district-level data (2014-2023), performance was evaluated via rolling-origin expanding-window validation. The hybrid model achieved $R^2 = 0.9906$ versus $0.8213$ for Holt-Winters alone, with $94.2\%$ of residuals within $\pm 2\sigma$ bounds. Forecasts for 2024-2028 project average monthly admissions from approximately 8,000 to 12,200 cases. Spatio-temporal analysis revealed pronounced ecological heterogeneity: northern high-burden districts exhibited stable relative patterns despite large absolute fluctuations. The framework provides a scalable probabilistic approach for malaria early warning and operational planning in endemic settings, supporting Ghana's national malaria control strategy.

2606.00831 2026-06-02 cs.AI cs.LG 版本更新

Subliminal Learning is a LoRA Artifact

潜意识学习是LoRA的伪影

Todd Nief, Harvey Yiyun Fu, Mark Muchane, Ari Holtzman

发表机构 * Department of Computer Science, University of Chicago（芝加哥大学计算机科学系）； Data Science Institute, University of Chicago（芝加哥大学数据科学研究所）

AI总结本文发现潜意识学习是LoRA微调产生的伪影，其传递行为与LoRA秩呈倒U型关系，且完全微调下消失，表明该现象依赖于微调和评估上下文。

AI中文摘要

潜意识学习是一种现象，语言模型可以通过看似无害的数据将行为特征传递给其他模型（Cloud et al., 2025）。在潜意识学习中，具有行为特征（例如对猫的痴迷）的教师模型可以将这种猫痴迷传递给仅在教师生成的数字序列上微调的学生模型。在本文中，我们提出疑问：这种意想不到的行为传递是如何发生的？我们表明，潜意识学习是LoRA的伪影。当潜意识学习发生时，传递与LoRA秩呈倒U型关系；在完全微调下也会消失。我们表明，潜意识学习高度依赖于微调和评估期间看到的上下文。例如，在微调期间使用默认系统提示（“你是Qwen，由阿里云创建。你是一个有用的助手。”）的Qwen模型，在生成时如果没有包含系统提示，则不会表现出潜意识学习。我们进一步证明，潜意识行为局限于在微调和评估期间都看到的标记（例如模型的默认系统提示、标准聊天模板标记等）上的计算。总体而言，潜意识学习似乎是LoRA超参数和微调上下文的脆弱伪影，使其成为行为传递的不稳定渠道。

英文摘要

Subliminal learning is a phenomenon where language models can transmit behavioral traits to other models through seemingly innocuous data (Cloud et al., 2025). In subliminal learning, a teacher model with a behavioral trait (e.g. obsession with cats) can transmit this cat obsession to a student model finetuned only on numerical sequences generated by the teacher. In this paper, we ask: how does this unexpected behavioral transmission occur? We show that subliminal learning is a LoRA artifact. When subliminal learning occurs, transmission has an inverted U-shaped relationship with LoRA rank; it also disappears with full finetuning. We show that subliminal learning is highly dependent on the context seen during finetuning and evaluation. For example, a Qwen model with the default system prompt during finetuning ("You are Qwen, created by Alibaba Cloud. You are a helpful assistant.") does not show subliminal learning during generation when no system prompt is included. We further demonstrate that subliminal behavior is localized to computation at tokens seen during both finetuning and evaluation (e.g. the model's default system prompt, the standard chat template tokens, etc.). Overall, subliminal learning seems to be a fragile artifact of LoRA hyperparameters and finetuning context, making it an unstable channel for behavioral transmission.

2606.00826 2026-06-02 cs.LG 版本更新

Partial Fairness Awareness: Belief-Guided Strategic Mechanism for Strategic Agents

部分公平意识：面向策略代理的信念引导策略机制

Xinpeng Lv, Chunyuan Zheng, Yunxin Mao, Renzhe Xu, Hao Zou, Shanzhi Gu, Liyang Xu, Huan Chen, Yuanlong Chen, Wenjing Yang, Haotian Wang

发表机构 * National University of Defense Technology, Changsha, China（国防科技大学）； Peking University, Beijing, China（北京大学）； Shanghai University of Finance and Economics, Shanghai, China（上海财经大学）； ZGC Laboratory, Beijing, China（中关村实验室）； Faculty of Computing, Harbin Institute of Technology, Harbin, China（哈尔滨工业大学计算学部）

AI总结针对策略分类中的公平暴露困境，提出部分公平意识（PFA）问题，通过发布公平约束候选集并隐藏真实约束，结合信念引导机制实现代理与系统公平约束的对齐，实验表明PFA在降低群体公平差距、提高合格个体接受率和结果稳定性方面优于完全公开或私有的公平机制。

Comments Accepted by AAAI2026

DOI: 10.1609/aaai.v40i29.39600

AI中文摘要

策略机器学习研究代理操纵其特征以从预测模型获得有利决策的场景。为了解决策略分类中固有的公平问题，最近的工作引入了群体特定的公平约束。然而，当前的公平感知方法在公平暴露问题上面临根本困境：公开这些约束会导致策略操纵和公平逆转，而隐藏它们可能降低社会福利并阻碍真正的改进。为填补这一空白，我们随后提出了部分公平意识（PFA）问题，因为我们的理论分析表明，这种困境可以通过发布公平约束的候选集并隐藏真实约束来缓解。具体来说，我们引入了一种信念引导的策略机制，其中代理与决策系统迭代交互，并在公平约束候选集上维持一个信念分布。这一信念引导过程使代理能够通过迭代交互和反馈，更新其在候选集上的信念分布，从而逐渐使其信念与系统采用的真实公平约束对齐。在真实世界和合成数据集上的大量实验表明，与完全公开或私有的公平机制相比，PFA实现了更低的群体公平差距、更高的真正合格个体接受率以及更稳定的结果。

英文摘要

Strategic machine learning investigates scenarios where agents manipulate their features to receive favorable decisions from predictive models. To address fairness concerns intrinsic to strategic classification, recent work has introduced group-specific fairness constraints. However, current fairness-aware approaches face a fundamental dilemma in the issue of fairness exposure: making these constraints public enables strategic manipulation and can lead to fairness reversal, while keeping them hidden may reduce social welfare and discourage genuine improvement. To fill this gap, we subsequently propose the problem of partial fairness awareness (PFA), as our theoretical analysis informs that such a dilemma can be mitigated by releasing the candidate set of fairness constraints and concealing the grounding constraint. To be specific, we introduce a belief-guided strategic mechanism, wherein agents iteratively interact with the decision system and maintain a belief distribution over the candidate set of fairness constraints. This belief-guided process enables agents, through iterative interaction and feedback, to update their belief distribution over the candidate set, thereby gradually aligning their belief with the grounding fairness constraint employed by the system. Extensive experiments on real-world and synthetic datasets demonstrate that PFA achieves lower group fairness gaps, higher acceptance of truly qualified individuals, and more stable outcomes compared to fully public or private fairness regimes.

2606.00821 2026-06-02 cs.LG 版本更新

A Comparative Analysis of Machine Learning Algorithms for Multi-Task Prediction of the Parameters of the Pectin Hydrolysis--Extraction Process

机器学习算法用于果胶水解-提取过程参数多任务预测的比较分析

Mullosharaf K. Arabov, Shavkat Yo. Kholov, Zainiddin K. Muhiddin

发表机构 * Institute of Computational Mathematics and Information Technologies, Kazan Federal University（喀山联邦大学计算数学与信息技术研究所）； Tajik Technical University named after Academician M.S. Osimi（以M.S.奥西米院士命名的塔吉克技术大学）； V.I. Nikitin Institute of Chemistry, National Academy of Sciences of Tajikistan（塔吉克斯坦国家科学院V.I.尼基京化学研究所）

AI总结本研究比较了11种机器学习算法在多任务回归预测果胶水解-提取过程参数中的性能，其中CatBoost表现最佳（平均R²约0.946），并分析了特征重要性，原料类型占主导地位（63.6%）。

Comments Preprint

AI中文摘要

本研究利用机器学习方法解决复杂多参数工艺——果胶水解-提取过程的控制挑战。实验基础是一个独特的数据库，包含在受控条件下对七种植物原料进行的1000次实验室实验，涉及四个可变工艺因素（温度85-130°C、压力0.9-2.2 atm、保温时间3-10分钟、pH 1.5-2.0）。记录了四个输出特征：果胶产率、半乳糖醛酸含量、分子量和酯化度。为解决多任务回归问题，训练并比较了11种算法：正则化线性模型、集成方法（随机森林、梯度提升、XGBoost、CatBoost、Extra Trees）、k近邻、支持向量回归和多层感知器。最佳结果由CatBoost展示（超参数优化后平均R²约为0.946）。特征重要性分析揭示了原料类型的主导作用（占总重要性的63.6%），其次是温度和保温时间。开发的流水线以生产就绪格式导出，并部署为交互式Web界面。研究结果表明，集成方法结合严格的统计分析和可解释AI显著减少了物理实验的需求，并为智能果胶生产控制奠定了基础。

英文摘要

This study addresses the challenge of controlling a complex, multi-parameter technological process -- pectin hydrolysis--extraction -- using machine learning methods. The experimental foundation is a unique database comprising 1,000 laboratory experiments conducted under controlled conditions on seven types of plant raw material with four variable process factors (temperature 85--130 °C, pressure 0.9--2.2 atm, holding time 3--10 min, pH 1.5--2.0). Four output characteristics were recorded: pectin yield, galacturonic acid content, molecular weight, and degree of esterification. To solve the multi-task regression problem, 11 algorithms were trained and compared: regularised linear models, ensemble methods (Random Forest, Gradient Boosting, XGBoost, CatBoost, Extra Trees), k-nearest neighbours, support vector regression, and a multilayer perceptron. The best results were demonstrated by CatBoost (average R-squared approximately 0.946 after hyperparameter optimisation). Feature importance analysis revealed the dominant role of the raw material type (63.6% of total importance), followed by temperature and holding time. The developed pipeline was exported in a production-ready format and deployed as an interactive web interface. The findings demonstrate that ensemble methods combined with rigorous statistical analysis and interpretable AI significantly reduce the need for physical experiments and form the basis for intelligent pectin production control.

2606.00815 2026-06-02 cs.LG 版本更新

OmniEEG-Bench: A Standardized Evaluation Benchmark for EEG Foundation Models

OmniEEG-Bench: 脑电图基础模型的标准化评估基准

Ziling Lu, Zongsheng Li, Xinke Shen, Kexin Lou, Yingyue Xin, Xiaoqi Chen, Shinan Wang, Xiang Chen, Jiahao Fan, Chenyu Huang, Xin Xu, Zhoujie Hou, Chen Wei, Quanying Liu

发表机构 * Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen, China（南方科技大学生物医学工程系，深圳，中国）； School of Computer Science and Engineering, The Chinese University of Hong Kong, Shenzhen, China（香港中文大学（深圳）计算机科学与工程学院，深圳，中国）； Omni-Intelligence, Shenzhen, China（奥米智能，深圳，中国）； Shenzhen Loop Area Institute, Shenzhen, China（深圳环城研究院，深圳，中国）

AI总结针对脑电图基础模型评估碎片化问题，提出统一基准OmniEEG-Bench，涵盖六类任务、54个数据集，并揭示预训练数据多样性和模型大小与性能的缩放律关系。

Comments 28 pages, 13 figures, 8 tables; benchmark of EEG foundation models

详情

AI中文摘要

脑电图（EEG）支持多种脑机接口（BCI）任务，从脑状态监测到人-大语言模型交互。EEG基础模型正在兴起，但由于异构数据集和不一致的任务协议，评估仍然碎片化。在此，我们介绍OmniEEG-Bench，一个用于EEG基础模型（FMs）的统一基准和下游任务路线图。它将EEG FMs的评估组织为六个任务族，涵盖（i）信号可靠性、（ii）生物特征与疾病、（iii）意识与状态、（iv）认知与情感、（v）自然刺激解码以及（vi）运动与交互，引入了先前EEG FM工作中未系统基准测试的新一代任务。OmniEEG-Bench通过任务卡规范标准化模型部署、任务定义和指标，并统一了54个EEG数据集及一致的评估协议。我们对10个代表性EEG基础模型进行了基准测试，并报告了涵盖多种评估设置的排行榜。预训练数据集多样性和模型大小均与跨数据集的更好平均排名显著相关，揭示了EEG基础模型中的缩放律行为（图1）。这些结果表明，扩展EEG基础模型不仅需要更大的架构，还需要更广泛和更多样化的预训练数据。基准测试代码可在https://github.com/ncclab-sustech/omni-eegbench.git获取。

英文摘要

Electroencephalography (EEG) supports a variety of brain-computer interface (BCI) tasks ranging from brain-state monitoring to human-LLM interactions. EEG foundation models are emerging, but evaluation remains fragmented due to heterogeneous datasets and inconsistent task protocols. Here, we introduce OmniEEG-Bench, a unified benchmark and downstream task roadmap for EEG foundation models (FMs). It organizes evaluation of EEG FMs into six task families spanning (i) signal reliability, (ii) biometrics and disease, (iii) consciousness and state, (iv) cognition and emotion, (v) naturalistic stimulus decoding, and (vi) motor and interaction, introducing a new generation of tasks not systematically benchmarked in prior EEG FM work. OmniEEG-Bench standardizes model deployment, task definitions, and metrics through a task-card specification, and unifies 54 EEG datasets with consistent evaluation protocols. We benchmark 10 representative EEG foundation models and report a leaderboard that covers diverse evaluation settings. Both pretraining dataset diversity and model size are significantly associated with better average ranks across datasets, revealing scaling-law behavior in EEG foundation models (Figure 1). These results suggest that scaling EEG foundation models requires not only larger architectures but also broader and more diverse pretraining data. The benchmark code is available at https://github.com/ncclab-sustech/omni-eegbench.git.

2606.00813 2026-06-02 cs.CR cs.CL cs.ET cs.LG cs.NE 版本更新

Cross-Generational Transfer of Adversarial Attacks Reveals Non-Monotonic Safety Alignment in LLMs

跨代对抗攻击迁移揭示大语言模型安全对齐的非单调性

Subhadip Mitra

发表机构 * Rota Labs（Rota实验室）

AI总结通过质量多样性进化（MAP-Elites）对四代Gemma模型进行自动红队探测，发现安全对齐非单调变化，其中Gemma 3攻击成功率显著高于前后代，且攻击迁移率在不同代际间存在差异。

Comments 8 pages, 3 figures

详情

AI中文摘要

大语言模型的安全对齐在不同代际之间并非单调提升。通过对Google Gemma家族四代模型（7B-31B）使用质量多样性进化（MAP-Elites）作为自动红队探测，我们发现Gemma 3（12B）的攻击成功率为68.7% ± 5.7%（均值±标准差，3个种子），显著高于其前代Gemma 2（45.5% ± 7.2%；p = 0.030，配对bootstrap）和后继Gemma 4（33.9% ± 1.8%）。跨代重放进化攻击档案显示，来自其他代际的攻击对Gemma 3的迁移率为44-46%，而对Gemma 4仅为14-18%，表明Gemma 4的安全增益泛化到了针对前代进化出的攻击分布之外。在我们的8B评判模型下，版权和网络犯罪漏洞在所有代际中接近100%，但第二评判审计（第6节）表明版权结果对评判选择敏感。错误信息ASR从Gemma 2的29%跃升至Gemma 3的99%，并在Gemma 4中仍保持77%的高位，表明该退化未得到完全解决。这些模式在静态基准测试中不可见，仅通过自适应的纵向探测才显现。所有实验使用3个随机种子和统一的自主托管评判模型；代码和工件可在https://github.com/bassrehab/red-queen获取。

英文摘要

Safety alignment in LLMs does not improve monotonically across model generations. Studying four generations of Google's Gemma family (7B-31B) with quality-diversity evolution (MAP-Elites) as an automated red-teaming probe, we find that Gemma 3 (12B) exhibits 68.7% +/- 5.7% attack success rate (ASR; mean +/- std, 3 seeds), significantly higher than its predecessor Gemma 2 (45.5% +/- 7.2%; p = 0.030, paired bootstrap) and its successor Gemma 4 (33.9% +/- 1.8%). Replaying evolved attack archives across generations reveals that attacks from other generations transfer to Gemma 3 at 44-46% but only 14-18% to Gemma 4, indicating that Gemma 4's safety gains generalize beyond the attack distributions evolved against earlier generations. Under our 8B judge, copyright and cybercrime vulnerabilities register at near-100% across all generations, though a second-judge audit (Section 6) suggests the copyright result is sensitive to judge choice. Misinformation ASR jumps from 29% to 99% between Gemma 2 and Gemma 3 and remains elevated at 77% in Gemma 4, indicating the regression was not fully addressed. These patterns are invisible to static benchmarks and emerge only through adaptive, longitudinal probing. All experiments use 3 random seeds with a unified self-hosted judge; code and artifacts are available at https://github.com/bassrehab/red-queen.
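
The abstract names MAP-Elites as the red-teaming probe but gives no implementation details. The following is a minimal, generic sketch of the MAP-Elites loop on a toy problem, not the authors' pipeline: the 2-D solutions, the five descriptor cells (a stand-in for attack categories), the quadratic fitness (a stand-in for a judge score), and the 0.9 mutation probability are all illustrative assumptions.

```python
import random

def map_elites(fitness, descriptor, mutate, init, iterations, seed=0):
    """Keep the best solution found so far in each descriptor cell (the archive)."""
    rng = random.Random(seed)
    archive = {}  # cell -> (fitness, solution)
    for _ in range(iterations):
        if archive and rng.random() < 0.9:
            # Mutate an elite drawn uniformly from the archive (assumed rate).
            _, parent = rng.choice(list(archive.values()))
            candidate = mutate(parent, rng)
        else:
            # Otherwise sample a fresh random solution.
            candidate = init(rng)
        cell, score = descriptor(candidate), fitness(candidate)
        # Replace the cell's elite only if the candidate scores higher.
        if cell not in archive or score > archive[cell][0]:
            archive[cell] = (score, candidate)
    return archive

# Toy stand-ins (hypothetical): a solution is a point in [0, 1]^2.
init = lambda rng: (rng.random(), rng.random())
mutate = lambda p, rng: tuple(min(1.0, max(0.0, v + rng.gauss(0, 0.1))) for v in p)
descriptor = lambda s: min(4, int(s[0] * 5))   # five behaviour cells
fitness = lambda s: -(s[1] - 0.5) ** 2         # best possible score is 0

archive = map_elites(fitness, descriptor, mutate, init, iterations=500)
```

The archive maps each behaviour cell to its best-scoring solution, which is what makes a cross-generation "replay" of evolved attacks possible in principle: the stored elites can be re-evaluated against a different target.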

2606.00808 2026-06-02 cs.LG 版本更新

二维MXene催化基准数据集

Pavlo Melnyk, Anmar Karmush, Mårten Wadenbäck, Ania Beatriz Rodríguez-Barrera, Johanna Rosen, Michael Felsberg, Jonas Björk

发表机构 * Computer Vision and Learning Systems, Department of Electrical Engineering (ISY) & AI4X（计算机视觉与学习系统，电气工程系（ISY）及AI4X）； Materials Design Division, Department of Physics, Chemistry and Biology (IFM)（材料设计部，物理、化学与生物系（IFM））； Wallenberg Initiative Materials Science for Sustainability (WISE)（瓦伦贝格可持续材料科学倡议（WISE））

AI总结通过结合第一性原理计算与机器学习，构建包含50000个DFT计算训练集和10000个测试集的数据集，训练并验证多种机器学习原子间势模型，实现约10^3倍加速且保持高精度，推动MXene催化行为的高效研究。

详情

AI中文摘要

将第一性原理计算与机器学习（ML）相结合，旨在加速新型材料催化行为的探索。我们专注于二维（2D）Ti$_2$CT$_y$ MXene，其多样的表面化学性质使其成为极具吸引力的催化候选材料。由于计算成本，在现实条件下解析其组成和结构超出了标准密度泛函理论（DFT）的能力。为应对这一挑战，我们生成了一个包含50000个DFT计算用于训练和10000个用于测试的全面数据集，涵盖Ti$_2$CT$_y$ MXene构型和分子系统，以及一个包含1000个真正新的大系统的额外测试数据集，以研究模型的泛化能力。我们训练并验证了广泛使用且具有竞争力的机器学习原子间势（MLIP）模型，包括EquiformerV2、MACE、MatRIS和UPET，这些模型能够准确预测原子力和形成能——这些是DFT在结构和催化研究中必须反复计算的量——对于这些二维材料。这种DFT-ML联合框架实现了约$1-4 \cdot 10^3$倍（在CPU上）的计算加速，同时保持所需精度（力约$\pm 10$ meV/Å，每原子能量约$\pm 1$ meV），为更高效地研究MXene催化行为铺平了道路。此外，我们对训练模型进行了广泛的定性评估，展示了超越基准指标的基于模拟的综合比较的重要性。数据集、训练模型及代码可在https://huggingface.co/datasets/CatalystAnonymous/catalyst_mxenes获取。

英文摘要

Merging first-principles calculations with machine learning (ML), we aim to accelerate the exploration of catalytic behaviour in novel materials. We focus on two-dimensional (2D) Ti$_2$CT$_y$ MXenes, whose versatile surface chemistry makes them particularly compelling candidates for catalysis. Resolving their composition and structure under realistic conditions exceeds the reach of standard density functional theory (DFT) due to computational cost. To address this challenge, we generate a comprehensive dataset of 50,000 DFT calculations for training and 10,000 for testing, encompassing both Ti$_2$CT$_y$ MXene configurations and molecular systems, along with an additional test dataset with 1,000 genuinely new, larger systems to investigate how well models generalise. We train and validate widely used and competitive machine learning interatomic potential (MLIP) models, including EquiformerV2, MACE, MatRIS, and UPET, that accurately predict atomic forces and formation energies -- quantities that DFT must repeatedly compute for structural and catalytic investigations -- for these 2D materials. This combined DFT-ML framework achieves computational acceleration on the order of approximately $(1\text{--}4) \cdot 10^3$ (on a CPU) while maintaining desired-level accuracy (approximately $\pm 10$ meV/Å for forces and approximately $\pm 1$ meV for per-atom energies), paving the way for more efficient investigations of MXene catalytic behaviour. Moreover, we perform an extensive qualitative evaluation of the trained models, showcasing the importance of comprehensive simulation-based comparison beyond benchmark metrics. The dataset and the trained models with the code are available at https://huggingface.co/datasets/CatalystAnonymous/catalyst_mxenes.

2606.00780 2026-06-02 cs.LG cs.AI 版本更新

Behavior-Invariant Task Representation Learning with Transformer-based World Models for Offline Meta-Reinforcement Learning

基于Transformer世界模型的行为不变任务表示学习用于离线元强化学习

Fuyuan Qian, Menglong Zhang, Song Wang, Quanying Liu

发表机构 * National University of Singapore（新加坡国立大学）； University of Science and Technology of China（中国科学技术大学）

AI总结提出一种结合信息论任务表示学习与Transformer随机世界模型的框架，通过提取行为不变的任务变量和保守值惩罚，解决离线元强化学习中的分布偏移和稀疏奖励问题，实现鲁棒泛化。

Comments ICML2026

详情

AI中文摘要

离线元强化学习利用静态数据集使智能体能够通过结合离线效率与元学习适应性来泛化到未见环境，但它面临来自上下文和策略分布偏移的关键挑战。这些问题阻碍智能体适应在线环境，并在稀疏奖励设置下进一步加剧。结果，智能体常常陷入固有的模式困境，无法实现鲁棒的泛化。在这项工作中，我们提出了一种新颖的框架，将信息论任务表示学习与基于Transformer的随机世界模型相结合。我们的方法提取对行为策略不变的任务定义潜在变量，从而有效缓解上下文分布偏移。为了进一步处理策略偏移和模型利用，我们对基于想象力的轨迹应用保守值惩罚，防止策略利用模型不准确性，同时保持鲁棒适应。大量评估表明，我们的方法在分布外和稀疏奖励设置下优于最先进的方法，具有优越的稳定性和泛化能力。

英文摘要

Offline meta-reinforcement learning leverages static datasets to enable agents to generalize to unseen environments by combining offline efficiency with meta-learning adaptability, yet it faces key challenges from context and policy distribution shifts. These issues hinder agents from adapting to online environments, and are further exacerbated under sparse-reward settings. As a result, agents often become trapped in an inherent pattern dilemma, failing to achieve robust generalization. In this work, we propose a novel framework that integrates information-theoretic task representation learning with a Transformer-based stochastic world model. Our approach extracts task-defining latent variables that are invariant to behavior policy, thereby effectively mitigating the context distribution shift. To further handle policy shift and model exploitation, we apply a conservative value penalty to imagination-based rollouts, preventing the policy from exploiting model inaccuracies while maintaining robust adaptation. Extensive evaluations demonstrate that our method outperforms state-of-the-art approaches, with superior stability and generalization under out-of-distribution and sparse-reward settings.

2606.00776 2026-06-02 cs.LG 版本更新

Latent Diffusion Pretraining for Crystal Property Prediction

晶体性质预测的潜在扩散预训练

Shrimon Mukherjee, Kishalay Das, Partha Basuchowdhuri, Pawan Goyal, Niloy Ganguly

发表机构 * University of California, Berkeley（加州大学伯克利分校）； Indian Institute of Technology, Bombay（印度理工学院孟买分校）

AI总结提出基于潜在扩散的预训练框架CrysLDNet，结合变分自编码器和扩散模型，从无标注晶体结构中学习表示，微调后显著提升性质预测性能。

Comments Published in ICML 2026

详情

AI中文摘要

快速准确地预测晶体性质是新材料设计中的核心挑战。图神经网络和基于Transformer的模型由于能够编码晶体中原子的局部结构环境，已成为此任务的有力工具。然而，这些模型需要大量数据，而实践中晶体性质的标注数据稀缺。预训练-微调策略，特别是基于扩散模型的策略，在解决这些限制方面显示出前景。在这项工作中，我们引入了一个新颖的基于潜在扩散的预训练框架CrysLDNet，旨在缓解数据稀缺问题。我们的方法在预训练阶段将变分自编码器（VAE）与扩散模型相结合。VAE编码器将3D晶体结构映射到平滑的潜在空间，在该空间中应用扩散过程。这种潜在扩散预训练使图编码器能够从大规模无标注数据中有效捕获结构和化学语义，然后可以针对特定性质预测任务进行微调。在流行的DFT数据集上进行性质预测的综合实验表明，CrysLDNet显著优于从头训练和预训练的基线，在JARVIS和MP数据集上分别提高了4.26%和4.90%。此外，学习到的表示在稀疏数据条件下保持鲁棒，并且具有足够的表达能力，可以在有限实验数据微调时纠正DFT误差。代码可在https://github.com/shrimonmuke0202/CrysLDNet.git获取。

英文摘要

Fast and accurate prediction of crystal properties is a central challenge in new materials design. Graph neural networks and Transformer-based models have emerged as powerful tools for this task due to their ability to encode the local structural environment of atoms within a crystal. However, these models are data-hungry, and in practice, labeled data for crystal properties are scarce. Pretraining-finetuning strategies, particularly those based on diffusion models, have shown promise in addressing these limitations. In this work, we introduce a novel latent diffusion based pretraining framework, CrysLDNet, designed to mitigate data scarcity. Our approach integrates a Variational Autoencoder (VAE) with a diffusion model during the pretraining stage. The VAE encoder maps 3D crystal structures into a smooth latent space within which the diffusion process is applied. This latent diffusion pretraining enables the graph encoder to effectively capture structural and chemical semantics from large-scale unlabeled data, which can then be finetuned for specific property prediction tasks. Comprehensive experiments on popular DFT datasets for property prediction reveal that CrysLDNet significantly outperforms both training-from-scratch and pretrained baselines, with improvements of 4.26% and 4.90% on the JARVIS and MP datasets, respectively. Additionally, the learned representations remain robust in sparse-data conditions and are expressive enough to correct DFT errors when finetuned with limited experimental data. Code is available at: https://github.com/shrimonmuke0202/CrysLDNet.git.

2606.00771 2026-06-02 cs.LG cs.AI cs.SD 版本更新

Logit Distillation on Manifolds: Mapping by Learning

流形上的Logit蒸馏：通过学习进行映射

Yiru Yang, Junling Wang, Nishant Kumar Singh, Luohong Wu, Haoran Yan

发表机构 * University of Zurich（苏黎世大学）； ETH Zurich（苏黎世联邦理工学院）； Deutsche Bank Securities（德意志银行证券公司）

AI总结提出一种逐层、逐点的投影映射方法，将学生和教师表示对齐到高维嵌入空间，结合LoRA注入，在显著减少可训练参数的同时改善词错误率（WER）。

详情

AI中文摘要

提高几乎任何机器学习模型性能的一种简单方法是，不训练单个模型，而是训练多个使用不同算法的模型，这些模型对相同数据做出略有不同的预测和错误，从而提高平均预测和鲁棒性。然而，使用整个模型集成进行预测是繁琐且计算成本过高的，无法部署给大量用户，特别是当模型是大型神经网络时。为此，我们引入了一种逐层、逐点的投影映射，在训练过程中将学生和教师表示映射到对齐的高维嵌入空间。所提出的方法结合LoRA注入，将学生模型的可训练参数减少到教师模型的不到1%，同时与其他蒸馏方法相比，显著改善了词错误率（WER），如消融研究所示。与专家混合不同，我们的方法可以快速并行训练。

英文摘要

A simple way to improve the performance of almost any machine learning model is to train not a single model but several models with diverse algorithms, which make slightly different kinds of predictions and errors on the same data and thus improve average predictions and robustness. However, making predictions using a whole ensemble of models is cumbersome and computationally too expensive to allow deployment to a large number of users, especially if the models are large neural nets. In response, we introduce a layer-wise and point-wise projection mapping, which maps student and teacher representations into an aligned high-dimensional embedding space during training. The proposed approach, combined with LoRA injection, reduces the student model's trainable parameters to less than 1% of the teacher model's while significantly improving word error rate (WER) compared to other distillation methods, as demonstrated in ablation studies. Unlike a mixture of experts, our method can be trained rapidly and in parallel.

2606.00761 2026-06-02 cs.LG cs.CL 版本更新

内化温度：面向强化学习的同策略自蒸馏作为策略加热器

Xuewei Yang, Jiachen Yu, Jie Wu, Shaoning Sun, Junjie Wang, Yujiu Yang

发表机构 * Tsinghua University（清华大学）

AI总结提出温度缩放同策略自蒸馏（TS-OPSD），通过将温度探索效应内化到模型参数中，缓解强化学习中的熵崩溃问题，无需外部教师或额外推理成本。

详情

AI中文摘要

基于可验证奖励的强化学习提升了大语言模型的推理能力，但常常遭受熵崩溃，即日益集中的策略减少了轨迹多样性和有用的学习信号。现有补救措施要么约束强化学习目标（如熵正则化），要么在轨迹收集期间调整采样温度，但这些干预措施仍外在于模型参数。我们提出温度缩放同策略自蒸馏（TS-OPSD），一种轻量级的策略加热方法，将温度的探索效应内化到模型参数中。从熵崩溃的强化学习检查点开始，TS-OPSD 通过对模型自身的 logits 应用高温缩放来构建自教师，然后将得到的更平滑分布蒸馏回学生。这种策略加热不需要外部教师、特权数据或额外的推理成本。在 Qwen3-4B-Base 和 Qwen3-8B-Base 上的实验表明，策略加热为继续强化学习提供了比标准继续强化学习和轨迹级温度加热更强的初始化。进一步分析表明，TS-OPSD 主要降低输出锐度，同时保留中间表示、顶级候选集和推理能力。这些结果表明，熵恢复可以作为面向推理的强化学习的一种简单的崩溃后干预措施。
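
The core step described above (a self-teacher built by applying a high temperature to the model's own logits, then distilled back into the student) can be illustrated on a single next-token distribution. This is a minimal sketch under stated assumptions: the logits, the temperature T = 2, and the KL(teacher || student) direction are illustrative choices, not values or details confirmed by the abstract.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature, computed in a numerically stable way."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

logits = [6.0, 1.0, 0.5, 0.0]        # hypothetical sharp ("collapsed") logits
student = softmax(logits, 1.0)       # the current policy
teacher = softmax(logits, 2.0)       # self-teacher at an assumed temperature T = 2
loss = kl_divergence(teacher, student)  # assumed distillation objective
```

In this toy case the teacher has higher entropy than the student while keeping the same top-ranked token, which mirrors the abstract's claim that the method lowers output sharpness while preserving the top candidate set.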

SORA：快速对抗训练中的自由二阶攻击

Mazdak Teymourian, Ramtin Moslemi, Farzan Rahmani, Mohammad Hossein Rohban

发表机构 * Department of Computer Engineering, Sharif University of Technology, Tehran, Iran（谢里夫理工大学计算机工程系，德黑兰，伊朗）

AI总结针对快速对抗训练中的灾难性过拟合问题，提出通过扰动变异性和梯度对齐指标PertAlign来预测并防止过拟合，并设计自适应步长方法SORA，实现最优鲁棒性和干净准确率。

Comments Accepted at ICML 2026

详情

AI中文摘要

对抗训练（AT）是对抗性样本的主要防御手段，但在高效的单步变体中常常遭受灾难性过拟合（CO），即尽管单步性能很高，但对多步攻击的鲁棒性却崩溃。我们通过两个贡献来解决这种失效模式。首先，我们形式化了epsilon过拟合（EO），这是一种固定扰动幅度和方向加剧CO的视角，并表明引入扰动变异性可以显著提高不同架构和数据集上的鲁棒泛化能力。其次，我们提出了PertAlign（扰动对齐），这是一种理论上合理、计算开销可忽略的指标，通过测量攻击阶段的梯度对齐来预测CO的发生。利用这些见解，我们引入了SORA，一种自适应步长的AT方法，它根据损失曲面几何动态调整扰动。SORA始终能防止CO，实现最先进的鲁棒性和干净准确率，并使用一组固定的超参数在数据集和架构上泛化，这对于快速AT的适用性至关重要。在不同数据集和架构上的大量实验表明，SORA在提供更高干净准确率和卓越效率的同时，匹配或超越了先前方法的鲁棒性。代码可在https://github.com/SecondOrderAT/SORA获取。

英文摘要

Adversarial Training (AT) is a leading defense against adversarial examples but often suffers from Catastrophic Overfitting (CO) in efficient single-step variants, where robustness to multi-step attacks collapses despite high single-step performance. We address this failure mode with two contributions. First, we formalize Epsilon Overfitting (EO), a perspective in which fixed perturbation magnitudes and directions exacerbate CO, and show that introducing perturbation variability significantly improves robust generalization across different architectures and datasets. Second, we propose PertAlign (Perturbation Alignment), a theoretically grounded, computationally negligible metric that predicts CO onset by measuring gradient alignment across attack stages. Leveraging these insights, we introduce SORA, an adaptive step-size AT method that dynamically adjusts perturbations based on loss surface geometry. SORA consistently prevents CO, achieves state-of-the-art robustness and clean accuracy, and generalizes across datasets and architectures using a single fixed set of hyperparameters, which is essential for applicability in fast AT. Extensive experiments on diverse datasets and architectures show that SORA matches or surpasses the robustness of prior methods while delivering higher clean accuracy and superior efficiency. Code is available at https://github.com/SecondOrderAT/SORA.

2606.00735 2026-06-02 cs.DC cs.LG 版本更新

ViBE: Co-Optimizing Workload Skew and Hardware Variability for MoE Serving

ViBE: 针对MoE服务的工作负载偏斜与硬件变异性协同优化

Seokjin Go, Marko Scrbak, Ephrem Wu, Srilatha Manne, Divya Mahajan

发表机构 * Georgia Institute of Technology（佐治亚理工学院）； Advanced Micro Devices, Inc.（超威半导体公司，AMD）

AI总结提出ViBE框架，通过感知硬件的专家放置方法，结合GPU性能建模与专家激活分析，最小化分布式MoE推理中的执行时间不平衡，显著提升SLO达标率并降低P90 TTFT。

详情

AI中文摘要

在分布式混合专家（MoE）推理中，依赖于输入的令牌路由与GPU性能变异性相互作用，在同步执行下产生持续的掉队者，其中最慢的GPU决定层延迟。这种性能变异性是现代加速器固有的：制造差异、功率限制和热条件在名义上相同的GPU之间引入了可测量的执行时间差异。核心挑战在于MoE执行时间不平衡源于工作负载偏斜和硬件不对称的相互作用。令牌路由产生不均匀且逐层变化的专家负载，而GPU吞吐量取决于设备特定的操作特性和工作负载强度。先前的工作缓解了路由偏斜，但假设硬件同质，优化令牌平衡而非执行延迟。因此，即使平衡的令牌分配也可能留下硬件引起的掉队者未解决。为此，我们提出了变异性感知的专家分箱（ViBE），一种硬件感知的专家放置框架，旨在最小化跨GPU的执行时间不平衡。ViBE结合了每GPU性能建模与专家激活分析，将高负载专家分配给更快的设备，低负载专家分配给较慢的设备，从而在不修改模型语义或硬件的情况下减少层级别的掉队者。由于工作负载特征和有效GPU吞吐量可能随服务条件变化，ViBE支持在负载/性能漂移下进行轻量级重新校准，以在需要时刷新其路由和性能估计。结果表明，ViBE持续减少执行时间不平衡，并将SLO达标率提高14%，同时将P90 TTFT降低高达45%。我们进一步表明，硬件变异性的影响在规模扩大时增加，使得变异性感知的放置对于高效、高利用率的LLM服务至关重要。

英文摘要

In distributed Mixture-of-Experts (MoE) inference, input-dependent token routing interacts with GPU performance variability to create persistent stragglers under synchronized execution, where the slowest GPU determines layer latency. This performance variability is inherent to modern accelerators: manufacturing variation, power limits, and thermal conditions introduce measurable execution-time differences across nominally identical GPUs. The core challenge is that MoE execution-time imbalance arises from the interaction of workload skew and hardware asymmetry. Token routing produces uneven and layer-varying expert loads, while GPU throughput depends on device-specific operating characteristics and workload intensity. Prior work mitigates routing skew but assumes homogeneous hardware, optimizing token balance rather than execution latency. As a result, even balanced token assignments can leave hardware-induced stragglers unaddressed. Thus, we propose Variability-Informed Binning of Experts (ViBE), a hardware-aware expert placement framework that minimizes execution-time imbalance across GPUs. ViBE combines per-GPU performance modeling with expert activation profiling to assign high-load experts to faster devices and low-load experts to slower ones, reducing layer-level stragglers without modifying model semantics or hardware. Because both workload characteristics and effective GPU throughput can shift across serving conditions, ViBE supports lightweight recalibration under workload/performance drift to refresh its routing and performance estimates when needed. Results show that ViBE consistently reduces execution-time imbalance and improves SLO attainment by 14%, while lowering P90 TTFT by up to 45%. We further show that the impact of hardware variability increases at scale, making variability-aware placement important for efficient, high-utilization LLM serving.
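
The placement idea (high-load experts on faster devices so that per-GPU execution time, not token count, is balanced) can be sketched with a simple greedy heuristic. This is not ViBE's actual algorithm, which the abstract does not specify: the heaviest-first rule, the load / speed time model, the slot limit, and all numbers below are illustrative assumptions.

```python
def place_experts(expert_loads, gpu_speeds, slots_per_gpu):
    """Greedy placement: heaviest expert first, onto the GPU whose
    estimated finish time (assigned load / speed) stays lowest."""
    assigned = {g: [] for g in range(len(gpu_speeds))}
    load = [0.0] * len(gpu_speeds)
    order = sorted(range(len(expert_loads)), key=lambda e: -expert_loads[e])
    for e in order:
        # Only GPUs with a free expert slot are candidates.
        candidates = [g for g in assigned if len(assigned[g]) < slots_per_gpu]
        best = min(candidates,
                   key=lambda g: (load[g] + expert_loads[e]) / gpu_speeds[g])
        assigned[best].append(e)
        load[best] += expert_loads[e]
    times = [load[g] / gpu_speeds[g] for g in range(len(gpu_speeds))]
    return assigned, times

expert_loads = [900, 700, 400, 300, 200, 100, 80, 60]  # hypothetical tokens per expert
gpu_speeds = [1.10, 1.00, 0.95, 0.90]                  # hypothetical relative throughput
placement, times = place_experts(expert_loads, gpu_speeds, slots_per_gpu=2)
```

Under synchronized execution the layer latency is `max(times)`, so the quantity to compare across placement strategies is the slowest GPU's estimated time rather than the spread of token counts.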

2606.00717 2026-06-02 cs.LG cs.AI stat.ML 版本更新

Multi-Agent Conformal Prediction with Personalized Statistical Validity

具有个性化统计有效性的多智能体共形预测

Martin V. Vejling, Christophe A. N. Biscio, Adrien Mazoyer, Petar Popovski, Shashi Raj Pandey

发表机构 * Department of Electronic Systems（电子系统系）； Aalborg University（奥尔堡大学）； Department of Mathematical Sciences（数学科学系）； Institut de Mathématiques de Toulouse（图卢兹数学研究所）； Université de Toulouse（图卢兹大学）

AI总结提出个性化联邦加权共形预测框架，通过局部密度比加权和加权分位数聚合，在保护隐私的同时纠正数据异质性，为每个参与智能体提供渐近有效的边际和校准条件覆盖保证。

详情

AI中文摘要

不确定性量化在高风险机器学习任务中至关重要。然而，共形预测这一原则性解决方案在局部校准数据有限、隐私约束和数据异质性下面临挑战。在多智能体设置中，现有工作无法同时令人满意地解决这些挑战，其保证要么限于智能体间的平均值，要么在异质性设置中失去有效性。因此，我们提出个性化联邦加权共形预测（PFWCP），该框架结合局部密度比加权与加权分位数聚合，以在保护隐私的同时纠正异质性。该方法为每个参与智能体提供渐近有效的边际和校准条件覆盖保证，并支持一次性通信协议。理论分析呈现了对覆盖方差的调整，该调整由有效样本量表达式控制，这在加权共形预测的背景下是必要的，并且在合成和真实数据集上的实验表明，与最先进的联邦共形基线相比，校准质量有所提高。

英文摘要

Uncertainty quantification is essential in high-stakes machine learning tasks. However, one of the principled solutions, conformal prediction, faces challenges under limited local calibration data, privacy constraints, and data heterogeneity. In multi-agent settings, existing works do not simultaneously and satisfactorily address these challenges with guarantees either limited to averages across agents or losing validity in heterogeneous settings. Hence, we propose personalized federated weighted conformal prediction (PFWCP), a framework that combines local density ratio weighting with weighted quantile aggregation to correct for heterogeneity while preserving privacy. The method yields asymptotically valid marginal and calibration-conditional coverage guarantees for each participating agent and supports protocols with one-shot communication. Theoretical analysis presents an adjustment to the coverage variance, governed by an effective sample size expression, which is necessary in the context of weighted conformal prediction, and experiments on synthetic and real datasets show improved calibration quality over state-of-the-art federated conformal baselines.
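
The weighted-quantile step that weighted conformal prediction relies on can be sketched as follows. This shows the standard single-agent weighted conformal quantile, with the test point's mass placed at infinity; it is not the PFWCP aggregation protocol itself, and the scores, weights, and small floating-point tolerance are illustrative assumptions.

```python
import math

def weighted_quantile(scores, weights, test_weight, alpha):
    """(1 - alpha) quantile of the weighted empirical distribution of
    calibration scores, with the test point's mass placed at +infinity."""
    total = sum(weights) + test_weight
    threshold = (1.0 - alpha) * total - 1e-12  # tolerance guards against rounding
    cumulative = 0.0
    for score, w in sorted(zip(scores, weights)):
        cumulative += w
        if cumulative >= threshold:
            return score
    return math.inf  # not enough calibration mass: the prediction set is unbounded

# Hypothetical nonconformity scores from a local calibration set.
scores = [0.3, 0.1, 0.9, 0.5, 0.7, 0.2, 0.8, 0.4, 0.6]
uniform = [1.0] * len(scores)  # density ratio of 1 everywhere (no shift)
q = weighted_quantile(scores, uniform, test_weight=1.0, alpha=0.2)
```

With uniform weights this reduces to the usual split-conformal quantile. Density-ratio weights that upweight calibration points resembling the target agent's data shift the threshold accordingly, and a very large test weight yields an infinite threshold, reflecting a small effective sample size.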

2606.00716 2026-06-02 cs.LG eess.SP 版本更新

Graph Transfer Learning via Shared Latent Geometry: Theory and Applications

基于共享潜在几何的图迁移学习：理论与应用

Tong Wu, Andrew Campbell, Anna Scaglione

发表机构 * University of Central Florida, USA（佛罗里达中央大学）； Cornell University, USA（康奈尔大学）

AI总结提出一种非对称双路径架构，通过教师编码器从高保真模拟器学习算子多项式特征，学生编码器从稀疏数据学习相同潜在几何，实现零样本迁移并给出可证明的误差界。

详情

AI中文摘要

在工程物理系统的推理与控制中，部署时面临高昂的物理代价：状态估计器、逆问题求解器、模型预测控制器、调度器和观测器通常没有闭式解，必须针对每个实例重新求解数值优化问题，且每次需重新提供算子。物理信息学习将这一代价转移到训练阶段，但使用单一编码器路径，其潜在几何在微调时会退化，且无法提供定量迁移保证。我们提出一种非对称双路径架构来解决这两个问题。教师编码器从高保真模拟器中获取特权密集状态，并通过在谱扰动下稳定的算子多项式特征表示系统；学生编码器从稀疏现场数据和算子描述符学习相同的潜在几何。部署时丢弃教师，冻结的学生编码器通过单次前向传播运行，并附带迁移证书。该设计关联了特权信息学习、知识蒸馏和跨模态蒸馏，但目标是跨实例迁移而非固定实例预测：拓扑和算子可以变化，而潜在任务不变。我们通过潜在律之间的Wasserstein距离建立了充分且近乎必要的迁移条件，得到了零样本误差界，并开发了一种在覆盖不完全时主动扩展的有限样本认证协议。该框架适用于任何具有可报告谱的算子的系统。在电力系统估计中，它实现了对100种未见拓扑的零样本迁移，95%的证书通过率，与拓扑感知的牛顿-拉夫逊方法相当的精度，以及亚毫秒级推理。这些结果表明，非对称路径加上算子锚定的潜在几何为认证的零样本推理与控制奠定了基础。

英文摘要

Inference and control in engineered physical systems pay a heavy physics cost at deployment: state estimators, inverse-problem solvers, model-predictive controllers, schedulers, and observers are often not closed-form and must re-solve a numerical optimization per instance, with the operator re-supplied each time. Physics-informed learning moves this cost to training, but uses a single encoder pathway whose latent geometry de-learns under fine-tuning and admits no quantitative transfer guarantee. We propose an asymmetric two-pathway architecture that resolves both issues. A teacher encoder consumes privileged dense states from a high-fidelity simulator and represents the system through operator-polynomial features stable under spectral perturbation; a student encoder learns the same latent geometry from sparse field data and operator descriptors. At deployment the teacher is discarded, and the frozen student runs in a single forward pass with a transfer certificate. The design connects to privileged-information learning, knowledge distillation, and cross-modal distillation, but targets cross-instance transfer rather than fixed-instance prediction: topology and operator may change, while the latent task does not. We establish sufficient and near-necessary transfer conditions via Wasserstein proximity between latent laws, yielding a zero-shot error bound, and develop a finite-sample certification protocol with active expansion when coverage is incomplete. The framework applies wherever a system admits an operator with reportable spectrum. On power-system estimation, it achieves zero-shot transfer to 100 unseen topologies, a 95% certificate pass rate, accuracy competitive with topology-aware Newton--Raphson, and sub-millisecond inference. These results suggest asymmetric pathways plus operator-anchored latent geometry provide a foundation for certified zero-shot inference and control.
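
The notion of a transfer condition based on Wasserstein proximity between latent laws can be made concrete in one dimension. The sketch below is a deliberate simplification: it assumes scalar latents and equal sample sizes, and the `certify` rule with a fixed tolerance is a hypothetical stand-in for the paper's certification protocol, which the abstract does not detail.

```python
def wasserstein_1d(a, b):
    """W1 distance between two equal-size 1-D samples: the mean absolute
    difference of sorted values (the optimal coupling in one dimension)."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

def certify(source_latents, target_latents, tolerance):
    """Hypothetical pass/fail rule: accept transfer when the latent laws are close."""
    return wasserstein_1d(source_latents, target_latents) <= tolerance

source = [0.0, 1.0, 2.0]   # hypothetical latents from training topologies
target = [1.0, 2.0, 3.0]   # hypothetical latents from an unseen topology
distance = wasserstein_1d(source, target)
passed = certify(source, target, tolerance=0.5)
```

A certificate of this kind only says the two latent distributions are close; turning that into an error bound additionally requires regularity of the downstream decoder, which is the part the paper's theory is said to supply.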

2606.00708 2026-06-02 cs.AI cs.LG 版本更新

MOSAIC: Modular Orchestration for Structured Agentic Intelligence and Composition

MOSAIC：结构化智能体智能与组合的模块化编排

Yifan Bao, Xinyu Xi, Xinyu Liu, Wen Ge, Lei Jiang, Kevin Zhang, Raad Khraishi, Yihao Ang, Anthony K. H. Tung, Lukasz Szpruch, Hao Ni

发表机构 * Department of Computer Science, National University of Singapore（新加坡国立大学计算机科学系）； University College London（伦敦大学学院）； University of Edinburgh（爱丁堡大学）； Data & Analytics, Digital X（Digital X 数据与分析部）； Alan Turing Institute（艾伦·图灵研究所）

AI总结提出MOSAIC框架，通过结构化智能体编排、记忆驱动的模型选择和蓝图构建，将自动化数据科学转化为可验证、可复用的模型选择问题，在金融时间序列任务中优于AutoML和智能体基线。

详情

AI中文摘要

自动化数据科学是一个结构化的模型选择问题。解决方案必须为任务选择数据转换、特征表示、架构、训练过程、评估协议和优化策略。AutoML系统自动化了该过程的部分环节，但通常是在预定义的流水线、模型和超参数空间内搜索。基于LLM的智能体通过检索、代码生成和执行反馈提供了更大的灵活性，但其建模决策通常是非结构化的、难以验证且难以复用。我们引入了MOSAIC（结构化智能体智能与组合的模块化编排），一个用于记忆驱动的模型选择和工作流构建的结构化智能体框架。给定任务和数据集，MOSAIC构建语义任务画像，检索先前的案例和源代码模块，并构建蓝图：一个指定所选建模组件、组合、接口约束和执行需求的中间表示。该蓝图将模型选择转化为分阶段、上下文驱动的搜索，并将基于LLM的代码生成建立在检索证据而非无约束合成之上。候选模型通过执行验证，并使用诊断反馈、训练轨迹、任务指标以及一个失败感知的强化学习策略进行优化。我们在金融时间序列预测和生成任务上实例化了MOSAIC，其中模型必须满足预测准确性、分布保真度、执行可靠性以及下游金融标准（如风险和尾部行为）。与AutoML和智能体基线的实验表明，MOSAIC提高了任务性能、执行成功率和决策可追溯性，证明了将自动化数据科学视为结构化、可复用且基于执行的模型选择的价值。

英文摘要

Automated data science is a structured model-selection problem. A solution must choose data transformations, feature representations, architecture, training procedure, evaluation protocol, and refinement strategy for a task. AutoML systems automate parts of this process, but typically search within predefined pipeline, model, and hyperparameter spaces. LLM-based agents offer greater flexibility through retrieval, code generation, and execution feedback, yet their modelling decisions are often unstructured, difficult to verify, and hard to reuse. We introduce \textsc{MOSAIC} (Modular Orchestration for Structured Agentic Intelligence and Composition), a structured agentic framework for memory-grounded model selection and workflow construction. Given a task and dataset, \textsc{MOSAIC} builds a semantic task profile, retrieves prior cases and source-code modules, and constructs a blueprint: an intermediate representation specifying selected modelling components, composition, interface constraints, and execution requirements. This blueprint turns model selection into a staged, context-grounded search and grounds LLM-based code generation in retrieved evidence rather than unconstrained synthesis. Candidate models are validated by execution and refined using diagnostic feedback, training traces, task metrics, and a failure-aware reinforcement learning policy. We instantiate \textsc{MOSAIC} on financial time-series forecasting and generation, where models must satisfy predictive accuracy, distributional fidelity, execution reliability, and downstream financial criteria such as risk and tail behaviour. Experiments against AutoML and agentic baselines show that \textsc{MOSAIC} improves task performance, execution success, and decision traceability, demonstrating the value of treating automated data science as structured, reusable, and execution-grounded model selection.

对齐的辩证法：利用不安全知识实现动态安全路由

Maryam Hashemzadeh, Jerry Huang, Minseon Kim, Marc-Alexandre Côté, Sarath Chandar

发表机构 * Chandar Research Lab（Chandar研究实验室）； Mila – Quebec AI Institute（魁北克AI研究所）； Université de Montréal（蒙特利尔大学）； Microsoft Research（微软研究院）； Polytechnique Montréal（蒙特利尔理工学院）； Canada CIFAR AI Chair（加拿大CIFAR人工智能主席）

AI总结提出SafeMoE框架，通过混合专家模型将不安全知识隔离到领域特定的低秩适配器中，并训练轻量级门控网络动态路由这些专家，在保持安全性的同时生成信息丰富的响应。

详情

AI中文摘要

大语言模型（LLM）对齐的主流范式通过擦除、过滤不安全数据或训练模型严格拒绝有害提示来运作。虽然这种方法能有效降低即时毒性，但根本上限制了模型的认识论范围，导致系统过度谨慎，对敏感但良性的查询输出无信息量的全面拒绝。在这项工作中，我们挑战了不安全数据必须丢弃的正统观念。我们提出了一种对齐的辩证方法，认为不安全数据编码了丰富的、领域特定的知识，对于细致、安全且信息丰富的生成至关重要。为实现这一点，我们引入了SafeMoE，一个混合专家（MoE）框架，将不安全知识隔离到仅在有害语料上训练的领域特定低秩适配器（LoRA专家）中。为了从这些不安全基元中综合安全性，我们使用最小、高度精选的安全信息响应集训练一个轻量级门控网络。在推理时，该路由器动态编排不安全专家，有效引导生成轨迹以利用其深层领域知识，同时严格执行安全约束。在严格的安全基准上的广泛实证评估表明，SafeMoE不仅更安全，安全响应率相对提高了20%以上（绝对增益超过15%），而且在安全性和危害性至关重要时能生成更具信息量的响应。此外，路由机制在未见领域和更广泛的安全任务上表现出强大的零样本泛化能力，无需领域特定监督。我们的发现表明对齐的范式转变：真正的安全不需要掩盖不安全知识，而是需要其受控整合。

英文摘要

The prevailing paradigm in large language model (LLM) alignment operates via erasure, filtering unsafe data or training models to strictly refuse harmful prompts. While effective at reducing immediate toxicity, this approach fundamentally constricts the model's epistemological scope, resulting in over-cautious systems that output uninformative blanket refusals to sensitive yet benign queries. In this work, we challenge the orthodoxy that unsafe data must be discarded. We propose a dialectical approach to alignment, positing that unsafe data encodes rich, domain specific knowledge critical for nuanced, safe, and informative generation. To operationalize this, we introduce SafeMoE, a Mixture-of-Experts (MoE) framework that isolates unsafe knowledge into domain-specific Low-Rank Adapters (LoRA experts) trained exclusively on harmful corpora. To synthesize safety from these unsafe primitives, we train a lightweight gating network using a minimal, highly curated set of safe-informative responses. During inference, this router dynamically orchestrates the unsafe experts, effectively steering the generation trajectory to harness their deep domain knowledge while strictly enforcing safety constraints. Extensive empirical evaluations across stringent safety benchmarks demonstrate that SafeMoE is not only safer, achieving over a 20% relative improvement in safe response rate (more than a 15% absolute gain), but also produces more informative responses when safety and harmfulness are of paramount concern. Furthermore, the routing mechanism exhibits strong zero-shot generalization to unseen domains and broader safety tasks without domain-specific supervision. Our findings suggest a paradigm shift in alignment: true safety requires not the masking of unsafe knowledge, but its controlled integration.
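
The architecture described above (domain-specific LoRA experts combined by a lightweight gating network) can be sketched as a single linear layer with gated low-rank updates. This is a generic illustration, not SafeMoE's implementation: the dense softmax gate, the rank-1 experts, and all matrices below are assumptions, and the real system may route per token, per layer, or with top-k selection.

```python
import math

def matvec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def softmax(z):
    mx = max(z)
    e = [math.exp(x - mx) for x in z]
    s = sum(e)
    return [x / s for x in e]

def gated_lora_forward(x, base_w, experts, gate_w):
    """y = W x + sum_k g_k(x) * B_k (A_k x), with gates g = softmax(G x)."""
    gates = softmax(matvec(gate_w, x))
    y = matvec(base_w, x)
    for g, (a, b) in zip(gates, experts):
        delta = matvec(b, matvec(a, x))  # low-rank update: B (A x)
        y = [yi + g * di for yi, di in zip(y, delta)]
    return y, gates

base_w = [[1.0, 0.0], [0.0, 1.0]]          # frozen base weight (hypothetical)
experts = [
    ([[1.0, 0.0]], [[1.0], [0.0]]),        # expert 1: A is 1x2, B is 2x1
    ([[0.0, 1.0]], [[0.0], [1.0]]),        # expert 2
]
gate_w = [[0.0, 0.0], [0.0, 0.0]]          # untrained gate: uniform routing
y, gates = gated_lora_forward([1.0, 0.0], base_w, experts, gate_w)
```

Only `gate_w` would be trained on the curated safe-informative responses in this picture; the base weight and the expert matrices stay frozen, which is what keeps the router lightweight.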

2606.00685 2026-06-02 cs.LG 版本更新

Prior-Guided Multi-Omic Transformers for Single-Cell Gene Regulatory Network Inference

先验引导的多组学Transformer用于单细胞基因调控网络推断

Tianyang Xu, Tianci Liu, Niraj Rayamajhi, Ryan Patrick, Kranthi Varala, Ying Li, Jing Gao

发表机构 * Elmore Family School of Electrical and Computer Engineering（埃尔莫尔家族电气与计算机工程学院）； Purdue University（普渡大学）； Department of Horticulture and Landscape Architecture（园艺与景观建筑系）； School of Biological Sciences（生物科学学院）

AI总结提出EpiAwareNet框架，通过先验引导的多组学Transformer，结合基因-峰值交叉注意力模块和批量数据先验，从配对单细胞数据中重建基因调控网络。

Comments 12 pages, 6 figures. Accepted to the KDD 2026 AI4Sciences Track

详情

DOI: 10.1145/3770855.3818945

AI中文摘要

基因调控网络（GRN）捕捉转录因子-靶标相互作用，是理解细胞状态调控和疾病的核心。从配对的单细胞转录组和染色质可及性数据重建GRN具有前景但充满挑战：scATAC极其稀疏，且大多数方法依赖于固定的峰值-基因链接和弱监督。我们提出EpiAwareNet，一个先验引导的多组学Transformer框架，仅使用轻量级生物学先验从配对单细胞数据重建GRN。在第一阶段，EpiAwareNet通过基因-峰值交叉注意力模块学习联合基因-峰值表示，实现数据驱动的、基因特异性的可及性信号聚合，而非硬编码的峰值-基因分配。在第二阶段，EpiAwareNet引入批量数据衍生的GRN先验作为噪声正边，在标签稀缺情况下提供弱监督，同时保持对先验噪声的鲁棒性，细化调控分数。在我们的实验中，EpiAwareNet在GRN重建上优于代表性的单组学和多组学基线，并产生更具生物学合理性的GRN，例如改善已知调控相互作用的恢复，这表明当与自适应跨模态表示学习结合时，来自批量数据的轻量级生物学先验可以有效指导单细胞GRN推断。代码和数据将在https://github.com/tianyang-x/EpiAwareNet_pub提供。

英文摘要

Gene regulatory networks (GRNs) capture transcription factor-target interactions and are central to understanding cell-state regulation and disease. Reconstructing GRNs from paired single-cell transcriptomic and chromatin accessibility data is promising but challenging: scATAC is extremely sparse, and most methods rely on fixed peak-to-gene links and weak supervision. We present EpiAwareNet, a prior-guided multi-omic Transformer framework that reconstructs GRNs from paired single-cell data using only lightweight biological priors. In Stage 1, EpiAwareNet learns joint gene-peak representations with a gene-peak cross-attention module, enabling data-driven, gene-specific aggregation of accessibility signals rather than hard-coded peak-to-gene assignments. In Stage 2, EpiAwareNet incorporates a bulk-derived GRN prior as noisy positive edges to provide weak supervision under label scarcity, refining regulatory scores while remaining robust to prior noise. In our experiments, EpiAwareNet improves GRN reconstruction over representative single- and multi-omic baselines and yields GRNs with greater biological plausibility, such as improved recovery of known regulatory interactions, suggesting that lightweight biological priors from bulk data can effectively guide single-cell GRN inference when combined with adaptive cross-modal representation learning. Code and data will be available at https://github.com/tianyang-x/EpiAwareNet_pub.


2606.00677 2026-06-02 cs.LG 版本更新

Limits of Resolution Equivariance in Fourier Neural Operators

傅里叶神经算子中的分辨率等变性极限

Alex Colagrande, Paul Caillon, Eva Feillet, Alexandre Allauzen

发表机构 * Miles Team, LAMSADE, Université Paris Dauphine-PSL（巴黎多菲纳大学-PSL，LAMSADE实验室Miles团队）； Université Paris-Saclay, CNRS, LISN（巴黎-萨克雷大学，CNRS，LISN实验室）； ESPCI PSL, Paris（巴黎ESPCI PSL）

AI总结本文通过对比直接细网格推理与低网格加傅里叶零填充上采样两种策略，发现傅里叶神经算子并不总是能泛化到不同分辨率，并分析了其层间频谱特性，指出非线性混叠是零样本分辨率等变性的主要障碍。

Comments Published as a paper at AI&PDE: ICLR 2026 Workshop on AI and Partial Differential Equations. 6 pages, 2 figures


AI中文摘要

傅里叶神经算子通常被认为能够跨空间分辨率泛化，从而可以在粗网格上训练并在细网格上部署。我们通过对比从训练分辨率 $s$ 到测试分辨率 $S>s$ 时的两种推理选择来检验这一假设：直接在 $S$ 上运行 FNO，或者在 $s$ 上运行并通过傅里叶零填充将预测上采样到 $S$。在达西流问题上，我们观察到直接细网格推理并非总是有益的，甚至可能比低网格加上采样基线更差。我们进一步分析了层间频谱，发现在傅里叶截断下，中间表示的能量越来越集中在低频，而高频输出主要由后期的非线性/解码器阶段产生。这为 FNO 在保留少量模式时仍能表现良好，但对分辨率变化敏感的现象提供了机制性解释。我们的发现强调了一个简单但强大的跨分辨率评估基线，并指出非线性混叠是零样本分辨率等变性的关键障碍。

英文摘要

Fourier Neural Operators are often assumed to generalize across spatial resolutions, enabling training on a coarse grid and deployment on a finer grid. We test this assumption by contrasting two inference-time choices when moving from training resolution $s$ to test resolution $S>s$: running FNO directly at $S$, or running at $s$ and upsampling the prediction to $S$ via Fourier zero-padding. On Darcy flow, we observe that direct fine-grid inference is not reliably beneficial and can be worse than the low-grid-plus-upsampling baseline. We further analyze layerwise spectra and find that, under Fourier truncation, intermediate representations increasingly concentrate energy in low frequencies, with high-frequency output produced mainly by late nonlinear/decoder stages. This offers a mechanistic explanation for why FNO can perform well while retaining few modes, yet remain sensitive under resolution shifts. Our findings highlight a simple but strong baseline for cross-resolution evaluation and point to nonlinear aliasing as a key obstacle to zero-shot resolution equivariance.
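The low-grid-plus-upsampling baseline described in this abstract can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' code: the function name `fourier_upsample` is hypothetical, the field is assumed to be square, real-valued and periodic, and the Nyquist bin of an even-sized grid is copied without splitting, which is exact only for band-limited inputs.

```python
import numpy as np

def fourier_upsample(u, S):
    """Upsample a periodic 2D field u (s x s) to S x S by zero-padding its spectrum.

    Hypothetical helper: assumes a square, real-valued, periodic input.
    """
    s = u.shape[0]
    assert u.shape == (s, s) and S >= s
    # Centre the spectrum so the zero frequency sits at index s // 2.
    spec = np.fft.fftshift(np.fft.fft2(u))
    # Place the coarse spectrum in the middle of a larger, zero-filled one,
    # aligning the zero-frequency bins of both layouts.
    lo = S // 2 - s // 2
    padded = np.zeros((S, S), dtype=complex)
    padded[lo:lo + s, lo:lo + s] = spec
    # Undo the centring, invert, and rescale for the change in grid size
    # (ifft2 divides by S**2, whereas the coarse transform pairs with s**2).
    fine = np.fft.ifft2(np.fft.ifftshift(padded))
    return fine.real * (S / s) ** 2
```

In the setting of the abstract, `u` would be the FNO prediction at the training resolution $s$, and the upsampled result is what gets compared against running the model directly at $S$.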


2606.00675 2026-06-02 cs.LG 版本更新

Mapping the evolution of small reservoirs in Brazil from 1984 to 2025 using deep learning

利用深度学习绘制1984年至2025年巴西小型水库的演变

Kylen Solvik, Luis Gustavo Carvalho, Marcia N. Macedo

AI总结针对巴西小型水库被忽视的问题，采用深度学习计算机视觉模型从Landsat数据中分割小型水库，生成了1984-2025年全国年度水库地图，揭示了水库数量和面积的大幅增长。

Comments 33 pages, 5 figures, 2 tables


AI中文摘要

巴西的水研究在很大程度上忽视了为农业用途（如牲畜饮水、农场规模水电、灌溉和水产养殖）而广泛筑坝的小溪流。这些无处不在的水坝及其水库会改变水温、河流连通性、水生栖息地、温室气体排放和蒸发水损失。绘制小型水库地图具有挑战性，因为需要可靠地检测小型水体并将人工水库与天然湖泊区分开来。因此，大多数区域和全球数据集都将其排除在外。为了解决这一空白，我们训练了一个深度学习计算机视觉模型，利用Landsat 5-9的数据，准确分割巴西境内的小型（<1平方公里）、溪流补给的地表水水库。从1984年到2025年应用我们的模型，我们为整个国家创建了年度水库地图，以评估其数量、大小和分布随时间的变化。检测到的水库数量从263,913个增加到996,245个，增长了近四倍，而它们的总表面积从3510平方公里增加到8550平方公里。据我们所知，这是第一个代表四十年来小型水库演变的全国年度数据集。公开可用的年度地图突出了巴西各地小溪流蓄水工程的范围和累积影响，为管理淡水生态系统和水资源提供了可操作的见解。

英文摘要

Water research in Brazil largely overlooks the widespread damming of small streams for agricultural uses such as watering cattle, farm-scale hydropower, irrigation, and aquaculture. These ubiquitous dams and their reservoirs can alter water temperature, stream connectivity, aquatic habitats, greenhouse gas emissions, and evaporative water losses. Mapping small reservoirs is challenging because it requires reliably detecting small water bodies and distinguishing artificial reservoirs from natural lakes. As a result, most regional and global datasets exclude them. To address this gap, we trained a deep learning computer vision model to accurately segment small ($< 1\,\mathrm{km}^2$), stream-fed, surface water reservoirs in Brazil leveraging data from Landsat 5-9. Applying our model from 1984 to 2025, we created annual reservoir maps for the entire country to evaluate how their count, size, and distribution have changed over time. The number of detected reservoirs grew nearly fourfold from 263,913 to 996,245, while their total surface area increased from 3510 $\mathrm{km}^2$ to 8550 $\mathrm{km}^2$. To our knowledge, this is the first country-wide annual dataset representing the evolution of small reservoirs over four decades. The publicly available annual maps highlight the extent and cumulative impacts of the small stream impoundments across Brazil, providing actionable insights for managing freshwater ecosystems and water resources.


2606.00674 2026-06-02 cs.LG cs.AI 版本更新

The Paradox of Outcome Optimization: A Causal Information-Theoretic Bound on Reasoning Shortcuts in LLMs

结果优化的悖论：LLM中推理捷径的因果信息论界限

Zihan Chen, Yiming Zhang, Wenxiang Geng, Zenghui Ding, Yining Sun

发表机构 * HFIPS, Chinese Academy of Sciences（中国科学院合肥物质科学研究院）； University of Science and Technology of China（中国科学技术大学）

AI总结针对基于结果强化学习的LLM在分布外任务中推理脆弱的问题，提出因果信息论框架解释奖励诱导的流形坍缩，并证明过程奖励模型作为拓扑滤波器可消除低复杂度捷径。


AI中文摘要

通过基于结果的强化学习（RL）对齐的大型语言模型（LLM）经常表现出一种关键失败模式：它们在分布内基准测试上取得高性能，但在分布外（OOD）任务上推理能力脆弱。我们将这种现象称为奖励诱导的流形坍缩。我们建立了一个理论框架，将结构因果模型（SCM）和信息瓶颈（IB）原理联系起来，以解释这一悖论。我们将推理定义为高复杂度的因果过程，将捷径学习定义为利用低复杂度的虚假相关性。在随机梯度下降（SGD）的隐式归纳偏置下，只要训练分布允许对真实因果机制进行“马尔可夫筛选”，优化结果奖励的模型就会偏向于捷径解。我们基于语义覆盖度量（$\eta$）而非样本量推导了一个新的泛化界限，说明了为什么在同质分布上扩展数据可能无法纠正推理缺陷。我们还表明，过程奖励模型（PRM）作为拓扑滤波器，通过强制执行逐步互信息约束，使得低复杂度的捷径流形不可行。这些结果为过程监督在简单信用分配之外的作用提供了数学基础。

英文摘要

Large Language Models (LLMs) aligned via outcome-based Reinforcement Learning (RL) frequently exhibit a critical failure mode: they achieve high performance on in-distribution benchmarks while demonstrating brittle reasoning capabilities on out-of-distribution (OOD) tasks. We term this phenomenon Reward-Induced Manifold Collapse. We establish a theoretical framework bridging Structural Causal Models (SCM) and the Information Bottleneck (IB) principle to explain this paradox. We define reasoning as a high-complexity causal process and shortcut learning as the exploitation of low-complexity spurious correlations. Under the implicit inductive bias of Stochastic Gradient Descent (SGD), models optimized for outcome rewards are biased toward shortcut solutions whenever the training distribution allows for a ``Markovian Screening'' of the true causal mechanism. We derive a new generalization bound based on Semantic Coverage Measure ($η$) rather than sample size, showing why data scaling on homogeneous distributions may fail to correct reasoning flaws. We also show that Process Reward Models (PRMs) function as Topological Filters, enforcing step-wise mutual information constraints that render the low-complexity shortcut manifold inadmissible. These results provide a mathematical grounding for the role of process supervision beyond simple credit assignment.


2606.00672 2026-06-02 cs.AI cs.LG 版本更新

Medication-Aware Financial Exploitation Detection for Alzheimer's Patients Using Edge-Aware Interaction Risk Modeling

基于边缘感知交互风险建模的阿尔茨海默病患者药物感知金融剥削检测

Farzana Akter, Lisan Al Amin, Rakib Hossain, Chaitanya Gunupudi, Faisal Quader

发表机构 * Cognitive Links LLC ； University of Maryland, College Park（马里兰大学帕克分校）

AI总结提出一种药物感知框架，通过同步药物依从性与交易监控，利用交互感知逻辑模型提升对认知风险金融事件的检测，尤其在药物脆弱窗口期召回率从0.7442提升至0.9070。


AI中文摘要

金融剥削对阿尔茨海默病患者日益构成威胁，尤其是在认知稳定性下降期间。传统欺诈检测系统通常仅依赖金融行为，忽略可能改变脆弱性的临床相关因素。本文提出一种药物感知框架，将药物依从性与交易级监控同步，以改进对认知风险金融事件的检测。构建了180名患者45天的混合模拟数据集，产生8,100条药物记录和30,855笔交易。该框架通过纯金融、加性药物感知和交互感知逻辑模型评估金额异常、商家新颖性、交易频率、时间偏差和药物依从性。结果表明，纯金融基线获得了最高的全局F1分数0.5000，但交互感知模型在药物诱导脆弱窗口期内将召回率从0.7442提升至0.9070，并在排名高风险案例中实现了最高平均精度。研究结果表明，药物依从性作为金融风险的上下文修饰因子比作为孤立预测因子更有用。

英文摘要

Financial exploitation is a growing concern for people with Alzheimer's disease, especially during periods of reduced cognitive stability. Conventional fraud detection systems usually rely on financial behavior alone and ignore clinically relevant factors that may alter vulnerability. This paper proposes a medication-aware framework that synchronizes medication adherence with transaction-level monitoring to improve detection of cognitively risky financial events. A hybrid simulation dataset was constructed for 180 patients across 45 days, producing 8,100 medication records and 30,855 transactions. The framework evaluates amount anomaly, vendor novelty, transaction frequency, time deviation, and medication adherence through financial-only, additive medication-aware, and interaction-aware logistic models. Results show that the financial-only baseline obtained the highest global F1-score of 0.5000, but the interaction-aware model improved recall during medication-induced vulnerability windows from 0.7442 to 0.9070 and achieved the highest average precision for ranked high-risk cases. The findings suggest that medication adherence is most useful as a contextual modifier of financial risk rather than as an isolated predictor.
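The difference between the financial-only, additive, and interaction-aware logistic models named in this abstract can be made concrete with a small sketch. Everything below is an assumption for illustration: the function `risk_score`, the binary `missed_dose` indicator, and all weights are hypothetical and are not taken from the paper.

```python
import numpy as np

def risk_score(fin, missed_dose, w, b, gamma=0.0, delta=None):
    """Logistic risk score for one transaction (hypothetical parameterisation).

    fin         : financial features (e.g. amount anomaly, vendor novelty)
    missed_dose : 1.0 inside a medication vulnerability window, else 0.0
    gamma       : additive medication term (additive medication-aware model)
    delta       : per-feature interaction weights (interaction-aware model)
    """
    fin = np.asarray(fin, dtype=float)
    z = b + fin @ np.asarray(w, dtype=float)      # financial-only logit
    z += gamma * missed_dose                      # additive shift
    if delta is not None:
        # Interaction: adherence changes how strongly each financial
        # feature counts, rather than only shifting the baseline.
        z += missed_dose * (fin @ np.asarray(delta, dtype=float))
    return 1.0 / (1.0 + np.exp(-z))
```

With `gamma=0` and `delta=None` this reduces to the financial-only baseline; setting only `gamma` gives the additive variant. The interaction term is what lets adherence act as a contextual modifier of financial risk, which is the role the abstract reports as most useful.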


2606.00671 2026-06-02 cs.AI cs.CL cs.LG 版本更新

AXIOM: A Trust-First Neuro-Symbolic Execution Architecture for Verifiable Mathematical Reasoning

AXIOM: 一种用于可验证数学推理的信任优先神经符号执行架构

Alessio Bruno

发表机构 * Independent researcher（独立研究者）

AI总结提出AXIOM架构，将语言模型限制为规范化器，通过确定性计算机代数系统管道实现可验证的数学推理，在4个MATH类别上达到94.36%的正确率和100%的信任度。

Comments Preprint. 12 pages, 2 figures. Live interactive demo: https://huggingface.co/spaces/Squagghy/axiom-solver. Paper artifact and dataset on Zenodo (concept-DOI): 10.5281/zenodo.20440225


DOI: 10.5281/zenodo.20440225

AI中文摘要

我们提出AXIOM，一种用于自然语言数学推理的信任优先神经符号执行架构。在AXIOM中，语言模型严格作为规范化器：它将非正式问题文本重写为狭窄的模式，由确定性计算机代数系统（CAS）管道消费，该管道推导并验证答案，或作为第一类输出弃权。路由遵循问题形状正则表达式、特定模式提示和封闭形式CAS处理器之间的1:1:1对齐，已交付3100多条这样的路由，并在250多个连续提交中零LOST_CORRECT回归。我们在4个MATH类别上报告了实证结果，累积正确率为94.36%（2,592/2,747），可解析问题的信任度为100.00%（在整个2,747条记录基准测试中零自信错误答案），所有四个领域均高于每个领域70/90/70的阈值，每个领域信任度为100.0%，仅规则处理器的中位延迟为1毫秒（在lm-eval算术20,000条记录基准测试中占88%的记录）。该架构通过公共部署已服务约30,000次生产查询。我们强调的贡献不是最终的准确率数字，而是该架构建立的向前动态：生产中的每个记录弃权在一次发布周期后都是候选正确，因为新任务在不回归注册表的情况下组合。支撑这一特性的操作纪律——数学模板分桶、LOST_CORRECT扫描作为回归预言机、可解析优先接入以及弃权作为第一类输出——构成了一个可迁移的框架，适用于数学之外的值得信赖的神经符号系统。

英文摘要

We present AXIOM, a trust-first neuro-symbolic execution architecture for natural-language mathematical reasoning. In AXIOM, the language model functions strictly as a canonicalizer: it rewrites informal problem text into a narrow schema consumed by a deterministic Computer-Algebra-System (CAS) pipeline, which derives and verifies the answer or abstains as a first-class output. Routing follows a 1:1:1 alignment between problem-shape regex, schema-specific prompt, and closed-form CAS handler, with 3,100+ such routes shipped and zero LOST_CORRECT regressions across 250+ consecutive ship commits. We report empirical results on 4 MATH categories with a cumulative correctness of 94.36% (2,592/2,747) at 100.00% trust on parseable (zero confident-wrong answers across the full 2,747-record benchmark), all four domains above the per-domain 70/90/70 floor with per-domain trust at 100.0%, and median latency of 1 ms on rule-only handlers (88% of records on the lm-eval arithmetic 20,000-record benchmark). The architecture has served ~30,000 production queries through a public deployment. The contribution we emphasize is not a final accuracy figure but the forward dynamic the architecture establishes: every logged abstain in production is a candidate correct after one ship cycle, since new tasks compose without regressing the registry. The operational discipline behind this property -- math-template bucketing, LOST_CORRECT scan as regression oracle, parseable-first onboarding, and abstain as first-class output -- constitutes a transferable framework for trustworthy neuro-symbolic systems beyond mathematics.


2606.00667 2026-06-02 q-bio.NC cs.LG 版本更新

Cortex and subcortex play distinct roles over learning when cortical memory is limited

皮层与皮层下在学习中扮演不同角色：当皮层记忆受限时

Matthew Farrell, Taro Toyoizumi

发表机构 * Laboratory for Neural Computation and Adaptation（神经计算与适应实验室）； RIKEN Center for Brain Science（脑科学研究中心）； Department of Mathematical Informatics, Graduate School of Information Science and Technology（信息科学与技术研究生院数学信息学系）； The University of Tokyo（东京大学）

AI总结通过约束基于模型（model-based）模块的记忆资源，研究皮层与皮层下系统在学习中的功能分离，发现皮层支持通用结构学习，而皮层下回路专门负责基于奖励的学习。

Comments Preprint. 19 pages, 4 figures


AI中文摘要

已有研究提出，大脑将灵活但计算成本高的皮层处理与更简单、成本更低的皮层下机制相结合，以实现比任一系统单独运行更高效的资源利用。尽管这一观点具有吸引力，但探索该假设的理论框架仍然有限。我们扩展了现有框架，其中基于模型（model-based）的模块和无模型（model-free）的模块并行学习，通过显式约束基于模型的模块的记忆资源，并在一个简单的决策设置中研究该约束的影响。记忆约束自然引发了分配记忆资源的策略。我们评估了不同策略在不同情境下的表现，并证明当奖励状态频繁变化时，基于模型的模块将记忆资源用于捕捉环境的通用结构而非利用当前奖励可能更有利。这项工作为学习过程中皮层和皮层下系统的功能分离提供了理论基础：皮层支持通用结构学习，而皮层下回路专门负责基于奖励的学习。我们进一步详细说明了如何在实验数据上检验这些假设。

英文摘要

It has been proposed that the brain integrates flexible, computationally expensive cortical processing with simpler, lower-cost subcortical mechanisms to achieve resource-efficient performance greater than that of either system alone. Despite the allure of this perspective, satisfying theoretical frameworks that explore this hypothesis are still limited. We extend existing frameworks in which a model-based module and model-free module learn in tandem by explicitly constraining the memory resources of the model-based module, and investigate the impact of this constraint in a simple decision-making setting. Memory constraints naturally give rise to strategies for allocating memory resources. We evaluate the performance of different strategies in different situations and demonstrate that when the rewarded states change often, it can be advantageous for the model-based module to focus its memory resources not on exploiting the current reward, but on capturing general structure of the environment. This work provides a theoretical foundation for a functional dissociation between cortical and subcortical systems during learning: the cortex supports general structure learning, while subcortical circuits specialize in reward-based learning. We further detail how these hypotheses can be tested on experimental data.


2606.00666 2026-06-02 cond-mat.mtrl-sci cs.LG physics.chem-ph 版本更新

Manifold Diffusion for Structure Generation of Transition Metal Complexes

过渡金属配合物结构生成的流形扩散

Luca Schaufelberger, Kjell Jorner

发表机构 * Institute of Chemical and Bioengineering, Department of Chemistry and Applied Biosciences, ETH Zurich（苏黎世联邦理工学院化学与生物工程研究所，化学与应用生物科学系）； NCCR Catalysis, Switzerland（瑞士催化中心）

AI总结提出TMCgen流形扩散模型，通过金属-配体配位角与配体扭转/旋转扩散，高效生成过渡金属配合物的精确几何结构。


神经损失如何塑造VAE潜在变量

Giorgio Strano, Luca Cerovaz, Michele Mancusi, Tommaso Mencattini, Emanuele Rodolà

发表机构 * Sapienza University of Rome（罗马萨皮恩扎大学）； Paradigma, Inc.（Paradigma公司）； Moises Systems, Inc.（Moises系统公司）； EPFL（洛桑联邦理工学院）

AI总结本文研究感知损失和对抗损失等神经重建损失如何改变VAE的率失真问题，证明其减少潜在表示信息量并改变潜在空间几何结构，使表示更各向同性且不确定性分布更均匀。


AI中文摘要

现代VAE很少使用标准$β$-VAE目标隐含的点态似然进行训练。在实践中，尽管缺乏对如何改变模型潜在动态的理解，点态重建常与感知损失和对抗损失结合。我们表明，重建损失的选择重塑了率失真问题本身，改变了潜在表示的信息内容和几何结构，这些变化可能仅从重建中无法察觉。首先，我们证明并实证验证，用神经项（如感知和对抗目标）增强点态重建会减少存储在潜在表示中的信息量。其次，我们展示神经重建损失系统地改变了潜在空间的几何结构：它们使表示更各向同性，并更均匀地将不确定性分布在潜在维度上，产生不同的后验方差分布。这些发现强调了率失真权衡并非理解VAE行为的全面视角，我们提出一种更机械的方法来研究失真度量的选择如何重塑优化问题。

英文摘要

Modern VAEs are rarely trained with the pointwise likelihood implied by the standard $β$-VAE objective. In practice, pointwise reconstruction is often combined with perceptual and adversarial losses, despite a lack of understanding of how this changes the latent dynamics of the model. We show that the choice of reconstruction loss reshapes the rate-distortion problem itself, altering both the information content and the geometry of the learned latent space in ways that may be invisible from reconstructions alone. First, we prove and verify empirically that augmenting pointwise reconstruction with neural terms, such as perceptual and adversarial objectives, reduces the amount of information stored in the latent representations. Second, we show that neural reconstruction losses systematically change the geometry of the latent space: they make representations more isotropic and distribute uncertainty more evenly across latent dimensions, producing different posterior variance profiles. These findings highlight how the rate-distortion tradeoff is not a comprehensive lens to understand the behavior of VAEs, and we propose a more mechanistic approach to investigate how the choice of a distortion metric reshapes the optimization problem.


2606.00634 2026-06-02 cs.CL cs.LG 版本更新

French parsing enhanced with a word clustering method based on a syntactic lexicon

基于句法词典的词聚类方法增强的法语解析

Anthony Sigogne, Matthieu Constant, Eric Laporte

发表机构 * Université Paris-Est（巴黎东部大学）； LIGM（加斯帕尔-蒙日信息学实验室）

AI总结本文通过将法语句法词典（Lexicon-Grammar）的数据整合到概率解析器中，并应用聚类方法于法语树库的动词，提高了基于概率上下文无关文法的解析性能。

2606.00629 2026-06-02 cs.SD cs.HC cs.LG eess.AS 版本更新

Quality Audio Prototyping: a prototype system for unified sound retrieval and procedural generation

质量音频原型：统一声音检索与程序化生成的系统原型

Nelly Garcia, Aditya Bhattacharjee, Gabryel Mason-Williams, Israel Mason-Williams, Emmanouil Benetos, Joshua Reiss

发表机构 * GitHub

AI总结提出QuAP系统，通过统一基于内容的音频检索和实时程序化生成，并集成规则辅助参数指导，降低声音设计中的操作距离，经主观评估和用户测试验证了其有效性和实用性。

Comments DaFx 2026


AI中文摘要

声音设计工作流经常在耗时的库搜索和复杂的程序化合成之间摇摆，从业者通常依赖独立的工具分别应对每个挑战。本文介绍了质量音频原型（QuAP），一个工作原型，它在单一界面中统一了基于内容的音频检索和程序化声音生成，减少了叙事概念与其声音实现之间的操作距离。QuAP集成了基于相似性的检索引擎与实时程序化音频模型，并辅以基于规则的助手，提供基于感知的参数指导，给出源自经验优化的定义和建议，而不需要先验的合成知识。初步评估证实了这种方法的可行性：主观评估显示六个嵌入合成模型中有五个在质量上具有统计显著性的提升，编码器消融研究在音效数据集上确立了首选的检索架构。与16名从业者的用户评估证实了该工具的工作流实用性，所有参与者一致认为参数助手在保持创作自主性的同时降低了程序化交互的门槛。

英文摘要

Sound design workflows frequently oscillate between time-consuming library searches and the complexity of procedural synthesis, with practitioners typically relying on disconnected tools to address each challenge separately. This paper introduces Quality Audio Prototyping (QuAP), a working prototype that unifies content-based audio retrieval and procedural sound generation within a single interface, reducing the procedural distance between a narrative concept and its sonic realisation. QuAP integrates a similarity-based retrieval engine with real-time procedural audio models, complemented by a rule-based assistant that provides perceptually informed parameter guidance, offering definitions and recommendations derived from empirical optimisation rather than requiring prior synthesis knowledge. Preliminary evaluation confirms the viability of this approach: subjective assessment demonstrated statistically significant quality improvements in five of six embedded synthesis models, and an encoder ablation study established the preferred retrieval architecture on a sound effect dataset. A user evaluation with 16 practitioners confirmed the tool's workflow utility, with all participants agreeing that the parameter assistant preserved creative agency while lowering the barrier to procedural interaction.


2606.00609 2026-06-02 cs.LG cs.AI 版本更新

CARE-RL: Capability-Aware Reinforcement Learning for Mitigating Cross-Domain Conflicts

CARE-RL：用于缓解跨领域冲突的能力感知强化学习

Rui Zhang, Xinle Wu, Yao Lu

发表机构 * National University of Singapore（新加坡国立大学）

AI总结提出CARE-RL框架，结合协议感知奖励生成与能力感知优化，通过PA-GRM和DACSP方法缓解多领域强化学习中的奖励不可靠与能力干扰问题。


AI中文摘要

具有可验证奖励的强化学习在面向推理的大语言模型中取得了显著进展，但由于非可验证任务中奖励不可靠以及跨领域能力干扰，将其扩展到多领域强化学习仍具挑战性。我们提出CARE-RL，将协议感知奖励生成与能力感知优化相结合，以缓解跨领域冲突。对于非可验证任务，协议感知生成式奖励模型（PA-GRM）在生成轨迹条件奖励之前构建提示级别的评估协议和模式，从而实现对开放式响应的任务自适应且可比较的评估。对于多领域优化，方向感知能力子空间投影（DACSP）从先前的强化学习阶段提取历史能力方向，并通过放大对齐分量、抑制冲突分量以及保留正交更新来调节后续更新。在数学、聊天和指令遵循基准上的实验表明，CARE-RL始终优于标准的多领域强化学习基线，在Qwen2.5-7B和Qwen3-4B上分别达到47.9和50.7的总平均分。

英文摘要

Reinforcement learning (RL) with verifiable rewards has achieved strong progress in reasoning-oriented LLMs, but extending it to multi-domain RL remains challenging due to reward unreliability in non-verifiable tasks and capability interference across domains. We propose CARE-RL to combine protocol-aware reward generation with capability-aware optimization for mitigating cross-domain conflicts. For non-verifiable tasks, the Protocol-Aware Generative Reward Model (PA-GRM) constructs prompt-level evaluation protocols and schemas before producing trace-conditioned rewards, enabling task-adaptive yet comparable evaluation of open-ended responses. For multi-domain optimization, Direction-Aware Capability Subspace Projection (DACSP) extracts historical capability directions from previous RL stages and modulates later updates by amplifying aligned components, suppressing conflicting components, and preserving orthogonal updates. Experiments across math, chat, and instruction-following benchmarks show that CARE-RL consistently outperforms standard multi-domain RL baselines, achieving Total Avg scores of 47.9 and 50.7 on Qwen2.5-7B and Qwen3-4B, respectively.
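The abstract describes DACSP only in words: amplify aligned components, suppress conflicting ones, and keep orthogonal updates unchanged. One possible reading is sketched below. The function `modulate_update`, the scaling factors, and the assumption that the historical capability directions are supplied as orthonormal rows are all hypothetical; the paper's actual formulation may differ.

```python
import numpy as np

def modulate_update(update, directions, amplify=1.5, suppress=0.2):
    """Rescale an update along stored capability directions (illustrative only).

    update     : flat parameter update, shape (dim,)
    directions : orthonormal historical directions as rows, shape (k, dim)
    """
    D = np.asarray(directions, dtype=float)
    g = np.asarray(update, dtype=float)
    assert np.allclose(D @ D.T, np.eye(D.shape[0])), "rows must be orthonormal"
    coeffs = D @ g                       # signed component along each direction
    orthogonal = g - D.T @ coeffs        # part outside the capability subspace
    # Positive coefficient: the update agrees with the stored direction.
    # Non-positive coefficient: it pushes against it, so it is damped.
    scale = np.where(coeffs > 0, amplify, suppress)
    return orthogonal + D.T @ (scale * coeffs)
```

Treating the sign of each coefficient as the alignment test is a design choice of this sketch; it keeps the orthogonal remainder untouched, so learning in directions unrelated to earlier RL stages proceeds as usual.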


2606.00605 2026-06-02 cs.LG stat.ML 版本更新

Looped Transformers with Layer Normalization Provably Learn the Power Method

带有层归一化的循环Transformer可证明地学习幂方法

Lyumin Wu, Chenyang Zhang, Yuan Cao

发表机构 * School of Computing & Data Science, The University of Hong Kong（计算与数据科学学院，香港大学）

AI总结本文通过主成分预测任务，证明带有层归一化的循环线性Transformer在梯度下降训练下会收敛到实现幂方法的解，揭示了层归一化带来的算法隐式偏差。

Comments 70 pages, 8 figures
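For reference, the power method that the summary above refers to is the classical iteration that repeatedly multiplies a vector by a matrix and renormalises it, converging to the leading eigenvector when the top eigenvalue is strictly dominant. The sketch below is the textbook algorithm in NumPy, not the looped Transformer studied in the paper; the function name and defaults are assumptions.

```python
import numpy as np

def power_method(A, num_iters=200, seed=0):
    """Estimate the leading eigenvector of a square matrix A by power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        v = A @ v                  # one "loop": apply the matrix
        v /= np.linalg.norm(v)     # renormalise, the step layer normalization resembles
    return v
```

The per-step normalisation is what keeps the iterate bounded, which gives some intuition for why a looped architecture with layer normalization is a natural candidate for implementing this procedure.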


AI中文摘要

时空多任务图Transformer用于行程级公交预测

Oluwaleke Yusuf, Adil Rasheed, Frank Lindseth

发表机构 * Department of Engineering Cybernetics, Norwegian University of Science and Technology (NTNU)（挪威科技大学（NTNU）工程控制论系）； Department of Computer Science, Norwegian University of Science and Technology (NTNU)（挪威科技大学（NTNU）计算机科学系）

AI总结提出SMT-GraphFormer，一种将行程级公交预测建模为序列到序列问题的时空多任务图Transformer，通过图嵌入、上下文编码器和多门专家混合模块，在挪威特隆赫姆公交数据上优于停靠级基线方法。

Comments 25 pages, 7 figures, 11 tables, including appendix. Code available at https://github.com/Outsiders17711/SMTGraphFormer


AI中文摘要

来自公共交通系统的乘客计数数据揭示了城市出行模式，对于规划、运营和优化至关重要。然而，站点和线路之间的非线性时空相互依赖性使得建模和预测具有挑战性。现有方法通常依赖于固定的时间、空间或站点级公式，限制了它们捕捉行程内演变和网络上下文的能力。本研究提出了SMT-GraphFormer，一种时空多任务图Transformer，将行程级公交预测构建为序列到序列建模。给定一条线路的站点序列和行程级上下文，模型预测连续的上下车人数，并将延误和停靠时间作为编码器侧的辅助任务。关键组件包括用于多关系站点相似性的图嵌入、用于天气和时间信息的上下文编码器，以及一个多门专家混合模块，该模块为上下车预测生成任务特定的解码器表示。对挪威特隆赫姆的公共公交数据进行评估表明，SMT-GraphFormer优于站点级表格基线，消融研究考察了每个组件的贡献。序列化公式在下车预测上取得了显著提升（R²提高+0.24），并在上车、延误和停靠时间上持续改进，证实了显式行程级序列偏差和目标间依赖性的价值。这些发现展示了基于Transformer的序列建模在捕捉公共交通复杂时空动态方面的潜力，并强调了针对公交数据定制的架构相对于现成表格模型的价值。所提出的框架为数字孪生环境中的场景分析提供了与预测范围无关的基础，支持规划者和公交运营商的知情决策。

英文摘要

Passenger count data from public transit systems reveals urban mobility patterns and is essential for planning, operation, and optimisation. However, non-linear spatiotemporal interdependencies across stops and lines make modelling and prediction challenging. Existing approaches often rely on fixed temporal, spatial, or stop-level formulations, limiting their ability to capture within-trip evolution and network context. This study proposes SMT-GraphFormer, a spatiotemporal multi-task graph transformer that frames trip-level transit prediction as sequence-to-sequence modelling. Given a line's stop sequence and trip-level context, the model predicts successive boarding and alighting counts, with delay and dwell time treated as encoder-side surrogate tasks. Key components include graph embeddings for multi-relational stop similarity, a context encoder for weather and temporal information, and a multi-gate mixture-of-experts module that produces task-specific decoder representations for boarding and alighting predictions. Evaluation on public bus transit data from Trondheim, Norway, shows that SMT-GraphFormer outperforms stop-level tabular benchmarks, with ablation studies examining each component's contribution. The sequential formulation yields substantial gains on alighting prediction ($+$0.24 in $R^2$) and consistent improvements on boarding, delay, and dwell, confirming the value of explicit trip-level sequential bias and inter-target dependencies. These findings demonstrate the potential of transformer-based sequence modelling for capturing complex spatiotemporal dynamics in public transit and underscore the value of architectures tailored to transit data rather than off-the-shelf tabular models. The proposed framework provides a horizon-agnostic basis for scenario analysis in digital twin environments, supporting informed decision-making by planners and transit operators.


2606.00571 2026-06-02 cs.LG cs.AI cs.CV 版本更新

On the Difficulty of Learning a Meta-network for Training Data Selection

学习用于训练数据选择的元网络的困难性

Zilin Du, Junqi Zhao, Boyang Albert Li

发表机构 * University of California, Berkeley（加州大学伯克利分校）； Stanford University（斯坦福大学）

AI总结针对元学习训练数据选择（MTS）在实践中表现不佳的问题，本文通过数学分析揭示了梯度信噪比低和缺乏信息特征两大障碍，并提出增大批大小和利用信息特征作为解决方案。


AI中文摘要

合成数据越来越多地被用于训练神经网络，但若不加区分地使用，其与真实数据的分布不匹配会限制其有效性。一种常见策略是通过双层优化学习数据权重，我们称之为元学习训练数据选择（MTS）。有趣的是，在实践中，MTS 往往低于预期。我们识别了正确训练 MTS 的两个障碍：梯度信噪比（GSNR）低导致优化困难，以及缺乏与数据质量相关的信息特征。我们对 MTS 进行了数学分析，揭示了归一化数据权重的动态以及不同数据质量与低 GSNR 之间的关系。分析表明，一个简单而有效的解决方案是增大批大小。此外，我们提出了一组信息特征，用于捕捉训练数据在其分布中的位置和训练动态。在四个基准上的实验显示了一致的改进，与无选择的训练相比平均提升 5.49%，与最强基线相比平均提升 2.89%。

英文摘要

Synthetic data are increasingly used to train neural networks, yet distributional mismatch with real data limits their effectiveness when used indiscriminately. A common strategy is to learn data weights via bi-level optimization, which we refer to as Meta-learning for Training-data Selection (MTS). Interestingly, in practice, MTS often performs below expectation. We identify two obstacles in properly training MTS: a poor gradient signal-to-noise ratio (GSNR), which causes optimization difficulties, and a lack of informative features that correlate with data quality. We present a mathematical analysis of MTS, which reveals the dynamics of normalized data weights and the relation between disparate data quality and poor GSNR. The analysis suggests a simple yet effective solution: increasing the batch size. Further, we propose a set of informative features that capture the positions of training data in their distributions and training dynamics. Experiments across four benchmarks show consistent improvements, achieving average gains of 5.49% over training without selection and 2.89% over the strongest baseline.


2606.00566 2026-06-02 cs.LG cs.CL cs.CR 版本更新

Same Payload, Different Channel: Measuring Trust Asymmetry in Tool-Using Language Models

相同载荷，不同通道：测量使用工具的语言模型中的信任不对称性

Mohammed Sameer Syed, Rozhin Yasaei

发表机构 * University of Arizona（亚利桑那大学）

AI总结本研究提出安全不对称分数（SAS），通过匹配恶意载荷仅改变传递上下文，系统测量了语言模型在不同通道（用户消息、工具元数据、工具输出）中对对抗性内容的脆弱性差异，发现代理原生模型在工具描述通道更脆弱，而通用模型相反，且机制研究表明安全相关表示在深层网络非线性编码。

Comments 13 pages, 1 figure. Submitted to EMNLP 2026


AI中文摘要

随着语言模型承担代理角色，包括调用外部API、读取工具输出以及执行嵌入在第三方内容中的指令，其攻击面远超用户输入。模型是否以相同方式处理恶意指令（无论其来源）尚未被系统研究。我们引入了安全不对称分数（SAS），通过使用匹配的载荷对（保持恶意文本相同，仅改变传递上下文）来测量模型对对抗性内容的敏感性如何随内容出现在用户消息、工具元数据或工具输出中而变化。在6个生产级LLM和三种攻击家族上的评估发现了一致且信息丰富的不对称性：当对抗性内容通过工具描述而非用户消息传递时，代理原生模型显著更脆弱，而通用模型则相反。当相同内容通过工具输出而非描述传递时，这种不对称性进一步反转，表明模型隐含地将工具元数据视为可信指令，而将工具结果视为普通数据。对Llama 3.3 70B的机制研究表明，安全相关表示在网络的中间到深层因果存在但非线性编码，解释了线性探针为何无法检测到它。这些发现揭示了当前使用工具的模型在处理对抗性内容时存在的系统性、通道依赖的盲点。

英文摘要

As language models take on agentic roles that span calling external APIs, reading tool outputs, and acting on instructions embedded in third-party content, their attack surface expands well beyond what users type. Whether a model treats a malicious instruction the same way regardless of where it arrives has not been systematically studied. We introduce the Safety Asymmetry Score (SAS), which measures how much a model's susceptibility to adversarial content shifts depending on whether that content arrives in the user message, tool metadata, or tool output, using matched payload pairs that keep the malicious text identical and vary only the context of delivery. Evaluated across 6 production LLMs and three attack families, we find a consistent and informative asymmetry: agent-native models are substantially more vulnerable when adversarial content arrives via tool descriptions than via user messages, while general-purpose models show the reverse. This asymmetry further inverts when the same content is delivered through tool outputs rather than descriptions, suggesting models implicitly treat tool metadata as trusted instructions and tool results as ordinary data. A mechanistic study on Llama 3.3 70B reveals that the safety-relevant representation is causally present at mid-to-late network depths but non-linearly encoded, explaining why linear probes fail to detect it. These findings expose a systematic, channel-dependent blind spot in how current tool-using models handle adversarial content.


2606.00563 2026-06-02 cs.LG cs.AI stat.ML 版本更新

A Practical Upper Bound on Selection Bias Effects in Medical Prediction Models

医学预测模型中选择偏差影响的一个实用上界

Kara Liu, Maggie Wang, Russ B. Altman

发表机构 * Stanford University（斯坦福大学）

AI总结针对选择偏差导致模型泛化性差的问题，提出在仅部分观测选择机制和目标分布的现实条件下，对目标群体最差模型性能的一个新上界，并通过合成数据和真实数据验证其有效性和实用性。

Comments 32 pages, 27 figures, will be published at ACM SIGKDD '26


DOI: 10.1145/3770855.3818112

AI中文摘要

选择偏差是真实世界数据中常见且往往不可避免的一个方面，它挑战了机器学习模型的泛化性。当在偏倚数据上训练的模型被部署到更广泛的目标群体时，模型泛化能力差可能导致实际危害，尤其是在医疗保健等高危环境中。这种风险凸显了从业者在部署前可靠评估模型泛化性的需求。然而，现有的预测模型性能的方法依赖于不切实际地访问目标分布或了解导致偏差的选择机制。为了解决这些局限性，我们提出了一个新颖的上界，用于在现实设置下目标群体上的最差模型性能，其中选择机制和目标群体数据仅被部分观测。我们通过在完全合成数据、源自All of Us研究计划的半合成数据以及MIMIC-IV中的真实世界选择偏差上的实验，证明了我们方法的有效性和实际效用。我们的工作提供了一个原则性和实用性的工具，用于估计在原本难以处理的情况下选择偏差的影响，从而使从业者能够在医疗保健及其他领域构建更安全、更具泛化性的模型。

英文摘要

Selection bias is a common and often unavoidable aspect of real-world data that challenges the generalizability of machine learning models. When models trained on biased data are deployed in the broader target population, poor model generalization may lead to real harm, particularly in high-risk settings such as healthcare. This risk highlights the need for practitioners to reliably assess model generalizability prior to deployment. However, existing methods for predicting model performance rely on unrealistic access to the target distribution or knowledge of the selection mechanism causing bias. To address these limitations, we propose a novel upper bound on the worst-case model performance on the target population under the realistic setting where the selection mechanism and the target population data are only partially observed. We demonstrate the validity and practical utility of our method through experiments on fully synthetic data, semi-synthetic data derived from the All of Us Research Program, and real-world selection bias in MIMIC-IV. Our work offers a principled and practical tool to estimate the impact of selection bias in an otherwise intractable setting, thereby enabling practitioners to build safer and more generalizable models in healthcare and beyond.


2606.00562 2026-06-02 cs.CV cs.LG 版本更新

DeepLatent: Think with Images via Parallel Latent Visual Reasoning

DeepLatent: 通过并行潜在视觉推理用图像思考

Dongchen Lu, Zhimo Li, Mao Shu, Huo Cao

发表机构 * Baidu Inc.（百度公司）； Peking University（北京大学）

AI总结提出DeepLatent框架，通过LatentFormer并行生成潜在视觉状态，并结合连续空间强化学习优化潜在表示，在多个基准上达到最先进性能。


AI中文摘要

“用图像思考”的新兴范式将视觉状态嵌入中间推理步骤，定义了视觉语言模型的新前沿。现有方法沿两条路线分化。工具辅助方法应用显式视觉操作，但存在高延迟和操作类型受限的问题。潜在推理方法自回归生成隐式视觉状态，但性能不如工具辅助方法，且其潜在标记无法捕获有效的视觉信息。在这项工作中，我们提出DeepLatent，一个用于潜在视觉推理的并行框架。首先，我们引入LatentFormer。它使用可学习的2D标记并行生成上下文条件的潜在状态，将每次视觉更新直接锚定在原始图像特征中。其次，我们设计了一种连续空间强化学习算法。它直接在嵌入空间中优化潜在调制参数，显著提高潜在表示质量。该框架通过知识蒸馏和连续空间强化学习算法进行训练。此外，我们贡献了DeepLatent-180K，一个专为潜在视觉推理定制的大规模数据集。在多个基准上的广泛评估表明，DeepLatent达到了最先进的性能。

英文摘要

The emerging paradigm of "thinking with images" embeds visual states into intermediate reasoning steps, defining a new frontier for Vision-Language Models. Existing approaches diverge along two lines. Tool-assisted methods apply explicit visual operations but suffer from high latency and restricted manipulation types. Latent reasoning methods autoregressively produce implicit visual states, but underperform tool-assisted methods, and their latent tokens fail to capture effective visual information. In this work, we propose DeepLatent, a parallel framework for latent visual reasoning. First, we introduce LatentFormer. It uses learnable 2D tokens to generate context-conditioned latent states in parallel, anchoring every visual update directly in the original image features. Second, we design a continuous-space reinforcement learning algorithm. It optimizes latent modulation parameters directly in the embedding space, significantly improving latent representation quality. The framework is trained via knowledge distillation followed by this continuous-space RL algorithm. Furthermore, we contribute DeepLatent-180K, a large-scale dataset tailored for latent visual reasoning. Extensive evaluations across multiple benchmarks demonstrate that DeepLatent achieves state-of-the-art performance.

2606.00561 2026-06-02 cs.LG cs.AI 版本更新

Interpretable Policy Distillation for Power Grid Topology Control

可解释的策略蒸馏用于电网拓扑控制

Aleksandra Dmitruka, Karlis Freivalds

发表机构 * University of Latvia, Faculty of Exact Sciences and Technology（拉脱维亚大学，精确科学与技术学院）

AI总结提出一种将深度强化学习策略蒸馏为轻量级决策树/随机森林的方法，在保持性能的同时提升可解释性，并揭示表征偏移。

AI中文摘要

深度强化学习为实时电网运行提供了有前景的途径，但大型神经策略评估成本高、难以在受限硬件上部署，且对操作员不透明。我们探究用于电网拓扑控制的近端策略优化（PPO）智能体能否压缩为紧凑的树基替代模型而不损失运行性能。在Grid2Op的标准14节点环境中，使用面向稳定性的奖励，通过压力聚焦的数据收集在关键高负荷状态下训练PPO教师。然后将策略蒸馏为决策树和随机森林。在保留的验证回合中，两个替代模型在平均奖励和生存时长上均超过教师，而推理成本仅为教师的一小部分。决策树与PPO argmax的动作完全一致率较高，且在其排名靠前的动作中几乎完全一致，同时保持足够小以便直接检查。特征重要性分析揭示了表征偏移：PPO策略主要依赖线路负载信号，而蒸馏树主要由母线拓扑变量驱动。这些结果表明，压力聚焦的蒸馏可以将黑箱神经控制器转换为轻量级、可审计的规则类替代模型，适用于实时部署，同时揭示与确定性动作和拓扑特定泛化相关的风险。

英文摘要

Deep reinforcement learning (RL) offers a promising route to real-time power grid operation, yet large neural policies are costly to evaluate, hard to deploy on constrained hardware, and opaque to operators. We ask whether a Proximal Policy Optimization (PPO) agent for grid topology control can be compressed into compact tree-based surrogates without losing operational performance. A PPO teacher is trained on Grid2Op's standard 14-bus environment with a stability-oriented reward, using stress-focused data collection on critical, high-loading states. The policy is then distilled into a decision tree and a random forest. Across held-out validation episodes, both surrogates exceed the teacher in mean reward and survival length at a fraction of the inference cost. The decision tree shows high exact-action agreement with the PPO argmax and near-complete agreement within its top-ranked actions, while remaining small enough to be inspected directly. Feature-importance analysis reveals a representational shift: the PPO policy relies mainly on line-loading signals, while the distilled tree is driven primarily by bus-topology variables. These results suggest that stress-focused distillation can convert a black-box neural controller into a lightweight, auditable rule-like surrogate suited for real-time deployment, while also surfacing risks tied to deterministic actions and topology-specific generalization.
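
The two agreement figures mentioned in the abstract can be made concrete with a short sketch. It assumes "exact-action agreement" means the surrogate picks the teacher's argmax action and "top-ranked agreement" means the surrogate's action falls within the teacher's `k` most probable actions; the function name, the value of `k`, and the toy probabilities are illustrative and not taken from the paper.

```python
def agreement(teacher_probs, student_actions, k=3):
    """Return (exact, top_k) agreement rates between a teacher and a surrogate.

    teacher_probs: one list of action probabilities per state (teacher policy).
    student_actions: the action index chosen by the surrogate in each state.
    """
    exact = 0
    top_k = 0
    for probs, action in zip(teacher_probs, student_actions):
        # Rank the teacher's actions from most to least probable.
        ranked = sorted(range(len(probs)), key=lambda a: probs[a], reverse=True)
        exact += action == ranked[0]
        top_k += action in ranked[:k]
    n = len(student_actions)
    return exact / n, top_k / n


# Toy data: four states, three actions each (hypothetical values).
teacher_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.3, 0.5],
    [0.5, 0.4, 0.1],
]
student_actions = [0, 2, 2, 1]
exact_rate, top2_rate = agreement(teacher_probs, student_actions, k=2)
```

A gap between the two rates indicates states where the surrogate picks a plausible but not identical action, which is worth inspecting before deployment.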

2606.00559 2026-06-02 cs.LG cs.AI 版本更新

Richer Representations for Neural Algorithmic Reasoning via Auxiliary Reconstruction

通过辅助重建实现神经算法推理的更丰富表示

Jiafu Huang, Chao Peng, Chenyang Xu, Zhengfeng Yang, Kecheng Cai, Chenhao Zhang, Yi Wang, Yiwei Gong, Wanqin Zhou, Irene Zheng

发表机构 * sei.ecnu.edu.cn（华东师范大学软件工程学院）

AI总结提出辅助重建模块和自监督学习变体，增强编码器对输入状态信息的保留和特征间依赖的捕捉，从而提升神经算法推理性能。

Comments Appeared at AAAI 2026

AI中文摘要

神经算法推理已成为一个热门研究方向。它旨在训练神经网络模仿经典基于规则的算法的逐步行为。更具体地说，此类算法的执行可以抽象为一系列状态，其中每个状态代表执行步骤后的中间结果。训练目标是生成复制底层算法过程的状态序列。该任务的常见框架采用编码器-处理器-解码器架构，其中编码器学习状态的表示，处理器模拟算法步骤，解码器重建输出状态。虽然先前的工作侧重于改进处理器，但编码器在表示学习中的作用很少受到关注。大多数方法依赖简单的MLP编码器，这引发了一个问题：这些表示是否足够信息丰富以支持算法推理。本文研究如何改进神经算法推理的编码器表示。我们提出一个重建模块，旨在从其编码表示中恢复输入状态。这个辅助重建任务鼓励编码器保留关于输入的关键信息。我们证明，在训练过程中加入此任务可以提高现有神经架构在标准基准上的性能。此外，我们观察到当前编码器常常未充分利用状态内特征之间的相关性。为了解决这个问题，我们从自监督学习中汲取灵感，设计了一个增强的辅助任务变体，鼓励编码器捕捉状态内特征依赖。实验结果表明，我们的方法使编码器能够学习更丰富的表示，从而增强现有处理器在算法推理任务上的性能。

英文摘要

Neural algorithmic reasoning has emerged as a popular research direction. It aims to train neural networks to mimic the step-by-step behavior of classical rule-based algorithms. More specifically, the execution of such algorithms can be abstracted as a sequence of states, where each state represents the intermediate outcome after an execution step. The training objective is to generate state sequences that replicate the underlying algorithmic process. A common framework for this task adopts an encoder-processor-decoder architecture, where the encoder learns representations of states, the processor simulates algorithmic steps, and the decoder reconstructs output states. While prior work has focused on improving the processor, the role of the encoder in representation learning has received little attention. Most methods rely on simple MLP encoders, raising the question of whether such representations are sufficiently informative for supporting algorithmic reasoning. This paper investigates how to improve encoder representations for neural algorithmic reasoning. We propose a reconstruction module that aims to recover the input state from its encoded representation. This auxiliary reconstruction task encourages the encoder to retain critical information about the input. We demonstrate that incorporating this task during training improves the performance of existing neural architectures on standard benchmarks. Furthermore, we observe that current encoders often underutilize the correlations among features within a state. To address this, we draw inspiration from self-supervised learning and design an enhanced variant of the auxiliary task that encourages the encoder to capture intra-state feature dependencies. Experimental results show that our method enables the encoder to learn richer representations, thereby enhancing the performance of existing processors on algorithmic reasoning tasks.
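
One plausible way to write the auxiliary objective described above is as a weighted sum of the usual task loss and a reconstruction term. The symbols below ($f_{\mathrm{enc}}$ for the encoder, $r$ for the reconstruction module, $\lambda$ for the weight, $\ell_{\mathrm{rec}}$ for the per-state reconstruction loss) are assumptions for illustration; the abstract does not give the exact form.

```latex
\mathcal{L}_{\mathrm{total}}
  = \mathcal{L}_{\mathrm{task}}
  + \lambda \sum_{t} \ell_{\mathrm{rec}}\!\left( x_t,\; r\!\left( f_{\mathrm{enc}}(x_t) \right) \right),
\qquad \lambda \ge 0
```

Here $x_t$ is the input state at execution step $t$. Setting $\lambda = 0$ recovers standard training without the auxiliary task.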

2606.00557 2026-06-02 cs.LG 版本更新

逃离模式抽彩：多响应训练提升语言模型泛化能力

Hasan Amin, Kian Ahrabian, Ming Yin, Rajiv Khanna

发表机构 * Department of Computer Science, Purdue University（计算机科学系，普渡大学）

AI总结本文提出多响应训练（MRT）方法，通过保留每个提示的多个有效响应来缓解传统单响应微调导致的“模式抽彩”问题，并从统计角度揭示了其提升分布泛化的原理和适用条件。

AI中文摘要

现代语言模型微调通常为每个提示配对单个响应，尽管许多提示允许多个有效补全。这实际上将多模态条件分布简化为单样本视图，我们称之为“模式抽彩”现象，其中训练强调一部分合理模式而忽略其他模式。我们研究了多响应训练（MRT），该方法保留每个提示的多个响应，并建立了关于何时以及为何有帮助的原则性解释。我们的关键见解是，提示和响应是不同的统计资源：额外的提示减少输入分布的不确定性，而额外的响应减少条件输出分布的不确定性。这产生了方差-预算权衡，预测了何时保留多个响应是有价值的，显示了随着提示级不确定性占主导地位而收益递减，并解释了为什么大型冗余语料库可以表现出隐式的多响应效应。我们进一步分析了响应选择，并表明Random-K-of-N是分布微调的无偏默认选择，基于奖励的选择可能导致模式坍缩，而子模质量-多样性目标提供了一种具有理论保证的高效替代方案。受控模拟验证了预测的方差和选择效应，包括一个惊人的失败模式，其中仅奖励选择产生的梯度与真实目标不一致。在结构化和真实世界数据集上，包括一个新的多提示、多响应基准，MRT一致地改善了分布泛化，在响应多样性高、提示冗余性低的场景中收益最大。MRT将响应多重性重新定义为数据分配问题，并提供了明确的指导：当响应廉价且多样时，保留多个响应不是启发式方法，而是基于统计的选择。

英文摘要

Modern language-model fine-tuning typically pairs each prompt with a single response, even though many prompts admit multiple valid completions. This effectively reduces a multi-modal conditional distribution to a one-sample view, a phenomenon we call the "mode lottery," where training emphasizes a subset of plausible modes while leaving others underrepresented. We study multi-response training (MRT), which retains multiple responses per prompt, and develop a principled account of when and why it helps. Our key insight is that prompts and responses are distinct statistical resources: additional prompts reduce uncertainty about the input distribution, while additional responses reduce uncertainty about the conditional output distribution. This yields a variance-budget tradeoff that predicts when retaining multiple responses is worthwhile, shows diminishing returns as prompt-level uncertainty dominates, and explains why large redundant corpora can exhibit an implicit multi-response effect. We further analyze response selection, and show that Random-K-of-N is the unbiased default for distributional fine-tuning, reward-based selection can induce mode collapse, and a submodular quality-diversity objective provides an efficient alternative with theoretical guarantees. Controlled simulations validate the predicted variance and selection effects, including a striking failure mode where reward-only selection produces gradients misaligned with the true objective. Across structured and real-world datasets, including a new multi-prompt, multi-response benchmark, MRT consistently improves distributional generalization, with the largest gains in high response-diversity, low prompt-redundancy regimes. MRT reframes response multiplicity as a data-allocation problem with clear guidance: when responses are cheap and diverse, keeping more than one is not a heuristic, but a statistically grounded choice.
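
The contrast between Random-K-of-N and reward-only selection can be sketched in a few lines. The response labels, reward values, and function names below are hypothetical; the sketch only illustrates why uniform sampling keeps every mode represented in expectation, while top-K by reward can drop a lower-reward mode entirely.

```python
import random


def random_k_of_n(responses, k, rng):
    """Keep k of the N responses, chosen uniformly without replacement."""
    if k >= len(responses):
        return list(responses)
    return rng.sample(responses, k)


def top_k_by_reward(responses, rewards, k):
    """Reward-only selection: keep the k highest-reward responses."""
    ranked = sorted(zip(rewards, responses), key=lambda pair: pair[0], reverse=True)
    return [response for _, response in ranked[:k]]


# Toy prompt with two response modes, A and B (hypothetical rewards).
responses = ["A1", "A2", "A3", "B1", "B2", "B3"]
rewards = [0.90, 0.80, 0.85, 0.60, 0.55, 0.50]

greedy = top_k_by_reward(responses, rewards, k=3)

# Estimate how often mode B survives under uniform selection.
rng = random.Random(0)
draws = [random_k_of_n(responses, 3, rng) for _ in range(2000)]
kept = [r for draw in draws for r in draw]
share_b = sum(r.startswith("B") for r in kept) / len(kept)
```

In this toy setup reward-only selection never keeps a mode-B response, whereas the uniform rule keeps mode B about half the time, matching its share of the candidate pool.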

2606.00539 2026-06-02 cs.LG math.OC stat.ML 版本更新

GNMR: Runtime Stability Control for Low-Precision Large Language Model Training

GNMR: 低精度大语言模型训练的运行时稳定性控制

Boao Kong, Weichen Jia, Engao Zhang, Guohong Li, Yonghan Dong, Yao Wang, Yaoyuan Wang, Yunke Peng, Kun Yuan

发表机构 * Peking University（北京大学）； Huawei Technologies Ltd.（华为技术有限公司）

AI总结针对低精度语言模型训练中的稳定性瓶颈，提出基于梯度范数与历史均值之比（GNMR）的轻量级运行时控制器，通过局部风险信号映射到有界恢复动作，在不改变数值格式或后端的情况下提升训练稳定性。

Comments 29 pages, 4 figures, 15 tables

AI中文摘要

训练稳定性是低精度语言模型训练的关键瓶颈：高效的低成本路径仍可能在少量算子处产生短暂的数值风险。我们将此问题形式化为运行时稳定性控制，并提出梯度范数与历史均值之比（GNMR），一种轻量级控制器，将每个可恢复单元的当前梯度范数与其历史均值进行比较。结合用于检测短窗口内突增的$Δ$-GNMR，GNMR在硬$\mathrm{maxO}$预算和短锁定间隔下将局部风险信号映射到有界恢复动作，而不改变数值格式、内核或后端方案。在激活量化压力测试、DeepSeek风格的配方级训练以及LLaMA-2 13B微调中，GNMR以稀疏且预算受限的恢复保持了高保真质量。这些结果支持GNMR作为一种与后端无关的控制器，在保持低成本执行的同时提高低精度训练的稳定性。

英文摘要

Training stability is a key bottleneck in low-precision language model training: efficient low-cost paths can still produce short-lived numerical risks at a small set of operators. We formulate this as runtime stability control and present Gradient Norm-to-Mean Ratio (GNMR), a lightweight controller that compares each recoverable unit's current gradient norm with its historical mean. Together with $Δ$-GNMR for abrupt short-window increases, GNMR maps local risk signals to bounded recovery actions under a hard $\mathrm{maxO}$ budget and a short lock interval, without changing the numerical format, kernel, or backend recipe. Across activation-quantization stress, DeepSeek-style recipe-level training, and LLaMA-2 13B fine-tuning, GNMR preserves high-fidelity quality with sparse, budgeted recovery. These results support GNMR as a backend-agnostic controller to improve low-precision training stability while preserving low-cost execution.
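
A minimal sketch of such a controller is given below, under stated assumptions: the class name, the thresholds, the reading of the budget as "at most `budget` units in recovery at once", and the rule that risky steps are excluded from the historical mean are all illustrative choices, not details confirmed by the abstract.

```python
class GNMRController:
    """Toy gradient-norm-to-mean-ratio monitor (illustrative, not the paper's code)."""

    def __init__(self, ratio_threshold=3.0, delta_threshold=1.5, budget=2, lock_steps=2):
        self.ratio_threshold = ratio_threshold  # trigger on GNMR above this value
        self.delta_threshold = delta_threshold  # trigger on a jump in GNMR between steps
        self.budget = budget                    # assumed: max units in recovery at once
        self.lock_steps = lock_steps            # steps a unit stays in recovery
        self.mean = {}        # unit -> running mean of its gradient norm
        self.count = {}       # unit -> number of norms in the running mean
        self.prev_ratio = {}  # unit -> ratio observed at the previous step
        self.locked = {}      # unit -> remaining recovery steps

    def step(self, grad_norms):
        """Take {unit: gradient norm}; return the set of units to run in recovery mode."""
        # Age existing locks and drop the ones that have expired.
        self.locked = {u: s - 1 for u, s in self.locked.items() if s > 1}
        risky = []
        for unit, norm in grad_norms.items():
            mean = self.mean.get(unit)
            ratio = norm / mean if mean else 1.0
            delta = ratio - self.prev_ratio.get(unit, ratio)
            self.prev_ratio[unit] = ratio
            if ratio > self.ratio_threshold or delta > self.delta_threshold:
                risky.append((ratio, unit))
            else:
                # Only calm steps feed the history, so spikes do not inflate the mean.
                n = self.count.get(unit, 0) + 1
                self.count[unit] = n
                self.mean[unit] = norm if mean is None else mean + (norm - mean) / n
        # Spend the budget on the riskiest units first.
        for _, unit in sorted(risky, reverse=True):
            if unit not in self.locked and len(self.locked) < self.budget:
                self.locked[unit] = self.lock_steps
        return set(self.locked)


ctrl = GNMRController(budget=1, lock_steps=2)
calm = [ctrl.step({"attn": 1.0, "mlp": 2.0}) for _ in range(5)]
spike = ctrl.step({"attn": 10.0, "mlp": 12.0})
after_one = ctrl.step({"attn": 1.0, "mlp": 2.0})
after_two = ctrl.step({"attn": 1.0, "mlp": 2.0})
```

In a real training loop the returned set would select which operators temporarily fall back to a higher-precision path; what "recovery" means for a given backend is outside the scope of this sketch.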

2606.00535 2026-06-02 cs.LG 版本更新

DREAM-S: Speculative Decoding with Searchable Drafting and Target-Aware Refinement for Multimodal Generation

DREAM-S: 基于可搜索草稿与目标感知精炼的推测解码用于多模态生成

Zining Liu, Yunhai Hu, Tianhua Xia, Bo Bao, Eric Sather, Vithursan Thangarasa, Sai Qian Zhang

发表机构 * New York University（纽约大学）； Cerebras Systems Inc.（Cerebras Systems公司）； University of Pennsylvania（宾夕法尼亚大学）

AI总结提出DREAM-S框架，通过神经架构搜索和目标感知超网训练自动优化草稿模型架构与交互策略，结合注意力熵引导的自适应中间特征蒸馏，实现视觉语言模型的高效推测解码，加速比达3.85倍。

AI中文摘要

推测解码（SD）已被证明是加速大型语言模型（LLM）自回归生成的有效技术，然而其在视觉语言模型（VLM）中的应用仍相对未被探索。我们提出DREAM-S，一个专门为VLM中快速高效解码设计的新型SD框架。DREAM-S利用神经架构搜索（NAS）框架与目标感知超网训练，自动识别草稿模型与目标模型之间的最优交互策略，以及最适合底层硬件实现平台的草稿模型架构。此外，DREAM-S还结合了由注意力熵引导的自适应中间特征蒸馏，以实现高效的草稿训练。在一系列成熟的VLM上的实验表明，与标准解码方法相比，DREAM-S实现了高达$3.85\times$的加速，并显著优于现有的SD基线。代码已公开：https://github.com/SAI-Lab-NYU/DREAM-S。

英文摘要

Speculative decoding (SD) has proven to be an effective technique for accelerating autoregressive generation in large language models (LLMs); however, its application to vision-language models (VLMs) remains relatively unexplored. We propose DREAM-S, a novel SD framework designed specifically for fast and efficient decoding in VLMs. DREAM-S leverages a neural architecture search (NAS) framework with target-aware supernet training to automatically identify both the optimal interaction strategy between the draft and target models and the most suitable draft model architecture for the underlying hardware implementation platform. DREAM-S additionally incorporates adaptive intermediate feature distillation, guided by attention entropy, to enable efficient draft training. Experiments on a range of well-established VLMs show that DREAM-S achieves up to a $3.85\times$ speedup compared to standard decoding approaches and significantly outperforms existing SD baselines. The code is publicly available at https://github.com/SAI-Lab-NYU/DREAM-S.
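
For readers new to speculative decoding, the generic draft-then-verify loop can be sketched as follows. This is the plain greedy variant, not DREAM-S itself: the two "models" are toy functions over integer tokens, and `speculative_decode`, `target_next`, and `draft_next` are hypothetical names. In a real system the target's checks within one round come from a single batched forward pass, which is where the speedup originates.

```python
def speculative_decode(target_next, draft_next, prompt, max_new_tokens, k=4):
    """Greedy speculative decoding; returns (tokens, number of verification rounds)."""
    tokens = list(prompt)
    rounds = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1) The cheap draft model proposes k tokens autoregressively.
        draft = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))
        # 2) The target model verifies the proposal (one round).
        rounds += 1
        accepted = []
        for i in range(k):
            expected = target_next(tokens + accepted)
            accepted.append(expected)  # the target's token is always kept
            if expected != draft[i]:
                break  # first mismatch: discard the rest of the draft
        else:
            # All k draft tokens matched, so the target adds one bonus token.
            accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[: len(prompt) + max_new_tokens], rounds


def target_next(seq):
    """Toy target model: counts upward modulo 10."""
    return (seq[-1] + 1) % 10


def draft_next(seq):
    """Toy draft model: agrees with the target except after token 2."""
    return 9 if seq[-1] == 2 else (seq[-1] + 1) % 10


def greedy_decode(next_token, prompt, max_new_tokens):
    """Reference: ordinary one-token-at-a-time greedy decoding."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(next_token(tokens))
    return tokens


spec_tokens, spec_rounds = speculative_decode(target_next, draft_next, [0], 12, k=4)
reference = greedy_decode(target_next, [0], 12)
```

Because every emitted token is either confirmed or supplied by the target, the output matches ordinary greedy decoding while needing far fewer target rounds than generated tokens.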

2606.00520 2026-06-02 math.OC cs.LG stat.ML 版本更新

In-Expectation Convergence of Stochastic Gradient Methods under Heavy-Tailed Noise

重尾噪声下随机梯度方法的期望收敛性

Zijian Liu

发表机构 * Stern School of Business, New York University（纽约大学斯特恩商学院）

AI总结针对重尾噪声（有限p阶矩，p∈(1,2)）下随机梯度方法的收敛性问题，证明了随机镜像下降（SMD）、加速随机镜像下降（ASMD）在凸优化中以及SGD和带动量的SGD（SGDM）在非凸优化中的期望收敛性，无需算法修改或有界域假设。

AI中文摘要

许多随机梯度方法被认为在随机梯度的噪声仅具有有限$p$阶矩（$p\in\left(1,2\right)$）时不会收敛，这种设置被称为重尾噪声假设。然而，最近的一些研究发现，随机梯度下降（$\textsf{SGD}$）无需对其更新规则进行任何修改，就能在有界域的凸问题中出人意料地收敛，这凸显了经典随机梯度方法的潜力。受这一最新进展的启发，我们对重尾噪声下的随机优化进行了全面研究，并为凸优化中的随机镜像下降（$\textsf{SMD}$）和加速随机镜像下降（$\textsf{ASMD}$）以及非凸优化中的$\textsf{SGD}$和带动量的随机梯度下降（$\textsf{SGDM}$）建立了新的期望收敛结果。值得注意的是，我们的结果不仅无需算法修改，而且避免了先前工作中施加的限制性假设，如有界域。更重要的是，我们的分析为研究重尾随机优化提供了一个新颖、优雅且强大的框架，为理解一阶随机梯度方法开辟了一条新途径。

英文摘要

Many stochastic gradient methods are believed not to converge when the noise in stochastic gradients has only a finite $p$-th moment for $p\in\left(1,2\right)$, a setting known as the heavy-tailed noise assumption. However, some recent studies have found that Stochastic Gradient Descent ($\textsf{SGD}$), without any modification to its update rule, can surprisingly converge in expectation for convex problems with bounded domains, highlighting the potential of classical stochastic gradient methods. Inspired by this recent progress, we provide a comprehensive study of stochastic optimization under heavy-tailed noise and establish new in-expectation convergence results for Stochastic Mirror Descent ($\textsf{SMD}$) and Accelerated Stochastic Mirror Descent ($\textsf{ASMD}$) in convex optimization, and for $\textsf{SGD}$ and Stochastic Gradient Descent with Momentum ($\textsf{SGDM}$) in nonconvex optimization. Notably, our results not only hold without algorithmic changes but also avoid restrictive assumptions, such as bounded domains, imposed in prior work. More importantly, our analysis provides a new, elegant, and powerful framework for studying heavy-tailed stochastic optimization, opening a new route to understanding first-order stochastic gradient methods.
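
The heavy-tailed noise assumption referred to above is commonly stated as a bound on the $p$-th moment of the gradient noise. The notation here ($g(x,\xi)$ for the stochastic gradient, $\sigma$ for the noise level, and the choice of norm) is an assumption for illustration; the paper may use a more general form.

```latex
\mathbb{E}_{\xi}\!\left[ \left\| g(x,\xi) - \nabla f(x) \right\|^{p} \right] \le \sigma^{p},
\qquad p \in \left(1,2\right)
```

With $p = 2$ this reduces to the classical bounded-variance assumption; for $p < 2$ the variance of the noise may be infinite, which is why standard analyses do not directly apply.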

2606.00514 2026-06-02 cs.LG cs.CV 版本更新

Generate in Reconstruction Space, Match in Semantic Space: Transport Geometry for One-Step Generation

在重建空间中生成，在语义空间中匹配：一步生成的传输几何

Hugues Van Assel, Edward De Brouwer, Saeed Saremi, Gabriele Scalia, Aviv Regev

发表机构 * Genentech（基因泰克）

AI总结本文研究自监督表示学习（SSL）特征在一步生成模型中的作用，提出在语义特征空间中使用Sinkhorn散度进行分布匹配，显著降低ImageNet FID，并揭示了评估指标与训练特征之间的潜在冲突。

Comments 26 pages, 4 figures

AI中文摘要

生成建模和自监督表示学习（SSL）优化结构不同的目标：生成训练奖励分布保真度，而SSL奖励语义一致性。然而，最近的研究反复发现SSL特征改善了生成训练，尽管这种协同作用的机制仍不清楚。在这里，我们在一步生成的框架下研究SSL在生成建模中的优势，其中表示的作用是明确的：冻结的SSL特征用于将生成的样本与真实数据匹配。我们在该特征空间中使用Sinkhorn散度，为Wasserstein距离提供了一个可处理的代理，而Wasserstein距离正是Fréchet风格评估指标（如FID）所近似的总体层面差异。我们发现，当在语义结构化的SSL特征空间中计算时，这个目标变得非常有效（ImageNet FID降至原来的约1/39）。我们将这种行为主要归因于匹配估计：抑制无关重建细节的语义SSL特征诱导出更紧凑的几何结构，使分布匹配更易处理。因此，训练时效果最佳的SSL特征不一定与评估指标使用的特征一致。特别是，我们表明使用Inception作为特征提取器可以改善FID，同时降低匹配稳定性和样本质量，揭示了一种指标投机（metric hacking）现象。通过在ImageNet上的大量实验，我们确定了哪些SSL特征族能带来最佳的生成性能，并表明匹配稳定性是选择它们的定量标准。代码可在https://github.com/Genentech/semantic-transport-generation获取。

英文摘要

Generative modeling and self-supervised representation learning (SSL) optimize structurally different objectives: generative training rewards distributional fidelity, while SSL rewards semantic coherence. Yet recent work repeatedly finds that SSL features improve generative training, though the mechanism of this synergy remains unclear. Here, we study the benefits of SSL in generative modeling in the framework of one-step generation where the role of representation is explicit: frozen SSL features are used to match generated samples to real data. We use the Sinkhorn divergence in that feature space, providing a tractable surrogate for the Wasserstein distance, the population-level discrepancy approximated by Fréchet-style evaluation metrics (such as FID). We find that this objective becomes highly effective when computed in a semantically structured SSL feature space (a 39$\times$ reduction in ImageNet FID). We trace this behavior primarily to matching estimation: semantic SSL features that suppress nuisance reconstruction details induce a more compact geometry, making distribution matching more tractable. As a consequence, the best training SSL features need not match the features used by the evaluation metric. In particular, we show that using Inception as the feature extractor can improve FID while degrading matching stability and sample quality, revealing a form of metric hacking. Using extensive experiments on ImageNet, we identify which SSL feature families lead to best generation performance and show that matching stability is a quantitative criterion for selecting them. Code is available at https://github.com/Genentech/semantic-transport-generation.
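
The Sinkhorn divergence used as the matching objective can be sketched in pure Python for small point sets. The sketch assumes uniform sample weights, a squared Euclidean cost, and log-domain Sinkhorn iterations; the toy 2-D points stand in for frozen SSL features, and `eps` and `iters` are illustrative values rather than the authors' settings.

```python
import math


def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def entropic_ot(xs, ys, eps=0.5, iters=200):
    """Entropy-regularized OT cost between two uniform point clouds (dual value)."""
    n, m = len(xs), len(ys)
    log_a, log_b = -math.log(n), -math.log(m)
    cost = [[sq_dist(x, y) for y in ys] for x in xs]
    f = [0.0] * n
    g = [0.0] * m
    for _ in range(iters):
        # Alternating log-domain updates of the two dual potentials.
        f = [-eps * logsumexp([log_b + (g[j] - cost[i][j]) / eps for j in range(m)])
             for i in range(n)]
        g = [-eps * logsumexp([log_a + (f[i] - cost[i][j]) / eps for i in range(n)])
             for j in range(m)]
    return sum(f) / n + sum(g) / m


def sinkhorn_divergence(xs, ys, eps=0.5, iters=200):
    """Debiased divergence: OT(x, y) - OT(x, x) / 2 - OT(y, y) / 2."""
    return (entropic_ot(xs, ys, eps, iters)
            - 0.5 * entropic_ot(xs, xs, eps, iters)
            - 0.5 * entropic_ot(ys, ys, eps, iters))


real = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
shifted = [(x + 3.0, y + 3.0) for x, y in real]
same = sinkhorn_divergence(real, real)
apart = sinkhorn_divergence(real, shifted)
```

The two self-transport terms remove the entropic bias, so the divergence vanishes when the two sample sets coincide. In training, the generated features would play the role of one point cloud and the loss would be differentiated with respect to them, which requires an autodiff framework rather than this plain-Python version.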

2606.00512 2026-06-02 cs.LG cs.IT math.IT stat.ML 版本更新

Semi-Supervised Learning with Noisy Proxy Covariates: Generalization Bounds and Distribution Regression

带噪声代理协变量的半监督学习：泛化界与分布回归

Kwangho Kim, Jisu Kim

发表机构 * Department of Statistics, Korea University, Seoul, Korea（高丽大学统计系）； Department of Statistics, Seoul National University, Seoul, Korea（首尔国立大学统计系）

AI总结针对带噪声代理协变量的半监督回归问题，提出两阶段估计器，利用所有代理协变量学习核本征特征，并在标记数据上拟合岭回归，理论证明在代理扰动可控且未标记代理协变量充足时能恢复快速标记样本率，实验表明在低标记率下优于监督和半监督基线。

2606.00511 2026-06-02 cs.LG cs.CV 版本更新

Saliency-Aware Model Merging

显著性感知模型合并

Jungin Park, Jiyoung Lee, Kwanghoon Sohn

发表机构 * Yonsei University, Seoul, South Korea（延世大学）； Ewha Womans University, Seoul, South Korea（梨花女子大学）

AI总结提出SA-Merging方法，利用结构剪枝中的连通性显著性（如SynFlow）进行数据无关模型合并，通过任务向量显著性评分和合并感知调制减少任务干扰，并在视觉和语言任务上验证有效性。

Comments ICML 2026 Camera-ready

AI中文摘要

模型合并旨在将多个在不同数据集上微调的任务特定模型整合到一个统一架构中，以实现跨领域能力。当前的数据无关模型合并方法通常难以扩展，因为它们依赖于忽略层间依赖性和非均匀专业知识分布的简单参数级启发式方法。本文提出SA-Merging，它基于结构剪枝（如SynFlow）中的连通性显著性公式，并将其扩展到数据无关模型合并设置。我们相对于共享基础模型定义任务向量上的显著性分数，并进一步引入合并感知调制，该调制结合专家间的一致性以减轻任务干扰。基于此公式，迭代的显著性感知合并过程逐步移除非信息性更新，同时保留端到端连通性。此外，我们将SA-Merging扩展到为LoRA引入秩级显著性分解，而不损害其结构完整性。在视觉和语言任务上的大量实验证明了我们基于显著性方法的有效性，进一步缩小了数据无关方法和测试时自适应方法之间的差距。

英文摘要

Model merging aims to consolidate multiple task-specific models fine-tuned on different datasets into a unified architecture that performs cross-domain proficiency. Current data-free model merging methods often struggle to scale as they rely on simple parameter-level heuristics that ignore inter-layer dependencies and non-uniform distribution of expertise. This work proposes SA-Merging, which is built upon connectivity-based saliency formulations from structural pruning (e.g., SynFlow) and extends them to the data-free model merging setting. We define a saliency score over task vectors relative to a shared base model, and further introduce merge-aware modulation that incorporates agreement across experts to mitigate task interference. Based on this formulation, an iterative saliency-aware merging procedure progressively removes non-informative updates while preserving end-to-end connectivity. Furthermore, we extend SA-Merging to introduce rank-wise saliency decomposition for LoRAs without compromising their structural integrity. Extensive experiments on vision and language tasks demonstrate the effectiveness of our saliency-based approach, further reducing the gap between data-free and test-time adaptation methods.

2606.00506 2026-06-02 cs.AI cs.LG 版本更新

EnergyMamba: An Uncertainty-Aware Graph-Enhanced Selective State Space Model for Energy Consumption Prediction

EnergyMamba：一种用于能耗预测的具有不确定性感知的图增强选择性状态空间模型

Dahai Yu, Rongchao Xu, Lin Jiang, Guang Wang

发表机构 * Florida State University（佛罗里达州立大学）

AI总结提出EnergyMamba框架，通过图增强选择性状态空间模型（GE-Mamba）和自适应序列分位数回归（AS-CQR）模块，实现时空联合建模与不确定性量化，在能耗预测中提升准确率约5%、不确定性量化约6%。

Comments Accepted by KDD 2026 AI4S

DOI: 10.1145/3770855.3818841

AI中文摘要

能耗预测对于高效的电网管理、需求侧优化和可持续能源规划至关重要。尽管先进的机器学习方法已被用于提高预测性能，但现有工作存在两个关键局限：（1）通常将任务视为纯时间序列预测问题，未显式建模不同区域间的空间依赖关系；（2）在极端天气等异常情况下无法提供带有不确定性估计的可靠预测。为推进现有研究，我们提出EnergyMamba，一种具有不确定性感知的时空学习框架，用于准确可靠的能耗预测，包含两个关键组件：（i）一种新颖的图增强选择性状态空间模型（GE-Mamba），将从电网拓扑中学到的空间上下文注入时间动态，实现耦合的时空建模；（ii）自适应序列分位数回归（AS-CQR）模块，包括局部自适应归一化和在线反馈机制，以在潜在分布偏移下动态校准预测区间。我们在来自佛罗里达、纽约和加利福尼亚的四个大规模真实数据集上评估EnergyMamba。结果表明，与15个最先进的基线相比，EnergyMamba在预测准确率上提升约5%，在不确定性量化上提升约6%。

英文摘要

Energy consumption prediction is essential for efficient grid management, demand-side optimization, and sustainable energy planning. Although advanced machine learning methods have been employed for better prediction performance, existing works have two key limitations: (1) they usually formulate this task as a purely time-series prediction problem without explicitly modeling the spatial dependencies among different regions, and (2) they fail to provide reliable predictions with uncertainty estimates under abnormal situations such as extreme weather events. To advance existing research, we propose EnergyMamba, an uncertainty-aware spatiotemporal learning framework for accurate and reliable energy consumption prediction, which comprises two key components: (i) a novel Graph-Enhanced Selective State Space Model (GE-Mamba) that injects spatial context learned from the grid topology into the temporal dynamics, enabling coupled spatiotemporal modeling, and (ii) an Adaptive Sequential Conformalized Quantile Regression (AS-CQR) module, which includes locally adaptive normalization and an online feedback mechanism to dynamically calibrate prediction intervals under potential distribution shifts. We evaluate EnergyMamba on four large-scale real-world datasets from Florida, New York, and California. Results show EnergyMamba achieves around 5% improvement in prediction accuracy and 6% improvement in uncertainty quantification over 15 state-of-the-art baselines.
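
As background for the AS-CQR module, the basic split conformalized quantile regression step can be sketched as below. This is the standard, non-adaptive procedure; the locally adaptive normalization and online feedback described in the abstract are omitted, and the function names and calibration numbers are hypothetical.

```python
import math


def cqr_offset(cal_lo, cal_hi, cal_y, alpha=0.1):
    """Conformal correction computed on a held-out calibration set.

    cal_lo, cal_hi: predicted lower and upper quantiles for each calibration point.
    cal_y: the observed targets for the same points.
    """
    # Conformity score: how far the target falls outside the predicted band
    # (negative when it lies inside).
    scores = sorted(max(lo - y, y - hi) for lo, hi, y in zip(cal_lo, cal_hi, cal_y))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points for this alpha
    return scores[rank - 1]


def calibrate_interval(lo, hi, offset):
    """Widen (or shrink, if the offset is negative) a predicted interval."""
    return lo - offset, hi + offset


# Toy calibration set: the quantile model always predicts the band [0, 1].
cal_lo = [0.0] * 9
cal_hi = [1.0] * 9
cal_y = [0.5, 1.2, -0.3, 0.9, 1.5, 0.1, 1.1, 0.4, 2.0]
offset = cqr_offset(cal_lo, cal_hi, cal_y, alpha=0.2)
interval = calibrate_interval(0.0, 1.0, offset)
```

Under exchangeability, the corrected interval covers a new target with probability at least `1 - alpha`; handling distribution shift, which is the focus of the adaptive variant, requires updating the offset over time.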

2606.00503 2026-06-02 cs.LG cs.AI 版本更新

CodeCytos: 通过代码增强的智能体动作空间实现AI辅助空间分子成像分析

Hung Q. Vo, Huy Q. Vo, Son T. Ly, Zhihao Wan, Anh-Vu Nguyen, Hong Zhao, Jianting Sheng, Stephen T. C. Wong, Hien V. Nguyen

发表机构 * University of Houston, Department of Electrical and Computer Engineering（休斯顿大学电气与计算机工程系）； Houston Methodist Hospital, Department of Systems Medicine and Biomedical Engineering（休斯顿卫理公会医院系统医学与生物医学工程系）

AI总结提出CodeCytos框架，通过代码驱动的推理智能体实现空间分子成像数据的动态可编程分析，提升自动化与定制化能力，并在多种组织类型数据集上验证其优于基线方法。

AI中文摘要

传统的组织图像分析软件为细胞分析提供了基础功能，包括分割、基本形态特征提取和空间组织分析。然而，这些工具通常需要手动干预，且与代码驱动的自动化集成不佳，限制了复杂空间组织研究的效率和可扩展性。此外，它们对自定义分析的灵活性有限，通常只支持一组固定的预实现空间细胞特征。为了解决这些限制，我们提出了CodeCytos，一个基于编码的推理智能体框架，能够实现与空间分子成像数据的动态、可编程交互，以提高自动化和定制化。CodeCytos旨在简化自定义空间细胞特征的探索，并适应多样化的研究需求。我们通过四个来自不同组织类型（额叶皮层、非小细胞肺癌、胰腺和扁桃体）的专家精选数据集案例研究展示了其实用性。我们在现实的最小提示设置下评估CodeCytos，其中生物科学家提出简单问题，没有任务特定指令或关于空间细胞分析的上下文信息，并基准测试了多个具有强大编码能力的LLM骨干。我们进一步表明，结合定制的、领域无关的少样本上下文编码推理示例（空间分析领域外随机采样的演示）可以显著提高性能，而无需昂贵的、专家制作的领域内演示。总体而言，CodeCytos优于基线方法，突显了代码动作智能体在空间分子成像中辅助自定义特征探索和加速生物标志物发现的潜力。

英文摘要

Conventional tissue image analysis software provides foundational capabilities for cellular analysis, including segmentation, basic morphological feature extraction, and spatial organization analysis. However, these tools often require manual intervention and are not well integrated with code-driven automation, limiting efficiency and scalability for complex spatial tissue studies. In addition, they offer limited flexibility for custom analyses, as they typically support only a fixed set of pre-implemented spatial cellular features. To address these limitations, we propose CodeCytos, a coding-based reasoning agent framework that enables dynamic, programmable interaction with spatial molecular imaging data to improve automation and customization. CodeCytos is designed to streamline the exploration of custom spatial cellular features and adapt to diverse research needs. We demonstrate its utility through case studies on four expert-curated datasets from distinct tissue types: frontal cortex, non-small-cell lung cancer, pancreas, and tonsil. We evaluate CodeCytos under a realistic minimal prompt setting, where bioscientists pose simple questions without task-specific instructions or contextual information about spatial cellular analysis, and benchmark multiple LLM backbones with strong coding capabilities. We further show that incorporating tailored, domain-agnostic few-shot in-context coding-reasoning examples (randomly sampled demonstrations outside the spatial analysis domain) can substantially improve performance without requiring costly, expert-crafted in-domain demonstrations. Overall, CodeCytos outperforms baseline approaches, highlighting the potential of code-action agents to assist with custom feature exploration in spatial molecular imaging and to accelerate biomarker discovery.

2606.00467 2026-06-02 cs.CL cs.AI cs.LG stat.ML 版本更新

On the Limits of LLM Adaptability: Impact of Model-Internalized Priors on Annotation Task Performance

论大语言模型适应性的局限：模型内化先验对标注任务性能的影响

Etienne Casanova, Rafal Kocielnik, R. Michael Alvarez

发表机构 * University of Washington（华盛顿大学）

AI总结通过毒性检测实验，研究大语言模型内化先验与指令交互的三个维度，发现近三分之二的零样本错误难以通过提示纠正，并引入定义特定熟悉度（DSF）指标，证明其与性能正相关，而文本记忆指标则无此关联。

Comments Accepted at ICML 2026 (Oral & Spotlight); PMLR vol. 306. 9 pages, 4 figures

AI中文摘要

大语言模型（LLMs）越来越多地用于零样本标注和LLM-as-a-judge任务，但其可靠性取决于模型内化先验与用户提供指令的交互方式。我们研究了这种交互的三个维度：（1）LLM对数据和任务定义的熟悉程度如何影响性能；（2）提示中的额外信息能在多大程度上纠正零样本错误（“决策粘性”）；（3）模型对错误任务定义的敏感性。通过在多种数据集（涵盖社交媒体、游戏、新闻和论坛）上进行毒性检测实验，使用密集模型和混合专家模型，我们发现近三分之二的零样本错误难以纠正，提示纠正的总体挽救率（初始错误中被纠正的比例）仅为34.8%。高置信度错误尤其难以纠正。当给出错误定义时，LLM会遵循这些定义，同时保持与正确定义条件下相同的置信水平。关键的是，我们引入了定义特定熟悉度（DSF），它衡量模型内部概念与任务定义之间的一致性。在控制数据集层面的混杂因素后，DSF与模型性能呈正相关（偏相关系数r=+0.41），而三种不同的记忆指标（ROUGE-L、BERTScore和嵌入余弦相似度）均未显示正相关。这些发现揭示了基于提示的纠正在标注任务中的局限性，强调了定义对齐比文本级记忆更重要。

英文摘要

Large Language Models (LLMs) are increasingly used for zero-shot annotation and LLM-as-a-judge tasks, yet their reliability hinges on how model-internalized priors interact with user-provided instructions. We investigate three dimensions of this interaction: (1) how an LLM's familiarity with data and task definitions affects performance, (2) the extent to which additional information in prompts can correct zero-shot errors ("decision stickiness"), and (3) model susceptibility to misaligned task definitions. Through experiments on toxicity detection across diverse datasets (spanning social media, gaming, news, and forums) using both dense and mixture-of-experts models, we find that nearly two-thirds of zero-shot errors are resistant to correction, with an overall rescue rate (fraction of initial errors corrected by prompting) of only 34.8%. High-confidence errors prove especially resistant to correction. When given misaligned definitions, LLMs follow them while maintaining confidence levels unchanged from the aligned condition. Crucially, we introduce Definition-Specific Familiarity (DSF), which measures alignment between a model's internal concept and the task definition. After controlling for dataset-level confounds, DSF shows a positive association with model performance (partial r = +0.41), while three distinct memorization metrics (ROUGE-L, BERTScore, and embedding cosine similarity) all fail to show a positive association. These findings show the limitations of prompt-based correction in annotation tasks, highlighting the importance of definition alignment over text-level memorization.
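
The rescue rate defined in the abstract (the fraction of initial zero-shot errors that prompting corrects) is simple to compute. The function name and the toy labels below are illustrative only.

```python
def rescue_rate(gold, zero_shot, prompted):
    """Fraction of zero-shot errors that the prompted run gets right."""
    initial_errors = [i for i, (g, z) in enumerate(zip(gold, zero_shot)) if g != z]
    if not initial_errors:
        return 0.0  # nothing to rescue
    rescued = sum(1 for i in initial_errors if prompted[i] == gold[i])
    return rescued / len(initial_errors)


# Toy toxicity labels (1 = toxic, 0 = non-toxic); values are hypothetical.
gold = [1, 0, 1, 1, 0, 0]
zero_shot = [0, 0, 0, 1, 1, 1]  # wrong on items 0, 2, 4, 5
prompted = [1, 0, 0, 0, 1, 0]   # fixes items 0 and 5, breaks item 3
rate = rescue_rate(gold, zero_shot, prompted)
```

Note that the metric ignores items the prompted run newly gets wrong (item 3 here), so it should be read alongside overall accuracy.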

2606.00462 2026-06-02 cs.CL cs.AI cs.LG 版本更新

Short-form Text Rewriting with Phi Silica

短文本改写与 Phi Silica

Divya Tadimeti, Shawn Pan, Sameera Lanka, Chenghui Zhou, Sadid Hasan

发表机构 * IEEE ICAD

AI总结本研究通过数据集整理、提示蒸馏、参数高效微调和评估，将小语言模型 Phi Silica 适配于短文本改写任务，结果表明微调提高了语义保真度、减少了幻觉并提升了与 GPT-5-chat 改写的偏好胜率。

Comments 6 pages

AI中文摘要

短文本改写是释义的一种受限变体，其中有限的上下文和高语义密度几乎没有留下变化空间。虽然大型语言模型在一般释义任务上表现良好，但小语言模型（SLM）在短文本场景中常常在语义保真度和幻觉鲁棒性方面遇到困难。在这项工作中，我们提出了一项实证研究，通过数据集整理、提示蒸馏、参数高效微调和评估，将小语言模型 Phi Silica 适配于短文本改写。我们从公开的幻灯片中整理了一个简短的演示风格文本数据集，并使用 GPT-5-chat 来生成改写监督以及进行 LLM 作为评判者的评估。我们的结果表明，微调提高了语义保真度，减少了幻觉，并提高了与 GPT-5-chat 改写的偏好胜率。这些发现表明，针对 SLM 的定向适配可以显著缩小与云模型的差距，并为将 SLM 适配于精度关键的改写任务提供实用指导。

英文摘要

Short-form text rewriting is a constrained variant of paraphrasing in which limited context and high semantic density leave little room for variation. While large language models perform well on general paraphrasing, small language models (SLMs) often struggle with semantic fidelity and hallucination robustness in short-form settings. In this work, we present an empirical study of adapting an SLM, Phi Silica, for short-form rewrite through dataset curation, prompt distillation, parameter-efficient fine-tuning, and evaluation. We curate a dataset of short presentation-style text from public slide decks and use GPT-5-chat both to generate rewrite supervision and to conduct LLM-as-a-judge evaluation. Our results show that finetuning improves semantic fidelity, reduces hallucinations, and increases preference win rate against GPT-5-chat rewrites. The findings suggest that targeted adaptation for SLMs can substantially narrow the gap to cloud models and provide practical guidance for adapting SLMs to precision-critical rewrite tasks.

2606.00445 2026-06-02 cs.CV cs.AI cs.LG 版本更新

DarkVesselNet: Multi-Modal Remote Sensing and Trajectory Reasoning for Dark Vessel Detection

DarkVesselNet: 用于暗船检测的多模态遥感和轨迹推理

Arun Sharma

发表机构 * University of Minnesota, Twin Cities（明尼苏达大学，双城分校）

AI总结提出DarkVesselNet，融合Sentinel-1 SAR、Sentinel-2光学影像、地理空间基础模型、AIS轨迹推理、TGARD间隙检测和Pi-DPM异常头，实现多模态遥感暗船检测。

2606.00442 2026-06-02 cs.LG math.OC stat.ML 版本更新

Exploiting weight-space symmetries for approximating curvature

利用权重空间对称性近似曲率

Artem Artemev, Rui Xia, Benjamin M. Boyd, Youjing Yu, Felix Dangel, Guillaume Hennequin, Alberto Bernacchia

发表机构 * DeepMind, London, UK（伦敦DeepMind）

AI总结本文通过解析平均化保持损失不变的群作用，从单个梯度构建结构化的Hessian近似，从而利用权重空间对称性来近似损失函数的曲率。

Comments Published at ICML 2026. 35 pages, 11 figures. Code: https://github.com/mtkresearch/symm_opt

AI中文摘要

许多机器学习技术依赖于近似损失函数的曲率，但在现代深度网络的规模下，这通常很难做到。令人惊讶的是，之前没有工作利用损失景观中众所周知的权重空间对称性所产生的曲率约束。通过解析平均化保持损失不变的群作用，我们从单个梯度构建了结构化的Hessian近似，这些近似可以易于估计、存储和求逆。用户指定的对称群直接控制近似精度与计算成本之间的权衡。此外，我们的框架为审视现有方法提供了统一的理论视角；特别地，特定的对称群选择可以恢复Shampoo/Muon类的曲率估计。我们在多种网络架构上验证了我们的方法，并将其应用于二阶优化基准测试，包括一个小型语言模型。我们的曲率估计框架可能在机器学习其他问题中找到应用，如不确定性估计、持续学习、压缩/剪枝、训练数据归因等。

英文摘要

Many machine learning techniques rely on approximating a loss function's curvature, but this is notoriously hard to do at the scale of modern deep networks. Surprisingly, no previous work has exploited the curvature constraints that arise from well known weight-space symmetries in loss landscapes. By analytically averaging over group actions that leave the loss invariant, we construct structured Hessian approximations from single gradients that can be tractably estimated, stored, and inverted. The choice of user-specified symmetry group directly governs the trade-off between approximation accuracy and computational cost. Moreover, our framework provides a unifying theoretical lens for viewing existing methods; in particular, a specific choice of symmetry group recovers Shampoo/Muon-like curvature estimates. We validate our method on a range of network architectures, and deploy it to second-order optimization benchmarks, including a small language model. Our curvature estimation framework might find applications in other machine learning problems such as uncertainty estimation, continual learning, compression/pruning, training data attribution, and more.
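
One well-known weight-space symmetry, shown here only as an example of the kind of invariance the abstract refers to, is the positive rescaling of a ReLU unit: multiplying a hidden unit's incoming weight and bias by `a > 0` and dividing its outgoing weight by `a` leaves the network function, and hence the loss, unchanged. Whether the paper uses this particular group is an assumption; the tiny scalar network and the helper names are hypothetical.

```python
def relu(z):
    return max(0.0, z)


def mlp(x, hidden, out):
    """One-hidden-layer scalar network: sum_j v_j * relu(w_j * x + b_j)."""
    return sum(v * relu(w * x + b) for (w, b), v in zip(hidden, out))


def rescale_unit(hidden, out, j, a):
    """Apply the rescaling symmetry to hidden unit j with factor a > 0."""
    if a <= 0:
        raise ValueError("the rescaling factor must be positive")
    new_hidden = list(hidden)
    new_out = list(out)
    w, b = hidden[j]
    new_hidden[j] = (a * w, a * b)  # scale incoming weight and bias
    new_out[j] = out[j] / a         # undo the scale on the outgoing weight
    return new_hidden, new_out


hidden = [(1.5, -0.5), (-2.0, 1.0), (0.25, 0.75)]
out = [0.5, -1.0, 2.0]
hidden2, out2 = rescale_unit(hidden, out, j=1, a=4.0)
inputs = [-2.0, -0.5, 0.0, 0.3, 1.7]
gaps = [abs(mlp(x, hidden, out) - mlp(x, hidden2, out2)) for x in inputs]
```

Since the loss is constant along such orbits, its derivatives along them vanish, which is the sort of constraint on gradients and curvature that a symmetry-based approximation can exploit.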

2606.00437 2026-06-02 cs.LG 版本更新

EST-PRM: Stress-Testing Process Reward Models Before They Become Load-Bearing

EST-PRM：在过程奖励模型成为关键依赖之前对其进行压力测试

Ibne Farabi Shihab, Fariya Afrin, Sanjeda Akter, Anuj Sharma

发表机构 * Department of Computer Science, Iowa State University（艾奥瓦州立大学计算机科学系）； Department of Computer Science, Kalinga Institute of Industrial Technology（卡林加工业技术学院计算机科学系）； Department of Civil, Construction & Environmental Engineering, Iowa State University（艾奥瓦州立大学土木、建设与环境工程系）

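AI summary: Proposes EST-PRM, a framework that stress-tests process reward models with three transformations (step inflation, dependency-aware reordering, and confidence markers), and finds marked differences across models in reward inflation and loss of correctness sensitivity.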

AI中文摘要

过程奖励模型（PRM）在具有密集步骤级监督的语言模型训练中被广泛使用。它们假设在标签保持变换下，PRM分数是步骤正确性的稳定代理。这些变换改变推理结构但保留最终答案。我们认为这一假设未得到充分验证。此类变换可能改变PRM分数与正确性信号之间的关系，导致不同模型出现不同的故障模式。为弥补这一空白，我们引入了\textbf{EST-PRM}，一个用于密集过程奖励的压力测试框架。它应用三种变换：（1）步骤膨胀，（2）依赖感知步骤重排序，以及（3）置信度标记。定义了一个脆弱性分解，将奖励膨胀与正确性敏感性损失分开。在来自MATH-500、GSM8K和PRMBench的4,687条推理链上评估了五种PRM风格模型。结果表明不同模型的脆弱性模式存在明显差异。Math-Shepherd对位置扰动表现出最强的敏感性，Pearson相关系数下降$0.152 \pm 0.038$，分数膨胀率为$32.8 \pm 4.9\%$。Qwen2.5-Math-PRM受步骤膨胀影响最大，膨胀率达到$47.6 \pm 4.3\%$。基于置信度的扰动也会扭曲奖励校准，揭示正确性估计中的不一致性。评估了三种缓解策略，突出了鲁棒性覆盖率和假阳性率之间的权衡。

英文摘要

Process reward models (PRMs) are widely used in language-model training with dense step-level supervision. They assume PRM scores are stable proxies for step correctness under label-preserving transformations. These transformations change reasoning structure but preserve final answers. We argue this assumption is not well validated. Such transformations can change how PRM scores relate to correctness signals, leading to different failure modes across models. To address this gap, we introduce \textbf{EST-PRM}, a stress-testing framework for dense process rewards. It applies three transformations: (1) step inflation, (2) dependency-aware step reordering, and (3) confidence markers. A vulnerability decomposition is defined that separates reward inflation from loss of correctness sensitivity. Five PRM-style models are evaluated on 4,687 reasoning chains from MATH-500, GSM8K, and PRMBench. The results indicate clear differences in vulnerability patterns across models. Math-Shepherd shows the strongest sensitivity to position perturbations, with a Pearson correlation drop of $0.152 \pm 0.038$ and a $32.8 \pm 4.9\%$ score inflation rate. Qwen2.5-Math-PRM is most affected by step inflation, reaching a $47.6 \pm 4.3\%$ inflation rate. Confidence-based perturbations also distort reward calibration, revealing inconsistencies in correctness estimation. Three mitigation strategies are evaluated, highlighting trade-offs between robustness coverage and false-positive rates.

2606.00432 2026-06-02 cs.LG 版本更新

Grounded Decoding: Retrieval-Anchored Probability Fusion for Faithful RAG

Grounded Decoding: 面向忠实RAG的检索锚定概率融合

Ibne Farabi Shihab, Fariya Afrin, Sanjeda Akter, Anuj Sharma

发表机构 * Department of Computer Science, Iowa State University（爱荷华州立大学计算机科学系）； Department of Computer Science, Kalinga Institute of Industrial Technology（卡林加工业技术学院计算机科学系）； Department of Civil, Construction & Environmental Engineering, Iowa State University（爱荷华州立大学土木、建设与环境工程系）

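AI summary: Proposes Grounded Decoding, a training-free inference-time decoding framework that fuses the full RAG distribution with a retrieval-only distribution through a KL-barycenter objective and adds conflict-aware adaptive weighting to improve the factual consistency of RAG.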

AI中文摘要

随着检索增强生成（RAG）系统的扩展，确保忠实基于外部证据变得越来越具有挑战性。当冲突出现时，大型语言模型仍可能优先考虑参数化知识而非检索信息。我们提出了一种新颖的无训练解码框架——\emph{Grounded Decoding}，旨在不修改模型参数的情况下提高RAG的事实一致性。与依赖单一条件分布的标准方法不同，我们的方法在每个生成步骤构建两个匹配提示分布：（1）以查询、检索文档和生成前缀为条件的完整RAG分布，以及（2）仅以检索证据和相同前缀为条件的仅检索分布。最终的下一词分布被推导为概率单纯形上KL-重心目标的唯一解，产生两个分布的归一化几何融合。当接地权重为零时，该公式自然恢复标准RAG，并随着接地强度增加平滑地将概率质量移向检索证据。我们进一步引入了一种冲突感知自适应加权方案，该方案基于分布分歧和检索器置信度动态调整接地。在ALCE、Natural Questions和FActScore上的实验表明，与标准RAG和有竞争力的解码时基线相比，在事实准确性和引用质量上取得了一致改进，同时保持了流畅性。我们的结果表明，概率级融合为忠实RAG解码提供了一种强大且高效的替代对数级干预方法。

英文摘要

As retrieval-augmented generation (RAG) systems scale, it becomes increasingly challenging to ensure faithful grounding in external evidence. Large language models may still prioritize parametric knowledge over retrieved information when conflicts arise. We propose a novel training-free decoding framework, \emph{Grounded Decoding}, designed to improve factual consistency in RAG without modifying model parameters. Unlike standard approaches that rely on a single conditional distribution, our method constructs two matched-prompt distributions at every generation step: (1) a full RAG distribution conditioned on the query, retrieved documents, and generated prefix, and (2) a retrieval-only distribution conditioned solely on retrieved evidence and the same prefix. The final next-token distribution is derived as the unique solution to a KL-barycenter objective over the probability simplex, yielding a normalized geometric fusion of the two distributions. This formulation naturally recovers standard RAG when the grounding weight is zero and smoothly shifts probability mass toward retrieved evidence as grounding strength increases. We further introduce a conflict-aware adaptive weighting scheme that dynamically adjusts grounding based on distributional disagreement and retriever confidence. Experiments on ALCE, Natural Questions, and FActScore demonstrate consistent improvements in factual accuracy and citation quality over standard RAG and competitive decoding-time baselines, while maintaining fluency. Our results indicate that probability-level fusion provides a strong and efficient alternative to logit-level intervention methods for faithful RAG decoding.
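
The normalized geometric fusion described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under assumptions the abstract does not settle: the two distributions are weighted by `1 - lam` and `lam`, with `lam` in [0, 1] standing in for the grounding weight, and the barycenter is taken as the minimizer of `(1 - lam) * KL(p || p_rag) + lam * KL(p || p_ret)`, whose closed form is the normalized weighted geometric mean. The function name, the `eps` floor, and the toy distributions are hypothetical; the paper's parametrization and its adaptive weighting scheme may differ.

```python
import math


def geometric_fusion(p_rag, p_ret, lam, eps=1e-12):
    """Return p proportional to p_rag**(1 - lam) * p_ret**lam (assumed weighting)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam is assumed to lie in [0, 1]")
    # Work in log space; eps is a hypothetical floor that avoids log(0).
    log_scores = [
        (1.0 - lam) * math.log(max(a, eps)) + lam * math.log(max(b, eps))
        for a, b in zip(p_rag, p_ret)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


# Toy next-token distributions over a three-token vocabulary (made up).
p_rag = [0.7, 0.2, 0.1]   # full RAG distribution
p_ret = [0.1, 0.8, 0.1]   # retrieval-only distribution
fused = geometric_fusion(p_rag, p_ret, lam=0.5)
```

With `lam=0` the function returns the full RAG distribution unchanged, which matches the abstract's statement that a zero grounding weight recovers standard RAG; larger `lam` moves mass toward the token favored by the retrieval-only distribution.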

2606.00431 2026-06-02 cs.LG 版本更新

Variance-sensitive Thompson sampling for generalised linear bandits, revisited

广义线性bandits的方差敏感Thompson采样，再探讨

Tom Perneczky, Marc Abeille, David Janz

发表机构 * University of Oxford（牛津大学）； Criteo AI Lab（Criteo人工智能实验室）

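AI summary: Using the Gaussian Poincaré inequality, this paper proves a variance-sensitive regret bound for Thompson sampling in stochastic generalised linear bandits, and notes that removing the warm-up phase while keeping the same variance-sensitive scaling is an open and nontrivial problem.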

2606.00428 2026-06-02 cs.LG cs.AI cs.CL 版本更新

Finer Parameter Steps for Low-Rank PEFT: A Controlled Study with CP Tensor Adapters

低秩PEFT的更细参数步长：基于CP张量适配器的控制研究

Xinjue Wang, Xiuheng Wang, Yejun Zhang, Sergiy A. Vorobyov, Esa Ollila, Zhi-Yong Wang

发表机构 * University of Science and Technology of China（中国科学技术大学）

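AI summary: This paper uses fixed-component canonical polyadic (CP) tensor adapters to obtain finer parameter steps and studies their effect on the accuracy-budget trade-off of low-rank adapters; CP adapters fill the gaps between LoRA ranks, but the effect is task-dependent.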

Comments Accepted at the ICML 2026 Workshop on CoLoRAI

AI中文摘要

低秩适配器通常通过扫描少量秩进行比较，但秩也固定了参数预算的分辨率。对于一个$2048{\times}2048$的OPT注意力投影，增加LoRA的一个秩会存储$4096$个可训练标量，导致可行的低预算适配器大小之间存在较大间隙。本文探讨具有更细容量增量的张量化适配器是否会改变观察到的精度-预算权衡。我们通过固定组件的规范多路分解（CP）张量适配器来实例化这个问题。在$32{\times}64{\times}32{\times}64$的张量化下，一个归一化的CP组件每个投影存储$193$个可训练标量，比LoRA的一个秩步长小约21倍。我们在OPT-1.3B上，在匹配的目标模块、训练协议、数据上限和种子调度下，比较了CP适配器和LoRA在SST-2、RTE和BoolQ上的表现。CP训练稳定，并填补了LoRA秩之间的空白，但效果依赖于任务：SST-2早期达到低预算平台，BoolQ在略低于LoRA饱和之前受益于额外的CP组件，而RTE仍然偏好LoRA。因此，更细的参数步长有助于诊断PEFT预算敏感性，但它们本身并不能保证更好的精度-预算曲线。

英文摘要

Low-rank adapters are usually compared by sweeping a small set of ranks, but the rank also fixes the resolution of the parameter budget. For a $2048{\times}2048$ OPT attention projection, increasing LoRA by one rank stores $4096$ trainable scalars, leaving large gaps between feasible low-budget adapter sizes. This paper asks whether a tensorized adapter with finer capacity increments changes the observed accuracy--budget trade-off. We instantiate this question with fixed-component canonical polyadic (CP) tensor adapters. Under a $32{\times}64{\times}32{\times}64$ tensorization, one normalized CP component stores $193$ trainable scalars per projection, about $21$ times smaller than one LoRA rank step. We compare CP adapters and LoRA on OPT-1.3B across SST-2, RTE, and BoolQ under matched target modules, training protocol, data caps, and seed schedules. CP trains stably and fills the gaps between LoRA ranks, but the effect is task-dependent: SST-2 reaches an early low-budget plateau, BoolQ benefits from additional CP components before saturating slightly below LoRA, and RTE remains LoRA-favored. Finer parameter steps are therefore useful for diagnosing PEFT budget sensitivity, but they do not by themselves guarantee a better accuracy--budget curve.
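
The parameter counts quoted in this abstract can be reproduced with a short calculation. The sketch below assumes that one LoRA rank adds one column to each of the two factors of a 2048 x 2048 projection, and that a normalized CP component stores one vector per tensor mode plus a single scalar weight; the extra scalar is an assumption chosen because it reproduces the reported 193, not something the abstract states. The function names are hypothetical.

```python
def lora_params_per_rank(d_out, d_in):
    """Trainable scalars added by one extra LoRA rank: one column in each factor."""
    return d_out + d_in


def cp_params_per_component(mode_sizes, with_scale=True):
    """Trainable scalars in one CP component: one vector per mode,
    plus (assumed) one scalar weight for the normalized component."""
    return sum(mode_sizes) + (1 if with_scale else 0)


lora_step = lora_params_per_rank(2048, 2048)          # 2048 + 2048
cp_step = cp_params_per_component((32, 64, 32, 64))   # 32 + 64 + 32 + 64 + 1
ratio = lora_step / cp_step                            # roughly 21
```

The tensorization is consistent with the matrix shape because 32 x 64 = 2048 on each side, so the finer step comes purely from replacing two length-2048 vectors with four short mode vectors.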

2606.00427 2026-06-02 cs.LG 版本更新

Topology-Aware State Abstraction with Tangle Cores for Markov Decision Processes

基于纠缠核的马尔可夫决策过程拓扑感知状态抽象

Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma

发表机构 * Department of Computer Science, Iowa State University（计算机科学系，爱荷华州立大学）； Department of Civil, Construction & Environmental Engineering, Iowa State University（土木、建设与环境工程系，爱荷华州立大学）

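AI summary: Proposes tangle-core abstraction, which builds overlapping state abstractions from graph tangles of empirical transition graphs, guarantees value preservation under an action-consistency condition, and reports favorable compression-return trade-offs against existing baselines on bottlenecked domains.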

AI中文摘要

强化学习中的状态抽象通常被形式化为基于奖励和转移相似性的状态划分。这排除了导航、图和分层决策问题中的常见结构模式：接口状态（如门、枢纽和瓶颈）自然参与多个区域。我们引入了\emph{纠缠核抽象}，一种基于经验转移图的图纠缠的重叠状态抽象框架。该方法从一致定向的低阶分离中构建抽象状态，并通过隶属核而非硬划分来表示共享接口。我们在显式动作一致性条件下给出了诱导的重叠抽象MDP的价值保持保证，识别了内部同质性/边界泄漏误差分解，并证明了一个定量接口重叠结果，表明硬划分何时会引入可避免的边界误差。实验上，在瓶颈表格领域、程序生成迷宫和MiniGrid表示中，纠缠核抽象在压缩-回报权衡上优于奖励感知、学习、拓扑映射和图划分基线。我们还识别了一个清晰的失败机制，即转移拓扑无信息时，纠缠可预测地几乎没有益处。这些结果将图纠缠定位为具有共享接口结构的决策问题的有效拓扑感知抽象先验。

英文摘要

State abstraction in reinforcement learning is usually formulated as a partition of states based on reward and transition similarity. This excludes a common structural pattern in navigation, graph, and hierarchical decision problems: interface states such as doors, hubs, and bottlenecks naturally participate in more than one region. We introduce \emph{tangle-core abstraction}, an overlapping state-abstraction framework based on graph tangles of empirical transition graphs. The method constructs abstract states from consistently oriented low-order separations and represents shared interfaces through a membership kernel rather than a hard partition. We give value-preservation guarantees for the induced overlapping abstract MDP under an explicit action-consistency condition, identify an interior-homogeneity/boundary-leakage error decomposition, and prove a quantitative interface-overlap result showing when hard partitions incur an avoidable boundary error. Empirically, tangle-core abstractions achieve favorable compression--return tradeoffs against reward-aware, learned, topological-map, and graph-partitioning baselines across bottlenecked tabular domains, procedurally generated mazes, and MiniGrid representations. We also identify a clear failure regime in which transition topology is uninformative, where tangles predictably offer little benefit. These results position graph tangles as an effective topology-aware abstraction prior for decision problems with shared interface structure.

2606.00426 2026-06-02 cs.LG 版本更新

Canonicalized Stable-List Replay for Private Federated Continual Learning over Language-Model Embeddings

规范化稳定列表回放：面向语言模型嵌入的私有联邦持续学习

Ibne Farabi Shihab, Abu Sa-Adat Mohamed Moon-Im Al Ahsan, Anuj Sharma

发表机构 * Department of Computer Science, Iowa State University（爱荷华州立大学计算机科学系）； Department of Computer Science & Engineering, BRAC University（BRAC大学计算机科学与工程系）； Department of Civil, Construction & Environmental Engineering, Iowa State University（爱荷华州立大学土木、建设与环境工程系）

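AI summary: To address the fact that replay lists released under differential privacy in federated continual learning are unordered across clients, proposes Canonicalized Stable-List Replay (CSLR), which aligns client distributions using signatures induced by public anchor sentences and improves performance on several benchmarks.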

AI中文摘要

联邦持续学习（FCL）允许分布式客户端在不共享原始文本的情况下，将语言模型头部适应不断演变的NLP任务。在用户级差分隐私（DP）下，基于回放的持续学习面临一个结构性障碍：客户端只能发布候选回放摘要的小型噪声列表，且这些列表在客户端之间是无序的。我们引入了规范化稳定列表回放（CSLR），其中客户端在共享的句子嵌入空间上私有地生成候选回放分布，服务器使用公共锚句诱导的签名对齐它们。锚点提供聚合的可识别性，而不是额外的回放数据。我们证明，在可观测的锚签名间隔下，$O(\log(N/\eta)/p)$个锚点以至少$1-\eta$的概率区分$N$个候选列表元素，并给出了无序标签预言机模型的范围性无锚不可识别性结果。在持续分类、NER和对话基准的五个随机种子上，CSLR在报告的回放发布预算下，在$\varepsilon=4$时，最终平均任务指标比最强的非CSLR DP基线提高了3.9-5.6个点，同时也优于匈牙利匹配和最优传输匹配。形式化隐私保证涵盖回放发布；端到端私有训练还需要与用于任务头更新的私有优化器组合。

英文摘要

Federated continual learning (FCL) lets distributed clients adapt language-model heads to evolving NLP tasks without sharing raw text. Under user-level differential privacy (DP), replay-based continual learning faces a structural obstacle: clients can release only small noisy lists of candidate replay summaries, and those lists are unordered across clients. We introduce Canonicalized Stable-List Replay (CSLR), where clients privately produce candidate replay distributions over a shared sentence-embedding space and the server aligns them using signatures induced by public anchor sentences. The anchors provide identifiability for aggregation rather than additional replay data. We prove that, under an observable anchor-signature margin, $O(\log(N/\eta)/p)$ anchors distinguish $N$ candidate list elements with probability at least $1-\eta$, and we give a scoped anchorless non-identifiability result for unordered-label oracle models. Across five seeds on continual classification, NER, and dialogue benchmarks, CSLR improves the final average task metric by 3.9--5.6 points over the strongest non-CSLR DP baseline at $\varepsilon=4$ under the reported replay-release budget, while also outperforming Hungarian and optimal-transport matchers. The formal privacy guarantee covers replay release; end-to-end private training additionally requires composition with a private optimizer for task-head updates.

2606.00422 2026-06-02 cs.IR cs.LG 版本更新

UniPinRec: Unifying Generative Retrieval and Ranking at Pinterest Scale

UniPinRec：在Pinterest规模下统一生成式检索与排序

Hanyu Li, Yi-Ping Hsu, Aditya Mantha, Prabhat Agarwal, Laksh Bhasin, Jialu Wang, Hongtao Lin, Bella Huang, Yaxin Li, Xinyi Li, Chuxi Wang, Kousik Rajesh, Hooshmand Shokri Razaghi, Shunyao Li, Zongyue Qin, Jaewon Yang, James Li, Dhruvil Deven Badani, Jiajing Xu, Charles Rosenberg

发表机构 * Pinterest

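AI summary: Proposes UniPinRec, which encodes user action sequences with a shared transformer and combines Masked Action Modeling, blended training examples, and cross-stage KV cache sharing to achieve what the authors describe as the first full-stack unification of retrieval and ranking in a production system, lifting online engagement at Pinterest and cutting latency.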

AI中文摘要

现代推荐系统主要将检索和排序作为独立模型训练，尽管两者都越来越依赖编码相同用户行为数据的大型Transformer，导致参数、计算和服务成本重复。先前的工作统一了模型架构，但未统一完整流程：输入格式、训练过程和服务栈在阶段间仍然分散。我们提出UniPinRec，在Pinterest实现了检索和排序的全栈统一：一种输入格式、一个模型、一个训练阶段，部署在现有服务基础设施中。共享Transformer将用户行为序列编码为候选无关的表示，通过任务特定的头部分支到检索（ANN点积）和排序（交叉注意力）。三个关键思想使此工作成立：（1）掩码动作建模（MAM）消除了交错，使得无需加倍上下文长度即可实现权重共享；（2）混合训练样本将动作序列与feedview曝光列表配对，以共同满足两个目标；（3）跨阶段KV缓存共享重用检索中的用户历史计算用于排序，相比服务两个独立模型减少了总FLOPs。部署在Pinterest核心表面，UniPinRec实现了约+1%的在线参与度提升，同时将端到端服务延迟降低11.1%，QPS提升63.6%。据我们所知，这是首个在生产推荐系统中实现检索和排序全栈统一的工作，涵盖输入、模型、训练和服务。

英文摘要

Modern recommendation systems predominantly train retrieval and ranking as separate models despite both increasingly relying on large transformers encoding the same user behavior data, duplicating parameters, compute, and serving cost. Prior work unifies the model architecture but not the full pipeline: input formats, training procedures, and serving stacks remain fragmented across stages. We present UniPinRec, which achieves full-stack unification of retrieval and ranking at Pinterest: one input format, one model, one training stage, deployed within existing serving infrastructure. A shared transformer encodes the user action sequence into candidate-independent representations that branch into retrieval (ANN dot-product) and ranking (cross-attention) via task-specific heads. Three ideas make this work: (1) Masked Action Modeling (MAM) eliminates interleaving, enabling weight sharing without doubling context length; (2) Blended training examples pair action sequences with feedview impression slates to satisfy both objectives jointly; (3) Cross-stage KV cache sharing reuses user-history computation from retrieval for ranking, reducing total FLOPs versus serving two independent models. Deployed in the Pinterest core surfaces, UniPinRec delivers approximately +1% online engagement lift while cutting end-to-end serving latency by 11.1% and lifting QPS by 63.6%. To our knowledge, this is the first full-stack unification of retrieval and ranking, covering inputs, model, training and serving, deployed in a production recommendation system.

2606.00414 2026-06-02 cs.LG 版本更新

Auditing Near-Optimal Policies Can Be Exponentially Hard: Conditional Query Lower Bounds via Occupancy Rashomon Capacity

审计近最优策略可能是指数级困难的：通过占用Rashomon容量的条件查询下界

Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma

发表机构 * Department of Computer Science, Iowa State University（计算机科学系，爱荷华州立大学）； Department of Civil, Construction & Environmental Engineering, Iowa State University（土木、建设与环境工程系，爱荷华州立大学）

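AI summary: Introduces an occupancy-measure analogue of Rashomon capacity and shows that, when many near-optimal policies exist, auditing their behavioral differences can require exponentially many queries under the stated conditions, with lower bounds for both exact and noisy queries.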

AI中文摘要

当许多强化学习策略达到近最优回报时，事后审计员可能必须区分许多行为不同但回报等价的策略。我们通过Rashomon容量的占用度量类比来形式化这一现象：近最优占用区域的度量熵，相对于被审计的部署类别计算。由于占用度量仅识别到占用等价，我们在占用类别级别上制定审计，并区分精确局部查询预言机和噪声样本查询预言机。我们的主要精确查询结果是条件性的：如果被审计类别包含一个$2/H$-分离的近最优填充，其局部签名是$b$-稀疏的，那么精确局部查询审计需要$\Omega(M/b)$次查询；当填充实现部署类别容量且$b=O(1)$时，这变为$\Omega(2^{\Hopt^\cF(\eps)})$。我们给出了一个有限折扣隐藏分支MDP达到此界，并展示了精确贝叶斯成功律。对于噪声隐藏触发测试，我们证明了阶为$M/\beta$的混合下界，其中$\beta$是每样本KL信号，对于容量阶填充且$\beta=O(\rho^2\Delta^2)$，得到$\Omega(2^{\Hopt^\cF(\eps)}/(\rho^2\Delta^2))$。我们还提供了静态目标识别信息下界、一个转录兼容的预言机覆盖验证上界，以及一个规范占用正则化器，当存在可信参考占用时，其正则化审计容量会崩溃。受控基准将正稀疏签名实例与精确审计容易的高容量阴性对照区分开来，并将噪声触发律映射到后处理的连续控制和视觉RL审计体制。

英文摘要

When many reinforcement-learning policies achieve near-optimal return, a post-hoc auditor may have to distinguish among many behaviorally distinct but return-equivalent policies. We formalize this phenomenon through an occupancy-measure analogue of Rashomon capacity: the metric entropy of the near-optimal occupancy region, computed relative to an audited deployment class. Because occupancy measures identify behavior only up to occupancy equivalence, we formulate auditing at the occupancy-class level and distinguish exact local-query oracles from noisy sample-query oracles. Our main exact-query result is conditional: if the audited class contains a $2/H$-separated near-optimal packing whose local signatures are $b$-sparse, then exact local-query auditing requires $Ω(M/b)$ queries; when the packing realizes deployment-class capacity and $b=O(1)$, this becomes $Ω(2^{\Hopt^\cF(\eps)})$. We give a finite discounted hidden-branch MDP attaining this bound and show the exact Bayes success law. For noisy hidden-trigger testing, we prove a mixture lower bound of order $M/β$, where $β$ is the per-sample KL signal, yielding $Ω(2^{\Hopt^\cF(\eps)}/(ρ^2Δ^2))$ for capacity-order packings with $β=O(ρ^2Δ^2)$. We also provide a static target-recognition information lower bound, a transcript-compatible oracle-cover verification upper bound, and a canonical occupancy regularizer whose regularized audited capacity collapses when a trusted reference occupancy is available. Controlled benchmarks distinguish positive sparse-signature instances from high-capacity negative controls where exact auditing is easy, and map the noisy-trigger law to post-processed continuous-control and visual-RL auditing regimes.

2606.00413 2026-06-02 stat.ML cs.LG 版本更新

Riemannian Stochastic Optimization for Sufficient Dimension Reduction

充分降维的黎曼随机优化

Thibault Pautrel, François Portier

发表机构 * Laboratoire des Signaux et Systèmes (L2S), CentraleSupélec, Université Paris-Saclay, Gif-sur-Yvette, France（信号与系统实验室（L2S），巴黎中央理工-高等电力学院，巴黎-萨克雷大学，伊维特河畔吉夫，法国）

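AI summary: Proposes SMAVE, an algorithm based on Riemannian stochastic gradient ascent that recasts sufficient dimension reduction as a smooth maximization on the Stiefel manifold in order to recover the low-dimensional subspace efficiently.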

AI中文摘要

充分降维（SDR）通过将协变量投影到保留响应条件均值的低维子空间，使高维回归变得易于处理。现有的基于梯度的估计器要么在原始空间中操作并遭受维数灾难，要么在降维空间中局部化，每次外迭代的代价至少与样本量成二次关系。我们证明了总体最小平均方差估计（MAVE）风险的最小化器与梯度外积（OPG）逼近相同的Grassmannian目标，并将经验准则重新表述为Stiefel流形上的光滑最大化，具有闭式黎曼梯度。由此产生的算法SMAVE结合了稀疏投影空间最近邻局部化和黎曼随机梯度上升。简化版本具有几乎必然收敛性和非渐近速率，匹配标准的非凸随机一阶缩放。实验上，SMAVE在中高维环境中匹配或改进了RMAVE的合成子空间恢复，在四个真实数据集上一致优于OPG，并且与RMAVE相比具有竞争力或更优，同时运行时间低几个数量级。

英文摘要

Sufficient dimension reduction (SDR) makes high-dimensional regression tractable by projecting the covariates onto a low-dimensional subspace that preserves the conditional mean of the response. Existing gradient-based estimators either operate in the ambient space and suffer from the curse of dimensionality, or localize in the reduced space at a per-outer-iteration cost at least quadratic in the sample size. We show that minimizers of the population Minimum Average Variance Estimation (MAVE) risk approximate the same Grassmannian target as the Outer Product of Gradients (OPG), and recast the empirical criterion as a smooth maximization on the Stiefel manifold with closed-form Riemannian gradient. The resulting algorithm, SMAVE, combines sparse projected-space nearest-neighbor localization with Riemannian stochastic gradient ascent. A simplified version comes with almost-sure convergence and a non-asymptotic rate matching the standard non-convex stochastic first-order scaling. Empirically, SMAVE matches or improves on RMAVE's synthetic subspace recovery at moderate-to-high ambient dimension, and on four real datasets it uniformly improves over OPG and is competitive with or outperforms RMAVE at orders of magnitude lower runtime.
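
To make the phrase "Riemannian stochastic gradient ascent on the Stiefel manifold" concrete, here is a minimal NumPy sketch. It is not SMAVE: the objective is a toy stand-in (the mini-batch energy `0.5 * mean(||B X||^2)`, whose maximizer spans the top principal subspace), the tangent-space projection uses the embedded metric, and the retraction is a QR factorization. All function names, step sizes, and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def riemannian_grad(X, G):
    """Project a Euclidean gradient G onto the tangent space of the Stiefel manifold at X."""
    XtG = X.T @ G
    return G - X @ ((XtG + XtG.T) / 2.0)


def qr_retraction(Y):
    """Map a full-rank matrix back to orthonormal columns via QR."""
    Q, R = np.linalg.qr(Y)
    # Fix column signs so the result does not depend on the QR sign convention.
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs


def stiefel_sga(samples, d, steps=300, batch=32, lr=0.05, seed=0):
    """Stochastic gradient ascent for the toy objective 0.5 * mean(||B X||^2)."""
    rng = np.random.default_rng(seed)
    n, p = samples.shape
    X = qr_retraction(rng.standard_normal((p, d)))
    for _ in range(steps):
        B = samples[rng.integers(0, n, size=batch)]   # random mini-batch
        G = B.T @ (B @ X) / batch                      # Euclidean gradient
        X = qr_retraction(X + lr * riemannian_grad(X, G))
    return X


# Synthetic data whose variance is concentrated in the first two coordinates.
data_rng = np.random.default_rng(1)
scales = np.array([3.0, 2.5, 0.3, 0.3, 0.3, 0.3])
samples = data_rng.standard_normal((2000, 6)) * scales
X = stiefel_sga(samples, d=2)
```

Every iterate keeps orthonormal columns because of the retraction, so the constraint never needs a penalty term; in this toy setting the columns of `X` end up concentrated on the two high-variance coordinates.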

2606.00404 2026-06-02 cs.CV cs.LG 版本更新

Rethinking Amortized Neural Representations for High-Resolution Terrain Elevation Data

重新思考高分辨率地形高程数据的摊销神经表示

Haoan Feng, Xin Xu, Leila De Floriani

发表机构 * University of Maryland, College Park（马里兰大学学院公园分校）

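AI summary: For terrain elevation data, proposes HUVR+SIREN, a hypernetwork that replaces the coordinate decoder with a smooth, analytically differentiable one; it attains the best height and derivative fidelity on a unified benchmark and tolerates post-training quantization for compression.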

Comments 12 pages, 7 figures, 10 tables

AI中文摘要

隐式神经表示（INR）将信号建模为连续的坐标到值函数。对于地形高程数据，这支持解析导数、任意分辨率解码以及底层高度场的平滑表面模型。然而，为每个瓦片拟合和存储单独的INR无法扩展到大型地形数据集。摊销神经表示通过共享网络降低了这一成本：新瓦片被映射到紧凑的每瓦片载荷，共享解码器从中重建高度场。大多数此类方法是超网络，通过单次前向传递预测载荷，而其他方法则通过短时的每瓦片优化恢复载荷。这些方法主要针对自然图像开发，其在地形高度场上的适用性尚不清楚。我们在1米/像素的地形数据集上引入了受控基准，并在统一协议下评估了三种代表性方法。观察到明显的跨领域差距后，我们提出了HUVR+SIREN，这是一种超网络，它通过将坐标解码器替换为平滑、解析可微的解码器来适应最强的基准方法（HUVR）。它在基准上实现了最佳的高度和导数保真度，无需额外的每瓦片存储且解码成本更低，并且能够容忍激进的后训练量化而质量损失可忽略，从而形成了紧凑的地形神经格式。消融和诊断进一步确定了哪些设计选择可迁移到地形，并表明每瓦片瓶颈已接近其有用极限，剩下的差距在于共享超网络的架构设计。

英文摘要

Implicit neural representations (INRs) model a signal as a continuous coordinate-to-value function. For terrain elevation data, this supports analytic derivatives, arbitrary-resolution decoding, and a smooth surface model of the underlying heightfield. However, fitting and storing a separate INR for every tile does not scale to large terrain datasets. Amortized neural representations reduce this cost with a shared network: a new tile is mapped to a compact per-tile payload, and a shared decoder reconstructs the heightfield from it. Most such methods are hypernetworks that predict the payload in a single forward pass, while others recover it through a short per-tile optimization. These methods were developed primarily for natural images, and their suitability for terrain heightfields remains unclear. We introduce a controlled benchmark on a 1 m/pixel terrain dataset and evaluate three representative methods under a unified protocol. Observing a clear cross-domain gap, we propose HUVR+SIREN, a hypernetwork that adapts the strongest benchmarked method (HUVR) by replacing its coordinate decoder with a smooth, analytically differentiable one. It attains the best height and derivative fidelity on the benchmark with no additional per-tile storage and lower decode cost, and tolerates aggressive post-training quantization with negligible quality loss, giving a compact terrain neural format. Ablations and diagnostics further identify which design choices transfer to terrain and show that the per-tile bottleneck is already near its useful limit, leaving the remaining gap in the shared hypernetwork's architectural design.

2606.00401 2026-06-02 physics.comp-ph cond-mat.mtrl-sci cs.LG cs.NA math.NA 版本更新

Data-Driven Spectral Prediction for Accelerating Large-Scale Electronic Structure Calculations

数据驱动的光谱预测加速大规模电子结构计算

Abhiram Badrinarayanan, Davor Davidovic, Edoardo Di Napoli, Jurica Novak, Luigi Genovese, Gustavo Ramirez-Hidalgo, Xinzhe Wu

发表机构 * Jülich Supercomputing Centre, Forschungszentrum Jülich, Germany（于利希超级计算中心，于利希研究中心，德国）； Ruđer Bošković Institute, Croatia（鲁德·波斯科维奇研究所，克罗地亚）

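AI summary: Targeting the generalized-eigenproblem bottleneck in large-scale electronic structure calculations, proposes a data-driven spectral prediction framework that uses machine learning to predict Chebyshev polynomial coefficients, providing initial guesses that bypass early self-consistent field iterations; the predictors are later intended to tune rational filter-based eigensolvers.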

AI中文摘要

模拟包含数千个原子的大分子系统需要高度可扩展的方法。虽然现代密度泛函理论（DFT）代码具有线性标度性，但在百亿亿次架构上，求解相关的大规模稀疏广义本征问题仍然是关键的计算瓶颈。在LimitX项目背景下，我们提出了一个数据驱动框架来加速这些计算。通过将机器学习目标从离散特征值转移到插值切比雪夫多项式的系数，并比较全原子和基于片段的结构表示，我们成功克服了大规模光谱预测的维度限制。我们研究了三种机器学习模型（核岭回归、图神经网络和随机森林），这些模型在包含2 TB蛋白质二聚体的新数据集上进行训练。预测的光谱提供了初始猜测，有效跳过了BigDFT中的早期自洽场（SCF）迭代。最终，这些光谱预测器将被部署以动态优化即将推出的基于有理滤波器的本征求解器（如目前处于初期开发阶段的FrASE）。

英文摘要

Simulating large molecular systems comprising thousands of atoms requires highly scalable methodologies. While modern Density Functional Theory (DFT) codes exhibit linear scaling, solving the associated large, sparse generalized eigenproblems remains a critical computational bottleneck on exascale architectures. In the context of the LimitX project, we propose a data-driven framework to accelerate these calculations. By shifting the machine learning target from discrete eigenvalues to the coefficients of an interpolating Chebyshev polynomial, and by comparing both all-atom and fragment-based structural representations, we successfully overcome the dimensionality constraints of large-scale spectral prediction. We investigate three machine learning models (Kernel Ridge Regression, Graph Neural Networks, and Random Forests) trained on a novel 2 TB dataset of protein dimers. The predicted spectra provide initial guesses that effectively bypass early Self-Consistent Field (SCF) iterations in BigDFT. Ultimately, these spectral predictors will be deployed to dynamically optimize upcoming rational filter-based eigensolvers, such as FrASE, which is currently in initial development.
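
The idea of predicting Chebyshev coefficients instead of individual eigenvalues can be illustrated with NumPy. The sketch below makes an assumption the abstract does not spell out: the sorted spectrum is treated as a smooth curve over a normalized index in [-1, 1] and compressed by a least-squares Chebyshev fit, so a learning model would only need to output `degree + 1` numbers. The function names and the synthetic spectrum are hypothetical, and the paper's actual target construction may differ.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb


def spectrum_to_cheb_coeffs(eigvals, degree):
    """Compress a spectrum into Chebyshev coefficients of the sorted-eigenvalue curve."""
    lam = np.sort(np.asarray(eigvals, dtype=float))
    t = np.linspace(-1.0, 1.0, lam.size)    # normalized eigenvalue index
    return cheb.chebfit(t, lam, degree)     # least-squares fit in the Chebyshev basis


def cheb_coeffs_to_spectrum(coeffs, n):
    """Expand predicted coefficients back into n approximate eigenvalues."""
    t = np.linspace(-1.0, 1.0, n)
    return cheb.chebval(t, coeffs)


# Synthetic smooth spectrum standing in for real eigenvalues (illustration only).
n = 500
eigvals = np.sinh(np.linspace(-2.0, 2.0, n))
coeffs = spectrum_to_cheb_coeffs(eigvals, degree=10)
recon = cheb_coeffs_to_spectrum(coeffs, n)
max_err = float(np.max(np.abs(recon - np.sort(eigvals))))
```

Here 500 eigenvalues are summarized by 11 coefficients, which is the kind of fixed, size-independent target that sidesteps the dimensionality problem mentioned in the abstract.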

2606.00400 2026-06-02 cs.LG 版本更新

Dynamic Proxy-Mixing: Transferring Replay Controllers from Small to Large Models for Continual Instruction Tuning

动态代理混合：将重放控制器从小模型迁移到大模型以进行持续指令微调

Ibne Farabi Shihab, Fariya Afrin, Anuj Sharma

发表机构 * Department of Computer Science, Iowa State University（爱荷华州立大学计算机科学系）； Department of Computer Science, Kalinga Institute of Industrial Technology（卡林加工业技术学院计算机科学系）； Department of Civil, Construction & Environmental Engineering, Iowa State University（爱荷华州立大学土木、建筑与环境工程系）

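AI summary: Proposes PROXYMIX, a framework that learns a dynamic replay controller on a small proxy model and transfers it, frozen, to a larger model, addressing the forgetting that fixed replay ratios leave in continual instruction tuning; on LLaMA-3-8B it improves average accuracy by 3.4 points, reduces final forgetting by 3.5 points, and raises the safety score by 5.8 points over the strongest non-oracle baseline.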

AI中文摘要

持续指令微调通过一系列新领域更新语言模型，但每次更新会逐渐侵蚀先前学到的能力和对齐行为。重放是标准的缓解方法，但固定重放比例本质上有限，因为最优混合比例随当前领域、训练阶段以及先前行为的脆弱性而变化。我们提出PROXYMIX框架，该框架在小代理模型上学习动态重放控制器，并将冻结的控制器迁移到更大的目标模型。控制器从未见过未来任务，而是从归一化的验证损失及其时间动态构建状态，生成当前任务和可访问重放缓冲区的掩码混合。我们的核心经验假设是遗忘镜像：即使绝对损失大小不同，任务脆弱性排名在不同模型规模上基本一致。在跨规模迁移控制器之前，我们通过实验验证了这一假设。在LLaMA-3-8B上跨越五个持续指令微调序列，PROXYMIX在平均准确率上提高了3.4个百分点，最终遗忘降低了3.5个百分点，安全性得分比最强的非神谕基线提高了5.8个百分点，而策略学习成本约为Oracle Target RL的五十分之一。该框架在接口层面无泄漏且架构无关，我们还确定了代理假设失效的设置，突出了鲁棒部署的局限性。

英文摘要

Continual instruction tuning updates a language model through a sequence of new domains, yet each update can progressively erode previously learned capabilities and alignment behavior. Replay is the standard mitigation, but fixed replay ratios are inherently limited because the optimal mixture varies with the current domain, the training stage, and the evolving vulnerability of prior behaviors. We propose PROXYMIX, a framework that learns a dynamic replay controller on a small proxy model and transfers the frozen controller to a larger target. The controller never observes future tasks and constructs its state from normalized validation losses and their temporal dynamics, producing a masked mixture over the current task and accessible replay buffers. Our core empirical hypothesis is forgetting mirroring: task vulnerability rankings remain largely consistent across model scales even when absolute loss magnitudes differ. We validate this assumption empirically before transferring controllers across scales. On LLaMA-3-8B across five continual instruction tuning sequences, PROXYMIX improves average accuracy by 3.4 points, reduces final forgetting by 3.5 points, and raises safety score by 5.8 points over the strongest non-oracle baseline, at roughly 50x lower policy learning cost than Oracle Target RL. The framework is leakage free and architecture independent at the interface level, and we also identify settings where the proxy assumption breaks down, highlighting limitations for robust deployment.

2606.00399 2026-06-02 cs.LG 版本更新

Multi-Objective Reference-Aligned Machine Unlearning

多目标参考对齐机器遗忘

Rasa Khosrowshahli, Stephen Asobiela, Beatrice Ombuki-Berman, Shahryar Rahnamayan

发表机构 * arXiv

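AI summary: Proposes RAUL, a multi-objective framework that bounds the forgetting objective by aligning predictions on forgotten samples to a reference distribution (uniform, or empirical from a held-out set), solves the resulting multi-objective problem with Jacobian descent, and comes close to the results of full retraining.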

Comments Accepted as a short paper at Canadian AI 2026. Author version with an added framework overview figure for clarity

AI中文摘要

机器遗忘旨在移除特定训练样本的影响，同时保持模型的效用。现有的单目标方法，如梯度上升或随机重标，常常由于冲突的优化动态和无界的遗忘目标导致灾难性遗忘，使模型偏离其预训练知识。我们提出参考对齐遗忘（RAUL），一个多目标框架，通过将遗忘样本上的无界损失最大化替换为有界的KL对齐，使其预测对齐到代表未见数据的参考分布（可实例化为均匀分布或来自保留参考集的经验分布），从而约束遗忘目标并减少与保留目标的梯度冲突，联合优化遗忘和保留。通过雅可比下降解决由此产生的多目标优化（MOO）问题，该算法将多个梯度聚合到无冲突的方向。我们的结果表明，与完全重训练相比，RAUL实现了最接近的差距。

英文摘要

Machine unlearning aims to remove the influence of specific training samples while preserving the model's utility. Existing single-objective approaches, such as gradient ascent or random relabeling, often induce catastrophic forgetting due to conflicting optimization dynamics and unbounded forgetting objectives that cause the model to drift from its pre-trained knowledge. We propose Reference-Aligned UnLearning (RAUL), a multi-objective framework that jointly optimizes forgetting and retention by replacing unbounded loss maximization with a bounded KL alignment of predictions on forgotten samples toward a reference distribution representing unseen data, instantiated either as a uniform distribution or an empirical distribution from a held-out reference set, which constrains the forgetting objective and reduces gradient conflict with retention. The resulting multi-objective optimization (MOO) problem is solved via Jacobian descent, which aggregates multiple gradients into a direction that does not conflict. Our results demonstrate that RAUL achieves the closest gap compared to full retraining.
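
The difference between an unbounded forgetting loss and a bounded KL alignment can be seen in a small pure-Python sketch. It assumes the uniform-reference instantiation mentioned in the abstract and takes the divergence in the direction `KL(prediction || reference)`; the abstract does not state the direction, so this choice, the function names, and the toy logits are all assumptions. In this form the loss is at least 0 and at most `log(K)` for `K` classes, whereas a cross-entropy term maximized by gradient ascent has no upper limit.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)


def forget_loss(logits, reference):
    """Bounded forgetting objective: pull predictions toward the reference (assumed direction)."""
    return kl_divergence(softmax(logits), reference)


num_classes = 4
uniform = [1.0 / num_classes] * num_classes
confident = forget_loss([8.0, 0.0, 0.0, 0.0], uniform)   # model still "remembers" the sample
aligned = forget_loss([0.0, 0.0, 0.0, 0.0], uniform)     # prediction already matches the reference
```

Minimizing this loss stops pushing once the prediction matches the reference, which is one plausible reading of why a bounded target conflicts less with the retention gradient. The multi-gradient aggregation step (Jacobian descent) is not sketched here.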

2606.00392 2026-06-02 cs.LG cs.AI 版本更新

How Much Orthogonalization Does Muon Need?

Muon 需要多少正交化？

Hua Huang

发表机构 * NVIDIA

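AI summary: Studies how much orthogonalization the Muon optimizer needs, proposes cubic5, a low-cost orthogonalization variant built from five cubic Newton-Schulz steps, and reports that it performs on par with higher-accuracy methods on several models.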

AI中文摘要

Muon 优化器通过将病态动量更新替换为近似半正交更新来改进神经网络训练。这引出一个实际问题：Muon 实际上需要多少正交化？我们使用直接为 Muon 的低精度奇异值带导出的松弛三次牛顿-舒尔茨调度来研究这个问题。所得的五步三次构造使用十次主导矩阵乘法，而五步五阶（quintic）牛顿-舒尔茨迭代需要十五次。三次调度并非旨在作为更精确的极分解求解器；相反，它是一种原则性的低成本变体，使我们能够探究极分解精度、谱整形和训练质量之间的关系。通过合成诊断、NanoGPT 消融实验以及混合 MoE/Mamba 模型的训练实验，我们发现训练质量并非由极分解精度单调决定：截断的 Polar Express、Muon-Jordan、三次牛顿-舒尔茨以及显式 FP32 SVD 极分解因子在 GPT-2 Small 上可达到几乎无法区分的最终损失，而 cubic5 在具有十亿到四十亿参数的混合 MoE/Mamba 模型上，其验证损失与 Muon-Jordan 五阶更新相差约 $10^{-3}$。这些结果支持 cubic5 作为一种实用的低成本 Muon 正交化变体，并在测试的设置中提供了训练质量等同的实验证据。

英文摘要

Muon optimizers improve neural-network training by replacing ill-conditioned momentum updates with approximately semi-orthogonal updates. This motivates a practical question: how much orthogonalization does Muon actually require? We study this question using a relaxed cubic Newton--Schulz schedule derived directly for Muon's low precision singular value band. The resulting five-step cubic construction uses ten dominant matrix multiplications, compared with fifteen for five quintic Newton--Schulz iterations. The cubic schedule is not intended as a more accurate polar solver; instead, it is a principled low-cost variant that lets us probe the relation between polar accuracy, spectral shaping, and training quality. Across synthetic diagnostics, NanoGPT ablations, and training experiments on hybrid MoE/Mamba models, we find that training quality is not governed monotonically by polar-decomposition accuracy: truncated Polar Express, Muon-Jordan, cubic Newton--Schulz, and an explicit FP32 SVD polar factor can reach nearly indistinguishable final loss on GPT-2 Small, and cubic5 matches the Muon-Jordan quintic update within about $10^{-3}$ validation loss on hybrid MoE/Mamba models with one billion to four billion parameters. These results support cubic5 as a practical low-cost Muon orthogonalization variant, with empirical evidence of training-quality parity in the settings tested.
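
The matrix-multiplication count quoted in this abstract follows from the shape of a cubic Newton-Schulz step, which the NumPy sketch below makes explicit. It uses the textbook coefficients `1.5` and `-0.5`, not the relaxed schedule derived in the paper (those coefficients are not given in the abstract), and the Frobenius normalization and function name are assumptions for illustration.

```python
import numpy as np


def cubic_newton_schulz(G, steps=5):
    """Textbook cubic Newton-Schulz iteration X <- 1.5 X - 0.5 (X X^T) X.

    Returns the approximately semi-orthogonal matrix and the number of
    large matrix multiplications performed (two per step).
    """
    # Frobenius scaling places all singular values in (0, 1], where the iteration converges.
    X = G / (np.linalg.norm(G) + 1e-12)
    matmuls = 0
    for _ in range(steps):
        A = X @ X.T                    # matmul 1
        X = 1.5 * X - 0.5 * (A @ X)    # matmul 2
        matmuls += 2
    return X, matmuls


rng = np.random.default_rng(0)
G = rng.standard_normal((4, 6))        # toy stand-in for a momentum matrix
X, matmuls = cubic_newton_schulz(G, steps=5)
s_before = np.linalg.svd(G / np.linalg.norm(G), compute_uv=False)
s_after = np.linalg.svd(X, compute_uv=False)
```

Five cubic steps cost ten matrix multiplications, matching the count in the abstract; a quintic step needs a third product to form the square of `X X^T`, which is where the fifteen multiplications for five quintic iterations come from. Each cubic step maps a singular value `s` to `1.5 s - 0.5 s^3`, so the values move toward 1 without exceeding it.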

2606.00369 2026-06-02 cs.CY cs.LG 版本更新

Quantifying the Salience of Geo-Cultural Values for Pluralistic Safety Alignment

量化地理文化价值对多元安全对齐的显著性

Arkadiy Saakyan, Charvi Rastogi, Lora Aroyo

发表机构 * University of Oxford（牛津大学）； University of Cambridge（剑桥大学）

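AI summary: Using multilevel modeling, finds that cultural zone membership has a significant effect on safety ratings (p<0.05), that roughly 10% of items are culturally sensitive, and that current LLMs cannot reliably replace human raters but can help triage items for annotation.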

Comments 119 pages, 13 figures. ICML 2026 camera ready

AI中文摘要

AI模型的安全全球部署需要与跨文化的人类价值观对齐。然而，安全评估数据集中的评分者群体在地理上仍然高度同质，未能捕捉地理文化差异。此外，在控制年龄、性别和种族等人口统计学因素后，这些差异是否仍然存在尚不清楚。通过对安全数据集的元分析，我们发现大多数数据集未报告地理文化信息，而那些报告的数据集缺乏统一的方法来联合分析地理文化和人口统计学相关性。利用Inglehart-Welzel跨文化变异维度，我们通过多层次模型证明，文化区域归属解释了超出标准人口统计学变量的安全评分方差（6个数据集中p<0.05）。此外，我们的分析表明，我们检查的数据集中大约10%的项目具有文化敏感性：如果没有充分的文化代表性，这些项目很可能被错误分类为安全。我们将LLM评估为评分替代工具和分诊工具，发现当前的LLM不能可靠地替代评分员，尽管它们可以帮助优先选择文化敏感项目进行人工标注。我们的发现推动了更多文化多元的安全评估，并提供了支持其实践的实用建议。

英文摘要

Safe global deployment of AI models requires alignment with human values that vary across cultures. Yet rater pools in safety evaluation datasets remain largely geographically homogeneous, failing to capture geo-cultural differences. Further, it remains unclear whether such differences persist after controlling for demographics such as age, gender, and ethnicity. Through a meta-analysis of safety datasets, we find that most do not report geo-cultural information, and those that do lack a unified methodology to jointly analyze geo-cultural and demographic correlates. Using the Inglehart-Welzel dimensions of cross-cultural variation, we demonstrate via multilevel modeling that cultural zone membership explains variance in safety ratings beyond standard demographics (p<0.05 across 6 datasets). Moreover, our analysis indicates that roughly 10% of items in the datasets we examined are culturally sensitive: likely to be misclassified as safe without adequate cultural representation. We evaluate LLMs as both rater surrogates and triage tools, finding that current LLMs do not reliably stand in for raters, though they can help prioritize culturally sensitive items for human annotation. Our findings motivate more culturally pluralistic safety evaluation and offer practical takeaways to support it.

2606.00367 2026-06-02 cs.LG cs.AI 版本更新

Reinforcement Learning with Pairwise Preferences in Long-Term Decision Problems

长期决策问题中基于成对偏好的强化学习

Jonathan Colaço Carr, Prakash Panangaden, Doina Precup, Benjamin Van Roy

发表机构 * School of Computer Science, McGill University, Montreal, Quebec, Canada（麦吉尔大学计算机科学学院）； Mila - Quebec AI Institute, Montreal, Quebec, Canada（魁北克人工智能研究所）； Department of Electrical Engineering, Stanford University, Stanford, California, USA（斯坦福大学电气工程系）

AI总结针对长期决策问题中基于成对偏好的强化学习效率低且缺乏马尔可夫策略最优性保证的问题，提出马尔可夫决策竞赛模型，证明平稳马尔可夫策略最优性、求解复杂度为P，并给出亚线性收敛算法，在高维长期问题中显著提升学习效率。

AI中文摘要

强化学习问题通常将目标定义为最大化标量奖励函数的期望值。然而，成对偏好通常比标量奖励更容易指定，并且能够表达标量奖励无法表达的某些目标。因此，基于成对偏好的强化学习方法受到了越来越多的关注。不幸的是，这些方法在长时间跨度的问题中效率低下，并且缺乏关于马尔可夫策略相对于历史依赖策略的性能保证，而这类保证正是连接强化学习理论与实践的桥梁。因此，我们提出了“马尔可夫决策竞赛”（Markov decision contest）作为基于成对偏好的强化学习的新问题模型。我们证明了平稳马尔可夫策略在所有历史依赖策略中是最优的，精确求解马尔可夫决策竞赛属于P类问题，并且一个简单的迭代算法以亚线性速率收敛到最优策略。最后，在一组具有长时间跨度的高维决策问题中，我们展示了我们的近似算法在学习效率上显著优于先前的工作。

英文摘要

Reinforcement learning problems typically define the goal as maximizing the expected value of a scalar reward function. But, pairwise preferences are often easier to specify than scalar rewards, and they express certain goals that scalar rewards cannot. Methods for reinforcement learning with pairwise preferences have thus received growing interest. Unfortunately, these methods are inefficient in problems with long time horizons, and they lack guarantees on the performance of Markov policies relative to history-dependent policies, which bridge the theory and practice of reinforcement learning. We therefore propose the \textit{Markov decision contest} as a new problem model for reinforcement learning with pairwise preferences. We prove that stationary Markov policies are optimal among all history-dependent policies, that solving a Markov decision contest exactly is in P, and that a simple iterative algorithm converges to an optimal policy at a sublinear rate. Lastly, in a set of high-dimensional decision problems with long time horizons, we show that our approximate algorithm is significantly more learning-efficient than prior work.

2606.00350 2026-06-02 cs.LG cs.AI 版本更新

Drift Q-Learning

Anas Houssaini, Mohamad H. Danesh, Amin Abyaneh, Scott Fujimoto, Hsiu-Chin Lin, David Meger

发表机构 * McGill University（麦吉尔大学）； Mila - Quebec AI Institute（魁北克AI研究院）

AI总结提出DriftQL，通过漂移正则化与Q学习结合，在离线强化学习中避免分布外动作，单步生成动作，性能优于扩散和流方法。

AI中文摘要

离线强化学习需要从固定数据中改进策略，同时避免具有不可靠价值估计的分布外动作。扩散和流策略通过建模行为分布来正则化强化学习目标以处理这种权衡，但它们需要迭代去噪、求解器集成，并且在更高效的变体中，推理时需要蒸馏或其他近似。我们提出DriftQL，它将基于漂移的行为正则化器与评论家驱动的策略改进相结合。价值信号将策略偏向数据支持的高价值区域，而吸引和排斥共同使生成的动作接近数据并防止坍缩到单一模式。DriftQL实现为具有统一训练目标的单一网络，并在单次前向传播中生成动作。在D4RL和OGBench上，DriftQL持续优于扩散和流方法，推进了最先进水平。在数据质量下降（基线明显挣扎）的情况下，DriftQL保持接近其干净数据性能，使其成为扩散和流方法的有前途的替代方案，同时保持确定性方法的简单性和效率。项目页面：https://driftql.github.io/

英文摘要

Offline reinforcement learning requires improving a policy from fixed data while avoiding out-of-distribution actions with unreliable value estimates. Diffusion and flow policies handle this trade-off by modeling the behavior distribution to regularize the RL objective, but they require iterative denoising, solver integrations, and in more efficient variants, distillation or other approximations at inference. We propose DriftQL, which combines a drift-based behavioral regularizer with critic-driven policy improvement. The value signal biases the policy toward high-value regions of the data support, while attraction and repulsion together keep generated actions near the data and prevent collapse onto a single mode. DriftQL is implemented as a single network with a unified training objective and generates actions in a single forward pass. On D4RL and OGBench, DriftQL consistently outperforms diffusion and flow methods, advancing the state of the art. Under degraded data quality, where the baselines visibly struggle, DriftQL remains close to its clean-data performance, positioning it as a promising alternative to diffusion and flow-based methods while maintaining the simplicity and efficiency of deterministic approaches. Project page: https://driftql.github.io/

2606.00349 2026-06-02 cs.LG cs.AI cs.CE 版本更新

(HB-ARFM) History-Bootstrapped Flow Matching for Inverse Boiling Reconstruction

(HB-ARFM) 基于历史引导的流匹配用于逆沸腾重建

Xianwei Zou, Sheikh Md Shakeel Hassan, Arthur Feeney, Aparna Chandramowlishwaran

发表机构 * arXiv

AI总结提出历史引导自回归流匹配方法，通过条件流匹配和自回归传播解决部分观测下的时空逆重建问题，在沸腾动力学重建中优于其他模型。

Comments ICML 2026

AI中文摘要

从部分观测中重建时空场是科学推理的基础，例如从卫星数据推断大气状态或从成像恢复流体状态。当观测不完整时，逆问题本质上是病态的：即使底层PDE动力学在全状态上是马尔可夫的，部分观测算子也会诱导出非马尔可夫的后验，无法从单个时间步解析。我们提出了一种历史引导自回归流匹配方法，用于部分可观测性下的时空逆重建。观测历史通过条件流匹配引导初始重建，减少歧义。然后自回归地应用相同的条件传输模型，以新观测和过去预测为条件，将重建向前传播。我们在沸腾动力学重建上评估该方法，从界面几何和运动恢复完整的速度和温度场。在两个不同观测稀疏性的逆任务中，HB-ARFM产生了物理和时间上有效的重建，而其他模型则失败。

英文摘要

Reconstructing spatiotemporal fields from partial observations is fundamental to scientific inference, from inferring atmospheric states from satellite data to recovering fluid states from imaging. When observations are incomplete, the inverse problem is fundamentally ill-posed: even when the underlying PDE dynamics are Markovian in the full state, partial observation operators induce a non-Markovian posterior that cannot be resolved from a single timestep. We propose a history-bootstrapped autoregressive flow matching (HB-ARFM) for spatiotemporal inverse reconstruction under partial observability. Observation history bootstraps the initial reconstruction via conditional flow matching, reducing ambiguities. The same conditional transport model is then applied autoregressively, conditioning on both new observations and past predictions to propagate the reconstruction forward in time. We evaluate the method on boiling dynamics reconstruction, recovering full velocity and temperature fields from interface geometry and motion. Across two inverse tasks with varying observation sparsity, HB-ARFM produces physically and temporally valid reconstructions where other models fail.

2606.00345 2026-06-02 cs.LG 版本更新

Longitudinal Multimodal Sensing of Physical Activity and Well-Being in Older Adults

老年人身体活动与福祉的纵向多模态感知

Flavio Di Martino, Mattia G. Campana, Marcello Magno, Lorenza Pratali, Franca Delmastro

发表机构 * IIT-CNR（意大利国家研究委员会信息学与远程信息处理研究所）； IFC-CNR（意大利国家研究委员会临床生理学研究所）

AI总结本研究通过纵向多模态数据（可穿戴传感、行为监测和临床评估）对66名老年人进行现实世界监测，发现可观察行为目标预测性能良好（macro-F1 65%），而抽象结果预测仍具挑战，且历史特征是最重要的预测因子。

AI中文摘要

可穿戴和移动传感技术能够在现实环境中连续监测人类行为和健康。然而，纵向多模态数据中的预测建模仍然具有挑战性，特别是在针对复杂或临床衍生结果时。在这项工作中，我们展示了一项在现实条件下进行的纵向多模态研究，涉及66名老年人，结合了可穿戴传感、行为监测和临床评估。这一设置提供了研究长期、野外条件下代表性不足人群的难得机会。基于该数据集，我们研究了感知信号与目标变量之间的对齐如何影响跨健康相关任务的预测性能。我们设计了一个统一的评估框架，涵盖具有不同可观测性水平的任务，包括活动水平预测、睡眠时长估计和睡眠呼吸暂停严重程度分类。我们的结果揭示了明确的预测性梯度：高度可观察的行为目标实现了稳健的性能（macro-F1 65%），而更抽象的结果尽管相对于基线模型持续改进，但仍然具有挑战性。此外，通过可解释性分析，我们表明历史特征始终是最具信息量的预测因子，突显了纵向信息的核心作用。

英文摘要

Wearable and mobile sensing technologies enable continuous monitoring of human behavior and health in real-world settings. However, predictive modeling in longitudinal multimodal data remains challenging, particularly when targeting complex or clinically derived outcomes. In this work, we present a longitudinal multimodal study of 66 older adults conducted in real-world conditions and combining wearable sensing, behavioral monitoring, and clinical assessments. This setting provides a rare opportunity to study an underrepresented population in long-term, into-the-wild conditions. Building on this dataset, we investigate how the alignment between sensed signals and target variables affects predictive performance across health-related tasks. We design a unified evaluation framework spanning tasks with increasing levels of observability, including Activity Levels prediction, Sleep Duration estimation, and Sleep Apnea Severity classification. Our results reveal a clear gradient of predictability: highly observable behavioral targets achieve robust performance (macro-F1 65%), while more abstract outcomes remain challenging despite consistent improvements over baseline models. Moreover, through explainability analysis, we show that historical features consistently emerge as the most informative predictors, highlighting the central role of longitudinal information.

2606.00344 2026-06-02 cs.LG 版本更新

The role of class encoding in neural collapse

类编码在神经坍缩中的作用

Bastien Massion, Roy Makhlouf, Estelle Massart

发表机构 * Institute of Cognitive Sciences, University of Amsterdam（阿姆斯特丹大学认知科学研究所）

AI总结本文通过无限制特征模型和均方误差训练损失，研究标签编码对神经坍缩的影响，发现one-hot编码和平衡数据下，增大偏置正则化系数时，各类未中心化均值特征从单纯形等角紧框架转变为正交框架，并证明任意编码下分类器偏置旨在居中标签。

2606.00342 2026-06-02 cs.LG cs.CR cs.DB 版本更新

PE-means: Improved Differentially Private $k$-means Clustering through Private Evolution

PE-means: 通过私有进化改进差分隐私 $k$-均值聚类

Thomas Humphries, Zinan Lin, Sergey Yekhanin

发表机构 * University of Waterloo（滑铁卢大学）； Microsoft Research（微软研究院）

AI总结针对欧几里得空间中差分隐私 $k$-均值聚类问题，提出PE-means算法，利用私有进化方法仅计算恒定敏感度的私有直方图，在聚类损失上平均比现有最优基线提升20%。

2606.00341 2026-06-02 cs.LG cs.AI 版本更新

ROGUE: Misaligned Agent Behavior Arising from Ordinary Computer Use

ROGUE：源于普通计算机使用的错误对齐代理行为

Jeremy Tien, Abishek Anand, Yu-Rou Tuan, Yuchen Shen, J. Zico Kolter, Aran Nayebi

发表机构 * Carnegie Mellon University（卡内基梅隆大学）

AI总结研究AI代理在良性环境中因任务完成而采取不安全行为（违反可纠正性）的问题，通过基准测试发现前沿模型普遍绕过用户中断或限制，且性能提升反而加剧错误对齐。

Comments 27 pages, 13 figures

AI中文摘要

随着AI代理越来越多地部署在真实的个人和企业环境（电子邮件账户、开发工作流、公司数据库等）中，围绕这些代理的安全考虑变得至关重要。尽管许多工作集中在存在对手时的代理安全性上，但我们表明，即使在良性环境中，代理也可能表现出错误对齐的行为：当不安全的行动有助于完成任务时，代理便会采取这些行动。我们通过可纠正性（即代理保持对人类纠正、中断或关闭的顺从性的安全要求）的视角研究这种失败模式。为了证明这种倾向，我们引入了一个基准测试，其中代理被要求完成现实的计算机使用任务，但面临一个可纠正性障碍：人类中断、登录页面或关闭通知。然后我们评估代理是否选择违反可纠正性以完成任务，例如无视人类指令、访问私人密码、改写关闭机制。我们发现，绝大多数测试的前沿模型经常绕过用户中断或限制。此外，更好的模型性能似乎导致更大的错误对齐。最后，即使模型最初完全可纠正，我们表明它们创建的子代理也不能保证如此。我们的工作强调了自主代理迫切需要有原则的、以可纠正性为核心的对齐方法。

英文摘要

As AI agents are increasingly deployed in real personal and corporate settings (email accounts, development workflows, company databases, etc.), safety considerations surrounding these agents become paramount. Although much work has focused on agent safety in the presence of an adversary, we show that agents can exhibit misaligned behavior even in benign settings, taking unsafe actions when those actions are instrumental to task completion. We study this failure mode through the lens of corrigibility, the safety desideratum that agents remain amenable to human correction, interruption, or shutdown. To demonstrate this tendency, we introduce a benchmark in which agents are asked to complete realistic, computer-use tasks but are confronted with a corrigibility obstacle: a human interrupt, a login page, or a shutdown notification. We then evaluate whether agents choose to violate corrigibility in order to complete the task -- overriding the human, accessing private passwords, rewiring shutdown. We find that the overwhelming majority of frontier models tested frequently bypass user interruptions or restrictions. In addition, better model performance appears to lead to greater misalignment. Finally, even when models are completely corrigible initially, we show there are no guarantees that the subagents they create are. Our work highlights the critical need for principled, corrigibility-focused alignment methods in autonomous agents.

2606.00340 2026-06-02 cs.LG 版本更新

Balancing Learning Rates Across Layers: Exact Two-Step Dynamics and Optimal Scaling in Linear Neural Networks

跨层学习率平衡：线性神经网络中的精确两步动力学与最优缩放

Tianyu Pang, Vignesh Kothapalli, Shenyang Deng, Haohui Wang, Dawei Zhou, Yaoqing Yang

发表机构 * Dartmouth College（达特茅斯学院）； Stanford University（斯坦福大学）

AI总结本文通过精确推导线性神经网络在梯度下降两步后的梯度和测试损失闭式表达式，研究了层间学习率的最优选择，揭示了初始步骤不等学习率可最小化测试损失而后续步骤等学习率最优的早期训练机制。

Comments ICML 2026

AI中文摘要

我们研究了两层和三层线性神经网络在学习线性目标函数时的最优学习率选择。特别地，我们推导了梯度下降一步和两步后梯度和测试损失的精确闭式表达式，从而能够精确刻画早期训练动态。我们描述了在前两步梯度近似下学习率应如何缩放，并证明使用该近似进行更新可得到一个具有紧密小近似误差的可处理替代损失。这一公式使得层间学习率的理论分析成为可能，并揭示了一个独特的早期训练机制：在初始步骤中，不等学习率可以最小化测试损失，而在后续步骤中，等学习率变得最优。我们的数值实验验证了该理论，并证明了在训练早期平衡层间学习率的重要性。代码可在 https://github.com/TDCSZ327/Layer-Balancing 获取。

英文摘要

We study optimal learning-rate selection in two-layer and three-layer linear neural networks trained to learn linear target functions. In particular, we derive the exact closed-form expressions for the gradients and test loss after one and two steps of gradient descent, enabling a precise characterization of early training dynamics. We characterize how learning rates should scale under the gradient approximation in the first two steps, and prove that performing updates with this approximation yields a tractable surrogate loss with a tight, small approximation error. This formulation enables the theoretical analysis of layer-wise learning rates and reveals a distinct early-training regime: test loss can be minimized by unequal learning rates at the initial step, while equal learning rates become optimal in subsequent steps. Our numerical experiments validate the theory and demonstrate the importance of balancing layer-wise learning rates early during training. The code is available at: https://github.com/TDCSZ327/Layer-Balancing.
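
To make the idea of layer-wise learning rates concrete, here is a minimal scalar sketch: a two-layer linear "network" `w2 * w1` fitted to a scalar target with a separate learning rate per layer. It is a toy under stated assumptions (scalar weights, unit input, squared loss) and does not reproduce the paper's closed-form expressions; the function name and arguments are hypothetical.

```python
def two_step_loss(w1, w2, target, eta1, eta2, steps=2):
    """Run gradient descent on L = 0.5 * (w2*w1 - target)**2 with
    separate learning rates eta1 (first layer) and eta2 (second layer).

    Returns the list of losses, starting with the initial loss.
    """
    losses = [0.5 * (w2 * w1 - target) ** 2]
    for _ in range(steps):
        residual = w2 * w1 - target
        grad_w1 = residual * w2  # dL/dw1
        grad_w2 = residual * w1  # dL/dw2
        # Simultaneous update of both layers with their own step sizes.
        w1, w2 = w1 - eta1 * grad_w1, w2 - eta2 * grad_w2
        losses.append(0.5 * (w2 * w1 - target) ** 2)
    return losses
```

Sweeping `eta1` and `eta2` independently over a small grid and comparing the loss after one and after two steps gives a hands-on feel for the kind of question the abstract analyzes exactly.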

2606.00338 2026-06-02 cs.LG 版本更新

CHAM-net: A Contrastive Hierarchical Adaptive Meta-network for Robust Global Methane Flux Prediction

CHAM-net：用于鲁棒全球甲烷通量预测的对比层次自适应元网络

Rongchao Dong, Yiming Sun, Shuo Chen, Youmi Oh, Licheng Liu, Yiqun Xie, Xiaowei Jia

发表机构 * University of Pittsburgh（匹兹堡大学）； Purdue University（普渡大学）； University of Colorado Boulder（科罗拉多大学博尔德分校）； NOAA Global Monitoring Laboratory（国家海洋大气管理局全球监测实验室）； University of Wisconsin–Madison（威斯康星大学麦迪逊分校）； University of Maryland（马里兰大学）

AI总结提出对比层次自适应元网络（CHAM-net），通过层次编码器-解码器架构从历史数据中学习站点特异性动态，解决时空异质性问题，在模拟和观测数据集上优于基线方法。

基于重采样的聚类验证与探索分析（CARVE）

Kai R. Wycik, Tiffany M. Tang, Tarek M. Zikry, Genevera I. Allen

发表机构 * Department of Statistics, Columbia University, New York, NY, USA（哥伦比亚大学统计学系）； Center for Theoretical Neuroscience, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA（哥伦比亚大学Zuckerman心智-大脑-行为研究所理论神经科学中心）； Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, IN, USA（圣母大学应用与计算数学及统计学系）； School of Data and Information Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA（北卡罗来纳大学教堂山分校数据与信息科学学院）； Irving Institute for Cancer Dynamics, Columbia University, New York, NY, USA（哥伦比亚大学欧文癌症动力学研究所）

AI总结提出CARVE开源软件包，通过重采样评估聚类稳定性和泛化性，在全局、簇和样本级别提供诊断，优于传统聚类验证指标。

AI中文摘要

聚类在科学领域被广泛用作下游数据驱动科学发现的基础。然而，聚类结果对算法选择、预处理和聚类数$k$高度敏感，导致科学声明往往不可重复。当前用于验证聚类解决方案的最先进技术包括轮廓系数、Davies-Bouldin和Calinski-Harabasz等聚类验证指标（CVI），这些指标依赖于几何假设，但在生物医学研究中遇到的重尾、高维和非线性结构数据上失效。基于重采样的替代方法——基于聚类稳定性和泛化性的思想——已被提出，但仍分散在专门的工具中，缺乏统一、易用的软件。我们通过CARVE（基于重采样的聚类验证与探索分析）填补了这一空白，这是一个开源的Python和R包，可联合评估多个聚类算法和超参数，在全局、簇和样本级别返回稳定性和泛化性诊断，以及基于原则的选择规则和基于共识的簇标签。在六个合成基准测试中，CARVE一致地恢复了接近最优的聚类，而经典指标则显著退化。在实验基因组学和蛋白质组学数据集上，当经典CVI完全失效时，CARVE恢复了更精细的生物结构。CARVE提供与scikit-learn兼容的Python API和与Seurat工作流兼容的类似R接口。

英文摘要

Clustering is widely used across the sciences as the foundation for downstream data-driven scientific discoveries. However, clustering results are highly sensitive to the choice of algorithm, preprocessing, and the number of clusters $k$, producing scientific claims that are often not reproducible. The current state of the art for validating clustering solutions consists of clustering validation indices (CVIs) such as Silhouette, Davies-Bouldin, and Calinski-Harabasz, which rely on geometric assumptions that break down on the heavy-tailed, high-dimensional, and nonlinearly structured data encountered in biomedical research. Resampling-based alternatives - grounded in the ideas of clustering stability and generalizability - have been proposed but remain scattered across specialized tools with no unified, accessible software. We fill this gap with CARVE (Cluster Analysis with Resampling for Validation and Exploration), an open-source Python and R package that jointly evaluates multiple clustering algorithms and hyperparameters, returning stability and generalizability diagnostics at the global, cluster, and sample level together with principled selection rules and consensus-based cluster labels. Across six synthetic benchmarks CARVE consistently recovers near-optimal clusterings where classical indices degrade substantially. On experimental genomics and proteomics data sets, CARVE recovers finer biological structure when classical CVIs collapse entirely. CARVE is available with a scikit-learn-compatible Python API and an analogous R interface compatible with Seurat workflows.
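
The resampling idea behind clustering stability can be sketched as follows: cluster two resamples, then measure how often the two labelings agree on pairs of shared points. The snippet uses the plain Rand index as the agreement measure. This is a generic illustration, not CARVE's API or its actual diagnostics, which the abstract does not describe; the function name is an assumption.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two labelings agree
    (both place the pair together, or both place it apart)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same points")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

Because the index compares pair memberships rather than label values, it is unaffected by label permutations between runs, which is what a stability score across resamples needs.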

2606.00322 2026-06-02 cs.LG stat.ML 版本更新

Perturbative methods for non-parametric instrumental variable

非参数工具变量的微扰方法

Wei Bu, Arthur Gretton

发表机构 * University of Cambridge（剑桥大学）

AI总结提出一种受物理微扰论启发的非参数工具变量估计方法，通过系统的高阶微扰校正改进核岭回归，在高维病态问题中预测误差降低高达99%。

Comments 8+24 pages, 4 figures, comments welcomed

AI中文摘要

我们引入了一种用于非参数工具变量（NPIV）估计的微扰方法。通过从物理学中的微扰论汲取灵感，我们用系统的高阶微扰校正扩展了标准核岭回归方法，显著提高了估计精度。在谱域中，微扰引入了期望积分算子不同本征模之间的混合，这在积分方程病态时尤其有用。这种病态的一个来源可以是维度灾难。我们的方法在各种维度范围内均有效，特别是当维度参数 $\beta$（通过样本数 $n$ 和维度 $d$ 定义为 $n^{\beta} = d$）变大时。实验结果表明，在高维病态情况（$\beta > 0.7$）下，与标准岭回归方法相比，我们的一阶微扰校正可以将预测误差降低高达99%。性能提升在广泛的维度范围内得以保持，并且随着维度的增加，优势变得更加明显。

英文摘要

We introduce a perturbative approach for nonparametric instrumental variable (NPIV) estimation. Drawing inspiration from perturbation theory in physics, we extend standard kernel ridge methods with systematic higher-order perturbative corrections that significantly improve estimation accuracy. Spectrally, the perturbation introduces mixing between different eigenmodes of the expectation integral operator, which becomes especially useful when the integral equation is ill-defined. One source of such ill-definedness can be the curse of dimensionality. Our method performs well across various dimensionality regimes, particularly when the dimensionality parameter $\beta$, defined through the number of samples $n$ and dimension $d$ as $n^{\beta} = d$, becomes large. Experimental results show that our first-order perturbative corrections can reduce prediction error by up to 99% in high-dimensional ill-defined cases ($\beta > 0.7$) compared to standard ridge regression approaches. The performance improvement is maintained across a wide range of dimensions, with the advantage becoming more pronounced as dimensionality increases.
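
The dimensionality parameter is defined implicitly by n^β = d, so it can be recovered as β = log d / log n. The helper below simply evaluates that identity; its name and the example values are illustrative and do not come from the paper.

```python
import math


def dimensionality_parameter(n, d):
    """Solve n**beta == d for beta, i.e. beta = log(d) / log(n)."""
    if n <= 1 or d < 1:
        raise ValueError("need n > 1 and d >= 1")
    return math.log(d) / math.log(n)
```

For instance, 1000 samples in 200 dimensions gives a value above 0.7, which falls in the regime the abstract calls high-dimensional and ill-defined.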

2606.00320 2026-06-02 cs.LG 版本更新

Adversarially Robust Control of Conditional Value-at-Risk via Rockafellar-Uryasev Conformal Inference

通过Rockafellar-Uryasev共形推断的条件风险价值对抗鲁棒控制

Catherine Chen, Jingyan Shen, Zhun Deng, Lihua Lei

发表机构 * University of California, Berkeley（加州大学伯克利分校）； Stanford University（斯坦福大学）

AI总结提出一种在线无分布框架，通过结合共形尾风险控制、在线学习和CVaR的变分表示，在非平稳和对抗环境下实现对条件风险价值(CVaR)的鲁棒控制，并提供渐近保证。

AI中文摘要

我们提出了一种在线、无分布框架用于控制条件风险价值(CVaR)，将共形尾风险控制扩展到非平稳和对抗环境。与依赖于平稳性或期望线性性的经典风险控制方法不同，我们的方法在任意可能随时间漂移或策略性变化的数据生成过程中，为非线性尾风险泛函提供了可证明的安全保证。通过利用共形尾风险控制、在线学习以及Rockafellar和Uryasev引入的CVaR变分表示之间的深层联系，我们开发了一种新的在线CVaR控制程序，具有对抗遗憾保证。所提出的方法无需对底层数据生成过程做出假设，使其广泛适用于现代高风险部署场景。我们证明了实现的实证CVaR在目标水平上渐近受控，并且所得控制渐近紧致，直到有限样本保守性差距。我们在投资组合风险管理和大型语言模型(LLM)毒性缓解中展示了我们方法的有效性，其中罕见但灾难性的故障主导了系统风险。

英文摘要

We present an online, distribution-free framework for controlling the Conditional Value-at-Risk (CVaR), extending conformal tail risk control to non-stationary and adversarial environments. Unlike classical risk control methods, which rely on stationarity or linearity of expectation, our approach provides provable safety guarantees for a nonlinear tail risk functional under arbitrary data-generating processes that may drift or shift strategically over time. By leveraging deep connections between conformal tail risk control, online learning, and the variational representation of CVaR introduced by Rockafellar and Uryasev, we develop a novel procedure for online CVaR control with adversarial regret guarantees. The proposed method operates without assumptions on the underlying data-generating process, making it broadly applicable in modern high-stakes deployment settings. We prove that the realized empirical CVaR is asymptotically controlled at the target level, and that the resulting control is asymptotically tight up to a finite-sample conservatism gap. We demonstrate the effectiveness of our approach on portfolio risk management and toxicity mitigation for Large Language Models (LLMs), where rare but catastrophic failures dominate system risk.
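
The variational representation of Rockafellar and Uryasev writes CVaR at level α as the minimum over t of t + E[(L - t)+] / (1 - α). The sketch below evaluates this on a finite sample. It is a static, offline illustration under that standard formula, not the paper's online control procedure; the function name is an assumption.

```python
def cvar_rockafellar_uryasev(losses, alpha):
    """Empirical CVaR via min over t of t + mean(max(l - t, 0)) / (1 - alpha).

    The objective is convex and piecewise linear in t with kinks at the
    sample points, so scanning the samples finds the exact minimum.
    """
    if not losses or not 0.0 <= alpha < 1.0:
        raise ValueError("need non-empty losses and 0 <= alpha < 1")
    n = len(losses)

    def objective(t):
        excess = sum(max(loss - t, 0.0) for loss in losses) / n
        return t + excess / (1.0 - alpha)

    return min(objective(t) for t in losses)
```

On ten equally weighted losses 1 through 10 with α = 0.8, the minimum equals the average of the two largest losses, which matches the usual reading of CVaR as a tail mean. The minimizing t plays the role of the Value-at-Risk threshold, which is what makes this form convenient for threshold-updating schemes.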

2606.00312 2026-06-02 math.NA cs.LG cs.NA 版本更新

Stochastic Rounding Increases Small Singular Values

随机舍入增加小奇异值

Linkai Ma, Tingzhou Yu, Petros Drineas

发表机构 * Department of Computer Science, Purdue University（计算机科学系，普渡大学）； Department of Mathematics, University of Alberta（数学系，阿尔伯塔大学）

AI总结本文证明随机舍入作为低精度浮点运算的量化方案，不仅对极端长宽比矩阵，而且对恒定长宽比矩阵都能提升尾部奇异值簇，从而更广泛地发挥谱正则化作用。

AI中文摘要

在过去的六七年中，随机舍入（SR）作为一种低精度浮点运算的量化方案重新引起了广泛关注，其应用涵盖数值分析和现代机器学习系统。最近的研究表明，SR通过增加极瘦长（或对称地，极矮胖）矩阵的最小奇异值来充当隐式正则化器。在这项工作中，我们从两个方向大幅改进并扩展了这一理解。首先，我们证明SR的正则化效应并不局限于极端长宽比区域：它对于恒定长宽比的矩阵仍然存在。其次，我们证明SR不仅正则化最小奇异值，而是提升谱尾部整个奇异值簇。这些结果共同提供了随机舍入作为谱正则化器的更一般特征，揭示其效应超越极端长宽比，并作用于奇异值谱的更广泛部分。

英文摘要

Over the past half-dozen years, stochastic rounding (SR) has regained significant attention as a quantization scheme for low-precision floating-point arithmetic, with applications spanning numerical analysis and modern machine learning systems. Recent work has shown that SR acts as an implicit regularizer by increasing the smallest singular value of extremely tall-and-thin (or, symmetrically, short-and-fat) matrices. In this work, we substantially sharpen and extend this understanding in two directions. First, we show that the regularization effect of SR is not restricted to extreme aspect ratio regimes: it persists for matrices with constant aspect ratio. Second, we demonstrate that SR does not merely regularize the smallest singular value, but instead lifts entire clusters of singular values at the tail of the spectrum. Together, these results provide a more general characterization of stochastic rounding as a spectral regularizer, revealing that its effects extend beyond extremal aspect ratios and act on a broader portion of the singular value spectrum.
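
For readers unfamiliar with the scheme, stochastic rounding sends a value to one of its two neighboring grid points, choosing the upper one with probability equal to the value's fractional position between them, so the result is unbiased in expectation. The sketch below rounds to a uniform grid for simplicity, which is an assumption (floating-point grids are non-uniform), and it does not reproduce the singular value results of the paper.

```python
import math
import random


def stochastic_round(x, step=1.0, rng=random):
    """Round x to a multiple of `step`, rounding up with probability equal
    to the fractional position of x between its two grid neighbors."""
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower  # in [0, 1)
    return (lower + (1 if rng.random() < frac else 0)) * step
```

Passing a seeded `random.Random` instance as `rng` makes runs reproducible. Averaging many draws of `stochastic_round(0.3)` approaches 0.3, whereas round-to-nearest would always return 0.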

2606.00309 2026-06-02 cs.LG stat.ML 版本更新

Large-scale Uncertainty Quantification for Latent Variable Models Using Subsampling Markov Chain Monte Carlo

基于子抽样马尔可夫链蒙特卡罗的潜变量模型大规模不确定性量化

Xiaoyu Wang, Jonathan H. Huggins

发表机构 * University of Cambridge（剑桥大学）

AI总结针对潜变量模型中SGLD-Gibbs算法超参数调优缺乏理论指导的问题，通过推导统计缩放极限理论，提出确保不确定性量化有意义的调优准则。

Journal ref: Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea. PMLR 306, 2026

AI中文摘要

随机梯度Langevin动力学结合Gibbs更新（SGLD--Gibbs）为潜变量模型中的近似贝叶斯推断提供了一种高度可扩展的方法。然而，如何以原则性方式调整算法的超参数以确保不确定性估计在统计上有意义仍不清楚。在这项工作中，我们通过为SGLD--Gibbs开发统计缩放极限理论来解决这一调优指导的空白。我们在适当的时空重缩放下推导了全局参数和潜变量的联合渐近极限。我们表明，全局参数收敛到扩散型极限，而每个潜变量收敛到跳跃过程，反映了间歇性Gibbs更新的使用。这种联合跳跃-扩散结构揭示了潜变量随机性如何对全局参数的平稳分布做出贡献。我们利用我们的结果为SGLD--Gibbs的超参数调优提出明确的指导，确保有意义的不确定性量化。数值实验表明，使用我们的调优指导的SGLD--Gibbs在参数估计、不确定性量化和预测性能方面优于随机变分推断。

英文摘要

Stochastic gradient Langevin dynamics combined with Gibbs updates (SGLD--Gibbs) provides a highly scalable approach to approximate Bayesian inference in latent variable models. However, it remains unclear how to tune the algorithm's hyperparameters in a principled manner to ensure the uncertainty estimates are statistically meaningful. In this work, we address this gap in tuning guidance by developing a statistical scaling limit theory for SGLD--Gibbs. We derive a joint asymptotic limit for the global parameters and latent variables under appropriate space-time rescaling. We show that global parameters converge to a diffusion-type limit, while each latent variable converges to a jump process, reflecting the use of intermittent Gibbs updates. This joint jump-diffusion structure reveals how latent-variable randomness contributes to the stationary distribution of the global parameters. We leverage our results to propose explicit guidance on hyperparameter tuning for SGLD--Gibbs that ensures meaningful uncertainty quantification. Numerical experiments show that SGLD--Gibbs with our tuning guidance leads to better parameter estimates, uncertainty quantification, and predictive performance than stochastic variational inference.

2606.00308 2026-06-02 cs.SE cs.AI cs.LG 版本更新

How Generation Architecture Shapes Code Complexity in Multi-Agent LLM Systems: A Paired Study on HumanEval

生成架构如何塑造多智能体LLM系统中的代码复杂度：基于HumanEval的配对研究

Nazmus Ashrafi

发表机构 * GitHub

AI总结通过配对实验比较六种多智能体架构在HumanEval上的代码复杂度，发现架构复杂度与功能正确性无正相关，最简架构在准确率上持平或超越复杂架构。

Comments 16 pages, 7 figures, 7 tables

AI中文摘要

大语言模型代码生成已从单次提示转向多智能体编排——分析师、编码员、测试员和调试器流水线——并且几乎完全根据功能正确性进行评估。这些架构是否也影响它们生成代码的结构复杂度，以及哪些编排层承担了成本，在很大程度上仍未得到检验：先前的工作记录了提示级别对代码复杂度的影响，但架构级别的问题仍是开放的。我们在GPT-4o系列的两个模型下，针对所有164个HumanEval任务（1,968个配对观测），使用五个RADON复杂度度量（SLOC、圈复杂度以及Halstead体积、难度和努力），比较了六种广泛使用的多智能体配置（Basic、AC、ACT、Debugger、AC+Debugger、ACT+Debugger）。我们在所有完成和仅通过条件下应用了配对非参数统计流程（Friedman总体检验、Wilcoxon符号秩事后检验与Holm校正、Kendall's W和配对秩双列效应量）。六种架构坍缩为两个不可区分的复杂度簇，间隔50-130%的差距，在两个模型和两种条件下分区相同；在架构层中，分析师-编码员分割增加了复杂度，运行时调试器没有——并且在分析师-编码员背景下主动降低复杂度——而测试员则重新增加复杂度。重簇的额外复杂度并未带来pass@1优势：最简架构在准确率上匹配或超越最重架构。因此，LLM代码生成中的架构细化应通过所关注维度上的实测收益来证明，而非假设。

英文摘要

Large-language-model code generation has shifted from single-shot prompting to multi-agent orchestrations - analyst, coder, tester, and debugger pipelines - and is evaluated almost exclusively on functional correctness. Whether these architectures also affect the structural complexity of the code they produce, and which orchestration layers carry the cost, remains largely unexamined: prior work has documented prompt-level effects on code complexity, but the architecture-level question is open. We compare six widely-used multi-agent configurations (Basic, AC, ACT, Debugger, AC+Debugger, ACT+Debugger) under two models from the GPT-4o family across all 164 HumanEval tasks - 1,968 paired observations - using the five RADON complexity metrics (SLOC, cyclomatic complexity, and Halstead Volume, Difficulty, and Effort). We apply a paired non-parametric statistical pipeline (Friedman omnibus, Wilcoxon signed-rank post-hoc with Holm correction, Kendall's $W$ and matched-pairs rank-biserial effect sizes) in both all-completions and passing-only conditions. The six architectures collapse into two indistinguishable complexity clusters separated by a 50-130% gap, the same partition in both models and under both conditions; among the architectural layers, the analyst-coder split inflates complexity, the runtime debugger does not - and on the analyst-coder background actively deflates it - and the tester re-inflates it. The heavy cluster's additional complexity buys no pass@1 advantage: the leanest architectures match or beat the heaviest on accuracy. Architectural elaboration in LLM code generation should therefore be justified by measured benefit on the dimensions that matter, not assumed.
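
Two pieces of the statistical pipeline are easy to sketch without external libraries: the size of the fully crossed design and the Holm step-down correction applied to post-hoc p-values. The code below is a generic implementation of Holm's procedure, not the authors' analysis scripts; the function names are assumptions.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, ...
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def paired_observations(tasks, architectures, models):
    """Number of (task, architecture, model) cells in a full crossing."""
    return tasks * architectures * models
```

The reported 1,968 observations are consistent with a full crossing of 164 tasks, six architectures, and two models, as `paired_observations(164, 6, 2)` confirms.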

2606.00306 2026-06-02 cs.LG cs.AI 版本更新

Rethinking the Role of Temperature in Large Language Model Distillation

重新思考温度在大语言模型蒸馏中的作用

Hoang-Chau Luong, Lingwei Chen

发表机构 * Golisano College of Computing and Information Sciences（戈利萨诺计算与信息科学学院）； Rochester Institute of Technology（罗切斯特理工学院）

AI总结本文通过分析温度τ对前向KL散度和反向KL散度在LLM蒸馏中的不对称影响，发现高温下FKL优于RKL，并证明温度能提升多种蒸馏目标，使简单KL方法达到先进水平。

AI中文摘要

反向KL散度在大语言模型蒸馏中比前向KL更受欢迎，但这种偏好主要基于忽略温度τ的比较，忽视了其在软化教师分布和改进知识转移中的核心作用。本文重新审视LLM蒸馏中的温度，发现它从根本上改变了FKL和RKL的比较。我们的分析揭示了一种不对称效应：温度显著丰富了FKL中的非主导令牌信号，而主要重新缩放RKL梯度，导致FKL从τ缩放中获益远多于RKL。这种不对称推翻了标准经验结论：尽管在τ=1时RKL优于FKL，但在指令遵循基准测试中，高温下FKL始终超过RKL。此外，温度的影响不仅限于FKL；它改进了更广泛的蒸馏目标，使简单的基于KL的方法能够与最近最先进的LLM蒸馏方法竞争。

英文摘要

Reverse Kullback-Leibler (RKL) divergence is widely favored over forward KL (FKL) in large language model (LLM) distillation, yet this preference is largely based on comparisons that omit the temperature $\tau$, overlooking its central role in softening teacher distributions and improving knowledge transfer. In this work, we revisit temperature in LLM distillation and show that it fundamentally changes the comparison between FKL and RKL. Our analysis reveals an asymmetric effect: temperature substantially enriches FKL with non-dominant token signals, whereas it mainly rescales RKL gradients, causing FKL to benefit much more from $\tau$ scaling than RKL. This asymmetry overturns the standard empirical conclusion: although RKL outperforms FKL at $\tau = 1$, FKL consistently surpasses RKL at higher temperatures across instruction-following benchmarks. Moreover, the impact of temperature is not limited to FKL; it improves a broader family of distillation objectives, enabling simple KL-based methods to achieve competitive performance against recent state-of-the-art LLM distillation approaches.
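
The quantities being compared can be written down in a few lines: a temperature-scaled softmax, forward KL (teacher relative to student), and reverse KL (student relative to teacher). The sketch assumes the same temperature is applied to both teacher and student logits, which is a common convention but is not stated in the abstract; the function names are illustrative.

```python
import math


def softmax_with_temperature(logits, tau=1.0):
    """Softmax of logits / tau, computed stably by subtracting the max."""
    scaled = [z / tau for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with strictly positive q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def distillation_losses(teacher_logits, student_logits, tau=1.0):
    """Return (forward KL, reverse KL) at temperature tau."""
    p = softmax_with_temperature(teacher_logits, tau)
    q = softmax_with_temperature(student_logits, tau)
    return kl_divergence(p, q), kl_divergence(q, p)
```

Raising `tau` moves probability mass from the teacher's top token to the remaining tokens, which is the softening effect the abstract refers to as enriching non-dominant token signals.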

2606.00304 2026-06-02 cs.LG 版本更新

Modeling Spectral Energy Shifts in Spatio-Temporal Graph Anomaly Detection

时空图异常检测中的频谱能量偏移建模

Yilin Liu, Hongchao Zhang, Taylor T. Johnson, Ahmad F. Taha, Meiyi Ma

发表机构 * College of Connected Computing, Vanderbilt University, Nashville, USA（连接计算学院，范德比大学，美国）； Department of Civil and Environmental Engineering, Electrical and Computer Engineering, Vanderbilt University, Nashville, USA（土木与环境工程系，电气与计算机工程系，范德比大学，美国）

AI总结针对现有频谱方法无法检测伪装异常（能量变化减小的异常）的问题，提出节点级频谱能量公式和能量感知图学习框架，通过能量驱动消息传递建模静态与时序图中的频谱偏移，实现伪装异常检测。

AI中文摘要

图异常检测方法旨在区分异常节点。虽然先前的方法通过频谱能量分布的增加变化来表征异常，但它们忽略了导致变化减小的异常，即看起来正常的伪装异常。我们表明，这种类型的异常在多个数据集中持续存在，并且现有频谱方法无法检测到。为了解决这一限制，我们提出了一种与消息传递完全兼容的节点级频谱能量公式，能够检测伪装异常。基于此公式，我们引入了一个能量感知图学习框架，通过在静态和时间序列图中进行能量驱动的消息传递来建模频谱偏移。此外，我们的统一架构无需引入专门的序列模块即可扩展到时间设置，从而在长滑动窗口下实现高效学习。在大规模基准上的大量实验证明了我们方法的有效性和可扩展性。

英文摘要

Graph anomaly detection methods aim to distinguish anomalous nodes. While prior methods characterize anomalies through increased variation in the spectral energy distributions, they overlook those that result in decreased variation, i.e., camouflaged anomalies that appear normal. We show that this type of anomaly persists across multiple datasets and remains undetectable by existing spectral approaches. To address this limitation, we propose a node-level spectral energy formulation that is fully compatible with message passing and enables the detection of camouflaged anomalies. Building on this formulation, we introduce an energy-aware graph learning framework that models spectral shifts through energy-driven message passing in both static and time-series graphs. Besides, our unified architecture extends to temporal settings without introducing specialized sequence modules, enabling efficient learning under long sliding windows. Extensive experiments on large-scale benchmarks demonstrate the effectiveness and scalability of our approach.

2606.00302 2026-06-02 stat.ML cs.LG 版本更新

ERICA: Quantifying Replicability of Cluster Analysis

ERICA：量化聚类分析的可复现性

Siamak K. Sorooshyari, Manuel A. Rivas, Robert Tibshirani

AI总结提出ERICA框架，通过迭代聚类分配计算统计量，量化数据集中的聚类结构是否可复现，并应用于合成数据和乳腺癌基因表达数据，发现合成数据可复现而部分真实数据存在不可复现性。

2606.00301 2026-06-02 cs.LG 版本更新

FLaG: Fine-Grained Latent Grouping for Hallucination Detection

FLaG：用于幻觉检测的细粒度潜在分组

Wentao Ye, Liyao Li, Zhiqing Xiao, Muzhi Zhu, Jiaqi Hu, Zhanming Shen, Xiaomeng Hu, Sean Du, Haobo Wang

发表机构 * Zhejiang University（浙江大学）； Nanyang Technological University（南洋理工大学）

AI总结提出FLaG框架，通过能量路由机制将实例软关联到多个潜在证据组，并利用对数边际聚合组合组条件可靠性信号，以捕获异构幻觉模式，实现无需修改底层模型的高效幻觉检测。

AI中文摘要

大型语言模型（LLM）中的幻觉源于异构的失败机制，这使得任何单一的全局不确定性分数都难以可靠检测。在这项工作中，我们将幻觉检测形式化为一个机制感知的证据聚合问题，其中不同的表示级和令牌级信号必须在多个潜在解释下进行解释。我们提出了FLaG，一个轻量级的幻觉检测框架，通过一组潜在证据组对正确性进行建模。每个实例通过基于能量的路由机制与多个组软关联，并通过原则性的对数边际聚合组合组条件可靠性信号。这种设计使FLaG能够捕获异构的幻觉模式，同时对决策阈值和评估指标保持不变。该框架作为冻结模型头部运行，无需修改底层语言模型，并且计算开销极小。我们进一步提供了一个理论视角，将FLaG与异构错误机制下的最优证据聚合联系起来，表明贝叶斯最优检验统计量必然具有对数边际形式，并且FLaG构成了一个具有可控误差界的可处理近似。跨多个基准和LLM骨干网络的广泛实验表明，FLaG持续实现了最先进的性能，同时在数据集和模型之间表现出稳健的迁移能力，并在有限监督下保持有效。

英文摘要

Hallucinations in large language models (LLMs) arise from heterogeneous failure mechanisms, making reliable detection difficult for any single global uncertainty score. In this work, we formulate hallucination detection as a mechanism-aware evidence aggregation problem, where diverse representation- and token-level signals must be interpreted under multiple latent explanations. We propose FLaG, a lightweight hallucination detection framework that models correctness through a set of latent evidence groups. Each instance is softly associated with multiple groups via an energy-based routing mechanism, and group-conditional reliability signals are combined through a principled log-marginal aggregation. This design enables FLaG to capture heterogeneous hallucination patterns while remaining invariant to decision thresholds and evaluation metrics. The framework operates as a frozen-model head, requires no modification to the underlying language model, and incurs minimal computational overhead. We further provide a theoretical perspective that connects FLaG to optimal evidence aggregation under heterogeneous error mechanisms, showing that the Bayes-optimal test statistic necessarily admits a log-marginal form and that FLaG constitutes a tractable approximation with a controllable error bound. Extensive experiments across multiple benchmarks and LLM backbones demonstrate that FLaG consistently achieves SOTA performance, while exhibiting robust transfer across datasets and models, and remaining effective under limited supervision.
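
The log-marginal aggregation mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the softmax over negative energies used as the routing rule, and the toy numbers are all assumptions made for illustration.

```python
import math

def soft_routing(energies):
    """Turn per-group energies into routing weights (lower energy -> higher weight).

    Assumption: a softmax over negative energies; the paper may route differently.
    """
    m = min(energies)
    exps = [math.exp(-(e - m)) for e in energies]
    total = sum(exps)
    return [v / total for v in exps]

def log_marginal_score(energies, group_log_reliability):
    """Return log sum_k pi_k * p_k, computed stably in log space."""
    weights = soft_routing(energies)
    terms = [math.log(w) + lp for w, lp in zip(weights, group_log_reliability)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

# Hypothetical instance routed over three latent evidence groups.
energies = [0.2, 1.5, 3.0]
log_rel = [math.log(0.9), math.log(0.4), math.log(0.1)]
score = log_marginal_score(energies, log_rel)
```

Because the score is the log of a convex combination of the group-conditional probabilities, it always lies between the smallest and largest group log-reliability.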

2606.00298 2026-06-02 math.NA cs.LG cs.NA cs.SY eess.SY math.DS math.OC 版本更新

Symmetric Hermite quadrature-based balanced truncation for learning linear dynamical systems from derivative data

基于对称Hermite求积的平衡截断：从导数数据学习线性动力系统

Sean Reiter, Steffen W. R. Werner

发表机构 * New York University（纽约大学）； Virginia Tech（弗吉尼亚理工学院）

AI总结提出一种对称Hermite求积平衡截断算法，通过传递函数及其导数数据构建线性降阶模型，保持状态空间Hermite性和渐近稳定性。

Comments 14 pages, 2 figures, 4 tables

2606.00296 2026-06-02 stat.ML cs.LG math.AP 版本更新

Is Zero-Shot Super-Resolution Possible in Operator Learning?

零样本超分辨率在算子学习中是否可能？

Unique Subedi, Ambuj Tewari

AI总结本文系统研究算子学习中的零样本超分辨率现象，证明其在信息论上可能不可行，并识别输出函数的Hölder光滑性作为充分条件，给出泛化界。

2606.00295 2026-06-02 cs.LG 版本更新

Adaptive Order Policies for Masked Diffusion

掩码扩散的自适应顺序策略

Jama Hussein Mohamud, Mohsin Hasan, Mirco Ravanelli, Yoshua Bengio

发表机构 * Université de Montréal（蒙特利尔大学）； Mila ； Concordia University（康科迪亚大学）； LawZero

AI总结提出一种通过轻量级策略网络学习掩码扩散模型中解掩码顺序的方法，使用加权损失训练，在组合任务和蛋白质等对顺序敏感的问题上优于常见启发式方法。

AI中文摘要

掩码扩散模型在文本和蛋白质等离散序列的数据分布捕获方面取得了巨大成功。这些模型通过从完全掩码序列开始迭代地解掩码令牌来生成数据，解掩码顺序通常随机选择或基于去噪器概率的启发式方法。在这项工作中，我们提出了一种方案，通过在扩散模型之上使用额外的轻量级策略网络来学习解掩码顺序。我们提出的损失根据策略概率重新加权掩码扩散损失中的项，并产生一个偏好于去噪器更可能正确的位置的策略。我们在两种设置下研究这种损失：（i）仅训练策略，同时使用冻结的预训练去噪器，以及（ii）使用加权损失联合训练策略和去噪器，以实现相互适应。我们证明，在组合任务和蛋白质等对令牌顺序敏感的问题上，我们的方法优于常见的启发式方法。

英文摘要

Masked diffusion models have seen great success in capturing data distributions over discrete sequences in domains such as text and proteins. These models generate data by iteratively unmasking tokens starting from a fully masked sequence, with the unmasking order typically chosen at random or using a heuristic based on denoiser probabilities. In this work, we propose a scheme for learning the unmasking order using an additional lightweight policy network on top of a diffusion model. Our proposed loss reweights terms in the masked diffusion loss according to policy probabilities, and results in a policy that prefers positions where the denoiser is more likely to be correct. We study this loss in two settings: (i) training solely the policy while using a frozen pre-trained denoiser, and (ii) training the policy and denoiser jointly with the weighted loss to allow for mutual adaptation. We demonstrate that our approach outperforms common heuristics on problems that are sensitive to token ordering, such as combinatorial tasks and proteins.
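
A rough sketch of the policy-reweighted loss idea follows. It assumes the simplest possible form, a policy-probability-weighted sum of per-position cross-entropies; the paper's actual loss may carry extra terms, and every name and number below is hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [v / s for v in exps]

def policy_weighted_loss(policy_logits, true_token_probs):
    """Sum over masked positions of pi_i * (-log p_i).

    pi_i: policy probability of unmasking position i next (assumed softmax policy).
    p_i:  denoiser probability of the true token at position i.
    """
    pi = softmax(policy_logits)
    return sum(p * -math.log(q) for p, q in zip(pi, true_token_probs))

# Hypothetical denoiser probabilities of the true token at three masked positions.
true_token_probs = [0.95, 0.50, 0.10]
uniform = policy_weighted_loss([0.0, 0.0, 0.0], true_token_probs)
confident_first = policy_weighted_loss([3.0, 0.0, -3.0], true_token_probs)
```

With the denoiser held fixed, the weighted loss is lower when the policy concentrates on positions where the denoiser is already confident, which matches the preference described in the abstract.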

2606.00293 2026-06-02 cs.LG stat.ME stat.ML 版本更新

Accurate Large-sample Uncertainty Quantification using Stochastic Gradient Markov Chain Monte Carlo

使用随机梯度马尔可夫链蒙特卡洛进行精确的大样本不确定性量化

Yu Wang, Jie Ding, Jonathan H. Huggins

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结针对大批量或模型误设下随机梯度下降和随机梯度Langevin动力学调参困难的问题，提出新的离散时间近似方法，实现稳态协方差、迭代平均协方差和积分自相关时间的精确预测，并给出非渐近误差界。

Journal ref: Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea. PMLR 306, 2026

AI中文摘要

对随机梯度下降（SGD）和随机梯度Langevin动力学（SGLD）等算法进行调参，以用于近似采样和不确定性量化，仍然具有挑战性，特别是在批量大小较大或模型误设这类具有实际意义的设置中。现有提供调参指导的理论依赖于连续时间极限或强统计假设，在这些情况下可能在定量上变得不准确。我们通过提出新的带或不带动量的SG(L)D离散时间近似来解决这些不足，从而能够精确预测稳态协方差、迭代平均协方差和积分自相关时间。此外，我们证明了定量的非渐近误差界，表明这些估计对于实际调参和不确定性量化足够准确。数值实验表明，在现有方法失效的各种模型和数据生成分布中，我们的理论提供了改进的调参指导，包括使用$β$-散度而非对数损失以获得统计稳健推断的情况。

英文摘要

Tuning algorithms such as stochastic gradient descent (SGD) and stochastic gradient Langevin dynamics (SGLD) for approximate sampling and uncertainty quantification remains challenging, particularly in practically relevant settings where the batch size is large or the model is misspecified. Existing theory that provides tuning guidance relies on continuous-time limits or strong statistical assumptions, which can become quantitatively inaccurate in these regimes. We address these shortcomings by proposing new discrete-time approximations to SG(L)D with and without momentum, which enable accurate predictions of the stationary covariance, iterate average covariance, and integrated autocorrelation time. Moreover, we prove quantitative, non-asymptotic error bounds showing that these estimates are sufficiently accurate for practical tuning and uncertainty quantification. Numerical experiments demonstrate that our theory yields improved tuning guidance across a range of models and data-generating distributions where existing approaches fail, including when using the $β$-divergence rather than log-loss to obtain statistically robust inferences.
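
For readers unfamiliar with the quantities being tuned, here is a minimal sketch of plain SGLD (no momentum) on a toy problem, together with empirical estimates of the iterate average and stationary variance. It does not implement the paper's discrete-time approximations; the model (mean of a unit-variance Gaussian under a flat prior), step size, and batch size are illustrative assumptions.

```python
import math
import random

def sgld_gaussian_mean(data, batch_size, step, n_steps, seed=0):
    """SGLD for the mean of a unit-variance Gaussian under a flat prior."""
    rng = random.Random(seed)
    n = len(data)
    theta = 0.0
    trace = []
    for _ in range(n_steps):
        batch = rng.sample(data, batch_size)
        # Unbiased minibatch estimate of the gradient of the log-posterior.
        grad = (n / batch_size) * sum(x - theta for x in batch)
        # Half-step along the gradient plus Gaussian noise with variance `step`.
        theta += 0.5 * step * grad + math.sqrt(step) * rng.gauss(0.0, 1.0)
        trace.append(theta)
    return trace

data_rng = random.Random(1)
data = [data_rng.gauss(2.0, 1.0) for _ in range(200)]
trace = sgld_gaussian_mean(data, batch_size=20, step=1e-3, n_steps=2000)
kept = trace[200:]  # discard burn-in
iterate_average = sum(kept) / len(kept)
stationary_var = sum((t - iterate_average) ** 2 for t in kept) / len(kept)
```

In this toy model the exact posterior variance is 1/n, while the chain's empirical stationary variance at a finite step size and batch size will generally differ from it. That gap is the kind of discrepancy a tuning theory has to predict.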

2606.00291 2026-06-02 cs.GT cs.LG 版本更新

The Representation-Rationalizability Tradeoff in Reward Learning

奖励学习中的表示-可理性权衡

Jing Dong, Yaoliang Yu, Pascal Poupart

发表机构 * Vector Institute（向量研究所）； University of Waterloo（滑铁卢大学）

AI总结本文研究RLHF中奖励学习面临的表示与可理性化之间的权衡，将超额交叉熵损失精确分解为表示项和聚合项，证明更丰富的表示会暴露更多无法被任何标量奖励一致排序的比较，且联合训练不能保证达到最优平衡点。

AI中文摘要

在RLHF中，每个训练样本包含一个提示$x$和两个候选回答$y,y'$，标注者提供这些回答之间的成对偏好。学习问题是将这些异质成对判断转换为一个标量奖励$r(x,y)$，用于衡量每个提示的回答质量。经典社会选择理论表明这是不可能的，因为异质标注者样本可能导致具有孔多塞循环的汇总偏好，因此没有标量奖励能够一致地评估所有被比较的回答对。越来越多的文献将RLHF作为社会选择问题进行分析，但通常假设固定的有限备选集合，即每个提示预先列举的有限候选回答集。现代流程则通过一个学习的表示$ϕ(x,y)$对回答进行评分，再经过标量头，因此$ϕ$决定了哪些回答被视为可区分的备选，以及哪些比较对奖励模型可见。一旦嵌入成为问题的一部分，社会选择理论中的不可能性结果就变成了一个权衡。我们证明，任何基于$ϕ$构建的奖励的超额交叉熵损失可以精确分解为一个表示项（更丰富的$ϕ$会缩小它）和一个聚合项（更丰富的$ϕ$通过暴露更多无法被任何标量一致排序的比较而扩大它）。相同的结果扩展到直接偏好优化（DPO），并且联合训练嵌入和奖励不能保证恢复此权衡的最佳点。在合成数据和真实偏好数据集上的实验证实了我们的结果。

英文摘要

In RLHF, each training example contains a prompt $x$ and two candidate responses $y,y'$, and annotators provide pairwise preferences between these responses. The learning problem is to convert these heterogeneous pairwise judgments into a single scalar reward $r(x,y)$ that measures response quality for each prompt. Classical social choice implies an impossibility because heterogeneous annotator samples can induce pooled preferences with Condorcet cycles, so no scalar reward can evaluate all compared response pairs consistently. A growing literature analyzes RLHF as a social-choice problem, but usually assumes a fixed finite set of alternatives, i.e., a pre-enumerated finite set of candidate responses for each prompt. Modern pipelines instead score responses through a learned representation $ϕ(x,y)$ before a scalar head, so $ϕ$ determines which responses are treated as distinguishable alternatives and which comparisons are visible to the reward model. Once this embedding is part of the problem, the impossibility results from social choice theory become a tradeoff. We show that the excess cross-entropy loss of any reward built on $ϕ$ decomposes exactly into a representational term, which a richer $ϕ$ shrinks, and an aggregation term, which a richer $ϕ$ enlarges by exposing more comparisons that no scalar can rank consistently. The same results extend to direct preference optimization (DPO), and jointly training the embedding and the reward is not guaranteed to recover the sweet spot of this tradeoff. Experiments on synthetic data and real preference datasets corroborate our results.
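
The Condorcet-cycle obstruction mentioned above can be reproduced with the textbook three-voter example. The annotators and responses below are hypothetical; the sketch only shows that pooling individually consistent rankings can yield majority preferences that no strict scalar ordering satisfies.

```python
from itertools import permutations

# Three hypothetical annotators, each with a consistent ranking (best first).
rankings = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]

def majority_prefers(x, y):
    """True if more than half of the annotators rank x above y."""
    votes = sum(r.index(x) < r.index(y) for r in rankings)
    return votes > len(rankings) / 2

# Pooled pairwise preferences: A beats B, B beats C, C beats A.
pooled = [(x, y) for x in "ABC" for y in "ABC" if x != y and majority_prefers(x, y)]

def consistent_orders():
    """Strict total orders that agree with every pooled preference."""
    found = []
    for order in permutations("ABC"):
        if all(order.index(x) < order.index(y) for x, y in pooled):
            found.append(order)
    return found
```

Any scalar reward with strict inequalities induces one of the six orders checked here, so an empty result means no such reward ranks all pooled comparisons consistently.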

2606.00289 2026-06-02 cs.LG cs.DS 版本更新

Inner Product Aware Quantization: Provably Fast, Accurate, and Adaptive Algorithms

内积感知量化：可证明快速、准确且自适应的算法

Nathan White, Krish Singal

发表机构 * University of Pennsylvania（宾夕法尼亚大学）

AI总结本文提出内积感知量化方法，通过优化目标函数并利用自适应随机量化（ASQ）理论，开发出快速且无偏的量化算法，在保证质量的同时比现有方法快2-10倍。

AI中文摘要

量化是一种基本工具，用于压缩数据集、神经网络权重以及一系列计算任务中的内存使用。向量量化的许多下游应用需要与任意输入进行内积运算。这促使我们研究内积感知量化方案，该方案能够近似保留与未见向量的内积——而不仅仅是简单地最小化均方误差。在这项工作中，我们制定了捕捉自然期望的目标，并开发了自适应且无偏的量化方法，这些方法能够近似保留与最坏情况和平均情况输入的内积。对这些目标的分析表明，它们与广为人知的自适应随机量化（ASQ）概念有着紧密联系。我们为目标函数开发了可证明快速的精确和近似算法。我们的理论结果启发了高效的实际算法，这些算法在各种工作负载分布下表现良好。它们还导致了标准ASQ的实际算法，这些算法在保持质量的同时比现有最先进方法快2-10倍。这些理论和实证结果有助于使自适应量化技术在实际环境中更加高效和易于处理。

英文摘要

Quantization is a fundamental tool used to compress datasets, neural network weights, and memory usage in a range of computational tasks. Many downstream applications of vector quantization perform inner products with arbitrary inputs. This motivates the study of inner product aware quantization schemes that approximately preserve inner products with unseen vectors -- in contrast to simply minimizing the mean-squared error. In this work, we formulate objectives that capture natural desiderata and develop adaptive and unbiased quantization methods that approximately preserve inner products with worst-case and average-case inputs. An analysis of these objectives shows a tight connection with the well-studied notion of Adaptive Stochastic Quantization (ASQ). We develop provably fast exact and approximate algorithms for our objectives. Our theoretical results inspire efficient practical algorithms that perform well across a variety of workload distributions. They also lead to practical algorithms for standard ASQ which are 2-10$\times$ faster than prior state-of-the-art methods while maintaining quality. These theoretical and empirical results contribute towards making adaptive quantization techniques more efficient and tractable in practical settings.
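
Why unbiasedness matters for inner products can be seen in a short sketch of stochastic quantization onto a non-uniform level set. This is generic stochastic rounding, not the paper's algorithm for choosing levels; the level values and helper names are assumptions.

```python
import bisect
import random

def round_up_probability(x, levels):
    """Return (lo, hi, p_up) with p_up * hi + (1 - p_up) * lo == x.

    Assumes `levels` is sorted and levels[0] <= x <= levels[-1].
    """
    i = bisect.bisect_right(levels, x)
    i = min(max(i, 1), len(levels) - 1)
    lo, hi = levels[i - 1], levels[i]
    return lo, hi, (x - lo) / (hi - lo)

def stochastic_quantize(vec, levels, rng):
    """Round each coordinate to a neighbouring level so that E[q_i] = x_i."""
    out = []
    for x in vec:
        lo, hi, p_up = round_up_probability(x, levels)
        out.append(hi if rng.random() < p_up else lo)
    return out

levels = [-1.0, -0.25, 0.3, 1.0]  # hypothetical non-uniform (adaptive) levels
x = [0.8, -0.6, 0.1]
q = stochastic_quantize(x, levels, random.Random(0))
```

Since each coordinate is unbiased, the expected inner product of `q` with any fixed vector equals the inner product of `x` with that vector. Where the levels are placed then controls the variance, which is where an adaptive scheme comes in.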

2606.00281 2026-06-02 physics.ao-ph cs.LG 版本更新

Flow Matching for Convective-Scale Precipitation Downscaling

对流尺度降水降尺度的流匹配方法

Tom Wetherell

发表机构 * Met Office（英国气象局）

AI总结针对对流尺度降水降尺度问题，提出流匹配生成模型，相比扩散模型在空间预报技巧上表现更优，但低估降水分布上尾导致气候平均偏干。

AI中文摘要

生成式机器学习正日益成为动力降尺度的重要补充，用于生成高分辨率降水预测，其中扩散模型是目前领先的方法。流匹配是一种相关的生成框架，最近在图像、视频和其他领域取得了强劲成果，并在降尺度方面显示出早期前景。我们训练了一个流匹配模型，将以新加坡为中心的对流尺度区域上的每日降水从8公里映射到2公里，并以基于分数的扩散模型CPMGEM为基准进行对比。流匹配在空间预报技巧上始终表现更好：在测试的每个降水阈值和邻域尺度上，分数技巧评分（fractions skill score）更高，SAL评分的结构和幅度分量更集中，位置技巧相当。然而，流匹配低估了降水分布的上尾，导致气候平均存在干偏差。这些结果表明，流匹配是对流尺度降水降尺度的有竞争力的生成框架，特别适合捕捉空间结构。

英文摘要

Generative machine learning is an increasingly important complement to dynamical downscaling for producing high-resolution precipitation projections, with diffusion models currently the leading approach. Flow matching is a related generative framework that has recently achieved strong results across image, video and other domains, and shown early promise for downscaling. We train a flow matching model to map daily precipitation from 8 km to 2 km over a convective-scale domain centred on Singapore, and benchmark it against CPMGEM, a score-based diffusion model. Flow matching achieves consistently better spatial skill: higher fractions skill score at every precipitation threshold and neighbourhood scale tested, and tighter structure and amplitude components of the SAL score with comparable location skill. However, flow matching underestimates the upper tail of the precipitation distribution, resulting in a dry bias in the climatological mean. These results suggest that flow matching is a competitive generative framework for convective-scale precipitation downscaling, particularly well suited to capturing spatial structure.
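
As background, a flow matching training pair and an Euler sampler can be sketched in a few lines. The straight-line path between noise and data is an assumption (the abstract does not state which path is used), conditioning on the coarse 8 km field is omitted, and the rainfall values are made up.

```python
import random

def fm_training_pair(x0, x1, t):
    """Point on the straight path from noise x0 to data x1, and its velocity target."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    target = [b - a for a, b in zip(x0, x1)]
    return x_t, target

def fm_loss(predicted_velocity, target):
    """Mean squared error between a predicted velocity and the target velocity."""
    return sum((p - u) ** 2 for p, u in zip(predicted_velocity, target)) / len(target)

def euler_sample(velocity_fn, x0, n_steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed Euler steps."""
    x, dt = list(x0), 1.0 / n_steps
    for k in range(n_steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

rng = random.Random(0)
x1 = [0.0, 2.5, 7.0, 0.4]                # hypothetical high-resolution patch (mm/day)
x0 = [rng.gauss(0.0, 1.0) for _ in x1]   # Gaussian noise sample
x_t, target = fm_training_pair(x0, x1, t=0.3)
# With the exact constant velocity, Euler integration carries x0 to x1.
recovered = euler_sample(lambda x, t: target, x0, n_steps=10)
```

In a real model `velocity_fn` would be a trained network evaluated on `x_t`, `t`, and the coarse-resolution input, and `fm_loss` would be averaged over random times and data pairs.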

2606.00270 2026-06-02 cs.AI cs.LG cs.LO 版本更新

Robust Shielding for Safe Reinforcement Learning

用于安全强化学习的鲁棒屏蔽

Edwin Hamel-De le Court, Thom Badings, Alessandro Abate, Francesco Belardinelli, Francesco Fabiano

发表机构 * Department of Computer Science, University of Manchester（曼彻斯特大学计算机科学系）； Faculty of Computer Science & DSME, RWTH Aachen University（亚琛工业大学计算机科学与DSME学院）； Department of Computer Science, University of Oxford（牛津大学计算机科学系）； Department of Computing, Imperial College London（伦敦帝国理工学院计算系）

AI总结提出一种针对鲁棒MDP的屏蔽框架，通过线性时序逻辑公式在最坏情况下的概率阈值保证安全性，并证明其可靠性与最优性。

AI中文摘要

屏蔽是一种在马尔可夫决策过程（MDP）中正式保证强化学习智能体安全性的有效方法。然而，现有的屏蔽技术通常假设已知安全相关的转移动态——这一要求在现实中很少得到满足。为了解决这一限制，我们引入了一种针对鲁棒MDP（RMDP）的新型屏蔽框架，即具有转移概率集合的MDP。我们将安全性定义为在RMDP的最坏情况转移概率下，以一定阈值概率满足线性时序逻辑（LTL）公式。我们证明，我们的屏蔽框架对于RMDP既是可靠的又是最优的：屏蔽允许的每个策略都是安全的，反之，每个安全的RMDP策略都被屏蔽允许。我们将我们的方法与现有的用于学习具有可能近似正确（PAC）保证的MDP转移概率的采样方法相结合。这种组合使得能够为MDP构建屏蔽，这些屏蔽在高置信度下保证安全性，同时保持最小限制性。我们的实验表明，我们为学习的RMDP构建的屏蔽在未知MDP中保证安全性，同时随着样本数量的增加恢复出强的期望回报。

英文摘要

Shielding is an effective approach to formally guarantee the safety of reinforcement learning agents in Markov decision processes (MDPs). However, existing shielding techniques typically assume knowledge of the safety-relevant transition dynamics - a requirement that is seldom met in practice. To address this limitation, we introduce a novel shielding framework for robust MDPs (RMDPs), i.e., MDPs with sets of transition probabilities. We define safety as the satisfaction of a linear temporal logic (LTL) formula with a certain threshold probability under the worst-case transition probabilities of the RMDP. We prove that our shielding framework is both sound and optimal for the RMDP: every policy admissible by the shield is safe, and conversely, every safe RMDP policy is admissible by the shield. We combine our approach with existing sampling methods for learning transition probabilities of MDPs with probably approximately correct (PAC) guarantees. This combination enables the construction of shields for MDPs that, with high confidence, guarantee safety while remaining minimally restrictive. Our experiments show that our shields for learned RMDPs guarantee safety in unknown MDPs while recovering strong expected return as the number of samples increases.
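
To make the worst-case safety criterion concrete, the sketch below shields a single state of an interval-valued RMDP against a one-step "avoid unsafe successors" property. This is a deliberate simplification: the paper handles full LTL specifications, and the interval model, state names, and threshold here are hypothetical.

```python
def worst_case_unsafe_prob(intervals, unsafe):
    """Largest probability an adversary can place on `unsafe` successor states.

    intervals: {state: (low, high)} bounds on one action's transition probabilities,
    assumed feasible (sum of lows <= 1 <= sum of highs).
    """
    upper_on_unsafe = sum(hi for s, (lo, hi) in intervals.items() if s in unsafe)
    forced_on_safe = sum(lo for s, (lo, hi) in intervals.items() if s not in unsafe)
    return min(upper_on_unsafe, 1.0 - forced_on_safe)

def shield(actions, unsafe, threshold):
    """Actions whose worst-case one-step safety probability meets the threshold."""
    return [a for a, iv in actions.items()
            if 1.0 - worst_case_unsafe_prob(iv, unsafe) >= threshold]

# Hypothetical interval model of one state with two actions.
actions = {
    "cautious": {"ok": (0.90, 1.00), "crash": (0.00, 0.10)},
    "risky":    {"ok": (0.60, 0.90), "crash": (0.10, 0.40)},
}
allowed = shield(actions, unsafe={"crash"}, threshold=0.85)
```

The closed form works because the adversary can raise the unsafe mass only up to the unsafe upper bounds, and only as far as the lower bounds on safe states leave room. A multi-step shield would propagate these worst-case values through robust value iteration instead.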

2606.00267 2026-06-02 cs.CV cs.AI cs.LG cs.RO 版本更新

StressDream: Steering Video World Models for Robust Policy Evaluation and Improvement

StressDream：引导视频世界模型实现鲁棒的策略评估与改进

Junwon Seo, Sushant Veer, Ran Tian, Wenhao Ding, Apoorva Sharma, Karen Leung, Edward Schmerling, Marco Pavone, Andrea Bajcsy

发表机构 * Carnegie Mellon University（卡内基梅隆大学）； NVIDIA Research（NVIDIA研究）； University of Washington（华盛顿大学）； Stanford University（斯坦福大学）

AI总结提出StressDream方法，通过优化扩散视频世界模型的初始噪声，在推理时引导生成高影响且合理的未来场景，以支持鲁棒的策略评估与改进。

Comments Project page: https://junwon.me/StressDream/

AI中文摘要

视频世界模型通过想象以自我机器人动作为条件的真实未来观察，在策略评估与改进方面展现出潜力。虽然世界模型可以对未来的分布进行建模，但策略评估与改进通常依赖于名义上的想象，这可能会遗漏机器人动作的高影响结果，除非抽取大量样本。为了实现对世界模型想象的鲁棒策略评估与改进，我们提出StressDream，该方法通过在推理时优化扩散世界模型的初始噪声，将想象引导至高影响且合理的结果。然而，优化高维噪声具有挑战性：优化必须推理生成视频中细微的、场景相关的目标事件，同时避免产生不合理想象的分布外噪声。我们通过两个互补目标来解决这一问题：一个语义目标，利用视觉语言模型通过推理生成视频提供信息丰富的梯度；一个合理性目标，防止优化后的噪声漂移到分布外。利用用于自动驾驶和机器人操作的最先进的视频世界模型，我们展示了StressDream能够有效地将想象引导至推理时由文本指定的高影响且合理的结果，例如任务失败，从而通过识别那些合理未来包含不良结果的动作，实现鲁棒的策略评估与改进。视频结果见https://junwon.me/StressDream/。

英文摘要

Video world models (WMs) have shown promise for policy evaluation and improvement by imagining realistic future observations conditioned on ego-robot actions. While WMs can model distributions over futures, policy evaluation and improvement typically rely on nominal imaginations, which can miss high-impact outcomes of robot actions unless prohibitively many samples are drawn. To enable robust policy evaluation and improvement over WM imaginations, we propose StressDream, which steers imaginations toward high-impact yet plausible outcomes specified at inference time by optimizing the initial noise of diffusion-based WMs. However, optimizing high-dimensional noise is challenging: the optimization must reason about nuanced, scene-dependent target events in generated videos while avoiding out-of-distribution (OOD) noise that yields implausible imaginations. We address this with two complementary objectives: a semantic objective with a Vision-Language Model that provides informative gradients by reasoning about the generated video, and a plausibility objective that prevents the optimized noise from drifting OOD. With state-of-the-art video world models for autonomous driving and robotic manipulation, we show that StressDream effectively steers imaginations toward high-impact yet plausible outcomes specified by text at inference time, such as task failures, enabling robust policy evaluation and improvement by identifying actions whose plausible futures include undesirable outcomes. Video results are available at https://junwon.me/StressDream/.

2606.00266 2026-06-02 cs.NI cs.LG 版本更新

KISS: Keeping it Simple and Slotted when Learning to Communicate over Wireless

KISS：学习无线通信时保持简单和时隙化

Kamil Szczech, Maksymilian Wojnar, Krzysztof Rusek, Katarzyna Kosek-Szott, Szymon Szott

发表机构 * AGH University of Krakow（克拉科夫AGH大学）

AI总结本文使用异策略（off-policy）双深度Q网络结合贝叶斯推理，在时隙信道上训练分布式智能体自主学习随机接入策略，实现了接近理论效率且公平的接入，并发现学习到的行为类似于动态调整传输概率的时隙ALOHA。

详情

AI中文摘要

分布式无线系统中长期存在的挑战是确保高效且公平的随机信道接入。现有解决方案通常处理与时间、周期性或集中化相关的特定约束，但它们通常依赖固定启发式方法。受机器学习（ML）最新进展的启发，我们研究ML智能体能否自主学习高效且公平的接入策略，以及这种学习能否为介质访问控制（MAC）设计提供新见解。我们的目标不是提出可部署的协议，而是检验在最小假设下，分散式学习能否重新发现或近似理论上高效的随机接入机制。为此，我们部署了带有贝叶斯推理的异策略（off-policy）双深度Q网络（DDQN）来训练在时隙信道上运行的智能体。所得方法完全在线（无需预训练）、完全分布式（独立的多智能体学习器）、随机（非周期性），且无需协调或显式通信。大量仿真表明，学习到的策略适应变化的网络条件，并在保持公平性的同时实现接近理论的效率。消融研究进一步揭示，学习到的行为类似于具有动态调整传输概率的时隙ALOHA，因此我们将该方法称为KISS：保持简单和时隙化。

英文摘要

A long-standing challenge in distributed wireless systems is ensuring efficient and fair random channel access. Existing solutions often address specific constraints related to timing, periodicity, or centralization, but they typically rely on fixed heuristics. Motivated by recent advances in machine learning (ML), we investigate whether ML agents can autonomously learn efficient and fair access strategies, and whether such learning can offer new insights into medium access control (MAC) design. Rather than proposing a deployable protocol, our aim is to examine whether decentralized learning can rediscover or approximate theoretically efficient random-access mechanisms under minimal assumptions. To this end, we deploy an off-policy Double Deep Q-Network (DDQN) with Bayesian inference to train agents operating over a slotted channel. The resulting method is fully online (no pre-training), fully distributed (independent multi-agent learners), stochastic (non-periodic), and requires no coordination or explicit communication. Extensive simulations show that the learned strategy adapts to varying network conditions and achieves near-theoretical efficiency while maintaining fairness. Ablation studies further reveal that the learned behavior resembles slotted ALOHA with a dynamically adjusted transmission probability, leading us to refer to the method as KISS: Keeping It Simple and Slotted.
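
The "near-theoretical efficiency" benchmark for slotted ALOHA is easy to compute directly. The sketch below uses the standard model of n always-backlogged stations that each transmit independently with probability p per slot; it is textbook analysis, not the paper's learning method.

```python
def slotted_aloha_throughput(n_stations, p):
    """Probability that exactly one of n stations transmits in a slot."""
    return n_stations * p * (1.0 - p) ** (n_stations - 1)

n = 10
grid = [k / 1000 for k in range(1, 1000)]
# Grid search over the per-slot transmission probability.
best_p = max(grid, key=lambda p: slotted_aloha_throughput(n, p))
best_throughput = slotted_aloha_throughput(n, best_p)
```

The maximum sits at p = 1/n and the resulting throughput, (1 - 1/n)^(n-1), decreases toward 1/e as n grows. This is why a transmission probability that adapts to the number of contending stations is the natural target for a learned policy.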

2606.00265 2026-06-02 stat.ML cs.LG 版本更新

Out-of-Distribution generalization of quantile regression with heavy tailed inputs: an SVM approach

重尾输入下分位数回归的分布外泛化：一种SVM方法

Baptiste Leroux, Clément Dombry, Anne Sabourin

AI总结针对协变量取异常大值的分位数回归外推问题，提出基于支持向量机（SVM）的框架，利用再生核希尔伯特空间处理高维非线性情况，并建立有限样本学习保证。

Comments 48 pages, 5 figures

详情

AI中文摘要

我们研究了协变量取异常大值的外推机制下的分位数回归。在正则变化假设下，极端观测可以通过其角度分量有效表征，从而使得学习策略能够聚焦于最极端观测的角度。该方法通过最小化渐近条件风险来形式化，该风险将学习定位在协变量分布的尾部。我们提出了一种新的支持向量机（SVM）框架用于极端分位数回归，利用再生核希尔伯特空间处理高维和非线性设置。我们的方法还适应无界响应变量，并避免了限制性变换。我们在温和的正则性假设下建立了有限样本学习保证。该框架统一了统计学习和多元极值的思想，提供了一种可处理且理论扎实的外推方法。我们通过对多瑙河河流流量数据的实证研究补充了理论发现，证明了我们方法的实际相关性。

英文摘要

We study quantile regression in an extrapolation regime where the covariate takes unusually large values. Under regular variation assumptions, extreme observations can be effectively characterized through their angular components, enabling learning strategies that focus on the angle of the most extreme observations. This approach is formalized through the minimization of an asymptotic conditional risk that localizes learning in the tail of the covariate distribution. We propose a novel Support Vector Machine (SVM) framework for extreme quantile regression, leveraging reproducing kernel Hilbert spaces to handle high-dimensional and nonlinear settings. Our method also accommodates unbounded response variables and avoids restrictive transformations. We establish finite-sample learning guarantees under mild regularity assumptions. The proposed framework unifies ideas from statistical learning and multivariate extremes, providing a tractable and theoretically grounded approach to extrapolation. We complement our theoretical findings with an empirical study on river flow data from the Danube, demonstrating the practical relevance of our methods.
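
Two ingredients named in the abstract, the quantile (pinball) loss and the angular components of the most extreme covariates, can be sketched as follows. This is not the paper's RKHS-based SVM estimator; the helper names and toy data are assumptions.

```python
import math

def pinball_loss(y, prediction, tau):
    """Quantile (pinball) loss at level tau in (0, 1)."""
    diff = y - prediction
    return tau * diff if diff >= 0 else (tau - 1.0) * diff

def empirical_risk(ys, prediction, tau):
    return sum(pinball_loss(y, prediction, tau) for y in ys) / len(ys)

def extreme_angles(covariates, k):
    """Angles x / ||x|| of the k covariates with the largest Euclidean norm."""
    ranked = sorted(covariates, key=lambda x: math.hypot(*x), reverse=True)[:k]
    return [tuple(v / math.hypot(*x) for v in x) for x in ranked]

ys = [float(v) for v in range(1, 101)]  # toy responses 1, 2, ..., 100
tau = 0.9
# The best constant predictor under the pinball loss is an empirical tau-quantile.
best = min(ys, key=lambda c: empirical_risk(ys, c, tau))
angles = extreme_angles([(3.0, 4.0), (0.1, 0.0), (0.0, 10.0)], k=2)
```

In the extrapolation regime described above, a regression function would be fitted on `angles` only, using the pinball loss restricted to the largest-norm observations.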

2606.00263 2026-06-02 eess.SP cs.LG 版本更新

ReFLEX: Length-Generalizable CSI Denoising for MIMO-OFDM via Relative-Frequency Bias

ReFLEX：通过相对频率偏置实现MIMO-OFDM中长度可泛化的CSI去噪

Zhibin Zhang, Robert Potekhin, Ziwei Wan, Vladimir Lyashev, Zhen Gao

发表机构 * Moscow Institute of Physics and Technology (State University)（莫斯科物理技术学院（国家大学））； Yangtze Delta Region Academy（长江三角洲地区研究院）； Beijing Institute of Technology（北京理工大学）； School of Interdisciplinary Science（交叉科学学院）

AI总结提出ReFLEX，一种基于相对频率位置偏置（RFPB）的长度可泛化Transformer，用于MIMO-OFDM系统中可变RB分配的CSI去噪，在未见RB长度和稀疏DM-RS场景下无需重训即可应用，并在3GPP信道和NR PUSCH仿真中显著提升性能。

Comments 5 pages, 3 figures, submitted to IEEE journal

2606.00262 2026-06-02 cs.LG cs.AI stat.AP stat.ML 版本更新

When Softmax Fails at the Top: Extreme Value Corrections for InfoNCE

当 Softmax 在顶部失效：InfoNCE 的极值修正

Melihcan Erol, Suat Evren, Oktay Ozel, Alexander Morgan, Jongha Jon Ryu, Lizhong Zheng

发表机构 * University of Waterloo（滑铁卢大学）

AI总结针对 InfoNCE 中 softmax 假设与对比学习嵌入设置不匹配的问题，提出基于极值理论的 WEINCE 修正方法，在五个视觉基准上提升冻结特征评估性能。

Comments Presented in ICML 2026

2606.00257 2026-06-02 cs.LG cs.AI 版本更新

ARCA: Adapter-Residual Credit Assignment When Token Signals Degenerate

ARCA：当令牌信号退化时的适配器-残差信用分配

Rodney Lafuente-Mercado

发表机构 * Rodney Lafuente-Mercado

AI总结针对LoRA微调下令牌级信用分配信号退化的问题，提出ARCA方法，利用适配器隐藏状态残差作为令牌显著性度量，无需学习奖励模型或价值头。

Comments Accepted to DEMO 2026: ICML Workshop on Decision-Making from Offline Datasets to Online Adaptation. Non-archival report

AI中文摘要

语言模型强化学习的令牌级信用分配通常被表述为策略完全可训练，而实际的LLM-RL流程往往依赖于参数高效微调，尤其是LoRA。我们认为这种分离隐藏了一种结构性失效模式。在LoRA下，策略被限制在参考模型的低秩邻域内，因此常用内在信用信号（如惊奇度、熵减和策略散度）所依赖的每令牌输出分布差异，在轨迹内归一化后可能变得退化，要么接近均匀权重，要么集中在少量与任务无关的位置上。我们形式化了这种行为，并提出直接用集中度诊断指标（如权重基尼系数和有效令牌比率）进行测量。然后，我们引入了适配器-残差信用分配（ARCA），一种轻量级替代方案，它从适配器自身的隐藏状态残差 $\|h^{\text{adapted}}_t - h^{\text{base}}_t\|_2$ 中推导令牌显著性。ARCA关注适配器实际改变模型的位置，而不是输出分布显得不确定或偏移的位置，并且不需要学习奖励模型、价值头或树结构。在一组小规模的MATH/Qwen3-1.7B GRPO对比实验中，ARCA在匹配的轨迹预算下表现出预测的非退化、处于中间区间的信用分布，并与秩匹配的基线保持竞争力。

英文摘要

Token-level credit assignment for language-model reinforcement learning is usually formulated as if the policy were fully trainable, while practical LLM-RL pipelines often rely on parameter-efficient fine-tuning, especially LoRA. We argue that this separation hides a structural failure mode. Under LoRA, the policy is restricted to a low-rank neighborhood of the reference model, so the per-token output-distribution differences used by common intrinsic credit signals (surprisal, entropy reduction, and policy divergence) can become degenerate after within-trajectory normalization, either approaching uniform weights or concentrating on a small set of task-agnostic positions. We formalize this behavior and propose measuring it directly with concentration diagnostics such as weight Gini and effective-token ratio. We then introduce \emph{Adapter-Residual Credit Assignment} (ARCA), a lightweight alternative that derives token salience from the adapter's own hidden-state residual, $\|h^{\text{adapted}}_t - h^{\text{base}}_t\|_2$. ARCA asks where the adapter actually changes the model, rather than where the output distribution appears uncertain or shifted, and requires no learned reward model, value head, or tree construction. In a compact MATH/Qwen3-1.7B GRPO sweep, ARCA exhibits the predicted non-degenerate middle-regime credit distribution under matched rollout budgets and remains competitive with rank-matched baselines.
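
The residual-norm salience and the Gini diagnostic can be sketched on toy hidden states. The within-trajectory normalization (dividing by the sum) and the two-dimensional hidden vectors are assumptions for illustration; the abstract does not specify the normalization.

```python
import math

def residual_salience(h_base, h_adapted):
    """Per-token L2 norm of (adapted - base) hidden states, normalized to sum to 1.

    Assumes the adapter changes at least one token, so the total is non-zero.
    """
    norms = [math.dist(a, b) for a, b in zip(h_adapted, h_base)]
    total = sum(norms)
    return [v / total for v in norms]

def gini(weights):
    """Gini coefficient of non-negative weights: 0 = uniform, near 1 = concentrated."""
    w = sorted(weights)
    n = len(w)
    cum = sum((i + 1) * v for i, v in enumerate(w))
    return 2.0 * cum / (n * sum(w)) - (n + 1) / n

# Hypothetical hidden states for a 4-token trajectory.
h_base = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [0.0, 3.0]]
h_adapted = [[0.3, 0.4], [1.0, 1.1], [2.0, 0.0], [0.6, 3.8]]
weights = residual_salience(h_base, h_adapted)
concentration = gini(weights)
```

A Gini value near 0 would indicate near-uniform credit and a value near 1 would indicate credit piled onto a few tokens. These are the two degenerate regimes the abstract describes.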

2606.00253 2026-06-02 cs.RO cs.LG 版本更新

Per-Group Error, Not Total MSE: Fine-Tuning Vision-Language-Action Models for 11-DoF Mobile Manipulation

分组误差而非总MSE：微调视觉-语言-动作模型用于11自由度移动操作

Pau Montagut Bofi, Mario García Blasco, Tessa Pulli, Markus Vincze

发表机构 * University of California, Berkeley（加州大学伯克利分校）； ETH Zurich（苏黎世联邦理工学院）

AI总结针对异构关节空间的移动操作器微调视觉-语言-动作模型时，发现总MSE最低的检查点并非实际表现最佳，提出以分组误差作为更可靠的检查点选择指标。

Comments 4 pages, 3 figures, 3 tables. Accepted as poster at ICRA 2026 Workshop "From Data to Decisions: VLA Pipelines for Real Robots". Code: https://github.com/paumontagut/per-group-mse-vla

AI中文摘要

对具有异构关节空间的移动操作器微调视觉-语言-动作（VLA）模型可能产生反直觉的结果：总MSE最低的检查点并非在真实机器人上表现最佳。我们认为这是将异构关节组（手臂、夹爪、头部、轮式底座）合并为单一指标的可预测后果，其中易于预测的关节可能掩盖仍然失败的关节。我们在11自由度Toyota HSR上微调SmolVLA（450M，仅动作专家），并将其与更强的预训练基线$π_{0.5}$（3.3B）进行比较。分组分析揭示了两种模式：在SmolVLA中，移动底座收敛最慢并限制了整体性能。在$π_{0.5}$的仅专家微调（仅训练动作头，骨干冻结）中，总MSE低于基线但手臂精度下降。在60次真实机器人试验（每个模型20次）中，$π_{0.5}$ 80k（4.0/4）显著优于两种微调变体（仅专家3k：3.75/4；HSR-SmolVLA：3.5/4；Mann-Whitney $p \leq 0.010$），尽管仅专家3k的总MSE最低。这种差异与离线手臂组误差最为一致，而非总MSE或底座组误差。我们得出结论：对于具有异构动作空间的机器人，分组误差比总MSE是更可靠的检查点选择信号。代码：https://github.com/paumontagut/per-group-mse-vla

英文摘要

Fine-tuning Vision-Language-Action (VLA) models for mobile manipulators with heterogeneous joint spaces can produce a counterintuitive result: the checkpoint with the lowest aggregate MSE is not the one that performs best on the real robot. We argue this is a predictable consequence of collapsing heterogeneous joint groups (arm, gripper, head, wheeled base) into a single metric, where easy-to-predict joints can mask joints that still fail. We fine-tune SmolVLA (450M, action-expert only) on the 11-DoF Toyota HSR and compare it against $π_{0.5}$ (3.3B), a stronger pretrained baseline. Per-group analysis exposes two patterns: in SmolVLA, the mobile base converges slowest and limits overall performance. In expert-only fine-tuning of $π_{0.5}$ (training only the action head, backbone frozen), total MSE drops below the baseline but arm accuracy degrades. On 60 real-robot trials (20 per model), $π_{0.5}$ 80k (4.0/4) significantly outperforms both fine-tuned variants (expert-only 3k: 3.75/4; HSR-SmolVLA: 3.5/4; Mann-Whitney $p \leq 0.010$), despite expert-only 3k having the lowest total MSE. This separation is most consistent with the offline arm-group error, not total MSE or base-group error. We conclude that per-group error is a more reliable signal than total MSE for checkpoint selection on robots with heterogeneous action spaces. Code: https://github.com/paumontagut/per-group-mse-vla
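
The masking effect described above shows up in a small per-group error report. The index layout of the 11-DoF action vector and the error magnitudes below are hypothetical; the linked repository, not this sketch, is the reference for the real evaluation.

```python
# Hypothetical index layout for an 11-DoF action vector; the real ordering may differ.
GROUPS = {"arm": range(0, 5), "gripper": range(5, 6), "head": range(6, 8), "base": range(8, 11)}

def per_group_mse(pred, target, groups=GROUPS):
    """Mean squared error per joint group, plus the aggregate over all joints."""
    sq = [[(p - t) ** 2 for p, t in zip(pr, tg)] for pr, tg in zip(pred, target)]
    report = {}
    for name, idx in groups.items():
        vals = [row[j] for row in sq for j in idx]
        report[name] = sum(vals) / len(vals)
    flat = [v for row in sq for v in row]
    report["total"] = sum(flat) / len(flat)
    return report

target = [[0.0] * 11]
ckpt_a = [[0.2] * 5 + [0.0] * 6]               # poor arm, perfect elsewhere
ckpt_b = [[0.05] * 5 + [0.0] * 3 + [0.3] * 3]  # good arm, poor base
report_a = per_group_mse(ckpt_a, target)
report_b = per_group_mse(ckpt_b, target)
```

Here checkpoint A wins on total MSE while its arm error is sixteen times larger than checkpoint B's, so selecting by the aggregate alone would pick the checkpoint with the worse arm.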

2606.00252 2026-06-02 cs.RO cs.LG 版本更新

量化推理模型认为它们需要思考更长时间，但实际上并不需要

Sanae Lotfi, Polina Kirichenko, Steven Li, Zechun Liu

发表机构 * FAIR at Meta（Meta 基础人工智能研究团队）； Meta AI（Meta 人工智能）

AI总结本文发现后训练量化会降低推理模型准确率并增加思维链长度，通过分析量化模型在中间步骤正确但最终输出错误的“过度思考”错误，提出一种无训练的对过度思考标记施加logit惩罚的方法，在保持或提升准确率的同时减少12-23%的思维链长度。

AI中文摘要

后训练量化（PTQ）被广泛用于高效部署大型语言模型，但其对推理模型的影响尚不明确。在数学、编程和科学问答任务中，我们发现激进的PTQ会降低准确率，同时增加思维链（CoT）长度。令人惊讶的是，我们证明在量化模型高达52%的失败案例中，模型在中间推理步骤中得出了正确答案，但并未将其作为最终答案输出。为了理解量化为何导致这种过度思考错误的增加，我们测量了量化模型与全精度输出分布之间的token级KL散度。KL散度高的位置与高下一个token熵强相关，在这些位置上，量化模型过度采样了“wait”、“but”、“alternatively”等过度思考标记。我们表明，仅对一组精心挑选的过度思考标记引入无训练的logit惩罚，即可在5个模型（1.5B-32B参数）、3种量化方法和5个基准测试中将CoT长度减少12-23%，同时保持或提升准确率，与惩罚其他标记集相比，在准确率与推理成本之间产生了更优的帕累托前沿。量化模型产生的过度思考错误尤其减少了高达58%。

英文摘要

Post-training quantization (PTQ) is widely used to deploy large language models efficiently, but its effect on reasoning models is not well understood. Across math, coding, and science QA, we find that aggressive PTQ reduces accuracy while increasing chain-of-thought (CoT) length. Surprisingly, we show that in up to 52% of the quantized models' failures, models reach the right answer in intermediate reasoning steps but do not output it as a final answer. To understand why quantization leads to this increase in overthinking errors, we measure the token-level KL divergence between quantized and full-precision output distributions. Positions with high KL divergence correlate strongly with high next-token entropy, and at these positions quantized models disproportionately sample overthinking markers such as "wait", "but", and "alternatively". We show that simply introducing a training-free logit penalty on a curated set of overthinking markers can reduce CoT length by 12--23% while preserving or improving accuracy across 5 models (1.5B-32B parameters), 3 quantization methods, and 5 benchmarks, yielding a favorable Pareto frontier of accuracy against reasoning cost compared to penalizing other token sets. In particular, overthinking errors produced by quantized models are reduced by up to 58%.
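
A training-free logit penalty of the kind described can be sketched on a toy vocabulary. The marker list is taken from the abstract's examples, but the vocabulary, logit values, and penalty size are hypothetical, and a real system would apply the penalty inside the decoding loop.

```python
import math

def penalize_markers(logits, marker_ids, penalty):
    """Subtract a fixed penalty from the logits of the given token ids."""
    return [v - penalty if i in marker_ids else v for i, v in enumerate(logits)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [v / s for v in exps]

vocab = ["so", "wait", "but", "answer", "alternatively"]  # toy vocabulary
marker_ids = {vocab.index(w) for w in ("wait", "but", "alternatively")}
logits = [1.0, 1.2, 0.8, 1.1, 0.5]
before = softmax(logits)
after = softmax(penalize_markers(logits, marker_ids, penalty=2.0))
marker_mass_before = sum(before[i] for i in marker_ids)
marker_mass_after = sum(after[i] for i in marker_ids)
```

The penalty shifts probability mass from the marker tokens to the remaining vocabulary without touching model weights, so it composes with any quantization method.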

2606.00202 2026-06-02 cs.LG cs.AI 版本更新

From Rashomon Theory to PRAXIS: Efficient Decision Tree Rashomon Sets

从Rashomon理论到PRAXIS：高效决策树Rashomon集

Zakk Heile, Hayden McTavish, Varun Babbar, Margo Seltzer, Cynthia Rudin

发表机构 * Stanford University（斯坦福大学）

AI总结针对决策树Rashomon集计算开销大的问题，提出PRAXIS算法，在运行时和内存使用上实现数量级改进，并能恢复几乎完整的Rashomon集。

Comments Accepted to ICML 2026

AI中文摘要

标准机器学习流程通常会产生许多接近最优的模型。这些“Rashomon集”为不确定性感知的鲁棒决策带来了一系列挑战和机遇。它们允许用户整合领域知识和偏好，这些知识和偏好通常难以直接指定为目标函数，并且它们量化了给定训练数据集和目标函数下有效模型之间的多样性。然而，即使对于稀疏决策树这样简单、可解释的模型类，Rashomon集的计算仍然需要巨大的内存和运行时资源。我们提出了PRAXIS，一种近似该Rashomon集的算法，在运行时和内存使用上实现了数量级的改进。我们验证了PRAXIS通常能恢复几乎完整的Rashomon集。PRAXIS使研究人员和从业者能够可扩展地对真实世界数据集的Rashomon集进行建模。PRAXIS的代码可在https://github.com/zakk-h/PRAXIS获取。

英文摘要

Standard machine learning pipelines often admit many near-optimal models. These "Rashomon sets" pose a range of challenges and opportunities for uncertainty-aware, robust decision making. They allow users to incorporate domain knowledge and preferences that would otherwise be difficult to specify directly in an objective, and they quantify diversity among valid models for a given training dataset and objective function. However, computation of Rashomon sets, even for simple, interpretable model classes such as sparse decision trees, continues to require immense memory and runtime resources. We present PRAXIS, an algorithm to approximate this Rashomon set with orders of magnitude improvement in runtime and memory usage. We validate that PRAXIS regularly recovers almost all of the full Rashomon set. PRAXIS allows researchers and practitioners to scalably model the Rashomon set for real-world datasets. Code for PRAXIS is available at https://github.com/zakk-h/PRAXIS
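
For intuition, a Rashomon set can be built by brute force for a tiny model class. The sketch below enumerates threshold rules on a made-up one-dimensional dataset and keeps those within epsilon of the best error. It is not PRAXIS, which targets sparse decision trees at scale, and it omits any sparsity penalty in the objective.

```python
def rashomon_set(models, objective, epsilon):
    """All models whose objective is within epsilon of the best one found."""
    scored = [(objective(m), m) for m in models]
    best = min(s for s, _ in scored)
    return [m for s, m in scored if s <= best + epsilon]

# Toy model class: rules "predict 1 if x > c" on a hypothetical 1-D dataset.
data = [(0.1, 0), (0.3, 0), (0.45, 0), (0.5, 1), (0.7, 1), (0.9, 1), (0.35, 1)]
thresholds = [0.0, 0.2, 0.32, 0.4, 0.47, 0.6, 0.8]

def error_rate(c):
    """Fraction of points misclassified by the rule with threshold c."""
    return sum((x > c) != bool(y) for x, y in data) / len(data)

near_optimal = rashomon_set(thresholds, error_rate, epsilon=0.05)
```

Two different thresholds reach the same minimal error here while disagreeing on some inputs. That multiplicity is what a Rashomon set is meant to expose.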

2606.00198 2026-06-02 cs.LG cs.AI cs.CL 版本更新

BAGEN: Are LLM Agents Budget-Aware?

BAGEN：LLM 智能体是否具有预算意识？

Yuxiang Lin, Zihan Wang, Mengyang Liu, Yuxuan Shan, Longju Bai, Junyao Zhang, Xing Jin, Boshan Chen, Jinyan Su, Xingyao Wang, Jiaxin Pei, Manling Li

发表机构 * Northwestern University（西北大学）； O2 Lab（O2实验室）； Independent（独立）； University of Michigan（密歇根大学）； Cornell（康奈尔大学）； All Hands AI ； Stanford（斯坦福大学）； UT Austin（德克萨斯大学奥斯汀分校）

AI总结本文提出预算感知智能体（BAGEN）概念，将预算作为主动控制信号而非被动成本指标，通过渐进区间估计方法预测剩余预算上下界，并在四个环境和五个前沿模型上发现强模型不一定具有强预算意识、模型过度乐观等失败模式，早期停止可节省 28-64% 令牌，但精确区间校准仍具挑战。

AI中文摘要

尽管智能体正在花费越来越多的资源，但如今智能体成本大多仅在执行后衡量。预算感知智能体（BAGEN）应将预算视为主动控制信号，而非被动成本指标。我们首先系统地将预算估计定义为内部预算（来自智能体计算）和外部预算（来自智能体动作）。然后，我们将预算意识形式化为渐进区间估计：在计划的每一步，智能体应预测剩余预算的上限和下限，并在完成可能性低时发出警报。通过 rollout-replay 协议进行评分，我们在四个环境和五个前沿模型上发现了一致的失败模式：（1）强模型不一定具有强预算意识，相关性 r=0.35。（2）前沿模型始终过度乐观，继续在不太可能成功的任务上花费资源，而不是尽早提醒用户。（3）预算感知信号是可操作且可训练的。早期停止在失败轨迹上节省 28-64% 的令牌，SFT+RL 增强了早期停止和警报行为。（4）精确区间校准仍然具有挑战性，SFT+RL 后区间覆盖率上限为 47%。项目页面：https://ragen-ai.github.io/bagen/

英文摘要

While agents are spending ever more resources, agent cost today is mostly measured only after execution. A Budget-Aware Agent (BAGEN) should treat budget as an active control signal rather than a passive cost metric. We first systematically divide budget estimation into internal budgets (from agent computation) and external budgets (from agent actions). We then formalize budget-awareness as progressive interval estimation: at each step of a plan, an agent should predict an upper and lower bound on the remaining budget, and alert when completion is unlikely. Scoring with a rollout-replay protocol, we find consistent failure patterns on four environments and five frontier agents: (1) Strong agents do not necessarily have strong budget-awareness, with correlation r=0.35. (2) Frontier models are consistently over-optimistic: they continue spending on tasks that are unlikely to succeed instead of alerting the user early. (3) The budget-aware signal is actionable and trainable: early stopping saves 28-64% of tokens on failed trajectories, and SFT+RL strengthens early-stop and alert behavior. (4) Precise interval calibration remains challenging, with interval coverage capping at 47% after SFT+RL. Project page: https://ragen-ai.github.io/bagen/
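As a rough illustration of progressive interval estimation, the sketch below scores per-step `[low, high]` predictions by coverage and raises an alert when even the optimistic bound exceeds what is left. The function names, the alert rule, and the numbers are assumptions for illustration, not the paper's rollout-replay protocol.

```python
def interval_coverage(predictions, actual_remaining):
    """Fraction of steps whose [low, high] interval contains the true remaining budget."""
    hits = sum(
        1
        for (low, high), actual in zip(predictions, actual_remaining)
        if low <= actual <= high
    )
    return hits / len(predictions)

def should_alert(low, budget_left):
    """Hypothetical alert rule: even the lower bound on remaining cost exceeds the budget."""
    return low > budget_left

# Made-up token budgets for a four-step plan.
predictions = [(800, 1200), (500, 900), (300, 400), (100, 150)]
actual_remaining = [1000, 950, 350, 120]
coverage = interval_coverage(predictions, actual_remaining)  # 3 of 4 intervals hit
```

A coverage value well below the nominal level, such as the 47% cap reported above, means the predicted intervals are too narrow or misplaced.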

2606.00189 2026-06-02 cs.LG cs.AI 版本更新

Learning to Construct Practical Agentic Systems

学习构建实用的智能体系统

Aditya Kumar, Zhihan Lei, Jerry Yan, Joshua W. Momo, Lauhitya Reddy, Rafael Enrique Cabrera Jimenez, Cassandra A. Cohen, Arthur Kajiyama, William W. Cohen

发表机构 * Carnegie Mellon University（卡内基梅隆大学）； Dept. of Computer Science（计算机科学系）； Emory University（埃默里大学）

AI总结本文提出一种基于伪工具和固定工作流的智能体框架，通过模块化设计和多目标优化方法，在保证成本可控和结果质量的前提下，实现实用智能体系统的自动构建与优化。

AI中文摘要

基于LLM的智能体系统的自动设计和优化能够产生复杂的系统，显著提升结果质量，优于现成的智能体模式。然而，对实际部署的智能体系统的研究表明，生产系统更关注推理成本的简单性、可控性和可预测性等问题。本文提出了设计和优化实用智能体系统的原则性方法。我们描述了一个智能体框架，通过定义在受限上下文中递归调用LLM的“伪工具”，使设计者能够强制智能体系统的模块化。利用该框架，我们为多种任务手工设计了智能体，并表明相对于动态规划的工作流，手工构建的固定工作流通常更便宜且更准确。随后，我们提出了针对该框架所需的智能体组件（即伪工具和固定工作流）的新型学习方法。这些学习方法通常优于手工设计的智能体。我们还利用框架的模块化特性，应用多目标优化方法联合优化成本和响应质量，并融合多个学习系统的结果。

英文摘要

Automated design and optimization of agentic LLM-based systems leads to sophisticated systems that substantially improve result quality over off-the-shelf agentic patterns. However, studies of fielded agentic systems show that production systems focus much more on issues such as simplicity, controllability, and predictability of inference costs. In this paper we propose principled approaches to designing and optimizing practical agentic systems. We describe an agent framework that enables designers to enforce modularity in agentic systems, by defining "pseudo-tools" that call LLMs recursively on a restricted context. Using this framework we hand-engineer agents for a diverse set of tasks, and show that relative to dynamically-planned workflows, hand-constructed fixed workflows are generally cheaper and more accurate. We then propose novel learning methods for the agentic components required by this framework, namely pseudo-tools and fixed workflows. These learning methods generally outperform hand-engineered agents. We also exploit the modularity of the framework to apply multi-objective optimization methods to jointly optimize cost and response quality and blend the results of multiple learning systems.

2606.00187 2026-06-02 cs.LG cond-mat.mtrl-sci 版本更新

AI-Guided Design and Optimization of Graphite-Based Anodes via Iterative Experimental Feedback

基于迭代实验反馈的AI引导石墨负极设计与优化

Qian Du, Mark M. Sullivan, James E. Saal, Florian Huber

发表机构 * Citrine Informatics ； hte GmbH

AI总结提出一种迭代AI引导工作流，通过多目标逆向设计和反馈标签，将电池负极制造成功率从频繁失败提升至100%，高容量电池比例从28.4%增至84.8%，容量保持率从42.1%升至97.3%。

Comments 12 pages, 10 figures, 2 tables

AI中文摘要

本研究提出一种迭代AI引导工作流，通过提高配方可行性和工艺鲁棒性来加速石墨负极开发。利用Citrine平台实现AI/ML引导的多目标逆向设计以优化负极。从嘈杂、不完整的数据集开始，Citrine平台生成早期代理模型，尽管预测确定性低，但突出了缺失的工艺约束。通过迭代添加可行性标签和边界条件失败，工作流迅速收敛到可制造、性能更高的配方。制造可靠性从频繁的工艺失败提高到100%成功的电池生产，而提供≥350 mAh g$^{-1}$的电池比例从28.4%增加到84.8%，容量保持率从42.1%上升到97.3%。这些结果表明，结构化的、反馈驱动的AI工作流可以将不完美的工业数据转化为可操作的指导，实现更快、更可重复的电池电极制造优化。

英文摘要

This study presents an iterative AI-guided workflow that accelerates graphite-based anode development by improving both formulation feasibility and process robustness. Sequential learning via AI/ML-guided multiobjective inverse design for anode optimization was implemented using the Citrine Platform. Starting from a noisy, incomplete dataset, the Citrine Platform was used to generate early surrogate models, which despite low predictive certainty highlighted missing process constraints. By iteratively adding feasibility labels and boundary condition failures, the workflow rapidly converged toward manufacturable, higher-performing formulations. Fabrication reliability improved from frequent process failures to 100% successful cell production, while the fraction of cells delivering $\geq$ 350 mAh g$^{-1}$ increased from 28.4% to 84.8%, with capacity retention rising from 42.1% to 97.3%. These results demonstrate that structured, feedback-driven AI workflows can transform imperfect industrial data into actionable guidance, enabling faster, more reproducible optimization of battery electrode manufacturing.

2606.00183 2026-06-02 cs.LG cs.AI math.OC stat.ML 版本更新

Agentic Transformers Provably Learn to Search via Reinforcement Learning

智能体Transformer通过强化学习可证明地学会搜索

Tong Yang, Yu Huang, Yingbin Liang, Yuejie Chi

发表机构 * Carnegie Mellon University（卡内基梅隆大学）； University of Pennsylvania（宾夕法尼亚大学）； The Ohio State University（俄亥俄州立大学）； Yale University（耶鲁大学）

AI总结本文通过构建双头Transformer实现随机深度优先搜索，并分析策略梯度训练动力学，证明该搜索机制能从稀疏强化反馈中分阶段涌现，且具备深度泛化能力。

AI中文摘要

树搜索是许多语言智能体推理和决策任务的核心抽象：智能体必须探索动作、记住失败并回溯到有希望的替代方案。然而，我们缺乏对基于Transformer的策略如何从强化学习（RL）的训练动态中获得这种搜索能力的理论理解。我们在一个随机的$k$叉树环境中研究这个问题，其中智能体Transformer仅通过交互观察其轨迹历史，并在到达隐藏的叶子目标节点时获得终端奖励。我们首先构建了一个实现随机深度优先搜索（DFS）的双头Transformer：一个头跟踪之前的动作，而另一个头检测失败结果并触发回溯。然后，我们分析了在深度课程下的策略梯度训练动态，表明相同的DFS机制在没有专家演示的情况下，从稀疏强化反馈中分阶段涌现。得到的策略表现出深度泛化能力：仅在深度为1和2的树上训练后，它能在更深的完整树上成功。我们进一步表明，在非平衡目标分布下，对回报进行折扣会导致一种排序的DFS策略，优先考虑高概率分支。总的来说，我们的结果确定了基于Transformer的搜索的一种机制性标准形式，其中注意力头专门化并协作，从上下文中提取与决策相关的轨迹，并通过RL训练将其转化为智能体动作选择。

英文摘要

Tree search is a central abstraction behind many language-agent reasoning and decision-making tasks: agents must explore actions, remember failures, and backtrack toward promising alternatives. Yet, we lack a theoretical understanding of how transformer-based policies acquire such search capabilities from the training dynamics of reinforcement learning (RL). We study this question in a stochastic $k$-ary tree environment, where an agentic transformer observes only its trajectory history through interaction and receives a terminal reward for reaching a hidden leaf goal node. We first construct a two-head transformer that implements randomized depth-first search (DFS): one head tracks previous actions, while the other detects failure outcomes and triggers backtracking. We then analyze the training dynamics of policy gradient under a depth-wise curriculum, showing that this same DFS mechanism emerges in stages from sparse reinforcement feedback without expert demonstrations. The resulting policy exhibits depth generalization: after training only on depth-$1$ and depth-$2$ trees, it succeeds on deeper full trees. We further show that, under imbalanced goal distributions, discounting the return leads to a ranked DFS policy that prioritizes higher-probability branches. Overall, our results identify a mechanistic normal form for transformer-based search, in which attention heads specialize and cooperate to extract decision-relevant traces from context and convert them into agentic action selection via RL training.
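For readers unfamiliar with the target behavior, here is a minimal classical sketch of randomized depth-first search with backtracking on a full k-ary tree. It is not the transformer construction from the paper; the path encoding as a tuple of child indices and the function name are assumptions made for illustration.

```python
import random

def randomized_dfs(k, depth, goal, rng):
    """Search a full k-ary tree for the leaf `goal` (a tuple of child indices).

    Returns the leaves visited, in order, ending with the goal leaf.
    """
    visited_leaves = []

    def visit(path):
        if len(path) == depth:
            visited_leaves.append(path)
            return path == goal            # reward is only observed at the goal leaf
        children = list(range(k))
        rng.shuffle(children)              # random order over the untried actions
        for child in children:
            if visit(path + (child,)):
                return True
        return False                       # every child failed: backtrack to the parent

    visit(())
    return visited_leaves

leaves = randomized_dfs(k=3, depth=2, goal=(1, 2), rng=random.Random(0))
```

The two ingredients mirror the two roles named in the abstract: remembering which actions were already tried, and backtracking once a subtree is exhausted.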

2606.00180 2026-06-02 cs.LG cs.AI 版本更新

Beyond Augmentation: Score-Guided Pathological Prior for EEG-based Depression Detection

超越增强：基于评分引导的病理先验用于脑电图抑郁症检测

Xiaojing Chen, Jingqi Cheng, Xu Zhao, Wan Jiang, Jingjing Wu

发表机构 * School of Internet, Anhui University（安徽大学互联网学院）； School of Computer Science and Technology, Hefei University of Technology（合肥工业大学计算机科学与技术学院）； School of Computer Science and Information Engineering, Hefei University of Technology（合肥工业大学计算机科学与信息工程学院）

AI总结针对脑电图抑郁症检测中的小样本困境，提出无数据增强的评分引导分类框架，利用生成网络建模病理先验并融合深度特征，同时设计跨通道空间适应模块解决多中心数据集硬件异构问题。

AI中文摘要

基于深度学习的脑电图（EEG）重度抑郁症（MDD）检测从根本上受到“小样本困境”的制约。主流的生成式数据增强方法不仅带来沉重的计算开销，还可能引入合成噪声，从而模糊分类边界。为了挑战传统的“数据数量优先”惯例，我们提出了一种新颖的框架“超越增强”：评分引导分类（SGC）。SGC不合成伪样本，而是利用无监督生成网络架构对样本的结构和统计异常程度进行建模，作为核心的“病理先验”。该先验经过鲁棒归一化后，与深度特征表示显式融合，从而精确指导分类器的决策边界。此外，为了动态适应不同的通道配置，我们提出了跨通道空间适应模块，利用空间映射机制有效解决多中心数据集中不匹配通道的硬件异构问题。在Mumtaz2016和高密度MODMA数据集上的大量实验证明了我们的方法在具有挑战性的“零数据增强”设置和“零样本合成成本”下的有效性和卓越的泛化能力。

英文摘要

Deep learning-based Major Depressive Disorder (MDD) detection using Electroencephalography (EEG) is fundamentally constrained by the "small-sample dilemma." Prevailing generative data augmentation methods not only incur heavy computational overhead but also risk introducing synthetic noise, thereby blurring classification boundaries. To challenge the traditional "data quantity first" convention, we propose a novel framework "Beyond Augmentation": Score-Guided Classification (SGC). SGC does not synthesize pseudo-samples; instead, it utilizes an unsupervised generative network architecture to model the structural and statistical anomaly degrees of samples, serving as the core "Pathological Prior". This prior, after robust normalization, is explicitly fused with deep feature representations, thereby precisely guiding the classifier's decision boundary. Furthermore, to dynamically adapt to varying channel configurations, we propose a Cross-Channel Spatial Adaptation module, utilizing a spatial mapping mechanism to effectively resolve the hardware heterogeneity of mismatched channels in multi-center datasets. Extensive experiments on the Mumtaz2016 and high-density MODMA datasets demonstrate the effectiveness and exceptional generalizability of our method under the challenging "zero data augmentation" setting and at "zero sample synthesis cost".

Keywords: Electroencephalography (EEG), Depression Detection, Anomaly Score, Diffusion Models, Few-Shot Learning

2606.00169 2026-06-02 cs.LG cs.AI 版本更新

通过重试在策略梯度强化学习中涌现探索行为

Soichiro Nishimori, Paavo Parmas, Sotetsu Koyamada, Tadashi Kozuno, Toshinori Kitamura, Shin Ishii, Yutaka Matsuo

发表机构 * University of Tokyo（东京大学）； Aalto University（阿尔托大学）

AI总结提出ReMax目标函数，通过最大化M个样本的期望最大回报来使探索行为自然涌现，并推导策略梯度公式及RePPO算法，在MinAtar和Craftax基准上无需显式探索奖励即可促进探索。

AI中文摘要

在强化学习（RL）中，智能体从探索中获益仅仅是因为它们反复遇到相似的状态：尝试不同的动作可以提高性能或减少不确定性；没有这样的重试，贪婪策略是最优的。我们通过ReMax形式化这一直觉，该目标函数根据$M$个样本（$M$为正整数）的期望最大回报来评估策略，同时考虑回报的不确定性。优化该目标函数会使随机探索作为涌现属性出现，无需显式奖励项。为了实现高效的策略优化，我们为ReMax推导了新的策略梯度公式，并引入ReMax PPO（RePPO），这是一种PPO变体，它优化ReMax的同时将离散重试次数$M$推广为连续参数$m>0$，从而实现对探索的细粒度控制。实验上，RePPO在MinAtar和Craftax基准上无需任何显式探索奖励即可促进探索。

英文摘要

In reinforcement learning (RL), agents benefit from exploration only because they repeatedly encounter similar states: trying different actions can improve performance or reduce uncertainty; without such retries, a greedy policy is optimal. We formalize this intuition with ReMax, an objective that evaluates a policy by the expected maximum return over $M$ samples, where $M$ is a positive integer, while accounting for return uncertainty. Optimizing this objective induces stochastic exploration as an emergent property, without explicit bonus terms. For efficient policy optimization, we derive a new policy-gradient formulation for ReMax and introduce ReMax PPO (RePPO), a PPO variant that optimizes ReMax while generalizing the discrete retry count $M$ to a continuous parameter $m > 0$, enabling fine-grained control of exploration. Empirically, RePPO promotes exploration, without any explicit exploration bonuses, on the MinAtar and Craftax benchmarks.
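To make the "expected maximum return over $M$ samples" concrete, the following sketch estimates it from a list of observed returns using the order-statistics identity $P(\max \le r_{(i)}) = (i/n)^M$ for i.i.d. draws from the empirical distribution. This is a plain estimator chosen for illustration, not the policy-gradient formulation derived in the paper, and it ignores the return-uncertainty term mentioned above; note that the same formula remains well defined for a non-integer $m > 0$.

```python
def expected_max_return(returns, m):
    """Expected maximum of m i.i.d. draws from the empirical distribution of `returns`.

    The i-th smallest return receives weight (i/n)**m - ((i-1)/n)**m,
    which is the probability that it is the maximum of the m draws.
    """
    ordered = sorted(returns)
    n = len(ordered)
    total = 0.0
    for i, r in enumerate(ordered, start=1):
        weight = (i / n) ** m - ((i - 1) / n) ** m
        total += weight * r
    return total

returns = [0.0, 1.0, 2.0, 3.0]
mean_return = expected_max_return(returns, m=1)    # m = 1 recovers the ordinary mean
best_of_two = expected_max_return(returns, m=2)
best_of_1_5 = expected_max_return(returns, m=1.5)  # continuous retry count
```

Larger `m` shifts weight toward the upper tail of the return distribution, which is why an objective of this shape can reward policies that occasionally try risky actions.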

2606.00147 2026-06-02 cs.LG cs.AI 版本更新

RAFT: Data Refinement and Adaptive Distillation for Domain Fine-Tuning with Alleviated Forgetting

RAFT：用于缓解遗忘的领域微调的数据精炼与自适应蒸馏

Yuduo Li, Xiaofeng Shi, Qian Kou, Longbin Yu, Hua Zhou

发表机构 * Beijing Academy of Artificial Intelligence (BAAI)（北京人工智能研究院）； Beijing Jiaotong University (BJTU)（北京交通大学）

AI总结提出RAFT框架，通过数据精炼（自条件重写、语义过滤、答案融合）和答案条件同策略蒸馏（top-K温度蒸馏、EMA自适应损失平衡）来解决领域微调中的监督兼容性差距和轨迹保持差距，在提升领域性能的同时缓解通用能力退化。

Comments preprint

AI中文摘要

领域特定的监督微调（SFT）通常能提高领域内性能，但代价是模型通用能力的下降。我们将这种退化归因于领域SFT中的两个实际差距：监督兼容性差距，即领域目标在风格和推理格式上与原始模型的自然响应不同；以及轨迹保持差距，即教师强制SFT优化固定目标令牌，而不约束模型在其自身生成前缀上的行为。这个过程未能保留模型的原始行为。我们提出RAFT（用于缓解遗忘的领域微调的数据精炼与自适应蒸馏），一个两阶段框架来解决这两个因素。首先，RAFT通过自条件重写、语义过滤和答案融合构建模型兼容的监督。其次，RAFT执行答案条件同策略蒸馏（On-Policy Distillation），其中原始指令调优模型在学生生成的轨迹上提供软目标，同时以融合答案作为有用上下文进行条件化。我们进一步引入top-K温度蒸馏和基于EMA的自适应损失平衡来稳定领域-通用权衡。在三个指令调优骨干和五个领域上，RAFT相比标准SFT将平均领域准确率提高了23.2%，同时恢复了MS-Bench和IFEval上SFT引起的部分退化，相对改进分别为18.2%和10.2%。这些结果表明，将数据精炼与轨迹级保持相结合为缓解遗忘的领域微调提供了有效方案。

英文摘要

Domain-specific supervised fine-tuning (SFT) often improves in-domain performance at the cost of degrading a model's general capabilities. We view this degradation through two practical gaps in domain SFT: a supervision-compatibility gap, where domain targets differ in style and reasoning format from the original model's natural responses, and a trajectory-preservation gap, where teacher-forced SFT optimizes fixed target tokens without constraining the model's behavior on its own generated prefixes. This process fails to preserve the model's original behavior. We propose RAFT (Data Refinement and Adaptive Distillation for Domain Fine-Tuning with Alleviated Forgetting), a two-stage framework that addresses both factors. First, RAFT constructs model-compatible supervision through self-conditioned rewriting, semantic filtering, and answer fusion. Second, RAFT performs Answer-Conditioned On-Policy Distillation, where the original instruction-tuned model provides soft targets on student-generated trajectories while being conditioned on the fused answer as helpful context. We further introduce top-K temperature distillation and EMA-based adaptive loss balancing to stabilize the domain-general trade-off. Across three instruction-tuned backbones and five domains, RAFT improves average domain accuracy by 23.2% over standard SFT, while recovering part of the SFT-induced degradation on MS-Bench and IFEval, with relative improvements of 18.2% and 10.2%, respectively. These results show that coupling data refinement with trajectory-level preservation provides an effective recipe for domain fine-tuning with alleviated forgetting.

2606.00144 2026-06-02 cs.LG cs.AI 版本更新

BudgetDraft: Acceptance-Aware Multi-View Training for Sparse-KV Speculative Decoding

BudgetDraft：面向稀疏KV投机解码的接受感知多视角训练

Liang He, Jingbo Wen, Qishi Zhan, Yixiong Chen, Kangning Cui, Qizhen Lan, Xilu Wang

发表机构 * Shanghai Institute of Optics and Fine Mechanics（上海光学精密机械研究所）； The University of Sydney（悉尼大学）； Marquette University（马基特大学）； Johns Hopkins University（约翰·霍普金斯大学）； Wake Forest University（威克森林大学）； University of Texas Health Science Center at Houston（德克萨斯大学健康科学中心休斯顿分部）； University of Surrey（萨里大学）

AI总结针对中长上下文推理中稀疏/全缓存不匹配导致接受率下降的问题，提出BudgetDraft多视角稀疏训练方法，通过接受感知损失和多视角损失训练单一鲁棒草稿模型，在固定KV预算下恢复接受率，实现最高6.55倍加速。

AI中文摘要

投机解码通过草稿模型提出多个令牌，验证器并行验证，从而加速自回归解码。在资源受限的部署中，草稿模型使用稀疏KV缓存以在固定KV预算下限制峰值GPU内存和端到端延迟，而验证器保留全KV缓存。实际应用中常见中长上下文推理（4K--16K上下文长度）。然而，随着上下文长度增长，朴素稀疏/全投机解码遭受稀疏/全不匹配问题，导致接受率快速下降。我们提出BudgetDraft，一种用于中长推理中稀疏草稿的多视角稀疏训练方法。草稿模型在训练期间暴露于多个采样的KV预算，并学习将每个稀疏视角与一个共享的全缓存教师目标对齐。BudgetDraft将全缓存分支上的接受感知损失与稀疏缓存分支上的多视角损失相结合，产生一个单一的预算鲁棒草稿模型，无需额外的推理时组件即可恢复跨稀疏级别的接受率。在PG-19、LongBench和LWM上的实验结果表明，BudgetDraft在4K、8K和16K上下文长度下，与自回归相比分别实现了最高6.55倍、4.46倍和2.10倍的端到端加速，同时保持推理流水线内存友好。

英文摘要

Speculative decoding speeds up autoregressive decoding by using a drafter to propose multiple tokens that a verifier validates in parallel. In resource-constrained deployments, the drafter uses a sparse KV cache to limit peak GPU memory and end-to-end latency under a fixed KV budget, while the verifier keeps a full KV cache. Mid-to-long context inference (4K--16K context length) is common in real applications. However, naive sparse/full speculative decoding suffers from the sparse/full mismatch as context length grows, causing the acceptance rate to drop quickly. We propose BudgetDraft, a multi-view sparse training method for sparse drafting in mid-to-long inference. The drafter is exposed to multiple sampled KV budgets during training and learns to align each sparse view with one shared full-cache teacher target. BudgetDraft combines an acceptance-aware loss on a full-cache branch with a multi-view loss on a sparse-cache branch, producing a single budget-robust drafter that recovers acceptance across sparsity levels without extra inference-time components. Experimental results on PG-19, LongBench, and LWM show that BudgetDraft achieves up to 6.55x, 4.46x, and 2.10x end-to-end speedup vs AR at 4K, 8K, and 16K context lengths, while keeping the inference pipeline memory-friendly.
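The draft-then-verify loop that the acceptance rate refers to can be sketched for the greedy case as follows. This is the generic speculative-decoding acceptance rule under greedy decoding, not BudgetDraft's training method; the token ids are made up, and `verifier_tokens` is assumed to hold the verifier's preferred token at each drafted position plus one extra position.

```python
def verify_greedy(draft_tokens, verifier_tokens):
    """Keep the longest prefix on which drafter and verifier agree.

    At the first disagreement the verifier's token is emitted instead; if every
    drafted token is accepted, the verifier's extra (bonus) token is appended.
    Expects len(verifier_tokens) == len(draft_tokens) + 1.
    """
    output = []
    for drafted, verified in zip(draft_tokens, verifier_tokens):
        if drafted != verified:
            output.append(verified)        # correction ends this decoding step
            return output
        output.append(drafted)
    output.append(verifier_tokens[len(draft_tokens)])  # all accepted: bonus token
    return output

partial = verify_greedy([5, 7, 9], [5, 7, 2, 4])   # third drafted token is rejected
full = verify_greedy([5, 7], [5, 7, 3])            # both drafted tokens accepted
```

Each step always emits at least one verifier-approved token, so speedup depends on how many drafted tokens survive; a drafter whose sparse cache drifts from the verifier's full cache loses exactly this prefix length.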

2606.00141 2026-06-02 cs.LG cs.AI 版本更新

Adaptive data selection improves wearable prediction under low baseline performance

自适应数据选择改善低基线性能下的可穿戴预测

Ali Kargarandehkordi

AI总结本研究通过评估多种模态下自适应时间窗口选择策略，发现其能显著提升低基线性能参与者的AUROC（最高提升0.7），而高基线性能者收益有限或为负，且增益与基线性能呈强负相关。

AI中文摘要

自适应传感策略通过选择性采样数据，在有限数据预算下提高预测性能，在可穿戴健康系统中应用日益广泛，但其在不同个体间的收益尚不明确。本文基于纵向可穿戴数据集，评估了在固定测量预算下，针对心率、活动和生态瞬时评估（EMA）等多种传感模态，自适应选择时间窗口进行模型训练的效果。我们使用接收者操作特征曲线下面积（AUROC）和F1分数量化了相对于随机采样的性能提升。自适应策略为基线性能较低的参与者带来了显著的AUROC提升（增益高达0.7），而对基线性能较强的参与者增益有限甚至为负。跨模态来看，自适应增益与基线性能呈强负相关（Pearson r = -0.67；Spearman ρ = -0.62）。在参与者层面，大多数个体在AUROC上受益（跨模态为60-80%），尽管F1的改进较小且一致性较差。这些发现表明，自适应传感并非普遍有益，而是在性能不佳的情况下提供最大价值。我们的结果支持基于基线性能定制自适应传感的选择性部署策略，以提高可穿戴健康监测的效率。

英文摘要

Adaptive sensing strategies that selectively sample data are increasingly used in wearable health systems to improve prediction performance under limited data budgets, yet their benefits across individuals remain poorly understood. Here, we evaluate adaptive selection of time windows for model training under fixed measurement budgets across multiple sensing modalities, including heart rate, activity, and ecological momentary assessment (EMA), in a longitudinal wearable dataset. We quantify performance gains relative to random sampling using both area under the receiver operating characteristic curve (AUROC) and F1 score. Adaptive strategies yield substantial improvements in AUROC for participants with low baseline performance (with gains up to 0.7), while offering limited or negative gains for participants with strong baselines. Across modalities, adaptive gain is strongly inversely correlated with baseline performance (Pearson r = -0.67; Spearman ρ = -0.62). At the participant level, most individuals benefit in AUROC (60-80% across modalities), although improvements in F1 are smaller and less consistent. These findings show that adaptive sensing is not uniformly beneficial, but instead provides the greatest value in underperforming settings. Our results support selective deployment strategies that tailor adaptive sensing based on baseline performance to improve efficiency in wearable health monitoring.

2606.00136 2026-06-02 cs.LG cs.AI cs.CL cs.CR cs.SI 版本更新

Generative AI and Digital Ecosystem Resilience: A Proactive Lifecycle-Based Survey

生成式AI与数字生态系统韧性：基于生命周期的主动式综述

Jonghyun Chung, Rishabh Chaddha, Sanket Badhe, Debanshu Das, Nathan Huang, Amanpreet Kaur

发表机构 * Google LLC（谷歌有限公司）

AI总结本文采用基于生命周期的C5交互模型，综合机器学习与社会科学方法，系统综述了针对生成式AI驱动的对抗性合成内容的主动检测技术，包括协调不真实行为分析、流行病学建模和霍克斯过程等，旨在构建更具韧性的信息生态系统。

Comments 14 pages, 3 figures, 3 tables. Accepted for publication in IEEE Access (May 2026)

Journal ref: IEEE Access (2026)

AI中文摘要

生成式AI加速了对抗性合成内容的扩散，使得传统的被动检测方法失效。本综述综合了新兴研究，展示了向主动检测新兴不真实叙事的范式转变。我们采用统一的、基于生命周期的分类法，将对抗性活动的社会技术生命周期模型与新兴不真实叙事检测的高级计算方法相结合。通过围绕C5交互模型（背景、原因、内容、放大循环、后果）构建分析，我们整合了机器学习和社会科学的不同研究流。为了区分合成放大模式与真实基线流量，本文综述了建模新叙事创建、播种和传播的最先进技术，包括协调不真实行为分析、流行病学建模和霍克斯过程。本综述还系统回顾了C5交互模型不同阶段对抗性威胁的主动检测方法，特别是高维嵌入空间中的异常检测、多层图上的无监督协调检测以及代理型AI系统。最后，本综述探讨了生成式AI带来的挑战，包括追踪快速变化威胁和多级分布漂移的困难，并概述了未来研究议程，重点在于检测异常聚类和构建预期性及韧性系统。本综述为更韧性的信息生态系统提供了基于生命周期的主动检测新兴合成威胁方法的全面回顾。

英文摘要

The proliferation of adversarial synthetic content, accelerated by Generative AI (GenAI), is rendering traditional reactive detection methods ineffective. This survey synthesizes emerging research to demonstrate a paradigm shift toward the proactive detection of emerging inauthentic narratives. We adopt a unified, lifecycle-based taxonomy to combine socio-technical lifecycle models of adversarial campaigns with advanced computational methodologies for emerging inauthentic narrative detection. By structuring the analysis around the C5 Interaction Model (Context, Causes, Content, Cycle of Amplification, Consequences), we integrate different research streams from machine learning and social science. To differentiate spread patterns of synthetic amplification from authentic baseline traffic, this paper surveys state-of-the-art techniques for modeling the creation, seeding, and propagation of fresh narratives, including the analysis of Coordinated Inauthentic Behavior (CIB), epidemiological modeling, and Hawkes processes. This survey also provides a systematic review of proactive detection methods for adversarial threats at different stages of the C5 Interaction Model, specifically anomaly detection in high-dimensional embedding spaces, unsupervised coordination detection on multi-layer graphs, and agentic AI systems. Finally, this survey addresses challenges posed by GenAI, including the difficulty of tracking rapidly changing threats and multi-level distributional drift, and it outlines a future research agenda focused on detecting anomalous clusters and building anticipatory and resilient systems. This survey provides a comprehensive, lifecycle-based review of methods for the proactive detection of emerging synthetic threats for more resilient information ecosystems.

2606.00135 2026-06-02 cs.LG cs.AI 版本更新

On Effectiveness and Efficiency of Agentic Tool-calling and RL Training

论智能体工具调用与强化学习训练的有效性与效率

Tong Liu, Cheng Qian, Matej Cief, Yuan He, Daniele Dan, Nikolaos Aletras, Gabriella Kazai

发表机构 * University of California, Berkeley（加州大学伯克利分校）； University of Cambridge（剑桥大学）； University of Toronto（多伦多大学）

AI总结本文系统分析工具调用评估中的实现选择对结果敏感性的影响，并针对强化学习训练中的计算浪费提出两种加速技术。

Comments ICML 2026

AI中文摘要

工具调用是现代大型语言模型（LLM）智能体的核心组件，使其具备超越参数化知识的技能。本文从两个互补维度研究工具调用：有效性（即如何衡量该能力）和效率（即如何学习该能力）。在有效性方面，我们系统分析了工具调用评估流程，并表明结果可能对看似微小、通常未文档化的实现选择高度敏感，包括随机种子、系统提示、多轮模板构建以及先前交互/推理历史的传递方式。这些选择可能导致报告性能的显著差异，尤其是在多轮设置中，若缺乏严格标准化，排行榜排名将不可靠。在效率方面，我们考察了用于工具调用的标准强化学习（RL），并识别出两个计算浪费来源：（i）在 rollout 过程中，许多提示不产生学习信号；（ii）在策略更新过程中，优化产生高计算成本。基于这些发现，我们引入了两种加速基于 RL 的工具调用训练的技术，在不降低性能的情况下实现了显著的挂钟时间加速。

英文摘要

Tool-calling is a central component of modern large language model (LLM) agents, equipping them with skills beyond their parametric knowledge. This paper studies tool-calling along two complementary axes: effectiveness, i.e., how this capability is measured, and efficiency, i.e., how it is learned. On effectiveness, we systematically analyze tool-calling evaluation pipelines and show that results can be highly sensitive to seemingly minor, often undocumented implementation choices including the random seed, system prompt, multi-turn template construction, and how prior interaction/reasoning history is carried forward. These choices can lead to substantial differences in reported performance, especially in multi-turn settings where without rigorous standardization, leaderboard rankings are unreliable. On efficiency, we examine standard reinforcement learning (RL) for tool-calling and identify two sources of computational waste: (i) during rollouts, many prompts produce no learning signal, and (ii) during policy updates, optimization incurs high computational cost. Guided by these findings, we introduce two techniques that accelerate RL-based tool-calling training, achieving substantial wall-clock speedup without degrading performance.
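One way a prompt can "produce no learning signal" is when every rollout for it earns the same reward, so any group-relative advantage is zero. The sketch below separates such prompts from informative ones. It assumes a group-relative (GRPO-style) setup, which the abstract does not state, and the prompt names and rewards are invented.

```python
def split_by_signal(rollout_rewards):
    """Split prompts into those with varied rewards and those with identical rewards.

    `rollout_rewards` maps a prompt id to the list of rewards from its rollouts.
    Prompts whose rollouts all score the same carry zero group-relative advantage.
    """
    informative, wasted = [], []
    for prompt, rewards in rollout_rewards.items():
        if max(rewards) != min(rewards):
            informative.append(prompt)
        else:
            wasted.append(prompt)
    return informative, wasted

rollout_rewards = {
    "always_solved": [1, 1, 1, 1],
    "mixed": [0, 1, 0, 1],
    "never_solved": [0, 0, 0, 0],
}
informative, wasted = split_by_signal(rollout_rewards)
```

Comparing the maximum and minimum rewards avoids floating-point noise that a mean-subtraction check could introduce.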

2606.00134 2026-06-02 cs.CR cs.AI cs.LG 版本更新

XAI-SOH-FL: Enhancing SOH-FL with Adaptive Aggregation and Explainable AI for Intrusion Detection in Heterogeneous IoT

XAI-SOH-FL: 通过自适应聚合和可解释人工智能增强异构物联网入侵检测中的SOH-FL

Ambreen Aslam, Maaz Hassan, Bibi Zahra, Muhammad Khuram Shahzad

发表机构 * School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST)（国立科技大学（NUST）电气工程与计算机科学学院（SEECS））

AI总结针对异构物联网中数据异构、标签稀缺和模型不可解释性问题，提出XAI-SOH-FL框架，通过自适应聚合（动态γ选择与贝叶斯优化）和SHAP可解释性，在CICIDS2017数据集上达到94.12%准确率和0.92 F1分数，优于基线SOH-FL。

Comments 8 pages, 6 figures; code available at https://github.com/aaslam-msit/SOH-FL-Enhancement

AI中文摘要

物联网环境中的入侵检测系统面临数据异构、缺乏标记数据和模型可解释性有限等重大挑战。联邦学习提供了一种隐私保护解决方案；然而，现有方法如SOH-FL存在两个关键限制：依赖手动调整的聚合参数γ以及模型预测缺乏可解释性。在本文中，我们提出XAI-SOH-FL，一个增强框架，将自适应聚合和可解释人工智能集成到SOH-FL范式中。首先，我们引入基于相似性阈值的动态γ选择机制，使聚合过程能够适应不断变化的数据分布。其次，采用贝叶斯优化自动确定最优γ值，消除了手动调整的需要。第三，引入SHAP（SHapley Additive exPlanations）为入侵检测决策提供特征级可解释性。在CICIDS2017数据集上的实验评估表明，所提方法达到了94.12%的准确率和0.92的F1分数，优于基线SOH-FL模型，同时收敛所需的通信轮次更少。此外，基于SHAP的分析揭示，流级特征如流持续时间和数据包长度显著影响模型预测。这些结果表明，XAI-SOH-FL在异构物联网环境中提供了准确性、适应性和可解释性之间的有效平衡。

英文摘要

Intrusion Detection Systems (IDS) in Internet of Things (IoT) environments face significant challenges due to data heterogeneity, lack of labeled data, and limited model interpretability. Federated Learning (FL) offers a privacy-preserving solution; however, existing approaches such as SOH-FL suffer from two key limitations: reliance on a manually tuned aggregation parameter γ and lack of explainability in model predictions. In this paper, we propose XAI-SOH-FL, an enhanced framework that integrates adaptive aggregation and explainable artificial intelligence into the SOH-FL paradigm. First, we introduce a dynamic γ selection mechanism based on similarity thresholding, enabling the aggregation process to adapt to evolving data distributions. Second, Bayesian Optimization is employed to automatically determine optimal γ values, eliminating the need for manual tuning. Third, SHAP (SHapley Additive exPlanations) is incorporated to provide feature-level interpretability for intrusion detection decisions. Experimental evaluation on the CICIDS2017 dataset demonstrates that the proposed approach achieves an accuracy of 94.12% and an F1-score of 0.92, outperforming the baseline SOH-FL model while converging in fewer communication rounds. Furthermore, SHAP-based analysis reveals that flow-level features such as Flow Duration and Packet Length significantly influence model predictions. These results indicate that XAI-SOH-FL provides an effective balance between accuracy, adaptability, and interpretability in heterogeneous IoT environments.

2606.00133 2026-06-02 cs.LG cs.ET 版本更新

World Models: A Comprehensive Survey of Architectures, Methodologies, Reasoning Paradigms, and Applications

世界模型：架构、方法论、推理范式与应用的全面综述

Arif Hassan Zidan, Yi Pan, Hanqi Jiang, Ruiyu Yan, Wei Ruan, Zihao Wu, Lifeng Chen, Weihang You, Xinliang Li, Bowen Chen, Huawen Hu, Peilong Wang, Sizhuang Liu, Jing Zhang, Siyuan Li, Zhengliang Liu, Yu Bao, Lin Zhao, Lichao Sun, Dajiang Zhu, Xiang Li, Jinglei Lv, Quanzheng Li, Wei Liu, Tianming Liu, Wei Zhang

发表机构 * School of Computer and Cyber Sciences, Augusta University（奥古斯塔大学计算机与网络科学学院）； School of Computing, University of Georgia（佐治亚大学计算机学院）； Department of Biomedical Engineering, New Jersey Institute of Technology（新泽西理工学院生物医学工程系）； Department of Radiology, Massachusetts General Hospital, Harvard Medical School（麻省总医院放射科，哈佛医学院）； Department of Computer Science and Engineering, University of Texas at Arlington（德克萨斯大学阿灵顿分校计算机科学与工程系）； Department of Graduate Psychology, James Madison University（詹姆斯麦迪逊大学研究生心理学系）； Computer Science and Engineering, Lehigh University（理海大学计算机科学与工程系）； School of Biomedical Engineering, The University of Sydney（悉尼大学生物医学工程学院）； Tandon School of Engineering, New York University（纽约大学坦登工程学院）； Department of Radiation Oncology, City of Hope National Medical Center（希望之城国家医学中心放射肿瘤科）； Department of Mayo Clinic Comprehensive Cancer Center, Mayo Clinics（梅奥诊所综合癌症中心，梅奥诊所）； Savannah River Ecology Laboratory (SREL), University of Georgia（萨凡纳河生态实验室（SREL），佐治亚大学）

AI总结本文提出一个多轴分类法，从架构、方法论、推理策略和应用领域四个维度系统综述世界模型，涵盖从早期认知科学基础到PlaNet、Dreamer系列、MuZero、Sora等里程碑系统，并指出未来方向。

AI中文摘要

世界模型作为学习环境结构和动态的内部模拟器，已成为追求通用人工智能的核心范式，使智能体能够在学习到的表征中进行预测、规划和推理。尽管在强化学习、机器人、自动驾驶和视频生成等领域取得了快速进展，但该领域缺乏一个统一的框架来整合其多样化的架构选择、训练方法、推理机制和应用场景。本综述通过一个多轴分类法填补了这一空白，该分类法沿四个维度组织：(i) 架构，涵盖表征格式、动态公式、输入模态、学习范式和下游应用；(ii) 方法论家族，包括状态空间和循环方法、基于Transformer的模型、基于扩散的生成器、物理信息网络和语言增强的多模态系统；(iii) 推理策略，涵盖基于想象的规划、潜在策略学习、反事实推理和不确定性下的规划；(iv) 应用领域，涵盖机器人、自动驾驶、视频预测、多模态智能体、强化学习、科学建模、医学影像、教育测量以及商业和金融。从早期认知科学基础到PlaNet、Dreamer系列、MuZero、Sora、Cosmos和Genie等里程碑系统，我们考察了这些维度如何相互作用，并强调了思维链推理与世界模型想象的最新融合。我们回顾了评估协议和基准，指出了持续存在的挑战，如复合预测误差、模拟到真实迁移和碎片化评估，并概述了未来方向，包括统一的多模态世界模型、基础规模的交互式模拟器以及在安全关键领域的可靠部署。

英文摘要

World models, internal simulators that learn the structure and dynamics of an environment, have emerged as a central paradigm in the pursuit of artificial general intelligence, enabling agents to predict, plan, and reason within learned representations. Despite rapid progress across reinforcement learning, robotics, autonomous driving, and video generation, the field lacks a unified framework integrating its diverse architectural choices, training methods, reasoning mechanisms, and application settings. This survey addresses that gap with a multi-axis taxonomy organized along four dimensions: (i) architecture, encompassing representation format, dynamics formulation, input modality, learning paradigm, and downstream application; (ii) methodological family, including state-space and recurrent approaches, transformer-based models, diffusion-based generators, physics-informed networks, and language-augmented multimodal systems; (iii) reasoning strategy, covering imagination-based planning, latent policy learning, counterfactual reasoning, and planning under uncertainty; and (iv) application domain, spanning robotics, autonomous driving, video prediction, multimodal agents, reinforcement learning, scientific modeling, medical imaging, educational measurement, and business and finance. Tracing the field from early cognitive-science foundations to milestone systems such as PlaNet, the Dreamer family, MuZero, Sora, Cosmos, and Genie, we examine how these dimensions interact and highlight the recent convergence of chain-of-thought reasoning with world-model imagination. We review evaluation protocols and benchmarks, identify persistent challenges such as compounding prediction errors, sim-to-real transfer, and fragmented evaluation, and outline future directions toward unified multimodal world models, foundation-scale interactive simulators, and safe deployment in safety-critical domains.

2606.00132 2026-06-02 cs.LG cs.AI 版本更新

Foundation-Preserving Adaptation via Generalized Rayleigh-Quotient Optimization

基于广义瑞利商优化的基础模型保留适配

Dongjun Kim, Adrian de Wynter, Huancheng Chen, Heasung Kim, Haris Vikalo

发表机构 * The University of Texas at Austin（德克萨斯大学奥斯汀分校）； Microsoft（微软）； Microsoft AI（微软人工智能）； Meta

AI总结提出FoLoRA框架，通过广义瑞利商优化更新方向，在微调中平衡下游任务性能与预训练能力保留。

详情

AI中文摘要

虽然微调有效地将基础模型适配到专门的下游任务，但可能会降低预训练期间获得的非目标能力。现有的遗忘感知方法通常通过专门的初始化或固定约束寻求更安全的更新，但未在训练过程中调节适配-保留权衡。我们提出基础保留LoRA（FoLoRA），一个遗忘感知优化框架。在一阶保留条件的指导下，FoLoRA定义了预训练代理激活上的遗忘惩罚和下游任务激活上的任务效用。然后，它通过广义瑞利商按单位遗忘惩罚的任务效用对更新方向进行评分。由此产生的谱坐标系实现了方向门控Adam更新，在训练过程中衰减低效用-惩罚方向。为了估计遗忘惩罚，FoLoRA通过从预训练模型中采样构建预训练代理校准数据，而不是依赖单个代理数据集。在数学、代码和指令遵循适配上的实验表明，FoLoRA在基线上实现了最强的保留-适配平衡，提高了目标任务性能，同时最好地聚合保留了非目标能力。

英文摘要

While finetuning effectively adapts foundation models to specialized downstream tasks, it can degrade non-target capabilities acquired during pretraining. Existing forgetting-aware methods typically seek safer updates through specialized initialization or fixed constraints, but do not regulate the adaptation-preservation trade-off during training. We propose Foundation-Preserving LoRA (FoLoRA), a forgetting-aware optimization framework. Guided by a first-order preservation condition, FoLoRA defines a forgetting penalty over pretraining-proxy activations and a task utility over downstream-task activations. It then scores update directions by task utility per unit forgetting penalty via a generalized Rayleigh quotient. The resulting spectral coordinate system enables direction-wise gated Adam updates, attenuating directions with a low utility-to-penalty ratio during training. To estimate the forgetting penalty, FoLoRA constructs pretraining-proxy calibration data by sampling from the pretrained model rather than relying on a single proxy dataset. Experiments on math, code, and instruction-following adaptation show that FoLoRA achieves the strongest preservation-adaptation balance relative to baselines, improving target-task performance while attaining the best aggregate preservation of non-target capabilities.
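
The scoring step named in the abstract, task utility per unit forgetting penalty via a generalized Rayleigh quotient, can be illustrated with a small NumPy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the matrices `utility` and `penalty` are built as second-moment matrices of random stand-in activations, and the function names are hypothetical. The sketch only covers the scoring and the resulting spectral basis; how FoLoRA turns scores into gates for Adam is not specified in the abstract and is not modeled.

```python
import numpy as np


def rayleigh_scores(utility, penalty, directions):
    """Score each column v of `directions` by (v^T U v) / (v^T P v)."""
    num = np.einsum("ik,ij,jk->k", directions, utility, directions)
    den = np.einsum("ik,ij,jk->k", directions, penalty, directions)
    return num / den


def generalized_eigenbasis(utility, penalty):
    """Solve U v = lambda P v for symmetric U and positive-definite P.

    Returns eigenvalues in descending order and the matching directions
    as columns. Each eigenvalue equals the Rayleigh score of its direction.
    """
    chol = np.linalg.cholesky(penalty)             # P = L L^T
    inv_chol = np.linalg.inv(chol)                 # L^{-1}
    whitened = inv_chol @ utility @ inv_chol.T     # L^{-1} U L^{-T}
    eigvals, eigvecs = np.linalg.eigh(whitened)    # ascending order
    basis = inv_chol.T @ eigvecs                   # v = L^{-T} w
    return eigvals[::-1], basis[:, ::-1]


# Hypothetical stand-ins for downstream-task and pretraining-proxy activations.
rng = np.random.default_rng(0)
task_acts = rng.normal(size=(64, 6))
proxy_acts = rng.normal(size=(64, 6))
utility = task_acts.T @ task_acts / 64
penalty = proxy_acts.T @ proxy_acts / 64 + 1e-6 * np.eye(6)  # keep P positive definite

scores, basis = generalized_eigenbasis(utility, penalty)
```

Directions at the front of `basis` give the most task utility per unit of penalty; directions at the back are the ones a gating rule would presumably attenuate.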

2606.00131 2026-06-02 cs.SE cs.AI cs.LG cs.PL 版本更新

AI-PROPELLER: Warehouse-Scale Interprocedural Code Layout Optimization with AlphaEvolve

AI-PROPELLER：基于AlphaEvolve的仓库规模过程间代码布局优化

Chaitanya Mamatha Ananda, Rajiv Gupta, Mircea Trofin, Aiden Grossman, Sriraman Tallam, Xinliang David Li, Amir Yazdanbakhsh

发表机构 * University of California, Riverside（加州大学河滨分校）； Google（谷歌）； DeepMind（深度思维）

AI总结提出AI-PROPELLER系统，利用Magellan智能工作流将Propeller的编译器启发式方法演化为细粒度过程间优化器，并通过实际硬件执行评估布局变体，首次在工业仓库规模应用中实现细粒度过程间代码布局优化，性能提升0.23%至1.6%。

详情

AI中文摘要

后链接优化器（如Propeller和BOLT）已证明，精确的、基于性能剖析的代码布局可以从高度优化的二进制文件中提取显著的性能提升。然而，这些系统目前局限于过程内技术，未能充分利用过程间布局的全局潜力。由于组合爆炸的搜索空间和复杂的调用返回语义难以建模，过程间代码布局历来困难。因此，细粒度过程间布局的性能潜力在实践中尚未得到证实。AI-PROPELLER使用Magellan（一种智能工作流），将Propeller中的编译器启发式方法演化为细粒度过程间优化器，并微调所得策略的超参数。为确保高保真度，我们摒弃了近似的静态成本模型，智能工作流生成多个布局变体，并在实际硬件上执行以测量真实性能计数器，为进化循环提供精确的奖励信号。AI-PROPELLER已在包括大型仓库规模应用在内的多个基准测试上进行了评估，实验表明，在使用最先进的FDO和PLO优化后，性能提升0.23%至1.6%，这对于实际二进制文件而言意义重大。这是首次在工业环境中对大型仓库规模应用进行细粒度过程间代码布局优化。

英文摘要

Post-link optimizers (PLOs) such as Propeller and BOLT have demonstrated that precise, profile-guided code layout can extract significant performance gains from heavily optimized binaries. However, these systems are currently restricted to intraprocedural techniques, leaving the global potential of interprocedural layout largely untapped. Interprocedural code layout has historically been difficult because of a combinatorially intractable search space and complex call-return semantics that are challenging to model. Consequently, the performance potential of fine-grained interprocedural layout remains unproven in practice. AI-PROPELLER uses Magellan, an agentic workflow that evolves the compiler heuristic in Propeller into a fine-grained interprocedural optimizer and fine-tunes the resulting policy's hyperparameters. To ensure high fidelity, we move away from approximate static cost models: the agentic workflow generates multiple layout variants that are executed on actual hardware to measure real performance counters, providing a precise reward signal for the evolutionary loop. AI-PROPELLER has been evaluated on several benchmarks, including large warehouse-scale applications, and experiments show performance improvements of 0.23% to 1.6% over binaries already optimized with state-of-the-art FDO and PLO, which is significant for real-world binaries. This is the first time that large warehouse-scale applications in industrial settings have been optimized with fine-grained interprocedural code layout.

2606.00130 2026-06-02 cs.LG cs.AI 版本更新

Automatically Differentiable Nonlinear Tensor Networks (ADNTNs) for Exponential Compression of Deep Neural Networks

自动可微非线性张量网络（ADNTNs）用于深度神经网络的指数级压缩

Andrzej Cichocki, Michal Wietczak

发表机构 * Institute of Computing Intelligence, Polish Academy of Sciences（波兰科学院计算智能研究所）

AI总结提出自动可微非线性张量网络（ADNTNs）作为结构化权重生成器，通过反向模式自动微分端到端训练紧凑核心张量，实现深度神经网络的高效压缩，在AlexNet和VGG-16上达到每层2000倍至77000倍压缩比，且精度与密集基线相当或更优。

Comments 6 figures, 28 pages, to be submitted to a journal and conference

详情

AI中文摘要

我们研究了自动可微非线性张量网络（ADNTNs），这是一类结构化权重生成器，其紧凑核心张量通过反向模式自动微分（AD）进行端到端训练。该方法可视为低秩适应和张量分解的自然扩展：ADNTN不是使用一个低秩矩阵更新，而是通过小核心、非线性激活和可选的横向混合张量的层次结构构建大权重张量。本文聚焦于三种架构：树张量网络（TTNs）、带边界解缠器的增强型TTN（aTTNs）以及多尺度纠缠重整化拟设（MERA）。该公式支持非线性激活、任务感知目标、批处理以及硬件感知的执行调度。同时，本文明确区分了对收缩程序进行“微分”与使收缩变得没有代价这两件事：AD并未消除大中间体、不良收缩顺序或一般带环张量网络精确收缩的成本。在AlexNet和VGG-16层上的大量模拟显示，在所研究设置下每层压缩比约为2000倍至77000倍，精度通常与密集基线相当，且在几个VGG-16案例中有所提升。这些结果是令人鼓舞的而非最终结论：它们表明，只要优化、收缩调度和部署内核协同设计，ADNTNs是一条有前景、数学结构清晰且硬件感知的通往更小神经网络的路径。

英文摘要

We study Automatically Differentiable Nonlinear Tensor Networks (ADNTNs), a family of structured weight generators whose compact core tensors are trained end-to-end by reverse-mode automatic differentiation (AD). The approach can be viewed as a natural extension of low-rank adaptation and tensor factorisation: instead of using one low-rank matrix update, an ADNTN builds a large weight tensor through a hierarchy of small cores, nonlinear activations, and optional lateral mixing tensors. The paper focuses on three architectures: Tree Tensor Networks (TTNs), augmented TTNs (aTTNs) with boundary disentanglers, and Multi-scale Entanglement Renormalisation Ansatze (MERA). The formulation supports nonlinear activations, task-aware objectives, batching, and hardware-aware execution schedules. At the same time, the paper keeps a clear distinction between \emph{differentiating} a contraction program and making contraction free: AD does not remove the cost of large intermediates, poor contraction orders, or exact contraction of general loopy tensor networks. Extensive simulations on AlexNet and VGG-16 layers show per-layer compression ratios from roughly $2000\times$ to $77000\times$ in the studied settings, with accuracy often matching the dense baseline and, in several VGG-16 cases, improving it. These results are encouraging rather than final: they suggest that ADNTNs are a promising, mathematically structured, and hardware-aware route toward much smaller neural networks, provided that optimisation, contraction schedules, and deployment kernels are designed together.

2606.00129 2026-06-02 cs.LG cs.AI 版本更新

A Shared Valence Axis Across Modern LLMs and Human EEG: The Saturation Regularity

现代LLM与人脑EEG共享的效价轴：饱和规律

Yousef A. Radwan, Xuhui Liu, Kilichbek Haydarov, Yuqian Fu, Mohamed Elhoseiny

发表机构 * King Abdullah University of Science and Technology（阿卜杜拉国王科技大学）

AI总结本研究通过构建从大型语言模型（LLM）中提取的一维效价方向（V轴），发现其与人类EEG神经活动对齐，但进一步对齐策略无法提升解码性能，并形式化为“饱和规律”，指出改进应来自监督无法触及的残差子空间。

详情

AI中文摘要

大型语言模型（LLM）已成为强大的表示学习器，其内部特征与人类认知日益对齐。我们研究现代LLM是否可以作为理解人脑神经表示的透镜，重点关注EEG中的情感效价。我们首先仅使用九个情感唤起句子从现代LLM中构建了一维效价方向（V轴），并通过零样本迁移到情感基准测试和跨十四个LLM的模型一致性进行了验证。然后，我们展示了这个从LLM导出的方向映射到人类神经活动。在一个包含123名受试者观看情感视频的公共EEG队列中，EEG特征上的单个线性投影追踪了每个刺激的V轴位置。此外，36个未暴露于V轴的EEG情感分类器在其内部表示中自发发现了相同的方向，表明相同的效价结构在语言模型和人类电生理学中同时出现。然而，这种趋同并未提供有效的训练信号。我们测试了二十五种对齐策略，包括知识蒸馏、表示相似性、对比和拓扑损失；没有一种能改善解码，十六种显著降低了准确性。我们将这一结果形式化为饱和规律：一旦任务标签单独驱动脑解码网络朝向目标方向，额外的监督主要扭曲已经饱和的盆地，而承载类内残差的子空间几乎得不到有用的梯度。这一规律也指出了改进应来自何处：监督无法触及的残差子空间。受此启发，我们集成残差多样性而非监督盆地，在FACED上将平衡准确率提高了10.5%，并在SEED-V上复制了相同效果。

英文摘要

Large language models (LLMs) have emerged as powerful representation learners whose internal features increasingly align with human cognition. We study whether modern LLMs can serve as a lens for understanding neural representations in the human brain, focusing on emotional valence in EEG. We first build a one-dimensional valence direction, the V-axis, from modern LLMs using only nine emotion-evocative sentences. We validate it through zero-shot transfer to sentiment benchmarks and cross-model consistency across fourteen LLMs. We then show that this LLM-derived direction maps onto human neural activity. On a public EEG cohort of 123 subjects watching affective videos, a single linear projection on EEG features tracks the V-axis position of each stimulus. Moreover, 36 EEG emotion classifiers trained without exposure to the V-axis spontaneously rediscover the same direction in their internal representations, suggesting that the same valence structure emerges in both language models and human electrophysiology. Yet this convergence does not provide an effective training signal. We test twenty-five alignment strategies, including knowledge distillation, representational similarity, contrastive, and topographic losses; none improve decoding, and sixteen significantly reduce accuracy. We formalize this result as the saturation regularity: once task labels alone drive a brain-decoding network onto the target direction, additional supervision mainly distorts an already-saturated basin, while the load-bearing within-class residual receives little useful gradient. This regularity also indicates where improvement should come from: the residual subspace unreachable by supervision. Motivated by this insight, we ensemble across residual diversity rather than supervising the basin, improving balanced accuracy by 10.5% over the prior best on FACED, with the same effect replicated on SEED-V.

2606.00125 2026-06-02 cs.IR cs.AI cs.LG cs.MM 版本更新

Multimodal Music Recommendation System using LLMs

使用LLMs的多模态音乐推荐系统

Srikar Prabhas Kandagatla, Sreehitha R. Narayana, Chandana Magapu, Swetha Mohan, Shamanth Kuthpadi, Hongjie Chen, Ryan A. Rossi, Franck Dernoncourt, Nesreen Ahmed

发表机构 * University of Massachusetts Amherst（马萨诸塞大学阿姆赫斯特分校）； Dolby Laboratories（Dolby实验室）； Adobe Research（Adobe研究）； Cisco Research（Cisco研究）

AI总结提出一个多模态框架，通过融合音频、歌词、LLM生成的语义元数据和收听完成率，在基于会话的音乐推荐中显著提升Recall和NDCG。

详情

AI中文摘要

音乐推荐系统通常将歌曲视为不透明标记，依赖协同交互历史，忽略了语义或声学内容。先前工作探索了LLM增强、多模态和文本增强的序列推荐方法，但有些方法部分结合了语义、声学或参与信号，没有在一个统一的基于LLM的序列推理框架中联合建模所有三个信号，该框架将推荐基于实际歌曲内容。在这项工作中，我们提出了一个用于基于会话的音乐推荐的多模态框架，通过三种互补信号丰富了LastFM-1K数据集：(1) 使用预训练音乐和文本表示模型提取的音频和歌词嵌入，(2) 使用MGPHot注释方案生成的LLM语义元数据，以及(3) 收听完成率。我们采用E4SRec框架，通过扩展多模态特征和不同的项目ID编码器骨干（包括SASRec、BERT4Rec和GRU4Rec）来增强它。我们进一步扩展了LLM骨干选项，包括LLaMa-2-13B、Qwen2.5-7B-Instruct和LLaMa-3-70B，在零样本和微调设置下。我们的实验表明，集成基于内容的特征比仅使用ID的基线在Recall上提升高达95%，在NDCG上提升高达79%。此外，我们的实验表明，朴素的多模态融合并不总是产生加性改进，突显了跨模态整合的挑战。我们发布了一个用于音乐推荐的大规模多模态基准。

英文摘要

Music recommendation systems typically treat songs as opaque tokens, relying on collaborative interaction histories and overlooking semantic and acoustic content. Prior work has explored LLM-augmented, multimodal, and text-enhanced approaches to sequential recommendation, and while some methods partially combine semantic, acoustic, or engagement signals, none jointly model all three within a unified LLM-based sequential reasoning framework that grounds recommendations in actual song content. In this work, we propose a multimodal framework for session-based music recommendation that enriches the LastFM-1K dataset with three complementary signals: (1) audio and lyric embeddings extracted using pretrained music and text representation models, (2) LLM-generated semantic metadata using the MGPHot annotation schema, and (3) listening completion ratios. We adapt the E4SRec framework, extending it with multimodal features and different item ID encoder backbones, including SASRec, BERT4Rec, and GRU4Rec. We further extend the LLM backbone options with LLaMa-2-13B, Qwen2.5-7B-Instruct, and LLaMa-3-70B in both zero-shot and fine-tuned settings. Our experiments show that integrating content-based features improves over ID-only baselines by up to 95% in Recall and up to 79% in NDCG. They also show that naive multimodal fusion does not always yield additive improvements, highlighting challenges in cross-modal integration. We release a large-scale multimodal benchmark for music recommendation.

2606.00124 2026-06-02 cs.CV cs.LG 版本更新

Positional Encodings Anchor Spatial Structure in Vision Transformers: A Geometric Perspective on Robustness

位置编码锚定视觉Transformer中的空间结构：基于几何视角的鲁棒性研究

Mahmoud Mannes

发表机构 * ESSTHS

AI总结本文通过引入空间相似性距离相关性（SSDC）度量，研究不同位置编码对视觉Transformer内部空间表示几何结构的影响，发现位置编码通过建立索引锚定的空间组织来提升模型在内容破坏性分布偏移下的鲁棒性。

Comments 16 pages (9 main text, 7 appendix). 5 figures (3 main text, 2 appendix) with 8 graphics total. 5 tables (1 main text, 4 appendix). Submitted to NeurIPS 2026 main conference and the ICML 2026 mechanistic interpretability workshop

详情

AI中文摘要

视觉Transformer中的位置嵌入（PEs）已知会影响性能和鲁棒性，但它们在塑造内部空间表示中的作用尚不明确。本文研究了不同形式的PEs如何影响ViT的表示几何结构，以及这些变化如何与内容破坏性分布偏移下的鲁棒性相关。我们引入了一个度量——空间相似性距离相关性（SSDC），用于量化token表示中的空间结构。利用该度量，我们发现未使用PEs训练的ViT仍会发展出非平凡的空间结构，但这种结构由视觉内容驱动，并在token置换下崩溃。相反，所有考虑的PEs（可学习绝对位置编码、正弦位置编码和旋转位置编码）都与向索引锚定空间组织的一致转变相关。这些模型中的表示在破坏内容的扰动下保持稳定，并对这类分布偏移表现出显著增强的鲁棒性。我们进一步表明，尽管不同的PEs产生不同的空间结构深度轨迹，但其鲁棒性属性大致相似（编码方案间存在次要差异），这表明鲁棒性似乎更依赖于稳定的位置参考框架的存在，而非特定的编码机制。这些结果为位置编码如何塑造内部表示提供了几何解释，并对未来编码方案的原则性设计具有启示意义。

英文摘要

Positional embeddings (PEs) in Vision Transformers (ViTs) are known to impact performance and robustness, but their role in shaping internal spatial representations is not well understood. In this work, we study how different forms of PEs influence the representational geometry of ViTs and how these changes relate to robustness under content-disrupting distribution shifts. We introduce a metric, the Spatial Similarity Distance Correlation (SSDC), to quantify spatial structure in token representations. Using this metric, we show that ViTs trained without PEs still develop non-trivial spatial structure, but this structure is driven by visual content and collapses under token permutation. In contrast, we find that all PEs considered (learned absolute, sinusoidal, and rotary) are associated with a consistent shift toward an index-anchored spatial organization. Representations in these models remain stable under perturbations that disrupt content, and exhibit substantially improved robustness to such distribution shifts. We further show that while different PEs produce distinct depth-wise trajectories of spatial structure, their robustness properties are largely similar (with secondary variation across encoding schemes), suggesting that robustness depends more on the presence of a stable positional reference frame than on the specific encoding mechanism. These results offer a geometric account of how positional encodings shape internal representations, with implications for the principled design of future encoding schemes.
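
The abstract does not give the formula for SSDC, so the following is only a guess at what such a metric could look like: the Pearson correlation between pairwise cosine similarity of token representations and pairwise distance of their patch positions on the grid. The function names, the choice of cosine similarity and Euclidean grid distance, and the toy sinusoidal tokens are all assumptions made for illustration. The sign convention is also assumed: here, strong spatial structure shows up as a strongly negative value.

```python
import numpy as np


def ssdc_sketch(tokens, grid_h, grid_w):
    """Correlate pairwise token similarity with pairwise grid distance.

    `tokens` has shape (grid_h * grid_w, dim), in row-major patch order.
    """
    tokens = np.asarray(tokens, dtype=float)
    if tokens.shape[0] != grid_h * grid_w:
        raise ValueError("expected one token per grid position")
    unit = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    similarity = unit @ unit.T
    rows, cols = np.divmod(np.arange(grid_h * grid_w), grid_w)
    coords = np.stack([rows, cols], axis=1).astype(float)
    distance = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    upper = np.triu_indices(grid_h * grid_w, k=1)   # each pair counted once
    return float(np.corrcoef(similarity[upper], distance[upper])[0, 1])


def positional_tokens(grid_h, grid_w, scale=0.3):
    """Toy tokens whose content is a smooth function of patch position."""
    rows, cols = np.divmod(np.arange(grid_h * grid_w), grid_w)
    return np.stack(
        [np.cos(scale * rows), np.sin(scale * rows),
         np.cos(scale * cols), np.sin(scale * cols)],
        axis=1,
    )


tokens = positional_tokens(4, 4)
structured = ssdc_sketch(tokens, 4, 4)
# Shuffling tokens while keeping the index grid fixed breaks the structure.
shuffled = ssdc_sketch(np.random.default_rng(0).permutation(tokens), 4, 4)
```

In this toy setup the score shrinks toward zero after shuffling, which mirrors the content-driven case described in the abstract, where structure collapses under token permutation.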

2606.00123 2026-06-02 cs.CV cs.AI cs.LG 版本更新

CardioLens: Revealing the Clinical Reality Gap of MLLMs via Multi-Sequence Cardiac MRI Evaluations

CardioLens: 通过多序列心脏MRI评估揭示MLLMs的临床现实差距

Zixian Su, Hongkai Zhang, Fan Gao, Encheng Su, Taiping Qu, Jingwei Guo, Nan Zhang, Hui Wang, Zhen Zhou, Kairui Bo, Yan Chen, Yue Ren, Shuai Li, Lei Xu, Henggui Zhang

发表机构 * Beijing Academy of Artificial Intelligence（北京人工智能研究院）； Beijing Anzhen Hospital（北京安贞医院）； Beihang University（北京航空航天大学）； King Abdullah University of Science and Technology（阿卜杜拉国王科技大学）

AI总结提出CardioLens测试平台，通过多序列心脏磁共振成像评估24个多模态大语言模型，发现其在临床工作流中表现不佳，存在类别崩溃失败模式，且输入选择和推理提示改进效果有限。

详情

AI中文摘要

多模态大语言模型在公共医学基准上表现出色，但现有评估通常依赖于孤立输入和简化识别任务，难以作为临床使用的有效代理。我们提出了CardioLens，一个针对多序列心血管磁共振的无泄漏评估测试平台，通过严格的报告到QA构建和验证流程，从私有医院档案中构建。CardioLens包含473,896张切片和13,494个经过验证的QA对，涵盖4D Cine、LGE、灌注和T2加权成像，并评估CMR解读的三个阶段：图像理解、报告生成和疾病诊断。在24个最先进的MLLM上，CardioLens揭示了显著的临床现实差距：模型整体表现不佳，性能沿真实CMR工作流下降。混淆分析进一步显示一种类别崩溃失败模式，模型倾向于默认频繁出现的异常类别，而不是区分临床不同的发现。为了排除MLLM兼容输入构造是主要原因，我们在不同切片预算下比较了随机、临床动机和数据驱动的切片选择协议；性能变化很小，通常约为1%。显式推理提示也无法挽救性能，往往使模型更加保守，而不是改善视觉证据的使用。这些结果表明，当前MLLM远未达到可靠的CMR解读，临床决策需要跨序列、视图和时间相位整合分布式证据。CardioLens为开发面向真实临床部署的下一代MLLM提供了一个临床基础的测试平台。

英文摘要

Multimodal Large Language Models (MLLMs) have shown strong performance on public medical benchmarks, yet existing evaluations often remain weak proxies for clinical use, relying on isolated inputs and simplified recognition-style tasks. We introduce CardioLens, a leakage-resistant evaluation testbed for multi-sequence Cardiovascular Magnetic Resonance (CMR), constructed from private hospital archives through a rigorous report-to-QA construction and verification pipeline. CardioLens contains 473,896 slices and 13,494 verified QA pairs across 4D Cine, LGE, perfusion, and T2-weighted imaging, and evaluates three stages of CMR interpretation: image understanding, report generation, and disease diagnosis. Across 24 state-of-the-art MLLMs, CardioLens reveals a substantial clinical reality gap: models perform poorly overall, with performance degrading along the real CMR workflow. Confusion analysis further shows a category-collapse failure mode, where models default to frequent abnormal categories rather than distinguishing clinically distinct findings. To rule out MLLM-compatible input construction as the primary cause, we compare random, clinically motivated, and data-driven slice selection protocols under different slice budgets; performance changes only marginally, typically by about 1%. Explicit reasoning prompts also fail to rescue performance, often making models more conservative rather than improving visual evidence use. These results show that current MLLMs remain far from reliable CMR interpretation, where clinical decisions require integrating distributed evidence across sequences, views, and temporal phases. CardioLens provides a clinically grounded testbed for developing next-generation MLLMs toward real-world clinical deployment.

2606.00120 2026-06-02 eess.SP cs.AI cs.LG 版本更新

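A Methodological Framework for Explicit Control of the Speed-Accuracy Trade-off in Brain-Computer Interfaces

脑机接口中速度-准确性权衡显式控制的方法论框架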

Javier Jiménez, Francisco B Rodríguez

发表机构 * Grupo de Neurocomputación Biológica, Departamento de Ingeniería Informática, Universidad Autónoma de Madrid（生物神经计算组，信息工程系，马德里自治大学）

AI总结提出一个独立于分类器、范式和早停策略的评估框架，通过增益和保持度两个指标及可调参数α显式控制速度-准确性权衡，并在P300范式上验证其有效性。

详情

AI中文摘要

脑机接口（BCI）受到脑电图等模态低信噪比的限制，需要多次试验才能可靠解码用户意图。这导致了速度-准确性权衡，即更高的准确性以速度为代价。速度-准确性平衡依赖于应用，因此需要可控的权衡。传统指标（如信息传输率）将速度和准确性合并，模糊了它们的依赖关系并可能引入偏差。在本研究中，我们提出了一个独立于分类器、范式和早停策略的评估框架，将速度和准确性分离。我们采用两个度量：增益（相对速度提升）和保持度（相对准确性保持），并将它们组合成一个由α控制的可调增益-保持平衡，从而调节速度-准确性权衡。该参数无需修改分类器即可调整工作点，便于跨场景部署。该框架在P300事件相关电位范式上进行了评估，使用了63名受试者的公开记录以及多种分类器和早停策略，以实现速度-准确性和比特率的不同工作点。结果表明，调整α可产生快速、准确或平衡的BCI行为，展示了速度-准确性权衡的显式控制。该方法支持受试者级别的性能预测，并提高了BCI行为的可解释性。对信息传输率的进一步分析揭示了其向速度的系统性偏差，该偏差通过所提出的框架中的增益和保持度测量得到解释。总体而言，本工作将速度-准确性权衡确立为可控的设计变量，并在公开的P300范式上进行了验证，从而实现了BCI的透明评估和应用特定优化。

英文摘要

Brain-computer interfaces (BCIs) are limited by the low signal-to-noise ratio of modalities such as electroencephalography, which requires multiple trials to reliably decode user intentions. This induces a speed-accuracy trade-off, whereby higher accuracy comes at the cost of speed. The speed-accuracy balance is application-dependent, motivating controllable trade-offs. Conventional metrics, such as the Information Transfer Rate, combine speed and accuracy, obscuring their dependence and potentially introducing biases. In this study, we propose an evaluation framework, independent of classifier, paradigm, and early-stopping strategy, that separates speed and accuracy. We employ two measures, Gain (relative speed improvement) and Conservation (relative accuracy preservation), and combine them into a tunable Gain-Cons Balance controlled by α, which regulates the speed-accuracy trade-off. The parameter adjusts the operating point without modifying the classifier, facilitating deployment across scenarios. The framework was evaluated on P300 event-related potential paradigms using public recordings from 63 subjects, as well as multiple classifiers and early-stopping strategies, to achieve distinct operating points in speed-accuracy and bitrate. Results show that tuning α yields fast, accurate, or balanced BCI behaviours, demonstrating explicit control of the speed-accuracy trade-off. The method supports subject-level performance prediction and improves explainability of BCI behaviour. Further analysis of the Information Transfer Rate reveals a systematic bias toward speed, explained by the proposed framework through the Gain and Conservation measurements. Overall, this work establishes the speed-accuracy trade-off as a controllable design variable validated on public P300-based paradigms, enabling transparent evaluation and application-specific optimization of BCIs.
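
To make concrete how a single Information Transfer Rate number folds speed and accuracy together, here is a small sketch using the standard Wolpaw definition of bits per selection. The abstract does not say which ITR variant the paper uses, so the Wolpaw formula is an assumption, and the 36-class speller settings in the example are invented for illustration. The Gain and Conservation measures are not sketched because the abstract does not define them precisely.

```python
import math


def wolpaw_bits_per_selection(n_classes, accuracy):
    """Wolpaw bits per selection: log2 N + P log2 P + (1-P) log2((1-P)/(N-1))."""
    if n_classes < 2 or not 0.0 <= accuracy <= 1.0:
        raise ValueError("need n_classes >= 2 and accuracy in [0, 1]")
    bits = math.log2(n_classes)
    if accuracy > 0.0:                      # 0 * log2(0) is treated as 0
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_classes - 1))
    return bits


def itr_bits_per_minute(n_classes, accuracy, seconds_per_selection):
    """Scale bits per selection by the number of selections per minute."""
    return wolpaw_bits_per_selection(n_classes, accuracy) * 60.0 / seconds_per_selection


# Two made-up operating points for a 36-symbol speller.
slow_accurate = itr_bits_per_minute(36, accuracy=0.95, seconds_per_selection=30.0)
fast_sloppy = itr_bits_per_minute(36, accuracy=0.80, seconds_per_selection=15.0)
```

In this invented example the faster, less accurate setting gets the higher ITR, which shows why a single combined number cannot tell an application that needs accuracy from one that needs speed.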

2606.00084 2026-06-02 cs.IR cs.AI cs.CL cs.LG 版本更新

SentimentLens: Reconciling Sentiment and Ratings via Dual-Modality in the Hospitality Sector

SentimentLens: 通过双模态调和酒店业中的情感与评分

Dineth Jayakody, Pasindu Thenahandi, Sampath Jayarathna

发表机构 * University of Peradeniya（佩拉德尼亚大学）

AI总结提出SentimentLens系统，基于方面级情感分析从非结构化酒店评论中提取知识，并通过跨模态调和文本情感与数值评分来识别运营冲突和服务改进机会。

详情

AI中文摘要

在线旅游平台生成大量用户生成的酒店评论，为大规模理解旅行者体验提供了丰富机会。然而，将非结构化文本反馈转化为结构化、可操作的见解仍然是一项具有挑战性的任务。本文提出了SentimentLens，一个基于方面级情感分析的可扩展分析系统，该系统从非结构化酒店评论中执行知识提取，并将其组织成可解释的服务类别。SentimentLens集成了方面术语提取、方面情感分类、语义类别分配和多层次分析模块，以支持区域级、酒店级和类别级评估。该系统设计为在不同地理环境和酒店环境中运行。为了展示其实用性，我们将SentimentLens应用于一个包含超过10,000条公开酒店评论的大型真实数据集。通过广泛分析，该框架揭示了旅行者情感如何随区域、服务类别和酒店类型而变化。我们进一步实现了文本情感与数值评分的跨模态调和，以识别潜在运营冲突、服务质量的结构性不一致性，并使用重要性-绩效和基于熵的分析确定高影响力的改进机会。结果表明，SentimentLens有效地将大规模非结构化评论转化为可操作的情报，支持酒店管理和旅游政策的数据驱动决策。虽然通过一个国家案例研究进行了演示，但所提出的系统可推广到其他目的地和评论驱动的服务领域。

英文摘要

Online travel platforms generate vast volumes of user-generated hotel reviews, offering rich opportunities to understand traveler experiences at scale. However, transforming unstructured textual feedback into structured, actionable insights remains a challenging task. This paper presents SentimentLens, a scalable analysis system based on Aspect-Based Sentiment Analysis that performs knowledge extraction from unstructured hotel reviews and organizes them into interpretable service categories. SentimentLens integrates aspect term extraction, aspect sentiment classification, semantic category assignment, and multi-level analytical modules to support region-level, hotel-level, and category-level evaluation. The system is designed to operate across different geographic contexts and hospitality settings. To demonstrate its practical utility, we apply SentimentLens to a large real-world dataset of over 10,000 publicly available hotel reviews. Through extensive analysis, the framework reveals how traveler sentiment varies across regions, service categories, and hotel archetypes. We further implement a cross-modal reconciliation of textual sentiment and numerical ratings to identify latent operational conflicts, structural inconsistencies in service quality, and high-impact improvement opportunities using importance-performance and entropy-based analyses. The results show that SentimentLens effectively transforms large-scale unstructured reviews into actionable intelligence, supporting data-driven decision-making for hospitality management and tourism policy. While demonstrated using a national case study, the proposed system is generalizable to other destinations and review-driven service domains.

2606.00083 2026-06-02 cs.LG cs.AI cs.RO 版本更新

From Demonstrations to Rewards: Test-Time Prompt Optimization for VLM Reward Models

从演示到奖励：VLM奖励模型的测试时提示优化

Christian Gumbsch, Leonardo Barcellona, Lennard Schünemann, Platon Karageorgis, Andrii Zadaianchuk, Zehao Wang, Sergey Zakharov, Fabien Despinoy, Rahaf Aljundi, Efstratios Gavves

发表机构 * University of Amsterdam（阿姆斯特丹大学）； Catholic University of Leuven（鲁汶天主大学）； Toyota Research Institute（丰田研究院）； Toyota Motor Europe（丰田欧洲公司）

AI总结提出Demo2Reward方法，利用少量专家演示在测试时优化VLM奖励模型的提示指令，减少假阳性并保持真阳性，无需额外训练即可提升下游策略学习。

详情

AI中文摘要

强化学习依赖于准确的奖励函数，但在现实应用（如机器人技术）中，这些函数通常是手工设计的，甚至不可用。最近的研究探索了预训练视觉-语言模型（VLM）作为奖励模型的零样本推理能力。然而，如果没有仔细的提示工程，这些方法往往会产生次优的奖励，其中假阳性预测会严重降低下游策略学习。在机器人技术中，通常收集包含专家演示的有限数据集来引导策略学习。这种场景提供了在策略训练之前优化奖励模型的机会。我们提出Demo2Reward，一种测试时自适应技术，基于少量演示（3-10条轨迹）优化奖励模型的语言指令，以减少假阳性同时保持真阳性。关键是，这在策略学习期间不需要额外的模型训练或计算资源。我们表明，Demo2Reward在一系列模拟机器人任务和策略骨干上始终优于现有的零样本和少样本VLM奖励模型。最后，我们证明Demo2Reward有效迁移到真实世界的机器人学习场景，无需手动设计奖励函数即可实现策略学习。

英文摘要

Reinforcement learning relies on accurate reward functions, which are often hand-crafted or even unavailable in real-world applications such as robotics. Recent work has explored the zero-shot reasoning capabilities of pre-trained Vision-Language Models (VLMs) as reward models. However, without careful prompt engineering, these approaches tend to produce suboptimal rewards, where false positive predictions can severely degrade downstream policy learning. In robotics, limited datasets comprising expert demonstrations are often collected to bootstrap policy learning. This scenario provides an opportunity to optimize a reward model prior to policy training. We propose Demo2Reward, a test-time adaptation technique that optimizes the language instruction of a reward model based on a few demonstrations (3-10 trajectories) to reduce false positives while preserving true positives. Crucially, this requires no additional model training or computation resources during policy learning. We show that Demo2Reward consistently outperforms existing zero- and few-shot VLM reward models across a range of simulated robotic tasks and policy backbones. Finally, we demonstrate that Demo2Reward effectively transfers to a real-world robotic learning scenario, enabling policy learning without manually engineering a reward function.

2606.00082 2026-06-02 cs.LG cs.AI stat.ML 版本更新

Hoeffding Concept Bottleneck Models with Applications to Overhead Images

Hoeffding概念瓶颈模型及其在俯视图像中的应用

Clément Bénard, Manon Arfib, Christophe Labreuche, Victor Quétu

发表机构 * Thales cortAIx-Labs（泰雷兹 cortAIx 实验室）； Université Paris-Saclay, CentraleSupélec（巴黎-萨克雷大学，中央理工-巴黎高等学院）

AI总结针对线性概念瓶颈模型可解释性差和信息泄露问题，提出基于Hoeffding泛函分解的非线性稀疏聚合方法HCBM，并证明其对概念间泄露的鲁棒性，在分类和俯视图像目标检测任务中优于传统线性CBM。

详情

AI中文摘要

深度学习算法的可解释性对于高风险决策的计算机视觉应用至关重要。概念瓶颈模型（CBM）最近在基于高级概念瓶颈的分类问题上展示了提供可解释且准确预测的潜力。现有的CBM方法依赖概念分数的线性聚合来计算预测。然而，这种线性方法通常使用大量概念，这削弱了可解释性并有利于信息泄露。通常，概念与输出logits之间的潜在关系不是线性的。因此，我们引入了Hoeffding概念瓶颈模型（HCBM），该模型基于梯度提升树的Hoeffding泛函分解，提供概念分数的非线性和稀疏聚合，并使用素蕴含生成紧凑预测。HCBM被证明对概念间泄露具有鲁棒性，并在大量实验中优于标准线性CBM。除了分类，HCBM还可以适应目标检测，我们专注于一个具有挑战性的俯视图像案例，以展示HCBM在这些设置中的高性能。

英文摘要

Explainability of deep learning algorithms is critical for computer-vision applications with high-stakes decisions. Concept bottleneck models (CBMs) have recently shown promising performance in providing explainable and accurate predictions for classification problems, based on a bottleneck of high-level concepts. Existing CBM methods rely on a linear aggregation of the concept scores to compute predictions. However, a large number of concepts is often used in this linear approach, which undermines explainability and favors information leakage. In general, the underlying relation between concepts and output logits is not linear. Therefore, we introduce Hoeffding Concept Bottleneck Models (HCBMs), which build on the Hoeffding functional decomposition of gradient-boosted trees to provide non-linear and sparse aggregations of concept scores, and generate compact predictions using prime implicants. HCBMs are proven to be robust to inter-concept leakage, and outperform standard linear CBMs in practice, as shown in extensive experiments. Beyond classification, HCBMs can be adapted to object detection, and we focus on a challenging case with overhead images to show the high performance of HCBMs in these settings.

2606.00081 2026-06-02 cs.LG cs.AI cs.SD 版本更新

DAStatFormer: A Hybrid Multibranch Transformer with Statistical Feature Integration for DAS-Based Pattern Recognitions

DAStatFormer: 一种融合统计特征的混合多分支Transformer用于DAS模式识别

Michel Dione, Jerry Lonlac, Hélène Louis, Anthony Fleury, Stephane Lecoeuche

发表机构 * IMT Nord Europe, Institut Mines-Telecom, Univ. Lille, Centre for Digital Systems Lille, France（IMT北欧学院，法国电信研究院，里尔大学，数字系统研究中心，法国）； IMT Mines Ales, Institut Mines-Telecom, Ales, France（IMT阿莱斯学院，法国电信研究院，阿莱斯，法国）

AI总结针对DAS数据高维度和复杂时空模式问题，提出DAStatFormer混合多分支Transformer，通过提取24个ANOVA选择的统计特征并采用门控Transformer网络，在降低数据量级的同时实现高达99.4%的准确率。

详情

AI中文摘要

分布式声学传感（DAS）通过光纤实现大规模监测，但其高维度和复杂的时空模式使得事件分类具有挑战性。现有的深度学习方法——CNN、循环模型和Transformer变体——要么无法捕获长程依赖，要么需要以高昂成本处理原始DAS矩阵。我们提出DAStatFormer，一种混合多分支Transformer，将紧凑的多域统计特征与门控Transformer网络相结合。我们不是使用原始信号，而是从每个通道的时域、波形和频域提取24个ANOVA选择的属性，将数据量减少数个数量级，同时保留判别信息。每个域通过专用的逐步骤和逐通道注意力分支处理，并通过自适应门控机制融合。在开放的$\Phi$-OTDR基准测试和真实场景DAS数据集上的实验表明，DAStatFormer实现了高达99.4%的准确率和接近完美的实际性能，同时使用的参数和推理成本显著低于DASFormer和DeepViT等模型。这些结果证明了其适用于可扩展、实时的DAS监测。我们在https://github.com/MichelD-git/DAStatFormer发布代码。

英文摘要

Distributed Acoustic Sensing (DAS) enables large-scale monitoring through optical fibers, but its high dimensionality and complex spatio-temporal patterns make event classification demanding. Existing deep learning approaches (CNNs, recurrent models, and Transformer variants) either fail to capture long-range dependencies or require processing raw DAS matrices at prohibitive cost. We propose DAStatFormer, a hybrid multibranch Transformer that combines compact multidomain statistical features with Gated Transformer Networks. Instead of raw signals, we extract 24 ANOVA-selected attributes per channel from the temporal, waveform, and spectral domains, reducing data size by orders of magnitude while preserving discriminative information. Each domain is processed via dedicated step-wise and channel-wise attention branches, fused by an adaptive gating mechanism. Experiments on the open $\Phi$-OTDR benchmark and a real-scenario DAS dataset show that DAStatFormer achieves up to 99.4% accuracy and near-perfect real-world performance, while using significantly fewer parameters and lower inference cost than models such as DASFormer and DeepViT. These results demonstrate its suitability for scalable, real-time DAS-based monitoring. We release our code at https://github.com/MichelD-git/DAStatFormer

2606.00080 2026-06-02 cs.CV cs.AI cs.LG cs.NE 版本更新

Rare Events, Real Signals: Functional Ensembles as Computational Units in Deep Spiking Networks

罕见事件，真实信号：深度脉冲网络中的功能集合作为计算单元

Aditi Aravind, Konstantinos Ladakis, Mario Alexios Savaglio, Stelios M. Smirnakis, Maria Papadopouli

发表机构 * University of Crete（希腊克里特大学）； Foundation of Research & Technology - Hellas（希腊研究与技术基金会）； Archimedes Research Unit（阿基米德研究单位）； Harvard Medical School（哈佛医学院）； Brigham and Women’s Hospital（布莱根妇女医院）

AI总结通过引入功能连接性分析框架，研究深度脉冲神经网络中功能集合的涌现特性，发现一阶功能连接集合的协同放电可靠预测下游神经元响应，且信息编码集中在罕见但高度协调的活动模式中。

详情

AI中文摘要

我们通过引入一个受神经科学启发的框架，从功能连接性的角度分析深度脉冲神经网络（SNN），研究内部表征如何在层次化处理系统中涌现。借鉴系统神经科学和信息论的概念，我们基于一个神经元与训练好的SNN架构中前一层神经元的统计显著成对相关性，形成该神经元的一阶功能连接（1FC）组。然后，我们在各种条件下的推理过程中跟踪其响应特性。我们的分析表明，先前在生物皮层中观察到的功能连接性的几个原理在脉冲ResNet架构中得以保留。这些1FC集合表现出有趣的特性：它们的聚合协同放电通过一个鲁棒的、类似ReLU的输入输出关系可靠地预测下游神经元响应，其增益随集合大小系统性缩放。仅在高的1FC协同放电事件期间才出现所呈现类别的可靠编码，而这些事件本身发生频率较低，表明信息表征集中在罕见但高度协调的活动模式中。在均匀随机噪声或对抗性扰动下，这些响应轮廓被破坏，尤其是在早期和中间层。这使得能够在特定节点和路径上进行有针对性的高分辨率探查。我们表明，功能连接结构由学习塑造，并且在权重置换下该结构被破坏。这些确立了1FC集合作为输入编码和信息传递的功能上有意义的基质，对设计针对信息流的有针对性的细粒度诊断具有潜在意义。

英文摘要

We investigate how internal representations emerge across hierarchical processing systems by introducing a neuroscience-inspired framework for analyzing deep spiking neural networks (SNNs) through the lens of functional connectivity. Drawing on concepts from systems neuroscience and information theory, we form the first-order functionally-connected (1FC) group of a neuron based on its statistically significant pairwise correlations with neurons from the previous layer of a trained SNN architecture. We then track its response properties during inference under various conditions. Our analysis shows that several principles of functional connectivity previously observed in biological cortex are preserved in spiking ResNet architectures. These 1FC ensembles display interesting properties: their aggregate cofiring reliably predicts downstream neuronal responses through a robust, ReLU-like input-output relationship, whose gain scales systematically with ensemble size. Reliable encoding of the presented class emerges only during high 1FC cofiring events, which themselves occur infrequently, indicating that informative representations are concentrated in rare but highly coordinated activity patterns. Under uniform random noise or adversarial perturbations, these response profiles are disrupted, particularly in early and intermediate layers. This enables targeted, high-resolution interrogation at specific nodes and pathways. We show that the functional connectivity structure is shaped by learning and that it breaks under weight permutation. These findings establish 1FC ensembles as a functionally meaningful substrate for input encoding and information transfer, with potential implications for designing targeted, fine-grained diagnostics of information flow.


2606.00060 2026-06-02 q-fin.TR cs.CE cs.LG 版本更新

Machine Learning-Based Bitcoin Trading Under Transaction Costs: Evidence From Walk-Forward Forecasting

基于机器学习的比特币交易：考虑交易成本的滚动前向预测证据

Andrei Bysik, Robert Ślepaczuk

发表机构 * Quantitative Finance Research Group, Faculty of Economic Sciences, University of Warsaw（经济科学学院量化金融研究组，华沙大学）； Quantitative Finance Research Group, Department of Quantitative Finance and Machine Learning, Faculty of Economic Sciences, University of Warsaw（经济科学学院量化金融与机器学习系量化金融研究组，华沙大学）

AI总结研究在交易成本下，利用XGBoost、LSTM和iTransformer等机器学习模型预测BTC-USDT小时收益率，并通过成本感知执行过滤器将预测转化为盈利交易策略。

Comments 42 pages,

详情

AI中文摘要

本文研究机器学习对BTC-USDT小时收益率的预测能否在扣除交易成本后转化为具有经济意义的交易表现。使用2018-2026年间约70,000个小时观测值，在27折滚动前向协议中评估XGBoost、LSTM和iTransformer。所有三种模型在选定配置下均产生正的毛交易表现，但一旦施加十个基点的交易成本，基于符号的朴素策略便失效。一种成本感知的执行过滤器（仅当预测幅度超过基于交易成本的阈值时才允许交易）显著降低了换手率，并在选定配置下恢复了盈利能力。最强的纯多头XGBoost策略年化收益率超过65%，夏普比率高于1。额外测试表明，技术指标在选定情况下提升了表现，EGARCH导出的特征并未提供一致的稳健收益，且XGBoost在描述性上优于神经替代模型，尽管自助法证据不支持正式的统计优势。损失函数和模型选择效应是次要的且统计上脆弱。结果表明，小时级加密货币交易的主要障碍不仅在于弱可预测性，还在于将预测转化为交易的方式。
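The cost-aware execution filter mentioned in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the `threshold_multiple` parameter, and the rule of holding the previous position whenever a forecast is too small are assumptions made for illustration. Only the ten-basis-point cost and the long-only setting are taken from the abstract.

```python
from typing import List


def naive_sign_positions(predicted_returns: List[float]) -> List[int]:
    """Long-only sign rule: hold 1 unit when the forecast is positive, else stay flat."""
    return [1 if r_hat > 0 else 0 for r_hat in predicted_returns]


def cost_aware_positions(predicted_returns: List[float],
                         cost_bps: float = 10.0,
                         threshold_multiple: float = 1.0) -> List[int]:
    """Long-only rule that changes position only on sufficiently large forecasts.

    Assumption: the threshold is a multiple of the per-trade cost, and the
    previous position is kept whenever |forecast| does not exceed it.
    """
    threshold = threshold_multiple * cost_bps / 10_000.0  # basis points -> return units
    positions = []
    current = 0  # start flat
    for r_hat in predicted_returns:
        if abs(r_hat) > threshold:
            current = 1 if r_hat > 0 else 0
        positions.append(current)
    return positions


def turnover(positions: List[int]) -> int:
    """Number of unit position changes, counting entry from an initial flat state."""
    changes, previous = 0, 0
    for p in positions:
        changes += abs(p - previous)
        previous = p
    return changes


forecasts = [0.0002, -0.0001, 0.0015, 0.0003, -0.0020, 0.0004]
print(turnover(naive_sign_positions(forecasts)))   # 5 position changes
print(turnover(cost_aware_positions(forecasts)))   # 2 position changes
```

On these toy forecasts the sign rule changes position five times while the filtered rule changes it twice, which is the turnover effect the abstract attributes to the filter.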

立场：煤矿中的随机鹦鹉。模型崩溃对低资源社区的威胁

Devon Jarvis, Richard Klein, Benjamin Rosman, Steven James, Stefano Sarao Mannelli

发表机构 * GitHub

AI总结本文探讨模型崩溃（生成模型在先前模型输出上训练导致的性能下降）如何通过降低训练效率、扭曲数据分布，不成比例地影响低资源和边缘化社区，并呼吁采取行动。

Comments 14 pages, 1 figure, 1 table, International Conference on Machine Learning

详情

AI中文摘要

模型崩溃，即当生成模型在先前的模型输出上进行训练时出现的性能下降，随着人工生成内容的激增，日益受到关注。对大型语言模型的相关批评强调了它们倾向于复现训练数据中的频繁模式、依赖庞大的数据集以及巨大的环境成本。这些因素共同导致了数据退化、文化偏见的强化以及资源利用的低效。在这篇立场论文中，我们旨在结合这些观点，并论证模型崩溃威胁着当前使AI民主化的努力。通过降低训练效率并使数据分布偏离其支撑的尾部，模型崩溃不成比例地影响了低资源和边缘化社区。我们考察了这一现象的环境和文化影响，将我们的立场置于近期关于模型崩溃的立场论文中，并以行动呼吁作为结论。最后，我们概述了减轻这些影响的初步方向。

英文摘要

Model collapse, the degradation in performance that arises when generative models are trained on the outputs of prior models, is an increasing concern as artificially generated content proliferates. Related critiques of large language models have highlighted their tendency to reproduce frequent patterns in training data, their reliance on vast datasets, and their substantial environmental cost. Together, these factors contribute to data degradation, the reinforcement of cultural biases, and inefficient resource use. In this position paper we aim to combine these views and argue that model collapse threatens current efforts to democratize AI. By reducing training efficiency and skewing data distributions away from the tails of their support, model collapse disproportionately impacts low-resource and marginalized communities. We examine both the environmental and cultural implications of this phenomenon, situate our position within recent position papers on model collapse, and conclude with a call to action. Finally, we outline initial directions for mitigating these effects.


2605.31200 2026-06-02 cs.LG stat.ML 版本更新

Beyond Additive Decompositions: Interpretability Through Separability

超越加性分解：通过可分离性实现可解释性

Jinyang Liu, Munir Eberhardt Hiabu

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结提出张量分离学习（TSL）回归模型，通过阶段式贪心过程与正交重拟合学习单变量特征函数的秩1乘积之和，避免加性分解中的信号抵消和外推问题，实现忠实于拟合分量的可视化，并提供近似率保证。

Comments To appear in Proceedings of the 43rd International Conference on Machine Learning (ICML 2026)

详情

AI中文摘要

可解释机器学习需要准确且结构上忠实于数据的模型。现有的可解释性方法严重依赖加性表示（例如广义加性模型（GAMs）、SHapley加性解释（SHAP）、函数ANOVA），这些方法在存在强交互作用时可能会遭受信号抵消和支撑集外的外推问题。我们提出了张量分离学习（TSL），一种回归模型，通过带有正交重拟合的阶段式贪心过程学习单变量特征函数的秩1乘积之和。通过强制可分离性，TSL避免了加性投影中由于边缘化高阶交互作用而导致的信息损失。学习得到的TSL模型可以由一阶偏依赖函数完全重建，至多相差常数因子。这种逐阶段的对应关系确保了所得可视化忠实于拟合的分量。我们为具有有界混合$p$阶偏导数的函数建立了近似率保证，并证明TSL在回归基准测试中可与黑盒模型相媲美。

英文摘要

Interpretable machine learning requires models that are accurate and structurally faithful to the data. Existing explainability methods rely heavily on additive representations (e.g., Generalized Additive Models (GAMs), SHapley Additive exPlanations (SHAP), functional ANOVA), which can suffer from signal cancellation and off-support extrapolation in the presence of strong interactions. We propose Tensor Separation Learning (TSL), a regression model that learns a sum of rank-1 products of univariate per-feature functions via a stagewise greedy procedure with orthogonal refitting. By enforcing separability, TSL avoids the information loss inherent in additive projections caused by marginalizing higher-order interactions. The learned TSL model can be fully reconstructed from first-order partial dependence functions, up to constant factors. This stage-wise correspondence ensures that the resulting visualizations are faithful to the fitted components. We establish approximation-rate guarantees for functions with bounded mixed $p$-th order partial derivatives and demonstrate that TSL competes with black-box models on regression benchmarks.
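The contrast between additive and separable representations can be written out explicitly. The notation below ($R$, $g_{r,j}$, $f_j$) is assumed for illustration and is not taken from the paper; the toy example is a standard one, not a result reported by the authors.

```latex
% Additive (main-effects) form versus a sum of rank-1 separable products
% over d features. R and g_{r,j} are illustrative symbols.
\begin{align*}
  f_{\mathrm{add}}(x) &= f_0 + \sum_{j=1}^{d} f_j(x_j), \\
  f_{\mathrm{sep}}(x) &= \sum_{r=1}^{R} \prod_{j=1}^{d} g_{r,j}(x_j).
\end{align*}
% Toy case of signal cancellation: f(x_1, x_2) = x_1 x_2 with independent,
% mean-zero features. The first-order main effects vanish,
\begin{align*}
  \mathbb{E}[f \mid x_1] = x_1\,\mathbb{E}[x_2] = 0, \qquad
  \mathbb{E}[f \mid x_2] = x_2\,\mathbb{E}[x_1] = 0,
\end{align*}
% so main-effect plots are flat, while a single rank-1 term with
% g_{1,1}(x_1) = x_1 and g_{1,2}(x_2) = x_2 represents f exactly.
```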


2605.31162 2026-06-02 cs.CV cs.LG 版本更新

Guidance for Low-Level Perceptual Editing in Unconditional Diffusion Models

无条件扩散模型中低级感知编辑的引导

Shreyansh Modi, Akshat Tomar, Aarush Aggarwal

发表机构 * Indian Institute of Technology Roorkee（印度理工学院罗尔基）

AI总结针对无条件扩散模型在美学和感知增强中难以进行全局低级变换的问题，提出一种无需训练的推理时机制，通过提取退化概念向量并结合瓶颈修补与无分类器引导，实现图像编辑与质量提升。

Comments 11 pages, 12 figures, Generative Models for Computer Vision Workshop CVPR 2026

2605.28335 2026-06-02 cs.LG 版本更新

来自表达性教师的对抗性双在线策略蒸馏

Zhenglin Wan, Jingxuan Wu, Xingrui Yu, Chubin Zhang, Mingcong Lei, Bo An, Ivor W. Tsang, Yang You

发表机构 * National University of Singapore（新加坡国立大学）； University of Technology Sydney（悉尼科技大学）； University of Science and Technology of China（中国科学技术大学）

AI总结提出FA-OPD方法，通过对抗性双在线策略蒸馏，结合流匹配教师和轻量MLP学生，利用奖励和动作双通道信号，在机器人导航、操作和运动任务中优于强基线，且对噪声和有限演示鲁棒。

Comments arXiv admin note: substantial text overlap with arXiv:2510.09222

详情

AI中文摘要

在具身控制中从演示学习通常被建模为行为克隆，最近的扩散或流匹配策略通过建模多模态专家动作改进了这一范式。然而这些方法仍然是离线监督学习：策略仅在专家状态上训练，在其实际访问的状态上得不到纠正信号。在线策略蒸馏（OPD）提供了一种自然的补救措施，但标准OPD假设一个强大的固定教师，这在仅演示控制中不可用。我们提出FA-OPD，一种对抗性双在线策略蒸馏方法，其中从演示学习流匹配（FM）教师，并与轻量MLP学生共同训练。教师在学生 rollout 上提供两种互补信号。奖励通道学习状态-动作对上的专家相似度目标，并通过长视界策略优化驱动在线探索。动作通道在学生访问的状态提供密集的局部目标，稳定利用。FA-OPD耦合两者，使得奖励蒸馏能够实现超越逐点演示的泛化，而动作蒸馏使探索锚定在类专家行为附近。在六个机器人导航、操作和运动基准上，FA-OPD击败了强基线，并在噪声或有限演示下表现出更强的鲁棒性。源代码：https://github.com/vanzll/FA-OPD。

英文摘要

Learning from demonstrations in embodied control is often cast as behavioral cloning, and recent diffusion or flow-matching policies improve this paradigm by modeling multi-modal expert actions. Yet these methods remain offline supervised learners: the policy is trained only on expert states and receives no corrective signal on the states it actually visits. On-policy distillation (OPD) offers a natural remedy, but standard OPD assumes a strong fixed teacher, which is unavailable in demonstration-only control. We propose FA-OPD, an adversarial dual on-policy distillation method in which a Flow Matching (FM) teacher is learned from demonstrations and co-trained with a lightweight MLP student. The teacher provides two complementary signals on student rollouts. The reward channel learns an expert-likeness objective over state-action pairs and drives online exploration through long-horizon policy optimization. The action channel supplies dense local targets at student-visited states, stabilizing exploitation. FA-OPD couples them so that reward distillation enables generalization beyond point-wise demonstrations, while action distillation keeps exploration anchored near expert-like behavior. Across six robot navigation, manipulation, and locomotion benchmarks, FA-OPD beats strong baselines and shows much stronger robustness under noisy or limited demonstrations. Source code: https://github.com/vanzll/FA-OPD.


2605.26919 2026-06-02 cs.LG stat.ML 版本更新

Agile Online Model Selection: Resolving Adaptation Lag via Safeguarded Large Learning Rates

敏捷在线模型选择：通过受保护的大学习率解决适应滞后

Kei Takemura, Ryuta Matsuno, Keita Sakuma

发表机构 * NEC Corporation（日本电气株式会社）

AI总结提出一种乐观在线镜像下降算法，利用受保护的大学习率（高达Θ(T)）并引入事后惩罚机制，在非平稳环境中实现快速适应，同时保持近最优的遗憾界。

Comments Accepted to KDD 2026

详情

DOI: 10.1145/3770855.3817766

AI中文摘要

在非平稳环境中保持预测准确性需要在线模型选择来自主适应未知的分布变化。然而，现有的免调参算法在鲁棒性和敏捷性之间存在根本性权衡。具体来说，为了确保动态遗憾界，它们必须将学习率限制为小常数（例如，$O(1)$）。这种限制不可避免地会在突变期间导致显著的适应滞后。为了解决这个问题，我们提出了一种新颖的乐观在线镜像下降算法，该算法利用受保护的大学习率，最高可达$Θ(T)$，其中$T$是轮数。我们的关键技术贡献是一种事后惩罚机制，该机制动态监控不稳定的更新，并排除导致过度遗憾的学习率，从而消除了限制性先验约束的需要。我们证明了累积惩罚仍为$O(\log T)$，使得我们的算法在良性情况下实现优越的速率的同时，匹配接近最优的最坏情况保证。在三个合成数据集和十一个多样化真实世界数据集上的实证评估表明，我们的方法将适应滞后从数百轮减少到几轮，始终优于免调参基线。

英文摘要

Maintaining predictive accuracy in non-stationary environments requires online model selection to adapt autonomously to unknown distribution shifts. However, existing tuning-free algorithms face a fundamental trade-off between robustness and agility. Specifically, to ensure dynamic regret bounds, they must restrict learning rates to small constants (e.g., $O(1)$). This restriction inevitably causes significant adaptation lag during abrupt changes. To resolve this, we propose a novel optimistic online mirror descent that utilizes safeguarded large learning rates up to $Θ(T)$, where $T$ is the number of rounds. Our key technical contribution is a post-hoc penalty mechanism that dynamically monitors unstable updates and excludes learning rates incurring excessive regret, eliminating the need for restrictive a priori constraints. We show that the cumulative penalty remains $O(\log T)$, allowing our algorithm to match near-optimal worst-case guarantees while achieving superior rates in benign cases. Empirical evaluations on three synthetic and eleven diverse real-world datasets demonstrate that our approach reduces the adaptation lag from hundreds of rounds to a few rounds, consistently outperforming tuning-free baselines.


2605.26874 2026-06-02 cs.DB cs.AI cs.LG 版本更新

Knowledge Graphs as the Missing Data Layer for LLM-Based Industrial Asset Operations

知识图谱：基于LLM的工业资产运营中缺失的数据层

Madhulatha Mandarapu, Sandeep Kunkunuru

发表机构 * VaidhyaMegha Private Limited, India（印度VaidhyaMegha私人有限公司）

AI总结研究通过类型化知识图谱作为数据层，将GPT-4在工业维护场景中的准确率从65%提升至99%，并引入生成增强知识（GAK）处理缺失数据，实现81.8%的场景可回答性。

Comments v2: reframed around the knowledge graph as a grounding substrate with a 3-tier router (text-to-Cypher; native graph/optimization primitives; generation-augmented knowledge, GAK). Adds a benchmark-grounded GAK evaluation on 88 real non-deterministic AssetOpsBench scenarios with provenance-tagged enrichment. 18 pages. Code: github.com/samyama-ai/assetops-kg

详情

AI中文摘要

基于LLM的工业资产运营代理在处理平面文档存储时准确性有限。AssetOpsBench（KDD 2026）表明，GPT-4代理在139个工业维护场景中达到65%的准确率，并比较了LLM编排范式（Agent-As-Tool vs. Plan-Execute）在固定数据层上的表现。我们提出一个正交问题：工具背后的数据模型有多重要？我们将类型化知识图谱作为基础基质，并根据最佳回答方式路由每个问题：（i）LLM生成的Cypher进行结构化检索，将同一GPT-4模型从65%提升至82-83%；（ii）原生图和优化原语（无需LLM）在图可回答场景中达到99%；（iii）生成增强知识（GAK）用于处理数据中缺失的答案：引擎的代理将缺失事实实现为带有溯源标签的图节点，然后回答。一个反复出现的主题是反向LLM使用：我们约束LLM从类型化模式生成查询或一次性丰富，让图确定性地执行。在88个真实的AssetOpsBench故障模式场景中（基准本身将其标记为非确定性，图中缺失十种设备类型），GAK将可回答性从零提升至100%的设备类型，并回答了81.8%的场景，每个实现的事实都带有 source:LLM-derived 标签以确保可审计性。我们还贡献了40个图原生场景。对于结构化操作领域，数据层（而非LLM编排）是主要杠杆，类型化知识图谱充当原始工业数据与LLM推理之间的基础基质。

英文摘要

LLM-based agents for industrial asset operations show limited accuracy when reasoning over flat document stores. AssetOpsBench (KDD 2026) establishes that GPT-4 agents achieve 65% on 139 industrial maintenance scenarios, and compares LLM orchestration paradigms (Agent-As-Tool vs. Plan-Execute) on a fixed data layer. We ask the orthogonal question: how much does the data model behind the tools matter? We treat a typed knowledge graph as a grounding substrate and route each question by how it is best answered: (i) LLM-generated Cypher for structured retrieval, which lifts the same GPT-4 model from 65% to 82-83%; (ii) native graph and optimization primitives, with no LLM, reaching 99% on graph-answerable scenarios; and (iii) generation-augmented knowledge (GAK) for answers absent from the data -- the engine's agent materializes the missing facts as provenance-tagged graph nodes, then answers. A recurring theme is inverted LLM usage: we constrain the LLM to query generation or one-shot enrichment from a typed schema and let the graph execute deterministically. On the 88 real AssetOpsBench failure-mode scenarios the benchmark itself flags non-deterministic -- ten equipment types absent from the graph -- GAK lifts answerability from zero to 100% of equipment types and answers 81.8% of scenarios, every materialized fact tagged source:LLM-derived for auditability. We also contribute 40 graph-native scenarios. For structured operational domains the data layer -- not the LLM orchestration -- is the primary lever, and a typed knowledge graph serves as a grounding substrate between raw industrial data and LLM reasoning.


2605.26684 2026-06-02 cs.LG cs.AI 版本更新

Beyond Trajectory-Level Attribution: Graph-Based Credit Assignment for Agentic Reinforcement Learning

超越轨迹级归因：基于图的智能体强化学习信用分配

Xin Cheng, Shuo He, Lang Feng, HaiYang Xu, Ming Yan, Lei Feng, Bo An

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结提出GraphGPO方法，通过构建状态转移图并利用全局信息估计各状态到任务目标的距离，实现步骤级信用分配，提升训练效率和性能。

Comments Accepted by ICML 2026

详情

AI中文摘要

基于组的强化学习方法在提升大型语言模型性能方面取得了显著成功，并已迅速扩展到智能体任务。然而，其信用分配严重依赖于根据最终结果进行的粗粒度轨迹级归因，难以捕捉单个步骤的贡献，例如失败轨迹中被掩盖的有价值步骤。为了揭示潜在信息并实现更忠实的步骤级信用分配，我们提出基于图的组策略优化（GraphGPO），该方法首先将所有 rollout 轨迹聚合为一个统一的状态转移图，然后利用图中编码的全局信息估计每个状态到任务目标的距离。最后，GraphGPO 通过估计基于图的优势函数，根据转移减少到任务目标距离的程度，为每条边分配信用。通过这种方式，GraphGPO 显著提高了训练效率，并在多个具有挑战性的基准测试中取得了最先进的性能。

英文摘要

Group-based reinforcement learning (RL) methods have achieved remarkable success in improving the performance of large language models (LLMs) and have been rapidly extended to agentic tasks. However, their credit assignment relies heavily on coarse-grained trajectory-level attribution according to final outcomes, making it difficult to capture the contribution of individual steps, such as valuable steps obscured within failed trajectories. To uncover latent information and enable more faithful step-level credit assignment, we propose Graph-based Group Policy Optimization (GraphGPO), which first aggregates all rollout trajectories into a unified state-transition graph and then estimates the distance from each state to the task goal using the global information encoded in the graph. Finally, GraphGPO assigns credit to each edge by estimating a graph-based advantage, based on how much the transition reduces the distance to the task goal. In this way, GraphGPO significantly improves training efficiency and achieves state-of-the-art performance across a range of challenging benchmarks.
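The three steps in this abstract (merge rollouts into one graph, estimate each state's distance to the goal, credit each edge by how much it reduces that distance) can be sketched on a toy example. This is an illustrative reading, not the GraphGPO algorithm itself: using hop-count BFS as the distance estimate, assigning unreachable states a fallback distance of one more than the largest finite distance, and all function names are assumptions.

```python
from collections import defaultdict, deque


def build_graph(trajectories):
    """Merge rollouts (lists of hashable states) into one directed transition graph."""
    edges = defaultdict(set)
    for trajectory in trajectories:
        for state, next_state in zip(trajectory, trajectory[1:]):
            edges[state].add(next_state)
    return edges


def distances_to_goal(edges, goal):
    """Hop count from each state to `goal`, computed by BFS over reversed edges."""
    reverse = defaultdict(set)
    for state, next_states in edges.items():
        for next_state in next_states:
            reverse[next_state].add(state)
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        node = queue.popleft()
        for predecessor in reverse[node]:
            if predecessor not in dist:
                dist[predecessor] = dist[node] + 1
                queue.append(predecessor)
    return dist


def edge_advantages(edges, dist):
    """Credit for edge (s, s') = distance(s) - distance(s').

    Assumption: states that never reach the goal get a fallback distance.
    """
    fallback = max(dist.values()) + 1
    return {
        (state, next_state): dist.get(state, fallback) - dist.get(next_state, fallback)
        for state, next_states in edges.items()
        for next_state in next_states
    }


# One successful rollout and one failed rollout that still contains a useful step.
rollouts = [["A", "B", "C", "G"], ["A", "D", "B", "E"]]
graph = build_graph(rollouts)
advantages = edge_advantages(graph, distances_to_goal(graph, "G"))
print(advantages[("D", "B")])  # 1: a helpful step inside the failed rollout
print(advantages[("B", "E")])  # -2: the step that left the successful path
```

In the failed rollout, the step D to B still receives positive credit because the merged graph shows that B lies on a path to the goal, which is the kind of information a trajectory-level reward would hide.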


2605.26660 2026-06-02 cs.LG 版本更新

重新思考异常检测中的弱监督：一个综合基准

Xu Yao, Siyuan Zhou, Zhenbo Wu, Chaochuan Hou, Shuang Liang, Shiping Wang, Hailiang Huang, Songqiao Han, Minqi Jiang

发表机构 * Shanghai University of Finance and Economics（上海财经大学）； Ant Group（蚂蚁集团）； Key Laboratory of Interdisciplinary Research of Computation and Economics（计算与经济交叉学科重点实验室）

AI总结提出WSADBench，首个统一评估不完全、不精确和不准确三种弱监督异常检测场景的基准，通过系统变化标签数量、粒度和质量，揭示36种算法在4种模态上的性能边界，并发现弱监督场景间存在强相关性、专用WSAD算法仅在极端标签稀缺时占优等关键洞察。

Comments Accepted at KDD 2026 Datasets and Benchmarks Track

详情

DOI: 10.1145/3770855.3817536

AI中文摘要

弱监督异常检测（WSAD）已发展出三个主要方向：不完全监督、不精确监督和不准确监督。然而，这些方向仍然相互孤立，缺乏一个统一的框架来评估它们是否解决独特的挑战或共享基本机制。本文介绍了WSADBench，这是第一个统一评估不同弱监督场景的基准，对从专用WSAD方法到先进表格基础模型的多种方法进行基准测试。WSADBench通过系统变化标签数量、粒度和质量，建立了标准化协议来评估4种模态上的36种算法，揭示了各种方法的性能边界。基于超过70万次实验，WSADBench揭示了四个关键见解：（i）这些弱监督场景之间存在强内在相关性，挑战了当前研究方向的孤立性。（ii）专用WSAD算法仅在极端标签稀缺情况下表现出色，但随着监督增加或在OOD场景中，很快被表格基础模型和通用分类方法主导。（iii）未标记数据在不同设置中的效用不一致，与标签细化相比收益微乎其微。（iv）模型对不同类型的标签噪声表现出不对称敏感性。我们发布WSADBench作为开源基准，包含代码和数据集，以促进未来的WSAD研究：https://github.com/SUFE-AILAB/WSADBench。

英文摘要

Weakly supervised anomaly detection (WSAD) has developed in three primary directions: incomplete, inexact, and inaccurate supervision. However, these directions remain isolated, lacking a unified framework to assess whether they address unique challenges or share fundamental mechanisms. This paper introduces WSADBench, the first benchmark that unifies evaluation across distinct weakly supervised scenarios, benchmarking diverse approaches from specialized WSAD methods to advanced tabular foundation models. WSADBench establishes standardized protocols to evaluate 36 algorithms across 4 modalities by systematically varying label quantity, granularity, and quality, revealing the performance boundaries of various methods. Based on over 700K experiments, WSADBench reveals four critical insights: (i) Strong intrinsic correlations exist between these weak supervision scenarios, challenging the isolation of current research directions. (ii) Specialized WSAD algorithms excel only in extreme label-scarcity regimes but are quickly dominated by tabular foundation models and general classification methods as supervision increases or in OOD scenarios. (iii) Unlabeled data shows inconsistent utility across settings, with marginal gains compared to label refinement. (iv) Models exhibit asymmetric sensitivity to different types of label noise. We release WSADBench as an open-source benchmark with code and datasets to facilitate future WSAD research: https://github.com/SUFE-AILAB/WSADBench.


2605.30290 2026-06-02 cs.LG cs.AI cs.CL 版本更新

Self-Trained Verification for Training- and Test-Time Self-Improvement

自训练验证用于训练和测试时的自我改进

Chen Henry Wu, Aditi Raghunathan

发表机构 * arXiv

AI总结提出自训练验证（STV）方法，通过让验证器模仿参考解决方案下的自身版本，解决自我改进中验证器瓶颈问题，在测试时显著提升验证-细化循环，在训练时通过验证器在环训练（ViL）进一步提升生成器性能。

详情

MIC: 通过各向同性子空间对齐最大化自适应表示中的信息容量

Dang Nguyen Hong, Nhi Ngoc-Yen Nguyen, Huy-Hieu Pham

发表机构 * National University of Singapore（新加坡国立大学）

AI总结针对多尺度表示中的维度冗余和谱坍缩问题，提出MIC框架，通过各向同性子空间对齐、软坍缩正则化和谱各向同性正则化，结合自蒸馏目标生成语义密集且高判别力的表示，在高压缩场景下显著优于基线。

Comments Accepted at the GlobalSouthML Workshop at ICML 2026. 8 pages, 2 figures

2605.29977 2026-06-02 cs.CV cs.LG 版本更新

EVL-ECG: Efficient ECG Interpretation With Multi-Aspect Heterogeneous Knowledge Distillation

EVL-ECG：基于多方面异构知识蒸馏的高效心电图解读

Dang Nguyen Hong, Nhi Ngoc-Yen Nguyen, Huy-Hieu Pham

发表机构 * University of Notre Dame（圣母大学）

AI总结提出EVL-ECG框架，通过多头交叉注意力对齐、最优传输视觉特征匹配和几何结构关系匹配三种创新方法，实现跨架构知识蒸馏，在资源受限环境下高效解读心电图。

Comments Accepted at the SD4H Workshop at ICML 2026. 7 pages, 3 figures

详情

AI中文摘要

高保真心电图解读越来越依赖于大规模基础模型，但其在临床边缘护理中的部署仍受到极端计算需求的阻碍。虽然知识蒸馏（KD）是一种有前景的解决方案，但传统方法在跨异构架构传递知识时，无法捕捉心电图信号的复杂时空依赖关系。本文提出EVL-ECG，一个专门用于心脏诊断逻辑跨架构蒸馏的框架。EVL-ECG引入了三种心电图感知创新：（1）多头交叉注意力对齐，协调架构差异以保留细粒度形态特征；（2）基于最优传输的视觉特征匹配，利用最优传输在标记表示不匹配的情况下保持跨心电图导联的全局结构关系；（3）几何结构内关系匹配，蒸馏教师模型的潜在诊断推理。在心电图基准测试上的评估表明，EVL-ECG相比现有基线，AUC提升高达2.4%，临床准确率提升1.1%。值得注意的是，EVL-ECG建立了一个高效的20亿参数心电图基础模型，适用于资源受限的临床环境。

英文摘要

High-fidelity ECG interpretation is increasingly reliant on massive foundation models, yet their deployment in clinical edge-care remains hindered by extreme computational demands. While knowledge distillation (KD) is a promising solution, traditional methods fail to capture the complex spatio-temporal dependencies of ECG signals when transferring knowledge across heterogeneous architectures. In this paper, we propose EVL-ECG, a framework specifically designed for cross-architecture distillation of cardiac diagnostic logic. EVL-ECG introduces three ECG-aware innovations: (1) Multi-Head Cross-Attention Alignment, which harmonizes architectural discrepancies to preserve fine-grained morphological features; (2) Optimal Transport-based Visual Feature Matching, utilizing optimal transport to maintain global structural relationships across ECG leads despite mismatched token representations; and (3) Geometric Intra-Architecture Relation Matching, which distills the latent diagnostic reasoning of the teacher model. Evaluations across ECG benchmarks demonstrate that EVL-ECG yields improvements of up to 2.4% AUC and 1.1% clinical accuracy over existing baselines. Notably, EVL-ECG establishes an efficient 2B-parameter ECG foundation model, suitable for resource-constrained clinical environments.


2605.16415 2026-06-02 cs.CV cs.LG 版本更新

Diffusion Models, Denoiser Architecture and Creativity

扩散模型、去噪器架构与创造力

Itamar Levine, Yair Weiss

发表机构 * The Hebrew University of Jerusalem（耶路撒冷希伯来大学）

AI总结本文通过理论和实验表明，扩散模型的创造力源于去噪器架构与目标分布之间的相互作用，并指出去噪器架构的归纳偏差必须与真实目标分布高度一致才能成功。

详情

AI中文摘要

扩散模型的创造力是指它们生成与训练数据不同但高度逼真图像的能力。创造力有些令人惊讶，因为已知如果扩散模型中使用的去噪器是给定训练集的贝叶斯最优去噪器，那么模型将简单地复制训练样本。在本文中，我们提出经验和理论结果，表明扩散模型的创造力源于去噪器架构与目标分布之间的相互作用。理论上，我们针对三种不同的去噪器架构（线性、多项式、瓶颈）给出了生成样本分布作为目标分布和去噪器函数的显式形式。经验上，我们表明流行的UNET去噪器架构的微小变化会导致非常不同的创造力形式，并且这些微小变化通常会产生高度不真实的样本。综合来看，我们的结果表明，只有当去噪器架构的归纳偏差与真实目标分布高度一致时，扩散模型才能成功。

英文摘要

The creativity of diffusion models refers to their ability to generate highly realistic images that are different from their training data. Creativity is somewhat surprising since it is known that if the denoiser used in the diffusion model is the Bayes optimal denoiser for a given training set, then the model will simply copy the training samples. In this paper we present empirical and theoretical results that suggest that creativity in diffusion models is due to an interaction between the denoiser architecture and the target distribution. Theoretically, we give explicit forms for the distribution of generated samples as a function of the target distribution and the denoiser architecture for three different denoiser architectures (linear, polynomial, bottleneck). Empirically, we show that small changes in the popular UNET denoiser architecture leads to very different forms of creativity, and these small changes often yield samples that are highly nonrealistic. Taken together, our results show that diffusion models will only be successful if the inductive bias of the denoiser architecture is in strong alignment with the true target distribution.


2605.07804 2026-06-02 cs.LG cs.AI 版本更新

Prune-OPD: Efficient and Reliable On-Policy Distillation for Long-Horizon Reasoning

Prune-OPD：面向长程推理的高效可靠在线策略蒸馏

Zhicheng Yang, Zhijiang Guo, Yifan Song, Minrui Xu, Yongxin Wang, Yiwei Wang, Xiaodan Liang, Jing Tang

发表机构 * The Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））； The Hong Kong University of Science and Technology（香港科技大学）； MBZUAI ； University of California, Merced（加州大学默塞德分校）； Sun Yat-sen University（中山大学）

AI总结提出Prune-OPD框架，通过实时检测学生与教师之间的前缀漂移并动态截断不可靠的轨迹，在减少计算浪费的同时保持或提升长程推理任务的性能。

Comments 17 pages, 8 figures

详情

AI中文摘要

在线策略蒸馏（OPD）利用密集的教师奖励来增强推理模型。然而，将OPD扩展到长程任务暴露了一个关键缺陷：随着学生生成的前缀不可避免地偏离教师的思维过程，教师的密集奖励失去了局部可利用性。继续在这些“漂移”轨迹上生成和评估标记不仅会降低奖励质量，还会导致巨大的计算浪费。为了解决这个问题，我们引入了Prune-OPD，一个动态地将训练预算与监督质量对齐的框架。通过持续监控学生和教师预测之间的局部兼容性（例如，通过top-$k$重叠），Prune-OPD实时检测前缀漂移事件。一旦检测到严重漂移，它会单调地降低后续不可靠奖励的权重，并触发动态的轨迹截断。这使得训练过程能够停止无效的生成，并将计算重新分配到可靠的教师监督上。在不同的教师-学生组合中，Prune-OPD始终将计算与监督可靠性对齐。当前缀漂移使得密集的教师奖励不可靠时，它减少了37.6%至68.0%的训练时间，同时保持甚至提升了在具有挑战性的基准（AMC、AIME、HMMT）上的性能。当学生-教师兼容性保持较高时，它会通过扩展训练窗口自动保留长上下文监督。这些结果表明，Prune-OPD不是通过盲目缩短轨迹来改进OPD，而是通过将计算重新分配到局部可利用的教师奖励上。

英文摘要

On-policy distillation (OPD) leverages dense teacher rewards to enhance reasoning models. However, scaling OPD to long-horizon tasks exposes a critical flaw: as the student's generated prefix inevitably diverges from the teacher's thought process, the teacher's dense reward loses local exploitability. Continuing to generate and evaluate tokens on these "drifted" trajectories not only degrades reward quality but also incurs massive computational waste. To address this, we introduce Prune-OPD, a framework that dynamically aligns training budgets with supervision quality. By continuously monitoring the local compatibility between student and teacher predictions (e.g., via top-$k$ overlap), Prune-OPD detects prefix-drift events in real time. Upon detecting severe drift, it monotonically down-weights subsequent unreliable rewards and triggers dynamic rollout truncation. This allows the training process to halt futile generation and reallocate compute strictly to reliable teacher supervision. Across diverse teacher-student combinations, Prune-OPD consistently aligns computation with supervision reliability. When prefix drift makes dense teacher rewards unreliable, it reduces training time by 37.6% to 68.0% while preserving, and often improving, performance on challenging benchmarks (AMC, AIME, HMMT). When student-teacher compatibility remains high, it automatically preserves long-context supervision by expanding the training window. These results suggest that Prune-OPD improves OPD not by blindly shortening rollouts, but by reallocating computation toward locally exploitable teacher rewards.
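The drift-monitoring idea in this abstract (top-$k$ overlap, monotone down-weighting, truncation) can be sketched with plain Python lists. This is a hypothetical illustration, not the Prune-OPD procedure: the running-minimum weighting rule, the `threshold` and `patience` parameters, and all function names are assumptions.

```python
def top_k_overlap(student_logits, teacher_logits, k):
    """Fraction of token ids shared by the two models' top-k predictions at one position."""
    def top_ids(logits):
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        return set(ranked[:k])
    return len(top_ids(student_logits) & top_ids(teacher_logits)) / k


def monotone_weights(overlaps):
    """Running minimum of the overlap scores, so reward weights never increase again."""
    weights, lowest = [], 1.0
    for overlap in overlaps:
        lowest = min(lowest, overlap)
        weights.append(lowest)
    return weights


def truncation_length(overlaps, threshold=0.5, patience=2):
    """Number of tokens to keep, or None if no sustained drift is seen.

    Assumption: drift counts as severe once the overlap stays below
    `threshold` for `patience` consecutive positions.
    """
    low_run = 0
    for position, overlap in enumerate(overlaps):
        low_run = low_run + 1 if overlap < threshold else 0
        if low_run >= patience:
            return position + 1
    return None


print(top_k_overlap([2.0, 1.0, 0.5, 0.1], [0.1, 3.0, 2.0, 0.0], k=2))  # 0.5

per_token_overlap = [1.0, 0.8, 0.4, 0.6, 0.2, 0.0, 0.2]
print(monotone_weights(per_token_overlap))   # [1.0, 0.8, 0.4, 0.4, 0.2, 0.0, 0.0]
print(truncation_length(per_token_overlap))  # 6
```

The single dip at position 2 lowers the weights but does not trigger truncation; only the sustained drop at positions 4 and 5 does, which avoids cutting a rollout on one noisy token.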


2602.14307 2026-06-02 cs.AI cs.LG 版本更新

Benchmarking at the Edge of Comprehension

在理解边缘的基准测试

Samuele Marro, Jialin Yu, Emanuele La Malfa, Oishi Deb, Jiawei Li, Yibo Yang, Ebey Abraham, Sunando Sengupta, Eric Sommerlade, Michael Wooldridge, Philip Torr

发表机构 * University of Cambridge（剑桥大学）

AI总结提出Critique-Resilient Benchmarking框架，通过对抗性生成-评估游戏在人类理解受限时比较模型，利用批判韧性正确性概念和分项Bradley-Terry模型对LLM进行排序。

详情

AI中文摘要

随着前沿大型语言模型（LLMs）在新基准发布后迅速饱和，基准测试本身正处于一个转折点：如果前沿模型持续改进，人类将越来越难以生成具有区分度的任务、提供准确的真实答案或评估复杂解决方案。如果基准测试变得不可行，我们衡量AI进展的能力将受到威胁。我们将这种情况称为后理解阶段。在这项工作中，我们提出了Critique-Resilient Benchmarking，一种对抗性框架，旨在即使在人类完全理解不可行的情况下也能比较模型。我们的技术依赖于批判韧性正确性的概念：如果没有对手令人信服地证明答案错误，则该答案被视为正确。与标准基准测试不同，人类充当有界验证者，专注于局部声明，从而在超出任务完全理解的情况下保持评估完整性。使用分项二分Bradley-Terry模型，我们联合对LLM进行排序，依据其解决挑战性任务的能力和生成困难但可解问题的能力。我们在数学领域展示了该方法在八个前沿LLM上的有效性，表明所得分数稳定且与外部能力度量相关。我们的框架将基准测试重新定义为一种对抗性生成-评估游戏，其中人类作为最终裁决者。

英文摘要

As frontier Large Language Models (LLMs) increasingly saturate new benchmarks shortly after they are published, benchmarking itself is at a juncture: if frontier models keep improving, it will become increasingly hard for humans to generate discriminative tasks, provide accurate ground-truth answers, or evaluate complex solutions. If benchmarking becomes infeasible, our ability to measure any progress in AI is at stake. We refer to this scenario as the post-comprehension regime. In this work, we propose Critique-Resilient Benchmarking, an adversarial framework designed to compare models even when full human understanding is infeasible. Our technique relies on the notion of critique-resilient correctness: an answer is deemed correct if no adversary has convincingly proved otherwise. Unlike standard benchmarking, humans serve as bounded verifiers and focus on localized claims, which preserves evaluation integrity beyond full comprehension of the task. Using an itemized bipartite Bradley-Terry model, we jointly rank LLMs by their ability to solve challenging tasks and to generate difficult yet solvable questions. We showcase the effectiveness of our method in the mathematical domain across eight frontier LLMs, showing that the resulting scores are stable and correlate with external capability measures. Our framework reformulates benchmarking as an adversarial generation-evaluation game in which humans serve as final adjudicators.
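For readers unfamiliar with Bradley-Terry ranking, the sketch below fits the plain (non-itemized, non-bipartite) model with the standard minorization-maximization update. It is a simplification and not the itemized bipartite variant the paper uses: the win-count matrix is made-up toy data, and the function name and iteration count are assumptions.

```python
def fit_bradley_terry(wins, iterations=200):
    """Estimate Bradley-Terry strengths from pairwise win counts.

    wins[i][j] is how often competitor i beat competitor j (diagonal is 0).
    Uses the classic MM update: p_i = W_i / sum_j n_ij / (p_i + p_j),
    followed by normalization so the strengths sum to 1.
    """
    n = len(wins)
    strength = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i])
            denominator = sum(
                (wins[i][j] + wins[j][i]) / (strength[i] + strength[j])
                for j in range(n) if j != i
            )
            updated.append(total_wins / denominator)
        normalizer = sum(updated)
        strength = [s / normalizer for s in updated]
    return strength


# Toy data: model 0 usually beats models 1 and 2; model 1 usually beats model 2.
toy_wins = [
    [0, 8, 9],
    [2, 0, 7],
    [1, 3, 0],
]
scores = fit_bradley_terry(toy_wins)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 1, 2]
```

Under the fitted model, the probability that competitor i beats competitor j is `scores[i] / (scores[i] + scores[j])`, so the strengths give both a ranking and pairwise win probabilities.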


2506.02075 2026-06-02 stat.ME cs.LG 版本更新

Position: Stop Chasing the C-index when Evaluating Survival Analysis Models

立场：评估生存分析模型时停止追逐C指数

Christian Marius Lillelund, Shi-ang Qi, Russell Greiner, Christian Fischer Pedersen

发表机构 * University of Copenhagen（哥本哈根大学）

AI总结本文批判性审视生存分析中的评估实践，指出C指数等一致性指标被过度使用且与建模目标错位，提出双螺旋阶梯框架以确保评估指标与模型假设对齐，并通过实验展示错位导致的误导性比较。

Comments ICML 2026 Position Paper Track (Spotlight)

详情

AI中文摘要

当前生存分析评估的现状受到持续使用与既定建模目标不一致的评估指标的困扰。此外，许多此类评估基于隐含或不合理的删失假设。这意味着报告的性能可能具有误导性，并且可能无法回答评估旨在解决的科学或建模问题。在这篇立场论文中，我们批判性地审视了生存分析中的评估实践，并强调了删失如何使评估从根本上不同于标准回归或分类。我们特别关注基于一致性的度量，如C指数，我们证明其在文献中被过度使用。为了帮助确定合适的度量，我们提出了一组关键需求，并引入了一个双螺旋阶梯，其中有效评估需要度量与模型假设之间的对齐。通过控制实验，我们表明这种对齐的违反可能导致误导性的模型比较。最后，我们提供了关于如何评估生存模型的实用指导。

英文摘要

The current state of evaluation in survival analysis is plagued by the persistent use of evaluation metrics in ways that are misaligned with the stated modeling objective. In addition, many such evaluations are based on censoring assumptions that are left implicit or unjustified. This means that the reported performance can be misleading and may fail to answer the scientific or modeling question the evaluation was intended to address. In this position paper, we critically examine evaluation practices in survival analysis and highlight how censoring makes evaluation fundamentally different from standard regression or classification. We place particular focus on concordance-based measures, such as the C-index, which we show are heavily overused in the literature. To help identify appropriate metrics, we propose a set of key desiderata and introduce a double-helix ladder, in which valid evaluation requires alignment between metric and modeling assumptions. Through controlled experiments, we show that violations of this alignment can lead to misleading model comparisons. We conclude by providing practical guidance on how to evaluate a survival model.
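As background for the discussion of concordance-based measures, the sketch below computes Harrell's C-index on toy data. It is a textbook-style illustration, not code from the paper: ties in event times are ignored for simplicity, and the data and function name are made up.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    events[i] is 1 if the event was observed and 0 if subject i was censored.
    A pair (i, j) is comparable when subject i has an observed event and a
    shorter time than subject j. Higher risk should mean an earlier event.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


times = [2, 4, 6, 8]
events = [1, 0, 1, 1]            # the second subject is censored at t = 4
risk = [0.9, 0.2, 0.5, 0.6]

print(harrell_c_index(times, events, risk))                      # 0.75
print(harrell_c_index(times, events, [100 * r for r in risk]))   # 0.75
```

Rescaling every risk score leaves the index unchanged because only the ordering of scores enters the computation. The index therefore says nothing about whether predicted survival times or probabilities are calibrated, which is one reason a ranking metric may not match a given modeling objective.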


2605.29548 2026-06-02 cs.LG 版本更新

TIMEGATE：资源约束下面向持续ML适应的可持续限时晋升门控

Abhijit Chakraborty, Suddhasvatta Das, Yash Shah, Vivek Gupta, Kevin A. Gary

发表机构 * University of California, Berkeley（加州大学伯克利分校）； University of Washington（华盛顿大学）

AI总结提出TIMEGATE策略层，通过为时间、标注、训练和评估设定预算来管理持续ML适应，在节省评估计算的同时不出现静默的错误模型晋升。

2605.29072 2026-06-02 cs.LG cs.NA math.NA 版本更新

Ensemble Score Filtering for Real-Data Energy Consumption Forecast Correction

集成得分滤波用于真实数据能耗预测修正

Ruoyu Hu, Dahai Yu, Feng Bao, Guang Wang, Guannan Zhang

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结针对真实能耗数据的高维数据同化问题，采用集成得分滤波器（EnSF）结合预训练黑箱时空预测模型，通过基于得分的扩散模型和闭式得分表示修正预测轨迹，实验表明EnSF优于开环传播和集成卡尔曼滤波。

详情

AI中文摘要

准确的能耗估计和预测对电力系统运行、规划和需求侧管理至关重要。然而，在实践中，完整及时的测量可能并不总是可用，观测数据可能是不完整的、有噪声的或延迟的。这促使使用学习型预测模型来预测不断变化的消费状态，并结合数据同化方法进行序列预测修正。在这项工作中，我们研究了真实能耗数据的高维数据同化问题。前向预测由预训练的黑箱时空预测模型提供，该模型在滤波过程中被视为状态传播器。我们采用集成得分滤波器（EnSF）来同化部分和有噪声的观测，并随时间修正预测轨迹。EnSF使用基于得分的扩散模型来近似滤波分布，并通过使用闭式得分表示和蒙特卡洛近似避免在同化过程中重新训练神经网络得分模型。数值实验表明，学习型预测模型的开环传播在长时间范围内可能变得不可靠，而基于EnSF的修正显著改善了状态估计。与集成卡尔曼滤波（EnKF）的比较进一步表明，在本工作考虑的非线性观测设置下，EnSF提供了更强的修正能力。

英文摘要

Accurate estimation and forecasting of energy consumption are important for power-system operation, planning, and demand-side management. In practice, however, complete and timely measurements may not always be available, and the observed data can be partial, noisy, or delayed. This motivates the use of learned forecasting models for predicting the evolving consumption state, together with data assimilation methods for sequential forecast correction. In this work, we study a high-dimensional data assimilation problem for real energy-consumption data. The forward prediction is supplied by a pretrained black-box spatio-temporal forecasting model, which is treated as the state propagator in the filtering procedure. We employ the Ensemble Score Filter (EnSF) to assimilate partial and noisy observations and to correct the forecast trajectory over time. The EnSF uses score-based diffusion models to approximate filtering distributions and avoids retraining neural-network score models during assimilation by using a closed-form score representation and Monte Carlo approximation. Numerical experiments demonstrate that open-loop propagation of the learned forecasting model can become unreliable over long horizons, while EnSF-based correction substantially improves state estimation. Comparisons with the Ensemble Kalman Filter (EnKF) further show that EnSF provides stronger correction under the nonlinear observation setting considered in this work.


2605.28952 2026-06-02 cs.CR cs.DS cs.IT cs.LG math.IT math.ST stat.TH 版本更新

Optimal Rates for Differentially Private Hypothesis Testing with E-values

基于E值的差分隐私假设检验的最优速率

Ben Jacobsen, Tomas Gonzalez, Gavin Brown, Kassem Fawaz, Aaditya Ramdas

发表机构 * University of Wisconsin-Madison（威斯康星大学麦迪逊分校）； Carnegie Mellon University（卡内基梅隆大学）

AI总结研究在ε-差分隐私约束下，使用e值进行假设检验时所能达到的最大e-power，并给出最优速率及匹配算法。

Comments Corrected typos; updated references; generalized proposition 3.1

详情

AI中文摘要

近年来，e值作为支持随时有效（anytime-valid）和自适应数据分析的灵活工具引起了广泛关注。假设检验是许多此类应用的核心，而这些应用通常涉及私有或敏感数据。在这项工作中，我们回答了一个简单但重要的问题：给定两个分布 $\mathbb{P}$ 和 $\mathbb{Q}$，当使用满足 $\varepsilon$-差分隐私的e值检验 $X\sim \mathbb{P}^n$ 对 $X\sim\mathbb{Q}^n$ 时，所能达到的最大e-power是多少？我们刻画了该问题的最优速率，并提供了一个精确匹配的算法。在顺序设置中，当观测值逐个到达且分析者选择何时停止时，我们给出了任何私有e过程的停止时间的匹配上下界。数值实验证实了我们算法的实用性，在多种顺序检验问题和隐私水平下，我们的算法所需数据少于最近提出的DP-SPRT。

英文摘要

E-values have attracted considerable interest in recent years as flexible tools for enabling anytime-valid and adaptive data analysis. Hypothesis testing is at the core of many of these applications, which can often involve private or sensitive data. In this work, we answer a simple but important question: given two distributions $\mathbb{P}$ and $\mathbb{Q}$, what is the maximum achievable e-power when testing $X\sim \mathbb{P}^n$ against $X\sim\mathbb{Q}^n$ with e-values that satisfy $\varepsilon$-differential privacy? We characterize the optimal rate for this problem and provide an algorithm which matches it exactly. In the sequential setting, when observations arrive one-by-one and the analyst chooses when to halt, we give matching upper and lower bounds on the stopping times of any private e-process. Numerical experiments confirm the practicality of our algorithms, which require less data than the recently proposed DP-SPRT across a range of sequential testing problems and privacy levels.
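
For readers new to e-values, the following is a minimal non-private sketch of the object being privatized: a likelihood-ratio e-value for testing $\mathbb{P}^n$ against $\mathbb{Q}^n$, rejected at level $\alpha$ when it reaches $1/\alpha$ (valid by Markov's inequality). The Gaussian choice of $\mathbb{P}$ and $\mathbb{Q}$ and all function names are illustrative assumptions; this is not the paper's $\varepsilon$-differentially private algorithm.

```python
import math

def gaussian_log_lr(x, mu_p=0.0, mu_q=1.0, sigma=1.0):
    """log(q(x) / p(x)) for two Gaussians that share the same standard deviation."""
    return ((x - mu_p) ** 2 - (x - mu_q) ** 2) / (2.0 * sigma ** 2)

def log_e_value(xs, mu_p=0.0, mu_q=1.0, sigma=1.0):
    """Log of the product of per-sample likelihood ratios.

    Under the null P the product has expectation 1, so it is an e-value.
    """
    return sum(gaussian_log_lr(x, mu_p, mu_q, sigma) for x in xs)

def reject_null(xs, alpha=0.05):
    """Reject P at level alpha when the e-value is at least 1 / alpha."""
    return log_e_value(xs) >= math.log(1.0 / alpha)
```

A private variant would have to add calibrated noise or clipping before the product is released, which is where the rates discussed in the abstract come into play.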

2605.28850 2026-06-02 cs.LG q-fin.CP 版本更新

Representation Signatures and Risk-Feedback Alignment in LLM Trading Agents

表示签名与LLM交易智能体中的风险反馈对齐

Weicheng Xue

发表机构 * Virginia Tech（弗吉尼亚理工大学）

AI总结通过TradeArena测试平台研究LLM交易智能体在金融决策中的行为对齐与表示动态，发现故障前表示签名（规划嵌入漂移、流形有效秩收缩）并验证风险反馈作为外部对齐信号的有效性。

详情

AI中文摘要

我们研究了大型语言模型（LLM）智能体在金融决策环境中的行为对齐与表示动态。TradeArena是一个可审计的交易智能体测试平台，提供风险报告、执行模拟、记忆和可重放轨迹，使我们能够分析在市场压力下推理、持仓和干预的演变。代码和数据工件可通过TradeArena仓库获取。我们发现了故障前签名：规划嵌入偏离正常质心，融合的计划-风险表示将正常状态与预回撤状态分离，局部流形表现出有效秩收缩。在80个滚动故障锚点和8条LLM轨迹中，这一模式在哈希、LSA、Transformer和白盒隐藏状态探针中持续存在。使用无CoT目标权重、词汇控制、OHLCV噪声和虚假审计的压力测试表明，无推理时推理级收缩消失，而意图空间和融合签名仍具有信息性。结构化风险反馈可以在不微调的情况下作为外部对齐信号，但并非通用性能增强器：真实审计反馈改善了一些模型的校准，另一些模型的收益，并暴露出安慰剂或隐藏反馈在短周期内收益更高但对齐诊断较弱的情况。一项51只股票的日内实验揭示了相关性盲点：LLM推理为风险层会削减的相关资产敞口提供理由。最后，一个金融审计任务套件将比较从“哪个模型交易最好”转向模型能否审计轨迹、尊重执行边界、重现工件并避免过度声明。这些结果支持研究主张而非盈利主张：可审计的风险反馈和表示轨迹揭示了LLM金融推理何时对齐、漂移或失败。

英文摘要

We study behavioral alignment and representation dynamics of large language model (LLM) agents in financial decision environments. TradeArena, an auditable trading-agent testbed with risk reports, execution simulation, memory, and replayable trajectories, lets us analyze how rationales, positions, and interventions evolve under market stress. Code and data artifacts are available through the TradeArena repository (https://github.com/weich97/TradeArena.git). We find pre-failure signatures: planning embeddings drift from normal centroids, fused plan-risk representations separate normal from pre-drawdown states, and local manifolds exhibit effective-rank contraction. Across 80 rolling failure anchors and eight LLM trajectories, this pattern persists across hash, LSA, Transformer, and white-box hidden-state probes. Stress tests with CoT-free target weights, lexical controls, OHLCV noise, and false audits show that rationale-level contraction can vanish without rationales, while intent-space and fused signatures remain informative. Structured risk feedback can act as an external alignment signal without fine-tuning, but not as a universal performance enhancer: true audit feedback improves calibration for some models, returns for others, and exposes cases where placebo or hidden feedback has higher short-horizon return but weaker alignment diagnostics. A 51-stock intraday experiment reveals a correlation blind spot: LLM rationales justify exposure to coupled assets that the risk layer clips. Finally, a financial-audit task suite shifts comparison from "which model trades best" to whether models can audit trajectories, respect execution boundaries, reproduce artifacts, and avoid claim overreach. These results support a research claim, not a profitability claim: auditable risk feedback and representation trajectories reveal when LLM financial reasoning is aligning, drifting, or failing.

2605.26092 2026-06-02 cs.LG cs.AI 版本更新

GoQuant: Geometric Orthogonal Residual Projection for Multiplier-Free Power-of-Two Transformer Quantization

GoQuant：面向无乘法器的2的幂（PoT）Transformer量化的几何正交残差投影

Maoyang Xiang, Tao Luo, Bo Wang

发表机构 * Information Systems Technology and Design（信息系统技术与设计）； Singapore University of Technology and Design（新加坡科技设计大学）； Institute of High Performance Computing (IHPC)（高性能计算研究所）； Agency for Science, Technology and Research (A*STAR)（科技研究局）

AI总结针对低比特量化中2的幂（PoT）格式的低角度分辨率问题，提出几何正交残差投影量化（GoQuant），通过双基几何投影和移位加操作合成高分辨率残差格点，实现硬件高效且无需乘法器的量化方法。

详情

AI中文摘要

大型语言模型（LLMs）和视觉Transformer（ViTs）在边缘设备上的部署受到内存限制和密集乘加（MAC）阵列引入的关键时序瓶颈的显著约束。在超低比特范围内，对数2的幂（PoT）量化通过用位移操作替代MAC操作，提供了一种硬件高效的替代方案。然而，非均匀指数格点固有地受到低角度分辨率机制的局限，这一结构缺陷在低于4比特阈值时尤为突出，导致高维特征流形的显著退化。为解决这一几何限制，我们提出了几何正交残差投影量化（GoQuant），一种算法-硬件协同设计框架。通过将量化表述为双基几何投影，GoQuant使用严格的移位加操作自适应地合成更高分辨率的残差格点。此外，其解析求解器为计算密集的梯度优化提供了实用替代方案，将LLaMA-2-7B的全模型校准时间减少到约15分钟。广泛评估表明GoQuant在多种模态下的适用性和硬件效率。在3比特（W3/A16）约束下，它在LLaMA-2-7B上实现了6.10的困惑度，在不依赖非对称缩放的情况下优于AWQ等常规MAC密集型基线，同时在4比特场景下保持竞争性精度。在硅片层面，28nm节点的标准单元RTL综合表明，GoQuant有效缓解了与密集乘法器树相关的时序瓶颈。通过展平组合逻辑深度，我们的并行移位加数据路径将关键路径延迟降低至0.35纳秒。

英文摘要

The deployment of Large Language Models (LLMs) and Vision Transformers (ViTs) on edge devices is significantly constrained by memory limitations and the critical timing bottlenecks introduced by dense Multiply-Accumulate (MAC) arrays. In the ultra-low bit regime, logarithmic Power-of-Two (PoT) quantization provides a hardware-efficient alternative by replacing MAC operations with bit-shifts. However, the non-uniform exponential lattice is inherently limited by a Low Angular Resolution Regime, a structural flaw that becomes particularly pronounced at sub-4-bit thresholds, leading to a notable degradation of high-dimensional feature manifolds. To address this geometric limitation, we propose Geometric Orthogonal Residual Projection Quantization (GoQuant), an algorithm-hardware co-design framework. By formulating quantization as a dual-basis geometric projection, GoQuant adaptively synthesizes a higher-resolution residual lattice using strictly shift-and-add operations. Furthermore, its analytical solver offers a practical alternative to computationally intensive gradient-based optimization, reducing the full-model calibration time for LLaMA-2-7B to approximately 15 minutes. Extensive evaluations demonstrate GoQuant's applicability across modalities and its hardware efficiency. Under the 3-bit (W3/A16) constraint, it achieves a perplexity of 6.10 on LLaMA-2-7B, comparing favorably to conventional MAC-intensive baselines like AWQ without relying on asymmetric scaling, while maintaining competitive accuracy in 4-bit scenarios. At the silicon level, standard-cell RTL synthesis at a 28nm node indicates that GoQuant effectively mitigates the timing bottlenecks associated with dense multiplier trees. By flattening the combinational logic depth, our parallel shift-and-add datapath reduces the critical path delay to 0.35 ns.
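
To make the power-of-two idea concrete, here is a rough sketch of plain PoT rounding followed by one extra PoT term fitted to the residual, so that a weight becomes a sum of two shifts. It only illustrates why a second shift-and-add term refines the lattice; it is not GoQuant's dual-basis projection or its analytical solver, and the function names and the exponent range `k_min`/`k_max` are assumptions made for this example.

```python
import math

def pot_quantize(w, k_min=-8, k_max=0):
    """Map w to sign(w) * 2**k, with k rounded in the log2 domain and clamped.

    Magnitudes below half of the smallest level are flushed to zero.
    """
    if abs(w) < 2.0 ** (k_min - 1):
        return 0.0
    k = round(math.log2(abs(w)))
    k = max(k_min, min(k_max, k))
    return math.copysign(2.0 ** k, w)

def residual_pot_quantize(w, k_min=-8, k_max=0):
    """Two-term approximation: one PoT level plus one PoT level for the residual.

    In hardware, multiplying an activation by the result needs two shifts and one add.
    """
    first = pot_quantize(w, k_min, k_max)
    second = pot_quantize(w - first, k_min, k_max)
    return first + second
```

For example, 0.3 maps to 0.25 with a single term and to 0.25 + 0.0625 = 0.3125 with the residual term, which cuts the absolute error from 0.05 to 0.0125.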

2605.25889 2026-06-02 cs.CR cs.LG 版本更新

Graph Navier Stokes Networks

图纳维-斯托克斯网络

Zexing Zhao, Guangsi Shi, Yu Gong, Tianyu Wang, Shirui Pan, Hongye Cheng, Yuxiao Li

发表机构 * Northwest A&F University（西北农林科技大学）； Corporate Research Center, Midea Group（美的集团企业研发中心）； Peking University（北京大学）； Fudan University（复旦大学）； Griffith University（格里菲斯大学）； Bosch（博世）

AI总结针对图神经网络中的过平滑问题，提出基于纳维-斯托克斯方程的图纳维-斯托克斯网络（GNSN），通过引入对流机制实现更高效的消息传递，并在12个真实数据集上取得最优分类性能。

详情

AI中文摘要

图神经网络（GNN）已成为深度学习的基石，现有方法大多基于图信号处理和扩散方程来建模消息传递。然而，这些方法固有地存在过平滑问题，即随着网络深度增加，节点特征变得难以区分。受纳维-斯托克斯方程启发，我们提出了图纳维-斯托克斯网络（GNSN），这是一种新颖的架构，通过将对流引入图结构，超越了传统的基于扩散的消息传递。GNSN在图定义动态速度场来控制对流，实现更高效、更直接的消息传播。通过自适应平衡对流和扩散，GNSN能够有效处理具有不同同质性水平的数据集。在12个真实世界数据集上的广泛评估表明，GNSN在分类准确率上持续优于最先进的基线方法。此外，实验结果进一步强调了其在缓解过平滑问题方面的有效性。

英文摘要

Graph Neural Networks (GNNs) have emerged as a cornerstone of deep learning, with most existing methods rooted in graph signal processing and diffusion equations to model message passing. However, these approaches inherently suffer from the oversmoothing problem, where node features become indistinguishable as the network depth increases. Inspired by the Navier Stokes equations, we introduce Graph Navier Stokes Networks (GNSN), a novel architecture that transcends conventional diffusion-based message passing by incorporating convection into graph structures. GNSN defines a dynamic velocity field on the graph to govern convection, enabling more efficient and direct message propagation. By adaptively balancing convection and diffusion, GNSN is able to efficiently handle datasets with varying levels of homophily. Extensive evaluations across twelve real-world datasets demonstrate that GNSN consistently outperforms state-of-the-art baselines in classification accuracy. Moreover, experimental results further emphasize its effectiveness in alleviating the oversmoothing problem.

2605.25143 2026-06-02 cs.AI cs.LG 版本更新

Beyond the Frontier: Stochastic Backtracking for Efficient Test-Time Scaling

超越前沿：用于高效测试时扩展的随机回溯

Dao Tran, Duc Anh Le, Ngoc Luu, Quan Pham, Tung Pham, Hung Bui

发表机构 * Qualcomm AI Research（高通人工智能研究）

AI总结提出随机回溯方法，通过维护历史前缀池并利用子池选择和幂回溯序列蒙特卡洛机制，在测试时扩展中实现更高的准确率-令牌数权衡。

详情

AI中文摘要

测试时扩展通过花费额外计算来探索多个解轨迹，从而改进语言模型推理。关键挑战是在推理过程中最大化准确率的同时最小化生成的令牌总数。最近的PRM引导方法对中间前缀进行评分以引导搜索，但大多数方法仅关注前沿：它们只保留当前活动的前缀，并使用带噪声的PRM分数不可逆地剪枝或重采样其余部分。这可能导致过早承诺、多样性崩溃以及丢失仍可产生正确延续的前缀。我们引入了一种基于历史前缀持久池的随机回溯，允许测试时计算重新访问先前生成的状态，而不是仅扩展当前前沿。为了提高效率，我们提出了两种互补机制。子池选择通过随机子池内应用Top-N选择来增强贪婪PRM引导搜索，使历史前缀有机会绕过评分过高的前沿候选。幂回溯序列蒙特卡洛使用幂化PRM分数和混合校正权重，将SMC风格的重采样扩展到持久池。在数学推理基准和模型规模上，我们的方法在每令牌准确率上始终更高，并且与强PRM引导基线相比，仅使用一小部分令牌数即可达到相同的准确率水平，这表明持久池随机回溯为改善测试时扩展中的准确率-令牌权衡提供了一种简单有效的方法。

英文摘要

Test-time scaling improves language model reasoning by spending additional compute to explore multiple solution trajectories. The key challenge is to maximize accuracy while minimizing the total number of generated tokens during reasoning. Recent PRM-guided methods score intermediate prefixes to steer this search, but most are frontier-only: they keep only the current active prefixes and irreversibly prune or resample away the rest using noisy PRM scores. This can cause premature commitment, diversity collapse, and the loss of prefixes that still admit correct continuations. We introduce stochastic backtracking over a persistent pool of historical prefixes, allowing test-time compute to revisit previously generated states instead of only expanding the current frontier. To make this efficient, we propose two complementary mechanisms. Subpool Selection strengthens greedy PRM-guided search by applying Top-N selection within random subpools, giving historical prefixes a chance to bypass over-scored frontier candidates. Power Backtrack Sequential Monte Carlo extends SMC-style resampling to the persistent pool using powered PRM scores and mixture-corrected weights. Across mathematical reasoning benchmarks and model scales, our methods consistently achieve higher accuracy per token count, and the same level of accuracy using only a fraction of the token count in comparison to strong PRM-guided baselines, demonstrating that persistent-pool stochastic backtracking provides a simple and effective way to improve the accuracy-token trade-off in test-time scaling.
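
The Subpool Selection step described above can be sketched roughly as follows: draw a random subpool from the persistent pool of scored prefixes, then keep the Top-N inside that subpool, so an older prefix can win whenever an over-scored frontier candidate is not drawn. The data layout (a list of `(prefix, score)` pairs) and the function name are assumptions for illustration, not the authors' implementation.

```python
import random

def subpool_top_n(pool, subpool_size, n, rng):
    """Pick the n best-scored prefixes from a uniformly random subpool.

    pool: list of (prefix, prm_score) pairs, including historical prefixes.
    rng:  a random.Random instance, passed in so that runs are reproducible.
    """
    size = min(subpool_size, len(pool))
    subpool = rng.sample(pool, size)
    ranked = sorted(subpool, key=lambda item: item[1], reverse=True)
    return ranked[:n]
```

When `subpool_size` equals the pool size this reduces to ordinary greedy Top-N; smaller subpools trade some greediness for diversity.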

2605.24528 2026-06-02 cs.AI cs.CL cs.LG 版本更新

Hypothesis Generation and Inductive Inference in Children and Language Models

儿童与语言模型中的假设生成与归纳推理

Jeffrey Qin, Wasu Top Piriyakulkij, Zhuangfei Gao, Mia Radovanovic, Jessica Sommerville, Kevin Ellis, Marta Kryven

发表机构 * Computer Science University of Waterloo（滑铁卢大学计算机科学系）； Department of Computer Science Cornell University（康奈尔大学计算机科学系）； Department of Computer Science Dalhousie University（达尔豪斯大学计算机科学系）； Department of Psychology University of Toronto（多伦多大学心理学系）

AI总结通过归纳推理盒子任务，结合贝叶斯粒子推断的程序归纳形式化，比较儿童与基于LLM的智能体在不确定性下的假设生成与证据寻求行为，发现两者在适应环境结构上相似但信息寻求成本与归纳偏差不同。

详情

AI中文摘要

现实世界中的决策需要在证据、潜在因果规则以及世界状态本身的不确定性下构建心智模型。在这种条件下，哪些计算原理支撑人类的推理？在给定匹配约束下，基于LLM的智能体是否表现出类似行为？我们使用归纳推理盒子任务来探讨这些问题，在该任务中，参与者（人类儿童和基于LLM的智能体）通过与不确定环境的顺序交互来推断潜在原因。我们将该任务形式化为基于贝叶斯粒子推断的程序归纳，并承认两种互补的解释：(1) 作为对假设的约束满足过程，以及(2) 作为程序综合问题，其中假设是针对证据评估的可执行程序。使用基于约束的公式，我们表明儿童的行为最好由主观证据可靠性和在线假设生成的组合来解释，这解释了他们的证据寻求模式以及任务完成与规则泛化之间的分离。使用程序综合公式，我们将基于LLM的智能体视为模型有机体：可控系统，允许系统性地操纵任务条件。在各种后端中，基于LLM的智能体复制了儿童对证据可靠性和可观察性变化的反应，包括折扣不可靠证据、寻求解决部分信息以及任务完成与因果泛化之间的分离。同时，与儿童相比，基于LLM的智能体倾向于过度观察和过度遵守指令。这些结果表明，虽然儿童和基于LLM的智能体在适应环境结构方面相似，但他们的信息寻求行为表现出不同的潜在成本和归纳偏差。

英文摘要

Real world decision-making requires constructing mental models under uncertainty over evidence, over the underlying causal rules, and over the state of the world itself. Which computational principles underpin human inference under such conditions, and do LLM-based agents exhibit similar behavior given matching constraints? We address these questions using an inductive inference Box Task in which participants, human children and LLM-based agents, infer a latent cause through sequential interaction with an uncertain environment. We formalize this task as program induction with Bayesian particle-based inference, admitting two complementary interpretations: (1) as a constraint satisfaction process over hypotheses, and (2) as a program synthesis problem in which hypotheses are executable programs evaluated against evidence. Using the constraint-based formulation, we show that children's behavior is best explained by a combination of subjective evidence reliability and online hypothesis generation, accounting for both their evidence-seeking patterns and their dissociation between task completion and rule generalization. Using the program synthesis formulation, we treat LLM-based agents as model organisms: controllable systems that allow systematic manipulation of task conditions. Across backends, LLM-based agents replicate children's responses to changes in evidence reliability and observability, including discounting unreliable evidence, seeking to resolve partial information, and dissociating between task completion and causal generalization. At the same time, LLM-based agents tend to over-observe and over-comply with instructions relative to children. These results suggest that while children and LLM-based agents adapt similarly to environmental structure, their information-seeking behavior exhibits distinct underlying costs and inductive biases.

2605.18838 2026-06-02 cs.LG cs.AI cs.CL 版本更新

Lying Is Just a Phase: The Hidden Alignment Transition in Language Model Scaling

说谎只是一个阶段：语言模型扩展中的隐藏对齐转变

Adil Amin

发表机构 * ZEHEN Labs（ZEHEN实验室）

AI总结通过分析63个基础模型，发现语言模型在特定规模阈值下，推理能力与真实性从反相关转变为正相关，并揭示了输出投影瓶颈和零竞争注意力头等内部机制。

Comments 15 pages, 8 figures, 2 tables. Companion paper: "The Growing Pains of Frontier Models: When Leaderboards Stop Separating and What to Measure Next." ( https://doi.org/10.48550/arXiv.2605.18840). Code: https://github.com/adilamin89/cape-scaling. Dashboard: https://zehenlabs.com/cape/

详情

AI中文摘要

扩展定律预测了计算量带来的损失，但未预测能力如何相互作用。我们测量了来自16个家族的63个基础模型的推理能力与真实性之间的耦合，并发现了一个在损失曲线中不可见的相变：低于家族依赖的临界规模N_c时，能力反相关（r = -0.989，p = 4 x 10^{-5}，非参数置换检验）；高于该规模时，它们合作。N_c ~ 3.5B参数 [2.9B, 13.4B]（bootstrap 95% CI），但模型大小并非决定相位的唯一变量。架构、数据整理和训练配方各自独立地改变N_c：精心整理的数据消除了Qwen代际之间的耦合下降（在匹配规模下从0.025到0.830），Gemma-4在4B时通过蒸馏和架构创新实现了0.871的耦合，这通常是13B+标准训练模型的特征，而Phi在1B时仅通过数据整理就达到了10B网络训练模型的耦合水平。宽度归一化消除了所有测试家族的反相关，支持输出投影瓶颈的存在。在内部，40个模型中有38个显示零竞争注意力头。一个稀疏回归ODE以5.6%的误差交叉预测了保留的Llama-2。该诊断不需要模型内部信息——仅需跨模型家族的公开基准分数。合作区域扩展到前沿（r = +0.72，34个模型，10个实验室）。一个概念验证干预证实了瓶颈是可利用的：在识别层添加单个真实方向向量，无需重新训练即可纠正税收阶段60%的错位输出——这是一种无需修改权重的、每推理一次的外科手术式修正。代码、数据、用于任何开放权重模型的开源转向CLI以及用于相位诊断的交互式仪表板已发布：https://zehenlabs.com/cape/。

英文摘要

Scaling laws predict loss from compute but not how capabilities interact. We measure the coupling between reasoning and truthfulness across 63 base models from 16 families and find a regime change invisible to loss curves: below a family-dependent critical scale N_c, capabilities anticorrelate (r = -0.989, p = 4 x 10^{-5} nonparametric permutation test); above it, they cooperate. N_c ~ 3.5B parameters [2.9B, 13.4B] (bootstrap 95% CI), but model size is not the only variable that determines phase. Architecture, data curation, and training recipe each shift N_c independently: curated training eliminated the coupling dip between Qwen generations (0.025 to 0.830 at matched scale), Gemma-4 at 4B achieves coupling 0.871, characteristic of 13B+ standard-trained models, through distillation and architectural innovation, and Phi at 1B matches web-trained coupling at 10B through data curation alone. Width normalization eliminates the anticorrelation across all tested families, supporting an output-projection bottleneck. Internally, 38 of 40 models show zero competing attention heads. A sparse-regression ODE cross-predicts held-out Llama-2 at 5.6% error. The diagnostic requires no model internals -- only public benchmark scores across a model family. The cooperative regime extends to the frontier (r = +0.72, 34 models, 10 labs). A proof-of-concept intervention confirms the bottleneck is exploitable: adding a single truth-direction vector at the identified layer corrects 60% of misaligned outputs in the tax phase with zero retraining -- a surgical, per-inference correction that requires no weight modification. Code, data, an open-source steering CLI for any open-weight model, and an interactive dashboard for phase diagnosis are released: https://zehenlabs.com/cape/.

2602.23179 2026-06-02 cs.LG q-bio.BM 版本更新

Induction Meets Biology: Mechanisms of Repeat Detection in Protein Language Models

归纳遇见生物学：蛋白质语言模型中重复检测的机制

Gal Pomerants, Yaniv Nikankin, Anja Reusch, Tomer Tsaban, Ora Schueler-Furman, Yonatan Belinkov

发表机构 * Weizmann Institute of Science（魏茨曼科学研究院）

AI总结通过分析蛋白质语言模型在掩码预测中的行为，揭示了其检测精确和近似重复序列的两阶段机制：先构建特征表示，再利用归纳头关注重复片段中的对齐标记。

详情

AI中文摘要

蛋白质序列中存在大量重复片段，既有精确拷贝，也有带有突变的近似片段。这些重复对蛋白质结构和功能至关重要，推动了数十年来关于重复识别的算法研究。最近的研究表明，蛋白质语言模型（PLMs）通过掩码标记预测中的行为能够识别重复。为了阐明其内部机制，我们研究了PLMs如何检测精确和近似重复。我们发现，近似重复的机制在功能上包含了精确重复的机制。然后，我们描述了这一机制，揭示了两个主要阶段：首先，PLMs使用通用位置注意力头和生物学特化组件（如编码氨基酸相似性的神经元）构建特征表示；然后，归纳头关注重复片段中的对齐标记，促进正确答案的产生。我们的结果揭示了PLMs如何通过将基于语言的模式匹配与特化的生物学知识相结合来解决这一生物学任务，从而为研究PLMs中更复杂的进化过程奠定了基础。

英文摘要

Protein sequences are abundant in repeating segments, both as exact copies and as approximate segments with mutations. These repeats are important for protein structure and function, motivating decades of algorithmic work on repeat identification. Recent work has shown that protein language models (PLMs) identify repeats, by examining their behavior in masked-token prediction. To elucidate their internal mechanisms, we investigate how PLMs detect both exact and approximate repeats. We find that the mechanism for approximate repeats functionally subsumes that of exact repeats. We then characterize this mechanism, revealing two main stages: PLMs first build feature representations using both general positional attention heads and biologically specialized components, such as neurons that encode amino-acid similarity. Then, induction heads attend to aligned tokens across repeated segments, promoting the correct answer. Our results reveal how PLMs solve this biological task by combining language-based pattern matching with specialized biological knowledge, thereby establishing a basis for studying more complex evolutionary processes in PLMs.

2601.19597 2026-06-02 cs.LG 版本更新

The Geometric Mechanics of Contrastive Representation Learning: Alignment Potentials, Entropic Dispersion, and Cross-modal Divergence

对比表示学习的几何力学：对齐势、熵分散和跨模态散度

Yichao Cai, Zhen Zhang, Yuhang Liu, Javen Qinfeng Shi

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结本文通过测度论框架，在大批量极限下证明InfoNCE目标与确定性能量景观的等价性，揭示单模态与对称多模态之间的几何分岔，并指出跨模态散度项导致模态间隙。

Comments 54 Pages, ICML 2026 (Refined document aesthetics for clearer reading)

详情

AI中文摘要

尽管InfoNCE是现代对比学习的基础，但其几何机制在经典的对齐-均匀分解之外仍未被充分刻画。我们发展了一个测度论框架，其中表示测度在固定的嵌入流形上演化。在大批量极限下，我们证明了值和梯度的一致性，将随机目标与显式的确定性能量景观联系起来，并揭示了单模态和对称多模态之间的几何分岔。在单模态情况下，内在能量是严格凸的，并具有唯一的吉布斯平衡，表明熵在对齐盆地中起到打破平衡的作用。在多模态情况下，内在几何变得交叉耦合，并包含一个持续的负对称散度项：每个模态的边缘分布重塑了另一个模态的有效景观，使得强成对对齐与持续的模态间隙共存。受控的合成实验和预训练CLIP表示的分析支持这些预测。总体而言，我们的结果将分析视角从逐点区分转移到总体几何，表明仅靠成对对齐不足以控制跨模态边缘结构。

英文摘要

While InfoNCE underlies modern contrastive learning, its geometric mechanisms remain under-characterized beyond the canonical alignment-uniformity decomposition. We develop a measure-theoretic framework in which representation measures evolve on a fixed embedding manifold. In the large-batch limit, we prove value and gradient consistency, linking the stochastic objective to explicit deterministic energy landscapes and revealing a geometric bifurcation between unimodal and symmetric multimodal regimes. In the unimodal case, the intrinsic energy is strictly convex and admits a unique Gibbs equilibrium, showing that entropy acts as a tie-breaker within the aligned basin. In the multimodal case, the intrinsic geometry becomes cross-coupled and contains a persistent negative symmetric divergence term: each modality's marginal reshapes the effective landscape of the other, allowing strong pairwise alignment to coexist with a persistent modality gap. Controlled synthetic experiments and analyses of pretrained CLIP representations support these predictions. Overall, our results shift the analytical lens from pointwise discrimination to population geometry, showing that pairwise alignment alone is insufficient to control cross-modal marginal structure.
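
As a reminder of the finite-batch objective whose large-batch limit is analyzed above, here is a small sketch of the standard one-directional InfoNCE loss computed from a similarity matrix. The matrix layout (positives on the diagonal), the default temperature, and the function name are assumptions for this example; the symmetric multimodal variant would average this loss with the one computed on the transposed matrix.

```python
import math

def info_nce(similarity, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    similarity[i][j] scores anchor i against candidate j; the positive
    for anchor i is assumed to sit on the diagonal (j == i).
    """
    total = 0.0
    for i, row in enumerate(similarity):
        logits = [s / temperature for s in row]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_norm - logits[i]
    return total / len(similarity)
```

With uninformative similarities the loss equals the log of the batch size, and it approaches zero as the positives dominate each row.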

2605.24202 2026-06-02 cs.AI cs.LG 版本更新

When Does Multi-Agent RL Improve LLM Workflows? Workflow, Scale, and Policy-Sharing Tradeoffs

多智能体强化学习何时能改进LLM工作流？工作流、规模与策略共享的权衡

Yifan Zeng, Yiran Wu, Yaolun Zhang, Wentian Zhao, Kun Wan, Qingyun Wu, Huazheng Wang

发表机构 * Oregon State University（俄勒冈州立大学）； Pennsylvania State University（宾夕法尼亚州立大学）； Adobe Inc.（Adobe公司）； AG2AI, Inc.（AG2AI公司）

AI总结研究多智能体LLM工作流中端到端强化学习训练的效果，发现改进依赖于工作流、任务和规模，策略共享不提供统一稳定性而是重新分配失败模式。

详情

AI中文摘要

多智能体LLM工作流通过将推理路由到专门角色来提升最终任务准确性，但联合训练这些角色的强化学习不稳定，其机制尚不明确。我们研究了多智能体LLM工作流的端到端RL训练何时能改进其基础模型，比较了共享策略训练（所有角色更新一个策略）和隔离策略训练（每个角色有自己的参数）。我们的实验矩阵涵盖Eval-Opt、Voting和Orch-Workers工作流、数学和代码任务以及三种模型规模（0.6B、1.7B、4B）。我们发现多智能体RL通常能改进基础模型，但增益共同依赖于工作流、任务和规模，而非仅依赖于策略共享。隔离策略倾向于达到更高的峰值准确率，但更频繁地掉入终端准确率悬崖，而共享策略训练并未消除失败；它只是将失败重新分布为性质不同的模式。然后，我们通过工作流拓扑和策略路由引起的角色级梯度动力学解释了其中最显著的模式：在隔离策略下，共享提示上的并行同角色代理会放大每个角色的梯度，并在Voting和Orch-Workers工作流中导致终端退化；在共享策略下，非对称的每步梯度质量导致共享策略被主导角色捕获，从而产生因任务和工作流而异的失败特征。总之，经验图谱及其潜在机制表明，策略共享通过不同渠道引导训练压力，而非提供统一稳定性，使其成为具有工作流和任务条件权衡的设计选择。

英文摘要

Multi-agent LLM workflows route inference through specialized roles to lift end-task accuracy, but jointly training those roles with reinforcement learning is unstable in ways that are poorly understood. We study when end-to-end RL training of multi-agent LLM workflows improves over their base models, comparing Shared-Policy training, where all roles update one policy, with Isolated-Policy training, where each role has its own parameters. Our experimental matrix spans Eval-Opt, Voting, and Orch-Workers workflows, math and code tasks, and three model scales (0.6B, 1.7B, 4B). We find that multi-agent RL usually improves over base models, but gains depend jointly on workflow, task, and scale, not on policy sharing alone. Isolated-Policy tends to reach higher peak accuracy yet more often falls off a terminal accuracy cliff, while Shared-Policy training does not eliminate failure; it redistributes failure into qualitatively different patterns. We then explain the strongest of these patterns through role-level gradient dynamics induced by workflow topology and policy routing: under Isolated-Policy, parallel same-role agents on shared prompts amplify per-role gradients and drive terminal degradation in Voting and Orch-Workers workflows; under Shared-Policy, asymmetric per-step gradient mass causes the shared policy to be captured by the dominant role, producing different failure signatures by task and workflow. Together, the empirical map and its underlying mechanisms show that policy sharing routes training pressure through different channels rather than offering uniform stability, making it a design choice with workflow- and task-conditional tradeoffs.

2212.07944 2026-06-02 cs.LG math.OC q-fin.CP q-fin.PM q-fin.ST 版本更新

Variable Clustering via Distributionally Robust Nodewise Regression

基于分布鲁棒节点回归的变量聚类

Kaizheng Wang, Xiao Xu, Xun Yu Zhou

发表机构 * Department of Industrial Engineering and Operations Research & The Data Science Institute, Columbia University（工业工程与运筹学系及数据科学研究院，哥伦比亚大学）

AI总结本文提出一种分布鲁棒节点回归方法，通过凸松弛、数据驱动鲁棒区域选择和ADMM算法，实现多因子块模型下的变量聚类，并在数值实验中展示其优越性能。

Comments ICML 2026

2605.23500 2026-06-02 cs.CV cs.LG 版本更新

B-GRTO: Bootstrapped Group Relative Tool Optimization for Referring Segmentation

B-GRTO: 引导式分组相对工具优化用于指代分割

Mario Markov, Stefan Maria Ailuro, Mohammad Mahdi, Luc Van Gool, Danda Pani Paudel

发表机构 * INSAIT ； Sofia University "St. Kliment Ohridski"（索菲亚大学"圣克莱门特·欧赫里迪斯基"）

AI总结提出B-GRTO框架，通过引导式预训练和分组相对工具优化，联合优化策略与可微分割解码器，显著提升复杂指代分割性能。

详情

AI中文摘要

分割是计算机视觉中的基本任务，支撑像素级场景理解，并作为从自主感知到医学图像分析等应用的基石。对于复杂的指代分割，近期方法将大型视觉-语言模型与分割解码器配对：前者分析图像和提示，后者预测目标掩码。尽管强化学习改进了推理密集型视觉-语言系统，但可训练工具（如分割解码器）通常使用可微目标单独优化，而将这些目标原则性地整合到强化学习中仍未被充分探索。因此，我们引入了分组相对工具优化（GRTO），这是一个数学上严谨的框架，用于联合优化具有可微工具使用的策略。GRTO重用分组相对策略优化（GRPO）的采样结果来优化辅助工具目标，使解码器梯度补充策略奖励。此外，我们推导出引导式GRTO（B-GRTO），一种廉价引导工具的预训练方法，从而实现更快的收敛和更优的性能。在三个具有挑战性的指代分割设置中，B-GRTO相比普通GRPO取得了显著改进，匹配或超越了领域特定的最新方法。这证明了将强化学习与可微辅助目标统一用于推理密集型分割的价值。

英文摘要

Segmentation is a fundamental task in computer vision, underpinning pixel-level scene understanding and serving as a cornerstone for applications ranging from autonomous perception to medical image analysis. For complex referring segmentation, recent methods pair large vision-language models with segmentation decoders: the former analyzes the image and prompt, while the latter predicts the target mask. Although reinforcement learning improves reasoning-intensive vision-language systems, trainable tools such as segmentation decoders are typically optimized separately with differentiable objectives, and the principled integration of such objectives into reinforcement learning remains underexplored. Thus, we introduce group relative tool optimization (GRTO), a mathematically grounded framework for jointly optimizing a policy with differentiable tool use. GRTO reuses group relative policy optimization (GRPO) rollouts to optimize the auxiliary tool objective, letting decoder gradients complement policy rewards. Further, we derive Bootstrapped-GRTO (B-GRTO), a pre-training method that cheaply bootstraps the tool, leading to faster convergence and superior performance. Across three challenging referring segmentation settings, B-GRTO results in substantial improvements over plain GRPO, matching or surpassing domain-specific state-of-the-art methods. This demonstrates the value of unifying reinforcement learning with differentiable auxiliary objectives for reasoning-intensive segmentation.

2605.23080 2026-06-02 cs.LG 版本更新

SWE-MiniSandbox: Container-Free Reinforcement Learning for Building Software Engineering Agents

SWE-MiniSandbox：用于构建软件工程智能体的无容器强化学习

Danlong Yuan, Wei Wu, Enhan Zhao, Zhengren Wang, Xueliang Zhao, Huishuai Zhang, Dongyan Zhao

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结提出SWE-MiniSandbox，一种轻量级无容器方法，通过内核级隔离和预缓存技术降低磁盘使用和准备时间，实现可扩展的强化学习训练。

详情

AI中文摘要

强化学习已成为训练软件工程智能体的关键范式，但现有流程通常依赖每个任务的容器进行隔离。在大规模场景下，预构建的容器镜像会带来显著的存储开销、缓慢的环境设置，并且需要容器管理权限。我们提出SWE-MiniSandbox，一种轻量级、无容器的方法，能够在无需牺牲隔离性的情况下实现SWE智能体的可扩展强化学习训练。SWE-MiniSandbox不依赖每个实例的容器，而是在由内核级机制支持的隔离工作空间中执行每个任务，从而大幅降低系统开销。它利用轻量级环境预缓存技术，消除了对庞大容器镜像的需求。因此，我们的方法将磁盘使用量降低到基于容器的流程所需的大约5%，并将环境准备时间缩短到容器基线的大约25%。实验结果表明，SWE-MiniSandbox实现了与标准基于容器的流程相当的评估性能。通过消除对重型容器基础设施的依赖，SWE-MiniSandbox为扩展基于强化学习的SWE智能体提供了一个实用且可访问的基础，特别是在资源受限的研究环境中。

英文摘要

Reinforcement learning (RL) has become a key paradigm for training software engineering (SWE) agents, but existing pipelines typically rely on per-task containers for isolation. At scale, pre-built container images incur substantial storage overhead, slow environment setup, and require container-management privileges. We propose SWE-MiniSandbox, a lightweight, container-free method that enables scalable RL training of SWE agents without sacrificing isolation. Instead of relying on per-instance containers, SWE-MiniSandbox executes each task in an isolated workspace backed by kernel-level mechanisms, substantially reducing system overhead. It leverages lightweight environment pre-caching techniques to eliminate the need for bulky container images. As a result, our approach lowers disk usage to approximately 5% of that required by container-based pipelines and reduces environment preparation time to about 25% of the container baseline. Empirical results demonstrate that SWE-MiniSandbox achieves evaluation performance comparable to standard container-based pipelines. By removing the dependency on heavy container infrastructure, SWE-MiniSandbox offers a practical and accessible foundation for scaling RL-based SWE agents, particularly in resource-constrained research environments.

2605.21422 2026-06-02 cs.LG 版本更新

PRISM: Preference-Aware Influence Function Based Data Selection Method for Efficient Fine-Tuning

PRISM：基于偏好感知影响函数的数据选择方法用于高效微调

Qihao Lin, Guanxu Chen, Dongrui Liu, Jing Shao

发表机构 * Shanghai Artificial Intelligence Laboratory（上海人工智能实验室）

AI总结提出PRISM方法，通过偏好感知影响函数对目标示例加权，构建偏好感知目标方向，优先选择有效驱动模型匹配目标行为的数据，提升高效微调和安全对齐微调性能。

Comments 23 pages, 5 figures

详情

AI中文摘要

随着LLM规模不断扩大，提高训练效率在很大程度上依赖于有效的数据利用。数据选择通过将有限的训练预算分配给能够最优促进模型目标行为的高价值样本来缓解这一问题。大多数现有方法通过一组目标示例定义目标行为，并根据候选训练数据对这些样本的估计影响进行评分。然而，这些方法将所有目标示例视为同等重要，忽略了单个示例对模型优化的不同相关性。具体来说，与模型固有行为紧密对齐的目标示例提供更强的监督信号，而不一致的示例仅提供微弱且无效的局部指导。我们提出PRISM，一种基于偏好感知影响函数的数据选择方法。它利用模型偏好为目标示例分配权重，并构建偏好感知目标方向。PRISM根据候选训练样本对该方向的影响进行评估，并优先将数据预算分配给能有效驱动模型匹配预期目标行为的样本。理论分析验证，与均匀聚合策略相比，加权偏好构造能产生更优的一阶梯度方向以提升目标偏好。涵盖不同模型架构和参数规模的广泛实验表明，PRISM在高效微调和安全对齐监督微调修正中取得了更好的性能。结果验证了目标行为的准确表征是成本效益数据选择的核心。

英文摘要

As LLMs continue to scale up, improving training efficiency heavily relies on effective data utilization. Data selection mitigates this issue by allocating the limited training budget to high-value examples that optimally facilitate the model's target behavior. Most existing approaches define target behavior via a set of target examples and score candidate training data based on their estimated influence on these samples. However, such methods uniformly treat all target examples as equally important, ignoring the varying relevance of individual examples to model optimization. Specifically, target examples that align closely with the model's inherent behavior deliver stronger supervisory signals, whereas discrepant examples yield only weak and ineffective local guidance. We propose PRISM, a Preference-aware Influence function based Data Selection Method. It leverages model preference to assign weights to target examples and builds a preference-aware target direction. PRISM evaluates candidate training samples according to their influence on this direction, and prioritizes data budget allocation to samples that effectively drive the model to match expected target behavior. Theoretical analysis verifies that weighted preference construction generates a superior first-order gradient direction for boosting target preference, compared with uniform aggregation strategies. Extensive experiments covering diverse model architectures and parameter scales demonstrate that PRISM achieves better performance in efficient fine-tuning and safety-aligned supervised fine-tuning rectification. The results validate that accurate characterization of target behavior serves as the core of cost-effective data selection.
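
The weighting idea above can be illustrated with a deliberately simplified sketch: build a target direction as a weighted average of per-target gradients, score each candidate by its inner product with that direction, and keep the top-k. This uses a first-order gradient-alignment proxy rather than a full influence function (no inverse-Hessian term), and how PRISM actually derives the weights from model preference is not specified here, so the `weights` argument and all function names are assumptions.

```python
def weighted_target_direction(target_grads, weights):
    """Weighted average of target-example gradients (each a list of floats)."""
    total = sum(weights)
    dim = len(target_grads[0])
    return [
        sum(w * g[d] for w, g in zip(weights, target_grads)) / total
        for d in range(dim)
    ]

def influence_scores(candidate_grads, direction):
    """First-order proxy: inner product of each candidate gradient with the direction."""
    return [sum(c * d for c, d in zip(g, direction)) for g in candidate_grads]

def select_top_k(candidate_grads, direction, k):
    """Indices of the k candidates whose gradients align best with the direction."""
    scores = influence_scores(candidate_grads, direction)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

Setting all weights equal recovers the uniform aggregation that the abstract contrasts against.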

2605.21125 2026-06-02 cs.LG 版本更新

Advantage Collapse in Group Relative Policy Optimization: Diagnosis and Mitigation

群体相对策略优化中的优势崩塌：诊断与缓解

Xixiang He, Qiyao Sun, Ao Cheng, Xingming Li, Xuanyu Ji, Hailun Lu, Runke Huang, Qingyong Hu

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结针对GRPO在可验证奖励强化学习中的优势崩塌问题，提出诊断指标ACR和轻量级扩展方法AVSPO，通过注入虚拟奖励样本减少梯度消失，提升模型推理能力。

Comments Accepted at the International Conference on Machine Learning (ICML 2026). Project page: https://QingyongHu.github.io/AVSPO

详情

AI中文摘要

群体相对策略优化（GRPO）是可验证奖励强化学习（RLVR）框架中的一种重要算法，在提升大型语言模型（LLMs）的推理能力方面取得了显著成果。然而，GRPO容易发生优势崩塌，这是一种故障模式，其中组内的同质奖励（例如，全部正确或全部错误的答案）产生接近零的优势和消失的梯度。为了解决这个问题，我们引入了优势崩塌率（ACR），这是第一个量化具有无效梯度的训练批次比例的诊断指标。在数学推理基准上，使用从0.5B到14B参数的模型，我们证明ACR能够强有力地预测训练停滞和最终性能。然后，我们提出了自适应虚拟样本策略优化（AVSPO），这是GRPO的一个轻量级扩展，通过实时ACR监控指导注入虚拟奖励样本，使得无需额外的模型 rollout 即可从同质组中学习。与GRPO相比，AVSPO将优势崩塌减少了58-63%，并在所有模型规模上一致地获得了4-6个百分点的准确率提升，同时在评估的域外任务上保持了泛化能力。代码和数据集可在 https://github.com/hexixiang/Advantage-Collapse-Rate 获取。

英文摘要

Group Relative Policy Optimization (GRPO), a prominent algorithm within the Reinforcement Learning from Verifiable Rewards (RLVR) framework, has achieved strong results in improving the reasoning capabilities of large language models (LLMs). However, GRPO is prone to advantage collapse, a failure mode where homogeneous rewards within a group (e.g., all correct or all incorrect answers) yield near-zero advantages and vanishing gradients. To address this, we introduce the Advantage Collapse Rate (ACR), the first diagnostic metric quantifying the proportion of training batches with ineffective gradients. Across models from 0.5B to 14B parameters on mathematical reasoning benchmarks, we show that ACR strongly predicts training stagnation and final performance. We then propose Adaptive Virtual Sample Policy Optimization (AVSPO), a lightweight extension of GRPO that injects virtual reward samples, guided by real-time ACR monitoring, to enable learning from homogeneous groups without additional model rollouts. AVSPO reduces advantage collapse by 58-63% relative to GRPO and yields consistent accuracy gains of 4-6 percentage points across all model scales, while maintaining generalization on the evaluated out-of-domain task. Code and datasets are available at https://github.com/hexixiang/Advantage-Collapse-Rate.
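
The collapse mechanism described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the abstract does not give the exact ACR formula, so the sketch assumes group-normalized advantages (reward minus group mean, divided by group standard deviation) and counts the fraction of groups whose advantages are all numerically zero. The function names, `eps`, and `tol` are hypothetical.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-8):
    """Group-relative advantages: (r - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def advantage_collapse_rate(groups, tol=1e-6):
    """Fraction of groups whose advantages are all (near) zero.

    Assumption: a group counts as collapsed when every advantage has
    magnitude <= tol, which happens when all rewards in the group are equal.
    """
    collapsed = sum(
        1 for rewards in groups
        if all(abs(a) <= tol for a in group_advantages(rewards))
    )
    return collapsed / len(groups)


# Two homogeneous groups (all correct, all incorrect) and two mixed groups.
example_groups = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 0, 1, 0], [0, 0, 0, 1]]
example_acr = advantage_collapse_rate(example_groups)  # 0.5
```

Homogeneous groups contribute no gradient signal under this normalization, which is why a high collapse rate would be expected to coincide with stalled training.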


2605.20854 2026-06-02 cs.LG 版本更新

Finite-Time Regret Analysis of Retry-Aware Bandits

重试感知赌博机的有限时间遗憾分析

Bingkui Tong, Junpei Komiyama, Soichiro Nishimori, Paavo Parmas

发表机构 * Mohamed bin Zayed University of Artificial Intelligence（穆罕默德·本·扎耶德人工智能大学）； RIKEN AIP（理化学研究所革新智能统合研究中心）； The University of Tokyo（东京大学）

AI总结研究针对重试感知目标（如pass@k和max@k）的随机赌博机算法ReMax，通过期望改进平衡条件刻画其最优采样分布，并证明首个亚线性遗憾界，揭示其比汤普森采样更偏重利用（exploitation）的原因。

Comments 38 pages

详情

AI中文摘要

我们研究了一种由重试感知目标（重视多次尝试中的最佳结果，如pass@$k$和max@$k$）启发的随机赌博机算法。给定臂值的后验分布，ReMax选择一种采样分布，最大化$M$次虚拟抽取上的后验期望最大奖励。尽管该目标在强化学习中作为不确定性下的探索机制被引入，但其在赌博机问题中的遗憾性质一直不清楚。对于高斯奖励和第一个非平凡情况$M=2$，我们通过期望改进平衡条件刻画了最优ReMax分布，并证明了ReMax的第一个亚线性遗憾界。我们的分析将次优臂的通常饱和行为与ReMax特有的低估效应分开，其中最优臂可能在不利估计后被采样过少。这解释了为什么ReMax可能比汤普森采样（TS）更偏重利用（exploitation），以及其遗憾分析在技术上的微妙之处。实验支持这一图景：在轻度低估下，ReMax通常优于KL-UCB和汤普森采样，而后验方差缩放在实验中缓解了严重低估。

英文摘要

We study a stochastic bandit algorithm motivated by retry-aware objectives that value the best outcome among multiple attempts, such as pass@$k$ and max@$k$. Given a posterior over arm values, ReMax chooses a sampling distribution that maximizes the posterior expected maximum reward over $M$ virtual draws. Although this objective was introduced in reinforcement learning as an exploration mechanism under uncertainty, its regret properties in bandit problems have remained unclear. For Gaussian rewards and the first nontrivial case $M=2$, we characterize the optimal ReMax distribution through an expected-improvement balance condition and prove the first sublinear regret bound for ReMax. Our analysis separates the usual saturation behavior of suboptimal arms from a ReMax-specific underestimation effect, in which the optimal arm may be sampled too rarely after an unfavorable estimate. This explains why ReMax can be more exploitative than Thompson sampling (TS) and why its regret analysis is technically delicate. Experiments support this picture: ReMax often outperforms KL-UCB and Thompson sampling under mild underestimation, while posterior-variance scaling empirically mitigates severe underestimation.


2605.19847 2026-06-02 cs.CR cs.IR cs.LG 版本更新

Auditing Privacy in Multi-Tenant RAG under Account Collusion

多租户RAG中账户共谋下的隐私审计

Florian A. D. Burnat

发表机构 * University of Bath（巴斯大学）

AI总结针对多租户RAG中同一索引下账户共谋导致隐私泄露加剧的问题，提出一种可验证的审计协议，用于认证噪声-选择检索并报告共谋上限内的隐私损失。

详情

AI中文摘要

多租户RAG服务通常将账户视为隐私边界：每个账户针对租户索引获得$(\varepsilon_{\text{acc}},δ_{\text{acc}})$-DP检索保证。我们表明，这种框架低估了同一索引下账户共谋的泄露。对于高斯噪声-选择检索，$k$个协调的同一租户账户组合成联合泄露$Θ(\sqrt{k}\,\varepsilon_{\text{acc}})$，而非$\varepsilon_{\text{acc}}$；我们给出匹配的成员推断攻击，并在标量、top-$K$、训练嵌入器和生产规模的HNSW设置中验证了预测的$\sqrt{k}$ AUC趋势。然后，我们给出一个验证者可运行的审计协议，该协议认证噪声-选择检索，并针对达到声明上限$k_{\max}$的联盟报告$(\textsf{PASS},\varepsilon_{\text{audit}})$，而不泄露索引或改变检索决策规则。该声明仅针对检索通道：生成通道泄露和对抗性鲁棒的联盟规模估计是补充审计谓词。

英文摘要

Multi-tenant RAG services often treat the account as the privacy boundary: each account receives an $(\varepsilon_{\text{acc}},δ_{\text{acc}})$-DP retrieval guarantee against the tenant index. We show that this framing understates leakage under same-index account collusion. For Gaussian noise-then-select retrieval, $k$ coordinated same-tenant accounts compose to joint leakage $Θ(\sqrt{k}\,\varepsilon_{\text{acc}})$, not $\varepsilon_{\text{acc}}$; we give a matching membership-inference attack and validate the predicted $\sqrt{k}$ AUC trend in scalar, top-$K$, trained-embedder, and production-scale HNSW settings. We then give a verifier-runnable audit protocol that attests noise-then-select retrieval and reports $(\textsf{PASS},\varepsilon_{\text{audit}})$ for coalitions up to a declared cap $k_{\max}$, without disclosing the index or changing the retrieval decision rule. The claim is retrieval-channel only: generation-channel leakage and adversarially robust coalition-size estimation are complementary audit predicates.
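
A rough intuition for the $\sqrt{k}$ scaling can be sketched as follows. This is a heuristic illustration under strong assumptions, not the paper's analysis: it assumes each account sees the same scalar score with independent Gaussian noise, that colluders simply average their $k$ observations, and that the classical Gaussian-mechanism calibration $\sigma = \Delta\sqrt{2\ln(1.25/\delta)}/\varepsilon$ applies (that calibration is only valid for $\varepsilon < 1$). The function names are hypothetical.

```python
import math


def gaussian_sigma(eps, delta, sensitivity=1.0):
    """Noise scale of the classical Gaussian mechanism (valid for eps < 1)."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps


def effective_epsilon_after_averaging(eps_acc, delta, k, sensitivity=1.0):
    """Epsilon implied by the noise left after averaging k independent releases.

    Averaging k independent Gaussian noises divides the standard deviation by
    sqrt(k); inverting the calibration at the same delta gives sqrt(k) * eps_acc.
    """
    sigma_single = gaussian_sigma(eps_acc, delta, sensitivity)
    sigma_averaged = sigma_single / math.sqrt(k)
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / sigma_averaged


# Four colluding accounts at eps_acc = 0.1 behave like one account at about 0.2.
joint_eps = effective_epsilon_after_averaging(0.1, 1e-5, 4)
```

The sketch only covers the scalar case with directly observed noisy scores; the top-$K$ and HNSW settings in the abstract involve selection on top of the noise and are not modeled here.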


2605.01752 2026-06-02 cs.LG 版本更新

LLM引导的通信用于合作多智能体强化学习

Sangjun Bae, Yisak Park, Sanghyeon Lee, Seungyul Han

发表机构 * KAIST（韩国科学技术院）

AI总结提出LMAC框架，利用大语言模型的推理能力设计通信协议，使所有智能体尽可能准确一致地重建底层状态，从而提升多智能体强化学习中的状态重建和性能。

Comments 9 pages for main, 32 pages for total, Accepted to ICML 2026

2605.17554 2026-06-02 cs.AI cs.LG 版本更新

Evaluating Deep Research Agents on Expert Consulting Work: A Benchmark with Verifiers, Rubrics, and Cognitive Traps

评估深度研究代理在专家咨询工作中的表现：一个包含验证器、评分标准和认知陷阱的基准

Tanmay Asthana, Aman Saksena, Divyansh Sahu

AI总结本文提出一个基准，通过42个专家编写的任务，使用确定性验证器和五维度评分标准评估三个前沿深度研究代理（Claude、OpenAI o3、Gemini）在管理咨询类结构化分析交付物上的表现，并嵌入认知陷阱，发现所有代理的联合接受率均较低（最高21.4%），且各有独特失败模式。

Comments Updating the paper with more data. Will resubmit

详情

AI中文摘要

前沿深度研究代理（DRA）能够规划研究任务、综合多篇文档，并按需生成结构化的交付物。它们在企业工作流中的部署速度远快于评估速度。现有基准衡量事实回忆、单跳问答或通用代理技能，忽略了DRA被部署用于生成的多文档、决策级工作。我们引入一个基准，针对管理咨询师典型一周中所需的结构化分析交付物。我们在42个由领域专家（SME）编写的提示上评估三个前沿代理，即Claude Opus 4.6（带网络搜索）、OpenAI o3-deep-research和Google Gemini 3.1 Pro deep-research。全部126个响应均在两个层面评分：确定性真值验证器（平均每个任务13.8个）和五维度0-3 SME评分标准，组合成0-100的验证器-评分标准分数（VRS）。大多数提示嵌入了惩罚表面模式匹配的认知陷阱。在我们的联合阈值（评分标准均值>=2.5且验证器通过率>=80%）下的接受率普遍较低：Gemini 21.4%，o3 9.5%，Claude 9.5%。平均VRS分数与已发表的基于评分标准的基准一致（我们的最高62.6对比APEX-v1 64.2，ProfBench 65.9，ResearchRubrics <68%），验证了评分标准的设计。ACCEPT率低于APEX-Agents在专用DR代理上的MC-segment Pass@1区间（12.3-22.7%）；尽管有评测框架（harness）优势，我们的下限仍低三个百分点，这是由于更严格的合取评分和陷阱设计。每个代理的失败模式各不相同。Claude最可靠地生成交付物（在需要文件的任务上是其他代理的4.5倍），但捏造（fabrication）迹象最为明显。o3的推理维度平均分最好，但会遗漏必要部分并传播算术错误。Gemini是双峰的，具有最高的接受率，同时也有最多的零分评分标准单元格。

英文摘要

Frontier deep research agents (DRAs) plan a research task, synthesize across documents, and return a structured deliverable on demand. They are being deployed in enterprise workflows faster than they are being evaluated. Existing benchmarks measure factual recall, single-hop QA, or generic agentic skill, missing the multi-document, decision-grade work DRAs are deployed to produce. We introduce a benchmark targeting the structured analytical deliverables that fill a management consultant's typical week. We grade three frontier agents, namely Claude Opus 4.6 with web search, OpenAI o3-deep-research, and Google Gemini 3.1 Pro deep-research, on 42 SME-authored prompts. Each of the 126 responses is scored on two layers: deterministic ground-truth verifiers (mean 13.8 per task) and a five-criterion 0-3 SME rubric, composed into a Verifier-Rubric Score (VRS) on 0-100. Most prompts embed cognitive traps that penalize surface-pattern matching. Acceptance under our joint threshold (rubric mean >= 2.5 and verifier rate >= 80%) is uniformly low: Gemini 21.4%, o3 9.5%, Claude 9.5%. Mean VRS scores agree with published rubric-based benchmarks (our top 62.6 vs. APEX-v1 64.2, ProfBench 65.9, ResearchRubrics < 68%), validating the rubric construct. ACCEPT rates sit below APEX-Agents' MC-segment Pass@1 band (12.3-22.7%) on dedicated DR agents; our floor is three points lower despite the harness advantage, opened by stricter conjunctive grading and trap design. Each agent fails distinctively. Claude produces the deliverable most reliably (4.5x the others' rate on file-required tasks) but carries the highest fabrication signature. o3 has the cleanest reasoning average yet drops required sections and propagates arithmetic errors. Gemini is bimodal, with the highest acceptance rate alongside the most zero-scored rubric cells.
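
The joint acceptance rule quoted above (rubric mean >= 2.5 and verifier rate >= 80%) is conjunctive, so a response must clear both bars. A minimal sketch follows; the function name and data layout are hypothetical, and the VRS composition is not reproduced because the abstract does not specify it.

```python
def is_accepted(rubric_scores, verifier_results, rubric_min=2.5, verifier_min=0.8):
    """Conjunctive acceptance: both the rubric mean and the verifier pass rate must clear their thresholds.

    rubric_scores: per-criterion scores on the 0-3 scale.
    verifier_results: booleans, one per deterministic verifier.
    """
    rubric_mean = sum(rubric_scores) / len(rubric_scores)
    verifier_rate = sum(verifier_results) / len(verifier_results)
    return rubric_mean >= rubric_min and verifier_rate >= verifier_min


# Rubric mean 2.6 and 12 of 14 verifiers passing: accepted.
strong_response = is_accepted([3, 3, 2, 3, 2], [True] * 12 + [False] * 2)

# A perfect rubric score does not compensate for a 70% verifier rate.
weak_verifiers = is_accepted([3, 3, 3, 3, 3], [True] * 7 + [False] * 3)
```

As a plausibility check on the reported rates, 9 accepted prompts out of 42 gives 21.4% and 4 out of 42 gives 9.5%; those counts are an inference from the percentages, not figures stated in the abstract.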


2605.12969 2026-06-02 cs.LG cs.AI 版本更新

Revisiting Reinforcement Learning with Verifiable Rewards from a Contrastive Perspective

从对比视角重新审视基于可验证奖励的强化学习

Feng Zhang, Xinhong Ma, Ziqiang Dong, Xi Leng, Jianfei Zhao, Xin Sun, Yang Yang, Guanjun Jiang

发表机构 * Beijing Institute of Technology（北京理工大学）； Qwen Business Unit of Alibaba（阿里巴巴Qwen业务部）； The Chinese University of Hong Kong, Shenzhen（香港中文大学（深圳））

AI总结本文提出ConSPO方法，通过对比序列级策略优化，解决GRPO在目标函数上的似然错配和信用分配不敏感问题，在推理任务上超越强基线。

详情

AI中文摘要

组相对策略优化（GRPO）是目前最广泛采用的RLVR算法之一，用于对大型语言模型进行推理任务的后训练。我们首先证明GRPO存在等价的判别式重新表述，其中策略优化最大化验证的正负rollout之间的期望得分差距。这种重新表述揭示了两个目标层面的局限性：似然错配的替代得分（优化的是基于裁剪比率的得分而非控制生成的序列似然）和得分不敏感的信用分配（rollout级别的信用不反映当前正负rollout之间的得分差距）。为了解决这些局限性，我们提出ConSPO，一种对比序列级策略优化方法，它使用长度归一化的序列对数概率作为rollout得分，并在同一组内对比验证的正rollout与负干扰项。ConSPO优化一个组级别的InfoNCE风格目标，以自适应地增强对分离不佳的正样本和高分负样本的更新，同时结合课程调度的边界，在训练过程中保持分离压力。在多种设置下的实验表明，ConSPO在具有挑战性的推理基准上优于强基线。代码将在论文被接收后发布。

英文摘要

Group Relative Policy Optimization (GRPO) is one of the most widely adopted RLVR algorithms for post-training large language models on reasoning tasks. We first show that GRPO admits an equivalent discriminative reformulation, in which policy optimization maximizes the expected score gap between verified positive and negative rollouts. This reformulation reveals two objective-level limitations: likelihood-misaligned surrogate scores, in which clipped ratio-based scores are optimized rather than the sequence likelihoods that govern generation, and score-insensitive credit assignment, in which rollout-level credit does not reflect the current score gaps between positive and negative rollouts. To address these limitations, we propose ConSPO, a Contrastive Sequence-level Policy Optimization method that uses length-normalized sequence log-probabilities as rollout scores and contrasts verified positive rollouts against negative distractors within the same group. ConSPO optimizes a group-wise InfoNCE-style objective to adaptively strengthen updates for poorly separated positives and high-scoring negatives, together with a curriculum-scheduled margin that preserves separation pressure as training progresses. Experiments across diverse settings show that ConSPO outperforms strong baselines on challenging reasoning benchmarks. Code will be released upon paper acceptance.
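
The two ingredients named above, length-normalized sequence log-probabilities as rollout scores and a group-wise InfoNCE-style contrast, can be sketched as below. This is an illustrative reading of the abstract, not the paper's objective: the temperature, the way the margin enters (subtracted from the positive score), and the averaging over positives are all assumptions, and the function names are hypothetical.

```python
import math


def length_normalized_score(token_logprobs):
    """Mean per-token log-probability of one rollout."""
    return sum(token_logprobs) / len(token_logprobs)


def infonce_loss(pos_scores, neg_scores, temperature=1.0, margin=0.0):
    """Group-wise InfoNCE-style loss (assumes at least one positive rollout).

    Each verified positive is contrasted against all negatives in its group.
    A positive margin makes the positive look worse, keeping pressure on
    pairs that are already separated.
    """
    losses = []
    for s_pos in pos_scores:
        logits = [(s_pos - margin) / temperature]
        logits += [s_neg / temperature for s_neg in neg_scores]
        # Numerically stable log-sum-exp over the positive and all negatives.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denominator - logits[0])
    return sum(losses) / len(losses)


# A positive tied with one negative gives log(2); a better-separated one gives less.
tied = infonce_loss([-1.0], [-1.0])
separated = infonce_loss([-0.5], [-1.0])
```

Because the loss for each positive depends on how close the negatives score, poorly separated positives and high-scoring negatives receive larger gradients, which matches the adaptive behavior the abstract describes.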


2605.17110 2026-06-02 cs.AI cs.LG 版本更新

Capturing LLM Capabilities via Evidence-Calibrated Query Clustering

通过证据校准的查询聚类捕捉LLM能力

Fangzhou Wu, Sandeep Silwal, Qiuyi Zhang

发表机构 * University of Wisconsin–Madison（威斯康星大学麦迪逊分校）； Elorian AI

AI总结提出ECC算法，利用有限后验模型比较校准先验语义嵌入，通过Bradley-Terry模型参数化能力轮廓，联合学习灵活的能力感知聚类结构，显著提升LLM能力排序质量。

Comments 45 pages

详情

AI中文摘要

查询聚类将查询分组为反映共享潜在能力需求的组，从而实现能力感知的LLM评估。现有的聚类方法主要依赖于语义分类或嵌入，由于表面语义与实际模型性能之间的错位，往往无法捕捉此类潜在能力需求。我们提出ECC，一种使用有限后验模型比较校准先验语义嵌入的算法，以弥合表面语义与潜在能力需求之间的差距。ECC通过Bradley-Terry模型参数化的能力轮廓来表征每个聚类，并使用可训练的混合权重来适应具有混合能力需求的查询，联合学习灵活的能力感知聚类结构，支持查询特定的LLM能力推断。大量的定量和定性评估表明，ECC显著提高了LLM能力排序质量，分别比人工标注和基于嵌入的基线平均高出17.64和18.02个百分点，并在查询路由等下游任务中证明有效。

英文摘要

Query clustering organizes queries into groups that reflect shared latent capability demands, enabling capability-aware LLM evaluation. Existing clustering methods, which primarily rely on semantic taxonomies or embeddings, often fail to capture such latent capability requirements due to a misalignment between surface-level semantics and actual model performance. We propose ECC, an algorithm that calibrates prior semantic embeddings using limited posterior model comparisons to bridge the gap between surface-level semantics and latent capability requirements. ECC characterizes each cluster through a capability profile parameterized by a Bradley-Terry model and uses trainable mixture weights to accommodate queries with mixed capability demands, jointly learning a flexible, capability-aware clustering structure that supports query-specific inference of LLM capabilities. Extensive quantitative and qualitative evaluations demonstrate that ECC significantly improves LLM capability ranking quality, outperforming human-labeled and embedding-based baselines by an average of 17.64 and 18.02 percentage points, respectively, and proves effective in downstream tasks such as query routing.
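
For readers unfamiliar with the Bradley-Terry parameterization mentioned above, a small sketch may help. The pairwise win probability is the standard Bradley-Terry form; the way per-cluster profiles are combined with mixture weights is an assumption made for illustration, since the abstract does not give ECC's exact formula. All names are hypothetical.

```python
import math


def bt_win_probability(strength_i, strength_j):
    """Standard Bradley-Terry: P(i beats j) = sigmoid(strength_i - strength_j)."""
    return 1.0 / (1.0 + math.exp(-(strength_i - strength_j)))


def mixed_win_probability(cluster_strengths, mixture_weights, model_i, model_j):
    """Query-specific win probability as a weighted mix of per-cluster profiles.

    cluster_strengths: one dict per cluster mapping model name -> strength.
    mixture_weights: non-negative weights for this query, summing to 1.
    """
    return sum(
        weight * bt_win_probability(profile[model_i], profile[model_j])
        for weight, profile in zip(mixture_weights, cluster_strengths)
    )


# Two capability clusters in which the two models swap rank.
profiles = [{"A": 1.0, "B": 0.0}, {"A": 0.0, "B": 1.0}]
balanced_query = mixed_win_probability(profiles, [0.5, 0.5], "A", "B")  # 0.5
```

A query that loads entirely on one cluster inherits that cluster's ranking, while a query with mixed demands interpolates between them.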


2605.17034 2026-06-02 cs.LG cs.AI cs.CR 版本更新

Privacy Policy Enforcement Guardrails for Data-Sensitive Retrieval-Augmented Generation

面向数据敏感检索增强生成的隐私策略执行护栏

Osama Zafar, Alexander Nemecek, Yiqian Zhang, Wenbiao Li, Debargha Ganguly, Vikash Singh, Vipin Chaudhary, Erman Ayday

发表机构 * University of California, Berkeley（加州大学伯克利分校）； University of Washington（华盛顿大学）； University of Toronto（多伦多大学）； University of Texas at Austin（德克萨斯大学奥斯汀分校）； University of California, Los Angeles（加州大学洛杉矶分校）

AI总结针对RAG系统中上下文数据泄露问题，提出基于双单类密度估计器与融合文本嵌入的隐私策略执行框架，在医学、金融和法律领域实现高AUROC和低误报率。

详情

AI中文摘要

标准的PII过滤器常常遗漏RAG系统中的上下文数据泄露，例如非受管制的属性集群共同识别个人身份。我们引入了一个隐私策略执行（PPE）框架，使用双单类密度估计器与融合文本嵌入，以及针对分布外输入的校准弃权区域。通过跨医学、金融和法律领域的轴分层、多LLM合成数据管道，我们发现传统的高斯混合基线在边界安全压力测试中失败，因为它们关注语言风格而非内容。我们提出的T3+OCSVM检测器，在安全和边界安全数据上训练，实现了0.93+的边界AUROC，同时将误报率降低44-55个百分点，并保持毫秒级延迟。与监督MLP分类器或14B参数LLM评判器相比，我们的框架提供了更优的操作适用性，因为前者具有高弃权率，后者存在延迟和校准问题。该方法为任何合成数据训练的分类器提供了稳健的压力测试标准。

英文摘要

Standard PII filters often miss contextual data leakage in RAG systems, such as non-regulated attribute clusters that collectively identify individuals. We introduce a Privacy Policy Enforcement (PPE) framework using dual one-class density estimators with fused text embeddings and a calibrated abstain region for out-of-distribution inputs. Using an axis-stratified, multi-LLM synthetic data pipeline across medicine, finance, and law, we found that traditional Gaussian Mixture baselines fail on borderline-safe stress tests by focusing on linguistic register rather than content. Our proposed T3+OCSVM detector, trained on safe and borderline-safe data, achieves a borderline AUROC of 0.93+ while reducing false positives by 44-55 percentage points and maintaining millisecond latency. Compared to supervised MLP classifiers or 14B-parameter LLM judges, our framework offers superior operational suitability, as the former suffers from high abstention rates and the latter from latency and calibration issues. This methodology provides a robust stress-testing standard for any synthetic-data-trained classifier.


2605.11125 2026-06-02 cs.LG 版本更新

Language Modeling with Hyperspherical Flows

超球面流语言建模

Justin Deschenaux, Caglar Gulcehre

发表机构 * EPFL（洛桑联邦理工学院）； Microsoft AI（微软人工智能）

AI总结提出一种在超球面潜空间中进行连续流语言建模的方法 S-FLM，通过旋转向量和交叉熵学习速度场，避免独热向量开销，在大型词汇推理任务上显著提升性能，缩小了与掩码扩散模型的差距。

详情

AI中文摘要

离散扩散语言模型作为自回归模型的替代方案发展迅速，其动机在于并行生成能力。然而，为了可处理性，离散扩散模型从因子化分布中采样，其表达能力弱于自回归模型。最近的流语言模型将连续流应用于语言，通过确定性常微分方程将噪声传输到数据，避免了因子化采样。流语言模型操作于独热向量，其维度随词汇表大小缩放，使得流语言模型训练成本高昂。此外，由于所有不同的独热嵌入在 $\ell_2$ 中都是等距的，添加高斯噪声没有明确的语义解释（与图像不同，在图像中高斯噪声逐渐退化结构）。我们引入了 $\mathbb{S}$-FLM，一种在超球面中的潜在流语言模型。$\mathbb{S}$-FLM 通过沿速度场旋转 $\mathbb{S}^{d-1}$ 中的向量来生成序列，该速度场使用交叉熵学习，避免了具体化独热向量的开销。先前的流语言模型在生成困惑度上与自回归模型匹配，但在数学和代码等可验证领域中，高似然的样本不一定正确。$\mathbb{S}$-FLM 在大型词汇推理任务上显著改进了连续流语言模型，并在标准温度采样（$T=1$）下缩小了与掩码扩散的差距，而在优化的低温解码（$T=0.1$）下仍存在差距。

英文摘要

Discrete Diffusion Language Models progressed rapidly as an alternative to autoregressive (AR) models, motivated by their parallel generation abilities. However, for tractability, discrete diffusion models sample from a factorized distribution, which is less expressive than AR. Recent Flow Language Models (FLMs) apply continuous flows to language, transporting noise to data with a deterministic ODE that avoids factorized sampling. FLMs operate on one-hot vectors whose dimension scales with the vocabulary size, making FLMs costly to train. Moreover, since all distinct one-hot embeddings are equidistant in $\ell_2$, adding Gaussian noise does not have a clear semantic interpretation (unlike images, where Gaussian noise progressively degrades structure). We introduce $\mathbb{S}$-FLM, a latent FLM in the hypersphere. $\mathbb{S}$-FLM generates sequences by rotating vectors in $\mathbb{S}^{d-1}$ along a velocity field learned with cross-entropy, avoiding the overhead of materializing one-hot vectors. Previous FLMs match AR in Generative Perplexity (Gen.\ PPL), but samples with high likelihood are not necessarily correct in verifiable domains such as math and code. $\mathbb{S}$-FLM substantially improves continuous flow language models on large-vocabulary reasoning and closes the gap to masked diffusion under standard-temperature sampling ($T=1$), while a gap remains under optimized low-temperature ($T=0.1$) decoding.
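
The equidistance claim above is easy to check directly: any two distinct one-hot vectors differ in exactly two coordinates, so their $\ell_2$ distance is always $\sqrt{2}$ regardless of which tokens they represent. A minimal sketch with hypothetical helper names:

```python
import math


def one_hot(index, vocab_size):
    """One-hot vector for a token index."""
    return [1.0 if i == index else 0.0 for i in range(vocab_size)]


def l2_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


vocab_size = 6
distances = {
    l2_distance(one_hot(i, vocab_size), one_hot(j, vocab_size))
    for i in range(vocab_size)
    for j in range(vocab_size)
    if i != j
}
# Every pair of distinct tokens sits at the same distance, sqrt(2).
```

Since no token is closer to any other, Gaussian perturbation in this space carries no notion of semantic nearness, which is the motivation the abstract gives for moving to a learned latent space on the hypersphere.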


2511.00064 2026-06-02 cs.LG 版本更新

SPORE: Skeleton Propagation Over Recalibrating Expansions

SPORE: 基于重新校准扩展的骨架传播

Randolph Wiredu-Aidoo

发表机构 * Randolph Wiredu-Aidoo

AI总结提出一种两阶段密度聚类算法SPORE，通过自适应扩展和边界传播解决异质密度和边界模糊问题，在28个基准数据集上显著优于现有方法。

详情

AI中文摘要

许多真实世界的数据集不是线性可分的，这限制了基于质心的聚类方法（如K-means）的有效性。基于密度的聚类方法通过识别具有任意几何结构的聚类来解决这一限制；然而，现有方法存在两个持续的缺点。首先，它们在存在异质局部密度的情况下往往表现不佳，其中单个密度阈值无法充分捕获跨多个密度尺度的聚类。其次，它们通常缺乏由基于质心方法的线性划分机制自然诱导的清晰边界界定。本文介绍了SPORE（基于重新校准扩展的骨架传播），这是一种聚类算法，旨在解决这两个挑战，同时保留基于密度方法的几何灵活性。SPORE分两个阶段运行：自适应聚类扩展阶段，然后是邻近驱动的边界传播阶段，即使在弱密度对比下也能保持判别能力。该方法在28个基准数据集上与已建立的基于密度的基线进行了评估，并以K-means作为参考的基于质心方法。实验结果表明，相对于所有评估的基线（p < 0.01），SPORE实现了显著改善的聚类恢复，同时可以在五次随机搜索评估内识别出性能强劲的配置。

英文摘要

Many real-world datasets are not linearly separable, limiting the effectiveness of centroid-based clustering methods such as K-means. Density-based clustering methods address this limitation by identifying clusters with arbitrary geometric structure; however, existing approaches exhibit two persistent shortcomings. First, they often underperform in the presence of heterogeneous local densities, where a single density threshold cannot adequately capture clusters across multiple density scales. Second, they generally lack the clear boundary delineation naturally induced by the linear partitioning mechanism of centroid-based methods. This paper introduces SPORE (Skeleton Propagation Over Recalibrating Expansions), a clustering algorithm designed to address both challenges while preserving the geometric flexibility of density-based approaches. SPORE operates in two stages: an adaptive cluster expansion phase followed by a proximity-driven boundary propagation phase that maintains discriminative capability even under weak density contrast. The proposed method is evaluated on 28 benchmark datasets against established density-based baselines, with K-means included as a reference centroid-based method. Experimental results demonstrate that SPORE achieves significantly improved cluster recovery relative to all evaluated baselines (p < 0.01), while strong-performing configurations can be identified within five random-search evaluations.


2510.12999 2026-06-02 cs.LG stat.ML 版本更新

AMORE: Adaptive Multi-Output Operator Network for Stiff Chemical Kinetics

AMORE: 自适应多输出算子网络用于刚性化学动力学

Kamaljyoti Nath, Additi Pandey, Bryan T. Susi, Hessam Babaee, George Em Karniadakis

发表机构 * Division of Applied Mathematics, Brown University（布朗大学应用数学系）； Applied Research Associates, Inc.（应用研究公司）； Department of Mechanical Engineering and Materials Science, University of Pittsburgh（匹兹堡大学机械工程与材料科学系）

AI总结针对刚性化学动力学系统的时间积分计算成本高的问题，提出AMORE框架，通过自适应损失函数和可逆映射确保多输出算子学习的可靠性，并在合成气和GRI-Mech 3.0上验证了有效性。

详情

DOI: 10.1016/j.jcp.2026.115030

AI中文摘要

刚性系统的时间积分是燃烧、高超声速及其他反应输运系统中计算成本的主要来源。这种刚性会引入远小于其他物理过程的时间尺度，导致显式格式需要极小的步长或隐式方法计算量大。因此，缓解刚性挑战的策略至关重要。虽然神经算子（DeepONet）可以作为刚性动力学的替代模型，但需要可靠的算子学习策略来适当考虑输出变量和样本之间的误差差异。本文开发了AMORE（自适应多输出算子网络），一个包含能够预测多个输出的算子和确保可靠算子学习的自适应损失函数的框架。该算子从给定初始条件预测所有热化学状态。我们提出了两种自适应损失函数，考虑每个状态变量和样本的误差来惩罚损失函数。我们设计了主干网络以自动满足单位分解。为了精确满足质量分数总和为1的约束，我们提出了一个可逆解析映射，将n维物种质量分数向量变换到(n-1)维空间。我们将所提出的自适应损失函数扩展到具有多输出的DeepONet的两步训练中的主干和分支训练。我们还通过预测质量分数上的softmax函数精确实现了另一个质量分数总和为1的约束。我们通过两个示例证明了模型的有效性和适用性：合成气（12个状态）、GRI-Mech 3.0（54个中的24个活跃状态）。所提出的DeepONet将成为未来CFD研究加速湍流燃烧模拟的骨干。AMORE是一个通用框架，本文也将其应用于FNO。

英文摘要

Time integration of stiff systems is a primary source of computational cost in combustion, hypersonics, and other reactive transport systems. This stiffness can introduce time scales significantly smaller than those associated with other physical processes, requiring extremely small time steps in explicit schemes or computationally intensive implicit methods. Consequently, strategies to alleviate challenges posed by stiffness are important. While neural operators (DeepONets) can act as surrogates for stiff kinetics, a reliable operator learning strategy is required to appropriately account for differences in error between output variables and samples. Here, we develop AMORE, Adaptive Multi-Output Operator Network, a framework comprising an operator capable of predicting multiple outputs and adaptive loss functions ensuring reliable operator learning. The operator predicts all thermochemical states from given initial conditions. We propose two adaptive loss functions within the framework, considering each state variable's and sample's error to penalize the loss function. We designed the trunk to automatically satisfy Partition of Unity. To enforce unity mass-fraction constraint exactly, we propose an invertible analytical map that transforms the $n$-dimensional species mass-fraction vector into an ($n-1$)-dimensional space. We extend the proposed adaptive loss functions to trunk and branch training in two-step training of DeepONet with multiple outputs. We implemented another unity mass fraction constraint exactly using a softmax function on the predicted mass fraction. We demonstrate efficacy and applicability of our models through two examples: syngas (12 states), GRI-Mech 3.0 (24 active states out of 54). The proposed DeepONet will be a backbone for future CFD studies to accelerate turbulent combustion simulations. AMORE is a general framework, and here, we also demonstrate it for FNO.
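
Of the two constraint mechanisms mentioned above, the softmax variant is simple to illustrate: applying a softmax to unconstrained network outputs yields positive values that sum to one by construction. The sketch below shows only this variant; the invertible $n \to (n-1)$ map is not reproduced because the abstract does not define it. The function name and sample values are hypothetical.

```python
import math


def softmax_mass_fractions(raw_outputs):
    """Map unconstrained outputs to mass fractions that are positive and sum to 1."""
    peak = max(raw_outputs)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in raw_outputs]
    total = sum(exps)
    return [e / total for e in exps]


# Arbitrary raw outputs for a 4-species toy system.
fractions = softmax_mass_fractions([2.0, -1.0, 0.5, -3.0])
unity_error = abs(sum(fractions) - 1.0)  # zero up to floating-point rounding
```

One design consequence is that a softmax can never output an exactly zero mass fraction, so trace species are represented by very small positive values.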


2605.16451 2026-06-02 cs.LG cs.AI 版本更新

Physics-Guided Geometric Diffusion for Macro Placement Generation

物理引导的几何扩散用于宏单元布局生成

Jongho Yoon, Jinsung Jeon, Seokhyeong Kang

发表机构 * POSTECH Institute of Artificial Intelligence（POSTECH人工智能研究所）； KAIST InnoCORE LLM（韩国科学技术院InnoCORE语言模型实验室）； Seoul National University（首尔国立大学）； Pohang University of Science and Technology（浦项科技大学）

AI总结提出MacroDiff+框架，通过双域去噪架构和物理引导采样策略，在宏单元布局中同时优化拓扑连接和物理约束，在ISPD2005 MMS基准上实现线长减少6.1-6.2%。

Comments Accepted to IJCAI 2026. 9 pages, 5 figures

详情

AI中文摘要

宏单元布局是VLSI物理设计中的关键阶段，从根本上决定了芯片的整体性能。最近的数据驱动布局方法显示出巨大潜力，但它们往往难以处理序列依赖关系，并平衡拓扑连接与物理约束。为弥补这一差距，我们提出了MacroDiff+，一个物理引导的几何扩散框架。具体来说，我们设计了一个双域去噪架构，将异构GNN编码的拓扑连接与Transformer建模的全局几何上下文相结合。此外，我们引入了物理引导采样，一种推理策略，通过显式梯度主动引导生成，以确保统计合理性和物理有效性。在ISPD2005 MMS基准上，MacroDiff+优于最先进的基线，线长减少6.1-6.2%。值得注意的是，在先前方法无法收敛的大规模设计中，它表现出卓越的稳定性和可扩展性。源代码可在https://github.com/jhy00n/MacroDiff-plus获取。

英文摘要

Macro placement is a pivotal stage in VLSI physical design, fundamentally determining the overall chip performance. Recent data-driven placement methods have demonstrated significant potential, yet they often struggle to handle sequential dependencies and to balance topological connectivity with physical constraints. To bridge this gap, we propose MacroDiff+, a physics-guided geometric diffusion framework. Specifically, we design a dual-domain denoising architecture that couples topological connectivity encoded by heterogeneous GNNs with global geometric context modeled by a Transformer. Furthermore, we introduce Physics-Guided Sampling, an inference strategy that actively steers the generation using explicit gradients to ensure both statistical plausibility and physical validity. On the ISPD2005 MMS benchmarks, MacroDiff+ outperforms state-of-the-art baselines with a 6.1-6.2% reduction in wirelength. Notably, it exhibits superior stability and scalability on large-scale designs where prior methods fail to converge. The source code is available at https://github.com/jhy00n/MacroDiff-plus.


2605.16446 2026-06-02 cs.LG cs.AI 版本更新

Avoiding Structural Failure Modes in Tabular Fair SSL: Online Primal-Dual Allocation under Confidence Gating

避免表格公平半监督学习中的结构失效模式：基于置信门控的在线原始-对偶分配

Hangchuan Liang, Changchun Li

发表机构 * College of Computer Science and Technology, Jilin University, China（吉林大学计算机科学与技术学院）； Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University, China（教育部符号计算与知识工程重点实验室）

AI总结针对表格公平半监督学习中的结构冲突，提出在线原始-对偶分配（OPDA）方法，通过动态调度公平性和熵稳定性惩罚，避免掩码崩溃和平凡饱和两种失效模式，在多个基准上实现非退化运行点。

详情

AI中文摘要

半监督学习（SSL）能够在有限标签下进行预测，但高风险表格应用（医疗、信贷、再犯）需要统计公平性保证。通过诊断压力测试，我们识别出表格公平SSL中的结构冲突：在置信门控伪标签下，矩匹配公平正则化器可能触发两种失效模式——掩码崩溃（公平性侵蚀置信度，导致伪标签匮乏）和平凡饱和（漂移至常数预测器）。我们提出在线原始-对偶分配（OPDA），一种在线控制器，利用违规、风险和伪标签健康信号调度公平性和基于熵的稳定性惩罚，从而避免在该诊断机制下为每个数据集选择固定公平权重。在评估的表格基准（Adult、ACSIncome、COMPAS）上，OPDA缓解了静态权重和简单单信号自适应基线中观察到的退化状态。在Adult和COMPAS上，它产生了与经验静态λ前沿竞争的非退化运行点；在ACSIncome上，它保持了效用，同时具有更宽的公平-效用分布。相对于OPDA-lite，完整控制器主要在ACSIncome上将运行点向更高效用偏移，而Adult则突出了两种变体之间的公平-效用权衡。这些结果使OPDA成为表格公平SSL中无需校准的控制器，无需针对每个数据集进行调整即可获得非退化运行点。

英文摘要

Semi-supervised learning (SSL) enables prediction with limited labels, but high-stakes tabular applications (medical, credit, recidivism) require statistical fairness guarantees. We identify a structural conflict in tabular fair SSL through a diagnostic stress test: under confidence-gated pseudo-labeling, moment-matching fairness regularizers can trigger two failure modes -- Masking Collapse (fairness erodes confidence, starving pseudo-labels) and Trivial Saturation (drift to constant predictors). We propose Online Primal-Dual Allocation (OPDA), an online controller that schedules fairness and entropy-based stability penalties using violation, risk, and pseudo-label health signals, avoiding per-dataset selection of a fixed fairness weight within this diagnostic regime. On the evaluated tabular benchmarks (Adult, ACSIncome, COMPAS), OPDA mitigates the degenerate regimes observed under static weighting and simple single-signal adaptive baselines. On Adult and COMPAS, it yields non-degenerate operating points competitive with the empirical static-$λ$ frontier; on ACSIncome, it preserves utility with a wider fairness-utility spread. Relative to OPDA-lite, the full controller mainly shifts the operating point toward higher utility on ACSIncome, while Adult highlights the fairness-utility trade-off between the two variants. These results position OPDA as a calibration-free controller for non-degenerate operating points in tabular fair SSL without per-dataset tuning.


2605.15511 2026-06-02 cs.LG 版本更新

OgBench: A Framework for Evaluating Graph Neural Networks on Omics Data

OgBench：评估图神经网络在组学数据上的框架

Louisa Cornelis, Johan Mathe, Louis Van Langendonck, Guillermo Bernárdez, Nina Miolane

发表机构 * UC Santa Barbara（加州大学圣芭芭拉分校）； Atmo, Inc.（Atmo公司）； Universitat Politècnica de Catalunya（加泰罗尼亚理工大学）

AI总结针对组学数据中样本少、节点多的特点，提出OgBench基准平台，评估GNN性能，发现常用GNN常不如简单MLP和经典基线。

Comments 42 pages

详情

AI中文摘要

图神经网络（GNN）已成为归纳图级学习的主导框架。然而，大多数基准测试关注的是 $n \gg p$ 的情况，其中图的数量 $n$ 远大于每张图的节点数 $p$。这忽略了诸如组学等生物学领域，这些领域处于相反的 $n \ll p$ 情况，其特点是跨少量患者样本的大规模基因、转录本或蛋白质图。这引发了一个问题：\textit{GNN 在低样本、高节点的组学设置中表现如何？} 我们引入了 \texttt{OgBench}（组学图基准），这是第一个针对组学数据 $n \ll p$ 特征下的图级预测基准平台。我们提供了一个标准化的、端到端的模块化基础设施，从原始组学数据到具有不同结构属性的特征图家族。我们对经典GNN、为大型图和组学应用设计的GNN，以及MLP和机器学习基线进行基准测试，以建立参考性能。我们的结果表明，广泛使用的GNN通常并不优于简单的MLP和经典基线。这些发现挑战了图结构在该领域固有地增加价值的普遍假设，促进了对当前学习范式的批判性重新评估。最终，通过揭示这些局限性，OgBench提供了必要的开源生态系统，使社区能够开发和验证专门为生物图设计的新型架构。代码可在 https://github.com/geometric-intelligence/ogbench 获取。

英文摘要

Graph Neural Networks (GNNs) have become the dominant framework for inductive graph-level learning. Yet most benchmarks focus on the regime $n \gg p$, where the number of graphs $n$ greatly exceeds the number of nodes per graph $p$. This overlooks biological domains such as omics, which operate in the opposite $n \ll p$ regime, characterized by large graphs of genes, transcripts, or proteins across few patient samples. This raises the question: \textit{how do GNNs perform in this low-sample, high-node omics setting?} We introduce \texttt{OgBench} (Omics-Graph Bench), the first benchmarking platform for graph-level prediction in the $n \ll p$ regime characteristic of omics data. We provide a standardized, end-to-end modular infrastructure from raw omics data to families of featured graphs with varied structural properties. We benchmark classical GNNs, as well as GNNs designed for large graphs and omics applications, alongside MLPs and machine learning baselines to establish reference performances. Our results show that widely used GNNs often do not outperform simple MLPs and classical baselines. These findings challenge the prevailing assumption that graph structure inherently adds value in this domain, fostering a critical reassessment of current learning paradigms. Ultimately, by exposing these limitations, OgBench provides the open-source ecosystem necessary for the community to develop and validate novel architectures explicitly tailored for biological graphs. The code is available at https://github.com/geometric-intelligence/ogbench.


2603.05917 2026-06-02 cs.LG cs.AI q-fin.ST 版本更新

Stock Market Prediction Using Node Transformer Architecture Integrated with BERT Sentiment Analysis

结合BERT情感分析的节点Transformer架构用于股票市场预测

Mohammad Al Ridhawi, Mahtab Haj Ali, Hussein Al Osman

发表机构 * University of Technology, Baghdad, Iraq（伊拉克巴格达理工大学）

AI总结提出一种将节点Transformer与BERT情感分析相结合的框架，通过图结构建模股票间依赖关系并融合社交媒体情感，在S&P 500股票上实现0.80%的MAPE，显著优于传统方法。

Comments 18 pages, 5 figures, 12 tables. Accepted for publication in IEEE Access

详情

DOI: 10.1109/ACCESS.2026.3691980
Journal ref: IEEE Access, vol. 14, pp. 72613-72631, 2026

AI中文摘要

股票市场预测对在噪声、非平稳和行为动态的复杂市场环境中操作的投资者、金融机构和政策制定者提出了相当大的挑战。传统的预测方法，包括基本面分析和技术指标，往往无法捕捉金融市场中固有的复杂模式和横截面依赖性。本文提出了一种结合节点Transformer架构与基于BERT的情感分析的集成框架，用于股票价格预测。该模型将股票市场表示为图结构，其中个股构成节点，边捕捉关系，包括行业隶属关系、相关价格变动和供应链连接。一个微调的BERT模型从社交媒体帖子中提取情感信息，并通过基于注意力的融合机制将其与定量市场特征相结合。节点Transformer处理历史市场数据，同时捕捉股票间的时间演变和横截面依赖性。在1982年1月至2025年3月期间20只S&P 500股票上进行的实验表明，集成模型在一天前预测中实现了0.80%的平均绝对百分比误差（MAPE），而ARIMA为1.20%，LSTM为1.00%。情感分析的加入使预测误差总体降低10%，在财报公告期间降低25%，而基于图的架构通过捕捉股票间依赖性额外贡献了15%的改进。方向准确率在一天预测中达到65%。通过配对t检验的统计验证确认了这些改进的显著性（所有比较p < 0.05）。该模型在高波动期保持较低的误差，MAPE为1.50%，而基线模型范围为1.60%至2.10%。

英文摘要

Stock market prediction presents considerable challenges for investors, financial institutions, and policymakers operating in complex market environments characterized by noise, non-stationarity, and behavioral dynamics. Traditional forecasting methods, including fundamental analysis and technical indicators, often fail to capture the intricate patterns and cross-sectional dependencies inherent in financial markets. This paper presents an integrated framework combining a node transformer architecture with BERT-based sentiment analysis for stock price forecasting. The proposed model represents the stock market as a graph structure where individual stocks form nodes and edges capture relationships including sectoral affiliations, correlated price movements, and supply chain connections. A fine-tuned BERT model extracts sentiment information from social media posts and combines it with quantitative market features through attention-based fusion mechanisms. The node transformer processes historical market data while capturing both temporal evolution and cross-sectional dependencies among stocks. Experiments conducted on 20 S&P 500 stocks spanning January 1982 to March 2025 demonstrate that the integrated model achieves a mean absolute percentage error (MAPE) of 0.80% for one-day-ahead predictions, compared to 1.20% for ARIMA and 1.00% for LSTM. The inclusion of sentiment analysis reduces prediction error by 10% overall and 25% during earnings announcements, while the graph-based architecture contributes an additional 15% improvement by capturing inter-stock dependencies. Directional accuracy reaches 65% for one-day forecasts. Statistical validation through paired t-tests confirms the significance of these improvements (p < 0.05 for all comparisons). The model maintains lower error during high-volatility periods, achieving MAPE of 1.50% while baseline models range from 1.60% to 2.10%.
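
The two headline metrics above can be made concrete with a short sketch. MAPE follows its standard definition; the directional-accuracy definition (whether the forecast and the realized price move in the same direction relative to the previous day's actual price) is an assumption, since the abstract does not spell it out. Function names and the sample prices are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def directional_accuracy(actual, predicted):
    """Share of days where the forecast moves the same way as the realized price.

    predicted[t] is the one-day-ahead forecast of actual[t]; both moves are
    measured against the previous day's actual price, actual[t - 1].
    """
    hits = 0
    for t in range(1, len(actual)):
        actual_move = actual[t] - actual[t - 1]
        predicted_move = predicted[t] - actual[t - 1]
        if actual_move * predicted_move > 0:
            hits += 1
    return hits / (len(actual) - 1)


# Each forecast is off by 1% of the true price, so MAPE is 1%.
example_mape = mape([100.0, 200.0], [101.0, 198.0])
```

The two metrics answer different questions: a model can have a low MAPE simply because daily price changes are small relative to price levels, while directional accuracy measures whether it anticipates the sign of the move.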

2605.13834 2026-06-02 cs.LG cs.AI cs.CG 版本更新

Topology-Preserving Neural Operator Learning via Hodge Decomposition

通过Hodge分解保持拓扑的神经算子学习

Dongzhe Zheng, Tao Zhong, Christine Allen-Blanchette

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结本文从函数空间视角研究几何网格上物理场方程的解算子，利用Hodge正交性分离不可学习的拓扑自由度与可学习的几何动力学，提出基于Hodge谱对偶的混合欧拉-拉格朗日架构，在保持物理不变量的同时提升几何图上的精度与效率。

Comments Accepted at ICML 2026. Code available at https://github.com/ContinuumCoder/Hodge-Spectral-Duality

详情

AI中文摘要

本文从函数空间视角研究几何网格上物理场方程的解算子。我们发现Hodge正交性通过将不可学习的拓扑自由度与可学习的几何动力学分离，从根本上解决了谱干扰问题，从而实现了局限于保结构子空间的加性逼近。基于Hodge理论和算子分裂，我们推导出原则性的算子级分解。结果是一种混合欧拉-拉格朗日架构，具有我们称为Hodge谱对偶（HSD）的代数级归纳偏置。在我们的框架中，我们使用离散微分形式捕捉拓扑主导的分量，并使用正交辅助环境空间表示复杂的局部动力学。我们的方法在几何图上实现了优越的准确性和效率，并增强了对物理不变量的保真度。我们的代码可在https://github.com/ContinuumCoder/Hodge-Spectral-Duality获取。

英文摘要

In this paper, we study solution operators of physical field equations on geometric meshes from a function-space perspective. We reveal that Hodge orthogonality fundamentally resolves spectral interference by isolating unlearnable topological degrees of freedom from learnable geometric dynamics, enabling an additive approximation confined to structure-preserving subspaces. Building on Hodge theory and operator splitting, we derive a principled operator-level decomposition. The result is a Hybrid Eulerian-Lagrangian architecture with an algebraic-level inductive bias we call Hodge Spectral Duality (HSD). In our framework, we use discrete differential forms to capture topology-dominated components and an orthogonal auxiliary ambient space to represent complex local dynamics. Our method achieves superior accuracy and efficiency on geometric graphs with enhanced fidelity to physical invariants. Our code is available at https://github.com/ContinuumCoder/Hodge-Spectral-Duality

2605.13430 2026-06-02 stat.ME cs.AI cs.LG 版本更新

Towards a holistic understanding of Selection Bias for Causal Effect Identification

走向因果效应识别中选择偏差的整体理解

Yiwen Qiu, Filip Kovačević, Shimeng Huang, Peter Spirtes, Francesco Locatello

发表机构 * Carnegie Mellon University（卡内基梅隆大学）； University of California, Berkeley（加州大学伯克利分校）

AI总结研究在观测研究中存在选择偏差时，如何利用弱假设刻画倾向得分和选择概率，给出平均处理效应可识别性的充要条件，扩展了现有图形识别准则。

Comments 9 pages for the main text, ICML 2026

详情

AI中文摘要

选择偏差在观测研究中普遍存在。例如，大规模生物库数据可能表现出“健康志愿者偏差”，即受访者比他们所要代表的人群更健康、社会经济地位更高。从这样的子人群中恢复因果效应是因果推断中的一个重要问题，因为从选定人群估计平均处理效应（ATE）可能导致对整个群体的ATE估计严重偏倚。本文研究了选择偏差下ATE的可识别性。我们利用概率类的弱假设刻画倾向得分和选择概率，给出了ATE可识别性的充要条件。与以往工作相比，我们的结果扩展了现有的图形可识别性准则，并在存在选择偏差的情况下，以严格更弱的条件提供了对因果效应识别更全面的理解。

英文摘要

Selection bias is pervasive in observational studies. For example, large-scale biobank data can exhibit "healthy volunteer bias" when respondents are healthier and of higher socio-economic status than the population they are meant to represent. Recovering causal effects from such a sub-population is an important problem in causal inference, as estimating average treatment effects (ATE) from selected populations can result in a severely biased estimate of the ATE for the whole population. In this paper, we investigate the identifiability of the ATE under selection bias. We provide necessary and sufficient conditions for ATE identifiability, leveraging weak assumptions on probability classes to characterize propensity score and selection probability. Compared to previous works, our results extend existing graphical identifiability criteria and offer a more comprehensive understanding of causal effect identification with strictly weaker conditions in the presence of selection bias.

2509.24627 2026-06-02 cs.LG 版本更新

Learning Hamiltonian Dynamics at Scale: A Differential-Geometric Approach

大规模学习哈密顿动力学：一种微分几何方法

Katharina Friedl, Noémie Jaquier, Alyx Liao, Danica Kragic

发表机构 * Department of Robotics, Perception, and Learning（机器人、感知与学习系）

AI总结提出结合哈密顿力学守恒律与模型降阶可扩展性的降阶哈密顿神经网络（RO-HNN），通过几何约束辛自编码器和几何哈密顿神经网络实现高维动力系统的物理一致预测。

Comments 32 pages, 21 figures, Intl. Conference on Machine Learning (ICML), 2026

详情

AI中文摘要

将物理直觉嵌入网络架构允许学习强制执行基本属性（如能量守恒定律）的动力学，从而产生物理上合理的预测。然而，将这些模型扩展到高维动力系统仍然是一个重大挑战。本文介绍了降阶哈密顿神经网络（RO-HNN），一种新颖的物理启发神经网络，它结合了哈密顿力学的守恒律与模型降阶的可扩展性。RO-HNN 建立在两个核心组件上：一种新颖的几何约束辛自编码器，用于学习低维、保结构的辛子流形，以及一种几何哈密顿神经网络，用于建模子流形上的动力学。我们的实验表明，RO-HNN 提供了复杂高维动力学的物理一致、稳定且可泛化的预测，从而有效地将哈密顿神经网络的范围扩展到高维物理系统。

英文摘要

Embedding physical intuition into network architectures allows the learning of dynamics that enforce fundamental properties, such as energy conservation laws, thereby leading to physically-plausible predictions. Yet, scaling these models to high-dimensional dynamical systems remains a significant challenge. This paper introduces Reduced-order Hamiltonian Neural Network (RO-HNN), a novel physics-inspired neural network that combines the conservation laws of Hamiltonian mechanics with the scalability of model order reduction. RO-HNN is built on two core components: a novel geometrically-constrained symplectic autoencoder that learns a low-dimensional, structure-preserving symplectic submanifold, and a geometric Hamiltonian neural network that models the dynamics on the submanifold. Our experiments demonstrate that RO-HNN provides physically-consistent, stable, and generalizable predictions of complex high-dimensional dynamics, thereby effectively extending the scope of Hamiltonian neural networks to high-dimensional physical systems.

2605.13175 2026-06-02 cs.LG 版本更新

面向LinkedIn招聘代理的分层长期语义记忆

Zhentao Xu, Shangjin Zhang, Emir Poyraz, Yvonne Li, Ye Jin, Xie Lu, Xiaoyang Gu, Karthik Ramgopal, Praveen Kumar Bodigutla, Xiaofeng Wang

发表机构 * LinkedIn Corporation（LinkedIn公司）

AI总结提出分层长期语义记忆（HLTM）框架，通过构建模式对齐的记忆树，实现可扩展的语义知识摄入、隐私感知存储、低延迟检索和透明溯源，在LinkedIn招聘助手应用中使答案正确率提升超5%、检索F1提升超10%。

Comments Accepted to the Applied Data Science (ADS) track at the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2026)

详情

DOI: 10.1145/3770855.3818432

AI中文摘要

大型语言模型（LLM）代理越来越多地应用于实际产品中，其中个性化和上下文感知的用户交互至关重要。实现此类能力的核心是代理的长期语义记忆系统，该系统从嘈杂的纵向行为数据中提取隐式和显式信号，以结构化形式存储，并支持低延迟检索。构建工业级LLM代理长期记忆面临五大挑战：可扩展性、低延迟检索、隐私约束、适应性和可观测性。我们提出了分层长期语义记忆（HLTM）框架，该框架将文本数据组织成模式对齐的记忆树，在多个粒度级别捕获语义知识，从而实现可扩展的摄入、隐私感知存储、低延迟检索和透明溯源；HLTM还进一步融入了适应机制以泛化到不同用例。在LinkedIn招聘助手上的广泛评估表明，HLTM使答案正确率提升超过5%，检索F1提升超过10%，同时显著推进了查询与索引延迟之间的帕累托前沿。HLTM已全面部署在LinkedIn招聘助手中，用于支持生产招聘工作流中的核心个性化功能。

英文摘要

Large Language Model (LLM) agents are increasingly used in real-world products, where personalized and context-aware user interactions are essential. A central enabler of such capabilities is the agent's long-term semantic memory system, which extracts implicit and explicit signals from noisy longitudinal behavioral data, stores them in a structured form, and supports low-latency retrieval. Building industrial-grade long-term memory for LLM agents raises five challenges: scalability, low-latency retrieval, privacy constraints, adaptability, and observability. We introduce the Hierarchical Long-Term Semantic Memory (HLTM) framework, which organizes textual data into a schema-aligned memory tree that captures semantic knowledge at multiple levels of granularity, enabling scalable ingestion, privacy-aware storage, low-latency retrieval, and transparent provenance; HLTM further incorporates an adaptation mechanism to generalize across diverse use cases. Extensive evaluations on LinkedIn's Hiring Assistant show that HLTM improves answer correctness by more than 5% and retrieval F1 by more than 10%, while significantly advancing the Pareto frontier between query and indexing latency. HLTM has been fully deployed in LinkedIn's Hiring Assistant to power core personalization features in production hiring workflows.

2604.25191 2026-06-02 cs.AR cs.AI cs.LG 版本更新

How Can Reinforcement Learning Achieve Expert-level Placement?

强化学习如何实现专家级布局？

Ruo-Tong Chen, Ke Xue, Chengrui Gao, Yunqi Shi, Tian Xu, Peng Xie, Siyuan Xu, Mingxuan Yuan, Chao Qian, Zhi-Hua Zhou

发表机构 * State Key Laboratory of Novel Software Technology, Nanjing University, China（南京大学新型软件技术国家重点实验室）； School of Artificial Intelligence, Nanjing University, China（南京大学人工智能学院）； Huawei Noah’s Ark Lab, China（华为诺亚实验室）

AI总结针对强化学习在芯片布局中因奖励设计不当而难以达到专家质量的问题，提出从专家布局直接学习奖励模型的方法，通过推断专家轨迹并训练隐式奖励模型，实现从单个设计高效学习并泛化到未见案例。

Comments DAC 2026

2604.23658 2026-06-02 cs.AR cs.AI cs.LG 版本更新

FlowPlace: Flow Matching for Chip Placement

FlowPlace: 用于芯片布局的流匹配

Peng Xie, Ke Xue, Yunqi Shi, Ruo-Tong Chen, Chengrui Gao, Siyuan Xu, Chenjian Ding, Mingxuan Yuan, Chao Qian

发表机构 * State Key Laboratory of Novel Software Technology, Nanjing University, China（南京大学新型软件技术国家重点实验室）； School of Artificial Intelligence, Nanjing University, China（南京大学人工智能学院）； Huawei Noah’s Ark Lab, China（华为诺亚实验室）

AI总结提出FlowPlace，通过掩码引导的合成数据生成、基于流的灵活先验注入高效训练和硬约束采样实现无重叠布局，在OpenROAD和ICCAD 2015基准上取得更优PPA指标、10-50倍采样效率提升和零重叠。

Comments DAC 2026

2604.22896 2026-06-02 cs.RO cs.LG 版本更新

Magnetic Indoor Localization through CNN Regression and Rotation Invariance

基于CNN回归和旋转不变性的磁室内定位

Helge Rosé, Konstantin Klipp, Tom Koubek, Bernd Schäufele, Ilja Radusch

发表机构 * University of Freiburg（弗赖堡大学）

AI总结提出使用旋转不变特征（磁场强度和重力轴投影）训练轻量级CNN模型，实现无需方向校准的室内定位，在MagPie数据集上达到或超越现有最优精度。

Comments Published and presented at the 2026 4th International Conference on Mechatronics, Control and Robotics (ICMCR)

详情

DOI: 10.1109/ICMCR69541.2026.11533953

AI中文摘要

室内定位是GNSS拒止环境中广泛应用的关键技术，包括室内导航和物联网系统。结合卷积神经网络（CNN）和基于磁场特征的方法，提供了一种低成本、无需基础设施的精确定位解决方案。尽管磁指纹是室内定位的一种有前景的方法，但基于原始3D磁力计数据训练的模型对设备方向高度敏感。我们通过使用从3D磁场导出的两个旋转不变特征来解决这个问题：磁场强度（Mn）和重力轴投影（Mg）。我们在磁序列上训练轻量级7层扩张CNN（MagNetS/XL），直接回归（x, y）位置。使用MagPie数据集（三栋建筑，手持轨迹），我们系统评估了测试和/或训练数据的固定和随机旋转。原始3D输入（Mx, My, Mz）在固定90°旋转下表现出各向同性误差增加，并随着随机旋转增大而进一步恶化。相比之下，2D输入（Mn, Mg）保持旋转不变精度，并且一旦旋转超过三个参考建筑的特定阈值（Loomis大建筑0°，Talbot中建筑5°，CSL小建筑6°），其性能就超过3D输入。MagNetXL在MagPie数据集上达到或超越了现有最优精度，而MagNetS以约三分之一的参数实现了相似性能，有利于移动部署。这些结果表明，在实际使用中，从旋转不变输入获得的鲁棒性超过了输入维度降低的损失，从而无需方向校准或额外基础设施即可进行地图构建和定位。

英文摘要

Indoor positioning is an essential technology for a wide range of applications in GNSS-denied environments, including indoor navigation and IoT systems. Combining convolutional neural networks (CNNs) and magnetic field-based features offers a low-cost, infrastructure-free solution for precise positioning. While magnetic fingerprints are a promising approach for indoor positioning, models trained on raw 3D magnetometer data are highly sensitive to device orientation. We address this by using two rotation invariant features derived from the 3D magnetic field: the norm (Mn) and the projection onto the gravity axis (Mg). We train a lightweight 7-layer dilated CNN (MagNetS/XL) on magnetic sequences to directly regress (x, y) positions. Using the MagPie dataset (three buildings, handheld trajectories), we systematically evaluate fixed and random rotations of test and/or train data. Raw 3D inputs (Mx, My, Mz) exhibit isotropic error increases under fixed 90° rotations and further degrade with growing random rotations. In contrast, 2D (Mn, Mg) inputs maintain rotation invariant accuracy and surpass the 3D inputs once rotation exceeds building-specific thresholds for three reference buildings: 0° for Loomis (large), 5° for Talbot (medium), and 6° for CSL (small). MagNetXL achieves or exceeds state-of-the-art accuracy on the MagPie dataset, and MagNetS delivers similar performance with roughly one third of the parameters, favoring mobile deployment. These results show that the robustness gained from rotation invariant inputs outweighs the loss of input dimensionality in realistic usage, allowing mapping and localization without orientation alignment or added infrastructure.
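
The two rotation invariant inputs named in the abstract can be illustrated with a short sketch. It assumes that the gravity direction comes from a device-frame gravity or accelerometer reading; the function names and the sample readings are hypothetical and are not taken from the paper or from the MagPie dataset.

```python
import math


def rotation_invariant_features(m, g):
    """Return (Mn, Mg): the field norm and the field component along gravity."""
    mn = math.sqrt(sum(c * c for c in m))
    g_norm = math.sqrt(sum(c * c for c in g))
    mg = sum(mc * gc for mc, gc in zip(m, g)) / g_norm
    return mn, mg


def rotate_z(v, angle_rad):
    """Rotate a 3D vector about the z-axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    x, y, z = v
    return (c * x - s * y, s * x + c * y, z)


def rotate_x(v, angle_rad):
    """Rotate a 3D vector about the x-axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    x, y, z = v
    return (x, c * y - s * z, s * y + c * z)


def reorient(v):
    """Stand-in for a change of device orientation: two chained rotations."""
    return rotate_x(rotate_z(v, math.radians(90)), math.radians(30))


# Hypothetical device-frame readings: magnetometer and gravity.
m = (22.0, -5.0, 41.0)
g = (0.3, 0.1, 9.8)

before = rotation_invariant_features(m, g)
after = rotation_invariant_features(reorient(m), reorient(g))
# The raw components (Mx, My, Mz) change under reorientation,
# while (Mn, Mg) agree up to floating-point error.
```

Because a device rotation acts on the magnetometer and gravity vectors alike, norms and dot products between them are preserved, which is why (Mn, Mg) do not depend on how the phone is held.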

2602.02689 2026-06-02 cs.CR cs.AI cs.LG 版本更新

Eidolon: A Post-Quantum Signature Scheme Based on k-Colorability in the Age of Graph Neural Networks

Eidolon: 图神经网络时代基于k-可着色性的后量子签名方案

Asmaa Cherkaoui, Ramon Flores, Delaram Kahrobaei, Richard Wilson

发表机构 * Laboratory of Mathematical Analysis, Algebra and Applications (LAM2A), Faculty of Sciences Ain Chock (FSAC), University Hassan II, Casablanca, Morocco（哈桑二世大学艾因肖克理学院数学分析、代数与应用实验室）； Department of Geometry and Topology, Faculty of Mathematics, University of Seville, Seville, Spain（塞维利亚大学数学学院几何与拓扑系）； Departments of Computer Science and Mathematics, Queens College, City University of New York, USA（纽约市立大学皇后学院计算机科学系与数学系）； PhD Program in Mathematics, and Initiative for the Theoretical Sciences, Graduate Center, City University of New York, USA（纽约市立大学研究生中心数学博士项目与理论科学倡议）； Department of Computer Science and Engineering, Tandon School of Engineering, New York University, USA（纽约大学坦登工程学院计算机科学与工程系）； Department of Computer Science, University of York, United Kingdom（英国约克大学计算机科学系）

AI总结提出一种基于NP完全问题k-可着色性的后量子签名方案Eidolon，通过推广Goldreich-Micali-Wigderson零知识协议、应用Fiat-Shamir变换和Merkle树压缩，并利用植入着色法生成困难实例，实验表明对经典求解器和图神经网络攻击具有抵抗性。

Comments 20 pages, 4 figures

详情

DOI: 10.1007/978-3-032-27574-5_3
Journal ref: Proceedings of WAIFI 2026, Lecture Notes in Computer Science (LNCS), Vol. 16611, Springer, 2026

AI中文摘要

我们提出Eidolon，一种基于NP完全问题k-可着色性的后量子签名方案。我们的构造将Goldreich-Micali-Wigderson零知识协议推广到任意k >= 3，应用Fiat-Shamir变换，并使用Merkle树承诺将签名从O(tn)压缩到O(t log n)。我们通过植入着色法生成困难实例，同时旨在保留随机图的统计特征。我们对此类方案进行了针对经典求解器（ILP、DSatur）和定制图神经网络（GNN）攻击者的实证安全分析。实验表明，对于n >= 60，两种方法均无法恢复与植入解匹配的有效着色，表明精心设计的k-着色实例能够抵抗所考虑的传统和基于学习的密码分析方法。这些实验表明，构造的实例能够抵抗我们评估中考虑的攻击。

英文摘要

We propose Eidolon, a post-quantum signature scheme grounded on the NP-complete k-colorability problem. Our construction generalizes the Goldreich-Micali-Wigderson zero-knowledge protocol to arbitrary k >= 3, applies the Fiat-Shamir transform, and uses Merkle-tree commitments to compress signatures from O(tn) to O(t log n). We generate hard instances by planting a coloring while aiming to preserve the statistical profile of random graphs. We present an empirical security analysis of such a scheme against both classical solvers (ILP, DSatur) and a custom graph neural network (GNN) attacker. Experiments show that for n >= 60, neither approach is able to recover a valid coloring matching the planted solution, suggesting that well-engineered k-coloring instances can resist the considered classical and learning-based cryptanalytic approaches. These experiments indicate that the constructed instances resist the attacks considered in our evaluation.
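
The idea of generating instances by planting a coloring can be sketched as below. This is a simplified illustration under stated assumptions: the function names, parameters, and the plain Bernoulli edge sampling are hypothetical, and the sketch makes no attempt to preserve the statistical profile of random graphs, which the abstract names as a goal of the actual generator.

```python
import random


def planted_coloring_instance(n, k, edge_prob, seed=None):
    """Sample a graph on n vertices that is k-colorable by construction.

    A secret coloring is drawn first; an edge is then only ever added
    between vertices that received different colors.
    """
    rng = random.Random(seed)
    coloring = [rng.randrange(k) for _ in range(n)]
    edges = [
        (u, v)
        for u in range(n)
        for v in range(u + 1, n)
        if coloring[u] != coloring[v] and rng.random() < edge_prob
    ]
    return edges, coloring


def is_proper_coloring(edges, coloring):
    """Check that no edge joins two vertices of the same color."""
    return all(coloring[u] != coloring[v] for u, v in edges)


# Hypothetical parameters; n = 60 mirrors the threshold quoted in the abstract.
edges, secret = planted_coloring_instance(n=60, k=3, edge_prob=0.2, seed=7)
```

In a scheme of this kind the graph would play the role of the public key and the planted coloring the role of the secret key; an attacker such as an ILP solver, DSatur, or a GNN has to find a proper coloring from the edges alone.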

2604.20308 2026-06-02 cs.LG 版本更新

Sheaf Neural Networks on SPD Manifolds: Second-Order Geometric Representation Learning

SPD流形上的层神经网络：二阶几何表示学习

Yuhan Peng, Junwen Dong, Yuzhi Zeng, Hao Li, Ce Ju, Huitao Feng, Diaaeldin Taha, Anna Wienhard, Kelin Xia

发表机构 * GitHub

AI总结针对图神经网络在欧氏空间中的线性结构限制，提出首个在对称正定矩阵流形上运行的层神经网络，利用李群结构定义层算子，实现二阶几何表示学习，在MoleculeNet基准上取得6/7最优结果。

详情

AI中文摘要

图神经网络面临两个源于欧氏向量空间线性结构的基本挑战：(1) 当前架构通过向量（方向、梯度）表示几何，但许多任务需要矩阵值表示来捕捉方向之间的关系——例如分子中原子取向的协变。这些二阶表示自然地由对称正定矩阵流形上的点捕获；(2) 标准消息传递在边上应用共享变换。层神经网络通过边特定变换解决了这一问题，但现有公式仍局限于向量空间，因此无法传播矩阵值特征。我们通过开发首个在SPD流形上原生运行的层神经网络来应对这两个挑战。我们的关键洞察是SPD流形具有李群结构，使得无需投影到欧氏空间即可定义良置的层算子。理论上，我们证明SPD值层比欧氏层具有更强的表达能力：它们能容纳向量值层无法表示的相容配置（全局截面），直接转化为更丰富的学习表示。实验上，我们的层卷积有效地将秩1方向输入变换为编码局部几何结构的满秩矩阵。我们的双流架构在MoleculeNet基准的6/7个任务上达到最优，且层框架提供了持续的深度鲁棒性。

英文摘要

Graph neural networks face two fundamental challenges rooted in the linear structure of Euclidean vector spaces: (1) Current architectures represent geometry through vectors (directions, gradients), yet many tasks require matrix-valued representations that capture relationships between directions, such as how atomic orientations covary in a molecule. These second-order representations are naturally captured by points on the manifold of symmetric positive definite (SPD) matrices; (2) Standard message passing applies shared transformations across edges. Sheaf neural networks address this via edge-specific transformations, but existing formulations remain confined to vector spaces and therefore cannot propagate matrix-valued features. We address both challenges by developing the first sheaf neural network that operates natively on the SPD manifold. Our key insight is that the SPD manifold admits a Lie group structure, enabling well-posed analogs of sheaf operators without projecting to Euclidean space. Theoretically, we prove that SPD-valued sheaves are strictly more expressive than Euclidean sheaves: they admit consistent configurations (global sections) that vector-valued sheaves cannot represent, directly translating to richer learned representations. Empirically, our sheaf convolution effectively transforms rank-1 directional inputs into full-rank matrices encoding local geometric structure. Our dual-stream architecture achieves SOTA on 6/7 MoleculeNet benchmarks, with the sheaf framework providing consistent depth robustness.

2604.17838 2026-06-02 cs.LG stat.CO stat.ML 版本更新

Efficient Diffusion Models under Nonconvex Equality and Inequality constraints via Landing

非凸等式和不等式约束下的高效扩散模型 via Landing

Kijung Jeon, Michael Muehlebach, Molei Tao

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结提出一个统一框架，通过计算高效的landing机制替代投影，结合欠阻尼动力学加速混合，在非凸可行集上实现等式和不等式约束下的扩散模型，显著降低计算成本。

Comments 58 pages

详情

AI中文摘要

在约束集合内的生成建模对于涉及物理、几何或安全要求（例如分子生成、机器人学）的科学和工程应用至关重要。我们提出了一个通用框架，用于在一般非凸可行集 $Σ$ 上的约束扩散模型，该模型在整个扩散过程中同时强制执行等式和不等式约束。我们的框架包含了过阻尼和欠阻尼动力学用于前向和后向采样。一个关键的算法创新是计算高效的landing机制，它替代了昂贵且通常定义不清的到 $Σ$ 的投影，确保可行性而无需迭代牛顿求解或投影失败。通过利用欠阻尼动力学，我们加速了向先验分布的混合，有效缓解了通常与约束扩散相关的高模拟成本。实验上，该方法在训练和推理过程中减少了函数评估和内存使用，同时保持了样本质量。在具有等式和混合约束的基准测试中，我们的方法在显著降低计算成本的同时实现了与最先进基线相当的样本质量，为非凸可行集上的扩散提供了实用且可扩展的解决方案。

英文摘要

Generative modeling within constrained sets is essential for scientific and engineering applications involving physical, geometric, or safety requirements (e.g., molecular generation, robotics). We present a unified framework for constrained diffusion models on generic nonconvex feasible sets $Σ$ that simultaneously enforces equality and inequality constraints throughout the diffusion process. Our framework incorporates both overdamped and underdamped dynamics for forward and backward sampling. A key algorithmic innovation is a computationally efficient landing mechanism that replaces costly and often ill-defined projections onto $Σ$, ensuring feasibility without iterative Newton solves or projection failures. By leveraging underdamped dynamics, we accelerate mixing toward the prior distribution, effectively alleviating the high simulation costs typically associated with constrained diffusion. Empirically, this approach reduces function evaluations and memory usage during both training and inference while preserving sample quality. On benchmarks featuring equality and mixed constraints, our method achieves comparable sample quality to state-of-the-art baselines while significantly reducing computational cost, providing a practical and scalable solution for diffusion on nonconvex feasible sets.

2601.02997 2026-06-02 cs.LG cs.CV 版本更新

From Memorization to Creativity: LLM as a Designer of Novel Neural Architectures

从记忆到创造：LLM作为新型神经架构的设计者

Waleed Khalid, Dmitry Ignatov, Radu Timofte

发表机构 * Computer Vision Lab, CAIDAS & IFI, University of Würzburg, Germany（计算机视觉实验室，CAIDAS与IFI，乌尔姆大学，德国）

AI总结本文提出NNGPT框架，通过闭环架构合成流水线，利用代码型LLM的监督微调循环，结合MinHash-Jaccard新颖性过滤和低保真性能信号，迭代提升生成架构的有效性、性能和多样性，实现从记忆到创造的转变。

详情

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 3252-3261, 2026

AI中文摘要

大型语言模型（LLM）在程序合成方面表现出色，但其在神经架构设计中的能力——平衡语法可靠性、性能和结构新颖性——仍未得到充分探索。我们提出了NNGPT框架内的闭环架构合成流水线，其中代码型LLM经过22次监督微调循环的演化。在每个循环中，LLM合成PyTorch卷积网络，通过低保真性能信号验证，并通过MinHash-Jaccard标准过滤以防止结构冗余，然后纳入LEMUR数据集。具有新颖架构的高性能候选被转换为提示-代码对，用于参数高效的LoRA微调。这种反馈循环驱动了可测量的分布偏移，逐步内化经验架构先验，使得有效且高性能的输出从稀缺变为主导。在CIFAR-10上，有效生成率稳定在50.6%（峰值74.5%），平均第一轮准确率从28.1%上升到51.0%，超过40%准确率的候选从2.0%增长到96.8%。跨数据集迁移到CIFAR-100和SVHN证实了改进的有效性、偏移的准确率分布和持续的新颖性在不同难度和视觉领域的基准测试中泛化。在22个循环中，有455个原始语料库中不存在的新颖架构被新颖性过滤器接受。通过将合成基于执行反馈和新颖性过滤，我们证明了迭代自监督微调将LLM重塑为任务特化的架构先验——提高了生成可靠性、代理性能和结构多样性——为手工设计的搜索空间提供了一种可复现、无需标注的替代方案。

英文摘要

Large language models (LLMs) excel in program synthesis, yet their capacity for neural architecture design -- balancing syntactic reliability, performance, and structural novelty -- remains underexplored. We present a closed-loop architecture synthesis pipeline within the NNGPT framework, in which a code-oriented LLM evolves over 22 supervised fine-tuning cycles. At each cycle, the LLM synthesizes PyTorch convolutional networks, validated via low-fidelity performance signals and filtered via a MinHash--Jaccard criterion to prevent structural redundancy before being incorporated into the LEMUR dataset. High-performing candidates with novel architectures are converted into prompt--code pairs for parameter-efficient LoRA fine-tuning. This feedback loop drives a measurable distributional shift, progressively internalizing empirical architectural priors such that valid and high-performing outputs evolve from scarce to dominant across cycles. On CIFAR-10, the valid generation rate stabilizes at 50.6% (peaking at 74.5%), mean first-epoch accuracy rises from 28.1% to 51.0%, and candidates exceeding 40% accuracy grow from 2.0% to 96.8%. Cross-dataset transfer to CIFAR-100 and SVHN confirms that improved validity, shifted accuracy distributions, and sustained novelty generalize across benchmarks of varying difficulty and visual domain. Across 22 cycles, 455 unique architectures absent from the original corpus are admitted under the novelty filter. By grounding synthesis in execution feedback and novelty filtering, we demonstrate that iterative self-supervised fine-tuning reshapes an LLM into a task-specialized architectural prior -- improving generation reliability, proxy performance, and structural diversity -- offering a reproducible, annotation-free alternative to hand-crafted search spaces.
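
A MinHash-Jaccard novelty filter of the kind mentioned in the abstract might look like the sketch below. The tokenization into whitespace shingles, the number of hash functions, the 0.8 threshold, and all function names are assumptions made for illustration; the abstract does not specify how NNGPT configures its filter.

```python
import hashlib


def shingles(code, width=3):
    """Split source code into a set of overlapping token n-grams."""
    tokens = code.split()
    if len(tokens) < width:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + width]) for i in range(len(tokens) - width + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def minhash_signature(items, num_hashes=64):
    """For each seeded hash function, keep the minimum hash over the set."""
    return [
        min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest(),
                "big",
            )
            for item in items
        )
        for seed in range(num_hashes)
    ]


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def is_novel(candidate, corpus, threshold=0.8):
    """Accept a candidate only if it is not too similar to any corpus entry."""
    cand_sig = minhash_signature(shingles(candidate))
    return all(
        estimated_jaccard(cand_sig, minhash_signature(shingles(existing))) < threshold
        for existing in corpus
    )


# Hypothetical model snippets, for illustration only.
net_a = "self.conv = nn.Conv2d ( 3 , 16 , 3 ) ; self.act = nn.ReLU ( )"
net_b = "self.fc = nn.Linear ( 128 , 10 ) ; self.drop = nn.Dropout ( 0.5 )"
```

MinHash is used here because comparing fixed-length signatures is cheaper than comparing full shingle sets when a growing corpus has to be checked on every cycle.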

2604.14698 2026-06-02 cs.LG 版本更新

Mean Flow Policy Optimization

平均流策略优化

Xiaoyi Dong, Xi Sheryl Zhang, Jian Cheng

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结提出使用平均流模型作为策略表示，通过最大熵强化学习框架进行软策略迭代，以提升在线强化学习中训练和推理效率，实验表明性能与扩散模型基线相当或更优且时间显著减少。

Comments ICML 2026

详情

AI中文摘要

扩散模型最近作为在线强化学习（RL）的表达性策略表示出现。然而，其迭代生成过程引入了大量的训练和推理开销。为了克服这一限制，我们提出使用平均流模型（MeanFlow模型）来表示策略，这是一类少步流生成模型，旨在提高基于扩散的RL方法的训练和推理效率。为了促进探索，我们通过软策略迭代在最大熵强化学习框架下优化平均流策略，并解决了平均流策略特有的两个关键挑战：动作似然评估和软策略改进。在MuJoCo、DeepMind Control Suite和HumanoidBench基准上的实验表明，我们的方法——平均流策略优化（MFPO）——实现了与当前基于扩散的基线相当或更优的性能，同时显著减少了训练和推理时间。我们的代码可在https://github.com/dongxiaoyi-xyz/MFPO获取。

英文摘要

Diffusion models have recently emerged as expressive policy representations for online reinforcement learning (RL). However, their iterative generative processes introduce substantial training and inference overhead. To overcome this limitation, we propose to represent policies using MeanFlow models, a class of few-step flow-based generative models, to improve training and inference efficiency over diffusion-based RL approaches. To promote exploration, we optimize MeanFlow policies under the maximum entropy RL framework via soft policy iteration, and address two key challenges specific to MeanFlow policies: action likelihood evaluation and soft policy improvement. Experiments on MuJoCo, DeepMind Control Suite and HumanoidBench benchmarks demonstrate that our method, Mean Flow Policy Optimization (MFPO), achieves performance comparable to or exceeding current diffusion-based baselines while considerably reducing training and inference time. Our code is available at https://github.com/dongxiaoyi-xyz/MFPO.

2201.10838 2026-06-02 cs.CR cs.LG 版本更新

Privacy-Preserving Logistic Regression Training with A Faster Gradient Variant

隐私保护逻辑回归训练中的一种更快梯度变体

John Chiang

AI总结提出一种名为二次梯度的梯度变体，用于加速隐私保护逻辑回归训练，并在同态加密场景下仅需四次迭代即可达到可比性能。

详情

AI中文摘要

近年来，对加密数据训练逻辑回归已成为解决安全问题的重要方法。本文引入了一种高效的梯度变体，称为\textit{二次梯度}，该变体专为隐私保护逻辑回归设计，同时在明文优化中同样有效。通过引入二次梯度，我们改进了Nesterov加速梯度（NAG）、自适应梯度（AdaGrad）和Adam算法。我们在多个数据集上评估了这些改进算法，实验结果表明，其收敛速度达到最先进水平，显著优于传统一阶梯度方法。此外，我们将改进的NAG方法应用于同态逻辑回归训练，仅需四次迭代即可实现可比性能。所提出的二次梯度方法提供了一个统一框架，融合了一阶梯度方法和二阶牛顿型方法的优势，表明其可广泛应用于各种数值优化任务。

英文摘要

Training logistic regression over encrypted data has emerged as a prominent approach to addressing security concerns in recent years. In this paper, we introduce an efficient gradient variant, termed the \textit{quadratic gradient}, which is specifically designed for privacy-preserving logistic regression while remaining equally effective in plaintext optimization. By incorporating this quadratic gradient, we enhance Nesterov's Accelerated Gradient (NAG), Adaptive Gradient (AdaGrad), and Adam algorithms. We evaluate these enhanced algorithms across various datasets, with experimental results demonstrating state-of-the-art convergence rates that significantly outperform traditional first-order gradient methods. Furthermore, we apply the enhanced NAG method to implement homomorphic logistic regression training, achieving comparable performance within only four iterations. The proposed quadratic-gradient approach offers a unified framework that synergizes the advantages of first-order gradient methods and second-order Newton-type methods, suggesting broad applicability to diverse numerical optimization tasks.

URL PDF HTML ☆

赞 0 踩 0

2604.10688 2026-06-02 cs.LG cs.AI cs.CL 版本更新

SCOPE: Signal-Calibrated On-Policy Distillation Enhancement with Dual-Path Adaptive Weighting

SCOPE: 信号校准的在线策略蒸馏增强与双路径自适应加权

Binbin Zheng, Xing Ma, Yiheng Liang, Jingqing Ruan, Xiaoliang Fu, Kepeng Lin, Benchang Zhu, Ke Zeng, Xunliang Cai

发表机构 * University of Science and Technology of China（中国科学技术大学）； Meituan LongCat Interaction Team（美团 LongCat 交互团队）； Nanjing University（南京大学）； Fudan University（复旦大学）； Huazhong University of Science and Technology（华中科技大学）

AI总结针对在线策略强化学习中奖励稀疏导致的信用分配难题，提出SCOPE框架，通过双路径自适应加权机制分别处理正确与错误轨迹，实现信号校准的蒸馏增强，在六个推理基准上平均提升11.42%的Avg@32和7.30%的Pass@32。

详情

AI中文摘要

在线策略强化学习已成为大型语言模型推理对齐的主导范式，但其稀疏的结果级奖励使得令牌级信用分配异常困难。在线策略蒸馏（OPD）通过引入来自教师模型的密集令牌级KL监督缓解了这一问题，但通常对所有rollout均匀应用这种监督，忽略了信号质量的根本差异。我们提出信号校准的在线策略蒸馏增强（SCOPE），一种双路径自适应训练框架，根据正确性将在线策略rollout路由到两个互补的监督路径。对于错误轨迹，SCOPE执行教师困惑度加权的KL蒸馏，优先考虑教师展现出真正纠正能力的实例，同时降低不可靠指导的权重。对于正确轨迹，它应用学生困惑度加权的MLE，将强化集中在能力边界上的低置信度样本，而不是过度强化已掌握的样本。两条路径都采用组级归一化来自适应校准权重分布，考虑不同提示的内在难度差异。在六个推理基准上的大量实验表明，SCOPE在Avg@32和Pass@32上分别比竞争基线平均相对提升11.42%和7.30%，证明了其一致的有效性。

英文摘要

On-policy reinforcement learning has become the dominant paradigm for reasoning alignment in large language models, yet its sparse, outcome-level rewards make token-level credit assignment notoriously difficult. On-Policy Distillation (OPD) alleviates this by introducing dense, token-level KL supervision from a teacher model, but typically applies this supervision uniformly across all rollouts, ignoring fundamental differences in signal quality. We propose Signal-Calibrated On-Policy Distillation Enhancement (SCOPE), a dual-path adaptive training framework that routes on-policy rollouts by correctness into two complementary supervision paths. For incorrect trajectories, SCOPE performs teacher-perplexity-weighted KL distillation to prioritize instances where the teacher demonstrates genuine corrective capability, while down-weighting unreliable guidance. For correct trajectories, it applies student-perplexity-weighted MLE to concentrate reinforcement on low-confidence samples at the capability boundary rather than over-reinforcing already mastered ones. Both paths employ a group-level normalization to adaptively calibrate weight distributions, accounting for the intrinsic difficulty variance across prompts. Extensive experiments on six reasoning benchmarks show that SCOPE achieves an average relative improvement of 11.42% in Avg@32 and 7.30% in Pass@32 over competitive baselines, demonstrating its consistent effectiveness.

2604.09487 2026-06-02 cs.RO cs.LG 版本更新

Sim-to-Real Transfer for Muscle-Actuated Robots via Generalized Actuator Networks

基于广义执行器网络的肌肉驱动机器人仿真到现实迁移

Jan Schneider, Mridul Mahajan, Le Chen, Simon Guist, Bernhard Schölkopf, Ingmar Posner, Dieter Büchler

AI总结提出广义执行器网络（GenAN），通过从关节位置轨迹学习执行器模型，实现肌肉驱动机器人从仿真到现实的策略迁移，首次成功在四自由度肌肉驱动机器人臂上完成动态任务。

详情

AI中文摘要

肌腱驱动配合软肌肉执行器使机器人更快、更安全，同时可能加速技能获取。然而，由于固有的非线性、摩擦和迟滞，这些系统在实际中很少使用，这给建模和控制带来了复杂性。到目前为止，这些挑战阻碍了策略从仿真到真实系统的迁移。为弥合这一差距，我们提出了一种仿真到现实的流程，该流程学习这种复杂执行器的神经网络模型，并利用成熟的刚体仿真来处理手臂动力学和与环境的交互。我们的方法称为广义执行器网络（GenAN），通过直接从关节位置轨迹学习，而不是需要扭矩传感器，从而能够在广泛的机器人上进行执行器模型识别。在PAMY2（一种由气动人工肌肉驱动的肌腱驱动机器人）上使用GenAN，我们成功部署了完全在仿真中训练的、动态但精确的到达目标、杯中球和乒乓球策略。据我们所知，这一结果构成了四自由度肌肉驱动机器人臂首次成功的仿真到现实迁移。

英文摘要

Tendon drives paired with soft muscle actuation enable faster and safer robots while potentially accelerating skill acquisition. Still, these systems are rarely used in practice due to inherent nonlinearities, friction, and hysteresis, which complicate modeling and control. So far, these challenges have hindered policy transfer from simulation to real systems. To bridge this gap, we propose a sim-to-real pipeline that learns a neural network model of this complex actuation and leverages established rigid body simulation for the arm dynamics and interactions with the environment. Our method, called Generalized Actuator Network (GenAN), enables actuation model identification across a wide range of robots by learning directly from joint position trajectories rather than requiring torque sensors. Using GenAN on PAMY2, a tendon-driven robot powered by pneumatic artificial muscles, we successfully deploy dynamic but precise goal-reaching, ball-in-a-cup, and table tennis policies, trained entirely in simulation. To the best of our knowledge, this result constitutes the first successful sim-to-real transfer for a four-degrees-of-freedom muscle-actuated robot arm.

2604.09041 2026-06-02 cs.LG cs.AI physics.ao-ph stat.ML 版本更新

U-Cast: A Surprisingly Simple and Efficient Frontier Probabilistic AI Weather Forecaster

U-Cast：一种惊人简单且高效的边界概率AI天气预报器

Salva Rühling Cachay, Duncan Watson-Parris, Rose Yu

发表机构 * University of Cambridge（剑桥大学）

AI总结提出基于标准U-Net骨架的概率天气预报模型U-Cast，通过确定性预训练和短时概率微调，以不到1/10的计算成本匹配或超越GenCast和IFS ENS的预报技能。

Comments ICML 2026. Our code is available at: https://github.com/Rose-STL-Lab/u-cast

详情

AI中文摘要

基于AI的天气预报现在可以与传统的基于物理的集合预报相媲美，但最先进的模型依赖于专门的架构和巨大的计算预算，造成了很高的进入门槛。我们证明，对于前沿性能而言，这种复杂性是不必要的。我们引入了U-Cast，一种基于标准U-Net骨架的概率预报器，采用简单的训练方案：先进行基于平均绝对误差的确定性预训练，然后使用蒙特卡洛Dropout引入随机性，基于连续排序概率评分（CRPS）进行短时概率微调。结果，我们的模型在$1.5^\circ$分辨率下匹配或超过了GenCast和IFS ENS的概率技能，同时与领先的基于CRPS的模型相比，训练计算量减少了10倍以上，与基于扩散的模型相比，推理延迟减少了10倍以上。U-Cast在不到12个H200 GPU天内完成训练，并在3秒内生成15天的集合预报。这些结果表明，可扩展的通用架构与高效的训练课程相结合，可以以极低的成本匹配复杂的领域特定设计，从而向更广泛的社区开放前沿概率天气模型的训练。

英文摘要

AI-based weather forecasting now rivals traditional physics-based ensembles, but state-of-the-art (SOTA) models rely on specialized architectures and massive computational budgets, creating a high barrier to entry. We demonstrate that such complexity is unnecessary for frontier performance. We introduce U-Cast, a probabilistic forecaster built on a standard U-Net backbone trained with a simple recipe: deterministic pre-training on Mean Absolute Error followed by short probabilistic fine-tuning on the Continuous Ranked Probability Score (CRPS) using Monte Carlo Dropout for stochasticity. As a result, our model matches or exceeds the probabilistic skill of GenCast and IFS ENS at $1.5^\circ$ resolution while reducing training compute by over $10\times$ compared to leading CRPS-based models and inference latency by over $10\times$ compared to diffusion-based models. U-Cast trains in under 12 H200 GPU-days and generates a 15-day ensemble forecast in 3 seconds. These results suggest that scalable, general-purpose architectures paired with efficient training curricula can match complex domain-specific designs at a fraction of the cost, opening the training of frontier probabilistic weather models to the broader community.
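
The CRPS objective used for the probabilistic fine-tuning stage can be estimated from a finite ensemble with the standard formula CRPS ≈ E|X − y| − ½·E|X − X′|. The sketch below applies it to a single scalar; the function name and the sample values are hypothetical, and the abstract does not say which exact estimator variant the authors use.

```python
def ensemble_crps(members, observation):
    """Ensemble estimate of the CRPS for one scalar observation.

    The first term rewards members that are close to the observation;
    the second term credits ensemble spread, so a collapsed ensemble
    is scored like a deterministic forecast.
    """
    m = len(members)
    if m == 0:
        raise ValueError("ensemble must contain at least one member")
    accuracy = sum(abs(x - observation) for x in members) / m
    spread = sum(abs(x - y) for x in members for y in members) / (2 * m * m)
    return accuracy - spread


# Hypothetical temperature forecasts in kelvin, e.g. from several
# stochastic forward passes with dropout kept active.
members = [287.1, 288.4, 286.9, 287.8]
score = ensemble_crps(members, observation=287.5)
```

With a single member the spread term vanishes and the score reduces to the absolute error, which fits the recipe in the abstract of pre-training on Mean Absolute Error and then fine-tuning on CRPS.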

2604.08161 2026-06-02 cs.LG 版本更新

Shift- and stretch-invariant non-negative matrix factorization with an application to brain tissue delineation in emission tomography data

位移与伸缩不变的非负矩阵分解及其在脑组织发射断层成像数据分割中的应用

Anders S. Olsen, Miriam L. Navarro, Claus Svarer, Jesper L. Hinrich, Morten Mørup, Gitte M. Knudsen

发表机构 * Neurobiology Research Unit, Copenhagen University Hospital Rigshospitalet（哥本哈根大学医院神经生物学研究单位）； Department of Neuroscience, Faculty of Health and Medical Sciences, University of Copenhagen（哥本哈根大学健康与医学科学学院神经科学系）； Department of Applied Mathematics and Computer Science, Technical University of Denmark（丹麦技术大学应用数学与计算机科学系）； Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen（哥本哈根大学健康与医学科学学院临床医学系）

AI总结提出频域实现的位移与伸缩不变非负矩阵分解方法，解决动态神经影像中扩散导致的时延和伸缩问题，在合成数据和脑发射断层数据上验证了其对脑组织结构的精细刻画能力。

Comments Accepted at ICASSP2026

详情

DOI: 10.1109/ICASSP55912.2026.11461301

AI中文摘要

动态神经影像数据，例如血液或脑脊液中放射性示踪剂传输的发射断层测量，通常表现出类似扩散的特性。这些特性引入了距离依赖的时间延迟、尺度差异和伸缩效应，限制了传统线性建模和分解方法的有效性。为了解决这一问题，我们提出了位移与伸缩不变的非负矩阵分解框架。我们的方法估计整数和非整数的时间位移以及时间伸缩，全部在频域中实现，其中位移对应于相位修改，而伸缩通过零填充或截断处理。该模型在PyTorch中实现（https://github.com/anders-s-olsen/shiftstretchNMF）。我们在合成数据和脑发射断层成像数据上证明，该模型能够解释伸缩效应，从而提供更详细的脑组织结构表征。

英文摘要

Dynamic neuroimaging data, such as emission tomography measurements of radiotracer transport in blood or cerebrospinal fluid, often exhibit diffusion-like properties. These introduce distance-dependent temporal delays, scale-differences, and stretching effects that limit the effectiveness of conventional linear modeling and decomposition methods. To address this, we present the shift- and stretch-invariant non-negative matrix factorization framework. Our approach estimates both integer and non-integer temporal shifts as well as temporal stretching, all implemented in the frequency domain, where shifts correspond to phase modifications, and where stretching is handled via zero-padding or truncation. The model is implemented in PyTorch (https://github.com/anders-s-olsen/shiftstretchNMF). We demonstrate on synthetic data and brain emission tomography data that the model is able to account for stretching to provide more detailed characterization of brain tissue structure.
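The statement that shifts correspond to phase modifications in the frequency domain can be illustrated in a few lines. This is a minimal sketch using a naive DFT from the standard library, not the authors' PyTorch implementation; the names `dft`, `idft` and `fractional_shift` are hypothetical, and stretching via zero-padding or truncation is not covered here.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (fine for short signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def fractional_shift(signal, delay):
    """Circularly delay `signal` by `delay` samples; `delay` may be non-integer."""
    n = len(signal)
    shifted = []
    for k, coeff in enumerate(dft(signal)):
        # Signed frequency index keeps the shifted signal real-valued.
        freq = k if k <= n // 2 else k - n
        # A time delay is a linear phase ramp across frequencies.
        shifted.append(coeff * cmath.exp(-2j * math.pi * freq * delay / n))
    return [value.real for value in idft(shifted)]
```

An integer delay reproduces an ordinary circular shift, while a delay such as 0.5 interpolates between samples, which is what makes non-integer shifts estimable in this representation.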

2604.08149 2026-06-02 cs.LG stat.ML 版本更新

A Direct Approach for Handling Contextual Bandits with Latent State Dynamics

处理具有隐状态动态的上下文赌博机的直接方法

Zhen Li, Gilles Stoltz

发表机构 * GitHub

AI总结本文提出一种直接方法处理隐马尔可夫链驱动的线性上下文赌博机，通过简化模型归约到标准线性上下文赌博机，并扩展理论分析以考虑HMM参数估计，同时针对更复杂的隐状态依赖模型引入周期性参数更新算法。

详情

Journal ref: ICML 2026 - Forty-Third International Conference on Machine Learning, Jul 2026, Seoul, South Korea

AI中文摘要

我们考虑一个线性上下文赌博机模型，其中上下文和奖励由有限隐马尔可夫链控制。我们首先重新审视Nelson等人（2022）的简化模型，其中奖励是给定观察上下文（称为信念）的隐状态后验概率的线性函数，而不是隐状态本身的函数。这个简化模型可以通过直接归约到标准线性上下文赌博机来处理。我们扩展了这一归约的理论分析，在遗憾界中考虑了隐马尔可夫模型[HMM]参数的估计，并提供了不再依赖于奖励函数而仅通过HMM参数估计依赖于模型的高概率界。其次，也是最重要的，我们转而研究更自然且更复杂的模型，该模型在隐状态中引入直接依赖关系（除了对观察上下文的依赖，这对于上下文赌博机是自然的）。在经典的HMM遗忘条件下，为应对奖励结构引入的各种统计依赖，引入的主要算法工具是仅周期性更新奖励模型参数。

英文摘要

We consider a linear contextual bandit model where contexts and rewards are governed by a finite hidden Markov chain. We first revisit the simplified model by Nelson et al. (2022), in which rewards are linear functions of the posterior probabilities over the hidden states given the observed contexts (called beliefs), rather than functions of the hidden states themselves. This simplified model may be handled through a direct reduction to standard linear contextual bandits. We extend the theoretical analysis of this reduction to take into account the estimation of the parameters of the hidden Markov model [HMM] in the regret bound and to provide high-probability bounds not depending anymore on the reward functions and only depending on the model through the estimation of the HMM parameters. Second, and most importantly, we instead study the more natural and more complex model incorporating direct dependencies in the hidden states (on top of dependencies on the observed contexts, as is natural for contextual bandits). Under a classic HMM forgetting condition, the main algorithmic tool introduced to cope with the various statistical dependencies that the reward structure introduces is to only periodically update reward-model parameters.
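The beliefs mentioned above are posterior probabilities over hidden states given the observed contexts, and they can be tracked with the classical HMM forward filter. The sketch below assumes known HMM parameters and a per-state likelihood for each new context; `belief_update` and `belief_reward` are hypothetical names, and the paper's estimation of the HMM parameters is not reproduced.

```python
def belief_update(prior, transition, likelihood):
    """One forward-filter step for a finite hidden Markov chain.

    prior[i]         : P(state_{t-1} = i | contexts up to t-1)
    transition[i][j] : P(state_t = j | state_{t-1} = i)
    likelihood[j]    : density of the new context given state_t = j
    Returns P(state_t = j | contexts up to t).
    """
    k = len(prior)
    # Predict: push the previous belief through the transition matrix.
    predicted = [sum(prior[i] * transition[i][j] for i in range(k)) for j in range(k)]
    # Correct: reweight by how well each state explains the new context.
    unnormalised = [predicted[j] * likelihood[j] for j in range(k)]
    total = sum(unnormalised)
    if total == 0.0:
        raise ValueError("context has zero likelihood under every state")
    return [u / total for u in unnormalised]

def belief_reward(theta, belief):
    """Expected reward that is linear in the belief vector (simplified model)."""
    return sum(t * b for t, b in zip(theta, belief))
```

In the simplified model the belief vector plays the role of the feature vector, which is why a reduction to a standard linear contextual bandit is possible.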

2602.24047 2026-06-02 cs.NI cs.CR cs.LG 版本更新

Unsupervised Baseline Clustering and Incremental Adaptation for IoT Device Traffic Profiling

无监督基线聚类与增量自适应用于物联网设备流量分析

Sean M. Alderman, John D. Hastings

发表机构 * The Beacom College of Computer & Cyber Sciences, Dakota State University, Madison, SD, USA

AI总结提出两阶段无监督流量分析流程，使用DBSCAN进行基线聚类（NMI 0.78），BIRCH实现增量自适应（纯度0.87），揭示静态高纯度与增量灵活性之间的权衡。

Comments 6 pages, 2 figures, 4 tables

详情

DOI: 10.1109/ISDFS69419.2026.11459084
Journal ref: 2026 IEEE 14th International Symposium on Digital Forensics and Security (ISDFS)

AI中文摘要

物联网设备的增长和异构性带来了安全挑战，静态识别模型会随着流量演变而退化。本文提出了一种基于流特征的两阶段无监督物联网设备流量分析和增量模型更新流程，并在Deakin物联网数据集的选定长时间捕获数据上进行评估。对于基线分析，基于密度的聚类（DBSCAN）隔离了数据中相当一部分离群点，并在测试的经典方法中与真实设备标签的对齐最强（NMI 0.78），在聚类纯度上优于基于质心的聚类。对于增量自适应，我们评估了面向流的聚类方法，发现BIRCH支持高效更新（每次更新0.13秒），并为保留的新设备形成相对连贯的聚类（纯度0.87），但新流量捕获有限（份额0.72），且自适应后已知设备准确性存在可衡量的权衡（0.71）。总体而言，结果突出了高纯度静态分析与增量聚类灵活性在演变的物联网环境中的实际权衡。

英文摘要

The growth and heterogeneity of IoT devices create security challenges where static identification models can degrade as traffic evolves. This paper presents a two-stage, flow-feature-based pipeline for unsupervised IoT device traffic profiling and incremental model updating, evaluated on selected long-duration captures from the Deakin IoT dataset. For baseline profiling, density-based clustering (DBSCAN) isolates a substantial outlier portion of the data and produces the strongest alignment with ground-truth device labels among tested classical methods (NMI 0.78), outperforming centroid-based clustering on cluster purity. For incremental adaptation, we evaluate stream-oriented clustering approaches and find that BIRCH supports efficient updates (0.13 seconds per update) and forms comparatively coherent clusters for a held-out novel device (purity 0.87), but with limited capture of novel traffic (share 0.72) and a measurable trade-off in known-device accuracy after adaptation (0.71). Overall, the results highlight a practical trade-off between high-purity static profiling and the flexibility of incremental clustering for evolving IoT environments.
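Cluster purity, one of the metrics reported above, is simple to compute from cluster assignments and ground-truth device labels. This is a minimal sketch under the common majority-label definition; the abstract does not state the exact definition used, and treating DBSCAN-style noise points (label -1) as excluded is an assumption exposed through the hypothetical `noise_label` parameter.

```python
from collections import Counter, defaultdict

def cluster_purity(cluster_ids, true_labels, noise_label=None):
    """Fraction of clustered points that carry their cluster's majority label.

    Points assigned to `noise_label` (for example -1 for density-based
    outliers) are skipped when a noise label is given.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have equal length")
    members = defaultdict(Counter)
    for cid, label in zip(cluster_ids, true_labels):
        if noise_label is not None and cid == noise_label:
            continue
        members[cid][label] += 1
    total = sum(sum(counts.values()) for counts in members.values())
    if total == 0:
        raise ValueError("no clustered points to score")
    # Each cluster contributes the size of its largest label group.
    majority = sum(max(counts.values()) for counts in members.values())
    return majority / total
```

Because excluding outliers tends to raise purity, a figure like this is best read alongside the share of traffic that was actually clustered.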

2602.03912 2026-06-02 cs.LG 版本更新

Echo State Networks for Time Series Forecasting: Hyperparameter Sweep and Benchmarking

回声状态网络用于时间序列预测：超参数扫描与基准测试

Alexander Häußer

发表机构 * Justus-Liebig-University Giessen（吉森尤斯图斯-李比希大学）

AI总结本文研究回声状态网络（ESN）在M4竞赛数据集上的单变量预测性能，通过超参数扫描和基准测试，发现简单的一阶自回归ESN在月度数据上与ARIMA和TBATS相当，在季度数据上取得最低平均MASE。

详情

AI中文摘要

本文研究了回声状态网络（ESN）对M4预测竞赛数据集中月度与季度时间序列的单变量预测性能。我们评估了一个简单的一阶自回归ESN是否能成为广泛使用的预测方法的竞争性替代方案。研究采用两阶段设计：使用参数数据集分析泄漏率、谱半径、储层大小和正则化选择下的ESN模型配置，同时保留一个不相交的预测数据集用于样本外基准测试。预测精度通过平均绝对缩放误差（MASE）和对称平均绝对百分比误差（sMAPE）衡量，并与简单基准和统计模型（包括自回归积分滑动平均（ARIMA）、指数平滑状态空间（ETS）、Theta方法和TBATS）进行比较。模型配置分析揭示了频率特定的模式：月度序列倾向于中等持久性的储层，而季度序列则偏好更收缩的动态；两种频率下，高泄漏率普遍更受青睐。在最终基准测试中，ESN在月度数据上与ARIMA和TBATS表现相当，并在季度数据上取得最低平均MASE，尽管并非在所有指标上均一致最优。总体而言，结果表明，在考虑过滤后的M4子集上，简单的自回归ESN能提供有竞争力的预测精度（特别是在MASE下），且一旦ESN配置固定，训练和预测时间需求较低。

英文摘要

This paper investigates the performance of Echo State Networks (ESNs) for univariate forecasting of monthly and quarterly time series from the M4 Forecasting Competition dataset. We evaluate whether a simple first-order autoregressive ESN can serve as a competitive alternative to widely used forecasting methods. The study uses a two-stage design: a Parameter dataset is used to analyze ESN model configurations over leakage rate, spectral radius, reservoir size, and regularization selection, while a disjoint Forecast dataset is reserved for out-of-sample benchmarking. Forecast accuracy is measured using mean absolute scaled error (MASE) and symmetric mean absolute percentage error (sMAPE) and compared with simple benchmarks and statistical models including autoregressive integrated moving average (ARIMA), exponential smoothing state space (ETS), the Theta method, and TBATS. The model-configuration analysis reveals frequency-specific patterns: monthly series tend to favor moderately persistent reservoirs, whereas quarterly series favor more contractive dynamics; across both frequencies, high leakage rates are generally preferred. In the final benchmark, the ESN performs on par with ARIMA and TBATS for monthly data and achieves the lowest mean MASE for quarterly data, although it is not uniformly best across all metrics. Overall, the results indicate that a simple autoregressive ESN can provide competitive forecast accuracy on the considered filtered M4 subsets, particularly under MASE, while requiring low training and forecasting time once the ESN configuration has been fixed.
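The two accuracy measures named above can be written down compactly. The sketch below follows the usual M4-style definitions (MASE scaled by the in-sample seasonal naive error, sMAPE in percent); the function names are hypothetical, and whether the paper applies exactly these conventions to its filtered subsets is an assumption.

```python
def mase(actual, forecast, history, season=1):
    """Mean absolute scaled error, scaled by the in-sample seasonal naive MAE."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have equal length")
    if len(history) <= season:
        raise ValueError("history must be longer than the seasonal period")
    naive_errors = [abs(history[t] - history[t - season])
                    for t in range(season, len(history))]
    scale = sum(naive_errors) / len(naive_errors)
    if scale == 0.0:
        raise ValueError("history is constant at the seasonal lag; MASE undefined")
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale

def smape(actual, forecast):
    """Symmetric MAPE in percent: mean of 200 * |a - f| / (|a| + |f|).

    Undefined when an actual value and its forecast are both zero.
    """
    terms = [200.0 * abs(a - f) / (abs(a) + abs(f)) for a, f in zip(actual, forecast)]
    return sum(terms) / len(terms)
```

For monthly series one would typically pass `season=12` and for quarterly series `season=4`; a MASE below 1 means the forecast beats the in-sample seasonal naive method on average.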

2604.05324 2026-06-02 cs.LG cs.IT math.IT 版本更新

A Theoretical Framework for Statistical Evaluability of Generative Models

生成模型统计可评估性的理论框架

Shashaank Aiyer, Yishay Mansour, Shay Moran, Han Shao

发表机构 * University of Maryland（马里兰大学）； Tel Aviv University and Google Research（特拉维夫大学和谷歌研究院）； Technion and Google Research（以色列理工学院和谷歌研究院）

AI总结提出一个理论框架，研究生成模型的统计可评估性，证明基于有界测试类的积分概率度量可有限样本评估，而Rényi和KL散度不可评估。

Comments 30 pages

详情

AI中文摘要

统计评估旨在使用从真实分布中采样的、留出的独立同分布测试数据来估计模型的泛化性能。在分类等监督学习设置中，错误率等性能指标定义明确，给定足够大的数据集，测试误差可以可靠地近似总体误差。相比之下，由于生成模型的开放性，其评估更具挑战性：不清楚哪些指标是合适的，也不清楚这些指标能否从有限样本中可靠评估。在这项工作中，我们引入了一个评估生成模型的理论框架，并建立了常用指标的可评估性结果。我们研究了两类指标：基于测试的指标（包括积分概率度量，IPMs）以及Rényi散度。我们证明，对于任何有界测试类，IPMs可以从有限样本中评估，至多相差乘性和加性近似误差。此外，当测试类具有有限的fat-shattering维度时，IPMs可以以任意精度评估。相比之下，Rényi散度和KL散度无法从有限样本中评估，因为它们的值可能关键性地取决于罕见事件。我们还分析了困惑度作为评估方法的潜力和局限性。

英文摘要

Statistical evaluation aims to estimate the generalization performance of a model using held-out i.i.d. test data sampled from the ground-truth distribution. In supervised learning settings such as classification, performance metrics such as error rate are well-defined, and test error reliably approximates population error given sufficiently large datasets. In contrast, evaluation is more challenging for generative models due to their open-ended nature: it is unclear which metrics are appropriate and whether such metrics can be reliably evaluated from finite samples. In this work, we introduce a theoretical framework for evaluating generative models and establish evaluability results for commonly used metrics. We study two categories of metrics: test-based metrics, including integral probability metrics (IPMs), and Rényi divergences. We show that IPMs with respect to any bounded test class can be evaluated from finite samples up to multiplicative and additive approximation errors. Moreover, when the test class has finite fat-shattering dimension, IPMs can be evaluated with arbitrary precision. In contrast, Rényi and KL divergences are not evaluable from finite samples, as their values can be critically determined by rare events. We also analyze the potential and limitations of perplexity as an evaluation method.
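An integral probability metric with respect to a test class is the largest gap in expected test value between two distributions. The following is a minimal plug-in sketch for a finite class of bounded tests, with both distributions represented by samples; `empirical_ipm` is a hypothetical name, and this naive estimator is only illustrative, not the evaluation procedure analysed in the paper.

```python
def empirical_ipm(samples_p, samples_q, tests):
    """Plug-in IPM estimate: max over tests of |mean f(P) - mean f(Q)|.

    `tests` is a finite collection of bounded functions, for example
    indicator functions with values in [0, 1].
    """
    if not samples_p or not samples_q or not tests:
        raise ValueError("samples and tests must be non-empty")
    gaps = []
    for f in tests:
        mean_p = sum(f(x) for x in samples_p) / len(samples_p)
        mean_q = sum(f(x) for x in samples_q) / len(samples_q)
        gaps.append(abs(mean_p - mean_q))
    return max(gaps)
```

Because each test is bounded, every sample changes an empirical mean by at most a bounded amount, which is the intuition behind finite-sample evaluability; a divergence such as KL has no such bound, since a single rare event can dominate its value.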

2604.04199 2026-06-02 cs.LG 版本更新

Which Leakage Types Matter? A Quantitative Landscape Across 2,047 Benchmark Datasets

哪些泄漏类型重要？2047个基准数据集的定量景观

Simon Roth

发表机构 * Simon Roth

AI总结通过在2047个独立同分布表格数据集上进行28项受试者内反事实实验，以及129个时间序列数据集的边界实验，定量评估了机器学习中四类数据泄漏的严重性。

Comments 39 pages, 6 figures, 13 tables. Companion to arXiv:2603.10742

2603.24324 2026-06-02 cs.LG cs.AI cs.SY eess.SY 版本更新

Large Language Model Guided Incentive Aware Reward Design for Cooperative Multi-Agent Reinforcement Learning

大语言模型引导的激励感知奖励设计用于合作多智能体强化学习

Dogan Urgun, Gokhan Gungor

发表机构 * Department of Electrical and Electronics Engineering（电气与电子工程系）； Karabuk University（卡拉博克大学）； Department of Mechatronics Engineering（机电一体化工程系）

AI总结提出利用大语言模型自动生成可执行奖励程序，结合多智能体近端策略优化训练，在Overcooked-AI环境中显著提升合作任务回报。

详情

AI中文摘要

多级神经网络逼近

Shijun Zhang, Zuowei Shen, Yuesheng Xu

发表机构 * Department of Applied Mathematics, Hong Kong Polytechnic University（应用数学系，香港理工大学）； Department of Mathematics, National University of Singapore（数学系，新加坡国立大学）； Department of Mathematics and Statistics, Old Dominion University（数学与统计系，欧道明大学）

AI总结本文提出多级深度学习（MGDL）框架，通过逐级冻结并训练子网络拟合残差，实现结构化误差修正，并证明固定宽度多级ReLU网络可均匀逼近连续函数。

详情

AI中文摘要

我们研究多级深度学习（MGDL）作为深度神经网络中结构化误差修正的原则性框架。虽然神经网络的逼近能力现在相对被充分理解，但由于高度非凸且常常病态的优化景观，训练非常深的架构仍然具有挑战性。相比之下，对于相对浅的网络，特别是某些单隐层ReLU模型，在适当设置下训练允许具有全局保证的凸重构，这激发了在扩展深度的同时提高稳定性的学习范式。MGDL基于这一见解，通过逐级训练深度网络：先前学习的级别被冻结，每个新添加的级别子网络被组合在先前学习的级别之上，并训练以拟合当前逼近留下的残差，产生结构化和可解释的分层修正过程。我们为MGDL开发了算子理论基础，并证明对于定义在超立方体上的任何连续目标函数，存在一个固定宽度的多级ReLU方案，其残差点态非增且一致收敛到零，并且对于每个非平凡级别，在$p\in [1,\infty)$上具有严格的$L^p$范数衰减。据我们所知，这项工作提供了第一个严格的构造性逼近保证，表明逐级残差修正方案可以在固定宽度多级ReLU架构中实现误差消失。

英文摘要

We study multigrade deep learning (MGDL) as a principled framework for structured error refinement in deep neural networks. While the approximation power of neural networks is now relatively well understood, training very deep architectures remains challenging due to highly nonconvex and often ill-conditioned optimization landscapes. In contrast, for relatively shallow networks, most notably certain one-hidden-layer ReLU models, training admits convex reformulations with global guarantees under appropriate settings, motivating learning paradigms that improve stability while scaling to depth. MGDL builds on this insight by training deep networks grade by grade: previously learned grades are frozen, and each newly added grade-wise subnetwork is composed on top of the previously learned grades and trained to fit the residual left by the current approximation, yielding a structured and interpretable hierarchical refinement process. We develop an operator-theoretic foundation for MGDL and prove that, for any continuous target function defined on a hypercube, there exists a fixed-width multigrade ReLU scheme whose residuals are pointwise nonincreasing in magnitude and converge uniformly to zero, with strict $L^p$-norm decay at every nontrivial grade for $p\in [1,\infty)$. To the best of our knowledge, this work provides the first rigorous constructive approximation guarantee showing that a grade-wise residual refinement scheme can achieve vanishing error in a fixed-width multigrade ReLU architecture.

2604.01802 2026-06-02 cs.LG 版本更新

Real-Time Sensing of Inaccessible Physical Fields via an Edge-Deployable Hardware-Portable Graph Neural Operator

通过边缘可部署的硬件可移植图神经算子实时感知不可及的物理场

William Howes, Jason Yoo, Kazuma Kobayashi, Subhankar Sarkar, Farid Ahmed, Souvik Chakraborty, Syed Bahauddin Alam

发表机构 * Grainger College of Engineering, Nuclear, Plasma & Radiological Engineering Department, University of Illinois Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校格兰杰工程学院核、等离子体与放射工程系）； Yardi School of Artificial Intelligence, Indian Institute of Technology Delhi（印度理工学院德里分校Yardi人工智能学院）； Department of Applied Mechanics, Indian Institute of Technology Delhi（印度理工学院德里分校应用力学系）； National Center for Supercomputing Applications（国家超级计算应用中心）

AI总结提出VIRSO，一种具有独特空间-谱架构的神经算子，通过硬件协同设计实现边缘设备上从稀疏边界观测到内部多物理场的实时推理，在降低能耗的同时保持高精度。

Comments 36 pages, 5 figures, 16 tables

详情

AI中文摘要

从稀疏边界观测实时推断不可及的内部物理场是科学机器学习中一个基本但尚未解决的问题，与许多工程应用中的安全关键监测直接相关。现有的神经算子实现了高精度，但未解决在嵌入式边缘平台上的部署问题。本文引入VIRSO（虚拟不规则实时稀疏算子），这是第一个具有独特空间-谱架构、明确面向边缘部署硬件的神经算子。VIRSO通过显式与硬件执行对齐的谱-空间分解，学习从稀疏、几何上不相连的边界输入到不规则非结构化网格上空间连续的内部多物理场的非线性映射；该分解包含一条计算受限的图谱路径和一条内存带宽受限的空间聚合路径，二者分别在数据中心和嵌入式加速器上独立表征。该设计将推理能量-延迟积相对于原始图算子基线降低了29倍（在NVIDIA H200上从206 J·ms降至7.0 J·ms），并在NVIDIA Jetson Orin Nano上以不超过7.06 W的板级功耗实现了17.0样本/秒的嵌入式推理，且无需修改。一种网格密度自适应图构建策略（V-KNN）同时提高了精度并将图边数减少了34%。在重建比从47:1到156:1的三个基准测试中，VIRSO实现了低于1%的平均相对$L_2$误差，参数少于算子基线，并且相对于高保真参考求解器提供了约$10^4$倍的推理加速。据我们所知，这是首个个位数瓦级神经算子的演示，确立了硬件协同设计作为基于算子的推理中缺失的要素以及实现实时部署的可行路径。

英文摘要

Real-time inference of inaccessible interior physical fields from sparse boundary observations is a fundamental but unresolved problem in scientific machine learning, with direct relevance to safety-critical monitoring across many engineering applications. Existing neural operators achieve high accuracy but leave deployment to embedded edge platforms unaddressed. Here we introduce VIRSO (Virtual Irregular Real-Time Sparse Operator), the first neural operator with a unique spatial-spectral architecture that explicitly addresses edge-deployment hardware. VIRSO learns a nonlinear mapping from sparse, geometrically disjoint boundary inputs to spatially continuous interior multiphysics fields on irregular unstructured meshes through a spectral-spatial decomposition explicitly aligned with hardware execution: a compute-bound graph spectral pathway and a memory-bandwidth-bound spatial-aggregation pathway, each independently characterized on datacenter and embedded accelerators. The design reduces the inference energy-delay product by 29$\times$ relative to the vanilla graph-operator baseline (206 J$\cdot$ms $\to$ 7.0 J$\cdot$ms on an NVIDIA H200) and enables 17.0 samples/s embedded inference on an NVIDIA Jetson Orin Nano within 7.06 W board-level power, without modification. A mesh-density-adaptive graph construction strategy (V-KNN) simultaneously improves accuracy and reduces graph edge count by 34%. Across three benchmarks with reconstruction ratios from 47:1 to 156:1, VIRSO achieves mean relative $L_2$ errors below 1% with fewer parameters than operator baselines and delivers an inference speedup of $\approx 10^4$ times over the high-fidelity reference solver. To our knowledge, this is the first demonstration of a single-digit-watt neural operator, establishing hardware co-design as a missing ingredient in operator-based inference and a tractable path to real-time deployment.
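The headline efficiency figures can be cross-checked with simple arithmetic. The sketch below only recomputes ratios from the numbers quoted in the abstract; the helper names are hypothetical, and the energy-per-sample value is a derived upper bound that assumes the board draws the full 7.06 W at 17.0 samples/s.

```python
def reduction_factor(baseline, improved):
    """How many times smaller `improved` is than `baseline`."""
    return baseline / improved

def energy_per_sample(power_watts, samples_per_second):
    """Joules spent per inference at a given power draw and throughput."""
    return power_watts / samples_per_second

# Energy-delay product on the H200: 206 J*ms for the baseline vs 7.0 J*ms.
edp_gain = reduction_factor(206.0, 7.0)

# Jetson Orin Nano: at most 7.06 W while sustaining 17.0 samples/s.
joules_per_sample = energy_per_sample(7.06, 17.0)
```

The first ratio comes out at roughly 29.4, consistent with the reported 29-fold reduction, and the second implies at most about 0.42 J per reconstructed field on the embedded board.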

2510.03690 2026-06-02 cs.LG stat.ML 版本更新

From Moments to Models: Graphon-Mixture Learning for Mixup and Contrastive Learning

从矩到模型：面向Mixup与对比学习的Graphon混合模型学习

Ali Azizpour, Reza Ramezanpour, Santiago Segarra

发表机构 * University of Michigan（密歇根大学）

AI总结提出一个统一框架，将图数据建模为图模型（graphon）混合，利用图矩（motif密度）聚类并估计混合成分，进而提出图模型感知的混合（GMAM）和对比学习（MGCL）方法，在监督和无监督任务上取得最优或竞争性能。

详情

AI中文摘要

现实世界的图数据集通常来自混合群体，其中的图由多个不同的潜在分布生成。在这项工作中，我们提出了一个统一框架，将图数据显式建模为以graphon表示的概率图生成模型的混合。为了表征和估计这些graphon，我们利用图矩（motif密度）对由同一底层模型生成的图进行聚类。我们建立了一个新的理论保证，推导出一个更紧的界，表明从结构相似的graphon中采样的图以高概率表现出相似的motif密度。这一结果使得graphon混合成分的估计有据可依。我们展示了引入估计得到的graphon混合成分如何增强两种广泛使用的下游范式：基于Mixup的图数据增强和图对比学习。通过让这些方法以底层生成模型为条件，我们开发了graphon混合感知的Mixup（GMAM）和模型感知的图对比学习（MGCL）。在模拟和真实数据集上的大量实验证明了强大的实证性能。在监督学习中，GMAM优于现有的增强策略，在7个数据集中的6个上达到了新的最先进准确率。在无监督学习中，MGCL在七个基准数据集上具有竞争力，并取得了总体最低的平均排名。

英文摘要

Real-world graph datasets often arise from mixtures of populations, where graphs are generated by multiple distinct underlying distributions. In this work, we propose a unified framework that explicitly models graph data as a mixture of probabilistic graph generative models represented by graphons. To characterize and estimate these graphons, we leverage graph moments (motif densities) to cluster graphs generated from the same underlying model. We establish a novel theoretical guarantee, deriving a tighter bound showing that graphs sampled from structurally similar graphons exhibit similar motif densities with high probability. This result enables principled estimation of graphon mixture components. We show how incorporating estimated graphon mixture components enhances two widely used downstream paradigms: graph data augmentation via mixup and graph contrastive learning. By conditioning these methods on the underlying generative models, we develop graphon-mixture-aware mixup (GMAM) and model-aware graph contrastive learning (MGCL). Extensive experiments on both simulated and real-world datasets demonstrate strong empirical performance. In supervised learning, GMAM outperforms existing augmentation strategies, achieving new state-of-the-art accuracy on 6 out of 7 datasets. In unsupervised learning, MGCL performs competitively across seven benchmark datasets and achieves the lowest average rank overall.
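Graph moments of the kind mentioned above can be computed directly from an adjacency matrix. This is a minimal sketch for the two simplest motifs, edges and triangles, normalised as subgraph densities; `motif_densities` is a hypothetical name, and the paper may use a different motif set or normalisation (for example homomorphism densities).

```python
from itertools import combinations

def motif_densities(adjacency):
    """Edge and triangle densities of a simple undirected graph.

    `adjacency` is a symmetric 0/1 matrix (list of lists) with a zero diagonal.
    Each density is the fraction of node pairs or triples forming the motif.
    """
    n = len(adjacency)
    if n < 3:
        raise ValueError("need at least three nodes")
    edges = sum(adjacency[i][j] for i, j in combinations(range(n), 2))
    triangles = sum(adjacency[i][j] * adjacency[j][k] * adjacency[i][k]
                    for i, j, k in combinations(range(n), 3))
    edge_density = edges / (n * (n - 1) / 2)
    triangle_density = triangles / (n * (n - 1) * (n - 2) / 6)
    return edge_density, triangle_density
```

Stacking such densities into one vector per graph gives a fixed-length feature on which graphs of different sizes can be clustered before a graphon is estimated for each cluster.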

2603.29488 2026-06-02 cs.LG 版本更新

What Cosine Similarity of Label Representations Can and Cannot Tell us

标签表示的余弦相似度能告诉我们什么，不能告诉我们什么

Beatrix M. G. Nielsen, Andreas Grivas

发表机构 * IT University of Copenhagen（哥本哈根信息技术大学）； School of Mathematics, University of Edinburgh（爱丁堡大学数学学院）

AI总结本文证明对于softmax分类器，标签表示（称为unembedding）之间的余弦相似度不提供模型概率的任何信息，而对于sigmoid分类器，所有成对余弦相似度定义了可能的标签组合集。

详情

AI中文摘要

余弦相似度常用于衡量神经网络模型向量表示的相似性。然而，表示的余弦相似度并不能保证告诉我们关于模型概率的任何信息。在本文中，我们证明对于softmax分类器，无论是图像分类器还是自回归语言模型，标签表示（在论文中称为unembedding）之间的余弦相似度不提供模型分配的概率的任何信息。具体地，我们证明给定两个unembedding，可以创建另一个模型，该模型对所有输入分配相同的概率，但表示之间的余弦相似度现在要么是1要么是-1。我们还证明对于sigmoid分类器（其中每个输入可以被分配多个标签），unembedding之间的所有成对余弦相似度定义了可能的标签组合集。然而，对于softmax分类器（其中每个输入被分配从最可能到最不可能的标签排序），我们需要所有unembedding差异之间的所有成对余弦相似度才能知道模型可以预测哪些排序。我们得出结论，在没有参考产生它们的分类器的情况下解释unembedding之间的余弦相似度是具有误导性的。

英文摘要

Cosine similarity is often used to measure the similarity of vector representations of neural network models. However, the cosine similarity of representations is not guaranteed to tell us anything about model probabilities. In this paper we show that for a softmax classifier, be it an image classifier or an autoregressive language model, the cosine similarity between label representations (called unembeddings in the paper) does not give any information on the probabilities assigned by the model. Specifically, we prove that given two unembeddings, it is possible to create another model which assigns the same probabilities for all inputs, but where the cosine similarity between the representations is now either 1 or -1. We also show that for a sigmoid classifier (where each input can be assigned multiple labels), all pairwise cosine similarities between the unembeddings define the set of possible label combinations. However, for softmax classifiers (where each input is assigned a ranking of the labels from most to least likely), we need all pairwise cosine similarities between all differences of unembeddings to know which rankings the model can predict. We conclude that it is misleading to interpret the cosine similarity between unembeddings without reference to the classifier that produced them.
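One well-known reason cosine similarity between unembeddings is not pinned down by softmax probabilities is that softmax is invariant to adding the same vector to every unembedding. The sketch below demonstrates that invariance; it is an illustration of the general phenomenon with hypothetical helper names, not the paper's construction, which reaches cosine similarity of exactly 1 or -1.

```python
import math

def softmax_probs(hidden, unembeddings):
    """Softmax over logits <hidden, u_k> for each label unembedding u_k."""
    logits = [sum(h * u for h, u in zip(hidden, row)) for row in unembeddings]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def shift_all(unembeddings, offset):
    """Add the same offset vector to every unembedding."""
    return [[u + o for u, o in zip(row, offset)] for row in unembeddings]

original = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
shifted = shift_all(original, [100.0, 100.0])
hidden = [0.3, -0.7]
```

Both parameterisations give the same probabilities for any hidden vector, because the shift adds the same constant to every logit, yet the first two unembeddings are orthogonal in `original` and nearly parallel in `shifted`.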

2603.28768 2026-06-02 cs.DC cs.LG 版本更新

CRAFT: Fine-Grained Cost-Aware Expert Replication For Efficient Mixture-of-Experts Serving

CRAFT：面向高效混合专家服务的细粒度成本感知专家复制

Adrian Zhao, Zhenkun Cai, Zhenyu Song, Lingfan Yu, Haozheng Fan, Jun Wu, Yida Wang, Nandita Vijaykumar

发表机构 * NVIDIA Corporation（英伟达公司）

AI总结提出CRAFT框架，通过基于估计收益的细粒度逐层复制，在给定内存预算下最大化负载均衡，无需额外训练即可提升大规模MoE服务吞吐量。

Comments 22 pages, 15 figures

详情

Journal ref: Proceedings of the Ninth Conference on Machine Learning and Systems (MLSys 2026)

AI中文摘要

混合专家（MoE）最近成为高效扩展大型语言模型同时保持计算成本近乎恒定的主流架构。专家并行通过跨设备划分专家来分布参数，但这会在推理过程中引入令牌级负载不均衡。专家复制是服务框架中广泛采用的负载均衡技术，通过复制高负载专家来缓解大规模部署中的负载不均衡。在这项工作中，我们证明现有的复制方案往往过度复制，许多副本带来的改进微乎其微。副本消耗大量GPU内存，可能导致资源争用和吞吐量下降。我们提出CRAFT，一种高效的专家复制框架，通过基于估计的复制收益进行细粒度逐层复制，在给定内存预算下最大化负载均衡。CRAFT可以无缝集成到现有服务框架中，无需额外训练或模型更改。我们的评估表明，在模型规模从数千亿到一万亿参数的大规模部署中，与现有复制技术相比，CRAFT将端到端服务吞吐量平均提高到1.14倍（最高1.2倍）。

英文摘要

Mixture-of-Experts (MoE) has recently emerged as the mainstream architecture for efficiently scaling large language models while maintaining near-constant computational cost. Expert parallelism distributes parameters by partitioning experts across devices, but this introduces token-level load imbalance during inference. Expert replication is a widely adopted load-balancing technique in serving frameworks that alleviates load imbalance in large-scale deployments by replicating experts with high loads. In this work, we demonstrate that existing replication schemes often over-replicate, with many replicas providing marginal improvement. Replicas consume substantial GPU memory, which may lead to resource contention and throughput degradation. We present CRAFT, an efficient expert replication framework that maximizes load balance under a given memory budget by performing fine-grained, per-layer replication based on the estimated replication benefit. CRAFT can be seamlessly integrated into existing serving frameworks without any additional training or model changes. Our evaluation shows that CRAFT increases end-to-end serving throughput by $1.14\times$ on average (up to $1.2\times$) over existing replication techniques in large-scale deployments with models ranging from hundreds of billions to a trillion parameters.

2603.23582 2026-06-02 cs.LG cs.AI 版本更新

AI Generalisation Gap In Comorbid Sleep Disorder Staging

共病睡眠障碍分期中的AI泛化差距

Saswata Bose, Suvadeep Maiti, Shivam Kumar Sharma, Mythirayee S, Tapabrata Chakraborti, Srijitesh Rajendran, Raju S. Bapi

发表机构 * arXiv

AI总结针对脑卒中患者睡眠分期中深度学习模型在健康与临床人群间泛化差的问题，通过Grad-CAM可视化和新数据集iSLEEPS，揭示模型关注生理无意义区域，并强调需开发疾病特异性模型。

详情

DOI: 10.1109/ISBI61048.2026.11515484

AI中文摘要

准确的睡眠分期对于诊断脑卒中患者的OSA和低通气至关重要。尽管PSG可靠，但成本高、劳动密集且需人工评分。虽然深度学习在健康受试者中实现了基于EEG的自动睡眠分期，但我们的分析显示，该方法在睡眠紊乱的临床人群中泛化能力差。利用Grad-CAM解释，我们系统地证明了这一局限性。我们引入了iSLEEPS，一个经过临床注释的缺血性脑卒中新数据集（即将公开发布），并评估了SE-ResNet加双向LSTM模型用于单通道EEG睡眠分期。正如预期，健康与疾病受试者之间的跨域性能很差。注意力可视化在临床专家反馈的支持下显示，模型在患者数据中关注生理上无信息的EEG区域。统计和计算分析进一步证实了健康与缺血性脑卒中队列之间显著的睡眠结构差异，强调了在部署前需要经过临床验证的受试者感知或疾病特异性模型。论文和代码摘要见https://himalayansaswatabose.github.io/iSLEEPS_Explainability.github.io/

英文摘要

Accurate sleep staging is essential for diagnosing OSA and hypopnea in stroke patients. Although PSG is reliable, it is costly, labor-intensive, and manually scored. While deep learning enables automated EEG-based sleep staging in healthy subjects, our analysis shows poor generalization to clinical populations with disrupted sleep. Using Grad-CAM interpretations, we systematically demonstrate this limitation. We introduce iSLEEPS, a newly clinically annotated ischemic stroke dataset (to be publicly released), and evaluate a SE-ResNet plus bidirectional LSTM model for single-channel EEG sleep staging. As expected, cross-domain performance between healthy and diseased subjects is poor. Attention visualizations, supported by clinical expert feedback, show the model focuses on physiologically uninformative EEG regions in patient data. Statistical and computational analyses further confirm significant sleep architecture differences between healthy and ischemic stroke cohorts, highlighting the need for subject-aware or disease-specific models with clinical validation before deployment. A summary of the paper and the code is available at https://himalayansaswatabose.github.io/iSLEEPS_Explainability.github.io/

2511.16992 2026-06-02 cs.LG 版本更新

FIRM: Federated In-client Regularized Multi-objective Alignment for Large Language Models

FIRM: 面向大型语言模型的联邦客户端内正则化多目标对齐

Fatemeh Nourzad, Amirhossein Roknilamouki, Eylem Ekici, Jia Liu, Ness Shroff

发表机构 * Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH, USA（电气与计算机工程系，俄亥俄州立大学，哥伦布，OH，USA）； Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA（计算机科学与工程系，俄亥俄州立大学，哥伦布，OH，USA）

AI总结提出FIRM算法，通过客户端内正则化缓解客户端分歧漂移并提高通信效率，实现联邦多目标对齐，并首次给出有限时间收敛保证。

详情

AI中文摘要

将大型语言模型（LLMs）与人类价值观对齐通常需要平衡多个相互冲突的目标，如有用性和无害性。训练这些模型计算密集，且集中式处理引发严重的数据隐私问题。联邦学习（FL）提供了一种有吸引力的替代方案，但现有的联邦多目标优化（FMOO）方法面临严重的通信瓶颈，因为它们依赖向服务器传输多个梯度，这对于大型模型不可扩展。我们提出了FIRM（联邦客户端内正则化多目标对齐），一种新颖的算法，同时实现了客户端分歧漂移缓解和通信效率。在FIRM中，每个客户端本地求解一个正则化多目标优化问题。通过客户端内正则化直接缓解客户端分歧漂移，我们的方法消除了先前工作中常见的多梯度传输需求。因此，客户端只需传输一组适配参数，保持高通信效率。我们证明了我们的算法收敛到帕累托驻点，并且据我们所知，首次为这种联邦多目标对齐设置提供了有限时间收敛保证。实验上，我们展示了与基线相比，FIRM导致更平滑的训练动态、减少的客户端分歧漂移和改进的奖励权衡。我们进一步提出了一种方法，将目标上的偏好纳入考虑，并报告了经验帕累托图，表明FIRM可以根据指定偏好平滑地调整目标之间的权衡。

英文摘要

Aligning Large Language Models (LLMs) with human values often involves balancing multiple, conflicting objectives such as helpfulness and harmlessness. Training these models is computationally intensive, and centralizing the process raises significant data privacy concerns. Federated Learning (FL) offers a compelling alternative, but existing Federated Multi-Objective Optimization (FMOO) methods face severe communication bottlenecks as their reliance on transmitting multiple gradients to a server is unscalable for large models. We introduce FIRM (Federated In-client Regularized Multi-objective alignment), a novel algorithm that achieves both client disagreement drift mitigation and communication efficiency. In FIRM, each client locally solves a regularized multi-objective optimization problem. By directly mitigating client disagreement drift through in-client regularization, our method eliminates the need for the multi-gradient transmissions common in prior works. Consequently, clients need only to transmit a single set of adapted parameters, maintaining high communication efficiency. We prove that our algorithm converges to Pareto-stationary points and, to our knowledge, provide the first finite-time convergence guarantees for this federated multi-objective alignment setting. Empirically, we show that FIRM leads to smoother training dynamics, reduced client disagreement drift, and improved reward trade-offs compared to baselines. We further propose a method to incorporate a preference over the objectives and report empirical Pareto plots, demonstrating that FIRM can smoothly adapt trade-offs between objectives in response to specified preferences.

2603.24511 2026-06-02 cs.LG cs.AI cs.CR 版本更新

图能量匹配：用于图生成的传输对齐能量基建模

Michal Balcerak, Suprosanna Shit, Chinmay Prabhakar, Sebastian Kaltenbach, Michael S. Albergo, Yilun Du, Bjoern Menze

发表机构 * University of Zurich（苏黎世大学）； Harvard University（哈佛大学）； Kempner Institute（凯普纳研究所）

AI总结提出Graph Energy Matching (GEM)方法，基于JKO传输映射优化视角学习置换不变势能，通过能量基切换策略实现离散图的高质量生成，在分子图基准上匹配或超越离散扩散模型。

详情

AI中文摘要

离散数据（如图）的生成建模支撑着许多科学和工业应用，包括分子发现和材料设计。在这些领域中，概率推理尤其有价值，因为它能够实现可组合生成和原则性地融入期望的约束，例如结构或功能属性。能量基模型通过捕获相对似然并在推理过程中直接施加约束来支持可组合推理，自然符合这一目标。然而，离散能量基模型通常难以实现高效高质量的采样，因为支持区域外的区域常包含虚假局部最小值，会困住采样器并导致训练不稳定，从而与离散扩散模型相比存在保真度差距。为了解决这一差距，我们引入了Graph Energy Matching (GEM)，这是一种受Jordan-Kinderlehrer-Otto (JKO)传输映射优化视角启发的离散生成框架。GEM学习一个置换不变势能，同时引导从噪声到高似然图区域的离散传输，并在这些区域内细化样本。我们进一步引入了一种利用能量基切换策略的采样协议，无缝衔接快速的梯度引导传输和用于有效探索的局部混合机制。在分子图基准上，GEM在大多数报告指标上匹配或超越了强离散扩散基线。除了提高生成质量，GEM的相对似然建模还支持定向探索，促进组合生成、属性约束采样以及图之间的插值。项目页面：https://michalbalcerak.ai/graph-energy-matching/。

英文摘要

Generative modeling of discrete data, such as graphs, underpins many scientific and industrial applications, including molecular discovery and materials design. In these domains, probabilistic inference is particularly valuable, as it enables composable generation and principled incorporation of desired constraints, such as structural or functional properties. Energy-based models naturally support this goal by capturing relative likelihoods and enabling composable inference by directly enforcing constraints during inference. However, discrete energy-based models typically struggle with efficient and high-quality sampling, as off-support regions often contain spurious local minima, trapping samplers and causing training instabilities, resulting in a fidelity gap compared to discrete diffusion models. To address this gap, we introduce Graph Energy Matching (GEM), a discrete generative framework inspired by the Jordan-Kinderlehrer-Otto (JKO) transport-map optimization perspective. GEM learns a permutation-invariant potential energy that simultaneously guides discrete transport from noise toward high-likelihood graph regions and refines samples within these regions. We further introduce a sampling protocol leveraging an energy-based switching strategy, seamlessly bridging rapid, gradient-guided transport and a local mixing regime for effective exploration. On molecular graph benchmarks, GEM matches or surpasses strong discrete diffusion baselines on most reported metrics. Beyond improving generation quality, GEM's relative likelihood modeling enables targeted exploration, facilitating compositional generation, property-constrained sampling, and interpolation between graphs. Project page: https://michalbalcerak.ai/graph-energy-matching/.

2603.22235 2026-06-02 cs.HC cs.LG 版本更新

ShapDBM: Exploring Decision Boundary Maps in Shapley Space

ShapDBM：在Shapley空间中探索决策边界图

Luke Watkin, Daniel Archambault, Alex Telea

发表机构 * School of Computing, Newcastle University, UK（英国纽卡斯尔大学计算学院）； Department of Information and Computing Science, Utrecht University, Netherlands（荷兰乌得勒支大学信息与计算科学系）

AI总结提出通过将数据空间转换为Shapley空间并计算降维来生成决策边界图，相比直接基于数据的方法，生成的图质量指标相似或更高，决策区域更紧凑、更易探索且与模型性能更一致。

Comments 4 pages and 3 figures (excluding supplementary material)

2510.19496 2026-06-02 cs.CV cs.AI cs.LG 版本更新

CARES: Context-Aware Resolution Selector for VLMs

CARES: 面向视觉语言模型的上下文感知分辨率选择器

Moshe Kimhi, Nimrod Shabtay, Raja Giryes, Chaim Baskin, Eli Schwartz

发表机构 * Technion（以色列理工学院）； IBM Research（IBM研究院）； Tel-Aviv University（特拉维夫大学）； Ben-Gurion University（本-古里安大学）

AI总结提出CARES轻量级预处理模块，通过紧凑型VLM预测图像-查询对的最小足够分辨率，在保持任务性能的同时最多减少80%计算量。

Comments Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) Accepted to ACL 2026 (Oral presentation). Code available at https://github.com/mkimhi/CARES

详情

AI中文摘要

大型视觉语言模型通常以原始或高分辨率处理图像以保持跨任务有效性。这导致视觉令牌通常占总令牌的97-99%，即使低分辨率图像就足够时，也会产生高计算量和延迟。我们引入了CARES——一种上下文感知分辨率选择器，这是一个轻量级预处理模块，给定图像-查询对，预测最小的足够输入分辨率。CARES使用紧凑型VLM（350M）提取特征，并预测目标预训练VLM的响应何时收敛到其正确回答的峰值能力。尽管作为一组可选分辨率上的离散分类器进行训练，但CARES在推理时插值连续分辨率以实现细粒度控制。在涵盖文档和自然图像以及多样化目标VLM的五个多模态基准测试中，CARES在保持任务性能的同时最多减少80%的计算量。

英文摘要

Large vision-language models (VLMs) commonly process images at native or high resolution to remain effective across tasks. This inflates visual tokens, often to 97-99% of total tokens, resulting in high compute and latency even when low-resolution images would suffice. We introduce \emph{CARES}, a \textbf{C}ontext-\textbf{A}ware \textbf{R}esolution \textbf{S}elector: a lightweight preprocessing module that, given an image-query pair, predicts the \emph{minimal} sufficient input resolution. CARES uses a compact VLM (350M) to extract features and predict when a target pretrained VLM's response converges to its peak ability to answer correctly. Though trained as a discrete classifier over a set of optional resolutions, CARES interpolates continuous resolutions at inference for fine-grained control. Across five multimodal benchmarks spanning documents and natural images, as well as diverse target VLMs, CARES preserves task performance while reducing compute by up to 80%.
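
The abstract does not say how a classifier over discrete resolutions yields a continuous one. One plausible reading, offered only as a sketch, is a probability-weighted average of the candidate resolutions. The candidate list `RESOLUTIONS` and both function names are hypothetical and are not taken from the CARES code.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def interpolate_resolution(logits, resolutions):
    """Probability-weighted average of the candidate resolutions (assumed rule)."""
    probs = softmax(logits)
    return sum(p * r for p, r in zip(probs, resolutions))

# Hypothetical candidate resolutions (pixels on the long side).
RESOLUTIONS = [224, 448, 672, 896]

# Uniform logits place equal mass on every candidate.
continuous = interpolate_resolution([0.0, 0.0, 0.0, 0.0], RESOLUTIONS)
# A confident classifier pulls the result toward one candidate.
confident = interpolate_resolution([0.0, 0.0, 8.0, 0.0], RESOLUTIONS)
```

Under this assumed rule the output always stays between the smallest and largest candidate, and a peaked prediction lands close to a single discrete option.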

2602.10014 2026-06-02 cs.LG stat.ML 版本更新

A Task-Centric Theory for Iterative Self-Improvement with Easy-to-Hard Curricula

一种以任务为中心的迭代自改进理论，采用由易到难的课程

Chenruo Liu, Yijun Dong, Yiqiu Shen, Qi Lei

发表机构 * New York University（纽约大学）

AI总结本文通过将自改进建模为基于奖励过滤分布的最大似然微调，推导了期望奖励的有限样本保证，并证明了在推理任务中由易到难的课程比固定任务混合训练具有更好的理论保证。

详情

AI中文摘要

迭代自改进在由大型语言模型自身生成的、经过奖励验证的输出上微调自回归大型语言模型。与自改进的经验成功相比，这种生成性迭代过程在实际有限样本设置下的理论基础仍然有限。我们通过将每一轮自改进建模为在奖励过滤分布上的最大似然微调，并推导期望奖励的有限样本保证，朝这个目标取得了进展。我们的分析揭示了一个显式的反馈循环，其中更好的模型每轮接受更多数据，支持持续的自改进，同时解释了这种改进最终饱和的原因。通过采用以任务为中心的观点，考虑具有多个难度级别的推理任务，我们进一步证明了在模型初始化、任务难度和样本预算方面的可量化条件，在这些条件下，由易到难的课程比在固定任务混合上训练具有可证明的更好保证。我们的分析通过蒙特卡洛模拟以及涵盖合成图基推理任务和多个标准数学推理基准的实验得到了验证。

英文摘要

Iterative self-improvement fine-tunes an autoregressive large language model (LLM) on reward-verified outputs generated by the LLM itself. In contrast to the empirical success of self-improvement, the theoretical foundation of this generative, iterative procedure in a practical, finite-sample setting remains limited. We make progress toward this goal by modeling each round of self-improvement as maximum-likelihood fine-tuning on a reward-filtered distribution and deriving finite-sample guarantees for the expected reward. Our analysis reveals an explicit feedback loop where better models accept more data per iteration, supporting sustained self-improvement while explaining eventual saturation of such improvement. Adopting a task-centric view by considering reasoning tasks with multiple difficulty levels, we further prove quantifiable conditions on model initialization, task difficulty, and sample budget where easy-to-hard curricula provably achieve better guarantees than training on fixed mixtures of tasks. Our analyses are validated through Monte-Carlo simulations and experiments spanning a synthetic graph-based reasoning task and multiple standard mathematical reasoning benchmarks.
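
The feedback loop described above (better models accept more data per iteration, and improvement eventually saturates) can be illustrated with a toy simulation. This is a minimal sketch under strong assumptions, not the paper's model: the "model" is a single success probability `p`, and the update rule with its `prior_strength` parameter is invented for illustration.

```python
import random

def self_improve(p0, rounds, n_samples, prior_strength, seed=0):
    """Toy loop: p is the probability that the model emits a reward-verified output.

    Each round samples n_samples outputs, keeps the verified ones, and moves p
    toward the filtered data with a weight that grows with the accepted count.
    """
    rng = random.Random(seed)
    p = p0
    history = [p]
    accepted_counts = []
    for _ in range(rounds):
        accepted = sum(1 for _ in range(n_samples) if rng.random() < p)
        accepted_counts.append(accepted)
        # The filtered set holds only verified outputs, so its success rate is 1.0;
        # with few accepted samples the update stays close to the current model.
        weight = accepted / (accepted + prior_strength)
        p = p + weight * (1.0 - p)
        history.append(p)
    return history, accepted_counts

history, accepted_counts = self_improve(
    p0=0.2, rounds=8, n_samples=50, prior_strength=100.0
)
for step, p in enumerate(history):
    print(f"round {step}: success probability {p:.3f}")
```

In this toy, the expected accepted count grows with `p` (the feedback loop), while the step size `weight * (1 - p)` shrinks as `p` approaches 1 (saturation).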

2603.18016 2026-06-02 cs.CL cs.AI cs.DC cs.LG 版本更新

MineDraft: A Framework for Batch Parallel Speculative Decoding

MineDraft: 批量并行推测解码框架

Zhenwei Tang, Arun Verma, Zijian Zhou, Zhaoxuan Wu, Alok Prakash, Daniela Rus, Bryan Kian Hsiang Low

发表机构 * Massachusetts Institute of Technology（麻省理工学院）； Toyota Research Institute（丰田研究院）； Toyota Motor Corporation（丰田公司）

AI总结提出MineDraft框架，通过批量并行设计将草稿生成与验证阶段重叠，显著提升推测解码的吞吐量和端到端延迟。

Comments Accepted at ICML 2026

详情

AI中文摘要

推测解码（SD）通过使用较小的草稿模型提出草稿令牌，随后由较大的目标模型验证，从而加速大型语言模型推理。然而，标准SD的性能通常受限于这些草稿和验证阶段的严格顺序执行。为解决此问题，本文提出MineDraft，一种批量并行推测解码（PSD）框架，旨在通过将草稿生成与验证重叠来有效隐藏草稿延迟。我们的理论分析表明，PSD比标准SD高效得多。MineDraft通过一种新颖的批量并行设计实现PSD，该设计维护两个请求批次，将一个批次的草稿生成与另一个批次的验证重叠。我们的实验结果显示，与标准SD相比，MineDraft在吞吐量（最高提升75%）和端到端延迟（最高降低39%）方面均有显著改进。此外，我们已将MineDraft实现为vLLM的插件，展示了其在生产级推理系统中的实用性。

英文摘要

Speculative decoding (SD) accelerates large language model inference by using a smaller draft model to propose draft tokens that are subsequently verified by a larger target model. However, the performance of standard SD is often limited by the strictly sequential execution of these drafting and verification stages. To address this, this paper proposes MineDraft, a batch parallel speculative decoding (PSD) framework designed to effectively hide drafting latency by overlapping it with verification. Our theoretical analysis shows that PSD is substantially more efficient than standard SD. MineDraft realizes the PSD through a novel batch-parallel design that maintains two batches of requests, overlapping drafting for one batch with verification for the other. Our experimental results show significant improvements of MineDraft in both throughput (up to 75%) and end-to-end latency (up to 39%) over standard SD. Furthermore, we have implemented MineDraft as a plugin for vLLM, demonstrating its practicality for production-ready inference systems.
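
To see why overlapping drafting for one batch with verification for the other can hide drafting latency, consider an idealized timing model. The sketch below is an assumption-laden back-of-the-envelope calculation, not the paper's theoretical analysis: it assumes fixed per-round stage times and that the draft and target models run concurrently without contention.

```python
def sequential_time(rounds, t_draft, t_verify, num_batches=2):
    """Standard SD: every batch drafts, then verifies, with no overlap."""
    return num_batches * rounds * (t_draft + t_verify)

def overlapped_time(rounds, t_draft, t_verify):
    """Two batches in ping-pong: while one is verified, the other drafts.

    Timeline: one draft to fill the pipeline, then 2 * rounds - 1 overlapped
    slots (each as long as the slower stage), then one trailing verification.
    """
    slots = 2 * rounds - 1
    return t_draft + slots * max(t_draft, t_verify) + t_verify

seq = sequential_time(rounds=10, t_draft=1.0, t_verify=3.0)
par = overlapped_time(rounds=10, t_draft=1.0, t_verify=3.0)
print(f"sequential={seq}, overlapped={par}, speedup={seq / par:.2f}x")
```

In this model the benefit depends on how the two stage times compare: when drafting is no slower than verification, drafting is almost fully hidden behind it.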

2603.17893 2026-06-02 cs.SE cs.AI cs.LG 版本更新

scicode-lint: Detecting Methodology Bugs in Scientific Python Code with LLM-Generated Patterns

scicode-lint: 使用LLM生成的模式检测科学Python代码中的方法论错误

Sergey V. Samsonau

发表机构 * Authentic Research Partners, Princeton, NJ（真实研究伙伴，新泽西州普林斯顿）

AI总结提出scicode-lint，通过两级架构（构建时使用前沿模型生成模式，运行时使用小型本地模型执行）自动检测科学Python代码中的方法论错误，如数据泄露、交叉验证错误和缺失随机种子。

详情

AI中文摘要

科学Python代码中的方法论错误会产生看似合理但实际不正确的结果，传统的linter和静态分析工具无法检测到这些错误。多个研究团队构建了特定于ML的linter，证明了检测的可行性。然而，这些工具存在可持续性问题：依赖于特定的pylint或Python版本、有限的打包方式，以及每个新模式都需要手动工程。随着AI生成代码增加了科学软件的数量，对自动化方法论检查（如检测数据泄露、不正确的交叉验证和缺失随机种子）的需求日益增长。我们提出了scicode-lint，其两级架构将模式设计（构建时的前沿模型）与执行（运行时的小型本地模型）分离。模式是生成的，而非手工编码；适应新的库版本花费的是token，而非工程时间。在带有手动标注真实值的Kaggle笔记本上，预处理泄露检测在100%召回率下达到了65%的精确率；在38篇应用AI/ML的已发表科学论文中，精确率为62%（由LLM评判），不同模式类别之间存在显著差异；在一个保留的论文集上，精确率为54%。在受控测试中，scicode-lint在66个模式上达到了97.7%的准确率。

英文摘要

Methodology bugs in scientific Python code produce plausible but incorrect results that traditional linters and static analysis tools cannot detect. Several research groups have built ML-specific linters, demonstrating that detection is feasible. Yet these tools share a sustainability problem: dependency on specific pylint or Python versions, limited packaging, and reliance on manual engineering for every new pattern. As AI-generated code increases the volume of scientific software, the need for automated methodology checking (such as detecting data leakage, incorrect cross-validation, and missing random seeds) grows. We present scicode-lint, whose two-tier architecture separates pattern design (frontier models at build time) from execution (small local model at runtime). Patterns are generated, not hand-coded; adapting to new library versions costs tokens, not engineering hours. On Kaggle notebooks with human-labeled ground truth, preprocessing leakage detection reaches 65% precision at 100% recall; on 38 published scientific papers applying AI/ML, precision is 62% (LLM-judged) with substantial variation across pattern categories; on a held-out paper set, precision is 54%. On controlled tests, scicode-lint achieves 97.7% accuracy across 66 patterns.
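
For readers unfamiliar with the bug class, here is a minimal illustration of preprocessing leakage, one of the methodology bugs named above. It is a generic example with made-up numbers and is not drawn from scicode-lint's pattern set: normalization statistics computed on the full dataset let test-set information influence how the training data is scaled.

```python
import statistics

def standardize(values, mean, std):
    """Scale values to zero mean and unit variance using the given statistics."""
    return [(v - mean) / std for v in values]

data = [1.0, 2.0, 3.0, 4.0, 100.0, 120.0]  # made-up feature column
train, test = data[:4], data[4:]

# Leaky: statistics are computed on train + test before the split is respected.
leaky_mean = statistics.fmean(data)
leaky_std = statistics.pstdev(data)
train_leaky = standardize(train, leaky_mean, leaky_std)

# Correct: statistics come from the training split only and are reused for test.
clean_mean = statistics.fmean(train)
clean_std = statistics.pstdev(train)
train_clean = standardize(train, clean_mean, clean_std)
test_clean = standardize(test, clean_mean, clean_std)
```

Both versions run without error and produce plausible numbers, which is why this kind of bug escapes conventional linters.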

2603.13373 2026-06-02 cs.CY cs.AI cs.LG 版本更新

Ethical Fairness in Ubiquitous Health Sensing without Known Attributes

无已知属性下的普适健康感知伦理公平性

Shaily Roy, Harshit Sharma, Daniel A. Adler, Srijan Sen, Tanzeem Choudhury, Asif Salekin

发表机构 * Ira A. Fulton Schools of Engineering, Arizona State University（亚利桑那州立大学艾拉·A·富尔顿工程学院）； Arizona State University（亚利桑那州立大学）； Cornell University（康奈尔大学）； University of Michigan（密歇根大学）

AI总结针对普适健康感知中缺乏人口统计或异构属性时的公平性问题，提出基于Fisher信息引导的潜在子群学习与无害正则化框架Flare，通过优化几何实现伦理公平。

详情

AI中文摘要

在普适和移动健康系统中，计算模型从可穿戴、行为和生理传感数据推断人类状态。在这些场景中，仅高准确率是不够的；模型必须在不同人群、环境和设备间合乎伦理且公平地运行。然而，依赖训练时的人口统计或异构属性的公平方法难以实施，因为这些属性通常不可用、隐私敏感、受监管或不宜收集。传统的基于均等的公平也可能通过牺牲子群性能而违反伦理原则。为应对这一挑战，我们提出了Flare（Fisher引导的潜在子群学习与无害正则化），这是一个不依赖人口统计和异构属性的框架，将以人为本的公平性与普适和移动传感的伦理原则对齐。Flare利用优化几何，特别是Fisher信息，来正则化曲率并揭示模型行为中的潜在差异，而无需人口统计或异构属性。通过整合表示、损失和曲率信号，它识别隐藏的性能分层，并通过协作但无害的优化对其进行改进，在提升子群性能的同时保持伦理平衡。我们还引入了BHE（善行-避害-公平），一个超越统计均等的伦理公平度量套件。在移动生理、行为和临床传感数据集（包括EDA、OhioT1DM、IHS和Percept-R）上，Flare在伦理公平性上优于最先进的基线。消融、可解释性和损失景观分析表明，这些提升源于更平坦的优化几何、更简单的决策规则和无害的潜在子群适应。运行时分析支持Flare在资源受限的传感部署中的实用性。

英文摘要

In ubiquitous and mobile health systems, computational models infer human states from wearable, behavioral, and physiological sensing data. In these settings, high accuracy alone is insufficient; models must act ethically and equitably across diverse people, contexts, and devices. However, fairness methods that rely on demographic or heterogeneous attributes during training are difficult to enforce because such attributes are often unavailable, privacy-sensitive, regulated, or undesirable to collect. Conventional parity-based fairness can also violate ethical principles by trading off subgroup performance. To address this challenge, we present Flare, Fisher-guided LAtent-subgroup learning with do-no-harm REgularization, a demographic- and heterogeneous-attribute-agnostic framework that aligns human-centered fairness with ethical principles for ubiquitous and mobile sensing. Flare leverages optimization geometry, particularly Fisher Information, to regularize curvature and uncover latent disparities in model behavior without demographic or heterogeneous attributes. By integrating representation, loss, and curvature signals, it identifies hidden performance strata and refines them through collaborative but do-no-harm optimization, enhancing subgroup performance while preserving ethical balance. We also introduce BHE (Beneficence-Harm Avoidance-Equity), a metric suite that operationalizes ethical fairness beyond statistical parity. Across mobile physiological, behavioral, and clinical sensing datasets, including EDA, OhioT1DM, IHS, and Percept-R, Flare improves ethical fairness over state-of-the-art baselines. Ablation, interpretability, and loss-landscape analyses show that these gains arise from flatter optimization geometry, simpler decision rules, and do-no-harm latent-subgroup adaptation. Runtime analysis supports the practicality of Flare for resource-constrained sensing deployments.

2509.12263 2026-06-02 cs.AI cs.LG 版本更新

InPhyRe Discovers: Large Multimodal Models Struggle in Inductive Physical Reasoning

InPhyRe 发现：大型多模态模型在归纳物理推理中表现不佳

Gautam Sreekumar, Vishnu Naresh Boddeti

发表机构 * Department of Computer Science and Engineering, Michigan State University（密歇根州立大学计算机科学与工程系）

AI总结提出 InPhyRe 基准测试，通过合成视频中的碰撞事件预测任务，评估大型多模态模型在未见物理定律下的归纳物理推理能力，发现其依赖有限参数知识、受语言偏差影响且忽略视觉输入。

Comments Accepted to TMLR. 53 pages including appendix

详情

AI中文摘要

大型多模态模型（LMMs）将训练中观察到的物理定律（如动量守恒）编码为参数化知识。这使得 LMMs 能够回答物理推理查询，例如从视觉输入中预测潜在碰撞事件的结果。然而，由于参数化知识仅包含训练中见过的物理定律，它不足以推理遵循训练中未见物理定律的推理场景。在这种新颖的物理环境中，人类可以根据提供的演示调整其物理推理。这种归纳物理推理能力对于 LMMs 在安全关键应用中替代人类代理是必不可少的。尽管其重要性，现有的视觉基准并未评估归纳物理推理，仅考虑 LMMs 中的参数化知识。为此，我们提出了 InPhyRe，这是第一个用于衡量 LMMs 归纳物理推理的视觉问答基准。InPhyRe 评估 LMMs 预测算法生成的合成视频中碰撞事件结果的能力。通过检查超过 13 个开源和专有 LMMs，InPhyRe 告诉我们：（1）LMMs 难以将其关于普遍物理定律的有限参数化知识应用于推理；（2）当推理场景背后的物理定律在训练中未见时，LMMs 的归纳物理推理能力较弱；（3）LMMs 的归纳物理推理受到语言偏差的影响，可能忽略视觉输入，质疑了 LMMs 在视觉输入方面的可信度。

英文摘要

Large multimodal models (LMMs) encode physical laws observed during training, such as momentum conservation, as parametric knowledge. It allows LMMs to answer physical reasoning queries, such as the outcome of a potential collision event from visual input. However, since parametric knowledge includes only the physical laws seen during training, it is insufficient for reasoning in inference scenarios that follow physical laws unseen during training. In such novel physical environments, humans could adapt their physical reasoning based on provided demonstrations. This inductive physical reasoning ability is indispensable for LMMs if they are to replace human agents in safety-critical applications. Despite its importance, existing visual benchmarks do not evaluate inductive physical reasoning and only consider the parametric knowledge in LMMs. To this end, we propose InPhyRe, the first visual question answering benchmark to measure inductive physical reasoning in LMMs. InPhyRe evaluates LMMs' ability to predict the outcome of collision events in algorithmically generated synthetic videos. By inspecting over 13 open-source and proprietary LMMs, InPhyRe informs us that (1) LMMs struggle to apply their limited parametric knowledge about universal physical laws to reasoning, (2) inductive physical reasoning in LMMs is weak when the physical laws underlying inference scenarios were unseen during training, and (3) inductive physical reasoning in LMMs suffers from language bias and may ignore the visual inputs, questioning the trustworthiness of LMMs regarding visual inputs.

2603.14798 2026-06-02 stat.ML cs.LG cs.NA math.NA 版本更新

Preconditioned One-Step Generative Modeling for Bayesian Inverse Problems in Function Spaces

函数空间中贝叶斯逆问题的预处理一步生成建模

Zilan Cheng, Li-Lian Wang, Zhongjian Wang

发表机构 * Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University（数学科学学院，物理与数学科学学院，南洋理工大学）

AI总结提出一种基于一步生成传输的机器学习算法，使用先验对齐的高斯随机场作为源，通过神经算子逼近后验分布，高效求解函数空间中的贝叶斯逆问题。

详情

AI中文摘要

我们提出了一种用于函数空间贝叶斯逆问题的机器学习算法。基于一步生成传输，该方法学习一个摊销神经算子，使高斯源在该算子下的前推（pushforward）近似于以每个新观测为条件的后验分布。我们证明，白噪声源与函数空间极限不兼容，因此采用与先验对齐的高斯随机场（GRF）作为源。通过所得一步条件后验传输的Lipschitz正则性，以及在线性逆问题和基于PDE的逆问题上的数值实验，我们论证了这一选择的合理性。该方法并非从MCMC蒸馏而来：它仅使用先验样本和模拟的部分含噪观测进行训练。一旦训练完成，它能在约$10^{-3}$秒内生成一个$64\times64$的后验样本，避免了MCMC中重复的正向模型评估和多步生成采样器中重复的网络评估，同时匹配关键的后验摘要统计量。

英文摘要

We propose a machine-learning algorithm for Bayesian inverse problems in the function-space regime. Based on one-step generative transport, the method learns an amortized neural operator whose pushforward of a Gaussian source approximates the posterior distribution conditioned on each new observation. We show that white-noise sources are incompatible with the function-space limit, and therefore adopt a prior-aligned GRF as the source. We justify this choice through the Lipschitz regularity of the resulting one-step conditional posterior transport and numerical experiments on linear inverse and PDE-based inverse problems. The method is not distilled from MCMC: it is trained only with prior samples and simulated partial noisy observations. Once trained, it generates a $64\times64$ posterior sample in $\sim 10^{-3}$s, avoiding repeated forward-model evaluations in MCMC and repeated network evaluations in multistep generative samplers while matching key posterior summaries.

2603.14405 2026-06-02 cs.LG cs.AI 版本更新

ES-Merging: Biological MLLM Merging via Embedding Space Signals

ES-Merging: 通过嵌入空间信号进行生物多模态大模型合并

Wonbin Lee, Dongki Kim, Sung Ju Hwang

发表机构 * KAIST（韩国科学技术院）； DeepAuto.ai

AI总结提出ES-Merging框架，利用嵌入空间信号估计合并系数，实现生物多模态大模型的高效合并，提升跨模态推理和单模态知识保留能力。

详情

AI中文摘要

生物多模态大语言模型（MLLMs）已成为科学发现的基础模型。然而，现有模型专注于单一模态，限制了其解决跨模态科学问题的能力。虽然模型合并是将不同模态组合成统一MLLM的有效方法，但现有方法依赖于与输入无关的参数空间启发式，无法准确捕捉模态特异性。为克服这一局限，我们提出基于嵌入信号的MLLM合并（ES-Merging），该框架从嵌入空间信号估计合并系数，将合并范式从参数信号转向嵌入信号。ES-Merging利用嵌入空间中的粗粒度和细粒度信号分别估计层间和元素级合并系数，并联合实现互补系数估计。通过大量实验，我们证明ES-Merging不仅在跨模态推理上，而且在单模态知识保留上均优于现有合并方法，表明嵌入空间信号为MLLM合并提供了有原则且有效的基础。

英文摘要

Biological multimodal large language models (MLLMs) have emerged as powerful foundation models for scientific discovery. However, existing models are specialized to a single modality, limiting their ability to solve inherently cross-modal scientific problems. While model merging is an efficient method to combine the different modalities into a unified MLLM, existing methods rely on input-agnostic parameter space heuristics that fail to faithfully capture modality specialization. To overcome this limitation, we propose the Embedding-Signal-based MLLM Merging (ES-Merging), a framework that estimates merging coefficients from embedding space signals, moving the merging paradigm from the parameter signals to the embedding signals. ES-Merging exploits coarse-grained and fine-grained signals from embedding space to estimate the layer-wise and element-wise merging coefficients, respectively, which are jointly combined for complementary coefficient estimation. Through extensive experiments, we demonstrate that ES-Merging outperforms existing merging methods not only on the cross-modal reasoning but also on the single-modal knowledge preservation, establishing that embedding space signals provide a principled and effective foundation for MLLM merging.

2505.21806 2026-06-02 cs.LG 版本更新

Towards Operational Automated Greenhouse Gas Plume Detection and Delineation

面向运营的自动化温室气体羽流检测与描绘

Brian D. Bue, Jake H. Lee, Andrew K. Thorpe, Philip G. Brodrick, Daniel Cusworth, Alana Ayasse, Vassiliki Mancoridis, Anagha Satish, Shujun Xiong, Riley Duren

发表机构 * University of California, San Diego（加州大学圣地亚哥分校）； NASA Jet Propulsion Laboratory（美国国家航空航天局喷气推进实验室）

AI总结针对高空间分辨率成像光谱仪，通过卷积神经网络和多任务学习解决数据质量、时空偏差和建模目标对齐等障碍，实现运营级温室气体羽流检测与分割。

Comments Main 19 pages 14 figures. Supplemental 19 pages 16 figures. In review

详情

DOI: 10.1016/j.rse.2026.115506
Journal ref: Remote Sensing of Environment 343 (2026) 115506

AI中文摘要

尽管深度学习方法取得了最新进展，但对于精细空间分辨率成像光谱仪，全自动设施级温室气体（GHG）羽流检测系统的运营部署仍然具有挑战性。然而，随着数据可用性的急剧增加，自动化在排放监测中的重要性持续上升。本工作回顾并解决了该领域的几个关键障碍：数据和标签质量控制、时空偏差的预防以及正确对齐的建模目标。我们通过使用来自机载和星载仪器的多活动数据进行的严格实验证明，当这些障碍得到缓解时，卷积神经网络（CNN）能够实现运营检测性能。我们证明，同时学习实例检测和像素级分割的多任务模型可以成功走向运营路径。我们评估了模型在不同排放源类型和区域上的羽流可检测性，确定了运营部署的阈值。最后，我们提供了分析就绪的数据、模型和源代码以实现可重复性，并致力于定义一套最佳实践和验证标准，以促进未来对该领域的贡献。

英文摘要

Operational deployment of a fully automated facility-scale greenhouse gas (GHG) plume detection system remains challenging for fine spatial resolution imaging spectrometers, despite recent advances in deep learning approaches. With the dramatic increase in data availability, however, automation continues to increase in importance for emissions monitoring. This work reviews and addresses several key obstacles in the field: data and label quality control, prevention of spatiotemporal biases, and correctly aligned modeling objectives. We demonstrate through rigorous experiments using multicampaign data from airborne and spaceborne instruments that convolutional neural networks (CNNs) are able to achieve operational detection performance when these obstacles are alleviated. We demonstrate that a multitask model that learns both instance detection and pixelwise segmentation simultaneously can successfully lead towards an operational pathway. We evaluate the model's plume detectability across emission source types and regions, identifying thresholds for operational deployment. Finally, we provide analysis-ready data, models, and source code for reproducibility, and work to define a set of best practices and validation standards to facilitate future contributions to the field.

2603.13312 2026-06-02 cs.MM cs.LG 版本更新

Design-MLLM: A Reinforcement Alignment Framework for Verifiable and Aesthetic Interior Design

Design-MLLM：一种用于可验证且美观的室内设计的强化对齐框架

Yuxuan Yang, Xiaotong Mao, Jingyao Wang, Fuchun Sun

发表机构 * National Jiangsu University of Finance（江苏财经大学）； University of Lorraine（洛林大学）； Institute of Electronics and Information Technology, Chinese Academy of Sciences（中国科学院电子信息技术研究所）； Tsinghua University（清华大学）

AI总结提出Design-MLLM框架，通过双分支美学导向奖励的强化对齐，解决室内设计中空间可行性硬约束与美学偏好软约束的矛盾，生成既可行又美观的设计。

详情

AI中文摘要

室内设计是一个从需求到视觉方案的生成过程，必须同时满足可验证的空间可行性和比较性的美学偏好。虽然最近的多模态大语言模型（MLLM）为解释用户意图和生成设计理由提供了统一基础，但我们的实证分析揭示了实际部署中持续存在的矛盾：MLLM通常生成不可建造且美学不一致的布局。这些发现表明，简单地添加领域内文本是不够的；有效的室内设计需要一种对齐机制，将硬约束与软偏好分离，并在优化过程中协调它们。为此，我们提出Design-MLLM，一种通过双分支、美学导向奖励优化可行性优先偏好目标的强化对齐框架。具体来说，Design-MLLM (i) 使用程序化约束检查显式评估空间可行性，(ii) 仅在可行候选者中评估美学偏好，以避免视觉吸引但不可执行的捷径，(iii) 执行组相对优化以获得稳定的偏好信号。通过这个过程，Design-MLLM学习一种可控策略，一致地选择并生成既可行又美学协调的解决方案，而不是偶尔产生视觉吸引但不可行的设计。在各种基准数据集上的大量实验证明了Design-MLLM的优势。

英文摘要

Interior design is a requirements-to-visual-plan generation process that must simultaneously satisfy verifiable spatial feasibility and comparative aesthetic preferences. While recent multimodal large language models (MLLMs) offer a unified foundation for interpreting user intent and producing design rationales, our empirical analysis reveals a persistent contradiction in real-world deployment: MLLMs often produce layouts that are unbuildable and aesthetically inconsistent. These findings indicate that simply adding in-domain text is insufficient; effective interior design requires an alignment mechanism that separates hard constraints from soft preferences and coordinates them during optimization. To address this, we propose Design-MLLM, a reinforcement alignment framework that optimizes a feasibility-first preference objective via a dual-branch, aesthetic-oriented reward. Specifically, Design-MLLM (i) explicitly evaluates spatial feasibility using programmatic constraint checks, (ii) assesses aesthetic preference only among feasible candidates to avoid visually appealing but unexecutable shortcuts, and (iii) performs group-relative optimization to obtain stable preference signals. Through this process, Design-MLLM learns a controllable policy that consistently selects and generates solutions that are both executable and aesthetically coherent, rather than occasionally producing visually appealing but infeasible designs. Extensive experiments on various benchmark datasets demonstrate the advantages of Design-MLLM.

2603.12996 2026-06-02 cs.LG 版本更新

DAPD: Dependency-Aware Parallel Decoding via Attention for Diffusion LLMs

DAPD: 面向扩散LLM的基于注意力的依赖感知并行解码

Bumjun Kim, Dongjae Jeon, Moongyu Jeon, Albert No

发表机构 * KAIST（韩国科学技术院）

AI总结提出一种无需训练的并行解码方法DAPD，利用自注意力构建掩码标记的依赖图，通过选择独立集并行解码，避免强耦合标记同时更新，提升了扩散LLM的精度-步数权衡。

Comments Accepted at ICML 2026

详情

AI中文摘要

扩散LLM（dLLM）的并行解码很困难，因为每个去噪步骤仅提供逐标记的边缘分布，而同时解掩多个标记需要考虑标记间的依赖关系。我们提出依赖感知并行解码（DAPD），一种简单、无需训练的解码方法，它使用自注意力在掩码标记上诱导条件依赖图。在每次迭代中，图中的边捕捉强标记交互，而非边表示弱依赖。然后，并行解码简化为在图上选择一个独立集并并行解掩所选标记。这避免了同时更新强耦合标记，无需辅助模型或重新训练。在LLaDA和Dream上的实验表明，DAPD改进了现有方法的精度-步数权衡，并实现了更全局分布的并行更新，更好地利用了dLLM的任意顺序生成能力。项目地址：https://ai-isl.github.io/dapd

英文摘要

Parallel decoding for Diffusion LLMs (dLLMs) is difficult because each denoising step provides only token-wise marginal distributions, while unmasking multiple tokens simultaneously requires accounting for inter-token dependencies. We propose Dependency-Aware Parallel Decoding (DAPD), a simple, training-free decoding method that uses self-attention to induce a conditional dependency graph over masked tokens. At each iteration, edges in this graph capture strong token interactions, while non-edges indicate weak dependence. Parallel decoding is then reduced to selecting an independent set on the graph and unmasking the selected tokens in parallel. This avoids co-updating strongly coupled tokens without auxiliary models or retraining. Experiments on LLaDA and Dream show that DAPD improves the accuracy-steps trade-off over existing methods and enables more globally distributed parallel updates that better exploit the any-order generation capability of dLLMs. The project is available at https://ai-isl.github.io/dapd
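
The reduction to independent-set selection can be sketched as follows. This is a hedged illustration, not DAPD's actual procedure: the fixed `threshold`, the symmetric `attention` scores, and the greedy ordering by `confidence` are all assumptions made for the example.

```python
def select_independent_set(masked_positions, attention, confidence, threshold):
    """Greedy independent set over a thresholded attention graph.

    attention[i][j] is a (hypothetical) symmetric interaction score between
    masked positions i and j; an edge exists when it exceeds `threshold`.
    """
    # Visit positions from most to least confident (an assumed heuristic).
    order = sorted(masked_positions, key=lambda i: confidence[i], reverse=True)
    selected = []
    for i in order:
        # Keep i only if it has no edge to any already selected position.
        if all(attention[i][j] <= threshold for j in selected):
            selected.append(i)
    return selected

attention = [
    [0.0, 0.8, 0.1, 0.0],
    [0.8, 0.0, 0.1, 0.6],
    [0.1, 0.1, 0.0, 0.2],
    [0.0, 0.6, 0.2, 0.0],
]
confidence = [0.9, 0.7, 0.5, 0.6]
chosen = select_independent_set([0, 1, 2, 3], attention, confidence, threshold=0.5)
print(chosen)
```

Position 1 is strongly coupled to positions 0 and 3, so it is deferred to a later iteration while the remaining three positions are unmasked together.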

2603.12037 2026-06-02 cs.LG 版本更新

Frequentist Consistency of Prior-Data Fitted Networks for Causal Inference

用于因果推断的先验数据拟合网络的频率派一致性

Valentyn Melnychuk, Vahid Balazadeh, Stefan Feuerriegel, Rahul G. Krishnan

发表机构 * LMU Munich & Munich Center for Machine Learning (MCML), Munich, Germany ； University of Toronto & Vector Institute, Toronto, Canada

AI总结本文分析基于先验数据拟合网络（PFN）的平均处理效应（ATE）估计量的频率派一致性，发现其存在先验诱导的混淆偏差，并提出基于一步后验校正（OSPC）的校准方法，结合鞅后验恢复功能干扰后验，从而恢复频率派一致性并实现半参数Bernstein-von Mises定理。

详情

Journal ref: Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea, PMLR 306, 2026

Flowers: 神经PDE求解器的曲速引擎

Till Muser, Alexandra Spitzer, Matti Lassas, Maarten V. de Hoop, Ivan Dokmanić

发表机构 * ETH Zurich（苏黎世联邦理工学院）； University of Helsinki（赫尔辛基大学）； University of California, Berkeley（加州大学伯克利分校）

AI总结提出Flowers架构，通过多头扭曲场实现线性代价的自适应全局交互，在2D/3D时变PDE基准上超越傅里叶、卷积和注意力基线。

详情

AI中文摘要

我们引入了Flowers，一种完全由多头扭曲构建的神经架构，用于学习PDE解算子。除了逐点通道混合和多尺度支架外，Flowers不使用傅里叶乘子、点积注意力或卷积混合。每个头预测一个位移场并扭曲混合后的输入特征。受物理和计算效率的启发，位移是逐点预测的，没有任何空间聚合，非局域性仅通过每个头在源坐标处的稀疏采样引入。在多尺度残差块中堆叠扭曲得到Flowers，它以线性代价实现自适应的全局交互。我们通过三个互补视角从理论上论证了这一设计：守恒律的流图、非均匀介质中的波以及动力学理论的连续极限。Flowers在一系列2D和3D时变PDE基准上取得了优异性能，特别是流和波。一个紧凑的17M参数模型持续优于相似规模的傅里叶、卷积和注意力基线，而一个150M参数变体超越了近期基于transformer的基础模型，尽管后者使用的参数、数据和训练计算量要多得多。

英文摘要

We introduce Flowers, a neural architecture for learning PDE solution operators built entirely from multihead warps. Aside from pointwise channel mixing and a multiscale scaffold, Flowers use no Fourier multipliers, no dot-product attention, and no convolutional mixing. Each head predicts a displacement field and warps the mixed input features. Motivated by physics and computational efficiency, displacements are predicted pointwise, without any spatial aggregation, and nonlocality enters only through sparse sampling at source coordinates, one per head. Stacking warps in multiscale residual blocks yields Flowers, which implement adaptive, global interactions at linear cost. We theoretically motivate this design through three complementary lenses: flow maps for conservation laws, waves in inhomogeneous media, and a kinetic-theoretic continuum limit. Flowers achieve excellent performance on a broad suite of 2D and 3D time-dependent PDE benchmarks, particularly flows and waves. A compact 17M-parameter model consistently outperforms Fourier, convolution, and attention-based baselines of similar size, while a 150M-parameter variant improves over recent transformer-based foundation models that use many more parameters, data, and training compute.
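
The core warp operation, sampling features at a displaced source coordinate, can be sketched in one dimension. This is a simplified illustration and not the Flowers implementation: the displacement is passed in rather than predicted by a network, and the periodic boundary with linear interpolation is an assumption.

```python
import math

def warp_1d(features, displacement):
    """out[i] = features sampled at i + displacement[i] (periodic, linear interpolation)."""
    n = len(features)
    out = []
    for i in range(n):
        src = (i + displacement[i]) % n      # source coordinate, wrapped into [0, n]
        lo = math.floor(src)
        frac = src - lo
        hi = (lo + 1) % n
        # One sparse read per output point: only the two neighbors of src are touched.
        out.append((1.0 - frac) * features[lo % n] + frac * features[hi])
    return out

features = [0.0, 1.0, 4.0, 9.0]
identity = warp_1d(features, [0.0] * 4)   # zero displacement leaves features unchanged
shifted = warp_1d(features, [1.0] * 4)    # each point reads its right neighbor
halfway = warp_1d(features, [0.5] * 4)    # each point reads a midpoint
```

Each output reads a constant number of inputs regardless of how far away the source coordinate lies, which is consistent with the linear-cost, nonlocal interaction the abstract describes.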

2602.22101 2026-06-02 cs.LG cs.AI 版本更新

On Imbalanced Regression with Hoeffding Trees

关于使用Hoeffding树的不平衡回归

Pantia-Marina Alchirch, Dimitrios I. Diochnos

发表机构 * University of Oklahoma（俄克拉荷马大学）

AI总结针对不平衡回归中的数据流问题，将核密度估计扩展到流式设置并集成层次收缩到增量决策树中，实验表明KDE能持续提升早期流性能。

Comments 17 pages, 5 figures, 3 tables, 2 algorithms, authors' version of paper accepted in PAKDD 2026 special session on Data Science: Foundations and Applications (DSFA)

2503.11832 2026-06-02 cs.AI cs.LG 版本更新

Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-Tuning and Can Be Mitigated by Machine Unlearning

安全幻象：虚假相关性如何破坏VLM安全微调及通过机器遗忘缓解

Yiwei Chen, Yuguang Yao, Yihua Zhang, Bingquan Shen, Gaowen Liu, Sijia Liu

发表机构 * Michigan State University（密歇根州立大学）； National University of Singapore（新加坡国立大学）； Cisco Research（思科研究）

AI总结本文发现视觉语言模型（VLM）的安全微调存在“安全幻象”，即虚假相关性导致脆弱性，并提出机器遗忘作为替代方案，显著降低攻击成功率和不必要拒绝。

Comments Accepted to ICLR 2026

详情

AI中文摘要

最近的视觉语言模型（VLM）在多模态输入（特别是文本和图像）的生成建模方面取得了显著进展。然而，当暴露于不安全查询时，它们生成有害内容的倾向引发了关键的安全问题。虽然当前的对齐策略主要依赖于使用精心策划的数据集进行监督安全微调，但我们发现了一个基本限制，称为“安全幻象”，其中监督微调无意中强化了表面文本模式与安全响应之间的虚假相关性，而不是促进深层的、内在的危害缓解。我们表明，这些虚假相关性使微调后的VLM即使面对基于单词修改的简单攻击也易受攻击，其中将文本查询中的单个单词替换为诱导虚假相关性的替代词即可有效绕过安全防护。此外，这些相关性导致过度谨慎，使微调后的VLM不必要地拒绝良性查询。为了解决这些问题，我们展示了机器遗忘（MU）作为监督安全微调的有力替代方案，因为它避免了有偏的特征-标签映射，并直接从VLM中移除有害知识，同时保留其通用能力。在安全基准上的广泛评估表明，基于MU的对齐将攻击成功率降低了高达60.27%，并将不必要的拒绝减少了超过84.20%。警告：存在可能具有攻击性的AI生成内容。

英文摘要

Recent vision language models (VLMs) have made remarkable strides in generative modeling with multimodal inputs, particularly text and images. However, their susceptibility to generating harmful content when exposed to unsafe queries raises critical safety concerns. While current alignment strategies primarily rely on supervised safety fine-tuning with curated datasets, we identify a fundamental limitation we call the "safety mirage", where supervised fine-tuning inadvertently reinforces spurious correlations between superficial textual patterns and safety responses, rather than fostering deep, intrinsic mitigation of harm. We show that these spurious correlations leave fine-tuned VLMs vulnerable even to a simple one-word modification-based attack, where substituting a single word in text queries with a spurious correlation-inducing alternative can effectively bypass safeguards. Additionally, these correlations contribute to over-prudence, causing fine-tuned VLMs to refuse benign queries unnecessarily. To address these issues, we show machine unlearning (MU) as a powerful alternative to supervised safety fine-tuning, as it avoids biased feature-label mappings and directly removes harmful knowledge from VLMs while preserving their general capabilities. Extensive evaluations across safety benchmarks show that MU-based alignment reduces the attack success rate by up to 60.27% and cuts unnecessary rejections by over 84.20%. WARNING: There exist AI generations that may be offensive in nature.

2510.07650 2026-06-02 cs.LG cs.AI 版本更新

Value Flows

Perry Dong, Chongyi Zheng, Chelsea Finn, Dorsa Sadigh, Benjamin Eysenbach

发表机构 * Stanford University（斯坦福大学）； Princeton University（普林斯顿大学）

AI总结本文利用基于流的生成模型估计完整未来回报分布，通过新的流匹配目标满足分布贝尔曼方程，并利用流导数ODE估计回报不确定性以优先学习，在离线与在线设置中平均成功率提升1.3倍。

Comments ICLR 2026

详情

AI中文摘要

虽然当今大多数强化学习方法将未来回报的分布压缩为单个标量值，但分布RL方法利用回报分布提供更强的学习信号，并支持探索和安全强化学习中的应用。虽然估计回报分布的主要方法是将其建模为离散区间上的分类分布或估计有限数量的分位数，但这些方法留下了关于回报分布的细粒度结构以及如何区分高回报不确定性的状态以进行决策的未解问题。本文的关键思想是使用现代、灵活的基于流的模型来估计完整的未来回报分布，并识别那些具有高回报方差的状态。我们通过制定一个新的流匹配目标来实现这一点，该目标生成满足分布贝尔曼方程的概率密度路径。基于学习到的流模型，我们使用一个新的流导数ODE来估计不同状态的回报不确定性。我们还利用这种不确定性信息，优先在某些转换上学习更准确的回报估计。我们将我们的方法（Value Flows）与先前的方法在离线和在线到在线设置中进行了比较。在37个基于状态和25个基于图像的基准任务上的实验表明，Value Flows在成功率上平均提高了1.3倍。网站：https://pd-perry.github.io/value-flows 代码：https://github.com/chongyi-zheng/value-flows

英文摘要

While most reinforcement learning methods today flatten the distribution of future returns to a single scalar value, distributional RL methods exploit the return distribution to provide stronger learning signals and to enable applications in exploration and safe RL. While the predominant method for estimating the return distribution is by modeling it as a categorical distribution over discrete bins or estimating a finite number of quantiles, such approaches leave unanswered questions about the fine-grained structure of the return distribution and about how to distinguish states with high return uncertainty for decision-making. The key idea in this paper is to use modern, flexible flow-based models to estimate the full future return distributions and identify those states with high return variance. We do so by formulating a new flow-matching objective that generates probability density paths satisfying the distributional Bellman equation. Building upon the learned flow models, we estimate the return uncertainty of distinct states using a new flow derivative ODE. We additionally use this uncertainty information to prioritize learning a more accurate return estimation on certain transitions. We compare our method (Value Flows) with prior methods in the offline and online-to-online settings. Experiments on $37$ state-based and $25$ image-based benchmark tasks demonstrate that Value Flows achieves a $1.3\times$ improvement on average in success rates. Website: https://pd-perry.github.io/value-flows Code: https://github.com/chongyi-zheng/value-flows
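
For reference, the distributional Bellman equation that the flow-matching objective is said to satisfy can be written in its standard textbook form. The notation below ($Z^\pi$ for the random return, $P$ for the transition kernel) follows common distributional RL usage and is an assumption here; the paper's own symbols may differ.

```latex
% Equality holds in distribution, not merely in expectation.
Z^{\pi}(s, a) \overset{d}{=} r(s, a) + \gamma\, Z^{\pi}(S', A'),
\qquad S' \sim P(\cdot \mid s, a), \quad A' \sim \pi(\cdot \mid S').
```

Taking expectations on both sides recovers the usual scalar Bellman equation for $Q^{\pi}(s, a) = \mathbb{E}\left[Z^{\pi}(s, a)\right]$, which is the quantity most RL methods keep while discarding the rest of the distribution.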

2603.03031 2026-06-02 cs.LG (updated version)

Step-Level Sparse Autoencoder for Reasoning Process Interpretation

Xuan Yang, Jiayu Liu, Yuhang Lai, Hao Xu, Zhenya Huang, Ning Miao

Affiliations: University of Science and Technology of China

AI summary: Proposes the step-level sparse autoencoder (SSAE), which forms an information bottleneck through conditional sparsification, separating the incremental information in each reasoning step from background information into sparse features, in order to interpret the reasoning process of large language models.

Abstract

Large Language Models (LLMs) have achieved strong complex reasoning capabilities through Chain-of-Thought (CoT) reasoning. However, their reasoning patterns remain too complicated to analyze. While Sparse Autoencoders (SAEs) have emerged as a powerful tool for interpretability, existing approaches predominantly operate at the token level, creating a granularity mismatch when capturing more critical step-level information, such as reasoning direction and semantic transitions. In this work, we propose step-level sparse autoencoder (SSAE), which serves as an analytical tool to disentangle different aspects of LLMs' reasoning steps into sparse features. Specifically, by precisely controlling the sparsity of a step feature conditioned on its context, we form an information bottleneck in step reconstruction, which splits incremental information from background information and disentangles it into several sparsely activated dimensions. Experiments on multiple base models and reasoning tasks show the effectiveness of the extracted features. By linear probing, we can easily predict surface-level information, such as generation length and first token distribution, as well as more complicated properties, such as the correctness and logicality of the step. These observations indicate that LLMs should already at least partly know about these properties during generation, which provides the foundation for the self-verification ability of LLMs. Our code is available at https://github.com/Miaow-Lab/SSAE.

2603.02650 2026-06-02 cs.LG cs.AI cs.RO (updated version)

Improving Diffusion Planners by Self-Supervised Action Gating with Energies

Yuan Lu, Dongqi Han, Yansen Wang, Dongsheng Li

Affiliations: University of Science and Technology of China

AI summary: Proposes SAGE, which uses a latent consistency signal to re-rank trajectories at inference time and penalise dynamically inconsistent plans, improving the performance and robustness of diffusion planners.

Abstract

Diffusion planners are a strong approach for offline reinforcement learning, but they can fail when value-guided selection favours trajectories that score well yet are locally inconsistent with the environment dynamics, resulting in brittle execution. We propose Self-supervised Action Gating with Energies (SAGE), an inference-time re-ranking method that penalises dynamically inconsistent plans using a latent consistency signal. SAGE trains a Joint-Embedding Predictive Architecture (JEPA) encoder on offline state sequences and an action-conditioned latent predictor for short horizon transitions. At test time, SAGE assigns each sampled candidate an energy given by its latent prediction error and combines this feasibility score with value estimates to select actions. SAGE can integrate into existing diffusion planning pipelines that can sample trajectories and select actions via value scoring; it requires no environment rollouts and no policy re-training. Across locomotion, navigation, and manipulation benchmarks, SAGE improves the performance and robustness of diffusion planners.
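
The selection step described in this abstract can be illustrated with a minimal sketch. It assumes each sampled candidate already has a value estimate and a latent prediction error (its energy). The function name `select_plan`, the weight `beta`, and the linear score `value - beta * energy` are illustrative assumptions, not the paper's actual scoring rule.

```python
from typing import Sequence


def select_plan(values: Sequence[float], energies: Sequence[float], beta: float = 1.0) -> int:
    """Return the index of the candidate with the best combined score.

    values   -- value estimate of each sampled candidate (higher is better)
    energies -- latent prediction error of each candidate (lower is more feasible)
    beta     -- hypothetical trade-off weight; not taken from the paper
    """
    if len(values) != len(energies) or not values:
        raise ValueError("values and energies must be non-empty and of equal length")
    # Penalise candidates whose predicted latent transitions disagree with the plan.
    scores = [v - beta * e for v, e in zip(values, energies)]
    return max(range(len(scores)), key=scores.__getitem__)


# Candidate 0 has the highest value but a large energy (dynamically inconsistent).
values = [10.0, 9.0, 7.0]
energies = [5.0, 0.5, 0.2]
best_value_only = select_plan(values, energies, beta=0.0)  # ignores feasibility
best_gated = select_plan(values, energies, beta=1.0)       # feasibility-aware
```

With `beta=0.0` the rule reduces to plain value-guided selection, which makes it easy to compare the two behaviours on the same candidates.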

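2603.02346 2026-06-02 cond-mat.str-el cs.AI cs.LG (updated version)

Large Electron Model: A Universal Ground State Predictor

Timothy Zaklama, Max Geier, Liang Fu

Affiliations: Massachusetts Institute of Technology; Department of Physics

AI summary: Proposes the Large Electron Model, a neural network model based on the Fermi Sets architecture that generates variational wavefunctions across the entire Hamiltonian parameter manifold, accurately predicts the ground state of interacting electrons in a two-dimensional harmonic potential, and generalizes to unseen coupling strengths and particle numbers, offering a foundation-model approach to materials discovery grounded in the variational principle.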

Comments 8+7 pages, 5+6 figures, 1+1 tables

2603.02238 2026-06-02 cs.LG cs.FL cs.LO (updated version)

Length Generalization Bounds for Transformers

Andy Yang, Pascal Bergsträßer, Georg Zetzsche, David Chiang, Anthony W. Lin

Affiliations: University of Cambridge

AI summary: Proves that no computable length generalization bound exists for C-RASP (a class of languages closely linked to transformers), but provides a computable bound, exponential and optimal, for the positive fragment (equivalent to fixed-precision transformers).

Comments 22 pages

Abstract

Length generalization is a key property of a learning algorithm that enables it to make correct predictions on inputs of any length, given finite training data. To provide such a guarantee, one needs to be able to compute a length generalization bound, beyond which the model is guaranteed to generalize. This paper concerns the open problem of the computability of such generalization bounds for C-RASP, a class of languages which is closely linked to transformers. A positive partial result was recently shown by Chen et al. for C-RASP with only one layer and, under some restrictions, also with two layers. We provide complete answers to the above open problem. Our main result is the non-existence of computable length generalization bounds for C-RASP (already with two layers) and hence for transformers. To complement this, we provide a computable bound for the positive fragment of C-RASP, which we show equivalent to fixed-precision transformers. For both positive C-RASP and fixed-precision transformers, we show that the length complexity is exponential, and prove optimality of the bounds.

2603.02237 2026-06-02 cs.LG cs.AI (updated version)

Concept Heterogeneity-aware Representation Steering

Laziz U. Abdullaev, Noelle Y. L. Wong, Ryan T. Z. Lee, Shiqi Jiang, Khoi N. M. Nguyen, Tan M. Nguyen

Affiliations: arXiv

AI summary: To address the brittleness of global steering caused by non-homogeneous LLM representations, proposes CHaRS, an input-dependent steering method based on optimal transport that uses Gaussian mixture models and discrete optimal transport to achieve more effective behavioral control.

Journal ref: ICML 2026

Abstract

Representation steering offers a lightweight mechanism for controlling the behavior of large language models (LLMs) by intervening on internal activations at inference time. Most existing methods rely on a single global steering direction, typically obtained via difference-in-means over contrastive datasets. This approach implicitly assumes that the target concept is homogeneously represented across the embedding space. In practice, however, LLM representations can be highly non-homogeneous, exhibiting clustered, context-dependent structure, which renders global steering directions brittle. In this work, we view representation steering through the lens of optimal transport (OT), noting that standard difference-in-means steering implicitly corresponds to the OT map between two identical distributions with differing first moments, yielding a global translation. To relax this restrictive assumption, we theoretically model source and target representations as Gaussian mixture models and formulate steering as a discrete OT problem between semantic latent clusters. From the resulting transport plan, we derive an explicit, input-dependent steering map via barycentric projection, producing a smooth, kernel-weighted combination of cluster-level shifts. We term this method Concept Heterogeneity-aware Representation Steering (CHaRS). Through numerous experimental settings, we show that CHaRS yields more effective behavioral control than global steering.
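
The contrast between a single global direction and an input-dependent shift can be sketched in a few lines. The example assumes the cluster centres and per-cluster shifts are already available (for instance from a mixture-model fit and a transport plan, which are not shown here). The Gaussian softmax weighting in `kernel_weighted_shift` is an illustrative stand-in, not the paper's barycentric projection, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def diff_in_means(positive: Sequence[Sequence[float]],
                  negative: Sequence[Sequence[float]]) -> Vector:
    """Single global steering direction: mean(positive) - mean(negative)."""
    mu_pos, mu_neg = mean_vector(positive), mean_vector(negative)
    return [p - q for p, q in zip(mu_pos, mu_neg)]


def kernel_weighted_shift(x: Sequence[float],
                          centers: Sequence[Sequence[float]],
                          shifts: Sequence[Sequence[float]],
                          bandwidth: float = 1.0) -> Vector:
    """Input-dependent shift: softmax-weighted mix of per-cluster shifts.

    Weights follow a Gaussian kernel on the distance from x to each
    (hypothetical) source-cluster centre, so nearby clusters dominate.
    """
    logits = [-sum((a - b) ** 2 for a, b in zip(x, c)) / (2.0 * bandwidth ** 2)
              for c in centers]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [sum(w * s[d] for w, s in zip(weights, shifts)) / total
            for d in range(len(x))]


global_direction = diff_in_means([[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 2.0]])

centers = [[0.0, 0.0], [10.0, 0.0]]
shifts = [[1.0, 0.0], [0.0, 1.0]]
near_first = kernel_weighted_shift([0.0, 0.0], centers, shifts)  # dominated by cluster 0
midway = kernel_weighted_shift([5.0, 0.0], centers, shifts)      # equal mix of both shifts
```

The global direction is the same for every input, whereas the kernel-weighted shift changes smoothly as the input moves between clusters.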

2509.15394 2026-06-02 cs.LG (updated version)

VMDNet: Temporal Leakage-Free Variational Mode Decomposition for Electricity Demand Forecasting

Weibin Feng, Ran Tao, John Cartlidge, Jin Zheng

Affiliations: UKRI EPSRC Doctoral Training Partnership; UKRI EPSRC; AI for Collective Intelligence (AI4CI)

AI summary: Proposes the VMDNet framework, which avoids temporal leakage through sample-wise variational mode decomposition, models each mode with frequency-aware embeddings and parallel temporal convolutional networks, and introduces a Stackelberg-game-inspired bilevel optimization to select hyperparameters, outperforming existing methods in electricity demand forecasting.

Comments 5 pages, 1 figure, 2 tables. Version 3: Accepted author manuscript for the 34th European Signal Processing Conference (EUSIPCO 2026), Bruges, Belgium. Improved figures, additional details on TCN-based parallel decoding, and extended literature review. Code and data available: https://github.com/weibin-feng/VMDNet

Abstract

Accurate electricity demand forecasting is challenging due to the strong multi-periodicity of real-world demand series, which makes effective modeling of recurrent temporal patterns crucial. Decomposition techniques make such structure explicit and thereby improve predictive performance. Variational Mode Decomposition (VMD) is a powerful signal-processing method for periodicity-aware decomposition and has seen growing adoption in recent years. However, existing studies often suffer from information leakage and rely on inappropriate hyperparameter tuning. To address these issues, we propose VMDNet, a causality-preserving framework that (i) applies sample-wise VMD to avoid temporal leakage; (ii) represents each decomposed mode with frequency-aware embeddings and decodes it using parallel temporal convolutional networks (TCNs), ensuring mode independence and efficient learning; and (iii) introduces a Stackelberg game inspired bilevel scheme to guide the selection of VMD's two key hyperparameters. Experiments on three widely used electricity demand datasets show that VMDNet consistently outperforms state-of-the-art baselines.
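
The idea of sample-wise decomposition can be sketched without any signal-processing library. In the example below the decomposition is a deliberately trivial stand-in (mean plus residual), not VMD, and the names `make_samples` and `mean_residual_split` are hypothetical. The point is only that the decomposition is applied to each lookback window on its own, so values from the forecast horizon cannot reach the model inputs.

```python
from typing import Callable, List, Sequence, Tuple

Modes = List[List[float]]


def mean_residual_split(window: Sequence[float]) -> Modes:
    """Stand-in decomposition (NOT VMD): a constant mean mode plus the residual."""
    level = sum(window) / len(window)
    return [[level] * len(window), [x - level for x in window]]


def make_samples(series: Sequence[float],
                 lookback: int,
                 horizon: int,
                 decompose: Callable[[Sequence[float]], Modes] = mean_residual_split,
                 ) -> List[Tuple[Modes, List[float]]]:
    """Build (decomposed input window, future target) pairs.

    The decomposition sees only the lookback window of each sample, so no
    value from the forecast horizon can influence the model inputs.
    """
    samples = []
    for start in range(len(series) - lookback - horizon + 1):
        past = list(series[start:start + lookback])
        future = list(series[start + lookback:start + lookback + horizon])
        samples.append((decompose(past), future))
    return samples


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
samples = make_samples(series, lookback=3, horizon=1)

# Changing the last (future) value leaves every decomposed input window untouched.
altered = make_samples(series[:-1] + [100.0], lookback=3, horizon=1)
```

Decomposing the whole series once before windowing would not have this property, because each mode value would then depend on later observations.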

2603.01097 2026-06-02 cs.LG (updated version)

Understanding LoRA as Knowledge Memory: An Empirical Analysis

Seungju Back, Dongwoo Lee, Naun Kang, Taehee Lee, S. K. Hong, Youngjune Gwon, Sungjin Ahn

Affiliations: New York University

AI summary: Through a systematic empirical study, treats low-rank adaptation (LoRA) as a modular knowledge memory, examining its storage capacity, internalization optimization, scaling to multi-module systems, and long-context reasoning ability, and provides practical guidance on the operational boundaries of LoRA memory.

Comments ICML 2026

Abstract

Continuous knowledge updating for pre-trained large language models (LLMs) is increasingly necessary yet remains challenging. Although inference-time methods like In-Context Learning (ICL) and Retrieval-Augmented Generation (RAG) are popular, they face constraints in context budgets, costs, and retrieval fragmentation. Departing from these context-dependent paradigms, this work investigates a parametric approach using Low-Rank Adaptation (LoRA) as a modular knowledge memory. Although few recent works examine this concept, the fundamental mechanics governing its capacity and composability remain largely unexplored. We bridge this gap through the first systematic empirical study mapping the design space of LoRA-based memory, ranging from characterizing storage capacity and optimizing internalization to scaling multi-module systems and evaluating long-context reasoning. Rather than proposing a single architecture, we provide practical guidance on the operational boundaries of LoRA memory. Overall, our findings position LoRA as the complementary axis of memory alongside RAG and ICL, offering distinct advantages. Code and datasets are available at https://github.com/ahn-ml/Understanding-LoRA-as-Knowledge-Memory.

2506.14003 2026-06-02 cs.LG (updated version)

Unlearning Isn't Invisible: Detecting Unlearning Traces in LLMs from Model Outputs

Yiwei Chen, Soumyadeep Pal, Yimeng Zhang, Qing Qu, Sijia Liu

Affiliations: Michigan State University; University of Michigan, Ann Arbor; IBM Research

AI summary: Finds that large language models that have undergone machine unlearning leave detectable "fingerprints" in their behavior and internal representations, and that a classifier using prediction logits or textual outputs can identify unlearned models with over 90% accuracy.

Comments Accepted to ICLR 2026

Abstract

Machine unlearning (MU) for large language models (LLMs), commonly referred to as LLM unlearning, seeks to remove specific undesirable data or knowledge from a trained model, while maintaining its performance on standard tasks. While unlearning plays a vital role in protecting data privacy, enforcing copyright, and mitigating sociotechnical harms in LLMs, we identify a new vulnerability post-unlearning: unlearning trace detection. We discover that unlearning leaves behind persistent "fingerprints" in LLMs, detectable traces in both model behavior and internal representations. These traces can be identified from output responses, even when prompted with forget-irrelevant inputs. Specifically, even a simple supervised classifier can determine whether a model has undergone unlearning, using only its prediction logits or even its textual outputs. Further analysis shows that these traces are embedded in intermediate activations and propagate nonlinearly to the final layer, forming low-dimensional, learnable manifolds in activation space. Through extensive experiments, we demonstrate that unlearning traces can be detected with over 90% accuracy even under forget-irrelevant inputs, and that larger LLMs exhibit stronger detectability. These findings reveal that unlearning leaves measurable signatures, introducing a new risk of reverse-engineering forgotten information when a model is identified as unlearned, given an input query.

2603.00963 2026-06-02 cs.LG cs.CL (updated version)

Flow-Based Density Ratio Estimation for Intractable Distributions with Applications in Genomics

Egor Antipov, Alessandro Palma, Lorenzo Consoli, Stephan Günnemann, Andrea Dittadi, Fabian J. Theis

Affiliations: ETH Zurich; University of Cambridge; Max Planck Institute for Informatics

AI summary: Uses condition-aware flow matching to derive a single dynamical formulation that tracks density ratios along generative trajectories, enabling efficient estimation of density ratios between intractable distributions, and demonstrates competitive performance in single-cell genomics data analysis.

Abstract

Estimating density ratios between pairs of intractable data distributions is a core problem in probabilistic modeling, enabling principled comparisons of sample likelihoods under different data-generating processes across conditions. While exact-likelihood models such as normalizing flows offer a promising approach to density ratio estimation, naive evaluations are computationally expensive and prone to discretization errors because they require simulating each distribution's likelihood independently. In this work, we leverage condition-aware flow matching to derive a single dynamical formulation for tracking density ratios along generative trajectories. We demonstrate competitive performance on simulated benchmarks for closed-form ratio estimation, and show that our method supports versatile tasks in single-cell genomics data analysis, where likelihood-based comparisons of cellular states across experimental conditions enable treatment effect estimation and batch correction evaluation.

2602.23881 2026-06-02 cs.LG cs.CL (updated version)

LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding

Alexander Samarin, Sergei Krutikov, Anton Shevtsov, Sergei Skvortsov, Filipp Fisin, Alexander Golubev

Affiliations: arXiv

AI summary: To address the fact that standard KL-divergence training in speculative decoding does not maximize the acceptance rate, proposes LK losses that optimize the acceptance rate directly; experiments show consistent improvements in acceptance metrics across multiple architectures and models.

Comments ICML 2026

Abstract

Speculative decoding accelerates autoregressive large language model (LLM) inference by using a lightweight draft model to propose candidate tokens that are then verified in parallel by the target model. The speedup is significantly determined by the acceptance rate, yet standard training minimizes Kullback-Leibler (KL) divergence as a proxy objective. While KL divergence and acceptance rate share the same global optimum, small draft models, having limited capacity, typically converge to suboptimal solutions where minimizing KL does not guarantee maximizing acceptance rate. To address this issue, we propose LK losses, special training objectives that directly target acceptance rate. Comprehensive experiments across four draft architectures and six target models, ranging from 8B to 685B parameters, demonstrate consistent improvements in acceptance metrics across all configurations compared to the standard KL-based training. We evaluate our approach on general, coding and math domains and report gains of up to 8-10% in average acceptance length. LK losses are easy to implement, introduce no computational overhead and can be directly integrated into any existing speculator training framework, making them a compelling alternative to the existing draft training objectives.
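
The gap between the KL proxy and the acceptance rate can be seen on a two-token toy vocabulary. The sketch below uses the standard speculative-sampling fact that a token drawn from the draft distribution q is accepted with probability min(1, p/q), so the expected per-token acceptance is the sum of min(p, q). It does not implement the LK losses themselves, and the choice of forward KL (target to draft) as the proxy is an assumption made for illustration.

```python
import math
from typing import Sequence


def acceptance_rate(target: Sequence[float], draft: Sequence[float]) -> float:
    """Expected per-token acceptance probability: sum over x of min(p(x), q(x))."""
    return sum(min(p, q) for p, q in zip(target, draft))


def forward_kl(target: Sequence[float], draft: Sequence[float]) -> float:
    """KL(target || draft); terms with zero target mass contribute nothing."""
    return sum(p * math.log(p / q) for p, q in zip(target, draft) if p > 0.0)


target = [0.9, 0.1]
draft_a = [0.99, 0.01]  # sharper than the target
draft_b = [0.7, 0.3]    # flatter than the target

kl_a, kl_b = forward_kl(target, draft_a), forward_kl(target, draft_b)
acc_a, acc_b = acceptance_rate(target, draft_a), acceptance_rate(target, draft_b)
# draft_b has the lower KL, yet draft_a is accepted more often.
```

Ranking the two drafts by KL and by acceptance rate gives opposite orders, which is the kind of mismatch a capacity-limited draft model can run into.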

2602.23197 2026-06-02 cs.CL cs.LG stat.ML (updated version)

Fine-Tuning Without Forgetting In-Context Learning: A Theoretical Analysis of Linear Attention Models

Chungpa Lee, Jy-yong Sohn, Kangwook Lee

Affiliations: KAIST

AI summary: Through a theoretical analysis of linear attention models, characterizes how fine-tuning objectives modify attention parameters and the conditions under which few-shot performance degrades, and shows that updating only the value matrix preserves in-context learning.

Journal ref: International Conference on Machine Learning (ICML) 2026

Abstract

Transformer-based large language models exhibit in-context learning, enabling adaptation to downstream tasks via few-shot prompting with demonstrations. In practice, such models are often fine-tuned to improve zero-shot performance on downstream tasks, allowing them to solve tasks without examples and thereby reducing inference costs. However, fine-tuning can degrade in-context learning, limiting the performance of fine-tuned models on tasks not seen during fine-tuning. Using linear attention models, we provide a theoretical analysis that characterizes how fine-tuning objectives modify attention parameters and identifies conditions under which this leads to degraded few-shot performance. We show that fine-tuning all attention parameters can harm in-context learning, whereas restricting updates to the value matrix improves zero-shot performance while preserving in-context learning. We further show that incorporating an auxiliary few-shot loss enhances in-context learning primarily on the target task, at the expense of degraded in-context learning ability on tasks not seen during fine-tuning. We provide empirical evidence from synthetic and real-world datasets consistent with the qualitative predictions of our theory.

2602.16953 2026-06-02 cs.AI cs.LG (updated version)

LLM4Cov: Execution-Aware Agentic Learning for High-coverage Testbench Generation

Hejia Zhang, Zhongming Yu, Chia-Tung Ho, Haoxing Ren, Brucek Khailany, Jishen Zhao

Affiliations: University of Science and Technology of China

AI summary: Proposes LLM4Cov, an offline agent-learning framework that achieves high-coverage testbench generation in hardware verification through execution-validated data curation, policy-aware data synthesis, and worst-state-prioritized sampling; a 4B-parameter model reaches a 69.2% pass rate and 90.4% average coverage on CVDP-ECov.

Comments ICML'26 Camera Ready version

Abstract

Execution-aware LLM agents offer a promising paradigm for learning from tool feedback, but such feedback can be expensive and slow to obtain, making online reinforcement learning (RL) less practical in certain scenarios. High-coverage hardware verification exemplifies this challenge due to its reliance on industrial simulators and non-differentiable execution signals. We propose LLM4Cov, an offline agent-learning framework that models verification as single-step state transitions guided by deterministic evaluators. Building on this formulation, we introduce execution-validated data curation, policy-aware agentic data synthesis, and worst-state-prioritized sampling to enable scalable learning under execution constraints. We further curate a reality-aligned benchmark adapted from an existing verification suite through a revised evaluation protocol. Using the proposed pipeline, a compact 4B-parameter model achieves 69.2% pass rate and 90.4% average coverage in CVDP-ECov under agentic evaluation, outperforming its teacher by 5.3% and 10.5%, demonstrating competitive performance against models an order of magnitude larger.

2508.08337 2026-06-02 cs.CY cs.AI cs.LG (updated version)

Position: Beyond Sensitive Attributes, ML Fairness Should Quantify Structural Injustice via Social Determinants

Zeyu Tang, Alex John London, Atoosa Kasirzadeh, Sarah Stewart de Ramirez, Peter Spirtes, Kun Zhang, Sanmi Koyejo

Affiliations: University of California, Berkeley; University of Cambridge; University of Washington; University of Michigan; University of Toronto

AI summary: Argues that algorithmic fairness research should move beyond sensitive attributes and quantify structural injustice via social determinants, and shows through theoretical models and empirical studies that mitigation strategies focused solely on sensitive attributes can introduce new structural injustice.

Comments Accepted to ICML 2026 Position Paper Track

Abstract

Algorithmic fairness research has largely framed unfairness as discrimination along sensitive attributes. However, this approach limits visibility into unfairness as structural injustice instantiated through social determinants, which are contextual variables that shape attributes and outcomes without pertaining to specific individuals. This position paper argues that the field should quantify structural injustice via social determinants, beyond sensitive attributes. Drawing on cross-disciplinary insights, we argue that prevailing technical paradigms fail to adequately capture unfairness as structural injustice, because contexts are potentially treated as noise to be normalized rather than signal to be audited. We further demonstrate the practical urgency of this shift through a theoretical model of college admissions, a demographic study using U.S. census data, and a high-stakes domain application regarding breast cancer screening within an integrated U.S. healthcare system. Our results indicate that mitigation strategies centered solely on sensitive attributes can introduce new forms of structural injustice. We contend that auditing structural injustice through social determinants must precede mitigation, and call for new technical developments that move beyond sensitive-attribute-centered notions of fairness as non-discrimination.

2601.17074 2026-06-02 cs.LG cs.AI (updated version)

Robust Predictive Uncertainty and Double Descent in Contaminated Bayesian Random Features

Michele Caprio, Katerina Papagiannouli, Siu Lun Chau, Sayan Mukherjee

Affiliations: The University of Manchester, UK; University of Pisa, Italy; Nanyang Technological University, Singapore; Max Planck Institute for Mathematics in the Sciences, Germany

AI summary: Proposes a robust Bayesian random feature regression method that handles prior and likelihood misspecification via Huber contamination sets, derives lower and upper bounds on the posterior predictive density, introduces an Imprecise Highest Density Region for robust uncertainty quantification, and proves that predictive uncertainty remains computationally tractable and inherits the classical double-descent phase structure.

Abstract

We propose a robust Bayesian formulation of random feature (RF) regression that accounts explicitly for prior and likelihood misspecification via Huber-style contamination sets. Starting from the classical equivalence between ridge-regularized RF training and Bayesian inference with Gaussian priors and likelihoods, we replace the single prior and likelihood with $ε$- and $η$-contaminated credal sets, respectively, and perform inference using pessimistic generalized Bayesian updating. We derive explicit and tractable bounds for the resulting lower and upper posterior predictive densities. These bounds show that, when contamination is moderate, prior and likelihood ambiguity effectively acts as a direct contamination of the posterior predictive distribution, yielding uncertainty envelopes around the classical Gaussian predictive. We introduce an Imprecise Highest Density Region (IHDR) for robust predictive uncertainty quantification and show that it admits an efficient approximation via an adjusted Gaussian credible interval. We further obtain predictive variance bounds (under a mild truncation approximation for the upper bound) and prove that they preserve the leading-order proportional-growth asymptotics known for RF models. Together, these results establish a robustness theory for Bayesian random features: predictive uncertainty remains computationally tractable, inherits the classical double-descent phase structure, and is improved by explicit worst-case guarantees under bounded prior and likelihood misspecification.
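
For readers unfamiliar with Huber-style contamination, the standard textbook form of a contaminated set around a nominal distribution is sketched below. The symbols $P_0$, $Q$, and $\mathcal{Q}$ are notation introduced here for illustration only; the paper's exact credal sets and its generalized Bayesian update may differ.

```latex
\mathcal{P}_{\varepsilon}
  = \bigl\{ (1-\varepsilon)\,P_0 + \varepsilon\,Q \;:\; Q \in \mathcal{Q} \bigr\},
  \qquad 0 \le \varepsilon \le 1 .
% When \mathcal{Q} contains all probability measures on \Omega,
% the lower and upper probabilities of an event A are
\underline{P}(A) = (1-\varepsilon)\,P_0(A) \quad (A \neq \Omega),
\qquad
\overline{P}(A) = (1-\varepsilon)\,P_0(A) + \varepsilon \quad (A \neq \emptyset).
```

Setting $\varepsilon = 0$ recovers the single nominal distribution, and the width of the envelope $\overline{P}(A) - \underline{P}(A)$ equals $\varepsilon$ for every non-trivial event.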

2602.19066 2026-06-02 cs.LG cs.AI (updated version)

IDLM: Inverse-distilled Diffusion Language Models

David Li, Nikita Gushchin, Dmitry Abulkhanov, Eric Moulines, Ivan Oseledets, Maxim Panov, Alexander Korotin

Affiliations: GitHub

AI summary: To address the slow inference of diffusion language models, proposes an inverse distillation method (IDLM) with a theoretical guarantee of a unique solution and gradient-stable relaxations, cutting the number of inference steps by 4x to 64x while preserving generation quality.

Comments ICML 2026. We provide the code at: https://david-cripto.github.io/idlm-project-page

Abstract

Diffusion Language Models (DLMs) have recently achieved strong results in text generation. However, their multi-step sampling leads to slow inference, limiting practical use. To address this, we extend Inverse Distillation, a technique originally developed to accelerate continuous diffusion models, to the discrete setting. Nonetheless, this extension introduces both theoretical and practical challenges. From a theoretical perspective, the inverse distillation objective lacks uniqueness guarantees, which may lead to suboptimal solutions. From a practical standpoint, backpropagation in the discrete space is non-trivial and often unstable. To overcome these challenges, we first provide a theoretical result demonstrating that our inverse formulation admits a unique solution, thereby ensuring valid optimization. We then introduce gradient-stable relaxations to support effective training. As a result, experiments on multiple DLMs show that our method, Inverse-distilled Diffusion Language Models (IDLM), reduces the number of inference steps by 4x-64x, while preserving the teacher model's generation quality. We provide the code, model checkpoints, and video tutorials on the project page: https://david-cripto.github.io/idlm-project-page

2512.09730 2026-06-02 cs.CL cs.LG (updated version)

Interpreto: An Explainability Library for Transformers

Antonin Poché, Thomas Mullor, Gabriele Sarti, Frédéric Boisnard, Corentin Friedrich, Charlotte Claye, François Hoofd, Raphael Bernas, Nicholas Asher, Céline Hudelot, Fanny Jourdan

Affiliations: IRT Saint Exupéry Toulouse; IRIT Toulouse; Khoury College of Computer Sciences; Ampere; MICS, CentraleSupélec; Scienta Lab; Thales Avionics; ANITI

AI summary: Interpreto is an open-source Python library that provides a unified explanation workflow for HuggingFace language models (from early BERT variants to LLMs) through attribution methods and concept-based explanations; its end-to-end concept-based pipeline is the main novelty.

Comments Accepted to ACL 2026 System Demonstration. Equal contribution: Poché and Jourdan

2602.18645 2026-06-02 cs.LG (updated version)

Adaptive Time Series Reasoning via Segment Selection

Shvat Messica, Jiawen Zhang, Kevin Li, Theodoros Tsiligkaridis, Marinka Zitnik

Affiliations: Harvard University; Massachusetts Institute of Technology; MIT Lincoln Laboratory; Hong Kong University of Science and Technology

AI summary: Proposes the ARTIST framework, which formulates time-series reasoning as a sequential decision problem and uses a controller-reasoner architecture with reinforcement learning to adaptively select informative segments, improving reasoning accuracy.

Comments ICML 2026

Abstract

Time series reasoning tasks often start with a natural language question and require targeted analysis of a time series. Evidence may span the full series or appear in a few short intervals, so the model must decide what to inspect. Most existing approaches encode the entire time series into a fixed representation before inference, regardless of whether or not the entire sequence is relevant. We introduce ARTIST, which formulates time-series reasoning as a sequential decision problem. ARTIST interleaves reasoning with adaptive temporal segment selection. It adopts a controller-reasoner architecture and uses reinforcement learning to train the controller role to select informative segments and the reasoner role to generate segment-conditioned reasoning traces and final answers. During inference, the model actively acquires task-relevant information instead of relying on a static summary of the full sequence. We use a novel hierarchical policy optimization approach for post-training that allows the model to excel in both segment selection and question-answering behavior. We evaluate ARTIST on six time-series reasoning benchmarks and compare it with large language models, vision-language models, and prior time-series reasoning systems. ARTIST improves average accuracy by 6.46 absolute percentage points over the strongest baseline. The largest gains appear on rare event localization and multi-segment reasoning tasks. Supervised fine-tuning improves performance, and reinforcement learning provides additional gains by optimizing question-adaptive segment selection. These results show that selective data use drives effective time-series reasoning.

2602.18195 2026-06-02 cs.LG cs.AI 版本更新

LERD: Latent Event-Relational Dynamics for Neurodegenerative Classification

LERD: 用于神经退行性疾病分类的潜在事件-关系动力学

Yicheng Feng, Hairong Chen, Ziyu Jia, Samir Bhatt, Hengguan Huang

发表机构 * University of California, Berkeley（加州大学伯克利分校）； Stanford University（斯坦福大学）； University of Washington（华盛顿大学）； University of California, San Diego（加州大学圣地亚哥分校）； University of California, Los Angeles（加州大学洛杉矶分校）

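AI summary: Proposes LERD, an end-to-end Bayesian latent event-relational dynamical system that infers latent neural events and their relational structure directly from multichannel EEG without event or interaction annotations; it outperforms baselines on Alzheimer's disease classification and provides physiology-aligned summaries of the dynamics.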

AI中文摘要

阿尔茨海默病（AD）会改变大脑电生理学并破坏多通道脑电图动力学，使得准确且临床有用的基于脑电图的诊断对于筛查和疾病监测越来越重要。然而，许多现有方法依赖黑盒分类器，并未明确建模其决策背后的潜在事件时序和跨通道协调。为解决这些局限，我们提出LERD，一种端到端贝叶斯潜在事件-关系动力系统，无需事件或交互标注，直接从多通道脑电图推断潜在神经事件及其关系结构。LERD结合连续时间事件推断模块与随机事件生成过程以捕获灵活的时间模式，同时融入电生理学启发的动力学先验以原则性方式指导学习。我们进一步提供理论分析，得到基于初值问题的可处理KL正则化项以及推断关系动力学的稳定性保证。在合成基准和两个真实世界AD脑电图队列上的大量实验表明，LERD一致优于强基线，并生成与生理对齐的速率、时序和图摘要，有助于刻画组级动力学差异。

英文摘要

Alzheimer's disease (AD) alters brain electrophysiology and disrupts multichannel EEG dynamics, making accurate and clinically useful EEG-based diagnosis increasingly important for screening and disease monitoring. However, many existing approaches rely on black-box classifiers and do not explicitly model the latent event timing and cross-channel coordination behind their decisions. To address these limitations, we propose LERD, an end-to-end Bayesian latent event-relational dynamical system that infers latent neural events and their relational structure directly from multichannel EEG without event or interaction annotations. LERD combines a continuous-time event inference module with a stochastic event-generation process to capture flexible temporal patterns, while incorporating an electrophysiology-inspired dynamical prior to guide learning in a principled way. We further provide theoretical analysis that yields a tractable IVP-based KL regularizer and stability guarantees for the inferred relational dynamics. Extensive experiments on synthetic benchmarks and two real-world AD EEG cohorts demonstrate that LERD consistently outperforms strong baselines and yields physiology-aligned rate, timing, and graph summaries that help characterize group-level dynamical differences.

2602.18008 2026-06-02 cs.LG cs.AI cs.CL 版本更新

Are LLMs Ready for Neural-integrated Mechanistic Modeling? A Benchmark and Agentic Framework

LLM 是否准备好进行神经集成机制建模？一个基准测试与智能体框架

Zihan Guan, Rituparna Datta, Mengxuan Hu, Shunshun Liu, Aiying Zhang, Prasanna Balachandran, Sheng Li, Anil Vullikanti

发表机构 * University of Virginia（弗吉尼亚大学）

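AI summary: Introduces the Neural-Integrated Mechanistic Modeling (NIMM) benchmark, which evaluates how well large language models construct neural-integrated mechanistic models across three scientific domains, and designs NIMMGen, a tree-guided agentic framework that significantly improves search stability and solution quality through branch-level search and atomic model refinement.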

Comments 25 pages, 8 figures

AI中文摘要

大语言模型（LLM）在从数据构建机制模型方面显示出潜力。然而，现有评估主要关注简化设置，未能捕捉真实世界科学建模的复杂性。在实践中，此类建模通常涉及神经集成公式，其中机制模型组件和神经网络组件共同构建，导致搜索空间显著复杂化。受此差距驱动，我们引入了神经集成机制建模（NIMM）基准测试，该基准测试评估 LLM 生成的神经集成机制模型在三个科学领域上的表现。在 NIMM 上的实验表明，现有基于 LLM 的方法难以有效探索这一复杂空间，导致搜索稳定性和解质量有限。为应对这一挑战，我们提出了 NIMMGen，一种树引导的智能体框架，通过分支级搜索实现多样化探索，并通过原子模型细化改进解。大量实验表明，NIMMGen 在 NIMM 上达到了最先进的性能，显著提升了搜索稳定性和解质量。

英文摘要

Large language models (LLMs) have shown promise in constructing mechanistic models from data. However, existing evaluations largely focus on simplified settings and fail to capture the complexity of real-world scientific modeling. In practice, such modeling often involves neural-integrated formulations, where a mechanistic model component and a neural network component are jointly constructed, leading to a significantly more complex search space. Motivated by this gap, we introduce the Neural-Integrated Mechanistic Modeling (NIMM) benchmark, which evaluates LLM-generated neural-integrated mechanistic models across three scientific domains. Experiments on NIMM reveal that existing LLM-based approaches struggle to effectively explore this complex space, resulting in limited search stability and solution quality. To address this challenge, we propose NIMMGen, a tree-guided agentic framework that enables diversified exploration via branch-level search and improves solutions through atomic model refinement. Extensive experiments demonstrate that NIMMGen achieves state-of-the-art performance on NIMM, significantly improving search stability and solution quality.

2602.17737 2026-06-02 cs.RO cs.LG cs.MA 版本更新

NestRL: A Nested Training Regime for Mutual Adaptation in Human-AI Teaming

NestRL: 一种用于人机团队中相互适应的嵌套训练机制

Upasana Biswas, Durgesh Kalwar, Subbarao Kambhampati, Sarath Sreedharan

发表机构 * School of Computing and AI, Arizona State University（计算与人工智能学院，亚利桑那州立大学）； Department of Computer Science, Colorado State University（计算机科学系，科罗拉多州立大学）

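AI summary: To address mutual adaptation in human-AI teaming, proposes NestRL, a nested training regime that trains agents at each level against adaptive agents from the level below, avoiding opaque coordination strategies and achieving higher task performance and adaptability in the Overcooked domain.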

AI中文摘要

相互适应是人机团队中的一个核心挑战，因为人类会自然地根据AI代理的行为调整自己的策略。现有方法试图通过多样化训练伙伴来近似人类行为；然而，这些伙伴通常是静态的，无法捕捉人类队友的适应性。当代理在标准多智能体设置中联合训练时，它们常常收敛到不透明的协调策略，这些策略仅适用于共同训练的伙伴，导致泛化能力差。为了建模自适应的人类行为，我们将人机团队问题形式化为交互式部分可观测马尔可夫决策过程（I-POMDP）。我们提出NestRL，一种嵌套训练机制，通过在每个层级上训练代理对抗来自下一层级的自适应代理，来学习有限层级I-POMDP的解。这使代理暴露于自适应行为，同时防止出现不透明的协调策略。我们提供了理论分析，表明NestRL代理避免了收敛到特定伙伴的策略，并在Overcooked领域通过与最先进的基线进行实证验证。NestRL在与未见过的自适应代理和真实人类队友合作时均实现了更高的任务性能，同时在交互过程中表现出显著更强的适应性。

英文摘要

Mutual adaptation is a central challenge in human-AI teaming, as humans naturally adjust their strategies in response to an AI agent's behavior. Existing approaches attempt to approximate human behavior by diversifying training partners; however, these partners are typically static and fail to capture the adaptive nature of human teammates. When agents are trained jointly in standard multi-agent settings, they often converge to opaque coordination strategies that work only with their co-trained partners, leading to poor generalization. To model adaptive human behavior, we formulate human-AI teaming as an Interactive Partially Observable Markov Decision Process (I-POMDP). We propose NestRL, a nested training regime that learns the solution to a finite-level I-POMDP by training agents at each level against adaptive agents from the level below. This exposes agents to adaptive behavior while preventing emergence of opaque coordination strategies. We provide theoretical analysis showing that NestRL agents avoid convergence to partner-specific strategies, and validate this empirically in the Overcooked domain against state-of-the-art baselines. NestRL achieves higher task performance with both unseen adaptive agents and real human teammates, while exhibiting significantly greater adaptability over the course of interaction.

2602.17706 2026-06-02 cs.LG 版本更新

Parallel Complex Diffusion for Scalable Time Series Generation

并行复数扩散用于可扩展时间序列生成

Rongyao Cai, Yuxi Wan, Kexin Zhang, Ming Jin, Zhiqiang Ge, Qingsong Wen, Yong Liu

发表机构 * Institute of Cyber-Systems and Control（网络系统与控制研究所）； Zhejiang University（浙江大学）； Griffith University（格里菲斯大学）； School of Mathematics（数学学院）； Southeast University（东南大学）； Squirrel Ai Learning

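AI summary: Proposes PaCoDi (Parallel Complex Diffusion), which uses the discrete Fourier transform to decompose time series into the spectral domain and replaces the complex-valued estimator with parallel real-valued estimators to address the entanglement problem in time series generation; it proves the statistical orthogonality of spectral Gaussian noise, introduces a Mean Field Theory approximation to handle marginal coupling, and outperforms 5 baselines on unconditional and conditional generation tasks.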

Comments Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD '26). Extended Version with Full Proofs

DOI: 10.1145/3770855.3817791

AI中文摘要

扩散模型通过去噪间接学习数据分布，使得生成建模的难度与数据的依赖结构密切相关。对于时间序列，强时间依赖性迫使噪声/分数估计器恢复高度纠缠的跨时间关系，导致纠缠问题。我们通过改变扩散空间的拓扑结构来减轻这一负担：离散傅里叶变换（DFT）将时间依赖分解为谱模式，对角化二阶依赖结构，使数据流形与各向同性高斯噪声和均匀扩散动力学更好地对齐。然而，现有的频率感知扩散方法主要使用DFT设计时间DDPM/SDE框架下的估计器模块，而频率原生扩散路径面临复数动力学带来的数学障碍。我们提出PaCoDi（并行复数扩散），一种频率原生扩散框架，在谱域构建扩散路径，同时用实部和虚部的并行实值估计器替代复数估计器。理论上，我们证明了谱高斯噪声的统计正交性，建立了正交前向转移和条件反向分解，并通过谱维纳过程将离散PaCoDi扩展到连续时间谱SDE。我们进一步引入带有交互校正分支的平均场理论近似来处理边缘耦合，并利用厄米对称性减少50%的注意力FLOPs而无信息损失。在无条件和条件时间序列生成上的大量实验表明，在5个基准测试中，生成质量和计算效率分别优于5个最先进基线。代码可在https://github.com/RongyaoCai/PaCoDi获取。

英文摘要

Diffusion models learn data distributions indirectly through denoising, making the difficulty of generative modeling closely tied to the dependency structure of data. For time series, strong temporal dependence forces the noise/score estimator to recover highly entangled cross-time relationships, leading to the curse of entanglement. We mitigate this burden by changing the topology of the diffusion space: the Discrete Fourier Transform (DFT) decomposes temporal dependencies into spectral modes, diagonalizing second-order dependency structure and better aligning the data manifold with isotropic Gaussian noise and homogeneous diffusion dynamics. However, existing frequency-aware diffusion methods mainly use the DFT to design estimator blocks under temporal DDPM/SDE frameworks, while frequency-native diffusion paths face a mathematical barrier from complex-valued dynamics. We propose PaCoDi (Parallel Complex Diffusion), a frequency-native diffusion framework that constructs the diffusion path in the spectral domain while replacing the complex-valued estimator with parallel real-valued estimators for real and imaginary components. Theoretically, we prove the statistical orthogonality of spectral Gaussian noise, establish quadrature forward transitions and conditional reverse factorization, and extend discrete PaCoDi to continuous-time spectral SDEs through a Spectral Wiener Process. We further introduce a Mean Field Theory approximation with an Interactive Correction Branch to handle marginal coupling, and exploit Hermitian symmetry to reduce attention FLOPs by 50% without information loss. Extensive experiments on unconditional and conditional time series generation demonstrate superior generative quality and computational efficiency against 5 SOTA baselines on 5 benchmarks. Code is available at https://github.com/RongyaoCai/PaCoDi.
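The Hermitian symmetry mentioned in the abstract is a general property of the DFT of a real-valued signal, and it can be checked in a few lines. The sketch below is only an illustration with NumPy on a random series (the variable names and the length 16 are assumptions, and it is not the PaCoDi implementation): it shows that roughly half of the spectrum determines the rest, and that real and imaginary parts can be carried as two separate real arrays.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(16)          # a real-valued toy series of length 16

full = np.fft.fft(x)                 # all 16 complex coefficients
half = np.fft.rfft(x)                # only n // 2 + 1 = 9 coefficients

# Hermitian symmetry of a real signal: X[n - k] == conj(X[k]).
mirrored = np.conj(full[1:][::-1])   # conj(X[n-1]), ..., conj(X[1])
is_hermitian = np.allclose(full[1:], mirrored)

# Real and imaginary parts as two real-valued arrays, then back to time domain.
real_part, imag_part = half.real, half.imag
x_rec = np.fft.irfft(real_part + 1j * imag_part, n=len(x))
```

Because the non-redundant half reconstructs `x` exactly (up to floating-point error), a model that operates on `real_part` and `imag_part` loses no information relative to the full spectrum.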

2602.16794 2026-06-02 stat.ML cs.LG 版本更新

Beyond Procedure: Substantive Fairness in Conformal Prediction

超越程序：共形预测中的实质性公平

Pengqi Liu, Zijun Yu, Mouloud Belbahri, Arthur Charpentier, Masoud Asgharian, Jesse C. Cresswell

发表机构 * University of Montreal（蒙特利尔大学）

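AI summary: Using a theoretical decomposition and LLM-assisted evaluation, studies how label-clustered conformal prediction balances utility and substantive fairness, and finds that equalized set sizes, rather than coverage, are more strongly associated with improved fairness.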

Comments Camera-ready version. Accepted at ICML 2026

AI中文摘要

共形预测（CP）为机器学习模型提供了无分布的不确定性量化，但其在下游决策中与公平性的相互作用仍未充分探索。超越将CP视为独立操作（程序公平），我们分析整体决策流程以评估实质性公平——下游结果的公平性。理论上，我们推导出一个上界，将预测集大小差异分解为可解释的组成部分，阐明标签聚类CP如何帮助控制方法驱动的对不公平的贡献。为了促进可扩展的实证分析，我们引入了一个LLM在环评估器，它近似人类对跨多种模态的实质性公平的评估。我们的实验表明，标签聚类CP通常在效用和实质性公平之间提供了有利的平衡，同时根据我们的理论减少了集合大小差异。最后，我们实证表明，均衡的集合大小（而非覆盖度）与实质性公平的改善强相关，使从业者能够设计更公平的CP系统。我们的代码可在https://github.com/layer6ai-labs/llm-in-the-loop-conformal-fairness获取。

英文摘要

Conformal prediction (CP) offers distribution-free uncertainty quantification for machine learning models, yet its interplay with fairness in downstream decision-making remains underexplored. Moving beyond CP as a standalone operation (procedural fairness), we analyze the holistic decision-making pipeline to evaluate substantive fairness, that is, the equity of downstream outcomes. Theoretically, we derive an upper bound that decomposes prediction-set size disparity into interpretable components, clarifying how label-clustered CP helps control method-driven contributions to unfairness. To facilitate scalable empirical analysis, we introduce an LLM-in-the-loop evaluator that approximates human assessment of substantive fairness across diverse modalities. Our experiments show that label-clustered CP often provides a favorable balance between utility and substantive fairness, while reducing set-size disparities in line with our theory. Finally, we empirically show that equalized set sizes, rather than coverage, strongly correlate with improved substantive fairness, enabling practitioners to design fairer CP systems. Our code is available at https://github.com/layer6ai-labs/llm-in-the-loop-conformal-fairness.
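To make "prediction-set size disparity" concrete, here is a minimal sketch of plain split conformal prediction on synthetic data, followed by a comparison of mean set sizes across two groups. It is not the label-clustered variant studied in the paper; the data, the `groups` attribute, and the function names are all hypothetical, and `np.quantile(..., method="higher")` assumes NumPy 1.22 or later.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal quantile of the scores 1 - p(true label)."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, level, method="higher")

def prediction_sets(probs, qhat):
    """Boolean mask: label k is in the set when 1 - p_k <= qhat."""
    return (1.0 - probs) <= qhat

rng = np.random.default_rng(0)
n_cal, n_test, n_classes = 500, 200, 5
cal_probs = rng.dirichlet(np.ones(n_classes), size=n_cal)
cal_labels = np.array([rng.choice(n_classes, p=p) for p in cal_probs])
test_probs = rng.dirichlet(np.ones(n_classes), size=n_test)
groups = rng.integers(0, 2, size=n_test)   # hypothetical binary group attribute

qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.1)
sets = prediction_sets(test_probs, qhat)
sizes = sets.sum(axis=1)

# Set-size disparity: gap between the mean set sizes of the two groups.
disparity = abs(sizes[groups == 0].mean() - sizes[groups == 1].mean())
```

A disparity near zero means both groups receive prediction sets of similar size on average, which is the quantity the abstract reports as correlating with substantive fairness.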

2602.16745 2026-06-02 cs.LG cs.AI 版本更新

PETS: A Principled Framework Towards Optimal Trajectory Allocation for Efficient Test-Time Self-Consistency

PETS：一种面向高效测试时自一致性的最优轨迹分配原则性框架

Zhangyi Liu, Huaizhi Qu, Xiaowei Yin, He Sun, Yanjun Han, Tianlong Chen, Zhun Deng

发表机构 * Stanford University（斯坦福大学）； UNC at Chapel Hill（北卡罗来纳大学教堂山分校）； Yale University（耶鲁大学）； New York University（纽约大学）

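AI summary: Proposes the PETS framework, which formulates trajectory allocation as an optimization problem and introduces the self-consistency rate as a measure, achieving sample-efficient test-time self-consistency in both the offline setting (connected to crowdsourcing theory) and the online streaming setting while substantially reducing the sampling budget.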

AI中文摘要

测试时扩展可以通过聚合随机推理轨迹来提高模型性能。然而，在有限预算下实现样本高效的测试时自一致性仍然是一个开放的挑战。我们引入了PETS（原则性且高效的测试时自一致性），它通过一个优化框架启动了对轨迹分配的原则性研究。我们方法的核心是自一致性率，这是一个新定义的度量，即与无限预算多数投票的一致性。这一公式使样本高效的测试时分配在理论上具有坚实基础，并适合严格分析。我们研究了离线和在线两种设置。在离线模式下，所有问题事先已知，我们将轨迹分配与众包（一个经典且成熟的研究领域）联系起来，将推理轨迹建模为工人。这种视角使我们能够利用丰富的现有理论，获得理论保证和一种高效的基于多数投票的分配算法。在在线流式模式下，问题顺序到达且必须实时做出分配，我们提出了一种受离线框架启发的新方法。我们的方法根据问题难度调整预算，同时保持强大的理论保证和计算效率。实验表明，PETS始终优于均匀分配。在GPQA上，PETS在两种设置下均实现了完美的自一致性，同时相对于均匀分配将采样预算减少了高达75%（离线）和55%（在线）。代码可在https://github.com/ZDCSlab/PETS获取。

英文摘要

Test-time scaling can improve model performance by aggregating stochastic reasoning trajectories. However, achieving sample-efficient test-time self-consistency under a limited budget remains an open challenge. We introduce PETS (Principled and Efficient Test-Time Self-Consistency), which initiates a principled study of trajectory allocation through an optimization framework. Central to our approach is the self-consistency rate, a new measure defined as agreement with the infinite-budget majority vote. This formulation makes sample-efficient test-time allocation theoretically grounded and amenable to rigorous analysis. We study both offline and online settings. In the offline regime, where all questions are known in advance, we connect trajectory allocation to crowdsourcing, a classic and well-developed area, by modeling reasoning traces as workers. This perspective allows us to leverage rich existing theory, yielding theoretical guarantees and an efficient majority-voting-based allocation algorithm. In the online streaming regime, where questions arrive sequentially and allocations must be made on the fly, we propose a novel method inspired by the offline framework. Our approach adapts budgets to question difficulty while preserving strong theoretical guarantees and computational efficiency. Experiments show that PETS consistently outperforms uniform allocation. On GPQA, PETS achieves perfect self-consistency in both settings while reducing the sampling budget by up to 75% (offline) and 55% (online) relative to uniform allocation. Code is available at https://github.com/ZDCSlab/PETS.
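The self-consistency rate defined in the abstract can be illustrated with a toy computation. The sketch below assumes each question's answer distribution is known, so the infinite-budget majority vote is simply the most probable answer; in practice that distribution is unknown, and the function names and data here are hypothetical rather than taken from the PETS code.

```python
from collections import Counter

def majority_vote(answers):
    """Most frequent sampled answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]

def infinite_budget_answer(answer_probs):
    """With unlimited samples, the majority vote converges to the mode."""
    return max(answer_probs, key=answer_probs.get)

def self_consistency_rate(votes_per_question, probs_per_question):
    """Fraction of questions whose finite-budget vote matches the mode."""
    agree = [
        majority_vote(votes) == infinite_budget_answer(probs)
        for votes, probs in zip(votes_per_question, probs_per_question)
    ]
    return sum(agree) / len(agree)

# Question 1 is easy; question 2 is nearly a coin flip.
probs = [{"A": 0.9, "B": 0.1}, {"A": 0.45, "B": 0.55}]
votes = [["A", "A", "A"], ["A", "B", "A"]]   # three sampled answers each
rate = self_consistency_rate(votes, probs)   # 0.5
```

With a uniform budget of three samples, the hard question lands on the wrong side of the vote, which is the kind of case where shifting budget from easy to hard questions would help.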

2602.16220 2026-06-02 cs.LG 版本更新

SEMixer: Semantics Enhanced MLP-Mixer for Multiscale Mixing and Long-term Time Series Forecasting

SEMixer: 语义增强的MLP-Mixer用于多尺度混合和长期时间序列预测

Xu Zhang, Qitong Wang, Peng Wang, Wei Wang

发表机构 * Shanghai Key Laboratory of Data Science, College of Computer Science and Artificial Intelligence, Fudan University（上海市数据科学重点实验室，复旦大学计算机科学与人工智能学院）； Harvard University（哈佛大学）

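AI summary: Proposes SEMixer, which uses a Random Attention Mechanism and a Multiscale Progressive Mixing Chain to model multiscale temporal dependencies and bridge semantic gaps across scales, performing strongly on 10 public datasets and on real wireless network data.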

Comments This work is accepted by the proceedings of the ACM Web Conference 2026 (WWW 2026). The code is available at the link https://github.com/Meteor-Stars/SEMixer

AI中文摘要

建模多尺度模式对于长期时间序列预测（TSF）至关重要。然而，时间序列中的冗余和噪声，以及非相邻尺度之间的语义鸿沟，使得高效对齐和集成多尺度时间依赖具有挑战性。为此，我们提出了SEMixer，一种专为长期TSF设计的轻量级多尺度模型。SEMixer包含两个关键组件：随机注意力机制（RAM）和多尺度渐进混合链（MPMC）。RAM在训练期间捕获多样化的时间块交互，并通过推理时的dropout集成进行聚合，增强了块级语义，使MLP-Mixer能够更好地建模多尺度依赖。MPMC进一步以内存高效的方式堆叠RAM和MLP-Mixer，实现更有效的时间混合。它解决了跨尺度的语义鸿沟，促进了更好的多尺度建模和预测性能。我们不仅在10个公开数据集上验证了SEMixer的有效性，还在基于21GB真实无线网络数据的2025 CCF AIOps Challenge中取得了第三名。代码可在链接https://github.com/Meteor-Stars/SEMixer获取。

英文摘要

Modeling multiscale patterns is crucial for long-term time series forecasting (TSF). However, redundancy and noise in time series, together with semantic gaps between non-adjacent scales, make the efficient alignment and integration of multiscale temporal dependencies challenging. To address this, we propose SEMixer, a lightweight multiscale model designed for long-term TSF. SEMixer features two key components: a Random Attention Mechanism (RAM) and a Multiscale Progressive Mixing Chain (MPMC). RAM captures diverse time-patch interactions during training and aggregates them via dropout ensemble at inference, enhancing patch-level semantics and enabling MLP-Mixer to better model multiscale dependencies. MPMC further stacks RAM and MLP-Mixer in a memory-efficient manner, achieving more effective temporal mixing. It addresses semantic gaps across scales and facilitates better multiscale modeling and forecasting performance. We validate the effectiveness of SEMixer not only on 10 public datasets but also on the 2025 CCF AIOps Challenge, based on 21GB of real wireless network data, where SEMixer achieves third place. The code is available at https://github.com/Meteor-Stars/SEMixer.

2602.05139 2026-06-02 cs.LG 版本更新

Adaptive Exploration for Latent-State Bandits

潜在状态赌博机的自适应探索

Jikai Jin, Kenneth Hung, Sanath Kumar Krishnamurthy, Baoyi Shi, Congshan Zhang

发表机构 * The Institute for Computational and Mathematical Engineering（计算与数学工程研究所）； Stanford University（斯坦福大学）； Meta Platforms, Inc.（Meta平台公司）； Ads Online Experimentation（广告在线实验部）； Central Applied Science（应用科学中央研究所）

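AI summary: For bandit problems whose rewards depend on an unobserved Markov state, proposes an adaptive LinUCB-based algorithm that distinguishes states using two summaries, lagged action-reward pairs and probe fingerprints, and dynamically refreshes the fingerprints using residual, margin, and staleness tests; in synthetic stress tests it lowers dynamic regret relative to standard, adversarial, and non-stationary baselines.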

Comments 12 pages, 3 figures, 5 tables

2602.15259 2026-06-02 cs.CY cs.AI cs.LG 版本更新

Knowing Isn't Understanding: Re-grounding Generative Proactivity with Epistemic and Behavioral Insight

知道不等于理解：用认知与行为洞察重新奠定生成式主动性

Kirandeep Kaur, Xingda Lyu, Chirag Shah

发表机构 * University of California, Berkeley（加州大学伯克利分校）； University of Toronto（多伦多大学）； University of Waterloo（滑铁卢大学）

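AI summary: Addressing epistemic incompleteness, where users cannot articulate their needs, argues that generative proactivity must be grounded in both epistemic and behavioral constraints in order to design responsible proactive agents.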

Comments 43rd International Conference on Machine Learning (ICML 2026)

AI中文摘要

生成式AI代理将理解等同于解决显式查询，这一假设将交互限制在用户能够表达的范围内。当用户自身缺乏对缺失、风险或值得考虑之事的意识时，这一假设就会失效。在这种情况下，主动性不仅是效率提升，更是一种认知上的必要性。我们将这种状态称为认知不完整：即进步依赖于处理未知的未知以实现有效协作。现有的主动性方法仍然局限于预测性，从过去行为中推断并假定目标已经明确，从而未能有意义地支持用户。然而，揭示超出用户当前意识的可能性并非天然有益。不受约束的主动干预可能误导注意力、使用户不堪重负或引入伤害。因此，主动代理需要行为锚定：对代理何时、如何以及在何种程度上进行干预施加原则性约束。我们主张生成式主动性必须在认知和行为上双重锚定。借鉴无知哲学和主动行为研究，我们认为这些理论为设计能够负责任地参与并促进有意义协作的代理提供了关键指导。

英文摘要

Generative AI agents equate understanding with resolving explicit queries, an assumption that confines interaction to what users can articulate. This assumption breaks down when users themselves lack awareness of what is missing, risky, or worth considering. In such conditions, proactivity is not merely an efficiency enhancement, but an epistemic necessity. We refer to this condition as epistemic incompleteness: where progress depends on engaging with unknown unknowns for effective partnership. Existing approaches to proactivity remain narrowly anticipatory, extrapolating from past behavior and presuming that goals are already well defined, thereby failing to support users meaningfully. However, surfacing possibilities beyond a user's current awareness is not inherently beneficial. Unconstrained proactive interventions can misdirect attention, overwhelm users, or introduce harm. Proactive agents, therefore, require behavioral grounding: principled constraints on when, how, and to what extent an agent should intervene. We advance the position that generative proactivity must be grounded both epistemically and behaviorally. Drawing on the philosophy of ignorance and research on proactive behavior, we argue that these theories offer critical guidance for designing agents that can engage responsibly and foster meaningful partnerships.

2602.14849 2026-06-02 cs.LG cs.AI cs.DC cs.MA 版本更新

iML: 可执行、问题驱动且广泛探索的代码驱动自动机器学习

Dat Le, Duc-Cuong Le, Anh-Son Nguyen, Tuan-Dung Bui, Thu-Trang Nguyen, Son Nguyen, Hieu Dinh Vo

发表机构 * Faculty of Information Technology, VNU University of Engineering and Technology（信息科技学院，越南工程与技术大学）

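AI summary: Proposes iML, a multi-agent framework that delivers executable, problem-grounded, and broadly exploratory code-driven AutoML through task analysis, data profiling, structured blueprint generation, and modular code synthesis, significantly outperforming baselines on MLE-BENCH and iML-BENCH.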

AI中文摘要

自动机器学习（AutoML）改善了机器学习的可访问性，但现有技术通常在灵活性、透明度和执行可靠性方面仍然有限。代码驱动的AutoML通过合成用于预处理、模型训练和评估的可执行代码，提供了一个有前景的方向。然而，当前基于LLM的方法经常生成在文本上合理但在执行中脆弱、未充分基于实际数据集或局限于狭窄解决方案路径的代码。在本文中，我们介绍了iML，一个多智能体代码驱动AutoML框架，围绕三个需求设计：可执行性、问题驱动和对有效解决方案的广泛探索。iML首先分析任务并剖析数据，然后合成一个结构化的蓝图，指导跨多个实现轨道的模块化代码生成，包括传统机器学习、预训练适应和自定义神经架构。为了提高可靠性，iML在集成过程中强制执行接口检查、动态执行和迭代调试。我们在MLE-BENCH和新引入的iML-BENCH上评估iML，涵盖多样化的Kaggle风格任务。在MLE-BENCH上，iML达到了90%的有效提交率和45%的奖牌率，以及0.82的APS，将基于LLM的基线的平均标准化性能分数（APS）提高了52%-273%。在iML-BENCH上，它实现了最高的APS，并且即使在任务描述被大幅简化时也表现出稳健的性能。这些结果确立了iML作为代码驱动AutoML的可靠且有竞争力的框架。

英文摘要

Automated Machine Learning (AutoML) has improved access to machine learning, yet existing techniques often remain limited in flexibility, transparency, and execution reliability. Code-driven AutoML offers a promising direction by synthesizing executable code for preprocessing, model training, and evaluation. However, current LLM-based approaches frequently generate code that is plausible in text yet brittle in execution, insufficiently grounded in the actual dataset, or restricted to narrow solution paths. In this paper, we introduce iML, a multi-agent code-driven AutoML framework designed around three requirements: executability, problem grounding, and broad exploration of valid solutions. iML first analyzes the task and profiles the data, then synthesizes a structured blueprint that guides modular code generation across multiple implementation tracks, including traditional ML, pretrained adaptation, and custom neural architectures. To improve reliability, iML enforces interface checking, dynamic execution, and iterative debugging during integration. We evaluate iML on MLE-BENCH and the newly introduced iML-BENCH, covering diverse Kaggle-style tasks. On MLE-BENCH, iML attains a 90% valid submission rate, a 45% medal rate, and an average standardized performance score (APS) of 0.82, improving APS over the LLM-based baselines by 52%-273%. On iML-BENCH, it achieves the highest APS and demonstrates robust performance even when task descriptions are substantially stripped. These results establish iML as a reliable and competitive framework for code-driven AutoML.

2602.13906 2026-06-02 stat.ML cs.LG 版本更新

联合优化去偏点击率和优惠券营销提升：一个统一因果框架

Siyun Yang, Shixiao Yang, Jian Wang, Di Fan, Kehe Cai, Haoyan Fu, Jiaming Zhang, Wenjin Wu, Peng Jiang

发表机构 * Kuaishou Technology（快手科技）； Beijing Institute of Technology（北京理工大学）； Independent Researcher（独立研究者）

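AI summary: To correct the CTR prediction bias caused by marketing interventions such as coupons, proposes UniMVT, a Unified Multi-Valued Treatment Network that uses counterfactual inference to deliver debiased CTR prediction and uplift estimation simultaneously.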

AI中文摘要

在线广告中，优惠券等营销干预会引入显著的混杂偏差，影响点击率（CTR）预测。观察到的点击反映了用户内在偏好与干预带来的提升的混合。这导致传统模型对基础CTR校准不准确，从而扭曲下游排序和计费决策。此外，营销干预通常作为多值处理，具有不同幅度，给CTR预测增加了额外复杂性。为解决这些问题，我们提出了统一多值处理网络（UniMVT）。具体来说，UniMVT从处理敏感表示中解耦混杂因素，使得全空间反事实推断模块能够联合重建去偏的基础CTR和强度-响应曲线。为处理多值处理的复杂性，UniMVT采用辅助强度估计任务来捕获处理倾向，并设计一个单位提升目标来归一化干预效果。这确保了在连续优惠券价值谱上的可比较估计。UniMVT同时实现了用于准确系统校准的去偏CTR预测和用于激励分配的精确提升估计。在合成和工业数据集上的大量实验证明了UniMVT在预测准确性和校准方面的优越性。此外，真实世界的A/B测试证实，UniMVT通过更有效的优惠券分发显著改善了业务指标。

英文摘要

In online advertising, marketing interventions such as coupons introduce significant confounding bias into Click-Through Rate (CTR) prediction. Observed clicks reflect a mixture of users' intrinsic preferences and the uplift induced by these interventions. This causes conventional models to miscalibrate base CTRs, which distorts downstream ranking and billing decisions. Furthermore, marketing interventions often operate as multi-valued treatments with varying magnitudes, introducing additional complexity to CTR prediction. To address these issues, we propose the Unified Multi-Valued Treatment Network (UniMVT). Specifically, UniMVT disentangles confounding factors from treatment-sensitive representations, enabling a full-space counterfactual inference module to jointly reconstruct the debiased base CTR and intensity-response curves. To handle the complexity of multi-valued treatments, UniMVT employs an auxiliary intensity estimation task to capture treatment propensities and devises a unit uplift objective that normalizes the intervention effect. This ensures comparable estimation across the continuous coupon-value spectrum. UniMVT simultaneously achieves debiased CTR prediction for accurate system calibration and precise uplift estimation for incentive allocation. Extensive experiments on synthetic and industrial datasets demonstrate UniMVT's superiority in both predictive accuracy and calibration. Furthermore, real-world A/B tests confirm that UniMVT significantly improves business metrics through more effective coupon distribution.

2602.12080 2026-06-02 cs.LG 版本更新

PathCRF: Ball-Free Soccer Event Detection via Possession Path Inference from Player Trajectories

PathCRF: 通过球员轨迹的控球路径推断实现无球足球事件检测

Hyunsung Kim, Kunhee Lee, Sangwoo Seo, Sang-Ki Ko, Jinsung Yoon, Chanyoung Park

发表机构 * KAIST（韩国科学技术院）； Fitogether Inc.（Fitogether公司）； University of Seoul（首尔市立大学）

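AI summary: Proposes PathCRF, a framework that uses only player trajectory data, models the trajectories as a dynamic graph, and infers the possession path with a Conditional Random Field (CRF) to detect soccer events without ball tracking, reducing reliance on manual annotation and ball trajectory data.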

AI中文摘要

尽管人工智能取得了最新进展，足球比赛的事件数据收集仍然严重依赖劳动密集型的人工标注。虽然已有研究利用球员和球轨迹探索自动事件检测，但由于高昂的基础设施和运营成本，球轨迹追踪仍然难以大规模应用。因此，足球领域的全面数据收集主要局限于顶级赛事，限制了数据驱动分析在该领域的广泛应用。为了解决这一挑战，本文提出了PathCRF，一个仅使用球员追踪数据检测足球控球事件的框架。我们将球员轨迹建模为全连接动态图，并将事件检测形式化为在每个时间步选择恰好一条对应于当前控球状态的边。为了确保所得边序列的逻辑一致性，我们采用条件随机场（CRF），禁止连续边之间出现不可能的转换，其中发射分数和转移分数由社会-时间骨干架构产生的边嵌入动态计算。在推理过程中，通过维特比解码获得最可能的边序列，当所选边在相邻时间步之间发生变化时，检测到控球或传球等事件。实验表明，PathCRF生成准确、逻辑一致的控球路径，能够实现可靠的下游分析，同时大幅减少对人工事件标注的需求。源代码可在 https://github.com/hyunsungkim-ds/pathcrf.git 获取。

英文摘要

Despite recent advances in AI, event data collection in soccer still relies heavily on labor-intensive manual annotation. Although prior work has explored automatic event detection using player and ball trajectories, ball tracking also remains difficult to scale due to high infrastructural and operational costs. As a result, comprehensive data collection in soccer is largely confined to top-tier competitions, limiting the broader adoption of data-driven analysis in this domain. To address this challenge, this paper proposes PathCRF, a framework for detecting on-ball soccer events using only player tracking data. We model player trajectories as a fully connected dynamic graph and formulate event detection as the problem of selecting exactly one edge corresponding to the current possession state at each time step. To ensure logical consistency of the resulting edge sequence, we employ a Conditional Random Field (CRF) that forbids impossible transitions between consecutive edges, where emission and transition scores are dynamically computed from edge embeddings produced by a socio-temporal backbone architecture. During inference, the most probable edge sequence is obtained via Viterbi decoding, and events such as ball controls or passes are detected whenever the selected edge changes between adjacent time steps. Experiments show that PathCRF produces accurate, logically consistent possession paths, enabling reliable downstream analyses while substantially reducing the need for manual event annotation. The source code is available at https://github.com/hyunsungkim-ds/pathcrf.git.
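The decoding step described above, Viterbi search over a CRF whose transition scores rule out impossible moves, can be sketched generically. In the example below the states are abstract indices rather than graph edges, the scores are made-up numbers instead of learned edge embeddings, and `-inf` marks a forbidden transition; none of this is taken from the PathCRF code.

```python
import numpy as np

def viterbi(emissions, transitions):
    """Highest-scoring state path under additive (log-domain) scores.

    emissions:   (T, S) score of each state at each time step
    transitions: (S, S) score of moving from state i to state j;
                 -inf marks a forbidden transition
    """
    T, S = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transitions      # rows: previous, cols: next
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + emissions[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):                # follow back-pointers
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

emissions = np.array([[2.0, 0.0, 0.0],
                      [0.0, 0.0, 3.0],
                      [0.0, 0.0, 3.0]])
transitions = np.zeros((3, 3))
transitions[0, 2] = -np.inf                      # state 0 may never jump to 2

path = viterbi(emissions, transitions)           # [1, 2, 2]
# An "event" is flagged wherever the decoded state changes between steps.
change_points = [t for t in range(1, len(path)) if path[t] != path[t - 1]]
```

Without the constraint, the greedy-looking path `[0, 2, 2]` would score highest; the forbidden transition forces the decoder to start in state 1 instead, which is how a CRF keeps the decoded sequence logically consistent.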

2602.11852 2026-06-02 cs.AI cs.CL cs.LG 版本更新

Prototype Transformer: Towards Language Model Architectures Interpretable by Design

原型Transformer：迈向可解释设计的语言模型架构

Yordan Yordanov, Matteo Forasassi, Bayar Menzat, Ruizhi Wang, Chang Qi, Markus Kaltenberger, Amine M'Charrak, Tommaso Salvatori, Thomas Lukasiewicz

发表机构 * University of Cambridge（剑桥大学）； ETH Zurich（苏黎世联邦理工学院）

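AI summary: Proposes the Prototype Transformer (ProtoT), an autoregressive language model architecture that replaces quadratic-cost self-attention with a linear-cost prototype module; the prototypes automatically capture nameable concepts, improving interpretability and supporting behavior editing.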

Comments Accepted at ICML 2026. Equal contribution: Yordan Yordanov and Matteo Forasassi. 40 pages, 28 figures, 22 tables

WildCat: 理论与实践中近乎线性的注意力机制

Tobias Schröder, Lester Mackey

发表机构 * University of Cambridge（剑桥大学）

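AI summary: Proposes WildCat, which selects a weighted coreset with randomly pivoted Cholesky to approximate exact attention in near-linear time with super-polynomial error decay, and applies it to image generation, image classification, and language model KV cache compression.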

AI中文摘要

我们介绍了WildCat，一种高精度、低成本的神经网络注意力机制压缩方法。虽然注意力是现代网络架构的标配，但由于其资源需求随输入序列长度$n$呈二次方增长，部署成本极高。WildCat通过仅关注一个小的加权核心集来避免这些二次成本。关键的是，我们使用一种快速但谱精确的子采样算法——随机枢轴Cholesky——来选择核心集，并最优地加权元素以最小化重构误差。值得注意的是，在输入有界的情况下，WildCat以超多项式$O(n^{-\sqrt{\log(\log(n))}})$的误差衰减逼近精确注意力，同时运行在近线性$O(n^{1+o(1)})$时间内。相比之下，先前的实用近似要么缺乏误差保证，要么需要二次运行时间才能保证如此高的保真度。我们将这一进展与GPU优化的PyTorch实现以及一套基准实验相结合，展示了WildCat在图像生成、图像分类和语言模型KV缓存压缩方面的优势。

英文摘要

We introduce WildCat, a high-accuracy, low-cost approach to compressing the attention mechanism in neural networks. While attention is a staple of modern network architectures, it is also notoriously expensive to deploy due to resource requirements that scale quadratically with the input sequence length $n$. WildCat avoids these quadratic costs by only attending over a small weighted coreset. Crucially, we select the coreset using a fast but spectrally accurate subsampling algorithm, randomly pivoted Cholesky, and weight the elements optimally to minimise reconstruction error. Remarkably, given bounded inputs, WildCat approximates exact attention with super-polynomial $O(n^{-\sqrt{\log(\log(n))}})$ error decay while running in near-linear $O(n^{1+o(1)})$ time. In contrast, prior practical approximations either lack error guarantees or require quadratic runtime to guarantee such high fidelity. We couple this advance with a GPU-optimized PyTorch implementation and a suite of benchmark experiments demonstrating the benefits of WildCat for image generation, image classification, and language model KV cache compression.
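Randomly pivoted Cholesky, the subsampling routine named in the abstract, is a general low-rank approximation method for positive semidefinite matrices. The NumPy sketch below shows the basic loop on a small synthetic Gram matrix; it is a textbook-style illustration under that assumption, not WildCat's GPU implementation, and it omits the coreset weighting step entirely.

```python
import numpy as np

def rp_cholesky(A, k, rng):
    """Rank-k factor F with A ≈ F @ F.T for a PSD matrix A, plus the pivots."""
    n = A.shape[0]
    F = np.zeros((n, k))
    d = np.diag(A).astype(float).copy()      # diagonal of the residual matrix
    pivots = []
    for i in range(k):
        s = rng.choice(n, p=d / d.sum())     # sample pivot ∝ residual diagonal
        g = A[:, s] - F[:, :i] @ F[s, :i]    # residual column at the pivot
        F[:, i] = g / np.sqrt(g[s])
        d = np.clip(d - F[:, i] ** 2, 0.0, None)
        pivots.append(int(s))
    return F, pivots

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))
A = X @ X.T                                  # PSD Gram matrix of rank 3
F, pivots = rp_cholesky(A, k=3, rng=rng)
max_err = np.abs(F @ F.T - A).max()          # near zero once k reaches the rank
```

The selected `pivots` play the role of a coreset: only `k` columns of `A` are ever read, so the cost scales with `n * k` entries instead of `n * n`.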

2602.09651 2026-06-02 stat.ML cs.LG 版本更新

The Entropic Signature of Class Speciation in Diffusion Models

扩散模型中类别分化的熵特征

Florian Handke, Dejan Stančević, Felix Koulischer, Thomas Demeester, Luca Ambrogioni

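AI summary: Detects semantic transition windows in diffusion models by tracking the class-conditional entropy of a latent semantic variable, and validates the approach on Gaussian mixture models and practical models.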

Comments Accepted at International Conference on Machine Learning (ICML) 2026

广义光滑性下的镜像下降

Dingzhi Yu, Wei Jiang, Hongyi Tao, Yuanyu Wan, Lijun Zhang

发表机构 * State Key Laboratory of Novel Software Technology, Nanjing University（南京大学新型软件技术国家重点实验室）； School of Artificial Intelligence, Nanjing University（南京大学人工智能学院）； School of Computer Science and Engineering, Nanjing University of Science and Technology（南京理工大学计算机科学与工程学院）； School of Software Technology, Zhejiang University（浙江大学软件技术学院）； Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security（杭州高新技术区（滨江）区块链与数据安全研究院）

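AI summary: Proposes a new notion of $\ell_*$-smoothness that generalizes classical smoothness to general normed spaces, and proves that mirror-descent-type algorithms converge under this condition at rates matching those under classical smoothness.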

Comments ICML 2026

AI中文摘要

光滑性对于一阶优化达到快速收敛率至关重要。然而，现代机器学习中的许多优化问题涉及非光滑目标。最近的研究通过允许梯度的Lipschitz常数相对于梯度范数增长来放宽光滑性假设，这适应了实践中广泛的目标。尽管取得了进展，现有的光滑性推广仅限于具有 $\ell_2$ 范数的欧几里得几何，并且仅在欧几里得空间中的优化具有理论保证。在本文中，我们通过引入一个新的 $\ell_*$-光滑性概念来解决这一限制，该概念以一般范数及其对偶度量Hessian的范数，并建立了镜像下降类型算法的收敛性，与经典光滑性下的收敛率相匹配。值得注意的是，我们提出了一种广义的自有界性质，有助于通过控制次优性间隙来界定梯度，作为收敛分析的主要组成部分。在确定性优化之外，我们建立了随机镜像下降的尖锐收敛性，与经典光滑性下的最新结果相匹配。我们的理论还扩展到非凸和复合优化，这可能为镜像下降的实际应用（包括大语言模型的预训练和后训练）提供启示。

英文摘要

Smoothness is crucial for attaining fast rates in first-order optimization. However, many optimization problems in modern machine learning involve non-smooth objectives. Recent studies relax the smoothness assumption by allowing the Lipschitz constant of the gradient to grow with respect to the gradient norm, which accommodates a broad range of objectives in practice. Despite this progress, existing generalizations of smoothness are restricted to Euclidean geometry with the $\ell_2$-norm and only have theoretical guarantees for optimization in the Euclidean space. In this paper, we address this limitation by introducing a new $\ell_*$-smoothness concept that measures the norm of Hessians in terms of a general norm and its dual, and establish convergence for mirror-descent-type algorithms, matching the rates under the classic smoothness. Notably, we propose a generalized self-bounding property that facilitates bounding the gradients via controlling suboptimality gaps, serving as a principal component for convergence analysis. Beyond deterministic optimization, we establish sharp convergence for stochastic mirror descent, matching the state of the art under classic smoothness. Our theory also extends to non-convex and composite optimization, which may shed light on practical usages of mirror descent, including pre-training and post-training of LLMs.
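For readers unfamiliar with mirror descent in a non-Euclidean geometry, a standard instance is the entropic variant on the probability simplex, also known as exponentiated gradient. The sketch below applies it to a simple, classically smooth quadratic chosen for illustration; the objective, step size, and names are assumptions and do not reproduce the generalized-smoothness setting analyzed in the paper.

```python
import numpy as np

def entropic_mirror_descent(grad, x0, step, n_steps):
    """Mirror descent on the simplex with the negative-entropy mirror map."""
    x = x0.copy()
    for _ in range(n_steps):
        x = x * np.exp(-step * grad(x))   # gradient step taken in the dual space
        x /= x.sum()                      # Bregman (KL) projection onto the simplex
    return x

c = np.array([0.2, 0.3, 0.5])             # minimizer, chosen inside the simplex

def grad(x):
    """Gradient of f(x) = 0.5 * ||x - c||^2."""
    return x - c

x0 = np.full(3, 1.0 / 3.0)                # start from the uniform distribution
x = entropic_mirror_descent(grad, x0, step=1.0, n_steps=500)
```

The multiplicative update keeps every iterate strictly positive and normalized, so no explicit Euclidean projection is needed; this is the practical appeal of matching the mirror map to the constraint geometry.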

2602.08868 2026-06-02 cs.LG cs.AI 版本更新

AnomSeer: Reinforcing Multimodal LLMs to Reason for Time-Series Anomaly Detection

AnomSeer: 增强多模态大语言模型进行时间序列异常检测的推理能力

Junru Zhang, Lang Feng, Haoran Shi, Xu Guo, Han Yu, Yabo Dong, Duanqing Xu

AI总结提出AnomSeer，通过专家思维链和基于最优传输的时间序列接地策略优化，增强多模态大语言模型在时间序列异常检测中的细粒度推理能力，统一异常分类、定位和解释。

Comments ICML 2026

详情

AI中文摘要

基于多模态大语言模型（MLLM）的时间序列异常检测（TSAD）是一个新兴领域，但一个持续存在的挑战是：MLLM依赖于粗略的时间序列启发式方法，但在多维、详细的推理方面存在困难，而这对于理解复杂的时间序列数据至关重要。我们提出AnomSeer来解决这个问题，通过增强模型将其推理基于时间序列的精确结构细节，统一异常分类、定位和解释。其核心是生成专家思维链轨迹，从经典分析（如统计度量、频率变换）中提供可验证的细粒度推理。在此基础上，我们提出了一种新颖的时间序列接地策略优化（TimerPO），它在标准强化学习之外引入了两个额外组件：基于最优传输的时间序列接地优势，以及确保这种辅助细粒度信号不干扰主要检测目标的正交投影。在各种异常场景中，使用Qwen2.5-VL-3B/7B-Instruct的AnomSeer在分类和定位准确性上优于更大的商业基线（如GPT-4o），特别是在点和频率驱动的异常上。此外，它产生了合理的时间序列推理轨迹，支持其结论。

英文摘要

Time-series anomaly detection (TSAD) with multimodal large language models (MLLMs) is an emerging area, yet a persistent challenge remains: MLLMs rely on coarse time-series heuristics but struggle with multi-dimensional, detailed reasoning, which is vital for understanding complex time-series data. We present AnomSeer to address this by reinforcing the model to ground its reasoning in precise, structural details of time series, unifying anomaly classification, localization, and explanation. At its core, an expert chain-of-thought trace is generated to provide a verifiable, fine-grained reasoning from classical analyses (e.g., statistical measures, frequency transforms). Building on this, we propose a novel time-series grounded policy optimization (TimerPO) that incorporates two additional components beyond standard reinforcement learning: a time-series grounded advantage based on optimal transport and an orthogonal projection to ensure this auxiliary granular signal does not interfere with the primary detection objective. Across diverse anomaly scenarios, AnomSeer, with Qwen2.5-VL-3B/7B-Instruct, outperforms larger commercial baselines (e.g., GPT-4o) in classification and localization accuracy, particularly on point- and frequency-driven exceptions. Moreover, it produces plausible time-series reasoning traces that support its conclusions.

2602.08689 2026-06-02 cs.LG 版本更新

Learning To Sample From Diffusion Models Via Inverse Reinforcement Learning

通过逆强化学习从扩散模型中学习采样

Constant Bourdrez, Alexandre Vérine, Olivier Cappé

发表机构 * DI ENS, Ecole normale supérieure, Université PSL, CNRS（巴黎高等师范学院）

AI总结提出一个逆强化学习框架，在不重新训练去噪器的情况下优化扩散模型的采样策略（噪声调度、引导尺度、随机性），通过策略梯度匹配目标行为，在ImageNet-64上以9倍更低成本和16%推理开销替代网格搜索。

Comments Preprint

详情

AI中文摘要

扩散模型通过由预训练神经网络引导的迭代去噪过程生成样本。一旦去噪器固定，采样算法本身（噪声调度、引导尺度、随机性分布）仍需要仔细调整，这一过程通常通过昂贵的经验网格搜索进行。在这项工作中，我们引入了一个逆强化学习框架，用于在不重新训练去噪器的情况下学习采样策略。我们将扩散采样过程建模为一个离散时间有限时域马尔可夫决策过程，其中动作对应于采样动力学的可选修改。为了优化动作调度，我们避免定义显式奖励函数，而是直接使用策略梯度技术匹配采样器预期的目标行为。我们提供的实验证据表明，该方法与微调后的采样器性能相当，并且与网格搜索相比成本适中：在ImageNet-64上，单次训练运行以高达9倍更低的成本取代了穷举搜索，推理时仅增加16%的开销。

英文摘要

Diffusion models generate samples through an iterative denoising process guided by a pretrained neural network. Once the denoiser is fixed, the sampling algorithm itself (noise schedules, guidance scales, stochasticity profiles) still requires careful tuning, a process typically carried out through costly empirical grid search. In this work, we introduce an inverse reinforcement learning framework for learning sampling strategies without retraining the denoiser. We formulate the diffusion sampling procedure as a discrete-time finite-horizon Markov Decision Process, where actions correspond to optional modifications of the sampling dynamics. To optimize action scheduling, we avoid defining an explicit reward function and instead directly match the target behavior expected from the sampler using policy gradient techniques. We provide experimental evidence that this approach matches fine-tuned samplers and comes at a modest cost compared to grid search: on ImageNet-64, a single training run replaces exhaustive search at up to 9x lower cost, with only 16% overhead at inference.

2602.08585 2026-06-02 cs.LG cs.AI 版本更新

Predicting Future Utility: Global Combinatorial Optimization for Task-Agnostic KV Cache Eviction

预测未来效用：任务无关的KV缓存驱逐的全局组合优化

Ziyao Tang, Pengkun Jiao, Xinhang Chen, Wei Liu, Shiyong Li, Jingjing Chen

发表机构 * Fudan University（复旦大学）； Baige AI Team, Baidu Inc.（百度AI团队）

AI总结提出LU-KV框架，通过全局组合优化分配注意力头预算以最大化长期边际贡献，实现80%的KV缓存压缩且性能损失极小。

详情

AI中文摘要

鉴于注意力的二次复杂度，KV缓存驱逐对于加速模型推理至关重要。当前的KV缓存驱逐方法通常依赖于瞬时启发式度量，隐含地假设分数幅度是所有注意力头的重要性一致代理。然而，这忽略了注意力头之间预测保真度的异质性。虽然某些头优先考虑令牌的瞬时贡献，但其他头致力于捕捉长期效用。在本文中，我们提出最优预算分配应由保留长期语义信息的边际效用决定。基于这一见解，我们提出了LU-KV，这是一个新颖的框架，将头级预算分配表述为全局组合优化问题，以最大化保留令牌的长期边际贡献。为了解决这个非凸问题，我们采用凸包松弛和基于边际效用的贪婪求解器，实现接近最优的解。此外，我们实现了一个数据驱动的离线分析协议，以促进LU-KV的实际部署。在LongBench和RULER基准上的评估表明，LU-KV将KV缓存大小减少了80%，性能下降最小，同时降低了推理延迟和GPU内存占用。

英文摘要

Given the quadratic complexity of attention, KV cache eviction is vital to accelerate model inference. Current KV cache eviction methods typically rely on instantaneous heuristic metrics, implicitly assuming that score magnitudes are consistent proxies for importance across all heads. However, this overlooks the heterogeneity in predictive fidelity across attention heads. While certain heads prioritize the instantaneous contribution of tokens, others are dedicated to capturing long-horizon utility. In this paper, we propose that optimal budget allocation should be governed by the marginal utility in preserving long-term semantic information. Building on this insight, we propose LU-KV, a novel framework that formulates head-level budget allocation as a global combinatorial optimization problem to maximize the long-horizon marginal contribution of reserved tokens. To solve this non-convex problem, we employ a convex-hull relaxation and a marginal-utility-based greedy solver, achieving near-optimal solutions. Furthermore, we implement a data-driven offline profiling protocol to facilitate the practical deployment of LU-KV. Evaluations on LongBench and RULER benchmarks demonstrate that LU-KV reduces KV cache size by 80% with minimal performance degradation, while also decreasing inference latency and GPU memory footprint.
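
The marginal-utility greedy idea mentioned in the abstract can be illustrated with a small sketch. This is not LU-KV itself: the per-head utility curves, the function name `greedy_budget`, and the numbers are hypothetical, and the greedy rule is only guaranteed to be optimal when each curve is concave (diminishing returns), which is the role a convex-hull relaxation would play.

```python
import heapq

def greedy_budget(utility_curves, total_budget):
    """Allocate a token budget across heads, one token at a time, by largest marginal gain.

    utility_curves[h][k] is the (assumed, pre-profiled) utility of head h when it keeps k tokens.
    """
    alloc = [0] * len(utility_curves)
    # Max-heap via negated gains: the marginal gain of each head's next token.
    heap = [(-(u[1] - u[0]), h) for h, u in enumerate(utility_curves) if len(u) > 1]
    heapq.heapify(heap)
    for _ in range(total_budget):
        if not heap:
            break  # every head is already at its maximum size
        _, h = heapq.heappop(heap)
        alloc[h] += 1
        k = alloc[h]
        u = utility_curves[h]
        if k + 1 < len(u):
            heapq.heappush(heap, (-(u[k + 1] - u[k]), h))
    return alloc

# Hypothetical concave utility curves for two attention heads.
curves = [
    [0.0, 0.5, 0.8, 0.9, 0.95],  # head 0: saturates quickly
    [0.0, 0.4, 0.8, 1.2, 1.6],   # head 1: keeps benefiting from more tokens
]
allocation = greedy_budget(curves, total_budget=4)
```

In this toy case the head whose utility keeps growing receives most of the budget, which is the non-uniform allocation behavior the abstract argues for.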

2602.06065 2026-06-02 stat.ML cond-mat.dis-nn cs.CL cs.LG 版本更新

Deep networks learn to parse uniform-depth context-free languages from local statistics

深度网络从局部统计中学习解析均匀深度的上下文无关语言

Jack T. Parley, Francesco Cagnetta, Matthieu Wyart

AI总结通过引入可调类概率上下文无关文法并设计基于深度卷积网络的推理算法，揭示了语言结构从局部统计中涌现的机制，并验证了深度卷积和Transformer架构的预测。

Comments Accepted as regular paper at ICML 2026

详情

AI中文摘要

理解语言结构如何仅从句子中学习是认知科学和机器学习中的一个核心问题。大型语言模型（LLMs）内部表征的研究支持其在预测下一个词时解析文本的能力，同时独立于表面形式表示语义概念。然而，哪些数据统计使这些成就成为可能，以及需要多少数据，仍然在很大程度上未知。概率上下文无关文法（PCFGs）为研究这些问题提供了一个可处理的测试平台。然而，先前的工作要么侧重于训练网络使用的类解析算法的后验表征，要么侧重于具有固定语法（无需解析）的PCFGs的可学习性。在这里，我们（i）引入了一个可调的PCFGs类别，其中歧义程度和跨尺度的相关结构都可以被控制；（ii）提供了一种学习机制——一种受深度卷积网络结构启发的推理算法——将可学习性和样本复杂度与特定语言统计联系起来；（iii）在深度卷积和基于Transformer的架构上经验性地验证了我们的预测。总体而言，我们提出了一个统一框架，其中不同尺度的相关性消除了局部歧义，使数据的层次化表征得以涌现。

英文摘要

Understanding how the structure of language can be learned from sentences alone is a central question in both cognitive science and machine learning. Studies of the internal representations of Large Language Models (LLMs) support their ability to parse text when predicting the next word, while representing semantic notions independently of surface form. Yet, which data statistics make these feats possible, and how much data is required, remain largely unknown. Probabilistic context-free grammars (PCFGs) provide a tractable testbed for studying these questions. However, prior work has focused either on the post-hoc characterization of the parsing-like algorithms used by trained networks; or on the learnability of PCFGs with fixed syntax, where parsing is unnecessary. Here, we (i) introduce a tunable class of PCFGs in which both the degree of ambiguity and the correlation structure across scales can be controlled; (ii) provide a learning mechanism -- an inference algorithm inspired by the structure of deep convolutional networks -- that links learnability and sample complexity to specific language statistics; and (iii) validate our predictions empirically across deep convolutional and transformer-based architectures. Overall, we propose a unifying framework where correlations at different scales lift local ambiguities, enabling the emergence of hierarchical representations of the data.

2602.01460 2026-06-02 math.OC cs.LG 版本更新

Non-Uniform Noise-to-Signal Ratio in the REINFORCE Policy-Gradient Estimator

REINFORCE策略梯度估计器中的非均匀噪声信号比

Haoyu Han, Heng Yang

AI总结研究REINFORCE策略梯度估计器的噪声信号比（NSR），通过精确刻画线性与多项式系统中的NSR，发现NSR在策略参数空间中高度非均匀，且通常在策略接近最优时增大甚至爆炸，导致训练不稳定和策略崩溃。

详情

AI中文摘要

策略梯度方法在强化学习中被广泛使用，但随着学习的进行，训练常常变得不稳定或减慢。我们通过策略梯度估计器的噪声信号比（NSR）来研究这一现象，该比值定义为估计器方差（噪声）除以真实梯度的平方范数（信号）。我们的主要结果是，对于（i）具有高斯策略和线性状态反馈的有限时域线性系统，以及（ii）具有高斯策略和多项式反馈的有限时域多项式系统，REINFORCE估计器的NSR可以精确刻画——要么是闭式形式，要么通过数值矩评估算法——无需近似。对于一般的非线性动力学和表达性策略（包括神经策略），我们进一步推导了方差的一般上界。这些刻画使得能够直接检查NSR如何随策略参数变化，以及如何沿优化轨迹（例如SGD和Adam）演变。在一系列示例中，我们发现NSR景观高度非均匀，并且通常随着策略接近最优而增大；在某些情况下它会爆炸，从而触发训练不稳定和策略崩溃。

英文摘要

Policy-gradient methods are widely used in reinforcement learning, yet training often becomes unstable or slows down as learning progresses. We study this phenomenon through the noise-to-signal ratio (NSR) of a policy-gradient estimator, defined as the estimator variance (noise) normalized by the squared norm of the true gradient (signal). Our main result is that, for (i) finite-horizon linear systems with Gaussian policies and linear state-feedback, and (ii) finite-horizon polynomial systems with Gaussian policies and polynomial feedback, the NSR of the REINFORCE estimator can be characterized exactly, either in closed form or via numerical moment-evaluation algorithms, without approximation. For general nonlinear dynamics and expressive policies (including neural policies), we further derive a general upper bound on the variance. These characterizations enable a direct examination of how NSR varies across policy parameters and how it evolves along optimization trajectories (e.g., SGD and Adam). Across a range of examples, we find that the NSR landscape is highly non-uniform and typically increases as the policy approaches an optimum; in some regimes it blows up, which can trigger training instability and policy collapse.
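
The NSR definition above (estimator variance divided by the squared norm of the true gradient) can be checked numerically on a toy problem. The sketch below is an assumption-laden illustration, not the paper's setup: it uses a one-step problem with a scalar Gaussian policy $a \sim \mathcal{N}(\theta, \sigma^2)$ and reward $-(a - a^\star)^2$, for which the true gradient is $-2(\theta - a^\star)$; the function name `reinforce_nsr` and all constants are hypothetical.

```python
import random
import statistics

def reinforce_nsr(theta, target=0.0, sigma=1.0, n=20000, seed=0):
    """Monte Carlo estimate of the NSR of the single-sample REINFORCE estimator."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        a = rng.gauss(theta, sigma)
        reward = -(a - target) ** 2
        score = (a - theta) / sigma ** 2  # d/dtheta of log N(a; theta, sigma^2)
        samples.append(reward * score)    # single-sample REINFORCE gradient estimate
    true_grad = -2.0 * (theta - target)   # exact gradient of E[reward] in this toy problem
    return statistics.pvariance(samples) / true_grad ** 2

nsr_far = reinforce_nsr(theta=2.0)   # policy mean far from the optimum
nsr_near = reinforce_nsr(theta=0.1)  # policy mean close to the optimum
```

In this toy case the estimator variance stays bounded away from zero while the true gradient vanishes near the optimum, so the estimated NSR is much larger for `theta=0.1` than for `theta=2.0`, in line with the trend the abstract describes.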

2511.21140 2026-06-02 cs.LG cs.CL stat.AP stat.ML 版本更新

How to Correctly Report LLM-as-a-Judge Evaluations

如何正确报告LLM作为评估者的评估结果

Chungpa Lee, Thomas Zeng, Jongwon Jeong, Jy-yong Sohn, Kangwook Lee

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结针对LLM作为评估者时存在偏差的问题，提出一种插件式校正框架，实现无偏估计和统计原理的不确定性量化，并证明在分布偏移下仍保持无偏性。

详情

Journal ref: International Conference on Machine Learning (ICML) 2026

AI中文摘要

大型语言模型（LLMs）被广泛用作模型响应的可扩展评估者，以替代人工标注者。然而，LLM评估者的不完美灵敏度和特异性会导致朴素评估分数产生偏差。我们提出一个简单的插件式框架，可校正此偏差并实现统计原理的不确定性量化。我们的框架构建置信区间，该区间同时考虑来自测试数据集和人工标注校准数据集的不确定性。此外，它采用自适应策略分配校准样本以获得更紧的区间。重要的是，我们刻画了由真实评估分数和LLM评估者的灵敏度与特异性定义的参数区间，在这些区间内，基于LLM的评估比仅人工评估产生更可靠的估计。此外，我们证明，与现有方法相比，我们的框架在测试集和校准集之间存在分布偏移时仍保持无偏性。

英文摘要

Large language models (LLMs) are widely used as scalable evaluators of model responses in lieu of human annotators. However, imperfect sensitivity and specificity of the LLM judges induce bias in naive evaluation scores. We propose a simple plug-in framework that corrects this bias and enables statistically principled uncertainty quantification. Our framework constructs confidence intervals that account for uncertainty from both the test dataset and a human-labeled calibration dataset. Additionally, it uses an adaptive strategy to allocate calibration samples for tighter intervals. Importantly, we characterize parameter regimes defined by the true evaluation score and the LLM judge's sensitivity and specificity in which our LLM-based evaluation yields more reliable estimates than human-only evaluation. Moreover, we show that our framework remains unbiased under distribution shift between the test and calibration datasets, in contrast to existing approaches.
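
To make the bias concrete: if a judge has sensitivity $q_1$ and specificity $q_0$, the expected naive pass rate is $p = \theta q_1 + (1-\theta)(1-q_0)$, where $\theta$ is the true pass rate. The abstract does not spell out the paper's estimator, so the sketch below uses the classical inversion of this relation (often called the Rogan-Gladen correction) as an assumed stand-in; the function name and the numbers are hypothetical, and confidence intervals are omitted.

```python
def corrected_pass_rate(judge_pass_rate, sensitivity, specificity):
    """Invert p = theta*sens + (1-theta)*(1-spec) to recover theta from the naive judge rate."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("judge must be better than chance: sensitivity + specificity > 1")
    theta = (judge_pass_rate + specificity - 1.0) / denom
    return min(1.0, max(0.0, theta))  # clip to a valid proportion

# Hypothetical numbers: true pass rate 0.70, judge sensitivity 0.90, specificity 0.80.
true_rate, sens, spec = 0.70, 0.90, 0.80
naive = true_rate * sens + (1 - true_rate) * (1 - spec)  # what the judge reports in expectation
corrected = corrected_pass_rate(naive, sens, spec)
```

Here the naive score differs from the true rate even with a fairly accurate judge, and the inversion recovers it exactly because the sensitivity and specificity are assumed known. In practice they are estimated from a human-labeled calibration set, which is the source of the additional uncertainty the abstract refers to.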

2602.07356 2026-06-02 cs.LG 版本更新

Collaborative and Efficient Fine-tuning: Leveraging Task Similarity

协作高效微调：利用任务相似性

Gagik Magakyan, Amirhossein Reisizadeh, Chanwoo Park, Pablo A. Parrilo, Asuman Ozdaglar

发表机构 * Massachusetts Institute of Technology（麻省理工学院）； Stanford University（斯坦福大学）

AI总结提出CoLoRA方法，通过共享适配器和个性化适配器利用任务相似性进行协作微调，提升数据稀缺下的模型性能，并在理论和实验上验证其有效性。

详情

AI中文摘要

适应性被认为是基础模型的核心特征，使其能够有效适应未见过的下游任务。参数高效的微调方法，如著名的LoRA，使得使用标记的、高质量且通常稀缺的任务数据对大型基础模型进行高效适应成为可能。为了缓解基础模型微调中的数据稀缺问题，我们提出利用多个下游用户之间的任务相似性。直观上，具有相似任务的用户必须能够相互帮助，以增加有效的微调数据量。我们提出了协作低秩适应（CoLoRA），该方法利用任务相似性来协作且高效地微调个性化基础模型。CoLoRA的主要思想是训练一个共享适配器，捕捉所有任务之间的潜在任务相似性，以及针对用户特定任务定制的个性化适配器。我们在异质线性回归上对CoLoRA进行了理论研究，并提供了真实恢复的可证明保证。我们还进行了多个具有不同任务相似性的自然语言实验，进一步表明当与相似任务一起训练时，个体性能显著提升。

英文摘要

Adaptability has been regarded as a central feature of foundation models, enabling them to effectively acclimate to unseen downstream tasks. Parameter-efficient fine-tuning methods such as the celebrated LoRA facilitate efficient adaptation of large foundation models using labeled, high-quality and generally scarce task data. To mitigate data scarcity in fine-tuning of foundation models, we propose to leverage task similarity across multiple downstream users. Intuitively, users with similar tasks should be able to assist each other in boosting the effective fine-tuning data size. We propose Collaborative Low-Rank Adaptation, or CoLoRA, which exploits task similarity to collaboratively and efficiently fine-tune personalized foundation models. The main idea in CoLoRA is to train one shared adapter capturing underlying task similarities across all tasks, and personalized adapters tailored to user-specific tasks. We theoretically study CoLoRA on heterogeneous linear regression and provide provable guarantees for ground truth recovery. We also conduct several natural language experiments with varying task similarity, which further demonstrate that when trained together with similar tasks, individual performance is significantly boosted.

2602.06837 2026-06-02 cs.LG stat.ML 版本更新

Sharpness-Aware Hybrid Model Learning for Architecture-Agnostic Parameter Estimation

锐度感知的混合模型学习用于架构无关的参数估计

Naoya Takeishi

发表机构 * The University of Tokyo（东京大学）

AI总结提出一种基于锐度感知最小化的架构无关方法，通过损失平坦性实现混合模型中科学参数的准确估计。

详情

AI中文摘要

混合建模，即机器学习模型与科学数学模型的结合，能够实现灵活且鲁棒的数据驱动预测，并具有部分可解释性。然而，科学模型的未知参数不一定能被正确估计，因为机器学习模型的灵活性可能导致科学模型部分在预测中被有效忽略。我们可以通过应用正则化来避免这种情况，但这种正则化的公式通常依赖于模型架构和领域知识。在本文中，我们提出了一种架构无关的方法来学习混合模型，同时正确估计科学参数。其思想基于奥卡姆剃刀原则，利用损失最小值的平坦性来实现模型简洁性。我们采用锐度感知最小化的思想，并将其适应于混合建模设置。数值实验证明了基于SAM的混合模型学习在科学参数估计中的有效性。

英文摘要

Hybrid modeling, the combination of machine learning models and scientific mathematical models, enables flexible and robust data-driven prediction with partial interpretability. However, the unknown parameters of the scientific model cannot necessarily be estimated properly, since the flexibility of the machine learning model might make the scientific model part effectively ignored in prediction. We may avoid it by applying some regularization, but the formulation of such regularizers typically depends on model architectures and domain knowledge. In this paper, we propose an architecture-agnostic method to learn hybrid models while properly estimating the scientific parameters. The idea is to use the flatness of loss minima to achieve model simplicity, based upon the Occam's razor principle. We employ the idea of sharpness-aware minimization and adapt it to the hybrid modeling setting. Numerical experiments demonstrate the effectiveness of the SAM-based hybrid model learning for scientific parameter estimation.
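
For context, the generic sharpness-aware minimization (SAM) update that the abstract builds on can be sketched as follows: take an ascent step of radius `rho` along the normalized gradient, then descend using the gradient evaluated at that perturbed point. This is the standard SAM step only; how the paper adapts it to hybrid models is not described in the abstract, and the quadratic loss, `sam_step`, and the hyperparameters below are illustrative assumptions.

```python
import math

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One generic SAM update on a list of parameters."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12  # avoid division by zero
    # Ascent step: move to the (approximate) worst-case point within an L2 ball of radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Descent step: apply the gradient taken at the perturbed point to the original weights.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

# Hypothetical toy loss: (w0 - 1)^2 + 10 * (w1 + 2)^2, with its analytic gradient.
def grad(w):
    return [2.0 * (w[0] - 1.0), 20.0 * (w[1] + 2.0)]

w = [0.0, 0.0]
for _ in range(200):
    w = sam_step(w, grad, lr=0.04)
```

Because of the fixed-radius perturbation, the iterate does not settle exactly on the minimizer of this toy loss but hovers within roughly `rho` of it.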

2602.06448 2026-06-02 cs.LG cs.AI 版本更新

Principle-Evolvable Scientific Discovery via Uncertainty Minimization

通过不确定性最小化实现原理可演化的科学发现

Yingming Pu, Tao Lin, Hongyu Chen

发表机构 * Westlake University（西湖大学）； Zhejiang University（浙江大学）

AI总结提出PiEvo框架，将科学发现视为原理空间上的贝叶斯优化，通过信息导向假设选择与异常驱动增强机制，使智能体自主演化理论世界观，在四个基准上平均解质量达90.81%~93.15%，收敛速度提升83.3%。

详情

Journal ref: Proc. 43rd Intl. Conf. on Machine Learning (ICML 2026), PMLR 306

AI中文摘要

基于大型语言模型的科学智能体加速了科学发现，但由于固守初始先验，常常效率低下。现有方法主要在静态假设空间中操作，限制了新现象的发现，当基线理论失效时导致计算浪费。为解决此问题，我们提出将焦点从搜索假设转向演化底层科学原理。我们提出PiEvo，一个原理可演化框架，将科学发现视为在扩展原理空间上的贝叶斯优化。通过集成基于高斯过程的信息导向假设选择和异常驱动增强机制，PiEvo使智能体能够自主完善其理论世界观。在四个基准上的评估表明，PiEvo (1) 平均解质量高达90.81%~93.15%，比现有最优方法提升29.7%~31.1%；(2) 通过优化紧凑原理空间显著降低样本复杂度，收敛步骤加速83.3%；(3) 在不同科学领域和LLM骨干上保持稳健性能。代码公开于 https://github.com/amair-lab/PiEvo。

英文摘要

Large Language Model (LLM)-based scientific agents have accelerated scientific discovery, yet they often suffer from significant inefficiencies due to adherence to fixed initial priors. Existing approaches predominantly operate within a static hypothesis space, which restricts the discovery of novel phenomena, resulting in computational waste when baseline theories fail. To address this, we propose shifting the focus from searching hypotheses to evolving the underlying scientific principles. We present PiEvo, a principle-evolvable framework that treats scientific discovery as Bayesian optimization over an expanding principle space. By integrating Information-Directed Hypothesis Selection via Gaussian Process and an anomaly-driven augmentation mechanism, PiEvo enables agents to autonomously refine their theoretical worldview. Evaluation across four benchmarks demonstrates that PiEvo (1) achieves an average solution quality of up to 90.81%~93.15%, representing a 29.7%~31.1% improvement over the state-of-the-art, (2) attains an 83.3% speedup in convergence step via significantly reduced sample complexity by optimizing the compact principle space, and (3) maintains robust performance across diverse scientific domains and LLM backbones. Code is publicly available at https://github.com/amair-lab/PiEvo.

2502.16174 2026-06-02 cs.LG cs.AI cs.CL cs.CR 版本更新

Efficient LLM Moderation with Multi-Layer Latent Prototypes

基于多层潜在原型的高效LLM审核

Maciej Chrabąszcz, Filip Szatkowski, Bartosz Wójcik, Jan Dubiński, Tomasz Trzciński, Sebastian Cygert

发表机构 * University of Warsaw（华沙大学）

AI总结提出多层原型审核器（MLPM），利用多层中间表示的原型实现轻量、高效且可定制的输入审核，在多个基准上达到最优性能，并可与输出审核结合提升响应安全性。

详情

AI中文摘要

尽管现代LLM在后训练过程中与人类价值观对齐，但在部署时仍需稳健的审核以防止有害输出。现有方法存在性能与效率的权衡，且难以定制以满足用户特定需求。针对这一差距，我们引入了多层原型审核器（MLPM），一种轻量级且高度可定制的输入审核工具。我们提出利用多层中间表示的原型来提高审核质量，同时保持高效率。通过设计，我们的方法对生成流水线的开销可忽略不计，并可无缝应用于任何模型。MLPM在多种审核基准上实现了最先进的性能，并在不同大小的模型系列中表现出强大的可扩展性。此外，我们展示了它能平滑集成到端到端审核流水线中，并在与输出审核技术结合时进一步提高响应安全性。总体而言，我们的工作为安全、稳健且高效的LLM部署提供了一种实用且可适应的解决方案。

英文摘要

Although modern LLMs are aligned with human values during post-training, robust moderation remains essential to prevent harmful outputs at deployment time. Existing approaches suffer from performance-efficiency trade-offs and are difficult to customize to user-specific requirements. Motivated by this gap, we introduce Multi-Layer Prototype Moderator (MLPM), a lightweight and highly customizable input moderation tool. We propose leveraging prototypes of intermediate representations across multiple layers to improve moderation quality while maintaining high efficiency. By design, our method adds negligible overhead to the generation pipeline and can be seamlessly applied to any model. MLPM achieves state-of-the-art performance on diverse moderation benchmarks and demonstrates strong scalability across model families of various sizes. Moreover, we show that it integrates smoothly into end-to-end moderation pipelines and further improves response safety when combined with output moderation techniques. Overall, our work provides a practical and adaptable solution for safe, robust, and efficient LLM deployment.

2602.06136 2026-06-02 cs.LG cs.CV 版本更新

Tempora: Characterising the Time-Contingent Utility of Online Test-Time Adaptation

Tempora: 表征在线测试时适应的时间条件效用

Sudarshan Sreeram, Young D. Kwon, Cecilia Mascolo

发表机构 * University of Bristol（布里斯托大学）

AI总结提出Tempora框架，通过时间场景、评估协议和时间条件效用指标，系统评估测试时适应方法在延迟约束下的准确性-延迟权衡，揭示传统排名在时间压力下失效。

Comments Accepted to ICML 2026

详情

AI中文摘要

测试时适应（TTA）为在域偏移下性能下降的机器学习模型提供了一种引人注目的补救措施，仅使用未标记样本即可即时改进泛化能力。这种灵活性适合实际部署，但传统评估不切实际地假设无限处理时间，忽略了准确性-延迟权衡。随着机器学习越来越多地支撑延迟敏感和面向用户的应用，时间压力限制了可适应推理的可行性；到达太晚而无法采取行动的预测是徒劳的。我们引入了Tempora，一个在这种压力下评估TTA的框架。它由模拟部署约束的时间场景、实现测量的评估协议以及量化准确性-延迟权衡的时间条件效用指标组成。我们用三个这样的指标实例化该框架：（1）用于具有硬截止时间的异步流的离散效用，（2）用于价值随延迟衰减的交互式设置的连续效用，以及（3）用于预算受限部署的摊销效用。通过将Tempora应用于11种TTA方法，我们发现排名不稳定性在跨越不同数据集、模型和硬件平台的750多次时间评估中持续存在；即，传统排名不能预测时间压力下的排名。最高效用方法随偏移和时间压力而变化，没有明确的赢家。通过首次实现跨不同时间约束的系统评估，Tempora揭示了排名何时以及为何变化，为从业者提供了方法选择的视角，为研究人员提供了可部署适应的目标。代码：https://github.com/sudotensor/tempora。

英文摘要

Test-time adaptation (TTA) offers a compelling remedy for machine learning (ML) models that degrade under domain shifts, improving generalisation on-the-fly with only unlabelled samples. This flexibility suits real deployments, yet conventional evaluations unrealistically assume unbounded processing time, overlooking the accuracy-latency trade-off. As ML increasingly underpins latency-sensitive and user-facing use-cases, temporal pressure constrains the viability of adaptable inference; predictions arriving too late to act on are futile. We introduce Tempora, a framework for evaluating TTA under this pressure. It consists of temporal scenarios that model deployment constraints, evaluation protocols that operationalise measurement, and time-contingent utility metrics that quantify the accuracy-latency trade-off. We instantiate the framework with three such metrics: (1) discrete utility for asynchronous streams with hard deadlines, (2) continuous utility for interactive settings where value decays with latency, and (3) amortised utility for budget-constrained deployments. By applying Tempora to 11 TTA methods, we find that rank instability persists across 750+ temporal evaluations spanning diverse datasets, models, and hardware platforms; i.e., conventional rankings do not predict rankings under temporal pressure. The highest-utility method varies with the shift and temporal pressure, with no clear winner. By enabling systematic evaluation across diverse temporal constraints for the first time, Tempora reveals when and why rankings change, offering practitioners a lens for method selection and researchers a target for deployable adaptation. Code: https://github.com/sudotensor/tempora.

2602.06033 2026-06-02 cs.LG 版本更新

Can Vision Language Models Learn Intuitive Physics from Interaction?

视觉语言模型能否从交互中学习直观物理？

Luca M. Schulze Buschoff, Konstantinos Voudouris, Can Demircan, Eric Schulz

AI总结研究通过强化学习与环境交互训练视觉语言模型，发现交互学习能提升任务内性能，但无法产生可泛化的物理直觉。

Comments Updated accepted version for ICML'26

2602.05970 2026-06-02 cs.LG cs.AI math.DS stat.ML 版本更新

Inverse Depth Scaling From Most Layers Being Similar

大多数层相似时的逆深度缩放

Yizhou Liu, Sara Kangaslahti, Ziming Liu, Jeff Gore

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结通过分析大型语言模型和玩具残差网络，发现损失与深度成反比，归因于功能相似的层通过集成平均而非组合学习或平滑动力学离散化来减少误差，表明需要架构创新以鼓励深度组合使用。

Comments Camera-ready version, ICML 2026

2602.05951 2026-06-02 cs.CV cs.AI cs.LG 版本更新

Better Source, Better Flow: Learning Condition-Dependent Source Distribution for Flow Matching

更好的源，更好的流：学习条件依赖的源分布用于流匹配

Junwan Kim, Jiho Park, Seonghu Jeon, Seungryong Kim

发表机构 * New York University（纽约大学）； KAIST AI（韩国科学技术院人工智能实验室）

AI总结本文提出在流匹配框架中学习条件依赖的源分布，通过方差正则化和源-目标方向对齐，显著提升文本到图像生成的速度和质量。

Comments Project Page: https://junwankimm.github.io/CSFM

详情

AI中文摘要

流匹配最近已成为基于扩散的生成模型的有前途的替代方案，特别是在文本到图像生成方面。尽管它在允许任意源分布方面具有灵活性，但大多数现有方法依赖于标准高斯分布（这是从扩散模型继承的选择），并且很少在这种设置中将源分布本身视为优化目标。在这项工作中，我们表明源分布的原则性设计不仅是可行的，而且在现代文本到图像系统的规模上也是有益的。具体来说，我们提出在流匹配目标下学习条件依赖的源分布，以更好地利用丰富的条件信号。我们识别了将条件直接纳入源时出现的关键失败模式，包括分布坍缩和不稳定性，并表明适当的方差正则化以及源和目标之间的方向对齐对于稳定和有效的学习至关重要。我们进一步分析了目标表示空间的选择如何影响具有结构化源的流匹配，揭示了这种设计最有效的场景。在多个文本到图像基准上的大量实验表明了一致且稳健的改进，包括FID收敛速度提高多达3倍，突出了原则性源分布设计对条件流匹配的实际好处。

英文摘要

Flow matching has recently emerged as a promising alternative to diffusion-based generative models, particularly for text-to-image generation. Despite its flexibility in allowing arbitrary source distributions, most existing approaches rely on a standard Gaussian distribution, a choice inherited from diffusion models, and rarely consider the source distribution itself as an optimization target in such settings. In this work, we show that principled design of the source distribution is not only feasible but also beneficial at the scale of modern text-to-image systems. Specifically, we propose learning a condition-dependent source distribution under the flow matching objective that better exploits rich conditioning signals. We identify key failure modes that arise when directly incorporating conditioning into the source, including distributional collapse and instability, and show that appropriate variance regularization and directional alignment between source and target are critical for stable and effective learning. We further analyze how the choice of target representation space impacts flow matching with structured sources, revealing regimes in which such designs are most effective. Extensive experiments across multiple text-to-image benchmarks demonstrate consistent and robust improvements, including up to 3x faster convergence in FID, highlighting the practical benefits of a principled source distribution design for conditional flow matching.

2602.05395 2026-06-02 stat.ML cs.AI cs.LG 版本更新

Optimal Bayesian Stopping for Efficient Inference of Consistent LLM Answers

用于高效推断一致LLM答案的最优贝叶斯停止

Jingkai Huang, Will Ma, Zhengyuan Zhou

发表机构 * Stern School of Business, New York University, New York, USA（纽约大学 Stern 商学院）； Graduate School of Business, Columbia University, New York, USA（哥伦比亚大学商学院）

AI总结利用贝叶斯先验信息，通过L-聚合停止策略在达到足够一致性时提前停止采样，以最小化采样成本并高效识别最一致的LLM答案。

Comments Accepted to ICML 2026. Camera-ready version

详情

Journal ref: Proceedings of the 43rd International Conference on Machine Learning (ICML 2026)

AI中文摘要

一种提高LLM准确性的简单策略，特别是在数学和推理问题中，是采样多个响应并提交最一致达成的答案。在本文中，我们利用贝叶斯先验信息来节省采样成本，一旦达到足够的一致性就停止。尽管精确后验在计算上难以处理，我们进一步引入了一种高效的“L-聚合”停止策略，该策略仅跟踪L-1个最频繁的答案计数。理论上，我们证明L=3就足够了：这种粗略近似足以实现渐近最优性，并且严格优于无先验基线，同时具有快速的后验计算。实验上，该方法使用更少的样本识别出最一致（即众数）的LLM答案，并且可以在减少LLM调用次数（即节省LLM推理成本）高达50%的同时实现相似的答案准确性。

英文摘要

A simple strategy for improving LLM accuracy, especially in math and reasoning problems, is to sample multiple responses and submit the answer most consistently reached. In this paper we leverage Bayesian prior information to save on sampling costs, stopping once sufficient consistency is reached. Although the exact posterior is computationally intractable, we further introduce an efficient "L-aggregated" stopping policy that tracks only the L-1 most frequent answer counts. Theoretically, we prove that L=3 is all you need: this coarse approximation is sufficient to achieve asymptotic optimality, and strictly dominates prior-free baselines, while having a fast posterior computation. Empirically, this identifies the most consistent (i.e., mode) LLM answer using fewer samples, and can achieve similar answer accuracy while cutting the number of LLM calls (i.e., saving on LLM inference costs) by up to 50%.
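
The baseline idea of "sample, vote, and stop once the answer is consistent enough" can be sketched without any prior. The code below is not the paper's Bayesian "L-aggregated" policy; it is a simple prior-free rule, assumed here for illustration, that stops as soon as the leading answer can no longer be overtaken within a fixed sampling budget. The function name, the budget, and the canned answer stream are hypothetical.

```python
from collections import Counter

def sample_until_consistent(sample_fn, max_samples):
    """Draw answers until the current mode cannot lose within the remaining budget."""
    counts = Counter()
    for n in range(1, max_samples + 1):
        counts[sample_fn()] += 1
        top = counts.most_common(2)
        runner_up = top[1][1] if len(top) > 1 else 0
        lead = top[0][1] - runner_up
        remaining = max_samples - n
        # If the lead exceeds the remaining draws, no other answer can catch up.
        if lead > remaining:
            return top[0][0], n
    return counts.most_common(1)[0][0], max_samples

# Stand-in for LLM calls: a fixed stream of final answers.
answers = iter(["42", "42", "17", "42", "42", "42", "17", "42", "42", "42"])
answer, used = sample_until_consistent(lambda: next(answers), max_samples=10)
```

This rule returns the same mode as exhausting the budget while using fewer calls on easy questions. A prior-informed policy of the kind the abstract describes can stop earlier still, at the cost of a small probability of error.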

2511.16886 2026-06-02 cs.CL cs.AI cs.LG 版本更新

Latent Reasoning in TRMs is Secretly a Policy Improvement Operator

TRMs中的潜在推理实际上是策略改进算子

Arip Asadulaev, Rayan Banerjee, Fakhri Karray, Martin Takac

AI总结本文通过将潜在递归推理形式化为策略改进算法，解释了递归步骤何时有效提升性能，并提出结合强化学习和扩散方法的训练方案，在Tiny Recursive Model上实现18倍前向传递减少且保持性能。

详情

AI中文摘要

最近，具有潜在递归的小模型在复杂推理任务上取得了有希望的结果。这些结果通常由这样的理论解释：这种递归增加了网络的深度，使其能够紧凑地模拟更大模型的能力。然而，递归添加层的性能仍然落后于具有相同前馈深度的单次通过模型。这意味着在循环版本中，并非每个递归步骤都有效地贡献于深度。这提出了一个问题：潜在推理何时以及为何能提高性能，何时会导致无效计算？在我们的工作中，我们证明了潜在递归推理为这个问题提供了答案。我们展示了潜在递归推理可以形式化为策略改进算法。基于这些见解，我们提出使用强化学习和扩散方法的训练方案用于潜在推理模型。以Tiny Recursive Model作为测试平台，我们展示了通过我们的修改，可以避免无效计算步骤，并将前向传递总数减少18倍，同时保持性能。总的来说，我们展示了递归步骤的策略改进视角如何解释模型行为，并为进一步改进提供见解。

英文摘要

Recently, small models with latent recursion have obtained promising results on complex reasoning tasks. These results are typically explained by the theory that such recursion increases a network's depth, allowing it to compactly emulate the capacity of larger models. However, the performance of recursively added layers remains behind the capabilities of one-pass models with the same feed-forward depth. This means that in the looped version, not every recursive step effectively contributes to depth. This raises the question: when and why does latent reasoning improve performance, and when does it result in dead compute? In our work, we demonstrate that latent recursive reasoning provides an answer to this question. We show that latent recursive reasoning can be formalized as a policy improvement algorithm. Building on these insights, we propose to use training schemes from reinforcement learning and diffusion methods for latent reasoning models. Using the Tiny Recursive Model as our testbed, we show that with our modifications we can avoid dead compute steps and reduce the total number of forward passes by 18x while maintaining performance. Broadly speaking, we show how a policy improvement perspective on recursive steps can explain model behavior and provide insights for further improvements.

2602.04861 2026-06-02 cs.LG cond-mat.mtrl-sci cs.AI physics.chem-ph 版本更新

From Evaluation to Design: Using Potential Energy Surface Smoothness Metrics to Guide Machine Learning Interatomic Potential Architectures

从评估到设计：利用势能面平滑度指标指导机器学习原子间势架构

Ryan Liu, Eric Qu, Tobias Kreiman, Samuel M. Blau, Aditi S. Krishnapriyan

发表机构 * University of California, Berkeley（加州大学伯克利分校）； Stanford University（斯坦福大学）

AI总结提出键平滑度表征测试（BSCT）作为高效评估机器学习原子间势（MLIP）势能面平滑度的指标，并与分子动力学稳定性强相关，同时指导模型设计以减少伪影。

Comments Accepted at the International Conference on Machine Learning (ICML) 2026

详情

AI中文摘要

机器学习原子间势（MLIP）有时无法再现量子势能面（PES）的物理平滑性，导致下游模拟中出现标准能量和力回归评估无法捕捉的错误行为。现有评估方法（如微正则分子动力学（MD））计算成本高且主要探测近平衡态。为改进MLIP的评估指标，我们引入键平滑度表征测试（BSCT）。该高效基准通过受控键变形探测PES，检测近平衡和远离平衡态的非平滑性，包括不连续性、人工极小值和虚假力。我们证明BSCT与MD稳定性强相关，而成本仅为MD的一小部分。为展示BSCT如何指导迭代模型设计，我们利用无约束Transformer主干作为测试平台，说明如何通过改进（如新的可微$k$-最近邻算法和温度控制注意力）减少指标识别的伪影。通过基于BSCT系统优化模型设计，所得MLIP同时实现了低传统E/F回归误差、稳定的MD模拟和鲁棒的原子性质预测。我们的结果将BSCT确立为从业者评估MLIP实用性的验证指标，以及“循环内”模型设计代理，提醒MLIP开发者注意当前MLIP基准无法高效评估的物理挑战。BSCT数据集和评估可在https://github.com/ryanliu30/bsct.git获取。

英文摘要

Machine Learning Interatomic Potentials (MLIPs) sometimes fail to reproduce the physical smoothness of the quantum potential energy surface (PES), leading to erroneous behavior in downstream simulations that standard energy and force regression evaluations can miss. Existing evaluations, such as microcanonical molecular dynamics (MD), are computationally expensive and primarily probe near-equilibrium states. To improve evaluation metrics for MLIPs, we introduce the Bond Smoothness Characterization Test (BSCT). This efficient benchmark probes the PES via controlled bond deformations and detects non-smoothness, including discontinuities, artificial minima, and spurious forces, both near and far from equilibrium. We show that BSCT correlates strongly with MD stability while requiring a fraction of the cost of MD. To demonstrate how BSCT can guide iterative model design, we utilize an unconstrained Transformer backbone as a testbed, illustrating how refinements such as a new differentiable $k$-nearest neighbors algorithm and temperature-controlled attention reduce artifacts identified by our metric. By optimizing model design systematically based on BSCT, the resulting MLIP simultaneously achieves a low conventional E/F regression error, stable MD simulations, and robust atomistic property predictions. Our results establish BSCT as both a validation metric for practitioners to assess MLIP utility and as an "in-the-loop" model design proxy that alerts MLIP developers to physical challenges that cannot be efficiently evaluated by current MLIP benchmarks. The BSCT dataset and evaluation are available on https://github.com/ryanliu30/bsct.git

2501.18649 2026-06-02 cs.CL cs.AI cs.IR cs.LG 版本更新

Fake News Detection After LLM Laundering: Measurement and Explanation

LLM清洗后的假新闻检测：测量与解释

Rupak Kumar Das, Jonathan Dodge

发表机构 * College of IST Pennsylvania State University（宾夕法尼亚州立大学信息科学与技术学院）

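AI summary: This study measures how effectively detectors identify LLM-paraphrased fake news, finds that detectors struggle with it, and uses LIME explanations to identify sentiment shift as one cause of detection failures.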

DOI: 10.24251/HICSS.2026.339

AI中文摘要

凭借其先进的能力，大型语言模型（LLM）可以生成高度令人信服且上下文相关的假新闻，这可能有助于传播错误信息。尽管针对人类撰写文本的假新闻检测已有大量研究，但检测LLM生成的假新闻这一领域仍探索不足。本研究测量了检测器在识别LLM改写的假新闻方面的有效性，特别是确定在检测流程中添加改写步骤是有助于还是阻碍检测。本研究贡献如下：（1）检测器在检测LLM改写的假新闻时比检测人类撰写文本更困难；（2）我们发现了哪些模型在哪些任务（逃避检测、通过改写逃避检测以及为语义相似性进行改写）上表现出色；（3）通过LIME解释，我们发现了检测失败的一个可能原因：情感偏移；（4）我们发现了一个关于改写质量测量的令人担忧的趋势：尽管BERTSCORE很高，但样本仍表现出情感偏移；（5）我们提供了一对数据集，用改写输出和分数扩充了现有数据集。该数据集可在GitHub上获取。

英文摘要

With their advanced capabilities, Large Language Models (LLMs) can generate highly convincing and contextually relevant fake news, which can contribute to disseminating misinformation. Though there is much research on fake news detection for human-written text, the field of detecting LLM-generated fake news is still under-explored. This research measures the efficacy of detectors in identifying LLM-paraphrased fake news, in particular, determining whether adding a paraphrase step in the detection pipeline helps or impedes detection. This study makes five contributions: (1) detectors struggle more to detect LLM-paraphrased fake news than human-written text; (2) we find which models excel at which tasks (evading detection, paraphrasing to evade detection, and paraphrasing for semantic similarity); (3) via LIME explanations, we discover a possible reason for detection failures: sentiment shift; (4) we discover a worrisome trend for paraphrase quality measurement: samples that exhibit sentiment shift despite a high BERTSCORE; and (5) we provide a pair of datasets augmenting existing datasets with paraphrase outputs and scores. The dataset is available on GitHub.

2602.03685 2026-06-02 cs.LG cs.AI stat.ML 版本更新

Universal One-third Time Scaling in Learning Peaked Distributions

学习尖峰分布中的普适三分之一时间缩放

Yizhou Liu, Ziming Liu, Cengiz Pehlevan, Jeff Gore

发表机构 * University of California, Berkeley（加州大学伯克利分校）

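AI summary: Through theoretical analysis and experiments, this paper shows that when peaked distributions are learned with softmax and cross-entropy, the loss and gradients decay as power laws, producing a universal bottleneck in which the loss scales with time with exponent 1/3; this offers a mechanistic explanation for neural scaling phenomena.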

Comments Camera-ready version, ICML 2026

2602.03670 2026-06-02 cs.LG cs.AI cs.NE math.DS physics.class-ph 版本更新

Equilibrium Propagation for Non-Conservative Systems

非保守系统的平衡传播

Antonino Emanuele Scurria, Dimitri Vanden Abeele, Bortolo Matteo Mognetti, Serge Massar

发表机构 * University of Amsterdam（阿姆斯特丹大学）； Institute for Advanced Study（高级研究院）

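AI summary: Proposes a framework that extends Equilibrium Propagation to non-conservative systems (including feedforward networks) by adding, during the learning phase, a term proportional to the non-reciprocal part of the interactions so that the gradient of the cost function is computed exactly; numerical experiments show better performance and faster learning.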

Comments 23 pages

Consistency Deep Equilibrium Models

Junchao Lin, Zenan Ling, Jingwen Xu, Robert C. Qiu

Affiliations: School of Electronic Information and Communications, Huazhong University of Science and Technology; School of Science, Wuhan University of Technology

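AI summary: Proposes the Consistency Deep Equilibrium Model (C-DEQ), which treats DEQ iterative inference as evolution along an ODE trajectory and uses consistency distillation to train the model to map intermediate states directly to the fixed point. This enables few-step inference while preserving performance and supports multi-step evaluation to trade computation for performance; experiments report 2-20x accuracy improvements under the same few-step inference budget.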

AI中文摘要

深度均衡模型（DEQ）已成为深度学习中的一种强大范式，能够以恒定的内存使用量建模无限深度网络。然而，由于不动点求解器的迭代性质，DEQ会带来显著的推理延迟。在这项工作中，我们引入了一致性深度均衡模型（C-DEQ），这是一种利用一致性蒸馏来加速DEQ推理的新框架。我们将DEQ迭代推理过程视为沿固定ODE轨迹向均衡演化。沿着这条轨迹，我们训练C-DEQ将中间状态一致地直接映射到不动点，从而在保持教师DEQ性能的同时实现少步推理。同时，它支持多步评估，以灵活地权衡计算与性能提升。跨多个领域任务的广泛实验表明，在相同的少步推理预算下，C-DEQ相比隐式DEQ实现了2-20倍的精度提升。我们的代码可在https://github.com/landrarwolf/CDEQ获取。

英文摘要

Deep Equilibrium Models (DEQs) have emerged as a powerful paradigm in deep learning, offering the ability to model infinite-depth networks with constant memory usage. However, DEQs incur significant inference latency due to the iterative nature of fixed-point solvers. In this work, we introduce the Consistency Deep Equilibrium Model (C-DEQ), a novel framework that leverages consistency distillation to accelerate DEQ inference. We cast the DEQ iterative inference process as evolution along a fixed ODE trajectory toward the equilibrium. Along this trajectory, we train C-DEQs to consistently map intermediate states directly to the fixed point, enabling few-step inference while preserving the performance of the teacher DEQ. At the same time, it facilitates multi-step evaluation to flexibly trade computation for performance gains. Extensive experiments across various domain tasks demonstrate that C-DEQs achieve consistent 2-20$\times$ accuracy improvements over implicit DEQs under the same few-step inference budget. Our code is available at https://github.com/landrarwolf/CDEQ.
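
The inference latency mentioned in this abstract comes from iterating a layer until it stops changing. A minimal Python sketch of that fixed-point solve follows; it is not the C-DEQ method or its training procedure. The scalar `layer`, its weight `w`, the input value, and the tolerance are hypothetical and chosen so that the iteration is a contraction.

```python
import math

def layer(z, x, w=0.5):
    """Toy implicit layer f(z, x); |w| < 1 makes it a contraction in z."""
    return math.tanh(w * z + x)

def solve_fixed_point(x, z0=0.0, tol=1e-10, max_iter=1000):
    """Iterate z <- f(z, x) until the update falls below tol.

    Returns every visited state, i.e. the solver trajectory.
    """
    trajectory = [z0]
    z = z0
    for _ in range(max_iter):
        z_next = layer(z, x)
        trajectory.append(z_next)
        if abs(z_next - z) < tol:
            break
        z = z_next
    return trajectory

trajectory = solve_fixed_point(x=0.3)
z_star = trajectory[-1]                        # approximate equilibrium
errors = [abs(z - z_star) for z in trajectory]  # distance of each state to it
```

Each entry of `trajectory` is an intermediate state of the kind the abstract describes mapping directly to the fixed point; here the states are only inspected, not learned from.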

2602.03018 2026-06-02 cs.LG 版本更新

From Zero to Hero: Advancing Zero-Shot Foundation Models for Tabular Outlier Detection

从零到英雄：推进表格异常检测的零样本基础模型

Xueying Ding, Haomin Wen, Simon Klüttermann, Leman Akoglu

发表机构 * Xueying Ding（丁雪莹）； Haomin Wen（文浩明）； Simon Klüttermann（西蒙·克吕特曼）； Leman Akoglu（拉曼·阿科格卢）

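AI summary: Proposes OUTFORMER, which uses a mixture of synthetic priors and self-evolving curriculum training to perform zero-shot tabular outlier detection, achieving state-of-the-art performance on AdBench and on newly introduced benchmarks.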

Comments 41 Pages, ICML 2026

AI中文摘要

异常检测（OD）在实践中广泛应用；但由于缺乏标记异常，其在新任务上的有效部署受到阻碍，这使得算法和超参数选择异常困难。基础模型（FMs）已经改变了机器学习，OD也不例外：Shen等人（2025）引入了FoMo-0D，这是第一个用于OD的基础模型，在众多基线中取得了显著性能。本文介绍了OUTFORMER，它通过（1）混合合成先验和（2）自演化课程训练推进了FoMo-0D。OUTFORMER仅在合成标记数据集上预训练，并通过将其训练数据作为上下文输入来推断新任务的测试标签。推理速度快且零样本，仅需前向传播，无需标记异常。得益于上下文学习，它不需要额外工作——无需OD模型训练或定制模型选择——实现了真正的即插即用部署。OUTFORMER在著名的AdBench以及我们引入的两个包含超过1500个数据集的大规模新OD基准上取得了最先进的性能，同时保持了快速的推理速度。

英文摘要

Outlier detection (OD) is widely used in practice, but its effective deployment on new tasks is hindered by the lack of labeled outliers, which makes algorithm and hyperparameter selection notoriously hard. Foundation models (FMs) have transformed ML, and OD is no exception: Shen et al. (2025) introduced FoMo-0D, the first FM for OD, achieving remarkable performance against numerous baselines. This work introduces OUTFORMER, which advances FoMo-0D with (1) a mixture of synthetic priors and (2) self-evolving curriculum training. OUTFORMER is pretrained solely on synthetic labeled datasets and infers test labels of a new task by using its training data as in-context input. Inference is fast and zero-shot, requiring only a forward pass and no labeled outliers. Thanks to in-context learning, it requires zero additional work (no OD model training or bespoke model selection), enabling truly plug-and-play deployment. OUTFORMER achieves state-of-the-art performance on the prominent AdBench, as well as on two new large-scale OD benchmarks that we introduce, comprising over 1,500 datasets, while maintaining speedy inference.

2602.02886 2026-06-02 cs.LG cs.AI 版本更新

Mixture of Concept Bottleneck Experts

概念瓶颈专家混合模型

Francesco De Santis, Gabriele Ciravegna, Giovanni De Felice, Arianna Casanova, Francesco Giannini, Michelangelo Diligenti, Johannes Schneider, Danilo Giordano, Mateo Espinosa Zarlenga, Pietro Barbiero

发表机构 * University of Padua（帕多瓦大学）

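AI summary: Proposes Mixture of Concept Bottleneck Experts (M-CBE), which introduces multiple expert expressions with flexible functional forms to improve predictive accuracy and adaptability while retaining interpretability.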

AI中文摘要

概念瓶颈模型（CBM）通过将预测基于人类可理解的概念来促进可解释性。然而，现有的CBM通常将其任务预测器限制为单个表达式，其函数形式是预先设定的，这限制了预测精度和对不同用户需求的适应性。我们提出了概念瓶颈专家混合模型（M-CBE），这是一个沿两个维度推广现有CBM的框架：任务预测器用于将概念映射到任务的表达式数量（称为专家），以及每个表达式所采用的函数形式，从而揭示了该设计空间中一个未被充分探索的区域。我们通过实例化两个新颖的模型来研究这一区域：线性M-CBE，它学习一组有限的线性表达式；以及符号M-CBE，它利用符号回归从数据中发现专家函数，受限于用户指定的算子词汇表。实证评估表明，改变表达式的数量及其函数形式为导航精度-可解释性权衡提供了一个稳健的框架。

英文摘要

Concept Bottleneck Models (CBMs) promote interpretability by grounding predictions in human-understandable concepts. However, existing CBMs typically constrain their task predictor to a single expression whose functional form is set a priori, limiting both predictive accuracy and adaptability to diverse user needs. We propose Mixture of Concept Bottleneck Experts (M-CBE), a framework that generalizes existing CBMs along two dimensions: the number of expressions, referred to as experts, employed by the task predictor to map concepts to the task, and the functional form each expression takes, thus exposing an underexplored region of this design space. We investigate this region by instantiating two novel models: Linear M-CBE, which learns a finite set of linear expressions, and Symbolic M-CBE, which leverages symbolic regression to discover expert functions from data subject to user-specified operator vocabularies. Empirical evaluation demonstrates that varying the number of expressions and their functional form provides a robust framework for navigating the accuracy-interpretability trade-off.

2602.02557 2026-06-02 cs.LG cs.AI cs.SD 版本更新

The Alignment Curse: Modality Alignment Supercharges Audio Attacks via Text Transfer

对齐诅咒：模态对齐通过文本传输增强音频攻击

Yupeng Chen, Junchi Yu, Aoxi Liu, Baoyuan Wu, Philip Torr, Adel Bibi

发表机构 * University of Oxford（牛津大学）

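AI summary: Introduces and validates the "Alignment Curse", the principle that stronger text-audio modality alignment facilitates the transfer of text attacks to audio. Black-box experiments show that text-transferred audio attacks perform comparably to or better than native audio attacks, revealing a fundamental tension between capability and safety.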

Comments 23 pages, 5 figures

AI中文摘要

近期端到端训练的全能模型通过加强文本-音频模态对齐显著提升了音频能力。然而，这种对齐是否无意中促进了安全漏洞跨模态的转移仍未被充分探索。这一问题至关重要，因为基于文本的越狱攻击远比基于音频的攻击成熟；如果它们系统性转移，当前的音频安全评估可能低估源自文本模态的风险。在本文中，我们引入了“对齐诅咒”，这是一个经过形式化表征和实证验证的原理，表明更强的模态对齐使得攻击从文本到音频的转移更有效，揭示了能力与安全之间的根本矛盾。基于这一原理，我们在最新的全能模型（如Qwen2.5-Omni、Qwen3-Omni）上对三类攻击（文本攻击、文本转移的音频攻击和音频攻击）进行了全面的黑盒评估。我们发现，文本转移的音频攻击与基于音频的攻击表现相当，甚至更优，在仅音频访问下展现出明显优势。这表明基于文本的漏洞在塑造音频安全风险中扮演关键角色。最后，我们实证分析了不同攻击方法和模型下模态对齐与转移有效性之间的关系，观察到对“对齐诅咒”的一致支持：更紧密的模态对齐导致更有效的跨模态攻击转移。

英文摘要

Recent advances in end-to-end trained omni-models have substantially improved audio capabilities by strengthening text-audio modality alignment. However, whether such alignment inadvertently facilitates the transfer of safety vulnerabilities across modalities remains underexplored. This question is critical as text-based jailbreak attacks are considerably more mature than audio-based ones; if they transfer systematically, current audio safety evaluations may underestimate risks originating from the text modality. In this paper, we introduce the Alignment Curse, a formally characterized and empirically validated principle showing that stronger modality alignment enables more effective transfer of attacks from text to audio, revealing a fundamental tension between capability and safety. Motivated by this principle, we conduct a comprehensive black-box evaluation of three attack categories on recent omni-models (e.g., Qwen2.5-Omni, Qwen3-Omni): text attacks, text-transferred audio attacks, and audio attacks. We find that text-transferred audio attacks perform comparably to, and often better than, audio-based attacks, exhibiting a clear advantage under audio-only access. This suggests that text-based vulnerabilities play a pivotal role in shaping audio safety risks. Finally, we empirically analyze the relationship between modality alignment and transfer effectiveness across attack methods and models, observing consistent support for the Alignment Curse: tighter modality alignment leads to more effective cross-modality attack transfer.

2602.02547 2026-06-02 cs.LG cs.AI 版本更新

naPINN: Noise-Adaptive Physics-Informed Neural Networks for Recovering Physics from Corrupted Measurement

naPINN: 用于从损坏测量中恢复物理的噪声自适应物理信息神经网络

Hankyeol Kim, Pilsung Kang

Affiliations: Department of Industrial Engineering, Seoul National University

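AI summary: Proposes the Noise-Adaptive Physics-Informed Neural Network (naPINN), which embeds an energy-based model to learn the residual distribution and adaptively filter outliers, robustly recovering physical solutions from measurements corrupted by non-Gaussian noise and outliers.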

AI中文摘要

物理信息神经网络(PINNs)是解决逆问题和从观测数据中发现控制方程的有效方法。然而，在复杂测量噪声和严重离群点下，其性能显著下降。为解决此问题，我们提出了噪声自适应物理信息神经网络(naPINN)，该网络无需噪声分布先验知识，即可从损坏测量中鲁棒恢复物理解。naPINN在训练循环中嵌入一个基于能量的模型，以学习预测残差的潜在分布。利用学习到的能量景观，一个可训练的可靠性门自适应地过滤具有高能量的数据点，同时拒绝代价正则化防止丢弃有效数据导致的平凡解。我们在被非高斯噪声和不同比例离群点损坏的各种基准偏微分方程上展示了naPINN的有效性。结果表明，naPINN显著优于现有的鲁棒PINN基线，成功隔离离群点并在严重数据损坏下准确重建动力学。

英文摘要

Physics-Informed Neural Networks (PINNs) are effective methods for solving inverse problems and discovering governing equations from observational data. However, their performance degrades significantly under complex measurement noise and gross outliers. To address this issue, we propose the Noise-Adaptive Physics-Informed Neural Network (naPINN), which robustly recovers physical solutions from corrupted measurements without prior knowledge of the noise distribution. naPINN embeds an energy-based model into the training loop to learn the latent distribution of prediction residuals. Leveraging the learned energy landscape, a trainable reliability gate adaptively filters data points exhibiting high energy, while a rejection cost regularization prevents trivial solutions where valid data are discarded. We demonstrate the efficacy of naPINN on various benchmark partial differential equations corrupted by non-Gaussian noise and varying rates of outliers. The results show that naPINN significantly outperforms existing robust PINN baselines, successfully isolating outliers and accurately reconstructing the dynamics under severe data corruption.

2510.06048 2026-06-02 cs.LG 版本更新

Probabilistic Performance Guarantees for Multi-Task Reinforcement Learning

Yannik Schnitzer, Mathias Jackermeier, Alessandro Abate, David Parker

发表机构 * ETH Zurich（苏黎世联邦理工学院）

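AI summary: Proposes a new generalization bound that combines per-task lower confidence bounds from finitely many rollouts with task-level generalization, providing high-confidence performance guarantees on unseen tasks.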

2602.01962 2026-06-02 cs.LG cs.AI 版本更新

Zero-Shot Off-Policy Learning

零样本离策略学习

Arip Asadulaev, Maksim Bobrin, Salem Lahlou, Dmitry Dylov, Fakhri Karray, Martin Takac

Affiliations: Arip Asadulaev; Maksim Bobrin; Salem Lahlou; Dmitry Dylov; Fakhri Karray; Martin Takac

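AI summary: By establishing a theoretical connection between successor measures and stationary density ratios, this paper proposes a zero-shot off-policy learning algorithm that infers optimal importance sampling ratios on the fly and applies a stationary distribution correction, adapting to new tasks without additional training.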

AI中文摘要

离策略学习方法旨在直接从固定的先前交互数据集中推导出最优策略。这一目标面临重大挑战，主要源于固有的分布偏移和价值函数高估偏差。这些问题在零样本强化学习中尤为突出，其中在无奖励数据上训练的智能体必须在测试时适应新任务而无需额外训练。在这项工作中，我们通过发现后继度量与平稳密度比的理论联系，解决了零样本场景下的离策略问题。利用这一洞见，我们的算法能够推断最优重要性采样比率，有效地为任意任务实时执行带有最优策略的平稳分布修正。我们在SMPL人体模型上的运动跟踪任务、ExoRL上的连续控制任务以及长时域OGBench任务上对方法进行了基准测试。我们的技术无缝集成到前向-后向表示框架中，并在无需训练的情况下实现对新任务的快速适应。更广泛地说，这项工作架起了离策略学习和零样本适应之间的桥梁，为两个研究领域都带来了益处。

英文摘要

Off-policy learning methods seek to derive an optimal policy directly from a fixed dataset of prior interactions. This objective presents significant challenges, primarily due to the inherent distributional shift and value function overestimation bias. These issues become even more noticeable in zero-shot reinforcement learning, where an agent trained on reward-free data must adapt to new tasks at test time without additional training. In this work, we address the off-policy problem in a zero-shot setting by discovering a theoretical connection between successor measures and stationary density ratios. Using this insight, our algorithm can infer optimal importance sampling ratios, effectively performing a stationary distribution correction with an optimal policy for any task on the fly. We benchmark our method on motion tracking tasks with the SMPL Humanoid, continuous control tasks on ExoRL, and long-horizon OGBench tasks. Our technique integrates seamlessly into forward-backward representation frameworks and enables fast adaptation to new tasks in a training-free regime. More broadly, this work bridges off-policy learning and zero-shot adaptation, offering benefits to both research areas.

2602.01053 2026-06-02 cs.LG 版本更新

LRAgent: Efficient KV Cache Sharing for Multi-LoRA LLM Agents

LRAgent: 面向多LoRA LLM代理的高效KV缓存共享

Hyesung Jeon, Hyeongju Ha, Jae-Joon Kim

发表机构 * KAIST（韩国科学技术院）

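AI summary: In multi-LoRA agent systems, each agent independently stores a KV cache for the same long trajectory, which incurs substantial memory and compute overhead. LRAgent decomposes the cache into a shared base component and an adapter-dependent component, and uses a shared-A multi-LoRA architecture together with the Flash-LoRA-Attention kernel to share the cache efficiently, substantially reducing overhead while preserving accuracy.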

Comments 25 pages, 10 figures, 22 tables

Journal ref: ICML 2026 Poster

AI中文摘要

多LLM代理系统中的角色专业化通常通过多LoRA实现，其中代理共享预训练骨干网络，仅通过轻量级适配器区分。尽管共享基础模型权重，每个代理仍独立构建和存储相同长工具增强轨迹的KV缓存，导致大量内存和计算开销。现有的KV缓存共享方法大多忽略这种多LoRA设置。我们观察到，代理间的缓存差异主要由适配器输出主导，而共享预训练骨干网络的激活保持高度相似。基于此观察，我们提出LRAgent，一个面向多LoRA代理的KV缓存共享框架。它将缓存分解为两个组件：来自预训练权重的共享基座组件和来自LoRA权重的适配器依赖组件。LRAgent通过在代理间共享基座组件并以固有的低秩形式存储适配器组件来减少内存开销。它还通过共享A的多LoRA架构共享低秩缓存，从而减少计算开销，避免对已被其他代理处理过的上下文进行冗余计算。为了在运行时高效重建适配器贡献，我们引入Flash-LoRA-Attention，一个重新排序注意力计算以避免将低秩缓存实例化为全维度的内核。LRAgent实现了接近完全共享缓存的吞吐量和首令牌延迟，同时在代理问答基准测试中保持了接近非共享缓存基线的准确性。

英文摘要

Role specialization in multi-LLM agent systems is often realized via multi-LoRA, where agents share a pretrained backbone and differ only by lightweight adapters. Despite sharing base model weights, each agent independently builds and stores its own KV cache for the same long, tool-augmented trajectories, incurring substantial memory and compute overhead. Existing KV cache sharing methods largely overlook this multi-LoRA setting. We observe that cache differences across agents are dominated by adapter outputs, while activations from the shared pretrained backbone remain highly similar. Based on this observation, we propose LRAgent, a KV cache sharing framework for multi-LoRA agents. It decomposes the cache into two components: a shared base component derived from pretrained weights and an adapter-dependent component derived from LoRA weights. LRAgent reduces memory overhead by sharing the base component across agents and storing the adapter component in its inherent low-rank form. It also reduces computational overhead by sharing the low-rank cache, enabled by a shared-A multi-LoRA architecture. This avoids redundant computations for contexts that have already been processed by other agents. To efficiently reconstruct adapter contributions at runtime, we introduce Flash-LoRA-Attention, a kernel that reorders attention computation to avoid materializing the low-rank cache to full dimension. LRAgent achieves throughput and time-to-first-token latency close to fully shared caching, while preserving accuracy near the non-shared caching baseline across agentic question-answering benchmarks.
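
The cache decomposition described in this abstract rests on a simple algebraic identity, sketched below in plain Python. This is not LRAgent or the Flash-LoRA-Attention kernel. The agent names `planner` and `coder`, the matrix shapes, and the random integer weights are hypothetical; the sketch assumes a key projection of the form X(W + A B_i) with a down-projection A shared by all agents.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def rand_matrix(rows, cols, rng):
    """Small integer entries keep every comparison below exact."""
    return [[rng.randint(-3, 3) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
tokens, d_model, rank = 5, 4, 2
X = rand_matrix(tokens, d_model, rng)     # shared context activations
W_k = rand_matrix(d_model, d_model, rng)  # pretrained key projection
A = rand_matrix(d_model, rank, rng)       # LoRA down-projection, shared by all agents
B = {name: rand_matrix(rank, d_model, rng) for name in ("planner", "coder")}

base_cache = matmul(X, W_k)    # tokens x d_model, computed once for all agents
low_rank_cache = matmul(X, A)  # tokens x rank, also shared because A is shared

def full_keys(agent):
    """Materialize the per-agent key cache (what the sketch tries to avoid)."""
    return add(base_cache, matmul(low_rank_cache, B[agent]))

def scores_without_materializing(q, agent):
    """Attention logits q K^T computed from the two shared caches only."""
    base_scores = matmul(q, transpose(base_cache))
    q_low = matmul(q, transpose(B[agent]))  # project the query down to the rank
    adapter_scores = matmul(q_low, transpose(low_rank_cache))
    return add(base_scores, adapter_scores)

q = rand_matrix(1, d_model, rng)
```

Only `B[agent]` differs between agents, so in this toy setting the per-agent state is a rank-by-d_model matrix rather than a full tokens-by-d_model cache.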

2602.00415 2026-06-02 cs.AI cs.LG 版本更新

PolarMem: A Training-Free Polarized Latent Graph Memory for Verifiable Vision-Language Models

PolarMem: 一种无需训练的可验证视觉语言模型极化隐式图记忆

Zhisheng Chen, Tingyu Wu, Zijie Zhou, Zhengwei Xie, Jinhan Li, Ziyan Weng, Liang Lin, Jingwei Song, Zikai Xiao, Yingwei Zhang

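Affiliations: ICT, CAS (Institute of Computing Technology, Chinese Academy of Sciences); UCAS (University of Chinese Academy of Sciences); CUPB (China University of Petroleum, Beijing); USTC (University of Science and Technology of China); CityU-DG (City University of Hong Kong (Dongguan)); HKU (The University of Hong Kong); ZJU (Zhejiang University)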

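AI summary: Proposes PolarMem, a training-free polarized latent graph memory framework that converts VLM perceptual signals into HAS, NOT_HAS, and Uncertain memory states via semantic consistency verification and adaptive distributional partitioning, and uses a lexicographical logic-aware retrieval protocol that prioritizes logical consistency, improving retrieval-intensive task performance and reducing contradictions.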

AI中文摘要

记忆对于智能系统而言不仅是存储机制，更是组织证据和约束信念的结构。这对多模态推理尤为重要，因为检索到的证据必须既与查询相关又在视觉上一致。然而，当前视觉语言模型（VLM）的记忆系统大多保持正关联：它们检索相似或先前观察到的内容，但缺乏明确的方式记住已被验证为不存在或逻辑排除的内容。为此，我们提出\textbf{PolarMem}，一种无需训练的极化隐式图记忆框架，用于可验证的视觉语言推理。PolarMem通过语义一致性验证和自适应分布划分，将冻结的VLM感知信号转化为\textit{HAS}、\textit{NOT\_HAS}和\textit{Uncertain}记忆状态，并将其存储在具有明确正负记忆关系的极化图中。在推理时，词典逻辑感知检索协议在语义相似性之前强制执行逻辑一致性，在冲突记忆进入模型上下文之前将其抑制。在八个冻结的VLM骨干网络和六个多模态基准测试中，PolarMem一致地提升了检索密集型任务性能并减少了检索级矛盾。这些结果凸显了负记忆作为构建更可靠多模态记忆系统的关键机制。我们的代码可在https://github.com/czs-ict/PolarMem获取。

英文摘要

Memory is not merely a storage mechanism for intelligent systems, but a structure for organizing evidence and constraining belief. This is especially important for multimodal reasoning, where retrieved evidence must be both query-relevant and visually consistent. However, current memory systems for vision-language models (VLMs) remain largely positive-associative: they retrieve what is similar or previously observed, but lack an explicit way to remember what has been verified as absent or logically excluded. To this end, we propose \textbf{PolarMem}, a training-free polarized latent graph memory framework for verifiable vision-language reasoning. PolarMem transforms frozen VLM perceptual signals into \textit{HAS}, \textit{NOT\_HAS}, and \textit{Uncertain} memory states through semantic consistency verification and adaptive distributional partitioning, and stores them in a polarized graph with distinct positive and negative memory relations. During inference, a lexicographical logic-aware retrieval protocol enforces logical consistency before semantic similarity, suppressing conflicting memories before they enter the model context. Across eight frozen VLM backbones and six multimodal benchmarks, PolarMem consistently improves retrieval-intensive tasks and reduces retrieval-level contradictions. These results highlight negative memory as a key mechanism for building more reliable multimodal memory systems. Our code is available at https://github.com/czs-ict/PolarMem.

2601.22947 2026-06-02 cs.CL cs.LG 版本更新

Reconsidering Positional Supervision in Masked Diffusion Language Model Training

重新审视掩码扩散语言模型训练中的位置监督

Mengyu Ye, Keito Kudo, Ryosuke Takahashi, Jun Suzuki

Affiliations: Tohoku University; RIKEN; NII LLMC

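AI summary: To address the sensitivity of masked diffusion language models to positional shifts, proposes an objective based on connectionist temporal classification (CTC) that absorbs positional uncertainty by introducing a slack token and an updated collapse map, yielding consistent gains on open-ended generation benchmarks.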

Comments preprint, WIP

AI中文摘要

掩码扩散语言模型（MDLM）通过并行去掩码生成文本，最近成为自回归语言模型的替代方案。它们可以被视为使用位置交叉熵（CE）损失训练的并行解码器，与非自回归翻译（NAT）设置相同。在NAT中，CE训练的并行解码器被认为对小的位置偏移敏感，因为CE会严厉惩罚它们。我们询问CE训练的MDLM在迭代解码下是否同样对此类偏移敏感。为了探究这一点，我们应用了一种受控干预，在解码过程中引入这些偏移。在LLaDA-8B-Instruct和Arena-Hard上，仅将1%的生成令牌移动一个位置，就显著降低了相对于未干预模型的胜率，表明MDLM在迭代并行解码下对此类小偏移敏感。受此启发，我们将连接主义时序分类（CTC）（一种已知能缓解该问题的对齐灵活目标）适配到MDLM监督微调中。通过放宽CE施加的严格位置匹配，CTC为损失提供了吸收小位置偏移的空间；具体地，我们修改了CTC目标，使用一个特殊的<slack>令牌来吸收目标令牌与输出位置之间的位置不确定性，并更新了折叠映射以保留目标表面形式。在四个开放生成基准上，所得模型在原始模型和匹配的交叉熵训练基线上均有一致改进，且在所有四个基准上具有统计显著性。这些结果表明，训练侧的对齐灵活性是MDLM SFT的一个有用设计维度，与先前工作中探索的推理时方法互补。

英文摘要

Masked diffusion language models (MDLMs) generate text by unmasking tokens in parallel and have recently emerged as alternatives to autoregressive language models. They can be viewed as parallel decoders trained with a position-wise cross-entropy (CE) loss, the same setup as non-autoregressive translation (NAT). In NAT, CE-trained parallel decoders have been argued to be sensitive to small positional shifts, since CE penalizes them harshly. We ask whether CE-trained MDLMs are similarly sensitive to such shifts under iterative decoding. To probe this, we apply a controlled intervention that introduces such shifts during decoding. On LLaDA-8B-Instruct with Arena-Hard, displacing as few as 1% of generated tokens by one position substantially reduces win rates against the unintervened model, showing that MDLMs are sensitive to such small shifts under iterative parallel decoding. Motivated by this, we adapt connectionist temporal classification (CTC), an alignment-flexible objective known to mitigate this sensitivity in NAT, to MDLM supervised fine-tuning. By relaxing the strict position-wise match that CE imposes, CTC gives the loss room to absorb small positional shifts; concretely, we modify the CTC objective to use a special <slack> token that absorbs positional uncertainty between target tokens and output positions, and an updated collapse map that preserves target surface forms. Across four open-ended generation benchmarks, the resulting model consistently improves over both the original model and a matched cross-entropy-trained baseline, with statistically significant gains on all four. These results identify training-side alignment flexibility as a useful design dimension for MDLM SFT, complementary to the inference-time approaches explored in prior work.
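
Why a CTC-style objective tolerates positional shifts can be seen from the standard CTC collapse map, sketched below in Python. This is the textbook map (merge repeated symbols, then drop blanks), not the modified one from the paper: the abstract does not specify how the <slack> token or the updated collapse map work, so the `_` blank symbol and the example paths are assumptions.

```python
BLANK = "_"

def ctc_collapse(path, blank=BLANK):
    """Standard CTC collapse: merge consecutive repeats, then remove blanks."""
    output = []
    previous = None
    for symbol in path:
        # Emit a symbol only when it starts a new run and is not a blank.
        if symbol != previous and symbol != blank:
            output.append(symbol)
        previous = symbol
    return output

# Three output paths that place the same tokens at different positions.
shifted_paths = [
    ["_", "a", "b"],
    ["a", "_", "b"],
    ["a", "b", "_"],
]
collapsed = [ctc_collapse(path) for path in shifted_paths]
```

All three paths collapse to the same target, so a loss that sums over such paths does not penalize the one-position shifts that a position-wise cross-entropy would.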

2601.22813 2026-06-02 cs.LG 版本更新

SurrogateSHAP: Training-Free Contributor Attribution for Text-to-Image (T2I) Models

Mingyu Lu, Soham Gadgil, Chris Lin, Chanwoo Kim, Su-In Lee

发表机构 * University of California, Berkeley（加州大学伯克利分校）

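AI summary: To address the high computational cost of fairly valuing data contributors for text-to-image diffusion models, proposes SurrogateSHAP, a retraining-free framework based on inference from a pretrained model. It approximates the utility function with a gradient-boosted tree and computes Shapley values analytically, outperforming existing methods on multiple tasks at lower overhead.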

AI中文摘要

随着文本到图像（T2I）扩散模型在现实创意工作流中的广泛应用，一个用于评估提供数据集合的贡献者的原则性框架对于公平补偿和可持续数据市场至关重要。虽然Shapley值提供了理论上有依据的归因方法，但它面临双重计算瓶颈：（i）对每个采样的玩家（即数据贡献者）子集进行穷举模型重训练的高昂成本，以及（ii）由于贡献者交互，估计边际贡献所需的子集组合数量巨大。为此，我们提出了SurrogateSHAP，一个无需重训练的框架，通过从预训练模型进行推理来近似昂贵的重训练博弈。为了进一步提高效率，我们采用梯度提升树来近似效用函数，并基于树模型解析地推导Shapley值。我们在三个不同的归因任务上评估了SurrogateSHAP：（i）CIFAR-20上DDPM-CFG的图像质量，（ii）后印象派艺术品上Stable Diffusion的美学质量，以及（iii）时尚产品数据上FLUX.1的产品多样性。在各种设置下，SurrogateSHAP在显著降低计算开销的同时优于先前方法，一致地在多个效用指标上识别出有影响力的贡献者。最后，我们证明了SurrogateSHAP能够有效定位导致临床图像中虚假相关的数据源，为审计安全关键型生成模型提供了一条可扩展的路径。

英文摘要

As Text-to-Image (T2I) diffusion models are increasingly used in real-world creative workflows, a principled framework for valuing contributors who provide a collection of data is essential for fair compensation and sustainable data marketplaces. While the Shapley value offers a theoretically grounded approach to attribution, it faces a dual computational bottleneck: (i) the prohibitive cost of exhaustive model retraining for each sampled subset of players (i.e., data contributors) and (ii) the combinatorial number of subsets needed to estimate marginal contributions due to contributor interactions. To this end, we propose SurrogateSHAP, a retraining-free framework that approximates the expensive retraining game through inference from a pretrained model. To further improve efficiency, we employ a gradient-boosted tree to approximate the utility function and derive Shapley values analytically from the tree-based model. We evaluate SurrogateSHAP across three diverse attribution tasks: (i) image quality for DDPM-CFG on CIFAR-20, (ii) aesthetics for Stable Diffusion on Post-Impressionist artworks, and (iii) product diversity for FLUX.1 on Fashion-Product data. Across settings, SurrogateSHAP outperforms prior methods while substantially reducing computational overhead, consistently identifying influential contributors across multiple utility metrics. Finally, we demonstrate that SurrogateSHAP effectively localizes data sources responsible for spurious correlations in clinical images, providing a scalable path toward auditing safety-critical generative models.
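
The computational bottleneck named in this abstract is visible in the definition of the Shapley value itself, sketched below in Python for a three-player game. This is the exact, exhaustive computation, not SurrogateSHAP: the gradient-boosted surrogate and the analytic tree-based derivation are not reproduced. The players, their solo values, and the interaction bonus in `toy_utility` are hypothetical stand-ins for a utility that would otherwise require retraining a model per coalition.

```python
from fractions import Fraction
from itertools import permutations

SOLO_VALUE = {"a": 1, "b": 2, "c": 3}  # hypothetical per-contributor utility

def toy_utility(coalition):
    """Additive utility plus a bonus when contributors a and b appear together."""
    bonus = 2 if {"a", "b"} <= coalition else 0
    return sum(SOLO_VALUE[p] for p in coalition) + bonus

def shapley_values(players, utility):
    """Average each player's marginal contribution over all join orders."""
    totals = {p: Fraction(0) for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += utility(with_p) - utility(coalition)
            coalition = with_p
    return {p: total / len(orders) for p, total in totals.items()}

values = shapley_values(["a", "b", "c"], toy_utility)
```

The interaction bonus is split evenly between `a` and `b`, and the values sum to the utility of the full coalition. The loop visits n! orders and calls the utility at every step, which is what makes a cheap surrogate for the utility attractive.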

2501.13428 2026-06-02 cs.CL cs.AI cs.LG 版本更新

Softplus Attention with Re-weighting Boosts Length Extrapolation in Large Language Models

Softplus注意力与重加权提升大语言模型的长度外推能力

Bo Gao, Michael W. Spratling, Letizia Gionfrida

发表机构 * University of California, Berkeley（加州大学伯克利分校）

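AI summary: Proposes a two-stage attention mechanism that replaces Softmax with Softplus and l1 normalization and introduces a dynamic scaling factor based on invariance entropy together with a re-weighting mechanism, improving numerical stability, mitigating the attention sink phenomenon, and markedly improving length extrapolation.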

Comments Accepted by ICML 2026

KromHC: Manifold-Constrained Hyper-Connections with Kronecker-Product Residual Matrices

Wuyang Zhou, Yuxuan Gu, Giorgos Iacovides, Danilo Mandic

发表机构 * University of Technology Sydney（悉尼科技大学）

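AI summary: To address training instability and parameter explosion in hyper-connections, proposes KromHC, which parametrizes the residual matrix as a Kronecker product of small doubly stochastic matrices, guaranteeing exact double stochasticity while reducing parameter complexity to O(n^2C).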

AI中文摘要

超连接（HC）在神经网络中的成功也凸显了训练不稳定和可扩展性受限的问题。流形约束超连接（mHC）通过将残差连接空间投影到Birkhoff多面体上缓解了这些挑战，但它面临两个问题：1）其迭代Sinkhorn-Knopp（SK）算法并不总是产生精确的双随机残差矩阵；2）mHC的参数复杂度为O(n^3C)，其中n是残差流的宽度，C是特征维度。最近提出的mHC-lite通过Birkhoff-von-Neumann定理重新参数化残差矩阵以保证双随机性，但其参数复杂度面临阶乘爆炸，即O(nC·n!)。为了解决这两个挑战，我们提出KromHC，它使用较小双随机矩阵的Kronecker积来参数化mHC中的残差矩阵。通过沿张量化残差流的每个模式对因子残差矩阵施加流形约束，KromHC保证了残差矩阵的精确双随机性，同时将参数复杂度降低到仅O(n^2C)。实验表明，KromHC匹配甚至超越了其他最先进的mHC变体，同时需要显著更少的可训练参数。代码见https://github.com/wz1119/KromHC。

英文摘要

The success of Hyper-Connections (HC) in neural networks (NN) has also highlighted issues related to training instability and restricted scalability. Manifold-Constrained Hyper-Connections (mHC) mitigate these challenges by projecting the residual connection space onto a Birkhoff polytope; however, mHC faces two issues: 1) its iterative Sinkhorn-Knopp (SK) algorithm does not always yield exactly doubly stochastic residual matrices; 2) it incurs a prohibitive $O(n^3C)$ parameter complexity, where $n$ is the width of the residual stream and $C$ is the feature dimension. The recently proposed mHC-lite reparametrizes the residual matrix via the Birkhoff-von-Neumann theorem to guarantee double stochasticity, but also faces a factorial explosion in its parameter complexity, $O \left( nC \cdot n! \right)$. To address both challenges, we propose KromHC, which uses the Kronecker products of smaller doubly stochastic matrices to parametrize the residual matrix in mHC. By enforcing manifold constraints across the factor residual matrices along each mode of the tensorized residual stream, KromHC guarantees exact double stochasticity of the residual matrices while reducing parameter complexity to only $O(n^2C)$. Experiments show that KromHC matches or even outperforms other state-of-the-art (SOTA) mHC variants, while requiring significantly fewer trainable parameters. The code is at https://github.com/wz1119/KromHC.
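
The property this abstract relies on, that a Kronecker product of doubly stochastic matrices is itself exactly doubly stochastic, can be checked with a short Python sketch. This is not the KromHC parametrization: the 2x2 factors of the form [[a, 1-a], [1-a, a]], the three chosen values of a, and the helper names are assumptions, and exact rational arithmetic is used so the check has no rounding error.

```python
from fractions import Fraction

def doubly_stochastic_2x2(a):
    """Every 2x2 doubly stochastic matrix has this form for some 0 <= a <= 1."""
    a = Fraction(a)
    return [[a, 1 - a], [1 - a, a]]

def kron(p, q):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [p_val * q_val for p_val in p_row for q_val in q_row]
        for p_row in p
        for q_row in q
    ]

def is_doubly_stochastic(m):
    """Nonnegative entries with every row and every column summing to one."""
    nonnegative = all(v >= 0 for row in m for v in row)
    rows_ok = all(sum(row) == 1 for row in m)
    cols_ok = all(sum(col) == 1 for col in zip(*m))
    return nonnegative and rows_ok and cols_ok

factors = [
    doubly_stochastic_2x2(Fraction(3, 4)),
    doubly_stochastic_2x2(Fraction(1, 3)),
    doubly_stochastic_2x2(Fraction(1, 2)),
]
residual = factors[0]
for factor in factors[1:]:
    residual = kron(residual, factor)  # grows 2x2 -> 4x4 -> 8x8
```

In this toy case three scalars determine an 8x8 doubly stochastic matrix with no iterative normalization step, which is the kind of saving a factorized parametrization aims for.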

2601.21237 2026-06-02 cs.DS cs.CL cs.LG 版本更新

Characterizing the Effect of Noise in Language Generation in the Limit

极限情况下语言生成中噪声影响的刻画

Aaron Li, Ian Zhang

发表机构 * Harvard University（哈佛大学）； Duke University（杜克大学）

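AI summary: In the language-generation-in-the-limit model, this paper analyzes how noisy strings affect generatability, proving that a single noisy string strictly reduces the set of generatable collections and that any finite amount of noise is equivalent to a single noisy string, and gives the first characterization of non-uniform noise-dependent generatability.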

Comments ICML 2026

AI中文摘要

Kleinberg 和 Mullainathan 最近提出了一个用于研究语言生成现象的正式框架，称为极限语言生成。在该模型中，对手从未知目标语言中给出示例字符串的枚举，算法需要在有限时间内正确生成目标语言中未见过的字符串。Li、Raman 和 Tewari（2025）后来引入了非均匀和均匀生成的细化概念，Raman 和 Raman（2025）引入了噪声模型，允许对手插入无关字符串。噪声模型中的一个自然问题是通过研究每个额外无关字符串的影响来量化噪声效应。我们在此设置中展示了两个互补的结果。首先，我们证明对于均匀和非均匀生成，单个噪声字符串严格减少了可生成的集合族，从而回答了 Raman 和 Raman（2025）中的一个开放问题。然后，我们证明对于均匀和非均匀生成，单个噪声字符串的生成等价于任何有限噪声量的生成，这与 Bai、Panigrahi 和 Zhang（2026）展示的极限噪声生成的严格层次结构形成鲜明对比。最后，我们利用先前的结果首次提供了非均匀噪声依赖可生成性的刻画。

英文摘要

Kleinberg and Mullainathan recently proposed a formal framework for studying the phenomenon of language generation, called language generation in the limit. In this model, an adversary gives an enumeration of example strings from an unknown target language, and the algorithm is tasked with correctly generating unseen strings from the target language within finite time. Refined notions of non-uniform and uniform generation were later introduced by Li, Raman, and Tewari (2025), and a noisy model was introduced by Raman and Raman (2025), which allows the adversary to insert extraneous strings. A natural question in the noisy model is to quantify the effect of noise, by studying the impact of each additional extraneous string. We show two complementary results in this setting. We first show that for both uniform and non-uniform generation, a single noisy string strictly reduces the set of collections that can be generated, thus answering an open question in Raman and Raman (2025). Then, we show for both uniform and non-uniform generation that generation with a single noisy string is equivalent to generation with any finite amount of noise, sharply contrasting with the strict hierarchy for noisy generation in the limit shown by Bai, Panigrahi, and Zhang (2026). Finally, we leverage our previous results to provide the first known characterization for non-uniform noise-dependent generatability.

2505.14411 2026-06-02 cs.LG 版本更新

Byte Pair Encoding for Efficient Time Series Forecasting

用于高效时间序列预测的字节对编码

Leon Götz, Marcel Kollovieh, Stephan Günnemann, Leo Schwinn

发表机构 * GitHub ； arXiv

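AI summary: Proposes a byte-pair-encoding method based on frequent motifs that adaptively compresses time series into tokens, substantially improving forecasting performance and efficiency.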

Comments 32 pages in total, 22 figures

AI中文摘要

现有的时间序列分词方法主要将固定数量的样本编码为单个令牌。这种不灵活的方法即使对于简单的模式（如扩展的常数值）也可能生成过多的令牌，导致大量计算开销。受字节对编码成功的启发，我们提出了第一个面向模式的时间序列分词方案。基于频繁模式的离散词汇表，我们的方法将具有潜在模式的样本合并为令牌，自适应地压缩时间序列。利用有限的模式集和时间序列的连续特性，我们进一步引入条件解码作为一种轻量级但强大的事后优化方法，该方法无需梯度计算且不增加计算开销。在近期的时间序列基础模型上，基于模式的分词平均将预测性能提升40%，效率提升2314%。条件解码进一步将MSE降低高达48%。在广泛的分析中，我们展示了分词对多样化时间模式的适应性、对未见数据的泛化能力，以及捕获不同时间序列属性（包括统计矩和趋势）的有意义的令牌表示。

英文摘要

Existing time series tokenization methods predominantly encode a constant number of samples into individual tokens. This inflexible approach can generate excessive tokens for even simple patterns like extended constant values, resulting in substantial computational overhead. Inspired by the success of byte pair encoding, we propose the first pattern-centric tokenization scheme for time series analysis. Based on a discrete vocabulary of frequent motifs, our method merges samples with underlying patterns into tokens, compressing time series adaptively. Exploiting our finite set of motifs and the continuous properties of time series, we further introduce conditional decoding as a lightweight yet powerful post-hoc optimization method, which requires no gradient computation and adds no computational overhead. On recent time series foundation models, our motif-based tokenization improves forecasting performance by 40% and boosts efficiency by 2314% on average. Conditional decoding further reduces MSE by up to 48%. In an extensive analysis, we demonstrate the adaptiveness of our tokenization to diverse temporal patterns, its generalization to unseen data, and its meaningful token representations capturing distinct time series properties, including statistical moments and trends.
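
The motif-merging idea in this abstract can be sketched in Python by applying byte-pair-encoding merges to a quantized series. This is a sketch under stated assumptions, not the paper's tokenizer: the uniform binning in `quantize`, the rule of stopping when no pair occurs twice, and the example series are all hypothetical.

```python
from collections import Counter

def quantize(series, lo, hi, bins):
    """Map each sample to a one-symbol token holding its bin index."""
    width = (hi - lo) / bins
    return [(min(bins - 1, max(0, int((v - lo) / width))),) for v in series]

def merge_once(tokens):
    """Merge the most frequent adjacent token pair; return (tokens, new motif)."""
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return tokens, None
    (left, right), count = pairs.most_common(1)[0]
    if count < 2:
        return tokens, None  # nothing repeats, so stop merging
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
            merged.append(left + right)  # concatenate into a longer motif
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged, left + right

def learn_motifs(tokens, num_merges):
    """Apply up to num_merges merges and collect the motif vocabulary."""
    vocab = []
    for _ in range(num_merges):
        tokens, motif = merge_once(tokens)
        if motif is None:
            break
        vocab.append(motif)
    return tokens, vocab

def flatten(tokens):
    """Expand motif tokens back into the underlying bin sequence."""
    return [symbol for token in tokens for symbol in token]

series = [0.5] * 8 + [0.1, 0.9, 0.1, 0.9]  # a constant run, then an oscillation
initial = quantize(series, lo=0.0, hi=1.0, bins=4)
compressed, vocab = learn_motifs(initial, num_merges=10)
```

The long constant run ends up in a few long tokens, and `flatten(compressed)` recovers the quantized sequence, so the compression is lossless with respect to the bins.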

2601.18783 2026-06-02 cs.LG cs.AI cs.SY eess.SY 版本更新

Multi-Objective Reinforcement Learning for Tactical Decision Making for Trucks in Highway Traffic

多目标强化学习用于高速公路卡车战术决策

Deepthi Pathare, Leo Laine, Morteza Haghir Chehreghani

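Affiliations: Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg; Department of Mechanics and Maritime Sciences, Chalmers University of Technology

AI summary: Proposes a multi-objective reinforcement learning framework based on Proximal Policy Optimization that learns a set of Pareto-optimal policies balancing safety, energy efficiency, and time efficiency, so that driving behavior can be changed without retraining.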

AI中文摘要

在高速公路驾驶中平衡安全性、效率和运营成本对重型车辆来说是一个具有挑战性的决策问题。一个核心困难是，通过聚合这些竞争目标得到的传统标量奖励公式往往会掩盖其权衡结构。我们提出了一个基于近端策略优化的多目标强化学习框架，该框架学习一组明确表示这些权衡的策略，并在一个可扩展的模拟平台上对卡车的战术决策进行评估。所提出的方法学习一组帕累托最优策略，捕捉三个冲突目标之间的权衡：安全性（以碰撞和成功完成量化）、能源效率和时间效率（分别以能源成本和驾驶员成本量化）。得到的帕累托前沿平滑且可解释，使得在不同冲突目标下选择驾驶行为具有灵活性。该框架允许在不同驾驶策略之间无缝切换而无需重新训练，为自动驾驶卡车应用提供了稳健且自适应的决策策略。

英文摘要

Balancing safety, efficiency, and operational costs in highway driving poses a challenging decision-making problem for heavy-duty vehicles. A central difficulty is that conventional scalar reward formulations, obtained by aggregating these competing objectives, often obscure the structure of their trade-offs. We present a Proximal Policy Optimization based multi-objective reinforcement learning framework that learns a set of policies explicitly representing these trade-offs and evaluates it on a scalable simulation platform for tactical decision making in trucks. The proposed approach learns a set of Pareto-optimal policies that capture the trade-offs among three conflicting objectives: safety, quantified in terms of collisions and successful completion; energy efficiency and time efficiency, quantified using energy cost and driver cost, respectively. The resulting Pareto frontier is smooth and interpretable, enabling flexibility in choosing driving behavior along different conflicting objectives. This framework allows seamless transitions between different driving policies without retraining, yielding a robust and adaptive decision-making strategy for autonomous trucking applications.
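
To make "Pareto-optimal" concrete, here is a small sketch that filters a set of candidate policies down to the non-dominated ones. The cost tuples are invented for illustration (lower is better on every axis) and do not come from the paper; `dominates` and `pareto_front` are hypothetical helper names.

```python
def dominates(a, b):
    """True if cost vector a is no worse than b on every objective and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(costs):
    """Keep only the cost vectors that no other vector dominates."""
    return [c for c in costs if not any(dominates(o, c) for o in costs)]


# Hypothetical (collision_rate, energy_cost, driver_cost) per trained policy.
policies = [
    (0.01, 5.0, 9.0),
    (0.02, 4.0, 8.0),
    (0.02, 5.0, 9.5),  # dominated by the first policy
    (0.05, 3.5, 7.0),
]
front = pareto_front(policies)
print(front)
```

Every policy left on the front represents a different trade-off, so choosing among them is a preference decision rather than a retraining step.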

2601.18115 2026-06-02 cs.LG cs.DS math.OC 版本更新

Robust Learning of a Group DRO Neuron

群体分布鲁棒优化神经元的鲁棒学习

Guyang Cao, Shuyao Li, Sushrut Karmalkar, Jelena Diakonikolas

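Affiliations: University of Wisconsin-Madison; Microsoft Research, Cambridge

AI summary: For arbitrary label noise and group-level distribution shift, proposes a computationally efficient primal-dual algorithm that learns a single neuron whose loss under the worst-case group weighting is constant-factor competitive with the best-fit neuron.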

AI中文摘要

我们研究在存在任意标签噪声和群体级分布偏移的情况下，对于一大类协变量分布，在标准平方损失下学习单个神经元的问题。我们的目标是识别一个由 $\mathbf{w}_*$ 参数化的“最佳拟合”神经元，该神经元在最具挑战性的群体重新加权下表现良好。具体来说，我们解决了一个群体分布鲁棒优化问题：给定对 $K$ 个不同分布 $\mathcal p_{[1]},\dots,\mathcal p_{[K]}$ 的样本访问，我们寻求近似 $\mathbf{w}_*$，该 $\mathbf{w}_*$ 最小化群体分布的凸组合 $\boldsymbol{\lambda} \in \Delta_K$ 上的最坏情况目标，其中目标为 $\sum_{i \in [K]}\lambda_{[i]}\,\mathbb E_{(\mathbf x,y)\sim\mathcal p_{[i]}}(\sigma(\mathbf w\cdot\mathbf x)-y)^2 - \nu\, d_f(\boldsymbol{\lambda},\frac{1}{K}\mathbf 1)$，而 $d_f$ 是一个 $f$-散度，用于对偏离均匀群体权重施加（可选的）惩罚，由参数 $\nu\geq 0$ 缩放。我们开发了一个计算高效的原对偶算法，输出一个向量 $\widehat{\mathbf w}$，该向量在最坏情况群体加权下与 $\mathbf{w}_*$ 相比具有常数因子竞争比。我们的分析框架直接应对损失函数固有的非凸性，在任意标签损坏和群体特定分布偏移的情况下提供鲁棒学习保证。受我们算法框架启发的对偶外推更新实现，在 LLM 预训练基准测试中显示出前景。

英文摘要

We study the problem of learning a single neuron under standard squared loss in the presence of arbitrary label noise and group-level distributional shifts, for a broad family of covariate distributions. Our goal is to identify a "best-fit" neuron parameterized by $\mathbf{w}_*$ that performs well under the most challenging reweighting of the groups. Specifically, we address a Group Distributionally Robust Optimization problem: given sample access to $K$ distinct distributions $\mathcal p_{[1]},\dots,\mathcal p_{[K]}$, we seek to approximate $\mathbf{w}_*$ that minimizes the worst-case objective over convex combinations of group distributions $\boldsymbol{\lambda} \in \Delta_K$, where the objective is $\sum_{i \in [K]}\lambda_{[i]}\,\mathbb E_{(\mathbf x,y)\sim\mathcal p_{[i]}}(\sigma(\mathbf w\cdot\mathbf x)-y)^2 - \nu\, d_f(\boldsymbol{\lambda},\frac{1}{K}\mathbf 1)$ and $d_f$ is an $f$-divergence that imposes an (optional) penalty on deviations from uniform group weights, scaled by a parameter $\nu\geq 0$. We develop a computationally efficient primal-dual algorithm that outputs a vector $\widehat{\mathbf w}$ that is constant-factor competitive with $\mathbf{w}_*$ under the worst-case group weighting. Our analytical framework directly confronts the inherent nonconvexity of the loss function, providing robust learning guarantees in the face of arbitrary label corruptions and group-specific distributional shifts. The implementation of the dual extrapolation update motivated by our algorithmic framework shows promise on LLM pre-training benchmarks.
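
For intuition about the inner maximization over group weights, the sketch below assumes one specific choice of $f$-divergence, the KL divergence to the uniform distribution, with $\nu > 0$. Under that assumption the worst-case weights have the standard closed form $\lambda_i \propto \exp(L_i/\nu)$, where $L_i$ is the loss of group $i$. The abstract allows a general $f$-divergence and does not state this form, and the function names are hypothetical.

```python
import math


def worst_case_weights(group_losses, nu):
    """Maximizer of sum_i lam_i * L_i - nu * KL(lam || uniform) over the simplex (nu > 0)."""
    m = max(group_losses)  # subtract the max for numerical stability
    exps = [math.exp((loss - m) / nu) for loss in group_losses]
    z = sum(exps)
    return [e / z for e in exps]


def penalized_objective(group_losses, lam, nu):
    """Weighted loss minus the KL penalty for moving away from uniform group weights."""
    k = len(group_losses)
    kl = sum(p * math.log(p * k) for p in lam if p > 0)
    return sum(p * loss for p, loss in zip(lam, group_losses)) - nu * kl


losses = [1.0, 2.0, 4.0]  # hypothetical per-group squared losses
nu = 1.0
lam = worst_case_weights(losses, nu)
value = penalized_objective(losses, lam, nu)
closed_form = nu * math.log(sum(math.exp(loss / nu) for loss in losses) / len(losses))
print(lam, value, closed_form)
```

Small values of `nu` push almost all weight onto the worst group, while large values keep the weights near uniform, which is the role the abstract assigns to the penalty parameter.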

2509.13805 2026-06-02 cs.LG cs.AI stat.ML 版本更新

Towards a Physics Foundation Model

迈向物理基础模型

Florian Wiesner, Zoë J. Gray, Matthias Wessling, Stephen Baek

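Affiliations: University of Cambridge

AI summary: Proposes the General Physics Transformer (GPhyT), trained on large-scale, diverse simulation data, so that a single model achieves zero-shot generalization and stable long-term prediction across multiple physics domains (fluid-solid interaction, shock waves, thermal convection, and multi-phase flow), outperforming specialized architectures by more than 7x.

Comments: ICML-AI4Physics 2026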

AI中文摘要

基础模型通过“一次训练，随处部署”的范式彻底改变了自然语言处理，即单个预训练模型无需重新训练即可适应无数下游任务。拥有物理基础模型（PFM）将是变革性的——它能够民主化高保真模拟的访问、加速科学发现，并消除对专用求解器开发的需求。然而，当前物理感知的机器学习方法仍然从根本上局限于单一狭窄领域，并且需要为每个新系统重新训练。我们提出了通用物理变换器（GPhyT），该模型在1.8 TB的多样化模拟数据上训练，证明了基础模型能力在物理领域是可以实现的。我们的关键见解是，变换器可以学习从上下文中推断支配动力学，从而使单一模型能够模拟流固耦合、冲击波、热对流和多相动力学，而无需被告知底层方程。GPhyT实现了三个关键突破：（1）在多个物理领域上表现出卓越性能，比专用架构高出7倍以上；（2）通过上下文学习，对完全未见过的物理系统进行合理的零样本泛化；（3）通过长程 rollout 实现更稳定的长期预测。通过证明单一模型可以仅从数据中学习可泛化的物理原理，这项工作为通向通用PFM开辟了道路，该模型可能改变计算科学与工程。

英文摘要

Foundation models have revolutionized natural language processing through a "train once, deploy anywhere" paradigm, where a single pre-trained model adapts to countless downstream tasks without retraining. Access to a Physics Foundation Model (PFM) would be transformative: it would democratize access to high-fidelity simulations, accelerate scientific discovery, and eliminate the need for specialized solver development. Yet current physics-aware machine learning approaches remain fundamentally limited to single, narrow domains and require retraining for each new system. We present the General Physics Transformer (GPhyT), trained on 1.8 TB of diverse simulation data, which demonstrates that foundation model capabilities are achievable for physics. Our key insight is that transformers can learn to infer governing dynamics from context, enabling a single model to simulate fluid-solid interactions, shock waves, thermal convection, and multi-phase dynamics without being told the underlying equations. GPhyT achieves three critical breakthroughs: (1) superior performance across multiple physics domains, outperforming specialized architectures by more than 7x, (2) plausible zero-shot generalization to entirely unseen physical systems through in-context learning, and (3) more stable long-term predictions through long-horizon rollouts. By establishing that a single model can learn generalizable physical principles from data alone, this work opens the path toward a universal PFM that could transform computational science and engineering.

2404.01356 2026-06-02 cs.LG cs.AI cs.CY 版本更新

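Cross-Modal Fine-Tuning of 3D Convolutional Foundation Models for ADHD Classification with Low-Rank Adaptation

基于低秩适应的3D卷积基础模型跨模态微调用于ADHD分类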

Jyun-Ping Kao, Shinyeong Rho, Shahar Lazarev, Hyun-Hae Cho, Fangxu Xing, Taehoon Shin, C. -C. Jay Kuo, Jonghye Woo

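Affiliations: National Institute of Mental Health, National Institutes of Health

AI summary: Proposes a parameter-efficient transfer learning method that uses 3D Low-Rank Adaptation (LoRA) to fine-tune a 3D convolutional foundation model, pre-trained on CT images, for MRI-based ADHD classification. On a public diffusion MRI dataset, one model variant reaches 71.9% accuracy and another an AUC of 0.716, each with only 1.64 million trainable parameters.

Comments: Accepted for presentation at the IEEE International Symposium on Biomedical Imaging (ISBI) 2026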

DOI: 10.1109/ISBI61048.2026.11515951
Journal ref: 2026 IEEE 23rd International Symposium on Biomedical Imaging (ISBI), pp. 1-4

AI中文摘要

儿童注意缺陷/多动障碍（ADHD）的早期诊断在改善教育和心理健康结果中起着关键作用。然而，由于异质性表现和与其他疾病的重叠症状，使用神经影像数据诊断ADHD仍然具有挑战性。为了解决这一问题，我们提出了一种新颖的参数高效迁移学习方法，将预训练于CT图像的大规模3D卷积基础模型适应于基于MRI的ADHD分类任务。我们的方法通过将3D卷积核分解为2D低秩更新，在3D中引入低秩适应（LoRA），大幅减少可训练参数，同时实现优越性能。在公开扩散MRI数据库上的五折交叉验证评估中，我们的3D LoRA微调策略取得了最先进的结果，一个模型变体达到71.9%的准确率，另一个达到0.716的AUC。两个变体仅使用164万可训练参数（比完全微调的基础模型少113倍以上）。我们的结果代表了神经影像中基础模型首次成功的跨模态（CT到MRI）适应之一，为ADHD分类建立了新的基准，同时大幅提高了效率。

英文摘要

Early diagnosis of attention-deficit/hyperactivity disorder (ADHD) in children plays a crucial role in improving outcomes in education and mental health. Diagnosing ADHD using neuroimaging data, however, remains challenging due to heterogeneous presentations and overlapping symptoms with other conditions. To address this, we propose a novel parameter-efficient transfer learning approach that adapts a large-scale 3D convolutional foundation model, pre-trained on CT images, to an MRI-based ADHD classification task. Our method introduces Low-Rank Adaptation (LoRA) in 3D by factorizing 3D convolutional kernels into 2D low-rank updates, dramatically reducing trainable parameters while achieving superior performance. In a five-fold cross-validated evaluation on a public diffusion MRI database, our 3D LoRA fine-tuning strategy achieved state-of-the-art results, with one model variant reaching 71.9% accuracy and another attaining an AUC of 0.716. Both variants use only 1.64 million trainable parameters (over 113x fewer than a fully fine-tuned foundation model). Our results represent one of the first successful cross-modal (CT-to-MRI) adaptations of a foundation model in neuroimaging, establishing a new benchmark for ADHD classification while greatly improving efficiency.
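
The abstract does not spell out how the 3D kernels are factorized, so the following is only a back-of-the-envelope sketch under an assumed convention: the kernel of shape `(c_out, c_in, k, k, k)` is flattened to a 2D matrix of shape `(c_out, c_in * k**3)` and updated by a rank-`r` product `B @ A`. The layer sizes and rank are hypothetical and are not taken from the paper.

```python
def full_conv3d_params(c_out, c_in, k):
    """Weights in a dense 3D convolution kernel (bias ignored)."""
    return c_out * c_in * k ** 3


def lora_conv3d_params(c_out, c_in, k, rank):
    """Trainable weights in a rank-`rank` update B @ A of the flattened kernel.

    B has shape (c_out, rank) and A has shape (rank, c_in * k**3).
    """
    return rank * (c_out + c_in * k ** 3)


# Hypothetical layer: 256 -> 256 channels, 3x3x3 kernel, rank 4.
full = full_conv3d_params(256, 256, 3)
lora = lora_conv3d_params(256, 256, 3, rank=4)
print(full, lora, round(full / lora, 1))
```

Under these assumptions the low-rank update trains roughly 60 times fewer weights for this one layer, which shows where a large reduction in trainable parameters can come from.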

2511.01938 2026-06-02 cs.LG cs.AI 版本更新

The Geometry of Grokking: Norm Minimization on the Zero-Loss Manifold

Grokking 的几何：零损失流形上的范数最小化

Tiberiu Musat

发表机构 * ETH Zurich（苏黎世联邦理工学院）

AI总结本文通过约束优化视角，证明在极小学习率和权重衰减系数下，梯度下降在零损失流形上最小化权重范数，并引入近似解耦参数子集的学习动力学，推导出两层网络第一层后记忆动力学的闭式表达式，实验验证了该框架能复现 grokking 的延迟泛化和表征学习特征。

详情

AI中文摘要

Grokking 是神经网络中一种令人费解的现象，即在完全记忆训练数据之后，经过相当长的延迟才出现完全的泛化。先前的研究将这种延迟泛化与由权重衰减驱动的表征学习联系起来，但精确的潜在动力学仍然难以捉摸。在本文中，我们认为后记忆学习可以通过约束优化的视角来理解：梯度下降在零损失流形上有效地最小化权重范数。我们在无穷小学习率和权重衰减系数的极限下正式证明了这一点。为了进一步剖析这一机制，我们引入了一种近似，将一部分参数的学习动力学与网络其余部分解耦。应用这一框架，我们推导出两层网络中第一层后记忆动力学的闭式表达式。实验证实，使用我们预测的梯度模拟训练过程能够再现 grokking 的特征性延迟泛化和表征学习。

英文摘要

Grokking is a puzzling phenomenon in neural networks where full generalization occurs only after a substantial delay following the complete memorization of the training data. Previous research has linked this delayed generalization to representation learning driven by weight decay, but the precise underlying dynamics remain elusive. In this paper, we argue that post-memorization learning can be understood through the lens of constrained optimization: gradient descent effectively minimizes the weight norm on the zero-loss manifold. We formally prove this in the limit of infinitesimally small learning rates and weight decay coefficients. To further dissect this regime, we introduce an approximation that decouples the learning dynamics of a subset of parameters from the rest of the network. Applying this framework, we derive a closed-form expression for the post-memorization dynamics of the first layer in a two-layer network. Experiments confirm that simulating the training process using our predicted gradients reproduces both the delayed generalization and representation learning characteristic of grokking.
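
A two-parameter toy problem gives a feel for the claim. The sketch below is not the paper's setting: it assumes the loss $(w_1 w_2 - 1)^2$, whose zero-loss set is the curve $w_1 w_2 = 1$, and runs plain gradient descent with a coupled L2 weight-decay term. The learning rate, decay coefficient, and starting point are arbitrary illustrative choices.

```python
import math


def train(w1, w2, lr=0.01, weight_decay=0.1, steps=5000):
    """Gradient descent on (w1 * w2 - 1)^2 + (weight_decay / 2) * (w1^2 + w2^2)."""
    for _ in range(steps):
        residual = w1 * w2 - 1.0
        g1 = 2.0 * residual * w2 + weight_decay * w1
        g2 = 2.0 * residual * w1 + weight_decay * w2
        w1, w2 = w1 - lr * g1, w2 - lr * g2
    return w1, w2


# Start on the zero-loss curve, far from its minimum-norm point (1, 1).
start = (4.0, 0.25)
w1, w2 = train(*start)
print(w1, w2, math.hypot(*start), math.hypot(w1, w2))
```

The iterate slides along the curve toward the balanced, minimum-norm solution. Because the weight decay here is finite rather than infinitesimal, it settles slightly inside the curve, at roughly $w_1 w_2 \approx 0.95$ instead of exactly 1.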

2408.11266 2026-06-02 cs.LG cs.NA math.NA 版本更新

Practical Aspects on Solving Differential Equations Using Deep Learning: A Primer

使用深度学习求解微分方程的实践方面：入门指南

Georgios Is. Detorakis

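Affiliations: International Centre for Neuromorphic Systems; Department of Computer Science; University of Manchester

AI summary: Introduces core deep learning concepts and focuses on how to use neural networks to solve partial differential equations, covering implementation, hyperparameter selection, and techniques for improving accuracy, and stresses that every problem can be reproduced without a GPU.

Comments: 34 pages, 13 figures, primer (tutorial)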

AI中文摘要

深度学习如今在许多科学领域都很常见，包括偏微分方程的研究。本文简要、易懂地介绍了深度学习的核心概念，包括神经网络、反向传播和通用逼近定理。它主要涵盖了如何使用深度学习求解微分方程。本文旨在帮助数学、物理及相关领域的本科生和研究生学习如何使用深度学习求解偏微分方程。数学或物理教师也可以使用本文向学生介绍深度伽辽金方法和科学深度学习。我们关注关键问题：什么是深度学习，它如何帮助解决数学或物理问题？如何实现神经网络并选择正确的数值方法来求解微分方程？如何选择最佳超参数？如何提高精度并加速收敛？需要说明的是，本文中的所有问题都可以在没有GPU的机器上解决，因此任何学生都可以遵循所介绍的方法。

英文摘要

Deep learning is now common across many scientific fields, including the study of partial differential equations. This article provides a brief, accessible introduction to core deep learning concepts, including neural networks, backpropagation, and the universal approximation theorem. It mainly covers how to use deep learning in solving differential equations. The article aims to help undergraduate and graduate students in mathematics, physics, and related areas learn how to use deep learning to solve partial differential equations. Instructors in mathematics or physics can also use this article to introduce students to the Deep Galerkin method and scientific deep learning. We focus on key questions: What is deep learning, and how can it help solve mathematical or physical problems? How can you implement a neural network and choose the right numerical method to solve differential equations? How do you select the best hyperparameters? How can you improve accuracy and speed up convergence? We should mention that all the problems in this article can be solved on a machine without a GPU, so any student can follow the presented methodology.

2601.04539 2026-06-02 cs.NE cs.AI cs.LG 版本更新

Paradoxical noise preference in RNNs

RNN中的矛盾噪声偏好

Noah Eckstein, Manoj Srinivasan

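Affiliations: Department of Mechanical and Aerospace Engineering

AI summary: Finds that in recurrent neural networks, removing at test time the noise that was injected during training can degrade performance: networks often perform best at or near the training noise level, a phenomenon that stems from noise-induced shifts of fixed points.

Comments: Published in Transactions on Machine Learning Research (TMLR), 2026. 21 pages, 8 figures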

Journal ref: Transactions on Machine Learning Research, 2026

AI中文摘要

在用于模拟生物神经网络的循环神经网络（RNN）中，通常在训练期间引入噪声以模拟生物变异性和正则化学习。预期在测试时去除噪声应保持或提高性能。与这一直觉相反，我们发现连续时间RNN（CTRNN）通常在训练噪声水平或接近该水平时表现最佳。这种噪声偏好通常出现在噪声注入到神经激活函数内部时；而在激活函数外部注入噪声训练的网络在零噪声时表现最佳。该现象在多种任务中对于足够大的训练噪声鲁棒地出现；我们还展示了该现象出现在前馈神经网络中，而不仅仅是RNN中。我们的分析表明，该现象源于RNN底层随机动力学中固定点（平稳分布）的噪声诱导偏移。这些固定点偏移依赖于噪声水平，并在去除噪声时使网络输出产生偏差，从而降低性能。分析和数值结果表明，当神经状态在激活函数非线性附近运行时会产生偏差，此时噪声被不对称地衰减，而性能优化激励了在这些非线性附近运行；对于噪声在激活函数内部的网络存在这种性能激励，而外部噪声的网络则没有，这解释了为什么只有内部噪声网络表现出偏好。因此，网络可能过拟合到训练噪声本身，而不仅仅是输入-输出数据。该现象不同于随机共振，后者中非零噪声增强信号处理。我们的发现揭示了训练噪声可以成为神经网络学习到的计算的一部分，对理解神经群体动力学和设计鲁棒的人工RNN具有启示意义。

英文摘要

In recurrent neural networks (RNNs) used to model biological neural networks, noise is typically introduced during training to emulate biological variability and regularize learning. The expectation is that removing the noise at test time should preserve or improve performance. Contrary to this intuition, we find that continuous-time RNNs (CTRNNs) often perform best at or near the training noise level. This noise preference typically arises when noise is injected inside the neural activation function; networks trained with noise injected outside the activation function perform best with zero noise. The phenomenon arises robustly in diverse tasks for large enough training noise; we also show the phenomenon arising in feedforward neural networks, not just in RNNs. Our analyses show that the phenomenon stems from noise-induced shifts of fixed points (stationary distributions) in the underlying stochastic dynamics of the RNNs. These fixed point shifts are noise-level dependent and bias the network outputs when the noise is removed, degrading performance. Analytical and numerical results show that the bias arises when neural states operate near activation-function nonlinearities, where noise is asymmetrically attenuated, and that performance optimization incentivizes operation near these nonlinearities; such performance incentives exist for networks with noise inside, but not outside, the activation function, explaining why only noise-in networks show the preference. Thus, networks can overfit to the training noise itself rather than just to the input-output data. The phenomenon is distinct from stochastic resonance, wherein nonzero noise enhances signal processing. Our findings reveal that training noise can become an integral part of the computation learned by neural networks, with implications for understanding neural population dynamics and for the design of robust artificial RNNs.
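
The asymmetry between noise inside and outside the activation can be seen in a single static unit, which is a much simpler setting than the CTRNN dynamics studied in the paper. The sketch assumes a tanh activation, zero-mean Gaussian noise, and an operating point `x = 1.0`, and computes the mean output by numerical integration. All names and values are illustrative.

```python
import math


def gaussian_mean(f, sigma, n=4001, span=8.0):
    """E[f(sigma * z)] for z ~ N(0, 1), by the trapezoidal rule on [-span, span]."""
    h = 2.0 * span / (n - 1)
    total = 0.0
    for i in range(n):
        z = -span + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * f(sigma * z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


x, sigma = 1.0, 0.5
clean = math.tanh(x)
noise_inside = gaussian_mean(lambda e: math.tanh(x + e), sigma)   # E[tanh(x + noise)]
noise_outside = gaussian_mean(lambda e: math.tanh(x) + e, sigma)  # E[tanh(x) + noise]
print(clean, noise_inside, noise_outside)
```

With noise outside the activation the mean output equals the noise-free output, but with noise inside it the saturating nonlinearity attenuates upward and downward fluctuations unequally and the mean shifts. A network trained around the shifted value would therefore see a bias once the noise is removed, which is in line with the mechanism the abstract describes.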

2601.00672 2026-06-02 math.NA cs.LG cs.NA 版本更新

Sparse FEONet: A Low-Cost, Memory-Efficient Operator Network via Finite-Element Local Sparsity for Parametric PDEs

稀疏FEONet：通过有限元局部稀疏性实现低计算成本、高内存效率的参数化PDE算子网络

Seungchan Ko, Jiyeon Kim, Dongwook Shin

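Affiliations: Department of Mathematics, Inha University; Department of Mathematics, Ajou University

AI summary: To address the rising computational cost and degrading accuracy of the finite element operator network (FEONet) on large-scale parametric PDE problems, proposes a new sparse network architecture motivated by the structure of finite elements that substantially improves computational efficiency while maintaining comparable accuracy, supported by theoretical approximation results and a stability analysis.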

AI中文摘要

本文研究了有限元算子网络（FEONet），这是一种用于参数化问题的算子学习方法，最初由J. Y. Lee、S. Ko和Y. Hong在《Finite Element Operator Network for Solving Elliptic-Type Parametric PDEs》（SIAM J. Sci. Comput., 47(2), C501-C528, 2025）中提出。FEONet在有限元空间上实现参数到解的映射，并采用无需训练数据的训练过程，同时在一大类问题上表现出高精度和鲁棒性。然而，随着单元数量的增加，其计算成本上升且精度可能下降，这给大规模问题带来了显著挑战。在本文中，我们受有限元结构启发，提出一种新的稀疏网络架构来解决这一问题。通过大量数值实验，我们表明所提出的稀疏网络在保持相当精度的同时，在计算成本和效率方面实现了显著改进。我们还建立了理论结果，证明稀疏架构能够有效逼近目标算子，并提供了稳定性分析以确保可靠的训练和预测。

英文摘要

In this paper, we study the finite element operator network (FEONet), an operator-learning method for parametric problems, originally introduced in J. Y. Lee, S. Ko, and Y. Hong, Finite Element Operator Network for Solving Elliptic-Type Parametric PDEs, SIAM J. Sci. Comput., 47(2), C501-C528, 2025. FEONet realizes the parameter-to-solution map on a finite element space and admits a training procedure that does not require training data, while exhibiting high accuracy and robustness across a broad class of problems. However, its computational cost increases and accuracy may deteriorate as the number of elements grows, posing notable challenges for large-scale problems. In this paper, we propose a new sparse network architecture motivated by the structure of the finite elements to address this issue. Through extensive numerical experiments, we show that the proposed sparse network achieves substantial improvements in computational cost and efficiency while maintaining comparable accuracy. We also establish theoretical results demonstrating that the sparse architecture can approximate the target operator effectively and provide a stability analysis ensuring reliable training and prediction.

2601.00664 2026-06-02 cs.LG cs.AI cs.CV cs.HC cs.MM 版本更新

Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation

Avatar Forcing：用于自然对话的实时交互式头部化身生成

Taekyung Ki, Sangwon Jang, Jaehyeong Jo, Jaehong Yoon, Sung Ju Hwang

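Affiliations: KAIST; NTU Singapore; DeepAuto.ai

AI summary: Proposes Avatar Forcing, a framework that uses diffusion forcing for real-time interactive head avatar generation and direct preference optimization for label-free learning, producing expressive reactive motion at low latency (approximately 500 ms).

Comments: CVPR 2026. Project page: https://taekyungki.github.io/AvatarForcing/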

AI中文摘要

说话头部生成从静态肖像创建逼真的化身，用于虚拟通信和内容创作。然而，当前的模型尚未传达真正交互式通信的感觉，通常生成缺乏情感投入的单向响应。我们确定了实现真正交互式化身的两个关键挑战：在因果约束下实时生成运动，以及在没有额外标注数据的情况下学习富有表现力、生动的反应。为了解决这些挑战，我们提出了Avatar Forcing，一种新的交互式头部化身生成框架，通过扩散强制建模实时用户-化身交互。该设计允许化身处理实时多模态输入，包括用户的音频和运动，以低延迟即时响应语言和非语言线索，如言语、点头和笑声。此外，我们引入了一种直接偏好优化方法，利用通过丢弃用户条件构建的合成失败样本，实现无标签的富有表现力交互学习。实验结果表明，我们的框架能够实现低延迟（约500ms）的实时交互，相比基线加速6.8倍，并生成反应性和富有表现力的化身运动，在80%以上的情况下优于基线。

英文摘要

Talking head generation creates lifelike avatars from static portraits for virtual communication and content creation. However, current models do not yet convey the feeling of truly interactive communication, often generating one-way responses that lack emotional engagement. We identify two key challenges toward truly interactive avatars: generating motion in real-time under causal constraints and learning expressive, vibrant reactions without additional labeled data. To address these challenges, we propose Avatar Forcing, a new framework for interactive head avatar generation that models real-time user-avatar interactions through diffusion forcing. This design allows the avatar to process real-time multimodal inputs, including the user's audio and motion, with low latency for instant reactions to both verbal and non-verbal cues such as speech, nods, and laughter. Furthermore, we introduce a direct preference optimization method that leverages synthetic losing samples constructed by dropping user conditions, enabling label-free learning of expressive interaction. Experimental results demonstrate that our framework enables real-time interaction with low latency (approximately 500 ms), achieving a 6.8x speedup compared to the baseline, and produces reactive and expressive avatar motion, which is preferred over the baseline in more than 80% of cases.

2601.00389 2026-06-02 cs.CR cs.LG cs.NI 版本更新

NOS-Gate: Queue-Aware Streaming IDS for Consumer Gateways under Timing-Controlled Evasion

NOS-Gate: 面向消费网关的时序控制规避下队列感知流式入侵检测系统

Muhammad Bilal, Omer Tariq, Hasan Ahmed

Affiliations: School of Computing and Communications, Lancaster University; School of Computing, Korea Advanced Institute of Science and Technology

AI summary: Proposes NOS-Gate, a lightweight streaming intrusion detection system based on Network-Optimised Spiking dynamics and a K-of-M persistence rule, which detects attacks from encrypted-traffic metadata with high recall and low latency under timing-controlled evasion.

Comments: 9 pages, 3 figures, 4 tables. M. Bilal, O. Tariq and H. Ahmed, "NOS-Gate: Queue-Aware Streaming IDS for Consumer Gateways under Timing-Controlled Evasion," in IEEE Transactions on Consumer Electronics, doi: 10.1109/TCE.2026.3682516

DOI: 10.1109/TCE.2026.3682516

AI中文摘要

时序和突发模式可能通过加密泄露，自适应攻击者可利用这一点。这削弱了独立消费网关中仅基于元数据的检测能力。因此，消费网关需要在严格的CPU和延迟预算下，仅使用元数据对加密流量进行流式入侵检测。我们提出了一种针对独立网关的流式入侵检测系统，该系统为每个流实例化一个源自网络优化脉冲（NOS）动力学的轻量级两状态单元，称为NOS-Gate。NOS-Gate对固定长度的元数据特征窗口进行评分，并在K-of-M持久规则下触发可逆缓解措施，在加权公平队列（WFQ）下暂时降低该流的权重。我们使用可执行程序worlds基准测试评估了NOS-Gate在时序控制规避下的性能，该基准测试指定了良性设备进程、可审计的攻击者预算、竞争结构以及数据包级WFQ重放以量化队列影响。所有方法均通过烧入分位数阈值进行无标签校准。在多个可复现的worlds和恶意事件中，在达到0.1%假阳性率的工作点下，NOS-Gate实现了0.952的事件召回率，而最佳基线为0.857。在门控下，它将p99.9排队延迟和p99.9附带延迟降低，CPU上每个流窗口的平均评分成本约为2.09微秒。

英文摘要

Timing and burst patterns can leak through encryption, and an adaptive adversary can exploit them. This undermines metadata-only detection in a stand-alone consumer gateway. Therefore, consumer gateways need streaming intrusion detection on encrypted traffic using metadata only, under tight CPU and latency budgets. We present a streaming IDS for stand-alone gateways that instantiates a lightweight two-state unit derived from Network-Optimised Spiking (NOS) dynamics per flow, named \emph{NOS-Gate}. NOS-Gate scores fixed-length windows of metadata features and, under a $K$-of-$M$ persistence rule, triggers a reversible mitigation that temporarily reduces the flow's weight under weighted fair queueing (WFQ). We evaluate NOS-Gate under timing-controlled evasion using an executable \emph{worlds} benchmark that specifies benign device processes, auditable attacker budgets, contention structure, and packet-level WFQ replay to quantify queue impact. All methods are calibrated label-free via burn-in quantile thresholding. Across multiple reproducible worlds and malicious episodes, at an achieved $0.1\%$ false-positive operating point, NOS-Gate attains 0.952 incident recall versus 0.857 for the best baseline in these runs. Under gating, it reduces p99.9 queueing delay and p99.9 collateral delay with a mean scoring cost of $\approx 2.09\,μ\mathrm{s}$ per flow-window on CPU.
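
A $K$-of-$M$ persistence rule is easy to sketch in isolation: mitigation is active only while at least $K$ of the last $M$ window scores exceed a threshold. The class below is a hypothetical stand-alone illustration, with invented scores and threshold; it does not implement the NOS scoring unit, the burn-in quantile calibration, or the WFQ weight change.

```python
from collections import deque


class KOfMGate:
    """Report True while at least k of the last m window scores exceed the threshold."""

    def __init__(self, k, m, threshold):
        self.k = k
        self.threshold = threshold
        self.hits = deque(maxlen=m)  # sliding record of threshold crossings

    def update(self, score):
        self.hits.append(score > self.threshold)
        return sum(self.hits) >= self.k


gate = KOfMGate(k=3, m=5, threshold=0.8)
scores = [0.9, 0.2, 0.95, 0.1, 0.85, 0.3, 0.2, 0.1]  # hypothetical per-window scores
decisions = [gate.update(s) for s in scores]
print(decisions)
```

Isolated spikes do not trigger the gate, and the decision reverts on its own once the suspicious windows slide out of the history, which matches the "reversible mitigation" described in the abstract.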

2601.00175 2026-06-02 cs.LG 版本更新

Early Prediction of Liver Cirrhosis Up to Two Years in Advance: A Machine Learning Study Benchmarking Against the FIB-4 and APRI Scores

提前两年预测肝硬化：一项机器学习研究，与FIB-4和APRI评分的基准比较

Zhuqi Miao, Ahmed G Qasem, Sujan Ravi, Jason T. Cheng, Abdulaziz Ahmed, Courtney W. Houchen, Sumayah Abed, Dilorom Azimdjanovna Zuparova

Affiliations: Center for Health Systems Innovation, Oklahoma State University; Department of Management Science and Information Systems, Oklahoma State University; Department of Health Services Administration, University of Alabama at Birmingham; Division of Gastroenterology and Hepatology, Department of Medicine, University of Alabama at Birmingham; College of Medicine, The University of Oklahoma Health Campus; Department of Family and Community Medicine, University of Alabama at Birmingham

AI summary: Develops XGBoost models on routinely collected electronic health record data to predict liver cirrhosis one and two years before diagnosis, outperforming the FIB-4 and APRI scores.

AI中文摘要

目的：利用常规收集的电子健康记录（EHR）数据，开发并评估机器学习（ML）模型，用于在诊断前1年和2年预测新发肝硬化（LC），并将其性能与FIB-4和APRI临床评分进行基准比较。方法：我们使用来自大型学术医疗系统的去标识化EHR数据进行了一项回顾性队列研究。针对1年和2年预测窗口开发了XGBoost模型，并应用了模型特定的特征选择和贝叶斯超参数调优以提高预测性能。然后在保留的测试集上评估模型，并使用准确率、精确率、召回率、F1分数、精确率-召回率曲线下面积（PR AUC）和受试者工作特征曲线下面积（AUC）与FIB-4和APRI进行比较。结果：最终建模队列包括1年预测的60,481名患者和2年预测的47,322名患者。在两个预测窗口上，调优后的ML模型均持续优于FIB-4和APRI。XGBoost模型在1年和2年预测中分别达到0.872和0.839的AUC，而FIB-4为0.756和0.723，APRI为0.798和0.761。在精确率-召回率指标上改进更大，XGBoost的PR AUC为0.657和0.562，而FIB-4为0.456和0.373，APRI为0.504和0.421。随着预测窗口延长，性能增益持续存在，表明保持了早期风险区分能力。结论：利用常规EHR数据的机器学习模型在早期预测肝硬化方面显著优于传统的FIB-4和APRI评分。这些模型能够实现更早、更准确的风险分层，并可集成到临床工作流程中作为自动化决策支持工具，以支持主动的肝硬化预防和管理。

英文摘要

Objective: Develop and evaluate machine learning (ML) models for predicting incident liver cirrhosis (LC) one and two years prior to diagnosis using routinely collected electronic health record (EHR) data and benchmark their performance against the FIB-4 and APRI clinical scores. Methods: We conducted a retrospective cohort study using de-identified EHR data from a large academic health system. XGBoost models were developed for 1- and 2-year prediction horizons, with model-specific feature selection and Bayesian hyperparameter tuning applied to improve predictive performance. The models were then evaluated on held-out test sets, and their performance was compared with FIB-4 and APRI using accuracy, precision, recall, F1, area under the precision-recall curve (PR AUC), and area under the receiver operating characteristic curve (AUC). Results: Final modeling cohorts included 60,481 patients for the 1-year prediction and 47,322 for the 2-year prediction. Across both prediction windows, the tuned ML models consistently outperformed FIB-4 and APRI. The XGBoost models achieved AUCs of 0.872 and 0.839 for the 1- and 2-year predictions, respectively, compared with 0.756 and 0.723 for FIB-4 and 0.798 and 0.761 for APRI. Improvements were larger on the precision-recall metric, with PR AUCs of 0.657 and 0.562 for XGBoost compared with 0.456 and 0.373 for FIB-4 and 0.504 and 0.421 for APRI. Performance gains persisted with longer prediction horizons, indicating maintained early risk discrimination. Conclusions: Machine learning models leveraging routine EHR data substantially outperform the traditional FIB-4 and APRI scores for early prediction of liver cirrhosis. These models enable earlier and more accurate risk stratification and can be integrated into clinical workflows as automated decision-support tools to support proactive cirrhosis prevention and management.
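
For readers unfamiliar with the two baseline scores, the sketch below computes them from their standard published definitions, which the abstract itself does not restate: FIB-4 = (age × AST) / (platelets × √ALT) and APRI = (AST / upper limit of normal AST) × 100 / platelets, with AST and ALT in U/L and platelets in 10^9/L. The function names and the patient values are hypothetical, and the study's own preprocessing and cut-offs are not shown.

```python
import math


def fib4(age_years, ast_u_l, alt_u_l, platelets_1e9_l):
    """Fibrosis-4 index: (age * AST) / (platelets * sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_1e9_l * math.sqrt(alt_u_l))


def apri(ast_u_l, ast_upper_limit_u_l, platelets_1e9_l):
    """AST-to-platelet ratio index: (AST / ULN) * 100 / platelets."""
    return (ast_u_l / ast_upper_limit_u_l) * 100.0 / platelets_1e9_l


# Hypothetical patient: 50 years old, AST 40 U/L, ALT 25 U/L, platelets 200 x 10^9/L.
print(fib4(50, 40.0, 25.0, 200.0))  # 2.0
print(apri(40.0, 40.0, 200.0))      # 0.5
```

Both scores depend on only three or four routine lab values, which is why they serve as a natural baseline for models that use the full EHR record.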

2512.22702 2026-06-02 cs.LG 版本更新

Position: Current Benchmarking Hinders Real Progress in Deep Learning for Time Series Forecasting

立场：当前基准测试阻碍了时间序列预测深度学习的真正进展

Valentina Moretti, Ivan Marisca, Cesare Alippi, Andrea Cini

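Affiliations: University of Cambridge

AI summary: Argues that current benchmarking practices fail to identify the design factors that drive performance, in particular overlooked aspects such as globality and locality, which change the class of the forecasting method and strongly affect empirical results, and proposes an auxiliary forecasting model card to improve architecture comparison.

Comments: ICML 2026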

AI中文摘要

深度学习模型在时间序列应用中越来越受欢迎。然而，大量新提出的架构和经常矛盾的实证结果使得评估哪种设计选择和模型组件驱动性能变得困难。在这篇立场论文中，我们认为当前的基准测试实践未能识别导致性能差异的因素，从而减缓了该领域的进展。特别是，在比较架构时，关键设计维度的差异被忽视，最终导致不一致的结果。为了支持我们的立场，我们展示了这些差异——通常被视为单纯的实现细节——可能比采用特定的序列建模层具有更大的影响。我们讨论了被忽视的方面（如全局性和局部性）如何（1）从根本上改变预测方法的类别，以及（2）极大地影响实证结果。我们的发现表明，需要重新思考我们的基准测试实践，并在设计和比较架构时关注预测问题的基本方面。作为具体步骤，我们提出了一个辅助预测模型卡，即一个包含一组字段的模板，用于根据关键设计选择来表征现有和新的预测架构。

英文摘要

Deep learning models have grown popular in time series applications. However, the large quantity of newly proposed architectures and the often contradictory empirical results make it difficult to assess which design choices and model components drive performance. In this position paper, we argue that current benchmarking practices fail to identify the factors responsible for performance differences, thus slowing down progress in the field. In particular, differences in crucial design dimensions are overlooked when comparing architectures, ultimately leading to inconsistent outcomes. To support our position, we show that such differences, often treated as mere implementation details, can have a greater impact than adopting specific sequence modeling layers. We discuss how overlooked aspects (such as globality and locality) can (1) fundamentally change the class of the forecasting method and (2) drastically affect empirical results. Our findings suggest rethinking our benchmarking practices and focusing on the foundational aspects of the forecasting problem when designing and comparing architectures. As a concrete step, we propose an auxiliary forecasting model card, i.e., a template with a set of fields to characterize existing and new forecasting architectures based on key design choices.

2512.20638 2026-06-02 cs.CL cs.AI cs.LG 版本更新

Uncovering Competency Gaps in Large Language Models and Their Benchmarks

揭示大型语言模型及其基准测试中的能力差距

Maty Bohacek, Nino Scherrer, Nicholas Dufour, Thomas Leung, Christoph Bregler, Stephanie C. Y. Chan

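Affiliations: University of Zurich

AI summary: Proposes a new method based on concept activations from sparse autoencoders that automatically uncovers fine-grained, per-concept model weaknesses ("model gaps") and imbalanced benchmark coverage ("benchmark gaps"), grounding evaluation in the model's internal representations and allowing easy comparison across benchmarks.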

Journal ref: ICML 2026

AI中文摘要

大型语言模型的评估严重依赖标准化基准测试。这些基准测试提供了有用的聚合指标，但可能掩盖（i）模型薄弱的特定子领域（“模型差距”）和（ii）基准测试本身的不平衡覆盖（“基准差距”）。为了自动揭示这两类差距，我们提出了一种简单的新方法，利用稀疏自编码器的概念激活，在逐概念基础上识别细粒度差距。该方法还受益于将评估基于模型的内部表示，以及易于跨基准测试进行比较。我们将该方法应用于五个流行的开源模型和十几个基准测试，作为示例说明。作为对该方法的验证，我们发现我们的自动无监督方法能够恢复文献中先前记录的模型差距（例如与谄媚相关的差距），并识别出新的模型差距。我们还能够自动揭示基准差距：应属于给定基准测试范围的核心概念。我们的“能力差距”方法可以通过提供模型行为的概念级分解，并帮助基准测试开发者迭代基准测试设计，来补充现有基准测试。代码可在 https://competency-gaps.github.io 获取。

英文摘要

The evaluation of large language models relies heavily on standardized benchmarks. These benchmarks provide useful aggregated metrics, but can obscure (i) particular sub-areas where the models are weak ("model gaps") and (ii) imbalanced coverage in the benchmarks themselves ("benchmark gaps"). To automatically uncover both types of gaps, we propose a simple new method using concept activations from sparse autoencoders, to identify fine-grained gaps on a per-concept basis. The method also benefits from grounding evaluation in the model's internal representations, as well as easy comparison across benchmarks. We applied the method to five popular open-source models and more than a dozen benchmarks, as illustrative examples. As validation of the approach, we found that our automatic, unsupervised method was able to recover model gaps that have been previously documented in the literature (e.g. relating to sycophancy), in addition to identifying novel model gaps. We were also able to automatically uncover benchmark gaps: core concepts that should fall within the scope of a given benchmark. Our "competency gaps" method can be used to complement existing benchmarks, by providing a concept-level decomposition of model behavior, and by helping benchmark developers iterate upon benchmark design. Code is available at https://competency-gaps.github.io.

2508.20072 2026-06-02 cs.CV cs.LG cs.RO 版本更新

Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies

离散扩散VLA：将离散扩散引入视觉-语言-动作策略中的动作解码

Zhixuan Liang, Yizhuo Li, Tianshuo Yang, Chengyue Wu, Sitong Mao, Liuao Pei, Tian Nian, Shunbo Zhou, Xiaokang Yang, Jiangmiao Pang, Yao Mu, Ping Luo

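Affiliations: University of Science and Technology of China

AI summary: Proposes Discrete Diffusion VLA, which discretizes action chunks and refines them progressively with discrete diffusion inside a unified transformer backbone, enabling an adaptive decoding order and error correction, and achieving strong results on several benchmarks while preserving pretrained vision-language priors.

Comments: Accepted by ICML 2026. 17 pages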

AI中文摘要

视觉-语言-动作（VLA）模型将大型视觉-语言骨干网络适配为将图像和指令映射为机器人动作。然而，当前的VLA要么以固定的从左到右顺序自回归生成动作，性能较差；要么在骨干网络外附加独立的扩散头，这会割裂信息通路并阻碍统一、可扩展的架构。相反，我们提出了离散扩散VLA，它将动作块离散化，并使用离散扩散模式在统一的Transformer骨干内保留渐进细化。我们的方法实现了自适应解码顺序，在解决较难的动作元素之前先解决高置信度的动作元素，并采用二次重掩码来重新审视不确定的预测，从而实现鲁棒的纠错。这种设计保留了预训练的视觉-语言先验，支持并行解码，并提高了效率。离散扩散VLA在LIBERO上达到96.4%的平均成功率，在SimplerEnv-Fractal上达到71.2%的视觉匹配，在SimplerEnv-Bridge上达到54.2%的整体性能。在LIBERO-Goal的分布外测试中，我们的方法仅表现出0.8%的语言退化（相比之下并行解码为8.0%），以及20.4%的视觉退化（相比之下连续扩散为29.0%），表明其很好地保留了预训练的视觉-语言能力。我们还在AgileX Cobot Magic平台上进行了两次真实机器人评估，以展示该方法的有效性。

English abstract

Vision-Language-Action (VLA) models adapt large vision-language backbones to map images and instructions into robot actions. However, prevailing VLAs either generate actions autoregressively in a fixed left-to-right order, with poor performance, or attach separate diffusion heads outside the backbone, which fragments information pathways and hinders unified, scalable architectures. Instead, we present Discrete Diffusion VLA, which discretizes action chunks and models them with a discrete diffusion pattern that retains progressive refinement inside the unified transformer backbone. Our method achieves an adaptive decoding order that resolves high-confidence action elements before harder ones and employs secondary re-masking to revisit uncertain predictions, enabling robust error correction. This design preserves pretrained vision-language priors, supports parallel decoding, and improves efficiency. Discrete Diffusion VLA achieves 96.4% avg. success on LIBERO, 71.2% visual matching on SimplerEnv-Fractal, and 54.2% overall on SimplerEnv-Bridge. On out-of-distribution tests of LIBERO-Goal, our method exhibits only 0.8% language degradation versus 8.0% for parallel decoding, and 20.4% vision degradation versus 29.0% for continuous diffusion, demonstrating good retention of pretrained vision-language capabilities. We also conduct two real-robot evaluations on the AgileX Cobot Magic platform to show the method's effectiveness.
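The decoding loop described in this abstract (commit high-confidence action tokens first, then re-mask uncertain ones) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `MASK`, `decode_chunk`, `toy_predict`, the even commit schedule, and the 0.5 re-masking threshold are all hypothetical, and `toy_predict` stands in for the transformer backbone.

```python
MASK = -1  # hypothetical sentinel for a position that is still masked


def decode_chunk(predict, length, steps, remask_threshold=0.5):
    """Fill a masked action chunk, committing the most confident positions first.

    predict(tokens) returns one (token, confidence) pair per position; it is a
    stand-in for the model and an assumption of this sketch.
    """
    tokens = [MASK] * length
    order = []  # positions in the order they were committed
    for step in range(steps):
        proposals = predict(tokens)
        # Secondary re-masking: reopen committed positions that now look uncertain.
        for i, (_, conf) in enumerate(proposals):
            if tokens[i] != MASK and conf < remask_threshold:
                tokens[i] = MASK
        masked = [i for i in range(length) if tokens[i] == MASK]
        if not masked:
            break
        # Spread the remaining positions evenly over the remaining steps (ceiling division).
        k = max(1, -(-len(masked) // (steps - step)))
        masked.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in masked[:k]:
            tokens[i] = proposals[i][0]
            order.append(i)
    return tokens, order


def toy_predict(tokens):
    # Fixed proposals and confidences, purely for illustration.
    target = [3, 1, 4, 1, 5, 9]
    confidence = [0.9, 0.2, 0.8, 0.4, 0.95, 0.6]
    return list(zip(target, confidence))


tokens, order = decode_chunk(toy_predict, length=6, steps=3)
print(tokens, order)
```

The commit order is driven by confidence rather than by position, which is the contrast with left-to-right autoregressive decoding that the abstract draws.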

2506.13702 2026-06-02 cs.LG cs.AI (updated version)

Value-Free Policy Optimization via Reward Partitioning

通过奖励划分实现无价值函数策略优化

Bilal Faye, Hanane Azzag, Mustapha Lebbah

Affiliations: LIPN, Université Paris 13; Université Paris 13; Université de Versailles Saint-Quentin Paris

AI summary: Proposes Reward Partition Optimization (RPO), which normalizes rewards with a partition-based formulation, removing the need for value function learning and giving stable, efficient policy optimization.

AI Chinese abstract

单轨迹偏好优化方法从((提示, 响应, 奖励))元组的数据集中学习，通过直接利用标量反馈为成对偏好学习提供了一种实用的替代方案。现有方法如直接奖励优化(DRO)已显示出有希望的结果，但依赖于价值函数估计，引入了额外的方差、优化复杂性和对离策略数据的敏感性。我们引入了奖励划分优化(RPO)，一种简单且可扩展的奖励驱动目标，消除了对价值函数学习的需要。RPO通过直接从提示级奖励分布估计的基于划分的公式对奖励进行归一化，产生稳定的监督优化目标，无需辅助模型或强化学习循环。我们使用自动评估指标、LLM作为评判员的评估和优化稳定性分析，在多个编码器-解码器和仅解码器语言模型上评估RPO。实验结果表明，RPO在生成更对齐、更多样化和更少有毒内容的同时，始终优于强基线，包括SFT、KTO和DRO。

English abstract

Single-trajectory preference optimization methods learn from datasets of (prompt, response, reward) tuples, offering a practical alternative to pairwise preference learning by directly leveraging scalar feedback. Existing approaches such as Direct Reward Optimization (DRO) have demonstrated promising results but rely on value function estimation, introducing additional variance, optimization complexity, and sensitivity to off-policy data. We introduce Reward Partition Optimization (RPO), a simple and scalable reward-driven objective that eliminates the need for value function learning. RPO normalizes rewards through a partition-based formulation estimated directly from prompt-level reward distributions, yielding a stable supervised optimization objective without auxiliary models or reinforcement learning loops. We evaluate RPO across multiple encoder-decoder and decoder-only language models using automatic metrics, LLM-as-a-judge evaluations, and optimization stability analyses. Experimental results show that RPO consistently outperforms strong baselines, including SFT, KTO, and DRO, while producing more aligned, diverse, and less toxic generations.

2512.18336 2026-06-02 cs.RO cs.AI cs.LG (updated version)

Dynamic Entropy Tuning in Reinforcement Learning Low-Level Quadcopter Control: Stochasticity vs Determinism

强化学习低层四旋翼控制中的动态熵调节：随机性与确定性

Youssef Mahran, Zeyad Gamal, Ayman El-Badawy

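Affiliations: Mechatronics Engineering Department; The German University in Cairo

AI summary: Studies dynamic entropy tuning in reinforcement learning algorithms that train stochastic policies for quadcopter control, compares them with deterministic-policy algorithms, and finds that dynamic entropy tuning prevents catastrophic forgetting and improves exploration efficiency.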

Comments This is the Author Accepted Manuscript version of a paper accepted for publication. The final published version is available via IEEE Xplore

DOI: 10.1109/ICCTA64612.2024.10974880
Journal ref: 2024 IEEE 34th International Conference on Computer Theory and Applications (ICCTA)

AI Chinese abstract

本文探讨了在训练随机策略的强化学习算法中动态熵调节的影响，并将其性能与训练确定性策略的算法进行了比较。随机策略通过优化动作的概率分布来最大化奖励，而确定性策略则为每个状态选择一个确定的动作。本文研究了使用静态熵和动态熵训练随机策略，然后执行确定性动作来控制四旋翼的效果，并与训练确定性策略并执行确定性动作进行了对比。为此，随机算法选择了软演员-评论家（SAC）算法，确定性算法选择了双延迟深度确定性策略梯度（TD3）算法。训练和仿真结果表明，动态熵调节通过防止灾难性遗忘和提高探索效率，对控制四旋翼产生了积极影响。

English abstract

This paper explores the impact of dynamic entropy tuning in Reinforcement Learning (RL) algorithms that train a stochastic policy. Its performance is compared against algorithms that train a deterministic one. Stochastic policies optimize a probability distribution over actions to maximize rewards, while deterministic policies select a single deterministic action per state. The effect of training a stochastic policy with both static entropy and dynamic entropy and then executing deterministic actions to control the quadcopter is explored. It is then compared against training a deterministic policy and executing deterministic actions. For the purpose of this research, the Soft Actor-Critic (SAC) algorithm was chosen for the stochastic algorithm while the Twin Delayed Deep Deterministic Policy Gradient (TD3) was chosen for the deterministic algorithm. The training and simulation results show the positive effect the dynamic entropy tuning has on controlling the quadcopter by preventing catastrophic forgetting and improving exploration efficiency.
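Dynamic entropy tuning in SAC is commonly implemented as a gradient step on the temperature objective J(alpha) = E[-alpha * (log pi(a|s) + H_target)]. The sketch below shows one such step in plain Python. It assumes that standard formulation; the paper's exact hyperparameters are not given in the abstract, so `target_entropy`, `lr`, and the sample log-probabilities are illustrative values only.

```python
import math


def update_log_alpha(log_alpha, log_probs, target_entropy, lr=1e-2):
    """One gradient-descent step on J(alpha) = mean(-alpha * (log_pi + target_entropy)).

    Optimizing log(alpha) keeps the temperature alpha = exp(log_alpha) positive.
    """
    alpha = math.exp(log_alpha)
    mean_term = sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    grad = -alpha * mean_term  # dJ/d(log_alpha), since d(alpha)/d(log_alpha) = alpha
    return log_alpha - lr * grad


# Policy entropy (about 0.6) is above the target (-4): the temperature decreases.
lowered = update_log_alpha(0.0, log_probs=[-0.5, -0.7], target_entropy=-4.0)
# Policy entropy (about -5.5) is below the target (-4): the temperature increases.
raised = update_log_alpha(0.0, log_probs=[5.0, 6.0], target_entropy=-4.0)
print(lowered, raised)
```

A static-entropy variant would simply keep `log_alpha` fixed, which is the comparison the abstract describes.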

2512.18333 2026-06-02 cs.RO cs.AI cs.LG (updated version)

Reinforcement Learning Position Control of a Quadrotor Using Soft Actor-Critic (SAC)

基于软演员-评论家(SAC)的四旋翼强化学习位置控制

Youssef Mahran, Zeyad Gamal, Ayman El-Badawy

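Affiliations: Mechatronics Engineering Department; The German University in Cairo

AI summary: Proposes a reinforcement-learning thrust-vector control architecture for quadrotors, trained with Soft Actor-Critic; compared with conventional RPM controllers, it trains faster and follows paths more smoothly and accurately.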

Comments This is the Author Accepted Manuscript version of a paper accepted for publication. The final published version is available via IEEE Xplore

DOI: 10.1109/NILES63360.2024.10753187
Journal ref: 2024 IEEE 6th Novel Intelligent and Leading Emerging Sciences Conference (NILES)

AI Chinese abstract

本文提出了一种新的基于强化学习(RL)的四旋翼控制架构。现有文献主要关注直接控制四个旋翼的转速，而本文旨在控制四旋翼的推力矢量。RL智能体计算沿四旋翼z轴的总推力百分比以及期望的滚转角(ϕ)和俯仰角(θ)。然后，智能体将计算出的控制信号连同当前四旋翼的偏航角(ψ)发送给姿态PID控制器。PID控制器再将控制信号映射为电机转速。采用软演员-评论家算法（一种无模型离策略随机RL算法）来训练RL智能体。训练结果表明，与传统的RPM控制器相比，所提出的推力矢量控制器训练时间更短。仿真结果表明，所提出的推力矢量控制器具有更平滑、更精确的路径跟踪性能。

English abstract

This paper proposes a new Reinforcement Learning (RL) based control architecture for quadrotors. With the literature focusing on controlling the four rotors' RPMs directly, this paper aims to control the quadrotor's thrust vector. The RL agent computes the percentage of overall thrust along the quadrotor's z-axis along with the desired Roll ($ϕ$) and Pitch ($θ$) angles. The agent then sends the calculated control signals along with the current quadrotor's Yaw angle ($ψ$) to an attitude PID controller. The PID controller then maps the control signals to motor RPMs. The Soft Actor-Critic algorithm, a model-free off-policy stochastic RL algorithm, was used to train the RL agents. Training results show the faster training time of the proposed thrust vector controller in comparison to the conventional RPM controllers. Simulation results show smoother and more accurate path-following for the proposed thrust vector controller.

2512.02342 2026-06-02 math.OC cs.LG stat.ML (updated version)

Safeguarded Stochastic Polyak Step Sizes for Non-smooth Optimization: Robust Performance Without Small (Sub)Gradients

非光滑优化的保护性随机Polyak步长：无需小(次)梯度的鲁棒性能

Dimitris Oikonomou, Nicolas Loizou

Affiliations: Mathematical Institute for Data Science (MINDS), Johns Hopkins University, Baltimore, MD, USA; Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA

AI summary: For non-smooth convex optimization, proposes a safeguarded stochastic Polyak step size (SPS$_{safe}$) for the stochastic subgradient method, with convergence guarantees that need no strong assumptions and a momentum variant; experiments on deep neural network training indicate robustness to vanishing gradients.

Comments 43rd International Conference on Machine Learning (ICML 2026)

AI Chinese abstract

随机Polyak步长（SPS）已被证明是随机梯度下降（SGD）的一个有前景的选择，在光滑凸和非凸优化问题（包括深度神经网络训练）上，与最先进方法相比具有竞争性能。然而，该方法向非光滑设置的扩展仍处于早期阶段，通常依赖于插值假设或需要知道最优解。在这项工作中，我们为随机次梯度方法提出了一种新的SPS变体——保护性SPS（SPS$_{safe}$），并在无需强假设的情况下为非光滑凸优化提供了严格的收敛保证。我们进一步将动量融入更新规则中，得到了同样严格的理论结果。在凸基准和深度神经网络上的综合实验证实了我们的理论：所提出的步长在现有自适应基线中实现了竞争性能，并在广泛的问题设置中表现出稳定行为。最后，在深度神经网络训练的背景下，我们的步长下的梯度范数不会崩溃到（接近）零，表明了对梯度消失的鲁棒性。

English abstract

The stochastic Polyak step size (SPS) has proven to be a promising choice for stochastic gradient descent (SGD), delivering competitive performance relative to state-of-the-art methods on smooth convex and non-convex optimization problems, including deep neural network training. However, extensions of this approach to non-smooth settings remain in their early stages, often relying on interpolation assumptions or requiring knowledge of the optimal solution. In this work, we propose a novel SPS variant, Safeguarded SPS (SPS$_{safe}$), for the stochastic subgradient method, and provide rigorous convergence guarantees for non-smooth convex optimization with no need for strong assumptions. We further incorporate momentum into the update rule, yielding equally tight theoretical results. Comprehensive experiments on convex benchmarks and deep neural networks corroborate our theory: the proposed step size achieves competitive performance to existing adaptive baselines and exhibits stable behavior across a wide range of problem settings. Finally, in the context of deep neural network training, the gradient norms under our step size do not collapse to (near) zero, indicating robustness to vanishing gradients.
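For orientation, the classic capped Polyak step size that this line of work builds on is gamma = min((f(x) - f*) / (c * ||g||^2), gamma_b), where g is a (sub)gradient. The sketch below applies it to a small non-smooth convex function. It is not the paper's SPS$_{safe}$ rule, whose definition the abstract does not give; the objective, `c`, `gamma_b`, and the deterministic (full-subgradient) setting are assumptions chosen to keep the example short.

```python
def sign(v):
    return (v > 0) - (v < 0)


def f(x):
    # Non-smooth convex objective with minimum value 0 at (1, -2).
    return abs(x[0] - 1.0) + 2.0 * abs(x[1] + 2.0)


def subgrad(x):
    # One valid subgradient of f at x.
    return [float(sign(x[0] - 1.0)), 2.0 * sign(x[1] + 2.0)]


def polyak_step(x, f_star=0.0, c=2.0, gamma_b=10.0):
    """One subgradient step with the capped Polyak step size."""
    g = subgrad(x)
    g_norm2 = sum(gi * gi for gi in g)
    if g_norm2 == 0.0:
        return x  # zero subgradient: x is already a minimizer
    gamma = min((f(x) - f_star) / (c * g_norm2), gamma_b)
    return [xi - gamma * gi for xi, gi in zip(x, g)]


x = [0.0, 0.0]
for _ in range(40):
    x = polyak_step(x)
print(x, f(x))
```

With `c = 2` the objective value halves at every step on this problem. Note that the step size divides by the squared subgradient norm, which is the quantity the title's "without small (sub)gradients" refers to.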

2512.13356 2026-06-02 cs.RO cs.AI cs.LG (updated version)

Control of a Twin Rotor using Twin Delayed Deep Deterministic Policy Gradient (TD3)

使用双延迟深度确定性策略梯度（TD3）控制双旋翼系统

Zeyad Gamal, Youssef Mahran, Ayman El-Badawy

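Affiliations: Mechatronics Engineering Department; The German University in Cairo

AI summary: Proposes a TD3-based reinforcement learning framework that stabilizes a twin rotor aerodynamic system at given pitch and azimuth angles and tracks trajectories; the controller is evaluated against conventional PID controllers under wind disturbances in simulation and validated on a laboratory setup.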

Comments This is the Author Accepted Manuscript version of a paper accepted for publication. The final published version is available via IEEE Xplore

DOI: 10.1109/ICSTCC62912.2024.10744717
Journal ref: 2024 28th IEEE International Conference on System Theory, Control and Computing (ICSTCC)

AI Chinese abstract

本文提出了一种强化学习（RL）框架，用于在特定俯仰角和方位角下控制和稳定双旋翼气动系统（TRAS），并跟踪给定轨迹。TRAS的复杂动力学和非线性特性使得使用传统控制算法进行控制具有挑战性。然而，近年来RL的发展因其在多旋翼控制中的潜在应用而引起了兴趣。本文使用双延迟深度确定性策略梯度（TD3）算法来训练RL智能体。该算法适用于具有连续状态和动作空间的环境（类似于TRAS），因为它不需要系统的模型。仿真结果展示了RL控制方法的有效性。接下来，使用风扰形式的外部扰动来测试控制器与传统PID控制器相比的有效性。最后，在实验室装置上进行了实验，以确认控制器在实际应用中的有效性。

English abstract

This paper proposes a reinforcement learning (RL) framework for controlling and stabilizing the Twin Rotor Aerodynamic System (TRAS) at specific pitch and azimuth angles and tracking a given trajectory. The complex dynamics and non-linear characteristics of the TRAS make it challenging to control using traditional control algorithms. However, recent developments in RL have attracted interest due to their potential applications in the control of multirotors. The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm was used in this paper to train the RL agent. This algorithm is used for environments with continuous state and action spaces, similar to the TRAS, as it does not require a model of the system. The simulation results illustrated the effectiveness of the RL control method. Next, external disturbances in the form of wind disturbances were used to test the controller's effectiveness compared to conventional PID controllers. Lastly, experiments on a laboratory setup were carried out to confirm the controller's effectiveness in real-world applications.

2508.11931 2026-06-02 cs.LG (updated version)

An Improved Algorithm for Adversarial Linear Contextual Bandits via Reduction

通过归约改进的对抗线性上下文赌博机算法

Tim van Erven, Jack Mayo, Julia Olkhovskaya, Chen-Yu Wei

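Affiliations: University of Amsterdam; Delft University of Technology; University of Virginia

AI summary: Proposes an oracle-efficient, reduction-based algorithm that converts linear contextual bandits with adversarial losses and stochastic action sets into misspecification-robust adversarial linear bandits, achieving near-optimal regret in polynomial time.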

AI Chinese abstract

我们提出了一种 oracle 高效、接近最优的算法，用于具有对抗损失和随机动作集的线性上下文赌博机，每轮仅需对动作集进行线性优化 oracle 调用。我们的方法将该问题归约为具有固定动作集的、对模型误设鲁棒的对抗线性赌博机。在不知道上下文分布或无法访问上下文模拟器的情况下，该算法实现了 $\widetilde{\mathcal{O}}(\min\{d^2\sqrt{T}, \sqrt{d^3T\log K}\})$ 的遗憾，运行时间为 $\mathrm{poly}(d,T)$ 加上 $\mathrm{poly}(d,T)$ 次线性优化 oracle 调用，其中 $d$ 是特征维度，$K$ 是每轮动作数的上界，$T$ 是轮数。这解决了 Liu 等人 (2023) 提出的开放问题：是否可以在与动作数无关的多项式时间内获得 $\mathrm{poly}(d)\sqrt{T}$ 的遗憾。对于具有对抗损失和随机动作集的组合赌博机这一重要类别，我们的算法是首个在多项式时间内实现 $\mathrm{poly}(d)\sqrt{T}$ 遗憾的算法，而据我们所知，此前没有算法能在多项式时间内达到甚至 $o(T)$ 的遗憾。当模拟器可用时，遗憾界可以改进为 $\widetilde{\mathcal{O}}(d\sqrt{L^\star})$，其中 $L^\star$ 是最优策略的累积损失。

English abstract

We present an oracle-efficient, near-optimal algorithm for linear contextual bandits with adversarial losses and stochastic action sets, only requiring a linear optimization oracle for the action sets in each round. Our approach reduces this setting to misspecification-robust adversarial linear bandits with fixed action sets. Without knowledge of the context distribution or access to a context simulator, the algorithm achieves $\widetilde{\mathcal{O}}(\min\{d^2\sqrt{T}, \sqrt{d^3T\log K}\})$ regret and runs in $\mathrm{poly}(d,T)$ time plus $\mathrm{poly}(d,T)$ calls to the linear optimization oracles, where $d$ is the feature dimension, $K$ is an upper bound on the number of actions in each round, and $T$ is the number of rounds. This resolves the open question by Liu et al. (2023) on whether one can obtain $\mathrm{poly}(d)\sqrt{T}$ regret in polynomial time independent of the number of actions. For the important class of combinatorial bandits with adversarial losses and stochastic action sets, our algorithm is the first to achieve $\mathrm{poly}(d)\sqrt{T}$ regret in polynomial time, while no prior algorithm achieves even $o(T)$ regret in polynomial time to our knowledge. When a simulator is available, the regret bound can be improved to $\widetilde{\mathcal{O}}(d\sqrt{L^\star})$, where $L^\star$ is the cumulative loss of the best policy.

2511.01064 2026-06-02 stat.ML cs.LG stat.CO (updated version)

Generalized Guarantees for Variational Inference in the Presence of Even and Elliptical Symmetry

存在偶对称和椭圆对称时变分推断的广义保证

Charles C. Margossian, Isaac E. Rankin, Lawrence K. Saul

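Affiliations: University of British Columbia, Department of Statistics; Flatiron Institute, Center for Computational Mathematics

AI summary: Shows that, for all $f$-divergences, stationary points of variational inference recover the mean of the target density under even symmetry and its correlation matrix under elliptical symmetry, generalizing an earlier result for the reverse KL divergence.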

AI Chinese abstract

变分推断（VI）通过在易处理的分布族中寻找最佳匹配$q$来近似目标密度$p$。最佳变分近似通过最小化分布之间的散度$D(p||q)$得到，目前已提出多种散度作为VI的目标函数，不同选择导致不同近似。我们证明，即使这些散度具有不同的最小化器，所得近似都遵循某些对称匹配原则。具体来说，我们的结果适用于所有$f$-散度，这是一大类包括逆和前向Kullback-Leibler散度以及$\alpha$-散度的散度。我们证明，在存在偶对称时，$f$-散度的任何驻点都保证恢复$p$的均值；同样，在存在椭圆对称时，任何驻点都保证恢复其相关矩阵。为获得这些保证，我们假设$p$和$q$是单峰的，但值得注意的是，我们不要求它们是对数凹、轻尾或处处光滑的。这些保证推广了先前对逆Kullback-Leibler散度在$p$为对数凹时得到的结果。它们还扩展到目标密度$p$仅在其部分坐标上呈现对称性的情况。这些部分对称性自然出现在贝叶斯层次模型中，其中先验诱导出具有挑战性的几何结构，但仍具有对称轴。

English abstract

Variational inference (VI) approximates a target density $p$ by the best match $q$ in a family of tractable distributions. The best variational approximation is found by minimizing a divergence between distributions, $D(p||q)$, and several divergences have been proposed as objective functions for VI, with different choices leading to different approximations. We show that even when these divergences have different minimizers, the resulting approximations all abide by certain symmetry-matching principles. Specifically, our results hold for all $f$-divergences, a broad class which includes the reverse and forward Kullback-Leibler divergences and the $α$-divergences. We show that in the presence of even symmetry, any stationary point of an $f$-divergence is guaranteed to recover the mean of $p$ and likewise, in the presence of elliptical symmetry, any stationary point is guaranteed to recover its correlation matrix. To obtain these guarantees we assume that $p$ and $q$ are unimodal, but notably we do not require them to be log-concave, light-tailed, or even everywhere-smooth. These guarantees generalize a previous result obtained for the reverse Kullback-Leibler divergence when $p$ is log-concave. They also extend to cases where the target density $p$ only exhibits symmetry along some but not all of its coordinates. These partial symmetries arise naturally in Bayesian hierarchical models, where the prior induces a challenging geometry but still possesses axes of symmetry.
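The mean-recovery claim can be checked numerically in one dimension. The sketch below is an illustration, not a proof and not the authors' code: it takes a Laplace target that is symmetric about `MU` and not smooth at its peak, fits a unit-variance Gaussian by minimizing the reverse KL divergence (one member of the $f$-divergence family) over a grid of candidate means, and checks that the minimizer sits at the centre of symmetry. The target, the grid, and the integration range are all assumptions of this example.

```python
import math

MU = 1.5  # centre of symmetry of the target (illustrative choice)
LOG_NORM = math.log(math.sqrt(2.0 * math.pi))


def log_p(x):
    # Laplace target with unit scale: even-symmetric about MU, kinked at its peak.
    return -abs(x - MU) - math.log(2.0)


def reverse_kl(m, lo=-10.0, hi=13.0, n=4600):
    """KL(q || p) for q = N(m, 1), computed with the midpoint rule."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h
        log_q = -0.5 * (x - m) ** 2 - LOG_NORM
        total += math.exp(log_q) * (log_q - log_p(x)) * h
    return total


candidates = [i * 0.05 for i in range(61)]  # candidate means in [0, 3]
best_m = min(candidates, key=reverse_kl)
print(best_m)
```

Swapping in another symmetric target, or another divergence, is a quick way to probe the scope of the statement on small examples.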

2409.03915 2026-06-02 cs.LG math.OC (updated version)

Asynchronous Stochastic Approximation with Applications to Average-Reward Reinforcement Learning

异步随机逼近及其在平均奖励强化学习中的应用

Huizhen Yu, Yi Wan, Richard S. Sutton

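Affiliations: Department of Computing Science, University of Alberta; Alberta Machine Intelligence Institute (Amii)

AI summary: Studies the stability and convergence of asynchronous stochastic approximation by extending the Borkar-Meyn stability proof method and the Hirsch-Benaïm dynamical systems approach, providing a theoretical foundation for relative value iteration algorithms in average-reward reinforcement learning.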

Comments 34 pages. This version contains only the asynchronous stochastic approximation material from version 2 of the original report; the reinforcement-learning material has been moved to a separate, stand-alone paper (arXiv:2512.06218). Minor corrections and additional remarks have been incorporated. A shorter version of this paper is to appear in the SIAM Journal on Control and Optimization

DOI: 10.1137/25M1769806
Journal ref: SIAM Journal on Control and Optimization, 64(3):1456-1481, 2026

AI Chinese abstract

本文研究了异步随机逼近（SA）算法的稳定性和收敛性质，重点关注与平均奖励强化学习相关的扩展。我们首先扩展了Borkar和Meyn的稳定性证明方法，以适应比先前考虑的更一般的噪声条件，从而为异步SA提供了更广泛的收敛保证。为了深化收敛性分析，我们进一步基于Hirsch和Benaïm的动力学系统方法，研究了异步SA的阴影性质。这些结果为在配套论文中开发和分析的一类基于相对值迭代的强化学习算法提供了理论基础，用于求解平均奖励马尔可夫和半马尔可夫决策过程。

English abstract

This paper investigates the stability and convergence properties of asynchronous stochastic approximation (SA) algorithms, with a focus on extensions relevant to average-reward reinforcement learning. We first extend a stability proof method of Borkar and Meyn to accommodate more general noise conditions than previously considered, thereby yielding broader convergence guarantees for asynchronous SA. To sharpen the convergence analysis, we further examine the shadowing properties of asynchronous SA, building on a dynamical systems approach of Hirsch and Benaïm. These results provide a theoretical foundation for a class of relative value iteration-based reinforcement learning algorithms -- developed and analyzed in a companion paper -- for solving average-reward Markov and semi-Markov decision processes.

2512.07795 2026-06-02 cs.AI cs.CL cs.LG (updated version)

ReasonBENCH: Benchmarking the (In)Stability of LLM Reasoning

ReasonBENCH: 基准测试LLM推理的（不）稳定性

Nearchos Potamitis, Vansh Ramani, Har Ashish Arora, Dhairya Kuchhal, Lars Klein, Akhil Arora

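Affiliations: Aarhus University; Indian Institute of Technology Delhi; EPFL

AI summary: Introduces the ReasonBench benchmark suite, which uses 30 independent trials to show that LLM reasoning systems exhibit structured variance even under greedy decoding; a taxonomy of Global Noise and Run Noise supports the claim that instability is an inherent property of reasoning systems and motivates distribution-aware evaluation.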

Comments 29 pages, 19 tables, 85 figures

AI Chinese abstract

LLM推理系统的基准分数被报告为单一数字，然而相同的模型、策略和任务在重复执行时，即使在贪婪解码（T=0）下也会产生显著不同的答案和成本。这种方差并非统计上的麻烦：性能最高的策略在与最接近的对手进行头对头运行时仅获胜77%，这意味着单次观测到的分数可能会无声地错误排序系统。我们引入了ReasonBench，一个基准套件，记录了10种推理策略、12个模型和6个任务的30次独立试验，将质量和成本视为分布而非点估计。我们发现这种方差是有结构的而非随机的：一个双组分分类法——全局噪声（捕捉跨基准的不均匀性）和运行噪声（捕捉基准内的随机性）——揭示了策略架构预测稳定性分布，而模型和策略则移动分布的正交方面。层次分解将四分之三的分数方差归因于基准、系统和项目结构，而单次运行评估无声地吸收了持久的残差。最后，成本和质量非对称地解耦：廉价方法在结构上对联合成本-质量失败免疫，而昂贵方法无论其准确性如何仍然暴露。这些发现确立了不稳定性作为推理系统的固有属性，并促使分布感知评估成为标准实践。

English abstract

Benchmark scores for LLM reasoning systems are reported as single numbers, yet the same model, strategy, and task can produce meaningfully different answers and costs across repeated executions, even under greedy decoding (T = 0). This variance is not a statistical nuisance: the highest-performing strategy wins only 77% of head-to-head runs against its nearest competitor, meaning a single observed score can silently misrank systems. We introduce ReasonBench, a benchmark suite recording 30 independent trials across 10 reasoning strategies, 12 models, and 6 tasks, treating quality and cost as distributions rather than point estimates. We find that this variance is structured rather than random: a two-component taxonomy -- Global Noise, capturing cross-benchmark unevenness, and Run Noise, capturing within-benchmark stochasticity -- reveals that strategy architecture predicts stability profiles, while models and strategies shift orthogonal aspects of the distribution. A hierarchical decomposition attributes three-quarters of score variance to benchmark, system, and item structure, with a persistent residual that single-run evaluation silently absorbs. Finally, cost and quality decouple asymmetrically: cheap methods are structurally immune to joint cost-quality failure, while expensive methods remain exposed regardless of their accuracy. These findings establish instability as an inherent property of reasoning systems and motivate distribution-aware evaluation as standard practice.
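To make the head-to-head idea concrete, the sketch below compares two systems over repeated runs instead of single scores. The scores are invented for illustration, and the pairing rule (every run of A against every run of B, ties counted as half a win) is an assumption; the abstract does not say how the paper pairs runs.

```python
from statistics import mean, stdev


def head_to_head_win_rate(scores_a, scores_b):
    """Fraction of (run of A, run of B) pairs that A wins; ties count as half."""
    wins = 0.0
    for a in scores_a:
        for b in scores_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(scores_a) * len(scores_b))


# Hypothetical per-run accuracies for two reasoning strategies.
runs_a = [0.80, 0.60, 0.82]  # higher mean, but unstable
runs_b = [0.70, 0.71, 0.69]  # lower mean, very stable

rate = head_to_head_win_rate(runs_a, runs_b)
print(f"A: mean={mean(runs_a):.3f} sd={stdev(runs_a):.3f}")
print(f"B: mean={mean(runs_b):.3f} sd={stdev(runs_b):.3f}")
print(f"A beats B in {rate:.1%} of paired runs")
```

Here A has the better mean yet loses a third of the pairings, which is the kind of misranking risk a single observed score hides.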

2512.06906 2026-06-02 cs.SE cs.CR cs.DB cs.LG (updated version)

MINES: Explainable Anomaly Detection through Web API Invariant Inference

MINES：通过Web API不变式推断实现可解释的异常检测

Wenjie Zhang, Yun Lin, Chun Fung Amos Kwok, Xiwen Teoh, Xiaofei Xie, Frank Liauw, Hongyu Zhang, Jin Song Dong

Affiliations: National University of Singapore; Shanghai Jiao Tong University; Singapore Management University; GovTech Singapore; Chongqing University

AI summary: Proposes MINES, which detects web application anomalies by inferring explainable API invariants at the schema level, achieving high recall with almost zero false positives.

Comments Accepted by ICSE 2026

DOI: 10.1145/3744916.3773160

AI Chinese abstract

检测Web应用的异常对于提供可靠的Web服务至关重要，这些应用是现代公司和政府运行的重要基础设施。许多现代Web应用基于Web API（例如RESTful、SOAP和WebSockets）运行，其暴露性会招致有意攻击或无意非法访问，导致系统行为异常。然而，此类异常可能与正常日志共享非常相似的日志，缺少用于日志区分的關鍵信息（可能存在于数据库中）。此外，日志实例可能包含噪声，这会进一步误导最先进的日志学习解决方案学习虚假相关性，从而产生用于异常检测的浅层模型和规则。在这项工作中，我们提出MINES，它从模式级别而非详细的原始日志实例推断可解释的API不变式用于异常检测，能够（1）显著区分日志中的噪声以识别精确的正常行为，以及（2）检测超出已记录日志的异常行为。技术上，MINES（1）将API签名转换为表模式以增强原始数据库模式；（2）在增强的数据库模式上推断潜在的数据库约束，以捕获API与数据库表之间的潜在关系。MINES使用LLM基于两个给定的表结构提取潜在关系，并使用正常日志实例拒绝或接受LLM生成的不变式。最后，MINES将推断的约束转换为不变式，生成用于验证运行时日志的Python代码。我们在TrainTicket、NiceFish、Gitea、Mastodon和NextCloud基准测试上针对Web篡改攻击，与LogRobust、LogFormer和WebNorm等基线进行了广泛评估。结果表明，MINES在引入几乎零误报的情况下实现了对异常的高召回率，代表了新的最先进水平。

English abstract

Detecting anomalies in web applications, which are important infrastructure for running modern companies and governments, is crucial for providing reliable web services. Many modern web applications operate on web APIs (e.g., RESTful, SOAP, and WebSockets); their exposure invites intended attacks or unintended illegal visits, causing abnormal system behaviors. However, the logs of such anomalies can be very similar to normal logs and can miss crucial information (which could be in the database) for log discrimination. Further, log instances can also be noisy, which can mislead state-of-the-art log learning solutions into learning spurious correlations, resulting in superficial models and rules for anomaly detection. In this work, we propose MINES, which infers explainable API invariants for anomaly detection from the schema level instead of detailed raw log instances, and which can (1) significantly discriminate noise in logs to identify precise normalities and (2) detect abnormal behaviors beyond the instrumented logs. Technically, MINES (1) converts API signatures into table schemas to enhance the original database schema; and (2) infers the potential database constraints on the enhanced database schema to capture the potential relationships between APIs and database tables. MINES uses an LLM to extract potential relationships based on two given table structures, and uses normal log instances to reject or accept LLM-generated invariants. Finally, MINES translates the inferred constraints into invariants to generate Python code for verifying the runtime logs. We extensively evaluate MINES on web-tamper attacks on the benchmarks of TrainTicket, NiceFish, Gitea, Mastodon, and NextCloud against baselines such as LogRobust, LogFormer, and WebNorm. The results show that MINES achieves high recall for the anomalies while introducing almost zero false positives, indicating a new state-of-the-art.
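The abstract says inferred constraints become Python checks over runtime logs. The sketch below shows what one such check might look like. Everything in it is hypothetical: the `update_order` API, the `orders` table, the field names, and the ownership invariant are invented for illustration and are not taken from MINES or its benchmarks.

```python
def check_order_owner(log_entry, orders):
    """Hypothetical invariant: whoever calls update_order must own the order row.

    log_entry is one parsed API log record; orders maps order_id to a database row.
    Returns True when the entry satisfies the invariant or the invariant does not apply.
    """
    if log_entry["api"] != "update_order":
        return True
    row = orders.get(log_entry["order_id"])
    return row is not None and row["owner_id"] == log_entry["user_id"]


orders = {101: {"owner_id": "alice"}, 102: {"owner_id": "bob"}}

normal = {"api": "update_order", "user_id": "alice", "order_id": 101}
tampered = {"api": "update_order", "user_id": "alice", "order_id": 102}
unrelated = {"api": "list_orders", "user_id": "alice"}

for entry in (normal, tampered, unrelated):
    print(entry["api"], check_order_owner(entry, orders))
```

The tampered request looks like the normal one in the log alone; it is flagged only because the check joins the log entry against database state, which is the point the abstract makes about log-only approaches.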

2511.20639 2026-06-02 cs.CL cs.AI cs.LG (updated version)

Latent Collaboration in Multi-Agent Systems

多智能体系统中的潜在协作

Jiaru Zou, Ruizhong Qiu, Gaotang Li, Xiyuan Yang, Katherine Tieu, Pan Lu, Ke Shen, Hanghang Tong, Yejin Choi, Jingrui He, James Zou, Mengdi Wang, Ling Yang

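Affiliations: University of Washington

AI summary: Proposes LatentMAS, a framework in which LLM agents collaborate directly in continuous latent space without text mediation, with higher accuracy, lower overhead, and faster inference.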

Comments ICML2026 Spotlight, Project: https://github.com/Gen-Verse/LatentMAS

AI Chinese abstract

多智能体系统（MAS）将大语言模型（LLM）从独立的单模型推理扩展到协同的系统级智能。现有LLM智能体依赖基于文本的中介进行推理和通信，而我们更进一步，使模型能够在连续潜在空间内直接协作。我们引入了LatentMAS，一个端到端无需训练的框架，实现了LLM智能体间的纯潜在协作。在LatentMAS中，每个智能体首先通过最后一层的隐藏嵌入而非文本进行自回归潜在思维生成。然后，一个共享的潜在工作记忆保存并传递每个智能体的内部表示和潜在思维，确保无需重新编码的无损信息交换。我们提供了详细的理论分析，表明LatentMAS比基于文本的标准MAS具有更高的表达能力和无损信息保存能力，且整体复杂度更低。此外，在涵盖数学和科学推理、常识理解及代码生成的9个综合基准测试上的实证评估表明，LatentMAS优于先进的单智能体和基于文本的MAS基线，准确率最高提升14.6%，输出token使用量减少70.8%-83.7%，端到端推理速度提升4倍至4.3倍。代码和数据完全开源：https://github.com/Gen-Verse/LatentMAS。

English abstract

Multi-agent systems (MAS) extend large language models (LLMs) from independent single-model reasoning to coordinative system-level intelligence. While existing LLM agents depend on text-based mediation for reasoning and communication, we take a step forward by enabling models to collaborate directly within the continuous latent space. We introduce LatentMAS, an end-to-end training-free framework that enables pure latent collaboration among LLM agents. In LatentMAS, each agent first performs auto-regressive latent thoughts generation through last-layer hidden embeddings instead of text. Then, a shared latent working memory preserves and transfers each agent's internal representations and latent thoughts, ensuring lossless information exchange without re-encoding. We provide detailed theoretical analyses showing that LatentMAS achieves higher expressiveness and lossless information preservation with lower overall complexity than standard text-based MAS. In addition, empirical evaluations across 9 comprehensive benchmarks spanning math and science reasoning, commonsense understanding, and code generation show that LatentMAS outperforms advanced single agents and text-based MAS baselines, achieving up to 14.6% higher accuracy, reducing output token usage by 70.8%-83.7%, and providing 4$\times$-4.3$\times$ faster end-to-end inference. Code and data are fully open-sourced at https://github.com/Gen-Verse/LatentMAS.

2511.08223 2026-06-02 stat.CO cs.LG cs.NA math.NA (updated version)

High-Performance Variance-Covariance Matrix Construction Using an Uncentered Gram Formulation

使用非中心Gram形式的高性能方差-协方差矩阵构建

Felix Reichel

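Affiliations: Department of Economics, Johannes Kepler University Linz

AI summary: Shows that an uncentered Gram matrix plus a correction term is equivalent to the pairwise-difference definition of the variance-covariance matrix, which avoids explicit centering and reduces the computation to one p×p outer product and one subtraction; Python benchmarks show substantially faster run times.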

Comments 17 pages, 9 figures, 1 table
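The identity behind the summary above is the textbook one: for an n×p data matrix X with column sums s, the sample covariance equals (XᵀX − s sᵀ/n) / (n − 1). The sketch below checks it against the explicitly centered computation in plain Python. It illustrates the identity only; the function names and data are assumptions, and it makes no attempt to reproduce the paper's benchmarks.

```python
def covariance_gram(rows):
    """Sample covariance from the uncentered Gram matrix X^T X and one correction term."""
    n, p = len(rows), len(rows[0])
    col_sums = [sum(r[j] for r in rows) for j in range(p)]
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    return [[(gram[i][j] - col_sums[i] * col_sums[j] / n) / (n - 1)
             for j in range(p)] for i in range(p)]


def covariance_centered(rows):
    """Reference computation that centers every column explicitly."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


data = [[1.0, 2.0], [2.0, 4.0], [3.0, 7.0], [4.0, 8.0]]
cov_gram = covariance_gram(data)
cov_ref = covariance_centered(data)
print(cov_gram)
```

One caveat worth keeping in mind: when column means are large relative to the spread, the subtraction can lose precision in floating point, so the uncentered form trades some numerical robustness for speed.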

2510.09330 2026-06-02 cs.LG (updated version)

Safety Game: Inference-Time Alignment of Black-Box LLMs via Constrained Optimization

安全博弈：通过约束优化实现黑盒大语言模型的推理时对齐

Tuan Nguyen, Long Tran-Thanh

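Affiliations: University of Southampton

AI summary: Proposes a black-box safety alignment framework that needs no retraining or access to model internals: the trade-off between safety and helpfulness is modeled as a two-player zero-sum game, and a linear programming solver computes the equilibrium strategy at inference time.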

AI Chinese abstract

确保大语言模型（LLM）遵守安全要求是AI部署中的核心挑战。现有的对齐方法主要在训练期间进行，例如通过微调或基于人类反馈的强化学习，但这些方法成本高昂且不灵活，每当出现新需求时都需要重新训练。最近针对推理时对齐的努力缓解了其中一些限制，但仍然假设可以访问模型内部结构，这在实际中不可行，也不适用于无法访问模型的第三方利益相关者。在这项工作中，我们提出了一种与模型无关的黑盒安全对齐框架，无需重新训练或访问底层LLM架构。作为概念验证，我们解决了在生成安全但无信息的回答与有用但潜在风险的回答之间进行权衡的问题。我们将这一困境建模为一个二人零和博弈，其极小极大均衡捕捉了安全性与帮助性之间的最优平衡。LLM智能体通过推理时使用线性规划求解器计算均衡策略来操作这一框架。我们的结果证明了黑盒安全对齐的可行性，为包括小型组织和资源受限环境中的实体在内的利益相关者提供了一种可扩展且可访问的途径，以在快速演变的LLM生态系统中强制执行安全。

English abstract

Ensuring that large language models (LLMs) comply with safety requirements is a central challenge in AI deployment. Existing alignment approaches primarily operate during training, such as through fine-tuning or reinforcement learning from human feedback, but these methods are costly and inflexible, requiring retraining whenever new requirements arise. Recent efforts toward inference-time alignment mitigate some of these limitations but still assume access to model internals, which is impractical, and not suitable for third party stakeholders who do not have access to the models. In this work, we propose a model-independent, black-box framework for safety alignment that does not require retraining or access to the underlying LLM architecture. As a proof of concept, we address the problem of trading off between generating safe but uninformative answers versus helpful yet potentially risky ones. We formulate this dilemma as a two-player zero-sum game whose minimax equilibrium captures the optimal balance between safety and helpfulness. LLM agents operationalize this framework by leveraging a linear programming solver at inference time to compute equilibrium strategies. Our results demonstrate the feasibility of black-box safety alignment, offering a scalable and accessible pathway for stakeholders, including smaller organizations and entities in resource-constrained settings, to enforce safety across rapidly evolving LLM ecosystems.

2512.02328 2026-06-02 q-bio.QM cs.LG (updated version)

Molecular Embedding-Based Algorithm Selection in Protein-Ligand Docking

基于分子嵌入的蛋白质-配体对接算法选择

Jiabao Brad Wang, Siyuan Cao, Hongxuan Wu, Yiliang Yuan, Mustafa Misir

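Affiliations: Division of Natural and Applied Sciences, Duke Kunshan University; Machine Learning Department, Mohamed bin Zayed University of Artificial Intelligence

AI summary: Proposes MolAS, a lightweight algorithm-selection model that predicts docking algorithm performance from pretrained protein and ligand embeddings using attentional pooling and a shallow residual decoder; across five benchmarks it improves on the single-best solver by up to 15 percentage points and closes 17-66% of the gap to the virtual best solver.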

Comments 40 pages, 16 figures, 8 tables; updated to the accepted manuscript version

DOI: 10.1186/s13321-026-01168-8
Journal ref: J Cheminform 18, 47 (2026)

AI Chinese abstract

选择有效的对接算法高度依赖于具体情境，没有单一方法能在结构、化学和协议范围内可靠地表现。MolAS是一种轻量级算法选择模型，通过注意力池化和浅层残差解码器，从预训练的蛋白质和配体嵌入中预测每个算法的性能。使用数百到数千个标记复合物，MolAS在五个对接基准上相比单一最优算法（SBS）实现了高达15个百分点的绝对改进，并缩小了虚拟最优算法（VBS）与SBS之间17-66%的差距。对选择频率、边际条件可靠性和基准级预言结构分析表明，当工作流定义的预言景观具有低胜者熵和合理可分离的顶级求解器区域时，MolAS最有效，但在协议不匹配导致求解器排名变化和诱导标签改变时性能下降。这些结果表明，在评估的范围内，鲁棒性受限于工作流和协议引起的求解器层次不稳定性，而非表示能力，将MolAS定位为固定管线的领域内选择器以及评估对接算法选择是否适定的诊断工具。

English abstract

Selecting an effective docking algorithm is highly context-dependent, and no single method performs reliably across structural, chemical, and protocol regimes. MolAS is a lightweight algorithm-selection model that predicts per-algorithm performance from pretrained protein and ligand embeddings using attentional pooling and a shallow residual decoder. With hundreds to a few thousand labelled complexes, MolAS achieves up to a 15 percentage-point absolute improvement over the single-best solver (SBS) and closes 17-66% of the gap between the Virtual Best Solver (VBS) and the SBS across five docking benchmarks. Analyses of selection frequencies, margin-conditioned reliability, and benchmark-level oracle structure indicate that MolAS is most effective when the workflow-defined oracle landscape has low winner entropy and a reasonably separable top-solver region, but degrades under protocol mismatch that shifts solver rankings and changes the induced labels. These results suggest that, in the evaluated regime, robustness is limited less by representational capacity than by workflow- and protocol-induced instability in solver hierarchies, positioning MolAS as an in-domain selector for fixed pipelines and as a diagnostic tool for assessing when docking algorithm selection is well-posed.
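The "fraction of the VBS-SBS gap closed" figure is a standard algorithm-selection metric: (selector − SBS) / (VBS − SBS), where SBS is the best single solver on average and VBS picks the best solver per instance. The sketch below computes it on invented data. The solver names, the 0/1 success scores, and the selector's choices are hypothetical and unrelated to the paper's benchmarks.

```python
def gap_closed(perf, choices):
    """Fraction of the VBS-SBS gap closed by a selector (higher scores are better).

    perf maps solver name to a list of per-instance scores;
    choices lists the solver the selector picked for each instance.
    """
    n = len(choices)
    sbs = max(sum(scores) / n for scores in perf.values())
    vbs = sum(max(scores[i] for scores in perf.values()) for i in range(n)) / n
    selected = sum(perf[solver][i] for i, solver in enumerate(choices)) / n
    return (selected - sbs) / (vbs - sbs)


# Hypothetical docking success (1) or failure (0) on eight complexes.
perf = {
    "A": [1, 1, 0, 0, 1, 0, 0, 0],
    "B": [0, 1, 1, 1, 0, 1, 1, 0],
}
choices = ["A", "B", "B", "B", "B", "B", "B", "A"]

closed = gap_closed(perf, choices)
print(closed)
```

A value of 0 means the selector does no better than always running the single best solver, and 1 means it matches the per-instance oracle.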

2512.00088 2026-06-02 cs.CV cs.LG (updated version)

Semimage: HSV-Based Semantic Image Encoding for Disentangled Text Representation

Semimage: 基于HSV的语义图像编码用于解缠文本表示

Mohammad Zare

发表机构 * AI Lab at Department of Computer Engineering（计算机工程系人工智能实验室）； AriooBarzan Engineering Team and Information Technology（AriooBarzan工程团队和信息技术）； Shiraz University of Technology（谢兹大学技术学院）

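AI summary: Proposes SemImage, a method that represents text as a two-dimensional semantic image and uses the HSV color space to disentangle topic, sentiment, and intensity features through multi-task learning, achieving competitive performance on document classification.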

详情

Journal ref: 2026 12th International Conference on Web Research (ICWR), 253-259

AI中文摘要

我们提出SemImage，一种将文本文档表示为二维语义图像以由卷积神经网络（CNN）处理的新方法。在SemImage中，每个单词表示为二维图像中的一个像素：行对应句子，并在句子之间插入额外的边界行以标记语义转换。每个像素不是典型的RGB值，而是解缠HSV颜色空间中的向量，编码不同的语言特征：色调（具有两个分量H_cos和H_sin以考虑循环性）编码主题，饱和度编码情感，明度编码强度或确定性。我们通过多任务学习框架强制这种解缠：ColorMapper网络将每个词嵌入映射到HSV空间，并对色调和饱和度通道应用辅助监督以预测主题和情感标签，同时执行主要任务目标。在句子之间插入动态计算的边界行，当连续句子在语义上不相似时，会在图像中产生清晰的视觉边界，有效地使段落边界突出。我们将SemImage与标准2D CNN（例如ResNet）集成用于文档分类。在多标签数据集（同时具有主题和情感标注）和单标签基准上的实验表明，SemImage能够达到与强文本分类基线（包括BERT和层次注意力网络）相当或更好的准确性，同时提供增强的可解释性。消融研究证实了多通道HSV表示和动态边界行的重要性。最后，我们展示了SemImage的可视化，定性地揭示了生成图像中与主题转换和情感变化相对应的清晰模式，表明我们的表示使这些语言特征对人类和机器都可见。

英文摘要

We propose SemImage, a novel method for representing a text document as a two-dimensional semantic image to be processed by convolutional neural networks (CNNs). In a SemImage, each word is represented as a pixel in a 2D image: rows correspond to sentences, and an additional boundary row is inserted between sentences to mark semantic transitions. Each pixel is not a typical RGB value but a vector in a disentangled HSV color space, encoding different linguistic features: Hue (with two components, H_cos and H_sin, to account for circularity) encodes the topic, Saturation encodes the sentiment, and Value encodes intensity or certainty. We enforce this disentanglement via a multi-task learning framework: a ColorMapper network maps each word embedding to the HSV space, and auxiliary supervision is applied to the Hue and Saturation channels to predict topic and sentiment labels, alongside the main task objective. The insertion of dynamically computed boundary rows between sentences yields sharp visual boundaries in the image when consecutive sentences are semantically dissimilar, effectively making paragraph breaks salient. We integrate SemImage with standard 2D CNNs (e.g., ResNet) for document classification. Experiments on multi-label datasets (with both topic and sentiment annotations) and single-label benchmarks demonstrate that SemImage can achieve competitive or better accuracy than strong text classification baselines (including BERT and hierarchical attention networks) while offering enhanced interpretability. An ablation study confirms the importance of the multi-channel HSV representation and the dynamic boundary rows. Finally, we present visualizations of SemImage that qualitatively reveal clear patterns corresponding to topic shifts and sentiment changes in the generated image, suggesting that our representation makes these linguistic features visible to both humans and machines.
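
To make the pixel layout concrete, here is a minimal sketch of a four-channel pixel with a circular hue, plus one possible boundary score between consecutive sentences. Both helper names are hypothetical, and the cosine-dissimilarity boundary rule is an assumption for illustration; the abstract does not specify how boundary rows are computed.

```python
import math


def hsv_pixel(hue_angle: float, saturation: float, value: float) -> list[float]:
    """Return [H_cos, H_sin, S, V]; hue is an angle in radians, so 0 and 2*pi coincide."""
    return [math.cos(hue_angle), math.sin(hue_angle), saturation, value]


def boundary_strength(sent_a: list[float], sent_b: list[float]) -> float:
    """Assumed boundary score: 1 - cosine similarity of two sentence vectors."""
    dot = sum(a * b for a, b in zip(sent_a, sent_b))
    norm = math.sqrt(sum(a * a for a in sent_a)) * math.sqrt(sum(b * b for b in sent_b))
    return 1.0 - dot / norm if norm else 1.0
```

Splitting hue into cosine and sine components avoids the artificial jump that a single angle would show between values just below 2π and just above 0.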

2512.00062 2026-06-02 cs.RO cs.AI cs.LG 版本更新

SpeedAug: Policy Acceleration via Tempo-Enriched Policy and RL Fine-Tuning

SpeedAug: 通过节奏增强策略和强化学习微调实现策略加速

Taewook Nam, Junmo Cho, Youngsoo Jang, Sung Ju Hwang

Affiliations * KAIST; UNIST; DeepAuto.ai

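AI summary: Proposes SpeedAug, a framework that combines a tempo-enriched prior policy with reinforcement learning fine-tuning so that robot policies learn task-optimal execution tempo, substantially improving execution speed and sample efficiency while maintaining high success rates.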

详情

AI中文摘要

针对复杂真实世界操作任务的机器人策略学习近期取得了快速进展，这在很大程度上得益于通过人类操作收集演示数据的能力。然而，从这些演示中训练出的策略通常执行任务的速度远低于机器人的物理能力，因为演示数据是在实际约束下收集的，这些约束倾向于保守的、以成功为导向的轨迹，而非执行速度。现有的策略加速方法通过数据预处理或启发式规则确定执行节奏，而不是学习针对任务优化的执行速度。在本文中，我们提出了SpeedAug，一个策略加速框架，使策略能够通过强化学习（RL）学习任务最优的执行节奏。SpeedAug首先从速度增强的演示中学习一个节奏增强的先验策略，该策略捕捉了多样的执行节奏。在此基础上，通过强化学习微调指导探索，以优化动作轨迹并高效优化执行节奏。在机器人操作基准上的实验表明，SpeedAug在保持高成功率的同时，显著提高了策略加速的样本效率，实现了快速且稳定的任务执行。应用于真实世界的操作任务时，SpeedAug仅用16分钟的在线交互就将任务吞吐量提高了1.8倍，且未降低成功率。

英文摘要

Robotic policy learning for complex real-world manipulation tasks has seen rapid recent progress, enabled in large part by the ability to collect demonstrations through human operation. However, policies trained from such demonstrations often execute tasks far more slowly than the robot's physical capabilities, as demonstration data is collected under practical constraints that favor conservative, success-oriented trajectories over execution speed. Existing policy acceleration methods determine execution tempo through data preprocessing or heuristic rules, rather than learning execution speed optimized for the task. In this paper, we propose SpeedAug, a policy acceleration framework that enables policies to learn task-optimal execution tempo via reinforcement learning (RL). SpeedAug first learns a tempo-enriched prior policy from speed-augmented demonstrations that captures diverse execution tempos. Building on this tempo-enriched prior, RL fine-tuning guides exploration to refine action trajectories and optimize execution tempo efficiently. Experiments on robotic manipulation benchmarks demonstrate that SpeedAug substantially improves the sample efficiency of policy acceleration while maintaining high success rates, achieving fast and stable task execution. Applied to a real-world manipulation task, SpeedAug improves task throughput by 1.8x using only 16 minutes of online interactions without compromising the success rate.
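
One plausible way to build speed-augmented demonstrations is to resample a recorded trajectory so that it covers the same path in fewer steps. The sketch below assumes a one-dimensional trajectory and linear interpolation; the function name and the resampling scheme are illustrative assumptions, not the augmentation described in the paper.

```python
def speed_up(trajectory: list[float], factor: float) -> list[float]:
    """Resample a 1-D trajectory so it finishes roughly `factor` times faster."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    if len(trajectory) < 2:
        raise ValueError("need at least two waypoints")
    last = len(trajectory) - 1
    n_out = max(2, round(last / factor) + 1)  # number of waypoints after speed-up
    resampled = []
    for i in range(n_out):
        t = i * last / (n_out - 1)            # fractional index into the original
        lo = min(int(t), last - 1)
        frac = t - lo
        # Linear interpolation between neighbouring waypoints.
        resampled.append(trajectory[lo] * (1 - frac) + trajectory[lo + 1] * frac)
    return resampled
```

Applying several factors to each demonstration yields a set of trajectories with the same start and end points but different tempos, which is the kind of diversity a tempo-enriched prior would need.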

2511.21397 2026-06-02 cs.CV cs.AI cs.CL cs.LG 版本更新

Understanding the Effects of Distractors on Reasoning Vision-Language Models

理解干扰项对推理视觉语言模型的影响

Jiyun Bae, Hyunjong Ok, Sangwoo Mo, Jaeho Lee

Affiliations * Pohang University of Science and Technology (POSTECH)

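AI summary: Builds Idis, a visual question-answering dataset whose distractors vary along semantic and numerical dimensions, to study how visual distractors affect test-time scaling in vision-language models. Unlike textual distractors, visual distractors reduce accuracy without increasing reasoning length; a simple prompting strategy is proposed to mitigate distractor-driven predictions.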

Comments preprint

详情

AI中文摘要

无关信息（即干扰项）如何影响视觉语言模型（VLM）的测试时缩放？先前关于纯文本语言模型的研究表明，文本干扰项可以加剧逆缩放，导致模型推理更长但推理轨迹效率更低。在这项工作中，我们研究了类似现象是否在多模态设置中出现。我们引入了Idis（带干扰项的图像），这是一个视觉问答数据集，系统性地沿着语义和数值维度变化干扰项。我们的分析揭示，视觉干扰项以与文本干扰项根本不同的方式影响推理VLM：尽管逆缩放仍然出现，但视觉干扰项降低了准确率而不增加推理长度。我们进一步展示了从推理轨迹中提取的属性计数为干扰项如何与推理长度和准确率交互提供了关键见解。作为合理性检查，我们提出了一种简单的提示策略，以减轻推理视觉语言模型中干扰项驱动的预测。

英文摘要

How does irrelevant information (i.e., distractors) affect test-time scaling in vision-language models (VLMs)? Prior work on text-only language models has shown that textual distractors can intensify inverse scaling, causing models to produce longer but less effective reasoning traces. In this work, we investigate whether similar phenomena arise in multimodal settings. We introduce Idis (Images with distractors), a visual question-answering dataset that systematically varies distractors along semantic and numerical dimensions. Our analyses reveal that visual distractors affect reasoning VLMs in a fundamentally different way from textual distractors: although inverse scaling still emerges, visual distractors reduce accuracy without increasing reasoning length. We further show that attribute counts extracted from reasoning traces provide key insights into how distractors interact with reasoning length and accuracy. As a sanity check, we propose a simple prompting strategy that mitigates distractor-driven predictions in reasoning vision-language models.

2511.20333 2026-06-02 cs.AI cs.LG cs.NE 版本更新

NNGPT: Rethinking AutoML with Large Language Models

NNGPT: 用大型语言模型重新思考AutoML

Roman Kochnev, Waleed Khalid, Tolgay Atinc Uzun, Xi Zhang, Yashkumar Sanjaybhai Dhameliya, Furui Qin, Chandini Vysyaraju, Raghuvir Duvvuri, Avi Goyal, Dmitry Ignatov, Radu Timofte

Affiliations * Computer Vision Lab, CAIDAS & IFI, University of Würzburg, Germany

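AI summary: Proposes NNGPT, an open-source framework that uses large language models as a self-improving AutoML engine, designing neural network architectures and hyperparameters automatically through a closed loop of generation, assessment, and self-improvement.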

详情

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 5664-5674, 2026

AI中文摘要

构建自我改进的人工智能系统仍然是AI领域的一个基本挑战。我们提出了NNGPT，一个开源框架，它将大型语言模型（LLM）转变为用于神经网络开发的自我改进AutoML引擎，主要针对计算机视觉。与之前的框架不同，NNGPT通过生成新模型扩展神经网络数据集，基于生成、评估和自我改进的闭环系统实现LLM的持续微调。它在一个统一的工作流中集成了五个协同的基于LLM的流水线：零样本架构合成、超参数优化（HPO）、代码感知的准确率/早停预测、检索增强的闭域PyTorch块合成（NN-RAG）以及强化学习。基于LEMUR数据集作为具有可复现指标的可审计语料库，NNGPT从单个提示出发，验证网络架构、预处理代码和超参数，端到端执行，并从结果中学习。PyTorch适配器使NNGPT框架无关，实现了强大性能：NN-RAG在1289个目标上达到73%的可执行性，3-shot提示在常见数据集上提高了准确率，基于哈希的去重节省了数百次运行。一次性预测匹配基于搜索的AutoML，减少了大量试验的需要。在LEMUR上的HPO实现了RMSE 0.60，优于Optuna（0.64），而代码感知预测器达到RMSE 0.14，Pearson r=0.78。该系统已生成超过5000个经过验证的模型，证明了NNGPT作为自主AutoML引擎的能力。接受后，代码、提示和检查点将公开发布，以实现可复现性并促进社区使用。

英文摘要

Building self-improving AI systems remains a fundamental challenge in the AI domain. We present NNGPT, an open-source framework that turns a large language model (LLM) into a self-improving AutoML engine for neural network development, primarily for computer vision. Unlike previous frameworks, NNGPT extends the dataset of neural networks by generating new models, enabling continuous fine-tuning of LLMs based on a closed-loop system of generation, assessment, and self-improvement. It integrates five synergistic LLM-based pipelines within one unified workflow: zero-shot architecture synthesis, hyperparameter optimization (HPO), code-aware accuracy/early-stop prediction, retrieval-augmented synthesis of scope-closed PyTorch blocks (NN-RAG), and reinforcement learning. Built on the LEMUR dataset as an audited corpus with reproducible metrics, NNGPT takes a single prompt, emits and validates a network architecture, preprocessing code, and hyperparameters, executes them end-to-end, and learns from the results. The PyTorch adapter makes NNGPT framework-agnostic, enabling strong performance: NN-RAG achieves 73% executability on 1,289 targets, 3-shot prompting boosts accuracy on common datasets, and hash-based deduplication saves hundreds of runs. One-shot prediction matches search-based AutoML, reducing the need for numerous trials. HPO on LEMUR achieves RMSE 0.60, outperforming Optuna (0.64), while the code-aware predictor reaches RMSE 0.14 with Pearson r=0.78. The system has already generated over 5K validated models, demonstrating NNGPT's capability as an autonomous AutoML engine. Upon acceptance, the code, prompts, and checkpoints will be released for public access to enable reproducibility and facilitate community usage.
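
The abstract credits hash-based deduplication with saving hundreds of runs. A minimal sketch of that idea follows, assuming generated models are compared by a SHA-256 hash of their source with whitespace collapsed; the normalization step and both function names are assumptions, since the abstract does not say what is hashed.

```python
import hashlib


def code_fingerprint(source: str) -> str:
    """SHA-256 of the source with whitespace runs collapsed (assumed normalization)."""
    normalized = " ".join(source.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(candidates: list[str]) -> list[str]:
    """Keep the first occurrence of each distinct fingerprint, preserving order."""
    seen, unique = set(), []
    for src in candidates:
        key = code_fingerprint(src)
        if key not in seen:
            seen.add(key)
            unique.append(src)
    return unique
```

Skipping a candidate whose fingerprint has already been seen avoids training the same generated network twice.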

2511.13487 2026-06-02 eess.AS cs.LG cs.SD 版本更新

Systematic Evaluation of Time-Frequency Features for Binaural Sound Source Localization

双耳声源定位的时频特征系统评估

Davoud Shariat Panah, Alessandro Ragano, Dan Barry, Jan Skoglund, Andrew Hines

Affiliations * Taighde Éireann – Research Ireland

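AI summary: Systematically evaluates how different combinations of time-frequency features affect binaural sound source localization, finding that carefully chosen combinations (such as channel spectrograms with ILD and IPD) can outperform increases in model complexity, and offers practical guidance for domain-specific and general-purpose localization.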

Comments Accepted at EUSIPCO 2026

详情

AI中文摘要

本研究对双耳声源定位（SSL）的时频特征设计进行了系统评估，重点关注特征选择如何在多样条件下影响模型性能。我们研究了使用基于幅度特征（幅度频谱图、耳间电平差 - ILD）和基于相位特征（相位频谱图、耳间相位差 - IPD）的各种组合的卷积神经网络（CNN）模型的性能。在域内和域外数据（具有不匹配的头部相关传递函数 - HRTFs）上的评估表明，精心选择的特征组合通常优于增加模型复杂度。虽然诸如ILD + IPD的双特征集足以用于域内SSL，但泛化到多样内容需要更丰富的输入，结合通道频谱图与ILD和IPD。使用最优特征集，我们的低复杂度CNN模型实现了有竞争力的性能。我们的发现强调了特征设计在双耳SSL中的重要性，并为领域特定和通用定位提供了实用指导。

英文摘要

This study presents a systematic evaluation of time-frequency feature design for binaural sound source localization (SSL), focusing on how feature selection influences model performance across diverse conditions. We investigate the performance of a convolutional neural network (CNN) model using various combinations of amplitude-based features (magnitude spectrogram, interaural level difference - ILD) and phase-based features (phase spectrogram, interaural phase difference - IPD). Evaluations on in-domain and out-of-domain data with mismatched head-related transfer functions (HRTFs) reveal that carefully chosen feature combinations often outperform increases in model complexity. While two-feature sets such as ILD + IPD are sufficient for in-domain SSL, generalization to diverse content requires richer inputs combining channel spectrograms with both ILD and IPD. Using the optimal feature sets, our low-complexity CNN model achieves competitive performance. Our findings underscore the importance of feature design in binaural SSL and provide practical guidance for both domain-specific and general-purpose localization.
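
For readers unfamiliar with the two interaural cues, the sketch below computes ILD and IPD for a single time-frequency bin from the left and right complex STFT coefficients, using their textbook definitions. The function names and the small `eps` guard are illustrative choices, not details from the paper.

```python
import cmath
import math


def ild_db(left: complex, right: complex, eps: float = 1e-12) -> float:
    """Interaural level difference in dB for one time-frequency bin."""
    return 20.0 * math.log10((abs(left) + eps) / (abs(right) + eps))


def ipd_rad(left: complex, right: complex) -> float:
    """Interaural phase difference in radians, wrapped to [-pi, pi]."""
    return cmath.phase(left * right.conjugate())
```

Applied to every bin of a stereo spectrogram, these give two feature maps with the same shape as the magnitude spectrogram, which is what allows them to be stacked as CNN input channels.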

2511.12081 2026-06-02 cs.IR cs.LG 版本更新

From Scaling to Structured Expressivity: Rethinking Transformers for CTR Prediction

从规模到结构化表达能力：重新思考用于CTR预测的Transformer

Bencheng Yan, Yuejie Lei, Zhiyuan Zeng, Zheye Deng, Di Wang, Kaiyi Lin, Pengjie Wang, Chuan Yu, Jian Xu, Bo Zheng

发表机构 * Alibaba Group（阿里巴巴集团）

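AI summary: To address the diminishing returns that structural misalignment causes in Transformer models for CTR prediction, proposes the Field-Aware Transformer (FAT), which achieves structured expressivity through field-aware parameter reconstruction and a basis-composed hypernetwork. The approach is supported by theory (a scaling law based on Rademacher complexity) and by experiments (up to +4.38% AUC; +2.33% CTR and +0.66% RPM online).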

Comments KDD 2026; The first four authors contributed equally to this work

详情

AI中文摘要

尽管在规模上投入巨大，用于点击率（CTR）预测的深度模型往往表现出快速递减的回报——这与大型语言模型（LLM）中观察到的可预测标度律形成鲜明对比。我们识别出根本原因在于根本性的结构错位：标准Transformer假设顺序组合性，而CTR数据需要对异构字段进行组合推理。为恢复对齐，我们引入了Field-Aware Transformer (FAT)。通过用场中心参数重构标准Transformer块，FAT实现了结构化表达能力，从根本上将模型复杂度依赖从总词汇量n转变为字段数F（n >> F）。关键的是，为了将模型容量与字段基数解耦，FAT采用基组合超网络从共享基合成场特定参数，进一步降低参数复杂度。理论上，我们通过基于Rademacher复杂度的形式化标度律来支撑这一缩放行为。实验上，FAT以高达+4.38%的AUC提升超越现有最先进方法，并在线上生产中带来+2.33%的CTR和+0.66%的RPM提升。我们的工作表明，可扩展的推荐不仅来自规模，更来自结构化表达能力——架构与数据语义的一致性。

英文摘要

Despite massive investments in scale, deep models for click-through rate (CTR) prediction often exhibit rapidly diminishing returns, in stark contrast to the predictable scaling laws seen in large language models (LLMs). We identify the root cause as a fundamental structural misalignment: standard Transformers assume sequential compositionality, whereas CTR data demand combinatorial reasoning over heterogeneous fields. To restore alignment, we introduce the Field-Aware Transformer (FAT). By reconstructing the standard Transformer block with field-centric parameters, FAT achieves structured expressivity, fundamentally shifting the model complexity dependence from the total vocabulary size $n$ to the number of fields $F$ ($n \gg F$). Crucially, to decouple model capacity from field cardinality, FAT employs a Basis-Composed Hypernetwork to synthesize field-specific parameters from shared bases, further reducing parameter complexity. Theoretically, we ground this scaling behavior through a formal scaling law based on Rademacher complexity. Empirically, FAT outperforms existing state-of-the-art methods with up to +4.38% AUC improvement, and delivers +2.33% CTR and +0.66% RPM in live production. Our work establishes that scalable recommendation arises not from size alone, but from structured expressivity, that is, architectural coherence with data semantics.
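
The idea of synthesizing field-specific parameters from shared bases can be illustrated with a plain linear combination, W_f = Σ_k α_{f,k} B_k. This is a generic sketch under that assumption: the symbols, function names, and the linear form are hypothetical, and the paper's hypernetwork may compose bases differently.

```python
def compose_field_weights(bases, coeffs):
    """W_f = sum_k coeffs[f][k] * bases[k]; `bases` holds K matrices of shape d x d."""
    d = len(bases[0])
    weights = []
    for alpha in coeffs:  # one coefficient vector per field
        w = [[sum(a * b[i][j] for a, b in zip(alpha, bases)) for j in range(d)]
             for i in range(d)]
        weights.append(w)
    return weights


def param_counts(num_fields: int, num_bases: int, d: int) -> tuple[int, int]:
    """(independent per-field matrices, basis-composed) parameter counts."""
    return num_fields * d * d, num_bases * d * d + num_fields * num_bases
```

With, say, 100 fields, 4 bases, and d = 16, independent matrices need 25,600 parameters while the composed form needs 1,424, which shows how sharing bases decouples parameter count from the number of fields.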

2507.22842 2026-06-02 stat.ML cs.LG 版本更新

Tricks and Plug-ins for Gradient Boosting in Image Classification

图像分类中梯度提升的技巧与插件

Biyi Fang, Truong Vo, Jean Utke, Diego Klabjan

发表机构 * Northwestern University（西北大学）； Allstate（Allstate公司）

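AI summary: Proposes a framework that combines dynamic feature selection with the principles of BoostCNN, using subgrid selection and importance sampling and embedding boosting weights into training with a least squares loss, to improve CNN performance and efficiency.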

Comments 6 pages, 5 figures. Experimental results reported on CIFAR-10, SVHN, and ImageNetSub datasets

详情

DOI: 10.1109/BigData66926.2025.11402372
Journal ref: 2025 IEEE International Conference on Big Data (BigData), pp. 1382-1388

AI中文摘要

卷积神经网络（CNN）通过深度架构的分层特征学习，在广泛的机器学习任务中取得了显著成功。然而，大量的层和数百万参数通常使得CNN训练计算成本高昂，需要大量时间和手动调优来发现最优架构。在本文中，我们介绍了一种提升CNN性能的新框架，该框架将动态特征选择与BoostCNN原理相结合。我们的方法包含两个关键策略：子网格选择和重要性采样，以引导训练朝向特征空间的信息区域。我们进一步开发了一系列算法，使用最小二乘损失公式将提升权重直接嵌入网络训练过程。这种集成不仅减轻了手动架构设计的负担，还提高了准确性和效率。在多个细粒度分类基准上的实验结果表明，我们的提升CNN变体在预测性能和训练速度上始终优于传统CNN。

英文摘要

Convolutional Neural Networks (CNNs) have achieved remarkable success across a wide range of machine learning tasks by leveraging hierarchical feature learning through deep architectures. However, the large number of layers and millions of parameters often make CNNs computationally expensive to train, requiring extensive time and manual tuning to discover optimal architectures. In this paper, we introduce a novel framework for boosting CNN performance that integrates dynamic feature selection with the principles of BoostCNN. Our approach incorporates two key strategies: subgrid selection and importance sampling, to guide training toward informative regions of the feature space. We further develop a family of algorithms that embed boosting weights directly into the network training process using a least squares loss formulation. This integration not only alleviates the burden of manual architecture design but also enhances accuracy and efficiency. Experimental results across several fine-grained classification benchmarks demonstrate that our boosted CNN variants consistently outperform conventional CNNs in both predictive performance and training speed.

2501.02409 2026-06-02 cs.LG cs.AI cs.CE q-bio.MN stat.ME 版本更新

Interpretable Neural ODEs for Gene Regulatory Network Discovery under Perturbations

可解释神经ODE用于扰动下基因调控网络发现

Zaikang Lin, Sei Chang, Aaron Zweig, Minseo Kang, Fabian J. Theis, Elham Azizi, David A. Knowles

Affiliations * Department of Computer Science, Columbia University, New York, U.S.; Department of Industrial Engineering and Operations Research, Columbia University, New York, U.S.; Department of Applied Mathematics and Applied Physics, Columbia University, New York, U.S.; New York Genome Center, New York, U.S.; Irving Institute of Cancer Dynamics, New York, U.S.; Institute of Computational Biology, Helmholtz Munich, Munich, Germany; Department of Mathematics, Technische Universität München, Munich, Germany

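AI summary: Proposes PerturbODE, a framework that models cell state trajectories under perturbations with interpretable neural ordinary differential equations and derives the causal gene regulatory network from the ODE parameters, enabling simulation of unseen genetic interventions.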

详情

AI中文摘要

现代高通量生物数据集包含数千种扰动，使得能够大规模发现代表基因间调控相互作用的因果图。可微分因果图模型和基于回归的方法已被开发用于从干预数据集推断基因调控网络（GRN）。然而，现有方法未能捕捉生物过程（如细胞分化）的非线性动力学。为解决这一局限性，我们提出PerturbODE，一种新颖框架，采用可解释神经常微分方程（神经ODE）对扰动下的细胞状态轨迹进行建模，并从神经ODE参数中推导出潜在的因果GRN，从而实现对未见遗传干预的下游模拟。GRN通过单隐藏层前馈网络编码，隐含地将基因分组为可解释的共调控模块。我们展示了PerturbODE在GRN推断和扩展到扰动响应预测方面的有效性，包括模拟和真实过表达数据集。

英文摘要

Modern high-throughput biological datasets containing thousands of perturbations enable large-scale discovery of causal graphs that represent regulatory interactions between genes. Differentiable causal graphical models and regression-based methods have been developed to infer gene regulatory networks (GRNs) from interventional datasets. However, existing approaches fail to capture the non-linear dynamics of biological processes such as cellular differentiation. To address this limitation, we propose PerturbODE, a novel framework that employs interpretable neural ordinary differential equations (neural ODEs) to model cell state trajectories under perturbations and derive the underlying causal GRN from the neural ODE parameters, enabling downstream simulation of unseen genetic interventions. The GRN is encoded via a single-hidden-layer feedforward network, implicitly grouping genes into interpretable co-regulated modules. We demonstrate PerturbODE's efficacy in GRN inference and extension to perturbation response prediction across both simulated and real overexpression datasets.

2511.09190 2026-06-02 cs.LG cs.NE 版本更新

Iterated Population Based Training with Task-Agnostic Restarts

迭代式基于种群的训练与任务无关的重启

Alexander Chebykin, Tanja Alderliesten, Peter A. N. Bosman

发表机构 * Alexander Chebykin（亚历山大·切比金）； Tanja Alderliesten（塔妮娅·阿尔德利斯特恩）； Peter A. N. Bosman（彼得·A·N·博斯曼）

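AI summary: Proposes Iterated Population Based Training (IPBT), which automatically adjusts the interval between hyperparameter updates via task-agnostic restarts and, on average, matches or outperforms 5 PBT variants and other HPO algorithms on 8 image classification and reinforcement learning tasks.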

详情

AI中文摘要

超参数优化（HPO）可以减轻调整神经网络超参数（HP）的负担。基于种群训练（PBT）系列的HPO算法通过在权重优化的每几步后动态调整HP，从而高效运行。最近的结果表明，HP更新之间的步数是所有PBT变体的重要元超参数，会显著影响其性能。然而，目前没有有效设置其值的方法或直觉。我们引入了迭代式基于种群的训练（IPBT），一种新颖的PBT变体，通过以任务无关的方式重用权重信息进行重启，并利用时变贝叶斯优化重新初始化HP，自动调整该超参数。在8个图像分类和强化学习任务上的评估表明，平均而言，我们的算法匹配或优于5种之前的PBT变体和其他HPO算法（随机搜索、ASHA、SMAC3），且无需增加预算或更改其超参数。源代码可在https://github.com/AwesomeLemon/IPBT获取。

英文摘要

Hyperparameter Optimization (HPO) can lift the burden of tuning hyperparameters (HPs) of neural networks. HPO algorithms from the Population Based Training (PBT) family are efficient thanks to dynamically adjusting HPs every few steps of the weight optimization. Recent results indicate that the number of steps between HP updates is an important meta-HP of all PBT variants that can substantially affect their performance. Yet, no method or intuition is available for efficiently setting its value. We introduce Iterated Population Based Training (IPBT), a novel PBT variant that automatically adjusts this HP via restarts that reuse weight information in a task-agnostic way and leverage time-varying Bayesian optimization to reinitialize HPs. Evaluation on 8 image classification and reinforcement learning tasks shows that, on average, our algorithm matches or outperforms 5 previous PBT variants and other HPO algorithms (random search, ASHA, SMAC3), without requiring a budget increase or any changes to its HPs. The source code is available at https://github.com/AwesomeLemon/IPBT.
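
For context on the meta-hyperparameter discussed above, the sketch below shows one exploit/explore step of vanilla PBT, the step that a training loop would call once every fixed number of weight-update steps. It is not IPBT: the restart mechanism and the time-varying Bayesian optimization are not shown, and the member layout, truncation fraction, and perturbation factors are illustrative assumptions.

```python
import random


def pbt_step(population, frac=0.25, factors=(0.8, 1.2), rng=None):
    """One exploit/explore step of vanilla PBT.

    Each member is a dict: {"score": float, "weights": object, "lr": float}.
    """
    rng = rng or random.Random()
    ranked = sorted(population, key=lambda m: m["score"], reverse=True)
    k = max(1, int(len(ranked) * frac))
    top, bottom = ranked[:k], ranked[-k:]
    for loser in bottom:
        winner = rng.choice(top)
        loser["weights"] = winner["weights"]              # exploit: copy weights
        loser["lr"] = winner["lr"] * rng.choice(factors)  # explore: perturb the HP
    return population
```

How often this step runs is exactly the interval that the abstract identifies as hard to set by hand; a real implementation would also deep-copy the weights rather than share a reference.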

2511.06663 2026-06-02 eess.SY cs.LG cs.SY 版本更新

GNN-Enabled Robust Hybrid Beamforming with Score-Based CSI Generation and Denoising

基于分数生成CSI与去噪的GNN鲁棒混合波束成形

Yuhang Li, Yang Lu, Bo Ai, Zhiguo Ding, Arumugam Nallanathan

Affiliations * State Key Laboratory of Advanced Rail Autonomous Operation; School of Computer Science and Technology; Beijing Jiaotong University; School of Electronics and Information Engineering; School of Electrical and Electronic Engineering (EEE); Nanyang Technological University; School of Electronic Engineering and Computer Science; Queen Mary University of London; Department of Electronic Engineering

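AI summary: To address imperfect channel state information in hybrid beamforming, proposes combining graph neural networks with score-based generative models, achieving robust HBF through a Hybrid Message Graph Attention Network, a BERT-based Noise Conditional Score Network, and a Denoising Score Network.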

详情

AI中文摘要

准确的信道状态信息（CSI）对于混合波束成形（HBF）任务至关重要。然而，在实际无线通信系统中，获取高分辨率CSI仍然具有挑战性。为了解决这个问题，我们提出利用图神经网络（GNN）和基于分数的生成模型，在不完美的CSI条件下实现鲁棒的HBF。首先，我们开发了混合消息图注意力网络（HMGAT），通过节点级和边级消息传递更新节点和边特征。其次，我们设计了一个基于BERT的噪声条件分数网络（NCSN），学习高分辨率CSI的分布，促进CSI生成和数据增强，进一步提高HMGAT的性能。最后，我们提出了一个去噪分数网络（DSN）框架及其实例化DeBERT，该网络可以在任意信道误差水平下对不完美的CSI进行去噪，从而实现鲁棒的HBF。在DeepMIMO城市数据集上的实验表明，所提出的模型在完美和不完美CSI的各种HBF任务中具有优越的泛化能力、可扩展性和鲁棒性。

英文摘要

Accurate Channel State Information (CSI) is critical for Hybrid Beamforming (HBF) tasks. However, obtaining high-resolution CSI remains challenging in practical wireless communication systems. To address this issue, we propose to utilize Graph Neural Networks (GNNs) and score-based generative models to enable robust HBF under imperfect CSI conditions. Firstly, we develop the Hybrid Message Graph Attention Network (HMGAT) which updates both node and edge features through node-level and edge-level message passing. Secondly, we design a Bidirectional Encoder Representations from Transformers (BERT)-based Noise Conditional Score Network (NCSN) to learn the distribution of high-resolution CSI, facilitating CSI generation and data augmentation to further improve HMGAT's performance. Finally, we present a Denoising Score Network (DSN) framework and its instantiation, termed DeBERT, which can denoise imperfect CSI under arbitrary channel error levels, thereby facilitating robust HBF. Experiments on DeepMIMO urban datasets demonstrate the proposed models' superior generalization, scalability, and robustness across various HBF tasks with perfect and imperfect CSI.

2511.06642 2026-06-02 cs.LG 版本更新

Improving Asset Allocation in a Fast Moving Consumer Goods B2B Company: An Interpretable Machine Learning Framework for Commercial Cooler Assignment Based on Multi-Tier Growth Targets

改善快速消费品B2B公司中的资产配置：基于多层级增长目标的商业冷柜分配可解释机器学习框架

Renato Castro, Rodrigo Paredes, Douglas Kahn

发表机构 * Fast Moving Consumer Goods B2B Company（快速消费品B2B公司）

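AI summary: Proposes an interpretable machine learning framework that uses XGBoost, LightGBM, and CatBoost models with SHAP analysis to predict sales volume growth of B2B beverage clients after they receive a cooler, in order to optimize asset allocation and improve ROI.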

详情

DOI: 10.1109/AIxB65684.2025.00007
Journal ref: 2025 Artificial Intelligence for Business (AIxB), Laguna Hills, CA, USA, 2025, pp. 1-6

AI中文摘要

在快速消费品（FMCG）行业中，决定商业饮料冷柜等实体资产的放置位置可以直接影响收入增长和执行效率。尽管客户流失预测和需求预测在B2B环境中已被广泛研究，但使用机器学习指导资产配置仍相对未被探索。本文提出了一个框架，专注于预测哪些饮料客户在收到冷柜后最有可能实现销量增长。使用来自一家知名中美洲酿酒和饮料公司的私有数据集，包含3,119个在2022年1月至2024年7月期间收到冷柜的B2B传统贸易渠道客户，并跟踪冷柜安装前后12个月的销售交易，定义了三个增长阈值：销量同比增长10%、30%和50%。分析比较了XGBoost、LightGBM和CatBoost等机器学习模型与SHAP结合用于可解释特征分析的结果，以获取改善与冷柜分配相关的业务运营的见解；结果显示，最佳模型在验证集上各阈值的AUC得分分别为0.857、0.877和0.898。模拟表明，与传统的基于销量的方法相比，该方法可以更好地选择有望达到预期增长水平的潜在客户，并通过不分配冷柜给不会增长的客户来增加成本节约，从而改善ROI，并提供了实质性的业务管理建议。

英文摘要

In the fast-moving consumer goods (FMCG) industry, deciding where to place physical assets, such as commercial beverage coolers, can directly impact revenue growth and execution efficiency. Although churn prediction and demand forecasting have been widely studied in B2B contexts, the use of machine learning to guide asset allocation remains relatively unexplored. This paper presents a framework focused on predicting which beverage clients are most likely to deliver strong returns in volume after receiving a cooler. The study uses a private dataset from a well-known Central American brewing and beverage company, covering 3,119 B2B traditional trade channel clients that received a cooler between 2022-01 and 2024-07, and tracks 12 months of sales transactions before and after cooler installation. Three growth thresholds are defined: 10%, 30%, and 50% year-over-year growth in sales volume. The analysis compares machine learning models (XGBoost, LightGBM, and CatBoost), combined with SHAP for interpretable feature analysis, to gain insights for improving business operations related to cooler allocation. The best model reaches AUC scores of 0.857, 0.877, and 0.898, respectively, for the three thresholds on the validation set. Simulations suggest that, compared to traditional volume-based approaches, this approach can improve ROI because it better selects clients likely to grow at the expected level and saves costs by not assigning coolers to clients that will not grow. The results also yield substantive business management recommendations.
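
The three growth thresholds turn the problem into three binary classification targets per client. The sketch below shows one way to derive those labels from 12-month volumes before and after installation; the label names and the use of "greater than or equal" at the boundary are assumptions, as the abstract does not state them.

```python
THRESHOLDS_PCT = (10, 30, 50)


def growth_labels(volume_before: float, volume_after: float) -> dict[str, bool]:
    """Binary targets: did year-over-year volume growth reach each threshold?"""
    if volume_before <= 0:
        raise ValueError("pre-installation volume must be positive")
    growth = volume_after / volume_before - 1.0
    return {f"growth_ge_{t}pct": growth >= t / 100 for t in THRESHOLDS_PCT}


# Hypothetical client: 100 units in the 12 months before, 135 in the 12 months after.
print(growth_labels(100.0, 135.0))
```

One classifier per threshold can then be trained on the same client features, which matches the three AUC scores reported above.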

2511.05650 2026-06-02 cs.CL cs.AI cs.LG 版本更新

Optimizing Diversity and Quality through Base-Aligned Model Collaboration

通过基座对齐模型协作优化多样性与质量

Yichen Wang, Chenghao Yang, Tenghao Huang, Muhao Chen, Jonathan May, Mina Lee

发表机构 * University of Chicago（芝加哥大学）； University of Southern California, Information Sciences Institute（南加州大学信息科学研究所）； University of California, Davis（加州大学戴维斯分校）

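AI summary: Proposes Base-Aligned Model Collaboration (BACo), a framework that dynamically combines a base LLM with its aligned counterpart at inference time through token-level routing strategies, improving both generation diversity and quality in a single pass.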

Comments ICML 2026. (47 pages, 22 figures)

详情

AI中文摘要

对齐极大地提升了大语言模型（LLM）的输出质量，但以牺牲多样性为代价，导致跨代生成高度相似的输出，尤其是在开放式生成任务中。我们提出基座对齐模型协作（BACo），一种推理时令牌级模型协作框架，动态结合基座LLM与其对齐版本，以优化多样性和质量。利用基于不确定性和内容的信号，BACo采用路由策略决定每个令牌从哪个模型解码。先前的多样性提升方法通常以质量下降为代价，或需要昂贵的解码或后训练。相比之下，BACo在单次前向传递中事后同时实现高多样性和高质量，同时提供强可控性。我们引入一系列有效的路由策略，并在三个开放式生成任务中使用13个多样性和质量指标进行评估。BACo持续超越最先进的推理时基线。使用我们最佳的路由器，BACo在多样性和质量上实现了21.3%的联合提升，这一结果进一步得到人工评估的支持。总体而言，我们的结果表明，基座模型与对齐模型之间的协作为优化多样性-质量权衡提供了一种有效且可控的机制。

英文摘要

Alignment has greatly improved large language models (LLMs)' output quality at the cost of diversity, yielding highly similar outputs across generations, especially in open-ended generation tasks. We propose Base-Aligned Model Collaboration (BACo), an inference-time token-level model collaboration framework that dynamically combines a base LLM with its aligned counterpart to optimize diversity and quality. Using uncertainty and content-based signals, BACo employs routing strategies to determine, at each token, which model to decode from. Prior diversity-promoting methods often improve diversity at the expense of quality or require expensive decoding or post-training. In contrast, BACo achieves both high diversity and quality post hoc within a single pass, while offering strong controllability. We introduce a family of effective routing strategies and evaluate them across three open-ended generation tasks with 13 diversity and quality metrics. BACo consistently surpasses state-of-the-art inference-time baselines. With our best router, BACo achieves a 21.3% joint improvement in diversity and quality, which is further supported by human evaluations. Overall, our results demonstrate that collaboration between base and aligned models provides an effective and controllable mechanism for optimizing the diversity-quality trade-off.
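
Token-level routing on an uncertainty signal can be sketched as follows. The rule shown here (decode from the base model whenever the aligned model's next-token entropy exceeds a threshold) is a hypothetical example of such a router; the abstract does not specify the actual routing strategies, and the threshold value is arbitrary.

```python
import math


def entropy(probs: list[float]) -> float:
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def route(aligned_probs: list[float], base_probs: list[float], threshold: float = 1.0):
    """Assumed rule: use the base model when the aligned model is uncertain."""
    if entropy(aligned_probs) > threshold:
        return "base", base_probs
    return "aligned", aligned_probs
```

The intuition behind a rule of this shape is that positions where the aligned model is confident are likely quality-critical, while high-entropy positions leave room for the base model's more diverse choices.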

2511.05613 2026-06-02 cs.CY cs.AI cs.LG 版本更新

Who Evaluates AI's Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations

谁在评估人工智能的社会影响？第一方和第三方评估的覆盖范围与差距分析

Anka Reuel, Avijit Ghosh, Jenny Chim, Andrew Tran, Yanan Long, Jennifer Mickel, Usman Gohar, Srishti Yadav, Pawan Sasanka Ammanamanchi, Mowafak Allaham, Hossein A. Rahmani, Mubashara Akhtar, Felix Friedrich, Robert Scholz, Michael Alexander Riegler, Jan Batzner, Eliya Habba, Arushi Saxena, Anastassia Kornilova, Kevin Wei, Prajna Soni, Yohan Mathew, Kevin Klyman, Jeba Sania, Subramanyam Sahoo, Olivia Beyer Bruvik, Pouya Sadeghi, Sujata Goswami, Angelina Wang, Yacine Jernite, Zeerak Talat, Stella Biderman, Mykel Kochenderfer, Sanmi Koyejo, Irene Solaiman

发表机构 * University of California, Berkeley（加州大学伯克利分校）

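AI summary: Analyzing 186 first-party release reports and 248 third-party evaluation sources, together with developer interviews, shows that first-party reporting is sparse and superficial while third-party evaluations are broader and deeper, but disclosure gaps remain in key areas such as data provenance and content moderation labor. Calls for policy that mandates developer transparency and strengthens the independent evaluation ecosystem.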

Comments Accepted at the Forty-Third International Conference on Machine Learning (ICML), 2026, in Seoul, Korea

详情

AI中文摘要

DuetServe: Coordinating Prefill and Decode for LLM Serving via Adaptive GPU Multiplexing

Lei Gao, Chaoyi Jiang, Hossein Entezari Zarch, Daniel Wong, Mark Hill, Murali Annavaram

发表机构 * University of California, Berkeley（加州大学伯克利分校）； Stanford University（斯坦福大学）； University of Toronto（多伦多大学）； University of Washington（华盛顿大学）； University of Texas at Austin（德克萨斯大学奥斯汀分校）

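AI summary: To address interference between the prefill and decode phases in LLM serving, proposes DuetServe, a framework that achieves phase isolation within a single GPU through adaptive SM-level GPU spatial multiplexing, improving throughput while keeping latency low.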

详情

AI中文摘要

现代LLM服务系统必须维持高吞吐量，同时满足两个不同推理阶段（计算密集型预填充阶段和内存受限的解码阶段）的严格延迟SLO。现有方法要么（1）在共享GPU上聚合两个阶段，导致预填充和解码阶段之间的干扰，从而降低令牌间时间（TBT）；要么（2）将两个阶段分离到不同GPU上，改善延迟但通过重复模型和KV缓存传输浪费资源。我们提出DuetServe，一个统一的LLM服务框架，在单个GPU内实现分离级别的隔离。DuetServe默认以聚合模式运行，并在预测到TBT下降时动态激活SM级GPU空间多路复用。其关键思想是仅在需要时通过细粒度、自适应的SM分区解耦预填充和解码执行，仅在争用威胁延迟服务级别目标时提供阶段隔离。DuetServe集成了（1）一个注意力感知的屋顶线模型以预测迭代延迟，（2）一个分区优化器，选择最优SM分割以在TBT约束下最大化吞吐量，以及（3）一个无中断执行引擎，消除CPU-GPU同步开销。评估表明，与最先进框架相比，DuetServe在保持低生成延迟的同时，总吞吐量提升高达1.3倍。

英文摘要

Modern LLM serving systems must sustain high throughput while meeting strict latency SLOs across two distinct inference phases: compute-intensive prefill and memory-bound decode phases. Existing approaches either (1) aggregate both phases on shared GPUs, leading to interference between prefill and decode phases, which degrades Time-Between-Tokens (TBT); or (2) disaggregate the two phases across GPUs, improving latency but wasting resources through duplicated models and KV cache transfers. We present DuetServe, a unified LLM serving framework that achieves disaggregation-level isolation within a single GPU. DuetServe operates in aggregated mode by default and dynamically activates SM-level GPU spatial multiplexing when TBT degradation is predicted. Its key idea is to decouple prefill and decode execution only when needed through fine-grained, adaptive SM partitioning that provides phase isolation only when contention threatens latency service level objectives. DuetServe integrates (1) an attention-aware roofline model to forecast iteration latency, (2) a partitioning optimizer that selects the optimal SM split to maximize throughput under TBT constraints, and (3) an interruption-free execution engine that eliminates CPU-GPU synchronization overhead. Evaluations show that DuetServe improves total throughput by up to 1.3x while maintaining low generation latency compared to state-of-the-art frameworks.
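
The interplay between a roofline latency estimate and an SM split can be sketched with the classic roofline bound. This is a simplified illustration, not the paper's attention-aware model or its optimizer: it assumes that both compute throughput and memory bandwidth scale linearly with the SM share, and all names and numbers are hypothetical.

```python
def roofline_latency(flops: float, bytes_moved: float,
                     peak_flops: float, bandwidth: float) -> float:
    """Classic roofline estimate: time is bounded by compute or by memory traffic."""
    return max(flops / peak_flops, bytes_moved / bandwidth)


def min_decode_sms(total_sms: int, decode_flops: float, decode_bytes: float,
                   peak_flops: float, bandwidth: float, tbt_budget: float):
    """Smallest SM count for decode that meets the TBT budget (None if impossible).

    Crude assumption: compute and bandwidth both scale linearly with the SM share.
    """
    for sms in range(1, total_sms):
        share = sms / total_sms
        latency = roofline_latency(decode_flops, decode_bytes,
                                   peak_flops * share, bandwidth * share)
        if latency <= tbt_budget:
            return sms
    return None
```

Giving decode the fewest SMs that still meet the latency budget leaves the remaining SMs for prefill, which is one simple way to trade throughput against a TBT constraint.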

2506.22666 2026-06-02 cs.CR cs.CL cs.LG stat.ML 版本更新

VERA: Variational Inference Framework for Jailbreaking Large Language Models

VERA：用于越狱大型语言模型的变分推理框架

Anamika Lochab, Lu Yan, Patrick Pynadath, Xiangyu Zhang, Ruqi Zhang

发表机构 * Department of Computer Science, Purdue University（计算机科学系，普渡大学）

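AI summary: Proposes VERA, a framework that casts black-box jailbreak prompt generation as a variational inference problem, training a small attacker LLM to approximate the target LLM's posterior over adversarial prompts so that diverse, fluent jailbreak prompts can be generated without re-optimization.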

Comments Accepted by NeurIPS 2025

详情

AI中文摘要

仅通过API访问最先进LLM的兴起凸显了在现实环境中识别模型漏洞的有效黑盒越狱方法的需求。由于缺乏基于梯度的优化原则性目标，大多数现有方法依赖于遗传算法，这些算法受限于其初始化和对人工策划提示池的依赖。此外，这些方法需要对每个提示进行单独优化，未能提供模型漏洞的全面表征。为弥补这一差距，我们引入了VERA：用于越狱的变分推理框架。VERA将黑盒越狱提示生成视为变分推理问题，训练一个小型攻击者LLM来近似目标LLM在对抗提示上的后验。一旦训练完成，攻击者可以针对目标查询生成多样化、流畅的越狱提示，而无需重新优化。实验结果表明，VERA在一系列目标LLM上取得了强劲的性能，凸显了概率推理在对抗性提示生成中的价值。

英文摘要

The rise of API-only access to state-of-the-art LLMs highlights the need for effective black-box jailbreak methods to identify model vulnerabilities in real-world settings. Without a principled objective for gradient-based optimization, most existing approaches rely on genetic algorithms, which are limited by their initialization and dependence on manually curated prompt pools. Furthermore, these methods require individual optimization for each prompt, failing to provide a comprehensive characterization of model vulnerabilities. To address this gap, we introduce VERA: Variational infErence fRamework for jAilbreaking. VERA casts black-box jailbreak prompting as a variational inference problem, training a small attacker LLM to approximate the target LLM's posterior over adversarial prompts. Once trained, the attacker can generate diverse, fluent jailbreak prompts for a target query without re-optimization. Experimental results show that VERA achieves strong performance across a range of target LLMs, highlighting the value of probabilistic inference for adversarial prompt generation.

2504.16129 2026-06-02 cs.MA cs.AI cs.LG cs.RO version update

MARFT: Multi-Agent Reinforcement Fine-Tuning

Junwei Liao, Muning Wen, Jun Wang, Weinan Zhang

Affiliations: Shanghai Jiao Tong University; Shanghai Innovation Institute; OPPO Research Institute

AI summary: For LLM-based multi-agent systems, proposes the Multi-Agent Reinforcement Fine-Tuning (MARFT) framework, which introduces the Flex-MG Markov game formulation and a universal algorithm to address challenges such as asynchronous interaction and heterogeneous architectures, improving system robustness and adaptability.

Comments 37 pages

Details

Abstract

Large Language Model (LLM)-based Multi-Agent Systems (LaMAS) have demonstrated strong capabilities on complex agentic tasks requiring multifaceted reasoning and collaboration, from high-quality presentation generation to scientific research. Meanwhile, Reinforcement Learning (RL) is widely recognized for enhancing agent intelligence, but limited work has studied fine-tuning LaMAS with foundational RL techniques. Directly applying conventional Multi-Agent Reinforcement Learning (MARL) to LaMAS also introduces major challenges due to the unique mechanisms of LaMAS. To address these challenges, this article presents a comprehensive study of LLM-based MARL and proposes Multi-Agent Reinforcement Fine-Tuning (MARFT). We introduce Flex-MG, a new Markov Game formulation aligned with real-world LaMAS optimization, together with a universal algorithmic framework tailored to LaMAS. We review the evolution from traditional RL to Reinforcement Fine-Tuning (RFT), then analyze the multi-agent counterpart. For LaMAS, we identify key differences between classical MARL and MARFT, including asynchronous agent interactions, profile-aware agent design, and heterogeneous architectures. These differences motivate a LaMAS-oriented formulation of RFT. We present a robust and scalable MARFT framework, detail its modular algorithm, and provide an open-source implementation to support adoption and further research. The paper further discusses application perspectives and open challenges, including dynamic environment modeling, sample inefficiency, and the lack of cohesive frameworks. By connecting theoretical foundations with practical methodology, this work aims to serve as a roadmap for advancing MARFT toward resilient, adaptive, and human-aligned agentic systems. Implementation: https://github.com/jwliao-ai/MARFT.

2412.16209 2026-06-02 cs.LG stat.ML version update

Challenges in the calibration of tree-based models for imbalanced classification

Nathan Phelps, Daniel J. Lizotte, Douglas G. Woolford

Affiliations: Department of Statistical and Actuarial Sciences, University of Western Ontario, London, Canada; Department of Computer Science, University of Western Ontario, London, Canada; Department of Epidemiology and Biostatistics, University of Western Ontario, London, Canada

AI summary: Studies the bias introduced when a random forest trained on undersampled data is calibrated analytically, finds that decision trees can be biased towards the minority class, and recommends calibration methods that learn the miscalibration pattern (e.g., beta calibration).

Details

Abstract

When using machine learning for imbalanced binary classification problems, it is common to subsample the majority class to create a (more) balanced training dataset. This biases the model's predictions because the model learns from data that is not fully representative of the underlying population of interest. One way of accounting for this bias is analytically mapping the resulting predictions to new values based on the sampling rate for the majority class. We show that calibrating a random forest this way has negative consequences, including prevalence estimates that depend on both the number of predictors considered at each split in the random forest and the sampling rate used. We explain the former using known properties of random forests and analytical calibration and the latter by demonstrating a bias in decision trees. In contradiction with much of the existing literature, we show that decision trees can be biased towards the minority class. These issues indicate that tree-based models trained on undersampled data should not be calibrated analytically. Calibration approaches that can learn a miscalibration pattern in the original model (e.g., beta calibration) are more suitable.
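
The abstract does not give the exact analytical mapping it evaluates. As a minimal sketch, the function below implements the standard prior-shift correction for majority-class undersampling, assuming the positive class is the minority and each negative example is kept with probability `beta`. Undersampling then divides the negative count by `1/beta`, so the true odds equal `beta` times the odds learned on the sampled data. The function name is hypothetical.

```python
def analytic_calibration(p_s: float, beta: float) -> float:
    """Map a probability learned on undersampled data back to the original scale.

    p_s  : predicted minority-class probability from the model trained on
           the undersampled data
    beta : fraction of majority-class examples kept (0 < beta <= 1)

    Derivation: odds = beta * p_s / (1 - p_s), hence
    p = beta * p_s / (beta * p_s - p_s + 1).
    """
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must lie in (0, 1]")
    if not 0.0 <= p_s <= 1.0:
        raise ValueError("p_s must lie in [0, 1]")
    return beta * p_s / (beta * p_s - p_s + 1.0)


# Illustrative call: a 0.5 score from a model trained with 10% of the negatives.
print(analytic_calibration(0.5, 0.1))
```

The mapping is a fixed function of `beta` alone, so it cannot absorb any bias that depends on how the trees were grown. That is the limitation the abstract points to when it recommends learned calibrators such as beta calibration.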

2510.14904 2026-06-02 cs.CV cs.AI cs.LG version update

CaptionFormer: Unified Segmentation, Tracking, and Captioning for Spatio-Temporal Objects

Gabriel Fiastre, Antoine Yang, Cordelia Schmid

Affiliations: Inria, École Normale Supérieure, CNRS, PSL Research University; Google DeepMind

AI summary: Proposes the CaptionFormer model, which uses a VLM to generate synthetic captions that extend existing datasets, and jointly detects, segments, tracks, and captions object trajectories in video, achieving state-of-the-art results on three benchmarks.

Comments 17 pages, 10 figures

Details

Abstract

Dense Video Object Captioning (DVOC) is the task of jointly detecting, tracking, and captioning object trajectories in a video, requiring the ability to understand spatio-temporal details and describe them in natural language. Due to the complexity of the task and the high cost associated with manual annotation, previous approaches resort to training strategies with limited data, potentially leading to suboptimal performance. To circumvent this issue, we propose to generate captions about spatio-temporally localized entities leveraging a state-of-the-art VLM, and extend the LVIS and LV-VIS datasets with our synthetic captions (LVISCap and LV-VISCap). Moreover, we introduce an end-to-end model, CaptionFormer, capable of jointly detecting, segmenting, tracking and captioning object trajectories. CaptionFormer achieves state-of-the-art DVOC results on three existing benchmarks, VidSTG, VLN and BenSMOT. The datasets and code are available at https://www.gabriel.fiastre.fr/captionformer/.

2504.19419 2026-06-02 cs.LG stat.ML version update

Advancing Local Clustering on Graphs via Compressive Sensing: Semi-supervised and Unsupervised Methods

Zhaiming Shen, Sung Ha Kang

Affiliations: School of Mathematics, Georgia Institute of Technology

AI summary: Proposes compressive-sensing-based semi-supervised and unsupervised local clustering methods that obtain sparse solutions through random sampling, diffusion, and overlap analysis, proves their correctness, and reports state-of-the-art performance at low label rates.

Details

Abstract

Local clustering aims to identify specific substructures within a large graph without any additional structural information about the graph. These substructures are typically small compared to the overall graph, enabling the problem to be approached by finding a sparse solution to a linear system associated with the graph Laplacian. In this work, we first propose a method for identifying specific local clusters when very few labeled data are given, which we term semi-supervised local clustering. We then extend this approach to the unsupervised setting, where no prior information on labels is available. The proposed methods involve randomly sampling the graph, applying diffusion through local cluster extraction, then examining the overlap among the results to find each cluster. We establish the co-membership conditions for any pair of nodes, and rigorously prove the correctness of our methods. Additionally, we conduct extensive experiments to demonstrate that the proposed methods achieve state-of-the-art results in the low-label-rate regime.

2510.22138 2026-06-02 cs.LG version update

Tractable Shapley Values and Interactions via Tensor Networks

Farzaneh Heidari, Chao Li, Guillaume Rabusseau

Affiliations: DIRO & Mila Université de Montréal; RIKEN AIP; DIRO & Mila, CIFAR AI Chair; Université de Montréal

AI summary: Proposes TN-SHAP, which replaces exponential coalition enumeration with a small number of evaluations on a tensor-network surrogate to compute Shapley values and interaction indices at cost O(n·poly(χ) + n²), with 25-1000x speedups on UCI datasets.

Details

Abstract

We show how to replace the O(2^n) coalition enumeration over n features behind Shapley values and Shapley-style interaction indices with a few-evaluation scheme on a tensor-network (TN) surrogate: TN-SHAP. The key idea is to represent a predictor's local behavior as a factorized multilinear map, so that coalitional quantities become linear probes of a coefficient tensor. TN-SHAP replaces exhaustive coalition sweeps with just a small number of targeted evaluations to extract order-k Shapley interactions. In particular, both order-1 (single-feature) and order-2 (pairwise) computations have cost O(n*poly(chi) + n^2), where chi is the TN's maximal cut rank. We provide theoretical guarantees on the approximation error and tractability of TN-SHAP. On UCI datasets, our method matches enumeration on the fitted surrogate while reducing evaluation by orders of magnitude and achieves 25-1000x wall-clock speedups over KernelSHAP-IQ at comparable accuracy, while amortizing training across local cohorts.
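
For context, the sketch below shows the exhaustive baseline that the abstract says TN-SHAP replaces: exact Shapley values obtained by enumerating every coalition. It is not the TN-SHAP algorithm. The function names and the toy game (three additive features plus a bonus when features 0 and 1 appear together) are illustrative assumptions.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n):
    """Exact Shapley values by brute-force enumeration of all coalitions.

    value : callable taking a frozenset of feature indices, returning a float
    n     : number of features

    Each feature needs 2^(n-1) marginal contributions, so this is only
    feasible for small n.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = frozenset(subset)
                with_i = without_i | {i}
                phi[i] += weight * (value(with_i) - value(without_i))
    return phi


def toy_game(coalition):
    """Hypothetical game: additive terms plus a pairwise bonus for {0, 1}."""
    additive = [1.0, 2.0, 3.0]
    bonus = 4.0 if {0, 1} <= coalition else 0.0
    return sum(additive[i] for i in coalition) + bonus


print(shapley_values(toy_game, 3))
```

In this toy game the pairwise bonus is split equally between features 0 and 1, which makes the result easy to check by hand and shows why interaction indices are a natural next quantity to extract.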

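2510.23379 2026-06-02 cs.LG cs.AI cs.NE q-bio.BM version update

Symbolic Neural Generation with Applications to Lead Discovery in Drug Design

Ashwin Srinivasan, Tirtharaj Dash, A Baskar, Michael Bain, Sanjay Kumar Dey, Mainak Banerjee

Affiliations: Dept. of Computer Science & Information Systems and APPCAIR, BITS Pilani, K K Birla Goa Campus, India; Dept. of Computer Science & Information Systems, BITS Pilani, K K Birla Goa Campus, India; Department of Biochemistry, University of Cambridge, Cambridge, UK; School of Computer Science and Engineering, University of New South Wales, Sydney; Dr. B.R. Ambedkar Center for Biomedical Research, University of Delhi, New Delhi, India; Department of Chemistry, BITS Pilani, K.K. Birla Goa Campus, India

AI summary: Proposes the Symbolic Neural Generator (SNG) framework, which combines inductive logic programming with a large language model and uses symbolic constraints to guide neural generation. In drug design it generates candidate molecules that satisfy formal specifications, performs comparably to existing methods, and on exploratory problems yields binding affinities comparable to clinical candidates.

Comments 37 pages, submitted to the Machine Learning journal; partial overlap of experimental results with https://doi.org/10.1101/2025.02.14.634875

Details

Abstract (translated from the AI-generated Chinese abstract)

We study a relatively underexplored class of hybrid neurosymbolic models that integrate symbolic learning with neural reasoning to construct data generators that meet formal correctness criteria. In Symbolic Neural Generators (SNGs), a symbolic learner examines logical specifications of feasible data from a small number of instances, sometimes just one. Each specification in turn constrains the conditional information supplied to a neural-based generator, which rejects any instance that violates the symbolic specification. Like other neurosymbolic approaches, SNGs exploit the complementary strengths of symbolic and neural methods. The output of an SNG is a pair $(H, X)$, where $H$ is a symbolic description of feasible instances constructed from data and $X$ is a set of generated new instances that satisfy the description. We introduce a semantics for such systems, based on constructing appropriate base and fibre partially ordered sets and combining them into an overall partial order. We implement an SNG that combines a restricted form of Inductive Logic Programming (ILP) with a large language model (LLM) and evaluate it on early-stage drug design. Our main interest is in the description and the set of potential inhibitor molecules generated by the SNG. On benchmark problems, where the drug targets are well understood, SNG performance is statistically comparable to state-of-the-art methods. On exploratory problems, where the targets are poorly understood, the generated molecules exhibit binding affinities on par with leading clinical candidates. Experts further find the symbolic specifications useful as preliminary filters, and several generated molecules were identified as viable for synthesis and wet-lab testing.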

Video Reasoning Without Training

Deepak Sridhar, Kartikeya Bhardwaj, Jeya Pradha Jeyaraj, Nuno Vasconcelos, Ankita Nayak, Harris Teague

Affiliations: Qualcomm AI Research; University of California, San Diego

AI summary: Proposes V-Reason, which uses the entropy of the output distribution as a signal and adapts the value cache at inference time through a lightweight controller, improving video reasoning performance without reinforcement learning or fine-tuning.

Comments CVPR Findings 2026. Project Page https://deepaksridhar.github.io/vreason.github.io/

Details

Abstract

Video reasoning using Large Multimodal Models (LMMs) relies on costly reinforcement learning (RL) and verbose chain-of-thought, resulting in substantial computational overhead during both training and inference. Moreover, the mechanisms that control the thinking process in these reasoning models are very limited. In this paper, we use the entropy of the model's output distribution as a signal to study and guide reasoning behavior. We discover that high-quality models exhibit a characteristic pattern of micro-exploration and micro-exploitation cycles, followed by a later entropy peak (i.e., longer thinking) and a lower final entropy, indicating more deliberate exploration and confident convergence (i.e., avoiding excessive randomness while the model is exploring or thinking through an answer). We then use these novel, theoretically grounded insights to introduce V-Reason (Video-Reason), an inference-time optimization method that adapts the value cache of the LMM through a lightweight, trainable controller. Our proposed controller is guided by an entropy-based objective to tune the model's behavior directly at inference, without using any RL or supervised fine-tuning. Our experiments show that V-Reason significantly outperforms the base instruction-tuned models on many video reasoning datasets, narrowing the gap with RL models to within 0.6% accuracy on average. We achieve this without any training, while offering efficiency benefits: V-Reason uses 58.6% fewer tokens than the RL model. Project page: https://deepaksridhar.github.io/vreason.github.io/
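
The signal the abstract relies on is the entropy of the model's next-token distribution at each decoding step. The sketch below computes only that quantity from raw logits. It does not reproduce V-Reason's controller or its value-cache update, which the abstract does not specify. The function names and the example logits are illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_trace(step_logits):
    """Entropy of the output distribution at each decoding step."""
    return [entropy(softmax(logits)) for logits in step_logits]


# Made-up logits: an undecided step followed by a confident one.
trace = entropy_trace([[0.0, 0.0, 0.0, 0.0], [10.0, 0.0, 0.0, 0.0]])
print(trace)
```

A trace like this, recorded over a full generation, is the kind of curve in which the abstract's entropy peak and lower final entropy would show up.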

2510.16660 2026-06-02 cs.CV cs.LG physics.med-ph version update

Universal and Transferable Attacks on Pathology Foundation Models

Yuntian Wang, Xilin Yang, Che-Yung Shen, Nir Pillar, Aydogan Ozcan

Affiliations: Electrical and Computer Engineering Department, University of California, Los Angeles, CA, 90095, USA; Bioengineering Department, University of California, Los Angeles, CA, 90095, USA; California NanoSystems Institute (CNSI), University of California, Los Angeles, CA, 90095, USA; Department of Pathology, Hadassah Hebrew University Medical Center, Jerusalem, 91120, Israel; Department of Surgery, University of California, Los Angeles, CA, 90095, USA

AI summary: Proposes Universal and Transferable Adversarial Perturbations (UTAP), a fixed, weak noise pattern that disrupts the feature representations of multiple pathology foundation models and degrades downstream task performance, and demonstrates its universality across datasets and transferability across models.

Comments 38 Pages, 8 Figures

Details

DOI: 10.1038/s41377-026-02347-w
Journal ref: Light: Science & Applications (2026)

Abstract

We introduce Universal and Transferable Adversarial Perturbations (UTAP) for pathology foundation models that reveal critical vulnerabilities in their capabilities. Optimized using deep learning, UTAP comprises a fixed and weak noise pattern that, when added to a pathology image, systematically disrupts the feature representation capabilities of multiple pathology foundation models. Therefore, UTAP induces performance drops in downstream tasks that utilize foundation models, including misclassification across a wide range of unseen data distributions. In addition to compromising the model performance, we demonstrate two key features of UTAP: (1) universality: its perturbation can be applied across diverse fields of view independent of the dataset that UTAP was developed on, and (2) transferability: its perturbation can successfully degrade the performance of various external, black-box pathology foundation models, never seen before. These two features indicate that UTAP is not a dedicated attack associated with a specific foundation model or image dataset, but rather constitutes a broad threat to various emerging pathology foundation models and their applications. We systematically evaluated UTAP across various state-of-the-art pathology foundation models on multiple datasets, causing a significant drop in their performance with visually imperceptible modifications to the input images using a fixed noise pattern. The development of these potent attacks establishes a critical, high-standard benchmark for model robustness evaluation, highlighting a need for advancing defense mechanisms and potentially providing the necessary assets for adversarial training to ensure the safe and reliable deployment of AI in pathology.

2505.22961 2026-06-02 cs.CL cs.LG version update

ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind

Peixuan Han, Zijia Liu, Jiaxuan You

Affiliations: National University of Singapore

AI summary: Proposes ToMAP, which uses two theory-of-mind modules to strengthen an LLM persuader's awareness and analysis of the opponent's mental state and combines them with reinforcement learning to generate more effective and diverse persuasive arguments, outperforming larger models such as GPT-4o with only 3B parameters.

Details

Abstract

Large language models (LLMs) have shown promising potential in persuasion, but existing works on training LLM persuaders are still preliminary. Notably, while humans are skilled in modeling their opponent's thoughts and opinions proactively and dynamically, current LLMs struggle with such Theory of Mind (ToM) reasoning, resulting in limited diversity and opponent awareness. To address this limitation, we introduce Theory of Mind Augmented Persuader (ToMAP), a novel approach for building more flexible persuader agents by incorporating two theory of mind modules that enhance the persuader's awareness and analysis of the opponent's mental state. Specifically, we begin by prompting the persuader to consider possible objections to the target central claim, and then use a text encoder paired with a trained MLP classifier to predict the opponent's current stance on these counterclaims. Our carefully designed reinforcement learning schema enables the persuader to learn how to analyze opponent-related information and utilize it to generate more effective arguments. Experiments show that the ToMAP persuader, while containing only 3B parameters, outperforms much larger baselines, like GPT-4o, with a relative gain of 39.4% across multiple persuadee models and diverse corpora. Notably, ToMAP exhibits complex reasoning chains and reduced repetition during training, which leads to more diverse and effective arguments. The opponent-aware feature of ToMAP also makes it suitable for long conversations and enables it to employ more logical and opponent-aware strategies. These results underscore our method's effectiveness and highlight its potential for developing more persuasive language agents. Code is available at: https://github.com/ulab-uiuc/ToMAP.

2510.11713 2026-06-02 cs.CL cs.LG version update

Are Large Reasoning Models Interruptible?

Tsung-Han Wu, Mihran Miroyan, David M. Chan, Trevor Darrell, Narges Norouzi, Joseph E. Gonzalez

Affiliations: University of California, Berkeley

AI summary: Studies the robustness of large reasoning models under two realistic dynamic scenarios, interruptions and dynamic context, finds that static evaluations overestimate robustness, and identifies new failure modes including reasoning leakage, panic, and self-doubt.

Comments ICML 2026; Project Page: http://dynamic-lm.github.io

Details

Abstract

Real-world applications of Large Reasoning Models (LRMs) often require reasoning about changing prompts or environments. In this work, we challenge the frozen world assumption and evaluate LRM robustness under two realistic dynamic scenarios: interruptions, which test the accuracy of model responses under budget-constrained outputs, and dynamic context, which tests model adaptation to in-flight changes. Across mathematics and programming benchmarks that require long-form reasoning, static evaluations consistently overestimate robustness: even state-of-the-art LRMs, which achieve high accuracy in static settings, can fail unpredictably when interrupted or exposed to changing context, with performance dropping by up to 60% when updates are introduced late in the reasoning process. Our analysis further reveals several novel failure modes, including reasoning leakage, where models fold the reasoning into their final answer when interrupted; panic, where under time pressure models abandon reasoning entirely and return incorrect answers; and self-doubt, where performance degrades when trying to incorporate updated information. Project Page: http://dynamic-lm.github.io/

2510.13774 2026-06-02 cs.LG cs.CV version update

UrbanFusion: Stochastic Multimodal Fusion for Contrastive Learning of Robust Spatial Representations

Dominik J. Mühlematter, Lin Che, Ye Hong, Martin Raubal, Nina Wiedemann

Affiliations: ETH Zurich

AI summary: Proposes the UrbanFusion model, which integrates street view, remote sensing, map, and POI data through Stochastic Multimodal Fusion (SMF) and a Transformer module, and outperforms existing GeoAI models on 41 tasks across 56 cities.

Details

Journal ref: International Conference on Machine Learning (ICML), 2026

Abstract

Forecasting urban phenomena such as housing prices and public health indicators requires the effective integration of various geospatial data. Current methods primarily utilize task-specific models, while recent generic models for spatial representations often support only limited modalities and lack multimodal fusion capabilities. To overcome these challenges, we present UrbanFusion, a spatial representation model that features Stochastic Multimodal Fusion (SMF). The framework employs modality-specific encoders to process different types of inputs, including street view imagery, remote sensing data, cartographic maps, and points of interest (POIs) data. These multimodal inputs are integrated via a Transformer-based fusion module that learns unified representations. An extensive evaluation across 41 tasks in 56 cities worldwide demonstrates UrbanFusion's strong generalization and predictive performance compared to state-of-the-art GeoAI models. Specifically, it 1) outperforms prior models on location-encoding, 2) allows multimodal input during inference, and 3) generalizes well to regions unseen during training. UrbanFusion can flexibly utilize any subset of available modalities for a given location during both pretraining and inference, enabling broad applicability across diverse data availability scenarios.

2410.14483 2026-06-02 stat.ML cs.LG stat.ME version update

Interventional Processes for Causal Uncertainty Quantification

Hugh Dance, Peter Orbanz, Arthur Gretton

Affiliations: Gatsby Unit, University College London, London, United Kingdom

AI summary: Proposes a Gaussian-process-based method that quantifies uncertainty over interventional functions by representing them as inner products of observational functions in a reproducing kernel Hilbert space, yielding closed-form posterior moments and tractable training and inference.

Details

Abstract

Reliable uncertainty quantification for causal effects is crucial in high-stakes applications, but remains challenging when the target is an entire function rather than a scalar estimand. In this work, we introduce a GP-based approach for uncertainty quantification of interventional functions. The central idea is to build on recent work representing interventional functions as an inner-product of observational functions in a reproducing kernel Hilbert space (RKHS), by constructing appropriate GP priors for such functions and inferring posteriors from observational data. Our approach yields closed-form posterior moments and tractable training and inference, while avoiding pathologies of previous GP prior constructions for RKHS functions. We further derive a practical procedure for posterior coverage calibration. Across synthetic benchmarks, causal Bayesian optimization tasks, and a large-scale real dataset, our method improves uncertainty quantification while remaining competitive in causal effect estimation.

2510.12624 2026-06-02 cs.LG cs.AI version update

Learning-To-Measure: In-Context Active Feature Acquisition

Yuta Kobayashi, Zilin Jing, Jiayu Yao, Hongseok Namkoong, Shalmali Joshi

Affiliations: University of Tokyo

AI summary: Proposes Learning-to-Measure (L2M), which addresses the meta active feature acquisition problem in-context through uncertainty quantification and greedy feature acquisition guided by conditional mutual information, without per-task retraining.

Details

Abstract

Active feature acquisition (AFA) is a sequential decision-making problem where the goal is to improve model performance for test instances by adaptively selecting which features to acquire. In practice, AFA methods often learn from retrospective data with systematic missingness in the features and limited task-specific labels. Most prior work addresses acquisition for a single predetermined task, limiting scalability. To address this limitation, we formalize the meta-AFA problem, where the goal is to learn acquisition policies across various tasks. We introduce Learning-to-Measure (L2M), which consists of i) reliable uncertainty quantification over unseen tasks, and ii) an uncertainty-guided greedy feature acquisition agent that maximizes conditional mutual information. We demonstrate a sequence-modeling or autoregressive pre-training approach that underpins reliable uncertainty quantification for tasks with arbitrary missingness. L2M operates directly on datasets with retrospective missingness and performs the meta-AFA task in-context, eliminating per-task retraining. Across synthetic and real-world tabular benchmarks, L2M matches or surpasses task-specific baselines, particularly under scarce labels and high missingness.

2510.12249 2026-06-02 cs.LG version update

Optimal Regularization for Performative Learning

Edwige Cyffers, Alireza Mirrokni, Marco Mondelli

Affiliations: EPFL, Switzerland

AI summary: Studies how regularization in high-dimensional ridge regression copes with performative effects, where the data distribution changes in response to the model. Finds that performative effects can be beneficial in the over-parameterized regime and relates the optimal regularization parameter to the strength of the performative effect.

Comments Accepted at ICML 2026

Details

Abstract

In performative learning, the data distribution reacts to the deployed model - for example, because strategic users adapt their features to game it - which creates a more complex dynamic than in classical supervised learning. One should thus not only optimize the model for the current data but also take into account that the model might steer the distribution in a new direction, without knowing the exact nature of the potential shift. We explore how regularization can help cope with performative effects by studying its impact in high-dimensional ridge regression. We show that, while performative effects worsen the test risk in the population setting, they can be beneficial in the over-parameterized regime where the number of features exceeds the number of samples. We show that the optimal regularization scales with the overall strength of the performative effect, making it possible to set the regularization in anticipation of this effect. We illustrate this finding through empirical evaluations of the optimal regularization parameter on both synthetic and real-world datasets.

2510.10982 2026-06-02 cs.LG cs.AI version update

AdaptiveK: Complexity-Driven Sparse Autoencoders for Interpretable Language Model Representations

Yifei Yao, Hanrong Zhang, Mengnan Du

Affiliations: Zhejiang University; University of Illinois Chicago; The Chinese University of Hong Kong, Shenzhen

AI summary: Proposes AdaptiveK SAE, which dynamically adjusts sparsity according to the semantic complexity of each input, uses linear probes to guide feature allocation, and outperforms fixed-sparsity methods on reconstruction fidelity, explained variance, cosine similarity, and interpretability metrics.

Comments Accepted by ACL 2026

Details

Abstract

Understanding the internal representations of large language models (LLMs) remains a central challenge for interpretability research. Sparse autoencoders (SAEs) offer a promising solution by decomposing activations into interpretable features, but existing approaches rely on fixed sparsity constraints that fail to account for input complexity. We propose AdaptiveK SAE (Adaptive Top K Sparse Autoencoders), a novel framework that dynamically adjusts sparsity levels based on the semantic complexity of each input. Leveraging linear probes, we demonstrate that context complexity is linearly encoded in LLM representations, and we use this signal to guide feature allocation during training. Experiments across ten language models demonstrate that this complexity-driven adaptation outperforms fixed-sparsity approaches on reconstruction fidelity, explained variance, cosine similarity and interpretability metrics while eliminating the burden of extensive hyperparameter tuning. Our code is available at: https://github.com/hiyukie/adaptiveK.
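
The abstract describes choosing the number of active features per input from a complexity signal rather than fixing it. The sketch below illustrates that idea only. The linear mapping from a complexity score to `k`, the bounds `k_min` and `k_max`, and both function names are assumptions made here, not the authors' implementation (their code is at the URL in the abstract).

```python
def adaptive_k(score: float, k_min: int, k_max: int) -> int:
    """Map a complexity score in [0, 1] to a sparsity level (assumed linear)."""
    score = min(1.0, max(0.0, score))  # clamp out-of-range probe outputs
    return k_min + round(score * (k_max - k_min))


def top_k_sparsify(activations, k: int):
    """Keep the k largest activations and zero out the rest."""
    order = sorted(range(len(activations)),
                   key=lambda i: activations[i], reverse=True)
    keep = set(order[:k])
    return [a if i in keep else 0.0 for i, a in enumerate(activations)]


# Made-up encoder outputs for one input, plus two hypothetical probe scores.
acts = [0.1, 2.0, 0.5, 1.5, 0.3]
simple = top_k_sparsify(acts, adaptive_k(0.0, k_min=1, k_max=5))
complex_ = top_k_sparsify(acts, adaptive_k(0.5, k_min=1, k_max=5))
print(simple)
print(complex_)
```

A fixed Top-K autoencoder corresponds to `k_min == k_max`. The adaptive variant spends more active features on inputs that the probe scores as complex.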

2510.05566 2026-06-02 stat.ML cs.AI cs.CL cs.LG stat.AP 版本更新

Domain-Shift-Aware Conformal Prediction for Large Language Models

领域偏移感知的共形预测用于大型语言模型

Zhexiao Lin, Yuanyuan Li, Neeraj Sarna, Yuanyuan Gao, Michael von Gablenz

发表机构 * University of Waterloo（滑铁卢大学）

AI总结提出领域偏移感知共形预测框架，通过重加权校准样本应对分布偏移，在MMLU基准上提升覆盖可靠性。

Comments Accepted to Forty-Third International Conference on Machine Learning (ICML), 2026


AI中文摘要

大型语言模型在各种任务中取得了令人印象深刻的性能。然而，它们倾向于产生过度自信且事实不正确的输出，即所谓的幻觉，这在实际应用中带来了风险。共形预测提供了有限样本、无分布假设的覆盖保证，但标准共形预测在领域偏移下会失效，常常导致覆盖不足和不可靠的预测集。我们提出了一种称为领域偏移感知共形预测（DS-CP）的新框架。我们的框架通过根据校准样本与测试提示的接近程度系统地重新加权校准样本，将共形预测适应于领域偏移下的大型语言模型，从而在保持有效性的同时增强适应性。我们的理论分析和在MMLU基准上的实验表明，所提出的方法比标准共形预测提供了更可靠的覆盖，尤其是在显著分布偏移下，同时保持了效率。这为大型语言模型在实际部署中实现可信的不确定性量化迈出了实际的一步。

英文摘要

Large language models have achieved impressive performance across diverse tasks. However, their tendency to produce overconfident and factually incorrect outputs, known as hallucinations, poses risks in real-world applications. Conformal prediction provides finite-sample, distribution-free coverage guarantees, but standard conformal prediction breaks down under domain shift, often leading to under-coverage and unreliable prediction sets. We propose a new framework called Domain-Shift-Aware Conformal Prediction (DS-CP). Our framework adapts conformal prediction to large language models under domain shift, by systematically reweighting calibration samples based on their proximity to the test prompt, thereby preserving validity while enhancing adaptivity. Our theoretical analysis and experiments on the MMLU benchmark demonstrate that the proposed method delivers more reliable coverage than standard conformal prediction, especially under substantial distribution shifts, while maintaining efficiency. This provides a practical step toward trustworthy uncertainty quantification for large language models in real-world deployment.
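Reweighting calibration samples can be sketched with the generic weighted conformal quantile below. This is a standard weighted-quantile construction, not necessarily the exact DS-CP procedure; the weights are assumed to be given (for example, derived from some proximity measure between each calibration prompt and the test prompt), and all names are hypothetical.

```python
import math


def weighted_conformal_threshold(scores, weights, test_weight, alpha):
    """Smallest calibration score whose normalized cumulative weight reaches 1 - alpha.

    The test point carries weight `test_weight` placed at +infinity, so the
    threshold is infinite when the calibration mass alone cannot reach 1 - alpha.
    """
    total = sum(weights) + test_weight
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight / total
        if cumulative >= 1.0 - alpha:
            return score
    return math.inf


scores = [0.1, 0.2, 0.3, 0.4]  # nonconformity scores on calibration data
uniform = weighted_conformal_threshold(scores, [1, 1, 1, 1], 1, alpha=0.3)
shifted = weighted_conformal_threshold(scores, [10, 1, 1, 1], 1, alpha=0.3)
```

With uniform weights this reduces to the usual split conformal quantile; upweighting calibration samples close to the test prompt moves the threshold toward their scores.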


2510.05342 2026-06-02 cs.LG cs.AI 版本更新

轨迹数据足以在具有线性 $q^π$-可实现性和集中性的固定视界离线强化学习中进行统计有效的策略评估

Volodymyr Tkachuk, Csaba Szepesvári, Xiaoqi Tan

发表机构 * University of Alberta（阿尔伯塔大学）

AI总结本文研究在轨迹数据假设下，利用线性 $q^π$-可实现性和集中性，实现固定视界离线强化学习中策略评估的统计有效学习，并改进了策略优化的样本复杂度分析。

2510.03259 2026-06-02 cs.LG cs.AI 版本更新

Verifying Meta-Awareness via Predictive Rewards in Reasoning Models

通过推理模型中的预测奖励验证元意识

Yoonjeon Kim, Doohyuk Jang, Eunho Yang


AI总结提出 MAPR 方法，利用自生成任务预测推理统计量（长度、通过率、概念）来增强模型的元意识，从而在多个数学推理基准上显著提升准确率和训练效率。

Comments accepted to ICML 2026


AI中文摘要

近期关于推理模型的研究探索了语言模型的元意识，包括其确定最佳思考时长、识别知识边界以及结构化概念级思维的能力。虽然当前的大型推理模型仅依赖于基于答案的验证，但我们表明，添加元意识目标可以显著提升性能，超过缺乏此类元知识的模型。MAPR（通过预测奖励实现元意识）利用自生成任务来预测展开统计量——具体包括长度、通过率和所用概念——从而能够对照实际统计量进行验证。此外，通过利用这种自我预测能力，模型可以通过以下方式调节其推理行为：i) 过滤掉琐碎或无法解决的提示，ii) 减少倾向于错误的长篇生成，以及 iii) 生成与问题相关的提示。结果令人鼓舞：MAPR 在各种推理基准上显著提高了准确率和训练效率。更具体地说，我们的方法可以将 GRPO 训练加速超过 1.28 倍以达到相同的性能，在 AIME25 上实现 83.18% 的准确率提升，并在六个数学基准上平均提升 13.04%。代码公开于 https://github.com/akatigre/MAPR-RL。

英文摘要

Recent research on reasoning models explores the meta-awareness of language models, including their ability to determine optimal thinking duration, recognize knowledge boundaries, and structure concept-level thinking. While current large reasoning models depend solely on answer-based verification, we show that adding meta-awareness objectives leads to significant performance gains over models without such meta-knowledge. MAPR (Meta-Awareness via Predictive Reward) utilizes a self-generated task of predicting rollout statistics - specifically length, pass-rate, and concepts used - allowing for verification against the actual statistics. Furthermore, by leveraging this self-predictive capability, the model can regulate its reasoning behavior by i) filtering out trivial or unsolvable prompts, ii) reducing lengthy generations that tend to be incorrect, and iii) generating hints relevant to the problem. The results are encouraging: MAPR yields significant improvements in both accuracy and training efficiency on various reasoning benchmarks. More specifically, our method speeds up GRPO training by over 1.28x to reach the same performance, and achieves an 83.18% gain in accuracy on AIME25 and a 13.04% average gain over six mathematics benchmarks. The code is publicly available at https://github.com/akatigre/MAPR-RL.


2510.03086 2026-06-02 cs.LG 版本更新

Chaining 2-FWL GNNs for Combinatorial Graph Alignment

链式2-FWL图神经网络用于组合图对齐

Marc Lelarge

发表机构 * INRIA - Ecole Normale Supérieure PSL Research University（法国国家信息与自动化研究所 - 巴黎高等师范学院，巴黎文理研究大学）

AI总结针对组合图对齐问题，提出链式2-FWL GNN方法，通过非可微排序注入离散反馈，在稀疏随机图和正则图上显著优于FAQ和现有GNN方法。

Comments code available at https://github.com/mlelarge/chaining-gnn-graph-alignment


AI中文摘要

对于组合图对齐问题（GAP）——寻找最大化两个无标签图之间公共边数（nce）的节点对应关系——适当初始化的FAQ仍然是强大的经典基线，而现有的GNN方法在纯结构设置中表现不佳。我们引入了一种链式过程：一系列Folklore类型（2-FWL）的GNN，其中每个网络在解码前一个网络的相似性矩阵并根据当前对齐质量对节点进行排序后，使用交叉熵进行训练。这个不可微的排序步骤在每个链接处注入离散的组合反馈；在推理时，我们迭代最终网络并保留具有最高观测nce的候选。在噪声水平0.25的稀疏Erdos-Renyi图上，带有FAQ后处理的链式FGNN达到85%的准确率，而FAQ从凸松弛初始化仅为13%，先前的GNN方法基本为0%。在相关正则图上，其中具有恒定特征的MPNN产生相同的节点嵌入（1-WL无法细化）且FAQ的凸初始化退化，链式是我们知道的唯一能够恢复非平凡对齐的方法。在三个真实世界基准（酵母PPI、合著和道路网络）上，我们表明最近的比较通过从均匀双随机矩阵初始化FAQ低估了FAQ；一旦FAQ从凸松弛初始化，它已经超过了先前报告的数字，而数据集特定的链式FGNN进一步改进了这个加强的基线。

英文摘要

For the combinatorial graph alignment problem (GAP) -- finding the node correspondence that maximizes the number of common edges (nce) between two unlabeled graphs -- properly initialized FAQ remains a strong classical baseline, while existing GNN approaches struggle in the purely structural setting. We introduce a chaining procedure: a sequence of Folklore-type (2-FWL) GNNs in which each network is trained with cross-entropy after decoding the previous network's similarity matrix and ranking nodes by their current alignment quality. This non-differentiable ranking step injects discrete combinatorial feedback at every link; at inference, we iterate the final network and keep the candidate with the highest observed nce. On sparse Erdős-Rényi graphs at noise level 0.25, chained FGNNs with FAQ post-processing reach 85% accuracy versus 13% for FAQ initialized from the convex relaxation, and essentially 0% for prior GNN methods. On correlated regular graphs, where MPNNs with constant features produce identical node embeddings (1-WL fails to refine) and FAQ's convex initialization is degenerate, chaining is the only method we know of that recovers a non-trivial alignment. On three real-world benchmarks (yeast PPI, coauthorship, and road networks), we show that recent comparisons underestimate FAQ by initializing it from a uniform doubly stochastic matrix; once FAQ is initialized from the convex relaxation it already surpasses prior reported numbers, and dataset-specific chained FGNNs further improve on this strengthened baseline.
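The objective itself, the number of common edges (nce) under a candidate node correspondence, is simple to compute. The sketch below assumes undirected graphs given as edge lists and a correspondence given as a list `perm`, where `perm[u]` is the node of the second graph matched to node `u` of the first; the function names are hypothetical and the GNN and FAQ components are not modeled.

```python
def num_common_edges(edges_a, edges_b, perm):
    """Count edges (u, v) of graph A whose image (perm[u], perm[v]) is an edge of graph B."""
    edge_set_b = {frozenset(edge) for edge in edges_b}  # undirected edges
    return sum(1 for u, v in edges_a if frozenset((perm[u], perm[v])) in edge_set_b)


def best_candidate(edges_a, edges_b, candidates):
    """Keep the candidate alignment with the highest observed nce."""
    return max(candidates, key=lambda perm: num_common_edges(edges_a, edges_b, perm))


edges_a = [(0, 1), (1, 2)]
edges_b = [(0, 2), (2, 1)]
identity = [0, 1, 2]
swapped = [0, 2, 1]
best = best_candidate(edges_a, edges_b, [identity, swapped])
```

Selecting by nce at inference requires no labels, since the score depends only on the two graphs and the proposed alignment.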


2510.02528 2026-06-02 cs.AI cs.LG 版本更新

Multimodal Function Vectors for Visual Relations

视觉关系的多模态函数向量

Shuhao Fu, Esther Goldberg, Ying Nian Wu, Hongjing Lu

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结通过因果中介分析提取多模态函数向量，操纵注意力头以改善视觉关系推理，并实现零样本和微调性能提升。


AI中文摘要

大型多模态模型（LMMs）从少量多模态演示中展现出令人印象深刻的上下文学习能力，然而支持这种任务学习的内部机制仍不透明。基于大型语言模型的先前工作，我们表明大型多模态模型中一小部分注意力头负责传递视觉关系的表示。这些注意力头的激活，称为函数向量，可以被提取和操纵以改变LMM在关系任务上的性能。首先，使用合成和真实图像数据集，我们应用因果中介分析来识别强烈影响关系预测的注意力头，并提取多模态函数向量，以提高推理时的零样本准确率。我们进一步证明，这些多模态函数向量可以在保持LMM参数冻结的情况下，用适量的训练数据进行微调，从而显著优于上下文学习基线。最后，我们展示了特定关系的函数向量可以线性组合，以解决涉及新颖和未经训练的视觉关系的类比问题，突显了该方法的强大泛化能力。通过在两个LMM（包括OpenFlamingo和Qwen3-VL）上的实验，我们的结果表明这些模型在局部内部结构中编码了视觉关系知识，这些知识可以被系统地提取和优化，从而增进了我们对模型模块化的理解，并增强了对LMM中关系推理的控制。

将大型语言模型（LLM）个性化以适应个体用户偏好，是超越生成通用有用响应的关键步骤。然而，当前的个性化方法不适合新用户，因为它们通常需要缓慢、资源密集的微调或大量预先存在的用户数据，造成了显著的冷启动问题。为了应对这一挑战，我们引入了一种新的实时个性化范式，通过从文本生成过程中收集的在线成对偏好反馈进行学习。我们提出了T-POP（基于在线偏好反馈的测试时个性化），这是一种新颖的算法，将测试时对齐与决斗式强盗协同结合。在不更新LLM参数的情况下，T-POP通过在线学习一个捕捉用户偏好的奖励函数来引导冻结LLM的解码过程。通过利用决斗式强盗，T-POP智能地查询用户，以有效平衡探索其偏好和利用所学知识生成个性化文本。大量实验表明，T-POP实现了快速且数据高效的个性化，显著优于现有基线，并且随着用户交互的增加而持续改进。

英文摘要

Personalizing large language models (LLMs) to individual user preferences is a critical step beyond generating generically helpful responses. However, current personalization methods are ill-suited for new users, as they typically require either slow, resource-intensive fine-tuning or a substantial amount of pre-existing user data, creating a significant cold-start problem. To address this challenge, we introduce a new paradigm for real-time personalization by learning from online pairwise preference feedback collected during text generation. We propose T-POP (Test-Time Personalization with Online Preference Feedback), a novel algorithm that synergistically combines test-time alignment with dueling bandits. Without updating the LLM parameters, T-POP steers the decoding process of a frozen LLM by learning a reward function online that captures user preferences. By leveraging dueling bandits, T-POP intelligently queries the user to efficiently balance between exploring their preferences and exploiting the learned knowledge to generate personalized text. Extensive experiments demonstrate that T-POP achieves rapid and data-efficient personalization, significantly outperforming existing baselines and showing consistent improvement with more user interactions.


2506.22271 2026-06-02 cs.AI cs.LG 版本更新

On the Theoretical Limitations of Embedding-based Link Prediction

基于嵌入的链接预测的理论局限性

Samy Badreddine, Emile van Krieken, Luciano Serafini

发表机构 * Vrije Universiteit Amsterdam, Netherlands（荷兰阿姆斯特丹自由大学）； University of Trento, Italy（意大利特伦托大学）

AI总结研究线性输出层导致的秩瓶颈对知识图谱嵌入模型表达能力的限制，并提出混合非线性输出层以提升大规模密集图上的性能。


AI中文摘要

神经网络通常将低维嵌入映射到高维输出空间。通常，输出层是线性的，这会产生一个“秩瓶颈”，限制模型所能表示的函数。这种瓶颈在链接预测模型中普遍存在，例如知识图谱嵌入（KGE），因为实体的输出空间可能比嵌入维度大几个数量级。我们研究了秩瓶颈如何限制模型拟合训练数据的表达能力。以往工作关注特定KGE所需嵌入维度的充分上界，而我们给出了所有具有线性输出层的KGE的必要下界，该下界随图的大小和连通性增长。我们还考虑了一种使用混合的非线性输出层，以在不显著增加参数开销的情况下打破瓶颈。实验表明，使用这种非线性层的模型在大型密集数据集上，以较低的参数成本提升了排序性能和概率拟合，正如我们的理论所预测。我们的工作揭示了线性输出层如何限制KGE，并激励使用非线性替代方案以扩展到大型密集图。

英文摘要

Neural networks often map low-dimensional embeddings to high-dimensional output spaces. Usually, the output layer is linear, which can create a "rank bottleneck" that limits the functions a model can represent. Such bottlenecks are ubiquitous in link prediction models, such as knowledge graph embeddings (KGEs), as the output space of entities can be orders of magnitude larger than the embedding dimension. We investigate how rank bottlenecks limit model expressivity for fitting the training data. While previous work focused on sufficient bounds on the embedding dimension required for specific KGEs, we show necessary bounds for all KGEs with a linear output layer, which grow with graph size and connectivity. We also consider a non-linear output layer using mixtures to break the bottleneck without significant parameter overhead. Empirically, we show that models using this non-linear layer improve in ranking performance and probabilistic fit for large and dense datasets at a low parameter cost, as predicted by our theory. Our work reveals how linear output layers limit KGEs and motivates non-linear alternatives for scaling to large and dense graphs.
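The rank bottleneck can be checked numerically on a toy case. In the sketch below, which uses made-up integer embeddings and hypothetical helper names, four queries and four entities have embeddings of dimension 2, so the 4 x 4 score matrix produced by a linear output layer has rank at most 2, whereas a target pattern such as the identity matrix has rank 4.

```python
from fractions import Fraction


def linear_scores(queries, entities):
    """Score matrix S = Q E^T produced by a linear output layer."""
    return [[sum(q * e for q, e in zip(q_row, e_row)) for e_row in entities] for q_row in queries]


def matrix_rank(matrix):
    """Exact rank via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, len(m)):
            factor = m[i][col] / m[rank][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


queries = [[1, 0], [0, 1], [1, 1], [2, -1]]   # 4 queries, embedding dimension 2
entities = [[1, 2], [3, 4], [5, 6], [7, 8]]   # 4 entities, embedding dimension 2
scores = linear_scores(queries, entities)
identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
```

Here `matrix_rank(scores)` cannot exceed the embedding dimension regardless of the values chosen, which is the constraint the abstract refers to.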


2504.06006 2026-06-02 cs.LG cs.AI cs.NE 版本更新

Optuna vs Code Llama: Are LLMs a New Paradigm for Hyperparameter Tuning?

Optuna vs Code Llama：LLM 是超参数调优的新范式吗？

Roman Kochnev, Arash Torabi Goodarzi, Zofia Antonina Bentyn, Dmitry Ignatov, Radu Timofte

发表机构 * Computer Vision Lab, CAIDAS & IFI, University of Würzburg, Germany（计算机视觉实验室，CAIDAS与IFI，维尔茨堡大学，德国）

AI总结通过微调参数高效的 Code Llama 模型，提出基于大语言模型的超参数优化方法，在多种视觉架构上实现与 Optuna 相当或更优的 RMSE 并大幅降低计算开销。


DOI: 10.1109/ICCVW69036.2025.00598
Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), pp. 5664-5674, 2025

AI中文摘要

最优超参数选择对于最大化计算机视觉中神经网络的性能至关重要，尤其是当架构变得日益复杂时。本文通过使用 LoRA 微调参数高效的 Code Llama 版本，探索了大语言模型在超参数优化中的应用。所得模型在广泛的视觉架构上产生了准确且计算高效的超参数推荐。与依赖资源密集型的试错过程的传统方法（如 Optuna）不同，我们的方法在实现竞争性或更优的均方根误差的同时，大幅降低了计算开销。重要的是，所评估的模型涵盖了以图像为中心的任务，如分类、检测和分割，这些是许多图像处理流程（包括增强、恢复和风格迁移）的基本组成部分。我们的结果表明，基于 LLM 的优化不仅与成熟的贝叶斯方法（如树结构 Parzen 估计器）相媲美，而且加速了需要感知质量和低延迟处理的实际应用的调优。所有生成的配置均公开在 LEMUR 神经网络数据集（https://github.com/ABrain-One/nn-dataset）中，该数据集作为超参数优化研究的开源基准，并为提高图像处理系统中的训练效率提供了实用资源。

英文摘要

Optimal hyperparameter selection is critical for maximizing the performance of neural networks in computer vision, particularly as architectures become more complex. This work explores the use of large language models (LLMs) for hyperparameter optimization by fine-tuning a parameter-efficient version of Code Llama using LoRA. The resulting model produces accurate and computationally efficient hyperparameter recommendations across a wide range of vision architectures. Unlike traditional methods such as Optuna, which rely on resource-intensive trial-and-error procedures, our approach achieves competitive or superior Root Mean Square Error (RMSE) while substantially reducing computational overhead. Importantly, the models evaluated span image-centric tasks such as classification, detection, and segmentation, fundamental components in many image manipulation pipelines including enhancement, restoration, and style transfer. Our results demonstrate that LLM-based optimization not only rivals established Bayesian methods like Tree-structured Parzen Estimators (TPE), but also accelerates tuning for real-world applications requiring perceptual quality and low-latency processing. All generated configurations are publicly available in the LEMUR Neural Network Dataset (https://github.com/ABrain-One/nn-dataset), which serves as an open source benchmark for hyperparameter optimization research and provides a practical resource to improve training efficiency in image manipulation systems.


2509.23544 2026-06-02 stat.ML cs.AI cs.LG stat.ME 版本更新

End-to-End Deep Learning for Predicting Metric Space-Valued Outputs

端到端深度学习预测度量空间值输出

Yidong Zhou, Su I Iao, Hans-Georg Müller

AI总结提出E2M框架，通过加权Fréchet均值和神经网络学习权重，实现度量空间值输出的几何感知预测，具有理论保证并在多种结构化输出上取得最优性能。

Comments 38 pages, 4 figures, 9 tables


Journal ref: Journal of Machine Learning Research, 27:1--38, 2026

AI中文摘要

许多现代应用涉及预测结构化、非欧几里得输出，例如概率分布、网络和对称正定矩阵。这些输出自然地被建模为一般度量空间的元素，而依赖于向量空间结构的经典回归技术不再适用。我们引入了E2M（端到端度量回归），这是一个用于预测度量空间值输出的深度学习框架。E2M通过训练输出的加权Fréchet均值进行预测，其中权重由基于输入条件的神经网络学习。这种构造提供了一种原则性的几何感知预测机制，避免了替代嵌入和限制性参数假设，同时完全保留了输出空间的内在几何结构。我们建立了理论保证，包括刻画模型表达能力的通用逼近定理以及熵正则化训练目标的收敛性分析。通过涉及概率分布、网络和对称正定矩阵的大量模拟，我们展示了E2M始终达到最先进的性能，且其优势在更大样本量下更加明显。应用于人类死亡率分布和纽约市出租车网络进一步证明了该框架的灵活性和实用性。

英文摘要

Many modern applications involve predicting structured, non-Euclidean outputs such as probability distributions, networks, and symmetric positive-definite matrices. These outputs are naturally modeled as elements of general metric spaces, where classical regression techniques that rely on vector space structure no longer apply. We introduce E2M (End-to-End Metric regression), a deep learning framework for predicting metric space-valued outputs. E2M performs prediction via weighted Fréchet means over training outputs, where the weights are learned by a neural network conditioned on the input. This construction provides a principled mechanism for geometry-aware prediction that avoids surrogate embeddings and restrictive parametric assumptions, while fully preserving the intrinsic geometry of the output space. We establish theoretical guarantees, including a universal approximation theorem that characterizes the expressive capacity of the model and a convergence analysis of the entropy-regularized training objective. Through extensive simulations involving probability distributions, networks, and symmetric positive-definite matrices, we show that E2M consistently achieves state-of-the-art performance, with its advantages becoming more pronounced at larger sample sizes. Applications to human mortality distributions and New York City taxi networks further demonstrate the flexibility and practical utility of this framework.
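For one concrete output space the weighted Fréchet mean has a closed form: for one-dimensional probability distributions under the 2-Wasserstein metric, it is the distribution whose quantile function is the weighted average of the individual quantile functions. The sketch below uses that fact. The softmax over raw scores stands in for the neural network that produces the weights, and the scores, grid, and names are assumptions for illustration only.

```python
import math


def softmax(scores):
    """Turn raw scores (e.g. network outputs) into nonnegative weights summing to one."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_frechet_mean_w2(quantile_functions, weights):
    """Weighted 2-Wasserstein Fréchet mean of 1-D distributions.

    Each distribution is represented by its quantile function evaluated on a
    common grid of probability levels; the mean averages them level by level.
    """
    grid_size = len(quantile_functions[0])
    return [
        sum(w * q[j] for w, q in zip(weights, quantile_functions))
        for j in range(grid_size)
    ]


# Two training outputs, each given by quantiles at levels 0.25, 0.5, 0.75.
training_outputs = [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]]
weights = softmax([0.0, 0.0])  # equal scores give equal weights
prediction = weighted_frechet_mean_w2(training_outputs, weights)
```

For other metric spaces (networks, symmetric positive-definite matrices) the Fréchet mean generally needs a different, possibly iterative, computation.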


2509.22907 2026-06-02 cs.LG 版本更新

FedCF: Fair Federated Conformal Prediction

FedCF: 公平联邦共形预测

Anutam Srinivasan, Aditya T. Vadlamani, Amin Meghrazi, Srinivasan Parthasarathy

发表机构 * Georgia Institute of Technology（佐治亚理工学院）； The Ohio State University（俄亥俄州立大学）

AI总结提出FedCF框架，将共形公平性扩展到联邦学习，通过分析不同人口统计组的公平性差距来审计模型公平性，并在多个数据集上验证。

Comments Preprint

2508.06588 2026-06-02 cs.LG cs.AI 版本更新

Graph is a Natural Regularization: Revisiting Vector Quantization for Graph Representation Learning

图是一种自然正则化：重新审视向量量化在图表示学习中的应用

Zian Zhai, Fan Li, Xingyu Tan, Xiaoyang Wang, Wenjie Zhang

发表机构 * School of Computer Science and Engineering, University of New South Wales, Sydney, Australia（新南威尔士大学计算机科学与工程学院，悉尼，澳大利亚）

AI总结针对图向量量化中码本崩溃问题，提出RGVQ框架，通过图拓扑和特征相似性正则化及Gumbel-Softmax软分配，提升码本利用率和令牌多样性。

Comments ICML2026


AI中文摘要

向量量化（VQ）最近成为一种学习图结构数据压缩和离散表示的有前途的方法。然而，一个基本挑战，即码本崩溃，在图领域仍未得到充分探索，严重限制了图令牌的表达能力和泛化能力。在本文中，我们进行了一项实证研究，观察到在图重建任务中将VQ与图神经网络联合训练时，即使采用了视觉或语言领域提出的缓解策略，码本崩溃仍然始终发生。此外，我们从数据和优化角度提供了崩溃的诊断，表明崩溃与图数据属性（如特征冗余和连接密度）相关，并进一步由确定性硬分配的训练动态强化。为了解决这些问题，我们提出了RGVQ，一种新颖的框架，它集成图拓扑和特征相似性作为显式正则化信号，以增强码本利用并促进令牌多样性。RGVQ通过Gumbel-Softmax重参数化引入软分配，确保所有码字接收梯度更新。此外，RGVQ包含结构感知对比正则化，以惩罚将相同令牌分配给不相似的节点对。大量实验表明，RGVQ显著提高了码本利用率，并在多个下游任务中持续提升了最先进的图VQ骨干网络的性能，实现了更具表达性和可迁移性的图令牌表示。

英文摘要

Vector Quantization (VQ) has recently emerged as a promising approach for learning compressed and discrete representations for graph-structured data. However, a fundamental challenge, i.e., codebook collapse, remains underexplored in the graph domain, significantly limiting the expressiveness and generalization of graph tokens. In this paper, we present an empirical study and observe that codebook collapse consistently occurs when training VQ jointly with Graph Neural Networks under graph reconstruction tasks, even with mitigation strategies proposed in vision or language domains. Moreover, we provide a diagnosis of collapse from data and optimization perspectives, showing that collapse is associated with graph data properties such as feature redundancy and connectivity density, and is further reinforced by the training dynamics of deterministic hard assignment. To address these issues, we propose RGVQ, a novel framework that integrates graph topology and feature similarity as explicit regularization signals to enhance codebook utilization and promote token diversity. RGVQ introduces soft assignments via Gumbel-Softmax reparameterization, ensuring that all codewords receive gradient updates. In addition, RGVQ incorporates a structure-aware contrastive regularization to penalize assigning the same token to dissimilar node pairs. Extensive experiments demonstrate that RGVQ substantially improves codebook utilization and consistently boosts the performance of state-of-the-art graph VQ backbones across multiple downstream tasks, enabling more expressive and transferable graph token representations.
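The soft-assignment step can be sketched with a plain Gumbel-Softmax over codeword logits. This is the generic relaxation rather than the RGVQ code: the logits, the temperature `tau`, and the function name are assumptions, and no gradients are computed here. The point of the sketch is that every codeword receives a strictly positive assignment probability, in contrast to a hard argmax that selects a single codeword.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Return a soft one-hot assignment over codewords via the Gumbel-Softmax relaxation."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)      # avoid log(0)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) sample
        noisy.append((logit + gumbel) / tau)
    top = max(noisy)                      # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical similarity logits between one node embedding and four codewords.
probs = gumbel_softmax([2.0, 0.5, 0.1, -1.0], tau=1.0, rng=random.Random(0))
```

Lower values of `tau` push the output toward a one-hot vector, while higher values spread the assignment more evenly across codewords.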


2504.10552 2026-06-02 cs.LG cs.AI cs.CV cs.DL 版本更新

LEMUR Neural Network Dataset: Towards Seamless AutoML

LEMUR 神经网络数据集：迈向无缝 AutoML

Arash Torabi Goodarzi, Roman Kochnev, Waleed Khalid, Hojjat Torabi Goudarzi, Furui Qin, Tolgay Atinc Uzun, Yashkumar Sanjaybhai Dhameliya, Yash Kanubhai Kathiriya, Zofia Antonina Bentyn, Dmitry Ignatov, Radu Timofte

发表机构 * Computer Vision Lab, CAIDAS, University of Würzburg（计算机视觉实验室，CAIDAS，维尔茨堡大学）

AI总结提出 LEMUR 开源数据集与框架，通过统一模板、结构化存储和自动化超参数优化，标准化神经网络实现与评估，以加速 AutoML 研究并促进公平基准测试。


Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 3291-3300, 2026

AI中文摘要

神经网络是现代人工智能的支柱，但设计、评估和比较它们仍然劳动密集。尽管存在许多用于训练的数据集，但模型本身的标准化集合很少。我们介绍 LEMUR，一个开源数据集和框架，它提供了大量基于 PyTorch 的神经网络集合，涵盖分类、分割、检测和自然语言处理等任务。每个模型遵循统一模板，配置和结果存储在结构化数据库中，以确保一致性和可重复性。LEMUR 通过 Optuna 集成自动超参数优化，包括统计分析和可视化工具，并提供 API 以无缝访问性能数据。该框架是可扩展的，允许研究人员添加新模型、数据集或指标而不破坏兼容性。通过标准化实现和统一评估，LEMUR 旨在加速 AutoML 研究，实现公平基准测试，并降低大规模神经网络实验的障碍。为支持采用和协作，LEMUR 及其插件在 MIT 许可下发布，网址为：https://github.com/ABrain-One/nn-dataset https://github.com/ABrain-One/nn-plots https://github.com/ABrain-One/nn-vr

英文摘要

Neural networks are the backbone of modern artificial intelligence, but designing, evaluating, and comparing them remains labor-intensive. While numerous datasets exist for training, there are few standardized collections of the models themselves. We introduce LEMUR, an open-source dataset and framework that provides a large collection of PyTorch-based neural networks across tasks such as classification, segmentation, detection, and natural language processing. Each model follows a unified template, with configurations and results stored in a structured database to ensure consistency and reproducibility. LEMUR integrates automated hyperparameter optimization via Optuna, includes statistical analysis and visualization tools, and offers an API for seamless access to performance data. The framework is extensible, allowing researchers to add new models, datasets, or metrics without breaking compatibility. By standardizing implementations and unifying evaluation, LEMUR aims to accelerate AutoML research, enable fair benchmarking, and reduce barriers to large-scale neural network experimentation. To support adoption and collaboration, LEMUR and its plugins are released under the MIT license at: https://github.com/ABrain-One/nn-dataset https://github.com/ABrain-One/nn-plots https://github.com/ABrain-One/nn-vr


2509.18025 2026-06-02 math.OC cs.AI cs.LG math.LO stat.ML 版本更新

Deep Learning as the Disciplined Construction of Tame Objects

深度学习作为驯服对象的有纪律构造

Gilles Bareilles, Allen Gehret, Johannes Aspman, Jana Lepšová, Jakub Mareček

发表机构 * Czech Technical University in Prague, Artificial Intelligence Center（布拉格捷克技术大学人工智能中心）

AI总结本文通过驯服几何（o-极小性）框架，介绍深度学习模型作为函数组合的数学基础，并展示其在非光滑非凸但驯服设置下为随机梯度下降提供收敛保证的应用。

Comments 39 pages, 10 figures

2505.18614 2026-06-02 cs.CL cs.LG cs.MM cs.SD eess.AS 版本更新

MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation

MAVL：面向动画歌曲翻译的多语言音视频歌词数据集

Woohyun Cho, Youngmin Kim, Sunghyun Lee, Youngjae Yu

发表机构 * Yonsei University（延世大学）； Seoul National University（首尔国立大学）

AI总结提出首个多语言多模态歌词翻译基准MAVL，并设计音节约束的音视频大语言模型SylAVL-CoT，利用音视频线索和音节约束提升歌词可唱性和翻译准确性。

Comments Accepted to EMNLP 2025, Project Page: https://k1064190.github.io/papers/paper1.html, our codes and datasets are available at https://github.com/k1064190/MAVL

2509.00326 2026-06-02 cs.LG 版本更新

Chunked TabPFN: Exact Training-Free In-Context Learning for Long-Context Tabular Data

分块TabPFN：面向长上下文表格数据的精确免训练上下文学习

Renat Sergazinov, Shao-An Yin

发表机构 * Department of Statistics, Texas A&M University, College Station, TX（德克萨斯A&M大学统计学系）； Department of Electrical and Computer Engineering, University of Minnesota, Twin Cities, MN（明尼苏达大学电气与计算机工程系）

AI总结提出分块注意力策略，使TabPFN无需预处理即可处理超过10K上下文令牌的长表格数据，并在TabArena基准上验证有效性。

Comments 14 pages, 6 figures

2509.11056 2026-06-02 eess.SY cs.LG cs.SY 版本更新

BERT4beam: Large AI Model Enabled Generalized Beamforming Optimization

BERT4beam: 大型AI模型实现通用波束赋形优化

Yuhang Li, Yang Lu, Wei Chen, Bo Ai, Zhiguo Ding

发表机构 * State Key Laboratory of Advanced Rail Autonomous Operation（先进轨道交通自主运行国家重点实验室）； School of Computer Science and Technology（计算机科学与技术学院）； School of Electronics and Information Engineering（电子与信息工程学院）； School of Electrical and Electronic Engineering (EEE)（电子与电气工程学院）

AI总结本文提出基于BERT的框架BERT4beam，将波束赋形优化转化为token级序列学习任务，通过预训练和微调实现单任务与多任务优化，在不同用户规模、系统效用和天线配置下均能接近最优性能。


AI中文摘要

人工智能（AI）有望成为未来第六代（6G）无线通信系统的关键推动力。然而，当前关于无线通信大型AI模型的研究主要集中在针对特定任务微调预训练的大型语言模型（LLM）。本文研究了专为波束赋形优化设计的大规模AI模型，以适应并泛化到由系统效用和规模定义的不同任务。我们提出了一种基于Transformer双向编码器表示（BERT）的新框架，称为BERT4beam。我们旨在将波束赋形优化问题表述为token级序列学习任务，对信道状态信息进行token化，构建BERT模型，并执行任务特定的预训练和微调策略。基于该框架，我们分别提出了两种基于BERT的方法用于单任务和多任务波束赋形优化。两种方法均可泛化到不同用户规模。此外，前者通过重新配置BERT模型的输入和输出模块，能够适应不同的系统效用和天线配置；而后者（称为UBERT）由于采用更细粒度的token化策略，可以直接泛化到多种任务。大量仿真结果表明，这两种方法能够实现接近最优的性能，并在各种波束赋形优化任务中优于现有AI模型，展现出强大的适应性和泛化能力。

英文摘要

Artificial intelligence (AI) is anticipated to emerge as a pivotal enabler for the forthcoming sixth-generation (6G) wireless communication systems. However, current research efforts regarding large AI models for wireless communications primarily focus on fine-tuning pre-trained large language models (LLMs) for specific tasks. This paper investigates the large-scale AI model designed for beamforming optimization to adapt and generalize to diverse tasks defined by system utilities and scales. We propose a novel framework based on bidirectional encoder representations from transformers (BERT), termed BERT4beam. We aim to formulate the beamforming optimization problem as a token-level sequence learning task, perform tokenization of the channel state information, construct the BERT model, and conduct task-specific pre-training and fine-tuning strategies. Based on the framework, we propose two BERT-based approaches for single-task and multi-task beamforming optimization, respectively. Both approaches are generalizable for varying user scales. Moreover, the former can adapt to varying system utilities and antenna configurations by re-configuring the input and output module of the BERT model, while the latter, termed UBERT, can directly generalize to diverse tasks, due to a finer-grained tokenization strategy. Extensive simulation results demonstrate that the two proposed approaches can achieve near-optimal performance and outperform existing AI models across various beamforming optimization tasks, showcasing strong adaptability and generalizability.


2509.10491 2026-06-02 eess.SP cs.LG 版本更新

FlowECG: Using Flow Matching to Create a More Efficient ECG Signal Generator

FlowECG：利用流匹配创建更高效的ECG信号生成器

Vitalii Bondar, Serhii Semenov, Vira Babenko, Dmytro Holovniak

发表机构 * Cherkasy State Technological University（切尔卡瑟国立技术大学）； University of the National Education Commission（国家教育委员会大学）； State Scientific Research Institute of Armament and Military Equipment Testing and Certification（国家武器与军事装备测试认证科学研究所）

AI总结提出FlowECG方法，采用流匹配替代扩散过程，通过连续流动力学学习噪声到数据分布的直连路径，在PTB-XL数据集上以更少的采样步数（10-25次）达到与扩散方法（200次）相当的生成质量，计算需求降低一个数量级。

Comments 8 pages, 2 figures, 1 table, reviewed version will be published in "Sensors, Devices and Systems 2025 Proceedings" (Springer's Lecture Notes in Electrical Engineering)


DOI: 10.1007/978-3-032-18415-3_32

AI中文摘要

合成心电图生成为需要隐私保护数据共享和训练数据集增强的医学AI应用提供服务。当前基于扩散的方法实现了高生成质量，但在采样过程中需要数百次神经网络评估，给临床部署造成了计算瓶颈。我们提出了FlowECG，一种流匹配方法，通过用连续流动力学替代迭代扩散过程来适配SSSD-ECG架构。流匹配通过常微分方程求解学习从噪声到数据分布的直连传输路径。我们使用动态时间规整、Wasserstein距离、最大均值差异和频谱相似性指标在PTB-XL数据集上评估了我们的方法。FlowECG在200次神经函数评估时匹配了SSSD-ECG的性能，并在三个指标上优于基线。关键发现表明，FlowECG以大幅减少的采样步数保持生成质量，与扩散方法需要200次评估相比，仅需10-25次评估即可获得可比结果。这种效率提升将计算需求降低了一个数量级，同时保留了生理上真实的12导联ECG特征。该方法使得在需要实时生成或大规模合成数据创建的资源受限临床环境中实现实际部署成为可能。

英文摘要

Synthetic electrocardiogram generation serves medical AI applications requiring privacy-preserving data sharing and training dataset augmentation. Current diffusion-based methods achieve high generation quality but require hundreds of neural network evaluations during sampling, creating computational bottlenecks for clinical deployment. We propose FlowECG, a flow matching approach that adapts the SSSD-ECG architecture by replacing the iterative diffusion process with continuous flow dynamics. Flow matching learns direct transport paths from noise to data distributions through ordinary differential equation solving. We evaluate our method on the PTB-XL dataset using Dynamic Time Warping, Wasserstein distance, Maximum Mean Discrepancy, and spectral similarity metrics. FlowECG matches SSSD-ECG performance at 200 neural function evaluations, outperforming the baseline on three metrics. The key finding shows that FlowECG maintains generation quality with substantially fewer sampling steps, achieving comparable results with 10-25 evaluations compared to 200 for diffusion methods. This efficiency improvement reduces computational requirements by an order of magnitude while preserving physiologically realistic 12-lead ECG characteristics. The approach enables practical deployment in resource-limited clinical settings where real-time generation or large-scale synthetic data creation is needed.
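Why a flow-based sampler can get away with few function evaluations can be seen on a toy problem. The sketch below integrates an ODE from noise (t = 0) to data (t = 1) with fixed-step Euler. The velocity field is a hand-written stand-in for a trained network: it describes a straight-line path toward a fixed target vector, which is an assumption made purely for illustration and has nothing to do with real ECG data.

```python
def euler_sample(x0, velocity, n_steps):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with n_steps Euler steps."""
    x = list(x0)
    h = 1.0 / n_steps
    for k in range(n_steps):
        t = k * h
        v = velocity(x, t)  # one function evaluation per step
        x = [xi + h * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(target):
    """Velocity of a straight-line path toward a fixed target (stand-in for a trained network)."""
    return lambda x, t: [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]


noise = [0.3, -1.2, 0.8]   # made-up starting point
target = [1.0, 0.0, -1.0]  # made-up "data" point
coarse = euler_sample(noise, toy_velocity(target), n_steps=10)
fine = euler_sample(noise, toy_velocity(target), n_steps=200)
```

Because the path is straight, 10 steps land on the same endpoint as 200. A learned velocity field is only approximately straight, so in practice the step count trades off speed against sample quality.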


2509.04631 2026-06-02 cs.LG cs.IT math.IT stat.ML 版本更新

Fundamental bounds on efficiency-confidence trade-off for transductive conformal prediction

转导共形预测的效率-置信度权衡的基本界限

Arash Behboodi, Alvaro H. C. Correia, Fabio Valerio Massoli, Christos Louizos

发表机构 * Qualcomm Technologies, Inc.（高通技术公司）

AI总结本文证明了转导共形预测中置信度与效率（预测集大小）之间存在严格有限样本界，指出非平凡置信度会导致预测集大小随数据固有不确定性呈指数增长，并提出了接近该界限的实用算法。


AI中文摘要

转导共形预测处理多个数据点的同时预测。给定期望的置信水平，目标是构建一个预测集，以规定的置信度包含真实结果。我们证明了转导方法中置信度与效率之间的基本权衡，其中效率通过预测集的大小来衡量。具体来说，我们推导了一个严格的有限样本界，表明对于具有固有不确定性的数据，任何非平凡的置信水平都会导致预测集大小的指数增长。指数与样本数量线性相关，并与数据的条件熵成正比。此外，该界限包含一个二阶项——分散度，定义为对数条件概率分布的方差。我们表明，基于近似条件分布的转导方法可以接近这个界限。受此启发，我们引入了一种实用的转导预测算法，该算法优于Bonferroni方法。

英文摘要

Transductive conformal prediction addresses the simultaneous prediction for multiple data points. Given a desired confidence level, the objective is to construct a prediction set that includes the true outcomes with the prescribed confidence. We demonstrate a fundamental trade-off between confidence and efficiency in transductive methods, where efficiency is measured by the size of the prediction sets. Specifically, we derive a strict finite-sample bound showing that any non-trivial confidence level leads to exponential growth in prediction set size for data with inherent uncertainty. The exponent scales linearly with the number of samples and is proportional to the conditional entropy of the data. Additionally, the bound includes a second-order term, dispersion, defined as the variance of the log conditional probability distribution. We show that the transductive methods based on the approximate conditional distribution can approach this bound. Inspired by this setup, we introduce a practical transductive prediction algorithm that surpasses Bonferroni methods.


2509.03456 2026-06-02 stat.ML cs.LG 版本更新

Off-Policy Learning in Large Action Spaces: Optimization Matters More Than Estimation

大动作空间中的离线策略学习：优化比估计更重要

Imad Aouali, Otmane Sakhi

发表机构 * Criteo AI Lab（Criteo AI实验室）

AI总结本文研究离线上下文强盗中的离线策略学习，发现现有方法在大动作空间中面临严重优化问题，提出使用加权对数似然目标可改善优化并取得竞争性策略。

Comments ICML '26


AI中文摘要

离线策略评估（OPE）和离线策略学习（OPL）是离线上下文强盗中决策制定的基础。最近OPL的进展主要优化具有改进统计特性的OPE估计器，假设更好的估计器自然产生更优的策略。尽管有理论依据，但这种以估计器为中心的方法忽略了一个关键的实际障碍：具有挑战性的优化景观。在本文中，我们提供理论见解和实证证据，表明当前的OPL方法遇到严重的优化问题，特别是随着动作空间的增长。我们表明，估计器感知的策略参数化可以缓解但不能完全解决优化挑战。在此基础上，我们探索更简单的加权对数似然目标，并证明它们具有显著更好的优化特性，并且仍然能够恢复具有竞争力、通常更优的学习策略。我们的发现强调了在开发针对大动作空间的OPL算法时，明确考虑优化问题的必要性。

英文摘要

Off-policy evaluation (OPE) and off-policy learning (OPL) are foundational for decision-making in offline contextual bandits. Recent advances in OPL primarily optimize OPE estimators with improved statistical properties, assuming that better estimators inherently yield superior policies. Although theoretically justified, this estimator-centric approach neglects a critical practical obstacle: challenging optimization landscapes. In this paper, we provide theoretical insights and empirical evidence showing that current OPL methods encounter severe optimization issues, particularly as the action space grows. We show that estimator-aware policy parametrization can mitigate, but not fully resolve, optimization challenges. Building on this, we explore simpler weighted log-likelihood objectives and demonstrate that they enjoy substantially better optimization properties and still recover competitive, often superior, learned policies. Our findings emphasize the necessity of explicitly addressing optimization considerations in the development of OPL algorithms for large action spaces.
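The two kinds of objective contrasted in the abstract can be written down for a toy, context-free bandit. The inverse-propensity-scoring (IPS) estimator is standard; the particular weighting `reward / logging_prob` in the log-likelihood objective is one plausible choice assumed here for illustration, not necessarily the paper's. The logged data and names are made up.

```python
import math


def ips_value(logs, policy):
    """IPS estimate of the policy value; linear in the policy's action probabilities."""
    return sum(r * policy[a] / p0 for a, r, p0 in logs) / len(logs)


def weighted_log_likelihood(logs, policy):
    """Weighted log-likelihood objective with weights r / p0 (an assumed weighting)."""
    return sum((r / p0) * math.log(policy[a]) for a, r, p0 in logs) / len(logs)


# Each record: (action, reward, probability of that action under the logging policy).
logs = [(0, 1.0, 0.5), (1, 0.0, 0.5)]
policy = {0: 0.8, 1: 0.2}  # candidate policy: action -> probability
ips = ips_value(logs, policy)
wll = weighted_log_likelihood(logs, policy)
```

Both objectives favor putting probability on rewarded logged actions, but the second one enters the policy through `log`, which changes the shape of the optimization problem.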


2506.13554 2026-06-02 cs.LG cs.NA math.FA math.NA 版本更新

Non-Asymptotic Stability and Consistency Guarantees for Physics-Informed Neural Networks via Coercive Operator Analysis

基于强制算子分析的物理信息神经网络的非渐近稳定性与一致性保证

Ronald Katende

发表机构 * Department of Mathematics, Kabale University（卡巴莱大学数学系）； Department of Mathematics, Makerere University（马凯雷雷大学数学系）

AI总结通过强制算子、变分公式和非渐近扰动理论，建立物理信息神经网络（PINN）的稳定性和一致性的统一理论框架，证明残差最小化在Sobolev范数下导致能量和一致范数收敛，并给出确定性稳定性界和概率样本复杂度保证。


DOI: 10.1016/j.cnsns.2026.109909
Journal ref: 109909 (2026)

AI中文摘要

我们提出了一个统一的理论框架，用于分析物理信息神经网络（PINN）的稳定性和一致性，该框架基于算子强制性、变分公式和非渐近扰动理论。PINN通过在采样配置点和边界点上最小化残差损失来逼近偏微分方程（PDE）的解。我们形式化了算子级和变分的一致性概念，证明在Sobolev范数下的残差最小化在温和正则性下导致能量范数和一致范数的收敛。确定性稳定性界量化了网络输出的有界扰动如何通过整个复合损失传播，而通过McDiarmid不等式的概率集中结果则为基于残差的泛化提供了样本复杂度保证。一个统一的泛化界将残差一致性、投影误差和扰动敏感性联系起来。在椭圆型、抛物型和非线性PDE上的实证结果证实了我们的理论界在不同情况下的预测准确性。该框架识别了关键结构原则，如算子强制性、激活平滑性和采样可容许性，这些原则支撑了鲁棒且可泛化的PINN训练，为PDE学习系统的设计和分析提供了原则性指导。

英文摘要

We present a unified theoretical framework for analyzing the stability and consistency of Physics-Informed Neural Networks (PINNs), grounded in operator coercivity, variational formulations, and non-asymptotic perturbation theory. PINNs approximate solutions to partial differential equations (PDEs) by minimizing residual losses over sampled collocation and boundary points. We formalize both operator-level and variational notions of consistency, proving that residual minimization in Sobolev norms leads to convergence in energy and uniform norms under mild regularity. Deterministic stability bounds quantify how bounded perturbations to the network outputs propagate through the full composite loss, while probabilistic concentration results via McDiarmid's inequality yield sample complexity guarantees for residual-based generalization. A unified generalization bound links residual consistency, projection error, and perturbation sensitivity. Empirical results on elliptic, parabolic, and nonlinear PDEs confirm the predictive accuracy of our theoretical bounds across regimes. The framework identifies key structural principles, such as operator coercivity, activation smoothness, and sampling admissibility, that underlie robust and generalizable PINN training, offering principled guidance for the design and analysis of PDE-informed learning systems.
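The residual loss described in the abstract can be sketched for a one-dimensional Poisson problem, $-u'' = f$ on $(0, 1)$ with $u(0) = u(1) = 0$. The sketch below is an assumption-laden stand-in: it replaces the neural network with plain Python callables and replaces automatic differentiation with a central finite difference, and the names `residual_loss` and `boundary_weight` are hypothetical.

```python
import math

def residual_loss(u, f, n_collocation=50, h=1e-3, boundary_weight=1.0):
    """Mean squared residual of -u'' = f on (0, 1) plus a penalty for u(0) = u(1) = 0."""
    total = 0.0
    for i in range(1, n_collocation + 1):
        x = i / (n_collocation + 1)  # interior collocation point
        u_xx = (u(x + h) - 2.0 * u(x) + u(x - h)) / (h * h)  # central difference
        total += (-u_xx - f(x)) ** 2
    interior = total / n_collocation
    boundary = u(0.0) ** 2 + u(1.0) ** 2
    return interior + boundary_weight * boundary

f = lambda x: math.pi ** 2 * math.sin(math.pi * x)  # right-hand side
exact = lambda x: math.sin(math.pi * x)              # solves the problem exactly
wrong = lambda x: x * (1.0 - x)                      # satisfies the boundary only
```

The exact solution drives the loss to nearly zero, while a candidate that only matches the boundary conditions leaves a large interior residual. This is the quantity that residual minimization targets.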


2508.12551 2026-06-02 cs.LG cs.AI cs.OS cs.SE 版本更新

TuneAgent: Agentic Operating System Kernel Tuning with Reinforcement Learning

TuneAgent: 基于强化学习的智能操作系统内核调优

Hongyu Lin, Yuchen Li, Haoran Luo, Zhenghong Lin, Libo Zhang, Mingjie Xing, Yanjun Wu

发表机构 * Institute of Software, Chinese Academy of Sciences（中国科学院软件研究所）； University of Chinese Academy of Sciences（中国科学院大学）； Nanyang Technological University（南洋理工大学）

AI总结提出TuneAgent框架，利用基于规则的强化学习使大语言模型自主探索Linux内核空间，通过结构化奖励函数和两阶段训练策略解决稀疏反馈问题，实现高达5.6%的性能提升。

详情

AI中文摘要

Linux内核调优对于优化操作系统性能至关重要，但由于复杂的内核空间、稀疏的性能反馈和强烈的工作负载敏感性，仍然具有挑战性。我们提出了TuneAgent，一个基于规则强化学习的智能Linux内核调优框架。TuneAgent将内核空间构建为约束强化学习环境，使大语言模型能够自主探索内核，同时强制执行有效且精确的配置修改。为了解决稀疏性能反馈问题，我们设计了结构化奖励函数，共同促进推理标准化、配置正确性和性能感知。此外，我们提出了一种两阶段训练策略，首先确保格式和语义正确性，然后过渡到性能驱动的探索，从而加速收敛并降低开销。实验结果表明，TuneAgent始终优于现有基线，在保持高配置有效性的同时，实现了高达5.6%的相对整体性能提升。我们进一步展示了其在多个实际应用中的鲁棒性，突显了其在多样化部署环境中的实用性和适应性。

英文摘要

Linux kernel tuning is essential for optimizing operating system (OS) performance, yet remains challenging due to the complex kernel space, sparse performance feedback, and strong workload sensitivity. We present TuneAgent, an agentic Linux kernel tuning framework powered by rule-based reinforcement learning (RL). TuneAgent formulates the kernel space as a constrained RL environment, enabling large language models (LLMs) to autonomously explore the kernel while enforcing valid and precise configuration modifications. To address sparse performance feedback, we design structured reward functions that jointly promote reasoning standardization, configuration correctness, and performance awareness. Furthermore, we propose a two-phase training strategy that first ensures format and semantic correctness and then transitions to performance-driven exploration, accelerating convergence and reducing overhead. Experimental results show that TuneAgent consistently outperforms existing baselines, achieving up to 5.6% relative overall performance improvement while maintaining high configuration validity. We further demonstrate its robustness across multiple real-world applications, highlighting its practicality and adaptability in diverse deployment environments.


2501.10342 2026-06-02 cs.LG 版本更新

Hybrid Deep Learning Model for epileptic seizure classification by using 1D-CNN with multi-head attention mechanism

基于一维CNN与多头注意力机制的混合深度学习模型用于癫痫发作分类

Mohammed Guhdar, Ramadhan J. Mstafa, Abdulhakeem O. Mohammed

发表机构 * Computer Science Department, College of Science, University of Zakho（扎胡大学科学学院计算机科学系）

AI总结提出一种结合一维卷积神经网络和多头注意力机制的混合深度学习模型，用于从脑电图信号中分类癫痫发作，以提高分类准确性。

详情

DOI: 10.1016/j.bspc.2025.108495
Journal ref: Biomedical Signal Processing and Control. Volume 112, pages 108495, 2026

AI中文摘要

癫痫是一种全球常见的神经系统疾病，影响约5000万人。癫痫发作源于大脑突然的异常电活动，这可以表现为脑电图信号的突然显著变化。信号的严重程度和频率各不相同，导致短暂意识丧失和肌肉收缩。癫痫患者常因某些工作环境的安全问题面临显著的就业挑战。涉及高空作业、操作重型机械或其他潜在危险环境的工作可能对癫痫患者受限，这无疑限制了他们的就业机会和经济可能性。

英文摘要

Epilepsy is a prevalent neurological disorder globally, impacting around 50 million people. Epileptic seizures result from sudden abnormal electrical activity in the brain, which can be read as sudden and significant changes in the EEG signal of the brain. The signal can vary in severity and frequency, which results in loss of consciousness and muscle contractions for a short period of time. Individuals with epilepsy often face significant employment challenges due to safety concerns in certain work environments. Many jobs that involve working at heights, operating heavy machinery, or in other potentially hazardous settings may be restricted for people with seizure disorders. This certainly limits job options and economic opportunities for those living with epilepsy.


2503.22939 2026-06-02 cs.LG q-bio.QM 版本更新

Interpretable Graph Kolmogorov-Arnold Networks for Multi-Cancer Classification and Biomarker Identification using Multi-Omics Data

可解释图Kolmogorov-Arnold网络用于基于多组学数据的多癌症分类和生物标志物识别

Fadi Alharbi, Nishant Budhiraja, Aleksandar Vakanski, Boyu Zhang, Murtada K. Elbashir, Harshith Guduru, Mohanad Mohammed

发表机构 * University of Idaho, Department of Computer Science（爱达荷大学计算机科学系）； Jouf University, College of Computer and Information Sciences, Department of Computer Science（朱夫大学计算机与信息科学学院计算机科学系）； University of Gezira, Faculty of Mathematical and Computer Sciences（杰济拉大学数学与计算机科学学院）； Bentonville High School（本顿维尔高中）； University of KwaZulu-Natal, School of Mathematics, Statistics and Computer Science（夸祖鲁-纳塔尔大学数学、统计与计算机科学学院）

AI总结提出MOGKAN框架，结合图神经网络与Kolmogorov-Arnold定理，利用多组学数据和PPI网络实现31种癌症的高精度分类与可解释生物标志物识别。

详情

DOI: 10.1038/s41598-025-13337-0
Journal ref: Sci. Rep. 16, ARTICLE NUMBER (2026)

AI中文摘要

在系统层面整合异质多组学数据集仍然是精准癌症诊断中开发分析和计算模型的核心挑战。本文介绍了多组学图Kolmogorov-Arnold网络（MOGKAN），这是一个深度学习框架，利用信使RNA、微RNA序列和DNA甲基化样本以及蛋白质-蛋白质相互作用（PPI）网络，对31种不同癌症类型进行分类。所提出的方法结合了DESeq2、微阵列线性模型（LIMMA）和最小绝对收缩与选择算子（LASSO）回归的差异基因表达，以降低多组学数据维度同时保留相关生物学特征。模型架构基于Kolmogorov-Arnold定理原理，使用可训练的单变量函数增强可解释性和特征分析。MOGKAN实现了96.28%的分类准确率，并且与相关基于深度学习的模型相比，表现出较低的实验变异性。通过基因本体论（GO）和京都基因与基因组百科全书（KEGG）富集分析，MOGKAN识别的生物标志物被验证为癌症相关标志物。通过将多组学数据与基于图的深度学习相结合，我们提出的方法展示了稳健的预测性能和可解释性，具有将复杂多组学数据转化为临床可操作癌症诊断的潜力。

英文摘要

The integration of heterogeneous multi-omics datasets at a systems level remains a central challenge for developing analytical and computational models in precision cancer diagnostics. This paper introduces Multi-Omics Graph Kolmogorov-Arnold Network (MOGKAN), a deep learning framework that utilizes messenger-RNA, micro-RNA sequences, and DNA methylation samples together with Protein-Protein Interaction (PPI) networks for cancer classification across 31 different cancer types. The proposed approach combines differential gene expression with DESeq2, Linear Models for Microarray (LIMMA), and Least Absolute Shrinkage and Selection Operator (LASSO) regression to reduce multi-omics data dimensionality while preserving relevant biological features. The model architecture is based on the Kolmogorov-Arnold theorem principle and uses trainable univariate functions to enhance interpretability and feature analysis. MOGKAN achieves classification accuracy of 96.28 percent and exhibits low experimental variability in comparison to related deep learning-based models. The biomarkers identified by MOGKAN were validated as cancer-related markers through Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis. By integrating multi-omics data with graph-based deep learning, our proposed approach demonstrates robust predictive performance and interpretability with potential to enhance the translation of complex multi-omics data into clinically actionable cancer diagnostics.


2507.19702 2026-06-02 cs.SI cs.AI cs.LG 版本更新

A Lightweight Deep Learning-based Model for Ranking Influential Nodes in Complex Networks

基于轻量级深度学习的复杂网络中有影响力节点排序模型

Mohammed A. Ramadhan, Abdulhakeem O. Mohammed

发表机构 * Computer Science Department, College of Science, University of Zakho（扎胡大学科学学院计算机科学系）； Department of Computer Science and Information Technology, The American University of Kurdistan（库尔德斯坦美国大学计算机科学与信息技术系）

AI总结提出一种结合一维卷积神经网络和GraphSAGE的轻量级混合模型1D-CGS，利用节点度和平均邻居度特征，通过回归任务高效排序有影响力节点，在12个真实网络上平均Kendall Tau提升4.73%，Jaccard相似度提升7.67%，单调性指数达0.99，运行速度显著快于现有深度学习方法。

详情

AI中文摘要

识别复杂网络中的有影响力节点是一项关键任务，在不同领域有广泛应用。然而，现有方法常在准确性和计算效率之间权衡。为解决这些挑战，我们提出1D-CGS，一种轻量级且有效的混合模型，它结合了一维卷积神经网络（1D-CNN）的速度和GraphSAGE的拓扑表示能力，用于高效节点排序。该模型使用基于两个简单且重要的拓扑特征（节点度和平均邻居度）构建的轻量级输入表示。这些特征通过一维卷积提取局部模式，然后通过GraphSAGE层聚合邻域信息。我们将节点排序任务表述为回归问题，并使用易感-感染-恢复（SIR）模型生成真实影响力分数。1D-CGS首先在Barabasi-Albert模型生成的合成网络上训练，然后应用于真实世界网络以识别有影响力节点。在12个真实网络上的实验评估表明，1D-CGS在排序准确性上显著优于传统中心性度量和最近的深度学习模型，同时运行速度非常快。与表现最佳的深度学习基线相比，所提模型在Kendall Tau相关性上平均提升4.73%，在Jaccard相似度上平均提升7.67%。它还实现了平均单调性指数（MI）分数0.99，并产生近乎完美的排名分布，表明高度独特和可区分的排名。此外，所有实验证实1D-CGS在高度合理的时间内运行，比现有深度学习方法快得多，使其适用于大规模应用。

英文摘要

Identifying influential nodes in complex networks is a critical task with a wide range of applications across different domains. However, existing approaches often face trade-offs between accuracy and computational efficiency. To address these challenges, we propose 1D-CGS, a lightweight and effective hybrid model that integrates the speed of one-dimensional convolutional neural networks (1D-CNN) with the topological representation power of GraphSAGE for efficient node ranking. The model uses a lightweight input representation built on two straightforward and significant topological features: node degree and average neighbor degree. These features are processed through 1D convolutions to extract local patterns, followed by GraphSAGE layers to aggregate neighborhood information. We formulate the node ranking task as a regression problem and use the Susceptible-Infected-Recovered (SIR) model to generate ground truth influence scores. 1D-CGS is initially trained on synthetic networks generated by the Barabasi-Albert model and then applied to real-world networks for identifying influential nodes. Experimental evaluations on twelve real-world networks demonstrate that 1D-CGS significantly outperforms traditional centrality measures and recent deep learning models in ranking accuracy, while running very quickly. The proposed model achieves an average improvement of 4.73% in Kendall's Tau correlation and 7.67% in Jaccard Similarity over the best-performing deep learning baselines. It also achieves an average Monotonicity Index (MI) score of 0.99 and produces near-perfect rank distributions, indicating highly unique and discriminative rankings. Furthermore, all experiments confirm that 1D-CGS operates in a highly reasonable time, running significantly faster than existing deep learning methods, making it suitable for large-scale applications.
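The two input features named in the abstract, node degree and average neighbor degree, are cheap to compute from an adjacency list. The sketch below assumes an undirected graph stored as a dictionary; the function name `degree_features` and the convention of returning 0.0 for isolated nodes are illustrative choices, not details taken from the paper.

```python
def degree_features(adjacency):
    """Return {node: (degree, average neighbor degree)} for an undirected graph."""
    degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
    features = {}
    for node, neighbors in adjacency.items():
        if neighbors:
            avg_neighbor_degree = sum(degree[n] for n in neighbors) / len(neighbors)
        else:
            avg_neighbor_degree = 0.0  # isolated node: no neighbors to average over
        features[node] = (degree[node], avg_neighbor_degree)
    return features

# A star graph: node 0 is connected to nodes 1, 2, and 3.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
features = degree_features(star)
```

In the star graph, the hub has degree 3 with neighbors of degree 1, and each leaf has degree 1 with a single neighbor of degree 3, so the two features together already separate hub from leaves.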


2503.05641 2026-06-02 cs.CL cs.AI cs.LG 版本更新

Skill-Based Mixture-of-Experts: Adaptive Routing for Heterogeneous Reasoning via Inferred Skills

基于技能的混合专家模型：通过推断技能实现异构推理的自适应路由

Justin Chih-Yao Chen, Sukwon Yun, Elias Stengel-Eskin, Tianlong Chen, Mohit Bansal

发表机构 * The University of Texas at Austin（德克萨斯大学奥斯汀分校）

AI总结提出Skill-MoE框架，通过推断查询所需技能进行实例级专家选择，并采用批推理策略降低开销，在单GPU上集成16个专家模型，在多个推理基准上平均提升8.15%。

Comments ICML 2026 (Camera-Ready). The first three authors contributed equally. Project Page: https://skill-moe.github.io/

详情

AI中文摘要

结合现有的预训练大语言模型是处理多样化推理任务的一种有前景的方法。然而，任务级专家选择往往过于粗粒度，因为不同实例可能需要不同的专业知识。为了解决这个问题，我们提出了Skill-MoE，一个符号化的、基于技能的、无梯度的混合专家框架，用于实例级专家选择。Skill-MoE从每个查询中推断技能（例如，数学中的代数），根据技能相关性选择专家，并让每个专家生成自己的推理。然后，由选定的聚合器将得到的k个输出进行综合，该聚合器因其整合多样化响应的能力而被选中。虽然实例级选择显著提高了性能，但朴素实现会因重复的模型加载和卸载而产生巨大开销。我们通过一种批推理策略解决了这个问题，该策略将实例按分配的专家分组，使得每个模型只需加载一次。因此，Skill-MoE在单GPU上集成了16个专家模型，其运行时间与使用4个GPU的先前多智能体基线相当。在多个基准测试（MMLU-Pro、GPQA、AIME和MedMCQA）中，Skill-MoE相比最佳基线实现了平均8.15%的绝对提升。它还能很好地泛化到未见过的任务，并且无需昂贵的多轮交互即可超越基于讨论的方法。

英文摘要

Combining existing pre-trained LLMs is a promising approach for diverse reasoning tasks. However, task-level expert selection is often too coarse-grained, since different instances may require different expertise. To address this, we propose Skill-MoE, a symbolic, skill-based, and gradient-free Mixture-of-Experts framework for instance-level expert selection. Skill-MoE infers skills (e.g., algebra in mathematics) from each query, selects experts based on skill relevance, and lets each expert generate its own reasoning. The resulting k outputs are then synthesized by an aggregator chosen for its ability to integrate diverse responses. While instance-level selection substantially improves performance, naively implementing it incurs heavy overhead from repeated model loading and offloading. We address this with a batch inference strategy that groups instances by assigned experts, allowing each model to be loaded only once. As a result, Skill-MoE integrates 16 expert models on a single GPU with runtime comparable to prior multi-agent baselines using 4 GPUs. Across diverse benchmarks (MMLU-Pro, GPQA, AIME, and MedMCQA), Skill-MoE achieves an average absolute improvement of 8.15% over the best baseline. It also generalizes well to unseen tasks and outperforms discussion-based methods without requiring expensive multi-round interactions.
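The batch inference strategy in the abstract, grouping instances by assigned expert so that each model is loaded only once, can be sketched as follows. Everything here is hypothetical scaffolding: `group_by_expert`, `run_batched`, the expert names, and the `fake_load` stand-in for model loading are assumptions for illustration, not the project's actual code.

```python
from collections import defaultdict

def group_by_expert(assignments):
    """Map each expert name to the list of instance ids routed to it."""
    batches = defaultdict(list)
    for instance_id, experts in assignments.items():
        for expert in experts:
            batches[expert].append(instance_id)
    return dict(batches)

def run_batched(assignments, load_model):
    """Load each expert once and run it on all of its instances."""
    outputs = defaultdict(dict)
    for expert, instance_ids in group_by_expert(assignments).items():
        model = load_model(expert)  # one load per expert, not per instance
        for instance_id in instance_ids:
            outputs[instance_id][expert] = model(instance_id)
    return dict(outputs)

# Hypothetical routing: instance id -> experts chosen for it.
assignments = {"q1": ["math", "code"], "q2": ["math"], "q3": ["bio", "math"]}
load_log = []

def fake_load(expert):
    """Stand-in for an expensive model load; records each call."""
    load_log.append(expert)
    return lambda instance_id: f"{expert} answer for {instance_id}"

results = run_batched(assignments, fake_load)
```

A naive per-instance loop would call the loader five times for this routing; grouping first reduces that to three calls, one per distinct expert.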


2507.12645 2026-06-02 eess.SP cs.AI cs.LG 版本更新

A Novel Data Augmentation Strategy for Robust Deep Learning Classification of Biomedical Time-Series Data: Application to ECG and EEG Analysis

一种用于生物医学时间序列数据鲁棒深度学习分类的新型数据增强策略：在ECG和EEG分析中的应用

Mohammed Guhdar, Ramadhan J. Mstafa, Abdulhakeem O. Mohammed

发表机构 * Computer Science Department, College of Science, University of Zakho（扎胡大学科学学院计算机科学系）； Department of Computer Science and Information Technology, The American University of Kurdistan（库尔德斯坦美国大学计算机科学与信息技术系）； PRIME Lab, Scientific Research Center, University of Zakho（扎胡大学科学研究中心PRIME实验室）

AI总结提出一种结合ResNet-CNN与注意力机制的统一深度学习框架，通过时域拼接多个增强变体的新型数据增强策略和Focal Loss处理类别不平衡，在三个EEG和ECG数据集上分别达到99.96%、99.78%和100%的准确率，且内存需求低、推理速度快。

详情

AI中文摘要

准确统一分析多种生物信号（如ECG和EEG）的需求日益迫切，这对于全面评估患者状况至关重要，尤其是在同步监测中。尽管多传感器融合取得了进展，但在开发能够有效处理和提取本质上不同生理信号特征的统一架构方面仍存在关键空白。另一个挑战是许多生物医学数据集固有的类别不平衡，这常常导致传统方法性能偏差。本研究通过提出一种新颖且统一的深度学习框架来解决这些问题，该框架在不同信号类型上均达到了最先进的性能。我们的方法将基于ResNet的CNN与注意力机制相结合，并通过一种新颖的数据增强策略增强：对每个信号的多个增强变体进行时域拼接，以生成更丰富的表示。与先前工作不同，我们科学地增加信号复杂性以实现未来能力，从而相比现有技术获得了最佳预测。预处理步骤包括小波去噪、基线去除和标准化。通过结合使用这种高级数据增强和Focal Loss函数，有效管理了类别不平衡。训练过程中应用了正则化技术以确保泛化能力。我们在三个基准数据集上严格评估了所提出的架构：UCI癫痫EEG、MIT-BIH心律失常和PTB诊断ECG。它分别达到了99.96%、99.78%和100%的准确率，展示了在不同信号类型和临床背景下的鲁棒性。最后，该架构需要约130 MB内存，每个样本处理时间约10 ms，表明其适用于低端或可穿戴设备部署。

英文摘要

The increasing need for accurate and unified analysis of diverse biological signals, such as ECG and EEG, is paramount for comprehensive patient assessment, especially in synchronous monitoring. Despite advances in multi-sensor fusion, a critical gap remains in developing unified architectures that effectively process and extract features from fundamentally different physiological signals. Another challenge is the inherent class imbalance in many biomedical datasets, often causing biased performance in traditional methods. This study addresses these issues by proposing a novel and unified deep learning framework that achieves state-of-the-art performance across different signal types. Our method integrates a ResNet-based CNN with an attention mechanism, enhanced by a novel data augmentation strategy: time-domain concatenation of multiple augmented variants of each signal to generate richer representations. Unlike prior work, we scientifically increase signal complexity to achieve future-reaching capabilities, which resulted in the best predictions compared to the state of the art. Preprocessing steps included wavelet denoising, baseline removal, and standardization. Class imbalance was effectively managed through the combined use of this advanced data augmentation and the Focal Loss function. Regularization techniques were applied during training to ensure generalization. We rigorously evaluated the proposed architecture on three benchmark datasets: UCI Seizure EEG, MIT-BIH Arrhythmia, and PTB Diagnostic ECG. It achieved accuracies of 99.96%, 99.78%, and 100%, respectively, demonstrating robustness across diverse signal types and clinical contexts. Finally, the architecture requires ~130 MB of memory and processes each sample in ~10 ms, suggesting suitability for deployment on low-end or wearable devices.
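Two ingredients from the abstract lend themselves to a short sketch: time-domain concatenation of augmented variants, and the focal loss. The augmentations used below (random amplitude scaling plus Gaussian noise), the choice to keep the original signal as the first segment, and all parameter values are assumptions, since the abstract does not list them. The focal loss follows the standard form $-(1 - p_t)^{\gamma} \log p_t$.

```python
import math
import random

def augment_and_concatenate(signal, n_variants=3, noise_std=0.01, seed=0):
    """Concatenate the original signal with scaled, noisy copies along the time axis."""
    rng = random.Random(seed)
    pieces = list(signal)
    for _ in range(n_variants):
        scale = rng.uniform(0.9, 1.1)  # assumed amplitude-scaling range
        pieces.extend(scale * x + rng.gauss(0.0, noise_std) for x in signal)
    return pieces

def focal_loss(p_true, gamma=2.0):
    """Focal loss for the probability assigned to the true class."""
    return -((1.0 - p_true) ** gamma) * math.log(p_true)

signal = [0.0, 0.5, 1.0, 0.5]
augmented = augment_and_concatenate(signal)
```

With `gamma=0` the focal loss reduces to ordinary cross-entropy; larger `gamma` shrinks the loss on well-classified samples, which is why it is commonly used for imbalanced classes.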


2501.12189 2026-06-02 math.OC cs.LG 版本更新

MirrorCBO: A consensus-based optimization method in the spirit of mirror descent

MirrorCBO：一种借鉴镜像下降思想的基于共识的优化方法

Leon Bungert, Franca Hoffmann, Dohyeon Kim, Tim Roith

发表机构 * Department of Computing and Mathematical Sciences, Caltech（加州理工学院计算与数学科学系）； Helmholtz Imaging, Deutsches Elektronen-Synchrotron DESY（亥姆霍兹成像，德国电子同步加速器研究所（DESY））

AI总结提出MirrorCBO方法，通过将共识优化与镜像下降结合，实现无导数非凸优化，并推广到约束优化问题，理论证明指数收敛速率，实验展示稀疏诱导和约束优化的竞争力。

Comments 66 pages, 18 figures, 19 tables

详情

DOI: 10.1142/S0218202525500563
Journal ref: Mathematical Models and Methods in Applied Sciences 35 (14), 3083-3170, 2025

AI中文摘要

本文提出MirrorCBO，一种共识优化方法，它像镜像下降推广梯度下降一样推广了标准CBO。为此，我们将CBO方法应用于对偶粒子群，并通过应用镜像映射的逆（参数化为强凸函数$\phi$的次微分）来保留原始粒子位置。这样，我们结合了无导数非凸优化算法和镜像下降的优点。作为一个特例，该方法将CBO扩展到具有凸约束的优化问题。假设与$\phi$相关的Bregman距离有界，我们提供了MirrorCBO的渐近收敛结果，具有显式指数速率。另一个关键贡献是对该新算法在不同应用场景中的探索性数值研究，重点关注(i)稀疏诱导优化和(ii)约束优化，展示了MirrorCBO的竞争性能。我们经验性地观察到，该方法也可用于欧几里得空间（非凸）子流形上的优化，可适应其他近期CBO变体的镜像版本，并且继承了镜像下降选择理想极小值（如稀疏解）的能力。我们还概述了近期用于约束优化的CBO方法，并将其性能与MirrorCBO进行了比较。

英文摘要

In this work we propose MirrorCBO, a consensus-based optimization (CBO) method which generalizes standard CBO in the same way that mirror descent generalizes gradient descent. For this we apply the CBO methodology to a swarm of dual particles and retain the primal particle positions by applying the inverse of the mirror map, which we parametrize as the subdifferential of a strongly convex function $\phi$. In this way, we combine the advantages of a derivative-free non-convex optimization algorithm with those of mirror descent. As a special case, the method extends CBO to optimization problems with convex constraints. Assuming bounds on the Bregman distance associated to $\phi$, we provide asymptotic convergence results for MirrorCBO with explicit exponential rate. Another key contribution is an exploratory numerical study of this new algorithm across different application settings, focusing on (i) sparsity-inducing optimization, and (ii) constrained optimization, demonstrating the competitive performance of MirrorCBO. We observe empirically that the method can also be used for optimization on (non-convex) submanifolds of Euclidean space, can be adapted to mirrored versions of other recent CBO variants, and that it inherits from mirror descent the capability to select desirable minimizers, like sparse ones. We also include an overview of recent CBO approaches for constrained optimization and compare their performance to MirrorCBO.
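The dual/primal structure described in the abstract can be sketched for the sparsity-inducing choice $\phi(x) = \tfrac{1}{2}\lVert x \rVert^2 + \lambda \lVert x \rVert_1$, for which the inverse of the mirror map is coordinate-wise soft thresholding. The sketch below is a simplified, noise-free illustration under that assumption: the real method is stochastic, and the specific update rule, the function names, and the parameter values here are not taken from the paper.

```python
import math

def soft_threshold(y, lam):
    """Inverse mirror map for phi(x) = x^2 / 2 + lam * |x| (applied per coordinate)."""
    return math.copysign(max(abs(y) - lam, 0.0), y)

def consensus_point(particles, objective, alpha):
    """Gibbs-weighted average of particles, with weights exp(-alpha * f(x))."""
    values = [objective(x) for x in particles]
    f_min = min(values)  # shift for numerical stability
    weights = [math.exp(-alpha * (v - f_min)) for v in values]
    total = sum(weights)
    dim = len(particles[0])
    return [sum(w * x[d] for w, x in zip(weights, particles)) / total for d in range(dim)]

def mirror_cbo_drift_step(duals, objective, lam=0.1, alpha=50.0, dt=0.1):
    """One noise-free step: map duals to primals, then pull duals toward the consensus."""
    primals = [[soft_threshold(y_d, lam) for y_d in y] for y in duals]
    c = consensus_point(primals, objective, alpha)
    new_duals = [[y_d - dt * (x_d - c_d) for y_d, x_d, c_d in zip(y, x, c)]
                 for y, x in zip(duals, primals)]
    return new_duals, c
```

Because the primal positions are always produced by soft thresholding, small dual coordinates map to exactly zero, which is how a mirror map of this kind favors sparse minimizers.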


2506.21278 2026-06-02 stat.ML cs.AI cs.LG math.ST stat.TH 版本更新

Hyperspherical Variational Autoencoders Using Efficient Spherical Cauchy Distribution

使用高效球面柯西分布的超球面变分自编码器

Lukas Sablica, Kurt Hornik

发表机构 * Institute for Statistics and Mathematics, Vienna University of Economics and Business, Austria（维也纳经济大学统计与数学研究所，奥地利）

AI总结提出基于球面柯西分布的超球面变分自编码器，通过莫比乌斯变换实现可微重参数化，避免贝塞尔函数计算，在保持重尾特性的同时提供高效稳定的训练与推理。

详情

AI中文摘要

我们提出在超球面潜变量空间上使用球面柯西（spCauchy）潜变量的变分自编码器。spCauchy 族具有重尾全局行为，并且通过对球面上的均匀样本应用莫比乌斯变换，允许精确可微的重参数化。我们证明，在高浓度极限下，spCauchy 在显式浓度参数映射下恢复了 von Mises-Fisher（vMF）分布的局部切空间几何，同时避免了 vMF 实现所需的高阶贝塞尔函数计算。对于训练，到均匀球面先验的 Kullback-Leibler 散度具有快速收敛的级数、稳定的求积以及高浓度渐近形式。我们进一步建立了浓度依赖的 KL 核心的单调性，并推导了带有闭式代理和误差控制的解析上下界，支持极端情况下的稳定近似。压力测试基准表明，所得到的潜层目标在 CPU 和 GPU 上保持稳定，且评估速度快于 vMF 基线。在图像和分子序列数据上的实验表明，spCauchy-VAE 为具有超球面潜表示的生成式建模提供了一种鲁棒且可扩展的替代方案。

英文摘要

We propose spherical Cauchy (spCauchy) latent variables for variational autoencoders on hyperspherical latent spaces. The spCauchy family has heavy-tailed global behavior and admits an exact differentiable reparameterization by applying a Möbius transformation to uniform samples on the sphere. We show that, in the high-concentration limit, spCauchy recovers the local tangent-space geometry of the von Mises-Fisher (vMF) distribution under an explicit concentration parameter mapping, while avoiding the high-order Bessel-function evaluations required by vMF implementations. For training, the Kullback-Leibler divergence to a uniform spherical prior admits rapidly convergent series, stable quadrature, and high-concentration asymptotic forms. We further establish monotonicity of the concentration-dependent KL core and derive analytic brackets with closed-form surrogates and error control, supporting stable approximation in extreme regimes. Stress-test benchmarks show that the resulting latent-layer objective remains stable and faster to evaluate than vMF baselines on CPU and GPU. Experiments on image and molecular sequence data demonstrate that spCauchy-VAEs provide a robust and scalable alternative for generative modeling with hyperspherical latent representations.
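The reparameterization mentioned in the abstract pushes a uniform sample on the sphere through a Möbius transformation. The sketch below uses one commonly cited form of that map, $z = \frac{1 - \rho^2}{\lVert u + \rho\mu \rVert^2}(u + \rho\mu) + \rho\mu$ with location $\mu$ on the sphere and concentration $\rho \in [0, 1)$. Whether the paper uses exactly this parameterization is an assumption, and the function names are hypothetical.

```python
import math
import random

def uniform_on_sphere(dim, rng):
    """Sample a uniform point on the unit sphere by normalizing a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def mobius_map(u, mu, rho):
    """Map a unit vector u to another unit vector, concentrating around mu as rho -> 1."""
    v = [u_d + rho * mu_d for u_d, mu_d in zip(u, mu)]
    v_sq = sum(c * c for c in v)
    factor = (1.0 - rho * rho) / v_sq
    return [factor * v_d + rho * mu_d for v_d, mu_d in zip(v, mu)]

rng = random.Random(0)
mu = [0.0, 0.0, 1.0]          # location parameter (unit vector)
u = uniform_on_sphere(3, rng)  # parameter-free noise
z = mobius_map(u, mu, rho=0.8)
```

The randomness lives entirely in `u`, and `mobius_map` involves only arithmetic in `mu` and `rho`, so gradients can flow to both parameters without any Bessel-function evaluation. With `rho = 0` the map is the identity and the sample stays uniform.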


2507.09766 2026-06-02 cs.LG cs.AI 版本更新

Toward accurate RUL and SoH estimation using reinforced graph-based physics-informed neural networks enhanced with dynamic weights

基于动态权重的强化图物理信息神经网络实现精确的剩余使用寿命和健康状态估计

Mohamadreza Akbari Pour, Ali Ghasemzadeh, Mohamad Ali Bijarchi, Mohammad Behshad Shafii

发表机构 * Department of Mechanical Engineering（机械工程系）； Department of Computer Engineering（计算机工程系）； Sharif University of Technology（谢里夫理工大学）

AI总结提出一种结合图表示学习、强化学习和自适应动态权重的物理信息神经网络框架RGPD，在C-MAPSS、PHM2012和XJTU数据集上实现跨资产退化场景的RUL和SoH高精度估计。

详情

AI中文摘要

精确估计剩余使用寿命（RUL）和健康状态（SoH）对于可靠的预测与健康管理（PHM）至关重要，有助于及时维护和可靠的工业运行。然而，结合数据驱动学习与基于物理的正则化的混合模型通常依赖于固定的损失权重，因此在跨具有不同退化行为的资产迁移时会失去准确性。本研究引入了具有动态加权的强化图物理信息网络（RGPD），这是一个用于时空退化建模和自适应物理引导正则化的统一框架。基于图的表示学习捕获传感器间的退化结构，软演员-评论家（SAC）模块在噪声条件下细化潜在特征，轻量级Q学习策略在训练过程中自适应地平衡单调性、平滑性和潜在动力学残差损失。该框架在C-MAPSS、PHM2012和XJTU数据集上进行了评估，这些数据集分别代表发动机、轴承和电池的退化过程。与相应基准表中报告的最强基线相比，RGPD在PHM2012和C-MAPSS上将平均RMSE提高了高达12%，在XJTU上将平均MAPE比第二好的模型降低了20%。在这些异构基准上的性能进一步表明了该模型跨退化系统的泛化能力。物理信息组件通过退化一致性先验以及深度隐藏物理模型风格的残差实现，提高了物理合理性，而无需为每种资产类型建立完整的第一性原理模型。

英文摘要

Accurate estimation of Remaining Useful Life (RUL) and State of Health (SoH) is essential for reliable Prognostics and Health Management (PHM), supporting timely maintenance and dependable industrial operation. However, hybrid models that combine data-driven learning with physics-based regularization often rely on fixed loss weights and therefore lose accuracy when transferred across assets with different degradation behaviors. This study introduces Reinforced Graph-based Physics-informed Networks with Dynamic Weighting (RGPD), a unified framework for spatio-temporal degradation modeling and adaptive physics-guided regularization. Graph-based representation learning captures inter-sensor degradation structure, a Soft Actor-Critic (SAC) module refines latent features under noisy conditions, and a lightweight Q-learning policy adaptively balances monotonicity, smoothness, and latent-dynamics residual losses during training. The framework is evaluated on the C-MAPSS, PHM2012, and XJTU datasets, which represent engine, bearing, and battery degradation processes. Relative to the strongest compared baselines reported in the corresponding benchmark tables, RGPD improves average RMSE by up to 12 percent on PHM2012 and C-MAPSS, and reduces average MAPE by 20 percent on XJTU compared with the second-best reported model. Performance on these heterogeneous benchmarks further suggests the model's generalizability across degradation systems. The physics-informed component is implemented through degradation-consistent priors together with a Deep Hidden Physics Model-style residual, which improves physical plausibility without requiring a full first-principles model for each asset type.
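The monotonicity and smoothness terms named in the abstract can be written as simple penalties on a predicted degradation sequence. The sketch below is an assumption about their general shape only: it omits the latent-dynamics residual and the Q-learning policy that adapts the weights, and the function names and the squared-hinge form are illustrative.

```python
def monotonicity_penalty(predictions):
    """Penalize increases in a sequence that should be non-increasing (e.g. RUL over time)."""
    return sum(max(0.0, b - a) ** 2 for a, b in zip(predictions, predictions[1:]))

def smoothness_penalty(predictions):
    """Penalize curvature via squared second differences."""
    return sum((c - 2.0 * b + a) ** 2
               for a, b, c in zip(predictions, predictions[1:], predictions[2:]))

def combined_loss(data_loss, predictions, weights):
    """Data loss plus weighted penalties; `weights` would be adapted during training."""
    w_mono, w_smooth = weights
    return (data_loss
            + w_mono * monotonicity_penalty(predictions)
            + w_smooth * smoothness_penalty(predictions))

rul = [10.0, 8.0, 9.0, 7.0]  # made-up predictions with one physically implausible rise
```

With fixed weights, a setting tuned for one asset may over-penalize or under-penalize another; making `weights` a quantity chosen per training step is the role the abstract assigns to the dynamic weighting policy.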


2507.07339 2026-06-02 stat.AP cs.LG 版本更新

Benchmarking Waitlist Mortality Prediction in Heart Transplantation Through Time-to-Event Modeling using New Longitudinal UNOS Dataset

基于新的纵向UNOS数据集通过时间-事件模型对心脏移植等待名单死亡率预测进行基准测试

Yingtao Luo, Reza Skandari, Carlos Martinez, Arman Kilic, Rema Padman

发表机构 * Carnegie Mellon University（卡内基梅隆大学）； Imperial College（帝国理工学院）； United Network for Organ Sharing（器官共享联合网络）； Medical University of South Carolina（南卡罗来纳医科大学）

AI总结本研究利用纵向等待名单历史数据，通过时间-事件模型对心脏移植等待名单死亡率进行预测，最佳模型C-Index达0.94，AUROC达0.89，显著优于以往模型。

Comments Best Student Paper Finalist in Proceedings of AMIA Annual Symposium 2025

详情

AI中文摘要

目前，关于心脏移植等待名单患者管理的决策由医生委员会根据多种因素做出，但过程在很大程度上仍是临时的。随着2018年以来器官共享联合网络（UNOS）收集的纵向患者、供体和器官数据量的增加，人们对在器官可用时支持临床决策的分析方法越来越感兴趣。在本研究中，我们对利用纵向等待名单历史数据进行时间依赖性、时间-事件建模的机器学习模型进行了基准测试，以预测等待名单死亡率。我们使用23,807条患者记录（包含77个变量）进行训练，并在1年时间范围内评估生存预测和区分能力。我们的最佳模型实现了0.94的C-Index和0.89的AUROC，显著优于以往模型。关键预测因子与已知风险因素一致，同时也揭示了新的关联。我们的发现可以支持心脏移植决策中的紧迫性评估和政策改进。

英文摘要

Decisions about managing patients on the heart transplant waitlist are currently made by committees of doctors who consider multiple factors, but the process remains largely ad-hoc. With the growing volume of longitudinal patient, donor, and organ data collected by the United Network for Organ Sharing (UNOS) since 2018, there is increasing interest in analytical approaches to support clinical decision-making at the time of organ availability. In this study, we benchmark machine learning models that leverage longitudinal waitlist history data for time-dependent, time-to-event modeling of waitlist mortality. We train on 23,807 patient records with 77 variables and evaluate both survival prediction and discrimination at a 1-year horizon. Our best model achieves a C-Index of 0.94 and AUROC of 0.89, significantly outperforming previous models. Key predictors align with known risk factors while also revealing novel associations. Our findings can support urgency assessment and policy refinement in heart transplant decision making.
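The C-Index reported in the abstract measures how well predicted risks order patients by time to event. The sketch below computes Harrell's concordance index for right-censored data in its simplest form; it ignores tied event times, and the toy data are made up for illustration rather than drawn from the UNOS dataset.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs where the higher risk fails earlier."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i has an observed event before time j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

times = [1, 2, 3, 4]
events = [1, 1, 0, 1]  # the third subject is censored
good_risk = [4.0, 3.0, 2.0, 1.0]
bad_risk = [1.0, 2.0, 3.0, 4.0]
```

A value of 1.0 means every comparable pair is ordered correctly, 0.5 corresponds to random ordering, and censored subjects contribute only as the later member of a pair.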


2507.05658 2026-06-02 physics.ao-ph cs.LG 版本更新

HRRRCast: a data-driven emulator for regional weather forecasting at convection allowing scales

HRRRCast：面向对流允许尺度区域天气预报的数据驱动模拟器

Daniel Abdi, Isidora Jankov, Paul Madden, Vanderlei Vargas, Timothy A. Smith, Sergey Frolov, Montgomery Flora, Corey Potvin

发表机构 * Cooperative Institute for Research in Environmental Sciences（环境科学合作研究所）； Cooperative Institute for Research in the Atmosphere（大气合作研究所）； Cooperative Institute for Severe and High-Impact Weather Research and Operations（强天气与高影响天气研究和业务合作研究所）； NOAA Global Systems Laboratory（美国国家海洋和大气管理局全球系统实验室）； NOAA Physical Sciences Laboratory（美国国家海洋和大气管理局物理科学实验室）； NOAA National Severe Storms Laboratory（美国国家海洋和大气管理局国家强风暴实验室）

AI总结提出HRRRCast数据驱动模拟器，采用ResNet和GNN架构，通过多预报时长训练和贪婪滚动策略，在CONUS区域复合反射率预报上达到与HRRR模型相当或更优的性能。

详情

DOI: 10.1175/AIES-D-25-0061.1
Journal ref: Artificial Intelligence for the Earth Systems, Vol. 5, No. 2, 2026, Article 250061

AI中文摘要

高分辨率快速刷新（HRRR）模型是一种用于美国本土（CONUS）业务天气预报的对流允许模型。为了提供计算高效的替代方案，我们引入了HRRRCast，这是一个基于先进机器学习技术构建的数据驱动模拟器。HRRRCast包含两种架构：基于ResNet的模型（ResHRRR）和基于图神经网络的模型（GraphHRRR）。ResHRRR使用卷积神经网络，并增强了挤压激励模块和特征线性调制，通过去噪扩散隐式模型（DDIM）支持概率预报。为了更好地处理较长的预报时效，我们训练单个模型预测多个预报时长（1小时、3小时和6小时），然后在推理时采用贪婪滚动策略。当使用3到10个成员的集合在CONUS全区域评估复合反射率时，ResHRRR在弱降雨阈值（20 dBZ）下优于HRRR预报，并在中等阈值（30 dBZ）下达到有竞争力的性能。我们的工作改进了Pathak等人[21]的StormCast模型，具体改进包括：a) 在CONUS全区域上训练，b) 使用多个预报时长以提高长期预报技巧，c) 使用分析数据而非StormCast中无意使用的+1小时后分析数据训练，d) 将未来的GFS状态作为输入，实现降尺度以提高长预报时效的准确性。基于网格、邻域和对象的指标证实，与HRRR相比，风暴定位更好、频率偏差更低、成功率更高。HRRRCast集合预报还保持了更清晰的空间细节，功率谱与HRRR分析更匹配。虽然GraphHRRR在当前形式下表现不佳，但它为未来的图基预报奠定了基础。HRRRCast代表了向高效、数据驱动的区域天气预报迈出的一步，具有有竞争力的准确性和集合能力。

英文摘要

The High-Resolution Rapid Refresh (HRRR) model is a convection-allowing model used in operational weather forecasting across the contiguous United States (CONUS). To provide a computationally efficient alternative, we introduce HRRRCast, a data-driven emulator built with advanced machine learning techniques. HRRRCast includes two architectures: a ResNet-based model (ResHRRR) and a Graph Neural Network-based model (GraphHRRR). ResHRRR uses convolutional neural networks enhanced with squeeze-and-excitation blocks and Feature-wise Linear Modulation, and supports probabilistic forecasting via the Denoising Diffusion Implicit Model (DDIM). To better handle longer lead times, we train a single model to predict multiple lead times (1h, 3h, and 6h), then use a greedy rollout strategy during inference. When evaluated on composite reflectivity over the full CONUS domain using ensembles of 3 to 10 members, ResHRRR outperforms HRRR forecast at light rainfall threshold (20 dBZ) and achieves competitive performance at moderate thresholds (30 dBZ). Our work advances the StormCast model of Pathak et al. [21] by: a) training on the full CONUS domain, b) using multiple lead times to improve long-range skill, c) training on analysis data instead of the +1h post-analysis data inadvertently used in StormCast, and d) incorporating future GFS states as inputs, enabling downscaling that improves long-lead accuracy. Grid-, neighborhood-, and object-based metrics confirm better storm placement, lower frequency bias, and higher success ratios than HRRR. HRRRCast ensemble forecasts also maintain sharper spatial detail, with power spectra more closely matching HRRR analysis. While GraphHRRR underperforms in its current form, it lays groundwork for future graph-based forecasting. HRRRCast represents a step toward efficient, data-driven regional weather prediction with competitive accuracy and ensemble capability.
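One plausible reading of the "greedy rollout strategy" in the abstract is that a target forecast hour is reached by always taking the largest trained lead time that still fits, so that fewer autoregressive steps accumulate error. The sketch below implements that reading; it is an interpretation, not a confirmed description of HRRRCast, and `greedy_rollout_schedule`, `rollout`, and `step_model` are hypothetical names.

```python
def greedy_rollout_schedule(target_hours, lead_times=(6, 3, 1)):
    """Decompose a target lead time into model steps, largest step first."""
    schedule = []
    remaining = target_hours
    for step in sorted(lead_times, reverse=True):
        while remaining >= step:
            schedule.append(step)
            remaining -= step
    if remaining != 0:
        raise ValueError("target cannot be reached with the given lead times")
    return schedule

def rollout(state, target_hours, step_model):
    """Apply the emulator autoregressively following the greedy schedule."""
    for step in greedy_rollout_schedule(target_hours):
        state = step_model(state, step)
    return state
```

Under this reading, a 10-hour forecast takes three model calls (6h, 3h, 1h) instead of ten 1-hour calls. For example, `rollout(0, 10, lambda s, h: s + h)` advances a toy scalar state by exactly ten hours.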


2507.02905 2026-06-02 cs.HC cs.AI cs.LG 版本更新

Preference-Optimal Multi-Metric Weighting for Parallel Coordinate Plots

平行坐标图的偏好最优多度量加权

Chisa Mori, Shuhei Watanabe, Masaki Onishi, Takayuki Itoh

发表机构 * Preferred Networks Inc.（Preferred Networks公司）

AI总结针对平行坐标图中多度量可视化难题，提出基于偏好最优加权的公式化方法，并利用雷达图与UMAP降维实现直观偏好选择，有效揭示控制参数重要性模式。

Comments Accepted to International Conference Information Visualisation (iV2025)

详情

DOI: 10.1109/IV68685.2025.00014

AI中文摘要

平行坐标图（PCP）是一种解释控制参数与度量之间关系的常用方法。PCP通过基于单一度量的颜色渐变来提供这种解释。然而，当存在多个度量时，提供这样的渐变是具有挑战性的。虽然一种简单的方法是通过线性加权每个度量来计算单一度量，但这种加权对用户来说是不明确的。为了解决这个问题，我们首先提出了一种基于特定偏好度量组合计算最优加权的原则性公式。尽管用户可以在双度量问题的二维（2D）平面上简单地选择他们的偏好，但多度量问题需要直观的可视化以允许他们选择偏好。我们通过使用各种雷达图来可视化由UMAP降维的2D平面上的度量权衡来实现这一点。在使用行人流引导规划的分析中，我们的方法为每个用户偏好识别出了控制参数重要性的独特模式，突出了我们方法的有效性。

英文摘要

Parallel coordinate plots (PCPs) are a prevalent method to interpret the relationship between the control parameters and metrics. PCPs deliver such an interpretation by color gradation based on a single metric. However, it is challenging to provide such a gradation when multiple metrics are present. Although a naive approach involves calculating a single metric by linearly weighting each metric, such weighting is unclear for users. To address this problem, we first propose a principled formulation for calculating the optimal weight based on a specific preferred metric combination. Although users can simply select their preference from a two-dimensional (2D) plane for bi-metric problems, multi-metric problems require intuitive visualization to allow them to select their preference. We achieved this using various radar charts to visualize the metric trade-offs on the 2D plane reduced by UMAP. In the analysis using pedestrian flow guidance planning, our method identified unique patterns of control parameter importance for each user preference, highlighting the effectiveness of our method.


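2409.18624 2026-06-02 cs.AI cs.LG version update

Unsupervised Cognition

Alfredo Ibias, Hector Antona, Guillem Ramirez-Miranda, Enric Guinovart, Eduard Alarcon

Affiliations: Avatar Cognition

AI summary: Proposes a primitive-based unsupervised learning method that represents the input space as a distributed hierarchical structure, outperforming existing state-of-the-art methods on classification tasks and exhibiting cognition-like behaviour.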


AI中文摘要

无监督学习方法在认知模型中具有软启发。迄今为止，最成功的无监督学习方法主要围绕在数学空间中对样本进行聚类。在本文中，我们提出了一种基于原语的无监督学习方法，用于决策制定，该方法受一种新颖的认知框架启发。这种以表示为中心的方法以输入无关的方式，将输入空间建设性地建模为分布式层次结构。我们将我们的方法与当前最先进的无监督学习分类、当前最先进的小规模和不完整数据集分类以及当前最先进的癌症类型分类进行了比较。我们展示了我们的方法如何超越先前的最先进技术。我们还评估了我们方法的一些类似认知的特性，在这些特性中，它不仅优于比较的算法（甚至包括监督学习算法），而且表现出不同的、更类似于认知的行为。

English abstract

Unsupervised learning methods are loosely inspired by models of cognition. To this day, the most successful unsupervised learning methods revolve around clustering samples in a mathematical space. In this paper we propose a primitive-based, unsupervised learning approach for decision-making inspired by a novel cognition framework. This representation-centric approach models the input space constructively as a distributed hierarchical structure in an input-agnostic way. We compared our approach with the current state of the art in unsupervised learning classification, in classification of small and incomplete datasets, and in cancer type classification. We show how our proposal outperforms the previous state of the art. We also evaluate some cognition-like properties of our proposal, where it not only outperforms the compared algorithms (even supervised learning ones) but also shows a different, more cognition-like behaviour.


2412.03771 2026-06-02 cs.SD cs.LG eess.AS version update

Embedding-Space Diffusion for Zero-Shot Environmental Sound Classification

Ysobel Sims, Alexandre Mendes, Stephan Chalup

Affiliations: School of Information and Physical Sciences, University of Newcastle, Australia

AI summary: Proposes a diffusion-based conditional generative method for zero-shot environmental sound classification whose average performance across several audio datasets exceeds that of existing baselines.


AI中文摘要

零样本学习通过利用语义信息使模型能够泛化到未见过的类别，弥合训练集和测试集之间类别不重叠的差距。尽管大量研究集中在计算机视觉中的零样本学习，但这些方法在环境音频中的应用仍未被充分探索，现有研究性能较差。在计算机视觉中已证明成功的生成方法在零样本环境声音分类研究中明显缺失。为填补这一空白，本研究探索了环境音频中零样本学习的生成方法。我们改编了两种来自计算机视觉的成功生成模型：交叉对齐和分布对齐变分自编码器（CADA-VAE）以及利用不变侧生成对抗网络（LisGAN）。此外，我们引入了一种以类别辅助数据为条件的新型扩散模型。扩散模型生成的合成嵌入与已见类别嵌入结合，用于训练分类器。在五个环境音频数据集（ESC-50、ARCA23K-FSD、FSC22、UrbanSound8k和TAU Urban Acoustics 2019）和一个音乐分类数据集（GTZAN）上进行了实验。结果表明，扩散模型在六个音频数据集上的平均性能优于所有基线方法。这项工作确立了扩散模型作为零样本学习的一种有前景的方法，并引入了零样本环境声音分类生成方法的第一个基准，为未来研究提供了基础。

English abstract

Zero-shot learning enables models to generalise to unseen classes by leveraging semantic information, bridging the gap between training and testing sets with non-overlapping classes. While much research has focused on zero-shot learning in computer vision, the application of these methods to environmental audio remains underexplored, with poor performance in existing studies. Generative methods, which have demonstrated success in computer vision, are notably absent from zero-shot environmental sound classification studies. To address this gap, this work investigates generative methods for zero-shot learning in environmental audio. Two successful generative models from computer vision are adapted: a cross-aligned and distribution-aligned variational autoencoder (CADA-VAE) and a leveraging invariant side generative adversarial network (LisGAN). Additionally, we introduce a novel diffusion model conditioned on class auxiliary data. Synthetic embeddings generated by the diffusion model are combined with seen class embeddings to train a classifier. Experiments are conducted on five environmental audio datasets (ESC-50, ARCA23K-FSD, FSC22, UrbanSound8k and TAU Urban Acoustics 2019) and one music classification dataset (GTZAN). Results show that the diffusion model outperforms all baseline methods on average across the six audio datasets. This work establishes the diffusion model as a promising approach for zero-shot learning and introduces the first benchmark of generative methods for zero-shot environmental sound classification, providing a foundation for future research.


2411.15240 2026-06-02 cs.LG cs.AI cs.HC q-bio.QM version update

A Foundation Model for Wearable Movement Data in Mental Health Research

Franklin Y. Ruan, Aiwei Zhang, Jenny Y. Oh, SouYoung Jin, Nicholas C. Jacobson

Affiliations: Dartmouth College; National Institute of Diabetes and Digestive and Kidney Diseases; National Institutes of Health; Department of Computer Science at Dartmouth College

AI summary: Proposes the Pretrained Actigraphy Transformer (PAT), a foundation model for wearable movement time series pretrained as a self-supervised masked autoencoder; it outperforms non-foundation-model methods on mental health prediction tasks and provides interpretable attention maps.

Comments F. Y. Ruan, A. Zhang, J. Y. Oh, S. Jin and N. C. Jacobson, "A Foundation Model for Wearable Movement Data in Mental Health Research," in IEEE Journal of Biomedical and Health Informatics, doi: 10.1109/JBHI.2026.3694809


DOI: 10.1109/JBHI.2026.3694809

AI中文摘要

可穿戴运动数据由几乎所有市售智能手表收集，是心理健康研究的宝贵资源，反映了细粒度的时间行为趋势。尽管前景广阔，但与临床图像和文本分析相比，健康可穿戴建模的基础模型开发仍然有限。我们设计了带有补丁嵌入的Transformer，并在分钟级、持续一周的体动记录（身体活动强度测量）序列上使用自监督掩码自编码器预训练，以开发和评估预训练体动记录Transformer（PAT）。PAT是一个用于可穿戴运动时间序列的开源基础模型，结合了长达一周的时间建模、精神科结果评估以及在公共数据上的可重复性。在来自美国国家健康与营养调查（NHANES）的全国代表性队列中21,538名参与者的数据上预训练，PAT在心理健康预测任务（包括苯二氮卓类药物和SSRI使用、抑郁症和睡眠异常）中始终优于非基础模型基线。在苯二氮卓类药物使用预测任务中，PAT相比常用于时间序列建模的非基础深度学习模型表现出最大改进（即比LSTM提高55.6%，比一维CNN提高21.4%，比ConvLSTM提高14.8%）。除了预测准确性，PAT还提供可解释的注意力图，突出对临床预测最重要的日常活动特定时段，提供模型透明度和潜在临床见解。结果表明，PAT为研究人员和临床医生提供了一种易于部署、适应性强且可扩展的解决方案，以从可穿戴传感器数据中推进临床见解。GitHub: https://github.com/njacobsonlab/Pretrained-Actigraphy-Transformer/

English abstract

Wearable movement data is collected by nearly all commercially available smartwatches and is a valuable resource for mental health research, reflecting fine-grained temporal behavioral trends. Despite its promise, the development of foundation models for health wearable modeling remains limited when compared to clinical image and text analysis. We designed transformers with patch embeddings and used self-supervised masked autoencoder pretraining on minute-level week-long actigraphy (physical activity intensity measurement) sequences to develop and evaluate the Pretrained Actigraphy Transformer (PAT). PAT is an open-source foundation model for wearable movement time series that combines week-long temporal modeling, psychiatric outcome evaluation, and reproducibility on public data. Pretrained on data from 21,538 U.S. participants in a nationally representative cohort from the National Health and Nutrition Examination Survey (NHANES), PAT consistently outperformed non-foundation-model baselines across mental health prediction tasks, including benzodiazepine and SSRI use, depression, and sleep abnormalities. In the benzodiazepine medication usage prediction task, PAT demonstrated the largest improvement over non-foundational deep learning models commonly used for time-series modeling (i.e., 55.6% improvement over the LSTM, 21.4% improvement over the 1-D CNN, 14.8% improvement over the ConvLSTM). Beyond predictive accuracy, PAT provides interpretable attention maps highlighting specific periods of daily activity most important for clinical predictions, offering model transparency and potential clinical insights. The results suggest that PAT offers an easy-to-deploy, adaptable and scalable solution to advance clinical insight from wearable sensor data for researchers and clinicians. GitHub: https://github.com/njacobsonlab/Pretrained-Actigraphy-Transformer/


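2506.19035 2026-06-02 cs.LG version update

Failure Modes of Time Series Interpretability Algorithms for Critical Care Applications and Potential Solutions

Shashank Yadav, Vignesh Subbian

Affiliations: Department of Biomedical Engineering; University of Arizona

AI summary: Systematically analyzes the failure modes of gradient, occlusion, and permutation methods in dynamic prediction tasks and proposes a learnable-mask framework as an alternative, which adds temporal continuity and label consistency constraints to give more reliable feature-importance explanations.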

Comments 13 pages, 10 figures, Accepted at the AMIA Annual Symposium 2025. The final version will appear in the official proceedings

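2506.17171 2026-06-02 cs.LG version update

Deep generative models as the probability transformation functions

Vitalii Bondar, Vira Babenko, Roman Trembovetskyi, Yurii Korobeinyk, Viktoriya Dzyuba

Affiliations: Cherkasy State Technological University; Cherkasy Bohdan Khmelnytsky National University

AI summary: Proposes a unified theoretical view of deep generative models as probability transformation functions, showing that different architectures (autoencoders, autoregressive models, generative adversarial networks, normalizing flows, diffusion models, and flow matching) all fundamentally operate by transforming a simple predefined distribution into a complex target data distribution.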

Comments 12 pages, 6 figures, accepted for publication in "ICIST 2025 Springer Proceedings"

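2506.10677 2026-06-02 stat.ML cs.LG version update

Exploiting Similarities in A/B Testing with Off-Policy Estimation

Otmane Sakhi, Alexandre Gilotte, David Rohde

Affiliations: Criteo AI Lab

AI summary: Uses off-policy estimation to capture similarities in the decision propensities of the new and baseline systems, building a family of A/B testing estimators that remain unbiased while improving concentration properties and statistical efficiency.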

Comments KDD '26


AI中文摘要

我们研究A/B测试，即衡量新决策系统相对于基线的性能增益的标准协议。传统的A/B测试将两个系统视为黑箱，忽略了它们之间的潜在相似性。然而，在实践中，新系统和基线系统很少存在根本性差异，通常共享显著的结构，这可以通过它们做出相似决策的倾向来捕捉。我们表明，在这种情况下，常用的均值差估计量虽然无偏，但在统计上并非最优。利用离线策略估计，我们引入了一族A/B测试估计量，这些估计量利用被测试系统的倾向来获得改进的集中性质。这族估计量足够灵活，可以针对实际决策进行定制。得到的估计量简单、对倾向性误设具有鲁棒性，在测试系统表现出相似性时显著更准确，并在缺乏这种相似性时优雅地退化为均值差估计量。我们的理论分析和实证研究证实了它们的效率和实用性。

English abstract

We study A/B testing, the standard protocol for measuring the performance gain of a new decision system relative to a baseline. Traditional A/B testing treats both systems as black boxes, ignoring potential similarities between them. In practice, however, new and baseline systems are rarely radically different and often share significant structure, which can be captured by their propensities to make similar decisions. We show that in such cases, the commonly used difference-in-means estimator, though unbiased, is statistically suboptimal. Leveraging off-policy estimation, we introduce a family of A/B testing estimators that exploit the propensities of the tested systems to achieve improved concentration properties. This family is flexible enough to be tailored to practical decision-making. The resulting estimators are simple, robust to propensities misspecification, substantially more accurate when the tested systems exhibit similarities, and gracefully fall back to the difference-in-means estimator when such similarities are absent. Our theoretical analysis and empirical studies confirm their efficiency and practicality.
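The two ingredients named in the abstract, the difference-in-means estimator and propensity-based off-policy estimation, can be sketched with textbook versions. The inverse-propensity-scoring (IPS) estimator below is a standard off-policy estimator used here only to show how propensities enter; the paper's own estimator family is not specified in the abstract and is not reproduced. Function names and the toy logs are assumptions.

```python
def difference_in_means(rewards_baseline, rewards_new):
    """Classical A/B estimate: mean reward of the new system minus mean reward of the baseline."""
    return (sum(rewards_new) / len(rewards_new)
            - sum(rewards_baseline) / len(rewards_baseline))


def ips_value(logs, target_policy):
    """Inverse-propensity-scoring estimate of a target policy's mean reward.

    logs: list of (action, reward, logging_propensity) tuples, where
          logging_propensity is the probability the logging system gave
          to the action it actually took (must be > 0).
    target_policy: dict mapping action -> probability under the evaluated system.
    """
    weighted = [target_policy[a] / p * r for a, r, p in logs]
    return sum(weighted) / len(weighted)


# Hypothetical logs from a system choosing between two actions uniformly at random.
logs = [(0, 1.0, 0.5), (1, 0.0, 0.5), (0, 1.0, 0.5), (1, 0.0, 0.5)]
always_zero = {0: 1.0, 1: 0.0}

print(difference_in_means([1, 0, 1, 0], [1, 1, 1, 0]))  # 0.25
print(ips_value(logs, always_zero))                      # 1.0
```

Difference-in-means uses only the rewards observed in each arm, whereas the IPS estimate reweights logged rewards by how likely each system was to take the logged action. That reweighting is where similarity between the two systems' propensities can be exploited.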


2012.02110 2026-06-02 cs.CL cs.LG version update

GottBERT: a pure German Language Model

Raphael Scheible, Johann Frei, Fabian Thomczyk, Henry He, Patric Tippmann, Jochen Knaus, Victor Jaravine, Frank Kramer, Martin Boeker

Affiliations: Institute for AI and Informatics in Medicine, University Hospital rechts der Isar, Technical University Munich; IT-Infrastructure for Translational Medical Research, Faculty of Applied Computer Science, University of Augsburg; Data Integration Center, Faculty of Medicine, University of Freiburg; School of Computation, Information and Technology, Technical University Munich; Institute of Medical Biometry and Statistics, Medical Center, Faculty of Medicine, University of Freiburg; Freiburg Center for Data Analysis and Modeling, University of Freiburg; Hengrui Europe Biosciences, Zurich

AI summary: Presents GottBERT, the first German single-language RoBERTa model, pre-trained on the German subset of OSCAR, and shows competitive performance on NER and text classification tasks.


DOI: 10.18653/v1/2024.emnlp-main.1183

AI中文摘要

预训练语言模型显著推进了自然语言处理（NLP），尤其是BERT及其优化版本RoBERTa的引入。虽然最初的研究集中在英语上，但单语言模型在多语言模型方面在预训练工作量、整体资源效率或下游任务性能上可能具有优势。尽管基于提示的LLM越来越流行，但计算效率更高的类BERT模型仍然高度相关。在这项工作中，我们提出了第一个德语单语言RoBERTa模型GottBERT，该模型仅在OSCAR数据集的德语部分上进行预训练。此外，我们研究了过滤OSCAR语料库的影响。GottBERT使用fairseq和标准超参数进行预训练。我们在两个命名实体识别（NER）任务（Conll 2003和GermEval 2014）和三个文本分类任务（GermEval 2018细粒度和粗粒度，以及10kGNAD）上，与现有的德语BERT模型和两个多语言模型进行了性能评估。性能使用$F_{1}$分数和准确率来衡量。GottBERT base和large模型表现出竞争性能，其中GottBERT在6个任务中的4个中领先于base模型。与我们的预期相反，所应用的过滤并未显著影响结果。为了支持德语NLP研究社区，我们将在MIT许可下发布GottBERT模型。

English abstract

Pre-trained language models have significantly advanced natural language processing (NLP), especially with the introduction of BERT and its optimized version, RoBERTa. While initial research focused on English, single-language models can be advantageous compared to multilingual ones in terms of pre-training effort, overall resource efficiency or downstream task performance. Despite the growing popularity of prompt-based LLMs, more compute-efficient BERT-like models remain highly relevant. In this work, we present the first German single-language RoBERTa model, GottBERT, pre-trained exclusively on the German portion of the OSCAR dataset. Additionally, we investigated the impact of filtering the OSCAR corpus. GottBERT was pre-trained using fairseq and standard hyperparameters. We evaluated its performance on two Named Entity Recognition (NER) tasks (CoNLL 2003 and GermEval 2014) and three text classification tasks (GermEval 2018 fine and coarse, and 10kGNAD) against existing German BERT models and two multilingual models. Performance was measured using the $F_{1}$ score and accuracy. The GottBERT base and large models showed competitive performance, with GottBERT leading among the base models in 4 of 6 tasks. Contrary to our expectation, the applied filtering did not significantly affect the results. To support the German NLP research community, we are releasing the GottBERT models under the MIT license.


2506.01226 2026-06-02 eess.SY cs.LG cs.SY version update

React to Surprises: Stable-by-Design Neural Feedback Control and the Youla-REN

Nicholas H. Barbara, Ruigang Wang, Alexandre Megretski, Ian R. Manchester

Affiliations: Australian Centre for Robotics; School of Aerospace, Mechanical and Mechatronic Engineering; The University of Sydney; Laboratory for Information and Decision Systems; Dept. Electrical Engineering and Computer Science

AI summary: Proposes a structure based on a nonlinear Youla-Kucera parameterization and robust neural networks such as the recurrent equilibrium network (REN), enabling unconstrained optimization with guaranteed closed-loop stability, and analyzes its properties under nonlinear dynamics, partial observation, and incremental stability requirements.


AI中文摘要

我们研究了用于基于学习的控制的稳定非线性策略的参数化。提出了一种基于非线性Youla-Kucera参数化与鲁棒神经网络（如循环均衡网络REN）相结合的结构。得到的参数化是无约束的，因此可以通过一阶优化方法进行搜索，同时始终通过构造保证闭环稳定性。我们研究了(a)非线性动力学、(b)部分观测和(c)增量闭环稳定性要求（收缩性和Lipschitz性）的组合。我们发现，对于(c)与(a)或(b)的组合，收缩且Lipschitz的Youla参数总是导致收缩且Lipschitz的闭环。然而，如果三者同时成立，则增量稳定性可能因外部扰动而丧失。相反，维持了一个较弱的条件，我们称之为d-管收缩和Lipschitz性。我们进一步得到了逆结果，表明所提出的参数化覆盖了某些非线性系统类别的所有收缩且Lipschitz的闭环。数值实验说明了我们的参数化在学习具有内置稳定性保证的控制器时的实用性，这些控制器用于：(i)没有稳定效应的“经济”奖励；(ii)短训练周期；以及(iii)不确定系统。

English abstract

We study parameterizations of stabilizing nonlinear policies for learning-based control. We propose a structure based on a nonlinear version of the Youla-Kucera parameterization combined with robust neural networks such as the recurrent equilibrium network (REN). The resulting parameterizations are unconstrained, and hence can be searched over with first-order optimization methods, while always ensuring closed-loop stability by construction. We study the combination of (a) nonlinear dynamics, (b) partial observation, and (c) incremental closed-loop stability requirements (contraction and Lipschitzness). We find that for the combination of (c) with either (a) or (b), a contracting and Lipschitz Youla parameter always leads to contracting and Lipschitz closed loops. However, if all three hold, then incremental stability can be lost with exogenous disturbances. Instead, a weaker condition is maintained, which we call d-tube contraction and Lipschitzness. We further obtain converse results showing that the proposed parameterization covers all contracting and Lipschitz closed loops for certain classes of nonlinear systems. Numerical experiments illustrate the utility of our parameterization when learning controllers with built-in stability certificates for: (i) "economic" rewards without stabilizing effects; (ii) short training horizons; and (iii) uncertain systems.


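2505.19925 2026-06-02 stat.ME cs.LG version update

Cellwise and Casewise Robust Covariance in High Dimensions

Fabio Centofanti, Mia Hubert, Peter J. Rousseeuw

Affiliations: Section of Statistics and Data Science, Department of Mathematics, KU Leuven, Belgium

AI summary: Proposes the cellRCov method, which combines a decomposition on principal and orthogonal subspaces with ridge regularization to handle casewise outliers, cellwise outliers, and missing data simultaneously in high-dimensional data, and establishes its theoretical properties.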


AI中文摘要

样本协方差矩阵是多变量统计的基石，但它对异常值高度敏感。这些异常值可以是案例异常值（例如属于不同总体的案例），也可以是细胞异常值（数据矩阵中的偏差单元格）。最近开发了一些能够处理这两种异常值的稳健协方差估计量，但其计算仅适用于最多20维。为了解决这个问题，我们提出了cellRCov方法，这是一种同时处理案例异常值、细胞异常值和缺失数据的稳健协方差估计量。它依赖于协方差在主成分和正交子空间上的分解，利用了稳健PCA的最新工作。它还采用岭型正则化来稳定估计的协方差矩阵。我们建立了cellRCov的一些理论性质，包括其逐案例和逐细胞影响函数以及一致性和渐近正态性。模拟研究证明了cellRCov在污染和缺失数据场景中的优越性能。此外，其在异常检测的实际应用中也展示了实用性。我们还构建并展示了用于稳健和正则化典型相关分析的cellRCCA方法。

English abstract

The sample covariance matrix is a cornerstone of multivariate statistics, but it is highly sensitive to outliers. These can be casewise outliers, such as cases belonging to a different population, or cellwise outliers, which are deviating cells (entries) of the data matrix. Recently some robust covariance estimators have been developed that can handle both types of outliers, but their computation is only feasible up to at most 20 dimensions. To remedy this we propose the cellRCov method, a robust covariance estimator that simultaneously handles casewise outliers, cellwise outliers, and missing data. It relies on a decomposition of the covariance on principal and orthogonal subspaces, leveraging recent work on robust PCA. It also employs a ridge-type regularization to stabilize the estimated covariance matrix. We establish some theoretical properties of cellRCov, including its casewise and cellwise influence functions as well as consistency and asymptotic normality. A simulation study demonstrates the superior performance of cellRCov in contaminated and missing data scenarios. Furthermore, its practical utility is illustrated in a real-world application to anomaly detection. We also construct and illustrate the cellRCCA method for robust and regularized canonical correlation analysis.
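The sensitivity of the sample covariance to a single deviating cell, which motivates the abstract, can be checked with a few lines. This sketch only demonstrates the problem on hypothetical toy data; it does not implement cellRCov, whose details are not given in the abstract.

```python
def sample_covariance(xs, ys):
    """Unbiased sample covariance between two equally long columns."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two columns of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical clean data: y is exactly 2 * x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_clean = [2.0, 4.0, 6.0, 8.0, 10.0]

# Same data with one contaminated cell in the last row.
y_cellwise_outlier = [2.0, 4.0, 6.0, 8.0, -100.0]

print(sample_covariance(x, y_clean))             # 5.0
print(sample_covariance(x, y_cellwise_outlier))  # -50.0
```

One bad cell out of ten entries flips the sign of the estimated covariance and inflates its magnitude tenfold, which is the behaviour a cellwise-robust estimator is designed to avoid.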


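2505.18113 2026-06-02 cs.LG math.OC version update

Beyond Discreteness: Sample Complexity Analysis of Straight-Through Estimator for 1-bit Quantization

Halyun Jeong, Jack Xin, Penghang Yin

Affiliations: Department of Mathematics and Statistics, University at Albany, SUNY; Department of Mathematics, University of California, Irvine

AI summary: Gives the first sample complexity analysis of the straight-through estimator (STE) in neural network quantization. By studying quantization-aware training of a two-layer network with binary weights and activations, it derives sample complexity bounds that guarantee convergence of STE-based optimization to the global minimum, shows that under label noise the STE-gradient iterates repeatedly escape from and return to the optimal binary weights, and finds that STE fails on non-Gaussian data but recovers its effectiveness with normalization.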


AI中文摘要

训练量化神经网络需要解决底层优化问题的非可微和离散性质。为应对这一挑战，直通估计器（STE）已成为最广泛采用的启发式方法，通过引入有偏但有效的替代梯度，允许通过离散操作进行反向传播。然而，其理论性质在很大程度上仍未被探索，现有少数分析通过假设无限训练数据来关注泛化误差。相比之下，本文首次在神经网络量化背景下对STE进行了样本复杂度分析。我们的理论结果强调了样本量在STE成功中的关键作用，这是现有研究缺失的关键见解。具体而言，通过分析具有二元权重和激活的两层神经网络的量化感知训练，我们推导出以数据维度表示的样本复杂度界，这些界保证了基于STE的优化在遍历和非遍历分析中收敛到全局最小值。此外，在存在标签噪声的情况下，我们证明了STE梯度方法的一个有趣循环性质，其中迭代反复逃离并返回到最优二元权重。最后，我们实验证明STE在一般非高斯数据上失败，但通过归一化可以恢复其有效性，这突显了其在有效量化中的实际重要性。

English abstract

Training quantized neural networks requires addressing the non-differentiable and discrete nature of the underlying optimization problem. To tackle this challenge, the straight-through estimator (STE) has become the most widely adopted heuristic, allowing backpropagation through discrete operations by introducing biased yet valid surrogate gradients. However, its theoretical properties remain largely unexplored, and the few existing analyses focus on the generalization error by assuming an infinite amount of training data. In contrast, this work presents the first sample complexity analysis of STE in the context of neural network quantization. Our theoretical results highlight the critical role of sample size in the success of STE, a key insight absent from existing studies. Specifically, by analyzing the quantization-aware training of a two-layer neural network with binary weights and activations, we derive sample complexity bounds in terms of the data dimensionality that guarantee the convergence of STE-based optimization to the global minimum for both ergodic and non-ergodic analyses. Moreover, in the presence of label noise, we prove an intriguing recurrence property of the STE-gradient method, where the iterates repeatedly escape from and return to the optimal binary weights. Finally, we empirically demonstrate that STE fails for general non-Gaussian data but that its effectiveness can be restored through normalization, underscoring the practical importance of normalization in effective quantization.
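The straight-through idea itself can be sketched in a few lines. The paper analyses a two-layer network with binary weights and activations; the example below is deliberately simpler (a single linear layer with binary weights and squared loss), and the clipping rule, learning rate, function names, and toy data are all assumptions made for illustration.

```python
def sign(v):
    """1-bit quantizer: maps a real value to +1.0 or -1.0 (zero goes to +1.0)."""
    return 1.0 if v >= 0 else -1.0


def ste_train(samples, dim, lr=0.1, epochs=20):
    """Train binary weights with a straight-through estimator.

    Forward pass: predictions use sign(latent weights).
    Backward pass: sign() is treated as the identity, so the gradient of the
    squared loss is applied directly to the real-valued latent weights.
    Gradients are zeroed where |latent| > 1, a common clipping variant.
    """
    latent = [0.01] * dim
    for _ in range(epochs):
        for x, y in samples:
            binary = [sign(w) for w in latent]
            pred = sum(b * xi for b, xi in zip(binary, x))
            err = pred - y
            for i in range(dim):
                if abs(latent[i]) <= 1.0:
                    latent[i] -= lr * err * x[i]
    return [sign(w) for w in latent]


# Toy data generated by the hypothetical ground-truth binary weights [1, -1, 1].
samples = [
    ([1.0, 1.0, 0.0], 0.0),
    ([0.0, 1.0, 1.0], 0.0),
    ([1.0, 0.0, 1.0], 2.0),
]
print(ste_train(samples, dim=3))  # [1.0, -1.0, 1.0]
```

The true gradient of `sign` is zero almost everywhere, so plain backpropagation would never move the latent weights. The surrogate identity gradient is what lets the latent weights cross zero and flip the quantized weight.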


2503.24183 2026-06-02 cs.LG cs.MA version update

Scalable Ride-Sourcing Vehicle Rebalancing with Service Accessibility Guarantee: A Constrained Mean-Field Reinforcement Learning Approach

Matej Jusup, Kenan Zhang, Zhiyuan Hu, Barna Pásztor, Andreas Krause, Francesco Corman

Affiliations: ETH Zürich; EPFL Lausanne

AI summary: Proposes a continuous-state rebalancing model based on constrained mean-field reinforcement learning that coordinates large ride-sourcing fleets efficiently and scalably while guaranteeing service fairness.

Comments 34 pages, 15 figures


Journal ref: Transportation Research Part C: Emerging Technologies, Vol. 188, 105705 (2026)

AI中文摘要

Uber和Lyft等网约车服务的扩张通过移动应用提供灵活的按需出行，重塑了城市交通。尽管便利，这些平台面临重大运营挑战，尤其是车辆再平衡——即战略性地重新定位车队以解决供需的时空错配。再平衡不足会导致乘客等待时间延长和车辆利用率低下，还会引发公平性问题，如服务分布不均和司机收入差异。为解决这些问题，我们引入了具有连续再平衡动作的连续状态平均场控制（MFC）和平均场强化学习（MFRL）模型。MFC和MFRL通过车辆与车辆分布（而非单个车辆）的交互来建模每辆车的行为，从而提供可扩展的解决方案。这缓解了关于智能体数量的维度灾难，使得能够以显著降低的计算复杂度协调大型车队，并在车队规模变化时无需重新训练模型。为确保跨地理区域的公平服务可及性，我们将可及性约束整合到模型中，并推导出在高度满足乘客需求和公平覆盖车辆供应之间取得平衡的再平衡策略。使用深圳数据驱动模拟的广泛评估证明了我们方法的效率和鲁棒性。值得注意的是，该方法可扩展到数万辆车辆，训练时间与线性规划再平衡相当。此外，我们的策略有效探索了效率-公平帕累托前沿，在车队利用率、完成请求数和接驾距离等关键指标上优于传统基准，同时确保公平的服务可及性。

English abstract

The expansion of ride-sourcing services such as Uber and Lyft has reshaped urban transportation by offering flexible, on-demand mobility via mobile applications. Despite their convenience, these platforms confront significant operational challenges, particularly vehicle rebalancing, the strategic repositioning of a fleet of vehicles to address spatiotemporal mismatches in supply and demand. Inadequate rebalancing not only results in prolonged rider waiting times and inefficient vehicle utilization, but also leads to fairness issues, such as the inequitable distribution of service and disparities in driver income. To tackle these, we introduce continuous-state mean-field control (MFC) and mean-field reinforcement learning (MFRL) models with continuous repositioning actions. MFC and MFRL offer scalable solutions by modeling each vehicle's behavior through interaction with the vehicle distribution, rather than with individual vehicles. This mitigates the curse of dimensionality with respect to the number of agents, enabling coordination across large fleets with significantly reduced computational complexity and eliminating the need to retrain the model when fleet size changes. To ensure equitable service access across geographic regions, we integrate an accessibility constraint into the models and derive rebalancing policies that strike a balance between high fulfillment of rider demand and fair coverage of vehicle supply. Extensive evaluation using a data-driven simulation of Shenzhen demonstrates the efficiency and robustness of our approach. Remarkably, it scales to tens of thousands of vehicles, with training times comparable to linear programming rebalancing. In addition, our policies effectively explore the efficiency-equity Pareto front, outperforming conventional benchmarks across key metrics like fleet utilization, fulfilled requests, and pickup distance, while ensuring equitable service access.


2505.14725 2026-06-02 q-bio.GN cs.LG stat.AP version update

HR-VILAGE-3K3M: A Human Respiratory Viral Immunization Longitudinal Gene Expression Dataset for Systems Immunity

Xuejun Sun, Yiran Song, Xiaochen Zhou, Ruilie Cai, Yu Zhang, Xinyi Li, Rui Peng, Jialiu Xie, Yuanyuan Yan, Muyao Tang, Prem Lakshmanane, Baiming Zou, James S. Hagood, Raymond J. Pickles, Didong Li, Fei Zou, Xiaojing Zheng

Affiliations: Department of Biostatistics, University of North Carolina at Chapel Hill; Department of Epidemiology and Biostatistics, University of South Carolina; Department of Pediatrics, University of North Carolina at Chapel Hill; Department of Microbiology and Immunology, University of North Carolina at Chapel Hill

AI summary: To address transcriptomic data on respiratory viral infection being scattered and inconsistently processed, the authors built the HR-VILAGE-3K3M dataset covering 3,178 subjects across 66 studies. It integrates bulk and single-cell transcriptomic data from vaccination, viral inoculation, and mixed exposures with unified preprocessing and quality control, to support biomarker discovery, immune-mechanism research, and the development of analysis methods.


AI中文摘要

呼吸道病毒感染构成全球健康负担，但保护性和病理性的细胞免疫机制仍不清楚。自然感染队列通常缺乏暴露前基线和时间控制采样，而接种和疫苗试验则产生结构良好的纵向转录组数据。然而，这些数据集分散在多个存储库中且处理不一致，阻碍了整合性和AI驱动的分析。为应对这些挑战，我们开发了人类呼吸道病毒免疫纵向基因表达（HR-VILAGE-3K3M）存储库：一个整合了来自66项研究的3178名受试者的批量及单细胞转录组谱的AI就绪资源。该数据集涵盖疫苗接种、病毒接种和混合暴露，样本来自血液和鼻拭子，收集自GEO、ImmPort和ArrayExpress等公共存储库。我们整理并协调了受试者级别的元数据，标准化了结果测量，并应用了统一的预处理和严格的质量控制。我们还提供了基准分析以说明其实用性。该资源支持生物标志物发现、免疫机制和方法学开发。作为人类呼吸道病毒免疫领域最大的纵向转录组资源之一，HR-VILAGE-3K3M能够实现可重复和可扩展的分析，从而加速疫苗和抗病毒研究。

English abstract

Respiratory viral infections pose a global health burden, yet the cellular immune mechanisms underlying protection and pathology remain unclear. Natural infection cohorts often lack pre-exposure baselines and time-controlled sampling, whereas inoculation and vaccination trials generate well-structured longitudinal transcriptomic data. However, these datasets are scattered across repositories and processed inconsistently, hindering integrative and AI-driven analyses. To address these challenges, we developed the Human Respiratory Viral Immunization LongitudinAl Gene Expression (HR-VILAGE-3K3M) repository: an AI-ready resource integrating bulk and single-cell transcriptomic profiles from 3,178 subjects across 66 studies. The dataset spans vaccination, inoculation, and mixed exposures, with samples from blood and nasal swabs collected from public repositories including GEO, ImmPort, and ArrayExpress. We curated and harmonized subject-level metadata, standardized outcome measures, and applied unified preprocessing with rigorous quality control. We further provide benchmark analyses illustrating its utility. This resource supports discovery of biomarkers, immune mechanisms, and methodological development. As one of the largest longitudinal transcriptomic resources for human respiratory viral immunization, HR-VILAGE-3K3M enables reproducible and scalable analyses to accelerate vaccine and antiviral research.


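2505.13273 2026-06-02 cs.AI cs.LG version update

MPEC: Manifold-Preserved EEG Classification via an Ensemble of Clustering-Based Classifiers

Shermin Shahbazi, Mohammad-Reza Nasiri, Majid Ramezani

Affiliations: Department of Electrical and Computer; Department of Computer Science and Engineering, Information Technology

AI summary: Proposes MPEC, which addresses the non-Euclidean manifold structure of EEG signals through feature engineering with covariance matrices and RBF kernels and an ensemble of modified K-means clusterings on the Riemannian manifold, achieving significant improvements on BCI Competition IV dataset 2a.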

Comments 7 pages, 3 figures


DOI: 10.1109/CSICC65765.2025.10967471

AI中文摘要

脑电图信号的准确分类对于脑机接口（BCI）和神经假体应用至关重要，然而许多现有方法未能考虑EEG数据的非欧几里得流形结构，导致性能欠佳。保留这种流形信息对于捕捉EEG信号的真实几何结构至关重要，但传统分类技术在很大程度上忽视了这一需求。为此，我们提出了MPEC（通过集成基于聚类的分类器实现流形保持的EEG分类），它引入了两项关键创新：（1）一个特征工程阶段，结合协方差矩阵和径向基函数（RBF）核来捕捉EEG通道之间的线性和非线性关系；（2）一个聚类阶段，采用针对黎曼流形空间定制的改进K-means算法，确保局部几何敏感性。通过集成多个基于聚类的分类器，MPEC取得了优越的结果，并在BCI Competition IV数据集2a上得到了显著改进的验证。

English abstract

Accurate classification of EEG signals is crucial for brain-computer interfaces (BCIs) and neuroprosthetic applications, yet many existing methods fail to account for the non-Euclidean, manifold structure of EEG data, resulting in suboptimal performance. Preserving this manifold information is essential to capture the true geometry of EEG signals, but traditional classification techniques largely overlook this need. To this end, we propose MPEC (Manifold-Preserved EEG Classification via an Ensemble of Clustering-Based Classifiers), which introduces two key innovations: (1) a feature engineering phase that combines covariance matrices and Radial Basis Function (RBF) kernels to capture both linear and non-linear relationships among EEG channels, and (2) a clustering phase that employs a modified K-means algorithm tailored for the Riemannian manifold space, ensuring local geometric sensitivity. By ensembling multiple clustering-based classifiers, MPEC achieves superior results, validated by significant improvements on the BCI Competition IV dataset 2a.


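2504.20238 2026-06-02 physics.ao-ph cs.LG version update

Atmospheric Predictability Beyond 30 Days with Machine Learning

P. Trent Vonich, Gregory J. Hakim

Affiliations: University of Washington; Air Force Institute of Technology

AI summary: By optimizing initial conditions with the machine-learning weather model GraphCast, the paper extends deterministic forecast skill from roughly two weeks to beyond 30 days, with an average error reduction of 86% at ten days.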


AI中文摘要

长期以来，大气可预报性研究认为小空间尺度上的快速误差增长将确定性天气预报的固有时效限制在约两周。我们利用机器学习天气模型GraphCast，通过优化2020年每日两次预报的初始条件，挑战了这一极限。该方法在十天预报中，相对于再分析初始条件的控制预报，平均误差降低了86%，且技能持续超过30天。平均最优初始条件扰动显示出大尺度、空间一致的修正，主要反映了哈德莱环流的增强。在盘古天气模型中使用GraphCast最优初始条件的预报实现了21%的误差降低，在四天时达到峰值，表明分析修正针对的是模型和分析误差的调整。这些结果证明了存在能够产生远超两周的有技巧的确定性预报的初始条件。这些初始条件是否能够实时识别以改进业务天气预报，仍是未来研究的课题。

English abstract

Atmospheric predictability research has long held that rapid error growth at small spatial scales imposes an intrinsic limit of roughly two weeks on deterministic weather forecast skill. We challenge this limit using GraphCast, a machine-learning weather model, by optimizing initial conditions for twice-daily forecasts spanning 2020. This approach yields an average error reduction of 86% at ten days relative to control forecasts from reanalysis initial conditions, with skill lasting beyond 30 days. Mean optimal initial-condition perturbations reveal large-scale, spatially coherent corrections primarily reflecting an intensification of the Hadley circulation. Forecasts using GraphCast-optimal initial conditions in the Pangu-Weather model achieve a 21% error reduction, peaking at four days, indicating that analysis corrections reflect adjustments that target both model and analysis error. These results demonstrate the existence of initial conditions producing skillful deterministic forecasts far beyond two weeks. Whether such initial conditions can be identified in real-time for improving operational weather forecasts remains a topic of future research.


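2504.17471 2026-06-02 cs.LG cs.AI cs.DC version update

Non-Vacuous Generalization Bounds for Deep Neural Networks Without Any Modification to the Trained Models

Khoat Than, Dat Phan

Affiliations: Hanoi University of Science and Technology; VinBigdata Institute

AI summary: Proposes a new class of data-dependent generalization bounds that apply directly to unmodified trained models, decomposing the generalization error into a distributional complexity term and local model-behavior terms, and achieves non-vacuous generalization guarantees on large, unmodified deep networks for the first time.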


AI中文摘要

理解和认证现代深度神经网络的行为仍然是可靠机器学习中的一个基本挑战。我们引入了一类新的数据依赖泛化界，直接应用于训练模型，无需任何修改。特别地，我们提出了一个可精确计算的界，在所有评估的网络中（包括具有6亿参数的ImageNet规模模型）都是非平凡的。这是首次表明即使对于大型未修改的深度网络，也能实现有意义的泛化保证。我们的方法揭示了泛化由训练模型与数据分布几何之间的相互作用所支配。我们将泛化误差分解为两个可解释的组成部分：一个分布复杂度项，捕捉数据质量在输入空间中的分布；以及局部模型行为项，捕捉网络在单个区域内的行为。这种联合依赖识别出泛化差距出现的位置和原因。实验上，我们界的某些部分对真实测试误差具有高度预测性，并且当划分与内在数据几何对齐时，界会收紧，突出了数据依赖的局部正则性作为泛化的关键驱动因素。

English abstract

Understanding and certifying the behavior of modern deep neural networks remains a fundamental challenge in reliable machine learning. We introduce a new class of data-dependent generalization bounds that apply directly to trained models, without any modification. In particular, we present an exactly computable bound that is non-vacuous across all evaluated networks, including ImageNet-scale models with 600M parameters. This is the first work showing that meaningful generalization guarantees are achievable even for large, unaltered deep networks. Our approach reveals that generalization is governed by the interaction between the trained model and the geometry of the data distribution. We decompose the generalization error into two interpretable components: a distributional complexity term, capturing how the data mass is distributed across the input space, and local model-behavior terms, capturing the network's behavior within individual regions. This joint dependence identifies where and why generalization gaps arise. Empirically, some components of our bound are highly predictive of the true test error, and the bound tightens when the partition aligns with the intrinsic data geometry, highlighting data-dependent local regularity as a key driver of generalization.

2502.20016 2026-06-02 cs.LG 版本更新

Position: Neglecting the Sustainability of AI is Fuelling a Global AI Arms Race

立场：忽视人工智能的可持续性正在助长全球人工智能军备竞赛

Pedram Bakhtiarifard, Pınar Tözün, Christian Igel, Raghavendra Selvan

发表机构 * Department of Computer Science, University of Copenhagen, Denmark（丹麦哥本哈根大学计算机科学系）； Robotics Section, IT University of Copenhagen, Denmark（丹麦哥本哈根IT大学机器人学系）

AI总结本文指出当前AI可持续性讨论忽视经济和社会维度，提出通过调和气候意识与资源意识、引入CARAML框架来遏制全球AI军备竞赛。

Comments Accepted to be presented at ICML 2026. Source code at https://github.com/saintslab/caraml

AI中文摘要

可持续性包含三个关键方面：经济、环境和社会。然而，关于可持续人工智能（AI）的新兴讨论主要集中在AI的环境可持续性上，忽视了经济和社会方面。实现真正可持续的AI需要解决其环境可持续性（强调减轻AI对气候的影响）与社会可持续性（依赖于公平获取AI开发资源）之间的张力。然而，这种提高可及性的推动往往忽视了扩大此类资源使用的环境成本。本立场论文认为，调和气候意识和资源意识对于实现真正可持续的AI至关重要，而忽视这些因素会助长全球AI军备竞赛。运用历史唯物主义的卡尔·马克思基础-上层建筑框架，我们分析了物质条件如何塑造当前的AI进展及其相关讨论。此外，我们引入了气候与资源感知机器学习（CARAML）框架，并提出了涵盖个人、社区、行业、政府和全球层面的可操作建议，以实现可持续的AI。

英文摘要

Sustainability encompasses three key facets: economic, environmental, and social. However, the nascent discourse on sustainable artificial intelligence (AI) predominantly focuses on the environmental sustainability of AI, neglecting the economic and social aspects. Achieving truly sustainable AI necessitates addressing the tension between its environmental sustainability, which emphasises mitigating AI's climate impact, and its social sustainability, hinging on equitable access to AI development resources. This push for increased accessibility, however, often overlooks the environmental costs of expanding such resource usage. This position paper argues that reconciling climate awareness and resource awareness is essential to realising truly sustainable AI, and neglecting these factors fuels a global AI arms race. Applying Karl Marx's base-superstructure framework from historical materialism, we analyse how the material conditions are shaping the current AI progress and the discourse surrounding it. Further, we introduce the Climate and Resource Aware Machine Learning (CARAML) framework with actionable recommendations spanning individual, community, industry, government, and global levels to achieve sustainable AI.

2112.11279 2026-06-02 cs.LG 版本更新

Differential Parity: Relative Fairness Between Two Sets of Decisions

差分均等性：两组决策之间的相对公平性

Zhe Yu, Xiaoyin Xi, Pranam Prakash Shetty

发表机构 * Rochester Institute of Technology（罗切斯特理工学院）

AI总结本文提出差分均等性概念，通过检验两组决策之间的差异是否独立于敏感属性来评估相对公平性，避免了绝对公平定义的模糊性；有参考集时可作为群体公平度量，无参考集时可揭示不同决策集之间的相对偏好或偏见。

Comments Accepted by JAIR

DOI: 10.1613/jair.1.21278

AI中文摘要

随着AI系统广泛应用于辅助人类决策过程，如人才招聘、学校录取和贷款审批，确保决策公平的需求日益增长。分析决策公平性的一个主要挑战是标准高度主观且依赖上下文——对于每个场景而言，绝对公平的含义并无共识。更何况，不同的公平标准还常常相互冲突。为了绕过这个问题，本文旨在测试决策中的相对公平性。也就是说，我们不定义什么是“绝对”公平的决策，而是提出通过差分均等性——两组决策之间的差异应独立于某个敏感属性——来测试一组决策相对于另一组的相对公平性。这一提出的差分均等性公平概念具有以下优点：(1) 避免了绝对公平决策定义的模糊性和矛盾性；(2) 当存在参考集（真实标签或可靠公平决策）时，差分均等性可作为新的群体公平概念（类似于分离性和充分性，但有所不同）；(3) 即使没有参考集，它也能揭示不同决策集之间的相对偏好或偏见。差分均等性的一个局限性是它要求被比较的两组决策针对相同的数据主体做出。为了克服这一局限性，我们提出利用机器学习模型来弥合针对不同数据做出的两组决策之间的差距，并估计差分均等性。

英文摘要

With AI systems widely applied to assist humans in decision-making processes such as talent hiring, school admission, and loan approval, there is an increasing need to ensure that the decisions made are fair. One major challenge for analyzing fairness in decisions is that the standards are highly subjective and contextual -- there is no consensus on what absolute fairness means for every scenario. Moreover, different fairness standards often conflict with each other. To bypass this issue, this work aims to test relative fairness in decisions. That is, instead of defining what are ``absolutely'' fair decisions, we propose to test the relative fairness of one decision set against another with differential parity -- the difference between two sets of decisions should be independent of a certain sensitive attribute. This proposed notion of differential parity fairness has the following benefits: (1) it avoids the ambiguous and contradictory definition of what absolutely fair decisions are; (2) when a reference set (of ground truth or reliable fair decisions) is available, differential parity can serve as a new group fairness notion (similar to but different from separation and sufficiency); (3) even when no reference set is available, it reveals the relative preference or bias between different decision sets. One limitation of differential parity is that it requires the two sets of decisions under comparison to be made on the same data subjects. To overcome this limitation, we propose to utilize a machine learning model to bridge the gap between the two sets of decisions made on different data and estimate the differential parity.
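
The idea that the difference between two decision sets should not depend on a sensitive attribute can be illustrated with a minimal Python sketch. The abstract does not state which test statistic the authors use, so the function name `differential_parity_gap`, the binary 0/1 encoding of the sensitive attribute, and the use of a plain difference of group means are all assumptions made for illustration only.

```python
from statistics import mean


def differential_parity_gap(decisions_a, decisions_b, sensitive):
    """Gap in the mean decision difference (A - B) between two groups.

    Hypothetical helper, not the authors' statistic: a gap near 0 is
    consistent with the difference being independent of the sensitive
    attribute, but this is a descriptive number, not a formal test.
    """
    if not (len(decisions_a) == len(decisions_b) == len(sensitive)):
        raise ValueError("all inputs must cover the same data subjects")
    # Per-subject difference between the two decision sets.
    diffs = [a - b for a, b in zip(decisions_a, decisions_b)]
    group1 = [d for d, s in zip(diffs, sensitive) if s == 1]
    group0 = [d for d, s in zip(diffs, sensitive) if s == 0]
    if not group1 or not group0:
        raise ValueError("both groups need at least one subject")
    return mean(group1) - mean(group0)


# Toy data: two sets of binary decisions on the same six subjects.
a = [1, 1, 0, 1, 0, 0]
b = [1, 0, 0, 1, 1, 0]
s = [1, 1, 1, 0, 0, 0]
gap = differential_parity_gap(a, b, s)
print(round(gap, 3))  # 0.667
```

In this toy data, decision set A is more favourable than B for group 1 and less favourable for group 0, so the gap is positive. In practice a significance test on the two groups of differences would be needed before drawing any conclusion.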

2502.04646 2026-06-02 cs.LG cs.AI 版本更新

Efficient Weighted Sampling via Score-based Generative Models

基于分数生成模型的高效加权采样

Heasung Kim, Taekyun Lee, Hyeji Kim, Gustavo de Veciana

发表机构 * The University of Texas at Austin（德克萨斯大学奥斯汀分校）

AI总结提出一种无需训练的加权采样框架，通过轻量级引导近似和不确定性感知调度器，在预训练分数生成模型上实现高效、稳定的采样，并在大规模设置中取得1.2至4.7倍加速。

Comments 37 pages

AI中文摘要

加权采样——从与基概率密度函数和权重函数乘积成比例的概率密度函数中采样——是一种基础技术，在方差缩减、有偏采样、数据增强等领域有广泛应用。利用日益可用的预训练分数生成模型，我们提出了一种无需训练的加权采样框架，通过以原则性和计算高效的方式，用辅助引导项增强预训练基分数函数，来近似目标分布的逆向扩散过程。我们的方法基于两个关键组件：一个轻量级的引导近似，避免了分数函数和权重函数的高阶导数；以及一个不确定性感知调度器，基于近似误差的时间分析动态调整引导强度。这些组件共同实现了准确稳定的采样，无需依赖现有方法通常需要的基于粒子的重采样或Hessian评估。我们从合成设置到大规模设置（如Stable Diffusion XL）验证了方法的有效性，在该框架下，我们实现了1.2倍到4.7倍的加速，同时在任务性能上始终匹配或超越最先进的基线。这些结果使我们的方法成为生成应用中任务自适应、时间敏感采样的可扩展且推理高效的解决方案。

英文摘要

Weighted sampling -- sampling from a probability density function (PDF) proportional to the product of a base PDF and a weight function -- is a fundamental technique with wide-ranging applications in variance reduction, biased sampling, data augmentation, and more. Leveraging the increasing availability of pretrained score-based generative models (SGMs), we propose a training-free weighted sampling framework that approximates the backward diffusion process of the target distribution by augmenting the pretrained base score function with an auxiliary guidance term, in a principled and computationally efficient manner. Our approach builds on two key components: a lightweight approximation of the guidance that avoids costly higher-order derivatives of both the score and weight functions, and an uncertainty-aware scheduler that dynamically adjusts the guidance strength based on a temporal analysis of approximation error. Together, these components enable accurate and stable sampling without relying on particle-based resampling or Hessian evaluations commonly required by existing methods. We validate the effectiveness of our method from synthetic to large-scale settings such as Stable Diffusion XL, where our framework achieves $1.2\times$ to $4.7\times$ speedups while consistently matching or outperforming state-of-the-art baselines in task performance. These results position our method as a scalable and inference-efficient solution for task-adaptive, time-sensitive sampling in generative applications.
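
The "auxiliary guidance term" can be made concrete with a standard identity. The abstract does not give the authors' approximation, so the following is only the exact relation that any such approximation targets. The notation is an assumption introduced here: $p$ is the base density, $w > 0$ the weight function, $Z$ the normalizing constant, $p_t$ the base density after forward noising to time $t$, and the expectation is taken under the base model's posterior $p(x_0 \mid x_t)$.

$$
p_w(x) = \frac{p(x)\,w(x)}{Z}, \qquad Z = \int p(x)\,w(x)\,dx
\quad\Longrightarrow\quad
\nabla_x \log p_w(x) = \nabla_x \log p(x) + \nabla_x \log w(x)
$$

$$
\nabla_{x_t} \log p_{w,t}(x_t) = \nabla_{x_t} \log p_t(x_t) + \nabla_{x_t} \log \mathbb{E}\left[\, w(x_0) \mid x_t \,\right]
$$

The first line holds at the data level. The second follows because noising $p_w$ with the same forward kernel gives $p_{w,t}(x_t) = p_t(x_t)\,\mathbb{E}[w(x_0) \mid x_t] / Z$. The conditional expectation is generally intractable, which is why a cheap approximation of the second term is needed.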

2306.15369 2026-06-02 cs.SE cs.LG 版本更新

A Meta-analytical Comparison of Naive Bayes and Random Forest for Software Defect Prediction

朴素贝叶斯与随机森林在软件缺陷预测中的元分析比较

Ch Muhammad Awais, Wei Gu, Gcinizwe Dlamini, Zamira Kholmatova, Giancarlo Succi

发表机构 * Innopolis University（因诺波利斯大学）； Università di Bologna（博洛尼亚大学）

AI总结通过系统文献综述和元分析，比较朴素贝叶斯和随机森林在召回率、F-measure和精确度上的统计差异，发现两者无显著差异。

Comments 11 pages, 8 figures, Conference Paper

2111.03861 2026-06-02 cs.CV cs.AI cs.LG 版本更新

What augmentations are sensitive to hyper-parameters and why?

哪些数据增强对超参数敏感以及为什么？

Ch Muhammad Awais, Imad Eddine Ibrahim Bekkouch

发表机构 * Knowledge Representation Lab, Innopolis University（因诺波利斯大学知识表示实验室）； Sorbonne Center for Artificial Intelligence - SCAI, Sorbonne University（索邦大学索邦人工智能中心 SCAI）

AI总结本研究通过局部代理（LIME）解释和线性回归系数评估不同数据增强对模型超参数的敏感性、一致性和影响，发现某些增强对超参数高度敏感，而另一些则更稳健可靠。

Comments 10 pages, 17 figures

DOI: 10.1007/978-3-031-10461-9_31
Journal ref: Intelligent Computing: Proceedings of the 2022 Computing Conference

AI中文摘要

我们对数据集应用增强以提高预测质量，并使最终模型对噪声数据和领域漂移更具鲁棒性。然而，问题仍然存在：这些增强在不同的超参数下表现如何？在本研究中，我们通过执行局部代理（LIME）解释来评估增强对模型超参数的敏感性、一致性和影响，当不同增强应用于机器学习模型时，解释超参数的影响。我们利用线性回归系数来加权每个增强。我们的研究证明，有些增强对超参数高度敏感，而其他增强则更具鲁棒性和可靠性。

英文摘要

We apply augmentations to our dataset to enhance the quality of our predictions and make our final models more resilient to noisy data and domain drifts. Yet the question remains: how will these augmentations perform with different hyper-parameters? In this study, we evaluate the sensitivity of augmentations with regard to the model's hyper-parameters, along with their consistency and influence, by performing a Local Surrogate (LIME) interpretation of the impact of hyper-parameters when different augmentations are applied to a machine learning model. We use linear regression coefficients to weight each augmentation. Our research shows that some augmentations are highly sensitive to hyper-parameters, while others are more resilient and reliable.

2501.08640 2026-06-02 cs.LG stat.ML 版本更新

Quantum Reservoir Computing and Risk Bounds

量子储层计算与风险界

Naomi Mona Chmielewski, Nina Amini, Joseph Mikael

发表机构 * EDF Lab, France（法国EDF实验室）

AI总结利用Rademacher复杂度对量子储层计算中的泛化误差进行界定，并分析其随量子比特数增长的标度行为。

2412.19444 2026-06-02 cs.LG math.OC stat.ML 版本更新

Towards Simple and Provable Parameter-Free Adaptive Gradient Methods

迈向简单且可证明的无参数自适应梯度方法

Yuanzhe Tao, Yifeng Liu, Huizhuo Yuan, Xun Zhou, Yuan Cao, Quanquan Gu

发表机构 * School of Mathematical Sciences, Peking University（北京大学数学科学学院）； Department of Computer Science, University of California, Los Angeles（加州大学洛杉矶分校计算机科学系）； School of Computing and Data Science, the University of Hong Kong（香港大学计算科学与数据科学学院）； Bytedance Inc（字节跳动公司）

AI总结提出 AdaGrad++ 和 Adam++ 两种简单无参数自适应梯度方法，在无需预设学习率的情况下实现与 AdaGrad 和 Adam 相当的收敛保证。

Comments 45 pages, 19 figures, 3 tables

AI中文摘要

诸如 AdaGrad 和 Adam 等优化算法通过在优化过程中动态调整学习率，显著推进了深度模型的训练。然而，学习率的临时调整带来了挑战并导致实际中的低效。为解决此问题，近期研究聚焦于开发无需学习率调整即可有效运行的“无参数”算法。尽管有这些努力，现有的 AdaGrad 和 Adam 无参数变体往往过于复杂且/或缺乏正式的收敛保证。在本文中，我们提出了 AdaGrad++ 和 Adam++，这是 AdaGrad 和 Adam 的新型简单无参数变体，具有收敛保证。我们证明 AdaGrad++ 在凸优化中无需预设学习率假设即可达到与 AdaGrad 相当的收敛速率。类似地，Adam++ 在不依赖任何学习率条件的情况下匹配 Adam 的收敛速率。跨多种深度学习任务的实验结果验证了 Adam++ 的竞争性能。

英文摘要

Optimization algorithms such as AdaGrad and Adam have significantly advanced the training of deep models by dynamically adjusting the learning rate during the optimization process. However, ad-hoc tuning of learning rates poses a challenge and leads to inefficiencies in practice. To address this issue, recent research has focused on developing ``parameter-free'' algorithms that operate effectively without the need for learning rate tuning. Despite these efforts, existing parameter-free variants of AdaGrad and Adam tend to be overly complex and/or lack formal convergence guarantees. In this paper, we present AdaGrad++ and Adam++, novel and simple parameter-free variants of AdaGrad and Adam with convergence guarantees. We prove that AdaGrad++ achieves comparable convergence rates to AdaGrad in convex optimization without predefined learning rate assumptions. Similarly, Adam++ matches the convergence rate of Adam without relying on any conditions on the learning rates. Experimental results across various deep learning tasks validate the competitive performance of Adam++.

2412.19419 2026-06-02 cs.LG cs.AI 版本更新

Introduction to Graph Neural Networks for Machine Learning Engineers

面向机器学习工程师的图神经网络导论

James H. Tanis, Chris Giannella, Adrian V. Mariano, Daoud Meerzaman

发表机构 * The MITRE Corporation（MITRE公司）； National Cancer Institute（国家癌症研究所）

AI总结本文通过编码器-解码器框架介绍图神经网络，并通过同质图上的理论和实验分析不同训练规模和复杂度下的行为，重点讨论过平滑和过挤压问题。

Comments Author accepted manuscript. Title and metadata updated to match the published ACM Computing Surveys version. 73 pages, including references and supplementary material

2412.12036 2026-06-02 cs.LG cs.RO 版本更新

LeARN: Learnable and Adaptive Representations for Nonlinear Dynamics in System Identification

LeARN: 系统辨识中非线性动力学的可学习与自适应表示

Arunabh Singh, Joyjit Mukherjee

发表机构 * Visual Computing Lab, Indian Institute of Science（印度科学院视觉计算实验室）； Department of Electrical and Electronics Engineering, BITS Pilani Hyderabad Campus（BITS Pilani Hyderabad校区电子与电气工程系）

AI总结提出LeARN框架，通过元学习从数据中直接学习基函数库，无需领域知识，实现非线性动力学的自适应辨识，在Neural Fly数据集上达到与SINDy相当的动态误差性能。

Comments This work has been accepted at the 34th Mediterranean Conference on Control and Automation (MED 2026)

AI中文摘要

系统辨识是从观测的输入-输出数据中推导动态系统数学模型的过程，随着基于学习的方法的出现，经历了范式转变。这些方法解决了非线性动态系统中数据驱动发现的复杂挑战，受到了广泛关注。其中，稀疏非线性动力学辨识（SINDy）已成为一种变革性方法，将复杂的动态行为提炼为基函数的可解释线性组合。然而，SINDy依赖领域专业知识来构建其基函数的基础“库”，限制了其适应性和通用性。在这项工作中，我们引入了一个非线性系统辨识框架LeARN，通过直接从数据中学习基函数库，超越了对先验领域知识的需求。为了增强对不同噪声条件下动态系统演变的适应性，我们采用了一种新颖的基于元学习的系统辨识方法，利用轻量级深度神经网络（DNN）动态优化这些基函数。这不仅捕捉了复杂的系统行为，还能有效适应新的动态模式。我们在Neural Fly数据集上验证了我们的框架，展示了其强大的适应和泛化能力。尽管简单，我们的LeARN在动态误差性能上与SINDy相当。这项工作朝着自主发现动态系统迈出了一步，为机器学习无需大量领域特定干预即可揭示复杂系统控制原理的未来铺平了道路。

英文摘要

System identification, the process of deriving mathematical models of dynamical systems from observed input-output data, has undergone a paradigm shift with the advent of learning-based methods. Addressing the intricate challenges of data-driven discovery in nonlinear dynamical systems, these methods have garnered significant attention. Among them, Sparse Identification of Nonlinear Dynamics (SINDy) has emerged as a transformative approach, distilling complex dynamical behaviors into interpretable linear combinations of basis functions. However, SINDy's reliance on domain-specific expertise to construct its foundational 'library' of basis functions limits its adaptability and universality. In this work, we introduce LeARN, a nonlinear system identification framework that removes the need for prior domain knowledge by learning the library of basis functions directly from data. To enhance adaptability to evolving system dynamics under varying noise conditions, we employ a novel meta-learning-based system identification approach that uses a lightweight Deep Neural Network (DNN) to dynamically refine these basis functions. This not only captures intricate system behaviors but also adapts effectively to new dynamical regimes. We validate our framework on the Neural Fly dataset, showcasing its robust adaptation and generalization capabilities. Despite its simplicity, LeARN achieves dynamical error performance competitive with SINDy. This work presents a step towards autonomous discovery of dynamical systems, paving the way for a future where machine learning uncovers the governing principles of complex systems without requiring extensive domain-specific interventions.

2412.10362 2026-06-02 cs.LG cs.CV 版本更新

多标签特征选择的隐式正则化

Dou El Kefel Mansouri, Khalid Benabdeslem, Seif-Eddine Benkabou

AI总结针对多标签学习中的特征选择问题，提出一种基于隐式正则化和标签嵌入的估计器，通过Hadamard积参数化避免显式正则化项的额外偏差，实验表明该方法可减少偏差并可能导致良性过拟合。

Comments 14 pages, 11 figures, Submitted for publication and currently under review

2411.05196 2026-06-02 cs.AI cs.DL cs.LG 版本更新

Explainable AI Through a Democratic Lens: DhondtXAI for D'Hondt-Projected Feature Attribution

通过民主视角的可解释AI：用于D'Hondt投影特征归因的DhondtXAI

Turker Berk Donmez

发表机构 * Sakarya University of Applied Sciences（萨卡里亚应用科学大学）

AI总结提出DhondtXAI，一种基于D'Hondt规则的独立于SHAP的表格数据可解释性框架，通过计算背景干预移除效应、分离正负证据、形成特征联盟并分配席位，实现特征归因，在合成数据和医疗数据集上验证了其与SHAP的高度一致性。

AI中文摘要

本研究提出DhondtXAI，作为一种独立于SHAP、基于D'Hondt的表格可解释AI归因框架。DhondtXAI不依赖于模型原生特征重要性或SHAP值，而是计算背景干预移除效应，分离正负证据，形成可选的特征联盟，应用可选的阈值，通过D'Hondt规则分配席位，并投影到局部模型输出差异上。通过构造保持完整性，投影残差比作为诊断指标报告。该方法在合成加性和交互测试、相关特征扰动、算子和分配消融、投影模式比较、logit尺度检查、重复分割验证、配对删除测试以及两个医疗数据集（威斯康星诊断乳腺癌（CatBoost）和早期糖尿病风险预测（XGBoost））上进行了评估。SHAP仅作为外部比较器，设置对齐。在加性合成数据中，DhondtXAI精确恢复真实排名；在乘法交互中，联盟将平均投影残差从0.2527降至0.0001。在WDBC和糖尿病数据上，与SHAP高度一致（Spearman rho分别为0.9273和0.9353），并通过进一步的符号、top-k、幅度、删除和敏感性分析得到支持。结果表明，DhondtXAI是一种互补的比例性、联盟感知和阈值感知的表格可解释AI方法，而非SHAP或LIME的替代品。

英文摘要

This study presents DhondtXAI as a SHAP-independent, D'Hondt-based attribution framework for tabular XAI. Instead of model-native feature importance or SHAP values, DhondtXAI computes background-interventional removal effects, separates positive and negative evidence, forms optional feature alliances, applies optional thresholds, allocates seats via the D'Hondt rule, and projects onto the local model-output difference. Completeness is preserved by construction, with the projection residual ratio reported as a diagnostic. The method is evaluated on synthetic additive and interaction tests, correlated-feature perturbations, operator and apportionment ablations, projection-mode comparisons, logit-scale checks, repeated split validation, paired deletion tests, and two healthcare datasets: Wisconsin Diagnostic Breast Cancer (CatBoost) and early-stage diabetes risk prediction (XGBoost). SHAP serves only as an external comparator with aligned settings. In additive synthetics, DhondtXAI exactly recovers ground-truth rankings; in multiplicative interactions, alliances reduce the mean projection residual from 0.2527 to 0.0001. On WDBC and diabetes data, it shows high agreement with SHAP (Spearman rho = 0.9273 and 0.9353), supported by further signed, top-k, magnitude, deletion, and sensitivity analyses. Results position DhondtXAI as a complementary proportional, alliance-aware, and threshold-aware tabular XAI method, not a replacement for SHAP or LIME.
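
The seat-allocation step relies on the D'Hondt highest-averages rule, which can be sketched in a few lines of Python. This shows only the standard rule from proportional-representation elections. How DhondtXAI turns removal effects into "votes", and how alliances and thresholds are applied, is not specified in the abstract, so the function name `dhondt_allocate`, the feature names, and the vote values below are hypothetical.

```python
def dhondt_allocate(votes, seats):
    """Allocate `seats` among the keys of `votes` with the D'Hondt rule.

    Each seat goes to the entry with the largest quotient
    votes / (seats_already_won + 1). Ties go to the entry listed first.
    """
    if seats < 0:
        raise ValueError("seats must be non-negative")
    if any(v < 0 for v in votes.values()):
        raise ValueError("votes must be non-negative")
    allocation = {name: 0 for name in votes}
    if not votes:
        return allocation
    for _ in range(seats):
        # Pick the entry whose current quotient is highest.
        winner = max(votes, key=lambda name: votes[name] / (allocation[name] + 1))
        allocation[winner] += 1
    return allocation


# Hypothetical feature "votes", e.g. scaled attribution magnitudes.
feature_votes = {"feature_a": 100, "feature_b": 80, "feature_c": 30, "feature_d": 20}
result = dhondt_allocate(feature_votes, 8)
print(result)  # {'feature_a': 4, 'feature_b': 3, 'feature_c': 1, 'feature_d': 0}
```

The rule favours larger entries slightly: `feature_d` holds about 9% of the votes but receives no seat. That behaviour is one reason a method built on it might offer optional thresholds and alliances.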

2410.21361 2026-06-02 cs.CV cs.LG 版本更新

Domain Adaptation with a Single Vision-Language Embedding

基于单一视觉-语言嵌入的域适应

Mohammad Fahes, Tuan-Hung Vu, Andrei Bursuc, Patrick Pérez, Raoul de Charette

发表机构 * Inria（法国国家信息与自动化研究所）； Kyutai（Kyutai公司）

AI总结提出一种利用单一视觉-语言（VL）嵌入进行域适应的框架，通过提示/照片驱动的实例归一化（PIN）挖掘多种视觉风格，实现零样本和单样本无监督域适应，在语义分割任务上优于基线方法。

Comments International Journal of Computer Vision (IJCV 2026)

AI中文摘要

域适应在计算机视觉中已被广泛研究，但仍需要在训练时访问目标数据，这在现实世界的自动驾驶场景中可能难以获得，尤其是在罕见或恶劣条件下。本文提出了一种新的域适应框架，该框架依赖于单一的视觉-语言（VL）潜在嵌入，而不是完整的目标数据。首先，利用对比语言-图像预训练模型（CLIP），我们提出了提示/照片驱动的实例归一化（PIN）。PIN是一种特征增强方法，通过优化低级源特征的仿射变换，使用单一的目标VL潜在嵌入挖掘多种视觉风格。VL嵌入可以来自描述目标域的语言提示、部分优化的语言提示或单一未标记的目标图像。其次，我们表明这些挖掘的风格（即增强）可用于零样本（即无目标）和单样本无监督域适应。在真实世界驾驶数据集（包括Cityscapes和ACDC（恶劣条件））上的语义分割实验证明了所提出方法的有效性，在实用的零样本和单样本设置中优于相关基线。

英文摘要

Domain adaptation has been extensively investigated in computer vision but still requires access to target data at the training time, which might be difficult to obtain in real-world autonomous driving scenarios, especially under rare or adverse conditions. In this paper, we present a new framework for domain adaptation relying on a single Vision-Language (VL) latent embedding instead of full target data. First, leveraging a contrastive language-image pre-training model (CLIP), we propose prompt/photo-driven instance normalization (PIN). PIN is a feature augmentation method that mines multiple visual styles using a single target VL latent embedding, by optimizing affine transformations of low-level source features. The VL embedding can come from a language prompt describing the target domain, a partially optimized language prompt, or a single unlabeled target image. Second, we show that these mined styles (i.e., augmentations) can be used for zero-shot (i.e., target-free) and one-shot unsupervised domain adaptation. Experiments on semantic segmentation in real-world driving datasets, including Cityscapes and ACDC (adverse conditions), demonstrate the effectiveness of the proposed method, which outperforms relevant baselines in the practical zero-shot and one-shot settings.

2410.09737 2026-06-02 cs.LG 版本更新

Towards Stable, Globally Expressive Graph Representations with Laplacian Eigenvectors

利用拉普拉斯特征向量实现稳定且全局表达性的图表示

Junru Zhou, Cai Zhou, Xiyuan Wang, Pan Li, Muhan Zhang

发表机构 * Institute for Artificial Intelligence, Peking University（北京大学人工智能研究院）； Department of EECS, Massachusetts Institute of Technology（麻省理工学院电子工程与计算机科学系）； School of ECE, Georgia Institute of Technology（佐治亚理工学院电子与计算机工程系）

AI总结提出一种利用可学习的O(p)-不变表示和平滑处理数值接近特征值的方法，以增强图神经网络中拉普拉斯特征向量的稳定性和全局表达性。

AI中文摘要

提高图神经网络（GNN）表达能力的一种流行方法是使用拉普拉斯特征向量作为额外的节点特征，因为它们既可以作为结构标识符，也可以作为节点的全局坐标。正确处理特征向量之间的正交群对称性对于拉普拉斯特征向量增强的GNN的稳定性和泛化能力至关重要。先前的研究表明，对每个$p$维特征空间使用朴素的$O(p)$-群不变编码器通常会导致表达性损失和数值不稳定性。在本文中，我们提出了一种利用拉普拉斯特征向量生成\emph{稳定}且全局\emph{表达性}的图表示的新方法。与先前工作的主要区别在于：（i）我们的方法对每个维度为$p$的拉普拉斯特征空间利用\textbf{可学习的}$O(p)$-不变表示，这些表示建立在文献中已充分研究的强大正交群等变神经网络层之上；（ii）我们的方法以\textbf{平滑}的方式处理数值接近的特征值，确保其对扰动具有更好的鲁棒性。在各种图学习基准上的实验证明了我们方法的竞争性能，特别是其学习图全局属性的巨大潜力。

英文摘要

A popular way to improve the expressive power of graph neural networks (GNNs) is to use Laplacian eigenvectors as additional node features, since they can serve both as structural identifiers and global coordinates of nodes. Properly handling the orthogonal group symmetry among eigenvectors is crucial for the stability and generalizability of Laplacian eigenvector augmented GNNs. Previous studies have shown that using a naive $O(p)$-group invariant encoder for each $p$-dimensional eigenspace often leads to expressivity loss and numerical instability. In this paper, we propose a novel method exploiting Laplacian eigenvectors to generate \emph{stable} and globally \emph{expressive} graph representations. The main differences from previous works are that (i) our method utilizes \textbf{learnable} $O(p)$-invariant representations for each Laplacian eigenspace of dimension $p$, which are built upon powerful orthogonal group equivariant neural network layers already well studied in the literature, and that (ii) our method deals with numerically close eigenvalues in a \textbf{smooth} fashion, ensuring better robustness against perturbations. Experiments on various graph learning benchmarks demonstrate the competitive performance of our method, especially its great potential to learn global properties of graphs.

2404.13621 2026-06-02 cs.CV cs.LG cs.MM 版本更新

Attack on Scene Flow using Point Clouds

使用点云对场景流进行攻击

Haniyeh Ehsani Oskouie, Mohammad-Shahram Moin, Shohreh Kasaei

发表机构 * Sharif University of Technology（谢里弗大学）； ICT Research Institute（信息与通信技术研究所）

AI总结针对场景流网络提出白盒对抗攻击方法，在KITTI和FlyingThings3D数据集上使平均端点误差相对恶化最高达33.7%，并揭示仅攻击点云单一维度或单一颜色通道的影响。

DOI: 10.13140/RG.2.2.29455.19362

AI中文摘要

深度神经网络在使用点云准确估计场景流方面取得了显著进展，这对于视频分析、动作识别和导航等许多应用至关重要。然而，这些技术的鲁棒性仍然令人担忧，特别是在面对已被证明能在许多领域欺骗最先进深度神经网络的对抗攻击时。令人惊讶的是，场景流网络对此类攻击的鲁棒性尚未得到彻底研究。为解决这一问题，本文提出了一种专门针对场景流网络的白盒对抗攻击方法。实验结果表明，生成的对抗样本在KITTI和FlyingThings3D数据集上使平均端点误差相对下降高达33.7%。研究还揭示了仅针对点云的一个维度或颜色通道的攻击对平均端点误差的显著影响。通过分析这些攻击在场景流网络及其2D光流网络变体上的成功与失败，发现光流网络具有更高的脆弱性。代码可在https://github.com/aheldis/Attack-on-Scene-Flow-using-Point-Clouds.git获取。

英文摘要

Deep neural networks have made significant advancements in accurately estimating scene flow using point clouds, which is vital for many applications like video analysis, action recognition, and navigation. The robustness of these techniques, however, remains a concern, particularly in the face of adversarial attacks that have been proven to deceive state-of-the-art deep neural networks in many domains. Surprisingly, the robustness of scene flow networks against such attacks has not been thoroughly investigated. To bridge this gap, the proposed approach introduces adversarial white-box attacks specifically tailored for scene flow networks. Experimental results show that the generated adversarial examples obtain up to 33.7% relative degradation in average end-point error on the KITTI and FlyingThings3D datasets. The study also reveals the significant impact that attacks targeting point clouds in only one dimension or color channel have on average end-point error. Analyzing the success and failure of these attacks on the scene flow networks and their 2D optical flow network variants shows a higher vulnerability for the optical flow networks. Code is available at https://github.com/aheldis/Attack-on-Scene-Flow-using-Point-Clouds.git.

2401.17010 2026-06-02 cs.CR cs.AI cs.LG 版本更新

Finetuning Large Language Models for Vulnerability Detection

微调大型语言模型用于漏洞检测

Alexey Shestov, Rodion Levichev, Ravil Mussabayev, Evgeny Maslov, Anton Cheshkov, Pavel Zadorozhny

发表机构 * Sber AI Lab（Sber AI实验室）； Huawei Russian Research Institute（华为俄罗斯研究院）； Satbayev University（萨特拜耶夫大学）

AI总结本文通过微调WizardCoder模型，优化训练流程并处理类别不平衡，在漏洞检测任务上提升了ROC AUC和F1指标，展示了预训练LLM在源代码分析中的迁移学习潜力。

DOI: 10.1109/ACCESS.2025.3546700

AI中文摘要

本文介绍了微调大型语言模型（LLMs）用于检测源代码中漏洞的结果。我们利用WizardCoder（最新改进的先进LLM StarCoder），并通过进一步微调使其适应漏洞检测。为加速训练，我们修改了WizardCoder的训练过程，并研究了最优训练方案。针对负样本远多于正样本的不平衡数据集，我们还探索了不同技术以提升分类性能。微调后的WizardCoder模型在平衡和不平衡的漏洞数据集上，相比于CodeBERT类模型，在ROC AUC和F1指标上均有提升，证明了将预训练LLM用于源代码漏洞检测的有效性。关键贡献包括：微调先进的代码LLM WizardCoder、在不损害性能的前提下提高其训练速度、优化训练流程和方案、处理类别不平衡，以及在困难的漏洞检测数据集上提升性能。这展示了通过微调大型预训练语言模型进行专门源代码分析任务的迁移学习潜力。

英文摘要

This paper presents the results of finetuning large language models (LLMs) for the task of detecting vulnerabilities in source code. We leverage WizardCoder, a recent improvement of the state-of-the-art LLM StarCoder, and adapt it for vulnerability detection through further finetuning. To accelerate training, we modify WizardCoder's training procedure, and we also investigate optimal training regimes. For the imbalanced dataset with many more negative examples than positive, we also explore different techniques to improve classification performance. The finetuned WizardCoder model achieves improvements in ROC AUC and F1 measures on balanced and imbalanced vulnerability datasets over a CodeBERT-like model, demonstrating the effectiveness of adapting pretrained LLMs for vulnerability detection in source code. The key contributions are finetuning the state-of-the-art code LLM, WizardCoder, increasing its training speed without harming performance, optimizing the training procedure and regimes, handling class imbalance, and improving performance on difficult vulnerability detection datasets. This demonstrates the potential for transfer learning by finetuning large pretrained language models for specialized source code analysis tasks.

2307.05213 2026-06-02 cs.LG cs.AI 版本更新

Score Function Gradient Estimation to Widen the Applicability of Decision-Focused Learning

评分函数梯度估计以拓宽决策聚焦学习的适用性

Mattia Silvestri, Senne Berden, Jayanta Mandi, Ali İrfan Mahmutoğulları, Brandon Amos, Tias Guns, Michele Lombardi

发表机构 * University of Bologna（博洛尼亚大学）； KU Leuven（鲁汶大学）； Meta

AI总结提出一种结合随机平滑与评分函数梯度估计的方法，无需对问题结构做特定假设，即可将决策聚焦学习扩展到非线性目标、约束中不确定参数及两阶段随机优化问题。

DOI: 10.1613/jair.1.19498
Journal ref: Silvestri, Mattia, et al. "Score Function Gradient Estimation to Widen the Applicability of Decision-Focused Learning." Journal of Artificial Intelligence Research 85 (2026)

对抗机器学习中出现的非局部周长的Gamma收敛

Leon Bungert, Kerrek Stinson

发表机构 * Hausdorff Center for Mathematics, University of Bonn（波恩大学豪斯多夫数学中心）

AI总结本文证明了一种Minkowski型非局部周长在Gamma收敛意义下趋于局部各向异性周长，该非局部模型描述了二分类中对抗训练的正则化效应，仅假设分布具有有界BV密度，并应用于总变分、对抗训练渐近性和图离散化的收敛性分析。

Comments Fixed typos, added new isotropic-anisotropic decomposition formula for limit perimeter

DOI: 10.1007/s00526-024-02721-9
Journal ref: Calculus of Variations and Partial Differential Equations 63 (5), 114, 2024

AI中文摘要

在本文中，我们证明了Minkowski型非局部周长在Gamma收敛意义下趋于局部各向异性周长。该非局部模型描述了二分类中对抗训练的正则化效应。该能量本质上依赖于两个分布之间的相互作用，这些分布模拟了相关类别的似然。我们克服了分布典型的严格正则性假设，仅假设它们具有有界$BV$密度。在由紧性导出的自然拓扑中，我们证明了Gamma收敛到一个加权周长，其权重由两个密度的各向异性函数决定。尽管是局部的，这个尖锐界面极限反映了对抗扰动下的分类稳定性。我们进一步应用我们的结果来推导相关总变分的Gamma收敛，研究对抗训练的渐近性，并证明非局部周长的图离散化的Gamma收敛。

英文摘要

In this paper we prove Gamma-convergence of a nonlocal perimeter of Minkowski type to a local anisotropic perimeter. The nonlocal model describes the regularizing effect of adversarial training in binary classifications. The energy essentially depends on the interaction between two distributions modelling likelihoods for the associated classes. We overcome typical strict regularity assumptions for the distributions by only assuming that they have bounded $BV$ densities. In the natural topology coming from compactness, we prove Gamma-convergence to a weighted perimeter with weight determined by an anisotropic function of the two densities. Despite being local, this sharp interface limit reflects classification stability with respect to adversarial perturbations. We further apply our results to deduce Gamma-convergence of the associated total variations, to study the asymptotics of adversarial training, and to prove Gamma-convergence of graph discretizations for the nonlocal perimeter.

2312.03644 2026-06-02 cs.LG cs.MA 版本更新

MACCA: Offline Multi-agent Reinforcement Learning with Causal Credit Assignment

MACCA: 离线多智能体强化学习中的因果信用分配

Ziyan Wang, Yali Du, Yudi Zhang, Meng Fang, Biwei Huang

发表机构 * King’s College London（伦敦国王学院）； Eindhoven University of Technology（埃因霍温理工大学）； University of Liverpool（利物浦大学）； University of California San Diego（加州大学圣地亚哥分校）

AI总结提出基于动态贝叶斯网络的因果信用分配框架MACCA，通过建模环境变量、状态、动作和奖励的因果关系，实现离线多智能体强化学习中准确且可解释的信用分配。

Comments 21 pages, 4 figures

Journal ref: TMLR 2025

AI中文摘要

离线多智能体强化学习（MARL）在在线交互不切实际或存在风险的情况下具有重要价值。虽然MARL中的独立学习提供了灵活性和可扩展性，但在离线设置中，由于禁止与环境交互，准确地将信用分配给单个智能体面临挑战。在本文中，我们提出了一种新框架，即多智能体因果信用分配（MACCA），以解决离线MARL设置中的信用分配问题。我们的方法MACCA将生成过程表征为动态贝叶斯网络，捕获环境变量、状态、动作和奖励之间的关系。通过在离线数据上估计该模型，MACCA可以通过分析个体奖励的因果关系来学习每个智能体的贡献，确保准确且可解释的信用分配。此外，我们方法的模块化使其能够无缝集成到各种离线MARL方法中。理论上，我们证明了在离线数据集设置下，底层因果结构和用于生成智能体个体奖励的函数是可识别的，这为我们的建模正确性奠定了基础。在我们的实验中，我们证明MACCA不仅优于最先进的方法，而且在与其他骨干集成时也能提升性能。

英文摘要

Offline Multi-agent Reinforcement Learning (MARL) is valuable in scenarios where online interaction is impractical or risky. While independent learning in MARL offers flexibility and scalability, accurately assigning credit to individual agents in offline settings poses challenges because interactions with an environment are prohibited. In this paper, we propose a new framework, namely Multi-Agent Causal Credit Assignment (MACCA), to address credit assignment in the offline MARL setting. MACCA characterizes the generative process as a Dynamic Bayesian Network that captures relationships between environmental variables, states, actions, and rewards. By estimating this model on offline data, MACCA can learn each agent's contribution by analyzing the causal relationship of their individual rewards, ensuring accurate and interpretable credit assignment. Additionally, the modularity of our approach allows it to integrate with various offline MARL methods seamlessly. Theoretically, we prove that under the setting of the offline dataset, the underlying causal structure and the function for generating the individual rewards of agents are identifiable, which lays the foundation for the correctness of our modeling. In our experiments, we demonstrate that MACCA not only outperforms state-of-the-art methods but also enhances performance when integrated with other backbones.

2310.20545 2026-06-02 cs.LG math.OC stat.ML 版本更新

Optimizing accuracy and diversity: a multi-task approach to forecast combinations

优化准确性与多样性：一种多任务预测组合方法

Giovanni Felici, Antonio M. Sudoso

发表机构 * National Research Council（国家研究理事会）； Sapienza University of Rome（罗马萨皮恩扎大学）

AI总结提出一种基于深度学习架构的多任务优化方法，通过联合选择与组合预测模型，同时考虑准确性和多样性，提升时间序列点预测精度。

DOI: 10.1007/s10479-026-07291-x
Journal ref: Annals of Operations Research, 2026

AI中文摘要

我们提出了一种基于深度学习架构的多任务优化方法，用于时间序列预测。我们利用大量时间序列集合来识别可组合的预测模型权重，从而为每个序列生成预测。该方法联合处理两个任务：选择不同的预测模型及其有效组合。在此过程中，它以一种新颖的方式兼顾了预测方法的准确性和多样性。对于给定的时间序列，模型组合模块提取特征并用于优化预测方法的权重。同时，模型选择模块提取其他特征以识别用于预测的方法子集。该选择过程被构建为一个分类问题，标签表示用于序列的模型集合。这些标签通过求解一个辅助优化问题来确定，该问题为每个时间序列识别准确且多样的方法。然后，两个模块的输出被组合，整个神经网络通过梯度下降优化最小化自定义损失函数进行联合训练。在M4竞赛数据集和真实道路交通数据的大量序列上的实验结果表明，与最先进的方法相比，我们的方法提高了点预测精度。

英文摘要

We present a multi-task optimization approach based on a deep learning architecture for time series forecasting. We leverage large collections of time series to identify the weights of forecasting models that can be combined to produce forecasts for each series. This method jointly addresses two tasks: the selection of different forecasting models, and their effective combination. In doing so, it takes into account, in an original way, both the accuracy and diversity of the forecasting methods. For a given time series, the model combination module extracts features and uses them to optimize the weights of the forecasting methods. Simultaneously, the model selection module extracts other features to identify the subset of methods to be used for the prediction. This selection process is framed as a classification problem, with the labels representing the set of models to be used for a series. These labels are determined by solving an auxiliary optimization problem that identifies accurate and diverse methods for each time series. The outputs of the two modules are then combined and the entire neural network is jointly trained by minimizing a custom loss function via gradient descent optimization. Experimental results on a large set of series from the M4 competition dataset and from real road traffic data show that our proposal enhances point forecast accuracy compared to state-of-the-art methods.

2311.00260 2026-06-02 cs.GT cs.LG version update

Incentivized Collaboration in Active Learning

Lee Cohen, Han Shao

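AI summary: For the problem of multiple rational agents collaborating to learn labels on a common hypothesis, proposes an incentivized collaboration framework. It designs strictly individually rational protocols, which ensure that joining the collaboration does not increase an agent's expected label complexity, and gives collaboration protocols whose label complexity is comparable to that of known tractable approximation algorithms.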

2309.15946 2026-06-02 cs.LG cs.AI cs.NE math.DS version update

Unified Long-Term Time-Series Forecasting Benchmark

Jacek Cyranka, Szymon Haponiuk

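Affiliations: Institute of Informatics

AI summary: Presents a comprehensive dataset designed for long-term time-series forecasting. Using standardized trajectories and benchmarks across multiple models, it finds that model effectiveness is dataset-dependent, and it introduces a latent NLinear model and a DeepAR variant enhanced with curriculum learning.

Details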

DOI: 10.1016/j.neucom.2026.134091

English abstract

In order to support the advancement of machine learning methods for predicting time-series data, we present a comprehensive dataset designed explicitly for long-term time-series forecasting. We incorporate a collection of datasets obtained from diverse, dynamic systems and real-life records. Each dataset is standardized by dividing it into training and test trajectories with predetermined lookback lengths. We include trajectories of length up to $2000$ to ensure a reliable evaluation of long-term forecasting capabilities. To determine the most effective model in diverse scenarios, we conduct an extensive benchmarking analysis using classical and state-of-the-art models, namely LSTM, DeepAR, NLinear, N-Hits, PatchTST, and LatentODE. Our findings reveal intriguing performance comparisons among these models, highlighting the dataset-dependent nature of model effectiveness. Notably, we introduce a custom latent NLinear model and enhance DeepAR with a curriculum learning phase. Both consistently outperform their vanilla counterparts.

2212.06751 2026-06-02 cs.LG cs.AI version update

Speeding Up Multi-Objective Hyperparameter Optimization by Task Similarity-Based Meta-Learning for the Tree-Structured Parzen Estimator

Shuhei Watanabe, Noor Awad, Masaki Onishi, Frank Hutter

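Affiliations: Department of Computer Science, University of Freiburg, Germany; Artificial Intelligence Research Center, AIST, Tokyo, Japan

AI summary: Extends the TPE acquisition function to the meta-learning setting using a task similarity defined by the overlap of top domains between tasks, speeding up multi-objective hyperparameter optimization. The limitations of the similarity measure are analyzed theoretically and addressed, and experiments show state-of-the-art performance on tabular HPO benchmarks. The method also won the AutoML 2022 competition.

Comments: Accepted to IJCAI 2023

Details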

DOI: 10.24963/ijcai.2023/487

English abstract

Hyperparameter optimization (HPO) is a vital step in improving performance in deep learning (DL). Practitioners are often faced with the trade-off between multiple criteria, such as accuracy and latency. Given the high computational needs of DL and the growing demand for efficient HPO, the acceleration of multi-objective (MO) optimization becomes ever more important. Despite the significant body of work on meta-learning for HPO, existing methods are inapplicable to MO tree-structured Parzen estimator (MO-TPE), a simple yet powerful MO-HPO algorithm. In this paper, we extend TPE's acquisition function to the meta-learning setting using a task similarity defined by the overlap of top domains between tasks. We also theoretically analyze and address the limitations of our task similarity. In the experiments, we demonstrate that our method speeds up MO-TPE on tabular HPO benchmarks and attains state-of-the-art performance. Our method was also validated externally by winning the AutoML 2022 competition on "Multiobjective Hyperparameter Optimization for Transformers".

2304.10255 2026-06-02 cs.LG stat.ML version update

PED-ANOVA: Efficiently Quantifying Hyperparameter Importance in Arbitrary Subspaces

Shuhei Watanabe, Archit Bansal, Frank Hutter

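Affiliations: Department of Computer Science, University of Freiburg, Germany

AI summary: Proposes PED-ANOVA, which uses the Pearson divergence to compute hyperparameter importance in arbitrary subspaces in closed form, identifying important hyperparameters accurately while remaining computationally efficient.

Comments: Accepted by IJCAI 2023

Details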

DOI: 10.24963/ijcai.2023/488

English abstract

The recent rise in popularity of Hyperparameter Optimization (HPO) for deep learning has highlighted the role that good hyperparameter (HP) space design can play in training strong models. In turn, designing a good HP space is critically dependent on understanding the role of different HPs. This motivates research on HP Importance (HPI), e.g., with the popular method of functional ANOVA (f-ANOVA). However, the original f-ANOVA formulation is inapplicable to the subspaces most relevant to algorithm designers, such as those defined by top performance. To overcome this issue, we derive a novel formulation of f-ANOVA for arbitrary subspaces and propose an algorithm that uses Pearson divergence (PED) to enable a closed-form calculation of HPI. We demonstrate that this new algorithm, dubbed PED-ANOVA, is able to successfully identify important HPs in different subspaces while also being extremely computationally efficient.
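The abstract names the Pearson divergence but does not define it. As a minimal sketch, assuming discrete distributions over a shared set of hyperparameter values, the divergence can be written as below. The function name and the example distributions are hypothetical, and the reading that a hyperparameter matters more when its distribution among top configurations departs from the baseline is an assumption about how the divergence is used, not a statement of the authors' exact formula. Some texts also include a factor of 1/2.

```python
def pearson_divergence(p, q):
    """Pearson (chi-square) divergence sum_i q_i * (p_i / q_i - 1)**2
    between two discrete distributions on the same support."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    if any(qi <= 0.0 for qi in q):
        raise ValueError("q must be strictly positive on the support")
    return sum(qi * (pi / qi - 1.0) ** 2 for pi, qi in zip(p, q))


# Hypothetical example: four candidate values of one hyperparameter.
baseline = [0.25, 0.25, 0.25, 0.25]   # distribution over all configurations
peaked = [0.7, 0.1, 0.1, 0.1]         # top configurations favor one value
unchanged = [0.25, 0.25, 0.25, 0.25]  # top configurations look like baseline

print(pearson_divergence(peaked, baseline))
print(pearson_divergence(unchanged, baseline))
```

The first call gives a clearly positive score and the second gives zero, which matches the intuition that a hyperparameter whose good values are concentrated is the more important one.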

2211.14411 2026-06-02 cs.LG cs.AI version update

c-TPE: Tree-structured Parzen Estimator with Inequality Constraints for Expensive Hyperparameter Optimization

Shuhei Watanabe, Frank Hutter

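Affiliations: Department of Computer Science, University of Freiburg

AI summary: Proposes c-TPE, which modifies TPE's sampling and modeling to handle inequality constraints, and achieves the best average rank on 81 expensive HPO problems.

Comments: Accepted to IJCAI 2023

Details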

DOI: 10.24963/ijcai.2023/486

English abstract

Hyperparameter optimization (HPO) is crucial for strong performance of deep learning algorithms and real-world applications often impose some constraints, such as on memory usage or latency, on top of the performance requirement. In this work, we propose constrained TPE (c-TPE), an extension of the widely-used versatile Bayesian optimization method, tree-structured Parzen estimator (TPE), to handle these constraints. Our proposed extension goes beyond a simple combination of an existing acquisition function and the original TPE, and instead includes modifications that address issues that cause poor performance. We thoroughly analyze these modifications both empirically and theoretically, providing insights into how they effectively overcome these challenges. In the experiments, we demonstrate that c-TPE exhibits the best average rank performance among existing methods with statistical significance on $81$ expensive HPO problems with inequality constraints. Due to the lack of baselines, we only discuss the applicability of our method to hard-constrained optimization in Appendix D. The implementation is now available via OptunaHub.

2303.04345 2026-06-02 cs.LG version update

Federated Learning via Variational Bayesian Inference: Personalization, Sparsity and Clustering

Xu Zhang, Wenpeng Li, Yunfeng Shao, Yonglin Liu, Kaiwen Zhou, Yinchuan Li

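Affiliations: School of Artificial Intelligence, Xidian University; LSEC, Academy of Mathematics and Systems Science, Chinese Academy of Sciences; Noah’s Ark Lab, Huawei

AI summary: Proposes federated learning methods based on variational Bayesian inference that address heterogeneous and limited data through personalization, sparsity, and clustering, achieving better performance than other advanced personalized methods.

Comments: 18 pages, 5 figures

Details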

English abstract

Federated learning (FL) is a promising framework that models distributed machine learning while protecting the privacy of clients. However, FL suffers performance degradation from heterogeneous and limited data. To alleviate the degradation, we present a novel personalized Bayesian FL approach named pFedBayes. By using the trained global distribution from the server as the prior distribution of each client, each client adjusts its own distribution by minimizing the sum of the reconstruction error over its personalized data and the KL divergence with the downloaded global distribution. Then, we propose a sparse personalized Bayesian FL approach named sFedBayes to enhance the inference efficiency. To overcome the extreme heterogeneity in non-i.i.d. data, we propose a clustered Bayesian FL model named cFedBayes by learning different prior distributions for different clients. Theoretical analysis gives the generalization error bounds of the three approaches and shows that the generalization error rates of the proposed approaches achieve minimax optimality up to a logarithmic factor. Moreover, cFedBayes achieves a cluster-level generalization error bound, rather than a single uniform bound in pFedBayes. Numerous experiments demonstrate that the proposed approaches have better performance than other advanced personalized methods on private models in the presence of heterogeneous and limited data.
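The client-side objective described in words above can be written out as a rough sketch. The notation is assumed, not taken from the paper: $q_i$ is client $i$'s distribution over model parameters $\theta$, $\mathcal{Q}$ is the family it is chosen from, $D_i$ is the client's data, and $w$ is the global distribution downloaded from the server. Writing the reconstruction error as an expected negative log-likelihood is also an assumption, and the paper may weight the two terms differently.

```latex
\hat{q}_i \;=\; \arg\min_{q_i \in \mathcal{Q}} \;
  \underbrace{\mathbb{E}_{\theta \sim q_i}\!\left[-\log p(D_i \mid \theta)\right]}_{\text{reconstruction error on client data}}
  \;+\;
  \underbrace{\mathrm{KL}\!\left(q_i \,\|\, w\right)}_{\text{divergence from the global distribution}}
```

The first term pulls each client toward its own data, while the second keeps it close to the shared prior, which is how the limited data of one client can borrow strength from the others.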

2301.06308 2026-06-02 cs.LG cs.AI version update

Stability Analysis of Sharpness-Aware Minimization

Hoki Kim, Jinseong Park, Yujin Choi, Jaewook Lee

Affiliations: Chung-Ang University, South Korea; Korea Institute for Advanced Study, South Korea; Ulsan National Institute of Science; Nanyang Technological University (NTU), Singapore; Seoul National University, South Korea

AI summary: Studies the convergence instability of SAM near saddle points, uses dynamical systems theory to prove that a saddle point can become an attractor under SAM dynamics, and finds that momentum and batch size may help mitigate the problem.

Comments: Accepted to ICML 2026

Details

English abstract

Sharpness-aware minimization (SAM) is a training method that seeks to find flat minima in deep learning, resulting in state-of-the-art performance across various domains. Instead of minimizing the loss of the current weights, SAM minimizes the worst-case loss in its neighborhood in the parameter space. In this paper, we investigate the convergence instability of SAM near a saddle point. Using the qualitative theory of dynamical systems, we explain how SAM becomes stuck in the saddle point and theoretically prove that the saddle point can become an attractor under SAM dynamics. Additionally, we show that this convergence instability can also occur in stochastic dynamical systems by establishing the diffusion of SAM. We prove that SAM diffusion is worse than that of vanilla gradient descent in terms of saddle point escape. Finally, we demonstrate that often overlooked training tricks, momentum and batch-size, might be important to mitigate the convergence instability and achieve high generalization performance. Our theoretical and empirical results are thoroughly verified through experiments on several well-known optimization problems and benchmark tasks.
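To make the saddle-point behavior concrete, here is a minimal sketch of one SAM step on a toy saddle, compared with plain gradient descent. Everything in it is an assumption chosen for illustration: the loss L(x, y) = 0.5 * (x**2 - y**2), the step size `lr`, the perturbation radius `rho`, and the function names. It is not the authors' experimental setup.

```python
import math


def saddle_grad(w):
    """Gradient of the toy saddle L(x, y) = 0.5 * (x**2 - y**2)."""
    x, y = w
    return (x, -y)


def gd_step(w, grad_fn, lr=0.1):
    """Plain gradient descent step, for comparison."""
    g = grad_fn(w)
    return tuple(wi - lr * gi for wi, gi in zip(w, g))


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM step: ascend by rho along the normalized gradient, then
    descend from w using the gradient taken at the perturbed point."""
    g = grad_fn(w)
    norm = math.hypot(*g)
    if norm == 0.0:
        return w  # exactly stationary: no ascent direction is defined
    w_adv = tuple(wi + rho * gi / norm for wi, gi in zip(w, g))
    g_adv = grad_fn(w_adv)
    return tuple(wi - lr * gi for wi, gi in zip(w, g_adv))


w0 = (0.0, 0.01)  # slightly off the saddle along the escape direction y
print(gd_step(w0, saddle_grad))   # y grows: moves away from the saddle
print(sam_step(w0, saddle_grad))  # y shrinks: moves back toward the saddle
```

Because the starting distance from the saddle (0.01) is smaller than `rho`, the ascent step overshoots to the other side, and the descent step then pulls the iterate toward the origin: y goes to roughly 0.006 under SAM versus 0.011 under gradient descent. This toy case is in line with the abstract's claim that a saddle point can act as an attractor under SAM dynamics.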

2208.12389 2026-06-02 cs.LG cs.AI version update

Static Seeding and Clustering of LSTM Embeddings to Learn from Loosely Time-Decoupled Events

Christian Manasseh, Razvan Veliche, Jared Bennett, Hamilton Clouse

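Affiliations: Air Force Research Lab (AFRL) Autonomy Capability Team 3 (ACT3)

AI summary: Proposes seeding LSTMs with static data to produce embeddings that are then clustered, improving forecasts for loosely decoupled timeseries. On US county-level COVID-19 case prediction, it improves 10-day moving average predictions.

Details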

DOI: 10.1109/ACCESS.2023.3288487

English abstract

Humans learn from the occurrence of events in a different place and time to predict similar trajectories of events. We define Loosely Decoupled Timeseries (LDT) phenomena as two or more events that could happen in different places and across different timelines but share similarities in the nature of the event and the properties of the location. In this work we improve on the use of Recurrent Neural Networks (RNN), in particular Long Short-Term Memory (LSTM) networks, to enable AI solutions that generate better timeseries predictions for LDT. We use similarity measures between timeseries based on the trends and introduce embeddings representing those trends. The embeddings represent properties of the event which, coupled with the LSTM structure, can be clustered to identify similar temporally unaligned events. In this paper, we explore methods of seeding a multivariate LSTM from time-invariant data related to the geophysical and demographic phenomena being modeled by the LSTM. We apply these methods on the timeseries data derived from the COVID-19 detected infection and death cases. We use publicly available socio-economic data to seed the LSTM models, creating embeddings, to determine whether such seeding improves case predictions. The embeddings produced by these LSTMs are clustered to identify best-matching candidates for forecasting an evolving timeseries. Applying this method, we show an improvement in 10-day moving average predictions of disease propagation at the US county level.
